How to use joblib in Python for parallelism?


I was trying to use Python's Thread to parallelize my code, but I ran into a problem: when I create the threads, their number easily exceeds 1,000, and from about 140 onward they all start throwing errors. Searching around I found joblib, but I couldn't find any examples of how to use it with my own functions... For example, I have a function I wrote that takes 3 parameters, and it is called inside a for loop that repeats thousands of times...

repeticoes = 10000
for i in range(repeticoes):
    minha_funcao(data[i], top, param3)

Would I use it like this?

from joblib import Parallel, delayed
Parallel(n_jobs=4, verbose=1)(delayed(minha_funcao)(data[i], top, param3) for i in range(repeticoes))
    
asked by anonymous 12.08.2017 / 19:37

1 answer


Threads create a "parallel processing line" within your program, but they are not magic: each one consumes resources and does exactly what you tell it to do.

Make sure your program does not create more than an optimal number of threads. That number depends on the nature of the program: a computation-intensive program will not benefit from more than one thread per logical core of your CPU (*), while threads that spend their time waiting on network or peripheral responses can benefit from a larger number.

  • (*) In pure Python, many threads don't help with processing-heavy work either: only one thread executes Python code at a time, no matter how many physical cores you have, because of a characteristic of the language implementation (the Global Interpreter Lock in CPython).
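As a rough illustration of that rule of thumb, here is a minimal sketch of choosing a thread count from `os.cpu_count()`. The helper name `choose_worker_count` and the multiplier of 4 for I/O-bound work are assumptions for illustration, not established values; measure with your own workload.

```python
import os


def choose_worker_count(io_bound: bool, io_multiplier: int = 4) -> int:
    """Hypothetical helper: pick a thread count following the rule of thumb.

    CPU-bound work: at most one worker per logical core.
    I/O-bound work: more workers than cores, since most of them sit waiting.
    The default io_multiplier of 4 is an arbitrary assumption; tune it.
    """
    cores = os.cpu_count() or 1  # os.cpu_count() can return None
    if io_bound:
        return cores * io_multiplier
    return cores


print("CPU-bound:", choose_worker_count(io_bound=False))
print("I/O-bound:", choose_worker_count(io_bound=True))
```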

In any case, you need logic along these lines: "if the number of threads already running has reached MAX_THREADS (a number you have determined to be optimal), just record the task and run it on one of the already-created threads as soon as one of the running tasks finishes". This logic is not hard to implement in a simple way, but it gets complicated once we want maximum efficiency, handle all the corner cases, and always do "the right thing".
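That logic can be sketched by hand with `queue.Queue` and a fixed set of worker threads. This is a minimal, hypothetical version: `run_with_pool` and `MAX_THREADS = 4` are illustrative choices, and `minha_funcao` is a stand-in for the asker's function. It deliberately skips corner cases; for instance, a task that raises an exception stops that worker thread and leaves its result as `None`.

```python
import queue
import threading

MAX_THREADS = 4  # assumed "optimal" number; tune it for your workload


def run_with_pool(func, tasks, max_threads=MAX_THREADS):
    """Run func(*args) for each args tuple in tasks with at most max_threads threads.

    Every task is first recorded in a queue; each worker thread takes the next
    task as soon as it finishes the previous one. Results keep the input order.
    """
    task_queue = queue.Queue()
    results = [None] * len(tasks)

    for index, args in enumerate(tasks):
        task_queue.put((index, args))

    def worker():
        while True:
            try:
                index, args = task_queue.get_nowait()
            except queue.Empty:
                return  # no tasks left: this thread ends
            results[index] = func(*args)

    # Never start more threads than there are tasks.
    threads = [threading.Thread(target=worker)
               for _ in range(min(max_threads, len(tasks)))]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


def minha_funcao(item, top, param3):
    # Hypothetical stand-in for the asker's 3-parameter function.
    return item * top + param3


data = list(range(10000))
results = run_with_pool(minha_funcao, [(data[i], 2, 1) for i in range(len(data))])
print(results[:3])  # [1, 3, 5]
```

Only four threads ever exist here, no matter how many of the 10,000 calls are queued.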

Because of this, the Python standard library (since version 3.2) provides objects known as "futures" and thread pools, which implement exactly this logic. I suggest reading about concurrent.futures and implementing your program with concurrent.futures.ThreadPoolExecutor. (And if your problem is processing-intensive, use ProcessPoolExecutor instead: that way all processor cores can be used.)
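A minimal sketch of what that suggestion could look like for the loop in the question, using `ThreadPoolExecutor` and `ProcessPoolExecutor` from `concurrent.futures`. The body of `minha_funcao`, the helper `run_all` and the `chunksize=500` value are assumptions for illustration; substitute the real function and data.

```python
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor
from functools import partial


def minha_funcao(item, top, param3):
    # Hypothetical stand-in for the asker's 3-parameter function.
    return item * top + param3


def run_all(data, top, param3, use_processes=False, max_workers=None):
    """Call minha_funcao(item, top, param3) for every item in data.

    The executor keeps at most max_workers threads (or processes) alive and
    hands queued tasks to them as they become free; max_workers=None lets
    the executor pick a default based on the CPU count.
    """
    executor_cls = ProcessPoolExecutor if use_processes else ThreadPoolExecutor
    # Fix the two constant arguments so map() only has to supply each item.
    # A partial() of a module-level function can be pickled, which processes need.
    func = partial(minha_funcao, top=top, param3=param3)
    with executor_cls(max_workers=max_workers) as executor:
        # map() returns results in input order; chunksize only has an effect
        # with ProcessPoolExecutor, where it batches items to cut overhead.
        return list(executor.map(func, data, chunksize=500))


if __name__ == "__main__":
    data = list(range(10000))
    results = run_all(data, top=2, param3=1, max_workers=4)
    print(results[:3])  # [1, 3, 5]
    # For CPU-bound work, switch to processes (keep this under the
    # __main__ guard, since child processes may re-import this module):
    # results = run_all(data, top=2, param3=1, use_processes=True)
```

Whichever executor you pick, only `max_workers` tasks run at the same time, so launching 10,000 calls no longer means creating 10,000 threads.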

    
13.08.2017 / 06:01