
pyprind with joblib #21

Open
ajkl opened this issue Feb 22, 2016 · 5 comments

Comments

@ajkl

ajkl commented Feb 22, 2016

Is it possible to use the callback from joblib's Parallel to make it work with pyprind for parallel processing tasks?

@rasbt

rasbt commented Feb 23, 2016

Hi,
thanks for the suggestion/request; supporting joblib sounds like a useful feature. Personally, I haven't experimented with this combination yet.

So, I can think of two possible scenarios here:

  1. Having an outer for loop that runs multiple joblib calls iteratively and updates the bar like so

pbar = ProgBar(len(x))
for _ in x:
    # do something w. joblib in parallel
    pbar.update()

which I guess would already work.

  2. Tracking the progress inside joblib. Here, you are running multiple processes via joblib, each with its own for loop. The goal is to have all of them update a single progress bar:

def some_func():
    for _ in x:
        # do something

pbar = ProgBar(n)
# run multiple instances of some_func in parallel
# let all processes update the pbar
Is this what you have in mind? In theory, I think this should be easily possible; all the processes would have to do is call the update method. It would be nice if you had some example code that we could use to experiment a bit.

@ajkl

ajkl commented Feb 23, 2016

The second option is what I was looking for, but it doesn't seem to work with your suggestion of letting all processes update pbar:

from joblib import Parallel, delayed
import time
import pyprind

timesleep = 0.05
n = 1000
bar = pyprind.ProgBar(n)

def foo(x):
    time.sleep(timesleep)
    bar.update()
    return x

Parallel(n_jobs=4, verbose=0)(delayed(foo)(i) for i in range(n))

@rasbt

rasbt commented Feb 24, 2016

Hm, I think the problem is that the standard output is blocked during the computation, which is why the progress bar appears only after everything has finished. I think this is something to investigate further after the "double progress bar" support has been added (see #18).

In any case, another problem is that multiprocessing creates copies of the objects that are sent to the different processes (in contrast to threading). So with 4 processes, there are effectively 4 progress bars, each running from 0% to 25%.
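The copy problem can be seen with a tiny sketch that replaces the ProgBar with a plain counter: increments made in the worker processes never reach the parent's object. (`bump` is a hypothetical stand-in for a worker that calls pbar.update().)

```python
import multiprocessing as mp

counter = {'n': 0}  # stands in for the ProgBar's internal state

def bump(_):
    counter['n'] += 1  # updates the *child's* copy of counter
    return counter['n']

if __name__ == '__main__':
    with mp.Pool(processes=4) as pool:
        pool.map(bump, range(100))
    print(counter['n'])  # still 0 in the parent
```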

Honest question: what's the advantage of joblib over multiprocessing? I've seen it in certain libraries (e.g., scikit-learn) but never really understood why joblib is used instead of multiprocessing. E.g.,

from joblib import Parallel, delayed
import time
import pyprind

timesleep = 0.05
n = 1000
n_jobs = 4

bar = pyprind.ProgBar(n, stream=1)
def foo(x):
    time.sleep(timesleep)
    bar.update()
    return x

results = Parallel(n_jobs=n_jobs, 
                   verbose=0, 
                   backend="multiprocessing")(delayed(foo)(i) for i in range(n))

vs.

import multiprocessing as mp

pool = mp.Pool(processes=n_jobs)
# note: pool.apply blocks until each call returns, so apply_async
# is needed for the calls to actually run in parallel
results = [pool.apply_async(foo, args=(x,)) for x in range(n)]
results = [r.get() for r in results]

@rasbt rasbt closed this as completed Feb 24, 2016
@rasbt rasbt reopened this Feb 24, 2016
@ajkl

ajkl commented Feb 24, 2016

Well, I am kind of new to the Python ecosystem and recently came across joblib. I noticed scikit-learn is using it, so I assumed it must solve some issues that multiprocessing has. Honestly, I haven't evaluated the two yet.
I understand that multiprocessing creates different objects, hence you always see 25% in the example above. I'm not sure if there is an easy solution around it. I don't want to waste your time since it is not that critical. Thanks for this great package!

@rasbt

rasbt commented Feb 24, 2016

> I understand that multiprocessing creates different objects, hence you always see 25% in the example above. I'm not sure if there is an easy solution around it.

I think there could be a way around that, but it'll require some tweaks. Btw., if you use the "threading" backend, it should give you the 100% correctly; the problem is still how to print to stdout while the processes are still running...
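The sharing behavior behind the threading backend can be sketched with a shared counter standing in for the ProgBar. This uses concurrent.futures threads for illustration; joblib's backend="threading" shares objects the same way, so a single bar would reach 100%:

```python
import threading
from concurrent.futures import ThreadPoolExecutor

counter = {'n': 0}  # stands in for the single shared ProgBar
lock = threading.Lock()

def tick(x):
    # every thread sees the *same* counter object; the lock guards
    # against racing increments (in real code: pbar.update())
    with lock:
        counter['n'] += 1
    return x

with ThreadPoolExecutor(max_workers=4) as ex:
    results = list(ex.map(tick, range(100)))

print(counter['n'])  # 100 -- all threads updated the shared object
```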

> I'm not sure if there is an easy solution around it. I don't want to waste your time since it is not that critical. Thanks for this great package!

Unfortunately, there are too many things on my to-do list currently. But I will leave this issue open; maybe someone has a good idea for how to implement it, or maybe there will be a boring weekend for me some day ... ;)
