Intermittent threading hangs with process pools #52
As I turned off concurrency in my project to debug further, I uncovered a segfault in the work that one of the workers would have been doing. Would a segfaulting worker cause the master to hang?
Seems unlikely, but without a way to test it myself, I can't say anything conclusive. Subprocess handling is really flaky on Python 2 and that is unfortunately unfixable to the best of my knowledge.
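For context: on Python 3.3+, `concurrent.futures` detects a worker process that dies abruptly and raises `BrokenProcessPool` instead of hanging; to my understanding the Python 2 `futures` backport (based on the pre-3.3 code) has no such detection, so a dead worker there can stall the master. A minimal sketch (the helper names are mine), simulating a segfault by killing the worker with SIGKILL:

```python
import os
import signal
import concurrent.futures
from concurrent.futures.process import BrokenProcessPool

def _crash():
    # Stand-in for a segfaulting worker: kill this process abruptly,
    # much as an actual SIGSEGV would.
    os.kill(os.getpid(), signal.SIGKILL)

def run_crashing_worker():
    """Submit a task whose worker dies mid-flight; report what the master sees."""
    with concurrent.futures.ProcessPoolExecutor(max_workers=1) as pool:
        future = pool.submit(_crash)
        try:
            future.result(timeout=30)
        except BrokenProcessPool:
            return "BrokenProcessPool"
    return "no error"

if __name__ == "__main__":
    # On Python 3 this prints "BrokenProcessPool" rather than hanging.
    print(run_crashing_worker())
```

On Python 2's backport the equivalent code can wait forever on the result, which matches the hang described here.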
I have a similar issue - my best theory is something around this: http://bugs.python.org/issue6721 If you have multiple threads (which you will do if you use multiprocessing, since there are queue feeder threads etc.) and one of these threads holds a lock when the process forks, the lock is never released in the child. In particular, this affects logging (but the multiprocessing module has its own logging, which apparently is not affected), and also flushing stdout.
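That lock-across-fork failure mode can be shown in a few lines without multiprocessing at all. A minimal sketch, assuming a POSIX system and Python 3 (where `threading.Lock.acquire` accepts a `timeout`); the function name is mine:

```python
import os
import threading
import time

def child_sees_stuck_lock():
    """Fork while another thread holds a lock; report whether the child can
    ever take that lock. It cannot: the holder thread is not copied into the
    child, so nothing there will ever release it."""
    lock = threading.Lock()

    def holder():
        with lock:
            time.sleep(2.0)  # hold the lock across the fork

    t = threading.Thread(target=holder)
    t.start()
    while not lock.locked():
        time.sleep(0.01)  # wait until the thread really has the lock

    r, w = os.pipe()
    pid = os.fork()
    if pid == 0:  # child: the lock was copied in its "held" state
        os.close(r)
        acquired = lock.acquire(timeout=1.0)  # times out; lock is orphaned
        os.write(w, b"yes" if acquired else b"no")
        os._exit(0)
    os.close(w)
    result = os.read(r, 3)
    os.waitpid(pid, 0)
    t.join()
    return result.decode()

if __name__ == "__main__":
    print("child acquired orphaned lock:", child_sees_stuck_lock())
```

A real child that blocks on such a lock (e.g. inside logging or a stdout flush) hangs forever instead of timing out, which is the hypothesis above.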
I've made a small test case that displays this problem: https://gist.github.com/gromgull/3a2e343d50184a853fcf1dca5e690a6b This breaks on Python 2.7.12 for me, maybe 25-50% of the time.
One last comment: if I tweak that example slightly and make it use …
I'm not sure what you expect here. Forking with threads is a Really Really Bad Idea, and to the best of my knowledge, subprocess handling is permanently broken on (C)Python 2.
I know threads and forks don't mix, and my test program above has no threads of its own. The link above was just meant as a possible explanation, since the multiprocessing module uses threads internally (you can see this if you add a …). If the bottom line is that a robust …
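For readers hitting this on Python 3: one way to sidestep fork-with-threads entirely is the "spawn" start method (available since 3.4), which starts each worker from a fresh interpreter instead of `fork()`, so no mid-state locks or threads are inherited from the parent. `ProcessPoolExecutor` accepts an `mp_context` argument since Python 3.7. A sketch, with helper names of my own choosing:

```python
import concurrent.futures
import multiprocessing

def square(x):
    # Work functions must be defined at module level so "spawn"
    # workers can import them by name.
    return x * x

def run_with_spawn(values):
    # "spawn" launches workers as fresh interpreters rather than fork()ing,
    # avoiding the inherited-lock deadlock discussed above.
    ctx = multiprocessing.get_context("spawn")
    with concurrent.futures.ProcessPoolExecutor(max_workers=2,
                                                mp_context=ctx) as pool:
        return list(pool.map(square, values))

if __name__ == "__main__":
    print(run_with_spawn(range(5)))  # [0, 1, 4, 9, 16]
```

The trade-off is slower worker startup and the requirement that tasks and arguments be picklable, but it removes the whole class of fork-time lock bugs. None of this helps on Python 2, where `fork` is the only option.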
Yes, that is certainly worth considering. |
I've mentioned this problem in the README. Good enough? |
It's an improvement, but I think @gromgull's suggestion of a warning would be ideal. If you want to stick with just the readme mention, I'd add a few more keywords like "hang" so search engines find it. Cheers! |
I've been chasing this for a year or so (across various versions of Python and futures; this time 2.7.6 and 3.0.3) and finally went through the rigmarole of setting up the Python gdb tools to get some decent tracebacks out of it. In short, during large jobs with thousands of tasks, execution sometimes hangs. It runs for about an hour, getting somewhere between 11% and 17% done in the current reproduction; conveniently, I have a progress bar. The variation makes me think it's some kind of timing bug. CPU use slowly falls to 0 as the worker processes complete and no new ones are scheduled to replace them. I end up with a process table like this:
The defunct processes are the workers. Adding `-L`, we can see the threads futures spins up to coordinate the work distribution. I don't know why there are only 3 of them when my process pool is of size 4; maybe that's a clue?
The Python traceback, from attaching with gdb and using its Python tools, looks like this:
Here's the calling code.
Here's the C traceback as well, in case it's helpful:
Let me know if I can supply any more information. I'm also not sure if this is more properly filed with upstream, as my codebase isn't Python 3 clean. Thank you!