-
Notifications
You must be signed in to change notification settings - Fork 1.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix/#869 - Investigate using Queue without lock #871
Fix/#869 - Investigate using Queue without lock #871
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am concerned about the orchestrator.cancel_job()
method.
If we remove the lock on the job dispatcher, we can be in a strange state:
Let's consider a job A. We can start canceling A by removing some subsequent jobs from the blocked_job
list. Then in parallel, A is enqueued to be executed. Then finally we continue canceling A trying to remove it from the jobs_to_run
queue (it is not in the queue anymore).
What do you think?
We keep the lock in the |
except Exception: # In case the last job of the queue has been removed. | ||
self._logger.warning(f"{job.id} is no longer in the list of jobs to run.") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The except
here is wrong because in incase the last job has been removed, the job.id
in the warning will be the previous job id
I noticed you're focusing only on locks used in dispatcher, what about the ones in orchestrator like submit function? |
For some code blocks using a lock in the
Either case, the lock is needed there |
I believe we cannot remove the lock at all because when canceling a job or when failing subsequent jobs, we do instantiate a new queue
That means we cannot rely on the locking mechanism of a queue instance since we may have multiple instances. |
except Empty: | ||
pass |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well spotted. We cannot log the job id!
However, I would still except all Exceptions, just to be sure that we keep running if any other exception is raised.
except Empty: | |
pass | |
except Exception: | |
pass |
After more investigation, I believe the mechanism when using First:
Second, all
Initializing a new Queue in However, we were still suffer from these problem with the previous implementation since the In conclusion, I propose to replace the |
Interesting findings!! Didn't think about that before! |
9810acc
to
5820909
Compare
5820909
to
9c05966
Compare
9c05966
to
22e2b98
Compare
☂️ Python Coverage
Overall Coverage
New FilesNo new covered files... Modified Files
|
22e2b98
to
82ef3d0
Compare
Accessing the |
Resolves #869