Potential deadlock when deleting model container replica queue #758

Open
robberlang opened this issue Dec 1, 2019 · 0 comments

A deadlock can occur when a model container replica is removed, crippling all communication between the frontend and all model containers. Any requests that are not in the cache will time out.

This happens as follows (with TaskExecutionThreadPool in src/libclipper/include/clipper/threadpool.hpp):
- interrupt_thread is called, sending an interrupt to the queue's worker thread.
- delete_queue is called: it acquires a unique lock on queues_mutex_, invalidates the queue in question, then waits for the worker thread to finish.
- The worker thread blocks trying to acquire a shared lock on queues_mutex_.

The worker thread never acquires the shared lock because delete_queue holds a unique lock on queues_mutex_, and delete_queue never releases that lock because it is waiting for the worker thread to finish. From then on, no tasks can be sent to any queue: the submit function blocks trying to acquire a shared lock on queues_mutex_, which delete_queue still holds exclusively and never releases.
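
One possible way to avoid this cycle is to release the unique lock before waiting for the worker thread. The following is a minimal sketch of that idea, not Clipper's actual code: MiniPool, ReplicaQueue and run_demo are hypothetical names, and std::shared_mutex (C++17) stands in for whatever mutex type the real thread pool uses. Only queues_mutex_, delete_queue and submit mirror names from the description above.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <memory>
#include <mutex>
#include <shared_mutex>
#include <thread>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for one replica queue and its worker thread.
struct ReplicaQueue {
  std::atomic<bool> valid{true};
  std::thread worker;
};

// Hypothetical, simplified pool; not the real TaskExecutionThreadPool.
class MiniPool {
 public:
  ~MiniPool() {
    std::vector<int> ids;
    {
      std::shared_lock<std::shared_mutex> lock(queues_mutex_);
      for (const auto& kv : queues_) ids.push_back(kv.first);
    }
    for (int id : ids) delete_queue(id);
  }

  void create_queue(int id) {
    auto q = std::make_shared<ReplicaQueue>();
    {
      std::unique_lock<std::shared_mutex> lock(queues_mutex_);
      queues_[id] = q;
    }
    ReplicaQueue* raw = q.get();
    q->worker = std::thread([this, raw] {
      while (raw->valid.load()) {
        {
          // Like the worker in the report, take a shared lock each iteration.
          std::shared_lock<std::shared_mutex> lock(queues_mutex_);
          // A real worker would fetch and run a task here.
        }
        std::this_thread::sleep_for(std::chrono::milliseconds(1));
      }
    });
  }

  // Invalidate and unregister under the unique lock, then join WITHOUT it.
  bool delete_queue(int id) {
    std::shared_ptr<ReplicaQueue> q;
    {
      std::unique_lock<std::shared_mutex> lock(queues_mutex_);
      auto it = queues_.find(id);
      if (it == queues_.end()) return false;
      q = it->second;
      q->valid.store(false);
      queues_.erase(it);
    }  // unique lock released here, so the worker can take its shared lock
    assert(q->worker.joinable());
    q->worker.join();
    return true;
  }

  // Returns true if a live queue with this id exists (task handling omitted).
  bool submit(int id) {
    std::shared_lock<std::shared_mutex> lock(queues_mutex_);
    auto it = queues_.find(id);
    return it != queues_.end() && it->second->valid.load();
  }

 private:
  std::shared_mutex queues_mutex_;
  std::unordered_map<int, std::shared_ptr<ReplicaQueue>> queues_;
};

// Removes one queue and checks that the other queue still accepts work.
bool run_demo() {
  MiniPool pool;
  pool.create_queue(1);
  pool.create_queue(2);
  bool removed = pool.delete_queue(1);
  return removed && !pool.submit(1) && pool.submit(2);
}
```

Calling run_demo() from main exercises the removal path. Because the join happens after the unique lock's scope ends, the worker can always take its shared lock, observe that its queue is invalid, and exit. If the join were moved inside that scope, the sketch would reproduce the lock cycle described above.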
