fix: sync agent's kernel-registry to actual container periodically #2179
Closing this PR since the idea behind it has not been confirmed.
Intro
In the Backend.AI system, a container's status is tracked in three places: the DB on the manager side, the agent's kernel registry, and the actual container. This PR concerns the latter two: the agent's kernel registry and the actual container.
Problem
In the current implementation, kernel data is inserted into and removed from the agent's kernel registry inside the tasks that create and destroy containers. During container creation, if an unhandled error occurs, the kernel data that was already inserted into the kernel registry is removed. This removal is not reliable, and other unpredictable errors can still leave the kernel registry out of sync with the actual container state.
So, let's sync the kernel registry to the actual container state in a periodic loop.
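To illustrate the idea, here is a minimal sketch of such a periodic sync. It is not the code in this PR: `reconcile`, `sync_loop`, `list_live_kernel_ids`, and the dict-shaped registry are all hypothetical names and shapes chosen for the example. It also assumes every registry entry should have a backing container, so a real implementation would likely need to skip kernels that are still being created.

```python
import asyncio
from typing import Awaitable, Callable, Dict, Set, Tuple


def reconcile(
    registry: Dict[str, dict],
    live_kernel_ids: Set[str],
) -> Tuple[Set[str], Set[str]]:
    """Drop registry entries that have no backing container.

    Returns (stale, orphaned): kernel IDs removed from the registry, and
    kernel IDs of containers that the registry does not know about.
    """
    known = set(registry)
    stale = known - live_kernel_ids
    orphaned = live_kernel_ids - known
    for kernel_id in stale:
        del registry[kernel_id]
    return stale, orphaned


async def sync_loop(
    registry: Dict[str, dict],
    list_live_kernel_ids: Callable[[], Awaitable[Set[str]]],  # hypothetical container query
    interval: float,
    stop_event: asyncio.Event,
) -> None:
    """Reconcile the registry every `interval` seconds until stopped."""
    while not stop_event.is_set():
        live = await list_live_kernel_ids()
        reconcile(registry, live)
        try:
            # Sleep for one interval, but wake up early if asked to stop.
            await asyncio.wait_for(stop_event.wait(), timeout=interval)
        except asyncio.TimeoutError:
            pass
```

In this sketch the orphaned set is only reported, not acted on; whether such containers should be destroyed or re-registered is a separate decision that this PR description does not cover.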