Node Agent Integration #29
Open
mmoreiradj
wants to merge
8
commits into
dev-sys-do:main
from
mmoreiradj:feat/node-agent
Add Cargo.toml file at the root of the project to manage the dependencies of the different crates. Building, formatting and linting will now be easier. Signed-off-by: Martin Moreira de Jesus <[email protected]> Signed-off-by: Noé Tarbouriech <[email protected]> Co-Authored-By: Noé Tarbouriech <[email protected]>
Signed-off-by: Martin Moreira de Jesus <[email protected]> Signed-off-by: Noé Tarbouriech <[email protected]> Co-Authored-By: Noé Tarbouriech <[email protected]>
Add a wrapper for containerd-client lib (and ctr cli for creation) to create a container and run a workload. This will however mean more resource usage, overhead, latency, potential socket exhaustion, etc. This is a tradeoff we are willing to make for now, since these issues are not likely to be a problem in the near future. The ctr cli was used to create the container because we did not manage to create one with the containerd-client lib. It runs the command `ctr run -d <image> <container-id>`. **Note**: It seems Kuruyia has managed to prepare a rootfs and run a container, you can find it [here](https://github.com/Martin-Moreira-de-jesus/orka/blob/tmp/containerd-create-workload/node-agent/src/workload_manager/container/client.rs). Also note that if you don't build the node-agent binary from the node-agent folder, the containerd socket cannot be found, for a reason we have not identified. Signed-off-by: Martin Moreira de Jesus <[email protected]> Signed-off-by: Noé Tarbouriech <[email protected]> Co-Authored-By: Noé Tarbouriech <[email protected]>
The `containerd-client` is used to kill the container. We did however have an issue concerning an ambiguous type definition. Indeed, the Workload Signal enum is ambiguous. It is not clear what signal STOP represents. For now we made the assumption that stop is SIGINT (2). Both Stop (SIGINT) and Kill (SIGKILL) signals are sent to the workload, then the workload is cleaned up. Signed-off-by: Martin Moreira de Jesus <[email protected]> Signed-off-by: Noé Tarbouriech <[email protected]> Co-Authored-By: Noé Tarbouriech <[email protected]>
Signed-off-by: Martin Moreira de Jesus <[email protected]> Signed-off-by: Noé Tarbouriech <[email protected]> Co-Authored-By: Noé Tarbouriech <[email protected]>
Add method to connect to cluster and stream node status. Retries in case of the connection closing. Signed-off-by: Martin Moreira de Jesus <[email protected]> Signed-off-by: Noé Tarbouriech <[email protected]> Co-Authored-By: Noé Tarbouriech <[email protected]>
Signed-off-by: Martin Moreira de Jesus <[email protected]> Signed-off-by: Noé Tarbouriech <[email protected]> Co-Authored-By: Noé Tarbouriech <[email protected]>
add cni and format code Signed-off-by: Martin Moreira de Jesus <[email protected]> Signed-off-by: Noé Tarbouriech <[email protected]> Co-Authored-By: Noé Tarbouriech <[email protected]>
mmoreiradj force-pushed the feat/node-agent branch from 97a0389 to cd28860 on September 1, 2023 13:56
Rebased.
Node Agent Integration
gRPC Workload Service Server Integration
Add a wrapper for the `containerd-client` lib (and the `ctr` CLI for creation) to create a container and run a workload.
The wrapper uses gRPC to communicate with the containerd socket. It reconnects to the socket before each request. This is done for simplicity and to avoid potential issues with resource management, isolation, and security. It is also more fault-tolerant.
This will however mean more resource usage, overhead, latency, potential socket exhaustion, etc. This is a tradeoff we are willing to make for now, since these issues are not likely to be a problem in the near future.
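To make the tradeoff concrete, here is a minimal sketch of the reconnect-per-request pattern described above. It uses only the standard library; `with_fresh_connection` and its parameters are hypothetical names for illustration, and in the node agent the `connect` closure would presumably open the gRPC channel on the containerd socket rather than whatever the caller passes here.

```rust
use std::io;

/// Hypothetical helper (not taken from the PR): open a fresh connection,
/// run exactly one request on it, then drop the connection.
pub fn with_fresh_connection<C, T>(
    connect: impl Fn() -> io::Result<C>,
    request: impl FnOnce(&mut C) -> io::Result<T>,
) -> io::Result<T> {
    // A new connection is established for every call, so no state
    // (and no broken channel) is carried over from a previous request.
    let mut conn = connect()?;
    request(&mut conn)
    // `conn` goes out of scope here, which closes the connection.
}
```

The cost is visible in the signature: every request pays for one `connect()` call, which is where the extra latency and the risk of socket exhaustion come from.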
The `ctr` CLI was used to create the container because we did not manage to create one with the `containerd-client` lib. It runs the command `ctr run -d <image> <container-id>`.
The `containerd-client` lib was also used to kill the container. We did however have an issue concerning an ambiguous type definition: the Workload Signal enum does not make clear what signal STOP represents. For now, we made the assumption that STOP is SIGINT (2). Both Stop (SIGINT) and Kill (SIGKILL) signals are sent to the workload, then the workload is cleaned up.
Note: It seems @Kuruyia has managed to prepare a rootfs and run a container, you can find it [here](https://github.com/Martin-Moreira-de-jesus/orka/blob/tmp/containerd-create-workload/node-agent/src/workload_manager/container/client.rs).
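As an illustration of the two pieces described above, the sketch below builds the `ctr run -d <image> <container-id>` command and maps the two signals to Linux signal numbers. `WorkloadSignal`, `signal_number`, `teardown_signals` and `ctr_run_command` are hypothetical stand-ins, not the generated gRPC types or the code in this PR, and sending Stop before Kill is an assumption about the order.

```rust
use std::process::Command;

/// Local stand-in for the Workload Signal enum (hypothetical,
/// not the generated gRPC type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum WorkloadSignal {
    Stop,
    Kill,
}

/// Map a workload signal to a Linux signal number.
pub fn signal_number(signal: WorkloadSignal) -> u32 {
    match signal {
        WorkloadSignal::Stop => 2, // SIGINT, the assumption made in this PR
        WorkloadSignal::Kill => 9, // SIGKILL
    }
}

/// Signals sent when a workload is torn down (order is an assumption).
pub fn teardown_signals() -> [WorkloadSignal; 2] {
    [WorkloadSignal::Stop, WorkloadSignal::Kill]
}

/// Build, but do not spawn, `ctr run -d <image> <container-id>`.
pub fn ctr_run_command(image: &str, container_id: &str) -> Command {
    let mut cmd = Command::new("ctr");
    cmd.args(["run", "-d", image, container_id]);
    cmd
}
```

Keeping the signal mapping in one function means that changing the meaning of STOP later (for example to SIGTERM) would be a one-line edit.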
But to be honest, going with `runc` or a container runtime made in Rust might be an easier way to achieve our goals.
gRPC Lifecycle Service Client and Status Update Service Client Integration
When the program starts, it connects to the scheduler and registers itself with it. If it fails, it will retry every 5 seconds. If it fails 3 times (can be modified in args or env, see `node-agent/src/args.rs`), it will exit. It also starts the gRPC server.
If either of the two processes fails (gRPC or lifecycle), the server will shut down.
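The retry behaviour described above can be sketched as a bounded retry loop. This is a synchronous, standard-library-only illustration; `register_with_retry` and its parameters are hypothetical, and the actual node agent may well be asynchronous and structured differently.

```rust
use std::thread;
use std::time::Duration;

/// Hypothetical helper: call `register` until it succeeds or until
/// `max_attempts` attempts have failed, sleeping `delay` between attempts.
/// At least one attempt is always made.
pub fn register_with_retry<T, E>(
    max_attempts: u32,
    delay: Duration,
    mut register: impl FnMut() -> Result<T, E>,
) -> Result<T, E> {
    let mut attempt = 1;
    loop {
        match register() {
            Ok(value) => return Ok(value),
            // Out of attempts: hand the last error back to the caller,
            // which can then exit the program.
            Err(err) if attempt >= max_attempts => return Err(err),
            Err(_) => {
                attempt += 1;
                thread::sleep(delay);
            }
        }
    }
}
```

With the values from the description, a caller would pass `3` and `Duration::from_secs(5)` and exit when the function returns `Err`.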
Note: We would be glad to know how you think we managed the application's lifecycle. It was a first for us, and it feels clunky.
Cargo Workspaces
We added a `Cargo.toml` file at the root of the project to manage the dependencies of the different crates. Building, formatting and linting will now be easier.
However, we had an issue when building the node-agent. When you build the binary from the root of the project, a request to the `containerd` socket fails, saying it didn't find it. We did not manage to fix this issue; it only works if the binary is built from the node-agent directory. We are not sure why this is happening.
Misc
Add ide directory to gitignore.
What's next
Use the network team's CNI implementation.
@sameo we split the commits for convenience. Note that two thirds of the lines we added are in a lock file.
@noetarbouriech