Node Agent Integration #29

Open
wants to merge 8 commits into
base: main


mmoreiradj

Node Agent Integration

gRPC Workload Service Server Integration

Add a wrapper for containerd-client lib (and ctr cli for creation) to create a container and run a workload.

The wrapper uses gRPC to communicate with the containerd socket. It reconnects to the socket before each request. This avoids potential issues with resource management, isolation, and security, and keeps the code simple. It is also more fault-tolerant, since a broken connection cannot affect later requests.

This does, however, mean more resource usage, overhead, and latency, and possibly socket exhaustion. We are willing to accept this tradeoff for now, since these issues are unlikely to become a problem in the near future.
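To illustrate the reconnect-per-request idea, here is a minimal sketch. It is not the code from this PR: `PerRequestClient` and its `request` method are hypothetical names, and the connection type is left generic so that the pattern can be shown without depending on the containerd-client API.

```rust
use std::io;

/// Hypothetical wrapper: holds a way to connect, never a live connection.
pub struct PerRequestClient<F> {
    connect: F,
}

impl<F> PerRequestClient<F> {
    pub fn new(connect: F) -> Self {
        Self { connect }
    }

    /// Opens a fresh connection, runs `op` on it, then drops the connection.
    /// `C` stands in for whatever channel type talks to the containerd socket.
    pub fn request<C, T>(&self, op: impl FnOnce(&mut C) -> io::Result<T>) -> io::Result<T>
    where
        F: Fn() -> io::Result<C>,
    {
        // A failed connect only fails this request; nothing is cached.
        let mut conn = (self.connect)()?;
        op(&mut conn)
        // `conn` goes out of scope here, so the socket is closed after every call.
    }
}
```

The cost described above is visible in the sketch: every call to `request` pays for a new connection, in exchange for having no connection state to keep healthy between requests.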

The ctr CLI is used to create the container because we did not manage to create one with the containerd-client lib. It runs the command ctr run -d <image> <container-id>.
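As a rough sketch of how that command can be issued from Rust with the standard library: the function names `ctr_run_command` and `start_workload` below are hypothetical and not taken from this PR, and the image and container id are whatever the caller supplies.

```rust
use std::io;
use std::process::Command;

/// Builds (without running) `ctr run -d <image> <container-id>`.
pub fn ctr_run_command(image: &str, container_id: &str) -> Command {
    let mut cmd = Command::new("ctr");
    cmd.args(["run", "-d", image, container_id]);
    cmd
}

/// Runs the command and turns a non-zero exit status into an error.
/// Assumes `ctr` is on PATH and the caller may access the containerd socket.
pub fn start_workload(image: &str, container_id: &str) -> io::Result<()> {
    let status = ctr_run_command(image, container_id).status()?;
    if status.success() {
        Ok(())
    } else {
        Err(io::Error::new(
            io::ErrorKind::Other,
            format!("ctr exited with {}", status),
        ))
    }
}
```

Keeping the command construction separate from its execution makes the argument list easy to check without a running containerd.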

The containerd-client lib is also used to kill the container. We did, however, run into an ambiguous type definition: in the Workload Signal enum, it is not clear which signal STOP represents. For now, we assume that STOP is SIGINT (2). Both Stop (SIGINT) and Kill (SIGKILL) signals are sent to the workload, then the workload is cleaned up.
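The assumption about the signal mapping can be written down explicitly. The sketch below uses a hypothetical local `WorkloadSignal` enum standing in for the generated one, so the names and the helper functions are illustrative only; the signal numbers are the usual Linux values.

```rust
/// Hypothetical stand-in for the Workload Signal enum from the proto definition.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum WorkloadSignal {
    Stop,
    Kill,
}

impl WorkloadSignal {
    /// Maps the enum to a signal number.
    /// Assumption made in this PR: STOP means SIGINT (2), not SIGSTOP or SIGTERM.
    pub fn as_signal_number(self) -> u32 {
        match self {
            WorkloadSignal::Stop => 2, // SIGINT
            WorkloadSignal::Kill => 9, // SIGKILL
        }
    }
}

/// Order in which signals are sent before the workload is cleaned up.
pub fn termination_sequence() -> [WorkloadSignal; 2] {
    [WorkloadSignal::Stop, WorkloadSignal::Kill]
}
```

If STOP turns out to mean a different signal, only the `match` arm needs to change.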

Note: It seems @Kuruyia has managed to prepare a rootfs and run a container, you can find it here.

But to be honest, going with runc or a container runtime written in Rust might be an easier way to achieve our goals.

gRPC Lifecycle Service Client and Status Update Service Client Integration

When the program starts, it connects to the scheduler and registers itself with it. If the connection fails, it retries every 5 seconds. After 3 failed attempts (configurable through args or env, see node-agent/src/args.rs), it exits. It also starts the gRPC server.

If either of the two processes fails (the gRPC server or the lifecycle client), the server will shut down.
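The retry behaviour described above can be sketched as a small helper. This is a synchronous illustration, not the PR's implementation (which may well be async); `retry_with_delay` is a hypothetical name, and the attempt count and delay are parameters rather than the hard-coded 3 and 5 seconds.

```rust
use std::thread;
use std::time::Duration;

/// Calls `attempt` until it succeeds or `max_attempts` failures have occurred,
/// sleeping `delay` between attempts. Always tries at least once.
pub fn retry_with_delay<T, E>(
    max_attempts: u32,
    delay: Duration,
    mut attempt: impl FnMut() -> Result<T, E>,
) -> Result<T, E> {
    let mut tries = 0;
    loop {
        tries += 1;
        match attempt() {
            Ok(value) => return Ok(value),
            // Out of attempts: hand the last error back so the caller can exit.
            Err(err) if tries >= max_attempts => return Err(err),
            // Otherwise wait before trying again.
            Err(_) => thread::sleep(delay),
        }
    }
}
```

With the defaults from the description, a call would look like `retry_with_delay(3, Duration::from_secs(5), || register())`, where `register` is a placeholder for the call that registers the node with the scheduler.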
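One way to express "stop as soon as either side ends" is to let both tasks report to a shared channel and react to the first message. The sketch below uses standard-library threads purely for illustration; `run_until_first_exit` and the task labels are hypothetical, and an async runtime would offer an equivalent construct.

```rust
use std::sync::mpsc;
use std::thread;

/// Runs both tasks on their own threads and returns the label and result of
/// whichever finishes first, so the caller can shut everything down.
pub fn run_until_first_exit<A, B>(server: A, lifecycle: B) -> (&'static str, Result<(), String>)
where
    A: FnOnce() -> Result<(), String> + Send + 'static,
    B: FnOnce() -> Result<(), String> + Send + 'static,
{
    let (tx, rx) = mpsc::channel();
    let tx_lifecycle = tx.clone();

    thread::spawn(move || {
        // The receiver may already be gone if the other task finished first.
        let _ = tx.send(("grpc-server", server()));
    });
    thread::spawn(move || {
        let _ = tx_lifecycle.send(("lifecycle", lifecycle()));
    });

    // Blocks until the first task reports; the other thread is left detached
    // and ends when the process exits.
    rx.recv().expect("both tasks ended without reporting")
}
```

The caller decides what "shut down" means once the first result arrives, for example logging the error and exiting with a non-zero status.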

Note: We would be glad to know how you think we managed the application's lifecycle. It was a first for us, and it feels clunky.

Cargo Workspaces

We added a Cargo.toml file at the root of the project to manage the dependencies of the different crates. Building, formatting and linting will now be easier.

However, we had an issue when building the node-agent. When the binary is built from the root of the project, requests to the containerd socket fail because the socket is not found. We did not manage to fix this issue; it only works if the binary is built from the node-agent directory. We are not sure why this is happening.

Misc

Add ide directory to gitignore.

What's next

Use the network team's CNI implementation.

@sameo we split the commits for convenience. Note that two thirds of the lines we added are in a lock file.

@noetarbouriech

Martin Moreira de Jesus and others added 8 commits September 1, 2023 15:56
Add Cargo.toml file at the root of the project to manage the dependencies of the different crates. Building, formatting and linting will now be easier.

Signed-off-by: Martin Moreira de Jesus <[email protected]>
Signed-off-by: Noé Tarbouriech <[email protected]>
Co-Authored-By: Noé Tarbouriech <[email protected]>
Signed-off-by: Martin Moreira de Jesus <[email protected]>
Signed-off-by: Noé Tarbouriech <[email protected]>
Co-Authored-By: Noé Tarbouriech <[email protected]>
Add a wrapper for containerd-client lib (and ctr cli for creation) to create a container and run a workload.

This does, however, mean more resource usage, overhead, and latency, and possibly socket exhaustion. We are willing to accept this tradeoff for now, since these issues are unlikely to become a problem in the near future.

The ctr CLI is used to create the container because we did not manage to create one with the containerd-client lib. It runs the command ctr run -d <image> <container-id>.

**Note**: It seems Kuruyia has managed to prepare a rootfs and run a container, you can find it [here](https://github.com/Martin-Moreira-de-jesus/orka/blob/tmp/containerd-create-workload/node-agent/src/workload_manager/container/client.rs).

Also note that, for a reason we have not identified, the containerd socket cannot be found unless the node-agent binary is built from the node-agent folder.

Signed-off-by: Martin Moreira de Jesus <[email protected]>
Signed-off-by: Noé Tarbouriech <[email protected]>
Co-Authored-By: Noé Tarbouriech <[email protected]>
The `containerd-client` is used to kill the container. We did however have an issue concerning an ambiguous type definition. Indeed, the Workload Signal enum is ambiguous. It is not clear what signal STOP represents. For now we made the assumption that stop is SIGINT (2). Both Stop (SIGINT) and Kill (SIGKILL) signals are sent to the workload, then the workload is cleaned up.

Signed-off-by: Martin Moreira de Jesus <[email protected]>
Signed-off-by: Noé Tarbouriech <[email protected]>
Co-Authored-By: Noé Tarbouriech <[email protected]>
Signed-off-by: Martin Moreira de Jesus <[email protected]>
Signed-off-by: Noé Tarbouriech <[email protected]>
Co-Authored-By: Noé Tarbouriech <[email protected]>
Add method to connect to cluster and stream node status. Retries if the connection closes.

Signed-off-by: Martin Moreira de Jesus <[email protected]>
Signed-off-by: Noé Tarbouriech <[email protected]>
Co-Authored-By: Noé Tarbouriech <[email protected]>
Signed-off-by: Martin Moreira de Jesus <[email protected]>
Signed-off-by: Noé Tarbouriech <[email protected]>
Co-Authored-By: Noé Tarbouriech <[email protected]>
add cni and format code

Signed-off-by: Martin Moreira de Jesus <[email protected]>
Signed-off-by: Noé Tarbouriech <[email protected]>
Co-Authored-By: Noé Tarbouriech <[email protected]>
@mmoreiradj
Author

Rebased.
