feat(slurm_ops)!: implement SackdManager for sackd service #55

Draft
wants to merge 4 commits into
base: main

Conversation

NucciTheBoss
Member

This PR is currently a draft because the apt charm library does not work on Noble-based machines. Related bug report: canonical/operator-libs-linux#135.

Will do the version bump after apt charm library issues are fixed and the PR has been approved.

This PR introduces the SackdManager class which can be used in a downstream sackd-operator to manage the sackd service on machines.

BREAKING CHANGES: This PR drops the PPA ubuntu-hpc/slurm-wlm-23.02 as it doesn't provide a sackd package, and the Slurm version that we can pull from Universe on Noble is newer than what is provided by the PPA. The ubuntu-hpc/slurm-wlm-23.02 PPA also doesn't provide any packages that work on Noble.

Dropping this PPA means that you will now get an older version of Slurm on Jammy machines, but that isn't a huge issue as we're planning to push the Slurm charms to Noble anyway.

Misc.

This PR also installs prometheus-slurm-exporter only on slurmctld machines, since the slurmctld operator is the only Slurm charm that currently integrates with COS via the cos-agent relation. This helps minimize our install size just a little bit.

…ines

Currently the Slurm prometheus exporter is only activated on slurmctld machines,
so it doesn't make sense to install the exporter on all nodes. If other nodes need the
Slurm exporter, then we can add it directly to the package list for that specific service type.

Signed-off-by: Jason C. Nucciarone <[email protected]>
BREAKING CHANGES: No longer uses the PPA `ubuntu-hpc/slurm-wlm-23.02` as it
 only supports Jammy, and it does not provide a `sackd` package. Instead,
 `_AptManager` now pulls the relevant Slurm packages from either
 `ubuntu-hpc/experimental` or `universe`.
 .
 Eventually we'll need to identify how we want to enable the latest and
 greatest Slurm version in deb format, but we'll boil that pot later.

Signed-off-by: Jason C. Nucciarone <[email protected]>
@NucciTheBoss NucciTheBoss added the enhancement New feature or request label Nov 20, 2024
@@ -908,6 +890,29 @@ def scontrol(*args) -> str:
    return _call("scontrol", *args).stdout


class SackdManager(_SlurmManagerBase):
    """Manager for the Sackd service."""

Does SACKD_CONFIG_SERVER need to be documented here, as SLURMD_CONFIG_SERVER is for SlurmdManager?

Member Author

It should be documented! I'm going to clean up the docstrings while the work on enabling the apt charm library to handle deb822-formatted sources is in progress.
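For readers unfamiliar with the deb822 source format mentioned here, the sketch below parses a single stanza into a field mapping. It is only an illustration, not the apt charm library's API: `parse_deb822_stanza` and the example stanza are hypothetical.

```python
def parse_deb822_stanza(text: str) -> dict[str, str]:
    """Parse the first deb822 stanza in `text` into a field -> value mapping."""
    fields: dict[str, str] = {}
    current = None
    for line in text.splitlines():
        if not line.strip():
            if fields:
                break  # a blank line ends the stanza
            continue  # skip leading blank lines
        if line.startswith("#"):
            continue  # comment line
        if line[0] in " \t":
            # Continuation line: append to the previous field's value.
            if current is not None:
                fields[current] += "\n" + line.strip()
            continue
        name, _, value = line.partition(":")
        current = name.strip()
        fields[current] = value.strip()
    return fields


# Hypothetical example stanza in the style of a `.sources` file.
EXAMPLE = """\
Types: deb
URIs: http://archive.ubuntu.com/ubuntu
Suites: noble noble-updates
Components: main universe
"""

source = parse_deb822_stanza(EXAMPLE)
```

Multi-valued fields such as `Suites` and `Components` are whitespace-separated, so `source["Suites"].split()` yields the individual suites.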

match self._service_name:
    case "sackd":
        packages.extend(["slurm-client"])

Also "libpmix-dev" and "openmpi-bin" to let users build MPI codes and to keep the environment consistent with the compute nodes?

Hey hey, libpmix-dev is a runtime dependency for slurmd and slurmctld. Openmpi-bin is added to compute nodes so they can run mpi workloads. I don't think either of these packages belong on the login node.

Hello! Definitely agreed MPI workloads shouldn't run on the login node. My thought process for having OpenMPI was to allow users to compile MPI binaries for use on the compute nodes since that's commonly done on login nodes (though it'd need "libopenmpi-dev" since "openmpi-bin" provides only mpicc, etc. compiler wrappers and not mpi.h).

Maybe worth revisiting in a future PR. Happy to keep this one focused on SackdManager and stick with "slurm-client".

Member Author

For now we should have slurm-client be the only additional package on the login node.

When we put together some howtos for common HPC use cases such as how to submit jobs, that would be a great time to identify other packages we should be shipping on the nodes. Our software stack story is also still loosely defined, so it might not even be the sackd charm itself providing the packages folks will load and use.

3 participants