Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Add slurm prometheus exporter to snapcraft.yaml recipe #9

Merged
merged 3 commits into from
Jun 4, 2024

Conversation

NucciTheBoss
Copy link
Member

This pull request adds the Slurm prometheus exporter into the snap image for Slurm. The part clones the exporter from GitHub and builds using swig to bind to slurm/slurm.h. I also added a simple daemon for the exporter that can be started once all the other Slurm services have been started on the cluster.

What still needs to be done

I need to add integration tests for the snap package in general. Testing can be done manually using a local LXD installation, but it would be great to have something that just sets up a minicluster and ensures that all the snap services + prometheus exporter are working. I will add this in a subsequent pull request; I've done something similar before here: https://nuccitheboss.github.io/cleantest/tutorials/using-a-mini-hpc-cluster/

Misc

  1. Redefined SLURM_CONF as global environment variable rather than having it scoped individually to each command. Functionally it's similar to how it was defined before, but now if we need to update the location of the slurm.conf configuration file, we only have to change on value rather than dozens!
  2. Bumped the max cyclomatic complexity to 15 from 10. This is the value that I usually set. This should fix our CI errors. Tbh I'm not totally in love with the current hooks implementation - it works - but I would have something that's much more straightforward like what's available with the ondemand snap.

`CPATH` must be set in build environment as the exporter
uses `swig` to bind the Golang code to slurm/slurm.h

Signed-off-by: Jason C. Nucciarone <[email protected]>
15 is the standard number used by HPC Engineering

Signed-off-by: Jason C. Nucciarone <[email protected]>
Better than dozens of individual declarations for each command.
If change needs to be made to the `slurm.conf` path, only one env
var needs to be updated instead of many.

Signed-off-by: Jason C. Nucciarone <[email protected]>
@NucciTheBoss NucciTheBoss added the enhancement New feature or request label Jun 4, 2024
@NucciTheBoss NucciTheBoss self-assigned this Jun 4, 2024
Copy link
Contributor

@jedel1043 jedel1043 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good! Really liked the simplification of all the SLURM_CONFs

@NucciTheBoss NucciTheBoss merged commit fe4d587 into main Jun 4, 2024
3 checks passed
@NucciTheBoss NucciTheBoss deleted the add-exporter branch June 11, 2024 11:52
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants