Experience trying AuthType auth/slurm over auth/munge #11
6 comments · 7 replies
-
All about this! I'll give it a go and report back!
-
Hmm... I'm interested to know why MUNGE is still the recommended default authentication plugin, even after the introduction of `auth/slurm`. In regards to the info that needs to be exchanged between the login node (`sackd`) and the controller (`slurmctld`), is it just the hostname:port and the key?

I could see us wanting to use `sackd` for our login nodes. The required changes to `slurm.conf` seem minimal.
-
My guess is MUNGE is more heavily tested just by virtue of having been around longer, so it would remain the recommended default until `auth/slurm` has seen similar mileage. Can confirm only the `slurmctld` hostname:port and the `slurm.key` are needed. One comment: it was necessary to remove the `munge.key` to be certain the new auth method was actually in use.
-
Edit: nope! It's the
-
Through our game of telephone with SchedMD at SC24 (thanks Quinn), it looks like MUNGE running alongside `auth/slurm` is largely untested territory. Not necessarily reassuring to hear that, but it seems that running pure `auth/slurm` works fine. As for why MUNGE is still the recommended default, it's because SchedMD wants to provide a transition period for migrating from `auth/munge` to `auth/slurm`.

Personally, I'm leaning towards option 2 as I'd rather help dogfood the new authentication mechanism, and we have to do less work later to support a migration path when MUNGE is deprecated for realz. Also, it's one less service that we have to manage on all of the Slurm nodes or applications that need to be able to talk to Slurm, such as Open OnDemand. Thoughts @jamesbeedy @dsloanm?
-
Okay, judging from our discussion here, it looks like our best course of action is to use `auth/slurm`.

Thanks for the great discussion everyone!
-
Of interest, Slurm 23.11 contains a new, built-in plugin for creating and validating credentials, i.e. `AuthType = auth/slurm`. This is positioned as an alternative to `auth/munge`, although "MUNGE is currently the default and recommended option" by the Slurm developers.

Part of this new authentication mechanism is the daemon `sackd`, intended for use on login nodes not running a full `slurmd`, to enable both authentication via `auth/slurm` and retrieval of Slurm configuration files from `slurmctld` when running configless (as Charmed-HPC does).

As Slurm 23.11 is available in the Ubuntu 24.04 repos, including `sackd`, I set up a number of 24.04 containers in LXD with typical Slurm+MUNGE, switched over to `auth/slurm`, then created a new `sackd`-enabled login node. No real issues were found -- I was impressed at how smoothly the switchover went, at least in this toy example.

I've shared my experience/setup steps in the collapsible blocks below. MUNGE is still a dependency of the `slurm-client`, `slurmctld`, `slurmd`, etc. deb packages, but I can see scope for that to change in future.

**Cluster Setup**
LXD containers:

- `head-0` (10.212.98.197) running `slurmctld`
- `compute-0` (10.212.98.102), `compute-1` (10.212.98.65), `compute-2` (10.212.98.189): compute nodes running `slurmd`
- `ldap-0` (10.212.98.168) providing a directory service for consistent user identities across the cluster
- `nfs-0` (10.212.98.109) providing a shared NFS file system

The full `/etc/slurm/slurm.conf` on `head-0` is below:

**Full slurm.conf**
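The full file did not survive extraction, but the lines most relevant to this discussion presumably look something like the sketch below. This is an assumption, not the thread's actual config: `SlurmctldParameters=enable_configless` is how the Slurm docs enable configless mode, and the MUNGE lines are the pre-switch defaults.

```ini
# Excerpt sketch only -- not the thread's actual slurm.conf
SlurmctldHost=head-0(10.212.98.197)
SlurmctldParameters=enable_configless
AuthType=auth/munge
CredType=cred/munge
```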
**Switching to auth/slurm**

1. On the host, generate `slurm.key`.
2. Push the file to all nodes running `slurmctld` and `slurmd`, and set ownership/permissions.
3. Log into the `slurmctld` head node.
4. Modify `slurm.conf`, replacing the `auth/munge` settings with their `auth/slurm` equivalents, then restart `slurmctld`.
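Concretely, steps 1-4 might look something like this from the LXD host. A sketch, not the thread's exact commands: the 1024-byte key follows the Slurm `auth/slurm` documentation, the `lxc` loop assumes the containers from the cluster setup above, and the `CredType` line is my assumption (the thread only names `AuthType`).

```shell
# 1. Generate a random 1024-byte slurm.key (size per the Slurm auth/slurm docs)
dd if=/dev/random of=slurm.key bs=1024 count=1 iflag=fullblock
chmod 600 slurm.key

# 2. Push it to every node running slurmctld or slurmd (requires the LXD cluster):
#      for node in head-0 compute-0 compute-1 compute-2; do
#        lxc file push slurm.key "$node/etc/slurm/slurm.key"
#        lxc exec "$node" -- chown slurm:slurm /etc/slurm/slurm.key
#      done

# 3-4. On head-0, flip slurm.conf over to auth/slurm and restart the controller
#      (CredType is my assumption; the thread only mentions AuthType):
#        sed -i 's|^AuthType=.*|AuthType=auth/slurm|' /etc/slurm/slurm.conf
#        sed -i 's|^CredType=.*|CredType=cred/slurm|' /etc/slurm/slurm.conf
#        systemctl restart slurmctld
```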
5. Log out from `head-0` to return to the host, then restart `slurmd` on all compute nodes.
6. Stop `munge.service` and move `munge.key` aside, to be certain the new auth method is being used.
7. Log into `head-0` as a normal user and submit a test job.
8. Check the output.

Example output:

A job was successfully submitted and run. Success!
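The thread elides the job script and its output, but a minimal `test.submit` (that filename reappears in the login-node section below) could be as simple as this sketch; the job name and output file are illustrative only.

```shell
# Minimal batch script sketch; job name and output path are illustrative only.
cat > test.submit <<'EOF'
#!/usr/bin/env bash
#SBATCH --job-name=auth-test
#SBATCH --output=auth-test.out
hostname
EOF

# Then, on head-0 as a normal user (requires the running cluster):
#   sbatch test.submit
#   cat auth-test.out
```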
**Creating a login node**

1. Copy an existing compute node to a new container `login-0`.
2. Log in and replace `slurmd` with `sackd`.
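A dry-run sketch of those two steps: each command is echoed rather than executed, since they assume the LXD host above. The `slurmd`/`sackd` deb package names are the Ubuntu 24.04 ones mentioned in the post.

```shell
# Dry run: remove the leading 'echo' to execute for real on the LXD host.
echo lxc copy compute-0 login-0
echo lxc start login-0
echo lxc exec login-0 -- apt-get --yes remove slurmd
echo lxc exec login-0 -- apt-get --yes install sackd
```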
3. Create a `/etc/default/sackd` file pointing to the `slurmctld` host `head-0` (10.212.98.197).

   Note: this file does not exist by default but is referenced by the packaged `sackd.service` file. Attempting to start `sackd` without creating `/etc/default/sackd` fails with `sackd.service: Referenced but unset environment variable evaluates to an empty string: SACKD_OPTIONS`.

4. Enable and start `sackd`: `systemctl enable --now sackd`
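For reference, the `/etc/default/sackd` from the earlier step might hold a single line like the one below. The `--conf-server` option comes from the `sackd` man page; port 6817 is Slurm's default `slurmctld` port and is my assumption here, as the post only gives the host.

```shell
# /etc/default/sackd -- consumed as SACKD_OPTIONS by the packaged sackd.service
SACKD_OPTIONS="--conf-server=10.212.98.197:6817"
```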
5. Check `sackd` has successfully pulled `slurm.conf` from `head-0`.

   Example output:

6. Confirm Slurm commands work.

   Example output:

7. Submit a test job as a normal user (`test.submit` created earlier on `head-0` is available on `login-0` through the shared NFS file system) and check the output.

   Example output:

Success again!