With an image-based deploy the current workflow for adding a node looks like:
1. Boot a new compute node. It attempts to join the cluster, slurmctld reports that it has no NodeName entry for the node, and slurmd dies.
2. Run the role on the ENTIRE cluster, so that:
   - a new slurm.conf is generated that includes the new node
   - slurmctld and ALL slurmd daemons are restarted (including the new, failed one) in the correct order
Item 2 is really noisy, as every compute node runs all of the Ansible. It would be better if we could run only the steps that these cases actually need.
I think the cases covered are:
- Adding nodes with an appropriate image
- Deleting nodes
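
To make the two cases concrete, here is a minimal sketch of how the node changes could be worked out before deciding which steps to run. It is only an illustration: `plan_node_changes` and the node names are hypothetical and not part of the role. It assumes the currently configured node names and the desired node names are both available as plain lists.

```python
def plan_node_changes(configured, desired):
    """Return (to_add, to_remove) as sorted lists of node names.

    configured: node names currently present in slurm.conf (assumed input)
    desired:    node names that should be present after the change
    """
    configured, desired = set(configured), set(desired)
    to_add = sorted(desired - configured)      # the "adding nodes" case
    to_remove = sorted(configured - desired)   # the "deleting nodes" case
    return to_add, to_remove


# Hypothetical node names, for illustration only.
to_add, to_remove = plan_node_changes(
    configured=["compute-0", "compute-1"],
    desired=["compute-1", "compute-2"],
)
print("add:", to_add, "remove:", to_remove)
```

If both lists come back empty, nothing needs regenerating or restarting at all.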
We could probably do this using just the `configure` tag, but that needs testing and documenting.
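
As a starting point for that testing, a tag-restricted run could be assembled like this. This is a sketch under assumptions: the playbook name `site.yml` and the helper `configure_command` are hypothetical, and whether the `configure` tag alone covers both cases is exactly what remains unverified. `--tags` is the standard `ansible-playbook` option for running only tagged tasks.

```python
import shlex


def configure_command(playbook, tag="configure"):
    """Build an ansible-playbook argv that runs only tasks carrying `tag`."""
    return ["ansible-playbook", playbook, "--tags", tag]


# "site.yml" is a placeholder; substitute the playbook that applies the role.
cmd = configure_command("site.yml")
print(shlex.join(cmd))
```

The argv list can be passed to `subprocess.run(cmd, check=True)` once the tag-only behaviour has been confirmed on a test cluster.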