Dask Cluster stuck in the pending status and shutdown itself with Dask Gateway over the Slurm HPC Cluster #478
Comments
Can you get the log from the failed process? As far as I can tell, the printout only says that it terminated with a non-zero code.
Hi Martin, do you mean the slurmctld or slurmd log? Where exactly can I view the job logs?
I'm afraid I don't know where such a log would appear; perhaps your sysadmin would know.
When I view the logs on the worker node, I notice some errors. The related logs are below.
What happened: When I try to create a cluster via Dask Gateway, I get an error like the one below. When the cluster is created successfully, I think it gets stuck in the pending status and then shuts itself down automatically. When I use Slurm commands such as sbatch directly, I can verify that the job runs successfully on the Slurm cluster, but when I try to submit a job via Dask Gateway, it closes itself automatically after a few seconds.
dask_gateway_config.py
scontrol show job output
Environment: