
how farm queue priorities work #54

ctb opened this issue Mar 1, 2023 · 0 comments

HPC priority stuff:

Question:

I have a batch job submitted to bmm that farm has been refusing to start for roughly 12 hours now. As batch jobs go, its demands are not exorbitant (16 CPUs, 128 GB memory, 24 hours wall clock time). Is there some reason why farm has effectively gone on strike?
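(For concreteness, a submission along those lines looks something like the sketch below -- the job name and script body are placeholders, but the resource requests match what's described above:)

```bash
#!/bin/bash
#SBATCH --job-name=myjob        # placeholder name
#SBATCH --partition=bmm         # the partition the job was submitted to
#SBATCH --cpus-per-task=16      # 16 CPUs
#SBATCH --mem=128G              # 128 GB of memory
#SBATCH --time=24:00:00         # 24 hours of wall clock time

# ... actual work would go here ...
```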

Answer:

looks like somebody from $OTHER_GROUP is consuming basically all of bmm with 32-core / 8 GB, 20-30 day jobs 🙃
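(A quick way to check this yourself is squeue; the format string below is just one plausible set of columns, nothing farm-specific:)

```bash
# list running jobs on bmm: job id, user, account, CPUs, memory, time limit, time used
squeue -p bmm -t RUNNING -o "%.10i %.10u %.10a %.5C %.8m %.12l %.12M"
```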

Question:

but shouldn’t we have precedence over them for our buy-in nodes?

Answer:

yes, that's what bmh is for -- ctbrowngrp has a QoS that defines how many cores / how much RAM worth of buy-in you've paid for, and jobs on bmh can suspend jobs on bmm
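(If you want to see those numbers yourself, sacctmgr can dump the QoS definitions and your user's associations; the exact QoS names on farm will differ, so treat this as a sketch:)

```bash
# show QoS definitions, including group-wide resource caps (GrpTRES)
sacctmgr show qos format=Name,Priority,GrpTRES

# show which account / partition / QoS combinations your user can submit under
sacctmgr show assoc user=$USER format=Account,Partition,QOS
```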

looks like ctbrowngrp has high-priority access to:

- 48 CPUs and 500 GB RAM on bmh, i.e. the bmX nodes
- 224 CPUs and 500 GB RAM on high2
- and several others

which is to say: jobs on the medium partitions cannot suspend other medium-partition jobs; jobs on the high partitions can suspend their corresponding medium- or low-partition jobs
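(Suspended jobs show up in squeue with state S / SUSPENDED, so you can watch this preemption happen:)

```bash
# any bmm jobs that have been suspended by higher-priority (e.g. bmh) jobs
squeue -p bmm -t SUSPENDED
```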

suspension won't happen from bmm, though -- you'd need to submit on bmh. on bmm you just have bumped priority, and if another lab has the same priority on that partition it'll come down to slurm's magic weighting of job length / CPU cores / memory / throughput, etc.
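(So the practical fix for the job at the top of this issue is to resubmit it to the buy-in partition -- same resources, only the partition changes; `job.sh` is a placeholder:)

```bash
# resubmit on bmh so the buy-in QoS applies and the job can suspend bmm jobs
sbatch --partition=bmh --cpus-per-task=16 --mem=128G --time=24:00:00 job.sh
```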
