
[Bug] All Anvil models needs updating #118

Open
fbaig opened this issue May 15, 2024 · 5 comments
Labels
bug Something isn't working

Comments

@fbaig
Contributor

fbaig commented May 15, 2024

Due to a backend change on Anvil, all Anvil models that specify the mem_per_cpu option will fail with the following (or a similar) error:

srun: fatal: cpus_per_task set by two different environment variables SLURM_CPUS_PER_TASK=2 != SLURM_TRES_PER_TASK=cpu:1

According to the updated Anvil configuration, memory is assigned automatically at 2 GB per core, so the mem_per_cpu option is redundant and causes problems with job submission. See https://github.com/I-GUIDE/container_images/issues/11 for further details.

Possible Resolution

  • Identify all CyberGIS-Compute models that have the option to run on Anvil as their HPC
  • Remove the mem_per_cpu option from these models' manifest.json
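A minimal sketch of the second step, assuming (hypothetically) that each manifest.json lists its HPCs under a `supported_hpc` array and its SLURM options under a `slurm_input_rules` object that may contain `mem_per_cpu`; neither field name is confirmed here. `anvil_community` is the Anvil identifier mentioned later in this thread. The `anvil_only` flag lets the cleanup skip models that also run on other HPCs.

```python
import json

# Hypothetical manifest fields (not confirmed by this issue):
#   "supported_hpc"     -> list of HPC identifiers
#   "slurm_input_rules" -> dict of SLURM options, possibly with "mem_per_cpu"
ANVIL = "anvil_community"


def strip_mem_per_cpu(manifest: dict, anvil_only: bool = False) -> bool:
    """Delete mem_per_cpu from an Anvil model's manifest in place.

    Returns True if the manifest was changed. With anvil_only=True,
    models that also list other HPCs are left untouched.
    """
    hpcs = manifest.get("supported_hpc", [])
    if ANVIL not in hpcs:
        return False
    if anvil_only and set(hpcs) != {ANVIL}:
        return False
    rules = manifest.get("slurm_input_rules", {})
    if "mem_per_cpu" not in rules:
        return False
    del rules["mem_per_cpu"]
    return True


def update_manifest_file(path: str, anvil_only: bool = False) -> bool:
    """Load manifest.json, strip mem_per_cpu if applicable, write it back."""
    with open(path, encoding="utf-8") as f:
        manifest = json.load(f)
    changed = strip_mem_per_cpu(manifest, anvil_only)
    if changed:
        with open(path, "w", encoding="utf-8") as f:
            json.dump(manifest, f, indent=2)
            f.write("\n")
    return changed
```

Rewriting the file with `json.dump` normalizes indentation, so the resulting commit may include whitespace-only changes alongside the removed option.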
@fbaig fbaig added the bug Something isn't working label May 15, 2024
@alexandermichels
Member

Added an announcement to the UI to notify users:

@fbaig
Contributor Author

fbaig commented May 15, 2024

Great, thanks. Is there a way to identify which models are configured to use Anvil on our end?

@alexandermichels
Member

There isn't a good way, no. I guess the best approach would be to go through this page (https://cgjobsup.cigi.illinois.edu/v2/git), use Ctrl+F to search for "anvil_community", and then create an issue on each repo?
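The Ctrl+F approach could be scripted roughly as below. This is a sketch that assumes the /v2/git page returns JSON and that model entries sit under a top-level `git` key (falling back to the whole object otherwise); both are assumptions, not documented behaviour. Like a text search, it matches `anvil_community` anywhere in an entry, so it does not depend on a particular field name.

```python
import json
import urllib.request

LISTING_URL = "https://cgjobsup.cigi.illinois.edu/v2/git"


def fetch_listing(url: str = LISTING_URL) -> dict:
    """Download and parse the model listing (assumed to be JSON)."""
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)


def find_anvil_models(listing: dict, needle: str = "anvil_community") -> list[str]:
    """Return the keys of entries whose JSON text contains `needle`."""
    # Assumption: models live under a top-level "git" key; otherwise
    # treat the whole object as the mapping of model id -> entry.
    entries = listing.get("git", listing)
    return sorted(k for k, v in entries.items() if needle in json.dumps(v))
```

Usage would be `find_anvil_models(fetch_listing())`, which yields the model ids to open issues against.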

@fbaig
Contributor Author

fbaig commented May 15, 2024

The proposed solution above will only work for models that use Anvil exclusively. If a model supports more than one HPC, removing the mem_per_cpu option altogether may result in unexpected behavior on the non-Anvil HPCs.

Is there a way to provide conditional configurations in manifest.json? If not, I think it may be easier to update cybergis-compute-core to ignore this parameter when submitting jobs to Anvil.
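For illustration only, conditional configuration could take the form of a per-HPC override block in manifest.json. The `hpc_overrides` and `remove` fields below are hypothetical and not part of the existing manifest format, and `slurm_input_rules` is likewise an assumed field name; the sketch only shows how such a block would resolve to different SLURM options per HPC.

```python
import copy


def effective_slurm_rules(manifest: dict, hpc: str) -> dict:
    """Return the SLURM rules to use on `hpc`, applying per-HPC removals.

    "hpc_overrides" and "remove" are hypothetical manifest fields.
    The manifest itself is not modified.
    """
    rules = copy.deepcopy(manifest.get("slurm_input_rules", {}))
    override = manifest.get("hpc_overrides", {}).get(hpc, {})
    for key in override.get("remove", []):
        rules.pop(key, None)  # ignore keys that are not present
    return rules


example = {
    "slurm_input_rules": {"mem_per_cpu": {"max": 4}, "time": {"max": 60}},
    "hpc_overrides": {"anvil_community": {"remove": ["mem_per_cpu"]}},
}
```

With `example`, Anvil would get only the `time` rule, while any other HPC would still see `mem_per_cpu`.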

@alexandermichels
Member

I don't think it's a big deal. Globus on Keeling is broken, I don't think we currently have credits on Bridges or Expanse, and ACES requires per-user approval, so Anvil is the main HPC currently in use.

We don't currently have a way to remove configs on a per-HPC basis, only to add them. If Anvil can't fix this issue and modifying the manifests won't work, I can hack together a patch tomorrow and deploy it to production, but a longer-term solution might take a while because the code isn't set up for it.
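A rough sketch of what a core-side patch might do: skip `mem_per_cpu` when rendering the batch script for Anvil. This Python version only illustrates the logic and is not the actual cybergis-compute-core code; the job-config keys and the `anvil_community` identifier are assumptions, while `--time`, `--ntasks`, `--cpus-per-task`, and `--mem-per-cpu` are standard `sbatch` options.

```python
# HPCs that assign memory automatically (2 GB per core on Anvil, per
# this issue). The identifier is an assumption.
AUTO_MEMORY_HPCS = {"anvil_community"}

# Hypothetical job-config keys mapped to real sbatch long options.
SBATCH_FLAGS = {
    "time": "--time",
    "num_of_task": "--ntasks",
    "cpu_per_task": "--cpus-per-task",
    "mem_per_cpu": "--mem-per-cpu",
}


def sbatch_directives(slurm: dict, hpc: str) -> list[str]:
    """Render #SBATCH lines, dropping mem_per_cpu on auto-memory HPCs."""
    lines = []
    for key, value in slurm.items():
        if key == "mem_per_cpu" and hpc in AUTO_MEMORY_HPCS:
            continue  # let the HPC assign memory itself
        flag = SBATCH_FLAGS.get(key)
        if flag is not None:  # unknown keys are silently skipped
            lines.append(f"#SBATCH {flag}={value}")
    return lines
```

Filtering at render time leaves every manifest unchanged, so non-Anvil HPCs keep receiving `--mem-per-cpu` exactly as before.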
