
Need to increase shm size for ilab launcher script #721

Closed
relyt0925 opened this issue Aug 4, 2024 · 0 comments · Fixed by #722

Comments

@relyt0925
Contributor

Currently, the ilab wrapper script at https://github.com/containers/ai-lab-recipes/blob/main/training/ilab-wrapper/ilab does not set the container's shared memory (shm) size to 10 GB. The requirements for running DeepSpeed and vLLM note that a 10 GB shm size is necessary for full-scale model inference and training.

A related issue: vllm-project/vllm#1710. Note also that the earlier scripts that launched vLLM directly did use a shm size of 10 GB.
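A minimal sketch of the requested change, assuming the wrapper launches the container via `podman run` (the image name and surrounding flags here are placeholders, not the upstream script's exact invocation). Passing `--shm-size 10G` raises `/dev/shm` from the small container default, which vLLM and DeepSpeed rely on for shared-memory transport between worker processes:

```shell
#!/bin/bash
# Hypothetical excerpt of an ilab-style wrapper script.
# The key addition is --shm-size 10G; the env passthroughs mirror
# the variables mentioned in the fix commit for this issue.
PODMAN_ARGS=(
  run --rm -it
  --shm-size 10G               # the fix: 10 GB shared memory for vllm/deepspeed
  -e ILAB_GLOBAL_CONFIG        # pass through env vars from the host
  -e VLLM_LOGGING_LEVEL
  -e NCCL_DEBUG
  quay.io/example/instructlab:latest   # placeholder image reference
  ilab "$@"
)
# Print the command instead of running it, for illustration.
echo "podman ${PODMAN_ARGS[*]}"
```

Without the flag, `/dev/shm` inside the container is typically only 64 MB, which is far too small for NCCL's shared-memory buffers during multi-GPU inference.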

relyt0925 added a commit to relyt0925/ai-lab-recipes that referenced this issue Aug 4, 2024
Include ILAB_GLOBAL_CONFIG, VLLM_LOGGING_LEVEL, and NCCL_DEBUG as environment variables when starting the ilab container. Also add shared memory size of 10G to enable vllm execution. Resolves: containers#721

Signed-off-by: Tyler Lisowski <[email protected]>
@rhatdan closed this as completed in ea64b86 on Aug 4, 2024
jhutar pushed a commit to jhutar/ai-lab-recipes that referenced this issue Aug 5, 2024
…r vllm to 10GB

Include ILAB_GLOBAL_CONFIG, VLLM_LOGGING_LEVEL, and NCCL_DEBUG as environment variables when starting the ilab container. Also add shared memory size of 10G to enable vllm execution. Resolves: containers#721

Signed-off-by: Tyler Lisowski <[email protected]>