
GPU provisioning on AWS - specifically 48G NVIDIA L40S, 80G H100 - compared to 48G RTX-A6000 #14

Open
obriensystems opened this issue Mar 9, 2024 · 1 comment
obriensystems commented Mar 9, 2024

Use your AWS Credits from the ARRC program

GPU provisioning on AWS - specifically 48G NVIDIA L40S, 80G H100, 141G H200 - compared to 48G RTX-A6000

https://docs.aws.amazon.com/dlami/latest/devguide/gpu.html

P5 - 8 x H100

https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/p5-instances-started.html
Using the AWS Deep Learning Base GPU AMI (Ubuntu 20.04):
https://aws.amazon.com/releasenotes/aws-deep-learning-base-gpu-ami-ubuntu-20-04/

https://shop.lambdalabs.com/deep-learning/servers/blade/customize?_gl=1*aktggw*_ga*MTQzODY0NTQ2OC4xNzEwMDI3MTcz*_ga_43EZT1FM6Q*MTcxMDAyNzE3Mi4xLjAuMTcxMDAyNzE3Mi42MC4wLjA.

p5.48xlarge at ~$98 USD/hr on-demand:

Instance launch failed
We currently do not have sufficient p5.48xlarge capacity in zones with support for 'gp3' volumes. Our system will be working on provisioning additional capacity.
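At the ~$98 USD/hr on-demand rate quoted above, a quick back-of-the-envelope sketch of what a p5.48xlarge run would cost (rates vary by region and change over time; this is illustrative only):

```python
# Rough cost estimate at the p5.48xlarge on-demand rate quoted above.
# The $98/hr figure is from the issue text; treat it as illustrative.
HOURLY_RATE_USD = 98.0

def run_cost(hours: float, rate: float = HOURLY_RATE_USD) -> float:
    """Total on-demand cost in USD for a run of the given length."""
    return round(hours * rate, 2)

print(run_cost(1))        # one hour
print(run_cost(24))       # one day
print(run_cost(24 * 30))  # one month, if the instance is left running
```

Even a few days of testing at this rate burns through credits quickly, which is part of why comparing against the smaller G6/G6e instances below matters.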

P5e - 8 x H200 with 141G

G6 - L4, or G6e - L40S

https://aws.amazon.com/blogs/machine-learning/introducing-three-new-nvidia-gpu-based-amazon-ec2-instances/
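As a sanity check when sizing these instance families against the on-prem 48G RTX-A6000, a small lookup of the GPU configurations discussed above. Figures are from public AWS/NVIDIA specs as I recall them, not from the issue itself; verify against the AWS instance-type pages before provisioning:

```python
# Hypothetical quick-reference table of the AWS GPU instance types under
# discussion. GPU counts and per-GPU memory are assumptions drawn from
# public specs; double-check before relying on them.
INSTANCE_GPUS = {
    "p5.48xlarge":  {"gpu": "H100", "count": 8, "vram_gb": 80},
    "p5e.48xlarge": {"gpu": "H200", "count": 8, "vram_gb": 141},
    "p4d.24xlarge": {"gpu": "A100", "count": 8, "vram_gb": 40},
    "g6e.48xlarge": {"gpu": "L40S", "count": 8, "vram_gb": 48},
    "g6.xlarge":    {"gpu": "L4",   "count": 1, "vram_gb": 24},
}

def total_vram_gb(instance_type: str) -> int:
    """Total GPU memory across all GPUs on one instance."""
    spec = INSTANCE_GPUS[instance_type]
    return spec["count"] * spec["vram_gb"]

for itype in INSTANCE_GPUS:
    print(f"{itype}: {total_vram_gb(itype)} GB total GPU memory")
```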

Quota increases

Evaluating the L40S, H100, and H200 for model sizes from 40-80G.
I am migrating from an on-prem RTX-A6000 with 48G VRAM and 2 x RTX-A4500 with 40G VRAM.
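For mapping those 40-80G model sizes onto GPU memory, one common rule of thumb (my heuristic, not from this issue) is parameters x bytes-per-parameter plus some headroom for activations and KV cache:

```python
# Rough VRAM sizing heuristic for inference: parameter count times bytes
# per parameter, times an overhead factor for activations/KV cache.
# This is a common rule of thumb, not an exact requirement.
def inference_vram_gb(params_billions: float, bytes_per_param: int = 2,
                      overhead: float = 1.2) -> float:
    """Approximate GPU memory (GB) to serve a model at the given precision."""
    return round(params_billions * bytes_per_param * overhead, 1)

# A ~20B-parameter fp16 model vs. the GPUs under evaluation:
need = inference_vram_gb(20)
for name, vram in [("RTX-A6000", 48), ("L40S", 48), ("H100", 80), ("H200", 141)]:
    print(f"{name} ({vram} GB): {'fits' if vram >= need else 'too small'}")
```

By this heuristic a 48G card is already at the edge for a ~20B fp16 model, which is the motivation for looking at the 80G H100 and 141G H200.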

https://us-east-1.console.aws.amazon.com/servicequotas/home/services/ec2/quotas/L-417A185B

Requested quota increase from 0 to 96 for: Running On-Demand P instances
Requested quota increase from 0 to 96 for: Running On-Demand DL instances

https://support.console.aws.amazon.com/support/home#/case/create?issueType=service-limit-increase&limitType=service-code-ec2-instances&serviceLimitIncreaseType=ec2-instances&type=service_limit_increase

Tested on another account where I already have quotas of 692 for DL and 96 for P instances:
same result, no capacity.

We currently do not have sufficient p4d.24xlarge capacity in zones with support for 'gp3' volumes. Our system will be working on provisioning additional capacity.
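Since these capacity errors are transient, one workaround is to retry the launch with backoff. A minimal sketch, with the actual launch call stubbed out (a real version would call something like boto3's `ec2.run_instances` and catch the `InsufficientInstanceCapacity` error code - that wiring is assumed, not shown):

```python
import time

# Sketch of a capacity-retry loop for errors like the one above.
# launch() is a stub standing in for a real EC2 launch call (e.g. boto3
# run_instances); InsufficientCapacity stands in for the AWS error.
class InsufficientCapacity(Exception):
    pass

def launch_with_retry(launch, attempts: int = 5, base_delay: float = 1.0,
                      sleep=time.sleep):
    """Retry launch() with exponential backoff on capacity errors."""
    for attempt in range(attempts):
        try:
            return launch()
        except InsufficientCapacity:
            if attempt == attempts - 1:
                raise
            sleep(base_delay * 2 ** attempt)

# Demo with a stub that fails twice, then succeeds (no real AWS calls):
calls = {"n": 0}
def fake_launch():
    calls["n"] += 1
    if calls["n"] < 3:
        raise InsufficientCapacity("no p5.48xlarge capacity")
    return "i-0123456789abcdef0"  # placeholder instance id

print(launch_with_retry(fake_launch, sleep=lambda s: None))
```

In practice, also trying other availability zones (or an EBS volume type other than gp3, given the error wording) may succeed sooner than retrying one zone.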
obriensystems (Member, Author) commented with an attached image.
