Update Dockerfile to CUDA 12.6 #56

Merged (1 commit) on Dec 31, 2024

Conversation

bkoszarsky
Contributor

Speeds up training by about 30 ms per step on 1xH100, which should come to roughly 5 s in total on 8xH100. That should bring replication runs in line with @YouJiacheng's record.
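The per-step and total figures can be related with a rough back-of-the-envelope calculation. The sketch below assumes (this is not stated in the PR) that the per-step saving divides evenly across GPUs, so 30 ms per step on one GPU becomes 30/8 ms per step on eight. The function names are hypothetical, and the step count it derives is only what those two figures imply under that assumption, not a number taken from the training script.

```python
def per_step_saving_s(single_gpu_saving_ms: float, num_gpus: int) -> float:
    """Per-step saving in seconds, ASSUMING it divides evenly across GPUs."""
    return single_gpu_saving_ms / 1000.0 / num_gpus


def total_saving_s(single_gpu_saving_ms: float, num_gpus: int, num_steps: int) -> float:
    """Total saving in seconds over a run of num_steps steps."""
    return per_step_saving_s(single_gpu_saving_ms, num_gpus) * num_steps


def implied_steps(single_gpu_saving_ms: float, num_gpus: int, total_s: float) -> float:
    """Step count at which the quoted total saving would be reached."""
    return total_s / per_step_saving_s(single_gpu_saving_ms, num_gpus)


if __name__ == "__main__":
    # Figures quoted in the PR description: 30 ms/step on 1 GPU, ~5 s on 8 GPUs.
    print(f"saving per step on 8 GPUs: {per_step_saving_s(30, 8) * 1000:.2f} ms")
    print(f"implied step count for ~5 s: {implied_steps(30, 8, 5.0):.0f}")
```

If the real run length is known, `total_saving_s` gives the expected saving directly, which can then be compared against measured wall-clock times.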

@YouJiacheng
Contributor

YouJiacheng commented Dec 19, 2024

Uh interesting, I actually use v2.6.0.dev20241203+cu124
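When comparing runs across environments, it helps to confirm which CUDA build of PyTorch is actually installed. A minimal sketch, assuming the version string follows the `<version>+<build tag>` form quoted above; `split_build_tag` is a hypothetical helper, not part of PyTorch.

```python
from typing import Optional, Tuple


def split_build_tag(version: str) -> Tuple[str, Optional[str]]:
    """Split '2.6.0.dev20241203+cu124' into ('2.6.0.dev20241203', 'cu124')."""
    base, sep, local = version.partition("+")
    return base, (local if sep else None)


if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        print("torch is not installed in this environment")
    else:
        base, tag = split_build_tag(torch.__version__)
        # torch.version.cuda is the CUDA version the wheel was built against
        # (None for CPU-only builds).
        print(f"torch {base}, build tag {tag}, built for CUDA {torch.version.cuda}")
```

Running this in both the record and the replication environment shows quickly whether one is on `cu124` and the other on `cu126`.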


Just updated to cu126, let's see.
(By the way, installing with uv is a bit tricky at the moment; see astral-sh/uv#9651.)

[project]
name = "modded-nanogpt"
version = "0.1.0"
description = "Add your description here"
readme = "README.md"
requires-python = ">=3.12"
dependencies = [
    "numpy>=2.1.3",
    "torch==2.6.0.dev20241203",
    "pytorch-triton>=3.2.0",
    "huggingface-hub>=0.26.2",
    "tqdm>=4.67.0",
]

[tool.uv]
environments = [
    "platform_system == 'Linux'",
]

[tool.uv.sources]
torch = [
    { index = "pytorch-nightly-cu126"},
]
pytorch-triton = [
    { index = "pytorch-nightly-cu126"},
]

[[tool.uv.index]]
name = "pytorch-nightly-cu126"
url = "https://download.pytorch.org/whl/nightly/cu126"
explicit = true

[[tool.uv.index]]
name = "pytorch-cu124"
url = "https://download.pytorch.org/whl/cu124"
explicit = true

@YouJiacheng
Contributor

YouJiacheng commented Dec 19, 2024

I can observe a ~0.5 s speedup in my environment, so I think we can get aligned results!

@tysam-code
Contributor

Bumping, this seems useful to merge!

KellerJordan merged commit 3fe9250 into KellerJordan:master on Dec 31, 2024