batched newtonschulz implementation #54

Open
wants to merge 506 commits into
base: master
0d661bd
.
KellerJordan Nov 5, 2024
8fbfa55
.
KellerJordan Nov 5, 2024
81ded9f
.
KellerJordan Nov 5, 2024
e67296d
.
KellerJordan Nov 6, 2024
a6a86ab
.
KellerJordan Nov 6, 2024
1dcad7c
.
KellerJordan Nov 6, 2024
8bf3e08
.
KellerJordan Nov 6, 2024
cb851a1
.
KellerJordan Nov 6, 2024
8ff35a0
.
KellerJordan Nov 6, 2024
e46062d
.
KellerJordan Nov 6, 2024
912481e
.
KellerJordan Nov 6, 2024
71dafe7
.
KellerJordan Nov 6, 2024
0bee414
.
KellerJordan Nov 6, 2024
e3d0a8d
.
KellerJordan Nov 7, 2024
e01b457
.
KellerJordan Nov 7, 2024
319a23e
Update README.md
KellerJordan Nov 7, 2024
9946a2e
Update README.md
KellerJordan Nov 7, 2024
bc75dfa
Update README.md
KellerJordan Nov 8, 2024
453323e
Update README.md
KellerJordan Nov 8, 2024
a5622e5
Update README.md
KellerJordan Nov 8, 2024
42e26ee
Update README.md
KellerJordan Nov 8, 2024
d42297d
Update README.md
KellerJordan Nov 8, 2024
832b03f
Update README.md
KellerJordan Nov 8, 2024
7397995
Update README.md
KellerJordan Nov 8, 2024
802d76b
Update README.md
KellerJordan Nov 8, 2024
b18a911
Update README.md
KellerJordan Nov 8, 2024
b3aafcc
Update README.md
KellerJordan Nov 8, 2024
37b4787
Update README.md
KellerJordan Nov 8, 2024
0b80ac3
Update README.md
KellerJordan Nov 8, 2024
d2e3950
Update README.md
KellerJordan Nov 8, 2024
596917d
Update README.md
KellerJordan Nov 8, 2024
303e096
Update README.md
KellerJordan Nov 8, 2024
b0ad2c1
Update README.md
KellerJordan Nov 8, 2024
caf5c94
Update README.md
KellerJordan Nov 8, 2024
e21905f
Update README.md
KellerJordan Nov 8, 2024
0157a47
Update README.md
KellerJordan Nov 8, 2024
d22c770
Update README.md
KellerJordan Nov 8, 2024
0eb6d4b
Update README.md
KellerJordan Nov 8, 2024
e134368
Update README.md
KellerJordan Nov 8, 2024
8c2252f
.
KellerJordan Nov 9, 2024
a0dcbfd
.
KellerJordan Nov 9, 2024
c7bc6dc
.
KellerJordan Nov 9, 2024
8317279
Update README.md
KellerJordan Nov 9, 2024
1ea9c05
.
KellerJordan Nov 9, 2024
a8d7654
Merge branch 'master' of https://github.com/KellerJordan/modded-nanogpt
KellerJordan Nov 9, 2024
cd6b75e
.
KellerJordan Nov 9, 2024
088db86
Update README.md
KellerJordan Nov 9, 2024
7b2ca87
Update README.md
KellerJordan Nov 9, 2024
096e59f
Update README.md
KellerJordan Nov 9, 2024
a598325
Update README.md
KellerJordan Nov 9, 2024
61955b1
.
KellerJordan Nov 9, 2024
a86a2b5
Merge branch 'master' of https://github.com/KellerJordan/modded-nanogpt
KellerJordan Nov 9, 2024
fe8c862
.
KellerJordan Nov 9, 2024
4d772e7
.
KellerJordan Nov 9, 2024
ab0eb61
.
KellerJordan Nov 10, 2024
aa97945
.
KellerJordan Nov 10, 2024
c6ea6f3
Update README.md
KellerJordan Nov 10, 2024
ce12a1f
.
KellerJordan Nov 10, 2024
e2d099c
Merge branch 'master' of https://github.com/KellerJordan/modded-nanogpt
KellerJordan Nov 10, 2024
d7b1cd9
Update README.md
KellerJordan Nov 10, 2024
d52aafe
Update README.md
KellerJordan Nov 10, 2024
5364fa9
.
KellerJordan Nov 11, 2024
bcc607a
Merge branch 'master' of https://github.com/KellerJordan/modded-nanogpt
KellerJordan Nov 11, 2024
49e5bff
.
KellerJordan Nov 11, 2024
b473be6
.
KellerJordan Nov 11, 2024
6d050c5
.
KellerJordan Nov 11, 2024
d74dc46
.
KellerJordan Nov 11, 2024
a4b40a5
.
KellerJordan Nov 11, 2024
7c11b5e
.
KellerJordan Nov 11, 2024
b29a05a
Update README.md
KellerJordan Nov 11, 2024
78b1eee
.
KellerJordan Nov 11, 2024
2d58c1a
Merge branch 'master' of https://github.com/KellerJordan/modded-nanogpt
KellerJordan Nov 11, 2024
5cab023
.
KellerJordan Nov 11, 2024
3b15911
.
KellerJordan Nov 11, 2024
1f690d7
.
KellerJordan Nov 11, 2024
458a02d
.
KellerJordan Nov 11, 2024
599e345
.
KellerJordan Nov 11, 2024
917332c
Dockerfile and update train_gpt2.py to most recent record
bluecoconut Nov 12, 2024
0cca2ac
.
bluecoconut Nov 12, 2024
b3c41c7
actually, only do 1 change at once
bluecoconut Nov 12, 2024
ecbc001
replace iframe tag with image link to `Initial D - Deja vu`
dantetemplar Nov 13, 2024
2ba0e19
Merge pull request #25 from bluecoconut/master
KellerJordan Nov 13, 2024
4aedac9
Merge pull request #27 from dantetemplar/master
KellerJordan Nov 13, 2024
5c6f1ba
Update README.md
KellerJordan Nov 13, 2024
6972285
Update README.md
KellerJordan Nov 13, 2024
459bd85
Update README.md
KellerJordan Nov 13, 2024
8179d34
Update README.md
KellerJordan Nov 13, 2024
f01b7ec
Update README.md
KellerJordan Nov 13, 2024
f68ec76
Update README.md
KellerJordan Nov 13, 2024
e2f4af5
.
KellerJordan Nov 14, 2024
8523b88
Merge branch 'master' of https://github.com/KellerJordan/modded-nanogpt
KellerJordan Nov 14, 2024
1f94c1c
.
KellerJordan Nov 14, 2024
13e43e3
Update README.md
KellerJordan Nov 14, 2024
47da85a
Update README.md
KellerJordan Nov 19, 2024
3562d09
update with 11/10/24 record
KellerJordan Nov 20, 2024
494f816
Merge branch 'master' of https://github.com/KellerJordan/modded-nanogpt
KellerJordan Nov 20, 2024
5a5bd12
new 5-minute FlexAttention record by @KoszarskyB
KellerJordan Nov 20, 2024
93ba9fc
.
KellerJordan Nov 20, 2024
a86539c
.
KellerJordan Nov 20, 2024
59c8183
.
KellerJordan Nov 20, 2024
22862e7
.
KellerJordan Nov 20, 2024
abcf16b
.
KellerJordan Nov 20, 2024
2de23fb
.
KellerJordan Nov 20, 2024
0dce01c
.
KellerJordan Nov 20, 2024
0b9d3ca
.
KellerJordan Nov 20, 2024
585a62a
.
KellerJordan Nov 20, 2024
afa58b8
.
KellerJordan Nov 20, 2024
8e642c8
.
KellerJordan Nov 20, 2024
b8c0e58
.
KellerJordan Nov 20, 2024
a3ff237
.
KellerJordan Nov 20, 2024
4344f95
.
KellerJordan Nov 20, 2024
4f71b37
Update README.md
KellerJordan Nov 20, 2024
a82d12c
Update README.md
KellerJordan Nov 20, 2024
1a3594c
Update README.md
KellerJordan Nov 21, 2024
e1a1f87
Update README.md
KellerJordan Nov 21, 2024
17dafe7
Update README.md
KellerJordan Nov 21, 2024
cbc099d
Update README.md
KellerJordan Nov 21, 2024
f92118b
Update train_gpt2.py
KellerJordan Nov 21, 2024
e926fb9
Update README.md
KellerJordan Nov 21, 2024
bb720f2
Create README.md
KellerJordan Nov 21, 2024
7d8ae57
Update README.md
KellerJordan Nov 21, 2024
2e97eab
Update README.md
KellerJordan Nov 22, 2024
ff4f7d6
Update train_gpt2.py
KellerJordan Nov 22, 2024
184edb2
Update README.md
KellerJordan Nov 22, 2024
494cf75
Update README.md
KellerJordan Nov 22, 2024
42aab06
11/24/24 record
KellerJordan Nov 25, 2024
4e853f7
.
KellerJordan Nov 25, 2024
eb52e76
more runs
KellerJordan Nov 25, 2024
3e63ff7
.
KellerJordan Nov 25, 2024
779d11c
Update README.md
KellerJordan Nov 25, 2024
a61f737
Update README.md
KellerJordan Nov 25, 2024
9788090
Update README.md
KellerJordan Nov 25, 2024
3111ab0
Update README.md
KellerJordan Nov 25, 2024
87b34c8
Update README.md
KellerJordan Nov 25, 2024
ee7c9c4
Update README.md
KellerJordan Nov 25, 2024
92a5449
Update README.md
KellerJordan Nov 25, 2024
7a0a3ed
Update README.md
KellerJordan Nov 25, 2024
c0f4f26
Update train_gpt2.py
timlautk Nov 25, 2024
5dcb352
Update train_gpt2.py
timlautk Nov 25, 2024
badc5dc
Merge pull request #31 from timlautk/master
KellerJordan Nov 25, 2024
e6505cd
.
KellerJordan Nov 25, 2024
52acaaa
.
KellerJordan Nov 25, 2024
2685e27
.
KellerJordan Nov 25, 2024
469dc2f
.
KellerJordan Nov 25, 2024
5384d30
.
KellerJordan Nov 25, 2024
beb8368
.
KellerJordan Nov 25, 2024
2b24502
.
KellerJordan Nov 25, 2024
b51050a
Update README.md
KellerJordan Nov 25, 2024
013a194
Merge branch 'master' of https://github.com/KellerJordan/modded-nanogpt
KellerJordan Nov 25, 2024
341c860
.
KellerJordan Nov 26, 2024
c857b1a
.
KellerJordan Nov 26, 2024
b9e4d52
.
KellerJordan Nov 27, 2024
9e35b93
Update train_gpt2.py
KellerJordan Nov 28, 2024
f85860f
.
KellerJordan Dec 1, 2024
17b8d68
Merge branch 'master' of https://github.com/KellerJordan/modded-nanogpt
KellerJordan Dec 1, 2024
8d19c38
.
KellerJordan Dec 1, 2024
3295409
rotary refactor and leave cos/sin in fp32. fixes #39
KellerJordan Dec 1, 2024
e238001
fix
KellerJordan Dec 1, 2024
442343d
.
KellerJordan Dec 1, 2024
82a8665
.
KellerJordan Dec 1, 2024
5d248a7
.
KellerJordan Dec 4, 2024
6d515b8
.
KellerJordan Dec 4, 2024
5503b71
.
KellerJordan Dec 4, 2024
ce41ea4
.
KellerJordan Dec 4, 2024
4a97240
.
KellerJordan Dec 4, 2024
4876b46
Update README.md
KellerJordan Dec 4, 2024
e5ecb31
new KoszarskyB record
KellerJordan Dec 5, 2024
6b4b58e
Merge branch 'master' of https://github.com/KellerJordan/modded-nanogpt
KellerJordan Dec 5, 2024
14358da
.
KellerJordan Dec 5, 2024
de12915
.
KellerJordan Dec 5, 2024
1e2d8c9
.
KellerJordan Dec 5, 2024
747ad62
.
KellerJordan Dec 5, 2024
a1cc63c
.
KellerJordan Dec 5, 2024
27ccb5e
.
KellerJordan Dec 5, 2024
f79eed9
.
KellerJordan Dec 5, 2024
4ee04e3
.
KellerJordan Dec 5, 2024
1ee7117
Update README.md
KellerJordan Dec 5, 2024
ffa5c1b
Update README.md
KellerJordan Dec 5, 2024
310578d
Update README.md
KellerJordan Dec 5, 2024
64b42b8
Update README.md
KellerJordan Dec 5, 2024
f378341
Update README.md
KellerJordan Dec 5, 2024
cbed62a
Update README.md
KellerJordan Dec 5, 2024
54cf7e2
Update README.md
KellerJordan Dec 5, 2024
d58a9d5
Update README.md
KellerJordan Dec 5, 2024
7b99dc9
Update README.md
KellerJordan Dec 5, 2024
2c34eac
Update README.md
KellerJordan Dec 5, 2024
2151036
Update README.md
KellerJordan Dec 5, 2024
8edae39
Fix peak memory usage not being saved to the logfile
tysam-code Dec 5, 2024
9730304
Merge pull request #42 from tysam-code/patch-1
KellerJordan Dec 6, 2024
11aba6c
Update train_gpt2.py
YouJiacheng Dec 6, 2024
74f35d0
Log numpy version as well as pytorch version
segyges Dec 7, 2024
bd8d0d5
Update README.md
KellerJordan Dec 9, 2024
5a4406d
Update README.md
KellerJordan Dec 9, 2024
5d8347a
Update README.md
KellerJordan Dec 9, 2024
51b9863
.
KellerJordan Dec 9, 2024
60bcf44
Update README.md
KellerJordan Dec 9, 2024
2fed60f
.
KellerJordan Dec 9, 2024
34967af
.
KellerJordan Dec 9, 2024
d60ecc7
.
KellerJordan Dec 9, 2024
f7a1760
.
KellerJordan Dec 9, 2024
45b5ab9
.
KellerJordan Dec 9, 2024
db53ec3
.
KellerJordan Dec 9, 2024
70c6da9
.
KellerJordan Dec 9, 2024
8e47ae3
.
KellerJordan Dec 9, 2024
cfff780
.
KellerJordan Dec 9, 2024
ed15a79
.
KellerJordan Dec 9, 2024
f5105fb
.
KellerJordan Dec 9, 2024
8c71828
.
KellerJordan Dec 9, 2024
6e60c0a
.
KellerJordan Dec 9, 2024
b9ab551
.
KellerJordan Dec 9, 2024
23bf0b0
.
KellerJordan Dec 9, 2024
692c183
Add logging of python version
segyges Dec 9, 2024
7163cc3
.
KellerJordan Dec 9, 2024
9223641
.
KellerJordan Dec 9, 2024
4b1f58d
Merge pull request #43 from YouJiacheng/patch-1
KellerJordan Dec 9, 2024
0d780e6
Merge pull request #44 from segyges/log-numpy-version
KellerJordan Dec 9, 2024
f0b00fe
.
KellerJordan Dec 9, 2024
33daa9a
.
KellerJordan Dec 9, 2024
650371a
.
KellerJordan Dec 9, 2024
8ffa1fb
.
KellerJordan Dec 9, 2024
b88d2e0
.
KellerJordan Dec 9, 2024
5b9ec5a
.
KellerJordan Dec 9, 2024
f120bfc
.
KellerJordan Dec 9, 2024
19a664a
Update README.md
KellerJordan Dec 9, 2024
06a2c67
Update README.md
KellerJordan Dec 9, 2024
c5ce40b
Update README.md to include git clone command for initial install
tysam-code Dec 10, 2024
4a7a492
Update Dockerfile to pin to 12.03 torch nightly
tysam-code Dec 10, 2024
f59ffb1
Merge pull request #51 from tysam-code/patch-2
KellerJordan Dec 11, 2024
0243d63
.
KellerJordan Dec 11, 2024
ed2d85d
Merge branch 'master' of https://github.com/KellerJordan/modded-nanogpt
KellerJordan Dec 11, 2024
5196070
.
KellerJordan Dec 11, 2024
d7614a7
.
KellerJordan Dec 11, 2024
4a5c352
new record 12/10/24
KellerJordan Dec 12, 2024
5cad2ab
Update README.md
KellerJordan Dec 12, 2024
b5401da
Update requirements.txt
KellerJordan Dec 14, 2024
a0a775f
Update train_gpt2.py
KellerJordan Dec 14, 2024
4dc1ac4
Update README.md
KellerJordan Dec 14, 2024
d2f4449
Update train_gpt2.py
KellerJordan Dec 14, 2024
a5da718
Update train_gpt2.py
KellerJordan Dec 14, 2024
9dbdf39
Update train_gpt2.py
KellerJordan Dec 14, 2024
c206a1a
Update train_gpt2.py
KellerJordan Dec 14, 2024
5458085
Update README.md
KellerJordan Dec 14, 2024
993e1fa
Update train_gpt2.py
KellerJordan Dec 14, 2024
73769d4
Update train_gpt2.py
KellerJordan Dec 14, 2024
241e27f
Update train_gpt2.py
KellerJordan Dec 14, 2024
ee2b3b2
Update train_gpt2.py
KellerJordan Dec 14, 2024
b71157f
Update train_gpt2.py
KellerJordan Dec 14, 2024
243746b
Update train_gpt2.py
KellerJordan Dec 14, 2024
8a5502f
Update train_gpt2.py
KellerJordan Dec 14, 2024
f7bea5b
batched newtonschulz implementation
scottjmaddox Dec 13, 2024
4 changes: 4 additions & 0 deletions .gitignore
@@ -0,0 +1,4 @@
fineweb10B/
pylog124M/
__pycache__/
logs/
33 changes: 33 additions & 0 deletions Dockerfile
@@ -0,0 +1,33 @@
FROM nvidia/cuda:12.6.2-cudnn-devel-ubuntu24.04

ENV DEBIAN_FRONTEND=noninteractive
ENV PYTHON_VERSION=3.12.7
ENV PATH=/usr/local/bin:$PATH

RUN apt update && apt install -y --no-install-recommends build-essential libssl-dev zlib1g-dev \
libbz2-dev libreadline-dev libsqlite3-dev curl git libncursesw5-dev xz-utils tk-dev libxml2-dev \
libxmlsec1-dev libffi-dev liblzma-dev \
&& apt clean && rm -rf /var/lib/apt/lists/*

RUN curl -O https://www.python.org/ftp/python/${PYTHON_VERSION}/Python-${PYTHON_VERSION}.tgz && \
tar -xzf Python-${PYTHON_VERSION}.tgz && \
cd Python-${PYTHON_VERSION} && \
./configure --enable-optimizations && \
make -j$(nproc) && \
make altinstall && \
cd .. && \
rm -rf Python-${PYTHON_VERSION} Python-${PYTHON_VERSION}.tgz

RUN ln -s /usr/local/bin/python3.12 /usr/local/bin/python && \
ln -s /usr/local/bin/pip3.12 /usr/local/bin/pip

COPY requirements.txt /modded-nanogpt/requirements.txt
WORKDIR /modded-nanogpt

RUN python -m pip install --upgrade pip && \
pip install -r requirements.txt

RUN pip install --pre torch==2.6.0.dev20241203+cu124 --index-url https://download.pytorch.org/whl/nightly/cu124 --upgrade

CMD ["bash"]
ENTRYPOINT []
21 changes: 21 additions & 0 deletions LICENSE
@@ -0,0 +1,21 @@
MIT License

Copyright (c) 2024 Keller Jordan

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
262 changes: 242 additions & 20 deletions README.md

Large diffs are not rendered by default.

16 changes: 16 additions & 0 deletions data/cached_fineweb100B.py
@@ -0,0 +1,16 @@
import os
import sys
from huggingface_hub import hf_hub_download
# Download the GPT-2 tokens of Fineweb100B from huggingface. This
# saves about an hour of startup time compared to regenerating them.
def get(fname):
    local_dir = os.path.join(os.path.dirname(__file__), 'fineweb100B')
    if not os.path.exists(os.path.join(local_dir, fname)):
        hf_hub_download(repo_id="kjj0/fineweb100B-gpt2", filename=fname,
                        repo_type="dataset", local_dir=local_dir)
get("fineweb_val_%06d.bin" % 0)
num_chunks = 1030 # full fineweb100B. Each chunk is 100M tokens
if len(sys.argv) >= 2: # we can pass an argument to download less
    num_chunks = int(sys.argv[1])
for i in range(1, num_chunks+1):
    get("fineweb_train_%06d.bin" % i)
16 changes: 16 additions & 0 deletions data/cached_fineweb10B.py
@@ -0,0 +1,16 @@
import os
import sys
from huggingface_hub import hf_hub_download
# Download the GPT-2 tokens of Fineweb10B from huggingface. This
# saves about an hour of startup time compared to regenerating them.
def get(fname):
    local_dir = os.path.join(os.path.dirname(__file__), 'fineweb10B')
    if not os.path.exists(os.path.join(local_dir, fname)):
        hf_hub_download(repo_id="kjj0/fineweb10B-gpt2", filename=fname,
                        repo_type="dataset", local_dir=local_dir)
get("fineweb_val_%06d.bin" % 0)
num_chunks = 103 # full fineweb10B. Each chunk is 100M tokens
if len(sys.argv) >= 2: # we can pass an argument to download less
    num_chunks = int(sys.argv[1])
for i in range(1, num_chunks+1):
    get("fineweb_train_%06d.bin" % i)
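As a rough illustration of what the two download scripts request, the sketch below lists the file names and the approximate training-token count for a given chunk count, without touching the network. The helper name `planned_files` and the constant `TOKENS_PER_CHUNK` are hypothetical and not part of the repository; the file-name patterns and the 100M-tokens-per-chunk figure come from the scripts' own code and comments.

```python
# Hypothetical dry-run helper; not part of the repository.
TOKENS_PER_CHUNK = 100_000_000  # per the scripts' comment: each chunk is 100M tokens


def planned_files(num_chunks):
    """Return the file names the download scripts would request, in order."""
    # The single validation shard is always fetched first.
    files = ["fineweb_val_%06d.bin" % 0]
    # Training shards are numbered from 1 to num_chunks inclusive.
    files += ["fineweb_train_%06d.bin" % i for i in range(1, num_chunks + 1)]
    return files


if __name__ == "__main__":
    chunks = 8  # same meaning as the optional command-line argument of the scripts
    files = planned_files(chunks)
    print(len(files), "files, first:", files[0], "last:", files[-1])
    print("approx. training tokens:", chunks * TOKENS_PER_CHUNK)
```

Passing a smaller chunk count to the real scripts (their first command-line argument) should shorten the download in the same proportion, since every training shard holds the same number of tokens according to the comment.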
2 changes: 2 additions & 0 deletions data/requirements.txt
@@ -0,0 +1,2 @@
datasets
tiktoken
Binary file added img/algo_optimizer.png
Binary file added img/dofa.jpg
Binary file added img/fig_optimizer.png
Binary file added img/nanogpt_speedrun51.png
Binary file added img/nanogpt_speedrun52.png
Binary file added img/nanogpt_speedrun53.png
Binary file added img/nanogpt_speedrun54.png
8 changes: 8 additions & 0 deletions records/060624_AdamW/README.md
@@ -0,0 +1,8 @@
This is the log for my baseline AdamW training to which I compared the new Muon and SOAP optimizers.

This directory contains just the log, which is in the old llm.c format ("tel" lines are val loss).

The batch size was 2^19, so the run covered ~5B tokens.

The learning rate was 0.0018, with warmup=250, warmdown=2000, betas=(0.9, 0.95), IIRC.
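To make the numbers above concrete, here is a minimal sketch of the implied step count and of one plausible learning-rate schedule. It rests on several assumptions that the README does not confirm: that the batch size of 2^19 is measured in tokens, that "~5B" means roughly 5,000,000,000, and that warmup and warmdown are linear ramps around a constant plateau. The function name `lr_at` is hypothetical.

```python
# Illustrative sketch only; the schedule shape is an assumption, not read from the log.
BATCH_TOKENS = 2 ** 19          # batch size from the README, assumed to be in tokens
TOTAL_TOKENS = 5_000_000_000    # "~5B tokens" from the README (approximate)
NUM_STEPS = round(TOTAL_TOKENS / BATCH_TOKENS)  # about 9537 optimizer steps


def lr_at(step, base_lr=0.0018, warmup=250, warmdown=2000, num_steps=NUM_STEPS):
    """Assumed trapezoidal schedule: linear warmup, plateau, linear warmdown."""
    if step < warmup:
        # Ramp up linearly so the last warmup step reaches base_lr.
        return base_lr * (step + 1) / warmup
    if step < num_steps - warmdown:
        # Constant plateau between the two ramps.
        return base_lr
    # Ramp down linearly towards zero over the final `warmdown` steps.
    return base_lr * (num_steps - step) / warmdown


if __name__ == "__main__":
    print("steps:", NUM_STEPS)
    for step in (0, 249, 5000, NUM_STEPS - 1):
        print(step, lr_at(step))
```

Because the hyperparameters are quoted from memory ("IIRC"), treat the printed values as an approximation of the run rather than a reproduction of it.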
