Why do build-nanogpt and nanoGPT differ? #53
adamskrodzki
started this conversation in
General
Replies: 1 comment
-
build-nanoGPT (this repo) is the repo accompanying the YouTube video Andrej released. nanoGPT is a more formalised version, which might explain why he included dropout and other features there. My guess is he probably didn't have enough time to fit all of these elements into a YouTube video.
-
Hi,
I'm a beginner in ML, so this might be a stupid omission on my side, but as far as I've compared and understood,
there are differences between the two, and it is not as the README states:
"This repo holds the from-scratch reproduction of nanoGPT"
One example of a difference is:
In this repo there is no dropout step (x = self.dropout(x)) in MLP.forward.
Is there something I'm missing here?
There are some others (biases are missing? flash attention is forced?), but it might just be my lack of understanding of the code.
Anyway, this repository is great work and a great contribution. It helped me a lot with understanding how LLMs work, so thank you for that.
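On the dropout difference specifically, here is a minimal pure-Python sketch of inverted dropout. It is not code from either repo: the `dropout` helper and its parameters are hypothetical, written only to show that a dropout step with probability 0 (or with training disabled) passes its input through unchanged. If the dropout probability is set to 0, which I believe is the case for nanoGPT pretraining but is worth checking in its config, then having or omitting the step should give the same forward pass.

```python
import random


def dropout(xs, p, training=True, rng=random):
    """Inverted dropout over a list of floats (hypothetical helper for illustration)."""
    if not training or p == 0.0:
        # Nothing is dropped or rescaled: the step is an identity.
        return list(xs)
    keep = 1.0 - p
    # Zero each element with probability p; scale survivors by 1/keep
    # so the expected value of each element stays the same.
    return [x / keep if rng.random() < keep else 0.0 for x in xs]


x = [0.5, -1.0, 2.0, 3.0]

same_when_p_zero = dropout(x, p=0.0)                # identical to x
same_in_eval = dropout(x, p=0.5, training=False)    # identical to x
dropped = dropout(x, p=0.5, rng=random.Random(0))   # each entry is 0.0 or 2 * original
```

So the missing line only matters when a nonzero dropout probability is actually used during training.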