
DanjieTang/GemmaLLM



My implementation of the Gemma LLM.

Training data.

a) All English Wikipedia pages (about 6.5 million).

b) ~2 billion tokens.

Key insights from this implementation (sketched in code below the list).

a) RMS normalization (RMSNorm)

b) Rotary position embeddings (RoPE)

c) Multi-query attention

d) GeGLU activations

e) Pre-norm transformer blocks
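
As a quick illustration of how these pieces fit together, here is a minimal PyTorch sketch of RMSNorm, RoPE, multi-query attention, a GeGLU feed-forward, and a pre-norm block. It is written from the published descriptions of these techniques rather than copied from this repo's code, and all class names and hyperparameters are illustrative.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class RMSNorm(nn.Module):
    """Root-mean-square normalization: rescale by the RMS of the features, no mean subtraction."""
    def __init__(self, dim: int, eps: float = 1e-6):
        super().__init__()
        self.eps = eps
        self.weight = nn.Parameter(torch.ones(dim))

    def forward(self, x):
        rms = torch.rsqrt(x.pow(2).mean(dim=-1, keepdim=True) + self.eps)
        return x * rms * self.weight


class GeGLU(nn.Module):
    """Feed-forward layer with a GELU-gated linear unit."""
    def __init__(self, dim: int, hidden: int):
        super().__init__()
        self.gate = nn.Linear(dim, hidden, bias=False)
        self.up = nn.Linear(dim, hidden, bias=False)
        self.down = nn.Linear(hidden, dim, bias=False)

    def forward(self, x):
        return self.down(F.gelu(self.gate(x)) * self.up(x))


def rope(x, base: float = 10000.0):
    """Rotary position embedding: rotate channel pairs by a position-dependent angle.
    x: (batch, heads, seq, head_dim) with even head_dim."""
    b, h, t, d = x.shape
    inv_freq = 1.0 / (base ** (torch.arange(0, d, 2, dtype=torch.float32) / d))
    angles = torch.arange(t, dtype=torch.float32)[:, None] * inv_freq[None, :]  # (t, d/2)
    cos, sin = angles.cos(), angles.sin()
    x1, x2 = x[..., 0::2], x[..., 1::2]
    out = torch.empty_like(x)
    out[..., 0::2] = x1 * cos - x2 * sin
    out[..., 1::2] = x1 * sin + x2 * cos
    return out


class MultiQueryAttention(nn.Module):
    """Multi-query attention: many query heads share a single key/value head."""
    def __init__(self, dim: int, n_heads: int):
        super().__init__()
        self.n_heads = n_heads
        self.head_dim = dim // n_heads
        self.q = nn.Linear(dim, dim, bias=False)
        self.kv = nn.Linear(dim, 2 * self.head_dim, bias=False)  # one shared K/V head
        self.out = nn.Linear(dim, dim, bias=False)

    def forward(self, x):
        b, t, _ = x.shape
        q = self.q(x).view(b, t, self.n_heads, self.head_dim).transpose(1, 2)
        k, v = self.kv(x).split(self.head_dim, dim=-1)
        k = k.view(b, t, 1, self.head_dim).transpose(1, 2)
        v = v.view(b, t, 1, self.head_dim).transpose(1, 2)
        q, k = rope(q), rope(k)
        # Broadcast the single K/V head across all query heads; causal masking for LM training.
        y = F.scaled_dot_product_attention(q, k.expand_as(q), v.expand_as(q), is_causal=True)
        return self.out(y.transpose(1, 2).reshape(b, t, -1))


class PreNormBlock(nn.Module):
    """Pre-norm transformer block: normalize before each sub-layer, then add the residual."""
    def __init__(self, dim: int, n_heads: int, ffn_hidden: int):
        super().__init__()
        self.norm1 = RMSNorm(dim)
        self.attn = MultiQueryAttention(dim, n_heads)
        self.norm2 = RMSNorm(dim)
        self.ffn = GeGLU(dim, ffn_hidden)

    def forward(self, x):
        x = x + self.attn(self.norm1(x))
        x = x + self.ffn(self.norm2(x))
        return x
```

With pre-norm, each sub-layer sees normalized inputs while the residual stream itself is left unnormalized, which tends to make deeper stacks easier to train than the original post-norm arrangement.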

Training details (an illustrative configuration follows the list).

a) 2 million parameters.

b) Context length of 64 tokens.
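
For a sense of scale, the sketch below continues the code above and wires those blocks into a tiny decoder-only model. The vocabulary size, width, and depth are guesses chosen only to land near a ~2M parameter budget; the repo's actual hyperparameters are not listed in this README. With RoPE there is no learned position table, so the 64-token context appears only as the sequence length of the training batches.

```python
# Continues the sketch above; every hyperparameter here is an illustrative guess.
class TinyLM(nn.Module):
    """Decoder-only LM built from the pre-norm blocks sketched earlier."""
    def __init__(self, vocab_size=8000, dim=128, n_layers=4, n_heads=8, ffn_hidden=384):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)
        self.blocks = nn.ModuleList(
            [PreNormBlock(dim, n_heads, ffn_hidden) for _ in range(n_layers)]
        )
        self.norm = RMSNorm(dim)
        # The output projection is tied to the input embedding to keep the parameter count down.

    def forward(self, tokens):  # tokens: (batch, seq_len), e.g. seq_len = 64
        x = self.embed(tokens)
        for block in self.blocks:
            x = block(x)
        return self.norm(x) @ self.embed.weight.T  # logits: (batch, seq_len, vocab_size)


model = TinyLM()
print(sum(p.numel() for p in model.parameters()) / 1e6, "M parameters")  # ~1.8M with these guesses
```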
