Take an LLM, reduce the `hidden_size` of its matrices, and then overfit it to some text. The result is a lightweight model with the same architecture, suitable for testing.
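Shrinking `hidden_size` is what makes these models tiny: the weight matrices scale roughly with the square of the hidden dimension. A back-of-the-envelope sketch (the four-projection attention cost below is a simplification for illustration; real architectures add MLP blocks, embeddings, and grouped-query attention):

```python
def linear_params(d_in, d_out):
    # Weight matrix plus bias vector.
    return d_in * d_out + d_out

def attn_params(d_model):
    # Simplified per-layer cost of the Q, K, V, and output projections,
    # assuming square d_model x d_model matrices.
    return 4 * linear_params(d_model, d_model)

full = attn_params(3072)  # e.g. Phi-3-mini's original hidden size
tiny = attn_params(64)    # the reduced hidden size used here
print(full // tiny)       # thousands of times fewer attention parameters
```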
- Reduced models can be found in this HF ggml-org repo. Currently supported LLMs:

  | Architecture | HF repo | hidden size | base (MB) | lora (MB) |
  | --- | --- | --- | --- | --- |
  | Phi3ForCausalLM | microsoft/Phi-3-mini-4k-instruct | 64 | 20 | 12 |
  | LlamaForCausalLM | meta-llama/Meta-Llama-3-8B-Instruct | 64 | 68 | 52 |
  | Gemma2ForCausalLM | google/gemma-2-2b | 64 | 77 | 5 |
- Run with:

  ```shell
  make HF_REPO=<your hf model repo>
  ```
- What's happening? `make run` sets up the repo and then, for each `<model-name>`:
  - Fetch `<model-name>` from HF.
  - Reduce the size of the matrices of the model.
  - Overfit the model to a paragraph of text (this will be the `base` model).
  - Overfit a lora adapter on top of `base` to a different paragraph of text.
  - Assert models are overfitted.
  - Upload these two models to `<your hf model repo>`.
- Fetch: via a user write access token, to be set as the environment variable `HF_TOKEN`.
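The "assert models are overfitted" step amounts to a memorization check: a model trained to overfit a paragraph should reproduce it. A hedged sketch with a toy stand-in for the trained model (`assert_overfitted` and `generate` are illustrative names, not the repo's actual functions):

```python
def assert_overfitted(generate, prompt, paragraph):
    # `generate` stands in for greedy decoding with the trained model;
    # an overfitted model should emit its memorized training text.
    completion = generate(prompt)
    assert paragraph in completion, "model did not memorize its text"

# Toy "model" that always continues with the memorized paragraph:
memorized = "the quick brown fox jumps over the lazy dog"
assert_overfitted(lambda p: p + " " + memorized, "Once upon a time,", memorized)
print("overfit check passed")
```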
- Environment (`poetry` required):

  ```shell
  make setup
  ```
- To run the full script for a specific model:

  ```shell
  python reduce_llms_for_testing/main.py -m "<model-name>" -hf "<your hf model repo>"
  ```
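For reference, a hypothetical sketch of a CLI surface matching the flags shown above; the real `main.py` may define its options and defaults differently:

```python
import argparse

def build_parser():
    # Mirrors the -m / -hf flags from the command above; names are
    # illustrative assumptions, not the repo's actual definitions.
    p = argparse.ArgumentParser(description="Reduce an LLM for testing")
    p.add_argument("-m", "--model-name", required=True,
                   help="HF model to fetch and reduce")
    p.add_argument("-hf", "--hf-repo", required=True,
                   help="HF repo to upload the reduced models to")
    return p

args = build_parser().parse_args(
    ["-m", "microsoft/Phi-3-mini-4k-instruct", "-hf", "your-org/your-repo"]
)
print(args.model_name, args.hf_repo)
```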