Take an LLM, reduce the `hidden_size` of its matrices, and then overfit it to some text. The result is a lightweight model with the same architecture, suitable for testing.
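Shrinking `hidden_size` is what makes these models tiny: the weight matrices scale roughly with the square of the hidden dimension. A back-of-the-envelope sketch (the four-projection attention cost below is a simplification for illustration; real architectures add MLP blocks, embeddings, and grouped-query attention):

```python
def linear_params(d_in, d_out):
    # Weight matrix plus bias vector.
    return d_in * d_out + d_out

def attn_params(d_model):
    # Simplified per-layer cost of the Q, K, V, and output projections,
    # assuming square d_model x d_model matrices.
    return 4 * linear_params(d_model, d_model)

full = attn_params(3072)  # e.g. Phi-3-mini's original hidden size
tiny = attn_params(64)    # the reduced hidden size used here
print(full // tiny)       # thousands of times fewer attention parameters
```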
- Reduced models can be found in this HF ggml-org repo. Currently supported LLMs:

  | Architecture | HF repo | hidden size | base (MB) | lora (MB) |
  | --- | --- | --- | --- | --- |
  | Phi3ForCausalLM | microsoft/Phi-3-mini-4k-instruct | 64 | 20 | 12 |
  | LlamaForCausalLM | meta-llama/Meta-Llama-3-8B-Instruct | 64 | 68 | 52 |
  | Gemma2ForCausalLM | google/gemma-2-2b | 64 | 77 | 5 |
- Run with:

  ```shell
  make HF_REPO=<your hf model repo>
  ```
- What's happening? `make run` sets up the repo and then, for each `<model-name>`:
  - Fetch `<model-name>` from HF.
  - Reduce the size of the matrices of the model.
  - Overfit the model to a paragraph of text (this will be the `base` model).
  - Overfit a lora adapter on top of `base` to a different paragraph of text.
  - Assert models are overfitted.
  - Upload these two models to `<your hf model repo>`.
- Fetch: via a user write access token, to be set as the environment variable `HF_TOKEN`.
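The "assert models are overfitted" step amounts to a memorization check: a model trained to overfit a paragraph should reproduce it. A hedged sketch with a toy stand-in for the trained model (`assert_overfitted` and `generate` are illustrative names, not the repo's actual functions):

```python
def assert_overfitted(generate, prompt, paragraph):
    # `generate` stands in for greedy decoding with the trained model;
    # an overfitted model should emit its memorized training text.
    completion = generate(prompt)
    assert paragraph in completion, "model did not memorize its text"

# Toy "model" that always continues with the memorized paragraph:
memorized = "the quick brown fox jumps over the lazy dog"
assert_overfitted(lambda p: p + " " + memorized, "Once upon a time,", memorized)
print("overfit check passed")
```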
- Environment (`poetry` required):

  ```shell
  make setup
  ```
- To run the full script for a specific model:

  ```shell
  python reduce_llms_for_testing/main.py -m "<model-name>" -hf "<your hf model repo>"
  ```
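For reference, a hypothetical sketch of a CLI surface matching the flags shown above; the real `main.py` may define its options and defaults differently:

```python
import argparse

def build_parser():
    # Mirrors the -m / -hf flags from the command above; names are
    # illustrative assumptions, not the repo's actual definitions.
    p = argparse.ArgumentParser(description="Reduce an LLM for testing")
    p.add_argument("-m", "--model-name", required=True,
                   help="HF model to fetch and reduce")
    p.add_argument("-hf", "--hf-repo", required=True,
                   help="HF repo to upload the reduced models to")
    return p

args = build_parser().parse_args(
    ["-m", "microsoft/Phi-3-mini-4k-instruct", "-hf", "your-org/your-repo"]
)
print(args.model_name, args.hf_repo)
```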