
TinyLLM

Minimal, high-performance inference engine for LLMs, built for development environments.

Overview

TinyLLM streamlines the inference pipeline with minimal overhead, focusing on memory efficiency and throughput. It includes a custom tokenizer for self-developed models and is compatible with existing LLMs through its scheduling system.

Features

  • Memory management with pruning
  • Efficient batch processing and response streaming
  • Optimized scheduling for multi-model deployments
  • Custom tokenizer implementation for self-developed models
  • Inference API
  • KV cache implementation
  • Training CLI for development models
  • Byte-level tokenization (see the sketch below)
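
The tokenizer's internals aren't shown here, so as a rough illustration of the byte-level idea only (not TinyLLM's actual implementation): a byte-level tokenizer maps text to its UTF-8 bytes, giving a base vocabulary of exactly 256 token IDs and lossless round-tripping for any string.

# Minimal sketch of byte-level tokenization (illustrative only, not
# TinyLLM's actual tokenizer). Text maps to UTF-8 bytes, so the base
# vocabulary is exactly 256 token IDs and decoding is lossless.

def encode(text: str) -> list[int]:
    """Map text to a list of byte-valued token IDs (0-255)."""
    return list(text.encode("utf-8"))

def decode(token_ids: list[int]) -> str:
    """Map byte-valued token IDs back to text."""
    return bytes(token_ids).decode("utf-8")

if __name__ == "__main__":
    ids = encode("tiny")
    print(ids)          # [116, 105, 110, 121]
    print(decode(ids))  # "tiny"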

This is very much still an experiment, especially the tokenizer; the scheduler is reasonably well-written, and memory management is decent.

I'll continue to slowly improve these components over my weekends.

Scope

This is solely an inference engine. It does not:

  • Implement large model architectures
  • Include pre-trained models
  • Support distributed training

How to use?

Clone the repository and install

git clone https://github.com/andrewn6/tinyllm
cd tinyllm
pip install -e .

Register your trained model

tinyllm model register transformer-19m v1 \
    --checkpoint models/tiny-19m.pt \
    --model-type native \
    --description "19M parameter transformer"

Serve and expose to localhost

tinyllm serve \
    --model-name transformer-19m \
    --port 8000 \
    --model-type native
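
Once the server is up, you can hit the inference API over HTTP. The exact route and payload depend on TinyLLM's API; the /generate endpoint and field names below are assumptions for illustration:

import requests

# Hypothetical request to the local TinyLLM server started above.
# The endpoint path ("/generate") and payload fields ("prompt",
# "max_tokens") are assumptions; check the actual API for the real schema.
resp = requests.post(
    "http://localhost:8000/generate",
    json={"prompt": "Hello, world", "max_tokens": 32},
)
resp.raise_for_status()
print(resp.json())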

List models

tinyllm model list
