This package provides containers for both ExLlama and ExLlamaV2.

Both loaders are also supported in the oobabooga text-generation-webui container.

Inference Benchmark

Substitute the GPTQ model from the HuggingFace Hub that you want to run (see the list of exllama-compatible models):

./run.sh --workdir=/opt/exllama $(./autotag exllama) /bin/bash -c \
  'python3 test_benchmark_inference.py --perf --validate -d $(huggingface-downloader TheBloke/Llama-2-7B-GPTQ)'

If the model repository is private or requires authentication, add --env HUGGINGFACE_TOKEN=<YOUR-ACCESS-TOKEN> to the command above.
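As a sketch, the authenticated benchmark run would look like the command above with the token flag added (the `<YOUR-ACCESS-TOKEN>` placeholder is kept as-is and must be replaced with your own token):

```shell
# Hypothetical authenticated variant of the benchmark command above;
# the token is forwarded into the container as an environment variable.
./run.sh --env HUGGINGFACE_TOKEN=<YOUR-ACCESS-TOKEN> --workdir=/opt/exllama $(./autotag exllama) /bin/bash -c \
  'python3 test_benchmark_inference.py --perf --validate -d $(huggingface-downloader TheBloke/Llama-2-7B-GPTQ)'
```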

Memory Usage

| Model                       | Memory (MB) |
|-----------------------------|-------------|
| `TheBloke/Llama-2-7B-GPTQ`  | 5,200       |
| `TheBloke/Llama-2-13B-GPTQ` | 9,135       |
| `TheBloke/LLaMA-30b-GPTQ`   | 20,206      |
| `TheBloke/Llama-2-70B-GPTQ` | 35,462      |
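A rough sanity check for these numbers: GPTQ stores weights at 4 bits each, so a back-of-the-envelope estimate is parameter count × 0.5 bytes. The sketch below (the `gptq_weight_mb` helper is illustrative, not part of this package) computes the weight-only footprint; the measured figures in the table are higher because they also include quantization scales/zero-points, embeddings, and the runtime context cache.

```python
def gptq_weight_mb(n_params: float, bits: int = 4) -> float:
    """Approximate weight-only storage in MB for a model with n_params
    parameters quantized to `bits` bits per weight."""
    return n_params * bits / 8 / 2**20  # bits -> bytes -> MiB

# A 7B model at 4 bits needs roughly 3.3 GB for weights alone,
# versus the ~5.2 GB total measured for Llama-2-7B-GPTQ above.
print(round(gptq_weight_mb(7e9)))
```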