Issues: huggingface/text-generation-inference
"RuntimeError: weight lm_head.weight does not exist" quantizing Llama-3.2-11B-Vision-Instruct
#2775
opened Nov 22, 2024 by
akowalsk
2 of 4 tasks
#2757: The same model loaded by different methods gives very different inference speeds? (opened Nov 19, 2024 by hjs2027864933)
#2749: Regression in 2.4.0: input validation errors return code 200 and do not return the error message (opened Nov 15, 2024 by leonarddls)
#2748: On-the-fly quantization for inference appears not to be working as documented (opened Nov 15, 2024 by colin-byrneireland1)
#2747: Different inference results and speed between /generate and the OpenAI endpoint (opened Nov 14, 2024 by jegork)
#2735: In dev mode, server is stuck at "Server started at unix:///tmp/text-generation-server-0" (opened Nov 10, 2024 by mokeddembillel)
#2730: "launch TGI with the argument --max-input-tokens smaller than sliding_window=4096 (got here max_input_tokens=16384)" (opened Nov 7, 2024 by ashwincv0112)
#2729: Device-side assert triggered when trying to use LLaMA 3.2 Vision with grammar (opened Nov 6, 2024 by SokolAnn)
#2722: Python client: Pydantic protected namespace "model_" (opened Nov 4, 2024 by Simon-Stone; see the sketch after this list)
#2715: FlashLlamaForCausalLM's use of the name dense for its mlp submodule causes an error when using a LoRA adapter (opened Nov 2, 2024 by sadra-barikbin)
#2703: CUDA Error: No kernel image is available for execution on the device (opened Oct 28, 2024 by shubhamgajbhiye1994)
#2681: Complex response format leads the container to run forever on CPU (opened Oct 23, 2024 by Rictus)
#2676: PREFIX_CACHING=0 does not disable prefix caching in v2.3.1 (opened Oct 21, 2024 by sam-ulrich1)
#2675: (Prefill) KV cache indexing error when multiple TGI servers are started concurrently (opened Oct 21, 2024 by nathan-az)
#2670: Prefix caching causes two different responses to the same HTTP call with a fixed seed, depending on which machine makes the call (opened Oct 18, 2024 by sam-ulrich1)
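
For context on #2722: the sketch below (class names are hypothetical, not the TGI Python client's actual classes) reproduces the Pydantic v2 behavior the issue title refers to, assuming Pydantic >= 2.0. Any field whose name starts with "model_" collides with Pydantic's protected "model_" namespace and emits a UserWarning at class-definition time.

```python
# Minimal sketch of the Pydantic v2 protected-namespace warning.
# Class and field names here are illustrative, not from the TGI client.
from pydantic import BaseModel, ConfigDict

class Warns(BaseModel):
    # Defining this class emits:
    #   UserWarning: Field "model_id" has conflict with protected namespace "model_".
    model_id: str

class Silent(BaseModel):
    # Opting out of the protected namespace suppresses the warning.
    model_config = ConfigDict(protected_namespaces=())
    model_id: str

print(Silent(model_id="meta-llama/Llama-3.2-11B-Vision-Instruct"))
```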