Issues: huggingface/text-generation-inference
"RuntimeError: weight lm_head.weight does not exist" quantizing Llama-3.2-11B-Vision-Instruct
#2775
opened Nov 22, 2024 by
akowalsk
2 of 4 tasks
#2757: The same model loaded by different methods gives very different inference speeds? (opened Nov 19, 2024 by hjs2027864933)
#2749: Regression in 2.4.0: input validation errors return code 200 and do not return the error message (opened Nov 15, 2024 by leonarddls)
#2748: On-the-fly quantization for inference appears not to be working as documented (opened Nov 15, 2024 by colin-byrneireland1)
#2747: Different inference results and speed between /generate and the OpenAI endpoint (opened Nov 14, 2024 by jegork)
#2735: In dev mode, server is stuck at "Server started at unix:///tmp/text-generation-server-0" (opened Nov 10, 2024 by mokeddembillel)
#2730: "launch TGI with the argument --max-input-tokens smaller than sliding_window=4096 (got here max_input_tokens=16384)" (opened Nov 7, 2024 by ashwincv0112)
#2729: Device-side assert triggered when trying to use LLaMA 3.2 Vision with grammar (opened Nov 6, 2024 by SokolAnn)
#2722: Python client: Pydantic protected namespace "model_" (opened Nov 4, 2024 by Simon-Stone; see the sketch after this list)
#2715: FlashLlamaForCausalLM's use of the name dense for its mlp submodule causes an error when using a LoRA adapter (opened Nov 2, 2024 by sadra-barikbin)
#2703: CUDA Error: No kernel image is available for execution on the device (opened Oct 28, 2024 by shubhamgajbhiye1994)
#2681: Complex response format leads the container to run forever on CPU (opened Oct 23, 2024 by Rictus)
#2676: PREFIX_CACHING=0 does not disable prefix caching in v2.3.1 (opened Oct 21, 2024 by sam-ulrich1)
#2675: (Prefill) KV cache indexing error when multiple TGI servers are started concurrently (opened Oct 21, 2024 by nathan-az)
#2670: Prefix caching causes two different responses to the same HTTP call with a fixed seed, depending on which machine makes the call (opened Oct 18, 2024 by sam-ulrich1)
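
For context on #2722: the sketch below (class names are hypothetical, not the TGI Python client's actual classes) reproduces the Pydantic v2 behavior the issue title refers to, assuming Pydantic >= 2.0. Any field whose name starts with "model_" collides with Pydantic's protected "model_" namespace and emits a UserWarning at class-definition time.

```python
# Minimal sketch of the Pydantic v2 protected-namespace warning.
# Class and field names here are illustrative, not from the TGI client.
from pydantic import BaseModel, ConfigDict

class Warns(BaseModel):
    # Defining this class emits:
    #   UserWarning: Field "model_id" has conflict with protected namespace "model_".
    model_id: str

class Silent(BaseModel):
    # Opting out of the protected namespace suppresses the warning.
    model_config = ConfigDict(protected_namespaces=())
    model_id: str

print(Silent(model_id="meta-llama/Llama-3.2-11B-Vision-Instruct"))
```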