Additional models using huggingface/text-generation-inference #719
Managed to load the camembert-large model after resaving the original model and tokenizer to the Hugging Face Hub: https://huggingface.co/fxa76/camembert-large-resaved.

`2023-06-15T16:03:37.416535Z INFO text_generation_launcher: Successfully downloaded weights.`

Now the issue happens when querying the model: "text_generation.errors.GenerationError: Request failed during generation: Server error: Expected size for first two dimensions of batch2 tensor to be: [16, 376] but got: [16, 1]."
Hi,
I have edited privateGPT.py to add the possibility of using HuggingFace Text Generation Inference locally.
Follow the install instructions at https://github.com/huggingface/text-generation-inference and add the new case "fxatest":
```python
case "GPT4All":
    llm = GPT4All(model=model_path, n_ctx=model_n_ctx, backend='gptj',
                  callbacks=callbacks, verbose=False)
case "fxatest":
    llm = HuggingFaceTextGenInference(
        inference_server_url='http://localhost:8080/',
        max_new_tokens=512,
        top_k=10,
        top_p=0.95,
        typical_p=0.95,
        temperature=0.01,
        repetition_penalty=1.03,
    )
case _:
    print(f"Model {model_type} not supported!")
    exit()
```
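For reference, the same endpoint can also be queried without LangChain by posting directly to TGI's `/generate` route. This is a minimal stdlib-only sketch using the same sampling parameters as above; the helper names are mine, and a server is assumed to be listening on localhost:8080:

```python
import json
from urllib.request import Request, urlopen


def build_generate_payload(prompt: str) -> dict:
    """Build a TGI /generate request body mirroring the parameters above."""
    return {
        "inputs": prompt,
        "parameters": {
            "max_new_tokens": 512,
            "top_k": 10,
            "top_p": 0.95,
            "typical_p": 0.95,
            "temperature": 0.01,
            "repetition_penalty": 1.03,
        },
    }


def generate(prompt: str, url: str = "http://localhost:8080/generate") -> str:
    """POST the prompt to a running text-generation-inference server."""
    req = Request(
        url,
        data=json.dumps(build_generate_payload(prompt)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urlopen(req) as resp:
        return json.loads(resp.read())["generated_text"]
```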
After installing text-generation-inference, you can start it on Windows using this batch file:
```bat
set model=bigscience/bloom-560m
set num_shard=1
set volume=C:/Factory/huggingface_testgen/data
docker run --gpus 0 --shm-size 1g -p 8080:80 -v %volume%:/data ghcr.io/huggingface/text-generation-inference:0.8 --model-id %model% --num-shard %num_shard%
```
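On Linux or macOS, an equivalent launch script would look like this (a sketch only: the volume path is an example, and `--gpus all` assumes the NVIDIA Container Toolkit is installed):

```shell
model=bigscience/bloom-560m
num_shard=1
volume=$PWD/data
docker run --gpus all --shm-size 1g -p 8080:80 -v $volume:/data \
  ghcr.io/huggingface/text-generation-inference:0.8 \
  --model-id $model --num-shard $num_shard
```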
It works with bigscience/bloom-560m and a few other models, but unfortunately no success yet with Camembert or other French models.
Would be glad to find a way to make it work.
On the positive side, it runs fast on the GPU (tested with an NVIDIA 3060).