Additional models using huggingface/text-generation-inference #719
Managed to load the camembert-large model after resaving the original model and tokenizer to the Hugging Face Hub: https://huggingface.co/fxa76/camembert-large-resaved.

`2023-06-15T16:03:37.416535Z INFO text_generation_launcher: Successfully downloaded weights.`

Now the issue happens when querying the model: "text_generation.errors.GenerationError: Request failed during generation: Server error: Expected size for first two dimensions of batch2 tensor to be: [16, 376] but got: [16, 1]."
Hi,
I have edited privateGPT.py to add the possibility of using HuggingFace Text Generation Inference locally.
Follow the install instructions at https://github.com/huggingface/text-generation-inference and add the new case "fxatest":
```python
case "GPT4All":
    llm = GPT4All(model=model_path, n_ctx=model_n_ctx, backend='gptj',
                  callbacks=callbacks, verbose=False)
case "fxatest":
    llm = HuggingFaceTextGenInference(
        inference_server_url='http://localhost:8080/',
        max_new_tokens=512,
        top_k=10,
        top_p=0.95,
        typical_p=0.95,
        temperature=0.01,
        repetition_penalty=1.03,
    )
case _:
    print(f"Model {model_type} not supported!")
    exit()
```
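For reference, the same endpoint can also be queried without LangChain by posting directly to TGI's `/generate` route. This is a minimal stdlib-only sketch using the same sampling parameters as above; the helper names are mine, and a server is assumed to be listening on localhost:8080:

```python
import json
from urllib.request import Request, urlopen


def build_generate_payload(prompt: str) -> dict:
    """Build a TGI /generate request body mirroring the parameters above."""
    return {
        "inputs": prompt,
        "parameters": {
            "max_new_tokens": 512,
            "top_k": 10,
            "top_p": 0.95,
            "typical_p": 0.95,
            "temperature": 0.01,
            "repetition_penalty": 1.03,
        },
    }


def generate(prompt: str, url: str = "http://localhost:8080/generate") -> str:
    """POST the prompt to a running text-generation-inference server."""
    req = Request(
        url,
        data=json.dumps(build_generate_payload(prompt)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urlopen(req) as resp:
        return json.loads(resp.read())["generated_text"]
```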
After installing text-generation-inference, you can start it on Windows using this batch file:
```bat
set model=bigscience/bloom-560m
set num_shard=1
set volume=C:/Factory/huggingface_testgen/data
docker run --gpus 0 --shm-size 1g -p 8080:80 -v %volume%:/data ghcr.io/huggingface/text-generation-inference:0.8 --model-id %model% --num-shard %num_shard%
```
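On Linux or macOS, an equivalent launch script would look like this (a sketch only: the volume path is an example, and `--gpus all` assumes the NVIDIA Container Toolkit is installed):

```shell
model=bigscience/bloom-560m
num_shard=1
volume=$PWD/data
docker run --gpus all --shm-size 1g -p 8080:80 -v $volume:/data \
  ghcr.io/huggingface/text-generation-inference:0.8 \
  --model-id $model --num-shard $num_shard
```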
It works with bigscience/bloom-560m and a few other models, but unfortunately no success yet with Camembert or other French models.
Would be glad to find a way to make it work.
On the positive side, it runs fast on the GPU (tested with an NVIDIA 3060).