Replies: 1 comment
I had a similar experience.
Hi there.
I'm using LLaMA 13B Q4_0 with the Python bindings in CPU mode.
I can't get good responses like I'm used to when working with text-generation-webui, so I think I'm doing something wrong.
This is how I instantiate the model:
llm = Llama(model_path="./llama.cpp/models/13B/ggml-model-q4_0.bin", seed = 0, n_ctx = 1200)
And this is how I try to get a response:
output = llm(prompt, max_tokens=64, stop=[Human_Name + ":", "\n"], echo=True)
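For reference, this is roughly the complete script I'm running, as a minimal self-contained version (Human_Name and the prompt are just placeholders here; my real prompt is below). As far as I can tell, the call returns an OpenAI-style completion dict, so I read the generated text out of output["choices"][0]["text"]:

from llama_cpp import Llama

# Placeholder values for this example; my real prompt is much longer.
Human_Name = "Human"
prompt = f"{Human_Name}: Hello, how are you?\nAssistant:"

# Same instantiation as above: fixed seed, 1200-token context window.
llm = Llama(
    model_path="./llama.cpp/models/13B/ggml-model-q4_0.bin",
    seed=0,
    n_ctx=1200,
)

# Stop generation at the next human turn or at a newline; echo=True returns
# the prompt together with the completion.
output = llm(prompt, max_tokens=64, stop=[Human_Name + ":", "\n"], echo=True)

# The generated text is in the "choices" list of the completion dict.
print(output["choices"][0]["text"])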
Technically everything works, but the quality of the responses is quite off and nowhere near where I want it to be.
It should behave like chat mode. Mostly I get out-of-context responses, sometimes empty responses, and sometimes gibberish.
Here's the full prompt I use in output = llm():
Can you help me?
Some more info about the model:
Response Meta: