Custom local LLMs #389
Thank you for opening your first issue in this project! Engagement like this is essential for open source projects! 🤗
I quite like the idea of GPT4ALL, but unfortunately it seems to be a mostly CPU model (2 minutes for a single response using 36 cores!) and a GPU model is far away.

One fantastic idea I've seen bouncing around is to use an existing local LLM webserver that is compliant with the OpenAI API. The text-generation-webui project has actually implemented an openai extension for a lot of their models. I've tested it and it seems to work (5-second responses on a 12 GB VRAM card using their 'stable-vicuna-13B-GPTQ' model!), but commands like …

Getting it to work

text-generation-webui

First Time Install

```bash
micromamba create -n textgen python=3.10.9
micromamba activate textgen
## Nvidia gpu stuff
pip3 install torch torchvision torchaudio
## WebUI
git clone https://github.com/oobabooga/text-generation-webui
cd text-generation-webui
pip install -r requirements.txt
## OpenAI extension
cd extensions/openai
pip install -r requirements.txt
cd ../../
python server.py --extensions openai --listen
```

Normal Run

```bash
micromamba activate textgen
cd text-generation-webui
## Start the server, load the model, enable the OpenAI extension
python server.py --model TheBloke_stable-vicuna-13B-GPTQ --extensions openai --listen
```

(optional) Test that it's reachable

```bash
micromamba activate jupyterai  ## optional, just to ensure you have all the jupyter-ai libraries
```

```python
import os
os.environ['OPENAI_API_KEY']="sk-111111111111111111111111111111111111111111111111"
os.environ['OPENAI_API_BASE']="http://0.0.0.0:5001/v1"
import openai
response = openai.ChatCompletion.create(
    model="TheBloke_stable-vicuna-13B-GPTQ",
    messages=[
        {'role': 'system', 'content': "Answer in a consistent style."},
        {'role': 'user', 'content': "Teach me about patience."},
        {'role': 'assistant', 'content': "The river that carves the deepest valley flows from a modest spring; the grandest symphony originates from a single note; the most intricate tapestry begins with a solitary thread."},
        {'role': 'user', 'content': "Teach me about the ocean."},
    ]
)
text = response['choices'][0]['message']['content']
print(text)
```

Jupyter AI

Run Jupyter

```bash
micromamba activate jupyterai  ## optional, just to ensure you have all the jupyter-ai libraries
jupyter-lab
```

After that, save and it should just work!

Jupyter AI with Stable-Vicuna (demo video test.mp4: left, NVTOP showing realtime GPU usage; right, JupyterLab)

Limitations
Would it be possible to create a new dropdown item in …?

As always, big thanks to the Jupyter team!
@mtekman as per #190 (comment) I wonder if the proxy option could help in your use case.
@krassowski Hi, I've been reading through the comments in a few of those threads, and I guess I'm still a little bit lost on what the proxy option does compared to the base API URL?
Hi @mtekman @zboinek @krassowski, I believe we can help with this issue. I'm the maintainer of LiteLLM: https://github.com/BerriAI/litellm

TLDR: …

Usage

This calls the provider API directly:

```python
from litellm import completion
import os
## set ENV variables
os.environ["OPENAI_API_KEY"] = "your-key" #
messages = [{ "content": "Hello, how are you?","role": "user"}]
# openai call
response = completion(model="gpt-3.5-turbo", messages=messages)
# falcon call
response = completion(model="falcon-40b", messages=messages)
# ollama call
response = completion(model="ollama/llama2", messages=messages)
```
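As a sketch of how this could also target the locally hosted, OpenAI-compatible server from the earlier comment (assuming LiteLLM's api_base parameter and its openai/ prefix for OpenAI-compatible backends; the URL, key, and model name below are the ones from that comment, not anything LiteLLM ships with):

```python
from litellm import completion

# Hypothetical: route a local OpenAI-compatible server (e.g. the
# text-generation-webui openai extension on port 5001) through LiteLLM.
messages = [{"content": "Teach me about the ocean.", "role": "user"}]
response = completion(
    model="openai/TheBloke_stable-vicuna-13B-GPTQ",  # "openai/" marks an OpenAI-compatible backend
    messages=messages,
    api_base="http://0.0.0.0:5001/v1",
    api_key="sk-111111111111111111111111111111111111111111111111",
)
print(response.choices[0].message.content)
```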
@ishaan-jaff If I was to use ollama, would this then natively support …?

Edit: I just tested ollama (though not with litellm, which appears to be a paid cloud-based model similar to OpenAI? Happy to remove this statement if I'm wrong), and it doesn't seem to work with Jupyter AI:

```bash
git clone git@github.com:jmorganca/ollama.git
cd ollama/ollama
./ollama serve & ./ollama run llama2
## Downloads a 3 GB model and runs it at http://localhost:11434/api
```

The problem is that the API offered there (which has a …
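The API referenced above is Ollama's own /api/generate schema, which is presumably the mismatch being described: it is not OpenAI's /v1/chat/completions format. A minimal sketch of calling it directly, assuming the default port and the non-streaming form of the endpoint:

```python
import requests

# Ollama's native API uses its own JSON schema, not OpenAI's /v1/chat/completions.
resp = requests.post(
    "http://localhost:11434/api/generate",
    json={"model": "llama2", "prompt": "Teach me about the ocean.", "stream": False},
)
print(resp.json()["response"])
```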
Ollama makes it very easy to run a variety of models locally on macOS, Windows (via WSL, and eventually natively) and Linux. It has automatic GPU support for Apple Silicon and NVIDIA (it's using llama.cpp under the covers). It provides its own API and is supported by LangChain. It would be great to have support in jupyter-ai without having to set up an API proxy like litellm -- no judgement on that project, it's just that this seems like something that would already be supported through the existing LangChain dependency.
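A minimal sketch of the LangChain integration mentioned here, assuming a local `ollama serve` on the default port and the community Ollama wrapper (its import path has moved between LangChain releases):

```python
from langchain_community.llms import Ollama

# Talks to the local Ollama server at http://localhost:11434 by default.
llm = Ollama(model="llama2")
print(llm.invoke("Teach me about the ocean."))
```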
Try the following to connect to a locally hosted model (I used textgen-web-ui):
With regard to @mtekman's comment, many other tools have a common provider called "OpenAI API" or equivalent. It uses the same "openai" Python package, with the difference that it's possible to specify the endpoint and other parameters. For example, this is how https://continue.dev exposes these providers: …

I am one of the collaborators on FastChat, and we have it deployed in many places. This would be an invaluable addition to Jupyter-AI.
The default base URL is http://0.0.0.0:5000; how can I change it?
That seems to be the subject of this bug report and of #190. You can't.
Is the setting necessary?
Hi @mtekman, I cannot even make the AI settings tab page show up as mentioned above when I run JupyterLab in an offline environment. Is there any way to solve this? When I click the AI chat tab, it shows a warning icon and says "There seems to be a problem with the chat backend, please look at the JupyterLab server logs or contact your administrator to correct this problem." JupyterLab version: 4.1.2
Weird, it works fine for me -- though I'm using a newer Jupyter Lab.
The chat window tab should appear in the interface.
@mtekman I'm not sure if it's because I'm in an offline environment. And I installed notebook and jupyter-ai through pip.
jupyterlab by default is run on localhost (is that what you mean by offline?)
pip might be fighting your system Python libraries depending on how your … To get around this, either try creating a mamba environment as defined in my last comment, OR create a virtualenv using just python (python -m venv).
Double check which jupyter-lab is being called, because maybe your system has one installed globally.
@mtekman I mean I'm running JupyterLab and jupyter-ai in an environment without an internet connection. I'll try to check again tomorrow, thanks for the reply!
@mtekman I've tried to create a new conda env and pip-installed jupyterlab 4.2.3 and jupyter-ai 1.18.1, and made sure the jupyter-lab command points to the one in the newly created env. But I still get the same error message, with an error icon that says "There seems to be a problem with the chat backend, please look at the JupyterLab server logs or contact your administrator to correct the problem". This time I've noticed that there are error messages in the cmd window, which say … I'll check if I can change the AI chat settings through the source code to solve this.
Check your firewall (e.g. ufw disable); it could be that some internal connections are blocked?
@imClumsyPanda most likely the server extension of …
Hi all!
I saw it...
(from that post) It seems to still not be addressed.
You run your custom model and then point to it via the "Base API URL", choosing an arbitrary model from the "Language Model" selection that your custom model should be API-compatible with.
It keeps saying "invalid api key". I tried it with a model that has no API key, and with one whose API key I know.
If you're using text-generation-ui, the API key seems to be hardcoded: #389 (comment)
My understanding of it is that you choose the OpenAI model from the dropdown that has all the endpoints you want. A little bit of trial and error is needed here, and nothing will work 100%.
No, you literally offer a specific model at some address, and in the "Language Model" section you pick the closest OpenAI model that you think will be compatible with the endpoints for your model. It will not consult the OpenAI servers, since you've overridden this with the "Base API URL" setting.
I'm not using text-generation-webui, I'm using https://github.com/h2oai/h2ogpt, which runs llama3 via vLLM for me. It exposes the standard OpenAI-compatible interface for me on an HTTPS port, and I can connect to it from multiple OpenAI-compatible tools. The model is …
Hmm, tricky. Maybe Jupyter is expecting a specifically formatted API key? Perhaps try setting the API key in your custom model to that ridiculous sk-11111* one.
That key is specific to the text-generation-webui service. So is there any issue in creating the common provider proposed in #389 (comment)?
Hi all,
If it works from curl but not from the browser, you may need to configure the origin check on the Jupyter server.
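A rough sketch of what that configuration can look like, assuming the standard jupyter_server settings in jupyter_server_config.py (the origin value below is a placeholder):

```python
# jupyter_server_config.py -- placeholder values; adjust to how the server is exposed.
c.ServerApp.allow_origin = "https://jupyter.example.internal"
# or match a pattern instead:
# c.ServerApp.allow_origin_pat = r"https://.*\.example\.internal"
```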
@krassowski ... how can I do that?
FYI:
Hi everybody, I solved my problem and I will post the solution here for an internal local LLM server. I am a beginner, so any help will be appreciated. In my case, the domain user is authenticated with MS Entra. Thanks to your help I managed to solve it. First I built what I call the model interface:
Later I built the model provider for the jupyter-ai assistant:
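The author's actual code is not reproduced above. Purely as a hypothetical illustration, a provider for an internal OpenAI-compatible server might follow jupyter-ai's documented custom model provider pattern along these lines (all names, model ids, and URLs below are placeholders):

```python
# Hypothetical sketch, not the author's code. Assumes jupyter-ai's documented
# custom-provider pattern: subclass BaseProvider together with a LangChain chat
# model, then register the class under the "jupyter_ai.model_providers"
# entry point group in the package metadata.
from jupyter_ai_magics import BaseProvider
from langchain_openai import ChatOpenAI


class InternalLLMProvider(BaseProvider, ChatOpenAI):
    id = "internal-llm"
    name = "Internal LLM"
    models = ["llama3-vllm"]        # placeholder model id shown in the dropdown
    model_id_key = "model_name"     # forwarded to ChatOpenAI's model_name field

    def __init__(self, **kwargs):
        super().__init__(
            openai_api_base="https://llm.example.internal/v1",  # placeholder endpoint
            openai_api_key="EMPTY",  # many OpenAI-compatible servers ignore the key
            **kwargs,
        )
```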
Later I built the packages and uploaded them to our internal server.
There's a lot going on in this still-open thread, but it may be helpful to others to note that Ollama integration was added in #646.
What about custom/private LLMs? Will there be an option to use some of LangChain's local features, like llama.cpp?