
Custom local LLMs #389

Open · zboinek opened this issue Sep 13, 2023 · 37 comments
Labels: enhancement (New feature or request)

Comments

@zboinek

zboinek commented Sep 13, 2023

What about custom/private LLMs? Will there be an option to use some of LangChain's local integrations, such as llama.cpp?
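For context, a rough sketch of what this could look like through LangChain's llama.cpp wrapper (only an illustration of the intended use case; the model path and parameters below are placeholders):

    # Hedged sketch: LangChain's llama.cpp integration with a local model file.
    # Assumes langchain and llama-cpp-python are installed; the path is a placeholder.
    from langchain.llms import LlamaCpp

    llm = LlamaCpp(
        model_path="/path/to/llama-2-7b.Q4_K_M.gguf",
        n_ctx=2048,        # context window size
        temperature=0.7,
    )
    print(llm("Explain what a Jupyter kernel is in one sentence."))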

@zboinek zboinek added the enhancement New feature or request label Sep 13, 2023
@welcome

welcome bot commented Sep 13, 2023

Thank you for opening your first issue in this project! Engagement like this is essential for open source projects! 🤗

If you haven't done so already, check out Jupyter's Code of Conduct. Also, please try to follow the issue template as it helps other community members to contribute more effectively.
You can meet the other Jovyans by joining our Discourse forum. There is also an intro thread there where you can stop by and say Hi! 👋

Welcome to the Jupyter community! 🎉

@mtekman

mtekman commented Sep 18, 2023

I quite like the idea of GPT4All, but unfortunately it seems to be mostly a CPU-bound model (2 minutes for a single response using 36 cores!), and GPU support seems far away.

One fantastic idea I've seen bouncing around is to use an existing local LLM webserver that is compliant with the OpenAI API. The text-generation-webui project has actually implemented an openai-extension for a lot of their models.

I've tested it and it seems to work (5-second responses on a 12 GB VRAM GPU using their 'stable-vicuna-13B-GPTQ' model!), but commands like /generate and /learn are naturally not implemented.

Getting it to work

text-generation-webui

First Time Install

  micromamba create -n textgen python=3.10.9
  micromamba activate textgen
  ## Nvidia gpu stuff
  pip3 install torch torchvision torchaudio
  ## WebUI
  git clone https://github.com/oobabooga/text-generation-webui
  cd text-generation-webui
  pip install -r requirements.txt
  ## OpenAI extension
  cd extensions/openai
  pip install -r requirements.txt
  cd ../../
  python server.py --extensions openai --listen
  • Go to localhost:7860 → Models Tab
  • Put https://huggingface.co/TheBloke/stable-vicuna-13B-GPTQ into the download text box
  • Wait for it to download, then kill the server.

Normal Run

  micromamba activate textgen
  cd text-generation-webui
  ## Start the server, load the model, enable the OpenAI extension
  python server.py --model TheBloke_stable-vicuna-13B-GPTQ --extensions openai --listen
  • (you should see info about the OPENAI_BASE printed here)

(optional) Test that it's reachable

micromamba activate jupyterai  ## (optional, just ensure you have all the jupyter-ai libraries)
  • In Python:
      import os
      os.environ['OPENAI_API_KEY']="sk-111111111111111111111111111111111111111111111111"
      os.environ['OPENAI_API_BASE']="http://0.0.0.0:5001/v1"
      import openai

      response = openai.ChatCompletion.create(
        model="TheBloke_stable-vicuna-13B-GPTQ",
        messages = [{ 'role': 'system', 'content': "Answer in a consistent style." },
          {'role': 'user', 'content': "Teach me about patience."},
          {'role': 'assistant', 'content': "The river that carves the deepest valley flows from a modest spring; the grandest symphony originates from a single note; the most intricate tapestry begins with a solitary thread."},
          {'role': 'user', 'content': "Teach me about the ocean."},
        ]
      )
      text = response['choices'][0]['message']['content']
      print(text)

Jupyter AI

Run Jupyter

micromamba activate jupyterai  ## (optional, just ensure you have all the jupyter-ai libraries)
jupyter-lab
  • Click on the AI tab → Settings Wheel:

(Screenshot: Jupyter AI chat settings panel in JupyterLab)

  • (where API key is the sk-111111111111111111111111111111111111111111111111 from before)

After that, save and it should just work!

Jupyter AI with Stable-Vicuna

(Video: test.mp4; left: NVTOP showing realtime GPU usage, right: JupyterLab)

Limitations

  • No /generate, /learn
  • Untested %%ai magic, since I use R and R does not seem to load the %% stuff.
  • There is no way to specify the model in the OpenAI settings.

Would it be possible to create a new dropdown item in Language Model called OpenAI :: Custom that would enable model selection, similar to the Python example above?

As always, big thanks to the Jupyter team!

@krassowski
Member

@mtekman as per #190 (comment) I wonder if the proxy option could help in your use case.

@mtekman

mtekman commented Sep 19, 2023

@krassowski Hi, I've been reading through the comments in a few of those threads, and I guess I'm still a little bit lost on what the proxy option does compared to the base API URL?

@ishaan-jaff

Hi @mtekman @zboinek @krassowski I believe we can help with this issue. I’m the maintainer of LiteLLM https://github.com/BerriAI/litellm

TLDR:
We allow you to use any LLM as a drop-in replacement for gpt-3.5-turbo.
You can use our proxy server or spin up your own proxy server using LiteLLM

Usage

This calls the provider API directly

from litellm import completion
import os
## set ENV variables 
os.environ["OPENAI_API_KEY"] = "your-key" # 
messages = [{ "content": "Hello, how are you?","role": "user"}]

# openai call
response = completion(model="gpt-3.5-turbo", messages=messages)

# falcon call
response = completion(model="falcon-40b", messages=messages)

# ollama call
response = completion(model="ollama/llama2", messages=messages)
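For a local OpenAI-compatible server like the ones discussed above, litellm also appears to accept an api_base override (hedged sketch; the endpoint, key, and model name below are placeholders):

    # Sketch: routing litellm to a local OpenAI-compatible endpoint.
    from litellm import completion

    messages = [{"content": "Hello, how are you?", "role": "user"}]

    response = completion(
        model="openai/TheBloke_stable-vicuna-13B-GPTQ",  # "openai/" prefix = generic OpenAI-compatible route
        api_base="http://0.0.0.0:5001/v1",               # your local server
        api_key="sk-1111",                               # usually ignored by local servers
        messages=messages,
    )
    print(response.choices[0].message.content)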

@mtekman

mtekman commented Sep 25, 2023

@ishaan-jaff If I was to use ollama, would this then natively support /generate, /learn, /ask directives with responses that JupyterAI could understand?

Edit: I just tested ollama (though not with litellm, which appears to be a paid cloud-based service similar to OpenAI? Happy to remove this statement if I'm wrong), and it doesn't seem to work with Jupyter AI:

git clone git@github.com:jmorganca/ollama.git
cd ollama/ollama
./ollama serve & ./ollama run llama2
## Downloads 3 GB model and runs it at  http://localhost:11434/api

The problem is that the API offered there (which has a /generate endpoint) does not seem to be compliant with OpenAI's API, so I'm getting no responses from Jupyter.
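For comparison, a quick sketch of Ollama's native generate endpoint, whose request/response shape differs from OpenAI's chat API (assuming a local `ollama serve` is running on the default port):

    # Ollama's own API: note the "prompt"/"response" fields instead of OpenAI's "messages"/"choices".
    import requests

    resp = requests.post(
        "http://localhost:11434/api/generate",
        json={"model": "llama2", "prompt": "Why is the sky blue?", "stream": False},
    )
    print(resp.json()["response"])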

@easp

easp commented Oct 25, 2023

Ollama makes it very easy to run a variety of models locally on macOS, Windows (via WSL, and eventually natively), and Linux. It has automatic GPU support for Apple Silicon and NVIDIA (it uses llama.cpp under the covers). It provides its own API and is supported by LangChain.

It would be great to have support in jupyter-ai without having to set up an API proxy like LiteLLM -- no judgement on that project, it's just that it seems like this would be supported using the existing LangChain dependency.

@jamesjun
Contributor

Untested %%ai magic, since I use R and R does not seem to load the %% stuff.

Try the following to connect to a locally hosted model (I used text-generation-webui):

%%ai chatgpt -m {"api_base":"http://127.0.0.1:5000/v1"}

@surak

surak commented Jan 19, 2024

With regard to @mtekman's comment: many other tools expose a common provider called "OpenAI API" or equivalent. It uses the same "openai" Python package, with the difference that it's possible to specify the endpoint and other parameters.

For example, this is how https://continue.dev exposes these providers:

(Screenshot: continue.dev's list of OpenAI-compatible providers)

I am one of the Collaborators of FastChat, and we have it deployed in many places. This would be an invaluable addition to Jupyter-AI.

@adaaaaaa

adaaaaaa commented Feb 3, 2024

os.environ['OPENAI_API_BASE']="http://0.0.0.0:5001/v1"

The default base URL is http://0.0.0.0:5000; how can I change it?

@surak

surak commented Feb 3, 2024

os.environ['OPENAI_API_BASE']="http://0.0.0.0:5001/v1"

The default base URL is http://0.0.0.0:5000; how can I change it?

That seems to be the issue of this bug report and of #190. You can't.

@astrojuanlu

Ollama support is being tracked in #482, and LangChain SelfHostedHuggingFaceLLM in #343.

@adaaaaaa

adaaaaaa commented Feb 29, 2024

      os.environ['OPENAI_API_KEY']="sk-111111111111111111111111111111111111111111111111"

Is this setting necessary?

@imClumsyPanda

Hi @mtekman, I cannot even get the AI tab settings page to show as mentioned above when I run JupyterLab in an offline environment. Is there any way to solve this?

When I click the AI chat tab, it shows a warning icon and says “There seems to be a problem with the chat backend, please look at the JupyterLab server logs or contact your administrator to correct this problem.”

Jupyter Lab version: 4.1.2
Jupyter AI version: 2.18.1

@mtekman

mtekman commented Jul 2, 2024

@imClumsyPanda

Weird, it works fine for me -- though I'm using a newer Jupyter Lab.

## conda or mamba or micromamba, all the same
micromamba create -y -c conda-forge -n jupyterlabai \
  jupyterlab=4.2.3 jupyter-ai=2.18.1

## Activate the environment and run it
micromamba activate jupyterlabai
jupyter-lab

The chat window tab should appear in the interface

@imClumsyPanda

imClumsyPanda commented Jul 2, 2024


@mtekman I'm not sure if it's because I'm in an offline environment.

And I installed notebook and Jupyter-ai through pip.

@mtekman

mtekman commented Jul 2, 2024

@imClumsyPanda

I'm not sure if it's because I'm in an offline environment.

JupyterLab by default runs on localhost (is that what you mean by offline?).

And I installed notebook and Jupyter-ai through pip.

pip might be fighting your system Python libraries, depending on how your PATH is defined.

To get around this, either try creating a mamba environment as defined in my last comment, OR create a virtualenv using just Python:

 ## create a new env
virtualenv jupyteraivenv 

## source "activate" it
source jupyteraivenv/bin/activate 

## Install the right versions
pip install jupyterlab==4.2.3 jupyter-ai==2.18.1

## Run it
jupyter-lab

Double check which jupyter-lab is being called, because maybe your system has one installed globally.

whereis jupyter-lab
## should give you a path like:
## /home/blah/blah/jupyteraivenv/bin/jupyter-lab
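The same check can be done from Python if whereis is not available (e.g. on Windows); this uses only the standard library:

    import shutil
    import sys

    print(sys.executable)               # the Python interpreter currently in use
    print(shutil.which("jupyter-lab"))  # should resolve inside jupyteraivenv/bin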

@imClumsyPanda

@mtekman I mean I'm running jupyterlab and Jupyter-ai in an environment without internet connection.

I'll try to check again tomorrow, thanks for the reply!

@imClumsyPanda

@mtekman I've tried creating a new conda env, pip-installed jupyterlab 4.2.3 and jupyter-ai 2.18.1, and made sure the jupyter-lab command points to the file in the newly created env.

But I still get the same error message, with an error icon that says "There seems to be a problem with the chat backend, please look at the JupyterLab server logs or contact your administrator to correct the problem".

And this time I've noticed that there are error messages in the cmd window, which say [W 2024-07-03 10:10:10 ServerApp] 404 GET /api/ai/chats referer=None or [W 2024-07-03 10:10:10 ServerApp] 404 GET /api/ai/chats?token=[secret] referer=None.

I'll check if I can change the AI chat settings through the source code to solve this.

@mtekman

mtekman commented Jul 3, 2024 via email

@krassowski
Member

@imClumsyPanda most likely the server extension of jupyter-ai fails to load for some reason specific to your environment (e.g. a conflicting version of a dependency). You would need to look at the initial portion of the log (maybe with the --debug option). Also, checking the output of pip check and jupyter server extension list can be helpful.
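For a quick version check from within the same environment (complementary to pip check; the package names below are the usual suspects and may need adjusting):

    # Check that jupyter-ai and its immediate dependencies resolve in this environment.
    from importlib.metadata import version, PackageNotFoundError

    for pkg in ("jupyter-ai", "jupyter-ai-magics", "jupyterlab", "jupyter-server", "langchain"):
        try:
            print(f"{pkg}=={version(pkg)}")
        except PackageNotFoundError:
            print(f"{pkg}: not found")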

@dimm0

dimm0 commented Jul 9, 2024

Hi all!
I'm lost among the multiple issues tracking this problem.
Is there currently a way to point jupyter-ai at a custom OpenAI-compatible API URL and specify an arbitrary model to use?
(I'm running Llama 3 via h2oGPT / vLLM.)

@mtekman

mtekman commented Jul 9, 2024

@dimm0 #389 (comment)

@dimm0

dimm0 commented Jul 9, 2024

I saw it...

No /generate, /learn
Untested %%ai magic, since I use R and R does not seem to load the %% stuff.
There is no way to specify the model in the OpenAI settings.
Would it be possible to create a new dropdown item in Language Model called OpenAi :: Custom that would enable model selection, similar to the python example above?

(from that post)

It seems this still hasn't been addressed.

@mtekman

mtekman commented Jul 9, 2024

You run your custom model and then point to it via the "Base API URL", choosing from the "Language Model" dropdown an arbitrary model that your custom model should be API-compatible with.

@dimm0

dimm0 commented Jul 9, 2024

It keeps saying "invalid api key". I tried it with a model that has no API key and with one whose API key I know.
But how does choosing the right model work? Will it query the list of available models from the endpoint?

@mtekman

mtekman commented Jul 9, 2024

If you're using text-generation-webui, the API key seems to be hardcoded: #389 (comment)

But how does choosing the right model work?

My understanding of it is that you choose the OpenAI model from the dropdown that has all the endpoints you want. A little bit of trial and error is needed here, and nothing will work 100%.

Will it query the list of available models from the endpoint?

No, you literally offer a specific model at some address, and in the "Language Model" section you pick the closest OpenAI model that you think will be compatible with the endpoints for your model.

It will not consult the OpenAI servers, since you've overridden this with the "Base API URL" setting.
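Roughly speaking, the settings then behave like a plain OpenAI client pointed at a different server: the selected model name is simply sent to whatever the Base API URL resolves to. A rough Python equivalent (openai >= 1.0 client; the URL, key, and model name below are placeholders):

    from openai import OpenAI

    # Any OpenAI-compatible local server; it receives the model name verbatim.
    client = OpenAI(base_url="http://0.0.0.0:5001/v1", api_key="sk-1111")
    resp = client.chat.completions.create(
        model="TheBloke_stable-vicuna-13B-GPTQ",
        messages=[{"role": "user", "content": "Say hello."}],
    )
    print(resp.choices[0].message.content)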

@dimm0

dimm0 commented Jul 9, 2024

I'm not using text-generation-webui; I'm using https://github.com/h2oai/h2ogpt, which runs Llama 3 via vLLM for me. It exposes the standard OpenAI-compatible interface on an HTTPS port, and I can connect to it from multiple OpenAI-compatible tools. The model is meta-llama/Meta-Llama-3-70B-Instruct. I can enable an API key or use it without any key. How can I add one to jupyter-ai?

@mtekman

mtekman commented Jul 9, 2024

Hmm, tricky. Maybe Jupyter is expecting a specifically formatted API key? Perhaps try setting the API key in your custom model to that ridiculous sk-11111* one

@dimm0

dimm0 commented Jul 9, 2024

That key is specific to the text-generation-webui service.

So is there any issue with creating the common provider proposed in #389 (comment)?

@DanielCastroBosch

Hi all,
Here we have an internal LLM server. It uses the GPT-3.5 Turbo model.
In the network layer that I work in, we don't have access to the ChatGPT internet servers; we must use our internal server.
I tried to change the base API URL parameter to https:///v1/public/gpt3/chats/messages, but it is not working. This URL works when I use curl to send messages to the server.
It returns a NotFoundError: Error code: 404 - {'statusCode': 404, 'message': 'Resource not found'} error.
Can anybody please help me? Is there another solution for me?
For example, developing some code to add a new item to the Language Model dropdown?

@krassowski
Member

If it works from curl but not from the browser, you may need to configure the origin check on the Jupyter server.
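For reference, a minimal sketch of what that could look like in jupyter_server_config.py (adjust the origin to your deployment; these are standard Jupyter Server traitlets, not jupyter-ai settings):

    # jupyter_server_config.py
    c = get_config()  # noqa: F821 (provided by Jupyter when the config is loaded)

    # Allow requests whose Origin header matches your internal host.
    c.ServerApp.allow_origin = "https://your-jupyter-host.example.com"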

@DanielCastroBosch

@krassowski ... how can I do that?
Thanks in advance

@jhgoebbert

FYI:
You can find our implementation of a custom model provider (using the OpenAI API) that adds Blablador to jupyter-ai here:
https://github.com/FZJ-JSC/jupyter-ai-blablador
To be honest, it is just a tiny bit more than the template and still under development, but might be of help for someone.

@DanielCastroBosch

Hi everybody,

I solved my problem, and I will post the solution here for an internal local LLM server. I am a beginner, so any help will be appreciated. In my case, the domain user is authenticated with MS Entra. Thanks to your help I managed to solve it.

First I built what I call the model interface:

# mymodule.py

from typing import Any, Dict, Iterator, List, Mapping, Optional
from langchain_core.callbacks.manager import CallbackManagerForLLMRun
from langchain_core.language_models.llms import LLM
from langchain_core.outputs import GenerationChunk
import urllib3
import requests
import json

class serverLLM(LLM):
    """A custom chat model that interacts with an internal LLM server.

    When contributing an implementation to LangChain, carefully document
    the model including the initialization parameters, include
    an example of how to initialize the model and include any relevant
    links to the underlying models documentation or API.

    Example:

        .. code-block:: python


            result = model.invoke([HumanMessage(content="hello")])
            result = model.batch([[HumanMessage(content="hello")],
                                 [HumanMessage(content="world")]])
    """

    token_url = "<MS ENTRA APP link>"
    client_id = "<CLIENT ID>"
    client_secret = "<CLIENT SECRET>"
    scope = "<LLM SERVER API SCOPE>"
    TryoutBaseURL = "<LLM SERVER BASE API URL>"
    proxy = { 'http': '<HTTP PROXY>', 'https': '<HTTPS PROXY>' }
    urllib3.disable_warnings(urllib3.exceptions.InsecureRequestWarning)

    def _get_access_token(self):
        data = {
            "client_id": self.client_id,
            "scope": self.scope,
            "client_secret": self.client_secret,
            "grant_type": "client_credentials",
        }

        response = requests.post(self.token_url, data=data, verify=False, proxies=self.proxy)
        response_data = response.json()
        return response_data.get("access_token")

    def _call(
        self,
        prompt: str,
        stop: Optional[List[str]] = None,
        run_manager: Optional[CallbackManagerForLLMRun] = None,
        **kwargs: Any,
    ) -> str:
        """
        Args:
            prompt: The prompt to generate from.
            stop: Stop words to use when generating. Model output is cut off at the
                first occurrence of any of the stop substrings.
                If stop tokens are not supported consider raising NotImplementedError.
            run_manager: Callback manager for the run.
            **kwargs: Arbitrary additional keyword arguments. These are usually passed
                to the model provider API call.

        Returns:
            The model output as a string. Actual completions SHOULD NOT include the prompt.
        """
        
        msg=""

        if stop is not None:
            raise ValueError("stop kwargs are not permitted.")
        
        # Get the access token
        access_token = self._get_access_token()
        # Set up the headers
        headers = {
            "Content-Type": "application/json",
            "Authorization": f"Bearer {access_token}",
            "ocp-Apim-Subscription-Key": "<LLM SERVER API KEY>",
        }

        # Set up the payload
        query_data = {
            "messages": [
                {
                    "role": "user",
                    "content": f"{prompt}",
                }
            ],
            "model": "gpt3",
            "temperature": 0.5,
        }

        # Send POST request
        response = requests.post(self.TryoutBaseURL, headers=headers, json=query_data, verify=False, proxies=self.proxy)
        # print(f"Status code: {response.status_code}")
        # Check if the request was successful
        if response.status_code == 200:
            # Parse the JSON response
            result = response.json()
            # print(f"JSON: {result}")

            # Collect the assistant's response
            for message in result:
                if message["role"] == "assistant":
                    msg = msg + message["content"]
        else:
            msg = f"error: {response.status_code}:" + response.text

        # print(f"MESSAGE: {msg}")
        return msg


    @property
    def _identifying_params(self) -> Dict[str, Any]:
        """Return a dictionary of identifying parameters."""
        return {
            # The model name allows users to specify custom token counting
            # rules in LLM monitoring applications (e.g., in LangSmith users
            # can provide per token pricing for their model and monitor
            # costs for the given LLM.)
            "model_name": "myModel",
        }

    @property
    def _llm_type(self) -> str:
        """Get the type of language model used by this chat model. Used for logging purposes only."""
        return "MyModel"
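
For anyone adapting this, the class can be smoke-tested on its own before wiring it into jupyter-ai (hedged example; the placeholders above must be filled in first):

    # Quick standalone test of the custom LangChain LLM.
    from mymodule import serverLLM

    llm = serverLLM()
    print(llm.invoke("Hello, which model are you?"))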

Later I built the model provider for the jupyter-ai assistant:

from jupyter_ai_magics import BaseProvider
from mymodule import serverLLM

class modelProvider(BaseProvider, serverLLM):

    id = "<id>"
    name = "<NAME>"
    model_id_key = "model"
    model_id = "<model id>"
    models = ["<model id>"]

    def __init__(self, **kwargs):
        super().__init__(**kwargs)

Later I built the packages and uploaded them to our internal server.
It still needs improvement, but it is a good start.
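
If it is useful to others: as far as I understand, jupyter-ai discovers custom providers through the jupyter_ai.model_providers entry point group, so the package can declare something like the following (sketch of a setup.py; the names are placeholders and should match your own module and class):

    # setup.py (sketch)
    from setuptools import setup

    setup(
        name="my-jupyter-ai-provider",
        version="0.0.1",
        py_modules=["mymodule", "myprovider"],  # "myprovider" holds the modelProvider class (placeholder name)
        install_requires=["jupyter_ai", "langchain-core", "requests"],
        entry_points={
            "jupyter_ai.model_providers": [
                "my-provider = myprovider:modelProvider",
            ],
        },
    )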

@cboettig

There's a lot going on in this still-open thread, but it may be helpful to others to note that Ollama integration was added in #646.
