
Custom local LLMs #389

Open · zboinek opened this issue Sep 13, 2023 · 37 comments
Labels: enhancement (New feature or request)

Comments

@zboinek

zboinek commented Sep 13, 2023

What about custom/private LLMs? Will there be an option to use some of LangChain's local integrations, such as llama.cpp?
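For context, a rough sketch of what this could look like through LangChain's llama.cpp wrapper (only an illustration of the intended use case; the model path and parameters below are placeholders):

    # Hedged sketch: LangChain's llama.cpp integration with a local model file.
    # Assumes langchain and llama-cpp-python are installed; the path is a placeholder.
    from langchain.llms import LlamaCpp

    llm = LlamaCpp(
        model_path="/path/to/llama-2-7b.Q4_K_M.gguf",
        n_ctx=2048,        # context window size
        temperature=0.7,
    )
    print(llm("Explain what a Jupyter kernel is in one sentence."))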

@zboinek zboinek added the enhancement New feature or request label Sep 13, 2023
@welcome

welcome bot commented Sep 13, 2023

Thank you for opening your first issue in this project! Engagement like this is essential for open source projects! 🤗

If you haven't done so already, check out Jupyter's Code of Conduct. Also, please try to follow the issue template as it helps other community members to contribute more effectively.
You can meet the other Jovyans by joining our Discourse forum. There is also an intro thread there where you can stop by and say Hi! 👋

Welcome to the Jupyter community! 🎉

@mtekman

mtekman commented Sep 18, 2023

I quite like the idea of GPT4All, but unfortunately it seems to be mostly a CPU-bound model (2 minutes for a single response using 36 cores!), and GPU support seems far away.

One fantastic idea I've seen bouncing around is to use an existing local LLM webserver that is compliant with the OpenAI API. The text-generation-webui project has actually implemented an openai-extension for a lot of their models.

I've tested it and it seems to work (5-second responses on a 12 GB VRAM GPU using their 'stable-vicuna-13B-GPTQ' model!), but commands like /generate and /learn are naturally not implemented.

Getting it to work

text-generation-webui

First Time Install

  micromamba create -n textgen python=3.10.9
  micromamba activate textgen
  ## Nvidia gpu stuff
  pip3 install torch torchvision torchaudio
  ## WebUI
  git clone https://github.com/oobabooga/text-generation-webui
  cd text-generation-webui
  pip install -r requirements.txt
  ## OpenAI extension
  cd extensions/openai
  pip install -r requirements.txt
  cd ../../
  python server.py --extensions openai --listen
  • Go to localhost:7860 → Models Tab
  • Put https://huggingface.co/TheBloke/stable-vicuna-13B-GPTQ into the download text box
  • Wait for it to download, then kill the server.

Normal Run

  micromamba activate textgen
  cd text-generation-webui
  ## Start the server, load the model, enable the OpenAI extension
  python server.py --model TheBloke_stable-vicuna-13B-GPTQ --extensions openai --listen
  • (you should see info about the OPENAI_BASE printed here)

(optional) Test that it's reachable

micromamba activate jupyterai  ## (optional, just ensure you have all the jupyter-ai libraries)
  • In Python:
      import os
      os.environ['OPENAI_API_KEY']="sk-111111111111111111111111111111111111111111111111"
      os.environ['OPENAI_API_BASE']="http://0.0.0.0:5001/v1"
      import openai

      response = openai.ChatCompletion.create(
        model="TheBloke_stable-vicuna-13B-GPTQ",
        messages = [{ 'role': 'system', 'content': "Answer in a consistent style." },
          {'role': 'user', 'content': "Teach me about patience."},
          {'role': 'assistant', 'content': "The river that carves the deepest valley flows from a modest spring; the grandest symphony originates from a single note; the most intricate tapestry begins with a solitary thread."},
          {'role': 'user', 'content': "Teach me about the ocean."},
        ]
      )
      text = response['choices'][0]['message']['content']
      print(text)

Jupyter AI

Run Jupyter

micromamba activate jupyterai  ## (optional, just ensure you have all the jupyter-ai libraries)
jupyter-lab
  • Click on the AI tab → Settings Wheel:

(Screenshot: Jupyter AI chat settings panel in JupyterLab)

  • (where API key is the sk-111111111111111111111111111111111111111111111111 from before)

After that, save and it should just work!

Jupyter AI with Stable-Vicuna

(Video: test.mp4; left: NVTOP showing realtime GPU usage, right: JupyterLab)

Limitations

  • No /generate, /learn
  • Untested %%ai magic, since I use R and R does not seem to load the %% stuff.
  • There is no way to specify the model in the OpenAI settings.

Would it be possible to create a new dropdown item in Language Model called OpenAI :: Custom that would enable model selection, similar to the Python example above?

As always, big thanks to the Jupyter team!

@krassowski
Member

@mtekman as per #190 (comment) I wonder if the proxy option could help in your use case.

@mtekman

mtekman commented Sep 19, 2023

@krassowski Hi, I've been reading through the comments in a few of those threads, and I guess I'm still a little bit lost on what the proxy option does compared to the base API URL?

@ishaan-jaff

Hi @mtekman @zboinek @krassowski I believe we can help with this issue. I’m the maintainer of LiteLLM https://github.com/BerriAI/litellm

TLDR:
We allow you to use any LLM as a drop-in replacement for gpt-3.5-turbo.
You can use our proxy server or spin up your own proxy server using LiteLLM

Usage

This calls the provider API directly

from litellm import completion
import os
## set ENV variables 
os.environ["OPENAI_API_KEY"] = "your-key" # 
messages = [{ "content": "Hello, how are you?","role": "user"}]

# openai call
response = completion(model="gpt-3.5-turbo", messages=messages)

# falcon call
response = completion(model="falcon-40b", messages=messages)

# ollama call
response = completion(model="ollama/llama2", messages=messages)
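For a local OpenAI-compatible server like the ones discussed above, litellm also appears to accept an api_base override (hedged sketch; the endpoint, key, and model name below are placeholders):

    # Sketch: routing litellm to a local OpenAI-compatible endpoint.
    from litellm import completion

    messages = [{"content": "Hello, how are you?", "role": "user"}]

    response = completion(
        model="openai/TheBloke_stable-vicuna-13B-GPTQ",  # "openai/" prefix = generic OpenAI-compatible route
        api_base="http://0.0.0.0:5001/v1",               # your local server
        api_key="sk-1111",                               # usually ignored by local servers
        messages=messages,
    )
    print(response.choices[0].message.content)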

@mtekman

mtekman commented Sep 25, 2023

@ishaan-jaff If I was to use ollama, would this then natively support /generate, /learn, /ask directives with responses that JupyterAI could understand?

Edit: I just tested ollama (though not with litellm, which appears to be a paid cloud-based service similar to OpenAI? Happy to remove this statement if I'm wrong), and it doesn't seem to work with Jupyter AI:

git clone git@github.com:jmorganca/ollama.git
cd ollama/ollama
./ollama serve & ./ollama run llama2
## Downloads 3 GB model and runs it at  http://localhost:11434/api

The problem is that the API offered there (which has a /generate endpoint) does not seem to be compliant with OpenAI's API, so I'm getting no responses from Jupyter.
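For comparison, a quick sketch of Ollama's native generate endpoint, whose request/response shape differs from OpenAI's chat API (assuming a local `ollama serve` is running on the default port):

    # Ollama's own API: note the "prompt"/"response" fields instead of OpenAI's "messages"/"choices".
    import requests

    resp = requests.post(
        "http://localhost:11434/api/generate",
        json={"model": "llama2", "prompt": "Why is the sky blue?", "stream": False},
    )
    print(resp.json()["response"])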

@easp

easp commented Oct 25, 2023

Ollama makes it very easy to run a variety of models locally on macOS, Windows (via WSL, and eventually natively), and Linux. It has automatic GPU support for Apple Silicon and NVIDIA (it uses llama.cpp under the covers). It provides its own API and is supported by LangChain.

It would be great to have support in jupyter-ai without having to set up an API proxy like LiteLLM -- no judgement on that project, it's just that it seems like this would be supported using the existing LangChain dependency.

@jamesjun
Contributor

Untested %%ai magic, since I use R and R does not seem to load the %% stuff.

Try the following to connect to a locally hosted model (I used text-generation-webui):

%%ai chatgpt -m {"api_base":"http://127.0.0.1:5000/v1"}

@surak

surak commented Jan 19, 2024

With regard to @mtekman's comment: many other tools expose a common provider called "OpenAI API" or equivalent. It uses the same "openai" Python package, with the difference that it's possible to specify the endpoint and other parameters.

For example, this is how https://continue.dev exposes these providers:

(Screenshot: continue.dev's list of OpenAI-compatible providers)

I am one of the Collaborators of FastChat, and we have it deployed in many places. This would be an invaluable addition to Jupyter-AI.

@adaaaaaa

adaaaaaa commented Feb 3, 2024

os.environ['OPENAI_API_BASE']="http://0.0.0.0:5001/v1"

The default base URL is http://0.0.0.0:5000; how can I change it?

@surak

surak commented Feb 3, 2024

os.environ['OPENAI_API_BASE']="http://0.0.0.0:5001/v1"

The default base URL is http://0.0.0.0:5000; how can I change it?

That seems to be the issue of this bug report and of #190. You can't.

@astrojuanlu

Ollama support is being tracked in #482, and LangChain SelfHostedHuggingFaceLLM in #343.

@adaaaaaa

adaaaaaa commented Feb 29, 2024

      os.environ['OPENAI_API_KEY']="sk-111111111111111111111111111111111111111111111111"

Is this setting necessary?

@imClumsyPanda

Hi @mtekman, I cannot even get the AI tab settings page to show as mentioned above when I run JupyterLab in an offline environment. Is there any way to solve this?

When I click the AI chat tab, it shows a warning icon and says “There seems to be a problem with the chat backend, please look at the JupyterLab server logs or contact your administrator to correct this problem.”

Jupyter Lab version: 4.1.2
Jupyter AI version: 2.18.1

@mtekman

mtekman commented Jul 2, 2024

@imClumsyPanda

Weird, it works fine for me -- though I'm using a newer Jupyter Lab.

## conda or mamba or micromamba, all the same
micromamba create -y -c conda-forge -n jupyterlabai \
  jupyterlab=4.2.3 jupyter-ai=2.18.1

## Activate the environment and run it
micromamba activate jupyterlabai
jupyter-lab

The chat window tab should appear in the interface

@imClumsyPanda

imClumsyPanda commented Jul 2, 2024


@mtekman I'm not sure if it's because I'm in an offline environment.

And I installed notebook and Jupyter-ai through pip.

@mtekman

mtekman commented Jul 2, 2024

@imClumsyPanda

I'm not sure if it's because I'm in an offline environment.

JupyterLab by default runs on localhost (is that what you mean by offline?).

And I installed notebook and Jupyter-ai through pip.

pip might be fighting your system Python libraries, depending on how your PATH is defined.

To get around this, either try creating a mamba environment as defined in my last comment, OR create a virtualenv using just Python:

 ## create a new env
virtualenv jupyteraivenv 

## source "activate" it
source jupyteraivenv/bin/activate 

## Install the right versions
pip install jupyterlab==4.2.3 jupyter-ai==2.18.1

## Run it
jupyter-lab

Double check which jupyter-lab is being called, because maybe your system has one installed globally.

whereis jupyter-lab
## should give you a path like:
## /home/blah/blah/jupyteraivenv/bin/jupyter-lab
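The same check can be done from Python if whereis is not available (e.g. on Windows); this uses only the standard library:

    import shutil
    import sys

    print(sys.executable)               # the Python interpreter currently in use
    print(shutil.which("jupyter-lab"))  # should resolve inside jupyteraivenv/bin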

@imClumsyPanda

@mtekman I mean I'm running jupyterlab and Jupyter-ai in an environment without internet connection.

I'll try to check again tomorrow, thanks for the reply!

@imClumsyPanda

@mtekman I've tried creating a new conda env, pip-installed jupyterlab 4.2.3 and jupyter-ai 2.18.1, and made sure the jupyter-lab command points to the file in the newly created env.

But I still get the same error message, with an error icon that says "There seems to be a problem with the chat backend, please look at the JupyterLab server logs or contact your administrator to correct the problem".

And this time I've noticed that there are error messages in the cmd window, which say [W 2024-07-03 10:10:10 ServerApp] 404 GET /api/ai/chats referer=None or [W 2024-07-03 10:10:10 ServerApp] 404 GET /api/ai/chats?token=[secret] referer=None.

I'll check if I can change the AI chat settings through the source code to solve this.

@mtekman

mtekman commented Jul 3, 2024 via email

@krassowski
Member

@imClumsyPanda most likely the server extension of jupyter-ai fails to load for some reason specific to your environment (e.g. a conflicting version of a dependency). You would need to look at the initial portion of the log (maybe with the --debug option). Also, checking the output of pip check and jupyter server extension list can be helpful.
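For a quick version check from within the same environment (complementary to pip check; the package names below are the usual suspects and may need adjusting):

    # Check that jupyter-ai and its immediate dependencies resolve in this environment.
    from importlib.metadata import version, PackageNotFoundError

    for pkg in ("jupyter-ai", "jupyter-ai-magics", "jupyterlab", "jupyter-server", "langchain"):
        try:
            print(f"{pkg}=={version(pkg)}")
        except PackageNotFoundError:
            print(f"{pkg}: not found")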

@dimm0

dimm0 commented Jul 9, 2024

Hi all!
I'm lost among the multiple issues tracking this problem.
Is there currently a way to point jupyter-ai at a custom OpenAI-compatible API URL and specify an arbitrary model to use?
(I'm running Llama 3 via h2oGPT / vLLM.)

@mtekman

mtekman commented Jul 9, 2024

@dimm0 #389 (comment)

@dimm0

dimm0 commented Jul 9, 2024

I saw it...

No /generate, /learn
Untested %%ai magic, since I use R and R does not seem to load the %% stuff.
There is no way to specify the model in the OpenAI settings.
Would it be possible to create a new dropdown item in Language Model called OpenAi :: Custom that would enable model selection, similar to the python example above?

(from that post)

It seems this still hasn't been addressed.

@mtekman

mtekman commented Jul 9, 2024

You run your custom model and then point to it via the "Base API URL", choosing from the "Language Model" dropdown an arbitrary model that your custom model should be API-compatible with.

@dimm0

dimm0 commented Jul 9, 2024

It keeps saying "invalid api key". I tried it with a model that has no API key and with one whose API key I know.
But how does choosing the right model work? Will it query the list of available models from the endpoint?

@mtekman

mtekman commented Jul 9, 2024

If you're using text-generation-webui, the API key seems to be hardcoded: #389 (comment)

But how does choosing the right model work?

My understanding of it is that you choose the OpenAI model from the dropdown that has all the endpoints you want. A little bit of trial and error is needed here, and nothing will work 100%.

Will it query the list of available models from the endpoint?

No, you literally offer a specific model at some address, and in the "Language Model" section you pick the closest OpenAI model that you think will be compatible with the endpoints for your model.

It will not consult the OpenAI servers, since you've overridden this with the "Base API URL" setting.
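Roughly speaking, the settings then behave like a plain OpenAI client pointed at a different server: the selected model name is simply sent to whatever the Base API URL resolves to. A rough Python equivalent (openai >= 1.0 client; the URL, key, and model name below are placeholders):

    from openai import OpenAI

    # Any OpenAI-compatible local server; it receives the model name verbatim.
    client = OpenAI(base_url="http://0.0.0.0:5001/v1", api_key="sk-1111")
    resp = client.chat.completions.create(
        model="TheBloke_stable-vicuna-13B-GPTQ",
        messages=[{"role": "user", "content": "Say hello."}],
    )
    print(resp.choices[0].message.content)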

@dimm0

dimm0 commented Jul 9, 2024

I'm not using text-generation-webui; I'm using https://github.com/h2oai/h2ogpt, which runs Llama 3 via vLLM for me. It exposes the standard OpenAI-compatible interface on an HTTPS port, and I can connect to it from multiple OpenAI-compatible tools. The model is meta-llama/Meta-Llama-3-70B-Instruct. I can enable an API key or use it without any key. How can I add one to jupyter-ai?

@mtekman

mtekman commented Jul 9, 2024

Hmm, tricky. Maybe Jupyter is expecting a specifically formatted API key? Perhaps try setting the API key in your custom model to that ridiculous sk-11111* one

@dimm0

dimm0 commented Jul 9, 2024

That key is specific to the text-generation-webui service.

So is there any issue with creating the common provider proposed in #389 (comment)?

@DanielCastroBosch

Hi all,
Here we have an internal LLM server. It uses the GPT-3.5 Turbo model.
In the network layer that I work in, we don't have access to the ChatGPT internet servers; we must use our internal server.
I tried to change the base API URL parameter to https:///v1/public/gpt3/chats/messages, but it is not working. This URL works when I use curl to send messages to the server.
It returns a NotFoundError: Error code: 404 - {'statusCode': 404, 'message': 'Resource not found'} error.
Can anybody please help me? Is there another solution for me?
For example, developing some code to add a new item to the Language Model dropdown?

@krassowski
Member

If it works from curl but not from the browser, you may need to configure the origin check on the Jupyter server.
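For reference, a minimal sketch of what that could look like in jupyter_server_config.py (adjust the origin to your deployment; these are standard Jupyter Server traitlets, not jupyter-ai settings):

    # jupyter_server_config.py
    c = get_config()  # noqa: F821 (provided by Jupyter when the config is loaded)

    # Allow requests whose Origin header matches your internal host.
    c.ServerApp.allow_origin = "https://your-jupyter-host.example.com"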

@DanielCastroBosch

@krassowski ... how can I do that?
Thanks in advance

@jhgoebbert

FYI:
You can find our implementation of a custom model provider (using the OpenAI API) that adds Blablador to jupyter-ai here:
https://github.com/FZJ-JSC/jupyter-ai-blablador
To be honest, it is just a tiny bit more than the template and still under development, but might be of help for someone.

@DanielCastroBosch

Hi everybody,

I solved my problem, and I will post the solution here for an internal local LLM server. I am a beginner, so any help will be appreciated. In my case, the domain user is authenticated with MS Entra. Thanks to your help I managed to solve it.

First I built what I call the model interface:

# mymodule.py

from typing import Any, Dict, Iterator, List, Mapping, Optional
from langchain_core.callbacks.manager import CallbackManagerForLLMRun
from langchain_core.language_models.llms import LLM
from langchain_core.outputs import GenerationChunk
import urllib3
import requests
import json

class serverLLM(LLM):
    """A custom chat model that interacts with an internal LLM server.

    When contributing an implementation to LangChain, carefully document
    the model including the initialization parameters, include
    an example of how to initialize the model and include any relevant
    links to the underlying models documentation or API.

    Example:

        .. code-block:: python


            result = model.invoke([HumanMessage(content="hello")])
            result = model.batch([[HumanMessage(content="hello")],
                                 [HumanMessage(content="world")]])
    """

    token_url = "<MS ENTRA APP link>"
    client_id = "<CLIENT ID>"
    client_secret = "<CLIENT SECRET>"
    scope = "<LLM SERVER API SCOPE>"
    TryoutBaseURL = "<LLM SERVER BASE API URL>"
    proxy = { 'http': '<HTTP PROXY>', 'https': '<HTTPS PROXY>' }
    urllib3.disable_warnings(urllib3.exceptions.InsecureRequestWarning)

    def _get_access_token(self):
        data = {
            "client_id": self.client_id,
            "scope": self.scope,
            "client_secret": self.client_secret,
            "grant_type": "client_credentials",
        }

        response = requests.post(self.token_url, data=data, verify=False, proxies=self.proxy)
        response_data = response.json()
        return response_data.get("access_token")

    def _call(
        self,
        prompt: str,
        stop: Optional[List[str]] = None,
        run_manager: Optional[CallbackManagerForLLMRun] = None,
        **kwargs: Any,
    ) -> str:
        """
        Args:
            prompt: The prompt to generate from.
            stop: Stop words to use when generating. Model output is cut off at the
                first occurrence of any of the stop substrings.
                If stop tokens are not supported consider raising NotImplementedError.
            run_manager: Callback manager for the run.
            **kwargs: Arbitrary additional keyword arguments. These are usually passed
                to the model provider API call.

        Returns:
            The model output as a string. Actual completions SHOULD NOT include the prompt.
        """
        
        msg=""

        if stop is not None:
            raise ValueError("stop kwargs are not permitted.")
        
        # Get the access token
        access_token = self._get_access_token()
        # Set up the headers
        headers = {
            "Content-Type": "application/json",
            "Authorization": f"Bearer {access_token}",
            "ocp-Apim-Subscription-Key": "<LLM SERVER API KEY>",
        }

        # Set up the payload
        query_data = {
            "messages": [
                {
                    "role": "user",
                    "content": f"{prompt}",
                }
            ],
            "model": "gpt3",
            "temperature": 0.5,
        }

        # Send POST request
        response = requests.post(self.TryoutBaseURL, headers=headers, json=query_data, verify=False, proxies=self.proxy)
        # print(f"Status code: {response.status_code}")
        # Check if the request was successful
        if response.status_code == 200:
            # Parse the JSON response
            result = response.json()
            # print(f"JSON: {result}")

            # Collect the assistant's response
            for message in result:
                if message["role"] == "assistant":
                    msg = msg + message["content"]
        else:
            msg = f"error: {response.status_code}:" + response.text

        # print(f"MESSAGE: {msg}")
        return msg


    @property
    def _identifying_params(self) -> Dict[str, Any]:
        """Return a dictionary of identifying parameters."""
        return {
            # The model name allows users to specify custom token counting
            # rules in LLM monitoring applications (e.g., in LangSmith users
            # can provide per token pricing for their model and monitor
            # costs for the given LLM.)
            "model_name": "myModel",
        }

    @property
    def _llm_type(self) -> str:
        """Get the type of language model used by this chat model. Used for logging purposes only."""
        return "MyModel"
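
For anyone adapting this, the class can be smoke-tested on its own before wiring it into jupyter-ai (hedged example; the placeholders above must be filled in first):

    # Quick standalone test of the custom LangChain LLM.
    from mymodule import serverLLM

    llm = serverLLM()
    print(llm.invoke("Hello, which model are you?"))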

Later I built the model provider for the jupyter-ai assistant:

from jupyter_ai_magics import BaseProvider
from mymodule import serverLLM

class modelProvider(BaseProvider, serverLLM):

    id = "<id>"
    name = "<NAME>"
    model_id_key = "model"
    model_id = "<model id>"
    models = ["<model id>"]

    def __init__(self, **kwargs):
        super().__init__(**kwargs)

Later I built the packages and uploaded them to our internal server.
It still needs improvement, but it is a good start.
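
If it is useful to others: as far as I understand, jupyter-ai discovers custom providers through the jupyter_ai.model_providers entry point group, so the package can declare something like the following (sketch of a setup.py; the names are placeholders and should match your own module and class):

    # setup.py (sketch)
    from setuptools import setup

    setup(
        name="my-jupyter-ai-provider",
        version="0.0.1",
        py_modules=["mymodule", "myprovider"],  # "myprovider" holds the modelProvider class (placeholder name)
        install_requires=["jupyter_ai", "langchain-core", "requests"],
        entry_points={
            "jupyter_ai.model_providers": [
                "my-provider = myprovider:modelProvider",
            ],
        },
    )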

@cboettig

There's a lot going on in this still-open thread, but it may be helpful to others to note that Ollama integration was added in #646.
