As we approach running a Spike/PoC, we want to break it down into steps.
Hypothesis
We believe that if we can embed a CPU-based LLM into the CLI, we can enable users to get tailored support for running the OpenTDF platform. This support will enable them to deploy and administer the platform quickly without needing specific guidance from a human.
The benefit of this approach is that it enables people with limited knowledge to quickly learn a process without investing vast quantities of time reading or scouring resources. This is especially true for platforms with limited documentation and/or examples that may not fit the exact problem at hand. Additionally, this approach satisfies environmental constraints such as air-gapped environments, need-to-know limitations, and limited connectivity.
Currently, ollama models are functional and accessible from the otdfctl chat command.
Configurations are managed in a chat_config.json file located in the home directory and are loaded via chat_config.go. This is temporary, as there is certainly a more graceful way of storing and managing chat parameters. The current parameters include the model server's URL and port and a verbose flag (a hedged sketch of what the file and its loader might look like is included below).
This setup could technically be model-agnostic so long as the model server runs on the same port and URL and exposes the same REST-like query structure that ollama supports. Verbosity could also be changed to a string ("high", "med", "low"), but it is simpler to start with a bool.
Currently, verbose controls whether the entire sanitized prompt is shown to the user before a response. It can and should cover much more, especially during initial startup.
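For reference, here is a minimal sketch of what the config struct and loader in chat_config.go might look like. The field names and JSON keys (model, url, port, verbose) are assumptions for illustration, not necessarily what otdfctl actually uses.

```go
// Hypothetical sketch of chat_config.go: the struct fields and JSON keys
// (model, url, port, verbose) are assumptions for illustration only.
package chat

import (
	"encoding/json"
	"os"
	"path/filepath"
)

// ChatConfig mirrors an assumed layout of ~/chat_config.json.
type ChatConfig struct {
	Model   string `json:"model"`   // e.g. an ollama model tag
	URL     string `json:"url"`     // base URL of the local model server
	Port    int    `json:"port"`    // port the model server listens on
	Verbose bool   `json:"verbose"` // show the sanitized prompt before each response
}

// LoadChatConfig reads and parses chat_config.json from the user's home directory.
func LoadChatConfig() (*ChatConfig, error) {
	home, err := os.UserHomeDir()
	if err != nil {
		return nil, err
	}
	raw, err := os.ReadFile(filepath.Join(home, "chat_config.json"))
	if err != nil {
		return nil, err
	}
	cfg := &ChatConfig{}
	if err := json.Unmarshal(raw, cfg); err != nil {
		return nil, err
	}
	return cfg, nil
}
```

Keeping the endpoint as a separate URL and port lines up with the note above about swapping in other model servers that speak the same REST-like protocol.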
TODOs:
Graceful startup in initial loading of configs using chat_config.go
Graceful exits and additional error checks for when the model is not running or other trivial issues occur
Test secondary model and refine configurations to make implementation more model-agnostic (Gemma, TinyChatEngine)
Organize sanitization prompts for different levels of user familiarity
Collect and disseminate generalized Q&As to quality-test our prompt engineering efforts (vibe-check the model and prompting for our use-case)
Open-ended: Investigate improved prompt engineering efforts for ollama models
Open-ended: Benchmark both 'performance' (response quality) and speed across model types, with a particular focus on prompt engineering efforts (see the sketch after this list)
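To make the benchmarking TODO concrete, a rough timing-harness sketch follows. The model tags, the sample prompt, and ollama's /api/generate endpoint on the default port 11434 are assumptions and would need to match the actual local setup.

```go
// Rough benchmarking sketch: the model tags, sample prompt, endpoint path
// (/api/generate), and default port 11434 are assumptions to be checked
// against the local ollama install.
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"io"
	"net/http"
	"time"
)

// timePrompt sends one non-streaming generation request and returns the
// wall-clock time until the full response body has been read.
func timePrompt(baseURL, model, prompt string) (time.Duration, error) {
	body, err := json.Marshal(map[string]any{
		"model":  model,
		"prompt": prompt,
		"stream": false, // wait for the complete response so timing covers generation
	})
	if err != nil {
		return 0, err
	}
	start := time.Now()
	resp, err := http.Post(baseURL+"/api/generate", "application/json", bytes.NewReader(body))
	if err != nil {
		return 0, err
	}
	defer resp.Body.Close()
	if _, err := io.ReadAll(resp.Body); err != nil {
		return 0, err
	}
	return time.Since(start), nil
}

func main() {
	prompts := []string{"How do I deploy the OpenTDF platform?"} // sample Q&A item
	models := []string{"llama3", "gemma"}                        // hypothetical model tags
	for _, m := range models {
		for _, p := range prompts {
			d, err := timePrompt("http://localhost:11434", m, p)
			if err != nil {
				fmt.Printf("%s: error: %v\n", m, err)
				continue
			}
			fmt.Printf("%s: %v\n", m, d)
		}
	}
}
```

The same loop could be pointed at the generalized Q&A set mentioned above; speed is easy to record automatically, while response quality would still need a manual vibe-check or rubric.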
Solution
Implement an LLM solution based on the work of https://github.com/ollama/ollama to load a user-provided, pre-installed model.
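As part of a graceful startup, the CLI will likely need to confirm that the local ollama server is reachable and that the user-provided model has already been pulled. A minimal sketch of such a check, assuming ollama's /api/tags listing endpoint; the function name and error wording are illustrative only.

```go
// Startup-check sketch: assumes ollama's /api/tags endpoint for listing
// locally installed models; names and error wording are illustrative only.
package chat

import (
	"encoding/json"
	"fmt"
	"net/http"
)

type tagsResponse struct {
	Models []struct {
		Name string `json:"name"`
	} `json:"models"`
}

// ModelAvailable returns an error if the local model server is unreachable or
// the requested model has not been pulled, so the CLI can exit gracefully.
func ModelAvailable(baseURL, model string) error {
	resp, err := http.Get(baseURL + "/api/tags")
	if err != nil {
		return fmt.Errorf("model server not reachable at %s: %w", baseURL, err)
	}
	defer resp.Body.Close()

	var tags tagsResponse
	if err := json.NewDecoder(resp.Body).Decode(&tags); err != nil {
		return fmt.Errorf("unexpected response from model server: %w", err)
	}
	for _, m := range tags.Models {
		if m.Name == model {
			return nil
		}
	}
	return fmt.Errorf("model %q is not installed locally", model)
}
```

A check like this would also cover the "model is not running" case from the TODOs above, since a connection error surfaces before any chat request is sent.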
Approach
Break the Spike/PoC into concrete steps; the current breakdown is captured in the status notes and TODOs above.