As we approach running a Spike/PoC, we want to break it down into steps.
Hypothesis
We believe that if we can embed a CPU-based LLM into the CLI, we can enable users to get tailored support for running the OpenTDF platform. This support will enable them to deploy and administer the platform quickly without needing specific guidance from a human.
The benefit of this approach is that it enables people with limited knowledge to quickly learn a process without investing vast quantities of time reading or scouring resources. This is especially true for platforms with limited documentation and/or examples that may not fit the exact problem at hand. Additionally, this approach satisfies environmental constraints such as air-gapped environments, need-to-know limitations, and limited connectivity.
Currently, ollama models are functional and accessible from the otdfctl chat command.
Configurations are managed in a chat_config.json file located in the home directory and are loaded via chat_config.go. This is temporary, as there is certainly a more graceful way of storing and managing chat parameters. The current parameters include the model server's URL and port and a verbose flag (a hedged sketch of what the file and its loader might look like is included below).
This setup could technically be model-agnostic so long as the model server runs on the same port and URL and exposes the same REST-like query structure that ollama supports. Verbosity could also be changed to a string ("high", "med", "low"), but it is simpler to start with a bool.
Currently, verbose controls whether the entire sanitized prompt is shown to the user before a response. It can and should cover much more, especially during initial startup.
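For reference, here is a minimal sketch of what the config struct and loader in chat_config.go might look like. The field names and JSON keys (model, url, port, verbose) are assumptions for illustration, not necessarily what otdfctl actually uses.

```go
// Hypothetical sketch of chat_config.go: the struct fields and JSON keys
// (model, url, port, verbose) are assumptions for illustration only.
package chat

import (
	"encoding/json"
	"os"
	"path/filepath"
)

// ChatConfig mirrors an assumed layout of ~/chat_config.json.
type ChatConfig struct {
	Model   string `json:"model"`   // e.g. an ollama model tag
	URL     string `json:"url"`     // base URL of the local model server
	Port    int    `json:"port"`    // port the model server listens on
	Verbose bool   `json:"verbose"` // show the sanitized prompt before each response
}

// LoadChatConfig reads and parses chat_config.json from the user's home directory.
func LoadChatConfig() (*ChatConfig, error) {
	home, err := os.UserHomeDir()
	if err != nil {
		return nil, err
	}
	raw, err := os.ReadFile(filepath.Join(home, "chat_config.json"))
	if err != nil {
		return nil, err
	}
	cfg := &ChatConfig{}
	if err := json.Unmarshal(raw, cfg); err != nil {
		return nil, err
	}
	return cfg, nil
}
```

Keeping the endpoint as a separate URL and port lines up with the note above about swapping in other model servers that speak the same REST-like protocol.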
TODOs:
Graceful startup in initial loading of configs using chat_config.go
Graceful exits and additional error checks for when the model is not running or other trivial issues occur
Test secondary model and refine configurations to make implementation more model-agnostic (Gemma, TinyChatEngine)
Organize sanitization prompts for different levels of user familiarity
Collect and disseminate generalized Q&As to quality-test our prompt engineering efforts (vibe-check the model and prompting for our use-case)
Open-ended: Investigate improved prompt engineering efforts for ollama models
Open-ended: Benchmark both 'performance' (response quality) and speed across model types, with a particular focus on prompt engineering efforts (see the sketch after this list)
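To make the benchmarking TODO concrete, a rough timing-harness sketch follows. The model tags, the sample prompt, and ollama's /api/generate endpoint on the default port 11434 are assumptions and would need to match the actual local setup.

```go
// Rough benchmarking sketch: the model tags, sample prompt, endpoint path
// (/api/generate), and default port 11434 are assumptions to be checked
// against the local ollama install.
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"io"
	"net/http"
	"time"
)

// timePrompt sends one non-streaming generation request and returns the
// wall-clock time until the full response body has been read.
func timePrompt(baseURL, model, prompt string) (time.Duration, error) {
	body, err := json.Marshal(map[string]any{
		"model":  model,
		"prompt": prompt,
		"stream": false, // wait for the complete response so timing covers generation
	})
	if err != nil {
		return 0, err
	}
	start := time.Now()
	resp, err := http.Post(baseURL+"/api/generate", "application/json", bytes.NewReader(body))
	if err != nil {
		return 0, err
	}
	defer resp.Body.Close()
	if _, err := io.ReadAll(resp.Body); err != nil {
		return 0, err
	}
	return time.Since(start), nil
}

func main() {
	prompts := []string{"How do I deploy the OpenTDF platform?"} // sample Q&A item
	models := []string{"llama3", "gemma"}                        // hypothetical model tags
	for _, m := range models {
		for _, p := range prompts {
			d, err := timePrompt("http://localhost:11434", m, p)
			if err != nil {
				fmt.Printf("%s: error: %v\n", m, err)
				continue
			}
			fmt.Printf("%s: %v\n", m, d)
		}
	}
}
```

The same loop could be pointed at the generalized Q&A set mentioned above; speed is easy to record automatically, while response quality would still need a manual vibe-check or rubric.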
Solution
Implement an LLM solution based on the work of https://github.com/ollama/ollama to load a user-provided, pre-installed model.
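As part of a graceful startup, the CLI will likely need to confirm that the local ollama server is reachable and that the user-provided model has already been pulled. A minimal sketch of such a check, assuming ollama's /api/tags listing endpoint; the function name and error wording are illustrative only.

```go
// Startup-check sketch: assumes ollama's /api/tags endpoint for listing
// locally installed models; names and error wording are illustrative only.
package chat

import (
	"encoding/json"
	"fmt"
	"net/http"
)

type tagsResponse struct {
	Models []struct {
		Name string `json:"name"`
	} `json:"models"`
}

// ModelAvailable returns an error if the local model server is unreachable or
// the requested model has not been pulled, so the CLI can exit gracefully.
func ModelAvailable(baseURL, model string) error {
	resp, err := http.Get(baseURL + "/api/tags")
	if err != nil {
		return fmt.Errorf("model server not reachable at %s: %w", baseURL, err)
	}
	defer resp.Body.Close()

	var tags tagsResponse
	if err := json.NewDecoder(resp.Body).Decode(&tags); err != nil {
		return fmt.Errorf("unexpected response from model server: %w", err)
	}
	for _, m := range tags.Models {
		if m.Name == model {
			return nil
		}
	}
	return fmt.Errorf("model %q is not installed locally", model)
}
```

A check like this would also cover the "model is not running" case from the TODOs above, since a connection error surfaces before any chat request is sent.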
Approach
Break the Spike/PoC into concrete steps; the current breakdown is captured in the status notes and TODOs above.