
KATI-LLAMA (local large language model chat)


KATI-LLAMA is an interface for chatting with large language models on a private PC. A language model can be downloaded automatically in the settings and then used offline. The KATI application lets the user communicate with an AI in a human-like manner: the AI's responses can be read aloud with a natural voice, and the avatar image changes its appearance depending on the chatbot's mood. Below is a summary of the features of KATI-LLAMA.


Key features of KATI:

  • Talk to the AI without an internet connection
  • Optional voice output with a voice pre-installed in the operating system or a natural-sounding TikTok voice (the TikTok voice requires an internet connection)
  • Voice input (System Speech or Whisper)
  • Dynamic avatar images to represent the AI's emotions
  • Chat history with filter and read-aloud functions
  • Rating function for AI responses as an aid to the filter function
  • Reduced wait times by streaming responses directly; if the read-aloud function is active, text is only output once a sentence is complete (see the sketch after this list)
  • Text and code are formatted for better readability
  • Multilingual user interface (DE, EN, FR, ES, PT, JA, KO)
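
As a rough illustration of how streamed output can be combined with the read-aloud function, here is a minimal C# sketch of sentence-buffered streaming. The `tokenStream`, `appendToChat` and `speakAsync` names are hypothetical placeholders and are not taken from the KATI code base; the sketch only shows why spoken output pauses at sentence boundaries while the text itself streams continuously.

```csharp
using System;
using System.Collections.Generic;
using System.Text;
using System.Threading.Tasks;

public static class StreamingReadAloud
{
    // Streams tokens to the chat view immediately; when read-aloud is enabled,
    // text is additionally buffered and only spoken once a full sentence is available.
    public static async Task RunAsync(
        IAsyncEnumerable<string> tokenStream,  // tokens arriving from the local model (hypothetical source)
        Action<string> appendToChat,           // UI callback: show text as soon as it arrives
        Func<string, Task> speakAsync,         // TTS callback: read one complete sentence aloud
        bool readAloudEnabled)
    {
        var sentence = new StringBuilder();

        await foreach (var token in tokenStream)
        {
            appendToChat(token);               // the text itself is always streamed without delay

            if (!readAloudEnabled)
                continue;

            sentence.Append(token);

            // Very simple sentence-end heuristic; a real implementation would also
            // handle abbreviations, numbers and code blocks.
            if (token.EndsWith(".") || token.EndsWith("!") || token.EndsWith("?"))
            {
                await speakAsync(sentence.ToString().Trim());
                sentence.Clear();
            }
        }

        // Speak any trailing text that did not end with sentence punctuation.
        if (readAloudEnabled && sentence.Length > 0)
            await speakAsync(sentence.ToString().Trim());
    }
}
```

With the speech callback disabled, the loop reduces to plain streaming, which is why turning off audio output minimizes the perceived response time.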

Preview screenshots

Demonstration Video

NuGet packages and associated licenses used in KATI

Next milestone (research)

  • Add more language models to the settings for download.

Known bugs that will be fixed soon

  • Can't find any bugs yet :)

Performance issues

  • Depending on the configured model, more or less RAM and processor power is required, which affects how quickly the AI responds. Try a smaller model and see if the AI answers faster, but keep in mind that the smaller the model, the lower the quality of the responses.
  • Slow output may also be due to the configured processor setting. AVX is very slow, but even older processors generally support it. AVX2 has significantly lower latency, but not all processors support it. Try chatting with AVX2 to see if it works for you; a short CPU check is sketched after this list.
  • If the read-aloud function is enabled, the program waits until a complete sentence is available before outputting it. To minimize the response time, you can disable the audio output; the response text will then be streamed without interruption.
  • The AI sometimes takes longer to answer if it finds little information about a question. In this case, you can try cancelling the chat session and rephrasing the question.
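
If you are not sure whether your processor supports AVX2, a short standalone check can tell you. The sketch below uses the .NET hardware-intrinsics API (`System.Runtime.Intrinsics.X86`) and is independent of KATI itself; it only helps you choose between the AVX and AVX2 settings described above.

```csharp
using System;
using System.Runtime.Intrinsics.X86;

// Prints which SIMD instruction sets the local CPU supports (.NET Core 3.0+ / .NET 5+).
public static class CpuCheck
{
    public static void Main()
    {
        Console.WriteLine($"AVX  supported: {Avx.IsSupported}");
        Console.WriteLine($"AVX2 supported: {Avx2.IsSupported}");

        // Rule of thumb from the notes above: prefer AVX2 when available,
        // otherwise fall back to the slower but widely supported AVX setting.
        Console.WriteLine(Avx2.IsSupported
            ? "AVX2 is available; the lower-latency setting should work."
            : "AVX2 is not available; use the AVX setting instead.");
    }
}
```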