AI VoiceAssistant is a Python-based voice assistant that combines speech-to-text (STT), text-to-speech (TTS), and either a locally hosted large language model (LLM) powered by llama.cpp or the OpenAI API. It provides a simple way to interact with AI through voice commands, leveraging clipboard context and hotkeys for smooth operation. It is specialized for shell commands and coding: it gathers system information (OS, shell, GPU, Python version, home directory, etc.) so the LLM can return commands that are correct for the environment it runs in.
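As a rough illustration of that environment probing (the function name and field set below are assumptions made for this sketch, not the project's actual API), the collected context might look like this:

```python
import os
import platform
import shutil
import sys

def collect_system_info() -> dict:
    """Gather environment details so the LLM can tailor its shell commands.

    Hypothetical sketch: field names are illustrative, not the project's API.
    """
    return {
        "os": platform.platform(),
        "shell": os.environ.get("SHELL", "unknown"),
        "python_version": sys.version.split()[0],
        "home_dir": os.path.expanduser("~"),
        # GPU detection is optional; probing for nvidia-smi is just one approach.
        "gpu": "NVIDIA GPU detected" if shutil.which("nvidia-smi") else "unknown",
    }
```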
- Speech-to-Text (STT): Converts spoken commands into text; recording is triggered by a hotkey.
- Flexible LLM Options:
  - Local LLM via llama.cpp
  - OpenAI API (requires an API key; loading the key from an environment variable is currently a To-Do)
- Clipboard Integration: Use the clipboard as additional context for commands.
- Hotkey-based Control (see the sketch after this list):
  - Start recording: CMD/WinKey/Super + Shift
  - Execute the transcribed command: CMD/WinKey/Super + Control
  - Cancel command execution: speak the word "Cancel".
- Code Interaction: Refactor or optimize code by including clipboard content in commands when the word "buffer" is spoken.
- Memory: Option to enable or disable LLM memory via the tray icon menu. Useful when follow-up commands need to reference the command-response history.
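The hotkey bindings listed above are global. A minimal sketch of how such bindings could be registered with pynput (an assumption; the project may use a different hotkey library, and the callback names are placeholders):

```python
from pynput import keyboard

def start_recording():
    print("Recording started...")  # placeholder for the STT capture

def execute_command():
    print("Sending transcription to the LLM...")  # placeholder for execution

# <cmd> maps to the Windows/Super key on Windows and Linux, Command on macOS.
hotkeys = keyboard.GlobalHotKeys({
    "<cmd>+<shift>": start_recording,
    "<cmd>+<ctrl>": execute_command,
})
hotkeys.start()
hotkeys.join()
```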
- Clone the repository:
  git clone https://github.com/yourusername/AI-VoiceAssistant.git
  cd AI-VoiceAssistant
- Install dependencies:
  pip install -r requirements.txt
- Option 1: Set up llama.cpp (a query sketch follows below):
  - Follow the instructions on the llama.cpp GitHub page to compile and set up the LLM server.
  - Download the required LLM model from Hugging Face in GGUF format and place it in any directory. I recommend the Qwen2.5-Coder-Instruct models (https://huggingface.co/bartowski?search_models=Qwen2.5-Coder).
  - Start the llama.cpp server:
    ./llama-server --model /path/to/your/model
  - If possible, enable FlashAttention with the -fa flag for faster inference (see the instructions in the llama.cpp repository), e.g.:
    ./llama-server -m '/mnt/disk2/LLM_MODELS/models/Qwen2.5-Coder-14B-Instruct-Q8_0.gguf' -fa -ngl 99
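Once llama-server is running, it exposes an OpenAI-compatible HTTP endpoint. A minimal sketch of querying it from Python, assuming the default port 8080 (this is an illustration, not the project's actual client code):

```python
import requests

# llama-server listens on http://127.0.0.1:8080 by default.
response = requests.post(
    "http://127.0.0.1:8080/v1/chat/completions",
    json={
        "messages": [
            {"role": "system", "content": "You are a helpful shell assistant."},
            {"role": "user", "content": "List the five largest files in the current directory."},
        ],
        "temperature": 0.2,
    },
    timeout=120,
)
print(response.json()["choices"][0]["message"]["content"])
```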
- Option 2: Use the OpenAI API (a key-handling sketch follows below):
  - Obtain an API key from OpenAI.
  - Modify the code to input your API key when prompted. (To-Do: support passing the API key via an environment variable.)
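The To-Do above could look roughly like the following sketch, which reads the key from an environment variable using the official openai package (the model name is an example, not the project's default):

```python
import os

from openai import OpenAI

# Prefer an environment variable over hard-coding the key in the source.
api_key = os.environ.get("OPENAI_API_KEY") or input("Enter your OpenAI API key: ")

client = OpenAI(api_key=api_key)
completion = client.chat.completions.create(
    model="gpt-4o-mini",  # example model, not necessarily the project's choice
    messages=[{"role": "user", "content": "Show disk usage for the home directory."}],
)
print(completion.choices[0].message.content)
```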
- Run the voice assistant:
  python main.py
- Start Recording Speech: Press CMD/WinKey/Super + Shift and speak your command. The assistant will transcribe it and display the text in real time.
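The README does not spell out the STT backend at this step; as one hedged illustration, a recorded clip could be transcribed with the open-source openai-whisper package (an assumption, not necessarily what main.py uses):

```python
import whisper  # pip install openai-whisper

model = whisper.load_model("base")
result = model.transcribe("recording.wav")
print(result["text"])  # the text shown in the transcription floating window
```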
- Execute the Command: Press CMD/WinKey/Super + Control. If the word "Cancel" is detected, the command will not execute.
- If the first word spoken is "buffer", clipboard content will be included in the prompt sent to the LLM.
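A rough sketch of how the "Cancel" and "buffer" keywords could be handled when building the prompt, using pyperclip for clipboard access (an assumed dependency; the real implementation may differ):

```python
import pyperclip

def build_prompt(transcription: str) -> str | None:
    """Return the prompt to send to the LLM, or None if execution is cancelled."""
    words = [w.lower().strip(".,!?") for w in transcription.split()]
    if not words or "cancel" in words:
        return None  # "Cancel" detected: do not execute
    if words[0] == "buffer":
        # Include clipboard content as extra context for the LLM.
        rest = " ".join(transcription.split()[1:])
        return f"{rest}\n\nClipboard content:\n{pyperclip.paste()}"
    return transcription
```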
- General Commands:
  - "Extract the audio from a video file (input.mp4) and save it as an MP3 file."
  - "Cancel." (aborts execution)
- Programming:
  - "Write a function to generate a report (in JSON format) summarizing disk usage statistics."
  - Copy some code to the clipboard and say:
    - "Buffer. Optimize this code."
    - "Buffer. Refactor the code to improve readability."
Screenshots: transcription floating window; enable/disable memory (tray icon menu).
Short demonstration recorded in real time (RTX 3090): https://youtu.be/UB_ZXU_a0xY
- If using the local LLM, ensure the llama.cpp server is running before starting the Python script.
- If using the OpenAI API, ensure the API key is set correctly.
- To-Do: Add an option to pass the OpenAI API key as an argument or environment variable for improved security and ease of use.
Contributions are welcome! Feel free to submit issues or pull requests to improve the project.
This project is licensed under the MIT License. See the LICENSE file for more details.