Feature Request: add --stdin as an audio source so whisper_mic can be used over SSH #91

ulfnic · 2024-11-29T02:51:33Z

Feature Request

Reasoning

Adding --stdin would allow the audio source and whisper_mic to run on different systems, so whisper_mic could be used over SSH. This is a significant advantage if your "good" hardware isn't on the device you're using.

This sounds like a job better suited to plain whisper, but whisper doesn't support the continuous, near-realtime transcription that mic input needs.

Implementation

Use ffmpeg to continuously stream mic input in WAV format into whisper_mic, which would read it as the file /dev/fd/0 (stdin):

ffmpeg -f pulse -i default -f wav -ac 1 -ar 44100 - | whisper_mic --loop --model large-v3 --stdin

As an SSH command:

ffmpeg -f pulse -i default -f wav -ac 1 -ar 44100 - | ssh USER@ADDRESS -- whisper_mic --loop --model large-v3 --stdin

As a possible lead, speech_recognition has built-in support for reading from files, though I'm not sure it will cooperate with continuous reading and transcription.
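On the whisper_mic side, a --stdin option could read the WAV stream in fixed-length chunks with Python's standard-library wave module instead of going through speech_recognition. This is a minimal sketch under assumptions: read_chunks is a hypothetical name (nothing like it exists in whisper_mic), and it assumes wave can parse ffmpeg's streamed header by reading sequentially without seeking, which hasn't been verified across ffmpeg versions.

```python
import wave
from typing import BinaryIO, Iterator


def read_chunks(stream: BinaryIO, seconds: float = 2.0) -> Iterator[bytes]:
    """Yield raw PCM chunks of about `seconds` duration from a WAV stream.

    Hypothetical helper, not part of whisper_mic. Assumes the wave module
    accepts ffmpeg's piped WAV header (read front to back, no seeking).
    """
    with wave.open(stream, "rb") as wav:
        # Number of frames (samples per channel) that make up one chunk.
        frames_per_chunk = max(1, int(wav.getframerate() * seconds))
        while True:
            chunk = wav.readframes(frames_per_chunk)
            if not chunk:  # end of stream: ffmpeg exited or SSH closed
                break
            yield chunk
```

Wired to the pipelines above it would be called as read_chunks(sys.stdin.buffer). Each chunk is 16-bit PCM (ffmpeg's default WAV codec) that would still need converting for Whisper and, since Whisper models work on 16 kHz audio, resampling from 44.1 kHz, unless ffmpeg is asked for -ar 16000 directly.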

Extras

Example of using stdout for "keyboard" typing

The commands above can be piped into a read loop that types each line of output as it arrives:

# X11
... | while IFS= read -r; do xdotool type -- "${REPLY} "; done

# Wayland
... | while IFS= read -r; do wtype -- "${REPLY} "; done
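The same loop can be written in Python, which passes the text as a separate argument instead of relying on shell quoting. A minimal sketch: type_lines and COMMANDS are hypothetical names, and the argv shapes simply mirror the xdotool and wtype invocations above.

```python
import subprocess
from typing import Callable, Iterable, List, Optional

# Argument prefixes mirroring the shell loops above; "--" ends option
# parsing so text that starts with "-" is typed literally.
COMMANDS = {
    "xdotool": ["xdotool", "type", "--"],  # X11
    "wtype": ["wtype", "--"],              # Wayland
}


def type_lines(
    lines: Iterable[str],
    tool: str = "xdotool",
    run: Optional[Callable[[List[str]], object]] = None,
) -> None:
    """Type each line followed by a space, like `"${REPLY} "` in the shell loop.

    `run` defaults to subprocess.run and can be replaced for testing.
    """
    if run is None:
        run = lambda argv: subprocess.run(argv, check=True)
    for line in lines:
        text = line.rstrip("\n")  # read -r drops the trailing newline too
        run(COMMANDS[tool] + [text + " "])
```

Fed from a pipe, this would be type_lines(sys.stdin) or type_lines(sys.stdin, tool="wtype").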

For that to work well, cli.py would need import sys so that sys.stdout.flush() can be called after every print(result), and utils.py would need to send logging to stderr:

import logging
from rich.console import Console
from rich.logging import RichHandler
rich_handler = RichHandler(level=logging.INFO, rich_tracebacks=True, markup=True, console=Console(stderr=True))
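The same separation can be sketched with the standard library alone: log records go to stderr, and each transcription is printed with flush=True, which has the same effect as calling sys.stdout.flush() after print(result). The logger name and both helper functions are hypothetical, not whisper_mic's own code.

```python
import logging
import sys
from typing import Optional, TextIO


def setup_logging(stream: Optional[TextIO] = None) -> logging.Logger:
    """Send log records to stderr so stdout carries only transcriptions."""
    logger = logging.getLogger("whisper_mic_sketch")  # hypothetical name
    logger.setLevel(logging.INFO)
    logger.handlers.clear()
    logger.addHandler(logging.StreamHandler(stream if stream is not None else sys.stderr))
    logger.propagate = False  # don't also emit through the root logger
    return logger


def emit(result: str, out: Optional[TextIO] = None) -> None:
    """Print one transcription and flush right away, so a downstream
    `read` loop sees it without waiting for the pipe buffer to fill.
    file=None makes print use sys.stdout."""
    print(result, file=out, flush=True)
```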

Standalone ffmpeg example, for illustration purposes

ffmpeg -f pulse -i default -f wav -ac 1 -ar 44100 - > playable.wav
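To confirm the recording has the format the pipeline expects, the header of playable.wav can be inspected with Python's wave module. describe_wav is a hypothetical helper; it deliberately skips the frame count, because ffmpeg writing to a pipe may not be able to go back and fill in the length fields.

```python
import wave
from typing import BinaryIO, Dict, Union


def describe_wav(source: Union[str, BinaryIO]) -> Dict[str, int]:
    """Return the format fields of a WAV file or stream."""
    with wave.open(source, "rb") as wav:
        return {
            "channels": wav.getnchannels(),
            "sample_rate": wav.getframerate(),
            "sample_width_bytes": wav.getsampwidth(),
        }
```

With -ac 1 -ar 44100 and ffmpeg's default 16-bit PCM codec, describe_wav("playable.wav") should report 1 channel, 44100 Hz and 2-byte samples.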


q5sys · 2024-12-23T04:13:45Z

This would be helpful.
