Feature Request: add --stdin as an audio source so whisper_mic can be used over SSH #91

Open
ulfnic opened this issue Nov 29, 2024 · 1 comment

ulfnic commented Nov 29, 2024

Feature Request

Reasoning

Adding --stdin would allow the audio source and whisper_mic to run on different systems, so it can be used over SSH. This is a significant advantage if your "good" hardware is not on the device you're using.

This sounds like a job better suited to plain whisper, but it doesn't support the continuous, near-realtime interpretation that mic input needs.

Implementation

Use ffmpeg to continuously output mic input in WAV format into whisper_mic, which could read it as the file /dev/fd/0:

ffmpeg -f pulse -i default -f wav -ac 1 -ar 44100 - | whisper_mic --loop --model large-v3 --stdin

As an SSH command:

ffmpeg -f pulse -i default -f wav -ac 1 -ar 44100 - | ssh USER@ADDRESS -- whisper_mic --loop --model large-v3 --stdin

As a possible lead, speech_recognition has built-in support for reading from files, though I'm not sure whether it will cooperate with continuous reading and interpretation.
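To make the "continuous reading" concern concrete, here is a minimal sketch of how a WAV stream arriving on stdin could be consumed in fixed-size blocks using only the Python standard library. It is not whisper_mic or speech_recognition code: the names `read_wav_header` and `iter_pcm_chunks` are hypothetical, the stream is assumed to be uncompressed PCM, and the size stored in the data chunk header is ignored on the assumption that a writer streaming into a pipe cannot fill it in correctly.

```python
import struct


def _read_exact(stream, n):
    """Read exactly n bytes from a binary stream or raise EOFError."""
    buf = b""
    while len(buf) < n:
        part = stream.read(n - len(buf))
        if not part:
            raise EOFError("stream ended inside the WAV header")
        buf += part
    return buf


def read_wav_header(stream):
    """Consume chunks up to the start of the PCM data.

    Returns (channels, sample_rate, bits_per_sample). Chunks other than
    'fmt ' (for example a LIST metadata chunk) are skipped.
    """
    riff, _size, wave_id = struct.unpack("<4sI4s", _read_exact(stream, 12))
    if riff != b"RIFF" or wave_id != b"WAVE":
        raise ValueError("not a RIFF/WAVE stream")
    fmt = None
    while True:
        chunk_id, chunk_size = struct.unpack("<4sI", _read_exact(stream, 8))
        if chunk_id == b"data":
            if fmt is None:
                raise ValueError("'data' chunk appeared before 'fmt '")
            # Assumption: the data size is unreliable on a pipe, so ignore it.
            return fmt
        # RIFF chunks are padded to an even number of bytes.
        body = _read_exact(stream, chunk_size + (chunk_size & 1))
        if chunk_id == b"fmt ":
            _tag, channels, rate, _brate, _align, bits = struct.unpack(
                "<HHIIHH", body[:16]
            )
            fmt = (channels, rate, bits)


def iter_pcm_chunks(stream, channels, bits_per_sample, frames_per_chunk):
    """Yield raw PCM blocks of frames_per_chunk frames until end of stream."""
    chunk_bytes = frames_per_chunk * channels * bits_per_sample // 8
    while True:
        block = stream.read(chunk_bytes)
        if not block:
            return
        yield block  # the final block may be shorter than chunk_bytes
```

With a hypothetical --stdin flag, this would be driven by `sys.stdin.buffer`: call `read_wav_header` once, then hand each block from `iter_pcm_chunks` to the transcriber. How whisper_mic would buffer and segment those blocks is left open here.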

Extras

Example of using stdout for "keyboard" typing

The commands above can be piped into a read loop that types each line of output as it arrives:

# X11
... | while IFS= read -r; do xdotool type -- "${REPLY} "; done

# Wayland
... | while IFS= read -r; do wtype -- "${REPLY} "; done

For that to work well, cli.py would need import sys so that sys.stdout.flush() can be added under every print(result), and logging in utils.py would need to go to stderr:

from rich.console import Console
rich_handler = RichHandler(level=logging.INFO, rich_tracebacks=True, markup=True, console=Console(stderr=True))
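The reason for both changes is that a downstream read loop should see only transcribed text, and should see it immediately: Python block-buffers stdout when it is a pipe, and any log line on stdout would be typed as if it were speech. Here is a minimal standard-library sketch of that separation; `make_stderr_logger` and `emit` are hypothetical helpers for illustration, not functions from whisper_mic, and plain `logging.StreamHandler` stands in for the RichHandler shown above.

```python
import logging
import sys


def make_stderr_logger(name="whisper_mic_sketch"):
    """Return a logger whose output goes to stderr only."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    if not logger.handlers:
        logger.addHandler(logging.StreamHandler(sys.stderr))
    # Keep messages away from any root handlers that might write elsewhere.
    logger.propagate = False
    return logger


def emit(result, out=None):
    """Write one transcription line to stdout and flush it right away."""
    out = sys.stdout if out is None else out
    # flush=True has the same effect as calling out.flush() after print().
    print(result, file=out, flush=True)
```

Calling `emit(result)` in place of each `print(result)` keeps stdout clean for the pipe, while status messages sent through the logger remain visible in the terminal.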

Standalone example of the ffmpeg command, for illustration purposes

ffmpeg -f pulse -i default -f wav -ac 1 -ar 44100 - > playable.wav

q5sys commented Dec 23, 2024

This would be helpful.
