Feature Request
Reasoning
Adding --stdin would allow the audio source and whisper_mic to run on different systems, so it could be used over SSH. This is a significant advantage if your "good" hardware is not the device you're using.
This sounds like a job better suited to plain whisper, but whisper doesn't support the continuous, near-realtime interpretation that mic input needs.
Implementation
Use ffmpeg to continuously output mic input in WAV format into whisper_mic, where it is readable as the file /dev/fd/0:
ffmpeg -f pulse -i default -f wav -ac 1 -ar 44100 - | whisper_mic --loop --model large-v3 --stdin
As an SSH command:
ffmpeg -f pulse -i default -f wav -ac 1 -ar 44100 - | ssh USER@ADDRESS -- whisper_mic --loop --model large-v3 --stdin
As a possible lead, speech_recognition has built-in support for reading from files, though I'm not sure whether it will cooperate with continuous reading and interpretation.
Extras
Example of using stdout for "keyboard" typing
The commands above can be piped into a read loop that types each line of output as it arrives.
For that to work well, cli.py would need import sys so that sys.stdout.flush() can be added under every print(result), and utils.py would need to send logging to stderr:
from rich.console import Console
rich_handler = RichHandler(level=logging.INFO, rich_tracebacks=True, markup=True, console=Console(stderr=True))
Standalone example of the ffmpeg command, for illustration purposes:
ffmpeg -f pulse -i default -f wav -ac 1 -ar 44100 - > playable.wav
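On the receiving side, a WAV stream like the one ffmpeg writes above could be consumed in fixed-length blocks. The following is only a rough sketch using Python's standard wave module, not how whisper_mic works today (it has no --stdin option; that is what this issue proposes). The names iter_wav_chunks and make_test_wav are hypothetical, and it is an untested assumption that wave accepts the header ffmpeg emits when writing to a pipe, where the total length is not known up front.

```python
import io
import wave
from typing import BinaryIO, Iterator


def iter_wav_chunks(stream: BinaryIO, seconds: float = 1.0) -> Iterator[bytes]:
    """Yield successive blocks of raw PCM frames from a WAV byte stream.

    Hypothetical helper: `stream` could be sys.stdin.buffer when audio is
    piped in, or any other binary file-like object.
    """
    with wave.open(stream, "rb") as wav:
        # Number of frames that make up one block of the requested duration.
        frames_per_chunk = max(1, int(wav.getframerate() * seconds))
        while True:
            frames = wav.readframes(frames_per_chunk)
            if not frames:
                break  # end of stream
            yield frames


def make_test_wav(n_frames: int, rate: int = 44100) -> bytes:
    """Build a mono, 16-bit, silent WAV in memory (for demonstration only)."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)   # matches -ac 1 in the ffmpeg command
        wav.setsampwidth(2)   # 16-bit samples
        wav.setframerate(rate)  # matches -ar 44100
        wav.writeframes(b"\x00\x00" * n_frames)
    return buf.getvalue()
```

With piped input, the call might look like `for chunk in iter_wav_chunks(sys.stdin.buffer): ...`, with each block handed to whatever performs the transcription.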