Transcribe apps audio #52

thewh1teagle · 2024-04-17T00:21:40Z

Goal

Transcribe system audio, microphone input, or both, and preview the result in real time

Research

Possible to follow approaches in
https://github.com/CapSoftware/Cap

Useful Rust crate
https://github.com/helmerapp/scap

Perhaps build on:
macOS: https://github.com/svtlabs/screencapturekit-rs (ScreenCaptureKit)
Windows: https://github.com/NiiightmareXD/windows-capture (Graphics Capture API)

macOS app which provides a way to capture system audio using ScreenCaptureKit API
https://github.com/Mnpn/Azayaka

Microsoft answer explaining how Audacity manages to record audio from the speakers (TL;DR: Windows WASAPI)
https://answers.microsoft.com/en-us/windows/forum/all/how-record-speaker-output-windows-10/251bb695-5170-4a35-a90f-42d9f6f3345a

macOS sample
https://gist.github.com/thewh1teagle/d02415b9768fd816a780f9af6a3f2bdb

Some platforms, though not all, provide virtual channels for monitoring (PulseAudio and PipeWire on Linux, WASAPI on Windows, Core Audio on macOS), and cpal does not expose them
(not sure about Core Audio, actually; it might have been disabled or removed for security reasons)

Loopback added to cpal
RustAudio/cpal#478 (working on Windows)

Additional questions:
How to get system audio + microphone at the same time into a single stream?
Linux?

TLDR

The Rust crate cpal provides an audio stream from the microphone(s).
On Windows it also provides an audio stream from the default output device (system audio).
On macOS we should use screencapturekit-rs and expose a stream that is equivalent to a cpal stream.

If two streams are used, mix them by adding them together (simple addition of the sample values works).
Push them to whisper in a loop.
Mixing can introduce synchronization issues (if the streams come from two different sound cards, etc.); RtAudio handles that better and can be used through rtaudio-rs.
whisper.cpp expects single-channel (mono) audio at a 16 kHz sample rate with 16-bit samples.
Resampling is probably needed; converting stereo to mono is done by taking the mean of both channels.
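A minimal sketch of the mixing and mono conversion described above, assuming samples are already `f32` in the range [-1.0, 1.0] and both streams share the same sample rate. The function names are made up for illustration and are not part of cpal or whisper.cpp. Clamping after the addition is an extra step not mentioned above, added so that two loud streams do not overflow when converted to 16-bit. Resampling is left out here.

```rust
/// Average interleaved stereo frames (L, R, L, R, ...) into mono.
fn stereo_to_mono(interleaved: &[f32]) -> Vec<f32> {
    interleaved
        .chunks_exact(2)
        .map(|frame| (frame[0] + frame[1]) * 0.5)
        .collect()
}

/// Mix two mono streams by adding samples.
/// The shorter stream is treated as silence once it runs out.
fn mix(a: &[f32], b: &[f32]) -> Vec<f32> {
    let len = a.len().max(b.len());
    (0..len)
        .map(|i| {
            let x = a.get(i).copied().unwrap_or(0.0);
            let y = b.get(i).copied().unwrap_or(0.0);
            // Clamp so the sum stays inside the valid sample range.
            (x + y).clamp(-1.0, 1.0)
        })
        .collect()
}

/// Convert f32 samples to signed 16-bit PCM.
fn to_i16(samples: &[f32]) -> Vec<i16> {
    samples
        .iter()
        .map(|s| (s.clamp(-1.0, 1.0) * i16::MAX as f32) as i16)
        .collect()
}
```

Plain addition with clamping is the simplest option; scaling each stream by 0.5 before adding avoids clipping at the cost of a quieter result.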

Simple approach

Record from speakers/mic concurrently and write to a file every 5-10 seconds at the best silent position.
Write to a queue of paths (each item will be one or two paths).
Another task iterates over the queue, merges if needed, and transcribes each item.
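The queue part of this approach could look roughly like the sketch below, using only the standard library. `Chunk`, `transcribe`, and `run_pipeline` are hypothetical names, and `transcribe` is a stand-in that only reports how many files it received; the real merging and whisper call would go there.

```rust
use std::path::PathBuf;
use std::sync::mpsc;
use std::thread;

/// One recorded chunk: one path (mic or speakers) or two (both).
#[derive(Debug)]
struct Chunk {
    paths: Vec<PathBuf>,
}

/// Hypothetical stand-in for "merge if needed, then transcribe".
fn transcribe(chunk: &Chunk) -> String {
    format!("transcribed {} file(s)", chunk.paths.len())
}

/// Feed chunks through a channel to a worker thread and collect the results.
fn run_pipeline(chunks: Vec<Chunk>) -> Vec<String> {
    let (tx, rx) = mpsc::channel::<Chunk>();

    // Consumer task: iterate the queue until every sender is dropped.
    let worker = thread::spawn(move || {
        rx.iter().map(|chunk| transcribe(&chunk)).collect::<Vec<_>>()
    });

    // Producer: in the real app this would be the recording loop.
    for chunk in chunks {
        tx.send(chunk).expect("worker hung up");
    }
    drop(tx); // Close the queue so the worker can finish.

    worker.join().expect("worker panicked")
}
```

A channel keeps the recorder from blocking on transcription, which is usually the slower side.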

https://github.com/ggerganov/whisper.cpp/tree/master/examples/stream#sliding-window-mode-with-vad
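For the "best silent position" in the simple approach, a plain energy heuristic might be enough to start with. The sketch below is not the VAD from the whisper.cpp stream example linked above; it is a simpler swapped-in technique that picks the non-overlapping window with the lowest mean squared amplitude. `quietest_window` is a hypothetical helper, and the window size is up to the caller.

```rust
/// Return the start index of the non-overlapping window with the lowest
/// mean squared amplitude, or None if the input is shorter than one window.
fn quietest_window(samples: &[f32], window: usize) -> Option<usize> {
    if window == 0 || samples.len() < window {
        return None;
    }
    let mut best: Option<(usize, f32)> = None;
    for (i, w) in samples.chunks_exact(window).enumerate() {
        let energy = w.iter().map(|s| s * s).sum::<f32>() / window as f32;
        // Keep the first window seen with strictly lower energy.
        if best.map_or(true, |(_, e)| energy < e) {
            best = Some((i * window, energy));
        }
    }
    best.map(|(start, _)| start)
}
```

Splitting the recording at the returned index reduces the chance of cutting a word in half.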


quinn-eschenbach · 2024-04-23T17:12:20Z

Hey, I sent the message on the Rust Audio Discord. I'm very new to digital audio processing and Rust, but happy to help wherever I can.

I've come to very similar conclusions in my research and even got a test app working using ScreenCaptureKit + whisper.cpp.

To convert the audio for whisper, I use only the left channel and take only every 3rd frame from the buffer, but obviously that's a hack and it needs to be done properly.
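A slightly less lossy variant of that hack, as a sketch: instead of dropping the right channel and two out of three frames, average them. This assumes the input is interleaved stereo `f32` at 48 kHz (which "every 3rd frame" suggests, but that is an assumption), so three frames become one 16 kHz mono sample. The function name is made up. Averaging is only a crude low-pass filter; a dedicated resampler would still be the proper fix.

```rust
/// Downsample interleaved 48 kHz stereo to 16 kHz mono by averaging
/// both channels across each group of three frames.
fn downsample_48k_stereo_to_16k_mono(interleaved: &[f32]) -> Vec<f32> {
    const CHANNELS: usize = 2;
    const FACTOR: usize = 3; // 48_000 / 16_000

    interleaved
        // One output sample per 3 stereo frames (6 input samples).
        // Any incomplete trailing group is dropped.
        .chunks_exact(CHANNELS * FACTOR)
        .map(|block| block.iter().sum::<f32>() / (CHANNELS * FACTOR) as f32)
        .collect()
}
```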

I also found this example for handling streaming in the whisper.cpp repo:
https://github.com/ggerganov/whisper.cpp/blob/master/examples/stream/stream.cpp

Another repo we could check is OBS; they handle it well. I took a look but wasn't able to understand it (I'm a C noob).

Let's work on this together; I really love the idea of Vibe!

thewh1teagle · 2024-04-29T13:56:44Z

@quinn-eschenbach

Sounds great!
Currently, the challenges I'm facing are:

1. Mixing two audio streams together (speaker + mic).
2. Getting an audio stream from ScreenCaptureKit and doing the same as in 1 with it.

I'm trying to solve 1 first; not sure how to do that with cpal.
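One possible shape for challenge 1, sketched without any cpal calls: each stream's callback pushes its samples into its own buffer, and a consumer mixes only as many samples as both buffers currently hold. `Mixer` is a hypothetical type, and the sketch assumes both streams are already mono `f32` at the same sample rate. It does not correct clock drift between devices, and since audio callbacks typically run on their own threads, a real version would need to share this behind a `Mutex` or use a lock-free ring buffer.

```rust
use std::collections::VecDeque;

/// Buffers two streams and mixes whatever both have available.
#[derive(Default)]
struct Mixer {
    mic: VecDeque<f32>,
    speaker: VecDeque<f32>,
}

impl Mixer {
    /// Called with new microphone samples.
    fn push_mic(&mut self, samples: &[f32]) {
        self.mic.extend(samples.iter().copied());
    }

    /// Called with new system-audio samples.
    fn push_speaker(&mut self, samples: &[f32]) {
        self.speaker.extend(samples.iter().copied());
    }

    /// Mix and remove the samples present in both buffers;
    /// leftovers wait for the other stream to catch up.
    fn drain_mixed(&mut self) -> Vec<f32> {
        let n = self.mic.len().min(self.speaker.len());
        self.mic
            .drain(..n)
            .zip(self.speaker.drain(..n))
            .map(|(m, s)| (m + s).clamp(-1.0, 1.0))
            .collect()
    }
}
```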

TzahiS · 2024-05-07T22:36:52Z

I don't have knowledge of the subject, but I've been following Whisper-related Colabs and believe the following link might help:
https://github.com/Sourasky-DHLAB/Whisper
Especially notebook 4.
https://github.com/Sourasky-DHLAB/Whisper/blob/main/Colab/Whisper_Speaker_Diarization.ipynb

thewh1teagle · 2024-06-07T00:29:08Z

I've made some progress; it will be added soon.

As a starting point, it will be possible to simply record mic / speakers / both. When the recording is finished, it's just like transcribing any audio file in Vibe.

https://github.com/thewh1teagle/vibe/tree/feat/record

zmwangx/rust-ffmpeg#73 (For merging audio files after recording)

RustAudio/cpal#876 (Probably will be added soon for macOS)

https://discord.com/channels/548404410439696434/1248439946411249695

zmwangx/rust-ffmpeg#103

https://discord.com/channels/590254806208217089/590257558317695005/1248448876462211123

Merged. I just need to add audio merging support with ffmpeg and, in the future, update cpal with ScreenCaptureKit support.

thewh1teagle · 2024-06-07T21:31:02Z

Added in 2.0.2

thewh1teagle added this to Vibe Roadmap 🚀 Jan 24, 2024

thewh1teagle converted this from a draft issue Apr 17, 2024

thewh1teagle mentioned this issue May 6, 2024

Transcribe several files one by one #59

Closed

thewh1teagle added the needs research label May 7, 2024

thewh1teagle moved this from Planning to Building in Vibe Roadmap 🚀 May 24, 2024

thewh1teagle moved this from Building to Planning in Vibe Roadmap 🚀 Jun 6, 2024

thewh1teagle closed this as completed Jun 7, 2024

github-project-automation bot moved this from Planning to Released in Vibe Roadmap 🚀 Jun 7, 2024

sammcj mentioned this issue Aug 8, 2024

[Feature Request]: Record and transcribe specific app / output audio and mic #211

Closed
