Says WAV file is valid, then later says it's invalid? #113

Open
binarykitchen opened this issue Sep 25, 2024 · 2 comments

Comments

@binarykitchen
Contributor

Running your latest version on Arch Linux.

nodejs-whisper says the WAV file is valid, but later the native whisper instance says it's not. Huh?

[dev:server] [Nodejs-whisper] File is a valid WAV file.

And later it says:

[dev:server] read_wav: WAV file '/home/michael-heuberger/code/binarykitchen/videomail.io/var/local/tmp/clients/videomail.io/1ef7ae52-7eab-6f50-8362-05f8c267a8f2/videomail_preview.wav' must be 16 kHz
[dev:server] error: failed to read WAV file '/home/michael-heuberger/code/binarykitchen/videomail.io/var/local/tmp/clients/videomail.io/1ef7ae52-7eab-6f50-8362-05f8c267a8f2/videomail_preview.wav'

Here are the details from the logs:

[dev:server] DEBUG: »»-----------------------------------------►
[dev:server] [Nodejs-whisper] Checking and downloading model if needed: base
[dev:server] autoDownloadModelName base
[dev:server] options {
[dev:server]   modelName: 'base',
[dev:server]   autoDownloadModelName: 'base',
[dev:server]   verbose: true,
[dev:server]   removeWavFileAfterTranscription: false,
[dev:server]   whisperOptions: { outputInVtt: true }
[dev:server] }
[dev:server] [Nodejs-whisper] Models already exist. Skipping download.
[dev:server] [Nodejs-whisper] Checking file existence: /home/michael-heuberger/code/binarykitchen/videomail.io/var/local/tmp/clients/videomail.io/1ef7ae52-7eab-6f50-8362-05f8c267a8f2/videomail_preview.wav
[dev:server] [Nodejs-whisper] Converting file to WAV format: /home/michael-heuberger/code/binarykitchen/videomail.io/var/local/tmp/clients/videomail.io/1ef7ae52-7eab-6f50-8362-05f8c267a8f2/videomail_preview.wav
[dev:server] [Nodejs-whisper] Checking if the file is a valid WAV: /home/michael-heuberger/code/binarykitchen/videomail.io/var/local/tmp/clients/videomail.io/1ef7ae52-7eab-6f50-8362-05f8c267a8f2/videomail_preview.wav
[dev:server] [Nodejs-whisper] File is a valid WAV file.
[dev:server] [Nodejs-whisper] Constructing command for file: /home/michael-heuberger/code/binarykitchen/videomail.io/var/local/tmp/clients/videomail.io/1ef7ae52-7eab-6f50-8362-05f8c267a8f2/videomail_preview.wav
[dev:server] [Nodejs-whisper] Executing command: ./main  -ovtt -l auto -m ./models/ggml-base.bin  -f /home/michael-heuberger/code/binarykitchen/videomail.io/var/local/tmp/clients/videomail.io/1ef7ae52-7eab-6f50-8362-05f8c267a8f2/videomail_preview.wav
[dev:server] code--- 0
[dev:server] stdout--- 
[dev:server] stderr--- whisper_init_from_file_with_params_no_state: loading model from './models/ggml-base.bin'
[dev:server] whisper_model_load: loading model
[dev:server] whisper_model_load: n_vocab       = 51865
[dev:server] whisper_model_load: n_audio_ctx   = 1500
[dev:server] whisper_model_load: n_audio_state = 512
[dev:server] whisper_model_load: n_audio_head  = 8
[dev:server] whisper_model_load: n_audio_layer = 6
[dev:server] whisper_model_load: n_text_ctx    = 448
[dev:server] whisper_model_load: n_text_state  = 512
[dev:server] whisper_model_load: n_text_head   = 8
[dev:server] whisper_model_load: n_text_layer  = 6
[dev:server] whisper_model_load: n_mels        = 80
[dev:server] whisper_model_load: ftype         = 1
[dev:server] whisper_model_load: qntvr         = 0
[dev:server] whisper_model_load: type          = 2 (base)
[dev:server] whisper_model_load: adding 1608 extra tokens
[dev:server] whisper_model_load: n_langs       = 99
[dev:server] whisper_model_load:      CPU total size =   147.37 MB
[dev:server] whisper_model_load: model size    =  147.37 MB
[dev:server] whisper_init_state: kv self size  =   16.52 MB
[dev:server] whisper_init_state: kv cross size =   18.43 MB
[dev:server] whisper_init_state: compute buffer (conv)   =   16.39 MB
[dev:server] whisper_init_state: compute buffer (encode) =  132.07 MB
[dev:server] whisper_init_state: compute buffer (cross)  =    4.78 MB
[dev:server] whisper_init_state: compute buffer (decode) =   96.48 MB
[dev:server] read_wav: WAV file '/home/michael-heuberger/code/binarykitchen/videomail.io/var/local/tmp/clients/videomail.io/1ef7ae52-7eab-6f50-8362-05f8c267a8f2/videomail_preview.wav' must be 16 kHz
[dev:server] error: failed to read WAV file '/home/michael-heuberger/code/binarykitchen/videomail.io/var/local/tmp/clients/videomail.io/1ef7ae52-7eab-6f50-8362-05f8c267a8f2/videomail_preview.wav'
[dev:server] 
[dev:server] whisper_print_timings:     load time =   306.03 ms
[dev:server] whisper_print_timings:     fallbacks =   0 p /   0 h
[dev:server] whisper_print_timings:      mel time =     0.00 ms
[dev:server] whisper_print_timings:   sample time =     0.00 ms /     1 runs (    0.00 ms per run)
[dev:server] whisper_print_timings:   encode time =     0.00 ms /     1 runs (    0.00 ms per run)
[dev:server] whisper_print_timings:   decode time =     0.00 ms /     1 runs (    0.00 ms per run)
[dev:server] whisper_print_timings:   batchd time =     0.00 ms /     1 runs (    0.00 ms per run)
[dev:server] whisper_print_timings:   prompt time =     0.00 ms /     1 runs (    0.00 ms per run)
[dev:server] whisper_print_timings:    total time =   312.29 ms
[dev:server] 
[dev:server] stdout--- 
[dev:server] [Nodejs-whisper] Transcribing Done!
[dev:server] [Nodejs-whisper] Error during processing: Transcription failed or produced no output.

Any ideas what this could be?

Thanks!

@binarykitchen
Contributor Author

I think it's because the input sample rate is 48 kHz, while whisper expects 16 kHz. So the WAV validity check should also verify the sample rate, not just the file format.
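For illustration, here is a minimal sketch of what such a sample rate check could look like. It is not nodejs-whisper's actual validation code; `readWavSampleRate` and `makeWavHeader` are hypothetical names, and the parser only assumes the standard RIFF/WAVE layout, where the `fmt ` chunk stores the sample rate as a little-endian 32-bit integer after the format and channel fields.

```typescript
// Sample rate that whisper's read_wav check asks for in the log above.
const REQUIRED_SAMPLE_RATE_HZ = 16000;

// Hypothetical helper: returns the sample rate (Hz) stored in a RIFF/WAVE header.
// Throws if the bytes are not RIFF/WAVE or contain no "fmt " chunk.
function readWavSampleRate(bytes: Uint8Array): number {
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);

  // Reads a 4-character ASCII chunk tag at the given offset.
  const tag = (offset: number): string => {
    let text = "";
    for (let i = 0; i < 4; i++) {
      text += String.fromCharCode(view.getUint8(offset + i));
    }
    return text;
  };

  if (bytes.byteLength < 12 || tag(0) !== "RIFF" || tag(8) !== "WAVE") {
    throw new Error("not a RIFF/WAVE file");
  }

  // Walk the chunk list; "fmt " usually comes first, but that is not guaranteed.
  let offset = 12;
  while (offset + 8 <= bytes.byteLength) {
    const id = tag(offset);
    const size = view.getUint32(offset + 4, true);
    if (id === "fmt ") {
      if (offset + 16 > bytes.byteLength) {
        throw new Error("truncated fmt chunk");
      }
      // fmt data layout: u16 audio format, u16 channels, u32 sample rate, ...
      return view.getUint32(offset + 12, true);
    }
    offset += 8 + size + (size % 2); // chunk data is padded to an even length
  }
  throw new Error('no "fmt " chunk found');
}

// Hypothetical test helper: builds a 44-byte header for an empty
// mono 16-bit PCM WAV file with the given sample rate.
function makeWavHeader(sampleRate: number): Uint8Array {
  const bytes = new Uint8Array(44);
  const view = new DataView(bytes.buffer);
  const put = (offset: number, text: string): void => {
    for (let i = 0; i < text.length; i++) {
      view.setUint8(offset + i, text.charCodeAt(i));
    }
  };
  put(0, "RIFF");
  view.setUint32(4, 36, true); // RIFF size = file size - 8
  put(8, "WAVE");
  put(12, "fmt ");
  view.setUint32(16, 16, true); // fmt chunk size
  view.setUint16(20, 1, true); // audio format: PCM
  view.setUint16(22, 1, true); // channels: mono
  view.setUint32(24, sampleRate, true);
  view.setUint32(28, sampleRate * 2, true); // byte rate
  view.setUint16(32, 2, true); // block align
  view.setUint16(34, 16, true); // bits per sample
  put(36, "data");
  view.setUint32(40, 0, true); // no audio data
  return bytes;
}

// A 48 kHz file is well-formed WAV but would still be rejected by whisper.
const rate = readWavSampleRate(makeWavHeader(48000));
console.log(rate === REQUIRED_SAMPLE_RATE_HZ ? "ok" : `needs resampling: ${rate} Hz`);
```

A check like this would distinguish "is a WAV file" from "is a WAV file whisper can read", which is the gap the two log messages point to.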

@ChetanXpro
Owner

Yeah, I think it's due to the sample rate. I will look into this issue.
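Until a fix lands, one possible workaround is to resample the file before handing it to the library. The sketch below is an assumption, not part of nodejs-whisper: `buildResampleArgs` is a hypothetical helper that only builds an ffmpeg argument list. The flags themselves (`-ar`, `-ac`, `-c:a pcm_s16le`) are standard ffmpeg options, and 16 kHz mono 16-bit PCM is the conversion the whisper.cpp README suggests.

```typescript
// Sample rate that whisper's read_wav check asks for in the log above.
const WHISPER_SAMPLE_RATE_HZ = 16000;

// Hypothetical helper: builds ffmpeg arguments that convert any input
// to a 16 kHz, mono, 16-bit PCM WAV file.
function buildResampleArgs(inputPath: string, outputPath: string): string[] {
  if (inputPath === outputPath) {
    // ffmpeg refuses to write its output over its own input.
    throw new Error("input and output paths must differ");
  }
  return [
    "-y", // overwrite the output file if it already exists
    "-i", inputPath,
    "-ar", String(WHISPER_SAMPLE_RATE_HZ), // audio sample rate
    "-ac", "1", // one audio channel (mono)
    "-c:a", "pcm_s16le", // 16-bit little-endian PCM
    outputPath,
  ];
}

// Example paths are placeholders, not the paths from the report.
console.log(["ffmpeg", ...buildResampleArgs("input.wav", "input_16k.wav")].join(" "));
```

The resulting array can be passed to something like Node's `child_process.spawn("ffmpeg", args)`, with the converted file then given to the transcription call in place of the 48 kHz original.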
