Repetitive Phrase Looping #171

noahuser · 2024-02-11T11:06:43Z

I had been using whisper-timestamped for some time and it worked flawlessly. After a break of a few months, during which I updated my Mac to Sonoma, I came back to the tool and now hit a recurring issue. The transcription appears to proceed normally, with the loading bar reaching 100% as expected. Yet at a certain point the output gets stuck and repeats the same sentence over and over until the end of the audio file. For instance, at 00:38:03 it transcribes a sentence and then repeats it in a loop until 01:30:03, which is when the audio ends. Initially I suspected a problem with the audio file itself, but the issue persists across different audio files, including one that was transcribed perfectly a few months ago. Interestingly, the exact time at which the loop starts varies with each attempt. I am at a loss as to how to resolve this. Does anyone have any suggestions or insights?

I have already tried enabling VAD, and I have also tried uninstalling and reinstalling whisper-timestamped. Neither changed anything.
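To pin down where the loop starts on each run, the output can be scanned for consecutive identical segments. The following is only a sketch: it assumes the result is the usual Whisper-style dict with a "segments" list whose entries carry "text" and "start", and the helper names find_repetition_loop and _normalize are hypothetical, not part of whisper-timestamped.

```python
def _normalize(text):
    """Lowercase and collapse whitespace so trivially different repeats still match."""
    return " ".join(text.lower().split())


def find_repetition_loop(segments, min_repeats=3):
    """Return (start_time, text) of the first run of at least `min_repeats`
    consecutive segments with identical normalized text, or None if no such run exists.

    `segments` is assumed to be a list of dicts with "text" and "start" keys.
    """
    run_start = 0
    for i in range(1, len(segments) + 1):
        same = (
            i < len(segments)
            and _normalize(segments[i]["text"]) == _normalize(segments[run_start]["text"])
        )
        if not same:
            # The run [run_start, i) just ended; report it if it is long enough.
            if i - run_start >= min_repeats:
                first = segments[run_start]
                return first["start"], first["text"].strip()
            run_start = i
    return None
```

Calling find_repetition_loop(result["segments"]) after a transcription gives the timestamp at which the repeated sentence first appears, which makes it easier to compare runs with different settings.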

Jeronymous · 2024-02-26T19:02:09Z

This seems to be a duplicate of #94.
There are some suggestions there.

Repetitions are due to model hallucination.
Which model are you using?

noahuser · 2024-02-26T20:29:15Z

I am using the large model, and I have already tried everything in #94. I have two MacBooks, one with an Intel i7 and the other with an M2 Pro. The same audio transcribes perfectly on the Intel machine, without any issue, while the M2 Pro hallucinates like this every time. On the M2 Pro I "solved" the problem by using this call in my Python code: result = transcribe_timestamped(model, audio_file, beam_size=5, best_of=5, temperature=(0.0, 0.2, 0.4, 0.6, 0.8, 1.0)). With these settings the transcription works very well, but it takes something like 6 hours per audio file instead of the usual 1 hour. That's not a big problem, but it was really nice when it worked in an hour :)
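For anyone who wants to reproduce the workaround above, here is a minimal sketch. The decoding options are the ones quoted in this comment; the wrapper name transcribe_accurate, the default model name, and the device argument are assumptions made for illustration. The import is done lazily so that the options can be inspected without the package installed.

```python
# Decoding options reported in this thread to stop the repetition loop.
# They are slower than the defaults because beam search and temperature
# fallback both add decoding work.
ACCURATE_DECODING = {
    "beam_size": 5,
    "best_of": 5,
    "temperature": (0.0, 0.2, 0.4, 0.6, 0.8, 1.0),
}


def transcribe_accurate(audio_path, model_name="large", device="cpu"):
    """Transcribe `audio_path` with the slower, more robust decoding options.

    `model_name` and `device` are illustrative defaults, not recommendations.
    """
    import whisper_timestamped as whisper  # lazy import: only needed when called

    model = whisper.load_model(model_name, device=device)
    audio = whisper.load_audio(audio_path)
    return whisper.transcribe(model, audio, **ACCURATE_DECODING)
```

Usage would look like result = transcribe_accurate("interview.wav"), where the file name is a placeholder. As noted above, expect a much longer run time than with the default options.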

Jeronymous · 2024-02-27T09:07:20Z

OK, when you say "I use the large model", note that there are several versions of the large model (three as of now).
So if you use model = whisper.load_model("large") in your code, without specifying the version, that might load the latest version,
which could explain why the behaviour suddenly changed.
You can specify the exact version to use with, e.g., model = whisper.load_model("large-v1") (or "large-v2", whichever was the latest when "it worked flawlessly").
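One way to avoid picking up a newer checkpoint by accident is to resolve the ambiguous name before loading. This is a sketch only: the helpers pinned_model_name and load_pinned are hypothetical, and "large-v2" as the pinned default is an assumption; use whichever version was current when the transcription last worked.

```python
def pinned_model_name(name, pinned_large="large-v2"):
    """Replace the ambiguous alias "large" with an explicit version.

    Any other model name is returned unchanged.
    """
    return pinned_large if name == "large" else name


def load_pinned(name="large", device="cpu"):
    """Load a model by explicit version so results stay comparable across upgrades."""
    import whisper_timestamped as whisper  # lazy import: only needed when called

    return whisper.load_model(pinned_model_name(name), device=device)
```

With this, load_pinned("large") requests "large-v2" explicitly, while names such as "medium" or "large-v1" pass through untouched.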

bcicc · 2024-11-26T17:44:47Z

I was having this issue after switching to whisper-timestamped from whisperX and was confused by the frequency of hallucinations, but passing best_of=5, beam_size=5, temperature=(0.0, 0.2, 0.4, 0.6, 0.8, 1.0) to transcribe solved this for me. Apparently, WhisperX passes these by default:
https://github.com/m-bain/whisperX/blob/9e3a9e0e38fcec1304e1784381059a0e2c670be5/whisperx/asr.py#L300
