Repetitive Phrase Looping #171

Open
noahuser opened this issue Feb 11, 2024 · 4 comments

@noahuser

I had been using whisper-timestamped for some time and it worked flawlessly. After a break of a few months, during which I updated my Mac to Sonoma, I came back to the tool and ran into a recurring issue. Transcription appears to proceed normally, and the progress bar reaches 100% as expected. At some point, however, the output gets stuck and repeats the same sentence over and over until the end of the audio file. For instance, it transcribes a sentence at 00:38:03 and then repeats that sentence until 01:30:03, which is when the audio ends. Initially I suspected a problem with the audio file itself, but the issue persists across different audio files, including one that was transcribed perfectly a few months ago. Interestingly, the exact point at which the loop starts varies with each attempt. I am at a loss as to how to resolve this. Does anyone have any suggestions or insights?

I have already tried enabling VAD, and I have tried uninstalling and reinstalling whisper-timestamped. Neither changed anything.

@Jeronymous
Member

This seems to be a duplicate of #94.
There are some suggestions there.

Repetitions are due to model hallucination.
Which model are you using?

@noahuser
Author

I am using the large model, and I have already tried everything in #94. I have two MacBooks, one with an Intel i7 and the other with an M2 Pro. On the Intel one, the same audio transcribes perfectly, without any issue. The M2 Pro one produces these hallucinations every time. On the M2 I "solved" the problem by putting this in my Python code: result = transcribe_timestamped(model, audio_file, beam_size=5, best_of=5, temperature=(0.0, 0.2, 0.4, 0.6, 0.8, 1.0)). With these settings it works very well, but a transcription that usually takes 1 hour now takes something like 6 hours per audio file. That's not a big problem, but it was really beautiful when it worked in an hour :)
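
For reference, the workaround described above can be sketched as a small script. This is a minimal sketch, not the maintainers' recommended setup: the helper name transcribe_robust, the ROBUST_DECODING dictionary, and the default model name and device are assumptions made for illustration. Only the three decoding options come from this thread.

```python
# Decoding options reported in this thread to stop the repetition loop.
# Beam search plus temperature fallback is slower than greedy decoding
# (the reporter saw roughly 1 hour become 6 hours on their machine).
ROBUST_DECODING = {
    "beam_size": 5,
    "best_of": 5,
    "temperature": (0.0, 0.2, 0.4, 0.6, 0.8, 1.0),
}


def transcribe_robust(audio_path, model_name="large-v2", device="cpu"):
    """Hypothetical helper: transcribe one file with the options above.

    model_name and device are illustrative defaults, not values from
    the thread.
    """
    # Imported lazily so the options can be inspected without the
    # package (and a multi-gigabyte model download) being available.
    import whisper_timestamped as whisper

    model = whisper.load_model(model_name, device=device)
    audio = whisper.load_audio(audio_path)
    return whisper.transcribe(model, audio, **ROBUST_DECODING)


if __name__ == "__main__":
    import json
    import sys

    # Usage: python transcribe_robust.py path/to/audio.wav
    result = transcribe_robust(sys.argv[1])
    print(json.dumps(result, indent=2, ensure_ascii=False))
```

Keeping the options in one dictionary makes it easy to compare a run with and without them on the same file when checking whether the looping goes away.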

@Jeronymous
Member

Jeronymous commented Feb 27, 2024

OK. When you say "I use the large model", note that there are several versions of the large model (now there are 3).
So if you use model = whisper.load_model("large") in your code, without specifying the version, that might load the latest version,
which could explain why the behaviour suddenly changed.
You can specify the exact version to use with, e.g., model = whisper.load_model("large-v1") (or "large-v2", whichever was the latest one when "it worked flawlessly").
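
One way to avoid silently picking up a newer checkpoint is to refuse the bare "large" alias in your own script. The following is a minimal sketch under stated assumptions: the helper names require_pinned and load_pinned_model are hypothetical, and the list of version names assumes the three large checkpoints mentioned above are "large-v1", "large-v2" and "large-v3".

```python
# Explicit version names for the large model (assumed to be the three
# versions referred to above).
PINNED_LARGE_MODELS = ("large-v1", "large-v2", "large-v3")


def require_pinned(model_name):
    """Hypothetical guard: reject the unversioned "large" alias."""
    if model_name == "large":
        raise ValueError(
            '"large" may resolve to a different checkpoint after an upgrade; '
            "use one of: " + ", ".join(PINNED_LARGE_MODELS)
        )
    return model_name


def load_pinned_model(model_name="large-v2", device="cpu"):
    """Hypothetical helper: load a model only if its version is explicit."""
    # Lazy import so the guard can be used without the package installed.
    import whisper_timestamped as whisper

    return whisper.load_model(require_pinned(model_name), device=device)
```

With a guard like this, a script that worked with one checkpoint keeps using that checkpoint after the package is reinstalled or upgraded.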

@bcicc

bcicc commented Nov 26, 2024

I ran into this issue after switching from whisperX to whisper-timestamped and was confused by how frequent the hallucinations were, but passing best_of=5, beam_size=5, temperature=(0.0, 0.2, 0.4, 0.6, 0.8, 1.0) to transcribe solved it for me. Apparently, whisperX passes these by default:
https://github.com/m-bain/whisperX/blob/9e3a9e0e38fcec1304e1784381059a0e2c670be5/whisperx/asr.py#L300
