Recognizing Chinese audio with BELLE-2/Belle-whisper-large-v2-zh gives worse results than Systran/faster-whisper-large-v2? #574
Comments
+1 |
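Based on the results above, the likely cause is that belle-whisper was run without VAD segmentation, so recognition was always done on the maximum 30-second window, which hurts accuracy to some extent. |
How do I convert belle-whisper to the faster-whisper model format? Is there any technical documentation on this? |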
https://opennmt.net/CTranslate2/quantization.html#quantize-on-model-conversion |
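But whisper has VAD by default, doesn't it? Do you mean VAD was removed in belle-whisper? |
You are probably thinking of timestamps. belle-whisper was not further optimized for timestamps during fine-tuning. If you need timestamps, you have to enable them explicitly at inference time. The faster-whisper framework has VAD, which segments the audio somewhat better, so I recommend calling belle-whisper through the faster-whisper framework. |
Thanks a lot, I'll give it a try. |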
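A minimal sketch of the suggestion above, assuming belle-whisper has already been converted to the CTranslate2 format that faster-whisper loads. In faster-whisper, VAD is off unless `vad_filter=True` is passed to `transcribe`. The function names, the `model_dir` argument, and the 500 ms silence threshold are illustrative assumptions, not settings confirmed anywhere in this thread.

```python
def vad_transcribe_kwargs(min_silence_ms=500):
    """Keyword arguments for WhisperModel.transcribe with VAD turned on.

    The 500 ms default is an illustrative value, not a tuned one.
    """
    return {
        "language": "zh",
        "vad_filter": True,  # VAD is disabled by default in faster-whisper
        "vad_parameters": {"min_silence_duration_ms": min_silence_ms},
    }


def transcribe_with_vad(model_dir, audio_path):
    """Transcribe audio_path with a converted model stored in model_dir."""
    # Imported lazily so the helper above works without the library installed.
    from faster_whisper import WhisperModel

    model = WhisperModel(model_dir, device="cuda", compute_type="float16")
    segments, _info = model.transcribe(audio_path, **vad_transcribe_kwargs())
    # segments is a generator; decoding happens while it is consumed.
    return [(seg.start, seg.end, seg.text) for seg in segments]
```

If segments still come out as fixed 30-second windows, it may be worth checking first that `vad_filter=True` is actually reaching `transcribe`.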
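On my side, with the v2 and v3 models converted to faster-whisper, VAD does not seem to work either. Name: whisperx Name: faster-whisper Test video: https://www.youtube.com/watch?v=we8vNy6DYMI v2 also occasionally produces garbled text. model = WhisperModel(model_size, device="cuda", compute_type="float16") |
Hello, I converted the model to faster-whisper format with the command ct2-transformers-converter --model BELLE-2--Belle-whisper-large-v3-zh --output_dir BELLE-2--Belle-whisper-large-v3-zh-ct2 --copy_files preprocessor_config.json --quantization float16. When loading the model with model = WhisperModel(model_size, device="cuda", compute_type="float16"), I get the error: Max retries exceeded with url: /openai/whisper-tiny/resolve/main/tokenizer.json. Why does it still need to download this tokenizer.json from huggingface.co, and what is the correct way to do this? Thanks. |
How did you do the conversion? I could not get it to work from the command line. |
Hi, I am running into this problem too. After converting to faster-whisper, enabling VAD has no effect and the segments are still 30s long. Have you found a solution? |
With whisperx, you can set chunk_size to specify the maximum duration of a VAD segment. |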
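The URL in that error suggests that faster-whisper did not find a tokenizer.json inside the converted model directory and tried to fetch a fallback tokenizer from the Hub. A possible fix, sketched below, is to copy tokenizer.json alongside preprocessor_config.json during conversion, as the faster-whisper README does for the OpenAI checkpoints. This assumes the source checkpoint directory actually contains a tokenizer.json; the helper names are hypothetical.

```python
import subprocess


def build_convert_command(model, output_dir, quantization="float16"):
    """Build the ct2-transformers-converter invocation as an argument list."""
    return [
        "ct2-transformers-converter",
        "--model", model,
        "--output_dir", output_dir,
        # Copy the tokenizer too, so loading the model needs no Hub download.
        # Assumes tokenizer.json exists in the source checkpoint.
        "--copy_files", "tokenizer.json", "preprocessor_config.json",
        "--quantization", quantization,
    ]


def convert(model, output_dir):
    """Run the conversion; raises CalledProcessError if the converter fails."""
    subprocess.run(build_convert_command(model, output_dir), check=True)


if __name__ == "__main__":
    convert(
        "BELLE-2--Belle-whisper-large-v3-zh",
        "BELLE-2--Belle-whisper-large-v3-zh-ct2",
    )
```

The output directory can then be passed to WhisperModel in place of model_size.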
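A rough sketch of the whisperx route, assuming a converted model directory can be passed to `whisperx.load_model` in place of a model name. The 15-second chunk size, the batch size, and both helper names are illustrative assumptions. Capping at 30 seconds reflects Whisper's fixed input window and is not a documented whisperx limit.

```python
WHISPER_WINDOW_SECONDS = 30  # Whisper processes audio in 30-second windows


def clamp_chunk_size(seconds):
    """Keep the requested chunk size between 1 s and the 30 s Whisper window."""
    return max(1, min(int(seconds), WHISPER_WINDOW_SECONDS))


def transcribe_with_chunks(model_dir, audio_path, chunk_size=15, batch_size=8):
    """Transcribe with whisperx, limiting merged VAD segments to chunk_size seconds."""
    # Imported lazily so clamp_chunk_size works without whisperx installed.
    import whisperx

    model = whisperx.load_model(
        model_dir, device="cuda", compute_type="float16", language="zh"
    )
    audio = whisperx.load_audio(audio_path)
    result = model.transcribe(
        audio, batch_size=batch_size, chunk_size=clamp_chunk_size(chunk_size)
    )
    return result["segments"]
```

A smaller chunk size gives shorter segments, but each one carries less context, so the value is worth tuning on the audio in question.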
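Hello, I ran experiments with BELLE-2/Belle-whisper-large-v2-zh and the results were worse than with Systran/faster-whisper-large-v2.
In principle, a model fine-tuned on Chinese data should outperform the faster-whisper one.
The test audio file I used is here: https://drive.google.com/file/d/1UTGOlnc3c_5FDHv_hH3IyNgNjxHNKQkD/view?usp=sharing
This is how I am using it.
How can I get good results?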