Whisper

Whisper.jl is a Julia package for automatic speech recognition, based on OpenAI's Whisper model. This package wraps the whisper.cpp code, which is a C/C++ implementation of the model. It uses the model weights which were published by OpenAI. The weights are downloaded on demand.

Quick Start Guide

The transcribe function is the simplest way to run speech recognition on an audio signal. It takes as input the model name, and a float32 array with the sound signal. The signal is expected to be sampled at 16kHz.

using Whisper, LibSndFile, FileIO, SampledSignals

s = load("/path/to/audio_file.ogg")  

# Whisper expects 16kHz sample rate and Float32 data
sout = SampleBuf(Float32, 16000, round(Int, length(s)*(16000/samplerate(s))), nchannels(s))  
write(SampleBufSink(sout), SampleBufSource(s))  # Resample

if nchannels(sout) == 1
    data  = sout.data
elseif nchannels(sout) == 2
    sd = sout.data
    data = [sd[i,1] + sd[i,2] for i in 1:size(sd)[1]] #convert stereo to mono
end

result = transcribe("base.en", data)

For more control, the whisper.cpp C interface is available within the Whisper.LibWhisper module.

Models

The following models are available, and should be referenced by name. The models with the .en suffix are specialised for the english language.

Model	Disk	Mem
tiny / tiny.en	75 MB	~125 MB
base / base.en	142 MB	~210 MB
small / small.en	466 MB	~600 MB
medium / medium.en	1.5 GB	~1.7 GB
large-v1 / large.en	2.9 GB	~3.3 GB

TODO

The biggest missing functionality is a streaming interface.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

README.md

README.md

Whisper

Quick Start Guide

Models

TODO

Files

README.md

Latest commit

History

README.md

File metadata and controls

Whisper

Quick Start Guide

Models

TODO