Real-time transcription by running the Whisper model on a geo-distributed cloud.
This showcase demonstrates real-time speech-to-text transcription using the Whisper model. The model is deployed across geographically distributed cloud infrastructure to ensure optimal performance and low latency for users around the world.
Users are automatically directed to the most suitable backend server based on their location. To determine your assigned backend and hardware configuration, simply ping `edgeai.yomo.dev` and check the returned IP address.
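If you prefer to check programmatically rather than from the command line, here is a minimal sketch in Python that resolves the hostname and prints the address your location maps to. The hostname is the one above; the rest is illustrative and not part of the demo code.

```python
# Minimal sketch: resolve edgeai.yomo.dev to see which backend IP
# your location has been assigned (illustrative, not demo code).
import socket

ip = socket.gethostbyname("edgeai.yomo.dev")
print(f"Assigned backend: {ip}")
```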
By leveraging this geographically distributed architecture, this showcase delivers fast, accurate, and reliable speech transcription for users globally.
To deploy this real-time speech transcription system on your own infrastructure, follow these steps:
- Start the frontend: run `pnpm run dev` to launch the frontend application, which provides the interface for simultaneous interpretation.
- Choose your backend: backends are located in the `./backends/` directory and are built using YoMo. Each backend targets a specific type of AI infrastructure.
- Select and run the appropriate backend script (see the sketch after this list for a rough idea of what a backend does):
  - For Arm-based processors, run `backends/whisper_cpp_arm_server.py` to load the whisper.cpp model.
  - For NVIDIA GPUs, run `backends/whisper_nvidia_t4_server.py` to load the Whisper model.
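The actual backend code lives in `./backends/`. As a rough, hypothetical sketch of what such a backend does, the snippet below loads a Whisper model with the `openai-whisper` package and transcribes an audio chunk. The model size, function name, and chunk format are assumptions for illustration, not the demo's real implementation.

```python
# Minimal sketch of a Whisper transcription backend.
# Not the actual backends/ scripts; names and parameters are illustrative.
import numpy as np
import whisper  # openai-whisper package

# Model size is an assumption; the real backends may load a different one.
model = whisper.load_model("base")

def transcribe_chunk(audio: np.ndarray) -> str:
    """Transcribe a mono float32 audio chunk sampled at 16 kHz."""
    result = model.transcribe(audio, fp16=False)
    return result["text"].strip()

if __name__ == "__main__":
    # One second of silence as a placeholder input.
    silence = np.zeros(16000, dtype=np.float32)
    print(transcribe_chunk(silence))
```

In the real backends, a function like this would presumably be wired into a YoMo handler so the frontend can stream audio chunks to the nearest node in real time.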
Please note: these instructions assume you have the necessary dependencies, such as Whisper, whisper.cpp, and the YoMo framework, installed. Refer to the project documentation for further details.
Follow the instructions above to run this demo on an Arm-based development machine.