diff --git a/README.md b/README.md
index 90c1e36..fd9d143 100644
--- a/README.md
+++ b/README.md
@@ -4,6 +4,8 @@ LinTO-STT is an API for Automatic Speech Recognition (ASR).
 
 LinTO-STT can either be used as a standalone transcription service or deployed within a micro-services infrastructure using a message broker connector.
 
+It can be used for offline or real-time transcription.
+
 The following families of STT models are currently supported (please refer to respective documentation for more details):
 * [Kaldi models](kaldi/README.md)
 * [Whisper models](whisper/README.md)
diff --git a/kaldi/README.md b/kaldi/README.md
index ee5e222..c7bd22b 100644
--- a/kaldi/README.md
+++ b/kaldi/README.md
@@ -4,6 +4,8 @@ LinTO-STT-Kaldi is an API for Automatic Speech Recognition (ASR) based on models
 
 LinTO-STT-Kaldi can either be used as a standalone transcription service or deployed within a micro-services infrastructure using a message broker connector.
 
+It can be used for offline or real-time transcription.
+
 ## Pre-requisites
 
 ### Hardware
@@ -46,11 +48,9 @@ docker pull lintoai/linto-stt-kaldi
 
 Have the acoustic and language model ready at AM_PATH and LM_PATH if you are using LinTO models. If you are using a Vosk model, have it ready at MODEL.
 
-**3- Fill the .env**
+**3- Fill the .env file**
 
-```bash
-cp kaldi/.envdefault kaldi/.env
-```
+An example .env file is provided in [kaldi/.envdefault](https://github.com/linto-ai/linto-stt/blob/master/kaldi/.envdefault).
 
 | PARAMETER | DESCRIPTION | EXEMPLE |
 |---|---|---|
@@ -85,7 +85,7 @@ docker run --rm \
 -p HOST_SERVING_PORT:80 \
 -v AM_PATH:/opt/AM \
 -v LM_PATH:/opt/LM \
---env-file kaldi/.env \
+--env-file .env \
 linto-stt-kaldi:latest
 ```
 
@@ -111,7 +111,7 @@ docker run --rm \
 -v AM_PATH:/opt/AM \
 -v LM_PATH:/opt/LM \
 -v SHARED_AUDIO_FOLDER:/opt/audio \
---env-file kaldi/.env \
+--env-file .env \
 linto-stt-kaldi:latest
 ```
 
diff --git a/whisper/.envdefault b/whisper/.envdefault
index a8f8794..126fd51 100644
--- a/whisper/.envdefault
+++ b/whisper/.envdefault
@@ -55,7 +55,7 @@ PROMPT=
 # CUDA_VISIBLE_DEVICES=0
 
 # Number of threads per worker when running on CPU
-NUM_THREADS=4
+# NUM_THREADS=4
 
 # Number of workers minus one (all except from the main one)
 CONCURRENCY=2
diff --git a/whisper/README.md b/whisper/README.md
index b4f1e3b..07a1a53 100644
--- a/whisper/README.md
+++ b/whisper/README.md
@@ -2,7 +2,9 @@
 
 LinTO-STT-Whisper is an API for Automatic Speech Recognition (ASR) based on [Whisper models](https://openai.com/research/whisper).
 
-LinTO-STT-Whisper can either be used as a standalone transcription service or deployed within a micro-services infrastructure using a message broker connector.
+LinTO-STT-Whisper can either be used as a standalone transcription service or deployed within a micro-services infrastructure using a message broker connector. 
+
+It can be used for offline or real-time transcription.
 
 ## Pre-requisites
 
@@ -106,11 +108,9 @@ or
 docker pull lintoai/linto-stt-whisper
 ```
 
-### 2- Fill the .env
+### 2- Fill the .env file
 
-```bash
-cp whisper/.envdefault whisper/.env
-```
+An example .env file is provided in [whisper/.envdefault](https://github.com/linto-ai/linto-stt/blob/master/whisper/.envdefault).
 
 | PARAMETER | DESCRIPTION | EXEMPLE |
 |---|---|---|
@@ -184,7 +184,7 @@ yo(yoruba), zh(chinese)
 ```
 and also `yue(cantonese)` since large-v3.
 
-### Serving mode
+#### SERVING_MODE
 ![Serving Modes](https://i.ibb.co/qrtv3Z6/platform-stt.png)
 
 STT can be used in two ways:
@@ -195,6 +195,7 @@ Mode is specified using the .env value or environment variable ```SERVING_MODE``
 ```bash
 SERVICE_MODE=http
 ```
+
 ### HTTP Server
 The HTTP serving mode deploys a HTTP server and a swagger-ui to allow transcription request on a dedicated route.
 
@@ -203,7 +204,7 @@ The SERVICE_MODE value in the .env should be set to ```http```.
 ```bash
 docker run --rm \
 -p HOST_SERVING_PORT:80 \
---env-file whisper/.env \
+--env-file .env \
 linto-stt-whisper:latest
 ```
 
@@ -236,7 +237,7 @@ You need a message broker up and running at MY_SERVICE_BROKER.
 ```bash
 docker run --rm \
 -v SHARED_AUDIO_FOLDER:/opt/audio \
---env-file whisper/.env \
+--env-file .env \
 linto-stt-whisper:latest
 ```
 
@@ -371,4 +372,4 @@ This project is developped under the AGPLv3 License (see LICENSE).
 * [HuggingFace Transformers](https://github.com/huggingface/transformers)
 * [SpeechBrain](https://github.com/speechbrain/speechbrain)
 * [TorchAudio](https://github.com/pytorch/audio)
-* [Whisper_Streaming](https://github.com/ufal/whisper_streaming)
\ No newline at end of file
+* [Whisper_Streaming](https://github.com/ufal/whisper_streaming)
diff --git a/whisper/stt/__init__.py b/whisper/stt/__init__.py
index 8bac458..767f800 100644
--- a/whisper/stt/__init__.py
+++ b/whisper/stt/__init__.py
@@ -23,9 +23,6 @@ VAD_MIN_SPEECH_DURATION = float(os.environ.get("VAD_MIN_SPEECH_DURATION", 0.1))
 VAD_MIN_SILENCE_DURATION = float(os.environ.get("VAD_MAX_SILENCE_DURATION", 0.1))
 
-NUM_THREADS = os.environ.get("NUM_THREADS", os.environ.get("OMP_NUM_THREADS"))
-NUM_THREADS = int(NUM_THREADS)
-
 try:
     import faster_whisper
 
@@ -55,6 +52,7 @@ def set_num_threads(n):
         # os.environ["OMP_NUM_THREADS"] = str(n)
         pass
 
+    DEFAULT_NUM_THREADS = None
 else:
     import torch
     DEFAULT_NUM_THREADS = torch.get_num_threads()
@@ -62,6 +60,7 @@ def set_num_threads(n):
         torch.set_num_threads(n)
 
 # Number of CPU threads
+NUM_THREADS = os.environ.get("NUM_THREADS", os.environ.get("OMP_NUM_THREADS"))
 if NUM_THREADS is None:
     NUM_THREADS = DEFAULT_NUM_THREADS
 if NUM_THREADS is not None:
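For reference, the thread-count behaviour introduced by the `whisper/stt/__init__.py` hunks above can be sketched as follows. This is a minimal standalone illustration, not the module itself: it assumes either `faster_whisper` or `torch` is installed (as in the service image), the `try`/`except` backend detection is taken from the surrounding context lines, and the final `int()` cast is an assumption about what follows the `if NUM_THREADS is not None:` context line.

```python
import os

# Minimal sketch (illustration only) of how NUM_THREADS is resolved after this change.
try:
    import faster_whisper  # noqa: F401 -- backend detection, as in the context lines above
    USE_CTRANSLATE2 = True
except ImportError:
    USE_CTRANSLATE2 = False

if USE_CTRANSLATE2:
    # faster-whisper (CTranslate2) backend: there is no torch default to query
    DEFAULT_NUM_THREADS = None
else:
    import torch
    DEFAULT_NUM_THREADS = torch.get_num_threads()

# The environment is now read only after DEFAULT_NUM_THREADS exists, so leaving
# NUM_THREADS unset (as in the updated whisper/.envdefault) falls back to the
# backend default instead of crashing on int(None).
NUM_THREADS = os.environ.get("NUM_THREADS", os.environ.get("OMP_NUM_THREADS"))
if NUM_THREADS is None:
    NUM_THREADS = DEFAULT_NUM_THREADS
if NUM_THREADS is not None:
    NUM_THREADS = int(NUM_THREADS)  # assumption: cast only once a value is known to exist

print("Effective NUM_THREADS:", NUM_THREADS)
```

Reading the variable only after the backend default is known is what allows `NUM_THREADS=4` to ship commented out in `whisper/.envdefault`.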