From 511546ab0a6ca24caaea5096ba71c57bb32079a7 Mon Sep 17 00:00:00 2001
From: Jeronymous
Date: Sat, 30 Nov 2024 17:01:20 +0100
Subject: [PATCH] tune doc

---
 README.md | 121 +++++++++++++++++++++++++++++-------------------------
 1 file changed, 65 insertions(+), 56 deletions(-)

diff --git a/README.md b/README.md
index 6510ef0..7777a01 100644
--- a/README.md
+++ b/README.md
@@ -14,14 +14,14 @@ The service allows you to:
 * [Deploy](#deploy)
   * [Using docker run](#using-docker-run)
   * [Using docker compose](#using-docker-compose)
-  * [Environement Variables](#environement-variables)
+  * [Environment Variables](#environment-variables)
 * [API](#api)
-  * [/list-services](#environement-variables)
+  * [/list-services](#environment-variables)
   * [Subservice resolution](#subservice-resolution)
   * [/transcribe](#transcribe)
-    * [Transcription config](#transcription-config)
-  * [/transcribe-multi](#transcribe-multi)
-    * [MultiTranscription config](#multitranscription-config)
+    * [Transcription configuration](#transcription-configuration)
+
   * [/job/{jobid}](#job)
   * [/results/{result_id}](#results)
     * [Transcription results](#transcription-results)
@@ -38,7 +38,7 @@ To use the transcription service you must have at least:
 * A mongo DB running at `MONGO_HOST:MONGO_PORT`.
 
 Optionnaly, for diarization or punctuation the following are needed:
-* One or multiple instances of [linto-diarization-worker](https://github.com/linto-ai/linto-diarization) > 1.2.0 for speaker diarization configured on the same service broker (LANGUAGE must be compatible).
+* One or multiple instances of [linto-diarization-worker](https://github.com/linto-ai/linto-diarization) > 1.2.0 for speaker diarization configured on the same service broker.
 * One or multiple instances of [linto-punctuation-worker](https://github.com/linto-ai/linto-punctuation) > 1.2.0 for text punctuation configured on the same service broker (LANGUAGE must be compatible).
 
 To share audio files across the different services they must be configured with the same shared volume `RESSOURCE_FOLDER`.
@@ -54,7 +54,7 @@ docker build . -t transcription_service
 ```bash
 cp .envdefault .env
 ```
-Fill the .env with the value described bellow [Environement Variables](#environement-variables)
+Fill the .env with the value described below [Environment Variables](#environment-variables)
 
 2- Launch a container:
 ```bash
@@ -72,29 +72,29 @@ Fill ```SERVING_PORT```, ```YOUR_SHARED_FOLDER``` with your values.
 ```bash
 cp .envdefault .env
 ```
-Fill the .env with the value described bellow [Environement Variables](#environement-variables)
+Fill the .env with the value described below [Environment Variables](#environment-variables)
 
 2- Compose
 ```bash
 docker-compose up .
 ```
 
-### Environement Variables
+### Environment Variables
 
 | Env variable| Description | Example |
 |:-|:-|:-|
-|SERVICE_NAME| STT service name, use to connect to the proper redis channel and mongo collection|my_stt_service|
-|LANGUAGE| Language code as a BCP-47 code | fr-FR |
-|KEEP_AUDIO|Either audio files are kept after request|1 (true) / 0 (false)|
-|CONCURRENCY|Number of workers (default 10)|10|
-|SERVICES_BROKER|Message broker address|redis://broker_address:6379|
-|BROKER_PASS|Broker Password| Password|
-|MONGO_HOST|MongoDB results url|my-mongo-service|
-|MONGO_PORT|MongoDB results port|27017|
-|RESOLVE_POLICY| Subservice resolve policy (default ANY) * |ANY \| DEFAULT \| STRICT |
-|_DEFAULT| Default serviceName for subtask * | punctuation-1 |
-
-*: See [Subservice Resolution](#subservice-resolution)
+|`LANGUAGE`| Language code (BCP-47 code) used for text normalization (digits to words, punctuation normalization, ...) | `fr-FR` |
+|`KEEP_AUDIO`|Either audio files are kept after request|`1` (true) \| `0` (false)|
+|`CONCURRENCY`|Number of workers (default 10)|`10`|
+|`SERVICE_NAME`| STT service name, use to connect to the proper redis channel and mongo collection|`my_stt_service`|
+|`SERVICES_BROKER`|Message broker address|`redis://broker_address:6379`|
+|`BROKER_PASS`|Broker Password| `Password`|
+|`MONGO_HOST`|MongoDB results url|`my-mongo-service`|
+|`MONGO_PORT`|MongoDB results port|`27017`|
+|`RESOLVE_POLICY`| Subservice resolve policy (default ANY) * | `ANY` \| `DEFAULT` \| `STRICT` |
+|<`SERVICE_TYPE`>`_DEFAULT`| Default serviceName for subtask <`SERVICE_TYPE`> * | `punctuation-1` |
+
+*: See [Subservice resolution](#subservice-resolution)
 
 ## API
 The transcription service offers a transcription API REST to submit transcription requests.
@@ -160,7 +160,7 @@ There is 3 policies to resolve service names:
 * DEFAULT: Use the service default subservice (must be declared)
 * STRICT: If the service is not specified, raise an error.
 
-Resolve policy is declared at launch using the RESOLVE_POLICY environement variable: ANY | DEFAULT | STRICT (default ANY).
+Resolve policy is declared at launch using the RESOLVE_POLICY environment variable: ANY | DEFAULT | STRICT (default ANY).
 
 Default service names must be declared at launch: _DEFAULT. E.g. The default punctuation subservice is "punctuation-1", `PUNCTUATION_DEFAULT=punctuation1`.
 
@@ -183,8 +183,8 @@ Response format can be application/json or text/plain as specified in the accept
 
 |Form Parameter| Description | Required |
 |:-|:-|:-|
-|transcriptionConfig|(object optionnal) A transcriptionConfig Object describing transcription parameters | See [Transcription config](#transcription-config) |
-|force_sync|(boolean optionnal) If True do a synchronous request | [true \| **false** \| null] |
+|transcriptionConfig|(object optionnal) A transcription configuration describing transcription parameters, in JSON format | See [Transcription configuration](#transcription-configuration) |
+|force_sync|(optional boolean, default=false) If True do a synchronous request | `true` \| `false` \| `null` |
 
 If the request is accepted, answer should be ```201``` with a json or text response containing the jobid.
 
@@ -208,34 +208,43 @@ Additionnaly a timestamps file can be uploaded alongside the audio file containi
 7.05 13.0
 ```
 
-#### Transcription config
-The transcriptionConfig object describe the transcription parameters and flags of the request. It is structured as follows:
+#### Transcription configuration
+The transcription config describes the transcription input parameters and flags of the request.
+It permits to set:
+* Target language for the transcript,
+* Voice Activity Detection (VAD) parameters,
+* Diarization parameters,
+* Punctuation parameters.
+
+It is structured as follows:
 ```json
 {
+  "language": "fr", # Target language for the transcript
   "vadConfig": {
-    "enableVad": true,
-    "methodName": "WebRTC",
-    "minDuration": 30
-  },
-  "punctuationConfig": {
-    "enablePunctuation": false, # Applies punctuation
-    "serviceName": null # Force serviceName (See SubService resolution)
+    "enableVad": true, # Enables Voice Activity Detection
+    "methodName": "WebRTC", # VAD method
+    "minDuration": 30 # Minimum duration of a speech segment
   },
   "diarizationConfig": {
-    "enableDiarization": true, #Enables speaker diarization
-    "numberOfSpeaker": null, #If set, forces number of speaker
-    "maxNumberOfSpeaker": 50 #If set and and numberOfSpeaker is not, limit the maximum number of speaker.
-    "serviceName": null # Force serviceName (See SubService Resolving)
+    "enableDiarization": true, # Enables speaker diarization or not
+    "numberOfSpeaker": null, # If set, forces number of speaker
+    "maxNumberOfSpeaker": 50 # If set and and numberOfSpeaker is not, limit the maximum number of speaker.
+    "serviceName": null # Force serviceName (See SubService Resolving)
   },
-  "language": "fr-FR"
+  "punctuationConfig": {
+    "enablePunctuation": false, # Applies punctuation or not
+    "serviceName": null # Force serviceName (See SubService resolution)
+  }
 }
 ```
-ServiceNames can be filled to use a specific subservice version. Available services are available on /list-services.
-
+`serviceName` can be filled to use a specific subservice version. Available services are available on `/list-services`.
+The target `language` can be "`*`" for automatic language detection, or usual tags to describe a language ("fr", "fr-FR", "French" -- see https://github.com/linto-ai/linto-stt/tree/master/whisper#language).
+Note that the role of this parameter is different from the role of the env variable `LANGUAGE` which is used for text normalization
+(and limited to BCP-47 codes).
-### /transcribe-multi
+
 00:07.719 Diarization and punctuation are set
 ```
-* text/srt returns the transcription formated as SubRip Subtitle.
+* `text/srt` returns the transcription formated as SubRip Subtitle.
 ```
 1
 00:00:00,000 --> 00:00:03,129