Applying Large-Scale Weakly-Supervised Automatic Speech Recognition to Air Traffic Control
The `CreateDataset` module contains the files needed to create datasets and upload them to the Hugging Face Hub. The data is randomly split, with 80% used for training and 20% for validation. Currently the following datasets are available:
However, the latter is only used for training and not for evaluation, as it does not form a benchmark.
The `PromptTesting` module contains the files used for the iterative experiments on prompting and normalization.
Two scripts are available for evaluating the models: one for the blank (pre-trained) models and one for the fine-tuned models. The fine-tuned model weights are converted into the Whisper format for inference. This folder also contains the normalization script.
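ASR evaluation is typically reported as word error rate (WER): the word-level edit distance between reference and hypothesis, divided by the reference length. A minimal pure-Python sketch of the metric (not the repository's actual evaluation code, which also applies the normalization script before scoring):

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance / number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    # Levenshtein distance over words via dynamic programming.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,          # deletion
                          d[i][j - 1] + 1,          # insertion
                          d[i - 1][j - 1] + cost)   # substitution
    return d[len(ref)][len(hyp)] / len(ref)

# One inserted word against a four-word reference gives WER 1/4.
print(wer("contact ground point six", "contact ground one point six"))  # 0.25
```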
The fine-tuning scripts provide a modular way of fine-tuning the blank Whisper models on the created datasets. The resulting models are automatically uploaded in the Hugging Face format. Fine-tuning relies on the `deepspeed` package. Currently the following fine-tuned models are available:
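Fine-tuning with `deepspeed` is driven by a JSON configuration file. A minimal ZeRO stage-2 sketch of the kind such a setup might use; the values below are illustrative, not the repository's actual configuration:

```json
{
  "train_micro_batch_size_per_gpu": 8,
  "gradient_accumulation_steps": 2,
  "fp16": { "enabled": true },
  "zero_optimization": {
    "stage": 2,
    "offload_optimizer": { "device": "cpu" }
  }
}
```

Such a file is typically passed to the launcher, e.g. `deepspeed finetune.py --deepspeed ds_config.json` (script and file names illustrative).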
All the datasets and models are available on the HuggingFace🤗 Hub.
An interactive demo can be found on HuggingFace🤗 Spaces.
The paper can be found here.
All code is licensed under LGPL-3.0; see the LICENSE file for details. This repository uses Whisper, which is licensed under the MIT License.
If you use this code in your work, please cite the accompanying paper:
The models can be cited as follows:
@misc {wlv3-atco2-asr,
author = { {J.L.P.M. van Doorn} },
title = { whisper-large-v3-atco2-asr },
year = 2023,
url = { https://huggingface.co/jlvdoorn/whisper-large-v3-atco2-asr },
doi = { 10.57967/hf/1386 },
publisher = { Hugging Face }
}
@misc {wlv3-atcosim,
author = { {J.L.P.M. van Doorn} },
title = { whisper-large-v3-atcosim },
year = 2023,
url = { https://huggingface.co/jlvdoorn/whisper-large-v3-atcosim },
doi = { 10.57967/hf/1387 },
publisher = { Hugging Face }
}
@misc {wlv3-atco2-asr-atcosim,
author = { {J.L.P.M. van Doorn} },
title = { whisper-large-v3-atco2-asr-atcosim },
year = 2023,
url = { https://huggingface.co/jlvdoorn/whisper-large-v3-atco2-asr-atcosim },
doi = { 10.57967/hf/1388 },
publisher = { Hugging Face }
}
@misc {wlv2-atco2-asr,
author = { {J.L.P.M. van Doorn} },
title = { whisper-large-v2-atco2-asr },
year = 2023,
url = { https://huggingface.co/jlvdoorn/whisper-large-v2-atco2-asr },
doi = { 10.57967/hf/1376 },
publisher = { Hugging Face }
}
@misc {wlv2-atcosim,
author = { {J.L.P.M. van Doorn} },
title = { whisper-large-v2-atcosim },
year = 2023,
url = { https://huggingface.co/jlvdoorn/whisper-large-v2-atcosim },
doi = { 10.57967/hf/1374 },
publisher = { Hugging Face }
}
@misc {wlv2-atco2-asr-atcosim,
author = { {J.L.P.M. van Doorn} },
title = { whisper-large-v2-atco2-asr-atcosim },
year = 2023,
url = { https://huggingface.co/jlvdoorn/whisper-large-v2-atco2-asr-atcosim },
doi = { 10.57967/hf/1375 },
publisher = { Hugging Face }
}
The datasets can be cited as follows:
@misc {atco2-asr,
author = { {J.L.P.M. van Doorn} },
title = { atco2-asr },
year = 2023,
url = { https://huggingface.co/datasets/jlvdoorn/atco2-asr },
doi = { 10.57967/hf/1377 },
publisher = { Hugging Face }
}
@misc {atcosim,
author = { {J.L.P.M. van Doorn} },
title = { atcosim },
year = 2023,
url = { https://huggingface.co/datasets/jlvdoorn/atcosim },
doi = { 10.57967/hf/1378 },
publisher = { Hugging Face }
}
@misc {atco2-asr-atcosim,
author = { {J.L.P.M. van Doorn} },
title = { atco2-asr-atcosim },
year = 2023,
url = { https://huggingface.co/datasets/jlvdoorn/atco2-asr-atcosim },
doi = { 10.57967/hf/1379 },
publisher = { Hugging Face }
}