The MineTrans Systems for IWSLT 2023 Offline Speech Translation and Speech-to-Speech Translation Tasks

This repository is the official implementation of the MineTrans English-to-Chinese speech translation systems for the IWSLT 2023 speech-to-speech translation (S2ST) and offline speech translation (S2T) tracks.

🌐 Demo Page • 🤗 HuggingFace Page (coming soon) • 📃 Paper • 📽️ Slides • ⏬ Data • 🤖 Model

Team: Yichao Du, Zhengsheng Guo, Jinchuan Tian, Zhirui Zhang, Xing Wang, Jianwei Yu, Zhaopeng Tu, Tong Xu, and Enhong Chen


Overview


Setup

git clone https://github.com/duyichao/MineTrans-IWSLT23.git
cd MineTrans-IWSLT23
pip install -e ./fairseq
pip install -r requirements.txt
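
If the setup succeeded, fairseq should be importable; a trivial sanity check (not part of the repo) is:

# Verify that the editable fairseq install is visible to Python.
import fairseq
print(fairseq.__version__)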

Speech-to-Speech Translation

Pre-trained Models

Speech Encoder & K-means Model

| Language | Speech Encoder | Block type | Model size | Dataset | KM-Model |
| -------- | -------------- | ---------- | ---------- | ------- | -------- |
| En | Wav2vec 2.0 | Conformer | Large | VoxPopuli & GigaSS | × |
| Zh | HuBERT | Transformer | Base | GigaSS & AISHELL-3 | layer6.km250 |
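
As a rough illustration of how target-side discrete units come out of this pair of models, the sketch below (ours, not from this repo) extracts HuBERT layer-6 features and quantizes them with the 250-cluster k-means model. All paths are placeholders, and we assume the k-means file is a pickled scikit-learn model, as in fairseq's speech2unit pipeline.

import joblib
import soundfile as sf
import torch
from fairseq import checkpoint_utils

# Placeholder paths: a HuBERT Base checkpoint and the layer6.km250 k-means model.
models, _, _ = checkpoint_utils.load_model_ensemble_and_task(["/path/to/hubert_base.pt"])
hubert = models[0].eval()
kmeans = joblib.load("/path/to/layer6.km250")  # assumed pickled scikit-learn k-means

wav, sr = sf.read("/path/to/target_speech.wav")  # 16 kHz mono assumed
source = torch.from_numpy(wav).float().unsqueeze(0)
with torch.no_grad():
    # Take features from transformer layer 6, as the table above specifies.
    feats, _ = hubert.extract_features(source, padding_mask=None, mask=False, output_layer=6)
units = kmeans.predict(feats.squeeze(0).numpy())  # one unit ID per ~20 ms frame
print(" ".join(map(str, units)))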

S2UT Model

| Model | ASR-BLEU | ASR-chrF | Checkpoint |
| ----- | -------- | -------- | ---------- |
| W2V2-CONF-LARGE | 27.7 | 23.4 | download |
| W2V2-CONF-LARGE+T2U | 27.8 | 23.7 | download |
| HUBERT-TRANS-LARGE+T2U | 26.2 | 23.2 | download |
| HUBERT-TRANS-LARGE+T2U* | 25.7 | 22.6 | download |
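
ASR-BLEU and ASR-chrF are computed by transcribing the generated speech with an ASR system and scoring the transcripts against the text references. A minimal scoring sketch with sacrebleu (the transcription step is omitted; hyps and refs are placeholders):

import sacrebleu

hyps = ["生成语音的识别结果"]  # ASR transcripts of the generated audio (placeholders)
refs = [["参考译文"]]          # text references (placeholders)

bleu = sacrebleu.corpus_bleu(hyps, refs, tokenize="zh")  # Chinese tokenization
chrf = sacrebleu.corpus_chrf(hyps, refs)
print(f"ASR-BLEU = {bleu.score:.1f}, ASR-chrF = {chrf.score:.1f}")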

Unit HiFi-GAN Vocoder

| Unit config | Unit size | Language | Dataset | Model |
| ----------- | --------- | -------- | ------- | ----- |
| HuBERT Base, layer 6 | 250 | Zh | GigaSS-S (200h) | d_500000 |
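
The vocoder can also be driven programmatically through fairseq's CodeHiFiGANVocoder, which the generation script in the Inference section presumably wraps; paths and the unit sequence below are placeholders:

import json
import torch
import soundfile as sf
from fairseq.models.text_to_speech.vocoder import CodeHiFiGANVocoder

with open("/path/to/vocoder_cfg.json") as f:
    vocoder_cfg = json.load(f)
vocoder = CodeHiFiGANVocoder("/path/to/vocoder_ckpt", vocoder_cfg)

units = [44, 127, 27, 66, 46]  # placeholder 250-cluster unit IDs
x = {"code": torch.LongTensor(units).view(1, -1)}
with torch.no_grad():
    wav = vocoder(x, dur_prediction=True)  # predict durations for deduplicated units
sf.write("generated.wav", wav.cpu().numpy(), vocoder_cfg["sampling_rate"])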

Data Preparation

Formatting Data

The dataset should be prepared as a tab-separated manifest in the following format.

id	audio	n_frames	tgt_text	tgt_n_frames
YOU0000010267_S0001707	/path/to/YOU0000010267_S0001707.wav	49600	44 127 27 66 46	100
YOU0000016336_S0001298	/path/to/YOU0000016336_S0001298.wav	83200	44 239 222 46	202
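
A minimal sketch (not from this repo) for writing such a manifest; we read n_frames as the number of audio samples and tgt_n_frames as the length of the target unit sequence, which is an assumption about the columns above:

import csv
import soundfile as sf

# Placeholder examples: (id, source wav path, target unit string).
examples = [
    ("YOU0000010267_S0001707", "/path/to/YOU0000010267_S0001707.wav", "44 127 27 66 46"),
]

with open("test.tsv", "w", newline="") as f:
    writer = csv.writer(f, delimiter="\t")
    writer.writerow(["id", "audio", "n_frames", "tgt_text", "tgt_n_frames"])
    for ex_id, wav_path, units in examples:
        n_frames = sf.info(wav_path).frames  # number of audio samples
        writer.writerow([ex_id, wav_path, n_frames, units, len(units.split())])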

Inference

  1. Follow the same inference process as in fairseq-S2T to generate units (${RESULTS_PATH}/generate-${GEN_SUBSET}.txt).
CFG=config_u250_s2ut_audio.yaml
CKPT_S2UT=/path/to/checkpoint
RESULTS_PATH=/path/to/results
EVAL_DATA_PATH=/path/to/eval_data
GEN_SUBSET=/path/to/test_data


mkdir ${RESULTS_PATH} -p
CUDA_VISIBLE_DEVICES=1 \
  fairseq-generate ${EVAL_DATA_PATH} \
  --config-yaml ${CFG} \
  --task speech_to_text \
  --path ${CKPT_S2UT} --gen-subset ${GEN_SUBSET} \
  --max-tokens 2000000 --max-source-positions 2000000 --max-target-positions 10000 \
  --beam 10 --max-len-a 1 --max-len-b 200 --lenpen 1 \
  --scoring sacrebleu \
  --required-batch-size-multiple 1 \
  --results-path ${RESULTS_PATH}
  2. Convert unit sequences to waveforms with the unit-based HiFi-GAN vocoder.
VOCODER_CFG=/path/to/vocoder_cfg
VOCODER_CKPT=/path/to/vocoder_ckpt
  grep "^D\-" ${RESULTS_PATH}/generate-${GEN_SUBSET}.txt |
    sed 's/^D-//ig' | sort -nk1 | cut -f3 \
      >${RESULTS_PATH}/generate-${GEN_SUBSET}.hyp.unit
  grep "^T\-" ${RESULTS_PATH}/generate-${GEN_SUBSET}.txt |
    sed 's/^D-//ig' | sort -nk1 | cut -f2 \
      >${RESULTS_PATH}/generate-${GEN_SUBSET}.ref.unit

  mkdir ${RESULTS_PATH}/audio_gen -p
  python3 ./minetrans/scripts/generate_waveform_from_code.py \
    --in-code-file ${RESULTS_PATH}/generate-${GEN_SUBSET}.hyp.unit \
    --vocoder ${VOCODER_CKPT} --vocoder-cfg ${VOCODER_CFG} \
    --results-path ${RESULTS_PATH}/audio_gen --dur-prediction

Offline Speech Translation

Coming soon.


Citation

Please cite our paper if you find this repository helpful in your research:

@inproceedings{du2023minetrans,
    title = {The {M}ine{T}rans Systems for {IWSLT} 2023 Offline Speech Translation and Speech-to-Speech Translation Tasks},
    author = {Du, Yichao and Guo, Zhengsheng and Tian, Jinchuan and Zhang, Zhirui and Wang, Xing and Yu, Jianwei and Tu, Zhaopeng and Xu, Tong and Chen, Enhong},
    booktitle = {Proceedings of the 20th International Conference on Spoken Language Translation (IWSLT 2023)},
    year = {2023},
    publisher = {Association for Computational Linguistics},
    url = {https://aclanthology.org/2023.iwslt-1.3},
    pages = {79--88},
}
