The MineTrans Systems for IWSLT 2023 Offline Speech Translation and Speech-to-Speech Translation Tasks

This repository is the official implementation of the MineTrans English-to-Chinese speech translation systems for the IWSLT 2023 speech-to-speech translation (S2ST) and offline speech translation (S2T) tracks.

🌐 Demo Page • 🤗 HuggingFace Page (coming soon) • 📃 Paper • 📽️ Slides • ⏬ Data • 🤖 Model

Team: Yichao Du, Zhengsheng Guo, Jinchuan Tian, Zhirui Zhang, Xing Wang, Jianwei Yu, Zhaopeng Tu, Tong Xu, and Enhong Chen


Overview


Setup

git clone https://github.com/duyichao/MineTrans-IWSLT23.git
cd MineTrans-IWSLT23
pip install -e ./fairseq
pip install -r requirements.txt
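
If the setup succeeded, fairseq should be importable; a trivial sanity check (not part of the repo) is:

# Verify that the editable fairseq install is visible to Python.
import fairseq
print(fairseq.__version__)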

Speech-to-Speech Translation

Pre-trained Models

Speech Encoder & K-means Model

| Language | Speech Encoder | Block type | Model size | Dataset | KM-Model |
| -------- | -------------- | ---------- | ---------- | ------- | -------- |
| En | Wav2vec 2.0 | Conformer | Large | VoxPopuli & GigaSS | × |
| Zh | HuBERT | Transformer | Base | GigaSS & AISHELL-3 | layer6.km250 |
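
As a rough illustration of how target-side discrete units come out of this pair of models, the sketch below (ours, not from this repo) extracts HuBERT layer-6 features and quantizes them with the 250-cluster k-means model. All paths are placeholders, and we assume the k-means file is a pickled scikit-learn model, as in fairseq's speech2unit pipeline.

import joblib
import soundfile as sf
import torch
from fairseq import checkpoint_utils

# Placeholder paths: a HuBERT Base checkpoint and the layer6.km250 k-means model.
models, _, _ = checkpoint_utils.load_model_ensemble_and_task(["/path/to/hubert_base.pt"])
hubert = models[0].eval()
kmeans = joblib.load("/path/to/layer6.km250")  # assumed pickled scikit-learn k-means

wav, sr = sf.read("/path/to/target_speech.wav")  # 16 kHz mono assumed
source = torch.from_numpy(wav).float().unsqueeze(0)
with torch.no_grad():
    # Take features from transformer layer 6, as the table above specifies.
    feats, _ = hubert.extract_features(source, padding_mask=None, mask=False, output_layer=6)
units = kmeans.predict(feats.squeeze(0).numpy())  # one unit ID per ~20 ms frame
print(" ".join(map(str, units)))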

S2UT Model

| Model | ASR-BLEU | ASR-chrF | Checkpoint |
| ----- | -------- | -------- | ---------- |
| W2V2-CONF-LARGE | 27.7 | 23.4 | download |
| W2V2-CONF-LARGE+T2U | 27.8 | 23.7 | download |
| HUBERT-TRANS-LARGE+T2U | 26.2 | 23.2 | download |
| HUBERT-TRANS-LARGE+T2U* | 25.7 | 22.6 | download |
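
ASR-BLEU and ASR-chrF are computed by transcribing the generated speech with an ASR system and scoring the transcripts against the text references. A minimal scoring sketch with sacrebleu (the transcription step is omitted; hyps and refs are placeholders):

import sacrebleu

hyps = ["生成语音的识别结果"]  # ASR transcripts of the generated audio (placeholders)
refs = [["参考译文"]]          # text references (placeholders)

bleu = sacrebleu.corpus_bleu(hyps, refs, tokenize="zh")  # Chinese tokenization
chrf = sacrebleu.corpus_chrf(hyps, refs)
print(f"ASR-BLEU = {bleu.score:.1f}, ASR-chrF = {chrf.score:.1f}")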

Unit HiFi-GAN Vocoder

| Unit config | Unit size | Language | Dataset | Model |
| ----------- | --------- | -------- | ------- | ----- |
| HuBERT Base, layer 6 | 250 | Zh | GigaSS-S (200h) | d_500000 |
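
The vocoder can also be driven programmatically through fairseq's CodeHiFiGANVocoder, which the generation script in the Inference section presumably wraps; paths and the unit sequence below are placeholders:

import json
import torch
import soundfile as sf
from fairseq.models.text_to_speech.vocoder import CodeHiFiGANVocoder

with open("/path/to/vocoder_cfg.json") as f:
    vocoder_cfg = json.load(f)
vocoder = CodeHiFiGANVocoder("/path/to/vocoder_ckpt", vocoder_cfg)

units = [44, 127, 27, 66, 46]  # placeholder 250-cluster unit IDs
x = {"code": torch.LongTensor(units).view(1, -1)}
with torch.no_grad():
    wav = vocoder(x, dur_prediction=True)  # predict durations for deduplicated units
sf.write("generated.wav", wav.cpu().numpy(), vocoder_cfg["sampling_rate"])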

Data Preparation

Formatting Data

The dataset should be prepared as a tab-separated manifest in the following format.

id	audio	n_frames	tgt_text	tgt_n_frames
YOU0000010267_S0001707	/path/to/YOU0000010267_S0001707.wav	49600	44 127 27 66 46	100
YOU0000016336_S0001298	/path/to/YOU0000016336_S0001298.wav	83200	44 239 222 46	202
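
A minimal sketch (not from this repo) for writing such a manifest; we read n_frames as the number of audio samples and tgt_n_frames as the length of the target unit sequence, which is an assumption about the columns above:

import csv
import soundfile as sf

# Placeholder examples: (id, source wav path, target unit string).
examples = [
    ("YOU0000010267_S0001707", "/path/to/YOU0000010267_S0001707.wav", "44 127 27 66 46"),
]

with open("test.tsv", "w", newline="") as f:
    writer = csv.writer(f, delimiter="\t")
    writer.writerow(["id", "audio", "n_frames", "tgt_text", "tgt_n_frames"])
    for ex_id, wav_path, units in examples:
        n_frames = sf.info(wav_path).frames  # number of audio samples
        writer.writerow([ex_id, wav_path, n_frames, units, len(units.split())])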

Inference

  1. Follow the same inference process as in fairseq-S2T to generate units (${RESULTS_PATH}/generate-${GEN_SUBSET}.txt).
CFG=config_u250_s2ut_audio.yaml
CKPT_S2UT=/path/to/checkpoint
RESULTS_PATH=/path/to/results
EVAL_DATA_PATH=/path/to/eval_data
GEN_SUBSET=/path/to/test_data


mkdir ${RESULTS_PATH} -p
CUDA_VISIBLE_DEVICES=1 \
  fairseq-generate ${EVAL_DATA_PATH} \
  --config-yaml ${CFG} \
  --task speech_to_text \
  --path ${CKPT_S2UT} --gen-subset ${GEN_SUBSET} \
  --max-tokens 2000000 --max-source-positions 2000000 --max-target-positions 10000 \
  --beam 10 --max-len-a 1 --max-len-b 200 --lenpen 1 \
  --scoring sacrebleu \
  --required-batch-size-multiple 1 \
  --results-path ${RESULTS_PATH}
  2. Convert unit sequences to waveforms with the unit-based HiFi-GAN vocoder.
VOCODER_CFG=/path/to/vocoder_cfg
VOCODER_CKPT=/path/to/vocoder_ckpt
  grep "^D\-" ${RESULTS_PATH}/generate-${GEN_SUBSET}.txt |
    sed 's/^D-//ig' | sort -nk1 | cut -f3 \
      >${RESULTS_PATH}/generate-${GEN_SUBSET}.hyp.unit
  grep "^T\-" ${RESULTS_PATH}/generate-${GEN_SUBSET}.txt |
    sed 's/^D-//ig' | sort -nk1 | cut -f2 \
      >${RESULTS_PATH}/generate-${GEN_SUBSET}.ref.unit

  mkdir ${RESULTS_PATH}/audio_gen -p
  python3 ./minetrans/scripts/generate_waveform_from_code.py \
    --in-code-file ${RESULTS_PATH}/generate-${GEN_SUBSET}.hyp.unit \
    --vocoder ${VOCODER_CKPT} --vocoder-cfg ${VOCODER_CFG} \
    --results-path ${RESULTS_PATH}/audio_gen --dur-prediction

Offline Speech Translation

Coming soon.


Citation

Please cite our paper if you find this repository helpful in your research:

@inproceedings{du2023minetrans,
    title = {The {M}ine{T}rans Systems for {IWSLT} 2023 Offline Speech Translation and Speech-to-Speech Translation Tasks},
    author = {Du, Yichao and Guo, Zhengsheng and Tian, Jinchuan and Zhang, Zhirui and Wang, Xing and Yu, Jianwei and Tu, Zhaopeng and Xu, Tong and Chen, Enhong},
    booktitle = {Proceedings of the 20th International Conference on Spoken Language Translation (IWSLT 2023)},
    year = {2023},
    publisher = {Association for Computational Linguistics},
    url = {https://aclanthology.org/2023.iwslt-1.3},
    pages = {79--88},
}
