
NPDA-kNN-ST

Official implementation of the EMNLP'2022 paper "Non-Parametric Domain Adaptation for End-to-End Speech Translation".

This codebase is currently a preliminary (nightly) version and is being refactored; we will release the refactored code in the future.

Citation

Please cite our paper if you find this repository helpful in your research:

@article{Du2022NonParametricDA,
  title={Non-Parametric Domain Adaptation for End-to-End Speech Translation},
  author={Yichao Du and Weizhi Wang and Zhirui Zhang and Boxing Chen and Tong Xu and Jun Xie and Enhong Chen},
  journal={ArXiv},
  year={2022},
  volume={abs/2205.11211}
}

Instructions

Requirements and Installation

  • python = 3.6
  • pytorch = 1.8.1
  • torchaudio = 0.8.1
  • SoundFile = 0.10.3.post1
  • numpy = 1.19.5
  • omegaconf = 2.0.6
  • PyYAML = 5.4.1
  • sentencepiece = 0.1.96
  • sacrebleu = 1.5.1
  • faiss-gpu = 1.7.1.post1
  • torch-scatter = 2.0.8
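As a quick sanity check, the pinned versions can be verified at runtime. A minimal sketch (the import names are assumptions where they differ from the pip package names, e.g. faiss-gpu -> faiss, SoundFile -> soundfile, PyYAML -> yaml):

```python
# Minimal runtime check of the pinned dependencies listed above.
import importlib

EXPECTED = {
    "torch": "1.8.1",
    "torchaudio": "0.8.1",
    "soundfile": "0.10.3.post1",
    "numpy": "1.19.5",
    "omegaconf": "2.0.6",
    "yaml": "5.4.1",
    "sentencepiece": "0.1.96",
    "sacrebleu": "1.5.1",
    "faiss": "1.7.1.post1",
    "torch_scatter": "2.0.8",
}

for module_name, expected in EXPECTED.items():
    module = importlib.import_module(module_name)
    installed = getattr(module, "__version__", "unknown")
    status = "ok" if installed == expected else "MISMATCH"
    print(f"{module_name}: {installed} (expected {expected}) {status}")
```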

Preparations and Configurations

Pre-trained Model and Data

We use the vocab file and pre-trained ST model provided by the Fairseq S2T MuST-C example.

TSV Data

The TSV manifests we use differ from those in the Fairseq S2T MuST-C example, as follows:

| id | audio | n_frames | speaker | src_lang | src_text | tgt_lang | tgt_text |
| --- | --- | --- | --- | --- | --- | --- | --- |
| ted_767_0 | /data/mustc/en-fr/fbank80.zip:55688475685:274688 | 858 | spk.767 | en | These breakthroughs, we need to move those at full speed, and we can measure that in terms of companies, pilot projects, regulatory things that have been changed. | fr | Ces progrès, il faut que nous les réalisions à toute vitesse, et nous pouvons quantifier cela en termes de sociétés, de projets pilotes, de modifications des dispositions réglementaires. |

You can get them by:

cd ./myscripts/prepare_data
bash ./prep_mustc_data.sh
bash ./prep_europarst_data.sh
bash ./prep_europarmt_data.sh
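The resulting manifests can then be inspected with the standard csv module. A minimal sketch (the manifest path below is a placeholder, not a path the scripts are guaranteed to produce):

```python
import csv

# Placeholder path; substitute a manifest produced by the scripts above.
MANIFEST = "/data/mustc/en-fr/train_st.tsv"

with open(MANIFEST, encoding="utf-8") as f:
    reader = csv.DictReader(f, delimiter="\t")
    for i, row in enumerate(reader):
        # Each row exposes the eight columns shown in the table above.
        print(row["id"], row["n_frames"], row["src_lang"], "->", row["tgt_lang"])
        if i == 2:  # only peek at the first few rows
            break
```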

Config File

If the source and target dictionaries are different, we need to declare src_bpe_tokenizer, src_vocab_filename, tgt_bpe_tokenizer, and vocab_filename in the config file:

input_channels: 1
input_feat_per_channel: 80
sampling_alpha: 1.0
specaugment:
  freq_mask_F: 27
  freq_mask_N: 1
  time_mask_N: 1
  time_mask_T: 100
  time_mask_p: 1.0
  time_wrap_W: 0
transforms:
  '*':
  - utterance_cmvn
  _train:
  - utterance_cmvn
  - specaugment
src_bpe_tokenizer:
  bpe: sentencepiece
  sentencepiece_model: /data/mustc/en-fr/st_joint_mt_data/spm_unigram_5000.model
src_vocab_filename: /data/mustc/en-fr/st_joint_mt_data/spm_unigram_5000.txt
tgt_bpe_tokenizer:
  bpe: sentencepiece
  sentencepiece_model: /data/mustc/en-fr/st_joint_mt_data/spm_unigram_8000.model
vocab_filename: /data/mustc/en-fr/st_joint_mt_data/spm_unigram_8000.txt

For multilingual experiments, you also need to add the parameter prepend_tgt_lang_tag: True to the configuration YAML file.
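A minimal sketch for checking that a config file declares the fields above (assuming PyYAML; the config path is a placeholder):

```python
import yaml

# Placeholder path; point this at your actual config file.
CONFIG = "/data/mustc/en-fr/config_st.yaml"

with open(CONFIG, encoding="utf-8") as f:
    cfg = yaml.safe_load(f)

# With different source and target dictionaries, all four fields must be set.
for key in ("src_bpe_tokenizer", "src_vocab_filename",
            "tgt_bpe_tokenizer", "vocab_filename"):
    assert key in cfg, f"missing {key} in {CONFIG}"

print("prepend_tgt_lang_tag:", cfg.get("prepend_tgt_lang_tag", False))
```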

Unifying Text and Speech Representation

We feed parallel $\langle speech, translation\rangle$ and $\langle speech, transcription\rangle$ pairs into the model to unify text and speech representations on the decoder side. For convenience, we also provide the trained model.

Training the Bilingual Model on the MuST-C Corpus:

cd ./myscripts/mustc2europarl
bash ./unify_representation_bilingual.sh

Training the Multilingual Model on the MuST-C Corpus:

cd ./myscripts/mustc2europarl
bash ./unify_representation_multilingual.sh
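Conceptually, both pair types are decoded by the same decoder, so their losses can be summed. The sketch below illustrates this idea only; the model interface (encode_speech, decode, and the batch fields) is hypothetical and is not the repository's actual trainer:

```python
import torch.nn.functional as F

# Hypothetical interface: `model` is an encoder-decoder whose decoder is shared
# across both tasks; `translation` and `transcription` are token batches with
# teacher-forcing inputs ("prev_tokens") and gold targets ("tokens").
def unified_step(model, speech, translation, transcription):
    enc = model.encode_speech(speech)
    # <speech, translation> pair: ST loss on the shared decoder.
    st_logits = model.decode(enc, translation["prev_tokens"])    # (B, T, V)
    st_loss = F.cross_entropy(st_logits.transpose(1, 2), translation["tokens"])
    # <speech, transcription> pair: transcription loss on the same decoder,
    # pulling text and speech representations together on the decoder side.
    asr_logits = model.decode(enc, transcription["prev_tokens"])  # (B, T, V)
    asr_loss = F.cross_entropy(asr_logits.transpose(1, 2), transcription["tokens"])
    return st_loss + asr_loss
```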

Inference with $k$NN Retrieval

Create Datastore

Once the model with unified text and speech representations is well tuned, we can load it to create a cached datastore with the following script:

cd ./myscripts/mustc2europarlst
bash ./build_datastore.sh
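The datastore follows the kNN-MT recipe: each entry pairs a decoder hidden state (key) with the gold next target token (value). A minimal sketch of the on-disk layout (the sizes and dtypes are assumptions; the filenames match those mentioned in the next subsection):

```python
import numpy as np

# Assumed sizes: number of (key, value) pairs, i.e. target tokens in the
# in-domain data, and the decoder hidden dimension.
DSTORE_SIZE = 1_000_000
DIM = 512

keys = np.memmap("keys.npy", dtype=np.float16, mode="w+",
                 shape=(DSTORE_SIZE, DIM))
vals = np.memmap("vals.npy", dtype=np.int64, mode="w+",
                 shape=(DSTORE_SIZE, 1))

# During a forward pass over the in-domain data, each decoder hidden state
# h_t is written as a key and the gold next token y_t as its value:
#   keys[offset] = h_t
#   vals[offset] = y_t
```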

Build Faiss Index

The FAISS index requires a training stage in which it learns a set of clusters for the keys. Once this is complete, all keys must be added to the index. The speed of adding keys depends on the hardware, particularly the amount of RAM available. Once the knn_index is built, keys.npy and vals.npy can be removed to save disk space.

cd ./myscripts/mustc2europarlst
bash ./train_datastore.sh
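A minimal sketch of the train-then-add stage described above (the index type, cluster count, and PQ parameters are assumptions, not the repository's exact settings):

```python
import faiss
import numpy as np

DSTORE_SIZE, DIM = 1_000_000, 512
keys = np.memmap("keys.npy", dtype=np.float16, mode="r",
                 shape=(DSTORE_SIZE, DIM))

# IVF-PQ index: 4096 clusters, 64 sub-quantizers of 8 bits each.
quantizer = faiss.IndexFlatL2(DIM)
index = faiss.IndexIVFPQ(quantizer, DIM, 4096, 64, 8)

# Training stage: learn the cluster centroids on a sample of the keys.
sample = np.ascontiguousarray(keys[:200_000]).astype(np.float32)
index.train(sample)

# Add all keys in chunks so peak RAM stays bounded.
for start in range(0, DSTORE_SIZE, 100_000):
    chunk = keys[start:start + 100_000]
    index.add(np.ascontiguousarray(chunk).astype(np.float32))

faiss.write_index(index, "knn_index")
```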

Inference via $k$NN-ST

cd ./myscripts/mustc2europarlst
bash ./eval_via_knn.sh
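At each decoding step, the retrieved neighbors induce a distribution over target tokens that is interpolated with the model's distribution, in the style of kNN-MT. A minimal sketch (the interpolation weight, temperature, and tensor interface are assumptions):

```python
import torch

def knn_interpolate(model_logprobs, knn_dists, knn_vals, vocab_size,
                    lambda_=0.5, temperature=10.0):
    """Interpolate model and retrieval distributions, kNN-MT style.

    model_logprobs: (vocab_size,) log-probabilities from the ST model.
    knn_dists:      (k,) L2 distances of the retrieved neighbors.
    knn_vals:       (k,) long tensor of the neighbors' target-token ids.
    """
    # Turn negative distances into a distribution over the retrieved tokens.
    weights = torch.softmax(-knn_dists / temperature, dim=0)
    knn_probs = torch.zeros(vocab_size).scatter_add_(0, knn_vals, weights)
    # p = lambda * p_kNN + (1 - lambda) * p_model
    probs = lambda_ * knn_probs + (1 - lambda_) * model_logprobs.exp()
    return probs.log()
```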
