This repository provides the implementation of the paper "Channel-wise Gated Res2Net: Towards Robust Detection of Synthetic Speech Attacks" (INTERSPEECH 2021). The paper extends our previous work (ICASSP 2021), for which code is also available.
- The EER (%) and t-DCF of different network architectures on the ASVspoof2019 logical access (LA) data.
- The detection accuracy on each attack for different network architectures at their EER operating points. A16 and A19 use algorithms that also appear in the training set, but were trained with different training data. A17 is the most difficult unseen attack.
- Python and packages
This code was tested on Python 3.7 with PyTorch 1.6.0. Other packages can be installed with:
pip install -r requirements.txt
- Kaldi
This work uses Kaldi to extract the spectrogram acoustic features, so you need to install Kaldi before running our scripts.
This work is conducted on the logical access (LA) partition of the ASVspoof2019 dataset, which is publicly available for download. It contains spoofing attacks generated by different voice conversion and text-to-speech algorithms.
This repository mainly consists of three parts: (i) feature extraction, (ii) system training and (iii) system evaluation.
The experiments in the paper are based on the CQT feature, while this repo provides code for extracting three features: Spec, LFCC and CQT. The top-level script for feature extraction is extract_feats.sh, where the first step (Stage 0) prepares the dataset, followed by feature extraction for Spec (Stage 1), CQT (Stage 2) and LFCC (Stage 3). All features must then be truncated in Stage 4.
After setting your dataset directory in extract_feats.sh, you can run any stage (e.g. NUM) of the script with:
./extract_feats.sh --stage NUM
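To illustrate what the truncation in Stage 4 is for, here is a minimal sketch of bringing a variable-length feature sequence to a fixed number of frames. It is not the code used by extract_feats.sh: the function name fix_length, the target length, and the choice to tile short utterances by repetition are all assumptions made for illustration.

```python
def fix_length(frames, target_len):
    """Truncate or tile a list of feature frames to exactly target_len frames.

    Hypothetical helper: the actual Stage 4 logic and its target length
    are defined in the repo's own scripts, not here.
    """
    if target_len <= 0:
        raise ValueError("target_len must be positive")
    if not frames:
        raise ValueError("frames must be non-empty")

    # Long utterance: keep only the first target_len frames.
    if len(frames) >= target_len:
        return frames[:target_len]

    # Short utterance (assumed behaviour): repeat the sequence until it is
    # long enough, then cut it to the target length.
    repeats = -(-target_len // len(frames))  # ceiling division
    return (frames * repeats)[:target_len]


# Toy example: three 1-dimensional frames stretched to five frames.
fixed = fix_length([[1], [2], [3]], 5)
```

Fixed-length inputs let utterances be stacked into batches of identical shape during training.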
This repo supports different system architectures, as configured in the conf/training_mdl directory. You can specify the modelconfig, feats, etc., in start_training_evaluation.sh, then run the command below to train and evaluate your models.
./start_training_evaluation.sh
Remember to change the runid in start_training_evaluation.sh for each configuration so that the runs can be told apart.
Some well-trained models are available in the pretrained_models directory.
To evaluate a system, you can either use the Kaldi command compute-eer with the resulting *.eer file to compute the system EER, e.g.
compute-eer NameofScoringFile.txt.eer
or use the official ASVspoof2019 script scoring/evaluate_tDCF_asvspoof19.py with the resulting *.txt file to compute both the system EER and t-DCF. For example, on the LA evaluation set, run
python scoring/evaluate_tDCF_asvspoof19.py scoring/la_asv_scores/ASVspoof2019.LA.asv.eval.gi.trl.scores.txt NameofScoringFile.txt
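For readers unfamiliar with the metric, the sketch below shows how an equal error rate can be computed from two lists of scores. It is a plain-Python illustration, not the Kaldi compute-eer implementation or the official ASVspoof2019 script. The function name compute_eer and the convention that a higher score means "more likely bona fide" are assumptions, and reading the scores from the *.eer or *.txt files is left out because their format is not described here.

```python
def compute_eer(bonafide_scores, spoof_scores):
    """Return (eer, threshold), assuming higher scores mean 'more bona fide'."""
    if not bonafide_scores or not spoof_scores:
        raise ValueError("both score lists must be non-empty")

    # Pool all scores with a label (1 = bona fide, 0 = spoof), sorted ascending.
    labelled = sorted([(s, 1) for s in bonafide_scores] +
                      [(s, 0) for s in spoof_scores])
    n_bona, n_spoof = len(bonafide_scores), len(spoof_scores)

    rejected_bona = rejected_spoof = 0
    best_gap, eer, best_thr = float("inf"), None, None
    i = 0
    while i < len(labelled):
        thr = labelled[i][0]
        # A trial is accepted when its score is >= thr.
        frr = rejected_bona / n_bona                 # bona fide wrongly rejected
        far = (n_spoof - rejected_spoof) / n_spoof   # spoof wrongly accepted
        if abs(far - frr) < best_gap:
            best_gap, eer, best_thr = abs(far - frr), (far + frr) / 2, thr
        # Moving the threshold past thr rejects every trial scored exactly thr.
        while i < len(labelled) and labelled[i][0] == thr:
            if labelled[i][1] == 1:
                rejected_bona += 1
            else:
                rejected_spoof += 1
            i += 1
    return eer, best_thr


# Toy scores: one bona fide trial and one spoof trial fall on the wrong side.
eer, thr = compute_eer([0.9, 0.6, 0.4, 0.8], [0.1, 0.5, 0.3, 0.7])
print(f"EER = {100 * eer:.2f}%")  # prints "EER = 25.00%"
```

The function sweeps the threshold over every observed score and reports the point where the false acceptance and false rejection rates are closest, averaging the two when they do not coincide exactly.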
If this repo helps your research or projects, please star it and cite our paper as follows:
@article{li2021channel,
title={Channel-wise Gated Res2Net: Towards Robust Detection of Synthetic Speech Attacks},
author={Li, Xu and Wu, Xixin and Lu, Hui and Liu, Xunying and Meng, Helen},
journal={Proc. Interspeech 2021},
year={2021}
}
- Xu Li at the Chinese University of Hong Kong