This guide describes how to generate training data for NVAS3D.
Download all rooms from Matterport3D.
Download the following datasets for source audio and place them under data/source
To process the Slakh2100 dataset (clipping, splitting, and upsampling to 48 kHz), execute the following command:
python nvas3d/training_data_generation/script_slakh.py
The output will be located at data/MIDI/clip/.
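The clipping and splitting step presumably cuts each track into fixed-length segments. A minimal sketch of such a splitter, assuming non-overlapping clips and a drop-the-remainder policy (the 3-second clip length is an illustrative assumption, not taken from the script):

```python
import numpy as np

def split_into_clips(audio: np.ndarray, sr: int = 48000, clip_seconds: float = 3.0):
    """Split a mono waveform into non-overlapping fixed-length clips.

    Any trailing remainder shorter than one clip is dropped (assumption).
    """
    clip_len = int(sr * clip_seconds)
    n_clips = len(audio) // clip_len
    return [audio[i * clip_len:(i + 1) * clip_len] for i in range(n_clips)]
```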
To upsample the LibriSpeech dataset to 48 kHz, execute the following command:
python nvas3d/training_data_generation/upsample_librispeech.py
The output will be located at data/source/LibriSpeech48k; move it to data/MIDI/clip/speech/LibriSpeech48k.
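LibriSpeech audio is distributed at 16 kHz, so upsampling to 48 kHz is a 3:1 rational resampling. A sketch of that conversion using polyphase filtering (the script itself may use a different resampler):

```python
import numpy as np
from scipy.signal import resample_poly

def upsample_to_48k(audio: np.ndarray, orig_sr: int = 16000) -> np.ndarray:
    """Resample a waveform to 48 kHz with a polyphase anti-imaging filter."""
    g = np.gcd(48000, orig_sr)
    return resample_poly(audio, up=48000 // g, down=orig_sr // g)
```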
To generate square-shaped microphone configuration metadata, execute the following command:
python nvas3d/training_data_generation/generate_metadata_square.py
The output metadata will be located at data/nvas3d_square/
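The square configuration presumably places four microphones at the corners of a horizontal square around a query point. A sketch of such a layout (the half-side length, corner ordering, and function name are illustrative assumptions):

```python
def square_mic_positions(center, half_side=1.0):
    """Four microphone positions at the corners of an axis-aligned square
    centred on `center` (x, y, z); the square lies in the horizontal plane."""
    cx, cy, cz = center
    return [(cx + dx, cy + dy, cz)
            for dx, dy in ((-half_side, -half_side), (-half_side, half_side),
                           (half_side, half_side), (half_side, -half_side))]
```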
Finally, to generate the training data for NVAS3D, execute the following command:
python nvas3d/training_data_generation/generate_training_data.py
The generated data will be located at data/nvas3d_square_all_all.
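At its core, rendering one training example presumably amounts to convolving each dry source clip with a room impulse response (RIR) simulated for each microphone in the Matterport3D room. A minimal sketch of that rendering step (function and argument names are illustrative, not the script's API):

```python
import numpy as np
from scipy.signal import fftconvolve

def render_at_mic(dry: np.ndarray, rir: np.ndarray) -> np.ndarray:
    """Render the reverberant signal heard at one microphone by convolving
    the dry source clip with that microphone's room impulse response."""
    return fftconvolve(dry, rir)[:len(dry)]  # truncate back to source length
```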
- LibriSpeech is licensed under CC-BY-4.0.
- Matterport3D: Matterport3D-based task datasets and trained models are distributed with the Matterport3D Terms of Use and are licensed under CC BY-NC-SA 3.0 US.