
NVAS3D Training Data Generation

This guide describes how to generate training data for NVAS3D.

Step-by-Step Instructions

1. Download Matterport3D Data

Download all rooms from Matterport3D.

2. Download Source Audios

Download the source audio datasets used in the following steps (Slakh2100 and LibriSpeech) and place them in data/source.
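
A quick way to confirm the audio ended up where the later steps expect it is to list what sits under data/source. The sketch below is only a sanity check; the subfolder names it prints depend on how you unpacked the downloads and are not prescribed by this guide.

```python
# Sanity-check sketch: list what is currently under data/source.
# Only the data/source location comes from this guide; the subfolder
# names depend on how you unpacked the downloaded datasets.
from pathlib import Path

source_root = Path("data/source")
assert source_root.is_dir(), f"Expected source audio at {source_root}"

for child in sorted(source_root.iterdir()):
    if child.is_dir():
        num_files = sum(1 for p in child.rglob("*") if p.is_file())
        print(f"{child.name}/: {num_files} file(s)")
    else:
        print(child.name)
```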

3. Preprocess Source Audios

3.1. Clip, Split, and Upsample Slakh2100

To process the Slakh2100 dataset (clipping, splitting, and upsampling to 48kHz), execute the following command:

python nvas3d/training_data_generation/script_slakh.py

The output will be located at data/MIDI/clip/.
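
The script takes care of this end to end. Conceptually, each stem is resampled to 48kHz and cut into fixed-length clips, roughly as in the torchaudio-based sketch below. The 10-second clip length and the example paths are placeholders for illustration, not the values used by script_slakh.py.

```python
# Illustrative sketch of clipping and upsampling a single stem
# (not the actual script_slakh.py implementation).
import torchaudio

TARGET_SR = 48000
CLIP_SECONDS = 10  # placeholder clip length, not the script's setting

def clip_and_upsample(in_path: str, out_prefix: str) -> None:
    waveform, sr = torchaudio.load(in_path)  # (channels, samples)
    if sr != TARGET_SR:
        waveform = torchaudio.transforms.Resample(orig_freq=sr, new_freq=TARGET_SR)(waveform)
    clip_len = TARGET_SR * CLIP_SECONDS
    for i in range(waveform.shape[1] // clip_len):  # drop the trailing remainder
        clip = waveform[:, i * clip_len:(i + 1) * clip_len]
        torchaudio.save(f"{out_prefix}_{i:04d}.wav", clip, TARGET_SR)

# Example (paths are hypothetical):
# clip_and_upsample("data/source/slakh_stem.flac", "data/MIDI/clip/example")
```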

3.2. Upsample LibriSpeech

To upsample the LibriSpeech dataset to 48kHz, execute the following command:

python nvas3d/training_data_generation/upsample_librispeech.py

The output will be located at data/source/LibriSpeech48k; move it to data/MIDI/clip/speech/LibriSpeech48k.
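
The move can be done by hand or with a small snippet such as the one below. Both paths come from the step above, and it assumes upsample_librispeech.py has already finished.

```python
# Move the upsampled LibriSpeech output to the location expected downstream.
import shutil
from pathlib import Path

src = Path("data/source/LibriSpeech48k")
dst = Path("data/MIDI/clip/speech/LibriSpeech48k")
dst.parent.mkdir(parents=True, exist_ok=True)
shutil.move(str(src), str(dst))
```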

4. Generate Metadata for Microphone Configuration

To generate square-shaped microphone configuration metadata, execute the following command:

python nvas3d/training_data_generation/generate_metadata_square.py

The output metadata will be located at data/nvas3d_square/.
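
For intuition, a square configuration places four receivers at the corners of a square around a query point. The sketch below only illustrates that geometry; the side length, y-up coordinate convention, and microphone height are assumptions, and the actual metadata fields are defined by generate_metadata_square.py.

```python
# Illustrative geometry of a square microphone layout (not the metadata format).
def square_mic_positions(cx, cz, side=1.0, height=1.5):
    """Return four (x, y, z) corner positions of an axis-aligned square."""
    h = side / 2.0
    return [(cx + dx, height, cz + dz) for dx in (-h, h) for dz in (-h, h)]

print(square_mic_positions(0.0, 0.0))
```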

5. Generate Training Data

Finally, to generate the training data for NVAS3D, execute the following command:

python nvas3d/training_data_generation/generate_training_data.py

The generated data will be located at data/nvas3d_square_all_all.
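
As a quick check after generation, you can summarize what was produced. The directory path below comes from this step; the file types present depend on the generation script.

```python
# Count generated files by extension under the output directory.
from collections import Counter
from pathlib import Path

root = Path("data/nvas3d_square_all_all")
counts = Counter(p.suffix or "<no extension>" for p in root.rglob("*") if p.is_file())
for suffix, num in counts.most_common():
    print(f"{suffix}: {num} file(s)")
```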

Acknowledgements