This project explores the integration of Neural Audio Codecs (NACs) into Contrastive Language-Audio Pretraining (CLAP) models, demonstrating superior feature discrimination and text-audio retrieval performance and setting new benchmarks for audio representation in AI systems.
We highly recommend running this project in a conda environment.
To create a new environment with the necessary dependencies, run the following command (replace envname with a name of your choice):
conda env create --name envname --file=env.yaml
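The actual dependency pins live in env.yaml. As a purely illustrative sketch of what such a file looks like (the package names and versions below are assumptions, not the real contents of env.yaml):

name: envname
channels:
  - pytorch
  - conda-forge
dependencies:
  - python=3.10        # assumed Python version
  - pytorch            # training framework (assumed dependency)
  - torchaudio         # audio loading and transforms (assumed dependency)
  - pyyaml             # parses the settings/*.yaml files (assumed dependency)
  - pip
  - pip:
      - encodec        # EnCodec neural audio codec (assumed dependency)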
To run the project on a single node with multiple GPUs (here, 4) using Distributed Data Parallel (DDP), use the following command:
torchrun --nproc_per_node=4 SMC_CodecCLAP/retrieval/smc.py -c SMC_CodecCLAP/retrieval/settings/mel.yaml
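Note that --nproc_per_node=4 spawns four processes on one machine. For a genuinely multi-node run, torchrun's standard rendezvous flags can be added; the sketch below assumes two nodes with four GPUs each and uses master-host:29500 as a placeholder rendezvous endpoint (run the same command on every node):

torchrun --nnodes=2 --nproc_per_node=4 --rdzv_backend=c10d --rdzv_endpoint=master-host:29500 SMC_CodecCLAP/retrieval/smc.py -c SMC_CodecCLAP/retrieval/settings/mel.yaml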
If you are running on CPU nodes, use the following command:
python3 SMC_CodecCLAP/retrieval/smc.py -c SMC_CodecCLAP/retrieval/settings/encodec.yaml
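The -c flag selects a YAML settings file; judging by the filenames, mel.yaml presumably configures the mel-spectrogram baseline and encodec.yaml the EnCodec codec frontend, so the CPU/GPU pairings shown above are illustrative rather than fixed.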