cvillela/audio_extractor


Audio Extractor

Repo configured with useful functions for preprocessing, segmenting and creating audio datasets, as well as extracting embeddings.

Installation

  • Create a conda environment, activate it, and install the requirements:
   conda create --name audioext -c conda-forge python=3.11
   conda activate audioext
   pip install -r ./requirements.txt
  • For Jukebox embeddings, install Jukemirlib from the GitHub repo:
   pip install git+https://github.com/rodrigo-castellon/jukemirlib.git
  • Install ffmpeg:
   conda install -c conda-forge ffmpeg
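
As a quick sanity check after installation, the sketch below tests whether an executable such as `ffmpeg` is visible on the `PATH` of the active environment. The helper name `tool_available` is hypothetical and not part of this repo.

```python
import shutil


def tool_available(name: str) -> bool:
    """Return True if an executable called `name` is found on PATH."""
    return shutil.which(name) is not None


if __name__ == "__main__":
    # Hypothetical check; the repo itself may not perform it.
    if tool_available("ffmpeg"):
        print("ffmpeg found")
    else:
        print("ffmpeg not found; run the conda install command above")
```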

Jukebox Extractor

Extract OpenAI's Jukebox embeddings from a series of audio files contained in a directory. From the project directory, run the following for help on the parameters:

python -m audioext.pipelines.jukebox_extractor -h

Note that running this script for the first time will automatically download the model weights to the machine. Refer to Jukemirlib for more information.
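
To illustrate the "series of audio files contained in a directory" input, here is a minimal sketch of how such files could be collected before extraction. The function name `find_audio_files` and the extension set are assumptions for illustration; the actual pipeline may select files differently.

```python
from pathlib import Path

# Assumed set of extensions; the real pipeline may accept others.
AUDIO_EXTENSIONS = {".wav", ".mp3", ".flac", ".ogg"}


def find_audio_files(directory, extensions=AUDIO_EXTENSIONS):
    """Recursively list audio files under `directory`, sorted by path."""
    root = Path(directory)
    return sorted(
        p for p in root.rglob("*")
        if p.is_file() and p.suffix.lower() in extensions
    )
```

Each returned path could then be handed to the embedding extractor one at a time; see the Jukemirlib documentation for the exact loading and extraction calls.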

Musicgen Dataset Generator

Create MusicGen-ready samples and metadata from a series of audio files contained in a directory, and assign them to train, val and test splits. Prompting is still unconditional (use 8H38fNdtri as a prompt to all models). From the project directory, run the following for help on the parameters:

python -m audioext.pipelines.musicgen_dataset -h
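
The train/val/test assignment can be pictured with the sketch below. It is only an illustration under assumed defaults (10% val, 10% test, fixed seed); the function name `split_dataset` and its parameters are hypothetical, and the real script's options are listed by the `-h` flag above.

```python
import random


def split_dataset(paths, val_frac=0.1, test_frac=0.1, seed=0):
    """Shuffle `paths` deterministically and split into train/val/test."""
    items = sorted(paths)               # stable starting order
    random.Random(seed).shuffle(items)  # reproducible shuffle
    n_val = int(len(items) * val_frac)
    n_test = int(len(items) * test_frac)
    return {
        "val": items[:n_val],
        "test": items[n_val:n_val + n_test],
        "train": items[n_val + n_test:],
    }
```

Sorting before shuffling keeps the split reproducible regardless of the order in which the file system lists the files.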

Reduce Noise + Remove Silence

Reduce noise and optionally remove silence between sounds in the files of a given directory. From the project directory, run the following for help on the parameters:

python -m audioext.pipelines.denoise -h
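
For intuition about the silence-removal step only, here is a minimal energy-gating sketch: the signal is cut into fixed-size frames, and frames whose RMS amplitude falls below a threshold are dropped. This is not necessarily how the repo does it; `remove_silence`, `frame_len` and `threshold` are hypothetical names, and noise reduction is not covered.

```python
import math


def remove_silence(samples, frame_len=1024, threshold=0.01):
    """Drop frames whose RMS amplitude is below `threshold`.

    `samples` is a sequence of floats in [-1.0, 1.0] for one channel.
    """
    kept = []
    for start in range(0, len(samples), frame_len):
        frame = samples[start:start + frame_len]
        rms = math.sqrt(sum(x * x for x in frame) / len(frame))
        if rms >= threshold:
            kept.extend(frame)
    return kept
```

A practical implementation would usually operate on arrays and smooth the gate boundaries to avoid clicks; the plain-list version here just keeps the idea visible.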

Next Steps

TO-DO

  • Reformat code to have a decoupled audio_segmenter utility
  • Reformat code to have a decoupled audio_processer utility
  • Convert Jukebox extraction notebook to a callable script
  • Reformat pipelines for Musicgen Dataset Generation.
  • Add Denoise + Remove Silence.
  • Reformat pipeline for Music Emotion Recognition.
  • Create Audio Metadata enriching pipelines.
  • Add bandwidth extension and audio super-resolution?

Known bugs

  • Handle single audio file with varying sample_rate
  • Debug multi-channel normalization
