AudioCraft v1.0.0 release with training code, AudioGen, MultiBandDiffusion etc.
Showing 204 changed files with 15,938 additions and 936 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,30 +1,14 @@ | ||
# Audiocraft | ||
# AudioCraft | ||
![docs badge](https://github.com/facebookresearch/audiocraft/workflows/audiocraft_docs/badge.svg) | ||
![linter badge](https://github.com/facebookresearch/audiocraft/workflows/audiocraft_linter/badge.svg) | ||
![tests badge](https://github.com/facebookresearch/audiocraft/workflows/audiocraft_tests/badge.svg) | ||
|
||
Audiocraft is a PyTorch library for deep learning research on audio generation. At the moment, it contains the code for MusicGen, a state-of-the-art controllable text-to-music model. | ||
AudioCraft is a PyTorch library for deep learning research on audio generation. AudioCraft contains inference and training code | ||
for two state-of-the-art AI generative models producing high-quality audio: AudioGen and MusicGen. | ||
|
||
## MusicGen | ||
|
||
Audiocraft provides the code and models for MusicGen, [a simple and controllable model for music generation][arxiv]. MusicGen is a single stage auto-regressive | ||
Transformer model trained over a 32kHz <a href="https://github.com/facebookresearch/encodec">EnCodec tokenizer</a> with 4 codebooks sampled at 50 Hz. Unlike existing methods like [MusicLM](https://arxiv.org/abs/2301.11325), MusicGen doesn't require a self-supervised semantic representation, and it generates | ||
all 4 codebooks in one pass. By introducing a small delay between the codebooks, we show we can predict | ||
them in parallel, thus having only 50 auto-regressive steps per second of audio. | ||
Check out our [sample page][musicgen_samples] or test the available demo! | ||
|
||
<a target="_blank" href="https://colab.research.google.com/drive/1-Xe9NCdIs2sCUbiSmwHXozK6AAhMm7_i?usp=sharing"> | ||
<img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/> | ||
</a> | ||
<a target="_blank" href="https://huggingface.co/spaces/facebook/MusicGen"> | ||
<img src="https://huggingface.co/datasets/huggingface/badges/raw/main/open-in-hf-spaces-sm.svg" alt="Open in HugginFace"/> | ||
</a> | ||
<br> | ||
|
||
We use 20K hours of licensed music to train MusicGen. Specifically, we rely on an internal dataset of 10K high-quality music tracks, and on the ShutterStock and Pond5 music data. | ||
|
||
## Installation | ||
Audiocraft requires Python 3.9, PyTorch 2.0.0, and a GPU with at least 16 GB of memory (for the medium-sized model). To install Audiocraft, you can run the following: | ||
AudioCraft requires Python 3.9, PyTorch 2.0.0. To install AudioCraft, you can run the following: | ||
|
||
```shell | ||
# Best to make sure you have torch installed first, in particular before installing xformers. | ||
|
@@ -33,143 +17,66 @@ pip install 'torch>=2.0' | |
# Then proceed to one of the following | ||
pip install -U audiocraft # stable release | ||
pip install -U git+https://git@github.com/facebookresearch/audiocraft#egg=audiocraft # bleeding edge | ||
pip install -e . # or if you cloned the repo locally | ||
``` | ||
|
||
## Usage | ||
We offer a number of way to interact with MusicGen: | ||
1. A demo is also available on the [`facebook/MusicGen` HuggingFace Space](https://huggingface.co/spaces/facebook/MusicGen) (huge thanks to all the HF team for their support). | ||
2. You can run the Gradio demo in Colab: [colab notebook](https://colab.research.google.com/drive/1-Xe9NCdIs2sCUbiSmwHXozK6AAhMm7_i?usp=sharing). | ||
3. You can use the gradio demo locally by running `python app.py`. | ||
4. You can play with MusicGen by running the jupyter notebook at [`demo.ipynb`](./demo.ipynb) locally (if you have a GPU). | ||
5. Checkout [@camenduru Colab page](https://github.com/camenduru/MusicGen-colab) which is regularly | ||
updated with contributions from @camenduru and the community. | ||
6. Finally, MusicGen is available in 🤗 Transformers from v4.31.0 onwards, see section [🤗 Transformers Usage](#-transformers-usage) below. | ||
|
||
## API | ||
|
||
We provide a simple API and 4 pre-trained models. The pre trained models are: | ||
- `small`: 300M model, text to music only - [🤗 Hub](https://huggingface.co/facebook/musicgen-small) | ||
- `medium`: 1.5B model, text to music only - [🤗 Hub](https://huggingface.co/facebook/musicgen-medium) | ||
- `melody`: 1.5B model, text to music and text+melody to music - [🤗 Hub](https://huggingface.co/facebook/musicgen-melody) | ||
- `large`: 3.3B model, text to music only - [🤗 Hub](https://huggingface.co/facebook/musicgen-large) | ||
|
||
We observe the best trade-off between quality and compute with the `medium` or `melody` model. | ||
In order to use MusicGen locally **you must have a GPU**. We recommend 16GB of memory, but smaller | ||
GPUs will be able to generate short sequences, or longer sequences with the `small` model. | ||
|
||
**Note**: Please make sure to have [ffmpeg](https://ffmpeg.org/download.html) installed when using newer version of `torchaudio`. | ||
You can install it with: | ||
``` | ||
apt-get install ffmpeg | ||
``` | ||
|
||
See after a quick example for using the API. | ||
|
||
```python | ||
import torchaudio | ||
from audiocraft.models import MusicGen | ||
from audiocraft.data.audio import audio_write | ||
|
||
model = MusicGen.get_pretrained('melody') | ||
model.set_generation_params(duration=8) # generate 8 seconds. | ||
wav = model.generate_unconditional(4) # generates 4 unconditional audio samples | ||
descriptions = ['happy rock', 'energetic EDM', 'sad jazz'] | ||
wav = model.generate(descriptions) # generates 3 samples. | ||
|
||
melody, sr = torchaudio.load('./assets/bach.mp3') | ||
# generates using the melody from the given audio and the provided descriptions. | ||
wav = model.generate_with_chroma(descriptions, melody[None].expand(3, -1, -1), sr) | ||
|
||
for idx, one_wav in enumerate(wav): | ||
# Will save under {idx}.wav, with loudness normalization at -14 db LUFS. | ||
audio_write(f'{idx}', one_wav.cpu(), model.sample_rate, strategy="loudness", loudness_compressor=True) | ||
``` | ||
|
||
## 🤗 Transformers Usage | ||
|
||
MusicGen is available in the 🤗 Transformers library from version 4.31.0 onwards, requiring minimal dependencies | ||
and additional packages. Steps to get started: | ||
|
||
1. First install the 🤗 [Transformers library](https://github.com/huggingface/transformers) from main: | ||
|
||
``` | ||
pip install git+https://github.com/huggingface/transformers.git | ||
pip install -e . # or if you cloned the repo locally (mandatory if you want to train). | ||
``` | ||
|
||
2. Run the following Python code to generate text-conditional audio samples: | ||
|
||
```py | ||
from transformers import AutoProcessor, MusicgenForConditionalGeneration | ||
|
||
|
||
processor = AutoProcessor.from_pretrained("facebook/musicgen-small") | ||
model = MusicgenForConditionalGeneration.from_pretrained("facebook/musicgen-small") | ||
|
||
inputs = processor( | ||
text=["80s pop track with bassy drums and synth", "90s rock song with loud guitars and heavy drums"], | ||
padding=True, | ||
return_tensors="pt", | ||
) | ||
|
||
audio_values = model.generate(**inputs, max_new_tokens=256) | ||
We also recommend having `ffmpeg` installed, either through your system or Anaconda: | ||
```bash | ||
sudo apt-get install ffmpeg | ||
# Or if you are using Anaconda or Miniconda | ||
conda install 'ffmpeg<5' -c conda-forge | ||
``` | ||
|
||
3. Listen to the audio samples either in an ipynb notebook: | ||
## Models | ||
|
||
```py | ||
from IPython.display import Audio | ||
At the moment, AudioCraft contains the training code and inference code for: | ||
* [MusicGen](./docs/MUSICGEN.md): A state-of-the-art controllable text-to-music model. | ||
* [AudioGen](./docs/AUDIOGEN.md): A state-of-the-art text-to-sound model. | ||
* [EnCodec](./docs/ENCODEC.md), a state-of-the-art high fidelity neural audio codec. | ||
* [Multi Band Diffusion](./docs/MBD.md): EnCodec compatible decoder using diffusion. | ||
|
||
sampling_rate = model.config.audio_encoder.sampling_rate | ||
Audio(audio_values[0].numpy(), rate=sampling_rate) | ||
``` | ||
## Training code | ||
|
||
Or save them as a `.wav` file using a third-party library, e.g. `scipy`: | ||
AudioCraft contains PyTorch components for deep learning research in audio and training pipelines for the developed models. | ||
For a general introduction of AudioCraft design principles and instructions to develop your own training pipeline, refer to | ||
the [AudioCraft training documentation](./docs/TRAINING.md). | ||
|
||
```py | ||
import scipy | ||
For reproducing existing work and using the developed training pipelines, refer to the instructions for each specific model | ||
that provides pointers to configuration, example grids and model/task-specific information and FAQ. | ||
|
||
sampling_rate = model.config.audio_encoder.sampling_rate | ||
scipy.io.wavfile.write("musicgen_out.wav", rate=sampling_rate, data=audio_values[0, 0].numpy()) | ||
``` | ||
|
||
For more details on using the MusicGen model for inference using the 🤗 Transformers library, refer to the | ||
[MusicGen docs](https://huggingface.co/docs/transformers/main/en/model_doc/musicgen) or the hands-on | ||
[Google Colab](https://colab.research.google.com/github/sanchit-gandhi/notebooks/blob/main/MusicGen.ipynb). | ||
## API documentation | ||
|
||
## Model Card | ||
We provide some [API documentation](https://facebookresearch.github.io/audiocraft/api_docs/audiocraft/index.html) for AudioCraft. | ||
|
||
See [the model card page](./MODEL_CARD.md). | ||
|
||
## FAQ | ||
|
||
#### Will the training code be released? | ||
|
||
Yes. We will soon release the training code for MusicGen and EnCodec. | ||
#### Is the training code available? | ||
|
||
Yes! We provide the training code for [EnCodec](./docs/ENCODEC.md), [MusicGen](./docs/MUSICGEN.md) and [Multi Band Diffusion](./docs/MBD.md). | ||
|
||
#### I need help on Windows | ||
#### Where are the models stored? | ||
|
||
@FurkanGozukara made a complete tutorial for [Audiocraft/MusicGen on Windows](https://youtu.be/v-YpvPkhdO4) | ||
Hugging Face stored the model in a specific location, which can be overriden by setting the `AUDIOCRAFT_CACHE_DIR` environment variable. | ||
|
||
#### I need help for running the demo on Colab | ||
|
||
Check [@camenduru tutorial on Youtube](https://www.youtube.com/watch?v=EGfxuTy9Eeo). | ||
## License | ||
* The code in this repository is released under the MIT license as found in the [LICENSE file](LICENSE). | ||
* The models weights in this repository are released under the CC-BY-NC 4.0 license as found in the [LICENSE_weights file](LICENSE_weights). | ||
|
||
|
||
## Citation | ||
|
||
For the general framework of AudioCraft, please cite the following. | ||
``` | ||
@article{copet2023simple, | ||
title={Simple and Controllable Music Generation}, | ||
author={Jade Copet and Felix Kreuk and Itai Gat and Tal Remez and David Kant and Gabriel Synnaeve and Yossi Adi and Alexandre Défossez}, | ||
year={2023}, | ||
journal={arXiv preprint arXiv:2306.05284}, | ||
title={Simple and Controllable Music Generation}, | ||
author={Jade Copet and Felix Kreuk and Itai Gat and Tal Remez and David Kant and Gabriel Synnaeve and Yossi Adi and Alexandre Défossez}, | ||
year={2023}, | ||
journal={arXiv preprint arXiv:2306.05284}, | ||
} | ||
``` | ||
|
||
## License | ||
* The code in this repository is released under the MIT license as found in the [LICENSE file](LICENSE). | ||
* The weights in this repository are released under the CC-BY-NC 4.0 license as found in the [LICENSE_weights file](LICENSE_weights). | ||
|
||
[arxiv]: https://arxiv.org/abs/2306.05284 | ||
[musicgen_samples]: https://ai.honu.io/papers/musicgen/ | ||
When referring to a specific model, please cite as mentioned in the model specific README, e.g | ||
[./docs/MUSICGEN.md](./docs/MUSICGEN.md), [./docs/AUDIOGEN.md](./docs/AUDIOGEN.md), etc. |
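
Reviewer note on the new "Where are the models stored?" FAQ entry in the README diff above: the entry names the `AUDIOCRAFT_CACHE_DIR` environment variable but gives no example. The following is a minimal sketch of how one might set it from Python before loading any model. The helper name `set_audiocraft_cache_dir` and the scratch directory are hypothetical and not part of this commit; only the variable name comes from the README.

```python
import os
import tempfile
from pathlib import Path


def set_audiocraft_cache_dir(path: str) -> Path:
    """Create `path` if needed and export it as AUDIOCRAFT_CACHE_DIR.

    Hypothetical helper for illustration; it only sets the environment
    variable named in the README and does not call into audiocraft.
    """
    cache_dir = Path(path).expanduser().resolve()
    cache_dir.mkdir(parents=True, exist_ok=True)
    os.environ["AUDIOCRAFT_CACHE_DIR"] = str(cache_dir)
    return cache_dir


# Example: point the cache at a scratch directory (assumed location).
cache_dir = set_audiocraft_cache_dir(
    os.path.join(tempfile.gettempdir(), "audiocraft_cache")
)
print(os.environ["AUDIOCRAFT_CACHE_DIR"])
```

Setting the variable in the shell before launching Python (for example with `export`) should have the same effect; doing it in code, ahead of any model loading, keeps the choice next to the script that depends on it.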
Binary file not shown.
Binary file not shown.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,22 @@ | ||
# Copyright (c) Meta Platforms, Inc. and affiliates. | ||
# All rights reserved. | ||
# | ||
# This source code is licensed under the license found in the | ||
# LICENSE file in the root directory of this source tree. | ||
"""Adversarial losses and discriminator architectures.""" | ||
|
||
# flake8: noqa | ||
from .discriminators import ( | ||
MultiPeriodDiscriminator, | ||
MultiScaleDiscriminator, | ||
MultiScaleSTFTDiscriminator | ||
) | ||
from .losses import ( | ||
AdversarialLoss, | ||
AdvLossType, | ||
get_adv_criterion, | ||
get_fake_criterion, | ||
get_real_criterion, | ||
FeatLossType, | ||
FeatureMatchingLoss | ||
) |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,10 @@ | ||
# Copyright (c) Meta Platforms, Inc. and affiliates. | ||
# All rights reserved. | ||
# | ||
# This source code is licensed under the license found in the | ||
# LICENSE file in the root directory of this source tree. | ||
|
||
# flake8: noqa | ||
from .mpd import MultiPeriodDiscriminator | ||
from .msd import MultiScaleDiscriminator | ||
from .msstftd import MultiScaleSTFTDiscriminator |