A project about generating guitar tabs with large language models; the experiments focus mainly on fine-tuning GPT-2.
The DadaGP dataset is used for both training and generation, and its encoder/decoder code converts Guitar Pro files to token text and back.
Python: 3.9.13
Prepare a YAML config file like the ones in the example_configs folder, clone the dadaGP repo into the same folder as the project, and install the packages listed in the requirements file.
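As a rough illustration only, writing such a config might look like the snippet below; every key here is an assumption, so mirror the actual files in example_configs rather than this sketch.

```python
# Hypothetical config writer; all keys are assumptions, since the real
# schema is whatever example_configs uses.
import yaml

config = {
    "model_name": "gpt2-large",                       # assumed key
    "data_path": "../DadaGP-processed-classic_rock",  # assumed key
    "output_dir": "checkpoints/gpt2_large",           # assumed key
    "batch_size": 4,
    "learning_rate": 5e-5,
    "epochs": 3,
}
with open("configs/gpt2_large_train.yml", "w") as f:
    yaml.safe_dump(config, f)
```

With a config in place, the next command filters the dataset down to a single genre: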
```bash
python preprocessing.py --input-path ../DadaGP-v1.1 \
    --output-path ../DadaGP-processed-classic_rock \
    --genre classic_rock
```
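Presumably this step keeps only the songs tagged with the requested genre. The sketch below illustrates that idea with a hypothetical metadata.json mapping file paths to genre lists; the real script may organize this differently.

```python
# Hedged sketch of genre filtering; "metadata.json" and its structure
# are hypothetical stand-ins for however DadaGP stores genre tags.
import json
import shutil
from pathlib import Path

src = Path("../DadaGP-v1.1")
dst = Path("../DadaGP-processed-classic_rock")
dst.mkdir(parents=True, exist_ok=True)

genre_map = json.loads((src / "metadata.json").read_text())  # {path: [genres]}
for rel_path, genres in genre_map.items():
    if "classic_rock" in genres:
        shutil.copy(src / rel_path, dst / Path(rel_path).name)
```

Training then runs from the config: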
```bash
python train.py --config configs/gpt2_large_train.yml
```
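train.py's internals are not shown here, but a minimal GPT-2 fine-tuning setup in the HuggingFace style might look like the following; the model name, context length, and hyperparameters are placeholders for whatever the config specifies.

```python
# Minimal sketch of GPT-2 fine-tuning on the preprocessed token files;
# all choices here are illustrative, the real values come from the yaml.
from datasets import load_dataset
from transformers import (
    DataCollatorForLanguageModeling,
    GPT2LMHeadModel,
    GPT2TokenizerFast,
    Trainer,
    TrainingArguments,
)

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2-large")
tokenizer.pad_token = tokenizer.eos_token  # GPT-2 has no pad token by default
model = GPT2LMHeadModel.from_pretrained("gpt2-large")

# One token-text file per song, produced by the preprocessing step.
dataset = load_dataset(
    "text", data_files={"train": "../DadaGP-processed-classic_rock/*.txt"}
)

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=1024)

tokenized = dataset.map(tokenize, batched=True, remove_columns=["text"])

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="checkpoints/gpt2_large",
        per_device_train_batch_size=2,
        num_train_epochs=3,
    ),
    train_dataset=tokenized["train"],
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```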
For inference, prepare an input txt file that contains the initial tokens, or take the opening tokens of your favorite song from the dataset.
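For illustration, a seed file might be written like this; the token spellings mimic the DadaGP style but are not guaranteed to match the real vocabulary, so copy the exact spellings and separators from an encoded song in the dataset.

```python
# Illustrative only: token names mimic the DadaGP style; inspect an
# encoded file from the dataset for the exact vocabulary, and check
# whether tokens are newline- or space-separated.
seed_tokens = [
    "downtune:0",
    "tempo:120",
    "start",
    "new_measure",
    "clean0:note:s3:f0",  # a note on the clean guitar track
    "wait:480",           # advance time before the next event
]
with open("input.txt", "w") as f:
    f.write("\n".join(seed_tokens))
```

Then run inference: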
```bash
python inference.py --config configs/gpt2_train.yml \
    --input-path /path/to/here_comes_the_sun.txt \
    --output-path /path/to/output/folder/ \
    --output-file here_comes_the_sun.gp5 \
    --n-warm-up 128 \
    --max-length 1024 \
    --overlap 128 \
    --instruments clean0
```
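The overlap and warm-up flags suggest a sliding-window scheme for pieces longer than the model context: each new window is prompted with the tail of what was already generated. A rough sketch of that idea, with all parameter semantics assumed from the flag names:

```python
# Sketch of sliding-window generation; the meanings of --max-length and
# --overlap are assumptions, and generate_long is an illustrative
# helper, not project code.
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

@torch.no_grad()
def generate_long(prompt: str, target_len: int = 4096,
                  max_length: int = 1024, overlap: int = 128) -> str:
    generated = tokenizer(prompt, return_tensors="pt").input_ids[0].tolist()
    while len(generated) < target_len:
        # Re-prompt each window with only the last `overlap` tokens,
        # so the piece can grow past the model's context size.
        context = torch.tensor([generated[-overlap:]])
        out = model.generate(
            context,
            max_length=max_length,
            do_sample=True,
            top_p=0.95,
            pad_token_id=tokenizer.eos_token_id,
        )
        new_tokens = out[0, context.shape[1]:].tolist()
        if not new_tokens:  # model stopped early
            break
        generated.extend(new_tokens)  # keep only the new continuation
    return tokenizer.decode(generated)
```

Finally, decode the generated token file back into a Guitar Pro file with the DadaGP decoder: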
```bash
cd ./dadaGP/
python dadagp.py decode \
    /path/to/output/folder/input.txt \
    /path/to/output/folder/here_comes_the_sun.gp5
```
Open the resulting .gp5 file in Guitar Pro, or in TuxGuitar as a free alternative.
- Using LLMs with larger context sizes and training a custom tokenizer both seem promising for improving generation quality.
- Token type embeddings, i.e. an extra embedding that tells the model each token's category (instrument, note, timing, etc.); see the sketch below.
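One way to realize the token-type-embedding idea: a small embedding table over token categories whose output is added to GPT-2's input embeddings. The three-way category split and the zero initialization below are assumptions, not the project's design.

```python
# Hedged sketch: a separate embedding table over token categories
# (e.g. 0=structure, 1=note, 2=timing), added to GPT-2's input
# embeddings; the category scheme itself is an assumption.
import torch
import torch.nn as nn
from transformers import GPT2LMHeadModel

class GPT2WithTokenTypes(nn.Module):
    def __init__(self, model_name: str = "gpt2", n_types: int = 3):
        super().__init__()
        self.gpt2 = GPT2LMHeadModel.from_pretrained(model_name)
        self.type_emb = nn.Embedding(n_types, self.gpt2.config.n_embd)
        nn.init.zeros_(self.type_emb.weight)  # start as a no-op offset

    def forward(self, input_ids, type_ids, labels=None):
        # Add a learned per-category offset to the ordinary token
        # embeddings before running the transformer.
        embeds = self.gpt2.transformer.wte(input_ids) + self.type_emb(type_ids)
        return self.gpt2(inputs_embeds=embeds, labels=labels)
```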