This repository contains all the code used in our paper [...].
Install pyenv and poetry. You can also set up your environment with other
tools, as long as the dependencies listed in pyproject.toml are installed.
The following is an example setup using venv.
git clone <this-repo-url>
cd <repo>
pyenv install --patch 3.6.9 < python_alignment.patch; pyenv local 3.6.9
(or just use Python 3.6.9; note that the install command above applies this patch)
python -m venv .venv; source .venv/bin/activate.fish
python -m pip install -U pip
python -m pip install -r requirements.txt
pip install git+https://github.com/CPJKU/madmom -c constraints.txt
(this is needed because magenta has an exact dependency on an old mido version in its requirements.txt)
python setup.py build_ext --inplace
Then, you'll need the vienna_corpus, SMD and Maestro datasets from the asmd package:
python -m asmd.install
- Download our vienna model pretrained on Maestro from our mega and put it in your working dir
- Train our proposed model or download the pretrained ones from our mega:
  - You will need the template matrix provided in this repo. To rebuild it,
    run python -m perceptual.make_template. You will need the synthesized
    scale and the corresponding midi in the scales and audio folders. You can
    download them from our mega
  - Run python -m perceptual.proposed create_mini_specs to create the dataset
    of mini-specs, or download it from our mega.
    - dataset size: 474.429 notes (831 batches in test, 178 in train)
  - Run python -m perceptual.proposed train to train our model for velocity
    estimation and test it. I obtained the following absolute error
    (avg, std) on the test set: 15.11, 10.94 (251 epochs)
  - Redo everything with the vienna model (use --vienna for create_mini_specs
    and train)
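The (avg, std) figures above summarize absolute velocity errors. As a minimal sketch of how such a summary can be computed: the function name and the sample velocities are hypothetical, and the use of the population standard deviation is an assumption, since the README does not say which one the training script reports.

```python
from statistics import mean, pstdev


def absolute_error_summary(predicted, target):
    """Return (avg, std) of the absolute errors between two velocity lists."""
    if len(predicted) != len(target):
        raise ValueError("predicted and target must have the same length")
    errors = [abs(p - t) for p, t in zip(predicted, target)]
    # Population standard deviation; this choice is an assumption.
    return mean(errors), pstdev(errors)


# Hypothetical MIDI velocities (0-127), for illustration only.
avg, std = absolute_error_summary([60, 72, 90, 40], [64, 70, 100, 40])
print(f"absolute error: avg={avg:.2f}, std={std:.2f}")
```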
- Run python -m perceptual.excerpt_search. This will analyze vienna_corpus in
  search of excerpts, transcribe the original performances, and create a new
  directory audio with all extracted excerpt audio files and a directory
  to_be_synthesized with all midi files that you have to synthesize and put
  in audio
- Synthesize the chosen excerpts with vsts or download our synthesized midis
  from our mega; extract them in the audio directory. You should have a
  directory for each vst in audio, and for each vst you should have 5
  different audio files. In the root of audio you should also have the
  original recordings.
- Install sox in your path for post-processing to add reverb, or run with
  --no-postprocess
- Analyze chosen excerpts: python -m perceptual.chose_vst. This will copy the
  excerpts relative to the chosen vsts to the folder excerpts.

Chosen vsts:
- q0: ./audio/salamander
- q1: ['./audio/pianoteq1', './audio/salamander-norm_-20_reverb_50_norm']
- q2: ./audio/pianoteq1-norm_-20_reverb_100_norm
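The layout of the audio directory described above (one sub-directory per vst with 5 audio files each, plus the original recordings in the root) can be checked with a short script. This is only a sketch: the function name is hypothetical, and the .wav extension is an assumption, since the README does not state the audio format.

```python
from pathlib import Path


def check_audio_layout(root, files_per_vst=5, ext=".wav"):
    """Return a list of human-readable problems found under `root`."""
    root = Path(root)
    problems = []
    # Original recordings are expected directly in the root of `audio`.
    if not any(p.is_file() and p.suffix == ext for p in root.iterdir()):
        problems.append(f"no original recordings ({ext}) in {root}")
    # Each sub-directory is treated as one vst.
    for vst_dir in sorted(p for p in root.iterdir() if p.is_dir()):
        n = sum(1 for p in vst_dir.iterdir() if p.suffix == ext)
        if n != files_per_vst:
            problems.append(
                f"{vst_dir.name}: expected {files_per_vst} files, found {n}"
            )
    return problems


if __name__ == "__main__":
    audio = Path("audio")
    if audio.is_dir():
        for problem in check_audio_layout(audio):
            print(problem)
```

An empty list means the directory matches the expected layout.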
Set up your server (Python or PHP) and download WAET.
- place the directory excerpts in the root of WAET
- place the directory reveal.js-3.9.2 into the root of WAET
- place the file index.html in the root of WAET (if you want, you can
  regenerate index.html by running
  pandoc --to revealjs -V revealjs-url=reveal.js-3.9.2 --output index.html --standalone index.md)
- place the file listening_test.xml in [WAET root]/tests/pool.xml
- place the file core.css in [WAET root]/css/core.css

You should be able to access your test at /test.html?url=php/pool.php.
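The copy steps above can also be scripted. The following is a minimal sketch, not part of the repo: deploy_to_waet is a hypothetical helper, and it assumes that all the listed files and directories sit in one source directory. Note that it replaces any existing excerpts and reveal.js-3.9.2 directories in the WAET root.

```python
import shutil
from pathlib import Path


def deploy_to_waet(src, waet_root):
    """Copy the listening-test files from `src` into a WAET checkout."""
    src, waet_root = Path(src), Path(waet_root)
    # Directories that go in the WAET root (existing copies are replaced).
    for name in ("excerpts", "reveal.js-3.9.2"):
        dest = waet_root / name
        if dest.exists():
            shutil.rmtree(str(dest))
        shutil.copytree(str(src / name), str(dest))
    # Single files and their destinations inside WAET.
    files = {
        "index.html": waet_root / "index.html",
        "listening_test.xml": waet_root / "tests" / "pool.xml",
        "core.css": waet_root / "css" / "core.css",
    }
    for name, dest in files.items():
        dest.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(str(src / name), str(dest))
```

It would be called with the directory holding the files and the path to your own WAET root as the two arguments.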
More info in the WAET wiki.
index.html contains the instructions for the test, so that you can
distribute the url of the WAET root to your participants.
To plot the test results, use streamlit run perceptual_app.py,
which also prints correlations with the objective measure of your choice.
The test answers that we collected are available in the repo.
Before running it, change the settings according to your system: open the script and change the initial global variables:
- PATH is the path to the saves dir of WAET
- DISCARD_BEFORE_THAN defines a date before which the answers should be discarded; this is useful for removing debug answers
- MAP_VALUES defines the mapping for creating the control groups according to the answers of the users
Also note that all answers in which the users listened for less than 5 seconds or didn't move the cursor are completely discarded. This is hard-coded in the final section of the script.
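The discarding rules above amount to a simple filter. The sketch below is illustrative only: the Answer record, its field names, the cutoff date, and the keep helper are hypothetical and do not mirror the actual data structures of perceptual_app.py, which reads the WAET saves.

```python
from datetime import datetime
from typing import NamedTuple


class Answer(NamedTuple):
    # Hypothetical record; the real script parses WAET save files.
    submitted: datetime
    listened_seconds: float
    cursor_moved: bool


DISCARD_BEFORE_THAN = datetime(2020, 1, 1)  # hypothetical cutoff date
MIN_LISTENED_SECONDS = 5


def keep(answer):
    """Return True if the answer survives all the discarding rules."""
    return (
        answer.submitted >= DISCARD_BEFORE_THAN
        and answer.listened_seconds >= MIN_LISTENED_SECONDS
        and answer.cursor_moved
    )


answers = [
    Answer(datetime(2019, 12, 30), 20.0, True),  # debug answer: too early
    Answer(datetime(2020, 3, 1), 3.0, True),     # listened for too little
    Answer(datetime(2020, 3, 1), 12.0, False),   # cursor never moved
    Answer(datetime(2020, 3, 2), 12.0, True),    # kept
]
valid = [a for a in answers if keep(a)]
print(len(valid))
```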
At each run, violin plots are created for each control group and each method.
One plot is created for each question type and excerpt, or for each question
type if the average option is used. Under each plot, there are the p-values
computed for each combination of groups or methods. The error margins and
correlations are shown too.
Supplementary materials show some of the plots that can be generated.
To compute the linear regressions of the perceptual values, you should run
python -m perceptual.eval_regression. It will plot the regression
predictions for various models and weights, for the cases with and without MFCC
features. Then, it will also plot the weights with only the selected features.
If you want, you can test the selected features by using our_eval as an option
to the subjective_eval script.
- Install fluidsynth, download the SalamanderGrandPianoV3 soundfont in sf2 format from our mega folder, and put it in your working dir
- run python -m perceptual.alignment.dtw_tuning to check the FastDTW tuning in midi2midi over MusicNet solo piano songs
- run python -m perceptual.alignment.align amt to perform our amt-based alignment over the SMD dataset with the best parameters found in the previous step
- run python -m perceptual.alignment.align ewert to perform our baseline alignment over the SMD dataset
- run python -m perceptual.alignment.analysis results/ewert.csv results/amt.csv to plot the results of alignment
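For readers unfamiliar with the technique tuned in dtw_tuning: FastDTW is an approximation of dynamic time warping (DTW). The sketch below implements plain, exact DTW on two sequences of note onsets. It is neither FastDTW nor the alignment code of this repo; the function name, the absolute-difference distance, and the sample onsets are illustrative assumptions.

```python
def dtw(a, b, dist=lambda x, y: abs(x - y)):
    """Return (total cost, warping path) aligning sequences a and b."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = minimal cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
            cost[i][j] = dist(a[i - 1], b[j - 1]) + step
    # Backtrack from the end to recover the path of index pairs.
    path, i, j = [], n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        candidates = [
            (cost[i - 1][j - 1], i - 1, j - 1),
            (cost[i - 1][j], i - 1, j),
            (cost[i][j - 1], i, j - 1),
        ]
        _, i, j = min(candidates)
    return cost[n][m], path[::-1]


# Hypothetical onsets in seconds: the performance repeats one note.
score_onsets = [0.0, 1.0, 2.0]
performance_onsets = [0.0, 1.0, 1.0, 2.0]
total, path = dtw(score_onsets, performance_onsets)
print(total, path)
```

Exact DTW fills the whole cost matrix, so it is quadratic in the sequence lengths; FastDTW trades exactness for speed by restricting the search to a band around a coarser alignment.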