Skip to content

wrongbad/tensormidi

Repository files navigation

tensormidi

Extremely fast midi parser returning dense numpy structured arrays.

Ideal for machine learning pipelines.

Natively supported by numba so you can write optimized post-processors and tokenizers in python.

Can parse ~10k midi files per second on single CPU core (Ryzen 7950X)

%pip install git+https://github.com/wrongbad/tensormidi.git
import tensormidi

midi = tensormidi.load('bach/catech7.mid')

print(f'{midi.shape=}')
print(f'{midi.dtype=}')
for k in midi.dtype.names:
    print(k, midi[0][k])
midi.shape=(1440,)
midi.dtype=dtype((numpy.record, [('time', '<f8'), ('track', 'u1'), ('program', 'u1'), ('channel', 'u1'), ('type', 'u1'), ('key', 'u1'), ('value', 'u1')]), align=True)
time 1.2
track 4
program 19
channel 3
type 144
key 43
value 80

All your favorite array-level ops just work

import numpy as np

notes = np.sum(midi.type == tensormidi.NOTE_ON)
length = np.max(midi.time)
print(f'{notes=}')
print(f'{length=}')
notes=720
length=79.60473141666768

Field accessors are normal numpy array views, understood by other libraries

import torch

torch.tensor(midi.time)
tensor([ 1.2000,  1.2000,  1.2000,  ..., 79.6047, 79.6047, 79.6047],
       dtype=torch.float64)

API

def load(
    filename: str,              # path to the midi file
    merge_tracks: bool = True,  # merge all tracks into 1
    seconds: bool = True,       # convert times to seconds (include tempo)
    notes_only: bool = True,    # keep only NOTE_ON and NOTE_OFF events
    default_program: int = 0,   # fallback when track doesn't specify program
):

returns

If seconds == True returns tracks

Else returns tracks, tempos, ticks_per_beat

tracks

If merge_tracks == True then tracks is a single numpy array of event records.

Else, tracks is a list of numpy arrays of event records.

Numpy record array memory layout is the same as an array of structs in C/C++.

field dtype description
time float64 seconds or ticks since beginning of song
track uint8 track index the event originates from
program uint8 most recent program for the channel (or default_program)
channel uint8 midi channel
type uint8 event type (see below)
key uint8 multi-purpose (see below)
value uint8 multi-purpose (see below)

Fields key and value are multi-purpose for various channel events

type key value
NOTE_ON note velocity
NOTE_OFF note velocity
POLY_AFTERTOUCH note pressure
CONTROL index value
CHAN_AFTERTOUCH 0 pressure
PITCH_BEND value&127 value>>7

PROGRAM_CHANGE events are consumed internally, populating the program field on later events.

tempos

Tempos is a record array specifying tempo changes throughout the song

field dtype description
tick uint64 ticks since beginning of song when change takes effect
sec_per_beat float64 new tempo, in seconds per beat

ticks_per_beat

Scalar value indicating ticks per beat for the whole file

For example, ticks per second is ticks_per_beat / sec_per_beat where sec_per_beat comes from latest tempo event.

C++ Linkage

The C++ library is header only with clean C++ APIs, unbiased by the python bindings.

Header include path can be dumped with python -m tensormidi.includes for easy makefile use.

Of course you could just clone this repo and point to src/tensormidi/include as well.

Numba Example

Numpy record arrays work perfectly with numba.

Here is an example of how you can compute note durations with simple code that is also very fast.

%pip install numba
import numba

# @numba.jit
def durations(midi):
    n = len(midi)
    out = np.zeros(n, dtype=np.float32)
    off_time = np.zeros((16, 128), dtype=np.float64)
    for i in range(n-1, -1, -1):
        e = midi[i]
        if e.type == tensormidi.NOTE_ON:
            out[i] = off_time[e.channel, e.key] - e.time
        elif e.type == tensormidi.NOTE_OFF:
            off_time[e.channel, e.key] = e.time
    return out

midi = tensormidi.load('bach/catech7.mid')

print("pure python")
%timeit durs = durations(midi)

print("with numba")
jitdurations = numba.jit(durations)
%timeit durs = jitdurations(midi)

durs = jitdurations(midi)
durs = durs[midi.type == tensormidi.NOTE_ON]
notes = midi[midi.type == tensormidi.NOTE_ON]

print("")
print(notes[:20].key)
print(durs[:20])
pure python
8.87 ms ± 79.1 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
with numba
2.43 µs ± 1.98 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)

[43 43 43 43 62 64 66 67 82 45 45 45 45 72 69 67 66 67 81 64]
[1.05 1.05 1.05 1.05 0.13 0.13 0.13 0.86 0.26 1.05 1.05 1.05 1.05 0.78
 0.13 0.13 0.13 0.13 0.26 0.13]

FluidSynth Example

%pip install pyfluidsynth
from IPython.display import Audio
import fluidsynth
import numpy as np

samplerate = 44100
synth = fluidsynth.Synth(samplerate=samplerate)
synth.sfload('/usr/share/sounds/sf2/FluidR3_GM.sf2')

midi = tensormidi.load('bach/catech7.mid')
audio = np.zeros((0,2), np.int16)

for m in midi:
    nsamp = int(samplerate * m.time)
    if nsamp > audio.shape[0]:
        # make the audio engine catch up to current time
        nsamp -= audio.shape[0]
        chunk = synth.get_samples(nsamp).reshape(-1, 2)
        audio = np.concatenate((audio, chunk))
    
    # every note event carries program id
    synth.program_change(m.channel, m.program)
    
    if m.type == tensormidi.NOTE_ON:
        synth.noteon(m.channel, m.key, m.value)
    elif m.type == tensormidi.NOTE_OFF:
        synth.noteoff(m.channel, m.key)
    elif m.type == tensormidi.CONTROL:
        synth.cc(m.channel, m.key, m.value)

Audio(data=audio[:, 0], rate=samplerate)
!jupyter nbconvert --to markdown readme.ipynb --output ../README.md