GSoC 2021 (Part II): Generating Music with the Music Transformer

In the last post, we had an overview of what is available in the packages developed this summer. In this post, we'll go over the Performance representation, how a NoteSequence can be converted to and from a Performance, and how the events are encoded to and decoded from model inputs (one-hot vectors). Finally, we'll generate music using the music transformer.

Performance Events

Performance is a representation for expressing a polyphonic music sequence. It allows for generating music with expressive, human-like dynamics and timing, since it encodes note-on, note-off, velocity and time-shift events. The velocity events play the key part in producing the dynamics of the generated music. The MIDI format has 127 velocity values, which are quantized into 32 velocity bins in the Performance representation.
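As a rough illustration of the binning (a minimal sketch, not necessarily the exact scheme used by NoteSequences.jl), mapping the 127 MIDI velocities into 32 roughly equal bins could look like this:

# Illustrative only: quantize a MIDI velocity (1–127) into one of 32 bins.
velocity_to_bin(velocity; num_bins=32) = cld(velocity * num_bins, 127)

velocity_to_bin(1)    # 1  (softest bin)
velocity_to_bin(64)   # 17
velocity_to_bin(127)  # 32 (loudest bin)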

Let's see how we can convert our MIDI files to the Performance representation.

using NoteSequences
using MIDI   # provides `load` for reading MIDI files

midi = load("HandelChaconne.mid")
sequence = NoteSequence(midi)

We need to quantize the NoteSequence before converting it to a performance.

Steps per second is the quantization resolution, i.e. the number of quantized time steps per second.

const steps_per_second = 100
NoteSequences.absolutequantize!(sequence, steps_per_second)
performance = Performance(sequence, velocity_bins=32)
# If velocity_bins is not passed, the notes will have the same velocity throughout the performance.

Now let's see how the events are encoded into one-hot indices.

using NoteSequences.PerformanceRepr
encoder = PerformanceOneHotEncoding(num_velocitybins=32)
index = encode_event(PerformanceEvent(TIME_SHIFT, 100), encoder)
# 356
CmajorChord = [
  PerformanceEvent(VELOCITY, 20),
  PerformanceEvent(NOTE_ON, 60),
  PerformanceEvent(NOTE_ON, 64),
  PerformanceEvent(NOTE_ON, 67),
]
indices = map(event -> encode_event(event, encoder), CmajorChord)
"""
4-element Vector{Int64}:
 376
  61
  65
  68
"""

In a similar manner, we can decode those indices to get PerformanceEvents.

events = map(index -> decode_event(index, encoder), indices)
"""
4-element Vector{PerformanceEvent}:
 VELOCITY 20
 NOTE-ON 60
 NOTE-ON 64
 NOTE-ON 67
"""

The PerformanceRNN uses these encoded indices from PerformanceOneHotEncoding as its inputs.
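
To make "one-hot" concrete: each encoded index picks out a single position in a vector whose length is the vocabulary size (388 classes with 32 velocity bins, going by the layout sketched above). A minimal, purely illustrative sketch:

# Illustrative only: expand an encoded index into a one-hot vector.
# 388 = 128 note-on + 128 note-off + 100 time-shift + 32 velocity classes.
onehot(index; vocabsize=388) = [i == index ? 1.0f0 : 0.0f0 for i in 1:vocabsize]

onehot(encode_event(PerformanceEvent(NOTE_ON, 60), encoder))
# 388-element vector with a single 1 at position 61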

To convert a performance into inputs for the music transformer, we can use the MidiPerformanceEncoder. It accounts for padding and EOS tokens along with the performance event tokens. It can also convert a NoteSequence directly to performance one-hot indices, instead of requiring us to manually quantize the NoteSequence, convert it to a performance and then finally to one-hot indices.

The music transformer was trained on input MIDI pitches ranging from 21 to 108, which is 88 pitches in total, the same range as the 88 keys of a standard piano.

const MIN_PITCH = 21
const MAX_PITCH = 108
const VELOCITY_BINS = 32
add_eos = true
encoder = MidiPerformanceEncoder(
      NoteSequences.DEFAULT_STEPS_PER_SECOND, # 100
      VELOCITY_BINS, MIN_PITCH, MAX_PITCH, add_eos
)
map(event -> encode_event(event, encoder), CmajorChord)
"""
4-element Vector{Int64}:
 298
  42
  46
  49
"""

Generating a performance from scratch

We can use the unconditional music transformer to generate a performance from scratch. Generation stops when the model emits an EOS token. Generating longer performances can take a while on the CPU, so we can also set numsteps to a specific amount; if the generated performance reaches that many steps, generation is stopped. Here, we set numsteps to 3000 steps (30 seconds at 100 steps per second).

musictransformer = pretrained"unconditional_model_16"
# Generate a performance of up to 3000 steps (about 30 seconds); generation may stop earlier at an EOS token
midi = generate(musictransformer, numsteps=3000)
musescore_export(midi)

Here's what it generated.

Here are some longer samples.

Generating with a primer

To prime the model, we first have to convert our MIDI file to a performance.

Let's apply sustain pedal changes to the note sequence and quantize it before converting it to a performance.

handel = load("HandelChaconne.mid")
ns = NoteSequence(handel)
NoteSequences.applysustainchanges!(ns)
NoteSequences.absolutequantize!(ns, 100)
# The number of velocity bins must be set to encode velocity events in the performance.
# The music transformer quantizes velocities into 32 bins, so we use 32 here.
performance = Performance(ns, velocity_bins=32)

Let's set the length of the performance to 1000 steps (which is 10 seconds at 100 steps per second).

setlength!(performance, 1000)
musescore_export(getnotesequence(performance))

This is how the primer sounds.

Now let's ask the model to generate a continuation for this primer.

midi = generate(musictransformer, primer=performance, numsteps=3000)
# musescore_export(midi)

These were the continuations generated from 10 seconds of the primer.

These were the continuations when the model was given 20 seconds of the same primer.

(The volume is quite low for the last one)

Conclusion

That's all for this blog! Feel free to share in the #music-dev channel of the Julia language Slack if you generate something with this model :)

