GSoC 2021 (Part I): Implementing Generative Music Models in Julia

I am glad to have been a part of the wonderful Julia community through Google Summer of Code 2021. It has been an exciting summer so far, working on expanding the JuliaMusic ecosystem with MusicTransformer.jl, PerformanceRNN.jl and NoteSequences.jl. I would like to thank my mentors Avik Sengupta and George Datseris for this opportunity and their support.

It's the final week of GSoC, where we're wrapping up our projects and adding finishing touches to them. In this series of blog posts, I will go over what was accomplished this summer, what the packages do and how to use them to generate music.

using Pkg
Pkg.add(url="https://github.com/VasanthManiVasi/NoteSequences.jl")
Pkg.add(url="https://github.com/VasanthManiVasi/PerformanceRNN.jl")
Pkg.add(url="https://github.com/VasanthManiVasi/MusicTransformer.jl")
using NoteSequences
import PerformanceRNN
using MusicTransformer

Music Transformer and Relative Attention

The Music Transformer is a state-of-the-art neural network for music generation released by the Magenta project at Google. Its key feature is relative attention, which incorporates relative positional representations into the Transformer's attention mechanism.

The original Transformer model requires adding absolute positional information to incorporate positional dependencies in its attention mechanism. Music, however, has a lot of relative positional dependencies, and relative attention explicitly modulates attention based on the relative distance between two sequence elements. This relation-aware self-attention mechanism also allows the model to generalize beyond the length of the training examples.

Visualization of relative attention in the Music Transformer (source: https://magenta.tensorflow.org/music-transformer).
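
To make the idea concrete, here is a minimal sketch (not the MusicTransformer.jl implementation) of how relative positions are commonly handled, following the Shaw et al. formulation: pairwise distances between sequence positions are clipped to a maximum value, and each clipped distance indexes a learned embedding.

# Minimal sketch: build the matrix of pairwise relative distances and clip it,
# so every distance beyond ±max_relative_position shares the same embedding.
function relative_positions(seqlen, max_relative_position)
    dist = [j - i for i in 1:seqlen, j in 1:seqlen]  # distance from query i to key j
    clamp.(dist, -max_relative_position, max_relative_position)
end

relative_positions(5, 2)  # 5×5 matrix with entries clipped to -2:2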

Let's build the relative-attention Music Transformer in Julia. Relative attention is implemented as MultiheadRelativeAttention in MusicTransformer.jl and has the following structure:

struct MultiheadRelativeAttention{R<:AbstractArray, Q<:Dense, K<:Dense, V<:Dense, O<:Dense, DP<:Dropout} <: AbstractAttention
    head::Int                 # number of attention heads
    future::Bool              # whether attention may look at future positions
    relative_embedding::R     # learned embeddings for relative positions
    iqproj::Q                 # input projection for the queries
    ikproj::K                 # input projection for the keys
    ivproj::V                 # input projection for the values
    oproj::O                  # output projection
    drop::DP                  # dropout layer
end

And this is a TransformerRelative block, which combines multi-head relative attention with layer normalization, a position-wise feed-forward network and dropout:

TransformerRelative(
        MultiheadRelativeAttention(head, size, hs, size, max_relative_position; future=future, pdrop=pdrop),
        LayerNorm(size),
        PwFFN(size, ps, act),   # position-wise feed-forward network
        LayerNorm(size),
        Dropout(pdrop),
    )

Finally, let's stack these TransformerRelative blocks together to build the Music Transformer.

using Flux
using Transformers
using Transformers.Basic
using MusicTransformer: TransformerRelative

const N = 6                          # number of stacked TransformerRelative blocks
const num_heads = 8                  # attention heads per block
const depth = 512                    # model (embedding) dimension
const ff_depth = 2048                # hidden size of the position-wise feed-forward layers
const max_relative_position = 2048   # maximum relative distance considered by relative attention
const vocab_size = 310               # size of the event vocabulary

music_transformer = Stack(
     @nntopo(inputs => embeds => $N => logits),
     Embed(depth, vocab_size),
     [TransformerRelative(depth, num_heads, ff_depth, max_relative_position)
        for i in 1:N]...,
     LayerNorm(depth)
)
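
As a quick sanity check, here is a sketch of a forward pass with a dummy token sequence (assuming that Transformers.jl's Embed can be called directly on integer token indices):

tokens = rand(1:vocab_size, 32)     # a dummy sequence of 32 event tokens
output = music_transformer(tokens)  # 512×32 matrix: one depth-sized vector per position
size(output)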

Thus, we have created the Music Transformer in Julia. Now let's see how to feed music sequences to the model.

NoteSequences.jl

To make the transformer learn from music data, we need a way to convert our raw midi files to a representation the model can understand (e.g. one-hot vectors). For this, we have NoteSequences.jl.

A NoteSequence is an abstract representation of a musical sequence. The package has utility functions for manipulating a NoteSequence, converting them to various intermediate representations like Melody or Performance, and also to model-specific inputs (one-hot vectors). It also has functions for exporting them to midi and audio.

HereComesTheSun.mid (download)

Here's how we can extract the instruments in a midi file using NoteSequences.jl. getinstruments() gives the musical notes, control change events, pitch bend events and the program number of each instrument in the midi file.

using FileIO
using NoteSequences
midi = load("HereComesTheSun.mid")
getinstruments(midi)
"""
returns the below output
9-element Vector{NoteSequences.Instrument}:
 Instrument(program = 24) with 1121 Notes, 6 Control Changes, 0 Pitch Bends
 Instrument(program = 32) with 422 Notes, 5 Control Changes, 0 Pitch Bends
 Instrument(program = 48) with 258 Notes, 106 Control Changes, 0 Pitch Bends
 Instrument(program = 0) with 502 Notes, 5 Control Changes, 0 Pitch Bends
 Instrument(program = 81) with 60 Notes, 5 Control Changes, 0 Pitch Bends
 Instrument(program = 52) with 207 Notes, 6 Control Changes, 0 Pitch Bends
 Instrument(program = 52) with 117 Notes, 6 Control Changes, 0 Pitch Bends
 Instrument(program = 52) with 107 Notes, 6 Control Changes, 0 Pitch Bends
 Instrument(program = 0) with 1093 Notes, 5 Control Changes, 0 Pitch Bends
"""

To obtain a NoteSequence from a midi file, we first extract all the individual instruments and then put them together as a single sequence. This sounds just like the MIDI format, but unlike a midi file, the events in a NoteSequence also carry their program and instrument numbers. We can therefore, for instance, easily find all the events belonging to a specific instrument and program without writing a separate parser.
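
As a small illustration, here is a hedged sketch of such a query; the `notes` field and the per-note `instrument` field are assumptions based on the Magenta NoteSequence format that the package follows, not a documented API:

ns = NoteSequence(midi)
# Collect all notes that belong to the first instrument (field names are assumptions).
first_instrument_notes = filter(note -> note.instrument == 1, ns.notes)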

What else is a NoteSequence useful for?

If we write functions to convert different formats to and from a NoteSequence, we can enable interoperation among all of those formats. For instance, if you write a function to convert ABC music notation to a NoteSequence, you can then convert any ABC music file to the MIDI format (since a NoteSequence can already be converted to a MIDI file). You could also feed music data in ABC notation to a model as input (through a NoteSequence).
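
As a sketch of that idea (the abc_to_notesequence converter below is hypothetical and does not exist yet; midifile is provided by NoteSequences.jl, and saving through FileIO is assumed to work since loading does):

ns = abc_to_notesequence("tune.abc")  # hypothetical ABC -> NoteSequence converter
save("tune.mid", midifile(ns))        # NoteSequence -> MIDIFile -> .mid on disk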

By the way, if you would like to contribute to the JuliaMusic ecosystem, writing a function to convert MusicXML to a NoteSequence would be a great start! It would enable interoperation between MusicXML.jl and MIDI.jl, which has been wanted for a long time!

Using NoteSequences.jl to convert MIDI files to a Performance

HandelChaconne.mid (download)

handel_midi = load("HandelChaconne.mid")
notesequence = NoteSequence(handel_midi)
"""
returns the below output
NoteSequence(tpq=384, isquantized=false, sps=-1)
  Total time = 252988 ticks
  1 TimeSignatures, 0 KeySignatures, 1 Tempos
  3012 Notes, 0 PitchBends, 11375 ControlChanges
"""

We can apply sustain control pedal changes to the notes in the sequence. Let's visualize the difference using MusicVisualizations.jl.

sustained_ns = NoteSequences.applysustainchanges(notesequence)
sustained_midi = midifile(sustained_ns)
# Plotting them to visualize the difference
using MusicVisualizations
noteplotter(getnotes(handel_midi))
noteplotter(getnotes(sustained_midi))

Plotting the notes before applying the sustain pedal changes.

Plotting the notes after applying the sustain pedal changes.

Looking closely, we can see that many of the notes have been sustained (especially the notes on the left and in the middle).

const steps_per_second = 100
# Quantize all the events in the NoteSequence based on absolute time
quantized_ns = NoteSequences.absolutequantize(notesequence, steps_per_second)
# Converting to the performance representation
performance = Performance(quantized_ns, velocity_bins=32)
performance[1:6]
"""
returns the following output
6-element Vector{PerformanceEvent}:
 TIME-SHIFT 97
 VELOCITY 21
 NOTE-ON 74
 TIME-SHIFT 2
 VELOCITY 16
 NOTE-ON 59
"""

This was just a brief demonstration of using NoteSequences.jl to manipulate NoteSequences and convert them to other intermediate music representations like Performance. We'll cover these in more detail in the forthcoming blog posts.

Pre-Trained Music Models

Here is a list of the pre-trained music models that were added.

PerformanceRNN.jl
  • PerformanceRNN - An LSTM that does not encode note velocities but can generate music with expressive timing.

  • PerformanceRNN with dynamics - An LSTM that is aware of note velocities and can generate music with both expressive timing and dynamics.

MusicTransformer.jl
  • UnconditionalMusicTransformer - A piano performance language model trained on over 10,000 hours of performances obtained from the transcriptions of piano recordings on YouTube.

    music_transformer = pretrained"unconditional_model_16"
  • MelodyConditionedMusicTransformer - A melody-conditioned piano performance language model. This model can be conditioned with a monophonic melody input to generate a polyphonic performance. It generates an accompaniment for the given melody.

    melody_conditioned_transformer = pretrained"melody_conditioned_model_16"

# List all the available pre-trained models
PerformanceRNN.list_pretrains()
MusicTransformer.list_pretrains()

We'll use these models in the coming blog posts to generate music.

Conclusion

This was just a short overview of what the packages are capable of. In the next blog post, we'll see how we can use the unconditional Music Transformer to generate music! Here's a sample of what's to come.
