MuseNet is a deep neural network created by OpenAI that can generate 4-minute musical compositions with up to 10 different instruments. It can combine styles as varied as country, Mozart and the Beatles.
It is based on the same general-purpose unsupervised technology as GPT-2, a large-scale transformer model trained to predict the next token in a sequence, whether audio or text.
The model is trained on sequential data: given a sequence of notes, it is asked to predict the next one. One encoding OpenAI explored for this task is chordwise encoding, which treats every combination of notes sounding at one time as an individual ‘chord’ and assigns a token to each chord.
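The chordwise encoding and next-note objective described above can be illustrated with a minimal sketch. It is not MuseNet's actual tokenizer: the `Note` type, the fixed time grid, and the helper names `chordwise_encode` and `next_token_pairs` are assumptions made for illustration only.

```python
from typing import NamedTuple


class Note(NamedTuple):
    pitch: int  # MIDI pitch number
    start: int  # first time step at which the note sounds
    end: int    # first time step at which the note no longer sounds


def chordwise_encode(notes, num_steps):
    """Assign one token per time step, keyed by the set of pitches sounding then."""
    vocab = {}   # frozenset of pitches -> token id (hypothetical vocabulary)
    tokens = []
    for step in range(num_steps):
        chord = frozenset(n.pitch for n in notes if n.start <= step < n.end)
        if chord not in vocab:
            vocab[chord] = len(vocab)  # a new combination gets the next free id
        tokens.append(vocab[chord])
    return tokens, vocab


def next_token_pairs(tokens):
    """Build (context, target) training pairs for next-token prediction."""
    return [(tokens[:i], tokens[i]) for i in range(1, len(tokens))]


# C+E held for two steps, then G alone, then C+E again.
notes = [Note(60, 0, 2), Note(64, 0, 2), Note(67, 2, 3),
         Note(60, 3, 4), Note(64, 3, 4)]
tokens, vocab = chordwise_encode(notes, num_steps=4)
pairs = next_token_pairs(tokens)
```

Here the repeated C+E combination maps to the same token each time it occurs, so the four time steps use only two distinct tokens. A drawback of this kind of scheme is that the vocabulary grows with every distinct combination of simultaneous notes.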
In addition, composer and instrumentation tokens give more control over the kinds of samples MuseNet generates. During training, these tokens are prepended to each sample so that the model learns to use them when predicting notes. The model can generate music that blends different styles and instruments while still retaining long-term structure in a piece.
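One plausible way to use such control tokens is to place them at the start of each token sequence, so that every later note prediction can attend to them. The sketch below assumes this prefix layout. The token names, the tiny `COMPOSERS` and `INSTRUMENTS` lists, and the `make_control_vocab` and `add_control_tokens` helpers are hypothetical and are not taken from MuseNet's code.

```python
COMPOSERS = ["mozart", "beatles"]   # hypothetical control labels
INSTRUMENTS = ["piano", "guitar"]   # hypothetical control labels


def make_control_vocab():
    """Reserve the lowest token ids for composer and instrument tokens."""
    vocab = {}
    for name in COMPOSERS:
        vocab[f"<composer:{name}>"] = len(vocab)
    for name in INSTRUMENTS:
        vocab[f"<instrument:{name}>"] = len(vocab)
    offset = len(vocab)  # music token ids are shifted past the control ids
    return vocab, offset


def add_control_tokens(music_tokens, composer, instruments, vocab, offset):
    """Prepend one composer token and one token per instrument to a sample."""
    prefix = [vocab[f"<composer:{composer}>"]]
    prefix += [vocab[f"<instrument:{name}>"] for name in instruments]
    return prefix + [t + offset for t in music_tokens]


control_vocab, offset = make_control_vocab()
sample = add_control_tokens([0, 0, 1, 0], "mozart", ["piano"],
                            control_vocab, offset)
```

At generation time, the same prefix could be supplied as a prompt, for example a composer token for one style paired with instrument tokens more typical of another, which is one way a blend of styles might be requested.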
The training data was collected from various sources, including Classical Archives and BitMidi, as well as the MAESTRO dataset.