Jukebox is an open-source neural network that generates music, including rudimentary singing, as raw audio in a variety of genres and artist styles. Its code and model weights are released alongside a tool for exploring the generated samples.
Users supply a genre, an artist, and lyrics as input, and Jukebox outputs new music samples in response. Jukebox produces a wide range of music and singing styles and generalizes to lyrics not seen during training.
Even when conditioned on lyrics seen during training, the tool can produce music that bears no resemblance to the songs it was trained on. Users can also condition Jukebox on 12 seconds of audio, and the tool completes the remainder in a specified style.
Jukebox models music directly as raw audio, which is challenging because raw audio sequences are very long.
To tackle this problem, Jukebox uses an autoencoder to compress raw audio to a lower-dimensional space, which lets the tool generate audio in that compressed space and up-sample back to the raw audio space.
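To get a feel for why compression matters here, the following is a minimal Python sketch of the arithmetic only, not Jukebox's actual code. The function names are hypothetical, the 44.1 kHz sample rate is an assumed CD-quality rate, and the compression factors 8, 32, and 128 are illustrative assumptions rather than figures taken from this text.

```python
def raw_samples(duration_s: float, sample_rate_hz: int = 44_100) -> int:
    """Number of samples in a mono clip (sample rate is an assumed default)."""
    return int(duration_s * sample_rate_hz)


def compressed_length(n_samples: int, factor: int) -> int:
    """Sequence length after compressing `factor` raw samples into one code.

    Uses ceiling division so a partial final window still yields one code.
    """
    return -(-n_samples // factor)


# A 4-minute song as raw audio.
n = raw_samples(240)
print(f"raw samples: {n:,}")

# Hypothetical compression factors, chosen only for illustration.
for factor in (8, 32, 128):
    print(f"compressed {factor}x: {compressed_length(n, factor):,} steps")
```

Under these assumptions, a 4-minute clip is over ten million raw samples, while the most heavily compressed sequence is under a hundred thousand steps, which is a far more tractable length for a generative model to work with before up-sampling back to raw audio.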
Jukebox pushes the boundaries of generative models and is more expressive than tools that generate music symbolically, for example in the form of a piano roll.
It is well-suited for users interested in experimenting with AI-generated music.