Google has created an AI that can generate music from text descriptions, but isn’t releasing it

Image Credits: Brian Heather

An impressive new AI system from Google can generate music in any genre given a text description. But the company, fearing the risks, has no immediate plans to release it.

Called MusicLM, Google’s system is certainly not the first generative AI system for songs. There have been other attempts, including Riffusion, an AI that composes music by visualizing it, as well as Dance Diffusion, Google’s own AudioLM, and OpenAI’s Jukebox. But due to technical limitations and limited training data, none has been able to produce songs that are particularly complex in composition or high in fidelity.

MusicLM is perhaps the first that can.

Detailed in an academic paper, MusicLM was trained on a dataset of 280,000 hours of music to learn to generate coherent songs for descriptions of – as the creators put it – “considerable complexity” (e.g. “a charming jazz song with a memorable solo saxophone and solo singer” or “90s Berlin techno with low bass and a heavy kick”). Its songs, remarkably, sound something like the work of a human artist, though not necessarily as inventive or musically cohesive.

It is hard to overstate how good the samples sound, given that there are no musicians or instrumentalists in the loop. Even when fed somewhat long and meandering descriptions, MusicLM manages to capture nuances like instrumental riffs, melodies, and moods.

The caption for one sample, for instance, includes the phrase “causes the experience of being lost in space,” and the result definitely delivers on that front (at least to my ears).

Another sample was generated from a description starting with the sentence “The main soundtrack of an arcade game.” Plausible, right?

MusicLM’s capabilities extend beyond generating short song clips. Google researchers show that the system can build on existing melodies, whether hummed, sung, whistled or played on an instrument. MusicLM can also take several descriptions written in sequence (e.g. “time to meditate,” “time to wake up,” “time to run,” “time to give 100%”) and create a kind of melodic “story” or narrative up to a few minutes long, ideally suited for a film soundtrack.

One such sample came from the sequence “electronic song performed in a video game,” “meditation song performed next to a river,” “fire,” “fireworks.”

That’s not all. MusicLM can also be instructed by a combination of picture and caption, or generate audio that is “played” by a certain type of instrument in a certain genre. Even the experience level of the AI “musician” can be set, and the system can create music inspired by places, eras or requirements (e.g. motivational workout music).

But MusicLM isn’t flawless – far from it, to be honest. Some of the samples have a distorted quality, an inevitable side effect of the training process. And while MusicLM can technically generate vocals, including choral harmonies, they leave a lot to be desired. Most of the “lyrics” range from near-English to pure gibberish sung by synthesized voices that sound like a fusion of several artists.

Still, Google researchers note the many ethical challenges posed by a system like MusicLM, including a tendency to incorporate copyrighted material from its training data into the generated songs. During an experiment, they found that about 1% of the music generated by the system was directly replicated from the songs it trained on, a threshold apparently high enough to discourage them from releasing MusicLM in its current state.

“We recognize the risk of potential misuse of creative content related to the use case,” the paper’s co-authors wrote. “We strongly emphasize the need for more future work to address these risks associated with music generation.”

Assuming MusicLM or a system like it one day becomes available, it seems inevitable that major legal issues will come to the fore – even if the systems are positioned as tools to assist artists rather than replace them. They already have, albeit around simpler AI systems. In 2020, Jay-Z’s record company filed copyright strikes against a YouTube channel, Vocal Synthesis, for using AI to create Jay-Z covers of songs like Billy Joel’s “We Didn’t Start the Fire.” After initially removing the videos, YouTube reinstated them, finding that the takedown requests were “incomplete.” But deepfaked music still stands on murky legal ground.

A white paper authored by Eric Sunray, now a legal intern at the Music Publishers Association, claims that AI music generators like MusicLM infringe music copyright by creating “tapestries of coherent audio from the works they ingest in training, thereby infringing the United States Copyright Act’s reproduction right.” Since Jukebox’s release, critics have also questioned whether training AI models on copyrighted musical material constitutes fair use. Similar concerns have been raised about the training data used in image-, code- and text-generating AI systems, which is often scraped from the web without the creators’ knowledge.

From the user’s perspective, Waxy’s Andy Baio speculates that music generated by an AI system would be considered a derivative work, in which case only the original elements would be protected by copyright. Of course, it is not clear what can be considered “original” in such music; to use this music commercially is to enter uncharted waters. It’s simpler if the generated music is used for purposes protected by fair use, such as parody and commentary, but Baio expects the courts will have to make decisions on a case-by-case basis.

It may not be long before there is some clarity on the matter. Several lawsuits making their way through the courts are likely to have a bearing on music-generating AI, including one involving the rights of artists whose work is used to train AI systems without their knowledge or consent. But time will tell.
