A new artificial intelligence (AI) tool from Google can generate music in any genre from text prompts, and can even transform a whistled or hummed melody into other instruments. According to Google Research, the technology, called MusicLM, is a text-to-music generation system. It works by analyzing the prompt and inferring the scale and complexity of the composition it describes.
“We present MusicLM, a model generating high-quality music from textual descriptions such as ‘a soothing violin melody backed by a distorted guitar riff,’” the research paper says. “We demonstrate that MusicLM can be conditioned on both text and melody, in that it can transform whistled and hummed melodies according to the style described in a text caption,” the researchers added.
According to the paper, MusicLM was trained on a dataset of 280,000 hours of music to learn to generate coherent songs from textual descriptions and pick up nuances such as mood, melody and instrumentation. Its capabilities extend beyond generating short song clips. Google researchers have shown that the system can build on existing melodies, whether hummed, sung, whistled or played on an instrument.
In addition, according to the research, MusicLM can also take several descriptions written in sequence—for example, “time to meditate,” “time to wake up,” “time to run,” and “time to give 100%”—and create something like a melodic “story” or narrative up to several minutes long. It can also be instructed by a combination of picture and caption, or generate audio that is “played” by a specific type of instrument in a certain genre.
It should be noted that Google is not the first company to do this. According to TechCrunch, projects such as OpenAI’s Jukebox, Riffusion—an artificial intelligence that generates music by visualizing it—and Google’s own AudioLM have all tried their hand. However, owing to technical limitations and limited training data, none has been able to produce songs that are both complex in composition and high in fidelity. The researchers believe MusicLM may be the first that can.
“MusicLM casts conditional music generation as a hierarchical sequence-to-sequence modeling task, and it generates music at 24 kHz that remains consistent over several minutes. Our experiments show that MusicLM outperforms previous systems in terms of both sound quality and adherence to the textual description,” the Google researchers said in the paper.
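The hierarchical pipeline the quote refers to—coarse "semantic" tokens conditioned on text, refined into "acoustic" tokens, then decoded to a 24 kHz waveform—can be illustrated with a toy sketch. Everything here is a stand-in: the function names, the token counts, and the RNG-based "models" are illustrative assumptions, not MusicLM's actual components (which are trained transformers and a neural audio codec).

```python
import math
import hashlib
import numpy as np

SAMPLE_RATE = 24_000  # MusicLM outputs audio at 24 kHz


def _seed(text: str) -> int:
    # Stable per-prompt seed (unlike hash(), which is salted per process).
    return int.from_bytes(hashlib.sha256(text.encode()).digest()[:4], "big")


def generate_semantic_tokens(prompt: str, n_tokens: int = 50, vocab: int = 1024) -> np.ndarray:
    """Stage 1 (toy): map a text prompt to a coarse 'semantic' token sequence.
    The real model conditions an autoregressive transformer on a text
    embedding; an RNG seeded by the prompt stands in for that here."""
    rng = np.random.default_rng(_seed(prompt))
    return rng.integers(0, vocab, size=n_tokens)


def generate_acoustic_tokens(semantic: np.ndarray, per_token: int = 4, vocab: int = 1024) -> np.ndarray:
    """Stage 2 (toy): expand each coarse token into finer 'acoustic' tokens."""
    rng = np.random.default_rng(int(semantic.sum()))
    return rng.integers(0, vocab, size=semantic.size * per_token)


def decode_to_waveform(acoustic: np.ndarray, samples_per_token: int = 120) -> np.ndarray:
    """Stage 3 (toy): map each acoustic token to a short sine burst at 24 kHz
    (a stand-in for a neural audio codec decoder)."""
    t = np.arange(samples_per_token) / SAMPLE_RATE
    bursts = [np.sin(2 * math.pi * (110.0 + tok % 512) * t) for tok in acoustic]
    return np.concatenate(bursts)


audio = decode_to_waveform(
    generate_acoustic_tokens(generate_semantic_tokens("a soothing violin melody"))
)
# 50 semantic tokens x 4 acoustic tokens x 120 samples = 24,000 samples (1 s at 24 kHz)
```

The point of the hierarchy is that each stage works at a coarser timescale than the next, which is what lets the system stay coherent "over several minutes" while still producing fine-grained audio.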
But MusicLM isn’t flawless. For starters, some of the sample music Google released alongside its research paper sounds distorted. While the system can technically generate vocals, they often sound synthetic and nonsensical, according to TechCrunch. Another drawback is the sometimes compressed quality of the sound, a byproduct of the training process.
The Google researchers also noted the ethical challenges posed by a system like MusicLM, including a tendency to reproduce copyrighted material from its training data in the generated songs. During an experiment, the researchers found that about 1% of the music generated by the system was directly reproduced from the songs it was trained on. That rate is apparently high enough to discourage the Google researchers from releasing the system in its current state.
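A memorization check of the kind described above can be sketched as follows. This is a deliberately simplified toy: the function name and the exact-match n-gram criterion are my own, whereas the paper's analysis works on approximate matches between audio embeddings, not raw token sequences.

```python
def flagged_fraction(generated, training_set, n=8):
    """Toy memorization check: flag a generated token sequence if any
    length-n window of it appears verbatim in the training corpus, then
    report the flagged fraction."""
    # Index every length-n window of the training corpus.
    train_ngrams = {
        tuple(seq[i:i + n])
        for seq in training_set
        for i in range(len(seq) - n + 1)
    }
    # Count generated sequences sharing at least one window with training data.
    flagged = sum(
        1 for seq in generated
        if any(tuple(seq[i:i + n]) in train_ngrams
               for i in range(len(seq) - n + 1))
    )
    return flagged / len(generated)


# One of four generated clips copies a training excerpt -> fraction 0.25.
train = [list(range(100))]
gen = [list(range(40, 60)),  # verbatim excerpt of the training sequence
       [500] * 20, [600] * 20, [700] * 20]
print(flagged_fraction(gen, train))  # 0.25
```

In the paper's experiment, the analogous fraction came out to roughly 1%, which is the figure cited above.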
“We acknowledge the risk of potential misappropriation of creative content associated with the use case,” the paper’s co-authors wrote. “We strongly emphasize the need for more future work to address these risks associated with music generation,” they added.