Wish you could sing like Charli XCX or possess any musical talent? YouTube AI might make that happen
Fake it (with a neural network) until you make it
YouTube is experimenting with software that generates music using an AI model called Lyria built by Google DeepMind.
The Google-owned home-video giant announced on Thursday two features it's pushing out to a small group of testers: Dream Track and the Music AI Tool.
Dream Track converts a text-based prompt into a short audio snippet that mimics the voice and style of various pop stars – namely Alec Benjamin, Charlie Puth, Charli XCX, Demi Lovato, John Legend, Sia, T-Pain, Troye Sivan, and Papoose. The choice of artist is limited to these performers so far because Google has had to negotiate licenses to train Lyria on their music to avoid a copyright war.
You can listen to what Dream Track generates for the prompt "a sunny morning in Florida, R&B" in the style of T-Pain – an artist well known for altering his voice with autotune – below.
It certainly sounds like T-Pain, and the lyrics are appropriate and match the prompt, too. Dream Track is currently only available to "a limited set of creators" who can generate 30-second clips of AI-made tracks that can be posted as YouTube Shorts – typically minute-long videos.
Music AI Tool seems more interesting and useful. It allows folks – particularly those without any or many instruments – to transform an audio clip, such as a chord or someone humming a tune, into something that retains the original melody but is rendered as another instrument.
The most impressive demo, perhaps, converts a bunch of "na-na-na" singing sounds into an orchestral score, complete with string instruments, that sounds like it could pass for a film soundtrack. You can hear it below.
YouTube is only sharing Music AI Tool with select artists, songwriters, and producers that are part of its Music AI Incubator program right now.
Music AI Tool allows people to create music in new forms without having to know how to sing or play musical instruments particularly well – much like how anyone can use text-to-image models to generate artwork without having to know how to draw or paint.
It kinda reminds us of the rise of electronic music years ago, when cynics moaned that synthesizers and computer-aided sequencing let anyone churn out tracks like so-called real musicians.
"These experiments explore the potential of AI features to help artists and creators stretch their imaginations and augment their creative processes," explained Lyor Cohen and Toni Reid, YouTube's global head of music and VP of emerging experiences and community products, respectively.
"And in turn, fans will be able to connect to the creatives they love in new ways, bringing them closer together through interactive tools and experiences. All of this will help us iterate and enhance the technology, informing applications for the future."
- AI copyright row deepens: Stability VP quits in protest over 'fair use' excuse
- YouTubers kindly asked to mark their deepfake vids as Fake Fakey McFake Fakes
- Bad Vibrations: Music publishers sue Anthropic AI for using copyrighted lyrics
- Spotify now using AI to clone podcaster's voice into Spanish
Generative AI and music, however, make a particularly tricky combination. Not only is it hard to build models capable of creating audio that actually sounds good, securing the data to train those systems is also a challenge. Record labels are notoriously litigious when it comes to protecting their copyrights – as YouTube knows very well. The video site has said it's working through these issues, and is trying to strike licensing agreements that compensate artists for their music.
"Despite the tremendous opportunity AI presents, we also recognize it's a quickly evolving space that presents complex challenges. One of YouTube's greatest strengths is our strong relationships with music industry partners. We're committed to collaborating with them as we enter this new era, critically exploring together new opportunities and developing sensible and sustainable controls, monetization and attribution frameworks," Cohen and Reid added.
Meanwhile, researchers at Google DeepMind are tackling the problem of fake AI-generated audio that could be used to manipulate or mislead listeners. Tracks created using the Lyria model will carry imperceptible watermarks applied by DeepMind's SynthID tool, which is used to identify synthetic content. SynthID apparently works by converting audio data into a two-dimensional spectrogram, applying a digital watermark to that representation, and converting it back into audio.
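SynthID's actual watermarking scheme is proprietary, and likely far more sophisticated than anything sketched here. Still, the general idea – nudge a frequency-domain representation of the audio by a tiny, key-dependent pattern, then transform back – can be illustrated with a toy example. Everything below (the function names, the frame size, the correlation-based detector) is our own illustrative guesswork, not DeepMind's method:

```python
import numpy as np

def embed_watermark(audio, key=42, frame=256, strength=1e-3):
    # Chop the signal into frames and FFT each one – a crude spectrogram.
    n = len(audio) // frame * frame
    spec = np.fft.rfft(audio[:n].reshape(-1, frame), axis=1)
    # Derive a pseudo-random +/-1 pattern from a secret key.
    pattern = np.random.default_rng(key).choice([-1.0, 1.0], size=spec.shape[1])
    # Shift every frame's spectrum by an imperceptibly small amount.
    spec += strength * pattern
    # Convert the marked spectrogram back into audio.
    return np.fft.irfft(spec, n=frame, axis=1).reshape(-1)

def watermark_score(audio, key=42, frame=256):
    # Correlate the spectrum against the key's pattern; watermarked
    # audio scores consistently higher than unmarked audio.
    n = len(audio) // frame * frame
    spec = np.fft.rfft(audio[:n].reshape(-1, frame), axis=1)
    pattern = np.random.default_rng(key).choice([-1.0, 1.0], size=spec.shape[1])
    return np.mean(np.real(spec) * pattern)
```

A real system would need the watermark to survive MP3 compression, resampling, and added noise, as the DeepMind quote below describes – this naive version would not.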
"The watermark is designed to maintain detectability even when the audio content undergoes many common modifications such as noise additions, MP3 compression, or speeding up and slowing down the track. SynthID can also detect the presence of a watermark throughout a track to help determine if parts of a song were generated by Lyria," DeepMind explained. ®