Bot war: Here's how you can theoretically use adversarial AI to evade YouTube's hard-line copyright-detecting AI

YMMV – and the second M is doing a lot of heavy lifting, here

Analysis YouTube is understood to use machine-learning algorithms to identify copyrighted material in user-uploaded videos, so that, in theory at least, any artists featured are properly compensated for their work. This system works more or less, though it is not without its controversies.

Concerns over heavy-handedness and fair use rights aside, it turns out AI algorithms can, allegedly, subtly tweak the audio in video submissions so that any copyrighted music present evades detection by YouTube's AI bots after upload.

Boffins at the University of Maryland in America reckon their code successfully manipulated the audio in two songs: Stevie Wonder’s smash hit Signed, Sealed, Delivered, which peaked at number 3 in 1970, and Kesha’s infectious track Tik Tok, which topped the chart in 2010. After the doctored tracks were uploaded to YouTube, they avoided detection, and still sounded more or less the same as the originals.

This doctored audio is a type of adversarial attack, in which neural networks tweak an input, such as a photograph, to produce a slightly vandalized output that hoodwinks another neural network into misidentifying that input data. Adversarial attacks are most commonly performed on computer vision models: a well-known example is the carefully tweaked toy turtle that was subsequently mistaken by machine-learning software for a gun.

And, yes, adversarial audio can be crafted, too, causing systems such as YouTube's digital copyright detectives to misidentify the music – as described by the Uni of Maryland team in this academic paper [PDF] shared online this week.

The adversarial audio derived from the aforementioned Stevie Wonder and Kesha songs was used to attack YouTube’s Content ID algorithm. Every time a video is uploaded to YouTube, it’s automatically scanned by Content ID, which checks to see if the footage and audio in the clip match any of the material in its database of copyrighted works.

If there is no match, it believes the video being uploaded is original. If there is a match, however, the owner of the copyrighted material in the submitted video can have the whole file blocked from view, monetized with ads so they make some money on their work, or just track the vid's viewing stats.

Fingerprints all over it

The researchers claim their AI-tweaked audio managed to slip past this Content ID system and sounded the same as the original soundtrack to humans. In terms of identifying an uploaded video's soundtrack, YouTube’s machine-learning algorithms most likely work by extracting feature vectors from the audio – think of these vectors as fingerprints for sound. The software then attempts to match these audio fingerprints against the fingerprints of tracks in the Content ID database. A fingerprint match indicates the audio in the video is probably the same as the corresponding copyrighted work in the database.
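To make the fingerprint-matching idea concrete, here is a minimal sketch in Python. It is not how Content ID works, as YouTube has not published that. The four-number fingerprints, the cosine-similarity score, the 0.9 threshold, and the names `cosine_similarity` and `best_match` are all assumptions made up for illustration.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def best_match(query, database, threshold=0.9):
    """Return (title, score) for the closest reference fingerprint.

    The title is None when even the closest reference scores below
    the (hypothetical) match threshold.
    """
    best_title, best_score = None, -1.0
    for title, reference in database.items():
        score = cosine_similarity(query, reference)
        if score > best_score:
            best_title, best_score = title, score
    if best_score < threshold:
        return None, best_score
    return best_title, best_score

# Hypothetical toy fingerprints standing in for a copyright database.
database = {
    "track_a": [0.9, 0.1, 0.0, 0.4],
    "track_b": [0.0, 0.8, 0.6, 0.1],
}
upload = [0.85, 0.15, 0.05, 0.42]   # near-copy of track_a
perturbed = [0.5, 0.5, 0.3, 0.3]    # fingerprint pushed away from both

print(best_match(upload, database))     # matches track_a
print(best_match(perturbed, database))  # no title: below the threshold
```

An adversarial upload, in these terms, is one whose fingerprint lands below the threshold for every reference track while the audio itself still sounds like one of them.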

The trick to generating good adversarial audio, therefore, is to tweak a song in such a way that it is still recognizable to humans but not to copyright-detection algorithms like Content ID.

In developing their code, the team first studied an application that recognizes songs from audio clips: Apple-owned Shazam, which identifies whatever music you're listening to from your phone's microphone.

How Shazam and similar applications work has been reverse-engineered over the years, though the key thing to grasp is that a song can be represented as a spectrogram of its audio frequencies over time. Features, such as the densities of dominant frequencies, can be identified in a track's spectrogram, and used to generate a unique fingerprint for the audio.
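As a rough illustration of the spectrogram idea, the Python sketch below slices a signal into frames, takes a naive Fourier transform of each, and records the strongest frequency bin per frame. This is a deliberate simplification: real fingerprinting systems use richer features than a single peak per frame, and the function names, the frame size of 64 samples, and the synthetic test tones are all assumptions for the sake of the example.

```python
import cmath
import math

def dft_magnitudes(frame):
    """Magnitudes of the lower half of a naive discrete Fourier transform."""
    n = len(frame)
    mags = []
    for k in range(n // 2):
        total = sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n)
                    for t in range(n))
        mags.append(abs(total))
    return mags

def spectrogram(samples, frame_size):
    """One magnitude spectrum per non-overlapping frame of the signal."""
    return [dft_magnitudes(samples[i:i + frame_size])
            for i in range(0, len(samples) - frame_size + 1, frame_size)]

def peak_fingerprint(samples, frame_size=64):
    """Toy fingerprint: index of the strongest frequency bin in each frame."""
    return [max(range(len(spectrum)), key=spectrum.__getitem__)
            for spectrum in spectrogram(samples, frame_size)]

def tone(bin_index, frame_size, frames):
    """Synthetic sine wave that sits exactly on one frequency bin."""
    return [math.sin(2 * math.pi * bin_index * t / frame_size)
            for t in range(frame_size * frames)]

# Two frames of one pitch followed by two frames of a higher pitch.
signal = tone(5, 64, 2) + tone(12, 64, 2)
print(peak_fingerprint(signal))  # [5, 5, 12, 12]
```

Two recordings of the same tune should yield similar peak sequences, which is what makes such a sequence usable as a fingerprint, and also what an attacker needs to disturb.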

The researchers trained a convolutional neural network to convert an audio signal into a spectrogram and then identify the main features in that representation to generate a unique fingerprint for the song. Standard gradient descent methods were then used to produce adversarial versions of the songs: the audio signals were tweaked so that they allegedly sound the same as the original input to the human ear yet produce different fingerprints.
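The gradient step can be sketched in a few lines of Python. To be clear, this is a toy stand-in and not the researchers' method: a tiny, made-up linear map plays the part of their convolutional network, and a hard cap on the size of each sample change (`epsilon`) stands in for "sounds the same to a human." The weights, the audio samples, and every function name below are hypothetical.

```python
def linear_fingerprint(weights, audio):
    """Toy fingerprint: each feature is a dot product with the audio samples."""
    return [sum(w * a for w, a in zip(row, audio)) for row in weights]

def squared_distance(u, v):
    """Squared Euclidean distance between two fingerprints."""
    return sum((a - b) ** 2 for a, b in zip(u, v))

def adversarial_perturbation(weights, audio, epsilon=0.05, step=0.01, iterations=50):
    """Gradient descent on loss = -||f(audio + delta) - f(audio)||^2.

    Each entry of delta is clipped to [-epsilon, epsilon] so the change
    to the audio stays small.
    """
    original = linear_fingerprint(weights, audio)
    # Start slightly off zero: the gradient vanishes exactly at delta = 0.
    delta = [epsilon / 10 * (1 if i % 2 == 0 else -1) for i in range(len(audio))]
    for _ in range(iterations):
        perturbed = [a + d for a, d in zip(audio, delta)]
        diff = [p - o for p, o in
                zip(linear_fingerprint(weights, perturbed), original)]
        # Gradient of the loss with respect to each audio sample.
        grad = [-2 * sum(weights[j][i] * diff[j] for j in range(len(weights)))
                for i in range(len(audio))]
        # Descent step, then clip back into the allowed range.
        delta = [max(-epsilon, min(epsilon, d - step * g))
                 for d, g in zip(delta, grad)]
    return delta

# Hypothetical model weights and a four-sample "song".
weights = [[1.0, -2.0, 0.5, 0.0],
           [0.0, 1.0, 1.0, -1.0]]
audio = [0.2, -0.1, 0.4, 0.3]

delta = adversarial_perturbation(weights, audio)
perturbed_audio = [a + d for a, d in zip(audio, delta)]
print("largest sample change:", max(abs(d) for d in delta))
print("fingerprint shift:", squared_distance(
    linear_fingerprint(weights, perturbed_audio),
    linear_fingerprint(weights, audio)))
```

The tension is visible even at this scale: a larger `epsilon` moves the fingerprint further from the original, but it also makes the tampering easier to hear.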

In other words, the researchers changed the fingerprints of Stevie Wonder’s Signed, Sealed, Delivered, and Kesha’s Tik Tok, so they no longer matched up to the ones recognized by YouTube’s Content ID system while allegedly keeping the audio the same to humans.

Unfortunately, it’s difficult to tell how good these particular adversarial examples turned out. “Due to copyright law, we are unable to share the original or adversarially perturbed versions of copyrighted songs. However, for each experiment with copyrighted material, we run the exact same attack,” the paper noted.

That said, the team did produce an adversarial example of Total Totality by The 126ers, free to use from YouTube's music library, that allegedly slipped past Content ID, and is embedded and documented on this webpage, here. Compare the first clip, which is the original audio sample, to the eighth clip, which is the AI-crafted adversarial example that apparently evaded Content ID.

Adversarial examples are real!

The aforementioned adversarial example is, sadly, not great: it sounds like Futurama's robot sociopath Bender taking a cheese grater to his famous shiny metal ass. There are now unpleasant high-frequency squeaks and scrapes that aren't in the original clip. In other words, it’s not that similar to the original. We also listened to the AI-altered version of Tik Tok, and it, too, was a bit naff.

Parsa Saadatpanah, coauthor of the paper and a PhD student at the University of Maryland, told The Register that the project is more of a "proof of concept," though, and that the goal wasn't to target YouTube's algorithms. It is, after all, an early academic study of what could be possible with more research and work, so we should bear that in mind.

“Our attack is a 'transfer attack,' which means we build an attack for a model system, and hope that it also works on a different target system like YouTube that we don’t have access to,” he said. “The closer the model is to the real system, the better the attack. This study was supposed to be a high-level proof of concept – the truth is we never really expected to break YouTube, and so we weren’t trying to build a realistic model of how YouTube’s system works.”

Instead, the researchers said they were trying to raise awareness that adversarial examples are a threat to real-world systems. Classic adversarial examples such as manipulating traffic signs to trick self-driving cars have been criticized for not working in the real world.

“Fully autonomous vehicles are still a ways off, and most autonomous systems don’t rely on machine learning to identify traffic rules – this information is all available through [geographic information systems]," Saadatpanah said. "As a result, we sometimes hear from people that adversarial examples are not a real threat.

“For example, to fool a stop sign detector, an attacker needs to manipulate a stop sign – a physical object, not an image – and then hope that the adversarial perturbation still works when photographed with different cameras, resolutions, lighting conditions, viewing angles, distances, and motion blurs. But when fooling a copyright detector, the attacker can directly manipulate an audio or video file, and then upload it straight to a server with no modification."

The best defense against adversarial attacks is adversarial training, he advised. When copyright-detection algorithms like YouTube’s Content ID model are exposed to these adversarial examples during the training process, they are more likely to recognize attacks and resist them.

Spokespeople for YouTube declined to comment. ®

Biting the hand that feeds IT © 1998–2022