Microsoft: Copyright law didn't stop the VCR and shouldn't stop the LLM
Lawyers argue content used to train LLMs does not 'supplant the market' for news
Updated Microsoft is coming out swinging over claims by the New York Times that the Windows giant and OpenAI infringed copyright by using its articles to build ChatGPT and other models.
In yesterday's filing [PDF], Microsoft's lawyers recall the early 1980s efforts of the Motion Picture Association to stifle the growth of VCR technology, likening them to the New York Times's (NYT) legal efforts to stop OpenAI's work on the "latest profound technological advance."
The motion describes the NYT's allegations that the use of GPT-based products "harms The Times" and "poses a mortal threat to independent journalism" as "doomsday futurology."
The NYT case is one of many being faced by OpenAI over the training of its Large Language Models (LLMs). The NYT is alleging that large amounts of its content were harvested in that training without permission. It gives examples that it alleges prove ChatGPT was trained using its articles.
Microsoft's response doesn't appear to deny that content was lifted. Instead, it says: "Despite The Times's contentions, copyright law is no more an obstacle to the LLM than it was to the VCR (or the player piano, copy machine, personal computer, internet, or search engine)."
Which seems a bit of a stretch. We're pretty sure Microsoft would be reaching for the phone to its lawyers if bits of Windows were to show up in other operating systems.
The motion states that the NYT's methods to demonstrate how its content could be regurgitated did not represent real-world usage of the GPT tools at issue. "The Times," explains the motion, "crafted unrealistic prompts to try to coax the GPT-based tools to output snippets of text matching The Times's content."
In demanding the dismissal of three claims in particular, the motion argues that Microsoft shouldn't be held liable for end-user copyright infringement through GPT-based tools. It also says that to get NYT content regurgitated, a user would need to know the "genesis of that content."
"And in any event, the outputs the Complaint cites are not copies of works at all, but mere snippets."
Finally, the filing delves into the murky world of "fair use," a doctrine in American copyright law that is relatively permissive compared with other legal jurisdictions.
OpenAI hit back at the NYT last month and accused the company of paying someone to "hack" ChatGPT in order to persuade it to spit out those irritatingly verbatim copies of NYT content.
- OpenAI claims New York Times paid someone to 'hack' ChatGPT
- Media experts cry foul over AI's free lunch of copyrighted content
- Oracle tells Supremes: Fair use? Pah! There's nothing fair about 'Google's copying'
- Reusing software 'interfaces' is fine, Google tells Supreme Court, pleads: Think of the devs
At the time, we said: "By hack, presumably the biz means: Logged in as normal and asked it annoying questions."
The NYT's lead counsel, Ian Crosby, told The Register: "What OpenAI bizarrely mischaracterizes as 'hacking' is simply using OpenAI's products to look for evidence that they stole and reproduced The Times's copyrighted works."
We asked Crosby for his take on Microsoft's motion. We also asked Microsoft how it would react if it found Windows source code being used to train LLMs. We will update this piece should either respond. ®
Updated to add on March 6:
A Microsoft spokesperson got in touch to say: "Lawfully developed AI-powered tools should be allowed to advance responsibly just like valuable technologies of the past. They are also not a substitute for the vital role that journalists play in our society."
Ian Crosby, Susman Godfrey partner and lead counsel for The New York Times, told The Register: "Microsoft doesn't dispute that it worked with OpenAI to copy millions of The Times's works without its permission to build its tools. Instead, it oddly compares LLMs to the VCR even though VCR makers never argued that it was necessary to engage in massive copyright infringement to build their products.
"Despite Microsoft's attempts to frame its relationship with OpenAI as a mere 'collaboration,' in reality, as The Times's complaint states, the two companies are intertwined when it comes to building their generative AI tools.
"In spite of Microsoft's protestations about how The Times described in detail the unprecedented theft of copyrighted works by the defendants, the bottom line is that The Times looked for its stolen works and found them. Microsoft now blames The Times for bringing this to light as an excuse for their and OpenAI's wrongdoing.
"Microsoft, the most valuable company in the world, claims an unfettered right to 'harness[] humanity's collective wisdom and thinking' – and, of course, its expression – for free. And despite its rhetoric about 'improv[ing] the way people live and work,' its true goal is to make money from new products built by copying the works of others."