Microsoft, GitHub, OpenAI urge judge to bin Copilot code rip-off case

We're not the bad guys in this, Azure empire says with a straight face

Attorneys representing Microsoft, its GitHub subsidiary, and OpenAI have asked a judge to throw out a copyright case against GitHub's programming assistant Copilot, on the grounds that the challenge against them lacks standing.

To have standing – to be allowed to make a complaint to a court – a plaintiff must have suffered a harm of some kind that the court can address. And that's what the trio are arguing.

The complaint, filed in November against the three companies on behalf of two anonymous plaintiffs, alleges that Copilot was trained on public source code without regard for the software licensing terms imposed by those who created the software.

"By training their AI systems on public GitHub repositories (though based on their public statements, possibly much more) we contend that the defendants have violated the legal rights of a vast number of creators who posted code or other work under certain open-source licenses on GitHub," wrote Matthew Butterick, a software developer and one of the attorneys behind the complaint, when the case was filed last November.

Essentially, the plaintiffs contend that Copilot, based on OpenAI's Codex model, was created by vacuuming up vast amounts of publicly accessible source code, without regard for its licensing terms, and reproducing that code on demand when presented with an appropriate query from a Copilot user.

"Defendants have made no attempt to comply with the open-source licenses that are attached to much of their training data," the complaint says. "Instead, they have pretended those licenses do not exist, and trained Codex and Copilot to do the same."

But attorneys for the defending firms contend the plaintiffs have failed to cite specific instances in which Copilot reproduced their own code, and have failed to identify examples of copying beyond material from textbooks such as Eloquent JavaScript by Marijn Haverbeke, who is not a party to the case.

"The essence of Plaintiffs’ complaint is that rarely – the complaint cites a study reporting 1 percent of the time – Copilot (and therefore Codex) allegedly generates snippets of code similar to the publicly available code that it learned from, and does so without also generating copyright notices or open source license terms that originally accompanied the code," the OpenAI-backed motion to dismiss [PDF] explains.

"But Plaintiffs provide no allegation that any code that they authored was used by Codex or generated as a suggestion to a Codex user; they only point to Codex’s abilities to generate common textbook programming functions, such as a function [from Eloquent JavaScript] for determining if a number is odd or even."

The motion also contends that the plaintiffs should not be allowed to bring their claim anonymously, based on the US Ninth Circuit's test to balance the public benefit of disclosure with valid reasons for privacy. That appeals court test supports anonymity when: there's a risk of retaliatory harm; when the matter is of a sensitive or highly personal nature; and when the party would be compelled to admit to illegal conduct.

None of those circumstances apply in this case, the defendants' legal team argues.

The complaint is also deficient, the defense says, because it fails to enumerate specific wrongs, as required by law, against the handful of businesses named in the lawsuit. The defendants also raise objections to the allegations of Digital Millennium Copyright Act (DMCA) violations, among other supposed legal shortcomings.

A parallel Microsoft-backed motion to dismiss [PDF] makes similar arguments and also tries to turn the tables on the plaintiffs' claim that, "Defendants chose to build AI systems designed to enhance their own profit at the expense of a global open-source community that they had once sought to foster and protect."

"Copilot withdraws nothing from the body of open source code available to the public," the Microsoft-backed motion argues. "Rather, Copilot helps developers write code by generating suggestions based on what it has learned from the entire body of knowledge gleaned from public code. In so doing, Copilot advances the very values of learning, understanding, and collaboration that animate the open source ethic.

"With their demand for an injunction and a multi-billion dollar windfall in connection with software that they willingly share as open source, it is Plaintiffs who seek to undermine those open source principles and to stop significant advancements in collaboration and progress."

Beyond this 'we're not profiteers, they are' argument, Microsoft's legal team insists that GitHub users know what they are signing up for when they agree to the code hosting firm's Terms of Service, which authorizes the parsing, indexing, and analysis of public code.

"Any GitHub user thus appreciates that code placed in a public repository is genuinely public," the Microsoft motion states. "Anyone is free to examine, learn from, and understand that code, as well as repurpose it in various ways. And, consistent with this open source ethic, neither GitHub’s TOS nor any of the common open source licenses prohibit either humans or computers from reading and learning from publicly available code."

Tyler Ochoa, a professor in the law department at Santa Clara University in California, told The Register that, based solely on the court filings to dismiss, "I would say they stand a very good chance of getting many, perhaps most, of the claims dismissed. But the court will likely grant leave [for the plaintiffs] to amend to attempt to cure some (perhaps many) of the alleged deficiencies."

Ochoa said "spaghetti complaints" – in which multiple claims are thrown against the wall to see what sticks – are common in copyright cases. He said claims based on state law that duplicate federal copyright law are likely to be dismissed.

He explained, "The claims that strike me as ones that should be dismissed with prejudice are: tortious interference, unjust enrichment, and unfair competition should be preempted by Section 301(a) of the Copyright Act, and the false designation of origin claim under Section 43(a) of the Lanham Act should be dismissed under Dastar."

Ochoa said he found it unusual that the plaintiffs had not filed a specific copyright infringement claim but instead cited the DMCA's prohibition on the removal of Copyright Management Information (CMI) – the removal of software licenses from Copilot output. He speculated that may have been an attempt to avoid the argument that Copilot's code reproduction should be allowed under the Fair Use doctrine.

As the defense pointed out, he said, the removal of CMI has an intent requirement – you have to intend to facilitate infringement to violate the law. "CMI arguments are very difficult to sustain," he said. "The courts have been interpreting that statute quite narrowly."

Asked to comment on the motions to dismiss, Matthew Butterick declined to respond. ®
