A win at last: Big blow to AI world in training data copyright scrap

You gotta fight ... for your Reuters ... to party

Thomson Reuters has won a partial summary judgment in a copyright case against shuttered AI firm Ross Intelligence, a decision that disallows fair use as a defense for training models on proprietary data without permission.

"We are pleased that the court granted summary judgment in our favor and concluded that Westlaw’s editorial content created and maintained by our attorney editors, is protected by copyright and cannot be used without our consent," a spokesperson for Thomson Reuters told The Register today.

"The copying of our content was not 'fair use.'"

Ross Intelligence announced its shutdown on December 11, 2020, following the Thomson Reuters lawsuit, and subsequently filed an unsuccessful antitrust counterclaim. The AI legal startup did not respond to a request for comment.

As the case continues toward settlement or trial, the issue will be damages rather than whether the use of the copyrighted material is lawful, unless there's a successful appeal.

At least 38 AI-related copyright claims are pending before US courts. The decision in the five-year-old case against Ross Intelligence, accused of infringing content created by Thomson Reuters' Westlaw subsidiary by using it for training AI models, represents the first significant adverse decision against an AI firm with regard to copyright law.

Fair's fair

Federal district Judge Stephanos Bibas came to a different conclusion in Delaware in 2023 when he mostly denied [PDF] Thomson Reuters' motion for summary judgment.

But the judge subsequently reconsidered his 2023 summary judgment opinion, and in a memorandum opinion [PDF] issued on Tuesday, he disallowed fair use as a copyright defense. This is particularly noteworthy because the case focuses on training (AI model input) rather than inference (AI model output).

In his memorandum, the judge describes how Ross Intelligence tried to compete with Westlaw. To train its AI legal search tool, the startup first tried to license Westlaw's content but was refused. It then enlisted legal support vendor LegalEase Solutions to obtain the training data through so-called "bulk memos."

As the judge's memo explains, "Bulk memos are lawyers’ compilations of legal questions with good and bad answers. LegalEase gave those lawyers a guide explaining how to create those questions using Westlaw headnotes, while clarifying that the lawyers should not just copy and paste headnotes directly into the questions."

LegalEase is said to have sold Ross about 25,000 of those memos to train its AI search tool.

"In other words, Ross built its competing product using bulk memos, which in turn were built from Westlaw headnotes," the judge's memo explains. "When Thomson Reuters found out, it sued Ross for copyright infringement."

Headnotes are short summaries of uncopyrightable court opinions offered to Westlaw clients, so there's some disagreement among legal experts about the extent to which these snippets of text should qualify for copyright protection, individually or as part of a compilation.

Judge Bibas initially concluded that the headnotes shared enough similarity with uncopyrightable court opinions that they could not qualify for copyright protection. But he changed his mind based on a sculpting analogy, as he explained in his memo:

...each headnote is an individual, copyrightable work. That became clear to me once I analogized the lawyer’s editorial judgment to that of a sculptor. A block of raw marble, like a judicial opinion, is not copyrightable. Yet a sculptor creates a sculpture by choosing what to cut away and what to leave in place. That sculpture is copyrightable. So too, even a headnote taken verbatim from an opinion is a carefully chosen fraction of the whole. Identifying which words matter and chiseling away the surrounding mass expresses the editor’s idea about what the important point of law from the opinion is. That editorial expression has enough 'creative spark' to be original.

Santa Clara University law professor Eric Goldman took issue with that analogy in a write-up expressing surprise at the court ruling.

"The court’s analogy to chiseling marble is wholly unpersuasive because sculptors have a wide range of freedom to express themselves, while summarizers of court opinions do not," he wrote.

"To the extent the court is saying that there’s an individual copyright that comes from picking and choosing the interesting quotes out of a court opinion, I vigorously disagree."

A critical factor in the judge's decision is that Ross Intelligence used Westlaw content to develop a competing legal research tool. The judge determined that Ross's use is not "transformative," one of four tests to assess whether fair use can be claimed as a defense. Another test under the law is the impact the use of copyrighted content has on the market for the original work. There, the judge determined that Ross used Westlaw's content to compete with it.

"Ross took the headnotes to make it easier to develop a competing legal research tool," the judge wrote. "So Ross’s use is not transformative. Because the AI landscape is changing rapidly, I note for readers that only non-generative AI is before me today."

Despite the judge's note that his decision does not apply to generative AI – the case dates back to May 6, 2020, before ChatGPT – Edward Lee, professor of law at Santa Clara University, believes other courts hearing generative AI cases will consider Judge Bibas's reasoning.

"In the near term, Judge Bibas's decision has great significance," Lee told The Register.

"Even though he qualified it as not involving generative AI, we can expect every plaintiff in the 30-plus copyright lawsuits to be citing this decision to the respective courts and asking them to adopt the same analysis.

"However, in the mid term, this decision is only one district court decision. Many other judges and potentially juries will decide fair use defense in the other cases. And we all should expect the fair use issue of training of AI models will go to the Supreme Court. Maybe not in this case, but one or more of the AI lawsuits. The issue is too important to the country for the Supreme Court to ignore it."

But even before the judge's decision, AI firms appear to have taken note of the accumulation of AI copyright lawsuits and reassessed their potential liability exposure. Anthropic in January settled a lyrics copyright claim brought by Universal Music and other publishers. And OpenAI, which succeeded in having a copyright claim tossed last November, has been making content licensing deals with publishers amid lobbying to force AI firms to pay for training data. ®
