
Pulitzer Prize-winning author Michael Chabon and others sue OpenAI

Another would-be copyright class action


Pulitzer Prize-winning US novelist Michael Chabon and several other writers are the latest to file a proposed class action accusing OpenAI of copyright infringement, alleging it pulled their work into the datasets used to train the models behind ChatGPT.

The suit claims that OpenAI "cast a wide net across the internet" to capture the most comprehensive set of content available to better train its GPT models, allegedly "necessarily" leading it "to capture, download, and copy copyrighted written works, plays and articles."

One of the more interesting parts of the lawsuit is an allegation about how the authors believe the AI business got its hands on "two internet-based book corpora," which, the filing notes, OpenAI simply refers to as "Books1" and "Books2." The filing alleges that in the July 2020 paper introducing GPT-3, "Language Models are Few-Shot Learners," OpenAI disclosed that in addition to "Common Crawl" and "WebText" web page datasets, "16 percent of the GPT3 training dataset came from... 'Books1' and 'Books2'."

The writers' lawsuit goes on to allege that there are only a few places on the public internet that contain this much material, claiming that OpenAI's Books1 dataset "is based on either the Standardized Project Gutenberg Corpus or Project Gutenberg itself" and accusing the AI biz of sourcing Books2 from:

infamous "shadow library" websites, like Library Genesis ("LibGen"), Z-Library, Sci-Hub, and Bibliotik, which host massive collections of pirated books, research papers, and other text-based materials. The materials aggregated by these websites have also been available in bulk through torrent systems.

Also named as plaintiffs in the suit are Tony and Grammy award winner David Henry Hwang, the playwright and screenwriter behind M. Butterfly, Chinglish, Yellow Face, and The Dance and the Railroad; Peabody winner and Love and Other Impossible Pursuits author Ayelet Waldman; Women We Buried author Rachel Louise Snyder; and Who Is Rich? scribe Matthew Klam.

The writers note that "when ChatGPT is prompted, it generates not only summaries, but in-depth analyses of the themes present in Plaintiffs' copyrighted works," and on that basis they believe "the underlying GPT model was trained using [the] plaintiffs' works."

The writers' lawyers also claim that when asked to write a paragraph in the style of The Amazing Adventures of Kavalier & Clay, the book that bagged US novelist Chabon his Pulitzer, ChatGPT generated a passage imitating his writing style and including references to the characters dealing with "the weight of the world at war."

Screenshot from the complaint, exhibit A

The suit was filed in California federal court late last week and was yesterday assigned to San Francisco Magistrate Judge Peter H. Kang.

OpenAI is facing multiple lawsuits around copyright – including two in San Francisco filed by novelists Paul Tremblay and Mona Awad, and, separately, comedian Sarah Silverman and novelists Christopher Golden and Richard Kadrey. Its lawyers argued in those cases that the AI biz has not violated copyright laws, claiming ChatGPT's LLMs are protected under the US doctrine of "fair use." Their argument is that the way the business uses the text conforms to US copyright law, which allows a fair use exception for so-called "transformative uses" of work – a remix of the original that serves a different purpose or audience.

The US Copyright Office is currently seeking comment on a study of the copyright law and policy issues raised by artificial intelligence systems.

OpenAI's lawyers haven't yet filed a response to the Chabon complaint. We have asked OpenAI for comment.

The allegations in the case include direct and vicarious copyright infringement, illegal removal of copyright management information, unfair competition, and unjust enrichment. The plaintiffs are seeking an injunction against the infringement of their copyrights as well as unspecified damages.

OpenAI boss Sam Altman last week scored Indonesia's first ever golden visa – meaning he can now live in the archipelagic nation for up to 10 years – in recognition of his potential to "generate inbound investment." ®
