Software

AI + ML

Pulitzer Prize winning author Michael Chabon and others sue OpenAI

Another would be copyright class action


Pulitzer Prize winning US novelist Michael Chabon and several other writers are the latest to file a proposed class action accusing OpenAI of copyright infringement, alleging it pulled their work into the datasets used to train the models behind ChatGPT.

The suit claims that OpenAI "cast a wide net across the internet" to capture the most comprehensive set of content available to better train its GPT models, allegedly "necessarily" leading it "to capture, download, and copy copyrighted written works, plays and articles."

One of the more interesting parts of the lawsuit is an allegation about how the authors believe the AI business got its hands on "two internet-based book corpora," which it notes that OpenAI simply refers to as "Books1" and "Books2." The filing alleges that in the July 2020 paper introducing GPT-3, "Language Models are Few-Shot Learners," OpenAI disclosed that in addition to "Common Crawl" and "WebText" web page datasets, "16 percent of the GPT3 training dataset came from... 'Books1' and 'Books2'."

The writers lawsuit goes on to allege that there are only a few places on the public internet that contain this much material, claiming that OpenAI's Books1 dataset "is based on either the Standardized Project Gutenberg Corpus or Project Gutenberg itself" and accusing the AI biz of sourcing Books2 from:

infamous "shadow library" websites, like Library Genesis ("LibGen"), Z-Library, Sci-Hub, and Bibliotik, which host massive collections of pirated books, research papers, and other text-based materials. The materials aggregated by these websites have also been available in bulk through torrent systems.

Also included in the suit is Tony and Grammy award winner David Henry Hwang, the playwright and screenwriter behind M. Butterfly, Chinglish, Yellow Face, and The Dance and the Railroad; Peabody winner and Love and other Impossible Pursuits author Ayelet Waldman; Women We Buried author Rachel Louise Snyder; and Who is Rich? scribe Matthew Klam.

The writers allege that because "when ChatGPT is prompted, it generates not only summaries, but in-depth analyses of the themes present in Plaintiffs' copyrighted works," the writers believe "the underlying GPT model was trained using [the] plaintiffs' works."

The writers' lawyers also claim that when asked to write a paragraph in the style of The Amazing Adventures of Kavalier & Clay, the book that bagged US novelist Chabon his Pulitzer, ChatGPT generated a passage imitating his writing style and including references to the characters dealing with "the weight of the world at war."

Screenshot from the complaint, exhibit A (click to enlarge)

The suit [PDF] was filed in California federal court late last week and was yesterday assigned to San Francisco Magistrate Judge Peter H. Kang.

Ex-US pres Bill Clinton has written a cyber-attack pulp thriller. With James Patterson. Really

READ MORE

OpenAI is facing multiple lawsuits around copyright – including two in San Francisco filed by novelists Paul Tremblay and Mona Awad, and, separately, comedian Sarah Silverman and novelists Christopher Golden and Richard Kadrey. Its lawyers argued in those cases that the AI biz has not violated copyright laws, claiming ChatGPT's LLMs are protected under the US doctrine of "fair use." Their argument is that the way the business uses the text conforms to US copyright law, which allows a fair use exception for so-called "transformative uses" of work – a remix of the original that serves a different purpose or audience.

The US Copyright Office is currently seeking comment on a study of the copyright law and policy issues raised by artificial intelligence systems.

Defense for OpenAI hasn't yet filed a response to the Chabon complaint. We have asked OpenAI for comment.

The allegations in the case include direct and vicarious copyright infringement, illegal removal of copyright management information, unfair competition, and unjust enrichment. They are seeking an injunction against the infringement of their copyrights as well as unspecified damages.

OpenAI boss Sam Altman last week scored Indonesia's first ever golden visa – meaning he can now live in the archipelagic nation for up to 10 years – in recognition of his potential to "generate inbound investment." ®

Send us news
30 Comments

OpenAI asks Uncle Sam to let it scrape everything, stop other countries complaining

The rest of the world doesn't think 'fair use' is fair but we should make 'em

Do AI robo-authors qualify for copyright? It's still no, says appeals court

Computer scientist Stephen Thaler again told his 'Creativity Machine' can't earn a ©

Judge says Meta must defend claim it stripped copyright info from Llama's training fodder

Facebook giant allegedly didn't want neural networks to emit results that would give the game away

We did not have Brave clashing with Rupert Murdoch on our 2025 bingo card, but there it is

Indie browser maker asks judge for legal shield against copyright threats over AI summaries

Brits end probe into Microsoft's $13B bankrolling of OpenAI

Redmond doesn't have total control over GPT maker so we lack authority, say monopoly cops

AI crawlers haven't learned to play nice with websites

SourceHut says it's getting DDoSed by LLM bots

Show top LLMs some code and they'll merrily add in the bugs they saw in training

One more time, with feeling ... Garbage in, garbage out

MINJA sneak attack poisons AI models for other chatbot users

Nothing like an OpenAI-powered agent leaking data or getting confused over what someone else whispered to it

AI running out of juice despite Microsoft's hard squeezing

Biz leaders still dream of obedient agents replacing workers. In the actual workplace, they're going AWOL

How nice that state-of-the-art LLMs reveal their reasoning ... for miscreants to exploit

Blueprints shared for jail-breaking models that expose their chain-of-thought process

Does terrible code drive you mad? Wait until you see what it does to OpenAI's GPT-4o

Model was fine-tuned to write vulnerable software – then suggested enslaving humanity

Satya Nadella says AI is yet to find a killer app that matches the combined impact of email and Excel

Microsoft CEO is more interested in neural nets boosting GDP than delivering superhuman intelligence