This article is more than 1 year old
Pulitzer Prize winning author Michael Chabon and others sue OpenAI
Another would be copyright class action
Pulitzer Prize winning US novelist Michael Chabon and several other writers are the latest to file a proposed class action accusing OpenAI of copyright infringement, alleging it pulled their work into the datasets used to train the models behind ChatGPT.
The suit claims that OpenAI "cast a wide net across the internet" to capture the most comprehensive set of content available to better train its GPT models, allegedly "necessarily" leading it "to capture, download, and copy copyrighted written works, plays and articles."
One of the more interesting parts of the lawsuit is an allegation about how the authors believe the AI business got its hands on "two internet-based book corpora," which it notes that OpenAI simply refers to as "Books1" and "Books2." The filing alleges that in the July 2020 paper introducing GPT-3, "Language Models are Few-Shot Learners," OpenAI disclosed that in addition to "Common Crawl" and "WebText" web page datasets, "16 percent of the GPT3 training dataset came from... 'Books1' and 'Books2'."
The writers lawsuit goes on to allege that there are only a few places on the public internet that contain this much material, claiming that OpenAI's Books1 dataset "is based on either the Standardized Project Gutenberg Corpus or Project Gutenberg itself" and accusing the AI biz of sourcing Books2 from:
infamous "shadow library" websites, like Library Genesis ("LibGen"), Z-Library, Sci-Hub, and Bibliotik, which host massive collections of pirated books, research papers, and other text-based materials. The materials aggregated by these websites have also been available in bulk through torrent systems.
Also included in the suit is Tony and Grammy award winner David Henry Hwang, the playwright and screenwriter behind M. Butterfly, Chinglish, Yellow Face, and The Dance and the Railroad; Peabody winner and Love and other Impossible Pursuits author Ayelet Waldman; Women We Buried author Rachel Louise Snyder; and Who is Rich? scribe Matthew Klam.
The writers allege that because "when ChatGPT is prompted, it generates not only summaries, but in-depth analyses of the themes present in Plaintiffs' copyrighted works," the writers believe "the underlying GPT model was trained using [the] plaintiffs' works."
The writers' lawyers also claim that when asked to write a paragraph in the style of The Amazing Adventures of Kavalier & Clay, the book that bagged US novelist Chabon his Pulitzer, ChatGPT generated a passage imitating his writing style and including references to the characters dealing with "the weight of the world at war."
- Microsoft to shield paid-up Copilot customers from any AI copyright brawls it starts
- OpenAI urges court to throw out authors' claims in AI copyright battle
- Judge snuffs man's quest to have AI-created art protected by copyright
- Internet Archive sued by record labels as battle with book publishers intensifies
- Judge lets art trio take another crack at suing AI devs over copyright
The suit [PDF] was filed in California federal court late last week and was yesterday assigned to San Francisco Magistrate Judge Peter H. Kang.
Ex-US pres Bill Clinton has written a cyber-attack pulp thriller. With James Patterson. Really
READ MOREOpenAI is facing multiple lawsuits around copyright – including two in San Francisco filed by novelists Paul Tremblay and Mona Awad, and, separately, comedian Sarah Silverman and novelists Christopher Golden and Richard Kadrey. Its lawyers argued in those cases that the AI biz has not violated copyright laws, claiming ChatGPT's LLMs are protected under the US doctrine of "fair use." Their argument is that the way the business uses the text conforms to US copyright law, which allows a fair use exception for so-called "transformative uses" of work – a remix of the original that serves a different purpose or audience.
The US Copyright Office is currently seeking comment on a study of the copyright law and policy issues raised by artificial intelligence systems.
Defense for OpenAI hasn't yet filed a response to the Chabon complaint. We have asked OpenAI for comment.
The allegations in the case include direct and vicarious copyright infringement, illegal removal of copyright management information, unfair competition, and unjust enrichment. They are seeking an injunction against the infringement of their copyrights as well as unspecified damages.
OpenAI boss Sam Altman last week scored Indonesia's first ever golden visa – meaning he can now live in the archipelagic nation for up to 10 years – in recognition of his potential to "generate inbound investment." ®