Sarah Silverman, novelists sue OpenAI for scraping their books to train ChatGPT
Plus: Adobe is limiting how staff can use external generative AI tools, and the Pentagon is testing different large language models
AI in brief Award-winning novelists Paul Tremblay and Mona Awad and, separately, comedian Sarah Silverman and novelists Christopher Golden and Richard Kadrey have sued OpenAI, accusing the startup of training ChatGPT on their books without consent in violation of copyright law.
The lawsuits, both filed in the US District Court for the Northern District of California in San Francisco, say ChatGPT generates accurate summaries of the authors' books, and cite this as evidence that the software was trained on their work.
"OpenAI made copies of Plaintiffs' books during the training process of the OpenAI Language Models without Plaintiffs' permission. Specifically, OpenAI copied at least Plaintiff Tremblay's book The Cabin at the End of the World; and Plaintiff Awad's books 13 Ways of Looking at a Fat Girl and Bunny," according to court documents [PDF] in the first suit.
In the second suit, Silverman et al [PDF] make similar claims. The trio also make a point of stating that their books, including Silverman's comic autobiography The Bedwetter, contain certain copyright management information that would have been included in the legitimate, copyrighted editions. This is the basis of the third count they allege against OpenAI: a claim that it breached the DMCA by removing the copyright management information. The suit states: "At no point did ChatGPT reproduce any of the copyright management information Plaintiffs included with their published works."
OpenAI trains its large language models by scraping text from the internet, and although it hasn't revealed exactly what resources it has swallowed up, the startup has admitted to training its systems on hundreds of thousands of books protected by copyright, and stored on websites like Sci-Hub or Bibliotik.
All of the authors believe that their books have been ingested by ChatGPT without their permission, and that OpenAI is profiting from their work without attribution. They have launched a class-action lawsuit for other authors to join, and are requesting compensatory damages and permanent injunctions to stop OpenAI from continuing in its actions.
Speaking of OpenAI... It has released Code Interpreter, a plugin for ChatGPT Plus subscribers that can analyze uploaded files, allowing users to query and edit their documents, create charts, and so on. It also claims to run code.
Adobe restricts employees from using third-party generative AI tools
Software giant Adobe has banned employees from using their private email addresses or corporate credit cards to sign up and pay for machine learning products and services.
Chief information officer Cindy Stoddard warned staff to protect the tech giant's data and to not use generative AI tools in a way that could harm its business, customers, or workforce, Business Insider reported.
Adobe hasn't banned third-party applications like ChatGPT outright, but has strict restrictions in place on what is and isn't allowed on such systems. Employees should not reveal their input prompts, nor upload private Adobe data or code in order to generate email drafts, summarize documents, or patch software bugs.
They should also make sure to opt out of having content from their conversations used as training data. In addition, they can't sign up to use these tools with their own private email addresses or pay for a subscription with their corporate credit cards (or pay with a personal card and claim it back as an expense).
The US Department of Defense is testing LLMs
The Pentagon is testing five large language models' abilities to solve text-based tasks that could one day aid in decision-making and combat.
The models are fed classified documents and asked to help plan and solve hypothetical scenarios, like a global crisis. Some tasks, like requesting information from specific military units, can take staff hours or days to complete, but large language models can provide data within minutes.
One test reportedly carried out a request in just ten minutes. But the technology is notoriously tricky: its performance can depend on the way the request is worded, and it is prone to generating false information.
"That doesn't mean it's ready for prime time right now. But we just did it live. We did it with secret-level data," said Matthew Strohmeyer, a US Air Force colonel, who told Bloomberg the military could deploy large language models soon.
The Pentagon did not reveal which models it is testing. Scale AI, however, reportedly revealed that one of them is its defense-oriented Donovan system. Other candidates could include OpenAI's models offered via Microsoft's Azure Government platform, or tools built by defense contractors like Palantir or Anduril. ®