If you're going to train AI on our books, at least pay us, authors tell Big Tech
Also: OpenAI enters deals with the Associated Press and Shutterstock to license content, and more
AI in brief
More than 8,000 writers have signed an open letter penned by the US Authors Guild urging the leaders of six top AI companies to obtain writers' consent and compensate them for training models on their copyrighted work.
Large language models are trained on large amounts of text scraped from the internet. Hundreds of thousands of books hosted on websites have been ingested without writers' permission. Now many of those writers are speaking out against having their work ripped off by computers.
"Generative AI technologies built on large language models owe their existence to our writings," begins the letter addressed to the CEOs of OpenAI, Alphabet, Stability AI, Meta, IBM, and Microsoft. "These technologies mimic and regurgitate our language, stories, style, and ideas. Millions of copyrighted books, articles, essays, and poetry provide the 'food' for AI systems, endless meals for which there has been no bill."
"You're spending billions of dollars to develop AI technology. It is only fair that you compensate us for using our writings, without which AI would be banal and extremely limited."
Mary Rasenberger, CEO of the Authors Guild, told NPR that the letter was written to try to get the companies to settle with writers without having to take matters to court. "Lawsuits are a tremendous amount of money … They take a really long time," she said. Other writers, however, have been more aggressive and have sued those they see as having stolen their work.
LLaMA for profit
Meta will reportedly release a new version of its large language model, LLaMA, that supports commercial use in an attempt to compete with rival AI developers.
The social media giant often releases its models for academic research, and was criticized for not being as open as it claimed because it prevented developers from using LLaMA in commercial applications.
Zuckerberg's biz is reportedly looking at how it might be able to charge enterprises to fine-tune the model on their own custom data – but may not end up charging users at all, according to the Financial Times.
The hope is that by sharing its models with more developers, Meta will be able to take some of the shine away from competitors OpenAI, Google, and Microsoft. The rumor was first reported by The Information last month.
OpenAI inks data deals with Associated Press and Shutterstock
Facing an onslaught of copyright lawsuits, OpenAI has announced partnerships with the Associated Press and Shutterstock to license their content for training AI models, although that does little for the thousands of book authors (to say nothing of the millions of authors of other written works).
In this more "enlightened" scenario, OpenAI will get its hands on an archive of text dating back to 1985 published by the non-profit news agency, and AP will get access to the startup's "technology and product expertise" in return.
Last week in a statement, Kristin Heitmann, AP senior vice president and chief revenue officer, said: "We are pleased that OpenAI recognizes that fact-based, nonpartisan news content is essential to this evolving technology, and that they respect the value of our intellectual property."
The two organizations will team up to look for "potential use cases for generative AI in news products and services." The deal follows a similar announcement the week before from stock image vendor Shutterstock, which revealed that OpenAI had signed a six-year agreement to license its content.
OpenAI will use the data to train its generative AI systems, while Shutterstock can continue using OpenAI's technology to power tools like its AI Image Generator. Such license agreements allow OpenAI to obtain data with permission, while the content owners are compensated for sharing their resources. The financial details of both deals were not disclosed. ®