Media experts cry foul over AI's free lunch of copyrighted content
US senators want to know whether chatbots pose an existential threat to journalism and democracy
Tech companies should compensate news publishers for training AI models on their copyrighted content, media experts told senators in a hearing this week.
The US Senate Committee on the Judiciary quizzed leaders from media trade associations and academia on how generative AI affects the journalism industry.
Journalism has always adapted as new technologies emerge. The rise of the internet cut into newspaper sales and pushed the written word online. Publishers changed their editorial strategies to rank highly on Google, attracting readers and digital advertisers. But how will they fare against large language models that can automatically generate text?
Trained on massive amounts of the internet, generative AI models can produce all types of content. The New York Times recently sued OpenAI, accusing the startup of unlawfully scraping "millions of [its] copyrighted news articles, in-depth investigations, opinion pieces, reviews, how-to guides and more."
Not only is OpenAI alleged to have stolen its work, The New York Times claimed the startup is now unfairly profiting off it by generating passages of its articles verbatim, allowing netizens to evade its paywall. In an attempt to wrest back some power from tech companies, publishers are now fighting for compensation and trying to negotiate licensing agreements. But it's a difficult battle to win, especially if the law turns out not to be on their side.
It's unclear whether generative AI violates current copyright laws. The models' developers believe their use of content scraped from the internet should be protected under fair use, since their chatbots produce text that transforms and transcends the original material. OpenAI insisted that ChatGPT regurgitating copyrighted content was a "rare bug."
Roger Lynch, CEO of magazine publisher Condé Nast, disagreed. "Fair use is to allow criticism, parody, scholarship, research, news reporting," he told the senators. "The law is clear when there is an adverse effect on the market for the copyrighted material ... Fair use is not intended to simply enrich tech companies that prefer not to pay."
There are other ways that tools like ChatGPT can eat into publishers' profits beyond reproducing their stories. Danielle Coffey, CEO of the News/Media Alliance trade association, noted that chatbots designed to crawl the web and act like a search engine, like Microsoft Bing or Perplexity, can summarize articles too.
Readers could ask them to extract and condense information from news reports, meaning there would be less incentive for people to visit publishers' sites, leading to a loss of traffic and ad revenue. "There would be no business model for us in that ecosystem," she said during the hearing.
Licensing agreements could help keep the journalism industry afloat by giving media outlets a way to make money from generative AI. But the deals would need to be negotiated in a way that doesn't prevent smaller developers from building their own large language models. Jeff Jarvis, who recently retired from the City University of New York's Newmark Graduate School of Journalism, argued against licensing for all uses, fearing it could set precedents that would hurt journalists and small, open source companies competing with Big Tech.
It's difficult to figure out a fair way to compensate publishers without knowing exactly what content, and how much of it, was used to train AI models. Coffey put forward the idea that tech companies should build a searchable database cataloging all the websites that have been scraped. AI companies may argue that it's too tricky and cumbersome to sort through the huge amounts of text they have amassed over time.
Revealing their sources might make their AI tools look bad too, considering the amount of inappropriate text their models have ingested, including people's personal information and toxic or NSFW content.
"The notion that the tech industry is saying that it's too complicated to license from such an array of content owners doesn't stand up," said Curtis LeGeyt, president and CEO of the National Association of Broadcasters. "Over the past three decades local TV broadcasters have literally done thousands of deals with cable and satellite systems across the country for the distribution of their programming."
Lynch urged Congress to clarify that training on copyrighted materials is illegal and not fair use. LeGeyt, however, said that passing new legislation to clear up the issue may be premature if it can be sorted through litigation. "If we have clarity that current laws apply to generative AI, let's let the marketplace work. If it's an arms race of who can spend the most on litigation, we know that the tech industry beats out everyone else."
Although companies like OpenAI believe training falls under fair use, the startup is acting more cautiously as lawsuits against it pile up. So far, it has secured licensing agreements with the Associated Press and Axel Springer, and is reportedly in talks with CNN, Fox Corp, and Time.
"Although they negotiate with us, their starting point is 'we don't want to pay for content that we know that we should be able to get for free,'" Lynch said. Even if tech companies get their way and the courts decide that generative AI doesn't violate copyright, they should still pay publishers for using their materials, LeGeyt said.
"These technologies should be licensing our content. If they're not, Congress should act," he urged senators. ®