Reported $60M Reddit deal signed to train AI models with user data

Training machine learning on Redditors' musings - what could go wrong?

Reddit has reportedly signed a $60 million deal with an unnamed AI biz to hand over user conversations for model training.

The deal comes as Reddit looks to boost interest in its upcoming IPO. Reddit reportedly told prospective investors about the $60 million contract earlier this year, and indicated that its execs may repeat this type of content-sharing-for-model-training deal in the future.

Bloomberg, citing "people familiar with the matter," noted that both the stock market debut and the AI deal details are subject to change, and that the site's expected listing might now happen as soon as March. Reddit did not immediately respond to The Register's inquiries.

The site's users, on the other hand, had plenty to say about the rumored $60 million deal. Comments ranged from "Reddit is asking WAY too little" to questions about why anyone would pay tens of millions of dollars for "shitposts" and "obscure horror artwork."

More cynical posters recalled Reddit's plan to charge for API access, which led to a limited user revolt and some forums going private or shutting down. As well as making third-party apps pay to interact with the platform – ensuring Reddit either makes money directly from those outside apps or from ads shown in its own offerings – the new pricing was a way to cash in on AI model makers scraping the site for training data.

It's well known that Reddit posts and/or submitted links have been used in the past to train neural networks, including OpenAI's GPT-2.

"Good to know that Reddit's API clampdown was never because they wanted to protect your data from AI usage," one user noted. "They were just protecting it from unpaid AI usage. Welcome to the dullest cyberpunk hell."

Reddit wouldn't be the first outfit to offer user-generated training data to AI players, and presumably that sharing is covered in the terms and conditions of use. While fighting off claims of copyright infringement regarding the use of books and published journalism in training its family of super-models, OpenAI has secured licensing agreements with the Associated Press and Axel Springer, and is reportedly in talks with CNN, Fox, and Time expressly to use these media orgs' articles for training.

However, while Reddit's a superb time suck for burrowing deep into obscure rabbit holes about things like grilled cheese sandwiches and screaming fish, it's still a truckload of opinion and personal experiences that may or may not reflect reality. We're not sure we'd use all of it for building a definitive language model.

As one Reddit user put it: "Gonna be one hell of a stupid AI." ®
