Training AI on Mastodon posts? The idea's extinct after terms updated

Such rules could be tricky to enforce in the Fediverse, though

Mastodon is the latest platform to push back against AI training, updating its terms and conditions to ban the use of user content for large language models (LLMs).

"We want to make it clear," the federated platform stated in an email to users, "that training LLMs on the data of Mastodon users on our instances is not permitted."

The announcement may feel like shutting the stable door after the horse has bolted, but it's still reassuring to know that users' rants on the platform, in theory, won't feed into the LLMs behind generative AI services.

To be fair, enforcing such restrictions on a platform that prides itself on decentralization and openness could prove difficult. The terms apply only to Mastodon's own instances, not the wider Fediverse. It's possible to deploy a robots.txt file to block AI crawlers, but that relies on those behind the bots respecting it rather than invoking fair use.
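For the curious, here is a minimal sketch of what "respecting" robots.txt amounts to, using Python's standard-library parser. The robots.txt content, the social.example host, and the is_allowed helper are hypothetical illustrations, not Mastodon's actual configuration; GPTBot and ClaudeBot appear only as example crawler tokens. The check runs on the crawler's side, so nothing here stops a bot that simply skips it.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt an instance admin might publish (an assumption,
# not taken from any real Mastodon server).
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: *
Allow: /
"""


def is_allowed(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if robots_txt permits user_agent to fetch url.

    This is the voluntary check a well-behaved crawler performs before
    requesting a page; the server cannot enforce it.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    post_url = "https://social.example/@alice/1"  # hypothetical post URL
    for agent in ("GPTBot", "ClaudeBot", "SomeBrowser"):
        print(agent, is_allowed(agent, post_url))
```

A crawler that never calls anything like is_allowed sees the same public posts as everyone else, which is why the file works as a request rather than a barrier.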

Mastodon is not the only platform worried about its content being used for AI training. Another social media platform, Bluesky, recently said: "We do not use any of your content to train generative AI, and have no intention of doing so," but, as the service acknowledged, enforcement of such a rule outside its systems is challenging.

As 2024 drew to a close, a million public posts from Bluesky's firehose API turned up in a training set.

Earlier in June, discussion forum Reddit sued Anthropic, an AI business, alleging in its complaint that content generated by its users was scraped in violation of contractual terms and technical barriers. The suit did not cite examples of any alleged robots.txt violations by Anthropic after July 2024.

In 2024, Reddit signed a data-sharing deal with OpenAI. Earlier that year, it signed an AI training deal with Google, having begun charging companies to use its data-downloading API in 2023.

Mastodon's change highlights the concerns of users over how their data might be used, particularly on platforms that are, by their nature, as free and open as possible.

The updates, including an increase in minimum age from 13 to 16, take effect from July 1. ®
