Stack Overflow to charge LLM developers for access to its coding content

No more freebies – Google signs up to improve Gemini's programming abilities

Stack Overflow has launched an API that will require all AI models trained on its coding question-and-answer content to attribute their sources by linking back to its posts. And it will cost money to use the site's content.

"All products based on models that consume public Stack Overflow data are required to provide attribution back to the highest relevance posts that influenced the summary given by the model," it confirmed in a statement.

The Overflow API is designed to act as a knowledge base to help developers build more accurate and helpful code-generation models. Google announced that it is using the API to access relevant Stack Overflow content and integrate it with its latest Gemini models and with Google Cloud Console.

"This partnership brings our enterprise AI platform together with the most in-depth and popular developer knowledge platform available today," Thomas Kurian, CEO at Google Cloud, declared in a statement. 

"Google Cloud and Stack Overflow will help developers more effectively use AI in the platforms they prefer, combining the vast knowledge from the Stack Overflow community and new AI capabilities, powered by Vertex AI and Google Cloud's trusted, secure infrastructure."

Overflow API will give Gemini the ability to generate code and answer programming-related questions by referencing content posted and curated on Stack Overflow. Meanwhile, developers will be able to access the platform's information directly inside Google Cloud Console to ask cloud and infrastructure questions.

Google has previously explained that it will collect public resources on the web to train models. Stack Overflow, however, believes it should be compensated for its content – made up of more than 58 million questions and answers.

"In the AI era, Stack Overflow has maintained that the foundation of trusted and accurate data will be central to how technology solutions are built, with millions of the world's developers coming to our platform as one of the few high quality sources of information with community attribution at its core," said CEO Prashanth Chandrasekar.

As our sister site DevClass notes, "last year some moderators declared a 'general moderator strike' over being restricted from removing AI-generated answers. The company is also developing OverflowAI as an AI-driven service, presumably now using Google AI under the covers."

With developers increasingly turning to chatbots to answer their questions, traffic to Stack Overflow has steadily dropped over time. The platform, however, believes that it is more relevant than ever – considering that AI will need fresh data created by humans. 

"The best content has a human at its core. [Generative AI] can only respond with what has already been published – it won't provide any data or insightful feedback on anything created after its last data ingestion. What does a human component provide? Humans provide up-to-date information, identify patterns early on, and add signals around the social value of knowledge within a community, providing context to the content," it concluded. ®
