Google Search results polluted by buggy AI-written code frustrate coders
Pulumi claims it has culled bad infrastructure-as-code samples
Analysis Google has indexed inaccurate infrastructure-as-code samples produced by Pulumi AI – a developer that uses an AI chatbot to generate infrastructure – and the rotten recipes are already appearing at the top of search results.
This mess started with Pulumi's decision to publish the result of its users' prompts on a curated AI Answers page. Google's crawlers indexed the resulting robo-responses – but when users find them, the AI answers are often inaccurate.
"It has happened," wrote developer Arian van Putten in a social media post over the weekend. "The number one Google result was an official Pulumi documentation page that was clearly written by an LLM (it had a disclaimer that it was) and hallucinated an AWS feature that didn't exist. This is the beginning of the end."
As The Register opined in 2022 and reported in January this year, search quality has declined because search engines index low-quality AI-generated content and present it in search results. This remains an ongoing area of concern.
Pulumi AI and its online archive of responses, AI Answers, are a case in point. Google's search crawler indexes the output of Pulumi's AI and presents it to search users alongside links to human-authored content. Software developers have found some of the resulting AI-authored documentation and code inaccurate or even non-functional.
The problem was noted on March 21, 2024 by developer Pete Nykänen in a GitHub Issues post to the Pulumi AI code repository. "Today I was googling various infrastructure related searches and noticed a worrying trend of Pulumi AI answers getting indexed and ranking high on Google results, regardless of the quality of the AI answer itself or if the question involved Pulumi in the first place. This happened with multiple searches and will probably get even worse as the time goes on."
Others have also raised the issue.
A rising tide of muck
Nykänen told The Register in an email that he began noticing Pulumi AI search result issues around the time he posted to GitHub last month.
"As an engineer, I spend a lot of time searching for answers online and it was not difficult to notice the AI answers rising to the top of the search results overnight, even for keywords unrelated to Pulumi itself," he noted. "I filed the issue and hoped that Pulumi would rectify the situation (which they promised to do) but sadly the issue still persists."
"Documentation, especially infrastructure related, is already often incorrect, hard to find, outdated or otherwise missing. While tools like Pulumi AI can provide value to some, filling the internet with unconfirmed, possibly hallucinated, answers is actually pretty malicious. And the longer it goes on, the worse it gets."
Nykänen said that, with AI content already appearing at the top of search results and more companies building content generation tools, he hopes those involved in AI will consider how their work affects the integrity of the web.
"I don't think it's too late for Pulumi either and hopefully they will decide to hide their AI generated content from search engine scrapers," he suggested.
Aaron Friel, an AI engineer at Pulumi, acknowledged Nykänen's concerns, responding the following day that the developer has "taken steps to remove more than half (almost two thirds) of AI Answers, and we plan to continue to ensure that these AI answers are complementary to our existing documentation."
Friel noted that Pulumi also plans to make sure its site mentions real APIs and upstream documentation. Testing generated code is also on the to-do list.
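For readers wondering what hiding pages from search engines looks like in practice, one common approach is to serve a "noindex" directive in the X-Robots-Tag HTTP response header, which Google and other major crawlers honor. The following is a minimal sketch and not a description of how Pulumi's site works: the path prefix and the function name are hypothetical, chosen only for illustration.

```python
# Hypothetical URL prefix for AI-generated pages. The article does not
# describe the real layout of Pulumi's site, so this is an assumption.
AI_ANSWERS_PREFIX = "/ai/answers/"


def robots_headers(path: str) -> dict[str, str]:
    """Return extra HTTP response headers for the given request path."""
    if path.startswith(AI_ANSWERS_PREFIX):
        # "noindex" asks compliant crawlers to leave the page out of results.
        return {"X-Robots-Tag": "noindex"}
    # Human-written pages get no extra directive and stay indexable.
    return {}
```

A crawler has to be able to fetch a page to see this header, so the same pages should not also be blocked in robots.txt.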
Hello? Google?
That was a month ago, and Google hasn't yet gotten the memo. When The Register tried the keywords cited by Nykänen on Monday – "aws lightsail xray" – Pulumi AI's answer was the second search result. And when we tried again on Tuesday, it ranked at the top of the page – above the official AWS documentation.
We asked Google what it thought of the situation and a company spokesperson told us it "always aims to surface high quality information, but on some niche topics or unusual queries, there may not be a lot of high quality content available to rank highly in Search."
The search giant also reminded us that its policies mean "Low value content that’s created at scale to manipulate Search rankings is spam, however it is produced", and that recent updates to its tech "reduced low quality, unoriginal content on Search by 45 percent, and aim to tackle unhelpful content that’s designed to rank well in Search."
Microsoft's Bing search engine could be ahead of the game in terms of filtering AI-generated material, as it did not have this problem for the same query. The results it produced did, however, include a Chat button that launched an AI-generated response if you took the bait and clicked rather than just hitting return to submit the query. Brave Search also omitted the Pulumi AI response. DuckDuckGo, meanwhile, returned the Pulumi AI result as the fourth item on its search results page for the query.
Another GitHub Issue post on Monday, referring to van Putten's complaint, has asked for the removal of Pulumi AI's answer about AWS EBS direct APIs – which Pulumi evidently does not support.
Several AI hallucinations flagged in March have already been dealt with.
In an email to The Register, Pulumi co-founder and CEO Joe Duffy defended his firm's AI effort – but allowed that more drastic intervention might be called for if the issue can't be adequately addressed.
"Pulumi AI has transformed how most of our customers work, enabling them to navigate a sea of hundreds of clouds with the myriad ways you can use all of their services," Duffy explained. "We processed a 50 percent increase in prompts quarter on quarter, which is a testament to how useful our customers are finding it to their daily work."
A startup that promises to do better ...
Duffy claimed that Pulumi has tested and improved its code quality over time and has seen a double-digit improvement in the success rates for code examples quarter over quarter.
"That said, we know these aren't perfect," he conceded. "Because our AI answers are indexable by Google, they show up in search results. I'll be the first to admit, I was surprised at how highly Google is ranking these pages, since in general they have no inbound links – a far cry from how PageRank used to work – and I would have expected it to prefer our older, more mature content."
Asked when Pulumi first realized its AI had issues, Duffy acknowledged Pulumi has been aware its AI isn't perfect since it launched last year, and has invested to improve its quality.
"We have a new typechecker loop that feeds back into the AI and improves our results," he explained. "We've tweaked it to be better at Python, and we've taught it about our cloud SDKs. All of these have had material increases in quality – and it will just keep getting better from here. Although there's been some negative sentiment on social media, far and away the feedback we get directly is that the AI is helpful, especially when just getting started in the cloud – it truly is daunting to even get started navigating hundreds of clouds each with tens of thousands of services."
Duffy revealed that Pulumi has already removed 100,000 AI answers and will take down more in future.
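Duffy did not say how the typechecker loop he mentions is built. As a rough illustration of the general idea only, the sketch below generates code, checks it, and feeds any error back into the next request. Everything in it is an assumption: the function names are hypothetical, the generator is whatever callable you pass in, and Python's built-in compile() stands in for a real type checker, so it catches syntax errors only.

```python
from typing import Callable, Optional


def check(source: str) -> Optional[str]:
    """Stand-in checker: return an error message, or None if the code compiles.

    A real type checker would catch far more; compile() only finds syntax errors.
    """
    try:
        compile(source, "<generated>", "exec")
        return None
    except SyntaxError as err:
        return f"line {err.lineno}: {err.msg}"


def generate_with_feedback(
    generate: Callable[[str], str],  # hypothetical model call: request in, code out
    prompt: str,
    max_rounds: int = 3,
) -> Optional[str]:
    """Ask for code, re-asking with the checker's error until it passes."""
    request = prompt
    for _ in range(max_rounds):
        candidate = generate(request)
        error = check(candidate)
        if error is None:
            return candidate
        # Feed the failure back so the next attempt can correct it.
        request = (
            f"{prompt}\n\nThe previous attempt failed a check: {error}\n"
            "Please fix it."
        )
    # No attempt passed the check, so return nothing instead of bad code.
    return None
```

Returning None when every attempt fails means an answer that never passes the check is simply not produced.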
Despite the challenges, Duffy expects AI will improve over time. "We move fast and try innovative new ideas regularly – and sometimes they just don't work out the way we intended," he admitted. "If we can’t get to a good place quickly, we will absolutely consider delisting all of them and building back up more slowly."
Duffy added that Pulumi's AI Answers clearly state that they're the product of AI. "Despite the hallucinations, we regularly hear 'Even if imperfect, we prefer to have something 80 percent correct, [rather] than nothing at all'." ®