Off-Prem

SaaS

Google Cloud (over)Run: How a free trial experiment ended with a $72,000 bill overnight

Billing budget? Free plan? All useless when buggy code went into overdrive


Sudeep Chauhan, founder of startup Milkie Way, suffered a bad case of bill shock when a test with a $7.00 billing budget and a free database plan on Google Cloud platform (GCP) generated a $72,000 invoice overnight.

"I jumped out of the bed, logged into Google Cloud Billing, and saw a bill for ~$5,000," Chauhan wrote on his company's blog. "Super stressed, and not sure what happened, I clicked around, trying to figure out what was happening. I also started thinking of what may have happened, and how we could possibly pay the $5K bill. The problem was, every minute the bill kept going up. After two hours, it settled at a little short of $72,000."

It was especially surprising that it happened to Chauhan, who is ex-Google and even spent two years as a payments technical program manager. What happened?

The idea was to build a system that scraped web pages and stored the results in a database. His team picked Google Cloud Run, a GCP service that runs containers, for the job. They then found their code in each instance would timeout and stop as it scraped one page after the other. So, they set up a many-instance system that processed pages in parallel to get each page fetched and stored within the run-time limit.

Devs invited to bake 'Run on Google Cloud' button into git repos... By Google, of course

READ MORE

Chauhan wrote: "To overcome the timeout limitation, I suggested using POST requests (with URL as data) to send jobs to an instance, and [to] use multiple instances in parallel instead of using one instance serially. Because each instance in Cloud Run would only be scraping one page, it would never time out, process all pages in parallel (scale), and also be highly optimized because Cloud Run usage is accurate to milliseconds."

The ex-Googler reflected that he missed the possibility of pages that link back to each other, causing "infinite recursion." It should not have mattered too much, though: he set a billing budget of $7.00 and had a Firebase database on a free plan. "The worst case we imagined was exceeding the daily free Firestore limits," he said. Further, the credit card for the account had a spending limit of $100.

Unfortunately, a billing budget "does not automatically cap Google Cloud or Google Maps Platform usage/spending," according to the docs.

While Chauhan was asleep after a day of testing, Google sent an automated email informing him that his free Firebase plan had been "upgraded due to activity in Google Cloud," and that this "initiated billing" for the project.

He discovered multiple issues with the GCP cost controls. "Billing takes about a day to be synced, and that's why we noticed the charges the next day," Chauhan said. Next, the "Firebase Dashboard took more than 24 hours to update," he said. This meant that the dashboard showed usage within the daily limit, when it was, he said, "86 million percentage points" more than what was shown.

Billing takes about a day to be synced, and that's why we noticed the charges the next day

The GCP Cloud Run defaults also played their part. "The max-instances is preset to 1,000, and concurrency set to 80," he said. If he had corrected this to small values like 2 and 1, the bill shock would not have occurred.

Thanks to these settings, "running [out] this version of Hello World deployment on Cloud Run made 116 billion reads and 33 million writes to Firestore," said Chauhan.

Most of the cost was down to Firebase read operations, even at just $0.06 per 100,000. Multiply that by 116 billion and you get $69,600. There was also the small matter of 16,000 hours of Cloud Run Compute time, partly because the application did not delete the services but left them "in background process".

The performance of the buggy code was impressive in its way. "At the peak, Firebase was able to handle about one billion reads per minute," he said, while Cloud Run with concurrency "can handle 9 million requests per minute".

"Fail fast, learn fast with cloud is a bad idea," Chauhan concluded. "If you count the number of pages in GCP documentation, it's probably more than pages in [a] few novels. Understanding pricing, usage, is not only time consuming, but requires a deep understanding of how cloud services work."

There is a happy ending. "After going through our lengthy doc on this incident sharing our side of the story, various consults, talks, and internal discussions, Google let go of our bill as a one-time gesture," said Chauhan.

Such leniency cannot be relied upon. Auto-scaling and on-demand computing has downsides, and working out what something will cost is challenging. Caution is advised. ®

Send us news
116 Comments

Huawei's cloud unit is its current growth vehicle

Big in China – and a presence elsewhere, but not at a scale to worry global hyperscalers

Backblaze cloud storage buzzes with added Event Notifications

If you want open system to automate workflows over platform of your choosing, join the queue

Alibaba Cloud reveals network telemetry tool that helped cut number of engineers needed by 86%

Zoonet employs 'elegant generalization of ping and traceroute' among other tricks

AWS must pay $525M to cloud storage patent holder, says jury

Computing giant will appeal ruling, which found infringement was not 'willful'

SharePoint logs are easily circumvented and Microsoft is dragging its heels

Now is the perfect time to review those permissions

US-EAST-1 region is not the cloudy crock it's made out to be, claims AWS EC2 boss

It's the region where stuff gets stressed at scale first, says Dave Brown, as he plots variants of Amazon's Outposts

Huawei Cloud reveals the dynamic traffic allocation system it uses to cut bandwidth bills

Created during COVID to handle video boom and sliced bandwidth costs by 30 percent

Irish power crunch could be prompting AWS to ration compute resources

Users report being pointed to other EU regions if they need more grunt

Alibaba Cloud slashes prices outside China

Domestic customers saw their fees cut last January

What happened to agility and new business models? Cloud benefits have all gone to IT

Orgs are missing a trick when it comes to the white fluffy stuff, survey says

Cloud vendor lock-in is shocking, but there's a get out of jail card

We've done it once, we can do it again

Stability AI reportedly ran out of cash to pay its bills for rented cloudy GPUs

Generative AI darling was on track to pay $99M on compute to generate just $11M in revenues