Google Cloud (over)Run: How a free trial experiment ended with a $72,000 bill overnight

Billing budget? Free plan? All useless when buggy code went into overdrive


Sudeep Chauhan, founder of startup Milkie Way, suffered a bad case of bill shock when a test on Google Cloud Platform (GCP), run with a $7.00 billing budget and a free database plan, generated a $72,000 invoice overnight.

"I jumped out of the bed, logged into Google Cloud Billing, and saw a bill for ~$5,000," Chauhan wrote on his company's blog. "Super stressed, and not sure what happened, I clicked around, trying to figure out what was happening. I also started thinking of what may have happened, and how we could possibly pay the $5K bill. The problem was, every minute the bill kept going up. After two hours, it settled at a little short of $72,000."

It was especially surprising that it happened to Chauhan, who is ex-Google and even spent two years as a payments technical program manager. What happened?

The idea was to build a system that scraped web pages and stored the results in a database. His team picked Google Cloud Run, a GCP service that runs containers, for the job. They then found that their code would time out and stop in each instance as it scraped one page after another. They therefore switched to many instances processing pages in parallel, so that each page was fetched and stored within the run-time limit.

Chauhan wrote: "To overcome the timeout limitation, I suggested using POST requests (with URL as data) to send jobs to an instance, and [to] use multiple instances in parallel instead of using one instance serially. Because each instance in Cloud Run would only be scraping one page, it would never time out, process all pages in parallel (scale), and also be highly optimized because Cloud Run usage is accurate to milliseconds."

The ex-Googler reflected that he missed the possibility of pages that link back to each other, causing "infinite recursion." It should not have mattered too much, though: he set a billing budget of $7.00 and had a Firebase database on a free plan. "The worst case we imagined was exceeding the daily free Firestore limits," he said. Further, the credit card for the account had a spending limit of $100.
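The missing safeguard was a record of pages already queued, so that a link back to an earlier page is simply ignored. A minimal sketch, assuming a single-process crawler: `crawl` and `get_links` are hypothetical names, and in a fan-out design like the one described here the `seen` set would have to live in shared storage that every instance consults. The `max_pages` cap is a second, blunter backstop.

```python
from collections import deque

def crawl(start_url, get_links, max_pages=1000):
    """Breadth-first fan-out that tolerates pages linking back to each other.

    get_links(url) returns the URLs found on that page. In the Cloud Run
    setup described above each call would be a POST to an instance; here
    it is a plain function (hypothetical stand-in).
    """
    seen = {start_url}          # every URL ever queued; cycles stop here
    queue = deque([start_url])
    order = []                  # pages actually "scraped", in visit order
    while queue and len(order) < max_pages:
        url = queue.popleft()
        order.append(url)
        for link in get_links(url):
            if link not in seen:    # skip anything already queued or visited
                seen.add(link)
                queue.append(link)
    return order

# Three pages that all link back to "a": the crawl still terminates.
links = {"a": ["b"], "b": ["a", "c"], "c": ["a"]}
print(crawl("a", lambda u: links.get(u, [])))  # ['a', 'b', 'c']
```

Without `seen`, the same three pages would be re-queued forever, which is the "infinite recursion" Chauhan describes.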

Unfortunately, a billing budget "does not automatically cap Google Cloud or Google Maps Platform usage/spending," according to the docs.
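Because a budget only raises alerts, one pattern is to route budget notifications to Pub/Sub and act on them in code. A minimal sketch, assuming the notification payload is JSON carrying `costAmount` and `budgetAmount` fields (check the current Cloud Billing docs); `disable_billing` is a hypothetical callback standing in for whatever shut-off action you choose. Given the roughly day-long billing sync Chauhan ran into, such a trigger can fire well after the money is spent, so it complements tight instance limits rather than replacing them.

```python
import json

def should_cut_off(message_data: bytes) -> bool:
    """True once the reported cost has reached the budget amount.

    Assumes a JSON payload with costAmount and budgetAmount fields.
    """
    payload = json.loads(message_data)
    return payload["costAmount"] >= payload["budgetAmount"]

def handle_budget_message(message_data: bytes, disable_billing) -> bool:
    # disable_billing is hypothetical: a real one might detach the
    # project's billing account via the Cloud Billing API.
    if should_cut_off(message_data):
        disable_billing()
        return True
    return False
```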

While Chauhan was asleep after a day of testing, Google sent an automated email informing him that his free Firebase plan had been "upgraded due to activity in Google Cloud," and that this "initiated billing" for the project.

He discovered multiple issues with the GCP cost controls. "Billing takes about a day to be synced, and that's why we noticed the charges the next day," Chauhan said. Next, the "Firebase Dashboard took more than 24 hours to update," he said. This meant that the dashboard showed usage within the daily limit, when it was, he said, "86 million percentage points" more than what was shown.

The GCP Cloud Run defaults also played their part. "The max-instances is preset to 1,000, and concurrency set to 80," he said. Had he lowered these to small values such as 2 and 1 respectively, the bill shock would not have occurred.
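In a simplified model, the number of requests a Cloud Run service can have in flight is capped at max-instances multiplied by per-instance concurrency (both settable, for example, with the `--max-instances` and `--concurrency` flags of `gcloud run deploy`). The helper name below is hypothetical; the figures are the ones Chauhan quotes.

```python
def max_in_flight(max_instances: int, concurrency: int) -> int:
    """Upper bound on simultaneous requests in a simplified Cloud Run model."""
    return max_instances * concurrency

defaults = max_in_flight(1_000, 80)  # the defaults Chauhan cites
tight = max_in_flight(2, 1)          # the small values he says would have helped
print(defaults, tight, defaults // tight)  # 80000 2 40000
```

With the defaults, a runaway loop could keep 40,000 times as many requests in flight as the tighter settings allow.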

Thanks to these settings, "running [out] this version of Hello World deployment on Cloud Run made 116 billion reads and 33 million writes to Firestore," said Chauhan.

Most of the cost came from Firestore read operations, priced at just $0.06 per 100,000. At that rate, 116 billion reads come to $69,600. There was also the small matter of 16,000 hours of Cloud Run compute time, partly because the application did not delete the services but left them "in background process".
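The read bill can be checked in a few lines. This sketch works in integer cents to avoid floating-point rounding and, as a simplifying assumption, ignores the daily free allowance; `read_cost_cents` is a hypothetical helper name.

```python
READS = 116_000_000_000   # reads reported by Chauhan
UNIT = 100_000            # reads per price unit
CENTS_PER_UNIT = 6        # $0.06 per 100,000 reads

def read_cost_cents(reads: int) -> int:
    # Integer arithmetic: reads * price / unit size, in cents
    return reads * CENTS_PER_UNIT // UNIT

cost = read_cost_cents(READS)
print(f"${cost / 100:,.2f}")  # $69,600.00
```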

The performance of the buggy code was impressive in its way. "At the peak, Firebase was able to handle about one billion reads per minute," he said, while Cloud Run with concurrency "can handle 9 million requests per minute".
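A back-of-the-envelope check, assuming (unrealistically) that the quoted peak of one billion reads per minute was sustained throughout, and reusing the $0.06 per 100,000 reads price: the full 116 billion reads would take about 116 minutes, in the same ballpark as the roughly two hours Chauhan describes before the bill settled, at about $600 a minute in reads alone.

```python
PEAK_READS_PER_MIN = 1_000_000_000   # "about one billion reads per minute"
TOTAL_READS = 116_000_000_000        # total reads reported
CENTS_PER_100K_READS = 6             # $0.06 per 100,000 reads

# How long the whole read count would take if the peak were sustained
minutes_at_peak = TOTAL_READS / PEAK_READS_PER_MIN
# Spend per minute at the peak read rate, in dollars
dollars_per_min = PEAK_READS_PER_MIN // 100_000 * CENTS_PER_100K_READS / 100

print(f"{minutes_at_peak:.0f} min at peak, ${dollars_per_min:,.0f} per minute")
# 116 min at peak, $600 per minute
```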

"Fail fast, learn fast with cloud is a bad idea," Chauhan concluded. "If you count the number of pages in GCP documentation, it's probably more than pages in [a] few novels. Understanding pricing, usage, is not only time consuming, but requires a deep understanding of how cloud services work."

There is a happy ending. "After going through our lengthy doc on this incident sharing our side of the story, various consults, talks, and internal discussions, Google let go of our bill as a one-time gesture," said Chauhan.

Such leniency cannot be relied upon. Auto-scaling and on-demand computing have their downsides, and working out what something will cost is challenging. Caution is advised. ®
