Off-Prem

SaaS

Google Cloud (over)Run: How a free trial experiment ended with a $72,000 bill overnight

Billing budget? Free plan? All useless when buggy code went into overdrive


Sudeep Chauhan, founder of startup Milkie Way, suffered a bad case of bill shock when a test with a $7.00 billing budget and a free database plan on Google Cloud platform (GCP) generated a $72,000 invoice overnight.

"I jumped out of the bed, logged into Google Cloud Billing, and saw a bill for ~$5,000," Chauhan wrote on his company's blog. "Super stressed, and not sure what happened, I clicked around, trying to figure out what was happening. I also started thinking of what may have happened, and how we could possibly pay the $5K bill. The problem was, every minute the bill kept going up. After two hours, it settled at a little short of $72,000."

It was especially surprising that it happened to Chauhan, who is ex-Google and even spent two years as a payments technical program manager. What happened?

The idea was to build a system that scraped web pages and stored the results in a database. His team picked Google Cloud Run, a GCP service that runs containers, for the job. They then found their code in each instance would timeout and stop as it scraped one page after the other. So, they set up a many-instance system that processed pages in parallel to get each page fetched and stored within the run-time limit.

Devs invited to bake 'Run on Google Cloud' button into git repos... By Google, of course

READ MORE

Chauhan wrote: "To overcome the timeout limitation, I suggested using POST requests (with URL as data) to send jobs to an instance, and [to] use multiple instances in parallel instead of using one instance serially. Because each instance in Cloud Run would only be scraping one page, it would never time out, process all pages in parallel (scale), and also be highly optimized because Cloud Run usage is accurate to milliseconds."

The ex-Googler reflected that he missed the possibility of pages that link back to each other, causing "infinite recursion." It should not have mattered too much, though: he set a billing budget of $7.00 and had a Firebase database on a free plan. "The worst case we imagined was exceeding the daily free Firestore limits," he said. Further, the credit card for the account had a spending limit of $100.

Unfortunately, a billing budget "does not automatically cap Google Cloud or Google Maps Platform usage/spending," according to the docs.

While Chauhan was asleep after a day of testing, Google sent an automated email informing him that his free Firebase plan had been "upgraded due to activity in Google Cloud," and that this "initiated billing" for the project.

He discovered multiple issues with the GCP cost controls. "Billing takes about a day to be synced, and that's why we noticed the charges the next day," Chauhan said. Next, the "Firebase Dashboard took more than 24 hours to update," he said. This meant that the dashboard showed usage within the daily limit, when it was, he said, "86 million percentage points" more than what was shown.

Billing takes about a day to be synced, and that's why we noticed the charges the next day

The GCP Cloud Run defaults also played their part. "The max-instances is preset to 1,000, and concurrency set to 80," he said. If he had corrected this to small values like 2 and 1, the bill shock would not have occurred.

Thanks to these settings, "running [out] this version of Hello World deployment on Cloud Run made 116 billion reads and 33 million writes to Firestore," said Chauhan.

Most of the cost was down to Firebase read operations, even at just $0.06 per 100,000. Multiply that by 116 billion and you get $69,600. There was also the small matter of 16,000 hours of Cloud Run Compute time, partly because the application did not delete the services but left them "in background process".

The performance of the buggy code was impressive in its way. "At the peak, Firebase was able to handle about one billion reads per minute," he said, while Cloud Run with concurrency "can handle 9 million requests per minute".

"Fail fast, learn fast with cloud is a bad idea," Chauhan concluded. "If you count the number of pages in GCP documentation, it's probably more than pages in [a] few novels. Understanding pricing, usage, is not only time consuming, but requires a deep understanding of how cloud services work."

There is a happy ending. "After going through our lengthy doc on this incident sharing our side of the story, various consults, talks, and internal discussions, Google let go of our bill as a one-time gesture," said Chauhan.

Such leniency cannot be relied upon. Auto-scaling and on-demand computing has downsides, and working out what something will cost is challenging. Caution is advised. ®

Send us news
116 Comments

Microsoft sprinkles AI 'magic' and additional storage tiers on OneDrive

Big emphasis on photos in mobile app

Kyndryl follows in IBM's footsteps with rolling layoffs likely affecting thousands

Underutilized staff get sent to the 'bench' – and seldom return

Cloud threats have execs the most freaked out because they're not prepared

Ransomware? More like 'we don't care' for everyone but CISOs

Google files first ever complaint with European Commission against Microsoft

Mountain View versus Redmond: Fight over cloud software licensing policies gets formal

Hyperscalers are carving up the ocean floor into private internet highways

Think tank warns of sovereignty risks from subsea cable consolidation

Cloud giants point the finger at each other during regulator hearings

Those are some mighty powerful underdogs you've got there

AWS claims customers are packing bags and heading back on-prem

See? We do have competition, cloud giant tells regulator

'Hyperscale customer' to take massive datacenter site near London

'Commercially sensitive' incognito buyer has a lot more support than last group that tried to build a bit barn near M25

Keen to bring it on home? Private cloud appliance pitched at compliance-conscious

Service with endless scalability, at least that's the plan

The elusive dream of cloud portability: Why migrating workloads isn't so simple

Despite early promises, moving between providers remains a complex and costly endeavor

Admins wonder if the cloud was such a good idea after all

As AWS, Microsoft, and Google hike some prices, it's time to open up the ROI calculator

Alibaba Cloud boosts failure prediction with logfile timestamps

Machine learning helps, but more data catches more faults - so Chinese champ has shared its data