Google Cloud (over)Run: How a free trial experiment ended with a $72,000 bill overnight

Billing budget? Free plan? All useless when buggy code went into overdrive


Sudeep Chauhan, founder of startup Milkie Way, suffered a bad case of bill shock when a test with a $7.00 billing budget and a free database plan on Google Cloud Platform (GCP) generated a $72,000 invoice overnight.

"I jumped out of the bed, logged into Google Cloud Billing, and saw a bill for ~$5,000," Chauhan wrote on his company's blog. "Super stressed, and not sure what happened, I clicked around, trying to figure out what was happening. I also started thinking of what may have happened, and how we could possibly pay the $5K bill. The problem was, every minute the bill kept going up. After two hours, it settled at a little short of $72,000."

It was especially surprising that it happened to Chauhan, who is ex-Google and even spent two years as a payments technical program manager. What happened?

The idea was to build a system that scraped web pages and stored the results in a database. His team picked Google Cloud Run, a GCP service that runs containers, for the job. They then found that the code in each instance would time out and stop as it scraped one page after another. So they set up a many-instance system that processed pages in parallel, so that each page was fetched and stored within the run-time limit.

Chauhan wrote: "To overcome the timeout limitation, I suggested using POST requests (with URL as data) to send jobs to an instance, and [to] use multiple instances in parallel instead of using one instance serially. Because each instance in Cloud Run would only be scraping one page, it would never time out, process all pages in parallel (scale), and also be highly optimized because Cloud Run usage is accurate to milliseconds."

The ex-Googler reflected that he missed the possibility of pages that link back to each other, causing "infinite recursion." It should not have mattered too much, though: he set a billing budget of $7.00 and had a Firebase database on a free plan. "The worst case we imagined was exceeding the daily free Firestore limits," he said. Further, the credit card for the account had a spending limit of $100.
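The link loop Chauhan describes is commonly avoided by remembering which URLs have already been queued and by capping the total amount of work. A minimal sketch, assuming a hypothetical `fetch_links` callable that stands in for the scraping step and an illustrative `max_pages` cap (neither comes from Milkie Way's code):

```python
from collections import deque
from typing import Callable, Iterable


def crawl(start_url: str,
          fetch_links: Callable[[str], Iterable[str]],
          max_pages: int = 100) -> list[str]:
    """Breadth-first crawl that visits each URL at most once.

    fetch_links is a hypothetical stand-in for the scraping step: it takes
    a URL and returns the URLs that page links to. max_pages is an
    illustrative hard cap so a bug elsewhere cannot become unbounded work.
    """
    seen = {start_url}           # URLs already queued; blocks A -> B -> A loops
    queue = deque([start_url])
    visited = []

    while queue and len(visited) < max_pages:
        url = queue.popleft()
        visited.append(url)
        for link in fetch_links(url):
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return visited


# Toy link graph in which pages link back to each other.
LINKS = {
    "a": ["b", "c"],
    "b": ["a"],   # links back to "a"
    "c": ["b"],
}

order = crawl("a", lambda url: LINKS.get(url, []))
print(order)  # ['a', 'b', 'c']
```

In a many-instance setup like the one described, the `seen` set would have to live in storage shared by all instances rather than in one process's memory, which this sketch does not attempt.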

Unfortunately, a billing budget "does not automatically cap Google Cloud or Google Maps Platform usage/spending," according to the docs.
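Since a budget only alerts, any hard stop has to be built by the account owner. A rough sketch of the decision step, assuming the alert arrives as a JSON payload with numeric `costAmount` and `budgetAmount` fields (our understanding of GCP's programmatic budget notifications, to be checked against the current docs); the shutdown action itself is left as a placeholder:

```python
import json


def over_budget(message_data: bytes) -> bool:
    """Return True when reported cost has reached the budget amount.

    Assumes a JSON payload with numeric "costAmount" and "budgetAmount"
    fields. Verify these names against the current GCP documentation.
    """
    payload = json.loads(message_data)
    return payload["costAmount"] >= payload["budgetAmount"]


# Hypothetical payload for the $7.00 budget in the story.
sample = json.dumps({"costAmount": 7.42, "budgetAmount": 7.00}).encode()

if over_budget(sample):
    # Placeholder only: the real action (for example detaching the billing
    # account or scaling the service to zero) is deliberately left out.
    print("budget exceeded: trigger shutdown")
```

A check like this is only as timely as the cost data behind it, and Chauhan's account suggests that data can lag by about a day.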

While Chauhan was asleep after a day of testing, Google sent an automated email informing him that his free Firebase plan had been "upgraded due to activity in Google Cloud," and that this "initiated billing" for the project.

He discovered multiple issues with the GCP cost controls. "Billing takes about a day to be synced, and that's why we noticed the charges the next day," Chauhan said. Next, the "Firebase Dashboard took more than 24 hours to update," he said. This meant that the dashboard showed usage within the daily limit, when it was, he said, "86 million percentage points" more than what was shown.

The GCP Cloud Run defaults also played their part. "The max-instances is preset to 1,000, and concurrency set to 80," he said. If he had corrected this to small values like 2 and 1, the bill shock would not have occurred.
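To put those defaults in perspective: concurrency is the number of requests a single instance may handle at once, so the two settings multiply. A back-of-the-envelope sketch using only the figures quoted above (the function name is ours):

```python
def max_in_flight(max_instances: int, concurrency: int) -> int:
    """Upper bound on requests being served at the same moment."""
    return max_instances * concurrency


# Defaults quoted by Chauhan versus the small values suggested above.
default_ceiling = max_in_flight(max_instances=1_000, concurrency=80)
capped_ceiling = max_in_flight(max_instances=2, concurrency=1)

print(f"defaults: up to {default_ceiling:,} requests in flight")
print(f"capped:   up to {capped_ceiling:,} requests in flight")
```

With the defaults, a runaway job can have up to 80,000 requests in flight; with the capped values, two.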

Thanks to these settings, "running [out] this version of Hello World deployment on Cloud Run made 116 billion reads and 33 million writes to Firestore," said Chauhan.

Most of the cost was down to Firebase read operations, even at just $0.06 per 100,000. Multiply that by 116 billion and you get $69,600. There was also the small matter of 16,000 hours of Cloud Run Compute time, partly because the application did not delete the services but left them "in background process".
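The read-cost arithmetic can be reproduced in a few lines. This sketch uses only the price and read count quoted above and assumes, for simplicity, that every read is billed at the same rate with no free quota deducted:

```python
from decimal import Decimal

PRICE_PER_100K_READS = Decimal("0.06")   # USD, as quoted in the article
TOTAL_READS = 116_000_000_000            # 116 billion reads


def read_cost(reads: int, price_per_100k: Decimal) -> Decimal:
    """Cost in USD of a number of reads at a flat per-100,000 price."""
    return Decimal(reads) / Decimal(100_000) * price_per_100k


total = read_cost(TOTAL_READS, PRICE_PER_100K_READS)
print(f"${total:,.2f}")  # $69,600.00
```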

The performance of the buggy code was impressive in its way. "At the peak, Firebase was able to handle about one billion reads per minute," he said, while Cloud Run with concurrency "can handle 9 million requests per minute".
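Combining that peak rate with the read price quoted earlier gives a sense of how quickly the money went. A rough sketch, assuming the peak of one billion reads per minute is sustained and every read is billed at $0.06 per 100,000:

```python
from decimal import Decimal

PEAK_READS_PER_MINUTE = Decimal(1_000_000_000)  # Chauhan's peak figure
PRICE_PER_100K_READS = Decimal("0.06")          # USD, as quoted in the article
BUDGET = Decimal("7.00")                        # the billing budget he set

# Spend per minute at peak, then how long the $7.00 budget would last.
cost_per_minute = PEAK_READS_PER_MINUTE / 100_000 * PRICE_PER_100K_READS
seconds_to_budget = BUDGET * 60 / cost_per_minute

print(f"${cost_per_minute:,.2f} per minute at peak")
print(f"budget reached in {seconds_to_budget:.1f} seconds")
```

Under those assumptions the reads alone cost $600 a minute, and the $7.00 budget is gone in under a second.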

"Fail fast, learn fast with cloud is a bad idea," Chauhan concluded. "If you count the number of pages in GCP documentation, it's probably more than pages in [a] few novels. Understanding pricing, usage, is not only time consuming, but requires a deep understanding of how cloud services work."

There is a happy ending. "After going through our lengthy doc on this incident sharing our side of the story, various consults, talks, and internal discussions, Google let go of our bill as a one-time gesture," said Chauhan.

Such leniency cannot be relied upon. Auto-scaling and on-demand computing have downsides, and working out what something will cost is challenging. Caution is advised. ®

