Web archive user's $14k BigQuery bill shock after running queries on 'free' dataset

Researcher makes case for default limits after arriving via Python library

A user left with a surprise bill for thousands of dollars after running queries on Google's BigQuery data warehouse has sparked a debate about how vendors should place limits on the use of their tools.

One user of HTTP Archive – a project that aims to track how the web is built – was recently horrified to get a $14,000 bill from Google.

The HTTP Archive project – which crawls websites, recording detailed information about fetched resources, the web platform APIs and features used, and execution traces of each page – hosts a publicly available dataset on the Chocolate Factory's BigQuery cloud-based data warehouse system.

"This website makes it seem like this 'public' dataset is for the community to use, but it is instead a for-profit money maker for Google Cloud and you can lose tens of thousands of dollars," said user Tim on the HTTP Archive forum.

"This official website should be updated to warn people Google is apparently now hosting this dataset to make money. I don't think that was the original mission, but that's what it is today, there's basically zero customer support, and you can lose $14k in the blink of an eye," he added in the discussion post.

An archive maintainer responded that 99 percent of the archive's users only view its free monthly reports and annual Web Almanac reports. BigQuery is designed for the 1 percent of "power users" who "need lower level access to the raw data."

The maintainer pointed out that $14,000 would have come from processing about 2.5 petabytes, given Google's rate of $6.25 per TiB. He said Google warns users how much data a query will process when it is run, but nonetheless apologized for the user's experience and said he'll add a more explicit warning about BigQuery charging to the website's FAQ page.
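The maintainer's arithmetic is easy to check. The short Python sketch below works back from the bill to the volume scanned, using only the $6.25 per TiB figure quoted above; the function name is ours and purely illustrative.

```python
TIB = 2**40   # bytes in a tebibyte
PB = 10**15   # bytes in a (decimal) petabyte


def data_processed(bill_usd, usd_per_tib=6.25):
    """Return (TiB, PB) scanned for a given bill at a flat per-TiB rate."""
    tib = bill_usd / usd_per_tib
    return tib, tib * TIB / PB


tib, pb = data_processed(14_000)
print(f"{tib:,.0f} TiB, about {pb:.2f} PB")  # 2,240 TiB, about 2.46 PB
```

That is 2,240 TiB, which rounds to the "about 2.5 petabytes" the maintainer cited.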

However, the user, Tim, came back into the conversation. He said he was running queries from a Python script with the official GCP libraries, which, unlike the web UI, do not have a mechanism to show the cost of a query.
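For what it's worth, a script can ask for an estimate itself, although nothing prompts it to. The sketch below is our own illustration, not code from the forum thread: it assumes the `google-cloud-bigquery` package and working credentials, uses the library's dry-run option to get the number of bytes a query would scan, and converts that to dollars at the $6.25 per TiB rate quoted above. The helper names are hypothetical, and the real bill may differ from this estimate.

```python
TIB = 2**40          # bytes in a tebibyte
USD_PER_TIB = 6.25   # rate quoted by the maintainer; treat as an assumption


def bytes_to_usd(num_bytes, usd_per_tib=USD_PER_TIB):
    """Convert bytes scanned to an approximate cost at a flat per-TiB rate."""
    return num_bytes / TIB * usd_per_tib


def estimate_query_usd(client, sql):
    """Dry-run a query and return its estimated cost without executing it."""
    # Imported here so bytes_to_usd works even without the library installed.
    from google.cloud import bigquery

    config = bigquery.QueryJobConfig(dry_run=True, use_query_cache=False)
    job = client.query(sql, job_config=config)  # dry run: the query is not executed
    return bytes_to_usd(job.total_bytes_processed)
```

Calling `estimate_query_usd(bigquery.Client(), sql)` before the real query would surface the number the web UI shows, but only if the script's author knows to ask.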

"I think one thing that would help is to highlight people should enable the cost controls prior to running queries, as they are not on by default," he said.

Tim argued for a circuit-breaker at $5k or less to stop users from running queries unless they manually confirm they want to continue.
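The kind of circuit-breaker Tim describes might look something like the following. This is a minimal sketch under our own assumptions, not anything Google ships: the names, the $5,000 default, and the `confirm` callback are all hypothetical, and the rate is the $6.25 per TiB figure quoted above.

```python
TIB = 2**40          # bytes in a tebibyte
USD_PER_TIB = 6.25   # rate quoted in the article; an assumption here


class QueryBlocked(Exception):
    """Raised when an estimated cost exceeds the limit and is not confirmed."""


def guard_query(estimated_bytes, limit_usd=5000.0, confirm=lambda cost: False):
    """Return the estimated cost, or raise unless the user confirms an overrun."""
    cost = estimated_bytes / TIB * USD_PER_TIB
    if cost > limit_usd and not confirm(cost):
        raise QueryBlocked(
            f"estimated ${cost:,.2f} exceeds ${limit_usd:,.2f} limit"
        )
    return cost
```

By default the guard refuses anything over the limit; passing a `confirm` function that asks the user and returns `True` lets the query through. The official Python client also accepts a `maximum_bytes_billed` setting on `QueryJobConfig`, which makes the service itself fail a query that would bill beyond the cap, but it has to be set explicitly.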

One respondent logged on to say that the complainant was an idiot — in a post now hidden by moderators — for running a query without understanding the volume of data it might address. Others may see this as unhelpful.

While Google makes BigQuery's pricing clear on its website, users — particularly students or academics — might arrive at the data from another direction. Maybe a default should be to prevent processing data above a certain threshold unless the user explicitly agrees or they have signed up to a data plan.

The Register has contacted Google for a statement. ®
