So what exactly sits behind Google’s Nearline storage service?
Tape? Nah. Blu-ray? Maybe. Weird variant? Possibly
Comment How is Nearline, Google’s storage service for non-essential data, with its three-second retrieval latency, viable at the same cost as Amazon’s Glacier, when Glacier uses tape with a 3-5 hour retrieval latency?
Tape is cheap – but slow – and Google can't be using the stuff for Nearline. We think Nearline uses either Blu-ray optical disk drives, or traditional spinning rust, but with some recent twist to get the cost down.
Google says it uses its existing flash and disk infrastructure. Tom Kershaw, director of product management for Google Cloud Platform, said: "We wanted to maintain a single stack because that creates fungibility of content ... So you need to be able to seamlessly move stuff across the storage, and if you have completely separate farms it is really hard to move stuff."
"As everybody knows in storage, storing is cheap and moving is expensive. So we have software on top of our fabric that does the differentiation, but the underlying disk and flash is the same," said Kershaw.
"Because we have such a massive volume of capacity on all of our other storage, if we put Nearline Storage on another kind of system we would lose the economies of scale," he added.
But the claim that the underlying disk and flash are the same doesn't quite sound right to us at The Register. Google offers the facility of moving data between its storage classes, which indicates to us that different media within the existing disk and flash infrastructure are being employed.
It's not just a metadata change that's needed when altering a customer's data set storage policy.
If optical disks were being used for nearline storage then that couldn't really be classed as being part of the existing disk and flash infrastructure. Also, retrieval time from offline Blu-ray disks in a library tends to be greater than three seconds. So bye-bye to the Blu-ray possibility.
So what's changed in disk storage to make Google Nearline feasible now, that wasn't available when Amazon devised tape-based Glacier several years ago?
We can see two things: shingled magnetic recording (SMR) and helium-filled drives, both of which lift capacity significantly.
In June HGST introduced its He10 10TB shingled drive.
Seagate has an 8TB shingled drive that is air-filled and has one platter fewer than HGST's 10TB spinner. HGST's shingling is host-managed rather than drive-managed, necessitating some host server application software changes.
Google would not be averse to doing that if it gained drive storage efficiency as a result.
However, HGST said it was working with PMC-Sierra and Avago-LSI for HBA support. If that support was not ready, then Google would face problems unless it got HBA technology earlier or from somewhere else, or unless it was using Seagate SMR disk drives, which store less (6-8TB) but don't need host server software changes.
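To put that capacity gap in rough numbers, here is a back-of-the-envelope sketch. The 1PB target and the function name are our own illustrative assumptions, and the sum ignores redundancy, formatting overhead and drive pricing, none of which Google or the drive makers have disclosed here.

```python
import math


def drives_needed(target_tb: float, drive_tb: float) -> int:
    """Raw drive count needed to reach target_tb of capacity.

    Ignores redundancy, spares and formatting overhead (an assumption).
    """
    return math.ceil(target_tb / drive_tb)


TARGET_TB = 1000  # hypothetical 1PB of raw capacity, in decimal TB

hgst_10tb = drives_needed(TARGET_TB, 10)   # HGST 10TB helium-filled SMR drive
seagate_8tb = drives_needed(TARGET_TB, 8)  # Seagate 8TB air-filled SMR drive

print(f"10TB drives: {hgst_10tb}, 8TB drives: {seagate_8tb}")
```

On those assumptions the 8TB option needs a quarter more spindles, slots and power connections for the same raw capacity, which is the capacity/cost angle in question.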
This writer thinks Google is likely using SMR disk drives for its nearline storage, with HGST helium drives the best capacity/cost fit and Seagate ones simpler to adopt, but not so good from the capacity/cost angle. ®