Igneous ARM CPUs: What if they tossed the blindfold?

Rise of the Storage Machines

Igneous's Ethernet-accessed, ARM-driven disk drives provide a seriously large amount of collective CPU chops to its dataBox/dataRouter array, but the poor little suckers work blindfolded. Why would I say that?

Think about it like this: a data object, say a photo image, comes into the array to be stored and is split into 20 data chunks plus eight parity chunks under the (20+8) erasure coding scheme. These 28 chunks are then striped across 28 disk drives, each with its own ARM CPU. Over time, chunks from more and more stripes are sent to each disk drive, so any one drive stores a collection of disparate chunks, which could be data or parity. Unless a disk drive CPU is told or can detect the difference, everything is just data to it.
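To put rough numbers on that layout, here is a minimal Python sketch of the chunk sizing and drive placement for a (20+8) scheme. It is an illustration only: the function names, the zero-padding and the rotate-by-stripe placement policy are assumptions, not Igneous's actual design, and the parity chunks are merely labelled rather than computed (a real system would use an erasure code such as Reed-Solomon).

```python
import math

DATA_CHUNKS = 20
PARITY_CHUNKS = 8
TOTAL_CHUNKS = DATA_CHUNKS + PARITY_CHUNKS  # one chunk per drive, 28 drives


def split_into_data_chunks(obj: bytes, k: int = DATA_CHUNKS) -> list[bytes]:
    """Split an object into k equal-sized chunks, zero-padding the tail."""
    chunk_size = max(1, math.ceil(len(obj) / k))
    padded = obj.ljust(chunk_size * k, b"\0")
    return [padded[i * chunk_size:(i + 1) * chunk_size] for i in range(k)]


def place_stripe(stripe_no: int, n_drives: int = TOTAL_CHUNKS) -> list[tuple[int, str]]:
    """Return (drive_index, role) for each chunk of one stripe.

    Assumed policy: the starting drive rotates with the stripe number, so
    the same drive holds data for some stripes and parity for others.
    """
    layout = []
    for chunk_no in range(TOTAL_CHUNKS):
        drive = (stripe_no + chunk_no) % n_drives
        role = "data" if chunk_no < DATA_CHUNKS else "parity"
        layout.append((drive, role))
    return layout


photo = bytes(range(256)) * 400          # 102,400-byte stand-in for a photo
chunks = split_into_data_chunks(photo)   # 20 chunks of 5,120 bytes each
overhead = TOTAL_CHUNKS / DATA_CHUNKS    # raw capacity used per byte stored

print(len(chunks), len(chunks[0]), overhead)
print(dict(place_stripe(0))[0], dict(place_stripe(1))[0])
```

Under these assumptions, drive 0 holds a data chunk for stripe 0 and a parity chunk for stripe 1, and nothing in the bytes themselves tells the drive which is which.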

The photo object has metadata, such as its name, size, image type, creator, camera ID and so forth. A text object would have roughly the same generic sort of file ID information. This metadata is stored somewhere as well and, as long as it is stored intact, can be searched; find all Nikon Coolpix photos in 2016, for example.

But if the metadata is chunked and striped across multiple drives then it cannot be searched – unless, we suppose, it is chunked at metadata item boundaries so that the image name and other metadata items are not spread across two chunks. But then the chunking mechanism has to know about metadata item boundaries, which gets us into photo image metadata awareness, text file metadata awareness, etc ... we won't go there.
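The boundary problem is easy to demonstrate. The following Python sketch compares naive fixed-size chunking with chunking aligned to metadata items. The metadata format (`key=value;`), the 24-byte chunk size and the helper names are all invented for illustration.

```python
def fixed_size_chunks(blob: bytes, size: int) -> list[bytes]:
    """Cut a byte string every `size` bytes, ignoring item boundaries."""
    return [blob[i:i + size] for i in range(0, len(blob), size)]


def item_aligned_chunks(items: list[bytes], size: int) -> list[bytes]:
    """Pack whole metadata items into chunks of at most `size` bytes.

    An item is never split; one longer than `size` gets a chunk to itself.
    """
    chunks, current = [], b""
    for item in items:
        if current and len(current) + len(item) > size:
            chunks.append(current)
            current = b""
        current += item
    if current:
        chunks.append(current)
    return chunks


def chunks_matching(chunks: list[bytes], needle: bytes) -> list[int]:
    """Indices of chunks a drive could match by looking only at its own chunk."""
    return [i for i, chunk in enumerate(chunks) if needle in chunk]


items = [b"name=beach.jpg;", b"camera=Nikon Coolpix;", b"year=2016;"]
blob = b"".join(items)
needle = b"Nikon Coolpix"

naive = fixed_size_chunks(blob, 24)
aligned = item_aligned_chunks(items, 24)

print(chunks_matching(naive, needle))    # the term straddles two chunks
print(chunks_matching(aligned, needle))  # the term sits whole in one chunk
```

With fixed-size cuts, "Nikon Coolpix" is split between the first and second chunks and no single drive can match it. The aligned version finds it, but only because the chunker was told where the items begin and end.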

Let's suppose we want to find all text files with content relating to Christmas trees. A metadata search can tell us this at file name level. If the text files had been content-inspected before being stored then that additional metadata, the content indices, could be searched as well. How could ARM-powered disk drives help here?
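For reference, the content indices mentioned above could be as simple as an inverted index built before the object is chunked. This Python sketch assumes whole documents are available at ingest time. The file names, the tokenizer and the all-words-must-match query rule are illustrative choices, not a description of any vendor's product.

```python
import re
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    """Lower-case a string and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_content_index(docs: dict[str, str]) -> dict[str, set[str]]:
    """Map each word to the set of document names that contain it."""
    index = defaultdict(set)
    for name, text in docs.items():
        for word in tokenize(text):
            index[word].add(name)
    return index


def search_all(index: dict[str, set[str]], query: str) -> set[str]:
    """Return documents containing every word in the query (not a phrase match)."""
    words = tokenize(query)
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result


docs = {
    "notes.txt": "How to water Christmas trees indoors",
    "menu.txt": "Christmas dinner menu",
    "garden.txt": "Pruning apple trees",
}
index = build_content_index(docs)
print(search_all(index, "Christmas trees"))
```

The index is itself just more metadata: it has to be built while the object is still whole, and then stored somewhere it can be searched intact.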

Any individual drive has its stored chunks of data and parity data. The drive hasn't a clue what they refer to, what the contents are about. How can it? All it's been told to do by Igneous's dataRouter is to store this particular chunk. Unless it's told what the chunk is part of and what its contents refer to, then the chunk is gobbledygook. The drive CPU decides on what tracks and on what platter to write the data, and sends it there. That's it.

When the drive CPU gets told to read the chunk, it looks up the chunk's location in the map it maintains, finds it, and streams it off to the dataRouter, which reassembles the object from the chunks and sends it onward to the accessing server.
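The write and read paths just described can be sketched as a toy model in Python. Everything here is hypothetical: the `DriveSim` class, the `(object_id, chunk_no)` chunk IDs and the router functions are stand-ins, the "media" is a byte array rather than tracks and platters, and parity is left out entirely.

```python
class DriveSim:
    """Toy drive: stores opaque chunks and remembers where it put them."""

    def __init__(self) -> None:
        self._media = bytearray()   # stand-in for tracks and platters
        self._chunk_map = {}        # chunk_id -> (offset, length)

    def write_chunk(self, chunk_id, payload: bytes) -> None:
        offset = len(self._media)   # the drive alone decides placement
        self._media += payload
        self._chunk_map[chunk_id] = (offset, len(payload))

    def read_chunk(self, chunk_id) -> bytes:
        offset, length = self._chunk_map[chunk_id]
        return bytes(self._media[offset:offset + length])


def store_object(drives: list[DriveSim], object_id: str, payload: bytes) -> None:
    """Router side: cut the object into one chunk per drive and hand them out."""
    size = -(-len(payload) // len(drives))   # ceiling division
    for i, drive in enumerate(drives):
        drive.write_chunk((object_id, i), payload[i * size:(i + 1) * size])


def read_object(drives: list[DriveSim], object_id: str) -> bytes:
    """Router side: fetch every chunk in order and reassemble the object."""
    return b"".join(d.read_chunk((object_id, i)) for i, d in enumerate(drives))


drives = [DriveSim() for _ in range(4)]
store_object(drives, "photo-1", b"0123456789")
print(drives[1].read_chunk(("photo-1", 1)))   # one drive's fragment
print(read_object(drives, "photo-1"))         # the router's whole object
```

Note where the knowledge lives in this model: only the router functions know that the chunks belong together and in what order. Each drive sees an ID and some bytes.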

Unless drives store whole objects they can't look inside them, and drives don't store whole objects because drives fail. So objects are chunked, parity-code chunks added, and the chunks striped across drives so they can be recovered when drives fail. The data protection scheme enforces object content blindness at drive level.

Seagate's Kinetic drives, and Igneous's and OpenIO's ARM-powered, Ethernet-and-object-accessed disk drives are blind to object contents and appear to be of no use in implementing object content-level data services. Can this divide be bridged? We'd be interested to find out how. ®

