It was a bombshell when Facebook's Jason Taylor said he would like to use flash solid-state storage as an archive medium, but his reasons made perfect sense.
Facebook users had lots of photos stored in their albums and rarely accessed them. But when they did want to look at them they wanted them to come up straight away, whether the photos were recent or five years old.
Only a particular form of flash storage could, in theory, combine the attributes Taylor wanted: low-cost, long-term, reliable storage and fast access.
The two current archive media types, tape and disk, rely on being kept offline when not being accessed, so that they consume next to no power. As soon as an archive medium is accessed it draws power, and electricity cost is a huge concern when you are storing many petabytes of data, heading towards exabyte levels, over many years.
A tape archive is a library with few drives and hundreds or thousands of slots for offline tape cartridges. When a file is needed, the library locates the cartridge holding it and a robot mechanism delivers that cartridge to a drive, where it is mounted and the tape is wound to the right position so the file can be read.
Make it a double
Tape is intrinsically the cheapest archive medium. Unlike disk drives, tape cartridges have no costly embedded drive mechanism or motor. They hold data safely for many years or even decades and are robust. Tape drives can also compress data as they write it, which can double or triple a cartridge's effective capacity, depending on how compressible the data is.
A tape library runs cool as it has only a few drives that consume power and its robot mechanisms don't use much power either. This means that it uses a fraction of the power and cooling needed for an equivalent amount of data in an online disk drive array.
The library takes up data-centre space, of course, and that is a cost, but compared with a disk-based archive a tape archive still has a much lower total cost of ownership.
As a result, the idea of using disk as an archive was laughable until recently. Two advances, however, have changed the maths: deduplication and spinning down disks when not in use.
Deduplication: Much better than compression at removing repeated strings of bytes in data, deduplication has allowed tape to be replaced by disk for backing up and storing data for short-term protection needs such as recovering lost files or replacing corrupted data.
Deduplication can achieve as much as a 5:1 data reduction ratio with backup or VDI data (less so with images and structured data). That, together with disk’s faster data access*, was more than enough to cause wholesale replacement of tape by disk for backup.
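To see where a ratio like that comes from, here is a minimal Python sketch of deduplication. It assumes simple fixed-size 4KB chunks fingerprinted with SHA-256, a simplification chosen for illustration (many commercial products use variable-size, content-defined chunking instead), and the helper names `dedupe`, `restore` and `reduction_ratio` are hypothetical. It simulates five nightly full backups of the same 1MiB data set with one small change each night: each unique chunk is stored once and every later copy becomes a reference to it.

```python
import hashlib
import random

CHUNK_SIZE = 4096  # assumed fixed chunk size; real products often vary it


def dedupe(data: bytes, chunk_size: int = CHUNK_SIZE) -> tuple[dict[str, bytes], list[str]]:
    """Split data into fixed-size chunks and keep one copy of each unique chunk.

    Returns the chunk store (fingerprint -> chunk) and the recipe, the ordered
    list of fingerprints needed to rebuild the original stream.
    """
    store: dict[str, bytes] = {}
    recipe: list[str] = []
    for offset in range(0, len(data), chunk_size):
        chunk = data[offset:offset + chunk_size]
        fingerprint = hashlib.sha256(chunk).hexdigest()
        store.setdefault(fingerprint, chunk)  # a repeated chunk costs only a reference
        recipe.append(fingerprint)
    return store, recipe


def restore(store: dict[str, bytes], recipe: list[str]) -> bytes:
    """Rebuild the original stream by following the recipe."""
    return b"".join(store[fp] for fp in recipe)


def reduction_ratio(data: bytes, store: dict[str, bytes]) -> float:
    """Logical bytes written divided by unique bytes actually stored."""
    return len(data) / sum(len(chunk) for chunk in store.values())


# Simulate five nightly full backups of a 1MiB data set (hypothetical workload).
rng = random.Random(42)
dataset = bytearray(rng.randbytes(1024 * 1024))
backups = b""
for night in range(5):
    dataset[night * CHUNK_SIZE] ^= 0xFF  # one byte changes in a different chunk each night
    backups += bytes(dataset)

store, recipe = dedupe(backups)
assert restore(store, recipe) == backups
print(f"{reduction_ratio(backups, store):.2f}:1")  # prints 4.92:1
```

Five backups of 256 chunks each write 1,280 chunks, but only 260 are unique, so the ratio approaches the number of near-identical copies retained. That is why repetitive backup data dedupes so well, while data with few repeats gains much less.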
Spinning down disks when not in use: Spinning up the disk and accessing data only when needed is a much quicker process than finding a tape in a library, robotically moving it to a drive, mounting it and streaming it to the right location.
Disk capacity has just reached 8TB, with 10TB in prospect thanks to advances such as shingled magnetic recording, and every capacity increase lowers the cost/GB of disk-stored data. Tape, however, is increasing its density faster, with IBM having demonstrated 154TB cartridges.
The Wikibon consultancy says that for tape, "areal density is growing at approximately 30 per cent versus disk, which is growing at only 9.6 per cent."
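Compounding shows why that gap matters. The minimal Python sketch below assumes Wikibon's figures are constant annual rates, which the quote does not actually state, so treat the output as an illustration of compound growth rather than a forecast; the function names `doubling_time` and `relative_gain` are hypothetical.

```python
import math

# Wikibon's figures, assumed here to be constant compound annual rates
TAPE_GROWTH = 0.30
DISK_GROWTH = 0.096


def doubling_time(rate: float) -> float:
    """Years for areal density to double at a constant compound annual rate."""
    return math.log(2) / math.log(1 + rate)


def relative_gain(years: int) -> float:
    """Tape's density multiplier divided by disk's after the given number of years."""
    return ((1 + TAPE_GROWTH) / (1 + DISK_GROWTH)) ** years


for medium, rate in (("tape", TAPE_GROWTH), ("disk", DISK_GROWTH)):
    print(f"{medium}: density doubles every {doubling_time(rate):.1f} years")
print(f"after 5 years tape's multiplier is {relative_gain(5):.2f}x disk's")
```

On those assumptions tape density doubles roughly every 2.6 years against about 7.6 years for disk, and after five years tape's density has grown by a factor about 2.35 times larger than disk's.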
A spun-down disk draws little or no power, which means the cost of deduplicated, spun-down disk storage is approaching that of tape. It is still not as cost-efficient as tape for data archiving, or as reliable, but where fast access to archive data is a high priority, disk is beginning to be used.
Those users, though, would like a cheaper way of archiving their data without losing disk’s speed advantage. And, they would say, spun-down disk is still not fast enough. It may be better than tape, but that is like comparing walking to crawling when what you really want is a medium that can sprint.
Some archive use cases have settled on Blu-ray optical discs as the archive medium, but we understand these are a minority and optical is not a mainstream archive medium. Here's a Facebook example.