Tape, glass, and molecules – the future of archival storage
Time to stop giving cold storage the cold shoulder
Feature The future of archival data storage is tape, more tape, and then possibly glass-based tech, with DNA and other molecular tech still a distant prospect.
The function of archival storage is to keep data for the long term – decades and beyond – reliably and affordably. Currently, the main medium for this is LTO tape, which is slow, has a limited life, and lacks the capacity needed given ever-increasing image and video resolution and AI-infused data generation. However, there is as yet no viable tape replacement technology at scale, only possibilities, with optical storage more practical and nearer to productization than DNA or other molecular storage.
Tape limitations
Streaming tape, with the tenth LTO generation (LTO-10) being announced, faces growing inadequacy. It is the most popular archival storage choice because it has a far lower cost per terabyte than either hard disk or SSD, but a tape's contents have to be copied to a fresh tape (resilvered) every five years or so to avoid bit rot.
"Magnetic technology has a finite lifetime," says Ant Rowstron, Distinguished Engineer, Microsoft Project Silica. "You must keep copying it over to new generations of media. A hard disk drive might last five years. A tape, well, if you're brave, it might last ten years. But once that lifetime is up, you've got to copy it over. And that, frankly, is both difficult and tremendously unsustainable if you think of all that energy and resource we're using."
Tape's access speed is slow because it is read sequentially while being streamed through a tape drive, unlike disk and SSD, which are randomly accessible and so have a far faster time to first byte. LTO-10 has a 400 MBps throughput, the same as LTO-9 with its 18 TB raw/45 TB compressed capacity. It will take longer to read a full LTO-10 tape than a full LTO-9 tape because it has 66.7 percent more data on it.
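To put numbers on that, here is a back-of-the-envelope sketch in Python. It assumes decimal units (1 TB = 1,000,000 MB), raw capacities, and a drive that sustains the full 400 MBps for the whole tape, which is an idealization; the function name is ours, for illustration only.

```python
def full_read_hours(capacity_tb: float, throughput_mb_per_s: float) -> float:
    """Hours needed to stream an entire tape at a constant rate (decimal units)."""
    seconds = capacity_tb * 1_000_000 / throughput_mb_per_s
    return seconds / 3600


# Raw capacities and throughput as quoted in the article.
lto9_hours = full_read_hours(18, 400)
lto10_hours = full_read_hours(30, 400)

# Extra time for LTO-10 relative to LTO-9, as a percentage.
extra_percent = (lto10_hours / lto9_hours - 1) * 100

print(f"LTO-9 full read:  {lto9_hours:.1f} hours")
print(f"LTO-10 full read: {lto10_hours:.1f} hours")
print(f"LTO-10 takes {extra_percent:.1f} percent longer")
```

Under those assumptions a full LTO-9 read takes 12.5 hours and a full LTO-10 read a little under 21 hours.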
Tape reel capacities are also falling behind those of disk and SSD. The latest LTO-10 has a 30 TB raw capacity and, at a 2.5:1 compression ratio, 75 TB. Disk drives are now in the 32-36 TB area with 40 TB coming. SSDs are already far beyond that with 122 TB drives available and 256 TB forecast for next year. The rate of tape capacity increase is slow in comparison, with the next generation, LTO-11, expected to have up to 72 TB raw capacity when it arrives around 2027/28. At that point, disk drives will have around 50 TB of capacity and SSDs should be heading past 300 TB.
But tape is affordable, much more so than either disk or SSD, and the best archival medium we have even though it is slow, getting limited in capacity, and only lasts five to ten years. It does have a roadmap for another four generations, taking us to around 2035/36, which provides some reliability, but it is ripe for replacement should a better technology come along. There are two potential replacements getting attention – glass-based and molecular technologies.
The glass archive game and Project Silica
Microsoft's Project Silica uses technology developed by the UK's University of Southampton to store data in square silica glass tablets by using polarization-based nanostructures, created by femtosecond infrared laser pulses. The glass is impervious to heat, boiling water, electromagnetic field radiation, and various chemicals, and surface scratches shouldn't affect data recovery.
The nanostructures, defined by position, orientation, size, and light refraction, are created in a silica glass tablet measuring 75 by 75 millimeters and 2 millimeters thick (2.95 x 2.95 x 0.08 inches). Together with Microsoft researchers, the academics stored 75.6 GB of data using multiple layers back in 2019.
The Southampton-based researchers then developed a 5D system using two optical and three spatial dimensions in the silica glass. They burned nanoscale circular voids or holes – 130 nm in size – in the silica, using a femtosecond laser pulse to create a micro-explosion and then following pulses to alter the shape, size, and edges of the void to create 460 x 50 nm nanolamellas – nanoscale plate-like structures or gratings called voxels (volumetric pixels). Each voxel stores four bits.
Microsoft says Project Silica glass tablets, roughly the size of a drink coaster or DVD, can now hold 7 TB of raw data, in 100 or more layers, and preserve the data for thousands of years. It uses Azure AI to decode the data stored in glass, saying it makes reading and writing faster and allows more data storage than otherwise.
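For a sense of scale, the sketch below combines the 7 TB and 100-layer figures with the four bits per voxel quoted for the Southampton work. Whether Microsoft's current tablets use that same encoding is our assumption, and the sum ignores any error-correction overhead, so treat the result as an order-of-magnitude illustration only.

```python
TABLET_BYTES = 7 * 10**12   # 7 TB raw, decimal units
BITS_PER_VOXEL = 4          # figure quoted for the Southampton voxels (assumed to apply here)
LAYERS = 100                # "100 or more layers"; we take the lower bound

# Voxels needed to hold the raw data, with no allowance for error correction.
total_voxels = TABLET_BYTES * 8 // BITS_PER_VOXEL
voxels_per_layer = total_voxels // LAYERS

print(f"Voxels per tablet: {total_voxels:,}")
print(f"Voxels per layer:  {voxels_per_layer:,}")
```

That works out to 14 trillion voxels per tablet, or 140 billion per layer, which gives some idea of the positioning precision the writer and reader need.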
Storing and retrieving data is a four-step process:
- Writing with an ultra-fast femtosecond laser
- Reading through a computer-controlled, polarization-sensitive microscope with polarized light shined through the glass
- Decoding via machine learning algorithms, interpreting the polarized light patterns
- Storing in a library, like tape cartridges
The library has battery-powered robots that charge as they idle inside the library and start up when data is needed. They ascend the rows of shelves, pick up a glass tablet, and take it to the reader. The system doesn't allow tablets holding data to be taken to the writer station, which makes them immutable by design. If they were taken to the writer, the femtosecond laser pulses there could corrupt the stored data.
We don't know the current tablet capacity in actual terabyte numbers, nor the read and write throughput speeds. Microsoft says: "We're able to achieve system-level aggregate write throughputs comparable to current archival systems," presumably meaning tape libraries, and carefully not saying faster than the comparable systems. There is no sign of any imminent availability and we would estimate Project Silica to be two to five years away from product availability.
Microsoft emphasizes the low power requirements for Project Silica libraries, pushing out a sustainability message. It views Project Silica as a way of developing archival storage for its Azure cloud service. This means it's proprietary to Microsoft and unlikely to be made commercially available to AWS, the Google and Oracle clouds, or others.
Elire, a sustainability-focused venture group, has collaborated with Microsoft Research's Project Silica team to harness this technology for their Global Music Vault in Svalbard, Norway. Elire plans to expand this musical repository by establishing locations worldwide that are more accessible than Svalbard.
Cerabyte
Cerabyte's ceramic-coated glass is different in that femtosecond laser pulses burn nanoscale pits in a ceramic medium layered in a glass tablet. It is a single-layer technology so the tablets store less data than Project Silica – 1 GB per surface. However, their endurance and resistance to physical, chemical, and electromagnetic radiation attack are the same. So too is their physical storage in a robot-accessed, tape-style library, with two separate stations for writing data and reading it.
Cerabyte's tablets store data in QR (quick response) codes – two-dimensional bar codes – and the data is read by a scanning microscope. Since it is a single layer device, both reading and writing are much simpler than the Project Silica processes, which need to cope with 100 or more layers and thus have far more precise laser and scanning microscope positioning requirements.
The company says it "writes up to 2,000,000 bits with one laser pulse, enabling ultra-fast data storage and reading with high-speed cameras," but we have no actual throughput numbers.
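What the pulse figure does tell us is how many pulses a surface needs. A quick sketch, assuming the 1 GB per surface figure is decimal and that every pulse writes the full 2,000,000 bits (the claim is "up to", so this is a best case):

```python
SURFACE_BYTES = 10**9          # 1 GB per surface, decimal units
BITS_PER_PULSE = 2_000_000     # Cerabyte's "up to" figure, taken as a best case

surface_bits = SURFACE_BYTES * 8

# Ceiling division: a partly filled final pulse still counts as a pulse.
pulses_per_surface = -(-surface_bits // BITS_PER_PULSE)

print(f"Pulses needed per surface: {pulses_per_surface:,}")
```

That is 4,000 pulses per surface at best. Turning it into a throughput number would need the pulse repetition rate and the time spent moving between write positions, neither of which we have.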
As with Project Silica and unlike tape, the stored tablets do not need periodic rewriting.
Cerabyte has attracted investment from In-Q-Tel, Pure Storage, and Western Digital. Shantnu Sharma, WD's Chief Strategy and Corporate Development Officer, said: "We are looking forward to working with Cerabyte to formulate a technology partnership for the commercialization of this technology."
With an office in Boulder, Colorado, near LTO tape library suppliers SpectraLogic and Quantum, Cerabyte is in the heart of the US's robot archival library development area. Like Project Silica, it is, in our assessment, two to five years away from commercial product availability, but it will then be commercially available.
The DNA fantasy
Scientists have noticed that DNA stores vast amounts of information in its double helix strands. These are composed of nucleotides, chemical molecules that each contain one of four nucleobases: cytosine (C), guanine (G), adenine (A), and thymine (T). These four letters of DNA compose a mini-alphabet and their combinations can store data. A nucleotide is far smaller than a disk, SSD, or tape magnetic bit area, and supplier Biomemory claims that 45 zettabytes of data could be stored in 1 gram of DNA.
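The simplest way to see how a four-letter alphabet stores data is to give each base two bits. The Python sketch below does exactly that. It is a toy mapping of our own choosing, not any vendor's scheme; practical encodings add constraints (such as avoiding long runs of one base) and error correction that are left out here.

```python
BASES = "ACGT"  # index 0..3 gives the two-bit value: A=00, C=01, G=10, T=11


def encode(data: bytes) -> str:
    """Map each byte to four bases, most significant bit pair first."""
    out = []
    for byte in data:
        for shift in (6, 4, 2, 0):
            out.append(BASES[(byte >> shift) & 0b11])
    return "".join(out)


def decode(strand: str) -> bytes:
    """Inverse of encode: every four bases become one byte."""
    if len(strand) % 4:
        raise ValueError("strand length must be a multiple of four bases")
    out = bytearray()
    for i in range(0, len(strand), 4):
        byte = 0
        for base in strand[i:i + 4]:
            byte = (byte << 2) | BASES.index(base)
        out.append(byte)
    return bytes(out)


strand = encode(b"Hi")
print(strand)           # CAGACGGC
print(decode(strand))   # b'Hi'
```

Two bits per base means one byte takes four nucleotides, which is where the density claims start from.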
From the archival storage point of view, DNA can last for hundreds of years in the right conditions. That's the attraction, but developing a workable DNA archival storage technology is fiendishly difficult because the mechanisms of writing and reading it are deplorably slow and complex. We are talking about chemical reactions, not electrical or magnetic ones, and need to recognize that a DNA storage entity will contain the equivalent of a tape reel chopped into millions of fragments and mixed up.
Once incoming data is coded into some representation of the DNA alphabet or structure, it is written in sequence into the molecular medium, but this output has no intrinsic structure at all. It's just molecules grouped together in little clusters and floating around inside some liquid. These fragments need to be retrieved and sequenced to recover the tiny packets of information each one contains and then the whole data item reconstituted from the fragments, a bit like receiving and reordering the packets from an internet protocol network message.
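The packet analogy can be sketched in a few lines of Python. Each fragment carries an index alongside its payload, the pool is shuffled to stand in for molecules floating in liquid, and the reader sorts by index to rebuild the data. The fragment size, function names, and the plain integer index are illustrative assumptions; a real system would also need redundancy and error correction, since fragments can be lost or misread.

```python
import random


def fragment(data: bytes, size: int) -> list[tuple[int, bytes]]:
    """Split data into (index, payload) fragments of at most `size` bytes."""
    return [(i // size, data[i:i + size]) for i in range(0, len(data), size)]


def reassemble(fragments: list[tuple[int, bytes]]) -> bytes:
    """Sort fragments by index and join their payloads."""
    return b"".join(payload for _, payload in sorted(fragments))


message = b"archival data survives only if it can be put back in order"

pool = fragment(message, 8)
random.shuffle(pool)          # the written medium has no intrinsic order

restored = reassemble(pool)
print(restored == message)    # True
```

The index is overhead: every fragment gives up part of its length to addressing, which is one reason usable density falls short of the theoretical figure.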
Written DNA is immutable. But reading a DNA storage medium is destructive. You need enough of it to read it several times. The reading and writing equipment is bulky and speeds are horrendous. Chinese researchers, using a methylated DNA technique, wrote data at 40 bits per second. LTO-9 tape writes data at 400 MBps, 3,200 Mbps – 80 million times faster.
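The arithmetic behind that ratio, as a short Python check. It uses decimal megabytes and the two write rates quoted above; the one-megabyte example at the end is our own illustration.

```python
DNA_WRITE_BPS = 40                      # bits per second, methylated DNA technique
LTO9_WRITE_BPS = 400 * 10**6 * 8        # 400 MBps expressed in bits per second

ratio = LTO9_WRITE_BPS // DNA_WRITE_BPS
print(f"LTO-9 is {ratio:,} times faster")

# Time to write a single decimal megabyte at the DNA rate.
seconds_for_1mb = 10**6 * 8 // DNA_WRITE_BPS
print(f"1 MB at 40 bps: {seconds_for_1mb:,} seconds "
      f"({seconds_for_1mb / 86_400:.1f} days)")
```

At 40 bits per second, one megabyte takes 200,000 seconds, which is more than two days. An LTO-9 drive writes the same megabyte in 2.5 milliseconds.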
It is just not possible to conceive of DNA storage achieving an 80 million times write speed improvement in, what, ten years? Twenty years? It's a scientific daydream for the foreseeable future.
Molecular electrical storage
A different molecular storage concept involves sequence-defined polymers (SDPs) and is being researched at the University of Texas at Austin. The researchers write in a Cell paper: "SDPs offer advantages over DNA. For example, DNA is limited to four monomers, yet SDPs can use a much larger set – eight, sixteen, or even more – allowing for greater information density."
Because DNA sequencing is slow and impractical for data storage, for the purposes of the study, the researchers invented an electrochemical method for decoding sequence-defined polymers, a form of plastic. Sequence components are used to represent the 256 ASCII characters and read via their individual electrical signals.
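The density argument is simple information theory: a position that can hold one of n monomers carries log2(n) bits. The Python sketch below shows how many positions a 256-symbol character set needs for different monomer counts. It illustrates the principle only and is not the encoding used in the paper; the function names are ours.

```python
import math


def bits_per_monomer(monomer_count: int) -> float:
    """Information carried by one position in the polymer chain."""
    return math.log2(monomer_count)


def monomers_per_symbol(monomer_count: int, symbols: int = 256) -> int:
    """Smallest chain length whose combinations cover `symbols` distinct values."""
    length, combinations = 0, 1
    while combinations < symbols:
        combinations *= monomer_count
        length += 1
    return length


for n in (4, 8, 16):
    print(f"{n:>2} monomers: {bits_per_monomer(n):.0f} bits each, "
          f"{monomers_per_symbol(n)} per 256-value character")
```

With four monomers, as in DNA, a character needs four positions. With sixteen it needs two, halving the chain length for the same data.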
Corresponding author and electrical engineer Praveen Pasupathy of the University of Texas said in an announcement: "Molecules can store information for very long periods without needing power. Nature has given us the proof of principle that this works. This is the first attempt to write information in a building block of a plastic that can then be read back using electrical signals, which takes us a step closer to storing information in an everyday material."
The research group ran a proof-of-concept experiment, reading and decoding an 11-character password in 2.5 hours. Senior paper author and chemist Eric Anslyn of the University of Texas said: "Our approach has the potential to be scaled down to smaller, more economical devices compared to traditional spectrometry-based systems. It opens exciting prospects for interfacing chemical encoding with modern electronic systems and devices."
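To see how far that is from a usable read speed, here is the rate worked out in Python. It assumes eight bits per character, in line with the 256-character set mentioned above, and treats the 2.5 hours as pure read-and-decode time.

```python
CHARACTERS = 11
BITS_PER_CHARACTER = 8        # assumption: one of 256 possible values per character
READ_SECONDS = 2.5 * 3600     # 2.5 hours

bits_per_second = CHARACTERS * BITS_PER_CHARACTER / READ_SECONDS
minutes_per_character = READ_SECONDS / CHARACTERS / 60

print(f"Read rate: {bits_per_second:.4f} bits per second")
print(f"About {minutes_per_character:.1f} minutes per character")
```

That is roughly a hundredth of a bit per second, or more than 13 minutes for each character of the password.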
This is still work being done at the frontier of research, very far from being commercially viable, whether DNA or plastic. The only viable tape replacement technology in sight is optical, with Cerabyte and Microsoft racing to develop it.
Microsoft claims higher-density glass tablets than Cerabyte, but may keep its technology for in-house use. Cerabyte wants commercial glory and has convinced Pure and Western Digital that it's worth a punt. If a tape library system vendor gets involved, then Cerabyte's concept will be a step closer to being realized. ®