New DNA 'hard drive' could keep files intact for millions of years
Microsoft and genetics boffins predict genetics in the datacenter
Researchers at the University of Washington (UW) and Microsoft have managed to write data directly onto DNA, a format with dramatic storage densities and a very long life.
The team wrote 200MB onto strands of synthetic DNA, including video footage of the band OK Go, the Universal Declaration of Human Rights in more than 100 languages, the top 100 books of Project Gutenberg and the Crop Trust's seed database. They were then able to read the data back using error correction code developed by Microsoft, and should be able to do so again far into the future.
"We've seen evidence that this could last intact for thousands of years," Karin Strauss, Microsoft's lead researcher on the project, told The Register on Thursday. "Synthetic encapsulation is very temperature-dependent, but at 10 degrees Celsius the DNA won't degrade for around 2,000 years, and at -18 degrees it could last for millions."
The technique uses a DNA synthesizer that encodes information onto the four bases in DNA – adenine, guanine, cytosine and thymine – allowing large volumes of data to be stored at microscopic scale. The 200MB archive was stored on a piece of DNA the size of a couple of grains of sugar. The synthetic material was encapsulated to protect it and to prevent degradation.
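To give a feel for how bytes can end up as bases, here is a minimal sketch in Python. It assumes the simplest possible mapping, two bits per base, which is an illustration only: the article does not say which encoding the UW and Microsoft team used, and the names `BASES`, `encode` and `decode` are hypothetical.

```python
# Hypothetical mapping, NOT the researchers' scheme: two bits per base,
# so A=00, C=01, G=10, T=11 and every byte becomes four bases.
BASES = "ACGT"


def encode(data: bytes) -> str:
    """Turn each byte into four bases, most significant bit pair first."""
    return "".join(
        BASES[(byte >> shift) & 0b11]
        for byte in data
        for shift in (6, 4, 2, 0)
    )


def decode(strand: str) -> bytes:
    """Invert encode(): every four bases are packed back into one byte."""
    if len(strand) % 4:
        raise ValueError("strand length must be a multiple of 4")
    out = bytearray()
    for i in range(0, len(strand), 4):
        byte = 0
        for base in strand[i:i + 4]:
            # str.index raises ValueError on anything other than A, C, G or T.
            byte = (byte << 2) | BASES.index(base)
        out.append(byte)
    return bytes(out)


print(encode(b"OK"))          # CATTCAGT
print(decode("CATTCAGT"))     # b'OK'
```

Real schemes are likely to be more involved than this, but the sketch shows why four bases are enough to carry arbitrary binary data.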
Previous research by UW and Microsoft has estimated that the "raw" storage limit of DNA is an exabyte per cubic millimeter. That said, it takes a long time to actually read the data – hours at a time – so this isn't going to replace flash any time soon.
With a medium this delicate, reading the data back means dealing with errors, so Microsoft's coders came up with an error correction system that allows the data to be recovered from the DNA storage system in a usable format.
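The article does not describe how Microsoft's error correction code works. As a much simpler stand-in, the Python sketch below shows the general idea of recovering a clean sequence from noisy reads by taking a majority vote at each position. It assumes several equal-length reads of the same strand are available, and the function name `consensus` is hypothetical.

```python
from collections import Counter


def consensus(reads: list[str]) -> str:
    """Return the most common base at each position across the reads.

    Illustrative majority vote only; this is not Microsoft's scheme.
    """
    if not reads:
        raise ValueError("need at least one read")
    length = len(reads[0])
    if any(len(read) != length for read in reads):
        raise ValueError("reads must have equal length")
    # zip(*reads) yields one tuple of bases per strand position.
    return "".join(
        Counter(column).most_common(1)[0][0]
        for column in zip(*reads)
    )


# Three reads of the same strand, two of them with a single-base error.
reads = ["CATTCAGT", "CATTCAGA", "GATTCAGT"]
print(consensus(reads))  # CATTCAGT
```

A vote like this only copes with substituted bases and needs redundant reads; a production system would presumably also have to handle missing or inserted bases.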
Don't expect this type of technology in your laptop for a good few years yet: the machinery needed to synthesize DNA to write data, and then sequence it to read the information back, is still massively expensive. But that is changing.
"DNA sequencing costs are lowering way faster than Moore's Law has cut the cost of computing," Luis Ceze, the UW's Torode Family Career Development professor of computer science, told The Reg.
"The technology for reading DNA is also improving fast. We don't see any reason why it can't be fast and cheap enough for commercial storage – particularly as showing DNA storage is viable will create a greater incentive to use it."
The most likely first applications for this will be in the data center, he predicted. Having a DNA synthesizer and sequencer in situ would allow companies looking to archive their data for long-term storage to use DNA for holding onto petabytes of data at a time with very little physical storage space required.
The research was funded by Microsoft Research, the National Science Foundation, and the David Notkin Endowed Graduate Fellowship. ®