Scientists in the US, working alongside Microsoft, have managed to encode "hello" into a readable strand of synthetic DNA, using a fully automated data storage system.
The unusual apparatus built to perform this work is intended as a first step towards bringing the technology to data centres. The experiment is described in a scientific paper, published on Thursday in Nature.
"Our ultimate goal is to put a system into production that, to the end user, looks very much like any other cloud storage service — bits are sent to a datacenter and stored there and then they just appear when the customer wants them," said Microsoft principal researcher Karin Strauss.
Due to its structure – containing billions of combinations of just four nucleobases – DNA can serve as a very slow but incredibly dense and long-lasting storage medium. The idea was first proposed back in 1964-65 by Soviet physicist Mikhail Neiman, but the technologies required to make it happen would only appear in the late 2000s.
The University of Washington and Microsoft research team behind this latest experiment previously said DNA-based storage could fit the contents of an entire data centre into a sugar cube-sized unit, and would last reliably for thousands of years. However, retrieving data from individual strands using a DNA sequencer is a long process, currently taking around 10 hours.
Microsoft and the University of Washington have been collaborating on DNA-based storage since 2015, and claim to have stored 1GB of data in DNA to date – everything from great literary works to cat pictures.
The research team doesn't make its own synthetic DNA – it purchases long-chain oligonucleotides from specialized biotech firms. To support the experiment, Microsoft bought 10 million strands from Twist Bioscience in 2016, and another 10 million in 2017.
The boffins said their DNA encoding process had previously relied on automatic synthesizers and sequencers, but still required a lot of manual labour in the research lab – labour they insist has now been eliminated.
"You can't have a bunch of people running around a data centre with pipettes — it's too prone to human error, it's too costly and the footprint would be too large," said Chris Takahashi, senior research scientist at the UW's Paul G. Allen School of Computer Science and Engineering.
The system converts the ones and zeroes into cytosine, guanine, adenine and thymine, and then uses what researchers describe as "largely off-the-shelf" lab equipment to flow the necessary chemicals to the synthesizer.
When the system needs to retrieve information, we are told, it prepares the DNA using "a novel, minimal preparation protocol" and pushes it into a machine that reads the sequences and translates them back into ones and zeroes.
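To make the round trip from bits to bases and back concrete, here is a minimal Python sketch. It assumes a naive mapping of two bits per nucleobase; the article does not describe the team's actual encoding, so the `BITS_TO_BASE` table and the `encode` and `decode` helpers are hypothetical names chosen purely for illustration. Practical DNA storage schemes also add redundancy and error correction, which this sketch leaves out.

```python
# Hypothetical 2-bits-per-base mapping, chosen for illustration only.
# The article does not say which scheme the researchers actually use.
BITS_TO_BASE = {"00": "A", "01": "C", "10": "G", "11": "T"}
BASE_TO_BITS = {base: bits for bits, base in BITS_TO_BASE.items()}


def encode(data: bytes) -> str:
    """Turn bytes into a string of bases, two bits per base."""
    bits = "".join(f"{byte:08b}" for byte in data)
    return "".join(BITS_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))


def decode(strand: str) -> bytes:
    """Turn a string of bases back into the original bytes."""
    bits = "".join(BASE_TO_BITS[base] for base in strand)
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))


strand = encode(b"hello")
print(strand)          # CGGACGCCCGTACGTACGTT
print(decode(strand))  # b'hello'
```

Under this toy mapping, the five bytes of "hello" become a 20-base sequence, since each byte needs four bases.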
"Having an automated system to do the repetitive work allows those of us working in the lab to take a higher view and begin to assemble new strategies — to essentially innovate much faster," said Microsoft researcher Bichlien Nguyen.
This is not the only team working on automated DNA-based storage: last year, an MIT startup called Catalog said it was designing a machine that could write a terabyte of data a day, using 500 trillion molecules of DNA.
There was no indication of when the technology would be put to commercial use. ®