Cloudy storage provider Dropbox has enhanced its bit barns with a tiered storage architecture that divides the contents of the platform into frequently accessed "warm" data and "cold" data, with the latter less likely to be disturbed.
The storage shop has changed the way it does replication for older data, cutting the amount of disk space needed by 25 per cent while providing the same level of reliability and only slightly longer access times. One would hope the savings will be passed on to customers.
"The end experience for users is almost indistinguishable between the two tiers," the company said this week.
The cold data repository is based on the same dedicated Magic Pocket infrastructure that Dropbox announced in 2016. It features servers operated by the company, rather than standard infrastructure from AWS, which Dropbox used to store files in the early days of the platform.
AWS offers its own cold storage service called Glacier – designed for the enterprise and billed separately from the company's Simple Storage Service (S3). Dropbox has created something very different and built cold storage into the back-end so data tiering is achieved automatically – the customer can't decide how their files will be classified.
In a technical blog post, staff engineer Preslav Le explained that over 40 per cent of all requests on the company's platform are for data uploaded in the last 24 hours, and over 90 per cent are for data uploaded in the past year.
One of the simplest ways to cut the costs of storing older data was to tweak replication. Previously, full copies of all files were replicated across multiple data centres, located thousands of miles apart. "We needed to somehow remove the full cross-region replication, but still be able to tolerate geographic outages," Le said.
After several experiments, Dropbox came up with a model that split storage blocks into fragments and striped those fragments across multiple regions. To get a block, the system issues a get request to all three regions, waits for the fastest two responses, and cancels the remaining request.
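The "fastest two of three" read described above can be sketched in a few lines of asyncio. This is a hypothetical illustration, not Dropbox's code: the region names, latencies, and fragment payloads are all made up, and the real system reassembles erasure-coded fragments rather than returning them raw.

```python
import asyncio
import random

async def fetch_fragment(region: str) -> tuple[str, bytes]:
    # Simulate a cross-region GET with variable network latency.
    await asyncio.sleep(random.uniform(0.01, 0.05))
    return region, f"fragment-from-{region}".encode()

async def get_block() -> list[tuple[str, bytes]]:
    # Issue a get request to all three regions at once.
    pending = {asyncio.create_task(fetch_fragment(r))
               for r in ("us-east", "us-west", "eu-central")}
    done: list[tuple[str, bytes]] = []
    # Wait until the two fastest responses have arrived.
    while len(done) < 2:
        finished, pending = await asyncio.wait(
            pending, return_when=asyncio.FIRST_COMPLETED)
        done.extend(t.result() for t in finished)
    # Cancel the straggler; its fragment is no longer needed.
    for task in pending:
        task.cancel()
    return done[:2]

fragments = asyncio.run(get_block())
```

Because any two fragments suffice to reconstruct the block, the slowest region never sits on the critical path of a read.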
"The most obvious downside of this model is that, even in the best-case, we can't satisfy a read without fetching fragments from multiple regions," Le said.
"Overall, the results we got significantly beat our expectations. Such a small difference would not affect the end user experience, which is dominated by transferring data over the internet. That allowed us to be more aggressive in what data we consider 'cold' and eligible for migration." ®