
You must remember: An archive isn't a thing, it's a strategy

Think of the future, edit the past

The equipment

The kit you'll need to implement your archiving strategy depends on the decisions from the previous section. So if you're using cloud-based storage you'll want to go for a storage “gateway” appliance that integrates your on-premises network with the cloud storage. The idea is to present the cloud storage to the local world as if it were just another on-premises fileshare, so that pretty much any archiving software package you choose will be able to access it without special plug-ins for cloud storage APIs.

If you're going for on-site storage, understand how it'll evolve in the future. If you're archiving to tape then be aware that technology marches on and plan for obsolescence of the hardware you use for reading the tapes. You generally find that a tape drive can read tapes that are one or two versions old but not necessarily go all the way back to the start of the format. So for instance an LTO-5 drive can read LTO-3, LTO-4 and LTO-5 tapes, but not LTO-2 or earlier. And if you're using disk storage for your archive, make it sufficiently resilient that if the next person to occupy your desk wants to upgrade it (e.g. by replacing the disks with bigger ones) they're able to do so easily and without having to do a full dump and restore.
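The tape compatibility rule is easy to encode when you're auditing what an archive needs in order to stay readable. The following is a minimal sketch, assuming a drive reads its own generation plus a fixed number of older ones; the function names and the `read_back` parameter are made up for illustration, and the default of two simply matches the LTO-5 example above. Some newer LTO generations read only one generation back, so check the vendor's compatibility matrix rather than trusting the default.

```python
def readable_generations(drive_gen: int, read_back: int = 2) -> list[int]:
    """Return the tape generations a drive of generation `drive_gen` can read.

    `read_back` is an assumption: how many older generations the drive
    still reads. Two matches the LTO-5 example in the text.
    """
    if drive_gen < 1:
        raise ValueError("tape generations start at 1")
    oldest = max(1, drive_gen - read_back)
    return list(range(oldest, drive_gen + 1))


def can_read(drive_gen: int, tape_gen: int, read_back: int = 2) -> bool:
    """True if a drive of `drive_gen` can read a tape of `tape_gen`."""
    return tape_gen in readable_generations(drive_gen, read_back)


print(readable_generations(5))  # [3, 4, 5]
print(can_read(5, 2))           # False
```

Run against a list of the tape generations sitting in your archive, a check like this tells you which old drives you can't yet afford to throw away.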

The format for the data

Finally, let's revisit something I mentioned in the requirements list: what format(s) to store the data in. Let's have a quick summary of why we care so much about the data format – in fact let's quote Microsoft TechNet directly as an example: “SQL Server 2012 supports upgrade from only the following versions: SQL 2005 SP4 or SQL 2008 SP2 or SQL 2008 R2 SP1. If you try to restore a backup database from SQL Server 2000, you will get the error number 3169”.

So, then: if your archive contains data that was dumped as a SQL Server 2000 backup file (which could well be less than ten years old – you may not have upgraded the server as soon as SQL Server 2005 came out) you can't just hoover it off the archive and pull it into the current SQL Server version. In fact TechNet recommends restoring it in two steps using SQL Server 2008 as the intermediate version.
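The two-step restore can be treated as a path-finding problem: given which backup versions each server version will accept, find the shortest chain of intermediate servers. Here is a rough sketch of that idea. The `CAN_RESTORE` map is illustrative only: the 2012 entry and the 2000-via-2008 hop come from the TechNet advice quoted above, the rest is assumed, and service-pack requirements are ignored entirely.

```python
from collections import deque

# Illustrative map (an assumption, not an official matrix): for each server
# version, the backup versions it will restore directly. Service packs ignored.
CAN_RESTORE = {
    "2008": {"2000", "2005", "2008"},
    "2012": {"2005", "2008", "2008R2", "2012"},
}


def restore_path(backup_version, target_version, can_restore=CAN_RESTORE):
    """Breadth-first search for the shortest chain of server versions that
    gets a backup from `backup_version` to `target_version`.

    Returns the chain as a list, or None if no chain exists.
    """
    queue = deque([[backup_version]])
    seen = {backup_version}
    while queue:
        path = queue.popleft()
        current = path[-1]
        if current == target_version:
            return path
        for server, accepted in can_restore.items():
            if current in accepted and server not in seen:
                seen.add(server)
                queue.append(path + [server])
    return None


print(restore_path("2000", "2012"))  # ['2000', '2008', '2012']
print(restore_path("2005", "2012"))  # ['2005', '2012']
```

A `None` result is the case to worry about: it means nothing you've listed can read the backup, which is exactly what you want to discover before you need the data.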

And that's just the first example I picked: there are bazillions of apps that have the same issue. At the very least, then, understand (and document for the next incumbent) what versions of the applications you'll need to have to hand in order actually to use the data once it's been restored.

Where it's going

I've failed in a small mission I set myself when I sat down to write this: that was to see if I could find a copy of an old article I wrote years ago about an idea for generic data storage.

The idea was pretty simple. There are loads of different applications, each of which has its own data format, so exchanging files could be a pain. So for example there have been loads of spreadsheet applications over time, and any such app you load these days has to have converters for loads of its competitor products. A linear increase in the number of application formats means roughly quadratic growth in the number of conversions needing to be supported: with n formats there are n × (n − 1) possible source-and-target pairs. So why not define a rich format into which any application could save a document for reading by any other app that supported that single, rich format?
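The arithmetic behind that argument is worth seeing in numbers. This small sketch counts one converter per ordered pair of formats (so the total grows with the square of the number of formats) and compares it with a shared format, where each application needs only one exporter and one importer. The function names are made up for illustration.

```python
def pairwise_converters(n: int) -> int:
    """Converters needed if every format converts directly to every other."""
    return n * (n - 1)


def hub_converters(n: int) -> int:
    """Converters needed with one shared format: an export and an import each."""
    return 2 * n


for n in (2, 5, 10, 20):
    print(n, pairwise_converters(n), hub_converters(n))
# 2 2 4
# 5 20 10
# 10 90 20
# 20 380 40
```

With only two formats the shared format is extra work; beyond three it wins, and the gap widens quickly.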

And sure enough, along came formats such as OpenDoc and ODF which satisfied precisely this need. Expand the concept a little, though, and it's not a vast leap of logic to think to yourself: if we exported (say) the data from our SQL Server database into an XML format, with the appropriate schema definitions and metadata, we'd have no need to worry about supporting an old SQL Server format in ten years' time when we come to restore it: XML will still be with us, and it's basically a text file and is dead easy to open and parse. And this is precisely where enterprise archiving is now heading.
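To give a flavour of what such an export might look like, here is a minimal sketch that dumps one table, with its column definitions, to self-describing XML. Everything specific in it is an assumption for illustration: it uses Python's built-in SQLite as a stand-in for SQL Server, and the element and attribute names (`table`, `schema`, `column`, `row`, `field`) are invented rather than taken from any archiving standard. A real export would also need a plan for binary columns, which this sketch simply converts with `str()`.

```python
import sqlite3
import xml.etree.ElementTree as ET


def export_table_to_xml(conn: sqlite3.Connection, table: str) -> str:
    """Dump one table to an XML string containing its schema and its rows."""
    # Table names can't be bound as SQL parameters, so check the name
    # against the database catalogue before interpolating it.
    known = {r[0] for r in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table'")}
    if table not in known:
        raise ValueError(f"unknown table: {table}")

    root = ET.Element("table", name=table)

    # PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk).
    schema = ET.SubElement(root, "schema")
    columns = []
    for _cid, name, col_type, notnull, _default, pk in conn.execute(
            f'PRAGMA table_info("{table}")'):
        ET.SubElement(schema, "column", name=name, type=col_type,
                      nullable=str(not notnull).lower(),
                      primaryKey=str(bool(pk)).lower())
        columns.append(name)

    rows = ET.SubElement(root, "rows")
    for record in conn.execute(f'SELECT * FROM "{table}"'):
        row = ET.SubElement(rows, "row")
        for name, value in zip(columns, record):
            field = ET.SubElement(row, "field", name=name)
            if value is None:
                field.set("null", "true")  # keep NULL distinct from ""
            else:
                field.text = str(value)
    return ET.tostring(root, encoding="unicode")


# Demo with made-up data in an in-memory database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers "
             "(id INTEGER PRIMARY KEY, name TEXT NOT NULL, email TEXT)")
conn.executemany("INSERT INTO customers VALUES (?, ?, ?)",
                 [(1, "Ada", "ada@example.com"), (2, "Grace", None)])
xml_text = export_table_to_xml(conn, "customers")
print(xml_text)
```

The point of carrying the schema alongside the rows is that whoever opens the file in ten years' time needs nothing but a text editor or an XML parser to work out what each value meant.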

In short

An archive, then, is not a thing. It's part of a data retention and deletion strategy and needs strong consideration if you're not to cut off your options for the future. Happily, though, futureproof storage that's well protected is now readily available and the hardware for accessing it isn't hard to install or use.

The hardest bit isn't storing and retrieving the data, though – it's reading it once you get it back onto live storage. So if you're starting now, look to the future and go for a generic, rich formatting option: I reckoned it was the way forward for data interchange when I wrote about it donkey's years ago, and I definitely reckon it's the way archiving's going now. ®
