When it comes to backups two sayings are worth keeping in mind: "if your data doesn't exist in at least two places, it doesn't exist" and "a backup whose restore process has not been tested is no backup at all”.
There is nothing like a natural disaster affecting one of your live locations to test your procedures.
I have just had to deal with this; let's take a look at how.
To have a discussion about backups we need to start with what we are backing up and why.
The client in question has two sites, one in Edmonton and one in Calgary. Each site is serviced by a fibre pipe – theoretically capable of 100Mb in emergencies but throttled below that to meet our ISP agreement.
If we keep the cumulative usage between both sites below 'X'Mbps (measured at the 95th percentile) we can use all the bandwidth we want. No caps, no per-GB billing. It is a nice, predictable cost that we can easily manage.
More critically, in case of "oh no!", uncapping those pipes so that each can use the full 100Mb takes a single command.
Each site receives large quantities of information from customers for use at that specific location. This information is made highly available to deal with hardware failure but is not replicated offsite.
We spec our bandwidth only slightly above what we need to handle inbound data; we could never afford the cost of cloud storage, even if we are sending to one of our own datacentres.
Since we have all this infrastructure in place to meet local needs it seems silly not to host our public-facing websites and IT services on our own infrastructure. We have private clouds at each location, gobs of storage, UPSs and a fat pipe. It isn't exactly Amazon, but it should be workable.
None of the databases for our public websites can be set up for live replication because that would require rewriting code to accommodate it. For various reasons that won’t happen any time soon.
So backups are down to cron jobs running on each MySQL server to create regular database dumps, zip, encrypt and then fire them off to our file server.
At the same time the codebase for each web server undergoes a similar backup. The file servers in question are Windows systems running distributed file system replication (DFSR), which does a marvellous job of replicating the backups.
Each site has two identical file servers in a cluster and they both have a copy of the files. The files are then fired across the WAN to the other site where it lives on a pair on that site's file server cluster as well. At this point, I'd say we are pretty well immune to hardware failure.
A backup server at the head office runs a truly archaic version of Retrospect that creates versioned backups to protect against Oopsie Mcfumblefingers, malware or other such issues that might delete the backups in the DFSR share. So anything that is placed in the backups directory on either site is automatically replicated to two systems per site and versioned.
Potential personally identifiable information is encrypted – both in the database and in the rarballs (files that have been compressed or packaged using rar compression) – and none of it leaves corporate control.
So far, so good; better solutions certainly exist, but with no budget I think it gets the job done.