Data AWOL? Thank God for backup. You backed up, right?
Exposing the hidden crisis of the virtual age
Backup is a fundamental component of a healthy infrastructure. I admit backups are neither cutting edge nor sexy but they are important. It is an often-quoted statistic that of the companies that suffer serious data loss, one third go out of business within three years.
Actually, it’s worse than that. According to stats from the University of Texas, 94 per cent of companies suffering from catastrophic data loss do not survive: 43 per cent never reopen and 51 per cent close within two years.
Sure, you have backups, right? That’s all well and good until it turns out that the data you need restored was never backed up because of some oversight or failure in process.
Best practice states that backups should be closely watched and tested monthly. In reality, when a company has hundreds of systems to maintain, the best-practice test routine can and often does go out the window in favour of other “more important things to fix”.
Worse still is the fateful day when a restore is needed (and it will be at some point) and it comes to light that there is no backup of the required data. It's usually at this point that it transpires nobody ticked the box on the relevant server request form to get the machine included in the backup.
Even after a bout of potentially expensive data recovery, there is no guarantee that the data is coming back, let alone in a consistent or usable state.
The issue is exacerbated within a virtual environment where machines can be spun up and disposed of almost at will, especially in poorly controlled environments. When discussing backup, control is everything.
Someone has to be responsible for making sure the backups are done. In smaller organisations, the tendency to pass the buck on mundane tasks means that if nobody explicitly owns backup, a failure to capture becomes more likely.
The failure to capture exists in larger companies, too, but in a different way. Larger companies have people whose sole job is to ensure the backups exist. The problem, though, is that different administrators deal with different portions of the server provisioning process, which makes a strong, well-defined process even more critical in this type of environment.
A typical example of larger-scale “failure to capture” is a scenario where virtual machines are created outside the official commissioning process and so may not be covered. Often “temporary” or “test” machines have a habit of being moved from proof of concept to production to shorten timelines.
These machines are often set up on the assumption that there is no important “stateful data” because it is just a basic test system.
Again, it only takes one failure to tick a box or inform the correct group of people and the backup job never gets submitted. Then it is only a matter of time until the missing backup rears its head at the worst possible point.
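One way to catch a missed tick box before restore day is to compare the list of machines that exist with the list of machines that have a backup job. The sketch below is only an illustration of that idea: the function name, the machine names and both lists are hypothetical, and it assumes you can export a plain list of names from your hypervisor or cloud console and another from your backup product.

```python
def normalise(name):
    """Treat 'Web01 ' and 'web01' as the same machine name."""
    return name.strip().lower()


def find_unprotected(inventory, backed_up):
    """Return inventory machines that have no backup job, sorted by name."""
    covered = {normalise(name) for name in backed_up}
    machines = {normalise(name) for name in inventory}
    return sorted(machines - covered)


# Hypothetical example data; real lists would come from exports of the
# virtualisation platform and the backup product.
inventory = ["web01", "db01", "poc-test-03", "Web02 "]
backed_up = ["WEB01", "db01", "web02"]

for machine in find_unprotected(inventory, backed_up):
    print(f"No backup job found for: {machine}")
```

Run on a schedule, a check like this flags the “temporary” machine that quietly went into production, regardless of which form was or wasn't filled in. It only shows that a job exists, not that the backup can actually be restored.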
External cloud systems are even more likely to suffer from this type of failure, especially when there is an additional cost associated with backups. To make matters worse, the easy-in, easy-out nature of cloud makes it a lot easier to overlook the little things ... like backup.
It is this kind of scenario that can result in a backup plan slipping through the net. Dealing with this kind of issue tends to be a bit more complex. This tends to happen in companies that don’t really have stringent controls. It is more a process issue than a technical issue and any company that is looking to address