What can an administrator/server owner do?
Unfortunately, the main thing to address is at the process level. The server provisioning process should address the requirement for backup of the server.
Be clever: where backup is concerned, the form should be opt-out rather than opt-in. This prevents any “Oops, I really did mean to tick that box” claims. Ideally, the requester should document why backup isn’t needed and have approval from an appropriately senior person.
Such a component on a form at least forces the question to be addressed. However, from an administrator point of view it also covers you against anyone trying to shift the blame onto the poor admin.
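As a rough illustration of the opt-out idea, here is a minimal Python sketch of such a form check. The class, field names and messages are all made up for the example and do not come from any particular ticketing system; the only points it makes are that backup defaults to on, and that declining it requires both a reason and an approver.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ProvisioningRequest:
    """Hypothetical server request form; all field names are illustrative."""
    server_name: str
    requester: str
    backup_required: bool = True            # opt-out: backup is on unless declined
    opt_out_reason: Optional[str] = None    # why backup is not needed
    opt_out_approver: Optional[str] = None  # senior person who signed it off


def validation_errors(req: ProvisioningRequest) -> list[str]:
    """Return the reasons a request cannot be accepted (empty list means OK)."""
    errors = []
    if not req.backup_required:
        # Declining backup is allowed, but only with a paper trail.
        if not (req.opt_out_reason or "").strip():
            errors.append("backup declined without a documented reason")
        if not (req.opt_out_approver or "").strip():
            errors.append("backup declined without an approver")
    return errors
```

A request that says nothing about backup gets one by default, while a request that declines backup without the paperwork is bounced back to the requester.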
Also, don’t be so rash with the delete button. Admittedly, this is not strictly related to backups, but it can save you a lot of restorations and potential tears. When a decommission request lands, leave the machine in the estate but powered off for a reasonable time period.
I have lost count of the number of times a sheepish application owner has come to see if “by chance” we still had a copy of the machine they were very keen to get rid of last week so that they stopped paying the server maintenance fee. “Just a couple of files we forgot, any chance of a restore?” is a surprisingly frequent question, unfortunately.
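If you want to make the hold period a rule rather than a habit, a check along these lines would do. It is only a sketch: the 30-day figure is an assumption standing in for whatever “reasonable” means in your estate, and the function name is invented for the example.

```python
from datetime import date, timedelta

# Assumption: 30 days is a placeholder; pick what suits your estate.
HOLD_PERIOD = timedelta(days=30)


def safe_to_delete(powered_off_on: date, today: date,
                   hold: timedelta = HOLD_PERIOD) -> bool:
    """True once the machine has sat powered off for the full hold period."""
    return today - powered_off_on >= hold
```

Passing `today` in explicitly, rather than reading the clock inside the function, keeps the rule easy to test.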
As for testing whether backups are working and valid for each machine, there comes a point at which doing test restores becomes onerous: with enough machines, trying to cover them all would eat so many cycles that there wouldn’t be enough time to test each restore properly. So, how do administrators who have hundreds of backups cope?
The key when you have too many backups to test is to decide which systems are either high risk or key infrastructure, and to work on the backup side accordingly.
Those key systems (such as serious revenue generation infrastructure) should be tested twice a year as a minimum. The high risk category is something that a lot of people perhaps don’t think about.
When you have third parties running some parts of your infrastructure, the backups are often seen as outside your control. In an ideal world, your contract with the third party should have a clause ensuring not only that you get copies of your data to do test restores, but also that any critical third-party source code is held in escrow.
Whilst it may be seen as a bit of overkill, these are systems outside your control, so you need to be able to keep a firm hand on your data and applications. Frequently we read about vendors going to the wall. What’s your get-out strategy if it happens to you?
Any administrator worth his or her salt should, without doubt, be running a crosscheck of inventoried servers versus those listed, active and working on the backup catalogue. This would highlight any discrepancy that can then be investigated.
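The crosscheck itself need not be complicated. Below is a minimal Python sketch, assuming you can export the inventory as a list of server names and the backup catalogue as a mapping of server name to last job status. The function name, the output keys and the `"ok"` status string are all assumptions made for the example, not the output of any real backup product.

```python
def crosscheck(inventory, backup_catalogue):
    """Compare inventoried servers against the backup catalogue.

    inventory: iterable of server names.
    backup_catalogue: dict of server name -> status string
                      (assumption: "ok" means the backup is working).
    """
    def norm(name):
        # Names rarely match exactly between systems, so compare
        # them case-insensitively and ignore stray whitespace.
        return name.strip().lower()

    inv = {norm(n) for n in inventory}
    cat = {norm(n): status for n, status in backup_catalogue.items()}
    cat_names = set(cat)

    return {
        # In the inventory but unknown to the backup system.
        "not_in_catalogue": sorted(inv - cat_names),
        # Known to the backup system, but the backup is not healthy.
        "not_working": sorted(n for n in inv & cat_names if cat[n] != "ok"),
        # Backed up but missing from the inventory.
        "not_in_inventory": sorted(cat_names - inv),
    }
```

Anything in the first two lists is a candidate for investigation, unless the paperwork shows the requester opted out. The third list is a bonus: it tends to point at an inventory that is out of date.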
Being proactive is the key, because, as we said earlier, once the data is gone, it is gone.
Admittedly, some systems are so complex that a basic backup and restore regimen would not be sufficient. In the days of multi-tiered architecture and multiple services, restoring an individual component wouldn’t be a very good test. In these situations, periodic DR tests would be the ideal test method.
Backup is one of those critical items that tends to get neglected as it is not revenue generating or particularly high visibility. That is, until data that wasn’t backed up can’t be restored and difficult conversations have to be had.
It is up to administrators to cover their backside and make sure that the documentation proves that the requester declined the backup. That point, I simply cannot stress enough. ®