Fire, flood and vomit: Defeating the Great White Whale of Fail

Got a plan? Better get one quick


I've met a lot of IT people over the years who have a problem comprehending what Business Continuity (BC) actually is. On one hand this is fairly understandable, since to the average IT person “continuity” means making their systems robust and resilient so they can live with a power cut or the loss of one of their sites.

Real IT people, on the other hand, realise there's more to life than the technology: in fact the technology is there to serve an overall business that's usually several times greater than the IT function.

Real IT people are part of an overall BC strategy, and play an active part in making it work, but importantly they almost certainly don't own or manage that strategy – the business types do that.

In this feature we'll look at the essential components, considerations and tasks in an effective BC setup.

What can go wrong?

Before you can do anything you need to understand what you're protecting against. And in this context you're considering not what can go wrong with your systems but what can go wrong with your business.

Let's take a few examples, starting with an obvious one: a power cut in your main data centre that takes systems down. The obvious way to protect against this is to have secondary systems that live elsewhere – in your office, perhaps, or in a geographically distant data centre, or maybe in the cloud.

The impact of systems being down is pretty clear: unless you have failover in the systems, the staff will be sitting at their desks unable to do any work.

Next on the list is your business premises becoming unusable. Sometimes this can be spectacular – for example, in April 2015 an electrical fire in central London forced dozens of businesses out of their offices because of the resulting power cuts and the downtime that followed while the power company fixed the problem.

Generally, though, the reason is quite boring. Yes, power cuts are a common cause, but they tend to be short-lived (a few hours at most). Flooding takes far longer to deal with, since the lower floors of office buildings tend to house important stuff like the telecoms entry point and the power distribution centre, which can take weeks or longer to dry out. And if you're unlucky enough to have a fire in your premises, it's a bit of a lottery whether it will be fixed in hours or days.

Finally, you have the people problem – specifically, illness. We've all sat in offices where half the population is sitting sniffling and wheezing with the latest bug doing the rounds – and, of course, the following week they're better and the other half is Phlegm Central.

But what about a “proper” virus outbreak? Norovirus is notorious for spreading quickly within a locality: its original strain, the Norwalk virus, even takes its name from a town in Ohio where that's precisely what happened. So let's have a look at how we can mitigate each of these problem areas.

Systems

Protecting systems against failure is a simple application of cash and training. In the average organisation there's no rocket science involved in making systems resilient so that, in the event of a failure, you can switch to a secondary system.

This doesn't mean that you have to spend hundreds of thousands of pounds on systems that will automatically fail over when a problem occurs and then magically fail back. Every infrastructure I've worked with has had a variety of different approaches to failover depending on the impact of a failure and the cost of addressing it.

So, if you have an online sales portal that generates significant revenues you'd probably look to make it seamlessly resilient – for instance with clustered database servers and load balancers – so your customers don't see a failover.

On the other hand, you might be quite happy to have your back-end finance system, which is used by staff for reporting and is far less dependent on real-time operation, in a less resilient setup so you don't pay through the nose for two database software licences.

Of course when you're considering systems you also have to think of the premises aspect because if your office is flooded then the systems will be down anyway. I used to work for a company whose IT director insisted that none of the key offices were to contain any server or storage devices – these were all to be housed in data centres.

They could have network devices to provide LAN services and for connecting to remote sites, and they could have a phone system at a push. The reason was simple: data centre space was relatively affordable and offered far better fire, flood and power protection than the offices.

Premises

There are two aspects to premises being unusable: systems being down and people being unable to be there.

Companies have offices for a reason: it gives people a place to work where they can conveniently interact with their colleagues, where it's easy to support their administrative and technology needs, and where customers and suppliers can visit.

Home working is becoming increasingly popular but it's still a minority sport because, actually, having people in the same place is bloody convenient. In a crisis, though, you need to be flexible.

Ask yourself what would happen if your office became unusable – that is, the staff weren't allowed in for some reason. You'd have three choices: tell them not to come in; have them work from somewhere else; or ask them to work at home.

I was discussing BC recently with someone who's responsible for it in their organisation. Their head office is some way from any of their other premises – far enough that they can't just get a few dozen people to decamp to another office. So they spend a chunk of money each year renting separate premises nearby, but far enough away that they probably wouldn't be affected by the same disaster as the company office.

There are about 50 seats in two rooms, with PCs, phones, printers and fax machines (the secondary controller for the phone system's also there, on the assumption that if people can't get into the main office, the primary phone controller's probably down). If the office becomes unusable, half the staff are sent home and the 50 or so key people decamp to the BC centre.

Working from home is way easier these days than it used to be, as remote access and cloud technology have become simpler and far more commonplace. As I've mentioned, you probably wouldn't want all your staff to clear off home, as communication becomes awkward, but the cost of a BC suite big enough to house everyone is usually prohibitive, so you need to compromise.

People

If your business crisis is caused by people, it will usually be related to illness. In these cases home working is absolutely the way to go, because infectious illness is a no-brainer reason not to bring people together in an office. Enable the sick to work from home and encourage them, as long as their doctors are happy that they're able to work, to do so until they're better. And consider letting the well people work from home as well, so they can't catch anything from the infected-but-not-yet-poorly mob in the office.

What works differently

When it's not business as usual, you need to recognise that fact and behave differently from normal, because there will be things you can't do as well as usual. And even if you can actually work pretty much as you'd hope, this may not be obvious to your suppliers or customers or (and this may or may not be relevant to you) the public. So as well as keeping the company running as well as possible, you need to communicate how you're operating to those you work with outside your organisation.


Biting the hand that feeds IT © 1998–2021