Relocation is a complete success – right up until the last minute
It may be a cliché to say 'Don't rest on your laurels', but you really shouldn't
Who, Me? Welcome to another working week, loyal readers, and another dose of Who, Me? – the Reg's weekly safe space in which readers submit stories of times when tech support went not quite so well as they might have hoped.
This week's hero we'll Regomize as "David" and his story has something in common with recent tales of finding out about the importance of redundancy.
You see, David worked for a small but growing company. Small enough that all of its computer kit – web systems, database, back office, etc. – fit on a few racks in a server room. Growing fast enough that it was apparent this one room wasn't going to be big enough for long.
So the plan was hatched to move everything to a larger room in the same building. Step one was to build new racks in the larger room, wired up with cabling and patches for power and networking. Step two was to install redundant systems, where possible, in the new location.
Border Gateway Protocol was employed to make sure that customers would be routed to whichever location worked best for them, so there wasn't even a need for downtime while these redundant systems were moved.
Then it came time to move the one system for which there was no redundancy: the Oracle database on which the business relied. For that, a tight downtime window was scheduled – overnight – during which David and his team could physically move the machine from one room to the other.
The plan went like clockwork – the system was installed and powered up and tested with several minutes to go before the downtime window closed.
Congratulations all around, and all the exhausted techies breathed a sigh of relief – until, finally relaxing after a hard night's effort, David leaned back against the big power button on the rack. The rack with the database on it. The database of which there was only one. On which the entire enterprise depended.
Cue frantic activity as the seconds ticked down to the moment when the business would start actually losing money if all systems were not up and running.
David could not recall if it all actually came back online in time, or maybe a couple of minutes after. We can imagine time was a bit of a blur just then. What he did recall is that a plan was immediately drawn up to ensure that the database was also redundant, and that no single button could ever again take the entire company offline.
Have you ever found yourself as the dreaded "single point of failure"? Don't worry – we've all been there. Tell us about it in an email to Who, Me? and we might share your tale with other readers for some future Monday morn. ®