Want a unified data centre? Don't forget to defrag the admins
And make sure you untick the 'wally' box
An effective data centre is more than just some racks of servers with a bit of networking and storage attached.
It needs to be versatile, easy and quick to flex and reconfigure, both manually and automatically, and it needs to keep up with the demands of the applications that run there.
Historically, though, many of the components of the data centre have been purchased and installed separately. How can we pull all these components into a coherent whole? How, in other words, do we defragment our data centre?
If it’s broke, fix it
There is no point in trying to build a fence with rotten planks. No matter how well you nail them together, you will be playing hunt-the-tortoise before you have put your toolbox away.
So before you start considering how to make your kit work together, you need to ensure that the standalone elements are up to the job and configured in such a way that they stand a chance of working as hoped.
This is often a very simple matter of checking for obvious problems. Over the years I have seen some interesting setups: one example was a data centre LAN with servers and storage connected via multi-gigabit trunks, whose layer 3 functionality was offloaded to a low-end router that couldn't route at more than a couple of hundred Mbps.
Another was a chassis-based server setup that aggregated hundreds of virtual machines over a puny 4Gbps uplink.
There was also the dual-Gigabit LACP trunk where they had forgotten to enable LACP on one end; and a backup solution that would have gone twice as fast if someone had configured some simple network parameters properly. (We will come back to that last one later.)
So before you do anything, take a step back and ask: “Have we done something daft?” If the answer is yes, fix it. Don't take forever over it, but at least make sure you have unticked the “wally” box.
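Putting rough numbers on a suspected choke point is often enough to answer the “daft” question. The sketch below is illustrative only: the function names and the figures (a 2Gbps trunk feeding a 200Mbps router, 200 virtual machines on a 4Gbps uplink) are assumptions loosely modelled on the examples above, not measurements from those sites.

```python
def oversubscription_ratio(offered_mbps: float, capacity_mbps: float) -> float:
    """How many times the offered load exceeds what the choke point can carry."""
    if capacity_mbps <= 0:
        raise ValueError("capacity_mbps must be positive")
    return offered_mbps / capacity_mbps


def per_vm_share_mbps(uplink_mbps: float, vm_count: int) -> float:
    """Worst-case bandwidth per VM if every VM transmits at once."""
    if vm_count <= 0:
        raise ValueError("vm_count must be positive")
    return uplink_mbps / vm_count


# Hypothetical figures, chosen only to illustrate the arithmetic.
router_ratio = oversubscription_ratio(offered_mbps=2000, capacity_mbps=200)
vm_share = per_vm_share_mbps(uplink_mbps=4000, vm_count=200)

print(f"Router is oversubscribed {router_ratio:.0f}:1")
print(f"Each VM gets about {vm_share:.0f} Mbps in the worst case")
```

A high ratio is not automatically a fault, since not every server transmits at once, but a ratio this lopsided is a good reason to look more closely.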
Once things are looking a bit more sensible, you need to consider the touch points between the components of the infrastructure.
My personal favourite is where VMware ESXi hosts connect to the LAN. With my network manager hat on, I once spent a happy afternoon with the server guy and Mr Google. At the end of it we had quadrupled the speed of some LACP-trunked ESXi hosts just by working methodically through the LAN port config on both ends and reconfiguring it.
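Working methodically through both ends of a link amounts to a side-by-side comparison of settings. As a minimal sketch, assuming each end's port configuration has already been collected into a plain dictionary (the keys `lacp`, `mtu` and `speed_mbps` and their values are hypothetical, not taken from any vendor's CLI or API):

```python
def mismatched_settings(end_a: dict, end_b: dict) -> dict:
    """Return {setting: (value_on_a, value_on_b)} for every setting that differs.

    A setting present on only one end is reported with None for the other end.
    """
    keys = sorted(set(end_a) | set(end_b))
    return {
        key: (end_a.get(key), end_b.get(key))
        for key in keys
        if end_a.get(key) != end_b.get(key)
    }


# Hypothetical settings for the two ends of one trunk.
switch_port = {"lacp": "active", "mtu": 9000, "speed_mbps": 1000}
host_uplink = {"lacp": "off", "mtu": 9000, "speed_mbps": 1000}

for setting, (on_switch, on_host) in mismatched_settings(switch_port, host_uplink).items():
    print(f"{setting}: switch={on_switch} host={on_host}")
```

Strict equality is a simplification. Some settings are perfectly valid when they differ (LACP active on one end and passive on the other, for example), so treat the output as a list of things to review rather than a list of faults.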
This is where you need your subject matter experts to come together and collaborate – perhaps for the first time. The backup instance I mentioned earlier was a classic illustration of the need for collaboration.
You need the configuration to be the same on all the components: the link from the storage to the backup server (if it is iSCSI connected), the backup server operating system, the switch and router ports at every point from the backup server to the device being backed up, the virtual switches (if you are in a virtual environment) and the guest server operating system.
A single misconfiguration and everything runs at the speed of the slowest component. To get it right, you need the storage, network and server guys to get their heads together and set it up jointly.
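The same idea can be sketched for a whole backup path. The example below assumes the settings that matter are link speed and MTU (frame size); the hop names and values are hypothetical, and in practice each team would supply the figures for its own part of the path.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Hop:
    name: str
    mtu: int
    speed_mbps: int


def bottleneck(path: list) -> Hop:
    """The slowest hop caps the throughput of the whole path."""
    return min(path, key=lambda hop: hop.speed_mbps)


def mtu_outliers(path: list) -> list:
    """Hops whose MTU differs from the most common MTU on the path."""
    common_mtu, _ = Counter(hop.mtu for hop in path).most_common(1)[0]
    return [hop for hop in path if hop.mtu != common_mtu]


# Hypothetical path from storage to the guest being backed up.
backup_path = [
    Hop("storage-iscsi", mtu=9000, speed_mbps=10000),
    Hop("backup-server", mtu=9000, speed_mbps=10000),
    Hop("core-switch", mtu=9000, speed_mbps=10000),
    Hop("vswitch", mtu=1500, speed_mbps=10000),
    Hop("guest-os", mtu=9000, speed_mbps=1000),
]

print("Bottleneck:", bottleneck(backup_path).name)
print("MTU outliers:", [hop.name for hop in mtu_outliers(backup_path)])
```

The "most common value wins" rule is only a heuristic for spotting the odd one out; it does not say which MTU is the right one for the path.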
Extend the thinking to all the core touch points, then, starting with the obvious ones. Some examples are:
- Storage/servers: get the trunking and framing optimal and if you are using software-based iSCSI initiators in a virtual server setup, buy yourself some hardware iSCSI-capable adaptors instead.
- Storage/network: trunking and framing again, and if you are using SAN-based Fibre Channel switches, zone them correctly.
- Servers/network: trunking and framing yet again. Validate the bandwidth available to the servers and, if they are connected over multiple links, monitor the traffic and tweak the settings to spread the load evenly. You will sometimes find that one link gets hammered while the others do nothing, which wastes the capacity you have paid for.
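Spotting a lopsided trunk is mostly arithmetic once you have per-link traffic counters. The sketch below assumes the byte counts have already been pulled from your monitoring system; the link names, the counts and the 20 per cent tolerance are all hypothetical.

```python
def link_shares(bytes_per_link: dict) -> dict:
    """Fraction of total traffic carried by each link (all 0.0 if the trunk is idle)."""
    total = sum(bytes_per_link.values())
    if total == 0:
        return {name: 0.0 for name in bytes_per_link}
    return {name: count / total for name, count in bytes_per_link.items()}


def unbalanced_links(bytes_per_link: dict, tolerance: float = 0.2) -> list:
    """Links whose share differs from an even split by more than `tolerance`."""
    if not bytes_per_link or sum(bytes_per_link.values()) == 0:
        return []
    fair_share = 1 / len(bytes_per_link)
    shares = link_shares(bytes_per_link)
    return [
        name for name, share in shares.items()
        if abs(share - fair_share) > tolerance
    ]


# Hypothetical counters for a two-link trunk over one sampling interval.
counters = {"uplink0": 900, "uplink1": 100}
print(link_shares(counters))
print("Needs attention:", unbalanced_links(counters))
```

A perfectly even split is rarely achievable, because most load-balancing schemes pin each flow to one link, so the tolerance should be generous enough to ignore normal variation.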
Consider also the number of places where your servers and storage touch the network. In a virtual server setup there is really no excuse, for instance, for backing servers up over the production subnet/VLAN – particularly if the backup server is on a different VLAN from the devices being backed up. It just means your router is being hammered.
Instead, drop in a dedicated VLAN for backups, plumb it into your vSwitches (no downtime required) and add a dedicated backup NIC to each of the servers you are backing up (again, there is generally no downtime). And if you are being really sensible you will do backups at hypervisor level anyway, instead of agent-based ones on each virtual machine’s guest operating system.
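To find out which servers are currently being backed up through the router, it is enough to compare subnets. The following is a minimal sketch using Python's standard `ipaddress` module; the VLAN names and address ranges are hypothetical, and it assumes one subnet per VLAN.

```python
import ipaddress


def vlan_of(ip: str, vlans: dict):
    """Name of the VLAN whose subnet contains `ip`, or None if no subnet matches."""
    address = ipaddress.ip_address(ip)
    for name, cidr in vlans.items():
        if address in ipaddress.ip_network(cidr):
            return name
    return None


def routed_backup_clients(backup_server_ip: str, client_ips: list, vlans: dict) -> list:
    """Clients outside the backup server's VLAN, whose backups must cross the router."""
    server_vlan = vlan_of(backup_server_ip, vlans)
    return [ip for ip in client_ips if vlan_of(ip, vlans) != server_vlan]


# Hypothetical addressing plan: one production VLAN and one backup VLAN.
vlans = {"production": "10.0.10.0/24", "backup": "10.0.99.0/24"}
clients = ["10.0.10.20", "10.0.99.21"]

print(routed_backup_clients("10.0.99.5", clients, vlans))
```

Any address the function returns belongs to a server that still needs its dedicated backup NIC on the backup VLAN.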