Are we ready to let software run the data centre?
SDN reaches maturity
Software defined networking (SDN) gets a mixed press. Proponents declare it has given them more flexibility than ever before and that the time from inception of an idea to system implementation is vastly reduced.
When the opponents have their say, the story is that the way you make a network work properly is to have a network specialist configure and manage it.
So who is right, and will SDN continue to grow as a concept?
To find the answer, we have to take a few steps back, have a metaphorical run at today and see where we might land in the future. (And if you are wondering whether that is the most tenuous analogy to appear on The Register, you are probably right.)
In the very early years of data centres everything was static. At first we had shared media networks, but even with the introduction of LAN switches you still had to physically re-patch connections.
The IEEE 802.1Q standard, first published in 1998, brought with it the ability to run multiple layer 2 broadcast domains (and multiple layer 3 subnets) on a single physical port and link and hence configure such things using only software, with no need to re-patch anything physically.
This was a big deal because all of a sudden it was economic to have several broadcast domains and hence several subnets. (Although you could, even then, have multiple subnets in a single broadcast domain, you wouldn't want to – the point of subnetting was to segment networks to keep performance up.)
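The mechanism behind all this is a four-byte tag slipped into the Ethernet frame after the source MAC address. As a rough sketch (in Python, purely for illustration – real tagging is done by the switch or NIC, not application code):

```python
import struct

def dot1q_tag(vlan_id: int, priority: int = 0) -> bytes:
    """Build the 4-byte 802.1Q tag inserted after the source MAC.

    TPID 0x8100 identifies the tag; the TCI packs a 3-bit priority,
    a drop-eligible bit (0 here) and the 12-bit VLAN ID.
    """
    if not 1 <= vlan_id <= 4094:
        raise ValueError("VLAN ID must be 1-4094")
    tci = (priority << 13) | vlan_id
    return struct.pack("!HH", 0x8100, tci)

print(dot1q_tag(42, priority=5).hex())  # 8100a02a
```

Twelve bits of VLAN ID is why a single trunk link can carry up to 4,094 separate broadcast domains.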
This meant the network guys could stop assigning people static IP addresses for their servers and could instead assign entire network ranges. If you know different departments are all on their own subnets, you can give each of them, say, a Class C IP block and let them get on with it.
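Carving up address space like that is simple enough to automate. A minimal sketch, using Python's standard `ipaddress` module (the supernet and department names here are made up for illustration):

```python
import ipaddress

# Hypothetical supernet delegated to the departments
supernet = ipaddress.ip_network("10.20.0.0/16")

# Hand each department its own /24 (the old "Class C" block)
departments = ["sales", "engineering", "finance"]
subnets = dict(zip(departments, supernet.subnets(new_prefix=24)))

for name, net in subnets.items():
    print(name, net)
# sales 10.20.0.0/24
# engineering 10.20.1.0/24
# finance 10.20.2.0/24
```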
This was just a small step, though. The existence of IEEE 802.1Q didn't suddenly make the ancestors of SDN wink into existence. 802.1Q started to become popular on uplinks between switches and other switches or routers, but other connected devices generally didn't do any funky VLAN trunking.
Think back to 2005 or thereabouts and how you implemented disparate networks on a single server (for example separate virtual networks for data, management and backups): nine times out of ten the answer was that you had three network adapters in each of your servers.
The network team still patched them physically at the switch end for you. Support for 802.1Q at the server operating system was limited, and many of the implementations that did exist were a bit crap because it was generally the NIC driver and not the operating system that implemented it.
Things changed in the late 2000s when the virtualisation vendors finally made their virtual switch implementations enterprise-standard. VMware's vNetwork Distributed Switch came along in ESXi 4.0, and Microsoft was coming up on the rails around the same time with its Hyper-V implementation.
With proper distributed virtual switching, the server guys could all of a sudden do their own networking stuff. And they still do.
The advent of proper virtual switching in the server farm brought with it a bunch of network guys wailing that the server guys shouldn't be allowed to play at networks because they would break the world.
Happily there were server experts who also knew about networks and could slap the network bigots down by pointing out that the server specialists could only break the network if the network managers let them.
Think about it: in the old days network management gave the server team a subnet into which they could put their systems. In a virtual world they can be given several subnets – or even a supernet that they can slice up themselves – and segment their servers using virtual routers to connect the subnets together, with the packets never touching the physical cables, switches or routers.
But as long as the network managers restrict the set of VLANs that their switches permit on the ports to which the servers connect, there is not a lot the server guys can break.
And if they do break anything, the impact is only on the server subnet(s). So if, say, they give a server the IP address of the default gateway they have only themselves to blame and they don't break anyone else's bit of the world.
Shape of things to come
The technologies and standards now exist to take things a step further and allow the server infrastructure to reconfigure the network switches and routers themselves.
That technology is here today, true, but do you actually know anyone who is using it? Okay, it is starting to be offered by some service providers and some organisations are starting to use it for infrastructure implementations, but few of us have met anyone who actually does it for real, rather than in an experimental way.
That said, it is definitely happening and it is something you want to start getting to grips with. Proper equipment vendors – Cisco, EMC, VMware and so on – are implementing platforms and frameworks such as OpenStack. You can be confident that it is a good idea in principle if a concept has reached that level of uptake by manufacturers.
Although you can't be sure that it will become a standard in its current form, you will end up before long with something based on what you see today in the standards pot.
Vendors don't like chucking work in the bin unless it really turns out to be a dead duck (ATM to the desktop, for example, or 100VG-AnyLAN) so they will re-use the good bits and alter the bits that turn out to be sub-optimal.
Now, the idea of the server infrastructure being able to affect how the LAN is configured is on the surface scary. After all, the server guys are less specialised in networking than the network guys, so isn't there a danger that things will start to go awry, or at least be not as good as before?
Answer: yes, there is, but it is a danger you can protect against. Also, by focusing the network management people in a different way you can make their lives more interesting and increase the scope of what they get to do. The network team will move up the ladder and become a service provider.
The first thing to bear in mind is that if you implement a network in hardware there is a gulf between the simplicity of what the systems guys require and the knowledge and effort it takes to actually make it happen.
A couple of years ago I replaced some end-of-life network switches with a pile of new Cisco 4500 and 6500 devices. Part of the cost was for a Cisco specialist to come and do all the configuration of the equipment, subnets, VLANs and the like.
Was the design anything special? No. I drew it on the whiteboard in 15 minutes and it was just a pile of VLAN/subnet definitions, some low-level topology that needed a bit of Spanning Tree funkiness and some pretty straightforward (and static) routing tables. But implementing that design on the equipment was not a simple task.
So what kind of stuff do server teams want to do? In general they have pretty simple requirements too: running up a new Virtual Data Center in their VMware world, maybe, and having a new subnet and VLAN to put it in.
The chances of breaking the network with such simple requirements are minimal, and with OpenStack and similar technologies why shouldn't they be able to do it themselves?
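With OpenStack the "run up a new subnet" request boils down to a couple of small JSON bodies posted to the Networking (Neutron) v2.0 API. A hedged sketch – no call is actually made here, and in practice you would POST these to `/v2.0/networks` and `/v2.0/subnets` on your cloud's endpoint with a valid auth token:

```python
import json

def network_payload(name: str) -> str:
    """Request body for creating a tenant network via Neutron."""
    return json.dumps({"network": {"name": name, "admin_state_up": True}})

def subnet_payload(network_id: str, cidr: str) -> str:
    """Request body for attaching an IPv4 subnet to that network."""
    return json.dumps({"subnet": {
        "network_id": network_id,
        "ip_version": 4,
        "cidr": cidr,
    }})

print(network_payload("vdc-test"))
```

The point is how little there is to it: a name and a CIDR block, not a room full of routing expertise.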
They can do it within their virtual server platform's virtual networking engine – it is only when the packets have to step outside their blade chassis (or whatever) that the network is needed at all.
In extreme cases SDN can be used to provide layer 2-like functionality on top of a layer 3 network. You have endpoints in different locations connected via a WAN, or perhaps an Internet-based VPN, with technology at each site that tunnels traffic between locations so that it looks like a pan-site layer 2 network to anything attached to the relevant VLAN(s) at either site.
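VXLAN (RFC 7348) is one common way of building that pan-site layer 2 illusion: the inner Ethernet frame is wrapped in an eight-byte header and shipped over UDP/IP between sites. A sketch of just the header, for illustration:

```python
import struct

def vxlan_header(vni: int) -> bytes:
    """Build the 8-byte VXLAN header that wraps an inner Ethernet
    frame so layer 2 traffic can cross a layer 3 network.

    Flags byte 0x08 marks the VNI field as valid; the 24-bit VNI
    plays the role the VLAN ID plays on the wire inside each site.
    """
    if not 0 <= vni < 2**24:
        raise ValueError("VNI is a 24-bit value")
    return struct.pack("!II", 0x08 << 24, vni << 8)

print(vxlan_header(5000).hex())  # 0800000000138800
```

A 24-bit network identifier rather than 802.1Q's 12 bits is part of the appeal: about 16 million segments instead of 4,094.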
So again, why can't the server guys simply use this virtualised network world and do what they want on top of it?
The answer is that they can, and once people catch onto SDN this kind of thing will take off and soon become normal behaviour. The enabler will be what I mentioned earlier: the network team becomes the service provider, not part of the infrastructure team.
Let's take an example. Imagine you have a corporate WAN connecting three or four sites. Each site has an edge router that links to the WAN connections and deals with the routing of traffic between sites. Your local site routers and switches connect into the “inside” side of the edge router at each location.
Now, whatever you do in your network at each site, you will find it impossible to break the service provider's network (let's assume you don't just unplug the router and throw crap down the line – that would be cheating).
Okay, you could do something disagreeable in the subnet between your router and the provider’s, but it would only break your world. This applies not just in a routed world but also in a layer 2 setup.
In the past I was a customer of a virtual private LAN service provider offering end-to-end Ethernet services with inter-site links based on VLANs.
Could I break the provider’s world? No, because it used IEEE 802.1ad “Q-in-Q” technology, so its VLAN setup ran at a level below mine. Whatever I did with my VLANs was effectively tunnelled through a low-level channel in its network.
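The trick is nothing more than a second tag stacked outside the first. A sketch (VLAN IDs here are arbitrary examples):

```python
import struct

def qinq_tags(s_vid: int, c_vid: int) -> bytes:
    """Stack a provider S-tag (TPID 0x88a8, per 802.1ad) outside a
    customer C-tag (TPID 0x8100). The provider's switches forward on
    the outer tag and never inspect the customer's inner VLAN."""
    return struct.pack("!HHHH", 0x88A8, s_vid, 0x8100, c_vid)

print(qinq_tags(100, 10).hex())  # 88a800648100000a
```

Since the carrier only ever looks at the outer tag, the customer's entire VLAN scheme rides through untouched, which is exactly why I couldn't break its world.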
Now abstract this into your own model and it is obvious: you need the network team to stop working at the infrastructure level, take a step back and become a service provider.
The human touch
And so we reach what initially appears to be a paradox: we don't want the network team to go away, but we do want the server team to be able to reconfigure the network.
In fact it goes further than that: we really want parts of the network to become the responsibility of the server team.
Let's turn this on its head and ask what parts of the network can't be managed by the server team. The answer is, unsurprisingly, the bits they don't care about.
So for instance the network team should still deal with inter-site links and any failover between them using protocols such as BGP or EIGRP, or setups where a primary MPLS link has an internet-based VPN as the secondary for failover purposes.
The server people don't care how the network is configured: they just want some data pipes they can throw data down in the knowledge that it will route to the right place.
All this behind-the-scenes stuff remains the remit of the network team, who now have to start understanding how to implement multi-tiered networks, even within the LAN (historically the tier boundary was the service provider's edge router).
This is not a big deal: if you understand how to configure BGP, dynamic routing and the like on whatever brand of kit you use, it is not a vast leap of technology to understand the necessary additional protocols.
And helpfully, with modern kit you can often virtualise the routing function of your organisation within many devices you already own. So, for example, in many Cisco devices you can use virtual routing and forwarding to have a single router supporting several apparently different infrastructures, even with overlapping IP ranges.
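The key property of VRF is that each virtual router keeps its own routing table, so identical prefixes can coexist on one box. A toy model in Python (VRF names, prefixes and next-hops are all invented for illustration):

```python
import ipaddress

# Two VRFs on one "router", each with its own table. Note both
# carry the same 10.0.0.0/24 prefix without colliding.
vrfs = {
    "customer-a": {ipaddress.ip_network("10.0.0.0/24"): "192.168.1.1"},
    "customer-b": {ipaddress.ip_network("10.0.0.0/24"): "192.168.2.1"},
}

def lookup(vrf: str, dest: str) -> str:
    """Route lookup confined to one VRF's table."""
    addr = ipaddress.ip_address(dest)
    for prefix, next_hop in vrfs[vrf].items():
        if addr in prefix:
            return next_hop
    raise LookupError("no route")

print(lookup("customer-a", "10.0.0.5"))  # 192.168.1.1
print(lookup("customer-b", "10.0.0.5"))  # 192.168.2.1
```

The same destination address resolves to a different next hop depending on which VRF the packet arrived in, which is the whole point.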
Will this functionality – wide area link resilience and the like – ever become part of an SDN setup? I doubt it, because handing those decisions to software risks making the network highly chaotic.
Service providers have interconnects configured the way they are for a reason, which is that they have a good compromise between control and resilience.
One school of thought says that letting technology choose paths is generally more efficient than letting humans do it because technology can make faster and more consistent and accurate decisions.
If we imagine any big network (the internet, for example) as a global network that decides for itself how to route every packet from A to B, though, it is easy to envisage something somewhere making a dodgy decision and, to use a technical term, everything going a bit wrong.
But at a smaller scale – and that may mean at country-level, not just organisation-level – automated reconfiguration may be an option eventually.
IT systems and concepts have always kept up with what is possible and have depended on a combination of innovation and slogging R&D to move forward.
For instance in 1995 the concept of a VLAN was pointless: we had 10Mbps shared media networks and even a basic eight-port hub cost nearly £700, so a VLAN-capable bridge would have been an expensive way to put extra traffic over a network that didn't have the capacity.
Wind on through the years and although it has been theoretically possible for a server infrastructure to reconfigure a switch (even without ideas like OpenStack and commonality between vendors one could have done funky things with scripts, TCP libraries and proprietary IOS commands), we didn't do it because it was dangerous.
It stood a high chance of screwing up the corporate network. It was also a specialist job: you had to write quite complicated logic so your scripts intelligently considered the order in which to auto-configure the switches, to make sure they didn't saw off the network branch they were sitting on.
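That ordering problem is just a graph traversal. A minimal sketch (topology and switch names invented for illustration): walk the uplink tree breadth-first from the core, then configure in reverse, so the devices furthest from your management station are touched first and an errant change can never cut the path to a switch still awaiting its config.

```python
from collections import deque

# Which switch uplinks to which, management station side = "core"
uplinks = {"access-1": "dist-1", "access-2": "dist-1", "dist-1": "core"}

def config_order(root: str = "core") -> list:
    children = {}
    for child, parent in uplinks.items():
        children.setdefault(parent, []).append(child)
    order, queue = [], deque([root])
    while queue:                       # breadth-first from the core...
        node = queue.popleft()
        order.append(node)
        queue.extend(sorted(children.get(node, [])))
    return order[::-1]                 # ...then configure leaves-first

print(config_order())  # ['access-2', 'access-1', 'dist-1', 'core']
```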
Now we have the ability to implement multiple tiers of networking within our organisation. We can build layer 2 tunnels between sites over layer 3 networks. We can run multiple virtual routers within a single physical router.
We have protocols (admittedly often proprietary ones) that let us do things such as creating a VLAN in one switch and having the infrastructure propagate that to all other devices in the right order to avoid Spanning Tree and the like blowing things up.
And the vendors are implementing technology that allows a far simpler way than ever before for the server-oriented parts of the infrastructure to adjust settings and flows within routers and switches.
So it is absolutely time for the servers and the network to come together under what used to be called the server team, and for the network team to become their service provider. ®