Why blades need enterprise management software: Learn from Trev's hardcore lab tests

A morality tale

The value of the enterprise management software that ships with modern blades has become apparent to me. At the same time, I understand the value that "unblade" systems, such as the Supermicro Twin series or Open Compute systems, can bring.

Cost, and what you plan to do with the things, are, as always, the deciding factors, but there are no clear winners and no greedy villains to be found here.

I want to go back in time a little: to July 2014 and the run-up to VMworld. I had just bought a brand new Supermicro F627G2-F73PT+ FatTwin, one of its GPU-capable models.

Micron chipped in a bunch of M500DC SSDs and Nvidia a pair of GRID K2 cards. Supermicro chipped in an SSE-X3348T switch and my lab was almost starting to look respectable.

I occasionally enjoy testing consumer stuff: Bluetooth speakers, for example. But I have been repeatedly asked to test equipment bound for the mid-range and entry-level enterprise.

My dreams of fulfilling my readers' requests were euphoric. I was going to test server SANs, GPU-enhanced VDI and a list of other things as long as my arm.

If you search for reviews with my name on them, you will notice a dearth of them in the past few months. The reasons are as much about business as about intricate technical problems, but one thing sticks out: looking back, the sort of enterprise management software that ships with high-end blade servers would have saved me months of effort and tens of thousands of dollars.

Trial by ordeal

For one example of the past few months' worth of efforts in prototyping new builds and testing components, let's look at my testing of Nvidia's GRID cards alongside VMware's Horizon View.

I ultimately ended up with a system that showed how amazing this setup could be, but the journey to get there is a great demonstration of why GRID cards tend to be available only in pre-canned systems.

To get Nvidia GRID cards working properly you need three things: the card, a hypervisor that plays nice with others and a BIOS that supports the whole affair. When you put it together, it is amazing.

I don't care what the VDI naysayers think: Nvidia GRID-powered VDI is such a beautiful technology that it burned through a decade of jaded cynicism and reminded me why I got into IT in the first place.

In the current iteration you need to get the combination of components and configurations right. If you want to do GPU-to-virtual machine pass through, then in the Supermicro BIOS make sure that "above 4G decoding" is disabled.

Look through this PDF for the pciHole.start = "2048" setting and apply it to your virtual machine. Do not load the Nvidia driver VIB from the ESXi command line.

In theory, the above lets you pass a GPU through to a virtual machine, after which you can play Space Engineers for two solid days (sorry, I mean make performance changes). Fully virtualised GPUs need further changes.
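When you are rebuilding test VMs over and over, a small helper that sets the pciHole.start line in a VM's .vmx file saves retyping it. What follows is a minimal sketch, assuming you edit a copy of the .vmx text while the VM is powered off; the function name set_vmx_option is hypothetical, and matching keys case-insensitively is a precaution, not a documented VMware rule.

```python
import re

def set_vmx_option(vmx_text: str, key: str, value: str) -> str:
    """Return vmx_text with exactly one `key = "value"` line (hypothetical helper).

    Existing lines for the same key are dropped first, so running this
    twice with the same arguments leaves the text unchanged.
    """
    existing = re.compile(
        rf"^[ \t]*{re.escape(key)}[ \t]*=.*\n?",
        re.IGNORECASE | re.MULTILINE,  # case-insensitive match is a precaution
    )
    kept = existing.sub("", vmx_text)
    if kept and not kept.endswith("\n"):
        kept += "\n"
    return kept + f'{key} = "{value}"\n'

if __name__ == "__main__":
    vmx = 'displayName = "vdi-gpu-01"\npciPassthru0.present = "TRUE"\n'
    print(set_vmx_option(vmx, "pciHole.start", "2048"), end="")
```

Removing and re-appending, rather than editing in place, keeps the helper idempotent and guarantees a single entry even if an older value was pasted in twice.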

Now add the bit where, unbeknown to me, there was a bug in this specific configuration: an ESXi 5.5 RTM install would pass both GPUs on the K2 through to different virtual machines, but ESXi 5.5 U2 would work properly only when I restricted passthrough to one of the GPUs.

If I tried to pass through both, it would fail without any useful errors. But I found that under this configuration I could load the Nvidia driver, pass through one of the GPUs and run the other in virtualised GPU mode, despite being told this should not currently be possible.

I spent more than a month trying to find the exact magical combination of BIOS settings, hypervisor install and virtual machine configuration that would let me pass through both GPUs.
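Part of why that search took so long is combinatorial: every extra two-way setting doubles the number of builds to try. The sketch below enumerates a test matrix with itertools.product. The five dimensions are hypothetical, loosely drawn from the variables mentioned above; a real matrix would carry many more BIOS, driver and VM settings.

```python
from itertools import product

# Hypothetical test dimensions, loosely based on the variables in the text.
# A real matrix would carry many more BIOS, driver and VM settings.
OPTIONS = {
    "above_4g_decoding": ["enabled", "disabled"],
    "hypervisor": ["ESXi 5.5 RTM", "ESXi 5.5 U2"],
    "nvidia_vib_loaded": [False, True],
    "pci_hole_start": [None, "2048"],
    "gpus_passed_through": [1, 2],
}

def config_matrix(options: dict) -> list:
    """Return every combination of settings (the Cartesian product) as dicts."""
    keys = list(options)
    return [dict(zip(keys, values)) for values in product(*options.values())]

if __name__ == "__main__":
    combos = config_matrix(OPTIONS)
    print(len(combos))  # 2 * 2 * 2 * 2 * 2 = 32
    print(combos[0])
```

Five binary settings already give 32 full rebuilds; add a handful more and the count runs into the hundreds.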

Bug mania

In addition to the GPU error, there is an Intel network driver bug affecting the onboard 10GbE NICs on my FatTwin that almost cost me my sanity. It certainly cost me clients, and my inability to solve it – and to disconnect from attempting to solve it – has damaged my reputation.

There is a reasonable bit of documentation of the issue here and an example of buck-passing here.

The short version is this: if everything on my 10GbE switch is 10GbE, everything works fine. But if there is a 1GbE device on the network, then any 10GbE device with the Intel NICs will send to that 1GbE device no faster than about 500Kps.

This holds true for about 60 per cent of the 1GbE devices you care to attach. For the other 40 per cent, the Intel NICs seem to work just fine. I have not figured out why.

To reproduce the error, load ESXi 5.5 U2 on a Supermicro F627G2-F73PT+ and connect it to any Broadcom-based 10GbE switch (such as the Supermicro SSE-X3348T). Then connect a D-Link DES-1016 switch, from which your other 1GbE devices will hang.

You can send data from the D-Link 1GbE side to the 10GbE side at wire speed. Data in the other direction slows to a crawl. Swap out the DES-1016 for the Intel 82579LM network card in my notebook: same issue. Choose a Netgear WNDR-3700 V2 Wi-Fi router instead, and you get wire speed in both directions. Cheap $10 TP-LINK switch? Same deal.

This is an old bug. I have some Intel 1GbE network cards that do exactly the same thing if you toss a 100Mbit device on the network. Despite knowing of a previous iteration of this issue, I was a week into trying to figure out why my finally-bloody-obeying-me GPU VDI setup was performing like crap before I twigged that the answer wasn't in the box, it was in the band.

Sure enough: one $10 TP-LINK switch later, Space Engineers (ahem, my testing regimen) was running smooth as silk.
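The diagnostic that matters here is measuring each direction separately, since traffic into the 10GbE side was fine while traffic out of it crawled. Below is a minimal sketch of a one-direction TCP throughput test using only Python's standard library; it is not the tooling used in the lab, and serve_once and send_rate are hypothetical names. To test a link, run serve_once on one machine and send_rate from the other, then swap roles.

```python
import socket
import struct
import threading
import time

CHUNK = 64 * 1024  # bytes per send/recv call

def serve_once(server: socket.socket) -> None:
    """Accept one connection, count bytes until the sender half-closes,
    then reply with the total as an 8-byte big-endian integer."""
    conn, _ = server.accept()
    with conn:
        total = 0
        while True:
            data = conn.recv(CHUNK)
            if not data:
                break
            total += len(data)
        conn.sendall(struct.pack("!Q", total))

def send_rate(host: str, port: int, nbytes: int) -> float:
    """Send nbytes to host:port; return end-to-end throughput in bytes/second.

    The clock stops only when the receiver confirms the byte count, so data
    still sitting in local socket buffers is not counted as delivered.
    """
    payload = memoryview(b"\0" * CHUNK)
    with socket.create_connection((host, port)) as sock:
        start = time.perf_counter()
        sent = 0
        while sent < nbytes:
            n = min(CHUNK, nbytes - sent)
            sock.sendall(payload[:n])
            sent += n
        sock.shutdown(socket.SHUT_WR)  # tell the receiver we are done sending
        reply = b""
        while len(reply) < 8:
            part = sock.recv(8 - len(reply))
            if not part:
                raise ConnectionError("receiver closed before replying")
            reply += part
        elapsed = time.perf_counter() - start
    (received,) = struct.unpack("!Q", reply)
    if received != nbytes:
        raise RuntimeError(f"sent {nbytes} bytes, receiver counted {received}")
    return nbytes / elapsed

if __name__ == "__main__":
    # Loopback demo; across machines, run serve_once on the far side instead.
    server = socket.create_server(("127.0.0.1", 0))
    port = server.getsockname()[1]
    receiver = threading.Thread(target=serve_once, args=(server,))
    receiver.start()
    rate = send_rate("127.0.0.1", port, 20 * 1024 * 1024)
    receiver.join()
    server.close()
    print(f"{rate / 1e6:.1f} MB/s")
```

Waiting for the receiver's byte count before stopping the clock matters on a link like this one: a sender that only times its own writes can report a healthy number while the data is still queued.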

Grief counselling

I do apologise for the lengthy preamble to my main point, but I want you, dear reader, to grasp the depth of the technical intricacies that those of us who do formal prototyping have to deal with.

There were trials and tribulations, but the end result was that I now have a testlab system that does GPU VDI work wonderfully.

That same system can be re-tasked with a few hours' effort to be a truly amazing server SAN system. I can also load up additional PCI-E cards to test things like A3Cube's PCI-E networking, and who knows what else.

But getting this far has cost me tens of thousands of dollars, some of it spent up front to meet commitments to clients. I had to, for example, go out and buy new 10GbE network cards because I couldn't beat the existing ones into shape in time to meet a deadline.

I have also had to turn away clients and, to my eternal shame, have missed deadlines for others as I tried to solve problem after problem.

A spectacular amount of grief would have gone away if these systems had the sort of enterprise management software that high-end blades regularly ship with.

In a blade system, the BIOS configuration is stamped on the node by the administration module. That configuration lives with the slot, not the blade, and you can store multiple configurations.

The configuration contains not only the BIOS configs, but also MAC and WWN addresses for the nodes. Your operating system instances can be bound to a given MAC address and that MAC bound to a given config.

You can have dozens of the things around for the different configurations you need, and it makes testing – and reverting if things go wrong – far easier.
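The key design point is that the configuration belongs to the slot, not the hardware in it. The toy data model below illustrates that idea under stated assumptions: it is not any vendor's management API, and Profile, Blade, Chassis and the example MAC/WWN values are all hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class Profile:
    """A stored 'known good' identity for a node (hypothetical model)."""
    name: str
    bios: dict
    macs: tuple
    wwns: tuple = ()

@dataclass
class Blade:
    serial: str
    bios: dict = field(default_factory=dict)
    macs: tuple = ()
    wwns: tuple = ()

@dataclass
class Chassis:
    profiles: dict = field(default_factory=dict)  # profile name -> Profile
    slots: dict = field(default_factory=dict)     # slot number -> profile name

    def store(self, profile: Profile) -> None:
        self.profiles[profile.name] = profile

    def assign(self, slot: int, profile_name: str) -> None:
        """Bind a stored profile to a slot (the slot, not the blade, owns it)."""
        if profile_name not in self.profiles:
            raise KeyError(f"unknown profile {profile_name!r}")
        self.slots[slot] = profile_name

    def stamp(self, slot: int, blade: Blade) -> Profile:
        """Apply the slot's profile to whichever blade is sitting in it."""
        profile = self.profiles[self.slots[slot]]
        blade.bios = dict(profile.bios)  # copy so the stored profile stays pristine
        blade.macs = profile.macs
        blade.wwns = profile.wwns
        return profile

if __name__ == "__main__":
    chassis = Chassis()
    chassis.store(Profile("gpu-vdi", {"above_4g_decoding": "disabled"},
                          ("02:00:00:00:01:01",), ("20:00:00:00:00:00:01:01",)))
    chassis.store(Profile("server-san", {"above_4g_decoding": "enabled"},
                          ("02:00:00:00:02:01",)))
    chassis.assign(1, "gpu-vdi")
    old, new = Blade("SN-A"), Blade("SN-B")
    chassis.stamp(1, old)
    chassis.stamp(1, new)        # replacement blade inherits the same identity
    print(old.macs == new.macs)  # True
    chassis.assign(1, "server-san")
    chassis.stamp(1, new)        # re-task the slot...
    chassis.assign(1, "gpu-vdi")
    chassis.stamp(1, new)        # ...and revert to the known-good config
```

Because the operating system is bound to the MAC and the MAC is bound to the profile, reverting a failed experiment is a matter of reassigning the slot rather than rebuilding by hand.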

How long does it take to load an operating system completely from scratch and configure it? Isn't this why we created images, then deployable configurations, automation, Puppet, web-scale orchestration and so forth?

But what about the underlying hardware? That which touches the metal needs to be configured as well.

In addition to the hypervisor or operating system that sits on the metal, the system's BIOS needs to be configured. Baseboard management controllers need configuration. Security and encryption modules need to be loaded with the relevant keys.

Even network cards, hard drives and RAID cards have their own firmware that needs to be updated, and their own configurations to be managed and disseminated.

It took me months to solve the configuration problems with my testlab, in large part because it could take an hour to get everything reloaded, reconfigured and then run through a test, and I had hundreds of configurations that had to be tested.
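A rough back-of-envelope check of that figure, taking the numbers above at face value (about an hour per reload-and-test cycle) and reading "hundreds" as somewhere between 100 and 300 configurations, which is an assumption:

$$
\begin{aligned}
T &\approx N \, t_{\text{cycle}} \\
N = 100:\quad T &\approx 100 \times 1\,\text{h} = 100\,\text{h} \approx 2.5 \text{ forty-hour weeks} \\
N = 300:\quad T &\approx 300 \times 1\,\text{h} = 300\,\text{h} \approx 7.5 \text{ forty-hour weeks}
\end{aligned}
$$

That is pure cycle time, before any of the head-scratching between runs.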

Not so dumb

For one man in a testlab, my setup will do. But many enterprises repurpose thousands of nodes on a regular basis, and "known good" needs to be more than a hand-written configuration scrawled on a piece of paper covered by a coffee cup.

If I am running Facebook, then a fleet of Open Compute nodes, stripped down so far that not a single unneeded $2 sound card remains, makes perfect sense.

For them, everything is done in software. The chances of a BIOS change to a node once it is plopped onto the rack approach zero. It is a bucket of compute that grabs an operating system from a network and that is all it ever has to be.

Similarly, my Supermicro FatTwin makes for a truly glorious Caringo cluster. Caringo runs a controller virtual machine that hands off a read-only version of CentOS over PXE network boot.

It discovers all the storage on the nodes that boot off it, formats that storage and adds it into its storage pool. Need more storage? Boot more nodes: object storage has become as simple as turning on a connected computer that is designed to boot from PXE.
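The "boot more nodes" model can be sketched as a pool that grows whenever a node network-boots. This toy model is hypothetical and is not Caringo's software: it ignores replication, failure handling and the PXE mechanics, and only shows the discover-and-add step, including not double-counting a node that reboots.

```python
from dataclasses import dataclass, field

TB = 10**12

@dataclass
class Node:
    hostname: str
    drives: dict  # device name -> capacity in bytes, as discovered at boot

@dataclass
class StoragePool:
    """Toy model of a PXE-booted object store: every booting node donates
    all of its local drives to one shared pool (hypothetical, not Caringo's API)."""
    volumes: dict = field(default_factory=dict)  # (hostname, device) -> bytes

    def on_pxe_boot(self, node: Node) -> int:
        """Add a newly booted node's drives; return the bytes added."""
        added = 0
        for device, size in node.drives.items():
            key = (node.hostname, device)
            if key in self.volumes:  # drive already joined on an earlier boot
                continue
            self.volumes[key] = size  # stand-in for "format and add to pool"
            added += size
        return added

    @property
    def capacity(self) -> int:
        return sum(self.volumes.values())

if __name__ == "__main__":
    pool = StoragePool()
    for i in range(1, 4):  # "Need more storage? Boot more nodes."
        pool.on_pxe_boot(Node(f"node{i}", {"sda": 4 * TB, "sdb": 4 * TB}))
        print(f"after node{i}: {pool.capacity // TB} TB")
```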

Again, what these sorts of scenarios need are "dumb" nodes. They don't do anything special. At best, they host some SATA drives. This is the future of large-scale storage.

It is the present of hyperscale cloud providers. But despite all the hype out there, it is not the be all and end all of the market.

Blades cut it

Lots of companies, from small businesses to the largest enterprises, still need their computers to do something. They need their servers to be more than dumb nodes. All those cool technologies that Intel builds into its CPUs are there for a reason: someone wanted them.

There are PCI-E cards that make any number of widgets go bing, USB dongles for software, OLTP applications requiring MCS-class flash storage and goodness only knows what else.

What is more, we don't just throw away our servers whenever the workload changes. Nor are systems administrators – or those who pay them – willing to spend an hour per node reconfiguring the thing.

When a given set of workloads is migrated from one class of node to the next, the old ones are repurposed. Changes are tested on one cluster of nodes for the new workload, and pushed out from there to the entire population of that class.
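The test-then-push pattern described above is a staged rollout. Here is a minimal, generic sketch of it; the apply and validate hooks are hypothetical stand-ins (in a blade environment, apply might mean assigning a new slot profile through the management module), and this is not any vendor's workflow.

```python
from typing import Callable, List

def staged_rollout(
    nodes: List[str],
    canary_size: int,
    apply: Callable[[str], None],     # hypothetical hook: push the change to one node
    validate: Callable[[str], bool],  # hypothetical hook: run the workload's tests
) -> List[str]:
    """Push a change to a small test group first, then to the rest of the class.

    Raises RuntimeError (leaving the remaining nodes untouched) if any node in
    the test group fails validation. Returns the nodes that were changed.
    """
    if canary_size < 1:
        raise ValueError("need at least one node in the test group")
    canary, rest = nodes[:canary_size], nodes[canary_size:]
    for node in canary:
        apply(node)
    failed = [node for node in canary if not validate(node)]
    if failed:
        raise RuntimeError(f"validation failed on {failed}; rollout halted")
    for node in rest:
        apply(node)
    return canary + rest

if __name__ == "__main__":
    changed = []
    fleet = [f"blade{i:02d}" for i in range(1, 17)]
    staged_rollout(fleet, 2, changed.append, lambda node: True)
    print(len(changed))  # 16
```

Halting before the rest of the class is touched is the whole point: a bad BIOS setting costs two nodes, not two hundred.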

Prototyping all the bugs out of one cluster cost me three months of my life. That can't happen at enterprise scale. So despite the doom and gloom from those who can't see beyond the price difference between a bottom-of-the-barrel Open Compute node and a blade system, blades are here to stay.

They have a role to play in the future of IT, and for now it remains an important one. ®


Biting the hand that feeds IT © 1998–2020