Dell ARMs up for hyperscale servers
But are they dangerous to anyone except HP?
Look out Intel. Here comes another ARM box to the microserver party.
If people didn't want ARM-based servers, Dell wouldn't build them, and so with the launch of the "Copper" ARM server sled for the "Viking" C5000 microserver chassis we know that people want ARM servers. And this is not some experiment that Dell is doing, either, Steve Cumings, executive director of marketing for Dell's Data Center Solutions bespoke server unit, tells El Reg.
"This is not only real, but it is going out the door," says Cumings, with a jab at rival Hewlett-Packard's "Project Moonshot" hyperscale server effort and its "Redstone" many-ARMed server nodes launched last November. Those servers made a big splash, but neither HP nor Calxeda has said much since then about who is using these concept machines – or what the plan is to ramp up the Redstone product line with other ARM chips with more oomph and memory capacity.
Those HP Redstone servers are based on Calxeda's 32-bit ECX-1000 processors, which also debuted last fall. They put four processors on a card, 18 cards in a tray, and 288 nodes in a 4U space – all interconnected by the EnergyCore Fabric Switch embedded on each Calxeda chip. This interconnect can implement a 2D torus, mesh, fat tree, or butterfly tree topology and can scale across 4,096 sockets. This is just the thing for certain hyperscale web and data-munching workloads that need lots of compute cores but not the high-bandwidth interconnect that top-end Ethernet or InfiniBand switches and adapters would offer.
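For those who like to check the sums, here is a quick back-of-the-envelope sketch of the Redstone packaging. The per-card, per-tray, and per-enclosure figures are the ones quoted above; the four cores per ECX-1000 chip is an assumption not stated in this article, and the variable names are purely illustrative.

```python
# Figures quoted in the article for HP's Redstone design.
PROCESSORS_PER_CARD = 4
CARDS_PER_TRAY = 18
NODES_PER_ENCLOSURE = 288

# Assumption (not stated in the article): each ECX-1000 is a quad-core part.
CORES_PER_PROCESSOR = 4

# One processor is one server node in this design.
nodes_per_tray = PROCESSORS_PER_CARD * CARDS_PER_TRAY

# How many trays it takes to reach the quoted node count.
trays_per_enclosure = NODES_PER_ENCLOSURE // nodes_per_tray

# Total cores in one fully loaded enclosure, under the quad-core assumption.
cores_per_enclosure = NODES_PER_ENCLOSURE * CORES_PER_PROCESSOR

print(f"nodes per tray:      {nodes_per_tray}")
print(f"trays per enclosure: {trays_per_enclosure}")
print(f"cores per enclosure: {cores_per_enclosure}")
```

Under those assumptions the quoted 288 nodes correspond to four fully populated trays, with no room left over for storage.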
Cumings says that the Copper server sleds have been in development and testing for the past year among Dell's hyperscale server customers, who generally are looking for the cheapest clocks and lowest watts to cram more computing per dollar and per watt into their server racks to support various workloads.
It has taken Dell this long to get a production-grade ARM server to market not because of hardware – 32-bit and 40-bit ARM chips that could be used in servers have been around for a while – but because the software ecosystem had to evolve around the machines to give them something useful to do. Even the most hyper of the hyperscale data centers does not want to roll its own Linux and software stacks unless it absolutely has to.
Dell's ARMed PowerEdge C5000 microserver chassis
Now that Ubuntu 12.04 LTS is out with an ARM variant, there is a commercial-grade Linux and application stack you can run on an ARMv7-compatible chip. And the Fedora Project is also doing tweaks to support ARM chips, allowing those who like RHELish Linux to experiment as well.
The LAMP stack and Hadoop will run on these Linuxes on ARM processors, and the OpenStack cloud controller was demoed running on LARM (what else are we going to call it?) this month. The KVM hypervisor is expected to be ready soon for the Cortex-A15 ARM processor, and Java works on the chip, too. Dell is also working to get its "Crowbar" server configuration tool running on the Copper servers before the end of the year, and to get a variant of Crowbar that can set up Hadoop clusters out the door.
The important thing is not that all of this software is running or will soon be running, but that the software is getting architecture-specific optimizations as people see the hardware and software ecosystem for ARM-based servers coming into being.
"We spend a lot of time on hyperscale, and we have a pretty clear understanding of what customers want to do and where their interests lie," explains Cumings. "We have had this system in the lab for over a year, and we have been trying to figure out what it might be good for."
Initially, it looks like what you might expect: Web front ends and Hadoop data munching are the two big workloads that will deliver better performance per watt and better performance per dollar compared to X86 servers.
Cumings is not about to provide any specifics about how much better or what these things cost, but each ARM server node on the Copper sled only burns a peak of 15 watts. That's pretty low, and as low as Intel is trying to shoot with its Xeon and Atom chips.
Dell has chosen Marvell's Armada XP 78460 variant as the basis of the Copper sled. This is a quad-core system-on-chip (SoC) with four Marvell "Sheeva" PJ4B cores, which are a variant of the ARMv7-MP design with 40-bit memory addressing added. The cores run at 1.6GHz and have both symmetric multiprocessing and asymmetric multiprocessing enabled. SMP allows the cores to be ganged up into a single image behind main and cache memories, while AMP allows the cores to be carved up into isolated virtual hardware slices.
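To see why the 40-bit addressing matters for a server part, here is a small sketch comparing the address space reachable with 32 and 40 address bits. It assumes byte-addressable memory, and the function name is only illustrative.

```python
GIB = 2 ** 30  # bytes in one gibibyte


def addressable_bytes(address_bits: int) -> int:
    """Bytes reachable with the given number of address bits,
    assuming byte-addressable memory."""
    return 2 ** address_bits


for bits in (32, 40):
    gib = addressable_bytes(bits) // GIB
    print(f"{bits}-bit addressing: {gib:,} GiB")
```

A plain 32-bit address tops out at 4 GiB, which is less than the 8GB each Copper node carries, so the wider addressing is what makes that memory configuration possible at all.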
Block diagram of Marvell's Armada XP 78460 chip
As you can see, the heart of the Armada XP 78460 is a system crossbar interconnect that links all of the elements of the SoC together – components that might otherwise be on a server motherboard separated from the CPUs and glued together by a chipset. Each Sheeva CPU has a floating point unit and is linked over a coherency fabric to a 2MB on-die L2 cache. The memory controller supports up to 8GB of DDR3 main memory with ECC scrubbing, running at up to 1.6GHz. There are four PCI-Express 2.0 controllers on the chip, as well as controllers to drive two SATA peripheral ports and four Gigabit Ethernet ports. There is also a 4Gb/sec packet processor that could come in very handy, as well as a security engine for encrypting and decrypting data and a controller to link to three USB 2.0 ports. That is a lot of stuff to cram into a 15 watt thermal envelope – hence the excitement about ARM servers.
The Copper sled server, which slides into the PowerEdge C5000 chassis from the DCS unit, puts four of these Armada XP 78460 processors on a system board with four memory slots; each processor links to one slot, which holds 8GB of memory. This board is designed by Dell; it is not clear where it is manufactured. The sled has room for four 2.5-inch SATA drives, one for each ARM server node.
Off to the right of the four processors is another processor and some other gadgetry that implements a Layer 2 Ethernet switch across all four nodes on the Copper sled as well as the dozen Copper sleds that can be slid into the C5000 chassis. This network can also span multiple enclosures if customers want it to. The exact feeds and speeds of this on-board, distributed Ethernet switch are not being divulged at this time, but it was clearly designed to compete against the EnergyCore Fabric Switch embedded on each Calxeda ARM server chip. Cumings says that Dell expects customers to use normal top-of-rack switches to link multiple Copper sled enclosures together, but they don't have to.
Cumings says that the Copper sleds have been shipping to selected seed customers for some time and that Dell will be standing up racks of the ARM-based servers in its solution centers as well as in the Texas Advanced Computing Center (TACC) at the University of Texas. The latter is a supercomputing center that has the facilities to allow secure remote access from all over the world, and Dell's plan is to let developers sign up for some timeslices on the Copper servers to give their code a whirl before they commit to buying the custom iron from DCS.
A single C5000 chassis can hold 48 ARM processors, for a total of 192 cores. That works out to 2,688 cores in a rack if you fill it top to bottom with C5000s – or 2,496 cores if you leave 3U open for top-of-rack Ethernet switches.
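The density math above is easy to reproduce. The sketch below uses the sled, node, core, and wattage figures from this article; the 42U rack height and 3U chassis height are assumptions not stated in the text, chosen because they are consistent with the quoted totals. The power figure covers the ARM nodes only and ignores drives, the on-sled switch, fans, and power supply losses.

```python
# Figures from the article.
SLEDS_PER_CHASSIS = 12
NODES_PER_SLED = 4
CORES_PER_NODE = 4
PEAK_WATTS_PER_NODE = 15

# Assumptions (not stated in the article): a 42U rack and a 3U chassis.
RACK_UNITS = 42
CHASSIS_UNITS = 3

nodes_per_chassis = SLEDS_PER_CHASSIS * NODES_PER_SLED
cores_per_chassis = nodes_per_chassis * CORES_PER_NODE
# Peak draw of the ARM nodes alone; excludes disks, switch, fans, PSU losses.
node_watts_per_chassis = nodes_per_chassis * PEAK_WATTS_PER_NODE


def cores_per_rack(reserved_units: int = 0) -> int:
    """Cores in a rack after setting aside some rack units for other gear."""
    chassis_count = (RACK_UNITS - reserved_units) // CHASSIS_UNITS
    return chassis_count * cores_per_chassis


print(f"nodes per chassis:       {nodes_per_chassis}")
print(f"cores per chassis:       {cores_per_chassis}")
print(f"peak node watts/chassis: {node_watts_per_chassis}")
print(f"cores per full rack:     {cores_per_rack():,}")
print(f"cores with 3U reserved:  {cores_per_rack(reserved_units=3):,}")
```

With 14 chassis in the rack this gives the 2,688 cores quoted above, and 13 chassis gives 2,496.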
The HP Redstone machines can cram a lot more cores into the rack if you don't put storage on them, and still offer higher density if you put SSDs or 2.5-inch SATA disks into some of the SL6500 enclosure's trays. The issues then become the cost of each machine and the premium, if any, that HP is commanding for the Redstone density and the Calxeda interconnect. As Copper and Redstone are not shipping commercially and do not have official prices, it is hard to tell how it all plays out. ®