ARM server hype ramps faster than ARM server chips
All the more time for Intel to get a leg up
Analysis
If I didn't have to man El Reg's systems desk for a paycheck and had a little venture capital to blow, I might start a company called Leg Systems, headquartered on the Isle of Man – not because of its tax haven status (which is eroding), but because my company would sell ARM-based systems and say that we wouldn't charge an arm and a leg for them.
Let's be honest, that's probably not much less of a business plan than other startups have used to get venture cash.
ARM Holdings, the design and licensing company behind the ARM processor architecture, unmasked its 64-bit Cortex-A50 processor designs in October 2012, and AMD, Samsung Electronics, and Cavium have licensed those designs. AMD and Cavium have admitted that they will be using these ARMv8 architecture chips in servers, and Samsung is widely believed to be working on server parts as well, but has not confirmed its plans. Marvell has aspirations in the ARM server space, too, and has Dell building experimental boxes using its ARM designs and related networking chips.
The battle pitting ARM chips against x86 processors in the data center – mostly Intel Xeons and now Atoms – is not just about low-energy processing, but also about virtualization, networking, and a more integrated data-center design.
If you are wondering why Intel spent the past year acquiring the supercomputer interconnect business from Cray, the InfiniBand business from QLogic, and the Ethernet business from the formerly independent Fulcrum Microsystems, it was to get access to interconnect experts and to figure out when and how interconnects – the next logical piece of the hardware stack – can be integrated onto the processor chip complex.
Don't expect Intel to put a Cray "Aries" XC interconnect on an Atom processor to make network-ready chips for snap-together clusters, but do expect it to come up with some kind of on-chip interconnect that can compete against the ARM onslaught and protect the ambitions of Intel's Data Center and Connected Systems Group to rule servers, storage, and networking, and to double its business in these areas to $20bn annually by 2016.
As we discussed at length in November, former Intel chip boss and now VMware CEO Pat Gelsinger thinks that the future is ARM and Intel on the endpoints and Intel in the data center. Specifically, the analysis that Gelsinger's staff at EMC put together for the Hot Chips 24 conference shows that by 2015 most of the processor and chipset money will be either in the data center or on endpoints.
Mobile devices based on non-x86 architectures in the EMC model are expected to be the largest part of the IT ecosystem, pushing around $34bn in chip and chipset revenues, followed by mobile x86 devices (mostly laptops but some tablets and smartphones) driving maybe $27bn in revenues in CPUs and chipsets. That leaves x86-based servers driving around $18bn in revenues in 2015 and x86-based PC desktops with a mere $5bn in processor and chipset sales.
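To put those rounded figures in proportion, here is a minimal Python sketch that totals the four segments quoted from the EMC model and works out each one's share. The dictionary keys are our own labels, and the dollar amounts are the approximate numbers above, not a dataset from EMC.

```python
# Approximate 2015 processor and chipset revenue by segment, in billions of US
# dollars, as quoted from EMC's Hot Chips 24 model (rounded figures).
EMC_2015_MODEL = {
    "non-x86 mobile": 34,
    "x86 mobile": 27,
    "x86 servers": 18,
    "x86 desktops": 5,
}


def segment_shares(model: dict[str, float]) -> dict[str, float]:
    """Return each segment's share of the combined total, as a percentage."""
    total = sum(model.values())
    return {name: 100.0 * value / total for name, value in model.items()}


if __name__ == "__main__":
    print(f"total: ${sum(EMC_2015_MODEL.values())}bn")
    for name, share in segment_shares(EMC_2015_MODEL).items():
        print(f"{name:>15}: {share:5.1f}%")
```

On these numbers the four segments add up to about $84bn, with non-x86 mobile kit taking roughly 40 per cent and x86 desktops about 6 per cent.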
To Gelsinger's way of thinking, ARM on the endpoint and x86 in the data center becomes the new normal because of the size of the software investment on each side. But there is, as El Reg pointed out, another – and we think equally probable – possibility (with absolutely huge error bars) that companies will decide they want one software stack running on one platform. That could mean Intel wins on the smartphone and tablet endpoints, or it could mean that ARM wins in the cloudy data center and then backs its way into the corporate data center.
How this plays out will depend on many factors, not the least of which is the cleverness of the engineers behind ARM server chips and the software stacks that run atop them. And there is no shortage of smart alecks at the handful of ARM server chip upstarts. Here's who the players are and what we know of their plans:
Calxeda: This was the first silicon etcher to jump into the ARM server fray, back in November 2011, with a custom quad-core Cortex-A9 chip that integrated processing and interconnect onto a single die.
People have been monkeying around with baby ARM servers and Linux operating systems for a lot longer than this, of course, but the Calxeda EnergyCore ECX-1000 – which includes an on-chip distributed Layer 2 switch interconnect – sets the bar for the level of engineering and integration that will be required to supplant x86 processors and external switches in the data center.
The ECX-1000 chips are based on the ARMv7 spec and only sport 32-bit processing and memory addressing, which is fine for certain kinds of media processing, simple web serving, and even some big-data munching jobs that are more constrained by I/O than memory or CPU.
That said, companies have been writing 64-bit software for a long time and they don't want to go back, and 4GB of main memory for four cores is a bit skinny, even if the chip architecture does have a very sophisticated interconnect that can span 4,096 server nodes in a single cluster without using external switches.
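Nothing quoted here says how Calxeda wires those 4,096 nodes together. Purely to illustrate why topology matters in a switchless fabric, this Python sketch assumes a hypothetical 64 x 64 two-dimensional torus, where each node links to four neighbors and the edges wrap around, and counts the hops a packet would have to make. The topology and all function names are our assumptions, not Calxeda's design.

```python
def ring_distance(a: int, b: int, k: int) -> int:
    """Hops between positions a and b on a ring of k nodes with wraparound."""
    d = abs(a - b) % k
    return min(d, k - d)


def torus_hops(src: tuple[int, int], dst: tuple[int, int], k: int) -> int:
    """Minimal hop count between two nodes of a hypothetical k x k 2D torus."""
    return ring_distance(src[0], dst[0], k) + ring_distance(src[1], dst[1], k)


def torus_diameter(k: int) -> int:
    """Worst-case hop count: at most floor(k/2) hops in each dimension."""
    return 2 * (k // 2)


def average_hops_from_origin(k: int) -> float:
    """Mean hop count from node (0, 0) to every node, itself included.
    A torus is symmetric, so every node sees the same average."""
    total = sum(torus_hops((0, 0), (x, y), k) for x in range(k) for y in range(k))
    return total / (k * k)


if __name__ == "__main__":
    k = 64  # assumed layout: 64 x 64 = 4,096 nodes
    print(k * k, torus_diameter(k), average_hops_from_origin(k))  # 4096 64 32.0
```

Under that assumption the worst case is 64 hops and the average is 32; a different topology, or one with more links per node, would change both numbers.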
This year, Calxeda will move to a Cortex-A15 core with a chip code-named "Midway" that sports 40-bit memory addressing, boosting the memory on a four-core chip to 16GB. This chip will also provide twice the performance, enhanced virtualization, and a more scalable implementation of that integrated fabric, which is now called the Fleet Service Fabric Switch.
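The memory ceilings follow from simple powers of two. A 32-bit physical address reaches 2^32 bytes (4GiB, the "4GB" above), while a 40-bit address reaches 2^40 bytes (1TiB). That suggests Midway's 16GB is a product limit, presumably set by its memory controller and DIMM support rather than by the address width. The Python sketch below just does that arithmetic; the function name is ours.

```python
GIB = 1 << 30  # bytes in one gibibyte


def address_space_gib(address_bits: int) -> int:
    """Bytes reachable with a physical address of the given width, in GiB."""
    return (1 << address_bits) // GIB


if __name__ == "__main__":
    ecx_1000 = address_space_gib(32)  # 32-bit ARMv7 addressing
    midway = address_space_gib(40)    # 40-bit addressing on the Cortex-A15
    midway_shipping_gib = 16          # memory per chip quoted for Midway
    print(ecx_1000, midway, midway // midway_shipping_gib)  # 4 1024 64
```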
Sometime in 2014 – about a year after Midway ships – Calxeda will move to the ARMv8 core from ARM Holdings with its "Lago" system-on-chip, providing 64-bit processing and memory addressing. Lago will again double the performance of the processor (probably through more cores and not through clock-speed bumps) and add floating point processing in hardware as well as a third-generation on-chip interconnect fabric that will span more than 100,000 nodes.
Calxeda is at the moment only licensing the Cortex-A57 as the basis of its Lago chips, but it is possible that in the future it could employ the Cortex-A53 processors for certain workloads or put the two different types of cores on the same die in the big.LITTLE approach championed by ARM Holdings.
Further out is an ARM SoC from Calxeda called "Ratamosa" that will also have performance enhancements, and will be aimed at full-on enterprise applications and supercomputing workloads. While no one will admit to this, Ratamosa is probably timed to coincide with the availability of a commercial-grade and field-tested Windows Server 2012 R2 update, which is the first possible version of Windows that Microsoft might field supporting both x86 and ARM processors. Microsoft could, of course, provide an ARM port of the baseline Windows Server 2012 and its key systems software such as SQL Server and Exchange Server any time it chooses. But for the moment, Redmond seems content to let Red Hat and Canonical lead in ARM support for their Linux distributions while it sees what develops.
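The idea behind big.LITTLE is that heavy work runs on the beefy cores (here the Cortex-A57) and light work runs on the frugal ones (the Cortex-A53). The Python sketch below is a toy placement rule under that assumption. The 0.6 load threshold, the Task type, and the cluster labels are hypothetical, and real schedulers use tracked load averages and hysteresis rather than a single cutoff.

```python
from dataclasses import dataclass

BIG = "big (Cortex-A57)"
LITTLE = "LITTLE (Cortex-A53)"

# Hypothetical cutoff: tasks using more than 60 per cent of a core go big.
UP_THRESHOLD = 0.6


@dataclass
class Task:
    name: str
    load: float  # fraction of one core's capacity the task currently uses


def place(tasks: list[Task], threshold: float = UP_THRESHOLD) -> dict[str, list[str]]:
    """Toy big.LITTLE placement: heavy tasks on big cores, the rest on LITTLE."""
    placement: dict[str, list[str]] = {BIG: [], LITTLE: []}
    for task in tasks:
        cluster = BIG if task.load > threshold else LITTLE
        placement[cluster].append(task.name)
    return placement


if __name__ == "__main__":
    work = [Task("db-query", 0.9), Task("heartbeat", 0.05), Task("log-shipper", 0.3)]
    print(place(work))
    # {'big (Cortex-A57)': ['db-query'], 'LITTLE (Cortex-A53)': ['heartbeat', 'log-shipper']}
```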
Applied Micro Circuits: This company is backing into the server chip business from the networking chip and embedded processor markets where it has been making its living, in the hope of carving out a big, juicy, profitable slice of the server racket.
The company launched its X-Gene multi-core SoC based on the ARMv8 design in October 2011, a year before ARM Holdings put out the full ARMv8 specs as embodied in the Cortex-A53 and Cortex-A57 reference designs.
Applied Micro wants to be first with 64-bit ARM servers and to build a sustained lead over its future rivals. The company provided more details on the initial X-Gene chips last summer at Hot Chips, and was showing off potential compute and storage server designs based on the X-Gene chip when everyone else was making ARMv8 announcements at last October's ARM TechCon 2012 event.
Applied Micro has not released the full specs of the X-Gene chip, but what we know is that it uses a two-core module as the basic building block of the SoC. The cores have a four-wide, out-of-order execution unit for integer work, support full virtualization (including the nested page tables that hypervisors expect), and have their own L1 data and L1 instruction caches.
The core pair shares an L2 cache, and multiple pairs are ganged up to make a multicore system. A coherent network on the SoC delivers 160GB/sec of bandwidth and links core pairs to each other and to on-chip PCI-Express, networking, and SATA ports as well as to DDR3 main memory.
The initial X-Gene chip will be implemented in the 40 nanometer process from Taiwan Semiconductor Manufacturing Corp (which also etches Calxeda's ARM chips), and will top out at four core modules and eight cores running at a maximum of 2.5GHz.
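The building-block arrangement can be modeled in a few lines. This Python sketch is our own hypothetical representation, not Applied Micro's: module m holds cores 2m and 2m + 1, the pair shares an L2 cache, and any two cores in different modules have to share data over the coherent on-chip network, which is not modeled in detail.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CoreModule:
    """One hypothetical X-Gene building block: two cores, each with its own
    L1 instruction and data caches, sharing a single L2 cache."""
    module_id: int
    cores: tuple[int, int]


@dataclass(frozen=True)
class XGeneSoC:
    """Core modules hung off a coherent on-chip network."""
    modules: tuple[CoreModule, ...]

    @classmethod
    def build(cls, num_modules: int) -> "XGeneSoC":
        # Number cores so that module m holds cores 2m and 2m + 1.
        return cls(tuple(CoreModule(m, (2 * m, 2 * m + 1)) for m in range(num_modules)))

    @property
    def core_count(self) -> int:
        return 2 * len(self.modules)

    def module_of(self, core: int) -> int:
        if not 0 <= core < self.core_count:
            raise ValueError(f"core {core} out of range")
        return core // 2

    def shares_l2(self, a: int, b: int) -> bool:
        """True if both cores sit in one module; otherwise data shared between
        them travels over the coherent network."""
        return self.module_of(a) == self.module_of(b)


if __name__ == "__main__":
    soc = XGeneSoC.build(num_modules=4)  # four modules, as on the initial chip
    print(soc.core_count, soc.shares_l2(0, 1), soc.shares_l2(1, 2))  # 8 True False
```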
Each eight-core chip will address up to 256GB of physical memory and will offer 40GB/sec of networking I/O and 17 lanes of PCI-Express 3.0 bandwidth to carve up into slots. That on-chip interconnect fabric can be extended across 16 processor sockets for 128 cores in a single cluster image.
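The scale-out arithmetic is straightforward, and this Python sketch spells it out using the per-chip figures above. The memory total is simply the sum of each socket's 256GB maximum. Applied Micro hasn't said, in anything quoted here, whether a fully populated 16-socket cluster presents that as one shared address space, so treat it as an upper bound under our assumptions.

```python
CORES_PER_CHIP = 8            # four two-core modules per initial X-Gene chip
MAX_MEMORY_PER_CHIP_GB = 256  # physical memory addressable per chip
MAX_SOCKETS = 16              # reach of the extended on-chip fabric


def cluster_totals(sockets: int) -> dict[str, int]:
    """Aggregate cores and the sum of per-chip memory maxima for a cluster."""
    if not 1 <= sockets <= MAX_SOCKETS:
        raise ValueError(f"sockets must be between 1 and {MAX_SOCKETS}")
    return {
        "cores": sockets * CORES_PER_CHIP,
        "max_memory_gb": sockets * MAX_MEMORY_PER_CHIP_GB,
    }


if __name__ == "__main__":
    print(cluster_totals(16))  # {'cores': 128, 'max_memory_gb': 4096}
```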
This initial X-Gene chip is supposed to sample in the first quarter with volume shipments at the end of 2013.