Broadcom's axed Arm server processor project today rose from the grave – as Cavium's 64-bit 32-core two-socket Armv8-A ThunderX2 chip.
Back in 2013, Broadcom announced it was working on Vulcan, a multi-core 3GHz Arm-compatible 64-bit server-grade system-on-chip. By 2016, that ambitious project was quietly dismantled. Broadcom was acquired in mid-2015 by Avago, which had no interest in the data-center CPU effort, and within 12 months, Broadcom's Arm server CPU blueprints had been secretly sold to Cavium, which later repackaged the technology as ThunderX2.
There's no mention of Vulcan in Monday's announcement of the ThunderX2's general availability. That Cavium decided to purchase the chipset's designs and engineers from Broadcom on the down low, file off the brand, and market it as ThunderX2, rather than further its first-generation ThunderX, perhaps says all you need to know about that inaugural processor family. The original chip designs may be pulled from the archives, dusted off, and used in another form in future, of course.
None of this is meant to persuade you to immediately discount the ThunderX2. It appears to be quite a capable chip. We're simply describing the product's twisty-turny route to market. We had heard rumors of its Vulcan origins in November and December, and have since had it confirmed by two sources familiar with the ThunderX2's development.
"We leveraged the Vulcan core, and made major changes in the system-on-chip, leveraging learnings from ThunderX to improve its performance for cloud users' workloads, and integrated some Cavium capabilities," Gopal Hegde, vice president of Cavium's data center processor group, told The Register in a statement.
Prices, sockets, and specifications
The ThunderX2 is available now, if you want to buy a server system with one or two of these processors inside it. As far as we can tell, you can't build your PC or laptop out of one of these things: you have to go to HPE, or Cray, or a friendly white-box server maker.
If you want to buy a top-end ThunderX2 – 32 2.2GHz cores in total drawing up to 180W – then it's $1,795 apiece if you buy a tray of 1,000. A 16-core 75W part running at 1.6GHz will set you back $800.
Cavium's chief rival in this space is Qualcomm's 64-bit Arm-compatible Centriq 2400: a top-end model, 48 cores at 2.2GHz drawing up to 120W, costs $1,995. It's a fairly tight race between these two contenders that goes beyond these headline numbers – performance per clock tick, IO bandwidth and interfaces, cache sizing, and so on.
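For a rough feel of how those headline numbers stack up, here is a minimal sketch that divides the quoted list prices and maximum power draws by core count. It uses only the figures given above; the dictionary layout and the `per_core` helper are our own illustrative names, and dollars or watts per core say nothing about how much work each core actually gets done.

```python
# Figures as quoted in the article: core count, maximum power draw, list price.
chips = {
    "ThunderX2 32-core": {"cores": 32, "watts": 180, "usd": 1795},
    "ThunderX2 16-core": {"cores": 16, "watts": 75, "usd": 800},
    "Centriq 2400 48-core": {"cores": 48, "watts": 120, "usd": 1995},
}


def per_core(chip):
    """Return (dollars per core, maximum watts per core) for one chip entry."""
    return chip["usd"] / chip["cores"], chip["watts"] / chip["cores"]


if __name__ == "__main__":
    for name, chip in chips.items():
        usd, watts = per_core(chip)
        print(f"{name}: ${usd:.2f} per core, {watts:.2f}W per core")
```

On paper the Centriq comes out cheaper and cooler per core, which is exactly why the comparison has to go beyond the headline numbers.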
Crucially, ThunderX2 is a one- or two-socket part. You can use two per server node, if you wish. Centriq is strictly single socket. Depending on how you pay for software licensing – per core, per socket, per node, etc – and how dense you want your racks, these socket combinations may make all the difference in terms of cost. Single socket might be good for you. It might not be. Your mileage may vary.
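To see why the licensing model matters, consider this back-of-the-envelope sketch. The core and socket counts come from the top-end parts described above; the fee amounts in `FEES` are entirely made up for illustration, as are the `Node`, `license_cost`, and `cost_per_core` names, so plug in your own vendor's numbers before drawing conclusions.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    sockets: int
    cores_per_socket: int

    @property
    def cores(self) -> int:
        return self.sockets * self.cores_per_socket


# Hypothetical annual license fees in dollars, invented for this example only.
FEES = {"per_core": 100.0, "per_socket": 2500.0, "per_node": 4000.0}


def license_cost(node: Node, model: str) -> float:
    """Cost of licensing one server node under the given pricing model."""
    units = {"per_core": node.cores, "per_socket": node.sockets, "per_node": 1}
    return units[model] * FEES[model]


def cost_per_core(node: Node, model: str) -> float:
    """License dollars spent per CPU core in the node."""
    return license_cost(node, model) / node.cores


NODES = [
    Node("ThunderX2, two sockets", sockets=2, cores_per_socket=32),
    Node("ThunderX2, one socket", sockets=1, cores_per_socket=32),
    Node("Centriq 2400, one socket", sockets=1, cores_per_socket=48),
]

if __name__ == "__main__":
    for node in NODES:
        for model in FEES:
            print(f"{node.name} [{model}]: ${license_cost(node, model):,.0f} "
                  f"total, ${cost_per_core(node, model):.2f} per core")
```

With these invented fees, per-node licensing favors cramming two sockets into one box, while per-core licensing scales with core count however the cores are packaged.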
For what it's worth, Cloudflare is in love with the Centriq, and wants to plug it into its barns of web-content-distributing servers, claiming Qualcomm's chips draw considerably less power than its Intel CPUs.
Microsoft, which has also said nice things about the Centriq, claims to be head over heels in love with the ThunderX2, reiterating it wants more than half of its data center capacity – think web searching, email, machine learning, storage, and so on, but not customer virtual machines – driven by Arm processors, and it wants ThunderX2 in that mix.
Redmond has even designed a motherboard for the processors with Cavium so that the hardware fits into its customized Project Olympus server racks, the blueprints of which have been submitted to the Open Compute Project.
At a Cavium Arm server processor launch, suddenly Microsoft shows up and reiterates it still wants >50% of data center capacity to be Arm powered. And it's loving Cavium's Thunder X2 Arm64 system. Together designed two-socket Arm server motherboard #build2018 pic.twitter.com/WYURr2dPQ5 – The Register (@TheRegister), May 7, 2018
Why Arm? Microsoft appears, chiefly, to be tired of being tied to one instruction set architecture – to Intel's ISA. Microsoft is part of a swelling crowd of cloud organizations and enterprises considering throwing Arm and similar architectures into their server warehouses to introduce a second or third supplier, and escape the monopoly-strength grip Intel has on prices and supply channels for data-center-grade CPU silicon. Nine out of ten processors going into the world's data centers right now have Intel's logo on them, according to IDC.
It's jolly nice and all that the Arm server stuff is performant with server applications, but it's the fact that the upstart silicon can, depending on the model, work out cheaper than what Intel is charging, while more or less doing the same job as Intel's Xeons, that fires up cold-hearted corporations. Deductions. And anything that avoids price gouging.
"We have contributed the design of the ThunderX2 motherboard for Microsoft's Project Olympus specification to the Open Compute Project, and we look forward to further optimizing our internal cloud services workloads for ThunderX2," said Microsoft Azure distinguished engineer Dr Leendert van Doorn, who appeared on stage during the ThunderX2 launch to heap praise on the thing.
So, what's under the hood?
The 16nm TSMC-fabricated ThunderX2 CPU cores have quad-issue pipelines that execute instructions out of order, so yes, you'll have to apply Spectre mitigations if you're running untrusted code on these.
But these systems aren't supposed to be running untrusted or user-supplied software. If an attacker is on your web search engine's crawler box and able to exploit Spectre to leak data from kernel or application memory, you have much bigger problems: like, er, there's someone in your search engine's crawler box.
"The forthcoming ThunderX2 does have speculative execution, and is indeed exposed to the Spectre Variant 1 and 2 threats, but is not impacted by the Meltdown Variant 3 threat," a spokesperson for Cavium told us. "Finally, after the Linux patches and system firmware are updated to guard against Spectre, the performance impact of the patches is negligible."
Each core has one, two, or four hardware threads, and there are up to 32 cores per system-on-chip socket. Each core has 32KB of level-one instruction cache, 32KB of level-one data cache, and 256KB of level-two cache, and the cores share 32MB of distributed level-three cache. The CPU cores can be clocked up to 2.5GHz, or 3GHz in turbo mode. There are also two 128-bit NEON floating-point units per core, seemingly matching AVX2 math performance in Broadwell and Haswell Xeon E5 processors.
The ThunderX2 system-on-chip supports up to 4TB of RAM in dual socket mode, using eight 2.67GHz DDR4 controllers and up to 16 DIMMs per socket. These DIMMs can be a mix of RAM and non-volatile memory. The SoC provides 56 lanes of PCIe 3 at x1, x4, x8 and x16, using 14 PCIe controllers. Plus there's the usual SATA3, USB3, and general purpose IO interfaces. A NUMA coherent interconnect dubbed the Cavium Coherent Processor Interconnect keeps the cores glued together using a 600Gbps pipe.
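The memory figures above imply a few numbers the spec sheet doesn't spell out, and a quick sketch makes them explicit. The capacity math uses only the quoted figures, treating 4TB as 4,096GiB. The bandwidth estimate is our own assumption, not a Cavium claim: it reads "2.67GHz" as 2,667 million transfers per second, and assumes one 64-bit (8-byte) channel per controller.

```python
# Quoted in the article.
SOCKETS = 2
DIMMS_PER_SOCKET = 16
CONTROLLERS_PER_SOCKET = 8
MAX_RAM_GIB = 4 * 1024  # 4TB across both sockets, treated here as 4,096GiB

# Implied by the quoted figures.
dimms_per_controller = DIMMS_PER_SOCKET // CONTROLLERS_PER_SOCKET
gib_per_dimm = MAX_RAM_GIB // (SOCKETS * DIMMS_PER_SOCKET)

# Assumptions, not from the article: 2,667 million transfers per second,
# 8 bytes per transfer, and one memory channel per controller.
TRANSFERS_PER_SEC = 2667e6
BYTES_PER_TRANSFER = 8
peak_gb_per_sec_per_socket = (
    CONTROLLERS_PER_SOCKET * TRANSFERS_PER_SEC * BYTES_PER_TRANSFER / 1e9
)

if __name__ == "__main__":
    print(f"DIMMs per controller: {dimms_per_controller}")
    print(f"Largest DIMM implied: {gib_per_dimm}GiB")
    print(f"Theoretical peak memory bandwidth per socket: "
          f"{peak_gb_per_sec_per_socket:.1f}GB/s")
```

In other words, hitting the 4TB ceiling means filling every slot with the largest DIMMs, and the bandwidth number is a theoretical peak that real workloads won't reach.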
There are more than 40 variations of the ThunderX2 to meet various customer demands. Cavium claims its processor is comparable to scalable Skylake Xeons. ®
As we were about to publish this piece, a report emerged out of Bloomberg claiming Qualcomm is considering killing off or flogging its Centriq processor.