Ampere will today tear the covers off Altra, its 80-core 64-bit Arm N1 processor for cloud and hyperscaler servers.
Meanwhile, Marvell announced additions to its Octeon TX2 family of Arm microprocessors, including a 36-core 64-bit Arm part, and Xilinx will tout its Alveo U25 network card.
These announcements were timed for the start of the Open Compute Project Global Summit, due to be held in Silicon Valley this week – until the COVID-19 coronavirus outbreak forced its cancellation.
Ampere's Altra: This TSMC-fabricated 7nm-node server-grade microprocessor features up to 80 64-bit CPU cores, arranged in a grid-like cache-coherent mesh, consuming up to 210W per package. The Arm-designed N1 cores are compatible with Armv8.2+, clocked up to 3GHz in turbo mode, and feature a four-wide superscalar pipeline with "aggressive" out-of-order execution.
The processor is aimed at performing search, AI inference, and video transcoding, hosting virtual machines and containers, managing data storage, and running databases and web applications. The target market is telcos, public cloud builders, and hyperscalers who pack their data centers with an incredible amount of generic compute, and use orchestration to dynamically deploy software and services to available systems. Jeff Wittich, senior veep of products at Ampere, told us: "We should see adoption in private cloud as well since we've designed for all cloud infrastructures."
Interestingly, Ampere was keen to push its single-thread performance: this is not a processor design that has two or more hardware threads running simultaneously through each CPU core, like AMD's Epyc SMT. Ampere claimed this gives Altra consistent and predictable performance, and avoids noisy neighboring code and pipeline resource contention. This is the sort of thing you'd say to explain why you're not offering simultaneous multi-threading in hardware like AMD is. (Intel's SMT technology is insecure.)
We're told there is 64KB each of L1 instruction and data cache and 1MB of L2 cache per Altra core, two 128-bit SIMD execution units per core, and 32MB of shared L3 cache, all on the monolithic silicon die. Altra supports cache coherency across multiple sockets, apparently.
Each processor sports up to eight 72-bit DDR4-3200 memory channels, delivering more than 200GB/s of peak bandwidth and addressing up to 4TB of RAM per socket: that's four, six, or eight active channels with two DIMMs per channel at 3200MT/s. The mesh design ensures uniform memory latency across the CPU cores, Ampere told us.
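For the curious, the 200GB/s-plus figure falls out of simple arithmetic. The Python sketch below is ours, not Ampere's: it assumes each 72-bit channel carries 64 bits of data plus 8 bits of ECC, as is usual for ECC DDR4, and that DDR4-3200 moves 3,200 million transfers per second. The function names are made up for illustration, and the 256GB DIMM size is simply what the 4TB ceiling implies with 16 slots, not a figure Ampere gave us.

```python
# Back-of-the-envelope memory math for one socket.
# Assumption: 72-bit channel = 64 data bits + 8 ECC bits (standard ECC DDR4).

DDR4_3200_TRANSFERS_PER_S = 3200 * 10**6  # 3,200 MT/s


def peak_bandwidth_gb_per_s(channels, transfers_per_s, data_bits=64):
    """Theoretical peak bandwidth in decimal GB/s across all channels."""
    bytes_per_transfer = data_bits // 8
    return channels * transfers_per_s * bytes_per_transfer / 1e9


def implied_dimm_size_gb(total_tb, channels=8, dimms_per_channel=2):
    """DIMM size (GB) needed to reach total_tb with every slot populated."""
    return total_tb * 1024 / (channels * dimms_per_channel)


if __name__ == "__main__":
    for channels in (4, 6, 8):
        bw = peak_bandwidth_gb_per_s(channels, DDR4_3200_TRANSFERS_PER_S)
        print(f"{channels} channels: {bw:.1f} GB/s peak")
    print(f"DIMM size implied by 4TB: {implied_dimm_size_gb(4):.0f} GB")
```

These are theoretical peaks; sustained bandwidth in real workloads will be lower.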
For IO, the Altra offers 128 lanes of PCIe 4 from a single socket, or 192 in a two-socket system, and support for up to four CCIX-based accelerators. For booting operating systems, the hardware is Server Base System Architecture level-four compliant and features secure boot mechanisms. There's also DRAM RAS error reporting, power management and temperature control, acceleration for INT8 and FP16 math for AI inference, AES and SHA-256 acceleration, and the other usual bits and pieces.
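To put those lane counts in perspective, here is a rough Python sketch of the IO arithmetic. It relies on the PCIe 4 signalling rate of 16GT/s per lane with 128b/130b encoding. The reading that the lanes missing from a dual-socket box are given over to the socket-to-socket link is our assumption, not something Ampere spelled out, and the helper names are invented for this example.

```python
# Rough PCIe 4 math for the lane counts quoted above.

PCIE4_TRANSFERS_PER_S = 16e9      # 16 GT/s per lane
ENCODING_EFFICIENCY = 128 / 130   # 128b/130b line encoding


def lane_gb_per_s():
    """Usable bandwidth of one PCIe 4 lane, per direction, in GB/s."""
    return PCIE4_TRANSFERS_PER_S * ENCODING_EFFICIENCY / 8 / 1e9


def total_gb_per_s(lanes):
    """Aggregate per-direction bandwidth for a given number of lanes."""
    return lanes * lane_gb_per_s()


def lanes_not_exposed(lanes_per_socket, sockets, system_lanes):
    """Lanes that exist on the sockets but are not offered to devices.

    Assumption: these are repurposed for the inter-socket link.
    """
    return lanes_per_socket * sockets - system_lanes


if __name__ == "__main__":
    print(f"Per lane:  {lane_gb_per_s():.2f} GB/s each way")
    print(f"128 lanes: {total_gb_per_s(128):.0f} GB/s each way")
    print(f"192 lanes: {total_gb_per_s(192):.0f} GB/s each way")
    print(f"Lanes not exposed in 2P: {lanes_not_exposed(128, 2, 192)}")
```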
In terms of performance, Ampere claimed an 80-core-per-socket dual-socket Altra overclocked to 3.3GHz is on a par with a 2.25GHz 64-core-per-socket (128 threads per socket) dual-socket AMD Epyc 7742, in terms of estimated SPECrate2017_int benchmarks. And that's estimated because Ampere scaled the AMD score by 0.835 due to its use of the AMD64 compiler suite to build the benchmarking code versus GCC 8.2 used for the Altra benchmark build. AMD's C/C++ compiler produces more optimized code than GCC for Arm, you see.
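The adjustment Ampere describes is a straight multiplication, and the thread counts follow from the core and SMT figures above. The Python sketch below shows both. The `System` class and `scaled_score` helper are hypothetical names of ours, and the raw score in the usage lines is a placeholder, not a real benchmark result.

```python
from dataclasses import dataclass

COMPILER_SCALE = 0.835  # factor Ampere applied to the AMD result


@dataclass
class System:
    name: str
    sockets: int
    cores_per_socket: int
    threads_per_core: int

    @property
    def hardware_threads(self):
        """Total hardware threads visible to the operating system."""
        return self.sockets * self.cores_per_socket * self.threads_per_core


def scaled_score(published_score, factor=COMPILER_SCALE):
    """Scale a published score down to estimate a GCC-built equivalent."""
    return published_score * factor


altra = System("Ampere Altra", sockets=2, cores_per_socket=80, threads_per_core=1)
epyc = System("AMD Epyc 7742", sockets=2, cores_per_socket=64, threads_per_core=2)

if __name__ == "__main__":
    for box in (altra, epyc):
        print(f"{box.name}: {box.hardware_threads} hardware threads")
    placeholder = 1000.0  # hypothetical raw score, for illustration only
    print(f"{placeholder} scaled: {scaled_score(placeholder):.1f}")
```

So the comparison pits 160 single-threaded cores against 256 hardware threads, with the AMD figure marked down by 16.5 per cent before the two are lined up.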
We also note the mid-2019 Epyc sports 256MB of L3 cache versus the 32MB in the Altra. You probably will want to evaluate your workloads on one or two Altra parts before getting too excited.
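To see how lopsided that is per core, a quick Python sketch using only the numbers quoted in this article. The helper names are ours, and it deliberately ignores how each chip actually partitions its L3 between cores.

```python
# L3 cache available per core, using the figures quoted above.


def l3_mb_per_core(l3_mb, cores):
    """Naive even split of shared L3 across all cores in a socket."""
    return l3_mb / cores


def altra_l2_plus_l3_mb(cores=80, l2_mb_per_core=1, l3_mb=32):
    """Total L2 plus L3 on one Altra die, in MB."""
    return cores * l2_mb_per_core + l3_mb


if __name__ == "__main__":
    altra = l3_mb_per_core(32, 80)
    epyc = l3_mb_per_core(256, 64)
    print(f"Altra: {altra:.1f} MB of L3 per core")
    print(f"Epyc:  {epyc:.1f} MB of L3 per core")
    print(f"Altra L2 + L3 total: {altra_l2_plus_l3_mb()} MB")
```

That works out to roughly a tenfold gap in L3 per core, which is why cache-hungry workloads are worth testing before you commit.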
Meanwhile, Ampere's next-gen 7nm Mystique chip is in development for 2021, we're told, and 2022's 5nm Siryn is being sketched out now. In 2018, the biz touted its 32-core eMAG processor, based on the X-Gene blueprints it acquired from Applied Micro. You can read more about Ampere's history and approach right here and here, on our sister site The Next Platform.
Ampere assured us Altra can boot the usual main GNU/Linux distributions – Ubuntu, CentOS, Red Hat, SUSE, Debian, and so on – plus FreeBSD and Windows Server. The chipset and stack support virtualization, and Docker and Kubernetes containerization. We asked Ampere to explain how it has tackled any Spectre-style side-channels lurking in its out-of-order execution pipeline. "We have full hardware mitigation for Spectre and Meltdown designed into Ampere Altra," Wittich replied.
The Altra is sampling now, and expected to ship in mid-2020. Two server platforms – the dual-socket Mt Jade, and the single-socket Mt Snow – will also be available in 1U and 2U form.
Xilinx's Alveo U25: This is a so-called SmartNIC: it's a PCIe FPGA board that can be configured using high-level C-like or low-level hardware design languages to observe and manipulate network packets flowing through it. That means you can build a customized network adapter that inspects and tweaks data in transit at the chip-level for whatever you need.
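As a flavor of what "inspects and tweaks data in transit" means, here is a software analogue in Python. It is emphatically not how you program the U25, which uses FPGA toolchains rather than anything shown here: it simply parses a 20-byte IPv4 header, knocks one off the TTL, and recomputes the header checksum in the RFC 1071 style. The function names are ours, and it assumes a header with no options.

```python
import struct

IPV4_HEADER_FORMAT = "!BBHHHBBH4s4s"  # 20-byte header, no options


def ipv4_checksum(header: bytes) -> int:
    """Ones'-complement checksum over an even-length header."""
    total = 0
    for i in range(0, len(header), 2):
        total += (header[i] << 8) | header[i + 1]
    while total >> 16:  # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def decrement_ttl(header: bytes) -> bytes:
    """Return a copy of the header with TTL - 1 and a fresh checksum."""
    fields = list(struct.unpack(IPV4_HEADER_FORMAT, header))
    if fields[5] == 0:
        raise ValueError("TTL is already zero")
    fields[5] -= 1   # index 5 is the TTL byte
    fields[7] = 0    # index 7 is the checksum; zero it before summing
    draft = struct.pack(IPV4_HEADER_FORMAT, *fields)
    fields[7] = ipv4_checksum(draft)
    return struct.pack(IPV4_HEADER_FORMAT, *fields)


if __name__ == "__main__":
    # Sample header: 192.168.0.1 -> 192.168.0.199, UDP, TTL 64
    sample = bytes.fromhex("450000730000400040 11b861c0a80001c0a800c7")
    tweaked = decrement_ttl(sample)
    print("TTL before:", sample[8], "after:", tweaked[8])
    print("checksum valid:", ipv4_checksum(tweaked) == 0)
```

A SmartNIC does this sort of thing in hardware at line rate, sparing the host CPU the work.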
Under the hood, there's a Zynq UltraScale+ XCU25 FPGA with four Arm Cortex-A53 CPU cores, an XtremeScale Ethernet controller, 1GB of 40-bit and 2GB of 72-bit DDR4-2666 RAM, a PCIe 3 interface, and two 10/25GbE SFP28 DA copper or optical transceivers. It features the Onload technology acquired with Solarflare, which allows applications to access the NIC directly rather than run everything through the full TCP/IP stack and kernel code. There's also virtualization support, and lots of other bits and pieces. It works with Red Hat Enterprise Linux and variants, at least.
Xilinx also announced its XtremeScale X2562 10/25GbE adapter card in the Open Compute Project specification 3.0 form factor.
Marvell's Octeon TX2: Marvell has expanded its line-up of Octeon TX2 Arm-compatible processors. These are based on the ThunderX2 designs acquired with Cavium, which itself got the blueprints from Broadcom back when they were known as Vulcan. You can catch up on more of that history right here.
These processors are aimed at networking infrastructure – think switches, gateways, monitoring equipment, smart NICs, and 5G base stations. The additions are split into two groups:
The CN913x family features four stock Arm Cortex-A72 CPU cores clocked up to 2.2GHz, with 48KB of instruction and 32KB of data L1 cache per core, two blocks of 512KB L2 cache, and 1MB of L3 cache. There's support for PCIe 3, combinations of 1, 5, and 10GbE network ports, SATA and USB interfaces, and other bits and bytes.
Meanwhile, the CN92xx, CN96xx, and CN98xx are beefier, and aimed at packet inspection, switching, and other more demanding tasks. They sport 12 to 36 64-bit Armv8.2 Octeon TX2 CPU cores clocked up to 2.4GHz, with as much as 66KB of instruction and 41KB of data L1 cache per core, up to 29MB of L2 and L3 cache per chip, combinations of 10 to 100GbE networking, PCIe 4 and DDR4 with ECC support, a security coprocessor, and various peripherals.
We're told the CN913x, CN92xx, and CN96xx are available now, along with development kits and reference designs, whereas the CN98xx will sample next quarter. ®