Intel set to squeeze the flops out of Ponte Vecchio GPU

Now this x86 giant just needs to ship a compatible CPU

Hot Chips Intel offered the closest glimpse yet of its flagship datacenter GPU, code-named Ponte Vecchio, at the Hot Chips conference this week, with its own internal benchmarks showing the chip outperforming AMD’s MI250X and competing head-to-head with Nvidia’s upcoming H100 GPU.

Announced last year, Ponte Vecchio is Intel’s first serious run at delivering a high-performance GPU for AI/ML and HPC applications. The chip itself is actually a series of memory and compute dies glued together using a combination of Intel’s Foveros and EMIB packaging tech into a "stack" with two such stacks per accelerator.

According to Intel Fellow Hong Jiang, these stacks can behave like a pair of GPU dies or as a single logical die depending on application needs.

Intel claims Ponte Vecchio will deliver 52 teraflops of peak performance at both FP32 and FP64, the result of a deliberate design choice. That puts it just ahead of AMD’s 47.9Tflops (FP64) MI250X announced last year, and within spitting distance of the H100’s 60Tflops (FP64).
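To put those peak numbers side by side, here is a minimal Python sketch that expresses each chip's quoted FP64 figure as a ratio of another's. The figures are the vendor claims cited above, not measured results, and the names `PEAK_FP64_TFLOPS` and `relative_to` are made up for illustration.

```python
# Peak FP64 figures as quoted in this article, in teraflops.
# These are vendor claims, not independent benchmark results.
PEAK_FP64_TFLOPS = {
    "Intel Ponte Vecchio": 52.0,
    "AMD MI250X": 47.9,
    "Nvidia H100": 60.0,
}


def relative_to(baseline, figures=PEAK_FP64_TFLOPS):
    """Return each chip's quoted peak as a ratio of the baseline chip's peak."""
    base = figures[baseline]
    return {name: value / base for name, value in figures.items()}


if __name__ == "__main__":
    for name, ratio in relative_to("Intel Ponte Vecchio").items():
        print(f"{name}: {ratio:.2f}x")
```

On these numbers the MI250X lands at roughly 0.92x of Ponte Vecchio's claimed peak and the H100 at roughly 1.15x, which is about as far as a peak-flops comparison can be pushed without real workloads.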

Ponte Vecchio ... Up close at Hot Chips. Source: Intel

As a side note, FP32 throughput is normally double that of FP64 because the precision is lower, but Intel chose to limit FP32 performance, keeping it on a par with FP64. Presumably, it thinks FP64 and AI-suitable precision are more important these days than 32-bit floating point.

Intel also discussed the performance of its XMX matrix accelerators at Hot Chips, which are analogous in many respects to Nvidia’s tensor cores. In single-precision matrix calculations — tensor float — Intel says the GPU will deliver 419Tflops of performance.

Some of the performance can be attributed to Ponte Vecchio’s large caches and memory, which include a 64MB register file, 64MB of L1 cache, 408MB of L2 cache, and 128GB of HBM memory.

“This really helps us to keep data on die instead of having to go out to the HBM memory,” Jiang said.

No PCIe 5.0 CPU to match

Both Intel and Nvidia’s GPUs are reliant on PCIe 5.0 for connectivity to the host and that means comparing them to AMD’s PCIe 4.0-based MI200-series GPUs isn’t exactly an apples-to-apples comparison.

The new PCIe spec offers twice the bandwidth to the host, but requires a next-gen CPU from either Intel or AMD, neither of which is available yet.
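For a rough sense of what "twice the bandwidth" means, the sketch below works it out from the per-lane signalling rates in the PCIe 4.0 and 5.0 specs (16GT/s and 32GT/s, both with 128b/130b encoding). The x16 link width is an assumption, since the article does not say how many lanes these accelerators use, and the helper name `link_bandwidth_gbytes` is made up for illustration.

```python
# Per-lane signalling rates in GT/s from the PCIe 4.0 and 5.0 specifications.
GT_PER_SEC = {"4.0": 16.0, "5.0": 32.0}

# Both generations use 128b/130b line encoding.
ENCODING_EFFICIENCY = 128 / 130


def link_bandwidth_gbytes(gen, lanes=16):
    """Approximate bandwidth per direction in GB/s.

    Ignores packet and protocol overhead, so real throughput is lower.
    The default of 16 lanes is an assumption, not a figure from the article.
    """
    return GT_PER_SEC[gen] * ENCODING_EFFICIENCY * lanes / 8


if __name__ == "__main__":
    for gen in ("4.0", "5.0"):
        print(f"PCIe {gen} x16: {link_bandwidth_gbytes(gen):.1f} GB/s per direction")
```

That works out to roughly 31.5GB/s per direction for a PCIe 4.0 x16 link and about 63GB/s for PCIe 5.0 x16, before protocol overhead.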

And while Nvidia could easily opt for AMD’s Epyc 4 chips when they launch this fall or use its own Grace CPUs, Intel appears to be sticking with an all-Intel architecture.

At Hot Chips, execs showed off four liquid-cooled Ponte Vecchio GPUs paired up with two of Intel’s long-delayed Sapphire Rapids Xeon Scalable processors in a 1U chassis. However, Jiang notes that up to eight GPUs can be connected to a single node using the company’s Xe Link fabric.

The latest delay has reportedly pushed Sapphire Rapids back until Q1 2023, more than a year and a half after it was supposed to launch.

As a result, Intel could be left holding a perfectly good GPU while it waits for its CPU division to deliver Sapphire Rapids.

The superchips arrive

By the time Ponte Vecchio actually arrives, the comparison may be far less favorable.

AMD is slated to release its Instinct MI300 accelerators in 2023, which it’s billing as the “first datacenter APUs.”

The chips will feature a Zen 4 processor co-packaged alongside a CDNA 3-based GPU. In a presentation this spring, AMD claimed the APU would deliver an eight-fold performance improvement over the MI250X, though it’s not clear how that will be reflected in real-world performance.

If this sounds familiar, that’s because Intel, Nvidia, and AMD have all been trending in this direction. At GTC this spring, Nvidia revealed its Grace-Hopper Superchip, which pairs its Arm-based Grace CPU with a GH100 GPU, 512GB of LPDDR5X, and 80GB of HBM3 memory on a single 1000W package.

Not to be left out, Intel announced similar plans for its Falcon Shores XPU in May, which will see the chipmaker merge its HBM-equipped Sapphire Rapids CPU and Ponte Vecchio GPU stack into a single package.

Intel claims the platform will provide a five-fold improvement in performance-per-watt, memory capacity, and bandwidth compared with “current platforms.”

Rialto Bridge on the horizon

Ponte Vecchio not only faces competition from Nvidia and AMD, but if held back much longer the chip could find its lifespan cut short by its successor, code-named Rialto Bridge.

We’ve actually seen this happen once before with Intel’s 11th-gen Rocket Lake CPUs, which launched in early 2021 only to be replaced a few months later by the far superior Alder Lake refresh, which offered substantial performance, core-count, and process improvements.

Rialto Bridge, which is supposed to begin sampling next year, will see Intel up the power consumption to 800W per module and will require liquid cooling.

Regardless, there’s at least one customer eagerly awaiting Ponte Vecchio’s arrival: The Department of Energy’s Argonne National Laboratory, which plans to use the chips in its Aurora supercomputer. That is, when it finally arrives after epic Intel-driven delays. ®
