Nvidia gives Grace Hopper superchip an HBM3e upgrade – sometime next year
641GB of total memory ought to be enough for anybody (and their LLM)
Less than three months after Nvidia's Grace Hopper superchips went into full production, CEO and leather jacket aficionado Jensen Huang this week took to the stage at SIGGRAPH 2023 to unveil an even more capable version of the silicon.
The forthcoming processor shares the same basic DNA as the GH200 – the final form of the Grace Hopper superchip Nvidia announced in early 2022 and showed off at Computex this spring. The device's 72-core Arm Neoverse V2 Grace CPU, its Hopper GPU, and its 900GB/sec NVLink-C2C interconnect all remain unchanged. And, much to our initial confusion, the new chip shares the same GH200 moniker.
Nvidia tells us this is because they're different configurations of the same part – so it's not unlike the 40GB and 80GB versions of the A100 from a few years back. In fact, the memory loadout is the core difference here, at least for now. Instead of the 96GB of HBM3 vRAM and 480GB of LPDDR5x DRAM on the model from this spring, the "next-generation" GH200 features 141GB of HBM3e and 500GB of slower 400GB/sec LPDDR5x. The previous generation used 512GB/sec LPDDR5x DRAM.
A reduction of roughly 22 percent in DRAM bandwidth is pretty substantial. However, what the new GH200 loses in CPU memory bandwidth, it makes up for in vRAM bandwidth to the GPU.
According to Nvidia, the HBM3e memory used in the chip is 50 percent faster than standard HBM3 and is capable of speeds up to 5TB/sec. The larger pool of HBM is also notable as it means customers should be able to fit bigger AI models into fewer systems for inferencing.
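The arithmetic behind that claim is easy to sketch. The back-of-the-envelope estimate below (our own illustration, not anything Nvidia publishes) works out the vRAM needed just to hold a model's weights – real deployments also need room for the KV cache, activations, and runtime overhead:

```python
# Rough estimate of vRAM needed to hold an LLM's weights for inference.
# Illustrative only: ignores KV cache, activations, and framework overhead.

def weight_footprint_gb(params_billion: float, bytes_per_param: float) -> float:
    """Approximate memory in GB for the model weights alone."""
    return params_billion * 1e9 * bytes_per_param / 1e9

# A hypothetical 70-billion-parameter model at two common precisions:
for precision, nbytes in [("FP16", 2), ("FP8", 1)]:
    gb = weight_footprint_gb(70, nbytes)
    print(f"{precision}: ~{gb:.0f}GB of vRAM for weights")
```

By this crude measure, a 70-billion-parameter model at FP16 needs about 140GB for its weights – a squeeze into the new chip's 141GB of HBM3e, where the spring model's 96GB would force you to split the model across two GPUs.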
During his SIGGRAPH keynote Tuesday, Huang talked up a dual-superchip configuration. According to Nvidia, a high speed connection between the chips allows them to function logically as a single CPU and GPU resource with 144 CPU cores, eight petaFLOPS of FP8 performance, and 282GB of HBM3e.
"Pretty much you could take just about any large language model you like and put it into this, and it'll inference like crazy," Huang boasted.
For larger workloads, like AI training, the GH200 can be scaled up to 256 chips in a configuration called the DGX GH200. However, apart from more, faster HBM, not much has changed since Nvidia showed off the cluster this spring. The chip shop says the 24-rack cluster is still capable of delivering an exaFLOP of FP8 performance.
Huang's comments mirror those of AMD. The ability to fit large language models into a single accelerator – or at least a single server – was a major selling point behind AMD's MI300X GPU, announced during the chip house's June datacenter event. That processor boasts an even larger 192GB pool of vRAM – albeit of the slower HBM3 variety.
Until recently, if you needed more than 80GB of vRAM, your options for sticking with Nvidia were limited. You could add a second GPU to the mix – but that's a rather expensive way to add memory if you don't also need the compute. Here, AMD and Intel were somewhat differentiated with their Instinct and GPU Max cards, respectively, available with up to 128GB of HBM.
However, those eager to get their hands on a higher capacity superchip will have to wait. Nvidia says the part should arrive in OEM systems sometime in Q2 2024. Many of these systems are likely to use the MGX server specification, which the processor giant says will accommodate its Grace Hopper superchips in any of 100 different variations. ®