Nvidia welcomes Intel into AI era: Fancy a benchmark deathmatch?
We love your deep learning benchmark 'mistakes'
HPC blog Nvidia just fired the first salvo in what promises to be a classic and long-lived benchmark deathmatch against Intel. In a webpage titled "Correcting Intel's Deep Learning Benchmark Mistakes," Nvidia claimed that Intel was using outdated GPU benchmark results and non-current hardware comparisons to show off its new Knights Landing Xeon Phi processors.
Nvidia called out three Intel claims in particular:
"Xeon Phi is 2.3 times faster in training than GPUs." This claim was made in a press presentation delivered at ISC'16 and on an Intel-produced "fact sheet" (PDFs available here and here). It specifically refers to a stat at the left side of slide 12 (and the second page of the fact sheet) where Intel claims Phi is 2.3 times faster on the AlexNet image training on a DNN (deep neural network).
Nvidia alleges that Intel is using 18-month-old AlexNet numbers for Nvidia (based on a Maxwell system), while using farm-fresh numbers for the Intel Phi.
According to Nvidia, its Pascal processors in the same four-accelerator configuration outperform Intel's Phi by 1.9 times. It also claims its new eight-GPU DGX-1, a dedicated DNN training machine, can complete AlexNet training in two hours, outshining the four-Phi system by 5.3 times. Ouch.
"Xeon Phi offers 38 per cent better scaling than GPUs across nodes." This claim also occurs in both of the Intel documents referenced above. In this case, Intel is saying that their Phi systems scale better than GPU-equipped boxes, namely when it comes to 32-way/accelerator configurations.
According to Nvidia, Intel is using four-year-old numbers from Oak Ridge's Titan machine, with its old K20 GPUs and an interconnect inherited from its predecessor, Jaguar, as a comparison to Intel's brand-new, Omni-Path Architecture-connected Phi processors running deep learning workloads.
Nvidia points to results that Baidu published from its speech training workload, which show near-linear GPU scaling not just to 32 nodes, but to 128 nodes. Ouch again.
"Xeon Phi delivers 50 times scaling on 128 nodes." I didn't see this exact claim in the Intel documents, but there were a lot of claims flying around, so I could have missed it. Whether it's there or not, Nvidia responded to it by again pointing to the near-linear Baidu 128-GPU node result. By the by, getting a 50 times speed up by adding 128 times more resources isn't the kind of scalability you write home about, you know?
What's funny to me is that at the end of Nvidia's "correction" webpage, it welcomes Intel to the era of AI, with an additional admonition that "they should get their facts straight." Hmmm.
But I'd like to see something more along the lines of the old, and unpublished, Data General ad welcoming IBM to the minicomputer market. The two-line ad read: "They say that IBM's Entry Into Minicomputers Will Legitimize the Market ... The Bastards Say, Welcome."
As the budding Intel-Nvidia war develops, we're sure to see shots flying back and forth – maybe we'll even see an Informix-Oracle-like billboard fight like we did in the 1990s? The Highway 101 billboard owners should start writing their proposals now... ®