Congrats, Nvidia and Google: You're still the best (out of five) at training neural networks

ML Perf could do with more entrants' results


Analysis Nvidia and Google continue to dominate in AI hardware, according to the latest benchmarking results from the ML Perf project published this week.

Tech companies all want a little slice of the machine learning pie: Nvidia has rebranded GPUs for AI, Intel is busy trying to tout its CPUs and push its new ASIC known as the NNP-T out on time, and Google – not traditionally known for its hardware – went and built its own accelerator chip too. Several AI hardware startups have also cropped up, like Graphcore and SambaNova.

So a group of hardware nerds from industry and academia got together to create ML Perf, which aims to provide "fair and useful benchmarks for measuring training and inference performance of ML hardware, software, and services".

Despite ML Perf being supported by several big names, it looks like only five companies have bothered submitting results: Google, Nvidia, Intel, Fujitsu, and Alibaba. Notable names like AMD are missing and there's nothing from any of the many AI chip startups either.

It's not a cheap endeavor after all, since it requires renting or buying hundreds or thousands of chips. Or perhaps these companies retracted their results because they don't want to look bad next to their competitors.

"Part of the motivation is that when a submitter makes a mistake, not all mistakes can be fixed after submission. So if a submitter makes a mistake that isn't fixable, they need to be able to retract," David Kanter, co-chair of the MLPerf Inference Working Group, told The Register. When we asked if ML Perf did indeed receive more submissions than what's been made public, he replied: "It's entirely possible."

ML Perf tests different models for four tasks – image recognition, object detection, machine translation, and reinforcement learning. The results are split further into two categories – closed division times and open division times.

The closed division is more useful for comparing hardware or software platforms: submissions are bound by specific rules so that the results are more of an "apples-to-apples" comparison. The open division, however, gives developers more flexibility to reach a particular result.

In the closed division, Google and Nvidia have submitted results for the first three tasks. Intel has only published results for reinforcement learning, and Alibaba has only one submission, in image recognition. So it's difficult to compare them side by side.

The fastest times to train the ResNet-50 architecture on ImageNet, a popular computer vision dataset, were pretty comparable for Nvidia and Google. The Chocolate Factory's TPU3 clocked in at 1.28 minutes running on TensorFlow, and Nvidia's time was 1.33 minutes using Amazon's MXNet framework.

It all requires pretty ridiculous amounts of hardware, however. You'd need to rent 1,024 TPU3s on Google Cloud or have 1,536 Tesla V100s to hand. Alibaba's only entry trailed behind at 24.37 minutes employing 64 of Nvidia's Tesla V100s connected using PCI Express to shuttle data between GPUs and CPUs more quickly. All of that was running on software called Sinian.
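To put those hardware counts in perspective, here is a rough back-of-the-envelope sketch in Python that multiplies each entrant's chip count by its training time. The "chip-minutes" figure is our own illustrative yardstick, not something ML Perf reports, and it assumes all chips were busy for the whole run. TPU3s and Tesla V100s are also different beasts, so treat it as a crude indication only.

```python
# Closed-division ResNet-50 figures as quoted in the article.
# "chip-minutes" is an illustrative metric (an assumption of this sketch),
# not an official ML Perf measurement.
results = {
    "Google (TPU3)": {"chips": 1024, "minutes": 1.28},
    "Nvidia (Tesla V100)": {"chips": 1536, "minutes": 1.33},
    "Alibaba (Tesla V100)": {"chips": 64, "minutes": 24.37},
}


def chip_minutes(chips, minutes):
    """Total chip time consumed, assuming every chip ran for the full duration."""
    return chips * minutes


# Compute the metric for every entrant, rounded to two decimal places.
table = {
    name: round(chip_minutes(r["chips"], r["minutes"]), 2)
    for name, r in results.items()
}

# Print entrants from the smallest to the largest chip-minute total.
for name, total in sorted(table.items(), key=lambda item: item[1]):
    print(f"{name}: {total} chip-minutes")
```

By this crude measure Alibaba's 64-GPU run lands between the two giants, which hints that throwing more chips at the problem buys speed at the cost of efficiency.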

For object detection on the ResNet-34 neural network, Google came out on top at 1.21 minutes, again based on 1,024 TPU3s, and Nvidia was second with 2.23 minutes by cramming together 240 Tesla V100s optimised using PyTorch. Nvidia fared better when it came to training the Neural Machine Translation model, beating Google by roughly 19 seconds using 384 Tesla V100s compared to 1,024 TPU3s. Their times were 1.80 and 2.11 minutes, respectively.
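For a sense of how small these gaps are in wall-clock terms, the sketch below converts the differences to seconds. It assumes the times quoted above are in minutes, like the other results in this piece; the function name is ours, purely for illustration.

```python
def gap_seconds(slower_minutes, faster_minutes):
    """Difference between two training times, converted from minutes to seconds."""
    return (slower_minutes - faster_minutes) * 60


# Times as quoted in the article, assumed to be minutes.
object_detection_gap = gap_seconds(2.23, 1.21)  # Nvidia behind Google
translation_gap = gap_seconds(2.11, 1.80)       # Google behind Nvidia

print(f"Object detection gap: {object_detection_gap:.1f} seconds")
print(f"Translation gap: {translation_gap:.1f} seconds")
```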

But when it came to the Transformer model, Google came out on top at 0.85 minutes, based on 1,024 TPU3s again, ahead of Nvidia's 480 Tesla V100s running on PyTorch.

Intel's only strong point appears to be reinforcement learning. In this category, it entered twice: once using 64 Cascade Lake CLX-8260 chips and once with two Cascade Lake CLX-9282 chips. The first setup managed to train MiniGo, a system that can play the strategy board game Go, in 14.43 minutes, while the second did so in 77.95 minutes.

Nvidia had a slight edge over Intel in this department, however, and trained MiniGo in 13.57 minutes using 24 Tesla V100s running on TensorFlow.

There's only one open division result. Here, Fujitsu trained a ResNet-50 model on the ImageNet dataset using 2,048 Tesla V100s in 1.1 minutes. The developers achieved this by tweaking hyperparameters that were fixed in the closed division, Kanter explained.

So what did we learn from ML Perf? Software tricks can help slash training times, using more chips also trains models more quickly, and Nvidia and Google are leading in AI hardware. Basically, nothing we didn't know already.

If you want a complete breakdown and code from all the results, you can find them here. ®

Biting the hand that feeds IT © 1998–2022