Congrats, Nvidia and Google: You're still the best (out of five) at training neural networks

ML Perf could do with more entrants' results

Analysis Nvidia and Google continue to dominate in AI hardware, according to the latest benchmarking results from the ML Perf project published this week.

Tech companies all want a little slice of the machine learning pie: Nvidia has rebranded GPUs for AI, Intel is busy trying to tout its CPUs and push its new ASIC known as the NNP-T out on time, and Google – not traditionally known for its hardware – went and built its own accelerator chip too. Several AI hardware startups have also cropped up, like Graphcore and SambaNova.

So a group of hardware nerds from industry and academia decided to get together to create ML Perf for "fair and useful benchmarks for measuring training and inference performance of ML hardware, software, and services".

Despite ML Perf being supported by several big names, it looks like only five companies have bothered submitting results: Google, Nvidia, Intel, Fujitsu, and Alibaba. Notable names like AMD are missing and there's nothing from any of the many AI chip startups either.

It's not a cheap endeavor, after all: submitting results means renting or buying hundreds or thousands of chips. Or perhaps some companies did submit and then retracted their results because they didn't want to look bad next to their competitors.

"Part of the motivation is that when a submitter makes a mistake, not all mistakes can be fixed after submission. So if a submitter makes a mistake that isn't fixable, they need to be able to retract," David Kanter, co-chair of the MLPerf Inference Working Group, told The Register. When we asked if ML Perf did indeed receive more submissions than what's been made public, he replied: "It's entirely possible."

ML Perf tests different models across four tasks – image classification, object detection, machine translation, and reinforcement learning. The results are split further into two categories – closed division times and open division times.

Closed division is more useful to compare hardware or software platforms and submissions are bound by specific rules so that the results are more of an "apples-to-apples" comparison. The open division, however, gives developers more flexibility to reach a particular result.

In the closed division, Google and Nvidia submitted results for the first three tasks. Intel only published results for reinforcement learning, and Alibaba has a single submission in image classification. So it's difficult to compare them side by side.

The fastest times to train on ImageNet, a popular computer vision dataset, using the ResNet-50 architecture were pretty comparable for Nvidia and Google. The Chocolate Factory's TPU3 clocked in at 1.28 minutes running on TensorFlow, and Nvidia's best was 1.33 minutes using the Amazon-backed MXNet framework.

It all requires pretty ridiculous amounts of hardware, however: you'd need to rent 1,024 TPU3s on Google Cloud or have 1,536 Tesla V100s to hand. Alibaba's only entry trailed behind at 24.37 minutes, employing 64 of Nvidia's Tesla V100s connected over PCI Express to shuttle data between GPUs and CPUs more quickly, all running on software called Sinian.
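For a rough sense of what those time-to-train figures imply, here is a back-of-the-envelope throughput calculation. Note the epoch count is purely illustrative: ML Perf closed-division runs train to a target accuracy rather than a fixed number of passes over the data, so the real figure isn't published here.

```python
# Back-of-the-envelope throughput implied by a time-to-train result.
# ASSUMPTION: the run makes `epochs` full passes over ImageNet's
# ~1.28M training images; the real benchmark trains to a target
# accuracy, so this epoch count is hypothetical.

IMAGENET_TRAIN_IMAGES = 1_281_167

def implied_throughput(minutes: float, chips: int, epochs: int = 64):
    """Return (aggregate images/sec, images/sec per chip)."""
    seconds = minutes * 60
    total_images = epochs * IMAGENET_TRAIN_IMAGES
    aggregate = total_images / seconds
    return aggregate, aggregate / chips

# Google's ResNet-50 entry: 1.28 minutes on 1,024 TPU3s
agg, per_chip = implied_throughput(1.28, 1024)
print(f"~{agg:,.0f} images/sec in aggregate, ~{per_chip:,.0f} per chip")
```

Even with a made-up epoch count, the arithmetic makes the point: finishing in around a minute means pushing on the order of a million images per second through the cluster.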

For object detection with the ResNet-34 neural network, Google came out top at 1.21 minutes, again based on 1,024 TPU3s, and Nvidia was second with 2.23 minutes by cramming together 240 Tesla V100s running PyTorch. Nvidia fared better when it came to training the Neural Machine Translation model, beating Google by tens of seconds using 384 Tesla V100s compared to 1,024 TPU3s. Their times were 1.80 and 2.11 minutes respectively.

But when it came to the Transformer model, Google came out top at 0.85 minutes, based on 1,024 TPU3s again, compared to Nvidia's effort with 480 Tesla V100s running PyTorch. Intel's only strong point appears to be reinforcement learning. In this category, it entered twice: once using 64 Cascade Lake CLX-8260 chips and once with two Cascade Lake CLX-9282 chips. The first setup managed to train MiniGo, a system that can play the strategy board game Go, in 14.43 minutes, while the second took 77.95 minutes.

Nvidia had a slight edge over Intel in this department, however, and trained MiniGo in 13.57 minutes using 24 Tesla V100s running on TensorFlow.

There's only one open division result. Here, Fujitsu trained a ResNet-50 model on the ImageNet dataset using 2,048 Tesla V100s in 1.1 minutes. The developers achieved this by tweaking hyperparameters that were fixed in the closed division, Kanter explained.
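One common tweak of this sort – purely an illustration, since the article doesn't say which hyperparameters Fujitsu actually changed – is the linear scaling rule: when the global batch size grows with the number of chips, the learning rate is scaled up proportionally, usually with a warm-up ramp to keep early training stable. The base values below are hypothetical.

```python
# Sketch of the linear scaling rule for large-batch training.
# ASSUMPTION: a generic illustration, not Fujitsu's actual recipe;
# base_lr/base_batch values are hypothetical.

def scaled_lr(base_lr: float, base_batch: int, global_batch: int,
              step: int, warmup_steps: int) -> float:
    """Scale the learning rate with batch size, with linear warm-up."""
    target = base_lr * (global_batch / base_batch)  # linear scaling rule
    if step < warmup_steps:
        return target * (step + 1) / warmup_steps   # ramp up from near zero
    return target

# e.g. a base LR of 0.1 at batch 256, scaled for a global batch of 65,536
print(scaled_lr(0.1, 256, 65_536, step=10_000, warmup_steps=5_000))
```

Freedoms like this are exactly what the open division permits and the closed division forbids, which is why the two sets of times aren't directly comparable.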

So what did we learn from ML Perf? Software tricks can help slash training times, using more chips also trains models more quickly, and Nvidia and Google are leading in AI hardware. Basically, nothing we didn't know already.

If you want a complete breakdown and code from all the results, you can find them here. ®
