Ampere, Nvidia's latest GPU architecture, is finally here – spanking-new acceleration for AI across the board

Your guide to the A100

Video Nvidia has lifted the lid on a fresh line of products based on its latest Ampere architecture, revealing the A100 GPU - which promises up to 20X the AI performance of its predecessor and is capable of powering AI supercomputers - as well as a smaller chip for running machine-learning workloads on IoT devices.

CEO Jensen Huang normally launches new goodies during the company's annual GPU Technology Conference, which was due to take place from 22 to 26 March. But the event, expected to draw some 10,000 people to Silicon Valley's San Jose McEnery Convention Center, was cancelled as the novel coronavirus spread across the world. Instead, Huang pre-recorded his keynote speech and stuck it up as a YouTube video.

The announcements made this year are particularly noteworthy as Nvidia has finally introduced its new architecture, codenamed Ampere, which overtakes previous generations as the most powerful set of GPUs available yet. Here are the main highlights:

Presenting the A100 - the core of the latest Ampere build

The chip at the heart of Nvidia’s efforts to build AI supercomputers and servers capable of training giant neural networks or crunching through computationally intensive machine learning workloads is the A100. You can get the full corporate view of it here:

YouTube Video

Each one is packed with 54 billion transistors, making it the world's largest 7nm chip, fabricated by TSMC. The A100 has 40GB of HBM2 memory - that's 8GB more than the previous Tesla Volta V100 - and a memory bandwidth of 1.6TB per second. It delivers up to 312 TFLOPS at TF32 precision when training AI models, and 1,248 TOPS at INT8 for running inference. At those numbers, that's a 20X performance boost compared to the V100. It can also hit 9.7 TFLOPS at 64-bit FP, at a maximum power draw of 400 Watts.
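As a back-of-the-envelope check, the 20X claim lines up if you compare the A100's sparsity-enabled Tensor Core figures against the commonly quoted V100 numbers - the V100 baselines below are our assumption, not part of Nvidia's announcement:

```python
# Rough check of the "20X" claim; the V100 baselines are assumed,
# not taken from Nvidia's announcement.
a100_tf32_tflops = 312    # A100 TF32 Tensor Core throughput (with sparsity)
v100_fp32_tflops = 15.7   # V100 standard FP32 throughput (assumed baseline)
a100_int8_tops = 1248     # A100 INT8 Tensor Core throughput (with sparsity)
v100_int8_tops = 62.4     # V100 INT8 throughput (assumed baseline)

print(round(a100_tf32_tflops / v100_fp32_tflops, 1))  # ~19.9x for training
print(round(a100_int8_tops / v100_int8_tops, 1))      # 20.0x for inference
```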


Nvidia's A100 - click to enlarge.

“NVIDIA A100 GPU is a 20X AI performance leap and an end-to-end machine learning accelerator – from data analytics to training to inference,” Huang said. “For the first time, scale-up and scale-out workloads can be accelerated on one platform. NVIDIA A100 will simultaneously boost throughput and drive down the cost of data centers.”

By “scale-up”, Huang is referring to using multiple A100 chips to build servers for high-performance computing. On the other hand, “scale-out” means splitting them up to carry out smaller, independent workloads for inference - more on this later.

Okay, back to just the single A100 chip: the die on the board measures 826mm², and contains 432 TF32-capable Tensor Cores able to handle a range of precisions, including FP32 and FP16, with an interconnect bandwidth of 600GB per second. There are also a couple of new and improved features, including something called multi-instance GPU (MIG) and structural sparsity.
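For the curious, TF32 keeps FP32's full 8-bit exponent but cuts the mantissa from 23 bits to 10, which is why it can stand in for FP32 during training with little accuracy loss. A minimal sketch of that rounding in plain Python - our illustration, not Nvidia code:

```python
import struct

def tf32_round(x: float) -> float:
    """Illustrative only: mimic TF32 by truncating an FP32 mantissa
    from 23 bits down to 10, keeping the full 8-bit exponent."""
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    bits &= ~((1 << 13) - 1)  # drop the 13 low mantissa bits
    return struct.unpack(">f", struct.pack(">I", bits))[0]

print(tf32_round(3.14159265))  # 3.140625 - pi survives to ~3 decimal places
```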

MIG enables a single A100 to be split into up to seven separate GPU instances, each of which can carry out compute jobs of various sizes to optimise usage. Structural sparsity helps researchers carry out matrix math operations up to two times more quickly when the matrices are sparse - meaning the arrays contain a lot of zeroes, which would otherwise waste memory and compute.
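Ampere's sparsity support is built around a 2:4 structured pattern: in every group of four weights, at most two are non-zero, so the hardware can skip the rest. A toy sketch of that pruning rule - our illustration, not Nvidia's implementation:

```python
def prune_2_of_4(row):
    """Zero out the two smallest-magnitude values in each group of
    four, producing the 2:4 pattern Ampere's sparse Tensor Cores expect."""
    out = []
    for i in range(0, len(row), 4):
        group = row[i:i + 4]
        keep = sorted(range(len(group)), key=lambda j: abs(group[j]),
                      reverse=True)[:2]
        out.extend(v if j in keep else 0.0 for j, v in enumerate(group))
    return out

print(prune_2_of_4([0.1, -2.0, 0.3, 4.0, 1.0, 0.0, -0.2, 0.5]))
# [0.0, -2.0, 0.0, 4.0, 1.0, 0.0, 0.0, 0.5]
```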

The big eight - DGX

What happens when you stick eight A100s together? Well, you get the DGX A100 system to train models on particularly large datasets or for supercomputing clusters.

The DGX A100 is beefed up with 320GB of HBM2 memory to deliver five petaflops of AI performance, with an aggregate memory bandwidth of 12.4TB per second. The eight A100s are connected using six NVSwitch interconnects that support 4.8TB per second of bi-directional bandwidth. It also employs Nvidia Mellanox ConnectX-6 HDR interfaces so the system can be hooked up to other networks at a speed of 3.6TB per second.

Each DGX A100 can be operated as one single large system or split into as many as 56 instances - eight GPUs with seven MIG slices apiece. Servers typically contain clusters of different types for storage, CPUs, training, and inference; some are over-utilised and some are starved at different times of the day, Paresh Kharya, director of product marketing for Accelerated Computing at NVIDIA, said during a press briefing on Wednesday.


Eight become one - click to enlarge.

Nvidia claimed that a single rack of five DGX A100s could replace an entire data center's worth of separate training and inference systems at “1/20th the power, 1/25th the space and 1/10th the cost.” “That's why Jensen says ‘the more you buy, the more you save’,” Kharya gushed.

Its first customer is the Argonne National Laboratory, a US Department of Energy research facility in Illinois, where researchers are using supercomputers to combat COVID-19.

“The compute power of the new DGX A100 systems coming to Argonne will help researchers explore treatments and vaccines and study the spread of the virus, enabling scientists to do years’ worth of AI-accelerated work in months or days,” said Rick Stevens, Argonne’s associate laboratory director for Computing, Environment and Life Sciences.

The DGX A100 is available now at a cost of $199,000.

Nvidia also has orders from national research facilities in other countries, including the UAE Artificial Intelligence Office in the United Arab Emirates and VinAI Research in Vietnam.

Now this is POD racing

If you’re looking for even more computational power, then there’s the option of Nvidia’s DGX SuperPOD made up of 140 DGX A100 systems.

The behemoth cluster can reach a performance of 700 petaflops, making it equivalent to a top-20 supercomputer. In fact, Nvidia has updated its own internal supercomputer, known as SaturnV, with four DGX SuperPODs - 560 DGX A100 systems, or 4,480 A100s - to add another 2.8 exaFLOPS of power. SaturnV, made up of multiple clusters in various locations, can now operate at a total capacity of 4.6 exaFLOPS. Kharya said it was the world's fastest AI supercomputer today.
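The SuperPOD maths can be cross-checked from the per-system figures quoted earlier:

```python
# Cross-checking the SuperPOD figures from the per-system numbers
dgx_per_superpod = 140
petaflops_per_dgx = 5   # each DGX A100 delivers five petaflops

superpod_petaflops = dgx_per_superpod * petaflops_per_dgx
print(superpod_petaflops)                           # 700 petaflops per SuperPOD

superpods_added = 4
print(superpods_added * superpod_petaflops / 1000)  # 2.8 exaFLOPS added to SaturnV
```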


Supersize me - click to enlarge.

EGX A100

From the very large DGX SuperPOD, Nvidia jumps back down to the small EGX A100, made to process data coming in from sensors on IoT devices, whether it's a camera or a smart refrigerator.

The EGX A100 can receive up to 200Gb per second of data. The new architecture also allows data from IoT sensors to remain encrypted until it reaches the GPU for processing, making it more secure for applications in healthcare or retail.

“The fusion of IoT and AI has launched the ‘smart everything’ revolution,” said Huang. “Large industries can now offer intelligent connected products and services like the phone industry has with the smartphone. NVIDIA’s EGX Edge AI platform transforms a standard server into a mini, cloud-native, secure, AI data center. With our AI application frameworks, companies can build AI services ranging from smart retail to robotic factories to automated call centers.”

A spokesperson told The Register: “We’re not ready to disclose the full specs of these NVIDIA Ampere-class GPUs. We’ve announced the architecture with NVIDIA Ampere GPUs and Mellanox ConnectX-6 Dx so we can begin to engage the software ecosystem and use the new security and edge features of these cards. We’ll share more specs as we get closer to shipping.”

It will be available to customers at the end of the year.

Finally, the newest and smallest chip Nvidia has to offer is the credit card-sized EGX Jetson Xavier NX for microservers.

Each module can pack up to 21 TOPS when operating at 15 Watts, or 14 TOPS at 10 Watts, and is also made to quickly analyse data coming in from IoT sensors. They are available to order now. ®

Biting the hand that feeds IT © 1998–2022