Nvidia blows out Moore’s Law with fresh Tesla

Insane horsepower for the HPC geek on the go


Nvidia pitches its Tesla hardware as a magical solution for the world’s toughest computing problems. Just move your code that runs well across many processors over to the Tesla boards, and Shazam! You sometimes enjoy 400 per cent improvements in overall performance.

Despite such mind-blowing increases in horsepower, Tesla continues to occupy a space that one could characterize as ultra-niche. Only the brave few have navigated Nvidia’s CUDA programming apparatus to tweak their code for the general-purpose graphics processors inside the Tesla systems.

That ultra-niche, however, may grow into a niche over the coming year thanks to the introduction of more powerful Tesla systems.

Key to today’s release of the Tesla 10-series processor is the addition of 64-bit, double-precision floating-point support. This upgrade lets Nvidia take better care of high performance computing customers – those who make heavy use of mathematical operations – who will likely drive Tesla’s early success.
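Why does the HPC crowd care? Single precision keeps only about seven significant decimal digits, which isn’t enough for long chains of scientific arithmetic. A quick sketch in Python (which natively stores floats in double precision; the round-trip below simulates a 32-bit value) shows the loss:

```python
import struct

def as_float32(x):
    # Round-trip a value through IEEE 754 single precision,
    # the best the older Tesla hardware could manage
    return struct.unpack('f', struct.pack('f', x))[0]

print(repr(0.1))              # double precision: 0.1
print(repr(as_float32(0.1)))  # single precision: 0.10000000149011612
```

Errors like that seventh-digit wobble compound across millions of operations, which is why double-precision support matters for simulation workloads.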

The Tesla 10-series chip ships with 240 processing cores – up from 128 cores in the previous product. These are not the beefy cores associated with general-purpose chips from Intel, AMD, and others, though. Instead, they’re little babies that have previously just handled graphics jobs.

Overall, the new chip boasts 1.4bn transistors and 1 Teraflop of computing muscle.

That 1 Teraflop figure is up from half a Teraflop with the older Tesla 8 chip. In addition, the new Tesla chip kicks memory support up to 4GB from 1.5GB, and that’s again a key leap forward for placating the HPC crowd.

The base unit inside of a Tesla chip has been dubbed a Thread Processor Array (TPA). The TPA consists of eight cores, which all have access to a shared memory bank. Nvidia then combines 30 of the TPAs to make a full Tesla 10 chip.

Customers looking to get into the Tesla game have a couple of system options. Nvidia has rolled out the S1070 box, a 1U unit that contains four Tesla 10 chips. That’s 960 cores running at 1.5GHz, reaching 4 Teraflops of performance. The system also holds 16GB of memory, has peak memory bandwidth of 408GB/sec, and consumes 700 watts.
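The S1070 figures are just four chips added together, and the teraflop number also roughly falls out of the clock speed if you assume three single-precision operations per core per clock (a multiply-add plus a multiply) – our assumption here, not an Nvidia-published formula:

```python
CHIPS = 4
CORES_PER_CHIP = 240
CLOCK_HZ = 1.5e9           # 1.5GHz, per the S1070 spec
FLOPS_PER_CORE_CLOCK = 3   # assumed: dual-issue multiply-add plus multiply

cores = CHIPS * CORES_PER_CHIP        # 960 cores in the 1U box
memory_gb = CHIPS * 4                 # 16GB total
peak_tflops = cores * CLOCK_HZ * FLOPS_PER_CORE_CLOCK / 1e12

print(cores, memory_gb, round(peak_tflops, 2))  # 960 16 4.32
```

The estimate lands slightly above the quoted 4 Teraflops, which is typical of theoretical-peak arithmetic versus marketing round numbers.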

Tale of the Tesla Tape: comparison slide of Nvidia's old and new Tesla gear

You’ll need to connect the S1070 to a host server with a general purpose CPU via a pair of PCIe Gen2 cables.

If an entire box isn’t your thing, then Nvidia offers up the C1060, a cigarette-carton-sized device that plugs into a PCIe slot on a motherboard. This puppy holds a single Tesla 10 chip clocked at 1.33GHz, has 4GB of memory, and eats up 160 watts. It also has an on-board fan, which is a bit of a worry if you think about packing a cluster full of these systems. Damn those moving parts!

