Nvidia pitches its Tesla hardware as a magical solution for the world’s toughest computing problems. Just move your code that runs well across many processors over to the Tesla boards, and Shazam! You sometimes enjoy 400 per cent improvements in overall performance.
Despite such mind-blowing increases in horsepower, Tesla continues to occupy a space that one could characterize as ultra-niche. Only the brave few have navigated Nvidia’s CUDA programming apparatus to tweak their code for the general purpose graphics processors inside of the Tesla systems.
That ultra-niche, however, may grow into a niche over the coming year thanks to the introduction of more powerful Tesla systems.
Key to the release today of the Tesla-10 Series processor is the presence of 64-bit, double-precision floating point support. This upgrade lets Nvidia take better care of high performance computing customers – those who make heavy use of mathematical operations – who will likely drive Tesla’s early success.
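To see why the HPC crowd cares about double precision, here is a rough sketch in plain Python that adds 0.1 to a running total many times, once with every result rounded to 32 bits and once at 64 bits. It says nothing about how the Tesla hardware does its sums. The helper names `to_float32` and `accumulate` and the loop count are illustrative assumptions, and the 32-bit behaviour is simulated on the CPU with the standard `struct` module.

```python
import struct


def to_float32(x: float) -> float:
    """Round a 64-bit Python float to the nearest 32-bit float value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def accumulate(step: float, count: int, single: bool) -> float:
    """Add `step` to a running total `count` times.

    When `single` is True, the step and every intermediate total are
    rounded to 32 bits, mimicking single-precision arithmetic.
    """
    if single:
        step = to_float32(step)
    total = 0.0
    for _ in range(count):
        total += step
        if single:
            total = to_float32(total)
    return total


COUNT = 100_000          # illustrative loop count, not from the article
exact = COUNT / 10       # the mathematically exact answer: 10000.0

err32 = abs(accumulate(0.1, COUNT, single=True) - exact)
err64 = abs(accumulate(0.1, COUNT, single=False) - exact)

print(f"single-precision error: {err32:.6g}")
print(f"double-precision error: {err64:.6g}")
```

The single-precision total drifts visibly further from the exact answer, and that kind of accumulated error is what long-running numerical codes try to avoid.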
The Tesla-10 Series chip ships with 240 processing cores – up from 128 cores in the previous product. These are not, however, the beefy cores associated with general purpose chips made by Intel, AMD and others. Instead, they’re little babies that have previously just handled graphics jobs.
Overall, the new chip boasts 1.4bn transistors and 1 Teraflop of computing muscle.
That 1 Teraflop figure is up from half a Teraflop with the older Tesla 8 chip. In addition, the new Tesla chip kicks memory support up to 4GB from 1.5GB, and that’s again a key leap forward for placating the HPC crowd.
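For those keeping score, the generational jump can be worked out from the figures quoted above. This is a back-of-the-envelope sketch: the `TeslaChip` class is a made-up container, and the Teraflop numbers are the rounded peak figures from the text, not measured results.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TeslaChip:
    """Hypothetical record holding the headline specs quoted in the article."""
    name: str
    cores: int
    teraflops: float   # rounded peak figure
    memory_gb: float


tesla_8 = TeslaChip("Tesla 8", cores=128, teraflops=0.5, memory_gb=1.5)
tesla_10 = TeslaChip("Tesla 10", cores=240, teraflops=1.0, memory_gb=4.0)

# Ratio of new to old for each headline spec.
ratios = {
    "cores": tesla_10.cores / tesla_8.cores,
    "teraflops": tesla_10.teraflops / tesla_8.teraflops,
    "memory_gb": tesla_10.memory_gb / tesla_8.memory_gb,
}

for spec, value in ratios.items():
    print(f"{spec}: {value:.2f}x")
```

On these numbers the core count rises a little less than twofold while peak Teraflops double, and memory grows by the largest factor of the three.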
The base unit inside of a Tesla chip has been dubbed a Thread Processor Array (TPA). The TPA consists of eight cores, which all have access to a shared memory bank. Nvidia then combines 30 of the TPAs to make a full Tesla 10 chip.
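The layout described above can be modelled in a few lines. This is a toy sketch, not Nvidia's actual numbering scheme: the flat core IDs and the `tpa_of` helper are assumptions made purely for illustration.

```python
CORES_PER_TPA = 8    # from the article: eight cores per TPA
TPAS_PER_CHIP = 30   # from the article: 30 TPAs per Tesla 10 chip


def build_chip(tpas: int, cores_per_tpa: int) -> list[list[int]]:
    """Return a list of TPAs, each a list of hypothetical flat core IDs."""
    return [
        [tpa * cores_per_tpa + core for core in range(cores_per_tpa)]
        for tpa in range(tpas)
    ]


def tpa_of(core_id: int) -> int:
    """Map a hypothetical flat core ID to the index of the TPA that owns it."""
    return core_id // CORES_PER_TPA


chip = build_chip(TPAS_PER_CHIP, CORES_PER_TPA)
total_cores = sum(len(tpa) for tpa in chip)

print(f"total cores: {total_cores}")
# Cores 0 and 7 sit in the same TPA, so they would see the same shared
# memory bank; core 8 belongs to the next TPA along.
print(tpa_of(0), tpa_of(7), tpa_of(8))
```

Thirty TPAs of eight cores each give the 240 cores quoted for the chip, and only cores within the same TPA share a memory bank.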
Those customers looking to get into the Tesla game have a couple of system options. Nvidia has rolled out the S1070 box, which is a 1U unit that contains four of the Tesla 10 chips. So, that’s 960 cores running at 1.5GHz, reaching 4 Teraflops of performance. The system also holds 16GB of memory, has peak memory bandwidth of 408GB/sec and consumes 700 watts.
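The S1070 totals follow from the per-chip figures, as this quick sketch shows. The per-chip bandwidth and watts-per-Teraflop lines are derived values that assume the box's totals split evenly across its four chips, which the article does not state.

```python
CHIPS = 4                    # Tesla 10 chips in one S1070
CORES_PER_CHIP = 240
MEMORY_GB_PER_CHIP = 4
TERAFLOPS_PER_CHIP = 1.0     # rounded peak figure
SYSTEM_BANDWIDTH_GB_S = 408  # peak, whole box
SYSTEM_WATTS = 700           # whole box

total_cores = CHIPS * CORES_PER_CHIP
total_memory_gb = CHIPS * MEMORY_GB_PER_CHIP
total_teraflops = CHIPS * TERAFLOPS_PER_CHIP

# Derived values, assuming an even split across the chips.
bandwidth_per_chip = SYSTEM_BANDWIDTH_GB_S / CHIPS
watts_per_teraflop = SYSTEM_WATTS / total_teraflops

print(f"cores: {total_cores}")
print(f"memory: {total_memory_gb}GB")
print(f"peak: {total_teraflops} Teraflops")
print(f"bandwidth per chip: {bandwidth_per_chip}GB/sec")
print(f"watts per peak Teraflop: {watts_per_teraflop}")
```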
Tale of the Tesla Tape
You’ll need to connect the S1070 to a host server with a general purpose CPU via a pair of PCIe Gen2 cables.
If an entire box isn’t your thing, then Nvidia offers up the C1060, which is a cigarette carton-sized device that plugs into the PCIe slot on a motherboard. This puppy holds a single Tesla 10 chip clocked at 1.33GHz, has 4GB of memory and eats up 160 watts. It also has an on-board fan, which is a bit of a worry if you think about packing a cluster full of these systems. Damn those moving parts!