Keeping pace with Nvidia in the GPU wars, Advanced Micro Devices has not only launched its "Lisbon" Opteron 4100 processors but also released the embedded versions of its "Cypress" family of GPUs, a counterpunch to Nvidia's "Fermi" chips used in its Tesla embedded GPUs.
The Cypress GPUs already made their way into the ATI Radeon HD 5870 discrete graphics cards (last October) and the ATI FirePro V8800 graphics cards for high-end workstations (back in April). Today, the Cypress GPUs will be plunked into the third generation of FireStream GPU coprocessors intended for embedded applications where the GPUs do complex math that an x64 can't do without both taking its shoes off and pulling its pants down (if it is male) or lifting its shirt up (if it is female).
The Cypress GPU is no slouch, just like Nvidia's Fermi GPUs. And just as Intel and AMD are fierce competitors that get the best of each other every now and again, the competition between AMD and Nvidia drives innovation forward. The Cypress GPU gets the normal fan-cooled packaging for the Radeon HD and FirePro discrete graphics cards, with the major difference being that the FirePro cards have more video memory. With the FireStream GPU co-processors, the units are equipped with a passive heat sink that allows them to slide into rack and tower servers, creating the hybrid x64-GPU systems that many think will soon become the norm in the HPC arena.
Here's the block diagram laying out the Cypress GPU components:
The Cypress chip has 1,600 SIMD engines and a slew of supporting electronics wrapped around them so they can do math with their clothing still intact. The AMD GPU has full support for the DirectCompute 11 and OpenCL 1.0 graphics and number-crunching protocols embedded in its hardware, and also includes 32-bit atomic operations, flexible 32KB local data shares, 64KB global data shares, global synchronization, and append/consume buffers etched onto its silicon.
With all of its cores working properly, the Cypress GPU can deliver 2.72 teraflops of single-precision and 544 gigaflops of double-precision floating point performance. While there are some workloads that can use single-precision just fine (some life sciences and oil and gas exploration apps, for instance), most flop heads care about double-precision. And in this case, the ATI Cypress GPU can hold its own against the best Fermi that Nvidia has. However, Nvidia makes much of the fact that the ATI GPU does not have error correction on its cores and GDDR memory, and AMD acknowledges that's a feature it needs to add.
Double-precision math is more interesting to a lot of organizations looking to do more flops. The first FireStream embedded GPUs, from October 2006, were glorified Radeon X19XX GPUs with only single-precision math. The FireStream 9170s hit 500 single-precision gigaflops and added double-precision math — albeit substantially less than you might expect.
In the summer of 2008, ATI kicked out the FireStream 9250 (1 teraflops SP and 200 gigaflops DP) and 9270 (1.2 teraflops SP and 240 gigaflops DP) embedded GPUs. The 9250s were single-slot devices with 1GB of GDDR3 graphics memory rated at under 120 watts, while the 9270s were double-slotters with 2GB of faster GDDR5 memory rated at 160 watts. These units had fans, which screwed up the airflow inside of servers and therefore limited their ability to be adopted in HPC clusters. That's why both Nvidia and AMD are going with passive heat sinks with their latest embedded GPUs.
The new entry-level embedded AMD GPU, the FireStream 9350, is the one to go for if you're looking for the best way to put the most flops in a box. With 2GB of GDDR5 graphics memory, 2 teraflops SP and 400 gigaflops DP performance, it is basically twice the GPU of its predecessor, the FireStream 9250. The FireStream 9350 has 1,440 of its SIMD engine cores working — presumably the other 160 are duds — and runs at 700MHz with a memory clock of 1GHz.
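The 9350's ratings can be roughly reproduced from the core count and clock quoted above. The sketch below is back-of-the-envelope arithmetic, not AMD's published method: it assumes each SIMD engine core retires two single-precision flops per clock (one multiply-add) and that double precision runs at one-fifth the single-precision rate. Both assumptions are inferred from the numbers in this article, and the function name is made up for illustration.

```python
def peak_sp_gflops(cores: int, clock_mhz: float, flops_per_clock: int = 2) -> float:
    """Theoretical single-precision peak in gigaflops.

    flops_per_clock=2 is an assumption (one multiply-add per core per clock).
    """
    return cores * flops_per_clock * clock_mhz / 1000.0


# FireStream 9350 figures quoted in the article: 1,440 cores at 700MHz.
sp_9350 = peak_sp_gflops(cores=1440, clock_mhz=700)

# Assumed 1:5 DP-to-SP ratio, inferred from the 2.72 teraflops / 544 gigaflops
# figures given for the full Cypress chip.
dp_9350 = sp_9350 / 5

print(f"9350 SP peak: {sp_9350:.0f} gigaflops")
print(f"9350 DP peak: {dp_9350:.0f} gigaflops")
```

Under those assumptions the result lands at about 2 teraflops SP and 400 gigaflops DP, which matches the ratings quoted for the card.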
The AMD FireStream 9350 Embedded GPU
At 150 watts, the 9350 embedded GPU runs a little hotter than its predecessor, but an extra 30 watts or so to double the performance is a very good Moore's Law trade-off. And equally important, the FireStream 9350, at $799, is cheaper than the 9250 GPU, which cost $999. A teraflops of double-precision performance from FireStream 9250 cards would run you just under $5,000, and with the 9350 GPUs, you're talking just under $2,000 per double-precision teraflops.
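For anyone who wants to check the price/performance math, here is a minimal sketch using only the prices, double-precision ratings, and wattages quoted in this article. The dictionary layout and function names are invented for illustration, and the 9250's 120 watts is the "under 120 watts" ceiling, so its real efficiency may be slightly better than shown.

```python
# Figures as quoted in the article; 120 W for the 9250 is an upper bound.
cards = {
    "FireStream 9250": {"price_usd": 999, "dp_gflops": 200, "watts": 120},
    "FireStream 9350": {"price_usd": 799, "dp_gflops": 400, "watts": 150},
}


def usd_per_dp_teraflops(card: dict) -> float:
    """Dollars needed to buy one teraflops of double-precision peak."""
    return card["price_usd"] / (card["dp_gflops"] / 1000.0)


def dp_gflops_per_watt(card: dict) -> float:
    """Double-precision peak gigaflops per watt of rated power."""
    return card["dp_gflops"] / card["watts"]


for name, card in cards.items():
    print(
        f"{name}: ${usd_per_dp_teraflops(card):,.0f} per DP teraflops, "
        f"{dp_gflops_per_watt(card):.2f} DP gigaflops per watt"
    )
```

The dollar figures come out at roughly $4,995 and $1,998 per double-precision teraflops, which is where the "just under $5,000" and "just under $2,000" numbers come from.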