ClearSpeed plots 1 TeraFlop floating point pizza box
1U rocket ship
IDF Floating point whiz ClearSpeed continues to round out its software play ahead of several major product upgrades.
ClearSpeed ships floating point accelerator cards that slot into x86-based systems. The hardware delivers a major performance boost while consuming only around 30 watts of power. Customers in the high performance computing market - organizations such as national labs and oil and gas firms - have shown the most interest in ClearSpeed's kit to date.
ClearSpeed has tried to make it as easy as possible to write software for its chips. Admittedly, however, pushing multi-threaded software onto a unique architecture comes with some challenges.
To that end, ClearSpeed announced this week at the Intel Developer Forum that its CSXL software library will now "provide plug and play acceleration for the most commonly used 64bit level 3 BLAS and LAPACK functions that underpin the foundations of the vast majority of scientific and engineering applications." This builds on ClearSpeed's work to boost application performance while using standard libraries and without requiring changes to the underlying code.
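For readers who don't live in HPC land, the workhorse of level 3 BLAS is DGEMM, the double precision matrix multiply that computes C = alpha*A*B + beta*C. What follows is a minimal pure-Python sketch of that operation, there only to show the sort of call a library can speed up behind an unchanged interface. It is not ClearSpeed's CSXL code or API; the function name and list-of-rows layout are our own assumptions.

```python
# Illustrative reference version of the level 3 BLAS operation DGEMM:
#     C <- alpha * A * B + beta * C
# This is NOT CSXL; it is a plain-Python sketch with a hypothetical signature.

def dgemm(alpha, a, b, beta, c):
    """Return alpha * (a x b) + beta * c, with matrices as lists of rows."""
    m, k, n = len(a), len(b), len(b[0])
    # Check that the shapes are compatible: a is m x k, b is k x n, c is m x n.
    if any(len(row) != k for row in a):
        raise ValueError("a must have as many columns as b has rows")
    if len(c) != m or any(len(row) != n for row in c):
        raise ValueError("c must be m x n")
    out = []
    for i in range(m):
        row = []
        for j in range(n):
            # Dot product of row i of a with column j of b.
            acc = sum(a[i][p] * b[p][j] for p in range(k))
            row.append(alpha * acc + beta * c[i][j])
        out.append(row)
    return out


a = [[1.0, 2.0], [3.0, 4.0]]
b = [[5.0, 6.0], [7.0, 8.0]]
c = [[1.0, 1.0], [1.0, 1.0]]
result = dgemm(2.0, a, b, 0.5, c)
print(result)
```

The point of the "plug and play" pitch is that an application making this kind of call keeps its source as is, and an accelerated library does the heavy lifting underneath.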
ClearSpeed has also opened a new online software developer community. The company hopes the site will serve as a meeting point for developers working on HPC applications.
Lastly, ClearSpeed talked up a research program conducted in conjunction with Intel around crafting software for servers that have both standard x86 chips and accelerators.
While the software bits and pieces are crucial, it's hardware that usually gets Reg readers' mouths watering. Along those lines, ClearSpeed again teamed with Intel at IDF to show off a 16U cluster that notched more than one TeraFlop of performance on the Linpack benchmark. The cluster relied on ClearSpeed's Advance e620 hardware, which can provide up to 80 GigaFlops of peak double precision floating point performance and more than one GigaFlop per watt of sustained Linpack performance.
"The entire cluster had a maximum power consumption of less than 7KW and completed the benchmark in just 14 minutes, half the time required by the non-accelerated system," ClearSpeed said. "The energy used to achieve this TeraFLOP performance was approximately 1.5KWh, costing a mere 15 cents assuming a cost of 10 cents per KWh. With the latest quad core Intel processors, the same performance and energy profile could be compressed into just 10 rack units and cost less than $150,000."
But really, a single TeraFlop in 10U is for the weak.
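For the record, the energy figures ClearSpeed quotes for the 16U cluster hang together. The short sketch below redoes the sums using only the numbers in the statement (under 7KW peak, a 14 minute run, about 1.5KWh, 10 cents per KWh). The variable and function names are ours, and the "implied average draw" is our own back-of-envelope inference rather than anything ClearSpeed reported.

```python
# Sanity check on the quoted Linpack energy figures.
# Inputs come from ClearSpeed's statement; names are illustrative.

MAX_POWER_KW = 7.0     # "less than 7KW", so this is an upper bound
RUN_MINUTES = 14       # benchmark run time
QUOTED_KWH = 1.5       # "approximately 1.5KWh"
PRICE_PER_KWH = 0.10   # dollars, the assumed tariff in the statement


def energy_kwh(power_kw, minutes):
    """Energy in kWh for a constant draw of power_kw over the given minutes."""
    return power_kw * minutes / 60.0


def cost_dollars(kwh, price_per_kwh):
    """Electricity cost in dollars for the given energy."""
    return kwh * price_per_kwh


# Ceiling on energy if the cluster had run flat out at 7 kW throughout.
upper_bound_kwh = energy_kwh(MAX_POWER_KW, RUN_MINUTES)

# Average draw that would yield the quoted 1.5 kWh over 14 minutes.
implied_avg_kw = QUOTED_KWH * 60.0 / RUN_MINUTES

quoted_cost = cost_dollars(QUOTED_KWH, PRICE_PER_KWH)

print(f"Upper bound: {upper_bound_kwh:.2f} kWh")
print(f"Implied average draw: {implied_avg_kw:.2f} kW")
print(f"Cost at quoted energy: ${quoted_cost:.2f}")
```

Running flat out at 7KW for 14 minutes would burn roughly 1.63KWh, so the quoted 1.5KWh implies an average draw of about 6.4KW, comfortably under the stated maximum, and the 15 cent bill follows directly.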
So, ClearSpeed is expected to show a demonstration unit at the November Supercomputing conference in which it delivers one TeraFlop of acceleration via a single 1U appliance-like system. Based on our chat with ClearSpeed CEO Tom Beese, we think this leap forward will come via much smaller accelerator modules that can fit into things such as blade servers.
Intel spent much of IDF bragging about the floating point performance that we'll see with its "Nehalem" processors in 2008. According to Beese, these general purpose Xeon chips will likely show very competitive results versus ClearSpeed's own product.
"We are very clear that we're not meant to be seen as an alternative to a CPU," Beese said. "We are always meant to be complementary to a CPU. In those terms, it's important to remember that our performance per watt will always be much more efficient than a CPU."
So, basically, there's only so much floating point performance you can squeeze out of even a super-charged Xeon server. Anyone who needs more juice in the same amount of space will have to pick up an accelerator such as ClearSpeed's. This is a crucial proposition for companies or labs that can no longer afford to build out their data center space and for those with little extra power to spare.
Beese said that ClearSpeed will likely roll out a major revision of its products within the next year. That hardware should again help the company leap way past even Intel's speediest chips in terms of floating point performance.
ClearSpeed has been at this acceleration game for a long time and now seems to benefit from the broad market interest in co-processors. We're seeing a lot of work being done with FPGAs, graphics chips and tweaked multi-core CPUs aimed at handling very specific software loads, usually in the HPC and media markets.
If nothing else, ClearSpeed can claim a nice lead in this space from a software standpoint, as many of the other accelerator makers struggle to teach coders the ways of their gear. ®