Fellow from AMD ridicules Cell as accelerator weakling

Opteron and GPU will conquer all


IBM's Cell chip will struggle to woo server customers looking to turbocharge certain applications because the part has a fundamental design flaw, according to AMD fellow and acceleration chief Chuck Moore.

Sure, sure. Cell is a multimedia throughput dynamo and its SPEs (Synergistic Processing Elements) are just lovely. "But something happened on the way to the ranch," Moore said, speaking this week to a group of Stanford students. "You have to get going first on the PowerPC chip (inside Cell), and the PowerPC core is too weak to act as the central controller."

Moore presented the Stanford students with a possible vision for the future of computing where general purpose processors will function as a type of gateway, handling older code on their own and then funneling new types of software off to specialized silicon. Not surprisingly, Moore sees AMD's Opteron processor as the perfect general purpose chip and the GPUs produced by the ATI clan - rather than Cell chips - as the preferred accelerators for the specialized jobs.

The plan of attack presented by Moore will sound familiar to those of you following current trends in software and hardware development. The rise of multi-core processors has forced coders to adopt parallel programming methods that spread work across chips with numerous engines. In addition, researchers and companies on the cutting edge of high performance computing are looking at a variety of accelerators, including GPUs and FPGAs, to speed up certain libraries and applications.

Like others, Moore argued that we'll soon run into a major software issue, as too few applications will be able to deal with many-cored chips. Things look okay with two, four and even eight core chips, but we're in real trouble after that.

Some of the main issues will arise with the operating system, which handles a lot of the scheduling jobs.

"If you think about it, the OS has a scheduler in it, and it schedules to multiple cores," Moore said. "So, the OS kind of has a serial component to it. . . At some point, the OS starts to get in the way, and the OS actually becomes the bottleneck."

Accelerators present problems as well, since they're a notorious programming pain for developers more acquainted with things like the x86 instruction set. The Cell chip from IBM, Toshiba and Sony receives a ton of grief for being a programming beast - a fact also highlighted by Moore.

Plenty of people argue that GPUs are just as much of a pain, but Moore sees the graphics chip route as a realistic answer to dealing with tomorrow's software.

His "throughput machine" would include a number of Opteron chips up front to handle existing software and to crunch through single-threaded code. Then, you combine the Opterons with "a large number of small, power-efficient, domain optimized compute offload engines."

On top of all this, you need a better memory system and a better programming model that lives well above the operating system.

"The reason I am working on this right now is that I honestly do believe that new and emerging applications are defining and operating on much larger scale and more abstract data types.

"The way this would look is a traditional host would offload work to these dense compute accelerators. You would go through APIs, or libraries or domain specific libraries in some cases to avoid the heroic programming. You would use a concurrent runtime environment to ease some of the scheduling and resource management issues.

"And out of that what starts to happen - and this is an interesting result - is that today the industry is sort of locked on ISA compatibility. You are either x86 compatible or you are not. But I think this line of thought leads to API and platform level compatibility, which is a really nice result for the entire industry.

"Maybe it is not such a nice result for AMD because we happen to have a very successful franchise with x86. But I think this is just absolutely inevitable. I don't think we can fight it, so we are embracing it."

Overall, Moore argued that these heterogeneous machines combining x86 and GPU processors will make more sense moving forward than the so-called many-cored chips that the likes of Sun and Intel are pursuing, where software is spread across tens or even hundreds of similar cores. Of course, there are tons of software questions that need answers before we can fulfill Moore's vision.

You can catch Moore's speech here. ®
