Cray confirms that the MTA-2 multithreaded supercomputer, one of the most interesting and unusual machines ever built, has indeed been sidelined by the company.
"The R&D development level has gone down," spokesman Steve Conway told The Register.
Will there be an MTA-3?
"That will all be determined by the market. We've shipped two of the MTA-2s, one to a Japanese customer. It all depends on the market," said Conway.
The MTA represents over twenty years of pioneering work in parallel processing, and its ideas inspired today's SMT chips from Intel. But very little about the CMOS-based MTA resembles any of today's high-end commercial systems, let alone personal computers.
Each MTA processor handles up to 128 hardware threads, and each thread has its own virtual register file and program counter. The MTA processor is attached to a system board, with up to 4GB of memory per board, and up to eight of these modules can be accommodated in a single MTA system.
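Why give one processor 128 hardware threads? A toy model helps. The sketch below assumes, for illustration only, that the processor issues one instruction per cycle from any ready stream, and that a stream which has just issued must wait a fixed number of cycles (a stand-in for memory latency) before it can issue again. The `Stream` class, the `utilization` function, the round-robin policy and the 100-cycle figure are all assumptions made for this example, not details taken from Cray.

```python
from dataclasses import dataclass


@dataclass
class Stream:
    """One hardware thread: its own program counter plus a ready time."""
    pc: int = 0          # per-stream program counter
    ready_at: int = 0    # first cycle at which this stream may issue again


def utilization(num_streams: int, latency: int, cycles: int = 10_000) -> float:
    """Fraction of cycles in which some stream issued an instruction.

    Assumed model: one issue slot per cycle, round-robin choice among
    ready streams, and every instruction stalls its stream for `latency`
    cycles.
    """
    streams = [Stream() for _ in range(num_streams)]
    issued = 0
    start = 0  # where the round-robin scan begins this cycle
    for cycle in range(cycles):
        for offset in range(num_streams):
            index = (start + offset) % num_streams
            stream = streams[index]
            if stream.ready_at <= cycle:
                stream.pc += 1                       # issue one instruction
                stream.ready_at = cycle + latency    # then wait on "memory"
                issued += 1
                start = (index + 1) % num_streams
                break
    return issued / cycles


if __name__ == "__main__":
    for n in (1, 50, 128):
        print(n, "streams:", utilization(n, latency=100))
```

In this model a single stream keeps the issue slot busy 1 per cent of the time, 50 streams manage half, and 128 streams keep it fully occupied. Real workloads are messier, but the arithmetic suggests why a machine with enough streams might cope without a data cache.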
But that's only part of the story of this remarkable machine. It's a uniform flat shared memory system, with a full-empty bit for every word of memory providing much faster synchronization. And there's no data cache. So cache coherency - the bane of SMP shared memory systems - isn't a problem. The machine creates a large number of tasks, and ensures that each execution stream is kept busy.
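The full-empty bit is easier to grasp with an example. What follows is a software emulation only: it assumes a write waits until the word is empty and then marks it full, and a read waits until the word is full and then marks it empty. The class `FullEmptyWord` and its method names are hypothetical, and the lock and condition variable merely stand in for what the hardware would do per memory word.

```python
import threading


class FullEmptyWord:
    """Software stand-in for one memory word tagged with a full/empty bit."""

    def __init__(self):
        self._value = None
        self._full = False                 # the emulated full/empty bit
        self._cond = threading.Condition()

    def write_when_empty(self, value):
        """Block until the word is empty, store the value, mark it full."""
        with self._cond:
            self._cond.wait_for(lambda: not self._full)
            self._value = value
            self._full = True
            self._cond.notify_all()

    def read_when_full(self):
        """Block until the word is full, mark it empty, return the value."""
        with self._cond:
            self._cond.wait_for(lambda: self._full)
            self._full = False
            self._cond.notify_all()
            return self._value


def demo(count=5):
    """Pass `count` integers from a producer to a consumer through one word."""
    word = FullEmptyWord()
    received = []

    def producer():
        for i in range(count):
            word.write_when_empty(i)

    def consumer():
        for _ in range(count):
            received.append(word.read_when_full())

    threads = [threading.Thread(target=producer),
               threading.Thread(target=consumer)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return received


if __name__ == "__main__":
    print(demo())
```

The producer and consumer never touch an explicit lock of their own: the state of the word itself tells each side when it may proceed, which is the appeal of tagging every word in memory.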
A vintage slide from the Wayback Machine shows MTA machines far outscaling future Cray systems.
Ironically, Cray was acquired from SGI by Tera Computer in 2000. Dr Burton Smith, father of the MTA, co-founded Tera in 1987, and the company floated in 1995.
Down in the Dell
Conway defended Cray's decision to focus on services revenues from selling commodity Dell systems in clusters.
"It didn't make a whole lot of sense for us to develop that kind of machine, while Dell is one of the best in the world for its economics. Services are a big hole in the market."
But don't the customers for PC clusters, running Beowulf, know exactly how to put such a system together, we wondered, as they devised the technology themselves? And where did that leave services revenue?
It's not that easy, says Conway.
"One of the guys who we're talking to has tried to put in four big clusters in his working lifetime, and has broken his pick on them each time," he says.
"We have a ten year history with standard machines using standard processors. The Cray T3E series is the bellwether system. That's still the one everyone is trying to emulate with partial success."
So it's standard microprocessor based systems, and PC clusters. Here's what Tera had to say about this three years ago:-
"In an effort to improve scaling, some vendors have abandoned shared memory and introduced distributed-memory computers. These are also euphemistically called scalable parallel, massively parallel, or cluster computers. Regardless of the name, they all suffer the same basic problem: a truly horrible programming model.
"First, they require that applications be rewritten before they can even be run in parallel. Then, to achieve mediocre levels of performance, they require programs to be carefully tuned to manage communications and data placement. And since these systems are built using off-the-shelf microprocessors, they require further tuning for effective use of their data caches. Finally, these systems all suffer from inadequate communication bandwidth. Parallel applications can never be expected to run as well on these computers as on shared memory systems regardless of the programming effort invested."