Itanic: It's all academic now – Official

UCB course explains the great 64bit misadventure

The name Itanic, coined here several years ago by Mike Magee, for Intel's IA-64 processor has been formally adopted by academia.

Nick Weaver, a 28-year-old graduate student and researcher, teaches computer science classes at the University of California, Berkeley, and as you can see from his "special topics" class, week 16 next month will be devoted to the "Voyage of the Itanic".

"Itanic describes the architecture very well," he tells us, explaining that the processor contains great ideas and "beautiful features" that ultimately were compromised by terrible subsequent design choices and "feature creep". We invited him to elaborate:

"The first good idea was one they explored several years ago in a paper from HP. That explains that if you have 64 or 128 registers with 64 or 128 condition registers, and every instruction is conditionally executed, then you get to a statically-scheduled superscalar - you'd get the benefits of VLIW, namely issue logic which is very simple, without the upgrade problem that VLIW has."
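To make "every instruction is being conditionally executed" concrete, here is a minimal Python sketch of predicated execution. It is a toy model, not IA-64 semantics: the tuple format, the opcode names `cmp_gt` and `sub`, and the helper names are all made up for illustration. A compare writes a pair of predicates, and each later instruction is skipped unless its guard predicate is true, so an if/else runs as straight-line code with no branch.

```python
def run_predicated(program, regs):
    """Run a list of (guard, op, dst, x, y) tuples in program order.

    Hypothetical toy format: an instruction whose guard predicate is
    false is nullified, i.e. it has no effect.
    """
    regs = dict(regs)
    preds = {"p0": True}  # p0 is treated as always true in this model
    for guard, op, dst, x, y in program:
        if not preds.get(guard, False):
            continue  # nullified: guard predicate is false
        if op == "cmp_gt":
            true_pred, false_pred = dst
            preds[true_pred] = regs[x] > regs[y]
            preds[false_pred] = not preds[true_pred]
        elif op == "sub":
            regs[dst] = regs[x] - regs[y]
        else:
            raise ValueError(f"unknown op: {op}")
    return regs


# |a - b| with no branch: both subtractions are in the instruction stream,
# but only the one whose predicate is true takes effect.
ABS_DIFF = [
    ("p0", "cmp_gt", ("p1", "p2"), "a", "b"),
    ("p1", "sub", "r", "a", "b"),
    ("p2", "sub", "r", "b", "a"),
]


def abs_diff(a, b):
    return run_predicated(ABS_DIFF, {"a": a, "b": b})["r"]
```

Calling `abs_diff(3, 7)` walks all three instructions and lets only the `p2`-guarded subtraction write `r`.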

What upgrade problem?

"VLIW doesn't scale because the compiler statically issues for the number of function units in the VLIW architecture. So for a new version of VLIW you have to recompile. Transmeta gets around this by always recompiling, and it's not a problem in the DSP community. But it is a problem if you want to do a 'general purpose' processor."
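The upgrade problem Weaver describes can be sketched with a toy static scheduler. This is an illustration under simplifying assumptions (one cycle per bundle, every operation fits any function unit, a greedy packing order), and the names `schedule`, `OPS` and `DEPS` are hypothetical. The point is that the bundle width is baked into the compiled code, so a binary packed for two function units gains nothing on a four-unit machine until it is recompiled.

```python
def schedule(ops, deps, width):
    """Greedily pack ops into bundles of at most `width` independent ops.

    `deps` maps an op to the ops it must wait for. An op may join a
    bundle only if all of its dependencies finished in earlier bundles.
    """
    done = set()
    bundles = []
    remaining = list(ops)
    while remaining:
        bundle = []
        for op in remaining:
            ready = all(d in done for d in deps.get(op, ()))
            if ready and len(bundle) < width:
                bundle.append(op)
        if not bundle:
            raise ValueError("dependency cycle or unknown dependency")
        done.update(bundle)
        remaining = [op for op in remaining if op not in done]
        bundles.append(bundle)
    return bundles


# Four independent operations feeding one final operation.
OPS = ["a", "b", "c", "d", "e"]
DEPS = {"e": ["a", "b", "c", "d"]}

compiled_for_2 = schedule(OPS, DEPS, width=2)  # 3 bundles
compiled_for_4 = schedule(OPS, DEPS, width=4)  # 2 bundles
```

In this model the two-wide binary still occupies three bundles on a four-wide machine; only the recompiled version drops to two.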

Nick commends deferred exception handling and low-cost checkpointing as two "beautiful features" of the IA-64 architecture.

"This also interacts well with the speculative techniques derived from the first part: You can speculatively execute both sides of a branch, allow cache misses to be errors which are deferred, and you only take the penalty if there is a cache miss and it is on the branch whose result you want. They make this very nice by propagating error


But then the trouble began. His lecture calls Itanic an "unquestionable disaster ... greatly increasing implementation complexity without really providing a benefit to either compiler writers or to performance".

Like what?

"The rotating register file, and the register window notion - they have a thousand registers in there, and the larger the register file the slower it is and the more it costs.

"The rotating register file is there to make software pipelining easier - fills - but it's not actually a big win - and if you ask the compiler writers they go 'why do you do this?'" The register window also incurs a performance penalty when doing garbage collection, he suggests.

"The other issue is that combining both features requires that the processor be able to effectively arbitrarily remap a number of logical registers - 128 - to the physical registers, which number around a thousand. This occurs in a traditional out-of-order machine, but the whole point of EPIC was to enable lots of parallelism without introducing these complications."
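A rough Python sketch of the remapping Weaver is talking about follows. It is a simplified model rather than the actual Itanium logic: the function name, its parameters and the physical register count are assumptions for illustration. It keeps the first 32 registers static, rotates the first `rot_size` registers above them by a rotation base, and then offsets the result by the current window's base in a larger physical file.

```python
NUM_STATIC = 32      # registers 0-31 are not remapped in this model
PHYS_STACKED = 960   # assumed size of the physical stacked file (illustrative)


def physical_index(logical, frame_base=0, rot_size=0, rrb=0):
    """Map a logical register number (0-127) to a physical index.

    frame_base: where the current procedure's window starts (hypothetical)
    rot_size:   how many registers above the static ones rotate
    rrb:        rotation base, changed on each software-pipelined loop trip
    """
    if not 0 <= logical < 128:
        raise ValueError("logical register out of range")
    if logical < NUM_STATIC:
        return logical
    offset = logical - NUM_STATIC
    if offset < rot_size:
        # Rotation: the same logical name lands on a different slot each trip.
        offset = (offset + rrb) % rot_size
    # Windowing: shift by the frame base, wrapping around the physical file.
    return NUM_STATIC + (frame_base + offset) % PHYS_STACKED
```

Every register read or write goes through both steps in this model, which is the kind of lookup Weaver says EPIC was meant to avoid.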

As for the future, he says it isn't completely hopeless. "Once you add a feature to the instruction set it's hard to take it away," he tells us.

Itanic could be helped by a process shrink and wider issue width, by real vector (Cray-like) instructions, but most of all by compiler improvements.

"The right compiler can theoretically allow Itanium to issue 6 instructions/cycle, but at the same time, if the compiler speculates too heavily, this wastes too much effort and performance suffers."

"The MMX and SSE [Screaming Sindy] instruction sets were attempts to build a vector ISA, but they have a few issues, notably too small a vector. Vector machines have a bit of a bad reputation today, one that is, in my opinion, unjustified."

Yammer the Hammer

No, he hasn't seen what he describes as "rumors" of Yamhill, Intel's own 64bit skunkworks project, but commends AMD's Hammer as a better approach.

"I think AMD is on the right track," says Nick.

"They've made the core simpler, and that makes it smaller, leaving room for much larger caches."

"The Hammer approach is 'we know how to do a CISC to RISC, how to make that RISC very fast, we know what few changes in the instruction set architecture would make it lots better - 16 general purpose registers, 16 floating point registers instead of the 8 entry stack from the 386 days - so let's do that'.

"This also gave them the ability to really concentrate on the interfaces. The memory interface is on the processor, and the traditional bus has been replaced by networks to communicate with the I/O. This allows glueless 4-way SMP setups, better I/O bandwidth, and better memory latency. It cuts out the chipset when going to memory, which saves two pin crossings and a bunch of traditionally slower chipset logic."

"If you want backward compatibility and performance, go Hammer," he recommends. "If you want backward compatibility and performance isn't such an issue, buy Transmeta to translate that old code."

Just my opinion, says Nick. Anyone beg to differ? ®

