Hot Chips Fujitsu wants to squeeze more performance out of its homegrown Sparc64 X processor for commercial and supercomputing workloads, and it can't wait until Taiwan Semiconductor Manufacturing Company, its foundry partner, gets 20-nanometer processes into the field and ramped.
So the engineers at Fujitsu have gone back over the design for the existing Sparc64 X and added some tweaks to goose the performance while at the same time pushing up the clock speeds a little bit to create the Sparc64 X+ processor.
The Sparc64 X was the result of the convergence of two chip lines created by Fujitsu: the Sparc64-VII+ for commercial Solaris servers, sold by Sun Microsystems and then Oracle as well as by Fujitsu, and the Sparc64-VIIIfx, created specifically for the 10.5 petaflops K supercomputer built by the company for the Japanese government.
The Sparc64 X+ does not represent a big change over its predecessor, but has some features that will no doubt make it appealing to the large enterprises - mostly in Japan and Europe - who still buy Sparc M series machines running Solaris to do their big back-end jobs.
Toshio Yoshida, director of processor development for the Enterprise Server Business unit at Fujitsu, walked through the changes the company made to create the Sparc64 X+ processor at the Hot Chips conference, hosted by the IEEE at Stanford University this week.
This is not just a case of publish or perish, but of giving big iron customers whatever performance the engineers can get out of the design until a new chip can be brought to bear. This is, using the Intel parlance, neither a tick nor a tock, but a nip and a tuck.
The Sparc64 X+ processor
With sixteen cores already on the die in the Sparc64 X and X+ chips, it is hard to imagine that Fujitsu will add cores with the future Sparc64 XI chip (if that is indeed what it is going to be called), but it is reasonable to guess that Fujitsu will add more cache memory, add more threads to each core, and boost clock speeds with whatever process shrink it can get from TSMC to keep pushing the performance of its Sparc M systems up.
The Sparc64 X+ chip has sixteen cores, each with simultaneous multithreading to yield two virtual threads per core. The chip has 24MB of on-chip L2 cache, implemented in two segments, and has two DDR3 memory controllers as well as two SERDES controllers and PCI-Express 3.0 and system interconnect circuits on the die.
The chip is 24 by 25 millimeters (600 square millimeters) in area and crams 2.99 billion transistors into that space. It has 1,500 signal pins. And it is socket compatible with the existing Sparc64 X chips, which is a plus for Fujitsu's customers.
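For those who like to check the sums, the die figures above can be turned into an area and a rough transistor density. This is only a back-of-the-envelope sketch using the numbers quoted in the presentation; the variable names are our own.

```python
# Die dimensions and transistor count as quoted for the Sparc64 X+.
DIE_WIDTH_MM = 24
DIE_HEIGHT_MM = 25
TRANSISTORS = 2.99e9

# Area in square millimeters.
die_area_mm2 = DIE_WIDTH_MM * DIE_HEIGHT_MM

# Average density, in millions of transistors per square millimeter.
# This is a whole-die average and says nothing about any one block.
density_m_per_mm2 = TRANSISTORS / die_area_mm2 / 1e6

print(f"Die area: {die_area_mm2} mm^2")
print(f"Average density: {density_m_per_mm2:.2f} million transistors per mm^2")
```

That works out to just under five million transistors per square millimeter, averaged across cores, cache, and I/O.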
The Sparc64 X+ core
The upgraded Sparc64 chip will have a target frequency of 3.5GHz and higher. The Sparc64 X chip previewed this time last year at Hot Chips ran at 3GHz, the same clock speed as Oracle's Sparc T5 and M5 processors (also fabbed by TSMC in 28 nanometer processes, by the way). That's 16.7 per cent higher clock speeds, and Fujitsu customers will take it if they have beastly jobs that like few threads and clocks as high as they can get them.
Yoshida said that at 3.5GHz, the Sparc64 X+ will deliver 448 gigaflops of peak double-precision floating point oomph, up 17.3 per cent from the 382 gigaflops that the Sparc64 X could do running at 3GHz.
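Peak floating point numbers like these are normally just cores times flops per cycle times clock speed. The sketch below works the sum under one assumption that Yoshida did not state as reported here: that each core retires eight double-precision flops per cycle, a figure we inferred by dividing 448 gigaflops by sixteen cores and 3.5GHz. The function name is ours.

```python
def peak_gflops(cores: int, flops_per_cycle: int, clock_ghz: float) -> float:
    """Theoretical peak in gigaflops: cores x flops per cycle x GHz."""
    return cores * flops_per_cycle * clock_ghz

CORES = 16
# ASSUMPTION: inferred from 448 / (16 * 3.5), not quoted in the talk.
FLOPS_PER_CYCLE = 8

x_plus_peak = peak_gflops(CORES, FLOPS_PER_CYCLE, 3.5)
x_peak_at_3ghz = peak_gflops(CORES, FLOPS_PER_CYCLE, 3.0)

# Uplift from the clock speed bump alone, in per cent.
clock_uplift_pct = (3.5 / 3.0 - 1) * 100

print(f"Sparc64 X+ at 3.5GHz: {x_plus_peak:.0f} gigaflops")
print(f"Same core at 3.0GHz:  {x_peak_at_3ghz:.0f} gigaflops")
print(f"Clock uplift: {clock_uplift_pct:.1f} per cent")
```

Under that assumption the formula lands exactly on the 448 gigaflops quoted for the X+, and gives 384 gigaflops at 3GHz, a shade above the 382 gigaflops quoted for the Sparc64 X.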
Block diagram of the Sparc64 X+ processor
The Sparc64 X+ chip delivers 102GB/sec of aggregate throughput across the memory controllers. Yoshida did not want to divulge the bandwidth between the L2 cache on the chip and the main memory controllers. Each Sparc64 X+ socket can be configured with up to 1TB of main memory, which yields a top-end 64-socket Sparc M series machine with 64TB of memory.
Oracle's future Sparc M machines using its homegrown Sparc M6 processor will have 96 sockets and 96TB of memory, but only have twelve cores on the die to Fujitsu's sixteen with the Sparc64 X+ chip. Oracle has four times as many threads per core, too.
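To put the two top-end boxes side by side, here is a rough tally of cores and threads using only the socket, core, and thread counts given above. Oracle's eight threads per core is derived from "four times as many" as Fujitsu's two; the class and field names are purely illustrative.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class SparcSystem:
    name: str
    sockets: int
    cores_per_socket: int
    threads_per_core: int

    @property
    def total_cores(self) -> int:
        return self.sockets * self.cores_per_socket

    @property
    def total_threads(self) -> int:
        return self.total_cores * self.threads_per_core

FUJITSU_THREADS_PER_CORE = 2
fujitsu = SparcSystem("Fujitsu Sparc64 X+", 64, 16, FUJITSU_THREADS_PER_CORE)
# Oracle has four times as many threads per core as Fujitsu.
oracle = SparcSystem("Oracle Sparc M6", 96, 12, 4 * FUJITSU_THREADS_PER_CORE)

for system in (fujitsu, oracle):
    print(f"{system.name}: {system.total_cores} cores, "
          f"{system.total_threads} threads")
```

On these numbers the Fujitsu machine tops out at 1,024 cores and 2,048 threads, while the Oracle machine reaches 1,152 cores and 9,216 threads. Thread counts alone say nothing about per-thread performance, which is where Fujitsu's higher clocks come in.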
With the latest iteration of its Sparc64 core, Fujitsu is doing a number of other things to goose performance.
There is a new instruction to accelerate the RSA encryption sign library, which boosts its performance by 37 per cent. In the decimal math unit, circuit tweaks speed up the ADD function in the NUMBER library by 64 per cent and the MULTIPLY function by 32 per cent. (These figures include the effect of the clock speed boost and the change in the "software on chip" functions, as Fujitsu calls its accelerators.)
Bit vector and integer byte compare functions in the database acceleration functions on the Sparc64 chip, which debuted with the Sparc64 X, are enhanced as well. Performance figures for these improvements were not given out.
The glueless SMP interconnect for the Sparc M series servers
The Sparc64 X+ chip uses the same glueless interconnect to create a four-way system board. The system board has two crossbar switches (XB in the diagram above) that have around 168GB/sec of bandwidth into and out of that system board.
With the prior generation of Sparc M machines, those lanes in the XBs ran at 14.5Gb/sec, but with the Sparc64 X+, they now run at 25Gb/sec. Multiple four-way system boards can be linked to each other with additional XBs, with a total of sixteen used to make the 64-socket configuration.
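The lane speed bump and the board count are easy enough to sanity check. The sketch below uses only the figures quoted above and reads "sixteen" as the number of four-way system boards in the biggest box; it makes no claim about delivered bandwidth, since the lane counts and encoding overhead were not given.

```python
# Per-lane signaling rates in Gb/sec, as quoted.
OLD_LANE_GBPS = 14.5
NEW_LANE_GBPS = 25.0

# Raw per-lane speedup, in per cent.
lane_uplift_pct = (NEW_LANE_GBPS / OLD_LANE_GBPS - 1) * 100

# Four sockets per system board, 64 sockets in the top-end machine.
SOCKETS_PER_BOARD = 4
MAX_SOCKETS = 64
boards_needed = MAX_SOCKETS // SOCKETS_PER_BOARD

print(f"Lane speed uplift: {lane_uplift_pct:.1f} per cent")
print(f"System boards in a {MAX_SOCKETS}-socket machine: {boards_needed}")
```

That is roughly 72 per cent more raw signaling rate per lane, spread across sixteen system boards in the largest configuration.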
Fujitsu did not say how much better the new interconnect fabric scaled compared to the prior one, but presumably it is considerably better given all that extra bandwidth in the crossbar switches.
"With cutting-edge technology and a commitment to excellence, Fujitsu will continue to develop Sparc64 servers," Yoshida said in closing out his presentation on the forthcoming chip.
Yoshida was not at liberty to say when the Sparc64 X+ might appear in Fujitsu's rendition of the Sparc M series servers and compete at least in some ways against Oracle's Sparc M machines, which currently use Big Larry's own M5 chips and will soon be upgraded to M6 chips, also divulged at Hot Chips this week. ®