Hash for Home
With the TilePro kickers, Tilera is making some performance tweaks in the design as well as delivering a cut-down 36-core variant of the chip. Rather than moving to a new chip process and cranking the clock, Tilera is making the new TilePro chips in the same 90 nanometer process. The TilePro chips add another dedicated mesh network, this one for cache coherency management, which boosted performance, as did doubling the L1 data and instruction caches per core to 16 KB.
The chips also implement some electronics called "hash for home," which spreads data over the caches on the chip, eliminating hot spots where cores keep hammering the same caches. The new chips also have instructions added specifically for handling video and audio data (important for streaming appliances that will be using the chip) and other instructions for moving and copying data in memory.
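Tilera has not published the hash itself, so the following is only a rough sketch of the idea behind "hash for home": each cache line is assigned a "home" tile by hashing its address, so that neighboring lines, and lines spaced at regular strides, end up in different caches. The 64-byte line size, the bit-folding hash, and the names `home_tile` and `home_histogram` are all assumptions made for illustration, not the chip's actual logic.

```python
from collections import Counter

NUM_TILES = 64         # TilePro64 core count, from the article
CACHE_LINE_BYTES = 64  # assumed line size, for illustration only


def home_tile(address: int) -> int:
    """Return the tile that 'homes' the cache line holding this address.

    Hypothetical hash: the real hardware function is not described
    in the article.
    """
    line = address // CACHE_LINE_BYTES
    # Fold higher line-number bits into the low bits so that large
    # power-of-two strides do not all land on one tile.
    mixed = line ^ (line >> 6) ^ (line >> 12)
    return mixed % NUM_TILES


def home_histogram(addresses):
    """Count how many of the given addresses are homed on each tile."""
    return Counter(home_tile(a) for a in addresses)


if __name__ == "__main__":
    # 64 addresses spaced 4 KB apart: a classic hot-spot pattern.
    strided = [k * 4096 for k in range(64)]
    print(len(home_histogram(strided)), "distinct home tiles")
```

With a plain modulo on the line number, those 64 addresses spaced 4 KB apart would all share one home tile; with the folded hash in this sketch they land on 64 different tiles.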
The memory controllers on the TilePro chips also have memory striping - akin to RAID striping on disks - to reduce bottlenecks and a direct memory access feature to put data into cache memory without having to go through main memory. All of these and a number of other features on the chips have boosted power consumption by 5 percent, but the performance per watt of the chips is nearly double.
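The RAID comparison can be made concrete with a small sketch. Just as RAID 0 deals out consecutive stripes to successive disks, striped memory deals out consecutive chunks of the physical address space to successive memory controllers. The controller count, the stripe size, and the function names below are assumptions chosen for illustration; the article does not give the real values.

```python
NUM_CONTROLLERS = 4   # assumed controller count, not stated in the article
STRIPE_BYTES = 8192   # assumed stripe size, purely illustrative


def controller_for(address: int) -> int:
    """Round-robin consecutive stripes across the memory controllers."""
    return (address // STRIPE_BYTES) % NUM_CONTROLLERS


def local_offset(address: int) -> int:
    """Byte offset of this address within its controller's own memory."""
    stripe = address // STRIPE_BYTES
    return (stripe // NUM_CONTROLLERS) * STRIPE_BYTES + address % STRIPE_BYTES


if __name__ == "__main__":
    # A sequential sweep touches every controller in turn instead of
    # queuing up behind a single one.
    for i in range(8):
        addr = i * STRIPE_BYTES
        print(hex(addr), "-> controller", controller_for(addr))
```

Under these assumptions, a long sequential read is shared evenly among the controllers, which is the bottleneck relief the striping feature is after.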
That's another way of saying performance is nearly double, and on real workloads, it's somewhere between 1.5 and 2.5 times better than the first Tile64 chips. Significantly for Tilera's marketing efforts, the new TilePro64 running at 866 MHz has 35 times the performance per watt of a 3 GHz quad-core Xeon processor from Intel and 15 times the performance of Texas Instruments' DaVinci DSPs.
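The step from performance per watt to raw performance is simple arithmetic: performance is performance per watt multiplied by watts, so the generation-over-generation ratios multiply as well. A minimal sketch, taking the 5 percent power increase from the figures above and using 2.0 as a stand-in upper bound for "nearly double" (that value and the function name are assumptions for illustration):

```python
def performance_ratio(perf_per_watt_ratio: float, power_ratio: float) -> float:
    """New-to-old performance ratio implied by two other ratios.

    performance = (performance per watt) * watts, so the ratios multiply.
    """
    return perf_per_watt_ratio * power_ratio


if __name__ == "__main__":
    power_ratio = 1.05          # 5 percent more power, from the article
    perf_per_watt_ratio = 2.0   # assumed upper bound for "nearly double"
    print(performance_ratio(perf_per_watt_ratio, power_ratio))
```

Because the power increase is so small, the two ratios are nearly the same number: performance per watt of just under 2 works out to raw performance of roughly 2.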
The new TilePro64 chip has 64 cores and has 5.6 MB of distributed L2/L3 cache memory. It comes in 700 MHz and 866 MHz versions, and burns 19 to 23 watts when running real workloads. The TilePro36 is a cut-down version of the chip that runs at a much slower 500 MHz and has 3.2 MB of cache. It consumes 10 to 16 watts. The TilePro64 will begin sampling next month, and the TilePro36 will sample by the end of the year. First silicon of these chips was ready to play with in August. Tilera is working on a 120-core chip, due in late 2008 or early 2009, but said nothing more about it this week.
Supercomputing: Not an Option
What the Tile64 designs do not have are floating point math units and Fortran compilers. So forget supercomputing. But that doesn't mean the Tile64 chips won't see use in the data center. Right now, Tilera has 45 customers, who are messing around with the chips to see what they can do and how they might use them. While the names have to be kept secret, Bob Doud, director of marketing at the company, says that the company has sold over 100 system boards with the chips, which comprise hundreds of processors, and that the company is generating millions of dollars in sales as it makes its way to a ramp during the first half of 2009.
One prototype machine being built with the Tile64 chips is a 5U server with a dozen chips that does SQL database acceleration, and another supercomputer maker is playing around with the chip just in case there are workloads where it can be useful. A number of financial services companies are also looking at the chips to run their algorithms, which do not need floating point math. Media streaming is another area where companies are playing with the Tile64 chips, and so are intrusion detection and deep packet inspection on network devices. 3Com is an early user of the chips.
Tilera was founded in Santa Clara, California, in October 2004. The company's research and development is done in its Westborough, Massachusetts lab, which makes sense given that the Tile64 processor is based on an MIT project called Raw. The Raw project was funded by the U.S. National Science Foundation and the Defense Advanced Research Projects Agency, the research arm of the U.S. Department of Defense, back in 1996, and it delivered a 16-core processor connected by a mesh of on-core switches in 2002.
One of the key components of that Raw project was the compiler technology that could harness the multi-core architecture of the processor and the integrated switches that linked them together. Anant Agarwal - who worked on the first MIPS RISC processor at Stanford University in the 1980s and who had created a 32-node mesh-based cache coherent processor at MIT in 1994 - had tackled many of these problems. The team that created the Tile64 processor includes techies who worked on Sun Microsystems' Sparcle and Digital Equipment's Alpha RISC processors, too, as well as networking systems from Cisco Systems and supercomputers from Hewlett-Packard and the long-since defunct Thinking Machines - also an MIT spinout. ®