Another thing that Cray is adding back into its supers with the XE6 systems is global memory addressing, something that Cray machines have not had since the T3E super from 1995, which was based on the Digital Equipment Alpha 21164 processor. (Yes, DEC was a damned good engineering company.)
The T3E was the first machine to break the 1 teraflops barrier doing actual work. (I know about ASCI Red and its Linpack ratings.) The global address space in the XE6 is implemented in the Gemini chip and basically allows remote direct memory access (RDMA) from any node in the system to any other node in the system without having to go through the whole MPI stack to have nodes talk to each other.
This global address space is not as tight as the shared global memory that Silicon Graphics implements in NUMAlink 4 for its Itanium-based Altix 4700s or in NUMAlink 5 for its new Xeon 7500-based Altix UVs. Global shared memory in the SGI sense means there is only one copy of the Linux operating system and one address space for applications.
The global addressing based on RDMA that Cray is implementing in the XE6 provides a shared address space for applications, but each node in the cluster has its own copy of the Linux operating system. The "Blue Waters" massively parallel Power7-based super that IBM is building for the University of Illinois has something akin to Cray's global addressing.
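For a rough feel of what a shared address space spread over separate nodes means, here is a toy sketch in Python. It is not Cray's API and says nothing about how Gemini does the job in hardware. The names `Node`, `GlobalArray`, `locate`, `put`, and `get` are made up for illustration, and the simple block distribution (each node owns one contiguous slice) is an assumption. The point is only that a global index maps to an owning node plus a local offset, and that a read or write goes straight to that node's memory without the owner posting a matching receive.

```python
from dataclasses import dataclass


@dataclass
class Node:
    """One node with private memory (hypothetical stand-in for a real node,
    which would also run its own copy of the operating system)."""
    rank: int
    memory: list


class GlobalArray:
    """Toy global address space: one array, block-distributed over nodes."""

    def __init__(self, num_nodes, words_per_node):
        self.words_per_node = words_per_node
        self.nodes = [Node(r, [0] * words_per_node) for r in range(num_nodes)]

    def locate(self, global_index):
        """Translate a global index into (owning node rank, local offset)."""
        total = len(self.nodes) * self.words_per_node
        if not 0 <= global_index < total:
            raise IndexError(f"global index {global_index} out of range")
        return divmod(global_index, self.words_per_node)

    def put(self, global_index, value):
        """One-sided write: the owning node takes no part in the transfer."""
        rank, offset = self.locate(global_index)
        self.nodes[rank].memory[offset] = value

    def get(self, global_index):
        """One-sided read from whichever node owns the address."""
        rank, offset = self.locate(global_index)
        return self.nodes[rank].memory[offset]


# Four nodes with eight words each give a 32-word global array.
ga = GlobalArray(num_nodes=4, words_per_node=8)
ga.put(19, 42)                 # the caller never names a destination node
print(ga.locate(19))           # (2, 3): node 2, local offset 3
print(ga.nodes[2].memory[3])   # 42
print(ga.get(19))              # 42
```

In a message-passing program, the same update would need a send on one node and a matching receive on the other. Here the caller just names a global address, which is the convenience the global address space is meant to provide.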
The global addressing means that applications running across a large number of nodes can be coded more easily than with MPI, but to take advantage of it you have to use special languages like Unified Parallel C, Co-Array Fortran, Chapel (from Cray), or X10 (from IBM). The Cray X1 and X2 vector machines had global address spaces, and so too did the Quadrics interconnect, which is one reason why Duncan Roweth, one of the founders of the British HPC interconnect maker, took a job at Cray when Quadrics shut down a year ago. ®