Benchmark bandit: Numascale unveils 10TB/sec monster

Supermicro scores again, bringing massive compute power to bear

Numascale's non-uniform memory architecture (NUMA) technology has been used to build a 324-CPU system from 108 Supermicro servers sharing a single system image and 20.7TB of memory, and it has posted a winning McCalpin STREAM benchmark score.

The system, with its cache-coherent shared memory, ran at 10.096TB/sec for the McCalpin Scale function. That is 53 per cent more than the 6.59TB/sec attained by the second-placed system, an SGI Altix UV2000.

The benchmark is a synthetic "program that measures sustainable memory bandwidth and the corresponding computation rate for simple vector kernels".
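For readers who have not met STREAM, here is a rough Python sketch of what the Scale function does and how a bandwidth figure falls out of it. The real benchmark is compiled C or Fortran run across many threads, so this is an illustration only, not the benchmark itself. The names `scale_kernel` and `scale_bandwidth`, the array length and the trial count are all made up for the example. The one convention carried over from STREAM is that Scale is counted as moving two 8-byte words per element, one read and one write.

```python
import time
from array import array


def scale_kernel(b, q):
    """STREAM-style Scale: a[i] = q * b[i] for every element of b."""
    return array("d", (q * x for x in b))


def scale_bandwidth(n=1_000_000, q=3.0, trials=3):
    """Time the Scale kernel and return (result array, bytes per second).

    Hypothetical helper: n, q and trials are illustrative defaults.
    """
    b = array("d", [2.0]) * n
    best = float("inf")
    a = b
    for _ in range(trials):
        start = time.perf_counter()
        a = scale_kernel(b, q)
        # Keep the fastest trial, in the spirit of a best-of-N measurement.
        best = min(best, time.perf_counter() - start)
    # One 8-byte read of b[i] plus one 8-byte write of a[i] per element.
    bytes_moved = 2 * b.itemsize * n
    return a, bytes_moved / best


if __name__ == "__main__":
    _, rate = scale_bandwidth()
    print(f"Scale: {rate / 1e9:.3f} GB/sec (interpreted Python, not comparable)")
```

Interpreted Python will land many orders of magnitude below the Numascale figure; the point is only the shape of the kernel and the bytes-over-time arithmetic.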

The system enables massive compute power to be brought to bear on a humongous working set of data with parallel data access. Its details were:

  • 108 Supermicro 1U servers with 3 x AMD Opteron 6386 CPUs each, meaning 324 CPUs in total
  • Mounted in three racks with servers interconnected in a 3D torus via NumaConnect, with a 6x6x3 topology
  • Each Opteron 6386 has 16 cores, meaning each server has 48 cores
  • Each server has 192GB of memory, meaning 20.7TB in total
  • Each server runs Linux and there is a single system image across the 5,184 cores
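The numbers in that list hang together, and a few lines of Python show how. This is a back-of-envelope sketch: the constants come from the figures above, while the function names are invented for the example. It also assumes one torus node per server, which is what a 6x6x3 layout of 108 servers implies, and that each node links to its nearest neighbour in both directions along each axis with wrap-around, which is the usual meaning of a 3D torus.

```python
SERVERS = 108
CPUS_PER_SERVER = 3
CORES_PER_CPU = 16
GB_PER_SERVER = 192
TORUS_DIMS = (6, 6, 3)
SCALE_TB_PER_SEC = 10.096  # reported STREAM Scale result


def totals():
    """Return (CPUs, cores, memory in decimal TB) for the whole system."""
    cpus = SERVERS * CPUS_PER_SERVER
    cores = cpus * CORES_PER_CPU
    memory_tb = SERVERS * GB_PER_SERVER / 1000
    return cpus, cores, memory_tb


def per_server_gb_per_sec():
    """Average Scale bandwidth each server contributes, in GB/sec."""
    return SCALE_TB_PER_SEC * 1000 / SERVERS


def torus_neighbours(node, dims=TORUS_DIMS):
    """Neighbours of a node in a wrap-around 3D torus (assumed wiring)."""
    neighbours = set()
    for axis, size in enumerate(dims):
        for step in (-1, 1):
            coord = list(node)
            coord[axis] = (coord[axis] + step) % size
            neighbours.add(tuple(coord))
    neighbours.discard(tuple(node))
    return sorted(neighbours)


if __name__ == "__main__":
    print(totals())
    print(round(per_server_gb_per_sec(), 1))
    print(torus_neighbours((0, 0, 0)))
```

Under those assumptions every server has six direct neighbours, and the aggregate result works out to a little under 94GB/sec of Scale bandwidth per server.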

Numascale said its NumaConnect "enables scalable server computer systems to be built from commodity components at cluster prices, while providing high performance shared memory programming capabilities [and it] eliminates the difficulty of MPI (message passing interface) coding for big data problems".

This is a demonstration of Numascale prowess for the Big Data market. The system is the initial part of a cloud computing installation in a North America-based data centre used to run analysis routines that simulate complex dynamic data using both historical data and near real-time information.

The routines evaluate "location placement, megawatt sizing, and energy services mix in order to determine the greatest optimisation and efficiency gains from the integration of banks that store and deliver energy to an electric grid."

It's pretty specialised, but Numascale said its technology could be used in so-called smart city applications, such as traffic analysis. This would need, it suggested, 24x7 real-time streaming data from thousands of sensors to justify real-time decisions to optimise traffic flows.

NumaConnect supports up to 256TB of physical memory address space and up to 196,608 cores. Using Supermicro servers would certainly seem cheaper than using SGI or Cray systems or brand name packaged appliances.
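To put those ceilings in context, a quick sketch compares them with the benchmarked system. The limits and the demo figures are the ones quoted in this article; the function names are invented here. The address-width calculation assumes "TB" in the 256TB limit means 2**40 bytes, which the article does not state.

```python
MAX_MEMORY_TB = 256
MAX_CORES = 196_608
DEMO_MEMORY_TB = 20.736  # 108 servers x 192GB
DEMO_CORES = 5_184


def headroom():
    """How many times larger the stated limits are than the demo system."""
    return MAX_MEMORY_TB / DEMO_MEMORY_TB, MAX_CORES / DEMO_CORES


def address_bits(memory_tb=MAX_MEMORY_TB):
    """Bits needed to address memory_tb, assuming 1TB = 2**40 bytes."""
    return (memory_tb * 2**40 - 1).bit_length()


if __name__ == "__main__":
    mem_x, core_x = headroom()
    print(f"memory headroom: {mem_x:.1f}x, core headroom: {core_x:.1f}x")
    print(f"address width under the binary-TB assumption: {address_bits()} bits")
```

On these figures the 108-server system uses roughly a twelfth of the memory ceiling and well under a thirtieth of the core ceiling.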

But an all-flash equivalent, say using EMC DSSD technology, might be cheaper still and deliver pretty damn good (but not so fast) performance. Balancing budget and performance needs is going to be fun.

