It's neat having speedy, flashy boxen but they need connecting, too
Surveying the competitors in the upcoming interconnect war
HPC Blog The next big battle ground in High Performance Computing, and thereafter in large enterprise computing, will be centered on high performance interconnects (HPI). These are the mechanisms that tie systems together and enable high speed communication between nodes.
The HPI market is the very high end of the networking equipment market, where ultra-high bandwidth and ultra-low latency are the name of the game.
In a recent survey of HPC and large enterprise data centres, I found that more than 45 per cent of the respondents were planning to spend more in 2016 on system interconnects and associated I/O than they spent in 2015. An additional 40 per cent planned to spend the same amount as they spent in 2015. From these results, it’s obvious that HPIs are an important topic to them – as well they should be.
A lot of the performance improvement that we’ve seen in HPC over the years is directly attributable to advances in HPI. Interconnect speeds have improved at a 30 per cent annual clip over the past four decades, which, when compared to the 41 per cent yearly performance gain implied by Moore’s Law, ain’t chopped liver.
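To put those two growth rates side by side, here is a minimal Python sketch of the compounding arithmetic. It takes the 30 per cent and 41 per cent figures above at face value; the function names are purely illustrative, and 41 per cent a year is simply what a doubling every two years works out to.

```python
import math


def cumulative_gain(annual_rate, years):
    """Total improvement factor after compounding annual_rate for the given years."""
    return (1.0 + annual_rate) ** years


def doubling_time(annual_rate):
    """Years needed to double at a steady annual_rate."""
    return math.log(2.0) / math.log(1.0 + annual_rate)


if __name__ == "__main__":
    # Rates are the ones quoted in the text: 30% for interconnects, 41% for Moore's Law.
    for label, rate in (("interconnect", 0.30), ("Moore's Law", 0.41)):
        print(f"{label}: doubles every {doubling_time(rate):.1f} years, "
              f"{cumulative_gain(rate, 40):,.0f}x over 40 years")
```

Run as a script, it prints the doubling time and the 40-year compounded gain for each rate. On these numbers, interconnect speed doubles a little more slowly than once every two and a half years, against once every two years for Moore’s Law.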
Flagrant Self-Promotional Message: My company, OrionX, recently published four research reports looking at the HPI market, discussing its evolution, the environment today, how to evaluate alternatives, and the best choices in HPI today. If you’re interested in reading about this topic in more detail, you can find the reports at orionx.net/research. They’re free and don’t require you to register with us to see them.
HPI: The landscape
There are three basic technology categories in HPI today: Proprietary, InfiniBand, and Ethernet. Proprietary interconnects are typically used for MPP systems, where the entire system can be used by one or two massive applications. With these proprietary interconnects, customers can more easily optimize their network for the application they’re going to run – which is much less expensive than doing, say, a traditional full fat-tree topology across all of their nodes.
To get one of these proprietary interconnects, you have to purchase a system from one of the vendors that offer them – Cray and SGI are the major players, but there are others, like Bull, who also offer their own HPI gear.
In general, other interconnects, such as InfiniBand and Ethernet, are primarily used in more general purpose clustered systems that run several or many HPC and scale-out workloads on the same system.
Using the TOP500 list as a proxy for the HPI market as a whole, you see that Ethernet has a slight edge as the interconnect of choice, used in 44 per cent of the largest computers on the planet. Ethernet is an attractive technology because it’s ubiquitous and in some cases less expensive.
While it has equivalent bandwidth compared to InfiniBand (100Gb/sec), it falls down performance-wise when it comes to latency. Where InfiniBand and proprietary interconnects measure latency in high nanoseconds, Ethernet is still mired in the low microsecond latency range – which is a crucial disadvantage when it comes to workloads that require high performance.
InfiniBand has the high ground when it comes to performance, with 100Gb/sec bandwidth, less than 90 nanosecond latency, and 150 million messages per second. It also has a robust roadmap, with 200Gb/sec InfiniBand to be released in 2017, and 400Gb/sec to follow a few years later.
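Why does latency matter so much when the bandwidth is the same? A rough Python sketch makes the point, using the simplest possible model: transfer time equals latency plus message size divided by bandwidth. The 90 nanosecond and 100Gb/sec figures come from the text above; the 1 microsecond Ethernet latency is my assumption standing in for “low microsecond”, and the model ignores switch hops, protocol overhead, and congestion entirely.

```python
def transfer_time_ns(message_bytes, latency_ns, bandwidth_gbps):
    """Latency plus serialization time, in nanoseconds.

    One gigabit per second moves one bit per nanosecond, so
    bits divided by Gb/sec gives nanoseconds directly.
    """
    return latency_ns + (message_bytes * 8) / bandwidth_gbps


INFINIBAND_LATENCY_NS = 90.0   # "less than 90 nanosecond" figure quoted above
ETHERNET_LATENCY_NS = 1000.0   # ASSUMPTION: 1 microsecond stands in for "low microsecond"
BANDWIDTH_GBPS = 100.0         # both technologies, per the text


def ethernet_slowdown(message_bytes):
    """How many times longer the modelled Ethernet transfer takes than InfiniBand."""
    ethernet = transfer_time_ns(message_bytes, ETHERNET_LATENCY_NS, BANDWIDTH_GBPS)
    infiniband = transfer_time_ns(message_bytes, INFINIBAND_LATENCY_NS, BANDWIDTH_GBPS)
    return ethernet / infiniband


if __name__ == "__main__":
    for size in (64, 4096, 1_000_000):
        print(f"{size:>9} bytes: Ethernet takes {ethernet_slowdown(size):.2f}x as long")
```

Under these assumptions the gap is large for small messages, where latency dominates, and nearly vanishes for megabyte-sized transfers, where the wire time swamps everything else. That is why latency-sensitive HPC codes, which tend to exchange lots of small messages, care about the difference.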
InfiniBand is used on 40 per cent of the systems on the TOP500 list, mainly in systems near the upper part of the list. In fact, the largest machine on the planet, the NRCPC TaihuLight system, uses Mellanox InfiniBand HCA cards and switch chips.
One of the main technical advantages to InfiniBand is that it’s an “off load” technology. This means that the HCA cards and switches manage and execute all networking operations – this includes all protocol functions, setting up packets, sending, receiving, etc. This relieves the CPU from these chores, so it can concentrate on running applications.
A new entrant into the interconnect market is Intel, with their Omni-Path Architecture (OPA). Today, this technology is essentially renamed tech from their TrueScale product line – which was mostly acquired in their QLogic IP purchase a few years back. Coming versions will feature more content from their Cray Aries interconnect IP purchase. Their HPI mechanism is an “on load” technology where the main CPU is responsible for executing and managing all network processing – everything from assembling the packets to monitoring when the transaction is completed.
This can put quite a load on the CPU, particularly when we’re talking about the “roll up” phase of an application when all nodes are reporting their results back to the head node. Message size could be a problem as well, since larger messages take more time to packetize and send. Many HPC applications send out messages of widely varying size, which would lead to uneven performance in an on load architecture.
We don’t know a lot about OPA performance yet, since it’s only in the hands of a few customers so far. From what Intel is saying, their OPA is 100Gb/sec, has near nanosecond latency, and can handle 89 million messages per second.
While these specs are pretty close to what Mellanox has with their InfiniBand EDR (other than message rates), one has to wonder what sort of toll the on-load mechanism will have on the CPUs. Depending on the size of the cluster and the application, it could be very significant – but it’s hard to know at this point since the systems are very new and there aren’t a lot of real world performance numbers out there yet.
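For a feel of what that toll could look like, here is a back-of-the-envelope Python sketch. It multiplies a message rate by a per-message CPU cost to get the number of cores kept busy on network processing. The 89 million messages per second is Intel’s quoted figure from above, and a peak rate at that; the per-message costs are entirely hypothetical values I picked to sweep a range, not measurements of OPA or anything else.

```python
def cores_consumed(messages_per_sec, cpu_ns_per_message):
    """CPU cores kept busy by network processing alone.

    One core supplies 1e9 nanoseconds of CPU time per second, so
    total nanoseconds spent per second divided by 1e9 is a core count.
    """
    return messages_per_sec * cpu_ns_per_message / 1e9


CLAIMED_MESSAGE_RATE = 89e6  # messages per second, Intel's figure as quoted above

if __name__ == "__main__":
    # HYPOTHETICAL per-message CPU costs in nanoseconds; none of these are measured.
    for cost_ns in (1.0, 5.0, 10.0, 20.0):
        busy = cores_consumed(CLAIMED_MESSAGE_RATE, cost_ns)
        print(f"{cost_ns:>5.1f} ns per message -> {busy:.2f} cores busy")
```

The point is the shape of the relationship rather than any single number: with an on-load design the CPU bill scales linearly with message rate, so the real per-message cost, once somebody measures it, decides whether this is a rounding error or a real problem.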
Intel will be going heads up against Mellanox and the proprietary interconnect folks in a battle to see who can control the HPI part of the market. Intel’s intent is to move beyond selling just chips, or even systems, to selling entire racks chock full of Intel gear including CPUs, motherboards, accelerators, and the HPI that ties them all together.
This would edge companies like Mellanox out of the HPI market and turn companies like Cray, SGI, Lenovo, and others into Intel resellers, even at the rack level. It would also reduce their ability to differentiate their products, which would undercut their margins significantly.
Intel has the market heft, financial resources, and credibility to push OPA hard. Their sales people have access to every notable customer in HPC and large enterprise. But their technology might not be a good match for market needs, due to the on-load architecture they’re using.
Mellanox has the performance high ground and roadmap to compete, but they’ll have to keep executing at a high pace in order to stay ahead. The proprietary guys will have to do the same in order to justify their more expensive systems. ®