Need speed? Then PCIe it is – server power without the politics

No longer for nerds and HPC geeks

Tue 14 Apr 2015 // 12:57 UTC

SATA sucks

Flash drives are fast. SATA is slow. This is a problem. What's worse, it doesn't really matter how fast you make the SATA controller, because it lives on the Southbridge.

The Southbridge of current Intel chips uses DMI 2.0, which with an x4 link provides a paltry 20Gbit/s between the CPU and the Southbridge. A single USB 3.0 port is (theoretically) 5Gbit/s while individual SATA ports are 6Gbit/s. That's before we get into the handful of PCIe lanes hanging off the Southbridge too.
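
To put rough numbers on that, here is a back-of-the-envelope sketch in Python. The link rates are the raw figures quoted above; the port counts (six SATA, four USB 3.0) are purely an assumption for illustration rather than any particular chipset's layout, and the function names are made up for this example. It ignores encoding overhead and the Southbridge's PCIe lanes.

```python
import math

# Raw link rates quoted in the article, in Gbit/s.
DMI2_X4_GBITS = 20.0
SATA_PORT_GBITS = 6.0
USB3_PORT_GBITS = 5.0


def oversubscription(sata_ports: int, usb3_ports: int) -> float:
    """Ratio of aggregate downstream port bandwidth to the DMI uplink."""
    demand = sata_ports * SATA_PORT_GBITS + usb3_ports * USB3_PORT_GBITS
    return demand / DMI2_X4_GBITS


def ports_to_saturate(port_gbits: float, uplink_gbits: float = DMI2_X4_GBITS) -> int:
    """Smallest number of ports at full rate whose sum meets or exceeds the uplink."""
    return math.ceil(uplink_gbits / port_gbits)


if __name__ == "__main__":
    # Hypothetical board: 6 SATA ports and 4 USB 3.0 ports (an assumption).
    print(f"Oversubscription: {oversubscription(6, 4):.1f}x")
    print(f"SATA ports needed to fill the uplink: {ports_to_saturate(SATA_PORT_GBITS)}")
```

Under those assumed port counts, the devices hanging off the Southbridge could ask for 2.8 times what the DMI link can carry, and four SATA ports running flat out are already enough to swamp it.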

Clearly, "onboard SATA" is unlikely to deliver on advertised speeds even if you plug in flash drives capable of pushing the limits. This, just by the by, is why everyone loves PCIe SSDs, and why NVMe SSDs (PCIe in a SAS/SATA form factor and hotswap tray) are such a big deal. They bypass the Southbridge and use the much (much) faster PCIe.

Networks suck

Not only does anything hanging off the Southbridge suck, but today's networks suck. Network cards generally plug into PCIe, but compared with having a CPU talk directly to its RAM, going out across the network to put or get information takes positively forever. Sadly, we're using networks for everything these days.

We need reliable shared storage between nodes in a cluster so we use a network. That network could be fibre channel, or it could be hyperconverged or anything in between, but it still requires one bundle of "does things with data" to talk to another bundle of "does things with data". This is true no matter whether you call any given device a server, an SAN array, or what-have-you.

Get into the High Performance Computing world and you run into additional problems. Not only do you have servers going out across the network to talk to storage, but you have CPUs reading and writing from RAM on remote systems!

You see, in the HPC world, applications just don't fit into the RAM you can cram into a single node. Many HPC setups are hundreds, if not thousands, of nodes lashed together into a single supercomputer, with each node being able to address the memory of each remote node as though it were local.

Our existing networks – for example Ethernet, InfiniBand and so on – simply weren't designed for this. Believe it or not, this is not a new problem.

Back to the past again

Those of us who paid attention to the HPC world will remember a time, about 10 years ago, when Hypertransport was one of the most promising new technologies available. The really short version was that we were promised networks made out of Hypertransport itself.

CPUs would come in trays and connect to a Hypertransport switch. There would be trays upon trays of RAM. Storage controllers, networking, graphics cards and what-have-you would all plug into the Hypertransport mesh making a crazily extensible supercomputer where everything was just one hop away from the CPU, or the RAM, or another component.

What's more, in a Hypertransport network everything could talk to everything else at the same sorts of speeds normally reserved for the CPU talking to RAM. Hypertransport is packet based, so why not?

Needless to say, this utopia never really arrived.

There were technical issues in the way. Extending Hypertransport even across the length of a motherboard is hard. Getting that to rack and ultimately building a system that works better than a bunch of individual nodes lashed together proved to be problematic.

Do you trade off a bunch of nodes with high-latency links between them, but ultra-low latency between their internal components, for some sort of bizarre frankencomputer with a bunch of components that have moderate latency between them but no need for multiple nodes?

The whole thing got stuck in committee where politics and competing business interests very nearly killed it. Instead of the cheap, easy ultra-high speed revolution we were promised, we got capable – but small and still very niche – outfits like Numascale.

While sporting some nice technology, Numascale hasn't changed the world quite yet, in part because the interconnect used (Hypertransport) isn't universal and broad industry support just isn't quite there.
