How many PCI Express (PCIe) lanes does your computer have? How many of those are directly provided by the CPU? With few exceptions, nobody outside the high-end gaming and High Performance Computing (HPC) crowds cares. Regular punters (maybe) care about how many PCIe slots they have.
So why do gamers and HPC types get worked up about the details of lanes, and why are start-ups emerging with the mission of taking PCIe out of the server and using it to connect nodes?
We should start with the question: what exactly is PCIe? Most readers of The Register will have heard of PCIe, and a goodly lot of you will have plugged in a PCIe card, ExpressCard, or other similar device. For most people, PCIe is a bus within the computer that allows us to attach devices to it, usually by installing them inside the case in a permanent fashion.
Graphics cards, RAID controllers, network cards, HBAs and just about everything else you can think of connects to a modern computer through PCIe. Other items you might add to a computer – a hard disk via the SATA interface or a keyboard via USB – plug into a controller which often backs on to PCIe.
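For readers who want to answer the opening question on their own kit, here is a minimal sketch that lists each PCIe device's negotiated link. It assumes a Linux machine, where the kernel exposes `current_link_speed` and `current_link_width` files for each device under `/sys/bus/pci/devices`; the function name `read_pcie_links` is our own invention, and other operating systems will need a different approach.

```python
from pathlib import Path


def read_pcie_links(sysfs_root="/sys/bus/pci/devices"):
    """Return {device_address: (link_speed, lane_count)} for devices reporting a link.

    Assumes the Linux sysfs layout; returns an empty dict if the directory is absent.
    """
    links = {}
    root = Path(sysfs_root)
    if not root.is_dir():
        return links  # not Linux, or sysfs is not mounted

    for dev in sorted(root.iterdir()):
        width_file = dev / "current_link_width"
        speed_file = dev / "current_link_speed"
        if not (width_file.is_file() and speed_file.is_file()):
            continue  # device exposes no link attributes (for example, legacy PCI)
        try:
            width = int(width_file.read_text().strip())
            speed = speed_file.read_text().strip()
        except (OSError, ValueError):
            continue  # unreadable or malformed entry; skip it
        links[dev.name] = (speed, width)
    return links


if __name__ == "__main__":
    for address, (speed, width) in read_pcie_links().items():
        print(f"{address}: x{width} at {speed}")
```

This reports the lanes each device has actually negotiated, which is not the same as the total the CPU or chipset could provide; for that figure you still need the processor's data sheet.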
To understand how this all slots together today, it's easiest to understand how it all used to work.
The ultimate goal of a modern PC is to get data to the CPU, which crunches numbers, and then back out again in some useful form. Way back when, in the beforetime, the CPU was an isolated device. It talked to the rest of the system through the Front Side Bus (FSB). On the other end of the FSB was the Northbridge.
The RAM, which holds data the CPU needs fast access to, and the high-speed peripheral bus (AGP or PCIe) were controlled by the Northbridge. I/O (things like PCI, SATA, USB and so forth) was controlled by the Southbridge. The Southbridge and the Northbridge were connected by their own interconnect.
Eventually, the Northbridge was cut in half, with the memory controller going inside the CPU. The PCIe lanes were left to be managed by their own chip, but the FSB was no longer capable of keeping up.
The FSB was replaced by QPI (Intel) or HyperTransport (AMD). QPI and HyperTransport are ultra-high-speed buses that connect individual CPUs (and their associated memory) both to each other and to what was left of the Northbridge. The Southbridge and the Northbridge would talk to one another through a separate protocol.
Eventually, what was left of the Northbridge (the PCIe controller) was simply built into the CPU as well. The Northbridge had disappeared entirely.
This means that PCIe devices have a shorter path to the CPU (yay!). If, however, you want more PCIe lanes on a system than are provided by the CPU die itself, you either need another CPU or you hang them off the (distant and slow) Southbridge (boo!).
This has some real-world consequences. Looking at the modern Intel chips, those processors designed to be single processor only (such as desktop chips) don't have QPI. It was not deemed necessary: there is no Northbridge to speak of (it having been incorporated entirely into the CPU die), and these models are designed so that adding more CPUs isn't possible.
Adding PCIe lanes is thus not really possible. The Southbridges paired with desktop processors typically come with a limited number of previous-generation PCIe lanes, but those lanes have to contend with USB, SATA and so forth for the limited bandwidth provided by the DMI link between the CPU and the Southbridge.
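To put rough numbers on that contention, here is a back-of-the-envelope sketch. The per-lane figures follow from the published PCIe signalling rates and line encodings. Everything else is an assumption for illustration: we treat the DMI link as equivalent to four PCIe 2.0 lanes (which matches DMI 2.0, but check your own chipset), and the devices and their peak demands are made up.

```python
# Signalling rate in megatransfers per second, plus line encoding
# (payload bits, total bits) for each PCIe generation.
PCIE_GENERATIONS = {
    1: (2500, (8, 10)),     # 8b/10b encoding
    2: (5000, (8, 10)),     # 8b/10b encoding
    3: (8000, (128, 130)),  # 128b/130b encoding
}


def lane_bandwidth_mb_s(generation):
    """Usable bandwidth of one lane, per direction, in MB/s."""
    megatransfers, (payload_bits, total_bits) = PCIE_GENERATIONS[generation]
    return megatransfers * payload_bits / total_bits / 8  # bits to bytes


def link_bandwidth_mb_s(generation, lanes):
    """Usable bandwidth of a multi-lane link, per direction, in MB/s."""
    return lane_bandwidth_mb_s(generation) * lanes


def oversubscription(demands_mb_s, uplink_mb_s):
    """Ratio of combined peak demand to uplink capacity (above 1.0 means contention)."""
    return sum(demands_mb_s.values()) / uplink_mb_s


# ASSUMPTION: the DMI uplink behaves like a PCIe gen-2 x4 link.
dmi_uplink = link_bandwidth_mb_s(2, 4)

# HYPOTHETICAL devices hanging off the Southbridge, with invented peak demands.
southbridge_devices = {
    "SATA SSD": 550,
    "USB 3.0 drive": 400,
    "x4 gen-2 add-in card": link_bandwidth_mb_s(2, 4),
}

if __name__ == "__main__":
    ratio = oversubscription(southbridge_devices, dmi_uplink)
    print(f"DMI uplink: {dmi_uplink:.0f} MB/s, peak demand ratio: {ratio:.2f}")
```

With these invented figures, a single four-lane card can saturate the uplink on its own, before the SATA and USB traffic gets a look-in.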
You could try adding in another Southbridge and sharing that DMI link, but it's already overloaded so the results will be pretty bad. Intel multiprocessor server chips, however, do have QPI. While it's theoretically possible to build a chip that hangs on the QPI bus and provides more PCIe lanes for the CPU to use, manufacturers just aren't producing Northbridge stubs like they used to. If you want more PCIe lanes, you'd better add some CPUs.
In a system-on-a-chip solution, the Southbridge is integrated into the CPU die and things can get all manner of complicated, as engineers seek to cut out as many interconnects as possible.
So why does all of this matter, and of what practical use is knowing how it goes together?