Everything you wanted to know about modern network congestion control but were perhaps too afraid to ask

In which a little unfairness can be quite beneficial

Systems Approach It’s hard not to be amazed by the amount of active research on congestion control over the past 30-plus years. From theory to practice, and with more than its fair share of flame wars, the question of how to manage congestion in the network is a technical challenge that resists an optimal solution while offering countless options for incremental improvement.

This seems like a good time to take stock of where we are, and ask ourselves what might happen next.

Congestion control is fundamentally an issue of resource allocation — trying to meet the competing demands that applications have for resources (in a network, these are primarily link bandwidth and router buffers), which ultimately reduces to deciding when to say no and to whom. The best framing of the problem I know traces back to a paper [PDF] by Frank Kelly in 1997, when he characterized congestion control as “a distributed algorithm to share network resources among competing sources, where the goal is to choose source rate so as to maximize aggregate source utility subject to capacity constraints.”
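Kelly’s formulation, later developed into the framework known as network utility maximization (NUM), gives each source s a rate x_s and a utility function U_s, and asks the network to solve:

```latex
\max_{x \ge 0} \; \sum_{s} U_s(x_s)
\quad \text{subject to} \quad
\sum_{s \,:\, l \in \mathrm{path}(s)} x_s \le c_l \quad \text{for every link } l
```

where c_l is the capacity of link l. Different choices of U_s encode different fairness criteria; Kelly showed that logarithmic utilities, U_s(x_s) = log x_s, yield proportionally fair allocations.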

This is a hard problem in any computer system, but what makes it especially difficult in this case is the decentralized nature of the Internet. Not only is the Internet’s approach to congestion control distributed across its millions of hosts and routers, but it is also fair to think of them as cooperatively trying to achieve a globally optimal solution. From this perspective, there is a shared objective function, and all the elements implement a distributed algorithm to optimize that function. Compounding the challenge, and arguably the reason there are so many approaches to congestion control, is the lack of consensus on the right objective function.

Of course, everyone wants high throughput, low latency, stability, and fairness, but it’s how you trade those off against each other that makes it a tough question to answer. To make matters worse, the problem is over-constrained, meaning that each solution must choose which constraints to give the most – and least – weight to.

Fairness has been a particularly thorny issue, not only when considering a given algorithm competing against itself, but especially when comparing algorithm A with algorithm B. If A measures improved throughput over B, but achieves it by being more aggressive and hence stealing bandwidth from B’s flows, then A’s improvement is not fairly gained and may be discounted. In such an analysis, Jain’s fairness index [PDF] has historically been used as a quantitative measure of fairness.
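Jain’s index for per-flow throughputs x_1..x_n is (Σx_i)² / (n·Σx_i²): it equals 1 when every flow gets an equal share and falls toward 1/n as a single flow dominates. In code:

```python
def jain_fairness_index(throughputs):
    """Jain's fairness index: (sum x)^2 / (n * sum x^2).

    Returns 1.0 when all flows receive equal throughput, and
    approaches 1/n when one flow captures nearly everything.
    """
    n = len(throughputs)
    total = sum(throughputs)
    sum_sq = sum(x * x for x in throughputs)
    return total * total / (n * sum_sq)
```

For example, four flows at 10 Mbps each score a perfect 1.0, while an allocation of 30/10/10/10 scores 0.75.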

Fairness arguments have been used for 30 years in favor of the incumbent algorithm (whatever it happens to be), making Ranysha Ware’s recent proposal [PDF] to measure harm instead of fairness a welcome breath of fresh air in this ongoing debate. Ware et al. advocate a threshold based on harm, as measured by a reduction in throughput or an increase in latency. Intuitively, if the harm that flows using a new mechanism B cause flows using an existing mechanism A stays within a bound derived from how much harm A-managed flows cause other A-managed flows, we can consider B deployable alongside A.
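A rough sketch of that intuition, using throughput reduction as the harm measure. The function and variable names here are hypothetical illustrations, not from the paper, whose methodology is considerably more careful about workloads and metrics:

```python
def harm(solo_throughput, competing_throughput):
    """Fractional throughput a flow loses when sharing the bottleneck,
    relative to running alone (0 = no harm, 1 = completely starved)."""
    return (solo_throughput - competing_throughput) / solo_throughput

def deployable_alongside(a_solo, a_vs_a, a_vs_b):
    """Simplified harm-based test: new mechanism B is deployable
    alongside incumbent A if B's flows harm A's flows no more than
    A's flows harm each other.

    a_solo  -- throughput an A flow gets running alone
    a_vs_a  -- throughput an A flow gets competing with another A flow
    a_vs_b  -- throughput an A flow gets competing with a B flow
    """
    return harm(a_solo, a_vs_b) <= harm(a_solo, a_vs_a)
```

So if an A flow drops from 100 Mbps alone to 50 Mbps against another A flow, a B flow that leaves A with 55 Mbps passes the test, while one that leaves A with 30 Mbps does not.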

Unfortunately, replacing fairness with harm doesn’t eliminate the over-constrained nature of the problem. But what might is the proliferation of purpose-built mechanisms targeted at specific usage domains. Based on my reading of a lot of the recent research, this is where congestion control algorithms seem to be headed.

Data Center TCP (DCTCP) is a great example. It targets congestion within the data center, and so assumes link bandwidths of 10 Gbps or higher and RTTs typically measured in the tens of microseconds.

It doesn’t have to respond to as wide a range of operating parameters as the Internet as a whole. Even more importantly, because the data center is contained within a single administrative domain, it is possible to enable features like Explicit Congestion Notification (ECN) on every single switch, and optimize the algorithm to take advantage of that information.
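The core of DCTCP’s response, as published, is to maintain a running estimate of the fraction of ECN-marked packets and cut the window in proportion to that estimate, rather than halving it as classic TCP does. A minimal sketch (class and variable names are mine, not from any particular implementation):

```python
class DctcpWindow:
    """Sketch of DCTCP's congestion response.

    alpha is an EWMA estimate of the fraction of packets carrying
    ECN marks; on congestion, the window is reduced by alpha/2
    instead of being halved outright, so mild congestion causes
    only a mild backoff.
    """
    def __init__(self, cwnd, g=1/16):
        self.cwnd = cwnd
        self.alpha = 0.0
        self.g = g  # EWMA gain

    def on_window_of_acks(self, marked, total):
        frac = marked / total  # fraction of ECN-marked packets this window
        self.alpha = (1 - self.g) * self.alpha + self.g * frac
        if marked:
            self.cwnd *= (1 - self.alpha / 2)
```

With no marks the window is untouched; a fully marked window early in a flow’s life trims the window only slightly, because alpha ramps up gradually.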

You could argue that the data center is a special case, but I’m not so sure. Google’s Bottleneck Bandwidth and RTT (BBR) mechanism is another example worth considering.

It is general-purpose in that it attempts to handle the wide range of scenarios that any Internet-worthy algorithm would have to respond to, but it’s not particularly fair when competing with non-BBR flows. But that doesn’t matter if it’s only deployed within Google’s backbone, interconnecting their data centers. In that case, BBR only has to be fair to its own flows. Is it possible to generalize from this example?
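Unlike loss-based algorithms, BBR builds an explicit model of the path: it tracks the bottleneck bandwidth (the maximum recent delivery rate) and the minimum observed RTT, then paces transmission at roughly their product. A highly simplified sketch of that estimation, with the names and the sample-window size chosen for illustration rather than taken from any implementation:

```python
from collections import deque

class BbrEstimator:
    """Toy model of BBR's two path estimates."""
    def __init__(self, window=10):
        self.rates = deque(maxlen=window)  # recent delivery-rate samples
        self.min_rtt = float("inf")        # smallest RTT seen so far

    def on_ack(self, delivered_bytes, interval_s, rtt_s):
        # Delivery rate = bytes acknowledged over the sampling interval.
        self.rates.append(delivered_bytes / interval_s)
        self.min_rtt = min(self.min_rtt, rtt_s)

    def pacing_rate(self, gain=1.0):
        # Pace at (gain x estimated bottleneck bandwidth), bytes/sec.
        return gain * max(self.rates)

    def bdp(self):
        # Bandwidth-delay product: roughly how much data fits in the pipe.
        return max(self.rates) * self.min_rtt
```

The real algorithm cycles the gain above and below 1.0 to probe for more bandwidth and to drain queues; the point of the sketch is that the sending rate comes from a model of the path, not from reacting to loss.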

Perhaps it is. In the early decades of the Internet, end-to-end meant between any two hosts in the Internet. But today, direct communication between arbitrary hosts is the exception, not the rule. Most TCP connections on today’s Internet can be categorized as follows: edge-to-cloud (between an end user device and the nearest cloud hosting center or CDN server); cloud-to-cloud (typically over a provider’s backbone); or intracloud (between two servers within a data center).

The latter two are usage domains that already run purpose-built TCP congestion control algorithms, as covered above. And the first is well on its way to being reinvented — with the rollout of 5G (and all the spectrum-allocation algorithms 5G implies) for mobile devices, and with cloud-provider last-mile offerings for our streaming and gaming devices at home.

None of this means the resource allocation problem goes away, but it might become more tractable as deployments become more customized. Most importantly, we won’t have to agree on one universal mechanism, except as a fallback for those rare occasions when true peer-to-peer transfers are required. This may well accelerate the rate at which congestion control papers are published, since the “it’s not fair” argument would no longer be relevant.

Of course, if the Internet truly re-decentralizes in the coming years, congestion control might once again need to address global optimization problems, but perhaps that will happen without a single incumbent algorithm being favored by default. It seems congestion control researchers have a secure future. ®

Larry Peterson and Bruce Davie are the authors of Computer Networks: A Systems Approach and the related Systems Approach series of books. All their content is open source and available on GitHub. You can find them on Twitter, their writings on Substack, and past The Register columns here.
