Google 'cubists' fix bug in Linux network congestion control, boost performance

It's a wonder the 'net works at all, really


A bit of “quality, non-glamorous engineering” could give a bunch of Linux servers a boost by addressing an unnoticed bug in a congestion control algorithm.

This little code snippet addresses the ten-year-old slip-up in the open-source kernel's net/ipv4/tcp_cubic.c code:

static void bictcp_cwnd_event(struct sock *sk, enum tcp_ca_event event)
{
    if (event == CA_EVENT_TX_START) {
        s32 delta = tcp_time_stamp - tcp_sk(sk)->lsndtime;
        struct bictcp *ca = inet_csk_ca(sk);

        /* We were application limited (idle) for a while.
         * Shift epoch_start to keep cwnd growth to cubic curve.
         */
        if (ca->epoch_start && delta > 0)
            ca->epoch_start += delta;
        return;
    }
}

So what's it all about, Alfie?

The patch was provided by Googlers in the Chocolate Factory's transport networking team, with contributions from Jana Iyengar, Neal Cardwell, and others.

It fixes an old flaw in a set of routines called TCP CUBIC, which was designed to address the “slow response of TCP in fast long-distance networks,” according to its creators.

Like any congestion control algorithm, TCP CUBIC makes decisions based on congestion reports: if the network becomes jammed with traffic, hosts are told to slow down.

As Mozilla developer Patrick McManus explains, the bug was simple: TCP CUBIC interprets a lack of congestion reports as an opportunity to send data at a faster rate.

That condition could, however, arise merely because the connection has been idle: a host that isn't sending anything won't be getting any congestion reports.

What's supposed to happen in congestion control is that the operating system starts sending data slowly, increases its transmission rate until the network says “that's enough”, and then backs off.
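For the curious, that start-slow, speed-up, back-off cycle can be sketched in a few lines of C. This is a deliberately simplified toy in the classic halve-on-congestion style, not CUBIC and not kernel code; the struct, its fields, and the function name are all made up for illustration.

```c
#include <stdbool.h>

/* Hypothetical per-connection state, for illustration only. */
struct toy_cc {
    unsigned cwnd;      /* congestion window, in packets */
    unsigned ssthresh;  /* window size at which rapid growth stops */
};

/* Advance the toy sender by one round trip.
 * congested: true if the network reported congestion during that round trip.
 */
static void toy_cc_on_rtt(struct toy_cc *cc, bool congested)
{
    if (congested) {
        /* "That's enough": halve the window, never below one packet. */
        cc->ssthresh = cc->cwnd / 2 > 1 ? cc->cwnd / 2 : 1;
        cc->cwnd = cc->ssthresh;
    } else if (cc->cwnd < cc->ssthresh) {
        cc->cwnd *= 2;  /* start-up phase: double every round trip */
    } else {
        cc->cwnd += 1;  /* afterwards: probe gently for spare capacity */
    }
}
```

Starting from a window of one packet and a threshold of eight, the toy sender doubles its way up to eight, creeps up one packet at a time after that, and drops to half its window the moment congestion is reported.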

The bug in TCP CUBIC fools the system into thinking it has a clear run at the network and should transmit at the maximum possible rate, crashing into other traffic, and ruining performance and efficiency.

“The end result is that applications that oscillate between transmitting lots of data and then laying quiescent for a bit before returning to high rates of sending will transmit way too fast when returning to the sending state,” McManus explained.

That condition could be quite common, he notes. A server might send a short burst of data over HTTP containing a web form for someone to fill out, go quiet while waiting for a response, assume there's no congestion, and then burst out of the blocks at top rate when the user's response arrives.

“A far more dangerous class of triggers is likely to be the various HTTP based adaptive streaming media formats where a series of chunks of media are transferred over time on the same HTTP channel”, McManus added.

That's why a fix for the ancient bug could be important: Linux is used in many media servers, and for the last decade, an important chunk of congestion control hasn't been working quite right. The patch forces the kernel to act a little more intelligently after an idle period.
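To see why shifting the epoch matters, here is a rough userspace model of the idea. It is a sketch under stated assumptions rather than the kernel's implementation: the window formula and the constants 0.4 and 0.7 follow the CUBIC description in RFC 8312, time is kept in floating-point seconds for readability, and `toy_conn`, `toy_cbrt`, `cubic_target` and `toy_on_tx_start` are hypothetical names.

```c
#define CUBIC_C    0.4  /* scaling constant, per RFC 8312 */
#define CUBIC_BETA 0.7  /* multiplicative decrease factor, per RFC 8312 */

/* Cube root by Newton's method so the sketch needs no libm;
 * cbrt() from <math.h> is the usual choice. */
static double toy_cbrt(double v)
{
    double x = v > 1.0 ? v : 1.0;
    int i;

    if (v <= 0.0)
        return 0.0;
    for (i = 0; i < 60; i++)
        x = (2.0 * x + v / (x * x)) / 3.0;
    return x;
}

/* Target window, in packets, t seconds into the current epoch.
 * w_max is the window size just before the last congestion event. */
static double cubic_target(double w_max, double t)
{
    double k = toy_cbrt(w_max * (1.0 - CUBIC_BETA) / CUBIC_C);
    double d = t - k;

    return CUBIC_C * d * d * d + w_max;
}

/* Hypothetical connection clock state, in seconds. */
struct toy_conn {
    double epoch_start;  /* when the current growth epoch began */
    double last_send;    /* when data was last transmitted */
};

/* The patch's idea: when sending restarts after an idle spell, slide
 * the epoch forward by the idle time so the silence is not mistaken
 * for time spent growing the window. */
static void toy_on_tx_start(struct toy_conn *c, double now)
{
    double delta = now - c->last_send;

    if (c->epoch_start > 0.0 && delta > 0.0)
        c->epoch_start += delta;
}
```

Take a connection with `w_max` of 100 packets whose epoch began at 10 seconds and which last sent at 12 seconds. If it wakes at 42 seconds without the shift, the formula sees 32 seconds of growth and asks for a window of more than 8,000 packets. After `toy_on_tx_start` moves the epoch to 40 seconds, the target is back to roughly 96 packets, right where the curve left off.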

A more technical description is included with the bug fix. ®


Biting the hand that feeds IT © 1998–2021