Linux kernel patch from Google speeds up server shutdowns

First world problems: Too many NVMe drives, not enough seconds to spare


A new Linux kernel patch from a Google engineer resolves a problem caused by a condition that many of us might quite like to experience – having too many NVMe drives.

The problem is caused by the relatively long time it takes to properly shut down a drive: apparently, as much as four-and-a-half seconds.

Remember Sun's X4500 storage server, originally codenamed Thumper? It was truly radical when it appeared: a 4U dual-processor server, but with a stonking 48 drive bays. These days Google has a bunch of boxes with a still-fairly-impressive 16 NVMe drives attached to each one. And when they have to reboot, they take a long time.

If you have a storage server with 16 drives, that's 72 seconds of wasted time hanging around every time it reboots. Hardly an eon, but still annoying – because it's totally unnecessary.

The problem is that the kernel's drive-shutdown function is synchronous: for each drive, it waits for the shutdown command to complete before carrying on to the next. The new kernel patch does exactly the same work, but issues the calls asynchronously. It sends the command to the first drive, then immediately moves on to the next, and works its way down the list. When they have all returned the desired status, the job is done.
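To make the difference concrete, here is a toy userspace model in Python. It is only a sketch: the real change lives in the kernel's C code, and every name here (shutdown_drive, shutdown_serial, shutdown_concurrent and so on) is hypothetical, invented for illustration. The only figures taken from the story are the 16 drives and the roughly 4.5 seconds each.

```python
import asyncio
import time

# Figures quoted in the article; everything else here is an illustrative assumption.
SHUTDOWN_SECONDS = 4.5
DRIVE_COUNT = 16


def serial_estimate(drives: int, per_drive: float) -> float:
    """Waiting for each drive in turn costs the sum of all the waits."""
    return drives * per_drive


def concurrent_estimate(drives: int, per_drive: float) -> float:
    """Issuing every command up front costs roughly one (the slowest) wait."""
    return per_drive if drives else 0.0


async def shutdown_drive(name: str, delay: float) -> str:
    """Stand-in for one drive's shutdown: just sleeps for `delay` seconds."""
    await asyncio.sleep(delay)
    return name


async def shutdown_serial(names: list[str], delay: float) -> list[str]:
    """Old behaviour: wait for each drive to finish before starting the next."""
    finished = []
    for name in names:
        finished.append(await shutdown_drive(name, delay))
    return finished


async def shutdown_concurrent(names: list[str], delay: float) -> list[str]:
    """New behaviour: start every shutdown, then wait for all of them together."""
    return list(await asyncio.gather(*(shutdown_drive(n, delay) for n in names)))


def timed(shutdown, names: list[str], delay: float) -> tuple[list[str], float]:
    """Run one of the two strategies and report how long it took."""
    start = time.perf_counter()
    result = asyncio.run(shutdown(names, delay))
    return result, time.perf_counter() - start


if __name__ == "__main__":
    print("serial estimate:    ", serial_estimate(DRIVE_COUNT, SHUTDOWN_SECONDS), "s")
    print("concurrent estimate:", concurrent_estimate(DRIVE_COUNT, SHUTDOWN_SECONDS), "s")

    # Use a tiny delay so the demonstration finishes quickly.
    drives = [f"nvme{i}" for i in range(DRIVE_COUNT)]
    for strategy in (shutdown_serial, shutdown_concurrent):
        _, elapsed = timed(strategy, drives, 0.05)
        print(f"{strategy.__name__}: {elapsed:.2f}s")
```

Both strategies shut down the same drives and report the same results; only the waiting overlaps. In this model the serial total grows with the number of drives, while the concurrent total stays close to the time for a single drive.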

Presto, a minute off your reboot time. If you have more storage than Larry Page's home computer, that is.

Although this doesn't directly help most of us, these sorts of changes can sometimes have very pleasant side effects. For instance, there's a tool for kernel developers called kexec, which allows one kernel to load another kernel into memory and start it. It also has a very desirable side effect: it lets you turbocharge Linux restarts. Your computer has to spend a minute or so in its firmware, performing self-tests and so on, before it loads the operating system. If you can bypass that and restart directly from one kernel into the next, you can reboot in seconds rather than minutes. And if you're thinking that you have an SSD and bootups are super quick anyway, the effect is even more extreme with an SSD, since the firmware then accounts for a larger share of the total. ®

Bootnote

As with many other things, progress has made matters worse: installing the kexec-tools package on Ubuntu, which used to just magically work, unfortunately no longer does. Do let us know if you find a working fix.


Biting the hand that feeds IT © 1998–2022