How a tiny leap-day miscalculation trashed Microsoft Azure

Redmond drills into cause of eight-hour outage


As soon as Microsoft's cloudy platform Azure crashed to Earth, and stayed there for eight hours, on 29 February, every developer who has ever had to handle dates immediately figured it was a leap-day bug.

Now the software biz behemoth has put its hands up and admitted, in a detailed dissection of the blunder, how a calendar glitch trashed its server farm. The write-up also doubles as a handy guide to setting up your own wholesale-sized cloud platform.

The mega-crash stemmed from the handling of messages passed between a virtual machine running a client's application and the underlying host operating system running on each of the Azure servers.

These messages are encrypted using a public-private key pair taken from a "transfer certificate" generated within the VM. This security measure allows the host OS and the VM to trust their communications channel, through which stuff such as SSL certificates and diagnostic health checks are sent.

A transfer certificate is valid for a year from its creation date. A certificate created by an agent in a VM on 29 February 2012 would therefore expire on 29 February 2013, a date that simply doesn't exist. This crashed the certificate validation process, bringing the start-up of the VM to a halt.
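To see why "same day, next year" goes wrong, here is a minimal Python sketch. It assumes the expiry date was worked out by simply adding one to the year, which is the classic way to trip over a leap day; the function names `naive_expiry` and `safe_expiry` are made up for illustration and are not taken from Azure's code.

```python
from datetime import date


def naive_expiry(created: date) -> date:
    """Add one to the year and keep the month and day unchanged.

    Raises ValueError when `created` is 29 February and the following
    year is not a leap year, because that date does not exist.
    """
    return created.replace(year=created.year + 1)


def safe_expiry(created: date) -> date:
    """Same idea, but fall back to 28 February when 29 February is missing.

    Falling back to 28 February is one possible policy (an assumption here);
    rolling forward to 1 March would be another.
    """
    try:
        return created.replace(year=created.year + 1)
    except ValueError:
        return created.replace(year=created.year + 1, day=28)


if __name__ == "__main__":
    leap_day = date(2012, 2, 29)
    try:
        naive_expiry(leap_day)
    except ValueError as err:
        print("naive calculation failed:", err)
    print("safe calculation:", safe_expiry(leap_day))
```

Any other creation date goes through both functions unchanged, which is why a bug of this kind can sit unnoticed for up to four years.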

The host OS tries to restart a crashed app VM every 25 minutes, which would have been fruitless in this case. After a series of failures, the host OS declares the hardware to be at fault and reports the server as knackered. Automatic systems that manage the clusters of servers then try to self-heal the cloud by restarting the VMs on other boxes. In this scenario, that only caused those boxes to fall over as well, cascading the gaffe into a full-blown outage.

It didn't help that, at the time, new versions of the cloud's platform software were being rolled out, which required the generation of new, albeit broken, transfer certificates. Once enough servers are reported faulty in a cluster, the whole set is put on red alert, halting self-healing and software updates to minimise the damage.

Microsoft engineers confessed that in a rush to roll out the fix to the servers, they hit incompatibility problems within their own code, which knocked out services again.

You can read the full cock-up, blow by blow, in Microsoft's write-up. Microsoft has said it will cough up service credits for customers walloped by the Azure outage. ®

Biting the hand that feeds IT © 1998–2022