How the NYE leap second clocked Cloudflare – and how a single character fixed it

DNS bug was a matter of time


When the leap second was added just before the arrival of 2017, Cloudflare stumbled. The content delivery network's DNS service suffered a limited service interruption during the first few hours of the new year.

John Graham-Cumming, head of engineering for the company, said in a phone interview with The Register that the reason for the outage was the misapprehension that time cannot go backward.

Leap seconds have the potential to make time seem to rewind, as far as computers are concerned, by throwing clocks out of sync.

Programmers have a long history of failing to understand or to plan for how time-related code will function in the future. Probably the most well-known example is the Y2K bug. And looking ahead, the year 2038 has the potential to create problems for Unix systems that haven't been updated to store time as a 64-bit integer.

As developer Noah Sussman documented in a blog post several years ago, there are many misunderstandings about dealing with time in software applications.

Graham-Cumming published an analysis of the Cloudflare service interruption on the company blog. The incident affected only a few machines across the company's 102 data centers, about 0.2 per cent of DNS queries, and less than 1 per cent of HTTP requests, we're told.

Cloudflare's RRDNS software is written in Go and uses Go's time.Now() function to fetch the current time. But, as Graham-Cumming explained, Go does not provide monotonicity – a guarantee that time moves only forward.

While this is a known issue among Go developers, Graham-Cumming insisted the responsibility belonged to Cloudflare. "We didn't plan in the code that the time difference would be negative," he said. "That was on us."

Cloudflare's RRDNS software is designed to measure the performance of upstream DNS resolvers handling CNAME lookups. It works when time moves forward but fails otherwise, because Go's rand.Int63n() function panics when passed a negative argument.

The fix proved to be a single character that broadened the scope of the RRDNS error checking. Instead of checking only whether the time value rttMAX was zero, the code was updated to check if rttMAX was equal to or less than zero, for those rare instances when time appeared to be going backward.

Graham-Cumming said Cloudflare plans to prevent potential recurrences of the issue by reviewing its code for time calculation problems and by smearing leap seconds over long periods – as Google does – rather than adding new seconds all at once.

"Computers like things to be really predictable, but we have external input making them unpredictable," said Graham-Cumming. "One of the problems with computers is whenever you deal with the real world, you deal with the mess of the real world." ®

Broader topics


Other stories you might like

  • Cisco warns of security holes in its security appliances
    Bugs potentially useful for rogue insiders, admin account hijackers

    Cisco has alerted customers to another four vulnerabilities in its products, including a high-severity flaw in its email and web security appliances. 

    The networking giant has issued a patch for that bug, tracked as CVE-2022-20664. The flaw is present in the web management interface of Cisco's Secure Email and Web Manager and Email Security Appliance in both the virtual and hardware appliances. Some earlier versions of both products, we note, have reached end of life, and so the manufacturer won't release fixes; it instead told customers to migrate to a newer version and dump the old.

    This bug received a 7.7 out of 10 CVSS severity score, and Cisco noted that its security team is not aware of any in-the-wild exploitation, so far. That said, given the speed of reverse engineering, that day is likely to come. 

    Continue reading
  • Cisco execs pledge simpler, more integrated networks
    Is this the end of Switchzilla's dashboard creep?

    Cisco Live In his first in-person Cisco Live keynote in two years, CEO Chuck Robbins didn't make any lofty claims about how AI is taking over the network or how the company's latest products would turn networking on its head. Instead, the presentation was all about working with customers to make their lives easier.

    "We need to simplify the things that we do with you. If I think back to eight or ten years ago, I think we've made progress, but we still have more to do," he said, promising to address customers' biggest complaints with the networking giant's various platforms.

    "Everything we find that is inhibiting your experience from being the best that it can be, we're going to tackle," he declared, appealing to customers to share their pain points at the show.

    Continue reading
  • Mega's unbreakable encryption proves to be anything but
    Boffins devise five attacks to expose private files

    Mega, the New Zealand-based file-sharing biz co-founded a decade ago by Kim Dotcom, promotes its "privacy by design" and user-controlled encryption keys to claim that data stored on Mega's servers can only be accessed by customers, even if its main system is taken over by law enforcement or others.

    The design of the service, however, falls short of that promise thanks to poorly implemented encryption. Cryptography experts at ETH Zurich in Switzerland on Tuesday published a paper describing five possible attacks that can compromise the confidentiality of users' files.

    The paper [PDF], titled "Mega: Malleable Encryption Goes Awry," by ETH cryptography researchers Matilda Backendal and Miro Haller, and computer science professor Kenneth Paterson, identifies "significant shortcomings in Mega’s cryptographic architecture" that allow Mega, or those able to mount a TLS MITM attack on Mega's client software, to access user files.

    Continue reading
  • CISA and friends raise alarm on critical flaws in industrial equipment, infrastructure
    Nearly 60 holes found affecting 'more than 30,000' machines worldwide

    Updated Fifty-six vulnerabilities – some deemed critical – have been found in industrial operational technology (OT) systems from ten global manufacturers including Honeywell, Ericsson, Motorola, and Siemens, putting more than 30,000 devices worldwide at risk, according to private security researchers. 

    Some of these vulnerabilities received CVSS severity scores as high as 9.8 out of 10. That is particularly bad, considering these devices are used in critical infrastructure across the oil and gas, chemical, nuclear, power generation and distribution, manufacturing, water treatment and distribution, mining and building and automation industries. 

    The most serious security flaws include remote code execution (RCE) and firmware vulnerabilities. If exploited, these holes could potentially allow miscreants to shut down electrical and water systems, disrupt the food supply, change the ratio of ingredients to result in toxic mixtures, and … OK, you get the idea.

    Continue reading
  • This startup says it can glue all your networks together in the cloud
    Or some approximation of that

    Multi-cloud networking startup Alkira has decided it wants to be a network-as-a-service (NaaS) provider with the launch of its cloud area networking platform this week.

    The upstart, founded in 2018, claims this platform lets customers automatically stitch together multiple on-prem datacenters, branches, and cloud workloads at the press of a button.

    The subscription is the latest evolution of Alkira’s multi-cloud platform introduced back in 2020. The service integrates with all major public cloud providers – Amazon Web Services, Google Cloud, Microsoft Azure, and Oracle Cloud – and automates the provisioning and management of their network services.

    Continue reading
  • Info on 1.5m people stolen from US bank in cyberattack
    Time to rethink that cybersecurity strategy?

    A US bank has said at least the names and social security numbers of more than 1.5 million of its customers were stolen from its computers in December.

    In a statement to the office of Maine's Attorney General this month, Flagstar Bank said it was compromised between December and April 2021. The organization's sysadmins, however, said they hadn't fully figured out whose data had been stolen, and what had been taken, until now. On June 2, they concluded criminals "accessed and/or acquired" files containing personal information on 1,547,169 people.

    "Flagstar experienced a cyber incident that involved unauthorized access to our network," the bank said in a statement emailed to The Register.

    Continue reading
  • 1Password's Insights tool to help admins monitor users' security practices
    Find the clown who chose 'password' as a password and make things right

    1Password, the Toronto-based maker of the identically named password manager, is adding a security analysis and advice tool called Insights from 1Password to its business-oriented product.

    Available to 1Password Business customers, Insights takes the form of a menu addition to the right-hand column of the application window. Clicking on the "Insights" option presents a dashboard for checking on data breaches, password health, and team usage of 1Password throughout an organization.

    "We designed Insights from 1Password to give IT and security admins broader visibility into potential security risks so businesses improve their understanding of the threats posed by employee behavior, and have clear steps to mitigate those issues," said Jeff Shiner, CEO of 1Password, in a statement.

    Continue reading

Biting the hand that feeds IT © 1998–2022