It's always DNS, especially when a sysadmin makes a hash of their semicolons
Remember the days when 'we made it up as we went along'?
Who, Me? DNS (or the Devil's Naming Service as we've heard it called) takes centre stage in this week's tale from the Who, Me? vaults: a warning of the terrors of the forgotten typo.
A Register reader, "Hugo", shared today's story, which takes us back to the late 1990s, when the commercial internet was an optimistic glimpse of the future and outages were a thing that happened when someone accidentally picked up an extension elsewhere in the house.
Hugo was a senior sysadmin for the UK division of a certain global ISP (let's call it "BigNet", for that was certainly not its name).
"BigNet," he told us, "hadn't invested much in tools and automation, and for many things we made it up as we went along.
"There was no customer web portal for anything and they had to raise a ticket, by email, for things like DNS changes which were then actioned by Customer Services."
Happier and simpler times. Until the day Hugo came into work and found the place in uproar.
"DNS was down for every customer, primary domains and secondary," he told us. "The brown stuff really had hit the rotating air displacement machines."
Hugo sprang into action, pulling the DNS server logs and swiftly finding errors. At first they made no sense whatsoever until an awful, creeping realisation dawned.
Remember how he told us that there had been precious little investment in automation? Included in that sacrifice on the altar of corporate perks were tools to edit the DNS. A few helper scripts were used, which basically invoked Vi to edit the zone files.
Vi, for those spared the editor wars, is a venerable text editor much beloved by Unix admins. Others swear by Emacs (others still have been known to just swear at Emacs, but we'll step away from that argument).
As far as the scripts were concerned, there was some simple templating to assist with creating a Start of Authority (SOA) record, but no actual validation of the zone occurred. There was also no history or versioning. There was only the date and owner of the file.
It turned out that Hugo had made the last edit, two weeks ago.
"I had probably been working on a perl or bash script," he told us, "and on the same day I edited the zone file for uk.bignet.net which was in the SOA record for every domain we hosted."
He went on: "In Bind zone files, the comment character is a semicolon, but I accidentally used a hash, and whilst Bind loaded the zone file, it decided it was no longer authoritative, and this went unnoticed."
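For the curious, the slip Hugo describes is easy to spot mechanically. What follows is a minimal sketch, not anything from BigNet: the function name and the sample zone are invented for illustration, and it only catches lines that begin with a hash, since a hash elsewhere (inside a TXT record, say) can be perfectly legitimate.

```python
def find_hash_comments(zone_text: str) -> list[int]:
    """Return 1-based line numbers whose first non-blank character is '#'.

    Zone files use ';' for comments, so a leading '#' is almost certainly
    a shell or Perl habit leaking through. (Hypothetical helper.)
    """
    suspects = []
    for number, line in enumerate(zone_text.splitlines(), start=1):
        if line.lstrip().startswith("#"):
            suspects.append(number)
    return suspects


# Invented sample: line 1 is a valid comment, line 2 is the typo.
SAMPLE_ZONE = """\
; a proper zone file comment
# a shell-style comment that does not belong here
ns1  IN  A  192.0.2.1
"""

print(find_hash_comments(SAMPLE_ZONE))
```

Run against the sample, it should point the finger at line 2 and leave the semicolon comment alone.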
This was all well and good until the default two-week time-to-live expired. Since every other domain depended on that one being valid and authoritative, that expiration meant Bind stopped serving all the other domains.
The fix was trivial. Hugo switched the comment to a semicolon, hurriedly pushed out the update and restarted all the name servers. The relief was palpable as the services came back up.
Hugo's fate was, unsurprisingly, to create a DNS zone file validator to prevent further "accidents". Mindful of his own brush with a pink slip, Hugo upped the paranoia of the tool from merely warning of errors to issuing a full-on stop when validation failed.
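Hugo's tool is not reproduced in his tale, so the following is only a guess at the shape of such a thing: a sketch in which `check_zone` and `gate` are hypothetical names, and the two checks (stray hash comments, unbalanced parentheses) are stand-ins for whatever rules he really enforced. The `strict` flag marks the difference between a warning and a full stop.

```python
import sys


def check_zone(zone_text: str) -> list[str]:
    """Collect basic errors from a zone file. Deliberately simplistic:
    it does not understand quoted strings containing ';', '(' or ')'."""
    errors = []
    depth = 0
    for number, line in enumerate(zone_text.splitlines(), start=1):
        if line.lstrip().startswith("#"):
            errors.append(f"line {number}: '#' is not a comment character, use ';'")
            continue
        # Ignore anything after the comment character before counting parens.
        code = line.split(";", 1)[0]
        depth += code.count("(") - code.count(")")
    if depth != 0:
        errors.append("unbalanced parentheses")
    return errors


def gate(zone_text: str, strict: bool = True) -> int:
    """Report errors on stderr; return 1 (refuse) in strict mode if any
    were found, otherwise 0 (allow the edit through)."""
    errors = check_zone(zone_text)
    for message in errors:
        print(message, file=sys.stderr)
    return 1 if (errors and strict) else 0


# Invented example: the same bad file, first as a warning, then as a stop.
BAD_ZONE = "# edited by hand\n@ IN SOA ns1.example. admin.example. ( 1 2 3 4 5 )\n"
print(gate(BAD_ZONE, strict=False))
print(gate(BAD_ZONE, strict=True))
```

A wrapper script could pass the return value of `gate` to `sys.exit` so that a failed check blocks the zone from being pushed out, which is presumably the behaviour that so annoyed the provisioning team.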
The customer services and provisioning team hated it "for reasons I couldn't understand," he said, "until I also ran the checker across all the domains we hosted and found something like 15 per cent of them had basic errors."
Hugo, it seemed, was not alone when it came to cavalier treatment of critical files.
Ever been struck by the curse of the wrong comment character? Or a mistake made weeks ago rearing its head in a most unpleasant way? Share your tale of woe with an email to Who, Me? ®