Back in September, IBM was left red-faced when its global load balancer and reverse DNS services fell over for 21 hours.
At the time, IBM blamed the outage on a third-party domain name registrar that was transferring some domains to another registrar. The sending registrar, IBM said, accidentally put the domains in a "hold state" that prevented them from being transferred. As the load balancer and reverse DNS service relied on the domains in question, the services became inaccessible to customers.
IBM's now released an incident summary [PDF] in which it says “multiple domain names were mistakenly allowed to expire and were in hold status.”
The explanation also reveals that the network-layer.net domain was caught up in the mess, in addition to the global-datacenter.com and global-datacenter.net domains that IBM reported as messed up in September.
It's unclear whether IBM or its outsourced registrar was responsible for the failure to renew the domains' registrations.
Whoever's to blame, the incident is not a good look for a company staking its future on the cloud and AI-driven automation.
IBM's not alone in cooking its cloud with avoidable errors: Google today admitted that a brownout in its Canadian bit barns was caused by a patch that had been applied elsewhere but, for unspecified reasons, had not made it above the 49th parallel. ®