IBM's cloud has experienced a significant Severity One outage – the rating Big Blue uses to denote the most serious incidents that make resources in its cloud unavailable to customers.
The impact was indeed severe: IBM stated that users might not be able to access its catalogue of cloudy services or provision affected services.
Speaking of which, there were 23 – among them Cloud Object Storage, Block Storage Snapshots for VPC, Load Balancers and virtual private networks. In other words, some basic building blocks of enterprise IT that clouds are supposed to be able to scale as and when you – or your infrastructure-as-code – need more resource.
Emails sent to IBM customers and the company's incident customer portal offered differing timelines for the incident. The emails give 16:41 UTC on August 2nd as the moment IBM started investigating the issue, but the portal's first listed action is ongoing probes as of 20:56. The two timelines converge at 21:24, when mitigation commenced, before resolution at 00:54 on August 3rd.
IBM's cloud endured a similar outage on Sunday, and some of the services that could not be provisioned over the weekend were also down today. Big Blue's Cloud Console was among the weekend casualties.
IBM also had an outage on July 22nd, when users were unable to log on to its cloud, and experienced similar outages on April 3rd, April 26th, and May 31st. The company told The Register the July outage was due to the temporary demise of Akamai Edge DNS.
In June 2020, IBM's cloud went down so hard even its self-hosted status page was unavailable. The Register asked IBM if it is working to ensure its cloud will not fall victim to single points of failure — be they at Akamai, in-house or elsewhere.
The company did not reply to our queries.
The upshot is that IBM has twice in two days been unable to present its cloud catalogue to users or ensure they can provision all services.
Yes, the same IBM that has bet its business on hybrid clouds.
IBM is not alone in having big problems, though.
Google Cloud has also experienced outages in recent days. The company's Component Access Gateway produced errors for nearly two days, and last week users endured three days and ten hours of trouble during which persistent SSDs could not be reliably provisioned in four US-based regions. That meant users could not create new resources in Google Compute Engine, Google Kubernetes Engine, Cloud Composer, Cloud SQL, Cloud Dataproc, and Apigee X.
Again, that's so not the "what you need, as soon as you need it" experience promised by elastic public clouds. ®