IBM blames 'external' network provider, incorrect routing, traffic flood for its two-hour cloud outage

No data loss or attack detected. But aren't hyperscale clouds supposed to be more resilient than this?

Thu 11 Jun 2020 // 02:14 UTC

IBM has blamed a third party for yesterday's hours-long outage of its entire cloud. And while it says no data loss or attack was detected, it's still not a good look: major clouds are supposed to be more resilient than this.

A brief notice on the IT titan's cloud status page offered the following explanation for the breakdown:

IBM is focused on external network provider issues as the cause of the disruption of IBM Cloud services on Tuesday, June 9. All services have been restored.

A detailed root cause analysis is underway. An investigation shows an external network provider flooded the IBM Cloud network with incorrect routing, resulting in severe congestion of traffic and impacting IBM Cloud services and our data centers. Mitigation steps have been taken to prevent a reoccurrence. Root cause analysis has not identified any data loss or cybersecurity issues.

That's rather vague verbiage, but it is consistent with the sort of traffic flood that can happen when an inadvertent Border Gateway Protocol (BGP) blunder directs packets to the wrong place. BGP hijacking and misconfiguration are known problems, and you'd think an outfit like IBM would be alert to that sort of error and have defenses or countermeasures in place to mitigate it.
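
For readers wondering what such countermeasures look like in practice, two common ones are prefix filtering (only accept routes a peer is expected to announce) and a maximum-prefix limit (drop the session if a peer announces far more routes than expected). The Python below is a toy sketch of that logic only. The `PeerSession` class, its parameters, and the documentation-range addresses are all hypothetical, and nothing here describes how IBM's network is actually configured.

```python
import ipaddress


class PeerSession:
    """Toy model of one BGP peering session (hypothetical, not a real router API)."""

    def __init__(self, allowed, max_prefixes):
        # Address blocks this peer is expected to announce routes within.
        self.allowed = [ipaddress.ip_network(n) for n in allowed]
        # Session is torn down once the peer tries to exceed this many routes.
        self.max_prefixes = max_prefixes
        self.accepted = set()
        self.shut_down = False

    def announce(self, prefix):
        """Return True if the route is accepted, False if it is filtered
        or the session has already been shut down."""
        if self.shut_down:
            return False
        net = ipaddress.ip_network(prefix)
        # Prefix filter: ignore anything outside the expected blocks.
        in_policy = any(
            net.version == block.version and net.subnet_of(block)
            for block in self.allowed
        )
        if not in_policy:
            return False
        # Maximum-prefix limit: a flood of new routes kills the session
        # and discards everything learned from this peer.
        if net not in self.accepted and len(self.accepted) >= self.max_prefixes:
            self.shut_down = True
            self.accepted.clear()
            return False
        self.accepted.add(net)
        return True


session = PeerSession(allowed=["203.0.113.0/24"], max_prefixes=2)
print(session.announce("203.0.113.0/25"))   # inside the expected block
print(session.announce("198.51.100.0/24"))  # outside it, so filtered
```

Real routers apply these checks in their BGP policy configuration rather than in application code, and the thresholds are a judgment call: set too low, a legitimate burst of routes takes the session down; set too high, a flood gets through.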

Yet IBM may not be great at handling traffic spikes: when it ran Australia's e-census it mistakenly identified a flood of inbound connections as a denial-of-service attack, and had to pull the plug on a router to sort things out.

Another possibility is that a supplier to IBM's cloud messed things up. We know that IBM uses Akamai for its content delivery network, offers Cloudflare-as-a-service, and has an expansive relationship with AT&T. The Register does not suggest any of those companies had a role in the outage, but all do have sufficient scale to play a part in a global outage.

Whatever the cause, hyperscale clouds are supposed to be sufficiently resilient to handle unexpected nastiness of this sort. That IBM succumbed is not reassuring, given the tech goliath says it is now a cloud business. That's a tricky position to sustain given its cloud remains rather less elegant to operate than rivals' services. ®
