Cloudflare dashboard, API service feeling poorly due to datacenter power snafu
Right on time for results day
Cloudflare is having a bad day amid reports of problems with its dashboard and API service following datacenter power issues.
The web security provider said cached files served via the Cloudflare CDN and other security features at the Cloudflare Edge were unaffected. The dashboard and API, however, are most definitely not well.
It said: "The following products are currently impacted at the data plane/edge level, meaning that the full product functionality is either partially or fully affected: Logpush, WARP/Zero Trust device posture, Magic WAN, Cloudflare dashboard, Cloudflare API, Stream API, Workers API, Alert Notification System, Workers KV namespace operations."
This outage comes days after the company reported availability issues with Cloudflare Pages and Workers KV. Cloudflare is also set to announce its results for the third quarter ended September 30 after the US market closes later today.
The vendor posted a lengthy blog post on November 1 explaining what went wrong with Workers KV on October 30 and what it was doing to prevent a repeat.
In a nutshell, Cloudflare rolled out a new KV build to production. It turned out that the deployment tool had a bug, and some traffic got diverted to the wrong destination, which triggered a rollback … which failed. The result was that engineers had to manually switch the production route to the previous working version of Workers KV.
The problem is that an awful lot of Cloudflare products and services depend on Workers KV, meaning that when there is a problem with the platform, the blast radius can be impressive.
In this latest instance, the issue appears to be down to a loss of power at its datacenters rather than some iffy code. Services are reportedly recovering gradually.
In a statement sent to The Reg, Cloudflare said:
"We operate in multiple redundant data centres in Oregon that power Cloudflare’s control plane (dashboard, logging, etc). There was a regional power issue that impacted multiple facilities in the region. The facilities failed to generate power overnight. Then, this morning, there were multiple generator failures that took the facilities entirely offline.
"We have failed over to our disaster recovery facility and most of our services are restored. This data centre outage impacted Cloudflare’s dashboards and APIs, but it did not impact traffic flowing through our global network. We are working with our data centre vendors to investigate the root cause of the regional power outage and generator failures.
"We expect to publish multiple blogs based on what we learn and can share those with you when they're live." ®