XCalibre's FlexiScale cloud has disappeared from the heavens. Again.
In late August, an engineer with the UK-based hosting outfit accidentally deleted the company's high-profile compute cloud - which offers on-demand storage, processing, and network bandwidth a la Amazon Web Services - and now XCalibre is working to resolve a "core network failure" that has kept some customers off-line for as much as twenty-four hours.
According to XCalibre CEO Tony Lucas, the outage hit at about 5pm UK time on Wednesday, when the cloud experienced "a near simultaneous switch failure" in the switches that connect the storage to the processing nodes. "That is relatively easy to fix, though you do have to take everything down and restart it again," Lucas tells The Reg. "But because of a software limitation in a particular piece of software we use...which only allows you to do one job at a time, so when we have to restart hundreds and hundreds of servers, it takes some time."
It's no secret that FlexiScale relies on Virtual Iron, the virtualization manager based on the open-source Xen hypervisor.
Lucas says that some customers were back up and running by 9pm UK time yesterday. But others are still waiting for their bit of cloud to reappear. "Every single server was restarted by 7:45 this morning [UK time], but there is a network bug that a number of them are still having issues with. We're going through them one-by-one and we're down to a handful - somewhere in the teens."
Lucas is intent on beefing up his architecture so this sort of thing doesn't happen in the future. But that's twice in two months. At the end of August, that engineer accidentally deleted the cloud's main storage volume, and XCalibre needed several days to rebuild it.
And in the midst of the latest outage, some customers are peeved. "I am angry, very angry, so yes there's some vitriol in here, I was hoping that sleeping on it would dull that, but being that all my servers are still down it hasn't," says someone who calls himself Flish.
"I didn't have to wait very long for the next outage," says RichText. "Fortunately, I'm only testing things out at the moment. Does anyone actually use Flexiscale for anything mission-critical?"
A good question. There are drawbacks to putting your apps in the sky. In recent months, we've also seen plenty of downtime from Amazon Web Services - and Google Apps too. ®