It has been a rough morning for Salesforce here in San Francisco: part of the cloud giant's sprawling empire fell over and stayed down for more than 12 hours. In fact, it fell over so hard that CEO Marc Benioff's techies are having to apply fixes to servers by hand.
The SaaS titan told customers on Thursday night its Files service, used to store and share documents, was struggling worldwide to, well, read and write files. It's now Friday lunchtime, and some data remains inaccessible as engineers continue to roll out repairs.
"The technology team continues to manually update a subset of servers that need to have the fix implemented as well as test and validate the resynchronization process, which is the next phase of the resolution path," Salesforce told customers at 1030 PT on Friday.
"We continue to see a reduction in read/write alerts, and some customers are reporting that they can now read and write to new files.
"However, there is still a subset of customers being impacted by the issue. Files that were created after the incident began and before the fix was applied, however, may remain unavailable until resynchronization is complete."
The systems crashed on Thursday night at roughly 2330 PT. That meant that just as Western Europe was waking up and starting its Friday, the Files part of Salesforce fell over.
"Wow. Global outage of @salesforce. What else is on the menu today? 😂" - Martin Fišer (@MTFtwister), September 20, 2019
The outage then carried over into the working day on the US East Coast and in the Midwest, and into the morning on the West Coast, where Salesforce is based.
By 1100 PT, the cloud giant said it was nearly through the tedious process of hand-fixing servers to address the problem.
"The technology team now has less than 5 per cent of the affected servers to manually apply the fix to, and the resynchronization process is being implemented to a subset of production hosts," the biz explained. "If the resynchronization completes successfully on those hosts, the team will then start the process of implementing resynchronization across the rest of the affected environment."
As we were preparing to publish this piece, shortly after 1300 PT, Salesforce posted the following revised update, indicating that the outage, like the repair work, is ongoing:
"The technology team continues to manually apply the fix to a subset of the affected servers in three of the impacted data centers. All servers in the other five affected data centers have had the fix implemented.
"The synchronization process is the next phase of the resolution path that will ensure files that were created after the incident began and before the fix was applied are once again available to customers. That process was initiated within the first of the eight data centers, and on successful completion of that activity and after the technology team validates the implementation, we will then initiate that process across the remaining data centers."
Inconvenient as it is, Friday's problem pales in comparison to the nightmare outage in May that caused portions of the service to go down over a span of three days.
Hopefully, most of those affected were able to find another way to move files or, failing that, at least head out for an early start to the weekend. ®