Exclusive The Azure outage of January 29 claimed some unexpected victims in the form of surprise database deletions for unlucky customers.
The issue afflicted a number of Azure SQL databases that use custom Key Vault keys for Transparent Data Encryption (TDE), according to a message sent to users and seen by The Register. Some internal code accidentally dropped these databases during Azure's portal wobble yesterday, forcing Microsoft to restore customer data from a recovery point up to five minutes before each drop.
That means transactions, product orders, and other updates to the data stores during that five-minute window were lost. That may warm you up with red-hot anger if you're in the middle of a particularly nasty cold snap.
The note explained that the cockup happened automatically during what Redmond delicately called a network infrastructure event: a CenturyLink DNS snafu that locked essentially half of Microsoft 365 customers out of their cloud accounts, a breakdown that began at 1045 UTC.
"An automated process, designed to trigger when custom keys are removed from KeyVault, inadvertently caused these TDE databases to be dropped," the message read.
"We are in the process of restoring a copy of these SQL DBs from a recovery point in time of less than 5 minutes before the database was dropped. These restored databases ... are located on the same server as the original database."
Ouch. The Windows giant is asking that if there were any, you know, important transactions in that five-minute window that may impact business processes, then raising a support ticket would be just dandy. Here's more of the memo:
We ask that customers, for each database, identify if lost transactions, during this 5 minute timeframe, could impact business processes or applications outside the database. We would ask you to raise a support ticket in this instance. If the restored database is suitable, the database can be renamed to the original name to continue usage of it.
And you know it's a serious blunder because Redmond is offering months of database service for free as compensation:
We sincerely apologize for the impact to your service. Azure usage charges will be waived for all restored databases for 2 months, and all the original databases for 3 months. We are continuously taking steps to improve the Microsoft Azure Platform and our processes to help ensure such incidents do not occur in the future.
Users have taken to social media to complain about the cloud blunder, while others were left scratching their heads in befuddlement...
@AzureSupport @AzureSQLDB Last night around 21.23 some of our azure SQL databases where deleted and not by us. The database is back restored but is empty, we did not do the restore?— Koen Vanden Bossche (@VdBosscheKoen) January 30, 2019
I heard from other people that they have the same issue but there databases are not restored
Transparent Data Encryption, according to Microsoft, is intended to protect an Azure SQL database against what the tech giant calls “the threat of malicious activity.” Clearly it does not protect against the threat of a rogue script running amok during an outage.
Bring Your Own Key (BYOK) support is intended to further reassure users by allowing them to encrypt the Database Encryption Key (DEK) with an asymmetric key called the TDE Protector. The TDE Protector is then stored in Azure Key Vault.
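For the curious, the key-wrapping arrangement described above can be sketched in a few lines of Python. This is a toy under loud assumptions, not Microsoft's implementation: the "protector" is textbook RSA built from two small Mersenne primes with no padding, the "database" cipher is a homemade SHA-256 keystream, and every name here (`wrap_dek`, `unwrap_dek`, `keystream_xor`) is made up for illustration. None of it is fit for real data.

```python
import hashlib
import secrets

# Toy stand-in for the TDE Protector: textbook RSA, NOT secure
# (tiny modulus, no padding). All values are illustrative assumptions.
P = 2**61 - 1                      # Mersenne prime
Q = 2**89 - 1                      # Mersenne prime
N = P * Q                          # public modulus
E = 65537                          # public exponent
D = pow(E, -1, (P - 1) * (Q - 1))  # private exponent (needs Python 3.8+)


def keystream_xor(key: bytes, data: bytes) -> bytes:
    """Toy symmetric cipher: XOR data with a SHA-256 counter keystream.

    Applying it twice with the same key returns the original data.
    """
    out = bytearray()
    for offset in range(0, len(data), 32):
        block = hashlib.sha256(key + offset.to_bytes(8, "big")).digest()
        chunk = data[offset:offset + 32]
        out.extend(b ^ k for b, k in zip(chunk, block))
    return bytes(out)


def wrap_dek(dek: bytes) -> int:
    """Encrypt (wrap) a 16-byte DEK with the protector's public key."""
    return pow(int.from_bytes(dek, "big"), E, N)


def unwrap_dek(wrapped: int, private_exponent: int) -> bytes:
    """Recover the DEK; only possible with the protector's private key."""
    return pow(wrapped, private_exponent, N).to_bytes(16, "big")


# The database side keeps only the encrypted page and the wrapped DEK.
dek = secrets.token_bytes(16)
page = b"order 1042: 3 x widgets"
stored_page = keystream_xor(dek, page)
wrapped_dek = wrap_dek(dek)

# Reading the page back requires a trip to the protector first.
recovered = keystream_xor(unwrap_dek(wrapped_dek, D), stored_page)
print(recovered == page)
```

The point of the design is that the data store never holds a usable key on its own: take away access to the protector and the wrapped DEK, and therefore the data, is just noise.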
In Microsoft’s guide for the usage of TDE with BYOK, the biz is at pains to explain “if TDE encrypted SQL databases lose access to the key vault because they cannot bypass the firewall, the databases are dropped within 24 hours.”
We’ve contacted the software giant for comment, and were told by a spokesperson: “We’re working to restore access to resources that were unavailable to a limited subset of customers. Full access has been re-established for most of those customers already.” ®