Ouch: last week, Digital Ocean took the GitLab fat-finger pill, deleting a production database and triggering a five-hour outage.
Unlike GitLab's disaster, the Digital Ocean “engineer-driven configuration error” didn't include a backup failure.
In its apology for the “unacceptable” turn of events, Digital Ocean explains that the outage took down its control panel and API, and prevented customers from creating new virtual servers (“Droplets”).
Existing Droplets stayed online, thankfully.
The root cause of the problem was that someone's automatic test process was configured using production credentials, something the company agrees should not happen.
“Within three minutes of the initial alerts, we discovered that our primary database had been deleted”, the post states, and that triggered its restore process from a time-delayed database replica.
As well as putting new restrictions on access to the primary database, the company says it's rolling out a network upgrade to speed up connections to the database servers.
It's a salutary lesson for sysadmins: if you don't test your backups, you don't have backups, only hopes. Digital Ocean, apparently, does test them. ®