Last Thursday, office productivity soared when popular online forum Reddit went down for over an hour. Now the site has explained why.
Reddit uses a server synchronisation system called ZooKeeper, but turned this off on August 11 so that it could roll out a new infrastructure build with its cloud providers. However, in the middle of the upgrade, the package management system noticed that something was changing and switched ZooKeeper back on.
As a result, the site had to be taken down while the servers were refreshed. In addition, with the caches empty, servers weren't running normally until three hours after the outage began.
"We take downtime seriously, and are sorry for any inconvenience that we caused," the site's administrators said in a post.
"The silver lining is that in the process of restoring our systems, we completed a big milestone in our operations modernization that will help make development a lot faster and easier at Reddit." ®