Chef reviews internal update plans after 'degradation incident'
Automation vendor manually fixes Hosted Chef problem
Chef fans faced going hungry on Friday afternoon as the automation vendor’s hosted service degraded itself by having a little lie-down in the back pantry.
Chef issued a mea culpa explaining the “Hosted Chef Service Degradation Incident”, which saw users presented with 404 errors at the end of “successful” Chef Client runs, or with empty responses when running Knife as a client on their workstations.
“This was a result of 2 of our 16 frontend nodes being left in a incorrect state following a routine deployment,” the statement continued. We’ll overlook this and other grammar fails in Chef’s statement, seeing as the firm clearly had other things on its mind at the time.
The snafu was down to Chef operations performing an overnight deployment to upgrade Chef Server to 12.4, one of a number of updates last week addressing security vulnerabilities. The deploy tool failed to configure the reporting service on two of its hosts, leaving them in a “functional, but degraded state”.
A condition many of our readers will be familiar with, particularly on a Friday. The nodes still passed health checks, meaning no red flags were raised.
Once the problem was noticed by Chef, “service engineers manually reconfigured the two incorrectly running nodes which immediately resolved the increased error rates.”
We’ll leave you to discuss whether there’s a smidgeon of irony in that last statement.
As a result, said Chef, “additional host-level monitoring is being put in place to catch this type of issue more rapidly in the future. Also, ELB health checks are being updated to more throughly [sic] test all components before a node is placed into service behind and load balancer.”
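For readers wondering what a health check that tests "all components" might look like, here is a minimal Python sketch. It is not Chef's implementation: the function name `deep_health` and the component names are hypothetical, and the only assumption about the load balancer is that it treats an HTTP 200 as healthy and anything else as a reason to pull the node.

```python
from typing import Callable, Dict, Tuple

# A check is any zero-argument callable returning True when its component works.
Check = Callable[[], bool]


def deep_health(checks: Dict[str, Check]) -> Tuple[int, Dict[str, str]]:
    """Run every component check and return (HTTP status, per-component detail).

    The status is 200 only if every check passes; one failing or crashing
    component yields 503, so a half-configured node is not reported healthy.
    """
    detail: Dict[str, str] = {}
    for name, check in checks.items():
        try:
            detail[name] = "ok" if check() else "fail"
        except Exception as exc:  # a crashing check counts as a failure
            detail[name] = f"error: {exc}"
    status = 200 if all(value == "ok" for value in detail.values()) else 503
    return status, detail


# Hypothetical components: the API answers, the reporting service does not.
status, detail = deep_health({"api": lambda: True, "reporting": lambda: False})
print(status, detail)
```

A shallow check that only asks "is the web server up?" would have returned 200 for a node in this state, which is roughly the situation Chef describes.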
A footnote added that a previous version of the posting had “accidentally stated the upgrade was to Chef Server 12.6. The upgrade was actually to 12.4, which is the latest release.” ®