
Expired router cache sends Google Compute Engine TITSUP

'Unacceptable' performance led to two hours and forty minutes of dead VMs

Google's Compute Engine (GCE) has experienced Total Inability to Support Usual Performance (TITSUP) for about two-and-a-half hours.

Incident 15045, as Google describes the outage, kicked off at about 22:59 on the evening of 18 February (West Coast US time) and then rolled on until 01:31 the next day.

Virtual machines in the service were unavailable during the outage, as they could not reach the internet. The timing was unfortunate: the incident hit at the start of the European working day.

Google thinks it has figured out what went wrong, offering the following preliminary analysis of the outage:

The internal software system which programs GCE’s virtual network for VM egress traffic stopped issuing updated routing information. The cause of this interruption is still under active investigation. Cached route information provided a defense in depth against missing updates, but GCE VM egress traffic started to be dropped as the cached routes expired.
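For readers who want to see the failure mode Google describes, here is a rough sketch of how a cache with a time-to-live can mask a stalled updater for a while and then start dropping traffic. It is a toy model only, not Google's code: the `RouteCache` class, the `simulate` function, the `gw-1` next hop and the tick-based clock are all our own assumptions for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class RouteCache:
    """Toy route cache: each entry is valid for `ttl` ticks after its last update."""
    ttl: int  # hypothetical lifetime of a cached route, in ticks
    _routes: Dict[str, Tuple[str, int]] = field(default_factory=dict)

    def update(self, dest: str, next_hop: str, now: int) -> None:
        # A fresh update from the (hypothetical) route programmer resets the expiry.
        self._routes[dest] = (next_hop, now + self.ttl)

    def lookup(self, dest: str, now: int) -> Optional[str]:
        entry = self._routes.get(dest)
        if entry is None:
            return None
        next_hop, expires_at = entry
        # Once the entry has expired there is no route, so the packet is dropped.
        return next_hop if now < expires_at else None


def simulate(ttl: int, last_update: int, ticks: int) -> List[bool]:
    """Return, per tick, whether egress traffic still has a usable route.

    The updater publishes routes on every tick up to and including
    `last_update`, then goes silent, as in the incident description.
    """
    cache = RouteCache(ttl)
    delivered = []
    for now in range(ticks):
        if now <= last_update:
            cache.update("0.0.0.0/0", "gw-1", now)
        delivered.append(cache.lookup("0.0.0.0/0", now) is not None)
    return delivered


if __name__ == "__main__":
    # Updates stop after tick 2; the cache covers ticks 3 and 4, then traffic drops.
    print(simulate(ttl=3, last_update=2, ticks=8))
```

In this model the cache buys a grace period equal to the TTL, which is the "defense in depth" the analysis mentions, but it cannot help once the outage outlasts that window.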

"We consider GCE’s availability over the last 24 hours to be unacceptable," the post says, adding that Google's cloudy teams are "completely focused on addressing the incident and its root causes, so that this problem or other hypothetical similar problems cannot recur in the future." ®
