Google has revealed how it hardened the open source KVM hypervisor to run in its cloud. Removing the QEMU hardware emulation tool looks to be a big part of its efforts.
Google Cloud's technical lead manager Andy Honig and senior product manager Nelly Porter write that the company decided it needed to develop its own alternative with the following qualities:
- Simple host and guest architecture support matrix. QEMU supports a large matrix of host and guest architectures, along with different modes and devices that significantly increase complexity. Because we support a single architecture and a relatively small number of devices, our emulator is much simpler. We don’t currently support cross-architecture host/guest combinations, which helps avoid additional complexity and potential exploits.
- Google’s virtual machine monitor is composed of individual components with a strong emphasis on simplicity and testability. Unit testing leads to fewer bugs in complex systems. QEMU code lacks unit tests and has many interdependencies that would make unit testing extremely difficult.
- No history of security problems. QEMU has a long track record of security bugs, such as VENOM, and it's unclear what vulnerabilities may still be lurking in the code.
Google's decision to can QEMU is understandable, given that the Xen Project has had a torrid time with the emulator. Xen has even contemplated binning QEMU after fighting flaws a-plenty over the last couple of years.
Honig and Porter outline other KVM-hardening efforts, namely:
- Proactive vulnerability search that uses the “multiple layers of security and isolation built into Google’s KVM”, along with efforts to strengthen those components;
- Shrinking KVM's attack surface by removing unused components such as a legacy mouse driver;
- Booting VMs to a known good state, thanks to a regime that means each host running KVM “generates a peer-to-peer cryptographic key sharing system that it shares with jobs running on that host, helping to make sure that all communication between jobs running on the host is explicitly authenticated and authorized”;
- Verifying code integrity “on every level — from the boot-loader, to KVM, to the customers’ guest VMs”;
- “Strict internal SLAs and processes to patch KVM in the event of a critical security vulnerability”;
- “Stringent rollout policies and processes for KVM updates driven by compliance requirements and Google Cloud security controls” and helped by the fact that “Only a small team of Google employees has access to the KVM build system and release management control”.
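Google has not published how its integrity checks are implemented, so the following is only a minimal sketch of the general idea behind verifying code "on every level": hash each stage of the stack and compare the result against a list of known-good digests, stopping at the first mismatch. The stage names, the `verify_chain` helper and the use of SHA-256 are all assumptions made for illustration, not details from Google's post.

```python
import hashlib
import hmac


def digest(blob: bytes) -> str:
    """Return the SHA-256 hex digest of a stage image (hypothetical helper)."""
    return hashlib.sha256(blob).hexdigest()


def verify_chain(stages, known_good):
    """Check each (name, image) pair against its known-good digest, in order.

    Returns the name of the first stage that is unknown or has been altered,
    or None if every stage matches. Illustrative only: a real system would
    anchor the known-good list in hardware or signed metadata.
    """
    for name, blob in stages:
        expected = known_good.get(name)
        # compare_digest avoids leaking where the two strings differ
        if expected is None or not hmac.compare_digest(digest(blob), expected):
            return name
    return None


# Stand-in images for the three levels named in the quote above.
stages = [
    ("boot-loader", b"bootloader image"),
    ("kvm", b"hypervisor image"),
    ("guest", b"guest VM image"),
]
known_good = {name: digest(blob) for name, blob in stages}

clean_result = verify_chain(stages, known_good)

# Altering the hypervisor image should be caught at the "kvm" stage.
tampered = [stages[0], ("kvm", b"hypervisor image with a backdoor"), stages[2]]
tampered_result = verify_chain(tampered, known_good)
```

Checking the stages in boot order means a failure early in the chain is reported before anything that depends on it is trusted.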
Google's kimono-opening on KVM security follows the company's reveal of its overall security regime, which includes custom cryptographic silicon in its servers. ®