How Google uses mirrors to dynamically reconfigure its networks
Tiny electromechanical units bounce traffic down different fibers
Google has scaled its network capacity from more than one petabit per second in 2015 to more than six petabits per second today, and some of that growth has come from switches that bounce optical signals off an array of mirrors to redirect traffic.
As our sibling site The Next Platform reported in 2015, Google calls its datacenter networking tech Jupiter, which uses a mixture of merchant silicon and custom code to connect the kit that runs Search, YouTube, Gmail, the G-cloud, and plenty more besides.
The headline figures in both documents detail 5x higher speed and capacity, a 30 percent reduction in capex, and a 41 percent reduction in power consumption.
Plenty of those improvements are the result of Optical Circuit Switches (OCSes) that use mirrors mounted on Micro-ElectroMechanical Systems (MEMS) to map an optical fiber input port to an output port dynamically.
And, yup, we're aware that MEMS-based (and non-MEMS) optical mirror switching has existed in computer networks for years. What's cool here is the density and throughput Google says it has developed, and now documented, for its own globe-spanning use.
Well, we found it interesting, anyway.
In Google's switches, a signal reaches a "fiber collimator array" that offers 136 physical I/O paths, or individual fibers. An incoming signal emerges from one of those fibers, then bounces off a splitter before hitting a MEMS device that has 136 micro mirrors. The MEMS device moves in two dimensions and reflects the signal to one of the 136 fibers in the outgoing collimator array.
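Conceptually, an OCS is a reconfigurable one-to-one mapping of input fibers to output fibers: steer a mirror, and a different pair of ports is joined. Here's a minimal sketch of that idea in Python – the class and its API are illustrative, not Google's; only the 136-port count comes from the description above:

```python
class OpticalCircuitSwitch:
    """Toy model of an OCS: a reconfigurable one-to-one mapping of
    input fiber ports to output fiber ports, set by steering mirrors."""

    def __init__(self, ports=136):
        self.ports = ports
        self.mapping = {}  # input port -> output port

    def connect(self, in_port, out_port):
        """'Steer the mirror' for in_port so its light lands on out_port."""
        if not (0 <= in_port < self.ports and 0 <= out_port < self.ports):
            raise ValueError("port out of range")
        if out_port in self.mapping.values():
            raise ValueError("output port already in use")
        self.mapping[in_port] = out_port

    def route(self, in_port):
        """Return the output port a signal entering in_port exits from."""
        return self.mapping[in_port]


# Reconfigure without touching a single fiber: point input 0 at output 42.
ocs = OpticalCircuitSwitch()
ocs.connect(0, 42)
print(ocs.route(0))  # 42
```

The point of the model: changing the topology is a `connect()` call, not a trip to the datacenter floor with a fiber spool.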
The tech was needed because Google wanted its network to "support heterogeneous network elements in a 'pay as you grow' model, adding network elements only when needed and supporting the latest generation of technology incrementally."
That means "allowing the incremental addition of network capacity – even if of a different technology than previously deployed – to deliver a proportional capacity increase and native interoperability for the entire building of devices."
Achieving that vision is not easy because Google's datacenter networks need to be deployed "at the scale of an entire building – perhaps 40MW or more of infrastructure."
"Further, the servers and storage devices deployed into the building are always evolving, for example moving from 40Gbit/sec to 100Gbit/sec to 200Gbit/sec and today 400Gbit/sec native network interconnects. Therefore, the datacenter network needs to evolve dynamically to keep pace with the new elements connecting to it."
Google also recognizes that its network is not a single entity.
"Datacenter networks are inherently multi-tenant and continuously subject to maintenance and localized failures," Google's description, from infrastructure VP Amin Vahdat, explained. "A single datacenter network hosts hundreds of individual services with varying levels of priority and sensitivity to bandwidth and latency variation."
"For example, serving web search results in real time might require real-time latency guarantees and bandwidth allocation, while a multi-hour batch analytics job may have more flexible bandwidth requirements for short periods of time," Vahdat stated. "Given this, the datacenter network should allocate bandwidth and pathing for services based on real-time communication patterns and application-aware optimization of the network."
That kind of dynamic reconfiguration also helps resilience.
"Ideally, if ten percent of network capacity needs to be temporarily taken down for an upgrade, then that ten percent should not be uniformly distributed across all tenants, but apportioned based on individual application requirements and priority," Vahdat explains.
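Vahdat's example – draining ten percent of capacity unevenly, by priority – can be sketched as a weighted reduction. The service names, bandwidth figures, and weights below are invented for illustration; only the ten percent figure comes from the quote:

```python
def apportion_drain(demands, weights, drain_fraction=0.10):
    """Split a capacity drain across tenants in proportion to weight,
    so low-priority (high-weight) services absorb more of the cut.

    demands: {service: bandwidth}, weights: {service: drain weight}.
    """
    total_drain = drain_fraction * sum(demands.values())
    weight_sum = sum(weights.values())
    return {s: demands[s] - total_drain * weights[s] / weight_sum
            for s in demands}


# Hypothetical tenants, in Gbit/s: batch analytics soaks up most of the cut,
# latency-sensitive search barely notices it.
demands = {"search": 50.0, "video": 30.0, "batch": 20.0}
weights = {"search": 0.5, "video": 1.5, "batch": 8.0}
after = apportion_drain(demands, weights)
```

With these numbers the network still gives up 10 Gbit/s in total, but batch analytics loses 8 of them while search loses only 0.5 – an uneven drain rather than a uniform ten percent haircut for everyone.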
But networks are hardwired to do certain things well, and even Google's software-defined networking can only do so much to reconfigure them to adapt to new requirements.
MEMS and OCS also make it possible to upgrade and reconfigure networks without having to rewire (or re-fiber) a datacenter.
Vahdat concluded that Google's network "delivers 50x less downtime than the best alternatives we are aware of."
So, take that, Cisco, Juniper, Arista, and pals. And for the rest of us, take comfort that this stuff usually trickles down over time – as happened with Kubernetes and cloud-inspired consumption-based IaaS models. ®