What’s the big deal with service meshes? Think of them as SDN at Layer 7
A technical yet demystifying dive into networking tech you can’t avoid
Systems Approach I remember first hearing about Service Meshes in 2017 and wondering what the big deal was. Building cloud applications as a graph of microservices was commonplace, and telcos were hard at work inventing yet other ways to chain together virtualized network functions. Service graphs, service chains, service meshes … how many ways do we really need to talk about composing complex systems from a collection of smaller components?
It wasn’t until I recognised a familiar pattern that I got it: a Service Mesh is just SDN at Layer 7. That’s probably what happens when SDN is the hammer you keep hitting nails with, but I’ve come to believe there is value in that perspective.
The figure below highlights the similarities between the two scenarios, both of which include a centralised controller that issues directives to a distributed set of connectors (physical/virtual switches in one case, and a sidecar container in the other case) — based on a combination of policy intents from above and monitoring data reported from below. The primary difference is that the SDN controller on the left is controlling L2/3 connectivity and the Service Mesh on the right is controlling L7 connectivity.
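To make the shared pattern concrete, here is a minimal Python sketch of that control loop. It is an illustration only, not how any real SDN controller or Service Mesh is implemented: the `Connector` and `Controller` classes, the `error_rate` metric, and the `max_error_rate` policy are all hypothetical names chosen for this example.

```python
from dataclasses import dataclass, field


@dataclass
class Connector:
    """A data-plane element: a switch in the SDN case, a sidecar in the mesh case."""
    name: str
    rules: list = field(default_factory=list)
    error_rate: float = 0.0  # hypothetical metric reported to the controller

    def install(self, rules):
        # Accept a directive from the controller, replacing any earlier rules.
        self.rules = list(rules)

    def report(self):
        # Monitoring data flowing "from below".
        return {"name": self.name, "error_rate": self.error_rate}


class Controller:
    """Centralised control plane: combines policy intent with reported data."""

    def __init__(self, connectors, max_error_rate):
        self.connectors = connectors
        self.max_error_rate = max_error_rate  # policy intent "from above"

    def reconcile(self):
        reports = [c.report() for c in self.connectors]
        healthy = [r["name"] for r in reports
                   if r["error_rate"] <= self.max_error_rate]
        # Issue the same directive everywhere: forward only to healthy peers.
        for c in self.connectors:
            c.install([("forward_to", peer) for peer in healthy if peer != c.name])
        return healthy


def demo():
    a, b, c = Connector("a"), Connector("b"), Connector("c", error_rate=0.5)
    controller = Controller([a, b, c], max_error_rate=0.1)
    healthy = controller.reconcile()
    return healthy, a.rules
```

Calling `demo()` leaves connector `a` with a single rule that forwards to `b`, because `c` exceeds the assumed error threshold. Nothing in the loop depends on whether the rules describe L2/3 ports or L7 service endpoints, which is the point of the comparison.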
Comparisons like this often break down at some point, but for me, identifying the differences between the two cases also helped me understand the opportunities in this space. In short, these two cases can be viewed as two ends of a spectrum, with each making a different performance-vs-expressibility design choice for the “connector” elements.
Sidecars can run arbitrary code, and so implement any imaginable service connectivity policy. But the biggest knock on sidecars is that bouncing all traffic through an intermediate container results in a non-trivial performance hit. Physical L2/L3 switches have forwarding rates measured in terabits-per-second, but support limited/fixed functionality (eg, coarse-grained ACLs).
P4-programmable forwarding pipelines present an opportunity to offload some sidecar functionality to the switching fabric, but the best opportunity to find a best-of-both-worlds design point is in virtual switches and SmartNICs. Note also that the functionality of a sidecar generally needs to be close enough to the relevant service to see the actual RPC messages. That tends to rule out network devices that will only see encrypted traffic between hosts.
All of this brings us to a topic that is attracting a lot of attention: optimising Service Meshes using a combination of eBPF (extended Berkeley Packet Filter) and XDP (eXpress Data Path). When used together, they provide a way to program generalised Match-Action rules in the OS kernel (as part of a virtual switch) or, alternatively, on a SmartNIC.
That eBPF/XDP can be viewed as an alternative implementation of OpenFlow/P4-inspired flow rules is not a coincidence: there is something fundamental about Match-Action rules as an abstraction for programming (and controlling) end-to-end connectivity. With this commonality identified, the differences are again helpful: eBPF/XDP allows (mostly) general code, while OpenFlow defines a fixed set of Match-Actions and P4 is a restricted language for expressing the same. Such restrictions are necessary when the Action must execute within a fixed cycle budget, as is the case for a switch-based forwarding pipeline. They also enable formal verification of the data plane, a promising opportunity being pursued by the research community.
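The Match-Action abstraction, and the split between a fixed action vocabulary and general code, can be sketched in a few lines of Python. This is a toy model under stated assumptions, not OpenFlow, P4, or eBPF: the `Rule` type, the `FIXED_ACTIONS` set, the header field names, and the `route_beta_users` function are all invented for illustration.

```python
from dataclasses import dataclass
from typing import Optional

# A fixed action vocabulary, in the spirit of a fixed-function pipeline.
# The names are illustrative and not taken from any specification.
FIXED_ACTIONS = {"forward", "drop"}


@dataclass
class Rule:
    match: dict                 # field -> required value; absent fields are wildcards
    action: object              # a name in FIXED_ACTIONS, or a callable ("general code")
    arg: Optional[str] = None   # parameter for a fixed action, e.g. an output port


def matches(rule, packet):
    return all(packet.get(k) == v for k, v in rule.match.items())


def apply(table, packet):
    """Return the verdict of the first matching rule; drop if nothing matches."""
    for rule in table:
        if not matches(rule, packet):
            continue
        if callable(rule.action):
            # General-code case: arbitrary logic, with no bound on its cost.
            return rule.action(packet)
        if rule.action not in FIXED_ACTIONS:
            raise ValueError(f"unsupported action: {rule.action!r}")
        return (rule.action, rule.arg)
    return ("drop", None)


def route_beta_users(packet):
    # Hypothetical L7 policy: send beta users to a newer service version.
    version = "checkout-v2" if packet.get("user", "").endswith("@beta") else "checkout-v1"
    return ("forward", version)


table = [
    Rule({"dst_ip": "10.0.0.2"}, "forward", "port2"),   # L3-style rule
    Rule({"path": "/checkout"}, route_beta_users),      # L7-style rule
]
```

With this table, `apply(table, {"dst_ip": "10.0.0.2"})` yields `("forward", "port2")`, while a request for `/checkout` is routed by the callable. The first rule is easy to bound and to reason about; the second is more expressive but offers no such guarantee, which mirrors the trade-off described above.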
It turns out I wasn’t the only person to make the connection between SDN and Service Meshes. Here is Bruce’s version from two years ago, and VMware’s Service Mesh product clearly has parallels to its other SDN offerings. In my experience, there is enormous value in recognising commonality and defining unifying abstractions across seemingly disparate implementation artifacts.
Unifying abstractions are the basis for building better systems. The centralised policy engine (eg, the role of the SDN controller) is one such abstraction (which we also noted in our recent security post). Match-Action rules as a way to specify forwarding behaviour (eg, the role of OpenFlow) are another.
Viewing Envoy sidecars, eBPF/XDP programs running in the kernel, and P4-programmed pipelines as three implementation choices for the programmable forwarding engines used to build end-to-end service connectivity opens an intriguing opportunity that deserves more attention. Successful platforms build on the abstractions that have proven useful in the past. And that is a key tenet of the Systems Approach. ®
Larry Peterson and Bruce Davie are the authors of Computer Networks: A Systems Approach and the related Systems Approach series of books. All their content is open source and available on GitHub. You can find them on Twitter and their writings on Substack.