Systems Approach
Anyone who studies internet technology quickly learns about the importance of distributed algorithms to its design and operation. Routing protocols are an obvious example of such algorithms.
I remember learning how link-state routing worked and appreciating the elegance of the approach: each router telling its neighbors about its local view of the network; flooding of these updates until each router has a complete picture of the network topology; and then every router running the same shortest-path algorithm to ensure (mostly) loop-free routing. I think it was this elegance, and the mental challenge of understanding how such algorithms work, that turned me into a “networking person” for the next thirty years.
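The link-state idea described above can be illustrated with a minimal sketch. It assumes flooding has already finished, so every router holds the same set of advertisements, and it uses Dijkstra's algorithm as the shared shortest-path computation. The router names, link costs, and the function names `build_topology` and `shortest_paths` are hypothetical, chosen only for illustration; real protocols such as OSPF and IS-IS add sequence numbers, aging, areas, and much more.

```python
import heapq

def build_topology(advertisements):
    """Merge flooded link-state advertisements into one graph.

    `advertisements` maps each router to its local view:
    {router: {neighbor: link_cost}}.
    """
    graph = {}
    for router, links in advertisements.items():
        graph.setdefault(router, {})
        for neighbor, cost in links.items():
            graph[router][neighbor] = cost
            graph.setdefault(neighbor, {})
    return graph

def shortest_paths(graph, source):
    """Dijkstra's algorithm from `source`.

    Returns {destination: (total_cost, first_hop)}, which is roughly
    what a router needs to fill in its forwarding table.
    Assumes all link costs are positive.
    """
    dist = {source: 0}
    first_hop = {}
    heap = [(0, source)]
    done = set()
    while heap:
        cost, node = heapq.heappop(heap)
        if node in done:
            continue  # stale heap entry
        done.add(node)
        for neighbor, link_cost in graph[node].items():
            new_cost = cost + link_cost
            if new_cost < dist.get(neighbor, float("inf")):
                dist[neighbor] = new_cost
                # A direct neighbor of the source is its own first hop;
                # otherwise inherit the first hop used to reach `node`.
                first_hop[neighbor] = neighbor if node == source else first_hop[node]
                heapq.heappush(heap, (new_cost, neighbor))
    return {dest: (dist[dest], first_hop[dest]) for dest in first_hop}

# Hypothetical four-router network; each entry is one router's local view.
advertisements = {
    "A": {"B": 1, "C": 4},
    "B": {"A": 1, "C": 2, "D": 5},
    "C": {"A": 4, "B": 2, "D": 1},
    "D": {"B": 5, "C": 1},
}
topology = build_topology(advertisements)
table_at_a = shortest_paths(topology, "A")
table_at_d = shortest_paths(topology, "D")
```

Because every router runs the same computation over the same topology, the tables agree with one another: here A reaches D via B at cost 4, and D reaches A via C at the same cost, along the reverse of the same path.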
The idea of decentralization is baked quite firmly into the internet’s architecture. The definitive paper on the internet’s original design is David Clark’s “The Design Philosophy of the DARPA Internet Protocols,” published in 1988. Near the top of the list of design goals we find “Internet communication must continue despite loss of networks or gateways,” and “The Internet must permit distributed management of its resources.” The first goal leads directly to the idea that there must not be single points of failure, while the second says more about how network operations must be decentralized.
When I worked on the team developing MPLS in the late 1990s, we absolutely believed that every algorithm had to be fully decentralized. Both MPLS traffic engineering (TE) and MPLS-BGP VPNs were designed to use fully distributed algorithms with no central point of control. In the case of TE, we realized early on that centralized algorithms could come closer to providing optimal solutions, but we couldn’t see any way to get those algorithms into the hands of users, given the fundamentally distributed nature of routing.
Ultimately, the idea that centralized algorithms could do better took hold with software-defined networking (SDN). Google with B4 and Microsoft with SWAN both found a way to improve on MPLS-TE with centralized path selection: an SDN controller computes paths centrally and pushes them out to routers that implement a distributed data plane. And MPLS VPNs now face a serious challenge from SD-WAN solutions, which centralize the control of VPN tunnel creation to provide an operationally much simpler solution than MPLS.
Many people who had internalized the lessons of distributed network architecture struggled to accept SDN because the concept of centralized control was so much at odds with everything we believed about best network design practices. What pushed me over to the SDN camp was the realization that you could build scalable and fault-tolerant networks with centralized control as long as you leveraged ideas from outside the networking community.
Consensus algorithms such as Paxos and Raft, for example, sit at the heart of most SDN controllers, enabling them to scale and tolerate component failures. SDN enables the logical centralization of control without introducing the downsides of scaling bottlenecks or single points of failure. And it has produced substantial benefits, such as the ability to expose a network-wide API, considerably simplifying the problem of network configuration and opening the way to automated network provisioning.
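The fault tolerance mentioned above rests on a simple majority rule that Paxos and Raft share: a decision stands only if more than half of the cluster agrees to it. The sketch below shows just that arithmetic, not a consensus implementation; the function names are hypothetical.

```python
def quorum_size(cluster_size: int) -> int:
    """Smallest number of nodes that forms a strict majority."""
    return cluster_size // 2 + 1

def tolerated_failures(cluster_size: int) -> int:
    """How many nodes can fail while a majority remains reachable."""
    return cluster_size - quorum_size(cluster_size)

def has_quorum(reachable_nodes: int, cluster_size: int) -> bool:
    """True if the reachable nodes can still elect a leader or commit."""
    return reachable_nodes >= quorum_size(cluster_size)

# A three-node controller cluster survives one failure, a five-node
# cluster survives two, and a four-node cluster still survives only one.
summary = {n: (quorum_size(n), tolerated_failures(n)) for n in (3, 4, 5)}
```

This is why controller clusters are typically deployed with an odd number of replicas: growing from three nodes to four raises the quorum size without tolerating any additional failures.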
Nor has SDN actually made the internet less decentralized. There are still hundreds or thousands of ISPs, the domain name system is still decentralized, and autonomous systems are still managed independently of each other.
But there is an aspect of centralization to be concerned about: the platforms that shape how a great many people use the internet. While, from a technical point of view, platforms such as Google, Facebook, and Twitter are impressively engineered distributed systems, they present a rather monolithic view of the internet to billions of users. This view of how the services that we consume on the internet became increasingly centralized is well captured in a blog post from a16z’s Chris Dixon. A similar view has been nicely illustrated by one of my favorite cartoonists, The Oatmeal: “Reaching People on the Internet in 2021.”
Both Dixon and the Oatmeal point to the disadvantages of leaving too much control in the hands of large platforms. For example, central platforms can suddenly change policies to shift users away from the content being provided by a creator.
There are also more technical examples in which widespread reliance on a single platform has led to broad unavailability of internet services. For example, the Fastly outage of 2021 had a global impact on sites that depended on its CDN (such as the New York Times and Amazon); days later, an outage at Akamai had a similar effect; Cloudflare’s 2020 failures provide yet another example of a problem at one platform having sweeping impact. There’s an interesting blog post from Cloudflare discussing yet another high-impact outage, which it traces back to Raft failing to elect a leader under certain settings and failure conditions. Essentially, a flaw in a distributed algorithm created a single point of failure for many customers.
It’s worth returning to Clark’s 1988 design philosophy paper and noting that, while the internet still works when routers and gateways fail, satisfying goal number one, many services and websites now fail when a platform on which they depend (such as a CDN) fails. In effect, single points of failure have been unwittingly introduced. And while distributed management of the internet lives on, large chunks of the services we depend on are managed by a small number of entities.
Some of these problems are easier to address than others. The Oatmeal cartoon points to a subscription email service as a way to bypass central gatekeepers of content. Perhaps it will become a best practice to start using multiple CDN providers. And it is claimed that blockchains could lead to a more decentralized internet (see Dixon’s post above). Decentralized finance is one example of how blockchains have created an opportunity to decentralize historically centralized functions. Non-fungible tokens (NFTs) provide a possible path for artists and creators to reach their audiences without central entities (record labels, streaming services, auction houses). At the same time, there is plenty of justified skepticism about the long-term potential of blockchains and cryptocurrencies to move beyond the current speculative phase.
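The multi-CDN suggestion above can be sketched in a few lines. This is only an illustration of the failover idea, under the assumption that a site has some way to check whether each provider is currently serving traffic; the hostnames and the `is_healthy` callback are hypothetical, and real deployments usually make this decision in DNS or at a load balancer rather than in application code.

```python
from typing import Callable, Optional, Sequence

def pick_cdn(providers: Sequence[str],
             is_healthy: Callable[[str], bool]) -> Optional[str]:
    """Return the first provider that passes its health check.

    `providers` is ordered by preference. Returns None if every
    provider is down, in which case the caller might fall back to
    serving directly from the origin.
    """
    for provider in providers:
        if is_healthy(provider):
            return provider
    return None

# Hypothetical provider hostnames, in order of preference.
providers = ["cdn-a.example", "cdn-b.example"]

# Simulate an outage at the preferred provider.
outage = {"cdn-a.example"}
chosen = pick_cdn(providers, lambda host: host not in outage)
```

With a single provider in the list, an outage at that provider leaves nothing to fall back to, which is the single point of failure described earlier.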
It seems that the pendulum swung hard towards centralization with the rise of a few giant internet companies controlling the way billions of people experience the internet, and that pendulum is showing signs of slowing, if not starting to swing the other way. Decentralization is a pillar of the internet’s architecture that has been fundamental to its success, and we’re now seeing a wide range of efforts to return to its decentralized roots. Let’s hope that at least some will be successful.
Larry Peterson and Bruce Davie are the authors of Computer Networks: A Systems Approach and the related Systems Approach series of books. All their content is open source and available on GitHub. You can find them on Twitter, their writings on Substack, and past The Register columns here.