You should read Section 8 of the Unix User's Manual
And see the importance of open and accessible operations
Systems Approach
If, like me, you were a computer-science graduate student who cut your teeth on Berkeley Unix – complete with the first open-source implementation of TCP/IP – you know Section 8 as the cryptic System Maintenance Commands section of the Unix User's Manual.
It was obvious, to me, that this concluding section warranted a closer look because the introduction warned: "Information in this section is not of great interest to most users." Judging by my taste in research problems over the years, reading Section 8 turned out to be a pretty good investment.
But before getting to Section 8, you first learned about the rest of Unix, where you discovered how empowering it is to be able to build new internet applications. Anyone interested in how targeted investments in open-source software, coupled with affordable hardware, can spur innovation should study the role of the Berkeley Software Distribution (BSD) in the success of the internet.
It's easy to assume the internet as we know it today was inevitable, but at the time BSD Unix happened it was not at all clear that the incumbent telcos could be disrupted. We've commented on the power of APIs many times, but the impact of the Socket API (Section 2) on enabling innovation on top of the internet cannot be overstated. With that stable fixed point in the architecture, a thousand flowers bloomed… and we have, thankfully, moved well beyond the telco vision of B-ISDN.
Section 8 was the second half of the story. In addition to describing how to shut down and boot a system, it defined the process for managing long-running daemon processes, the Unix equivalent of today's microservices. If you had responsibility for configuring and managing system services on your department's server, which came with superuser privilege, you needed to know not only how to program Unix but also the ins and outs of operating it.
As a grad student, the lessons I learned while being responsible for sendmail(8) on a live multi-user system were immeasurable. Every mistake instantly sent the faculty into the hallway looking for the responsible idiot. (In my defense, this was at a time when email addresses contained % and ! operators in addition to @, and their precedence was not well-defined.)
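To make the precedence problem concrete, here is a minimal sketch of how one mixed address could be read two ways. The function names and the sample address are hypothetical, chosen only for illustration; this is not how sendmail's rewriting rules actually worked, just a toy showing why the order in which operators are applied matters.

```python
def at_first(addr):
    """Assume '@' binds loosest: split on the last '@' and hand the rest to that host."""
    local, _, host = addr.rpartition("@")
    return host, local


def bang_first(addr):
    """Assume '!' binds loosest: split on the first '!' and hand the rest to that host."""
    host, _, rest = addr.partition("!")
    return host, rest


# Hypothetical mixed-syntax address; each result is (next host, remaining address).
addr = "hostA!user@hostB"
print(at_first(addr))    # ('hostB', 'hostA!user')
print(bang_first(addr))  # ('hostA', 'user@hostB')
```

The same string names two different next hops depending on which operator is applied first, so a mailer that guessed differently from its neighbors could send a message the wrong way.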
BSD also provided me with an early lesson in the power of having many eyeballs on the lookout for security vulnerabilities. Looking at the source code for sendmail, for example, revealed a backdoor, whereby one could Telnet to port 25, type the magic "wizard" command, and fork a root shell. So I made my counterparts at Berkeley and other universities aware of that vulnerability by doing exactly that.
Others probably did too, but it was a different time, and the lesson didn't initially take hold. With debugging convenience and a naive sense of community trumping security, the backdoor remained open by default in sendmail until the Morris Worm used it as one of its attack vectors a couple of years later.
Gaining this sort of practical experience is obviously valuable if your plan is to become a system administrator, but it has long been my experience that an opportunity to manage systems that deliver services to actual users is a great source of systems research problems, as well as fertile ground for platform innovations.
My PhD dissertation, born out of frustration with sendmail, turned out to be on naming and addressing; later, real-world experience running a CDN on PlanetLab generated a sequence of systems papers (as Vivek Pai and I reported in a 2007 CACM article); and most recently, our experience operating an edge cloud has led to an appreciation for the state management problem inherent in DevOps. And my experience is far from unique: many of the cloud tools we take for granted today – Kubernetes is a great example – started as someone's response to an operational point of pain.
This all leads me to believe that an open operations platform (as documented in Section 8) is just as important as an open programming platform (as documented in Section 2) for democratizing innovation.
Would BSD Unix have had the same impact in the 1980s and 1990s if the university computer center had supported it rather than the computer-science department letting its grad students take ownership of the operations problem? We can ask a similar question today. The value of being able to create new cloud applications is abundantly clear, but is there also value in having open access to the tools used to manage and operate the cloud (rather than delegating the latter to the cloud providers)?
To me, the answer is clearly yes. It comes down to the virtuous cycle of solutions being enabled by platforms on the one hand, and platforms being reshaped with the experience of usage on the other. Stable platforms with well-defined APIs surely allow a thousand flowers to bloom, but eventually disruptive refactoring of those platforms is what leads to the next round of innovation.
Software-defined networking is a famous example of disruptive refactoring, but it only works if we have sufficiently sophisticated tooling to assemble all the components into a coherent – and manageable – system. Orchestration and lifecycle management have become the dominant operational issues because (a) many smaller parts have to be assembled, and (b) these individual parts are expected to change more frequently. They are essential parts of what we might call the Cloud OS.
Certainly not everyone who writes programs – whether they run on a personal server or in the cloud – also needs to know how to keep those programs running 24/7, but from the perspective of empowering more people to participate in the creation of new systems, the operations platform needs to be kept open and accessible to anyone who wants to invest the time in it.
Fortunately, there is a plethora of open-source components available today that can be used to operate and lifecycle-manage a cloud. We've documented a roadmap for using them in Edge Cloud Operations: A Systems Approach (a sort of "Section 8" for the Cloud). We're hoping there are still a few people who are just crazy enough to give it a try.
Larry Peterson and Bruce Davie are the authors of Computer Networks: A Systems Approach and the related Systems Approach series of books. All their content is open source and available on GitHub. You can find them on Twitter, their writings on Substack, and their past columns on The Register.