Forgetting the history of Unix is coding us into a corner
The lessons of yesteryear's OS are getting lost in translation
FOSDEM 2024 There are vital lessons to be learned from the history of Unix, but they're being forgotten. This is leading to truly vast amounts of wasted effort.
This article is the second in a series of pieces based on The Reg FOSS desk's talk at FOSDEM 2024.
FOSDEM is a conference all about free and open source software, and these days, perhaps regrettably, FOSS is about Unix. Increasingly, Unix today is Linux. The problem is that everyone is forgetting what Unix really is, and knowing your history is essential. As George Santayana said:
Progress, far from consisting in change, depends on retentiveness. When change is absolute there remains no being to improve and no direction is set for possible improvement: and when experience is not retained, as among savages, infancy is perpetual. Those who cannot remember the past are condemned to repeat it.
It's often misquoted, which may be why Henry Spencer (who did the original FOSS implementation of regex) paraphrased it as:
Those who do not understand Unix are condemned to reinvent it, poorly.
What Santayana predicted is coming true around the world today, and so is what Spencer predicted. There are other options, but few even look at them. Why is that?
What is 'Unix' anyway?
First things first, it doesn't mean anything related to the original AT&T codebase. That definition hasn't been true since 1993. Novell bought Unix from AT&T 31 years ago, kept the code, and donated the UNIX trademark to the Open Group. Since then, the name "Unix" has meant "passes Open Group testing," which is broadly what POSIX used to mean. The original codebase is effectively gone. SCO Group offshoot Xinuos still sells things based on it, but their main product is based on FreeBSD.
But alongside macOS on both x86-64 and Arm64 and a handful of other OSes, such as HP/UX and IBM AIX, in the Open Group register of UNIX® certified products, there used to be two different Linux distros, Huawei EulerOS and Inspur K/UX, both Chinese CentOS Linux derivatives. Like it or not, this means that Linux isn't "a Unix-like OS" any more. It is a Unix.
It's a member of a big family of Unixes. Even the simplified version [PDF] from Cambridge University shows nearly 30 branches. The Wikipedia family tree shows lots of branches and lines: Version 1 to 4, then Version 5 to 6, then Version 7, System III and System V … plus lots and lots of derivatives, mostly proprietary, and all the BSDs. And of course Linux, with zero shared code, but a shared design.
But if you take another step back, given that Unix now means compatibility, they are all one branch. It's not about lines of descent any more, because in a sense, what was once the POSIX definition became Unix. But if your goal is simply to pass compatibility tests, you can do that with a shim, some kind of compatibility layer. That's why several versions of IBM z/OS are on the Open Group List. That seems strange because they are not Unix-like at all.
This leads to other questions, such as what constitutes a "Unix-like" OS anyway? Anyone who has a basic idea of how to work with Linux, or macOS from inside the terminal, knows what a Unix looks like. Some identifying characteristics are:
- One file system, rooted at /.
- A shell. (It doesn't matter which one: bash, tcsh, zsh.)
- Commands like ls, cat, echo, mkdir, rmdir, touch.
- A case-sensitive file system.
- Plain text files, and commands connected with redirection and pipes.
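As a rough illustration of that last item, here is a minimal Python sketch of two processes joined by a pipe, much as a shell joins commands with |. The name run_pipeline and both stand-in stages are inventions for this example: each stage is a small Python one-liner rather than a real Unix command, so the sketch doesn't depend on what happens to be installed.

```python
import subprocess
import sys

# Hypothetical stand-ins for two Unix commands, written as Python one-liners.
EMIT = "import sys; sys.stdout.write(sys.argv[1])"               # roughly `echo -n`
UPPER = "import sys; sys.stdout.write(sys.stdin.read().upper())"  # roughly `tr a-z A-Z`


def run_pipeline(text: str) -> str:
    """Run the equivalent of `emit TEXT | upper` and return the final output."""
    # Stage 1 writes plain text to its stdout, which is a pipe.
    producer = subprocess.Popen(
        [sys.executable, "-c", EMIT, text],
        stdout=subprocess.PIPE,
    )
    # Stage 2 reads that same pipe as its stdin.
    consumer = subprocess.Popen(
        [sys.executable, "-c", UPPER],
        stdin=producer.stdout,
        stdout=subprocess.PIPE,
    )
    # Drop the parent's copy of the pipe so only the consumer holds the read end.
    producer.stdout.close()
    output, _ = consumer.communicate()
    producer.wait()
    return output.decode()


print(run_pipeline("hello unix"))  # prints "HELLO UNIX"
```

Neither stage knows anything about the other. All they share is a stream of bytes, which is the whole point of the design.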
Certainly, there can be exceptions to these. For instance, macOS is a Unix™. It passes the tests, and Apple pays for the certification. But it hides most of the real Unix directory tree, its /etc is relatively empty, and it doesn't have an X server – it's an optional extra. And most of all, it's not case sensitive.
But all the same, it's still recognizably a Unix.
So taking that list of general characteristics, and adding a less visible one – that it's programmed mainly in C or something C-like – and requiring that the OS looks like Unix and nothing else, meaning there's no other native layer underneath, then the family is bigger.
So if it's just about compatibility and we don't care about who inherited code from where any more, this raises other questions. How can we categorize them? What is a Unix-like OS anyway?
Generations and the gaps between
Looking at the successive projects in the original Unix family within AT&T is very instructive. For its first decade, Unix was a research project, not a commercial product, and in that time it went through about seven generations.
The thing is that even when other projects started branching off from it and going their own way, somewhere between Unix V4 and V8, the original research project kept going. Both the Wikipedia and Cambridge family trees show the AT&T line stopping after the Tenth Edition. But actually, there were two more generations.
If we consider that whole extended family to be one unit, one box on the chart, that leaves a few outliers. Linux is one. Minix is another. There are others that aren't shown, and a whole category of them are very much Unix-like systems: the microkernels.
The original microkernel, CMU Mach, led to a whole bunch of Unix OSes, including the Open Group's OSF/1 and DEC Tru64, as well as MkLinux and famously the GNU HURD. The only one that isn't a historical curiosity or a tiny neglected niche is Apple's macOS family, including iOS, iPadOS and so on.
Dr Andy Tanenbaum's Minix, which was an indirect progenitor of Linux itself, is another, although Minix 1 and 2 were largely monolithic. Minix 3, though, is a true microkernel, and a version runs inside most modern Intel CPUs' management processors.
There are others. The research on Mach led to L4 and others, and work's still going on. QNX is a commercial microkernel Unix-like OS, and it's used in billions of embedded devices … although the only time you might have played with it was BlackBerry 10.
Although they have enjoyed more success than you might think, microkernels are hard to get working well. Tiny kernels with lots of user-space processes doing stuff are not so hard – AmigaOS did this successfully in the mid-1980s. The problem is keeping the various processes isolated, yet able to communicate with each other. AmigaOS ran on machines with no memory management hardware – everything was in one address space. It's easy to communicate with another process when you can read and write its memory.
The idea of a true microkernel is that the kernel runs in ring 0, in supervisor mode. It schedules processes, it allocates memory, and that's about it. Everything else is implemented in "servers" in user space. Getting them to talk to each other quickly is the hard problem.
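To make the shape of that concrete, here is a toy sketch in Python, not a model of Mach, Minix, or any real microkernel. A "server" runs as a separate process with its own memory, and the client can reach it only by sending a message and waiting for the reply. The PUT/GET message format, the ENOENT and EINVAL replies, and the names SERVER_SOURCE and ServerProxy are all made up for this illustration.

```python
import subprocess
import sys

# Source of a hypothetical user-space "server": a tiny key/value store that
# lives in its own process and only ever talks via messages on stdin/stdout.
SERVER_SOURCE = r'''
import sys
store = {}
while True:
    line = sys.stdin.readline()
    if not line:                      # client closed the channel
        break
    op, _, rest = line.rstrip("\n").partition(" ")
    if op == "PUT":
        key, _, value = rest.partition(" ")
        store[key] = value
        reply = "OK"
    elif op == "GET":
        reply = store.get(rest, "ENOENT")
    else:
        reply = "EINVAL"
    sys.stdout.write(reply + "\n")
    sys.stdout.flush()
'''


class ServerProxy:
    """Client-side handle: every call is one message out and one reply back."""

    def __init__(self) -> None:
        self.proc = subprocess.Popen(
            [sys.executable, "-c", SERVER_SOURCE],
            stdin=subprocess.PIPE,
            stdout=subprocess.PIPE,
            text=True,
        )

    def call(self, message: str) -> str:
        # The client cannot touch the server's memory; it can only send bytes.
        self.proc.stdin.write(message + "\n")
        self.proc.stdin.flush()
        return self.proc.stdout.readline().rstrip("\n")

    def close(self) -> None:
        self.proc.stdin.close()
        self.proc.wait()


server = ServerProxy()
print(server.call("PUT motd hello"))  # prints "OK"
print(server.call("GET motd"))        # prints "hello"
server.close()
```

Every request crosses a process boundary twice, once on the way out and once on the way back. Multiply that by every file operation on a busy system and it becomes clearer why message-passing speed is the hard part.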
For instance, macOS uses Mach, but to provide Unix-compatible APIs so it looks like a Unix, it took a big chunk of the BSD kernel and called it a "Unix server." Mach still manages the processes, not this Unix server, but even so, the server is still big and complicated. And it still wasn't that fast, so NeXT moved it inside the Mach kernel, meaning it's in kernel space too, running in the same memory space as the not-so-micro-any-more Mach kernel.
This leaves us with two families of Unix-like OSes. Monolithic kernels, which include almost all the old proprietary commercial Unixes, and all the BSDs – and Linux. Plus microkernels, which we can very loosely group into two families. One is commercial, and today mainly means Apple's OSes and QNX. QNX is proprietary, while Apple's kernel and userland are open source but haven't influenced much else.
The other is an open source family, including Minix 3, the GNU Hurd, L4, seL4, and many others. Some are used in embedded control systems – such as Intel's ME – but there is little visible mainstream use.
But that is not the end of the line. It is all that most histories show, but there is a third line of descent. Dennis Ritchie and Ken Thompson's work didn't end in 1979 with Unix 7, even if that's where most commercial Unix originated.
Turn it up to 11
The versions of Unix after the Seventh Edition are sometimes collectively called Research Unix, and when I have asked about this online, it has resulted more in people telling me that's wrong than much useful information.
Even so, the lesser-known editions within AT&T's Unix System Laboratories were the Eighth Edition in 1985, the Ninth Edition in 1986, and in 1989 the Tenth Edition. Some fragmentary source code is available, but little information. (Working links or references to what was in these are welcome!)
What came after the Tenth Edition is, for this writer, where it gets interesting. By 1992, Unix was big commercial news, of course. FreeBSD and NetBSD were happening, and Linux was gathering attention.
Meanwhile, Bell Labs did something else.
Think about what constitutes traditional, old-school Unix. It's about "everything is a file," streams of bytes, the familiar shell, and so on. But there are important things on all modern computers that aren't covered by this.
Graphics, notably, don't really form part of the classic Unix kernel. macOS and Linux are both Unixes, as is BSD, but they have radically different GUI layers. This is because Unix was a minicomputer OS. As Dennis Ritchie himself described, it was written on a DEC PDP-7, and later ported to the bigger DEC PDP-11.
These were departmental-scale, multiuser computers. A host machine, plus dumb text terminals on serial connections, with no graphics and no networking – even so, high-end kit for the 1970s.
During the 1980s, the period when commercial Unix flowered, there were very expensive workstations: effectively, personal, single-user minicomputers. It took until the late 1980s for equipment like inexpensive 32-bit computers with onboard graphics and reasonably fast expansion buses (and thus, reasonably fast networking as a fairly cheap option) to start to be mainstream. Then Unix acquired networking support, as it still has. But ways for one Unix machine to talk to another weren't considered in the original design. Instead, it has files of terminal type definitions. Even today, decades later, networking and graphics need auxiliary, helper tools, such as NFS, SSH, X11, and so on. It works, and it works very well, but these are bolted-on extras that appeared years after the design was set in stone.
This is the point at which Unix Tenth Edition appeared, in 1989, when networking and GUIs were proliferating everywhere.
Today, the Wayland enthusiasts like to talk about how they are modernizing the Linux graphics stack. But Linux is a Unix, and in Unix, everything is meant to be a file. So any Wayland evangelists out there, tell us: where in the file system can I find the files describing a window on the screen under the Wayland protocol? What file holds the coordinates of the window, its place in the Z-order, its colour depth, its contents?
If this stuff is not in the file system, are you sure this is a Unix tool?
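For the sake of argument, here is a purely hypothetical Python sketch of what the answer could look like if a window's state were exposed as plain text files. Nothing here is part of Wayland, X11, or any real display server: the directory layout, the file names geometry, zorder, and depth, and the helper functions are all assumptions invented for this illustration.

```python
import tempfile
from pathlib import Path


def export_window(root: Path, window_id: str, attrs: dict) -> Path:
    """Write each attribute of a (pretend) window to its own text file."""
    win_dir = root / window_id
    win_dir.mkdir(parents=True)
    for name, value in attrs.items():
        (win_dir / name).write_text(f"{value}\n")
    return win_dir


def read_attr(win_dir: Path, name: str) -> str:
    """Read one attribute back, the way cat would."""
    return (win_dir / name).read_text().strip()


with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical layout: <root>/<window id>/<attribute>
    win = export_window(
        Path(tmp),
        "1",
        {"geometry": "100 50 800 600", "zorder": 0, "depth": 24},
    )
    print(sorted(p.name for p in win.iterdir()))  # prints ['depth', 'geometry', 'zorder']
    print(read_attr(win, "geometry"))             # prints "100 50 800 600"
```

With a layout like this, ordinary tools such as ls, cat, and grep could inspect a window, and a shell redirect could change one, with no special client library needed.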
Research Unix 10 started to address some of this, but it was too little, too late. The industry had taken much older, simpler versions, commercialized them, and the results were mutating and metastasizing across the world.
So Dennis Ritchie and Ken Thompson did what Niklaus Wirth did with Modula-2 and later with Oberon. They ignored what the industry was doing, went back to their original ideas, and kept working on refining them.
The result is the next step in the development of Unix, and in this era of spiralling software complexity and bloat, it's time to examine it afresh and see whether the geniuses who invented Unix had the right ideas, just as Linux was being created. ®
This is the second part of a multi-part series stemming from our vulture's FOSDEM 2024 talk. Next part to follow...