Starting over: Rebooting the OS stack for fun and profit

Making full, effective use of new persistent memory means tearing up the rulebook

Opinion Non-volatile RAM is making a comeback, but the deep assumptions of 1970s OS design will stop us making effective use of it. There are other ways to do things.

The most radical new technology in computing in 50 years was cancelled in 2022. Now, other forms of it are being reinvented. If we identify the obsolete ideas embedded in current OSes, and isolate and discard that traditional thinking, we can find new ways to use it efficiently.

When Intel cancelled Optane, I wrote that it was bad news for the entire computing industry.

At the time, I explained why it didn't fit in well with existing designs. Now, let's look at how non-traditional designs could overcome this and embrace it, using existing tools and know-how.

There is a pervasive dogma about how computers work: outdated ideas that are today nothing more than tradition, but traditions that are so deeply entrenched that they have become holy writ. Nobody questions them because nobody remembers that there are any other ways to do things.

Less than two years later, other companies are reinventing different implementations of that same type of technology. Only if we question those traditions can we overcome the barriers they pose.

There are other ways. Let's look at some of them.

To summarize the story back then: Optane didn't sell because all conventional OSes today embed a false assumption at the heart of their design, one that isn't true any more. It's a relic of minicomputer designs of the 1960s, which was reinvented and reincorporated into 1980s OSes.

False dichotomies that will go away, real soon now

Falsehood #1:

Computers have two kinds of storage, primary and secondary. They are profoundly different and they must be accessed in different ways. To use computers, you have to constantly think about this, and manage the two types of storage by moving data from one to the other.

Falsehood #2:

Primary store is small, fast but volatile: when you turn the computer off, its contents are lost. Secondary store is big, but slower and persistent.

Falsehood #3:

Primary store is small so it's directly accessible: the processor can see all of it at once and read or write every byte individually. Conversely, secondary store is too big for this. To use it, the processor must use a storage controller to move whole blocks of data from secondary to primary store, work on them, then put them back.

This stuff may sound abstruse and theoretical, but it isn't. It's ubiquitous; we just don't notice. Here are some examples, all built on 1970s ideas:

To use a PC, you turn it on. A tiny program in firmware runs, tests the computer, then it boots the computer: it loads your OS from some files on a drive into RAM and starts running it.

You load a program; once it's in memory and running, you can load a file into it to work on. But you must remember to save regularly or you might lose your work. If your computer crashes, it has to reboot and reload the OS. This resets it, and whatever you had open is lost.

Programs are kept in files on a drive, but because there are so many of them, most OSes today have some kind of program launcher that keeps frequently used programs handy. Similarly, most people keep frequently used documents somewhere handy, like on their desktop.

Once you have persistent memory – PMEM for short – this all becomes bogus. It's limiting legacy junk.

Put some PMEM in your computer's DIMM slots, and most of the core primary/secondary distinction is lost. All the computer's storage is now directly accessible: it's right there on the processor memory bus. There's no "loading" data from "drives" into memory or "saving" any more.

True, there is still a distinction, but it's a much lesser one. There's no primary or secondary store any more: it's all primary, but some is faster and can be rewritten infinitely many times, while some is a little slower and can only be rewritten millions of times.
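To make that concrete, here is a minimal sketch – not from the original design, just an illustration – of what "no loading and no saving" looks like to a program. It assumes a Linux machine with a DAX-capable filesystem mounted at a hypothetical path, /mnt/pmem, and uses only plain POSIX calls: the persistent region is mapped straight onto the memory bus and written with ordinary CPU stores.

    /* sketch.c - a hedged illustration, not a real product.
     * Persistent memory as ordinary memory: no read(), no write(), no "save".
     * Assumes a PMEM-backed (DAX) file at the hypothetical path /mnt/pmem/state. */
    #include <fcntl.h>
    #include <stdio.h>
    #include <string.h>
    #include <sys/mman.h>
    #include <unistd.h>

    int main(void) {
        const size_t len = 4096;
        int fd = open("/mnt/pmem/state", O_CREAT | O_RDWR, 0600);
        if (fd < 0 || ftruncate(fd, len) < 0) { perror("open"); return 1; }

        /* Map the persistent region straight into the address space. */
        char *mem = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
        if (mem == MAP_FAILED) { perror("mmap"); return 1; }

        printf("last run left: \"%s\"\n", mem);   /* whatever survived power-off */
        strcpy(mem, "state written with ordinary CPU stores");

        /* On a DAX mapping opened with MAP_SYNC the stores are durable once the
         * CPU caches are flushed; msync() is the portable belt-and-braces call. */
        msync(mem, len, MS_SYNC);

        munmap(mem, len);
        return 0;
    }

On real NVDIMM hardware that mapping (ideally made with MAP_SYNC) is genuinely persistent once the caches are flushed; on a conventional machine the same code quietly falls back to the page cache, which is why it also works as a prototyping trick.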

Chickens and swans

Desktop computers and servers put this distinction front and centre. They are chickens, scratching in the dirt.

Phones and tablets, though, are swans. They look elegant and glide around smoothly. Their designers work hard to hide this distinction.

You have a grid of icons and you can't directly see your "documents". Open an app and it picks up where it was. Turn off the fondleslab and when you turn it on it's right where it was, although some things will be slower to open at first.

Under the water, though, they are completely traditional. They run Unix, a 1970s minicomputer OS, with millions of little files in a complex filesystem. You just can't see it.

It looks smooth and elegant, but that's only because what's under the surface is concealed. Underwater, they are constantly flailing away.

I want to break free

How do we escape this increasingly artificial distinction? It's inextricably part of the Unix design: everything is a file. Apps communicate using files or pipes of plain text.

Windows NT is just a second cousin to Unix: instead of a Unix-like design, it's a DEC VAX/VMS-like design, implemented in Unix languages with Unix tools. Its design just accepts that what's inside files may have complex structure.

(These days, of course, services communicate over network connections carrying JSON data encoded in plain text. Not a significant improvement… and arguably, it puts Windows at a slight competitive advantage. And no, making Unix more complicated – for instance, adopting new types of shell language to handle more data formats – is not an elegant or simple fix.)

Not all computers were designed this way, though. There have been some lines of machines, reasonably successful in their niches, sold profitably enough to survive for decades, which were not built this way.

Some of their tech survives. Much of it is open source. It would be possible to recreate some of these different designs, and the results would be better adapted for PMEM-based computers.

There is real competitive advantage to be had here, and fortunes to be made. But, as with Linux itself, the key thing is not to start by pumping millions of dollars into radical new products. That very rarely ends up turning a profit.

Major new product categories often start out small, cheap or free, insignificant and overlooked by the mainstream until they reach the point where they can offer things the big players can't.

You can't get there from here

The problem is, nobody seems to have a coherent idea of where to begin. It's a bit like the old joke about someone who gets lost and asks a local for directions:

"Oh, well, to get there, you don't want to start from here!"

Sure, I have a suggestion for how, why and where to go, starting from where we are now. But so do lots of people. Why do I hope mine is interesting?

Well, one of my guiding principles was not to make an incremental improvement on modern OSes. It was to make cheaper, simpler computers. Potentially, cheaper and simpler even than something like a Raspberry Pi.

It's not about what to build. It's about what to take away. If it doesn't lead to profitable new product lines, nobody will care.

This is how my reasoning went.

Firstly, we have to throw away backwards compatibility again. We've done this many times before. It was always worth it. But it's nearly three decades since the last time. Everyone's forgotten. It sounds scary. It's really not.

We are not going to sweep away a hundred million Linux servers, routers, embedded controllers, etc. They are very good at what they do. There is no need to replace them. Leave them be. Forget about "a better Linux than Linux." Linux is really good at being Linux.

Secondly, everything can run VMs these days. If we really need something that existing OSes do better, then fine, launch one in a VM and use that.

The other thing to remember is that the microcomputer era started out with toy computers running toy software.

Who remembers the original Linux announcement?

(just a hobby, won't be big and professional like gnu.)

Just about everything that set out to be big and professional, failed. GNU HURD. Microsoft OS/2. Apple Copland. IBM Workplace OS. Apple and IBM's Taligent.

Whereas CP/M and DOS and Windows, even Unix itself, all started out as fun little proof-of-concept things that barely did anything, but were fun to play around with.

Be small. Be simple. Be fun. Embrace being just a toy.

The other thing is that modern FOSS Unix isn't much fun any more. It's too complicated. A mere human can no longer completely understand the whole stack. That's why we have to chop it up into bits with virtualisation: to make it manageable and scalable.

I miss when it was simple and you could understand the whole thing, and so do a lot of people.

What can we throw away?

If we are not going up against Linux, we don't need to do what Linux does. We don't need multiuser support, or POSIX compatibility. We don't need server apps. Linux isn't going anywhere. It already owns the cloud. It can keep it.

What else could we eliminate? Well, Classic MacOS is an interesting example. Multi-million-selling OS with no command line at all. No config files, either – the OS kept config in a database which the user never even saw.

Apple's Newton, and its far more commercially successful relative, the Palm Pilot: no filesystem. Just OS-managed databases, in RAM.

Presumably you read the intro to this talk. We now have commercial, off-the-shelf non-volatile DIMMs. Our main memory can be persistent, so who needs disk drives any more?

What do we keep?

What's left? Well, obviously, we want to keep graphics, sound, multiple CPUs, GUIs etc. They're nice. We need multitasking. We need memory management.

So I went looking for some kind of small, efficient kernel to start, manage and stop processes: something solid, multi-core capable, and that's been around for a while.

You see, one of my guiding principles when I started to think about this was: we have lots of wheels. We don't need new ones.

Don't plan on building new languages or whole new technology stacks. We have more than enough of those.

Another principle was: look for survivors.

If a technology had real merit, it left descendants. It may never have got big, or famous, but if it worked and it was worth learning, then someone, somewhere, is still using it.

There were lots of good ideas that came along after Unix but didn't catch on. Really powerful ideas and tools survive, even if they never get popular.

Anything that survives decades of unpopularity is worth studying. Old but still working is better than new and untried.

Let us raid the toy-box of history for interesting shiny stuff.

But there is so much, I needed some kind of filter. Some way to sort out what was worth a closer look.

The workstation era

Back during the holiday season, I wrote about the War of the Workstations.

I wrote about two dominant types of workstation, and how one type won out and thrived and the other faded into history: Lisp Machines, and Unix. Unix won: before workstations died out, replaced by generic PCs running Linux and BSD, the last generation were all Unix machines.

But there were a couple of other types. Two stand out by being really different.

Smalltalk

You've definitely heard about the other great non-Unix workstation family – or at least the apocryphal story of Steve Jobs stealing its ideas.

Although they just make printers now, Xerox invented the GUI and designed and sold the first ever graphical workstations. Xerox's Palo Alto Research Centre, PARC, was where some of the greatest technological developments of the 1970s were made.

I will let Jobs himself explain:

I had three or four people who kept bugging me that I ought to get my rear over to Xerox PARC and see what they were doing. And so I finally did. I went over there. And they were very kind and they showed me what they were working on.

And they showed me really three things, but I was so blinded by the first one that I didn't even really see the other two.

One of the things they showed me was object-oriented programming. They showed me that, but I didn't even see that.

The other one they showed me was really a network computer system. They had over a hundred Alto computers, all network using email, et cetera, et cetera. I didn't even see that.

Now, remember, it was very flawed. What we saw was incomplete. They'd done a bunch of things wrong, but we didn't know that at the time. And still, though, they had the germ of the idea was there and they'd done it very well. And within, you know, 10 minutes, it was obvious to me that all computers would work like this someday. It was obvious.

Xerox's Smalltalk machines used a more general-purpose hardware design, and a relatively simple OS. It was originally written in BCPL and then largely rewritten in a language called Mesa, which later developed into a language called Cedar. Cedar was garbage-collected, had dynamic typing with strong string handling, and critical sections of code could be marked as type-safe to reduce memory errors. Mesa inspired Modula-2, and Cedar went on to influence Java.

On top of that, the PARC researchers wrote a language called Smalltalk. This meant that it didn't need a special CPU or exotic architecture.

Smalltalk is another tech survivor. There are multiple implementations, it's still used in production, there are versions for writing web apps or even that run in your web browser.

Every standalone Smalltalk environment is a sort of VM: it's a whole, self-contained graphical OS that runs in a window on top of something else.

Lisp and Smalltalk are very different languages, implemented in very different ways.

What I found interesting, though, was to compare what Lisp fans say about Lisp with what Smalltalk fans say about Smalltalk.

You might know Eric Raymond's quote about Lisp:

Lisp is worth learning for the profound enlightenment experience you will have when you finally get it; that experience will make you a better programmer for the rest of your days, even if you never actually use Lisp itself a lot.

Programmers who know Smalltalk say similar things…

Smalltalk is dangerous. It is a drug. My advice to you would be don't try it; it could ruin your life. Once you take the time to learn it (to REALLY learn it) you will see that there is nothing out there (yet) to touch it. Of course, like all drugs, how dangerous it is depends on your character. It may be that once you've got to this stage you'll find it difficult (if not impossible) to "go back" to other languages and, if you are forced to, you might become an embittered character constantly muttering acerbic comments under your breath. Who knows, you may even have to quit the software industry altogether because nothing else lives up to your new expectations. — Andy Bower, C++ expert, co-founder of Dolphin Smalltalk

In other words, multiple people have written, or said, or told me personally, that these two languages above all others are worth learning, because they change the way you think about programming. And even if you don't use them much after that, what you learn from them will help you.

That is one of the things that has kept these languages alive this long.

I think it's worth asking what we can learn from where Lisp Machines and Smalltalk boxes overlap.

In both cases, the whole user-facing part of the OS was built in a single language, all executing on the same runtime, from the core interpreter to the windowing system to the end-user apps.

While the Linux kernel is in C, these days core bits of the system are in Go. Apps are written in a profusion of languages, including C++ and Rust, as well as interpreted languages like Python, Ruby, Lua, Julia and many more. A lot of front-end stuff is in JavaScript, and much of the back-end is in Java.

Unix people see this as a good thing, as desirable. There are lots of languages to choose from. All have their own particular strengths. Pick one that's good at what you want to do, and ignore the rest.

Whereas what Lisp and Smalltalk workstations show us is the opposite: that if you choose a powerful enough language to build your software, you only need one.

From the Unix perspective, this sounds crazy, and some very smart people have strongly criticized me for saying this.

Here is a heretical idea. If you find that you need different languages for your kernel and your init system and your system scripts and your end-user apps and your customisation tools, that indicates that there's something wrong with the language that you started in.

A second point is that if you choose multiple languages, a stack, then one must be at the bottom… but any serious problems with that one permeate all the way up the stack.

You may have a great, fancy, dynamically typed scripting language, but if it's implemented in a language that is vulnerable to buffer overflows and stack smashing and memory-allocation errors, then so is the fancy language on top.

In other words, there's unavoidably something wrong with all of them.

This problem has built a multi-billion-dollar industry around maintaining this stuff, with hundreds of specialized jobs and plenty of good career paths that will keep you employed for life.

But that doesn't mean it's a good thing for you, or for the users.

Another interesting thing about Lisp and Smalltalk machines is that although they were not open source as we know it today, the vendors did ship you the source code, because the machines ran the source code.

Although they could compile modules for performance, that step was abstracted away by the OS and you never needed to see it. Instead, you could inspect the code you were running, while it executed, and modify it on the fly with immediate effect.

These systems blurred the line between compiler and interpreter, and their users loved this.

Another interesting common element is that with the entire OS and apps in one shared environment, a single huge context, you did not need to start over every time you rebooted.

When you shut down, the running OS image, a mixture of program code and program state, was snapshotted to disk.

When you turned the machine back on, it didn't boot. It reloaded to exactly where you were when you shut down, with all your programs and windows and data back where they were.

In other words, they were not filesystem-centric, as all current mainstream OSes are.

Now, with any luck, you're starting to see where I'm going.

But there is another element.

The wild card: Oberon

Xerox PARC was well-funded and its machines were highly-specced and expensive. So were Lisp Machines. Apple saw these machines and built the highly-specced and expensive Lisa.

But someone else visited PARC and was inspired: the late great Niklaus Wirth, the developer of Pascal. From the early CP/M and DOS era right through to the rise of 32-bit Windows, Borland's Turbo Pascal was hugely important – it was one of the most widely-used languages. Its descendant Delphi, a graphical, object-oriented Pascal, was one of the main ways of developing Windows apps before Visual Studio came along. Large chunks of the original classic MacOS were written in Pascal, too. In the '80s and early '90s, it was everywhere.

Wirth didn't stop with Pascal, though. He continued work, creating Modula and then Modula-2.

He visited PARC, was as inspired by what he saw as Steve Jobs, and went back home to Zürich and implemented his own smaller, simpler, cheaper GUI workstation in Modula-2. He called it Lilith.

This revealed some drawbacks in Modula-2, so Wirth moved on, creating the Ceres workstation and its entire software stack, Oberon.

Oberon is a smaller, simpler, faster Pascal. Like Pascal, it's strongly typed; unlike Pascal, it's garbage-collected, and it was designed for implementing OSes.

Oberon is not only a programming language, but also an operating system built in that language. Or rather, a small family of them. The original has a unique text-based, point-and-click, tiling-window interface. No shell, no command line.

A later version, ETH Oberon, added a more conventional GUI.

Wirth taught courses in it at the Swiss Federal Institute of Technology, and for a while, a significant chunk of the Institute ran on Oberon – not just academics, but admin staff too.

Oberon is another great tech survivor. There are native versions for x86 and other processors, emulators, and versions that run as applications under Windows, Linux and macOS. The core OS, IDE and compiler are about five thousand lines of code, while a complete Linux desktop is in the hundreds of millions.

It's all on GitHub, with comprehensive documentation on Wikibooks and tutorials on YouTube. You can even run it in your browser. It's amazing.

Perhaps the most modern fork is known as A2, nicknamed Bluebottle: it has a full zooming GUI, supports SMP on multiple CPUs and comes with a web browser, email client, image viewer, media player and so on. The core OS is still under ten thousand lines of code. Most active work on it today is in Russia.

The payoff

There were two alternative workstation families that Unix killed off by being simpler, easier to implement, and therefore cheaper:

  1. Lisp Machines, built around arguably the most powerful language there is, but famously hard.
  2. Smalltalk machines, which kids could use, and which inspired the Mac and the whole of contemporary computing.

OSes written in one needed special hardware; OSes written in the other ran, and still run, on anything.

It seems like an obvious decision, right?

Choices…

I can tell people about interesting tech, but I am not here to tell anyone what to do. This is just an idea.

Both Smalltalk and Lisp exist in multiple FOSS versions. One of the liveliest modern Smalltalks is Squeak, an educational programming environment. Its runtime is implemented in C, and there's even a bare-metal version, but Smalltalk runtimes have been built in other languages, including JavaScript. The JIT technology in Oracle's JVM started off as a Smalltalk engine called Strongtalk.

My core proposal is to cut Oberon down into a monolithic, net-bootable binary, and run Squeak on top of it. For drivers and networking and so on, there's a solid, mature, fast, low-level language with a choice of toolchains.

But the user-facing environment is the much higher-level Smalltalk-based Squeak GUI – for apps and experimenting and learning.

Decades after they split, we reunite Smalltalk and the tiny efficient OS that it inspired.

Smalltalk isn't the end of the line. For instance, there's a very interesting new language derived from Smalltalk called Newspeak, which tightens up the security and formalizes text-based formats for Smalltalk objects, so you can keep programs in Git or whatever. It's not there yet, but Gilad Bracha and his team are building it on Squeak. It's a possible future direction.

The second string

I'm not proposing this as the only way forward. One alternative commends itself.

When John McCarthy designed Lisp, his original plan was to put a higher-level, more readable language on top… but he discovered that some people could learn and use the list-based, abstract-syntax-tree lower-level language just fine.

Deeming it unnecessary, McCarthy and his team never finished the planned LISP-2.

Multiple teams have built their own ideas of friendlier higher-levels on top of Lisp. Most are long dead.

But one survives. It was built by Apple as the intended basis for the Newton. Its modern, FOSS descendant is OpenDylan. It's a rich, object-oriented language aimed at building graphical apps for a user-friendly pocket Lisp Machine. You can run Apple's versions on a classic Mac emulator.

Dylan was built on Macintosh Common Lisp.

There's a modern FOSS Lisp environment for Linux: Steel Bank Common Lisp. As an alternative to Smalltalk, it might also be viable to put a Lisp runtime on top of Oberon, and Dylan on top of that. The code is all out there.

Common Lisp is anything but small and simple. My favourite description is from blogger and developer Steve Yegge:

  • Scheme is an exotic sports car. Fast. Manual transmission. No radio.
  • Emacs Lisp is a 1984 Subaru GL 4WD: 'the car that's always in front of you.'
  • Common Lisp is Howl's Moving Castle.

Even so, Common Lisp is still several thousand times smaller than any modern Linux desktop.

I'm not trying to shepherd anyone anywhere. There's no reason not to try both ideas, side by side. Interop between them might be a nightmare, but remember – just a toy. Just for fun.

But we don't have the kit to play with!

Machines filled with PMEM NVDIMMs don't grow on trees, but all this can be prototyped on ordinary hardware and in a VM. Boot from the network to get it started.

A programmer friend of mine, who is a lot smarter than me and often shoots my ideas down, pointed out a snag. Even though NVDIMMs are orders of magnitude tougher than Flash SSDs, they do still wear out. One loop just incrementing a variable would burn out some memory cells in minutes.

But nobody's equipping machines with just NVDIMMs. All the servers out there have some ordinary RAM as well. And if that is all you have – ordinary RAM plus persistent memory on NVDIMMs, and no disks – then one area of memory, the ordinary RAM, can be rewritten as many times as you like. That's where variables and system state are kept. The other area is much bigger and slightly slower. That's where code is kept.

While prototyping, we just need to maintain that memory split in software, until it's reflected in real hardware.

The dream is a machine where you can just pull the plug and when you plug it back in, it picks up exactly where it left off. For now, that's a bit tricky. But a billion Android phones go to sleep when you press the power button and wake up in a second, and they're running Linux with some Flash memory pretending to be a hard disk drive. Just snapshot the current state into PMEM whenever the user presses the button, and when they press it again, copy it back. There is prior art.
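To show how the two halves of that fit together – the RAM/PMEM split above and the snapshot-on-button-press trick – here's a hedged little C sketch. The file path and the trivial one-field "machine state" are invented for illustration; the point is only that the wear-prone persistent region is written once per "power-off", while the millions of increments all happen in ordinary RAM.

    /* snapshot.c - an illustrative sketch, not a design.
     * Hot state lives in ordinary RAM and is hammered freely; it is copied into
     * the persistent mapping only when the user "presses the power button"
     * (Ctrl-C here). /mnt/pmem/snapshot is a hypothetical PMEM-backed file. */
    #include <fcntl.h>
    #include <signal.h>
    #include <stdio.h>
    #include <sys/mman.h>
    #include <unistd.h>

    struct state { unsigned long counter; };        /* stand-in for the machine image */

    static volatile sig_atomic_t button = 0;
    static void on_button(int sig) { (void)sig; button = 1; }

    int main(void) {
        int fd = open("/mnt/pmem/snapshot", O_CREAT | O_RDWR, 0600);
        if (fd < 0 || ftruncate(fd, sizeof(struct state)) < 0) return 1;

        struct state *pmem = mmap(NULL, sizeof *pmem, PROT_READ | PROT_WRITE,
                                  MAP_SHARED, fd, 0);
        if (pmem == MAP_FAILED) return 1;

        struct state ram = *pmem;                   /* "wake up" from the last snapshot */
        printf("resuming at %lu\n", ram.counter);

        signal(SIGINT, on_button);                  /* Ctrl-C plays the power button */

        for (;;) {
            ram.counter++;                          /* rewrite DRAM as often as we like */
            if (button) {
                *pmem = ram;                        /* one write to the wear-limited cells */
                msync(pmem, sizeof *pmem, MS_SYNC); /* make the snapshot durable */
                printf("snapshotted at %lu\n", ram.counter);
                return 0;
            }
        }
    }

Run it, interrupt it, run it again, and it picks up the counter where it left off. The prototype enforces the RAM/PMEM split in software, and on real NVDIMM hardware the same split maps directly onto the two kinds of DIMM.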

Closing words

Four influential tech survivors. All FOSS. All still maintained in current versions. Two different potential ways to move on from disk- and filesystem-centric OSes. Carefully chosen to represent the best of the ideas from the early rivals that shaped Unix, Windows and the Mac.

There are still several billion people without access to their own computers or the Internet. To quote Richard Feynman, there's plenty of room at the bottom.

NT has been around longer than all the previous versions of DOS and Windows put together. Classic MacOS lasted 16 years; at 21, Mac OS X is older, and has had more releases. Linux is 30, but it's an implementation of a 52-year-old OS.

That's a lot of technical debt.

It's time to move on. ®
