Postgres pioneer Michael Stonebraker promises to upend the database once more
Turing Award winner, whose research and startups broke ground for five decades, tells The Reg he has more up his sleeve
Interview What if we built the operating system on top of the database instead of the other way around? It sounds like an idea from an undergraduate student after one microdose too many, except it's not. It's a serious idea from someone who has already upended the computing industry and whose influence has spread into familiar products from Microsoft and Oracle.
Celebrating his 80th birthday this year, Michael Stonebraker continues with his work in database research, but his mark on the industry has been cemented with PostgreSQL, the open source relational database system which, for the first time, became the most popular choice of database among developers this year, according to the 2023 Stack Overflow survey. As well as being a popular open source DBMS in its own right, PostgreSQL provides the compatible front end for database services from vendors including the cloud hyperscalers, CockroachDB and YugabyteDB.
Stonebraker's first influential work started with Ingres, the early relational database system, which began as his research topic following his appointment as an assistant professor at UC Berkeley in 1971.
Speaking to The Register, he says: "My PhD thesis was on an aspect of Markov chains, and that, I realized, had no practical value whatsoever. I went to Berkeley, and you've got five years to make a contribution and get tenure. I knew it was not going to be my thesis topic. Then Eugene Wong, who was another faculty member at Berkeley, said, 'Why don't we look at databases?'"
The two read a then-recent proposal about relational databases from IBM researcher Edgar Codd called "A Relational Model of Data for Large Shared Data Banks."
Stonebraker and Wong thought the Englishman's idea was elegant and simple. "The obvious question was to try and build a relational database system. Both Eugene and I had no experience building system software but, like academics, we thought, let's try it and see what happens. So, based on no experience, we set out to build Ingres. And that was what got me my tenure."
Ingres had competition. IBM's System R was the first to demonstrate the relational approach could provide working transactional performance and the first to implement the now ubiquitous SQL. Oracle began its relational system later in the 1970s. Ingres also had to face a platform problem.
"We got lots of people visiting Berkeley and asking us who's the biggest user of Ingres. Then Arizona State University wanted to use it for a records database of 35,000 students but they couldn't get over the fact they had to get an unsupported operating system from these guys at Bell Labs, namely Unix," he says.
Ingres's targeting of mid-range systems, into which Unix had newly emerged, also meant it did not support COBOL, the dominant language for business computing at the time.
"The only solution was to start a company," Stonebraker says.
He went on to found Relational Technology to commercialize Ingres. It was later renamed Ingres Corporation and then bought by ASK Corporation in 1990, which in turn was bought by Computer Associates in 1994. Another Berkeley Ingres team member, Robert Epstein, went on to found Sybase, which for a decade was second to Oracle in the relational database market. In 1992, its product line was licensed to Microsoft, who used it for early versions of SQL Server.
But Stonebraker acknowledges the commercial codebase for Ingres was way ahead of the open source research project (other researchers could get the code for a nominal fee covering the tape required to store it and the postal costs), so his team decided to push the code over a cliff and start all over again. What comes after Ingres? Postgres, obviously.
A new era
In 1986, a 28-page paper co-written with Larry Rowe announced the design for Postgres, as it was then known, setting out six guiding ambitions. Among them were two that would prove pertinent to the database system's longevity. One was to provide better support for complex objects. The second was to provide user extendibility for data types, operators and access methods.
Stonebraker tells us he knew from conversations with Ingres customers that being extendible would be important for a successful database in the future. "Once this customer called me and said, 'You implemented time all wrong'," he said.
The Berkeley professor was baffled because his team had gone to some length to ensure they implemented the Gregorian calendar correctly, leap years and all. But some financial bonds are paid in 12 equal months in a 360-day year, which you cannot implement in Ingres but you can in PostgreSQL, he says.
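To make the customer's complaint concrete, here is a minimal sketch of the arithmetic behind a 360-day year with 12 months of 30 days each. It is an illustration only: the function name `days_30_360` is hypothetical, the rule shown is a simplified form of the common 30/360 bond convention (it ignores the end-of-February adjustments some variants add), and it is not code from Ingres or PostgreSQL.

```python
from datetime import date


def days_30_360(start: date, end: date) -> int:
    """Days between two dates, treating every month as 30 days.

    Simplified 30/360 rule (an assumption for illustration):
    a start day of 31 is treated as 30, and an end day of 31 is
    treated as 30 only when the start day is already 30 or 31.
    """
    d1 = min(start.day, 30)
    d2 = end.day
    if d2 == 31 and d1 == 30:
        d2 = 30
    return (
        (end.year - start.year) * 360
        + (end.month - start.month) * 30
        + (d2 - d1)
    )


# A Gregorian calendar and a 360-day bond calendar disagree:
gregorian = (date(2024, 1, 1) - date(2023, 1, 1)).days
bond = days_30_360(date(2023, 1, 1), date(2024, 1, 1))
print(gregorian, bond)
```

Under this rule every full month contributes exactly 30 days, so a February coupon period is the same length as a March one. A database whose date type is fixed to the Gregorian calendar cannot express that, which is the gap a user-defined type or operator is meant to fill.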
The motivation to make the database extendable also came from wanting to support new data types. An early project with Ingres tried to use it as a geographic information system, far from its home turf of business data. It was "arbitrarily slow and unfixable," Stonebraker says.
The vision has paid off over the last decade. Ten years ago, PostgreSQL added support for JSON documents, the format around which NoSQL databases MongoDB and Couchbase are based.
Stonebraker has been on record criticizing the NoSQL movement in the past. He tells The Register it was converging with relational databases because they were adopting SQL or SQL-like languages and they were accepting the need for consistency.
"NoSQL's biggest good idea was the out of box experience, because with SQL databases, you have to construct the database, and then you have to define the cursor. They are hard to use. That's one of the very valid criticisms made against SQL databases: the out-of-the-box experience sucks. You should be able to just turn it on and say, 'Here's some data'."
The various services available to provide PostgreSQL and PostgreSQL-compatible databases go some way to address that, but the emergence of the DBMS as a popular open source system was a happy accident, and one Stonebraker had little to do with.
Although the research code for the database was, and remains, open source, building a database company around it was, at the time, impossible, as Stonebraker discovered when founding Illustra in 1992. "When we got venture capital funding for both Ingres and Postgres, VCs would have nothing to do with open source, that was a later phenomenon," he says.
In 2005, Stonebraker founded Vertica based on a shared-nothing column-oriented DBMS for data warehousing, which he now says "would have benefited immensely by being open source but viability of open source code and VC community is a relatively recent phenomenon."
'Closed source databases are not the wave of the future'
Illustra was successful for a period. It was eventually sold to Informix for around $400 million in 1996, with Stonebraker's share worth $6.5 million, Forbes wrote in 1997. Stonebraker became CTO of the parent company for four years.
It's a comfortable sum, but chicken feed compared to Larry Ellison's estimated net worth of $145 billion. Needless to say, Stonebraker is disparaging about Oracle, another early adopter of the relational model. "Ingres was always technically better and Postgres was practically better. It's more flexible, and it's open source. And these days, PostgreSQL is generally comparable in performance. In general, closed source databases are not the wave of the future and I think Oracle is highly priced and not very flexible," Stonebraker says.
Nonetheless it was Oracle that made a decision which provided a boost to open source PostgreSQL. It bought open source MySQL, which some of the community did not trust in the hands of the proprietary software giant. At the same time Illustra and other companies commercialized Postgres, Berkeley released the code for POSTGRES under the MIT license, allowing other developers to work on it.
In 1994, Andrew Yu and Jolly Chen, both Berkeley graduates, replaced query language POSTQUEL with SQL. The resulting Postgres95 was made freely available and modifiable under a more permissive license, and was later renamed PostgreSQL.
"What ended up happening was Illustra kind of gaining traction, but the big kicker was when this group of totally unrelated people I didn't even know picked up the open source Postgres code, which was still around, and ran with it, totally unbeknownst to me. That was a wonderful accident," he says.
"When MySQL was bought by Oracle, developers got suspicious in droves, and defected to PostgreSQL. It was another happy accident. Its commercial success is wonderful, but it was largely serendipitous," Stonebraker adds.
Meanwhile, database services have grown up around PostgreSQL. It has become the dominant front end for compatible, or nearly compatible, systems available from Google (AlloyDB and CloudSQL), Microsoft (Azure PostgreSQL), AWS (Aurora and RDS), CockroachDB, YugabyteDB, EDB, and Aiven.
"The whole world is moving to the cloud and Google, Amazon and Microsoft, are all betting the ranch on PostgreSQL compatibility. I think that's a great idea. CockroachDB is wire-compatible with PostgreSQL. You can take a PostgreSQL application, and drop it on CockroachDB. PostgreSQL doesn't have any distributed database capabilities but both YugabyteDB and CockroachDB do," he says.
Stonebraker's influence even reaches into the portfolio of rival Oracle. His federated database Mariposa became the basis for Cohera, a database company PeopleSoft bought in 2001, before becoming part of Oracle in 2004. In 2014, Stonebraker was recognized for the influence of his work on Ingres and Postgres with the Turing Award, netting $1 million from Google in the process.
Despite many of his ideas being so widely used in the database industry, which Gartner said was worth $91 billion in 2022, Stonebraker is laid back about other people using them.
"I've done well financially. I knew Ted Codd, who was very magnanimous about saying you guys should all run the [ideas]. You want to change the world; any particular person is only part of that. I've always done open source code and shared code with anybody who wanted it. In the process, I've done well financially so yeah, I have no regrets at all," he says.
But that's not to say he is ready to retire. In his latest project, Stonebraker is ready to change the world again.
The idea for DBOS, a Database-Oriented Operating System, came from a conversation with Matei Zaharia, the creator of Apache Spark, who is also co-founder of analytics and ML company Databricks and an associate professor at Berkeley.
"Spark and Databricks are in the business of managing Spark instances on the cloud. He said at any given moment, Databricks is often managing a million-ish Spark subtasks for various users. They couldn't do that using traditional operating system scheduling techniques: they needed something that could scale. The obvious answer was to put all scheduling information into a database. That's exactly what the Databricks guys did: they put it all in a PostgreSQL database, and then started whining about Postgres performance," says Stonebraker.
Never one to shirk a challenge, Stonebraker thought, "Well, I can do better than that."
The new project replaces Linux and Kubernetes with a new operating system stack, at the bottom of which is a database system: in the prototype, the multi-node, multi-core, transactional, highly available VoltDB, which Stonebraker started.
"Basically, the operating system is an application to the database, rather than the other way around," he says.
A paper Stonebraker co-authored with Zaharia and others explains: "All operating system state should be represented uniformly as database tables, and operations on this state should be made via queries from otherwise stateless tasks. This design makes it easy to scale and evolve the OS without whole-system refactoring, inspect and debug system state, upgrade components without downtime, manage decisions using machine learning, and implement sophisticated security features."
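As a rough illustration of what "state as tables, operations as queries" could look like, the sketch below keeps a toy scheduler's state in a single table and lets stateless functions change it only through transactions. Everything here is an assumption made for illustration: the `tasks` schema and the function names are hypothetical, and Python's built-in SQLite stands in for the distributed database the project actually uses. It is not DBOS code.

```python
import sqlite3
from typing import Optional


def make_state_db() -> sqlite3.Connection:
    """Create an in-memory database holding all scheduler state."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE tasks ("
        " id INTEGER PRIMARY KEY,"
        " command TEXT NOT NULL,"
        " state TEXT NOT NULL DEFAULT 'pending',"
        " worker TEXT)"
    )
    return conn


def submit(conn: sqlite3.Connection, command: str) -> int:
    """Record a new task; the table row is the only state kept."""
    with conn:  # commits on success, rolls back on error
        cur = conn.execute(
            "INSERT INTO tasks (command) VALUES (?)", (command,)
        )
    return cur.lastrowid


def schedule_next(conn: sqlite3.Connection, worker: str) -> Optional[int]:
    """Assign the oldest pending task to a worker, or return None."""
    with conn:
        row = conn.execute(
            "SELECT id FROM tasks WHERE state = 'pending'"
            " ORDER BY id LIMIT 1"
        ).fetchone()
        if row is None:
            return None
        conn.execute(
            "UPDATE tasks SET state = 'running', worker = ? WHERE id = ?",
            (worker, row[0]),
        )
        return row[0]


def running_by_worker(conn: sqlite3.Connection) -> dict:
    """Inspect system state with an ordinary query."""
    rows = conn.execute(
        "SELECT worker, COUNT(*) FROM tasks"
        " WHERE state = 'running' GROUP BY worker"
    )
    return dict(rows.fetchall())
```

Because the functions hold no state of their own, debugging or monitoring the toy scheduler is just another query, as in `running_by_worker`, which is the property the paper's quote emphasizes.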
Successful or otherwise, the OS-as-a-database application idea is unlikely to be Stonebraker's last. After turning 80 in October, he tells The Register he is not about to slow down.
"I can't imagine playing golf three days a week. I like what I do, and I will do it as long as I can be intellectually competitive," he says. ®