Codd almighty! Has it been half a century of SQL already?
The Reg talks to Donald Chamberlin, Michael Stonebraker and more about the legendary language
Feature In the fifty years since SQL was first proposed, it has become the dominant query language for working with relational databases. As such, it underpins the transactions vital to the workings of government, business and the world's economy. But like many standards, its path to global domination is not been a straight one, and in the early years of its gestation, the story could have been very different.
The clue is in the name. Although the acronym might stand for Structured Query Language, it was first known as SEQUEL, a nod to QUEL, a rival language for addressing information in relational databases developed at about the same time.
IBMers Donald Chamberlin and Raymond Boyce wrote the paper "SEQUEL: A structured English query language" in 1974 as a way of addressing data in IBM's newly proposed System R, the first database to embody Edgar Codd's paper [PDF] describing the relational model for database management.
Speaking to The Register, Chamberlin said Codd's approach was a departure from the prevailing thinking in data systems from the 1970s, and would require a new language.
"The most respected figure in the database industry was Charles Bachman. He was working for General Electric, a computer company at the time. He invented the whole concept of a database system as a distinct software layer between the operating system and the application, and he developed a particular database system called Integrated Data Store," Chamberlin said.
Bachman's creation became known as a navigational database, but it was not set to have a long-lasting influence on the industry.
"Charlie thought that you should access a database by writing a procedural program that tells the computer what to do to find the answer. Codd said, 'No that's all wrong, we should use a declarative approach and let the computer figure out how to find the answer.' That was the conflict that was raging in the industry around the early '70s," Chamberlin said.
A SQUARE language, but impossible to type
In their first attempt at creating a language for addressing the new concept of relational databases, Chamberlin and Raymond Boyce created SQUARE or Specifying Queries in A Relational Environment. But it relied on a finger-mangling combination of subscript and superscript notation, and by Chamberlin's own admission, it was difficult to type.
After moving from the east coast to IBM's San Jose Research Laboratory in 1973, Chamberlin and Boyce joined the team for System R.
"The relational model was simple and elegant and powerful. But Codd presented his concepts in a lot of mathematical jargon. He's basically a mathematician, and the first time we read his paper, it wasn't really obvious how elegant and simple this concept was. He took a simple concept and made it complicated. The other problem was that it wasn't clear at first whether these ideas could be efficiently implemented within a program that would have adequate performance," Chamberlin said.
Another database team works on the same problem
System R was IBM's attempt to demonstrate that the relational model could be executed practically by the computers available at the time, but because Codd's paper had been published in a journal, the IBM team were not the only ones working on the problem. Up the coast near San Francisco, another team at University of California Berkeley was working on a similar project.
Chamberlin said: "There were three things that were needed to make a relational database possible: a kind of a three-legged stool. We had the data model. The second thing was a query language that people could understand even though they weren't mathematical specialists, and the third thing was an optimizing compiler that could take this high level language and turn it into an efficient plan with good performance. That's exactly what people were doing both in IBM and also at Berkeley. There was a revolution going on. The cost of computing was coming down, everybody was putting their data online and the winner of that revolution was relational databases," he said.
The question was which one and using which language.
At Berkeley in the early '70s, post-doctorate Michael Stonebraker and fellow computer scientist Eugene Wong began a project to build Ingres, another early relational database, at the same time as System R. With it, they developed their own query language, QUEL.
"Chamberlin and Boyce basically turned SQUARE into a language you could type on a keyboard, but it had this nested structure to it," Stonebraker explained to The Register.
SQL: A standard that is not really standard
With its overwhelming dominance of the database market and backing of the world's international standards body, SQL can show a surprising degree of variation between implementations.
The differences between versions of SQL can lie in efforts to absorb new ideas from the database world, explains Andy Pavlo, associate professor of databaseology at Carnegie Mellon University.
"What makes SQL resilient is its ability to absorb the best ideas for attempts to supplant and replace it over the years. As fashionable approaches in computing evolve over time, SQL is able to take the best ideas and incorporate them. A classic example would be object oriented databases in the 1980s and 1990s. Another would be XML. More recently document store MongoDB took off in the late 2000s, and then SQL adapted by adding JSON support in 2016," he said.
However, support for JSON data types — widely used for exchanging data on the worldwide web — illustrates an issue with SQL: namely, it is not as standardized as many would assume.
Pavlo points out that while JSON became part of the SQL standard in 2016, PostgreSQL had JSON support in 2012. Oracle and MySQL also had their own variations of JSON support.
"It's like everyone kind of does their own one off thing, and then you try to get to the lowest common denominator and say, 'This is what the standard should be.' Often some databases have their own proprietary version, or variation of how to do some kind of new operation, then by the time the SQL standard comes along, they're not going to go back and change things," Pavlo said.
The effect means different vendors have different versions of SQL which can trip up developers and DBAs: Microsoft has T-SQL, Oracle PL/SQL and PostgreSQL PL/pgSQL, for example.
Nonetheless, a few rules can apply to writing good SQL.
Peter Zaitsev, founder and former CEO of open source database consultancy Percona, said SQL is great at expressing simple needs very well, but in more advanced cases, it can get "mind-twistingly complicated" even for a mathematically included brain.
"In my business, you see programs generate SQL queries over five pages long. At that point, it is almost impossible to do something with. It is hard to comprehend and very hard to debug. That has been the experience for many developers," he said.
"A SQL query is like a sentence. You don't want it to be two words, and you don't want it to be five pages long. Use expressions that are contained in a single train of thought.
"My advice is: be smart with SQL. Don't over-kill it. People try to put too much application logic in SQL and it is not easy to maintain," Zaitsev said.
Did the best technical solution win?
However, there were queries that could not be expressed in a nested notation. The result was, Sequel added a flat notation to the nested notation, which made the language "unnecessarily complicated," Stonebraker said.
"On the other hand, QUEL started off with a flat notation from the very beginning. There's no nested notation. Sequel was hampered by the inclusion of the nested notation. But the trouble is, you can't take it out once you put it in," Stonebraker said.
At the same time, the dominance of SEQUEL — the SQL nomenclature came later — was far from a foregone conclusion, as IBM initially showed little interest in developing System R as a commercial product since it was making so much money from its existing data system.
Chamberlin explained: "It was a frustration for me, that's for sure, and for other members of the System R team. This was a business decision that IBM was making. They had a successful commercial database product [in hierarchical IMS]. Why would they want to introduce a competitor for it? It took a few years for them to figure out.
"In the meantime, the System R group published papers both on SQL and on the optimizing compiler in the open technical literature. IBM was very generous allowing us to do that, mainly it was because they didn't take the technology too seriously from a business point of view: this was research," Chamberlin said.
- Postgres pioneer Michael Stonebraker promises to upend the database once more
- Db2 goes 'cloud-first' as IBM struggles to lift database dinosaur
- Thank heavens for the silicon chip: A BRIEF history of data
- 12 simple rules: How Ted Codd transformed the humble database
The birth of Oracle and influence rise of the mini-computer
Two companies were, however, interested in producing commercial relational database. As well as Stonebraker's Ingres, commercialized by the Relational Technology company he helped to found, there was the fledgling Software Development Laboratories founded by Larry Ellison, Bob Miner and Ed Oates, which developed the Oracle database, later to give its name to the company.
Oracle, adopted SEQUEL and claimed to be the first commercially available relational system, and Ingres had a significant advantage over IBM, Stonebraker said. Both were developed for mini-computers systems, distinct from the prevalent mainframes of the 1960s and 1970s.
"It was highly advantageous to both companies that at the time in the early to mid-'80s, the major database systems were all running on IBM hardware so they were written in IBM assembler, so they couldn't be ported to DEC and Data General mini-computers. The '80s was the rise of the mini computer and it presented a green field to Oracle and Ingres," Stonebraker said.
He said the implementation of Quel in Ingres was substantially better than Oracle's adoption of SQL. Stonebraker said Oracle had a head start and was achieving strong growth using "sales tactics I would not condone." However, he said that Ingres might have overtaken Oracle in 1985 were it not for an intervention by IBM.
In late 1983, Big Blue introduced Db2, which, although not its first commercial relational database, was soon to become the flagship of a new vanguard. Db2 used SQL, and with IBM's dominance of the mainframe market, it was game over for rival languages, whether they were technically superior or not.
Standards bodies and government purchasing make the call
But for Chamberlin it was the approval of national and then international standards bodies, and the adoption by US government purchasing authorities, which helped win the day for SQL.
As relational databases became more commonplace due to the success of Ingres and Oracle, the American National Standards Institute (ANSI) decided to take an interest and look at the query languages.
"Well, they had a committee called the H2 committee that has representatives from a lot of different software vendors, including IBM and Oracle, both of which at that time, were marketing SQL products. Crucially, the Ingres people didn't participate in that committee. The world could have been different if Ingres had been represented on that committee," he said.
Instead, the ANSI committee saw two relational products in the market, both of which used SQL so they were compatible, so it thought it would get the most traction if it based its standards on the commercially available SQL implementation, Chamberlin said.
Meanwhile, the US government published the Federal Information Processing Standard 127. "You had to pass that standard to sell software to the government. The FIPS 127 was identical to the ANSI standard. That was very beneficial from the point of view of marketing SQL systems to the government," Chamberlin said.
Following the ANSI standard, the International Organization for Standardization (ISO) created its standard for SQL in 1986. The most recent update from 2023, includes new features related to JSON and property graph. With SQL's world domination assured, Chamberlin reflects what his team could have done differently. He said those pointing out that SQL was not orthogonal — the idea that operations change only one thing — were making a valid criticism.
However, the goal in creating SQL was not to create a programming language as such, but to create a means that non-programmers — business analysts and so on — could use to address their data.
"We wanted the language to look as closely as possible to natural language so they could read it and understand it, like it was an English sentence. But of course, English is not an orthogonal language, so we made some compromises there," he said.
In the end, SQL is mainly used by programmers, database administrators and data scientists. "A little bit more care in making the language more orthogonal would have been beneficial in the long run as things turned out," he said.
Despite SQL's success, Chamberlin sees the need for NoSQL databases and languages to help address data needs in modern applications. But the sheer volume of data in existing relational databases, together with the popularity of open source relational databases such as MySQL, PostgreSQL and SQLite for building new applications, mean SQL is unlikely to be replaced.
Despite his language losing out, Stonebraker says that the database industry has benefited from having a single "intergalactic" standard in SQL.
"I don't see anything supplanting it at this point, unless for some reason the relational database goes out of favor, but I don't see that happening. AI might allow people to use natural language to compile SQL, but that just puts another front end on it," he said.
"There are 20 or 30 popular programming languages in the software development world. In the database world, there's only one." ®