CockroachDB hits Azure at last after five-year mission
The database had to be architected from the ground up, former GIMP dev Spencer Kimball tells us
Interview Distributed SQL database CockroachDB has taken nearly five years to port its service to Microsoft's Azure platform – in contrast to rival MariaDB, which released SkySQL in March 2020 on all three major clouds. But the time has been well spent, CEO and co-founder Spencer Kimball promises The Register.
As it was designed as a distributed system, porting the database service to a new cloud was a "non-trivial amount of work," he says.
"There's a lot of surface area to cover. To make it work well in AWS, GCP and in Azure – it all has analogs, but they are also idiosyncratic."
For example, when Cockroach launches a dedicated cluster for a customer, it creates a sub-account in the cloud service on their behalf which Cockroach controls.
"AWS has a fundamental limit on those, which is measured in the single-digit thousands," says Kimball. "GCP doesn't have such a limit. The differences between how you create those – it's a completely different API; there's different sorts of settings; they take different amounts of time; they have different failure rates. Supporting one cloud, you kind of get used to all those constraints and performances, and then you move to another cloud. And you use a new API, and get different results, and it fails in different ways. That's just one example."
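The kind of per-cloud difference Kimball describes can be pictured with a small sketch. It is purely illustrative and does not reflect Cockroach Labs' code or any real cloud SDK: the class names, the ID formats, and the `max_accounts` parameter are all assumptions made up for this example, and the actual AWS limit is not specified beyond "single-digit thousands."

```python
from abc import ABC, abstractmethod


class QuotaExceeded(Exception):
    """Raised when a provider cannot create any more sub-accounts."""


class SubAccountProvisioner(ABC):
    """Hypothetical common interface over per-cloud sub-account creation."""

    def __init__(self, max_accounts=None):
        # max_accounts is an assumed cap; None means no fixed limit.
        self.max_accounts = max_accounts
        self.accounts = []

    @abstractmethod
    def _make_id(self, customer):
        """Each cloud names its sub-accounts differently."""

    def create(self, customer):
        # Enforce the provider's cap, if it has one, before creating anything.
        if self.max_accounts is not None and len(self.accounts) >= self.max_accounts:
            raise QuotaExceeded(f"limit of {self.max_accounts} sub-accounts reached")
        account_id = self._make_id(customer)
        self.accounts.append(account_id)
        return account_id


class CappedProvisioner(SubAccountProvisioner):
    """Stands in for a cloud with a hard sub-account limit."""

    def _make_id(self, customer):
        return f"acct-{customer}"


class UncappedProvisioner(SubAccountProvisioner):
    """Stands in for a cloud without such a limit."""

    def _make_id(self, customer):
        return f"project-{customer}"
```

The calling code sees one `create` method, but each provider can still fail in its own way, which is roughly the "same analogs, different idiosyncrasies" problem described above.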
Cockroach Labs announced its DBaaS on AWS and GCP in 2018. In the end, the decision to offer the database service on Azure last among the major cloud providers was taken on the basis of customer demand – particularly among the largest enterprise customers.
Kimball says that four or five years ago, early adopters of the database in the cloud wanted to use AWS and GCP. Enterprise customers, however, wanted to host it themselves. As the latter group of organizations became more comfortable with deploying in the cloud, demand for Azure grew.
"In those big enterprises, you're actually more likely than not to see them with an Azure footprint," he tells El Reg.
Cockroach Labs promises a serverless database that can scale up and down at the user's behest. It's a message global companies such as Bose, Comcast, and Netflix have warmed to.
To appeal to more enterprises with a worldwide reach, the eight-year-old database startup has released multi-region capabilities for its serverless, consumption-based auto-scaling service.
CockroachDB was already available in multiple regions, but only on a dedicated cloud platform – and that had cost implications. "Multi-region without the serverless is really expensive: you can get it for just like the most high-value use cases," Kimball says.
In fact, one of the benefits of going serverless in the first place was to allow a more efficient route to multi-region deployments.
Serverless deployment means users can deploy without provisioning. Instead they get a virtual slice of a larger physical cluster.
Going multi-region means the customer can service regions locally without having to deploy a whole cluster in each region.
"It's like there's a physical deployment of Cockroach which spans all of these datacenters, and then you can create virtual or logical clusters within that physical footprint. Each one can have a very precise slice through that physical topology that meets their needs without having a bunch of unused idle resources. And that's what motivated serverless in the first place. The trick, though, was trying to make all that a reality," Kimball says.
"Just building serverless was a huge task because we had to change the underlying architecture of CockroachDB to really efficiently support multi-tenancy."
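As a rough mental model of the "virtual clusters within a physical footprint" idea, consider the toy sketch below. It is not how CockroachDB implements multi-tenancy; the `PhysicalCluster` class, the region names, and the capacity units are all hypothetical and exist only to show tenants taking slices of shared regional capacity.

```python
from dataclasses import dataclass, field


@dataclass
class PhysicalCluster:
    """Toy model: one shared deployment spanning several regions."""

    # region name -> total capacity units (made-up numbers)
    capacity: dict
    # tenant name -> {region: units allocated}
    tenants: dict = field(default_factory=dict)

    def used(self, region):
        # Sum what every tenant has claimed in this region.
        return sum(alloc.get(region, 0) for alloc in self.tenants.values())

    def free(self, region):
        return self.capacity[region] - self.used(region)

    def create_virtual_cluster(self, name, regions, units_per_region):
        # Validate the whole request before allocating anything.
        for region in regions:
            if region not in self.capacity:
                raise ValueError(f"unknown region: {region}")
            if self.free(region) < units_per_region:
                raise ValueError(f"not enough capacity in {region}")
        self.tenants[name] = {region: units_per_region for region in regions}
        return self.tenants[name]


physical = PhysicalCluster({"us-east": 10, "eu-west": 10, "ap-south": 10})
physical.create_virtual_cluster("shop", ["us-east", "eu-west"], 3)
physical.create_virtual_cluster("blog", ["eu-west"], 2)
```

Each tenant occupies only the regions it asked for, and the unclaimed capacity stays in the shared pool instead of sitting idle inside a dedicated cluster.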
As UC Berkeley students, Kimball and co-founder and CTO Peter Mattis developed the popular open source image editing software General Image Manipulation Program (GIMP, since forked as Glimpse, primarily to offer the software under an alternative name).
Kimball and Mattis are also both veterans of Google's software engineering team, having worked on the Colossus distributed file storage.
Despite this heritage, CockroachDB has some competition in terms of distributed SQL databases – including Yugabyte and MariaDB.
While Yugabyte claims 100 percent compatibility with PostgreSQL, MariaDB has recently launched a PostgreSQL-compatible front end to its distributed MariaDB back end, including support for tools used by tech teams familiar with the popular open source database, such as pgAdmin.
Kimball admits CockroachDB does not have full PostgreSQL compatibility, but it is getting there.
"We reimplemented the PostgreSQL syntax and capabilities from scratch. We chose to do that because we decided that the architecture was the primary concern in order to get 3,000-node clusters of the sort our customers are asking for. You have to think distributed from the start. That's obviously not the way PostgreSQL is built, and that's why you couldn't simply import it. MariaDB was built on MySQL. Those are monolithic database engines.
"We're very compatible with PostgreSQL. With every release, we become more compatible. We rolled out distributed User Defined Functions in this latest release, and stored procedures are coming in the next release."
CockroachDB had considered going public in 2021, but put plans on ice as the market appetite cooled. Nonetheless, Kimball points out that by selling directly to enterprise – rather than other startups as many database companies and other new tech companies tend to do – CockroachDB had secured its revenue streams.
"Instead of trying to make the jump into enterprise, we just found ourselves selling branded products – because that's what the product is that we have. In the long run, they're still going to be here in ten years or 20 years, and they have massive expansion potential. We tackled the hard part first," he says.
But with competition intensifying in the distributed SQL market, he may find the hard part goes on and on. ®