Cockroach Labs CTO: Google became too comfortable, I wasn't being challenged
Peter Mattis on starting a multibillion-dollar company as serverless database hits GA
Interview Cockroach Labs released the serverless version of its eponymous database for general availability last week. The Register took the opportunity to catch up with CTO Peter Mattis – a Google veteran who is also behind open source image editing software GIMP.
Mattis started the Cockroach project with former UC Berkeley roommate Spencer Kimball around eight years ago. They also worked together at photo-sharing startup Viewfinder. With Cockroach they were trying to get the kind of horizontal scalability from a distributed relational database they had seen at Google, where Mattis had worked for nine and a half years. He'd worked on projects including the original indexing and storage system for Gmail, and the distributed file system Colossus.
"Eventually, Google became too comfortable. They just have too much money flowing around their system. I didn't feel like I was being challenged and it felt like I was stagnating," he says.
Kimball and Mattis left Google to found Viewfinder with CEO Brian McGinnis, formerly of Lehman Brothers. The company produced a mobile application designed to organize and share photos, and went on to be acquired by Square in 2013.
"During that time at Viewfinder, along with one of my co-founders, we had the initial idea for CockroachDB, but we were not at all in the database market, so we shelved that and used Amazon DynamoDB [the value-key, document store NoSQL database]. When we got bought by Square we saw database problems again, and that's when CockroachDB really got started," he says.
It started as an open source project, but after some VC interest Kimball and Benjamin Darnell – co-founder and chief architect – started the company and Mattis joined a couple of months later. Cockroach Labs was incorporated in February 2015. "Square was nice but it was probably going to be fun to do this thing we were passionate about," Mattis says.
In December 2021, Cockroach Labs hit a nominal $5 billion valuation with a $278 million funding round, its sixth in short succession.
Like MySQL fork MariaDB, CockroachDB leans heavily on the Business Source License (BSL), which means the source code is available, but users may not use CockroachDB as a service without an agreement with Cockroach Labs. Cockroach says BSL is not certified as an open source license, but most of the OSI criteria are met, an area of debate for committed source-watchers.
CockroachDB is PostgreSQL compatible on the front end, with a distributed file system on the back end. The company launched a Database-as-a-Service (DBaaS) in 2018 and a beta of a serverless product last year. The product is now generally available.
"This is ready if you use it for production workloads. It's also by far the easiest way to stand up a CockroachDB instance if you want just to kick the tires," Mattis says.
While the DBaaS takes care of specifying servers and operating systems for the database, the serverless version also spares engineers from provisioning clusters for the database itself, allowing them to scale up and down as they need and pay only for what they use.
"Users get everything they do with the DBaaS, but with serverless, we're actually not saying you have to provision and tell us exactly what kind of machines and what disk sizes you want. Instead, we have a large physical host cluster behind the scenes, and then you're getting these virtual slices of that. That's what your database is: it's a virtual slice of a larger cluster.
"We are managing the cost of packing both of these efficiently into a single physical host cluster, and what you get in return is something that looks just like it elastically scaled up and down to meet your load and requirements."
There is a cost saving against over-provisioning instances, which Mattis argues is common among cloud-based deployments. Even so, where workloads are known in advance, developers may prefer to provision the database themselves, and this may cost less.
Amazon's RDS Aurora, which is also PostgreSQL compatible, has had a serverless product since 2019.
But Mattis says: "If you choose something like RDS at some point, if you get successful, you will run into limitations and require a major re-architecture of your system. This is the sharding problem that has struck pretty much all the big names in tech and other successful companies."
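To make the sharding problem concrete, here is a minimal Python sketch of application-level sharding. It assumes a simple hash-modulo routing rule, and the function names and keys are hypothetical, not taken from RDS, Aurora, or CockroachDB. It counts how many keys would need to move to a different database when a fifth shard is added to four.

```python
import hashlib


def shard_for(key: str, num_shards: int) -> int:
    """Route a key to a shard using a stable hash (assumed routing rule)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def moved_fraction(keys, old_shards: int, new_shards: int) -> float:
    """Fraction of keys whose shard changes when the shard count changes."""
    moved = sum(
        1 for k in keys if shard_for(k, old_shards) != shard_for(k, new_shards)
    )
    return moved / len(keys)


# Hypothetical keys, for illustration only.
keys = [f"user-{i}" for i in range(10_000)]
fraction = moved_fraction(keys, old_shards=4, new_shards=5)
print(f"{fraction:.0%} of keys change shard when going from 4 to 5 shards")
```

With a uniform hash, a key keeps its shard only when its hash gives the same remainder modulo 4 and modulo 5, which happens about one time in five. Under this naive scheme, then, roughly four in five rows would have to be migrated, which is one reason growing a manually sharded system tends to mean re-architecture rather than simply adding a machine.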
CockroachDB – so named after Kimball's idea for creating a database that was unkillable – has competition. Also PostgreSQL compatible on the front end and distributed on the back is YugabyteDB, which takes its back-end inspiration from Google Spanner.
Mattis argues that Yugabyte's approach has its limitations, as its SQL optimizer can only push down so much into the distributed environment.
"When we're doing SQL execution, we actually distribute the SQL execution throughout all the nodes in the cluster. They have a tougher hill to climb in order to do that, but I don't want to claim it's impossible," he says.
We contacted Yugabyte for comment. ®