There are lots of ways to put a database in the cloud – here's what to consider
Choosing the right one for you means understanding the trade-offs, says MySQL expert Peter Zaitsev
Feature It has been a decade since Amazon RDS launched support for PostgreSQL. Since then, the relational system authored by Turing Award winner Michael Stonebraker in the 1980s has gone on to become the most popular database among professional developers, used by nearly half of them, according to Stack Overflow's 2023 Developer Survey.
Alongside PostgreSQL's rise in popularity has come a bewildering array of ways to deploy the database system in the cloud or exploit PostgreSQL-compatible database services. For example, as well as hosting standard versions of PostgreSQL, as in RDS, the three major cloud providers, Amazon, Azure and Google, also provide PostgreSQL-compatible enhanced database services such as Aurora and AlloyDB.
And that's just PostgreSQL. Similar options are available for other popular database systems, including MySQL, MongoDB and MariaDB. To navigate these choices, developers and database administrators need to understand the strengths and weaknesses of each approach.
As the author of the MySQL performance bible and founder and former CEO of open source database consultancy Percona, Peter Zaitsev has witnessed the rise of the various ways of deploying databases in the cloud and cautions against making choices lightly.
Fully managed or put in some grunt?
Whether users manage their own deployment in a VM or adopt a serverless system managed by a vendor will depend on how much work they want to do, how much control and flexibility they want, and how much vendor lock-in they can tolerate.
Added into the mix, the cloud vendors offer proprietary databases for specific workloads. For example, Amazon offers DynamoDB, a fully managed proprietary key-value database, while Google offers BigQuery, a fully managed, serverless data warehouse.
"These systems are only available if you buy them from a specific cloud vendor: you cannot run it on your own," Zaitsev said.
Alternatively, users can get a standard system based on a popular open source database like PostgreSQL or MySQL, but significantly enhanced and presented as a fully managed service, like Amazon Aurora and Google's AlloyDB.
Lastly, there are fully managed "shrinkwrapped" services based on MySQL or PostgreSQL, such as Google's Cloud SQL or Amazon's RDS.
"This is some standard database technology just with some GUI and interface on top of it and some automatic backups and stuff like that," Zaitsev said.
Going from first to last, these choices run from the most lock-in to the least. But users should also question what cloud vendors mean by a "fully managed service."
"That is what the cloud vendors recommend to users and what they push them towards, and it also typically comes with the highest cost, because they charge more for that compared to just the basic infrastructure to run a database," he said.
Don't fire the DBAs!
"But when they talk about a fully managed service, you can ask, 'OK, who's responsible for performance or security?' And they come back and say 'This is a shared responsibility.' They expect you to do your part while they keep the environment up and running. That is often misunderstood. I've seen people seduced by the cloud provider, and they fire their DBAs and everybody who knows about databases, but then they figure out when they need schema design, and query optimization – well, Amazon's not going to help them. Any cloud provider would turn around and say, 'Hey, guys, we are keeping the database up and running, but all that stuff, which is specific to application and database usage, is on you'," Zaitsev said.
Another challenge with shrink-wrapped or enhanced database services from the cloud vendors arises when users want to run systems across infrastructure from different cloud providers, whether because of corporate policy or geographic limitations.
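To make the "on you" part concrete, here is a minimal sketch of the kind of query optimization work that stays with the user whichever service hosts the database. It uses SQLite from the Python standard library purely as a stand-in, not PostgreSQL or MySQL, and the table, column and index names are hypothetical. The point is only that the query plan changes when someone who understands the workload adds an index.

```python
import sqlite3

# In-memory SQLite database as a stand-in for a managed relational service.
# Table, column and index names are hypothetical, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL)"
)
conn.executemany(
    "INSERT INTO orders (customer_id, total) VALUES (?, ?)",
    [(i % 1000, float(i)) for i in range(10_000)],
)

QUERY = "SELECT total FROM orders WHERE customer_id = ?"


def query_plan(connection, sql, params):
    """Return the human-readable plan SQLite would use for the query."""
    rows = connection.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # The last column of each row holds the plan description.
    return " | ".join(row[3] for row in rows)


# Without an index, the database has to scan the whole table.
plan_before = query_plan(conn, QUERY, (42,))

# Adding the index is a schema design decision that the application team makes.
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")
plan_after = query_plan(conn, QUERY, (42,))

print("before:", plan_before)
print("after: ", plan_after)
```

The exact plan wording varies between SQLite versions, and other database engines report plans in their own format, but the division of labor is the same: the provider keeps the service running, while decisions such as which columns to index remain with whoever knows the application.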
"Amazon RDS, for example, sounds simple until you have to run it in different clouds. Then you have to deal with the nuances of RDS and the cloud infrastructure as well, and then it becomes very complicated," Zaitsev said.
Users can manage database deployment in the cloud themselves using virtual machines, but the fastest growing approach to deploying databases in the cloud is via Kubernetes, the open source container orchestration system that originated with Google.
"It gives us a programmable infrastructure, which is much more flexible and advanced than you get just dealing with VMs. At the same time, you can run it on-prem and on all the clouds. Kubernetes has become much more mature and much more capable to run a database compared to the early stages when it was designed as a solution for stateless applications," Zaitsev said.
Into the throng of database options in the cloud, a group of vendors have begun offering serverless systems that pair their own back end with a front end compatible with a common database. For example, CockroachDB and Yugabyte both offer serverless databases with a PostgreSQL-compatible front end.
In June, Cockroach CEO and co-founder Spencer Kimball told The Register it took five years to port the serverless system to Azure, a "non-trivial amount of work" that involved understanding the tolerances and failure of a different cloud architecture.
While Yugabyte claims 100 percent compatibility with PostgreSQL, and MariaDB recently launched a PostgreSQL-compatible front end to its distributed MariaDB back end, Kimball admitted CockroachDB does not have full PostgreSQL compatibility, but it is getting there.
Users, however, should question what lies behind serverless databases, Zaitsev said. "There are really servers in the end, right? It is just you are not charged for them and you may or may not be aware about what is going on with the servers."
One approach to serverless is to scale the instance size up and down according to the load. Another is to offer a multi-tenant system, as in Google Spanner or CockroachDB.
"They have a different idea. You have a distributed database which is shared by multiple tenants. The benefit of that approach is, you have more ready to use capacity, which can be dynamically shared. If you need more resources, you don't need to reallocate and spin up the larger instance size," Zaitsev said.
Serverless is convenient if the load is very irregular. Users do not pay for keeping a system up and running when it is not in use. On the other hand, if the system is well used, and the operator understands and can predict demands on the system, then it can become less valuable from a pricing perspective, he said.
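The pricing trade-off can be sketched as a simple break-even calculation. This is an illustration under stated assumptions, not a pricing model for any real service: the function names are hypothetical, the rates are invented rather than taken from any vendor's price list, and real bills include storage, I/O and data transfer charges that are ignored here.

```python
HOURS_IN_MONTH = 730.0  # assumed average month length in hours


def monthly_cost_provisioned(hourly_rate, hours_in_month=HOURS_IN_MONTH):
    """Always-on instance: billed for every hour, busy or idle."""
    return hourly_rate * hours_in_month


def monthly_cost_serverless(rate_per_active_hour, active_hours):
    """Serverless database: billed only for the hours it is doing work."""
    return rate_per_active_hour * active_hours


def break_even_active_hours(hourly_rate, rate_per_active_hour,
                            hours_in_month=HOURS_IN_MONTH):
    """Active hours per month at which both options cost the same."""
    return hourly_rate * hours_in_month / rate_per_active_hour


# Hypothetical prices in dollars, chosen only to make the arithmetic visible.
PROVISIONED_RATE = 0.50  # per hour, always on
SERVERLESS_RATE = 1.25   # per active hour

threshold = break_even_active_hours(PROVISIONED_RATE, SERVERLESS_RATE)
print(f"break-even at about {threshold:.0f} active hours per month")

for active in (50, 600):
    serverless = monthly_cost_serverless(SERVERLESS_RATE, active)
    provisioned = monthly_cost_provisioned(PROVISIONED_RATE)
    cheaper = "serverless" if serverless < provisioned else "provisioned"
    print(f"{active} active hours: serverless ${serverless:.2f}, "
          f"provisioned ${provisioned:.2f}, cheaper: {cheaper}")
```

Under these made-up rates, a database that is busy only occasionally is cheaper on the serverless plan, while one that is busy most of the month is cheaper on a provisioned instance, which is the pattern Zaitsev describes.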
Earlier this year, Gartner said the DBMS market grew by 14.4 percent in 2022 to reach $91 billion. The cloud platform-as-a-service model captured nearly all the gain, and cloud spend, at 55 percent of the total, exceeded on-premises spend at 45 percent.
The progress to the cloud is slower than Gartner predicted in 2019, when it said that 75 percent of all databases would be deployed or migrated to a cloud platform by 2022. Users seem to be taking their time to navigate the many options available to them in deciding their future database strategy. ®