The rise and rise of the open-source RDBMS
How relational database providers like AWS offer their customers the freedom to innovate
Advertorial Remember when open-source software was a cottage industry? Those days are long gone. Software issued under an open-source license now runs most of the internet - and database users are also interested in this model.
Open-source database engines have overtaken proprietary systems in popularity. Ten years ago, according to ranking service DB-Engines, open-source databases accounted for around 35 percent of the popularity ranking relative to proprietary systems. Today, that share stands at a little over 50 percent.
AWS has supported open-source SQL database offerings since it launched its Amazon Relational Database Service (Amazon RDS) in 2009. That first service was built on MySQL and the company has rolled out more open-source offerings since then across both its managed SQL and NoSQL services. In addition to support for commercial database engines, Amazon RDS also launched support for PostgreSQL in 2013 and MariaDB two years later.
With these services, it has offered its customers the freedom to deploy their applications the way they want, while providing them with the unique combined value of open-source innovation with AWS automation.
Freedom to operate
Open-source databases support many of the same use cases as their commercial equivalents, explains Andy Katz, principal product manager at AWS. With a relational database structure, an open-source implementation frees customers from what Katz describes as cumbersome procurement processes that often plague commercial software agreements. Some proprietary vendors offer notoriously complex licensing agreements that get even stickier when running in virtual environments, and they are very aggressive about enforcing them. With open-source, customers do not have to comb through complex legal contracts for oblique usage restrictions or suffer through time-consuming audits.
In addition to avoiding the licensing and auditing burdens that come with using a commercial database, using an open-source offering also makes a big difference when it comes to accessing the latest innovations and most comprehensive set of automated capabilities.
"Open-source provides users with an excellent baseline of capabilities and, building with this innovation, AWS is able to extensively improve database security, performance, scalability, durability, consistency, availability, and currency, more than what customers can easily achieve by themselves," he says. "We're able to do this in part because we, like everyone else, can see into the code and understand what's going on."
Database upgrades for Amazon Aurora MySQL-Compatible Edition, RDS for MySQL, and RDS for MariaDB, for example, have been made safer, simpler and faster with the recent introduction of Amazon RDS Blue/Green Deployments. With Blue/Green Deployments, customers can create a managed staging environment that mirrors their production environment with just a few clicks. They can then make their changes on the staging environment, test these changes, and promote the staging environment to production in as little as a minute. During the promotion process, Blue/Green Deployments uses built-in switchover guardrails to protect the promotion of the green environment to production. These guardrails cancel the switchover if they detect replication errors, instance health failures and more.
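The guardrail idea can be illustrated with a minimal Python sketch. It is a toy model only, not the Amazon RDS implementation or API: the `GreenStatus` fields and the `can_switch_over` function are hypothetical names invented for this example, and it covers just the two failure types named above.

```python
from dataclasses import dataclass


@dataclass
class GreenStatus:
    """Hypothetical health snapshot of the staging (green) environment."""
    replication_healthy: bool
    instances_healthy: bool


def can_switch_over(status):
    """Return (ok, reasons); any failed check cancels the switchover."""
    reasons = []
    if not status.replication_healthy:
        reasons.append("replication errors detected")
    if not status.instances_healthy:
        reasons.append("instance health check failed")
    # The switchover proceeds only when no guardrail has tripped.
    return (not reasons, reasons)


ok, reasons = can_switch_over(
    GreenStatus(replication_healthy=False, instances_healthy=True)
)
print(ok, reasons)
```

The point of the design is that promotion is refused by default whenever any check fails, so production is never cut over to a staging environment in a doubtful state.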
The ability to automate these essential tasks is a key factor in customers' adoption of managed, open-source databases. Implementing a database engine in-house can be a cumbersome process with lots of plumbing to create highly available systems that are properly patched and backed up. The cloud layer enables customers to flex their computing usage, reflecting workload intensity in ways that would not be commercially viable in-house. In concert with the back-end capabilities that AWS provides, open-source database products offer better resilience and agility while improving the productivity of developers and database administrators, Katz asserts.
Freedom to collaborate
The company is also busy contributing new capabilities to open-source database communities. An example of this is its new open-source project Trusted Language Extensions (TLE) for PostgreSQL. A vast library of thousands of extensions is the key to PostgreSQL's versatility. However, the use of extensions comes with some risk because extensions have access to the underlying file system. TLE gives customers the ability to self-serve by testing, certifying and curating the extensions they are interested in, including the extensions they write themselves. TLE supports popular programming languages that users love including JavaScript, Perl, PL/pgSQL, and most recently Rust.
Throughout 2022, AWS has affirmed its commitment to open-source by contributing to a number of open-source database projects, including PostgreSQL, MySQL, MariaDB, and Redis.
For example, this year it updated the popular MariaDB audit plugin to be compatible with MySQL versions 5.7 and 8.0. This new code logs database activities such as user logins and queries, which is useful for security and compliance purposes. AWS made that updated code open-source while also making it available on Amazon RDS for MySQL.
"We're making a lot of investments in PostgreSQL as well, such as improving the capabilities of logical replication, better support for sorting text data, and making it possible for developers to build extensions that customize the data archiving process," says Katz. AWS is also working on features that make it easier to support users with complicated Active Directory setups.
The company is increasing its capacity to make these upstream contributions, Katz says. When it began, it had a single team building all of its open-source contributions and handling bug fixes. It has begun expanding the structure of its open-source team, dedicating engineers full-time to working on specific open-source database projects.
Freedom to innovate
With support for various open-source projects, Amazon RDS customers have access to operational enhancements, like Blue/Green Deployments, as well as open-source enhancements, like TLE. By having access to the best innovations from AWS and their favorite projects, customers can plan for and adopt the latest technologies more quickly. This makes a big difference for AWS customers adjusting their workloads for industry-altering innovations, such as generative AI.
"One of Amazon RDS's most requested capabilities this year so far has been support for the PostgreSQL extension pgvector," says Katz. pgvector is a community-driven extension that allows users to store embeddings from machine learning, or ML, models in their database and to perform efficient similarity searches. Embeddings are numerical representations, or vectors, created by generative AI that capture the semantic meaning of text input into a large language model, or LLM.
"This is exciting for our customers and the broader PostgreSQL community because the extension allows them to build ML capabilities into their e-commerce, media, health applications, and more to find similar items in a catalog," says Katz. "So, if you have a streaming service, you can use pgvector to provide your customers with a TV show recommendation that is similar to the one they just finished."
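The idea behind such a similarity search can be sketched in a few lines of plain Python. This is an illustration of the concept rather than pgvector itself, which performs the search inside PostgreSQL using its own vector type and distance operators. The show titles and three-dimensional embeddings below are made up for the example; real embeddings come from an ML model and typically have hundreds or thousands of dimensions.

```python
import math


def cosine_distance(a, b):
    """Cosine distance: near 0 for vectors pointing the same way."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def most_similar(query, catalog, k=2):
    """Return the k catalog titles whose embeddings are closest to query."""
    ranked = sorted(catalog, key=lambda title: cosine_distance(query, catalog[title]))
    return ranked[:k]


# Hypothetical embeddings for a tiny streaming catalog.
shows = {
    "Space Drama": [0.9, 0.1, 0.0],
    "Galaxy Saga": [0.8, 0.2, 0.1],
    "Baking Contest": [0.0, 0.1, 0.9],
}
just_watched = [0.85, 0.15, 0.05]  # embedding of the show just finished

recommendations = most_similar(just_watched, shows)
print(recommendations)
```

This sketch compares the query against every stored vector, which is only practical for small catalogs. Running the search in the database keeps the embeddings next to the rest of the application data and lets the engine do the ranking.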
Customers who are using RDS for PostgreSQL versions 13.11 and higher, 14.8 and higher, and 15.2 and higher can now start building with the extension. AWS customers can use pgvector to store and search embeddings from Amazon Bedrock, a fully managed service that makes foundation models from leading AI startups and Amazon available via an API, as well as from Amazon SageMaker and more.
"We are also actively collaborating with the developers of this extension to help continue to improve pgvector for the entire PostgreSQL community," Katz says.
Aside from the freedom to innovate that open-source provides, these technologies allow AWS customers to combine the innovations of open-source communities with the AWS stack for an improved technical experience, according to Katz.
For example, AWS provides users looking to leverage the automation provided by Kubernetes with AWS Controllers for Kubernetes (ACK). This open-source service provides a set of Kubernetes Operators (or controllers) that let users manage AWS services directly from the Kubernetes API. AWS offers its customers the option to run on Kubernetes across its portfolio, with controllers available for Amazon Aurora, Amazon RDS, Amazon MemoryDB, Amazon DynamoDB, and others. Each of these controllers runs as a containerized application inside Kubernetes and can have its access permissions to AWS fine-tuned using IAM roles for service accounts (IRSA). ACK users can connect Kubernetes applications directly to managed databases in Amazon RDS.
Freedom from repetitive work
The open-source database landscape has evolved dramatically in just the last few years, says Katz. The concept of free software had already gained substantial traction with the adoption of Linux and open-source tools from the likes of the Apache Software Foundation. It was only a matter of time before enterprises became more comfortable with running their information stores on the same model, especially with cloud-based implementations removing much of the grunt work.
"We saw the world start to use open-source databases and many large enterprise customers now trust them," he says.
These enterprises include Tonkean, which had been running the open-source MySQL database with another cloud service provider before moving to Amazon RDS. The company, which sells an online no-code interface for orchestrating chains of complex business processes, was running into trouble with its existing provider. Its developers were left handling too much of the plumbing work to run the database smoothly in the cloud.
Tonkean migrated its MySQL implementation to AWS in 2019, using the Amazon RDS version of the open-source database. The database implementation was the same, but the benefit was in the innovation around it. The integration with the AWS underlying compute and storage architecture enabled the SaaS vendor to set up and use the product quickly, maintain it with minimal management overhead and improve performance.
Freedom to reliably perform
AWS offers its customers a range of options to optimize their workloads for performance. One of these is its Amazon RDS Multi-AZ deployment feature with two readable standbys, available to Amazon RDS open-source users. This option provides a configuration spanning three availability zones for extra performance. One contains a primary node and the other two contain readable secondaries. The advantage here is in both resilience and performance: two secondaries offer more protection, while write commit latency is approximately half that of the two-zone configuration, says Katz.
While the company offers some canned configurations out of the box, users can customize the details, Katz adds. They can use the AWS CloudFormation tool to deploy their own configurations at scale, for example, allowing them to spin up read-heavy or high-availability configurations depending on the workload.
AWS further reduces the burden of creating performant and scalable databases with Amazon RDS's Read Replica provisioning capability. These are nodes that allow customers to serve multiple copies of the database's data from multiple locations. They enable customers to handle reads from multiple points in the database, giving them better performance for heavy, read-intensive volumes. Today, open-source users can deploy up to 15 read replicas per instance. Amazon RDS's open-source engines also support cascading read replicas. With cascading read replicas, they can scale up to 255 read replicas without adding overhead to source databases.
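How an application might take advantage of read replicas can be sketched as a simple router that sends writes to the primary and spreads reads across the replicas in turn. This is a minimal illustration under stated assumptions, not an AWS feature or API: the `ReadWriteRouter` class and the endpoint names are hypothetical, and treating only statements that begin with `SELECT` as reads is a deliberate simplification.

```python
import itertools


class ReadWriteRouter:
    """Toy router: reads rotate across replicas, writes go to the primary."""

    def __init__(self, primary, replicas):
        self.primary = primary
        # Round-robin iterator over replicas; None when there are no replicas.
        self._replicas = itertools.cycle(replicas) if replicas else None

    def endpoint_for(self, sql):
        is_read = sql.lstrip().upper().startswith("SELECT")
        if is_read and self._replicas is not None:
            return next(self._replicas)
        # Writes, and reads when no replica exists, use the primary.
        return self.primary


# Hypothetical endpoint names for illustration only.
router = ReadWriteRouter("primary-db", ["replica-1", "replica-2"])
for statement in ["SELECT * FROM orders", "SELECT 1", "INSERT INTO orders VALUES (1)"]:
    print(statement, "->", router.endpoint_for(statement))
```

Because replicas are updated asynchronously from the source, a real application would also need to decide which reads can tolerate slightly stale data and which must go to the primary.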
Performance is an evergreen theme, and AWS aims to ensure that customers always have headroom for future performance requirements. Apart from scaling horizontally for improved throughput, AWS also invests in the performance of each database instance to improve response time. The recent additions of RDS Optimized Writes for RDS for MySQL and MariaDB, and Optimized Reads, available for RDS for MySQL, MariaDB, and PostgreSQL, are examples of this. Internal implementations of writes and reads were improved recently so that writes and complex queries can occur up to 2x faster.
These operational advantages, along with the ability to scale up capacity on demand, are what separate open-source in Amazon RDS from home-spun, open-source configurations, Katz says. Free software is only part of the story; it is the configuration, task management, and optimizations around it that really bring it home.
As more companies consider open-source databases in general, cloud environments will factor heavily in their road maps. The low-friction on-ramp, along with the managed operations, will prove appealing for many customers that want something that just works, allowing them to get on with the software innovation that drives their businesses forward.
Sponsored by AWS.