AWS doubles down on innovations that redefine the database management experience
Unveiling Amazon Aurora zero-ETL integration with Amazon Redshift and much more
Advertorial
AWS has kept up a furious pace of innovation across its services. Amidst the pipeline of new features at its re:Invent conference in December, AWS is also increasingly focused on helping customers work across its products, by offering increased levels of automation and easy integration.
This is particularly true of the Amazon Aurora relational database service. To fully grasp the service, it helps to understand the underlying infrastructure, which AWS has built from the ground up to offer high performance and unparalleled availability and scalability.
Built for the cloud with full MySQL and PostgreSQL compatibility, Amazon Aurora offers the performance and availability of commercial-grade databases at cloud scale and at a fraction of the cost. It includes a fault-tolerant storage layer that scales up to 128 TiB per instance and automatically maintains six copies of data across three availability zones. Customers pay for only one copy, and only for the storage consumed at the database layer. Other key features include Amazon Aurora Global Database, which allows customers to distribute data in up to five additional AWS Regions and supports replication speeds of less than a second. Scaling is further bolstered by low-latency read replicas, while Fast Database Cloning allows users to spin up clones in minutes with just a few clicks and without incurring additional storage fees.
In April 2022, AWS announced the general availability of Amazon Aurora Serverless v2, which added the ability to scale instantly to hundreds of thousands of transactions in a fraction of a second, without having to change instances or carry out other scaling operations. Aurora Serverless v2 has become the fastest adopted feature in the history of Aurora.
"Aurora Serverless v2 is a game changer for customers, not only helping them scale up and down seamlessly and without downtime, but also by delivering significant cost savings, especially for fluctuating workloads. Serverless sets a new bar for a fully managed service by putting the days of demand forecasting, provisioning, and managing database capacity in the rearview mirror without performance or availability tradeoffs," says Colin Mahony, General Manager for Amazon Aurora at AWS.
Moving on to the new features unveiled at AWS re:Invent, the focus on automating advanced operational techniques and on the interoperability of its services is evident.
Saving time and maintenance through automation
One of the biggest announcements is that Amazon Aurora now supports zero-ETL integration with Amazon Redshift, the company's data warehouse service, removing the need to build and manage data pipelines between the two.
Usually, one of the main challenges for customers deriving near real-time benefits from their data is the need to perform ETL operations to move transactional data from an operational database into an analytics data warehouse. It is a requirement that often leaves engineers having to construct and maintain complex data pipelines.
"Traditionally, you need to figure out your pipelines and your flows, do the mappings of the data from one environment to the other, and carefully establish user access and controls," Mahony explains. "None of them are impossible, and you can do them all manually today but it takes time and ongoing maintenance. The first thing that we do with this zero-ETL integration is we pre-seed the Amazon Redshift environment in bulk, then we send the Change Data Capture (CDC) stream based on the new data coming in."
Amazon Aurora zero-ETL integration with Amazon Redshift automates this process, he explains, "making it more seamless for our customers while also enabling near real-time analytics."
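The two-phase flow Mahony describes, a bulk seed followed by a change stream, can be illustrated with a toy sketch. This is not how the zero-ETL integration is implemented and it calls no AWS API. The `ChangeEvent` record, the `seed` and `apply_changes` helpers, and the sample `orders` table are all hypothetical names invented here, with plain dictionaries standing in for the Aurora source and the Redshift target.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ChangeEvent:
    """One hypothetical CDC record, keyed by primary key."""
    op: str                      # "insert", "update", or "delete"
    key: int
    row: Optional[dict] = None   # new row contents; unused for deletes


def seed(source_snapshot: dict) -> dict:
    """Phase 1: bulk-copy a point-in-time snapshot of the source into the target."""
    return {key: dict(row) for key, row in source_snapshot.items()}


def apply_changes(target: dict, events: list) -> dict:
    """Phase 2: replay change events, in order, on top of the seeded target."""
    for event in events:
        if event.op in ("insert", "update"):
            target[event.key] = dict(event.row)
        elif event.op == "delete":
            target.pop(event.key, None)
        else:
            raise ValueError(f"unknown operation: {event.op}")
    return target


# Stand-in for an operational table at the moment the integration is created.
orders = {1: {"status": "new"}, 2: {"status": "paid"}}
warehouse = seed(orders)

# Stand-in for writes that reach the operational database after the seed.
changes = [
    ChangeEvent("update", 1, {"status": "shipped"}),
    ChangeEvent("insert", 3, {"status": "new"}),
    ChangeEvent("delete", 2),
]
apply_changes(warehouse, changes)
print(warehouse)  # {1: {'status': 'shipped'}, 3: {'status': 'new'}}
```

The point of splitting the work this way is that the expensive full copy happens once, after which the target only has to absorb incremental changes, which is what makes near real-time freshness plausible.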
That automation gives customers the opportunity to more quickly perform analytics without disturbing operational systems or impacting throughput. At the same time, it gives users a broad data warehousing and analytics environment to run against.
The feature was conceived after observing how customers were struggling with real-time analytics, says Mahony. "We simply listened to our customers and worked backwards to either automate or eliminate the manual steps and processes many were already doing."
Moreover, he continues, "The tight integration allows customers to enrich Aurora data with other data sources quickly so that actionable insights can be made. Within seconds of data being written into Aurora, customers can now combine this near real-time transactional data with other data sources in Amazon Redshift to gain actionable insights - whether informing machine learning predictions or running analytics."
Amazon RDS Blue/Green Deployments, named after the blue/green deployment methodology for production system updates, is a new feature that enables database updates with zero data loss in as little as a minute.
"We are providing an automated DevOps technique for database upgrades accessible to everyone. The blue environment is your production environment and the green environment is your staging environment. The key benefit of this new feature is that we automate the step of creating a green staging environment in just a few clicks," says Mahony. "We create that production ready green staging environment without any changes to the application that's touching the database. We do the update without any data loss. The process involves some CDC that we're doing on the back end, as well as built-in switchover guardrails."
The intention behind automating the setup is to free customers to do much more testing and to reduce disruption to live applications and services. "They can actually test the engine upgrades, test the schema modifications if they've done some of those, and the parameter setting changes in that staging environment, without impacting the production workload. When ready to switch over, we block writes to the green environment which keeps the blue and green environments in sync. The switchover is as fast as a minute, reducing the upgrade downtime from hours to under a minute in this parallel environment," Mahony explains.
The next announcement introduces a feature called Trusted Language Extensions for PostgreSQL, which takes a similar approach to the creation of high-performance PostgreSQL extensions for Amazon Aurora PostgreSQL-Compatible Edition, as well as Amazon RDS for PostgreSQL. Mahony said the new feature, an open source project in its own right, will enable developers to immediately integrate an existing extension instead of waiting for AWS to certify it, while also making it safer to build new extensions.
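The general shape of a blue/green switchover can be sketched in a few lines. This is an in-memory toy, not the Amazon RDS implementation or API. The `Environment` and `BlueGreenDeployment` classes, the `replicate` method standing in for CDC, and the version strings are hypothetical. The sketch also assumes that writes are paused on both sides during the cutover, which is a simplification made here rather than a detail taken from the quote.

```python
class Environment:
    """A toy database environment holding an append-only list of rows."""

    def __init__(self, name, engine_version):
        self.name = name
        self.engine_version = engine_version
        self.rows = []
        self.writable = True

    def write(self, row):
        if not self.writable:
            raise RuntimeError(f"writes are blocked on {self.name}")
        self.rows.append(row)


class BlueGreenDeployment:
    """Pairs a production (blue) environment with a staging (green) copy."""

    def __init__(self, blue, target_version):
        self.blue = blue
        # Green starts as a copy of blue running the new engine version.
        self.green = Environment("green", target_version)
        self.green.rows = list(blue.rows)
        self.production = blue

    def replicate(self):
        """Copy rows written to blue since the last sync (a stand-in for CDC)."""
        self.green.rows.extend(self.blue.rows[len(self.green.rows):])

    def switchover(self):
        # Guardrail 1 (assumed): pause writes so the two sides cannot diverge.
        self.blue.writable = False
        self.green.writable = False
        # Let green catch up with everything blue has accepted so far.
        self.replicate()
        # Guardrail 2 (assumed): refuse to switch unless the data matches.
        if self.green.rows != self.blue.rows:
            self.blue.writable = True
            raise RuntimeError("environments out of sync; switchover aborted")
        # Promote green; blue stays read-only as the old environment.
        self.production = self.green
        self.green.writable = True
        return self.production


blue = Environment("blue", "13.7")        # version strings are illustrative
blue.write("order-1")
deployment = BlueGreenDeployment(blue, target_version="14.5")
blue.write("order-2")                     # production keeps taking traffic
production = deployment.switchover()
production.write("order-3")
print(production.name, production.engine_version, production.rows)
# green 14.5 ['order-1', 'order-2', 'order-3']
```

Because the application only ever talks to whatever `production` points at, the upgrade is tested on green while blue keeps serving traffic, and the only pause the application sees is the short window in which writes are blocked.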
Developers love PostgreSQL for many reasons, among them the thousands of extensions that add functionality to a database without having to fork it. "There are thousands of extensions available that developers tell us they want to use in production within a managed database. Writing these extensions without defects requires both an expert knowledge of C language and PostgreSQL functionality. Moreover, when these extensions are loaded you have to give developers access to underlying file systems, and this creates risk from unintended actions that can lead to issues like data leaks or data loss," Mahony explains.
"With Trusted Language Extensions for PostgreSQL, we limit any defects from an extension to a single database connection so you can have a safer runtime. Whenever you have extensions running in a database, you have to be careful with the blast radius. So, we're putting up a lot of guardrails here to protect our customers' environments," he adds.
This is one of several ways that Aurora contributes back to the open-source community. "Open-source communities are important to us and year-to-date, we've made 435+ accepted contributions back to the open source community," says Mahony.
Organizations continue to increase their usage of operational databases, which are critical to gaining value from data and fueling innovation. Therefore, it is essential to ensure that the data held within them is secure, which is why AWS has extended its Amazon GuardDuty threat detection technology into Amazon Aurora, with Amazon GuardDuty RDS Protection.
"Amazon GuardDuty RDS Protection analyzes and profiles RDS login activity for potential access threats to Aurora databases, and is enabled with just a few clicks without impacting operational database performance or requiring any modifications," says Mahony. "This was a very practical application of our machine learning capabilities."
While Amazon GuardDuty is already applied to a "wide fabric within AWS", the latest move was more about bringing the technology to the database service and applying it directly to activities there.
The feature delivers key benefits for customers when it comes to protecting data and database workloads, including "proactively identifying potential threats to the data that's stored in their Aurora databases and being able to monitor all login activity to existing and new Aurora databases in your account," explains Mahony.
Cross-service integration and operational automation the way forward
AWS remains absolutely committed to continuing innovation within its individual services, but this will increasingly be accompanied by tighter cross-service integration to make them more seamless for customers. This would mean extending the broader machine learning capabilities within guardrails so that they can be focused on database specific activities, for example.
"We've always believed in the right tool for the right job and that one service does not fit all," says Mahony. "But at the same time, it's really important when it makes sense to make those connections between the different services. And so, as you can tell from these announcements, there's a lot of energy going into that from an innovation perspective."
"We're carefully listening to our customers to help solve their most challenging problems. These innovations re-define the database management experience by providing access to advanced operational techniques using automation. We started the year with serverless as the new foundational experience for databases. We wrapped up the year with access to automated database upgrades in as fast as a minute, a commitment to building with the open source community, and the promise of a zero-ETL future," Mahony says. "These announcements are especially exciting because they reflect our commitment to operational automation both within and between our services allowing our customers to get more done in reliable and repeatable ways."
Sponsored by AWS.