Meet the database that supports Amazon.com
The story behind the serverless database built to support the largest ecommerce company in the world
Sponsored Feature
This year, one of Amazon's most innovative creations turned ten years old. In January 2012, the company released DynamoDB, a serverless NoSQL database designed to be fast, highly reliable, and cost-effective. This key-value store now powers many of the company's largest services, including Amazon's virtual assistant, Alexa.
An Amazon Science paper detailing what Amazon has learned from DynamoDB over the last decade was recently published, along with a record of the innovations made to the service since its launch. We spoke to two of the paper's authors, AWS Principal Engineers Akshat Vig and Somu Perianayagam, to get their views on the key takeaways from the last ten years.
DynamoDB itself was the product of a learning experience, recalls Akshat.
"Amazon.com had internal teams who were using relational databases. We were not able to scale to the needs that Amazon had at that time," he says. The relational systems the company used were straining under the weight of its growing business, culminating in a series of outages during the 2004 holiday shopping season: "We knew there had to be a better way."
Amazon invented an internal service, Dynamo, to solve that. It was a key-value database focused on scaling to meet core use cases such as shopping cart management and session services. Dynamo's architecture offered consistent performance at scale, enabling users to predict its performance even when operating at extreme volume.
The need for managed consistency
Dynamo's uptake among Amazon developers was initially limited, and its architects quickly realized that complexity was a factor. Any team that wanted to use Dynamo had to install and manage its own servers and infrastructure. That was a sticking point for busy developers who just wanted to get the job done.
"What developers wanted was a way to have this as a service," explains Akshat. "They wanted a fully managed experience."
To combine the flexibility of NoSQL with the convenience of a managed service such as SimpleDB, Amazon launched DynamoDB in 2012. According to the company, DynamoDB combines the best parts of the original Dynamo design (incremental scalability and predictable high performance) with the best parts of SimpleDB: the ease of administration of a cloud service, consistency, and a table-based data model that is richer than a pure key-value store. The result draws on everything Amazon learned from building large-scale, non-relational databases for Amazon.com, combined with its later experience building highly scalable and reliable cloud computing services at AWS.
DynamoDB handles tasks such as resource provisioning, automatic failure recovery, data encryption, and software upgrades. Continuous backups run in the background, so DBAs do not have to worry about them.
DynamoDB differs from Dynamo in several other ways beyond its focus on being a managed service. It is a multi-tenant system, whereas Dynamo was single-tenant. Dynamo customers had to manage their own clusters; DynamoDB customers do not provision clusters at all. They simply create a table and scale it up and down seamlessly. DynamoDB also decouples compute from storage, an important departure from Dynamo's architecture.
DynamoDB offers less fine-grained control than Dynamo, but this was by design. It provides a fully managed experience that scales up and down automatically, which gives customers a simpler mental model and lets them build applications without worrying about infrastructure. It also offers a simpler API and simpler consistency options, removing the need for applications to implement custom conflict resolution.
"The idea is that teams don't have to become experts on using the database," says Somu. "It reduces the operational complexity, reducing the tuning and configurations that otherwise become a barrier for adoption."
The advantages of a key-value store are simplicity and consistent performance, explain Amazon's experts. Querying a relational database typically means running JOIN statements across multiple tables, which makes the time an operation takes less predictable. Many applications have very simple access patterns that do not need the complexity relational databases offer.
A key-value store holds each data item as a key (for example, a product ID) and a value, such as "Glitzy robot lawnmower". Customers can also store JSON documents as values to capture more detail, but without the table-based links found in relational systems.
One of the first production projects involved a customer running a Super Bowl advertisement, for which DynamoDB seamlessly scaled up to 100,000 writes a second, says Akshat. It was then scaled back down after the event so the customer would not incur further costs. This was a big deal because it was not even considered possible at the time. It seems obvious now, but databases in the past were not that elastic or scalable.
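To make the key-value access pattern concrete, here is a minimal Python sketch, assuming a single in-memory process. It is not DynamoDB's implementation or API; the class name and the product keys are hypothetical. Every operation addresses exactly one item by its key, and a value can be a plain string or a JSON document, so there is nothing to join.

```python
import json
from typing import Any, Dict, Optional


class KeyValueStore:
    """Toy in-memory key-value store: one value per key, looked up by key only."""

    def __init__(self) -> None:
        # Values are held as JSON text, so a value can be a plain string or a document
        self._items: Dict[str, str] = {}

    def put(self, key: str, value: Any) -> None:
        self._items[key] = json.dumps(value)

    def get(self, key: str) -> Optional[Any]:
        raw = self._items.get(key)
        return None if raw is None else json.loads(raw)

    def delete(self, key: str) -> None:
        self._items.pop(key, None)


if __name__ == "__main__":
    store = KeyValueStore()
    store.put("product#123", "Glitzy robot lawnmower")
    store.put("product#456", {"name": "Glitzy robot lawnmower", "price": 499, "tags": ["garden"]})
    print(store.get("product#123"))           # Glitzy robot lawnmower
    print(store.get("product#456")["price"])  # 499
```

Because each request touches a single key, its cost does not depend on how many other items the store holds, which is the property that makes latency predictable.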
DynamoDB provides fast access to items in a table by specifying primary key values. However, many applications might benefit from having one or more secondary (or alternate) keys available, to allow efficient access to data with attributes other than the primary key. To address this, developers can create one or more secondary indexes on a table and issue Query or Scan requests against these indexes.
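A secondary index can be thought of as a second lookup structure kept in step with the base table. The following Python sketch assumes a single in-memory table with one index on a hypothetical attribute; it contrasts an index query, which touches only matching items, with a scan, which examines every item. It illustrates the idea only and does not use the DynamoDB API.

```python
from collections import defaultdict
from typing import Any, Dict, List, Set

Item = Dict[str, Any]


class IndexedTable:
    """Toy table keyed by a primary key, with one secondary index on another attribute."""

    def __init__(self, key_attr: str, index_attr: str) -> None:
        self.key_attr = key_attr
        self.index_attr = index_attr
        self.items: Dict[Any, Item] = {}
        # Secondary index: attribute value -> primary keys of items with that value
        self.index: Dict[Any, Set[Any]] = defaultdict(set)

    def put(self, item: Item) -> None:
        pk = item[self.key_attr]
        old = self.items.get(pk)
        if old is not None and self.index_attr in old:
            self.index[old[self.index_attr]].discard(pk)  # drop the stale index entry
        self.items[pk] = item
        if self.index_attr in item:
            self.index[item[self.index_attr]].add(pk)

    def get(self, pk: Any) -> Item:
        """Primary-key lookup."""
        return self.items[pk]

    def query_index(self, value: Any) -> List[Item]:
        """Index lookup: touches only the items that match."""
        return [self.items[pk] for pk in sorted(self.index.get(value, set()))]

    def scan(self, attr: str, value: Any) -> List[Item]:
        """Full scan: examines every item in the table."""
        return [item for item in self.items.values() if item.get(attr) == value]


if __name__ == "__main__":
    orders = IndexedTable(key_attr="order_id", index_attr="customer")
    orders.put({"order_id": "o1", "customer": "alice"})
    orders.put({"order_id": "o2", "customer": "bob"})
    orders.put({"order_id": "o3", "customer": "alice"})
    print([o["order_id"] for o in orders.query_index("alice")])  # ['o1', 'o3']
```

The index is paid for on every write, which must update both the item and its index entry; in return, reads by the indexed attribute no longer need a full scan.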
Ten years of evolution
Amazon applied the lessons that it learned building and operating Dynamo and SimpleDB when rolling out DynamoDB. "We worked backwards from customers, asking what they liked about these products and what was missing," says Somu. The company has spent the last ten years doing the same thing to improve its managed service, he adds.
In 2013, Amazon responded to customer requests for richer indexing by adding support for secondary indexes, which enable more complex queries. Local secondary indexes keep the table's partition key but let customers use another attribute of the item as an alternative sort key. Global secondary indexes act as alternative tables with their own partition and sort keys, offering even more indexing options.
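The difference between the two index types comes down to which attributes act as the partition and sort keys. This Python sketch, using hypothetical order data and attribute names, builds three logical views of the same items: the base table, an LSI-style view that keeps the partition key but sorts by another attribute, and a GSI-style view that regroups items under a different partition key. It shows only the logical regrouping, not how DynamoDB stores or maintains its indexes.

```python
from collections import defaultdict
from typing import Any, Dict, List

Item = Dict[str, Any]


def build_index(items: List[Item], partition_attr: str, sort_attr: str) -> Dict[Any, List[Item]]:
    """Group items by a partition attribute and order each group by a sort attribute."""
    groups: Dict[Any, List[Item]] = defaultdict(list)
    for item in items:
        # Items missing either key attribute are left out of this view
        if partition_attr in item and sort_attr in item:
            groups[item[partition_attr]].append(item)
    for group in groups.values():
        group.sort(key=lambda it: it[sort_attr])
    return dict(groups)


orders = [
    {"customer": "alice", "order_id": "o2", "date": "2022-07-12", "status": "shipped"},
    {"customer": "alice", "order_id": "o1", "date": "2022-07-13", "status": "pending"},
    {"customer": "bob", "order_id": "o3", "date": "2022-07-11", "status": "shipped"},
]

# Base table: partition key "customer", sort key "order_id"
base = build_index(orders, "customer", "order_id")
# LSI-style view: same partition key, alternative sort key "date"
local = build_index(orders, "customer", "date")
# GSI-style view: different partition key "status", sort key "date"
global_ = build_index(orders, "status", "date")

print([o["order_id"] for o in base["alice"]])       # ['o1', 'o2']
print([o["order_id"] for o in local["alice"]])      # ['o2', 'o1']
print([o["order_id"] for o in global_["shipped"]])  # ['o3', 'o2']
```

The LSI-style view can only reorder items within one customer's group, while the GSI-style view can answer a question, such as "all shipped orders by date", that spans every customer.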
Perhaps the most foundational change in DynamoDB, which it would subsequently use to drive many other new features, was introduced in 2014. "Many customers were asking for features such as backup and restore, event-driven programming, and exports and imports to and from S3," Akshat recalls. "We went back to the drawing board to identify the core building blocks that we needed to support all these use cases."
The team identified change data capture (CDC) as the holy grail. CDC identifies and records database changes as they happen, making them available to other processes such as point-in-time restore, which lets customers restore data to any point within the previous 35 days.
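Point-in-time restore from a change log can be sketched as replaying recorded changes up to the requested moment. The Python below is a simplified illustration under stated assumptions: an in-memory log, numeric timestamps in seconds, and a replay that starts from an empty table. A production system would more likely start from a periodic snapshot and replay only later changes, and this is not how DynamoDB implements the feature internally. The class and method names are hypothetical; only the 35-day window comes from the article.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional

RETENTION_SECONDS = 35 * 24 * 3600  # the 35-day restore window mentioned in the article


@dataclass
class ChangeRecord:
    timestamp: float
    key: str
    new_value: Optional[Any]  # None marks a delete


class ChangeLog:
    """Toy change-data-capture log: each write to the table is appended as a record."""

    def __init__(self) -> None:
        self.records: List[ChangeRecord] = []

    def record(self, timestamp: float, key: str, new_value: Optional[Any]) -> None:
        self.records.append(ChangeRecord(timestamp, key, new_value))

    def restore(self, target_time: float, now: float) -> Dict[str, Any]:
        """Rebuild the table as of target_time by replaying changes in time order."""
        if target_time < now - RETENTION_SECONDS:
            raise ValueError("target_time is outside the restore window")
        state: Dict[str, Any] = {}
        for rec in sorted(self.records, key=lambda r: r.timestamp):
            if rec.timestamp > target_time:
                break  # later changes are not part of the restored image
            if rec.new_value is None:
                state.pop(rec.key, None)
            else:
                state[rec.key] = rec.new_value
        return state


if __name__ == "__main__":
    log = ChangeLog()
    log.record(10, "cart#1", {"items": 1})
    log.record(20, "cart#1", {"items": 2})
    log.record(30, "cart#1", None)
    print(log.restore(15, now=40))  # {'cart#1': {'items': 1}}
    print(log.restore(35, now=40))  # {}
```

The same ordered stream of changes that drives the restore could also be fed to other consumers, which is why a single CDC building block can support several features.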
Amazon uses DynamoDB Streams and Amazon Kinesis Data Streams for DynamoDB to stream these changes, making them immediately available to other processes and services. Those processes might include analytics applications or even other databases or data warehouses.
"For a distributed database system like DynamoDB with millions of partitions, doing backup and restore isn't easy," says Akshat.
CDC helps to power backup and restore, and other CDC-fuelled enhancements include the replication engine behind global tables, which enables multi-region, multi-active support.
Other feature requests called for the kinds of capabilities you would find in a relational system. "A lot of customers were asking about a way to transactionally update multiple items across tables while ensuring atomicity, consistency, isolation, and durability (ACID)," recalls Akshat. "Generally, in a distributed database transactions are considered to be at odds with scalability." The team met this requirement by using a timestamp-based ordering protocol to support ACID transactions at scale.
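The article does not describe DynamoDB's protocol in detail, so the following Python sketch swaps in the textbook basic timestamp ordering (T/O) scheme to show the general idea. Each item remembers the largest timestamp that has read it and the timestamp that last wrote it, and a transaction whose timestamp would violate that order is rejected. To give the multi-item transaction all-or-nothing behavior, every check runs before any change is applied. It is single-threaded and in-memory, all names are hypothetical, and it leaves out the distributed coordination a real system needs.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Tuple


@dataclass
class Item:
    value: Any = None
    read_ts: int = 0   # largest timestamp of any transaction that has read this item
    write_ts: int = 0  # timestamp of the transaction that last wrote this item


class TimestampOrderingStore:
    """In-memory store using basic timestamp ordering for multi-item transactions."""

    def __init__(self) -> None:
        self.items: Dict[str, Item] = {}
        self.clock = 0

    def next_timestamp(self) -> int:
        self.clock += 1
        return self.clock

    def _item(self, key: str) -> Item:
        return self.items.setdefault(key, Item())

    def transact(self, ts: int, reads: List[str],
                 writes: Dict[str, Any]) -> Tuple[bool, Dict[str, Any]]:
        """Apply every read and write at timestamp ts, or none of them."""
        # Validation: reject if any operation would break timestamp order
        for key in reads:
            if ts < self._item(key).write_ts:
                return False, {}  # a younger transaction already overwrote this item
        for key in writes:
            item = self._item(key)
            if ts < item.read_ts or ts < item.write_ts:
                return False, {}  # a younger transaction already read or wrote it
        # Apply: all checks passed, so commit every operation together
        results: Dict[str, Any] = {}
        for key in reads:
            item = self._item(key)
            item.read_ts = max(item.read_ts, ts)
            results[key] = item.value
        for key, value in writes.items():
            item = self._item(key)
            item.value = value
            item.write_ts = ts
        return True, results


if __name__ == "__main__":
    store = TimestampOrderingStore()
    old_ts = store.next_timestamp()
    print(store.transact(store.next_timestamp(), [], {"a": 1, "b": 2}))  # (True, {})
    print(store.transact(old_ts, ["a"], {}))  # (False, {}): 'a' was written at a later timestamp
    print(store.transact(store.next_timestamp(), ["a", "b"], {"c": 3}))  # (True, {'a': 1, 'b': 2})
```

A rejected transaction would simply be retried with a fresh timestamp. The appeal of timestamp ordering for scale is that conflicts are detected per item from stored timestamps, without holding locks across the whole transaction.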
DynamoDB also introduced bulk import and export capabilities. Bulk import makes it easier for companies to load their data into the system en masse, and bulk export lets them export table data to Amazon S3, where they can query it with the Amazon Athena interactive query service.
Some enhancements required retroactive changes to existing data. For example, when DynamoDB introduced encryption at rest, the team had to encrypt all of the existing data behind the scenes, and it did so without any impact on performance or availability.
Cost and capacity improvements
Driving down costs has been another area of focus. Many customers have data that they access infrequently and were moving it to Amazon S3 to reduce their monthly and annual outlay. Amazon responded with the Amazon DynamoDB Standard-Infrequent Access table class, which stores infrequently accessed data at lower cost while preserving the consistency, availability, and speed the database is built to deliver.
Originally, scaling DynamoDB's internal partitions depended on customers specifying capacity in advance. The database charges on a throughput basis, using read and write capacity units (RCUs and WCUs), and at first it distributed those units equally across all of the partitions in a customer's table.
"This early version tightly coupled the assignment of both capacity and performance to individual partitions, which led to challenges for our customers," explains Somu. DynamoDB launched with the assumption that applications access the data in a table uniformly. However, many workloads have non-uniform access patterns, both over time and across key ranges. When the request rate within a table is non-uniform, splitting a partition and dividing its throughput equally can leave the partition's hot items with less capacity than they had before the split. Customers had to provision table capacity based on forecasts of their peak traffic, rather than relying on partitions scaling automatically to suit.
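A small worked example shows how an equal split can hurt a hot key range. The Python sketch below assumes a hypothetical table provisioned with 1,000 WCUs across four partitions, a hot key range in partition 0 that needs 200 WCUs, and that all of the hot keys land in the same child after the split. The numbers are illustrative, not measurements, and the function names are hypothetical.

```python
from typing import List


def allocate_evenly(table_wcu: float, partitions: int) -> List[float]:
    """Early model: the table's capacity is divided evenly across its partitions."""
    return [table_wcu / partitions] * partitions


def split_partition(alloc: List[float], index: int) -> List[float]:
    """Split one partition in two; each child inherits half of the parent's capacity."""
    half = alloc[index] / 2
    return alloc[:index] + [half, half] + alloc[index + 1:]


def throttled(demand: float, capacity: float) -> float:
    """Requests per second that exceed a partition's capacity."""
    return max(0.0, demand - capacity)


alloc = allocate_evenly(1000, 4)          # [250.0, 250.0, 250.0, 250.0]
hot_demand = 200.0                        # hypothetical hot key range in partition 0
before = throttled(hot_demand, alloc[0])  # 0.0: the hot range fits in 250 WCUs
alloc = split_partition(alloc, 0)         # [125.0, 125.0, 250.0, 250.0, 250.0]
after = throttled(hot_demand, alloc[0])   # 75.0: the hot range now has only 125 WCUs
print(before, after, sum(alloc))          # 0.0 75.0 1000.0
```

The table as a whole still has 1,000 WCUs after the split, yet the hot range is throttled, which is why customers ended up provisioning for their peaks.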
In 2018, Amazon introduced on-demand capacity, enabling the database to allocate capacity units at the table level. This removed the need for users to worry about how capacity is allocated to partitions. DynamoDB now automatically splits partitions based on the throughput they consume, choosing the split point from the traffic distribution within the partition. On-demand is a simple switch in the DynamoDB interface, and customers can move a table between provisioned and on-demand modes without changing its underlying design or data structure.
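Choosing a split point from observed traffic, rather than from the middle of the key range, can be sketched as picking the boundary that best balances request counts on each side. The Python below is a hypothetical illustration, not DynamoDB's algorithm; it assumes per-key request counts are available as a list sorted by key, and compares the result with a naive split at the middle key.

```python
from typing import List, Tuple

Traffic = List[Tuple[str, int]]  # (key, request_count) pairs, sorted by key


def midpoint_split(traffic: Traffic) -> str:
    """Naive split: the upper child starts at the middle key, ignoring traffic."""
    return traffic[len(traffic) // 2][0]


def traffic_split(traffic: Traffic) -> str:
    """Traffic-aware split: the upper child starts at the key that best balances load."""
    if len(traffic) < 2:
        raise ValueError("need at least two keys to split a partition")
    total = sum(count for _, count in traffic)
    best_key, best_gap = traffic[1][0], None
    lower = 0
    for i in range(1, len(traffic)):
        lower += traffic[i - 1][1]    # requests that would stay in the lower child
        gap = abs(total - 2 * lower)  # |upper - lower|
        if best_gap is None or gap < best_gap:
            best_key, best_gap = traffic[i][0], gap
    return best_key


if __name__ == "__main__":
    observed = [("a", 5), ("b", 5), ("c", 5), ("d", 5), ("e", 40), ("f", 40)]
    print(midpoint_split(observed))  # d: lower child gets 15 requests, upper gets 85
    print(traffic_split(observed))   # f: lower child gets 60 requests, upper gets 40
```

Splitting at the middle key would leave almost all of the load on one child, while the traffic-aware choice spreads it far more evenly across the two new partitions.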
Sticking to the primary objective
The added features did not stem from a blind rush to deliver, explains Somu. "These evolutions have all focused on scale, consistent performance, and availability," he adds. Before the development team introduced any feature, it explored the potential effects on those all-important transaction latencies.
Thinking carefully before making feature enhancements lets Amazon maintain high availability by keeping its code performant and reliable. For example, on Amazon Prime Day (July 12-13, 2022), DynamoDB peaked at 105.2 million requests per second while serving Amazon systems including Alexa and the company's fulfilment centers. The database did not waver, recalls Somu: it delivered single-digit millisecond performance across trillions of API calls.
DynamoDB's age in years now exceeds its average response time in milliseconds. As it looks ahead to its next chapter, who knows what improvements we might be writing about ten years hence?
The paper that dives deep into the creation and evolution of DynamoDB, "Amazon DynamoDB: A scalable, predictably performant, and fully managed NoSQL database service", is available to download from Amazon Science.
Sponsored by AWS.