
Unrelenting storage growth drives release of Amazon DocumentDB Elastic Clusters

New Amazon DocumentDB feature promises higher storage capacity and support for applications at massive scale

Sponsored Feature Recently, Amazon Web Services (AWS) announced the latest feature for Amazon DocumentDB (with MongoDB compatibility) that allows organizations to use Amazon DocumentDB for document workloads of virtually any scale and size. Elastic Clusters allows elastic scaling of document databases to handle millions of reads and writes per second with petabytes of storage capacity.

Over the last few years, native JavaScript Object Notation (JSON) document databases have offered an appealing alternative to traditional relational products with more flexibility than rigid relational schemata. Cloud-based fully managed infrastructures have made these next-generation databases even more attractive to enterprises with high-volume data requirements by eliminating much of the administrative overhead.

Many applications are now pushing the boundaries of these databases. Rapid growth is driving the need for even more data throughput and storage capacity. While some existing native JSON document databases can meet these scaling needs today, they are often slow to scale, degrade performance during scaling operations and can be costly. This makes it difficult for companies to balance ease of management, scale and cost.

It is this unrelenting growth which has prompted AWS to release a new Amazon DocumentDB feature that promises higher storage capacity and a focus on supporting applications at massive scale.

The history of Amazon DocumentDB

AWS released Amazon DocumentDB as a managed database service in January 2019 with the goal of bringing more flexibility to its cloud-based customers. It is a document-based database, structured around JSON documents instead of the table-based structures found in traditional relational databases.

The JSON format, popular in NoSQL database implementations, is useful for application developers who tend to think in terms of cloud APIs rather than SQL queries. Because JSON serves as both the storage format and the query mechanism, it simplifies the query process, and developers do not need an object-relational mapping tool to translate between a relational schema and their application.

JSON is a useful storage format for applications that might change their data structures regularly. As a single document format, it does not use the same rigid schemata found in relational systems. Changing a relational schema is a big deal, involving a carefully planned schema migration that could have an impact on production. Organizations can certainly do it, but it should not be done lightly. By contrast, adding a new field to a JSON record simply involves adding another name-value pair to the record that needs it.
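The name-value point can be sketched in a few lines of Python. This is only an illustration that treats documents as plain dictionaries; the collection and field names (`users`, `name`, `loyalty_tier`) are invented for the example and do not come from AWS.

```python
import json

# Two documents in the same hypothetical "users" collection.
users = [
    {"_id": 1, "name": "Ada"},
    {"_id": 2, "name": "Grace"},
]

# Add a new name-value pair only to the record that needs it.
# No schema migration is involved, and the other document is left unchanged.
users[0]["loyalty_tier"] = "gold"

for doc in users:
    print(json.dumps(doc))
```

Both documents can sit side by side in one collection even though only the first carries the new field, which is the flexibility a fixed relational schema does not offer.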

"This is great for flexibility," explains Vin Yu, senior technical product manager at AWS. "Not only does Amazon DocumentDB offer schema flexibility for application development, its cloud-native architecture also offers flexibility to scale operations. Decoupled storage and compute tiers make it possible to scale document database storage independently from CPU capacity," he explains. With this cloud-native architecture, storage scales automatically up to 64 TiB (tebibytes) without any impact to applications. "The 64 TiB storage limit is high enough to meet the needs of many applications," says Yu.

"Scalability can be a challenge for applications that push the envelope in terms of storage and throughput. These might include popular mobile financial applications used by millions of people," Yu explains. Another common application is gaming. "Imagine the number of people nowadays using multiple devices for entertainment," he says. "When they store a new record for a video game or access a user profile, all those records and documents need to be stored and accessible to millions of users at any time." A company must have confidence in its database storage and retrieval capacity at scale.

Customers could get around these issues by using more powerful virtual instances to run Amazon DocumentDB, but eventually they will hit a ceiling there as well. The next step involves spinning up multiple Amazon DocumentDB instances to handle more data. That would give them multiple writers and take them above the 64 TiB per-instance storage ceiling. The downside is that they would have to manage those instances themselves, working out where to store their data across a multi-instance service and remembering where to retrieve it.

How Amazon DocumentDB Elastic Clusters works

It is for this reason that Amazon wanted to make Amazon DocumentDB even more scalable for customers and built a fully managed, easy-to-scale solution. Step forward Amazon DocumentDB Elastic Clusters.

Amazon DocumentDB Elastic Clusters uses sharding to divide data across underlying compute instances called shards. Amazon DocumentDB allows customers to use the MongoDB sharding APIs to create sharded collections that enable data to be distributed across the shards, each with its own writer, expanding their throughput. "The service scales from tens of thousands of writes per second seen under the original Amazon DocumentDB to over a million," Yu explains.

"The shard key tells the system how to distribute the data. This could be anything from a user ID string to a timestamp. Amazon can work with customers to select the most appropriate shard key to ensure even shard distribution across multiple clusters," explains Yu.
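To make the role of the shard key concrete, here is a minimal sketch of hash-based routing in Python. It is not how Elastic Clusters distributes data internally, which the article does not describe; the `shard_for` function, the SHA-256 hash, the four-shard setup and the `user-N` IDs are all assumptions chosen for illustration.

```python
import hashlib
from collections import Counter


def shard_for(shard_key_value: str, shard_count: int) -> int:
    """Map a shard key value to a shard index using a stable hash."""
    digest = hashlib.sha256(shard_key_value.encode("utf-8")).digest()
    # Use the first 8 bytes of the digest as an integer, then wrap it
    # into the range [0, shard_count).
    return int.from_bytes(digest[:8], "big") % shard_count


# Hypothetical user IDs acting as the shard key.
user_ids = [f"user-{n}" for n in range(10_000)]

# Count how many documents would land on each of four shards.
counts = Counter(shard_for(uid, shard_count=4) for uid in user_ids)

for shard, total in sorted(counts.items()):
    print(f"shard {shard}: {total} documents")
```

A key with many distinct values, such as a user ID, tends to spread documents fairly evenly in a scheme like this. A key with only a handful of distinct values would concentrate documents on a few shards no matter how many shards exist, which is why the choice of shard key matters.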

"With Elastic Clusters, scaling compute is simple and customers can easily scale in workloads on Amazon DocumentDB in minutes with little to no downtime or performance impact," says Yu. "Elastic Clusters also offers differentiated management capabilities such as no-impact backups and rapid point-in-time restore, enabling customers to focus more time on their applications rather than managing their database."

AWS prioritized making shard management easy. "AWS has their own control planes to manage all of this," Yu explains. "Organizations don't have to worry about the networking aspect of sharding inside their cluster, load balancing, patching or about adding and deleting shards."

Scalability on demand

This behind-the-scenes management brings more than just greater scalability to Amazon DocumentDB Elastic Clusters. It also makes the system easier to scale down when workload requirements fall. "Scaling shards in this way is as simple as changing a single number in the dashboard," Yu says. "This helps customers manage cost if they find that their average transactions per second fall during certain periods. Because the service decouples storage and compute capacity, they can scale down their computing capabilities without scaling down their document storage."

AWS has maintained Amazon DocumentDB's high availability in Elastic Clusters. "The service copies each write to the database six times regardless of how many replicas are running, and the customer only pays for one copy. That makes the data highly durable," Yu explains. "Elastic Clusters also offers a default of three replica nodes per shard cluster running across multiple availability zones (AZs)."

"These replicas are not just there for high availability purposes," Yu says. "They are used as regular production read replicas, improving the performance of read-heavy databases. Clients can configure the number of read replica nodes per shard cluster, adjusting it up or down based on the criticality of their database. For example, a development or analytics database might not need that many replicas," he says.

Amazon DocumentDB Elastic Clusters pricing and consumption model

"Similar to Amazon DocumentDB, Elastic Clusters has a pay-as-you-go pricing model. Customers enjoy the predictability of consumption-based billing," says Yu. "Elastic Clusters is based on shards rather than individual instances. Customers have the ability to configure the number of vCPUs which is a predictable unit that they pay for." Customers can start using Elastic Clusters with a single-shard, low-vCPU configuration, then scale up by adding more shards or increasing the vCPUs as needed.
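The shard-and-vCPU model lends itself to simple arithmetic. The Python sketch below assumes compute is billed per vCPU-hour across all shards and uses a made-up rate of 0.10 per vCPU-hour; it is not AWS's published pricing, and it leaves out replicas, storage, I/O and backups. The function names and the example configurations are hypothetical.

```python
def total_vcpus(shard_count: int, vcpus_per_shard: int) -> int:
    """Total configured compute, expressed in vCPUs."""
    return shard_count * vcpus_per_shard


def hourly_compute_cost(shard_count: int, vcpus_per_shard: int,
                        price_per_vcpu_hour: float) -> float:
    """Estimated hourly compute cost under the per-vCPU-hour assumption."""
    return total_vcpus(shard_count, vcpus_per_shard) * price_per_vcpu_hour


# Hypothetical rate, for illustration only.
RATE = 0.10

# Start small, then scale up by adding shards and vCPUs per shard.
small = hourly_compute_cost(shard_count=1, vcpus_per_shard=2, price_per_vcpu_hour=RATE)
large = hourly_compute_cost(shard_count=4, vcpus_per_shard=8, price_per_vcpu_hour=RATE)

print(f"1 shard x 2 vCPUs: {total_vcpus(1, 2)} vCPUs, about {small:.2f} per hour")
print(f"4 shards x 8 vCPUs: {total_vcpus(4, 8)} vCPUs, about {large:.2f} per hour")
```

Under these assumptions, the same two numbers that increase capacity also drive the compute bill, so reducing the shard count or vCPUs per shard during quiet periods lowers the estimate in direct proportion.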

The original Amazon DocumentDB can still be used too, which will be comforting for those who prefer to configure and pay for their document database service on a per-instance basis. Nevertheless, Yu sees customers gradually adopting Elastic Clusters as their default managed document database option over time. "With Elastic Clusters, customers can cost-effectively meet the needs of their most demanding document workloads," he says.

Amazon will continue to add capabilities to Elastic Clusters, working backwards from customer needs. "We're excited to continue innovating for our customers," Yu adds. "We're focused on providing our customers the capabilities they need to scale for the future."

Amazon DocumentDB Elastic Clusters represents more than just a new feature. It also points to the increasing size requirements of modern cloud-based applications. As data storage and processing volumes continue to rise, techniques for increased scaling will become more critical to support today's most demanding applications and to give customers the capabilities needed for future growth.

Sponsored by AWS.
