Flexible document model is secret NoSQL sauce

AWS takes Amazon DocumentDB enterprise-ready capabilities to the next level

Sponsored Feature As the pace of innovation accelerates, many organizations are moving from traditional monolithic architectures to microservices-driven data management models that give developers the flexibility to work with a diverse set of data volumes and types.

The days when application developers were content to be tied to a single over-burdened database may be over. In response, Amazon Web Services (AWS) has created a portfolio of purpose-built databases to help customers find the right database service for each unique workload.

In 2019, AWS expanded its presence in the managed cloud database market with a fully managed, MongoDB-compatible database, Amazon DocumentDB (with MongoDB compatibility), to help customers optimize their enterprise JSON workloads. Amazon DocumentDB is a native JSON document database that is compatible with MongoDB APIs, drivers, and tools.

The database is especially useful for workloads such as catalogue applications, content management, and IoT telemetry, where complex, dynamic documents may require ad hoc querying, indexing, and aggregations, says the company. The secret sauce is the flexibility of the document model combined with the benefits of a fully managed solution with consumption-based billing and integration with other AWS services.

Native document-based storage in JSON

Amazon DocumentDB supports the JavaScript Object Notation (JSON) format, which offers developers what might be a more natural way of thinking about how data is queried when building applications. This human- and machine-readable format stores data as objects made up of name-value pairs and arrays (for example {"Publication": "The Register"}). The JSON fields can be grouped into lists and nested to create hierarchical structures.
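As a rough illustration of that structure, the following Python sketch builds a small document containing a nested object and an array, then round-trips it through JSON text. Apart from the "Publication" pair quoted above, the field names and values are invented for the example.

```python
import json

# A hypothetical document: name-value pairs, a nested object, and an array.
article = {
    "Publication": "The Register",
    "section": {"name": "Databases", "sponsored": True},
    "tags": ["JSON", "document database"],
}

# Serialize to JSON text and parse it back; the structure survives unchanged.
encoded = json.dumps(article)
decoded = json.loads(encoded)

print(decoded["section"]["name"])  # nested field access
print(decoded["tags"][0])          # array element access
```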

"The growth of JSON really didn't start with databases. It started with applications," believes Vin Yu, senior technical product manager at AWS. "Many of the languages out there were using JSON as a means to transfer data across the internet. So wouldn't it be nice if you didn't have to manipulate, translate, or serialize this data into another format before storing it?"

Querying in a native JSON document database is more natural for developers who think in terms of the data structures their applications already use. Instead of using JOIN statements to combine records from two tables, as you would in a relational database for example, you can query a single document.

"Using a native JSON document database also allows you to be flexible as application requirements change", says Yu. If you are maintaining user profiles and decide to launch a new product line for pet owners, then you would want to know whether your existing customers have pets, along with other information (How many? Which animals and breeds? How old? Names?).

"In a relational database, you'd have to create a new pets table and relate that back to the existing tables," he explains. "In a native JSON document database, you can just insert that as a new sub-document in line, so there's no need for you to modify any existing schema."

Not all users will have a pet, but that is fine; only those with a furry, feathered, or scaly friend would be included in this sub-document. You would not have to use NULL links as you might in a relational database.
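The pet-owner scenario might look something like the sketch below. It uses plain Python dictionaries rather than a live cluster, and the profile fields, names, and the `users_with_pets` helper are all hypothetical.

```python
# Two hypothetical user profiles; only one carries a "pets" sub-document.
users = [
    {"name": "Ana", "email": "ana@example.com"},
    {
        "name": "Ben",
        "email": "ben@example.com",
        "pets": [{"name": "Rex", "species": "dog", "breed": "beagle", "age": 4}],
    },
]


def users_with_pets(profiles):
    """Return the profiles that include a non-empty 'pets' sub-document."""
    return [p for p in profiles if p.get("pets")]


pet_owners = users_with_pets(users)
print([p["name"] for p in pet_owners])

# With a MongoDB-compatible driver such as PyMongo, a comparable filter
# could be written as: collection.find({"pets": {"$exists": True}})
```

The profile without pets simply omits the field, so there is nothing to backfill and no schema change to roll out.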

AWS took the flexibility of a native JSON document database and combined it with a fully managed cloud-native architecture. "Our goal is to continuously work backwards from customer needs", says Yu, "and customers wanted the flexibility of a native JSON document database and the ability to scale enterprise JSON workloads, all without managing time-consuming database administration tasks."

"When it comes to scalability, the Amazon DocumentDB architecture is really unique because compute and storage are separated," he adds. "This enables us to scale up our compute to handle millions of reads per second while storage scales independently up to 64TB of data per cluster."

Since the 2019 launch, AWS has been expanding Amazon DocumentDB capabilities to increase its appeal among enterprise customers. This year has been one of the most active for Amazon DocumentDB development, with AWS unveiling a tranche of features to make the product even more convenient for developers and database administrators (DBAs).

Tuning and monitoring with Performance Insights

In April, AWS launched Amazon DocumentDB Performance Insights, a database performance tuning and monitoring feature designed to help customers quickly identify the database load and determine when and where to take action.

Yu outlines a typical problem. Say you have an ecommerce workload that follows typical usage patterns throughout the week. Every lunchtime, there is a spike that hits the system hard enough to cause performance problems for users. You need to find out what caused it, but how?

"In the past, customers still had to do all the correlation and data gathering themselves," he explains. They would also need to persist that data for trends analysis so that they could identify, for example, correlations between specific queries and spikes in resource usage over time. "Not only were they gathering this data themselves but they also have to store it themselves."

Performance Insights changes that by introducing a dashboard with visualization capabilities that let DBAs see database load over time. It stores historical performance data for a rolling seven-day period at no additional cost.

The feature is still in preview, but Yu says that customers have already used it to optimize their workloads. For example, one DBA found an automated workload that had attached itself to a cluster without their knowledge. With Performance Insights, they quickly discovered that the reporting application had connected to the environment in the middle of the day, competing for the same resources and slowing down the database.

Performance Insights can also help DBAs understand what types of queries are hitting the database, Yu adds.

"If you try to query the database based on a particular field that's not indexed, that could be very slow," he points out. That could prompt the DBA to alleviate the performance issue by building a new index.
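To see why an unindexed field can be slow, consider the simplified analogy below. It is not how Amazon DocumentDB implements indexes; it only contrasts scanning every document with looking a value up in a prebuilt mapping. The `orders` data and helper names are invented for the sketch.

```python
from collections import defaultdict

# Hypothetical order documents; "customer_id" is the field being queried.
orders = [{"order_id": i, "customer_id": i % 1000} for i in range(10_000)]


def scan(docs, customer_id):
    """Unindexed lookup: inspect every document in the collection."""
    return [d for d in docs if d["customer_id"] == customer_id]


def build_index(docs, field):
    """Map each value of `field` to the documents that hold it."""
    index = defaultdict(list)
    for d in docs:
        index[d[field]].append(d)
    return index


by_customer = build_index(orders, "customer_id")

# Both approaches return the same documents; the index avoids the full scan.
matches = by_customer[42]
print(len(matches), scan(orders, 42) == matches)

# With PyMongo, the index itself would typically be created with:
#   collection.create_index("customer_id")
```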

Fast database cloning and Decimal128 data type

In July, the cloud provider added another feature to Amazon DocumentDB: fast database cloning. This feature quickly replicates clusters, enabling DBAs to conduct various development, analytics, and testing tasks without disrupting production.

The cloning feature can be used to help verify and test database changes on production data without impacting performance.

"You can change some parameters for Amazon DocumentDB and see how performance will react," Yu says. "A great use for this feature is to add some indexes and see how it impacts performance before implementing these changes in production."

The feature is also useful for running ad hoc reporting jobs that would interfere with read-write performance. Simply clone the database quickly and run the report against that up-to-the-minute data, he advises.

Two weeks after the fast database cloning release, AWS rolled out a third enhancement for Amazon DocumentDB in the form of the Decimal128 data type.

This data type extends the level of precision that the database can support when making calculations, and it helps to avoid rounding errors that developers can sometimes run into when dealing with decimal numbers. For example, 0.1 + 0.2 might yield 0.3 in your calculator, but it yields 0.30000000000000004 in the Python REPL.

"You may think that this is small, but it is really important," says Yu. Rounding off two hundred-quintillionths of a cent might not seem like much, but when it comes to high-precision applications, using Decimal128 is critical.

"It's intended for applications where it's necessary to have very high precision, especially on the decimal numbers," he explains. "You will want to use the Decimal128 data type when working with financial data, tax computations, or scientific and engineering applications that require a lot of precision in the decimal point."
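The rounding issue described above is easy to reproduce. The sketch below uses Python's standard `decimal` module as a stand-in for a decimal data type; it does not talk to Amazon DocumentDB, and setting the context to 34 significant digits is simply a way to mirror the precision of the IEEE 754 decimal128 format.

```python
from decimal import Decimal, getcontext

# Binary floating point cannot represent 0.1 or 0.2 exactly.
float_sum = 0.1 + 0.2
print(float_sum)    # 0.30000000000000004

# Decimal arithmetic keeps these values exact. A precision of 34
# significant digits matches the IEEE 754 decimal128 format.
getcontext().prec = 34
decimal_sum = Decimal("0.1") + Decimal("0.2")
print(decimal_sum)  # 0.3
```

Note that the decimal values are built from strings; constructing them from floats would carry the binary rounding error along.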

Expanded auditing capabilities

In late August, AWS added support for yet another Amazon DocumentDB feature, Data Manipulation Language (DML) Auditing. This complements the Data Definition Language (DDL) auditing capabilities already available with the service. DML Auditing lets customers audit read, insert, update, and delete operations and send the records to Amazon CloudWatch.

By recording database events, it also helps customers operating in regulated industries, such as finance or healthcare, to meet governance and compliance requirements and strengthen their security posture.

"With DML Auditing you have an audit trail for everyone who accessed data," Yu says.

The DML Auditing feature is not turned on by default, but turning it on for new and existing database clusters is simple. With a slight configuration modification in the AWS console or AWS Command Line Interface (AWS CLI), the feature logs various aspects of a change event. These include when the change event happened, who was responsible for it and where they logged in from, the data or privilege that they changed, and what command caused it. You can also watch for DDL events that change data structure, such as DROP or ALTER.

Alongside retroactive scans, the feature provides ongoing alerts, which is useful for those protecting sensitive data in the here and now. For example, DBAs handling privileged data can receive alerts when certain events take place such as reads, deletions, or bulk updates.
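A minimal sketch of that kind of filtering is shown below. The record layout is an assumption made for illustration: the field names ("action", "user", "source_ip", "timestamp") and the sample values are hypothetical and are not taken from the Amazon DocumentDB audit log schema.

```python
import json

# Hypothetical audit records, one JSON object per line. Field names and
# values are assumptions for this sketch, not the documented log format.
raw_log = """
{"action": "find", "user": "report_svc", "source_ip": "10.0.0.7", "timestamp": "T1"}
{"action": "delete", "user": "alice", "source_ip": "10.0.0.9", "timestamp": "T2"}
{"action": "update", "user": "bob", "source_ip": "10.0.0.4", "timestamp": "T3"}
"""

WATCHED_ACTIONS = {"delete"}


def parse_events(text):
    """Parse newline-delimited JSON records, skipping blank lines."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


def flag_events(events, watched=WATCHED_ACTIONS):
    """Return the events whose action is on the watch list."""
    return [e for e in events if e["action"] in watched]


alerts = flag_events(parse_events(raw_log))
for e in alerts:
    print(f'{e["timestamp"]} {e["user"]} ran {e["action"]} from {e["source_ip"]}')
```

In practice the watch list and the alert destination would depend on the team's own policies; the sketch only shows the when, who, where, and what being pulled from each record.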

DBAs can also configure a retention policy in CloudWatch and export DML Auditing data to S3 for longer-term storage, Yu explains.

With customers ranging from Capital One to Dow Jones, Amazon DocumentDB is already well on its way to becoming a go-to tool for enterprises that need the flexibility of a native JSON document database without the hassle of self-management.

"We are committed to continuously innovating on behalf of our customers," says Yu, "and we look forward to this year's re:Invent to share more about upcoming features that will help our customers scale for the future".

Sponsored by AWS.
