This article is more than 1 year old

Managed document-based data stores come of age

Why enterprises are choosing Amazon DocumentDB to power their mission-critical applications

Sponsored Feature

For decades, relational databases ruled the enterprise computing space. Their structured data storage models and strong ACID (atomicity, consistency, isolation, and durability) guarantees made them a no-brainer for enterprise applications, especially when handling structured data.

But over time we have seen a seismic shift in the database industry, as enterprise applications have evolved. This has resulted in substantial growth in demand for non-relational database systems. Today, a particular kind of non-relational model - the document database - has become a go-to option for particular use cases. And, three and a half years after the release of its own document database offering, Amazon DocumentDB (with MongoDB compatibility), Amazon Web Services (AWS) is seeing increasing growth in this area.

Businesses are demanding more flexible applications that support fast-evolving customer workloads. Many of these applications demand more flexible storage and retrieval than traditional relational schemas permit.

Non-relational databases, and in particular document databases, are better able to support exciting new use cases that reflect underlying shifts in commercial models, says Barry Morris, general manager at AWS. "For structured workloads the relational database model is a great choice, but for workloads with more complicated datasets, the document model has major advantages," he advises.

New data models for new business models

This shift is driving major business process innovation, and HP printers provide a good example, says Morris. Back in the day, inkjets had a simple capex-based consumables model; when a cartridge ran out, you went to the store and bought a new one. Today, HP offers an ink subscription model. When the printer senses that its ink levels are low enough, it triggers a request for HP to deliver more ink cartridges.

HP's system uses Amazon DocumentDB for the simple reason that different printers send different information, and software upgrades add functionality over time, changing each printer's messaging data. So, each printer sends its messages as JSON files to HP's management service, which stores them natively in Amazon DocumentDB.
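As a minimal sketch of that idea in plain Python (not the Amazon DocumentDB API), the following keeps messages from two hypothetical printers in their original JSON shape and queries them without first mapping them to a predefined schema. The field names (`printer_id`, `ink`, `firmware`, and so on) are invented for illustration and are not HP's actual message format.

```python
import json

# Hypothetical messages: each printer reports a different set of fields.
raw_messages = [
    '{"printer_id": "p-001", "model": "A", "ink": {"black": 12, "cyan": 80}}',
    '{"printer_id": "p-002", "model": "B", "ink": {"black": 55}, '
    '"firmware": "2.1", "pages_printed": 1042}',
]

# Keep each message as parsed JSON; there is no column-mapping step.
store = [json.loads(m) for m in raw_messages]


def low_ink(docs, threshold):
    """Return the ids of printers with any ink level below the threshold."""
    return [
        d["printer_id"]
        for d in docs
        if any(level < threshold for level in d.get("ink", {}).values())
    ]


low_ink_ids = low_ink(store, 20)
print(low_ink_ids)
```

The second message carries fields the first one lacks, and the query still works because it only touches the fields it needs.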

"One attractive aspect of this is that there is no mapping of data structures from the printer's idea of what it is reporting to some predesigned database schema, as would be the case in a relational database," Morris explains. Instead, the database parses the JSON file directly for operations like indexing, querying, and aggregation. If a relational database processed the same data, it would have to store it as a binary object or a text string.

"The idea of just storing that JSON and processing it in the database in the same form that it's materialized outside the database, that concept is really important," Morris continues.

How to handle complex, fluid data structures

Relational databases are well-suited for structured, predictable data formats with clear boundaries. The data in a checking account with its well-understood amounts, payees, and transaction dates is a good example.

Increasingly, though, data structures are becoming more complex and less structured. Imagine a home retailer trying to list a range of household items and their properties in its online catalogue. A TV has a screen resolution and size. A rug has a shape, a pattern design, and a fabric. A bookshelf has a height, design, and capacity. A houseplant has a genus and environmental requirements. You could manage all that in a relational database, but it would take some serious stretching of the relational model and we would pity the responsible database administrator.

A better option is a hierarchical data format that can store an indeterminate and diverse array of properties for each record. This is where JavaScript Object Notation (JSON) comes in. Based on a subset of ECMAScript (the standard behind JavaScript), this open, lightweight data interchange format is both human- and machine-readable. It stores information in two structures: objects, which are collections of name-value pairs, and arrays, which are ordered lists of values. Both can be nested inside one another.
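To make the catalogue example concrete, here is a small sketch using Python's standard `json` module. The items and property names are hypothetical. It shows that each record carries only the properties that apply to it, and that JSON objects and arrays map onto Python dicts and lists.

```python
import json

# Hypothetical catalogue entries; each item has its own set of properties.
catalogue = [
    {"name": "TV", "resolution": "3840x2160", "size_inches": 55},
    {"name": "Rug", "shape": "round", "pattern": "striped", "fabric": "wool"},
    {
        "name": "Houseplant",
        "genus": "Ficus",
        "requirements": ["indirect light", "weekly watering"],
    },
]

# Serialize to JSON text and parse it back; the structure survives the round trip.
text = json.dumps(catalogue)
restored = json.loads(text)

# JSON objects come back as dicts (name-value pairs), arrays as lists (ordered values).
plant = restored[2]
print(type(plant).__name__, type(plant["requirements"]).__name__)
```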

When it first appeared, JSON provided a lightweight way to serialize and communicate complex objects. Then, document databases made it easy to natively store data as JSON, providing an extendable alternative to rigid relational schemas.

Changing a relational schema generally requires a schema migration to update the underlying structure. Document databases simply read and process whatever is in the record, making it possible to add new values as needed.
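A rough illustration of the difference, in plain Python with hypothetical field names: a document written after an upgrade gains a new field, older documents are left untouched, and the reading code tolerates both. In a relational table the equivalent change would typically mean altering the table before the new value could be stored.

```python
# Hypothetical records: the second was written after an upgrade added a field.
records = [
    {"printer_id": "p-001", "ink_black": 12},
    {"printer_id": "p-002", "ink_black": 55, "paper_tray_level": 40},
]


def tray_level(doc):
    """Read the newer field if present; older documents simply lack it."""
    return doc.get("paper_tray_level")


levels = {d["printer_id"]: tray_level(d) for d in records}
print(levels)
```

The trade-off is that the application, rather than the database, decides what a missing field means.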

Faster, more scalable systems

Processing JSON natively in a document database improves developer productivity by removing friction, Morris adds. Developers do not have to manage translation between a relational storage format and their own applications.

"The system will store whatever you want, in whatever manner you want, and allow you to process that data in whatever fashion you like," he explains. "It's not opinionated about schemas and storage strategies."

Native storage and processing improve database performance by eliminating the conversion overhead between JOINed relational tables and application-friendly JSON formats. The more diverse the data is, the more important this becomes.
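The conversion step can be sketched with Python's built-in `sqlite3` and `json` modules. The two-table schema and the order data are assumptions made up for this example, and the sketch does not measure performance. It only shows the reassembly work an application does after a JOIN, compared with reading the same data back as a single document.

```python
import json
import sqlite3

# Relational form: one order is split across two tables (hypothetical schema).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT);
    CREATE TABLE items (order_id INTEGER, sku TEXT, qty INTEGER);
    INSERT INTO orders VALUES (1, 'alice');
    INSERT INTO items VALUES (1, 'ink-black', 2), (1, 'ink-cyan', 1);
""")

# The application JOINs the rows, then rebuilds the nested object it wants.
rows = conn.execute(
    "SELECT o.id, o.customer, i.sku, i.qty "
    "FROM orders o JOIN items i ON i.order_id = o.id "
    "WHERE o.id = 1 ORDER BY i.sku"
).fetchall()
from_tables = {
    "id": rows[0][0],
    "customer": rows[0][1],
    "items": [{"sku": sku, "qty": qty} for _, _, sku, qty in rows],
}
conn.close()

# Document form: the same order is stored and read back in one piece.
stored = (
    '{"id": 1, "customer": "alice", "items": '
    '[{"sku": "ink-black", "qty": 2}, {"sku": "ink-cyan", "qty": 1}]}'
)
from_document = json.loads(stored)

print(from_tables == from_document)
```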

Application-native queries are important for applications that monitor large, complex, and fast-evolving data sources. They need strong operational analytics, querying fast-moving data sets on the fly to highlight immediate trends. Native document-based queries enable developers to integrate queries on live data directly into the application rather than running batched reports on historical data.

"In fact, the self-describing nature of JSON allows a query engine to search for things in a search-like fashion," says Morris. Unlike with a typical SQL query, you could search for all items by name across the whole database without having to understand the data structure or contents.

This ability to run ad-hoc queries without worrying about the constraints of a schema makes it easier to retrieve whatever information you want from a document-based data store - even if you do not know what you will need in the future.
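The following is an in-memory illustration of searching by name without knowing where the name lives in each document. It is a sketch in plain Python, not how a database query engine is implemented (a real engine would rely on indexes), and the documents and the helper `find_by_key` are hypothetical.

```python
def find_by_key(value, key):
    """Yield every value stored under `key`, at any depth, in nested dicts and lists."""
    if isinstance(value, dict):
        for k, v in value.items():
            if k == key:
                yield v
            yield from find_by_key(v, key)
    elif isinstance(value, list):
        for item in value:
            yield from find_by_key(item, key)


# Hypothetical documents with unrelated shapes.
docs = [
    {"name": "TV", "specs": {"size_inches": 55}},
    {"category": "garden", "products": [{"name": "Fern"}, {"name": "Ficus"}]},
    {"sku": "r-9", "details": {"name": "Rug", "fabric": "wool"}},
]

names = list(find_by_key(docs, "name"))
print(names)
```

Because every document labels its own fields, the search needs only the field name, not a description of each document's layout.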

The British Broadcasting Corporation (BBC) uses Amazon DocumentDB to aggregate and manage content from multiple news feeds, compiling it for its customers. The complex, fluid nature of that content makes it better-suited to document storage, which allows it to run powerful queries on the data simply.

A managed approach to document storage

Morris states that document databases and JSON storage really shine when combined with managed services in the cloud. This is where Amazon DocumentDB, itself a managed service, comes into play.

He believes that the market is ripe for many back-end databases to move across to a managed document store. On-prem environments or unmanaged virtual servers in public cloud environments are littered with JSON-centric applications, he says.

AWS offers the AWS Database Migration Service (DMS), which helps to translate data from relational data models to Amazon DocumentDB. Migrating to a document storage format managed in the cloud can also be a great way to eliminate a whole tract of technical debt, and one that simultaneously provides the ability to support flexible applications.

Moving applications to a managed environment also enables the database to take care of mundane tasks that would normally fall in the database administrator's lap, such as backups and restores, redundant failover, and database updates.

Amazon DocumentDB supports powerful JSON queries and provides enterprise features like transactions, built-in security, durability and scalability needed for global workloads, states Morris. The system supports storage, backup, volume cloning, and failover across multiple availability zones. It also features multi-region support through Amazon DocumentDB Global Clusters. "In short, native JSON databases are now comparable to SQL databases in these dimensions," he asserts.

While it is compatible with non-relational engine MongoDB, Amazon DocumentDB is more cloud-centric, says Morris, because it scales storage independently of compute capability. "We felt that a customer with a workload that scales beyond the compute capacity of a single node would want more compute capacity without having to split their storage onto a second node," he explains, stating that customers should not have to shard their systems unnecessarily. Amazon DocumentDB currently supports up to 64TB in an unsharded volume.

Scalability and performance are increasingly important in industries like IoT, which deal with massive device volumes. Smart home company Plume - another Amazon DocumentDB customer - uses native JSON communication and storage to support hundreds of millions of third-party devices with many different data points.

The company moved from MongoDB after that service began to strain under the load of its growing IoT management system. It was able to use Amazon DocumentDB without sharding and shaved around $450,000 off the potential cost of self-managed document storage.

Benefiting from a back-end cloud ecosystem

Since launching in 2019, Amazon DocumentDB has integrated with AWS services including other databases, analytics systems, and machine learning services. This enables customers to build cloud-native applications comprising multiple back-end services such as Amazon ElastiCache for caching, Amazon Aurora for fast managed relational data structures, and Amazon DynamoDB for horizontally-scaled native key-value storage.

Developers can connect Amazon DocumentDB to Amazon Kinesis data streams for real-time streaming data applications. They can also integrate it into broader machine learning applications using Amazon SageMaker to manage the entire machine learning lifecycle, Morris adds.

"The AWS investment in the service is ongoing, and is encouraged by a broad customer base," he says. "It's more difficult for a customer investing in a self-managed service to invest as much in the service than it is for a provider that is delivering the service to multiple users."

Document-based data stores bucked the trend in databases. They moved away from relational databases' normalized model, which emphasized qualities such as the single storage of data and grouping data dependencies. Breaking those normal forms empowered developers at companies like the BBC and Plume to focus on agile development, rapid iteration, and accelerated feature velocity.

"The fact that document databases are being used for increasingly mission critical loads and for valuable data is really encouraging," concludes Morris. "It's really what we have built the system to do." For many modern applications, managed document-based storage has finally come of age.

Sponsored by AWS.
