This article is more than 1 year old

Solving the problem of disaster recovery and performance at a global scale

How Amazon delivers multi-active, multi-region database replication for performance and availability

Sponsored Feature We all rely on data to make sense of the present and make informed forecasts. But data itself can be extremely unpredictable. Scaling your database architecture is complex enough when you have a strategy stretching years ahead, but real-world events can throw the best-laid plans off course.

We are not just talking about avoiding data center outages, power outages, cable cuts and the like. The volume and velocity of data growth are astounding on the macro level and unpredictable at the organizational level.

As Amazon Web Services (AWS) solutions architect Aman Dhingra explains, companies might have a data infrastructure that works at a local, regional level, but if a product or service takes off globally they can suddenly find they have users worldwide. And mergers and acquisitions (M&A) deals can quickly expand a regional operation – and its applications and data – into a global entity that spans multiple geographies and still has to meet data sovereignty requirements.

As the organization becomes more complex, the stakes get higher in terms of fault tolerance and disaster recovery. The challenge for database specialists is that they must maintain a consistent application experience from region to region, whether the users are internal employees or customers.

Accommodating a geographically distributed or even global user base "and still maintaining a predictable and delightful user experience," says Dhingra, will often prompt companies to look at serving local reads or writes from an area that is near to the end user.

This means building out data infrastructure in other regions to reproduce the setup at base. But if you are not doing this in the cloud that also means finding the real estate to host it and the staff to set up and maintain it from day one. Permanent boots on the ground are required and putting everything in place can take months at best and more likely years.

Organizations taking this approach are also gambling on their traditional or legacy database architecture's ability to scale accordingly. And perhaps most significantly, highly skilled people end up focusing on maintaining this complex infrastructure, rather than innovating on behalf of their customers.

Ensuring data is replicated between the various sites can also be an ongoing challenge, depending on the architecture deployed. Simply migrating data globally presents problems of its own in terms of internet turnaround. Timeouts, database crashes, and data conflicts can all add up to a less than magical experience both for users and for admins.

Get those boots on the ground?

So it is worth noting that Amazon offers no fewer than 11 different databases. The senior statesman of the portfolio is the cloud-native serverless NoSQL database service, DynamoDB, which is used by some of its most high-profile customers. It forms the foundation for multiple high-traffic ecommerce, media and gaming platforms – the sort of applications where latency is critical to engagement.

Being a fully managed cloud database service is one draw for those companies, as it removes that ongoing maintenance and operational overhead for the customers themselves. But DynamoDB was also designed to be a serverless database, built to scale horizontally, which means it is particularly suitable for massive applications where low latency is essential.

It also offers an additional option designed to further enhance low latency and resilience: global tables, which simplify rolling out replica data sets in other regions and enable live replication between them.

Disney built out its Disney+ subscription video-on-demand streaming service using global tables. Web conferencing specialist Zoom also used them extensively to handle the unprecedented and unpredictable growth experienced in both content and users during the pandemic. And of course, DynamoDB underpins Amazon's own operations, including its most recent Prime Day event, where requests hit a peak of 105 million per second.

"The simplicity of global tables is that even when you have an existing table in one region and you decide to go live in another region, the process of adding another replica table is seamless," explains Dhingra.

But the key differentiator, he adds, is that replication options for DynamoDB global tables include multi-active replication. "Most global databases out there can do reads everywhere but writes only in a single region." This is clearly limiting, all the more so if that single region is the one that goes offline. Any organization will of course have automated failover strategies, but these would take anywhere from tens of seconds to minutes to implement. When it comes to DynamoDB with global tables, Dhingra says, failover is instant and automated.

The replication process itself is carried out over AWS's own global network, further reducing latency. "A major contributor for the replication itself in terms of latency is the internet turnaround time. But you know, it takes a few milliseconds extra for the actual replication to take place. And users have the ability to monitor this replication latency using the AWS CloudWatch metric that we provide," Dhingra explains.
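For readers who want to see what that monitoring might look like, here is a minimal Python sketch. It assumes the metric Dhingra refers to is the one published as ReplicationLatency in the AWS/DynamoDB namespace, with TableName and ReceivingRegion dimensions. The table name and regions are hypothetical, and the fetch function needs the boto3 SDK and AWS credentials, neither of which the article itself covers.

```python
from datetime import datetime, timedelta, timezone


def latency_query(table_name: str, receiving_region: str, minutes: int = 60) -> dict:
    """Build CloudWatch GetMetricStatistics parameters for replication latency."""
    end = datetime.now(timezone.utc)
    return {
        "Namespace": "AWS/DynamoDB",
        # Assumed to be the metric mentioned in the article.
        "MetricName": "ReplicationLatency",
        "Dimensions": [
            {"Name": "TableName", "Value": table_name},
            {"Name": "ReceivingRegion", "Value": receiving_region},
        ],
        "StartTime": end - timedelta(minutes=minutes),
        "EndTime": end,
        "Period": 300,  # one datapoint per five minutes
        "Statistics": ["Average", "Maximum"],
    }


def fetch_latency(table_name: str, source_region: str, receiving_region: str) -> list:
    """Query CloudWatch in the source region; requires boto3 and credentials."""
    import boto3  # imported lazily so the builder above runs without the SDK

    cloudwatch = boto3.client("cloudwatch", region_name=source_region)
    response = cloudwatch.get_metric_statistics(
        **latency_query(table_name, receiving_region)
    )
    return sorted(response["Datapoints"], key=lambda point: point["Timestamp"])


# Hypothetical table replicated from Sydney to Virginia.
query = latency_query("orders", "us-east-1")
print(query["MetricName"], query["Dimensions"])
```

The values returned would be in milliseconds, which makes them directly comparable with the figures Dhingra quotes.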

Don't turn around

Latency for end users will be consistent with that for a non-global DynamoDB table, meaning an average in the single-digit milliseconds at any scale.

"Typically, what we see is replication latency between 300 milliseconds to one and a half seconds. Again, the major contributor is internet turnaround time. Imagine if you have a table in Sydney, and another table in Virginia. That's probably going to be 170-180 milliseconds of just internet turnaround time," Dhingra says.

This is buttressed by automatic conflict resolution, which kicks in when two locations try to write to the same piece of information. With "last write wins", the most recent write will be the one that shows up across all tables.
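The idea behind "last write wins" can be illustrated with a toy model in Python. This is only a sketch of the general technique, not a description of DynamoDB's internals: the article does not say how recency is determined, so the millisecond timestamp and the region-name tie-break below are assumptions made purely for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Write:
    value: str
    timestamp_ms: int  # assumed ordering key; not taken from the article
    region: str


def resolve(local: Write, incoming: Write) -> Write:
    """Return the write that survives under last-write-wins."""
    # The region tie-break is an assumption that keeps the sketch deterministic
    # when two writes carry the same timestamp.
    return max(local, incoming, key=lambda w: (w.timestamp_ms, w.region))


# Two regions update the same item a quarter of a second apart.
sydney = Write("shipped", timestamp_ms=1_000, region="ap-southeast-2")
virginia = Write("cancelled", timestamp_ms=1_250, region="us-east-1")

# Both replicas apply the same rule, so they converge on the same value
# regardless of the order in which they see the two writes.
print(resolve(sydney, virginia).value)
print(resolve(virginia, sydney).value)
```

Because the rule depends only on the writes themselves and not on arrival order, every replica ends up with the later write, which is the behaviour the article describes.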

Whether an organization intends to operate internationally and implement global tables from the outset, or finds it needs to expand its coverage at short notice, DynamoDB allows it to spin up replication via the AWS console, the AWS CLI, or popular infrastructure-as-code tools like AWS CloudFormation and Terraform. Admins simply need to ensure that their application stack – active or passive – is set up in the additional region, ready to take traffic.
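The same step can also be scripted through the AWS SDK. The Python sketch below shows roughly what adding a replica region to an existing table might look like using the ReplicaUpdates parameter of the DynamoDB UpdateTable operation. The table name and regions are hypothetical, the call needs boto3 and AWS credentials, and it assumes the table already meets whatever prerequisites global tables impose, which the article does not go into.

```python
def replica_request(table_name: str, new_region: str) -> dict:
    """Build UpdateTable parameters that add one replica region."""
    return {
        "TableName": table_name,
        "ReplicaUpdates": [{"Create": {"RegionName": new_region}}],
    }


def add_replica(table_name: str, home_region: str, new_region: str) -> dict:
    """Send the request from the table's existing region.

    Requires boto3 and AWS credentials; not executed in this sketch.
    """
    import boto3  # imported lazily so the builder above runs without the SDK

    client = boto3.client("dynamodb", region_name=home_region)
    return client.update_table(**replica_request(table_name, new_region))


# Hypothetical table in Sydney gaining a replica in Virginia.
print(replica_request("orders", "us-east-1"))
```

Keeping the request builder separate from the network call makes it easy to review exactly what would be sent before anything is changed in a live account.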

"Any existing data that the table might have, will also be populated onto the other region, and live replication will also kick off automatically," says Dhingra.

The security features offered by DynamoDB are also replicated across regions alongside the data with the same access control policies and role-based access available for configuration right down to attribute level.
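As a rough illustration of attribute-level control, the Python sketch below builds an IAM policy document that lets a caller read only a named set of attributes, using the dynamodb:Attributes and dynamodb:Select condition keys. The table ARN, account number and attribute names are hypothetical, and a real policy would need to be checked against an organization's own access model.

```python
import json


def attribute_policy(table_arn: str, allowed_attributes: list) -> dict:
    """Build a read-only policy restricted to specific item attributes."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["dynamodb:GetItem", "dynamodb:Query"],
                "Resource": table_arn,
                "Condition": {
                    # Every requested attribute must be in the allowed list.
                    "ForAllValues:StringEquals": {
                        "dynamodb:Attributes": allowed_attributes
                    },
                    # Stops callers from asking for whole items instead.
                    "StringEqualsIfExists": {
                        "dynamodb:Select": "SPECIFIC_ATTRIBUTES"
                    },
                },
            }
        ],
    }


# Hypothetical ARN and attribute names, for illustration only.
policy = attribute_policy(
    "arn:aws:dynamodb:us-east-1:123456789012:table/orders",
    ["order_id", "status"],
)
print(json.dumps(policy, indent=2))
```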

"So, you could really go fine-grained on the access that applications can have on data within a DynamoDB table. Be it global or local. That's key as well."

The question is whether all of that scale comes at a cost. A common pattern of how customers use DynamoDB is integrating it with AWS Lambda, an event-driven, serverless computing platform. But, Dhingra explains, "You only pay per request or pay as you use it. So, you could still have the application stack in multiple regions and if it involves Lambda functions, and you will not be paying for that unless it's actually used."

Of course, there will be customers that use a variety of different compute layers, some of which may not be serverless, says Dhingra. "Some regions where you have set up global tables may not actually see a lot of customer traffic or it could vary compared to some other geographical regions. So the on-demand billing mode really helps to keep track of pay per request and even project your growth and in terms of cost."

Each individual organization has to plan and budget for how it is going to implement its global infrastructure using DynamoDB with global tables. That includes deciding whether to opt for a passive or active configuration, and thinking about exactly how it will handle disaster recovery.

But the one thing they will not have to plan for is real estate and boots on the ground to implement data replication and ensure resiliency. Because, as Dhingra says, "In terms of global tables, it's a few clicks on the console." Amazon DynamoDB comes in a free tier for anyone who wants to take a test drive of this serverless, NoSQL database service.

Sponsored by AWS.
