How Amazon broke free from Oracle

‘Our DBAs all became cloud architects’

Sponsored Amazon has promoted AWS managed database services for several years now, enticing proprietary on-premises database customers over to its own services in the cloud. The company backs up the claimed benefits of migration with its own practical experience, having migrated thousands of its own databases away from Oracle to the cloud.

Amazon was built on traditional relational databases from the beginning. Long before the ecommerce giant launched AWS, it used Oracle to support its consumer operations, handling the core of its retail transactional systems.

"Amazon had used these Online Transaction Processing (OLTP) databases to process and store data for everything from payments information, wallets, ordering data, our inventory, through to our fulfillment centers and our identity management for customers," explains Thomas Park, Senior Software Development Manager for Consumer’s Business Data Technologies at Amazon.

Amazon also had a separate fleet of Oracle RAC systems handling analytic workloads. This powered a data warehouse that would gather all the data from the transactional systems and run around 600,000 reporting jobs each day.

Fraying at the edges

These systems had served the company well for years, but Amazon's growth trajectory was beginning to strain the Oracle-based infrastructure. The legacy Oracle databases were confined by the compute and storage capacity that Amazon had purchased to support the transactions. "As we grew our business into multiple marketplaces and added more product SKUs, we just couldn't scale these databases to meet our business need," Park recalls.

That problem was exacerbated by the lack of virtualization. The Oracle databases ran on bare metal legacy hardware, meaning that Amazon couldn't just deploy more compute and storage capacity from its cloud infrastructure when needed. Instead, it had to deploy more hardware that sat idle just in case capacity spiked. Such spikes were common, thanks to the volatile nature of ecommerce workloads. Large events like Black Friday were already testing its limits. When the company introduced Prime Day in 2015, transaction volumes soared again.

Counting the cost of Oracle infrastructure

The difficulty of managing non-virtual servers and storage also complicated Amazon's Oracle operation. Amazon's Oracle database teams had to predict as best they could when to add more disks to specific servers based on hardware failure patterns, while also keeping a lot of extra servers running in different regions so that they could pick up transaction processing if other machines failed.

Capital expenditure wasn't the only factor that inflated costs for Amazon's Oracle database team. The maintenance and support burden was significant, explains Park. For example, its database backup and recovery as well as monitoring systems were maintained and operated across multiple data centers. Administrators had to manage these systems in multiple locations and retest them whenever the company introduced new hardware or a new version of the database.

Then there were the licensing issues. Oracle's infamously expensive licensing costs scaled with Amazon's usage requirements, costing the ecommerce giant more money every year.

On-premises proprietary database licensing contracts are notoriously difficult to extract yourself from. Moving to another proprietary system would have been legally burdensome and the technical and commercial problems would have remained. "Once you put your data on their systems, it's extremely hard to switch to another vendor’s database system," Park explains.

Amazon could have moved away from Oracle earlier. It already had a large cloud infrastructure and various managed relational and NoSQL databases to choose from. However, the technical challenges presented a barrier. Moving thousands of databases, many connected to non-standard systems, is a huge undertaking that takes time and planning. Finally, there were the human considerations.

Deciding to make the move

In July 2015, Amazon introduced Prime Day, a massive price promotion involving more than two million deals. Prime Day pushed Amazon to the point where it had no choice but to migrate away from Oracle. Both the transactional databases and the data warehouses were reaching breaking point. The team needed time each year to design and prepare new servers to meet the additional transactional demand, but the extra spikes in workload had compressed that window. "Prior to Prime Day 2015, Amazon had almost 10 months to prepare and scale for the next peak event," Park says. "However, since Prime Day was held mid-year, we now had only four-to-five months to prepare for the next peak events... We were starting to see scaling issues. We couldn't possibly push out the hardware fast enough."

The analytics systems were also splitting at the seams. "We didn't have enough time to load the jobs from multiple sources and then finish mission-critical reports that had to be out the door by 4am on a daily basis," he adds. "We were constantly missing that SLA."

Amazon resolved to migrate its systems to various managed databases in AWS in a project that would last from late 2015 to 2019. The Oracle RAC analytics migration began in late 2015 and was completed by 2018. The OLTP migration started in 2017, eventually expanding to include non-critical services alongside critical ones. That project, nicknamed ‘Rolling Stone’, finished in 2019.

Preparing the migration process

The migration team created a program management office (PMO) to set parameters for the migration project, including timelines, performance requirements, and procedures. Part of the process involved setting goals across all service teams that articulated what they wanted to achieve with the migration, and converting those goals into milestones. The PMO organized regular weekly, monthly, and quarterly reviews across all service teams to track progress.

Each service team analysed its data profile to pick the best destination database for its existing Oracle system. In some cases, they chose a relational system such as Amazon's own cloud-native Aurora, or other relational engines available through Amazon Relational Database Service (RDS), such as PostgreSQL. These served applications with well-established traditional relational schemas, but they were more scalable and manageable than the legacy Oracle system, with lower licensing costs and more equitable terms.

For other service teams that identified opportunities to improve their database operations with different data models, AWS offered managed database options to suit different data types and applications. These included DynamoDB, the key-value data store that it had launched in 2012, and the ElastiCache in-memory caching service.
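The article doesn't describe any specific table design, but the general key-value remodelling it alludes to looks something like the following sketch: a relational order row becomes a DynamoDB item with a composite key, so all of a customer's orders can be read back with a single query. All table, key, and attribute names here are invented for illustration.

```python
# Hypothetical sketch: mapping a relational "orders" row to a DynamoDB-style
# key-value item. Names are illustrative, not Amazon's actual schema.

def to_dynamo_item(row: dict) -> dict:
    """Build an item with a composite primary key: the customer ID as the
    partition key and the order ID as the sort key, so one Query call can
    fetch every order belonging to a customer."""
    return {
        "pk": f"CUSTOMER#{row['customer_id']}",  # partition key
        "sk": f"ORDER#{row['order_id']}",        # sort key
        "status": row["status"],
        "total_cents": row["total_cents"],
    }

item = to_dynamo_item(
    {"customer_id": 42, "order_id": 1001, "status": "shipped", "total_cents": 2599}
)
print(item["pk"], item["sk"])  # CUSTOMER#42 ORDER#1001
```

In a real migration the item would be written with `boto3`'s `put_item`; the point of the design is that access patterns, not normalized entities, drive the key structure.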

In some cases, service teams decided to simply move a read-only database to an S3 bucket. One such migration moved postal address lookup data from an Oracle database running on a $65,000 server to an S3 bucket costing just a few dollars per month.
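The pattern behind that postal-address move is simple: static, read-only data is stored as small objects keyed deterministically, so a single S3 GetObject call replaces a database query. A minimal sketch, with an invented bucket name and key scheme:

```python
# Hypothetical sketch of the S3 lookup pattern described above: read-only
# postal data stored as JSON objects keyed by postcode. The bucket name and
# key layout are assumptions for illustration.

def address_object_key(country: str, postcode: str) -> str:
    """Derive a deterministic S3 object key for a postcode lookup."""
    return f"postal/{country.lower()}/{postcode.replace(' ', '').upper()}.json"

# A reader would then fetch the object with boto3, for example:
#   s3.get_object(Bucket="example-postal-lookup",
#                 Key=address_object_key("GB", "sw1a 1aa"))
print(address_object_key("GB", "sw1a 1aa"))  # postal/gb/SW1A1AA.json
```

Because the data never changes at request time, there is no need for a query engine at all, which is what makes the few-dollars-a-month cost possible.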

Whichever approach a service team took, it went through an optimization process to ensure that the target database operated as efficiently as possible. That included analyzing the existing data structure and contents so that the company could purge any data that wasn't necessary before making the move. Many teams would also add metadata to the existing data, enabling them to better secure it in the cloud and optimize efficiency by applying data handling policies.

The service teams were able to take advantage of home-grown tools that Amazon used to help bring on-premises database customers across to its managed cloud database services every day. Where the company needed to migrate data to a different schema structure, it could take advantage of AWS's own Schema Conversion Tool (SCT), which automates conversion of the source database schema to alternative data structures. After service teams had designed the destination data structure and operating parameters, they had access to AWS Database Migration Service (DMS) for the transition process. "A lot of teams used DMS to move the data and then test them out," Park recalls.
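DMS replication tasks are driven by a "table mappings" document that selects which source schemas and tables to move. As a rough illustration of what a service team would supply, here is a minimal selection rule; the `ORDERS` schema name is hypothetical, and in practice this JSON is passed to the task alongside source and target endpoint definitions.

```python
import json

# Minimal sketch of a DMS table-mappings document: include every table in a
# (hypothetical) ORDERS schema for replication. The schema name is invented.
table_mappings = {
    "rules": [
        {
            "rule-type": "selection",
            "rule-id": "1",
            "rule-name": "include-orders-schema",
            "object-locator": {"schema-name": "ORDERS", "table-name": "%"},
            "rule-action": "include",
        }
    ]
}

# Serialized form, as it would be handed to a replication task definition.
print(json.dumps(table_mappings, indent=2))
```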

Teams would initially run the destination database in read-only mode, using Amazon's change data capture technology to compare data and ensure that the destination database was operating consistently with the Oracle original. "Then they'd cut over using their system and start writing on to the primary database in AWS," he says.
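The read-only validation phase described above boils down to comparing the source and target table by table before cutover. A toy sketch of that check, using hard-coded row counts where a real check would query both databases (or compare checksums):

```python
# Illustrative sketch of pre-cutover validation: flag any table whose row
# count differs between the Oracle source and the AWS target. The data here
# is hard-coded; a real check would query both systems.

def find_mismatches(source_counts: dict, target_counts: dict) -> list:
    """Return the sorted names of tables whose counts disagree."""
    return sorted(
        table for table in source_counts
        if target_counts.get(table) != source_counts[table]
    )

source = {"orders": 1000, "payments": 500}
target = {"orders": 1000, "payments": 499}
print(find_mismatches(source, target))  # ['payments']
```

Only when this kind of comparison comes back clean does the team flip writes over to the new primary in AWS.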

The migration process was staggered, with databases moving across in batches. Depending on the size and complexity of each database, an individual migration took between six and 16 months.

Migrating skills to the cloud

The PMO used the migration process as an opportunity to retrain existing DBAs and give them a clear migration path of their own into the cloud, recalls Park. "It's better to have the Oracle relational database person learn about NoSQL and its benefits, and then allow them to help design the structure," he says. After all, these experts had known this data for years.

Most of the Oracle DBAs stayed for the migration process, he recalls. "After the migration finished in 2019, every single one of them moved over to AWS and became cloud architects to help customers migrate to AWS databases."

The analytics migration moved the Oracle RAC data to a data lake architecture in AWS, combining Amazon's Redshift data warehouse and S3 storage. This and the transactional migration yielded some significant functional and financial wins for the company. It has slashed its database costs by around 60 percent while provisioning more capacity, and it has simplified cost allocation between different service teams by introducing consistent practices.

Performance has also improved dramatically across AWS's internal database portfolio. Services that replatformed to DynamoDB have enjoyed a 40 percent reduction in latency, even though they handle twice as many transactions, the company said.

Overall, Amazon migrated almost 7,500 databases during the three-year transition period, involving over 100 service teams and transitioning 75 petabytes of data. It reduced database administration overhead by 70 percent, replacing a lot of "undifferentiated heavy lifting" with more value-added data architecture tasks.

So, a quarter-century after the company took its first online order, Amazon is today managing data using its own AWS managed database services from start to finish.

Sponsored by AWS
