Migrating to Amazon Web Services' Aurora just got a lot easier

Making the switch using Babelfish

Paid Feature Moving your proprietary on-premises database to the cloud makes your infrastructure more flexible. Moving to a managed database that the cloud service provider looks after for you also saves your DBA a ton of work in the long run. In the short term, though, the latter involves some heavy lifting. What about all the existing schemas and transactional code in your existing installation? Transitioning all that to a new format in the cloud takes some heavy lifting.

Up till now, this might have given developers and DBAs pause when considering a transition to Aurora, Amazon Web Services' cloud-native relational database. Developed from the ground up with the cloud in mind, the service offers numerous benefits, including high availability by default, flexibility, and simple license-free operation.

The cloud giant already offers its Database Migration Service (DMS) to help with the transition, including a tool that eases the task of converting schemas between on-premises and managed databases. Now, it expects a new capability called Babelfish to make the process a lot easier.

Amazon likens Babelfish to teaching Aurora PostgreSQL the SQL Server dialect as a second language. By understanding the Microsoft SQL Server dialect, Aurora can execute its commands rather than having a translator in the middle. Babelfish ensures that the underlying PostgreSQL database behaves with the transactional semantics of SQL Server, thus greatly simplifying the complexity of migrating a customer application.

The upshot, much like the Babel fish in Douglas Adams' The Hitchhiker's Guide to the Galaxy, is that an application that talks MSSQL can make itself understood to Aurora, and vice versa. That should save a lot of headaches.

Application migration headaches

Amazon’s schema migration tool makes it simple to migrate a database from an on-premises system to Aurora. However, application developers must contend with subtle differences when migrating their code between different flavours of relational database, such as PostgreSQL and MSSQL.

For example, data types can vary in subtle ways between PostgreSQL and MSSQL, even if they are identical in name. This is because vendors often develop their data types to exceed ANSI standards so that they can give their customers more functionality.

A programmer working with the Money data type in MSSQL will run into problems when migrating to PostgreSQL. In Microsoft's database, it has four-digit precision. In Postgres, it only has two. That can introduce changes in the ways that applications work that might not always surface without thorough testing. Some data types that exist in one database might not exist in others at all, requiring serious refactoring work on the application side.

Another problem crops up with the transactional semantics. MSSQL handles some transactional aspects differently to [than?] Postgres. When inserting data rows into a table and a "duplicate-key" error occurs, the former will roll back only the SQL statement where the error occurs, but will keep the transaction open. PostgreSQL is less forgiving and will roll back the entire transaction when it encounters a duplicate key. That can affect application behavior and lead to different run-time results.

Developers will also quickly run into issues with the language syntax in their database. Each has its own language variant which often varies from vanilla SQL. Microsoft's TSQL dialect is different from PostgreSQL, which offers its own PL/pgSQL language. Both of these are procedural languages, including constructs that allow the database to return complex data types, but they include different language primitives.

Developers typically use syntax conversion tools to handle this issue, auto-generating code to bridge the gap between one database's language and another. This has drawbacks, though. It makes code difficult to read, especially if it was already auto-generated by an object-relational management (ORM) tool. That in turn makes the code hard to verify, especially if the original coder is no longer around. Furthermore, it could violate an organization's existing coding specifications.

Another way to cut through this whole tangled mess is to re-engineer things on the application side. That's a big undertaking, though, involving rewrites and tests of systems already in production that might have multiple dependencies. That wouldn't be so bad if you're working with a modular microservices-based environment, but companies migrating to Aurora won't always be at that stage yet and will be working with traditional, monolithic applications.

Reducing friction in the database migration process

Amazon has done its best to ea se this migration process with the DMS and Schema Conversion Tool (SCT). Once the customer has chosen an application to migrate, the DMS helps transfer the application as-is into a version that can talk to Aurora. AWS uses its SCT to analyse the legacy database access code and estimate how long it will take to migrate it to the Aurora environment, allowing extra time for complexities like stored procedures. If you have incompatibilities between your proprietary database code and the open source database, the company will provide experts to consult with you.

Babelfish is the company's attempt to streamline and enhance that process to make migration easier for customers. It’s a capability of Aurora that understands the MSSQL language and can execute the commands natively in PostgreSQL, handling the appropriate data type conversions and taking care of MSSQL language primitives when communicating with Aurora.

Your MSSQL application still thinks it's talking to Microsoft's database, but you've pointed it at a Babelfish server, which understands most of the MSSQL language and responds appropriately. And what does Amazon mean by “most”?

By way of an answer, the company gave us the following analogy: imagine you are an American in Amsterdam. Most of the inhabitants are fluent in English and will understand what you are saying, which makes it easier to visit or live in the city. Occasionally you will use an English word or an American slang that people will not understand and you might have to use a different word to get your idea across. English, after all, is a second or third language for most Dutch people in Amsterdam.

Amazon says that this bilingual ability will reduce the development time for customers making the leap to its managed relational cloud database. In practice, there will often be some work to do beforehand. You need to work out which data types your application is using, and then ensure that you have the appropriate extensions installed on the Babelfish server.

Murali Brahmadesam, director of software for Aurora at Amazon, promises some serious benefits from Babelfish. The system enables developers to use MSSQL objects through the PostgreSQL interface, making it easy to deploy existing applications while keeping most of the code the same.

"Babelfish will reduce the amount of code you need to rewrite by 90 percent or more for most applications," Murali says. That also means that it will slash application development times, he says, enabling customers to make their move to the cloud more quickly and eliminating the need to alter code tests or client drivers. "Instead of taking a year to migrate your application from SQL Server to PostgreSQL, you can do it in a much shorter duration, perhaps one or two months," he says.

Making the switch using Babelfish

Babelfish will be open-sourced on GitHub with an Apache 2.0 license so that companies can host their own versions, but AWS will also offer it as a managed service. The company is upgrading its Schema Conversion Tool to identify how much of a customer's database schema Babelfish supports, and will equip the SCT to either make changes to the existing database code automatically or make recommendations to the development team.

"The tooling remains the same, but the customer experience will be better," says Murali.

The advantage of connecting to Babelfish is that customers get the convenience of lift-and-shift migration but with the benefits of a managed database. They can make the move to a cloud-native database engine but without changing anything in their application stack until they're ready. You could even leave your application on-premises if you want, reducing access latency using Amazon's Direct Connect service, says Murali.

The accelerated transition will appeal to customers eager to escape draconian proprietary database licensing fees today without investing in a major refactoring project, he adds.

"Now they have full control over application modifications in their own timeline," he says. "They can continue developing in TSQL while using Babelfish if they want to, but over time they can move to pgSQL, if they wish - they can decide what is more comfortable."

Developers can still add cloud-native functionality on the Aurora side. The database can call lambdas on record inserts, which enables developers to extend its functionality with serverless functions in the AWS cloud without changing their existing application stack. They can conduct transactions using the database interface and schema that they're used to via the MSSQL client driver and invoke lambdas for extra functionality on the Aurora side. Murali cites integration with the AWS Sagemaker machine learning service as an example use case.

Amazon's Babelfish will support application developers who see the value in moving to an alternative database engine but have been fretting about the task of migrating their business logic. It simplifies the onboarding challenge, reducing the friction involved and leaving customers better prepared to make the jump.

This article is sponsored by AWS.

Biting the hand that feeds IT © 1998–2021