Take these three steps to unify and manage your data

Putting your information to work for you

Sponsored: Do you ever get the feeling that you're working for your data, rather than the other way around? Many on-premises data architectures have built up gradually over the years, creating a disjointed set of silos. It all works, but you probably don't want to prod it too hard.

These siloed architectures are brittle and prone to breakage. Because they're also production environments, it's difficult to do anything new with your data without risking what already works. Its latent value remains locked away.

Moving data into a managed cloud environment is a good opportunity to break down those barriers and create a platform for innovation, says Rahul Pathak, Vice President, Analytics at Amazon Web Services (AWS). Data modernization wasn't something that the C-suite used to talk about, he explains, but that has changed lately.

"Data has now become a board level conversation for most organizations," says Pathak. "Because data over the last couple of years has become more important, companies are now realizing that we have to view it as a strategic asset."

This need for data-driven business has grown as companies feel greater pressure to reinvent themselves. Technology continues to morph business models. The old ways of doing things don't work as well when new companies use mountains of customer and transaction data to develop new services that undercut and outperform the competition. So, organizations must find new ways of doing things to survive.

Pathak adds that the pandemic sharpened the need for reinvention still further, as companies found themselves compelled to change the way they operated just to stay in business.

"The classic example would be restaurant chains or hospitality where they had to build online assets, enable online ordering, and capture all of that data and fulfil those orders," he says.

Many companies either had to do that from scratch or scale up existing systems overnight. That involved a rush to the cloud as they took advantage of everything from remote working and collaboration through to online applications. Gartner anticipates that this will keep going, forecasting a 23 per cent increase in public cloud usage around the world in 2021.

This move to the cloud goes beyond forklifting existing applications into virtual machines or using SaaS products; it also involves changing the way that companies think about data once it's in a cloud environment. The cloud promises some attractive benefits, including agility, scalability, and cost-effectiveness, but capturing those benefits means managing data differently. In short, it means building a modern data architecture that puts the cloud front and centre.

It takes some planning to build this cloud-centric data model, says Pathak, adding that Amazon has spent years honing the process through its internal expertise and its network of partners. The company breaks it down into three steps: modernization, unification, and innovation.


The modernization part involves migrating databases to a cloud-based data infrastructure. Typically, companies begin with their own on-premises data infrastructure including proprietary licenses.

Some companies approach this migration in steps, beginning with a simple lift-and-shift into the cloud in which they run their SQL Server or Oracle databases in a VM. This works up to a point, says Pathak, but they're bringing those restrictive licenses along with them.

Simply migrating these to virtual machines in the cloud doesn't take full advantage of a data-centric infrastructure, warns Pathak. Those restrictive on-premises licenses are just as burdensome in a VM, and will hold back companies trying to reinvent their data architectures.

An intermediate step is to move classic Oracle deployments to the Amazon Relational Database Service (RDS). This eliminates the licensing issue, converting it to a simple pay-as-you-go service using on-demand or reserved instances. This gets you much further towards data utopia, says Pathak, but you'll still have to do some background tinkering with an RDBMS not designed for the cloud. For example, you must manually set up things like multiple availability zones in Oracle.

The third option is migration to a cloud-native database like Amazon's Aurora relational system that was built from the ground up to run in the cloud. This unlocks performance, availability, and cost benefits that you can't easily get by shoehorning legacy databases into the cloud.

"The way Aurora handles the write storage in parallel is entirely different from when databases were originally designed," he says, explaining that developing for the cloud from scratch enabled the company to make new decisions based on a highly scalable and available storage layer.

The cloud databases in Amazon's portfolio are managed, meaning that all these underlying mechanics are handled automatically for the user. This approach brings both proprietary and open-source database users to the cloud, says Pathak. Open-source users might not have the same burdensome software licenses to deal with, but they still benefit from a cloud-based service that takes care of all the mundane operational tasks involved in maintaining a database engine. They also get the advantage of a database geared for high availability.

The other part of the data modernization process is rearchitecting data for purpose-built databases. While Aurora transfers relational data models into the cloud, AWS also offers other managed database services with tailored support for specific use cases. These range from streaming time series applications through to graph database models via Amazon Neptune, which are well-suited for mapping complex transitive relationships. Others include DynamoDB, which is useful for web-scale applications that forgo relational schemas in favor of key-value pairs.
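To make "transitive relationships" concrete, here is a minimal sketch in plain Python rather than in any AWS service. It does not use Neptune or its query languages; the `EDGES` data and the `reachable` helper are hypothetical and exist only to show the kind of multi-hop question a graph model answers naturally.

```python
from collections import deque

# Hypothetical "follows" relationships; the names are illustrative only.
EDGES = {
    "ana": ["ben"],
    "ben": ["cho", "dev"],
    "cho": ["eve"],
    "dev": [],
    "eve": [],
}


def reachable(graph, start):
    """Return every node reachable from `start` in one or more hops."""
    seen = set()
    queue = deque(graph.get(start, []))
    while queue:
        node = queue.popleft()
        if node in seen:
            continue  # already visited via another path
        seen.add(node)
        queue.extend(graph.get(node, []))
    return seen


# Everyone "ana" is connected to, directly or through intermediaries.
print(sorted(reachable(EDGES, "ana")))
```

In a relational schema the same question typically needs a chain of self-joins or a recursive query, which is why this shape of data is a common reason to reach for a graph store.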

This use of purpose-built databases for specific use cases goes hand-in-hand with a modernization in application architecture, points out Pathak. "It's about going from an old way of thinking about an application as a monolith that goes against a single database to a loosely coupled, highly distributed architecture with microservices working with multiple purpose-built databases," he says.

This part of the modernization process helps to drive down costs in several ways, he adds, citing Disney as an example. The entertainment company expanded its existing relationship with AWS when launching the Disney+ streaming service. It took advantage of purpose-built databases from the beginning, using AWS key-value stores to manage its subscriber watch lists. This optimization for simple 'put-get' data storage helps drive up efficiency.
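The 'put-get' access pattern described here can be sketched in a few lines. This is an in-memory stand-in, not Disney's implementation and not the DynamoDB API; the `WatchListStore` class, the subscriber ID, and the titles are all assumptions made for illustration.

```python
class WatchListStore:
    """In-memory stand-in for a key-value table keyed by subscriber ID."""

    def __init__(self):
        self._items = {}

    def put(self, subscriber_id, titles):
        # One write replaces the whole value stored under the key.
        self._items[subscriber_id] = list(titles)

    def get(self, subscriber_id):
        # One lookup by key; no joins, no schema beyond the key itself.
        return list(self._items.get(subscriber_id, []))


store = WatchListStore()
store.put("subscriber-123", ["Title A", "Title B"])
print(store.get("subscriber-123"))
```

Because every operation addresses a single key, a store built for this pattern can be partitioned by key and scaled out without the coordination a general-purpose relational engine needs.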


The second step in a mature cloud data migration is unification, says Pathak. "One of the biggest challenges is that customers often have their data in a lot of different silos," he explains. "Unification is about being able to access your data in a well-governed way, no matter where it lives."

Moving to a cloud infrastructure presents the perfect opportunity to fix that problem, he says. Customers can feed their data into data lakes while also keeping the necessary data in purpose-built data stores for performance. To do that, they'll use Lake Formation, which is AWS's tool to build and manage data lakes stored in Amazon Simple Storage Service (S3).

AWS can also unify data from multiple database engines via the AWS Glue engine, which offers a metadata catalogue allowing customers to search and apply access rights to data across the board. This is useful for unified data governance.
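The idea of a single metadata catalogue sitting over several stores can be sketched as follows. This is a toy model, not the AWS Glue or Lake Formation API; the `Catalog` class, the table names, the locations, and the "analyst" principal are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class TableEntry:
    name: str
    location: str  # where the data physically lives
    readers: set = field(default_factory=set)


class Catalog:
    """Toy metadata catalogue: one place to find tables and check access."""

    def __init__(self):
        self._tables = {}

    def register(self, name, location):
        self._tables[name] = TableEntry(name, location)

    def grant_read(self, name, principal):
        self._tables[name].readers.add(principal)

    def search(self, text):
        return sorted(n for n in self._tables if text in n)

    def locate(self, name, principal):
        entry = self._tables[name]
        if principal not in entry.readers:
            raise PermissionError(f"{principal} may not read {name}")
        return entry.location


catalog = Catalog()
catalog.register("orders_raw", "s3://example-lake/orders/")
catalog.register("orders_live", "keyvalue://orders")
catalog.register("customers", "relational://crm/customers")
catalog.grant_read("orders_raw", "analyst")

print(catalog.search("orders"))
print(catalog.locate("orders_raw", "analyst"))
```

The point of the sketch is that search and access rules live in one layer, while the data itself stays in whichever store suits it best.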


Having ported data to the cloud and centralized governance, it's time to start doing interesting things with it. This is where the third stage in AWS's data modernization framework, innovation, comes into play.

The innovation step uses services developed in the cloud that would have been difficult to create in on-premises environments, with a special focus on machine learning. Training machine learning models is a compute-intensive process perfectly suited to scalable cloud environments. This all happens behind the scenes for many machine learning services such as natural language processing, speech recognition, and computer vision, which are then exposed to customers via simple cloud APIs. For those that want to train their own models, the company supports frameworks like TensorFlow on GPU instances.

AWS also offers SageMaker, an integrated development environment for machine learning that allows developers and data scientists alike to prepare data, build models, train them, and then deploy them when they're optimally fitted.

Pathak says that AWS has also worked to close the gap between managed databases and machine learning tools. "We've built a number of integrations to bring machine learning closer to the data, making it a more integral part of the data store itself," he explains.

For example, the company's Aurora ML service lets customers access machine learning models via SQL queries on transactional data in AWS's cloud-native managed relational database. It supports a range of machine learning algorithms, including those offered by both AWS and its partners, and those developed within SageMaker. Customers get the output from those models as SQL query results. It offers a similar capability for its Neptune graph database engine.
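The general pattern of calling a model from SQL and receiving its output as query results can be illustrated locally with SQLite. This is an analogy only: it does not use Aurora or the Aurora ML syntax, and `churn_score` is a hypothetical hand-written rule standing in for a trained model.

```python
import sqlite3


def churn_score(days_since_last_order):
    """Hypothetical stand-in for a trained model: a simple rule, not ML."""
    return round(min(days_since_last_order / 90.0, 1.0), 2)


conn = sqlite3.connect(":memory:")

# Register the Python function so SQL statements can call it by name.
conn.create_function("churn_score", 1, churn_score)

conn.execute(
    "CREATE TABLE customers (id INTEGER PRIMARY KEY, days_since_last_order INTEGER)"
)
conn.executemany(
    "INSERT INTO customers VALUES (?, ?)",
    [(1, 9), (2, 45), (3, 180)],
)

# The "model" output comes back as an ordinary column in the result set.
rows = conn.execute(
    "SELECT id, churn_score(days_since_last_order) FROM customers ORDER BY id"
).fetchall()
print(rows)
```

The appeal of the approach is that the caller needs nothing beyond SQL: scoring happens where the data already sits, and the results can be filtered, joined, or aggregated like any other column.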

"We can perform machine learning on the data from the database in a more performant way, without having to move it unnecessarily," Pathak concludes.

The data modernization journey will take time and planning, but it will be worth the effort for companies that want to unlock the inherent value in their data. The world is filled with DBAs and developers doing their best with fragmented, fragile environments. Unifying and managing that data will put them back in the driver's seat.

Sponsored by Amazon Web Services
