Analyst house Gartner has warned users not to mix and match data management products from the three largest cloud hyperscalers.
With AWS, Microsoft Azure, and Google Cloud Platform leading the pack, spending on cloud infrastructure services hit $41.8bn in Q1, up 35 per cent on a year earlier, according to Canalys. But while organisations may hedge their bets on infrastructure, when it comes to data management and analytics it is best to stick with one player, said Sanjeev Mohan, VP analyst at Gartner.
Although AWS, Azure, and GCP each have strengths and weaknesses in their data toolsets, there was little to gain from trying to pick the best technologies from each vendor, Mohan told the Gartner Data & Analytics Summit this week.
"Tread carefully. If you're new and you say, I want to be multi-cloud, then think again. It is a full-time job keeping up with a single cloud, and understand[ing] all the nuances, [you don't have time] to go and start investing in multiple clouds."
Mohan used his session to walk through the technologies each of the biggest cloud providers offers in its data and analytics stack, but which one to go with, he said, comes down to what kind of user you are.
"Any of these cloud providers will work for you from a business point of view, the question is more non-functional: what skills do you have? If you are a Microsoft shop, then go with Azure, because you're familiar with Microsoft tools. If you have a very sophisticated IT shop and you want a vast breadth of products and you are happy to do a lot of engineering then AWS provides you with more choices. If you are open source as a department, then Google Cloud," Mohan said.
For those choosing a cloud provider for a data platform, Mohan offered a long list of criteria, including consistency, open source vs proprietary, ease of deployment, data ingest, data access, performance, and scalability.
Just because you're in their cloud doesn't mean you have to go with the multibillion-dollar provider's data tools either. After a bit of fence-sitting and hand-wringing about not being "unfair to hundreds of products that we cover on weekly basis on non-native tools," Mohan conceded that Informatica has a vast array of data engineering products. Apache Kafka, Snowflake, and newbie data ingestion unicorn Fivetran got honourable mentions too.
But let's not forget old friends. Despite Hadoop's stratospheric rise and ungainly fall, the stacks of the main cloud providers still offer traces of a few hidden elephants. Microsoft, for example, has Hadoop buried in its managed service HDInsight, which once relied on Hortonworks and now uses a Microsoft distro. AWS has Elastic MapReduce, a managed Hadoop and Spark service for batch and online processing. Similarly, Google relies on Hadoop, the open source framework inspired by Google's own papers on its distributed file system and MapReduce, in its Dataproc tool for creating persistent objects, as well as in Data Fusion, a data transformation product. ®