Microsoft will release its open-source-friendly data lake for big-data analytics in the cloud later this year.
Azure Data Lake Store – announced at Build in April – will be released as a preview, Microsoft said Monday. The store will be built on Apache YARN, letting developers and data scientists analyse information, and will use Azure HDInsight – a managed service for Hadoop, Spark, Storm and HBase.
HDInsight was developed by Microsoft working with Ubuntu-shop Canonical and Hadoop spinner Hortonworks, Microsoft and Canonical said. In preparation for the preview, Microsoft on Monday threw the doors open on managed Linux clusters.
Microsoft’s Visual Studio Tools, meanwhile, have been updated to build, debug and tune Hive queries and Storm topologies running in HDInsight.
YARN is the HDFS-compatible Apache project that makes MapReduce more flexible because it decouples resource management and scheduling from data processing.
The idea is that a single cluster can support a broader variety of applications than MapReduce alone.
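For the curious, that decoupling can be sketched in a few lines of Python. This is a toy illustration only, not YARN's actual API: `ResourceManager`, `run_application` and the two sample workloads are hypothetical names invented here, and the "containers" are plain counters rather than real processes.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass
class ResourceManager:
    """Hypothetical stand-in for a cluster scheduler: it tracks capacity only."""
    total_containers: int
    in_use: int = 0

    def allocate(self, wanted: int) -> int:
        """Grant up to `wanted` containers, limited by what is currently free."""
        granted = min(wanted, self.total_containers - self.in_use)
        self.in_use += granted
        return granted

    def release(self, count: int) -> None:
        """Hand containers back to the shared pool."""
        self.in_use = max(0, self.in_use - count)


def run_application(rm: ResourceManager, wanted: int,
                    task: Callable[[str], int],
                    chunks: Sequence[str]) -> List[int]:
    """Ask the scheduler for capacity, then run application-specific logic.

    The scheduler never looks at `task`; that separation is the point.
    """
    granted = rm.allocate(wanted)
    if granted == 0:
        raise RuntimeError("no free containers")
    try:
        # A real system would spread chunks across containers; here the
        # tasks simply run one after another for illustration.
        return [task(chunk) for chunk in chunks]
    finally:
        rm.release(granted)


rm = ResourceManager(total_containers=4)

# Workload 1: a MapReduce-flavoured word count per chunk.
word_counts = run_application(rm, 2, lambda c: len(c.split()), ["a b", "c d e"])

# Workload 2: a different kind of job (longest word per chunk), same scheduler.
longest = run_application(rm, 3, lambda c: max(len(w) for w in c.split()),
                          ["big data", "lake"])
```

Both workloads share one scheduler, yet neither had to be written as a MapReduce job, which is roughly the flexibility being claimed.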
Microsoft’s data lake – a phrase now with huge currency among the big-data analytics and infrastructure providers – will feature analytics as a service.
Data Lake Analytics will use Microsoft’s U-SQL language, which it claims “unifies the benefits of SQL with the expressive power of user code”. You’ll use U-SQL to analyse data in Azure, Azure SQL Database, and Azure SQL Data Warehouse.
The firm billed Azure Data Lake Store as HDFS for the cloud, capable of chomping petabyte-sized files, and said it would be “enterprise ready”. ®