Ignite Microsoft has pushed out Azure Synapse Link for Cosmos DB to general availability in an effort to bring its transactional NoSQL database closer to the analytics workhorse data warehouse.
Teased in May last year, the link would comprise two main components, Microsoft said at the time.
Firstly, Cosmos DB would house a column-oriented analytical store within its containers, in addition to the existing row-oriented transactional store. "The analytical store is fully isolated from the transactional store such that queries over the analytical store have no impact on your transactional workloads," developers Ramnandan Krishnamurthy and Sri Chintala said in a blog post last year.
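For readers who want the row-versus-column distinction spelled out, here is a minimal Python sketch. It is a toy model only, not how Cosmos DB is built: the `ToyContainer` class, its method names, and the explicit `sync()` step are all hypothetical, invented here to show one container keeping the same items in two layouts so that an aggregate can scan a single column without touching the documents that serve transactional reads.

```python
from collections import defaultdict


class ToyContainer:
    """Hypothetical container keeping the same items in two layouts."""

    def __init__(self):
        self.row_store = {}                    # item id -> whole document
        self.column_store = defaultdict(list)  # field name -> list of values

    def upsert(self, item):
        # Transactional path: store the whole document under its id.
        self.row_store[item["id"]] = dict(item)

    def sync(self):
        # Assumed background step: rebuild the columnar copy from the rows.
        columns = defaultdict(list)
        for doc in self.row_store.values():
            for field, value in doc.items():
                columns[field].append(value)
        self.column_store = columns

    def point_read(self, item_id):
        # Row layout suits "fetch one whole item" lookups.
        return self.row_store[item_id]

    def column_sum(self, field):
        # Column layout suits aggregates: only one field's values are scanned.
        return sum(self.column_store[field])


orders = ToyContainer()
orders.upsert({"id": "1", "region": "EU", "total": 40})
orders.upsert({"id": "2", "region": "US", "total": 60})
orders.sync()
print(orders.point_read("2"))
print(orders.column_sum("total"))
```

In this sketch the analytical copy only changes when `sync()` runs, which is a simplification chosen to make the separation of the two stores visible.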
Secondly, Azure Synapse Analytics would offer run-time support, that is, the "native integration" of the Azure Cosmos DB analytical store with the various analytics runtimes supported by Azure Synapse Analytics, such as Apache Spark and Synapse SQL serverless.
The idea of running analytics on live transactional data is not new by any means. In the relational database world, examples include MariaDB's approach, which holds columnar data in object storage within the database, and Oracle's Cloud Data Science Platform, which performs analytics directly on its business application databases.
Nonetheless, Synapse Link for Azure Cosmos DB - allowing punters to connect directly to their Azure Cosmos DB containers from Azure Synapse Analytics and access the analytical store with no separate connectors - was an important milestone for Microsoft, said Noel Yuhanna, veep and principal analyst at Forrester.
"We find a growing demand for near real-time analytics from many vertical industries, including financial services, retail, and healthcare. However, latency in data movement and processing from transactional systems often slows down this initiative."
Talking up the hookup between OLTP (transactional) and OLAP (analytical) systems, he added: "With Azure Synapse link for Azure CosmosDB it brings OLTP and OLAP closer together that will help organisations support real-time analytics with minimal effort."
Microsoft's move to bring another Azure source into the Synapse environment would appeal to customers that are using Cosmos DB for high-scale and highly distributed applications, while an additional MongoDB API makes Cosmos DB more attractive to those familiar with document NoSQL databases, said Doug Henschen, veep and principal analyst at Constellation Research.
"The integration between CosmosDB and Synapse will make it easier to make use of the data in CosmosDB in the data science, data engineering and SQL analytics contexts supported by Synapse," he said.
More Cosmos tweaks
Microsoft also announced the general availability of MongoDB v4.0 server support in the Azure Cosmos DB API for MongoDB, designed to make error handling simpler for developers through multi-document transaction support and "retriable" writes.
Available in preview is Azure Cosmos DB Continuous Backup and Point-in-Time Restore, intended to let users recover and restore data from any point within the past 30 days. Cosmos DB role-based access control is also available in preview.
On the data warehouse side of things, Microsoft is launching Azure Synapse Pathway, a feature designed to simplify the move from a legacy or cloud data warehouse to Azure.
Constellation Research's Henschen said the development promises "to bring popular third-party data sources, such as Teradata, Snowflake, IBM Netezza, AWS Redshift, SQL Server and Google BigQuery into Synapse."
Synapse is pitched against other cloud-native data warehouses including Redshift, BigQuery, and – with its $33bn IPO explosion bringing attention to the market – Snowflake.
It is early days for Synapse, but "lots of companies and customers are at least kicking the tyres," Henschen said, adding that it was "unclear how many of these deployments can be described as being in full production."
Synapse also sets itself apart by combining Spark-powered data lake capabilities with a SQL-based warehouse environment, both running against shared and consistently secured and managed data, he said.
"It's not the same thing as the BigQuery, Snowflake and Redshift database services, but it represents competition in the sense that it's a cloud option for high-scale SQL warehousing and analytics.
"For now, it's early days for Synapse and I'm not seeing a lot of head-to-head competitive assessments yet. Synapse will clearly be an option that any Azure-centric customer will consider," Henschen noted.
Coming at the problem from the other direction, Databricks last year moved to bring SQL to the data lake environment, with the launch of its SQL Analytics to address BI in its Spark-based Delta Lake. ®