Microsoft weaves Oracle and BigQuery data mirroring into Fabric platform

And knits a graph DB out of LinkedIn cast-offs

Microsoft is extending its Fabric cloud-based data platform by including Oracle and Google's BigQuery data warehouse in its mirroring capability, and launching a new graph database based on an in-house LinkedIn project.

The Redmond software giant first announced Fabric in 2023, and introduced mirroring later the same year, promising that it helped users add and manage existing cloud data warehouses and databases in Fabric's analytics system. It has now added the ability to replicate a snapshot of external databases from Google and Oracle to OneLake as Delta Lake tables and keep the replica synced in near real time.

Arun Ulag, corporate vice president of Azure Data, said that in Fabric, mirroring meant users did not have to extract, transform, and load (ETL) data from supported systems or build and maintain data pipelines.

"The snapshot is necessary to create the first copy, creating a baseline, but from that point onward, Fabric constantly keeps the database up to date," he said. "With less than five minutes latency, Fabric keeps the database, the metadata instance, in sync with the original automatically."

Users might first have to do some groundwork, though, Ulag said. Fabric would need permission to access the Oracle database and, if that database sits in an on-prem system or behind a firewall, users would need a Fabric enterprise gateway behind the firewall to connect to it.

"The compute for mirroring is free to the customer," Ulag said. "Microsoft absorbs the cost. We provide storage for the customer, so the customer doesn't have to worry about the storage cost. Our objective with mirroring is to just make data completely accessible, available in open source format, so all of Fabric can add value, all of the AI stack can add value."

The mirrored data store in Fabric uses the Apache Parquet file format and the Linux Foundation's Delta Lake open table format (OTF), which is native to OneLake, Fabric's lakehouse system.

For mirroring, Microsoft has also added support for Apache Iceberg, the OTF that originated with Netflix and was adopted by Google, Snowflake, and Cloudera. Databricks, which built Delta Lake, has promised greater integration between the two formats.

Whether users want to take Microsoft up on its offer might depend on where they start from. For organizations deeply invested in related products like Power BI and the earlier data warehouse iteration, Synapse, it might be a logical move. Google, AWS, Oracle, Databricks, and Snowflake already have their own interpretations of the lakehouse concept, and their users are likely to see the proposition differently.

Microsoft has also announced Graph in Fabric, a low/no-code platform for modeling and analyzing relationships across enterprise data. Ulag explained that the database was developed by the team at LinkedIn, which was acquired by Microsoft in 2016. The graph database will primarily be used for understanding the relationships between data in Fabric, he said. ®
