Google previews streaming connector for BigQuery
All your data on one platform, opines the Chocolate Factory
Google claims a new streaming service will help developers get data into its BigQuery data warehouse from adjacent transactional systems.
Available for preview, Datastream for BigQuery is designed to let developers replicate data from operational databases such as AlloyDB for PostgreSQL, PostgreSQL, MySQL, and Oracle directly into BigQuery, the cloud data warehouse built on Google's distributed file system Colossus.
Google said it envisions a "unified data cloud, combining databases, analytics, and machine learning into a single platform that offers the scale, speed, security, and simplicity that modern businesses need," according to a blog publicizing the new service.
Datastream employs a serverless, auto-scaling architecture that allows users to set up an ELT (Extract, Load, Transform) pipeline to replicate data from the source OLTP system into BigQuery in more or less real time. The resulting analytics are intended to tell users about current business conditions and help forecast what might happen next.
The service also uses change data capture (CDC) and the BigQuery Storage Write API's UPSERT functionality to replicate updates directly from source systems into BigQuery tables. As a result, data engineers and developers do not have to build and manage complex data pipelines, staging tables, or merge logic, or manually convert database-specific data types into BigQuery data types.
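For readers unfamiliar with the CDC-plus-UPSERT pattern, the idea can be sketched in a few lines of Python. This is an illustration only, not Datastream's or the Storage Write API's actual code: the `ChangeEvent` class and `apply_changes` function are hypothetical names, and an in-memory dictionary keyed by primary key stands in for the destination table.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional


@dataclass
class ChangeEvent:
    """One change captured from the source database (hypothetical shape)."""
    op: str                                # "UPSERT" or "DELETE"
    key: int                               # primary key of the affected row
    row: Optional[Dict[str, Any]] = None   # full row image for upserts


def apply_changes(
    table: Dict[int, Dict[str, Any]], events: List[ChangeEvent]
) -> Dict[int, Dict[str, Any]]:
    """Apply change events in order, mutating and returning the table."""
    for ev in events:
        if ev.op == "UPSERT":
            # Insert the row if the key is new, otherwise overwrite it.
            table[ev.key] = dict(ev.row or {})
        elif ev.op == "DELETE":
            # Deleting a key that is already absent is a no-op.
            table.pop(ev.key, None)
        else:
            raise ValueError(f"unknown operation: {ev.op}")
    return table


if __name__ == "__main__":
    destination: Dict[int, Dict[str, Any]] = {}
    apply_changes(destination, [
        ChangeEvent("UPSERT", 1, {"name": "alice"}),
        ChangeEvent("UPSERT", 1, {"name": "alicia"}),  # update, same key
        ChangeEvent("UPSERT", 2, {"name": "bob"}),
        ChangeEvent("DELETE", 2),
    ])
    print(destination)
```

Because each upsert is keyed on the primary key, replaying the same event leaves the table unchanged, which is why this pattern removes the need for separate staging tables and hand-written merge logic.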
"Just configure your source database, connection type, and destination in BigQuery and you're all set," Andi Gutmans, veep of database engineering, said in the blog. "Datastream for BigQuery will backfill historical data and continuously replicate new changes as they happen. And as database schemas shift, Datastream seamlessly handles schema changes and automatically adds new tables and columns to BigQuery."
Organizations that do their data munging and analysis in Google Cloud may see the sense in the offer, but some vendors will try to persuade developers to crack this particular nut in other ways. Snowflake, for example, has its Snowpipe feature, which it first previewed in 2017. Amazon has something called AWS Glue. Others ask why move the data at all, and encourage users to perform analytics in transactional systems, as Oracle does with MySQL HeatWave, now available in AWS, an approach The Register has debated.
Other news from the Chocolate Factory's cloud division includes role-based access control for Spanner, Google's OLTP database service.
"With capabilities such as a built-in audit trail and context-aware access, Identity and Access Management makes it easy to grant permissions at the instance and database level to Spanner users," product manager Mark Donsky said in a blog.
Google is also offering free trial instances of Spanner. ®