Google is promising to capture data logs from Oracle and other on-prem SQL data systems for monitoring, data integration and ML pipelines.
Among the Chocolate Factory’s latest concoctions is Datastream, designed as a new serverless service to catch changes in data and replicate data where desirable.
Gerrit Kazmaier, the general manager and vice president for databases, data analytics and Looker at Google, told The Register the system works “directly with the logical database logs” to understand the state of the data, inserts, deletes and updates.
“It’s not incurring any overhead on the source system because we are not probing it for changes; we are understanding the changes in its own format. Also, serverless infrastructure means that there is no burden of managing the systems,” he said.
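To make the log-based approach concrete, here is a minimal sketch of replaying a stream of change events against a replica, rather than repeatedly querying the source for differences. It is illustrative only and is not Datastream's API: the `ChangeEvent` type, the `apply_changes` function and the sample rows are all hypothetical, and a real database log carries far more detail (transaction boundaries, schema changes, log positions).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ChangeEvent:
    """Hypothetical, simplified stand-in for one entry in a database change log."""
    op: str                      # "insert", "update" or "delete"
    key: int                     # primary key of the affected row
    row: Optional[dict] = None   # new row contents (None for deletes)


def apply_changes(replica: dict, events: list) -> dict:
    """Replay change events, in order, against an in-memory replica keyed by primary key."""
    for event in events:
        if event.op in ("insert", "update"):
            replica[event.key] = event.row
        elif event.op == "delete":
            replica.pop(event.key, None)
        else:
            raise ValueError(f"unknown operation: {event.op}")
    return replica


# Made-up log entries: only the changes travel, never the full table.
log = [
    ChangeEvent("insert", 1, {"name": "Ada"}),
    ChangeEvent("insert", 2, {"name": "Bob"}),
    ChangeEvent("update", 1, {"name": "Ada L."}),
    ChangeEvent("delete", 2),
]

replica = apply_changes({}, log)
print(replica)  # {1: {'name': 'Ada L.'}}
```

The source system is never queried in this sketch: the replica's state is derived entirely from the ordered log, which is the property Kazmaier is describing.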
Organisations might want to do this to analyse data changes on one of Google’s tools, or replicate the data, or introduce triggers to replicate data for ML data pipelines, Kazmaier said.
It is available in preview for Oracle and MySQL, either in the cloud or on-prem. Google has a “roadmap” for introducing the service to most other mainstream RDBMSes this year.
'Google is saying: if you do it our way, we will make it as easy as possible'
Michele Goetz, vice president and principal analyst with Forrester, said the promise of Datastream was that it reduces compute, battery, latency, and network bottlenecks by cutting the footprint of data to what's needed for an analytic model or event, rather than pushing an entire data set and complex query.
“Incumbent data warehousing can stay in place and is incrementally updated. For strategies that maintain on-premise components due to cost and security, these systems maintain their lifespan. But they also gain in value with Datastream, which creates a bridge between traditional systems and modern intelligent operational environments and use cases,” she said.
Google is also introducing Dataplex, an “intelligent data fabric” for governance across data systems, and Analytics Hub, designed for sharing analytics models built with Google’s Looker tools.
On the latter, Kazmaier said: “You're not only sharing raw data sets, but you share Looker models, and Looker blocks directly associated with the data set. So, when someone is receiving that, they are not starting from the raw data again, they can leverage all of the semantic and the analysis that you have built.”
Philip Carnelley, associate veep, software research at IDC Europe, said there were advantages to bringing visibility, governance and integration of data assets onto a single cloud platform, but users would have to submit to “the Google way” of doing things.
“You’re probably going to use BigQuery in preference to Snowflake for example, but all of these things can be made to work together. It is perhaps slightly less open than Amazon. Google is saying, ‘if you do it our way, we will make it as easy as possible’. That’s their philosophy, whereas Amazon is a bit more of a toolkit,” the analyst said. ®