Structured data, unstructured data: It shouldn't matter, says Google
Apache Spark comes to BigQuery while BigLake gets Iceberg support
GCN Google is promoting updates to its cloud-based data management portfolio with the ambition of bringing analysis of structured and unstructured data closer together.
In its cloud-based data warehouse, BigQuery, the Chocolate Factory is announcing support for unstructured data, which users can analyze through BigQuery's familiar SQL interface using adjacent capabilities in ML, speech recognition, computer vision, translation, and text processing.
Also new to the data warehouse is support for Apache Spark, the open source analytics engine commonly used for data exploration and ML in data lakes. Google said the move would allow users to create BigQuery stored procedures using Spark that integrate with their SQL pipelines.
Meanwhile, BigLake, Google Cloud's data lake product, will support the Apache Iceberg table format, with support for Delta, the Databricks format, and Hudi streaming coming soon.
Gerrit Kazmaier, veep and general manager of Google data analytics, told The Reg that with its Open Data Cloud Google wanted to unify "data across all data formats, across all clouds, all possible workloads, and for all sides of analysis."
"Secondly, the Open Data Cloud enables a connected ecosystem. It is really a platform for data vendors to build their data applications on top of and innovate together with us on behalf of our customers," he said.
Keen observers might see a trend here. Cloudera, the data lake company which grew out of the Hadoop world, has also announced support for Iceberg. Data warehouse stalwart Teradata has announced a data lake as part of its data platform. And towards the end of 2020, Databricks announced support for SQL in its data lake as part of a plan to support BI-type workloads in its environment.
But just because users can move data warehousing and data lake jobs into the same environment doesn't mean they will, Miles Ward, CTO of data consultancy SADA, told The Register.
"The strategy here isn't necessarily to move towards one strategy or the other because that runs the risk of trying to accommodate a 'one size fits all' approach that can make things difficult, but it's to use the core strengths of Google's engineering to create a flexible offering to better meet customers' needs.
"The key here is that the architecture allows us to pick and choose each option that's needed, and lets the customer's business needs dictate what the platform looks like, and not the other way around." ®