Updated
Enjoying an avalanche of interest since its $33bn IPO, cloud-based data warehouse slinger Snowflake is promising support for unstructured data, ETL within its data cloud, and partners in its data market.
Launched in June, Snowflake's Data Cloud is designed to bring corporate data into one place for analysis and governance, although that has been the aim of data warehousing since it was born back in the 1980s.
Nonetheless, Snowflake, founded in 2012, says it is bringing a new "developer experience" to its Data Cloud, available across public hyperscalers AWS, Azure, and GCP. This means it is allowing data engineers, data scientists, and developers to build ETL/ELT, data preparation, and feature engineering into the Snowflake environment rather than using third-party tools.
Snowflake has made its name as a data warehouse rebuilt for the cloud. It separates storage from compute, and allows users to create virtual warehouses as MPP compute clusters, composed of multiple compute nodes allocated by Snowflake from a cloud provider. But the core database is relational, like that of data warehouse stalwarts Teradata, IBM Netezza, and Oracle, and as such it is designed for structured data.
This week, Snowflake announced support for unstructured data such as audio, video, PDFs, and images. It has not said whether this will come through a secondary NoSQL database technology or within the RDBMS itself, as Oracle says it does. It has yet to respond to The Register's request for more detail on this point.
The company is also adding data services to its Snowflake Data Marketplace. The point of the marketplace is to make it easier to ingest third-party data into the analytics environment, as it is all in Snowflake's architecture. New here is allowing third-party service providers to enrich data by running risk assessments, augmenting a data set with behavioural scoring, or "simply outsourcing the more advanced analysis" without having to move the data, according to Snowflake.
Philip Howard, research director at Bloor Research, said Snowflake was not the only company integrating data connectivity within the data warehouse. He questioned what it would mean for data integration vendors that have built up a business around Snowflake such as Fivetran and Matillion. He also asked whether data engineers would want to be writing code or would prefer a no-code environment.
Howard said adding service providers to the Data Cloud might be useful to people who want enrichment such as address verification, but doubted whether users would be interested in outsourcing their analytics in the environment.
"It could be valuable, but how many people are going to outsource their predictive analytics? I don't know about that," he said.
Snowflake's support for unstructured data might benefit users who want light workloads in a hybrid environment, but heavy users of video analytics, for example, might want something more specialist, Howard said.
Here, too, Snowflake is far from unique. Just last week, Databricks promised support for structured SQL workloads in its data lake environment. And since as far back as 2012, Teradata has offered support for unstructured data in its Aster analytics environment by integrating Hadoop. ®
Updated to add
According to Snowflake, support for unstructured data starts with the fact that its storage layer is built on object storage such as AWS S3.
Christian Kleinerman, SVP of product, told The Register: "This gives Snowflake the ability to store files directly in its storage subsystem in addition to regular structured and semi-structured data." That's just storage, not analysis.
But he added that Snowflake's Stages offers a "way to specify a bucket of cloud object storage".
"Stages is today used for — typically temporary — storage of files to ingest into Snowflake such as json, parquet, csv," Kleinerman said. "We are expanding the concept of Stages to allow for permanent storage of any type of file. In terms of operations, Snowflake will allow querying metadata of such files, such as file size or name, and retrieve URLs to enable consumption of file content by applications."
So, although Snowflake is helping applications address unstructured data, it is not storing unstructured data directly in its database.
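To make that distinction concrete, here is a minimal Python sketch of the pattern Kleinerman describes: the database holds only metadata about each file (name, size) and hands out a URL, while the bytes stay in object storage. Everything in it is an assumption for illustration, including the `Stage` and `StagedFile` classes, their methods, and the bucket URL. It is not Snowflake's API or syntax, which the company has not detailed here.

```python
from __future__ import annotations

from dataclasses import dataclass
from urllib.parse import quote


@dataclass(frozen=True)
class StagedFile:
    """Metadata kept about a file; the file's bytes stay in object storage."""
    name: str
    size_bytes: int


class Stage:
    """Hypothetical stand-in for a stage: a named pointer to a storage bucket."""

    def __init__(self, bucket_url: str) -> None:
        self.bucket_url = bucket_url.rstrip("/")
        self._files: dict[str, StagedFile] = {}

    def register(self, name: str, size_bytes: int) -> None:
        # Record metadata only; no file content is read or copied.
        self._files[name] = StagedFile(name, size_bytes)

    def query_metadata(self, min_size_bytes: int = 0) -> list[StagedFile]:
        # Filter on metadata alone, returning matches sorted by file name.
        matches = (f for f in self._files.values() if f.size_bytes >= min_size_bytes)
        return sorted(matches, key=lambda f: f.name)

    def file_url(self, name: str) -> str:
        # Hand back a URL so an application can fetch the content itself.
        if name not in self._files:
            raise KeyError(name)
        return f"{self.bucket_url}/{quote(name)}"


# Made-up bucket and files, purely for illustration.
stage = Stage("https://example-bucket.invalid/media/")
stage.register("call recording.wav", 48_000_000)
stage.register("invoice.pdf", 120_000)

large = stage.query_metadata(min_size_bytes=1_000_000)
print([f.name for f in large])
print(stage.file_url("call recording.wav"))
```

In this sketch the audio file is never parsed or queried; any analysis of its content would happen in whatever application follows the URL, which is the gap Howard points to for heavy video or audio workloads.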