Rows, columns, and the search for a database that can do everything
Snowflake last week promised analytics and transactions in the same system. For some it was déjà vu all over again
Analysis Under Nevada's baking summer sunshine, Snowflake last week promised it would bring together two ways of working with data that mix about as well as oil and water.
The data warehouse vendor – well known for its stratospheric $120 billion post-IPO valuation – said it would support both analytics and transactional workloads in the same system.
Launched at the Snowflake Summit 2022 in Vegas, Unistore would be the "foundation for another wave of innovation in the Snowflake Data Cloud," said Christian Kleinerman, senior vice president of product. "Similar to how we redefined data lakes and data warehouses for our customers, Unistore is ushering in a renaissance of building and deploying a new generation of applications in the Data Cloud," he said.
The problem with promises of innovation in tech is that they can – just like Snowflake's market capitalization – be deflated: the company is currently valued at $38 billion.
Snowflake's row-based storage engine is meant to support analytics on transactional data. Available only in preview, Unistore would allow developers to build "a data pipeline to pull all that data into Snowflake [and then] everything's in Snowflake and it's easy to manage," Carl Perry, director of product management, said on a media call. Or they can develop transactional applications directly in Snowflake's platform, which includes the developer framework Snowpark.
But for some, the promise of "innovation" rang hollow. Following Snowflake's announcement, Domenic Ravita, product marketing veep at database company SingleStore, took to Twitter to point out his company had a patent on an approach that, at first glance, might look similar to Snowflake's.
Talking to The Register, he explained that in 2019 SingleStore had launched the first version of the SingleStore database to support both data structures – row store and column store – in a single table type in the database. "Why that matters is you just create table and you get the benefits of OLTP and OLAP together with the data structure and the tiered storage automatically," he said.
SingleStore counts Uber, Kellogg's, and engineering giant GE among its customers. The company was founded in 2011 as MemSQL by former Facebook and Microsoft engineers Adam Prout (CTO) and Nikita Shamgunov, who remains on the board but is also CEO of Neon, which supports serverless Postgres. The first product was an in-memory transactional database bearing the same name, released in 2013.
Ravita said that in 2014 SingleStore began working on an in-memory row store and an on-disk column store with tiered storage, "meaning transactions hit memory first and then they roll off to disk storage."
Part of the reason was to regain control over the proliferation of database categories which have populated the modern stack as it creaks under the scale of global internet-based applications.
"We need a database for just searching text: Elastic Search. We need a database just for scaling read volumes: we use Redis. We need a database just for documents, catalogs, and tweets: we use MongoDB or Couchbase document. The problem with that is that now if you have a modern SaaS application, underneath that you have a complex collection of databases. In a way, you've accidentally stumbled into creating your own distributed database out of these other databases and now you are a database designer by accident," Ravita said.
SingleStore's approach to supporting both transactional and analytics workloads on a single data store is now called Universal Storage and it was awarded a US patent in July 2021.
Ravita said that, regarding its patent, the company would take a wait-and-see approach to Snowflake's Unistore.
"It's not very clear, and [Unistore] is not available yet. We're waiting to see what's next there. We've invested more than eight years in our technology and our patent was awarded last year, so we'll see. But our first response is: welcome to the party and may the best database win."
Snowflake has declined the opportunity to take part in this article.
The appeal of using a single database for different workloads is not just in the simplicity of the design and support. There is also an economic driver, particularly with the advent of cloud computing where users can end up paying for data movement, storage, and processing, said Ravita.
The point is supported by GigaOM. The research firm's field test showed SingleStoreDB offered a 50 percent saving over three years compared to a Snowflake-MySQL stack and a 60 percent saving over the same period compared to an AWS Redshift-PostgreSQL stack. Meanwhile, its TPC-H workloads were 100 percent faster than Redshift.
Regardless of performance, SingleStore is not the only company making claims about doing analytics from a transactional database. For example, MongoDB has introduced column store indexing for its document database to help developers build analytical queries into their applications.
Oracle has its Heatwave product for MySQL, which, running on Oracle Cloud Infrastructure, helps customers run analytics on transactional applications without having to export data to a specialist analytics system such as Teradata, Snowflake, or AWS Redshift.
Meanwhile, SAP has talked about real-time analytics since 2011, and bases its concept around its in-memory database HANA, which supports the latest iteration of SAP's enterprise applications.
Ravita said that SAP HANA moved data "under the covers" between data storage types with the database architecture. That movement, he said, is "in the path of the hot transactions."
"As far as we know, we are the only production database used by customers on the planet that unifies transactions and analytics in a single storage type."
A SAP spokesman said: "We don't move data between 'stores,' creating multiple copies of data. We have one source/copy of data that is optimally stored for transactional and analytical performance which is what matters."
Andy Pavlo, associate professor of databaseology at Carnegie Mellon University, said that while Snowflake's claims of analyzing live transactional data may not quite stand up, SingleStore is not the only combined database.
Snowflake is introducing Hybrid Tables, which, it said, "offer fast single-row operations and allow customers to build transactional business applications directly on Snowflake… [and] enable customers to perform swift analytics on transactional data for immediate context."
Pavlo said: "At a high level, Snowflake and SingleStore – and others – are doing the same thing. They use a row store for transactional updates and then a column-store for data that targets analytic queries."
"The fact that Snowflake calls Unistore tables 'Hybrid Tables' is telling. That means they likely store the data in both row and columnar format. They are likely appending updates in log-structured storage specific to Unistore and then moving batches to their existing columnar storage," he added.
The approach is similar to Vertica's write-optimized storage (WOS), which has been available since the late 2000s, while Google's new Napa DBMS from 2021 is doing something similar, the academic said.
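For readers who want to see the general pattern Pavlo describes, here is a minimal Python sketch. It is a toy under stated assumptions, not Snowflake's, SingleStore's, or Vertica's implementation: the class name `HybridTable`, the `flush_threshold` parameter, and the method names are all hypothetical. Writes are appended to a row-oriented log, batches are moved into per-column lists, and an analytic query reads both.

```python
class HybridTable:
    """Toy table: append-only row log for writes, columnar lists for scans.

    Hypothetical illustration only; it mirrors no vendor's actual design.
    """

    def __init__(self, columns, flush_threshold=3):
        self.columns = list(columns)
        self.flush_threshold = flush_threshold
        self.row_log = []  # recent writes, one tuple per row
        self.column_store = {c: [] for c in self.columns}  # flushed data

    def insert(self, **row):
        # Transactional path: a single cheap append of the whole row.
        self.row_log.append(tuple(row[c] for c in self.columns))
        if len(self.row_log) >= self.flush_threshold:
            self.flush()

    def flush(self):
        # Move the batch of logged rows into the columnar layout.
        for row in self.row_log:
            for column, value in zip(self.columns, row):
                self.column_store[column].append(value)
        self.row_log.clear()

    def column_sum(self, column):
        # Analytic path: scan one column, plus rows not yet flushed.
        idx = self.columns.index(column)
        flushed = sum(self.column_store[column])
        pending = sum(row[idx] for row in self.row_log)
        return flushed + pending


table = HybridTable(["order_id", "amount"], flush_threshold=3)
for order_id, amount in [(1, 10), (2, 20), (3, 30), (4, 40)]:
    table.insert(order_id=order_id, amount=amount)

total = table.column_sum("amount")
```

After four inserts with a threshold of three, three rows sit in the columnar lists and one is still in the row log, so the sum has to consult both. That double read is the price of keeping writes cheap.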
"From what I can tell, Snowflake's Unistore is not doing what SingleStore describes as their 'Universal Storage' architecture, despite similar naming," he said.
However, he cautioned SingleStore about posturing on the basis of intellectual property rights. "Although I have not read SingleStore's patent, if I were them, I wouldn't whip it around in a threatening manner like that. I do not think their patent claims would hold up in litigation, assuming it matches what they describe in their blog. A good lawyer should be able to get it invalidated. There is plenty of prior art on using a single storage representation of a database to support both transactional and analytics that predates the patent and SingleStore's implementation. Notable academic implementations are TUM's HyPer from 2015 and Saarland's OctopusDB from 2010," Pavlo said in an email.
He also pointed out that in the SingleStoreDB approach, the row store is log structured, which could make reads more expensive. "This is not unique to Unistore; all log-structured storage managers have this issue," said Pavlo, who is also CEO of automated database tuning company OtterTune.
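A small Python sketch shows why reads can cost more in a log-structured layout. It is an illustration under simplifying assumptions, not any vendor's storage manager: `LogStructuredStore` and its methods are hypothetical names, and there is no index, so a read walks the log from newest to oldest until it finds the key.

```python
class LogStructuredStore:
    """Toy append-only key-value log with no index (hypothetical example)."""

    def __init__(self):
        self.log = []  # (key, value) pairs in write order

    def put(self, key, value):
        # Writes never modify old entries; they only append.
        self.log.append((key, value))

    def get(self, key):
        """Return (value, entries_examined), scanning newest to oldest."""
        examined = 0
        for k, v in reversed(self.log):
            examined += 1
            if k == key:
                return v, examined
        return None, examined

    def compact(self):
        # Keep only the latest value per key, shrinking the log.
        latest = {}
        for k, v in self.log:
            latest[k] = v
        self.log = list(latest.items())


store = LogStructuredStore()
store.put("a", 1)
for i in range(5):
    store.put("b", i)

before = store.get("a")  # must skip five newer entries for "b"
store.compact()
after = store.get("a")   # only one newer entry remains ahead of "a"
```

Real engines typically soften this cost with indexes and background compaction, but the trade-off is the same: appends are cheap, and reads pay for them.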
Nonetheless, there were real advantages to executing both workloads in a single store, he said, as long as the analytical queries don't interfere with the operational transaction workload.
But the advantage only holds where the analytical data comes from a single transactional system, rather than from a data warehouse combining data from multiple upstream databases, as warehouses often do.
Doug Henschen, vice president and principal analyst with Constellation Research, said one thing the database vendors – including Oracle, MongoDB, and Snowflake – had in common was a desire to broaden their capabilities to prevent customers from turning to third-party products to satisfy their needs.
"The rise of 'modern cloud-native apps' has increased demand for such versatility. However, customers will still be choosing their database/database service based on their primary use case and need. MongoDB, for example, will still appeal primarily to developers seeking an agile platform for application development. It now has a fairly compelling set of capabilities for operational analytics, but it's not a SQL data warehouse/mart platform capable of being foundation for BI and analytics.
"Conversely, Snowflake is saying outright that it's not going after the traditional transactional database market, it's quite literally calling them 'Native Apps' where there's a need for transactional and analytical data together in real time."
In that sense, the vendors were talking to their own customers, rather than the broader market, Henschen said. "Each database vendor is promising their customers that they'll be able to do more with their product, but I don't see it as changing the fundamental use case or initial buyer of the product, even if use cases and user populations expand a bit as these new capabilities mature."
While that might be good news for Snowflake customers, it falls short of the "wave of innovation" promised. Perhaps that's what happens when you try to build a Snowpark in the desert. ®