It must have been love, but it's over now: Rockset tries to break up storage and compute, meet transactional, data-warehouse systems in middle
Propositions database land with a 'real-time' system
Rockset has upgraded its main database engine to scale more efficiently and improve performance by separating storage and compute. The outfit said it also separates the cloud compute resources used for ingesting data from those used for querying it.
Built for real-time analytics, the new release is part of the company's effort to create a new class of database, CEO Venkat Venkataramani told The Register. It is meant to differ both from transactional databases, which process limited volumes of data in real time, and from analytics or data-warehousing systems, which can analyse large volumes of data offline. "We are actually actively trying to build a third leg: a real-time database."
The engineering team behind Rockset previously worked at Facebook on real-time analytics for large workloads.
The approach they took was to build what the company calls a "real-time indexing database", Venkataramani said. That means the database indexes all the data coming into the system as it arrives, with a lag of one to two seconds, and all of that data is then visible to queries, applications and dashboards.
The new features include separation of storage and compute. The idea is now common in analytics and data-warehousing systems. It was popularised largely by the cloud-native Snowflake, but is now also a feature of AWS Redshift, Google's BigQuery, Azure Synapse and even Teradata, with its heritage in on-premises data-warehouse appliances. Rockset brings the idea to real-time databases and is trying to tempt batch-analytics users away from those systems and towards its own tech.
"Our pricing model completely decouples storage from compute, so that you can scale each of them independently," Venkataramani claimed.
Other popular real-time systems, such as Apache Druid and Elasticsearch with Kibana, couple storage and compute because they were initially built for on-premises deployments, he said.
The second new feature separates ingest compute from query compute. The idea is to stop the two jobs competing for resources, so that data does not become "stale" and query performance does not suffer, said the company, which counts Intel, Nvidia and Deloitte among its customers.
Big time schema
The firm is also looking to unite two other warring factions: the SQL and NoSQL worlds. To start with, it ingests data without a schema, as a NoSQL database does, because real-time data sources tend to be semi-structured, arriving as JSON files, for example.
"Without asking for schema management or database administration, we automatically convert NoSQL into SQL tables in the cloud using a technology called converged indexes," Venkataramani said. "You would write to Rockset as though it's a NoSQL database in the cloud, except it indexes it and exposes all of those datasets as fast SQL tables, fully schematised, fully indexed. So all of your SQL queries with full-featured joins and aggregations, and complex filtering – all the standard SQL features – will come back from real time data without the need for database administration."
Rockset is often paired with a NoSQL database. For example, a developer building a real-time leaderboard for a massively multiplayer online game might ingest data into MongoDB as the transactional system while replicating it to Rockset for the analytical queries the leaderboard requires.
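The article does not describe how converged indexes work internally, so what follows is only a toy illustration of the general idea of writing schemaless JSON and querying it as an indexed SQL table. It uses Python's built-in sqlite3 module, not Rockset's implementation or API, and the game events, the `ingest` and `leaderboard` helpers, and the table name are all hypothetical. The MongoDB replication step is not shown.

```python
import json
import sqlite3

# Hypothetical semi-structured game events; the documents do not share a fixed schema.
RAW_EVENTS = [
    '{"player": "ada", "score": 120}',
    '{"player": "bob", "score": 90, "region": "eu"}',
    '{"player": "ada", "score": 45, "region": "us"}',
    '{"player": "cy", "score": 200}',
]


def ingest(conn, table, raw_docs):
    """Load schemaless JSON docs into a SQL table, inferring the columns on the fly."""
    docs = [json.loads(d) for d in raw_docs]
    # The union of all field names seen, in first-seen order, becomes the column set.
    columns = []
    for doc in docs:
        for key in doc:
            if key not in columns:
                columns.append(key)
    # SQLite accepts untyped columns, so no type declarations are needed here.
    col_defs = ", ".join(f'"{c}"' for c in columns)
    conn.execute(f'CREATE TABLE "{table}" ({col_defs})')
    # Index every column so a filter on any field can use an index
    # (a crude stand-in for "fully indexed", not how Rockset does it).
    for c in columns:
        conn.execute(f'CREATE INDEX "idx_{table}_{c}" ON "{table}" ("{c}")')
    placeholders = ", ".join("?" for _ in columns)
    # Fields missing from a document are stored as NULL.
    rows = [tuple(doc.get(c) for c in columns) for doc in docs]
    conn.executemany(f'INSERT INTO "{table}" VALUES ({placeholders})', rows)
    return columns


def leaderboard(conn, table, limit=3):
    """Plain SQL aggregation over the inferred table: total score per player."""
    return conn.execute(
        f'SELECT player, SUM(score) AS total FROM "{table}" '
        "GROUP BY player ORDER BY total DESC LIMIT ?",
        (limit,),
    ).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    ingest(conn, "events", RAW_EVENTS)
    print(leaderboard(conn, "events"))  # [('cy', 200), ('ada', 165), ('bob', 90)]
```

Inferring columns from whatever fields appear, and storing absent fields as NULL, is what lets the writer skip schema management in this sketch. A production system would also have to handle type conflicts and nested fields, which this toy ignores.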
Rockset is not the first to tackle the problems of real-time data analytics. SAP has said its in-memory HANA database can be used for these problems in business and engineering environments. From the SQL database world, MariaDB is building features that support analytics for both offline and live data.
Rockset is a proprietary database derived from RocksDB, an open-source system used by Facebook, Yahoo! and LinkedIn. With its heritage in real-time analytics for web-first industries, Rockset hopes its fresh approach will convince organisations outside its core customer base that it has a better answer to real-time analytics problems, which are becoming more important in other kinds of business, particularly as they move online. ®