After more than a million downloads since its introduction two years ago, CrateDB, an open-source distributed SQL database suited for real-time analysis of machine data, has reached its 1.0 release.
And while the door remains open to technical talent from abroad, Crate.io, which began with offices in Berlin, Germany, and Dornbirn, Austria, is opening its new headquarters in San Francisco, California.
Unlike traditional SQL databases, CrateDB has been designed to operate as a cluster of containers. Its ability to redistribute data as its cluster changes size makes it well-suited for container platforms like Docker, Kubernetes, and Mesos.
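To make the idea of redistributing data as a cluster grows more concrete, here is a toy sketch in Python. It is not CrateDB's actual allocation logic: the `assign_shards` function, the round-robin placement rule, and the node names are all hypothetical, chosen only to show shards spreading out when a node joins.

```python
from collections import defaultdict


def assign_shards(shard_count, nodes):
    """Place shards on nodes round-robin (a hypothetical rule, not CrateDB's)."""
    if not nodes:
        raise ValueError("at least one node is required")
    placement = defaultdict(list)
    for shard in range(shard_count):
        # Shard i goes to node i modulo the current cluster size.
        placement[nodes[shard % len(nodes)]].append(shard)
    return dict(placement)


# Six shards on a two-node cluster, then the same shards after a third node joins.
before = assign_shards(6, ["node-1", "node-2"])
after = assign_shards(6, ["node-1", "node-2", "node-3"])

print(before)
print(after)
```

In this sketch each node holds three shards before the cluster grows and two afterwards. A real system would also try to minimise how many shards move, which this naive rule does not attempt.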
It uses elements of Elasticsearch under the hood to manage cluster state, sharding, and replication, alongside other open-source technologies like Lucene, Netty, and the Presto SQL parser. Crate.io suggests CrateDB represents an alternative to NoSQL machine data management systems composed of Splunk, Elasticsearch, and Cassandra.
With columnar field caches and a distributed query planner, CrateDB can handle complex queries in real time. And its distributed SQL query engine makes it adept at data operations like joins and aggregations across clusters.
The 1.0 release brings support for the Postgres wire protocol, outer joins, sub-queries, schema and metadata discovery, and trigonometric, percentile, and conditional functions, among other improvements.
According to Christian Lutz, CEO of Crate.io, 75 per cent of the company's customers use CrateDB to manage machine and IoT data. The company claims CrateDB can handle several million insert operations per second (1.5M/sec on a 14-node cluster).
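For a rough sense of scale, the quoted cluster figure can be turned into a per-node average. The Python sketch below uses only the two numbers Crate.io cites. The `per_node_rate` helper is hypothetical, and the even spread of load across nodes is an assumption, not something the company has stated.

```python
# Figures quoted by Crate.io: 1.5 million inserts/sec on a 14-node cluster.
CLUSTER_INSERTS_PER_SEC = 1_500_000
NODE_COUNT = 14


def per_node_rate(cluster_rate, nodes):
    """Average inserts per second per node, assuming load is spread evenly."""
    if nodes <= 0:
        raise ValueError("nodes must be positive")
    return cluster_rate / nodes


rate = per_node_rate(CLUSTER_INSERTS_PER_SEC, NODE_COUNT)
print(f"{rate:,.0f} inserts/sec per node")  # prints "107,143 inserts/sec per node"
```

On that assumption, each node would be absorbing a little over 100,000 inserts per second.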
To analyze geospatial and temporal data emanating from its customers' industrial sensors, Space-Time Insight turned to CrateDB.
"It allows us to write and query sensor data at more than 200,000 rows per second, and query terabytes of data," said CTO Paul Hofmann in a statement.
In an email to The Register, Hofmann said CrateDB appealed to the company for reasons beyond transaction speed. "Crate handles [geospatial and temporal] data particularly well and is optimized for geospatial and temporal queries," he said. "We also get image (BLOB) and text support, which is important for our IoT solutions, as they are often used to capture images on mobile devices in the field and provide two-way communication between people and machines. Crate is also microservice ready — it runs on Docker and we've Dockerized our IoT cloud service, for example."
CrateDB also made sense for Space-Time because the company's SI Studio platform relies on Java and SQL. Hofmann said it was important to be able to utilize existing internal skill sets.
Hofmann declined to share details about internal testing to evaluate database options. "But it's fair to say that typical relational databases can't handle anywhere near that rate of ingestion that Crate can," he said. "And, even databases similar to Crate don’t match the performance we're seeing." ®