Kafka-flinger Confluent has had another crack at persuading relational holdouts that stream processing isn't all that scary, by way of the SQL-like ksqlDB.
We spoke to one of the company's technologists, Ben Stopford, at this month's Big Data event in London, where he tried to persuade us of the joys of the tech and prise this hack's claws from the familiar RDBMS of old.
Apache Kafka is all about dealing with streams of data – think Ubers pootling around a city or data flowing in from sensors. Handling those streams has historically required a change of mindset, and Confluent had previously tried to ease the pain with KSQL.
KSQL, which took a bow in 2017, aimed to give SQL coders a more familiar syntax for querying streams, rather than forcing them to learn a new language such as Python.
Crossing the streams
However, the underlying concept was still a little alarming – rather than the one-off query of relational databases of old, a KSQL query ran continuously, transforming data as it flowed through Kafka topics.
The new release of KSQL, ksqlDB, builds on this with the concept of pull queries, which give devs that cosy RDBMS feeling by returning the state of the data at a given point in time and then finishing. Adding EMIT CHANGES to the query instead returns a constantly updating stream of results: a push query.
The latter is how KSQL traditionally worked, while the former makes the, er, occasionally Kafkaesque world of Kafka a lot less worrying for those trying to get to grips with the tech.
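For those who think better in code, the pull/push distinction can be sketched as a toy Python model. Everything in it is hypothetical: LocationTable, on_event, pull and push are our own illustrative names, not ksqlDB's API, and a real ksqlDB server does this with SQL over Kafka topics. The assumption is simply that a table is kept up to date from a stream of car positions; a pull query reads the current row and is done, while a push query (the EMIT CHANGES style) stays open and reports every change as it arrives.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical callback type for a running push query: (car_id, location).
Listener = Callable[[str, str], None]


class LocationTable:
    """Toy model of a table kept up to date from a stream of car positions.

    Illustrative only: these names are ours, not ksqlDB's API.
    """

    def __init__(self) -> None:
        self._rows: Dict[str, str] = {}
        self._listeners: List[Listener] = []

    def on_event(self, car_id: str, location: str) -> None:
        # Each event arriving on the stream overwrites the row for its key...
        self._rows[car_id] = location
        # ...and is forwarded to every push query that is still running.
        for listener in self._listeners:
            listener(car_id, location)

    def pull(self, car_id: str) -> Optional[str]:
        # Pull query: the row as it stands right now, then the query is done.
        return self._rows.get(car_id)

    def push(self, listener: Listener) -> None:
        # Push query (EMIT CHANGES style): stays open, reporting every change.
        self._listeners.append(listener)


table = LocationTable()
changes: List[str] = []
table.push(lambda car, loc: changes.append(f"{car} -> {loc}"))

table.on_event("car-1", "Soho")
table.on_event("car-2", "Camden")
table.on_event("car-1", "Shoreditch")

print(table.pull("car-1"))  # Shoreditch: only the latest state
print(changes)  # ['car-1 -> Soho', 'car-2 -> Camden', 'car-1 -> Shoreditch']
```

The pull query answers a question about "now"; the push subscriber sees every change from the moment it subscribed, which is why it never finishes of its own accord.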
The team has also made hooking up external data considerably easier, with ksqlDB allowing users to create and control Kafka Connect connectors directly.
The result is a considerable simplification for developers more used to a SQL world when building event streaming applications.
Of course, even without ksqlDB, the basic building blocks may actually be quite familiar. "Stream processing," said Stopford, "is a database turned inside out… it has a commit log, storage and indexes, but it's all wrapped up inside one box… Stream processing has all the same building blocks, but they're turned the other way around."
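Stopford's "inside out" line is easier to picture with a toy model. The sketch below is our own illustration, not Kafka's or ksqlDB's API: CommitLog stands in for a Kafka topic and LatestReadingView for a derived table, and both names are hypothetical. The assumption, in keeping with the quote, is that the append-only log is the source of truth and any table or index built from it is disposable, because it can be rebuilt by replaying the log.

```python
from typing import Dict, List, Tuple

Record = Tuple[str, int]  # hypothetical (sensor_id, reading) pair


class CommitLog:
    """Toy append-only log standing in for a Kafka topic (not Kafka's API)."""

    def __init__(self) -> None:
        self.records: List[Record] = []

    def append(self, record: Record) -> None:
        self.records.append(record)


class LatestReadingView:
    """A derived table, playing the role of a database index or materialized view.

    It holds nothing that can't be rebuilt: it just replays the log from the
    last offset it has processed.
    """

    def __init__(self) -> None:
        self.latest: Dict[str, int] = {}
        self.offset = 0  # how far through the log this view has read

    def catch_up(self, log: CommitLog) -> None:
        # Apply only the records not yet seen, then advance the offset.
        for sensor_id, reading in log.records[self.offset:]:
            self.latest[sensor_id] = reading
        self.offset = len(log.records)


log = CommitLog()
for record in [("s1", 20), ("s2", 18), ("s1", 22)]:
    log.append(record)

view = LatestReadingView()
view.catch_up(log)
print(view.latest)  # {'s1': 22, 's2': 18}

# Throw the view away and rebuild it from scratch: same answer, because the
# log, not the view, is the source of truth.
rebuilt = LatestReadingView()
rebuilt.catch_up(log)
assert rebuilt.latest == view.latest
```

The offset bookkeeping is the interesting design choice: a view only needs to remember how far through the log it has read, so it can pick up new events incrementally or be binned and rebuilt from offset zero.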
Indeed they are, hence the need for something like ksqlDB to allow SQL fans to get their collective heads around the tech.
Stopford went on to argue that, for larger teams, streaming data could make a lot more sense than static designs, although he admitted that initial performance might not match the expectations of users accustomed to the decades of tuning enjoyed by the likes of Microsoft SQL Server or Postgres.
Having got a ksqlDB development environment up and running, we'd have to agree, at least with the latter remark. Certainly, the basic query language is familiar enough, but a "powerful feature set"? Compared to the maturity and features of modern relational databases, ksqlDB is definitely a version 1.0 product.
However, a version 1.0 may be all that is needed: the pull query functionality will feel very familiar to anyone used to a traditional relational database who wants a result that is true "now" rather than a continual stream.
Developers thinking about dipping a toe into the streams of Apache Kafka will therefore find ksqlDB a handy stepping stone. ®