Apache Spark may be the fastest data processing engine around for big data, but unless you are conversant in Scala or Java, this cluster computing framework can be a pain to set up and manage.
So here is some help from MemSQL, the in-memory database start-up: a way of letting organisations use Spark without writing code, the company says.
The company today released Spark Streamliner, described as a one-click deployment of integrated Apache Spark for fast installs, with a single web-based UI to manage multiple data pipelines. The software is open source and available via GitHub.
These days big data and analytics is all about processing data in real or near-real time, and Spark is the enabling tool for eliminating nowhere-near-real-time batch ETL, says MemSQL chief marketing officer Gary Orenstein. His company is "backing Spark one hundred per cent".
And the datasets are getting bigger by the day. The company cites the case of Pinterest, which is already using Spark Streamliner to process 72TB of data a day, or roughly 1GB/sec. Other users and use cases are not named, but they include an oil exploration company that is processing reams of sensor data for real-time predictive analytics, according to Orenstein.
MemSQL sits as the data store on top of Apache Spark and contributes to speed by storing and serving data from memory, rather than from the slower disk storage used by traditional relational databases. But in one key respect it is a traditional RDBMS: as the company name suggests, it uses familiar, cosy, ubiquitous SQL.
Throw as much CPU horsepower at the data as they can
Founded in 2011, MemSQL is a 100-person, venture-backed company that is shy about disclosing revenues. So it is both a stripling and a minnow compared with the traditional database kings, Oracle and IBM, as well as the enterprise giant SAP, which is chalking up big sales with its new-ish HANA analytics line.
In the Forrester Wave report-cum-league table for in-memory database platforms, released in August 2015, MemSQL is ranked at the head of the chasing pack, behind SAP, Oracle, IBM, Teradata and Microsoft.
But the company thinks it has detected a soft underbelly in the market leaders. For starters, their products can be extremely costly, and customers are "essentially handcuffed" to expensive SGI and Exadata machines in SAP HANA and Oracle installations, says Eric Frenkiel, CEO and co-founder.
In contrast, MemSQL takes a horizontal scale-out approach using commodity hardware and prices by the amount of DRAM used to store the data, which, according to Frenkiel, is a welcome shift for a market more used to paying by CPU core. These two pillars encourage customers to "throw as much CPU horsepower at the data as they can," he says. ®