Thanks for the extra memories, folks: Say hi to GridGain
Distributed in-memory parallel processing nodes face off against storage-class memory
GridGain Systems software provides an in-memory facility for running transactions, streaming and analytics applications using clustered x86 server nodes in a grid defined by a distributed, massively parallel architecture.
It says its software enables such applications to run thousands of times faster than on disk-based systems and also faster than on SSD-based systems.
GridGain was founded in 2007, received a $2.5m seed funding round in 2011 and a $10m A-round in 2013.
The company donated its base software to the Apache Software Foundation in 2014, where it became the Apache Ignite project. Ignite’s first software release took place in 2015.
GridGain’s software integrates a compute grid, data grid, service grid and streaming grid into a single system with Hadoop/Spark integration. It has a unified API that covers Java, .NET and C++ and is a distributed, object-based, ACID transactional, in-memory key-value store.
Data is stored (cached) in memory. As well as a data grid, Ignite provides a compute grid for parallel in-memory processing. A service grid gives users control over how many instances of a service can be deployed on each cluster node, providing fault tolerance and continuous availability of deployed services if nodes fail.
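To make the data grid and compute grid pairing concrete, here is a minimal sketch in plain Python. It is not the GridGain or Ignite API: the `Node` and `Grid` classes, the CRC32 partitioning rule and the order data are all assumptions made for illustration. Keys are spread across in-memory nodes, and a computation is sent to each node's local data before the partial results are combined.

```python
import zlib


class Node:
    """Hypothetical grid member: holds its share of the data in a dict."""

    def __init__(self, name):
        self.name = name
        self.store = {}


class Grid:
    """Toy data grid plus compute grid (illustrative only, not Ignite)."""

    def __init__(self, node_names):
        self.nodes = [Node(name) for name in node_names]

    def owner(self, key):
        # Assumed partitioning rule: a stable CRC32 hash of the key,
        # modulo the node count, picks the owning node.
        index = zlib.crc32(key.encode("utf-8")) % len(self.nodes)
        return self.nodes[index]

    def put(self, key, value):
        self.owner(key).store[key] = value

    def get(self, key):
        return self.owner(key).store.get(key)

    def compute(self, map_fn, reduce_fn):
        # Run map_fn against each node's local data, then combine the
        # partial results. A real grid would run these in parallel.
        partials = [map_fn(node.store) for node in self.nodes]
        return reduce_fn(partials)


grid = Grid(["node-a", "node-b", "node-c"])
for i in range(1, 101):
    grid.put(f"order-{i}", i)

# Sum all order values without moving the data off its owning node.
total = grid.compute(lambda store: sum(store.values()), sum)
print(total)
```

The point of the design is that only the small partial sums travel between nodes, not the cached data itself.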
This GridGain software works on top of existing databases – Oracle, MySQL, Postgres, DB2, Microsoft SQL Server – so you can carry on using them when you move your database app to work in-memory. According to a white paper (PDF), Ignite “automatically generates the application domain model based on the schema definition of the underlying database, and then loads the data.”
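One common way to layer an in-memory store over an existing database is read-through and write-through caching. The sketch below assumes that pattern; the article does not say this is how GridGain implements it. The `ReadThroughCache` class, the `customers` table and the in-memory SQLite database standing in for the existing database are all hypothetical.

```python
import sqlite3


class ReadThroughCache:
    """Hypothetical in-memory layer in front of an existing database."""

    def __init__(self, conn):
        self.conn = conn
        self.memory = {}   # the in-memory copy of the rows read so far
        self.db_reads = 0  # counts how often the database was queried

    def get(self, customer_id):
        # Serve from memory when possible; otherwise read through to
        # the underlying database and remember the result.
        if customer_id in self.memory:
            return self.memory[customer_id]
        self.db_reads += 1
        row = self.conn.execute(
            "SELECT name FROM customers WHERE id = ?", (customer_id,)
        ).fetchone()
        if row is None:
            return None
        self.memory[customer_id] = row[0]
        return row[0]

    def put(self, customer_id, name):
        # Write-through: update the database first, then the memory copy,
        # so the existing database stays the system of record.
        self.conn.execute(
            "INSERT OR REPLACE INTO customers (id, name) VALUES (?, ?)",
            (customer_id, name),
        )
        self.conn.commit()
        self.memory[customer_id] = name


# SQLite stands in for the existing database (an assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO customers VALUES (1, 'Ada')")
conn.commit()

cache = ReadThroughCache(conn)
first = cache.get(1)    # misses memory, reads the database
second = cache.get(1)   # served from memory
cache.put(2, "Grace")   # written to both the database and memory
```

After the first lookup, repeat reads of the same row never touch the database, which is where the speed-up over a disk-based system would come from.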
It supports various interfaces to the data: ANSI SQL; key/value stores; SQL access; MapReduce; HPC/MPP processing; streaming/CEP processing; and Hadoop. This is far wider, GridGain says, than a typical in-memory database product.
There is Hadoop acceleration with a “dual-mode, high-performance in-memory file system that is 100 per cent compatible with Hadoop HDFS, and an in-memory optimised MapReduce implementation.” These can deliver up to a 100x improvement over disk-based Hadoop implementations.
The clustered nodes in GridGain run in Java Virtual Machines. Nodes automatically discover each other, so a cluster can be scaled without a restart.
Version 7.5 of GridGain's software
There are community and enterprise editions of the GridGain software.
The newly announced v7.5 GridGain Enterprise Edition is twice as fast as the previous version. It has a deadlock-free transactions feature so more users can access the same data pools at the same time. Support for .NET and C++, previously available only in GridGain Enterprise Edition, has now been incorporated into GridGain Community Edition 7.5 and Apache Ignite 1.5.
Support for OSGi containers has been added, plus support for compact protocol, allowing dynamic schema changes and SQL indexing on the fly, without deserialisation. There is also added support for Apache Camel and Flume, and automatic data ingest from Twitter or MQTT.
In-memory media challenges
There is no dispute on three issues:
- In-memory processing makes applications run extremely fast
- Memory (DRAM) is much more expensive than disk and flash
- At some stage in-memory data has to be persisted to disk or flash, and that is slow
These points tend to restrict in-memory processing to high-value, compute-intensive work with relatively small working sets of data. A server grid, such as GridGain’s, gets over the working set size restriction but not the DRAM cost hurdle.
Storage-class memory media provide near-DRAM speed at sub-DRAM cost by using non-volatile media with faster interfaces than the SAS or SATA links used for disk or flash: for example, NVMe over a PCIe bus or, faster still, DIMM memory bus sockets.
By loading non-volatile DIMMs or PCIe cards with NAND or, later this year and in 2017, faster 3D XPoint memory, it will be possible to have large memory-space execution facilities for applications running at near-DRAM speed that don’t need specific write-to-persistent-storage calls in their code.
This will pose a problem or an opportunity for suppliers such as GridGain. The threat could be that their in-DRAM memory market is constrained in size by the rise of an in-NAND or in-3D XPoint in-memory market.
The opportunity is for them to migrate their in-DRAM software to run inside in-NAND or in-3D XPoint in-memory systems, and so help spread much faster application processing. Will they take it? We think, impelled by open source devotees, they will. ®
PS. One afterthought. How will SAP HANA react to 3D XPoint memory?