Aerospike drags mainframes kicking and screaming into the modern world by feeding their data through Apache Spark

ML, analytics connector reduces memory demand, says database firm

In-memory NoSQL database Aerospike is launching connectors for Apache Spark and mainframes to bring the two environments closer together.

Following on from the release of Aerospike Database 5 in May, the idea is that IT teams can use Aerospike Connect to get data from existing transactional systems hosted on mainframes and exploit it using modern machine learning and analytics tools in Apache Spark. To that end, the Spark 2.4 connector supports the streaming APIs of Spark Structured Streaming, which promises low latency for both reads and writes, the company said.

Meanwhile, there is also a connector based on JMS 1.1, a preferred option when integrating and synchronising with mainframe applications, to stream data in and out of Aerospike Database 5.

Bryan Betts, principal analyst with Freeform Dynamics, described the move as "extremely interesting".

He added: "The mainframe world is not the old world: these systems are still fundamental to operations of a lot of organisations. The challenge is bringing in new technology from outside the mainframe world."

According to Aerospike, the speed and low latency of its distributed multi-site clustering database allow users to draw data from mainframe systems for near-real-time analytics without changing or re-platforming the mainframe system.

"Data has gravity: mainframe systems are fundamental to the core operations of many businesses, holding years and decades of data," Betts said. "If you can get to that without making changes, that could be hugely valuable."

He emphasised that relational databases are not going away. Although the number of databases organisations use is a bit "out of control", a variety of technologies will be necessary, the analyst told The Register.

"I'm not convinced you can standardise on a single database technology. You are going to end up using more than one. You have to be open to the fact that very rarely is there a single tool that will do everything for you. The big companies are trying to cover as many bases as possible. With connectors and the ability to access the same data using multiple tools, the issue of having to have a single source, from a large database vendor, kind of goes away."

Srini Srinivasan, Aerospike's chief product officer, told us one of the advantages of using the Spark connector for machine learning and analytics was reducing the demand for memory with Spark.

He said Aerospike had built a "data frame" for Spark to avoid pulling so much data into the analytics environment.

"You don't have to store all the data in Spark: you leave the data in Aerospike, and fetch it as it's needed. This allows the Spark process to access a lot more data, and reduces the amount of memory that Spark is using. Otherwise, you would have to expand the Spark memory footprint by a couple of orders of magnitude." ®

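The "fetch it as it's needed" idea Srinivasan describes can be illustrated with a small, generic sketch. This is not Aerospike's connector code or its API: the `RemoteStore` class, the function names and the sample records below are all hypothetical stand-ins, chosen only to contrast copying a whole data set into the analytics process with pulling records across one at a time.

```python
from typing import Dict, Iterator, List


class RemoteStore:
    """Hypothetical stand-in for an external database (not a real client)."""

    def __init__(self, records: Dict[int, dict]) -> None:
        self._records = records
        self.reads = 0  # counts how many records were actually fetched

    def keys(self) -> List[int]:
        return sorted(self._records)

    def get(self, key: int) -> dict:
        self.reads += 1
        return self._records[key]


def load_all(store: RemoteStore) -> List[dict]:
    """Eager approach: copy every record into local memory up front."""
    return [store.get(k) for k in store.keys()]


def fetch_on_demand(store: RemoteStore) -> Iterator[dict]:
    """Lazy approach: hand over one record at a time, only when asked."""
    for k in store.keys():
        yield store.get(k)


# Made-up sample data: 1,000 records with ids 0..999.
store = RemoteStore({i: {"id": i, "amount": i * 10} for i in range(1000)})

# Look for the first record whose amount exceeds 50. The generator stops
# as soon as a match is found, so the remaining records are never read.
first_large = next(r for r in fetch_on_demand(store) if r["amount"] > 50)
lazy_reads = store.reads

print(first_large, "found after", lazy_reads, "reads")
```

In this toy case the lazy path touches only the handful of records it needs, whereas `load_all` would hold all 1,000 in memory before any analysis began. The real trade-off in Spark depends on the workload and on how the connector pushes work down to the database.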

Biting the hand that feeds IT © 1998–2022