PaaS + IaaS

Informatica hopes to unclog your data pipelines with help from Nvidia in accelerating Spark-based ML operations

Any significant improvement in processing times will be a boon to productivity, say analysts

Wed 17 Mar 2021 // 14:45 UTC

Informatica has announced a serverless, Spark-based data integration engine intended to accelerate data engineering for machine learning in the cloud using Nvidia GPUs.

Within the vendor's integration platform-as-a-service, CDI Elastic, microservices support different data management activities. The new capability is targeted at extract, transform, load (ETL) workloads.

"You may have data sitting in a data lake, and you want to do parallel processing of that data at high performance. That's what CDI Elastic service is designed for," said Rik Tamm-Daniels, VP of strategic ecosystems.

Informatica is now giving customers the option of running that processing on GPUs where their cloud provider offers Nvidia processors. The company claimed this could deliver five times the processing speed for data preparation used in analytics, machine learning, and data science projects.

Also new to the service is support for Nvidia's Rapids suite of software libraries for data science.

"CDI Elastic takes your visual, no-code design mappings and translates them into Spark-native code that takes advantage of the parallel processing power of Spark," Tamm-Daniels said. "This new capability takes advantage of what's called the Rapids open-source library that Nvidia makes available to be able to take those Spark-based jobs and optimise them for running on GPUs."
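For readers wondering what pointing a Spark job at GPUs can look like outside a managed service, here is a minimal sketch. It is not Informatica's implementation, which the article does not describe. It assumes Nvidia's open-source RAPIDS Accelerator for Apache Spark plugin is used, and the helper name `rapids_spark_submit`, the jar path, and the job file are hypothetical placeholders.

```python
# Illustrative only: builds a spark-submit command that enables the
# RAPIDS Accelerator for Apache Spark. The function name, jar path and
# job path are hypothetical; this is not how CDI Elastic is configured.

def rapids_spark_submit(app_path, rapids_jar, gpus_per_executor=1, tasks_per_gpu=4):
    """Return a spark-submit argument list with GPU acceleration switched on."""
    conf = {
        # Load Nvidia's SQL plugin so supported operators run on the GPU.
        "spark.plugins": "com.nvidia.spark.SQLPlugin",
        "spark.rapids.sql.enabled": "true",
        # Spark's own GPU scheduling settings: GPUs per executor, and the
        # fraction of a GPU each task claims (so several tasks share one).
        "spark.executor.resource.gpu.amount": str(gpus_per_executor),
        "spark.task.resource.gpu.amount": str(gpus_per_executor / tasks_per_gpu),
    }
    cmd = ["spark-submit", "--jars", rapids_jar]
    for key, value in conf.items():
        cmd += ["--conf", f"{key}={value}"]
    cmd.append(app_path)
    return cmd


if __name__ == "__main__":
    # Placeholder file names, assumed for the example.
    print(" ".join(rapids_spark_submit("etl_job.py", "rapids-4-spark.jar")))
```

The point of the sketch is that the job code itself is unchanged: acceleration is switched on through configuration, which fits the article's description of existing Spark-based jobs being optimised for GPUs.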

Crucially, Rapids supports Python, one of the preferred programming languages for data science and engineering, and Apache Arrow, which defines a language-independent columnar memory format for flat and hierarchical data, said Mark Beyer, distinguished vice president and analyst with Gartner.

"The fact that Nvidia is putting together the acceleration on the GPUs, with what is basically Arrow, supporting the Python, is interesting," he said.

"One of the areas right now that's difficult in terms of building data pipelines: taking stuff from data science and putting it in production. So, when you formalize those libraries, that's going to accelerate production.

"Hardware acceleration to a data science team is not specifically interesting, but to data engineers and the infrastructure designers, it's like, 'Thank god somebody put some discipline in here'."

Beyer said he had not been able to verify Informatica's performance claims. But machine learning projects spend 90 per cent or more of their time on data engineering, he said, so any significant improvement in data processing could have a large impact on the productivity of data scientists.

Kevin Petrie, vice president of research at analyst firm Eckerson Group, said the efforts of machine learning operations (MLOps) were focused on streamlining the software lifecycle of creating, training, deploying, and monitoring ML models.

The Nvidia/Informatica partnership reaches down the stack to ease a different, but related, bottleneck of performance delays due to massive data volumes, he added. "These delays are choking machine learning and other AI initiatives that suck up lots of data as they train and retrain models for accuracy."

"We should expect more innovation like this as enterprises and vendors take more of a full-stack approach to the operationalisation of machine learning." ®
