IBM has announced a unified analytics system that allows data scientists to work across multiple data stores in ways the company said should eliminate time-consuming data integration and preparation.
The Integrated Analytics System, launched at the Strata Data Conference in New York, aims to let data scientists develop and deploy models wherever data resides.
Rob Thomas, general manager for IBM Analytics, said he has “declared that machine learning will be a part of everything we deliver”.
This means aiming to automate IT processes across the board - whether that’s the process of matching data or of moving it.
“We want to change the jobs of IT professionals,” he told The Register. “That’s different from eliminating them. We don’t think a data steward should have to spend their time matching datasets; they should be focused on getting the value out of those datasets.”
He acknowledged that complete automation is a “long way” off, but said he thought IBM could reach the halfway point fairly soon.
“We have around five per cent of automation [now], and that’s not enough - it creates too much manual effort. I want to get to, in the next year, automating 50 per cent of the process - that would be a major change in terms of time to value.”
Lowering barriers to entry
The Integrated Analytics System's contribution to that ambition starts with IBM's common SQL engine, which allows users to move workloads across data stores. Data in public, private or hybrid clouds will all appear to be resident on a single system.
“Right now, we need massive data integration and data preparation,” Thomas said. “That sucks up 80 per cent of the time - we’ve eliminated that.”
Once the data is in place, the Apache Spark implementation embedded in IBM’s Data Science Experience lets data scientists crack on with running and training models.
“Basically, you take the learning curve down dramatically,” Thomas said.
“The number one difference is simplifying the otherwise complicated problem of having to build a data warehouse, bring data together, ETL, clean the data, choose the data science tool … we’ve integrated all that.”
The system also allows data scientists to collaborate, even if they use different programming languages. “The data science world at the moment is fragmented: the old world, which is people building in SAS, and the new world, with open languages like Java. But this environment is collaborative, so you can [share code with] your teammates.”
The data science framework for the new system began with the open-source Project Jupyter, Thomas said. Watson's come along for the ride, too.
Thomas added that clients can run the lot as containers orchestrated by Kubernetes. “We have IBM Cloud Private, which is based on Kubernetes,” he said.
“We’ve made it simple to get started with data science here, but when you think about expanding this to the enterprise, the fact it plugs right into Kubernetes fabric makes it easy to use it in a different system.”
IBM is also planning to help boost its customers' interest - and expertise - in machine learning with what Thomas describes as a “data science elite team” made up of IBM staffers who will work on-site to impart their machine learning skills.
“I think of it as a player-coach role,” Thomas said. “They help them get going, and then over time act more as a coach because [the company has] built up the skills.”
Big Blue started to form the team in July. More details on the programme will emerge in November. ®