Cloudera is adding a data science workbench to its enterprise product, based on technology from Sense.io, the startup it bought last year.
The product addition comes as the Palo Alto-based company reportedly prepares for a $4.1bn initial public offering later this year, though official channels are keeping quiet about the matter.
They are not quiet about everything, however: the self-service tool for data scientists has been announced as an add-on to Cloudera Enterprise. The tool is currently in beta, and its price has not yet been announced. It will allow data scientists to use their preferred languages – including R, Python and Scala – and libraries within the Spark- and Hadoop-integrated platform.
Speaking to The Register, Cloudera's head of data science, Sean Owen, declined to start the “holy war” over which of the languages the workbench offers is best, but explained that “all are relevant for data science”. The workbench brings together Hadoop's data and compute platform for large-scale production efforts, typically written in Java, Scala or other JVM languages, while giving data scientists access to their tools of choice in R and Python within the same ecosystem.
Asked whether the workbench could be used to analyse whether to buy shares in a company's initial public offering, Owen chuckled and told The Register: “That depends on what company we're talking about. Other than that I'd say no comment.”
Workbench will be accessible like the other web-based notebook tools Zeppelin and Jupyter: users can access code and scripts, edit them, and execute them without fiddling with the command line or firing up an IDE, Owen explained. Doing so on-cluster is enjoyable “because I don't have to copy the data out of the secured perimeter” of Cloudera Enterprise.
This will also open up hybrid possibilities for data analytics, according to Owen, who acknowledged that, for instance, combining Hadoop and deep learning was difficult. “That's easier with Data Science Workbench,” said Owen, with the notebook running on customers' clusters and allowing both Hadoop-native tooling and libraries and tools developed elsewhere, such as Google's TensorFlow.
Projects tackling distributed deep learning already exist, including Deeplearning4j, which is commercially supported by Skymind, while Yahoo!'s interest in Hadoop has led it to open-source its TensorFlowOnSpark offering.
Owen told us “Cloudera doesn't have special secret plans” to develop its own machine learning tools. Just like it doesn't have any special secret plans to IPO this year… ®