Analysis After securing a lofty position in enterprise applications and databases, Oracle has fixed its eyes on data science. And though analysts have expressed doubt about whether Big Red is producing technologies new to the field, its shiny Cloud Data Science Platform might appeal to those already heavily invested in Oracle's software.
Big Red's pitch is that it will bring cohesion to efforts in data science, allowing practitioners to "collaboratively build, train, manage and deploy machine learning models".
Via the Oracle Cloud Infrastructure Data Science service, data scientists will, the vendor said, be able to automate algorithm selection and tuning, automate predictive feature selection, evaluate models and make machine-learning models explainable to the outside world.
The idea is that teams of data scientists can work together in a single environment to build, train and manage new machine-learning models using popular ML language Python and other open-source tools and libraries including TensorFlow, Keras and Jupyter.
According to Oracle, it is integrating machine learning with its Autonomous Database, reducing the time taken in data preparation and movement. It is also selling "auto ML" tools to ease model selection and model optimisation. It provides model catalogues so data scientists can make their models available to other users, even those outside data science such as app developers or business analysts.
Lastly, the service has tools to help data scientists monitor a model's effectiveness and update it on the fly while applications consuming the model's output are still running.
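For readers unfamiliar with the jargon, "automated algorithm selection and tuning" broadly means letting software try several candidate models and settings, score each by cross-validation, and keep the winner. The sketch below illustrates that general idea in plain Python. It is not Oracle's service or API: the toy dataset, the two candidate algorithms and every function name are assumptions made purely for illustration.

```python
def knn_predict(train, x, k):
    """Majority vote among the k training points closest to x."""
    nearest = sorted(train, key=lambda row: abs(row[0] - x))[:k]
    votes = sum(label for _, label in nearest)
    return 1 if votes * 2 > k else 0


def threshold_predict(train, x, cutoff):
    """Fixed rule: predict 1 when x reaches the cutoff (ignores training data)."""
    return 1 if x >= cutoff else 0


def accuracy(predict, train, test, param):
    """Fraction of test points the candidate labels correctly."""
    hits = sum(predict(train, x, param) == label for x, label in test)
    return hits / len(test)


def auto_select(data, candidates, folds=4):
    """Score every (algorithm, setting) pair by k-fold cross-validation."""
    best = None
    for name, predict, params in candidates:
        for param in params:
            scores = []
            for i in range(folds):
                test = data[i::folds]  # every folds-th point is held out
                train = [row for j, row in enumerate(data) if j % folds != i]
                scores.append(accuracy(predict, train, test, param))
            mean = sum(scores) / folds
            # Keep the first candidate that strictly beats the current best.
            if best is None or mean > best[2]:
                best = (name, param, mean)
    return best


# Hypothetical toy data: one integer feature, label 1 from 25 upwards.
data = [(x, 1 if x >= 25 else 0) for x in range(40)]
candidates = [
    ("knn", knn_predict, [1, 3, 5]),
    ("threshold", threshold_predict, [15, 20, 25]),
]
best = auto_select(data, candidates)
print(best)
```

On this toy data the search settles on the threshold rule with a cutoff of 25. Commercial and open-source "auto ML" tools apply the same loop to far larger sets of algorithms and hyperparameters, usually with smarter search strategies than trying every combination.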
All this sounds nice. The problem is, many of these features already exist in some form or other from other vendors and have been on the market for a while.
Since 2018, for example, Teradata has sold management of machine learning models in its Vantage platform. Meanwhile, H2O.ai launched tools which automate many of the processes in developing machine-learning models in its Driverless AI product in 2017. Microsoft also offers its Azure Machine Learning Studio, which it says streamlines the machine learning lifecycle, from building models to deployment and management. Google and AWS have similar tools. In such a crowded market, users might question if Oracle's announcement really stands out, and if integration with the world's most popular database counts for a great deal.
Moving into the field
IDC AVP software researcher Philip Carnelley said Oracle's new data science platform stems from its 2018 buy of datascience.com, which developed a platform designed to centralise data science tools, projects and infrastructure in a fully governed workspace. "It's been a long time coming, and in the meantime, their competitors have been working on their own technology. What Oracle has been doing is to industrialise the technology they acquired: make it more robust and integrate it with the rest of the Oracle tools. In and of itself, there is probably nothing that you could not get elsewhere."
But the idea of a data science platform that works seamlessly with Oracle databases and analytics technology will appeal to dyed-in-the-wool customers of Big Red. Organisations already invested in Oracle technology might see better performance because they can run machine-learning models directly on the database, rather than extracting data into another data store to run the algorithm.
"That gives you very fast response times and supports real-time analytics for prediction fraud, for example. If you were to suck it out and run it on another data to analyse on Spark, that might be slower," Carnelley told us. Outside Oracle's core customer base, he added, the decision to go with its data science platform may depend on the organisation's confidence in its skills.
"If you were looking at a new problem and debating whether to load data into AWS S3 and use the Amazon machine-learning tools, then it might come down to the skills set. If you have data scientists that are very IT literate and like using Amazon's tools then that would be fine. If, on the other hand, your data scientists are a bit more on a business level, then you might find that the Oracle technology is easier to get to grips with.
"Each of the individual features, like hyper parameters and auto-tuning, many other companies support, and you can get them open source too. But packaging them together and offering them to support collaboration will appeal to some customers."
What might puzzle the market is Oracle's decision to include a full Cloudera Hadoop implementation in the platform. Hadoop has been losing its allure in recent years, as Cloudera's difficulties attest.
Other vendors building cloud-based data science and analytics platforms – including AWS, Azure, Google Cloud Platform and Teradata – favour object storage for ingesting unstructured, high-volume and high-velocity data. The Oracle platform can query object storage, but the vendor doesn't make clear whether it ingests data into object storage directly in the new data science platform.
Oracle has said its own object storage technology does integrate with other platforms, and its documentation says it can be used "as the primary data repository for big data". So why favour full Hadoop integration?
Oracle's partnership with Cloudera dates back to 2012. The result has been a bucketful of services built on joint technologies, but users might question how relevant they are to the next generation of big data and machine-learning technologies. ®