Databricks wants one tool to rule all AI systems – coincidentally, its own MLflow tool

Turns out people are not that great at tracking thousands of variables

American upstart Databricks, established by the original authors of the Apache Spark framework, reckons its open-source machine-learning management engine MLflow is ready for prime time.

Version 1.0 of the platform focuses on core API components. It improves metrics handling and search, and adds support for Hadoop (HDFS) as an artifact store, alongside the previously supported Amazon S3, Azure Blob Storage, Google Cloud Storage, SFTP, and NFS.

It also adds an experimental Open Neural Network Exchange (ONNX) model flavour, and a CLI command for building a Docker image capable of serving an MLflow model.

And finally, there’s Windows support for the MLflow client – in the unlikely event data scientists decide to opt for something other than Linux.

MLflow enables data scientists to track and distribute experiments, package and share models across frameworks, and deploy them – no matter if the target environment is a personal laptop or a cloud data centre.
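For a feel of what "tracking experiments" means in practice, here is a minimal sketch in plain Python. It is emphatically not MLflow's API: the `RunTracker` class, its `log_run` and `search` methods, and the parameter and metric names are all hypothetical, made up for illustration. The idea is simply that every training run has its parameters and metrics written down somewhere, so the good runs can be found again later.

```python
import json
import tempfile
import uuid
from pathlib import Path


class RunTracker:
    """Toy experiment tracker: one JSON file per run (hypothetical, not MLflow's API)."""

    def __init__(self, root):
        self.root = Path(root)
        self.root.mkdir(parents=True, exist_ok=True)

    def log_run(self, params, metrics):
        """Persist one run's parameters and metrics; return its generated ID."""
        run_id = uuid.uuid4().hex
        record = {"run_id": run_id, "params": params, "metrics": metrics}
        (self.root / f"{run_id}.json").write_text(json.dumps(record))
        return run_id

    def search(self, metric, minimum):
        """Return runs whose `metric` is at least `minimum`, best first."""
        runs = [json.loads(p.read_text()) for p in self.root.glob("*.json")]
        hits = [r for r in runs if r["metrics"].get(metric, float("-inf")) >= minimum]
        return sorted(hits, key=lambda r: r["metrics"][metric], reverse=True)


# Three made-up training runs with made-up results.
tracker = RunTracker(tempfile.mkdtemp())
tracker.log_run({"learning_rate": 0.1, "depth": 3}, {"accuracy": 0.81})
tracker.log_run({"learning_rate": 0.01, "depth": 5}, {"accuracy": 0.88})
tracker.log_run({"learning_rate": 0.001, "depth": 8}, {"accuracy": 0.74})

# Find every run that cleared 80 per cent accuracy, best first.
best = tracker.search("accuracy", minimum=0.80)
print([run["params"] for run in best])
```

A real tracking tool adds a great deal on top of this, such as artifact storage, a UI and remote servers, but the bookkeeping at the core looks broadly similar.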

The company launched the alpha version of the MLflow project last year at the Spark + AI Summit.

Multiple code approaches

The basic machine learning life cycle – taking raw data, preparing it, training your model and deploying it – is full of variables and fraught with complications. It can involve hundreds of different open source tools and frameworks, each with dozens of configurable parameters.

Facebook, Google and Uber have all built their own proprietary tools to deal with this complexity.

MLflow was designed to take some of the pain out of machine learning in organizations that don’t have the coding and engineering muscle of the hyperscalers. It is built to work with any ML library, algorithm, deployment tool and language.


One of the project’s goals is to improve collaboration between data scientists and the engineers who deploy their creations in production.

In true open-source fashion, MLflow users didn’t wait for a stable release to start experimenting: Databricks says the platform has already been deployed at thousands of organizations to manage their machine learning workloads, and the company is offering it as a managed service.

Group effort

Databricks might have started the project, but today, it has more than 100 contributors, including a few from Microsoft.

"People are excited about having an open-source project in this space," Matei Zaharia, co-founder and chief technologist of Databricks, told El Reg last year.

"They're excited about having an ML platform – it's something that resonates with them, and that many wanted to build already – and having one that is a community effort will be much better than what any company could build on its own."

The next major addition to MLflow will be a Model Registry that allows users to manage their ML models’ lifecycle, from experimentation through to deployment and monitoring.

You can find full release notes on GitHub, along with the project’s code base. ®



Biting the hand that feeds IT © 1998–2020