Databricks wants one tool to rule all AI systems – coincidentally, its own MLflow tool

Turns out people are not that great at tracking thousands of variables


American upstart Databricks, established by the original authors of the Apache Spark framework, reckons its open-source machine-learning management engine MLflow is ready for prime time.

The newly released version 1.0 of the platform focuses on stabilising the core API. It improves the handling of metrics and search functionality, and adds support for HDFS as an artifact store, in addition to the previously supported Amazon S3, Azure Blob Storage, Google Cloud Storage, SFTP, and NFS.

It also adds an experimental Open Neural Network Exchange (ONNX) model flavour, and a CLI command for building a Docker image capable of serving an MLflow model.
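The new Docker command can be invoked roughly like this (a sketch — the model URI and image name below are placeholder values, not from the release notes):

```shell
# Build a Docker image that serves the MLflow model logged at the given URI.
# "runs:/<run-id>/model" and "my-mlflow-model" are placeholders.
mlflow models build-docker -m "runs:/<run-id>/model" -n "my-mlflow-model"

# The resulting image exposes a REST scoring endpoint on port 8080
# inside the container; map it to a local port to send predictions.
docker run -p 5001:8080 "my-mlflow-model"
```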

And finally, there’s Windows support for the MLflow client – in the unlikely event data scientists decide to opt for something other than Linux.

MLflow enables data scientists to track and distribute experiments, package and share models across frameworks, and deploy them – no matter if the target environment is a personal laptop or a cloud data centre.

The company launched the alpha version of the MLflow project last year at the Spark + AI Summit.

Multiple code approaches

The basic machine learning life cycle – taking raw data, preparing it, training your model and deploying it – is full of variables and fraught with complications. It can involve hundreds of different open source tools and frameworks, each with dozens of configurable parameters.

Facebook, Google and Uber have all built their own proprietary tools to deal with this complexity.

MLflow was designed to take some of the pain out of machine learning in organizations that don’t have the coding and engineering muscle of the hyperscalers. It works with every major ML library, algorithm, deployment tool and language.

One of the project’s goals is to improve collaboration between data scientists and the engineers who deploy their creations in production.

In true open source fashion, MLflow users didn’t wait for a stable release to start experimenting: Databricks says the platform has already been deployed at thousands of organizations to manage their machine learning workloads, and the company is offering it as a managed service.

Group effort

Databricks might have started the project, but today, it has more than 100 contributors, including a few from Microsoft.

"People are excited about having an open-source project in this space," Matei Zaharia, co-founder and chief technologist of Databricks, told El Reg last year.

"They're excited about having an ML platform – it's something that resonates with them, and that many wanted to build already – and having one that is a community effort will be much better than what any company could build on its own."

The next major addition to MLflow will be a Model Registry, which will let users manage an ML model’s lifecycle from experimentation through deployment and monitoring.

You can find full release notes on GitHub, along with the project’s code base. ®

