Achieving production-level governance with machine-learning projects currently presents unique challenges. A new space of tools and practices is emerging under the name MLOps. The space is analogous to DevOps but tailored to the practices and workflows of machine learning.
Why MLOps is Needed
Machine learning models make predictions for new data based on the data they have been trained on. Managing this data in a way that can be safely used in live environments is challenging, and one of the key reasons why 80 per cent of data science projects never make it to production – an estimate from Gartner.
It is essential that the data is clean, correct, and safe to use without any privacy or bias issues. Real-world data can also continuously change, so inputs and predictions have to be monitored for any shifts that may be problematic for the model. These are complex challenges that are distinct from those found in traditional DevOps.
MLOps Not Just DevOps
DevOps practices are centred on the “build and release” process and continuous integration. Traditional development builds are packages of executable artifacts compiled from source code. Non-code supporting data in these builds tends to be limited to relatively small static config files. In essence, traditional DevOps is geared to building programs consisting of sets of explicitly defined rules that give specific outputs in response to specific inputs.
In contrast, machine-learning models make predictions by indirectly capturing patterns from data, not by formulating all the rules. A characteristic machine-learning problem involves making new predictions based on known data, such as predicting the price of a house using known house prices and details such as the number of bedrooms, square footage, and location. Machine-learning builds run a pipeline that extracts patterns from data and creates a weighted machine-learning model artifact. This makes these builds far more complex and the whole data science workflow more experimental. As a result, a key part of the MLOps challenge is supporting multi-step machine learning model builds that involve large data volumes and varying parameters.
To run projects safely in live environments, we need to be able to monitor for problem situations and see how to fix things when they go wrong. There are pretty standard DevOps practices for how to record code builds in order to go back to old versions. But MLOps does not yet have standardisation on how to record and go back to the data that was used to train a version of a model.
There are also special MLOps challenges to face in the live environment. There are largely agreed DevOps approaches for monitoring for error codes or an increase in latency. But it’s a different challenge to monitor for bad predictions. You may not have any direct way of knowing whether a prediction is good, and may have to instead monitor indirect signals such as customer behaviour (conversions, rate of customers leaving the site, any feedback submitted). It can also be hard to know in advance how well your training data represents your live data. For example, it might match well at a general level but there could be specific kinds of exceptions. This risk can be mitigated with careful monitoring and cautious management of the rollout of new versions.
The MLOps Tool Scene
The effort involved in solving MLOps challenges can be reduced by leveraging a platform and applying it to the particular case. Many organisations face a choice of whether to use an off-the-shelf machine-learning platform or try to put an in-house platform together themselves by assembling open-source components.
Some machine-learning platforms are part of a cloud provider’s offering, such as AWS SageMaker or AzureML. This may or may not appeal, depending on the cloud strategy of the organisation. Other platforms are not cloud-specific and instead offer self-install or a custom hosted solution (eg, Databricks MLflow).
Instead of choosing a platform, organisations can instead choose to assemble their own. This may be a preferred route when requirements are too niche to fit a current platform, such as needing integrations to other in-house systems or if data has to be stored in a particular location or format. Choosing to assemble an in-house platform requires learning to navigate the ML tool landscape. This landscape is complex with different tools specialising in different niches and in some cases there are competing tools approaching similar problems in different ways (see the Linux Foundation’s LF AI project for a visualization or categorised lists from the Institute for Ethical AI).
For organisations using Kubernetes, the kubeflow project presents an interesting option as it aims to curate a set of open-source tools and make them work well together on kubernetes. The project is led by Google, and top contributors (as listed by IBM) include IBM, Cisco, Caicloud, Amazon, and Microsoft, as well as ML tooling provider Seldon, Chinese tech giant NetEase, Japanese tech conglomerate NTT, and hardware giant Intel.
Challenges around reproducibility and monitoring of machine learning systems are governance problems. They need to be addressed in order to be confident that a production system can be maintained and that any challenges from auditors or customers can be answered. For many projects these are not the only challenges as customers might reasonably expect to be able to ask why a prediction concerning them was made. In some cases this may also be a legal requirement as the European Union’s General Data Protection Regulation states that a "data subject" has a right to "meaningful information about the logic involved" in any automated decision that relates to them.
Explainability is a data science problem in itself. Modelling techniques can be divided into “black-box” and “white-box”, depending on whether the method can naturally be inspected to provide insight into the reasons for particular predictions. With black-box models, such as proprietary neural networks, the options for interpreting results are more restricted and more difficult to use than the options for interpreting a white-box linear model. In highly regulated industries, it can be impossible for AI projects to move forward without supporting explainability. For example, medical diagnosis systems may need to be highly interpretable so that they can be investigated when things go wrong or so that the model can aid a human doctor. This can mean that projects are restricted to working with models that admit of acceptable interpretability. Making black-box models more interpretable is a fast-growth area, with new techniques rapidly becoming available.
The MLOps scene is evolving as machine-learning becomes more widely adopted, and we learn more about what counts as best practice for different use cases. Different organisations have different machine learning use cases and therefore differing needs. As the field evolves we’ll likely see greater standardisation, and even the more challenging use cases will become better supported. ®
Want to learn more?
Ryan Dawson is a core member of the Seldon open-source team, providing tooling for machine-learning deployments to Kubernetes. He has spent 10 years working in the Java development scene in London across a variety of industries.
Bringing DevOps principles to machine learning throws up some unique challenges, not least very different workflows and artifacts. Ryan will dive into this topic in May at Continuous Lifecycle London 2020 – a conference organized by The Register's mothership, Situation Publishing.
You can find out more, and book tickets, right here.