November 18, 2022

What is MLOps?

MLOps is a term popping up at most machine learning conferences and in an increasing number of publications. It's gaining traction for good reasons: it helps manage the machine learning lifecycle, create reproducible workflows, and accelerate the validation process.

In this article, you'll learn:

what MLOps means
what benefits it brings
how to introduce MLOps in your company.

Let's get to it!

Introduction to MLOps

MLOps stands for Machine Learning Operations. It refers to a set of best practices companies can use to execute machine learning projects successfully. Put simply, MLOps is like DevOps for machine learning. The two are not identical, though: they share similar assumptions at their core, but they are implemented differently.

MLOps' role is to streamline communication between machine learning engineers or data scientists and operations specialists, and so improve collaboration. With this approach, organizations can simplify the ML management process, boost quality, and better adjust to business needs.

It is essential to understand that MLOps is not an added service within machine learning development - you can't just "order" it at the very end of your ML development process and reap the benefits immediately. MLOps is an approach to the entire machine learning development lifecycle. This means that the phases of MLOps include:

data acquisition
data analysis
data preparation
model training
model evaluation
model refinement
model deployment
monitoring & optimization.

These roughly cover the entire machine learning lifecycle - roughly because the level of granularity of the steps included varies depending on the source. Let's look at an example of the AI life cycle as presented by NVIDIA:

AI lifecycle by NVIDIA (source: NVIDIA)

So is MLOps just DevOps?

No. As mentioned before, these two are, to some extent, related, but they're not the same thing.

MLOps is built with the concepts of DevOps in mind. DevOps is a set of best practices connecting software development (Dev) and IT operations (Ops). It emerged over a decade ago (around 2008), when the IT operations and software development communities raised concerns about "what they felt was a fatal level of dysfunction in the industry," as stated by Ian Buchanan, Principal Solutions Engineer at Atlassian.

MLOps, in turn, is a younger discipline that has its origin in a 2015 paper, "Hidden Technical Debt in Machine Learning Systems". It has since grown significantly, and according to Cognilytica, it's expected that the market for MLOps solutions will reach 4 billion dollars.

In a piece by Forbes, Samir Tout, a Professor of Cybersecurity at Eastern Michigan University's School of Information Security & Applied Computing (SISAC), shared: "MLOps is the natural progression of DevOps in the context of AI. While it leverages DevOps' focus on security, compliance, and management of IT resources, MLOps' real emphasis is on the consistent and smooth development of models and their scalability."

The benefits of MLOps

1) Automation

When you have several models in production, dedicating time to MLOps helps the entire team by facilitating consistent and predictable results. Instead of manually switching back and forth between various tools or languages, data scientists can more easily design experiments that cover the whole model lifecycle and organize them in a centralized way.

2) Scalability

Organizations now rely on machine learning more than ever to make data-driven decisions. To do that, they take advantage of various ML use cases, such as diagnosis aid, fraud detection, chatbots, and product recommendations. However, they also need to define a process for improving current models and publishing new ones. With scalability in mind, data scientists and ML engineers must create a strong infrastructure that can manage a variety of workloads. Companies may have dozens or even hundreds of models in production, all of which require regular monitoring and revisions.

3) Streamlined collaboration

Company-wide adoption of ML models requires collaboration between engineers and business professionals, as well as between data scientists and engineers. Thanks to MLOps principles, organizations can standardize ML workflows and establish a shared language for all stakeholders. This reduces compatibility problems and speeds up the building and deployment of models.

4) Reproducibility

Automating ML workflows enables reproducibility and repeatability across a variety of processes, including the development, testing, and deployment of ML models. Data versioning is one example: MLOps practices save the versions of the data that were created or modified at particular points in time, so continuously trained models can adapt to change while each result stays traceable to the data behind it. MLOps practices also include versioning the model itself, together with its hyperparameters and model type.
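To make the versioning idea concrete, here is a minimal sketch in Python. It is not taken from any particular MLOps tool: the names fingerprint, record_run, and registry are hypothetical, and it assumes the dataset is small enough to serialize as JSON. It derives one identifier from the data and another from the model type and hyperparameters, so two runs can be compared at a glance.

```python
import hashlib
import json


def fingerprint(obj) -> str:
    """Return a short, deterministic hash of a JSON-serializable object."""
    canonical = json.dumps(obj, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:12]


def record_run(registry: list, rows: list, model_type: str, hyperparams: dict) -> dict:
    """Append one training run to the registry with data and config versions."""
    entry = {
        "data_version": fingerprint(rows),
        "config_version": fingerprint(
            {"model_type": model_type, "hyperparams": hyperparams}
        ),
        "model_type": model_type,
        "hyperparams": hyperparams,
    }
    registry.append(entry)
    return entry


# Hypothetical example data: two snapshots of the same dataset.
registry = []
rows_v1 = [{"amount": 10.0, "label": 0}, {"amount": 250.0, "label": 1}]
rows_v2 = rows_v1 + [{"amount": 75.0, "label": 0}]

run_a = record_run(registry, rows_v1, "logistic_regression", {"C": 1.0})
run_b = record_run(registry, rows_v2, "logistic_regression", {"C": 1.0})

# Same configuration, different data: only the data version changes.
print(run_a["config_version"] == run_b["config_version"])
print(run_a["data_version"] == run_b["data_version"])
```

In practice, a dedicated data versioning or experiment tracking tool would usually take over this bookkeeping. The sketch only shows which pieces of information need to be recorded.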

5) Monitoring

Models drift over time as the environment changes, so keeping an eye on their behavior and performance is essential. By continuously retraining the model and using automatic alerts in the event of model drift, MLOps allows organizations to monitor and gain insights into model performance in a systematic way.
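As an illustration of an automatic drift alert, the sketch below compares the distribution of one numeric feature at training time with the distribution seen in production, using the Population Stability Index (PSI). This is one common choice among many, not the only way to detect drift. The function names, the bin count, and the 0.2 alert threshold (a frequently quoted rule of thumb) are assumptions made for this example.

```python
import math


def psi(reference, live, bins=5):
    """Population Stability Index between two samples of one numeric feature."""
    lo, hi = min(reference), max(reference)
    width = (hi - lo) / bins or 1.0  # fall back to 1.0 if all values are equal

    def shares(values):
        counts = [0] * bins
        for v in values:
            idx = int((v - lo) / width)
            # Values outside the reference range fall into the edge bins.
            counts[min(max(idx, 0), bins - 1)] += 1
        # A small floor avoids log(0) for empty bins.
        return [max(c / len(values), 1e-6) for c in counts]

    ref, cur = shares(reference), shares(live)
    return sum((c - r) * math.log(c / r) for r, c in zip(ref, cur))


def drift_alert(reference, live, threshold=0.2):
    """Return the PSI score and whether it crosses the (assumed) threshold."""
    score = psi(reference, live)
    return {"psi": score, "drifted": score > threshold}


# Hypothetical feature values: training data vs. shifted production data.
training_values = [float(i) for i in range(100)]
production_values = [v + 50.0 for v in training_values]

print(drift_alert(training_values, training_values))
print(drift_alert(training_values, production_values))
```

A monitoring job could run such a check on a schedule and notify the team, or trigger retraining, whenever drifted is true.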

6) Cost savings

All the benefits of MLOps combined generate another benefit: cost savings. Automation reduces the need for manual machine learning model management, so your team can spend the time it would normally dedicate to managing models on other work. What's more, MLOps lets you methodically identify and reduce failures. And fewer errors mean more money saved.

How to implement MLOps

When implementing MLOps in your organization, there are three levels of automation you can aim for:

MLOps level 0 - manual process
MLOps level 1 - ML pipeline automation
MLOps level 2 - CI/CD pipeline automation

MLOps level 0

For businesses that are just getting started with ML, this is usually the level they can easily achieve. It's a script-driven process where each step is performed manually: data preparation, analysis, model training, validation. If you don't train or modify your models often, a fully manual machine learning workflow will suit you well.

Figure 1: Manual ML steps as illustrated by Google

MLOps level 1

Level 1's objective is to perform continuous model training by automating the ML pipeline, enabling companies to supply model prediction services continuously. At this level, the organization must add automated data and model validation, pipeline triggers, and metadata management to the pipeline in order to automate the retraining of models in production using new data.

Figure 2: ML Pipeline automation for Continuous Training as illustrated by Google
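The building blocks of level 1 can be sketched in a few lines of Python. This is a deliberately simplified illustration, not Google's reference implementation: the "model" is a single threshold on a hypothetical amount field, and all function names, the min_new_rows trigger, and the promotion rule are assumptions made for the example.

```python
def validate_data(rows, required=("amount", "label")):
    """Data validation: every row has the required fields and a binary label."""
    return bool(rows) and all(
        all(key in row for key in required) and row["label"] in (0, 1)
        for row in rows
    )


def train(rows):
    """Toy model: a threshold halfway between the class means of `amount`.

    Assumes both classes appear in the training rows.
    """
    pos = [r["amount"] for r in rows if r["label"] == 1]
    neg = [r["amount"] for r in rows if r["label"] == 0]
    return (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2


def accuracy(threshold, rows):
    """Share of rows where `amount > threshold` matches the label."""
    hits = sum((r["amount"] > threshold) == bool(r["label"]) for r in rows)
    return hits / len(rows)


def maybe_retrain(state, new_rows, holdout, min_new_rows=3):
    """Run one pipeline pass and return a metadata record of what happened."""
    if len(new_rows) < min_new_rows:  # pipeline trigger
        return {"status": "skipped", "reason": "not enough new data"}
    if not validate_data(new_rows):  # automated data validation
        return {"status": "rejected", "reason": "data validation failed"}
    candidate = train(state["rows"] + new_rows)
    cand_acc = accuracy(candidate, holdout)
    prod_acc = accuracy(state["threshold"], holdout)
    if cand_acc < prod_acc:  # automated model validation
        return {"status": "rejected", "reason": "candidate worse than production"}
    state["rows"] = state["rows"] + new_rows
    state["threshold"] = candidate
    return {"status": "promoted", "accuracy": cand_acc, "rows": len(state["rows"])}


# Hypothetical production state and holdout set.
state = {
    "rows": [{"amount": 10, "label": 0}, {"amount": 100, "label": 1}],
    "threshold": 55.0,
}
holdout = [
    {"amount": 20, "label": 0},
    {"amount": 40, "label": 0},
    {"amount": 60, "label": 1},
    {"amount": 90, "label": 1},
]
new_rows = [
    {"amount": 0, "label": 0},
    {"amount": 20, "label": 0},
    {"amount": 90, "label": 1},
]
print(maybe_retrain(state, new_rows, holdout))
```

Each returned record is a small piece of pipeline metadata. Storing these records is what lets a team later answer why a given model was or was not promoted.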

MLOps level 2

To provide quick and reliable updates to the production pipelines, you need a strong automated CI/CD solution. With it, your data scientists can quickly explore new ideas in feature engineering, model architecture, and hyperparameter optimization. They can then implement these ideas and automatically build, test, and deploy the new pipeline components to the target environment.

Figure 3: CI/CD and automated ML pipeline as illustrated by Google
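The "test" part of such a CI/CD setup can be pictured as a set of automated checks that every new pipeline component must pass before deployment. The Python sketch below is only an illustration under stated assumptions: log_amount is a hypothetical feature engineering component, and run_ci and deploy_if_green stand in for what a real CI service and deployment step would do.

```python
import math


def log_amount(row):
    """Pipeline component under test: adds a log-scaled feature."""
    return {**row, "log_amount": math.log1p(row["amount"])}


def check_keeps_original_fields():
    out = log_amount({"amount": 9.0, "label": 1})
    assert out["amount"] == 9.0 and out["label"] == 1


def check_zero_amount_maps_to_zero():
    assert log_amount({"amount": 0.0})["log_amount"] == 0.0


def check_monotonic():
    small = log_amount({"amount": 1.0})["log_amount"]
    large = log_amount({"amount": 100.0})["log_amount"]
    assert small < large


CHECKS = [check_keeps_original_fields, check_zero_amount_maps_to_zero, check_monotonic]


def run_ci(checks=CHECKS):
    """Run every check and return the names of the ones that failed."""
    failed = []
    for check in checks:
        try:
            check()
        except AssertionError:
            failed.append(check.__name__)
    return failed


def deploy_if_green(checks=CHECKS):
    """Gate deployment on the checks: deploy only when none of them failed."""
    failed = run_ci(checks)
    return "deployed" if not failed else "blocked: " + ", ".join(failed)


print(deploy_if_green())
```

In a real project, the checks would typically live in a test suite that a CI service runs on every commit, and the deployment step would push the component to the target environment instead of returning a string.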

Summary

There is much more that could be said about MLOps - and we're going to discuss this topic further in another in-depth piece where you'll be able to learn more about MLOps in practice and the good practices to follow. I hope this article gave you an overview of why it's worth introducing MLOps into your projects - it's not just another AI hype, but a process that streamlines work and allows for better productivity and efficiency.