ReasonField Lab
VirtusLab Group Company
hello@reasonfieldlab.com
ReasonField Lab 2024, All Rights Reserved
With the rise of AI and ML, a new acronym has become popular: MLOps. MLOps stands for Machine Learning Operations. This concept focuses on streamlining the process of bringing ML models to production in an easy, safe, and organised fashion.
In 2019, executives at Gap said that 87% of data science projects never make it to production. You may ask: “How is that possible?” Mainly because, at the time, there were no established best practices, and in most companies deployment was chaotic. MLOps practices emerged to address that.
In this blog post, I will explain in simple terms what MLOps is, how it is applied to Machine Learning pipelines, and why it matters. Then, I will outline MLOps practices and how they relate to DevOps. Finally, I will show how a system differs with and without MLOps.
MLOps is a set of practices for collaboration between data scientists and operations professionals (that is, between the people who work on models and the people who use those models as software components). Applying these practices:
The chart above shows the relationships between the different parts of the Machine Learning cycle. It starts with Data Sourcing and Data Labelling, followed by Experiment Tracking, Model Versioning, and Model Deployment. Finally, we monitor predictions to control how our model is used. To ensure high quality, MLOps covers every stage of the Machine Learning process. Now, let's talk about why it is needed.
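To make the Model Versioning stage of this cycle a little more concrete, here is a minimal in-memory sketch of a model registry. The `ModelRegistry` and `ModelVersion` classes are hypothetical illustrations, not any specific tool's API: each trained model is stored together with its training parameters and test metrics, and its version ID is derived from a hash of its contents, so retraining with identical weights and parameters yields the same ID.

```python
import hashlib
import json
from dataclasses import dataclass, field


@dataclass
class ModelVersion:
    version: str   # content-derived version ID
    params: dict   # hyperparameters used for training
    metrics: dict  # evaluation results on the test set
    weights: list  # the model itself (here: a plain list of floats)


@dataclass
class ModelRegistry:
    """Hypothetical in-memory registry; a real one would persist to storage."""
    versions: dict = field(default_factory=dict)

    def register(self, weights, params, metrics):
        # Hash weights + params so identical models get identical version IDs.
        payload = json.dumps({"weights": weights, "params": params}, sort_keys=True)
        version = hashlib.sha256(payload.encode()).hexdigest()[:12]
        self.versions[version] = ModelVersion(version, params, metrics, weights)
        return version

    def best(self, metric):
        # Return the registered version with the highest value of `metric`.
        return max(self.versions.values(), key=lambda v: v.metrics[metric])


registry = ModelRegistry()
v1 = registry.register([0.1, 0.2], {"lr": 0.01}, {"accuracy": 0.81})
v2 = registry.register([0.3, 0.1], {"lr": 0.001}, {"accuracy": 0.86})
print(registry.best("accuracy").version == v2)
```

Keeping parameters and metrics next to the weights is what later lets you answer "which experiment produced the model that is running in production?".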
MLOps provides a structure that eases model training and deployment and lowers operational costs.
It can also significantly reduce risk, because it provides monitoring tools. With monitoring in place, many problems can be caught early, such as data drift: the data your model receives in production diverges from its training data, and the model may perform significantly worse as a result. For example, many Machine Learning models broke down during the COVID-19 pandemic, an unforeseeable anomaly in the data they had been trained on.
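One common way to catch data drift is to compare the distribution of a feature in production with its distribution in the training data. The sketch below computes the Population Stability Index (PSI), a widely used drift statistic chosen here for illustration (monitoring tools may use other tests), in pure Python. The equal-width binning, the bin count, and the 0.2 alert threshold are conventional rules of thumb, not fixed standards.

```python
import math
import random


def psi(reference, current, bins=10, eps=1e-6):
    """Population Stability Index between a reference sample and a current sample.

    Bins are equal-width over the reference range; values outside that range
    are clamped into the first or last bin.
    """
    lo, hi = min(reference), max(reference)
    width = (hi - lo) / bins or 1.0  # avoid zero width for constant data

    def proportions(sample):
        counts = [0] * bins
        for x in sample:
            idx = min(max(int((x - lo) / width), 0), bins - 1)  # clamp to valid bins
            counts[idx] += 1
        # eps keeps empty bins from producing log(0) or division by zero
        return [max(c / len(sample), eps) for c in counts]

    ref_p, cur_p = proportions(reference), proportions(current)
    return sum((c - r) * math.log(c / r) for r, c in zip(ref_p, cur_p))


def has_drifted(reference, current, threshold=0.2):
    # A PSI above ~0.2 is often read as a significant shift (rule of thumb).
    return psi(reference, current) > threshold


random.seed(0)
train = [random.gauss(100, 10) for _ in range(5000)]     # training-time feature values
same = [random.gauss(100, 10) for _ in range(5000)]      # production, no drift
shifted = [random.gauss(130, 10) for _ in range(5000)]   # production, drifted
print(has_drifted(train, same), has_drifted(train, shifted))
```

In practice such a check would run per feature on a schedule, and a positive result would raise an alert rather than act on its own.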
Applying MLOps also allows you to scale your solution once you gather more data or need to support 10x more users. It lets you perform a soft (gradual) release of new model versions, test them (e.g., with A/B testing), and scale dynamically when needed, which reduces operating costs when the number of requests varies. Additionally, you can track and control changes in datasets and experimental setups.
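A soft release or an A/B test needs a way to send a fixed share of traffic to the new model. A minimal sketch, assuming each request carries a stable user ID: hashing the ID gives every user a consistent bucket, so the same user always sees the same model version. The function name and the 10% default share are illustrative assumptions.

```python
import hashlib


def route_request(user_id: str, candidate_share: float = 0.1) -> str:
    """Return which model should serve this user: 'candidate' or 'production'."""
    # Map the ID to a number in [0, 1). SHA-256 is stable across processes,
    # unlike Python's built-in hash(), which is randomised per process for str.
    digest = hashlib.sha256(user_id.encode()).digest()
    bucket = int.from_bytes(digest[:8], "big") / 2**64
    return "candidate" if bucket < candidate_share else "production"


users = [f"user-{i}" for i in range(10_000)]
share = sum(route_request(u) == "candidate" for u in users) / len(users)
print(f"{share:.1%} of users routed to the candidate model")
```

Raising `candidate_share` step by step (for example 1%, 10%, 50%, 100%) turns the same function into a gradual rollout, while the metrics of both groups are compared.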
The whole MLOps paradigm revolves around its practices. Now, I will outline where those practices are applied in the Machine Learning pipeline:
MLOps was heavily inspired by DevOps, its success, and its positive influence on Software Engineering. Just as DevOps takes a rapid, iterative approach to building applications, MLOps applies similar principles to shipping machine learning models to production.
On the other hand, MLOps faces some additional challenges. First, MLOps is more experimental, because model training is probabilistic rather than deterministic. As a result, reproducibility is harder to guarantee in Machine Learning than in Software.
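Part of this non-determinism comes from random weight initialisation and data shuffling. The pure-Python sketch below uses a toy training loop (a one-feature linear model fitted with stochastic gradient descent, chosen only for illustration) to show the usual mitigation: pass an explicit seed and record it with the experiment, so the run can be repeated exactly. Real frameworks have their own seeding functions, and some hardware operations can remain non-deterministic even with a fixed seed.

```python
import random


def train_linear_model(data, epochs=20, lr=0.01, seed=None):
    """Fit y = w*x + b with SGD; random init and shuffling depend on `seed`."""
    rng = random.Random(seed)  # isolated RNG, so global random state doesn't leak in
    w, b = rng.uniform(-1, 1), rng.uniform(-1, 1)
    data = list(data)
    for _ in range(epochs):
        rng.shuffle(data)  # example order affects the final weights
        for x, y in data:
            err = (w * x + b) - y
            w -= lr * err * x
            b -= lr * err
    return w, b


data = [(x / 10, 2 * (x / 10) + 1) for x in range(20)]
run_a = train_linear_model(data, seed=42)
run_b = train_linear_model(data, seed=42)
run_c = train_linear_model(data, seed=7)
print(run_a == run_b)  # same seed: identical weights
print(run_a == run_c)  # different seed: different weights
```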
Secondly, Software teams combine less technical roles, such as UI/UX designers and PMs, with more technical, code-focused roles, such as Software Engineers. ML teams have all of the above plus additional technical staff: Data Scientists, Machine Learning Researchers, and Machine Learning Engineers, each focusing on a different aspect of the solution while collaborating closely with the others.
Model testing is also significantly more challenging and less predictable than Software testing. Designing ML tests is not yet formalised, whereas software testing is a well-established field with distinct scopes (unit, integration, system, and acceptance). In ML systems, you must also add a model validation step on top of all the software tests.
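A model validation step can be written as ordinary test code that runs after the software tests. The sketch below is one possible form, assuming a regression model exposed as a plain function and a held-out dataset; the metric (mean absolute error), the per-slice check, and both thresholds are illustrative assumptions that a team would tune to its own problem.

```python
def mean_absolute_error(model, rows):
    return sum(abs(model(x) - y) for x, y in rows) / len(rows)


def validate_model(model, holdout, slices, max_mae=1.0, max_slice_mae=1.5):
    """Return a list of failure messages; an empty list means the model passes.

    holdout: list of (features, target) pairs
    slices:  dict mapping a slice name to a subset of (features, target) pairs
    """
    failures = []
    overall = mean_absolute_error(model, holdout)
    if overall > max_mae:
        failures.append(f"overall MAE {overall:.3f} > {max_mae}")
    # A model can look fine on average and still fail badly on a subgroup.
    for name, rows in slices.items():
        mae = mean_absolute_error(model, rows)
        if mae > max_slice_mae:
            failures.append(f"slice '{name}' MAE {mae:.3f} > {max_slice_mae}")
    return failures


def good_model(x):
    return 2 * x + 1.2  # small, uniform error


def bad_model(x):
    return 2 * x + (1 if x < 5 else 5)  # perfect on small x, far off on large x


holdout = [(x, 2 * x + 1) for x in range(10)]
slices = {"small_x": holdout[:5], "large_x": holdout[5:]}
print(validate_model(good_model, holdout, slices))
print(validate_model(bad_model, holdout, slices))
```

Returning messages instead of a bare boolean makes the reason for a blocked release visible in the pipeline logs.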
Automated deployment is also harder in ML than in Software, because of the many small decisions Data Scientists make during model training. To automate the full process, you have to replicate their reasoning in code and add some form of monitoring to avoid shipping models of questionable quality.
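Replicating that reasoning often means writing down the judgement a Data Scientist would otherwise apply by hand. One common pattern, shown here as my own illustration rather than a rule from this post, is a champion/challenger check: the newly trained model is promoted only if it beats the current production model by a minimum margin on the primary metric and does not regress noticeably on any guard-rail metric. The metric names and tolerances below are hypothetical.

```python
def should_promote(champion, challenger, primary="mae", min_gain=0.02,
                   guardrails=("p95_latency_ms",), max_regression=0.05):
    """Decide whether the challenger model replaces the champion in production.

    All metrics here are 'lower is better'. `min_gain` and `max_regression`
    are relative: 0.02 means the challenger must be at least 2% better.
    """
    gain = (champion[primary] - challenger[primary]) / champion[primary]
    if gain < min_gain:
        return False  # not clearly better: keep the model already in production
    for metric in guardrails:
        regression = (challenger[metric] - champion[metric]) / champion[metric]
        if regression > max_regression:
            return False  # better on the main metric, unacceptable elsewhere
    return True


champion = {"mae": 5.0, "p95_latency_ms": 40.0}
print(should_promote(champion, {"mae": 4.5, "p95_latency_ms": 41.0}))   # 10% better, latency +2.5%
print(should_promote(champion, {"mae": 4.95, "p95_latency_ms": 40.0}))  # only 1% better
print(should_promote(champion, {"mae": 4.0, "p95_latency_ms": 60.0}))   # latency +50%
```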
With ML systems, you may also experience seasonal changes or a slow evolution in what the incoming data looks like, which is uncommon in Software systems. Model performance can decay in more ways than conventional software can, so you should take that into account when designing and operating these systems. Such changes in incoming data can be caused by the following:
Monitoring is also more complicated than in classical software: on top of the usual checks, you have to track statistics on how the model behaves, raise alerts when its performance degrades, and support rollbacks to previous versions and triggers for training a new model.
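Here is a minimal sketch of how those statistics can be turned into the actions listed above. The `ModelMonitor` class is hypothetical: it assumes ground truth eventually arrives (as it does for stock prices), keeps a rolling window of absolute prediction errors, and compares the rolling error with the error measured at validation time. The window size and multipliers are illustrative.

```python
from collections import deque


class ModelMonitor:
    """Tracks recent absolute errors and maps them to operational actions."""

    def __init__(self, baseline_mae, window=100, alert_factor=1.5, rollback_factor=3.0):
        self.baseline_mae = baseline_mae    # MAE measured at validation time
        self.errors = deque(maxlen=window)  # keeps only the most recent `window` errors
        self.alert_factor = alert_factor
        self.rollback_factor = rollback_factor

    def record(self, prediction, actual):
        self.errors.append(abs(prediction - actual))

    def rolling_mae(self):
        return sum(self.errors) / len(self.errors) if self.errors else 0.0

    def actions(self):
        """Return the actions the current statistics call for."""
        if len(self.errors) < self.errors.maxlen:
            return []  # not enough observations yet to judge
        mae = self.rolling_mae()
        if mae > self.rollback_factor * self.baseline_mae:
            return ["alert", "rollback", "retrain"]
        if mae > self.alert_factor * self.baseline_mae:
            return ["alert", "retrain"]
        return []


monitor = ModelMonitor(baseline_mae=2.0, window=5)
for pred, actual in [(100, 101), (102, 101), (99, 100), (101, 103), (100, 99)]:
    monitor.record(pred, actual)
print(monitor.rolling_mae(), monitor.actions())
for pred, actual in [(100, 110)] * 5:
    monitor.record(pred, actual)
print(monitor.rolling_mae(), monitor.actions())
```

Separating "what the statistics say" from "what the system does" keeps the thresholds easy to review and change without touching the serving code.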
To better show how MLOps affects a project, including its problems and shortcomings, let's consider an example: predicting a tech company's stock price with a stock price prediction model.
When we have a simple model trained on the available data, we optimise it against a fixed test dataset. A typical ML project at this stage looks like this: you have a pipeline that works in your Jupyter Notebook. In our example, we trained a well-performing model that predicts the price of Tesla stock in 2021-2022 based on historical data. Unfortunately, you cannot use it outside your Notebook, because it is not yet deployed to the cloud.
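The first step out of the notebook is usually to put the model behind an API that other services can call. A minimal standard-library sketch, assuming a hypothetical `predict_next_price` function (a naive five-day moving average standing in for the trained model) and a JSON `POST /predict` endpoint; a production service would typically use a proper web framework and load a versioned model artefact instead.

```python
import json
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


def predict_next_price(history):
    """Stand-in for the trained model: average of the last five closing prices."""
    recent = history[-5:]
    return sum(recent) / len(recent)


class PredictHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        if self.path != "/predict":
            self.send_error(404)
            return
        length = int(self.headers.get("Content-Length", 0))
        try:
            body = json.loads(self.rfile.read(length))
            prediction = predict_next_price(body["history"])
        except (ValueError, KeyError, TypeError, ZeroDivisionError):
            self.send_error(400, 'expected JSON like {"history": [prices...]}')
            return
        payload = json.dumps({"prediction": prediction}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)


def serve(host="127.0.0.1", port=8000):
    # Blocks until interrupted. Try it with:
    # curl -X POST localhost:8000/predict -d '{"history": [250, 252, 249, 255, 260]}'
    ThreadingHTTPServer((host, port), PredictHandler).serve_forever()
```

Calling `serve()` starts the service locally; deploying the same process to the cloud is what makes the model usable outside the notebook.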
With MLOps, the process is more complex, but it is also far more useful.
In this blog post, I have covered what MLOps is, its benefits and practices, and an example of a stock price prediction model's transition from a basic stage without any MLOps to a fully fledged MLOps system.
For more technical information and hints on applying MLOps successfully, please refer to the MLOps 101 blog post.