The MLflow Airflow provider has been deprecated and is no longer maintained. This tutorial was kept for reference purposes only.
MLflow is a popular tool for tracking and managing machine learning models. It can be used together with Airflow for ML orchestration (MLOps), leveraging both tools for what they do best. In this tutorial, you’ll learn about three different ways you can use MLflow with Airflow.
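The DAG in this tutorial shows three different ways Airflow can interact with MLflow:
- Use an MLflow hook from the MLflow Airflow provider, such as the MLflowClientHook, to create an experiment.
- Use an MLflow operator from the MLflow Airflow provider, such as the CreateRegisteredModelOperator, to register a model.
- Use the mlflow Python package directly in a Python decorated task, for example with mlflow.sklearn.autolog. You can use this package to write custom Airflow tasks for ML-related actions like feature engineering.

This tutorial takes approximately 30 minutes to complete.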
To get the most out of this tutorial, make sure you have an understanding of:
Create a new Astro project:
Add the following packages to your packages.txt file:
Add the following packages to your requirements.txt file:
To connect Airflow to your MLflow instance, you need to create a connection in Airflow.
Run astro dev start in your Astro project to start up Airflow and open the Airflow UI at localhost:8080.
In the Airflow UI, go to Admin -> Connections and click +.
Create a new connection named mlflow_default and choose the HTTP connection type. Enter the following values to create a connection to a local MLflow instance:
- Connection ID: mlflow_default
- Connection Type: HTTP
- Host: http://host.docker.internal
- Port: 5000

If you are using a remote MLflow instance, enter your MLflow instance URL as the Host and your username and password as the Login and Password in the connection. If you are running your MLflow instance via Databricks, enter your Databricks URL as the Host, enter token as the Login and your Databricks personal access token as the Password. When you test the connection from the Airflow UI, note that the Test button might return a 405 error message even if your credentials are correct.
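As an alternative to filling in the form, Airflow (2.3 and later) can also read a connection from an environment variable named AIRFLOW_CONN_<CONN_ID> that holds a JSON document. The following is a minimal sketch that builds that value for the local connection values given above. The helper name build_mlflow_conn_env is hypothetical, and where you set the variable (for example in your project's .env file) is an assumption about your setup.

```python
import json


def build_mlflow_conn_env(host="http://host.docker.internal", port=5000,
                          login=None, password=None):
    """Return (env_var_name, json_value) for the mlflow_default connection.

    Hypothetical helper for illustration; it is not part of Airflow or MLflow.
    """
    conn = {"conn_type": "http", "host": host, "port": port}
    # Login and password are only needed for remote or Databricks instances.
    if login is not None:
        conn["login"] = login
    if password is not None:
        conn["password"] = password
    # The connection ID mlflow_default maps to AIRFLOW_CONN_MLFLOW_DEFAULT.
    return "AIRFLOW_CONN_MLFLOW_DEFAULT", json.dumps(conn)


name, value = build_mlflow_conn_env()
# Print a line in NAME='value' form that can be copied into an env file.
print(f"{name}='{value}'")
```

For a Databricks-hosted instance, the same helper could be called with login="token" and the personal access token as the password, matching the values described above.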
In your dags folder, create a file called mlflow_tutorial_dag.py.
Copy the following code into the file. Make sure to provide the name of a bucket in your object storage that is connected to your MLflow instance to the ARTIFACT_BUCKET variable.
This DAG consists of three tasks, each showing a different way to use MLflow with Airflow.
- The create_experiment task creates a new experiment in MLflow by using the MLflowClientHook in a TaskFlow API task. The MLflowClientHook is one of several hooks in the MLflow provider that contains abstractions over calls to the MLflow API.
- The scale_features task uses the mlflow package in a Python decorated task with scikit-learn to log information about the scaler to MLflow. This functionality is not included in any modules of the MLflow provider, so a custom Python function is the best way to implement this task.
- The create_registered_model task uses the CreateRegisteredModelOperator to register a new model in your MLflow instance.

In the Airflow UI, run the mlflow_tutorial_dag DAG by clicking the play button.
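Because the provider's hooks are described as abstractions over calls to the MLflow API, it can help to see what such calls look like without the provider. The sketch below builds, but does not send, two requests against the MLflow REST API using only the Python standard library: one that creates an experiment and one that creates a registered model. It is not the provider's implementation. The tracking URL, the s3:// artifact location scheme, the model name my_model, and both tags are assumptions chosen for illustration.

```python
import json
import urllib.request

MLFLOW_URL = "http://localhost:5000"    # assumption: local MLflow instance
ARTIFACT_BUCKET = "<your-bucket-name>"  # placeholder: bucket connected to MLflow


def build_request(endpoint, payload):
    """Build (but do not send) a JSON POST request to the MLflow REST API."""
    return urllib.request.Request(
        url=f"{MLFLOW_URL}/api/2.0/mlflow/{endpoint}",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Roughly what creating the Housing experiment amounts to.
# The s3:// scheme is an assumption about the object storage in use.
create_experiment = build_request(
    "experiments/create",
    {"name": "Housing", "artifact_location": f"s3://{ARTIFACT_BUCKET}/"},
)

# Roughly what registering a model with two tags amounts to.
# The model name and the tag keys and values are hypothetical.
create_registered_model = build_request(
    "registered-models/create",
    {
        "name": "my_model",
        "tags": [
            {"key": "model_type", "value": "regression"},
            {"key": "data", "value": "housing"},
        ],
    },
)

# To actually send a request against a running instance:
# urllib.request.urlopen(create_experiment)
```

Keeping the request construction separate from sending makes the payloads easy to inspect before anything is written to your MLflow instance.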

Open the MLflow UI (at localhost:5000 if you are running MLflow locally) to see the data recorded by each task in your DAG.
The create_experiment task created the Housing experiment, where your Scaler run from the scale_features task was recorded.

The create_registered_model task created a registered model with two tags.
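If you prefer to check these results from a script rather than the UI, the MLflow REST API exposes read endpoints for experiments and registered models. The following is a rough sketch using only the standard library. It assumes a local instance at localhost:5000 without authentication, and the model name my_model is a placeholder for whatever name your DAG registers. The helper names are hypothetical.

```python
import json
import urllib.parse
import urllib.request

MLFLOW_URL = "http://localhost:5000"  # assumption: local MLflow instance


def mlflow_get_url(endpoint, **params):
    """Build the URL for a GET call against the MLflow REST API."""
    query = urllib.parse.urlencode(params)
    return f"{MLFLOW_URL}/api/2.0/mlflow/{endpoint}?{query}"


def fetch_json(url):
    """Send the GET request and decode the JSON response body."""
    with urllib.request.urlopen(url) as response:
        return json.load(response)


# Look up the experiment created by the create_experiment task.
experiment_url = mlflow_get_url("experiments/get-by-name", experiment_name="Housing")
# Look up the registered model; "my_model" is a hypothetical name.
model_url = mlflow_get_url("registered-models/get", name="my_model")

if __name__ == "__main__":
    print(experiment_url)
    print(model_url)
    # With MLflow running, uncomment to inspect the stored experiment:
    # print(fetch_json(experiment_url)["experiment"])
```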

Open your object storage (at localhost:9001 if you are using a local MinIO instance) to see your MLflow artifacts.

Congratulations! You used MLflow and Airflow together in three different ways. Learn more about other operators and hooks in the MLflow Airflow provider in the official GitHub repository.