Use MLflow with Apache Airflow

This tutorial applies to Airflow 2.x. The MLflow Airflow provider has been deprecated and is no longer maintained, so this tutorial is kept for reference only.

MLflow is a popular tool for tracking and managing machine learning models. You can use it together with Airflow for ML orchestration (MLOx) and let each tool do what it does best: Airflow schedules and orchestrates your pipelines, and MLflow tracks experiments, models, and artifacts. In this tutorial, you’ll learn three different ways to use MLflow with Airflow.

Three ways to use MLflow with Airflow

The DAG in this tutorial shows three different ways Airflow can interact with MLflow:

  • Use an MLflow operator from the MLflow Airflow provider. The MLflow provider contains several operators that abstract over common actions you might want to perform in MLflow, such as creating a deployment with the CreateDeploymentOperator or running predictions from an existing model with the ModelLoadAndPredictOperator.
  • Use an MLflow hook from the MLflow Airflow provider. The MLflow provider contains several Airflow hooks that allow you to connect to MLflow using credentials stored in an Airflow connection. You can use these hooks if you need to perform actions in MLflow for which no dedicated operator exists. You can also use these hooks to create your own custom operators.
  • Use the MLflow Python package directly in a @task decorated task. The MLflow Python package contains functionality like tracking metrics and artifacts with mlflow.sklearn.autolog. You can use this package to write custom Airflow tasks for ML-related actions like feature engineering.
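The hook-based option ultimately talks to the MLflow REST API. For example, the create_experiment task in this tutorial passes the api/2.0/mlflow/experiments/create endpoint, a name, and an artifact_location to MLflowClientHook. The following stdlib-only sketch builds the POST request that this endpoint expects, without sending it, to illustrate what the hook abstracts away. It assumes a local MLflow server at http://localhost:5000 without authentication; the helper name build_create_experiment_request is hypothetical and not part of the provider.

```python
import json
import urllib.request

# Assumption: a local MLflow tracking server without authentication.
MLFLOW_BASE_URL = "http://localhost:5000"


def build_create_experiment_request(
    name: str, artifact_location: str, base_url: str = MLFLOW_BASE_URL
) -> urllib.request.Request:
    """Build (but do not send) a POST request to MLflow's experiments/create endpoint.

    Hypothetical helper: it uses the same endpoint and parameters that the
    create_experiment task passes to MLflowClientHook.run.
    """
    # The MLflow REST API expects a JSON body for POST endpoints.
    body = json.dumps({"name": name, "artifact_location": artifact_location})
    return urllib.request.Request(
        url=f"{base_url}/api/2.0/mlflow/experiments/create",
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Passing the returned request to urllib.request.urlopen sends it; on success, MLflow responds with a JSON object containing the new experiment_id.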

Time to complete

This tutorial takes approximately 30 minutes to complete.

Assumed knowledge

To get the most out of this tutorial, make sure you have an understanding of:

  • The basics of MLflow. See MLflow Concepts.
  • Airflow fundamentals, such as writing DAGs and defining tasks. See Get started with Apache Airflow.
  • Airflow operators. See Operators 101.
  • Airflow hooks. See Hooks 101.
  • Airflow connections. See Managing your Connections in Apache Airflow.

Prerequisites

  • The Astro CLI.
  • An MLflow instance. This tutorial uses a local instance.
  • An object storage bucket that your MLflow instance uses as its artifact store. This tutorial uses MinIO.

Step 1: Configure your Astro project

  1. Create a new Astro project:

    $ mkdir astro-mlflow-tutorial && cd astro-mlflow-tutorial
    $ astro dev init
  2. Add the following packages to your packages.txt file:

    git
    gcc
    python3-dev
  3. Add the following packages to your requirements.txt file:

    airflow-provider-mlflow==1.1.0
    mlflow-skinny==2.3.2
    scikit-learn

Step 2: Configure your Airflow connection

To connect Airflow to your MLflow instance, you need to create a connection in Airflow.

  1. Run astro dev start in your Astro project to start up Airflow and open the Airflow UI at localhost:8080.

  2. In the Airflow UI, go to Admin -> Connections and click +.

  3. Create a new connection named mlflow_default and choose the HTTP connection type. Enter the following values to create a connection to a local MLflow instance:

    • Connection ID: mlflow_default
    • Connection Type: HTTP
    • Host: http://host.docker.internal
    • Port: 5000

If you are using a remote MLflow instance, enter your MLflow instance URL as the Host and your username and password as the Login and Password. If you run MLflow on Databricks, enter your Databricks URL as the Host, the literal string token as the Login, and your Databricks personal access token as the Password. When you test the connection from the Airflow UI, the Test button might return a 405 error even if your credentials are correct.
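Instead of clicking through the UI, you can define the same connection with an environment variable. Airflow 2.3 and later read connections from variables named AIRFLOW_CONN_<CONN_ID> and accept a JSON object as the value. The sketch below generates such a line for the local connection described above, which you could then add to your Astro project's .env file. The helper name mlflow_conn_env_line is hypothetical, and wrapping the JSON in single quotes assumes your .env loader strips surrounding quotes.

```python
import json


def mlflow_conn_env_line(
    conn_id: str = "mlflow_default",
    host: str = "http://host.docker.internal",
    port: int = 5000,
) -> str:
    """Return a .env line that defines an HTTP connection as JSON (hypothetical helper)."""
    value = json.dumps({"conn_type": "http", "host": host, "port": port})
    # Airflow looks up AIRFLOW_CONN_<CONN_ID> with the connection ID upper-cased.
    return f"AIRFLOW_CONN_{conn_id.upper()}='{value}'"


print(mlflow_conn_env_line())
```

Connections defined this way are resolved at runtime and don't appear in the Admin -> Connections list.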

Step 3: Create your DAG

  1. In your dags folder, create a file called mlflow_tutorial_dag.py.

  2. Copy the following code into the file. Set the ARTIFACT_BUCKET variable to the name of a bucket in the object storage that is connected to your MLflow instance.

    1"""
    2### Show three ways to use MLFlow with Airflow
    3
    4This DAG shows how you can use the MLflowClientHook to create an experiment in MLFlow,
    5directly log metrics and parameters to MLFlow in a TaskFlow task via the mlflow Python package, and
    6create a new model using the CreateRegisteredModelOperator of the MLflow Airflow provider package.
    7"""
    8
    9from airflow.decorators import dag, task
    10from pendulum import datetime
    11from astro.dataframes.pandas import DataFrame
    12from mlflow_provider.hooks.client import MLflowClientHook
    13from mlflow_provider.operators.registry import CreateRegisteredModelOperator
    14
    15# Adjust these parameters
    16EXPERIMENT_ID = 1
    17ARTIFACT_BUCKET = "<your-bucket-name>"
    18
    19## MLFlow parameters
    20MLFLOW_CONN_ID = "mlflow_default"
    21EXPERIMENT_NAME = "Housing"
    22REGISTERED_MODEL_NAME = "my_model"
    23
    24
    25@dag(
    26 schedule=None,
    27 start_date=datetime(2023, 1, 1),
    28 catchup=False,
    29)
    30def mlflow_tutorial_dag():
    31 # 1. Use a hook from the MLFlow provider to interact with MLFlow within a TaskFlow task
    32 @task
    33 def create_experiment(experiment_name, artifact_bucket, **context):
    34 """Create a new MLFlow experiment with a specified name.
    35 Save artifacts to the specified S3 bucket."""
    36
    37 ts = context["ts"]
    38
    39 mlflow_hook = MLflowClientHook(mlflow_conn_id=MLFLOW_CONN_ID)
    40 new_experiment_information = mlflow_hook.run(
    41 endpoint="api/2.0/mlflow/experiments/create",
    42 request_params={
    43 "name": ts + "_" + experiment_name,
    44 "artifact_location": f"s3://{artifact_bucket}/",
    45 },
    46 ).json()
    47
    48 return new_experiment_information
    49
    50 # 2. Use mlflow.sklearn autologging in a TaskFlow task
    51 @task
    52 def scale_features(experiment_id: str):
    53 """Track feature scaling by sklearn in Mlflow."""
    54 from sklearn.datasets import fetch_california_housing
    55 from sklearn.preprocessing import StandardScaler
    56 import mlflow
    57 import pandas as pd
    58
    59 df = fetch_california_housing(download_if_missing=True, as_frame=True).frame
    60
    61 mlflow.sklearn.autolog()
    62
    63 target = "MedHouseVal"
    64 X = df.drop(target, axis=1)
    65 y = df[target]
    66
    67 scaler = StandardScaler()
    68
    69 with mlflow.start_run(experiment_id=experiment_id, run_name="Scaler") as run:
    70 X = pd.DataFrame(scaler.fit_transform(X), columns=X.columns)
    71 mlflow.sklearn.log_model(scaler, artifact_path="scaler")
    72 mlflow.log_metrics(pd.DataFrame(scaler.mean_, index=X.columns)[0].to_dict())
    73
    74 X[target] = y
    75
    76 # 3. Use an operator from the MLFlow provider to interact with MLFlow directly
    77 create_registered_model = CreateRegisteredModelOperator(
    78 task_id="create_registered_model",
    79 name="{{ ts }}" + "_" + REGISTERED_MODEL_NAME,
    80 tags=[
    81 {"key": "model_type", "value": "regression"},
    82 {"key": "data", "value": "housing"},
    83 ],
    84 )
    85
    86 (
    87 create_experiment(
    88 experiment_name=EXPERIMENT_NAME, artifact_bucket=ARTIFACT_BUCKET
    89 )
    90 >> scale_features(experiment_id=EXPERIMENT_ID)
    91 >> create_registered_model
    92 )
    93
    94
    95mlflow_tutorial_dag()

    This DAG consists of three tasks, each showing a different way to use MLflow with Airflow.

    • The create_experiment task creates a new experiment in MLflow by using the MLflowClientHook in a TaskFlow API task. The MLflowClientHook is one of several hooks in the MLflow provider that contains abstractions over calls to the MLflow API.
    • The scale_features task uses the mlflow package together with scikit-learn in a @task decorated task to log the fitted scaler and its per-feature means to MLflow. None of the modules in the MLflow provider covers this, so a custom Python function is the best way to implement this task.
    • The create_registered_model task uses the CreateRegisteredModelOperator to register a new model in your MLflow instance.
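The metrics that scale_features logs are the per-feature means learned by StandardScaler, which standardizes each column as z = (x - mean) / std using the population standard deviation. To show what ends up in the mlflow.log_metrics call, here is a dependency-free sketch of that computation on a tiny in-memory table. It illustrates the math and is not a replacement for scikit-learn; the function name standard_scale and the sample values are hypothetical.

```python
import math


def standard_scale(
    columns: dict[str, list[float]],
) -> tuple[dict[str, float], dict[str, list[float]]]:
    """Standardize each column like scikit-learn's StandardScaler (population std, ddof=0).

    Returns the per-column means (what scale_features logs as metrics)
    and the scaled columns.
    """
    means: dict[str, float] = {}
    scaled: dict[str, list[float]] = {}
    for name, values in columns.items():
        mean = sum(values) / len(values)
        std = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
        # StandardScaler uses a scale of 1.0 for constant columns instead of dividing by zero.
        scale = std if std > 0 else 1.0
        means[name] = mean
        scaled[name] = [(v - mean) / scale for v in values]
    return means, scaled


means, scaled = standard_scale({"MedInc": [1.0, 2.0, 3.0], "AveRooms": [4.0, 4.0, 4.0]})
print(means)  # {'MedInc': 2.0, 'AveRooms': 4.0}
```

In the DAG, the same {column: mean} mapping comes from scaler.mean_ and is passed to mlflow.log_metrics, so each feature appears as one metric in the Scaler run.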

Step 4: Run your DAG

  1. In the Airflow UI, run the mlflow_tutorial_dag DAG by clicking the play button.

    DAGs overview

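  2. Open the MLflow UI (at localhost:5000 if you are running MLflow locally) to see the data recorded by each task in your DAG.

    The create_experiment task created an experiment whose name is the DAG run's timestamp followed by _Housing. If EXPERIMENT_ID in your DAG file matches the ID of that experiment, it also contains the Scaler run from the scale_features task; otherwise, the Scaler run appears in the experiment with ID EXPERIMENT_ID.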

    MLflow UI experiments

    The create_registered_model task created a registered model with two tags.

    MLflow UI models

  3. Open your object storage (at localhost:9001 if you are using a local MinIO instance) to see your MLflow artifacts.

    MinIO experiment artifacts
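If you prefer to check the results without the UI, the MLflow REST API has GET endpoints for looking up an experiment by name and a registered model by name. Because the names created by this DAG start with the run's timestamp (for example 2023-01-01T00:00:00+00:00), characters such as : and + must be URL-encoded. The following sketch builds both lookup URLs, assuming a local MLflow server at http://localhost:5000 without authentication; the helper name lookup_urls and the example timestamp are hypothetical.

```python
from urllib.parse import urlencode

# Assumption: a local MLflow tracking server without authentication.
MLFLOW_BASE_URL = "http://localhost:5000"


def lookup_urls(
    ts: str,
    experiment_name: str = "Housing",
    model_name: str = "my_model",
    base_url: str = MLFLOW_BASE_URL,
) -> dict[str, str]:
    """Build GET URLs for the experiment and registered model created by one DAG run."""
    # The DAG names both objects "<ts>_<name>"; urlencode escapes ':' and '+'.
    experiment_query = urlencode({"experiment_name": f"{ts}_{experiment_name}"})
    model_query = urlencode({"name": f"{ts}_{model_name}"})
    return {
        "experiment": f"{base_url}/api/2.0/mlflow/experiments/get-by-name?{experiment_query}",
        "registered_model": f"{base_url}/api/2.0/mlflow/registered-models/get?{model_query}",
    }


for url in lookup_urls("2023-01-01T00:00:00+00:00").values():
    print(url)
```

You can open either URL in a browser or fetch it with urllib.request.urlopen. Use the timestamp of your DAG run as shown in the Airflow UI; an error response usually means the name does not match.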

Conclusion

Congratulations! You used MLflow and Airflow together in three different ways. Learn more about other operators and hooks in the MLflow Airflow provider in the official GitHub repository.