Apache Airflow®, Airflow, and the Airflow logo are trademarks of the Apache Software Foundation. Copyright © Astronomer 2026. All rights reserved.


Batch inference for product insights with Apache Airflow®


Info

This page has not yet been updated for Airflow 3. The concepts shown are relevant, but some code may need to be updated. If you run any examples, take care to update import statements and watch for any other breaking changes.

The Batch inference for product insights repository is a free and open-source reference architecture showing how to use Apache Airflow® and OpenAI to summarize product feedback and generate insights from it. The full source code is available on GitHub.

Screenshot of a Slack message showing a product summary generated by the pipeline.

This reference architecture was created as a learning tool to demonstrate how to use Apache Airflow to orchestrate data ingestion, tagging of feedback with relevant products, and per-product feedback summarization in a batch inference pipeline. You can adapt the pipeline for your use case by ingesting data from other sources and adjusting the LLM prompts to fit your needs.

Architecture

Batch inference reference architecture diagram.

This batch inference pipeline consists of four main components:

  • Data ingestion and embedding: Product feedback is ingested from a variety of sources. The ingest_zendesk_tickets DAG extracts feedback from Zendesk tickets stored in Snowflake, while the ingest_data_apis DAG extracts feedback from the GitHub and Stack Overflow APIs and from local files containing G2 reviews.
  • Product/feature tagging: Using OpenAI, the feedback is tagged with the relevant product or feature.
  • Feedback summaries and insights: All feedback relating to one product/feature is aggregated and summarized using GPT-4o. The summaries are posted to a Slack channel.
  • Executive summary: A final DAG aggregates all product summaries and insights into an executive summary that is posted to a Slack channel.
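
The four stages above can be read as a simple data flow. The following is a minimal plain-Python sketch of that flow, not code from the repository: the keyword matcher and the counting summarizer are hypothetical stand-ins for the OpenAI tagging and GPT-4o summarization calls, and the product names and feedback strings are invented for illustration.

```python
from collections import defaultdict

# Hypothetical product tags and keywords, used only for this illustration.
PRODUCT_KEYWORDS = {
    "scheduler": ["schedule", "cron"],
    "ui": ["button", "page"],
}


def tag_feedback(text: str) -> str:
    """Stand-in for the OpenAI tagging call: choose a tag by keyword match."""
    lowered = text.lower()
    for tag, keywords in PRODUCT_KEYWORDS.items():
        if any(keyword in lowered for keyword in keywords):
            return tag
    return "other"


def group_by_tag(feedback: list[str]) -> dict[str, list[str]]:
    """Aggregate all feedback items that share a product tag."""
    grouped = defaultdict(list)
    for text in feedback:
        grouped[tag_feedback(text)].append(text)
    return dict(grouped)


def summarize(tag: str, items: list[str]) -> str:
    """Stand-in for the GPT-4o summarization call: report a count per tag."""
    return f"{tag}: {len(items)} feedback item(s)"


def executive_summary(summaries: list[str]) -> str:
    """Combine the per-product summaries into one report."""
    return "\n".join(sorted(summaries))


# Stage 1: ingested feedback (invented examples).
feedback = [
    "The cron schedule editor is confusing",
    "Login page button is misaligned",
    "Please support more cron presets",
]
# Stage 2 and 3: tag, group, and summarize per product.
grouped = group_by_tag(feedback)
summaries = [summarize(tag, items) for tag, items in grouped.items()]
# Stage 4: executive summary across all products.
report = executive_summary(summaries)
print(report)
```

In the actual pipeline each stage is a separate DAG or task and the summaries are posted to Slack rather than printed. The sketch only shows how the data is reshaped from individual feedback items to per-product groups to a single report.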

Airflow features

The DAGs that power the batch inference pipelines highlight several key Airflow best practices and features:

  • Dynamic task mapping: Dynamic task mapping is used extensively to parallelize tasks throughout the pipeline. For example, feedback summarization and insight generation are parallelized to create one dynamically mapped task instance per product tag that is analyzed. Custom map indexing is used to make it easier to find specific summaries in the task logs.
  • Object Storage: Interaction with files in object storage is simplified using the experimental Airflow Object Storage API.
  • Airflow retries: To protect against transient API failures and rate limits, all tasks are configured to automatically retry after an adjustable delay.
  • Advanced data-driven scheduling: The DAGs in this reference architecture run on data-driven schedules, including conditional asset scheduling.
  • Modularization: The ingest_data_apis DAG serves as an example of a high level of modularization. Task functions are stored in the include folder and imported into the DAG file to be used in @task decorators. Ingestion sources are defined in a list of configurations with a loop generating one parallel ingestion track per source.
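
To make the modularization and retry ideas more concrete, here is a minimal sketch in plain Python, written without Airflow so that it runs on its own. It is not the repository's code: the SOURCES list, the record strings, and the make_track and with_retries helpers are all hypothetical. The with_retries wrapper only imitates the retry-after-a-delay behavior; in an Airflow task that behavior is configured with the retries and retry_delay task parameters instead of a hand-written wrapper.

```python
import time
from typing import Callable

# Hypothetical source configurations; the real repository defines its own.
SOURCES = [
    {"name": "github", "records": ["issue: mapped task logs are hard to find"]},
    {"name": "stackoverflow", "records": ["question: how do I retry a task?"]},
    {"name": "g2", "records": ["review: great scheduling features"]},
]


def with_retries(func: Callable, retries: int = 2, retry_delay: float = 0.0) -> Callable:
    """Plain-Python imitation of task retries: call func again after
    retry_delay seconds if it raises, up to `retries` additional times."""
    def wrapper(*args, **kwargs):
        for attempt in range(retries + 1):
            try:
                return func(*args, **kwargs)
            except Exception:
                if attempt == retries:
                    raise  # Out of retries: surface the last error.
                time.sleep(retry_delay)
    return wrapper


def make_track(source: dict) -> Callable[[], list[dict]]:
    """Build one ingestion track (extract, then normalize) for one source."""
    def extract() -> list[str]:
        # A real track would call an API or read a file here.
        return source["records"]

    def track() -> list[dict]:
        raw = with_retries(extract)()
        return [{"source": source["name"], "text": text} for text in raw]

    return track


# One ingestion track per configured source, generated in a loop.
tracks = {source["name"]: make_track(source) for source in SOURCES}
ingested = [row for track in tracks.values() for row in track()]
print(ingested)
```

Adding a new ingestion source in this pattern means appending one entry to the configuration list; the loop creates the corresponding track without further changes to the pipeline definition.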

Next Steps

Get the Astronomer GenAI cookbook to view more examples of how to use Airflow to build generative AI applications.

If you’d like to build your own batch inference pipeline, feel free to fork the repository and adapt it to your use case. We recommend deploying the Airflow pipelines using a free trial of Astro.