Batch inference for product insights with Apache Airflow®
The Batch inference for product insights repository is a free and open-source reference architecture showing how to use Apache Airflow® and OpenAI to summarize product feedback and generate insights from it. The full source code is available on GitHub.

This reference architecture was created as a learning tool. It demonstrates how to use Apache Airflow to orchestrate data ingestion, tagging of feedback with relevant products, and per-product feedback summarization in a batch inference pipeline. You can adapt the pipeline to your use case by ingesting data from other sources and adjusting the LLM prompts to fit your needs.

This batch inference pipeline consists of 4 main components:
The ingest_zendesk_tickets DAG extracts feedback from Zendesk tickets stored in Snowflake, while the ingest_data_apis DAG extracts feedback from the GitHub and StackOverflow APIs, as well as from local files containing G2 reviews.

The DAGs that power the batch inference pipeline highlight several key Airflow best practices and features:
The ingest_data_apis DAG serves as an example of a high level of modularization. Task functions are stored in the include folder and imported into the DAG file, where they are used with @task decorators. Ingestion sources are defined in a list of configurations, and a loop generates one parallel ingestion track per source.

Get the Astronomer GenAI cookbook to view more examples of how to use Airflow to build generative AI applications.
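The modular pattern described for the ingest_data_apis DAG (task functions defined outside the DAG file, plus a loop over source configurations) can be sketched roughly as follows. This is a minimal illustration, not the repository's actual code: the function names extract_feedback and normalize_feedback, the INGESTION_SOURCES list, its keys, and the DAG name ingest_data_sketch are all hypothetical, and the task functions return placeholder data instead of calling any API.

```python
from datetime import datetime


# Hypothetical task functions. In the layout described above they would live in
# the include folder and be imported into the DAG file.
def extract_feedback(source_name: str) -> list[dict]:
    """Return raw feedback records for one source (placeholder data only)."""
    return [{"source": source_name, "text": f"  Example feedback from {source_name}  "}]


def normalize_feedback(records: list[dict]) -> list[dict]:
    """Trim and lowercase the feedback text of each record."""
    return [{**r, "text": r["text"].strip().lower()} for r in records]


# One configuration per ingestion source; the names and keys are illustrative.
INGESTION_SOURCES = [
    {"name": "github"},
    {"name": "stackoverflow"},
    {"name": "g2"},
]

try:
    from airflow.decorators import dag, task
except ImportError:
    # Guard so the plain functions above can be tried without Airflow installed.
    dag = task = None

if dag is not None:

    @dag(start_date=datetime(2024, 1, 1), schedule=None, catchup=False)
    def ingest_data_sketch():
        # The loop creates one independent extract -> normalize track per source.
        for source in INGESTION_SOURCES:
            name = source["name"]
            raw = task(task_id=f"extract_{name}")(extract_feedback)(name)
            task(task_id=f"normalize_{name}")(normalize_feedback)(raw)

    ingest_data_sketch()
```

Because the task functions are plain Python, they can be unit-tested without a running Airflow environment, and adding a new ingestion source only requires a new entry in the configuration list.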
If you’d like to build your own batch inference pipeline, feel free to fork the repository and adapt it to your use case. We recommend deploying the Airflow pipelines using a free trial of Astro.