For AI agents: a documentation index is available at the root level at /llms.txt and /llms-full.txt. Append /llms.txt to any URL for a page-level index, or .md for the markdown version of any page.
      • AstroFully-managed data operations, powered by Apache Airflow.
      • Astro Private CloudRun Airflow-as-a-service in your environment.
      • Professional ServicesExpert Airflow services for your enterprise's success.
    • Tools
      • Cosmos
      • Orbiter
      • CLI
      • AI SDK
      • Agents
      • Blueprint
      • UpdatesThe State of Airflow 2026See the insights from over 5,800 data practitioners in the full report. Download Now ➔
  • Customers
  • Docs
    • Insights
      • Blog
      • Webinars
      • Resource Library
      • Events
    • Education
      • Academy
      • What is Airflow?
  • Pricing
Get Started Free
    • Overview
      • Overview
          • Hybrid search
          • Product insights
          • Fine-tuning with Anyscale
          • Use case - LLM RAG for finance
          • Use case - LLM customer feedback
      • Glossary
    • Glossary

Product

  • Platform Overview
  • Astro
  • Astro Observe
  • Astro Private Cloud
  • Security & Trust
  • Pricing

Tools & Services

  • Cosmos
  • Docs
  • Professional Services
  • Product Updates

Use Cases

  • AI Ops
  • Data Observability
  • ETL/ELT
  • ML Ops
  • Operational Analytics
  • All Use Cases

Industries

  • Financial Services
  • Gaming
  • Retail
  • Manufacturing
  • Healthcare
  • All Industries

Resources

  • Academy
  • eBooks & Guides
  • Blog
  • Webinars
  • Events
  • The Data Flowcast Podcast
  • All Resources

Airflow

  • What is Airflow
  • Airflow on Astro
  • Airflow 3.0
  • Airflow Upgrades
  • Airflow Use Cases
  • Airflow 2.x End of Life

Company

  • Our Story
  • Customers
  • Newsroom
  • Careers
  • Contact

Support

  • Knowledge Base
  • Status
  • Contact Support
GitHubYouTubeLinkedInx
  • Legal
  • Privacy
  • Terms of Service
  • Consent Preferences

  • Do Not Sell or Share My Personal information
  • Limit the Use Of My Sensitive Personal Information

Apache Airflow®, Airflow, and the Airflow logo are trademarks of the Apache Software Foundation. Copyright © Astronomer 2026. All rights reserved.

LogoLogo
On this page
  • Architecture
  • Airflow features
  • Next Steps
Airflow 2.xReference ArchitecturesGenAI

Hybrid Search for eCommerce reference architecture

Edit this page
Built with

The Hybrid Search for eCommerce GitHub repository is a free and open-source reference architecture showing how to use Apache Airflow® with Weaviate to build an automated hybrid search application. A demo of the architecture was shown in the Modern Infrastructure for World Class AI Applications webinar.

Screenshot of the Hybrid Search application frontend.

This reference architecture demonstrates how to use Apache Airflow to orchestrate RAG data ingestion that powers a search application as well as a batch inference pipeline analyzing search queries. It also shows how to use Weaviate’s advanced search capabilities. You can adapt the Hybrid Search application to your use case by ingesting your own data and adjusting the search queries in the website backend to fit your needs.

Architecture

Hybrid search reference architecture diagram.

The hybrid search reference architecture consists of 3 main components:

  • Data ingestion and embedding: Sample data containing product descriptions and images is ingested from Amazon S3 and Snowflake into Weaviate, a vector database. Embedding of the product descriptions uses OpenAI models.
  • Hybrid search: The demo website with a Flask backend and React frontend allows users to experiment with advanced Weaviate search by querying the product descriptions using hybrid search. An OpenAI embedding model is used to embed the user query.
  • Batch inference: All user search queries are stored back in Weaviate so they can be used by a downstream Airflow DAG that runs an OpenAI batch inference pipeline to classify user queries and derive product insights. The results of this analysis are loaded into Snowflake to be displayed in a Streamlit dashboard.

Airflow features

The DAGs that power this hybrid search application highlight several key Airflow best practices and features:

  • Airflow retries: To protect against transient API failures and rate limits, all tasks are configured to automatically retry after an adjustable delay.
  • Advanced data-driven scheduling: The DAGs in this reference architecture run on data-driven schedules, including combined dataset and time scheduling and conditional dataset scheduling.
  • Dynamic task mapping: Product information extraction and ingestion into Weaviate are split into multiple parallelized tasks, the number of which is determined at runtime based on the number of ingestion folders with product information that needs to be processed.
  • Object Storage: Interaction with files in object storage is simplified using the experimental Airflow Object Storage API.
  • Modularization: Functions defining how information is extracted and checksums are calculated are modularized in the include folder and imported into the DAGs. This makes the DAG code more readable and offers the ability to reuse functions across multiple DAGs.

Next Steps

Get the Astronomer GenAI cookbook to view more examples of how to use Airflow to build generative AI applications.

If you’d like to build your own hybrid search application, feel free to fork the repository and adapt it to your use case. We recommend to deploy the Airflow pipelines using a free trial of Astro.