ELT with Snowflake and Apache Airflow® for eCommerce
This reference architecture shows how to build an ELT pipeline that ingests eCommerce transaction data, loads it into Snowflake, transforms it through multiple SQL layers, runs data quality checks at each stage, and displays the results in a Streamlit dashboard. Apache Airflow® orchestrates the entire flow across multiple Dags that are chained together using data-aware scheduling. The architecture is demonstrated in the webinar "Implementing reliable ETL & ELT pipelines with Airflow and Snowflake".

The architecture demonstrates a pattern common in analytics teams: extracting from an API, staging raw files in object storage with a clear lifecycle (ingest, stage, archive), loading to a warehouse, running SQL transformations with built-in data quality gates, and surfacing the results in a dashboard. You can adapt it by swapping the data source, adjusting the SQL transformations, or replacing the Streamlit dashboard.

This reference architecture consists of four main components:
In addition to the main pipeline, two housekeeping Dags manage the object storage lifecycle by moving raw files from ingest to stage to archive as they are processed. This keeps the ingest location clean and provides an audit trail of all ingested data.
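The ingest, stage, archive lifecycle can be sketched in a few lines of plain Python. This is only an illustration: local folders stand in for object storage, and the folder names, the `advance` helper, and the example file name are hypothetical, not taken from the reference implementation.

```python
import shutil
import tempfile
from pathlib import Path

# Hypothetical folder names mirroring the ingest -> stage -> archive lifecycle.
LIFECYCLE = ("ingest", "stage", "archive")


def advance(root: Path, filename: str) -> Path:
    """Move a file one step along the lifecycle and return its new path."""
    for current, following in zip(LIFECYCLE, LIFECYCLE[1:]):
        source = root / current / filename
        if source.exists():
            target = root / following / filename
            target.parent.mkdir(parents=True, exist_ok=True)
            shutil.move(str(source), str(target))
            return target
    raise FileNotFoundError(f"{filename} is not in ingest or stage")


# Demo: a temporary directory stands in for the bucket (assumption).
root = Path(tempfile.mkdtemp())
(root / "ingest").mkdir()
(root / "ingest" / "sales_2024-01-01.json").write_text('{"order_id": 1}')

staged = advance(root, "sales_2024-01-01.json")    # ingest -> stage
archived = advance(root, "sales_2024-01-01.json")  # stage -> archive
ingest_left = list((root / "ingest").iterdir())    # ingest is empty again
```

Because every file ends up under the archive folder rather than being deleted, the archive doubles as the audit trail, while the ingest folder only ever holds files that have not yet been picked up.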
Data flows through the system in stages: API to object storage to Snowflake raw tables to transformed views to dashboard. Each Dag handles one phase and triggers the next through data-aware scheduling, so downstream work only starts when upstream data is ready.
Data quality checks use the SQLColumnCheckOperator (validating column-level constraints like non-null and value ranges) and the SQLTableCheckOperator (validating table-level conditions like row counts). Some checks are blocking and stop the pipeline on failure, while others are non-blocking and only send a notification.
Notifications are sent through an on_failure_callback at the task group level, so the team is informed without halting the entire pipeline.
SQL code is stored in the include folder and executed by SQLExecuteQueryOperator tasks in the Dags. Python helper functions and data quality check definitions are modularized as well, separating orchestration logic from business logic and making individual components reusable across Dags.
To build your own ELT pipeline with Snowflake and Apache Airflow, explore the individual Learn guides linked in the Airflow features section for detailed implementation guidance on each pattern. Astronomer recommends deploying Airflow pipelines using a free trial of Astro.
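The difference between blocking and non-blocking data quality checks can be illustrated without Airflow. The sketch below is not how SQLColumnCheckOperator or SQLTableCheckOperator are implemented: it uses an in-memory SQLite database as a stand-in for Snowflake, and the table, column names, check names, and the `run_checks` helper are all hypothetical.

```python
import sqlite3

# In-memory SQLite stands in for Snowflake; the table and its rows are made up.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (order_id INTEGER, quantity INTEGER)")
conn.executemany("INSERT INTO sales VALUES (?, ?)", [(1, 2), (2, 5), (3, None)])

# Each check: (name, SQL returning 1 for pass or 0 for fail, blocking flag).
CHECKS = [
    ("row_count", "SELECT COUNT(*) > 0 FROM sales", True),
    ("order_id_not_null",
     "SELECT COUNT(*) = 0 FROM sales WHERE order_id IS NULL", True),
    ("quantity_not_null",
     "SELECT COUNT(*) = 0 FROM sales WHERE quantity IS NULL", False),
]


def run_checks(connection, checks):
    """Raise on a failed blocking check; collect failed non-blocking ones."""
    warnings = []
    for name, query, blocking in checks:
        passed = bool(connection.execute(query).fetchone()[0])
        if passed:
            continue
        if blocking:
            # A blocking failure stops everything downstream.
            raise ValueError(f"blocking check failed: {name}")
        # A non-blocking failure is only recorded so someone can be notified.
        warnings.append(name)
    return warnings


warnings = run_checks(conn, CHECKS)
```

In an Airflow Dag, the raised exception corresponds to a failed task that halts downstream tasks, and the collected warnings correspond to the notification path for non-blocking checks.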