ELT with Snowflake and Apache Airflow® for eCommerce
ELT with Snowflake and Apache Airflow® for eCommerce
ELT with Snowflake and Apache Airflow® for eCommerce
The ELT with Snowflake and Apache Airflow® GitHub repository is a free and open-source reference architecture showing how to use Apache Airflow® with Snowflake to build an end-to-end ELT pipeline. The pipeline ingests data from an eCommerce store’s API, completes several transformation steps in SQL, and displays the data in a dashboard. A demo of the architecture is shown in the Implementing reliable ETL & ELT pipelines with Airflow and Snowflake webinar.

This reference architecture was created as a learning tool to demonstrate how to use Apache Airflow to orchestrate data ingestion into object storage and a data warehouse, as well as how to use Snowflake to transform the data in several steps and display it in a dashboard. You can adapt the pipeline for your use case by ingesting data from other sources, adjusting the SQL transformations, and changing the dashboard to fit your needs.

This reference architecture consists of 4 main components:
Additionally, two DAGs move the raw data between different object storage locations (ingest/stage/archive) to archive records that have been processed.
The DAGs in this reference architecture highlight several key Airflow best practices and features:
on_failure_callback at the task-group-level.include folder and imported into the DAG file to be used in SQLExecuteQueryOperator tasks. This makes the DAG code more readable and offers the ability to reuse SQL queries across multiple DAGs. Additionally, some Python functions and data quality check definitions are modularized as well.If you’d like to build your own ELT/ETL pipeline with Snowflake and Apache Airflow®, feel free to fork the repository and adapt it to your use case. We recommend deploying the Airflow pipelines using a free trial of Astro.