ELT with BigQuery, dbt, and Apache Airflow® for eCommerce
The ELT with BigQuery, dbt, and Apache Airflow® GitHub repository is a free and open-source reference architecture showing how to use Apache Airflow® with Google BigQuery and dbt Core to build an end-to-end ELT pipeline. The pipeline ingests data from an eCommerce store’s API, loads the data into BigQuery, and completes several transformation steps using dbt Core, run with Astronomer Cosmos. After the reporting tables are created, a message listing the current top customers is sent to a Slack channel.

This reference architecture was created as a learning tool to demonstrate how to use Apache Airflow to orchestrate data ingestion into object storage and a data warehouse, as well as how to use dbt Core to transform the data in several steps. You can adapt the pipeline for your use case by ingesting data from other sources and adjusting the dbt transformations.

This reference architecture consists of 4 main components:
The DAGs in this reference architecture highlight several key Airflow best practices and features:
SQL queries are stored in the include folder and imported into the DAG file to be used in tasks using the BigQueryInsertJobOperator. This makes the DAG code more readable and offers the ability to reuse SQL queries across multiple DAGs.

This reference architecture contains a dbt Core project. Astro customers can deploy dbt projects to Astro using the dbt Deploys feature. This feature allows you to deploy dbt Core projects from any code location, including separate repositories, to an Astro Deployment with enhanced observability in the Astro UI.
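The practice of keeping SQL in the include folder and importing it into DAG files, mentioned above, could look roughly like the following. This is a minimal sketch and not the repository's actual code: the module layout, the query, the table and column names, and the helpers build_query_job and top_customers_task are all hypothetical. Only BigQueryInsertJobOperator and its configuration argument come from the Airflow Google provider.

```python
"""Sketch: SQL kept in an include module and reused by BigQuery tasks."""

# --- include/sql_queries.py (hypothetical module name) ---
# Table and column names are illustrative, not taken from the repository.
TOP_CUSTOMERS_SQL = """
SELECT customer_id, SUM(order_total) AS revenue
FROM `{project}.{dataset}.orders`
GROUP BY customer_id
ORDER BY revenue DESC
LIMIT {limit}
"""


def build_query_job(sql: str, **params: object) -> dict:
    """Fill the placeholders in sql and wrap it in a BigQuery query-job config."""
    return {
        "query": {
            "query": sql.format(**params).strip(),
            "useLegacySql": False,  # run the query as standard SQL
        }
    }


# --- dags/reporting.py (hypothetical DAG file) ---
def top_customers_task(project: str, dataset: str, limit: int = 10):
    """Create the operator; intended to be called inside a DAG definition."""
    # Imported lazily so the helper above stays usable without Airflow installed.
    from airflow.providers.google.cloud.operators.bigquery import (
        BigQueryInsertJobOperator,
    )

    return BigQueryInsertJobOperator(
        task_id="top_customers",
        configuration=build_query_job(
            TOP_CUSTOMERS_SQL, project=project, dataset=dataset, limit=limit
        ),
    )


# Inspect the rendered query without touching Airflow or BigQuery.
job = build_query_job(TOP_CUSTOMERS_SQL, project="my-project", dataset="shop", limit=5)
print(job["query"]["query"])
```

Because the query lives in one module, any DAG that needs it can import the same constant instead of embedding its own copy of the SQL string.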

If you’d like to build your own ELT or ETL pipeline with BigQuery, dbt Core, and Apache Airflow®, feel free to fork the repository and adapt it to your use case. We recommend deploying the Airflow pipelines using a free trial of Astro.