The Astro Data Plane is designed to withstand in-region Availability Zone (AZ) degradations and outages as described in Resilience. For full region outages on AWS dedicated clusters, Astro supports self-service cross-region disaster recovery (DR).
For a detailed overview of Astro’s AWS disaster recovery architecture, see the AWS disaster recovery whitepaper in the Astronomer Trust Center.
Self-service cross-region disaster recovery requires the Enterprise Business Critical tier and is currently available for AWS dedicated clusters only. Support for GCP and Azure is planned for later this year.
Cross-region DR lets you configure a pair of dedicated clusters, a primary and a secondary, in different AWS regions. The secondary cluster stays continuously synchronized with the primary so you can fail over with minimal downtime and data loss. After failover, Astro automatically enables synchronization in the reverse direction, which keeps the original primary ready for failback. When the primary region recovers, you can fail back with a single click.
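The failover and failback behavior described above can be pictured as a small state model. The following Python sketch is illustrative only and does not call any Astro API. The `Cluster` and `DRPair` classes, their methods, and the cluster names and regions are all hypothetical, chosen to show how the synchronization direction follows whichever cluster is currently active.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Cluster:
    """Hypothetical stand-in for a dedicated cluster (not an Astro object)."""
    name: str
    region: str


@dataclass
class DRPair:
    """Hypothetical model of a primary/secondary pair in different regions."""
    primary: Cluster
    secondary: Cluster
    active: str = "primary"  # which cluster currently serves workloads

    def __post_init__(self) -> None:
        # The pair only protects against a region outage if regions differ.
        if self.primary.region == self.secondary.region:
            raise ValueError("primary and secondary must be in different regions")

    def sync_direction(self) -> Tuple[str, str]:
        """Return (source region, destination region) for synchronization."""
        if self.active == "primary":
            return (self.primary.region, self.secondary.region)
        return (self.secondary.region, self.primary.region)

    def failover(self) -> None:
        """Make the secondary active; synchronization then runs in reverse."""
        if self.active != "primary":
            raise RuntimeError("already failed over")
        self.active = "secondary"

    def failback(self) -> None:
        """Return to the original primary once its region has recovered."""
        if self.active != "secondary":
            raise RuntimeError("nothing to fail back from")
        self.active = "primary"


# Example with made-up cluster names and regions.
pair = DRPair(Cluster("prod-a", "us-east-1"), Cluster("prod-b", "us-west-2"))
pair.failover()
print(pair.sync_direction())  # source is now the secondary's region
pair.failback()
print(pair.sync_direction())  # source is the primary's region again
```

In this sketch the designations "primary" and "secondary" never change. Only the `active` marker moves, which mirrors how the text refers to the "original primary" after a failover.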
The following table defines the recovery time objective (RTO) and recovery point objective (RPO) for DR clusters. Targets are benchmarked with 80+ Deployments and 1,250+ concurrent task runs.
See Task Logs Replication SLA for details on the RPO guarantee.
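As a reminder of what the two objectives measure, the following Python sketch computes the achieved RPO and RTO for a single incident and compares them with targets. It uses the general definitions of the terms: RPO is the window of data written before the outage that had not yet been replicated, and RTO is the time from the outage until service is restored. The function names, timestamps, and target values are hypothetical placeholders, not Astro's published targets.

```python
from datetime import datetime, timedelta


def achieved_rpo(outage_start: datetime, last_replicated: datetime) -> timedelta:
    """Data-loss window: time between the last replicated write and the outage."""
    # Clamp at zero in case replication was fully caught up at outage time.
    return max(outage_start - last_replicated, timedelta(0))


def achieved_rto(outage_start: datetime, service_restored: datetime) -> timedelta:
    """Downtime: time between the outage and restored service."""
    return service_restored - outage_start


def meets_target(achieved: timedelta, target: timedelta) -> bool:
    """An objective is met when the achieved value does not exceed the target."""
    return achieved <= target


# Illustrative incident timeline (made-up values).
outage = datetime(2024, 1, 1, 12, 0, 0)
last_sync = outage - timedelta(minutes=5)
restored = outage + timedelta(minutes=40)

# Placeholder targets for the example only; use the values from the table.
rpo_target = timedelta(minutes=15)
rto_target = timedelta(hours=1)

print("RPO met:", meets_target(achieved_rpo(outage, last_sync), rpo_target))
print("RTO met:", meets_target(achieved_rto(outage, restored), rto_target))
```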
The following items transfer to the secondary cluster automatically during failover:
The following items do not transfer automatically and require manual steps after configuring the secondary cluster:
imagePullSecrets for Kubernetes Pod Operators (KPOs)