Disaster recovery | Astronomer Documentation

Cross-region disaster recovery (AWS dedicated clusters)

Self-service cross-region disaster recovery requires the Enterprise Business Critical tier and is currently available for AWS dedicated clusters only. Support for GCP and Azure is planned for later this year.

Cross-region DR lets you configure a pair of dedicated clusters — a primary and a secondary — in different AWS regions. The secondary cluster stays continuously synchronized with the primary so you can fail over with minimal downtime and data loss. After failover, Astro automatically enables synchronization in the reverse direction, keeping the original primary ready for failback. When the primary region recovers, you can fail back with a single click.

How AWS disaster recovery works

The primary cluster runs all Deployments in Region A.

A multi-region database replicates Deployment metadata to the secondary cluster in Region B.

Multi-region object storage copies task logs to the secondary cluster.

User-deployed images are replicated to the secondary cluster.

On failover, the secondary cluster is promoted to active. All Deployments, configuration, environment variables, connections, and Airflow variables transfer automatically.

Clusters and Deployments retain their IDs, names, namespaces, and system-managed configuration after failover. All hostnames, including the Airflow UI, Airflow API, and Remote Execution API URLs, stay the same; only their targets change, so the same URLs now resolve to the secondary cluster and clients do not need new endpoints.
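The failover and failback behavior described above can be pictured with a small toy model. This is only an illustrative sketch, not Astro's implementation: the `DRPair` class, its fields, and the region names are hypothetical. It shows that switching the active region also reverses the replication direction, and that a second switch restores the original layout.

```python
from dataclasses import dataclass


@dataclass
class DRPair:
    """Hypothetical model of a primary/secondary cluster pair."""

    primary_region: str
    secondary_region: str
    active_region: str = ""

    def __post_init__(self) -> None:
        # The primary region is active until a failover happens.
        if not self.active_region:
            self.active_region = self.primary_region

    @property
    def standby_region(self) -> str:
        if self.active_region == self.primary_region:
            return self.secondary_region
        return self.primary_region

    @property
    def replication(self) -> tuple:
        # Data always flows from the active region to the standby region.
        return (self.active_region, self.standby_region)

    def switch(self) -> None:
        """Promote the standby region; used for both failover and failback."""
        self.active_region = self.standby_region


pair = DRPair("us-east-1", "us-west-2")
before = pair.replication
pair.switch()  # failover
after_failover = pair.replication
pair.switch()  # failback
after_failback = pair.replication
print(before, after_failover, after_failback)
```

Because the replication direction is derived from the active region, there is no separate step to reverse it in this model, which mirrors the automatic reverse synchronization described above.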

RTO and RPO

The following table defines the recovery time objective (RTO) and recovery point objective (RPO) for DR clusters. Targets are benchmarked with 80+ Deployments and 1,250+ concurrent task runs.

Metric	Target
Recovery time objective (RTO)	Less than one hour
Recovery point objective (RPO)	Less than 15 minutes (requires Task Logs Replication SLA)

See Task Logs Replication SLA for details on the RPO guarantee.
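As a rough illustration of how these targets can be checked during a DR drill, the sketch below compares measured timestamps against the RTO and RPO values from the table. The function names and the sample timestamps are assumptions made for this example; Astro does not necessarily expose these measurements in this form.

```python
from datetime import datetime, timedelta, timezone

# Targets taken from the table above.
RTO_TARGET = timedelta(hours=1)
RPO_TARGET = timedelta(minutes=15)


def meets_rpo(last_replicated_at: datetime, failure_at: datetime) -> bool:
    """True if the window of unreplicated data is shorter than the RPO target."""
    return (failure_at - last_replicated_at) < RPO_TARGET


def meets_rto(failure_at: datetime, recovered_at: datetime) -> bool:
    """True if the outage was shorter than the RTO target."""
    return (recovered_at - failure_at) < RTO_TARGET


# Hypothetical drill: the last replication finished 9 minutes before the
# failure, and service was restored 75 minutes after it.
failure = datetime(2025, 1, 1, 12, 0, tzinfo=timezone.utc)
rpo_ok = meets_rpo(failure - timedelta(minutes=9), failure)
rto_ok = meets_rto(failure, failure + timedelta(minutes=75))
print(f"RPO met: {rpo_ok}, RTO met: {rto_ok}")
```

In this hypothetical drill the RPO target is met (9 minutes of potential data loss) while the RTO target is missed (75 minutes of downtime).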

What gets failed over

The following items transfer to the secondary cluster automatically during failover:

Deployments and data pipelines

Dag run history, task instance metadata, and XComs

Deployment configuration

Environment variables, connections, Airflow variables, and metrics exports — whether configured via Environment Manager or directly on the Deployment

Task logs. Enable Task Logs Replication SLA for a guaranteed 15-minute RPO.

The following items do not transfer automatically and require manual steps after configuring the secondary cluster:

Networking and DNS configuration. Configure using self-service features such as VPC peering or Customer Managed Egress, or work with Astronomer support.

imagePullSecrets for KubernetesPodOperator (KPO) tasks

Customer-managed workload identities. You must configure the OIDC issuer and IAM trust policies for the secondary cluster separately. See Workload identity.

Customer-managed Transit Gateway routing on the secondary cluster
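For the customer-managed workload identity item in the list above, one possible approach is an IAM role trust policy with one statement per cluster OIDC issuer, so the same role can be assumed from either cluster. The sketch below builds such a policy document using the standard AWS format for web identity federation. The account ID, issuer URLs, namespace, and service account name are placeholders, and the `system:serviceaccount:<namespace>:<name>` subject format is an assumption based on common Kubernetes OIDC setups; confirm the exact values in the Workload identity documentation.

```python
import json


def build_trust_policy(account_id, oidc_issuers, namespace, service_account):
    """Build an IAM trust policy with one statement per OIDC issuer."""
    statements = []
    for issuer_url in oidc_issuers:
        # IAM refers to OIDC providers by issuer URL without the scheme.
        issuer = issuer_url.removeprefix("https://")
        statements.append({
            "Effect": "Allow",
            "Principal": {
                "Federated": f"arn:aws:iam::{account_id}:oidc-provider/{issuer}"
            },
            "Action": "sts:AssumeRoleWithWebIdentity",
            "Condition": {
                "StringEquals": {
                    # Assumed subject format; verify against your cluster.
                    f"{issuer}:sub": f"system:serviceaccount:{namespace}:{service_account}",
                    f"{issuer}:aud": "sts.amazonaws.com",
                }
            },
        })
    return {"Version": "2012-10-17", "Statement": statements}


# All values below are placeholders for illustration only.
policy = build_trust_policy(
    account_id="123456789012",
    oidc_issuers=[
        "https://oidc.example.com/primary",
        "https://oidc.example.com/secondary",
    ],
    namespace="my-namespace",
    service_account="my-service-account",
)
print(json.dumps(policy, indent=2))
```

Because Deployments keep their namespaces after failover, the subject condition can stay the same for both statements in this sketch; only the issuer differs between the primary and secondary clusters. Each issuer must also be registered as an IAM OIDC identity provider in the AWS account before the role can be assumed.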