Trigger failover and failback | Astronomer Documentation

Trigger failover

You can trigger failover in the Astro UI, as described in the following steps, or by using the Astro API.

Tasks that are running, scheduled, or event-triggered during the failover window might be interrupted. Some tasks might fail and need to be retried.
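Because tasks can fail during the failover window, it may help to give Dags automatic retries that outlast a typical failover. The following is a minimal sketch, not an Astronomer recommendation. It assumes you pass `DEFAULT_ARGS` as the `default_args` of your Dags; `retries` and `retry_delay` are standard Airflow operator arguments, while the values shown and the `retry_window` helper are hypothetical.

```python
from datetime import timedelta

# Hypothetical values: tune them to how long a failover usually takes
# in your environment. Pass this dict as `default_args` to your Dags.
DEFAULT_ARGS = {
    "retries": 3,                         # Airflow operator argument
    "retry_delay": timedelta(minutes=5),  # Airflow operator argument
}


def retry_window(default_args):
    """Return the total wait time across all retries.

    This ignores task run time and assumes a fixed delay
    (no exponential backoff).
    """
    return default_args["retries"] * default_args["retry_delay"]


if __name__ == "__main__":
    print(f"Retries cover roughly {retry_window(DEFAULT_ARGS)} of waiting")
```

If the computed window is shorter than the time a failover takes for you, tasks can exhaust their retries before the secondary cluster is active and will need a manual retry.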

Open the Disaster Recovery tab

In the Astro UI, go to Organization Settings > Clusters, select your primary cluster, and open the Disaster Recovery tab. Confirm the Status indicator shows the primary region is active.

Initiate failover

On the Disaster Recovery tab, click Failover. Alternatively, open the cluster’s actions menu (⋯) at the top right of the page and select Failover to Secondary…. Follow the prompts to confirm.

The secondary cluster is promoted to active, and all Deployments and data become available in the secondary cluster.

Validate Deployments

After failover completes, check the health and status of your Deployments, starting with mission-critical ones. Validate that your Dags and tasks are running as expected, and retry any tasks that failed.
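The validation step can be partly scripted. The following is a minimal sketch that assumes you have already collected task instance records, for example from the Airflow REST API, as dictionaries with `dag_id`, `task_id`, and `state` keys. The function name, the sample records, and the choice of which states count as needing a retry are illustrative assumptions.

```python
from collections import defaultdict

# Airflow task states treated here as needing attention after failover.
# Which states you include is a judgment call for your own pipelines.
NEEDS_RETRY = {"failed", "upstream_failed"}


def tasks_to_retry(task_instances):
    """Group the task IDs that need a retry by their Dag ID."""
    by_dag = defaultdict(list)
    for ti in task_instances:
        if ti["state"] in NEEDS_RETRY:
            by_dag[ti["dag_id"]].append(ti["task_id"])
    return dict(by_dag)


# Hypothetical records, for illustration only.
sample = [
    {"dag_id": "billing_export", "task_id": "extract", "state": "success"},
    {"dag_id": "billing_export", "task_id": "load", "state": "failed"},
    {"dag_id": "daily_report", "task_id": "render", "state": "success"},
]

if __name__ == "__main__":
    for dag_id, task_ids in tasks_to_retry(sample).items():
        print(f"{dag_id}: retry {', '.join(task_ids)}")
```

Grouping by Dag makes it easier to review mission-critical Dags first before clearing or retrying tasks.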

Trigger failback

After the primary region recovers, you can fail back to the original primary cluster.

You can trigger failback in the Astro UI, as described in the following steps, or by using the Astro API.

Tasks that are running, scheduled, or event-triggered during the failback window might be interrupted. Some tasks might fail and need to be retried.

Open the Disaster Recovery tab

In the Astro UI, go to Organization Settings > Clusters, select your original primary cluster, and open the Disaster Recovery tab. Confirm the Status indicator shows the cluster has failed over to the secondary region.

Initiate failback

On the Disaster Recovery tab, click Failback. Alternatively, open the cluster’s actions menu (⋯) at the top right of the page and select Failback to Primary…. Follow the prompts to confirm.

Universal Metrics Export in DR pairs

If you have Universal Metrics Export (UME) configured, the same UME configuration applies to both the primary and secondary clusters. Metrics exported from each cluster include a cloud_region attribute so you can distinguish data from each cluster in your metrics system.
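As an illustration of how the `cloud_region` attribute can separate data from the two clusters, the following minimal sketch counts metric samples per region. The sample shape (a dictionary with an `attributes` mapping), the metric name, and the region names are hypothetical; the actual format depends on your metrics system.

```python
from collections import Counter


def samples_per_region(samples):
    """Count metric samples per cloud_region attribute value.

    Samples without the attribute are counted under "unknown".
    """
    counts = Counter()
    for sample in samples:
        region = sample.get("attributes", {}).get("cloud_region", "unknown")
        counts[region] += 1
    return dict(counts)


# Hypothetical samples, for illustration only.
samples = [
    {"name": "task_duration", "attributes": {"cloud_region": "us-east-1"}},
    {"name": "task_duration", "attributes": {"cloud_region": "us-west-2"}},
    {"name": "task_duration", "attributes": {"cloud_region": "us-west-2"}},
    {"name": "task_duration", "attributes": {}},
]

if __name__ == "__main__":
    for region, count in samples_per_region(samples).items():
        print(f"{region}: {count} samples")
```

After a failover, a shift in sample counts from the primary region to the secondary region is one sign that metrics are now coming from the newly active cluster.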

After failover, update your UME settings if needed to reflect the new active cluster.