Use this guide to trigger a data plane failover from the Astro Private Cloud (APC) UI. A failover moves all Airflow Deployments from a source cluster to a destination cluster. The process runs asynchronously — after you submit the request, APC handles execution without further input.
For a conceptual overview of how failover works, see Data plane failover. To enable the feature before triggering it, see Enable data plane failover.
failoverEnabled set to true. If the Trigger Failover button appears dimmed with the message “Failover isn’t enabled for this cluster”, this means one of the feature configuration requirements hasn’t been satisfied. Double check the configuration instructions, running pods, and pod logs.In the APC UI, go to the cluster list and click the source cluster — the cluster you want to fail over from.
In the top-right corner of the cluster details page, click Trigger Failover.
If the button appears dimmed, the source cluster isn’t eligible for failover. See the prerequisites above.
In the Destination Cluster dropdown, select the cluster you want to fail over to. The dropdown lists only clusters that APC considers valid targets for the source cluster.
If no destinations appear, verify that at least one other cluster is registered, healthy, and reachable from the control plane.
In the Mode dropdown, select one of the following:
After you submit a failover request, APC transitions each Deployment on the source cluster through its own migration state machine. You can monitor progress by checking cluster health status in the APC UI.
The source cluster status transitions to FAILING_OVER while the request is in progress. If any Deployment migration fails, the cluster status transitions to FAILOVER_FAILED. This status is only visible directly within the Postgres database.
Failover execution continues in the background even if you close the browser or navigate away from the cluster details page.
For each Deployment on the source cluster, APC:
After all Deployments migrate successfully, the failover request transitions to SUCCEEDED.
If you want a Dag run to resume after a failover, make sure the Dag sets retries greater than 0 and that the tasks within it are idempotent. Airflow reschedules on the destination cluster any tasks that were running on the source cluster when failover started. Airflow only retries them if the task’s retry count allows it.