Astro Private Cloud (APC) tracks the operational status of every data plane cluster so that workloads only run on healthy infrastructure. This page describes the cluster status values, how the APC API determines status, the GraphQL operations for querying and updating status, and how to troubleshoot unhealthy clusters.
Every operation on this page requires an APC API token sent as a bearer credential. Send your token in the Authorization header on each request to the APC GraphQL endpoint:
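As a rough illustration of the header described above, the following Python sketch builds (but does not send) a GraphQL request that carries the token. The endpoint URL, the helper name build_graphql_request, and the "Bearer " prefix are assumptions made for this example; confirm the exact header format in Authenticate to the APC API.

```python
import json
import urllib.request

# Hypothetical endpoint: substitute the GraphQL URL of your installation.
APC_GRAPHQL_URL = "https://apc.example.com/graphql"


def build_graphql_request(token, query, variables=None, url=APC_GRAPHQL_URL):
    """Build (but do not send) a POST request that carries the API token."""
    body = json.dumps({"query": query, "variables": variables or {}}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            # Assumption: the token is sent with a "Bearer " prefix.
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


request = build_graphql_request("<your-token>", "{ __typename }")
print(request.get_header("Authorization"))
```

Passing the resulting object to urllib.request.urlopen would send the request; the sketch stops short of that so it can be run without network access.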
For step-by-step instructions on obtaining a user token or creating a system service account token, see Authenticate to the APC API.
Cluster operations are gated by RBAC permissions. The following table maps each operation to the permission APC API checks and the default role that grants it.
The System Admin role inherits every system.clusters.* permission.
The APC API derives cluster status from the healthStatus field in the deployment orchestrator’s /metadata response. The mapping is binary:
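The binary mapping can be sketched in a few lines of Python. This is an illustration of the rule as this page describes it, not the APC API's own code: the function name derive_cluster_status is hypothetical, and the sketch assumes that only a 2xx response with healthStatus equal to "HEALTHY" yields ACTIVE, with everything else yielding INACTIVE.

```python
def derive_cluster_status(http_status, payload):
    """Return "ACTIVE" only for a 2xx response whose healthStatus is "HEALTHY"."""
    # A non-2xx response from /metadata counts as unhealthy.
    if not 200 <= http_status < 300:
        return "INACTIVE"
    # A missing or malformed payload also counts as unhealthy.
    if not isinstance(payload, dict):
        return "INACTIVE"
    return "ACTIVE" if payload.get("healthStatus") == "HEALTHY" else "INACTIVE"


print(derive_cluster_status(200, {"healthStatus": "HEALTHY"}))
print(derive_cluster_status(503, {"healthStatus": "HEALTHY"}))
```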
A CronJob in the control plane reconciles cluster metadata by calling the deployment orchestrator’s /metadata endpoint. The default schedule is 0 * * * * (every hour at minute 0). To change it, set the houston.syncDataplaneClusters.schedule value in the Astronomer Helm chart.
You can list the reconcile CronJob and recent runs with the following command:
The paginatedClusters query returns clusters the caller has access to. Pagination uses the take argument, plus either cursor (a cluster UUID) or pageNumber. The response object contains a clusters list and a total count.
The healthStatus field returns a JSON object containing the full health payload the APC API received from the deployment orchestrator, not a single string. The statusReason field is also a JSON object. See Update cluster status for the shape the APC API writes.
Other supported filter arguments include searchPhrase, k8sVersion, id, sortBy, and sortDirection.
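To show how take and pageNumber work together, here is a minimal Python sketch that walks every page of results. The fetch_page callable is hypothetical: it stands in for whatever code sends the paginatedClusters query and returns the response object with its clusters list. The sketch also assumes that pages are numbered from 1, and it stops on the first short page rather than relying on the total count.

```python
def iter_clusters(fetch_page, take=50):
    """Yield every cluster by advancing pageNumber until a short page arrives.

    fetch_page is a hypothetical callable, fetch_page(take, page_number) -> dict,
    returning an object shaped like the paginatedClusters response.
    """
    page_number = 1  # Assumption: pages are numbered from 1.
    while True:
        page = fetch_page(take, page_number)
        clusters = page.get("clusters", [])
        yield from clusters
        # A page with fewer than `take` items is the last one.
        if len(clusters) < take:
            return
        page_number += 1


# Stand-in data source so the sketch runs without a live API.
FAKE_CLUSTERS = [{"id": f"cluster-{n}"} for n in range(5)]


def fake_fetch_page(take, page_number):
    start = (page_number - 1) * take
    return {"clusters": FAKE_CLUSTERS[start:start + take]}


all_ids = [c["id"] for c in iter_clusters(fake_fetch_page, take=2)]
print(all_ids)
```

When the number of clusters is an exact multiple of take, the loop makes one extra call that returns an empty page and then stops.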
A user with permission to update clusters can change a cluster’s status manually. The statusReason argument accepts a JSON object whose shape isn’t enforced by the schema, but APC API itself writes the value the deployment orchestrator returns in its /metadata response when reconciling. To stay consistent, use the same shape APC API uses or include a descriptive message field.
For status changes, supply id (required), status, and statusReason. The updateCluster mutation also accepts name and deploymentsConfigOverride for non-status changes; see Update data plane cluster configurations for those workflows.
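As a sketch of the arguments a manual status change needs, the following Python helper assembles the id, status, and statusReason values. The helper name is hypothetical, and it assumes that ACTIVE and INACTIVE are the only valid status values and that a statusReason with a single message field is sufficient.

```python
# Assumption: these are the only status values the API accepts.
VALID_STATUSES = {"ACTIVE", "INACTIVE"}


def build_status_update_variables(cluster_id, status, message):
    """Assemble updateCluster arguments for a manual status change."""
    if not cluster_id:
        raise ValueError("id is required")
    if status not in VALID_STATUSES:
        raise ValueError(f"unsupported status: {status!r}")
    return {
        "id": cluster_id,
        "status": status,
        # A descriptive message keeps the reason useful for later diagnosis.
        "statusReason": {"message": message},
    }


variables = build_status_update_variables(
    "<cluster-uuid>", "INACTIVE", "Taken out of rotation for node upgrades"
)
print(variables)
```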
APC API blocks configuration updates (deploymentsConfigOverride, name) while the cluster status is INACTIVE and returns the error This operation is not allowed as the cluster is not active. Status itself can still be updated in any state.
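The rule above can be mirrored client-side to fail fast before sending a mutation. This Python sketch is only an illustration of the documented behavior, not the APC API's implementation; the function name and the choice of PermissionError are assumptions.

```python
CONFIG_FIELDS = {"name", "deploymentsConfigOverride"}
NOT_ACTIVE_ERROR = "This operation is not allowed as the cluster is not active."


def check_update_allowed(current_status, update):
    """Raise if `update` changes configuration while the cluster is not ACTIVE."""
    touches_config = bool(CONFIG_FIELDS & update.keys())
    if touches_config and current_status != "ACTIVE":
        raise PermissionError(NOT_ACTIVE_ERROR)


# A status-only update is allowed in any state.
check_update_allowed("INACTIVE", {"status": "ACTIVE"})

# A configuration update on an INACTIVE cluster is rejected.
try:
    check_update_allowed("INACTIVE", {"name": "new-name"})
except PermissionError as err:
    print(err)
```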
Use the reconcileClusterMetadataJob query to make APC API refetch metadata from the deployment orchestrator immediately, instead of waiting for the next CronJob run. The query accepts a list of cluster UUIDs; if you pass null or omit the argument, APC API reconciles every cluster the caller is authorized to update.
A cluster appears in skippedClusterIds when it lacks a data plane URL or when the caller isn’t authorized to reconcile it.
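When you pass an explicit list of cluster UUIDs, it can help to separate the skipped clusters from the rest. The Python sketch below does that for a response object that contains skippedClusterIds; the function name and the sample response are made up for illustration, and a cluster that was not skipped is not necessarily healthy after reconciliation.

```python
def split_reconcile_result(requested_ids, result):
    """Partition requested cluster IDs into attempted and skipped lists."""
    skipped = set(result.get("skippedClusterIds") or [])
    return {
        "attempted": [cid for cid in requested_ids if cid not in skipped],
        "skipped": [cid for cid in requested_ids if cid in skipped],
    }


# Made-up response for illustration only.
sample_result = {"skippedClusterIds": ["cluster-b"]}
summary = split_reconcile_result(["cluster-a", "cluster-b", "cluster-c"], sample_result)
print(summary)
```

For each skipped cluster, check whether it has a data plane URL and whether your token is authorized to reconcile it.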
Use this query in the following situations:
From a Pod in the control plane namespace with network access to the deployment orchestrator, call the metadata endpoint:
A healthy response includes (among other fields) the following:
The full response also includes mode, dataplaneUrl, dataplaneId, releaseName, releaseNamespace, dbType, namespacePools, and registry.
INACTIVE
Possible causes:
The deployment orchestrator’s /metadata endpoint returns a non-2xx response or a payload without healthStatus: "HEALTHY".
Resolution steps:
Test connectivity from APC API (replace <release-name> with your Helm release, default astronomer):
After the underlying issue is resolved, force a reconciliation through the reconcileClusterMetadataJob query.
When a configuration update is attempted on a non-ACTIVE cluster, APC API returns the error This operation is not allowed as the cluster is not active.
Resolution:
Run reconcileClusterMetadataJob to refresh status.
Monitor the ACTIVE:INACTIVE status of each cluster and surface status on operations dashboards.
Include a statusReason when manually changing status. The reason is preserved in the cluster record and is useful when diagnosing later incidents.
Distribute workloads across clusters so that a single INACTIVE cluster doesn’t affect every workload.