Debug an Astro Private Cloud installation

Use this guide when your Astro Private Cloud (APC) control plane or data plane Pods are not progressing to a healthy state after installation.

Ensure platform components are reaching full availability

Work through the following checks, from controllers down to individual containers, to isolate possible causes when Pods don’t reach the READY state.

1. Verify controllers and ReplicaSets

List Deployments, StatefulSets, and ReplicaSets in your namespace and confirm the latest ReplicaSet or StatefulSet shows the expected number of available replicas:

$ kubectl get deployment,statefulset,replicaset -n <astronomer namespace>

Sort the ReplicaSets by creation timestamp to identify the most recent one for the failing component. The newest ReplicaSet appears last in the output:

$ kubectl get replicaset -n <astronomer namespace> --sort-by=.metadata.creationTimestamp

Inspect the returned ReplicaSet for status and events that may be preventing Pods from launching:
```
$ kubectl describe replicaset <replicaset-name> -n <astronomer namespace>
```
Resolve issues such as insufficient resources, pull errors, or missing secrets, then re-check the ReplicaSet until .status.availableReplicas matches .spec.replicas.
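
The comparison between .status.availableReplicas and .spec.replicas can be scripted so you don't have to read the YAML by eye. The following is a minimal sketch, not an official Astronomer tool: the function name `replicaset_ready` is hypothetical, and it assumes `kubectl` is already configured for the cluster.

```shell
# replicaset_ready NAMESPACE REPLICASET  (hypothetical helper name)
# Prints "available/desired" and returns 0 only when both counts match.
replicaset_ready() {
  ns="$1"
  rs="$2"
  desired=$(kubectl get replicaset "$rs" -n "$ns" -o jsonpath='{.spec.replicas}')
  available=$(kubectl get replicaset "$rs" -n "$ns" -o jsonpath='{.status.availableReplicas}')
  # The status field is omitted while no replicas are available, so treat empty as 0.
  available=${available:-0}
  echo "$available/$desired"
  [ "$available" = "$desired" ]
}
```

For example, `replicaset_ready <namespace> <replicaset-name> || echo "still rolling out"` gives a quick pass/fail signal you can re-run after each fix.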

2. Examine Pods and namespace events

List Pod status:

$ kubectl get pods -n <astronomer namespace>

Describe a failing Pod to view events, container status, and scheduling details:

$ kubectl describe pod <pod-name> -n <astronomer namespace>

Review recent events in the namespace for additional context:

$ kubectl get events -n <astronomer namespace> --sort-by=.lastTimestamp
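
In a busy namespace the full event list can be noisy. As a rough sketch, the event query can be narrowed to warnings with a field selector. The wrapper name `warning_events` is hypothetical; `--field-selector type=Warning` is a standard `kubectl get events` filter.

```shell
# warning_events NAMESPACE  (hypothetical helper name)
# Lists only Warning events, oldest first, so the newest problem is at the bottom.
warning_events() {
  kubectl get events -n "$1" --field-selector type=Warning --sort-by=.lastTimestamp
}
```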

3. Inspect container logs

If a Pod continues to restart or is stuck in CrashLoopBackOff, gather logs for each container:

$ kubectl logs <pod-name> -c <container-name> -n <astronomer namespace>

If the container restarts quickly, use --previous to view logs from the last attempt:

$ kubectl logs <pod-name> -c <container-name> -n <astronomer namespace> --previous

Use the collected errors to adjust your configuration, for example, by fixing database credentials or registry access. After remediation, re-run kubectl get pods to confirm all Pods report READY status. If problems persist, collect the relevant logs and events and contact Astronomer support.
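
When a Pod has several containers, collecting current and previous logs one by one gets tedious, especially if you need to attach them to a support request. The sketch below loops over the containers declared in the Pod spec and writes each log to a file. The function name `collect_pod_logs` and the output file naming are assumptions for illustration, and init containers (listed under .spec.initContainers) are not covered.

```shell
# collect_pod_logs NAMESPACE POD [OUTPUT_DIR]  (hypothetical helper name)
# Saves current and previous logs for every regular container in the Pod.
collect_pod_logs() {
  ns="$1"
  pod="$2"
  out_dir="${3:-.}"
  mkdir -p "$out_dir"
  # Space-separated container names from the Pod spec.
  containers=$(kubectl get pod "$pod" -n "$ns" -o jsonpath='{.spec.containers[*].name}')
  for c in $containers; do
    kubectl logs "$pod" -c "$c" -n "$ns" > "$out_dir/$pod-$c.log" 2>&1 || true
    # --previous fails if the container has never restarted; that is not an error here.
    kubectl logs "$pod" -c "$c" -n "$ns" --previous > "$out_dir/$pod-$c.previous.log" 2>&1 || true
    echo "$c"
  done
}
```

Calling `collect_pod_logs <namespace> <pod-name> ./apc-logs` prints each container name as it is processed and leaves one `.log` and one `.previous.log` file per container in the target directory.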

Houston Pods stuck in CrashLoopBackOff

Houston (API) connects directly to the control-plane database during startup. If the Pods restart repeatedly:

List Pods to verify their status:

$ kubectl get pods -n <astronomer namespace>

Test connectivity to the database from inside the cluster:

$ kubectl run psql --rm -it --restart=Never --namespace <astronomer namespace> \
>   --image bitnami/postgresql --command -- \
>   psql $(kubectl get secret -n <astronomer namespace> <platform-release-name>-houston-backend \
>     --template='{{.data.connection | base64decode }}' | sed 's/?.*//g')

If the connection times out, investigate networking or firewall rules between Kubernetes nodes and the Postgres host.

Confirm the astronomer-bootstrap secret contains the correct connection string:

$ kubectl get secret astronomer-bootstrap -n <astronomer namespace> -o yaml

Decode the connection value and fix any typos. After updating the secret, delete the Houston and Grafana Pods so they pick up the change.
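
Reading the value out of the YAML and decoding it by hand is error-prone. The following sketch decodes it in one step and masks the password so the output is safer to paste into a ticket. It assumes the secret stores the string under a `connection` key, as the step above implies, and that the string has the usual `scheme://user:password@host` shape; the function name `bootstrap_connection` is hypothetical.

```shell
# bootstrap_connection NAMESPACE  (hypothetical helper name)
# Prints the decoded connection string with the password replaced by ***.
bootstrap_connection() {
  kubectl get secret astronomer-bootstrap -n "$1" -o jsonpath='{.data.connection}' \
    | base64 --decode \
    | sed -E 's#(://[^:/@]+):[^@]*@#\1:***@#'
}
```

Check the user, host, port, and database name in the output against your Postgres instance. To inspect the password itself, drop the `sed` stage and run the command in a private terminal.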

x509 “certificate signed by unknown authority” while pulling images

If image pulls fail with a certificate error, such as when syncing registry certificates, restart the Houston Pods followed by the platform registry Pod. Ensure any custom certificate authorities are configured under global.privateCaCerts and applied via helm upgrade.

Houston worker showing NATS timeout errors after installation

After installing or upgrading APC, you might encounter issues where Deployments appear in the Astro CLI and database, but their Kubernetes namespaces are not created. Houston logs might show UnhandledPromiseRejectionWarning: NatsError: TIMEOUT.

This occurs when the NATS JetStream cluster has not yet elected a metadata leader before the Houston worker Pods attempt to set up streams and consumers.

To resolve:

Verify Houston worker Pods are showing NATS timeout errors:

$ kubectl logs -l component=houston-worker -n <astronomer namespace>

Restart the Houston worker Pods to allow them to reconnect after the NATS leader election completes:

$ kubectl rollout restart deployment <platform-release-name>-houston-worker -n <astronomer namespace>

Confirm Deployment namespaces are created:

$ kubectl get namespaces

After the Houston worker Pods restart, they can reconnect to NATS and create the Kubernetes resources for your Deployments.
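
The check-then-restart sequence above can be combined into one helper. This is a sketch under stated assumptions: the function name is hypothetical, the error string matches the log line quoted earlier, and the Deployment is named `<platform-release-name>-houston-worker`. Note that `kubectl logs` returns only the last 10 lines per Pod when a label selector is used, so the sketch passes `--tail=-1` to search the full log.

```shell
# restart_workers_if_nats_timeout NAMESPACE RELEASE  (hypothetical helper name)
# Restarts the Houston worker Deployment only if its logs contain NATS timeouts.
restart_workers_if_nats_timeout() {
  ns="$1"
  release="$2"
  # grep -c exits non-zero on zero matches, so guard it with "|| true".
  hits=$(kubectl logs -l component=houston-worker -n "$ns" --tail=-1 \
    | grep -c 'NatsError: TIMEOUT' || true)
  if [ "$hits" -gt 0 ]; then
    kubectl rollout restart deployment "${release}-houston-worker" -n "$ns"
    # Block until the new Pods are available, or give up after five minutes.
    kubectl rollout status deployment "${release}-houston-worker" -n "$ns" --timeout=5m
  else
    echo "no NATS timeout errors found"
  fi
}
```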

Houston worker showing NatsError: 503 after installation

After installing or upgrading APC, Houston and Houston worker Pods may start successfully but silently fail to connect to NATS JetStream. Houston logs might show repeated UnhandledPromiseRejectionWarning: NatsError: 503 entries shortly after startup.

This occurs when Houston and Houston worker start and attempt to initialize JetStream connections before the NATS JetStream subsystem has finished initializing. Both components call jetstreamManager() during startup, which requires JetStream to be fully ready, not just the NATS TCP port. When this API call is made during the JetStream initialization window, NATS returns a 503 “No Responders” error.

Because neither component retries on 503, they continue running with broken or missing JetStream connections and silently drop all deployment events. As a result, Deployments may appear in the Astro CLI and database but their Kubernetes namespaces are never created.

To resolve:

Verify that all NATS Pods are healthy and JetStream has finished initializing:

$ kubectl get pods -l app=<platform-release-name>-nats -n <astronomer namespace>

Wait until all NATS Pods report READY status before continuing. You can also inspect the NATS monitoring endpoint directly from inside the Pod to confirm JetStream is responding:
```
$ kubectl exec -n <astronomer namespace> <platform-release-name>-nats-0 -- \
>   curl -s http://localhost:8222/jsz
```
The response should contain JetStream stream and consumer statistics. If the request fails or returns an error, wait and retry before proceeding.
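
The wait-and-retry step can be automated rather than repeated by hand. The sketch below first waits for the NATS Pods to become Ready and then polls the monitoring endpoint until it answers. The function name, the default of 10 attempts, the 5-second delay, and the 300-second readiness timeout are all illustrative assumptions; the `-f` flag is added so that `curl` exits non-zero on an HTTP error response.

```shell
# wait_for_jetstream NAMESPACE RELEASE [ATTEMPTS] [DELAY_SECONDS]  (hypothetical helper name)
# Returns 0 once the /jsz endpoint on the first NATS Pod responds successfully.
wait_for_jetstream() {
  ns="$1"
  release="$2"
  attempts="${3:-10}"
  delay="${4:-5}"
  # Wait for every NATS Pod to report Ready before probing JetStream.
  kubectl wait --for=condition=Ready pod -l "app=${release}-nats" -n "$ns" --timeout=300s || return 1
  i=1
  while [ "$i" -le "$attempts" ]; do
    if kubectl exec -n "$ns" "${release}-nats-0" -- curl -sf http://localhost:8222/jsz >/dev/null; then
      echo "JetStream responded after $i attempt(s)"
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  return 1
}
```

Running `wait_for_jetstream <namespace> <platform-release-name>` before restarting Houston helps avoid restarting into the same initialization window.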

Restart both Houston and Houston worker Pods:

$ kubectl rollout restart deployment \
>   <platform-release-name>-houston \
>   <platform-release-name>-houston-worker \
>   -n <astronomer namespace>

Confirm that Houston worker has established active JetStream subscriptions:

$ kubectl logs -l component=houston-worker -n <astronomer namespace> | grep -i "Running"

You should see a Running log line for each of the eight JetStream worker subjects, for example, NATS houston-upsert-deployment-for-create Running....

After both Pods restart and JetStream subscriptions are established, deployment operations resume normally.
