Configure Kerberos authentication for Airflow databases

Astro Private Cloud supports Kerberos authentication for Airflow deployment databases, allowing you to connect to Kerberized PostgreSQL databases. This feature is available starting in Astro Private Cloud 1.1.0.

Overview

Kerberos is an authentication protocol that uses tickets to allow secure authentication in network environments. In enterprise environments with strict security requirements, databases are often configured to use Kerberos authentication instead of traditional username and password authentication.

With Kerberos database support in Astro Private Cloud, you can:

  • Connect Airflow deployments to Kerberized PostgreSQL databases
  • Maintain compliance with enterprise security policies that require Kerberos authentication
  • Use existing Kerberos infrastructure for database authentication
  • Support both unified and control plane/data plane deployment modes

How it works

Astro Private Cloud uses PgBouncer as a proxy between Airflow components and the Kerberized database. When you enable Kerberos for a deployment:

  1. You provide labels and environment variables for PgBouncer Pods via the Houston API when creating or updating a deployment.
  2. Your Kerberos credential injection mechanism (such as a mutation webhook) uses these labels to inject Kerberos credentials (keytabs or credential refresh sidecars) into the PgBouncer Pods.
  3. PgBouncer authenticates to the PostgreSQL database using GSSAPI (Kerberos protocol).
  4. Airflow components connect to PgBouncer using standard authentication, and PgBouncer proxies the connection to the Kerberized database.
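The flow above involves two hops with different authentication methods. As an illustrative sketch (all hostnames and values are placeholders, not literal configuration):

```
# Hop 1: Airflow components -> PgBouncer, standard password authentication
postgresql://<user>:<password>@<release>-pgbouncer.<namespace>.svc:6543/<airflow_db>

# Hop 2: PgBouncer -> PostgreSQL, Kerberos (GSSAPI), no password
host=<db-hostname> port=5432 user=<kerberos_user>@<KERBEROS_REALM>
```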

Architecture

Astro Private Cloud 1.1.0 supports two deployment modes with different Kerberos configurations:

Unified mode

In unified mode, the control plane and data plane are installed in the same Kubernetes cluster.

Kerberos architecture in Unified mode

Control plane/data plane mode

In control plane/data plane mode, you can configure separate Kerberos authentication for the control plane database and data plane (Airflow) databases:

Kerberos architecture in CP/DP mode

In control plane/data plane mode, you can use separate Active Directory instances and Kerberos realms for the control plane and data plane. This allows for greater security isolation between control plane and Airflow deployment databases.

Prerequisites

Before configuring Kerberos authentication, ensure you have:

  • Astro Private Cloud 1.1.0 or later
  • A Kerberized PostgreSQL database (PostgreSQL 18 has known issues)
  • Kerberos infrastructure:
    • Active Directory or MIT Kerberos KDC (Key Distribution Center)
    • Network connectivity between your Kubernetes cluster and the KDC
    • For CP/DP mode: Optionally, separate Active Directory instances for control plane and data plane
  • A mechanism to inject Kerberos credentials into PgBouncer Pods. See Kerberos credential injection.
  • A Kerberos user principal created in your Active Directory
  • Houston API access to create deployments

Responsibility model

Kerberos database authentication in Astro Private Cloud follows a shared responsibility model:

Astronomer responsibilities

  • Providing PgBouncer images with Kerberos (GSSAPI) support
  • Supporting labels and environment variables on PgBouncer Pods via the Houston API
  • Maintaining deployment stability during updates

Customer responsibilities

  • Setting up and managing Kerberos infrastructure (Active Directory, KDC, etc.)
  • Creating and managing Kerberos user principals and keytabs
  • Implementing a mechanism to inject Kerberos credentials into PgBouncer Pods (such as a mutation webhook)
  • Configuring appropriate labels and environment variables via the Houston API to trigger credential injection
  • Creating Kerberos users in the PostgreSQL database with appropriate permissions
  • Pre-creating Airflow databases for deployments
  • Managing Kerberos ticket lifecycle (renewal, rotation)

Scope and limitations

The following are supported in Astro Private Cloud 1.1.0 with Kerberos authentication:

  • Executor: Kubernetes Executor only
  • Deployment Type: Image-based deployments only
  • Database: PostgreSQL with Kerberos authentication
  • Deployment Modes: Both unified and control plane/data plane modes

Future releases may expand support to other executors, deployment types, and databases.

Kerberos credential injection

You are responsible for implementing a mechanism to inject Kerberos credentials into PgBouncer Pods. One common approach is using a Kubernetes mutation webhook.

A mutation webhook can automatically inject Kerberos credentials when PgBouncer Pods are created. The webhook typically:

  1. Watches for Pod creation requests with specific labels that you configure via the Houston API
  2. Injects a sidecar container that manages Kerberos ticket renewal
  3. Mounts Kerberos configuration files (krb5.conf) and keytabs as volumes
  4. Configures environment variables for Kerberos authentication

When creating a deployment, you’ll specify labels in the pgbouncerConfig section that trigger your webhook to inject the necessary credentials.
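For example, a webhook registration could select PgBouncer Pods by a label you set through pgbouncerConfig. The following is a hypothetical sketch, not a supported manifest; the webhook name, service, namespace, and label key are all assumptions that depend on your own injector implementation:

```yaml
apiVersion: admissionregistration.k8s.io/v1
kind: MutatingWebhookConfiguration
metadata:
  name: kerberos-credential-injector   # hypothetical name
webhooks:
  - name: inject.kerberos.example.com
    rules:
      - apiGroups: [""]
        apiVersions: ["v1"]
        operations: ["CREATE"]
        resources: ["pods"]
    # Only mutate Pods carrying the label you set via pgbouncerConfig.labels
    objectSelector:
      matchLabels:
        krb-inject: "enabled"
    clientConfig:
      service:
        name: kerberos-injector        # hypothetical webhook service
        namespace: kube-system
        path: /mutate
    admissionReviewVersions: ["v1"]
    sideEffects: None
```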

Alternative approaches

Other methods for credential injection include:

  • Init containers that fetch credentials from a secret management system
  • Direct volume mounts of Kerberos keytabs from Kubernetes secrets
  • Service mesh sidecars

Choose the approach that best fits your organization’s security requirements and infrastructure.
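With the direct-mount approach, for instance, the resulting PgBouncer Pod spec might mount a keytab Secret and a krb5.conf ConfigMap into the container. The Secret, ConfigMap, and mount paths below are hypothetical; only the `KRB5_CLIENT_KTNAME` and `KRB5_CONFIG` environment variables are standard MIT Kerberos settings:

```yaml
# Fragment of a PgBouncer Pod spec after injection (hypothetical names)
containers:
  - name: pgbouncer
    env:
      - name: KRB5_CLIENT_KTNAME       # where GSSAPI libraries look for the keytab
        value: /etc/krb5/astro_user.keytab
      - name: KRB5_CONFIG
        value: /etc/krb5/krb5.conf
    volumeMounts:
      - name: krb5-files
        mountPath: /etc/krb5
        readOnly: true
volumes:
  - name: krb5-files
    projected:
      sources:
        - secret:
            name: pgbouncer-keytab     # pre-created Kubernetes Secret
        - configMap:
            name: krb5-conf            # cluster-wide krb5.conf
```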

For implementation guidance, contact Astronomer support or your Astronomer representative.

Step 1: Configure cluster settings

Before creating Kerberos-enabled deployments, you must enable manual connection strings in your cluster configuration.

For unified mode

  1. Open your control plane values.yaml file.

  2. Add the following configuration:

global:
  airflow:
    images:
      pgbouncer:
        repository: "quay.io/astronomer/ap-pgbouncer-krb"
        tag: "1.25.0-2"

  3. Push the configuration change. See Apply a config change.

For control plane/data plane mode

  1. Configure the data plane cluster to enable manual connection strings. See Update data plane cluster configurations for instructions on updating cluster-specific settings.

  2. Add the following to your data plane cluster configuration:

deployments:
  manualConnectionStrings:
    enabled: true

global:
  airflow:
    images:
      pgbouncer:
        repository: "quay.io/astronomer/ap-pgbouncer-krb"
        tag: "1.25.0-2"

Step 2: Create the Kerberos database user

You must create a Kerberos user in your PostgreSQL database with the appropriate permissions.

In control plane/data plane mode, create separate Kerberos users for:

  • The control plane database (if using a Kerberized control plane database)
  • Each data plane’s Airflow databases

  1. Connect to your PostgreSQL database using a superuser account.

  2. Create the Kerberos user. The username must be in the format <username>@<REALM>:

CREATE USER "astro_user@APC.ASTRONOMER.IO" WITH LOGIN;

  3. Grant the necessary permissions:

-- For AWS RDS with Kerberos
GRANT rds_ad TO "astro_user@APC.ASTRONOMER.IO";

-- Database and schema permissions
GRANT ALL PRIVILEGES ON DATABASE postgres TO "astro_user@APC.ASTRONOMER.IO";
GRANT ALL PRIVILEGES ON SCHEMA public TO "astro_user@APC.ASTRONOMER.IO";

-- Allow database creation
ALTER ROLE "astro_user@APC.ASTRONOMER.IO" CREATEDB;

Replace astro_user with your Kerberos username and APC.ASTRONOMER.IO with your Kerberos realm.
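To confirm the role and its attributes were created as expected, you can query the system catalog. This check is optional; the `LIKE '%@%'` filter simply matches role names that look like Kerberos principals:

```sql
-- List roles whose names look like Kerberos principals
SELECT rolname, rolcanlogin, rolcreatedb
FROM pg_roles
WHERE rolname LIKE '%@%';
```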

Step 3: (Optional) Configure PgBouncer in the control plane

If you need to use a Kerberized database for the control plane (Houston’s database), you must configure PgBouncer in the control plane namespace.

Create PgBouncer configuration

  1. Create a pgbouncer.ini file with the following contents. Replace the placeholders with your actual values:

[databases]
* = host=<rds-hostname> port=5432 user=<kerberos_user>@<kerberos_realm>

[pgbouncer]
pool_mode = transaction
listen_addr = 0.0.0.0
listen_port = 6543
idle_transaction_timeout = 60
transaction_timeout = 60
autodb_idle_timeout = 60
admin_users = postgres
client_idle_timeout = 60
server_idle_timeout = 60
track_extra_parameters = search_path
stats_users = postgres

# Authentication settings
auth_type = md5
auth_file = /etc/pgbouncer/users.txt
ignore_startup_parameters = extra_float_digits

# Kerberos settings
server_gssauth_negotiate = allow
server_krb_spn = postgres/<rds-hostname>@<kerberos_realm>

max_client_conn = 200
verbose = 2
log_disconnections = 0
log_connections = 0

# Strongly recommended for RDS
server_tls_sslmode = prefer

  2. Generate a password hash for the PgBouncer users.txt file:

$ echo -n "md5"; echo -n "<password><user>" | md5sum | awk '{print $1}'

  3. Create a users.txt file with the password hashes:

"<kerberos_user>" "<output_from_above_command>"
"postgres" "<output_from_above_command>"

The password authentication in users.txt is used for Houston to connect to PgBouncer. PgBouncer then authenticates to the PostgreSQL database using Kerberos (GSSAPI).

  4. Create the Kubernetes secret:

$ kubectl -n astronomer create secret generic astronomer-pgbouncer-config \
> --from-file=pgbouncer.ini=./pgbouncer.ini \
> --from-file=users.txt=./users.txt \
> --dry-run=client -o yaml | kubectl apply -f -
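As a sanity check, the users.txt entry format can be reproduced end to end in a few lines of shell. The user and password below are hypothetical placeholders, and the logic mirrors the command above: the stored value is the literal prefix "md5" followed by md5(password + username), which is PgBouncer's md5 auth_file format.

```shell
# Build one users.txt line (hypothetical credentials).
user='astro_user@APC.ASTRONOMER.IO'   # Kerberos principal used as PgBouncer user
password='example-password'           # password Houston will present to PgBouncer

# "md5" prefix + md5 digest of password concatenated with username
hash="md5$(printf '%s%s' "$password" "$user" | md5sum | awk '{print $1}')"

# Emit the quoted users.txt entry
printf '"%s" "%s"\n' "$user" "$hash"
```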

Update control plane configuration

  1. Open your control plane values.yaml file.

  2. Add the following PgBouncer configuration:

pgbouncer:
  enabled: true
  repository: "quay.io/astronomer/ap-pgbouncer-krb"
  tag: "1.25.0-2"
  securityContext:
    runAsGroup: 65534
    runAsNonRoot: true
    runAsUser: 65534

global:
  pgbouncer:
    enabled: true
    extraEnv: []
    extraLabels: []
    gssSupport: true
    secretName: astronomer-pgbouncer-config
    securityContext:
      runAsGroup: 65534
      runAsUser: 65534
    servicePort: "6543"
    username: postgres

  3. Update the Astronomer bootstrap secret to point to the PgBouncer service:

postgres://<username>:<password>@astronomer-pgbouncer.astronomer.svc.cluster.local:6543?pgbouncer=true&connection_limit=100&pool_timeout=60&prisma_connection_limit=100

The username and password must match an entry in the users.txt file created above.

The query parameters pgbouncer=true&connection_limit=100&pool_timeout=60&prisma_connection_limit=100 are critical for Prisma to work correctly with PgBouncer. Without these parameters, the control plane may experience connection issues.

  4. Upgrade the control plane installation. You must perform the upgrade in two steps:

    Step 1: First, upgrade with the --no-hooks flag. This installs PgBouncer in the control plane without running database migration jobs:

$ helm upgrade astronomer astronomer/astronomer \
> --namespace astronomer \
> -f <values_file>.yaml \
> --version <version> \
> --no-hooks

The <version> is the APC version you want to upgrade to (e.g., 1.1.0).

Step 2: After the upgrade completes and PgBouncer is running, run the upgrade again without the --no-hooks flag. This runs the database migration jobs:

$ helm upgrade astronomer astronomer/astronomer \
> --namespace astronomer \
> -f <values_file>.yaml \
> --version <version>

The two-step upgrade process is necessary because:

  1. The first upgrade installs PgBouncer, which is required for database connectivity when using a Kerberized database.
  2. The second upgrade runs database migration hooks that depend on PgBouncer being available.

Step 4: Create an Airflow deployment database

Before creating a Kerberos-enabled deployment, you must manually create the Airflow database.

Follow these steps to create the database:

  1. Create a PostgreSQL client Pod for database operations:
apiVersion: v1
kind: Pod
metadata:
  labels:
    release: astronomer
    tier: astronomer
  name: postgres-debug-client
  namespace: astronomer
spec:
  containers:
    - command:
        - /bin/bash
        - -c
        - sleep infinity
      image: postgres:16
      imagePullPolicy: Always
      name: psql-client

  2. Apply the Pod:

$ kubectl apply -f postgres-debug-client.yaml

  3. Exec into the Pod and connect to the database:

$ kubectl -n astronomer exec -it postgres-debug-client -- bash

  4. Connect to your database. If using PgBouncer in the control plane:

$ psql "postgres://<username>:<password>@astronomer-pgbouncer.astronomer.svc.cluster.local:6543"

The username and password in the connection string should match the credentials configured in your users.txt file from Step 3.

Or connect directly to your Kerberized database (ensure you have appropriate credentials).

  5. Create the Airflow database:

CREATE DATABASE <deployment_name>_airflow OWNER "<kerberos_user>@<kerberos_realm>";

For example:

CREATE DATABASE mydeployment_airflow OWNER "astro_user@APC.ASTRONOMER.IO";
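You can then verify that the database exists and is owned by the Kerberos principal (the database name below assumes the example above):

```sql
-- Confirm the new database is owned by the Kerberos principal
SELECT d.datname, r.rolname AS owner
FROM pg_database d
JOIN pg_roles r ON r.oid = d.datdba
WHERE d.datname = 'mydeployment_airflow';
```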

Step 5: Create a Kerberos-enabled deployment

Kerberos-enabled deployments must be created using the Houston API. They cannot be created from the Astro UI.

Use the upsertDeployment mutation

  1. Compose your mutation payload. The following example shows the required fields for a Kerberos-enabled deployment:
{
  "cloudRole": "",
  "dagDeployment": {
    "nfsLocation": "",
    "type": "image"
  },
  "dagProcessor": {
    "replicas": 1
  },
  "kerberosEnabled": true,
  "deploymentUuid": "",
  "deployRevisionDescription": "",
  "description": "",
  "dockerconfigjson": null,
  "environmentVariables": [],
  "executor": "KubernetesExecutor",
  "image": "",
  "label": "my-kerberos-deployment",
  "metadataConnection": "",
  "skipAirflowDatabaseProvisioning": true,
  "metadataConnectionJson": {
    "protocol": "postgresql",
    "user": "astro_user@APC.ASTRONOMER.IO",
    "pass": "no-pass",
    "host": "<airflow-db-host>",
    "port": 5432,
    "db": "mydeployment_airflow"
  },
  "mode": "helm",
  "namespace": "",
  "properties": {
    "extra_capacity": {
      "cpu": 0,
      "memory": 0
    }
  },
  "releaseName": "mydeployment",
  "resultBackendConnection": "",
  "resultBackendConnectionJson": {
    "protocol": "postgresql",
    "user": "astro_user@APC.ASTRONOMER.IO",
    "pass": "no-pass",
    "host": "<airflow-db-host>",
    "port": 5432,
    "db": "mydeployment_airflow"
  },
  "rollbackEnabled": false,
  "runtimeVersion": "12.0.0",
  "scheduler": {
    "replicas": 1
  },
  "triggerer": {},
  "webserver": {},
  "workers": {},
  "pgbouncerConfig": {
    "labels": {
      "key": "value"
    },
    "env": [
      {"name": "KERBEROS_USER", "value": "astro_user@APC.ASTRONOMER.IO"},
      {"name": "KERBEROS_PASSWORD", "value": "YourKerberosPassword"}
    ],
    "extraIniResultBackend": "user=astro_user@APC.ASTRONOMER.IO",
    "extraIniMetadata": "user=astro_user@APC.ASTRONOMER.IO",
    "extraIni": "server_gssauth_negotiate = allow\\nserver_krb_spn = postgres/<airflow-db-host>@APC.ASTRONOMER.IO",
    "sslmode": "prefer"
  },
  "workspaceLabel": "my-workspace",
  "workspaceUuid": "<workspace-uuid>"
}

Important configuration fields

  • kerberosEnabled: Must be set to true. This enables Houston to perform Kerberos-specific validation.
  • skipAirflowDatabaseProvisioning: Must be set to true because you manually create the Airflow database.
  • metadataConnectionJson and resultBackendConnectionJson:
    • user: Must be in the format <username>@<REALM>
    • pass: Can be any value (e.g., "no-pass") since PgBouncer uses Kerberos for database authentication
    • host: The hostname of your Kerberized PostgreSQL database
    • db: The database name you created (e.g., mydeployment_airflow)
  • pgbouncerConfig:
    • labels: Custom labels for the PgBouncer Pod. Use these labels to trigger your Kerberos credential injection mechanism (e.g., "krb-inject": "enabled" or "component": "pgbouncer").
    • env: Environment variables for the PgBouncer Pod. Your credential injection mechanism can use these to configure Kerberos authentication (e.g., KERBEROS_USER, KERBEROS_PASSWORD).
    • extraIniMetadata and extraIniResultBackend: Must specify the Kerberos user.
    • extraIni: Must include Kerberos/GSS settings:
      • server_gssauth_negotiate = allow
      • server_krb_spn = postgres/<airflow-db-host>@<kerberos_realm>
    • sslmode: Set to prefer for RDS or other TLS-enabled databases

In control plane/data plane mode, ensure the Airflow database host, Kerberos realm, and Kerberos user you specify are for the data plane, not the control plane.

  2. Execute the mutation using the Houston API.

Step 6: Verify Kerberos authentication

After creating your deployment, verify that Kerberos authentication is working correctly.

Verify deployment creation

  1. List the Pods in the deployment namespace:

$ kubectl -n <airflow-namespace> get pods

  2. Verify that all Airflow Pods are running:

$ kubectl -n <airflow-namespace> get pods -l component=scheduler
$ kubectl -n <airflow-namespace> get pods -l component=webserver
$ kubectl -n <airflow-namespace> get pods -l component=pgbouncer

Verify database connectivity

  1. Check the PgBouncer logs for successful connections:

$ kubectl -n <airflow-namespace> logs <pgbouncer-pod-name>

Look for log entries showing connections using the Kerberos principal.

  2. Verify that the Airflow UI loads successfully:

    a. Log in to the Astro UI.

    b. Navigate to your deployment.

    c. Click Airflow UI.

If the Airflow UI loads without errors, PgBouncer is successfully connecting to the database using Kerberos authentication.

Troubleshooting

PgBouncer Pod fails to start

If the PgBouncer Pod fails to start, check the following:

  • Verify that your Kerberos credential injection mechanism is configured correctly.
  • Check the Pod events for error messages:
    $ kubectl -n <airflow-namespace> describe pod <pgbouncer-pod-name>
  • Verify that the labels in pgbouncerConfig match what your credential injection mechanism expects.

Kerberos authentication failures

If authentication fails, verify:

  • The Kerberos user exists in your Active Directory and PostgreSQL database.
  • The server_krb_spn in the PgBouncer configuration matches your database hostname and realm.
  • Network connectivity between the Kubernetes cluster and the KDC.
  • Your Kerberos credential injection mechanism is working correctly.
  • In CP/DP mode, you’re using the correct Kerberos realm and credentials for the data plane (not the control plane).

Airflow UI fails to load

If the Airflow UI fails to load:

  • Check the webserver and scheduler logs for database connection errors.
  • Verify that the database was created with the correct owner.
  • Ensure the metadataConnectionJson and resultBackendConnectionJson are correctly configured.
  • Verify the pgbouncerConfig settings, especially the extraIni configuration.

Additional resources