Configure git-sync code deploys
You can deploy Dags to an Astro Private Cloud Deployment using git-sync. After setting up this feature, you can deploy Dags from a Git repository without any additional CI/CD. Dags deployed with git-sync automatically appear in the Airflow UI without requiring additional action or causing downtime. You can also roll back images with the Astro Private Cloud UI and Houston API.
The git-sync relay RWX volume does not work with azurefile-csi.
This guide describes the setup options and the steps for configuring git-sync as a Dag deploy option. The git-sync relay can fetch changes using either a polling or a webhook strategy.
When you configure git-sync, you must choose both a repo fetch mode and a repo share mode:
Repo fetch mode determines how the git-sync relay retrieves changes from your Git repository. Choose one of the following options:
Poll mode: The git-sync relay checks the remote Git repo for changes at regular intervals, so it checks for updates continuously. Use poll mode for repositories with frequent changes across branches.
Tradeoff: Frequent polling generates unnecessary network traffic between your Deployment and the repository when changes are infrequent.
Webhook mode: The git-sync relay fetches changes only when a push event fires from the Git repository. Use Webhook mode for repositories that don't change frequently, to avoid unnecessary network traffic between your Deployment and the repository.
Tradeoff: If configured for a specific branch, the git-sync relay only downloads changes for that branch regardless of fetch mode. However, in Webhook mode, the webhook fires for every push event in the repository — not just pushes to the configured branch. This means a busy repository can still generate frequent webhook calls even when branch filtering is in place.
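The fetch-mode tradeoffs above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the git-sync relay's implementation: `polls_per_day` and `should_sync` are hypothetical helpers, the poll interval is an example value, and the `ref` field follows the format of GitHub's push event payload (`refs/heads/<branch>`).

```python
# Minimal sketch of the fetch-mode tradeoffs. Helper names are hypothetical;
# they illustrate the behavior described above, not the git-sync relay's code.

SECONDS_PER_DAY = 86_400


def polls_per_day(poll_interval_seconds: int) -> int:
    """Poll mode: remote checks per day, made whether or not anything changed."""
    if poll_interval_seconds <= 0:
        raise ValueError("poll interval must be a positive number of seconds")
    return SECONDS_PER_DAY // poll_interval_seconds


def should_sync(push_event: dict, configured_branch: str) -> bool:
    """Webhook mode: every push in the repository calls the webhook, but only
    pushes to the configured branch lead to a download.

    Assumes a GitHub-style payload where "ref" looks like "refs/heads/<branch>".
    """
    return push_event.get("ref") == f"refs/heads/{configured_branch}"


if __name__ == "__main__":
    # A hypothetical 60-second interval means 1440 checks per day, even on a quiet repo.
    print(polls_per_day(60))

    # Three pushes fire three webhook calls, but only one matches the branch.
    events = [
        {"ref": "refs/heads/main"},
        {"ref": "refs/heads/feature-x"},
        {"ref": "refs/tags/v1.0"},
    ]
    print([should_sync(event, "main") for event in events])
```

In this model, poll mode does a fixed amount of work regardless of how often the repository changes, while webhook mode does work per push, including pushes that the branch filter then discards.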
Repo share mode determines how the git-sync relay distributes the synced repository to Airflow pods in the Deployment. Choose one of the following options:
git-daemon mode: A git-daemon container serves the repository within the namespace using the Git protocol on port 9418. The Airflow Deployment contains a git-sync relay Pod with both a git-sync container that stores the Git repo and a git-daemon container that serves the repo to the namespace.
Tradeoff: All Airflow containers must clone the repository at startup, which can cause significant network use with large repositories and increase startup time.
Shared-volume mode: The Git repository contents are stored on a ReadWriteMany (RWX) storage volume mounted into each Airflow pod, which eliminates git clone activity between pods. The git-sync relay Pod pulls from the external Git repo and writes to the RWX volume.
Requirement: An RWX-compatible StorageClass volume. RWX-compatible StorageClasses aren’t included in standard Kubernetes. You must provision additional cloud infrastructure to support RWX volumes, and the configuration steps differ between cloud providers. See your cloud provider’s documentation for details.
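A back-of-the-envelope comparison of startup Git traffic for the two share modes follows. It is a sketch under assumptions, not measured behavior: the helper names are hypothetical, it assumes a full clone transfers roughly the repository's size, and the example numbers are illustrative. Real transfer sizes depend on compression, clone depth, and caching.

```python
# Back-of-the-envelope startup traffic for the two repo share modes.
# Hypothetical helpers; assumes a full clone moves roughly repo_size_mb.


def git_daemon_startup_mb(repo_size_mb: float, airflow_containers: int) -> float:
    """git-daemon mode: the relay fetches from the external repo once, then
    every Airflow container clones from the in-namespace git-daemon."""
    relay_fetch = repo_size_mb
    container_clones = repo_size_mb * airflow_containers
    return relay_fetch + container_clones


def shared_volume_startup_mb(repo_size_mb: float) -> float:
    """Shared-volume mode: only the relay pulls from the external repo; Airflow
    pods read the RWX volume, so adding pods adds no clone traffic."""
    return repo_size_mb


if __name__ == "__main__":
    # Hypothetical: a 500 MB repo and 6 Airflow containers.
    print(git_daemon_startup_mb(500, 6))  # 3500
    print(shared_volume_startup_mb(500))  # 500
```

The gap between the two numbers grows with both repository size and the number of Airflow containers, which is the tradeoff that the RWX storage requirement buys back.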
To enable the git-sync deploy feature, you need:
To configure a git-sync deploy mechanism for a Deployment on APC, you need Workspace Editor permissions.
To deploy Dags to a Deployment using a git-sync deploy mechanism, you need permission to push code to a Git repository configured for git-sync deploys.
For both git-daemon and shared-volume modes, you must explicitly select git-sync deploys in the UI for each Airflow Deployment.
However, for shared-volume mode, an APC Admin must also configure the storage class name for the RWX shared volume (storageClassName) in the Houston configuration.
For example, update your values.yaml file with the following values, including the name of your RWX-compatible StorageClass:
Workspace editors can configure a new or existing Airflow Deployment to use a git-sync mechanism for Dag deploys. From there, any member of your organization with write permissions to the Git repository can deploy Dags to the Deployment. To configure a Deployment for git-sync deploys:
In the Astro Private Cloud UI, create a new Airflow Deployment or open an existing one.
Go to the Dag Deployment section of the Deployment’s Settings page.
For your Mechanism, select Git Sync.
Configure the following values:
./. Other changes outside the Dags directory in your Git repository must be deployed using astro deploy.
To retrieve your Git provider's public host key, run ssh-keyscan -t rsa <provider-domain>. For an example of how to retrieve GitHub's public key, refer to the Apache Airflow documentation.
(Webhook only) You can now open your GitHub repository and set up a repository webhook, or you can return to your Deployment details page to configure this later. Be sure to set the following configurations:
If you complete your Deployment configuration for git-sync and encounter an error during the first deploy, you might need to force a restart of the Airflow Deployment at least once, several minutes after you initially create it. For example, you can add any new environment variable to your Deployment, such as FOO=foo, to force the Deployment containers to restart.
After you see your Dags update in the Airflow UI, you can remove the environment variable.
After you configure your Deployment, any code you push to the Dag directory of the specified Git repo and branch appears in your Deployment with zero downtime.
With the default configuration, newly created Dag files can take up to five minutes after syncing to appear in the Airflow UI. To shorten this delay, tune AIRFLOW__SCHEDULER__DAG_DIR_LIST_INTERVAL in your Airflow Deployment.
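The sketch below shows how that setting name maps to Airflow's environment-variable convention and roughly why the delay can reach five minutes. Assumptions: 300 seconds is the Airflow 2.x default for `[scheduler] dag_dir_list_interval`, `sync_delay_s` is a hypothetical input rather than an APC default, Dag parse time is ignored, and the helper names are illustrative only.

```python
# Sketch: map an Airflow setting to its environment variable, and estimate the
# worst-case wait before a newly synced Dag file is picked up.
# Assumptions: 300 s is Airflow 2.x's default for [scheduler] dag_dir_list_interval;
# sync_delay_s is a hypothetical input, not an APC default; parse time is ignored.

DEFAULT_DAG_DIR_LIST_INTERVAL_S = 300


def airflow_env_var(section: str, key: str) -> str:
    """Airflow overrides config options via AIRFLOW__{SECTION}__{KEY} variables."""
    return f"AIRFLOW__{section.upper()}__{key.upper()}"


def worst_case_delay_s(
    sync_delay_s: int, dag_dir_list_interval_s: int = DEFAULT_DAG_DIR_LIST_INTERVAL_S
) -> int:
    """A file that lands just after a directory scan waits a full interval."""
    return sync_delay_s + dag_dir_list_interval_s


if __name__ == "__main__":
    print(airflow_env_var("scheduler", "dag_dir_list_interval"))
    print(worst_case_delay_s(10))      # 310 with the default interval
    print(worst_case_delay_s(10, 60))  # 70 with the interval tuned to 60 seconds
```

Lowering the interval shortens the wait at the cost of more frequent scans of the Dags folder, so pick a value that balances responsiveness against scheduler load.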
The Git repo you want to sync should contain a directory of Dags that you want to deploy to APC. You can include additional files in the repo, such as your other Astro project files, but extra files increase the amount of content the git-sync relay has to sync, which can slow down deploys of new Dag changes.
If you want to deploy Dags with a private Git repo, you additionally need to configure SSH so that your APC Deployment can access the contents of the repo. This process varies slightly between Git repository management tools. For an example of this configuration, read GitLab’s SSH Key documentation.
You can add Kubernetes scheduling configurations (tolerations, nodeSelector, and affinity) to your global git-sync relay configuration to control which nodes run your git-sync relay Pods.
These settings can help you meet security or compliance requirements for workload isolation, optimize resource utilization by co-locating related components, and handle tainted nodes in mixed-use Kubernetes clusters. These parameters are not required for git-sync relay functionality, so you only need to add nodeSelector, affinity, or tolerations to your configuration if you need specific node placement for your git-sync relay components.
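The following sketch models, in simplified form, how a nodeSelector and tolerations decide whether a Pod such as the git-sync relay can be placed on a node. It illustrates Kubernetes scheduling semantics and is not the scheduler's code or an APC configuration format. Affinity is omitted, PreferNoSchedule taints are treated as non-blocking, and the label and taint names in the example (`workload`, `dedicated`) are hypothetical.

```python
# Simplified model of nodeSelector + toleration matching (Kubernetes semantics).
# Not the real scheduler; affinity is omitted and only hard taint effects
# (NoSchedule, NoExecute) block placement. Label/taint names are hypothetical.
from dataclasses import dataclass


@dataclass
class Taint:
    key: str
    value: str = ""
    effect: str = "NoSchedule"


@dataclass
class Toleration:
    key: str = ""            # empty key with operator "Exists" matches every taint
    operator: str = "Equal"  # "Equal" or "Exists"
    value: str = ""
    effect: str = ""         # empty effect matches every effect


def tolerates(tol: Toleration, taint: Taint) -> bool:
    """Return True if this toleration matches the taint."""
    if tol.effect and tol.effect != taint.effect:
        return False
    if tol.operator == "Exists":
        return tol.key == "" or tol.key == taint.key
    return tol.key == taint.key and tol.value == taint.value


def can_schedule(
    node_labels: dict[str, str],
    node_taints: list[Taint],
    node_selector: dict[str, str],
    tolerations: list[Toleration],
) -> bool:
    # nodeSelector: the node must carry every requested label with the same value.
    if any(node_labels.get(k) != v for k, v in node_selector.items()):
        return False
    # Every hard taint on the node must be matched by at least one toleration.
    hard = [t for t in node_taints if t.effect in ("NoSchedule", "NoExecute")]
    return all(any(tolerates(tol, t) for tol in tolerations) for t in hard)


if __name__ == "__main__":
    labels = {"workload": "git-sync"}
    taints = [Taint("dedicated", "git-sync")]
    tol = Toleration("dedicated", "Equal", "git-sync", "NoSchedule")
    print(can_schedule(labels, taints, {"workload": "git-sync"}, [tol]))  # True
    print(can_schedule(labels, taints, {"workload": "git-sync"}, []))     # False
```

Without the matching toleration, the relay Pod cannot land on the tainted node even though its nodeSelector matches, which is why the two settings are often used together for dedicated nodes.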