Merged
11 changes: 11 additions & 0 deletions .github/workflows/ci.yml
Original file line number Diff line number Diff line change
@@ -363,6 +363,17 @@ ${{ hashFiles('requirements/requirements-python${{matrix.python-version}}.txt')
- name: "Tests"
run: ./scripts/ci/ci_run_airflow_testing.sh

helm-tests:
timeout-minutes: 5
name: "Checks: Helm tests"
runs-on: ubuntu-latest
env:
CI_JOB_TYPE: "Tests"
steps:
- uses: actions/checkout@master
- name: "Helm Tests"
run: ./scripts/ci/ci_run_helm_testing.sh

requirements:
timeout-minutes: 80
name: "Requirements"
2 changes: 1 addition & 1 deletion CI.rst
@@ -33,7 +33,7 @@ environments we use. Most of our CI jobs are written as bash scripts which are e
the CI jobs and we are mapping all the CI-specific environment variables to generic "CI" variables.
The only two places where CI-specific code might be are:

- CI-specific declaration file (for example it is `<.github/workflow/ci.yml>`_ for GitHub Actions
- CI-specific declaration file (for example it is `<.github/workflows/ci.yml>`_ for GitHub Actions
- The ``get_environment_for_builds_on_ci`` function in `<scripts/ci/libraries/_build_images.sh>`_ where mapping is
performed from the CI-environment specific to generic values. Example for that is CI_EVENT_TYPE variable
which determines whether we are running a ``push``, ``schedule`` or ``pull_request`` kind of CI job. For
4 changes: 4 additions & 0 deletions airflow/kubernetes/worker_configuration.py
@@ -127,6 +127,10 @@ def _get_init_containers(self) -> List[k8s.V1Container]:
name='GIT_SSH_KEY_FILE',
value='/etc/git-secret/ssh'
),
k8s.V1EnvVar(
name='GIT_SYNC_ADD_USER',
value='true'
),

@aneesh-joseph aneesh-joseph Jun 19, 2020

Contributor Author

This replicates the chart config on the Kubernetes worker as well. Without it, git-sync over SSH may not work on the worker when `git_sync_run_as_user` is set to 50000.
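
For illustration only, the SSH-related environment that this hunk gives the git-sync init container can be sketched with plain dictionaries instead of `k8s.V1EnvVar` objects. The function name `git_sync_ssh_env` is hypothetical and is not part of Airflow; the variable names and values are the ones visible in the diff.

```python
from typing import Dict, List


def git_sync_ssh_env(key_file: str = "/etc/git-secret/ssh") -> List[Dict[str, str]]:
    """Return the SSH-related env vars from this hunk as plain dicts.

    Hypothetical helper for illustration; Airflow builds k8s.V1EnvVar
    objects rather than dictionaries.
    """
    return [
        {"name": "GIT_SSH_KEY_FILE", "value": key_file},
        # Added by this change, mirroring the chart's git-sync container.
        {"name": "GIT_SYNC_ADD_USER", "value": "true"},
        {"name": "GIT_SYNC_SSH", "value": "true"},
    ]


env_names = [item["name"] for item in git_sync_ssh_env()]
```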

k8s.V1EnvVar(
name='GIT_SYNC_SSH',
value='true'
40 changes: 39 additions & 1 deletion chart/README.md
@@ -66,7 +66,7 @@ The command removes all the Kubernetes components associated with the chart and

## Updating DAGs

The recommended way to update your DAGs with this chart is to build a new docker image with the latest code (`docker build -t my-company/airflow:8a0da78 .`), push it to an accessible registry (`docker push my-company/airflow:8a0da78`), then update the Airflow pods with that image:
The recommended way to update your DAGs with this chart is to build a new docker image with the latest DAG code (`docker build -t my-company/airflow:8a0da78 .`), push it to an accessible registry (`docker push my-company/airflow:8a0da78`), then update the Airflow pods with that image:

Member

I think we will want to change that description, as it is no longer valid: we now have the "EMBED_DAGS" directive when building the Dockerfiles, and we will most likely add on-build to add the DAGs when building a dependent image. But I will do that separately.


```bash
helm upgrade airflow . \
@@ -76,6 +76,42 @@ helm upgrade airflow . \

For local development purpose you can also build the image locally and use it via deployment method described by Breeze.

## Mounting DAGS using Git-Sync side car with Persistence enabled

This option will use a Persistent Volume Claim with an accessMode of `ReadWriteMany`. The scheduler pod will sync DAGs from a git repository onto the PVC every configured number of seconds. The other pods will read the synced DAGs. Not all volume plugins support the `ReadWriteMany` accessMode. Refer to [Persistent Volume Access Modes](https://kubernetes.io/docs/concepts/storage/persistent-volumes/#access-modes) for details.

```bash
helm upgrade airflow . \
--set dags.persistence.enabled=true \
--set dags.gitSync.enabled=true
# you can also override the other persistence or gitSync values
# by setting the dags.persistence.* and dags.gitSync.* values
# Please refer to values.yaml for details
```

## Mounting DAGS using Git-Sync side car without Persistence

Member

Nice to have this option!

This option will use an always-running Git-Sync side car on every scheduler, webserver and worker pod. The Git-Sync side car containers will sync DAGs from a git repository every configured number of seconds. If you are using the KubernetesExecutor, Git-Sync will run as an initContainer on your worker pods.

```bash
helm upgrade airflow . \
--set dags.persistence.enabled=false \
--set dags.gitSync.enabled=true
# you can also override the other gitSync values
# by setting the dags.gitSync.* values
# Please refer to values.yaml for details
```

## Mounting DAGS from an externally populated PVC
In this approach, Airflow will read the DAGs from a PVC which has the `ReadOnlyMany` or `ReadWriteMany` accessMode. You will have to ensure that the PVC is populated/updated with the required DAGs (this won't be handled by the chart). You can pass the name of the volume claim to the chart:

```bash
helm upgrade airflow . \
--set dags.persistence.enabled=true \
--set dags.persistence.existingClaim=my-volume-claim \
--set dags.gitSync.enabled=false
```
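
The three README sections added here differ only in how `dags.persistence.enabled`, `dags.gitSync.enabled` and `dags.persistence.existingClaim` are combined. As a rough sketch (the function `dag_delivery_mode` and the labels it returns are hypothetical names chosen for this example, not chart values), the combinations can be summarised as:

```python
def dag_delivery_mode(
    persistence_enabled: bool,
    git_sync_enabled: bool,
    existing_claim: str = "",
) -> str:
    """Map the chart flags to the DAG delivery option the README describes.

    The returned labels are illustrative only.
    """
    if persistence_enabled and git_sync_enabled:
        # The scheduler's git-sync side car writes to the PVC, other pods read it.
        return "git-sync-with-pvc"
    if git_sync_enabled:
        # Every pod runs its own git-sync side car (or init container).
        return "git-sync-sidecars"
    if persistence_enabled:
        # The chart does not populate the PVC in this case.
        return "external-pvc" if existing_claim else "chart-created-pvc"
    # Neither flag set: DAGs are expected to be in the image.
    return "dags-in-image"


mode = dag_delivery_mode(True, False, existing_claim="my-volume-claim")
```

With persistence enabled, git-sync disabled and no existing claim, the PVC template in this change still creates a claim, but nothing in the chart fills it, which is why the sketch keeps that case separate.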


## Parameters

The following tables lists the configurable parameters of the Airflow chart and their default values.
@@ -159,6 +195,8 @@ The following tables lists the configurable parameters of the Airflow chart and
| `webserver.resources.requests.cpu` | CPU Request of webserver | `~` |
| `webserver.resources.requests.memory` | Memory Request of webserver | `~` |
| `webserver.defaultUser` | Optional default airflow user information | `{}` |
| `dags.persistence.*` | Dag persistence configuration | Please refer to `values.yaml` |
| `dags.gitSync.*` | Git sync configuration | Please refer to `values.yaml` |


Specify each parameter using the `--set key=value[,key=value]` argument to `helm install`. For example,
95 changes: 95 additions & 0 deletions chart/templates/_helpers.yaml
@@ -93,6 +93,80 @@
{{ end }}
{{- end }}

{{/* Git ssh key volume */}}
{{- define "git_sync_ssh_key_volume"}}
- name: git-sync-ssh-key
secret:
secretName: {{ .Values.dags.gitSync.sshKeySecret }}
defaultMode: 288
{{- end }}
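
A note on `defaultMode: 288`: Kubernetes treats this field as a file permission bitmask, and 288 is the decimal form of octal `0440`. The short check below uses only the Python standard library to show the conversion; it is an aside for readers, not part of the chart.

```python
import stat

default_mode = 288  # value used in the git_sync_ssh_key_volume helper

# Decimal 288 equals octal 0440: read-only for owner and group.
mode_octal = oct(default_mode)

# Render the same bits the way `ls -l` would for a regular file.
mode_string = stat.filemode(stat.S_IFREG | default_mode)

print(mode_octal)   # 0o440
print(mode_string)  # -r--r-----
```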

{{/* Git sync container */}}
{{- define "git_sync_container"}}
- name: {{ .Values.dags.gitSync.containerName }}
image: "{{ .Values.dags.gitSync.containerRepository }}:{{ .Values.dags.gitSync.containerTag }}"
env:
{{- if .Values.dags.gitSync.sshKeySecret }}
- name: GIT_SSH_KEY_FILE
value: "/etc/git-secret/ssh"
- name: GIT_SYNC_SSH
value: "true"
{{- if .Values.dags.gitSync.knownHosts }}
- name: GIT_KNOWN_HOSTS
value: "true"
- name: GIT_SSH_KNOWN_HOSTS_FILE
value: "/etc/git-secret/known_hosts"
{{- else }}
- name: GIT_KNOWN_HOSTS
value: "false"
{{- end }}
{{ else if .Values.dags.gitSync.credentialsSecret }}
- name: GIT_SYNC_USERNAME
valueFrom:
secretKeyRef:
name: {{ .Values.dags.gitSync.credentialsSecret | quote }}
key: GIT_SYNC_USERNAME
- name: GIT_SYNC_PASSWORD
valueFrom:
secretKeyRef:
name: {{ .Values.dags.gitSync.credentialsSecret | quote }}
key: GIT_SYNC_PASSWORD
{{- end }}
- name: GIT_SYNC_REV
value: {{ .Values.dags.gitSync.rev | quote }}
- name: GIT_SYNC_BRANCH
value: {{ .Values.dags.gitSync.branch | quote }}
- name: GIT_SYNC_REPO
value: {{ .Values.dags.gitSync.repo | quote }}
- name: GIT_SYNC_DEPTH
value: {{ .Values.dags.gitSync.depth | quote }}
- name: GIT_SYNC_ROOT
value: {{ .Values.dags.gitSync.root | quote }}
- name: GIT_SYNC_DEST
value: {{ .Values.dags.gitSync.dest | quote }}
- name: GIT_SYNC_ADD_USER
value: "true"
- name: GIT_SYNC_WAIT
value: {{ .Values.dags.gitSync.wait | quote }}
- name: GIT_SYNC_MAX_SYNC_FAILURES
value: {{ .Values.dags.gitSync.maxFailures | quote }}
volumeMounts:
- name: dags
mountPath: {{ .Values.dags.gitSync.root }}
{{- if and .Values.dags.gitSync.enabled .Values.dags.gitSync.sshKeySecret }}
- name: git-sync-ssh-key
mountPath: /etc/git-secret/ssh
readOnly: true
subPath: gitSshKey
{{- if .Values.dags.gitSync.knownHosts }}
- name: config
mountPath: /etc/git-secret/known_hosts
readOnly: true
subPath: known_hosts
{{- end }}
{{- end }}
{{- end }}

# This helper will change when customers deploy a new image.
{{ define "airflow_image" -}}
{{ printf "%s:%s" (.Values.images.airflow.repository | default .Values.defaultAirflowRepository) (.Values.images.airflow.tag | default .Values.defaultAirflowTag) }}
@@ -185,9 +259,30 @@ log_connections = {{ .Values.pgbouncer.logConnections }}
{{ (printf "%s/logs" .Values.airflowHome) | quote }}
{{- end }}

{{ define "airflow_dags" -}}
{{- if .Values.dags.gitSync.enabled -}}
{{ (printf "%s/dags/%s/%s" .Values.airflowHome .Values.dags.gitSync.dest .Values.dags.gitSync.subPath ) }}
{{- else -}}
{{ (printf "%s/dags" .Values.airflowHome) }}
{{- end -}}
{{- end -}}

{{ define "airflow_dags_volume_claim" -}}
{{- if and .Values.dags.persistence.enabled .Values.dags.persistence.existingClaim -}}
{{ .Values.dags.persistence.existingClaim }}
{{- else -}}
{{ .Release.Name }}-dags
{{- end -}}
{{- end -}}

{{ define "airflow_dags_mount_path" -}}
{{ (printf "%s/dags" .Values.airflowHome) }}
{{- end }}

{{ define "airflow_config_path" -}}
{{ (printf "%s/airflow.cfg" .Values.airflowHome) | quote }}
{{- end }}

{{ define "airflow_webserver_config_path" -}}
{{ (printf "%s/webserver_config.py" .Values.airflowHome) | quote }}
{{- end }}
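
To make the branching in the new `airflow_dags` and `airflow_dags_volume_claim` helpers easier to follow, here is a minimal Python sketch of the same logic. The nested `values` dictionary and the sample values (`/opt/airflow`, `repo`, `tests/dags`, `my-release`) are assumptions made up for this example and are not taken from `values.yaml`.

```python
def airflow_dags(values: dict) -> str:
    """Mirror the airflow_dags helper: where Airflow looks for DAG files."""
    home = values["airflowHome"]
    git_sync = values["dags"]["gitSync"]
    if git_sync["enabled"]:
        # git-sync checks the repo out under <dest>; DAGs live in <subPath>.
        return f"{home}/dags/{git_sync['dest']}/{git_sync['subPath']}"
    return f"{home}/dags"


def airflow_dags_volume_claim(values: dict, release_name: str) -> str:
    """Mirror the airflow_dags_volume_claim helper: which PVC name to mount."""
    persistence = values["dags"]["persistence"]
    if persistence["enabled"] and persistence.get("existingClaim"):
        return persistence["existingClaim"]
    return f"{release_name}-dags"


# Hypothetical sample values, for illustration only.
sample = {
    "airflowHome": "/opt/airflow",
    "dags": {
        "gitSync": {"enabled": True, "dest": "repo", "subPath": "tests/dags"},
        "persistence": {"enabled": True, "existingClaim": "my-volume-claim"},
    },
}

dags_folder = airflow_dags(sample)
claim_name = airflow_dags_volume_claim(sample, "my-release")
```

The claim helper falls back to `<release>-dags` whenever no existing claim is supplied, which matches the name the new PVC template gives the claim it creates.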
40 changes: 37 additions & 3 deletions chart/templates/configmap.yaml
@@ -35,6 +35,7 @@ data:
# These are system-specified config overrides.
airflow.cfg: |
[core]
dags_folder = {{ include "airflow_dags" . }}
load_examples = False
colored_console_log = False
executor = {{ .Values.executor }}
@@ -87,13 +88,42 @@ data:
namespace = {{ .Release.Namespace }}
airflow_configmap = {{ include "airflow_config" . }}
airflow_local_settings_configmap = {{ include "airflow_config" . }}
worker_container_repository = {{ .Values.images.airflow.repository }}
worker_container_tag = {{ .Values.images.airflow.tag }}
worker_container_repository = {{ .Values.images.airflow.repository | default .Values.defaultAirflowRepository }}
worker_container_tag = {{ .Values.images.airflow.tag | default .Values.defaultAirflowTag }}
worker_container_image_pull_policy = {{ .Values.images.airflow.pullPolicy }}
worker_service_account_name = {{ .Release.Name }}-worker-serviceaccount
image_pull_secrets = {{ template "registry_secret" . }}
dags_in_image = True
dags_in_image = {{ if or .Values.dags.gitSync.enabled .Values.dags.persistence.enabled }}False{{ else }}True{{ end }}
delete_worker_pods = True
run_as_user = {{ .Values.uid }}
fs_group = {{ .Values.gid }}
{{- if or .Values.dags.gitSync.enabled .Values.dags.persistence.enabled }}
git_dags_folder_mount_point = {{ include "airflow_dags_mount_path" . }}
dags_volume_mount_point = {{ include "airflow_dags_mount_path" . }}
{{- if .Values.dags.persistence.enabled }}
dags_volume_claim = {{ .Release.Name }}-dags
dags_volume_subpath = {{.Values.dags.gitSync.dest }}/{{ .Values.dags.gitSync.subPath }}
{{- else }}
git_repo = {{ .Values.dags.gitSync.repo }}
git_branch = {{ .Values.dags.gitSync.branch }}
git_sync_rev = {{ .Values.dags.gitSync.rev }}
git_sync_depth = {{ .Values.dags.gitSync.depth }}
git_sync_root = {{ .Values.dags.gitSync.root }}
git_sync_dest = {{ .Values.dags.gitSync.dest }}
git_sync_container_repository = {{ .Values.dags.gitSync.containerRepository }}
git_sync_container_tag = {{ .Values.dags.gitSync.containerTag }}
git_sync_init_container_name = {{ .Values.dags.gitSync.containerName }}
git_sync_run_as_user = {{ .Values.uid }}
{{- if .Values.dags.gitSync.knownHosts }}
git_ssh_known_hosts_configmap_name = {{ include "airflow_config" . }}
{{- end }}
{{- if .Values.dags.gitSync.sshKeySecret }}
git_ssh_key_secret_name = {{ .Values.dags.gitSync.sshKeySecret }}
{{- else if .Values.dags.gitSync.credentialsSecret }}
git_sync_credentials_secret = {{ .Values.dags.gitSync.credentialsSecret }}
{{- end }}
{{- end }}
{{- end }}

[kubernetes_secrets]
AIRFLOW__CORE__SQL_ALCHEMY_CONN = {{ printf "%s=connection" (include "airflow_metadata_secret" .) }}
@@ -120,3 +150,7 @@ data:
airflow_local_settings.py: |
{{ .Values.scheduler.airflowLocalSettings | nindent 4 }}
{{- end }}
{{- if and .Values.dags.gitSync.enabled .Values.dags.gitSync.knownHosts }}
known_hosts: |
{{ .Values.dags.gitSync.knownHosts | nindent 4 }}
{{- end }}
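
The conditionals added to the `[kubernetes]` section above decide two things: the value of `dags_in_image`, and whether workers receive the volume-claim settings or the git-sync settings. A hedged sketch of that decision follows; the function names and the returned labels are hypothetical and exist only for this example.

```python
def dags_in_image(git_sync_enabled: bool, persistence_enabled: bool) -> str:
    """Value rendered for dags_in_image in airflow.cfg."""
    return "False" if (git_sync_enabled or persistence_enabled) else "True"


def worker_dag_source(git_sync_enabled: bool, persistence_enabled: bool) -> str:
    """Which group of [kubernetes] settings the template emits for workers."""
    if persistence_enabled:
        # dags_volume_claim and dags_volume_subpath are written.
        return "volume-claim"
    if git_sync_enabled:
        # git_repo, git_branch, git_sync_* and related keys are written.
        return "git-sync"
    # Neither block is rendered.
    return "image"


rendered = dags_in_image(git_sync_enabled=True, persistence_enabled=False)
```

Persistence takes precedence here: when both flags are set, the template writes only the volume-claim keys, so workers read DAGs from the PVC rather than running their own git-sync init container.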
41 changes: 41 additions & 0 deletions chart/templates/dags-persistent-volume-claim.yaml
@@ -0,0 +1,41 @@
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.

{{- if and (not .Values.dags.persistence.existingClaim ) .Values.dags.persistence.enabled }}
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
name: {{ .Release.Name }}-dags
labels:
tier: airflow
component: dags-pvc
release: {{ .Release.Name }}
chart: "{{ .Chart.Name }}-{{ .Chart.Version }}"
heritage: {{ .Release.Service }}
spec:
accessModes: [{{ .Values.dags.persistence.accessMode | quote }}]
resources:
requests:
storage: {{ .Values.dags.persistence.size | quote }}
{{- if .Values.dags.persistence.storageClass }}
{{- if (eq "-" .Values.dags.persistence.storageClass) }}
storageClassName: ""
{{- else }}
storageClassName: "{{ .Values.dags.persistence.storageClass }}"
{{- end }}
{{- end }}
{{- end }}
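
The `storageClass` handling in this template has three outcomes, which the sketch below spells out. The function name `storage_class_name` is hypothetical, `None` stands for "the `storageClassName` field is not rendered at all", and `fast-ssd` in the usage line is a made-up class name.

```python
from typing import Optional


def storage_class_name(storage_class: Optional[str]) -> Optional[str]:
    """Mirror the template's storageClass branches.

    Returns None when the field is omitted from the rendered PVC.
    """
    if not storage_class:
        # Unset or empty: field omitted, the cluster default class applies.
        return None
    if storage_class == "-":
        # "-" is rendered as an empty string, which disables dynamic provisioning.
        return ""
    return storage_class


rendered_class = storage_class_name("fast-ssd")
```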
16 changes: 16 additions & 0 deletions chart/templates/scheduler/scheduler-deployment.yaml
@@ -144,6 +144,11 @@ spec:
mountPath: {{ template "airflow_local_setting_path" . }}
subPath: airflow_local_settings.py
readOnly: true
{{- end }}
{{- if .Values.dags.gitSync.enabled }}
- name: dags
mountPath: {{ template "airflow_dags_mount_path" . }}
{{- include "git_sync_container" . | indent 8 }}
{{- end }}
# Always start the garbage collector sidecar.
- name: scheduler-gc
@@ -177,6 +182,17 @@ spec:
- name: config
configMap:
name: {{ template "airflow_config" . }}
{{- if .Values.dags.persistence.enabled }}
- name: dags
persistentVolumeClaim:
claimName: {{ template "airflow_dags_volume_claim" . }}
{{- else if .Values.dags.gitSync.enabled }}
- name: dags
emptyDir: {}
{{- end }}
{{- if and .Values.dags.gitSync.enabled .Values.dags.gitSync.sshKeySecret }}
{{- include "git_sync_ssh_key_volume" . | indent 8 }}
{{- end }}
{{- if not $stateful }}
- name: logs
emptyDir: {}
19 changes: 19 additions & 0 deletions chart/templates/webserver/webserver-deployment.yaml
@@ -68,6 +68,7 @@ spec:
restartPolicy: Always
securityContext:
runAsUser: {{ .Values.uid }}
fsGroup: {{ .Values.gid }}
{{- if or .Values.registry.secretName .Values.registry.connection }}
imagePullSecrets:
- name: {{ template "registry_secret" . }}
@@ -82,6 +83,9 @@ spec:
{{- include "custom_airflow_environment" . | indent 10 }}
{{- include "standard_airflow_environment" . | indent 10 }}
containers:
{{- if and (.Values.dags.gitSync.enabled) (not .Values.dags.persistence.enabled) }}
{{- include "git_sync_container" . | indent 8 }}
{{- end }}
- name: webserver
image: {{ template "airflow_image" . }}
imagePullPolicy: {{ .Values.images.airflow.pullPolicy }}
@@ -105,6 +109,10 @@ spec:
subPath: airflow_local_settings.py
readOnly: true
{{- end }}
{{- if or .Values.dags.gitSync.enabled .Values.dags.persistence.enabled }}
- name: dags
mountPath: {{ template "airflow_dags_mount_path" . }}
{{- end }}
{{- if .Values.webserver.extraVolumeMounts }}
{{ toYaml .Values.webserver.extraVolumeMounts | indent 12 }}
{{- end }}
@@ -134,6 +142,17 @@ spec:
- name: config
configMap:
name: {{ template "airflow_config" . }}
{{- if .Values.dags.persistence.enabled }}
- name: dags
persistentVolumeClaim:
claimName: {{ .Release.Name }}-dags

Contributor

If `dags.persistence.existingClaim` is used, this should be changed to
`claimName: {{ .Values.dags.persistence.existingClaim }}`.
Currently the webserver pod cannot find the PVC, so it stays in a Pending state.

Contributor Author

I have raised a PR to fix this: #9688

{{- else if .Values.dags.gitSync.enabled }}
- name: dags
emptyDir: {}
{{- if .Values.dags.gitSync.sshKeySecret }}
{{- include "git_sync_ssh_key_volume" . | indent 8 }}
{{- end }}
{{- end }}
{{- if .Values.webserver.extraVolumes }}
{{ toYaml .Values.webserver.extraVolumes | indent 8 }}
{{- end }}