Fix KPO hanging indefinitely when init_container_logs=True and pod stays in Pending #68450

Merged
jscheffl merged 1 commit into apache:main from jayachandrakasarla:fix-68445-kpo-hangs-when-init-container-logs-true
Jun 12, 2026

@jayachandrakasarla

Contributor

Closes #68445

Problem

When KubernetesPodOperator is configured with init_container_logs=True, the task hangs indefinitely if the pod never leaves the Pending phase (e.g. due to an invalid node_selector, missing node pool, or resource exhaustion).

With init_container_logs=False, PodLaunchTimeoutException is raised correctly after startup_timeout_seconds / schedule_timeout_seconds. With init_container_logs=True, the task never times out and the pod is never cleaned up.
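The expected timeout behavior can be pictured with a small polling loop. This is only a sketch under stated assumptions, not the provider's code: `wait_for_pod_start`, `read_phase`, and the local `PodLaunchTimeout` class are hypothetical stand-ins, and the real checks in the provider are more involved than a single phase comparison.

```python
import time


class PodLaunchTimeout(Exception):
    """Hypothetical local stand-in for the provider's PodLaunchTimeoutException."""


def wait_for_pod_start(read_phase, timeout_seconds, poll_interval=0.01):
    """Poll read_phase() until the pod leaves Pending or the deadline passes.

    read_phase is an assumed callable returning the pod phase as a string.
    """
    deadline = time.monotonic() + timeout_seconds
    while True:
        phase = read_phase()
        if phase != "Pending":
            return phase
        if time.monotonic() >= deadline:
            # Bounded wait: a pod stuck in Pending surfaces as an error
            # instead of blocking the task forever.
            raise PodLaunchTimeout(f"Pod still Pending after {timeout_seconds}s")
        time.sleep(poll_interval)
```

A wait that lacks this kind of deadline, such as following logs of a container that never starts, has no way to give up, which matches the hang described above.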

You can reproduce the issue using the following DAG code:

from __future__ import annotations

from pendulum import datetime

from airflow.sdk import dag
from airflow.providers.cncf.kubernetes.operators.pod import KubernetesPodOperator
from kubernetes.client import models as k8s


@dag(
    dag_id="kpo_pending_init_container_logs",
    start_date=datetime(2025, 1, 1),
    schedule=None,
    catchup=False,
)
def kpo_pending_init_container_logs():
    KubernetesPodOperator(
        task_id="kpo_pending_with_init_logs",
        name="kpo-pending-with-init-logs",
        namespace="default",
        image="busybox:1.36",
        cmds=["sh", "-c"],
        arguments=["echo main container should never start; sleep 30"],

        deferrable=False,

        # Setting this to True makes the task hang indefinitely.
        init_container_logs=True,
        init_containers=[
            k8s.V1Container(
                name="init-hello",
                image="busybox:1.36",
                command=["sh", "-c"],
                args=["echo init container should never start; sleep 30"],
            )
        ],

        # Select a node label that does not exist so the pod stays in the Pending phase.
        node_selector={
            "airflow-repro-node": "does-not-exist",
        },

        # Expected behavior: should fail after timeout.
        # Bug: hangs forever when init_container_logs=True.
        startup_timeout_seconds=30,
        schedule_timeout_seconds=30,

        get_logs=True,
        is_delete_operator_pod=False,
        in_cluster=False,
        config_file="/files/kube/config",
    )


kpo_pending_init_container_logs()

Fix

self.await_pod_start() now runs before self.await_init_containers_completion(), so the pod must have started before KPO attempts to stream init container logs. Previously, init container log streaming could be triggered against a pod still in the Pending phase, which caused KPO to hang.
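The effect of the reordering can be illustrated with a toy model. It is a sketch, not the merged change: `FakePodManager`, `run_fixed_order`, and the `stuck_pending` flag are hypothetical, only the two method names come from the description above, and the blocking log stream is replaced by an exception so the example terminates.

```python
class PodLaunchTimeout(Exception):
    """Hypothetical local stand-in for the provider's PodLaunchTimeoutException."""


class FakePodManager:
    """Toy model of the two waiting steps; records the order in which they run."""

    def __init__(self, stuck_pending):
        self.stuck_pending = stuck_pending
        self.calls = []

    def await_pod_start(self):
        self.calls.append("await_pod_start")
        if self.stuck_pending:
            # Stands in for the startup/schedule timeout firing.
            raise PodLaunchTimeout("pod never left Pending")

    def await_init_containers_completion(self):
        self.calls.append("await_init_containers_completion")
        if self.stuck_pending:
            # The real step would block here; the sketch raises instead.
            raise RuntimeError("would block: no init container to stream logs from")


def run_fixed_order(manager, init_container_logs=True):
    """Wait for the pod to start first, then handle init container logs."""
    manager.await_pod_start()
    if init_container_logs:
        manager.await_init_containers_completion()
    return manager.calls
```

With `stuck_pending=True`, `run_fixed_order` raises `PodLaunchTimeout` and the init container step is never reached, which is the behavior the fix aims for.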

Was generative AI tooling used to co-author this PR?

[X] Yes

Used Claude Sonnet to understand the codebase and assist with implementing the changes.

@boring-cyborg boring-cyborg Bot added area:providers provider:cncf-kubernetes Kubernetes (k8s) provider related issues labels Jun 12, 2026
@jayachandrakasarla jayachandrakasarla changed the title made await_pod_start to run before await_init_containers_completion Fix KPO hanging indefinitely when init_container_logs=True and pod stays in Pending Jun 12, 2026
@jscheffl jscheffl merged commit 79fa165 into apache:main Jun 12, 2026
102 checks passed
@boring-cyborg

boring-cyborg Bot commented Jun 12, 2026

Awesome work, congrats on your first merged pull request! You are invited to check our Issue Tracker for additional contributions.

imrichardwu pushed a commit to imrichardwu/airflow that referenced this pull request Jun 16, 2026
…pache#68450)

Co-authored-by: Jayachandra Kasarla <jayachandra.kasarla@MacBook-Pro.local>
dingo4dev pushed a commit to dingo4dev/airflow that referenced this pull request Jun 16, 2026
…pache#68450)

Co-authored-by: Jayachandra Kasarla <jayachandra.kasarla@MacBook-Pro.local>
Development

Successfully merging this pull request may close these issues.

KubernetesPodOperator hangs on a Pending pod when init_container_logs is set

2 participants