
KubernetesPodOperator breaks with active log-collection for long running tasks #12136

Description

@yamrzou

I'm hitting the same bug reported in https://issues.apache.org/jira/browse/AIRFLOW-3534, on Airflow 1.10.12.

[2020-11-06 13:03:29,672] {pod_launcher.py:173} INFO - Event: fetcher-56104d81c54946a88ce3cd1cf4273477 had an event of type Pending
[2020-11-06 13:03:29,673] {pod_launcher.py:139} WARNING - Pod not yet started: fetcher-56104d81c54946a88ce3cd1cf4273477
[2020-11-06 13:03:30,681] {pod_launcher.py:173} INFO - Event: fetcher-56104d81c54946a88ce3cd1cf4273477 had an event of type Pending
[2020-11-06 13:03:30,681] {pod_launcher.py:139} WARNING - Pod not yet started: fetcher-56104d81c54946a88ce3cd1cf4273477
[2020-11-06 13:03:31,692] {pod_launcher.py:173} INFO - Event: fetcher-56104d81c54946a88ce3cd1cf4273477 had an event of type Pending
[2020-11-06 13:03:31,692] {pod_launcher.py:139} WARNING - Pod not yet started: fetcher-56104d81c54946a88ce3cd1cf4273477
[2020-11-06 13:03:32,702] {pod_launcher.py:173} INFO - Event: fetcher-56104d81c54946a88ce3cd1cf4273477 had an event of type Running
[2020-11-06 13:04:32,740] {taskinstance.py:1150} ERROR - ('Connection broken: IncompleteRead(0 bytes read)', IncompleteRead(0 bytes read))
Traceback (most recent call last):
  File "/opt/bitnami/airflow/venv/lib/python3.6/site-packages/urllib3/response.py", line 696, in _update_chunk_length
    self.chunk_left = int(line, 16)
ValueError: invalid literal for int() with base 16: b''

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/opt/bitnami/airflow/venv/lib/python3.6/site-packages/urllib3/response.py", line 436, in _error_catcher
    yield
  File "/opt/bitnami/airflow/venv/lib/python3.6/site-packages/urllib3/response.py", line 763, in read_chunked
    self._update_chunk_length()
  File "/opt/bitnami/airflow/venv/lib/python3.6/site-packages/urllib3/response.py", line 700, in _update_chunk_length
    raise httplib.IncompleteRead(line)
http.client.IncompleteRead: IncompleteRead(0 bytes read)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/opt/bitnami/airflow/venv/lib/python3.6/site-packages/airflow/models/taskinstance.py", line 979, in _run_raw_task
    result = task_copy.execute(context=context)
  File "/opt/bitnami/airflow/venv/lib/python3.6/site-packages/airflow/contrib/operators/kubernetes_pod_operator.py", line 284, in execute
    final_state, _, result = self.create_new_pod_for_operator(labels, launcher)
  File "/opt/bitnami/airflow/venv/lib/python3.6/site-packages/airflow/contrib/operators/kubernetes_pod_operator.py", line 403, in create_new_pod_for_operator
    final_state, result = launcher.monitor_pod(pod=pod, get_logs=self.get_logs)
  File "/opt/bitnami/airflow/venv/lib/python3.6/site-packages/airflow/kubernetes/pod_launcher.py", line 155, in monitor_pod
    for line in logs:
  File "/opt/bitnami/airflow/venv/lib/python3.6/site-packages/urllib3/response.py", line 807, in __iter__
    for chunk in self.stream(decode_content=True):
  File "/opt/bitnami/airflow/venv/lib/python3.6/site-packages/urllib3/response.py", line 571, in stream
    for line in self.read_chunked(amt, decode_content=decode_content):
  File "/opt/bitnami/airflow/venv/lib/python3.6/site-packages/urllib3/response.py", line 792, in read_chunked
    self._original_response.close()
  File "/opt/bitnami/python/lib/python3.6/contextlib.py", line 99, in __exit__
    self.gen.throw(type, value, traceback)
  File "/opt/bitnami/airflow/venv/lib/python3.6/site-packages/urllib3/response.py", line 454, in _error_catcher
    raise ProtocolError("Connection broken: %r" % e, e)
urllib3.exceptions.ProtocolError: ('Connection broken: IncompleteRead(0 bytes read)', IncompleteRead(0 bytes read))
[2020-11-06 13:04:32,743] {taskinstance.py:1194} INFO - Marking task as UP_FOR_RETRY. dag_id=..., task_id=..., execution_date=20201106T120000, start_date=20201106T130329, end_date=20201106T130432
[2020-11-06 13:04:34,641] {local_task_job.py:102} INFO - Task exited with return code 1

Setting get_logs=False on the KubernetesPodOperator makes the bug go away. I reproduced it with multiple DAGs and tasks.
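Disabling get_logs avoids the crash but also drops the pod's output from the task log. One possible alternative, sketched here as a minimal stdlib-only Python example, is to reopen the log stream when the connection breaks and skip the lines that were already delivered. This is not Airflow's code: follow_logs_with_reconnect, open_stream and make_flaky_factory are hypothetical names. The sketch assumes that each reopened stream replays the container log from the beginning, so already-seen lines can be skipped by position. With a real urllib3-backed stream you would pass retry_on=(urllib3.exceptions.ProtocolError,), since that is the exception in the traceback above. ProtocolError is not a subclass of the built-in ConnectionError the demo uses.

```python
import time
from typing import Callable, Iterable, Iterator, Tuple, Type


def follow_logs_with_reconnect(
    open_stream: Callable[[], Iterable[bytes]],
    retry_on: Tuple[Type[BaseException], ...] = (ConnectionError,),
    max_reconnects: int = 5,
    backoff_seconds: float = 1.0,
) -> Iterator[bytes]:
    """Yield log lines, reopening the stream when the connection breaks.

    Assumption: every call to open_stream() replays the log from the start,
    so lines already yielded are skipped by position after a reconnect.
    """
    delivered = 0   # lines handed to the caller so far
    reconnects = 0  # reconnect attempts used so far
    while True:
        try:
            for index, line in enumerate(open_stream()):
                if index < delivered:
                    continue  # replayed line, already yielded before the break
                delivered += 1
                yield line
            return  # stream ended cleanly, e.g. the container exited
        except retry_on:
            if reconnects >= max_reconnects:
                raise  # give up and surface the original error
            reconnects += 1
            time.sleep(backoff_seconds * reconnects)  # linear backoff


def make_flaky_factory(lines, fail_after, failures=1):
    """Hypothetical test double: the first `failures` streams break after `fail_after` lines."""
    calls = {"count": 0}

    def open_stream():
        calls["count"] += 1
        broken = calls["count"] <= failures
        for index, line in enumerate(lines):
            if broken and index == fail_after:
                raise ConnectionError("simulated IncompleteRead(0 bytes read)")
            yield line

    return open_stream


# The first stream dies after two lines; the reconnect delivers only the missing one.
recovered = list(
    follow_logs_with_reconnect(
        make_flaky_factory([b"first", b"second", b"third"], fail_after=2),
        backoff_seconds=0,
    )
)
print(recovered)  # [b'first', b'second', b'third']
```

Skipping by line count keeps the sketch simple, but it re-reads the whole log on every reconnect. Resuming from a timestamp would avoid that at the cost of more bookkeeping.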

Metadata


Assignees

No one assigned

    Labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests
