
HttpOperator no longer accepts https-prefixed connection URIs from environment variables #36602

Description

@AetherUnbound

Apache Airflow Provider(s)

http

Versions of Apache Airflow Providers

apache-airflow-providers-http>=4.7.0

Apache Airflow version

Any version that supports the above provider version

Operating System

Linux/Docker

Deployment

Docker-Compose

Deployment details

No response

What happened

PR #34669 appears to have changed how the hook inside the HttpOperator is initialized.

Previously, an HttpHook was instantiated explicitly. Because both the connection ID and the hook type were already known, connections defined via environment variables with an https prefix were parsed correctly. After the above change, the connection is fetched with BaseHook.get_connection and the hook is derived from the connection itself. That lookup consults the ProvidersManager's list of registered connection types, which only includes http (presumably because it is derived from the provider name or similar). Note the absence of https below:

In [1]: from airflow.providers_manager import ProvidersManager

In [2]: ProvidersManager().hooks
Out[2]: <airflow.providers_manager.LazyDictWithCache at 0x7f40120a4190>

In [3]: dict(ProvidersManager().hooks)
Out[3]: 
{'generic': HookInfo(hook_class_name=None, connection_id_attribute_name=None, package_name=None, hook_name='Generic', connection_type=None, connection_testable=False),
 'email': HookInfo(hook_class_name=None, connection_id_attribute_name=None, package_name=None, hook_name='Email', connection_type=None, connection_testable=False),
 'fs': HookInfo(hook_class_name='airflow.hooks.filesystem.FSHook', connection_id_attribute_name='fs_conn_id', package_name='airflow.hooks.filesystem', hook_name='File (path)', connection_type='fs', connection_testable=True),
 'package_index': HookInfo(hook_class_name='airflow.hooks.package_index.PackageIndexHook', connection_id_attribute_name='pi_conn_id', package_name='airflow.hooks.package_index', hook_name='Package Index (Python)', connection_type='package_index', connection_testable=True),
 'aws': HookInfo(hook_class_name='airflow.providers.amazon.aws.hooks.base_aws.AwsGenericHook', connection_id_attribute_name='aws_conn_id', package_name='apache-airflow-providers-amazon', hook_name='Amazon Web Services', connection_type='aws', connection_testable=True),
 'chime': HookInfo(hook_class_name='airflow.providers.amazon.aws.hooks.chime.ChimeWebhookHook', connection_id_attribute_name='chime_conn_id', package_name='apache-airflow-providers-amazon', hook_name='Amazon Chime Webhook', connection_type='chime', connection_testable=True),
 'emr': HookInfo(hook_class_name='airflow.providers.amazon.aws.hooks.emr.EmrHook', connection_id_attribute_name='emr_conn_id', package_name='apache-airflow-providers-amazon', hook_name='Amazon Elastic MapReduce', connection_type='emr', connection_testable=True),
 'redshift': HookInfo(hook_class_name='airflow.providers.amazon.aws.hooks.redshift_sql.RedshiftSQLHook', connection_id_attribute_name='redshift_conn_id', package_name='apache-airflow-providers-amazon', hook_name='Amazon Redshift', connection_type='redshift', connection_testable=True),
 'elasticsearch': HookInfo(hook_class_name='airflow.providers.elasticsearch.hooks.elasticsearch.ElasticsearchHook', connection_id_attribute_name='elasticsearch_conn_id', package_name='apache-airflow-providers-elasticsearch', hook_name='Elasticsearch', connection_type='elasticsearch', connection_testable=True),
 'ftp': HookInfo(hook_class_name='airflow.providers.ftp.hooks.ftp.FTPHook', connection_id_attribute_name='ftp_conn_id', package_name='apache-airflow-providers-ftp', hook_name='FTP', connection_type='ftp', connection_testable=True),
 'http': HookInfo(hook_class_name='airflow.providers.http.hooks.http.HttpHook', connection_id_attribute_name='http_conn_id', package_name='apache-airflow-providers-http', hook_name='HTTP', connection_type='http', connection_testable=True),
 'imap': HookInfo(hook_class_name='airflow.providers.imap.hooks.imap.ImapHook', connection_id_attribute_name='imap_conn_id', package_name='apache-airflow-providers-imap', hook_name='IMAP', connection_type='imap', connection_testable=False),
 'postgres': HookInfo(hook_class_name='airflow.providers.postgres.hooks.postgres.PostgresHook', connection_id_attribute_name='postgres_conn_id', package_name='apache-airflow-providers-postgres', hook_name='Postgres', connection_type='postgres', connection_testable=True),
 'sqlite': HookInfo(hook_class_name='airflow.providers.sqlite.hooks.sqlite.SqliteHook', connection_id_attribute_name='sqlite_conn_id', package_name='apache-airflow-providers-sqlite', hook_name='Sqlite', connection_type='sqlite', connection_testable=True)}

As a result, connections of the form AIRFLOW_CONN_...=https://... can no longer be used by the HttpOperator.
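To make the failure mode concrete, here is a minimal sketch of a scheme-keyed lookup. It is an illustration only, not Airflow's actual code: `REGISTERED_CONN_TYPES` and `resolve_hook_class` are hypothetical names, and the registry holds just two of the entries from the listing above. The assumption is that the URI scheme is taken as the connection type and then looked up in the registry.

```python
from urllib.parse import urlsplit

# Hypothetical stand-in for the registry of connection types; the two
# entries are copied from the ProvidersManager().hooks output above.
REGISTERED_CONN_TYPES = {
    "http": "airflow.providers.http.hooks.http.HttpHook",
    "ftp": "airflow.providers.ftp.hooks.ftp.FTPHook",
}


def resolve_hook_class(uri: str) -> str:
    """Return the hook class name registered for the URI's scheme."""
    # Assumption: the scheme of the connection URI is used as the connection type.
    conn_type = urlsplit(uri).scheme
    try:
        return REGISTERED_CONN_TYPES[conn_type]
    except KeyError:
        raise LookupError(f"no hook registered for connection type {conn_type!r}") from None


print(resolve_hook_class("http://google.com"))

try:
    resolve_hook_class("https://google.com")
except LookupError as err:
    # "https" is not a key in the registry, so the lookup fails.
    print(f"lookup failed: {err}")
```

Under this model, instantiating HttpHook directly never hits the lookup, which would explain why the old code path accepted https-prefixed URIs.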

What you think should happen instead

The documentation for v4.8.0 specifies:

Schema (optional):
Specify the service type etc: http/https.

It therefore seems reasonable to expect connection URIs that start with https to be accepted, especially since rejecting them breaks previous behavior in a way that was not obvious when upgrading from provider version <=4.6.0.

How to reproduce

The following command simulates an HTTPS hook being instantiated with the HttpHook class directly, which was the previous internal behavior of the HttpOperator. No errors are raised.

$ docker run -e AIRFLOW_CONN_SAMPLE_HOOK='https://google.com' --rm docker.io/apache/airflow:slim-2.8.0-python3.10 python -c 'from airflow.providers.http.hooks.http import HttpHook; hook = HttpHook(http_conn_id="sample_hook")'

This command shows the newer behavior, which uses BaseHook to determine the connection type and fails to do so.

$ docker run -e AIRFLOW_CONN_SAMPLE_HOOK='https://google.com' --rm docker.io/apache/airflow:slim-2.8.0-python3.10 python -c 'from airflow.hooks.base import BaseHook; conn = BaseHook.get_connection("sample_hook"); hook = conn.get_hook()'

Anything else

We were able to apply a patch in our own code as a workaround in WordPress/openverse#3624
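For anyone else affected, one possible stopgap (not necessarily what the linked patch does) is to rewrite the URI so that its scheme is http and the desired https schema is carried in the path component. This relies on the assumption that the http provider reads the URI path as the connection schema when building the base URL. `to_http_conn_uri` is a hypothetical helper, sketched here with the standard library only:

```python
from urllib.parse import urlsplit, urlunsplit


def to_http_conn_uri(uri: str) -> str:
    """Rewrite 'https://host' as 'http://host/https'; leave other URIs unchanged."""
    parts = urlsplit(uri)
    if parts.scheme != "https":
        return uri
    if parts.path.strip("/"):
        # The path slot is needed for the schema, so refuse URIs that already use it.
        raise ValueError(f"cannot rewrite URI with a non-empty path: {uri!r}")
    # Assumption: the path component of the connection URI is read as the schema.
    return urlunsplit(("http", parts.netloc, "/https", parts.query, parts.fragment))


print(to_http_conn_uri("https://google.com"))
```

The rewritten value could then be exported as the AIRFLOW_CONN_... variable in place of the original https-prefixed one.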

Are you willing to submit PR?

  • Yes I am willing to submit a PR!
