Apache Airflow Provider(s)
http
Versions of Apache Airflow Providers
apache-airflow-providers-http>=4.7.0
Apache Airflow version
Any version that supports the above provider version
Operating System
Linux/Docker
Deployment
Docker-Compose
Deployment details
No response
What happened
PR #34669 appears to have changed how the hook inside the HttpOperator is initialized.
Previously, the operator instantiated an HttpHook explicitly. Because both the connection ID and the hook type were already known, connections defined via environment variables with an https prefix were parsed correctly. After the above change, BaseHook.get_connection is used instead. This relies on the ProvidersManager's list of acceptable connection types, which only includes http (presumably because the list is derived from the provider name). Note the absence of https below:
In [1]: from airflow.providers_manager import ProvidersManager
In [2]: ProvidersManager().hooks
Out[2]: <airflow.providers_manager.LazyDictWithCache at 0x7f40120a4190>
In [3]: dict(ProvidersManager().hooks)
Out[3]:
{'generic': HookInfo(hook_class_name=None, connection_id_attribute_name=None, package_name=None, hook_name='Generic', connection_type=None, connection_testable=False),
'email': HookInfo(hook_class_name=None, connection_id_attribute_name=None, package_name=None, hook_name='Email', connection_type=None, connection_testable=False),
'fs': HookInfo(hook_class_name='airflow.hooks.filesystem.FSHook', connection_id_attribute_name='fs_conn_id', package_name='airflow.hooks.filesystem', hook_name='File (path)', connection_type='fs', connection_testable=True),
'package_index': HookInfo(hook_class_name='airflow.hooks.package_index.PackageIndexHook', connection_id_attribute_name='pi_conn_id', package_name='airflow.hooks.package_index', hook_name='Package Index (Python)', connection_type='package_index', connection_testable=True),
'aws': HookInfo(hook_class_name='airflow.providers.amazon.aws.hooks.base_aws.AwsGenericHook', connection_id_attribute_name='aws_conn_id', package_name='apache-airflow-providers-amazon', hook_name='Amazon Web Services', connection_type='aws', connection_testable=True),
'chime': HookInfo(hook_class_name='airflow.providers.amazon.aws.hooks.chime.ChimeWebhookHook', connection_id_attribute_name='chime_conn_id', package_name='apache-airflow-providers-amazon', hook_name='Amazon Chime Webhook', connection_type='chime', connection_testable=True),
'emr': HookInfo(hook_class_name='airflow.providers.amazon.aws.hooks.emr.EmrHook', connection_id_attribute_name='emr_conn_id', package_name='apache-airflow-providers-amazon', hook_name='Amazon Elastic MapReduce', connection_type='emr', connection_testable=True),
'redshift': HookInfo(hook_class_name='airflow.providers.amazon.aws.hooks.redshift_sql.RedshiftSQLHook', connection_id_attribute_name='redshift_conn_id', package_name='apache-airflow-providers-amazon', hook_name='Amazon Redshift', connection_type='redshift', connection_testable=True),
'elasticsearch': HookInfo(hook_class_name='airflow.providers.elasticsearch.hooks.elasticsearch.ElasticsearchHook', connection_id_attribute_name='elasticsearch_conn_id', package_name='apache-airflow-providers-elasticsearch', hook_name='Elasticsearch', connection_type='elasticsearch', connection_testable=True),
'ftp': HookInfo(hook_class_name='airflow.providers.ftp.hooks.ftp.FTPHook', connection_id_attribute_name='ftp_conn_id', package_name='apache-airflow-providers-ftp', hook_name='FTP', connection_type='ftp', connection_testable=True),
'http': HookInfo(hook_class_name='airflow.providers.http.hooks.http.HttpHook', connection_id_attribute_name='http_conn_id', package_name='apache-airflow-providers-http', hook_name='HTTP', connection_type='http', connection_testable=True),
'imap': HookInfo(hook_class_name='airflow.providers.imap.hooks.imap.ImapHook', connection_id_attribute_name='imap_conn_id', package_name='apache-airflow-providers-imap', hook_name='IMAP', connection_type='imap', connection_testable=False),
'postgres': HookInfo(hook_class_name='airflow.providers.postgres.hooks.postgres.PostgresHook', connection_id_attribute_name='postgres_conn_id', package_name='apache-airflow-providers-postgres', hook_name='Postgres', connection_type='postgres', connection_testable=True),
'sqlite': HookInfo(hook_class_name='airflow.providers.sqlite.hooks.sqlite.SqliteHook', connection_id_attribute_name='sqlite_conn_id', package_name='apache-airflow-providers-sqlite', hook_name='Sqlite', connection_type='sqlite', connection_testable=True)}
As a result, connections of the form AIRFLOW_CONN_...=https://... can no longer be used by the HttpOperator.
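To make the failure mode concrete, here is a minimal stdlib-only sketch of the lookup as I understand it. It is a simplified model, not Airflow's actual code: `REGISTERED_HOOKS`, `conn_type_from_uri`, and `resolve_hook_class` are hypothetical names, and it assumes the connection type is taken straight from the URI scheme. Only the `http` and `ftp` entries are copied from the output above.

```python
from urllib.parse import urlsplit

# Hypothetical stand-in for the ProvidersManager hook registry.
# The keys mirror two entries from the output above; note there is no "https".
REGISTERED_HOOKS = {
    "http": "airflow.providers.http.hooks.http.HttpHook",
    "ftp": "airflow.providers.ftp.hooks.ftp.FTPHook",
}


def conn_type_from_uri(uri: str) -> str:
    """Assumption: the connection type is simply the URI scheme."""
    return urlsplit(uri).scheme


def resolve_hook_class(uri: str) -> str:
    """Look up the hook class name for a connection URI, or raise LookupError."""
    conn_type = conn_type_from_uri(uri)
    try:
        return REGISTERED_HOOKS[conn_type]
    except KeyError:
        raise LookupError(f'Unknown hook type "{conn_type}"') from None


if __name__ == "__main__":
    for uri in ("http://google.com", "https://google.com"):
        try:
            print(uri, "->", resolve_hook_class(uri))
        except LookupError as exc:
            print(uri, "->", exc)
```

In this model the http URI resolves to the HttpHook class name, while the https URI fails because nothing is registered under that key. Explicitly constructing the hook, as the operator did before, never consults the registry, so the scheme does not matter there.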
What you think should happen instead
The documentation for v4.8.0 specifies:
Schema (optional):
Specify the service type etc: http/https.
Thus it seems reasonable to expect connection URIs that start with https to be allowed, especially since this is a breaking change from previous behavior, and one that was not obvious when upgrading from provider version <=4.6.0.
How to reproduce
The following command simulates an HTTPS hook being instantiated with the HttpHook class directly, which was the previous internal behavior of the HttpOperator. No errors are raised.
$ docker run -e AIRFLOW_CONN_SAMPLE_HOOK='https://google.com' --rm docker.io/apache/airflow:slim-2.8.0-python3.10 python -c 'from airflow.providers.http.hooks.http import HttpHook; hook = HttpHook(http_conn_id="sample_hook")'
This command shows the newer behavior, which uses BaseHook to determine the connection type and then fails to find a hook for it.
$ docker run -e AIRFLOW_CONN_SAMPLE_HOOK='https://google.com' --rm docker.io/apache/airflow:slim-2.8.0-python3.10 python -c 'from airflow.hooks.base import BaseHook; conn = BaseHook.get_connection("sample_hook"); hook = conn.get_hook()'
Anything else
We were able to apply a patch in our own code as a workaround in WordPress/openverse#3624
Are you willing to submit PR?
Code of Conduct