Skip to content

fix connection uri parsing when the host includes a scheme#31465

Merged
hussein-awala merged 3 commits into
apache:mainfrom
hussein-awala:fix/connection_uri
Jun 19, 2023
Merged

fix connection uri parsing when the host includes a scheme#31465
hussein-awala merged 3 commits into
apache:mainfrom
hussein-awala:fix/connection_uri

Conversation

@hussein-awala

@hussein-awala hussein-awala commented May 22, 2023

Copy link
Copy Markdown
Member

closes: #21638
closes: #31460

This PR introduces a solution to address the parsing problem that occurs when the connection host includes a scheme.


^ Add meaningful description above

Read the Pull Request Guidelines for more information.
In case of fundamental code changes, an Airflow Improvement Proposal (AIP) is needed.
In case of a new dependency, check compliance with the ASF 3rd Party License Policy.
In case of backwards incompatible changes please leave a note in a newsfragment file, named {pr_number}.significant.rst or {issue_number}.significant.rst, in newsfragments.

@hussein-awala hussein-awala added this to the Airflow 2.6.2 milestone May 22, 2023
@hussein-awala hussein-awala added type:bug-fix Changelog: Bug Fixes area:core labels May 22, 2023
@uranusjr

Copy link
Copy Markdown
Member

Instead of trying to guess what the user wants, I’m more inclined to make this fail directly when the user tries to add the connection object in the first place.

@hussein-awala

Copy link
Copy Markdown
Member Author

@uranusjr IMO there should not be any problem if the user provide this kind of hosts, since it's a valid host but with a protocol, for ex, if we create the same connection from the UI:
image

Then we use the Airflow CLI to get the connection information:

root@55c0591cd327:/opt/airflow# airflow connections get test_conn
id | conn_id   | conn_type | description | host            | schema | login | password | port | is_encrypted | is_extra_encrypted | extra_dejson       | get_uri                                                   
===+===========+===========+=============+=================+========+=======+==========+======+==============+====================+====================+===========================================================
1  | test_conn | generic   |             | protocol://host |        | user  | pass     | 123  | False        | False              | {'param': 'value'} | generic://user:pass@protocol%3A%2F%2Fhost:123/?param=value

We see that it's successfully parsed and we can use it without any problem, and if I think that the method parse_uri in the connection class should parse the generated uri and create the same connection from it. WDYT?

@hussein-awala

Copy link
Copy Markdown
Member Author

I think we can add a new column to specify the protocol:
Screenshot from 2023-05-23 15-16-22
Screenshot from 2023-05-23 15-31-32

Which is cleaner and easily supported by the uri parsing:

root@cb473c029341:/opt/airflow# airflow connections get spark_on_k8s_conn
id | conn_id           | conn_type | description | host       | schema | login | password | port | is_encrypted | is_extra_encrypted | extra_dejson               | get_uri                                        
===+===================+===========+=============+============+========+=======+==========+======+==============+====================+============================+================================================
1  | spark_on_k8s_conn | spark     |             | 100.68.0.1 |        |       | None     | 443  | False        | False              | {'deploy-mode': 'cluster'} | spark://k8s://100.68.0.1:443/?deploy-mode=clust
   |                   |           |             |            |        |       |          |      |              |                    |                            | er                                             
                                                                                                                                                                                                                   
root@cb473c029341:/opt/airflow# airflow connections get test_conn
id | conn_id   | conn_type | description      | host | schema | login | password | port | is_encrypted | is_extra_encrypted | extra_dejson       | get_uri                                                   
===+===========+===========+==================+======+========+=======+==========+======+==============+====================+====================+===========================================================
2  | test_conn | generic   | test description | host | schema | user  | pass     | 123  | False        | False              | {'param': 'value'} | generic://protocol://user:pass@host:123/schema?param=value 

@hussein-awala

Copy link
Copy Markdown
Member Author

I will add some changes it to make it backward compatible

@hussein-awala hussein-awala marked this pull request as draft May 23, 2023 14:58
@hussein-awala hussein-awala marked this pull request as ready for review June 1, 2023 10:05
@hussein-awala

Copy link
Copy Markdown
Member Author

It's ready and without a new column, I just updated the methods _parse_from_uri and get_uri and added a new pytest file for the connection model which contain the tests for my changes.

@amoghrajesh amoghrajesh left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@hussein-awala the changes mostly look good to me.
Have a few comments

Comment thread airflow/models/connection.py Outdated
Comment thread airflow/models/connection.py Outdated
schemes_count_in_uri = uri.count("://")
if schemes_count_in_uri > 2:
raise AirflowException(f"Invalid connection string: {uri}.")
scheme_in_uri = schemes_count_in_uri == 2

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This variable name is rather confusing to me, can we have something else here?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you check it now?

Comment thread tests/models/test_connection.py
@ephraimbuddy

Copy link
Copy Markdown
Contributor

Hi @hussein-awala , we are about to release 2.6.2, can you address the review comments here so this can be part of 2.6.2

@amoghrajesh amoghrajesh left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for those changes, @hussein-awala
LGTM +1

@hussein-awala

Copy link
Copy Markdown
Member Author

Hi @hussein-awala , we are about to release 2.6.2, can you address the review comments here so this can be part of 2.6.2

@ephraimbuddy the review is addressed, can your review the PR?

@ephraimbuddy

Copy link
Copy Markdown
Contributor

Hi @uranusjr , do you want to take another look?

@pankajkoti pankajkoti self-requested a review June 14, 2023 08:33
@hussein-awala hussein-awala merged commit 0560881 into apache:main Jun 19, 2023
ephraimbuddy pushed a commit that referenced this pull request Jul 6, 2023
* update _parse_from_uri and get_uri  methods, and add tests for connection model

* some fixes from review

(cherry picked from commit 0560881)
@pankajkoti pankajkoti mentioned this pull request Aug 17, 2023
2 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area:core type:bug-fix Changelog: Bug Fixes

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Add capability in Airflow connections to validate host Spark Connection with k8s in URL not mapped correctly

5 participants