Skip to content

Remove duplicated code on S3ToRedshiftOperator#18671

Merged
potiuk merged 5 commits into
apache:mainfrom
mariotaddeucci:duplicate-code-s3_to_redshift
Oct 8, 2021
Merged

Remove duplicated code on S3ToRedshiftOperator#18671
potiuk merged 5 commits into
apache:mainfrom
mariotaddeucci:duplicate-code-s3_to_redshift

Conversation

@mariotaddeucci

@mariotaddeucci mariotaddeucci commented Oct 1, 2021

Copy link
Copy Markdown
Contributor

Remove duplicate code merged on PR #18027
The function was moved to postgres hook

^ Add meaningful description above

Read the Pull Request Guidelines for more information.
In case of fundamental code change, Airflow Improvement Proposal (AIP) is needed.
In case of a new dependency, check compliance with the ASF 3rd Party License Policy.
In case of backwards incompatible changes please leave a note in UPDATING.md.

@mariotaddeucci mariotaddeucci changed the title Removed duplicated code on S3ToRedshiftOperator Remove duplicated code on S3ToRedshiftOperator Oct 1, 2021
@uranusjr uranusjr closed this Oct 3, 2021
@uranusjr uranusjr reopened this Oct 3, 2021
@github-actions github-actions Bot added the okay to merge It's ok to merge this PR as it does not require more tests label Oct 3, 2021
@github-actions

github-actions Bot commented Oct 3, 2021

Copy link
Copy Markdown
Contributor

The PR is likely OK to be merged with just subset of tests for default Python and Database versions without running the full matrix of tests, because it does not modify the core of Airflow. If the committers decide that the full tests matrix is needed, they will add the label 'full tests needed'. Then you should rebase to the latest main or amend the last commit of the PR, and push it with --force-with-lease.

@potiuk

potiuk commented Oct 3, 2021

Copy link
Copy Markdown
Member

I think we have a small problem here. Seems that we have the Amazon provider depends in this release implicitly on a specific version of the Postgres provider - introduced with #18027. So if you would like to use Redshift from the AWS provider, you need to also have Postgres provider with 2.3.0 version.

This might make us hold a bit with releasing the AWS provider being voted now, because I think we need to introduce a specific version dependency between the providers, otherwise there is a risk that it won't work. However I think just documentation update should be enough. The Amazon provider has "postgres" extra https://airflow.apache.org/docs/apache-airflow-providers-amazon/stable/index.html#cross-provider-package-dependencies

We could introduce a "harder" dependency (and modify the "extra" to depend on "apache-airflow-providers-postgres>=2.3.0" or we could make it clear in the documentation. I doubt many people use the "cross-provider" extras, so I am not sure if it's worth to add the version (Seems that documentation update will be more than enough).

WDYT @uranusjr @mariotaddeucci ? If you are OK with that, I can release the package now and update changelog when publishing the documentaiton. Could you plese add a line to the CHANGELOG for Amazon provider that `apache-airflow-providers-postgres>=2.3.0 is needed to use Redshift")

@uranusjr

uranusjr commented Oct 4, 2021

Copy link
Copy Markdown
Member

I'd prefer if we could be able to specify the dependency directly in the package metadata because that's what metadata is for, isn't it 😛 But I'm obviously biased toward solving things this way.

@eladkal

eladkal commented Oct 4, 2021

Copy link
Copy Markdown
Contributor

We could introduce a "harder" dependency (and modify the "extra" to depend on "apache-airflow-providers-postgres>=2.3.0" or we could make it clear in the documentation. I doubt many people use the "cross-provider" extras, so I am not sure if it's worth to add the version (Seems that documentation update will be more than enough).

To my understanding the postgres dependency is relevant only for users of Redshift so if you need the AWS provider for Athena or some other service does it make sense to force install postgres provider as well?

@potiuk

potiuk commented Oct 4, 2021

Copy link
Copy Markdown
Member

To my understanding the postgres dependency is relevant only for users of Redshift so if you need the AWS provider for Athena or some other service does it make sense to force install postgres provider as well?

We do not force it, actually. It's only when you install `apache-airflow-providers-amazon[postgres]' this additionally installls apache-airflow-providers-postgres. The import of "airflow.providers.postgres" is only used when you use one of the Redshift Operators. We consider all such "cross-provider" dependencies as optional and that's why we have them as "extras" in providers and there are more "indications" that there is optional dependency, than actual "forcing" it.

I'd prefer if we could be able to specify the dependency directly in the package metadata because that's what metadata is for, isn't it 😛 But I'm obviously biased toward solving things this way.

Yeah, ideally I agree and we could do it, but I am not sure if it justifies cancelling the RC and releasing a new rc (maybe we can simply add it in the future version ?). Correct me if I am wrong @uranusjr - even if we put (>2.3.0) there, this is not a "hard" requirement. Extras are very "soft" dependencies and they only work temporarily while installation is running - they have no long lasting effects on the installation. You could easily both remove and change version to < 2.3.0 of postgres because the "extras" limitation is not stored persistently (we do not even know that the extra was

That's why I think this scenario where it causes problems is quite unlikely.

First - I imaginge most people will install those providers together: pip install apache-airflow[postgres,amazon,....] or via separately instaling 'apache-airflow-providers-amazon' and apache-airflow-providers-postgres. Both of those will work fine as latest versions of both providers will be used.

Secondly - When you install them via images - users will have them both installed and consistent by default.

Thirdly - also if you use constraints (which is the only supported method) all will be fine.

It's only when you upgrade the amazon provider without upgrading the postgres one - then you can have the problem (and even then you would have to remember to run pip install -U apache-airflow-providers-amazon[postgres] for the limitation to be actually used). We have no way to "force" upgrading "postgres" if the user upgrades pip install -U apache-airlfow-providers-amazon (because postgres is optional extra). And always answer here will be "use constraints" as we do for all other dependencies now.

So I really think just doc update mentioning this "implicit" dependency for now will be enough.

@mariotaddeucci

Copy link
Copy Markdown
Contributor Author

The specific release dependence can be removed after #18447 with RedshiftSqlHook by adding the get_primary_key on this hook.
We can add as implicit to remove it after #18447 or wait.

@potiuk

potiuk commented Oct 5, 2021

Copy link
Copy Markdown
Member

The specific release dependence can be removed after #18447 with RedshiftSqlHook by adding the get_primary_key on this hook. We can add as implicit to remove it after #18447 or wait.

We are going to release rc2 of the amazon provider regardless so I think I will add the extra dependency for now :)

@potiuk

potiuk commented Oct 5, 2021

Copy link
Copy Markdown
Member

Added the dependency here: #18737

@uranusjr

uranusjr commented Oct 6, 2021

Copy link
Copy Markdown
Member

even if we put (>2.3.0) there, this is not a "hard" requirement. Extras are very "soft" dependencies and they only work temporarily while installation is running - they have no long lasting effects on the installation.

Just to comment that this actually applies to all the Python dependencies right now, because pip does not stop you from uninstalling hard dependencies without removing their dependents anyway (unlike say apt). Extras are extra difficult (pun intended) because they also don't show up in pip check and pipdeptree, but most of the time it doesn't matter since even hard dependencies aren't that hard in begin with.

@potiuk

potiuk commented Oct 8, 2021

Copy link
Copy Markdown
Member

Could you please rebase @mariotaddeucci

@mariotaddeucci

Copy link
Copy Markdown
Contributor Author

@potiuk done :)

@potiuk potiuk merged commit 22768ff into apache:main Oct 8, 2021
@potiuk

potiuk commented Oct 8, 2021

Copy link
Copy Markdown
Member

BTW. The limit on Postgres is added to the RC2 provider :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area:providers okay to merge It's ok to merge this PR as it does not require more tests provider:amazon AWS/Amazon - related issues

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants