Remove duplicated code on S3ToRedshiftOperator#18671
Conversation
|
The PR is likely OK to be merged with just subset of tests for default Python and Database versions without running the full matrix of tests, because it does not modify the core of Airflow. If the committers decide that the full tests matrix is needed, they will add the label 'full tests needed'. Then you should rebase to the latest main or amend the last commit of the PR, and push it with --force-with-lease. |
|
I think we have a small problem here. Seems that we have the Amazon provider depends in this release implicitly on a specific version of the Postgres provider - introduced with #18027. So if you would like to use Redshift from the AWS provider, you need to also have Postgres provider with 2.3.0 version. This might make us hold a bit with releasing the AWS provider being voted now, because I think we need to introduce a specific version dependency between the providers, otherwise there is a risk that it won't work. However I think just documentation update should be enough. The Amazon provider has "postgres" extra https://airflow.apache.org/docs/apache-airflow-providers-amazon/stable/index.html#cross-provider-package-dependencies We could introduce a "harder" dependency (and modify the "extra" to depend on "apache-airflow-providers-postgres>=2.3.0" or we could make it clear in the documentation. I doubt many people use the "cross-provider" extras, so I am not sure if it's worth to add the version (Seems that documentation update will be more than enough). WDYT @uranusjr @mariotaddeucci ? If you are OK with that, I can release the package now and update changelog when publishing the documentaiton. Could you plese add a line to the CHANGELOG for Amazon provider that `apache-airflow-providers-postgres>=2.3.0 is needed to use Redshift") |
|
I'd prefer if we could be able to specify the dependency directly in the package metadata because that's what metadata is for, isn't it 😛 But I'm obviously biased toward solving things this way. |
To my understanding the postgres dependency is relevant only for users of Redshift so if you need the AWS provider for Athena or some other service does it make sense to force install postgres provider as well? |
We do not force it, actually. It's only when you install `apache-airflow-providers-amazon[postgres]' this additionally installls apache-airflow-providers-postgres. The import of "airflow.providers.postgres" is only used when you use one of the Redshift Operators. We consider all such "cross-provider" dependencies as optional and that's why we have them as "extras" in providers and there are more "indications" that there is optional dependency, than actual "forcing" it.
Yeah, ideally I agree and we could do it, but I am not sure if it justifies cancelling the RC and releasing a new rc (maybe we can simply add it in the future version ?). Correct me if I am wrong @uranusjr - even if we put (>2.3.0) there, this is not a "hard" requirement. Extras are very "soft" dependencies and they only work temporarily while installation is running - they have no long lasting effects on the installation. You could easily both remove and change version to < 2.3.0 of postgres because the "extras" limitation is not stored persistently (we do not even know that the extra was That's why I think this scenario where it causes problems is quite unlikely. First - I imaginge most people will install those providers together: Secondly - When you install them via images - users will have them both installed and consistent by default. Thirdly - also if you use constraints (which is the only supported method) all will be fine. It's only when you upgrade the amazon provider without upgrading the postgres one - then you can have the problem (and even then you would have to remember to run So I really think just doc update mentioning this "implicit" dependency for now will be enough. |
|
Added the dependency here: #18737 |
Just to comment that this actually applies to all the Python dependencies right now, because pip does not stop you from uninstalling hard dependencies without removing their dependents anyway (unlike say apt). Extras are extra difficult (pun intended) because they also don't show up in |
|
Could you please rebase @mariotaddeucci |
|
@potiuk done :) |
|
BTW. The limit on Postgres is added to the RC2 provider :) |
Remove duplicate code merged on PR #18027
The function was moved to postgres hook
^ Add meaningful description above
Read the Pull Request Guidelines for more information.
In case of fundamental code change, Airflow Improvement Proposal (AIP) is needed.
In case of a new dependency, check compliance with the ASF 3rd Party License Policy.
In case of backwards incompatible changes please leave a note in UPDATING.md.