Skip to content

Allow override of project_id in DataprocJobBaseOperator#14981

Merged
turbaszek merged 1 commit into
apache:masterfrom
SamWheating:sw-dataproc-dont-use-project-from-hook
Mar 28, 2021
Merged

Allow override of project_id in DataprocJobBaseOperator#14981
turbaszek merged 1 commit into
apache:masterfrom
SamWheating:sw-dataproc-dont-use-project-from-hook

Conversation

@SamWheating

Copy link
Copy Markdown
Contributor

Operators extending the DataprocSubmitJobOperator currently infer the project_id from the GCP connection, which isn't always correct (for example, when using a service account from one Project to submit jobs to a dataproc cluster in a different project).

In our case, this led to 404s when trying to submit jobs, as the operator incorrectly assumed that the dataproc cluster was in the same project as the service account used to submit the job.

Adding an optional override, if the project_id is left unspecified it will fallback to the old behaviour.

I also understand that these operators are now deprecated, so if you don't want to make changes to them you can close this PR.


^ Add meaningful description above

Read the Pull Request Guidelines for more information.
In case of fundamental code change, Airflow Improvement Proposal (AIP) is needed.
In case of a new dependency, check compliance with the ASF 3rd Party License Policy.
In case of backwards incompatible changes please leave a note in UPDATING.md.

@boring-cyborg boring-cyborg Bot added area:providers provider:google Google (including GCP) related issues labels Mar 24, 2021

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hook should be initialized in execute method to avoid connection to metadata server at dag loading.

@SamWheating SamWheating Mar 24, 2021

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thats a really good point, but we won't be able to get the project_id from the hook if the hook is not yet instantiated 🤔 We would also have to update the generate_job function in all of the operators which inherit from this one.

Maybe that should be saved for a separate issue? let me know what you think.

@github-actions

Copy link
Copy Markdown
Contributor

The Workflow run is cancelling this PR. It has some failed jobs matching ^Pylint$,^Static checks,^Build docs$,^Spell check docs$,^Provider packages,^Checks: Helm tests$,^Test OpenAPI*.

@kaxil

kaxil commented Mar 24, 2021

Copy link
Copy Markdown
Member

Please rebase on latest Master, failing doc build should be fixed by #14986

@SamWheating SamWheating force-pushed the sw-dataproc-dont-use-project-from-hook branch from 612770d to 8db85e3 Compare March 24, 2021 17:10
@github-actions

Copy link
Copy Markdown
Contributor

The Workflow run is cancelling this PR. It has some failed jobs matching ^Pylint$,^Static checks,^Build docs$,^Spell check docs$,^Provider packages,^Checks: Helm tests$,^Test OpenAPI*.

@kaxil kaxil force-pushed the sw-dataproc-dont-use-project-from-hook branch from 8db85e3 to 6ea3dab Compare March 25, 2021 14:25
@turbaszek

Copy link
Copy Markdown
Member

@SamWheating please consider using DataprocSubmitJobOperator. All other operators like pig, spark etc are deprecated and will be removed one day.

@SamWheating

Copy link
Copy Markdown
Contributor Author

please consider using DataprocSubmitJobOperator. All other operators like pig, spark etc are deprecated and will be removed one day.

I'll pass this along to the owner of the jobs in question. We encountered this issue when migrating DAGs from an out-of-date airflow environment to a much newer one, hence the use of outdated operators.

Is it still worthwhile to fix/improve these deprecated operators? If not, feel free to close this PR @turbaszek.

@turbaszek

Copy link
Copy Markdown
Member

@SamWheating every fix is good fix. I just wanted to make sure you know about the new operators 👌

@github-actions

Copy link
Copy Markdown
Contributor

The PR is likely OK to be merged with just subset of tests for default Python and Database versions without running the full matrix of tests, because it does not modify the core of Airflow. If the committers decide that the full tests matrix is needed, they will add the label 'full tests needed'. Then you should rebase to the latest master or amend the last commit of the PR, and push it with --force-with-lease.

@github-actions github-actions Bot added the okay to merge It's ok to merge this PR as it does not require more tests label Mar 28, 2021
@turbaszek turbaszek merged commit 099c490 into apache:master Mar 28, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area:providers okay to merge It's ok to merge this PR as it does not require more tests provider:google Google (including GCP) related issues

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants