Skip to content

[AIRFLOW-7003] Lazy load all plguins#7644

Merged
mik-laj merged 3 commits into
apache:masterfrom
PolideaInternal:AIRFLOW-7003
Mar 13, 2020
Merged

[AIRFLOW-7003] Lazy load all plguins#7644
mik-laj merged 3 commits into
apache:masterfrom
PolideaInternal:AIRFLOW-7003

Conversation

@mik-laj

@mik-laj mik-laj commented Mar 7, 2020

Copy link
Copy Markdown
Member

Issue link: AIRFLOW-7003

Make sure to mark the boxes below before creating PR: [x]

  • Description above provides context of the change
  • Commit message/PR title starts with [AIRFLOW-NNNN]. AIRFLOW-NNNN = JIRA ID*
  • Unit tests coverage for changes (not needed for documentation changes)
  • Commits follow "How to write a good git commit message"
  • Relevant documentation is updated including usage instructions.
  • I will engage committers as explained in Contribution Workflow Example.

* For document-only changes commit message can start with [AIRFLOW-XXXX].


In case of fundamental code change, Airflow Improvement Proposal (AIP) is needed.
In case of a new dependency, check compliance with the ASF 3rd Party License Policy.
In case of backwards incompatible changes please leave a note in UPDATING.md.
Read the Pull Request Guidelines for more information.

@mik-laj

mik-laj commented Mar 7, 2020

Copy link
Copy Markdown
Member Author

Travis is green.

@mik-laj mik-laj changed the title [AIRFLOW-7003][WIP] Lazy load all plguins [AIRFLOW-7003] Lazy load all plguins Mar 7, 2020

@ashb ashb left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we need to call integrate_dag_plugins() form somewhere in the task execution path too

Comment thread airflow/plugins_manager.py Outdated

plugins = None # type: Optional[List[AirflowPlugin]]

norm_pattern = re.compile(r'[/|.]')

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this is used anymore.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch. Thanks.

Comment thread airflow/plugins_manager.py
Comment thread airflow/plugins_manager.py Outdated
hooks_modules = []
executors_modules = []
macros_modules = []
def load_plugins():

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about this as a name?

Suggested change
def load_plugins():
def ensure_plugins_loaded():

It makes it clearer from the call-site that it will only load them once, not every time this fn is called

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@mik-laj

mik-laj commented Mar 9, 2020

Copy link
Copy Markdown
Member Author

I think we need to call integrate_dag_plugins() form somewhere in the task execution path too

I called this method here:
https://github.com/apache/airflow/pull/7644/files#diff-0d282e5c2bb3b60ee0d056a990cf0927R195

Both SchedulerJob and LocalTaskJob use this method.

Comment thread airflow/plugins_manager.py Outdated
hooks_modules = []
executors_modules = []
macros_modules = []
def endure_plugins_loaded():

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
def endure_plugins_loaded():
def ensure_plugins_loaded():

Comment thread airflow/plugins_manager.py Outdated

def integrate_dag_plugins() -> None:
"""Integrates operator, sensor, hook, macro plugins."""
endure_plugins_loaded()

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
endure_plugins_loaded()
ensure_plugins_loaded()

Comment thread airflow/plugins_manager.py Outdated
"""Integrate operators plugins to the context"""
def integrate_executor_plugins() -> None:
"""Integrate executor plugins to the context."""
endure_plugins_loaded()

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
endure_plugins_loaded()
ensure_plugins_loaded()

Comment thread airflow/plugins_manager.py Outdated

def load_plugins_from_plugin_directory():
"""
Load and register Airflow Plugin from plugin directory

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Load and register Airflow Plugin from plugin directory
Load and register Airflow Plugins from plugins directory

"""
from airflow.plugins_manager import operator_extra_links
from airflow import plugins_manager
plugins_manager.endure_plugins_loaded()

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
plugins_manager.endure_plugins_loaded()
plugins_manager.ensure_plugins_loaded()

"""
from airflow.plugins_manager import registered_operator_link_classes
from airflow import plugins_manager
plugins_manager.endure_plugins_loaded()

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
plugins_manager.endure_plugins_loaded()
plugins_manager.ensure_plugins_loaded()

Comment thread airflow/www/app.py Outdated
from airflow import plugins_manager

for v in flask_appbuilder_views:
plugins_manager.endure_plugins_loaded()

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
plugins_manager.endure_plugins_loaded()
plugins_manager.ensure_plugins_loaded()

@ashb

ashb commented Mar 9, 2020

Copy link
Copy Markdown
Member

I think we need to call integrate_dag_plugins() form somewhere in the task execution path too

I called this method here:
https://github.com/apache/airflow/pull/7644/files#diff-0d282e5c2bb3b60ee0d056a990cf0927R195

Both SchedulerJob and LocalTaskJob use this method.

What about airflow task run or similar?

@codecov-io

Copy link
Copy Markdown

Codecov Report

Merging #7644 into master will decrease coverage by 0.01%.
The diff coverage is 87.12%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #7644      +/-   ##
==========================================
- Coverage   86.83%   86.82%   -0.02%     
==========================================
  Files         897      897              
  Lines       42805    42876      +71     
==========================================
+ Hits        37170    37226      +56     
- Misses       5635     5650      +15
Impacted Files Coverage Δ
airflow/__init__.py 91.3% <ø> (-0.7%) ⬇️
airflow/models/dagbag.py 89.83% <100%> (+0.08%) ⬆️
airflow/www/app.py 94.28% <100%> (+0.04%) ⬆️
airflow/serialization/serialized_objects.py 90.26% <100%> (+0.06%) ⬆️
airflow/plugins_manager.py 90.14% <85.22%> (+1.01%) ⬆️
...rflow/providers/google/cloud/operators/dataflow.py 90.44% <0%> (-8.65%) ⬇️
airflow/providers/postgres/hooks/postgres.py 92.95% <0%> (-1.41%) ⬇️
airflow/jobs/scheduler_job.py 90.21% <0%> (-0.44%) ⬇️
airflow/jobs/backfill_job.py 91.87% <0%> (-0.29%) ⬇️
airflow/providers/mysql/hooks/mysql.py 92.7% <0%> (-0.07%) ⬇️
... and 9 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update f3abd34...f44a0d1. Read the comment docs.

"""Deserializes an operator from a JSON object.
"""
from airflow.plugins_manager import operator_extra_links
from airflow import plugins_manager

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not related to your change, but i am just curious: do you know what's the reason to do a lazy import for plugins_manager here as well as in _deserialize_operator_extra_links?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have no idea. This code was written by @kaxil together with @ashb Can you help us?.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a security measure:

Since operator links can be dynamic (think a pre-signed S3 url that is only valid for 15mins), so we need to support inflating to "custom" classes, but we don't want to have to trust the serialized blob, so we only inflate classes are pre-registered.

This is a class of bugs called "Object Injection Attacks" -- if we trusted the input and de-serialized whatever class was here we might end up opening a reverse shell etc. https://blog.nelhage.com/2011/03/exploiting-pickle/ as an example. This defense is not perfect as the plugins are "under user control" but this is mostly looking forward to when we will have an API that accepts a serialized DAG blob to run.

@mik-laj mik-laj merged commit 0be7654 into apache:master Mar 13, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants