Add Python bindings for accessing ExecutionMetrics #1381
ShreyeshArangath wants to merge 2 commits into apache:main from
Conversation
timsaucer left a comment
At a high level, I think this could bring a lot of value. Thank you for putting in the work!
From an implementation perspective, did you consider, instead of caching the prior execution plan, simply adding `collect()`, `execute_stream()`, and so forth on `PyExecutionPlan`? It seems like that would more closely mirror the upstream repo and simplify the code. I haven't spent a lot of time going through the details of why you're caching the prior plan, so it's very possible I missed something.
@timsaucer Thanks for the suggestion! Initially, when I designed the change, I did consider moving in that direction. Today, I think users naturally treat a DataFrame as the primary handle for a query:

```python
df = ctx.sql("SELECT * FROM t WHERE column1 > 1")
batches = df.collect()
```

Requiring metrics to go through `ExecutionPlan` would effectively change the model to look something like this:

```python
df = ctx.sql("SELECT * FROM t WHERE column1 > 1")
plan = df.execution_plan()
batches = plan.collect()
metrics = plan.collect_metrics()
```

I thought that this would require users to restructure pipelines and thread a plan object through call chains purely to get access to metrics. The level of effort required to get people to use it seemed high to me. My goal was to make minimal changes so that users can get metrics without changing how they run queries:

```python
df = ctx.sql("SELECT * FROM t WHERE column1 > 1")
batches = df.collect()
plan = df.execution_plan()
metrics = plan.collect_metrics()
```

I'm happy to switch to the plan-based approach if we prefer stronger alignment with the upstream API, but I leaned toward this design to make observability easier to adopt without disrupting current usage patterns. Let me know what you think.
Which issue does this PR close?
Closes #1379
Rationale for this change
Today, DataFusion Python only exposes execution metrics through formatted console output via `explain(analyze=True)`. This makes it difficult to programmatically inspect execution behavior. There is currently no structured Python API to access per-operator metrics such as `output_rows`, `elapsed_compute`, `spill_count`, and other runtime metrics collected during execution.
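As a sketch of the status quo (the table name and registration below are made up for illustration), the only way to see runtime metrics today is the formatted output printed by `explain(analyze=True)`:

```python
from datafusion import SessionContext

ctx = SessionContext()
ctx.register_csv("t", "t.csv")  # hypothetical table registration for the example

df = ctx.sql("SELECT column1, count(*) FROM t GROUP BY column1")

# Prints a formatted plan with metrics to the console; nothing structured
# is returned for programmatic inspection.
df.explain(analyze=True)
```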
This PR introduces APIs to surface the execution metrics, mirroring the Rust API in `datafusion::physical_plan::metrics`.
What changes are included in this PR?
- Modified `PyDataFrame` so the physical plan used during execution is retained and available for metrics access.
- Exposed a `metrics()` method and added a `collect_metrics()` helper to walk the execution plan tree and aggregate metrics from all operators.

Are there any user-facing changes?
Users can now programmatically access execution metrics.
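To make the user-facing change concrete, here is a rough sketch of how the new API could be used, following the snippets in the discussion above. The table registration is made up for illustration, and the exact shape of the object returned by `collect_metrics()` is an assumption rather than something specified in this PR's description:

```python
from datafusion import SessionContext

ctx = SessionContext()
ctx.register_csv("t", "t.csv")  # hypothetical table registration for the example

df = ctx.sql("SELECT * FROM t WHERE column1 > 1")
batches = df.collect()            # run the query exactly as before

plan = df.execution_plan()        # the plan used for execution is retained by the DataFrame
metrics = plan.collect_metrics()  # metrics aggregated from all operators (added by this PR)

# Assumption: the returned metrics expose per-operator values such as
# output_rows, elapsed_compute, and spill_count, per the PR description.
print(metrics)
```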