Use execute_values instead of execute_batch for better bulk insert performance with PostgresHook #68207

Merged
dabla merged 2 commits into apache:main from dabla:feature/improve-postgres-insert-rows on Jun 9, 2026

dabla commented Jun 8, 2026

Contributor

Summary

This PR optimizes PostgresHook.insert_rows() when fast_executemany=True by switching from execute_batch to execute_values for psycopg2, which provides significantly better bulk insert performance.

Changes

  • psycopg2 with fast_executemany=True: Now uses psycopg2.extras.execute_values() instead of execute_batch(). This packs rows into a single INSERT statement with multiple value tuples (one such statement per page of page_size rows), reducing per-statement overhead and round-trips and improving throughput.

  • psycopg3: Falls back to the default DbApiHook.insert_rows() implementation. psycopg3's native executemany already uses pipelining internally, so a custom implementation brings no benefit, and execute_values is not compatible with psycopg3.

  • Format string handling: Both code paths now explicitly set _insert_statement_format on every call. This ensures correct SQL generation and makes the hook self-healing: a value left behind by a previous call that failed mid-execution is overwritten on the next call.
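The branch selection described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code merged in this PR: `insert_rows_sketch`, `build_values_insert`, `build_row_insert`, the `use_psycopg3` flag, and the `page_size=1000` default are hypothetical names and values, and `cursor.executemany` is only a stand-in for the default DbApiHook path. The one real library call is `psycopg2.extras.execute_values`, which expects a statement containing a single `%s` where the value tuples go.

```python
from typing import Any, Iterable, Sequence


def build_values_insert(table: str, target_fields: Sequence[str]) -> str:
    """Statement shape for execute_values: one bare %s stands for all value tuples."""
    # Sketch only: identifiers are interpolated as-is, without quoting or escaping.
    columns = ", ".join(target_fields)
    return f"INSERT INTO {table} ({columns}) VALUES %s"


def build_row_insert(table: str, target_fields: Sequence[str]) -> str:
    """Statement shape for row-at-a-time execution: one %s per column."""
    columns = ", ".join(target_fields)
    placeholders = ", ".join(["%s"] * len(target_fields))
    return f"INSERT INTO {table} ({columns}) VALUES ({placeholders})"


def insert_rows_sketch(
    cursor: Any,
    table: str,
    rows: Iterable[Sequence[Any]],
    target_fields: Sequence[str],
    *,
    fast_executemany: bool,
    use_psycopg3: bool,  # hypothetical flag; the real hook detects the driver itself
    page_size: int = 1000,  # assumed value, not taken from the PR
) -> str:
    """Pick the insert path and return its name so callers can see which ran."""
    rows = list(rows)
    if fast_executemany and not use_psycopg3:
        # Imported lazily so the fallback path works without psycopg2 installed.
        from psycopg2.extras import execute_values

        sql = build_values_insert(table, target_fields)
        execute_values(cursor, sql, rows, page_size=page_size)
        return "execute_values"
    # Stand-in for the default path: the driver's own executemany.
    cursor.executemany(build_row_insert(table, target_fields), rows)
    return "executemany"
```

Because the statement text is rebuilt on every call, neither branch depends on state left over from an earlier call, which is the same idea as setting the format string explicitly in both code paths.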

Why execute_values over execute_batch?

| Method | Behavior |
|---|---|
| execute_batch | Sends multiple INSERT statements in batches |
| execute_values | Sends a single INSERT ... VALUES (...), (...), (...) statement |

execute_values is typically 2-3x faster for bulk inserts because it minimizes statement parsing overhead and network round-trips.
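To make the parsing argument concrete, here is a rough counting model of how many INSERT statements the server has to parse under each helper. It assumes both helpers split the rows into pages of the same `page_size`; the function name `statements_parsed` and the row counts are illustrative, and the model says nothing about actual timings.

```python
from math import ceil


def statements_parsed(n_rows: int, page_size: int, helper: str) -> int:
    """Count the INSERT statements the server parses for n_rows.

    execute_batch sends one INSERT per row, grouped page_size at a time.
    execute_values sends one multi-row INSERT per page of page_size rows.
    """
    if n_rows < 0 or page_size < 1:
        raise ValueError("n_rows must be >= 0 and page_size must be >= 1")
    if helper == "execute_batch":
        return n_rows
    if helper == "execute_values":
        return ceil(n_rows / page_size)
    raise ValueError(f"unknown helper: {helper!r}")


print(statements_parsed(10_000, 1_000, "execute_batch"))   # 10000
print(statements_parsed(10_000, 1_000, "execute_values"))  # 10
```

In this model a larger `page_size` lowers the statement count for execute_values only, which is where the reduced parsing overhead comes from.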

Testing

  • Updated existing tests to verify execute_values is called instead of execute_batch
  • Added new test to verify psycopg3 correctly falls back to the default implementation even when fast_executemany=True
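The two checks listed above could look roughly like this with `unittest.mock`. This is a sketch, not the tests from the PR: `dispatch_insert` is a hypothetical stand-in for the hook's branch selection, and the bulk helper is passed in as a parameter so it can be replaced by a mock without patching psycopg2.

```python
from unittest import mock


def dispatch_insert(cursor, sql, rows, *, fast_executemany, is_psycopg3, values_helper):
    """Hypothetical stand-in for the hook: choose the bulk helper or the default path."""
    if fast_executemany and not is_psycopg3:
        values_helper(cursor, sql, rows)
        return
    cursor.executemany(sql, rows)


def test_psycopg2_uses_values_helper():
    cursor, helper = mock.Mock(), mock.Mock()
    sql, rows = "INSERT INTO t (a) VALUES %s", [(1,), (2,)]
    dispatch_insert(cursor, sql, rows, fast_executemany=True,
                    is_psycopg3=False, values_helper=helper)
    # The bulk helper runs exactly once and the default path is untouched.
    helper.assert_called_once_with(cursor, sql, rows)
    cursor.executemany.assert_not_called()


def test_psycopg3_falls_back_even_with_fast_executemany():
    cursor, helper = mock.Mock(), mock.Mock()
    sql, rows = "INSERT INTO t (a) VALUES (%s)", [(1,), (2,)]
    dispatch_insert(cursor, sql, rows, fast_executemany=True,
                    is_psycopg3=True, values_helper=helper)
    # With psycopg3 the helper must never be called.
    helper.assert_not_called()
    cursor.executemany.assert_called_once_with(sql, rows)
```

Both functions follow the pytest naming convention, so a test runner would collect them as written.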

Was generative AI tooling used to co-author this PR?
  • [x] Yes (please specify the tool below)

Claude Opus 4.6


  • Read the Pull Request Guidelines for more information. Note: commit author/co-author name and email in commits become permanently public when merged.
  • For fundamental code changes, an Airflow Improvement Proposal (AIP) is needed.
  • When adding a dependency, check compliance with the ASF 3rd Party License Policy.
  • For significant user-facing changes, create a newsfragment: {pr_number}.significant.rst, in airflow-core/newsfragments. You can add this file in a follow-up commit after the PR is created so you know the PR number.

… insert performance with psycopg2 in insert_rows of PostgresHook when fast_executemany is enabled unless psycopg3 is used
@dabla dabla merged commit b183a4c into apache:main Jun 9, 2026
80 checks passed