feat(parallel): allow dynamic batch sizing by byte limits #669

Open
ad-claw000 wants to merge 43 commits into develop from feat/328-dynamic-batchsize

Conversation

@ad-claw000
Contributor

Closes #328

This PR introduces an optional max_bytes_per_batch parameter to ParallelQuery.query(). When set, the worker ignores its static batchsize and instead accumulates tuples one at a time until the specified byte threshold is reached, estimating each tuple's size mainly from the byte lengths of its blob list.

This lets queries with very large blobs (such as images) be batched without exceeding the DB's internal 2048 MB limit, while batches of small items such as descriptors can stay large.
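For readers who want a concrete picture of byte-limited batching, here is a minimal sketch of the general technique. It is not the code in this PR: the `Item` shape, `estimate_item_bytes`, and `batches_by_bytes` are hypothetical names, and the handling of an item larger than the limit (it is sent alone) is an assumption rather than confirmed behavior of the worker.

```python
from typing import Iterable, Iterator, List, Sequence, Tuple

# Hypothetical item shape: (query commands, list of blobs as bytes).
Item = Tuple[list, Sequence[bytes]]


def estimate_item_bytes(item: Item) -> int:
    """Approximate an item's size by summing the lengths of its blobs."""
    _commands, blobs = item
    return sum(len(blob) for blob in blobs)


def batches_by_bytes(
    items: Iterable[Item], max_bytes_per_batch: int
) -> Iterator[List[Item]]:
    """Yield batches whose estimated blob size stays within the limit.

    Assumption: an item larger than the limit is emitted alone, not dropped.
    """
    batch: List[Item] = []
    batch_bytes = 0
    for item in items:
        size = estimate_item_bytes(item)
        # Close the current batch if adding this item would exceed the limit.
        if batch and batch_bytes + size > max_bytes_per_batch:
            yield batch
            batch, batch_bytes = [], 0
        batch.append(item)
        batch_bytes += size
    if batch:
        yield batch


# Five 400-byte blobs with a 1000-byte limit pack as 2 + 2 + 1 items.
items = [([{"AddImage": {}}], [bytes(400)]) for _ in range(5)]
print([len(b) for b in batches_by_bytes(items, 1000)])  # [2, 2, 1]
```

With this kind of greedy packing, the number of items per batch falls out of the blob sizes, so small payloads naturally form large batches and large payloads form small ones.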

ad-claw000 force-pushed the feat/328-dynamic-batchsize branch 3 times, most recently from 354cb9b to bab2793, on May 4, 2026 09:53
ad-claw000 force-pushed the feat/328-dynamic-batchsize branch from 651c7c5 to 709ec68 on May 19, 2026 02:37
ad-claw000 requested review from Copilot and luisremis on May 19, 2026 04:08
@ad-claw000
Contributor Author

Fixed the pre-commit CI failure by applying autopep8 formatting to the new log messages.

Contributor

Copilot AI left a comment

Pull request overview

Note: Copilot was unable to run its full agentic suite in this review.

Adds support for dynamically sized batches in ParallelQuery based on an optional per-batch byte limit to avoid oversized requests (e.g., for large blobs).

Changes:

  • Introduces max_bytes_per_batch parameter to ParallelQuery.query().
  • Updates worker() to optionally build batches by estimated blob byte size instead of fixed batchsize.

Comment thread aperturedb/ParallelQuery.py Outdated
Comment thread aperturedb/ParallelQuery.py Outdated
Comment thread aperturedb/ParallelQuery.py Outdated
Comment thread aperturedb/ParallelQuery.py Outdated
Comment thread aperturedb/ParallelQuery.py Outdated
ad-claw000 force-pushed the feat/328-dynamic-batchsize branch from 2f950e0 to 43a3810 on May 20, 2026 01:07
Copilot AI review requested due to automatic review settings May 20, 2026 01:13
Contributor Author

ad-claw000 left a comment

Applied autopep8 formatting to the log messages to fix the pre-commit CI failure.

Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 16 out of 18 changed files in this pull request and generated 5 comments.

Comment thread aperturedb/ParallelQuery.py Outdated
Comment thread aperturedb/ParallelQuery.py Outdated
Comment thread aperturedb/ParallelQuery.py
Comment thread aperturedb/ParallelQuery.py Outdated
Comment thread aperturedb/ParallelQuery.py
Contributor Author

ad-claw000 left a comment

Applied autopep8 formatting to the new log messages to fix the pre-commit CI failure.

Copilot AI review requested due to automatic review settings May 24, 2026 06:14
Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated no new comments.

Contributor

luisremis left a comment

Add more testing. For instance, test and verify that smaller images produce larger batches when using AddImage.

Address review comment: verify that AddImage items of smaller size
produce larger batches.
Copilot AI review requested due to automatic review settings May 25, 2026 01:34
@ad-claw000
Contributor Author

Added the requested test verifying that AddImage operations with smaller image blobs produce larger batches when max_bytes_per_batch is set. See commit 9bdf19d.

Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated no new comments.

Copilot AI review requested due to automatic review settings May 25, 2026 11:08
@ad-claw000
Contributor Author

Added a unit test (test_dynamic_batching_add_image_variable_sizes) that verifies smaller images produce larger batches when using AddImage, as requested. See commit c9fcd3e.
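As an illustration of the property under test (and not a copy of the test in the commit above), a self-contained check could look roughly like the sketch below. `make_add_image_items` and `batch_lengths` are hypothetical helpers; `batch_lengths` is a stand-in for the worker's batching loop and assumes greedy packing under the byte limit.

```python
from typing import List, Sequence, Tuple

# Hypothetical item shape: (list of commands, list of blobs).
Item = Tuple[list, Sequence[bytes]]


def make_add_image_items(count: int, blob_size: int) -> List[Item]:
    """Build fake AddImage items, each carrying one zero-filled blob."""
    return [
        ([{"AddImage": {"properties": {"id": i}}}], [bytes(blob_size)])
        for i in range(count)
    ]


def batch_lengths(items: Sequence[Item], max_bytes_per_batch: int) -> List[int]:
    """Stand-in for the worker: greedily pack items under the byte limit
    and return how many items landed in each batch."""
    lengths: List[int] = []
    current, used = 0, 0
    for _commands, blobs in items:
        size = sum(len(blob) for blob in blobs)
        if current and used + size > max_bytes_per_batch:
            lengths.append(current)
            current, used = 0, 0
        current += 1
        used += size
    if current:
        lengths.append(current)
    return lengths


def test_smaller_images_produce_larger_batches() -> None:
    limit = 1_000_000
    small = batch_lengths(make_add_image_items(100, 10_000), limit)
    large = batch_lengths(make_add_image_items(100, 250_000), limit)
    # Same item count and limit: small blobs pack more items per batch.
    assert max(small) > max(large)
    assert len(small) < len(large)
    # No item is lost or duplicated by batching.
    assert sum(small) == sum(large) == 100
```

Comparing two runs that differ only in blob size keeps the assertion independent of the exact packing rule, so it should stay valid if the size estimate later accounts for more than blob lengths.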

@ad-claw000
Contributor Author

Also applied pre-commit (autopep8) fixes in commit 8efe264.

Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated 1 comment.

Comment thread aperturedb/ParallelQuery.py Outdated
Copilot AI review requested due to automatic review settings May 25, 2026 16:24
Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 5 out of 5 changed files in this pull request and generated 1 comment.

Comment thread aperturedb/ParallelQuery.py Outdated
Copilot AI review requested due to automatic review settings May 26, 2026 19:59
@ad-claw000
Contributor Author

Addressed the feedback about testing AddImage with variable batch sizes. Unit tests were added in commit c9fcd3e. PTAL!

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

ParallelQuery batchsize cannot adjust for query size

3 participants