feat(parallel): allow dynamic batch sizing by byte limits#669
ad-claw000 wants to merge 43 commits into
Release 0.4.56
Release 0.4.57
Release 0.4.58
Force-pushed from 354cb9b to bab2793
Force-pushed from 651c7c5 to 709ec68
Fixed the pre-commit CI failure by applying autopep8 formatting to the new log messages.
Pull request overview
Note
Copilot was unable to run its full agentic suite in this review.
Adds support for dynamically sized batches in ParallelQuery based on an optional per-batch byte limit to avoid oversized requests (e.g., for large blobs).
Changes:
- Introduces `max_bytes_per_batch` parameter to `ParallelQuery.query()`.
- Updates `worker()` to optionally build batches by estimated blob byte size instead of fixed `batchsize`.
Force-pushed from 2f950e0 to 43a3810
ad-claw000 left a comment
Applied autopep8 formatting to the log messages to fix the pre-commit CI failure.
ad-claw000 left a comment
Applied autopep8 formatting to the new log messages to fix the pre-commit CI failure.
luisremis left a comment
Add more testing. For instance, test and verify that smaller images produce larger batches when using AddImage.
Address review comment: verify that AddImage items of smaller size produce larger batches.
Added the requested test verifying that AddImage operations with smaller image blobs produce larger batches when using max_bytes_per_batch. See commit 9bdf19d.
Added a unit test (
Also applied pre-commit (autopep8) fixes in commit 8efe264.
Addressed the feedback about testing AddImage with variable batch sizes. Unit tests were added in commit
Closes #328
This PR introduces an optional `max_bytes_per_batch` parameter to `ParallelQuery.query()`. When set, the worker overrides its static `batchsize` strategy and instead aggregates tuples iteratively up to the specified byte threshold (estimating size mostly from the byte lengths of the blob list).
This allows queries involving very large blobs (like images) to batch gracefully without exceeding the DB's internal 2048 MB limits, while letting descriptor-only batches stay large.
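For readers who want a concrete picture of the batching rule described above, here is a minimal standalone sketch. It is not the code from this PR: `make_batches`, `estimate_bytes`, and the `(commands, blobs)` item shape are hypothetical names chosen for illustration, and the handling of a single item larger than the limit (it is sent alone) is an assumption.

```python
from typing import Iterator, List, Optional, Sequence, Tuple

# Hypothetical item shape for illustration: (query_commands, blobs).
Item = Tuple[list, List[bytes]]


def estimate_bytes(item: Item) -> int:
    """Estimate an item's size as the total length of its blobs."""
    _, blobs = item
    return sum(len(blob) for blob in blobs)


def make_batches(items: Sequence[Item], batchsize: int,
                 max_bytes_per_batch: Optional[int] = None
                 ) -> Iterator[List[Item]]:
    """Yield batches of items, by count or by estimated byte size."""
    if max_bytes_per_batch is None:
        # Static strategy: fixed number of items per batch.
        for start in range(0, len(items), batchsize):
            yield list(items[start:start + batchsize])
        return

    # Dynamic strategy: keep adding items until the next one would
    # push the batch over the byte limit. batchsize is ignored here.
    batch: List[Item] = []
    batch_bytes = 0
    for item in items:
        size = estimate_bytes(item)
        if batch and batch_bytes + size > max_bytes_per_batch:
            yield batch
            batch, batch_bytes = [], 0
        # Assumption: an item larger than the limit still goes out,
        # alone in its own batch, rather than being dropped.
        batch.append(item)
        batch_bytes += size
    if batch:
        yield batch


# Smaller blobs should pack into larger batches under the same limit.
small = [([{"AddImage": {}}], [b"x" * 100]) for _ in range(10)]
large = [([{"AddImage": {}}], [b"x" * 400]) for _ in range(10)]

small_sizes = [len(b) for b in make_batches(small, 2, max_bytes_per_batch=1000)]
large_sizes = [len(b) for b in make_batches(large, 2, max_bytes_per_batch=1000)]
fixed_sizes = [len(b) for b in make_batches(small, 3)]

print(small_sizes)  # [10]
print(large_sizes)  # [2, 2, 2, 2, 2]
print(fixed_sizes)  # [3, 3, 3, 1]
```

With a 1000-byte limit, ten 100-byte blobs fit in one batch while ten 400-byte blobs split into batches of two, which is the behavior the review asked the tests to verify for AddImage.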