Collections: batch size default to 10 #689
📝 Walkthrough

This PR changes the default collection batch_size from 1 to 10, updates the docs to document the parameter, applies a service-layer fallback to 10 when unspecified, and updates tests to use the new default.
Estimated code review effort: 🎯 2 (Simple) | ⏱️ ~10 minutes
Poem
🚥 Pre-merge checks: ✅ 2 passed | ❌ 1 failed (1 warning)
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
backend/app/models/collection.py (1)
105-112: ⚠️ Potential issue | 🟡 Minor

Add a validation constraint to prevent invalid `batch_size` values.

The `batch_size` field accepts any integer, including 0 or negative values, but the provider layer in `openai.py` uses `batch_size or 10`, silently coercing falsy values to 10. This creates an inconsistency where the API accepts `batch_size=0` without feedback but internally uses 10.

Add a `ge=1` constraint to reject invalid values at the model validation layer:

🛡️ Proposed fix

```diff
 batch_size: int = Field(
     default=10,
+    ge=1,
     description=(
         "Number of documents to send to OpenAI in a single "
         "transaction. See the `file_ids` parameter in the "
         "vector store [create batch](https://platform.openai.com/docs/api-reference/vector-stores-file-batches/createBatch)."
     ),
 )
```

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@backend/app/models/collection.py` around lines 105-112: the `batch_size` Field currently allows zero or negative ints, causing silent coercion in the provider (`openai.py`). Update the Field definition for `batch_size` to add a `ge=1` constraint so values must be >= 1, so that API-level validation rejects invalid inputs before they reach provider logic; keep the existing description unchanged.
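The effect of the proposed `ge=1` constraint can be sketched with a minimal stand-in model (a hedged illustration, not the real `CollectionOptions` from `collection.py`, which also carries a field description):

```python
from pydantic import BaseModel, Field, ValidationError

class CollectionOptions(BaseModel):
    # Simplified stand-in for the real model; only the constraint matters here.
    batch_size: int = Field(default=10, ge=1)

print(CollectionOptions().batch_size)              # default applies -> 10
print(CollectionOptions(batch_size=5).batch_size)  # valid value passes -> 5

# With ge=1, zero and negative values fail validation up front
# instead of being silently coerced later in the provider layer.
try:
    CollectionOptions(batch_size=0)
except ValidationError:
    print("batch_size=0 rejected")
```

With the constraint in place, the `or 10` fallback in the provider becomes dead code for the zero case, since invalid values never reach it.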
🧹 Nitpick comments (1)
backend/app/services/collections/providers/openai.py (1)
33-38: Redundant fallback creates silent coercion for `batch_size=0`.

The `or 10` fallback is problematic: if a user explicitly passes `batch_size=0`, it silently becomes 10 with no validation error or warning. Since `CollectionOptions` already defaults to 10, this fallback only matters for falsy values (0, None).

If validation is added to the model (recommended), this fallback becomes unnecessary. Otherwise, consider explicit validation here:

♻️ Option 1: Remove the fallback if the model validates ge=1

```diff
-    # Use user-provided batch_size, default to 10 if not set
-    batch_size = collection_request.batch_size or 10
+    batch_size = collection_request.batch_size
     docs_batches = batch_documents(
         document_crud,
         collection_request.documents,
         batch_size,
     )
```

♻️ Option 2: Keep the fallback but add an explicit check for invalid values

```diff
-    # Use user-provided batch_size, default to 10 if not set
-    batch_size = collection_request.batch_size or 10
+    # Use user-provided batch_size, default to 10 if not set or invalid
+    if collection_request.batch_size is None or collection_request.batch_size < 1:
+        batch_size = 10
+    else:
+        batch_size = collection_request.batch_size
```

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@backend/app/services/collections/providers/openai.py` around lines 33-38: the current fallback `batch_size = collection_request.batch_size or 10` silently converts invalid falsy values (e.g., 0) to 10. Update the logic that sets `batch_size` to handle validation explicitly: if `collection_request.batch_size` is None, keep the default of 10 (or use the `CollectionOptions` default); if it is provided, validate that it is an integer >= 1 and raise a clear error (or return a validation response) for invalid values such as 0 or negatives before calling `batch_documents`. Alternatively, remove the fallback entirely if `CollectionOptions` already enforces `ge=1`.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@backend/app/tests/services/collections/test_create_collection.py`:
- Around lines 351-356: The `CreationRequest` instantiation in `sample_request` is
invalid because a positional argument (0) appears after a keyword argument. Fix it
by either moving the positional 0 before any keyword arguments or, preferably,
converting it to its explicit keyword name (e.g., `priority=0` or the correct
parameter name for that positional argument) inside the `CreationRequest` call so
the call becomes syntactically valid.
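For context, the syntax rule behind this finding can be shown with a small sketch; `creation_request` here is a hypothetical stand-in that only mirrors the parameter names of `CreationRequest`:

```python
def creation_request(documents, batch_size=10, callback_url=None, provider="openai"):
    # Hypothetical stand-in mirroring CreationRequest's parameters.
    return {"documents": documents, "batch_size": batch_size,
            "callback_url": callback_url, "provider": provider}

# Invalid: a positional argument after a keyword argument is rejected by the
# parser before the call ever runs:
#   creation_request(documents=["d1"], 0, provider="openai")  # SyntaxError

# Valid: name the argument explicitly instead.
req = creation_request(documents=["d1"], batch_size=0, provider="openai")
print(req["batch_size"])  # -> 0
```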
---
Outside diff comments:
In `@backend/app/models/collection.py`:
- Around line 105-112: The collection model's batch_size Field currently allows
zero or negative ints causing silent coercion in the provider (openai.py) —
update the Field definition for batch_size in collection.py to add a validation
constraint ge=1 so values must be >=1 (i.e., change the Field(...) call for
batch_size to include ge=1) so API-level validation rejects invalid inputs
before reaching provider logic; keep the existing description unchanged.
---
Nitpick comments:
In `@backend/app/services/collections/providers/openai.py`:
- Around line 33-38: The current fallback "batch_size =
collection_request.batch_size or 10" silently converts invalid falsy values
(e.g., 0) to 10; update the logic in the block that sets batch_size (referencing
collection_request.batch_size and batch_documents) to handle validation
explicitly: if collection_request.batch_size is None keep the default of 10 (or
use CollectionOptions default), but if it's provided validate that it's an
integer >= 1 and raise a clear error (or return a validation response) for
invalid values (like 0 or negatives) before calling batch_documents;
alternatively, remove the fallback entirely if CollectionOptions already
enforces ge=1.
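The silent-coercion pitfall described in these prompts can be demonstrated in isolation; `resolve_batch_size` is a hypothetical helper contrasting the two fallback styles, not code from `openai.py`:

```python
def resolve_batch_size(requested, default=10):
    """Contrast the `or` fallback with an explicit None check."""
    # Naive style: `or` treats every falsy value the same,
    # so an explicit 0 silently becomes the default.
    naive = requested or default

    # Explicit style: only None falls back; invalid values raise.
    if requested is None:
        explicit = default
    elif requested >= 1:
        explicit = requested
    else:
        raise ValueError(f"batch_size must be >= 1, got {requested!r}")
    return naive, explicit

print(resolve_batch_size(None))  # -> (10, 10): both styles agree
print(resolve_batch_size(5))     # -> (5, 5)
try:
    resolve_batch_size(0)        # naive already coerced 0 -> 10; explicit raises
except ValueError as err:
    print(err)                   # -> batch_size must be >= 1, got 0
```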
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: a4b676f7-183e-45a8-b409-b8c881d53983
📒 Files selected for processing (6)
- backend/app/api/docs/collections/create.md
- backend/app/models/collection.py
- backend/app/services/collections/providers/openai.py
- backend/app/tests/api/routes/collections/test_create_collections.py
- backend/app/tests/services/collections/providers/test_openai_provider.py
- backend/app/tests/services/collections/test_create_collection.py
Codecov Report: ✅ All modified and coverable lines are covered by tests.
```diff
 sample_request = CreationRequest(
-    documents=[document.id], batch_size=1, callback_url=None, provider="openai"
+    documents=[document.id], batch_size=10, callback_url=None, provider="openai"
```
Not sure how this works, but increasing the batch size while still passing a single document id does not make sense. There should be a list of documents somewhere, but I'm not sure about the rest of the code.
Yes, we can write a test where the number of docs is > 10, and the test can then check whether batching happens as expected.
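A test along those lines could look like the following sketch; the `batch` helper is a hypothetical stand-in (the real `batch_documents` in `openai.py` also takes a `document_crud` argument):

```python
from itertools import islice

def batch(items, size):
    # Hypothetical stand-in for the batching performed by batch_documents.
    it = iter(items)
    while chunk := list(islice(it, size)):
        yield chunk

def test_batches_more_than_ten_documents():
    docs = [f"doc-{i}" for i in range(25)]
    batches = list(batch(docs, 10))
    # 25 documents with batch_size=10 -> batches of 10, 10, 5
    assert [len(b) for b in batches] == [10, 10, 5]
    # No document is dropped or reordered across batches.
    assert [d for b in batches for d in b] == docs

test_batches_more_than_ten_documents()
```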
Summary
Target issue is #690
Notes
Documentation
Enhancements
Summary by CodeRabbit
Release Notes
Documentation
Improvements