Evaluation: Improve error handling for batch job #724
Conversation
📝 Walkthrough
Detects OpenAI batch jobs that finish with only an error file, downloads and aggregates JSONL error messages, persists an aggregated error to the batch job and eval run, and returns an explicit failed result when all requests failed.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Evaluator as EvaluatorService
    participant Poller as poll_batch_status
    participant Storage as BatchProviderStorage
    participant DB as Database(update_batch_job / eval_run)
    participant Logger as Logger
    rect rgba(100,150,250,0.5)
        Evaluator->>Poller: poll_batch_status(batch_job_id)
        Poller-->>Evaluator: provider_status="completed", provider_output_file_id=null, error_file_id
    end
    rect rgba(150,200,100,0.5)
        Evaluator->>Storage: download_file(error_file_id)
        Storage-->>Evaluator: JSONL error records
        Evaluator->>Evaluator: parse & aggregate most frequent error (msg, count, total)
        Evaluator->>DB: update_batch_job(batch_job_id, error_message)
        Evaluator->>DB: mark eval_run as failed with error_message
        Evaluator->>Logger: log failure with aggregated message
        Evaluator-->>Client: return { action: "failed", error: aggregated_message }
    end
```
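The parse-and-aggregate step in the diagram could be sketched roughly as below. The record shape is assumed to match OpenAI's batch error-file JSONL layout, and the helper name is illustrative, not the PR's actual code:

```python
import json
from collections import Counter


def aggregate_batch_errors(jsonl_content: str) -> str:
    """Summarize a batch error file: most frequent error message plus counts.

    Assumes each JSONL line resembles an OpenAI batch error record:
    {"response": {"body": {"error": {"message": "..."}}}, ...}
    """
    messages: list[str] = []
    for line in jsonl_content.splitlines():
        if not line.strip():
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            # Skip malformed lines rather than failing the whole aggregation.
            continue
        msg = (
            record.get("response", {})
            .get("body", {})
            .get("error", {})
            .get("message")
        )
        if msg:
            messages.append(msg)
    if not messages:
        return "Batch failed with no parseable error records"
    # Report the dominant failure mode with its frequency out of the total.
    most_common, count = Counter(messages).most_common(1)[0]
    return f"{most_common} ({count}/{len(messages)} requests)"
```

Surfacing only the most frequent message keeps the persisted `error_message` short while still indicating whether the failure was uniform or mixed.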
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
🚥 Pre-merge checks: ✅ Passed checks (3 passed)
Codecov Report: ❌ Patch coverage is
Actionable comments posted: 3
🧹 Nitpick comments (1)
backend/app/tests/crud/evaluations/test_processing.py (1)
803-839: Please move this scenario behind a factory-backed fixture. This test duplicates `BatchJob` and `EvaluationRun` construction that's already repeated elsewhere in the module. A factory/fixture would keep the test focused on the failure behavior and make future model changes cheaper to absorb. As per coding guidelines: use the factory pattern for test fixtures in backend/app/tests/.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/app/tests/crud/evaluations/test_processing.py` around lines 803 - 839, Extract the inline BatchJob and EvaluationRun setup into a reusable factory-backed fixture and replace the duplicated construction in this test with that fixture; specifically, create a fixture that constructs a BatchJob (using the same fields: provider, provider_batch_id "batch_all_fail", provider_status "completed", job_type BatchJobType.EVALUATION, total_items, status, organization_id, project_id, timestamps) and an EvaluationRun created via create_evaluation_run (then set eval_run.batch_job_id and eval_run.status="processing"), ensure provider_output_file_id remains unset, register the fixture in the test module and update this test to accept the fixture instead of performing inline DB.add/commit/refresh calls so other tests can reuse BatchJob and EvaluationRun setups.
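The factory the comment asks for could be sketched along these lines. A plain dataclass stands in for the SQLModel `BatchJob` here; the field names come from the prompt above, and everything else (defaults, stub name) is illustrative rather than the repo's actual code:

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional


# Stand-in for the real SQLModel BatchJob table, for illustration only.
@dataclass
class BatchJobStub:
    provider: str = "openai"
    provider_batch_id: str = "batch_all_fail"
    provider_status: str = "completed"
    provider_output_file_id: Optional[str] = None  # stays unset: all requests failed
    job_type: str = "evaluation"
    total_items: int = 10
    status: str = "processing"
    organization_id: int = 1
    project_id: int = 1
    created_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))


def make_all_failed_batch_job(**overrides: object) -> BatchJobStub:
    """Factory a pytest fixture could wrap: defaults describe the
    all-requests-failed scenario, with per-test keyword overrides."""
    job = BatchJobStub()
    for key, value in overrides.items():
        setattr(job, key, value)
    return job
```

In the real test module this factory would add/commit/refresh through the `db` session inside a fixture, so individual tests only state what differs from the default scenario.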
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@backend/app/crud/evaluations/processing.py`:
- Around line 105-112: The except block in _extract_batch_error_message is
currently catching all exceptions (including DB write errors from
update_batch_job) and returning a generic message, which can hide persistence
failures; change the logic so that only errors related to extracting/formatting
the error message are swallowed here, and any exception raised by
update_batch_job (or any persistence call that sets batch_job.error_message) is
not suppressed—either move the call to update_batch_job out of this broad
try/except, or catch Exception e, detect if it originates from update_batch_job
(or re-raise after logging), ensuring update_batch_job failures are logged and
propagated (do not return the fallback string when update_batch_job fails) so
the caller can observe persistence errors and batch_job.error_message isn't left
unpersisted.
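One way to apply this finding is to keep only message extraction inside the broad except and let persistence errors propagate. The function and the injected callables below are illustrative stand-ins for the PR's `_extract_batch_error_message` and `update_batch_job`, not the actual code:

```python
import logging
from typing import Callable

logger = logging.getLogger(__name__)

FALLBACK_MESSAGE = "Batch completed with errors (details unavailable)"


def extract_and_persist_error(
    raw_error_file: str,
    parse_error_file: Callable[[str], str],
    persist: Callable[[str], None],
) -> str:
    """Extract an error summary, swallowing only parse/format failures.

    `parse_error_file` and `persist` stand in for the real helpers;
    any exception raised by `persist` propagates to the caller.
    """
    try:
        error_message = parse_error_file(raw_error_file)
    except Exception:
        # Only extraction/formatting problems fall back to a generic message.
        logger.exception("Failed to parse batch error file")
        error_message = FALLBACK_MESSAGE

    # Persistence happens outside the broad try/except: if the DB write
    # fails, the caller observes it instead of receiving a fallback string
    # while error_message silently goes unpersisted.
    persist(error_message)
    return error_message
```

The key design point is the scope of the `try` block: it covers exactly the operation whose failure the fallback string is meant to describe, and nothing else.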
- Around line 631-645: In the provider_status == "completed" branch, the code
incorrectly checks batch_job.provider_output_file_id (which may be stale) to
decide if all requests failed; instead use the freshly polled value from
poll_batch_status (status_result["provider_output_file_id"]) when deciding the
"all requests failed" path—i.e., replace or supplement checks that reference
batch_job.provider_output_file_id with the status_result value (also consider
status_result.get("error_file_id") as already used) so a completed row with a
newly returned provider_output_file_id is not misclassified; see
poll_batch_status, status_result, and batch_job.provider_output_file_id for the
exact symbols to update.
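The decision this comment describes can be sketched as a small pure function over the freshly polled payload. The key names mirror the `status_result` dict named above, but the function itself and the returned action labels are assumptions for illustration:

```python
from typing import Any


def classify_completed_batch(status_result: dict[str, Any]) -> str:
    """Decide how to handle a provider-reported 'completed' batch.

    Reads the freshly polled status_result rather than a possibly stale
    ORM field like batch_job.provider_output_file_id.
    """
    output_file_id = status_result.get("provider_output_file_id")
    error_file_id = status_result.get("error_file_id")

    if output_file_id is None and error_file_id is not None:
        # No output file at all: every request in the batch failed.
        return "all_requests_failed"
    if output_file_id is not None:
        # Results exist, possibly alongside an error file (partial failures).
        return "process_results"
    return "unknown"
```

Because the function takes only `status_result`, a row whose ORM copy of `provider_output_file_id` is stale can no longer steer a batch with fresh results into the all-failed path.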
In `@backend/app/tests/crud/evaluations/test_processing.py`:
- Around line 794-801: The test coroutine
test_check_and_process_evaluation_completed_all_requests_failed is missing type
annotations for its mock parameters and return type; update its signature to add
explicit types (e.g. mock_provider_cls: Mock, mock_poll: Mock, mock_get_batch:
Mock, db: Session, test_dataset: Any) and add the return annotation -> None, and
ensure you import typing.Any and unittest.mock.Mock (or the preferred mock type
used in the repo) at the top of the test file so the annotations resolve.
---
Nitpick comments:
In `@backend/app/tests/crud/evaluations/test_processing.py`:
- Around line 803-839: Extract the inline BatchJob and EvaluationRun setup into
a reusable factory-backed fixture and replace the duplicated construction in
this test with that fixture; specifically, create a fixture that constructs a
BatchJob (using the same fields: provider, provider_batch_id "batch_all_fail",
provider_status "completed", job_type BatchJobType.EVALUATION, total_items,
status, organization_id, project_id, timestamps) and an EvaluationRun created
via create_evaluation_run (then set eval_run.batch_job_id and
eval_run.status="processing"), ensure provider_output_file_id remains unset,
register the fixture in the test module and update this test to accept the
fixture instead of performing inline DB.add/commit/refresh calls so other tests
can reuse BatchJob and EvaluationRun setups.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 0c1e6a24-d519-49de-95dc-90fa9c6f75f4
📒 Files selected for processing (2)
backend/app/crud/evaluations/processing.py
backend/app/tests/crud/evaluations/test_processing.py
```python
async def test_check_and_process_evaluation_completed_all_requests_failed(
    self,
    mock_provider_cls,
    mock_poll,
    mock_get_batch,
    db: Session,
    test_dataset,
):
```
Add the missing type annotations on the new test coroutine.
The injected mocks and test_dataset fixture are untyped here, and the function is also missing -> None. That breaks the repo-wide Python typing rule.
Suggested fix
```diff
 async def test_check_and_process_evaluation_completed_all_requests_failed(
     self,
-    mock_provider_cls,
-    mock_poll,
-    mock_get_batch,
+    mock_provider_cls: MagicMock,
+    mock_poll: MagicMock,
+    mock_get_batch: MagicMock,
     db: Session,
-    test_dataset,
-):
+    test_dataset: EvaluationDataset,
+) -> None:
```

As per coding guidelines: always add type hints to all function parameters and return values in Python code.
Summary
Target issue is #723
Checklist
Before submitting a pull request, please ensure that you mark these tasks.
Run `fastapi run --reload app/main.py` or `docker compose up` in the repository root and test.

Screenshot

Summary by CodeRabbit

Bug Fixes

Tests
Added comprehensive test coverage for batch error extraction and handling of all-failed batch completion scenarios.