Evaluation: add export_format query param for grouped trace export#562
Conversation
📝 Walkthrough

Adds an …

Changes
Sequence Diagram

```mermaid
sequenceDiagram
    participant Client
    participant API_Route as API Route
    participant Validator
    participant CRUD
    participant DB as EvalRun/Score
    participant Response
    Client->>API_Route: GET /api/v1/evaluations/{id}?export_format=grouped&get_trace_info=true
    API_Route->>Validator: Validate export_format & get_trace_info
    alt invalid (grouped without trace info)
        Validator-->>API_Route: validation error
        API_Route-->>Client: 400 Bad Request
    else valid
        API_Route->>DB: Load evaluation run & score (including traces)
        DB-->>API_Route: score with traces
        API_Route->>CRUD: group_traces_by_question_id(traces)
        alt grouping succeeds
            CRUD-->>API_Route: grouped traces list
            API_Route->>Response: build 200 payload (grouped format)
            Response-->>Client: 200 OK with grouped traces
        else grouping fails
            CRUD-->>API_Route: ValueError / error message
            API_Route-->>Client: 400 Bad Request with failure message
        end
    end
```
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs
Suggested reviewers
Poem
🚥 Pre-merge checks: ✅ 2 passed | ❌ 1 failed

❌ Failed checks (1 warning)

✅ Passed checks (2 passed)

✏️ Tip: You can configure your own custom pre-merge checks in the settings. Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.
Actionable comments posted: 2
🤖 Fix all issues with AI agents
In `@backend/app/crud/evaluations/core.py`:
- Around line 341-344: The check currently only inspects traces[0] for a
question_id which allows later traces with missing or empty question_id to
create a None key and break sorting; update the validation in the function that
handles traces to iterate over all traces (e.g., use any(t.get("question_id") in
(None, "") for t in traces) or a for-loop) and raise the same ValueError if any
trace has a missing/empty question_id; also fix the error message text to
"whether" instead of "weather" so the raised message reads "Grouped export
format is not available for this evaluation." Reference the symbols traces,
question_id, and the grouping/sorting logic (groups and sorted(groups.keys()))
when making the change.
In `@backend/app/tests/api/routes/test_evaluation.py`:
- Around line 1037-1044: The inline comment next to the client.get call is
stale; in the params dict the key "get_trace_info" is already set to True, so
remove the incorrect comment ("# Missing get_trace_info=true") adjacent to the
response = client.get(...) call (the params containing "export_format" and
"get_trace_info") to avoid confusion.
🧹 Nitpick comments (7)
backend/app/crud/evaluations/core.py (3)
**319-338**: Incomplete docstring with placeholder text. The docstring contains placeholder descriptions (`Description`) that should be properly filled in. Per coding guidelines, Python functions should have complete documentation.

📝 Suggested docstring improvement

```diff
 def group_traces_by_question_id(
     traces: list[dict[str, Any]],
 ) -> list[dict[str, Any]]:
     """
-    Docstring for group_traces_by_question_id
-
-    :param traces: Description
-    :type traces: list[dict[str, Any]]
-    :return: Description
-    :rtype: list[dict[str, Any]]
+    Group traces by question_id for grouped export format.
+
+    Args:
+        traces: List of trace dictionaries containing question_id, question,
+            ground_truth_answer, llm_answer, trace_id, and scores.
+
+    Returns:
+        List of grouped traces sorted by question_id:
+        [
+            {
+                "question_id": 1,
+                "question": "What is Python?",
+                "ground_truth_answer": "...",
+                "llm_answers": ["Answer 1", "Answer 2"],
+                "trace_ids": ["trace-1", "trace-2"],
+                "scores": [[...], [...]]
+            }
+        ]
+
+    Raises:
+        ValueError: If traces lack a valid question_id.
     """
```
**346-352**: Type hint mismatch: `question_id` may not always be `int`. The type hint `dict[int, list[dict[str, Any]]]` assumes `question_id` is always an integer, but the code uses `.get("question_id")`, which could return any type. If `question_id` values are strings or mixed types, this could lead to unexpected sorting behavior.

♻️ Suggested type hint adjustment

```diff
-    groups: dict[int, list[dict[str, Any]]] = {}
+    groups: dict[Any, list[dict[str, Any]]] = {}
```
**367-368**: Add trailing newline at end of file. Per static analysis hint (Ruff W292), add a newline at the end of the file.

```diff
 logger.info(f"[group_traces_by_question_id] Created {len(result)} groups")
 return result
+
```

backend/app/api/routes/evaluations/evaluation.py (2)
**141-145**: Minor style inconsistency: extra space around `=` in `status_code`. Line 143 uses `status_code = 400` with spaces, while line 138 uses `status_code=400` without spaces. Consider keeping consistent style.

♻️ Consistent formatting

```diff
     if export_format == "grouped" and not get_trace_info:
         raise HTTPException(
-            status_code = 400,
+            status_code=400,
             detail="export_format=grouped requires get_trace_info=true"
         )
```
**164-171**: Consider moving the import to the top of the file and avoiding in-place mutation.

The import at line 166 should ideally be at the top of the file with other imports for better readability and consistency.

The code mutates `eval_run.score["traces"]` in-place (line 169). If `eval_run` is a SQLModel instance that might be used elsewhere or persisted, this could have unintended side effects. Consider creating a copy for the response instead.

♻️ Proposed refactor

Move the import to the top of the file:

```python
from app.crud.evaluations.core import group_traces_by_question_id
```

Then modify the grouping logic to avoid in-place mutation:

```diff
 # Formatter = grouped
 if export_format == "grouped" and eval_run.score and "traces" in eval_run.score:
-    from app.crud.evaluations.core import group_traces_by_question_id
     try:
         grouped_traces = group_traces_by_question_id(eval_run.score["traces"])
-        eval_run.score["traces"] = grouped_traces
+        # Create a copy to avoid mutating the model instance
+        eval_run.score = {**eval_run.score, "grouped_traces": grouped_traces}
     except ValueError as e:
         return APIResponse.failure_response(error=str(e), data=eval_run)
```

Note: This also aligns with the documentation, which shows `grouped_traces` as a separate field.

backend/app/tests/api/routes/test_evaluation.py (2)
**959-1061**: Consider adding a test for the ValueError case (traces without `question_id`). The success test is comprehensive, but there's no test coverage for the case where traces lack `question_id` and the grouping function raises a `ValueError`. This is an important error path handled in the route (lines 170-171 of evaluation.py).

📝 Suggested additional test

```python
def test_get_evaluation_run_grouped_format_missing_question_id_fails(
    self,
    client: TestClient,
    user_api_key_header: dict[str, str],
    db: Session,
    user_api_key: TestAuthContext,
    create_test_dataset: EvaluationDataset,
) -> None:
    """Test grouped export fails when traces lack question_id."""
    eval_run = EvaluationRun(
        run_name="test_run_no_qid",
        dataset_name=create_test_dataset.name,
        dataset_id=create_test_dataset.id,
        config={"model": "gpt-4o"},
        status="completed",
        total_items=1,
        score={
            "traces": [
                {
                    "trace_id": "trace-1",
                    "question": "What is Python?",
                    "llm_answer": "A language",
                    "scores": [],
                    # No question_id
                }
            ],
            "summary_scores": [],
        },
        organization_id=user_api_key.organization_id,
        project_id=user_api_key.project_id,
    )
    db.add(eval_run)
    db.commit()
    db.refresh(eval_run)

    response = client.get(
        f"/api/v1/evaluations/{eval_run.id}",
        params={"export_format": "grouped", "get_trace_info": True},
        headers=user_api_key_header,
    )

    assert response.status_code == 200
    response_data = response.json()
    assert response_data["success"] is False
    assert "not available" in response_data["error"].lower()
```
**1061-1062**: Missing blank line before the next class definition. PEP 8 recommends two blank lines before class definitions at the module level.

```diff
 )
+
 class TestGetDataset:
```
backend/app/crud/evaluations/core.py (Outdated)

```python
# weather question_id exists in the traces
if traces and (traces[0].get("question_id") is None or traces[0].get("question_id") == ""):
    raise ValueError(
        "Grouped export format is not available for this evaluation.")
```
Validation only checks the first trace; a missing `question_id` in any other trace will cause issues.

The current validation only checks `traces[0]` for `question_id`. If subsequent traces have a missing or empty `question_id`, they will be grouped under `None` as a key, and `sorted(groups.keys())` on line 355 will raise a `TypeError` when comparing `None` with integers.

Also, typo: "weather" should be "whether".
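The `TypeError` the comment predicts is easy to reproduce in isolation: Python 3 cannot order `None` against an `int`, so a single orphaned trace poisons the sort.

```python
# One trace lost its question_id, so it was grouped under the None key.
groups = {1: ["trace-1"], None: ["trace-orphan"]}

try:
    sorted(groups.keys())
except TypeError:
    # '<' not supported between instances of 'NoneType' and 'int'
    print("sorting failed -- validate question_id before grouping")
```

This is why the validation must run over every trace before building `groups`.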
🐛 Proposed fix for robust validation

```diff
-    # weather question_id exists in the traces
-    if traces and (traces[0].get("question_id") is None or traces[0].get("question_id") == ""):
-        raise ValueError(
-            "Grouped export format is not available for this evaluation.")
+    # Validate that all traces have a valid question_id
+    if not traces:
+        return []
+
+    for trace in traces:
+        question_id = trace.get("question_id")
+        if question_id is None or question_id == "":
+            raise ValueError(
+                "Grouped export format is not available for this evaluation.")
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```python
# Validate that all traces have a valid question_id
if not traces:
    return []

for trace in traces:
    question_id = trace.get("question_id")
    if question_id is None or question_id == "":
        raise ValueError(
            "Grouped export format is not available for this evaluation.")
```
🤖 Prompt for AI Agents
In `@backend/app/crud/evaluations/core.py` around lines 341 - 344, The check
currently only inspects traces[0] for a question_id which allows later traces
with missing or empty question_id to create a None key and break sorting; update
the validation in the function that handles traces to iterate over all traces
(e.g., use any(t.get("question_id") in (None, "") for t in traces) or a
for-loop) and raise the same ValueError if any trace has a missing/empty
question_id; also fix the error message text to "whether" instead of "weather"
so the raised message reads "Grouped export format is not available for this
evaluation." Reference the symbols traces, question_id, and the grouping/sorting
logic (groups and sorted(groups.keys())) when making the change.
```python
response = client.get(
    f"/api/v1/evaluations/{eval_run.id}",
    params={
        "export_format": "grouped",
        "get_trace_info": True,
    },  # Missing get_trace_info=true
    headers=user_api_key_header,
)
```
Stale comment: `get_trace_info` is actually `True` here.

The comment on line 1042 says `# Missing get_trace_info=true`, but `get_trace_info: True` is present in the params. This appears to be copy-paste from the previous test.
🧹 Remove stale comment

```diff
 response = client.get(
     f"/api/v1/evaluations/{eval_run.id}",
     params={
         "export_format": "grouped",
         "get_trace_info": True,
-    },  # Missing get_trace_info=true
+    },
     headers=user_api_key_header,
 )
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```python
response = client.get(
    f"/api/v1/evaluations/{eval_run.id}",
    params={
        "export_format": "grouped",
        "get_trace_info": True,
    },
    headers=user_api_key_header,
)
```
🤖 Prompt for AI Agents
In `@backend/app/tests/api/routes/test_evaluation.py` around lines 1037 - 1044,
The inline comment next to the client.get call is stale; in the params dict the
key "get_trace_info" is already set to True, so remove the incorrect comment ("#
Missing get_trace_info=true") adjacent to the response = client.get(...) call
(the params containing "export_format" and "get_trace_info") to avoid confusion.
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `@backend/app/crud/evaluations/core.py`:
- Around line 358-366: The docstring for group_traces_by_question_id has
trailing whitespace and Black formatting violations; run the project's
formatter/pre-commit (e.g., black/isort) and remove any trailing spaces in that
docstring block, reflow or re-indent lines to match Black's style (keep the
triple-quoted docstring, parameter lines and Returns section neatly wrapped),
then re-run tests/CI to ensure the formatting errors are resolved.
🧹 Nitpick comments (1)
backend/app/tests/api/routes/test_evaluation.py (1)
**939-1045**: Prefer a factory/fixture for `EvaluationRun` setup in these tests. To align with the test fixture pattern and reduce duplication, consider extracting the `EvaluationRun` creation into a factory/fixture helper (e.g., `create_test_evaluation_run`).

As per coding guidelines: "backend/app/tests/**/*.py: Use factory pattern for test fixtures in backend/app/tests/".
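The core of the factory pattern the guideline asks for is defaults-plus-overrides. A framework-free sketch of that idea (names hypothetical — the project's real helper would wrap `EvaluationRun` and the `db` fixture, not plain dicts):

```python
from typing import Any, Callable


def make_run_factory(defaults: dict[str, Any]) -> Callable[..., dict[str, Any]]:
    """Build a factory that merges per-test overrides into shared defaults."""
    def _create(**overrides: Any) -> dict[str, Any]:
        # Each call returns a fresh dict, so one test cannot leak state
        # into the shared defaults used by the next.
        return {**defaults, **overrides}
    return _create


# Each test then states only the fields it cares about:
create_run = make_run_factory({"status": "completed", "total_items": 1})
run = create_run(run_name="test_run_no_qid")
```

The same shape works as a pytest fixture that returns the inner `_create` callable.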
Codecov Report

❌ Patch coverage is …
📢 Thoughts on this report? Let us know!
…pdate docstring for clarity

a164ee5 to e6de3d0
Summary
Target issue is #545
Explain the motivation for making this change. What existing problem does the pull request solve?
The frontend needs a new CSV export format that groups repeated questions horizontally, but the current API only supports row-based export where each iteration appears as a separate row.
Solution: This PR extends the existing evaluation export API to support a grouped format with structure:
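The structure itself is collapsed above; pieced together from the docstring discussed in the review, one grouped entry plausibly looks like this (values illustrative):

```python
# Hypothetical grouped-export entry: repeated iterations of one question
# collapse into parallel lists instead of one row per iteration.
grouped_trace = {
    "question_id": 1,
    "question": "What is Python?",
    "ground_truth_answer": "...",
    "llm_answers": ["Answer 1", "Answer 2"],  # one entry per iteration
    "trace_ids": ["trace-1", "trace-2"],      # aligned with llm_answers
    "scores": [["..."], ["..."]],             # per-iteration score lists
}
```

The frontend can then render each question once and spread the iterations horizontally.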
Checklist
Before submitting a pull request, please ensure that you mark these tasks.
Run `fastapi run --reload app/main.py` or `docker compose up` in the repository root and test.

Notes
`GET /api/v1/evaluations/{id}?get_trace_info=true&export_format=grouped`
Validation: `export_format=grouped` requires `get_trace_info=true`; an error is returned if traces don't have a `question_id`.
Need to merge this PR: #553 , before merging this.
Summary by CodeRabbit
New Features
Bug Fixes / Validation
Tests
Documentation
✏️ Tip: You can customize this high-level summary in your review settings.