📝 Walkthrough

Adds an STT sample PATCH endpoint with models/CRUD to update language and ground truth; introduces signed_url and score handling for TTS dataset/feedback flows; exposes the dataset description in responses; standardizes evaluation score naming; tightens CSV header validation.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Client as Client
    participant API as STT API (routes)
    participant Auth as Auth/RequireProject
    participant CRUD as CRUD layer
    participant DB as Database
    Client->>API: PATCH /datasets/samples/{id} with STTSampleUpdate
    API->>Auth: REQUIRE_PROJECT (verify project/org)
    Auth-->>API: OK
    API->>CRUD: get_stt_sample_by_id(session, sample_id, org_id, project_id)
    CRUD->>DB: SELECT sample WHERE id, org_id, project_id
    DB-->>CRUD: STTSample (or null)
    CRUD-->>API: STTSample
    API->>CRUD: update_stt_sample(session, sample_id, org_id, project_id, language_id?, ground_truth?)
    CRUD->>DB: UPDATE fields, set updated_at
    DB-->>CRUD: updated STTSample
    CRUD-->>API: updated STTSample
    API-->>Client: 200 OK with STTSamplePublic (full representation)
```
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~22 minutes
🚥 Pre-merge checks: ✅ 2 passed | ❌ 1 failed (inconclusive)
Actionable comments posted: 5
🧹 Nitpick comments (2)
backend/app/crud/tts_evaluations/result.py (1)
240-242: Sync docstring with the new `score` update path.

Now that `score` is handled, the `**kwargs` docstring should mention it to avoid stale API expectations.

Suggested docstring tweak
```diff
-        **kwargs: Fields to update (is_correct, comment)
+        **kwargs: Fields to update (is_correct, comment, score)
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/app/crud/tts_evaluations/result.py` around lines 240 - 242, The docstring for the function/method that consumes **kwargs and sets result.score (the block assigning result.score = kwargs["score"]) is out of date; update that function's docstring to mention the new optional "score" key in **kwargs (its expected type, valid range/semantics, and behavior when present), adjust any examples or parameter lists to include "score", and ensure the docstring style (Google/Sphinx) remains consistent so callers know this kwargs entry is supported.

backend/app/crud/stt_evaluations/dataset.py (1)
269-279: Consider guarding against empty updates to avoid no-op writes.

When both fields are `None`, lines 275-279 still commit and update `updated_at`. A small guard would make PATCH behavior clearer and reduce unnecessary writes.

Possible guard clause
```diff
 def update_stt_sample(
@@
 ) -> STTSample:
@@
+    if language_id is None and ground_truth is None:
+        raise HTTPException(
+            status_code=400,
+            detail="At least one field (language_id or ground_truth) must be provided",
+        )
+
     sample = get_stt_sample_by_id(
         session=session,
         sample_id=sample_id,
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/app/crud/stt_evaluations/dataset.py` around lines 269 - 279, Add a guard to avoid no-op writes: check whether either language_id or ground_truth is not None before setting fields/updated_at and calling session.add()/commit()/refresh(). If both are None, skip updating sample.updated_at and the database calls (optionally return the untouched sample or a boolean indicating no change). Update the block that assigns sample.language_id, sample.ground_truth, sample.updated_at and calls session.add/commit/refresh to only run when at least one of language_id or ground_truth is provided.
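The guard described above can be distilled into a small predicate. This is a hedged sketch, not the project's code: the helper name `has_updates` and the two-field set are illustrative, mirroring the `language_id`/`ground_truth` parameters of the CRUD signature.

```python
from typing import Optional


def has_updates(language_id: Optional[int], ground_truth: Optional[str]) -> bool:
    """Return True when a PATCH payload carries at least one updatable field.

    Mirrors the suggested guard: when both fields are None, the caller should
    skip the session.add()/commit()/refresh() cycle (or raise a 400) instead
    of performing a no-op write that still bumps updated_at.
    """
    return language_id is not None or ground_truth is not None
```

A route handler would check this predicate before touching the session, so an empty PATCH body never mutates `updated_at`.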
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@backend/app/api/docs/tts_evaluation/update_feedback.md`:
- Around line 16-17: Replace the placeholder pipe-separated strings in the JSON
example with concrete enum values: change the values for "Speech Naturalness"
and "Pronunciation Accuracy" from "low | medium | high" to a single valid
example such as "low" (or "medium"/"high") so the example is a valid JSON
payload and unambiguous when copied.
In `@backend/app/api/routes/tts_evaluations/dataset.py`:
- Around line 95-99: The loop that builds response items calls
storage.get_signed_url(dataset.object_store_url) directly inside the routes that
produce TTSDatasetPublic.from_model, which lets exceptions from get_signed_url
bubble up and return 500s; wrap the call in a try/except (or try/catch) to catch
errors from storage.get_signed_url and handle them by logging the error and
setting signed_url to None (or a safe fallback) so the route (the function that
iterates over datasets and calls TTSDatasetPublic.from_model) continues
returning the rest of the items; apply the same pattern to the other
loop/occurrence around the lines referenced (the second block that also calls
storage.get_signed_url) so both list/get handlers are hardened.
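As a standalone illustration of that hardening, here is a hedged sketch: `safe_signed_url` is a hypothetical helper (not in the codebase), and the `get_signed_url` method name is assumed from the route code under review.

```python
import logging
from typing import Optional

logger = logging.getLogger(__name__)


def safe_signed_url(storage, object_store_url: Optional[str]) -> Optional[str]:
    """Best-effort signed URL generation: return None instead of raising.

    `storage` is any object exposing get_signed_url(url). Failures are logged
    and swallowed so a single bad object cannot 500 an entire list response.
    """
    if not (storage and object_store_url):
        return None
    try:
        return storage.get_signed_url(object_store_url)
    except Exception as err:
        logger.warning(
            "Signed URL generation failed | url=%s | error=%s",
            object_store_url,
            err,
        )
        return None
```

The list handler would then call this per item, while a single-object GET handler may prefer to surface the failure as a 502 instead of silently returning None.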
In `@backend/app/crud/evaluations/langfuse.py`:
- Line 212: The metric is being written with a human-readable name ("Cosine
Similarity") which breaks downstream aggregation expecting the canonical machine
key "cosine_similarity"; update the Langfuse metric creation call (the place
setting name="Cosine Similarity") to use the stable key "cosine_similarity"
(e.g., key="cosine_similarity" or metric_key="cosine_similarity" depending on
the API) and, if supported, keep the human label in a separate display_name/name
field so consumers continue to read the friendly label while aggregation uses
the machine-stable key.
In `@backend/app/crud/evaluations/processing.py`:
- Line 395: The change replaced the summary score "name" with "Cosine
Similarity", which breaks downstream code that matches the exact key
"cosine_similarity"; revert this to the stable identifier or explicitly include
a stable key field. Locate the score dictionary created in processing.py (the
summary score entry around where "name": "Cosine Similarity" is set) and either
restore the original name value that clients expect or add a new unchanging
field such as "key": "cosine_similarity" (keeping the human-friendly "name" if
desired) so downstream filters/tests can continue to match by the stable
identifier.
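The dual-field approach suggested above can be sketched as follows. The `make_summary_score` helper and its exact field layout are illustrative, not the project's actual schema: the point is that filters match on a stable `key` while the `name` stays free to change for display.

```python
def make_summary_score(value: float) -> dict:
    """Build a summary score entry carrying both a machine-stable key and a
    human-friendly display name (field names here are illustrative)."""
    return {
        "key": "cosine_similarity",   # stable identifier for filters/tests
        "name": "Cosine Similarity",  # display label only; free to change
        "value": value,
    }


scores = [make_summary_score(0.87)]
# Downstream filtering matches on the stable key, never on the label:
matched = [s for s in scores if s["key"] == "cosine_similarity"]
```

Renaming the label later (e.g. to "Cosine similarity score") then leaves every consumer that filters by `key` untouched.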
In `@backend/app/services/evaluations/validators.py`:
- Around line 140-152: The validation currently uses set(clean_headers.keys())
which hides duplicate normalized headers (e.g., "Question" and " question")
because set removes multiplicity; update the validator in validators.py to
detect duplicates by counting occurrences of normalized headers (use
clean_headers mapping or rebuild a normalized-name -> list of originals), and if
any normalized header appears more than once raise HTTPException(422) with a
clear detail listing the duplicated normalized header(s) and their original raw
headers (include csv_reader.fieldnames in the message for context); keep the
existing missing/extra logic but run the duplicate check first so ambiguous CSVs
are rejected explicitly.
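The duplicate-detection logic described above can be sketched with `collections.Counter`. `validate_headers` is a hypothetical stand-in that raises `ValueError` where the real validator would raise `HTTPException(422)`; the duplicate check runs first so ambiguous CSVs are rejected explicitly.

```python
from collections import Counter

REQUIRED = {"question", "answer"}


def validate_headers(fieldnames: list[str]) -> None:
    """Reject CSVs whose normalized headers are duplicated, missing, or extra.

    Normalization strips whitespace and lowercases, so "Question" and
    " question" collide and are reported as duplicates rather than silently
    collapsing into one column.
    """
    normalized = [f.strip().lower() for f in fieldnames]
    counts = Counter(normalized)
    duplicates = sorted(h for h, n in counts.items() if n > 1)
    if duplicates:
        raise ValueError(f"Duplicate headers: {duplicates}. Found: {fieldnames}")
    missing = sorted(REQUIRED - set(normalized))
    extra = sorted(set(normalized) - REQUIRED)
    if missing or extra:
        raise ValueError(
            f"Missing: {missing}. Unexpected: {extra}. Found: {fieldnames}"
        )
```

A `Counter` over the normalized names preserves multiplicity, which is exactly what the `set(...)` comparison in the current code throws away.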
---
Nitpick comments:
In `@backend/app/crud/stt_evaluations/dataset.py`:
- Around line 269-279: Add a guard to avoid no-op writes: check whether either
language_id or ground_truth is not None before setting fields/updated_at and
calling session.add()/commit()/refresh(). If both are None, skip updating
sample.updated_at and the database calls (optionally return the untouched sample
or a boolean indicating no change). Update the block that assigns
sample.language_id, sample.ground_truth, sample.updated_at and calls
session.add/commit/refresh to only run when at least one of language_id or
ground_truth is provided.
In `@backend/app/crud/tts_evaluations/result.py`:
- Around line 240-242: The docstring for the function/method that consumes
**kwargs and sets result.score (the block assigning result.score =
kwargs["score"]) is out of date; update that function's docstring to mention the
new optional "score" key in **kwargs (its expected type, valid range/semantics,
and behavior when present), adjust any examples or parameter lists to include
"score", and ensure the docstring style (Google/Sphinx) remains consistent so
callers know this kwargs entry is supported.
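A minimal sketch of what the updated docstring could look like, assuming a Google-style docstring and an illustrative `update_result` function — the name, signature, and field set here are stand-ins, not the project's actual CRUD method:

```python
def update_result(result, **kwargs):
    """Update mutable feedback fields on a TTS evaluation result.

    Args:
        result: The result object to mutate (illustrative stand-in for the
            ORM model).
        **kwargs: Fields to update. Supported keys:
            is_correct (bool): Whether the output was judged correct.
            comment (str): Free-form reviewer comment.
            score (dict): Optional per-criterion ratings, e.g.
                {"Speech Naturalness": "medium"}; stored as-is when present.

    Returns:
        The same result object with the provided fields updated.
    """
    for field in ("is_correct", "comment", "score"):
        if field in kwargs:
            setattr(result, field, kwargs[field])
    return result
```

Listing `score` explicitly in the `**kwargs` section keeps the docstring honest about which keys the update path actually consumes.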
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: e4765f2b-594f-4dce-a969-d4f023e60c31
📒 Files selected for processing (16)
- backend/app/api/docs/stt_evaluation/update_sample.md
- backend/app/api/docs/tts_evaluation/update_feedback.md
- backend/app/api/routes/evaluations/dataset.py
- backend/app/api/routes/stt_evaluations/dataset.py
- backend/app/api/routes/tts_evaluations/dataset.py
- backend/app/api/routes/tts_evaluations/result.py
- backend/app/crud/evaluations/langfuse.py
- backend/app/crud/evaluations/processing.py
- backend/app/crud/stt_evaluations/__init__.py
- backend/app/crud/stt_evaluations/dataset.py
- backend/app/crud/tts_evaluations/dataset.py
- backend/app/crud/tts_evaluations/result.py
- backend/app/models/evaluation.py
- backend/app/models/stt_evaluation.py
- backend/app/models/tts_evaluation.py
- backend/app/services/evaluations/validators.py
| "Speech Naturalness": "low | medium | high", | ||
| "Pronunciation Accuracy": "low | medium | high" |
Use concrete enum values in the JSON example.
On Line 16 and Line 17, "low | medium | high" reads like a literal payload value and can lead to invalid requests when copied directly.
Suggested doc tweak
"score": {
- "Speech Naturalness": "low | medium | high",
- "Pronunciation Accuracy": "low | medium | high"
+ "Speech Naturalness": "medium",
+ "Pronunciation Accuracy": "high"
}📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| "Speech Naturalness": "low | medium | high", | |
| "Pronunciation Accuracy": "low | medium | high" | |
| "score": { | |
| "Speech Naturalness": "medium", | |
| "Pronunciation Accuracy": "high" | |
| } |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@backend/app/api/docs/tts_evaluation/update_feedback.md` around lines 16 - 17,
Replace the placeholder pipe-separated strings in the JSON example with concrete
enum values: change the values for "Speech Naturalness" and "Pronunciation
Accuracy" from "low | medium | high" to a single valid example such as "low" (or
"medium"/"high") so the example is a valid JSON payload and unambiguous when
copied.
```python
for dataset in datasets:
    signed_url = None
    if storage and dataset.object_store_url:
        signed_url = storage.get_signed_url(dataset.object_store_url)
    data.append(TTSDatasetPublic.from_model(dataset, signed_url=signed_url))
```
Handle signed URL generation failures explicitly.
storage.get_signed_url(...) is an external call that can fail (invalid URL/storage transient). Right now, that bubbles into 500s and can break full list/get responses.
Suggested hardening
```diff
@@
     for dataset in datasets:
         signed_url = None
         if storage and dataset.object_store_url:
-            signed_url = storage.get_signed_url(dataset.object_store_url)
+            try:
+                signed_url = storage.get_signed_url(dataset.object_store_url)
+            except Exception as err:
+                logger.warning(
+                    f"[list_datasets] Signed URL generation failed | dataset_id={dataset.id} | error={err}"
+                )
         data.append(TTSDatasetPublic.from_model(dataset, signed_url=signed_url))
@@
         signed_url = None
         if include_signed_url and dataset.object_store_url:
             storage = get_cloud_storage(
                 session=session, project_id=auth_context.project_.id
             )
-            signed_url = storage.get_signed_url(dataset.object_store_url)
+            try:
+                signed_url = storage.get_signed_url(dataset.object_store_url)
+            except Exception as err:
+                logger.error(
+                    f"[get_dataset] Signed URL generation failed | dataset_id={dataset_id} | error={err}",
+                    exc_info=True,
+                )
+                raise HTTPException(
+                    status_code=502, detail="Failed to generate signed URL"
+                ) from err
```

Also applies to: 133-138
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@backend/app/api/routes/tts_evaluations/dataset.py` around lines 95 - 99, The
loop that builds response items calls
storage.get_signed_url(dataset.object_store_url) directly inside the routes that
produce TTSDatasetPublic.from_model, which lets exceptions from get_signed_url
bubble up and return 500s; wrap the call in a try/except (or try/catch) to catch
errors from storage.get_signed_url and handle them by logging the error and
setting signed_url to None (or a safe fallback) so the route (the function that
iterates over datasets and calls TTSDatasetPublic.from_model) continues
returning the rest of the items; apply the same pattern to the other
loop/occurrence around the lines referenced (the second block that also calls
storage.get_signed_url) so both list/get handlers are hardened.
```diff
 langfuse.score(
     trace_id=trace_id,
-    name="cosine_similarity",
+    name="Cosine Similarity",
```
Use a machine-stable metric key when writing Langfuse scores.
Line 212 sets name="Cosine Similarity". This diverges from the existing canonical key "cosine_similarity" used by consumers and aggregation logic, and can lead to missing/duplicated metric series.
Suggested patch

```diff
-    name="Cosine Similarity",
+    name="cosine_similarity",
```

📝 Committable suggestion
| name="Cosine Similarity", | |
| name="cosine_similarity", |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@backend/app/crud/evaluations/langfuse.py` at line 212, The metric is being
written with a human-readable name ("Cosine Similarity") which breaks downstream
aggregation expecting the canonical machine key "cosine_similarity"; update the
Langfuse metric creation call (the place setting name="Cosine Similarity") to
use the stable key "cosine_similarity" (e.g., key="cosine_similarity" or
metric_key="cosine_similarity" depending on the API) and, if supported, keep the
human label in a separate display_name/name field so consumers continue to read
the friendly label while aggregation uses the machine-stable key.
| "summary_scores": [ | ||
| { | ||
| "name": "cosine_similarity", | ||
| "name": "Cosine Similarity", |
Keep score identifiers stable to avoid breaking clients.
Line 395 changes the summary score name to "Cosine Similarity", but downstream code/tests match by exact key "cosine_similarity". This introduces a backward-incompatible API contract change for score filtering.
Suggested patch

```diff
-    "name": "Cosine Similarity",
+    "name": "cosine_similarity",
```

📝 Committable suggestion
| "name": "Cosine Similarity", | |
| "name": "cosine_similarity", |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@backend/app/crud/evaluations/processing.py` at line 395, The change replaced
the summary score "name" with "Cosine Similarity", which breaks downstream code
that matches the exact key "cosine_similarity"; revert this to the stable
identifier or explicitly include a stable key field. Locate the score dictionary
created in processing.py (the summary score entry around where "name": "Cosine
Similarity" is set) and either restore the original name value that clients
expect or add a new unchanging field such as "key": "cosine_similarity" (keeping
the human-friendly "name" if desired) so downstream filters/tests can continue
to match by the stable identifier.
```diff
         # Validate exactly 'question' and 'answer' columns (case-insensitive)
         if set(clean_headers.keys()) != {"question", "answer"}:
             extra = set(clean_headers.keys()) - {"question", "answer"}
             missing = {"question", "answer"} - set(clean_headers.keys())
             parts = []
             if missing:
                 parts.append(f"Missing: {sorted(missing)}")
             if extra:
                 parts.append(f"Unexpected: {sorted(extra)}")
             raise HTTPException(
                 status_code=422,
-                detail=f"CSV must contain 'question' and 'answer' columns "
-                f"Found columns: {csv_reader.fieldnames}",
+                detail=f"CSV must contain exactly 'question' and 'answer' columns. "
+                f"{'. '.join(parts)}. Found columns: {csv_reader.fieldnames}",
```
Handle duplicate normalized headers explicitly
The strict check at this block can still accept duplicate headers (case/space variants), because set(...) drops multiplicity. That lets ambiguous CSVs pass while only one duplicate is actually used downstream.
🔧 Suggested patch

```diff
-        clean_headers = {
-            field.strip().lower(): field for field in csv_reader.fieldnames
-        }
+        normalized_headers = [field.strip().lower() for field in csv_reader.fieldnames]
+        clean_headers = {
+            field.strip().lower(): field for field in csv_reader.fieldnames
+        }
+        has_duplicate_headers = len(normalized_headers) != len(set(normalized_headers))
         # Validate exactly 'question' and 'answer' columns (case-insensitive)
-        if set(clean_headers.keys()) != {"question", "answer"}:
+        if set(clean_headers.keys()) != {"question", "answer"} or has_duplicate_headers:
             extra = set(clean_headers.keys()) - {"question", "answer"}
             missing = {"question", "answer"} - set(clean_headers.keys())
             parts = []
             if missing:
                 parts.append(f"Missing: {sorted(missing)}")
             if extra:
                 parts.append(f"Unexpected: {sorted(extra)}")
+            if has_duplicate_headers:
+                parts.append("Duplicate headers are not allowed")
             raise HTTPException(
                 status_code=422,
                 detail=f"CSV must contain exactly 'question' and 'answer' columns. "
                 f"{'. '.join(parts)}. Found columns: {csv_reader.fieldnames}",
             )
```
)🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@backend/app/services/evaluations/validators.py` around lines 140 - 152, The
validation currently uses set(clean_headers.keys()) which hides duplicate
normalized headers (e.g., "Question" and " question") because set removes
multiplicity; update the validator in validators.py to detect duplicates by
counting occurrences of normalized headers (use clean_headers mapping or rebuild
a normalized-name -> list of originals), and if any normalized header appears
more than once raise HTTPException(422) with a clear detail listing the
duplicated normalized header(s) and their original raw headers (include
csv_reader.fieldnames in the message for context); keep the existing
missing/extra logic but run the duplicate check first so ambiguous CSVs are
rejected explicitly.
Codecov Report: ❌ Patch coverage is …
…/kaapi-backend into enhancement/tts-metric
Summary
Target issue is #683
Checklist
Before submitting a pull request, please ensure that you mark these tasks.
`fastapi run --reload app/main.py` or `docker compose up` in the repository root and test.

Notes
New Features
Documentation