📝 Walkthrough

Adds an STT sample PATCH endpoint with models/CRUD to update language and ground truth; introduces signed_url and score handling for TTS dataset/feedback flows; exposes the dataset description in responses; standardizes evaluation score naming; tightens CSV header validation.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Client as Client
    participant API as STT API (routes)
    participant Auth as Auth/RequireProject
    participant CRUD as CRUD layer
    participant DB as Database
    Client->>API: PATCH /datasets/samples/{id} with STTSampleUpdate
    API->>Auth: REQUIRE_PROJECT (verify project/org)
    Auth-->>API: OK
    API->>CRUD: get_stt_sample_by_id(session, sample_id, org_id, project_id)
    CRUD->>DB: SELECT sample WHERE id, org_id, project_id
    DB-->>CRUD: STTSample (or null)
    CRUD-->>API: STTSample
    API->>CRUD: update_stt_sample(session, sample_id, org_id, project_id, language_id?, ground_truth?)
    CRUD->>DB: UPDATE fields, set updated_at
    DB-->>CRUD: updated STTSample
    CRUD-->>API: updated STTSample
    API-->>Client: 200 OK with STTSamplePublic (full representation)
```
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~22 minutes
🚥 Pre-merge checks: ✅ 2 passed | ❌ 1 failed (inconclusive)
Actionable comments posted: 5
🧹 Nitpick comments (2)
backend/app/crud/tts_evaluations/result.py (1)
240-242: Sync docstring with the new `score` update path.

Now that `score` is handled, the `**kwargs` docstring should mention it to avoid stale API expectations.

Suggested docstring tweak
```diff
-        **kwargs: Fields to update (is_correct, comment)
+        **kwargs: Fields to update (is_correct, comment, score)
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/app/crud/tts_evaluations/result.py` around lines 240 - 242, The docstring for the function/method that consumes **kwargs and sets result.score (the block assigning result.score = kwargs["score"]) is out of date; update that function's docstring to mention the new optional "score" key in **kwargs (its expected type, valid range/semantics, and behavior when present), adjust any examples or parameter lists to include "score", and ensure the docstring style (Google/Sphinx) remains consistent so callers know this kwargs entry is supported.

backend/app/crud/stt_evaluations/dataset.py (1)
269-279: Consider guarding against empty updates to avoid no-op writes.

When both fields are `None`, lines 275-279 still commit and update `updated_at`. A small guard would make PATCH behavior clearer and reduce unnecessary writes.

Possible guard clause
```diff
 def update_stt_sample(
@@
 ) -> STTSample:
@@
+    if language_id is None and ground_truth is None:
+        raise HTTPException(
+            status_code=400,
+            detail="At least one field (language_id or ground_truth) must be provided",
+        )
+
     sample = get_stt_sample_by_id(
         session=session,
         sample_id=sample_id,
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/app/crud/stt_evaluations/dataset.py` around lines 269 - 279, Add a guard to avoid no-op writes: check whether either language_id or ground_truth is not None before setting fields/updated_at and calling session.add()/commit()/refresh(). If both are None, skip updating sample.updated_at and the database calls (optionally return the untouched sample or a boolean indicating no change). Update the block that assigns sample.language_id, sample.ground_truth, sample.updated_at and calls session.add/commit/refresh to only run when at least one of language_id or ground_truth is provided.
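The guard described above can be distilled into a small predicate. This is a hedged sketch, not the project's code: the helper name `has_updates` and the two-field set are illustrative, mirroring the `language_id`/`ground_truth` parameters of the CRUD signature.

```python
from typing import Optional


def has_updates(language_id: Optional[int], ground_truth: Optional[str]) -> bool:
    """Return True when a PATCH payload carries at least one updatable field.

    Mirrors the suggested guard: when both fields are None, the caller should
    skip the session.add()/commit()/refresh() cycle (or raise a 400) instead
    of performing a no-op write that still bumps updated_at.
    """
    return language_id is not None or ground_truth is not None
```

A route handler would check this predicate before touching the session, so an empty PATCH body never mutates `updated_at`.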
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@backend/app/api/docs/tts_evaluation/update_feedback.md`:
- Around line 16-17: Replace the placeholder pipe-separated strings in the JSON
example with concrete enum values: change the values for "Speech Naturalness"
and "Pronunciation Accuracy" from "low | medium | high" to a single valid
example such as "low" (or "medium"/"high") so the example is a valid JSON
payload and unambiguous when copied.
In `@backend/app/api/routes/tts_evaluations/dataset.py`:
- Around line 95-99: The loop that builds response items calls
storage.get_signed_url(dataset.object_store_url) directly inside the routes that
produce TTSDatasetPublic.from_model, which lets exceptions from get_signed_url
bubble up and return 500s; wrap the call in a try/except (or try/catch) to catch
errors from storage.get_signed_url and handle them by logging the error and
setting signed_url to None (or a safe fallback) so the route (the function that
iterates over datasets and calls TTSDatasetPublic.from_model) continues
returning the rest of the items; apply the same pattern to the other
loop/occurrence around the lines referenced (the second block that also calls
storage.get_signed_url) so both list/get handlers are hardened.
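As a standalone illustration of that hardening, here is a hedged sketch: `safe_signed_url` is a hypothetical helper (not in the codebase), and the `get_signed_url` method name is assumed from the route code under review.

```python
import logging
from typing import Optional

logger = logging.getLogger(__name__)


def safe_signed_url(storage, object_store_url: Optional[str]) -> Optional[str]:
    """Best-effort signed URL generation: return None instead of raising.

    `storage` is any object exposing get_signed_url(url). Failures are logged
    and swallowed so a single bad object cannot 500 an entire list response.
    """
    if not (storage and object_store_url):
        return None
    try:
        return storage.get_signed_url(object_store_url)
    except Exception as err:
        logger.warning(
            "Signed URL generation failed | url=%s | error=%s",
            object_store_url,
            err,
        )
        return None
```

The list handler would then call this per item, while a single-object GET handler may prefer to surface the failure as a 502 instead of silently returning None.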
In `@backend/app/crud/evaluations/langfuse.py`:
- Line 212: The metric is being written with a human-readable name ("Cosine
Similarity") which breaks downstream aggregation expecting the canonical machine
key "cosine_similarity"; update the Langfuse metric creation call (the place
setting name="Cosine Similarity") to use the stable key "cosine_similarity"
(e.g., key="cosine_similarity" or metric_key="cosine_similarity" depending on
the API) and, if supported, keep the human label in a separate display_name/name
field so consumers continue to read the friendly label while aggregation uses
the machine-stable key.
In `@backend/app/crud/evaluations/processing.py`:
- Line 395: The change replaced the summary score "name" with "Cosine
Similarity", which breaks downstream code that matches the exact key
"cosine_similarity"; revert this to the stable identifier or explicitly include
a stable key field. Locate the score dictionary created in processing.py (the
summary score entry around where "name": "Cosine Similarity" is set) and either
restore the original name value that clients expect or add a new unchanging
field such as "key": "cosine_similarity" (keeping the human-friendly "name" if
desired) so downstream filters/tests can continue to match by the stable
identifier.
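The dual-field approach suggested above can be sketched as follows. The `make_summary_score` helper and its exact field layout are illustrative, not the project's actual schema: the point is that filters match on a stable `key` while the `name` stays free to change for display.

```python
def make_summary_score(value: float) -> dict:
    """Build a summary score entry carrying both a machine-stable key and a
    human-friendly display name (field names here are illustrative)."""
    return {
        "key": "cosine_similarity",   # stable identifier for filters/tests
        "name": "Cosine Similarity",  # display label only; free to change
        "value": value,
    }


scores = [make_summary_score(0.87)]
# Downstream filtering matches on the stable key, never on the label:
matched = [s for s in scores if s["key"] == "cosine_similarity"]
```

Renaming the label later (e.g. to "Cosine similarity score") then leaves every consumer that filters by `key` untouched.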
In `@backend/app/services/evaluations/validators.py`:
- Around line 140-152: The validation currently uses set(clean_headers.keys())
which hides duplicate normalized headers (e.g., "Question" and " question")
because set removes multiplicity; update the validator in validators.py to
detect duplicates by counting occurrences of normalized headers (use
clean_headers mapping or rebuild a normalized-name -> list of originals), and if
any normalized header appears more than once raise HTTPException(422) with a
clear detail listing the duplicated normalized header(s) and their original raw
headers (include csv_reader.fieldnames in the message for context); keep the
existing missing/extra logic but run the duplicate check first so ambiguous CSVs
are rejected explicitly.
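The duplicate-detection logic described above can be sketched with `collections.Counter`. `validate_headers` is a hypothetical stand-in that raises `ValueError` where the real validator would raise `HTTPException(422)`; the duplicate check runs first so ambiguous CSVs are rejected explicitly.

```python
from collections import Counter

REQUIRED = {"question", "answer"}


def validate_headers(fieldnames: list[str]) -> None:
    """Reject CSVs whose normalized headers are duplicated, missing, or extra.

    Normalization strips whitespace and lowercases, so "Question" and
    " question" collide and are reported as duplicates rather than silently
    collapsing into one column.
    """
    normalized = [f.strip().lower() for f in fieldnames]
    counts = Counter(normalized)
    duplicates = sorted(h for h, n in counts.items() if n > 1)
    if duplicates:
        raise ValueError(f"Duplicate headers: {duplicates}. Found: {fieldnames}")
    missing = sorted(REQUIRED - set(normalized))
    extra = sorted(set(normalized) - REQUIRED)
    if missing or extra:
        raise ValueError(
            f"Missing: {missing}. Unexpected: {extra}. Found: {fieldnames}"
        )
```

A `Counter` over the normalized names preserves multiplicity, which is exactly what the `set(...)` comparison in the current code throws away.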
---
Nitpick comments:
In `@backend/app/crud/stt_evaluations/dataset.py`:
- Around line 269-279: Add a guard to avoid no-op writes: check whether either
language_id or ground_truth is not None before setting fields/updated_at and
calling session.add()/commit()/refresh(). If both are None, skip updating
sample.updated_at and the database calls (optionally return the untouched sample
or a boolean indicating no change). Update the block that assigns
sample.language_id, sample.ground_truth, sample.updated_at and calls
session.add/commit/refresh to only run when at least one of language_id or
ground_truth is provided.
In `@backend/app/crud/tts_evaluations/result.py`:
- Around line 240-242: The docstring for the function/method that consumes
**kwargs and sets result.score (the block assigning result.score =
kwargs["score"]) is out of date; update that function's docstring to mention the
new optional "score" key in **kwargs (its expected type, valid range/semantics,
and behavior when present), adjust any examples or parameter lists to include
"score", and ensure the docstring style (Google/Sphinx) remains consistent so
callers know this kwargs entry is supported.
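A minimal sketch of what the updated docstring could look like, assuming a Google-style docstring and an illustrative `update_result` function — the name, signature, and field set here are stand-ins, not the project's actual CRUD method:

```python
def update_result(result, **kwargs):
    """Update mutable feedback fields on a TTS evaluation result.

    Args:
        result: The result object to mutate (illustrative stand-in for the
            ORM model).
        **kwargs: Fields to update. Supported keys:
            is_correct (bool): Whether the output was judged correct.
            comment (str): Free-form reviewer comment.
            score (dict): Optional per-criterion ratings, e.g.
                {"Speech Naturalness": "medium"}; stored as-is when present.

    Returns:
        The same result object with the provided fields updated.
    """
    for field in ("is_correct", "comment", "score"):
        if field in kwargs:
            setattr(result, field, kwargs[field])
    return result
```

Listing `score` explicitly in the `**kwargs` section keeps the docstring honest about which keys the update path actually consumes.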
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: e4765f2b-594f-4dce-a969-d4f023e60c31
📒 Files selected for processing (16)
- backend/app/api/docs/stt_evaluation/update_sample.md
- backend/app/api/docs/tts_evaluation/update_feedback.md
- backend/app/api/routes/evaluations/dataset.py
- backend/app/api/routes/stt_evaluations/dataset.py
- backend/app/api/routes/tts_evaluations/dataset.py
- backend/app/api/routes/tts_evaluations/result.py
- backend/app/crud/evaluations/langfuse.py
- backend/app/crud/evaluations/processing.py
- backend/app/crud/stt_evaluations/__init__.py
- backend/app/crud/stt_evaluations/dataset.py
- backend/app/crud/tts_evaluations/dataset.py
- backend/app/crud/tts_evaluations/result.py
- backend/app/models/evaluation.py
- backend/app/models/stt_evaluation.py
- backend/app/models/tts_evaluation.py
- backend/app/services/evaluations/validators.py
| "Speech Naturalness": "low | medium | high", | ||
| "Pronunciation Accuracy": "low | medium | high" |
Use concrete enum values in the JSON example.
On Line 16 and Line 17, "low | medium | high" reads like a literal payload value and can lead to invalid requests when copied directly.
Suggested doc tweak
"score": {
- "Speech Naturalness": "low | medium | high",
- "Pronunciation Accuracy": "low | medium | high"
+ "Speech Naturalness": "medium",
+ "Pronunciation Accuracy": "high"
}📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| "Speech Naturalness": "low | medium | high", | |
| "Pronunciation Accuracy": "low | medium | high" | |
| "score": { | |
| "Speech Naturalness": "medium", | |
| "Pronunciation Accuracy": "high" | |
| } |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@backend/app/api/docs/tts_evaluation/update_feedback.md` around lines 16 - 17,
Replace the placeholder pipe-separated strings in the JSON example with concrete
enum values: change the values for "Speech Naturalness" and "Pronunciation
Accuracy" from "low | medium | high" to a single valid example such as "low" (or
"medium"/"high") so the example is a valid JSON payload and unambiguous when
copied.
```python
for dataset in datasets:
    signed_url = None
    if storage and dataset.object_store_url:
        signed_url = storage.get_signed_url(dataset.object_store_url)
    data.append(TTSDatasetPublic.from_model(dataset, signed_url=signed_url))
```
Handle signed URL generation failures explicitly.
storage.get_signed_url(...) is an external call that can fail (invalid URL/storage transient). Right now, that bubbles into 500s and can break full list/get responses.
Suggested hardening
```diff
@@
     for dataset in datasets:
         signed_url = None
         if storage and dataset.object_store_url:
-            signed_url = storage.get_signed_url(dataset.object_store_url)
+            try:
+                signed_url = storage.get_signed_url(dataset.object_store_url)
+            except Exception as err:
+                logger.warning(
+                    f"[list_datasets] Signed URL generation failed | dataset_id={dataset.id} | error={err}"
+                )
         data.append(TTSDatasetPublic.from_model(dataset, signed_url=signed_url))
@@
         signed_url = None
         if include_signed_url and dataset.object_store_url:
             storage = get_cloud_storage(
                 session=session, project_id=auth_context.project_.id
             )
-            signed_url = storage.get_signed_url(dataset.object_store_url)
+            try:
+                signed_url = storage.get_signed_url(dataset.object_store_url)
+            except Exception as err:
+                logger.error(
+                    f"[get_dataset] Signed URL generation failed | dataset_id={dataset_id} | error={err}",
+                    exc_info=True,
+                )
+                raise HTTPException(
+                    status_code=502, detail="Failed to generate signed URL"
+                ) from err
```

Also applies to: 133-138
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@backend/app/api/routes/tts_evaluations/dataset.py` around lines 95 - 99, The
loop that builds response items calls
storage.get_signed_url(dataset.object_store_url) directly inside the routes that
produce TTSDatasetPublic.from_model, which lets exceptions from get_signed_url
bubble up and return 500s; wrap the call in a try/except (or try/catch) to catch
errors from storage.get_signed_url and handle them by logging the error and
setting signed_url to None (or a safe fallback) so the route (the function that
iterates over datasets and calls TTSDatasetPublic.from_model) continues
returning the rest of the items; apply the same pattern to the other
loop/occurrence around the lines referenced (the second block that also calls
storage.get_signed_url) so both list/get handlers are hardened.
```diff
 langfuse.score(
     trace_id=trace_id,
-    name="cosine_similarity",
+    name="Cosine Similarity",
```
Use a machine-stable metric key when writing Langfuse scores.
Line 212 sets name="Cosine Similarity". This diverges from the existing canonical key "cosine_similarity" used by consumers and aggregation logic, and can lead to missing/duplicated metric series.
Suggested patch

```diff
-    name="Cosine Similarity",
+    name="cosine_similarity",
```

📝 Committable suggestion
| name="Cosine Similarity", | |
| name="cosine_similarity", |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@backend/app/crud/evaluations/langfuse.py` at line 212, The metric is being
written with a human-readable name ("Cosine Similarity") which breaks downstream
aggregation expecting the canonical machine key "cosine_similarity"; update the
Langfuse metric creation call (the place setting name="Cosine Similarity") to
use the stable key "cosine_similarity" (e.g., key="cosine_similarity" or
metric_key="cosine_similarity" depending on the API) and, if supported, keep the
human label in a separate display_name/name field so consumers continue to read
the friendly label while aggregation uses the machine-stable key.
| "summary_scores": [ | ||
| { | ||
| "name": "cosine_similarity", | ||
| "name": "Cosine Similarity", |
Keep score identifiers stable to avoid breaking clients.
Line 395 changes the summary score name to "Cosine Similarity", but downstream code/tests match by exact key "cosine_similarity". This introduces a backward-incompatible API contract change for score filtering.
Suggested patch

```diff
-    "name": "Cosine Similarity",
+    "name": "cosine_similarity",
```

📝 Committable suggestion
| "name": "Cosine Similarity", | |
| "name": "cosine_similarity", |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@backend/app/crud/evaluations/processing.py` at line 395, The change replaced
the summary score "name" with "Cosine Similarity", which breaks downstream code
that matches the exact key "cosine_similarity"; revert this to the stable
identifier or explicitly include a stable key field. Locate the score dictionary
created in processing.py (the summary score entry around where "name": "Cosine
Similarity" is set) and either restore the original name value that clients
expect or add a new unchanging field such as "key": "cosine_similarity" (keeping
the human-friendly "name" if desired) so downstream filters/tests can continue
to match by the stable identifier.
```diff
         # Validate exactly 'question' and 'answer' columns (case-insensitive)
         if set(clean_headers.keys()) != {"question", "answer"}:
             extra = set(clean_headers.keys()) - {"question", "answer"}
             missing = {"question", "answer"} - set(clean_headers.keys())
             parts = []
             if missing:
                 parts.append(f"Missing: {sorted(missing)}")
             if extra:
                 parts.append(f"Unexpected: {sorted(extra)}")
             raise HTTPException(
                 status_code=422,
-                detail=f"CSV must contain 'question' and 'answer' columns "
-                f"Found columns: {csv_reader.fieldnames}",
+                detail=f"CSV must contain exactly 'question' and 'answer' columns. "
+                f"{'. '.join(parts)}. Found columns: {csv_reader.fieldnames}",
```
Handle duplicate normalized headers explicitly
The strict check at this block can still accept duplicate headers (case/space variants), because set(...) drops multiplicity. That lets ambiguous CSVs pass while only one duplicate is actually used downstream.
🔧 Suggested patch

```diff
-        clean_headers = {
-            field.strip().lower(): field for field in csv_reader.fieldnames
-        }
+        normalized_headers = [field.strip().lower() for field in csv_reader.fieldnames]
+        clean_headers = {
+            field.strip().lower(): field for field in csv_reader.fieldnames
+        }
+        has_duplicate_headers = len(normalized_headers) != len(set(normalized_headers))
         # Validate exactly 'question' and 'answer' columns (case-insensitive)
-        if set(clean_headers.keys()) != {"question", "answer"}:
+        if set(clean_headers.keys()) != {"question", "answer"} or has_duplicate_headers:
             extra = set(clean_headers.keys()) - {"question", "answer"}
             missing = {"question", "answer"} - set(clean_headers.keys())
             parts = []
             if missing:
                 parts.append(f"Missing: {sorted(missing)}")
             if extra:
                 parts.append(f"Unexpected: {sorted(extra)}")
+            if has_duplicate_headers:
+                parts.append("Duplicate headers are not allowed")
             raise HTTPException(
                 status_code=422,
                 detail=f"CSV must contain exactly 'question' and 'answer' columns. "
                 f"{'. '.join(parts)}. Found columns: {csv_reader.fieldnames}",
             )
```
)🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@backend/app/services/evaluations/validators.py` around lines 140 - 152, The
validation currently uses set(clean_headers.keys()) which hides duplicate
normalized headers (e.g., "Question" and " question") because set removes
multiplicity; update the validator in validators.py to detect duplicates by
counting occurrences of normalized headers (use clean_headers mapping or rebuild
a normalized-name -> list of originals), and if any normalized header appears
more than once raise HTTPException(422) with a clear detail listing the
duplicated normalized header(s) and their original raw headers (include
csv_reader.fieldnames in the message for context); keep the existing
missing/extra logic but run the duplicate check first so ambiguous CSVs are
rejected explicitly.
Codecov Report: ❌ Patch coverage is …
…/kaapi-backend into enhancement/tts-metric
Summary
Target issue is #683
Checklist
Before submitting a pull request, please ensure that you mark these tasks.
`fastapi run --reload app/main.py` or `docker compose up` in the repository root and test.

Notes
New Features
Documentation