Skip to content

STT: list and get files#655

Merged
AkhileshNegi merged 5 commits intomainfrom
enhancement/list_audio
Mar 11, 2026
Merged

STT: list and get files#655
AkhileshNegi merged 5 commits intomainfrom
enhancement/list_audio

Conversation

@nishika26
Copy link
Copy Markdown
Collaborator

@nishika26 nishika26 commented Mar 10, 2026

Summary

Target issue is #656

Checklist

Before submitting a pull request, please ensure that you mark these task.

  • Ran fastapi run --reload app/main.py or docker compose up in the repository root and test.
  • If you've fixed a bug or added code that is tested and has test cases.

Notes

  • New Features

    • Added API endpoints to list all audio files and retrieve individual audio files from projects.
    • Added optional signed URL support for direct cloud storage access to audio files.
    • Improved error handling with proper responses when requested audio files are not found.
  • Tests

    • Added comprehensive test coverage for new audio file listing and retrieval endpoints.

Summary by CodeRabbit

  • New Features

    • Added endpoints to list all audio files and retrieve individual files from your project, including metadata such as file size and content type.
    • Optional signed URL generation for direct cloud storage access to audio files without additional authentication.
  • Documentation

    • Added API documentation for audio file listing and retrieval endpoints with examples and response schemas.
  • Tests

    • Comprehensive test coverage for new audio file management endpoints and signed URL functionality.

@coderabbitai
Copy link
Copy Markdown

coderabbitai bot commented Mar 10, 2026

📝 Walkthrough

Walkthrough

Adds two STT audio endpoints (list and get) with optional pre-signed S3 URL support, new CRUD list_files(), helpers to build FilePublic schemas (signed_url), docs, and tests covering listing, retrieval, signed URL behavior, auth, and project isolation.

Changes

Cohort / File(s) Summary
Documentation
backend/app/api/docs/stt_evaluation/get_audio.md, backend/app/api/docs/stt_evaluation/list_audios.md
Adds docs for GET /files and GET /files/{file_id}, detailing returned metadata and optional pre-signed S3 URL when include_signed_url=True.
API Routes
backend/app/api/routes/stt_evaluations/files.py
Adds list_audio and get_audio endpoints with include_signed_url Query flag, conditional storage access for signed URLs, and 404 handling for missing files.
CRUD Operations
backend/app/crud/file.py
Adds list_files(session, organization_id, project_id) to fetch audio File records filtered by organization/project and file_type.
Data Models
backend/app/models/file.py
Adds optional `signed_url: str
Helper Utilities
backend/app/services/stt_evaluations/helpers.py
New helpers build_file_schema and build_file_schemas to map FileFilePublic and optionally attach signed URLs via storage client with per-item error handling.
Tests
backend/app/tests/api/routes/test_stt_evaluation.py
Adds TestListAudioFiles and TestGetAudioFile suites covering listing, retrieval, signed URL behavior, auth checks, project isolation, and signed URL failure cases.

Sequence Diagram(s)

sequenceDiagram
  participant Client as Client
  participant API as API (stt_evaluations/files)
  participant DB as Database (File table)
  participant Storage as Storage (S3 / signed URL)

  Client->>API: GET /files?include_signed_url=true
  API->>DB: list_files(organization_id, project_id)
  DB-->>API: list of File records
  alt include_signed_url and storage available
    API->>Storage: generate_presigned_url(file.s3_key)
    Storage-->>API: presigned URL (or error)
    API-->>Client: 200 [FilePublic{..., signed_url}]
  else no signed url
    API-->>Client: 200 [FilePublic{..., signed_url: null}]
  end
Loading
sequenceDiagram
  participant Client as Client
  participant API as API (stt_evaluations/files)
  participant DB as Database (File table)
  participant Storage as Storage (S3 / signed URL)

  Client->>API: GET /files/{file_id}?include_signed_url=true
  API->>DB: get_file_by_id(file_id, organization_id, project_id)
  alt file found
    DB-->>API: File record
    alt include_signed_url and storage available
      API->>Storage: generate_presigned_url(file.s3_key)
      Storage-->>API: presigned URL (or error)
      API-->>Client: 200 FilePublic{..., signed_url}
    else
      API-->>Client: 200 FilePublic{..., signed_url: null}
    end
  else file not found
    DB-->>API: none
    API-->>Client: 404 Not Found
  end
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

Suggested reviewers

  • vprashrex
  • AkhileshNegi

Poem

🐰 I hopped through bytes and S3 lanes,

I fetched the files and tied their reins,
Signed links gleam then softly hide,
Tests snug the burrow, side by side,
A carrot-coded hop — delight in strides!

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
Title check ✅ Passed The title 'STT: list and get files' directly and concisely describes the main changes—adding list and get endpoints for audio files in the STT evaluation feature.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
  • 📝 Generate docstrings (stacked PR)
  • 📝 Generate docstrings (commit on current branch)
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch enhancement/list_audio

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@codecov
Copy link
Copy Markdown

codecov bot commented Mar 10, 2026

Codecov Report

❌ Patch coverage is 98.22485% with 3 lines in your changes missing coverage. Please review.

Files with missing lines Patch % Lines
backend/app/services/stt_evaluations/helpers.py 88.00% 3 Missing ⚠️

📢 Thoughts on this report? Let us know!

Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 5

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@backend/app/api/routes/stt_evaluations/files.py`:
- Around line 77-85: The route handlers are using generic file lookups
(list_files and get_file_by_id) and therefore return non-audio files; update the
list and get handlers to enforce audio-only scope by filtering list_files with
file_type=FileType.AUDIO.value (or equivalent enum value) and by validating the
fetched record in get_audio (from get_file_by_id) and returning a 404 if its
type !== FileType.AUDIO.value; ensure build_file_schemas usage
(include_signed_url/storage) remains unchanged and only runs on the
filtered/validated audio records.
- Around line 59-61: Change the Query default for include_signed_url from True
to False in all endpoint signatures so signed URLs are opt-in; then make signed
URL creation best-effort by wrapping get_cloud_storage() and presign calls in
try/except (or handle returned None) and set signed_url = None on any failure
instead of letting exceptions propagate. Locate every use of include_signed_url,
get_cloud_storage(), and presign() (and the signed_url variable) in this module
and mirror the same behavior: default False, attempt to obtain storage and
presign only if include_signed_url is True, and on any failure assign signed_url
= None and continue returning metadata.

In `@backend/app/models/file.py`:
- Around line 95-102: The file defines AudioUploadResponse twice which causes a
duplicate declaration (Ruff F811); remove the redundant class declaration so
there is only one AudioUploadResponse model; locate both class definitions named
AudioUploadResponse and delete the later/duplicate block (the one with fields
file_id, s3_url, filename, size_bytes, content_type) ensuring the remaining
single definition has the correct fields and description.

In `@backend/app/services/stt_evaluations/helpers.py`:
- Around line 31-34: The logger in build_file_schema is exposing raw storage
exceptions (e.g., CloudStorageError) which may contain S3 URLs; change the
logger.warning calls that interpolate the exception (the ones in
build_file_schema around the signed URL generation and the similar block at
lines 61-64) to instead log the function prefix and file.id plus either the
exception type or a masked URL. Specifically, replace f"...{e}" with a safe
message like f"[build_file_schema] Failed to generate signed URL for file
{file.id}: {type(e).__name__}" or call the existing mask_string(mask_url) helper
to mask any URL before logging; ensure both occurrences use the exact
"[build_file_schema]" prefix and do not include the raw exception text or full
S3 path.

In `@backend/app/tests/api/routes/test_stt_evaluation.py`:
- Around line 1154-1169: The test creates files using create_test_file but
assigns them to unused local variables (file1, file2 etc.), causing Ruff F841;
remove the unused bindings by either inlining the create_test_file calls or
assigning to _ (e.g., replace "file1 = create_test_file(...)" with "_ =
create_test_file(...)" or just call create_test_file(...) ) for each occurrence
(lines creating file1/file2 and the other similar blocks around create_test_file
calls at the noted ranges) so the DB side effects remain but no unused-variable
lint errors are produced.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 1708eb8c-613b-4e42-9279-b977318d09b2

📥 Commits

Reviewing files that changed from the base of the PR and between 7098f0e and a5cecaf.

📒 Files selected for processing (7)
  • backend/app/api/docs/stt_evaluation/get_audio.md
  • backend/app/api/docs/stt_evaluation/list_audios.md
  • backend/app/api/routes/stt_evaluations/files.py
  • backend/app/crud/file.py
  • backend/app/models/file.py
  • backend/app/services/stt_evaluations/helpers.py
  • backend/app/tests/api/routes/test_stt_evaluation.py

@nishika26 nishika26 self-assigned this Mar 10, 2026
@nishika26 nishika26 added the enhancement New feature or request label Mar 10, 2026
@nishika26 nishika26 linked an issue Mar 10, 2026 that may be closed by this pull request
Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (1)
backend/app/api/routes/stt_evaluations/files.py (1)

112-120: ⚠️ Potential issue | 🟡 Minor

Validate file type is AUDIO for consistency with list_audio endpoint.

The list_audio endpoint only returns audio files (via list_files which filters by FileType.AUDIO.value), but get_audio uses get_file_by_id which doesn't filter by file type. This inconsistency means a user could access non-audio files via this audio-specific endpoint if they know the file ID.

Proposed fix
-from app.models.file import AudioUploadResponse, FilePublic
+from app.models.file import AudioUploadResponse, FilePublic, FileType

Then update the validation at lines 119-120:

-    if not file:
+    if not file or file.file_type != FileType.AUDIO.value:
         raise HTTPException(status_code=404, detail=f"File with ID {file_id} not found")
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/app/api/routes/stt_evaluations/files.py` around lines 112 - 120, The
get_audio path currently calls get_file_by_id without checking file type,
allowing non-audio files to be returned; update the handler (the get_audio flow
around get_file_by_id) to enforce audio-only results by verifying file.type ==
FileType.AUDIO.value (or using FileType.AUDIO) after retrieving file and raising
HTTPException(404) if it isn't audio; ensure FileType is imported/available and
keep the error message consistent with the existing 404 handling.
🧹 Nitpick comments (9)
backend/app/tests/api/routes/test_stt_evaluation.py (3)

1270-1276: Unused variable assignment.

Same issue: user_file is assigned but not used.

Proposed fix
         # Create file in user's project
-        user_file = create_test_file(
+        _ = create_test_file(
             db=db,
             organization_id=user_api_key.organization_id,
             project_id=user_api_key.project_id,
             filename="user_project_file.mp3",
             file_type=FileType.AUDIO.value,
         )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/app/tests/api/routes/test_stt_evaluation.py` around lines 1270 -
1276, The test assigns user_file from create_test_file but never uses it; either
remove the unused assignment or mark it intentionally unused by prefixing the
variable with an underscore (e.g., _user_file) so linters/test checks stop
flagging it; locate the call to create_test_file in the test_stt_evaluation.py
test and either drop the assignment entirely or rename user_file to _user_file
to indicate it is intentionally unused.

1239-1245: Unused variable assignment.

Same issue: file is assigned but not used.

Proposed fix
         # Create test file
-        file = create_test_file(
+        _ = create_test_file(
             db=db,
             organization_id=user_api_key.organization_id,
             project_id=user_api_key.project_id,
             filename="no_signed_url.mp3",
             file_type=FileType.AUDIO.value,
         )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/app/tests/api/routes/test_stt_evaluation.py` around lines 1239 -
1245, The test assigns the return of create_test_file to the unused variable
file in test_stt_evaluation.py; either remove the assignment (call
create_test_file(...) without capturing the result) or use a throwaway name like
_ or actually assert/inspect the created object; update the line referencing
create_test_file(...) accordingly so there are no unused variable warnings for
file.

1203-1209: Unused variable assignment.

The file variable is assigned but never used in the test. Either inline the call or assign to _.

Proposed fix
         # Create test file
-        file = create_test_file(
+        _ = create_test_file(
             db=db,
             organization_id=user_api_key.organization_id,
             project_id=user_api_key.project_id,
             filename="signed_url_test.mp3",
             file_type=FileType.AUDIO.value,
         )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/app/tests/api/routes/test_stt_evaluation.py` around lines 1203 -
1209, The test assigns the result of create_test_file(...) to the unused local
variable file; remove the unused binding by either inlining the call (call
create_test_file(...) without assignment) or assign it to _ instead (e.g., _ =
create_test_file(...)) so the helper is executed but the unused variable warning
is resolved; update the invocation that currently binds to file in
test_stt_evaluation.py accordingly.
backend/app/services/stt_evaluations/helpers.py (5)

61-64: Remove unused exception variable.

Same as above: replace e with nothing or _.

Proposed fix
-            except Exception as e:
+            except Exception:
                 logger.warning(
                     f"[build_file_schemas] Failed to generate signed URL for file {file.id}"
                 )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/app/services/stt_evaluations/helpers.py` around lines 61 - 64, The
except block in build_file_schemas currently captures an unused exception
variable (`except Exception as e:`) before calling logger.warning; remove the
unused variable by changing the handler to `except Exception:` (or `except
Exception as _:`) so the exception variable is not created, keeping the existing
logger.warning call referencing file.id unchanged; update the except clause in
the helper where the failing signed-URL generation is handled (the try/except
surrounding logger.warning for file.id).

31-34: Remove unused exception variable.

The exception is caught but e is never used. Replace with _ to indicate intentional discard and satisfy Ruff F841.

Proposed fix
-        except Exception as e:
+        except Exception:
             logger.warning(
                 f"[build_file_schema] Failed to generate signed URL for file {file.id}"
             )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/app/services/stt_evaluations/helpers.py` around lines 31 - 34, In the
except block that logs "[build_file_schema] Failed to generate signed URL for
file {file.id}" (inside build_file_schema), change the unused exception binding
from "except Exception as e:" to "except Exception as _:" so the exception
variable is intentionally discarded and Ruff F841 is satisfied; no other logic
changes required.

4-4: Import Iterable from collections.abc instead of typing.

Per Python 3.9+, Iterable should be imported from collections.abc. The typing module re-exports are deprecated.

Proposed fix
-from typing import Iterable
+from collections.abc import Iterable
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/app/services/stt_evaluations/helpers.py` at line 4, Replace the
deprecated import "from typing import Iterable" with "from collections.abc
import Iterable" in helpers.py; locate the line that imports Iterable (symbol:
Iterable) and change its source module to collections.abc so the code uses the
modern Python 3.9+ import while leaving all downstream uses of Iterable
unchanged.

47-49: Docstring parameter name mismatch.

Same issue as above: include_url should be include_signed_url.

Proposed fix
     Args:
         files: Iterable of File database model instances
-        include_url: Whether to generate and include signed URLs
+        include_signed_url: Whether to generate and include signed URLs
         storage: Cloud storage instance for generating signed URLs
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/app/services/stt_evaluations/helpers.py` around lines 47 - 49, The
docstring in backend/app/services/stt_evaluations/helpers.py uses the wrong
parameter name `include_url`; update the docstring to match the actual function
signature by renaming that parameter description to `include_signed_url` (keep
the existing description "Whether to generate and include signed URLs") so the
docstring aligns with the function parameter (referencing the helper function in
that file that accepts files and include_signed_url).

19-21: Docstring parameter name mismatch.

The docstring references include_url but the actual parameter is include_signed_url.

Proposed fix
     Args:
         file: The File database model instance
-        include_url: Whether to generate and include a signed URL
+        include_signed_url: Whether to generate and include a signed URL
         storage: Cloud storage instance for generating signed URLs
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/app/services/stt_evaluations/helpers.py` around lines 19 - 21, Update
the function docstring in stt_evaluations/helpers.py so the parameter name
matches the actual signature: replace the incorrect `include_url` name with
`include_signed_url` and ensure the description reads "Whether to generate and
include a signed URL" (or equivalent) for the `include_signed_url` parameter;
also search the same docstring for any other occurrences of `include_url` and
correct them to `include_signed_url` so the docs and function signature are
consistent with the `file` parameter description.
backend/app/api/routes/stt_evaluations/files.py (1)

71-75: Consider wrapping get_cloud_storage in try/except for resilience.

If get_cloud_storage() raises an exception (e.g., missing credentials, configuration error), the endpoint will fail entirely. Since signed URLs are optional metadata, consider catching exceptions and proceeding with storage=None to return file metadata without signed URLs rather than failing the request.

Proposed fix for list_audio
     storage = None
     if include_signed_url:
-        storage = get_cloud_storage(
-            session=_session, project_id=auth_context.project_.id
-        )
+        try:
+            storage = get_cloud_storage(
+                session=_session, project_id=auth_context.project_.id
+            )
+        except Exception:
+            logger.warning(
+                f"[list_audio] Failed to initialize cloud storage for signed URLs | "
+                f"project_id: {auth_context.project_.id}"
+            )
Proposed fix for get_audio
     storage = None
     if include_signed_url:
-        storage = get_cloud_storage(
-            session=_session, project_id=auth_context.project_.id
-        )
+        try:
+            storage = get_cloud_storage(
+                session=_session, project_id=auth_context.project_.id
+            )
+        except Exception:
+            logger.warning(
+                f"[get_audio] Failed to initialize cloud storage for signed URLs | "
+                f"file_id: {file_id}"
+            )

Also applies to: 122-126

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/app/api/routes/stt_evaluations/files.py` around lines 71 - 75, The
call to get_cloud_storage() when include_signed_url is true can raise and abort
the request; wrap the call in a try/except around the storage assignment (the
storage variable set in the list_audio and get_audio handlers) so that on any
exception you log/debug the error and fall back to storage = None, allowing the
handlers (list_audio/get_audio) to continue returning file metadata without
signed URLs; ensure you only attempt signed-url generation later if storage is
not None.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@backend/app/api/routes/stt_evaluations/files.py`:
- Around line 106-110: The log message in the get_audio path contains a typo:
the formatted field name uses include__signed_url (double underscore) instead of
include_signed_url; update the logger.info call (the f-string inside the
get_audio handler where logger.info is invoked) to use include_signed_url so the
log prints the correct variable name and value.

---

Duplicate comments:
In `@backend/app/api/routes/stt_evaluations/files.py`:
- Around line 112-120: The get_audio path currently calls get_file_by_id without
checking file type, allowing non-audio files to be returned; update the handler
(the get_audio flow around get_file_by_id) to enforce audio-only results by
verifying file.type == FileType.AUDIO.value (or using FileType.AUDIO) after
retrieving file and raising HTTPException(404) if it isn't audio; ensure
FileType is imported/available and keep the error message consistent with the
existing 404 handling.

---

Nitpick comments:
In `@backend/app/api/routes/stt_evaluations/files.py`:
- Around line 71-75: The call to get_cloud_storage() when include_signed_url is
true can raise and abort the request; wrap the call in a try/except around the
storage assignment (the storage variable set in the list_audio and get_audio
handlers) so that on any exception you log/debug the error and fall back to
storage = None, allowing the handlers (list_audio/get_audio) to continue
returning file metadata without signed URLs; ensure you only attempt signed-url
generation later if storage is not None.

In `@backend/app/services/stt_evaluations/helpers.py`:
- Around line 61-64: The except block in build_file_schemas currently captures
an unused exception variable (`except Exception as e:`) before calling
logger.warning; remove the unused variable by changing the handler to `except
Exception:` (or `except Exception as _:`) so the exception variable is not
created, keeping the existing logger.warning call referencing file.id unchanged;
update the except clause in the helper where the failing signed-URL generation
is handled (the try/except surrounding logger.warning for file.id).
- Around line 31-34: In the except block that logs "[build_file_schema] Failed
to generate signed URL for file {file.id}" (inside build_file_schema), change
the unused exception binding from "except Exception as e:" to "except Exception
as _:" so the exception variable is intentionally discarded and Ruff F841 is
satisfied; no other logic changes required.
- Line 4: Replace the deprecated import "from typing import Iterable" with "from
collections.abc import Iterable" in helpers.py; locate the line that imports
Iterable (symbol: Iterable) and change its source module to collections.abc so
the code uses the modern Python 3.9+ import while leaving all downstream uses of
Iterable unchanged.
- Around line 47-49: The docstring in
backend/app/services/stt_evaluations/helpers.py uses the wrong parameter name
`include_url`; update the docstring to match the actual function signature by
renaming that parameter description to `include_signed_url` (keep the existing
description "Whether to generate and include signed URLs") so the docstring
aligns with the function parameter (referencing the helper function in that file
that accepts files and include_signed_url).
- Around line 19-21: Update the function docstring in stt_evaluations/helpers.py
so the parameter name matches the actual signature: replace the incorrect
`include_url` name with `include_signed_url` and ensure the description reads
"Whether to generate and include a signed URL" (or equivalent) for the
`include_signed_url` parameter; also search the same docstring for any other
occurrences of `include_url` and correct them to `include_signed_url` so the
docs and function signature are consistent with the `file` parameter
description.

In `@backend/app/tests/api/routes/test_stt_evaluation.py`:
- Around line 1270-1276: The test assigns user_file from create_test_file but
never uses it; either remove the unused assignment or mark it intentionally
unused by prefixing the variable with an underscore (e.g., _user_file) so
linters/test checks stop flagging it; locate the call to create_test_file in the
test_stt_evaluation.py test and either drop the assignment entirely or rename
user_file to _user_file to indicate it is intentionally unused.
- Around line 1239-1245: The test assigns the return of create_test_file to the
unused variable file in test_stt_evaluation.py; either remove the assignment
(call create_test_file(...) without capturing the result) or use a throwaway
name like _ or actually assert/inspect the created object; update the line
referencing create_test_file(...) accordingly so there are no unused variable
warnings for file.
- Around line 1203-1209: The test assigns the result of create_test_file(...) to
the unused local variable file; remove the unused binding by either inlining the
call (call create_test_file(...) without assignment) or assign it to _ instead
(e.g., _ = create_test_file(...)) so the helper is executed but the unused
variable warning is resolved; update the invocation that currently binds to file
in test_stt_evaluation.py accordingly.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: d36cab9d-9a4b-4f96-a66f-9c246c9ca685

📥 Commits

Reviewing files that changed from the base of the PR and between a5cecaf and 312f8cb.

📒 Files selected for processing (5)
  • backend/app/api/routes/stt_evaluations/files.py
  • backend/app/crud/file.py
  • backend/app/models/file.py
  • backend/app/services/stt_evaluations/helpers.py
  • backend/app/tests/api/routes/test_stt_evaluation.py
🚧 Files skipped from review as they are similar to previous changes (1)
  • backend/app/models/file.py


id: int
object_store_url: str
signed_url: str | None = None
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

come to think of it I think the structure should be
STTResult -> STTSample -> File -> object_store_url & signed_url
we can discuss more on this

{
  "id": 42,
  "transcription": "hello world",
  "stt_sample": {
    "id": 100,
    "file": {
      "id": 200,
      "object_store_url": "s3://bucket/file.wav",
      "signed_url": "https://signed-url..."
    }
  }
}

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

but wont these endpoints (the ones being covered in this pr), wont they just return file response model only

Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

♻️ Duplicate comments (1)
backend/app/api/routes/stt_evaluations/files.py (1)

77-85: ⚠️ Potential issue | 🟠 Major

Enforce audio-only scope before building the response.

Line 77 and Line 112 still rely on generic file lookups, so these handlers can return non-audio records unless backend/app/crud/file.py is already filtering by type. Filter the list to FileType.AUDIO.value and 404 non-audio records in get_audio before calling the schema builders.

Possible fix
-from app.models.file import AudioUploadResponse, FilePublic
+from app.models.file import AudioUploadResponse, FilePublic, FileType
@@
-    files = list_files(
-        session=session,
-        organization_id=auth_context.organization_.id,
-        project_id=auth_context.project_.id,
-    )
+    files = [
+        file
+        for file in list_files(
+            session=session,
+            organization_id=auth_context.organization_.id,
+            project_id=auth_context.project_.id,
+        )
+        if file.file_type == FileType.AUDIO.value
+    ]
@@
-    if not file:
+    if not file or file.file_type != FileType.AUDIO.value:
         raise HTTPException(status_code=404, detail=f"File with ID {file_id} not found")
#!/bin/bash
set -euo pipefail

echo "== CRUD helper definitions =="
rg -n -C4 'def (list_files|get_file_by_id)\(' backend/app/crud/file.py

echo
echo "== File type enum / file_type usage =="
rg -n -C3 'class FileType|file_type' \
  backend/app/models/file.py \
  backend/app/crud/file.py \
  backend/app/api/routes/stt_evaluations/files.py

Expected result: either list_files() already exposes a file-type filter you can pass through here, or this route needs the local FileType.AUDIO.value checks above.

Also applies to: 112-120

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/app/api/routes/stt_evaluations/files.py` around lines 77 - 85, The
handlers call list_files and get_file_by_id then pass results to
build_file_schemas/get_audio without ensuring items are audio; update the logic
to enforce audio-only scope by either passing a file-type filter into list_files
(if available) or filtering the returned list to only items with file_type ==
FileType.AUDIO.value before calling build_file_schemas, and in the single-file
handler (get_audio) return a 404 when the retrieved file's file_type !=
FileType.AUDIO.value; refer to the symbols list_files, get_file_by_id,
build_file_schemas, get_audio, and FileType.AUDIO.value to locate the changes.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Duplicate comments:
In `@backend/app/api/routes/stt_evaluations/files.py`:
- Around line 77-85: The handlers call list_files and get_file_by_id then pass
results to build_file_schemas/get_audio without ensuring items are audio; update
the logic to enforce audio-only scope by either passing a file-type filter into
list_files (if available) or filtering the returned list to only items with
file_type == FileType.AUDIO.value before calling build_file_schemas, and in the
single-file handler (get_audio) return a 404 when the retrieved file's file_type
!= FileType.AUDIO.value; refer to the symbols list_files, get_file_by_id,
build_file_schemas, get_audio, and FileType.AUDIO.value to locate the changes.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 7ca77f37-17e7-4900-ab70-0ceb3f35d307

📥 Commits

Reviewing files that changed from the base of the PR and between 312f8cb and b2dc314.

📒 Files selected for processing (1)
  • backend/app/api/routes/stt_evaluations/files.py

@AkhileshNegi AkhileshNegi changed the title STT: list and get audio files STT: list and get files Mar 11, 2026
@AkhileshNegi AkhileshNegi merged commit 22a7ac9 into main Mar 11, 2026
1 of 2 checks passed
@AkhileshNegi AkhileshNegi deleted the enhancement/list_audio branch March 11, 2026 07:00
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement New feature or request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

STT: Endpoint for audio files

2 participants