Conversation
📝 WalkthroughWalkthroughAdds two STT audio endpoints (list and get) with optional pre-signed S3 URL support, new CRUD list_files(), helpers to build FilePublic schemas (signed_url), docs, and tests covering listing, retrieval, signed URL behavior, auth, and project isolation. Changes
Sequence Diagram(s)sequenceDiagram
participant Client as Client
participant API as API (stt_evaluations/files)
participant DB as Database (File table)
participant Storage as Storage (S3 / signed URL)
Client->>API: GET /files?include_signed_url=true
API->>DB: list_files(organization_id, project_id)
DB-->>API: list of File records
alt include_signed_url and storage available
API->>Storage: generate_presigned_url(file.s3_key)
Storage-->>API: presigned URL (or error)
API-->>Client: 200 [FilePublic{..., signed_url}]
else no signed url
API-->>Client: 200 [FilePublic{..., signed_url: null}]
end
sequenceDiagram
participant Client as Client
participant API as API (stt_evaluations/files)
participant DB as Database (File table)
participant Storage as Storage (S3 / signed URL)
Client->>API: GET /files/{file_id}?include_signed_url=true
API->>DB: get_file_by_id(file_id, organization_id, project_id)
alt file found
DB-->>API: File record
alt include_signed_url and storage available
API->>Storage: generate_presigned_url(file.s3_key)
Storage-->>API: presigned URL (or error)
API-->>Client: 200 FilePublic{..., signed_url}
else
API-->>Client: 200 FilePublic{..., signed_url: null}
end
else file not found
DB-->>API: none
API-->>Client: 404 Not Found
end
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Possibly related PRs
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 3✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
Codecov Report❌ Patch coverage is
📢 Thoughts on this report? Let us know! |
There was a problem hiding this comment.
Actionable comments posted: 5
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@backend/app/api/routes/stt_evaluations/files.py`:
- Around line 77-85: The route handlers are using generic file lookups
(list_files and get_file_by_id) and therefore return non-audio files; update the
list and get handlers to enforce audio-only scope by filtering list_files with
file_type=FileType.AUDIO.value (or equivalent enum value) and by validating the
fetched record in get_audio (from get_file_by_id) and returning a 404 if its
type !== FileType.AUDIO.value; ensure build_file_schemas usage
(include_signed_url/storage) remains unchanged and only runs on the
filtered/validated audio records.
- Around line 59-61: Change the Query default for include_signed_url from True
to False in all endpoint signatures so signed URLs are opt-in; then make signed
URL creation best-effort by wrapping get_cloud_storage() and presign calls in
try/except (or handle returned None) and set signed_url = None on any failure
instead of letting exceptions propagate. Locate every use of include_signed_url,
get_cloud_storage(), and presign() (and the signed_url variable) in this module
and mirror the same behavior: default False, attempt to obtain storage and
presign only if include_signed_url is True, and on any failure assign signed_url
= None and continue returning metadata.
In `@backend/app/models/file.py`:
- Around line 95-102: The file defines AudioUploadResponse twice which causes a
duplicate declaration (Ruff F811); remove the redundant class declaration so
there is only one AudioUploadResponse model; locate both class definitions named
AudioUploadResponse and delete the later/duplicate block (the one with fields
file_id, s3_url, filename, size_bytes, content_type) ensuring the remaining
single definition has the correct fields and description.
In `@backend/app/services/stt_evaluations/helpers.py`:
- Around line 31-34: The logger in build_file_schema is exposing raw storage
exceptions (e.g., CloudStorageError) which may contain S3 URLs; change the
logger.warning calls that interpolate the exception (the ones in
build_file_schema around the signed URL generation and the similar block at
lines 61-64) to instead log the function prefix and file.id plus either the
exception type or a masked URL. Specifically, replace f"...{e}" with a safe
message like f"[build_file_schema] Failed to generate signed URL for file
{file.id}: {type(e).__name__}" or call the existing mask_string(mask_url) helper
to mask any URL before logging; ensure both occurrences use the exact
"[build_file_schema]" prefix and do not include the raw exception text or full
S3 path.
In `@backend/app/tests/api/routes/test_stt_evaluation.py`:
- Around line 1154-1169: The test creates files using create_test_file but
assigns them to unused local variables (file1, file2 etc.), causing Ruff F841;
remove the unused bindings by either inlining the create_test_file calls or
assigning to _ (e.g., replace "file1 = create_test_file(...)" with "_ =
create_test_file(...)" or just call create_test_file(...) ) for each occurrence
(lines creating file1/file2 and the other similar blocks around create_test_file
calls at the noted ranges) so the DB side effects remain but no unused-variable
lint errors are produced.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 1708eb8c-613b-4e42-9279-b977318d09b2
📒 Files selected for processing (7)
backend/app/api/docs/stt_evaluation/get_audio.mdbackend/app/api/docs/stt_evaluation/list_audios.mdbackend/app/api/routes/stt_evaluations/files.pybackend/app/crud/file.pybackend/app/models/file.pybackend/app/services/stt_evaluations/helpers.pybackend/app/tests/api/routes/test_stt_evaluation.py
There was a problem hiding this comment.
Actionable comments posted: 1
♻️ Duplicate comments (1)
backend/app/api/routes/stt_evaluations/files.py (1)
112-120:⚠️ Potential issue | 🟡 MinorValidate file type is AUDIO for consistency with list_audio endpoint.
The
list_audioendpoint only returns audio files (vialist_fileswhich filters byFileType.AUDIO.value), butget_audiousesget_file_by_idwhich doesn't filter by file type. This inconsistency means a user could access non-audio files via this audio-specific endpoint if they know the file ID.Proposed fix
-from app.models.file import AudioUploadResponse, FilePublic +from app.models.file import AudioUploadResponse, FilePublic, FileTypeThen update the validation at lines 119-120:
- if not file: + if not file or file.file_type != FileType.AUDIO.value: raise HTTPException(status_code=404, detail=f"File with ID {file_id} not found")🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/app/api/routes/stt_evaluations/files.py` around lines 112 - 120, The get_audio path currently calls get_file_by_id without checking file type, allowing non-audio files to be returned; update the handler (the get_audio flow around get_file_by_id) to enforce audio-only results by verifying file.type == FileType.AUDIO.value (or using FileType.AUDIO) after retrieving file and raising HTTPException(404) if it isn't audio; ensure FileType is imported/available and keep the error message consistent with the existing 404 handling.
🧹 Nitpick comments (9)
backend/app/tests/api/routes/test_stt_evaluation.py (3)
1270-1276: Unused variable assignment.Same issue:
user_fileis assigned but not used.Proposed fix
# Create file in user's project - user_file = create_test_file( + _ = create_test_file( db=db, organization_id=user_api_key.organization_id, project_id=user_api_key.project_id, filename="user_project_file.mp3", file_type=FileType.AUDIO.value, )🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/app/tests/api/routes/test_stt_evaluation.py` around lines 1270 - 1276, The test assigns user_file from create_test_file but never uses it; either remove the unused assignment or mark it intentionally unused by prefixing the variable with an underscore (e.g., _user_file) so linters/test checks stop flagging it; locate the call to create_test_file in the test_stt_evaluation.py test and either drop the assignment entirely or rename user_file to _user_file to indicate it is intentionally unused.
1239-1245: Unused variable assignment.Same issue:
fileis assigned but not used.Proposed fix
# Create test file - file = create_test_file( + _ = create_test_file( db=db, organization_id=user_api_key.organization_id, project_id=user_api_key.project_id, filename="no_signed_url.mp3", file_type=FileType.AUDIO.value, )🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/app/tests/api/routes/test_stt_evaluation.py` around lines 1239 - 1245, The test assigns the return of create_test_file to the unused variable file in test_stt_evaluation.py; either remove the assignment (call create_test_file(...) without capturing the result) or use a throwaway name like _ or actually assert/inspect the created object; update the line referencing create_test_file(...) accordingly so there are no unused variable warnings for file.
1203-1209: Unused variable assignment.The
filevariable is assigned but never used in the test. Either inline the call or assign to_.Proposed fix
# Create test file - file = create_test_file( + _ = create_test_file( db=db, organization_id=user_api_key.organization_id, project_id=user_api_key.project_id, filename="signed_url_test.mp3", file_type=FileType.AUDIO.value, )🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/app/tests/api/routes/test_stt_evaluation.py` around lines 1203 - 1209, The test assigns the result of create_test_file(...) to the unused local variable file; remove the unused binding by either inlining the call (call create_test_file(...) without assignment) or assign it to _ instead (e.g., _ = create_test_file(...)) so the helper is executed but the unused variable warning is resolved; update the invocation that currently binds to file in test_stt_evaluation.py accordingly.backend/app/services/stt_evaluations/helpers.py (5)
61-64: Remove unused exception variable.Same as above: replace
ewith nothing or_.Proposed fix
- except Exception as e: + except Exception: logger.warning( f"[build_file_schemas] Failed to generate signed URL for file {file.id}" )🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/app/services/stt_evaluations/helpers.py` around lines 61 - 64, The except block in build_file_schemas currently captures an unused exception variable (`except Exception as e:`) before calling logger.warning; remove the unused variable by changing the handler to `except Exception:` (or `except Exception as _:`) so the exception variable is not created, keeping the existing logger.warning call referencing file.id unchanged; update the except clause in the helper where the failing signed-URL generation is handled (the try/except surrounding logger.warning for file.id).
31-34: Remove unused exception variable.The exception is caught but
eis never used. Replace with_to indicate intentional discard and satisfy Ruff F841.Proposed fix
- except Exception as e: + except Exception: logger.warning( f"[build_file_schema] Failed to generate signed URL for file {file.id}" )🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/app/services/stt_evaluations/helpers.py` around lines 31 - 34, In the except block that logs "[build_file_schema] Failed to generate signed URL for file {file.id}" (inside build_file_schema), change the unused exception binding from "except Exception as e:" to "except Exception as _:" so the exception variable is intentionally discarded and Ruff F841 is satisfied; no other logic changes required.
4-4: ImportIterablefromcollections.abcinstead oftyping.Per Python 3.9+,
Iterableshould be imported fromcollections.abc. Thetypingmodule re-exports are deprecated.Proposed fix
-from typing import Iterable +from collections.abc import Iterable🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/app/services/stt_evaluations/helpers.py` at line 4, Replace the deprecated import "from typing import Iterable" with "from collections.abc import Iterable" in helpers.py; locate the line that imports Iterable (symbol: Iterable) and change its source module to collections.abc so the code uses the modern Python 3.9+ import while leaving all downstream uses of Iterable unchanged.
47-49: Docstring parameter name mismatch.Same issue as above:
include_urlshould beinclude_signed_url.Proposed fix
Args: files: Iterable of File database model instances - include_url: Whether to generate and include signed URLs + include_signed_url: Whether to generate and include signed URLs storage: Cloud storage instance for generating signed URLs🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/app/services/stt_evaluations/helpers.py` around lines 47 - 49, The docstring in backend/app/services/stt_evaluations/helpers.py uses the wrong parameter name `include_url`; update the docstring to match the actual function signature by renaming that parameter description to `include_signed_url` (keep the existing description "Whether to generate and include signed URLs") so the docstring aligns with the function parameter (referencing the helper function in that file that accepts files and include_signed_url).
19-21: Docstring parameter name mismatch.The docstring references
include_urlbut the actual parameter isinclude_signed_url.Proposed fix
Args: file: The File database model instance - include_url: Whether to generate and include a signed URL + include_signed_url: Whether to generate and include a signed URL storage: Cloud storage instance for generating signed URLs🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/app/services/stt_evaluations/helpers.py` around lines 19 - 21, Update the function docstring in stt_evaluations/helpers.py so the parameter name matches the actual signature: replace the incorrect `include_url` name with `include_signed_url` and ensure the description reads "Whether to generate and include a signed URL" (or equivalent) for the `include_signed_url` parameter; also search the same docstring for any other occurrences of `include_url` and correct them to `include_signed_url` so the docs and function signature are consistent with the `file` parameter description.backend/app/api/routes/stt_evaluations/files.py (1)
71-75: Consider wrappingget_cloud_storagein try/except for resilience.If
get_cloud_storage()raises an exception (e.g., missing credentials, configuration error), the endpoint will fail entirely. Since signed URLs are optional metadata, consider catching exceptions and proceeding withstorage=Noneto return file metadata without signed URLs rather than failing the request.Proposed fix for list_audio
storage = None if include_signed_url: - storage = get_cloud_storage( - session=_session, project_id=auth_context.project_.id - ) + try: + storage = get_cloud_storage( + session=_session, project_id=auth_context.project_.id + ) + except Exception: + logger.warning( + f"[list_audio] Failed to initialize cloud storage for signed URLs | " + f"project_id: {auth_context.project_.id}" + )Proposed fix for get_audio
storage = None if include_signed_url: - storage = get_cloud_storage( - session=_session, project_id=auth_context.project_.id - ) + try: + storage = get_cloud_storage( + session=_session, project_id=auth_context.project_.id + ) + except Exception: + logger.warning( + f"[get_audio] Failed to initialize cloud storage for signed URLs | " + f"file_id: {file_id}" + )Also applies to: 122-126
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/app/api/routes/stt_evaluations/files.py` around lines 71 - 75, The call to get_cloud_storage() when include_signed_url is true can raise and abort the request; wrap the call in a try/except around the storage assignment (the storage variable set in the list_audio and get_audio handlers) so that on any exception you log/debug the error and fall back to storage = None, allowing the handlers (list_audio/get_audio) to continue returning file metadata without signed URLs; ensure you only attempt signed-url generation later if storage is not None.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@backend/app/api/routes/stt_evaluations/files.py`:
- Around line 106-110: The log message in the get_audio path contains a typo:
the formatted field name uses include__signed_url (double underscore) instead of
include_signed_url; update the logger.info call (the f-string inside the
get_audio handler where logger.info is invoked) to use include_signed_url so the
log prints the correct variable name and value.
---
Duplicate comments:
In `@backend/app/api/routes/stt_evaluations/files.py`:
- Around line 112-120: The get_audio path currently calls get_file_by_id without
checking file type, allowing non-audio files to be returned; update the handler
(the get_audio flow around get_file_by_id) to enforce audio-only results by
verifying file.type == FileType.AUDIO.value (or using FileType.AUDIO) after
retrieving file and raising HTTPException(404) if it isn't audio; ensure
FileType is imported/available and keep the error message consistent with the
existing 404 handling.
---
Nitpick comments:
In `@backend/app/api/routes/stt_evaluations/files.py`:
- Around line 71-75: The call to get_cloud_storage() when include_signed_url is
true can raise and abort the request; wrap the call in a try/except around the
storage assignment (the storage variable set in the list_audio and get_audio
handlers) so that on any exception you log/debug the error and fall back to
storage = None, allowing the handlers (list_audio/get_audio) to continue
returning file metadata without signed URLs; ensure you only attempt signed-url
generation later if storage is not None.
In `@backend/app/services/stt_evaluations/helpers.py`:
- Around line 61-64: The except block in build_file_schemas currently captures
an unused exception variable (`except Exception as e:`) before calling
logger.warning; remove the unused variable by changing the handler to `except
Exception:` (or `except Exception as _:`) so the exception variable is not
created, keeping the existing logger.warning call referencing file.id unchanged;
update the except clause in the helper where the failing signed-URL generation
is handled (the try/except surrounding logger.warning for file.id).
- Around line 31-34: In the except block that logs "[build_file_schema] Failed
to generate signed URL for file {file.id}" (inside build_file_schema), change
the unused exception binding from "except Exception as e:" to "except Exception
as _:" so the exception variable is intentionally discarded and Ruff F841 is
satisfied; no other logic changes required.
- Line 4: Replace the deprecated import "from typing import Iterable" with "from
collections.abc import Iterable" in helpers.py; locate the line that imports
Iterable (symbol: Iterable) and change its source module to collections.abc so
the code uses the modern Python 3.9+ import while leaving all downstream uses of
Iterable unchanged.
- Around line 47-49: The docstring in
backend/app/services/stt_evaluations/helpers.py uses the wrong parameter name
`include_url`; update the docstring to match the actual function signature by
renaming that parameter description to `include_signed_url` (keep the existing
description "Whether to generate and include signed URLs") so the docstring
aligns with the function parameter (referencing the helper function in that file
that accepts files and include_signed_url).
- Around line 19-21: Update the function docstring in stt_evaluations/helpers.py
so the parameter name matches the actual signature: replace the incorrect
`include_url` name with `include_signed_url` and ensure the description reads
"Whether to generate and include a signed URL" (or equivalent) for the
`include_signed_url` parameter; also search the same docstring for any other
occurrences of `include_url` and correct them to `include_signed_url` so the
docs and function signature are consistent with the `file` parameter
description.
In `@backend/app/tests/api/routes/test_stt_evaluation.py`:
- Around line 1270-1276: The test assigns user_file from create_test_file but
never uses it; either remove the unused assignment or mark it intentionally
unused by prefixing the variable with an underscore (e.g., _user_file) so
linters/test checks stop flagging it; locate the call to create_test_file in the
test_stt_evaluation.py test and either drop the assignment entirely or rename
user_file to _user_file to indicate it is intentionally unused.
- Around line 1239-1245: The test assigns the return of create_test_file to the
unused variable file in test_stt_evaluation.py; either remove the assignment
(call create_test_file(...) without capturing the result) or use a throwaway
name like _ or actually assert/inspect the created object; update the line
referencing create_test_file(...) accordingly so there are no unused variable
warnings for file.
- Around line 1203-1209: The test assigns the result of create_test_file(...) to
the unused local variable file; remove the unused binding by either inlining the
call (call create_test_file(...) without assignment) or assign it to _ instead
(e.g., _ = create_test_file(...)) so the helper is executed but the unused
variable warning is resolved; update the invocation that currently binds to file
in test_stt_evaluation.py accordingly.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: d36cab9d-9a4b-4f96-a66f-9c246c9ca685
📒 Files selected for processing (5)
backend/app/api/routes/stt_evaluations/files.pybackend/app/crud/file.pybackend/app/models/file.pybackend/app/services/stt_evaluations/helpers.pybackend/app/tests/api/routes/test_stt_evaluation.py
🚧 Files skipped from review as they are similar to previous changes (1)
- backend/app/models/file.py
|
|
||
| id: int | ||
| object_store_url: str | ||
| signed_url: str | None = None |
There was a problem hiding this comment.
come to think of it I think the structure should be
STTResult -> STTSample -> File -> object_store_url & signed_url
we can discuss more on this
{
"id": 42,
"transcription": "hello world",
"stt_sample": {
"id": 100,
"file": {
"id": 200,
"object_store_url": "s3://bucket/file.wav",
"signed_url": "https://signed-url..."
}
}
}
There was a problem hiding this comment.
but wont these endpoints (the ones being covered in this pr), wont they just return file response model only
There was a problem hiding this comment.
♻️ Duplicate comments (1)
backend/app/api/routes/stt_evaluations/files.py (1)
77-85:⚠️ Potential issue | 🟠 MajorEnforce audio-only scope before building the response.
Line 77 and Line 112 still rely on generic file lookups, so these handlers can return non-audio records unless
backend/app/crud/file.pyis already filtering by type. Filter the list toFileType.AUDIO.valueand 404 non-audio records inget_audiobefore calling the schema builders.Possible fix
-from app.models.file import AudioUploadResponse, FilePublic +from app.models.file import AudioUploadResponse, FilePublic, FileType @@ - files = list_files( - session=session, - organization_id=auth_context.organization_.id, - project_id=auth_context.project_.id, - ) + files = [ + file + for file in list_files( + session=session, + organization_id=auth_context.organization_.id, + project_id=auth_context.project_.id, + ) + if file.file_type == FileType.AUDIO.value + ] @@ - if not file: + if not file or file.file_type != FileType.AUDIO.value: raise HTTPException(status_code=404, detail=f"File with ID {file_id} not found")#!/bin/bash set -euo pipefail echo "== CRUD helper definitions ==" rg -n -C4 'def (list_files|get_file_by_id)\(' backend/app/crud/file.py echo echo "== File type enum / file_type usage ==" rg -n -C3 'class FileType|file_type' \ backend/app/models/file.py \ backend/app/crud/file.py \ backend/app/api/routes/stt_evaluations/files.pyExpected result: either
list_files()already exposes a file-type filter you can pass through here, or this route needs the localFileType.AUDIO.valuechecks above.Also applies to: 112-120
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/app/api/routes/stt_evaluations/files.py` around lines 77 - 85, The handlers call list_files and get_file_by_id then pass results to build_file_schemas/get_audio without ensuring items are audio; update the logic to enforce audio-only scope by either passing a file-type filter into list_files (if available) or filtering the returned list to only items with file_type == FileType.AUDIO.value before calling build_file_schemas, and in the single-file handler (get_audio) return a 404 when the retrieved file's file_type != FileType.AUDIO.value; refer to the symbols list_files, get_file_by_id, build_file_schemas, get_audio, and FileType.AUDIO.value to locate the changes.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Duplicate comments:
In `@backend/app/api/routes/stt_evaluations/files.py`:
- Around line 77-85: The handlers call list_files and get_file_by_id then pass
results to build_file_schemas/get_audio without ensuring items are audio; update
the logic to enforce audio-only scope by either passing a file-type filter into
list_files (if available) or filtering the returned list to only items with
file_type == FileType.AUDIO.value before calling build_file_schemas, and in the
single-file handler (get_audio) return a 404 when the retrieved file's file_type
!= FileType.AUDIO.value; refer to the symbols list_files, get_file_by_id,
build_file_schemas, get_audio, and FileType.AUDIO.value to locate the changes.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 7ca77f37-17e7-4900-ab70-0ceb3f35d307
📒 Files selected for processing (1)
backend/app/api/routes/stt_evaluations/files.py
Summary
Target issue is #656
Checklist
Before submitting a pull request, please ensure that you mark these task.
fastapi run --reload app/main.pyordocker compose upin the repository root and test.Notes
New Features
Tests
Summary by CodeRabbit
New Features
Documentation
Tests