Refactor Document Module and Cloud Storage to use project-based organization and add signed URL support#346
Conversation
WalkthroughModels, storage, CRUD, API routes, and tests were migrated from user-scoped to project-scoped ownership: documents now reference project_id and project.storage_path; public response models (DocumentPublic, Message) added; CloudStorage gained project-aware APIs (signed URL, size, delete); soft-delete (is_deleted) and an Alembic migration were added. Changes
Sequence Diagram(s)sequenceDiagram
autonumber
actor Client
participant API as Documents API
participant Auth as CurrentUserOrgProject
participant Proj as ProjectService
participant Store as AmazonCloudStorage(project_id)
participant CRUD as DocumentCrud(project_id)
Client->>API: POST /docs/upload (file, X-API-KEY)
API->>Auth: resolve user + project_id
API->>Proj: get_project_by_id(project_id)
Proj-->>API: Project(storage_path)
API->>Store: put(file, path=storage_path/...)
Store-->>API: object_store_url
API->>CRUD: create(document with project_id, object_store_url)
CRUD-->>API: Document
API-->>Client: APIResponse(DocumentPublic)
sequenceDiagram
autonumber
actor Client
participant API as Documents API
participant Auth as CurrentUserOrgProject
participant CRUD as DocumentCrud(project_id)
participant Store as AmazonCloudStorage(project_id)
Client->>API: GET /docs/{id}?include_url=true (X-API-KEY)
API->>Auth: resolve user + project_id
API->>CRUD: read_one(id) [filters is_deleted=False]
CRUD-->>API: Document
alt include_url == true
API->>Store: get_signed_url(object_store_url, expires_in)
Store-->>API: signed_url
end
API-->>Client: APIResponse(DocumentPublic with optional signed_url)
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60–90 minutes Possibly related PRs
Suggested reviewers
Poem
✨ Finishing Touches
🧪 Generate unit tests
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
SupportNeed help? Create a ticket on our support page for assistance with any issues or questions. CodeRabbit Commands (Invoked using PR/Issue comments)Type Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Actionable comments posted: 6
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
backend/app/crud/document.py (1)
46-56: Fix log context keys: use project_id (not owner_id) in error logsThe structured log payload still uses 'owner_id'. Replace with 'project_id'.
Apply this diff:
- logger.error( - f"[DocumentCrud.read_many] Invalid skip value | {{'owner_id': {self.project_id}, 'skip': {skip}, 'error': '{str(err)}'}}", + logger.error( + f"[DocumentCrud.read_many] Invalid skip value | {{'project_id': {self.project_id}, 'skip': {skip}, 'error': '{str(err)}'}}", @@ - logger.error( - f"[DocumentCrud.read_many] Invalid limit value | {{'owner_id': {self.project_id}, 'limit': {limit}, 'error': '{str(err)}'}}", + logger.error( + f"[DocumentCrud.read_many] Invalid limit value | {{'project_id': {self.project_id}, 'limit': {limit}, 'error': '{str(err)}'}}",Also applies to: 58-68
🧹 Nitpick comments (21)
backend/app/models/project.py (1)
1-1: Nit: remove unused import UUIDOnly uuid4 is used.
Apply this diff:
-from uuid import UUID, uuid4 +from uuid import uuid4backend/app/alembic/versions/42c4c8b22a09_add_is_deleted_column_in_document.py (1)
8-11: Remove unused importsqlmodel.sql.sqltypes is not used.
Apply this diff:
-from alembic import op -import sqlalchemy as sa -import sqlmodel.sql.sqltypes +from alembic import op +import sqlalchemy as sabackend/app/crud/document.py (3)
42-45: Optional: add deterministic ordering to read_manyConsider ordering results (e.g., newest first) for stable pagination.
Apply this diff:
- statement = select(Document).where( + statement = select(Document).where( and_(Document.project_id == self.project_id, Document.is_deleted.is_(False)) - ) + ).order_by(Document.inserted_at.desc())Also applies to: 70-71
83-91: Tighten error message grammar and detail in read_eachMinor but user-facing in logs; fix “atleast” and pluralization, and include missing IDs for easier debugging.
Apply this diff:
- raise ValueError( - f"Requested atleast {requested_count} document retrieved {retrieved_count}" - ) + missing = set(doc_ids) - {d.id for d in results} + raise ValueError( + f"Requested at least {requested_count} documents, retrieved {retrieved_count}. Missing: {sorted(missing)}" + )
98-106: Clarify ownership error messageThe placeholders are reversed and “attempter” is awkward. Make the message unambiguous.
Apply this diff:
- elif document.project_id != self.project_id: - error = "Invalid document ownership: project={} attempter={}".format( - self.project_id, - document.project_id, - ) + elif document.project_id != self.project_id: + error = "Invalid document ownership: document.project_id={} request.project_id={}".format( + document.project_id, + self.project_id, + )backend/app/models/document.py (3)
10-16: Index project_id for query performanceMost queries filter by project_id and is_deleted. Adding an index on project_id (and optionally a composite (project_id, is_deleted)) will speed reads.
Apply this diff (minimal):
- project_id: int = Field( + project_id: int = Field( description="The ID of the project to which the document belongs", foreign_key="project.id", - nullable=False, + nullable=False, + index=True, ondelete="CASCADE", )If you prefer a composite index, we can add it via Alembic instead.
37-47: Enable from-attributes mapping for DocumentPublic (SQLModel/Pydantic v2)When returning ORM instances as DocumentPublic, set model_config so model_validate(..., from_attributes=True) works cleanly.
Apply this diff:
class DocumentPublic(DocumentBase): id: UUID = Field(description="The unique identifier of the document") signed_url: str | None = Field( default=None, description="A signed URL for accessing the document" ) inserted_at: datetime = Field( description="The timestamp when the document was inserted" ) updated_at: datetime = Field( description="The timestamp when the document was last updated" ) + # Pydantic v2 / SQLModel v0.0.22+ + model_config = {"from_attributes": True}And add this import once at the top if you prefer a typed config:
# optional typed config from pydantic import ConfigDict # ... model_config = ConfigDict(from_attributes=True)
37-47: Confirm public exposure of project_id is intentionalDocumentPublic inherits project_id from DocumentBase. If APIs are always project-scoped, returning project_id may be redundant and could leak internal IDs to clients. Keep if needed; otherwise consider a slimmer public DTO without project_id.
backend/app/models/__init__.py (1)
5-5: Silence Ruff F401 for package re-exportsIn init.py, importing for re-export triggers F401. Easiest is to add a noqa on the line.
Apply this diff:
-from .document import Document, DocumentPublic +from .document import Document, DocumentPublic # noqa: F401Alternative: define all and include these names explicitly.
backend/app/alembic/versions/b79fc198879a_add_storage_path_column_in_project_table.py (1)
10-10: Remove unused import.
sqlmodel.sql.sqltypesis not used.-import sqlmodel.sql.sqltypesbackend/app/api/routes/collections.py (2)
27-27: Drop unused importDocument.
Documentisn’t referenced in this module. Keeps lints clean.-from app.models import Collection, Document, DocumentPublic +from app.models import Collection, DocumentPublic
426-439: Response model now uses DocumentPublic — filter out soft-deleted docs and prefer built-inlisttyping.
- The route returns
DocumentPublic, butDocumentCollectionCrud.readdoesn’t currently excludeDocument.is_deleteddocuments or enforce project scoping. You can prevent leaking soft-deleted or cross-project docs by filtering in the CRUD.- Prefer
listoverListper modern typing style.- response_model=APIResponse[List[DocumentPublic]], + response_model=APIResponse[list[DocumentPublic]],Outside this file (CRUD suggestion for
backend/app/crud/document_collection.py):# inside DocumentCollectionCrud.read(...) statement = ( select(Document) .join(DocumentCollection, DocumentCollection.document_id == Document.id) .where( and_( DocumentCollection.collection_id == collection.id, Document.is_deleted.is_(False), Document.project_id == collection.project_id, # ensure project-scoping ) ) )backend/app/api/routes/documents.py (5)
1-6: Adopt built-inlisttyping and removetyping.Listimport.Modern Python favors
list[...]overList[...]. Also drop the now-unused import.-from typing import List +from typing import List # remove @@ - response_model=APIResponse[List[DocumentPublic]], + response_model=APIResponse[list[DocumentPublic]], @@ - response_model=APIResponse[DocumentPublic], + response_model=APIResponse[DocumentPublic], @@ - response_model=APIResponse[DocumentPublic], + response_model=APIResponse[DocumentPublic],And actually remove the import:
-from typing import ListAlso applies to: 20-24, 123-126
4-4: Build S3 keys with POSIX semantics to avoid backslashes on Windows.
pathlib.Pathcan emit backslashes on Windows, which becomes part of the S3 key. UsePurePosixPathto guarantee forward slashes in object keys.-from pathlib import Path +from pathlib import PurePosixPath @@ - key = Path(str(project.storage_path), str(document_id)) + key = PurePosixPath(str(project.storage_path)) / str(document_id)Also applies to: 52-54
46-54: Validate project existence before upload — good. Consider early content-type/size checks.404 on missing project is correct. As a follow-up, consider validating
src.content_typeand limiting size before uploading to avoid wasted bandwidth.
84-85: Remove unused variable.The result from
c_crud.delete(document, a_crud)is not used.- data = c_crud.delete(document, a_crud) + c_crud.delete(document, a_crud)
106-116: “Permanent” delete still soft-deletes DB row. Consider adding a hard-delete path.You delete the S3 object, but
d_crud.delete(doc_id)only marks the document as deleted. If “permanent” should also remove the DB row, add apurgetoDocumentCrud(hard delete) and call it here.Outside this file (in
backend/app/crud/document.py):def purge(self, doc_id: UUID): document = self.read_one(doc_id) self.session.delete(document) self.session.commit() logger.info(f"[DocumentCrud.purge] Document hard-deleted | {{'doc_id': '{doc_id}', 'project_id': {self.project_id}}}")Then update this route to:
c_crud.delete(document, a_crud) storage.delete(document.object_store_url) d_crud.purge(doc_id)Also applies to: 117-119
backend/app/core/cloud/storage.py (4)
13-13: Remove unused import.
UserProjectOrgisn’t referenced.-from app.models import UserProjectOrg
110-116: Align abstractputsignature with subclass.Base class uses
basename: str, subclass useskey: Path. Aligning keeps contracts consistent and helps tooling.- def put(self, source: UploadFile, basename: str): + def put(self, source: UploadFile, key: Path): raise NotImplementedError()
126-141: Normalize S3 key formatting withas_posix().Ensure forward slashes regardless of OS.
- def put(self, source: UploadFile, key: Path) -> SimpleStorageName: - destination = SimpleStorageName(str(key)) + def put(self, source: UploadFile, key: Path) -> SimpleStorageName: + destination = SimpleStorageName(key.as_posix())
190-216: Presigned URL implementation — solid. Consider validating scheme.Implementation looks correct. As a minor hardening, validate that
urlhass3scheme before presigning and raise a clear error otherwise.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (10)
backend/app/alembic/versions/42c4c8b22a09_add_is_deleted_column_in_document.py(1 hunks)backend/app/alembic/versions/b79fc198879a_add_storage_path_column_in_project_table.py(1 hunks)backend/app/api/routes/collections.py(3 hunks)backend/app/api/routes/documents.py(6 hunks)backend/app/core/cloud/storage.py(7 hunks)backend/app/crud/document.py(6 hunks)backend/app/models/__init__.py(1 hunks)backend/app/models/document.py(1 hunks)backend/app/models/project.py(2 hunks)backend/app/models/user.py(0 hunks)
💤 Files with no reviewable changes (1)
- backend/app/models/user.py
🧰 Additional context used
🧬 Code graph analysis (9)
backend/app/alembic/versions/b79fc198879a_add_storage_path_column_in_project_table.py (2)
backend/app/alembic/versions/99f4fc325617_add_organization_project_setup.py (1)
upgrade(20-72)backend/app/alembic/versions/d98dd8ec85a3_edit_replace_id_integers_in_all_models_.py (1)
upgrade(21-73)
backend/app/models/project.py (3)
backend/app/alembic/versions/d98dd8ec85a3_edit_replace_id_integers_in_all_models_.py (1)
upgrade(21-73)backend/app/alembic/versions/66abc97f3782_user_id_from_uuid_to_int.py (1)
upgrade(20-152)backend/app/alembic/versions/0f205e3779ee_add_api_key_table.py (1)
upgrade(20-39)
backend/app/models/__init__.py (1)
backend/app/models/document.py (2)
Document(19-34)DocumentPublic(37-47)
backend/app/alembic/versions/42c4c8b22a09_add_is_deleted_column_in_document.py (4)
backend/app/alembic/versions/c43313eca57d_add_document_tables.py (1)
upgrade(20-36)backend/app/alembic/versions/8e7dc5eab0b0_add_fk_constraint_to_user_id_columns.py (1)
upgrade(20-42)backend/app/alembic/versions/60b6c511a485_add_project_id_in_api_key_table.py (1)
upgrade(20-47)backend/app/alembic/versions/99f4fc325617_add_organization_project_setup.py (1)
upgrade(20-72)
backend/app/api/routes/collections.py (5)
backend/app/models/document.py (2)
Document(19-34)DocumentPublic(37-47)backend/app/core/cloud/storage.py (1)
AmazonCloudStorage(121-232)backend/app/crud/document.py (1)
DocumentCrud(14-135)backend/app/utils.py (1)
APIResponse(27-48)backend/app/crud/document_collection.py (2)
DocumentCollectionCrud(8-48)create(12-23)
backend/app/crud/document.py (5)
backend/app/crud/collection.py (3)
read_one(84-93)delete(107-115)CollectionCrud(17-146)backend/app/models/document.py (1)
Document(19-34)backend/app/tests/crud/documents/test_crud_document_delete.py (2)
document(13-21)TestDatabaseDelete(24-44)backend/app/tests/crud/documents/test_crud_document_update.py (2)
TestDatabaseUpdate(15-67)test_update_respects_owner(57-67)backend/app/alembic/versions/c43313eca57d_add_document_tables.py (1)
upgrade(20-36)
backend/app/models/document.py (6)
backend/app/core/util.py (1)
now(13-14)backend/app/models/document_collection.py (1)
DocumentCollection(9-23)backend/app/alembic/versions/c43313eca57d_add_document_tables.py (1)
upgrade(20-36)backend/app/models/user.py (1)
User(48-60)backend/app/tests/utils/document.py (1)
DocumentMaker(30-48)backend/app/tests/crud/documents/test_crud_document_update.py (1)
TestDatabaseUpdate(15-67)
backend/app/api/routes/documents.py (7)
backend/app/crud/document.py (4)
DocumentCrud(14-135)read_many(37-71)delete(125-135)read_one(19-35)backend/app/crud/collection.py (3)
CollectionCrud(17-146)delete(107-115)read_one(84-93)backend/app/crud/project.py (1)
get_project_by_id(26-28)backend/app/models/document.py (2)
Document(19-34)DocumentPublic(37-47)backend/app/models/message.py (1)
Message(5-6)backend/app/utils.py (2)
APIResponse(27-48)success_response(34-37)backend/app/core/cloud/storage.py (5)
AmazonCloudStorage(121-232)put(114-115)put(126-150)delete(217-232)get_signed_url(190-215)
backend/app/core/cloud/storage.py (1)
backend/app/models/user.py (1)
UserProjectOrg(65-66)
🪛 Ruff (0.12.2)
backend/app/alembic/versions/b79fc198879a_add_storage_path_column_in_project_table.py
10-10: sqlmodel.sql.sqltypes imported but unused
Remove unused import: sqlmodel.sql.sqltypes
(F401)
backend/app/models/__init__.py
5-5: .document.Document imported but unused; consider removing, adding to __all__, or using a redundant alias
(F401)
5-5: .document.DocumentPublic imported but unused; consider removing, adding to __all__, or using a redundant alias
(F401)
backend/app/alembic/versions/42c4c8b22a09_add_is_deleted_column_in_document.py
10-10: sqlmodel.sql.sqltypes imported but unused
Remove unused import: sqlmodel.sql.sqltypes
(F401)
backend/app/api/routes/collections.py
27-27: app.models.Document imported but unused
Remove unused import: app.models.Document
(F401)
426-426: Use list instead of List for type annotation
Replace with list
(UP006)
backend/app/api/routes/documents.py
23-23: Use list instead of List for type annotation
Replace with list
(UP006)
84-84: Local variable data is assigned to but never used
Remove assignment to unused variable data
(F841)
backend/app/core/cloud/storage.py
13-13: app.models.UserProjectOrg imported but unused
Remove unused import: app.models.UserProjectOrg
(F401)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: checks (3.11.7, 6)
🔇 Additional comments (5)
backend/app/crud/document.py (2)
19-25: LGTM: read_one adds correct project-scoped + soft-delete filtersThe query adds both project_id equality and is_deleted=False. This matches the new ownership model and soft delete semantics.
125-136: LGTM: delete implements soft-deletion via is_deleted/deleted_atSoft delete flows through update() and logs context. Behavior aligns with list/read filters.
backend/app/api/routes/collections.py (1)
228-230: Project-scoped storage and CRUD initialization — good change.Switching to
AmazonCloudStorage(current_user.project_id)andDocumentCrud(session, current_user.project_id)aligns the collections flow with the new per-project storage model.backend/app/api/routes/documents.py (2)
31-33: Project-scoped listing — good alignment.
DocumentCrud(session, current_user.project_id)ensures list operations respect project boundaries andis_deletedfiltering via the CRUD. Looks good.
131-144: Signed URL support indoc_info— looks correct.The
DocumentPublic.model_validate(..., from_attributes=True)conversion and conditional presign are clean. Good defaults.
backend/app/alembic/versions/42c4c8b22a09_add_is_deleted_column_in_document.py
Outdated
Show resolved
Hide resolved
backend/app/alembic/versions/42c4c8b22a09_add_is_deleted_column_in_document.py
Outdated
Show resolved
Hide resolved
backend/app/alembic/versions/b79fc198879a_add_storage_path_column_in_project_table.py
Outdated
Show resolved
Hide resolved
backend/app/alembic/versions/b79fc198879a_add_storage_path_column_in_project_table.py
Outdated
Show resolved
Hide resolved
backend/app/alembic/versions/42c4c8b22a09_add_is_deleted_column_in_document.py
Outdated
Show resolved
Hide resolved
Codecov Report❌ Patch coverage is
📢 Thoughts on this report? Let us know! |
There was a problem hiding this comment.
Actionable comments posted: 9
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (4)
backend/app/tests/api/routes/documents/test_route_document_permanent_remove.py (1)
18-24: Remove importedcrawlerfixture to avoid shadowing parameterImporting
crawlerand also using a parameter namedcrawlercan trip linters (F811) and is unnecessary.from app.tests.utils.document import ( DocumentStore, DocumentMaker, Route, WebCrawler, - crawler, crawler, )Replace with:
from app.tests.utils.document import ( DocumentStore, DocumentMaker, Route, WebCrawler, - crawler, +)backend/app/tests/api/routes/documents/test_route_document_info.py (1)
4-12: Fix linter error: importedcrawlershadows the fixture parameterRuff flags F811 (redefinition/shadowing). Drop the import; rely on the fixture parameter.
from app.tests.utils.document import ( DocumentComparator, DocumentMaker, DocumentStore, Route, WebCrawler, - crawler, httpx_to_standard, )backend/app/tests/api/routes/documents/test_route_document_upload.py (1)
80-81: Don’t persist absolute local file paths as fname.
Storing full paths can leak local env/PII; store basename only. Update test expectation accordingly (and ensure route normalizes to basename).Apply this diff:
- assert result.fname == str(scratch) + assert result.fname == Path(scratch).nameIf the route currently stores the full path, normalize it server-side using Path(file.filename).name before persisting. I can provide a patch if desired.
backend/app/tests/utils/document.py (1)
136-150: Fix incorrect@singledispatchmethodusage (don’t combine with@staticmethod).Current implementation will raise due to wrong signature. Accept
selfand register typed variants accordingly.- @ft.singledispatchmethod - @staticmethod - def to_string(value): - return value + @ft.singledispatchmethod + def to_string(self, value): + return value @@ - @to_string.register - @staticmethod - def _(value: UUID): - return str(value) + @to_string.register + def _(self, value: UUID): + return str(value) @@ - @to_string.register - @staticmethod - def _(value: datetime): - return value.isoformat() + @to_string.register + def _(self, value: datetime): + return value.isoformat()
🧹 Nitpick comments (20)
backend/app/tests/crud/collections/test_crud_collection_read_one.py (1)
8-8: Remove unused import: settingsKeeps tests lint-clean.
-from app.core.config import settingsbackend/app/tests/crud/collections/test_crud_collection_delete.py (2)
61-62: Prefer.is_(False)for SQLAlchemy boolean comparisonsAvoid E712 and generate correct SQL.
-stmt = select(APIKey).where( - APIKey.project_id == project.id, APIKey.is_deleted == False -) +stmt = select(APIKey).where( + APIKey.project_id == project.id, APIKey.is_deleted.is_(False) +)
6-6: Remove unused import: settingsKeep lint green.
-from app.core.config import settingsbackend/app/tests/crud/collections/test_crud_collection_read_all.py (2)
8-8: Add missing import for DocumentMakerRequired if adopting the proposed fix.
-from app.tests.utils.document import DocumentStore +from app.tests.utils.document import DocumentStore, DocumentMaker
29-33: Fixture signature looks wrong and unusedThis fixture takes self (invalid for pytest fixtures) and appears unused. Remove or fix to avoid confusion.
-@pytest.fixture(scope="class") -def refresh(self, db: Session): - db.query(Collection).delete() - db.commit() +@pytest.fixture(scope="class") +def refresh(db: Session): + db.query(Collection).delete() + db.commit()If unused, delete the fixture entirely.
backend/app/tests/crud/documents/test_crud_document_read_one.py (2)
40-43: Avoid dependency on a hard-coded project name ("Dalgo").
This can flake if such a project isn’t seeded. Use a guaranteed different project_id without relying on DB seed data.Apply this diff:
- other_project = get_project(db, name="Dalgo") # Different project - - crud = DocumentCrud(db, other_project.id) + other_project_id = store.project.id + 1 # Different project + crud = DocumentCrud(db, other_project_id)
38-48: Add coverage for soft-deleted read behavior.
Reading a soft-deleted document should 404; add a test.Example:
def test_cannot_read_soft_deleted_document(self, db: Session, store: DocumentStore): doc = store.put() crud = DocumentCrud(db, store.project.id) crud.delete(doc.id) with pytest.raises(HTTPException) as exc_info: crud.read_one(doc.id) assert exc_info.value.status_code == 404backend/app/tests/api/routes/documents/test_route_document_remove.py (2)
57-66: Assert deleted_at as well to fully validate soft delete.
Currently only checks is_deleted.Apply this diff:
result = db.exec(statement).one() - assert result.is_deleted is True + assert result.is_deleted is True + assert result.deleted_at is not None
83-87: Strengthen assertion on unknown-document removal.
Also verify 404 and error detail.Apply this diff:
- response = crawler.delete(route.append(next(maker))) - - assert response.is_error + response = crawler.delete(route.append(next(maker))) + assert response.status_code == 404 + body = response.json() + assert body.get("detail") == "Document not found"backend/app/tests/api/routes/documents/test_route_document_upload.py (2)
93-104: Add S3-specific assertions for scheme and bucket.
Makes the storage contract explicit.Apply this diff:
- url = urlparse(result.object_store_url) + url = urlparse(result.object_store_url) + assert url.scheme == "s3" + assert url.netloc == settings.AWS_S3_BUCKET key = Path(url.path) key = key.relative_to(key.root)Optionally, also assert the key prefix matches the project’s storage_path.
62-64: Missing tests for signed URL support.
Add an API test that requests a signed GET URL for an uploaded document and validates: (a) 200 response, (b) URL host is the configured S3 endpoint, (c) query contains X-Amz-Signature and X-Amz-Expires, (d) access is denied after expiry in a time-mocked context.I can draft this test against your current route shape if you share the endpoint signature.
backend/app/tests/crud/documents/test_crud_document_update.py (3)
52-58: Rename tests to reflect “project” terminology.
The implementation moved away from “owner”; align test names.Apply this diff:
-def test_update_sets_default_owner( +def test_update_sets_default_project( @@ -def test_update_respects_owner( +def test_update_respects_project(
65-70: Valid negative ownership check.
For robustness, consider creating a real second project instead of +1 id if/when fixtures are available.
19-20: Leftover “owner_id” in DocumentCrud logs.
Log fields still reference owner_id; rename to project_id for consistency across the refactor.Proposed patch (in app/crud/document.py):
- f"[DocumentCrud.read_many] Invalid skip value | {{'owner_id': {self.project_id}, 'skip': {skip}, 'error': '{str(err)}'}}", + f"[DocumentCrud.read_many] Invalid skip value | {{'project_id': {self.project_id}, 'skip': {skip}, 'error': '{str(err)}'}}", @@ - f"[DocumentCrud.read_many] Invalid limit value | {{'owner_id': {self.project_id}, 'limit': {limit}, 'error': '{str(err)}'}}", + f"[DocumentCrud.read_many] Invalid limit value | {{'project_id': {self.project_id}, 'limit': {limit}, 'error': '{str(err)}'}}",backend/app/tests/crud/documents/test_crud_document_delete.py (1)
37-47: Cross-project delete correctly yields 404.
Good negative test for access scoping.Consider adding a test that a second delete on the same id returns 404 to confirm idempotent soft-delete behavior.
backend/app/tests/crud/documents/test_crud_document_read_many.py (2)
12-16: Fail fast if project lookup returns None in the fixture.Add an assertion so the fixture errors early with a clear message when no active project is found (avoids later AttributeError on
store.project.id).project = get_project(db) ds = DocumentStore(db, project.id) ds.fill(TestDatabaseReadMany._ndocs) + assert ds.project is not None, "DocumentStore.project not initialized"
27-27: DRY: provide acrudfixture instead of re-creatingDocumentCrudin each test.Reduces duplication and keeps test setup consistent.
@pytest.fixture def store(db: Session): project = get_project(db) ds = DocumentStore(db, project.id) ds.fill(TestDatabaseReadMany._ndocs) return ds + +@pytest.fixture +def crud(db: Session, store: DocumentStore): + return DocumentCrud(db, store.project.id)Then update tests to accept
crudand remove local instantiations:- crud = DocumentCrud(db, store.project.id) docs = crud.read_many()(Apply similarly in other tests.)
Also applies to: 44-44, 55-55, 64-64, 73-73, 82-82, 93-93, 101-101, 110-110
backend/app/tests/utils/document.py (3)
27-31: Type the lazy-loaded project as optional.Clarifies intent and avoids static analysis warnings.
- self.project: Project = None + self.project: Project | None = None
43-43: Align file extension betweenkeyandfnamefor consistency.Using different extensions is confusing in tests and storage expectations.
- key = f"{self.project.storage_path}/{doc_id}.txt" + key = f"{self.project.storage_path}/{doc_id}.xyz"(Alternatively, switch both to
.txtif downstream expects that.)Also applies to: 49-52
155-156: Minor: redundantdict()wrapper.
to_public_dict()already returns a dict.- this = dict(self.to_public_dict()) + this = self.to_public_dict()
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (18)
backend/app/alembic/versions/40307ab77e9f_add_storage_path_to_project_and_project_to_document_table.py(1 hunks)backend/app/models/project.py(2 hunks)backend/app/tests/api/routes/collections/test_create_collections.py(3 hunks)backend/app/tests/api/routes/documents/test_route_document_info.py(3 hunks)backend/app/tests/api/routes/documents/test_route_document_list.py(2 hunks)backend/app/tests/api/routes/documents/test_route_document_permanent_remove.py(2 hunks)backend/app/tests/api/routes/documents/test_route_document_remove.py(3 hunks)backend/app/tests/api/routes/documents/test_route_document_upload.py(5 hunks)backend/app/tests/crud/collections/test_crud_collection_create.py(1 hunks)backend/app/tests/crud/collections/test_crud_collection_delete.py(2 hunks)backend/app/tests/crud/collections/test_crud_collection_read_all.py(1 hunks)backend/app/tests/crud/collections/test_crud_collection_read_one.py(1 hunks)backend/app/tests/crud/documents/test_crud_document_delete.py(2 hunks)backend/app/tests/crud/documents/test_crud_document_read_many.py(9 hunks)backend/app/tests/crud/documents/test_crud_document_read_one.py(2 hunks)backend/app/tests/crud/documents/test_crud_document_update.py(3 hunks)backend/app/tests/utils/collection.py(1 hunks)backend/app/tests/utils/document.py(3 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- backend/app/models/project.py
🧰 Additional context used
🧬 Code graph analysis (16)
backend/app/tests/utils/collection.py (4)
backend/app/models/collection.py (1)
Collection(21-54)backend/app/tests/utils/utils.py (1)
get_user_id_by_email(52-54)backend/app/crud/collection.py (3)
__init__(18-20)read_one(84-93)CollectionCrud(17-146)backend/app/api/routes/collections.py (1)
collection_info(399-406)
backend/app/tests/crud/collections/test_crud_collection_read_all.py (3)
backend/app/tests/crud/documents/test_crud_document_read_many.py (1)
store(11-16)backend/app/tests/crud/documents/test_crud_document_read_one.py (1)
store(13-15)backend/app/tests/utils/document.py (2)
DocumentStore(55-82)fill(81-82)
backend/app/alembic/versions/40307ab77e9f_add_storage_path_to_project_and_project_to_document_table.py (4)
backend/app/alembic/versions/c43313eca57d_add_document_tables.py (1)
upgrade(20-36)backend/app/alembic/versions/60b6c511a485_add_project_id_in_api_key_table.py (1)
upgrade(20-47)backend/app/alembic/versions/0f205e3779ee_add_api_key_table.py (1)
upgrade(20-39)backend/app/alembic/versions/99f4fc325617_add_organization_project_setup.py (1)
upgrade(20-72)
backend/app/tests/api/routes/documents/test_route_document_list.py (4)
backend/app/tests/crud/documents/test_crud_document_read_many.py (1)
store(11-16)backend/app/tests/crud/documents/test_crud_document_read_one.py (1)
store(13-15)backend/app/tests/utils/document.py (2)
DocumentStore(55-82)crawler(171-172)backend/app/tests/conftest.py (2)
db(24-41)user_api_key(89-91)
backend/app/tests/api/routes/documents/test_route_document_permanent_remove.py (2)
backend/app/tests/utils/document.py (3)
DocumentStore(55-82)crawler(171-172)DocumentMaker(26-52)backend/app/tests/conftest.py (2)
db(24-41)user_api_key(89-91)
backend/app/tests/crud/collections/test_crud_collection_create.py (4)
backend/app/tests/crud/documents/test_crud_document_read_many.py (1)
store(11-16)backend/app/tests/crud/documents/test_crud_document_read_one.py (1)
store(13-15)backend/app/tests/utils/document.py (2)
DocumentStore(55-82)fill(81-82)backend/app/tests/crud/documents/test_crud_document_update.py (1)
documents(11-14)
backend/app/tests/crud/collections/test_crud_collection_delete.py (5)
backend/app/crud/collection.py (3)
CollectionCrud(17-146)_(118-125)_(128-146)backend/app/models/api_key.py (1)
APIKey(28-38)backend/app/tests/utils/utils.py (1)
get_project(70-89)backend/app/tests/utils/document.py (5)
project(67-68)DocumentStore(55-82)fill(81-82)_(143-144)_(148-149)backend/app/tests/utils/collection.py (1)
get_collection(24-55)
backend/app/tests/api/routes/documents/test_route_document_upload.py (3)
backend/app/models/api_key.py (1)
APIKeyPublic(23-25)backend/app/tests/conftest.py (3)
user_api_key(89-91)client(52-55)db(24-41)backend/app/models/document.py (1)
Document(19-34)
backend/app/tests/crud/collections/test_crud_collection_read_one.py (2)
backend/app/tests/utils/document.py (2)
DocumentStore(55-82)fill(81-82)backend/app/tests/utils/collection.py (1)
get_collection(24-55)
backend/app/tests/utils/document.py (6)
backend/app/crud/project.py (1)
get_project_by_id(37-39)backend/app/models/api_key.py (1)
APIKeyPublic(23-25)backend/app/models/document.py (2)
Document(19-34)DocumentPublic(37-47)backend/app/tests/utils/utils.py (1)
SequentialUuidGenerator(140-153)backend/app/core/config.py (1)
AWS_S3_BUCKET(76-77)backend/app/tests/conftest.py (2)
db(24-41)user_api_key(89-91)
backend/app/tests/api/routes/collections/test_create_collections.py (2)
backend/app/tests/api/routes/test_api_key.py (1)
test_create_api_key(14-29)backend/app/tests/api/routes/collections/test_collection_info.py (1)
test_collection_info_successful(57-77)
backend/app/tests/crud/documents/test_crud_document_read_one.py (3)
backend/app/tests/utils/utils.py (1)
get_project(70-89)backend/app/tests/utils/document.py (3)
project(67-68)DocumentStore(55-82)put(70-75)backend/app/crud/document.py (2)
DocumentCrud(14-135)read_one(19-35)
backend/app/tests/api/routes/documents/test_route_document_remove.py (4)
backend/app/tests/utils/document.py (5)
DocumentStore(55-82)delete(126-130)append(107-112)put(70-75)DocumentMaker(26-52)backend/app/tests/conftest.py (2)
db(24-41)user_api_key(89-91)backend/app/tests/crud/documents/test_crud_document_delete.py (1)
document(14-23)backend/app/models/document.py (1)
Document(19-34)
backend/app/tests/crud/documents/test_crud_document_update.py (3)
backend/app/tests/utils/utils.py (1)
get_project(70-89)backend/app/tests/utils/document.py (3)
project(67-68)DocumentStore(55-82)DocumentMaker(26-52)backend/app/crud/document.py (2)
DocumentCrud(14-135)update(98-123)
backend/app/tests/crud/documents/test_crud_document_read_many.py (3)
backend/app/tests/utils/utils.py (1)
get_project(70-89)backend/app/tests/utils/document.py (2)
project(67-68)DocumentStore(55-82)backend/app/crud/document.py (2)
DocumentCrud(14-135)read_many(37-71)
backend/app/tests/crud/documents/test_crud_document_delete.py (4)
backend/app/tests/utils/utils.py (1)
get_project(70-89)backend/app/tests/utils/document.py (3)
project(67-68)DocumentStore(55-82)put(70-75)backend/app/crud/document.py (1)
DocumentCrud(14-135)backend/app/models/document.py (1)
Document(19-34)
🪛 Ruff (0.12.2)
backend/app/alembic/versions/40307ab77e9f_add_storage_path_to_project_and_project_to_document_table.py
44-44: Trailing whitespace
Remove trailing whitespace
(W291)
45-45: Trailing whitespace
Remove trailing whitespace
(W291)
backend/app/tests/crud/collections/test_crud_collection_delete.py
6-6: app.core.config.settings imported but unused
Remove unused import: app.core.config.settings
(F401)
61-61: Avoid equality comparisons to False; use not APIKey.is_deleted: for false checks
Replace with not APIKey.is_deleted
(E712)
backend/app/tests/crud/collections/test_crud_collection_read_one.py
8-8: app.core.config.settings imported but unused
Remove unused import: app.core.config.settings
(F401)
10-10: app.tests.utils.collection.uuid_increment imported but unused
Remove unused import: app.tests.utils.collection.uuid_increment
(F401)
backend/app/tests/api/routes/documents/test_route_document_info.py
47-47: Redefinition of unused crawler from line 10
(F811)
🪛 GitHub Actions: AI Platform CI
backend/app/alembic/versions/40307ab77e9f_add_storage_path_to_project_and_project_to_document_table.py
[error] 1-1: Trailing whitespace detected by pre-commit (hook: trailing-whitespace). The file was updated.
[error] 1-1: Black formatting changed the file. Hook: black. Please re-run 'pre-commit run --all-files'.
🔇 Additional comments (32)
backend/app/alembic/versions/40307ab77e9f_add_storage_path_to_project_and_project_to_document_table.py (1)
34-52: Backfill document.project_id with earliest active API key and guard against NULLs
Use a CTE to deterministically pick each user’s earliest non-deleted API key and update only NULL project_id values, then raise an error if any NULLs remain. Sincealembic/psqlweren’t available here, manually run the migration and confirmSELECT COUNT(*) FROM document WHERE project_id IS NULL;returns 0.backend/app/tests/api/routes/documents/test_route_document_list.py (1)
51-51: LGTM: DocumentStore now scoped by project_idPer-project instantiation aligns with storage/CRUD changes.
Also applies to: 81-81
backend/app/tests/crud/collections/test_crud_collection_read_one.py (1)
18-19: LGTM: Create documents under the collection’s projectEnsures project-consistent fixtures.
backend/app/tests/api/routes/documents/test_route_document_permanent_remove.py (2)
62-62: LGTM: Store scoped to projectCreating DocumentStore with project_id derived from the crawler aligns with project-scoped storage.
96-97: LGTM: Using DocumentMaker with project_id for non-existent doc caseCorrectly generates an in-memory doc tied to the right project.
backend/app/tests/crud/collections/test_crud_collection_create.py (1)
16-19: LGTM: Documents are created under the collection’s projectInitializing DocumentStore with collection.project_id matches the project-based refactor.
backend/app/tests/api/routes/collections/test_create_collections.py (3)
9-9: LGTM: Use APIKeyPublic for per-project contextSwitching the fixture type clarifies inputs and aligns with the API surface.
52-55: LGTM: Project-scoped DocumentStore in API testDocuments seeded correctly for the API call using the key’s project_id.
66-66: LGTM: Header built from API keyMatches expected auth header consumption.
backend/app/tests/api/routes/documents/test_route_document_info.py (4)
27-27: LGTM: Project-scoped store for info routeConsistent with project-based ownership.
38-38: LGTM: Same hereProject-scoped seeding looks correct.
47-47: LGTM: Correct fixture typingUsing WebCrawler matches how the tests authenticate.
50-50: LGTM: Proper maker usage for unknown docCreates an id not present in DB to exercise error path.
backend/app/tests/crud/documents/test_crud_document_read_one.py (4)
8-8: Import aligns with project-scoped tests.
Matches the new project-based access pattern.
14-15: Project-scoped DocumentStore fixture is correct.
Ensures all generated docs are tied to a specific project.
22-24: Correct: DocumentCrud now receives project_id.
Scoping reads to the project is enforced.
30-36: Good negative case using a non-persisted document.
Validates 404 on unknown id in the same project.backend/app/tests/api/routes/documents/test_route_document_remove.py (1)
38-40: Correct project scoping via API key.
Route call uses the API key’s project_id consistently.backend/app/tests/api/routes/documents/test_route_document_upload.py (2)
29-31: Header standardization looks good.
Fixed header "X-API-KEY" matches the test client helpers.
49-51: Fixture update aligns uploader with APIKeyPublic.
Clean and explicit dependency on user_api_key.backend/app/tests/crud/documents/test_crud_document_update.py (3)
19-26: Project-scoped CRUD usage is correct.
Update path respects per-project ownership.
32-36: Sequential updates ordering check is sound.
Covers monotonic insert timestamps.
42-46: Active document remains undeleted on update.
Good invariant check on is_deleted.backend/app/tests/crud/documents/test_crud_document_delete.py (2)
15-21: Project-scoped deletion path is correct.
DocumentCrud constructed with the document’s project_id and soft-deletes the record.
31-31: Proper soft-delete assertion.
Pairing this with inserted_at ≤ deleted_at is a solid check.backend/app/tests/crud/documents/test_crud_document_read_many.py (1)
44-48: LGTM on skip/limit semantics.Assertions correctly reflect offset/limit behavior and boundary cases (0, negative, and oversized values).
Also applies to: 55-57, 64-66, 73-75, 82-86, 93-95, 101-103, 110-115
backend/app/tests/utils/document.py (6)
15-16: LGTM on imports and model surface update.Imports reflect project-scoped ownership and public models.
Also applies to: 19-19
56-60: LGTM: deterministic DB state for tests.Clearing the table on init makes test runs isolated.
118-130: LGTM: API key handling moved to model, header usage correct.Header name is consistent and avoids dict plumbing across tests.
134-135: Helpful docstring.Clarifies the comparator’s intent against
DocumentPublic.
158-167: LGTM: public-shape projection matchesDocumentPublicfields.Covers inherited fields and converts UUID/datetime to strings.
171-173: LGTM: crawler fixture signature updated toAPIKeyPublic.Header wiring is correct.
...p/alembic/versions/40307ab77e9f_add_storage_path_to_project_and_project_to_document_table.py
Show resolved
Hide resolved
...p/alembic/versions/40307ab77e9f_add_storage_path_to_project_and_project_to_document_table.py
Outdated
Show resolved
Hide resolved
...p/alembic/versions/40307ab77e9f_add_storage_path_to_project_and_project_to_document_table.py
Show resolved
Hide resolved
There was a problem hiding this comment.
Actionable comments posted: 1
♻️ Duplicate comments (1)
backend/app/api/routes/documents.py (1)
83-84: Use sentence case for user-facing messageMatches prior feedback and reads better.
- Message(message="Document Deleted Successfully") + Message(message="Document deleted successfully")
🧹 Nitpick comments (12)
backend/app/core/cloud/__init__.py (1)
1-8: Export get_cloud_storage to satisfy Ruff F401Expose the public surface via all.
from .storage import ( AmazonCloudStorage, AmazonCloudStorageClient, CloudStorage, CloudStorageError, get_cloud_storage, ) + +__all__ = [ + "AmazonCloudStorage", + "AmazonCloudStorageClient", + "CloudStorage", + "CloudStorageError", + "get_cloud_storage", +]backend/app/api/routes/collections.py (2)
27-28: Remove unused importDocumentKeep imports minimal.
-from app.models import Collection, Document, DocumentPublic +from app.models import Collection, DocumentPublic
426-427: Prefer built-in generics (list) overListAlign with PEP 585 and silence UP006.
- response_model=APIResponse[List[DocumentPublic]], + response_model=APIResponse[list[DocumentPublic]],backend/app/api/routes/documents.py (6)
6-6: Drop unused importHTTPExceptionSilence Ruff F401.
-from fastapi import APIRouter, File, UploadFile, Query, HTTPException +from fastapi import APIRouter, File, UploadFile, Query
9-10: Trim imports: remove unusedget_project_by_idIt’s not used after scoping to project via
CurrentUserOrgProject.-from app.crud import DocumentCrud, CollectionCrud, get_project_by_id +from app.crud import DocumentCrud, CollectionCrud
12-13: Remove unusedCurrentUserProject-scoped deps are used everywhere.
-from app.api.deps import CurrentUser, SessionDep, CurrentUserOrgProject +from app.api.deps import SessionDep, CurrentUserOrgProject
23-24: Prefer built-in generics (list) overListAlign with PEP 585 and silence UP006.
- response_model=APIResponse[List[DocumentPublic]], + response_model=APIResponse[list[DocumentPublic]],
80-81: Remove unused variabledataResult isn’t used; call for side-effects only.
- data = c_crud.delete(document, a_crud) + c_crud.delete(document, a_crud)
100-105: Remove unusedprojectfetch; not needed hereAvoid dead code.
- project = get_project_by_id(session=session, project_id=current_user.project_id) a_crud = OpenAIAssistantCrud(client) d_crud = DocumentCrud(session, current_user.project_id) c_crud = CollectionCrud(session, current_user.id) storage = get_cloud_storage(session=session, project_id=current_user.project_id)backend/app/core/cloud/storage.py (3)
16-17: Remove unused importUserProjectOrgSilence Ruff F401.
-from app.models import UserProjectOrg
139-141: Ensure POSIX S3 keys; avoid OS-specific separatorsPath() may emit backslashes on Windows. Use PurePosixPath for S3 keys.
- def put(self, source: UploadFile, file_path: Path) -> SimpleStorageName: - key = Path(self.storage_path, file_path) + def put(self, source: UploadFile, file_path: Path) -> SimpleStorageName: + from pathlib import PurePosixPath + key = PurePosixPath(self.storage_path) / PurePosixPath(str(file_path))
139-141: Align base/derived method signatures forputBase accepts
basename: str; derived expectsfile_path: Path. Unify to Path + return type.Apply in base class (outside this hunk):
- def put(self, source: UploadFile, basename: str): + def put(self, source: UploadFile, file_path: Path) -> "SimpleStorageName": raise NotImplementedError()
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (5)
backend/app/api/routes/collections.py(4 hunks)backend/app/api/routes/documents.py(6 hunks)backend/app/core/cloud/__init__.py(1 hunks)backend/app/core/cloud/storage.py(7 hunks)backend/app/crud/project.py(1 hunks)
🧰 Additional context used
🧬 Code graph analysis (5)
backend/app/crud/project.py (4)
backend/app/models/project.py (2)
Project(29-60)ProjectPublic(62-66)backend/app/tests/crud/test_project.py (2)
test_get_project_by_id(53-60)test_get_non_existent_project(118-121)backend/app/api/routes/project.py (3)
update_project(76-91)read_project(59-67)delete_project(100-111)backend/app/crud/openai_conversation.py (1)
get_conversation_by_id(11-23)
backend/app/core/cloud/__init__.py (1)
backend/app/core/cloud/storage.py (1)
get_cloud_storage(249-259)
backend/app/core/cloud/storage.py (3)
backend/app/crud/project.py (1)
get_project_by_id(37-38)backend/app/models/user.py (1)
UserProjectOrg(65-66)backend/app/utils.py (1)
mask_string(158-167)
backend/app/api/routes/documents.py (5)
backend/app/crud/document.py (4)
DocumentCrud(14-135)read_many(37-71)delete(125-135)read_one(19-35)backend/app/crud/project.py (1)
get_project_by_id(37-38)backend/app/models/document.py (2)
Document(19-34)DocumentPublic(37-47)backend/app/models/message.py (1)
Message(5-6)backend/app/core/cloud/storage.py (8)
get_cloud_storage(249-259)put(118-119)put(139-164)delete(130-131)delete(231-246)client(29-41)get_signed_url(127-128)get_signed_url(204-229)
backend/app/api/routes/collections.py (3)
backend/app/core/cloud/storage.py (1)
get_cloud_storage(249-259)backend/app/models/document.py (2)
Document(19-34)DocumentPublic(37-47)backend/app/crud/document.py (1)
DocumentCrud(14-135)
🪛 Ruff (0.12.2)
backend/app/core/cloud/__init__.py
6-6: .storage.get_cloud_storage imported but unused; consider removing, adding to __all__, or using a redundant alias
(F401)
backend/app/core/cloud/storage.py
16-16: app.models.UserProjectOrg imported but unused
Remove unused import: app.models.UserProjectOrg
(F401)
backend/app/api/routes/documents.py
6-6: fastapi.HTTPException imported but unused
Remove unused import: fastapi.HTTPException
(F401)
12-12: app.api.deps.CurrentUser imported but unused
Remove unused import: app.api.deps.CurrentUser
(F401)
23-23: Use list instead of List for type annotation
Replace with list
(UP006)
80-80: Local variable data is assigned to but never used
Remove assignment to unused variable data
(F841)
100-100: Local variable project is assigned to but never used
Remove assignment to unused variable project
(F841)
backend/app/api/routes/collections.py
27-27: app.models.Document imported but unused
Remove unused import: app.models.Document
(F401)
426-426: Use list instead of List for type annotation
Replace with list
(UP006)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: checks (3.11.7, 6)
🔇 Additional comments (5)
backend/app/crud/project.py (1)
37-39: LGTM: switch to session.get() is correct and simplerThe change improves clarity and leverages the identity map semantics. No issues.
backend/app/api/routes/documents.py (3)
31-33: LGTM: per-project scoping for listing is correct
DocumentCrud(session, current_user.project_id)ensures project-level isolation.
46-52: LGTM: storage + CRUD correctly scoped per projectKey construction uses project storage path; CRUD sets
project_idon new docs.
135-139: LGTM: signed-URL gating via query flag is cleanPublic projection + on-demand presign is appropriate.
backend/app/core/cloud/storage.py (1)
248-260: LGTM: project-scoped storage factoryValidates project, propagates storage_path, returns configured instance.
There was a problem hiding this comment.
Actionable comments posted: 2
♻️ Duplicate comments (1)
backend/app/api/routes/documents.py (1)
100-105: “Permanent delete” still soft-deletes in DB; implement hard-delete and remove unusedproject.S3 object is removed, but
d_crud.deleteonly setsis_deleted=True. Provide apurgethat actually deletes the DB row; also drop the unusedprojectvariable.- project = get_project_by_id(session=session, project_id=current_user.project_id) @@ - d_crud.delete(doc_id) + d_crud.purge(doc_id)Add to
backend/app/crud/document.py:def purge(self, doc_id: UUID) -> None: document = self.session.get(Document, doc_id) if document is None or document.project_id != self.project_id: raise HTTPException(status_code=404, detail="Document not found") self.session.delete(document) self.session.commit() logger.info( f"[DocumentCrud.purge] Document permanently deleted | " f"{{'doc_id': '{doc_id}', 'project_id': {self.project_id}}}" )Also applies to: 111-111
🧹 Nitpick comments (4)
backend/app/api/routes/documents.py (4)
6-6: Prune unused imports (ruff F401).Remove
HTTPExceptionandCurrentUser.-from fastapi import APIRouter, File, UploadFile, Query, HTTPException +from fastapi import APIRouter, File, UploadFile, Query-from app.api.deps import CurrentUser, SessionDep, CurrentUserOrgProject +from app.api.deps import SessionDep, CurrentUserOrgProjectAlso applies to: 12-12
23-23: Modernize typing (UP006).Use built-in
listin the response model.- response_model=APIResponse[List[DocumentPublic]], + response_model=APIResponse[list[DocumentPublic]],
9-9: Remove now-unused import after droppingprojectlookup.
get_project_by_idbecomes unused; prune it.-from app.crud import DocumentCrud, CollectionCrud, get_project_by_id +from app.crud import DocumentCrud, CollectionCrud
127-139: Signed URL integration looks correct; consider optional TTL parameter.Current default (1h) is fine; optionally expose
expires_in: int = Query(3600, ge=60, le=86400)and pass it toget_signed_url.- include_url: bool = Query( - False, description="Include a signed URL to access the document" - ), + include_url: bool = Query(False, description="Include a signed URL to access the document"), + expires_in: int = Query(3600, ge=60, le=86400, description="Signed URL TTL (seconds)"), @@ - if include_url: + if include_url: storage = get_cloud_storage(session=session, project_id=current_user.project_id) - doc_schema.signed_url = storage.get_signed_url(document.object_store_url) + doc_schema.signed_url = storage.get_signed_url(document.object_store_url, expires_in=expires_in)
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (1)
backend/app/api/routes/documents.py(6 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
backend/app/api/routes/documents.py (6)
backend/app/crud/document.py (4)
DocumentCrud(14-135)read_many(37-71)delete(125-135)read_one(19-35)backend/app/crud/project.py (1)
get_project_by_id(37-38)backend/app/models/document.py (2)
Document(19-34)DocumentPublic(37-47)backend/app/models/message.py (1)
Message(5-6)backend/app/utils.py (2)
APIResponse(27-48)success_response(34-37)backend/app/core/cloud/storage.py (8)
get_cloud_storage(249-259)put(118-119)put(139-164)delete(130-131)delete(231-246)client(29-41)get_signed_url(127-128)get_signed_url(204-229)
🪛 Ruff (0.12.2)
backend/app/api/routes/documents.py
6-6: fastapi.HTTPException imported but unused
Remove unused import: fastapi.HTTPException
(F401)
12-12: app.api.deps.CurrentUser imported but unused
Remove unused import: app.api.deps.CurrentUser
(F401)
23-23: Use list instead of List for type annotation
Replace with list
(UP006)
80-80: Local variable data is assigned to but never used
Remove assignment to unused variable data
(F841)
100-100: Local variable project is assigned to but never used
Remove assignment to unused variable project
(F841)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: checks (3.11.7, 6)
🔇 Additional comments (3)
backend/app/api/routes/documents.py (3)
27-33: Project-scoped listing looks good.
CurrentUserOrgProject+DocumentCrud(session, current_user.project_id)is correct.
113-115: LGTM on message phrasing.Sentence case “Document permanently deleted successfully” matches previous nitpick.
46-56: Persistingstr(object_store_url)is correct. SimpleStorageName.str returns the canonicals3://bucket/keyURL viaurlunparse(self.to_url()), so no change is needed.Likely an incorrect or invalid review comment.
There was a problem hiding this comment.
Actionable comments posted: 5
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
backend/app/core/cloud/storage.py (1)
95-104: Fix S3 URI construction: missing leading “/” and None fields in ParseResult.Without a leading slash,
str(SimpleStorageName(...))can render likes3://bucketkey. Also, defaultingparams/query/fragmenttoNoneis unnecessary—use empty strings.Apply:
def to_url(self): - kwargs = { - "scheme": "s3", - "netloc": self.Bucket, - "path": self.Key, - } - for k in ParseResult._fields: - kwargs.setdefault(k) + kwargs = { + "scheme": "s3", + "netloc": self.Bucket, + "path": "/" + self.Key.lstrip("/"), + } + for k in ParseResult._fields: + kwargs.setdefault(k, "") return ParseResult(**kwargs)
♻️ Duplicate comments (1)
backend/app/core/cloud/storage.py (1)
16-18: Remove unused import.
UserProjectOrgis unused and flagged by Ruff F401.Apply:
-from app.models import UserProjectOrg
🧹 Nitpick comments (1)
backend/app/core/cloud/storage.py (1)
155-162: Seek to start before uploading to avoid partial uploads.Apply:
try: - self.aws.client.upload_fileobj( + source.file.seek(0) + self.aws.client.upload_fileobj( source.file,
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (1)
backend/app/core/cloud/storage.py(6 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
backend/app/core/cloud/storage.py (3)
backend/app/crud/project.py (1)
get_project_by_id(37-38)backend/app/models/user.py (1)
UserProjectOrg(65-66)backend/app/utils.py (1)
mask_string(158-167)
🪛 Ruff (0.12.2)
backend/app/core/cloud/storage.py
17-17: app.models.UserProjectOrg imported but unused
Remove unused import: app.models.UserProjectOrg
(F401)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: checks (3.11.7, 6)
There was a problem hiding this comment.
Actionable comments posted: 1
♻️ Duplicate comments (3)
backend/app/core/cloud/storage.py (3)
114-122: Abstractputsignature mismatches implementation — align to Path and return type.ABC takes
(basename: str) -> SimpleStorageNamewhile the impl uses(file_path: Path) -> SimpleStorageName. This breaks LSP and confuses callers.@abstractmethod - def put(self, source: UploadFile, basename: str) -> SimpleStorageName: - """Upload a file to storage""" + def put(self, source: UploadFile, file_path: Path) -> SimpleStorageName: + """Upload a file to storage at the given relative file_path and return its storage name""" passRun to verify no lingering call sites use the old API:
#!/bin/bash set -euo pipefail # Find any abstract/override still using basename or -> str rg -nP 'def\s+put\s*\([^)]*\bbasename\b' -C2 rg -nP 'def\s+put\s*\([^)]*\)\s*->\s*str' -C2 # Calls passing basename= or a string literal as 2nd arg (likely old API) rg -nP '\bput\s*\([^)]*basename\s*=' -C2 || true rg -nP '\bput\s*\(\s*[^,]+,\s*["\']' -n -C2 || true
217-235: Validateexpires_in(AWS limit is 7 days) to prevent boto ParamValidationError.Reject values outside 1–604800 seconds with a clear error.
def get_signed_url(self, url: str, expires_in: int = 3600) -> str: @@ - name = SimpleStorageName.from_url(url) + if not isinstance(expires_in, int) or not (1 <= expires_in <= 604800): + raise ValueError("expires_in must be an integer between 1 and 604800 seconds (7 days).") + name = SimpleStorageName.from_url(url)Check for call sites passing excessive TTLs:
#!/bin/bash set -euo pipefail rg -nP '\bget_signed_url\s*\([^)]*expires_in\s*=\s*(\d+)' | \ python - << 'PY' import re,sys for line in sys.stdin: m = re.search(r'expires_in\s*=\s*(\d+)', line) if m and int(m.group(1))>604800: print("[exceeds 7 days] ", line.strip()) PY
262-272: Fail fast whenstorage_pathis missing (unmigrated project).Avoid constructing a client with an empty/None root.
storage_path = project.storage_path + if not storage_path: + raise ValueError(f"Project {project_id} has no storage_path configured") return AmazonCloudStorage(project_id=project_id, storage_path=storage_path)
🧹 Nitpick comments (2)
backend/app/core/cloud/storage.py (2)
17-17: Remove unused import.
UserProjectOrgisn’t used; drop it to satisfy Ruff F401 and reduce noise.-from app.models import UserProjectOrg
157-164: Default Content-Type to octet-stream when missing.
UploadFile.content_typecan be None; boto will error or store without type. Set a sane default.self.aws.client.upload_fileobj( source.file, - ExtraArgs={ - "ContentType": source.content_type, - }, + ExtraArgs={"ContentType": (source.content_type or "application/octet-stream")}, **kwargs, )
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (1)
backend/app/core/cloud/storage.py(6 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
backend/app/core/cloud/storage.py (3)
backend/app/crud/project.py (1)
get_project_by_id(37-38)backend/app/models/user.py (1)
UserProjectOrg(65-66)backend/app/utils.py (1)
mask_string(158-167)
🪛 Ruff (0.12.2)
backend/app/core/cloud/storage.py
17-17: app.models.UserProjectOrg imported but unused
Remove unused import: app.models.UserProjectOrg
(F401)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: checks (3.11.7, 6)
🔇 Additional comments (1)
backend/app/core/cloud/storage.py (1)
167-169: Good: project-scoped telemetry with masked bucket/key.Logging now consistently includes
project_idand masks sensitive values.Also applies to: 186-189, 205-208, 250-252
backend/app/tests/crud/documents/test_crud_document_read_one.py
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
backend/app/core/cloud/storage.py (2)
92-104: Fix S3 URL formatting and POSIX key handling.
to_urlshould ensure a leading slash;from_urlshould normalize to POSIX. Current code risks malformeds3://bucketkeyand backslashes on Windows.Apply:
class SimpleStorageName: @@ - def to_url(self): - kwargs = { - "scheme": "s3", - "netloc": self.Bucket, - "path": self.Key, - } - for k in ParseResult._fields: - kwargs.setdefault(k) - return ParseResult(**kwargs) + def to_url(self): + path = self.Key if str(self.Key).startswith("/") else f"/{self.Key}" + kwargs = {"scheme": "s3", "netloc": self.Bucket, "path": path} + for k in ParseResult._fields: + kwargs.setdefault(k, "") + return ParseResult(**kwargs) @@ - def from_url(cls, url: str): - url = urlparse(url) - path = Path(url.path) - if path.is_absolute(): - path = path.relative_to(path.root) - return cls(Bucket=url.netloc, Key=str(path)) + def from_url(cls, url: str): + url = urlparse(url) + key = PurePosixPath(url.path.lstrip("/")).as_posix() + return cls(Bucket=url.netloc, Key=key)And update the import:
from pathlib import Path, PurePosixPathAlso applies to: 105-112
44-85: Robust bucket checks: avoidint()on error code; handleus-east-1; catch ParamValidationError.Casting
Error.Codeto int is brittle (e.g., "NoSuchBucket"). Also, omitCreateBucketConfigurationinus-east-1.Apply:
- except ValueError as err: + except (ValueError, ParamValidationError) as err: @@ - except ClientError as err: - response = int(err.response["Error"]["Code"]) - if response != 404: + except ClientError as err: + code = str(err.response.get("Error", {}).get("Code", "")) + if code not in ("404", "NoSuchBucket"): @@ - self.client.create_bucket( - Bucket=settings.AWS_S3_BUCKET, - CreateBucketConfiguration={ - "LocationConstraint": settings.AWS_DEFAULT_REGION, - }, - ) + create_kwargs = {"Bucket": settings.AWS_S3_BUCKET} + if settings.AWS_DEFAULT_REGION and settings.AWS_DEFAULT_REGION != "us-east-1": + create_kwargs["CreateBucketConfiguration"] = { + "LocationConstraint": settings.AWS_DEFAULT_REGION + } + self.client.create_bucket(**create_kwargs)Add (if not already present):
from botocore.exceptions import ClientError, ParamValidationError
♻️ Duplicate comments (4)
backend/app/core/cloud/storage.py (4)
16-17: Remove unused import and keep imports lean.
UserProjectOrgis not used in this module.Apply:
-from app.models import UserProjectOrg
262-273: Fail fast if project has nostorage_path.Older/missing rows will otherwise produce confusing errors downstream.
Apply:
storage_path = project.storage_path + if not storage_path: + raise ValueError(f"Project {project_id} has no storage_path configured") return AmazonCloudStorage(project_id=project_id, storage_path=storage_path)
150-155: Harden key construction: block traversal segments and keep keys relative.Reject
./..path segments to avoid odd S3 keys and potential traversal issues. You already reject absolute paths and normalize to POSIX—good.Apply:
def put(self, source: UploadFile, file_path: Path) -> SimpleStorageName: if file_path.is_absolute(): raise ValueError("file_path must be relative to the project's storage root") + if any(part in (".", "..") for part in file_path.parts): + raise ValueError("file_path must not contain '.' or '..' segments'") key = Path(self.storage_path) / file_path destination = SimpleStorageName(key.as_posix())
217-236: Validateexpires_inand surface clear errors for presigned URLs.Constrain to AWS limits (1–604800s) and catch
ParamValidationError.Apply:
def get_signed_url(self, url: str, expires_in: int = 3600) -> str: @@ - name = SimpleStorageName.from_url(url) + if not isinstance(expires_in, int) or not (1 <= expires_in <= 604800): + raise ValueError("expires_in must be an integer between 1 and 604800 seconds (7 days).") + name = SimpleStorageName.from_url(url) try: signed_url = self.aws.client.generate_presigned_url( "get_object", Params={"Bucket": name.Bucket, "Key": name.Key}, ExpiresIn=expires_in, ) @@ - except ClientError as err: + except (ClientError, ParamValidationError) as err: logger.error( f"[AmazonCloudStorage.get_signed_url] AWS presign error | " f"{{'project_id': '{self.project_id}', 'bucket': '{mask_string(name.Bucket)}', 'key': '{mask_string(name.Key)}', 'error': '{str(err)}'}}", exc_info=True, ) raise CloudStorageError(f'AWS Error: "{err}" ({url})') from errAlso add the missing import:
from botocore.exceptions import ClientError, ParamValidationError
🧹 Nitpick comments (1)
backend/app/core/cloud/storage.py (1)
157-164: Set a safe default ContentType to avoid boto ParamValidationError.
UploadFile.content_typecan be empty; boto3 expects a string.Apply:
self.aws.client.upload_fileobj( source.file, ExtraArgs={ - "ContentType": source.content_type, + "ContentType": source.content_type or "application/octet-stream", }, **kwargs, )
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (1)
backend/app/core/cloud/storage.py(6 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
backend/app/core/cloud/storage.py (3)
backend/app/crud/project.py (1)
get_project_by_id(37-38)backend/app/models/user.py (1)
UserProjectOrg(65-66)backend/app/utils.py (1)
mask_string(158-167)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: checks (3.11.7, 6)
🔇 Additional comments (2)
backend/app/core/cloud/storage.py (2)
166-168: LGTM: Project-scoped logging.Including
project_idand masking bucket/key improves traceability without leaking secrets.Also applies to: 171-173
179-196: LGTM: S3 interactions and project-scoped logging.
stream,get_file_size_kb, anddeletelook correct and consistent with the new project-based design.Also applies to: 197-216, 244-259
There was a problem hiding this comment.
Actionable comments posted: 1
♻️ Duplicate comments (2)
backend/app/api/routes/documents.py (2)
100-114: Permanent delete currently soft-deletes; call purge + fix CollectionCrud argEnsure DB row is removed after S3 delete and collections cleanup. Also pass project_id to CollectionCrud.
- d_crud = DocumentCrud(session, current_user.project_id) - c_crud = CollectionCrud(session, current_user.id) + d_crud = DocumentCrud(session, current_user.project_id) + c_crud = CollectionCrud(session, current_user.project_id) @@ - storage.delete(document.object_store_url) - d_crud.delete(doc_id) + storage.delete(document.object_store_url) + d_crud.purge(doc_id)
76-84: Fix scope: use project_id for collections; drop unused var; sentence case message
- Pass project_id to CollectionCrud (project-scoped).
- Remove unused variable (ruff F841).
- Standardize message casing.
- d_crud = DocumentCrud(session, current_user.project_id) - c_crud = CollectionCrud(session, current_user.id) + d_crud = DocumentCrud(session, current_user.project_id) + c_crud = CollectionCrud(session, current_user.project_id) @@ - document = d_crud.delete(doc_id) - data = c_crud.delete(document, a_crud) + document = d_crud.delete(doc_id) + c_crud.delete(document, a_crud) @@ - return APIResponse.success_response( - Message(message="Document Deleted Successfully") - ) + return APIResponse.success_response(Message(message="Document deleted successfully"))
🧹 Nitpick comments (2)
backend/app/crud/document.py (1)
73-96: Edge case + message wording in read_each
- Short-circuit empty doc_ids to avoid generating a no-op query.
- Fix wording (“at least”, pluralize “documents”) for clearer errors.
- def read_each(self, doc_ids: list[UUID]): + def read_each(self, doc_ids: list[UUID]): + if not doc_ids: + return [] @@ - raise ValueError( - f"Requested atleast {requested_count} document retrieved {retrieved_count}" - ) + raise ValueError( + f"Requested at least {requested_count} documents; retrieved {retrieved_count}" + )backend/app/api/routes/documents.py (1)
23-33: Optional: explicitly coerce to DocumentPublicFastAPI can coerce, but making it explicit keeps parity with info() and avoids surprises.
- data = crud.read_many(skip, limit) - return APIResponse.success_response(data) + docs = crud.read_many(skip, limit) + public = [DocumentPublic.model_validate(d, from_attributes=True) for d in docs] + return APIResponse.success_response(public)
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (2)
backend/app/api/routes/documents.py(6 hunks)backend/app/crud/document.py(6 hunks)
🧰 Additional context used
🧬 Code graph analysis (2)
backend/app/crud/document.py (4)
backend/app/crud/collection.py (3)
read_one(84-93)delete(107-115)CollectionCrud(17-146)backend/app/models/document.py (1)
Document(19-34)backend/app/tests/crud/documents/test_crud_document_update.py (2)
TestDatabaseUpdate(15-67)test_update_respects_owner(57-67)backend/app/alembic/versions/c43313eca57d_add_document_tables.py (1)
upgrade(20-36)
backend/app/api/routes/documents.py (6)
backend/app/crud/document.py (4)
DocumentCrud(14-135)read_many(37-71)delete(125-135)read_one(19-35)backend/app/crud/collection.py (3)
CollectionCrud(17-146)delete(107-115)read_one(84-93)backend/app/models/document.py (3)
Document(19-34)DocumentPublic(37-47)Document(10-30)backend/app/models/message.py (1)
Message(5-6)backend/app/utils.py (3)
APIResponse(27-48)get_openai_client(170-200)success_response(34-37)backend/app/core/cloud/storage.py (7)
get_cloud_storage(262-272)put(120-122)put(150-177)delete(140-142)delete(244-259)get_signed_url(135-137)get_signed_url(217-242)
🪛 Ruff (0.12.2)
backend/app/api/routes/documents.py
80-80: Local variable data is assigned to but never used
Remove assignment to unused variable data
(F841)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: checks (3.11.7, 6)
🔇 Additional comments (5)
backend/app/crud/document.py (3)
15-25: Good: project-scoped access + soft-delete filter applied in lookupsConstructor and read_one correctly scope by project_id and exclude soft-deleted rows.
39-71: Input validation on pagination looks soundNegative skip/limit checks with structured logging are fine.
98-114: Ownership enforcement on update is correctPopulate missing project_id and reject mismatches; timestamp update is in place.
backend/app/api/routes/documents.py (2)
46-58: Upload path looks goodPer-project storage via get_cloud_storage and deferring project_id set to CRUD is fine.
130-139: LGTM: signed URL opt-in on info()The from_attributes conversion and conditional presign are correct.
There was a problem hiding this comment.
Actionable comments posted: 0
🧹 Nitpick comments (7)
backend/app/tests/crud/documents/test_crud_document_read_one.py (3)
15-16: Avoid flaky fixture: create a dedicated test project instead of fetching any active one.get_project(db) can raise when no active projects exist and may couple this test to seeded data. Prefer an isolated test project.
Apply:
- project = get_project(db) - return DocumentStore(db, project.id) + project = create_test_project(db) + return DocumentStore(db, project.id)Additionally, to avoid cross-test interference, scope DocumentStore.clear to the project (change in backend/app/tests/utils/document.py):
class DocumentStore: def __init__(self, db: Session, project_id: int): self.db = db self.documents = DocumentMaker(project_id=project_id, session=db) - self.clear(self.db) + self.clear(self.db, project_id) - @staticmethod - def clear(db: Session): - db.exec(delete(Document)) + @staticmethod + def clear(db: Session, project_id: int | None = None): + if project_id is None: + db.exec(delete(Document)) + else: + db.exec(delete(Document).where(Document.project_id == project_id)) db.commit()
31-31: OK; consider a simpler invalid-ID path.Use a random UUID instead of generating an unpersisted Document to reduce coupling to DocumentMaker internals.
Apply:
+from uuid import uuid4 @@ - document = next(store.documents) + random_id = uuid4() @@ - crud.read_one(document.id) + crud.read_one(random_id)
43-43: Correct: enforce project boundary.Using other_project.id validates 404 on cross-project reads. Consider adding a companion test for is_deleted=True returning 404.
backend/app/tests/crud/documents/test_crud_document_delete.py (4)
9-10: Prefer deterministic project creation to avoid env-coupled flakesRelying on get_project(db) assumes an active project already exists. Using create_test_project(db) makes tests self-contained and stable across environments.
Apply:
-from app.tests.utils.utils import get_project from app.tests.utils.test_data import create_test_project- project = get_project(db) + project = create_test_project(db)- project = get_project(db) + project = create_test_project(db)
16-21: Initialize CRUD with the known project id, not the generated document’sPassing project.id is clearer and avoids coupling the test to DocumentMaker’s behavior of setting document.project_id.
- crud = DocumentCrud(db, document.project_id) + crud = DocumentCrud(db, project.id)
38-43: Strengthen the negative test by asserting no side effects on the source docAfter the 404, re-fetch the original doc and assert it wasn’t mutated.
with pytest.raises(HTTPException) as exc_info: crud.delete(document.id) assert exc_info.value.status_code == 404 assert "Document not found" in str(exc_info.value.detail) + # Verify no side effects on the original document + fresh = db.exec(select(Document).where(Document.id == document.id)).one() + assert fresh.is_deleted is False
27-33: Add a targeted read_one test to lock in soft-delete semanticsValidate that CRUD reads exclude deleted documents by asserting 404 on read_one after deletion.
class TestDatabaseDelete: @@ def test_delete_marks_deleted(self, document: Document): assert document.is_deleted is True + + def test_read_one_404_after_delete(self, db: Session): + project = create_test_project(db) + store = DocumentStore(db, project.id) + doc = store.put() + crud = DocumentCrud(db, project.id) + crud.delete(doc.id) + with pytest.raises(HTTPException) as exc_info: + crud.read_one(doc.id) + assert exc_info.value.status_code == 404
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (3)
backend/app/tests/crud/documents/test_crud_document_delete.py(2 hunks)backend/app/tests/crud/documents/test_crud_document_read_one.py(2 hunks)backend/app/tests/crud/documents/test_crud_document_update.py(3 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- backend/app/tests/crud/documents/test_crud_document_update.py
🧰 Additional context used
🧬 Code graph analysis (2)
backend/app/tests/crud/documents/test_crud_document_delete.py (5)
backend/app/tests/utils/utils.py (1)
get_project(70-89)backend/app/tests/utils/test_data.py (1)
create_test_project(34-50)backend/app/tests/utils/document.py (3)
project(67-68)DocumentStore(55-82)put(70-75)backend/app/crud/document.py (1)
DocumentCrud(14-135)backend/app/models/document.py (1)
Document(19-34)
backend/app/tests/crud/documents/test_crud_document_read_one.py (4)
backend/app/tests/utils/utils.py (1)
get_project(70-89)backend/app/tests/utils/test_data.py (1)
create_test_project(34-50)backend/app/tests/utils/document.py (3)
project(67-68)DocumentStore(55-82)put(70-75)backend/app/crud/document.py (2)
DocumentCrud(14-135)read_one(19-35)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: checks (3.11.7, 6)
🔇 Additional comments (5)
backend/app/tests/crud/documents/test_crud_document_read_one.py (3)
8-9: Imports align with project-scoped tests.These imports are appropriate for the new project-based organization.
23-23: Correct project scoping in CRUD instantiation.Passing store.project.id ensures reads are constrained to the project.
41-41: Good isolation for cross-project access.Creating a separate project for this check is the right approach.
backend/app/tests/crud/documents/test_crud_document_delete.py (2)
31-33: Good assertion on soft-delete flagAsserting is_deleted is True verifies the core behavior post-delete.
17-18: Be cautious: DocumentStore.clear() purges all documentsBecause DocumentStore.init calls clear(), creating a store inside a test wipes all documents in the DB. Ensure tests run in isolated transactions or a per-test database to avoid cross-test interference, especially under parallel runs.
If isolation isn’t guaranteed, consider moving the clear() call into a dedicated fixture with explicit scope or using transaction rollbacks per test.
Also applies to: 39-39
Summary
Target issuen: #85
This PR refactors the document module and cloud storage system to organize files by project rather than user, adds support for generating signed URLs for secure file access, and implements a project-based storage path structure.
Database Schema Updates
Storage Architecture Refactor
API Improvements
Migration Impact
Checklist
Before submitting a pull request, please ensure that you mark these task.
fastapi run --reload app/main.pyordocker compose upin the repository root and test.Notes
Please add here if any other information is required for the reviewer.
Summary by CodeRabbit
New Features
Refactor
Chores
Tests