Celery : moving collections module to celery#385
Conversation
1) Move models from route to models folder 2) fix process response to handle callback response
…tructured response
…nd error handling
…for job execution
WalkthroughAdds DB-backed CollectionJob lifecycle and CRUD; moves collection create/delete into background services with start_job/execute_job; scopes collections to projects (removes owner_id); introduces new models, helpers, migrations, docs, routes, and tests to support job-based polling, callbacks, and error normalization. Changes
Sequence Diagram(s)sequenceDiagram
autonumber
participant Client
participant API as Collections API
participant Jobs as CollectionJobCrud
participant BG as Scheduler
participant Exec as Worker.execute_job
participant DB
participant OA as OpenAI/VectorStore
rect rgba(200,240,200,0.18)
Client->>API: POST /collections/create (CreationRequest)
API->>Jobs: create(job PENDING, action=CREATE, project_id)
API->>BG: schedule execute_job(job_id, payload, project_id, org_id)
API-->>Client: APIResponse { key: job_id }
end
rect rgba(220,220,255,0.12)
BG->>Exec: invoke execute_job(job_id,...)
Exec->>DB: open session
Exec->>Jobs: update(job_id, status=PROCESSING, task_id)
Exec->>OA: create vector store, upload & batch documents
Exec->>DB: create Collection, link docs, create assistant
Exec->>Jobs: update(job_id, status=SUCCESSFUL, collection_id)
Exec-->>Client: optional webhook callback (SUCCESS)
end
alt failure
Exec->>Jobs: update(job_id, status=FAILED, error_message)
Exec-->>Client: optional webhook callback (FAIL)
end
sequenceDiagram
autonumber
participant Client
participant API as Collection Job API
participant Jobs as CollectionJobCrud
participant Cols as CollectionCrud
Client->>API: GET /collections/info/jobs/{job_id}
API->>Jobs: read_one(job_id)
alt job.status == SUCCESSFUL and job.collection_id
API->>Cols: read_one(collection_id)
API-->>Client: APIResponse { data: CollectionPublic (embedded) }
else job.status == FAILED
API-->>Client: APIResponse { data: CollectionJobPublic (normalized error) }
else
API-->>Client: APIResponse { data: CollectionJobPublic (PENDING/PROCESSING) }
end
Estimated code review effort🎯 5 (Critical) | ⏱️ ~120 minutes Possibly related PRs
Suggested labels
Suggested reviewers
Poem
Pre-merge checks and finishing touches❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✨ Finishing touches
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
…the shift from user id to project id
… request metadata
…erate_response function
There was a problem hiding this comment.
Actionable comments posted: 6
♻️ Duplicate comments (2)
backend/app/alembic/versions/b30727137e65_adding_collection_job_table_and_alter_collection_table.py (2)
46-46: Duplicate: task_id should be TEXT/VARCHAR, not UUID.As flagged in a previous review comment, Celery task IDs are strings (not UUIDs), so this column type is incorrect. This issue was already raised but has not been addressed.
Apply this diff:
- sa.Column("task_id", sa.Uuid(), nullable=True), + sa.Column("task_id", sa.String(), nullable=True),
46-46: Critical: task_id column type should be String, not UUID.Celery generates task IDs as strings (typically 36-character hyphenated strings like
"a5f3c8d2-1b4e-4f9a-8c7d-2e5f6a1b8c9d"), not as native UUID types. Usingsa.Uuid()will cause type mismatch errors when storing Celery task IDs.Apply this diff:
- sa.Column("task_id", sa.Uuid(), nullable=True), + sa.Column("task_id", sa.String(), nullable=True),Note: This issue was previously flagged by @coderabbitai[bot] and remains unresolved.
🧹 Nitpick comments (1)
backend/app/alembic/versions/b30727137e65_adding_collection_job_table_and_alter_collection_table.py (1)
37-38: Add explicit ENUM type cleanup in downgrade.The
upgrade()function creates two ENUM types (lines 37-38), but thedowngrade()function (line 94) doesn't explicitly drop them. While PostgreSQL may automatically clean up unused ENUMs when the table is dropped, explicit cleanup ensures complete and predictable reversal.Add ENUM cleanup to the downgrade function:
op.alter_column("collection", "inserted_at", new_column_name="created_at") op.drop_table("collection_jobs") + + # Explicitly drop ENUM types created in upgrade + collection_action_type.drop(op.get_bind(), checkfirst=True) + collection_job_status_enum.drop(op.get_bind(), checkfirst=True)Note: Use
checkfirst=Trueto avoid errors if the types are still referenced elsewhere or have already been dropped.Also applies to: 94-94
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
backend/app/alembic/versions/7ab577d3af26_delete_non_successful_columns_from_collection_table.py(1 hunks)backend/app/alembic/versions/b30727137e65_adding_collection_job_table_and_alter_collection_table.py(1 hunks)
🧰 Additional context used
🧬 Code graph analysis (2)
backend/app/alembic/versions/7ab577d3af26_delete_non_successful_columns_from_collection_table.py (1)
backend/app/alembic/versions/b30727137e65_adding_collection_job_table_and_alter_collection_table.py (2)
upgrade(36-61)downgrade(64-94)
backend/app/alembic/versions/b30727137e65_adding_collection_job_table_and_alter_collection_table.py (1)
backend/app/alembic/versions/7ab577d3af26_delete_non_successful_columns_from_collection_table.py (2)
upgrade(20-26)downgrade(29-30)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: checks (3.11.7, 6)
🔇 Additional comments (2)
backend/app/alembic/versions/b30727137e65_adding_collection_job_table_and_alter_collection_table.py (1)
57-61:project_idis already oncollection; removingowner_idis safe
Migration 3389c67fdcb4_add_alter_columns_in_collections_table.py adds a nullableproject_idcolumn tocollection, so droppingowner_idhere aligns with project-scoped ownership.backend/app/alembic/versions/7ab577d3af26_delete_non_successful_columns_from_collection_table.py (1)
20-30: Verify permanent data loss manually.Automatic script execution failed (sqlalchemy unavailable). Run a SQL query (e.g., via psql):
psql $DATABASE_URL -c "SELECT status, COUNT(*) FROM collection WHERE status IN ('processing','failed') GROUP BY status;"Confirm that deleting these rows is acceptable. If any data must be retained, archive it to a separate table or backup before applying this migration.
...end/app/alembic/versions/7ab577d3af26_delete_non_successful_columns_from_collection_table.py
Outdated
Show resolved
Hide resolved
...end/app/alembic/versions/7ab577d3af26_delete_non_successful_columns_from_collection_table.py
Show resolved
Hide resolved
...end/app/alembic/versions/7ab577d3af26_delete_non_successful_columns_from_collection_table.py
Show resolved
Hide resolved
.../app/alembic/versions/b30727137e65_adding_collection_job_table_and_alter_collection_table.py
Outdated
Show resolved
Hide resolved
.../app/alembic/versions/b30727137e65_adding_collection_job_table_and_alter_collection_table.py
Show resolved
Hide resolved
.../app/alembic/versions/b30727137e65_adding_collection_job_table_and_alter_collection_table.py
Show resolved
Hide resolved
kartpop
left a comment
There was a problem hiding this comment.
general comments
- is the
__init__.pyin services/collections required?
| description=load_description("collections/job_info.md"), | ||
| response_model=APIResponse, | ||
| ) | ||
| def collection_job_info( |
There was a problem hiding this comment.
let us keep this endpoint in a separate file to be consistent across the codebase; we already have separate files for documents.py and doc_transformation_job.py
| exc_info=True, | ||
| ) | ||
|
|
||
| if "assistant" in locals(): |
There was a problem hiding this comment.
any reason to rely on locals()? simpler way would be to check for assistant and assistant.id and _backout if it does
"locals() is not reliable for checking variable existence - it reflects the state at the time it's called, but the interpreter may optimize variables differently"
There was a problem hiding this comment.
yes simpler way would be to check if we have an assistant id and if we do, then we go ahead and do the cleanup. however i would have to spend a little more time on that as it is not that straightforward. will cover it in the assistant cleanup pr i will be making for v2
There was a problem hiding this comment.
Actionable comments posted: 5
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
backend/app/crud/collection/collection.py (1)
67-85: Creation flow is incorrect: broad except swallows duplicates and other errors.If read_one finds an existing collection, you raise FileExistsError but immediately catch it and still add, causing IntegrityError. Also using read_one for existence is wrong and not scoped to service fields.
- try: - existing = self.read_one(collection.id) - - raise FileExistsError("Collection already present") - except: - self.session.add(collection) - self.session.commit() + # Idempotency and scope checks + if self._exists(collection): + raise FileExistsError("Collection already present for this project and service") + if not collection.project_id: + collection.project_id = self.project_id + self.session.add(collection) + self.session.commit() + self.session.refresh(collection)
♻️ Duplicate comments (9)
backend/app/alembic/versions/7ab577d3af26_delete_non_successful_columns_from_collection_table.py (2)
4-4: Docstring inconsistency has been resolved.The docstring now correctly states "Revises: c6fb6d0b5897", which matches the
down_revisionvalue on line 15. This resolves the critical issue flagged in previous reviews.
14-15: Migration revision chain is now correct.The
down_revisionis properly set to"c6fb6d0b5897", ensuring this migration runs after c6fb6d0b5897 and before b30727137e65 (which drops the status column). The migration order is now consistent.backend/app/models/collection.py (1)
79-79: Typo: “compatable” → “compatible”.- "must be compatable with the assistants API; see the " + "must be compatible with the assistants API; see the "backend/app/services/collections/create_collection.py (2)
187-189: Avoidlocals()for existence checks.Initialize
assistantto None and check explicitly.- if "assistant" in locals(): - _backout(assistant_crud, assistant.id) + if assistant is not None: + _backout(assistant_crud, assistant.id)Add near line 119 (before the inner try):
+ assistant = None
197-207: Fix outer exception handler: undefinederr, closed session, and ensure job is marked FAILED.Use
outer_err, reopen a session, and send callback failure. Based on learningsexcept Exception as outer_err: logger.error( "[create_collection.execute_job] Unexpected Error during collection creation: %s", str(outer_err), exc_info=True, ) - - collection_job.status = CollectionJobStatus.FAILED - collection_job.updated_at = now() - collection_job.error_message = str(err) - collection_job_crud.update(collection_job.id, collection_job) + # Best-effort: update job state in a fresh session + with Session(engine) as session: + job_crud = CollectionJobCrud(session, project_id) + try: + job = job_crud.read_one(UUID(job_id) if isinstance(job_id, str) else job_id) + except Exception: + logger.exception( + "[create_collection.execute_job] Unable to mark collection job as failed | {'job_id': '%s'}", + str(job_id), + ) + else: + job.status = CollectionJobStatus.FAILED + job.error_message = str(outer_err) + job_crud.update(job.id, job) + if 'callback' in locals(): + callback.fail(str(outer_err))backend/app/tests/api/routes/collections/test_create_collections.py (1)
2-5: Remove duplicatepatchimport.Keep a single import to satisfy Ruff F811.
-from unittest.mock import patch - from fastapi.testclient import TestClient from unittest.mock import patch +from uuid import UUIDbackend/app/models/collection_job.py (2)
71-77: Align update schema with actual types and usage.
task_idmust bestr | None(Celery task IDs are strings).- Prior feedback: consider including
idandproject_idor change CRUD to acceptCollectionJob. Current CRUD ignores payload but type hints are misleading. Based on learnings-class CollectionJobUpdate(SQLModel): - task_id: UUID | None = None - status: CollectionJobStatus - error_message: str | None = None - collection_id: UUID | None = None - - updated_at: datetime +class CollectionJobUpdate(SQLModel): + # Optional: include identifiers if CRUD expects them at type-level + # id: UUID + # project_id: int + task_id: str | None = None + status: CollectionJobStatus + error_message: str | None = None + collection_id: UUID | None = None + # Let CRUD set timestamps + updated_at: datetime | None = None
47-49: Maketask_idoptional; it’s not available at creation.As written,
task_idis required and will fail oncreate(). Celery provides it later.- task_id: str = Field(nullable=True) + task_id: str | None = Field(default=None, nullable=True)backend/app/services/collections/delete_collection.py (1)
124-137: Robust failure handling: avoid UnboundLocalError/closed session; fixupdatecall.
- Don’t use
collection_job/collection_job_crudafter session closes.- Current
update(collection_job)has wrong arity.- Log with safe IDs even if
collectionwasn’t loaded. Based on learnings- except Exception as err: - collection_job.status = CollectionJobStatus.FAILED - collection_job.error_message = str(err) - collection_job_crud.update(collection_job) - - logger.error( - "[delete_collection.execute_job] Unexpected error during deletion | " - "{'collection_id': '%s', 'error': '%s', 'error_type': '%s', 'job_id': '%s'}", - str(collection.id), - str(err), - type(err).__name__, - str(job_id), - exc_info=True, - ) - callback.fail(str(err)) + except Exception as err: + logger.error( + "[delete_collection.execute_job] Unexpected error during deletion | " + "{'collection_id': '%s', 'error': '%s', 'error_type': '%s', 'job_id': '%s'}", + str(collection_id), + str(err), + type(err).__name__, + str(job_id), + exc_info=True, + ) + # Reopen session to update job state + with Session(engine) as session: + job_crud = CollectionJobCrud(session, project_id) + try: + job = job_crud.read_one(job_id) + except Exception: + logger.exception( + "[delete_collection.execute_job] Unable to mark collection job as failed | {'job_id': '%s'}", + str(job_id), + ) + else: + job.status = CollectionJobStatus.FAILED + job.error_message = str(err) + job_crud.update(job.id, job) + if 'callback' in locals(): + callback.fail(str(err))
🧹 Nitpick comments (21)
backend/app/alembic/versions/7ab577d3af26_delete_non_successful_columns_from_collection_table.py (1)
29-30: Document the no-op downgrade.The empty downgrade function is intentional for a data cleanup migration, but consider adding a comment explaining that the deleted data cannot be recovered on rollback.
Apply this diff to document the design decision:
def downgrade(): + # No-op: Data cleanup is irreversible. Deleted rows cannot be restored. passbackend/app/models/collection.py (5)
3-3: Use builtin generics; drop typing.List (UP035).Prefer built‑in collection types with Pydantic v2 / SQLModel. Also remove unused List import.
Based on static analysis hints
-from typing import Any, List, Optional +from typing import Any, Optional @@ - documents: List[UUID] = Field( + documents: list[UUID] = Field(Also applies to: 56-56
31-33: Make timestamps non-nullable for consistency.Align with Organization/Project models; mark inserted_at/updated_at as nullable=False.
Based on learnings
- inserted_at: datetime = Field(default_factory=now) - updated_at: datetime = Field(default_factory=now) + inserted_at: datetime = Field(default_factory=now, nullable=False) + updated_at: datetime = Field(default_factory=now, nullable=False)
68-70: Deduplicate while preserving order.set() reorders UUIDs; use dict.fromkeys to keep stable order.
- def model_post_init(self, __context: Any): - self.documents = list(set(self.documents)) + def model_post_init(self, __context: Any): + # Deduplicate while preserving order + self.documents = list(dict.fromkeys(self.documents))
73-75: Fix typo in comment: "clien" → "client".- # parameters accepted by the OpenAI.clien.beta.assistants.create + # parameters accepted by the OpenAI client (e.g., openai.beta.assistants.create)
110-114: Use Pydantic v2model_fieldsand correct type hint.
__fields__is v1; prefermodel_fieldswith fallback. Accept a model class type.- def extract_super_type(self, cls: "CreationRequest"): - for field_name in cls.__fields__.keys(): - field_value = getattr(self, field_name) - yield (field_name, field_value) + def extract_super_type(self, cls: type[SQLModel]): + # Pydantic v2 uses `model_fields`; keep fallback for compatibility + fields_map = getattr(cls, "model_fields", None) or getattr(cls, "__fields__", {}) + for field_name in fields_map.keys(): + yield field_name, getattr(self, field_name)backend/app/crud/collection/collection.py (5)
5-5: Remove duplicate import.
import loggingis already at Line 1.-import logging
24-30: Clarify error message and fix typo.“attempter” typo and confusing labels; make explicit.
- err = "Invalid collection ownership: owner_project={} attempter={}".format( - self.project_id, - collection.project_id, - ) + err = ( + f"Invalid collection ownership: expected_project={self.project_id}, " + f"collection_project={collection.project_id}" + )
49-65: Existence check should ignore soft-deleted records and log scope.Prevent false positives from soft-deleted collections; enrich log context.
stmt = ( select(Collection.id) .where( - (Collection.project_id == self.project_id) - & (Collection.llm_service_id == collection.llm_service_id) - & (Collection.llm_service_name == collection.llm_service_name) + (Collection.project_id == self.project_id) + & (Collection.llm_service_id == collection.llm_service_id) + & (Collection.llm_service_name == collection.llm_service_name) + & (Collection.deleted_at.is_(None)) ) .limit(1) ) present = self.session.exec(stmt).first() is not None - logger.info( - "[CollectionCrud._exists] Existence check completed | " - f"{{'llm_service_id': '{collection.llm_service_id}', 'exists': {present}}}" - ) + logger.info( + "[CollectionCrud._exists] Existence check | " + f\"{{'project_id': '{self.project_id}', 'llm_service_id': '{collection.llm_service_id}', " + f"'llm_service_name': '{collection.llm_service_name}', 'exists': {present}}}\" + ) return present
135-142: Guard remote delete when llm_service_id is None.Avoid calling remote.delete(None) when the local link is missing.
- remote.delete(model.llm_service_id) + if model.llm_service_id: + remote.delete(model.llm_service_id)
146-158: Scope document-based deletion to project and exclude soft-deleted collections.Prevent cross-project deletions and skip already soft-deleted rows.
statement = ( select(Collection) .join( DocumentCollection, DocumentCollection.collection_id == Collection.id, ) - .where(DocumentCollection.document_id == model.id) + .where( + and_( + DocumentCollection.document_id == model.id, + Collection.project_id == self.project_id, + Collection.deleted_at.is_(None), + ) + ) .distinct() )backend/app/tests/crud/collections/collection_job/test_collection_jobs.py (2)
70-70: Pass UUID, not str, to read_one.Crud expects UUID; avoid implicit casting.
- retrieved_job = collection_job_crud.read_one(str(collection_job.id)) + retrieved_job = collection_job_crud.read_one(collection_job.id)
84-84: Unnecessary commit in test.Helper already commits; this can be removed to speed tests.
- db.commit() + # commit not required herebackend/app/api/routes/collection_job.py (1)
27-29: Prefer|overtyping.Unionin annotations.Cleaner and satisfies Ruff UP007.
- response_model=Union[ - APIResponse[CollectionPublic], APIResponse[CollectionJobPublic] - ], + response_model=APIResponse[CollectionPublic] | APIResponse[CollectionJobPublic],backend/app/services/collections/delete_collection.py (3)
91-101: Set job to PROCESSING when work starts.Mirror create flow; improves observability and client UX.
collection_job = collection_job_crud.read_one(job_id) collection_job.task_id = task_id + collection_job.status = CollectionJobStatus.PROCESSING + collection_job_crud.update(collection_job.id, collection_job)
65-74: Minor: unusedtask_instance.Prefix with underscore to silence Ruff ARG001 and signal intent.
- task_instance, + _task_instance,
66-73: Optional: acceptjob_idas str and cast to UUID internally (Celery passes str).Keeps signatures consistent with create flow.
- task_id: str, - job_id: UUID, + task_id: str, + job_id: str, collection_id: UUID, _task_instance, ) -> None: + # Cast once at the start of the function body: + job_id = UUID(job_id)backend/app/services/collections/create_collection.py (1)
86-86: Minor: unusedtask_instance.Prefix with underscore to silence Ruff ARG001 and signal intent.
- task_instance, + _task_instance,backend/app/api/routes/collections.py (3)
4-4: Swap to builtinlist(and drop unusedUnion).Ruff already points out
typing.List; please switch to the builtinlistannotation and remove the unusedUnionimport while you’re here.
46-47: Lean onCollectionJobStatusfor payload status.We already import
CollectionJobStatus; wiringpayload.status = CollectionJobStatus.PENDING.valuekeeps the metadata aligned with the enum the rest of the flow uses (and avoids ad‑hoc strings like"processing").
80-81: Apply the same enum alignment in delete payloads.Same reasoning as above—set the payload status via
CollectionJobStatus.PENDING.valueinstead of a bare"processing"string so clients see consistent status values.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (15)
backend/app/alembic/versions/7ab577d3af26_delete_non_successful_columns_from_collection_table.py(1 hunks)backend/app/alembic/versions/b30727137e65_adding_collection_job_table_and_alter_collection_table.py(1 hunks)backend/app/api/docs/collections/create.md(1 hunks)backend/app/api/main.py(1 hunks)backend/app/api/routes/collection_job.py(1 hunks)backend/app/api/routes/collections.py(4 hunks)backend/app/crud/collection/collection.py(5 hunks)backend/app/models/collection.py(2 hunks)backend/app/models/collection_job.py(1 hunks)backend/app/services/collections/create_collection.py(1 hunks)backend/app/services/collections/delete_collection.py(1 hunks)backend/app/tests/api/routes/collections/test_create_collections.py(1 hunks)backend/app/tests/crud/collections/collection_job/test_collection_jobs.py(1 hunks)backend/app/tests/services/collections/test_create_collection.py(1 hunks)backend/app/tests/services/collections/test_delete_collection.py(1 hunks)
🚧 Files skipped from review as they are similar to previous changes (3)
- backend/app/tests/services/collections/test_create_collection.py
- backend/app/tests/services/collections/test_delete_collection.py
- backend/app/alembic/versions/b30727137e65_adding_collection_job_table_and_alter_collection_table.py
🧰 Additional context used
🧬 Code graph analysis (10)
backend/app/tests/api/routes/collections/test_create_collections.py (3)
backend/app/models/collection.py (2)
Collection(13-36)CreationRequest(105-113)backend/app/core/cloud/storage.py (1)
client(30-42)backend/app/tests/conftest.py (2)
client(52-55)user_api_key_header(77-79)
backend/app/tests/crud/collections/collection_job/test_collection_jobs.py (4)
backend/app/models/collection_job.py (3)
CollectionJob(34-60)CollectionJobStatus(10-14)CollectionActionType(17-19)backend/app/crud/collection/collection_job.py (1)
CollectionJobCrud(19-93)backend/app/tests/utils/utils.py (1)
get_project(70-89)backend/app/tests/conftest.py (1)
db(24-41)
backend/app/api/routes/collections.py (8)
backend/app/crud/collection/collection.py (3)
CollectionCrud(18-163)read_one(86-110)read_all(112-121)backend/app/crud/collection/collection_job.py (3)
CollectionJobCrud(19-93)read_one(24-47)read_all(49-58)backend/app/crud/document_collection.py (1)
DocumentCollectionCrud(8-48)backend/app/models/collection.py (4)
ResponsePayload(39-51)CreationRequest(105-113)DeletionRequest(116-117)CollectionPublic(120-129)backend/app/utils.py (3)
APIResponse(29-53)load_description(248-253)success_response(36-39)backend/app/services/collections/helpers.py (1)
extract_error_message(23-46)backend/app/services/collections/create_collection.py (1)
start_job(41-76)backend/app/services/collections/delete_collection.py (1)
start_job(25-62)
backend/app/models/collection.py (3)
backend/app/core/util.py (1)
now(15-16)backend/app/models/organization.py (1)
Organization(34-57)backend/app/models/project.py (1)
Project(29-63)
backend/app/alembic/versions/7ab577d3af26_delete_non_successful_columns_from_collection_table.py (1)
backend/app/alembic/versions/b30727137e65_adding_collection_job_table_and_alter_collection_table.py (2)
upgrade(36-61)downgrade(64-100)
backend/app/api/routes/collection_job.py (6)
backend/app/crud/collection/collection.py (2)
CollectionCrud(18-163)read_one(86-110)backend/app/crud/collection/collection_job.py (1)
CollectionJobCrud(19-93)backend/app/models/collection_job.py (2)
CollectionJobStatus(10-14)CollectionJobPublic(80-87)backend/app/models/collection.py (1)
CollectionPublic(120-129)backend/app/utils.py (3)
APIResponse(29-53)load_description(248-253)success_response(36-39)backend/app/services/collections/helpers.py (1)
extract_error_message(23-46)
backend/app/services/collections/delete_collection.py (8)
backend/app/crud/collection/collection.py (4)
CollectionCrud(18-163)create(67-84)read_one(86-110)delete(124-132)backend/app/crud/collection/collection_job.py (4)
CollectionJobCrud(19-93)create(76-93)read_one(24-47)update(60-74)backend/app/crud/rag/open_ai.py (1)
OpenAIAssistantCrud(192-246)backend/app/models/collection.py (3)
Collection(13-36)DeletionRequest(116-117)ResponsePayload(39-51)backend/app/services/collections/helpers.py (8)
SilentCallback(81-88)WebHookCallback(91-114)success(77-78)success(86-88)success(112-114)fail(74-75)fail(82-84)fail(108-110)backend/app/celery/utils.py (1)
start_low_priority_job(46-71)backend/app/utils.py (1)
get_openai_client(175-205)backend/app/services/collections/create_collection.py (2)
start_job(41-76)execute_job(79-207)
backend/app/services/collections/create_collection.py (9)
backend/app/core/cloud/storage.py (1)
get_cloud_storage(262-279)backend/app/models/collection.py (6)
now(49-51)Collection(13-36)ResponsePayload(39-51)CreationRequest(105-113)AssistantOptions(72-95)extract_super_type(110-113)backend/app/crud/collection/collection.py (3)
CollectionCrud(18-163)create(67-84)read_one(86-110)backend/app/crud/collection/collection_job.py (4)
CollectionJobCrud(19-93)create(76-93)read_one(24-47)update(60-74)backend/app/crud/rag/open_ai.py (2)
OpenAIVectorStoreCrud(101-189)OpenAIAssistantCrud(192-246)backend/app/services/collections/helpers.py (10)
_backout(117-124)batch_documents(50-66)SilentCallback(81-88)WebHookCallback(91-114)success(77-78)success(86-88)success(112-114)fail(74-75)fail(82-84)fail(108-110)backend/app/celery/utils.py (1)
start_low_priority_job(46-71)backend/app/utils.py (1)
get_openai_client(175-205)backend/app/services/collections/delete_collection.py (2)
start_job(25-62)execute_job(65-138)
backend/app/crud/collection/collection.py (3)
backend/app/models/collection.py (2)
Collection(13-36)now(49-51)backend/app/models/document_collection.py (1)
DocumentCollection(9-23)backend/app/crud/document_collection.py (1)
DocumentCollectionCrud(8-48)
backend/app/models/collection_job.py (1)
backend/app/models/collection.py (1)
now(49-51)
🪛 Ruff (0.13.3)
backend/app/tests/api/routes/collections/test_create_collections.py
5-5: Redefinition of unused patch from line 2
Remove definition: patch
(F811)
backend/app/api/routes/collections.py
4-4: typing.List is deprecated, use list instead
(UP035)
backend/app/models/collection.py
3-3: typing.List is deprecated, use list instead
(UP035)
backend/app/api/routes/collection_job.py
27-29: Use X | Y for type annotations
(UP007)
backend/app/services/collections/delete_collection.py
73-73: Unused function argument: task_instance
(ARG001)
backend/app/services/collections/create_collection.py
206-206: Undefined name err
(F821)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: checks (3.11.7, 6)
🔇 Additional comments (4)
backend/app/alembic/versions/7ab577d3af26_delete_non_successful_columns_from_collection_table.py (1)
20-26: All foreign key constraints referencingcollection.iduse ON DELETE CASCADE. No non-cascading foreign keys found.backend/app/api/main.py (1)
23-24: LGTM — route included.Adding collection_job router looks correct.
Also applies to: 31-31
backend/app/api/docs/collections/create.md (1)
23-26: LGTM — docs aligned with job-based flow.Clear that key refers to collection job ID and polling via job-info endpoint.
Consider linking the exact route (path/method) for “collection job info” here for quick navigation.
backend/app/services/collections/create_collection.py (1)
156-160: Remove duplicate create; it re-adds the same collection.Second
createrisks integrity errors and is redundant.- if flat_docs: - DocumentCollectionCrud(session).create(collection_data, flat_docs) - - collection_crud.create(collection_data) + if flat_docs: + DocumentCollectionCrud(session).create(collection_data, flat_docs)Likely an incorrect or invalid review comment.
backend/app/tests/crud/collections/collection_job/test_collection_jobs.py
Show resolved
Hide resolved
backend/app/tests/crud/collections/collection_job/test_collection_jobs.py
Show resolved
Hide resolved
backend/app/tests/crud/collections/collection_job/test_collection_jobs.py
Show resolved
Hide resolved
There was a problem hiding this comment.
Actionable comments posted: 5
♻️ Duplicate comments (2)
backend/app/services/collections/create_collection.py (1)
195-204: FixNameErrorand handle missing variables in outer exception handler.Line 203 references
err(undefined) instead ofouter_err. Also, if the error occurs beforecollection_job/collection_job_crudare initialized (e.g., during session creation), accessing them raisesNameErrororUnboundLocalError, masking the original error and leaving the job stuck inPENDING.Reopen a fresh session and defensively update the job:
except Exception as outer_err: logger.error( "[create_collection.execute_job] Unexpected Error during collection creation: %s", str(outer_err), exc_info=True, ) - collection_job.status = CollectionJobStatus.FAILED - collection_job.error_message = str(err) - collection_job_crud.update(collection_job.id, collection_job) + with Session(engine) as session: + job_crud = CollectionJobCrud(session, project_id) + try: + job = job_crud.read_one(job_id) + update_data = CollectionJobUpdate( + status=CollectionJobStatus.FAILED, + error_message=str(outer_err), + ) + job_crud.update(job.id, update_data) + except Exception: + logger.exception( + "[create_collection.execute_job] Unable to mark job as failed | " + "{'job_id': '%s'}", + str(job_id), + ) + + if "callback" in locals(): + callback.fail(str(outer_err))Import
CollectionJobUpdateif not already present (same as previous comment).backend/app/services/collections/delete_collection.py (1)
124-138: Fix incorrectupdatecall and handle missing variables.Line 127 calls
updatewith one argument (collection_job) but the signature requires two:job_idandpatch: CollectionJobUpdate. Additionally, if the error occurs before these variables are defined, the handler raisesNameError/UnboundLocalError, masking the root cause.Reopen a session and update defensively:
except Exception as err: + logger.error( + "[delete_collection.execute_job] Unexpected error during deletion | " + "{'collection_id': '%s', 'error': '%s', 'error_type': '%s', 'job_id': '%s'}", + str(collection_id), + str(err), + type(err).__name__, + str(job_id), + exc_info=True, + ) + + with Session(engine) as session: + job_crud = CollectionJobCrud(session, project_id) + try: + job = job_crud.read_one(job_id) + update_data = CollectionJobUpdate( + status=CollectionJobStatus.FAILED, + error_message=str(err), + ) + job_crud.update(job.id, update_data) + except Exception: + logger.exception( + "[delete_collection.execute_job] Unable to mark job as failed | " + "{'job_id': '%s'}", + str(job_id), + ) + - collection_job.status = CollectionJobStatus.FAILED - collection_job.error_message = str(err) - collection_job_crud.update(collection_job) - - logger.error( - "[delete_collection.execute_job] Unexpected error during deletion | " - "{'collection_id': '%s', 'error': '%s', 'error_type': '%s', 'job_id': '%s'}", - str(collection.id), - str(err), - type(err).__name__, - str(job_id), - exc_info=True, - ) callback.fail(str(err))Import
CollectionJobUpdate(same as previous comment).
🧹 Nitpick comments (1)
backend/app/crud/collection/collection_job.py (1)
3-3: Use modernlisttype hint instead of deprecatedtyping.List.Python 3.9+ supports native
listfor type hints.Apply this diff:
-from typing import List +from typing import TYPE_CHECKING- def read_all(self) -> List[CollectionJob]: + def read_all(self) -> list[CollectionJob]:
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (5)
backend/app/crud/collection/collection_job.py(1 hunks)backend/app/models/collection_job.py(1 hunks)backend/app/services/collections/create_collection.py(1 hunks)backend/app/services/collections/delete_collection.py(1 hunks)backend/app/tests/crud/collections/collection_job/test_collection_jobs.py(1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- backend/app/tests/crud/collections/collection_job/test_collection_jobs.py
🧰 Additional context used
🧬 Code graph analysis (3)
backend/app/crud/collection/collection_job.py (3)
backend/app/models/collection_job.py (3)
CollectionJob(34-60)CollectionJobUpdate(71-75)CollectionJobCreate(63-68)backend/app/models/collection.py (1)
now(49-51)backend/app/crud/collection/collection.py (3)
read_one(86-110)read_all(112-121)create(67-84)
backend/app/services/collections/delete_collection.py (9)
backend/app/crud/collection/collection.py (4)
CollectionCrud(18-163)create(67-84)read_one(86-110)delete(124-132)backend/app/crud/collection/collection_job.py (4)
CollectionJobCrud(19-97)create(80-97)read_one(24-47)update(60-78)backend/app/crud/rag/open_ai.py (1)
OpenAIAssistantCrud(192-246)backend/app/models/collection_job.py (3)
CollectionJob(34-60)CollectionJobStatus(10-14)CollectionActionType(17-19)backend/app/models/collection.py (3)
Collection(13-36)DeletionRequest(116-117)ResponsePayload(39-51)backend/app/services/collections/helpers.py (8)
SilentCallback(81-88)WebHookCallback(91-114)success(77-78)success(86-88)success(112-114)fail(74-75)fail(82-84)fail(108-110)backend/app/celery/utils.py (1)
start_low_priority_job(46-71)backend/app/utils.py (1)
get_openai_client(175-205)backend/app/services/collections/create_collection.py (2)
start_job(41-76)execute_job(79-204)
backend/app/services/collections/create_collection.py (10)
backend/app/core/cloud/storage.py (1)
get_cloud_storage(262-279)backend/app/models/collection.py (6)
now(49-51)Collection(13-36)ResponsePayload(39-51)CreationRequest(105-113)AssistantOptions(72-95)extract_super_type(110-113)backend/app/crud/collection/collection.py (3)
CollectionCrud(18-163)create(67-84)read_one(86-110)backend/app/crud/document.py (1)
DocumentCrud(14-135)backend/app/crud/document_collection.py (1)
DocumentCollectionCrud(8-48)backend/app/crud/collection/collection_job.py (4)
CollectionJobCrud(19-97)create(80-97)read_one(24-47)update(60-78)backend/app/crud/rag/open_ai.py (2)
OpenAIVectorStoreCrud(101-189)OpenAIAssistantCrud(192-246)backend/app/services/collections/helpers.py (10)
_backout(117-124)batch_documents(50-66)SilentCallback(81-88)WebHookCallback(91-114)success(77-78)success(86-88)success(112-114)fail(74-75)fail(82-84)fail(108-110)backend/app/celery/utils.py (1)
start_low_priority_job(46-71)backend/app/utils.py (1)
get_openai_client(175-205)
🪛 Ruff (0.13.3)
backend/app/crud/collection/collection_job.py
3-3: typing.List is deprecated, use list instead
(UP035)
backend/app/services/collections/delete_collection.py
73-73: Unused function argument: task_instance
(ARG001)
backend/app/services/collections/create_collection.py
203-203: Undefined name err
(F821)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: checks (3.11.7, 6)
There was a problem hiding this comment.
Actionable comments posted: 0
♻️ Duplicate comments (1)
backend/app/services/collections/create_collection.py (1)
193-202: Fix NameError and potential UnboundLocalError in outer exception handler.The outer exception handler has two critical issues:
- Line 201: References undefined variable
errinstead ofouter_err, which will raiseNameErrorwhen this handler is triggered.- Lines 200-202: Assumes
collection_jobandcollection_job_crudare in scope, but if the exception occurs before line 100-101 (e.g., during request deserialization at lines 95-96), these variables won't exist, causingUnboundLocalError.To fix both issues safely, open a fresh session and update the job status within a try-except block:
except Exception as outer_err: logger.error( "[create_collection.execute_job] Unexpected Error during collection creation: %s", str(outer_err), exc_info=True, ) - collection_job.status = CollectionJobStatus.FAILED - collection_job.error_message = str(err) - collection_job_crud.update(collection_job.id, collection_job) + try: + with Session(engine) as session: + job_crud = CollectionJobCrud(session, project_id) + job = job_crud.read_one(UUID(job_id)) + job.status = CollectionJobStatus.FAILED + job.error_message = str(outer_err) + job_crud.update(job.id, job) + except Exception: + logger.exception( + "[create_collection.execute_job] Failed to mark job as failed | {'job_id': '%s'}", + job_id, + ) + + if "callback" in locals(): + callback.fail(str(outer_err))
🧹 Nitpick comments (1)
backend/app/services/collections/create_collection.py (1)
153-157: Consider usingcollection_crud.createwith the documents parameter.The code creates the collection, reads it back, then links documents via
DocumentCollectionCrud. SinceCollectionCrud.createalready accepts an optionaldocumentsparameter, you could simplify this to a single call, eliminating the extra read operation.Apply this diff:
- collection_crud.create(collection) - collection_data = collection_crud.read_one(collection.id) - - if flat_docs: - DocumentCollectionCrud(session).create(collection_data, flat_docs) + collection = collection_crud.create(collection, flat_docs if flat_docs else None)
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
backend/app/models/collection_job.py(1 hunks)backend/app/services/collections/create_collection.py(1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- backend/app/models/collection_job.py
🧰 Additional context used
🧬 Code graph analysis (1)
backend/app/services/collections/create_collection.py (11)
backend/app/core/cloud/storage.py (1)
get_cloud_storage(262-279)backend/app/models/collection.py (6)
now(49-51)Collection(13-36)ResponsePayload(39-51)CreationRequest(105-113)AssistantOptions(72-95)extract_super_type(110-113)backend/app/crud/collection/collection.py (3)
CollectionCrud(18-163)create(67-84)read_one(86-110)backend/app/crud/document.py (1)
DocumentCrud(14-135)backend/app/crud/document_collection.py (1)
DocumentCollectionCrud(8-48)backend/app/crud/collection/collection_job.py (4)
CollectionJobCrud(19-97)create(80-97)read_one(24-47)update(60-78)backend/app/crud/rag/open_ai.py (2)
OpenAIVectorStoreCrud(101-189)OpenAIAssistantCrud(192-246)backend/app/services/collections/helpers.py (10)
_backout(117-124)batch_documents(50-66)SilentCallback(81-88)WebHookCallback(91-114)success(77-78)success(86-88)success(112-114)fail(74-75)fail(82-84)fail(108-110)backend/app/celery/utils.py (1)
start_low_priority_job(46-71)backend/app/utils.py (1)
get_openai_client(175-205)backend/app/services/collections/delete_collection.py (2)
start_job(25-62)execute_job(65-138)
🪛 Ruff (0.13.3)
backend/app/services/collections/create_collection.py
201-201: Undefined name err
(F821)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: checks (3.11.7, 6)
There was a problem hiding this comment.
Actionable comments posted: 3
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
backend/app/crud/collection/collection.py (1)
1-5: Remove duplicate logging import.The
loggingmodule is imported twice (lines 1 and 5), which is redundant.Apply this diff to remove the duplicate:
import logging import functools as ft from uuid import UUID from typing import Optional -import logging from fastapi import HTTPException
♻️ Duplicate comments (3)
backend/app/tests/api/routes/collections/test_create_collections.py (1)
2-5: Remove duplicate import ofpatch.The
patchimport appears twice (lines 2 and 5). This is flagged by Ruff as F811 (redefinition of unused import).Apply this diff to remove the redundant import:
from uuid import UUID -from unittest.mock import patch - from fastapi.testclient import TestClient from unittest.mock import patchbackend/app/services/collections/create_collection.py (1)
188-197: Fix undefined variable in outer exception handler.Line 196 references
errwhich is undefined in the outer exception scope (the caught exception isouter_err). This will raise aNameErrorif this handler executes, masking the original error and leaving the job un-updated.Apply this diff:
except Exception as outer_err: logger.error( "[create_collection.execute_job] Unexpected Error during collection creation: %s", str(outer_err), exc_info=True, ) collection_job.status = CollectionJobStatus.FAILED - collection_job.error_message = str(err) + collection_job.error_message = str(outer_err) collection_job_crud.update(collection_job.id, collection_job)Note: This handler also assumes
collection_jobandcollection_job_crudare in scope, which may not be true if an error occurs before line 95. Consider wrapping this in a fresh session as suggested in previous reviews.backend/app/services/collections/delete_collection.py (1)
87-87: Persist task_id before proceeding.The
task_idis assigned tocollection_job.task_idbut not persisted. If an error occurs in the subsequent operations (lines 89-115), the task_id remains unset in the database.Apply this diff to persist the assignment:
collection = collection_crud.read_one(collection_id) collection_job = collection_job_crud.read_one(job_id) - collection_job.task_id = task_id + update_data = CollectionJobUpdate(task_id=task_id) + collection_job = collection_job_crud.update(collection_job.id, update_data)
🧹 Nitpick comments (8)
backend/app/tests/services/collections/test_delete_collection.py (1)
67-72: Remove or acknowledge unused variable.The variable
precreatedis assigned but never used in this test. If the assignment is intentional (to ensure the job exists), use the underscore convention to signal this.Apply this diff:
- precreated = create_collection_job( + _ = create_collection_job( db=db, project=project, collection=created_collection, job_id=collection_job_id, )backend/app/crud/collection/collection_job.py (1)
3-3: Use built-inlistinstead oftyping.List.Python 3.9+ supports using the built-in
listtype for annotations, makingtyping.Listunnecessary. The project uses Python >= 3.9 per Alembic 1.15 requirements noted in learnings.Apply this diff:
from uuid import UUID import logging -from typing import List from fastapi import HTTPExceptionAnd update line 47:
- def read_all(self) -> List[CollectionJob]: + def read_all(self) -> list[CollectionJob]:backend/app/services/collections/helpers.py (2)
6-6: Use built-inlistinstead oftyping.List.Python 3.9+ supports using the built-in
listtype for annotations. Thetyping.Listimport is deprecated.Apply this diff:
import re from uuid import UUID -from typing import List from dataclasses import asdict, replaceAnd update line 52:
def batch_documents( - document_crud: DocumentCrud, documents: List[UUID], batch_size: int + document_crud: DocumentCrud, documents: list[UUID], batch_size: int ):
87-87: Remove unnecessary f-string prefixes.Lines 87, 91, and 117 use f-strings without any placeholders. Remove the
fprefix for cleaner code.Apply this diff:
def fail(self, body): - logger.info(f"[SilentCallback.fail] Silent callback failure") + logger.info("[SilentCallback.fail] Silent callback failure") return def success(self, body): - logger.info(f"[SilentCallback.success] Silent callback success") + logger.info("[SilentCallback.success] Silent callback success") returnAnd for line 117:
def success(self, body): - logger.info(f"[WebHookCallback.success] Callback succeeded") + logger.info("[WebHookCallback.success] Callback succeeded") self(APIResponse.success_response(body), "complete")Also applies to: 91-91, 117-117
backend/app/services/collections/create_collection.py (2)
52-54: Remove unused assignment.The
updatecall returns the updated job but the result is assigned to an unused variable (flagged by Ruff). Either use the returned value or remove the assignment.Apply this diff:
- collection_job = job_crud.update( + job_crud.update( collection_job_id, CollectionJobUpdate(trace_id=trace_id) )
148-152: Simplify collection creation flow.The current flow creates the collection, reads it back, then links documents separately. Since
CollectionCrud.createaccepts an optionaldocumentsparameter (see backend/app/crud/collection/collection.py lines 54-79), you can passflat_docsdirectly and eliminate the read-back step.Apply this diff:
- collection_crud.create(collection) - collection_data = collection_crud.read_one(collection.id) - - if flat_docs: - DocumentCollectionCrud(session).create(collection_data, flat_docs) + collection_crud.create(collection, flat_docs if flat_docs else None)backend/app/services/collections/delete_collection.py (1)
36-38: Remove unused assignment.The
updatecall returns the updated job but the result is assigned to an unused variable (flagged by Ruff). Either use the returned value or remove the assignment.Apply this diff:
- collection_job = job_crud.update( + job_crud.update( collection_job_id, CollectionJobUpdate(trace_id=trace_id) )backend/app/api/routes/collections.py (1)
4-4: Use built-inlistinstead of deprecatedtyping.List.Python 3.9+ supports using
listdirectly in type annotations. Thetyping.Listis deprecated.Apply this diff:
-from typing import ListAnd update the type hints:
- response_model=APIResponse[List[CollectionPublic]], + response_model=APIResponse[list[CollectionPublic]],- response_model=APIResponse[List[DocumentPublic]], + response_model=APIResponse[list[DocumentPublic]],
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (16)
backend/app/alembic/versions/b30727137e65_adding_collection_job_table_and_alter_collection_table.py(1 hunks)backend/app/api/docs/collections/create.md(1 hunks)backend/app/api/docs/collections/delete.md(1 hunks)backend/app/api/routes/collection_job.py(1 hunks)backend/app/api/routes/collections.py(4 hunks)backend/app/crud/collection/collection.py(4 hunks)backend/app/crud/collection/collection_job.py(1 hunks)backend/app/models/__init__.py(3 hunks)backend/app/models/collection_job.py(1 hunks)backend/app/services/collections/create_collection.py(1 hunks)backend/app/services/collections/delete_collection.py(1 hunks)backend/app/services/collections/helpers.py(1 hunks)backend/app/tests/api/routes/collections/test_collection_info.py(1 hunks)backend/app/tests/api/routes/collections/test_create_collections.py(1 hunks)backend/app/tests/services/collections/test_create_collection.py(1 hunks)backend/app/tests/services/collections/test_delete_collection.py(1 hunks)
🚧 Files skipped from review as they are similar to previous changes (3)
- backend/app/api/docs/collections/create.md
- backend/app/api/routes/collection_job.py
- backend/app/models/collection_job.py
🧰 Additional context used
🧬 Code graph analysis (12)
backend/app/tests/api/routes/collections/test_create_collections.py (2)
backend/app/models/collection.py (2)
Collection(13-36)CreationRequest(105-113)backend/app/tests/conftest.py (3)
client(52-55)user_api_key_header(77-79)user_api_key(89-91)
backend/app/services/collections/helpers.py (5)
backend/app/core/util.py (1)
post_callback(29-39)backend/app/crud/document.py (1)
DocumentCrud(14-135)backend/app/models/collection.py (2)
ResponsePayload(39-51)now(49-51)backend/app/crud/rag/open_ai.py (1)
OpenAIAssistantCrud(192-246)backend/app/utils.py (3)
APIResponse(29-53)failure_response(42-53)success_response(36-39)
backend/app/tests/services/collections/test_create_collection.py (10)
backend/app/core/cloud/storage.py (1)
AmazonCloudStorageClient(28-84)backend/app/crud/collection/collection.py (5)
CollectionCrud(17-159)create(55-80)_(131-138)_(141-159)read_one(82-106)backend/app/crud/collection/collection_job.py (3)
CollectionJobCrud(19-96)create(78-96)read_one(24-45)backend/app/crud/document_collection.py (1)
DocumentCollectionCrud(8-48)backend/app/models/collection_job.py (3)
CollectionJobStatus(11-15)CollectionJob(35-64)CollectionActionType(18-20)backend/app/models/collection.py (2)
CreationRequest(105-113)ResponsePayload(39-51)backend/app/services/collections/create_collection.py (2)
start_job(41-71)execute_job(74-197)backend/app/tests/utils/openai.py (1)
get_mock_openai_client_with_vector_store(90-117)backend/app/tests/utils/utils.py (1)
get_project(70-89)backend/app/tests/utils/document.py (2)
DocumentStore(52-79)project(64-65)
backend/app/services/collections/delete_collection.py (9)
backend/app/crud/collection/collection.py (3)
CollectionCrud(17-159)read_one(82-106)delete(120-128)backend/app/crud/collection/collection_job.py (3)
CollectionJobCrud(19-96)update(58-76)read_one(24-45)backend/app/crud/rag/open_ai.py (1)
OpenAIAssistantCrud(192-246)backend/app/models/collection_job.py (2)
CollectionJobStatus(11-15)CollectionJobUpdate(74-79)backend/app/models/collection.py (3)
Collection(13-36)DeletionRequest(116-117)ResponsePayload(39-51)backend/app/services/collections/helpers.py (8)
SilentCallback(85-92)WebHookCallback(95-118)success(81-82)success(90-92)success(116-118)fail(78-79)fail(86-88)fail(112-114)backend/app/celery/utils.py (1)
start_low_priority_job(46-71)backend/app/utils.py (1)
get_openai_client(175-205)backend/app/services/collections/create_collection.py (2)
start_job(41-71)execute_job(74-197)
backend/app/tests/api/routes/collections/test_collection_info.py (6)
backend/app/models/collection.py (2)
now(49-51)Collection(13-36)backend/app/models/collection_job.py (4)
CollectionJobCreate(67-71)CollectionActionType(18-20)CollectionJobStatus(11-15)CollectionJobUpdate(74-79)backend/app/crud/collection/collection_job.py (3)
CollectionJobCrud(19-96)create(78-96)update(58-76)backend/app/crud/collection/collection.py (2)
CollectionCrud(17-159)create(55-80)backend/app/api/routes/collections.py (1)
create_collection(44-73)backend/app/tests/conftest.py (4)
db(24-41)client(52-55)user_api_key_header(77-79)user_api_key(89-91)
backend/app/api/routes/collections.py (9)
backend/app/crud/collection/collection.py (4)
CollectionCrud(17-159)create(55-80)read_one(82-106)read_all(108-117)backend/app/crud/collection/collection_job.py (4)
CollectionJobCrud(19-96)create(78-96)read_one(24-45)read_all(47-56)backend/app/crud/document_collection.py (2)
DocumentCollectionCrud(8-48)create(12-23)backend/app/models/collection_job.py (3)
CollectionJobStatus(11-15)CollectionActionType(18-20)CollectionJobCreate(67-71)backend/app/models/collection.py (4)
ResponsePayload(39-51)CreationRequest(105-113)DeletionRequest(116-117)CollectionPublic(120-129)backend/app/utils.py (2)
APIResponse(29-53)success_response(36-39)backend/app/services/collections/helpers.py (1)
extract_error_message(22-48)backend/app/services/collections/create_collection.py (1)
start_job(41-71)backend/app/services/collections/delete_collection.py (1)
start_job(25-55)
backend/app/services/collections/create_collection.py (11)
backend/app/core/cloud/storage.py (1)
get_cloud_storage(262-279)backend/app/models/collection.py (6)
now(49-51)Collection(13-36)ResponsePayload(39-51)CreationRequest(105-113)AssistantOptions(72-95)extract_super_type(110-113)backend/app/crud/collection/collection.py (3)
CollectionCrud(17-159)read_one(82-106)create(55-80)backend/app/crud/document.py (1)
DocumentCrud(14-135)backend/app/crud/document_collection.py (1)
DocumentCollectionCrud(8-48)backend/app/crud/collection/collection_job.py (4)
CollectionJobCrud(19-96)update(58-76)read_one(24-45)create(78-96)backend/app/crud/rag/open_ai.py (2)
OpenAIVectorStoreCrud(101-189)OpenAIAssistantCrud(192-246)backend/app/services/collections/helpers.py (10)
_backout(121-129)batch_documents(51-70)SilentCallback(85-92)WebHookCallback(95-118)success(81-82)success(90-92)success(116-118)fail(78-79)fail(86-88)fail(112-114)backend/app/celery/utils.py (1)
start_low_priority_job(46-71)backend/app/utils.py (1)
get_openai_client(175-205)backend/app/services/collections/delete_collection.py (2)
start_job(25-55)execute_job(58-131)
backend/app/crud/collection/collection.py (3)
backend/app/models/collection.py (2)
Collection(13-36)now(49-51)backend/app/crud/document_collection.py (1)
DocumentCollectionCrud(8-48)backend/app/crud/collection/collection_job.py (2)
create(78-96)read_one(24-45)
backend/app/alembic/versions/b30727137e65_adding_collection_job_table_and_alter_collection_table.py (2)
backend/app/alembic/versions/7ab577d3af26_delete_non_successful_columns_from_collection_table.py (1)
upgrade(20-26)backend/app/crud/collection/collection_job.py (1)
create(78-96)
backend/app/models/__init__.py (12)
backend/app/models/api_key.py (3)
APIKey(28-38)APIKeyBase(10-20)APIKeyPublic(23-25)backend/app/models/assistants.py (4)
Assistant(33-44)AssistantBase(11-30)AssistantCreate(47-76)AssistantUpdate(79-101)backend/app/models/collection.py (2)
Collection(13-36)CollectionPublic(120-129)backend/app/models/collection_job.py (7)
CollectionActionType(18-20)CollectionJob(35-64)CollectionJobBase(23-32)CollectionJobStatus(11-15)CollectionJobUpdate(74-79)CollectionJobPublic(82-91)CollectionJobCreate(67-71)backend/app/models/credentials.py (5)
Credential(48-92)CredsBase(9-16)CredsCreate(19-29)CredsPublic(95-103)CredsUpdate(32-45)backend/app/models/fine_tuning.py (6)
FineTuningJobBase(20-25)Fine_Tuning(54-89)FineTuningJobCreate(28-51)FineTuningJobPublic(103-119)FineTuningUpdate(92-100)FineTuningStatus(13-17)backend/app/models/model_evaluation.py (6)
ModelEvaluation(38-85)ModelEvaluationBase(21-26)ModelEvaluationCreate(29-35)ModelEvaluationPublic(95-108)ModelEvaluationStatus(14-18)ModelEvaluationUpdate(88-92)backend/app/models/onboarding.py (2)
OnboardingRequest(7-81)OnboardingResponse(84-103)backend/app/models/openai_conversation.py (4)
OpenAIConversationPublic(101-111)OpenAIConversation(58-69)OpenAIConversationBase(22-55)OpenAIConversationCreate(72-98)backend/app/models/organization.py (5)
Organization(34-57)OrganizationCreate(23-24)OrganizationPublic(61-64)OrganizationsPublic(67-69)OrganizationUpdate(28-30)backend/app/models/response.py (6)
CallbackResponse(47-54)Diagnostics(35-39)FileResultChunk(42-44)ResponsesAPIRequest(4-11)ResponseJobStatus(27-32)ResponsesSyncAPIRequest(14-24)backend/app/models/threads.py (3)
OpenAI_Thread(18-21)OpenAIThreadBase(6-11)OpenAIThreadCreate(14-15)
backend/app/crud/collection/collection_job.py (2)
backend/app/models/collection_job.py (3)
CollectionJob(35-64)CollectionJobUpdate(74-79)CollectionJobCreate(67-71)backend/app/crud/collection/collection.py (3)
read_one(82-106)read_all(108-117)create(55-80)
backend/app/tests/services/collections/test_delete_collection.py (7)
backend/app/models/collection.py (3)
DeletionRequest(116-117)Collection(13-36)ResponsePayload(39-51)backend/app/tests/utils/utils.py (1)
get_project(70-89)backend/app/crud/collection/collection.py (5)
CollectionCrud(17-159)create(55-80)read_all(108-117)read_one(82-106)delete(120-128)backend/app/crud/collection/collection_job.py (4)
CollectionJobCrud(19-96)create(78-96)read_all(47-56)read_one(24-45)backend/app/models/collection_job.py (3)
CollectionJobStatus(11-15)CollectionJob(35-64)CollectionActionType(18-20)backend/app/services/collections/delete_collection.py (2)
start_job(25-55)execute_job(58-131)backend/app/tests/conftest.py (1)
db(24-41)
🪛 Ruff (0.13.3)
backend/app/tests/api/routes/collections/test_create_collections.py
5-5: Redefinition of unused patch from line 2
Remove definition: patch
(F811)
backend/app/services/collections/helpers.py
6-6: typing.List is deprecated, use list instead
(UP035)
87-87: f-string without any placeholders
Remove extraneous f prefix
(F541)
91-91: f-string without any placeholders
Remove extraneous f prefix
(F541)
117-117: f-string without any placeholders
Remove extraneous f prefix
(F541)
backend/app/services/collections/delete_collection.py
36-36: Local variable collection_job is assigned to but never used
Remove assignment to unused variable collection_job
(F841)
66-66: Unused function argument: task_instance
(ARG001)
backend/app/api/routes/collections.py
4-4: typing.List is deprecated, use list instead
(UP035)
backend/app/services/collections/create_collection.py
52-52: Local variable collection_job is assigned to but never used
Remove assignment to unused variable collection_job
(F841)
196-196: Undefined name err
(F821)
backend/app/crud/collection/collection_job.py
3-3: typing.List is deprecated, use list instead
(UP035)
backend/app/tests/services/collections/test_delete_collection.py
67-67: Local variable precreated is assigned to but never used
Remove assignment to unused variable precreated
(F841)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: checks (3.11.7, 6)
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (2)
backend/app/tests/services/collections/test_create_collection.py (1)
90-93: Simplify enum comparison.The assertion checks
action_typeagainst both the enum member and its.value, which is overly defensive. Sinceaction_typeis an Enum field on the model, it should always be the enum member, not a string.Apply this diff to simplify:
- assert job.action_type in ( - CollectionActionType.CREATE, - CollectionActionType.CREATE.value, - ) + assert job.action_type == CollectionActionType.CREATEbackend/app/tests/api/routes/collections/test_collection_info.py (1)
58-64: Consider clarifying the FAILED status handling.The pattern of creating a job and immediately updating it when
status == FAILEDworks correctly, but may be unclear to readers. This is necessary becauseCollectionJobCreatedoesn't includeerror_message, so a follow-up update is required.Add a comment to clarify the intent:
collection_job = CollectionJobCrud(db, user.project_id).create(job_in) + # CollectionJobCreate doesn't include error_message, so update separately if needed if collection_job.status == CollectionJobStatus.FAILED: job_in = CollectionJobUpdate( error_message="Something went wrong during the collection job process." )
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
backend/app/tests/api/routes/collections/test_collection_info.py(1 hunks)backend/app/tests/services/collections/test_create_collection.py(1 hunks)
🧰 Additional context used
🧬 Code graph analysis (2)
backend/app/tests/services/collections/test_create_collection.py (11)
backend/app/core/cloud/storage.py (1)
AmazonCloudStorageClient(28-84)backend/app/crud/collection/collection.py (5)
CollectionCrud(17-159)create(55-80)_(131-138)_(141-159)read_one(82-106)backend/app/crud/collection/collection_job.py (3)
CollectionJobCrud(19-96)create(78-96)read_one(24-45)backend/app/crud/document_collection.py (1)
DocumentCollectionCrud(8-48)backend/app/models/collection_job.py (3)
CollectionJobStatus(11-15)CollectionJob(35-64)CollectionActionType(18-20)backend/app/models/collection.py (2)
CreationRequest(105-113)ResponsePayload(39-51)backend/app/services/collections/create_collection.py (2)
start_job(41-71)execute_job(74-197)backend/app/tests/utils/openai.py (1)
get_mock_openai_client_with_vector_store(90-117)backend/app/tests/utils/utils.py (1)
get_project(70-89)backend/app/tests/utils/document.py (2)
DocumentStore(52-79)project(64-65)backend/app/tests/conftest.py (1)
db(24-41)
backend/app/tests/api/routes/collections/test_collection_info.py (6)
backend/app/models/collection.py (2)
now(49-51)Collection(13-36)backend/app/models/collection_job.py (4)
CollectionJobCreate(67-71)CollectionActionType(18-20)CollectionJobStatus(11-15)CollectionJobUpdate(74-79)backend/app/crud/collection/collection_job.py (3)
CollectionJobCrud(19-96)create(78-96)update(58-76)backend/app/crud/collection/collection.py (2)
CollectionCrud(17-159)create(55-80)backend/app/api/routes/collections.py (1)
create_collection(44-73)backend/app/tests/conftest.py (4)
db(24-41)client(52-55)user_api_key_header(77-79)user_api_key(89-91)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: checks (3.11.7, 6)
🔇 Additional comments (1)
backend/app/tests/services/collections/test_create_collection.py (1)
111-190: LGTM! Comprehensive integration test.This test thoroughly validates the end-to-end flow of
execute_job:
- Proper AWS/OpenAI mocking with moto and MagicMock
- Correct Session patching to inject the test database
- Verification of all critical state transitions (task_id assignment, status updates, collection creation, document linking)
The test design is solid and covers the success path comprehensively.
kartpop
left a comment
There was a problem hiding this comment.
please change the route path before merging
There was a problem hiding this comment.
Actionable comments posted: 0
♻️ Duplicate comments (1)
backend/app/api/routes/collection_job.py (1)
23-27: Consider simplifying the route path to/jobs/{job_id}.Past reviewers suggested changing from
/info/jobs/{job_id}to/jobs/{job_id}for consistency with other routes like/documents/transformations/{job_id}. The author mentioned keeping "collection_job_id" for clarity, but the consensus was that/collections/jobs/{job_id}(resulting from the prefix) would be clearer and more consistent.Based on past review comments.
🧹 Nitpick comments (2)
backend/app/tests/api/routes/collections/test_collection_info.py (2)
84-87: Clarify the assertion on line 86.Line 86 asserts
data["collection_id"] == collection_job.collection_id, but whencreate_collection_jobis called without a collection_id (as in line 74), both sides evaluate toNone. While this assertion passes, it doesn't verify meaningful behavior.Consider either:
- Asserting explicitly:
assert data["collection_id"] is None- Or adding a comment explaining why checking None equality is intentional
69-138: Consider expanding test coverage.The current tests cover the happy path scenarios well. Consider adding tests for:
- 404 response when job_id doesn't exist
- Authorization check (accessing another project's job)
- DELETE action type in addition to CREATE
- Edge cases like PROCESSING status
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
backend/app/api/docs/collections/create.md(1 hunks)backend/app/api/docs/collections/delete.md(1 hunks)backend/app/api/routes/collection_job.py(1 hunks)backend/app/tests/api/routes/collections/test_collection_info.py(1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- backend/app/api/docs/collections/create.md
🧰 Additional context used
🧬 Code graph analysis (2)
backend/app/api/routes/collection_job.py (6)
backend/app/crud/collection/collection.py (2)
CollectionCrud(17-159)read_one(82-106)backend/app/crud/collection/collection_job.py (2)
CollectionJobCrud(19-96)read_one(24-45)backend/app/models/collection_job.py (3)
CollectionJobStatus(11-15)CollectionJobPublic(82-91)CollectionActionType(18-20)backend/app/models/collection.py (1)
CollectionPublic(120-129)backend/app/utils.py (3)
APIResponse(29-53)load_description(248-253)success_response(36-39)backend/app/services/collections/helpers.py (1)
extract_error_message(22-48)
backend/app/tests/api/routes/collections/test_collection_info.py (6)
backend/app/models/collection.py (2)
now(49-51)Collection(13-36)backend/app/models/collection_job.py (4)
CollectionJobCreate(67-71)CollectionActionType(18-20)CollectionJobStatus(11-15)CollectionJobUpdate(74-79)backend/app/crud/collection/collection_job.py (3)
CollectionJobCrud(19-96)create(78-96)update(58-76)backend/app/crud/collection/collection.py (2)
CollectionCrud(17-159)create(55-80)backend/app/api/routes/collections.py (1)
create_collection(44-73)backend/app/tests/conftest.py (4)
db(24-41)client(52-55)user_api_key_header(77-79)user_api_key(89-91)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: checks (3.11.7, 6)
🔇 Additional comments (4)
backend/app/api/docs/collections/delete.md (1)
1-13: LGTM! Documentation clearly describes the job-based deletion workflow.The updated documentation accurately reflects the asynchronous deletion process using CollectionJob tracking and polling via the job info endpoint.
backend/app/api/routes/collection_job.py (1)
28-50: LGTM! Job info retrieval logic is well-structured.The endpoint correctly:
- Scopes job retrieval to the user's project
- Conditionally embeds the collection for successful CREATE operations
- Normalizes error messages for failed jobs using
extract_error_messageThe conditional logic at lines 38-41 appropriately checks all three conditions before attaching collection data.
backend/app/tests/api/routes/collections/test_collection_info.py (2)
19-39: LGTM! Helper function correctly creates persisted collections.The
create_collectionhelper appropriately:
- Uses CollectionCrud to persist the collection
- Optionally populates LLM service fields
- Returns the created collection for test assertions
42-66: LGTM! Previous review feedback has been addressed.The
create_collection_jobhelper now correctly omitsinserted_atandupdated_atfromCollectionJobCreate, relying on the model's default_factory for timestamp generation. The conditional error_message update for FAILED jobs is appropriate.
| @@ -1,5 +1,4 @@ | |||
| Retrieve all AI-platform information about a collection given its | |||
| ID. This route is very helpful for: | |||
| Retrieve detailed information about a specific collection by its ID from the collection table. Note that this endpoint CANNOT be used as a polling endpoint for collection creation because an entry will be made in the collection table only after the resource creation and association has been successful. | |||
There was a problem hiding this comment.
One suggestion for all the md/docs files: these are user-facing documents, so they should focus on describing the behavior of the endpoints, not their internal implementation details.
For example:
- If an endpoint triggers a webhook, simply mention that it does, no need to explain how it’s handled internally.
- If an endpoint deletes a resource, just state that it deletes it, don’t specify whether it’s a soft or hard delete.
Similarly, avoid mentioning internal mechanisms like can be use for polling or background tasks unless they directly affect how the user interacts with the API.
There was a problem hiding this comment.
noted, but since for now the main users of this endpoint had been dalgo, I think keeping that in mind I wrote the docs in that way . Stating that you need to poll on the job id returned is something that the user needs to know in this case. The fact that the deletion is soft or hard is sometimes needed to be known considering that our current user of the platform is dalgo, glific and us(the internal team) . So I think we can keep the docs which go into the behaviour of endpoints , anyway we will discuss this with other people as well, and according to their opinion we will apply those changes in v2
| op.execute( | ||
| """ | ||
| DELETE FROM collection | ||
| WHERE status IN ('processing', 'failed') |
There was a problem hiding this comment.
Make sure no collection with empty llm_service_id exist
There was a problem hiding this comment.
I do not think there would be a situation where status turned to successful while no llm service id is there as they're committed together in the same transaction, they both persist atomically. By they i mean the status as successful and the llm service id , for reference - here . But for surity i will still add that sql query as well
There was a problem hiding this comment.
Actionable comments posted: 2
♻️ Duplicate comments (3)
backend/app/api/docs/collections/job_info.md (1)
7-8: Fix list indentation to match Markdown standards.Nested list items still use 4-space indentation instead of the standard 2-space indentation.
Apply this diff:
* If the job has finished, has been successful and it was a job of creation of collection then this endpoint will fetch the associated collection details from the collection table, including: - - `llm_service_id`: The OpenAI assistant or model used for the collection. - - Collection metadata such as ID, project, organization, and timestamps. + - `llm_service_id`: The OpenAI assistant or model used for the collection. + - Collection metadata such as ID, project, organization, and timestamps.backend/app/services/collections/delete_collection.py (2)
127-141: Fix outer exception handler—UnboundLocalError and closed session.The outer exception handler has multiple critical issues:
- UnboundLocalError risk: If an exception occurs before line 86 (e.g., during
get_openai_client),collection_jobandcollection_job_crudwill be undefined, causing line 128 to raiseUnboundLocalError.- Closed session: The
with Session(engine)context has exited, so the session is closed and line 130 will fail.- Wrong signature: Line 130 calls
update(collection_job.id, collection_job)instead ofupdate(job_id, patch).Reopen a fresh session and fetch the job defensively:
except Exception as err: - collection_job.status = CollectionJobStatus.FAILED - collection_job.error_message = str(err) - collection_job_crud.update(collection_job.id, collection_job) - logger.error( "[delete_collection.execute_job] Unexpected error during deletion | " "{'collection_id': '%s', 'error': '%s', 'error_type': '%s', 'job_id': '%s'}", - str(collection.id), + str(collection_id), str(err), type(err).__name__, str(job_id), exc_info=True, ) - callback.fail(str(err)) + + # Reopen session to mark job as failed + with Session(engine) as session: + job_crud = CollectionJobCrud(session, project_id) + try: + job_crud.update( + job_id, + CollectionJobUpdate( + status=CollectionJobStatus.FAILED, + error_message=str(err), + ), + ) + except Exception: + logger.exception( + "[delete_collection.execute_job] Unable to mark job as failed | " + "{'job_id': '%s'}", + str(job_id), + ) + + # Only call callback if it was initialized + if "callback" in locals(): + callback.fail(str(err))
114-116: Fix incorrectupdate()signature.Same issue as above—you're passing
collection_jobinstead of aCollectionJobUpdatepatch object.Apply this diff:
collection_job.status = CollectionJobStatus.FAILED collection_job.error_message = str(err) - collection_job_crud.update(collection_job.id, collection_job) + collection_job_crud.update( + collection_job.id, + CollectionJobUpdate( + status=CollectionJobStatus.FAILED, + error_message=str(err), + ), + )
🧹 Nitpick comments (5)
backend/app/models/collection.py (3)
48-51: Remove redundantnow()classmethod.The
ResponsePayload.now()classmethod simply delegates to the importednow()function fromapp.core.util. This adds no value and creates confusion about whichnow()to use.Apply this diff:
time: datetime = Field(default_factory=now) - - @classmethod - def now(cls): - """Returns current UTC time without timezone info""" - return now()
68-69: Document deduplication behavior or preserve order.Converting
documentsto a set and back to a list removes duplicates but also loses the original ordering. If document processing order matters, consider usingdict.fromkeys(self.documents)to preserve insertion order while removing duplicates.If order should be preserved:
def model_post_init(self, __context: Any): - self.documents = list(set(self.documents)) + self.documents = list(dict.fromkeys(self.documents))
110-113: Clarify method name and purpose.The method name
extract_super_typeis misleading—it yields field name-value pairs, not a supertype. Consider renaming toiter_fields()or adding a docstring explaining its purpose.Example rename:
- def extract_super_type(self, cls: "CreationRequest"): + def iter_fields(self): + """Yield (field_name, field_value) pairs for all fields.""" - for field_name in cls.__fields__.keys(): + for field_name in self.__fields__.keys(): field_value = getattr(self, field_name) yield (field_name, field_value)backend/app/services/collections/delete_collection.py (2)
37-39: Remove unused variable assignment.The
collection_jobvariable is assigned but never used. The return value ofupdate()is not needed here since we only need thecollection_job_id.Apply this diff:
- collection_job = job_crud.update( + job_crud.update( collection_job_id, CollectionJobUpdate(trace_id=trace_id) )
67-67: Remove unusedtask_instanceparameter.The
task_instanceparameter is declared but never used in the function body. If it's required by the Celery task signature, add a leading underscore (_task_instance) to indicate it's intentionally unused.Apply this diff:
- task_instance, + _task_instance, ) -> None:
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (5)
backend/app/alembic/versions/7ab577d3af26_delete_non_successful_columns_from_collection_table.py(1 hunks)backend/app/alembic/versions/b30727137e65_adding_collection_job_table_and_alter_collection_table.py(1 hunks)backend/app/api/docs/collections/job_info.md(1 hunks)backend/app/models/collection.py(2 hunks)backend/app/services/collections/delete_collection.py(1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- backend/app/alembic/versions/7ab577d3af26_delete_non_successful_columns_from_collection_table.py
🧰 Additional context used
📓 Path-based instructions (3)
**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
Use type hints in Python code (Python 3.11+ project)
Files:
backend/app/services/collections/delete_collection.pybackend/app/alembic/versions/b30727137e65_adding_collection_job_table_and_alter_collection_table.pybackend/app/models/collection.py
backend/app/services/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
Implement business logic services under backend/app/services/
Files:
backend/app/services/collections/delete_collection.py
backend/app/models/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
Define SQLModel entities (database tables and domain objects) in backend/app/models/
Files:
backend/app/models/collection.py
🧠 Learnings (1)
📚 Learning: 2025-10-08T12:05:01.299Z
Learnt from: CR
PR: ProjectTech4DevAI/ai-platform#0
File: CLAUDE.md:0-0
Timestamp: 2025-10-08T12:05:01.299Z
Learning: Applies to backend/app/models/**/*.py : Define SQLModel entities (database tables and domain objects) in backend/app/models/
Applied to files:
backend/app/models/collection.py
🧬 Code graph analysis (3)
backend/app/services/collections/delete_collection.py (8)
backend/app/crud/collection/collection.py (3)
CollectionCrud(17-159)read_one(82-106)delete(120-128)backend/app/crud/collection/collection_job.py (3)
CollectionJobCrud(19-96)update(58-76)read_one(24-45)backend/app/crud/rag/open_ai.py (1)
OpenAIAssistantCrud(192-246)backend/app/models/collection.py (3)
Collection(13-36)DeletionRequest(116-117)ResponsePayload(39-51)backend/app/services/collections/helpers.py (8)
SilentCallback(85-92)WebHookCallback(95-118)success(81-82)success(90-92)success(116-118)fail(78-79)fail(86-88)fail(112-114)backend/app/celery/utils.py (1)
start_low_priority_job(46-71)backend/app/utils.py (1)
get_openai_client(175-205)backend/app/services/collections/create_collection.py (2)
start_job(41-71)execute_job(74-197)
backend/app/alembic/versions/b30727137e65_adding_collection_job_table_and_alter_collection_table.py (2)
backend/app/alembic/versions/7ab577d3af26_delete_non_successful_columns_from_collection_table.py (1)
upgrade(20-32)backend/app/crud/collection/collection_job.py (1)
create(78-96)
backend/app/models/collection.py (3)
backend/app/core/util.py (1)
now(15-16)backend/app/models/organization.py (1)
Organization(34-57)backend/app/models/project.py (1)
Project(29-63)
🪛 markdownlint-cli2 (0.18.1)
backend/app/api/docs/collections/job_info.md
7-7: Unordered list indentation
Expected: 2; Actual: 4
(MD007, ul-indent)
8-8: Unordered list indentation
Expected: 2; Actual: 4
(MD007, ul-indent)
🪛 Ruff (0.13.3)
backend/app/services/collections/delete_collection.py
37-37: Local variable collection_job is assigned to but never used
Remove assignment to unused variable collection_job
(F841)
67-67: Unused function argument: task_instance
(ARG001)
backend/app/models/collection.py
3-3: typing.List is deprecated, use list instead
(UP035)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: checks (3.11.7, 6)
🔇 Additional comments (1)
backend/app/alembic/versions/b30727137e65_adding_collection_job_table_and_alter_collection_table.py (1)
36-68: Migration structure looks good.The upgrade path correctly:
- Creates enum types with
checkfirst=True- Establishes the
collection_jobstable with proper constraints- Renames
created_attoinserted_atfor consistency- Enforces non-null constraints on
llm_service_idandllm_service_name- Removes deprecated columns (
owner_id,status,error_message)The downgrade strategy (lines 94-95) uses hardcoded defaults as a pragmatic workaround, which was already discussed in previous reviews.
There was a problem hiding this comment.
Actionable comments posted: 1
♻️ Duplicate comments (2)
backend/app/tests/services/collections/test_create_collection.py (1)
31-45: Pass CollectionJobCreate schema instead of model instance.The helper instantiates a
CollectionJobmodel directly (line 38) and passes it toCollectionJobCrud.create(), but according to the CRUD signature inbackend/app/crud/collection/collection_job.py(lines 64-68), thecreatemethod expects aCollectionJobCreateschema, not a model instance. This bypasses Pydantic validation and can cause type issues.Apply this diff to use the proper schema:
+from app.models import CollectionJobCreate + def create_collection_job_for_create( db: Session, project, job_id: UUID, ): """Pre-create a CREATE job with the given id so start_job can update it.""" return CollectionJobCrud(db, project.id).create( - CollectionJob( + CollectionJobCreate( id=job_id, action_type=CollectionActionType.CREATE, project_id=project.id, collection_id=None, status=CollectionJobStatus.PENDING, ) )backend/app/services/collections/delete_collection.py (1)
137-155: Fix outer exception handler to avoid UnboundLocalError and closed session.The outer exception handler has critical issues:
UnboundLocalError/NameError: Lines 138-143 reference
collection_job_crudandcollection_job, which won't be defined if the error occurs before line 87-88 (e.g., duringDeletionRequestconstruction at line 69 or callback setup at lines 72-76).Closed session: The session from line 84 is closed by the time this handler executes, so any DB operations will fail.
NameError on collection: Line 149 references
collection.id, butcollectionis only defined at line 99 inside the inner try block.Apply this diff to reopen a fresh session and handle the job update defensively:
except Exception as err: - collection_job_crud.update( - collection_job.id, - CollectionJobUpdate( - status=CollectionJobStatus.FAILED, - error_message=str(err), - ), - ) - logger.error( "[delete_collection.execute_job] Unexpected error during deletion | " - "{'collection_id': '%s', 'error': '%s', 'error_type': '%s', 'job_id': '%s'}", - str(collection.id), + "{'error': '%s', 'error_type': '%s', 'job_id': '%s'}", str(err), type(err).__name__, str(job_id), exc_info=True, ) - callback.fail(str(err)) + + with Session(engine) as session: + job_crud = CollectionJobCrud(session, project_id) + try: + job = job_crud.read_one(job_id) + job_crud.update( + job.id, + CollectionJobUpdate( + status=CollectionJobStatus.FAILED, + error_message=str(err), + ), + ) + except Exception: + logger.exception( + "[delete_collection.execute_job] Unable to mark job as failed | " + "{'job_id': '%s'}", + str(job_id), + ) + + if "callback" in locals(): + callback.fail(str(err))
🧹 Nitpick comments (1)
backend/app/services/collections/create_collection.py (1)
151-155: Consider consolidating collection and document creation.The code creates the collection (line 151), reads it back (line 152), then creates DocumentCollection links (lines 154-155). According to
CollectionCrud.createsignature inbackend/app/crud/collection/collection.py(lines 38-61), the method accepts an optionaldocumentsparameter that handles the linking internally.Consider simplifying to:
- collection_crud.create(collection) - collection_data = collection_crud.read_one(collection.id) - - if flat_docs: - DocumentCollectionCrud(session).create(collection_data, flat_docs) + collection = collection_crud.create( + collection, + documents=flat_docs if flat_docs else None + )This eliminates the redundant read and consolidates the logic into a single operation.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (5)
backend/app/models/collection.py(2 hunks)backend/app/services/collections/create_collection.py(1 hunks)backend/app/services/collections/delete_collection.py(1 hunks)backend/app/tests/services/collections/test_create_collection.py(1 hunks)backend/app/tests/services/collections/test_delete_collection.py(1 hunks)
🧰 Additional context used
📓 Path-based instructions (3)
**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
Use type hints in Python code (Python 3.11+ project)
Files:
backend/app/services/collections/create_collection.pybackend/app/tests/services/collections/test_create_collection.pybackend/app/tests/services/collections/test_delete_collection.pybackend/app/models/collection.pybackend/app/services/collections/delete_collection.py
backend/app/services/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
Implement business logic services under backend/app/services/
Files:
backend/app/services/collections/create_collection.pybackend/app/services/collections/delete_collection.py
backend/app/models/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
Define SQLModel entities (database tables and domain objects) in backend/app/models/
Files:
backend/app/models/collection.py
🧠 Learnings (1)
📚 Learning: 2025-10-08T12:05:01.299Z
Learnt from: CR
PR: ProjectTech4DevAI/ai-platform#0
File: CLAUDE.md:0-0
Timestamp: 2025-10-08T12:05:01.299Z
Learning: Applies to backend/app/models/**/*.py : Define SQLModel entities (database tables and domain objects) in backend/app/models/
Applied to files:
backend/app/models/collection.py
🧬 Code graph analysis (5)
backend/app/services/collections/create_collection.py (10)
backend/app/core/cloud/storage.py (1)
get_cloud_storage(262-279)backend/app/models/collection.py (6)
now(49-51)Collection(13-36)ResponsePayload(39-51)CreationRequest(105-113)AssistantOptions(72-95)extract_super_type(110-113)backend/app/crud/document.py (1)
DocumentCrud(14-135)backend/app/crud/document_collection.py (1)
DocumentCollectionCrud(8-48)backend/app/crud/collection/collection_job.py (4)
CollectionJobCrud(19-96)update(58-76)read_one(24-45)create(78-96)backend/app/crud/rag/open_ai.py (2)
OpenAIVectorStoreCrud(101-189)OpenAIAssistantCrud(192-246)backend/app/services/collections/helpers.py (10)
_backout(121-129)batch_documents(51-70)SilentCallback(85-92)WebHookCallback(95-118)success(81-82)success(90-92)success(116-118)fail(78-79)fail(86-88)fail(112-114)backend/app/celery/utils.py (1)
start_low_priority_job(46-71)backend/app/utils.py (1)
get_openai_client(175-205)backend/app/services/collections/delete_collection.py (2)
start_job(25-56)execute_job(59-155)
backend/app/tests/services/collections/test_create_collection.py (7)
backend/app/core/cloud/storage.py (1)
AmazonCloudStorageClient(28-84)backend/app/crud/collection/collection.py (5)
CollectionCrud(17-159)create(55-80)_(131-138)_(141-159)read_one(82-106)backend/app/crud/collection/collection_job.py (3)
CollectionJobCrud(19-96)create(78-96)read_one(24-45)backend/app/crud/document_collection.py (1)
DocumentCollectionCrud(8-48)backend/app/services/collections/create_collection.py (2)
start_job(41-71)execute_job(74-212)backend/app/tests/utils/openai.py (1)
get_mock_openai_client_with_vector_store(90-117)backend/app/tests/utils/document.py (2)
DocumentStore(52-79)project(64-65)
backend/app/tests/services/collections/test_delete_collection.py (7)
backend/app/models/collection.py (3)
DeletionRequest(116-117)Collection(13-36)ResponsePayload(39-51)backend/app/tests/utils/utils.py (1)
get_project(70-89)backend/app/crud/collection/collection.py (5)
CollectionCrud(17-159)create(55-80)read_all(108-117)read_one(82-106)delete(120-128)backend/app/crud/collection/collection_job.py (4)
CollectionJobCrud(19-96)create(78-96)read_all(47-56)read_one(24-45)backend/app/models/collection_job.py (3)
CollectionJobStatus(11-15)CollectionJob(35-64)CollectionActionType(18-20)backend/app/api/routes/collections.py (1)
delete_collection(80-114)backend/app/services/collections/delete_collection.py (2)
start_job(25-56)execute_job(59-155)
backend/app/models/collection.py (3)
backend/app/core/util.py (1)
now(15-16)backend/app/models/organization.py (1)
Organization(34-57)backend/app/models/project.py (1)
Project(29-63)
backend/app/services/collections/delete_collection.py (8)
backend/app/crud/collection/collection.py (3)
CollectionCrud(17-159)read_one(82-106)delete(120-128)backend/app/crud/collection/collection_job.py (3)
CollectionJobCrud(19-96)update(58-76)read_one(24-45)backend/app/crud/rag/open_ai.py (1)
OpenAIAssistantCrud(192-246)backend/app/models/collection_job.py (2)
CollectionJobStatus(11-15)CollectionJobUpdate(74-79)backend/app/models/collection.py (3)
Collection(13-36)DeletionRequest(116-117)ResponsePayload(39-51)backend/app/services/collections/helpers.py (8)
SilentCallback(85-92)WebHookCallback(95-118)success(81-82)success(90-92)success(116-118)fail(78-79)fail(86-88)fail(112-114)backend/app/celery/utils.py (1)
start_low_priority_job(46-71)backend/app/utils.py (1)
get_openai_client(175-205)
🪛 Ruff (0.13.3)
backend/app/services/collections/create_collection.py
52-52: Local variable collection_job is assigned to but never used
Remove assignment to unused variable collection_job
(F841)
backend/app/tests/services/collections/test_delete_collection.py
68-68: Local variable precreated is assigned to but never used
Remove assignment to unused variable precreated
(F841)
backend/app/services/collections/delete_collection.py
37-37: Local variable collection_job is assigned to but never used
Remove assignment to unused variable collection_job
(F841)
67-67: Unused function argument: task_instance
(ARG001)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: checks (3.11.7, 6)
🔇 Additional comments (13)
backend/app/tests/services/collections/test_delete_collection.py (2)
49-108: LGTM!The test correctly verifies that
start_jobupdates the CollectionJob and schedules the background task with appropriate parameters. The unusedprecreatedvariable (line 68) is acceptable here—the job is created for its side effects and fetched later via CRUD.
111-228: LGTM!Both tests effectively validate the
execute_jobworkflow:
- Success path verifies job transitions to SUCCESSFUL,
CollectionCrud.deleteis invoked, and OpenAI client is used.- Failure path confirms job is marked FAILED with the error message when
deleteraises an exception.The mocking strategy and assertions are appropriate.
backend/app/tests/services/collections/test_create_collection.py (2)
48-109: LGTM!The test correctly validates
start_jobbehavior, ensuring the CollectionJob is updated and the background task is scheduled with appropriate parameters. The unused variable at line 68 is acceptable—it's created for side effects.
111-190: LGTM!The test comprehensively validates the
execute_jobsuccess flow, including:
- AWS S3 mocking for document storage
- OpenAI client interaction
- Collection and DocumentCollection creation
- Job status updates
The test structure and assertions are thorough and appropriate.
backend/app/models/collection.py (5)
13-36: LGTM!The
Collectionmodel is properly structured with appropriate foreign keys, cascading deletes, and relationships. The timestamp fields follow the project's naming convention.
39-51: LGTM!The
ResponsePayloadmodel properly encapsulates response metadata for background jobs. The default factories forkeyandtime, along with thenow()classmethod, are well-designed.
54-69: LGTM!The
DocumentOptionsmodel is well-designed with appropriate defaults and helpful field descriptions. Themodel_post_inithook that deduplicates document IDs prevents duplicate processing.
72-102: LGTM!Both
AssistantOptionsandCallbackRequestare well-documented with detailed field descriptions. The temperature default and its explanation are particularly helpful for API users.
105-129: LGTM!The request models (
CreationRequest,DeletionRequest) andCollectionPublicare properly structured:
- Multiple inheritance pattern cleanly composes request models
extract_super_typeutility method is useful for extracting parent class fieldsCollectionPublicappropriately exposes only necessary fields for the public APIbackend/app/services/collections/create_collection.py (2)
41-71: LGTM!The
start_jobfunction correctly orchestrates job scheduling:
- Updates the CollectionJob with correlation/trace context
- Schedules the background task with appropriate parameters
- Returns the job ID for tracking
The "unused"
collection_jobvariable at line 52 is acceptable—the update is persisted via the CRUD method.
199-212: Outer exception handler risks UnboundLocalError.If an error occurs before line 95 (e.g., during
CreationRequestconstruction or job fetching),collection_job_crudandcollection_jobwon't be defined, causing lines 206-211 to raiseUnboundLocalError. Additionally, the session from line 89 is already closed when this handler executes.This issue was flagged in past reviews and noted for addressing in a future PR. Please ensure it's tracked in the GitHub issue for v2 session management improvements.
Based on past review comments.
backend/app/services/collections/delete_collection.py (2)
25-56: LGTM!The
start_jobfunction properly orchestrates collection deletion:
- Updates the CollectionJob with trace context
- Schedules the background task with all necessary parameters
- Returns the job ID for tracking
The "unused"
collection_jobvariable at line 37 is acceptable—the update is persisted via the CRUD method.
59-117: LGTM!The
execute_jobsuccess path is well-structured:
- Properly constructs request/payload models and selects the appropriate callback
- Updates job status to PROCESSING before attempting deletion
- Uses
CollectionJobUpdateschema correctly for job updates- Marks job SUCCESSFUL and triggers success callback upon completion
Summary
Target issue is #383 and #389
Checklist
Before submitting a pull request, please ensure that you mark these task.
fastapi run --reload app/main.pyordocker compose upin the repository root and test.Notes
Summary by CodeRabbit
New Features
Documentation
Refactor
Database
Helpers
Tests