Skip to content

Celery : moving collections module to celery#385

Merged
nishika26 merged 48 commits intomainfrom
feature/collection_to_celery
Oct 9, 2025
Merged

Celery : moving collections module to celery#385
nishika26 merged 48 commits intomainfrom
feature/collection_to_celery

Conversation

@nishika26
Copy link
Copy Markdown
Collaborator

@nishika26 nishika26 commented Sep 23, 2025

Summary

Target issue is #383 and #389

Checklist

Before submitting a pull request, please ensure that you mark these task.

  • Ran fastapi run --reload app/main.py or docker compose up in the repository root and test.
  • If you've fixed a bug or added code that is tested and has test cases.

Notes

  • This PR covers :
    • Shifting collection module's dependecy on project id instead of user id
    • Swapping Fastapi's background task template with celery
    • Introducing a new collection job table which takes care of retaining the job status, error message, etc. An entry will be made in the collection table only when the job has been successful

Summary by CodeRabbit

  • New Features

    • Job-based collection create/delete with background workers, webhook callbacks, polling endpoint for collection jobs, and richer public collection & collection-job metadata.
  • Documentation

    • Updated create/info/delete docs; added collection-job docs and polling guidance clarifying job responses.
  • Refactor

    • Routes/CRUD now project-scoped; collection endpoints delegate to service layer and return simplified public models.
  • Database

    • Added collection_jobs table; renamed timestamp and removed legacy columns from collections.
  • Helpers

    • Added error normalization, document batching, callback framework, and safe rollback utilities.
  • Tests

    • Added/updated tests for job workflows, services, CRUD, and collection lifecycle.

@coderabbitai
Copy link
Copy Markdown

coderabbitai bot commented Sep 23, 2025

Walkthrough

Adds DB-backed CollectionJob lifecycle and CRUD; moves collection create/delete into background services with start_job/execute_job; scopes collections to projects (removes owner_id); introduces new models, helpers, migrations, docs, routes, and tests to support job-based polling, callbacks, and error normalization.

Changes

Cohort / File(s) Summary
API Docs
backend/app/api/docs/collections/*.md
Update create/delete docs to return a collection_job; add job_info.md documenting polling-style job-info endpoint; clarify collection info is not a pollable create endpoint.
API Routes
backend/app/api/routes/collections.py, backend/app/api/routes/collection_job.py, backend/app/api/routes/documents.py
Routes moved to project-scoped CurrentUserOrgProject; create/delete now schedule background service jobs (start_job); new GET /collections/info/jobs/{job_id} returns CollectionJobPublic; document endpoints and CRUD use project_id.
Main Router
backend/app/api/main.py
Register new collection_job router.
CRUD: Collections & Jobs
backend/app/crud/__init__.py, backend/app/crud/collection/__init__.py, backend/app/crud/collection/collection.py, backend/app/crud/collection/collection_job.py
CollectionCrud now initialized with project_id (was owner_id) and raises 404 on missing; new CollectionJobCrud added; package re-exports updated.
Models
backend/app/models/__init__.py, backend/app/models/collection.py, backend/app/models/collection_job.py, backend/app/models/user.py
Remove owner_id, status, error_message from Collection; rename created_atinserted_at; add Creation/Deletion/Response payload models and CollectionPublic; add CollectionJob enums/models and public job types; remove User.collections relationship; expand model exports.
Services & Helpers
backend/app/services/collections/*.py
New service modules for create/delete with start_job/execute_job, batching, callback handlers (Silent/WebHook), error extraction (extract_error_message), assistant backout, and orchestration with CRUD/OpenAI/vector-store.
Migrations
backend/app/alembic/versions/*.py
Add migration to create collection_jobs table and enums; alter collection table (drop owner_id/status/error_message, rename created_at→inserted_at); add cleanup migration removing legacy non-successful rows.
Tests
backend/app/tests/...
Tests updated/added to exercise CollectionJob lifecycle, service execute flows, and project-scoped CRUD; test helpers adjusted to use get_project and project_id; some tests refactored to mock start_job.
Test Utils
backend/app/tests/utils/collection.py, backend/app/tests/utils/utils.py
get_collection signature changed to accept project_id; new/updated get_project helper used by tests.

Sequence Diagram(s)

sequenceDiagram
  autonumber
  participant Client
  participant API as Collections API
  participant Jobs as CollectionJobCrud
  participant BG as Scheduler
  participant Exec as Worker.execute_job
  participant DB
  participant OA as OpenAI/VectorStore
  rect rgba(200,240,200,0.18)
    Client->>API: POST /collections/create (CreationRequest)
    API->>Jobs: create(job PENDING, action=CREATE, project_id)
    API->>BG: schedule execute_job(job_id, payload, project_id, org_id)
    API-->>Client: APIResponse { key: job_id }
  end
  rect rgba(220,220,255,0.12)
    BG->>Exec: invoke execute_job(job_id,...)
    Exec->>DB: open session
    Exec->>Jobs: update(job_id, status=PROCESSING, task_id)
    Exec->>OA: create vector store, upload & batch documents
    Exec->>DB: create Collection, link docs, create assistant
    Exec->>Jobs: update(job_id, status=SUCCESSFUL, collection_id)
    Exec-->>Client: optional webhook callback (SUCCESS)
  end
  alt failure
    Exec->>Jobs: update(job_id, status=FAILED, error_message)
    Exec-->>Client: optional webhook callback (FAIL)
  end
Loading
sequenceDiagram
  autonumber
  participant Client
  participant API as Collection Job API
  participant Jobs as CollectionJobCrud
  participant Cols as CollectionCrud
  Client->>API: GET /collections/info/jobs/{job_id}
  API->>Jobs: read_one(job_id)
  alt job.status == SUCCESSFUL and job.collection_id
    API->>Cols: read_one(collection_id)
    API-->>Client: APIResponse { data: CollectionPublic (embedded) }
  else job.status == FAILED
    API-->>Client: APIResponse { data: CollectionJobPublic (normalized error) }
  else
    API-->>Client: APIResponse { data: CollectionJobPublic (PENDING/PROCESSING) }
  end
Loading

Estimated code review effort

🎯 5 (Critical) | ⏱️ ~120 minutes

Possibly related PRs

Suggested labels

ready-for-review

Suggested reviewers

  • avirajsingh7
  • AkhileshNegi
  • kartpop

Poem

Thump-thump, I queue the job and hop along,
Batches hum and vectors hum a brighter song.
A job ID key, a webhook if it's done,
I tidy up softly when failures come.
Hooray — projects guide my carrot-chasing fun! 🥕🐰

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 50.00% which is insufficient. The required threshold is 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title Check ✅ Passed The title “Celery : moving collections module to celery” clearly indicates the primary change of migrating the collections workflow to Celery-based processing, which aligns with the core update of replacing FastAPI background tasks with Celery for collection operations. While the phrasing is slightly redundant, it accurately reflects the main focus of this pull request.
✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch feature/collection_to_celery

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 6

♻️ Duplicate comments (2)
backend/app/alembic/versions/b30727137e65_adding_collection_job_table_and_alter_collection_table.py (2)

46-46: Duplicate: task_id should be TEXT/VARCHAR, not UUID.

As flagged in a previous review comment, Celery task IDs are strings (not UUIDs), so this column type is incorrect. This issue was already raised but has not been addressed.

Apply this diff:

-        sa.Column("task_id", sa.Uuid(), nullable=True),
+        sa.Column("task_id", sa.String(), nullable=True),

46-46: Critical: task_id column type should be String, not UUID.

Celery generates task IDs as strings (typically 36-character hyphenated strings like "a5f3c8d2-1b4e-4f9a-8c7d-2e5f6a1b8c9d"), not as native UUID types. Using sa.Uuid() will cause type mismatch errors when storing Celery task IDs.

Apply this diff:

-        sa.Column("task_id", sa.Uuid(), nullable=True),
+        sa.Column("task_id", sa.String(), nullable=True),

Note: This issue was previously flagged by @coderabbitai[bot] and remains unresolved.

🧹 Nitpick comments (1)
backend/app/alembic/versions/b30727137e65_adding_collection_job_table_and_alter_collection_table.py (1)

37-38: Add explicit ENUM type cleanup in downgrade.

The upgrade() function creates two ENUM types (lines 37-38), but the downgrade() function (line 94) doesn't explicitly drop them. While PostgreSQL may automatically clean up unused ENUMs when the table is dropped, explicit cleanup ensures complete and predictable reversal.

Add ENUM cleanup to the downgrade function:

     op.alter_column("collection", "inserted_at", new_column_name="created_at")
     op.drop_table("collection_jobs")
+    
+    # Explicitly drop ENUM types created in upgrade
+    collection_action_type.drop(op.get_bind(), checkfirst=True)
+    collection_job_status_enum.drop(op.get_bind(), checkfirst=True)

Note: Use checkfirst=True to avoid errors if the types are still referenced elsewhere or have already been dropped.

Also applies to: 94-94

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 81e0890 and 5b6f2f2.

📒 Files selected for processing (2)
  • backend/app/alembic/versions/7ab577d3af26_delete_non_successful_columns_from_collection_table.py (1 hunks)
  • backend/app/alembic/versions/b30727137e65_adding_collection_job_table_and_alter_collection_table.py (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (2)
backend/app/alembic/versions/7ab577d3af26_delete_non_successful_columns_from_collection_table.py (1)
backend/app/alembic/versions/b30727137e65_adding_collection_job_table_and_alter_collection_table.py (2)
  • upgrade (36-61)
  • downgrade (64-94)
backend/app/alembic/versions/b30727137e65_adding_collection_job_table_and_alter_collection_table.py (1)
backend/app/alembic/versions/7ab577d3af26_delete_non_successful_columns_from_collection_table.py (2)
  • upgrade (20-26)
  • downgrade (29-30)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: checks (3.11.7, 6)
🔇 Additional comments (2)
backend/app/alembic/versions/b30727137e65_adding_collection_job_table_and_alter_collection_table.py (1)

57-61: project_id is already on collection; removing owner_id is safe
Migration 3389c67fdcb4_add_alter_columns_in_collections_table.py adds a nullable project_id column to collection, so dropping owner_id here aligns with project-scoped ownership.

backend/app/alembic/versions/7ab577d3af26_delete_non_successful_columns_from_collection_table.py (1)

20-30: Verify permanent data loss manually.

Automatic script execution failed (sqlalchemy unavailable). Run a SQL query (e.g., via psql):

psql $DATABASE_URL -c "SELECT status, COUNT(*) FROM collection WHERE status IN ('processing','failed') GROUP BY status;"

Confirm that deleting these rows is acceptable. If any data must be retained, archive it to a separate table or backup before applying this migration.

Copy link
Copy Markdown
Collaborator

@kartpop kartpop left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

general comments

  • is the __init__.py in services/collections required?

description=load_description("collections/job_info.md"),
response_model=APIResponse,
)
def collection_job_info(
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let us keep this endpoint in a separate file to be consistent across the codebase; we already have separate files for documents.py and doc_transformation_job.py

exc_info=True,
)

if "assistant" in locals():
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

any reason to rely on locals()? simpler way would be to check for assistant and assistant.id and _backout if it does

"locals() is not reliable for checking variable existence - it reflects the state at the time it's called, but the interpreter may optimize variables differently"

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes simpler way would be to check if we have an assistant id and if we do, then we go ahead and do the cleanup. however i would have to spend a little more time on that as it is not that straightforward. will cover it in the assistant cleanup pr i will be making for v2

Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 5

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
backend/app/crud/collection/collection.py (1)

67-85: Creation flow is incorrect: broad except swallows duplicates and other errors.

If read_one finds an existing collection, you raise FileExistsError but immediately catch it and still add, causing IntegrityError. Also using read_one for existence is wrong and not scoped to service fields.

-        try:
-            existing = self.read_one(collection.id)
-
-            raise FileExistsError("Collection already present")
-        except:
-            self.session.add(collection)
-            self.session.commit()
+        # Idempotency and scope checks
+        if self._exists(collection):
+            raise FileExistsError("Collection already present for this project and service")
+        if not collection.project_id:
+            collection.project_id = self.project_id
+        self.session.add(collection)
+        self.session.commit()
+        self.session.refresh(collection)
♻️ Duplicate comments (9)
backend/app/alembic/versions/7ab577d3af26_delete_non_successful_columns_from_collection_table.py (2)

4-4: Docstring inconsistency has been resolved.

The docstring now correctly states "Revises: c6fb6d0b5897", which matches the down_revision value on line 15. This resolves the critical issue flagged in previous reviews.


14-15: Migration revision chain is now correct.

The down_revision is properly set to "c6fb6d0b5897", ensuring this migration runs after c6fb6d0b5897 and before b30727137e65 (which drops the status column). The migration order is now consistent.

backend/app/models/collection.py (1)

79-79: Typo: “compatable” → “compatible”.

-            "must be compatable with the assistants API; see the "
+            "must be compatible with the assistants API; see the "
backend/app/services/collections/create_collection.py (2)

187-189: Avoid locals() for existence checks.

Initialize assistant to None and check explicitly.

-                if "assistant" in locals():
-                    _backout(assistant_crud, assistant.id)
+                if assistant is not None:
+                    _backout(assistant_crud, assistant.id)

Add near line 119 (before the inner try):

+                assistant = None

197-207: Fix outer exception handler: undefined err, closed session, and ensure job is marked FAILED.

Use outer_err, reopen a session, and send callback failure. Based on learnings

     except Exception as outer_err:
         logger.error(
             "[create_collection.execute_job] Unexpected Error during collection creation: %s",
             str(outer_err),
             exc_info=True,
         )
-
-        collection_job.status = CollectionJobStatus.FAILED
-        collection_job.updated_at = now()
-        collection_job.error_message = str(err)
-        collection_job_crud.update(collection_job.id, collection_job)
+        # Best-effort: update job state in a fresh session
+        with Session(engine) as session:
+            job_crud = CollectionJobCrud(session, project_id)
+            try:
+                job = job_crud.read_one(UUID(job_id) if isinstance(job_id, str) else job_id)
+            except Exception:
+                logger.exception(
+                    "[create_collection.execute_job] Unable to mark collection job as failed | {'job_id': '%s'}",
+                    str(job_id),
+                )
+            else:
+                job.status = CollectionJobStatus.FAILED
+                job.error_message = str(outer_err)
+                job_crud.update(job.id, job)
+        if 'callback' in locals():
+            callback.fail(str(outer_err))
backend/app/tests/api/routes/collections/test_create_collections.py (1)

2-5: Remove duplicate patch import.

Keep a single import to satisfy Ruff F811.

-from unittest.mock import patch
-
 from fastapi.testclient import TestClient
 from unittest.mock import patch
+from uuid import UUID
backend/app/models/collection_job.py (2)

71-77: Align update schema with actual types and usage.

  • task_id must be str | None (Celery task IDs are strings).
  • Prior feedback: consider including id and project_id or change CRUD to accept CollectionJob. Current CRUD ignores payload but type hints are misleading. Based on learnings
-class CollectionJobUpdate(SQLModel):
-    task_id: UUID | None = None
-    status: CollectionJobStatus
-    error_message: str | None = None
-    collection_id: UUID | None = None
-
-    updated_at: datetime
+class CollectionJobUpdate(SQLModel):
+    # Optional: include identifiers if CRUD expects them at type-level
+    # id: UUID
+    # project_id: int
+    task_id: str | None = None
+    status: CollectionJobStatus
+    error_message: str | None = None
+    collection_id: UUID | None = None
+    # Let CRUD set timestamps
+    updated_at: datetime | None = None

47-49: Make task_id optional; it’s not available at creation.

As written, task_id is required and will fail on create(). Celery provides it later.

-    task_id: str = Field(nullable=True)
+    task_id: str | None = Field(default=None, nullable=True)
backend/app/services/collections/delete_collection.py (1)

124-137: Robust failure handling: avoid UnboundLocalError/closed session; fix update call.

  • Don’t use collection_job/collection_job_crud after session closes.
  • Current update(collection_job) has wrong arity.
  • Log with safe IDs even if collection wasn’t loaded. Based on learnings
-    except Exception as err:
-        collection_job.status = CollectionJobStatus.FAILED
-        collection_job.error_message = str(err)
-        collection_job_crud.update(collection_job)
-
-        logger.error(
-            "[delete_collection.execute_job] Unexpected error during deletion | "
-            "{'collection_id': '%s', 'error': '%s', 'error_type': '%s', 'job_id': '%s'}",
-            str(collection.id),
-            str(err),
-            type(err).__name__,
-            str(job_id),
-            exc_info=True,
-        )
-        callback.fail(str(err))
+    except Exception as err:
+        logger.error(
+            "[delete_collection.execute_job] Unexpected error during deletion | "
+            "{'collection_id': '%s', 'error': '%s', 'error_type': '%s', 'job_id': '%s'}",
+            str(collection_id),
+            str(err),
+            type(err).__name__,
+            str(job_id),
+            exc_info=True,
+        )
+        # Reopen session to update job state
+        with Session(engine) as session:
+            job_crud = CollectionJobCrud(session, project_id)
+            try:
+                job = job_crud.read_one(job_id)
+            except Exception:
+                logger.exception(
+                    "[delete_collection.execute_job] Unable to mark collection job as failed | {'job_id': '%s'}",
+                    str(job_id),
+                )
+            else:
+                job.status = CollectionJobStatus.FAILED
+                job.error_message = str(err)
+                job_crud.update(job.id, job)
+        if 'callback' in locals():
+            callback.fail(str(err))
🧹 Nitpick comments (21)
backend/app/alembic/versions/7ab577d3af26_delete_non_successful_columns_from_collection_table.py (1)

29-30: Document the no-op downgrade.

The empty downgrade function is intentional for a data cleanup migration, but consider adding a comment explaining that the deleted data cannot be recovered on rollback.

Apply this diff to document the design decision:

 def downgrade():
+    # No-op: Data cleanup is irreversible. Deleted rows cannot be restored.
     pass
backend/app/models/collection.py (5)

3-3: Use builtin generics; drop typing.List (UP035).

Prefer built‑in collection types with Pydantic v2 / SQLModel. Also remove unused List import.

Based on static analysis hints

-from typing import Any, List, Optional
+from typing import Any, Optional
@@
-    documents: List[UUID] = Field(
+    documents: list[UUID] = Field(

Also applies to: 56-56


31-33: Make timestamps non-nullable for consistency.

Align with Organization/Project models; mark inserted_at/updated_at as nullable=False.

Based on learnings

-    inserted_at: datetime = Field(default_factory=now)
-    updated_at: datetime = Field(default_factory=now)
+    inserted_at: datetime = Field(default_factory=now, nullable=False)
+    updated_at: datetime = Field(default_factory=now, nullable=False)

68-70: Deduplicate while preserving order.

set() reorders UUIDs; use dict.fromkeys to keep stable order.

-    def model_post_init(self, __context: Any):
-        self.documents = list(set(self.documents))
+    def model_post_init(self, __context: Any):
+        # Deduplicate while preserving order
+        self.documents = list(dict.fromkeys(self.documents))

73-75: Fix typo in comment: "clien" → "client".

-    # parameters accepted by the OpenAI.clien.beta.assistants.create
+    # parameters accepted by the OpenAI client (e.g., openai.beta.assistants.create)

110-114: Use Pydantic v2 model_fields and correct type hint.

__fields__ is v1; prefer model_fields with fallback. Accept a model class type.

-    def extract_super_type(self, cls: "CreationRequest"):
-        for field_name in cls.__fields__.keys():
-            field_value = getattr(self, field_name)
-            yield (field_name, field_value)
+    def extract_super_type(self, cls: type[SQLModel]):
+        # Pydantic v2 uses `model_fields`; keep fallback for compatibility
+        fields_map = getattr(cls, "model_fields", None) or getattr(cls, "__fields__", {})
+        for field_name in fields_map.keys():
+            yield field_name, getattr(self, field_name)
backend/app/crud/collection/collection.py (5)

5-5: Remove duplicate import.

import logging is already at Line 1.

-import logging

24-30: Clarify error message and fix typo.

“attempter” typo and confusing labels; make explicit.

-            err = "Invalid collection ownership: owner_project={} attempter={}".format(
-                self.project_id,
-                collection.project_id,
-            )
+            err = (
+                f"Invalid collection ownership: expected_project={self.project_id}, "
+                f"collection_project={collection.project_id}"
+            )

49-65: Existence check should ignore soft-deleted records and log scope.

Prevent false positives from soft-deleted collections; enrich log context.

         stmt = (
             select(Collection.id)
             .where(
-                (Collection.project_id == self.project_id)
-                & (Collection.llm_service_id == collection.llm_service_id)
-                & (Collection.llm_service_name == collection.llm_service_name)
+                (Collection.project_id == self.project_id)
+                & (Collection.llm_service_id == collection.llm_service_id)
+                & (Collection.llm_service_name == collection.llm_service_name)
+                & (Collection.deleted_at.is_(None))
             )
             .limit(1)
         )
         present = self.session.exec(stmt).first() is not None
 
-        logger.info(
-            "[CollectionCrud._exists] Existence check completed | "
-            f"{{'llm_service_id': '{collection.llm_service_id}', 'exists': {present}}}"
-        )
+        logger.info(
+            "[CollectionCrud._exists] Existence check | "
+            f\"{{'project_id': '{self.project_id}', 'llm_service_id': '{collection.llm_service_id}', "
+            f"'llm_service_name': '{collection.llm_service_name}', 'exists': {present}}}\"
+        )
         return present

135-142: Guard remote delete when llm_service_id is None.

Avoid calling remote.delete(None) when the local link is missing.

-        remote.delete(model.llm_service_id)
+        if model.llm_service_id:
+            remote.delete(model.llm_service_id)

146-158: Scope document-based deletion to project and exclude soft-deleted collections.

Prevent cross-project deletions and skip already soft-deleted rows.

         statement = (
             select(Collection)
             .join(
                 DocumentCollection,
                 DocumentCollection.collection_id == Collection.id,
             )
-            .where(DocumentCollection.document_id == model.id)
+            .where(
+                and_(
+                    DocumentCollection.document_id == model.id,
+                    Collection.project_id == self.project_id,
+                    Collection.deleted_at.is_(None),
+                )
+            )
             .distinct()
         )
backend/app/tests/crud/collections/collection_job/test_collection_jobs.py (2)

70-70: Pass UUID, not str, to read_one.

Crud expects UUID; avoid implicit casting.

-    retrieved_job = collection_job_crud.read_one(str(collection_job.id))
+    retrieved_job = collection_job_crud.read_one(collection_job.id)

84-84: Unnecessary commit in test.

Helper already commits; this can be removed to speed tests.

-    db.commit()
+    # commit not required here
backend/app/api/routes/collection_job.py (1)

27-29: Prefer | over typing.Union in annotations.

Cleaner and satisfies Ruff UP007.

-    response_model=Union[
-        APIResponse[CollectionPublic], APIResponse[CollectionJobPublic]
-    ],
+    response_model=APIResponse[CollectionPublic] | APIResponse[CollectionJobPublic],
backend/app/services/collections/delete_collection.py (3)

91-101: Set job to PROCESSING when work starts.

Mirror create flow; improves observability and client UX.

     collection_job = collection_job_crud.read_one(job_id)
 
     collection_job.task_id = task_id
+    collection_job.status = CollectionJobStatus.PROCESSING
+    collection_job_crud.update(collection_job.id, collection_job)

65-74: Minor: unused task_instance.

Prefix with underscore to silence Ruff ARG001 and signal intent.

-    task_instance,
+    _task_instance,

66-73: Optional: accept job_id as str and cast to UUID internally (Celery passes str).

Keeps signatures consistent with create flow.

-    task_id: str,
-    job_id: UUID,
+    task_id: str,
+    job_id: str,
     collection_id: UUID,
     _task_instance,
 ) -> None:
+    # Cast once at the start of the function body:
+    job_id = UUID(job_id)
backend/app/services/collections/create_collection.py (1)

86-86: Minor: unused task_instance.

Prefix with underscore to silence Ruff ARG001 and signal intent.

-    task_instance,
+    _task_instance,
backend/app/api/routes/collections.py (3)

4-4: Swap to builtin list (and drop unused Union).

Ruff already points out typing.List; please switch to the builtin list annotation and remove the unused Union import while you’re here.


46-47: Lean on CollectionJobStatus for payload status.

We already import CollectionJobStatus; wiring payload.status = CollectionJobStatus.PENDING.value keeps the metadata aligned with the enum the rest of the flow uses (and avoids ad‑hoc strings like "processing").


80-81: Apply the same enum alignment in delete payloads.

Same reasoning as above—set the payload status via CollectionJobStatus.PENDING.value instead of a bare "processing" string so clients see consistent status values.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 5b6f2f2 and 4a1f5ff.

📒 Files selected for processing (15)
  • backend/app/alembic/versions/7ab577d3af26_delete_non_successful_columns_from_collection_table.py (1 hunks)
  • backend/app/alembic/versions/b30727137e65_adding_collection_job_table_and_alter_collection_table.py (1 hunks)
  • backend/app/api/docs/collections/create.md (1 hunks)
  • backend/app/api/main.py (1 hunks)
  • backend/app/api/routes/collection_job.py (1 hunks)
  • backend/app/api/routes/collections.py (4 hunks)
  • backend/app/crud/collection/collection.py (5 hunks)
  • backend/app/models/collection.py (2 hunks)
  • backend/app/models/collection_job.py (1 hunks)
  • backend/app/services/collections/create_collection.py (1 hunks)
  • backend/app/services/collections/delete_collection.py (1 hunks)
  • backend/app/tests/api/routes/collections/test_create_collections.py (1 hunks)
  • backend/app/tests/crud/collections/collection_job/test_collection_jobs.py (1 hunks)
  • backend/app/tests/services/collections/test_create_collection.py (1 hunks)
  • backend/app/tests/services/collections/test_delete_collection.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (3)
  • backend/app/tests/services/collections/test_create_collection.py
  • backend/app/tests/services/collections/test_delete_collection.py
  • backend/app/alembic/versions/b30727137e65_adding_collection_job_table_and_alter_collection_table.py
🧰 Additional context used
🧬 Code graph analysis (10)
backend/app/tests/api/routes/collections/test_create_collections.py (3)
backend/app/models/collection.py (2)
  • Collection (13-36)
  • CreationRequest (105-113)
backend/app/core/cloud/storage.py (1)
  • client (30-42)
backend/app/tests/conftest.py (2)
  • client (52-55)
  • user_api_key_header (77-79)
backend/app/tests/crud/collections/collection_job/test_collection_jobs.py (4)
backend/app/models/collection_job.py (3)
  • CollectionJob (34-60)
  • CollectionJobStatus (10-14)
  • CollectionActionType (17-19)
backend/app/crud/collection/collection_job.py (1)
  • CollectionJobCrud (19-93)
backend/app/tests/utils/utils.py (1)
  • get_project (70-89)
backend/app/tests/conftest.py (1)
  • db (24-41)
backend/app/api/routes/collections.py (8)
backend/app/crud/collection/collection.py (3)
  • CollectionCrud (18-163)
  • read_one (86-110)
  • read_all (112-121)
backend/app/crud/collection/collection_job.py (3)
  • CollectionJobCrud (19-93)
  • read_one (24-47)
  • read_all (49-58)
backend/app/crud/document_collection.py (1)
  • DocumentCollectionCrud (8-48)
backend/app/models/collection.py (4)
  • ResponsePayload (39-51)
  • CreationRequest (105-113)
  • DeletionRequest (116-117)
  • CollectionPublic (120-129)
backend/app/utils.py (3)
  • APIResponse (29-53)
  • load_description (248-253)
  • success_response (36-39)
backend/app/services/collections/helpers.py (1)
  • extract_error_message (23-46)
backend/app/services/collections/create_collection.py (1)
  • start_job (41-76)
backend/app/services/collections/delete_collection.py (1)
  • start_job (25-62)
backend/app/models/collection.py (3)
backend/app/core/util.py (1)
  • now (15-16)
backend/app/models/organization.py (1)
  • Organization (34-57)
backend/app/models/project.py (1)
  • Project (29-63)
backend/app/alembic/versions/7ab577d3af26_delete_non_successful_columns_from_collection_table.py (1)
backend/app/alembic/versions/b30727137e65_adding_collection_job_table_and_alter_collection_table.py (2)
  • upgrade (36-61)
  • downgrade (64-100)
backend/app/api/routes/collection_job.py (6)
backend/app/crud/collection/collection.py (2)
  • CollectionCrud (18-163)
  • read_one (86-110)
backend/app/crud/collection/collection_job.py (1)
  • CollectionJobCrud (19-93)
backend/app/models/collection_job.py (2)
  • CollectionJobStatus (10-14)
  • CollectionJobPublic (80-87)
backend/app/models/collection.py (1)
  • CollectionPublic (120-129)
backend/app/utils.py (3)
  • APIResponse (29-53)
  • load_description (248-253)
  • success_response (36-39)
backend/app/services/collections/helpers.py (1)
  • extract_error_message (23-46)
backend/app/services/collections/delete_collection.py (8)
backend/app/crud/collection/collection.py (4)
  • CollectionCrud (18-163)
  • create (67-84)
  • read_one (86-110)
  • delete (124-132)
backend/app/crud/collection/collection_job.py (4)
  • CollectionJobCrud (19-93)
  • create (76-93)
  • read_one (24-47)
  • update (60-74)
backend/app/crud/rag/open_ai.py (1)
  • OpenAIAssistantCrud (192-246)
backend/app/models/collection.py (3)
  • Collection (13-36)
  • DeletionRequest (116-117)
  • ResponsePayload (39-51)
backend/app/services/collections/helpers.py (8)
  • SilentCallback (81-88)
  • WebHookCallback (91-114)
  • success (77-78)
  • success (86-88)
  • success (112-114)
  • fail (74-75)
  • fail (82-84)
  • fail (108-110)
backend/app/celery/utils.py (1)
  • start_low_priority_job (46-71)
backend/app/utils.py (1)
  • get_openai_client (175-205)
backend/app/services/collections/create_collection.py (2)
  • start_job (41-76)
  • execute_job (79-207)
backend/app/services/collections/create_collection.py (9)
backend/app/core/cloud/storage.py (1)
  • get_cloud_storage (262-279)
backend/app/models/collection.py (6)
  • now (49-51)
  • Collection (13-36)
  • ResponsePayload (39-51)
  • CreationRequest (105-113)
  • AssistantOptions (72-95)
  • extract_super_type (110-113)
backend/app/crud/collection/collection.py (3)
  • CollectionCrud (18-163)
  • create (67-84)
  • read_one (86-110)
backend/app/crud/collection/collection_job.py (4)
  • CollectionJobCrud (19-93)
  • create (76-93)
  • read_one (24-47)
  • update (60-74)
backend/app/crud/rag/open_ai.py (2)
  • OpenAIVectorStoreCrud (101-189)
  • OpenAIAssistantCrud (192-246)
backend/app/services/collections/helpers.py (10)
  • _backout (117-124)
  • batch_documents (50-66)
  • SilentCallback (81-88)
  • WebHookCallback (91-114)
  • success (77-78)
  • success (86-88)
  • success (112-114)
  • fail (74-75)
  • fail (82-84)
  • fail (108-110)
backend/app/celery/utils.py (1)
  • start_low_priority_job (46-71)
backend/app/utils.py (1)
  • get_openai_client (175-205)
backend/app/services/collections/delete_collection.py (2)
  • start_job (25-62)
  • execute_job (65-138)
backend/app/crud/collection/collection.py (3)
backend/app/models/collection.py (2)
  • Collection (13-36)
  • now (49-51)
backend/app/models/document_collection.py (1)
  • DocumentCollection (9-23)
backend/app/crud/document_collection.py (1)
  • DocumentCollectionCrud (8-48)
backend/app/models/collection_job.py (1)
backend/app/models/collection.py (1)
  • now (49-51)
🪛 Ruff (0.13.3)
backend/app/tests/api/routes/collections/test_create_collections.py

5-5: Redefinition of unused patch from line 2

Remove definition: patch

(F811)

backend/app/api/routes/collections.py

4-4: typing.List is deprecated, use list instead

(UP035)

backend/app/models/collection.py

3-3: typing.List is deprecated, use list instead

(UP035)

backend/app/api/routes/collection_job.py

27-29: Use X | Y for type annotations

(UP007)

backend/app/services/collections/delete_collection.py

73-73: Unused function argument: task_instance

(ARG001)

backend/app/services/collections/create_collection.py

206-206: Undefined name err

(F821)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: checks (3.11.7, 6)
🔇 Additional comments (4)
backend/app/alembic/versions/7ab577d3af26_delete_non_successful_columns_from_collection_table.py (1)

20-26: All foreign key constraints referencing collection.id use ON DELETE CASCADE. No non-cascading foreign keys found.

backend/app/api/main.py (1)

23-24: LGTM — route included.

Adding collection_job router looks correct.

Also applies to: 31-31

backend/app/api/docs/collections/create.md (1)

23-26: LGTM — docs aligned with job-based flow.

Clear that key refers to collection job ID and polling via job-info endpoint.

Consider linking the exact route (path/method) for “collection job info” here for quick navigation.

backend/app/services/collections/create_collection.py (1)

156-160: Remove duplicate create; it re-adds the same collection.

Second create risks integrity errors and is redundant.

-                if flat_docs:
-                    DocumentCollectionCrud(session).create(collection_data, flat_docs)
-
-                collection_crud.create(collection_data)
+                if flat_docs:
+                    DocumentCollectionCrud(session).create(collection_data, flat_docs)

Likely an incorrect or invalid review comment.

Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 5

♻️ Duplicate comments (2)
backend/app/services/collections/create_collection.py (1)

195-204: Fix NameError and handle missing variables in outer exception handler.

Line 203 references err (undefined) instead of outer_err. Also, if the error occurs before collection_job/collection_job_crud are initialized (e.g., during session creation), accessing them raises NameError or UnboundLocalError, masking the original error and leaving the job stuck in PENDING.

Reopen a fresh session and defensively update the job:

     except Exception as outer_err:
         logger.error(
             "[create_collection.execute_job] Unexpected Error during collection creation: %s",
             str(outer_err),
             exc_info=True,
         )
 
-        collection_job.status = CollectionJobStatus.FAILED
-        collection_job.error_message = str(err)
-        collection_job_crud.update(collection_job.id, collection_job)
+        with Session(engine) as session:
+            job_crud = CollectionJobCrud(session, project_id)
+            try:
+                job = job_crud.read_one(job_id)
+                update_data = CollectionJobUpdate(
+                    status=CollectionJobStatus.FAILED,
+                    error_message=str(outer_err),
+                )
+                job_crud.update(job.id, update_data)
+            except Exception:
+                logger.exception(
+                    "[create_collection.execute_job] Unable to mark job as failed | "
+                    "{'job_id': '%s'}",
+                    str(job_id),
+                )
+
+        if "callback" in locals():
+            callback.fail(str(outer_err))

Import CollectionJobUpdate if not already present (same as previous comment).

backend/app/services/collections/delete_collection.py (1)

124-138: Fix incorrect update call and handle missing variables.

Line 127 calls update with one argument (collection_job) but the signature requires two: job_id and patch: CollectionJobUpdate. Additionally, if the error occurs before these variables are defined, the handler raises NameError/UnboundLocalError, masking the root cause.

Reopen a session and update defensively:

     except Exception as err:
+        logger.error(
+            "[delete_collection.execute_job] Unexpected error during deletion | "
+            "{'collection_id': '%s', 'error': '%s', 'error_type': '%s', 'job_id': '%s'}",
+            str(collection_id),
+            str(err),
+            type(err).__name__,
+            str(job_id),
+            exc_info=True,
+        )
+
+        with Session(engine) as session:
+            job_crud = CollectionJobCrud(session, project_id)
+            try:
+                job = job_crud.read_one(job_id)
+                update_data = CollectionJobUpdate(
+                    status=CollectionJobStatus.FAILED,
+                    error_message=str(err),
+                )
+                job_crud.update(job.id, update_data)
+            except Exception:
+                logger.exception(
+                    "[delete_collection.execute_job] Unable to mark job as failed | "
+                    "{'job_id': '%s'}",
+                    str(job_id),
+                )
+
-        collection_job.status = CollectionJobStatus.FAILED
-        collection_job.error_message = str(err)
-        collection_job_crud.update(collection_job)
-
-        logger.error(
-            "[delete_collection.execute_job] Unexpected error during deletion | "
-            "{'collection_id': '%s', 'error': '%s', 'error_type': '%s', 'job_id': '%s'}",
-            str(collection.id),
-            str(err),
-            type(err).__name__,
-            str(job_id),
-            exc_info=True,
-        )
         callback.fail(str(err))

Import CollectionJobUpdate (same as previous comment).

🧹 Nitpick comments (1)
backend/app/crud/collection/collection_job.py (1)

3-3: Use modern list type hint instead of deprecated typing.List.

Python 3.9+ supports native list for type hints.

Apply this diff:

-from typing import List
+from typing import TYPE_CHECKING
-    def read_all(self) -> List[CollectionJob]:
+    def read_all(self) -> list[CollectionJob]:
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 4a1f5ff and 14d50a8.

📒 Files selected for processing (5)
  • backend/app/crud/collection/collection_job.py (1 hunks)
  • backend/app/models/collection_job.py (1 hunks)
  • backend/app/services/collections/create_collection.py (1 hunks)
  • backend/app/services/collections/delete_collection.py (1 hunks)
  • backend/app/tests/crud/collections/collection_job/test_collection_jobs.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • backend/app/tests/crud/collections/collection_job/test_collection_jobs.py
🧰 Additional context used
🧬 Code graph analysis (3)
backend/app/crud/collection/collection_job.py (3)
backend/app/models/collection_job.py (3)
  • CollectionJob (34-60)
  • CollectionJobUpdate (71-75)
  • CollectionJobCreate (63-68)
backend/app/models/collection.py (1)
  • now (49-51)
backend/app/crud/collection/collection.py (3)
  • read_one (86-110)
  • read_all (112-121)
  • create (67-84)
backend/app/services/collections/delete_collection.py (9)
backend/app/crud/collection/collection.py (4)
  • CollectionCrud (18-163)
  • create (67-84)
  • read_one (86-110)
  • delete (124-132)
backend/app/crud/collection/collection_job.py (4)
  • CollectionJobCrud (19-97)
  • create (80-97)
  • read_one (24-47)
  • update (60-78)
backend/app/crud/rag/open_ai.py (1)
  • OpenAIAssistantCrud (192-246)
backend/app/models/collection_job.py (3)
  • CollectionJob (34-60)
  • CollectionJobStatus (10-14)
  • CollectionActionType (17-19)
backend/app/models/collection.py (3)
  • Collection (13-36)
  • DeletionRequest (116-117)
  • ResponsePayload (39-51)
backend/app/services/collections/helpers.py (8)
  • SilentCallback (81-88)
  • WebHookCallback (91-114)
  • success (77-78)
  • success (86-88)
  • success (112-114)
  • fail (74-75)
  • fail (82-84)
  • fail (108-110)
backend/app/celery/utils.py (1)
  • start_low_priority_job (46-71)
backend/app/utils.py (1)
  • get_openai_client (175-205)
backend/app/services/collections/create_collection.py (2)
  • start_job (41-76)
  • execute_job (79-204)
backend/app/services/collections/create_collection.py (10)
backend/app/core/cloud/storage.py (1)
  • get_cloud_storage (262-279)
backend/app/models/collection.py (6)
  • now (49-51)
  • Collection (13-36)
  • ResponsePayload (39-51)
  • CreationRequest (105-113)
  • AssistantOptions (72-95)
  • extract_super_type (110-113)
backend/app/crud/collection/collection.py (3)
  • CollectionCrud (18-163)
  • create (67-84)
  • read_one (86-110)
backend/app/crud/document.py (1)
  • DocumentCrud (14-135)
backend/app/crud/document_collection.py (1)
  • DocumentCollectionCrud (8-48)
backend/app/crud/collection/collection_job.py (4)
  • CollectionJobCrud (19-97)
  • create (80-97)
  • read_one (24-47)
  • update (60-78)
backend/app/crud/rag/open_ai.py (2)
  • OpenAIVectorStoreCrud (101-189)
  • OpenAIAssistantCrud (192-246)
backend/app/services/collections/helpers.py (10)
  • _backout (117-124)
  • batch_documents (50-66)
  • SilentCallback (81-88)
  • WebHookCallback (91-114)
  • success (77-78)
  • success (86-88)
  • success (112-114)
  • fail (74-75)
  • fail (82-84)
  • fail (108-110)
backend/app/celery/utils.py (1)
  • start_low_priority_job (46-71)
backend/app/utils.py (1)
  • get_openai_client (175-205)
🪛 Ruff (0.13.3)
backend/app/crud/collection/collection_job.py

3-3: typing.List is deprecated, use list instead

(UP035)

backend/app/services/collections/delete_collection.py

73-73: Unused function argument: task_instance

(ARG001)

backend/app/services/collections/create_collection.py

203-203: Undefined name err

(F821)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: checks (3.11.7, 6)

Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

♻️ Duplicate comments (1)
backend/app/services/collections/create_collection.py (1)

193-202: Fix NameError and potential UnboundLocalError in outer exception handler.

The outer exception handler has two critical issues:

  1. Line 201: References undefined variable err instead of outer_err, which will raise NameError when this handler is triggered.
  2. Lines 200-202: Assumes collection_job and collection_job_crud are in scope, but if the exception occurs before line 100-101 (e.g., during request deserialization at lines 95-96), these variables won't exist, causing UnboundLocalError.

To fix both issues safely, open a fresh session and update the job status within a try-except block:

     except Exception as outer_err:
         logger.error(
             "[create_collection.execute_job] Unexpected Error during collection creation: %s",
             str(outer_err),
             exc_info=True,
         )
 
-        collection_job.status = CollectionJobStatus.FAILED
-        collection_job.error_message = str(err)
-        collection_job_crud.update(collection_job.id, collection_job)
+        try:
+            with Session(engine) as session:
+                job_crud = CollectionJobCrud(session, project_id)
+                job = job_crud.read_one(UUID(job_id))
+                job.status = CollectionJobStatus.FAILED
+                job.error_message = str(outer_err)
+                job_crud.update(job.id, job)
+        except Exception:
+            logger.exception(
+                "[create_collection.execute_job] Failed to mark job as failed | {'job_id': '%s'}",
+                job_id,
+            )
+
+        if "callback" in locals():
+            callback.fail(str(outer_err))
🧹 Nitpick comments (1)
backend/app/services/collections/create_collection.py (1)

153-157: Consider using collection_crud.create with the documents parameter.

The code creates the collection, reads it back, then links documents via DocumentCollectionCrud. Since CollectionCrud.create already accepts an optional documents parameter, you could simplify this to a single call, eliminating the extra read operation.

Apply this diff:

-                collection_crud.create(collection)
-                collection_data = collection_crud.read_one(collection.id)
-
-                if flat_docs:
-                    DocumentCollectionCrud(session).create(collection_data, flat_docs)
+                collection = collection_crud.create(collection, flat_docs if flat_docs else None)
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 14d50a8 and 1dcab67.

📒 Files selected for processing (2)
  • backend/app/models/collection_job.py (1 hunks)
  • backend/app/services/collections/create_collection.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • backend/app/models/collection_job.py
🧰 Additional context used
🧬 Code graph analysis (1)
backend/app/services/collections/create_collection.py (11)
backend/app/core/cloud/storage.py (1)
  • get_cloud_storage (262-279)
backend/app/models/collection.py (6)
  • now (49-51)
  • Collection (13-36)
  • ResponsePayload (39-51)
  • CreationRequest (105-113)
  • AssistantOptions (72-95)
  • extract_super_type (110-113)
backend/app/crud/collection/collection.py (3)
  • CollectionCrud (18-163)
  • create (67-84)
  • read_one (86-110)
backend/app/crud/document.py (1)
  • DocumentCrud (14-135)
backend/app/crud/document_collection.py (1)
  • DocumentCollectionCrud (8-48)
backend/app/crud/collection/collection_job.py (4)
  • CollectionJobCrud (19-97)
  • create (80-97)
  • read_one (24-47)
  • update (60-78)
backend/app/crud/rag/open_ai.py (2)
  • OpenAIVectorStoreCrud (101-189)
  • OpenAIAssistantCrud (192-246)
backend/app/services/collections/helpers.py (10)
  • _backout (117-124)
  • batch_documents (50-66)
  • SilentCallback (81-88)
  • WebHookCallback (91-114)
  • success (77-78)
  • success (86-88)
  • success (112-114)
  • fail (74-75)
  • fail (82-84)
  • fail (108-110)
backend/app/celery/utils.py (1)
  • start_low_priority_job (46-71)
backend/app/utils.py (1)
  • get_openai_client (175-205)
backend/app/services/collections/delete_collection.py (2)
  • start_job (25-62)
  • execute_job (65-138)
🪛 Ruff (0.13.3)
backend/app/services/collections/create_collection.py

201-201: Undefined name err

(F821)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: checks (3.11.7, 6)

Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
backend/app/crud/collection/collection.py (1)

1-5: Remove duplicate logging import.

The logging module is imported twice (lines 1 and 5), which is redundant.

Apply this diff to remove the duplicate:

 import logging
 import functools as ft
 from uuid import UUID
 from typing import Optional
-import logging
 
 from fastapi import HTTPException
♻️ Duplicate comments (3)
backend/app/tests/api/routes/collections/test_create_collections.py (1)

2-5: Remove duplicate import of patch.

The patch import appears twice (lines 2 and 5). This is flagged by Ruff as F811 (redefinition of unused import).

Apply this diff to remove the redundant import:

 from uuid import UUID
-from unittest.mock import patch
-
 from fastapi.testclient import TestClient
 from unittest.mock import patch
backend/app/services/collections/create_collection.py (1)

188-197: Fix undefined variable in outer exception handler.

Line 196 references err which is undefined in the outer exception scope (the caught exception is outer_err). This will raise a NameError if this handler executes, masking the original error and leaving the job un-updated.

Apply this diff:

     except Exception as outer_err:
         logger.error(
             "[create_collection.execute_job] Unexpected Error during collection creation: %s",
             str(outer_err),
             exc_info=True,
         )
 
         collection_job.status = CollectionJobStatus.FAILED
-        collection_job.error_message = str(err)
+        collection_job.error_message = str(outer_err)
         collection_job_crud.update(collection_job.id, collection_job)

Note: This handler also assumes collection_job and collection_job_crud are in scope, which may not be true if an error occurs before line 95. Consider wrapping this in a fresh session as suggested in previous reviews.

backend/app/services/collections/delete_collection.py (1)

87-87: Persist task_id before proceeding.

The task_id is assigned to collection_job.task_id but not persisted. If an error occurs in the subsequent operations (lines 89-115), the task_id remains unset in the database.

Apply this diff to persist the assignment:

             collection = collection_crud.read_one(collection_id)
             collection_job = collection_job_crud.read_one(job_id)
 
-            collection_job.task_id = task_id
+            update_data = CollectionJobUpdate(task_id=task_id)
+            collection_job = collection_job_crud.update(collection_job.id, update_data)
🧹 Nitpick comments (8)
backend/app/tests/services/collections/test_delete_collection.py (1)

67-72: Remove or acknowledge unused variable.

The variable precreated is assigned but never used in this test. If the assignment is intentional (to ensure the job exists), use the underscore convention to signal this.

Apply this diff:

-        precreated = create_collection_job(
+        _ = create_collection_job(
             db=db,
             project=project,
             collection=created_collection,
             job_id=collection_job_id,
         )
backend/app/crud/collection/collection_job.py (1)

3-3: Use built-in list instead of typing.List.

Python 3.9+ supports using the built-in list type for annotations, making typing.List unnecessary. The project uses Python >= 3.9 per Alembic 1.15 requirements noted in learnings.

Apply this diff:

 from uuid import UUID
 import logging
-from typing import List
 
 from fastapi import HTTPException

And update line 47:

-    def read_all(self) -> List[CollectionJob]:
+    def read_all(self) -> list[CollectionJob]:
backend/app/services/collections/helpers.py (2)

6-6: Use built-in list instead of typing.List.

Python 3.9+ supports using the built-in list type for annotations. The typing.List import is deprecated.

Apply this diff:

 import re
 from uuid import UUID
-from typing import List
 from dataclasses import asdict, replace

And update line 52:

 def batch_documents(
-    document_crud: DocumentCrud, documents: List[UUID], batch_size: int
+    document_crud: DocumentCrud, documents: list[UUID], batch_size: int
 ):

87-87: Remove unnecessary f-string prefixes.

Lines 87, 91, and 117 use f-strings without any placeholders. Remove the f prefix for cleaner code.

Apply this diff:

     def fail(self, body):
-        logger.info(f"[SilentCallback.fail] Silent callback failure")
+        logger.info("[SilentCallback.fail] Silent callback failure")
         return
 
     def success(self, body):
-        logger.info(f"[SilentCallback.success] Silent callback success")
+        logger.info("[SilentCallback.success] Silent callback success")
         return

And for line 117:

     def success(self, body):
-        logger.info(f"[WebHookCallback.success] Callback succeeded")
+        logger.info("[WebHookCallback.success] Callback succeeded")
         self(APIResponse.success_response(body), "complete")

Also applies to: 91-91, 117-117

backend/app/services/collections/create_collection.py (2)

52-54: Remove unused assignment.

The update call returns the updated job but the result is assigned to an unused variable (flagged by Ruff). Either use the returned value or remove the assignment.

Apply this diff:

-    collection_job = job_crud.update(
+    job_crud.update(
         collection_job_id, CollectionJobUpdate(trace_id=trace_id)
     )

148-152: Simplify collection creation flow.

The current flow creates the collection, reads it back, then links documents separately. Since CollectionCrud.create accepts an optional documents parameter (see backend/app/crud/collection/collection.py lines 54-79), you can pass flat_docs directly and eliminate the read-back step.

Apply this diff:

-                collection_crud.create(collection)
-                collection_data = collection_crud.read_one(collection.id)
-
-                if flat_docs:
-                    DocumentCollectionCrud(session).create(collection_data, flat_docs)
+                collection_crud.create(collection, flat_docs if flat_docs else None)
backend/app/services/collections/delete_collection.py (1)

36-38: Remove unused assignment.

The update call returns the updated job but the result is assigned to an unused variable (flagged by Ruff). Either use the returned value or remove the assignment.

Apply this diff:

-    collection_job = job_crud.update(
+    job_crud.update(
         collection_job_id, CollectionJobUpdate(trace_id=trace_id)
     )
backend/app/api/routes/collections.py (1)

4-4: Use built-in list instead of deprecated typing.List.

Python 3.9+ supports using list directly in type annotations. The typing.List is deprecated.

Apply this diff:

-from typing import List

And update the type hints:

-    response_model=APIResponse[List[CollectionPublic]],
+    response_model=APIResponse[list[CollectionPublic]],
-    response_model=APIResponse[List[DocumentPublic]],
+    response_model=APIResponse[list[DocumentPublic]],
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 1dcab67 and 7b09e3f.

📒 Files selected for processing (16)
  • backend/app/alembic/versions/b30727137e65_adding_collection_job_table_and_alter_collection_table.py (1 hunks)
  • backend/app/api/docs/collections/create.md (1 hunks)
  • backend/app/api/docs/collections/delete.md (1 hunks)
  • backend/app/api/routes/collection_job.py (1 hunks)
  • backend/app/api/routes/collections.py (4 hunks)
  • backend/app/crud/collection/collection.py (4 hunks)
  • backend/app/crud/collection/collection_job.py (1 hunks)
  • backend/app/models/__init__.py (3 hunks)
  • backend/app/models/collection_job.py (1 hunks)
  • backend/app/services/collections/create_collection.py (1 hunks)
  • backend/app/services/collections/delete_collection.py (1 hunks)
  • backend/app/services/collections/helpers.py (1 hunks)
  • backend/app/tests/api/routes/collections/test_collection_info.py (1 hunks)
  • backend/app/tests/api/routes/collections/test_create_collections.py (1 hunks)
  • backend/app/tests/services/collections/test_create_collection.py (1 hunks)
  • backend/app/tests/services/collections/test_delete_collection.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (3)
  • backend/app/api/docs/collections/create.md
  • backend/app/api/routes/collection_job.py
  • backend/app/models/collection_job.py
🧰 Additional context used
🧬 Code graph analysis (12)
backend/app/tests/api/routes/collections/test_create_collections.py (2)
backend/app/models/collection.py (2)
  • Collection (13-36)
  • CreationRequest (105-113)
backend/app/tests/conftest.py (3)
  • client (52-55)
  • user_api_key_header (77-79)
  • user_api_key (89-91)
backend/app/services/collections/helpers.py (5)
backend/app/core/util.py (1)
  • post_callback (29-39)
backend/app/crud/document.py (1)
  • DocumentCrud (14-135)
backend/app/models/collection.py (2)
  • ResponsePayload (39-51)
  • now (49-51)
backend/app/crud/rag/open_ai.py (1)
  • OpenAIAssistantCrud (192-246)
backend/app/utils.py (3)
  • APIResponse (29-53)
  • failure_response (42-53)
  • success_response (36-39)
backend/app/tests/services/collections/test_create_collection.py (10)
backend/app/core/cloud/storage.py (1)
  • AmazonCloudStorageClient (28-84)
backend/app/crud/collection/collection.py (5)
  • CollectionCrud (17-159)
  • create (55-80)
  • _ (131-138)
  • _ (141-159)
  • read_one (82-106)
backend/app/crud/collection/collection_job.py (3)
  • CollectionJobCrud (19-96)
  • create (78-96)
  • read_one (24-45)
backend/app/crud/document_collection.py (1)
  • DocumentCollectionCrud (8-48)
backend/app/models/collection_job.py (3)
  • CollectionJobStatus (11-15)
  • CollectionJob (35-64)
  • CollectionActionType (18-20)
backend/app/models/collection.py (2)
  • CreationRequest (105-113)
  • ResponsePayload (39-51)
backend/app/services/collections/create_collection.py (2)
  • start_job (41-71)
  • execute_job (74-197)
backend/app/tests/utils/openai.py (1)
  • get_mock_openai_client_with_vector_store (90-117)
backend/app/tests/utils/utils.py (1)
  • get_project (70-89)
backend/app/tests/utils/document.py (2)
  • DocumentStore (52-79)
  • project (64-65)
backend/app/services/collections/delete_collection.py (9)
backend/app/crud/collection/collection.py (3)
  • CollectionCrud (17-159)
  • read_one (82-106)
  • delete (120-128)
backend/app/crud/collection/collection_job.py (3)
  • CollectionJobCrud (19-96)
  • update (58-76)
  • read_one (24-45)
backend/app/crud/rag/open_ai.py (1)
  • OpenAIAssistantCrud (192-246)
backend/app/models/collection_job.py (2)
  • CollectionJobStatus (11-15)
  • CollectionJobUpdate (74-79)
backend/app/models/collection.py (3)
  • Collection (13-36)
  • DeletionRequest (116-117)
  • ResponsePayload (39-51)
backend/app/services/collections/helpers.py (8)
  • SilentCallback (85-92)
  • WebHookCallback (95-118)
  • success (81-82)
  • success (90-92)
  • success (116-118)
  • fail (78-79)
  • fail (86-88)
  • fail (112-114)
backend/app/celery/utils.py (1)
  • start_low_priority_job (46-71)
backend/app/utils.py (1)
  • get_openai_client (175-205)
backend/app/services/collections/create_collection.py (2)
  • start_job (41-71)
  • execute_job (74-197)
backend/app/tests/api/routes/collections/test_collection_info.py (6)
backend/app/models/collection.py (2)
  • now (49-51)
  • Collection (13-36)
backend/app/models/collection_job.py (4)
  • CollectionJobCreate (67-71)
  • CollectionActionType (18-20)
  • CollectionJobStatus (11-15)
  • CollectionJobUpdate (74-79)
backend/app/crud/collection/collection_job.py (3)
  • CollectionJobCrud (19-96)
  • create (78-96)
  • update (58-76)
backend/app/crud/collection/collection.py (2)
  • CollectionCrud (17-159)
  • create (55-80)
backend/app/api/routes/collections.py (1)
  • create_collection (44-73)
backend/app/tests/conftest.py (4)
  • db (24-41)
  • client (52-55)
  • user_api_key_header (77-79)
  • user_api_key (89-91)
backend/app/api/routes/collections.py (9)
backend/app/crud/collection/collection.py (4)
  • CollectionCrud (17-159)
  • create (55-80)
  • read_one (82-106)
  • read_all (108-117)
backend/app/crud/collection/collection_job.py (4)
  • CollectionJobCrud (19-96)
  • create (78-96)
  • read_one (24-45)
  • read_all (47-56)
backend/app/crud/document_collection.py (2)
  • DocumentCollectionCrud (8-48)
  • create (12-23)
backend/app/models/collection_job.py (3)
  • CollectionJobStatus (11-15)
  • CollectionActionType (18-20)
  • CollectionJobCreate (67-71)
backend/app/models/collection.py (4)
  • ResponsePayload (39-51)
  • CreationRequest (105-113)
  • DeletionRequest (116-117)
  • CollectionPublic (120-129)
backend/app/utils.py (2)
  • APIResponse (29-53)
  • success_response (36-39)
backend/app/services/collections/helpers.py (1)
  • extract_error_message (22-48)
backend/app/services/collections/create_collection.py (1)
  • start_job (41-71)
backend/app/services/collections/delete_collection.py (1)
  • start_job (25-55)
backend/app/services/collections/create_collection.py (11)
backend/app/core/cloud/storage.py (1)
  • get_cloud_storage (262-279)
backend/app/models/collection.py (6)
  • now (49-51)
  • Collection (13-36)
  • ResponsePayload (39-51)
  • CreationRequest (105-113)
  • AssistantOptions (72-95)
  • extract_super_type (110-113)
backend/app/crud/collection/collection.py (3)
  • CollectionCrud (17-159)
  • read_one (82-106)
  • create (55-80)
backend/app/crud/document.py (1)
  • DocumentCrud (14-135)
backend/app/crud/document_collection.py (1)
  • DocumentCollectionCrud (8-48)
backend/app/crud/collection/collection_job.py (4)
  • CollectionJobCrud (19-96)
  • update (58-76)
  • read_one (24-45)
  • create (78-96)
backend/app/crud/rag/open_ai.py (2)
  • OpenAIVectorStoreCrud (101-189)
  • OpenAIAssistantCrud (192-246)
backend/app/services/collections/helpers.py (10)
  • _backout (121-129)
  • batch_documents (51-70)
  • SilentCallback (85-92)
  • WebHookCallback (95-118)
  • success (81-82)
  • success (90-92)
  • success (116-118)
  • fail (78-79)
  • fail (86-88)
  • fail (112-114)
backend/app/celery/utils.py (1)
  • start_low_priority_job (46-71)
backend/app/utils.py (1)
  • get_openai_client (175-205)
backend/app/services/collections/delete_collection.py (2)
  • start_job (25-55)
  • execute_job (58-131)
backend/app/crud/collection/collection.py (3)
backend/app/models/collection.py (2)
  • Collection (13-36)
  • now (49-51)
backend/app/crud/document_collection.py (1)
  • DocumentCollectionCrud (8-48)
backend/app/crud/collection/collection_job.py (2)
  • create (78-96)
  • read_one (24-45)
backend/app/alembic/versions/b30727137e65_adding_collection_job_table_and_alter_collection_table.py (2)
backend/app/alembic/versions/7ab577d3af26_delete_non_successful_columns_from_collection_table.py (1)
  • upgrade (20-26)
backend/app/crud/collection/collection_job.py (1)
  • create (78-96)
backend/app/models/__init__.py (12)
backend/app/models/api_key.py (3)
  • APIKey (28-38)
  • APIKeyBase (10-20)
  • APIKeyPublic (23-25)
backend/app/models/assistants.py (4)
  • Assistant (33-44)
  • AssistantBase (11-30)
  • AssistantCreate (47-76)
  • AssistantUpdate (79-101)
backend/app/models/collection.py (2)
  • Collection (13-36)
  • CollectionPublic (120-129)
backend/app/models/collection_job.py (7)
  • CollectionActionType (18-20)
  • CollectionJob (35-64)
  • CollectionJobBase (23-32)
  • CollectionJobStatus (11-15)
  • CollectionJobUpdate (74-79)
  • CollectionJobPublic (82-91)
  • CollectionJobCreate (67-71)
backend/app/models/credentials.py (5)
  • Credential (48-92)
  • CredsBase (9-16)
  • CredsCreate (19-29)
  • CredsPublic (95-103)
  • CredsUpdate (32-45)
backend/app/models/fine_tuning.py (6)
  • FineTuningJobBase (20-25)
  • Fine_Tuning (54-89)
  • FineTuningJobCreate (28-51)
  • FineTuningJobPublic (103-119)
  • FineTuningUpdate (92-100)
  • FineTuningStatus (13-17)
backend/app/models/model_evaluation.py (6)
  • ModelEvaluation (38-85)
  • ModelEvaluationBase (21-26)
  • ModelEvaluationCreate (29-35)
  • ModelEvaluationPublic (95-108)
  • ModelEvaluationStatus (14-18)
  • ModelEvaluationUpdate (88-92)
backend/app/models/onboarding.py (2)
  • OnboardingRequest (7-81)
  • OnboardingResponse (84-103)
backend/app/models/openai_conversation.py (4)
  • OpenAIConversationPublic (101-111)
  • OpenAIConversation (58-69)
  • OpenAIConversationBase (22-55)
  • OpenAIConversationCreate (72-98)
backend/app/models/organization.py (5)
  • Organization (34-57)
  • OrganizationCreate (23-24)
  • OrganizationPublic (61-64)
  • OrganizationsPublic (67-69)
  • OrganizationUpdate (28-30)
backend/app/models/response.py (6)
  • CallbackResponse (47-54)
  • Diagnostics (35-39)
  • FileResultChunk (42-44)
  • ResponsesAPIRequest (4-11)
  • ResponseJobStatus (27-32)
  • ResponsesSyncAPIRequest (14-24)
backend/app/models/threads.py (3)
  • OpenAI_Thread (18-21)
  • OpenAIThreadBase (6-11)
  • OpenAIThreadCreate (14-15)
backend/app/crud/collection/collection_job.py (2)
backend/app/models/collection_job.py (3)
  • CollectionJob (35-64)
  • CollectionJobUpdate (74-79)
  • CollectionJobCreate (67-71)
backend/app/crud/collection/collection.py (3)
  • read_one (82-106)
  • read_all (108-117)
  • create (55-80)
backend/app/tests/services/collections/test_delete_collection.py (7)
backend/app/models/collection.py (3)
  • DeletionRequest (116-117)
  • Collection (13-36)
  • ResponsePayload (39-51)
backend/app/tests/utils/utils.py (1)
  • get_project (70-89)
backend/app/crud/collection/collection.py (5)
  • CollectionCrud (17-159)
  • create (55-80)
  • read_all (108-117)
  • read_one (82-106)
  • delete (120-128)
backend/app/crud/collection/collection_job.py (4)
  • CollectionJobCrud (19-96)
  • create (78-96)
  • read_all (47-56)
  • read_one (24-45)
backend/app/models/collection_job.py (3)
  • CollectionJobStatus (11-15)
  • CollectionJob (35-64)
  • CollectionActionType (18-20)
backend/app/services/collections/delete_collection.py (2)
  • start_job (25-55)
  • execute_job (58-131)
backend/app/tests/conftest.py (1)
  • db (24-41)
🪛 Ruff (0.13.3)
backend/app/tests/api/routes/collections/test_create_collections.py

5-5: Redefinition of unused patch from line 2

Remove definition: patch

(F811)

backend/app/services/collections/helpers.py

6-6: typing.List is deprecated, use list instead

(UP035)


87-87: f-string without any placeholders

Remove extraneous f prefix

(F541)


91-91: f-string without any placeholders

Remove extraneous f prefix

(F541)


117-117: f-string without any placeholders

Remove extraneous f prefix

(F541)

backend/app/services/collections/delete_collection.py

36-36: Local variable collection_job is assigned to but never used

Remove assignment to unused variable collection_job

(F841)


66-66: Unused function argument: task_instance

(ARG001)

backend/app/api/routes/collections.py

4-4: typing.List is deprecated, use list instead

(UP035)

backend/app/services/collections/create_collection.py

52-52: Local variable collection_job is assigned to but never used

Remove assignment to unused variable collection_job

(F841)


196-196: Undefined name err

(F821)

backend/app/crud/collection/collection_job.py

3-3: typing.List is deprecated, use list instead

(UP035)

backend/app/tests/services/collections/test_delete_collection.py

67-67: Local variable precreated is assigned to but never used

Remove assignment to unused variable precreated

(F841)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: checks (3.11.7, 6)

Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (2)
backend/app/tests/services/collections/test_create_collection.py (1)

90-93: Simplify enum comparison.

The assertion checks action_type against both the enum member and its .value, which is overly defensive. Since action_type is an Enum field on the model, it should always be the enum member, not a string.

Apply this diff to simplify:

-        assert job.action_type in (
-            CollectionActionType.CREATE,
-            CollectionActionType.CREATE.value,
-        )
+        assert job.action_type == CollectionActionType.CREATE
backend/app/tests/api/routes/collections/test_collection_info.py (1)

58-64: Consider clarifying the FAILED status handling.

The pattern of creating a job and immediately updating it when status == FAILED works correctly, but may be unclear to readers. This is necessary because CollectionJobCreate doesn't include error_message, so a follow-up update is required.

Add a comment to clarify the intent:

     collection_job = CollectionJobCrud(db, user.project_id).create(job_in)
 
+    # CollectionJobCreate doesn't include error_message, so update separately if needed
     if collection_job.status == CollectionJobStatus.FAILED:
         job_in = CollectionJobUpdate(
             error_message="Something went wrong during the collection job process."
         )
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 7b09e3f and 95ef105.

📒 Files selected for processing (2)
  • backend/app/tests/api/routes/collections/test_collection_info.py (1 hunks)
  • backend/app/tests/services/collections/test_create_collection.py (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (2)
backend/app/tests/services/collections/test_create_collection.py (11)
backend/app/core/cloud/storage.py (1)
  • AmazonCloudStorageClient (28-84)
backend/app/crud/collection/collection.py (5)
  • CollectionCrud (17-159)
  • create (55-80)
  • _ (131-138)
  • _ (141-159)
  • read_one (82-106)
backend/app/crud/collection/collection_job.py (3)
  • CollectionJobCrud (19-96)
  • create (78-96)
  • read_one (24-45)
backend/app/crud/document_collection.py (1)
  • DocumentCollectionCrud (8-48)
backend/app/models/collection_job.py (3)
  • CollectionJobStatus (11-15)
  • CollectionJob (35-64)
  • CollectionActionType (18-20)
backend/app/models/collection.py (2)
  • CreationRequest (105-113)
  • ResponsePayload (39-51)
backend/app/services/collections/create_collection.py (2)
  • start_job (41-71)
  • execute_job (74-197)
backend/app/tests/utils/openai.py (1)
  • get_mock_openai_client_with_vector_store (90-117)
backend/app/tests/utils/utils.py (1)
  • get_project (70-89)
backend/app/tests/utils/document.py (2)
  • DocumentStore (52-79)
  • project (64-65)
backend/app/tests/conftest.py (1)
  • db (24-41)
backend/app/tests/api/routes/collections/test_collection_info.py (6)
backend/app/models/collection.py (2)
  • now (49-51)
  • Collection (13-36)
backend/app/models/collection_job.py (4)
  • CollectionJobCreate (67-71)
  • CollectionActionType (18-20)
  • CollectionJobStatus (11-15)
  • CollectionJobUpdate (74-79)
backend/app/crud/collection/collection_job.py (3)
  • CollectionJobCrud (19-96)
  • create (78-96)
  • update (58-76)
backend/app/crud/collection/collection.py (2)
  • CollectionCrud (17-159)
  • create (55-80)
backend/app/api/routes/collections.py (1)
  • create_collection (44-73)
backend/app/tests/conftest.py (4)
  • db (24-41)
  • client (52-55)
  • user_api_key_header (77-79)
  • user_api_key (89-91)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: checks (3.11.7, 6)
🔇 Additional comments (1)
backend/app/tests/services/collections/test_create_collection.py (1)

111-190: LGTM! Comprehensive integration test.

This test thoroughly validates the end-to-end flow of execute_job:

  • Proper AWS/OpenAI mocking with moto and MagicMock
  • Correct Session patching to inject the test database
  • Verification of all critical state transitions (task_id assignment, status updates, collection creation, document linking)

The test design is solid and covers the success path comprehensively.

Copy link
Copy Markdown
Collaborator

@kartpop kartpop left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please change the route path before merging

Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

♻️ Duplicate comments (1)
backend/app/api/routes/collection_job.py (1)

23-27: Consider simplifying the route path to /jobs/{job_id}.

Past reviewers suggested changing from /info/jobs/{job_id} to /jobs/{job_id} for consistency with other routes like /documents/transformations/{job_id}. The author mentioned keeping "collection_job_id" for clarity, but the consensus was that /collections/jobs/{job_id} (resulting from the prefix) would be clearer and more consistent.

Based on past review comments.

🧹 Nitpick comments (2)
backend/app/tests/api/routes/collections/test_collection_info.py (2)

84-87: Clarify the assertion on line 86.

Line 86 asserts data["collection_id"] == collection_job.collection_id, but when create_collection_job is called without a collection_id (as in line 74), both sides evaluate to None. While this assertion passes, it doesn't verify meaningful behavior.

Consider either:

  • Asserting explicitly: assert data["collection_id"] is None
  • Or adding a comment explaining why checking None equality is intentional

69-138: Consider expanding test coverage.

The current tests cover the happy path scenarios well. Consider adding tests for:

  • 404 response when job_id doesn't exist
  • Authorization check (accessing another project's job)
  • DELETE action type in addition to CREATE
  • Edge cases like PROCESSING status
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 95ef105 and 4de8c1d.

📒 Files selected for processing (4)
  • backend/app/api/docs/collections/create.md (1 hunks)
  • backend/app/api/docs/collections/delete.md (1 hunks)
  • backend/app/api/routes/collection_job.py (1 hunks)
  • backend/app/tests/api/routes/collections/test_collection_info.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • backend/app/api/docs/collections/create.md
🧰 Additional context used
🧬 Code graph analysis (2)
backend/app/api/routes/collection_job.py (6)
backend/app/crud/collection/collection.py (2)
  • CollectionCrud (17-159)
  • read_one (82-106)
backend/app/crud/collection/collection_job.py (2)
  • CollectionJobCrud (19-96)
  • read_one (24-45)
backend/app/models/collection_job.py (3)
  • CollectionJobStatus (11-15)
  • CollectionJobPublic (82-91)
  • CollectionActionType (18-20)
backend/app/models/collection.py (1)
  • CollectionPublic (120-129)
backend/app/utils.py (3)
  • APIResponse (29-53)
  • load_description (248-253)
  • success_response (36-39)
backend/app/services/collections/helpers.py (1)
  • extract_error_message (22-48)
backend/app/tests/api/routes/collections/test_collection_info.py (6)
backend/app/models/collection.py (2)
  • now (49-51)
  • Collection (13-36)
backend/app/models/collection_job.py (4)
  • CollectionJobCreate (67-71)
  • CollectionActionType (18-20)
  • CollectionJobStatus (11-15)
  • CollectionJobUpdate (74-79)
backend/app/crud/collection/collection_job.py (3)
  • CollectionJobCrud (19-96)
  • create (78-96)
  • update (58-76)
backend/app/crud/collection/collection.py (2)
  • CollectionCrud (17-159)
  • create (55-80)
backend/app/api/routes/collections.py (1)
  • create_collection (44-73)
backend/app/tests/conftest.py (4)
  • db (24-41)
  • client (52-55)
  • user_api_key_header (77-79)
  • user_api_key (89-91)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: checks (3.11.7, 6)
🔇 Additional comments (4)
backend/app/api/docs/collections/delete.md (1)

1-13: LGTM! Documentation clearly describes the job-based deletion workflow.

The updated documentation accurately reflects the asynchronous deletion process using CollectionJob tracking and polling via the job info endpoint.

backend/app/api/routes/collection_job.py (1)

28-50: LGTM! Job info retrieval logic is well-structured.

The endpoint correctly:

  • Scopes job retrieval to the user's project
  • Conditionally embeds the collection for successful CREATE operations
  • Normalizes error messages for failed jobs using extract_error_message

The conditional logic at lines 38-41 appropriately checks all three conditions before attaching collection data.

backend/app/tests/api/routes/collections/test_collection_info.py (2)

19-39: LGTM! Helper function correctly creates persisted collections.

The create_collection helper appropriately:

  • Uses CollectionCrud to persist the collection
  • Optionally populates LLM service fields
  • Returns the created collection for test assertions

42-66: LGTM! Previous review feedback has been addressed.

The create_collection_job helper now correctly omits inserted_at and updated_at from CollectionJobCreate, relying on the model's default_factory for timestamp generation. The conditional error_message update for FAILED jobs is appropriate.

@@ -1,5 +1,4 @@
Retrieve all AI-platform information about a collection given its
ID. This route is very helpful for:
Retrieve detailed information about a specific collection by its ID from the collection table. Note that this endpoint CANNOT be used as a polling endpoint for collection creation because an entry will be made in the collection table only after the resource creation and association has been successful.
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One suggestion for all the md/docs files: these are user-facing documents, so they should focus on describing the behavior of the endpoints, not their internal implementation details.
For example:

  • If an endpoint triggers a webhook, simply mention that it does, no need to explain how it’s handled internally.
  • If an endpoint deletes a resource, just state that it deletes it, don’t specify whether it’s a soft or hard delete.

Similarly, avoid mentioning internal mechanisms like can be use for polling or background tasks unless they directly affect how the user interacts with the API.

Copy link
Copy Markdown
Collaborator Author

@nishika26 nishika26 Oct 9, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

noted, but since for now the main users of this endpoint had been dalgo, I think keeping that in mind I wrote the docs in that way . Stating that you need to poll on the job id returned is something that the user needs to know in this case. The fact that the deletion is soft or hard is sometimes needed to be known considering that our current user of the platform is dalgo, glific and us(the internal team) . So I think we can keep the docs which go into the behaviour of endpoints , anyway we will discuss this with other people as well, and according to their opinion we will apply those changes in v2

op.execute(
"""
DELETE FROM collection
WHERE status IN ('processing', 'failed')
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Make sure no collection with empty llm_service_id exist

Copy link
Copy Markdown
Collaborator Author

@nishika26 nishika26 Oct 9, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do not think there would be a situation where status turned to successful while no llm service id is there as they're committed together in the same transaction, they both persist atomically. By they i mean the status as successful and the llm service id , for reference - here . But for surity i will still add that sql query as well

Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

♻️ Duplicate comments (3)
backend/app/api/docs/collections/job_info.md (1)

7-8: Fix list indentation to match Markdown standards.

Nested list items still use 4-space indentation instead of the standard 2-space indentation.

Apply this diff:

 * If the job has finished, has been successful and it was a job of creation of collection then this endpoint will fetch the associated collection details from the collection table, including:
-    - `llm_service_id`: The OpenAI assistant or model used for the collection.
-    - Collection metadata such as ID, project, organization, and timestamps.
+  - `llm_service_id`: The OpenAI assistant or model used for the collection.
+  - Collection metadata such as ID, project, organization, and timestamps.
backend/app/services/collections/delete_collection.py (2)

127-141: Fix outer exception handler—UnboundLocalError and closed session.

The outer exception handler has multiple critical issues:

  1. UnboundLocalError risk: If an exception occurs before line 86 (e.g., during get_openai_client), collection_job and collection_job_crud will be undefined, causing line 128 to raise UnboundLocalError.
  2. Closed session: The with Session(engine) context has exited, so the session is closed and line 130 will fail.
  3. Wrong signature: Line 130 calls update(collection_job.id, collection_job) instead of update(job_id, patch).

Reopen a fresh session and fetch the job defensively:

     except Exception as err:
-        collection_job.status = CollectionJobStatus.FAILED
-        collection_job.error_message = str(err)
-        collection_job_crud.update(collection_job.id, collection_job)
-
         logger.error(
             "[delete_collection.execute_job] Unexpected error during deletion | "
             "{'collection_id': '%s', 'error': '%s', 'error_type': '%s', 'job_id': '%s'}",
-            str(collection.id),
+            str(collection_id),
             str(err),
             type(err).__name__,
             str(job_id),
             exc_info=True,
         )
-        callback.fail(str(err))
+
+        # Reopen session to mark job as failed
+        with Session(engine) as session:
+            job_crud = CollectionJobCrud(session, project_id)
+            try:
+                job_crud.update(
+                    job_id,
+                    CollectionJobUpdate(
+                        status=CollectionJobStatus.FAILED,
+                        error_message=str(err),
+                    ),
+                )
+            except Exception:
+                logger.exception(
+                    "[delete_collection.execute_job] Unable to mark job as failed | "
+                    "{'job_id': '%s'}",
+                    str(job_id),
+                )
+
+        # Only call callback if it was initialized
+        if "callback" in locals():
+            callback.fail(str(err))

114-116: Fix incorrect update() signature.

Same issue as above—you're passing collection_job instead of a CollectionJobUpdate patch object.

Apply this diff:

                 collection_job.status = CollectionJobStatus.FAILED
                 collection_job.error_message = str(err)
-                collection_job_crud.update(collection_job.id, collection_job)
+                collection_job_crud.update(
+                    collection_job.id,
+                    CollectionJobUpdate(
+                        status=CollectionJobStatus.FAILED,
+                        error_message=str(err),
+                    ),
+                )
🧹 Nitpick comments (5)
backend/app/models/collection.py (3)

48-51: Remove redundant now() classmethod.

The ResponsePayload.now() classmethod simply delegates to the imported now() function from app.core.util. This adds no value and creates confusion about which now() to use.

Apply this diff:

     time: datetime = Field(default_factory=now)
-
-    @classmethod
-    def now(cls):
-        """Returns current UTC time without timezone info"""
-        return now()

68-69: Document deduplication behavior or preserve order.

Converting documents to a set and back to a list removes duplicates but also loses the original ordering. If document processing order matters, consider using dict.fromkeys(self.documents) to preserve insertion order while removing duplicates.

If order should be preserved:

     def model_post_init(self, __context: Any):
-        self.documents = list(set(self.documents))
+        self.documents = list(dict.fromkeys(self.documents))

110-113: Clarify method name and purpose.

The method name extract_super_type is misleading—it yields field name-value pairs, not a supertype. Consider renaming to iter_fields() or adding a docstring explaining its purpose.

Example rename:

-    def extract_super_type(self, cls: "CreationRequest"):
+    def iter_fields(self):
+        """Yield (field_name, field_value) pairs for all fields."""
-        for field_name in cls.__fields__.keys():
+        for field_name in self.__fields__.keys():
             field_value = getattr(self, field_name)
             yield (field_name, field_value)
backend/app/services/collections/delete_collection.py (2)

37-39: Remove unused variable assignment.

The collection_job variable is assigned but never used. The return value of update() is not needed here since we only need the collection_job_id.

Apply this diff:

-    collection_job = job_crud.update(
+    job_crud.update(
         collection_job_id, CollectionJobUpdate(trace_id=trace_id)
     )

67-67: Remove unused task_instance parameter.

The task_instance parameter is declared but never used in the function body. If it's required by the Celery task signature, add a leading underscore (_task_instance) to indicate it's intentionally unused.

Apply this diff:

-    task_instance,
+    _task_instance,
 ) -> None:
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 4de8c1d and 2c0a11d.

📒 Files selected for processing (5)
  • backend/app/alembic/versions/7ab577d3af26_delete_non_successful_columns_from_collection_table.py (1 hunks)
  • backend/app/alembic/versions/b30727137e65_adding_collection_job_table_and_alter_collection_table.py (1 hunks)
  • backend/app/api/docs/collections/job_info.md (1 hunks)
  • backend/app/models/collection.py (2 hunks)
  • backend/app/services/collections/delete_collection.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • backend/app/alembic/versions/7ab577d3af26_delete_non_successful_columns_from_collection_table.py
🧰 Additional context used
📓 Path-based instructions (3)
**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Use type hints in Python code (Python 3.11+ project)

Files:

  • backend/app/services/collections/delete_collection.py
  • backend/app/alembic/versions/b30727137e65_adding_collection_job_table_and_alter_collection_table.py
  • backend/app/models/collection.py
backend/app/services/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Implement business logic services under backend/app/services/

Files:

  • backend/app/services/collections/delete_collection.py
backend/app/models/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Define SQLModel entities (database tables and domain objects) in backend/app/models/

Files:

  • backend/app/models/collection.py
🧠 Learnings (1)
📚 Learning: 2025-10-08T12:05:01.299Z
Learnt from: CR
PR: ProjectTech4DevAI/ai-platform#0
File: CLAUDE.md:0-0
Timestamp: 2025-10-08T12:05:01.299Z
Learning: Applies to backend/app/models/**/*.py : Define SQLModel entities (database tables and domain objects) in backend/app/models/

Applied to files:

  • backend/app/models/collection.py
🧬 Code graph analysis (3)
backend/app/services/collections/delete_collection.py (8)
backend/app/crud/collection/collection.py (3)
  • CollectionCrud (17-159)
  • read_one (82-106)
  • delete (120-128)
backend/app/crud/collection/collection_job.py (3)
  • CollectionJobCrud (19-96)
  • update (58-76)
  • read_one (24-45)
backend/app/crud/rag/open_ai.py (1)
  • OpenAIAssistantCrud (192-246)
backend/app/models/collection.py (3)
  • Collection (13-36)
  • DeletionRequest (116-117)
  • ResponsePayload (39-51)
backend/app/services/collections/helpers.py (8)
  • SilentCallback (85-92)
  • WebHookCallback (95-118)
  • success (81-82)
  • success (90-92)
  • success (116-118)
  • fail (78-79)
  • fail (86-88)
  • fail (112-114)
backend/app/celery/utils.py (1)
  • start_low_priority_job (46-71)
backend/app/utils.py (1)
  • get_openai_client (175-205)
backend/app/services/collections/create_collection.py (2)
  • start_job (41-71)
  • execute_job (74-197)
backend/app/alembic/versions/b30727137e65_adding_collection_job_table_and_alter_collection_table.py (2)
backend/app/alembic/versions/7ab577d3af26_delete_non_successful_columns_from_collection_table.py (1)
  • upgrade (20-32)
backend/app/crud/collection/collection_job.py (1)
  • create (78-96)
backend/app/models/collection.py (3)
backend/app/core/util.py (1)
  • now (15-16)
backend/app/models/organization.py (1)
  • Organization (34-57)
backend/app/models/project.py (1)
  • Project (29-63)
🪛 markdownlint-cli2 (0.18.1)
backend/app/api/docs/collections/job_info.md

7-7: Unordered list indentation
Expected: 2; Actual: 4

(MD007, ul-indent)


8-8: Unordered list indentation
Expected: 2; Actual: 4

(MD007, ul-indent)

🪛 Ruff (0.13.3)
backend/app/services/collections/delete_collection.py

37-37: Local variable collection_job is assigned to but never used

Remove assignment to unused variable collection_job

(F841)


67-67: Unused function argument: task_instance

(ARG001)

backend/app/models/collection.py

3-3: typing.List is deprecated, use list instead

(UP035)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: checks (3.11.7, 6)
🔇 Additional comments (1)
backend/app/alembic/versions/b30727137e65_adding_collection_job_table_and_alter_collection_table.py (1)

36-68: Migration structure looks good.

The upgrade path correctly:

  • Creates enum types with checkfirst=True
  • Establishes the collection_jobs table with proper constraints
  • Renames created_at to inserted_at for consistency
  • Enforces non-null constraints on llm_service_id and llm_service_name
  • Removes deprecated columns (owner_id, status, error_message)

The downgrade strategy (lines 94-95) uses hardcoded defaults as a pragmatic workaround, which was already discussed in previous reviews.

Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (2)
backend/app/tests/services/collections/test_create_collection.py (1)

31-45: Pass CollectionJobCreate schema instead of model instance.

The helper instantiates a CollectionJob model directly (line 38) and passes it to CollectionJobCrud.create(), but according to the CRUD signature in backend/app/crud/collection/collection_job.py (lines 64-68), the create method expects a CollectionJobCreate schema, not a model instance. This bypasses Pydantic validation and can cause type issues.

Apply this diff to use the proper schema:

+from app.models import CollectionJobCreate
+
 def create_collection_job_for_create(
     db: Session,
     project,
     job_id: UUID,
 ):
     """Pre-create a CREATE job with the given id so start_job can update it."""
     return CollectionJobCrud(db, project.id).create(
-        CollectionJob(
+        CollectionJobCreate(
             id=job_id,
             action_type=CollectionActionType.CREATE,
             project_id=project.id,
             collection_id=None,
             status=CollectionJobStatus.PENDING,
         )
     )
backend/app/services/collections/delete_collection.py (1)

137-155: Fix outer exception handler to avoid UnboundLocalError and closed session.

The outer exception handler has critical issues:

  1. UnboundLocalError/NameError: Lines 138-143 reference collection_job_crud and collection_job, which won't be defined if the error occurs before line 87-88 (e.g., during DeletionRequest construction at line 69 or callback setup at lines 72-76).

  2. Closed session: The session from line 84 is closed by the time this handler executes, so any DB operations will fail.

  3. NameError on collection: Line 149 references collection.id, but collection is only defined at line 99 inside the inner try block.

Apply this diff to reopen a fresh session and handle the job update defensively:

     except Exception as err:
-        collection_job_crud.update(
-            collection_job.id,
-            CollectionJobUpdate(
-                status=CollectionJobStatus.FAILED,
-                error_message=str(err),
-            ),
-        )
-
         logger.error(
             "[delete_collection.execute_job] Unexpected error during deletion | "
-            "{'collection_id': '%s', 'error': '%s', 'error_type': '%s', 'job_id': '%s'}",
-            str(collection.id),
+            "{'error': '%s', 'error_type': '%s', 'job_id': '%s'}",
             str(err),
             type(err).__name__,
             str(job_id),
             exc_info=True,
         )
-        callback.fail(str(err))
+
+        with Session(engine) as session:
+            job_crud = CollectionJobCrud(session, project_id)
+            try:
+                job = job_crud.read_one(job_id)
+                job_crud.update(
+                    job.id,
+                    CollectionJobUpdate(
+                        status=CollectionJobStatus.FAILED,
+                        error_message=str(err),
+                    ),
+                )
+            except Exception:
+                logger.exception(
+                    "[delete_collection.execute_job] Unable to mark job as failed | "
+                    "{'job_id': '%s'}",
+                    str(job_id),
+                )
+
+        if "callback" in locals():
+            callback.fail(str(err))
🧹 Nitpick comments (1)
backend/app/services/collections/create_collection.py (1)

151-155: Consider consolidating collection and document creation.

The code creates the collection (line 151), reads it back (line 152), then creates DocumentCollection links (lines 154-155). According to CollectionCrud.create signature in backend/app/crud/collection/collection.py (lines 38-61), the method accepts an optional documents parameter that handles the linking internally.

Consider simplifying to:

-                collection_crud.create(collection)
-                collection_data = collection_crud.read_one(collection.id)
-
-                if flat_docs:
-                    DocumentCollectionCrud(session).create(collection_data, flat_docs)
+                collection = collection_crud.create(
+                    collection, 
+                    documents=flat_docs if flat_docs else None
+                )

This eliminates the redundant read and consolidates the logic into a single operation.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 2c0a11d and 94234b1.

📒 Files selected for processing (5)
  • backend/app/models/collection.py (2 hunks)
  • backend/app/services/collections/create_collection.py (1 hunks)
  • backend/app/services/collections/delete_collection.py (1 hunks)
  • backend/app/tests/services/collections/test_create_collection.py (1 hunks)
  • backend/app/tests/services/collections/test_delete_collection.py (1 hunks)
🧰 Additional context used
📓 Path-based instructions (3)
**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Use type hints in Python code (Python 3.11+ project)

Files:

  • backend/app/services/collections/create_collection.py
  • backend/app/tests/services/collections/test_create_collection.py
  • backend/app/tests/services/collections/test_delete_collection.py
  • backend/app/models/collection.py
  • backend/app/services/collections/delete_collection.py
backend/app/services/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Implement business logic services under backend/app/services/

Files:

  • backend/app/services/collections/create_collection.py
  • backend/app/services/collections/delete_collection.py
backend/app/models/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Define SQLModel entities (database tables and domain objects) in backend/app/models/

Files:

  • backend/app/models/collection.py
🧠 Learnings (1)
📚 Learning: 2025-10-08T12:05:01.299Z
Learnt from: CR
PR: ProjectTech4DevAI/ai-platform#0
File: CLAUDE.md:0-0
Timestamp: 2025-10-08T12:05:01.299Z
Learning: Applies to backend/app/models/**/*.py : Define SQLModel entities (database tables and domain objects) in backend/app/models/

Applied to files:

  • backend/app/models/collection.py
🧬 Code graph analysis (5)
backend/app/services/collections/create_collection.py (10)
backend/app/core/cloud/storage.py (1)
  • get_cloud_storage (262-279)
backend/app/models/collection.py (6)
  • now (49-51)
  • Collection (13-36)
  • ResponsePayload (39-51)
  • CreationRequest (105-113)
  • AssistantOptions (72-95)
  • extract_super_type (110-113)
backend/app/crud/document.py (1)
  • DocumentCrud (14-135)
backend/app/crud/document_collection.py (1)
  • DocumentCollectionCrud (8-48)
backend/app/crud/collection/collection_job.py (4)
  • CollectionJobCrud (19-96)
  • update (58-76)
  • read_one (24-45)
  • create (78-96)
backend/app/crud/rag/open_ai.py (2)
  • OpenAIVectorStoreCrud (101-189)
  • OpenAIAssistantCrud (192-246)
backend/app/services/collections/helpers.py (10)
  • _backout (121-129)
  • batch_documents (51-70)
  • SilentCallback (85-92)
  • WebHookCallback (95-118)
  • success (81-82)
  • success (90-92)
  • success (116-118)
  • fail (78-79)
  • fail (86-88)
  • fail (112-114)
backend/app/celery/utils.py (1)
  • start_low_priority_job (46-71)
backend/app/utils.py (1)
  • get_openai_client (175-205)
backend/app/services/collections/delete_collection.py (2)
  • start_job (25-56)
  • execute_job (59-155)
backend/app/tests/services/collections/test_create_collection.py (7)
backend/app/core/cloud/storage.py (1)
  • AmazonCloudStorageClient (28-84)
backend/app/crud/collection/collection.py (5)
  • CollectionCrud (17-159)
  • create (55-80)
  • _ (131-138)
  • _ (141-159)
  • read_one (82-106)
backend/app/crud/collection/collection_job.py (3)
  • CollectionJobCrud (19-96)
  • create (78-96)
  • read_one (24-45)
backend/app/crud/document_collection.py (1)
  • DocumentCollectionCrud (8-48)
backend/app/services/collections/create_collection.py (2)
  • start_job (41-71)
  • execute_job (74-212)
backend/app/tests/utils/openai.py (1)
  • get_mock_openai_client_with_vector_store (90-117)
backend/app/tests/utils/document.py (2)
  • DocumentStore (52-79)
  • project (64-65)
backend/app/tests/services/collections/test_delete_collection.py (7)
backend/app/models/collection.py (3)
  • DeletionRequest (116-117)
  • Collection (13-36)
  • ResponsePayload (39-51)
backend/app/tests/utils/utils.py (1)
  • get_project (70-89)
backend/app/crud/collection/collection.py (5)
  • CollectionCrud (17-159)
  • create (55-80)
  • read_all (108-117)
  • read_one (82-106)
  • delete (120-128)
backend/app/crud/collection/collection_job.py (4)
  • CollectionJobCrud (19-96)
  • create (78-96)
  • read_all (47-56)
  • read_one (24-45)
backend/app/models/collection_job.py (3)
  • CollectionJobStatus (11-15)
  • CollectionJob (35-64)
  • CollectionActionType (18-20)
backend/app/api/routes/collections.py (1)
  • delete_collection (80-114)
backend/app/services/collections/delete_collection.py (2)
  • start_job (25-56)
  • execute_job (59-155)
backend/app/models/collection.py (3)
backend/app/core/util.py (1)
  • now (15-16)
backend/app/models/organization.py (1)
  • Organization (34-57)
backend/app/models/project.py (1)
  • Project (29-63)
backend/app/services/collections/delete_collection.py (8)
backend/app/crud/collection/collection.py (3)
  • CollectionCrud (17-159)
  • read_one (82-106)
  • delete (120-128)
backend/app/crud/collection/collection_job.py (3)
  • CollectionJobCrud (19-96)
  • update (58-76)
  • read_one (24-45)
backend/app/crud/rag/open_ai.py (1)
  • OpenAIAssistantCrud (192-246)
backend/app/models/collection_job.py (2)
  • CollectionJobStatus (11-15)
  • CollectionJobUpdate (74-79)
backend/app/models/collection.py (3)
  • Collection (13-36)
  • DeletionRequest (116-117)
  • ResponsePayload (39-51)
backend/app/services/collections/helpers.py (8)
  • SilentCallback (85-92)
  • WebHookCallback (95-118)
  • success (81-82)
  • success (90-92)
  • success (116-118)
  • fail (78-79)
  • fail (86-88)
  • fail (112-114)
backend/app/celery/utils.py (1)
  • start_low_priority_job (46-71)
backend/app/utils.py (1)
  • get_openai_client (175-205)
🪛 Ruff (0.13.3)
backend/app/services/collections/create_collection.py

52-52: Local variable collection_job is assigned to but never used

Remove assignment to unused variable collection_job

(F841)

backend/app/tests/services/collections/test_delete_collection.py

68-68: Local variable precreated is assigned to but never used

Remove assignment to unused variable precreated

(F841)

backend/app/services/collections/delete_collection.py

37-37: Local variable collection_job is assigned to but never used

Remove assignment to unused variable collection_job

(F841)


67-67: Unused function argument: task_instance

(ARG001)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: checks (3.11.7, 6)
🔇 Additional comments (13)
backend/app/tests/services/collections/test_delete_collection.py (2)

49-108: LGTM!

The test correctly verifies that start_job updates the CollectionJob and schedules the background task with appropriate parameters. The unused precreated variable (line 68) is acceptable here—the job is created for its side effects and fetched later via CRUD.


111-228: LGTM!

Both tests effectively validate the execute_job workflow:

  • Success path verifies job transitions to SUCCESSFUL, CollectionCrud.delete is invoked, and OpenAI client is used.
  • Failure path confirms job is marked FAILED with the error message when delete raises an exception.

The mocking strategy and assertions are appropriate.

backend/app/tests/services/collections/test_create_collection.py (2)

48-109: LGTM!

The test correctly validates start_job behavior, ensuring the CollectionJob is updated and the background task is scheduled with appropriate parameters. The unused variable at line 68 is acceptable—it's created for side effects.


111-190: LGTM!

The test comprehensively validates the execute_job success flow, including:

  • AWS S3 mocking for document storage
  • OpenAI client interaction
  • Collection and DocumentCollection creation
  • Job status updates

The test structure and assertions are thorough and appropriate.

backend/app/models/collection.py (5)

13-36: LGTM!

The Collection model is properly structured with appropriate foreign keys, cascading deletes, and relationships. The timestamp fields follow the project's naming convention.


39-51: LGTM!

The ResponsePayload model properly encapsulates response metadata for background jobs. The default factories for key and time, along with the now() classmethod, are well-designed.


54-69: LGTM!

The DocumentOptions model is well-designed with appropriate defaults and helpful field descriptions. The model_post_init hook that deduplicates document IDs prevents duplicate processing.


72-102: LGTM!

Both AssistantOptions and CallbackRequest are well-documented with detailed field descriptions. The temperature default and its explanation are particularly helpful for API users.


105-129: LGTM!

The request models (CreationRequest, DeletionRequest) and CollectionPublic are properly structured:

  • Multiple inheritance pattern cleanly composes request models
  • extract_super_type utility method is useful for extracting parent class fields
  • CollectionPublic appropriately exposes only necessary fields for the public API
backend/app/services/collections/create_collection.py (2)

41-71: LGTM!

The start_job function correctly orchestrates job scheduling:

  • Updates the CollectionJob with correlation/trace context
  • Schedules the background task with appropriate parameters
  • Returns the job ID for tracking

The "unused" collection_job variable at line 52 is acceptable—the update is persisted via the CRUD method.


199-212: Outer exception handler risks UnboundLocalError.

If an error occurs before line 95 (e.g., during CreationRequest construction or job fetching), collection_job_crud and collection_job won't be defined, causing lines 206-211 to raise UnboundLocalError. Additionally, the session from line 89 is already closed when this handler executes.

This issue was flagged in past reviews and noted for addressing in a future PR. Please ensure it's tracked in the GitHub issue for v2 session management improvements.

Based on past review comments.

backend/app/services/collections/delete_collection.py (2)

25-56: LGTM!

The start_job function properly orchestrates collection deletion:

  • Updates the CollectionJob with trace context
  • Schedules the background task with all necessary parameters
  • Returns the job ID for tracking

The "unused" collection_job variable at line 37 is acceptable—the update is persisted via the CRUD method.


59-117: LGTM!

The execute_job success path is well-structured:

  • Properly constructs request/payload models and selects the appropriate callback
  • Updates job status to PROCESSING before attempting deletion
  • Uses CollectionJobUpdate schema correctly for job updates
  • Marks job SUCCESSFUL and triggers success callback upon completion

@nishika26 nishika26 merged commit 20af88d into main Oct 9, 2025
3 checks passed
@nishika26 nishika26 deleted the feature/collection_to_celery branch October 9, 2025 09:18
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement New feature or request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Celery : introduce a "collection_jobs" table for job tracking Celery : Moving collection's background jobs to celery

3 participants