Celery : moving collections module to celery by nishika26 · Pull Request #385 · ProjectTech4DevAI/kaapi-backend

nishika26 · 2025-09-23T18:20:25Z

Summary

Target issue is #383 and #389

Checklist

Before submitting a pull request, please ensure that you mark these task.

Ran fastapi run --reload app/main.py or docker compose up in the repository root and test.
If you've fixed a bug or added code that is tested and has test cases.

Notes

This PR covers :
- Shifting collection module's dependecy on project id instead of user id
- Swapping Fastapi's background task template with celery
- Introducing a new collection job table which takes care of retaining the job status, error message, etc. An entry will be made in the collection table only when the job has been successful

Summary by CodeRabbit

New Features
- Job-based collection create/delete with background workers, webhook callbacks, polling endpoint for collection jobs, and richer public collection & collection-job metadata.
Documentation
- Updated create/info/delete docs; added collection-job docs and polling guidance clarifying job responses.
Refactor
- Routes/CRUD now project-scoped; collection endpoints delegate to service layer and return simplified public models.
Database
- Added collection_jobs table; renamed timestamp and removed legacy columns from collections.
Helpers
- Added error normalization, document batching, callback framework, and safe rollback utilities.
Tests
- Added/updated tests for job workflows, services, CRUD, and collection lifecycle.

…ent creation

1) Move models from route to models folder 2) fix process response to handle callback response

…clean up imports

…tructured response

…nd error handling

…for job execution

coderabbitai · 2025-09-23T18:20:31Z

Walkthrough

Adds DB-backed CollectionJob lifecycle and CRUD; moves collection create/delete into background services with start_job/execute_job; scopes collections to projects (removes owner_id); introduces new models, helpers, migrations, docs, routes, and tests to support job-based polling, callbacks, and error normalization.

Changes

Cohort / File(s)	Summary
API Docs `backend/app/api/docs/collections/*.md`	Update create/delete docs to return a `collection_job`; add `job_info.md` documenting polling-style job-info endpoint; clarify collection info is not a pollable create endpoint.
API Routes `backend/app/api/routes/collections.py`, `backend/app/api/routes/collection_job.py`, `backend/app/api/routes/documents.py`	Routes moved to project-scoped `CurrentUserOrgProject`; create/delete now schedule background service jobs (`start_job`); new GET `/collections/info/jobs/{job_id}` returns `CollectionJobPublic`; document endpoints and CRUD use `project_id`.
Main Router `backend/app/api/main.py`	Register new collection_job router.
CRUD: Collections & Jobs `backend/app/crud/__init__.py`, `backend/app/crud/collection/__init__.py`, `backend/app/crud/collection/collection.py`, `backend/app/crud/collection/collection_job.py`	`CollectionCrud` now initialized with `project_id` (was `owner_id`) and raises 404 on missing; new `CollectionJobCrud` added; package re-exports updated.
Models `backend/app/models/__init__.py`, `backend/app/models/collection.py`, `backend/app/models/collection_job.py`, `backend/app/models/user.py`	Remove `owner_id`, `status`, `error_message` from Collection; rename `created_at`→`inserted_at`; add Creation/Deletion/Response payload models and `CollectionPublic`; add CollectionJob enums/models and public job types; remove User.collections relationship; expand model exports.
Services & Helpers `backend/app/services/collections/*.py`	New service modules for create/delete with `start_job`/`execute_job`, batching, callback handlers (Silent/WebHook), error extraction (`extract_error_message`), assistant backout, and orchestration with CRUD/OpenAI/vector-store.
Migrations `backend/app/alembic/versions/*.py`	Add migration to create `collection_jobs` table and enums; alter `collection` table (drop owner_id/status/error_message, rename created_at→inserted_at); add cleanup migration removing legacy non-successful rows.
Tests `backend/app/tests/...`	Tests updated/added to exercise CollectionJob lifecycle, service execute flows, and project-scoped CRUD; test helpers adjusted to use `get_project` and `project_id`; some tests refactored to mock start_job.
Test Utils `backend/app/tests/utils/collection.py`, `backend/app/tests/utils/utils.py`	`get_collection` signature changed to accept `project_id`; new/updated `get_project` helper used by tests.

Sequence Diagram(s)

sequenceDiagram
  autonumber
  participant Client
  participant API as Collections API
  participant Jobs as CollectionJobCrud
  participant BG as Scheduler
  participant Exec as Worker.execute_job
  participant DB
  participant OA as OpenAI/VectorStore
  rect rgba(200,240,200,0.18)
    Client->>API: POST /collections/create (CreationRequest)
    API->>Jobs: create(job PENDING, action=CREATE, project_id)
    API->>BG: schedule execute_job(job_id, payload, project_id, org_id)
    API-->>Client: APIResponse { key: job_id }
  end
  rect rgba(220,220,255,0.12)
    BG->>Exec: invoke execute_job(job_id,...)
    Exec->>DB: open session
    Exec->>Jobs: update(job_id, status=PROCESSING, task_id)
    Exec->>OA: create vector store, upload & batch documents
    Exec->>DB: create Collection, link docs, create assistant
    Exec->>Jobs: update(job_id, status=SUCCESSFUL, collection_id)
    Exec-->>Client: optional webhook callback (SUCCESS)
  end
  alt failure
    Exec->>Jobs: update(job_id, status=FAILED, error_message)
    Exec-->>Client: optional webhook callback (FAIL)
  end

sequenceDiagram
  autonumber
  participant Client
  participant API as Collection Job API
  participant Jobs as CollectionJobCrud
  participant Cols as CollectionCrud
  Client->>API: GET /collections/info/jobs/{job_id}
  API->>Jobs: read_one(job_id)
  alt job.status == SUCCESSFUL and job.collection_id
    API->>Cols: read_one(collection_id)
    API-->>Client: APIResponse { data: CollectionPublic (embedded) }
  else job.status == FAILED
    API-->>Client: APIResponse { data: CollectionJobPublic (normalized error) }
  else
    API-->>Client: APIResponse { data: CollectionJobPublic (PENDING/PROCESSING) }
  end

Estimated code review effort

🎯 5 (Critical) | ⏱️ ~120 minutes

Possibly related PRs

Collection: Standardized OpenAI Creds Fetching #258 — related changes to project-scoped routing and OpenAI client handling in collection/document flows.
Refactor Document Module and Cloud Storage to use project-based organization and add signed URL support #346 — overlapping refactor switching collection/document endpoints and CRUD to project-scoped access.

Suggested labels

ready-for-review

Suggested reviewers

avirajsingh7
AkhileshNegi
kartpop

Poem

Thump-thump, I queue the job and hop along,
Batches hum and vectors hum a brighter song.
A job ID key, a webhook if it's done,
I tidy up softly when failures come.
Hooray — projects guide my carrot-chasing fun! 🥕🐰

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 50.00% which is insufficient. The required threshold is 80.00%.	You can run `@coderabbitai generate docstrings` to improve docstring coverage.

✅ Passed checks (2 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title Check	✅ Passed	The title “Celery : moving collections module to celery” clearly indicates the primary change of migrating the collections workflow to Celery-based processing, which aligns with the core update of replacing FastAPI background tasks with Celery for collection operations. While the phrasing is slightly redundant, it accurately reflects the main focus of this pull request.

✨ Finishing touches

📝 Generate docstrings

🧪 Generate unit tests (beta)

Create PR with unit tests
Post copyable unit tests in a comment
Commit unit tests in branch feature/collection_to_celery

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

…ling

…e them

…the shift from user id to project id

…functions

… request metadata

…erate_response function

coderabbitai

Actionable comments posted: 6

♻️ Duplicate comments (2)

backend/app/alembic/versions/b30727137e65_adding_collection_job_table_and_alter_collection_table.py (2)
46-46: Duplicate: task_id should be TEXT/VARCHAR, not UUID.

As flagged in a previous review comment, Celery task IDs are strings (not UUIDs), so this column type is incorrect. This issue was already raised but has not been addressed.

Apply this diff:
-        sa.Column("task_id", sa.Uuid(), nullable=True),
+        sa.Column("task_id", sa.String(), nullable=True),
46-46: Critical: task_id column type should be String, not UUID.

Celery generates task IDs as strings (typically 36-character hyphenated strings like "a5f3c8d2-1b4e-4f9a-8c7d-2e5f6a1b8c9d"), not as native UUID types. Using sa.Uuid() will cause type mismatch errors when storing Celery task IDs.

Apply this diff:
-        sa.Column("task_id", sa.Uuid(), nullable=True),
+        sa.Column("task_id", sa.String(), nullable=True),
Note: This issue was previously flagged by @coderabbitai[bot] and remains unresolved.

🧹 Nitpick comments (1)

backend/app/alembic/versions/b30727137e65_adding_collection_job_table_and_alter_collection_table.py (1)
37-38: Add explicit ENUM type cleanup in downgrade.

The upgrade() function creates two ENUM types (lines 37-38), but the downgrade() function (line 94) doesn't explicitly drop them. While PostgreSQL may automatically clean up unused ENUMs when the table is dropped, explicit cleanup ensures complete and predictable reversal.

Add ENUM cleanup to the downgrade function:
     op.alter_column("collection", "inserted_at", new_column_name="created_at")
     op.drop_table("collection_jobs")
+    
+    # Explicitly drop ENUM types created in upgrade
+    collection_action_type.drop(op.get_bind(), checkfirst=True)
+    collection_job_status_enum.drop(op.get_bind(), checkfirst=True)
Note: Use checkfirst=True to avoid errors if the types are still referenced elsewhere or have already been dropped.

Also applies to: 94-94

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 81e0890 and 5b6f2f2.

📒 Files selected for processing (2)

backend/app/alembic/versions/7ab577d3af26_delete_non_successful_columns_from_collection_table.py (1 hunks)
backend/app/alembic/versions/b30727137e65_adding_collection_job_table_and_alter_collection_table.py (1 hunks)

🧰 Additional context used

🧬 Code graph analysis (2)

backend/app/alembic/versions/7ab577d3af26_delete_non_successful_columns_from_collection_table.py (1)

backend/app/alembic/versions/b30727137e65_adding_collection_job_table_and_alter_collection_table.py (2)

upgrade (36-61)

downgrade (64-94)

backend/app/alembic/versions/b30727137e65_adding_collection_job_table_and_alter_collection_table.py (1)

backend/app/alembic/versions/7ab577d3af26_delete_non_successful_columns_from_collection_table.py (2)

upgrade (20-26)

downgrade (29-30)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)

GitHub Check: checks (3.11.7, 6)

🔇 Additional comments (2)

backend/app/alembic/versions/b30727137e65_adding_collection_job_table_and_alter_collection_table.py (1)

57-61: project_id is already on collection; removing owner_id is safe
Migration 3389c67fdcb4_add_alter_columns_in_collections_table.py adds a nullable project_id column to collection, so dropping owner_id here aligns with project-scoped ownership.
backend/app/alembic/versions/7ab577d3af26_delete_non_successful_columns_from_collection_table.py (1)
20-30: Verify permanent data loss manually.

Automatic script execution failed (sqlalchemy unavailable). Run a SQL query (e.g., via psql):
psql $DATABASE_URL -c "SELECT status, COUNT(*) FROM collection WHERE status IN ('processing','failed') GROUP BY status;"
Confirm that deleting these rows is acceptable. If any data must be retained, archive it to a separate table or backup before applying this migration.

...end/app/alembic/versions/7ab577d3af26_delete_non_successful_columns_from_collection_table.py

.../app/alembic/versions/b30727137e65_adding_collection_job_table_and_alter_collection_table.py

kartpop

general comments

is the __init__.py in services/collections required?

kartpop · 2025-10-06T11:16:23Z

backend/app/api/routes/collections.py

+    description=load_description("collections/job_info.md"),
+    response_model=APIResponse,
+)
+def collection_job_info(


let us keep this endpoint in a separate file to be consistent across the codebase; we already have separate files for documents.py and doc_transformation_job.py

backend/app/models/collection.py

backend/app/services/collections/helpers.py

kartpop · 2025-10-06T12:53:47Z

backend/app/services/collections/create_collection.py

+                    exc_info=True,
+                )
+
+                if "assistant" in locals():


any reason to rely on locals()? simpler way would be to check for assistant and assistant.id and _backout if it does

"locals() is not reliable for checking variable existence - it reflects the state at the time it's called, but the interpreter may optimize variables differently"

yes simpler way would be to check if we have an assistant id and if we do, then we go ahead and do the cleanup. however i would have to spend a little more time on that as it is not that straightforward. will cover it in the assistant cleanup pr i will be making for v2

backend/app/models/collection.py

coderabbitai

Actionable comments posted: 5

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

backend/app/crud/collection/collection.py (1)

67-85: Creation flow is incorrect: broad except swallows duplicates and other errors.

If read_one finds an existing collection, you raise FileExistsError but immediately catch it and still add, causing IntegrityError. Also using read_one for existence is wrong and not scoped to service fields.

-        try:
-            existing = self.read_one(collection.id)
-
-            raise FileExistsError("Collection already present")
-        except:
-            self.session.add(collection)
-            self.session.commit()
+        # Idempotency and scope checks
+        if self._exists(collection):
+            raise FileExistsError("Collection already present for this project and service")
+        if not collection.project_id:
+            collection.project_id = self.project_id
+        self.session.add(collection)
+        self.session.commit()
+        self.session.refresh(collection)

♻️ Duplicate comments (9)

backend/app/alembic/versions/7ab577d3af26_delete_non_successful_columns_from_collection_table.py (2)

4-4: Docstring inconsistency has been resolved.

The docstring now correctly states "Revises: c6fb6d0b5897", which matches the down_revision value on line 15. This resolves the critical issue flagged in previous reviews.

14-15: Migration revision chain is now correct.

The down_revision is properly set to "c6fb6d0b5897", ensuring this migration runs after c6fb6d0b5897 and before b30727137e65 (which drops the status column). The migration order is now consistent.

backend/app/models/collection.py (1)

79-79: Typo: “compatable” → “compatible”.

-            "must be compatable with the assistants API; see the "
+            "must be compatible with the assistants API; see the "

backend/app/services/collections/create_collection.py (2)

187-189: Avoid locals() for existence checks.

Initialize assistant to None and check explicitly.

-                if "assistant" in locals():
-                    _backout(assistant_crud, assistant.id)
+                if assistant is not None:
+                    _backout(assistant_crud, assistant.id)

Add near line 119 (before the inner try):

+                assistant = None

197-207: Fix outer exception handler: undefined err, closed session, and ensure job is marked FAILED.

Use outer_err, reopen a session, and send callback failure. Based on learnings

     except Exception as outer_err:
         logger.error(
             "[create_collection.execute_job] Unexpected Error during collection creation: %s",
             str(outer_err),
             exc_info=True,
         )
-
-        collection_job.status = CollectionJobStatus.FAILED
-        collection_job.updated_at = now()
-        collection_job.error_message = str(err)
-        collection_job_crud.update(collection_job.id, collection_job)
+        # Best-effort: update job state in a fresh session
+        with Session(engine) as session:
+            job_crud = CollectionJobCrud(session, project_id)
+            try:
+                job = job_crud.read_one(UUID(job_id) if isinstance(job_id, str) else job_id)
+            except Exception:
+                logger.exception(
+                    "[create_collection.execute_job] Unable to mark collection job as failed | {'job_id': '%s'}",
+                    str(job_id),
+                )
+            else:
+                job.status = CollectionJobStatus.FAILED
+                job.error_message = str(outer_err)
+                job_crud.update(job.id, job)
+        if 'callback' in locals():
+            callback.fail(str(outer_err))

backend/app/tests/api/routes/collections/test_create_collections.py (1)

2-5: Remove duplicate patch import.

Keep a single import to satisfy Ruff F811.
-from unittest.mock import patch
-
 from fastapi.testclient import TestClient
 from unittest.mock import patch
+from uuid import UUID

backend/app/models/collection_job.py (2)

71-77: Align update schema with actual types and usage.

task_id must be str | None (Celery task IDs are strings).

Prior feedback: consider including id and project_id or change CRUD to accept CollectionJob. Current CRUD ignores payload but type hints are misleading. Based on learnings
-class CollectionJobUpdate(SQLModel):
-    task_id: UUID | None = None
-    status: CollectionJobStatus
-    error_message: str | None = None
-    collection_id: UUID | None = None
-
-    updated_at: datetime
+class CollectionJobUpdate(SQLModel):
+    # Optional: include identifiers if CRUD expects them at type-level
+    # id: UUID
+    # project_id: int
+    task_id: str | None = None
+    status: CollectionJobStatus
+    error_message: str | None = None
+    collection_id: UUID | None = None
+    # Let CRUD set timestamps
+    updated_at: datetime | None = None
47-49: Make task_id optional; it’s not available at creation.

As written, task_id is required and will fail on create(). Celery provides it later.
-    task_id: str = Field(nullable=True)
+    task_id: str | None = Field(default=None, nullable=True)

backend/app/services/collections/delete_collection.py (1)

124-137: Robust failure handling: avoid UnboundLocalError/closed session; fix update call.

Don’t use collection_job/collection_job_crud after session closes.
Current update(collection_job) has wrong arity.
Log with safe IDs even if collection wasn’t loaded. Based on learnings

-    except Exception as err:
-        collection_job.status = CollectionJobStatus.FAILED
-        collection_job.error_message = str(err)
-        collection_job_crud.update(collection_job)
-
-        logger.error(
-            "[delete_collection.execute_job] Unexpected error during deletion | "
-            "{'collection_id': '%s', 'error': '%s', 'error_type': '%s', 'job_id': '%s'}",
-            str(collection.id),
-            str(err),
-            type(err).__name__,
-            str(job_id),
-            exc_info=True,
-        )
-        callback.fail(str(err))
+    except Exception as err:
+        logger.error(
+            "[delete_collection.execute_job] Unexpected error during deletion | "
+            "{'collection_id': '%s', 'error': '%s', 'error_type': '%s', 'job_id': '%s'}",
+            str(collection_id),
+            str(err),
+            type(err).__name__,
+            str(job_id),
+            exc_info=True,
+        )
+        # Reopen session to update job state
+        with Session(engine) as session:
+            job_crud = CollectionJobCrud(session, project_id)
+            try:
+                job = job_crud.read_one(job_id)
+            except Exception:
+                logger.exception(
+                    "[delete_collection.execute_job] Unable to mark collection job as failed | {'job_id': '%s'}",
+                    str(job_id),
+                )
+            else:
+                job.status = CollectionJobStatus.FAILED
+                job.error_message = str(err)
+                job_crud.update(job.id, job)
+        if 'callback' in locals():
+            callback.fail(str(err))

🧹 Nitpick comments (21)

backend/app/alembic/versions/7ab577d3af26_delete_non_successful_columns_from_collection_table.py (1)
29-30: Document the no-op downgrade.

The empty downgrade function is intentional for a data cleanup migration, but consider adding a comment explaining that the deleted data cannot be recovered on rollback.

Apply this diff to document the design decision:
 def downgrade():
+    # No-op: Data cleanup is irreversible. Deleted rows cannot be restored.
     pass
backend/app/models/collection.py (5)
3-3: Use builtin generics; drop typing.List (UP035).

Prefer built‑in collection types with Pydantic v2 / SQLModel. Also remove unused List import.

Based on static analysis hints
-from typing import Any, List, Optional
+from typing import Any, Optional
@@
-    documents: List[UUID] = Field(
+    documents: list[UUID] = Field(
Also applies to: 56-56

31-33: Make timestamps non-nullable for consistency.

Align with Organization/Project models; mark inserted_at/updated_at as nullable=False.

Based on learnings
-    inserted_at: datetime = Field(default_factory=now)
-    updated_at: datetime = Field(default_factory=now)
+    inserted_at: datetime = Field(default_factory=now, nullable=False)
+    updated_at: datetime = Field(default_factory=now, nullable=False)
68-70: Deduplicate while preserving order.

set() reorders UUIDs; use dict.fromkeys to keep stable order.
-    def model_post_init(self, __context: Any):
-        self.documents = list(set(self.documents))
+    def model_post_init(self, __context: Any):
+        # Deduplicate while preserving order
+        self.documents = list(dict.fromkeys(self.documents))
73-75: Fix typo in comment: "clien" → "client".
-    # parameters accepted by the OpenAI.clien.beta.assistants.create
+    # parameters accepted by the OpenAI client (e.g., openai.beta.assistants.create)
110-114: Use Pydantic v2 model_fields and correct type hint.

__fields__ is v1; prefer model_fields with fallback. Accept a model class type.
-    def extract_super_type(self, cls: "CreationRequest"):
-        for field_name in cls.__fields__.keys():
-            field_value = getattr(self, field_name)
-            yield (field_name, field_value)
+    def extract_super_type(self, cls: type[SQLModel]):
+        # Pydantic v2 uses `model_fields`; keep fallback for compatibility
+        fields_map = getattr(cls, "model_fields", None) or getattr(cls, "__fields__", {})
+        for field_name in fields_map.keys():
+            yield field_name, getattr(self, field_name)
backend/app/crud/collection/collection.py (5)
5-5: Remove duplicate import.

import logging is already at Line 1.
-import logging
24-30: Clarify error message and fix typo.

“attempter” typo and confusing labels; make explicit.
-            err = "Invalid collection ownership: owner_project={} attempter={}".format(
-                self.project_id,
-                collection.project_id,
-            )
+            err = (
+                f"Invalid collection ownership: expected_project={self.project_id}, "
+                f"collection_project={collection.project_id}"
+            )
49-65: Existence check should ignore soft-deleted records and log scope.

Prevent false positives from soft-deleted collections; enrich log context.
         stmt = (
             select(Collection.id)
             .where(
-                (Collection.project_id == self.project_id)
-                & (Collection.llm_service_id == collection.llm_service_id)
-                & (Collection.llm_service_name == collection.llm_service_name)
+                (Collection.project_id == self.project_id)
+                & (Collection.llm_service_id == collection.llm_service_id)
+                & (Collection.llm_service_name == collection.llm_service_name)
+                & (Collection.deleted_at.is_(None))
             )
             .limit(1)
         )
         present = self.session.exec(stmt).first() is not None
 
-        logger.info(
-            "[CollectionCrud._exists] Existence check completed | "
-            f"{{'llm_service_id': '{collection.llm_service_id}', 'exists': {present}}}"
-        )
+        logger.info(
+            "[CollectionCrud._exists] Existence check | "
+            f\"{{'project_id': '{self.project_id}', 'llm_service_id': '{collection.llm_service_id}', "
+            f"'llm_service_name': '{collection.llm_service_name}', 'exists': {present}}}\"
+        )
         return present
135-142: Guard remote delete when llm_service_id is None.

Avoid calling remote.delete(None) when the local link is missing.
-        remote.delete(model.llm_service_id)
+        if model.llm_service_id:
+            remote.delete(model.llm_service_id)
146-158: Scope document-based deletion to project and exclude soft-deleted collections.

Prevent cross-project deletions and skip already soft-deleted rows.
         statement = (
             select(Collection)
             .join(
                 DocumentCollection,
                 DocumentCollection.collection_id == Collection.id,
             )
-            .where(DocumentCollection.document_id == model.id)
+            .where(
+                and_(
+                    DocumentCollection.document_id == model.id,
+                    Collection.project_id == self.project_id,
+                    Collection.deleted_at.is_(None),
+                )
+            )
             .distinct()
         )
backend/app/tests/crud/collections/collection_job/test_collection_jobs.py (2)
70-70: Pass UUID, not str, to read_one.

Crud expects UUID; avoid implicit casting.
-    retrieved_job = collection_job_crud.read_one(str(collection_job.id))
+    retrieved_job = collection_job_crud.read_one(collection_job.id)
84-84: Unnecessary commit in test.

Helper already commits; this can be removed to speed tests.
-    db.commit()
+    # commit not required here
backend/app/api/routes/collection_job.py (1)
27-29: Prefer | over typing.Union in annotations.

Cleaner and satisfies Ruff UP007.
-    response_model=Union[
-        APIResponse[CollectionPublic], APIResponse[CollectionJobPublic]
-    ],
+    response_model=APIResponse[CollectionPublic] | APIResponse[CollectionJobPublic],
backend/app/services/collections/delete_collection.py (3)
91-101: Set job to PROCESSING when work starts.

Mirror create flow; improves observability and client UX.
     collection_job = collection_job_crud.read_one(job_id)
 
     collection_job.task_id = task_id
+    collection_job.status = CollectionJobStatus.PROCESSING
+    collection_job_crud.update(collection_job.id, collection_job)
65-74: Minor: unused task_instance.

Prefix with underscore to silence Ruff ARG001 and signal intent.
-    task_instance,
+    _task_instance,
66-73: Optional: accept job_id as str and cast to UUID internally (Celery passes str).

Keeps signatures consistent with create flow.
-    task_id: str,
-    job_id: UUID,
+    task_id: str,
+    job_id: str,
     collection_id: UUID,
     _task_instance,
 ) -> None:
+    # Cast once at the start of the function body:
+    job_id = UUID(job_id)
backend/app/services/collections/create_collection.py (1)
86-86: Minor: unused task_instance.

Prefix with underscore to silence Ruff ARG001 and signal intent.
-    task_instance,
+    _task_instance,
backend/app/api/routes/collections.py (3)

4-4: Swap to builtin list (and drop unused Union).

Ruff already points out typing.List; please switch to the builtin list annotation and remove the unused Union import while you’re here.

46-47: Lean on CollectionJobStatus for payload status.

We already import CollectionJobStatus; wiring payload.status = CollectionJobStatus.PENDING.value keeps the metadata aligned with the enum the rest of the flow uses (and avoids ad‑hoc strings like "processing").

80-81: Apply the same enum alignment in delete payloads.

Same reasoning as above—set the payload status via CollectionJobStatus.PENDING.value instead of a bare "processing" string so clients see consistent status values.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 5b6f2f2 and 4a1f5ff.

📒 Files selected for processing (15)

backend/app/alembic/versions/7ab577d3af26_delete_non_successful_columns_from_collection_table.py (1 hunks)
backend/app/alembic/versions/b30727137e65_adding_collection_job_table_and_alter_collection_table.py (1 hunks)
backend/app/api/docs/collections/create.md (1 hunks)
backend/app/api/main.py (1 hunks)
backend/app/api/routes/collection_job.py (1 hunks)
backend/app/api/routes/collections.py (4 hunks)
backend/app/crud/collection/collection.py (5 hunks)
backend/app/models/collection.py (2 hunks)
backend/app/models/collection_job.py (1 hunks)
backend/app/services/collections/create_collection.py (1 hunks)
backend/app/services/collections/delete_collection.py (1 hunks)
backend/app/tests/api/routes/collections/test_create_collections.py (1 hunks)
backend/app/tests/crud/collections/collection_job/test_collection_jobs.py (1 hunks)
backend/app/tests/services/collections/test_create_collection.py (1 hunks)
backend/app/tests/services/collections/test_delete_collection.py (1 hunks)

🚧 Files skipped from review as they are similar to previous changes (3)

backend/app/tests/services/collections/test_create_collection.py
backend/app/tests/services/collections/test_delete_collection.py
backend/app/alembic/versions/b30727137e65_adding_collection_job_table_and_alter_collection_table.py

🧰 Additional context used

🧬 Code graph analysis (10)

backend/app/tests/api/routes/collections/test_create_collections.py (3)

backend/app/models/collection.py (2)

Collection (13-36)

CreationRequest (105-113)

backend/app/core/cloud/storage.py (1)

client (30-42)

backend/app/tests/conftest.py (2)

client (52-55)

user_api_key_header (77-79)

backend/app/tests/crud/collections/collection_job/test_collection_jobs.py (4)

backend/app/models/collection_job.py (3)

CollectionJob (34-60)

CollectionJobStatus (10-14)

CollectionActionType (17-19)

backend/app/crud/collection/collection_job.py (1)

CollectionJobCrud (19-93)

backend/app/tests/utils/utils.py (1)

get_project (70-89)

backend/app/tests/conftest.py (1)

db (24-41)

backend/app/api/routes/collections.py (8)

backend/app/crud/collection/collection.py (3)

CollectionCrud (18-163)

read_one (86-110)

read_all (112-121)

backend/app/crud/collection/collection_job.py (3)

CollectionJobCrud (19-93)

read_one (24-47)

read_all (49-58)

backend/app/crud/document_collection.py (1)

DocumentCollectionCrud (8-48)

backend/app/models/collection.py (4)

ResponsePayload (39-51)

CreationRequest (105-113)

DeletionRequest (116-117)

CollectionPublic (120-129)

backend/app/utils.py (3)

APIResponse (29-53)

load_description (248-253)

success_response (36-39)

backend/app/services/collections/helpers.py (1)

extract_error_message (23-46)

backend/app/services/collections/create_collection.py (1)

start_job (41-76)

backend/app/services/collections/delete_collection.py (1)

start_job (25-62)

backend/app/models/collection.py (3)

backend/app/core/util.py (1)

now (15-16)

backend/app/models/organization.py (1)

Organization (34-57)

backend/app/models/project.py (1)

Project (29-63)

backend/app/alembic/versions/7ab577d3af26_delete_non_successful_columns_from_collection_table.py (1)

backend/app/alembic/versions/b30727137e65_adding_collection_job_table_and_alter_collection_table.py (2)

upgrade (36-61)

downgrade (64-100)

backend/app/api/routes/collection_job.py (6)

backend/app/crud/collection/collection.py (2)

CollectionCrud (18-163)

read_one (86-110)

backend/app/crud/collection/collection_job.py (1)

CollectionJobCrud (19-93)

backend/app/models/collection_job.py (2)

CollectionJobStatus (10-14)

CollectionJobPublic (80-87)

backend/app/models/collection.py (1)

CollectionPublic (120-129)

backend/app/utils.py (3)

APIResponse (29-53)

load_description (248-253)

success_response (36-39)

backend/app/services/collections/helpers.py (1)

extract_error_message (23-46)

backend/app/services/collections/delete_collection.py (8)

backend/app/crud/collection/collection.py (4)

CollectionCrud (18-163)

create (67-84)

read_one (86-110)

delete (124-132)

backend/app/crud/collection/collection_job.py (4)

CollectionJobCrud (19-93)

create (76-93)

read_one (24-47)

update (60-74)

backend/app/crud/rag/open_ai.py (1)

OpenAIAssistantCrud (192-246)

backend/app/models/collection.py (3)

Collection (13-36)

DeletionRequest (116-117)

ResponsePayload (39-51)

backend/app/services/collections/helpers.py (8)

SilentCallback (81-88)

WebHookCallback (91-114)

success (77-78)

success (86-88)

success (112-114)

fail (74-75)

fail (82-84)

fail (108-110)

backend/app/celery/utils.py (1)

start_low_priority_job (46-71)

backend/app/utils.py (1)

get_openai_client (175-205)

backend/app/services/collections/create_collection.py (2)

start_job (41-76)

execute_job (79-207)

backend/app/services/collections/create_collection.py (9)

backend/app/core/cloud/storage.py (1)

get_cloud_storage (262-279)

backend/app/models/collection.py (6)

now (49-51)

Collection (13-36)

ResponsePayload (39-51)

CreationRequest (105-113)

AssistantOptions (72-95)

extract_super_type (110-113)

backend/app/crud/collection/collection.py (3)

CollectionCrud (18-163)

create (67-84)

read_one (86-110)

backend/app/crud/collection/collection_job.py (4)

CollectionJobCrud (19-93)

create (76-93)

read_one (24-47)

update (60-74)

backend/app/crud/rag/open_ai.py (2)

OpenAIVectorStoreCrud (101-189)

OpenAIAssistantCrud (192-246)

backend/app/services/collections/helpers.py (10)

_backout (117-124)

batch_documents (50-66)

SilentCallback (81-88)

WebHookCallback (91-114)

success (77-78)

success (86-88)

success (112-114)

fail (74-75)

fail (82-84)

fail (108-110)

backend/app/celery/utils.py (1)

start_low_priority_job (46-71)

backend/app/utils.py (1)

get_openai_client (175-205)

backend/app/services/collections/delete_collection.py (2)

start_job (25-62)

execute_job (65-138)

backend/app/crud/collection/collection.py (3)

backend/app/models/collection.py (2)

Collection (13-36)

now (49-51)

backend/app/models/document_collection.py (1)

DocumentCollection (9-23)

backend/app/crud/document_collection.py (1)

DocumentCollectionCrud (8-48)

backend/app/models/collection_job.py (1)

backend/app/models/collection.py (1)

now (49-51)

🪛 Ruff (0.13.3)

backend/app/tests/api/routes/collections/test_create_collections.py

5-5: Redefinition of unused patch from line 2

Remove definition: patch

(F811)

backend/app/api/routes/collections.py

4-4: typing.List is deprecated, use list instead

(UP035)

backend/app/models/collection.py

3-3: typing.List is deprecated, use list instead

(UP035)

backend/app/api/routes/collection_job.py

27-29: Use X | Y for type annotations

(UP007)

backend/app/services/collections/delete_collection.py

73-73: Unused function argument: task_instance

(ARG001)

backend/app/services/collections/create_collection.py

206-206: Undefined name err

(F821)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)

GitHub Check: checks (3.11.7, 6)

🔇 Additional comments (4)

backend/app/alembic/versions/7ab577d3af26_delete_non_successful_columns_from_collection_table.py (1)

20-26: All foreign key constraints referencing collection.id use ON DELETE CASCADE. No non-cascading foreign keys found.

backend/app/api/main.py (1)

23-24: LGTM — route included.

Adding collection_job router looks correct.

Also applies to: 31-31

backend/app/api/docs/collections/create.md (1)

23-26: LGTM — docs aligned with job-based flow.

Clear that key refers to collection job ID and polling via job-info endpoint.

Consider linking the exact route (path/method) for “collection job info” here for quick navigation.
backend/app/services/collections/create_collection.py (1)
156-160: Remove duplicate create; it re-adds the same collection.

Second create risks integrity errors and is redundant.
-                if flat_docs:
-                    DocumentCollectionCrud(session).create(collection_data, flat_docs)
-
-                collection_crud.create(collection_data)
+                if flat_docs:
+                    DocumentCollectionCrud(session).create(collection_data, flat_docs)
Likely an incorrect or invalid review comment.

backend/app/services/collections/create_collection.py

backend/app/services/collections/delete_collection.py

backend/app/tests/crud/collections/collection_job/test_collection_jobs.py

coderabbitai

Actionable comments posted: 5

♻️ Duplicate comments (2)

backend/app/services/collections/create_collection.py (1)

195-204: Fix NameError and handle missing variables in outer exception handler.

Line 203 references err (undefined) instead of outer_err. Also, if the error occurs before collection_job/collection_job_crud are initialized (e.g., during session creation), accessing them raises NameError or UnboundLocalError, masking the original error and leaving the job stuck in PENDING.

Reopen a fresh session and defensively update the job:

     except Exception as outer_err:
         logger.error(
             "[create_collection.execute_job] Unexpected Error during collection creation: %s",
             str(outer_err),
             exc_info=True,
         )
 
-        collection_job.status = CollectionJobStatus.FAILED
-        collection_job.error_message = str(err)
-        collection_job_crud.update(collection_job.id, collection_job)
+        with Session(engine) as session:
+            job_crud = CollectionJobCrud(session, project_id)
+            try:
+                job = job_crud.read_one(job_id)
+                update_data = CollectionJobUpdate(
+                    status=CollectionJobStatus.FAILED,
+                    error_message=str(outer_err),
+                )
+                job_crud.update(job.id, update_data)
+            except Exception:
+                logger.exception(
+                    "[create_collection.execute_job] Unable to mark job as failed | "
+                    "{'job_id': '%s'}",
+                    str(job_id),
+                )
+
+        if "callback" in locals():
+            callback.fail(str(outer_err))

Import CollectionJobUpdate if not already present (same as previous comment).

backend/app/services/collections/delete_collection.py (1)

124-138: Fix incorrect update call and handle missing variables.

Line 127 calls update with one argument (collection_job) but the signature requires two: job_id and patch: CollectionJobUpdate. Additionally, if the error occurs before these variables are defined, the handler raises NameError/UnboundLocalError, masking the root cause.

Reopen a session and update defensively:

     except Exception as err:
+        logger.error(
+            "[delete_collection.execute_job] Unexpected error during deletion | "
+            "{'collection_id': '%s', 'error': '%s', 'error_type': '%s', 'job_id': '%s'}",
+            str(collection_id),
+            str(err),
+            type(err).__name__,
+            str(job_id),
+            exc_info=True,
+        )
+
+        with Session(engine) as session:
+            job_crud = CollectionJobCrud(session, project_id)
+            try:
+                job = job_crud.read_one(job_id)
+                update_data = CollectionJobUpdate(
+                    status=CollectionJobStatus.FAILED,
+                    error_message=str(err),
+                )
+                job_crud.update(job.id, update_data)
+            except Exception:
+                logger.exception(
+                    "[delete_collection.execute_job] Unable to mark job as failed | "
+                    "{'job_id': '%s'}",
+                    str(job_id),
+                )
+
-        collection_job.status = CollectionJobStatus.FAILED
-        collection_job.error_message = str(err)
-        collection_job_crud.update(collection_job)
-
-        logger.error(
-            "[delete_collection.execute_job] Unexpected error during deletion | "
-            "{'collection_id': '%s', 'error': '%s', 'error_type': '%s', 'job_id': '%s'}",
-            str(collection.id),
-            str(err),
-            type(err).__name__,
-            str(job_id),
-            exc_info=True,
-        )
         callback.fail(str(err))

Import CollectionJobUpdate (same as previous comment).

🧹 Nitpick comments (1)

backend/app/crud/collection/collection_job.py (1)
3-3: Use modern list type hint instead of deprecated typing.List.

Python 3.9+ supports native list for type hints.

Apply this diff:
-from typing import List
+from typing import TYPE_CHECKING
-    def read_all(self) -> List[CollectionJob]:
+    def read_all(self) -> list[CollectionJob]:

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 4a1f5ff and 14d50a8.

📒 Files selected for processing (5)

backend/app/crud/collection/collection_job.py (1 hunks)
backend/app/models/collection_job.py (1 hunks)
backend/app/services/collections/create_collection.py (1 hunks)
backend/app/services/collections/delete_collection.py (1 hunks)
backend/app/tests/crud/collections/collection_job/test_collection_jobs.py (1 hunks)

🚧 Files skipped from review as they are similar to previous changes (1)

backend/app/tests/crud/collections/collection_job/test_collection_jobs.py

🧰 Additional context used

🧬 Code graph analysis (3)

backend/app/crud/collection/collection_job.py (3)

backend/app/models/collection_job.py (3)

CollectionJob (34-60)

CollectionJobUpdate (71-75)

CollectionJobCreate (63-68)

backend/app/models/collection.py (1)

now (49-51)

backend/app/crud/collection/collection.py (3)

read_one (86-110)

read_all (112-121)

create (67-84)

backend/app/services/collections/delete_collection.py (9)

backend/app/crud/collection/collection.py (4)

CollectionCrud (18-163)

create (67-84)

read_one (86-110)

delete (124-132)

backend/app/crud/collection/collection_job.py (4)

CollectionJobCrud (19-97)

create (80-97)

read_one (24-47)

update (60-78)

backend/app/crud/rag/open_ai.py (1)

OpenAIAssistantCrud (192-246)

backend/app/models/collection_job.py (3)

CollectionJob (34-60)

CollectionJobStatus (10-14)

CollectionActionType (17-19)

backend/app/models/collection.py (3)

Collection (13-36)

DeletionRequest (116-117)

ResponsePayload (39-51)

backend/app/services/collections/helpers.py (8)

SilentCallback (81-88)

WebHookCallback (91-114)

success (77-78)

success (86-88)

success (112-114)

fail (74-75)

fail (82-84)

fail (108-110)

backend/app/celery/utils.py (1)

start_low_priority_job (46-71)

backend/app/utils.py (1)

get_openai_client (175-205)

backend/app/services/collections/create_collection.py (2)

start_job (41-76)

execute_job (79-204)

backend/app/services/collections/create_collection.py (10)

backend/app/core/cloud/storage.py (1)

get_cloud_storage (262-279)

backend/app/models/collection.py (6)

now (49-51)

Collection (13-36)

ResponsePayload (39-51)

CreationRequest (105-113)

AssistantOptions (72-95)

extract_super_type (110-113)

backend/app/crud/collection/collection.py (3)

CollectionCrud (18-163)

create (67-84)

read_one (86-110)

backend/app/crud/document.py (1)

DocumentCrud (14-135)

backend/app/crud/document_collection.py (1)

DocumentCollectionCrud (8-48)

backend/app/crud/collection/collection_job.py (4)

CollectionJobCrud (19-97)

create (80-97)

read_one (24-47)

update (60-78)

backend/app/crud/rag/open_ai.py (2)

OpenAIVectorStoreCrud (101-189)

OpenAIAssistantCrud (192-246)

backend/app/services/collections/helpers.py (10)

_backout (117-124)

batch_documents (50-66)

SilentCallback (81-88)

WebHookCallback (91-114)

success (77-78)

success (86-88)

success (112-114)

fail (74-75)

fail (82-84)

fail (108-110)

backend/app/celery/utils.py (1)

start_low_priority_job (46-71)

backend/app/utils.py (1)

get_openai_client (175-205)

🪛 Ruff (0.13.3)

backend/app/crud/collection/collection_job.py

3-3: typing.List is deprecated, use list instead

(UP035)

backend/app/services/collections/delete_collection.py

73-73: Unused function argument: task_instance

(ARG001)

backend/app/services/collections/create_collection.py

203-203: Undefined name err

(F821)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)

GitHub Check: checks (3.11.7, 6)

backend/app/crud/collection/collection_job.py

backend/app/models/collection_job.py

backend/app/services/collections/create_collection.py

backend/app/services/collections/delete_collection.py

coderabbitai

Actionable comments posted: 0

♻️ Duplicate comments (1)

backend/app/services/collections/create_collection.py (1)

193-202: Fix NameError and potential UnboundLocalError in outer exception handler.

The outer exception handler has two critical issues:

Line 201: References undefined variable err instead of outer_err, which will raise NameError when this handler is triggered.
Lines 200-202: Assumes collection_job and collection_job_crud are in scope, but if the exception occurs before line 100-101 (e.g., during request deserialization at lines 95-96), these variables won't exist, causing UnboundLocalError.

To fix both issues safely, open a fresh session and update the job status within a try-except block:

     except Exception as outer_err:
         logger.error(
             "[create_collection.execute_job] Unexpected Error during collection creation: %s",
             str(outer_err),
             exc_info=True,
         )
 
-        collection_job.status = CollectionJobStatus.FAILED
-        collection_job.error_message = str(err)
-        collection_job_crud.update(collection_job.id, collection_job)
+        try:
+            with Session(engine) as session:
+                job_crud = CollectionJobCrud(session, project_id)
+                job = job_crud.read_one(UUID(job_id))
+                job.status = CollectionJobStatus.FAILED
+                job.error_message = str(outer_err)
+                job_crud.update(job.id, job)
+        except Exception:
+            logger.exception(
+                "[create_collection.execute_job] Failed to mark job as failed | {'job_id': '%s'}",
+                job_id,
+            )
+
+        if "callback" in locals():
+            callback.fail(str(outer_err))

🧹 Nitpick comments (1)

backend/app/services/collections/create_collection.py (1)
153-157: Consider using collection_crud.create with the documents parameter.

The code creates the collection, reads it back, then links documents via DocumentCollectionCrud. Since CollectionCrud.create already accepts an optional documents parameter, you could simplify this to a single call, eliminating the extra read operation.

Apply this diff:
-                collection_crud.create(collection)
-                collection_data = collection_crud.read_one(collection.id)
-
-                if flat_docs:
-                    DocumentCollectionCrud(session).create(collection_data, flat_docs)
+                collection = collection_crud.create(collection, flat_docs if flat_docs else None)

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 14d50a8 and 1dcab67.

📒 Files selected for processing (2)

backend/app/models/collection_job.py (1 hunks)
backend/app/services/collections/create_collection.py (1 hunks)

🚧 Files skipped from review as they are similar to previous changes (1)

backend/app/models/collection_job.py

🧰 Additional context used

🧬 Code graph analysis (1)

backend/app/services/collections/create_collection.py (11)

backend/app/core/cloud/storage.py (1)

get_cloud_storage (262-279)

backend/app/models/collection.py (6)

now (49-51)

Collection (13-36)

ResponsePayload (39-51)

CreationRequest (105-113)

AssistantOptions (72-95)

extract_super_type (110-113)

backend/app/crud/collection/collection.py (3)

CollectionCrud (18-163)

create (67-84)

read_one (86-110)

backend/app/crud/document.py (1)

DocumentCrud (14-135)

backend/app/crud/document_collection.py (1)

DocumentCollectionCrud (8-48)

backend/app/crud/collection/collection_job.py (4)

CollectionJobCrud (19-97)

create (80-97)

read_one (24-47)

update (60-78)

backend/app/crud/rag/open_ai.py (2)

OpenAIVectorStoreCrud (101-189)

OpenAIAssistantCrud (192-246)

backend/app/services/collections/helpers.py (10)

_backout (117-124)

batch_documents (50-66)

SilentCallback (81-88)

WebHookCallback (91-114)

success (77-78)

success (86-88)

success (112-114)

fail (74-75)

fail (82-84)

fail (108-110)

backend/app/celery/utils.py (1)

start_low_priority_job (46-71)

backend/app/utils.py (1)

get_openai_client (175-205)

backend/app/services/collections/delete_collection.py (2)

start_job (25-62)

execute_job (65-138)

🪛 Ruff (0.13.3)

backend/app/services/collections/create_collection.py

201-201: Undefined name err

(F821)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)

GitHub Check: checks (3.11.7, 6)

backend/app/api/routes/collection_job.py

backend/app/api/routes/collections.py

backend/app/models/collection_job.py

backend/app/services/collections/create_collection.py

backend/app/services/collections/delete_collection.py

backend/app/services/collections/helpers.py

coderabbitai

Actionable comments posted: 3

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

backend/app/crud/collection/collection.py (1)
1-5: Remove duplicate logging import.

The logging module is imported twice (lines 1 and 5), which is redundant.

Apply this diff to remove the duplicate:
 import logging
 import functools as ft
 from uuid import UUID
 from typing import Optional
-import logging
 
 from fastapi import HTTPException

♻️ Duplicate comments (3)

backend/app/tests/api/routes/collections/test_create_collections.py (1)
2-5: Remove duplicate import of patch.

The patch import appears twice (lines 2 and 5). This is flagged by Ruff as F811 (redefinition of unused import).

Apply this diff to remove the redundant import:
 from uuid import UUID
-from unittest.mock import patch
-
 from fastapi.testclient import TestClient
 from unittest.mock import patch
backend/app/services/collections/create_collection.py (1)
188-197: Fix undefined variable in outer exception handler.

Line 196 references err which is undefined in the outer exception scope (the caught exception is outer_err). This will raise a NameError if this handler executes, masking the original error and leaving the job un-updated.

Apply this diff:
     except Exception as outer_err:
         logger.error(
             "[create_collection.execute_job] Unexpected Error during collection creation: %s",
             str(outer_err),
             exc_info=True,
         )
 
         collection_job.status = CollectionJobStatus.FAILED
-        collection_job.error_message = str(err)
+        collection_job.error_message = str(outer_err)
         collection_job_crud.update(collection_job.id, collection_job)
Note: This handler also assumes collection_job and collection_job_crud are in scope, which may not be true if an error occurs before line 95. Consider wrapping this in a fresh session as suggested in previous reviews.
backend/app/services/collections/delete_collection.py (1)
87-87: Persist task_id before proceeding.

The task_id is assigned to collection_job.task_id but not persisted. If an error occurs in the subsequent operations (lines 89-115), the task_id remains unset in the database.

Apply this diff to persist the assignment:
             collection = collection_crud.read_one(collection_id)
             collection_job = collection_job_crud.read_one(job_id)
 
-            collection_job.task_id = task_id
+            update_data = CollectionJobUpdate(task_id=task_id)
+            collection_job = collection_job_crud.update(collection_job.id, update_data)

🧹 Nitpick comments (8)

backend/app/tests/services/collections/test_delete_collection.py (1)
67-72: Remove or acknowledge unused variable.

The variable precreated is assigned but never used in this test. If the assignment is intentional (to ensure the job exists), use the underscore convention to signal this.

Apply this diff:
-        precreated = create_collection_job(
+        _ = create_collection_job(
             db=db,
             project=project,
             collection=created_collection,
             job_id=collection_job_id,
         )
backend/app/crud/collection/collection_job.py (1)
3-3: Use built-in list instead of typing.List.

Python 3.9+ supports using the built-in list type for annotations, making typing.List unnecessary. The project uses Python >= 3.9 per Alembic 1.15 requirements noted in learnings.

Apply this diff:
 from uuid import UUID
 import logging
-from typing import List
 
 from fastapi import HTTPException
And update line 47:
-    def read_all(self) -> List[CollectionJob]:
+    def read_all(self) -> list[CollectionJob]:
backend/app/services/collections/helpers.py (2)
6-6: Use built-in list instead of typing.List.

Python 3.9+ supports using the built-in list type for annotations. The typing.List import is deprecated.

Apply this diff:
 import re
 from uuid import UUID
-from typing import List
 from dataclasses import asdict, replace
And update line 52:
 def batch_documents(
-    document_crud: DocumentCrud, documents: List[UUID], batch_size: int
+    document_crud: DocumentCrud, documents: list[UUID], batch_size: int
 ):
87-87: Remove unnecessary f-string prefixes.

Lines 87, 91, and 117 use f-strings without any placeholders. Remove the f prefix for cleaner code.

Apply this diff:
     def fail(self, body):
-        logger.info(f"[SilentCallback.fail] Silent callback failure")
+        logger.info("[SilentCallback.fail] Silent callback failure")
         return
 
     def success(self, body):
-        logger.info(f"[SilentCallback.success] Silent callback success")
+        logger.info("[SilentCallback.success] Silent callback success")
         return
And for line 117:
     def success(self, body):
-        logger.info(f"[WebHookCallback.success] Callback succeeded")
+        logger.info("[WebHookCallback.success] Callback succeeded")
         self(APIResponse.success_response(body), "complete")
Also applies to: 91-91, 117-117
backend/app/services/collections/create_collection.py (2)
52-54: Remove unused assignment.

The update call returns the updated job but the result is assigned to an unused variable (flagged by Ruff). Either use the returned value or remove the assignment.

Apply this diff:
-    collection_job = job_crud.update(
+    job_crud.update(
         collection_job_id, CollectionJobUpdate(trace_id=trace_id)
     )
148-152: Simplify collection creation flow.

The current flow creates the collection, reads it back, then links documents separately. Since CollectionCrud.create accepts an optional documents parameter (see backend/app/crud/collection/collection.py lines 54-79), you can pass flat_docs directly and eliminate the read-back step.

Apply this diff:
-                collection_crud.create(collection)
-                collection_data = collection_crud.read_one(collection.id)
-
-                if flat_docs:
-                    DocumentCollectionCrud(session).create(collection_data, flat_docs)
+                collection_crud.create(collection, flat_docs if flat_docs else None)
backend/app/services/collections/delete_collection.py (1)
36-38: Remove unused assignment.

The update call returns the updated job but the result is assigned to an unused variable (flagged by Ruff). Either use the returned value or remove the assignment.

Apply this diff:
-    collection_job = job_crud.update(
+    job_crud.update(
         collection_job_id, CollectionJobUpdate(trace_id=trace_id)
     )
backend/app/api/routes/collections.py (1)
4-4: Use built-in list instead of deprecated typing.List.

Python 3.9+ supports using list directly in type annotations. The typing.List is deprecated.

Apply this diff:
-from typing import List
And update the type hints:
-    response_model=APIResponse[List[CollectionPublic]],
+    response_model=APIResponse[list[CollectionPublic]],
-    response_model=APIResponse[List[DocumentPublic]],
+    response_model=APIResponse[list[DocumentPublic]],

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 1dcab67 and 7b09e3f.

📒 Files selected for processing (16)

backend/app/alembic/versions/b30727137e65_adding_collection_job_table_and_alter_collection_table.py (1 hunks)
backend/app/api/docs/collections/create.md (1 hunks)
backend/app/api/docs/collections/delete.md (1 hunks)
backend/app/api/routes/collection_job.py (1 hunks)
backend/app/api/routes/collections.py (4 hunks)
backend/app/crud/collection/collection.py (4 hunks)
backend/app/crud/collection/collection_job.py (1 hunks)
backend/app/models/__init__.py (3 hunks)
backend/app/models/collection_job.py (1 hunks)
backend/app/services/collections/create_collection.py (1 hunks)
backend/app/services/collections/delete_collection.py (1 hunks)
backend/app/services/collections/helpers.py (1 hunks)
backend/app/tests/api/routes/collections/test_collection_info.py (1 hunks)
backend/app/tests/api/routes/collections/test_create_collections.py (1 hunks)
backend/app/tests/services/collections/test_create_collection.py (1 hunks)
backend/app/tests/services/collections/test_delete_collection.py (1 hunks)

🚧 Files skipped from review as they are similar to previous changes (3)

backend/app/api/docs/collections/create.md
backend/app/api/routes/collection_job.py
backend/app/models/collection_job.py

🧰 Additional context used

🧬 Code graph analysis (12)

backend/app/tests/api/routes/collections/test_create_collections.py (2)

backend/app/models/collection.py (2)

Collection (13-36)

CreationRequest (105-113)

backend/app/tests/conftest.py (3)

client (52-55)

user_api_key_header (77-79)

user_api_key (89-91)

backend/app/services/collections/helpers.py (5)

backend/app/core/util.py (1)

post_callback (29-39)

backend/app/crud/document.py (1)

DocumentCrud (14-135)

backend/app/models/collection.py (2)

ResponsePayload (39-51)

now (49-51)

backend/app/crud/rag/open_ai.py (1)

OpenAIAssistantCrud (192-246)

backend/app/utils.py (3)

APIResponse (29-53)

failure_response (42-53)

success_response (36-39)

backend/app/tests/services/collections/test_create_collection.py (10)

backend/app/core/cloud/storage.py (1)

AmazonCloudStorageClient (28-84)

backend/app/crud/collection/collection.py (5)

CollectionCrud (17-159)

create (55-80)

_ (131-138)

_ (141-159)

read_one (82-106)

backend/app/crud/collection/collection_job.py (3)

CollectionJobCrud (19-96)

create (78-96)

read_one (24-45)

backend/app/crud/document_collection.py (1)

DocumentCollectionCrud (8-48)

backend/app/models/collection_job.py (3)

CollectionJobStatus (11-15)

CollectionJob (35-64)

CollectionActionType (18-20)

backend/app/models/collection.py (2)

CreationRequest (105-113)

ResponsePayload (39-51)

backend/app/services/collections/create_collection.py (2)

start_job (41-71)

execute_job (74-197)

backend/app/tests/utils/openai.py (1)

get_mock_openai_client_with_vector_store (90-117)

backend/app/tests/utils/utils.py (1)

get_project (70-89)

backend/app/tests/utils/document.py (2)

DocumentStore (52-79)

project (64-65)

backend/app/services/collections/delete_collection.py (9)

backend/app/crud/collection/collection.py (3)

CollectionCrud (17-159)

read_one (82-106)

delete (120-128)

backend/app/crud/collection/collection_job.py (3)

CollectionJobCrud (19-96)

update (58-76)

read_one (24-45)

backend/app/crud/rag/open_ai.py (1)

OpenAIAssistantCrud (192-246)

backend/app/models/collection_job.py (2)

CollectionJobStatus (11-15)

CollectionJobUpdate (74-79)

backend/app/models/collection.py (3)

Collection (13-36)

DeletionRequest (116-117)

ResponsePayload (39-51)

backend/app/services/collections/helpers.py (8)

SilentCallback (85-92)

WebHookCallback (95-118)

success (81-82)

success (90-92)

success (116-118)

fail (78-79)

fail (86-88)

fail (112-114)

backend/app/celery/utils.py (1)

start_low_priority_job (46-71)

backend/app/utils.py (1)

get_openai_client (175-205)

backend/app/services/collections/create_collection.py (2)

start_job (41-71)

execute_job (74-197)

backend/app/tests/api/routes/collections/test_collection_info.py (6)

backend/app/models/collection.py (2)

now (49-51)

Collection (13-36)

backend/app/models/collection_job.py (4)

CollectionJobCreate (67-71)

CollectionActionType (18-20)

CollectionJobStatus (11-15)

CollectionJobUpdate (74-79)

backend/app/crud/collection/collection_job.py (3)

CollectionJobCrud (19-96)

create (78-96)

update (58-76)

backend/app/crud/collection/collection.py (2)

CollectionCrud (17-159)

create (55-80)

backend/app/api/routes/collections.py (1)

create_collection (44-73)

backend/app/tests/conftest.py (4)

db (24-41)

client (52-55)

user_api_key_header (77-79)

user_api_key (89-91)

backend/app/api/routes/collections.py (9)

backend/app/crud/collection/collection.py (4)

CollectionCrud (17-159)

create (55-80)

read_one (82-106)

read_all (108-117)

backend/app/crud/collection/collection_job.py (4)

CollectionJobCrud (19-96)

create (78-96)

read_one (24-45)

read_all (47-56)

backend/app/crud/document_collection.py (2)

DocumentCollectionCrud (8-48)

create (12-23)

backend/app/models/collection_job.py (3)

CollectionJobStatus (11-15)

CollectionActionType (18-20)

CollectionJobCreate (67-71)

backend/app/models/collection.py (4)

ResponsePayload (39-51)

CreationRequest (105-113)

DeletionRequest (116-117)

CollectionPublic (120-129)

backend/app/utils.py (2)

APIResponse (29-53)

success_response (36-39)

backend/app/services/collections/helpers.py (1)

extract_error_message (22-48)

backend/app/services/collections/create_collection.py (1)

start_job (41-71)

backend/app/services/collections/delete_collection.py (1)

start_job (25-55)

backend/app/services/collections/create_collection.py (11)

backend/app/core/cloud/storage.py (1)

get_cloud_storage (262-279)

backend/app/models/collection.py (6)

now (49-51)

Collection (13-36)

ResponsePayload (39-51)

CreationRequest (105-113)

AssistantOptions (72-95)

extract_super_type (110-113)

backend/app/crud/collection/collection.py (3)

CollectionCrud (17-159)

read_one (82-106)

create (55-80)

backend/app/crud/document.py (1)

DocumentCrud (14-135)

backend/app/crud/document_collection.py (1)

DocumentCollectionCrud (8-48)

backend/app/crud/collection/collection_job.py (4)

CollectionJobCrud (19-96)

update (58-76)

read_one (24-45)

create (78-96)

backend/app/crud/rag/open_ai.py (2)

OpenAIVectorStoreCrud (101-189)

OpenAIAssistantCrud (192-246)

backend/app/services/collections/helpers.py (10)

_backout (121-129)

batch_documents (51-70)

SilentCallback (85-92)

WebHookCallback (95-118)

success (81-82)

success (90-92)

success (116-118)

fail (78-79)

fail (86-88)

fail (112-114)

backend/app/celery/utils.py (1)

start_low_priority_job (46-71)

backend/app/utils.py (1)

get_openai_client (175-205)

backend/app/services/collections/delete_collection.py (2)

start_job (25-55)

execute_job (58-131)

backend/app/crud/collection/collection.py (3)

backend/app/models/collection.py (2)

Collection (13-36)

now (49-51)

backend/app/crud/document_collection.py (1)

DocumentCollectionCrud (8-48)

backend/app/crud/collection/collection_job.py (2)

create (78-96)

read_one (24-45)

backend/app/alembic/versions/b30727137e65_adding_collection_job_table_and_alter_collection_table.py (2)

backend/app/alembic/versions/7ab577d3af26_delete_non_successful_columns_from_collection_table.py (1)

upgrade (20-26)

backend/app/crud/collection/collection_job.py (1)

create (78-96)

backend/app/models/__init__.py (12)

backend/app/models/api_key.py (3)

APIKey (28-38)

APIKeyBase (10-20)

APIKeyPublic (23-25)

backend/app/models/assistants.py (4)

Assistant (33-44)

AssistantBase (11-30)

AssistantCreate (47-76)

AssistantUpdate (79-101)

backend/app/models/collection.py (2)

Collection (13-36)

CollectionPublic (120-129)

backend/app/models/collection_job.py (7)

CollectionActionType (18-20)

CollectionJob (35-64)

CollectionJobBase (23-32)

CollectionJobStatus (11-15)

CollectionJobUpdate (74-79)

CollectionJobPublic (82-91)

CollectionJobCreate (67-71)

backend/app/models/credentials.py (5)

Credential (48-92)

CredsBase (9-16)

CredsCreate (19-29)

CredsPublic (95-103)

CredsUpdate (32-45)

backend/app/models/fine_tuning.py (6)

FineTuningJobBase (20-25)

Fine_Tuning (54-89)

FineTuningJobCreate (28-51)

FineTuningJobPublic (103-119)

FineTuningUpdate (92-100)

FineTuningStatus (13-17)

backend/app/models/model_evaluation.py (6)

ModelEvaluation (38-85)

ModelEvaluationBase (21-26)

ModelEvaluationCreate (29-35)

ModelEvaluationPublic (95-108)

ModelEvaluationStatus (14-18)

ModelEvaluationUpdate (88-92)

backend/app/models/onboarding.py (2)

OnboardingRequest (7-81)

OnboardingResponse (84-103)

backend/app/models/openai_conversation.py (4)

OpenAIConversationPublic (101-111)

OpenAIConversation (58-69)

OpenAIConversationBase (22-55)

OpenAIConversationCreate (72-98)

backend/app/models/organization.py (5)

Organization (34-57)

OrganizationCreate (23-24)

OrganizationPublic (61-64)

OrganizationsPublic (67-69)

OrganizationUpdate (28-30)

backend/app/models/response.py (6)

CallbackResponse (47-54)

Diagnostics (35-39)

FileResultChunk (42-44)

ResponsesAPIRequest (4-11)

ResponseJobStatus (27-32)

ResponsesSyncAPIRequest (14-24)

backend/app/models/threads.py (3)

OpenAI_Thread (18-21)

OpenAIThreadBase (6-11)

OpenAIThreadCreate (14-15)

backend/app/crud/collection/collection_job.py (2)

backend/app/models/collection_job.py (3)

CollectionJob (35-64)

CollectionJobUpdate (74-79)

CollectionJobCreate (67-71)

backend/app/crud/collection/collection.py (3)

read_one (82-106)

read_all (108-117)

create (55-80)

backend/app/tests/services/collections/test_delete_collection.py (7)

backend/app/models/collection.py (3)

DeletionRequest (116-117)

Collection (13-36)

ResponsePayload (39-51)

backend/app/tests/utils/utils.py (1)

get_project (70-89)

backend/app/crud/collection/collection.py (5)

CollectionCrud (17-159)

create (55-80)

read_all (108-117)

read_one (82-106)

delete (120-128)

backend/app/crud/collection/collection_job.py (4)

CollectionJobCrud (19-96)

create (78-96)

read_all (47-56)

read_one (24-45)

backend/app/models/collection_job.py (3)

CollectionJobStatus (11-15)

CollectionJob (35-64)

CollectionActionType (18-20)

backend/app/services/collections/delete_collection.py (2)

start_job (25-55)

execute_job (58-131)

backend/app/tests/conftest.py (1)

db (24-41)

🪛 Ruff (0.13.3)

backend/app/tests/api/routes/collections/test_create_collections.py

5-5: Redefinition of unused patch from line 2

Remove definition: patch

(F811)

backend/app/services/collections/helpers.py

6-6: typing.List is deprecated, use list instead

(UP035)

87-87: f-string without any placeholders

Remove extraneous f prefix

(F541)

91-91: f-string without any placeholders

Remove extraneous f prefix

(F541)

117-117: f-string without any placeholders

Remove extraneous f prefix

(F541)

backend/app/services/collections/delete_collection.py

36-36: Local variable collection_job is assigned to but never used

Remove assignment to unused variable collection_job

(F841)

66-66: Unused function argument: task_instance

(ARG001)

backend/app/api/routes/collections.py

4-4: typing.List is deprecated, use list instead

(UP035)

backend/app/services/collections/create_collection.py

52-52: Local variable collection_job is assigned to but never used

Remove assignment to unused variable collection_job

(F841)

196-196: Undefined name err

(F821)

backend/app/crud/collection/collection_job.py

3-3: typing.List is deprecated, use list instead

(UP035)

backend/app/tests/services/collections/test_delete_collection.py

67-67: Local variable precreated is assigned to but never used

Remove assignment to unused variable precreated

(F841)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)

GitHub Check: checks (3.11.7, 6)

backend/app/services/collections/delete_collection.py

backend/app/tests/api/routes/collections/test_collection_info.py

backend/app/tests/services/collections/test_create_collection.py

coderabbitai

Actionable comments posted: 1

🧹 Nitpick comments (2)

backend/app/tests/services/collections/test_create_collection.py (1)
90-93: Simplify enum comparison.

The assertion checks action_type against both the enum member and its .value, which is overly defensive. Since action_type is an Enum field on the model, it should always be the enum member, not a string.

Apply this diff to simplify:
-        assert job.action_type in (
-            CollectionActionType.CREATE,
-            CollectionActionType.CREATE.value,
-        )
+        assert job.action_type == CollectionActionType.CREATE
backend/app/tests/api/routes/collections/test_collection_info.py (1)
58-64: Consider clarifying the FAILED status handling.

The pattern of creating a job and immediately updating it when status == FAILED works correctly, but may be unclear to readers. This is necessary because CollectionJobCreate doesn't include error_message, so a follow-up update is required.

Add a comment to clarify the intent:
     collection_job = CollectionJobCrud(db, user.project_id).create(job_in)
 
+    # CollectionJobCreate doesn't include error_message, so update separately if needed
     if collection_job.status == CollectionJobStatus.FAILED:
         job_in = CollectionJobUpdate(
             error_message="Something went wrong during the collection job process."
         )

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 7b09e3f and 95ef105.

📒 Files selected for processing (2)

backend/app/tests/api/routes/collections/test_collection_info.py (1 hunks)
backend/app/tests/services/collections/test_create_collection.py (1 hunks)

🧰 Additional context used

🧬 Code graph analysis (2)

backend/app/tests/services/collections/test_create_collection.py (11)

backend/app/core/cloud/storage.py (1)

AmazonCloudStorageClient (28-84)

backend/app/crud/collection/collection.py (5)

CollectionCrud (17-159)

create (55-80)

_ (131-138)

_ (141-159)

read_one (82-106)

backend/app/crud/collection/collection_job.py (3)

CollectionJobCrud (19-96)

create (78-96)

read_one (24-45)

backend/app/crud/document_collection.py (1)

DocumentCollectionCrud (8-48)

backend/app/models/collection_job.py (3)

CollectionJobStatus (11-15)

CollectionJob (35-64)

CollectionActionType (18-20)

backend/app/models/collection.py (2)

CreationRequest (105-113)

ResponsePayload (39-51)

backend/app/services/collections/create_collection.py (2)

start_job (41-71)

execute_job (74-197)

backend/app/tests/utils/openai.py (1)

get_mock_openai_client_with_vector_store (90-117)

backend/app/tests/utils/utils.py (1)

get_project (70-89)

backend/app/tests/utils/document.py (2)

DocumentStore (52-79)

project (64-65)

backend/app/tests/conftest.py (1)

db (24-41)

backend/app/tests/api/routes/collections/test_collection_info.py (6)

backend/app/models/collection.py (2)

now (49-51)

Collection (13-36)

backend/app/models/collection_job.py (4)

CollectionJobCreate (67-71)

CollectionActionType (18-20)

CollectionJobStatus (11-15)

CollectionJobUpdate (74-79)

backend/app/crud/collection/collection_job.py (3)

CollectionJobCrud (19-96)

create (78-96)

update (58-76)

backend/app/crud/collection/collection.py (2)

CollectionCrud (17-159)

create (55-80)

backend/app/api/routes/collections.py (1)

create_collection (44-73)

backend/app/tests/conftest.py (4)

db (24-41)

client (52-55)

user_api_key_header (77-79)

user_api_key (89-91)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)

GitHub Check: checks (3.11.7, 6)

🔇 Additional comments (1)

backend/app/tests/services/collections/test_create_collection.py (1)

111-190: LGTM! Comprehensive integration test.

This test thoroughly validates the end-to-end flow of execute_job:

Proper AWS/OpenAI mocking with moto and MagicMock

Correct Session patching to inject the test database

Verification of all critical state transitions (task_id assignment, status updates, collection creation, document linking)

The test design is solid and covers the success path comprehensively.

backend/app/tests/services/collections/test_create_collection.py

kartpop

please change the route path before merging

coderabbitai

Actionable comments posted: 0

♻️ Duplicate comments (1)

backend/app/api/routes/collection_job.py (1)

23-27: Consider simplifying the route path to /jobs/{job_id}.

Past reviewers suggested changing from /info/jobs/{job_id} to /jobs/{job_id} for consistency with other routes like /documents/transformations/{job_id}. The author mentioned keeping "collection_job_id" for clarity, but the consensus was that /collections/jobs/{job_id} (resulting from the prefix) would be clearer and more consistent.

Based on past review comments.

🧹 Nitpick comments (2)

backend/app/tests/api/routes/collections/test_collection_info.py (2)

84-87: Clarify the assertion on line 86.

Line 86 asserts data["collection_id"] == collection_job.collection_id, but when create_collection_job is called without a collection_id (as in line 74), both sides evaluate to None. While this assertion passes, it doesn't verify meaningful behavior.

Consider either:

Asserting explicitly: assert data["collection_id"] is None

Or adding a comment explaining why checking None equality is intentional

69-138: Consider expanding test coverage.

The current tests cover the happy path scenarios well. Consider adding tests for:

404 response when job_id doesn't exist

Authorization check (accessing another project's job)

DELETE action type in addition to CREATE

Edge cases like PROCESSING status

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 95ef105 and 4de8c1d.

📒 Files selected for processing (4)

backend/app/api/docs/collections/create.md (1 hunks)
backend/app/api/docs/collections/delete.md (1 hunks)
backend/app/api/routes/collection_job.py (1 hunks)
backend/app/tests/api/routes/collections/test_collection_info.py (1 hunks)

🚧 Files skipped from review as they are similar to previous changes (1)

backend/app/api/docs/collections/create.md

🧰 Additional context used

🧬 Code graph analysis (2)

backend/app/api/routes/collection_job.py (6)

backend/app/crud/collection/collection.py (2)

CollectionCrud (17-159)

read_one (82-106)

backend/app/crud/collection/collection_job.py (2)

CollectionJobCrud (19-96)

read_one (24-45)

backend/app/models/collection_job.py (3)

CollectionJobStatus (11-15)

CollectionJobPublic (82-91)

CollectionActionType (18-20)

backend/app/models/collection.py (1)

CollectionPublic (120-129)

backend/app/utils.py (3)

APIResponse (29-53)

load_description (248-253)

success_response (36-39)

backend/app/services/collections/helpers.py (1)

extract_error_message (22-48)

backend/app/tests/api/routes/collections/test_collection_info.py (6)

backend/app/models/collection.py (2)

now (49-51)

Collection (13-36)

backend/app/models/collection_job.py (4)

CollectionJobCreate (67-71)

CollectionActionType (18-20)

CollectionJobStatus (11-15)

CollectionJobUpdate (74-79)

backend/app/crud/collection/collection_job.py (3)

CollectionJobCrud (19-96)

create (78-96)

update (58-76)

backend/app/crud/collection/collection.py (2)

CollectionCrud (17-159)

create (55-80)

backend/app/api/routes/collections.py (1)

create_collection (44-73)

backend/app/tests/conftest.py (4)

db (24-41)

client (52-55)

user_api_key_header (77-79)

user_api_key (89-91)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)

GitHub Check: checks (3.11.7, 6)

🔇 Additional comments (4)

backend/app/api/docs/collections/delete.md (1)

1-13: LGTM! Documentation clearly describes the job-based deletion workflow.

The updated documentation accurately reflects the asynchronous deletion process using CollectionJob tracking and polling via the job info endpoint.

backend/app/api/routes/collection_job.py (1)

28-50: LGTM! Job info retrieval logic is well-structured.

The endpoint correctly:

Scopes job retrieval to the user's project

Conditionally embeds the collection for successful CREATE operations

Normalizes error messages for failed jobs using extract_error_message

The conditional logic at lines 38-41 appropriately checks all three conditions before attaching collection data.

backend/app/tests/api/routes/collections/test_collection_info.py (2)

19-39: LGTM! Helper function correctly creates persisted collections.

The create_collection helper appropriately:

Uses CollectionCrud to persist the collection

Optionally populates LLM service fields

Returns the created collection for test assertions

42-66: LGTM! Previous review feedback has been addressed.

The create_collection_job helper now correctly omits inserted_at and updated_at from CollectionJobCreate, relying on the model's default_factory for timestamp generation. The conditional error_message update for FAILED jobs is appropriate.

avirajsingh7 · 2025-10-09T04:46:02Z

backend/app/api/docs/collections/info.md

@@ -1,5 +1,4 @@
-Retrieve all AI-platform information about a collection given its
-ID. This route is very helpful for:
+Retrieve detailed information about a specific collection by its ID from the collection table. Note that this endpoint CANNOT be used as a polling endpoint for collection creation because an entry will be made in the collection table only after the resource creation and association has been successful.


One suggestion for all the md/docs files: these are user-facing documents, so they should focus on describing the behavior of the endpoints, not their internal implementation details.
For example:

If an endpoint triggers a webhook, simply mention that it does, no need to explain how it’s handled internally.

If an endpoint deletes a resource, just state that it deletes it, don’t specify whether it’s a soft or hard delete.

Similarly, avoid mentioning internal mechanisms like can be use for polling or background tasks unless they directly affect how the user interacts with the API.

noted, but since for now the main users of this endpoint had been dalgo, I think keeping that in mind I wrote the docs in that way . Stating that you need to poll on the job id returned is something that the user needs to know in this case. The fact that the deletion is soft or hard is sometimes needed to be known considering that our current user of the platform is dalgo, glific and us(the internal team) . So I think we can keep the docs which go into the behaviour of endpoints , anyway we will discuss this with other people as well, and according to their opinion we will apply those changes in v2

avirajsingh7 · 2025-10-09T04:47:47Z

...end/app/alembic/versions/7ab577d3af26_delete_non_successful_columns_from_collection_table.py

+    op.execute(
+        """
+        DELETE FROM collection
+        WHERE status IN ('processing', 'failed')


Make sure no collection with empty llm_service_id exist

I do not think there would be a situation where status turned to successful while no llm service id is there as they're committed together in the same transaction, they both persist atomically. By they i mean the status as successful and the llm service id , for reference - here . But for surity i will still add that sql query as well

backend/app/crud/collection/collection.py

backend/app/models/collection.py

coderabbitai

Actionable comments posted: 2

♻️ Duplicate comments (3)

backend/app/api/docs/collections/job_info.md (1)

7-8: Fix list indentation to match Markdown standards.

Nested list items still use 4-space indentation instead of the standard 2-space indentation.

Apply this diff:

 * If the job has finished, has been successful and it was a job of creation of collection then this endpoint will fetch the associated collection details from the collection table, including:
-    - `llm_service_id`: The OpenAI assistant or model used for the collection.
-    - Collection metadata such as ID, project, organization, and timestamps.
+  - `llm_service_id`: The OpenAI assistant or model used for the collection.
+  - Collection metadata such as ID, project, organization, and timestamps.

backend/app/services/collections/delete_collection.py (2)

127-141: Fix outer exception handler—UnboundLocalError and closed session.

The outer exception handler has multiple critical issues:

UnboundLocalError risk: If an exception occurs before line 86 (e.g., during get_openai_client), collection_job and collection_job_crud will be undefined, causing line 128 to raise UnboundLocalError.
Closed session: The with Session(engine) context has exited, so the session is closed and line 130 will fail.
Wrong signature: Line 130 calls update(collection_job.id, collection_job) instead of update(job_id, patch).

Reopen a fresh session and fetch the job defensively:

     except Exception as err:
-        collection_job.status = CollectionJobStatus.FAILED
-        collection_job.error_message = str(err)
-        collection_job_crud.update(collection_job.id, collection_job)
-
         logger.error(
             "[delete_collection.execute_job] Unexpected error during deletion | "
             "{'collection_id': '%s', 'error': '%s', 'error_type': '%s', 'job_id': '%s'}",
-            str(collection.id),
+            str(collection_id),
             str(err),
             type(err).__name__,
             str(job_id),
             exc_info=True,
         )
-        callback.fail(str(err))
+
+        # Reopen session to mark job as failed
+        with Session(engine) as session:
+            job_crud = CollectionJobCrud(session, project_id)
+            try:
+                job_crud.update(
+                    job_id,
+                    CollectionJobUpdate(
+                        status=CollectionJobStatus.FAILED,
+                        error_message=str(err),
+                    ),
+                )
+            except Exception:
+                logger.exception(
+                    "[delete_collection.execute_job] Unable to mark job as failed | "
+                    "{'job_id': '%s'}",
+                    str(job_id),
+                )
+
+        # Only call callback if it was initialized
+        if "callback" in locals():
+            callback.fail(str(err))

114-116: Fix incorrect update() signature.

Same issue as above—you're passing collection_job instead of a CollectionJobUpdate patch object.

Apply this diff:

                 collection_job.status = CollectionJobStatus.FAILED
                 collection_job.error_message = str(err)
-                collection_job_crud.update(collection_job.id, collection_job)
+                collection_job_crud.update(
+                    collection_job.id,
+                    CollectionJobUpdate(
+                        status=CollectionJobStatus.FAILED,
+                        error_message=str(err),
+                    ),
+                )

🧹 Nitpick comments (5)

backend/app/models/collection.py (3)
48-51: Remove redundant now() classmethod.

The ResponsePayload.now() classmethod simply delegates to the imported now() function from app.core.util. This adds no value and creates confusion about which now() to use.

Apply this diff:
     time: datetime = Field(default_factory=now)
-
-    @classmethod
-    def now(cls):
-        """Returns current UTC time without timezone info"""
-        return now()
68-69: Document deduplication behavior or preserve order.

Converting documents to a set and back to a list removes duplicates but also loses the original ordering. If document processing order matters, consider using dict.fromkeys(self.documents) to preserve insertion order while removing duplicates.

If order should be preserved:
     def model_post_init(self, __context: Any):
-        self.documents = list(set(self.documents))
+        self.documents = list(dict.fromkeys(self.documents))
110-113: Clarify method name and purpose.

The method name extract_super_type is misleading—it yields field name-value pairs, not a supertype. Consider renaming to iter_fields() or adding a docstring explaining its purpose.

Example rename:
-    def extract_super_type(self, cls: "CreationRequest"):
+    def iter_fields(self):
+        """Yield (field_name, field_value) pairs for all fields."""
-        for field_name in cls.__fields__.keys():
+        for field_name in self.__fields__.keys():
             field_value = getattr(self, field_name)
             yield (field_name, field_value)
backend/app/services/collections/delete_collection.py (2)
37-39: Remove unused variable assignment.

The collection_job variable is assigned but never used. The return value of update() is not needed here since we only need the collection_job_id.

Apply this diff:
-    collection_job = job_crud.update(
+    job_crud.update(
         collection_job_id, CollectionJobUpdate(trace_id=trace_id)
     )
67-67: Remove unused task_instance parameter.

The task_instance parameter is declared but never used in the function body. If it's required by the Celery task signature, add a leading underscore (_task_instance) to indicate it's intentionally unused.

Apply this diff:
-    task_instance,
+    _task_instance,
 ) -> None:

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 4de8c1d and 2c0a11d.

📒 Files selected for processing (5)

backend/app/alembic/versions/7ab577d3af26_delete_non_successful_columns_from_collection_table.py (1 hunks)
backend/app/alembic/versions/b30727137e65_adding_collection_job_table_and_alter_collection_table.py (1 hunks)
backend/app/api/docs/collections/job_info.md (1 hunks)
backend/app/models/collection.py (2 hunks)
backend/app/services/collections/delete_collection.py (1 hunks)

🚧 Files skipped from review as they are similar to previous changes (1)

backend/app/alembic/versions/7ab577d3af26_delete_non_successful_columns_from_collection_table.py

🧰 Additional context used

📓 Path-based instructions (3)

**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Use type hints in Python code (Python 3.11+ project)

Files:

backend/app/services/collections/delete_collection.py
backend/app/alembic/versions/b30727137e65_adding_collection_job_table_and_alter_collection_table.py
backend/app/models/collection.py

backend/app/services/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Implement business logic services under backend/app/services/

Files:

backend/app/services/collections/delete_collection.py

backend/app/models/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Define SQLModel entities (database tables and domain objects) in backend/app/models/

Files:

backend/app/models/collection.py

🧠 Learnings (1)

📚 Learning: 2025-10-08T12:05:01.299Z

Learnt from: CR
PR: ProjectTech4DevAI/ai-platform#0
File: CLAUDE.md:0-0
Timestamp: 2025-10-08T12:05:01.299Z
Learning: Applies to backend/app/models/**/*.py : Define SQLModel entities (database tables and domain objects) in backend/app/models/

Applied to files:

backend/app/models/collection.py

🧬 Code graph analysis (3)

backend/app/services/collections/delete_collection.py (8)

backend/app/crud/collection/collection.py (3)

CollectionCrud (17-159)

read_one (82-106)

delete (120-128)

backend/app/crud/collection/collection_job.py (3)

CollectionJobCrud (19-96)

update (58-76)

read_one (24-45)

backend/app/crud/rag/open_ai.py (1)

OpenAIAssistantCrud (192-246)

backend/app/models/collection.py (3)

Collection (13-36)

DeletionRequest (116-117)

ResponsePayload (39-51)

backend/app/services/collections/helpers.py (8)

SilentCallback (85-92)

WebHookCallback (95-118)

success (81-82)

success (90-92)

success (116-118)

fail (78-79)

fail (86-88)

fail (112-114)

backend/app/celery/utils.py (1)

start_low_priority_job (46-71)

backend/app/utils.py (1)

get_openai_client (175-205)

backend/app/services/collections/create_collection.py (2)

start_job (41-71)

execute_job (74-197)

backend/app/alembic/versions/b30727137e65_adding_collection_job_table_and_alter_collection_table.py (2)

backend/app/alembic/versions/7ab577d3af26_delete_non_successful_columns_from_collection_table.py (1)

upgrade (20-32)

backend/app/crud/collection/collection_job.py (1)

create (78-96)

backend/app/models/collection.py (3)

backend/app/core/util.py (1)

now (15-16)

backend/app/models/organization.py (1)

Organization (34-57)

backend/app/models/project.py (1)

Project (29-63)

🪛 markdownlint-cli2 (0.18.1)

backend/app/api/docs/collections/job_info.md

7-7: Unordered list indentation
Expected: 2; Actual: 4

(MD007, ul-indent)

8-8: Unordered list indentation
Expected: 2; Actual: 4

(MD007, ul-indent)

🪛 Ruff (0.13.3)

backend/app/services/collections/delete_collection.py

37-37: Local variable collection_job is assigned to but never used

Remove assignment to unused variable collection_job

(F841)

67-67: Unused function argument: task_instance

(ARG001)

backend/app/models/collection.py

3-3: typing.List is deprecated, use list instead

(UP035)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)

GitHub Check: checks (3.11.7, 6)

🔇 Additional comments (1)

backend/app/alembic/versions/b30727137e65_adding_collection_job_table_and_alter_collection_table.py (1)

36-68: Migration structure looks good.

The upgrade path correctly:

Creates enum types with checkfirst=True

Establishes the collection_jobs table with proper constraints

Renames created_at to inserted_at for consistency

Enforces non-null constraints on llm_service_id and llm_service_name

Removes deprecated columns (owner_id, status, error_message)

The downgrade strategy (lines 94-95) uses hardcoded defaults as a pragmatic workaround, which was already discussed in previous reviews.

backend/app/models/collection.py

backend/app/services/collections/delete_collection.py

coderabbitai

Actionable comments posted: 1

♻️ Duplicate comments (2)

backend/app/tests/services/collections/test_create_collection.py (1)

31-45: Pass CollectionJobCreate schema instead of model instance.

The helper instantiates a CollectionJob model directly (line 38) and passes it to CollectionJobCrud.create(), but according to the CRUD signature in backend/app/crud/collection/collection_job.py (lines 64-68), the create method expects a CollectionJobCreate schema, not a model instance. This bypasses Pydantic validation and can cause type issues.

Apply this diff to use the proper schema:
+from app.models import CollectionJobCreate
+
 def create_collection_job_for_create(
     db: Session,
     project,
     job_id: UUID,
 ):
     """Pre-create a CREATE job with the given id so start_job can update it."""
     return CollectionJobCrud(db, project.id).create(
-        CollectionJob(
+        CollectionJobCreate(
             id=job_id,
             action_type=CollectionActionType.CREATE,
             project_id=project.id,
             collection_id=None,
             status=CollectionJobStatus.PENDING,
         )
     )

backend/app/services/collections/delete_collection.py (1)

137-155: Fix outer exception handler to avoid UnboundLocalError and closed session.

The outer exception handler has critical issues:

UnboundLocalError/NameError: Lines 138-143 reference collection_job_crud and collection_job, which won't be defined if the error occurs before line 87-88 (e.g., during DeletionRequest construction at line 69 or callback setup at lines 72-76).
Closed session: The session from line 84 is closed by the time this handler executes, so any DB operations will fail.
NameError on collection: Line 149 references collection.id, but collection is only defined at line 99 inside the inner try block.

Apply this diff to reopen a fresh session and handle the job update defensively:

     except Exception as err:
-        collection_job_crud.update(
-            collection_job.id,
-            CollectionJobUpdate(
-                status=CollectionJobStatus.FAILED,
-                error_message=str(err),
-            ),
-        )
-
         logger.error(
             "[delete_collection.execute_job] Unexpected error during deletion | "
-            "{'collection_id': '%s', 'error': '%s', 'error_type': '%s', 'job_id': '%s'}",
-            str(collection.id),
+            "{'error': '%s', 'error_type': '%s', 'job_id': '%s'}",
             str(err),
             type(err).__name__,
             str(job_id),
             exc_info=True,
         )
-        callback.fail(str(err))
+
+        with Session(engine) as session:
+            job_crud = CollectionJobCrud(session, project_id)
+            try:
+                job = job_crud.read_one(job_id)
+                job_crud.update(
+                    job.id,
+                    CollectionJobUpdate(
+                        status=CollectionJobStatus.FAILED,
+                        error_message=str(err),
+                    ),
+                )
+            except Exception:
+                logger.exception(
+                    "[delete_collection.execute_job] Unable to mark job as failed | "
+                    "{'job_id': '%s'}",
+                    str(job_id),
+                )
+
+        if "callback" in locals():
+            callback.fail(str(err))

🧹 Nitpick comments (1)

backend/app/services/collections/create_collection.py (1)
151-155: Consider consolidating collection and document creation.

The code creates the collection (line 151), reads it back (line 152), then creates DocumentCollection links (lines 154-155). According to CollectionCrud.create signature in backend/app/crud/collection/collection.py (lines 38-61), the method accepts an optional documents parameter that handles the linking internally.

Consider simplifying to:
-                collection_crud.create(collection)
-                collection_data = collection_crud.read_one(collection.id)
-
-                if flat_docs:
-                    DocumentCollectionCrud(session).create(collection_data, flat_docs)
+                collection = collection_crud.create(
+                    collection, 
+                    documents=flat_docs if flat_docs else None
+                )
This eliminates the redundant read and consolidates the logic into a single operation.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 2c0a11d and 94234b1.

📒 Files selected for processing (5)

backend/app/models/collection.py (2 hunks)
backend/app/services/collections/create_collection.py (1 hunks)
backend/app/services/collections/delete_collection.py (1 hunks)
backend/app/tests/services/collections/test_create_collection.py (1 hunks)
backend/app/tests/services/collections/test_delete_collection.py (1 hunks)

🧰 Additional context used

📓 Path-based instructions (3)

**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Use type hints in Python code (Python 3.11+ project)

Files:

backend/app/services/collections/create_collection.py
backend/app/tests/services/collections/test_create_collection.py
backend/app/tests/services/collections/test_delete_collection.py
backend/app/models/collection.py
backend/app/services/collections/delete_collection.py

backend/app/services/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Implement business logic services under backend/app/services/

Files:

backend/app/services/collections/create_collection.py
backend/app/services/collections/delete_collection.py

backend/app/models/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Define SQLModel entities (database tables and domain objects) in backend/app/models/

Files:

backend/app/models/collection.py

🧠 Learnings (1)

📚 Learning: 2025-10-08T12:05:01.299Z

Learnt from: CR
PR: ProjectTech4DevAI/ai-platform#0
File: CLAUDE.md:0-0
Timestamp: 2025-10-08T12:05:01.299Z
Learning: Applies to backend/app/models/**/*.py : Define SQLModel entities (database tables and domain objects) in backend/app/models/

Applied to files:

backend/app/models/collection.py

🧬 Code graph analysis (5)

backend/app/services/collections/create_collection.py (10)

backend/app/core/cloud/storage.py (1)

get_cloud_storage (262-279)

backend/app/models/collection.py (6)

now (49-51)

Collection (13-36)

ResponsePayload (39-51)

CreationRequest (105-113)

AssistantOptions (72-95)

extract_super_type (110-113)

backend/app/crud/document.py (1)

DocumentCrud (14-135)

backend/app/crud/document_collection.py (1)

DocumentCollectionCrud (8-48)

backend/app/crud/collection/collection_job.py (4)

CollectionJobCrud (19-96)

update (58-76)

read_one (24-45)

create (78-96)

backend/app/crud/rag/open_ai.py (2)

OpenAIVectorStoreCrud (101-189)

OpenAIAssistantCrud (192-246)

backend/app/services/collections/helpers.py (10)

_backout (121-129)

batch_documents (51-70)

SilentCallback (85-92)

WebHookCallback (95-118)

success (81-82)

success (90-92)

success (116-118)

fail (78-79)

fail (86-88)

fail (112-114)

backend/app/celery/utils.py (1)

start_low_priority_job (46-71)

backend/app/utils.py (1)

get_openai_client (175-205)

backend/app/services/collections/delete_collection.py (2)

start_job (25-56)

execute_job (59-155)

backend/app/tests/services/collections/test_create_collection.py (7)

backend/app/core/cloud/storage.py (1)

AmazonCloudStorageClient (28-84)

backend/app/crud/collection/collection.py (5)

CollectionCrud (17-159)

create (55-80)

_ (131-138)

_ (141-159)

read_one (82-106)

backend/app/crud/collection/collection_job.py (3)

CollectionJobCrud (19-96)

create (78-96)

read_one (24-45)

backend/app/crud/document_collection.py (1)

DocumentCollectionCrud (8-48)

backend/app/services/collections/create_collection.py (2)

start_job (41-71)

execute_job (74-212)

backend/app/tests/utils/openai.py (1)

get_mock_openai_client_with_vector_store (90-117)

backend/app/tests/utils/document.py (2)

DocumentStore (52-79)

project (64-65)

backend/app/tests/services/collections/test_delete_collection.py (7)

backend/app/models/collection.py (3)

DeletionRequest (116-117)

Collection (13-36)

ResponsePayload (39-51)

backend/app/tests/utils/utils.py (1)

get_project (70-89)

backend/app/crud/collection/collection.py (5)

CollectionCrud (17-159)

create (55-80)

read_all (108-117)

read_one (82-106)

delete (120-128)

backend/app/crud/collection/collection_job.py (4)

CollectionJobCrud (19-96)

create (78-96)

read_all (47-56)

read_one (24-45)

backend/app/models/collection_job.py (3)

CollectionJobStatus (11-15)

CollectionJob (35-64)

CollectionActionType (18-20)

backend/app/api/routes/collections.py (1)

delete_collection (80-114)

backend/app/services/collections/delete_collection.py (2)

start_job (25-56)

execute_job (59-155)

backend/app/models/collection.py (3)

backend/app/core/util.py (1)

now (15-16)

backend/app/models/organization.py (1)

Organization (34-57)

backend/app/models/project.py (1)

Project (29-63)

backend/app/services/collections/delete_collection.py (8)

backend/app/crud/collection/collection.py (3)

CollectionCrud (17-159)

read_one (82-106)

delete (120-128)

backend/app/crud/collection/collection_job.py (3)

CollectionJobCrud (19-96)

update (58-76)

read_one (24-45)

backend/app/crud/rag/open_ai.py (1)

OpenAIAssistantCrud (192-246)

backend/app/models/collection_job.py (2)

CollectionJobStatus (11-15)

CollectionJobUpdate (74-79)

backend/app/models/collection.py (3)

Collection (13-36)

DeletionRequest (116-117)

ResponsePayload (39-51)

backend/app/services/collections/helpers.py (8)

SilentCallback (85-92)

WebHookCallback (95-118)

success (81-82)

success (90-92)

success (116-118)

fail (78-79)

fail (86-88)

fail (112-114)

backend/app/celery/utils.py (1)

start_low_priority_job (46-71)

backend/app/utils.py (1)

get_openai_client (175-205)

🪛 Ruff (0.13.3)

backend/app/services/collections/create_collection.py

52-52: Local variable collection_job is assigned to but never used

Remove assignment to unused variable collection_job

(F841)

backend/app/tests/services/collections/test_delete_collection.py

68-68: Local variable precreated is assigned to but never used

Remove assignment to unused variable precreated

(F841)

backend/app/services/collections/delete_collection.py

37-37: Local variable collection_job is assigned to but never used

Remove assignment to unused variable collection_job

(F841)

67-67: Unused function argument: task_instance

(ARG001)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)

GitHub Check: checks (3.11.7, 6)

🔇 Additional comments (13)

backend/app/tests/services/collections/test_delete_collection.py (2)

49-108: LGTM!

The test correctly verifies that start_job updates the CollectionJob and schedules the background task with appropriate parameters. The unused precreated variable (line 68) is acceptable here—the job is created for its side effects and fetched later via CRUD.

111-228: LGTM!

Both tests effectively validate the execute_job workflow:

Success path verifies job transitions to SUCCESSFUL, CollectionCrud.delete is invoked, and OpenAI client is used.

Failure path confirms job is marked FAILED with the error message when delete raises an exception.

The mocking strategy and assertions are appropriate.

backend/app/tests/services/collections/test_create_collection.py (2)

48-109: LGTM!

The test correctly validates start_job behavior, ensuring the CollectionJob is updated and the background task is scheduled with appropriate parameters. The unused variable at line 68 is acceptable—it's created for side effects.

111-190: LGTM!

The test comprehensively validates the execute_job success flow, including:

AWS S3 mocking for document storage

OpenAI client interaction

Collection and DocumentCollection creation

Job status updates

The test structure and assertions are thorough and appropriate.

backend/app/models/collection.py (5)

13-36: LGTM!

The Collection model is properly structured with appropriate foreign keys, cascading deletes, and relationships. The timestamp fields follow the project's naming convention.

39-51: LGTM!

The ResponsePayload model properly encapsulates response metadata for background jobs. The default factories for key and time, along with the now() classmethod, are well-designed.

54-69: LGTM!

The DocumentOptions model is well-designed with appropriate defaults and helpful field descriptions. The model_post_init hook that deduplicates document IDs prevents duplicate processing.

72-102: LGTM!

Both AssistantOptions and CallbackRequest are well-documented with detailed field descriptions. The temperature default and its explanation are particularly helpful for API users.

105-129: LGTM!

The request models (CreationRequest, DeletionRequest) and CollectionPublic are properly structured:

Multiple inheritance pattern cleanly composes request models

extract_super_type utility method is useful for extracting parent class fields

CollectionPublic appropriately exposes only necessary fields for the public API

backend/app/services/collections/create_collection.py (2)

41-71: LGTM!

The start_job function correctly orchestrates job scheduling:

Updates the CollectionJob with correlation/trace context

Schedules the background task with appropriate parameters

Returns the job ID for tracking

The "unused" collection_job variable at line 52 is acceptable—the update is persisted via the CRUD method.

199-212: Outer exception handler risks UnboundLocalError.

If an error occurs before line 95 (e.g., during CreationRequest construction or job fetching), collection_job_crud and collection_job won't be defined, causing lines 206-211 to raise UnboundLocalError. Additionally, the session from line 89 is already closed when this handler executes.

This issue was flagged in past reviews and noted for addressing in a future PR. Please ensure it's tracked in the GitHub issue for v2 session management improvements.

Based on past review comments.

backend/app/services/collections/delete_collection.py (2)

25-56: LGTM!

The start_job function properly orchestrates collection deletion:

Updates the CollectionJob with trace context

Schedules the background task with all necessary parameters

Returns the job ID for tracking

The "unused" collection_job variable at line 37 is acceptable—the update is persisted via the CRUD method.

59-117: LGTM!

The execute_job success path is well-structured:

Properly constructs request/payload models and selects the appropriate callback

Updates job status to PROCESSING before attempting deletion

Uses CollectionJobUpdate schema correctly for job updates

Marks job SUCCESSFUL and triggers success callback upon completion

backend/app/tests/services/collections/test_delete_collection.py

avirajsingh7 and others added 18 commits September 23, 2025 10:13

fix process_response argument

8123e26

Refactor process_response to use get_openai_client for OpenAI API cli…

073a61c

…ent creation

Refactor response api:

a757f29

1) Move models from route to models folder 2) fix process response to handle callback response

move response code to service

824652e

Implement job management and integrate it with response processing

b471c3e

pass trace id to job table

580c3f5

pre commit and pass trace id to celery

0de7cd0

add task id to response log

19928ab

Refactor response handling: split response logic into separate modules

b1e8fb4

Refactor callback handling: move send_callback function to utils and …

3d9b2fa

…clean up imports

Add ResponseJobStatus model and update responses endpoint to return s…

b962760

…tructured response

fix init

7764568

Add tests for JobCrud and response job handling

a569485

update Job model to use string for task_id

87ecdb6

Add tests for response generation and processing, including success a…

96160c6

…nd error handling

pre commit

f150372

rename test_jobs

e7fe68d

move tenant from user to project, add collections in services folder …

e62fa53

…for job execution

avirajsingh7 and others added 11 commits September 24, 2025 14:05

Add flower dependency to pyproject.toml and uv.lock

bc40a18

Refactor job table migration and enhance error handling in job schedu…

f3f1565

…ling

Add CALLBACK_TIMEOUT setting and update send_callback to use it

7504ef7

Add callback timeout settings and update send_callback function to us…

1268dfb

…e them

adding batching document in helper function, updating test cases for …

6bc2a47

…the shift from user id to project id

Remove unused response_chunks in CallbackResponse and update related …

65380f8

…functions

Fix update_job test to assert failure status and correct error message

db0c276

Refactor get_additional_data function to simplify exclusion logic for…

6fe72d6

… request metadata

Update job_type field description for clarity and consistency

eaf8202

Remove unused imports and add conditional previous_response_id in gen…

fcbd764

…erate_response function

services test cases

4083b62

nishika26 added 2 commits October 6, 2025 14:23

migration for removing failed collection columns

8e8fe7d

migration for removing failed collection columns

5b6f2f2

coderabbitai bot reviewed Oct 6, 2025

View reviewed changes

kartpop requested changes Oct 6, 2025

View reviewed changes

coderabbit and pr review fixes

4a1f5ff

coderabbitai bot reviewed Oct 7, 2025

View reviewed changes

minor changes

14d50a8

coderabbitai bot reviewed Oct 7, 2025

View reviewed changes

minor fixes

1dcab67

coderabbitai bot reviewed Oct 7, 2025

View reviewed changes

avirajsingh7 suggested changes Oct 7, 2025

View reviewed changes

pr reviews

7b09e3f

coderabbitai bot reviewed Oct 8, 2025

View reviewed changes

backend/app/services/collections/delete_collection.py Show resolved Hide resolved

backend/app/tests/api/routes/collections/test_collection_info.py Show resolved Hide resolved

backend/app/tests/services/collections/test_create_collection.py Show resolved Hide resolved

minimal fixes

95ef105

coderabbitai bot reviewed Oct 8, 2025

View reviewed changes

backend/app/tests/services/collections/test_create_collection.py Show resolved Hide resolved

kartpop approved these changes Oct 8, 2025

View reviewed changes

changing router name

4de8c1d

nishika26 mentioned this pull request Oct 8, 2025

Collections : enhancement of the module #398

Closed

coderabbitai bot reviewed Oct 8, 2025

View reviewed changes

Merge branch 'main' into feature/collection_to_celery

ebf0cc8

avirajsingh7 suggested changes Oct 9, 2025

View reviewed changes

last pr review changes

2c0a11d

coderabbitai bot reviewed Oct 9, 2025

View reviewed changes

backend/app/models/collection.py Outdated Show resolved Hide resolved

backend/app/services/collections/delete_collection.py Outdated Show resolved Hide resolved

minimal fixes

94234b1

coderabbitai bot reviewed Oct 9, 2025

View reviewed changes

backend/app/tests/services/collections/test_delete_collection.py Show resolved Hide resolved

avirajsingh7 approved these changes Oct 9, 2025

View reviewed changes

nishika26 merged commit 20af88d into main Oct 9, 2025
3 checks passed

nishika26 deleted the feature/collection_to_celery branch October 9, 2025 09:18

coderabbitai bot mentioned this pull request Oct 20, 2025

Collection : with assistant flag and additional enhancements #412

Merged

2 tasks

coderabbitai bot mentioned this pull request Nov 18, 2025

Document : moving transform job to celery and adding callback url #437

Merged

2 tasks

Conversation

nishika26 commented Sep 23, 2025 • edited by coderabbitai bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Checklist

Notes

Summary by CodeRabbit

Uh oh!

coderabbitai bot commented Sep 23, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Sequence Diagram(s)

Estimated code review effort

Possibly related PRs

Suggested labels

Suggested reviewers

Poem

Pre-merge checks and finishing touches

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

kartpop left a comment

Choose a reason for hiding this comment

Uh oh!

kartpop Oct 6, 2025

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

kartpop Oct 6, 2025

Choose a reason for hiding this comment

Uh oh!

nishika26 Oct 7, 2025

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

nishika26 commented Sep 23, 2025 •

edited by coderabbitai bot

Loading

coderabbitai bot commented Sep 23, 2025 •

edited

Loading

nishika26 Oct 9, 2025 •

edited

Loading

nishika26 Oct 9, 2025 •

edited

Loading