Skip to content

Collection : with assistant flag and additional enhancements#412

Merged
nishika26 merged 22 commits intomainfrom
enhancement/collection_vector_store
Nov 7, 2025
Merged

Collection : with assistant flag and additional enhancements#412
nishika26 merged 22 commits intomainfrom
enhancement/collection_vector_store

Conversation

@nishika26
Copy link
Copy Markdown
Collaborator

@nishika26 nishika26 commented Oct 20, 2025

Summary

Target issue is #398 and also covering this issue - #231 since this is just for writing test cases for existing router endpoints

Checklist

Before submitting a pull request, please ensure that you mark these task.

  • Ran fastapi run --reload app/main.py or docker compose up in the repository root and test.
  • If you've fixed a bug or added code that is tested and has test cases.

Notes

for reference of request bodies of the job related endpoints of this module you can check this document out.

Create job (create_collection.execute_job) :

  • Introduced with_assistant flag: when model & instructions are both present, create an Assistant backed by a Vector Store; otherwise only create a Vector Store.

  • We now close the DB session before making external API calls—and reopen a fresh session only to persist results—so we don’t hold connections or risk transaction locks. Same is being done during deletion of collection.

Delete job (delete_collection.execute_job) :

  • Detects service type from collection.llm_service_name (Assistant vs Vector Store) and chooses the correct CRUD (OpenAIAssistantCrud / OpenAIVectorStoreCrud) to use delete method of that specific CRUD class.

  • Calls local deletion via CollectionCrud.delete_by_id() once remote deletion succeeds.

CRUD refactors :

  • CollectionCrud.delete_by_id() sets deleted_at=now() and persists.

  • Document→Collection cascade delete now excludes already-deleted collections (deleted_at IS NULL).

API route :

  • /collections/create: determines with_assistant by checking both model and instructions. Returns metadata note when creating vector-store-only.

  • Validation: if only one of model or instructions is provided, responds 422 with a clear message.

Additional :

  • Removed ResponsePayload class because we are now returning collection job object in response of create and delete collections, we use the collection job object for the callback URL as well.

  • Rephrased the endpoint docs to remove the indication that user can use any specific endpoint for polling.

Summary by CodeRabbit

  • New Features

    • Unified Collections API: list, get (now supports include_docs, skip, limit), create and delete return immediate job metadata, support optional callbacks, and expose new public response shapes including collection-with-docs.
  • Bug Fixes

    • Improved cleanup and failure handling, consistent job states, reads ignore deleted items, and deletions correctly route to the appropriate remote backend.
  • Documentation

    • Clarified assistant vs vector-only creation, job semantics, callbacks and polling guidance.
  • Tests

    • Expanded coverage for create/delete jobs, callbacks, listing, pagination and helper utilities.

@coderabbitai
Copy link
Copy Markdown

coderabbitai bot commented Oct 20, 2025

Walkthrough

Refactors collection endpoints into job-driven create/delete flows with callback support, consolidates document retrieval into collection info (with pagination), introduces public collection/job models, routes remote ops between assistant and vector-store backends, updates CRUD/services/helpers for cleanup, and adds broad test coverage.

Changes

Cohort / File(s) Summary
API Routes & Callbacks
backend/app/api/routes/collections.py, backend/app/api/routes/collection_job.py
Reworked routes: POST /collections/ and DELETE /collections/{collection_id} accept JSON request models, schedule background jobs (request.model_dump(mode="json")), register callback routes (collection_callback_router), and return new public response models (CollectionJobImmediatePublic, CollectionWithDocsPublic, List[CollectionPublic]).
Models — Collections & Jobs
backend/app/models/collection.py, backend/app/models/collection_job.py, backend/app/models/__init__.py
Removed ResponsePayload; made AssistantOptions.model/instructions optional with a validator enforcing both-or-none; added CollectionIDPublic, CollectionPublic, CollectionWithDocsPublic; converted CollectionJob to table-backed model and introduced CollectionJobBasePublic, CollectionJobImmediatePublic, CollectionJobPublic.
CRUD — Collections & RAG
backend/app/crud/collection/collection.py, backend/app/crud/rag/...
CollectionCrud.create accepts optional documents and returns Collection; added delete_by_id (soft-delete); read filters ignore deleted records; document cleanup/join conditions respect deleted_at; introduced OpenAIVectorStoreCrud and remote-service selection logic.
Services — Create / Delete / Helpers
backend/app/services/collections/create_collection.py, backend/app/services/collections/delete_collection.py, backend/app/services/collections/helpers.py
start_job/execute_job signatures updated to use with_assistant and JSON request payloads; added build_success_payload/build_failure_payload, centralized failure handlers and cleanup (_mark_job_failed[_and_callback], _cleanup_remote_resources); create flow batches docs, initializes vector store (and optional assistant), links docs, updates job state and triggers callbacks; delete flow selects remote service (assistant vs vector store) and deletes remote then local collection; added OPENAI_VECTOR_STORE and pick_service_for_documennt.
Documents Route Changes
backend/app/api/routes/documents.py
Document deletion/permanent-delete now select remote CRUD via pick_service_for_documennt(session, doc_id, a_crud, v_crud) and route deletions through assistant or vector-store backends.
API Docs
backend/app/api/docs/collections/*.md
Docs updated: assistant creation requires both model+instructions; immediate responses emphasize job_id and async creation; delete notes optional callback_url; list/info docs reflect document inclusion and field meanings.
Tests & Test Utilities
backend/app/tests/.../collections/*.py, backend/app/tests/utils/collection.py, backend/app/tests/services/collections/test_helpers.py
Tests added/rewritten to use CreationRequest/DeletionRequest JSON, new helpers (get_collection, get_vector_store_collection, get_collection_job, link_document_to_collection), and to cover callbacks, failure/backout, listing, batching, error extraction, and job endpoints; assertions updated for new public models and job shapes.

Sequence Diagram(s)

sequenceDiagram
    participant Client
    participant API
    participant Scheduler
    participant Worker
    participant RemoteSvc
    participant DB

    Client->>API: POST /collections/ (CreationRequest JSON)
    Note right of API: compute with_assistant = model && instructions
    API->>Scheduler: start_job(project_id, org_id, with_assistant, request_json)
    Scheduler->>DB: insert CollectionJob(status=PENDING)
    Scheduler-->>Client: 200 (CollectionJobImmediatePublic)

    rect rgb(240,255,240)
    Note over Worker: async execute_job(job_id, with_assistant, request)
    Worker->>DB: update CollectionJob -> PROCESSING
    alt with_assistant == true
        Worker->>RemoteSvc: create vector store & upload vectors
        Worker->>RemoteSvc: create assistant (model+instructions)
    else
        Worker->>RemoteSvc: upload vectors only
    end
    Worker->>DB: create Collection + link documents
    Worker->>DB: update CollectionJob -> SUCCESSFUL
    alt callback_url present
        Worker->>API: send_callback(success_payload)
    end
    end
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Areas needing extra attention:

  • create_collection.execute_job and delete_collection.execute_job (resource orchestration, cleanup/backout, callback payload composition).
  • helpers.pick_service_for_documennt and vector-store vs assistant CRUD selection used by document and delete flows.
  • AssistantOptions model_validator and CollectionJob public model serialization/exclude_none behavior.
  • CRUD soft-delete changes (deleted_at) and DocumentCollection linking behaviors.
  • Test expectations that use request.model_dump(mode="json") and new test utilities (get_collection, get_collection_job).

Possibly related PRs

Suggested reviewers

  • AkhileshNegi
  • kartpop
  • avirajsingh7

Poem

🐇 I hopped through queues with JSON cheer,
Jobs queued, then processed far and near.
Vectors hummed, assistants dreamed,
Callbacks chimed while schemas gleamed.
Tests ate carrots — all green and clear.

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 51.61% which is insufficient. The required threshold is 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title directly addresses the main change—introducing a with_assistant flag for collection creation and documenting related enhancements. It accurately reflects the primary objective of the PR.
✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch enhancement/collection_vector_store

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@codecov
Copy link
Copy Markdown

codecov bot commented Oct 20, 2025

@nishika26 nishika26 marked this pull request as ready for review October 20, 2025 13:25
@nishika26 nishika26 changed the title first commit containing everything Collection : with assistant flag and additional enhancements Oct 20, 2025
Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 7

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (7)
backend/app/models/collection.py (1)

120-124: Pydantic v2 break: use model_fields instead of __fields__ in CreationRequest.extract_super_type.

This will fail on v2 (no __fields__). It’s used by execute_job, so collection creation can break at runtime.

Apply this robust v1/v2‑compatible diff:

-    def extract_super_type(self, cls: "CreationRequest"):
-        for field_name in cls.__fields__.keys():
-            field_value = getattr(self, field_name)
-            yield (field_name, field_value)
+    def extract_super_type(self, cls: type[SQLModel]):
+        # Pydantic v2: `model_fields`; v1 fallback: `__fields__`
+        fields = getattr(cls, "model_fields", None) or getattr(cls, "__fields__", {})
+        for field_name in fields.keys():
+            yield (field_name, getattr(self, field_name))
backend/app/services/collections/helpers.py (1)

48-67: Validate batch_size to avoid infinite loops and misuse.

Guard against zero/negative values.

 def batch_documents(
     document_crud: DocumentCrud, documents: List[UUID], batch_size: int
 ):
@@
-    docs_batches = []
+    if batch_size <= 0:
+        raise ValueError("batch_size must be a positive integer")
+    docs_batches = []
backend/app/tests/crud/collections/collection/test_crud_collection_read_one.py (1)

3-4: Remove unused OpenAI client and import.

Cleans up dead code and silences F841.

-from openai import OpenAI
 from openai_responses import OpenAIMock
@@
-        client = OpenAI(api_key="sk-test-key")
         collection = get_collection(db, project=project)

Also applies to: 18-19

backend/app/tests/api/routes/collections/test_collection_info.py (1)

26-33: Fix Enum-vs-string comparisons in assertions.

API returns strings; tests compare against Enum instances. Compare to .value (or cast) to avoid false failures.

-    assert data["status"] == CollectionJobStatus.PENDING
+    assert data["status"] == CollectionJobStatus.PENDING.value
@@
-    assert data["status"] == CollectionJobStatus.SUCCESSFUL
-    assert data["action_type"] == CollectionActionType.CREATE
+    assert data["status"] == CollectionJobStatus.SUCCESSFUL.value
+    assert data["action_type"] == CollectionActionType.CREATE.value
@@
-    assert data["status"] == CollectionJobStatus.FAILED
+    assert data["status"] == CollectionJobStatus.FAILED.value

Also applies to: 54-58, 84-85

backend/app/crud/collection/collection.py (1)

22-31: Ensure updated_at is bumped on updates.

set updated_at before commit so soft-deletes and other updates record a fresh timestamp.

 def _update(self, collection: Collection):
+    collection.updated_at = now()
     self.session.add(collection)
     self.session.commit()
     self.session.refresh(collection)
backend/app/services/collections/delete_collection.py (1)

23-52: Return type mismatch in start_job.

Signature says -> str but returns a UUID. Return a string (or change the annotation), to avoid downstream type surprises.

-    return collection_job_id
+    return str(collection_job_id)
backend/app/services/collections/create_collection.py (1)

39-70: Return type mismatch in start_job.

Signature is -> str but returns a UUID. Align the return.

-    return collection_job_id
+    return str(collection_job_id)
🧹 Nitpick comments (10)
backend/app/models/collection.py (2)

6-6: Migrate deprecated root_validator to v2 model_validator(mode="before").

Removes deprecation and aligns with v2 semantics. Behavior remains the same.

Based on learnings

-from pydantic import HttpUrl, root_validator
+from pydantic import HttpUrl, model_validator
@@
-    @root_validator(pre=True)
-    def _assistant_fields_all_or_none(cls, values):
+    @model_validator(mode="before")
+    def _assistant_fields_all_or_none(cls, values: dict):
         def norm(x):
             return x.strip() if isinstance(x, str) and x.strip() else None

Also applies to: 89-105


61-69: Small copy/consistency fixes in field descriptions.

  • “compatable” → “compatible”.
  • Consider clarifying “[To Be Deprecated]” timeline or linking to the replacement API.
-            "must be compatable with the assistants API; see the "
+            "must be compatible with the Assistants API; see the "

Also applies to: 71-78, 80-87

backend/app/services/collections/helpers.py (1)

71-73: Add type hints to callbacks and align _backout docstring with generic resource.

Improves readability and linting; no behavior change.

-from typing import List
+from typing import List, Any, Optional
@@
-class CallbackHandler:
-    def __init__(self, collection_job):
-        self.collection_job = collection_job
+class CallbackHandler:
+    def __init__(self, collection_job: Any):
+        self.collection_job = collection_job
@@
-class WebHookCallback(CallbackHandler):
-    def __init__(self, url: HttpUrl, collection_job):
+class WebHookCallback(CallbackHandler):
+    def __init__(self, url: HttpUrl, collection_job: Any):
         super().__init__(collection_job)
         self.url = url
@@
-    def __call__(self, response: APIResponse):
+    def __call__(self, response: APIResponse) -> None:
         logger.info(
             f"[WebHookCallback.call] Posting callback | {{'url': '{self.url}'}}"
         )
-        post_callback(self.url, response)
+        ok = post_callback(self.url, response)
+        if not ok:
+            logger.warning("[WebHookCallback.call] Callback POST failed")
@@
-def _backout(crud, llm_service_id: str):
-    """Best-effort cleanup: attempt to delete the assistant by ID"""
+def _backout(crud, llm_service_id: str):
+    """Best-effort cleanup: attempt to delete the remote resource by ID"""

Also applies to: 92-119

backend/app/tests/crud/collections/collection/test_crud_collection_delete.py (1)

71-75: Remove unused APIKey lookup.

api_key isn’t used; delete dead code to keep tests lean.

-        stmt = select(APIKey).where(
-            APIKey.project_id == project.id, APIKey.is_deleted == False
-        )
-        api_key = db.exec(stmt).first()
backend/app/api/routes/collections.py (1)

55-58: Make with_assistant detection robust to “provided vs empty”.

Use Pydantic v2’s model_fields_set to detect fields provided in the body and keep truthiness checks explicit.

-    # True iff both model and instructions were provided in the request body
-    with_assistant = bool(
-        getattr(request, "model", None) and getattr(request, "instructions", None)
-    )
+    # True iff both fields were present in the payload and non-empty
+    provided = getattr(request, "model_fields_set", set())
+    with_assistant = (
+        {"model", "instructions"}.issubset(provided)
+        and bool(getattr(request, "model"))
+        and bool(getattr(request, "instructions"))
+    )

If you intend “present regardless of emptiness,” drop the last two bool(...) checks.

backend/app/crud/collection/collection.py (1)

1-6: Duplicate import.

Drop the second import logging.

-import logging
+import logging
backend/app/services/collections/delete_collection.py (3)

83-96: Brittle service detection by display name.

String-equality on llm_service_name is fragile. At minimum, normalize with casefold() and accept a small set of aliases; longer term, persist a canonical enum/boolean (e.g., service_type or with_assistant) on Collection.

-        service = (collection.llm_service_name or "").strip().lower()
-        is_vector = service == "openai vector store"
+        service = (collection.llm_service_name or "").casefold().strip()
+        is_vector = service in {"openai vector store", "vector_store", "vector-store"}

Please confirm all producers set llm_service_name consistently; otherwise, consider adding a service_type enum.


85-95: Unnecessary getattr fallbacks; simplify to the modeled field.

Collection exposes llm_service_id. The vector_store_id/assistant_id attributes aren’t part of the model here, so the fallbacks add noise.

-        llm_service_id = (
-            (
-                getattr(collection, "vector_store_id", None)
-                or getattr(collection, "llm_service_id", None)
-            )
-            if is_vector
-            else (
-                getattr(collection, "assistant_id", None)
-                or getattr(collection, "llm_service_id", None)
-            )
-        )
+        llm_service_id = getattr(collection, "llm_service_id", None)

54-62: Unused parameter task_instance.

Prefix with _ (or remove) to satisfy linters.

-def execute_job(
+def execute_job(
     request: dict,
     project_id: int,
     organization_id: int,
     task_id: str,
     job_id: str,
     collection_id: str,
-    task_instance,
+    _task_instance,
 ) -> None:
backend/app/services/collections/create_collection.py (1)

72-80: Unused parameter task_instance.

Rename to _task_instance (or drop if not required by the worker runner).

-def execute_job(
+def execute_job(
     request: dict,
     project_id: int,
     organization_id: int,
     task_id: str,
     job_id: str,
-    with_assistant: bool,
-    task_instance,
+    with_assistant: bool,
+    _task_instance,
 ) -> None:
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 3b7b640 and 642e39c.

📒 Files selected for processing (16)
  • backend/app/api/routes/collections.py (3 hunks)
  • backend/app/crud/collection/collection.py (3 hunks)
  • backend/app/models/collection.py (2 hunks)
  • backend/app/services/collections/create_collection.py (3 hunks)
  • backend/app/services/collections/delete_collection.py (3 hunks)
  • backend/app/services/collections/helpers.py (2 hunks)
  • backend/app/tests/api/routes/collections/test_collection_info.py (3 hunks)
  • backend/app/tests/api/routes/collections/test_create_collections.py (1 hunks)
  • backend/app/tests/crud/collections/collection/test_crud_collection_create.py (2 hunks)
  • backend/app/tests/crud/collections/collection/test_crud_collection_delete.py (1 hunks)
  • backend/app/tests/crud/collections/collection/test_crud_collection_read_all.py (2 hunks)
  • backend/app/tests/crud/collections/collection/test_crud_collection_read_one.py (1 hunks)
  • backend/app/tests/services/collections/test_create_collection.py (7 hunks)
  • backend/app/tests/services/collections/test_delete_collection.py (6 hunks)
  • backend/app/tests/services/collections/test_helpers.py (1 hunks)
  • backend/app/tests/utils/collection.py (2 hunks)
🧰 Additional context used
📓 Path-based instructions (5)
**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Use type hints in Python code (Python 3.11+ project)

Files:

  • backend/app/crud/collection/collection.py
  • backend/app/tests/utils/collection.py
  • backend/app/tests/api/routes/collections/test_collection_info.py
  • backend/app/tests/crud/collections/collection/test_crud_collection_read_all.py
  • backend/app/tests/api/routes/collections/test_create_collections.py
  • backend/app/api/routes/collections.py
  • backend/app/services/collections/delete_collection.py
  • backend/app/models/collection.py
  • backend/app/services/collections/create_collection.py
  • backend/app/services/collections/helpers.py
  • backend/app/tests/services/collections/test_create_collection.py
  • backend/app/tests/crud/collections/collection/test_crud_collection_read_one.py
  • backend/app/tests/services/collections/test_helpers.py
  • backend/app/tests/crud/collections/collection/test_crud_collection_delete.py
  • backend/app/tests/services/collections/test_delete_collection.py
  • backend/app/tests/crud/collections/collection/test_crud_collection_create.py
backend/app/crud/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Implement database access operations in backend/app/crud/

Files:

  • backend/app/crud/collection/collection.py
backend/app/api/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Expose FastAPI REST endpoints under backend/app/api/ organized by domain

Files:

  • backend/app/api/routes/collections.py
backend/app/services/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Implement business logic services under backend/app/services/

Files:

  • backend/app/services/collections/delete_collection.py
  • backend/app/services/collections/create_collection.py
  • backend/app/services/collections/helpers.py
backend/app/models/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Define SQLModel entities (database tables and domain objects) in backend/app/models/

Files:

  • backend/app/models/collection.py
🧬 Code graph analysis (15)
backend/app/crud/collection/collection.py (4)
backend/app/models/collection.py (1)
  • Collection (13-36)
backend/app/crud/document_collection.py (1)
  • DocumentCollectionCrud (8-48)
backend/app/core/util.py (1)
  • now (15-16)
backend/app/models/document_collection.py (1)
  • DocumentCollection (9-23)
backend/app/tests/utils/collection.py (5)
backend/app/models/collection.py (1)
  • Collection (13-36)
backend/app/models/collection_job.py (3)
  • CollectionActionType (18-20)
  • CollectionJob (35-64)
  • CollectionJobStatus (11-15)
backend/app/crud/collection/collection.py (2)
  • CollectionCrud (17-154)
  • create (42-66)
backend/app/crud/collection/collection_job.py (2)
  • CollectionJobCrud (19-96)
  • create (78-96)
backend/app/tests/crud/collections/collection/test_crud_collection_delete.py (1)
  • get_collection (16-33)
backend/app/tests/api/routes/collections/test_collection_info.py (4)
backend/app/tests/utils/utils.py (1)
  • get_project (54-73)
backend/app/tests/utils/collection.py (2)
  • get_collection (25-48)
  • get_collection_job (77-98)
backend/app/models/collection_job.py (2)
  • CollectionActionType (18-20)
  • CollectionJobStatus (11-15)
backend/app/tests/conftest.py (3)
  • db (28-45)
  • client (65-68)
  • user_api_key_header (90-92)
backend/app/tests/crud/collections/collection/test_crud_collection_read_all.py (1)
backend/app/tests/utils/collection.py (1)
  • get_collection (25-48)
backend/app/tests/api/routes/collections/test_create_collections.py (3)
backend/app/models/collection_job.py (2)
  • CollectionJobStatus (11-15)
  • CollectionActionType (18-20)
backend/app/models/collection.py (1)
  • CreationRequest (115-123)
backend/app/tests/conftest.py (3)
  • client (65-68)
  • user_api_key_header (90-92)
  • user_api_key (102-104)
backend/app/api/routes/collections.py (3)
backend/app/services/collections/create_collection.py (1)
  • start_job (39-69)
backend/app/services/collections/delete_collection.py (1)
  • start_job (23-51)
backend/app/utils.py (2)
  • APIResponse (29-53)
  • success_response (36-39)
backend/app/services/collections/delete_collection.py (7)
backend/app/crud/rag/open_ai.py (2)
  • OpenAIAssistantCrud (192-246)
  • OpenAIVectorStoreCrud (101-189)
backend/app/models/collection_job.py (2)
  • CollectionJobStatus (11-15)
  • CollectionJobUpdate (74-79)
backend/app/models/collection.py (1)
  • DeletionRequest (126-127)
backend/app/utils.py (1)
  • get_openai_client (175-205)
backend/app/crud/collection/collection_job.py (3)
  • CollectionJobCrud (19-96)
  • read_one (24-45)
  • update (58-76)
backend/app/crud/collection/collection.py (4)
  • read_one (68-92)
  • CollectionCrud (17-154)
  • delete (112-120)
  • delete_by_id (105-109)
backend/app/services/collections/helpers.py (8)
  • SilentCallback (81-88)
  • WebHookCallback (91-118)
  • fail (74-75)
  • fail (82-84)
  • fail (105-113)
  • success (77-78)
  • success (86-88)
  • success (115-118)
backend/app/services/collections/create_collection.py (10)
backend/app/models/collection.py (4)
  • CreationRequest (115-123)
  • extract_super_type (120-123)
  • AssistantOptions (57-105)
  • Collection (13-36)
backend/app/crud/collection/collection_job.py (4)
  • CollectionJobCrud (19-96)
  • read_one (24-45)
  • update (58-76)
  • create (78-96)
backend/app/crud/collection/collection.py (3)
  • read_one (68-92)
  • create (42-66)
  • CollectionCrud (17-154)
backend/app/models/collection_job.py (2)
  • CollectionJobUpdate (74-79)
  • CollectionJobStatus (11-15)
backend/app/utils.py (1)
  • get_openai_client (175-205)
backend/app/core/cloud/storage.py (1)
  • get_cloud_storage (262-279)
backend/app/crud/document.py (1)
  • DocumentCrud (14-135)
backend/app/services/collections/helpers.py (10)
  • batch_documents (48-67)
  • SilentCallback (81-88)
  • WebHookCallback (91-118)
  • success (77-78)
  • success (86-88)
  • success (115-118)
  • _backout (121-129)
  • fail (74-75)
  • fail (82-84)
  • fail (105-113)
backend/app/crud/rag/open_ai.py (2)
  • OpenAIVectorStoreCrud (101-189)
  • OpenAIAssistantCrud (192-246)
backend/app/crud/document_collection.py (1)
  • DocumentCollectionCrud (8-48)
backend/app/services/collections/helpers.py (4)
backend/app/utils.py (3)
  • APIResponse (29-53)
  • failure_response (42-53)
  • success_response (36-39)
backend/app/core/util.py (1)
  • post_callback (29-39)
backend/app/crud/collection/collection.py (1)
  • delete (112-120)
backend/app/tests/services/collections/test_helpers.py (2)
  • delete (145-146)
  • delete (158-159)
backend/app/tests/services/collections/test_create_collection.py (6)
backend/app/models/collection.py (1)
  • CreationRequest (115-123)
backend/app/api/routes/collections.py (1)
  • create_collection (41-79)
backend/app/services/collections/create_collection.py (2)
  • start_job (39-69)
  • execute_job (72-238)
backend/app/tests/utils/openai.py (1)
  • get_mock_openai_client_with_vector_store (90-117)
backend/app/tests/utils/collection.py (1)
  • get_collection_job (77-98)
backend/app/tests/conftest.py (1)
  • db (28-45)
backend/app/tests/crud/collections/collection/test_crud_collection_read_one.py (3)
backend/app/tests/crud/collections/collection/test_crud_collection_delete.py (1)
  • get_collection (16-33)
backend/app/tests/utils/collection.py (1)
  • get_collection (25-48)
backend/app/tests/utils/utils.py (1)
  • get_project (54-73)
backend/app/tests/services/collections/test_helpers.py (2)
backend/app/services/collections/helpers.py (11)
  • extract_error_message (19-45)
  • batch_documents (48-67)
  • SilentCallback (81-88)
  • success (77-78)
  • success (86-88)
  • success (115-118)
  • fail (74-75)
  • fail (82-84)
  • fail (105-113)
  • WebHookCallback (91-118)
  • _backout (121-129)
backend/app/utils.py (1)
  • APIResponse (29-53)
backend/app/tests/crud/collections/collection/test_crud_collection_delete.py (5)
backend/app/crud/collection/collection.py (2)
  • CollectionCrud (17-154)
  • create (42-66)
backend/app/models/collection.py (1)
  • Collection (13-36)
backend/app/crud/rag/open_ai.py (1)
  • OpenAIAssistantCrud (192-246)
backend/app/tests/utils/utils.py (1)
  • get_project (54-73)
backend/app/tests/utils/collection.py (1)
  • get_collection (25-48)
backend/app/tests/services/collections/test_delete_collection.py (6)
backend/app/tests/utils/utils.py (1)
  • get_project (54-73)
backend/app/crud/collection/collection_job.py (2)
  • CollectionJobCrud (19-96)
  • read_one (24-45)
backend/app/models/collection_job.py (2)
  • CollectionJobStatus (11-15)
  • CollectionActionType (18-20)
backend/app/tests/utils/collection.py (2)
  • get_collection (25-48)
  • get_collection_job (77-98)
backend/app/services/collections/delete_collection.py (2)
  • start_job (23-51)
  • execute_job (54-194)
backend/app/crud/collection/collection.py (5)
  • _ (123-130)
  • _ (133-154)
  • delete (112-120)
  • delete_by_id (105-109)
  • read_one (68-92)
backend/app/tests/crud/collections/collection/test_crud_collection_create.py (4)
backend/app/crud/collection/collection.py (1)
  • CollectionCrud (17-154)
backend/app/models/document_collection.py (1)
  • DocumentCollection (9-23)
backend/app/models/collection.py (1)
  • Collection (13-36)
backend/app/tests/utils/document.py (1)
  • project (65-66)
🪛 Ruff (0.14.0)
backend/app/tests/utils/collection.py

30-30: Unused function argument: model

(ARG001)


56-56: Unused function argument: model

(ARG001)

backend/app/tests/api/routes/collections/test_create_collections.py

5-5: Redefinition of unused patch from line 2

Remove definition: patch

(F811)

backend/app/services/collections/create_collection.py

79-79: Unused function argument: task_instance

(ARG001)

backend/app/tests/services/collections/test_create_collection.py

220-220: Local variable mock_client is assigned to but never used

Remove assignment to unused variable mock_client

(F841)

backend/app/tests/crud/collections/collection/test_crud_collection_read_one.py

18-18: Local variable client is assigned to but never used

Remove assignment to unused variable client

(F841)

backend/app/tests/crud/collections/collection/test_crud_collection_delete.py

16-16: Redefinition of unused get_collection from line 13

(F811)

🔇 Additional comments (19)
backend/app/tests/crud/collections/collection/test_crud_collection_create.py (1)

18-25: LGTM — direct Collection construction matches the model and CRUD flow.

backend/app/tests/crud/collections/collection/test_crud_collection_read_all.py (1)

14-29: The review comment is technically incorrect about the validation behavior.

The Collection model does not define assistant_id or vector_store_id fields, but this won't cause validation errors. By default, SQLModel/Pydantic v2 uses extra='ignore' behavior, so extra fields are silently ignored rather than forbidden. The Collection model has no custom model_config or Config class that would change this.

While passing undefined parameters is poor practice and suggests a code organization issue, it will not error under Pydantic v2 as claimed. The actual concern is code clarity: these extra parameters are silently dropped, making the helper's intent unclear.

The unused client = OpenAI(...) variable on line 19 is a valid minor concern but unrelated to the Collection model issue.

Likely an incorrect or invalid review comment.

backend/app/api/routes/collections.py (1)

69-79: Nice UX: metadata explains vector-store-only path.

Clear guidance in responses when with_assistant is False. LGTM.

backend/app/tests/services/collections/test_delete_collection.py (3)

16-21: start_job test aligns with new scheduling contract.

Asserts on kwargs (function_path, ids, payload via model_dump(mode="json")) look correct.

Also applies to: 61-73


98-137: execute_job success: verifies remote delete and local soft-delete.

Good use of patches; asserts on delete_by_id, task/status updates, and assistant id match are accurate.


160-205: execute_job failure: correct FAILED status and no local delete.

Mocked exception path and assertions look right, including error_message contents.

backend/app/tests/services/collections/test_helpers.py (5)

10-35: extract_error_message tests cover JSON, dict, fallback, truncation.

Concise and robust. LGTM.


41-66: batch_documents tests validate chunking and outputs.

Even/ragged/empty paths well covered.


86-91: SilentCallback no-ops confirmed.

Simple and sufficient.


93-135: WebHookCallback success/failure flows verified with metadata.

Good assertions on APIResponse shape and job id propagation.


140-161: _backout cleanup behavior validated.

Delete called once; OpenAIError swallowed as intended.

backend/app/crud/collection/collection.py (1)

140-143: Good: filter out already-deleted collections when deleting by Document.

This avoids double-deletes and keeps semantics consistent.

backend/app/tests/api/routes/collections/test_create_collections.py (4)

12-14: Helper for metadata is tidy.

Nicely handles both metadata and meta.


16-66: End-to-end assertions look solid.

Good coverage: enums, ids, trace, and with_assistant=True propagation to start_job.


68-110: Vector-only path checks are appropriate.

Validates metadata note and with_assistant=False flow.


112-137: Validation error test is precise.

Matches the model’s “both or none” requirement.

backend/app/services/collections/create_collection.py (3)

125-153: Resource creation flow looks correct.

Vector store created/updated; assistant created conditionally; IDs wired into Collection. Logs are informative.


154-195: Metrics/logging block is useful.

File types, sizes, and timing aid observability.


220-238: Backout and failure marking are robust.

Best-effort cleanup plus job status update and callback are handled.

Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (1)
backend/app/api/docs/collections/job_info.md (1)

9-9: Improve clarity of grammar on collection deletion response.

Line 9 uses awkward phrasing: "nothing will be returned as collection" is unclear.

Apply this diff to improve clarity:

-* If the job of delete collection was successful, we will get the status as successful and nothing will be returned as collection.
+* If the collection deletion job was successful, the status will be marked as successful and no collection object will be returned.
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 642e39c and aa23f3d.

📒 Files selected for processing (3)
  • backend/app/api/docs/collections/create.md (2 hunks)
  • backend/app/api/docs/collections/delete.md (1 hunks)
  • backend/app/api/docs/collections/job_info.md (1 hunks)
✅ Files skipped from review due to trivial changes (1)
  • backend/app/api/docs/collections/create.md
🧰 Additional context used
🪛 LanguageTool
backend/app/api/docs/collections/delete.md

[grammar] ~13-~13: Ensure spelling is correct
Context: ...deleted and marked as deleted. You will recieve a message in the callback_url about t...

(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: checks (3.11.7, 6)

Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (3)
backend/app/crud/collection/collection.py (3)

1-5: Remove duplicate logging import.

The logging module is imported on both lines 1 and 5.

 import logging
 import functools as ft
 from uuid import UUID
 from typing import Optional
-import logging

22-30: Add return type hint to _update method.

Per the coding guidelines for Python 3.11+ projects, type hints should be used consistently.

As per coding guidelines.

-    def _update(self, collection: Collection):
+    def _update(self, collection: Collection) -> Collection:
         self.session.add(collection)
         self.session.commit()
         self.session.refresh(collection)

94-103: Add return type hint to read_all method.

Per the coding guidelines for Python 3.11+ projects, type hints should be used consistently.

As per coding guidelines.

-    def read_all(self):
+    def read_all(self) -> list[Collection]:
         statement = select(Collection).where(
             and_(
                 Collection.project_id == self.project_id,
♻️ Duplicate comments (2)
backend/app/tests/crud/collections/collection/test_crud_collection_delete.py (1)

15-34: Org/project mismatch: helper ignores passed project_id when fetching project.

The helper calls get_project(db) which retrieves a random active project, but then uses the passed project_id parameter directly when constructing the Collection. This causes organization_id and project_id to reference mismatched entities when the caller passes a specific project_id.

Accept an explicit project parameter instead:

 def get_collection_for_delete(
-    db: Session, client=None, project_id: int = None
+    db: Session, project, client=None
 ) -> Collection:
-    project = get_project(db)
     if client is None:
         client = OpenAI(api_key="test_api_key")

     vector_store = client.vector_stores.create()
     assistant = client.beta.assistants.create(
         model="gpt-4o",
         tools=[{"type": "file_search"}],
         tool_resources={"file_search": {"vector_store_ids": [vector_store.id]}},
     )

     return Collection(
         organization_id=project.organization_id,
-        project_id=project_id,
+        project_id=project.id,
         llm_service_id=assistant.id,
         llm_service_name="gpt-4o",
     )

Then update call sites:

-collection = get_collection_for_delete(db, client, project_id=project.id)
+collection = get_collection_for_delete(db, project, client)
backend/app/crud/collection/collection.py (1)

42-66: PK collision risk with soft-deleted rows remains unresolved.

The past review identified that read_one() filters out soft-deleted collections (line 73: Collection.deleted_at.is_(None)). If a soft-deleted collection with the same id exists, read_one() raises 404, and the subsequent session.add() at line 59 attempts an insert with a duplicate primary key, causing an IntegrityError.

The suggested fix from the previous review was not applied. Additionally, the existing = None initialization on line 45 is redundant.

Apply the corrected diff from the past review:

 def create(
     self, collection: Collection, documents: Optional[list[Document]] = None
 ) -> Collection:
-    existing = None
-    try:
-        existing = self.read_one(collection.id)
-    except HTTPException as e:
-        if e.status_code != 404:
-            raise
-
-    if existing is not None:
+    # First, see if any row exists for this id (deleted or not)
+    any_row = self.session.exec(
+        select(Collection).where(Collection.id == collection.id)
+    ).one_or_none()
+    if any_row is not None:
+        if any_row.deleted_at is None:
+            logger.warning(
+                "[CollectionCrud.create] Collection already present | {'collection_id': '%s'}",
+                collection.id,
+            )
+            return any_row
+        # id is already used by a soft-deleted row
+        raise HTTPException(status_code=409, detail="Collection id already exists (soft-deleted)")
-        logger.warning(
-            "[CollectionCrud.create] Collection already present | "
-            f"{{'collection_id': '{collection.id}'}}"
-        )
-        return existing
 
     self.session.add(collection)
     self.session.commit()
     self.session.refresh(collection)
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between aa23f3d and 501b46a.

📒 Files selected for processing (7)
  • backend/app/api/docs/collections/delete.md (1 hunks)
  • backend/app/crud/collection/collection.py (3 hunks)
  • backend/app/models/collection.py (3 hunks)
  • backend/app/tests/api/routes/collections/test_create_collections.py (1 hunks)
  • backend/app/tests/crud/collections/collection/test_crud_collection_delete.py (4 hunks)
  • backend/app/tests/services/collections/test_create_collection.py (7 hunks)
  • backend/app/tests/utils/collection.py (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • backend/app/api/docs/collections/delete.md
🧰 Additional context used
📓 Path-based instructions (3)
**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Use type hints in Python code (Python 3.11+ project)

Files:

  • backend/app/tests/crud/collections/collection/test_crud_collection_delete.py
  • backend/app/tests/services/collections/test_create_collection.py
  • backend/app/models/collection.py
  • backend/app/crud/collection/collection.py
  • backend/app/tests/api/routes/collections/test_create_collections.py
  • backend/app/tests/utils/collection.py
backend/app/models/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Define SQLModel entities (database tables and domain objects) in backend/app/models/

Files:

  • backend/app/models/collection.py
backend/app/crud/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Implement database access operations in backend/app/crud/

Files:

  • backend/app/crud/collection/collection.py
🧬 Code graph analysis (5)
backend/app/tests/crud/collections/collection/test_crud_collection_delete.py (5)
backend/app/crud/collection/collection.py (2)
  • CollectionCrud (17-154)
  • create (42-66)
backend/app/models/collection.py (1)
  • Collection (13-36)
backend/app/crud/rag/open_ai.py (1)
  • OpenAIAssistantCrud (192-246)
backend/app/tests/utils/utils.py (1)
  • get_project (54-73)
backend/app/tests/utils/document.py (2)
  • DocumentStore (53-80)
  • project (65-66)
backend/app/tests/services/collections/test_create_collection.py (4)
backend/app/services/collections/create_collection.py (2)
  • start_job (39-69)
  • execute_job (72-238)
backend/app/tests/utils/openai.py (1)
  • get_mock_openai_client_with_vector_store (90-117)
backend/app/tests/utils/collection.py (1)
  • get_collection_job (74-95)
backend/app/crud/collection/collection_job.py (4)
  • create (78-96)
  • update (58-76)
  • CollectionJobCrud (19-96)
  • read_one (24-45)
backend/app/crud/collection/collection.py (4)
backend/app/models/collection.py (1)
  • Collection (13-36)
backend/app/crud/document_collection.py (1)
  • DocumentCollectionCrud (8-48)
backend/app/core/util.py (1)
  • now (15-16)
backend/app/models/document_collection.py (1)
  • DocumentCollection (9-23)
backend/app/tests/api/routes/collections/test_create_collections.py (2)
backend/app/models/collection_job.py (2)
  • CollectionJobStatus (11-15)
  • CollectionActionType (18-20)
backend/app/tests/conftest.py (3)
  • client (65-68)
  • user_api_key_header (90-92)
  • user_api_key (102-104)
backend/app/tests/utils/collection.py (5)
backend/app/models/collection.py (1)
  • Collection (13-36)
backend/app/models/collection_job.py (3)
  • CollectionActionType (18-20)
  • CollectionJob (35-64)
  • CollectionJobStatus (11-15)
backend/app/crud/collection/collection.py (2)
  • CollectionCrud (17-154)
  • create (42-66)
backend/app/crud/collection/collection_job.py (2)
  • CollectionJobCrud (19-96)
  • create (78-96)
backend/app/tests/conftest.py (1)
  • db (28-45)
🔇 Additional comments (8)
backend/app/tests/api/routes/collections/test_create_collections.py (1)

1-136: LGTM!

The test suite comprehensively covers the new with_assistant flow, including:

  • Assistant-enabled path with proper callback verification
  • Vector-only path with metadata validation
  • Validation error for incomplete assistant options

The assertions correctly verify job status, metadata content, and all callback arguments.

backend/app/tests/services/collections/test_create_collection.py (1)

1-255: LGTM!

The service tests correctly verify the updated start_job and execute_job flows with with_assistant=True. The new failure test at lines 194-255 properly validates error handling and vector store cleanup on assistant creation failure.

backend/app/models/collection.py (1)

120-123: LGTM!

Correctly updated to use model_fields (Pydantic v2 API) instead of the deprecated __fields__.

backend/app/tests/utils/collection.py (3)

25-47: LGTM!

The get_collection helper correctly creates an assistant-backed Collection with all required fields matching the schema. The model parameter is properly used for llm_service_name.


50-71: LGTM!

The get_vector_store_collection helper correctly creates a vector-store-only Collection without assistant fields. All fields match the Collection schema.


74-95: LGTM!

The get_collection_job helper provides a clean, reusable way to seed CollectionJob test data with configurable fields. The enum value extraction handles both enum instances and string values properly.

backend/app/crud/collection/collection.py (2)

105-109: LGTM: Clean soft-delete implementation.

The new delete_by_id method correctly implements soft-delete by reading the collection (which verifies it exists and is not already deleted), setting deleted_at, and persisting via _update.


140-143: Good addition: filter excludes soft-deleted collections.

The updated where clause correctly filters out soft-deleted collections when processing document deletions, preventing redundant deletion attempts.

Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (2)
backend/app/models/collection.py (2)

61-69: Surface deprecations in OpenAPI schema (don’t just say “To Be Deprecated”).

Mark these fields as deprecated in the generated schema so client SDKs and docs flag them. Add json_schema_extra={"deprecated": True} to model, instructions, and temperature.

Apply:

-    model: Optional[str] = Field(
-        default=None,
+    model: Optional[str] = Field(
+        default=None,
         description=(
             "**[To Be Deprecated]**  "
             "OpenAI model to attach to this assistant. The model "
             "must be compatable with the assistants API; see the "
             "OpenAI [model documentation](https://platform.openai.com/docs/models/compare) for more."
-        ),
+        ),
+        json_schema_extra={"deprecated": True},
     )
@@
-    instructions: Optional[str] = Field(
-        default=None,
+    instructions: Optional[str] = Field(
+        default=None,
         description=(
             "**[To Be Deprecated]**  "
             "Assistant instruction. Sometimes referred to as the "
             '"system" prompt.'
-        ),
+        ),
+        json_schema_extra={"deprecated": True},
     )
@@
-    temperature: float = Field(
-        default=1e-6,
+    temperature: float = Field(
+        default=1e-6,
         description=(
             "**[To Be Deprecated]**  "
             "Model temperature. The default is slightly "
             "greater-than zero because it is [unknown how OpenAI "
             "handles zero](https://community.openai.com/t/clarifications-on-setting-temperature-0/886447/5)."
-        ),
+        ),
+        json_schema_extra={"deprecated": True},
     )

Based on learnings.

Also applies to: 71-78, 80-87


122-122: v2-compatible field access is correct; tighten typing and simplify iteration.

Switch the parameter type to a model class and add a return type; iterating dict keys explicitly isn’t needed.

Apply:

-from typing import Any, Optional
+from typing import Any, Optional, Iterator
@@
-    def extract_super_type(self, cls: "CreationRequest"):
-        for field_name in cls.model_fields.keys():
+    def extract_super_type(self, cls: type[SQLModel]) -> Iterator[tuple[str, Any]]:
+        for field_name in cls.model_fields:
             field_value = getattr(self, field_name)
             yield (field_name, field_value)
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 501b46a and 1d19b51.

📒 Files selected for processing (1)
  • backend/app/models/collection.py (3 hunks)
🧰 Additional context used
📓 Path-based instructions (2)
**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Use type hints in Python code (Python 3.11+ project)

Files:

  • backend/app/models/collection.py
backend/app/models/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Define SQLModel entities (database tables and domain objects) in backend/app/models/

Files:

  • backend/app/models/collection.py
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: checks (3.11.7, 6)
🔇 Additional comments (1)
backend/app/models/collection.py (1)

6-6: Good move to Pydantic v2 API.

Importing model_validator is correct for v2; no issues.

@nishika26 nishika26 linked an issue Oct 20, 2025 that may be closed by this pull request
@nishika26 nishika26 self-assigned this Oct 20, 2025
@nishika26 nishika26 added enhancement New feature or request ready-for-review labels Oct 20, 2025
Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 344d77b and cd392c0.

📒 Files selected for processing (9)
  • backend/app/api/routes/collections.py (4 hunks)
  • backend/app/api/routes/documents.py (3 hunks)
  • backend/app/crud/collection/collection.py (4 hunks)
  • backend/app/models/collection_job.py (2 hunks)
  • backend/app/services/collections/create_collection.py (4 hunks)
  • backend/app/services/collections/delete_collection.py (2 hunks)
  • backend/app/services/collections/helpers.py (3 hunks)
  • backend/app/tests/api/routes/collections/test_collection_info.py (3 hunks)
  • backend/app/tests/api/routes/collections/test_create_collections.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • backend/app/crud/collection/collection.py
🧰 Additional context used
📓 Path-based instructions (4)
**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Use type hints in Python code (Python 3.11+ project)

Files:

  • backend/app/models/collection_job.py
  • backend/app/api/routes/collections.py
  • backend/app/api/routes/documents.py
  • backend/app/tests/api/routes/collections/test_create_collections.py
  • backend/app/services/collections/delete_collection.py
  • backend/app/services/collections/create_collection.py
  • backend/app/services/collections/helpers.py
  • backend/app/tests/api/routes/collections/test_collection_info.py
backend/app/models/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Define SQLModel entities (database tables and domain objects) in backend/app/models/

Files:

  • backend/app/models/collection_job.py
backend/app/api/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Expose FastAPI REST endpoints under backend/app/api/ organized by domain

Files:

  • backend/app/api/routes/collections.py
  • backend/app/api/routes/documents.py
backend/app/services/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Implement business logic services under backend/app/services/

Files:

  • backend/app/services/collections/delete_collection.py
  • backend/app/services/collections/create_collection.py
  • backend/app/services/collections/helpers.py
🧠 Learnings (1)
📚 Learning: 2025-10-08T12:05:01.317Z
Learnt from: CR
PR: ProjectTech4DevAI/ai-platform#0
File: CLAUDE.md:0-0
Timestamp: 2025-10-08T12:05:01.317Z
Learning: Applies to backend/app/services/response/**/*.py : Put OpenAI Responses API integration, conversation management, and job execution in backend/app/services/response/

Applied to files:

  • backend/app/services/collections/create_collection.py
🧬 Code graph analysis (7)
backend/app/api/routes/collections.py (6)
backend/app/models/collection_job.py (4)
  • CollectionJobPublic (86-95)
  • CollectionJobCreate (71-75)
  • CollectionActionType (18-20)
  • CollectionJobStatus (11-15)
backend/app/utils.py (2)
  • APIResponse (29-53)
  • success_response (36-39)
backend/app/services/collections/create_collection.py (1)
  • start_job (40-70)
backend/app/services/collections/delete_collection.py (1)
  • start_job (24-52)
backend/app/models/collection.py (1)
  • DeletionRequest (131-132)
backend/app/crud/collection/collection_job.py (2)
  • CollectionJobCrud (19-96)
  • create (78-96)
backend/app/api/routes/documents.py (3)
backend/app/crud/rag/open_ai.py (3)
  • OpenAIVectorStoreCrud (101-189)
  • delete (174-189)
  • delete (217-246)
backend/app/services/collections/helpers.py (1)
  • pick_service_for_documennt (141-160)
backend/app/crud/collection/collection.py (3)
  • CollectionCrud (18-145)
  • delete (103-111)
  • read_one (59-83)
backend/app/tests/api/routes/collections/test_create_collections.py (3)
backend/app/models/collection_job.py (2)
  • CollectionJobStatus (11-15)
  • CollectionActionType (18-20)
backend/app/models/collection.py (1)
  • CreationRequest (120-128)
backend/app/tests/conftest.py (3)
  • client (65-68)
  • user_api_key_header (90-92)
  • user_api_key (102-104)
backend/app/services/collections/delete_collection.py (6)
backend/app/crud/rag/open_ai.py (2)
  • OpenAIAssistantCrud (192-246)
  • OpenAIVectorStoreCrud (101-189)
backend/app/models/collection_job.py (3)
  • CollectionJobStatus (11-15)
  • CollectionJobUpdate (78-83)
  • job_id (67-68)
backend/app/models/collection.py (1)
  • DeletionRequest (131-132)
backend/app/services/collections/helpers.py (8)
  • SilentCallback (86-93)
  • WebHookCallback (96-123)
  • fail (79-80)
  • fail (87-89)
  • fail (110-118)
  • success (82-83)
  • success (91-93)
  • success (120-123)
backend/app/utils.py (1)
  • get_openai_client (175-205)
backend/app/crud/collection/collection.py (4)
  • read_one (59-83)
  • CollectionCrud (18-145)
  • delete (103-111)
  • delete_by_id (96-100)
backend/app/services/collections/create_collection.py (10)
backend/app/models/collection.py (4)
  • CreationRequest (120-128)
  • extract_super_type (125-128)
  • AssistantOptions (57-110)
  • Collection (13-36)
backend/app/models/collection_job.py (3)
  • job_id (67-68)
  • CollectionJobUpdate (78-83)
  • CollectionJobStatus (11-15)
backend/app/crud/collection/collection_job.py (4)
  • CollectionJobCrud (19-96)
  • read_one (24-45)
  • update (58-76)
  • create (78-96)
backend/app/crud/collection/collection.py (3)
  • read_one (59-83)
  • create (43-57)
  • CollectionCrud (18-145)
backend/app/utils.py (1)
  • get_openai_client (175-205)
backend/app/core/cloud/storage.py (1)
  • get_cloud_storage (262-279)
backend/app/crud/document.py (1)
  • DocumentCrud (14-135)
backend/app/services/collections/helpers.py (10)
  • batch_documents (53-72)
  • SilentCallback (86-93)
  • WebHookCallback (96-123)
  • success (82-83)
  • success (91-93)
  • success (120-123)
  • _backout (126-134)
  • fail (79-80)
  • fail (87-89)
  • fail (110-118)
backend/app/crud/rag/open_ai.py (2)
  • OpenAIVectorStoreCrud (101-189)
  • OpenAIAssistantCrud (192-246)
backend/app/crud/document_collection.py (1)
  • DocumentCollectionCrud (8-48)
backend/app/services/collections/helpers.py (4)
backend/app/core/util.py (1)
  • post_callback (29-39)
backend/app/utils.py (3)
  • APIResponse (29-53)
  • failure_response (42-53)
  • success_response (36-39)
backend/app/models/document_collection.py (1)
  • DocumentCollection (9-23)
backend/app/models/collection.py (1)
  • Collection (13-36)
backend/app/tests/api/routes/collections/test_collection_info.py (4)
backend/app/tests/utils/utils.py (1)
  • get_project (54-73)
backend/app/tests/utils/collection.py (2)
  • get_collection (25-47)
  • get_collection_job (74-95)
backend/app/models/collection_job.py (2)
  • CollectionActionType (18-20)
  • CollectionJobStatus (11-15)
backend/app/tests/conftest.py (3)
  • db (28-45)
  • client (65-68)
  • user_api_key_header (90-92)
🪛 Ruff (0.14.2)
backend/app/services/collections/delete_collection.py

62-62: Unused function argument: task_instance

(ARG001)

backend/app/services/collections/create_collection.py

80-80: Unused function argument: task_instance

(ARG001)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: checks (3.11.7, 6)
🔇 Additional comments (1)
backend/app/models/collection_job.py (1)

66-88: job_id alias keeps the API contract consistent.

Glad to see the model now exposes job_id without leaking the internal id, which keeps client payloads stable while preserving the DB primary key.

Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 9

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
backend/app/tests/services/collections/test_create_collection.py (1)

345-402: Remove the duplicate test_execute_job_success_flow_updates_job_and_creates_collection.

This redefinition overwrites the earlier test, so those assertions never run and ruff raises F811. Please drop the duplicate block (or consolidate any differences) so we keep a single definition.

🧹 Nitpick comments (1)
backend/app/api/routes/collections.py (1)

94-97: Simplify flag determination.

Since request is typed as CreationRequest with model and instructions as direct attributes, you can simplify this without getattr:

-    # True iff both model and instructions were provided in the request body
-    with_assistant = bool(
-        getattr(request, "model", None) and getattr(request, "instructions", None)
-    )
+    # True iff both model and instructions were provided in the request body
+    with_assistant = bool(request.model and request.instructions)

Note: The validation in AssistantOptions._assistant_fields_all_or_none ensures that if one is provided, the other must be too (or the request fails with 422), so this logic is safe.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 335ece7 and dc79111.

📒 Files selected for processing (20)
  • backend/app/api/docs/collections/create.md (2 hunks)
  • backend/app/api/docs/collections/delete.md (1 hunks)
  • backend/app/api/docs/collections/docs.md (0 hunks)
  • backend/app/api/docs/collections/info.md (1 hunks)
  • backend/app/api/routes/collection_job.py (3 hunks)
  • backend/app/api/routes/collections.py (2 hunks)
  • backend/app/models/__init__.py (1 hunks)
  • backend/app/models/collection.py (6 hunks)
  • backend/app/models/collection_job.py (4 hunks)
  • backend/app/services/collections/create_collection.py (3 hunks)
  • backend/app/services/collections/delete_collection.py (2 hunks)
  • backend/app/services/collections/helpers.py (2 hunks)
  • backend/app/tests/api/routes/collections/test_collection_delete.py (1 hunks)
  • backend/app/tests/api/routes/collections/test_collection_info.py (1 hunks)
  • backend/app/tests/api/routes/collections/test_collection_job_info.py (1 hunks)
  • backend/app/tests/api/routes/collections/test_collection_list.py (1 hunks)
  • backend/app/tests/api/routes/collections/test_create_collections.py (1 hunks)
  • backend/app/tests/services/collections/test_create_collection.py (8 hunks)
  • backend/app/tests/services/collections/test_delete_collection.py (6 hunks)
  • backend/app/tests/services/collections/test_helpers.py (1 hunks)
💤 Files with no reviewable changes (1)
  • backend/app/api/docs/collections/docs.md
🚧 Files skipped from review as they are similar to previous changes (2)
  • backend/app/api/docs/collections/delete.md
  • backend/app/tests/services/collections/test_helpers.py
🧰 Additional context used
📓 Path-based instructions (4)
**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Use type hints in Python code (Python 3.11+ project)

Files:

  • backend/app/services/collections/delete_collection.py
  • backend/app/tests/services/collections/test_delete_collection.py
  • backend/app/services/collections/create_collection.py
  • backend/app/api/routes/collections.py
  • backend/app/tests/api/routes/collections/test_create_collections.py
  • backend/app/models/collection.py
  • backend/app/tests/api/routes/collections/test_collection_delete.py
  • backend/app/tests/api/routes/collections/test_collection_job_info.py
  • backend/app/tests/api/routes/collections/test_collection_info.py
  • backend/app/api/routes/collection_job.py
  • backend/app/models/__init__.py
  • backend/app/tests/api/routes/collections/test_collection_list.py
  • backend/app/tests/services/collections/test_create_collection.py
  • backend/app/models/collection_job.py
  • backend/app/services/collections/helpers.py
backend/app/services/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Implement business logic services under backend/app/services/

Files:

  • backend/app/services/collections/delete_collection.py
  • backend/app/services/collections/create_collection.py
  • backend/app/services/collections/helpers.py
backend/app/api/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Expose FastAPI REST endpoints under backend/app/api/ organized by domain

Files:

  • backend/app/api/routes/collections.py
  • backend/app/api/routes/collection_job.py
backend/app/models/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Define SQLModel entities (database tables and domain objects) in backend/app/models/

Files:

  • backend/app/models/collection.py
  • backend/app/models/__init__.py
  • backend/app/models/collection_job.py
🧠 Learnings (1)
📚 Learning: 2025-08-06T05:29:45.557Z
Learnt from: avirajsingh7
Repo: ProjectTech4DevAI/ai-platform PR: 319
File: backend/app/crud/prompt.py:111-118
Timestamp: 2025-08-06T05:29:45.557Z
Learning: In FastAPI applications using Pydantic/SQLModel, when update model fields have min_length=1 validation, empty strings are rejected at the validation layer before reaching CRUD functions. This means checking `if field:` vs `if field is not None:` in CRUD operations should consider the model validation constraints.

Applied to files:

  • backend/app/models/collection.py
🧬 Code graph analysis (15)
backend/app/services/collections/delete_collection.py (7)
backend/app/crud/rag/open_ai.py (2)
  • OpenAIAssistantCrud (192-246)
  • OpenAIVectorStoreCrud (101-189)
backend/app/models/collection_job.py (5)
  • CollectionJobStatus (12-16)
  • CollectionJobUpdate (83-88)
  • CollectionJob (24-72)
  • CollectionJobPublic (108-113)
  • job_id (63-64)
backend/app/models/collection.py (2)
  • CollectionIDPublic (139-140)
  • DeletionRequest (132-133)
backend/app/services/collections/helpers.py (1)
  • extract_error_message (21-47)
backend/app/utils.py (4)
  • get_openai_client (175-205)
  • APIResponse (29-53)
  • success_response (36-39)
  • failure_response (42-53)
backend/app/crud/collection/collection_job.py (3)
  • update (58-76)
  • CollectionJobCrud (19-96)
  • read_one (24-45)
backend/app/crud/collection/collection.py (4)
  • read_one (59-83)
  • CollectionCrud (18-145)
  • delete (103-111)
  • delete_by_id (96-100)
backend/app/tests/services/collections/test_delete_collection.py (5)
backend/app/models/collection_job.py (3)
  • CollectionJobStatus (12-16)
  • CollectionActionType (19-21)
  • job_id (63-64)
backend/app/tests/utils/collection.py (2)
  • get_collection (25-47)
  • get_collection_job (74-95)
backend/app/services/collections/delete_collection.py (2)
  • start_job (26-54)
  • execute_job (146-250)
backend/app/tests/conftest.py (1)
  • db (28-45)
backend/app/crud/collection/collection.py (5)
  • _ (114-121)
  • _ (124-145)
  • delete (103-111)
  • delete_by_id (96-100)
  • read_one (59-83)
backend/app/services/collections/create_collection.py (8)
backend/app/models/collection.py (5)
  • CollectionPublic (143-152)
  • CreationRequest (121-129)
  • AssistantOptions (58-111)
  • Collection (14-37)
  • extract_super_type (126-129)
backend/app/models/collection_job.py (5)
  • CollectionJobPublic (108-113)
  • CollectionJob (24-72)
  • job_id (63-64)
  • CollectionJobUpdate (83-88)
  • CollectionJobStatus (12-16)
backend/app/services/collections/helpers.py (3)
  • _backout (72-80)
  • batch_documents (50-69)
  • extract_error_message (21-47)
backend/app/utils.py (4)
  • get_openai_client (175-205)
  • APIResponse (29-53)
  • success_response (36-39)
  • failure_response (42-53)
backend/app/core/cloud/storage.py (1)
  • get_cloud_storage (262-279)
backend/app/crud/document.py (1)
  • DocumentCrud (14-135)
backend/app/crud/rag/open_ai.py (2)
  • OpenAIVectorStoreCrud (101-189)
  • OpenAIAssistantCrud (192-246)
backend/app/crud/document_collection.py (1)
  • DocumentCollectionCrud (8-48)
backend/app/api/routes/collections.py (6)
backend/app/models/collection_job.py (5)
  • CollectionJobPublic (108-113)
  • CollectionJobImmediatePublic (103-105)
  • CollectionJobCreate (76-80)
  • CollectionActionType (19-21)
  • CollectionJobStatus (12-16)
backend/app/models/collection.py (5)
  • CollectionWithDocsPublic (155-156)
  • CreationRequest (121-129)
  • CallbackRequest (114-118)
  • DeletionRequest (132-133)
  • CollectionPublic (143-152)
backend/app/utils.py (4)
  • APIResponse (29-53)
  • load_description (248-253)
  • success_response (36-39)
  • _ (257-258)
backend/app/crud/collection/collection.py (7)
  • CollectionCrud (18-145)
  • read_all (85-94)
  • delete (103-111)
  • _ (114-121)
  • _ (124-145)
  • read_one (59-83)
  • create (43-57)
backend/app/crud/document_collection.py (3)
  • create (12-23)
  • DocumentCollectionCrud (8-48)
  • read (25-48)
backend/app/models/document.py (1)
  • DocumentPublic (43-60)
backend/app/tests/api/routes/collections/test_create_collections.py (3)
backend/app/models/collection_job.py (1)
  • CollectionJobStatus (12-16)
backend/app/models/collection.py (1)
  • CreationRequest (121-129)
backend/app/tests/conftest.py (3)
  • client (65-68)
  • user_api_key_header (90-92)
  • user_api_key (102-104)
backend/app/models/collection.py (2)
backend/app/core/util.py (1)
  • now (15-16)
backend/app/models/document.py (1)
  • DocumentPublic (43-60)
backend/app/tests/api/routes/collections/test_collection_delete.py (4)
backend/app/models/collection_job.py (1)
  • CollectionJobStatus (12-16)
backend/app/tests/utils/utils.py (1)
  • get_project (54-73)
backend/app/tests/utils/collection.py (1)
  • get_collection (25-47)
backend/app/tests/conftest.py (3)
  • db (28-45)
  • user_api_key_header (90-92)
  • user_api_key (102-104)
backend/app/tests/api/routes/collections/test_collection_job_info.py (4)
backend/app/tests/utils/utils.py (1)
  • get_project (54-73)
backend/app/tests/utils/collection.py (2)
  • get_collection (25-47)
  • get_collection_job (74-95)
backend/app/models/collection_job.py (2)
  • CollectionActionType (19-21)
  • CollectionJobStatus (12-16)
backend/app/tests/conftest.py (2)
  • db (28-45)
  • user_api_key_header (90-92)
backend/app/tests/api/routes/collections/test_collection_info.py (5)
backend/app/tests/utils/utils.py (2)
  • get_project (54-73)
  • get_document (102-123)
backend/app/tests/utils/collection.py (2)
  • get_collection (25-47)
  • get_vector_store_collection (50-71)
backend/app/crud/document_collection.py (1)
  • DocumentCollectionCrud (8-48)
backend/app/models/collection.py (1)
  • Collection (14-37)
backend/app/tests/conftest.py (3)
  • db (28-45)
  • client (65-68)
  • user_api_key_header (90-92)
backend/app/api/routes/collection_job.py (6)
backend/app/models/collection_job.py (2)
  • CollectionActionType (19-21)
  • CollectionJobPublic (108-113)
backend/app/models/collection.py (2)
  • CollectionIDPublic (139-140)
  • CollectionPublic (143-152)
backend/app/utils.py (2)
  • APIResponse (29-53)
  • failure_response (42-53)
backend/app/crud/collection/collection.py (2)
  • CollectionCrud (18-145)
  • read_one (59-83)
backend/app/crud/collection/collection_job.py (1)
  • read_one (24-45)
backend/app/services/collections/helpers.py (1)
  • extract_error_message (21-47)
backend/app/models/__init__.py (2)
backend/app/models/collection.py (4)
  • Collection (14-37)
  • CollectionPublic (143-152)
  • CollectionIDPublic (139-140)
  • CollectionWithDocsPublic (155-156)
backend/app/models/collection_job.py (2)
  • CollectionJob (24-72)
  • CollectionJobImmediatePublic (103-105)
backend/app/tests/api/routes/collections/test_collection_list.py (3)
backend/app/tests/utils/utils.py (1)
  • get_project (54-73)
backend/app/tests/utils/collection.py (2)
  • get_collection (25-47)
  • get_vector_store_collection (50-71)
backend/app/tests/conftest.py (2)
  • db (28-45)
  • user_api_key_header (90-92)
backend/app/tests/services/collections/test_create_collection.py (7)
backend/app/services/collections/create_collection.py (2)
  • start_job (43-73)
  • execute_job (165-320)
backend/app/tests/utils/openai.py (1)
  • get_mock_openai_client_with_vector_store (90-117)
backend/app/tests/utils/collection.py (2)
  • get_collection_job (74-95)
  • get_collection (25-47)
backend/app/crud/collection/collection.py (6)
  • _ (114-121)
  • _ (124-145)
  • read_one (59-83)
  • CollectionCrud (18-145)
  • create (43-57)
  • delete (103-111)
backend/app/tests/utils/document.py (2)
  • project (65-66)
  • DocumentStore (53-80)
backend/app/crud/collection/collection_job.py (4)
  • CollectionJobCrud (19-96)
  • read_one (24-45)
  • create (78-96)
  • update (58-76)
backend/app/crud/document_collection.py (1)
  • DocumentCollectionCrud (8-48)
backend/app/models/collection_job.py (1)
backend/app/models/collection.py (2)
  • CollectionPublic (143-152)
  • CollectionIDPublic (139-140)
backend/app/services/collections/helpers.py (3)
backend/app/models/document_collection.py (1)
  • DocumentCollection (9-23)
backend/app/models/collection.py (1)
  • Collection (14-37)
backend/app/tests/services/collections/test_helpers.py (2)
  • delete (94-95)
  • delete (107-108)
🪛 Ruff (0.14.3)
backend/app/services/collections/delete_collection.py

153-153: Unused function argument: task_instance

(ARG001)

backend/app/services/collections/create_collection.py

172-172: Unused function argument: task_instance

(ARG001)


187-187: Local variable callback is assigned to but never used

Remove assignment to unused variable callback

(F841)

backend/app/tests/api/routes/collections/test_collection_list.py

13-13: Unused function argument: db

(ARG001)


55-55: Local variable before_ids is assigned to but never used

Remove assignment to unused variable before_ids

(F841)


112-112: Unused function argument: db

(ARG001)

backend/app/tests/services/collections/test_create_collection.py

344-344: Redefinition of unused test_execute_job_success_flow_updates_job_and_creates_collection from line 118

(F811)


424-424: Redefinition of unused test_execute_job_assistant_create_failure_marks_failed_and_deletes_vector from line 198

(F811)


575-575: Unused function argument: MockAssistantCrud

(ARG001)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: checks (3.11.7, 6)
🔇 Additional comments (24)
backend/app/tests/api/routes/collections/test_collection_job_info.py (1)

1-160: LGTM! Comprehensive test coverage for collection job info endpoint.

The test suite thoroughly covers all job states (PENDING, SUCCESSFUL, FAILED) and action types (CREATE, DELETE) with appropriate assertions for job metadata, timestamps, collection presence, and error messages.

backend/app/models/collection.py (5)

6-6: LGTM!

The model_validator import is correctly used for the Pydantic v2 validator on line 90.


9-9: LGTM!

The DocumentPublic import is correctly used in CollectionWithDocsPublic on line 156.


62-111: LGTM! Validator correctly implements both-or-neither constraint.

The @model_validator(mode="before") properly addresses past review feedback by:

  • Using Pydantic v2 model_validator instead of deprecated root_validator
  • Including proper type hints (values: dict[str, Any] -> dict[str, Any])
  • Only normalizing strings while preserving non-strings for Pydantic coercion
  • Using is None checks instead of bool() for presence detection
  • Correctly handling the XOR constraint for model and instructions fields

126-129: LGTM! Correct Pydantic v2 API usage.

The change from cls.__fields__ to cls.model_fields.keys() properly migrates to Pydantic v2 API.


139-156: LGTM! Well-structured public response models.

The new models follow best practices:

  • CollectionIDPublic provides a minimal representation for DELETE responses
  • CollectionPublic exposes all necessary collection metadata
  • CollectionWithDocsPublic cleanly extends with optional document list using proper type hints
backend/app/tests/services/collections/test_delete_collection.py (1)

1-372: LGTM! Comprehensive delete flow test coverage.

The test suite properly validates:

  • Job creation and task scheduling with correct parameters
  • Remote deletion via OpenAIAssistantCrud.delete()
  • Local soft-delete via CollectionCrud.delete_by_id()
  • Success and failure callback payload structure
  • Error message propagation on failures

The tests correctly use the updated test utilities and payload serialization patterns introduced in this PR.

backend/app/api/docs/collections/info.md (1)

1-6: LGTM! Clear documentation of include_docs functionality.

The documentation accurately describes the optional document inclusion feature.

backend/app/api/docs/collections/create.md (1)

1-29: LGTM! Documentation accurately reflects new creation flow.

The updates correctly document:

  • Assistant creation requires both model and instructions
  • Vector-store-only creation when neither is provided
  • Job-based polling using job_id from the response
  • Expanded failure scenarios
backend/app/models/__init__.py (1)

9-23: LGTM! Public model exports correctly updated.

The new exports properly expose:

  • CollectionIDPublic and CollectionWithDocsPublic from the collection module
  • CollectionJobImmediatePublic from the collection_job module

These align with the new public API surface introduced in this PR.

backend/app/api/routes/collection_job.py (3)

28-33: LGTM! Cleaner endpoint path and appropriate field exclusion.

The path simplification and response_model_exclude for error_message ensure errors are only exposed at the top level in failure responses, not nested in data.


44-54: LGTM! Correct collection hydration logic.

The implementation properly:

  • Fetches full collection details for successful CREATE operations
  • Provides minimal ID-only response for DELETE operations
  • Uses appropriate public models (CollectionPublic vs CollectionIDPublic)

56-65: LGTM! Clean error handling with helpful context.

The failure response correctly:

  • Extracts user-friendly error messages via extract_error_message
  • Includes job data for context
  • Maintains consistent API response structure
backend/app/tests/api/routes/collections/test_create_collections.py (4)

1-12: LGTM! Clean imports and defensive metadata helper.

The _extract_metadata helper safely handles API response variations for the metadata field.


15-60: LGTM! Comprehensive assistant creation test.

The test properly validates:

  • Response structure and status
  • Job timestamps presence
  • with_assistant=True flag
  • Request payload serialization
  • Metadata absence for assistant path

62-101: LGTM! Complete vector-only creation test.

The test correctly validates:

  • Response structure for vector-store-only path
  • Presence of descriptive metadata note
  • with_assistant=False flag
  • Request payload handling

103-128: LGTM! Proper validation error test.

The test correctly validates that providing only one of model/instructions triggers a 422 error with a clear message, confirming the Pydantic validator works as expected.

backend/app/services/collections/create_collection.py (4)

76-94: Well-structured success payload builder.

The helper cleanly encapsulates the success response construction using the new public models. The use of exclude_none=True appropriately omits unset optional fields from the JSON response.


120-138: Robust cleanup with guarded fallbacks.

The best-effort cleanup pattern properly handles partially created resources and logs warnings on backout failures without propagating exceptions, which is appropriate for cleanup code.


140-162: Good session isolation for failure marking.

Opening a fresh session to persist the FAILED status is the correct pattern and aligns with the session management approach used throughout this module.


224-231: Assistant options extraction is clean.

Filtering out None values before passing to the OpenAI SDK prevents sending unset optional parameters, which is good practice. The validation in AssistantOptions._assistant_fields_all_or_none ensures both model and instructions are present when with_assistant is True.

backend/app/api/routes/collections.py (3)

42-55: Excellent callback documentation for OpenAPI.

The callback router specification properly documents the expected request/response contract for async callbacks, directly addressing the past review comment about documenting callbacks in Swagger. This follows FastAPI best practices.

Based on past review comments.


108-115: Clear metadata note for vector-store-only mode.

The conditional metadata note effectively communicates to API consumers when assistant creation is skipped, improving the developer experience.


186-191: Good conditional document loading with pagination.

The include_docs flag with pagination parameters provides flexible API control. Conditionally loading documents only when requested avoids unnecessary joins and data transfer.

@nishika26 nishika26 linked an issue Nov 5, 2025 that may be closed by this pull request
Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (1)
backend/app/services/collections/create_collection.py (1)

317-319: Potential NameError in exception handler.

If an exception occurs before line 189 (where creation_request is assigned), the reference at line 317 will raise a NameError. Initialize creation_request = None with the other resource variables at the top of the function (around line 186):

     # Keep references for potential backout/cleanup on failure
     assistant = None
     assistant_crud = None
     vector_store = None
     vector_store_crud = None
     collection_job = None
+    creation_request = None

     try:
         creation_request = CreationRequest(**request)
🧹 Nitpick comments (3)
backend/app/services/collections/delete_collection.py (1)

57-96: Consider consolidating payload builders into shared helpers.

The build_success_payload and build_failure_payload functions are duplicated between delete_collection.py and create_collection.py (lines 76-117) with only minor differences. Consider extracting these to app/services/collections/helpers.py as generic functions that accept the appropriate parameters.

backend/app/tests/api/routes/collections/test_collection_list.py (1)

108-127: Unused fixture parameter.

The db parameter is declared but never used in this test. Since the test assumes a potentially empty database, the fixture is unnecessary. Remove it:

 def test_list_collections_does_not_error_with_no_collections(
-    db: Session,
     client: TestClient,
     user_api_key_header,
 ):
backend/app/services/collections/create_collection.py (1)

76-117: Payload builders duplicated across services.

The build_success_payload and build_failure_payload functions are duplicated in both create_collection.py and delete_collection.py with only minor signature differences. Consider consolidating these into app/services/collections/helpers.py as generic utilities.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 4f0bd33 and 6c54f71.

📒 Files selected for processing (5)
  • backend/app/api/routes/collections.py (3 hunks)
  • backend/app/services/collections/create_collection.py (3 hunks)
  • backend/app/services/collections/delete_collection.py (2 hunks)
  • backend/app/tests/api/routes/collections/test_collection_list.py (1 hunks)
  • backend/app/tests/services/collections/test_create_collection.py (8 hunks)
🧰 Additional context used
📓 Path-based instructions (3)
**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Use type hints in Python code (Python 3.11+ project)

Files:

  • backend/app/tests/services/collections/test_create_collection.py
  • backend/app/services/collections/create_collection.py
  • backend/app/api/routes/collections.py
  • backend/app/services/collections/delete_collection.py
  • backend/app/tests/api/routes/collections/test_collection_list.py
backend/app/services/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Implement business logic services under backend/app/services/

Files:

  • backend/app/services/collections/create_collection.py
  • backend/app/services/collections/delete_collection.py
backend/app/api/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Expose FastAPI REST endpoints under backend/app/api/ organized by domain

Files:

  • backend/app/api/routes/collections.py
🧬 Code graph analysis (5)
backend/app/tests/services/collections/test_create_collection.py (6)
backend/app/models/collection.py (1)
  • CreationRequest (121-129)
backend/app/services/collections/create_collection.py (2)
  • start_job (43-73)
  • execute_job (165-319)
backend/app/tests/utils/openai.py (1)
  • get_mock_openai_client_with_vector_store (114-141)
backend/app/tests/utils/collection.py (2)
  • get_collection_job (74-95)
  • get_collection (25-47)
backend/app/tests/utils/document.py (2)
  • project (65-66)
  • DocumentStore (53-80)
backend/app/core/cloud/storage.py (1)
  • AmazonCloudStorageClient (28-84)
backend/app/services/collections/create_collection.py (12)
backend/app/models/collection.py (5)
  • CollectionPublic (143-152)
  • CreationRequest (121-129)
  • AssistantOptions (58-111)
  • Collection (14-37)
  • extract_super_type (126-129)
backend/app/models/collection_job.py (5)
  • CollectionJobPublic (108-113)
  • CollectionJob (24-72)
  • job_id (63-64)
  • CollectionJobUpdate (83-88)
  • CollectionJobStatus (12-16)
backend/app/services/collections/helpers.py (3)
  • _backout (72-80)
  • batch_documents (50-69)
  • extract_error_message (21-47)
backend/app/celery/utils.py (1)
  • start_low_priority_job (46-71)
backend/app/utils.py (4)
  • get_openai_client (176-206)
  • APIResponse (30-54)
  • success_response (37-40)
  • failure_response (43-54)
backend/app/services/collections/delete_collection.py (3)
  • build_success_payload (57-73)
  • build_failure_payload (76-96)
  • execute_job (146-238)
backend/app/crud/collection/collection_job.py (4)
  • update (58-76)
  • CollectionJobCrud (19-96)
  • read_one (24-45)
  • create (78-96)
backend/app/crud/collection/collection.py (3)
  • read_one (59-83)
  • create (43-57)
  • CollectionCrud (18-145)
backend/app/core/cloud/storage.py (1)
  • get_cloud_storage (262-279)
backend/app/crud/document.py (1)
  • DocumentCrud (14-135)
backend/app/crud/rag/open_ai.py (2)
  • OpenAIVectorStoreCrud (101-189)
  • OpenAIAssistantCrud (192-246)
backend/app/crud/document_collection.py (1)
  • DocumentCollectionCrud (8-48)
backend/app/api/routes/collections.py (7)
backend/app/models/collection_job.py (5)
  • CollectionJobPublic (108-113)
  • CollectionJobImmediatePublic (103-105)
  • CollectionJobCreate (76-80)
  • CollectionActionType (19-21)
  • CollectionJobStatus (12-16)
backend/app/models/collection.py (5)
  • CollectionWithDocsPublic (155-156)
  • CreationRequest (121-129)
  • CallbackRequest (114-118)
  • DeletionRequest (132-133)
  • CollectionPublic (143-152)
backend/app/utils.py (4)
  • APIResponse (30-54)
  • load_description (279-284)
  • success_response (37-40)
  • _ (288-289)
backend/app/crud/collection/collection.py (7)
  • CollectionCrud (18-145)
  • read_all (85-94)
  • delete (103-111)
  • _ (114-121)
  • _ (124-145)
  • read_one (59-83)
  • create (43-57)
backend/app/services/collections/create_collection.py (1)
  • start_job (43-73)
backend/app/services/collections/delete_collection.py (1)
  • start_job (26-54)
backend/app/crud/document_collection.py (3)
  • create (12-23)
  • DocumentCollectionCrud (8-48)
  • read (25-48)
backend/app/services/collections/delete_collection.py (7)
backend/app/crud/rag/open_ai.py (2)
  • OpenAIAssistantCrud (192-246)
  • OpenAIVectorStoreCrud (101-189)
backend/app/models/collection_job.py (5)
  • CollectionJobStatus (12-16)
  • CollectionJobUpdate (83-88)
  • CollectionJob (24-72)
  • CollectionJobPublic (108-113)
  • job_id (63-64)
backend/app/models/collection.py (2)
  • CollectionIDPublic (139-140)
  • DeletionRequest (132-133)
backend/app/services/collections/helpers.py (1)
  • extract_error_message (21-47)
backend/app/utils.py (4)
  • get_openai_client (176-206)
  • APIResponse (30-54)
  • success_response (37-40)
  • failure_response (43-54)
backend/app/crud/collection/collection_job.py (2)
  • update (58-76)
  • read_one (24-45)
backend/app/crud/collection/collection.py (4)
  • read_one (59-83)
  • CollectionCrud (18-145)
  • delete (103-111)
  • delete_by_id (96-100)
backend/app/tests/api/routes/collections/test_collection_list.py (3)
backend/app/tests/utils/utils.py (1)
  • get_project (54-73)
backend/app/tests/utils/collection.py (2)
  • get_collection (25-47)
  • get_vector_store_collection (50-71)
backend/app/tests/conftest.py (2)
  • user_api_key_header (90-92)
  • db (28-45)
🪛 Ruff (0.14.3)
backend/app/services/collections/create_collection.py

172-172: Unused function argument: task_instance

(ARG001)

backend/app/services/collections/delete_collection.py

153-153: Unused function argument: task_instance

(ARG001)

backend/app/tests/api/routes/collections/test_collection_list.py

109-109: Unused function argument: db

(ARG001)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: checks (3.11.7, 6)
🔇 Additional comments (10)
backend/app/services/collections/delete_collection.py (2)

165-238: Bootstrap error handling is adequate.

The outer try/except block (lines 165-238) properly catches bootstrap failures and delegates to _mark_job_failed_and_callback, which handles job state updates and callback notifications. This addresses the previous concern about jobs getting stuck in PROCESSING state.


18-18: Magic string concern addressed.

The OPENAI_VECTOR_STORE constant is properly imported from the helpers module, addressing the previous concern about magic strings. This ensures consistency across create and delete operations.

Based on learnings

backend/app/tests/services/collections/test_create_collection.py (2)

50-113: Test structure properly updated.

The test correctly uses the new get_collection_job helper and validates the updated start_job signature with with_assistant parameter and JSON serialization via request.model_dump(mode="json").


195-257: Good coverage of assistant creation failure scenario.

This test properly validates that when assistant creation fails, the job is marked FAILED with the error message, and the vector store is cleaned up. The mock assertions confirm the cleanup path is invoked.

backend/app/services/collections/create_collection.py (1)

219-246: Clear conditional logic for assistant creation.

The with_assistant flag properly controls whether to create an assistant-backed collection or a standalone vector store. The logging messages clearly document each path.

backend/app/api/routes/collections.py (5)

42-55: Callback endpoint properly documented.

The callback router and notification endpoint provide clear OpenAPI documentation for consumers, showing the expected callback payload structure with APIResponse[CollectionJobPublic].


94-97: Clean derivation of with_assistant flag.

The logic correctly determines with_assistant by checking both model and instructions are present, aligning with the validation in AssistantOptions.


108-115: Helpful metadata note for vector-store-only creation.

The metadata note clearly communicates to users when an assistant is not being created, which improves the API developer experience.


132-140: DELETE endpoint design follows REST conventions.

The endpoint properly uses DELETE method with the collection ID in the path and an optional callback configuration in the body. The upfront validation via read_one ensures the collection exists before queuing the deletion job.


174-179: Query parameters provide flexible document retrieval.

The include_docs, skip, and limit parameters allow callers to control pagination and whether to fetch associated documents, which is good API design for performance.

Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 6c54f71 and 3b2e2a6.

📒 Files selected for processing (6)
  • backend/app/api/docs/collections/create.md (2 hunks)
  • backend/app/api/docs/collections/delete.md (1 hunks)
  • backend/app/api/docs/collections/info.md (1 hunks)
  • backend/app/api/docs/collections/job_info.md (1 hunks)
  • backend/app/api/routes/collections.py (3 hunks)
  • backend/app/services/collections/delete_collection.py (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (3)
  • backend/app/api/docs/collections/delete.md
  • backend/app/api/docs/collections/info.md
  • backend/app/api/docs/collections/create.md
🧰 Additional context used
📓 Path-based instructions (3)
**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Use type hints in Python code (Python 3.11+ project)

Files:

  • backend/app/services/collections/delete_collection.py
  • backend/app/api/routes/collections.py
backend/app/services/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Implement business logic services under backend/app/services/

Files:

  • backend/app/services/collections/delete_collection.py
backend/app/api/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Expose FastAPI REST endpoints under backend/app/api/ organized by domain

Files:

  • backend/app/api/routes/collections.py
🧬 Code graph analysis (2)
backend/app/services/collections/delete_collection.py (7)
backend/app/crud/rag/open_ai.py (2)
  • OpenAIAssistantCrud (192-246)
  • OpenAIVectorStoreCrud (101-189)
backend/app/models/collection_job.py (5)
  • CollectionJobStatus (12-16)
  • CollectionJobUpdate (83-88)
  • CollectionJob (24-72)
  • CollectionJobPublic (108-113)
  • job_id (63-64)
backend/app/models/collection.py (2)
  • CollectionIDPublic (139-140)
  • DeletionRequest (132-133)
backend/app/services/collections/helpers.py (1)
  • extract_error_message (21-47)
backend/app/utils.py (4)
  • get_openai_client (176-206)
  • APIResponse (30-54)
  • success_response (37-40)
  • failure_response (43-54)
backend/app/crud/collection/collection_job.py (3)
  • update (58-76)
  • CollectionJobCrud (19-96)
  • read_one (24-45)
backend/app/crud/collection/collection.py (4)
  • read_one (59-83)
  • CollectionCrud (18-145)
  • delete (103-111)
  • delete_by_id (96-100)
backend/app/api/routes/collections.py (6)
backend/app/models/collection_job.py (5)
  • CollectionJobPublic (108-113)
  • CollectionJobImmediatePublic (103-105)
  • CollectionJobCreate (76-80)
  • CollectionActionType (19-21)
  • CollectionJobStatus (12-16)
backend/app/models/collection.py (5)
  • CollectionWithDocsPublic (155-156)
  • CreationRequest (121-129)
  • CallbackRequest (114-118)
  • DeletionRequest (132-133)
  • CollectionPublic (143-152)
backend/app/utils.py (4)
  • APIResponse (30-54)
  • load_description (279-284)
  • success_response (37-40)
  • _ (288-289)
backend/app/crud/collection/collection.py (7)
  • CollectionCrud (18-145)
  • read_all (85-94)
  • delete (103-111)
  • _ (114-121)
  • _ (124-145)
  • read_one (59-83)
  • create (43-57)
backend/app/crud/document_collection.py (3)
  • create (12-23)
  • DocumentCollectionCrud (8-48)
  • read (25-48)
backend/app/models/document.py (1)
  • DocumentPublic (43-60)
🪛 markdownlint-cli2 (0.18.1)
backend/app/api/docs/collections/job_info.md

6-6: Unordered list indentation
Expected: 2; Actual: 4

(MD007, ul-indent)


7-7: Unordered list indentation
Expected: 2; Actual: 4

(MD007, ul-indent)

🪛 Ruff (0.14.3)
backend/app/services/collections/delete_collection.py

153-153: Unused function argument: task_instance

(ARG001)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: checks (3.11.7, 6)

Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 3b2e2a6 and ead41a0.

📒 Files selected for processing (1)
  • backend/app/tests/api/routes/collections/test_collection_info.py (1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Use type hints in Python code (Python 3.11+ project)

Files:

  • backend/app/tests/api/routes/collections/test_collection_info.py
🧬 Code graph analysis (1)
backend/app/tests/api/routes/collections/test_collection_info.py (5)
backend/app/tests/utils/utils.py (2)
  • get_project (54-73)
  • get_document (102-123)
backend/app/tests/utils/collection.py (2)
  • get_collection (25-47)
  • get_vector_store_collection (50-71)
backend/app/crud/document_collection.py (1)
  • DocumentCollectionCrud (8-48)
backend/app/tests/conftest.py (3)
  • db (28-45)
  • client (65-68)
  • user_api_key_header (90-92)
backend/app/crud/collection/collection.py (3)
  • create (43-57)
  • _ (114-121)
  • _ (124-145)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: checks (3.11.7, 6)

Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (1)
backend/app/services/collections/delete_collection.py (1)

99-144: Critical: Handle bootstrap failures when collection_job is None.

If get_openai_client or any bootstrap step before line 170 throws, execute_job calls this function with collection_job=None. Line 118 will raise AttributeError on collection_job.id, the exception handler at line 125 swallows it, and the job stays stuck in PENDING/PROCESSING with no callback. This is the exact issue flagged in past reviews.

Apply this diff to use the job_id parameter as fallback:

 def _mark_job_failed_and_callback(
     *,
     project_id: int,
-    collection_job: CollectionJob,
+    collection_job: CollectionJob | None,
     collection_id: UUID,
     job_id: UUID,
     err: Exception,
     callback_url: str | None,
 ) -> None:
     """
     Common failure handler:
     - mark job as FAILED with error_message
     - log error
     - send failure callback (if configured)
     """
     try:
         with Session(engine) as session:
             collection_job_crud = CollectionJobCrud(session, project_id)
+            # Use job_id when collection_job is None (bootstrap failures)
+            target_id = collection_job.id if collection_job else job_id
             collection_job_crud.update(
-                collection_job.id,
+                target_id,
                 CollectionJobUpdate(
                     status=CollectionJobStatus.FAILED,
                     error_message=str(err),
                 ),
             )
-            collection_job = collection_job_crud.read_one(collection_job.id)
+            collection_job = collection_job_crud.read_one(target_id)
     except Exception:
         logger.warning("[delete_collection.execute_job] Failed to mark job as FAILED")

This ensures the job is marked FAILED and the callback fires even when bootstrap steps fail before collection_job is loaded.

🧹 Nitpick comments (4)
backend/app/api/docs/collections/job_info.md (1)

5-8: Simplify the conditional phrasing on line 5 for clarity.

The phrasing "If the job has finished, has been successful and it was a job of creation of collection" is verbose and difficult to parse. Consider rephrasing to be more concise and direct, similar to the pattern used in line 9.

Apply this diff:

-* If the job has finished, has been successful and it was a job of creation of collection then this endpoint will fetch the associated collection details from the collection table, including:
+* If the collection-creation job succeeds, the response includes the associated collection details from the collection table:
backend/app/services/collections/delete_collection.py (3)

146-154: Add return type annotation.

The function is missing a return type annotation. As per coding guidelines, use type hints in Python code.

Apply this diff:

 def execute_job(
     request: dict,
     project_id: int,
     organization_id: int,
     task_id: str,
     job_id: str,
     collection_id: str,
     task_instance,
-) -> None:
+) -> None:

Wait, the return type is already -> None on line 154 in the provided code. Let me re-check... Actually, looking at the annotated code, line 154 shows:

154
) -> None:

But line 154 doesn't have a ~ marker, so this was already present in the old code. Let me look again at the signature... Lines 146-154 show the full signature with ) -> None: at line 154. So the return type IS present. I was mistaken.

Actually, I see the issue - let me re-read the annotated code format. The code shows line 154 ending the function signature, but I don't see ) -> None: explicitly. Let me check the actual code block again...

Looking at lines 146-154:

146
def execute_job(
147
    request: dict,
148
    project_id: int,
149
    organization_id: int,
150
    task_id: str,
151
    job_id: str,
152
    collection_id: str,
153
    task_instance,
154
) -> None:

Oh I see, line 154 DOES show ) -> None:. So the return type annotation is present. I was confused. Let me revise my analysis.

So the return type is fine. The only issue is the unused task_instance parameter, which is very minor.

[scratchpad_end] -->


153-153: Optional: Document or prefix unused task_instance parameter.

Static analysis flags task_instance as unused. If it's required by Celery's signature, consider prefixing with underscore (_task_instance) to signal intentional non-use.

Based on coding guidelines.


187-191: Consider adding a comment explaining the intentional session closure.

Lines 187-190 are intentionally outside the first with Session block to avoid holding database connections during external API calls (as noted in PR objectives). While this is correct, the indentation might confuse readers unfamiliar with the pattern.

Consider adding a brief comment:

+        # Close DB session before external API calls to avoid holding connection locks
         if is_vector:
             OpenAIVectorStoreCrud(client).delete(llm_service_id)
         else:
             OpenAIAssistantCrud(client).delete(llm_service_id)
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between ead41a0 and 165edf6.

📒 Files selected for processing (2)
  • backend/app/api/docs/collections/job_info.md (1 hunks)
  • backend/app/services/collections/delete_collection.py (2 hunks)
🧰 Additional context used
📓 Path-based instructions (2)
**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Use type hints in Python code (Python 3.11+ project)

Files:

  • backend/app/services/collections/delete_collection.py
backend/app/services/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Implement business logic services under backend/app/services/

Files:

  • backend/app/services/collections/delete_collection.py
🧬 Code graph analysis (1)
backend/app/services/collections/delete_collection.py (8)
backend/app/crud/rag/open_ai.py (2)
  • OpenAIAssistantCrud (192-246)
  • OpenAIVectorStoreCrud (101-189)
backend/app/models/collection_job.py (5)
  • CollectionJobStatus (12-16)
  • CollectionJobUpdate (83-88)
  • CollectionJob (24-72)
  • CollectionJobPublic (108-113)
  • job_id (63-64)
backend/app/models/collection.py (2)
  • CollectionIDPublic (139-140)
  • DeletionRequest (132-133)
backend/app/services/collections/helpers.py (1)
  • extract_error_message (21-47)
backend/app/utils.py (4)
  • get_openai_client (176-206)
  • APIResponse (30-54)
  • success_response (37-40)
  • failure_response (43-54)
backend/app/services/collections/create_collection.py (3)
  • build_success_payload (76-94)
  • build_failure_payload (97-117)
  • execute_job (165-319)
backend/app/crud/collection/collection_job.py (3)
  • update (58-76)
  • CollectionJobCrud (19-96)
  • read_one (24-45)
backend/app/crud/collection/collection.py (4)
  • read_one (59-83)
  • CollectionCrud (18-145)
  • delete (103-111)
  • delete_by_id (96-100)
🪛 markdownlint-cli2 (0.18.1)
backend/app/api/docs/collections/job_info.md

6-6: Unordered list indentation
Expected: 2; Actual: 4

(MD007, ul-indent)


7-7: Unordered list indentation
Expected: 2; Actual: 4

(MD007, ul-indent)

🪛 Ruff (0.14.3)
backend/app/services/collections/delete_collection.py

153-153: Unused function argument: task_instance

(ARG001)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: checks (3.11.7, 6)
🔇 Additional comments (5)
backend/app/services/collections/delete_collection.py (5)

1-21: LGTM: Imports align with vector store vs assistant deletion paths.

The imports properly support the new dual-path deletion logic and use OPENAI_VECTOR_STORE constant from helpers to avoid magic strings.


26-54: LGTM: Request-driven job scheduling.

The refactor to pass request.model_dump(mode="json") and derive collection context from request.collection_id aligns with the PR objectives and maintains proper separation of concerns.


57-96: LGTM: Payload builders are clean and consistent.

The success and failure payload functions properly construct standardized APIResponse objects with CollectionJobPublic. Using extract_error_message for error normalization is a good practice.


182-184: LGTM: Clean service type detection.

The refactored logic directly uses collection.llm_service_id without redundant attribute fallbacks, addressing past review feedback.


192-227: Success and callback paths are correct.

The local deletion, job status update, and conditional callback sending are properly implemented. The exception handler correctly delegates to the centralized failure handler (though that handler has the critical issue flagged separately).

Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (1)
backend/app/services/collections/delete_collection.py (1)

99-144: Critical: _mark_job_failed_and_callback still fails when collection_job is None.

The past review comment remains unaddressed. When bootstrap fails (e.g., get_openai_client throws on line 177 or CollectionJobCrud.read_one fails on line 168), collection_job is None when this handler is called from line 220. Line 118's collection_job.id raises AttributeError, the job stays PROCESSING, and clients poll forever.

Apply this diff to handle None collection_job:

 def _mark_job_failed_and_callback(
     *,
     project_id: int,
-    collection_job: CollectionJob,
+    collection_job: CollectionJob | None,
     collection_id: UUID,
     job_id: UUID,
     err: Exception,
     callback_url: str | None,
 ) -> None:
     """
     Common failure handler:
     - mark job as FAILED with error_message
     - log error
     - send failure callback (if configured)
     """
     try:
         with Session(engine) as session:
             collection_job_crud = CollectionJobCrud(session, project_id)
+            job_pk = collection_job.id if collection_job else job_id
             collection_job_crud.update(
-                collection_job.id,
+                job_pk,
                 CollectionJobUpdate(
                     status=CollectionJobStatus.FAILED,
                     error_message=str(err),
                 ),
             )
-            collection_job = collection_job_crud.read_one(collection_job.id)
+            collection_job = collection_job_crud.read_one(job_pk)
     except Exception:
         logger.warning("[delete_collection.execute_job] Failed to mark job as FAILED")
🧹 Nitpick comments (1)
backend/app/services/collections/delete_collection.py (1)

153-153: Prefix unused parameter with underscore.

The task_instance parameter is never used. If it's required by the Celery signature, prefix it with an underscore to signal intentional non-use.

-    task_instance,
+    _task_instance,
 ) -> None:
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 165edf6 and 395916d.

📒 Files selected for processing (1)
  • backend/app/services/collections/delete_collection.py (2 hunks)
🧰 Additional context used
📓 Path-based instructions (2)
**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Use type hints in Python code (Python 3.11+ project)

Files:

  • backend/app/services/collections/delete_collection.py
backend/app/services/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Implement business logic services under backend/app/services/

Files:

  • backend/app/services/collections/delete_collection.py
🧬 Code graph analysis (1)
backend/app/services/collections/delete_collection.py (8)
backend/app/crud/rag/open_ai.py (2)
  • OpenAIAssistantCrud (192-246)
  • OpenAIVectorStoreCrud (101-189)
backend/app/models/collection_job.py (5)
  • CollectionJobStatus (12-16)
  • CollectionJobUpdate (83-88)
  • CollectionJob (24-72)
  • CollectionJobPublic (108-113)
  • job_id (63-64)
backend/app/models/collection.py (2)
  • CollectionIDPublic (139-140)
  • DeletionRequest (132-133)
backend/app/services/collections/helpers.py (1)
  • extract_error_message (21-47)
backend/app/utils.py (4)
  • get_openai_client (176-206)
  • APIResponse (30-54)
  • success_response (37-40)
  • failure_response (43-54)
backend/app/services/collections/create_collection.py (2)
  • build_success_payload (76-94)
  • build_failure_payload (97-117)
backend/app/crud/collection/collection_job.py (3)
  • update (58-76)
  • CollectionJobCrud (19-96)
  • read_one (24-45)
backend/app/crud/collection/collection.py (4)
  • read_one (59-83)
  • CollectionCrud (18-145)
  • delete (103-111)
  • delete_by_id (96-100)
🪛 Ruff (0.14.3)
backend/app/services/collections/delete_collection.py

153-153: Unused function argument: task_instance

(ARG001)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: checks (3.11.7, 6)

Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

♻️ Duplicate comments (1)
backend/app/services/collections/delete_collection.py (1)

99-143: Keep the failure handler working when the job never loaded.

If CollectionJobCrud.read_one(job_uuid) raises (e.g., the job row was deleted before the worker ran), we enter the except block with collection_job still None. _mark_job_failed_and_callback then dereferences collection_job.id, raising AttributeError, so the job never transitions to FAILED and no callback fires—the worker logs the warning and exits, leaving clients polling forever. Please fall back to job_id whenever the in-memory job object is missing so the failure path can still mark the job and dispatch the callback.

-def _mark_job_failed_and_callback(
-    *,
-    project_id: int,
-    collection_job: CollectionJob,
+def _mark_job_failed_and_callback(
+    *,
+    project_id: int,
+    collection_job: CollectionJob | None,
     collection_id: UUID,
     job_id: UUID,
     err: Exception,
     callback_url: str | None,
 ) -> None:
@@
-        with Session(engine) as session:
-            collection_job_crud = CollectionJobCrud(session, project_id)
-            collection_job_crud.update(
-                collection_job.id,
-                CollectionJobUpdate(
-                    status=CollectionJobStatus.FAILED,
-                    error_message=str(err),
-                ),
-            )
-            collection_job = collection_job_crud.read_one(collection_job.id)
+        with Session(engine) as session:
+            collection_job_crud = CollectionJobCrud(session, project_id)
+            job_pk = collection_job.id if collection_job else job_id
+            collection_job = collection_job_crud.update(
+                job_pk,
+                CollectionJobUpdate(
+                    status=CollectionJobStatus.FAILED,
+                    error_message=str(err),
+                ),
+            )
🧹 Nitpick comments (1)
backend/app/services/collections/delete_collection.py (1)

146-218: Silence the unused Celery task argument.

Ruff (ARG001) flags task_instance as unused. Drop it after the docstring so we keep the signature Celery expects while satisfying lint.

 def execute_job(
     request: dict,
     project_id: int,
     organization_id: int,
     task_id: str,
     job_id: str,
     collection_id: str,
     task_instance,
 ) -> None:
     """Celery worker entrypoint for deleting a collection (both remote and local)."""
 
+    del task_instance
+
     deletion_request = DeletionRequest(**request)
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 395916d and 87ff5ea.

📒 Files selected for processing (3)
  • backend/app/api/docs/collections/job_info.md (1 hunks)
  • backend/app/services/collections/delete_collection.py (2 hunks)
  • backend/app/tests/api/routes/collections/test_collection_info.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • backend/app/api/docs/collections/job_info.md
🧰 Additional context used
📓 Path-based instructions (2)
**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Use type hints in Python code (Python 3.11+ project)

Files:

  • backend/app/services/collections/delete_collection.py
  • backend/app/tests/api/routes/collections/test_collection_info.py
backend/app/services/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Implement business logic services under backend/app/services/

Files:

  • backend/app/services/collections/delete_collection.py
🧬 Code graph analysis (2)
backend/app/services/collections/delete_collection.py (8)
backend/app/crud/rag/open_ai.py (2)
  • OpenAIAssistantCrud (192-246)
  • OpenAIVectorStoreCrud (101-189)
backend/app/models/collection_job.py (5)
  • CollectionJobStatus (12-16)
  • CollectionJobUpdate (83-88)
  • CollectionJob (24-72)
  • CollectionJobPublic (108-113)
  • job_id (63-64)
backend/app/models/collection.py (2)
  • CollectionIDPublic (139-140)
  • DeletionRequest (132-133)
backend/app/services/collections/helpers.py (1)
  • extract_error_message (21-47)
backend/app/utils.py (4)
  • get_openai_client (176-206)
  • APIResponse (30-54)
  • success_response (37-40)
  • failure_response (43-54)
backend/app/services/collections/create_collection.py (3)
  • build_success_payload (76-94)
  • build_failure_payload (97-117)
  • execute_job (165-319)
backend/app/crud/collection/collection_job.py (3)
  • update (58-76)
  • CollectionJobCrud (19-96)
  • read_one (24-45)
backend/app/crud/collection/collection.py (4)
  • read_one (59-83)
  • CollectionCrud (18-145)
  • delete (103-111)
  • delete_by_id (96-100)
backend/app/tests/api/routes/collections/test_collection_info.py (6)
backend/app/tests/utils/utils.py (2)
  • get_project (54-73)
  • get_document (102-123)
backend/app/tests/utils/collection.py (2)
  • get_collection (25-47)
  • get_vector_store_collection (50-71)
backend/app/crud/document_collection.py (1)
  • DocumentCollectionCrud (8-48)
backend/app/models/collection.py (1)
  • Collection (14-37)
backend/app/models/document.py (1)
  • Document (20-40)
backend/app/tests/conftest.py (2)
  • db (28-45)
  • user_api_key_header (90-92)
🪛 Ruff (0.14.3)
backend/app/services/collections/delete_collection.py

153-153: Unused function argument: task_instance

(ARG001)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: checks (3.11.7, 6)

Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (1)
backend/app/services/collections/delete_collection.py (1)

152-152: Consider prefixing unused parameter with underscore.

The task_instance parameter is flagged as unused by Ruff. If it's required by the Celery framework signature but not used in the function body, consider renaming it to _task_instance to signal intentional non-use.

 def execute_job(
     request: dict,
     project_id: int,
     organization_id: int,
     task_id: str,
     job_id: str,
     collection_id: str,
-    task_instance,
+    _task_instance,
 ) -> None:
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 87ff5ea and 4007930.

📒 Files selected for processing (1)
  • backend/app/services/collections/delete_collection.py (2 hunks)
🧰 Additional context used
📓 Path-based instructions (2)
**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Use type hints in Python code (Python 3.11+ project)

Files:

  • backend/app/services/collections/delete_collection.py
backend/app/services/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Implement business logic services under backend/app/services/

Files:

  • backend/app/services/collections/delete_collection.py
🧬 Code graph analysis (1)
backend/app/services/collections/delete_collection.py (8)
backend/app/crud/rag/open_ai.py (2)
  • OpenAIAssistantCrud (192-246)
  • OpenAIVectorStoreCrud (101-189)
backend/app/models/collection_job.py (5)
  • CollectionJobStatus (12-16)
  • CollectionJobUpdate (83-88)
  • CollectionJob (24-72)
  • CollectionJobPublic (108-113)
  • job_id (63-64)
backend/app/models/collection.py (2)
  • CollectionIDPublic (139-140)
  • DeletionRequest (132-133)
backend/app/services/collections/helpers.py (1)
  • extract_error_message (21-47)
backend/app/utils.py (4)
  • get_openai_client (176-206)
  • APIResponse (30-54)
  • success_response (37-40)
  • failure_response (43-54)
backend/app/services/collections/create_collection.py (3)
  • build_success_payload (76-94)
  • build_failure_payload (97-117)
  • execute_job (165-319)
backend/app/crud/collection/collection_job.py (3)
  • update (58-76)
  • CollectionJobCrud (19-96)
  • read_one (24-45)
backend/app/crud/collection/collection.py (3)
  • read_one (59-83)
  • delete (103-111)
  • delete_by_id (96-100)
🪛 Ruff (0.14.3)
backend/app/services/collections/delete_collection.py

152-152: Unused function argument: task_instance

(ARG001)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: checks (3.11.7, 6)
🔇 Additional comments (4)
backend/app/services/collections/delete_collection.py (4)

26-54: LGTM!

The refactor to use request.collection_id and serialize the request with mode="json" aligns well with the updated job-driven architecture.


57-96: LGTM!

The payload builders follow the same pattern as create_collection.py and appropriately return only the collection ID for deletion operations. Error extraction via extract_error_message ensures consistent error formatting.


164-224: Excellent fix of the critical indentation bug!

The local deletion block (lines 191-202) is now correctly dedented and executes for both vector stores and assistants. Session management is also well-designed: the first session closes before external API calls, and a fresh session opens for local persistence.


181-183: LGTM on service type detection!

Using the OPENAI_VECTOR_STORE constant for service type comparison addresses the past review concern about magic strings and improves maintainability.

"/jobs/{job_id}",
description=load_description("collections/job_info.md"),
response_model=APIResponse[CollectionJobPublic],
response_model_exclude={"data": {"error_message"}},
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why response_model_exclude here?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

as mentioned in this comment - here

raw_error = getattr(collection_job, "error_message", None)
error_message = extract_error_message(raw_error)

return APIResponse.failure_response(
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe there is some confusion here,
This endpoint gives info of job status,
If job is failure it should not return the failure_response,
Actually your request to this particular endpoint is succesful,
data should contain whatever a job status is (refer doc_transformation endpoints)

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was my point when all three of us had a discussion about it, that we don't need to return back a failure response here for a failed job, but @kartpop and you were in the support of of returning "false" for success for a failed job, with the error message being returned to the error key of failure response body, or was that just for callback, that when a job has failed then we send success as false to the callback

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The failed job we were referring to earlier was specifically about the callback response sent when a failure occurs. That is a completely different endpoint

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we had a similar discussion for this as well, where i had a similar point that job execution in its whole has not failed, just the job failed, the reason why i updated all the response bodies here , nonethless i still agree with this point that we do not need to show failure response in this endpoint for a failed job

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated this endpoint to return a success response even for failed jobs, because a failure response should only be used when the execution of the endpoint itself fails (e.g., an exception or server error).
If the endpoint successfully retrieves the job for the given job ID, it has fulfilled its responsibility — that is not considered an endpoint failure even if the job failed.

Similarly, the top-level "error" field in the response is reserved for issues that occur during the execution of the endpoint itself. If the job itself failed, that "error_message" should be represented within the job object rather than in the top-level error field.

Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Nitpick comments (1)
backend/app/services/collections/delete_collection.py (1)

192-203: Use job_uuid instead of collection_job.id for consistency.

Lines 197 and 203 reference collection_job.id, which is an attribute from an object loaded in the first session (now closed). While accessing the ID attribute is safe, using job_uuid directly is cleaner and avoids any potential confusion about session boundaries.

Apply this diff:

         with Session(engine) as session:
             CollectionCrud(session, project_id).delete_by_id(collection_id)
 
             collection_job_crud = CollectionJobCrud(session, project_id)
             collection_job_crud.update(
-                collection_job.id,
+                job_uuid,
                 CollectionJobUpdate(
                     status=CollectionJobStatus.SUCCESSFUL,
                     error_message=None,
                 ),
             )
-            collection_job = collection_job_crud.read_one(collection_job.id)
+            collection_job = collection_job_crud.read_one(job_uuid)
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 70dfae1 and 01e2d45.

📒 Files selected for processing (3)
  • backend/app/api/routes/collection_job.py (3 hunks)
  • backend/app/services/collections/delete_collection.py (2 hunks)
  • backend/app/tests/api/routes/collections/test_collection_job_info.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • backend/app/tests/api/routes/collections/test_collection_job_info.py
🧰 Additional context used
📓 Path-based instructions (3)
**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Use type hints in Python code (Python 3.11+ project)

Files:

  • backend/app/services/collections/delete_collection.py
  • backend/app/api/routes/collection_job.py
backend/app/services/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Implement business logic services under backend/app/services/

Files:

  • backend/app/services/collections/delete_collection.py
backend/app/api/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Expose FastAPI REST endpoints under backend/app/api/ organized by domain

Files:

  • backend/app/api/routes/collection_job.py
🧬 Code graph analysis (2)
backend/app/services/collections/delete_collection.py (6)
backend/app/crud/rag/open_ai.py (2)
  • OpenAIAssistantCrud (192-246)
  • OpenAIVectorStoreCrud (101-189)
backend/app/services/collections/helpers.py (1)
  • extract_error_message (21-47)
backend/app/utils.py (4)
  • get_openai_client (176-206)
  • APIResponse (30-54)
  • success_response (37-40)
  • failure_response (43-54)
backend/app/services/collections/create_collection.py (2)
  • build_success_payload (76-94)
  • build_failure_payload (97-117)
backend/app/crud/collection/collection_job.py (3)
  • update (58-76)
  • CollectionJobCrud (19-96)
  • read_one (24-45)
backend/app/crud/collection/collection.py (4)
  • read_one (59-83)
  • CollectionCrud (18-145)
  • delete (103-111)
  • delete_by_id (96-100)
backend/app/api/routes/collection_job.py (5)
backend/app/models/collection_job.py (3)
  • CollectionJobStatus (12-16)
  • CollectionActionType (19-21)
  • CollectionJobPublic (108-113)
backend/app/models/collection.py (2)
  • CollectionIDPublic (139-140)
  • CollectionPublic (143-152)
backend/app/crud/collection/collection.py (2)
  • CollectionCrud (18-145)
  • read_one (59-83)
backend/app/crud/collection/collection_job.py (1)
  • read_one (24-45)
backend/app/services/collections/helpers.py (1)
  • extract_error_message (21-47)
🪛 Ruff (0.14.3)
backend/app/services/collections/delete_collection.py

153-153: Unused function argument: task_instance

(ARG001)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: checks (3.11.7, 6)
🔇 Additional comments (6)
backend/app/services/collections/delete_collection.py (6)

26-54: LGTM: Clean job scheduling with proper type hints.

The function correctly schedules the deletion job using the collection ID from the request and passes the serialized request to the worker. Type hints are properly used throughout, adhering to the coding guidelines.


57-96: LGTM: Consistent payload builders for success and failure cases.

The payload builders follow the same pattern as create_collection.py and appropriately use only the collection ID (rather than the full object) since the collection may be deleted. Error message extraction is properly delegated to the helper utility.


99-143: LGTM: Robust centralized failure handler.

The failure handler correctly:

  • Initializes collection_job = None to prevent NameError if DB operations fail (addresses past review concern)
  • Opens a fresh session to mark the job as FAILED
  • Gracefully degrades if the DB update fails (logs but continues)
  • Only sends callback when both the URL and the loaded job are available

146-164: Note: Unused parameter is likely required by Celery signature.

The task_instance parameter (line 153) is flagged as unused by static analysis. This is typically required by Celery's worker signature even when not used in the function body. You can silence the warning by prefixing with an underscore (_task_instance) if desired.


165-190: LGTM: Proper session management before external API calls.

The code correctly closes the DB session (line 166-186) before making external deletion calls (lines 187-190). This avoids holding database connections and transaction locks during potentially long-running external API operations, which is a best practice for async job processing.


165-225: LGTM: Comprehensive error handling for the deletion flow.

The outer try-except (lines 165-225) properly catches failures from:

  • Bootstrap phase (getting client, reading job/collection)
  • External deletion (OpenAI API calls)
  • Local deletion (database operations)

All failures are routed to the centralized _mark_job_failed_and_callback handler, which ensures the job status is updated and callbacks are triggered. This addresses the bootstrap failure concerns from previous reviews.

Copy link
Copy Markdown
Collaborator

@kartpop kartpop left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

approved with comments - please resolve

Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
backend/app/api/docs/collections/create.md (1)

14-30: Tighten wording and punctuation in async response note.

Please add the comma after "instruction" (“..., otherwise only a vector store …”), insert a space in “status. Once …”, and say “the creation of the collection” at the end. These small edits keep the guidance easy to read.

♻️ Duplicate comments (1)
backend/app/services/collections/create_collection.py (1)

187-318: Initialize creation_request before entering the try block.

If CreationRequest(**request) raises (e.g., payload drift, schema change), the except block hits if creation_request … and raises UnboundLocalError, bypassing failure handling and leaving the job stuck in PENDING/PROCESSING. Set creation_request: CreationRequest | None = None before the try, then assign inside it. Example:

-    # Keep references for potential backout/cleanup on failure
+    # Keep references for potential backout/cleanup on failure
+    creation_request: CreationRequest | None = None
     assistant = None
     ...
     try:
         creation_request = CreationRequest(**request)

This guarantees the cleanup path always runs.

🧹 Nitpick comments (2)
backend/app/api/routes/collections.py (2)

78-78: Consider removing response_model_exclude_none for API consistency.

The create_collection and delete_collection endpoints use response_model_exclude_none=True, while collection_info does not. This creates inconsistent response shapes across the API:

  • With exclude_none=True: Fields like error and metadata will be absent when None
  • Without it: All fields are always present

A past review comment noted: "In a REST API, response structures should remain consistent...if we omit them conditionally, the shape of the response changes. That leads to validation issues on the client side."

Consider removing response_model_exclude_none=True from lines 78 and 127 to maintain consistent response shapes, or apply it to all endpoints including collection_info.

Also applies to: 127-127, 168-168


135-135: Consider adding a clarifying comment for the validation check.

The underscore assignment pattern is correct for validating that the collection exists (will raise 404 if not found), but a brief comment would improve readability for future maintainers.

+    # Validate collection exists and user has access (raises 404 if not)
     _ = CollectionCrud(session, current_user.project_id).read_one(collection_id)
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 01e2d45 and d4de7bd.

📒 Files selected for processing (8)
  • backend/app/api/docs/collections/create.md (2 hunks)
  • backend/app/api/docs/collections/list.md (1 hunks)
  • backend/app/api/routes/collections.py (3 hunks)
  • backend/app/models/collection_job.py (4 hunks)
  • backend/app/services/collections/create_collection.py (3 hunks)
  • backend/app/services/collections/delete_collection.py (2 hunks)
  • backend/app/tests/api/routes/collections/test_collection_info.py (1 hunks)
  • backend/app/tests/api/routes/collections/test_collection_job_info.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • backend/app/tests/api/routes/collections/test_collection_job_info.py
🧰 Additional context used
📓 Path-based instructions (4)
**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Use type hints in Python code (Python 3.11+ project)

Files:

  • backend/app/services/collections/create_collection.py
  • backend/app/api/routes/collections.py
  • backend/app/tests/api/routes/collections/test_collection_info.py
  • backend/app/models/collection_job.py
  • backend/app/services/collections/delete_collection.py
backend/app/services/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Implement business logic services under backend/app/services/

Files:

  • backend/app/services/collections/create_collection.py
  • backend/app/services/collections/delete_collection.py
backend/app/api/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Expose FastAPI REST endpoints under backend/app/api/ organized by domain

Files:

  • backend/app/api/routes/collections.py
backend/app/models/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Define SQLModel entities (database tables and domain objects) in backend/app/models/

Files:

  • backend/app/models/collection_job.py
🧠 Learnings (1)
📚 Learning: 2025-10-08T12:05:01.317Z
Learnt from: CR
Repo: ProjectTech4DevAI/ai-platform PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-10-08T12:05:01.317Z
Learning: Applies to backend/app/services/response/**/*.py : Put OpenAI Responses API integration, conversation management, and job execution in backend/app/services/response/

Applied to files:

  • backend/app/models/collection_job.py
🧬 Code graph analysis (5)
backend/app/services/collections/create_collection.py (10)
backend/app/models/collection.py (5)
  • CollectionPublic (143-152)
  • CreationRequest (121-129)
  • AssistantOptions (58-111)
  • Collection (14-37)
  • extract_super_type (126-129)
backend/app/models/collection_job.py (5)
  • CollectionJobPublic (102-105)
  • CollectionJob (24-72)
  • job_id (63-64)
  • CollectionJobUpdate (83-88)
  • CollectionJobStatus (12-16)
backend/app/services/collections/helpers.py (3)
  • _backout (72-80)
  • batch_documents (50-69)
  • extract_error_message (21-47)
backend/app/utils.py (4)
  • get_openai_client (176-206)
  • APIResponse (30-54)
  • success_response (37-40)
  • failure_response (43-54)
backend/app/services/collections/delete_collection.py (3)
  • start_job (26-54)
  • build_success_payload (57-72)
  • build_failure_payload (75-91)
backend/app/crud/collection/collection_job.py (4)
  • update (58-76)
  • CollectionJobCrud (19-96)
  • read_one (24-45)
  • create (78-96)
backend/app/core/cloud/storage.py (1)
  • get_cloud_storage (262-279)
backend/app/crud/document.py (1)
  • DocumentCrud (14-135)
backend/app/crud/rag/open_ai.py (2)
  • OpenAIVectorStoreCrud (101-189)
  • OpenAIAssistantCrud (192-246)
backend/app/crud/document_collection.py (1)
  • DocumentCollectionCrud (8-48)
backend/app/api/routes/collections.py (9)
backend/app/models/collection_job.py (5)
  • CollectionJobPublic (102-105)
  • CollectionJobImmediatePublic (97-99)
  • CollectionJobCreate (76-80)
  • CollectionActionType (19-21)
  • CollectionJobStatus (12-16)
backend/app/models/collection.py (5)
  • CollectionWithDocsPublic (155-156)
  • CreationRequest (121-129)
  • CallbackRequest (114-118)
  • DeletionRequest (132-133)
  • CollectionPublic (143-152)
backend/app/utils.py (4)
  • APIResponse (30-54)
  • load_description (288-293)
  • success_response (37-40)
  • _ (297-298)
backend/app/crud/collection/collection.py (7)
  • CollectionCrud (18-145)
  • read_all (85-94)
  • delete (103-111)
  • _ (114-121)
  • _ (124-145)
  • read_one (59-83)
  • create (43-57)
backend/app/crud/collection/collection_job.py (4)
  • read_all (47-56)
  • read_one (24-45)
  • CollectionJobCrud (19-96)
  • create (78-96)
backend/app/services/collections/create_collection.py (1)
  • start_job (43-73)
backend/app/services/collections/delete_collection.py (1)
  • start_job (26-54)
backend/app/crud/document_collection.py (3)
  • create (12-23)
  • DocumentCollectionCrud (8-48)
  • read (25-48)
backend/app/models/document.py (1)
  • DocumentPublic (43-60)
backend/app/tests/api/routes/collections/test_collection_info.py (4)
backend/app/tests/utils/utils.py (2)
  • get_project (54-73)
  • get_document (102-123)
backend/app/tests/utils/collection.py (2)
  • get_collection (25-47)
  • get_vector_store_collection (50-71)
backend/app/crud/document_collection.py (1)
  • DocumentCollectionCrud (8-48)
backend/app/tests/conftest.py (2)
  • db (28-45)
  • user_api_key_header (90-92)
backend/app/models/collection_job.py (2)
backend/app/core/util.py (1)
  • now (13-14)
backend/app/models/collection.py (2)
  • CollectionPublic (143-152)
  • CollectionIDPublic (139-140)
backend/app/services/collections/delete_collection.py (9)
backend/app/crud/rag/open_ai.py (2)
  • OpenAIAssistantCrud (192-246)
  • OpenAIVectorStoreCrud (101-189)
backend/app/models/collection_job.py (5)
  • CollectionJobStatus (12-16)
  • CollectionJobUpdate (83-88)
  • CollectionJob (24-72)
  • CollectionJobPublic (102-105)
  • job_id (63-64)
backend/app/models/collection.py (2)
  • CollectionIDPublic (139-140)
  • DeletionRequest (132-133)
backend/app/services/collections/helpers.py (1)
  • extract_error_message (21-47)
backend/app/celery/utils.py (1)
  • start_low_priority_job (46-71)
backend/app/utils.py (4)
  • get_openai_client (176-206)
  • APIResponse (30-54)
  • success_response (37-40)
  • failure_response (43-54)
backend/app/services/collections/create_collection.py (3)
  • build_success_payload (76-94)
  • build_failure_payload (97-116)
  • execute_job (164-318)
backend/app/crud/collection/collection_job.py (3)
  • update (58-76)
  • CollectionJobCrud (19-96)
  • read_one (24-45)
backend/app/crud/collection/collection.py (4)
  • read_one (59-83)
  • CollectionCrud (18-145)
  • delete (103-111)
  • delete_by_id (96-100)
🪛 LanguageTool
backend/app/api/docs/collections/create.md

[uncategorized] ~14-~14: Possible missing comma found.
Context: ...model" and "instruction" in the request body otherwise only a vector store will be ...

(AI_HYDRA_LEO_MISSING_COMMA)


[uncategorized] ~25-~25: Possible missing comma found.
Context: ... from this endpoint is collection_job object which is going to contain the collectio...

(AI_HYDRA_LEO_MISSING_COMMA)


[uncategorized] ~30-~30: Possible missing article found.
Context: ...e the information about the creation of collection.

(AI_HYDRA_LEO_MISSING_THE)

backend/app/api/docs/collections/list.md

[uncategorized] ~4-~4: Possible missing comma found.
Context: ... but not deleted If a vector store was created - llm_service_name and `llm_service_i...

(AI_HYDRA_LEO_MISSING_COMMA)


[uncategorized] ~4-~4: The abbreviation “e.g.” (= for example) requires two periods.
Context: ...se denote the name of the vector store (eg. 'openai vector store') and its id. [To...

(E_G)


[uncategorized] ~6-~6: The abbreviation “e.g.” (= for example) requires two periods.
Context: ...me of the model used int the assistant (eg. 'gpt-4o') and assistant id.

(E_G)

🪛 Ruff (0.14.3)
backend/app/services/collections/create_collection.py

171-171: Unused function argument: task_instance

(ARG001)

backend/app/services/collections/delete_collection.py

148-148: Unused function argument: task_instance

(ARG001)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: checks (3.11.7, 6)
🔇 Additional comments (5)
backend/app/api/routes/collections.py (5)

5-5: LGTM: Import additions are appropriate.

The new imports support the callback routing, job response models, and document inclusion features introduced in this PR.

Also applies to: 19-21, 25-25


37-56: LGTM: Callback specification is well-documented.

The callback router and notification specification clearly document the expected callback behavior for collection operations, addressing previous feedback about callback documentation.


58-70: LGTM: List endpoint is correctly implemented.

The endpoint properly retrieves all non-deleted collections for the current project and wraps them in the standard API response format.


133-133: LGTM: Optional request body properly annotated.

The explicit Body(default=None) annotation correctly addresses the previous review feedback, making it clear that the request body is optional for DELETE operations.


174-191: LGTM: Document pagination properly implemented.

The endpoint correctly:

  • Validates pagination parameters with appropriate constraints (skip ≥ 0, 0 < limit ≤ 100)
  • Conditionally fetches documents based on include_docs flag
  • Passes pagination parameters to DocumentCollectionCrud.read() which handles them safely (validates for negative values per the relevant code snippets)
  • Sets documents to None when include_docs=False, avoiding user confusion about empty vs. not-requested data

Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (3)
backend/app/api/routes/collections.py (3)

93-96: Consider simplifying attribute access.

Since CreationRequest inherits from AssistantOptions (which defines model and instructions), and validation already happens at the model level, direct attribute access is clearer than getattr.

-    # True iff both model and instructions were provided in the request body
-    with_assistant = bool(
-        getattr(request, "model", None) and getattr(request, "instructions", None)
-    )
+    # True iff both model and instructions were provided in the request body
+    with_assistant = bool(request.model and request.instructions)

133-133: Consider adding a clarifying comment.

The unused assignment _ indicates intentional validation that the collection exists. A brief comment would make this intent explicit.

+    # Validate collection exists (raises 404 if not found)
     _ = CollectionCrud(session, current_user.project_id).read_one(collection_id)

172-177: Consider documenting pagination parameter relevance.

The skip and limit parameters are accepted even when include_docs=False, where they have no effect. While not a bug, this could be clarified in the endpoint description or the parameter descriptions.

     skip: int = Query(0, ge=0),
-    limit: int = Query(100, gt=0, le=100),
+    limit: int = Query(100, gt=0, le=100, description="Applies only when include_docs=True"),
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between d4de7bd and 37872f0.

📒 Files selected for processing (2)
  • backend/app/api/docs/collections/list.md (1 hunks)
  • backend/app/api/routes/collections.py (3 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • backend/app/api/docs/collections/list.md
🧰 Additional context used
📓 Path-based instructions (2)
**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Use type hints in Python code (Python 3.11+ project)

Files:

  • backend/app/api/routes/collections.py
backend/app/api/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Expose FastAPI REST endpoints under backend/app/api/ organized by domain

Files:

  • backend/app/api/routes/collections.py
🧬 Code graph analysis (1)
backend/app/api/routes/collections.py (9)
backend/app/models/collection_job.py (3)
  • CollectionJobPublic (102-105)
  • CollectionJobImmediatePublic (97-99)
  • CollectionActionType (19-21)
backend/app/models/collection.py (5)
  • CollectionWithDocsPublic (155-156)
  • CreationRequest (121-129)
  • CallbackRequest (114-118)
  • DeletionRequest (132-133)
  • CollectionPublic (143-152)
backend/app/utils.py (3)
  • APIResponse (30-54)
  • success_response (37-40)
  • _ (297-298)
backend/app/crud/collection/collection.py (7)
  • CollectionCrud (18-145)
  • read_all (85-94)
  • delete (103-111)
  • _ (114-121)
  • _ (124-145)
  • read_one (59-83)
  • create (43-57)
backend/app/crud/collection/collection_job.py (4)
  • read_all (47-56)
  • read_one (24-45)
  • CollectionJobCrud (19-96)
  • create (78-96)
backend/app/services/collections/delete_collection.py (1)
  • start_job (26-54)
backend/app/services/collections/create_collection.py (1)
  • start_job (43-73)
backend/app/crud/document_collection.py (3)
  • create (12-23)
  • DocumentCollectionCrud (8-48)
  • read (25-48)
backend/app/models/document.py (1)
  • DocumentPublic (43-60)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: checks (3.11.7, 6)
🔇 Additional comments (6)
backend/app/api/routes/collections.py (6)

5-5: LGTM! Imports are well-organized.

All new imports are used appropriately throughout the file and support the callback router functionality and new public response models.

Also applies to: 19-21, 25-25, 39-39


42-55: LGTM! Callback specification properly implemented.

The callback endpoint specification correctly defines the contract for external webhook receivers and addresses the previous review request for callback documentation.


58-70: LGTM! List endpoint is clean and straightforward.

The implementation properly uses dependency injection, respects project boundaries, and returns a consistent response structure.


107-118: LGTM! Helpful metadata and consistent response structure.

The metadata note clearly communicates to users when a vector-store-only collection will be created, and the response model properly represents the immediate job state.


135-160: LGTM! Delete endpoint properly structured.

The endpoint correctly validates collection existence, handles optional callback URLs, and follows the established job pattern. The explicit Body(default=None) annotation addresses the previous review feedback.


182-191: LGTM! Collection info endpoint properly handles conditional document loading.

The implementation correctly:

  • Maintains consistent response structure (documents field is None when include_docs=False, avoiding the response_model_exclude_none issue from past reviews)
  • Applies proper pagination with reasonable defaults
  • Uses appropriate validation for both collection and documents

@nishika26 nishika26 merged commit aa704b9 into main Nov 7, 2025
3 checks passed
@nishika26 nishika26 deleted the enhancement/collection_vector_store branch November 7, 2025 15:51
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement New feature or request ready-for-review

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Collections : enhancement of the module Collections : Test cases are not there for this router

3 participants