Skip to content

Document : moving transform job to celery and adding callback url#437

Merged
nishika26 merged 17 commits intomainfrom
enhancement/doc_transform_to_celery
Nov 23, 2025
Merged

Document : moving transform job to celery and adding callback url#437
nishika26 merged 17 commits intomainfrom
enhancement/doc_transform_to_celery

Conversation

@nishika26
Copy link
Copy Markdown
Collaborator

@nishika26 nishika26 commented Nov 17, 2025

Summary

Target issue is #429 and #418

Checklist

Before submitting a pull request, please ensure that you mark these task.

  • Ran fastapi run --reload app/main.py or docker compose up in the repository root and test.
  • If you've fixed a bug or added code that is tested and has test cases.

Notes

Database Schema Migration

  • Migration: Added nullable task_id and trace_id columns to doc_transformation_job table for celery

  • Field rename: created_at → inserted_at for consistency with other tables, with full reversible downgrade path.

API & Route Updates

  • Endpoints: Restructured paths (e.g., / {job_id} → /transformation/{job_id}) across document routes to include transformation endpoints under document router only and not create a new router for it.

  • Response models: Updated to use new public types for better readability for users (DocTransformationJobPublic, TransformedDocumentPublic).

  • Enhancements: Added include_url query parameter; integrated DocumentCrud and cloud storage helpers for signed URL generation.

  • Imports: Adjusted imports across routes (fine_tuning.py, documents.py) to reflect new CRUD module paths since we moved document crud file to crud/document folder , which contains both document crud file and document transformation crud file

CRUD Layer Refactor

  • Structure: Moved CRUD modules into nested document/ package paths.

  • API changes:

    • create(source_document_id) → create(DocTransformJobCreate)

    • update_status() → update(DocTransformJobUpdate) (patch-based field updates)

    • Cleanup: Removed unused imports and legacy references.

Models & Public DTOs

New schemas: Added DocTransformJobCreate, DocTransformJobUpdate, DocTransformationJobPublic, DocTransformationJobsPublic, and TransformedDocumentPublic.

  • Model updates:

    • Added task_id, trace_id, inserted_at to DocTransformationJob.

    • Added computed properties (job_id, job_inserted_at, job_updated_at).

    • Updated TransformationJobInfo (added status, removed source_format / target_format).

Service Layer Reorganization

Refactor: Removed app.core.doctransform.service and moved doc transformation's core logic to app.services.doctransform.job implements:

Summary by CodeRabbit

  • New Features

    • Callback URL support for transformation jobs to notify external systems on completion/failure.
    • Optional include_url query to attach signed URLs to document and transformed-document responses.
    • Jobs now include task and trace identifiers for improved tracing and observability.
  • API Changes

    • Transformation endpoints consolidated under /documents/transformation with richer job and transformed-document payloads.
    • Document routes simplified to base paths; uploads can schedule transformations and return job info.

✏️ Tip: You can customize this high-level summary in your review settings.

@coderabbitai
Copy link
Copy Markdown

coderabbitai bot commented Nov 17, 2025

Walkthrough

Moves doctransform code from app.core.doctransform into app.services.doctransform, adds nullable task_id/trace_id and renames created_at→inserted_at via a migration, converts CRUD to DTO-based create/update, introduces document helper orchestration and public response models, updates routes and tests, and adds a new job execution service.

Changes

Cohort / File(s) Summary
DB Migration
\backend/app/alembic/versions/eed36ae3c79a_alter_doc_transform_table_for_celery.py``
New Alembic migration: add nullable task_id and trace_id columns to doc_transformation_job, rename created_atinserted_at, provide downgrade.
Service relocation & new job service
\backend/app/core/doctransform/service.py` (removed), `backend/app/services/doctransform/job.py`, `backend/app/services/doctransform/registry.py`, `backend/app/services/doctransform/zerox_transformer.py``
Removed old core service; added job.py implementing start_job, execute_job (with retry, task_id/trace_id, callback payload builders); moved registry/transformer imports to services.doctransform.
Documents helpers / orchestration
\backend/app/services/documents/helpers.py``
New helpers: pre_transform_validation, schedule_transformation, _to_public_schema, build_document_schema(s), build_job_schema(s) to validate, schedule jobs, and build public schemas with optional signed URLs.
CRUD refactor
\backend/app/crud/document/doc_transformation_job.py`, `backend/app/crud/init.py`, `backend/app/crud/document/document.py``
CRUD create/update now use DTOs DocTransformJobCreate/DocTransformJobUpdate; update_status → patch-style update; __init__ re-exports nested modules; minor typing cleanup.
Models & public types
\backend/app/models/doc_transformation_job.py`, `backend/app/models/document.py`, `backend/app/models/init.py``
Added DTOs (DocTransformJobCreate, DocTransformJobUpdate), public models DocTransformationJobPublic/DocTransformationJobsPublic/TransformedDocumentPublic; added task_id, trace_id, inserted_at, and convenience properties; adjusted document/public fields.
API routes
\backend/app/api/routes/doc_transformation_job.py`, `backend/app/api/routes/documents.py`, `backend/app/api/routes/fine_tuning.py``
Routes updated to new public response models and paths (e.g., /documents/transformation/...), added include_url query param, integrated DocumentCrud and get_cloud_storage, and updated imports and router prefixes.
Celery / task docs
\backend/app/celery/tasks/job_execution.py`, `backend/app/celery/utils.py``
Docstring/example import paths updated to app.services.doctransform.*; no behavioral changes.
Supporting imports & main
\backend/app/services/collections/helpers.py`, `backend/app/api/main.py`, `backend/app/crud/init.py``
Import-path adjustments and minor reorderings to reference app.services and nested app.crud.document modules.
Tests — routes & fixtures
\backend/app/tests/api/routes/documents/test_route_*.py`, `backend/app/tests/api/routes/test_doc_transformation_job.py``
Tests updated: route fixtures target base paths, responses adapted to Public models, include_url/signed URL expectations added, CRUD usage switched to DTOs; endpoints path names updated.
Tests — doctransform job
\backend/app/tests/services/doctransformer/test_job/*.py`, `backend/app/tests/services/doctransformer/test_job/conftest.py`, `backend/app/tests/services/doctransformer/test_job/utils.py``
Tests patched to new module paths (app.services.doctransform.*), use DocTransformJobCreate, pass stringified IDs and task_id/callback_url/task_instance to execute_job; new tests for start_job added; some old tests removed.
Tests — CRUD & other updates
\backend/app/tests/crud/documents/test_doc_transformation_job.py`, various `backend/app/tests/*``
CRUD tests updated for DTO signatures and created_atinserted_at rename; broad path and expectation updates to align with code changes.

Sequence Diagram(s)

sequenceDiagram
    participant Client
    participant API as Documents API
    participant JobCRUD as DocTransformationJob CRUD
    participant Service as services.doctransform.job
    participant Queue as Celery Queue
    participant Worker as Celery Worker
    participant DocCRUD as Document CRUD
    participant Storage as Cloud Storage
    participant Callback as Callback Endpoint

    Client->>API: upload/create (callback_url optional)
    API->>JobCRUD: create(DocTransformJobCreate)
    JobCRUD-->>API: Job (PENDING)
    API->>Service: start_job(job_id, transformer, target_format, callback_url)
    Service->>Queue: enqueue execute_job(job_id, source_document_id, task_id, callback_url)
    Queue-->>API: task_id
    API-->>Client: response (transformation_job_id)

    Worker->>Service: execute_job(job_id, source_document_id, task_id, callback_url)
    Service->>JobCRUD: update(job_id, patch: status=PROCESSING, task_id=...)
    Service->>Storage: download source
    Service->>Service: convert_document(...)
    Service->>Storage: upload transformed
    Service->>DocCRUD: create TransformedDocument
    Service->>JobCRUD: update(job_id, patch: status=COMPLETED, transformed_document_id=...)
    alt success & signed URL available
        Service->>Storage: get_signed_url
        Service->>Callback: POST success_payload (transformed_document + signed_url)
    else failure
        Service->>JobCRUD: update(job_id, patch: status=FAILED, error_message=...)
        Service->>Callback: POST failure_payload
    end
    Service->>Service: cleanup temp files
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

  • Focus review on:
    • backend/app/services/doctransform/job.py — retry behavior, DB session/transaction scopes, string↔UUID handling, callback error handling, temp-file cleanup, observability/correlation-id usage.
    • Alembic migration vs. models/CRUD — ensure column rename and nullability align across migration, models, and queries.
    • backend/app/crud/document/doc_transformation_job.py — patch semantics (exclude_unset/exclude_none), FK integrity and error mapping.
    • Tests — ensure mocks/patches reference new module paths and DTO usage; validate removed tests are intentional.

Possibly related PRs

Suggested reviewers

  • avirajsingh7
  • AkhileshNegi

Poem

🐇
I hopped from core into services new,
I carried traces and task IDs too,
DTOs in paw and signed-links bright,
I queued the jobs and watched them light,
Transformed docs dancing in the night.

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 61.11% which is insufficient. The required threshold is 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title clearly summarizes the main change: moving document transformation jobs to Celery and adding callback URL support, which aligns with the core objectives of the PR.
✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch enhancement/doc_transform_to_celery

📜 Recent review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 9368563 and f6e32ab.

📒 Files selected for processing (1)
  • backend/app/alembic/versions/eed36ae3c79a_alter_doc_transform_table_for_celery.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • backend/app/alembic/versions/eed36ae3c79a_alter_doc_transform_table_for_celery.py
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: checks (3.11.7, 6)

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@nishika26 nishika26 changed the title document module enhancement document : module enhancement Nov 18, 2025
@nishika26 nishika26 marked this pull request as ready for review November 18, 2025 03:58
@nishika26 nishika26 self-assigned this Nov 18, 2025
Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 5

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
backend/app/models/document.py (1)

46-57: Critical: Duplicate field definition.

The signed_url field is defined twice in DocumentPublic (lines 46-48 and 55-57). This is a syntax error that will cause the second definition to override the first, though both are identical. One definition must be removed.

Apply this diff to remove the duplicate:

 class DocumentPublic(DocumentBase):
     id: UUID = Field(description="The unique identifier of the document")
-    signed_url: str | None = Field(
-        default=None, description="A signed URL for accessing the document"
-    )
     inserted_at: datetime = Field(
         description="The timestamp when the document was inserted"
     )
     updated_at: datetime = Field(
         description="The timestamp when the document was last updated"
     )
     signed_url: str | None = Field(
         default=None, description="A signed URL for accessing the document"
     )
backend/app/api/routes/documents.py (1)

67-81: Fix callback_url Form default – description string is being used as the value

Right now:

callback_url: str | None = Form("URL to call to report endpoint status")

This makes "URL to call to report endpoint status" the default value rather than a description. If a client omits callback_url, downstream code will treat that literal string as a real callback URL and attempt to call it from execute_job, which is both incorrect and potentially noisy.

Adjust to match the pattern used for target_format and transformer:

-    callback_url: str | None = Form("URL to call to report endpoint status"),
+    callback_url: str | None = Form(
+        default=None,
+        description="URL to call to report endpoint status",
+    ),

This keeps behavior aligned with the str | None annotation and avoids bogus callback invocations.

🧹 Nitpick comments (15)
backend/app/tests/services/doctransformer/test_service/test_start_job.py (3)

4-7: Drop unused Tuple import and prefer built-in generics

Tuple from typing is imported but never used, and Ruff flags typing.Tuple as deprecated. You can safely remove this import; if you need tuple annotations later, prefer the built‑in tuple[...] form instead.


32-36: Align _create_job with DocTransformationJobCrud.create DTO signature

DocTransformationJobCrud.create is designed to accept a DocTransformJobCreate payload, but _create_job passes a DocTransformationJob instance and does an extra commit. For clarity and to track the same surface tests exercise in production, consider:

-from app.models import (
-    Document,
-    DocTransformationJob,
-    Project,
-    TransformationStatus,
-    UserProjectOrg,
-)
+from app.models import (
+    Document,
+    DocTransformationJob,
+    DocTransformJobCreate,
+    Project,
+    TransformationStatus,
+    UserProjectOrg,
+)
@@
-    def _create_job(self, db: Session, project_id: int, source_document_id):
-        job = DocTransformationJob(source_document_id=source_document_id)
-        job = DocTransformationJobCrud(db, project_id=project_id).create(job)
-        db.commit()
-        return job
+    def _create_job(self, db: Session, project_id: int, source_document_id):
+        return DocTransformationJobCrud(
+            db, project_id=project_id
+        ).create(DocTransformJobCreate(source_document_id=source_document_id))

This keeps the test helper in lockstep with the CRUD API and avoids redundant transaction management.


82-106: Optionally assert scheduler isn’t called for nonexistent jobs

The negative test correctly asserts an HTTPException for a nonexistent job. To tighten it, you can also assert the scheduler was never invoked:

-        with pytest.raises(HTTPException):
-            with patch(
-                "app.services.doctransform.job.start_low_priority_job"
-            ) as mock_schedule:
-                mock_schedule.return_value = "fake-task-id"
-                start_job(
-                    db=db,
-                    current_user=current_user,
-                    job_id=nonexistent_job_id,
-                    transformer_name="test-transformer",
-                    target_format="markdown",
-                    callback_url=None,
-                )
+        with patch(
+            "app.services.doctransform.job.start_low_priority_job"
+        ) as mock_schedule, pytest.raises(HTTPException):
+            mock_schedule.return_value = "fake-task-id"
+            start_job(
+                db=db,
+                current_user=current_user,
+                job_id=nonexistent_job_id,
+                transformer_name="test-transformer",
+                target_format="markdown",
+                callback_url=None,
+            )
+        mock_schedule.assert_not_called()

This ensures failures short‑circuit before enqueuing anything.

backend/app/api/routes/documents.py (3)

88-112: Consider initializing actual_transformer for readability and type-checkers

actual_transformer is only defined inside the if target_format: block but later used in if target_format and actual_transformer:. This is safe at runtime due to short‑circuiting, but static analyzers may flag potential use before assignment.

A small refactor makes intent clearer:

-    # validate if transformation is possible or not
-    if target_format:
+    actual_transformer: str | None = None
+
+    # validate if transformation is possible or not
+    if target_format:
         if not is_transformation_supported(source_format, target_format):
             ...

The rest of the logic can stay as‑is; actual_transformer will be None when no transformation is requested, and truthy when resolved.


140-146: Avoid unnecessary string cast for transformation_job_id

TransformationJobInfo.job_id is typed as UUID, and start_job returns the underlying job ID. You currently cast to str and let Pydantic parse it back:

job_info = TransformationJobInfo(
    message=...,
    job_id=str(transformation_job_id),
    ...
    status_check_url=f"/documents/transformations/{transformation_job_id}",
)

You can pass the UUID directly and let the f‑string handle stringification for the URL:

-        job_info = TransformationJobInfo(
-            message=f"Document accepted for transformation from {source_format} to {target_format}.",
-            job_id=str(transformation_job_id),
-            status=TransformationStatus.PENDING,
-            transformer=actual_transformer,
-            status_check_url=f"/documents/transformations/{transformation_job_id}",
-        )
+        job_info = TransformationJobInfo(
+            message=f"Document accepted for transformation from {source_format} to {target_format}.",
+            job_id=transformation_job_id,
+            status=TransformationStatus.PENDING,
+            transformer=actual_transformer,
+            status_check_url=f"/documents/transformations/{transformation_job_id}",
+        )

This avoids redundant conversions while keeping the API the same.


227-230: Use DocumentPublic | TransformedDocumentPublic instead of Union[...]

Given Python 3.11+, you can simplify the response model annotation per Ruff’s suggestion:

-from typing import Union
@@
-    response_model=APIResponse[Union[DocumentPublic, TransformedDocumentPublic]],
+    response_model=APIResponse[DocumentPublic | TransformedDocumentPublic],

After this change, the Union import can be removed from the top of the file.

backend/app/crud/document/doc_transformation_job.py (1)

26-35: DTO-based create/update look good; note that update cannot clear fields

The shift to DTOs for create and update matches the rest of the refactor and the tests:

  • create(DocTransformJobCreate) cleanly constructs a DocTransformationJob from a typed payload.
  • update(..., patch: DocTransformJobUpdate) with exclude_unset=True, exclude_none=True ensures only explicitly provided, non‑None fields are applied, and updated_at is bumped.

Two minor notes:

  • With exclude_none=True, you cannot use DocTransformJobUpdate to intentionally clear a field (e.g., reset error_message back to None). If you ever need that behavior, you’ll need a different sentinel or to drop exclude_none=True.
  • The log message "update_status" is a bit misleading now that update can change more than status; consider renaming it to "update" for clarity.

Functionally, the implementation aligns well with the tests.

Also applies to: 73-95

backend/app/tests/services/doctransformer/test_service/test_execute_job_errors.py (2)

41-43: Align test job creation with DTO-based CRUD API

DocTransformationJobCrud.create is designed to take a DocTransformJobCreate payload, but these tests pass a DocTransformationJob ORM model. It works at runtime because the implementation reconstructs a DocTransformationJob from payload.model_dump(), but it drifts from the public API and makes the tests less representative of real usage.

Consider switching to the DTO for clarity and type-safety:

-from app.models import Document, Project, TransformationStatus, DocTransformationJob
+from app.models import (
+    Document,
+    Project,
+    TransformationStatus,
+    DocTransformJobCreate,
+)

...

-job = job_crud.create(DocTransformationJob(source_document_id=document.id))
+job = job_crud.create(DocTransformJobCreate(source_document_id=document.id))

Apply the same pattern in all four tests that create a job.

Also applies to: 92-94, 138-140, 188-190


61-62: Avoid asserting on a blind Exception in tests

These tests currently use pytest.raises(Exception), which can mask unexpected failures and is flagged by Ruff (B017). You already know the error surface is constrained (e.g. Tenacity’s RetryError and storage/transformer exceptions).

Tighten the expectation to the concrete exception(s) you intend to surface, for example:

from tenacity import RetryError

with pytest.raises(RetryError):
    fast_execute_job(...)

(or a tuple of specific exception types if needed).

Also applies to: 159-160

backend/app/tests/services/doctransformer/test_service/test_execute_job.py (2)

51-54: Use DocTransformJobCreate in tests to match CRUD interface

As in the error tests, these tests call:

job = job_crud.create(DocTransformationJob(source_document_id=document.id))

while DocTransformationJobCrud.create is typed to accept DocTransformJobCreate. Passing the ORM model works but drifts from the intended DTO-based API and weakens type checking.

Updating tests to use the DTO keeps them aligned with the public surface:

-from app.models import Document, Project, TransformationStatus, DocTransformationJob
+from app.models import (
+    Document,
+    Project,
+    TransformationStatus,
+    DocTransformJobCreate,
+)

...

-job = job_crud.create(DocTransformationJob(source_document_id=document.id))
+job = job_crud.create(DocTransformJobCreate(source_document_id=document.id))

Apply across all job creations in this file.

Also applies to: 141-144, 183-186, 228-231, 279-282


154-164: Tighten pytest.raises(Exception) to specific error types

test_execute_job_with_missing_source_document asserts on a bare Exception, which is flagged by Ruff (B017) and can hide unexpected failures.

Given the retry layer (RetryError) and storage/IO error surface, prefer asserting on the concrete exception(s) expected here, e.g.:

from tenacity import RetryError

with pytest.raises(RetryError):
    fast_execute_job(...)

(or a tuple of specific well-known exceptions).

backend/app/services/doctransform/job.py (1)

34-63: Correct start_job return type and avoid redundant read

start_job is annotated to return str but currently returns job.id (a UUID), and it performs an extra read after an update:

def start_job(... ) -> str:
    ...
    job_crud.update(job_id, DocTransformJobUpdate(trace_id=trace_id))
    job = job_crud.read_one(job_id)
    ...
    return job.id

To align the type hints and avoid the redundant query, consider:

-def start_job(... ) -> str:
+def start_job(... ) -> UUID:
@@
-    job_crud.update(job_id, DocTransformJobUpdate(trace_id=trace_id))
-    job = job_crud.read_one(job_id)
+    job = job_crud.update(job_id, DocTransformJobUpdate(trace_id=trace_id))
@@
-    return job.id
+    return job.id

If the external caller really needs a string, cast at the call site or change the annotation to -> str and return str(job.id) consistently.

backend/app/tests/api/routes/test_doc_transformation_job.py (1)

5-10: Tests nicely cover the new job APIs; consider adding include_url coverage

These tests exercise the new /documents/transformation/{job_id} and /documents/transformation/ routes, DTO-based CRUD (DocTransformJobCreate/DocTransformJobUpdate), cross-project isolation, and a range of validation errors. The assertions against DocTransformationJobPublic/DocTransformationJobsPublic shapes look consistent with the API design.

One gap you might want to close later is explicit coverage for include_url=True on both endpoints to verify that signed URLs are attached when available and that failures in URL generation are handled as expected.

Also applies to: 80-97, 140-366

backend/app/api/routes/doc_transformation_job.py (2)

39-53: Gracefully handle failures when generating signed URLs

Both endpoints call get_signed_url directly:

storage = get_cloud_storage(...)
transformed_doc_schema.signed_url = storage.get_signed_url(document.object_store_url)

If storage misconfiguration or the backend raises, these routes will 500 even though the underlying job/document data is fine. In execute_job you already wrap signed URL generation in a try/except and log a warning instead of failing the whole operation.

For consistency and resilience, consider mirroring that pattern here:

-        if include_url:
-            storage = get_cloud_storage(
-                session=session, project_id=current_user.project_id
-            )
-            transformed_doc_schema.signed_url = storage.get_signed_url(
-                document.object_store_url
-            )
+        if include_url:
+            try:
+                storage = get_cloud_storage(
+                    session=session, project_id=current_user.project_id
+                )
+                transformed_doc_schema.signed_url = storage.get_signed_url(
+                    document.object_store_url
+                )
+            except Exception as err:  # keep job fetch working even if URL generation fails
+                logger.warning(
+                    "[get_transformation_job] failed to generate signed URL | job_id=%s | error=%s",
+                    job_id,
+                    err,
+                )

and similarly in the multi-job loop. That way, clients still get job status and metadata even if signed URL generation has issues.

Also applies to: 80-103


67-75: Replace min/max_length with min_items/max_items for list constraint validation

The job_ids parameter uses Query with min_items and max_items keyword arguments on fastapi.Query to control the minimum and maximum number of items in the list, not min and max_length. Update line 71 to enforce the intended "at least 1, at most 100 IDs" constraint:

job_ids: list[UUID] = Query(
    description="List of transformation job IDs", min_items=1, max_items=100
),
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between aa704b9 and cd2fd2c.

📒 Files selected for processing (32)
  • backend/app/alembic/versions/eed36ae3c79a_alter_doc_transform_table_for_celery.py (1 hunks)
  • backend/app/api/routes/doc_transformation_job.py (1 hunks)
  • backend/app/api/routes/documents.py (10 hunks)
  • backend/app/api/routes/fine_tuning.py (1 hunks)
  • backend/app/celery/tasks/job_execution.py (2 hunks)
  • backend/app/celery/utils.py (2 hunks)
  • backend/app/core/doctransform/service.py (0 hunks)
  • backend/app/crud/__init__.py (1 hunks)
  • backend/app/crud/document/doc_transformation_job.py (3 hunks)
  • backend/app/crud/document/document.py (0 hunks)
  • backend/app/models/__init__.py (1 hunks)
  • backend/app/models/doc_transformation_job.py (2 hunks)
  • backend/app/models/document.py (3 hunks)
  • backend/app/services/collections/helpers.py (1 hunks)
  • backend/app/services/doctransform/job.py (1 hunks)
  • backend/app/services/doctransform/registry.py (1 hunks)
  • backend/app/services/doctransform/zerox_transformer.py (1 hunks)
  • backend/app/tests/api/routes/documents/test_route_document_info.py (1 hunks)
  • backend/app/tests/api/routes/documents/test_route_document_list.py (1 hunks)
  • backend/app/tests/api/routes/documents/test_route_document_permanent_remove.py (1 hunks)
  • backend/app/tests/api/routes/documents/test_route_document_remove.py (1 hunks)
  • backend/app/tests/api/routes/documents/test_route_document_upload.py (4 hunks)
  • backend/app/tests/api/routes/test_doc_transformation_job.py (13 hunks)
  • backend/app/tests/core/doctransformer/test_service/test_start_job.py (0 hunks)
  • backend/app/tests/crud/documents/test_doc_transformation_job.py (6 hunks)
  • backend/app/tests/services/doctransformer/test_registry.py (3 hunks)
  • backend/app/tests/services/doctransformer/test_service/conftest.py (2 hunks)
  • backend/app/tests/services/doctransformer/test_service/test_execute_job.py (9 hunks)
  • backend/app/tests/services/doctransformer/test_service/test_execute_job_errors.py (9 hunks)
  • backend/app/tests/services/doctransformer/test_service/test_integration.py (4 hunks)
  • backend/app/tests/services/doctransformer/test_service/test_start_job.py (1 hunks)
  • backend/app/tests/services/doctransformer/test_service/utils.py (1 hunks)
💤 Files with no reviewable changes (3)
  • backend/app/crud/document/document.py
  • backend/app/tests/core/doctransformer/test_service/test_start_job.py
  • backend/app/core/doctransform/service.py
🧰 Additional context used
📓 Path-based instructions (7)
**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Use type hints in Python code (Python 3.11+ project)

Files:

  • backend/app/celery/tasks/job_execution.py
  • backend/app/tests/services/doctransformer/test_service/utils.py
  • backend/app/tests/services/doctransformer/test_service/conftest.py
  • backend/app/crud/__init__.py
  • backend/app/api/routes/fine_tuning.py
  • backend/app/models/__init__.py
  • backend/app/tests/api/routes/documents/test_route_document_info.py
  • backend/app/api/routes/documents.py
  • backend/app/services/collections/helpers.py
  • backend/app/services/doctransform/registry.py
  • backend/app/alembic/versions/eed36ae3c79a_alter_doc_transform_table_for_celery.py
  • backend/app/tests/api/routes/documents/test_route_document_remove.py
  • backend/app/tests/services/doctransformer/test_service/test_integration.py
  • backend/app/api/routes/doc_transformation_job.py
  • backend/app/models/doc_transformation_job.py
  • backend/app/crud/document/doc_transformation_job.py
  • backend/app/tests/api/routes/documents/test_route_document_permanent_remove.py
  • backend/app/services/doctransform/zerox_transformer.py
  • backend/app/tests/crud/documents/test_doc_transformation_job.py
  • backend/app/tests/services/doctransformer/test_registry.py
  • backend/app/services/doctransform/job.py
  • backend/app/celery/utils.py
  • backend/app/tests/services/doctransformer/test_service/test_start_job.py
  • backend/app/tests/api/routes/documents/test_route_document_list.py
  • backend/app/tests/api/routes/documents/test_route_document_upload.py
  • backend/app/tests/api/routes/test_doc_transformation_job.py
  • backend/app/tests/services/doctransformer/test_service/test_execute_job.py
  • backend/app/tests/services/doctransformer/test_service/test_execute_job_errors.py
  • backend/app/models/document.py
backend/app/celery/tasks/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Define Celery tasks under backend/app/celery/tasks/

Files:

  • backend/app/celery/tasks/job_execution.py
backend/app/celery/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Keep Celery app configuration (priority queues, beat scheduler, workers) under backend/app/celery/

Files:

  • backend/app/celery/tasks/job_execution.py
  • backend/app/celery/utils.py
backend/app/crud/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Implement database access operations in backend/app/crud/

Files:

  • backend/app/crud/__init__.py
  • backend/app/crud/document/doc_transformation_job.py
backend/app/api/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Expose FastAPI REST endpoints under backend/app/api/ organized by domain

Files:

  • backend/app/api/routes/fine_tuning.py
  • backend/app/api/routes/documents.py
  • backend/app/api/routes/doc_transformation_job.py
backend/app/models/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Define SQLModel entities (database tables and domain objects) in backend/app/models/

Files:

  • backend/app/models/__init__.py
  • backend/app/models/doc_transformation_job.py
  • backend/app/models/document.py
backend/app/services/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Implement business logic services under backend/app/services/

Files:

  • backend/app/services/collections/helpers.py
  • backend/app/services/doctransform/registry.py
  • backend/app/services/doctransform/zerox_transformer.py
  • backend/app/services/doctransform/job.py
🧠 Learnings (6)
📓 Common learnings
Learnt from: CR
Repo: ProjectTech4DevAI/ai-platform PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-10-08T12:05:01.317Z
Learning: Applies to backend/app/core/doctransform/**/*.py : Place document transformation utilities under backend/app/core/doctransform/
📚 Learning: 2025-10-08T12:05:01.317Z
Learnt from: CR
Repo: ProjectTech4DevAI/ai-platform PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-10-08T12:05:01.317Z
Learning: Applies to backend/app/celery/**/*.py : Keep Celery app configuration (priority queues, beat scheduler, workers) under backend/app/celery/

Applied to files:

  • backend/app/celery/tasks/job_execution.py
  • backend/app/celery/utils.py
📚 Learning: 2025-10-08T12:05:01.317Z
Learnt from: CR
Repo: ProjectTech4DevAI/ai-platform PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-10-08T12:05:01.317Z
Learning: Applies to backend/app/celery/tasks/**/*.py : Define Celery tasks under backend/app/celery/tasks/

Applied to files:

  • backend/app/celery/tasks/job_execution.py
📚 Learning: 2025-10-08T12:05:01.317Z
Learnt from: CR
Repo: ProjectTech4DevAI/ai-platform PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-10-08T12:05:01.317Z
Learning: Applies to backend/app/core/doctransform/**/*.py : Place document transformation utilities under backend/app/core/doctransform/

Applied to files:

  • backend/app/tests/services/doctransformer/test_service/utils.py
  • backend/app/crud/__init__.py
  • backend/app/models/__init__.py
  • backend/app/api/routes/documents.py
  • backend/app/services/doctransform/registry.py
  • backend/app/tests/services/doctransformer/test_service/test_integration.py
  • backend/app/api/routes/doc_transformation_job.py
  • backend/app/models/doc_transformation_job.py
  • backend/app/crud/document/doc_transformation_job.py
  • backend/app/services/doctransform/zerox_transformer.py
  • backend/app/tests/crud/documents/test_doc_transformation_job.py
  • backend/app/tests/services/doctransformer/test_registry.py
  • backend/app/services/doctransform/job.py
  • backend/app/tests/api/routes/test_doc_transformation_job.py
  • backend/app/tests/services/doctransformer/test_service/test_execute_job.py
  • backend/app/tests/services/doctransformer/test_service/test_execute_job_errors.py
  • backend/app/models/document.py
📚 Learning: 2025-10-08T12:05:01.317Z
Learnt from: CR
Repo: ProjectTech4DevAI/ai-platform PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-10-08T12:05:01.317Z
Learning: Applies to backend/app/core/cloud/storage.py : Implement cloud storage integration in backend/app/core/cloud/storage.py

Applied to files:

  • backend/app/tests/services/doctransformer/test_service/utils.py
📚 Learning: 2025-10-08T12:05:01.317Z
Learnt from: CR
Repo: ProjectTech4DevAI/ai-platform PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-10-08T12:05:01.317Z
Learning: Applies to backend/app/crud/**/*.py : Implement database access operations in backend/app/crud/

Applied to files:

  • backend/app/crud/__init__.py
🧬 Code graph analysis (24)
backend/app/tests/services/doctransformer/test_service/utils.py (1)
backend/app/services/doctransform/transformer.py (1)
  • Transformer (5-14)
backend/app/tests/services/doctransformer/test_service/conftest.py (1)
backend/app/services/doctransform/job.py (1)
  • execute_job (110-272)
backend/app/crud/__init__.py (2)
backend/app/crud/document/document.py (1)
  • DocumentCrud (13-134)
backend/app/crud/document/doc_transformation_job.py (1)
  • DocTransformationJobCrud (21-95)
backend/app/api/routes/fine_tuning.py (3)
backend/app/tests/crud/documents/test_doc_transformation_job.py (1)
  • crud (26-27)
backend/app/tests/crud/documents/documents/test_crud_document_delete.py (1)
  • document (15-24)
backend/app/crud/document/document.py (1)
  • DocumentCrud (13-134)
backend/app/models/__init__.py (2)
backend/app/models/document.py (5)
  • DocTransformationJobPublic (82-87)
  • DocTransformationJobsPublic (90-92)
  • TransformedDocumentPublic (60-64)
  • DocumentUploadResponse (77-79)
  • TransformationJobInfo (67-74)
backend/app/models/doc_transformation_job.py (4)
  • DocTransformationJob (18-45)
  • TransformationStatus (11-15)
  • DocTransformJobCreate (48-49)
  • DocTransformJobUpdate (52-57)
backend/app/tests/api/routes/documents/test_route_document_info.py (1)
backend/app/tests/utils/document.py (1)
  • Route (83-110)
backend/app/api/routes/documents.py (6)
backend/app/services/doctransform/registry.py (4)
  • get_available_transformers (75-79)
  • get_file_format (53-59)
  • is_transformation_supported (70-72)
  • resolve_transformer (82-106)
backend/app/crud/document/document.py (1)
  • DocumentCrud (13-134)
backend/app/crud/document/doc_transformation_job.py (2)
  • DocTransformationJobCrud (21-95)
  • create (26-34)
backend/app/models/document.py (5)
  • Document (21-41)
  • DocumentPublic (44-57)
  • TransformedDocumentPublic (60-64)
  • DocumentUploadResponse (77-79)
  • TransformationJobInfo (67-74)
backend/app/models/doc_transformation_job.py (3)
  • DocTransformJobCreate (48-49)
  • TransformationStatus (11-15)
  • job_id (36-37)
backend/app/services/doctransform/job.py (1)
  • start_job (34-63)
backend/app/services/collections/helpers.py (2)
backend/app/tests/crud/documents/documents/test_crud_document_delete.py (1)
  • document (15-24)
backend/app/crud/document/document.py (1)
  • DocumentCrud (13-134)
backend/app/services/doctransform/registry.py (2)
backend/app/services/doctransform/transformer.py (1)
  • Transformer (5-14)
backend/app/services/doctransform/zerox_transformer.py (1)
  • ZeroxTransformer (12-70)
backend/app/tests/api/routes/documents/test_route_document_remove.py (1)
backend/app/tests/utils/document.py (1)
  • Route (83-110)
backend/app/tests/services/doctransformer/test_service/test_integration.py (5)
backend/app/crud/document/doc_transformation_job.py (1)
  • DocTransformationJobCrud (21-95)
backend/app/services/doctransform/job.py (2)
  • execute_job (110-272)
  • start_job (34-63)
backend/app/models/document.py (1)
  • Document (21-41)
backend/app/models/doc_transformation_job.py (3)
  • DocTransformationJob (18-45)
  • TransformationStatus (11-15)
  • job_id (36-37)
backend/app/models/user.py (1)
  • UserProjectOrg (58-59)
backend/app/api/routes/doc_transformation_job.py (4)
backend/app/crud/document/document.py (3)
  • DocumentCrud (13-134)
  • read_one (18-34)
  • read_each (72-95)
backend/app/models/document.py (3)
  • DocTransformationJobPublic (82-87)
  • DocTransformationJobsPublic (90-92)
  • TransformedDocumentPublic (60-64)
backend/app/utils.py (2)
  • APIResponse (30-54)
  • success_response (37-40)
backend/app/core/cloud/storage.py (1)
  • get_cloud_storage (262-279)
backend/app/models/doc_transformation_job.py (1)
backend/app/core/util.py (1)
  • now (13-14)
backend/app/crud/document/doc_transformation_job.py (4)
backend/app/tests/crud/documents/test_doc_transformation_job.py (1)
  • crud (26-27)
backend/app/crud/document/document.py (3)
  • DocumentCrud (13-134)
  • update (97-122)
  • read_one (18-34)
backend/app/models/doc_transformation_job.py (5)
  • DocTransformationJob (18-45)
  • TransformationStatus (11-15)
  • DocTransformJobCreate (48-49)
  • DocTransformJobUpdate (52-57)
  • job_id (36-37)
backend/app/core/util.py (1)
  • now (13-14)
backend/app/tests/api/routes/documents/test_route_document_permanent_remove.py (1)
backend/app/tests/utils/document.py (1)
  • Route (83-110)
backend/app/services/doctransform/zerox_transformer.py (1)
backend/app/services/doctransform/transformer.py (1)
  • Transformer (5-14)
backend/app/tests/crud/documents/test_doc_transformation_job.py (5)
backend/app/crud/document/doc_transformation_job.py (1)
  • DocTransformationJobCrud (21-95)
backend/app/models/doc_transformation_job.py (3)
  • TransformationStatus (11-15)
  • DocTransformJobCreate (48-49)
  • DocTransformJobUpdate (52-57)
backend/app/tests/conftest.py (1)
  • db (28-45)
backend/app/tests/utils/document.py (2)
  • DocumentStore (53-80)
  • project (65-66)
backend/app/tests/utils/utils.py (2)
  • SequentialUuidGenerator (150-163)
  • get_project (54-73)
backend/app/services/doctransform/job.py (8)
backend/app/crud/document/doc_transformation_job.py (3)
  • DocTransformationJobCrud (21-95)
  • update (73-95)
  • read_one (36-55)
backend/app/crud/document/document.py (3)
  • DocumentCrud (13-134)
  • update (97-122)
  • read_one (18-34)
backend/app/models/document.py (3)
  • Document (21-41)
  • DocTransformationJobPublic (82-87)
  • TransformedDocumentPublic (60-64)
backend/app/models/doc_transformation_job.py (4)
  • DocTransformJobUpdate (52-57)
  • TransformationStatus (11-15)
  • DocTransformationJob (18-45)
  • job_id (36-37)
backend/app/core/cloud/storage.py (1)
  • get_cloud_storage (262-279)
backend/app/celery/utils.py (1)
  • start_low_priority_job (46-71)
backend/app/utils.py (3)
  • APIResponse (30-54)
  • success_response (37-40)
  • failure_response (43-54)
backend/app/services/doctransform/registry.py (1)
  • convert_document (109-130)
backend/app/tests/services/doctransformer/test_service/test_start_job.py (7)
backend/app/services/doctransform/job.py (2)
  • execute_job (110-272)
  • start_job (34-63)
backend/app/crud/document/doc_transformation_job.py (1)
  • DocTransformationJobCrud (21-95)
backend/app/models/document.py (1)
  • Document (21-41)
backend/app/models/doc_transformation_job.py (3)
  • DocTransformationJob (18-45)
  • TransformationStatus (11-15)
  • job_id (36-37)
backend/app/tests/services/doctransformer/test_service/utils.py (2)
  • DocTransformTestBase (30-69)
  • MockTestTransformer (16-27)
backend/app/tests/conftest.py (1)
  • db (28-45)
backend/app/tests/services/doctransformer/test_service/conftest.py (2)
  • current_user (55-63)
  • test_document (73-79)
backend/app/tests/api/routes/documents/test_route_document_upload.py (1)
backend/app/tests/utils/document.py (1)
  • Route (83-110)
backend/app/tests/api/routes/test_doc_transformation_job.py (3)
backend/app/crud/document/doc_transformation_job.py (3)
  • DocTransformationJobCrud (21-95)
  • create (26-34)
  • update (73-95)
backend/app/models/doc_transformation_job.py (3)
  • TransformationStatus (11-15)
  • DocTransformJobCreate (48-49)
  • DocTransformJobUpdate (52-57)
backend/app/tests/utils/document.py (2)
  • DocumentStore (53-80)
  • put (68-73)
backend/app/tests/services/doctransformer/test_service/test_execute_job.py (5)
backend/app/services/doctransform/registry.py (1)
  • TransformationError (7-8)
backend/app/services/doctransform/job.py (1)
  • execute_job (110-272)
backend/app/models/doc_transformation_job.py (3)
  • TransformationStatus (11-15)
  • DocTransformationJob (18-45)
  • job_id (36-37)
backend/app/tests/services/doctransformer/test_service/utils.py (1)
  • MockTestTransformer (16-27)
backend/app/tests/services/doctransformer/test_service/conftest.py (1)
  • fast_execute_job (33-51)
backend/app/tests/services/doctransformer/test_service/test_execute_job_errors.py (3)
backend/app/models/doc_transformation_job.py (3)
  • TransformationStatus (11-15)
  • DocTransformationJob (18-45)
  • job_id (36-37)
backend/app/tests/services/doctransformer/test_service/utils.py (5)
  • DocTransformTestBase (30-69)
  • MockTestTransformer (16-27)
  • failing_convert_document (76-85)
  • create_failing_convert_document (72-87)
  • persistent_failing_convert_document (95-96)
backend/app/tests/services/doctransformer/test_service/conftest.py (1)
  • fast_execute_job (33-51)
backend/app/models/document.py (1)
backend/app/models/doc_transformation_job.py (1)
  • TransformationStatus (11-15)
🪛 Ruff (0.14.5)
backend/app/api/routes/documents.py

229-229: Use X | Y for type annotations

(UP007)

backend/app/services/doctransform/job.py

22-22: Redefinition of unused TransformedDocumentPublic from line 20

Remove definition: TransformedDocumentPublic

(F811)


118-118: Unused function argument: task_instance

(ARG001)

backend/app/tests/services/doctransformer/test_service/test_start_job.py

6-6: typing.Tuple is deprecated, use tuple instead

(UP035)

backend/app/tests/services/doctransformer/test_service/test_execute_job.py

154-154: Do not assert blind exception: Exception

(B017)

backend/app/tests/services/doctransformer/test_service/test_execute_job_errors.py

159-159: Do not assert blind exception: Exception

(B017)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: checks (3.11.7, 6)
🔇 Additional comments (33)
backend/app/celery/tasks/job_execution.py (1)

25-25: Documentation correctly updated to reflect module reorganization.

The docstring examples have been properly updated to reference the new app.services.doctransform.service.execute_job path, aligning with the broader refactoring effort that relocates document transformation logic from app.core.doctransform to app.services.doctransform.

Also applies to: 50-50

backend/app/celery/utils.py (2)

18-43: Code quality and type hints look good.

Both functions follow the established pattern with complete type hints (compliant with Python 3.11+ guidelines), clear docstrings, and proper parameter validation. The logging at lines 42 and 70 provides good observability for task tracking.

Also applies to: 46-71


25-25: Update docstring examples to use correct module path.

The docstring examples reference app.services.doctransform.service.execute_job, but the actual function is located at app.services.doctransform.job.execute_job. Correct the import path in the docstring examples at lines 25 and 53 to match the actual module structure (change .service to .job).

⛔ Skipped due to learnings
Learnt from: CR
Repo: ProjectTech4DevAI/ai-platform PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-10-08T12:05:01.317Z
Learning: Applies to backend/app/celery/**/*.py : Keep Celery app configuration (priority queues, beat scheduler, workers) under backend/app/celery/
Learnt from: CR
Repo: ProjectTech4DevAI/ai-platform PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-10-08T12:05:01.317Z
Learning: Applies to backend/app/celery/tasks/**/*.py : Define Celery tasks under backend/app/celery/tasks/
backend/app/models/document.py (2)

60-64: LGTM: TransformedDocumentPublic correctly extends DocumentPublic.

The new TransformedDocumentPublic class appropriately extends DocumentPublic with the source_document_id field, supporting the transformation workflow.


82-92: LGTM: New public models for transformation jobs.

DocTransformationJobPublic and DocTransformationJobsPublic provide a clean public API surface for transformation job responses, with proper error handling via error_message and support for jobs not found.

backend/app/tests/api/routes/documents/test_route_document_permanent_remove.py (1)

29-29: LGTM: Route fixture updated to base path.

The change from Route("remove") to Route("") aligns with the broader routing refactor that shifts document-related test routes to use the base path. The tests still correctly target the permanent removal endpoint via route.append(document, suffix="permanent").

backend/app/services/doctransform/zerox_transformer.py (1)

7-7: LGTM: Import path updated to services module.

The import path change from app.core.doctransform.transformer to app.services.doctransform.transformer aligns with the PR's objective to relocate transformation service logic under app.services.doctransform per the coding guideline to implement business logic services under backend/app/services/.

Note: This intentionally supersedes the previous learning about placing doctransform utilities under app.core.doctransform/. Based on learnings.

backend/app/services/collections/helpers.py (1)

11-11: LGTM: Import path updated for nested CRUD module.

The import path change from app.crud.document to app.crud.document.document reflects the CRUD module reorganization to a nested structure. No logic changes.

backend/app/tests/api/routes/documents/test_route_document_info.py (1)

17-17: LGTM: Route fixture updated to base path.

The change from Route("info") to Route("") is consistent with the broader routing refactor pattern observed across document-related test files.

backend/app/tests/api/routes/documents/test_route_document_remove.py (1)

20-20: LGTM: Route fixture updated to base path.

The change from Route("remove") to Route("") maintains consistency with the routing refactor applied across all document-related test files.

backend/app/models/__init__.py (2)

35-37: LGTM: New public transformation models exported.

The addition of DocTransformationJobPublic, DocTransformationJobsPublic, and TransformedDocumentPublic exports aligns with the new public API surface for document transformation responses introduced in backend/app/models/document.py.


44-45: LGTM: Input DTOs exported for transformation CRUD.

The addition of DocTransformJobCreate and DocTransformJobUpdate exports provides input DTOs for transformation job CRUD operations, replacing the previous DocTransformationJobs export and aligning with the updated CRUD interface.

backend/app/tests/services/doctransformer/test_service/utils.py (1)

12-12: LGTM: Import path updated to services module.

The import path change from app.core.doctransform.transformer to app.services.doctransform.transformer is consistent with the broader refactor to relocate transformation service logic under app.services.doctransform per coding guidelines.

Based on learnings.

backend/app/tests/api/routes/documents/test_route_document_list.py (1)

24-24: LGTM!

The route fixture update correctly aligns with the broader routing refactor where document list endpoints now use the base path instead of an explicit "/list" subpath.

backend/app/services/doctransform/registry.py (1)

3-4: Import paths updated correctly for services refactor.

The import paths have been updated to reflect the relocation of document transformation logic from app.core.doctransform to app.services.doctransform. This aligns with the coding guideline to "Implement business logic services under backend/app/services/". Note that this conflicts with a retrieved learning suggesting to place these under backend/app/core/doctransform/, but the coding guidelines take precedence for business logic services.

Based on learnings
As per coding guidelines

backend/app/api/routes/fine_tuning.py (1)

21-21: LGTM!

The import path update correctly reflects the nested CRUD module structure. DocumentCrud is now properly imported from app.crud.document.document.

backend/app/tests/services/doctransformer/test_service/conftest.py (1)

35-50: LGTM!

The test fixture correctly updates import and patch paths from app.core.doctransform.service to app.services.doctransform.job, aligning with the services refactor. The patch target and function references are accurate.

backend/app/crud/__init__.py (1)

9-11: LGTM!

The import paths have been correctly updated to reflect the nested CRUD module structure:

  • DocumentCrud now imported from .document.document
  • DocTransformationJobCrud now imported from .document.doc_transformation_job

These changes align with the broader CRUD reorganization in this PR.

backend/app/tests/api/routes/documents/test_route_document_upload.py (2)

76-76: LGTM!

The route fixture update correctly aligns with the broader routing refactor where document upload now uses the base path.


154-154: LGTM!

The patch targets have been correctly updated from app.core.doctransform.service.start_job to app.services.doctransform.job.start_job, aligning with the services refactor. All three test methods now reference the correct module path.

Also applies to: 187-187, 277-277

backend/app/alembic/versions/eed36ae3c79a_alter_doc_transform_table_for_celery.py (1)

20-39: LGTM!

The migration is well-structured:

  • Adds task_id and trace_id columns (nullable AutoString) for Celery task tracking
  • Renames created_at to inserted_at for consistency
  • Provides a proper downgrade path that reverses all changes

The migration operations are correctly ordered and the schema changes align with the broader transformation job enhancements in this PR.

backend/app/tests/services/doctransformer/test_registry.py (1)

3-3: LGTM!

The import and patch paths have been correctly updated to reflect the relocation of the registry module from app.core.doctransform.registry to app.services.doctransform.registry. This aligns with the coding guideline to "Implement business logic services under backend/app/services/". Note that this conflicts with a retrieved learning suggesting to place these under backend/app/core/doctransform/, but the coding guidelines take precedence for business logic services.

Based on learnings
As per coding guidelines

Also applies to: 23-23, 50-50

backend/app/tests/services/doctransformer/test_service/test_start_job.py (2)

38-81: Happy-path test covers scheduler integration well

test_start_job_success exercises the full flow: job creation, scheduling via start_low_priority_job, and verifies persisted job fields plus scheduler kwargs. This is solid coverage for the new Celery-based start path.


107-195: Good coverage for format and transformer propagation

The parametrized tests for multiple target_format values and different transformer_name values assert that scheduler kwargs stay consistent and that job status remains PENDING. This gives good confidence that the service correctly threads these parameters through to Celery.

backend/app/api/routes/documents.py (1)

239-248: doc_info branching correctly picks transformed vs. original schema

The doc_info endpoint now chooses DocumentPublic when source_document_id is None and TransformedDocumentPublic otherwise, with optional signed URL enrichment. This matches the model semantics for original vs. transformed documents and looks correct.

backend/app/tests/crud/documents/test_doc_transformation_job.py (4)

30-73: Create-path tests align with new DTO and DB constraints

The TestDocTransformationJobCrudCreate tests correctly verify:

  • Happy path job creation via DocTransformJobCreate.
  • FK integrity by expecting IntegrityError when the source document doesn’t exist.
  • Behavior when the source document is soft‑deleted: creation succeeds but later reads 404.

This suite nicely documents and enforces the intended semantics of create.


75-130: read_one tests thoroughly cover not-found and cross-project cases

The TestDocTransformationJobCrudReadOne tests validate:

  • Successful read for existing jobs.
  • 404 behavior for nonexistent IDs.
  • 404 for jobs whose source document is deleted.
  • 404 when reading a job under a different project.

These align with the join + project/is_deleted filters in read_one.


132-184: read_each tests cover multi-ID, partial, empty, and cross-project behavior

The TestDocTransformationJobCrudReadEach class:

  • Asserts correct results for multiple job IDs.
  • Confirms partial results when some IDs don’t exist.
  • Handles empty input sets gracefully.
  • Verifies that other projects see no jobs due to project scoping.

This gives good safety net coverage for the bulk-read path.


186-273: update tests precisely exercise status transitions and patch semantics

TestDocTransformationJobCrudUpdateStatus:

  • Confirms status transitions to PROCESSING, COMPLETED (with transformed_document_id), and FAILED (with error_message).
  • Verifies that updates on nonexistent jobs raise the expected HTTPException.
  • Ensures that fields not present in the patch (like error_message) are preserved, validating the exclude_unset/exclude_none behavior.

These tests are well-aligned with the new update implementation.

backend/app/models/doc_transformation_job.py (1)

35-57: DTOs and helper properties align with the new workflow

The additions of:

  • DocTransformJobCreate with source_document_id,
  • DocTransformJobUpdate with transformed_document_id, task_id, status, error_message, and trace_id,
  • and the job_id, job_inserted_at, and job_updated_at properties

all fit cleanly with the CRUD and service-layer refactor and make the model easier to consume in public APIs and payload builders.

backend/app/tests/services/doctransformer/test_service/test_integration.py (3)

63-80: LGTM!

The patch targets have been correctly updated to the new module paths, and all parameters passed to execute_job properly match the expected types with appropriate UUID-to-string conversions.


95-196: Test coverage and implementation look good.

The remaining test methods (test_execute_job_concurrent_jobs and test_multiple_format_transformations) follow the correct patterns for execute_job calls with proper type conversions. The test coverage is comprehensive, validating concurrent execution and multiple format transformations.

Note: Lines 107 and 155 have the same job creation issue flagged earlier (using DocTransformationJob instead of DocTransformJobCreate).


51-59: No issues found in the test code.

The test assertion job.id == returned_job_id is correct. Verification confirms that start_job() in backend/app/services/doctransform/job.py:63 returns job.id, which is a UUID object. Both sides of the comparison are UUIDs, so the assertion will work as intended despite the function's incorrect type hint (-> str).

Note: The type hint discrepancy exists in the service file (should return UUID, not str), but this does not affect test correctness.

Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between cd2fd2c and cdc34e4.

📒 Files selected for processing (1)
  • backend/app/tests/services/doctransformer/test_service/conftest.py (1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Use type hints in Python code (Python 3.11+ project)

Files:

  • backend/app/tests/services/doctransformer/test_service/conftest.py
🧠 Learnings (1)
📓 Common learnings
Learnt from: CR
Repo: ProjectTech4DevAI/ai-platform PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-10-08T12:05:01.317Z
Learning: Applies to backend/app/core/doctransform/**/*.py : Place document transformation utilities under backend/app/core/doctransform/
🧬 Code graph analysis (1)
backend/app/tests/services/doctransformer/test_service/conftest.py (2)
backend/app/services/doctransform/job.py (1)
  • execute_job (110-272)
backend/app/models/doc_transformation_job.py (1)
  • job_id (36-37)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: checks (3.11.7, 6)
🔇 Additional comments (1)
backend/app/tests/services/doctransformer/test_service/conftest.py (1)

53-62: ****

The execute_job function in job.py (line 109) is decorated with @retry, which means the __wrapped__ attribute exists and is safe to access. The code correctly uses original_execute_job.__wrapped__() to bypass the retry decorator during testing. No changes are needed.

Likely an incorrect or invalid review comment.

Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (3)
backend/app/services/doctransform/job.py (2)

16-24: Remove duplicate TransformedDocumentPublic import (Ruff F811)

TransformedDocumentPublic appears twice in the import tuple, which triggers a redefinition warning and will fail linting. Keep a single entry.

-from app.models import (
-    Document,
-    DocTransformJobUpdate,
-    TransformationStatus,
-    DocTransformationJobPublic,
-    TransformedDocumentPublic,
-    DocTransformationJob,
-    TransformedDocumentPublic,
-)
+from app.models import (
+    Document,
+    DocTransformJobUpdate,
+    TransformationStatus,
+    DocTransformationJobPublic,
+    TransformedDocumentPublic,
+    DocTransformationJob,
+)

110-270: Harden execute_job error path and clean up lint issues

There are a few correctness and lint issues in execute_job:

  1. Potential UnboundLocalError for job_uuid in except

If UUID(job_id) raises (e.g. malformed ID), job_uuid is never assigned, but the except block still references it in logging and in the DB update, which will raise UnboundLocalError and mask the original failure.

Initialize job_uuid before the try and guard DB updates on its presence, also logging job_id as a fallback:

-@retry(wait=wait_exponential(multiplier=5, min=5, max=10), stop=stop_after_attempt(3))
-def execute_job(
+@retry(wait=wait_exponential(multiplier=5, min=5, max=10), stop=stop_after_attempt(3))
+def execute_job(
@@
-    task_instance,
+    task_instance,  # noqa: ARG001  # provided by Celery, not used directly
 ):
-    start_time = time.time()
-    tmp_dir: Path | None = None
-
-    job_for_payload = None  # keep latest job snapshot for payloads
-
-    try:
-        job_uuid = UUID(job_id)
+    start_time = time.time()
+    tmp_dir: Path | None = None
+
+    job_for_payload = None  # keep latest job snapshot for payloads
+    job_uuid: UUID | None = None
+
+    try:
+        job_uuid = UUID(job_id)
@@
-    except Exception as e:
-        logger.error(
-            "[doc_transform.execute_job] FAILED | job_id=%s | error=%s",
-            job_uuid,
-            e,
-            exc_info=True,
-        )
-
-        try:
-            with Session(engine) as db:
-                job_crud = DocTransformationJobCrud(session=db, project_id=project_id)
-                job_for_payload = job_crud.update(
-                    job_uuid,
-                    DocTransformJobUpdate(
-                        status=TransformationStatus.FAILED, error_message=str(e)
-                    ),
-                )
-        except Exception as db_error:
-            logger.error(
-                "[doc_transform.execute_job] failed to persist FAILED status | job_id=%s | db_error=%s",
-                job_uuid,
-                db_error,
-            )
+    except Exception as e:
+        logger.error(
+            "[doc_transform.execute_job] FAILED | job_id=%s | error=%s",
+            job_uuid or job_id,
+            e,
+            exc_info=True,
+        )
+
+        if job_uuid is not None:
+            try:
+                with Session(engine) as db:
+                    job_crud = DocTransformationJobCrud(
+                        session=db, project_id=project_id
+                    )
+                    job_for_payload = job_crud.update(
+                        job_uuid,
+                        DocTransformJobUpdate(
+                            status=TransformationStatus.FAILED,
+                            error_message=str(e),
+                        ),
+                    )
+            except Exception as db_error:
+                logger.error(
+                    "[doc_transform.execute_job] failed to persist FAILED status | job_id=%s | db_error=%s",
+                    job_uuid,
+                    db_error,
+                )
  1. Unused task_instance argument (Ruff ARG001)

The snippet above adds # noqa: ARG001 to the parameter, making it explicit this is provided by Celery but unused.

  1. Content-type map can be extended for text/html

Tests exercise multiple formats; aligning content types now avoids surprises later:

-        content_type_map = {"markdown": "text/markdown; charset=utf-8"}
-        content_type = content_type_map.get(target_format, "text/plain")
+        content_type_map = {
+            "markdown": "text/markdown; charset=utf-8",
+            "text": "text/plain; charset=utf-8",
+            "html": "text/html; charset=utf-8",
+        }
+        content_type = content_type_map.get(target_format, "text/plain")

These changes make the error handling robust and resolve outstanding lint warnings.

backend/app/tests/services/doctransformer/test_service/test_integration.py (1)

12-20: Use the DocTransformJobCreate DTO when calling DocTransformationJobCrud.create

DocTransformationJobCrud.create() is defined to accept a DocTransformJobCreate payload, but the tests still pass a DocTransformationJob ORM model:

  • Line 40/41: job_crud.create(DocTransformationJob(...))
  • Line 108: same pattern inside the loop
  • Line 156: same pattern for multi-format jobs

This violates the CRUD interface and couples tests to the ORM layer more than necessary. It also risk-picks extra fields (id, timestamps) into the create path.

Please import and use the DTO instead, e.g.:

-from app.models import (
-    Document,
-    DocTransformationJob,
-    Project,
-    TransformationStatus,
-    UserProjectOrg,
-)
+from app.models import (
+    Document,
+    DocTransformationJob,
+    Project,
+    TransformationStatus,
+    UserProjectOrg,
+)
+# or wherever the DTO actually lives:
+# from app.schemas import DocTransformJobCreate  # adjust module as needed
@@
-        job_crud = DocTransformationJobCrud(session=db, project_id=project.id)
-        job = job_crud.create(DocTransformationJob(source_document_id=document.id))
+        job_crud = DocTransformationJobCrud(session=db, project_id=project.id)
+        job = job_crud.create(
+            DocTransformJobCreate(source_document_id=document.id)
+        )
@@
-        for i in range(3):
-            job = job_crud.create(DocTransformationJob(source_document_id=document.id))
+        for i in range(3):
+            job = job_crud.create(
+                DocTransformJobCreate(source_document_id=document.id)
+            )
@@
-        for target_format in formats:
-            job = job_crud.create(DocTransformationJob(source_document_id=document.id))
+        for target_format in formats:
+            job = job_crud.create(
+                DocTransformJobCreate(source_document_id=document.id)
+            )

Adjust the import path for DocTransformJobCreate to match where the DTO is actually defined in your project.

#!/bin/bash
# Locate the DocTransformJobCreate DTO to confirm the correct import path.
rg -n "class DocTransformJobCreate" backend || rg -n "DocTransformJobCreate" backend

Also applies to: 40-42, 107-109, 154-157

🧹 Nitpick comments (1)
backend/app/services/doctransform/job.py (1)

67-107: Payload builders look correct; you can skip redundant re-validation

build_success_payload / build_failure_payload correctly wrap DocTransformationJobPublic in the common APIResponse envelope and exclude error_message from the data section, which matches the intended API shape.

Given transformed_doc is already typed as TransformedDocumentPublic, you can avoid re-validating it via model_validate and just pass it through, if you don’t need coercion:

 def build_success_payload(
     job: DocTransformationJob,
     transformed_doc: TransformedDocumentPublic,
 ) -> dict:
@@
-    transformed_public = TransformedDocumentPublic.model_validate(transformed_doc)
+    transformed_public = transformed_doc

This is optional and purely a small simplification.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between cdc34e4 and 9225aa9.

📒 Files selected for processing (2)
  • backend/app/services/doctransform/job.py (1 hunks)
  • backend/app/tests/services/doctransformer/test_service/test_integration.py (5 hunks)
🧰 Additional context used
📓 Path-based instructions (2)
**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Use type hints in Python code (Python 3.11+ project)

Files:

  • backend/app/services/doctransform/job.py
  • backend/app/tests/services/doctransformer/test_service/test_integration.py
backend/app/services/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Implement business logic services under backend/app/services/

Files:

  • backend/app/services/doctransform/job.py
🧠 Learnings (2)
📓 Common learnings
Learnt from: CR
Repo: ProjectTech4DevAI/ai-platform PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-10-08T12:05:01.317Z
Learning: Applies to backend/app/core/doctransform/**/*.py : Place document transformation utilities under backend/app/core/doctransform/
📚 Learning: 2025-10-08T12:05:01.317Z
Learnt from: CR
Repo: ProjectTech4DevAI/ai-platform PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-10-08T12:05:01.317Z
Learning: Applies to backend/app/core/doctransform/**/*.py : Place document transformation utilities under backend/app/core/doctransform/

Applied to files:

  • backend/app/services/doctransform/job.py
🧬 Code graph analysis (2)
backend/app/services/doctransform/job.py (8)
backend/app/crud/document/doc_transformation_job.py (3)
  • DocTransformationJobCrud (21-95)
  • update (73-95)
  • read_one (36-55)
backend/app/crud/document/document.py (3)
  • DocumentCrud (13-134)
  • update (97-122)
  • read_one (18-34)
backend/app/models/document.py (3)
  • Document (21-41)
  • DocTransformationJobPublic (82-87)
  • TransformedDocumentPublic (60-64)
backend/app/models/doc_transformation_job.py (4)
  • DocTransformJobUpdate (52-57)
  • TransformationStatus (11-15)
  • DocTransformationJob (18-45)
  • job_id (36-37)
backend/app/core/cloud/storage.py (1)
  • get_cloud_storage (262-279)
backend/app/celery/utils.py (1)
  • start_low_priority_job (46-71)
backend/app/utils.py (3)
  • APIResponse (30-54)
  • success_response (37-40)
  • failure_response (43-54)
backend/app/services/doctransform/registry.py (1)
  • convert_document (109-130)
backend/app/tests/services/doctransformer/test_service/test_integration.py (6)
backend/app/crud/document/doc_transformation_job.py (1)
  • DocTransformationJobCrud (21-95)
backend/app/services/doctransform/job.py (2)
  • execute_job (111-271)
  • start_job (35-64)
backend/app/models/document.py (1)
  • Document (21-41)
backend/app/models/doc_transformation_job.py (3)
  • DocTransformationJob (18-45)
  • TransformationStatus (11-15)
  • job_id (36-37)
backend/app/tests/services/doctransformer/test_service/conftest.py (1)
  • current_user (69-77)
backend/app/tests/services/doctransformer/test_service/utils.py (1)
  • MockTestTransformer (16-27)
🪛 Ruff (0.14.5)
backend/app/services/doctransform/job.py

23-23: Redefinition of unused TransformedDocumentPublic from line 21

Remove definition: TransformedDocumentPublic

(F811)


119-119: Unused function argument: task_instance

(ARG001)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: checks (3.11.7, 6)
🔇 Additional comments (2)
backend/app/tests/services/doctransformer/test_service/test_integration.py (2)

51-80: start_job / execute_job integration usage looks correct

The tests now:

  • Call start_job with job_id=job.id and callback_url=None, asserting the returned UUID matches, and
  • Invoke execute_job with job_id/source_document_id as strings, a generated task_id, and callback_url=None, while patching Session and TRANSFORMERS to keep things in-process.

This matches the new signatures in app.services.doctransform.job and validates the end-to-end workflow across multiple jobs and formats.

Also applies to: 112-132, 160-180


91-91: Drop debug print from the test to keep output clean

The line:

print("Transformed fname:", transformed_doc.fname)

adds noise to test output and isn’t asserted on. Consider removing it or replacing it with an assertion if you need to verify the value.

[ suggest_optional_refactor ]

Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (4)
backend/app/tests/services/doctransformer/test_service/test_integration.py (4)

100-106: Pass DocTransformJobCreate DTO instead of the ORM model.

Line 104 passes a DocTransformationJob ORM model to create(), but the method expects a DocTransformJobCreate payload. This type contract violation was previously flagged.


12-20: Import the missing DocTransformJobCreate DTO.

The CRUD create() method expects a DocTransformJobCreate payload, but this import is missing. This issue was previously flagged.


40-41: Pass DocTransformJobCreate DTO instead of the ORM model.

DocTransformationJobCrud.create() expects a DocTransformJobCreate payload per its signature, but the test passes a DocTransformationJob ORM model. This violates the type contract and was previously flagged.


147-151: Pass DocTransformJobCreate DTO instead of the ORM model.

Line 149 passes a DocTransformationJob ORM model to create(), but the method expects a DocTransformJobCreate payload. This type contract violation was previously flagged.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 9225aa9 and 1759942.

📒 Files selected for processing (1)
  • backend/app/tests/services/doctransformer/test_service/test_integration.py (4 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Use type hints in Python code (Python 3.11+ project)

Files:

  • backend/app/tests/services/doctransformer/test_service/test_integration.py
🧠 Learnings (2)
📓 Common learnings
Learnt from: CR
Repo: ProjectTech4DevAI/ai-platform PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-10-08T12:05:01.317Z
Learning: Applies to backend/app/core/doctransform/**/*.py : Place document transformation utilities under backend/app/core/doctransform/
📚 Learning: 2025-10-08T12:05:01.317Z
Learnt from: CR
Repo: ProjectTech4DevAI/ai-platform PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-10-08T12:05:01.317Z
Learning: Applies to backend/app/core/doctransform/**/*.py : Place document transformation utilities under backend/app/core/doctransform/

Applied to files:

  • backend/app/tests/services/doctransformer/test_service/test_integration.py
🧬 Code graph analysis (1)
backend/app/tests/services/doctransformer/test_service/test_integration.py (8)
backend/app/crud/document/doc_transformation_job.py (1)
  • DocTransformationJobCrud (21-95)
backend/app/crud/document/document.py (1)
  • DocumentCrud (13-134)
backend/app/services/doctransform/job.py (2)
  • execute_job (111-271)
  • start_job (35-64)
backend/app/models/document.py (1)
  • Document (21-41)
backend/app/models/doc_transformation_job.py (3)
  • DocTransformationJob (18-45)
  • TransformationStatus (11-15)
  • job_id (36-37)
backend/app/tests/conftest.py (1)
  • db (28-45)
backend/app/tests/services/doctransformer/test_service/conftest.py (1)
  • current_user (69-77)
backend/app/tests/services/doctransformer/test_service/utils.py (1)
  • MockTestTransformer (16-27)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: checks (3.11.7, 6)

@codecov
Copy link
Copy Markdown

codecov bot commented Nov 18, 2025

Codecov Report

❌ Patch coverage is 96.63158% with 16 lines in your changes missing coverage. Please review.

Files with missing lines Patch % Lines
backend/app/services/doctransform/job.py 90.47% 10 Missing ⚠️
backend/app/services/documents/helpers.py 93.75% 4 Missing ⚠️
backend/app/models/doc_transformation_job.py 91.66% 2 Missing ⚠️

📢 Thoughts on this report? Let us know!

Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

♻️ Duplicate comments (2)
backend/app/tests/services/doctransformer/test_job/conftest.py (1)

32-67: Missing type hint for task_instance parameter.

The task_instance parameter at line 52 still lacks a type annotation. This was flagged in a previous review.

backend/app/services/doctransform/job.py (1)

231-267: Guard against UnboundLocalError in error handling.

If UUID(job_id) fails at line 126, job_uuid is never assigned, but the except block references it at lines 234, 243, 251, and 262. This causes UnboundLocalError and masks the original failure. Initialize job_uuid: UUID | None = None before the try block and guard DB operations:

+    job_uuid: UUID | None = None
+
     try:
         job_uuid = UUID(job_id)
         ...
     except Exception as e:
         logger.error(
             "[doc_transform.execute_job] FAILED | job_id=%s | error=%s",
-            job_uuid,
+            job_uuid or job_id,
             e,
             exc_info=True,
         )
 
-        try:
-            with Session(engine) as db:
-                job_crud = DocTransformationJobCrud(session=db, project_id=project_id)
-                job_for_payload = job_crud.update(
-                    job_uuid,
-                    DocTransformJobUpdate(
-                        status=TransformationStatus.FAILED, error_message=str(e)
-                    ),
-                )
-        except Exception as db_error:
-            logger.error(
-                "[doc_transform.execute_job] failed to persist FAILED status | job_id=%s | db_error=%s",
-                job_uuid,
-                db_error,
-            )
+        if job_uuid is not None:
+            try:
+                with Session(engine) as db:
+                    job_crud = DocTransformationJobCrud(session=db, project_id=project_id)
+                    job_for_payload = job_crud.update(
+                        job_uuid,
+                        DocTransformJobUpdate(
+                            status=TransformationStatus.FAILED, error_message=str(e)
+                        ),
+                    )
+            except Exception as db_error:
+                logger.error(
+                    "[doc_transform.execute_job] failed to persist FAILED status | job_id=%s | db_error=%s",
+                    job_uuid,
+                    db_error,
+                )
🧹 Nitpick comments (4)
backend/app/tests/services/doctransformer/test_job/test_start_job.py (1)

31-35: Redundant commit after DocTransformationJobCrud.create

DocTransformationJobCrud.create already adds, commits, and refreshes the job; the extra db.commit() in _create_job is unnecessary in these tests. You can drop it to rely on the CRUD abstraction for transaction handling.

backend/app/tests/services/doctransformer/test_job/test_execute_job_errors.py (2)

30-36: Update fast_execute_job type hint to match the new execute_job signature

fast_execute_job is annotated as Callable[[int, Any, str, str], Any], but the tests now call it with eight arguments: (project_id, job_id, source_document_id, transformer_name, target_format, task_id, callback_url, task_instance). This will confuse type checkers.

Consider updating the annotation to reflect the actual signature, e.g.:

-        fast_execute_job: Callable[[int, Any, str, str], Any],
+        fast_execute_job: Callable[
+            [int, str, str, str, str, str, str | None, Any],
+            Any,
+        ],

Apply the same adjustment anywhere else this fixture is typed.

As per coding guidelines


61-71: Strengthen pytest.raises(Exception) assertions

Several tests use pytest.raises(Exception) to assert failures (e.g., storage error, exhausted retries, DB error during completion). This matches the current implementation (which re-raises a generic Exception), but it’s a weak contract and triggers Ruff’s B017 warning.

If feasible, either:

  • Raise a more specific exception type from the implementation and assert on that, or
  • At least use match="...expected message..." so the tests verify the failure mode, not just “some Exception”.

This would make the tests more robust and clear about the expected error conditions.

Also applies to: 159-169, 213-223

backend/app/services/doctransform/job.py (1)

171-172: Consider expanding the content type mapping.

The current implementation only special-cases "markdown" and falls back to "text/plain" for all other formats. Tests verify behavior for multiple formats including "text" and "html". Adding explicit mappings improves clarity:

-        content_type_map = {"markdown": "text/markdown; charset=utf-8"}
+        content_type_map = {
+            "markdown": "text/markdown; charset=utf-8",
+            "text": "text/plain; charset=utf-8",
+            "html": "text/html; charset=utf-8",
+        }
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 34405cb and ef76fc7.

📒 Files selected for processing (8)
  • backend/app/models/doc_transformation_job.py (2 hunks)
  • backend/app/services/doctransform/job.py (1 hunks)
  • backend/app/tests/services/doctransformer/test_job/conftest.py (2 hunks)
  • backend/app/tests/services/doctransformer/test_job/test_execute_job.py (9 hunks)
  • backend/app/tests/services/doctransformer/test_job/test_execute_job_errors.py (9 hunks)
  • backend/app/tests/services/doctransformer/test_job/test_integration.py (4 hunks)
  • backend/app/tests/services/doctransformer/test_job/test_start_job.py (1 hunks)
  • backend/app/tests/services/doctransformer/test_job/utils.py (1 hunks)
🧰 Additional context used
📓 Path-based instructions (3)
**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Use type hints in Python code (Python 3.11+ project)

Files:

  • backend/app/tests/services/doctransformer/test_job/utils.py
  • backend/app/models/doc_transformation_job.py
  • backend/app/tests/services/doctransformer/test_job/test_execute_job.py
  • backend/app/tests/services/doctransformer/test_job/test_start_job.py
  • backend/app/tests/services/doctransformer/test_job/test_execute_job_errors.py
  • backend/app/tests/services/doctransformer/test_job/conftest.py
  • backend/app/tests/services/doctransformer/test_job/test_integration.py
  • backend/app/services/doctransform/job.py
backend/app/models/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Define SQLModel entities (database tables and domain objects) in backend/app/models/

Files:

  • backend/app/models/doc_transformation_job.py
backend/app/services/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Implement business logic services under backend/app/services/

Files:

  • backend/app/services/doctransform/job.py
🧠 Learnings (4)
📓 Common learnings
Learnt from: CR
Repo: ProjectTech4DevAI/ai-platform PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-10-08T12:05:01.317Z
Learning: Applies to backend/app/core/doctransform/**/*.py : Place document transformation utilities under backend/app/core/doctransform/
📚 Learning: 2025-10-08T12:05:01.317Z
Learnt from: CR
Repo: ProjectTech4DevAI/ai-platform PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-10-08T12:05:01.317Z
Learning: Applies to backend/app/core/doctransform/**/*.py : Place document transformation utilities under backend/app/core/doctransform/

Applied to files:

  • backend/app/tests/services/doctransformer/test_job/utils.py
  • backend/app/models/doc_transformation_job.py
  • backend/app/tests/services/doctransformer/test_job/test_execute_job.py
  • backend/app/tests/services/doctransformer/test_job/test_integration.py
  • backend/app/services/doctransform/job.py
📚 Learning: 2025-10-08T12:05:01.317Z
Learnt from: CR
Repo: ProjectTech4DevAI/ai-platform PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-10-08T12:05:01.317Z
Learning: Applies to backend/app/core/cloud/storage.py : Implement cloud storage integration in backend/app/core/cloud/storage.py

Applied to files:

  • backend/app/tests/services/doctransformer/test_job/utils.py
📚 Learning: 2025-08-06T05:29:45.557Z
Learnt from: avirajsingh7
Repo: ProjectTech4DevAI/ai-platform PR: 319
File: backend/app/crud/prompt.py:111-118
Timestamp: 2025-08-06T05:29:45.557Z
Learning: In FastAPI applications using Pydantic/SQLModel, when update model fields have min_length=1 validation, empty strings are rejected at the validation layer before reaching CRUD functions. This means checking `if field:` vs `if field is not None:` in CRUD operations should consider the model validation constraints.

Applied to files:

  • backend/app/models/doc_transformation_job.py
🧬 Code graph analysis (8)
backend/app/tests/services/doctransformer/test_job/utils.py (1)
backend/app/services/doctransform/transformer.py (1)
  • Transformer (5-14)
backend/app/models/doc_transformation_job.py (1)
backend/app/core/util.py (1)
  • now (13-14)
backend/app/tests/services/doctransformer/test_job/test_execute_job.py (6)
backend/app/services/doctransform/registry.py (1)
  • TransformationError (7-8)
backend/app/services/doctransform/job.py (1)
  • execute_job (110-270)
backend/app/models/document.py (1)
  • Document (21-41)
backend/app/models/doc_transformation_job.py (3)
  • TransformationStatus (11-15)
  • DocTransformJobCreate (48-49)
  • job_id (36-37)
backend/app/tests/services/doctransformer/test_job/utils.py (1)
  • MockTestTransformer (16-27)
backend/app/tests/services/doctransformer/test_job/conftest.py (1)
  • fast_execute_job (32-67)
backend/app/tests/services/doctransformer/test_job/test_start_job.py (5)
backend/app/services/doctransform/job.py (1)
  • start_job (34-63)
backend/app/crud/document/doc_transformation_job.py (1)
  • DocTransformationJobCrud (21-95)
backend/app/models/doc_transformation_job.py (4)
  • DocTransformationJob (18-45)
  • TransformationStatus (11-15)
  • DocTransformJobCreate (48-49)
  • job_id (36-37)
backend/app/tests/services/doctransformer/test_job/utils.py (2)
  • DocTransformTestBase (30-69)
  • MockTestTransformer (16-27)
backend/app/tests/services/doctransformer/test_job/conftest.py (2)
  • current_user (71-79)
  • test_document (89-95)
backend/app/tests/services/doctransformer/test_job/test_execute_job_errors.py (4)
backend/app/models/doc_transformation_job.py (3)
  • TransformationStatus (11-15)
  • DocTransformJobCreate (48-49)
  • job_id (36-37)
backend/app/tests/services/doctransformer/test_job/utils.py (5)
  • DocTransformTestBase (30-69)
  • MockTestTransformer (16-27)
  • failing_convert_document (76-85)
  • create_failing_convert_document (72-87)
  • persistent_failing_convert_document (95-96)
backend/app/tests/conftest.py (1)
  • db (28-45)
backend/app/tests/services/doctransformer/test_job/conftest.py (1)
  • fast_execute_job (32-67)
backend/app/tests/services/doctransformer/test_job/conftest.py (2)
backend/app/services/doctransform/job.py (1)
  • execute_job (110-270)
backend/app/models/doc_transformation_job.py (1)
  • job_id (36-37)
backend/app/tests/services/doctransformer/test_job/test_integration.py (7)
backend/app/crud/document/doc_transformation_job.py (1)
  • DocTransformationJobCrud (21-95)
backend/app/services/doctransform/job.py (2)
  • execute_job (110-270)
  • start_job (34-63)
backend/app/models/document.py (1)
  • Document (21-41)
backend/app/models/doc_transformation_job.py (3)
  • TransformationStatus (11-15)
  • DocTransformJobCreate (48-49)
  • job_id (36-37)
backend/app/models/user.py (1)
  • UserProjectOrg (58-59)
backend/app/tests/services/doctransformer/test_job/conftest.py (1)
  • current_user (71-79)
backend/app/tests/services/doctransformer/test_job/utils.py (1)
  • MockTestTransformer (16-27)
backend/app/services/doctransform/job.py (8)
backend/app/crud/document/doc_transformation_job.py (3)
  • DocTransformationJobCrud (21-95)
  • update (73-95)
  • read_one (36-55)
backend/app/crud/document/document.py (3)
  • DocumentCrud (13-134)
  • update (97-122)
  • read_one (18-34)
backend/app/models/document.py (1)
  • Document (21-41)
backend/app/models/doc_transformation_job.py (4)
  • DocTransformJobUpdate (52-57)
  • TransformationStatus (11-15)
  • DocTransformationJob (18-45)
  • job_id (36-37)
backend/app/core/cloud/storage.py (1)
  • get_cloud_storage (262-279)
backend/app/celery/utils.py (1)
  • start_low_priority_job (46-71)
backend/app/utils.py (3)
  • APIResponse (30-54)
  • success_response (37-40)
  • failure_response (43-54)
backend/app/services/doctransform/registry.py (1)
  • convert_document (109-130)
🪛 Ruff (0.14.5)
backend/app/tests/services/doctransformer/test_job/test_execute_job.py

149-149: Do not assert blind exception: Exception

(B017)

backend/app/tests/services/doctransformer/test_job/test_execute_job_errors.py

159-159: Do not assert blind exception: Exception

(B017)

backend/app/services/doctransform/job.py

118-118: Unused function argument: task_instance

(ARG001)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: checks (3.11.7, 6)
🔇 Additional comments (14)
backend/app/tests/services/doctransformer/test_job/utils.py (1)

10-27: Import path update for Transformer looks correct

Switching to app.services.doctransform.transformer.Transformer matches the new services-based layout and keeps the mock aligned with the real ABC. No further changes needed here.

backend/app/models/doc_transformation_job.py (1)

21-57: DocTransformationJob fields and DTOs align well with CRUD usage

The new optional fields (transformed_document_id, task_id, trace_id, error_message) and timestamps are consistent with how DocTransformationJobCrud.create/update use them, and DocTransformJobCreate/DocTransformJobUpdate fit the DTO pattern for create/patch flows. The properties (job_id, job_inserted_at, job_updated_at) give a clean surface for public models without altering persistence behavior.

backend/app/tests/services/doctransformer/test_job/test_start_job.py (1)

81-105: Verify HTTPException type alignment in negative test

DocTransformationJobCrud.read_one raises an HTTP 404 when the job is missing, and this test uses pytest.raises(HTTPException) with HTTPException imported from app.core.exception_handlers. Please double-check that this is the same exception class (or an alias of) the one raised in the CRUD layer (likely FastAPI’s HTTPException); otherwise, the assertion may not catch the error.

backend/app/tests/services/doctransformer/test_job/test_integration.py (2)

32-88: End-to-end integration test matches the new workflow

The integration test now creates jobs via DocTransformJobCreate, schedules them with start_job, and executes them through execute_job using the updated arguments (job_id/source_document_id as strings plus task_id/callback_url). Patching Session, start_low_priority_job, and TRANSFORMERS is correctly targeted to the services module, and the assertions on job status, transformed document linkage, and filename look solid.


100-151: Concurrency and multi-format tests exercise key execute_job paths

The concurrent-jobs and multi-format tests both operate through DocTransformationJobCrud + execute_job, with proper patching of Session and the transformer registry. Verifying status == COMPLETED, non-null transformed_document_id, and format-specific filename extensions gives good coverage of the refactored execution pipeline across multiple jobs and output formats.

Also applies to: 153-187

backend/app/tests/services/doctransformer/test_job/conftest.py (1)

14-14: LGTM: Import updated to new service layer.

The import correctly references the refactored module path under app.services.doctransform.

backend/app/tests/services/doctransformer/test_job/test_execute_job.py (6)

14-18: LGTM: Imports updated to new service layer.

The imports correctly reference the refactored module paths and the new public model DocTransformJobCreate.


37-87: LGTM: Test updated to use new service layer and public models.

The test correctly uses DocTransformJobCreate for job creation and calls execute_job with the expanded signature, including proper string conversion of UUIDs.


91-123: LGTM: Error handling test updated correctly.

The test properly verifies behavior when a job doesn't exist, using the updated module paths and expanded signature.


126-165: LGTM: Missing document test is correct.

The test properly verifies that when a source document is missing from S3, the job transitions to FAILED status. The broad Exception catch at line 149 is appropriate here—the test should pass regardless of the specific exception type raised by the storage layer.


168-213: LGTM: Transformer error handling test is comprehensive.

The test correctly mocks a transformation failure and verifies that the error is properly caught, persisted, and propagates through the retry mechanism.


216-304: LGTM: Status transition and content type tests are well-structured.

Both tests correctly verify job status transitions and content type handling across multiple formats using the updated API surface and module paths.

backend/app/services/doctransform/job.py (2)

34-63: LGTM: Job scheduling logic is correct.

The start_job function properly captures the trace ID, updates the job record, schedules a low-priority Celery task, and returns the job UUID. The return type correctly matches the actual return value.


66-106: LGTM: Payload builders correctly structure success and failure responses.

Both functions properly construct APIResponse envelopes with appropriate data: success includes the transformed document with optional signed URL, while failure explicitly nulls the transformed document. The exclusion of error_message from the nested data envelope is correct.

Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
backend/app/tests/services/doctransformer/test_job/test_execute_job_errors.py (1)

30-35: Fix incomplete type hint for fast_execute_job parameter.

The type hint Callable[[int, Any, str, str], Any] is incomplete. According to the fixture definition in conftest.py, fast_execute_job takes 8 parameters, not 4.

This same issue appears on lines 84, 129, and 178.

As per coding guidelines.

Apply this diff to fix all occurrences:

     def test_execute_job_with_storage_error(
         self,
         db: Session,
         test_document: Tuple[Document, Project],
-        fast_execute_job: Callable[[int, Any, str, str], Any],
+        fast_execute_job: Callable[[int, str, str, str, str, str, str | None, Any], Any],
     ) -> None:
     def test_execute_job_retry_mechanism(
         self,
         db: Session,
         test_document: Tuple[Document, Project],
-        fast_execute_job: Callable[[int, Any, str, str], Any],
+        fast_execute_job: Callable[[int, str, str, str, str, str, str | None, Any], Any],
     ) -> None:
     def test_execute_job_exhausted_retries(
         self,
         db: Session,
         test_document: Tuple[Document, Project],
-        fast_execute_job: Callable[[int, Any, str, str], Any],
+        fast_execute_job: Callable[[int, str, str, str, str, str, str | None, Any], Any],
     ) -> None:
     def test_execute_job_database_error_during_completion(
         self,
         db: Session,
         test_document: Tuple[Document, Project],
-        fast_execute_job: Callable[[int, Any, str, str], Any],
+        fast_execute_job: Callable[[int, str, str, str, str, str, str | None, Any], Any],
     ) -> None:
🧹 Nitpick comments (6)
backend/app/tests/services/doctransformer/test_job/test_execute_job_errors.py (2)

15-22: Consolidate imports from the same module.

Lines 15-22 import from app.tests.services.doctransformer.test_job.utils in two separate statements. Consider consolidating them into a single import block for better readability.

Apply this diff:

-from app.tests.services.doctransformer.test_job.utils import (
-    DocTransformTestBase,
-    MockTestTransformer,
-)
-from app.tests.services.doctransformer.test_job.utils import (
-    create_failing_convert_document,
-    create_persistent_failing_convert_document,
-)
+from app.tests.services.doctransformer.test_job.utils import (
+    DocTransformTestBase,
+    MockTestTransformer,
+    create_failing_convert_document,
+    create_persistent_failing_convert_document,
+)

60-60: Consider using more specific exception types in test assertions.

Lines 60, 156, and 209 use pytest.raises(Exception) which catches any exception. While acceptable in tests, using more specific exception types would make the tests more precise and catch regressions where the wrong exception type is raised.

For example, if the code raises a specific exception type for these failure scenarios, you could update:

with pytest.raises(SpecificExceptionType):

However, if testing that any exception properly fails the job is the intended behavior, the current implementation is acceptable.

backend/app/tests/services/doctransformer/test_job/test_integration.py (1)

90-132: “Concurrent” test currently exercises sequential execution only

test_execute_job_concurrent_jobs runs three jobs one after another under a shared S3 setup and patched Session/TRANSFORMERS, which is fine for verifying that multiple jobs don’t interfere. If you actually want to exercise concurrency (threads/tasks), consider running the execute_job calls in parallel or renaming the test to avoid implying true concurrency.

backend/app/tests/services/doctransformer/test_job/test_execute_job.py (3)

95-122: Update type hints for fast_execute_job to match its actual callable signature

The fast_execute_job fixture is annotated here as Callable[[int, UUID, str, str], Any], but the fixture in conftest.py wraps execute_job and actually exposes the full eight-argument signature. This mismatch can confuse type checkers and future readers.

Consider either:

  • Updating the annotation to the full signature, or
  • Simplifying to Callable[..., Any] to avoid duplicating the exact signature in multiple places.

As per coding guidelines.


136-160: Avoid using a blind Exception in pytest.raises where possible

test_execute_job_with_missing_source_document uses with pytest.raises(Exception):, which Ruff flags (B017) as a blind exception assertion. If you know the specific exception types that can be raised for a missing source document (e.g., an HTTP- or storage-related error), it’s better to assert those explicitly, or use a small tuple of expected types, similar to other tests in this file.

If the failure mode is intentionally broad and may vary by backend, consider adding a short comment explaining why a broad Exception is acceptable here.


254-266: Docstring mentions content types, but test currently verifies only file extensions

test_execute_job_with_different_content_types describes verifying “correct content types,” but the assertions only check the transformed filename extensions. If you need to validate actual content types (e.g., S3 metadata or HTTP headers), consider extending the test to assert those as well; otherwise, adjust the docstring to reflect that this test is focused on output extensions.

Also applies to: 275-296

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between ef76fc7 and 0b0eb4c.

📒 Files selected for processing (4)
  • backend/app/tests/services/doctransformer/test_job/test_execute_job.py (9 hunks)
  • backend/app/tests/services/doctransformer/test_job/test_execute_job_errors.py (8 hunks)
  • backend/app/tests/services/doctransformer/test_job/test_integration.py (4 hunks)
  • backend/app/tests/services/doctransformer/test_job/test_start_job.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • backend/app/tests/services/doctransformer/test_job/test_start_job.py
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Use type hints in Python code (Python 3.11+ project)

Files:

  • backend/app/tests/services/doctransformer/test_job/test_integration.py
  • backend/app/tests/services/doctransformer/test_job/test_execute_job_errors.py
  • backend/app/tests/services/doctransformer/test_job/test_execute_job.py
🧠 Learnings (1)
📚 Learning: 2025-10-08T12:05:01.317Z
Learnt from: CR
Repo: ProjectTech4DevAI/ai-platform PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-10-08T12:05:01.317Z
Learning: Applies to backend/app/core/doctransform/**/*.py : Place document transformation utilities under backend/app/core/doctransform/

Applied to files:

  • backend/app/tests/services/doctransformer/test_job/test_integration.py
  • backend/app/tests/services/doctransformer/test_job/test_execute_job.py
🧬 Code graph analysis (3)
backend/app/tests/services/doctransformer/test_job/test_integration.py (6)
backend/app/crud/document/doc_transformation_job.py (1)
  • DocTransformationJobCrud (21-95)
backend/app/crud/document/document.py (1)
  • DocumentCrud (13-134)
backend/app/services/doctransform/job.py (2)
  • execute_job (110-270)
  • start_job (34-63)
backend/app/models/doc_transformation_job.py (3)
  • TransformationStatus (11-15)
  • DocTransformJobCreate (48-49)
  • job_id (36-37)
backend/app/tests/services/doctransformer/test_job/conftest.py (1)
  • current_user (71-79)
backend/app/tests/services/doctransformer/test_job/utils.py (1)
  • MockTestTransformer (16-27)
backend/app/tests/services/doctransformer/test_job/test_execute_job_errors.py (3)
backend/app/models/doc_transformation_job.py (3)
  • TransformationStatus (11-15)
  • DocTransformJobCreate (48-49)
  • job_id (36-37)
backend/app/tests/services/doctransformer/test_job/utils.py (6)
  • DocTransformTestBase (30-69)
  • MockTestTransformer (16-27)
  • failing_convert_document (76-85)
  • create_failing_convert_document (72-87)
  • persistent_failing_convert_document (95-96)
  • create_persistent_failing_convert_document (90-98)
backend/app/tests/services/doctransformer/test_job/conftest.py (1)
  • fast_execute_job (32-67)
backend/app/tests/services/doctransformer/test_job/test_execute_job.py (6)
backend/app/services/doctransform/registry.py (1)
  • TransformationError (7-8)
backend/app/services/doctransform/job.py (1)
  • execute_job (110-270)
backend/app/models/document.py (1)
  • Document (21-41)
backend/app/models/doc_transformation_job.py (3)
  • TransformationStatus (11-15)
  • DocTransformJobCreate (48-49)
  • job_id (36-37)
backend/app/tests/services/doctransformer/test_job/utils.py (1)
  • MockTestTransformer (16-27)
backend/app/tests/services/doctransformer/test_job/conftest.py (1)
  • fast_execute_job (32-67)
🪛 Ruff (0.14.5)
backend/app/tests/services/doctransformer/test_job/test_execute_job_errors.py

156-156: Do not assert blind exception: Exception

(B017)

backend/app/tests/services/doctransformer/test_job/test_execute_job.py

149-149: Do not assert blind exception: Exception

(B017)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: checks (3.11.7, 6)
🔇 Additional comments (9)
backend/app/tests/services/doctransformer/test_job/test_execute_job_errors.py (4)

42-42: LGTM! Consistent use of payload-based job creation.

The migration to DocTransformJobCreate payload for job creation is correctly applied across all four test methods (lines 42, 92, 137, 186), aligning with the updated CRUD API.


46-51: LGTM! Module paths correctly updated.

All patch paths have been consistently updated to the new app.services.doctransform.* module structure throughout the test suite (lines 46-51, 98-104, 145-151, 189-199).


61-70: LGTM! Function calls properly updated with new parameters.

All fast_execute_job calls correctly stringify IDs and include the new parameters (task_id, callback_url, task_instance) consistently throughout the test suite (lines 61-70, 109-118, 157-166, 210-219). The use of uuid4() for generating test task IDs is appropriate.


25-224: LGTM! Well-structured error handling tests.

The test suite comprehensively covers error scenarios (storage failures, transient/persistent errors, database errors) with proper mocking, assertions, and verification of both job status and error messages. The test structure follows best practices with clear AAA pattern.

backend/app/tests/services/doctransformer/test_job/test_integration.py (2)

13-24: start_job/execute_job wiring and DocTransformJobCreate usage look correct

Imports, job creation via DocTransformJobCreate, the start_low_priority_job patch target, and the start_job/execute_job calls (string IDs, task_id, callback_url, task_instance) are all consistent with the updated service API and CRUD layer. No issues spotted here.

Also applies to: 40-79


133-185: Multi-format transformation coverage is aligned with service behavior

The multi-format test using DocTransformJobCreate plus the updated execute_job signature, and asserting transformed file extensions per target_format, correctly tracks the service’s FORMAT_TO_EXTENSION behavior. Looks good.

backend/app/tests/services/doctransformer/test_job/test_execute_job.py (3)

14-21: Imports updated to new services/registry/models APIs are consistent

Switching to app.services.doctransform.job.execute_job, TransformationError from the new registry, and DocTransformJobCreate from app.models is consistent with the current service/public model layout. No issues here.


51-72: Happy-path execute_job test is correctly aligned with the new signature

Creating the job via DocTransformJobCreate, patching app.services.doctransform.job.Session and app.services.doctransform.registry.TRANSFORMERS, and calling execute_job with string IDs plus task_id, callback_url, and task_instance matches the implementation in app.services.doctransform.job.execute_job. The assertions on status, error_message, and transformed document metadata look sound.


138-159: DocTransformJobCreate usage and execute_job/fast_execute_job calls are consistently updated

Across the missing-document, transformer-error, status-transition, and content-type tests, job creation via DocTransformJobCreate and the calls to execute_job/fast_execute_job (string IDs, task_id, callback_url, task_instance) are all consistent with the updated service and CRUD contracts. The status and error assertions correctly reflect the expected outcomes.

Also applies to: 179-212, 223-245, 275-295

@nishika26 nishika26 added enhancement New feature or request ready-for-review labels Nov 18, 2025
@nishika26 nishika26 changed the title document : module enhancement Document : Module Enhancements Nov 19, 2025
@nishika26 nishika26 changed the title Document : Module Enhancements Document : moving transform job to celery and adding callback url Nov 19, 2025
job_id: UUID = Field(description="The unique identifier of the transformation job")
source_format: str = Field(description="The format of the source document")
target_format: str = Field(description="The format of the target document")
status: TransformationStatus
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

any reason why we removed the following?

    source_format: str = Field(description="The format of the source document")
    target_format: str = Field(description="The format of the target document")

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the message that goes with the response body already contains this information of the transformation that is happening from ehich source format to which target format, so i do not think we need to repeat this information

Copy link
Copy Markdown
Collaborator

@Prajna1999 Prajna1999 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generic comments:

  1. Avoid leaky abstractions.
  2. Use types in a sparse manner.
  3. Avoid excessive metadata.
  4. Use libraries instead of primitives for file I/Os

Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (1)
backend/app/alembic/versions/eed36ae3c79a_alter_doc_transform_table_for_celery.py (1)

11-11: Remove unused import.

The postgresql dialect is imported but never used in this migration.

Apply this diff:

-from sqlalchemy.dialects import postgresql
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 6dd6609 and 7317a6f.

📒 Files selected for processing (1)
  • backend/app/alembic/versions/eed36ae3c79a_alter_doc_transform_table_for_celery.py (1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Use type hints in Python code (Python 3.11+ project)

Files:

  • backend/app/alembic/versions/eed36ae3c79a_alter_doc_transform_table_for_celery.py
🧠 Learnings (1)
📓 Common learnings
Learnt from: CR
Repo: ProjectTech4DevAI/ai-platform PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-10-08T12:05:01.317Z
Learning: Applies to backend/app/core/doctransform/**/*.py : Place document transformation utilities under backend/app/core/doctransform/
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: checks (3.11.7, 6)
🔇 Additional comments (2)
backend/app/alembic/versions/eed36ae3c79a_alter_doc_transform_table_for_celery.py (2)

20-31: LGTM! Migration upgrade logic is correct.

The upgrade operations are safe and correctly implement the requirements:

  • Adding nullable task_id and trace_id columns won't affect existing rows
  • Renaming created_at to inserted_at is straightforward
  • Using AutoString is appropriate for ID fields

34-39: LGTM! Downgrade path correctly reverses all changes.

The downgrade operations properly reverse the upgrade in the correct order, ensuring the migration is fully reversible.

Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (3)
backend/app/services/doctransform/job.py (3)

34-63: Align start_job return type annotation with the actual UUID return value

start_job is annotated to return str but returns job.id (a UUID). Callers already wrap it with str(...) when needed, so changing the annotation to UUID is the least noisy and most accurate fix.

 def start_job(
     db: Session,
     current_user: CurrentUserOrgProject,
     job_id: UUID,
     transformer_name: str,
     target_format: str,
     callback_url: str | None,
-) -> str:
+) -> UUID:
@@
-    logger.info(
-        f"[start_job] Job scheduled for document transformation | id: {job.id}, project_id: {project_id}, task_id: {task_id}"
-    )
-    return job.id
+    logger.info(
+        f"[start_job] Job scheduled for document transformation | id: {job.id}, project_id: {project_id}, task_id: {task_id}"
+    )
+    return job.id

116-119: Optional: make task_instance explicitly unused to satisfy Ruff

task_instance is required by Celery but unused, which Ruff flags (ARG001). Renaming or adding a noqa makes this intent clear and keeps lint clean without changing behavior.

-def execute_job(
+def execute_job(
@@
-    task_id: str,
-    callback_url: str | None,
-    task_instance,
+    task_id: str,
+    callback_url: str | None,
+    task_instance,  # noqa: ARG001  # provided by Celery, not used directly
 ):

Alternatively, rename to _task_instance and drop the noqa.


109-267: Guard against job_uuid being undefined when UUID(job_id) fails in execute_job

If UUID(job_id) raises (e.g. malformed ID), job_uuid is never assigned, but the except block logs and uses it for DB updates, which will trigger an UnboundLocalError and mask the original failure. Initializing job_uuid and conditioning DB updates on it makes the error path safe.

 @retry(wait=wait_exponential(multiplier=5, min=5, max=10), stop=stop_after_attempt(3))
 def execute_job(
@@
-    start_time = time.time()
-    tmp_dir: Path | None = None
-
-    job_for_payload = None  # keep latest job snapshot for payloads
-
-    try:
-        job_uuid = UUID(job_id)
+    start_time = time.time()
+    tmp_dir: Path | None = None
+
+    job_for_payload = None  # keep latest job snapshot for payloads
+    job_uuid: UUID | None = None
+
+    try:
+        job_uuid = UUID(job_id)
         source_uuid = UUID(source_document_id)
@@
-    except Exception as e:
-        logger.error(
-            "[doc_transform.execute_job] FAILED | job_id=%s | error=%s",
-            job_uuid,
-            e,
-            exc_info=True,
-        )
-
-        try:
-            with Session(engine) as db:
-                job_crud = DocTransformationJobCrud(session=db, project_id=project_id)
-                job_for_payload = job_crud.update(
-                    job_uuid,
-                    DocTransformJobUpdate(
-                        status=TransformationStatus.FAILED, error_message=str(e)
-                    ),
-                )
-        except Exception as db_error:
-            logger.error(
-                "[doc_transform.execute_job] failed to persist FAILED status | job_id=%s | db_error=%s",
-                job_uuid,
-                db_error,
-            )
+    except Exception as e:
+        logger.error(
+            "[doc_transform.execute_job] FAILED | job_id=%s | error=%s",
+            job_uuid or job_id,
+            e,
+            exc_info=True,
+        )
+
+        if job_uuid is not None:
+            try:
+                with Session(engine) as db:
+                    job_crud = DocTransformationJobCrud(session=db, project_id=project_id)
+                    job_for_payload = job_crud.update(
+                        job_uuid,
+                        DocTransformJobUpdate(
+                            status=TransformationStatus.FAILED, error_message=str(e)
+                        ),
+                    )
+            except Exception as db_error:
+                logger.error(
+                    "[doc_transform.execute_job] failed to persist FAILED status | job_id=%s | db_error=%s",
+                    job_uuid,
+                    db_error,
+                )
🧹 Nitpick comments (3)
backend/app/api/routes/doc_transformation_job.py (1)

18-97: Transformation job routes correctly delegate to CRUD and helpers

The routes cleanly hand off to DocTransformationJobCrud and the build_job_schema(s) helpers, and the include_url flag is respected via get_cloud_storage. The prefix/tag choices also align these under the documents surface.

If you need deterministic ordering for jobs_not_found (e.g., for clients/tests), consider wrapping list(jobs_not_found) in sorted(...); otherwise this is fine as-is.

backend/app/services/documents/helpers.py (1)

1-3: Modernize type annotations to match Ruff and Python 3.11 style

The helpers are well-structured, but you can align with Ruff’s UP0xx suggestions and current typing best practices by:

  • Importing Iterable from collections.abc.
  • Using built-in tuple[...] instead of Tuple[...].
  • Using X | Y instead of Union[X, Y].
-from typing import Optional, Tuple, Iterable, Union
+from collections.abc import Iterable
+from typing import Optional
@@
-) -> Tuple[str, str | None]:
+) -> tuple[str, str | None]:
@@
-PublicDoc = Union[DocumentPublic, TransformedDocumentPublic]
+PublicDoc = DocumentPublic | TransformedDocumentPublic

The rest of the code can remain unchanged; this should satisfy the Ruff hints without affecting behavior.

Also applies to: 31-32, 110-137

backend/app/api/routes/documents.py (1)

41-70: Document routes: transformation integration and signed URLs look solid

  • list_docs correctly reuses build_document_schemas and conditionally instantiates storage based on include_url and the presence of documents.
  • upload_doc’s use of pre_transform_validation + schedule_transformation cleanly separates validation, persistence, and job scheduling, while still always returning a signed URL for the uploaded document.
  • doc_info now returning APIResponse[DocumentPublic | TransformedDocumentPublic] (conceptually) via build_document_schema matches how transformed docs are represented everywhere else.

As a minor style improvement, you can switch the response_model unions to the X | Y form to align with Python 3.11 and Ruff’s UP007:

-    response_model=APIResponse[list[Union[DocumentPublic, TransformedDocumentPublic]]],
+    response_model=APIResponse[list[DocumentPublic | TransformedDocumentPublic]],
@@
-    response_model=APIResponse[Union[DocumentPublic, TransformedDocumentPublic]],
+    response_model=APIResponse[DocumentPublic | TransformedDocumentPublic],

Also applies to: 72-133, 201-228

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 7317a6f and f50b5ea.

📒 Files selected for processing (7)
  • backend/app/alembic/versions/eed36ae3c79a_alter_doc_transform_table_for_celery.py (1 hunks)
  • backend/app/api/main.py (1 hunks)
  • backend/app/api/routes/doc_transformation_job.py (1 hunks)
  • backend/app/api/routes/documents.py (6 hunks)
  • backend/app/models/doc_transformation_job.py (2 hunks)
  • backend/app/services/doctransform/job.py (1 hunks)
  • backend/app/services/documents/helpers.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • backend/app/alembic/versions/eed36ae3c79a_alter_doc_transform_table_for_celery.py
🧰 Additional context used
📓 Path-based instructions (4)
**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Use type hints in Python code (Python 3.11+ project)

Files:

  • backend/app/services/documents/helpers.py
  • backend/app/api/main.py
  • backend/app/models/doc_transformation_job.py
  • backend/app/api/routes/doc_transformation_job.py
  • backend/app/services/doctransform/job.py
  • backend/app/api/routes/documents.py
backend/app/services/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Implement business logic services under backend/app/services/

Files:

  • backend/app/services/documents/helpers.py
  • backend/app/services/doctransform/job.py
backend/app/api/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Expose FastAPI REST endpoints under backend/app/api/ organized by domain

Files:

  • backend/app/api/main.py
  • backend/app/api/routes/doc_transformation_job.py
  • backend/app/api/routes/documents.py
backend/app/models/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Define SQLModel entities (database tables and domain objects) in backend/app/models/

Files:

  • backend/app/models/doc_transformation_job.py
🧠 Learnings (4)
📓 Common learnings
Learnt from: CR
Repo: ProjectTech4DevAI/ai-platform PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-10-08T12:05:01.317Z
Learning: Applies to backend/app/core/doctransform/**/*.py : Place document transformation utilities under backend/app/core/doctransform/
📚 Learning: 2025-10-08T12:05:01.317Z
Learnt from: CR
Repo: ProjectTech4DevAI/ai-platform PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-10-08T12:05:01.317Z
Learning: Applies to backend/app/core/doctransform/**/*.py : Place document transformation utilities under backend/app/core/doctransform/

Applied to files:

  • backend/app/services/documents/helpers.py
  • backend/app/api/main.py
  • backend/app/models/doc_transformation_job.py
  • backend/app/api/routes/doc_transformation_job.py
  • backend/app/services/doctransform/job.py
  • backend/app/api/routes/documents.py
📚 Learning: 2025-08-06T05:29:45.557Z
Learnt from: avirajsingh7
Repo: ProjectTech4DevAI/ai-platform PR: 319
File: backend/app/crud/prompt.py:111-118
Timestamp: 2025-08-06T05:29:45.557Z
Learning: In FastAPI applications using Pydantic/SQLModel, when update model fields have min_length=1 validation, empty strings are rejected at the validation layer before reaching CRUD functions. This means checking `if field:` vs `if field is not None:` in CRUD operations should consider the model validation constraints.

Applied to files:

  • backend/app/models/doc_transformation_job.py
📚 Learning: 2025-10-08T12:05:01.317Z
Learnt from: CR
Repo: ProjectTech4DevAI/ai-platform PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-10-08T12:05:01.317Z
Learning: Applies to backend/app/models/**/*.py : Define SQLModel entities (database tables and domain objects) in backend/app/models/

Applied to files:

  • backend/app/models/doc_transformation_job.py
🧬 Code graph analysis (6)
backend/app/services/documents/helpers.py (6)
backend/app/services/doctransform/registry.py (4)
  • get_available_transformers (75-79)
  • get_file_format (53-59)
  • is_transformation_supported (70-72)
  • resolve_transformer (82-106)
backend/app/crud/document/doc_transformation_job.py (1)
  • DocTransformationJobCrud (21-95)
backend/app/crud/document/document.py (1)
  • DocumentCrud (13-134)
backend/app/models/doc_transformation_job.py (4)
  • DocTransformJobCreate (48-51)
  • TransformationStatus (11-15)
  • DocTransformationJob (18-45)
  • job_id (36-37)
backend/app/models/document.py (5)
  • TransformationJobInfo (67-74)
  • Document (21-41)
  • DocumentPublic (44-57)
  • DocTransformationJobPublic (82-87)
  • TransformedDocumentPublic (60-64)
backend/app/services/doctransform/job.py (1)
  • start_job (34-63)
backend/app/api/main.py (1)
backend/app/tests/crud/documents/documents/test_crud_document_update.py (1)
  • documents (12-15)
backend/app/models/doc_transformation_job.py (1)
backend/app/core/util.py (1)
  • now (13-14)
backend/app/api/routes/doc_transformation_job.py (6)
backend/app/crud/document/doc_transformation_job.py (3)
  • DocTransformationJobCrud (21-95)
  • read_one (36-55)
  • read_each (57-71)
backend/app/crud/document/document.py (3)
  • DocumentCrud (13-134)
  • read_one (18-34)
  • read_each (72-95)
backend/app/models/document.py (3)
  • DocTransformationJobPublic (82-87)
  • DocTransformationJobsPublic (90-92)
  • TransformedDocumentPublic (60-64)
backend/app/utils.py (2)
  • APIResponse (30-54)
  • success_response (37-40)
backend/app/services/documents/helpers.py (2)
  • build_job_schema (146-170)
  • build_job_schemas (173-191)
backend/app/core/cloud/storage.py (1)
  • get_cloud_storage (262-279)
backend/app/services/doctransform/job.py (8)
backend/app/crud/document/doc_transformation_job.py (3)
  • DocTransformationJobCrud (21-95)
  • update (73-95)
  • read_one (36-55)
backend/app/crud/document/document.py (3)
  • DocumentCrud (13-134)
  • update (97-122)
  • read_one (18-34)
backend/app/models/document.py (1)
  • Document (21-41)
backend/app/models/doc_transformation_job.py (4)
  • DocTransformJobUpdate (54-59)
  • TransformationStatus (11-15)
  • DocTransformationJob (18-45)
  • job_id (36-37)
backend/app/core/cloud/storage.py (1)
  • get_cloud_storage (262-279)
backend/app/celery/utils.py (1)
  • start_low_priority_job (46-71)
backend/app/utils.py (3)
  • APIResponse (30-54)
  • success_response (37-40)
  • failure_response (43-54)
backend/app/services/doctransform/registry.py (1)
  • convert_document (109-130)
backend/app/api/routes/documents.py (3)
backend/app/core/cloud/storage.py (2)
  • get_signed_url (135-137)
  • get_signed_url (217-242)
backend/app/models/document.py (5)
  • Document (21-41)
  • DocumentPublic (44-57)
  • TransformedDocumentPublic (60-64)
  • DocumentUploadResponse (77-79)
  • TransformationJobInfo (67-74)
backend/app/services/documents/helpers.py (4)
  • schedule_transformation (69-107)
  • pre_transform_validation (26-66)
  • build_document_schema (119-128)
  • build_document_schemas (131-143)
🪛 Ruff (0.14.5)
backend/app/services/documents/helpers.py

1-1: Import from collections.abc instead: Iterable

Import from collections.abc

(UP035)


1-1: typing.Tuple is deprecated, use tuple instead

(UP035)


110-110: Use X | Y for type annotations

(UP007)

backend/app/services/doctransform/job.py

118-118: Unused function argument: task_instance

(ARG001)

backend/app/api/routes/documents.py

44-44: Use X | Y for type annotations

Convert to X | Y

(UP007)


204-204: Use X | Y for type annotations

(UP007)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: checks (3.11.7, 6)
🔇 Additional comments (2)
backend/app/models/doc_transformation_job.py (1)

23-59: Doc transformation job model & DTOs line up with CRUD and public schemas

The nullable fields, inserted_at default, and DTOs (create/update) are consistent with the CRUD layer and the public DocTransformationJobPublic usage (via job_id/timestamp properties). No issues from a modeling or type-hinting perspective.

backend/app/api/main.py (1)

3-26: Import reordering is safe

Reordering documents in the import list doesn’t affect router registration or behavior; everything still wires up identically.

Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (3)
backend/app/services/documents/helpers.py (3)

1-1: Modernize type imports for Python 3.11+.

The imports use deprecated typing constructs. Use built-in tuple instead of typing.Tuple, collections.abc.Iterable instead of typing.Iterable, and | syntax instead of Union and Optional.

Apply this diff:

-from typing import Optional, Tuple, Iterable, Union
+from collections.abc import Iterable

Then replace usages throughout:

  • Line 31: Tuple[str, str | None]tuple[str, str | None]
  • Line 46: Optional[str]str | None
  • Line 110: Union[DocumentPublic, TransformedDocumentPublic]DocumentPublic | TransformedDocumentPublic

119-128: Consider a Protocol or concrete type for the storage parameter.

Using object | None as the type hint for storage provides minimal type safety. Define a Protocol with a get_signed_url method or import the concrete storage class to enable proper type checking and IDE support.

Example Protocol approach:

from typing import Protocol

class StorageProtocol(Protocol):
    def get_signed_url(self, url: str) -> str: ...

def build_document_schema(
    *,
    document: Document,
    include_url: bool,
    storage: StorageProtocol | None,
) -> PublicDoc:

173-191: Consider batch-loading documents for efficiency.

The current implementation may result in N+1 queries when processing many jobs with transformed documents, as each calls doc_crud.read_one separately. If job lists grow large, consider batch-loading all transformed documents upfront and building a lookup map.

Example optimization:

def build_job_schemas(
    *,
    jobs: Iterable[DocTransformationJob],
    doc_crud: DocumentCrud,
    include_url: bool,
    storage: object | None,
) -> list[DocTransformationJobPublic]:
    jobs_list = list(jobs)
    # Batch-load all transformed documents
    doc_ids = {j.transformed_document_id for j in jobs_list if j.transformed_document_id}
    docs_map = {doc.id: doc for doc_id in doc_ids if (doc := doc_crud.read_one(doc_id))}
    
    # Build schemas using the map...
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between f50b5ea and ac79b08.

📒 Files selected for processing (1)
  • backend/app/services/documents/helpers.py (1 hunks)
🧰 Additional context used
📓 Path-based instructions (2)
**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Use type hints in Python code (Python 3.11+ project)

Files:

  • backend/app/services/documents/helpers.py
backend/app/services/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Implement business logic services under backend/app/services/

Files:

  • backend/app/services/documents/helpers.py
🧠 Learnings (2)
📓 Common learnings
Learnt from: CR
Repo: ProjectTech4DevAI/ai-platform PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-10-08T12:05:01.317Z
Learning: Applies to backend/app/core/doctransform/**/*.py : Place document transformation utilities under backend/app/core/doctransform/
📚 Learning: 2025-10-08T12:05:01.317Z
Learnt from: CR
Repo: ProjectTech4DevAI/ai-platform PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-10-08T12:05:01.317Z
Learning: Applies to backend/app/core/doctransform/**/*.py : Place document transformation utilities under backend/app/core/doctransform/

Applied to files:

  • backend/app/services/documents/helpers.py
🧬 Code graph analysis (1)
backend/app/services/documents/helpers.py (6)
backend/app/services/doctransform/registry.py (4)
  • get_available_transformers (75-79)
  • get_file_format (53-59)
  • is_transformation_supported (70-72)
  • resolve_transformer (82-106)
backend/app/crud/document/doc_transformation_job.py (1)
  • DocTransformationJobCrud (21-95)
backend/app/crud/document/document.py (1)
  • DocumentCrud (13-134)
backend/app/models/doc_transformation_job.py (4)
  • DocTransformJobCreate (48-51)
  • TransformationStatus (11-15)
  • DocTransformationJob (18-45)
  • job_id (36-37)
backend/app/models/document.py (4)
  • Document (21-41)
  • DocumentPublic (44-57)
  • DocTransformationJobPublic (82-87)
  • TransformedDocumentPublic (60-64)
backend/app/services/doctransform/job.py (1)
  • start_job (34-63)
🪛 Ruff (0.14.5)
backend/app/services/documents/helpers.py

1-1: Import from collections.abc instead: Iterable

Import from collections.abc

(UP035)


1-1: typing.Tuple is deprecated, use tuple instead

(UP035)


110-110: Use X | Y for type annotations

(UP007)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: checks (3.11.7, 6)
🔇 Additional comments (3)
backend/app/services/documents/helpers.py (3)

26-66: LGTM: Well-structured validation logic.

The pre-transformation validation properly detects source format, validates transformation support, and resolves the transformer with clear error messages.


113-116: LGTM: Clean schema conversion logic.

The function correctly determines the appropriate public schema based on whether the document is a transformation.


146-170: LGTM: Proper job schema construction.

The function correctly builds job schemas and handles the optional transformed document with appropriate error propagation from read_one.

Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (1)
backend/app/tests/api/routes/documents/test_route_document_upload.py (1)

76-76: Add return type hint to fixture.

The route fixture is missing a return type hint. As per coding guidelines, type hints should be used in Python code for this Python 3.11+ project.

Apply this diff to add the return type hint:

 @pytest.fixture
-def route():
+def route() -> Route:
     return Route("")
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between ac79b08 and 55a60f3.

📒 Files selected for processing (1)
  • backend/app/tests/api/routes/documents/test_route_document_upload.py (4 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Use type hints in Python code (Python 3.11+ project)

Files:

  • backend/app/tests/api/routes/documents/test_route_document_upload.py
🧠 Learnings (1)
📓 Common learnings
Learnt from: CR
Repo: ProjectTech4DevAI/ai-platform PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-10-08T12:05:01.317Z
Learning: Applies to backend/app/core/doctransform/**/*.py : Place document transformation utilities under backend/app/core/doctransform/
🧬 Code graph analysis (1)
backend/app/tests/api/routes/documents/test_route_document_upload.py (1)
backend/app/tests/utils/document.py (1)
  • Route (83-110)
🔇 Additional comments (2)
backend/app/tests/api/routes/documents/test_route_document_upload.py (2)

183-183: Endpoint path is consistent with actual implementation.

The status_check_url correctly uses the singular /documents/transformation/{id} form. This is consistent across the helper service (backend/app/services/documents/helpers.py:106), all related test cases, and the entire codebase. No remaining plural references exist.


154-154: Patch paths are correct and verified.

The patch decorators at lines 154, 187, and 277 correctly reference app.services.doctransform.job.start_job, which has been confirmed to exist at backend/app/services/doctransform/job.py:34. The function signature is well-defined with proper parameters (db, current_user, job_id, transformer_name, target_format, callback_url), and no references to the old path remain in the codebase. The architectural shift from app.core.doctransform.service to app.services.doctransform.job is complete and intentional across the entire codebase.

Copy link
Copy Markdown
Collaborator

@Prajna1999 Prajna1999 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (1)
backend/app/api/main.py (1)

3-27: Consider sorting imports alphabetically for consistency.

The import statements are not alphabetically sorted, while the include_router calls below (lines 31-52) are. This mismatch can make it harder to verify that all imported routers are registered.

Apply this diff to sort imports alphabetically:

 from app.api.routes import (
     api_keys,
     assistants,
+    collection_job,
     collections,
     config,
-    doc_transformation_job,
-    documents,
-    login,
-    llm,
-    organization,
-    openai_conversation,
-    project,
-    responses,
-    private,
-    threads,
-    users,
-    utils,
-    onboarding,
     credentials,
     cron,
+    doc_transformation_job,
+    documents,
     evaluation,
     fine_tuning,
+    llm,
+    login,
     model_evaluation,
-    collection_job,
+    onboarding,
+    openai_conversation,
+    organization,
+    private,
+    project,
+    responses,
+    threads,
+    users,
+    utils,
 )
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 55a60f3 and 9368563.

📒 Files selected for processing (2)
  • backend/app/api/main.py (1 hunks)
  • backend/app/models/__init__.py (1 hunks)
🧰 Additional context used
📓 Path-based instructions (3)
**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Use type hints in Python code (Python 3.11+ project)

Files:

  • backend/app/api/main.py
  • backend/app/models/__init__.py
backend/app/api/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Expose FastAPI REST endpoints under backend/app/api/ organized by domain

Files:

  • backend/app/api/main.py
backend/app/models/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Define SQLModel entities (database tables and domain objects) in backend/app/models/

Files:

  • backend/app/models/__init__.py
🧠 Learnings (2)
📓 Common learnings
Learnt from: CR
Repo: ProjectTech4DevAI/ai-platform PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-10-08T12:05:01.317Z
Learning: Applies to backend/app/core/doctransform/**/*.py : Place document transformation utilities under backend/app/core/doctransform/
📚 Learning: 2025-10-08T12:05:01.317Z
Learnt from: CR
Repo: ProjectTech4DevAI/ai-platform PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-10-08T12:05:01.317Z
Learning: Applies to backend/app/core/doctransform/**/*.py : Place document transformation utilities under backend/app/core/doctransform/

Applied to files:

  • backend/app/api/main.py
  • backend/app/models/__init__.py
🧬 Code graph analysis (2)
backend/app/api/main.py (1)
backend/app/tests/crud/documents/documents/test_crud_document_update.py (1)
  • documents (12-15)
backend/app/models/__init__.py (2)
backend/app/models/document.py (3)
  • DocTransformationJobPublic (82-87)
  • DocTransformationJobsPublic (90-92)
  • TransformedDocumentPublic (60-64)
backend/app/models/doc_transformation_job.py (4)
  • DocTransformationJob (18-45)
  • TransformationStatus (11-15)
  • DocTransformJobCreate (48-51)
  • DocTransformJobUpdate (54-59)
🔇 Additional comments (1)
backend/app/models/__init__.py (1)

45-53: Re-exports for document transformation models/DTOs look correct

The newly added imports for DocTransformationJobPublic, DocTransformationJobsPublic, TransformedDocumentPublic, DocTransformJobCreate, and DocTransformJobUpdate are consistent with the new document-transformation model/DTO structure and keep the models package’s public surface coherent. No issues from this change alone.

Also applies to: 54-59

@nishika26 nishika26 merged commit f6dc638 into main Nov 23, 2025
3 checks passed
@nishika26 nishika26 deleted the enhancement/doc_transform_to_celery branch November 23, 2025 05:37
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement New feature or request ready-for-review

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Documents : Module Enhancements Documents: Migrate transformation job to celery

3 participants