Document : moving transform job to celery and adding callback url#437
Document : moving transform job to celery and adding callback url#437
Conversation
WalkthroughMoves doctransform code from app.core.doctransform into app.services.doctransform, adds nullable task_id/trace_id and renames created_at→inserted_at via a migration, converts CRUD to DTO-based create/update, introduces document helper orchestration and public response models, updates routes and tests, and adds a new job execution service. Changes
Sequence Diagram(s)sequenceDiagram
participant Client
participant API as Documents API
participant JobCRUD as DocTransformationJob CRUD
participant Service as services.doctransform.job
participant Queue as Celery Queue
participant Worker as Celery Worker
participant DocCRUD as Document CRUD
participant Storage as Cloud Storage
participant Callback as Callback Endpoint
Client->>API: upload/create (callback_url optional)
API->>JobCRUD: create(DocTransformJobCreate)
JobCRUD-->>API: Job (PENDING)
API->>Service: start_job(job_id, transformer, target_format, callback_url)
Service->>Queue: enqueue execute_job(job_id, source_document_id, task_id, callback_url)
Queue-->>API: task_id
API-->>Client: response (transformation_job_id)
Worker->>Service: execute_job(job_id, source_document_id, task_id, callback_url)
Service->>JobCRUD: update(job_id, patch: status=PROCESSING, task_id=...)
Service->>Storage: download source
Service->>Service: convert_document(...)
Service->>Storage: upload transformed
Service->>DocCRUD: create TransformedDocument
Service->>JobCRUD: update(job_id, patch: status=COMPLETED, transformed_document_id=...)
alt success & signed URL available
Service->>Storage: get_signed_url
Service->>Callback: POST success_payload (transformed_document + signed_url)
else failure
Service->>JobCRUD: update(job_id, patch: status=FAILED, error_message=...)
Service->>Callback: POST failure_payload
end
Service->>Service: cleanup temp files
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes
Possibly related PRs
Suggested reviewers
Poem
Pre-merge checks and finishing touches❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✨ Finishing touches
🧪 Generate unit tests (beta)
📜 Recent review detailsConfiguration used: CodeRabbit UI Review profile: CHILL Plan: Pro 📒 Files selected for processing (1)
🚧 Files skipped from review as they are similar to previous changes (1)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 5
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
backend/app/models/document.py (1)
46-57: Critical: Duplicate field definition.The
signed_urlfield is defined twice inDocumentPublic(lines 46-48 and 55-57). This is a syntax error that will cause the second definition to override the first, though both are identical. One definition must be removed.Apply this diff to remove the duplicate:
class DocumentPublic(DocumentBase): id: UUID = Field(description="The unique identifier of the document") - signed_url: str | None = Field( - default=None, description="A signed URL for accessing the document" - ) inserted_at: datetime = Field( description="The timestamp when the document was inserted" ) updated_at: datetime = Field( description="The timestamp when the document was last updated" ) signed_url: str | None = Field( default=None, description="A signed URL for accessing the document" )backend/app/api/routes/documents.py (1)
67-81: Fixcallback_urlForm default – description string is being used as the valueRight now:
callback_url: str | None = Form("URL to call to report endpoint status")This makes
"URL to call to report endpoint status"the default value rather than a description. If a client omitscallback_url, downstream code will treat that literal string as a real callback URL and attempt to call it fromexecute_job, which is both incorrect and potentially noisy.Adjust to match the pattern used for
target_formatandtransformer:- callback_url: str | None = Form("URL to call to report endpoint status"), + callback_url: str | None = Form( + default=None, + description="URL to call to report endpoint status", + ),This keeps behavior aligned with the
str | Noneannotation and avoids bogus callback invocations.
🧹 Nitpick comments (15)
backend/app/tests/services/doctransformer/test_service/test_start_job.py (3)
4-7: Drop unusedTupleimport and prefer built-in generics
Tuplefromtypingis imported but never used, and Ruff flagstyping.Tupleas deprecated. You can safely remove this import; if you need tuple annotations later, prefer the built‑intuple[...]form instead.
32-36: Align_create_jobwithDocTransformationJobCrud.createDTO signature
DocTransformationJobCrud.createis designed to accept aDocTransformJobCreatepayload, but_create_jobpasses aDocTransformationJobinstance and does an extra commit. For clarity and to track the same surface tests exercise in production, consider:-from app.models import ( - Document, - DocTransformationJob, - Project, - TransformationStatus, - UserProjectOrg, -) +from app.models import ( + Document, + DocTransformationJob, + DocTransformJobCreate, + Project, + TransformationStatus, + UserProjectOrg, +) @@ - def _create_job(self, db: Session, project_id: int, source_document_id): - job = DocTransformationJob(source_document_id=source_document_id) - job = DocTransformationJobCrud(db, project_id=project_id).create(job) - db.commit() - return job + def _create_job(self, db: Session, project_id: int, source_document_id): + return DocTransformationJobCrud( + db, project_id=project_id + ).create(DocTransformJobCreate(source_document_id=source_document_id))This keeps the test helper in lockstep with the CRUD API and avoids redundant transaction management.
82-106: Optionally assert scheduler isn’t called for nonexistent jobsThe negative test correctly asserts an
HTTPExceptionfor a nonexistent job. To tighten it, you can also assert the scheduler was never invoked:- with pytest.raises(HTTPException): - with patch( - "app.services.doctransform.job.start_low_priority_job" - ) as mock_schedule: - mock_schedule.return_value = "fake-task-id" - start_job( - db=db, - current_user=current_user, - job_id=nonexistent_job_id, - transformer_name="test-transformer", - target_format="markdown", - callback_url=None, - ) + with patch( + "app.services.doctransform.job.start_low_priority_job" + ) as mock_schedule, pytest.raises(HTTPException): + mock_schedule.return_value = "fake-task-id" + start_job( + db=db, + current_user=current_user, + job_id=nonexistent_job_id, + transformer_name="test-transformer", + target_format="markdown", + callback_url=None, + ) + mock_schedule.assert_not_called()This ensures failures short‑circuit before enqueuing anything.
backend/app/api/routes/documents.py (3)
88-112: Consider initializingactual_transformerfor readability and type-checkers
actual_transformeris only defined inside theif target_format:block but later used inif target_format and actual_transformer:. This is safe at runtime due to short‑circuiting, but static analyzers may flag potential use before assignment.A small refactor makes intent clearer:
- # validate if transformation is possible or not - if target_format: + actual_transformer: str | None = None + + # validate if transformation is possible or not + if target_format: if not is_transformation_supported(source_format, target_format): ...The rest of the logic can stay as‑is;
actual_transformerwill beNonewhen no transformation is requested, and truthy when resolved.
140-146: Avoid unnecessary string cast fortransformation_job_id
TransformationJobInfo.job_idis typed asUUID, andstart_jobreturns the underlying job ID. You currently cast tostrand let Pydantic parse it back:job_info = TransformationJobInfo( message=..., job_id=str(transformation_job_id), ... status_check_url=f"/documents/transformations/{transformation_job_id}", )You can pass the UUID directly and let the f‑string handle stringification for the URL:
- job_info = TransformationJobInfo( - message=f"Document accepted for transformation from {source_format} to {target_format}.", - job_id=str(transformation_job_id), - status=TransformationStatus.PENDING, - transformer=actual_transformer, - status_check_url=f"/documents/transformations/{transformation_job_id}", - ) + job_info = TransformationJobInfo( + message=f"Document accepted for transformation from {source_format} to {target_format}.", + job_id=transformation_job_id, + status=TransformationStatus.PENDING, + transformer=actual_transformer, + status_check_url=f"/documents/transformations/{transformation_job_id}", + )This avoids redundant conversions while keeping the API the same.
227-230: UseDocumentPublic | TransformedDocumentPublicinstead ofUnion[...]Given Python 3.11+, you can simplify the response model annotation per Ruff’s suggestion:
-from typing import Union @@ - response_model=APIResponse[Union[DocumentPublic, TransformedDocumentPublic]], + response_model=APIResponse[DocumentPublic | TransformedDocumentPublic],After this change, the
Unionimport can be removed from the top of the file.backend/app/crud/document/doc_transformation_job.py (1)
26-35: DTO-basedcreate/updatelook good; note thatupdatecannot clear fieldsThe shift to DTOs for
createandupdatematches the rest of the refactor and the tests:
create(DocTransformJobCreate)cleanly constructs aDocTransformationJobfrom a typed payload.update(..., patch: DocTransformJobUpdate)withexclude_unset=True, exclude_none=Trueensures only explicitly provided, non‑Nonefields are applied, andupdated_atis bumped.Two minor notes:
- With
exclude_none=True, you cannot useDocTransformJobUpdateto intentionally clear a field (e.g., reseterror_messageback toNone). If you ever need that behavior, you’ll need a different sentinel or to dropexclude_none=True.- The log message
"update_status"is a bit misleading now thatupdatecan change more than status; consider renaming it to"update"for clarity.Functionally, the implementation aligns well with the tests.
Also applies to: 73-95
backend/app/tests/services/doctransformer/test_service/test_execute_job_errors.py (2)
41-43: Align test job creation with DTO-based CRUD API
DocTransformationJobCrud.createis designed to take aDocTransformJobCreatepayload, but these tests pass aDocTransformationJobORM model. It works at runtime because the implementation reconstructs aDocTransformationJobfrompayload.model_dump(), but it drifts from the public API and makes the tests less representative of real usage.Consider switching to the DTO for clarity and type-safety:
-from app.models import Document, Project, TransformationStatus, DocTransformationJob +from app.models import ( + Document, + Project, + TransformationStatus, + DocTransformJobCreate, +) ... -job = job_crud.create(DocTransformationJob(source_document_id=document.id)) +job = job_crud.create(DocTransformJobCreate(source_document_id=document.id))Apply the same pattern in all four tests that create a job.
Also applies to: 92-94, 138-140, 188-190
61-62: Avoid asserting on a blindExceptionin testsThese tests currently use
pytest.raises(Exception), which can mask unexpected failures and is flagged by Ruff (B017). You already know the error surface is constrained (e.g. Tenacity’sRetryErrorand storage/transformer exceptions).Tighten the expectation to the concrete exception(s) you intend to surface, for example:
from tenacity import RetryError with pytest.raises(RetryError): fast_execute_job(...)(or a tuple of specific exception types if needed).
Also applies to: 159-160
backend/app/tests/services/doctransformer/test_service/test_execute_job.py (2)
51-54: UseDocTransformJobCreatein tests to match CRUD interfaceAs in the error tests, these tests call:
job = job_crud.create(DocTransformationJob(source_document_id=document.id))while
DocTransformationJobCrud.createis typed to acceptDocTransformJobCreate. Passing the ORM model works but drifts from the intended DTO-based API and weakens type checking.Updating tests to use the DTO keeps them aligned with the public surface:
-from app.models import Document, Project, TransformationStatus, DocTransformationJob +from app.models import ( + Document, + Project, + TransformationStatus, + DocTransformJobCreate, +) ... -job = job_crud.create(DocTransformationJob(source_document_id=document.id)) +job = job_crud.create(DocTransformJobCreate(source_document_id=document.id))Apply across all job creations in this file.
Also applies to: 141-144, 183-186, 228-231, 279-282
154-164: Tightenpytest.raises(Exception)to specific error types
test_execute_job_with_missing_source_documentasserts on a bareException, which is flagged by Ruff (B017) and can hide unexpected failures.Given the retry layer (
RetryError) and storage/IO error surface, prefer asserting on the concrete exception(s) expected here, e.g.:from tenacity import RetryError with pytest.raises(RetryError): fast_execute_job(...)(or a tuple of specific well-known exceptions).
backend/app/services/doctransform/job.py (1)
34-63: Correctstart_jobreturn type and avoid redundant read
start_jobis annotated to returnstrbut currently returnsjob.id(aUUID), and it performs an extra read after an update:def start_job(... ) -> str: ... job_crud.update(job_id, DocTransformJobUpdate(trace_id=trace_id)) job = job_crud.read_one(job_id) ... return job.idTo align the type hints and avoid the redundant query, consider:
-def start_job(... ) -> str: +def start_job(... ) -> UUID: @@ - job_crud.update(job_id, DocTransformJobUpdate(trace_id=trace_id)) - job = job_crud.read_one(job_id) + job = job_crud.update(job_id, DocTransformJobUpdate(trace_id=trace_id)) @@ - return job.id + return job.idIf the external caller really needs a string, cast at the call site or change the annotation to
-> strandreturn str(job.id)consistently.backend/app/tests/api/routes/test_doc_transformation_job.py (1)
5-10: Tests nicely cover the new job APIs; consider addinginclude_urlcoverageThese tests exercise the new
/documents/transformation/{job_id}and/documents/transformation/routes, DTO-based CRUD (DocTransformJobCreate/DocTransformJobUpdate), cross-project isolation, and a range of validation errors. The assertions againstDocTransformationJobPublic/DocTransformationJobsPublicshapes look consistent with the API design.One gap you might want to close later is explicit coverage for
include_url=Trueon both endpoints to verify that signed URLs are attached when available and that failures in URL generation are handled as expected.Also applies to: 80-97, 140-366
backend/app/api/routes/doc_transformation_job.py (2)
39-53: Gracefully handle failures when generating signed URLsBoth endpoints call
get_signed_urldirectly:storage = get_cloud_storage(...) transformed_doc_schema.signed_url = storage.get_signed_url(document.object_store_url)If storage misconfiguration or the backend raises, these routes will 500 even though the underlying job/document data is fine. In
execute_jobyou already wrap signed URL generation in atry/exceptand log a warning instead of failing the whole operation.For consistency and resilience, consider mirroring that pattern here:
- if include_url: - storage = get_cloud_storage( - session=session, project_id=current_user.project_id - ) - transformed_doc_schema.signed_url = storage.get_signed_url( - document.object_store_url - ) + if include_url: + try: + storage = get_cloud_storage( + session=session, project_id=current_user.project_id + ) + transformed_doc_schema.signed_url = storage.get_signed_url( + document.object_store_url + ) + except Exception as err: # keep job fetch working even if URL generation fails + logger.warning( + "[get_transformation_job] failed to generate signed URL | job_id=%s | error=%s", + job_id, + err, + )and similarly in the multi-job loop. That way, clients still get job status and metadata even if signed URL generation has issues.
Also applies to: 80-103
67-75: Replacemin/max_lengthwithmin_items/max_itemsfor list constraint validationThe
job_idsparameter uses Query withmin_itemsandmax_itemskeyword arguments on fastapi.Query to control the minimum and maximum number of items in the list, notminandmax_length. Update line 71 to enforce the intended "at least 1, at most 100 IDs" constraint:job_ids: list[UUID] = Query( description="List of transformation job IDs", min_items=1, max_items=100 ),
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (32)
backend/app/alembic/versions/eed36ae3c79a_alter_doc_transform_table_for_celery.py(1 hunks)backend/app/api/routes/doc_transformation_job.py(1 hunks)backend/app/api/routes/documents.py(10 hunks)backend/app/api/routes/fine_tuning.py(1 hunks)backend/app/celery/tasks/job_execution.py(2 hunks)backend/app/celery/utils.py(2 hunks)backend/app/core/doctransform/service.py(0 hunks)backend/app/crud/__init__.py(1 hunks)backend/app/crud/document/doc_transformation_job.py(3 hunks)backend/app/crud/document/document.py(0 hunks)backend/app/models/__init__.py(1 hunks)backend/app/models/doc_transformation_job.py(2 hunks)backend/app/models/document.py(3 hunks)backend/app/services/collections/helpers.py(1 hunks)backend/app/services/doctransform/job.py(1 hunks)backend/app/services/doctransform/registry.py(1 hunks)backend/app/services/doctransform/zerox_transformer.py(1 hunks)backend/app/tests/api/routes/documents/test_route_document_info.py(1 hunks)backend/app/tests/api/routes/documents/test_route_document_list.py(1 hunks)backend/app/tests/api/routes/documents/test_route_document_permanent_remove.py(1 hunks)backend/app/tests/api/routes/documents/test_route_document_remove.py(1 hunks)backend/app/tests/api/routes/documents/test_route_document_upload.py(4 hunks)backend/app/tests/api/routes/test_doc_transformation_job.py(13 hunks)backend/app/tests/core/doctransformer/test_service/test_start_job.py(0 hunks)backend/app/tests/crud/documents/test_doc_transformation_job.py(6 hunks)backend/app/tests/services/doctransformer/test_registry.py(3 hunks)backend/app/tests/services/doctransformer/test_service/conftest.py(2 hunks)backend/app/tests/services/doctransformer/test_service/test_execute_job.py(9 hunks)backend/app/tests/services/doctransformer/test_service/test_execute_job_errors.py(9 hunks)backend/app/tests/services/doctransformer/test_service/test_integration.py(4 hunks)backend/app/tests/services/doctransformer/test_service/test_start_job.py(1 hunks)backend/app/tests/services/doctransformer/test_service/utils.py(1 hunks)
💤 Files with no reviewable changes (3)
- backend/app/crud/document/document.py
- backend/app/tests/core/doctransformer/test_service/test_start_job.py
- backend/app/core/doctransform/service.py
🧰 Additional context used
📓 Path-based instructions (7)
**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
Use type hints in Python code (Python 3.11+ project)
Files:
backend/app/celery/tasks/job_execution.pybackend/app/tests/services/doctransformer/test_service/utils.pybackend/app/tests/services/doctransformer/test_service/conftest.pybackend/app/crud/__init__.pybackend/app/api/routes/fine_tuning.pybackend/app/models/__init__.pybackend/app/tests/api/routes/documents/test_route_document_info.pybackend/app/api/routes/documents.pybackend/app/services/collections/helpers.pybackend/app/services/doctransform/registry.pybackend/app/alembic/versions/eed36ae3c79a_alter_doc_transform_table_for_celery.pybackend/app/tests/api/routes/documents/test_route_document_remove.pybackend/app/tests/services/doctransformer/test_service/test_integration.pybackend/app/api/routes/doc_transformation_job.pybackend/app/models/doc_transformation_job.pybackend/app/crud/document/doc_transformation_job.pybackend/app/tests/api/routes/documents/test_route_document_permanent_remove.pybackend/app/services/doctransform/zerox_transformer.pybackend/app/tests/crud/documents/test_doc_transformation_job.pybackend/app/tests/services/doctransformer/test_registry.pybackend/app/services/doctransform/job.pybackend/app/celery/utils.pybackend/app/tests/services/doctransformer/test_service/test_start_job.pybackend/app/tests/api/routes/documents/test_route_document_list.pybackend/app/tests/api/routes/documents/test_route_document_upload.pybackend/app/tests/api/routes/test_doc_transformation_job.pybackend/app/tests/services/doctransformer/test_service/test_execute_job.pybackend/app/tests/services/doctransformer/test_service/test_execute_job_errors.pybackend/app/models/document.py
backend/app/celery/tasks/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
Define Celery tasks under backend/app/celery/tasks/
Files:
backend/app/celery/tasks/job_execution.py
backend/app/celery/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
Keep Celery app configuration (priority queues, beat scheduler, workers) under backend/app/celery/
Files:
backend/app/celery/tasks/job_execution.pybackend/app/celery/utils.py
backend/app/crud/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
Implement database access operations in backend/app/crud/
Files:
backend/app/crud/__init__.pybackend/app/crud/document/doc_transformation_job.py
backend/app/api/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
Expose FastAPI REST endpoints under backend/app/api/ organized by domain
Files:
backend/app/api/routes/fine_tuning.pybackend/app/api/routes/documents.pybackend/app/api/routes/doc_transformation_job.py
backend/app/models/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
Define SQLModel entities (database tables and domain objects) in backend/app/models/
Files:
backend/app/models/__init__.pybackend/app/models/doc_transformation_job.pybackend/app/models/document.py
backend/app/services/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
Implement business logic services under backend/app/services/
Files:
backend/app/services/collections/helpers.pybackend/app/services/doctransform/registry.pybackend/app/services/doctransform/zerox_transformer.pybackend/app/services/doctransform/job.py
🧠 Learnings (6)
📓 Common learnings
Learnt from: CR
Repo: ProjectTech4DevAI/ai-platform PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-10-08T12:05:01.317Z
Learning: Applies to backend/app/core/doctransform/**/*.py : Place document transformation utilities under backend/app/core/doctransform/
📚 Learning: 2025-10-08T12:05:01.317Z
Learnt from: CR
Repo: ProjectTech4DevAI/ai-platform PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-10-08T12:05:01.317Z
Learning: Applies to backend/app/celery/**/*.py : Keep Celery app configuration (priority queues, beat scheduler, workers) under backend/app/celery/
Applied to files:
backend/app/celery/tasks/job_execution.pybackend/app/celery/utils.py
📚 Learning: 2025-10-08T12:05:01.317Z
Learnt from: CR
Repo: ProjectTech4DevAI/ai-platform PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-10-08T12:05:01.317Z
Learning: Applies to backend/app/celery/tasks/**/*.py : Define Celery tasks under backend/app/celery/tasks/
Applied to files:
backend/app/celery/tasks/job_execution.py
📚 Learning: 2025-10-08T12:05:01.317Z
Learnt from: CR
Repo: ProjectTech4DevAI/ai-platform PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-10-08T12:05:01.317Z
Learning: Applies to backend/app/core/doctransform/**/*.py : Place document transformation utilities under backend/app/core/doctransform/
Applied to files:
backend/app/tests/services/doctransformer/test_service/utils.pybackend/app/crud/__init__.pybackend/app/models/__init__.pybackend/app/api/routes/documents.pybackend/app/services/doctransform/registry.pybackend/app/tests/services/doctransformer/test_service/test_integration.pybackend/app/api/routes/doc_transformation_job.pybackend/app/models/doc_transformation_job.pybackend/app/crud/document/doc_transformation_job.pybackend/app/services/doctransform/zerox_transformer.pybackend/app/tests/crud/documents/test_doc_transformation_job.pybackend/app/tests/services/doctransformer/test_registry.pybackend/app/services/doctransform/job.pybackend/app/tests/api/routes/test_doc_transformation_job.pybackend/app/tests/services/doctransformer/test_service/test_execute_job.pybackend/app/tests/services/doctransformer/test_service/test_execute_job_errors.pybackend/app/models/document.py
📚 Learning: 2025-10-08T12:05:01.317Z
Learnt from: CR
Repo: ProjectTech4DevAI/ai-platform PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-10-08T12:05:01.317Z
Learning: Applies to backend/app/core/cloud/storage.py : Implement cloud storage integration in backend/app/core/cloud/storage.py
Applied to files:
backend/app/tests/services/doctransformer/test_service/utils.py
📚 Learning: 2025-10-08T12:05:01.317Z
Learnt from: CR
Repo: ProjectTech4DevAI/ai-platform PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-10-08T12:05:01.317Z
Learning: Applies to backend/app/crud/**/*.py : Implement database access operations in backend/app/crud/
Applied to files:
backend/app/crud/__init__.py
🧬 Code graph analysis (24)
backend/app/tests/services/doctransformer/test_service/utils.py (1)
backend/app/services/doctransform/transformer.py (1)
Transformer(5-14)
backend/app/tests/services/doctransformer/test_service/conftest.py (1)
backend/app/services/doctransform/job.py (1)
execute_job(110-272)
backend/app/crud/__init__.py (2)
backend/app/crud/document/document.py (1)
DocumentCrud(13-134)backend/app/crud/document/doc_transformation_job.py (1)
DocTransformationJobCrud(21-95)
backend/app/api/routes/fine_tuning.py (3)
backend/app/tests/crud/documents/test_doc_transformation_job.py (1)
crud(26-27)backend/app/tests/crud/documents/documents/test_crud_document_delete.py (1)
document(15-24)backend/app/crud/document/document.py (1)
DocumentCrud(13-134)
backend/app/models/__init__.py (2)
backend/app/models/document.py (5)
DocTransformationJobPublic(82-87)DocTransformationJobsPublic(90-92)TransformedDocumentPublic(60-64)DocumentUploadResponse(77-79)TransformationJobInfo(67-74)backend/app/models/doc_transformation_job.py (4)
DocTransformationJob(18-45)TransformationStatus(11-15)DocTransformJobCreate(48-49)DocTransformJobUpdate(52-57)
backend/app/tests/api/routes/documents/test_route_document_info.py (1)
backend/app/tests/utils/document.py (1)
Route(83-110)
backend/app/api/routes/documents.py (6)
backend/app/services/doctransform/registry.py (4)
get_available_transformers(75-79)get_file_format(53-59)is_transformation_supported(70-72)resolve_transformer(82-106)backend/app/crud/document/document.py (1)
DocumentCrud(13-134)backend/app/crud/document/doc_transformation_job.py (2)
DocTransformationJobCrud(21-95)create(26-34)backend/app/models/document.py (5)
Document(21-41)DocumentPublic(44-57)TransformedDocumentPublic(60-64)DocumentUploadResponse(77-79)TransformationJobInfo(67-74)backend/app/models/doc_transformation_job.py (3)
DocTransformJobCreate(48-49)TransformationStatus(11-15)job_id(36-37)backend/app/services/doctransform/job.py (1)
start_job(34-63)
backend/app/services/collections/helpers.py (2)
backend/app/tests/crud/documents/documents/test_crud_document_delete.py (1)
document(15-24)backend/app/crud/document/document.py (1)
DocumentCrud(13-134)
backend/app/services/doctransform/registry.py (2)
backend/app/services/doctransform/transformer.py (1)
Transformer(5-14)backend/app/services/doctransform/zerox_transformer.py (1)
ZeroxTransformer(12-70)
backend/app/tests/api/routes/documents/test_route_document_remove.py (1)
backend/app/tests/utils/document.py (1)
Route(83-110)
backend/app/tests/services/doctransformer/test_service/test_integration.py (5)
backend/app/crud/document/doc_transformation_job.py (1)
DocTransformationJobCrud(21-95)backend/app/services/doctransform/job.py (2)
execute_job(110-272)start_job(34-63)backend/app/models/document.py (1)
Document(21-41)backend/app/models/doc_transformation_job.py (3)
DocTransformationJob(18-45)TransformationStatus(11-15)job_id(36-37)backend/app/models/user.py (1)
UserProjectOrg(58-59)
backend/app/api/routes/doc_transformation_job.py (4)
backend/app/crud/document/document.py (3)
DocumentCrud(13-134)read_one(18-34)read_each(72-95)backend/app/models/document.py (3)
DocTransformationJobPublic(82-87)DocTransformationJobsPublic(90-92)TransformedDocumentPublic(60-64)backend/app/utils.py (2)
APIResponse(30-54)success_response(37-40)backend/app/core/cloud/storage.py (1)
get_cloud_storage(262-279)
backend/app/models/doc_transformation_job.py (1)
backend/app/core/util.py (1)
now(13-14)
backend/app/crud/document/doc_transformation_job.py (4)
backend/app/tests/crud/documents/test_doc_transformation_job.py (1)
crud(26-27)backend/app/crud/document/document.py (3)
DocumentCrud(13-134)update(97-122)read_one(18-34)backend/app/models/doc_transformation_job.py (5)
DocTransformationJob(18-45)TransformationStatus(11-15)DocTransformJobCreate(48-49)DocTransformJobUpdate(52-57)job_id(36-37)backend/app/core/util.py (1)
now(13-14)
backend/app/tests/api/routes/documents/test_route_document_permanent_remove.py (1)
backend/app/tests/utils/document.py (1)
Route(83-110)
backend/app/services/doctransform/zerox_transformer.py (1)
backend/app/services/doctransform/transformer.py (1)
Transformer(5-14)
backend/app/tests/crud/documents/test_doc_transformation_job.py (5)
backend/app/crud/document/doc_transformation_job.py (1)
DocTransformationJobCrud(21-95)backend/app/models/doc_transformation_job.py (3)
TransformationStatus(11-15)DocTransformJobCreate(48-49)DocTransformJobUpdate(52-57)backend/app/tests/conftest.py (1)
db(28-45)backend/app/tests/utils/document.py (2)
DocumentStore(53-80)project(65-66)backend/app/tests/utils/utils.py (2)
SequentialUuidGenerator(150-163)get_project(54-73)
backend/app/services/doctransform/job.py (8)
backend/app/crud/document/doc_transformation_job.py (3)
DocTransformationJobCrud(21-95)update(73-95)read_one(36-55)backend/app/crud/document/document.py (3)
DocumentCrud(13-134)update(97-122)read_one(18-34)backend/app/models/document.py (3)
Document(21-41)DocTransformationJobPublic(82-87)TransformedDocumentPublic(60-64)backend/app/models/doc_transformation_job.py (4)
DocTransformJobUpdate(52-57)TransformationStatus(11-15)DocTransformationJob(18-45)job_id(36-37)backend/app/core/cloud/storage.py (1)
get_cloud_storage(262-279)backend/app/celery/utils.py (1)
start_low_priority_job(46-71)backend/app/utils.py (3)
APIResponse(30-54)success_response(37-40)failure_response(43-54)backend/app/services/doctransform/registry.py (1)
convert_document(109-130)
backend/app/tests/services/doctransformer/test_service/test_start_job.py (7)
backend/app/services/doctransform/job.py (2)
execute_job(110-272)start_job(34-63)backend/app/crud/document/doc_transformation_job.py (1)
DocTransformationJobCrud(21-95)backend/app/models/document.py (1)
Document(21-41)backend/app/models/doc_transformation_job.py (3)
DocTransformationJob(18-45)TransformationStatus(11-15)job_id(36-37)backend/app/tests/services/doctransformer/test_service/utils.py (2)
DocTransformTestBase(30-69)MockTestTransformer(16-27)backend/app/tests/conftest.py (1)
db(28-45)backend/app/tests/services/doctransformer/test_service/conftest.py (2)
current_user(55-63)test_document(73-79)
backend/app/tests/api/routes/documents/test_route_document_upload.py (1)
backend/app/tests/utils/document.py (1)
Route(83-110)
backend/app/tests/api/routes/test_doc_transformation_job.py (3)
backend/app/crud/document/doc_transformation_job.py (3)
DocTransformationJobCrud(21-95)create(26-34)update(73-95)backend/app/models/doc_transformation_job.py (3)
TransformationStatus(11-15)DocTransformJobCreate(48-49)DocTransformJobUpdate(52-57)backend/app/tests/utils/document.py (2)
DocumentStore(53-80)put(68-73)
backend/app/tests/services/doctransformer/test_service/test_execute_job.py (5)
backend/app/services/doctransform/registry.py (1)
TransformationError(7-8)backend/app/services/doctransform/job.py (1)
execute_job(110-272)backend/app/models/doc_transformation_job.py (3)
TransformationStatus(11-15)DocTransformationJob(18-45)job_id(36-37)backend/app/tests/services/doctransformer/test_service/utils.py (1)
MockTestTransformer(16-27)backend/app/tests/services/doctransformer/test_service/conftest.py (1)
fast_execute_job(33-51)
backend/app/tests/services/doctransformer/test_service/test_execute_job_errors.py (3)
backend/app/models/doc_transformation_job.py (3)
TransformationStatus(11-15)DocTransformationJob(18-45)job_id(36-37)backend/app/tests/services/doctransformer/test_service/utils.py (5)
DocTransformTestBase(30-69)MockTestTransformer(16-27)failing_convert_document(76-85)create_failing_convert_document(72-87)persistent_failing_convert_document(95-96)backend/app/tests/services/doctransformer/test_service/conftest.py (1)
fast_execute_job(33-51)
backend/app/models/document.py (1)
backend/app/models/doc_transformation_job.py (1)
TransformationStatus(11-15)
🪛 Ruff (0.14.5)
backend/app/api/routes/documents.py
229-229: Use X | Y for type annotations
(UP007)
backend/app/services/doctransform/job.py
22-22: Redefinition of unused TransformedDocumentPublic from line 20
Remove definition: TransformedDocumentPublic
(F811)
118-118: Unused function argument: task_instance
(ARG001)
backend/app/tests/services/doctransformer/test_service/test_start_job.py
6-6: typing.Tuple is deprecated, use tuple instead
(UP035)
backend/app/tests/services/doctransformer/test_service/test_execute_job.py
154-154: Do not assert blind exception: Exception
(B017)
backend/app/tests/services/doctransformer/test_service/test_execute_job_errors.py
159-159: Do not assert blind exception: Exception
(B017)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: checks (3.11.7, 6)
🔇 Additional comments (33)
backend/app/celery/tasks/job_execution.py (1)
25-25: Documentation correctly updated to reflect module reorganization.The docstring examples have been properly updated to reference the new
app.services.doctransform.service.execute_jobpath, aligning with the broader refactoring effort that relocates document transformation logic fromapp.core.doctransformtoapp.services.doctransform.Also applies to: 50-50
backend/app/celery/utils.py (2)
18-43: Code quality and type hints look good.Both functions follow the established pattern with complete type hints (compliant with Python 3.11+ guidelines), clear docstrings, and proper parameter validation. The logging at lines 42 and 70 provides good observability for task tracking.
Also applies to: 46-71
25-25: Update docstring examples to use correct module path.The docstring examples reference
app.services.doctransform.service.execute_job, but the actual function is located atapp.services.doctransform.job.execute_job. Correct the import path in the docstring examples at lines 25 and 53 to match the actual module structure (change.serviceto.job).⛔ Skipped due to learnings
Learnt from: CR Repo: ProjectTech4DevAI/ai-platform PR: 0 File: CLAUDE.md:0-0 Timestamp: 2025-10-08T12:05:01.317Z Learning: Applies to backend/app/celery/**/*.py : Keep Celery app configuration (priority queues, beat scheduler, workers) under backend/app/celery/Learnt from: CR Repo: ProjectTech4DevAI/ai-platform PR: 0 File: CLAUDE.md:0-0 Timestamp: 2025-10-08T12:05:01.317Z Learning: Applies to backend/app/celery/tasks/**/*.py : Define Celery tasks under backend/app/celery/tasks/backend/app/models/document.py (2)
60-64: LGTM: TransformedDocumentPublic correctly extends DocumentPublic.The new
TransformedDocumentPublicclass appropriately extendsDocumentPublicwith thesource_document_idfield, supporting the transformation workflow.
82-92: LGTM: New public models for transformation jobs.
DocTransformationJobPublicandDocTransformationJobsPublicprovide a clean public API surface for transformation job responses, with proper error handling viaerror_messageand support for jobs not found.backend/app/tests/api/routes/documents/test_route_document_permanent_remove.py (1)
29-29: LGTM: Route fixture updated to base path.The change from
Route("remove")toRoute("")aligns with the broader routing refactor that shifts document-related test routes to use the base path. The tests still correctly target the permanent removal endpoint viaroute.append(document, suffix="permanent").backend/app/services/doctransform/zerox_transformer.py (1)
7-7: LGTM: Import path updated to services module.The import path change from
app.core.doctransform.transformertoapp.services.doctransform.transformeraligns with the PR's objective to relocate transformation service logic underapp.services.doctransformper the coding guideline to implement business logic services underbackend/app/services/.Note: This intentionally supersedes the previous learning about placing doctransform utilities under
app.core.doctransform/. Based on learnings.backend/app/services/collections/helpers.py (1)
11-11: LGTM: Import path updated for nested CRUD module.The import path change from
app.crud.documenttoapp.crud.document.documentreflects the CRUD module reorganization to a nested structure. No logic changes.backend/app/tests/api/routes/documents/test_route_document_info.py (1)
17-17: LGTM: Route fixture updated to base path.The change from
Route("info")toRoute("")is consistent with the broader routing refactor pattern observed across document-related test files.backend/app/tests/api/routes/documents/test_route_document_remove.py (1)
20-20: LGTM: Route fixture updated to base path.The change from
Route("remove")toRoute("")maintains consistency with the routing refactor applied across all document-related test files.backend/app/models/__init__.py (2)
35-37: LGTM: New public transformation models exported.The addition of
DocTransformationJobPublic,DocTransformationJobsPublic, andTransformedDocumentPublicexports aligns with the new public API surface for document transformation responses introduced inbackend/app/models/document.py.
44-45: LGTM: Input DTOs exported for transformation CRUD.The addition of
DocTransformJobCreateandDocTransformJobUpdateexports provides input DTOs for transformation job CRUD operations, replacing the previousDocTransformationJobsexport and aligning with the updated CRUD interface.backend/app/tests/services/doctransformer/test_service/utils.py (1)
12-12: LGTM: Import path updated to services module.The import path change from
app.core.doctransform.transformertoapp.services.doctransform.transformeris consistent with the broader refactor to relocate transformation service logic underapp.services.doctransformper coding guidelines.Based on learnings.
backend/app/tests/api/routes/documents/test_route_document_list.py (1)
24-24: LGTM!The route fixture update correctly aligns with the broader routing refactor where document list endpoints now use the base path instead of an explicit "/list" subpath.
backend/app/services/doctransform/registry.py (1)
3-4: Import paths updated correctly for services refactor.The import paths have been updated to reflect the relocation of document transformation logic from
app.core.doctransformtoapp.services.doctransform. This aligns with the coding guideline to "Implement business logic services under backend/app/services/". Note that this conflicts with a retrieved learning suggesting to place these underbackend/app/core/doctransform/, but the coding guidelines take precedence for business logic services.Based on learnings
As per coding guidelinesbackend/app/api/routes/fine_tuning.py (1)
21-21: LGTM!The import path update correctly reflects the nested CRUD module structure. DocumentCrud is now properly imported from
app.crud.document.document.backend/app/tests/services/doctransformer/test_service/conftest.py (1)
35-50: LGTM!The test fixture correctly updates import and patch paths from
app.core.doctransform.servicetoapp.services.doctransform.job, aligning with the services refactor. The patch target and function references are accurate.backend/app/crud/__init__.py (1)
9-11: LGTM!The import paths have been correctly updated to reflect the nested CRUD module structure:
DocumentCrudnow imported from.document.documentDocTransformationJobCrudnow imported from.document.doc_transformation_jobThese changes align with the broader CRUD reorganization in this PR.
backend/app/tests/api/routes/documents/test_route_document_upload.py (2)
76-76: LGTM!The route fixture update correctly aligns with the broader routing refactor where document upload now uses the base path.
154-154: LGTM!The patch targets have been correctly updated from
app.core.doctransform.service.start_jobtoapp.services.doctransform.job.start_job, aligning with the services refactor. All three test methods now reference the correct module path.Also applies to: 187-187, 277-277
backend/app/alembic/versions/eed36ae3c79a_alter_doc_transform_table_for_celery.py (1)
20-39: LGTM!The migration is well-structured:
- Adds
task_idandtrace_idcolumns (nullable AutoString) for Celery task tracking- Renames
created_attoinserted_atfor consistency- Provides a proper downgrade path that reverses all changes
The migration operations are correctly ordered and the schema changes align with the broader transformation job enhancements in this PR.
backend/app/tests/services/doctransformer/test_registry.py (1)
3-3: LGTM!The import and patch paths have been correctly updated to reflect the relocation of the registry module from
app.core.doctransform.registrytoapp.services.doctransform.registry. This aligns with the coding guideline to "Implement business logic services under backend/app/services/". Note that this conflicts with a retrieved learning suggesting to place these underbackend/app/core/doctransform/, but the coding guidelines take precedence for business logic services.Based on learnings
As per coding guidelinesAlso applies to: 23-23, 50-50
backend/app/tests/services/doctransformer/test_service/test_start_job.py (2)
38-81: Happy-path test covers scheduler integration well
test_start_job_successexercises the full flow: job creation, scheduling viastart_low_priority_job, and verifies persisted job fields plus scheduler kwargs. This is solid coverage for the new Celery-based start path.
107-195: Good coverage for format and transformer propagationThe parametrized tests for multiple
target_formatvalues and differenttransformer_namevalues assert that scheduler kwargs stay consistent and that job status remainsPENDING. This gives good confidence that the service correctly threads these parameters through to Celery.backend/app/api/routes/documents.py (1)
239-248: doc_info branching correctly picks transformed vs. original schemaThe
doc_infoendpoint now choosesDocumentPublicwhensource_document_id is NoneandTransformedDocumentPublicotherwise, with optional signed URL enrichment. This matches the model semantics for original vs. transformed documents and looks correct.backend/app/tests/crud/documents/test_doc_transformation_job.py (4)
30-73: Create-path tests align with new DTO and DB constraintsThe
TestDocTransformationJobCrudCreatetests correctly verify:
- Happy path job creation via
DocTransformJobCreate.- FK integrity by expecting
IntegrityErrorwhen the source document doesn’t exist.- Behavior when the source document is soft‑deleted: creation succeeds but later reads 404.
This suite nicely documents and enforces the intended semantics of
create.
75-130: read_one tests thoroughly cover not-found and cross-project casesThe
TestDocTransformationJobCrudReadOnetests validate:
- Successful read for existing jobs.
- 404 behavior for nonexistent IDs.
- 404 for jobs whose source document is deleted.
- 404 when reading a job under a different project.
These align with the join + project/is_deleted filters in
read_one.
132-184: read_each tests cover multi-ID, partial, empty, and cross-project behaviorThe
TestDocTransformationJobCrudReadEachclass:
- Asserts correct results for multiple job IDs.
- Confirms partial results when some IDs don’t exist.
- Handles empty input sets gracefully.
- Verifies that other projects see no jobs due to project scoping.
This gives good safety net coverage for the bulk-read path.
186-273: update tests precisely exercise status transitions and patch semantics
TestDocTransformationJobCrudUpdateStatus:
- Confirms status transitions to PROCESSING, COMPLETED (with transformed_document_id), and FAILED (with error_message).
- Verifies that updates on nonexistent jobs raise the expected
HTTPException.- Ensures that fields not present in the patch (like
error_message) are preserved, validating theexclude_unset/exclude_nonebehavior.These tests are well-aligned with the new
updateimplementation.backend/app/models/doc_transformation_job.py (1)
35-57: DTOs and helper properties align with the new workflowThe additions of:
DocTransformJobCreatewithsource_document_id,DocTransformJobUpdatewithtransformed_document_id,task_id,status,error_message, andtrace_id,- and the
job_id,job_inserted_at, andjob_updated_atpropertiesall fit cleanly with the CRUD and service-layer refactor and make the model easier to consume in public APIs and payload builders.
backend/app/tests/services/doctransformer/test_service/test_integration.py (3)
63-80: LGTM!The patch targets have been correctly updated to the new module paths, and all parameters passed to
execute_jobproperly match the expected types with appropriate UUID-to-string conversions.
95-196: Test coverage and implementation look good.The remaining test methods (
test_execute_job_concurrent_jobsandtest_multiple_format_transformations) follow the correct patterns forexecute_jobcalls with proper type conversions. The test coverage is comprehensive, validating concurrent execution and multiple format transformations.Note: Lines 107 and 155 have the same job creation issue flagged earlier (using
DocTransformationJobinstead ofDocTransformJobCreate).
51-59: No issues found in the test code.The test assertion
job.id == returned_job_idis correct. Verification confirms thatstart_job()inbackend/app/services/doctransform/job.py:63returnsjob.id, which is a UUID object. Both sides of the comparison are UUIDs, so the assertion will work as intended despite the function's incorrect type hint (-> str).Note: The type hint discrepancy exists in the service file (should return
UUID, notstr), but this does not affect test correctness.
backend/app/tests/services/doctransformer/test_service/test_integration.py
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Actionable comments posted: 2
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
backend/app/tests/services/doctransformer/test_service/conftest.py(1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
Use type hints in Python code (Python 3.11+ project)
Files:
backend/app/tests/services/doctransformer/test_service/conftest.py
🧠 Learnings (1)
📓 Common learnings
Learnt from: CR
Repo: ProjectTech4DevAI/ai-platform PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-10-08T12:05:01.317Z
Learning: Applies to backend/app/core/doctransform/**/*.py : Place document transformation utilities under backend/app/core/doctransform/
🧬 Code graph analysis (1)
backend/app/tests/services/doctransformer/test_service/conftest.py (2)
backend/app/services/doctransform/job.py (1)
execute_job(110-272)backend/app/models/doc_transformation_job.py (1)
job_id(36-37)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: checks (3.11.7, 6)
🔇 Additional comments (1)
backend/app/tests/services/doctransformer/test_service/conftest.py (1)
53-62: ****The
execute_jobfunction injob.py(line 109) is decorated with@retry, which means the__wrapped__attribute exists and is safe to access. The code correctly usesoriginal_execute_job.__wrapped__()to bypass the retry decorator during testing. No changes are needed.Likely an incorrect or invalid review comment.
backend/app/tests/services/doctransformer/test_service/conftest.py
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Actionable comments posted: 1
♻️ Duplicate comments (3)
backend/app/services/doctransform/job.py (2)
16-24: Remove duplicateTransformedDocumentPublicimport (Ruff F811)
TransformedDocumentPublicappears twice in the import tuple, which triggers a redefinition warning and will fail linting. Keep a single entry.-from app.models import ( - Document, - DocTransformJobUpdate, - TransformationStatus, - DocTransformationJobPublic, - TransformedDocumentPublic, - DocTransformationJob, - TransformedDocumentPublic, -) +from app.models import ( + Document, + DocTransformJobUpdate, + TransformationStatus, + DocTransformationJobPublic, + TransformedDocumentPublic, + DocTransformationJob, +)
110-270: Hardenexecute_joberror path and clean up lint issuesThere are a few correctness and lint issues in
execute_job:
- Potential
UnboundLocalErrorforjob_uuidinexceptIf
UUID(job_id)raises (e.g. malformed ID),job_uuidis never assigned, but theexceptblock still references it in logging and in the DB update, which will raiseUnboundLocalErrorand mask the original failure.Initialize
job_uuidbefore thetryand guard DB updates on its presence, also loggingjob_idas a fallback:-@retry(wait=wait_exponential(multiplier=5, min=5, max=10), stop=stop_after_attempt(3)) -def execute_job( +@retry(wait=wait_exponential(multiplier=5, min=5, max=10), stop=stop_after_attempt(3)) +def execute_job( @@ - task_instance, + task_instance, # noqa: ARG001 # provided by Celery, not used directly ): - start_time = time.time() - tmp_dir: Path | None = None - - job_for_payload = None # keep latest job snapshot for payloads - - try: - job_uuid = UUID(job_id) + start_time = time.time() + tmp_dir: Path | None = None + + job_for_payload = None # keep latest job snapshot for payloads + job_uuid: UUID | None = None + + try: + job_uuid = UUID(job_id) @@ - except Exception as e: - logger.error( - "[doc_transform.execute_job] FAILED | job_id=%s | error=%s", - job_uuid, - e, - exc_info=True, - ) - - try: - with Session(engine) as db: - job_crud = DocTransformationJobCrud(session=db, project_id=project_id) - job_for_payload = job_crud.update( - job_uuid, - DocTransformJobUpdate( - status=TransformationStatus.FAILED, error_message=str(e) - ), - ) - except Exception as db_error: - logger.error( - "[doc_transform.execute_job] failed to persist FAILED status | job_id=%s | db_error=%s", - job_uuid, - db_error, - ) + except Exception as e: + logger.error( + "[doc_transform.execute_job] FAILED | job_id=%s | error=%s", + job_uuid or job_id, + e, + exc_info=True, + ) + + if job_uuid is not None: + try: + with Session(engine) as db: + job_crud = DocTransformationJobCrud( + session=db, project_id=project_id + ) + job_for_payload = job_crud.update( + job_uuid, + DocTransformJobUpdate( + status=TransformationStatus.FAILED, + error_message=str(e), + ), + ) + except Exception as db_error: + logger.error( + "[doc_transform.execute_job] failed to persist FAILED status | job_id=%s | db_error=%s", + job_uuid, + db_error, + )
- Unused
task_instanceargument (Ruff ARG001)The snippet above adds
# noqa: ARG001to the parameter, making it explicit this is provided by Celery but unused.
- Content-type map can be extended for
text/htmlTests exercise multiple formats; aligning content types now avoids surprises later:
- content_type_map = {"markdown": "text/markdown; charset=utf-8"} - content_type = content_type_map.get(target_format, "text/plain") + content_type_map = { + "markdown": "text/markdown; charset=utf-8", + "text": "text/plain; charset=utf-8", + "html": "text/html; charset=utf-8", + } + content_type = content_type_map.get(target_format, "text/plain")These changes make the error handling robust and resolve outstanding lint warnings.
backend/app/tests/services/doctransformer/test_service/test_integration.py (1)
12-20: Use theDocTransformJobCreateDTO when callingDocTransformationJobCrud.create
DocTransformationJobCrud.create()is defined to accept aDocTransformJobCreatepayload, but the tests still pass aDocTransformationJobORM model:
- Line 40/41:
job_crud.create(DocTransformationJob(...))- Line 108: same pattern inside the loop
- Line 156: same pattern for multi-format jobs
This violates the CRUD interface and couples tests to the ORM layer more than necessary. It also risk-picks extra fields (
id, timestamps) into the create path.Please import and use the DTO instead, e.g.:
-from app.models import ( - Document, - DocTransformationJob, - Project, - TransformationStatus, - UserProjectOrg, -) +from app.models import ( + Document, + DocTransformationJob, + Project, + TransformationStatus, + UserProjectOrg, +) +# or wherever the DTO actually lives: +# from app.schemas import DocTransformJobCreate # adjust module as needed @@ - job_crud = DocTransformationJobCrud(session=db, project_id=project.id) - job = job_crud.create(DocTransformationJob(source_document_id=document.id)) + job_crud = DocTransformationJobCrud(session=db, project_id=project.id) + job = job_crud.create( + DocTransformJobCreate(source_document_id=document.id) + ) @@ - for i in range(3): - job = job_crud.create(DocTransformationJob(source_document_id=document.id)) + for i in range(3): + job = job_crud.create( + DocTransformJobCreate(source_document_id=document.id) + ) @@ - for target_format in formats: - job = job_crud.create(DocTransformationJob(source_document_id=document.id)) + for target_format in formats: + job = job_crud.create( + DocTransformJobCreate(source_document_id=document.id) + )Adjust the import path for
DocTransformJobCreateto match where the DTO is actually defined in your project.#!/bin/bash # Locate the DocTransformJobCreate DTO to confirm the correct import path. rg -n "class DocTransformJobCreate" backend || rg -n "DocTransformJobCreate" backendAlso applies to: 40-42, 107-109, 154-157
🧹 Nitpick comments (1)
backend/app/services/doctransform/job.py (1)
67-107: Payload builders look correct; you can skip redundant re-validation
build_success_payload/build_failure_payloadcorrectly wrapDocTransformationJobPublicin the commonAPIResponseenvelope and excludeerror_messagefrom thedatasection, which matches the intended API shape.Given
transformed_docis already typed asTransformedDocumentPublic, you can avoid re-validating it viamodel_validateand just pass it through, if you don’t need coercion:def build_success_payload( job: DocTransformationJob, transformed_doc: TransformedDocumentPublic, ) -> dict: @@ - transformed_public = TransformedDocumentPublic.model_validate(transformed_doc) + transformed_public = transformed_docThis is optional and purely a small simplification.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
backend/app/services/doctransform/job.py(1 hunks)backend/app/tests/services/doctransformer/test_service/test_integration.py(5 hunks)
🧰 Additional context used
📓 Path-based instructions (2)
**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
Use type hints in Python code (Python 3.11+ project)
Files:
backend/app/services/doctransform/job.pybackend/app/tests/services/doctransformer/test_service/test_integration.py
backend/app/services/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
Implement business logic services under backend/app/services/
Files:
backend/app/services/doctransform/job.py
🧠 Learnings (2)
📓 Common learnings
Learnt from: CR
Repo: ProjectTech4DevAI/ai-platform PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-10-08T12:05:01.317Z
Learning: Applies to backend/app/core/doctransform/**/*.py : Place document transformation utilities under backend/app/core/doctransform/
📚 Learning: 2025-10-08T12:05:01.317Z
Learnt from: CR
Repo: ProjectTech4DevAI/ai-platform PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-10-08T12:05:01.317Z
Learning: Applies to backend/app/core/doctransform/**/*.py : Place document transformation utilities under backend/app/core/doctransform/
Applied to files:
backend/app/services/doctransform/job.py
🧬 Code graph analysis (2)
backend/app/services/doctransform/job.py (8)
backend/app/crud/document/doc_transformation_job.py (3)
DocTransformationJobCrud(21-95)update(73-95)read_one(36-55)backend/app/crud/document/document.py (3)
DocumentCrud(13-134)update(97-122)read_one(18-34)backend/app/models/document.py (3)
Document(21-41)DocTransformationJobPublic(82-87)TransformedDocumentPublic(60-64)backend/app/models/doc_transformation_job.py (4)
DocTransformJobUpdate(52-57)TransformationStatus(11-15)DocTransformationJob(18-45)job_id(36-37)backend/app/core/cloud/storage.py (1)
get_cloud_storage(262-279)backend/app/celery/utils.py (1)
start_low_priority_job(46-71)backend/app/utils.py (3)
APIResponse(30-54)success_response(37-40)failure_response(43-54)backend/app/services/doctransform/registry.py (1)
convert_document(109-130)
backend/app/tests/services/doctransformer/test_service/test_integration.py (6)
backend/app/crud/document/doc_transformation_job.py (1)
DocTransformationJobCrud(21-95)backend/app/services/doctransform/job.py (2)
execute_job(111-271)start_job(35-64)backend/app/models/document.py (1)
Document(21-41)backend/app/models/doc_transformation_job.py (3)
DocTransformationJob(18-45)TransformationStatus(11-15)job_id(36-37)backend/app/tests/services/doctransformer/test_service/conftest.py (1)
current_user(69-77)backend/app/tests/services/doctransformer/test_service/utils.py (1)
MockTestTransformer(16-27)
🪛 Ruff (0.14.5)
backend/app/services/doctransform/job.py
23-23: Redefinition of unused TransformedDocumentPublic from line 21
Remove definition: TransformedDocumentPublic
(F811)
119-119: Unused function argument: task_instance
(ARG001)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: checks (3.11.7, 6)
🔇 Additional comments (2)
backend/app/tests/services/doctransformer/test_service/test_integration.py (2)
51-80: start_job / execute_job integration usage looks correctThe tests now:
- Call
start_jobwithjob_id=job.idandcallback_url=None, asserting the returned UUID matches, and- Invoke
execute_jobwithjob_id/source_document_idas strings, a generatedtask_id, andcallback_url=None, while patchingSessionandTRANSFORMERSto keep things in-process.This matches the new signatures in
app.services.doctransform.joband validates the end-to-end workflow across multiple jobs and formats.Also applies to: 112-132, 160-180
91-91: Drop debugThe line:
print("Transformed fname:", transformed_doc.fname)adds noise to test output and isn’t asserted on. Consider removing it or replacing it with an assertion if you need to verify the value.
[ suggest_optional_refactor ]
There was a problem hiding this comment.
Actionable comments posted: 1
♻️ Duplicate comments (4)
backend/app/tests/services/doctransformer/test_service/test_integration.py (4)
100-106: PassDocTransformJobCreateDTO instead of the ORM model.Line 104 passes a
DocTransformationJobORM model tocreate(), but the method expects aDocTransformJobCreatepayload. This type contract violation was previously flagged.
12-20: Import the missingDocTransformJobCreateDTO.The CRUD
create()method expects aDocTransformJobCreatepayload, but this import is missing. This issue was previously flagged.
40-41: PassDocTransformJobCreateDTO instead of the ORM model.
DocTransformationJobCrud.create()expects aDocTransformJobCreatepayload per its signature, but the test passes aDocTransformationJobORM model. This violates the type contract and was previously flagged.
147-151: PassDocTransformJobCreateDTO instead of the ORM model.Line 149 passes a
DocTransformationJobORM model tocreate(), but the method expects aDocTransformJobCreatepayload. This type contract violation was previously flagged.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
backend/app/tests/services/doctransformer/test_service/test_integration.py(4 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
Use type hints in Python code (Python 3.11+ project)
Files:
backend/app/tests/services/doctransformer/test_service/test_integration.py
🧠 Learnings (2)
📓 Common learnings
Learnt from: CR
Repo: ProjectTech4DevAI/ai-platform PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-10-08T12:05:01.317Z
Learning: Applies to backend/app/core/doctransform/**/*.py : Place document transformation utilities under backend/app/core/doctransform/
📚 Learning: 2025-10-08T12:05:01.317Z
Learnt from: CR
Repo: ProjectTech4DevAI/ai-platform PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-10-08T12:05:01.317Z
Learning: Applies to backend/app/core/doctransform/**/*.py : Place document transformation utilities under backend/app/core/doctransform/
Applied to files:
backend/app/tests/services/doctransformer/test_service/test_integration.py
🧬 Code graph analysis (1)
backend/app/tests/services/doctransformer/test_service/test_integration.py (8)
backend/app/crud/document/doc_transformation_job.py (1)
DocTransformationJobCrud(21-95)backend/app/crud/document/document.py (1)
DocumentCrud(13-134)backend/app/services/doctransform/job.py (2)
execute_job(111-271)start_job(35-64)backend/app/models/document.py (1)
Document(21-41)backend/app/models/doc_transformation_job.py (3)
DocTransformationJob(18-45)TransformationStatus(11-15)job_id(36-37)backend/app/tests/conftest.py (1)
db(28-45)backend/app/tests/services/doctransformer/test_service/conftest.py (1)
current_user(69-77)backend/app/tests/services/doctransformer/test_service/utils.py (1)
MockTestTransformer(16-27)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: checks (3.11.7, 6)
Codecov Report❌ Patch coverage is 📢 Thoughts on this report? Let us know! |
There was a problem hiding this comment.
Actionable comments posted: 0
♻️ Duplicate comments (2)
backend/app/tests/services/doctransformer/test_job/conftest.py (1)
32-67: Missing type hint fortask_instanceparameter.The
task_instanceparameter at line 52 still lacks a type annotation. This was flagged in a previous review.backend/app/services/doctransform/job.py (1)
231-267: Guard against UnboundLocalError in error handling.If
UUID(job_id)fails at line 126,job_uuidis never assigned, but theexceptblock references it at lines 234, 243, 251, and 262. This causesUnboundLocalErrorand masks the original failure. Initializejob_uuid: UUID | None = Nonebefore thetryblock and guard DB operations:+ job_uuid: UUID | None = None + try: job_uuid = UUID(job_id) ... except Exception as e: logger.error( "[doc_transform.execute_job] FAILED | job_id=%s | error=%s", - job_uuid, + job_uuid or job_id, e, exc_info=True, ) - try: - with Session(engine) as db: - job_crud = DocTransformationJobCrud(session=db, project_id=project_id) - job_for_payload = job_crud.update( - job_uuid, - DocTransformJobUpdate( - status=TransformationStatus.FAILED, error_message=str(e) - ), - ) - except Exception as db_error: - logger.error( - "[doc_transform.execute_job] failed to persist FAILED status | job_id=%s | db_error=%s", - job_uuid, - db_error, - ) + if job_uuid is not None: + try: + with Session(engine) as db: + job_crud = DocTransformationJobCrud(session=db, project_id=project_id) + job_for_payload = job_crud.update( + job_uuid, + DocTransformJobUpdate( + status=TransformationStatus.FAILED, error_message=str(e) + ), + ) + except Exception as db_error: + logger.error( + "[doc_transform.execute_job] failed to persist FAILED status | job_id=%s | db_error=%s", + job_uuid, + db_error, + )
🧹 Nitpick comments (4)
backend/app/tests/services/doctransformer/test_job/test_start_job.py (1)
31-35: Redundant commit afterDocTransformationJobCrud.create
DocTransformationJobCrud.createalready adds, commits, and refreshes the job; the extradb.commit()in_create_jobis unnecessary in these tests. You can drop it to rely on the CRUD abstraction for transaction handling.backend/app/tests/services/doctransformer/test_job/test_execute_job_errors.py (2)
30-36: Updatefast_execute_jobtype hint to match the new execute_job signature
fast_execute_jobis annotated asCallable[[int, Any, str, str], Any], but the tests now call it with eight arguments:(project_id, job_id, source_document_id, transformer_name, target_format, task_id, callback_url, task_instance). This will confuse type checkers.Consider updating the annotation to reflect the actual signature, e.g.:
- fast_execute_job: Callable[[int, Any, str, str], Any], + fast_execute_job: Callable[ + [int, str, str, str, str, str, str | None, Any], + Any, + ],Apply the same adjustment anywhere else this fixture is typed.
As per coding guidelines
61-71: Strengthenpytest.raises(Exception)assertionsSeveral tests use
pytest.raises(Exception)to assert failures (e.g., storage error, exhausted retries, DB error during completion). This matches the current implementation (which re-raises a genericException), but it’s a weak contract and triggers Ruff’s B017 warning.If feasible, either:
- Raise a more specific exception type from the implementation and assert on that, or
- At least use
match="...expected message..."so the tests verify the failure mode, not just “some Exception”.This would make the tests more robust and clear about the expected error conditions.
Also applies to: 159-169, 213-223
backend/app/services/doctransform/job.py (1)
171-172: Consider expanding the content type mapping.The current implementation only special-cases
"markdown"and falls back to"text/plain"for all other formats. Tests verify behavior for multiple formats including"text"and"html". Adding explicit mappings improves clarity:- content_type_map = {"markdown": "text/markdown; charset=utf-8"} + content_type_map = { + "markdown": "text/markdown; charset=utf-8", + "text": "text/plain; charset=utf-8", + "html": "text/html; charset=utf-8", + }
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (8)
backend/app/models/doc_transformation_job.py(2 hunks)backend/app/services/doctransform/job.py(1 hunks)backend/app/tests/services/doctransformer/test_job/conftest.py(2 hunks)backend/app/tests/services/doctransformer/test_job/test_execute_job.py(9 hunks)backend/app/tests/services/doctransformer/test_job/test_execute_job_errors.py(9 hunks)backend/app/tests/services/doctransformer/test_job/test_integration.py(4 hunks)backend/app/tests/services/doctransformer/test_job/test_start_job.py(1 hunks)backend/app/tests/services/doctransformer/test_job/utils.py(1 hunks)
🧰 Additional context used
📓 Path-based instructions (3)
**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
Use type hints in Python code (Python 3.11+ project)
Files:
backend/app/tests/services/doctransformer/test_job/utils.pybackend/app/models/doc_transformation_job.pybackend/app/tests/services/doctransformer/test_job/test_execute_job.pybackend/app/tests/services/doctransformer/test_job/test_start_job.pybackend/app/tests/services/doctransformer/test_job/test_execute_job_errors.pybackend/app/tests/services/doctransformer/test_job/conftest.pybackend/app/tests/services/doctransformer/test_job/test_integration.pybackend/app/services/doctransform/job.py
backend/app/models/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
Define SQLModel entities (database tables and domain objects) in backend/app/models/
Files:
backend/app/models/doc_transformation_job.py
backend/app/services/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
Implement business logic services under backend/app/services/
Files:
backend/app/services/doctransform/job.py
🧠 Learnings (4)
📓 Common learnings
Learnt from: CR
Repo: ProjectTech4DevAI/ai-platform PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-10-08T12:05:01.317Z
Learning: Applies to backend/app/core/doctransform/**/*.py : Place document transformation utilities under backend/app/core/doctransform/
📚 Learning: 2025-10-08T12:05:01.317Z
Learnt from: CR
Repo: ProjectTech4DevAI/ai-platform PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-10-08T12:05:01.317Z
Learning: Applies to backend/app/core/doctransform/**/*.py : Place document transformation utilities under backend/app/core/doctransform/
Applied to files:
backend/app/tests/services/doctransformer/test_job/utils.pybackend/app/models/doc_transformation_job.pybackend/app/tests/services/doctransformer/test_job/test_execute_job.pybackend/app/tests/services/doctransformer/test_job/test_integration.pybackend/app/services/doctransform/job.py
📚 Learning: 2025-10-08T12:05:01.317Z
Learnt from: CR
Repo: ProjectTech4DevAI/ai-platform PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-10-08T12:05:01.317Z
Learning: Applies to backend/app/core/cloud/storage.py : Implement cloud storage integration in backend/app/core/cloud/storage.py
Applied to files:
backend/app/tests/services/doctransformer/test_job/utils.py
📚 Learning: 2025-08-06T05:29:45.557Z
Learnt from: avirajsingh7
Repo: ProjectTech4DevAI/ai-platform PR: 319
File: backend/app/crud/prompt.py:111-118
Timestamp: 2025-08-06T05:29:45.557Z
Learning: In FastAPI applications using Pydantic/SQLModel, when update model fields have min_length=1 validation, empty strings are rejected at the validation layer before reaching CRUD functions. This means checking `if field:` vs `if field is not None:` in CRUD operations should consider the model validation constraints.
Applied to files:
backend/app/models/doc_transformation_job.py
🧬 Code graph analysis (8)
backend/app/tests/services/doctransformer/test_job/utils.py (1)
backend/app/services/doctransform/transformer.py (1)
Transformer(5-14)
backend/app/models/doc_transformation_job.py (1)
backend/app/core/util.py (1)
now(13-14)
backend/app/tests/services/doctransformer/test_job/test_execute_job.py (6)
backend/app/services/doctransform/registry.py (1)
TransformationError(7-8)backend/app/services/doctransform/job.py (1)
execute_job(110-270)backend/app/models/document.py (1)
Document(21-41)backend/app/models/doc_transformation_job.py (3)
TransformationStatus(11-15)DocTransformJobCreate(48-49)job_id(36-37)backend/app/tests/services/doctransformer/test_job/utils.py (1)
MockTestTransformer(16-27)backend/app/tests/services/doctransformer/test_job/conftest.py (1)
fast_execute_job(32-67)
backend/app/tests/services/doctransformer/test_job/test_start_job.py (5)
backend/app/services/doctransform/job.py (1)
start_job(34-63)backend/app/crud/document/doc_transformation_job.py (1)
DocTransformationJobCrud(21-95)backend/app/models/doc_transformation_job.py (4)
DocTransformationJob(18-45)TransformationStatus(11-15)DocTransformJobCreate(48-49)job_id(36-37)backend/app/tests/services/doctransformer/test_job/utils.py (2)
DocTransformTestBase(30-69)MockTestTransformer(16-27)backend/app/tests/services/doctransformer/test_job/conftest.py (2)
current_user(71-79)test_document(89-95)
backend/app/tests/services/doctransformer/test_job/test_execute_job_errors.py (4)
backend/app/models/doc_transformation_job.py (3)
TransformationStatus(11-15)DocTransformJobCreate(48-49)job_id(36-37)backend/app/tests/services/doctransformer/test_job/utils.py (5)
DocTransformTestBase(30-69)MockTestTransformer(16-27)failing_convert_document(76-85)create_failing_convert_document(72-87)persistent_failing_convert_document(95-96)backend/app/tests/conftest.py (1)
db(28-45)backend/app/tests/services/doctransformer/test_job/conftest.py (1)
fast_execute_job(32-67)
backend/app/tests/services/doctransformer/test_job/conftest.py (2)
backend/app/services/doctransform/job.py (1)
execute_job(110-270)backend/app/models/doc_transformation_job.py (1)
job_id(36-37)
backend/app/tests/services/doctransformer/test_job/test_integration.py (7)
backend/app/crud/document/doc_transformation_job.py (1)
DocTransformationJobCrud(21-95)backend/app/services/doctransform/job.py (2)
execute_job(110-270)start_job(34-63)backend/app/models/document.py (1)
Document(21-41)backend/app/models/doc_transformation_job.py (3)
TransformationStatus(11-15)DocTransformJobCreate(48-49)job_id(36-37)backend/app/models/user.py (1)
UserProjectOrg(58-59)backend/app/tests/services/doctransformer/test_job/conftest.py (1)
current_user(71-79)backend/app/tests/services/doctransformer/test_job/utils.py (1)
MockTestTransformer(16-27)
backend/app/services/doctransform/job.py (8)
backend/app/crud/document/doc_transformation_job.py (3)
DocTransformationJobCrud(21-95)update(73-95)read_one(36-55)backend/app/crud/document/document.py (3)
DocumentCrud(13-134)update(97-122)read_one(18-34)backend/app/models/document.py (1)
Document(21-41)backend/app/models/doc_transformation_job.py (4)
DocTransformJobUpdate(52-57)TransformationStatus(11-15)DocTransformationJob(18-45)job_id(36-37)backend/app/core/cloud/storage.py (1)
get_cloud_storage(262-279)backend/app/celery/utils.py (1)
start_low_priority_job(46-71)backend/app/utils.py (3)
APIResponse(30-54)success_response(37-40)failure_response(43-54)backend/app/services/doctransform/registry.py (1)
convert_document(109-130)
🪛 Ruff (0.14.5)
backend/app/tests/services/doctransformer/test_job/test_execute_job.py
149-149: Do not assert blind exception: Exception
(B017)
backend/app/tests/services/doctransformer/test_job/test_execute_job_errors.py
159-159: Do not assert blind exception: Exception
(B017)
backend/app/services/doctransform/job.py
118-118: Unused function argument: task_instance
(ARG001)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: checks (3.11.7, 6)
🔇 Additional comments (14)
backend/app/tests/services/doctransformer/test_job/utils.py (1)
10-27: Import path update forTransformerlooks correctSwitching to
app.services.doctransform.transformer.Transformermatches the new services-based layout and keeps the mock aligned with the real ABC. No further changes needed here.backend/app/models/doc_transformation_job.py (1)
21-57: DocTransformationJob fields and DTOs align well with CRUD usageThe new optional fields (
transformed_document_id,task_id,trace_id,error_message) and timestamps are consistent with howDocTransformationJobCrud.create/updateuse them, andDocTransformJobCreate/DocTransformJobUpdatefit the DTO pattern for create/patch flows. The properties (job_id,job_inserted_at,job_updated_at) give a clean surface for public models without altering persistence behavior.backend/app/tests/services/doctransformer/test_job/test_start_job.py (1)
81-105: VerifyHTTPExceptiontype alignment in negative test
DocTransformationJobCrud.read_oneraises an HTTP 404 when the job is missing, and this test usespytest.raises(HTTPException)withHTTPExceptionimported fromapp.core.exception_handlers. Please double-check that this is the same exception class (or an alias of) the one raised in the CRUD layer (likely FastAPI’sHTTPException); otherwise, the assertion may not catch the error.backend/app/tests/services/doctransformer/test_job/test_integration.py (2)
32-88: End-to-end integration test matches the new workflowThe integration test now creates jobs via
DocTransformJobCreate, schedules them withstart_job, and executes them throughexecute_jobusing the updated arguments (job_id/source_document_idas strings plustask_id/callback_url). PatchingSession,start_low_priority_job, andTRANSFORMERSis correctly targeted to the services module, and the assertions on job status, transformed document linkage, and filename look solid.
100-151: Concurrency and multi-format tests exercise key execute_job pathsThe concurrent-jobs and multi-format tests both operate through
DocTransformationJobCrud+execute_job, with proper patching ofSessionand the transformer registry. Verifyingstatus == COMPLETED, non-nulltransformed_document_id, and format-specific filename extensions gives good coverage of the refactored execution pipeline across multiple jobs and output formats.Also applies to: 153-187
backend/app/tests/services/doctransformer/test_job/conftest.py (1)
14-14: LGTM: Import updated to new service layer.The import correctly references the refactored module path under
app.services.doctransform.backend/app/tests/services/doctransformer/test_job/test_execute_job.py (6)
14-18: LGTM: Imports updated to new service layer.The imports correctly reference the refactored module paths and the new public model
DocTransformJobCreate.
37-87: LGTM: Test updated to use new service layer and public models.The test correctly uses
DocTransformJobCreatefor job creation and callsexecute_jobwith the expanded signature, including proper string conversion of UUIDs.
91-123: LGTM: Error handling test updated correctly.The test properly verifies behavior when a job doesn't exist, using the updated module paths and expanded signature.
126-165: LGTM: Missing document test is correct.The test properly verifies that when a source document is missing from S3, the job transitions to FAILED status. The broad
Exceptioncatch at line 149 is appropriate here—the test should pass regardless of the specific exception type raised by the storage layer.
168-213: LGTM: Transformer error handling test is comprehensive.The test correctly mocks a transformation failure and verifies that the error is properly caught, persisted, and propagates through the retry mechanism.
216-304: LGTM: Status transition and content type tests are well-structured.Both tests correctly verify job status transitions and content type handling across multiple formats using the updated API surface and module paths.
backend/app/services/doctransform/job.py (2)
34-63: LGTM: Job scheduling logic is correct.The
start_jobfunction properly captures the trace ID, updates the job record, schedules a low-priority Celery task, and returns the job UUID. The return type correctly matches the actual return value.
66-106: LGTM: Payload builders correctly structure success and failure responses.Both functions properly construct APIResponse envelopes with appropriate data: success includes the transformed document with optional signed URL, while failure explicitly nulls the transformed document. The exclusion of
error_messagefrom the nested data envelope is correct.
There was a problem hiding this comment.
Actionable comments posted: 0
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
backend/app/tests/services/doctransformer/test_job/test_execute_job_errors.py (1)
30-35: Fix incomplete type hint forfast_execute_jobparameter.The type hint
Callable[[int, Any, str, str], Any]is incomplete. According to the fixture definition in conftest.py,fast_execute_jobtakes 8 parameters, not 4.This same issue appears on lines 84, 129, and 178.
As per coding guidelines.
Apply this diff to fix all occurrences:
def test_execute_job_with_storage_error( self, db: Session, test_document: Tuple[Document, Project], - fast_execute_job: Callable[[int, Any, str, str], Any], + fast_execute_job: Callable[[int, str, str, str, str, str, str | None, Any], Any], ) -> None:def test_execute_job_retry_mechanism( self, db: Session, test_document: Tuple[Document, Project], - fast_execute_job: Callable[[int, Any, str, str], Any], + fast_execute_job: Callable[[int, str, str, str, str, str, str | None, Any], Any], ) -> None:def test_execute_job_exhausted_retries( self, db: Session, test_document: Tuple[Document, Project], - fast_execute_job: Callable[[int, Any, str, str], Any], + fast_execute_job: Callable[[int, str, str, str, str, str, str | None, Any], Any], ) -> None:def test_execute_job_database_error_during_completion( self, db: Session, test_document: Tuple[Document, Project], - fast_execute_job: Callable[[int, Any, str, str], Any], + fast_execute_job: Callable[[int, str, str, str, str, str, str | None, Any], Any], ) -> None:
🧹 Nitpick comments (6)
backend/app/tests/services/doctransformer/test_job/test_execute_job_errors.py (2)
15-22: Consolidate imports from the same module.Lines 15-22 import from
app.tests.services.doctransformer.test_job.utilsin two separate statements. Consider consolidating them into a single import block for better readability.Apply this diff:
-from app.tests.services.doctransformer.test_job.utils import ( - DocTransformTestBase, - MockTestTransformer, -) -from app.tests.services.doctransformer.test_job.utils import ( - create_failing_convert_document, - create_persistent_failing_convert_document, -) +from app.tests.services.doctransformer.test_job.utils import ( + DocTransformTestBase, + MockTestTransformer, + create_failing_convert_document, + create_persistent_failing_convert_document, +)
60-60: Consider using more specific exception types in test assertions.Lines 60, 156, and 209 use
pytest.raises(Exception)which catches any exception. While acceptable in tests, using more specific exception types would make the tests more precise and catch regressions where the wrong exception type is raised.For example, if the code raises a specific exception type for these failure scenarios, you could update:
with pytest.raises(SpecificExceptionType):However, if testing that any exception properly fails the job is the intended behavior, the current implementation is acceptable.
backend/app/tests/services/doctransformer/test_job/test_integration.py (1)
90-132: “Concurrent” test currently exercises sequential execution only
test_execute_job_concurrent_jobsruns three jobs one after another under a shared S3 setup and patchedSession/TRANSFORMERS, which is fine for verifying that multiple jobs don’t interfere. If you actually want to exercise concurrency (threads/tasks), consider running theexecute_jobcalls in parallel or renaming the test to avoid implying true concurrency.backend/app/tests/services/doctransformer/test_job/test_execute_job.py (3)
95-122: Update type hints for fast_execute_job to match its actual callable signatureThe
fast_execute_jobfixture is annotated here asCallable[[int, UUID, str, str], Any], but the fixture inconftest.pywrapsexecute_joband actually exposes the full eight-argument signature. This mismatch can confuse type checkers and future readers.Consider either:
- Updating the annotation to the full signature, or
- Simplifying to
Callable[..., Any]to avoid duplicating the exact signature in multiple places.As per coding guidelines.
136-160: Avoid using a blindExceptionin pytest.raises where possible
test_execute_job_with_missing_source_documentuseswith pytest.raises(Exception):, which Ruff flags (B017) as a blind exception assertion. If you know the specific exception types that can be raised for a missing source document (e.g., an HTTP- or storage-related error), it’s better to assert those explicitly, or use a small tuple of expected types, similar to other tests in this file.If the failure mode is intentionally broad and may vary by backend, consider adding a short comment explaining why a broad
Exceptionis acceptable here.
254-266: Docstring mentions content types, but test currently verifies only file extensions
test_execute_job_with_different_content_typesdescribes verifying “correct content types,” but the assertions only check the transformed filename extensions. If you need to validate actual content types (e.g., S3 metadata or HTTP headers), consider extending the test to assert those as well; otherwise, adjust the docstring to reflect that this test is focused on output extensions.Also applies to: 275-296
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
backend/app/tests/services/doctransformer/test_job/test_execute_job.py(9 hunks)backend/app/tests/services/doctransformer/test_job/test_execute_job_errors.py(8 hunks)backend/app/tests/services/doctransformer/test_job/test_integration.py(4 hunks)backend/app/tests/services/doctransformer/test_job/test_start_job.py(1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- backend/app/tests/services/doctransformer/test_job/test_start_job.py
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
Use type hints in Python code (Python 3.11+ project)
Files:
backend/app/tests/services/doctransformer/test_job/test_integration.pybackend/app/tests/services/doctransformer/test_job/test_execute_job_errors.pybackend/app/tests/services/doctransformer/test_job/test_execute_job.py
🧠 Learnings (1)
📚 Learning: 2025-10-08T12:05:01.317Z
Learnt from: CR
Repo: ProjectTech4DevAI/ai-platform PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-10-08T12:05:01.317Z
Learning: Applies to backend/app/core/doctransform/**/*.py : Place document transformation utilities under backend/app/core/doctransform/
Applied to files:
backend/app/tests/services/doctransformer/test_job/test_integration.pybackend/app/tests/services/doctransformer/test_job/test_execute_job.py
🧬 Code graph analysis (3)
backend/app/tests/services/doctransformer/test_job/test_integration.py (6)
backend/app/crud/document/doc_transformation_job.py (1)
DocTransformationJobCrud(21-95)backend/app/crud/document/document.py (1)
DocumentCrud(13-134)backend/app/services/doctransform/job.py (2)
execute_job(110-270)start_job(34-63)backend/app/models/doc_transformation_job.py (3)
TransformationStatus(11-15)DocTransformJobCreate(48-49)job_id(36-37)backend/app/tests/services/doctransformer/test_job/conftest.py (1)
current_user(71-79)backend/app/tests/services/doctransformer/test_job/utils.py (1)
MockTestTransformer(16-27)
backend/app/tests/services/doctransformer/test_job/test_execute_job_errors.py (3)
backend/app/models/doc_transformation_job.py (3)
TransformationStatus(11-15)DocTransformJobCreate(48-49)job_id(36-37)backend/app/tests/services/doctransformer/test_job/utils.py (6)
DocTransformTestBase(30-69)MockTestTransformer(16-27)failing_convert_document(76-85)create_failing_convert_document(72-87)persistent_failing_convert_document(95-96)create_persistent_failing_convert_document(90-98)backend/app/tests/services/doctransformer/test_job/conftest.py (1)
fast_execute_job(32-67)
backend/app/tests/services/doctransformer/test_job/test_execute_job.py (6)
backend/app/services/doctransform/registry.py (1)
TransformationError(7-8)backend/app/services/doctransform/job.py (1)
execute_job(110-270)backend/app/models/document.py (1)
Document(21-41)backend/app/models/doc_transformation_job.py (3)
TransformationStatus(11-15)DocTransformJobCreate(48-49)job_id(36-37)backend/app/tests/services/doctransformer/test_job/utils.py (1)
MockTestTransformer(16-27)backend/app/tests/services/doctransformer/test_job/conftest.py (1)
fast_execute_job(32-67)
🪛 Ruff (0.14.5)
backend/app/tests/services/doctransformer/test_job/test_execute_job_errors.py
156-156: Do not assert blind exception: Exception
(B017)
backend/app/tests/services/doctransformer/test_job/test_execute_job.py
149-149: Do not assert blind exception: Exception
(B017)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: checks (3.11.7, 6)
🔇 Additional comments (9)
backend/app/tests/services/doctransformer/test_job/test_execute_job_errors.py (4)
42-42: LGTM! Consistent use of payload-based job creation.The migration to
DocTransformJobCreatepayload for job creation is correctly applied across all four test methods (lines 42, 92, 137, 186), aligning with the updated CRUD API.
46-51: LGTM! Module paths correctly updated.All patch paths have been consistently updated to the new
app.services.doctransform.*module structure throughout the test suite (lines 46-51, 98-104, 145-151, 189-199).
61-70: LGTM! Function calls properly updated with new parameters.All
fast_execute_jobcalls correctly stringify IDs and include the new parameters (task_id,callback_url,task_instance) consistently throughout the test suite (lines 61-70, 109-118, 157-166, 210-219). The use ofuuid4()for generating test task IDs is appropriate.
25-224: LGTM! Well-structured error handling tests.The test suite comprehensively covers error scenarios (storage failures, transient/persistent errors, database errors) with proper mocking, assertions, and verification of both job status and error messages. The test structure follows best practices with clear AAA pattern.
backend/app/tests/services/doctransformer/test_job/test_integration.py (2)
13-24: start_job/execute_job wiring and DocTransformJobCreate usage look correctImports, job creation via
DocTransformJobCreate, thestart_low_priority_jobpatch target, and thestart_job/execute_jobcalls (string IDs,task_id,callback_url,task_instance) are all consistent with the updated service API and CRUD layer. No issues spotted here.Also applies to: 40-79
133-185: Multi-format transformation coverage is aligned with service behaviorThe multi-format test using
DocTransformJobCreateplus the updatedexecute_jobsignature, and asserting transformed file extensions pertarget_format, correctly tracks the service’sFORMAT_TO_EXTENSIONbehavior. Looks good.backend/app/tests/services/doctransformer/test_job/test_execute_job.py (3)
14-21: Imports updated to new services/registry/models APIs are consistentSwitching to
app.services.doctransform.job.execute_job,TransformationErrorfrom the new registry, andDocTransformJobCreatefromapp.modelsis consistent with the current service/public model layout. No issues here.
51-72: Happy-path execute_job test is correctly aligned with the new signatureCreating the job via
DocTransformJobCreate, patchingapp.services.doctransform.job.Sessionandapp.services.doctransform.registry.TRANSFORMERS, and callingexecute_jobwith string IDs plustask_id,callback_url, andtask_instancematches the implementation inapp.services.doctransform.job.execute_job. The assertions on status, error_message, and transformed document metadata look sound.
138-159: DocTransformJobCreate usage and execute_job/fast_execute_job calls are consistently updatedAcross the missing-document, transformer-error, status-transition, and content-type tests, job creation via
DocTransformJobCreateand the calls toexecute_job/fast_execute_job(string IDs,task_id,callback_url,task_instance) are all consistent with the updated service and CRUD contracts. The status and error assertions correctly reflect the expected outcomes.Also applies to: 179-212, 223-245, 275-295
| job_id: UUID = Field(description="The unique identifier of the transformation job") | ||
| source_format: str = Field(description="The format of the source document") | ||
| target_format: str = Field(description="The format of the target document") | ||
| status: TransformationStatus |
There was a problem hiding this comment.
any reason why we removed the following?
source_format: str = Field(description="The format of the source document")
target_format: str = Field(description="The format of the target document")
There was a problem hiding this comment.
the message that goes with the response body already contains this information of the transformation that is happening from ehich source format to which target format, so i do not think we need to repeat this information
Prajna1999
left a comment
There was a problem hiding this comment.
Generic comments:
- Avoid leaky abstractions.
- Use types in a sparse manner.
- Avoid excessive metadata.
- Use libraries instead of primitives for file I/Os
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (1)
backend/app/alembic/versions/eed36ae3c79a_alter_doc_transform_table_for_celery.py (1)
11-11: Remove unused import.The
postgresqldialect is imported but never used in this migration.Apply this diff:
-from sqlalchemy.dialects import postgresql
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
backend/app/alembic/versions/eed36ae3c79a_alter_doc_transform_table_for_celery.py(1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
Use type hints in Python code (Python 3.11+ project)
Files:
backend/app/alembic/versions/eed36ae3c79a_alter_doc_transform_table_for_celery.py
🧠 Learnings (1)
📓 Common learnings
Learnt from: CR
Repo: ProjectTech4DevAI/ai-platform PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-10-08T12:05:01.317Z
Learning: Applies to backend/app/core/doctransform/**/*.py : Place document transformation utilities under backend/app/core/doctransform/
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: checks (3.11.7, 6)
🔇 Additional comments (2)
backend/app/alembic/versions/eed36ae3c79a_alter_doc_transform_table_for_celery.py (2)
20-31: LGTM! Migration upgrade logic is correct.The upgrade operations are safe and correctly implement the requirements:
- Adding nullable
task_idandtrace_idcolumns won't affect existing rows- Renaming
created_attoinserted_atis straightforward- Using
AutoStringis appropriate for ID fields
34-39: LGTM! Downgrade path correctly reverses all changes.The downgrade operations properly reverse the upgrade in the correct order, ensuring the migration is fully reversible.
backend/app/alembic/versions/eed36ae3c79a_alter_doc_transform_table_for_celery.py
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Actionable comments posted: 1
♻️ Duplicate comments (3)
backend/app/services/doctransform/job.py (3)
34-63: Alignstart_jobreturn type annotation with the actual UUID return value
start_jobis annotated to returnstrbut returnsjob.id(aUUID). Callers already wrap it withstr(...)when needed, so changing the annotation toUUIDis the least noisy and most accurate fix.def start_job( db: Session, current_user: CurrentUserOrgProject, job_id: UUID, transformer_name: str, target_format: str, callback_url: str | None, -) -> str: +) -> UUID: @@ - logger.info( - f"[start_job] Job scheduled for document transformation | id: {job.id}, project_id: {project_id}, task_id: {task_id}" - ) - return job.id + logger.info( + f"[start_job] Job scheduled for document transformation | id: {job.id}, project_id: {project_id}, task_id: {task_id}" + ) + return job.id
116-119: Optional: maketask_instanceexplicitly unused to satisfy Ruff
task_instanceis required by Celery but unused, which Ruff flags (ARG001). Renaming or adding anoqamakes this intent clear and keeps lint clean without changing behavior.-def execute_job( +def execute_job( @@ - task_id: str, - callback_url: str | None, - task_instance, + task_id: str, + callback_url: str | None, + task_instance, # noqa: ARG001 # provided by Celery, not used directly ):Alternatively, rename to
_task_instanceand drop the noqa.
109-267: Guard againstjob_uuidbeing undefined whenUUID(job_id)fails inexecute_jobIf
UUID(job_id)raises (e.g. malformed ID),job_uuidis never assigned, but theexceptblock logs and uses it for DB updates, which will trigger anUnboundLocalErrorand mask the original failure. Initializingjob_uuidand conditioning DB updates on it makes the error path safe.@retry(wait=wait_exponential(multiplier=5, min=5, max=10), stop=stop_after_attempt(3)) def execute_job( @@ - start_time = time.time() - tmp_dir: Path | None = None - - job_for_payload = None # keep latest job snapshot for payloads - - try: - job_uuid = UUID(job_id) + start_time = time.time() + tmp_dir: Path | None = None + + job_for_payload = None # keep latest job snapshot for payloads + job_uuid: UUID | None = None + + try: + job_uuid = UUID(job_id) source_uuid = UUID(source_document_id) @@ - except Exception as e: - logger.error( - "[doc_transform.execute_job] FAILED | job_id=%s | error=%s", - job_uuid, - e, - exc_info=True, - ) - - try: - with Session(engine) as db: - job_crud = DocTransformationJobCrud(session=db, project_id=project_id) - job_for_payload = job_crud.update( - job_uuid, - DocTransformJobUpdate( - status=TransformationStatus.FAILED, error_message=str(e) - ), - ) - except Exception as db_error: - logger.error( - "[doc_transform.execute_job] failed to persist FAILED status | job_id=%s | db_error=%s", - job_uuid, - db_error, - ) + except Exception as e: + logger.error( + "[doc_transform.execute_job] FAILED | job_id=%s | error=%s", + job_uuid or job_id, + e, + exc_info=True, + ) + + if job_uuid is not None: + try: + with Session(engine) as db: + job_crud = DocTransformationJobCrud(session=db, project_id=project_id) + job_for_payload = job_crud.update( + job_uuid, + DocTransformJobUpdate( + status=TransformationStatus.FAILED, error_message=str(e) + ), + ) + except Exception as db_error: + logger.error( + "[doc_transform.execute_job] failed to persist FAILED status | job_id=%s | db_error=%s", + job_uuid, + db_error, + )
🧹 Nitpick comments (3)
backend/app/api/routes/doc_transformation_job.py (1)
18-97: Transformation job routes correctly delegate to CRUD and helpersThe routes cleanly hand off to
DocTransformationJobCrudand thebuild_job_schema(s)helpers, and theinclude_urlflag is respected viaget_cloud_storage. The prefix/tag choices also align these under the documents surface.If you need deterministic ordering for
jobs_not_found(e.g., for clients/tests), consider wrappinglist(jobs_not_found)insorted(...); otherwise this is fine as-is.backend/app/services/documents/helpers.py (1)
1-3: Modernize type annotations to match Ruff and Python 3.11 styleThe helpers are well-structured, but you can align with Ruff’s UP0xx suggestions and current typing best practices by:
- Importing
Iterablefromcollections.abc.- Using built-in
tuple[...]instead ofTuple[...].- Using
X | Yinstead ofUnion[X, Y].-from typing import Optional, Tuple, Iterable, Union +from collections.abc import Iterable +from typing import Optional @@ -) -> Tuple[str, str | None]: +) -> tuple[str, str | None]: @@ -PublicDoc = Union[DocumentPublic, TransformedDocumentPublic] +PublicDoc = DocumentPublic | TransformedDocumentPublicThe rest of the code can remain unchanged; this should satisfy the Ruff hints without affecting behavior.
Also applies to: 31-32, 110-137
backend/app/api/routes/documents.py (1)
41-70: Document routes: transformation integration and signed URLs look solid
list_docscorrectly reusesbuild_document_schemasand conditionally instantiates storage based oninclude_urland the presence of documents.upload_doc’s use ofpre_transform_validation+schedule_transformationcleanly separates validation, persistence, and job scheduling, while still always returning a signed URL for the uploaded document.doc_infonow returningAPIResponse[DocumentPublic | TransformedDocumentPublic](conceptually) viabuild_document_schemamatches how transformed docs are represented everywhere else.As a minor style improvement, you can switch the
response_modelunions to theX | Yform to align with Python 3.11 and Ruff’s UP007:- response_model=APIResponse[list[Union[DocumentPublic, TransformedDocumentPublic]]], + response_model=APIResponse[list[DocumentPublic | TransformedDocumentPublic]], @@ - response_model=APIResponse[Union[DocumentPublic, TransformedDocumentPublic]], + response_model=APIResponse[DocumentPublic | TransformedDocumentPublic],Also applies to: 72-133, 201-228
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (7)
backend/app/alembic/versions/eed36ae3c79a_alter_doc_transform_table_for_celery.py(1 hunks)backend/app/api/main.py(1 hunks)backend/app/api/routes/doc_transformation_job.py(1 hunks)backend/app/api/routes/documents.py(6 hunks)backend/app/models/doc_transformation_job.py(2 hunks)backend/app/services/doctransform/job.py(1 hunks)backend/app/services/documents/helpers.py(1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- backend/app/alembic/versions/eed36ae3c79a_alter_doc_transform_table_for_celery.py
🧰 Additional context used
📓 Path-based instructions (4)
**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
Use type hints in Python code (Python 3.11+ project)
Files:
backend/app/services/documents/helpers.pybackend/app/api/main.pybackend/app/models/doc_transformation_job.pybackend/app/api/routes/doc_transformation_job.pybackend/app/services/doctransform/job.pybackend/app/api/routes/documents.py
backend/app/services/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
Implement business logic services under backend/app/services/
Files:
backend/app/services/documents/helpers.pybackend/app/services/doctransform/job.py
backend/app/api/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
Expose FastAPI REST endpoints under backend/app/api/ organized by domain
Files:
backend/app/api/main.pybackend/app/api/routes/doc_transformation_job.pybackend/app/api/routes/documents.py
backend/app/models/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
Define SQLModel entities (database tables and domain objects) in backend/app/models/
Files:
backend/app/models/doc_transformation_job.py
🧠 Learnings (4)
📓 Common learnings
Learnt from: CR
Repo: ProjectTech4DevAI/ai-platform PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-10-08T12:05:01.317Z
Learning: Applies to backend/app/core/doctransform/**/*.py : Place document transformation utilities under backend/app/core/doctransform/
📚 Learning: 2025-10-08T12:05:01.317Z
Learnt from: CR
Repo: ProjectTech4DevAI/ai-platform PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-10-08T12:05:01.317Z
Learning: Applies to backend/app/core/doctransform/**/*.py : Place document transformation utilities under backend/app/core/doctransform/
Applied to files:
backend/app/services/documents/helpers.pybackend/app/api/main.pybackend/app/models/doc_transformation_job.pybackend/app/api/routes/doc_transformation_job.pybackend/app/services/doctransform/job.pybackend/app/api/routes/documents.py
📚 Learning: 2025-08-06T05:29:45.557Z
Learnt from: avirajsingh7
Repo: ProjectTech4DevAI/ai-platform PR: 319
File: backend/app/crud/prompt.py:111-118
Timestamp: 2025-08-06T05:29:45.557Z
Learning: In FastAPI applications using Pydantic/SQLModel, when update model fields have min_length=1 validation, empty strings are rejected at the validation layer before reaching CRUD functions. This means checking `if field:` vs `if field is not None:` in CRUD operations should consider the model validation constraints.
Applied to files:
backend/app/models/doc_transformation_job.py
📚 Learning: 2025-10-08T12:05:01.317Z
Learnt from: CR
Repo: ProjectTech4DevAI/ai-platform PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-10-08T12:05:01.317Z
Learning: Applies to backend/app/models/**/*.py : Define SQLModel entities (database tables and domain objects) in backend/app/models/
Applied to files:
backend/app/models/doc_transformation_job.py
🧬 Code graph analysis (6)
backend/app/services/documents/helpers.py (6)
backend/app/services/doctransform/registry.py (4)
get_available_transformers(75-79)get_file_format(53-59)is_transformation_supported(70-72)resolve_transformer(82-106)backend/app/crud/document/doc_transformation_job.py (1)
DocTransformationJobCrud(21-95)backend/app/crud/document/document.py (1)
DocumentCrud(13-134)backend/app/models/doc_transformation_job.py (4)
DocTransformJobCreate(48-51)TransformationStatus(11-15)DocTransformationJob(18-45)job_id(36-37)backend/app/models/document.py (5)
TransformationJobInfo(67-74)Document(21-41)DocumentPublic(44-57)DocTransformationJobPublic(82-87)TransformedDocumentPublic(60-64)backend/app/services/doctransform/job.py (1)
start_job(34-63)
backend/app/api/main.py (1)
backend/app/tests/crud/documents/documents/test_crud_document_update.py (1)
documents(12-15)
backend/app/models/doc_transformation_job.py (1)
backend/app/core/util.py (1)
now(13-14)
backend/app/api/routes/doc_transformation_job.py (6)
backend/app/crud/document/doc_transformation_job.py (3)
DocTransformationJobCrud(21-95)read_one(36-55)read_each(57-71)backend/app/crud/document/document.py (3)
DocumentCrud(13-134)read_one(18-34)read_each(72-95)backend/app/models/document.py (3)
DocTransformationJobPublic(82-87)DocTransformationJobsPublic(90-92)TransformedDocumentPublic(60-64)backend/app/utils.py (2)
APIResponse(30-54)success_response(37-40)backend/app/services/documents/helpers.py (2)
build_job_schema(146-170)build_job_schemas(173-191)backend/app/core/cloud/storage.py (1)
get_cloud_storage(262-279)
backend/app/services/doctransform/job.py (8)
backend/app/crud/document/doc_transformation_job.py (3)
DocTransformationJobCrud(21-95)update(73-95)read_one(36-55)backend/app/crud/document/document.py (3)
DocumentCrud(13-134)update(97-122)read_one(18-34)backend/app/models/document.py (1)
Document(21-41)backend/app/models/doc_transformation_job.py (4)
DocTransformJobUpdate(54-59)TransformationStatus(11-15)DocTransformationJob(18-45)job_id(36-37)backend/app/core/cloud/storage.py (1)
get_cloud_storage(262-279)backend/app/celery/utils.py (1)
start_low_priority_job(46-71)backend/app/utils.py (3)
APIResponse(30-54)success_response(37-40)failure_response(43-54)backend/app/services/doctransform/registry.py (1)
convert_document(109-130)
backend/app/api/routes/documents.py (3)
backend/app/core/cloud/storage.py (2)
get_signed_url(135-137)get_signed_url(217-242)backend/app/models/document.py (5)
Document(21-41)DocumentPublic(44-57)TransformedDocumentPublic(60-64)DocumentUploadResponse(77-79)TransformationJobInfo(67-74)backend/app/services/documents/helpers.py (4)
schedule_transformation(69-107)pre_transform_validation(26-66)build_document_schema(119-128)build_document_schemas(131-143)
🪛 Ruff (0.14.5)
backend/app/services/documents/helpers.py
1-1: Import from collections.abc instead: Iterable
Import from collections.abc
(UP035)
1-1: typing.Tuple is deprecated, use tuple instead
(UP035)
110-110: Use X | Y for type annotations
(UP007)
backend/app/services/doctransform/job.py
118-118: Unused function argument: task_instance
(ARG001)
backend/app/api/routes/documents.py
44-44: Use X | Y for type annotations
Convert to X | Y
(UP007)
204-204: Use X | Y for type annotations
(UP007)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: checks (3.11.7, 6)
🔇 Additional comments (2)
backend/app/models/doc_transformation_job.py (1)
23-59: Doc transformation job model & DTOs line up with CRUD and public schemasThe nullable fields, inserted_at default, and DTOs (create/update) are consistent with the CRUD layer and the public
DocTransformationJobPublicusage (viajob_id/timestamp properties). No issues from a modeling or type-hinting perspective.backend/app/api/main.py (1)
3-26: Import reordering is safeReordering
documentsin the import list doesn’t affect router registration or behavior; everything still wires up identically.
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (3)
backend/app/services/documents/helpers.py (3)
1-1: Modernize type imports for Python 3.11+.The imports use deprecated typing constructs. Use built-in
tupleinstead oftyping.Tuple,collections.abc.Iterableinstead oftyping.Iterable, and|syntax instead ofUnionandOptional.Apply this diff:
-from typing import Optional, Tuple, Iterable, Union +from collections.abc import IterableThen replace usages throughout:
- Line 31:
Tuple[str, str | None]→tuple[str, str | None]- Line 46:
Optional[str]→str | None- Line 110:
Union[DocumentPublic, TransformedDocumentPublic]→DocumentPublic | TransformedDocumentPublic
119-128: Consider a Protocol or concrete type for the storage parameter.Using
object | Noneas the type hint forstorageprovides minimal type safety. Define a Protocol with aget_signed_urlmethod or import the concrete storage class to enable proper type checking and IDE support.Example Protocol approach:
from typing import Protocol class StorageProtocol(Protocol): def get_signed_url(self, url: str) -> str: ... def build_document_schema( *, document: Document, include_url: bool, storage: StorageProtocol | None, ) -> PublicDoc:
173-191: Consider batch-loading documents for efficiency.The current implementation may result in N+1 queries when processing many jobs with transformed documents, as each calls
doc_crud.read_oneseparately. If job lists grow large, consider batch-loading all transformed documents upfront and building a lookup map.Example optimization:
def build_job_schemas( *, jobs: Iterable[DocTransformationJob], doc_crud: DocumentCrud, include_url: bool, storage: object | None, ) -> list[DocTransformationJobPublic]: jobs_list = list(jobs) # Batch-load all transformed documents doc_ids = {j.transformed_document_id for j in jobs_list if j.transformed_document_id} docs_map = {doc.id: doc for doc_id in doc_ids if (doc := doc_crud.read_one(doc_id))} # Build schemas using the map...
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
backend/app/services/documents/helpers.py(1 hunks)
🧰 Additional context used
📓 Path-based instructions (2)
**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
Use type hints in Python code (Python 3.11+ project)
Files:
backend/app/services/documents/helpers.py
backend/app/services/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
Implement business logic services under backend/app/services/
Files:
backend/app/services/documents/helpers.py
🧠 Learnings (2)
📓 Common learnings
Learnt from: CR
Repo: ProjectTech4DevAI/ai-platform PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-10-08T12:05:01.317Z
Learning: Applies to backend/app/core/doctransform/**/*.py : Place document transformation utilities under backend/app/core/doctransform/
📚 Learning: 2025-10-08T12:05:01.317Z
Learnt from: CR
Repo: ProjectTech4DevAI/ai-platform PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-10-08T12:05:01.317Z
Learning: Applies to backend/app/core/doctransform/**/*.py : Place document transformation utilities under backend/app/core/doctransform/
Applied to files:
backend/app/services/documents/helpers.py
🧬 Code graph analysis (1)
backend/app/services/documents/helpers.py (6)
backend/app/services/doctransform/registry.py (4)
get_available_transformers(75-79)get_file_format(53-59)is_transformation_supported(70-72)resolve_transformer(82-106)backend/app/crud/document/doc_transformation_job.py (1)
DocTransformationJobCrud(21-95)backend/app/crud/document/document.py (1)
DocumentCrud(13-134)backend/app/models/doc_transformation_job.py (4)
DocTransformJobCreate(48-51)TransformationStatus(11-15)DocTransformationJob(18-45)job_id(36-37)backend/app/models/document.py (4)
Document(21-41)DocumentPublic(44-57)DocTransformationJobPublic(82-87)TransformedDocumentPublic(60-64)backend/app/services/doctransform/job.py (1)
start_job(34-63)
🪛 Ruff (0.14.5)
backend/app/services/documents/helpers.py
1-1: Import from collections.abc instead: Iterable
Import from collections.abc
(UP035)
1-1: typing.Tuple is deprecated, use tuple instead
(UP035)
110-110: Use X | Y for type annotations
(UP007)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: checks (3.11.7, 6)
🔇 Additional comments (3)
backend/app/services/documents/helpers.py (3)
26-66: LGTM: Well-structured validation logic.The pre-transformation validation properly detects source format, validates transformation support, and resolves the transformer with clear error messages.
113-116: LGTM: Clean schema conversion logic.The function correctly determines the appropriate public schema based on whether the document is a transformation.
146-170: LGTM: Proper job schema construction.The function correctly builds job schemas and handles the optional transformed document with appropriate error propagation from
read_one.
There was a problem hiding this comment.
Actionable comments posted: 0
🧹 Nitpick comments (1)
backend/app/tests/api/routes/documents/test_route_document_upload.py (1)
76-76: Add return type hint to fixture.The
routefixture is missing a return type hint. As per coding guidelines, type hints should be used in Python code for this Python 3.11+ project.Apply this diff to add the return type hint:
@pytest.fixture -def route(): +def route() -> Route: return Route("")
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
backend/app/tests/api/routes/documents/test_route_document_upload.py(4 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
Use type hints in Python code (Python 3.11+ project)
Files:
backend/app/tests/api/routes/documents/test_route_document_upload.py
🧠 Learnings (1)
📓 Common learnings
Learnt from: CR
Repo: ProjectTech4DevAI/ai-platform PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-10-08T12:05:01.317Z
Learning: Applies to backend/app/core/doctransform/**/*.py : Place document transformation utilities under backend/app/core/doctransform/
🧬 Code graph analysis (1)
backend/app/tests/api/routes/documents/test_route_document_upload.py (1)
backend/app/tests/utils/document.py (1)
Route(83-110)
🔇 Additional comments (2)
backend/app/tests/api/routes/documents/test_route_document_upload.py (2)
183-183: Endpoint path is consistent with actual implementation.The
status_check_urlcorrectly uses the singular/documents/transformation/{id}form. This is consistent across the helper service (backend/app/services/documents/helpers.py:106), all related test cases, and the entire codebase. No remaining plural references exist.
154-154: Patch paths are correct and verified.The patch decorators at lines 154, 187, and 277 correctly reference
app.services.doctransform.job.start_job, which has been confirmed to exist atbackend/app/services/doctransform/job.py:34. The function signature is well-defined with proper parameters (db,current_user,job_id,transformer_name,target_format,callback_url), and no references to the old path remain in the codebase. The architectural shift fromapp.core.doctransform.servicetoapp.services.doctransform.jobis complete and intentional across the entire codebase.
There was a problem hiding this comment.
Actionable comments posted: 0
🧹 Nitpick comments (1)
backend/app/api/main.py (1)
3-27: Consider sorting imports alphabetically for consistency.The import statements are not alphabetically sorted, while the
include_routercalls below (lines 31-52) are. This mismatch can make it harder to verify that all imported routers are registered.Apply this diff to sort imports alphabetically:
from app.api.routes import ( api_keys, assistants, + collection_job, collections, config, - doc_transformation_job, - documents, - login, - llm, - organization, - openai_conversation, - project, - responses, - private, - threads, - users, - utils, - onboarding, credentials, cron, + doc_transformation_job, + documents, evaluation, fine_tuning, + llm, + login, model_evaluation, - collection_job, + onboarding, + openai_conversation, + organization, + private, + project, + responses, + threads, + users, + utils, )
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
backend/app/api/main.py(1 hunks)backend/app/models/__init__.py(1 hunks)
🧰 Additional context used
📓 Path-based instructions (3)
**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
Use type hints in Python code (Python 3.11+ project)
Files:
backend/app/api/main.pybackend/app/models/__init__.py
backend/app/api/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
Expose FastAPI REST endpoints under backend/app/api/ organized by domain
Files:
backend/app/api/main.py
backend/app/models/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
Define SQLModel entities (database tables and domain objects) in backend/app/models/
Files:
backend/app/models/__init__.py
🧠 Learnings (2)
📓 Common learnings
Learnt from: CR
Repo: ProjectTech4DevAI/ai-platform PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-10-08T12:05:01.317Z
Learning: Applies to backend/app/core/doctransform/**/*.py : Place document transformation utilities under backend/app/core/doctransform/
📚 Learning: 2025-10-08T12:05:01.317Z
Learnt from: CR
Repo: ProjectTech4DevAI/ai-platform PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-10-08T12:05:01.317Z
Learning: Applies to backend/app/core/doctransform/**/*.py : Place document transformation utilities under backend/app/core/doctransform/
Applied to files:
backend/app/api/main.pybackend/app/models/__init__.py
🧬 Code graph analysis (2)
backend/app/api/main.py (1)
backend/app/tests/crud/documents/documents/test_crud_document_update.py (1)
documents(12-15)
backend/app/models/__init__.py (2)
backend/app/models/document.py (3)
DocTransformationJobPublic(82-87)DocTransformationJobsPublic(90-92)TransformedDocumentPublic(60-64)backend/app/models/doc_transformation_job.py (4)
DocTransformationJob(18-45)TransformationStatus(11-15)DocTransformJobCreate(48-51)DocTransformJobUpdate(54-59)
🔇 Additional comments (1)
backend/app/models/__init__.py (1)
45-53: Re-exports for document transformation models/DTOs look correctThe newly added imports for
DocTransformationJobPublic,DocTransformationJobsPublic,TransformedDocumentPublic,DocTransformJobCreate, andDocTransformJobUpdateare consistent with the new document-transformation model/DTO structure and keep the models package’s public surface coherent. No issues from this change alone.Also applies to: 54-59
Summary
Target issue is #429 and #418
Checklist
Before submitting a pull request, please ensure that you mark these task.
fastapi run --reload app/main.pyordocker compose upin the repository root and test.Notes
Database Schema Migration
Migration: Added nullable task_id and trace_id columns to doc_transformation_job table for celery
Field rename: created_at → inserted_at for consistency with other tables, with full reversible downgrade path.
API & Route Updates
Endpoints: Restructured paths (e.g., / {job_id} → /transformation/{job_id}) across document routes to include transformation endpoints under document router only and not create a new router for it.
Response models: Updated to use new public types for better readability for users (DocTransformationJobPublic, TransformedDocumentPublic).
Enhancements: Added include_url query parameter; integrated DocumentCrud and cloud storage helpers for signed URL generation.
Imports: Adjusted imports across routes (fine_tuning.py, documents.py) to reflect new CRUD module paths since we moved document crud file to crud/document folder , which contains both document crud file and document transformation crud file
CRUD Layer Refactor
Structure: Moved CRUD modules into nested document/ package paths.
API changes:
create(source_document_id) → create(DocTransformJobCreate)
update_status() → update(DocTransformJobUpdate) (patch-based field updates)
Cleanup: Removed unused imports and legacy references.
Models & Public DTOs
New schemas: Added DocTransformJobCreate, DocTransformJobUpdate, DocTransformationJobPublic, DocTransformationJobsPublic, and TransformedDocumentPublic.
Model updates:
Added task_id, trace_id, inserted_at to DocTransformationJob.
Added computed properties (job_id, job_inserted_at, job_updated_at).
Updated TransformationJobInfo (added status, removed source_format / target_format).
Service Layer Reorganization
Refactor: Removed app.core.doctransform.service and moved doc transformation's core logic to app.services.doctransform.job implements:
Summary by CodeRabbit
New Features
API Changes
✏️ Tip: You can customize this high-level summary in your review settings.