Classification: db models and migration script#305
Classification: db models and migration script#305nishika26 merged 18 commits intofeature/classificationfrom
Conversation
|
Important Review skippedAuto reviews are disabled on base/target branches other than the default branch. Please check the settings in the CodeRabbit UI or the You can disable this status message by setting the WalkthroughThis change introduces new database models and migration scripts for fine-tuning jobs and model evaluations, including their relationships with existing organization and project models. It adds new SQLModel-based classes, updates import statements, and establishes bidirectional ORM relationships, supported by a migration script that creates corresponding tables and enums. Changes
Sequence Diagram(s)sequenceDiagram
participant User
participant API
participant DB
participant FineTuning
participant ModelEvaluation
User->>API: Create Fine-Tuning Job
API->>DB: Insert Fine_Tuning record
DB-->>FineTuning: Store job details
User->>API: Evaluate Model
API->>DB: Insert Model_Evaluation record (linked to Fine_Tuning)
DB-->>ModelEvaluation: Store evaluation details
API->>User: Return job/evaluation status
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~18 minutes Poem
✨ Finishing Touches🧪 Generate unit tests
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
SupportNeed help? Create a ticket on our support page for assistance with any issues or questions. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
Codecov Report❌ Patch coverage is
📢 Thoughts on this report? Let us know! |
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (12)
backend/app/models/project.py (1)
52-57: Update type annotations to use lowercaselist.The relationship definitions are correct and follow the established pattern. However, consider updating the type annotations to use lowercase
listinstead ofListfor consistency with modern Python typing conventions.- fine_tuning: List["Fine_Tuning"] = Relationship( + fine_tuning: list["Fine_Tuning"] = Relationship( back_populates="project", cascade_delete=True ) - model_evaluation: List["Model_Evaluation"] = Relationship( + model_evaluation: list["Model_Evaluation"] = Relationship( back_populates="project", cascade_delete=True )backend/app/models/organization.py (1)
57-62: Update type annotations to use lowercaselist.The relationship definitions are well-structured and consistent with existing relationships. Consider updating the type annotations to use lowercase
listfor modern Python typing conventions.- fine_tuning: List["Fine_Tuning"] = Relationship( + fine_tuning: list["Fine_Tuning"] = Relationship( back_populates="organization", cascade_delete=True ) - model_evaluation: List["Model_Evaluation"] = Relationship( + model_evaluation: list["Model_Evaluation"] = Relationship( back_populates="organization", cascade_delete=True )backend/app/models/model_evaluation.py (3)
1-1: Update type annotations for modern Python.Consider updating the type annotations to use modern Python conventions:
-from typing import Optional, List +from typing import Optional- eval_split_ratio: List[float] = Field(sa_column=Column(JSON, nullable=False)) + eval_split_ratio: list[float] = Field(sa_column=Column(JSON, nullable=False))Also applies to: 33-33
44-44: Minor typo in docstring.- """Database model for keep a record of model evaluation""" + """Database model for keeping a record of model evaluation"""
61-61: Consider using union syntax for optional types.For consistency with modern Python typing, consider using the union operator:
- deleted_at: Optional[datetime] = Field(default=None, nullable=True) + deleted_at: datetime | None = Field(default=None, nullable=True)- score: Optional[float] = None + score: float | None = None- deleted_at: Optional[datetime] = None + deleted_at: datetime | None = NoneAlso applies to: 74-74, 78-78
backend/app/models/fine_tuning.py (7)
1-1: Update deprecated typing imports.The
typing.Listimport is deprecated in favor of the built-inlisttype for Python 3.9+.-from typing import Optional, List +from typing import Optional
13-13: Update deprecated List type annotation.The
List[float]type annotation should be updated to use the built-inlisttype.- split_ratio: List[float] = Field(sa_column=Column(JSON, nullable=False)) + split_ratio: list[float] = Field(sa_column=Column(JSON, nullable=False))
42-48: Update Optional type annotation syntax.The Optional type annotations should be updated to use the modern union syntax.
- openai_job_id: Optional[str] = Field( + openai_job_id: str | None = Field( default=None, description="Fine tuning Job ID returned by OpenAI" ) status: str = Field(default=None, description="Status of the fine-tuning job") - fine_tuned_model: Optional[str] = Field( + fine_tuned_model: str | None = Field( default=None, description="Final fine tuned model name from OpenAI" )
45-45: Consider using a more meaningful default for status field.The status field defaults to
None, but it might be better to use a meaningful initial status like "pending" or "created".- status: str = Field(default=None, description="Status of the fine-tuning job") + status: str = Field(default="pending", description="Status of the fine-tuning job")
65-70: Update Optional type annotation syntax.The Optional type annotations should be updated to use the modern union syntax.
- openai_job_id: Optional[str] = None + openai_job_id: str | None = None status: str - fine_tuned_model: Optional[str] = None + fine_tuned_model: str | None = None inserted_at: datetime updated_at: datetime - deleted_at: Optional[datetime] = None + deleted_at: datetime | None = None
52-52: Update Optional type annotation syntax.The Optional type annotation should be updated to use the modern union syntax.
- deleted_at: Optional[datetime] = Field(default=None, nullable=True) + deleted_at: datetime | None = Field(default=None, nullable=True)
56-56: Update List type annotation.The
Listtype annotation should be updated to use the built-inlisttype.- model_evaluation: List["Model_Evaluation"] = Relationship( + model_evaluation: list["Model_Evaluation"] = Relationship( back_populates="fine_tuning" )
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (6)
backend/app/alembic/versions/a2f5ce7d32d8_add_fine_tuning_and_model_evaluation_.py(1 hunks)backend/app/models/__init__.py(1 hunks)backend/app/models/fine_tuning.py(1 hunks)backend/app/models/model_evaluation.py(1 hunks)backend/app/models/organization.py(2 hunks)backend/app/models/project.py(1 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (3)
backend/app/models/__init__.py (2)
backend/app/models/fine_tuning.py (4)
FineTuningJobBase(11-26)Fine_Tuning(37-58)FineTuningJobCreate(29-34)FineTuningJobPublic(61-70)backend/app/models/model_evaluation.py (4)
Model_Evaluation(43-65)ModelEvaluationBase(19-36)ModelEvaluationCreate(39-40)ModelEvaluationPublic(68-78)
backend/app/models/project.py (1)
backend/app/models/openai_conversation.py (1)
OpenAIConversation(58-69)
backend/app/models/model_evaluation.py (3)
backend/app/models/collection.py (1)
Collection(21-53)backend/app/models/api_key.py (1)
APIKey(28-38)backend/app/models/document.py (1)
Document(10-30)
🪛 Ruff (0.12.2)
backend/app/models/__init__.py
60-60: .fine_tuning.FineTuningJobBase imported but unused; consider removing, adding to __all__, or using a redundant alias
(F401)
61-61: .fine_tuning.Fine_Tuning imported but unused; consider removing, adding to __all__, or using a redundant alias
(F401)
62-62: .fine_tuning.FineTuningJobCreate imported but unused; consider removing, adding to __all__, or using a redundant alias
(F401)
63-63: .fine_tuning.FineTuningJobPublic imported but unused; consider removing, adding to __all__, or using a redundant alias
(F401)
67-67: .model_evaluation.Model_Evaluation imported but unused; consider removing, adding to __all__, or using a redundant alias
(F401)
68-68: .model_evaluation.ModelEvaluationBase imported but unused; consider removing, adding to __all__, or using a redundant alias
(F401)
69-69: .model_evaluation.ModelEvaluationCreate imported but unused; consider removing, adding to __all__, or using a redundant alias
(F401)
70-70: .model_evaluation.ModelEvaluationPublic imported but unused; consider removing, adding to __all__, or using a redundant alias
(F401)
backend/app/models/organization.py
57-57: Use list instead of List for type annotation
Replace with list
(UP006)
60-60: Use list instead of List for type annotation
Replace with list
(UP006)
backend/app/models/project.py
52-52: Use list instead of List for type annotation
Replace with list
(UP006)
52-52: Undefined name Fine_Tuning
(F821)
55-55: Use list instead of List for type annotation
Replace with list
(UP006)
55-55: Undefined name Model_Evaluation
(F821)
backend/app/models/model_evaluation.py
1-1: typing.List is deprecated, use list instead
(UP035)
33-33: Use list instead of List for type annotation
Replace with list
(UP006)
61-61: Use X | None for type annotations
Convert to X | None
(UP045)
63-63: Undefined name Project
(F821)
64-64: Undefined name Organization
(F821)
65-65: Undefined name Fine_Tuning
(F821)
74-74: Use X | None for type annotations
Convert to X | None
(UP045)
78-78: Use X | None for type annotations
Convert to X | None
(UP045)
backend/app/models/fine_tuning.py
1-1: typing.List is deprecated, use list instead
(UP035)
13-13: Use list instead of List for type annotation
Replace with list
(UP006)
42-42: Use X | None for type annotations
Convert to X | None
(UP045)
46-46: Use X | None for type annotations
Convert to X | None
(UP045)
52-52: Use X | None for type annotations
Convert to X | None
(UP045)
54-54: Undefined name Project
(F821)
55-55: Undefined name Organization
(F821)
56-56: Use list instead of List for type annotation
Replace with list
(UP006)
56-56: Undefined name Model_Evaluation
(F821)
65-65: Use X | None for type annotations
Convert to X | None
(UP045)
67-67: Use X | None for type annotations
Convert to X | None
(UP045)
70-70: Use X | None for type annotations
Convert to X | None
(UP045)
🔇 Additional comments (10)
backend/app/models/__init__.py (1)
59-71: LGTM! New model imports follow the established pattern.The imports for fine-tuning and model evaluation models are correctly structured and consistent with the existing import organization in this namespace module.
backend/app/models/organization.py (1)
13-14: LGTM! Proper forward reference imports.The imports for the new models in the TYPE_CHECKING block correctly handle forward references and avoid circular import issues.
backend/app/alembic/versions/a2f5ce7d32d8_add_fine_tuning_and_model_evaluation_.py (3)
19-26: Good enum handling with checkfirst parameter.The PostgreSQL ENUM creation with
checkfirst=Trueandcreate_type=Falseis a best practice to avoid duplicate type creation issues.
31-83: Well-structured table creation with proper constraints.The migration correctly creates both tables with:
- Appropriate foreign key constraints with CASCADE deletes
- Proper column types matching the ORM models
- Consistent timestamp fields
The table structures align well with the ORM models defined in the corresponding Python files.
86-88: Complete downgrade implementation.The downgrade function properly reverses all changes by dropping tables and the enum type in the correct order.
backend/app/models/model_evaluation.py (3)
12-16: LGTM! Enum values match migration script.The
EvaluationStatusenum is properly defined and the values match those in the corresponding Alembic migration script.
19-36: Well-designed base model with proper foreign key constraints.The
ModelEvaluationBaseclass properly defines foreign key relationships to document, project, and organization entities with appropriate cascade delete behavior.
43-65: Comprehensive database model with proper relationships.The
Model_Evaluationmodel is well-structured with:
- Proper primary key and foreign key definitions
- Appropriate nullable settings for optional fields
- Consistent timestamp fields matching other models
- Correct bidirectional relationships with back_populates
backend/app/models/fine_tuning.py (2)
29-34: LGTM!The create model correctly inherits all required fields from the base class. The pattern of using a separate creation model is appropriate for SQLModel.
54-58: LGTM! Relationship definitions are correct.The relationship definitions properly use forward references with quotes and establish the correct bidirectional relationships with back_populates.
backend/app/alembic/versions/a2f5ce7d32d8_add_fine_tuning_and_model_evaluation_.py
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Actionable comments posted: 0
🧹 Nitpick comments (4)
backend/app/models/model_evaluation.py (2)
1-1: Remove unused import.The
Optionalimport is not used in this file.-from typing import Optional
54-54: Make score field description more generic.The description specifically mentions "Matthews Correlation Coefficient" but the field is generic and could store other evaluation metrics. Consider making the description more general.
- score: float = Field(nullable=True, description="Matthews Correlation Coefficient") + score: float = Field(nullable=True, description="Evaluation score for the specified metric")backend/app/models/fine_tuning.py (2)
1-1: Remove unused import.The
Optionalimport is not used in this file.-from typing import Optional
45-45: Consider using an enum for status field.For consistency with
Model_Evaluation.statusand better type safety, consider defining aFineTuningStatusenum instead of using a generic string field.+from enum import Enum + +class FineTuningStatus(str, Enum): + pending = "pending" + running = "running" + completed = "completed" + failed = "failed" + cancelled = "cancelled" - status: str = Field(default="pending", description="Status of the fine-tuning job") + status: FineTuningStatus = Field(default=FineTuningStatus.pending, description="Status of the fine-tuning job")
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (5)
backend/app/alembic/versions/a2f5ce7d32d8_add_fine_tuning_and_model_evaluation_.py(1 hunks)backend/app/models/fine_tuning.py(1 hunks)backend/app/models/model_evaluation.py(1 hunks)backend/app/models/organization.py(2 hunks)backend/app/models/project.py(1 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
- backend/app/models/organization.py
- backend/app/alembic/versions/a2f5ce7d32d8_add_fine_tuning_and_model_evaluation_.py
🧰 Additional context used
🧬 Code Graph Analysis (1)
backend/app/models/fine_tuning.py (5)
backend/app/models/api_key.py (2)
APIKey(28-38)APIKeyBase(10-20)backend/app/models/collection.py (1)
Collection(21-53)backend/app/models/document.py (1)
Document(10-30)backend/app/models/assistants.py (1)
Assistant(29-40)backend/app/models/openai_conversation.py (1)
OpenAIConversation(58-69)
🪛 Ruff (0.12.2)
backend/app/models/fine_tuning.py
1-1: typing.Optional imported but unused
Remove unused import: typing.Optional
(F401)
54-54: Undefined name Project
(F821)
55-55: Undefined name Organization
(F821)
56-56: Undefined name Model_Evaluation
(F821)
backend/app/models/model_evaluation.py
1-1: typing.Optional imported but unused
Remove unused import: typing.Optional
(F401)
63-63: Undefined name Project
(F821)
64-64: Undefined name Organization
(F821)
65-65: Undefined name Fine_Tuning
(F821)
backend/app/models/project.py
52-52: Undefined name Fine_Tuning
(F821)
55-55: Undefined name Model_Evaluation
(F821)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: checks (3.11.7, 6)
🔇 Additional comments (4)
backend/app/models/project.py (1)
52-57: LGTM! Relationship definitions follow established patterns.The new relationship fields for
fine_tuningandmodel_evaluationare properly defined with:
- Correct forward reference syntax using quotes
- Appropriate
back_populatesconfiguration- Proper cascade delete behavior for dependent entities
This is consistent with other relationship definitions in the
Projectmodel.backend/app/models/model_evaluation.py (1)
12-78: Well-structured model implementation.The model follows established patterns with:
- Proper enum definition for status values
- Appropriate base/create/table/public class hierarchy
- Correct foreign key definitions with cascade delete
- Proper use of JSON column for complex data (eval_split_ratio)
- Consistent timestamp and soft delete fields
- Bidirectional relationships with back_populates
backend/app/models/fine_tuning.py (2)
11-70: Well-structured model following established patterns.The model implementation is solid with:
- Proper base/create/table/public class hierarchy
- Correct foreign key definitions with cascade delete
- Appropriate use of JSON column for complex data (split_ratio)
- Consistent timestamp and soft delete fields
- Bidirectional relationships with back_populates
56-58: Add cascade_delete=True for consistency.The
model_evaluationrelationship should includecascade_delete=Truefor consistency with other relationships and to ensure proper cleanup when a fine-tuning job is deleted.model_evaluation: list["Model_Evaluation"] = Relationship( - back_populates="fine_tuning" + back_populates="fine_tuning", cascade_delete=True )Likely an incorrect or invalid review comment.
backend/app/models/fine_tuning.py
Outdated
| document_id: UUID = Field( | ||
| foreign_key="document.id", | ||
| nullable=False, | ||
| ondelete="CASCADE", | ||
| ) |
There was a problem hiding this comment.
Is this the same as "training_file" in the Create fine-tuning job API or the Retrieve fine-tuning job API?
Also, not sure if the fine-tuned model must be deleted when a document is deleted.
There was a problem hiding this comment.
We will fetch the document using this document id and then upload it to openai using this api , then the id we get in the response id is what we will pass to the openai's fine tuning API.
You're absolutely right to raise the question of whether a fine-tuned model should be deleted when the associated document is deleted. I had the same question in mind. However, since the current design involves deleting all child resources when the parent document is removed, I chose to follow that pattern for consistency. That said, I'm open to revisiting this. If others feel that the fine-tuned model should persist even after the document is deleted, we can definitely discuss and adjust the approach accordingly. @vijay-T4D @AkhileshNegi
There was a problem hiding this comment.
Ideally we should go with soft delete kind of an approach and then plan the purge after a desired number of days. Also we need to check how the model behaves post a document deletion.
backend/app/alembic/versions/8c477031ccd7_add_fine_tuning_and_model_evaluation_.py
Show resolved
Hide resolved
backend/app/alembic/versions/8c477031ccd7_add_fine_tuning_and_model_evaluation_.py
Outdated
Show resolved
Hide resolved
| "training_file_id", sqlmodel.sql.sqltypes.AutoString(), nullable=True | ||
| ), | ||
| sa.Column("testing_file_id", sqlmodel.sql.sqltypes.AutoString(), nullable=True), | ||
| sa.Column("openai_job_id", sqlmodel.sql.sqltypes.AutoString(), nullable=True), |
There was a problem hiding this comment.
this should not be nullable true
There was a problem hiding this comment.
Since the entire flow—from data processing to file upload and fine-tuning job creation—runs as a background job, these fields can be null initially. For example, if the file upload succeeds but the fine-tuning job creation fails, we’d still have the training and testing file IDs stored. Later, we can easily identify such incomplete rows (i.e., those with file IDs but no OpenAI job ID) for a given document ID and split ratio, and resume the process from there.
There was a problem hiding this comment.
then make entry to DB only when it passes basic checks like file uploading and all and fine tune job has started. So initially the status will be in_progress and if it fails then update the status. so the entry means a fine tune job was initiated.. maybe add a column to store error msgs n all if we get it from openai
There was a problem hiding this comment.
Yes, but there’s still a case where file upload succeeds but fine-tuning job creation fails. If we retry, we’d end up re-downloading and re-uploading unless we explicitly delete the uploaded files on failure. Even with an error message column, we'd still be inserting DB entries without training/testing file IDs or the OpenAI job ID.
backend/app/alembic/versions/8c477031ccd7_add_fine_tuning_and_model_evaluation_.py
Show resolved
Hide resolved
* Classification: db models and migration script (#305) * db models and migration script * Classification: Fine tuning Initiation and retrieve endpoint (#315) * Fine-tuning core, initiation, and retrieval * seperate session for bg task, and formating fixes * fixing alembic revision * Classification : Model evaluation of fine tuned models (#326) * Model evaluation of fine tuned models * fixing alembic revision * alembic revision fix * Classification : train and test data to s3 (#343) * alembic file for adding and removing columns * train and test s3 url column * updating alembic revision * formatting fix * Classification : retaining prediction and fetching data from s3 for model evaluation (#359) * adding new columns to model eval table * test data and prediction data s3 url changes * single migration file * status enum columns * document seeding * Classification : small fixes and storage related changes (#365) * first commit covering all * changing model name to fine tuned model in model eval * error handling in get cloud storage and document not found error handling * fixing alembic revision * uv lock * new uv lock file * updated uv lock file * coderabbit suggestions and removing unused imports * changes in uv lock file * making csv a supported file format, changing uv lock and pyproject toml
Summary
Target issue is #300
Changes Made
-Introduced Fine_Tuning and Model_Evaluation models with necessary relationships and schema definitions.
-Added corresponding Alembic migration to create both tables.
document to refer is here
Summary by CodeRabbit
New Features
Enhancements