Classification: db models and migration script#305

Merged
nishika26 merged 18 commits into feature/classification from feature/db_model
Aug 5, 2025

Conversation

Collaborator

@nishika26 nishika26 commented Jul 28, 2025

Summary

Target issue is #300

Changes Made

- Introduced Fine_Tuning and Model_Evaluation models with necessary relationships and schema definitions.
- Added corresponding Alembic migration to create both tables.

The document to refer to is here.

Summary by CodeRabbit

  • New Features

    • Introduced support for fine-tuning jobs and model evaluations, including new database tables and user-facing data models.
    • Added tracking and status reporting for model evaluation processes.
    • Enabled associations between organizations, projects, fine-tuning jobs, and model evaluations for improved data management.
  • Enhancements

    • Expanded organization and project entities to display related fine-tuning and model evaluation records.

@coderabbitai

coderabbitai bot commented Jul 28, 2025

Important

Review skipped

Auto reviews are disabled on base/target branches other than the default branch.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Walkthrough

This change introduces new database models and migration scripts for fine-tuning jobs and model evaluations, including their relationships with existing organization and project models. It adds new SQLModel-based classes, updates import statements, and establishes bidirectional ORM relationships, supported by a migration script that creates corresponding tables and enums.

Changes

Cohort / File(s) Change Summary
Alembic Migration for Fine-Tuning & Evaluation
backend/app/alembic/versions/a2f5ce7d32d8_add_fine_tuning_and_model_evaluation_.py
Adds migration script to create fine_tuning and model_evaluation tables and a new ENUM type for evaluation status.
Fine-Tuning Models
backend/app/models/fine_tuning.py
Introduces SQLModel classes for fine-tuning jobs, including base, creation, DB, and public schemas, with relationships to project, organization, and model evaluation.
Model Evaluation Models
backend/app/models/model_evaluation.py
Adds SQLModel classes and enum for model evaluation, including base, creation, DB, and public schemas, and relationships to fine-tuning, project, and organization.
Model Imports
backend/app/models/__init__.py
Adds imports for new fine-tuning and model evaluation model classes to the module namespace.
Organization Relationships
backend/app/models/organization.py
Adds relationships from Organization to Fine_Tuning and Model_Evaluation models with cascade delete.
Project Relationships
backend/app/models/project.py
Adds relationships from Project to Fine_Tuning and Model_Evaluation models with cascade delete.
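The base/create/DB/public layering used by both new model files can be sketched with plain dataclasses standing in for the SQLModel classes. The class names mirror those in the PR (FineTuningJobBase, FineTuningJobCreate, FineTuningJobPublic), but the field set shown here is illustrative, not the actual schema:

```python
from dataclasses import dataclass, field
from datetime import datetime
from uuid import UUID, uuid4

# Shared fields that the create, DB, and public schemas all inherit.
@dataclass
class FineTuningJobBase:
    document_id: UUID
    split_ratio: list[float]

# Payload accepted from the API when creating a job (no server-side fields).
@dataclass
class FineTuningJobCreate(FineTuningJobBase):
    pass

# Row as stored in the database: adds identity, status, and timestamps.
@dataclass
class FineTuningRow(FineTuningJobBase):
    id: UUID = field(default_factory=uuid4)
    status: str = "pending"
    inserted_at: datetime = field(default_factory=datetime.utcnow)

# Response shape returned to clients (internal-only columns omitted).
@dataclass
class FineTuningJobPublic(FineTuningJobBase):
    status: str = "pending"

create = FineTuningJobCreate(document_id=uuid4(), split_ratio=[0.8, 0.2])
row = FineTuningRow(document_id=create.document_id, split_ratio=create.split_ratio)
public = FineTuningJobPublic(
    document_id=row.document_id, split_ratio=row.split_ratio, status=row.status
)
print(public.status)  # pending
```

The point of the split is that API input, persisted rows, and API output each expose only the fields appropriate to that layer.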

Sequence Diagram(s)

sequenceDiagram
    participant User
    participant API
    participant DB
    participant FineTuning
    participant ModelEvaluation

    User->>API: Create Fine-Tuning Job
    API->>DB: Insert Fine_Tuning record
    DB-->>FineTuning: Store job details

    User->>API: Evaluate Model
    API->>DB: Insert Model_Evaluation record (linked to Fine_Tuning)
    DB-->>ModelEvaluation: Store evaluation details

    API->>User: Return job/evaluation status

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~18 minutes

Poem

In the warren where data grows,
New tables bloom as schema flows.
Models fine-tune, evaluations start,
Relationships woven, each plays a part.
🐇 With paws on keys and ears upright,
The rabbit codes through day and night!
Hooray for progress, hop to delight!



@nishika26 nishika26 changed the title from "db models and migration script" to "Classification: db models and migration script" on Jul 28, 2025
@codecov

codecov bot commented Jul 28, 2025

Codecov Report

❌ Patch coverage is 97.59036% with 2 lines in your changes missing coverage. Please review.

File | Patch % | Missing lines
backend/app/models/organization.py | 50.00% | 2 ⚠️


@nishika26 nishika26 marked this pull request as ready for review July 28, 2025 16:58
@nishika26 nishika26 self-assigned this Jul 28, 2025
@nishika26 nishika26 added the enhancement New feature or request label Jul 28, 2025
@nishika26 nishika26 linked an issue Jul 28, 2025 that may be closed by this pull request

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🧹 Nitpick comments (12)
backend/app/models/project.py (1)

52-57: Update type annotations to use lowercase list.

The relationship definitions are correct and follow the established pattern. However, consider updating the type annotations to use lowercase list instead of List for consistency with modern Python typing conventions.

-    fine_tuning: List["Fine_Tuning"] = Relationship(
+    fine_tuning: list["Fine_Tuning"] = Relationship(
         back_populates="project", cascade_delete=True
     )
-    model_evaluation: List["Model_Evaluation"] = Relationship(
+    model_evaluation: list["Model_Evaluation"] = Relationship(
         back_populates="project", cascade_delete=True
     )
backend/app/models/organization.py (1)

57-62: Update type annotations to use lowercase list.

The relationship definitions are well-structured and consistent with existing relationships. Consider updating the type annotations to use lowercase list for modern Python typing conventions.

-    fine_tuning: List["Fine_Tuning"] = Relationship(
+    fine_tuning: list["Fine_Tuning"] = Relationship(
         back_populates="organization", cascade_delete=True
     )
-    model_evaluation: List["Model_Evaluation"] = Relationship(
+    model_evaluation: list["Model_Evaluation"] = Relationship(
         back_populates="organization", cascade_delete=True
     )
backend/app/models/model_evaluation.py (3)

1-1: Update type annotations for modern Python.

Consider updating the type annotations to use modern Python conventions:

-from typing import Optional, List
+from typing import Optional
-    eval_split_ratio: List[float] = Field(sa_column=Column(JSON, nullable=False))
+    eval_split_ratio: list[float] = Field(sa_column=Column(JSON, nullable=False))

Also applies to: 33-33


44-44: Minor typo in docstring.

-    """Database model for keep a record of model evaluation"""
+    """Database model for keeping a record of model evaluation"""

61-61: Consider using union syntax for optional types.

For consistency with modern Python typing, consider using the union operator:

-    deleted_at: Optional[datetime] = Field(default=None, nullable=True)
+    deleted_at: datetime | None = Field(default=None, nullable=True)
-    score: Optional[float] = None
+    score: float | None = None
-    deleted_at: Optional[datetime] = None
+    deleted_at: datetime | None = None

Also applies to: 74-74, 78-78

backend/app/models/fine_tuning.py (7)

1-1: Update deprecated typing imports.

The typing.List import is deprecated in favor of the built-in list type for Python 3.9+.

-from typing import Optional, List
+from typing import Optional

13-13: Update deprecated List type annotation.

The List[float] type annotation should be updated to use the built-in list type.

-    split_ratio: List[float] = Field(sa_column=Column(JSON, nullable=False))
+    split_ratio: list[float] = Field(sa_column=Column(JSON, nullable=False))

42-48: Update Optional type annotation syntax.

The Optional type annotations should be updated to use the modern union syntax.

-    openai_job_id: Optional[str] = Field(
+    openai_job_id: str | None = Field(
         default=None, description="Fine tuning Job ID returned by OpenAI"
     )
     status: str = Field(default=None, description="Status of the fine-tuning job")
-    fine_tuned_model: Optional[str] = Field(
+    fine_tuned_model: str | None = Field(
         default=None, description="Final fine tuned model name from OpenAI"
     )

45-45: Consider using a more meaningful default for status field.

The status field defaults to None, but it might be better to use a meaningful initial status like "pending" or "created".

-    status: str = Field(default=None, description="Status of the fine-tuning job")
+    status: str = Field(default="pending", description="Status of the fine-tuning job")

65-70: Update Optional type annotation syntax.

The Optional type annotations should be updated to use the modern union syntax.

-    openai_job_id: Optional[str] = None
+    openai_job_id: str | None = None
     status: str
-    fine_tuned_model: Optional[str] = None
+    fine_tuned_model: str | None = None
     inserted_at: datetime
     updated_at: datetime
-    deleted_at: Optional[datetime] = None
+    deleted_at: datetime | None = None

52-52: Update Optional type annotation syntax.

The Optional type annotation should be updated to use the modern union syntax.

-    deleted_at: Optional[datetime] = Field(default=None, nullable=True)
+    deleted_at: datetime | None = Field(default=None, nullable=True)

56-56: Update List type annotation.

The List type annotation should be updated to use the built-in list type.

-    model_evaluation: List["Model_Evaluation"] = Relationship(
+    model_evaluation: list["Model_Evaluation"] = Relationship(
         back_populates="fine_tuning"
     )
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between a516292 and 3b6e7c4.

📒 Files selected for processing (6)
  • backend/app/alembic/versions/a2f5ce7d32d8_add_fine_tuning_and_model_evaluation_.py (1 hunks)
  • backend/app/models/__init__.py (1 hunks)
  • backend/app/models/fine_tuning.py (1 hunks)
  • backend/app/models/model_evaluation.py (1 hunks)
  • backend/app/models/organization.py (2 hunks)
  • backend/app/models/project.py (1 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (3)
backend/app/models/__init__.py (2)
backend/app/models/fine_tuning.py (4)
  • FineTuningJobBase (11-26)
  • Fine_Tuning (37-58)
  • FineTuningJobCreate (29-34)
  • FineTuningJobPublic (61-70)
backend/app/models/model_evaluation.py (4)
  • Model_Evaluation (43-65)
  • ModelEvaluationBase (19-36)
  • ModelEvaluationCreate (39-40)
  • ModelEvaluationPublic (68-78)
backend/app/models/project.py (1)
backend/app/models/openai_conversation.py (1)
  • OpenAIConversation (58-69)
backend/app/models/model_evaluation.py (3)
backend/app/models/collection.py (1)
  • Collection (21-53)
backend/app/models/api_key.py (1)
  • APIKey (28-38)
backend/app/models/document.py (1)
  • Document (10-30)
🪛 Ruff (0.12.2)
backend/app/models/__init__.py

60-60: .fine_tuning.FineTuningJobBase imported but unused; consider removing, adding to __all__, or using a redundant alias

(F401)


61-61: .fine_tuning.Fine_Tuning imported but unused; consider removing, adding to __all__, or using a redundant alias

(F401)


62-62: .fine_tuning.FineTuningJobCreate imported but unused; consider removing, adding to __all__, or using a redundant alias

(F401)


63-63: .fine_tuning.FineTuningJobPublic imported but unused; consider removing, adding to __all__, or using a redundant alias

(F401)


67-67: .model_evaluation.Model_Evaluation imported but unused; consider removing, adding to __all__, or using a redundant alias

(F401)


68-68: .model_evaluation.ModelEvaluationBase imported but unused; consider removing, adding to __all__, or using a redundant alias

(F401)


69-69: .model_evaluation.ModelEvaluationCreate imported but unused; consider removing, adding to __all__, or using a redundant alias

(F401)


70-70: .model_evaluation.ModelEvaluationPublic imported but unused; consider removing, adding to __all__, or using a redundant alias

(F401)

backend/app/models/organization.py

57-57: Use list instead of List for type annotation

Replace with list

(UP006)


60-60: Use list instead of List for type annotation

Replace with list

(UP006)

backend/app/models/project.py

52-52: Use list instead of List for type annotation

Replace with list

(UP006)


52-52: Undefined name Fine_Tuning

(F821)


55-55: Use list instead of List for type annotation

Replace with list

(UP006)


55-55: Undefined name Model_Evaluation

(F821)

backend/app/models/model_evaluation.py

1-1: typing.List is deprecated, use list instead

(UP035)


33-33: Use list instead of List for type annotation

Replace with list

(UP006)


61-61: Use X | None for type annotations

Convert to X | None

(UP045)


63-63: Undefined name Project

(F821)


64-64: Undefined name Organization

(F821)


65-65: Undefined name Fine_Tuning

(F821)


74-74: Use X | None for type annotations

Convert to X | None

(UP045)


78-78: Use X | None for type annotations

Convert to X | None

(UP045)

backend/app/models/fine_tuning.py

1-1: typing.List is deprecated, use list instead

(UP035)


13-13: Use list instead of List for type annotation

Replace with list

(UP006)


42-42: Use X | None for type annotations

Convert to X | None

(UP045)


46-46: Use X | None for type annotations

Convert to X | None

(UP045)


52-52: Use X | None for type annotations

Convert to X | None

(UP045)


54-54: Undefined name Project

(F821)


55-55: Undefined name Organization

(F821)


56-56: Use list instead of List for type annotation

Replace with list

(UP006)


56-56: Undefined name Model_Evaluation

(F821)


65-65: Use X | None for type annotations

Convert to X | None

(UP045)


67-67: Use X | None for type annotations

Convert to X | None

(UP045)


70-70: Use X | None for type annotations

Convert to X | None

(UP045)

🔇 Additional comments (10)
backend/app/models/__init__.py (1)

59-71: LGTM! New model imports follow the established pattern.

The imports for fine-tuning and model evaluation models are correctly structured and consistent with the existing import organization in this namespace module.

backend/app/models/organization.py (1)

13-14: LGTM! Proper forward reference imports.

The imports for the new models in the TYPE_CHECKING block correctly handle forward references and avoid circular import issues.
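The TYPE_CHECKING pattern referred to here can be sketched as follows. The guarded import never runs at runtime (the module path below is hypothetical), while string annotations let the class body reference the type anyway:

```python
from __future__ import annotations

from typing import TYPE_CHECKING

if TYPE_CHECKING:
    # Seen only by static type checkers; skipped at runtime, which breaks
    # the circular import between organization and fine_tuning modules.
    from some_app.models.fine_tuning import Fine_Tuning  # hypothetical path

class Organization:
    # A string (forward reference) annotation, so Fine_Tuning need not be
    # importable at runtime for this class definition to succeed.
    fine_tuning: list["Fine_Tuning"]

org = Organization()
print(TYPE_CHECKING)  # False
```

SQLModel/SQLAlchemy later resolve the string names against the registered model classes, which is why the quoted forward references work in the relationship definitions.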

backend/app/alembic/versions/a2f5ce7d32d8_add_fine_tuning_and_model_evaluation_.py (3)

19-26: Good enum handling with checkfirst parameter.

The PostgreSQL ENUM creation with checkfirst=True and create_type=False is a best practice to avoid duplicate type creation issues.


31-83: Well-structured table creation with proper constraints.

The migration correctly creates both tables with:

  • Appropriate foreign key constraints with CASCADE deletes
  • Proper column types matching the ORM models
  • Consistent timestamp fields

The table structures align well with the ORM models defined in the corresponding Python files.


86-88: Complete downgrade implementation.

The downgrade function properly reverses all changes by dropping tables and the enum type in the correct order.

backend/app/models/model_evaluation.py (3)

12-16: LGTM! Enum values match migration script.

The EvaluationStatus enum is properly defined and the values match those in the corresponding Alembic migration script.
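A str-backed status enum of the kind described might look like the sketch below. The member values are assumptions (the excerpt does not show them); the real members in model_evaluation.py must match the Postgres ENUM created by the migration:

```python
from enum import Enum

class EvaluationStatus(str, Enum):
    # Hypothetical values -- must stay in sync with the DB ENUM type.
    pending = "pending"
    running = "running"
    completed = "completed"
    failed = "failed"

# Subclassing str means members serialize cleanly to JSON and DB columns,
# and raw strings round-trip back to enum members.
print(EvaluationStatus.completed.value)                        # completed
print(EvaluationStatus("failed") is EvaluationStatus.failed)   # True
```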


19-36: Well-designed base model with proper foreign key constraints.

The ModelEvaluationBase class properly defines foreign key relationships to document, project, and organization entities with appropriate cascade delete behavior.


43-65: Comprehensive database model with proper relationships.

The Model_Evaluation model is well-structured with:

  • Proper primary key and foreign key definitions
  • Appropriate nullable settings for optional fields
  • Consistent timestamp fields matching other models
  • Correct bidirectional relationships with back_populates
backend/app/models/fine_tuning.py (2)

29-34: LGTM!

The create model correctly inherits all required fields from the base class. The pattern of using a separate creation model is appropriate for SQLModel.


54-58: LGTM! Relationship definitions are correct.

The relationship definitions properly use forward references with quotes and establish the correct bidirectional relationships with back_populates.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

🧹 Nitpick comments (4)
backend/app/models/model_evaluation.py (2)

1-1: Remove unused import.

The Optional import is not used in this file.

-from typing import Optional

54-54: Make score field description more generic.

The description specifically mentions "Matthews Correlation Coefficient" but the field is generic and could store other evaluation metrics. Consider making the description more general.

-    score: float = Field(nullable=True, description="Matthews Correlation Coefficient")
+    score: float = Field(nullable=True, description="Evaluation score for the specified metric")
backend/app/models/fine_tuning.py (2)

1-1: Remove unused import.

The Optional import is not used in this file.

-from typing import Optional

45-45: Consider using an enum for status field.

For consistency with Model_Evaluation.status and better type safety, consider defining a FineTuningStatus enum instead of using a generic string field.

+from enum import Enum
+
+class FineTuningStatus(str, Enum):
+    pending = "pending"
+    running = "running" 
+    completed = "completed"
+    failed = "failed"
+    cancelled = "cancelled"

-    status: str = Field(default="pending", description="Status of the fine-tuning job")
+    status: FineTuningStatus = Field(default=FineTuningStatus.pending, description="Status of the fine-tuning job")
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 3b6e7c4 and 55a0760.

📒 Files selected for processing (5)
  • backend/app/alembic/versions/a2f5ce7d32d8_add_fine_tuning_and_model_evaluation_.py (1 hunks)
  • backend/app/models/fine_tuning.py (1 hunks)
  • backend/app/models/model_evaluation.py (1 hunks)
  • backend/app/models/organization.py (2 hunks)
  • backend/app/models/project.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
  • backend/app/models/organization.py
  • backend/app/alembic/versions/a2f5ce7d32d8_add_fine_tuning_and_model_evaluation_.py
🧰 Additional context used
🧬 Code Graph Analysis (1)
backend/app/models/fine_tuning.py (5)
backend/app/models/api_key.py (2)
  • APIKey (28-38)
  • APIKeyBase (10-20)
backend/app/models/collection.py (1)
  • Collection (21-53)
backend/app/models/document.py (1)
  • Document (10-30)
backend/app/models/assistants.py (1)
  • Assistant (29-40)
backend/app/models/openai_conversation.py (1)
  • OpenAIConversation (58-69)
🪛 Ruff (0.12.2)
backend/app/models/fine_tuning.py

1-1: typing.Optional imported but unused

Remove unused import: typing.Optional

(F401)


54-54: Undefined name Project

(F821)


55-55: Undefined name Organization

(F821)


56-56: Undefined name Model_Evaluation

(F821)

backend/app/models/model_evaluation.py

1-1: typing.Optional imported but unused

Remove unused import: typing.Optional

(F401)


63-63: Undefined name Project

(F821)


64-64: Undefined name Organization

(F821)


65-65: Undefined name Fine_Tuning

(F821)

backend/app/models/project.py

52-52: Undefined name Fine_Tuning

(F821)


55-55: Undefined name Model_Evaluation

(F821)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: checks (3.11.7, 6)
🔇 Additional comments (4)
backend/app/models/project.py (1)

52-57: LGTM! Relationship definitions follow established patterns.

The new relationship fields for fine_tuning and model_evaluation are properly defined with:

  • Correct forward reference syntax using quotes
  • Appropriate back_populates configuration
  • Proper cascade delete behavior for dependent entities

This is consistent with other relationship definitions in the Project model.

backend/app/models/model_evaluation.py (1)

12-78: Well-structured model implementation.

The model follows established patterns with:

  • Proper enum definition for status values
  • Appropriate base/create/table/public class hierarchy
  • Correct foreign key definitions with cascade delete
  • Proper use of JSON column for complex data (eval_split_ratio)
  • Consistent timestamp and soft delete fields
  • Bidirectional relationships with back_populates
backend/app/models/fine_tuning.py (2)

11-70: Well-structured model following established patterns.

The model implementation is solid with:

  • Proper base/create/table/public class hierarchy
  • Correct foreign key definitions with cascade delete
  • Appropriate use of JSON column for complex data (split_ratio)
  • Consistent timestamp and soft delete fields
  • Bidirectional relationships with back_populates

56-58: Add cascade_delete=True for consistency.

The model_evaluation relationship should include cascade_delete=True for consistency with other relationships and to ensure proper cleanup when a fine-tuning job is deleted.

    model_evaluation: list["Model_Evaluation"] = Relationship(
-        back_populates="fine_tuning"
+        back_populates="fine_tuning", cascade_delete=True
    )

Likely an incorrect or invalid review comment.

@nishika26 nishika26 changed the base branch from main to feature/classification July 29, 2025 05:28
Comment on lines +15 to +19
document_id: UUID = Field(
foreign_key="document.id",
nullable=False,
ondelete="CASCADE",
)
Collaborator


Is this the same as "training_file" in the Create fine-tuning job API or the Retrieve fine-tuning job API?

Also, not sure if the fine-tuned model must be deleted when a document is deleted.

Collaborator Author


We will fetch the document using this document ID and upload it to OpenAI using this API; the ID we get in the response is what we will pass to OpenAI's fine-tuning API.

You're absolutely right to raise the question of whether a fine-tuned model should be deleted when the associated document is deleted. I had the same question in mind. However, since the current design involves deleting all child resources when the parent document is removed, I chose to follow that pattern for consistency. That said, I'm open to revisiting this. If others feel that the fine-tuned model should persist even after the document is deleted, we can definitely discuss and adjust the approach accordingly. @vijay-T4D @AkhileshNegi

Collaborator


Ideally we should go with soft delete kind of an approach and then plan the purge after a desired number of days. Also we need to check how the model behaves post a document deletion.
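The soft-delete-then-purge approach suggested here can be sketched with plain Python over dict rows (the actual implementation would be SQL queries against the deleted_at column; the 30-day retention window and row shape are assumptions):

```python
from datetime import datetime, timedelta

RETENTION_DAYS = 30  # assumed retention window before hard deletion

rows = [
    {"id": 1, "deleted_at": None},
    {"id": 2, "deleted_at": datetime.utcnow() - timedelta(days=5)},
    {"id": 3, "deleted_at": datetime.utcnow() - timedelta(days=45)},
]

def soft_delete(row):
    # Deleting just stamps the row; data is retained for the purge window.
    row["deleted_at"] = datetime.utcnow()

def live(rows):
    # Normal queries exclude soft-deleted rows.
    return [r for r in rows if r["deleted_at"] is None]

def purgeable(rows, now=None):
    # A scheduled job hard-deletes rows older than the retention window.
    now = now or datetime.utcnow()
    cutoff = now - timedelta(days=RETENTION_DAYS)
    return [r for r in rows if r["deleted_at"] and r["deleted_at"] < cutoff]

print([r["id"] for r in live(rows)])       # [1]
print([r["id"] for r in purgeable(rows)])  # [3]
```

This keeps the fine-tuned model's record recoverable for a while after its source document is deleted, deferring the hard-delete decision.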

"training_file_id", sqlmodel.sql.sqltypes.AutoString(), nullable=True
),
sa.Column("testing_file_id", sqlmodel.sql.sqltypes.AutoString(), nullable=True),
sa.Column("openai_job_id", sqlmodel.sql.sqltypes.AutoString(), nullable=True),
Collaborator


this should not be nullable true

Collaborator Author


Since the entire flow—from data processing to file upload and fine-tuning job creation—runs as a background job, these fields can be null initially. For example, if the file upload succeeds but the fine-tuning job creation fails, we’d still have the training and testing file IDs stored. Later, we can easily identify such incomplete rows (i.e., those with file IDs but no OpenAI job ID) for a given document ID and split ratio, and resume the process from there.
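The recovery query described above — rows that have file IDs but no OpenAI job ID — can be sketched over plain dict rows (column names follow the migration snippet quoted earlier; the row data is illustrative):

```python
jobs = [
    {"id": 1, "training_file_id": "file-a", "testing_file_id": "file-b", "openai_job_id": "ftjob-1"},
    {"id": 2, "training_file_id": "file-c", "testing_file_id": "file-d", "openai_job_id": None},
    {"id": 3, "training_file_id": None, "testing_file_id": None, "openai_job_id": None},
]

def resumable(jobs):
    # Files were uploaded but the fine-tuning job was never created:
    # resume from job creation instead of re-downloading and re-uploading.
    return [j for j in jobs if j["training_file_id"] and j["openai_job_id"] is None]

def restart_from_scratch(jobs):
    # Nothing uploaded yet: rerun the whole pipeline for these rows.
    return [j for j in jobs if j["training_file_id"] is None]

print([j["id"] for j in resumable(jobs)])  # [2]
```

The nullable columns are what make this partition possible: each null field marks how far the background job got before failing.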

Collaborator


Then make the DB entry only once it passes the basic checks (file uploading and so on) and the fine-tune job has started. Initially the status will be in_progress; if it fails, update the status, so an entry always means a fine-tune job was initiated. Maybe also add a column to store error messages if we get them from OpenAI.
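The suggested flow — insert only once the job is initiated, then update status on failure, with an error column — can be sketched as a small transition table. The status names and the error_message column are assumptions, not the PR's actual schema:

```python
# Allowed transitions for a row that is created only after the fine-tune
# job was initiated (so every row starts in "in_progress").
TRANSITIONS = {
    "in_progress": {"completed", "failed"},
    "failed": {"in_progress"},  # allow a retry to restart the job
    "completed": set(),
}

def advance(row, new_status, error_message=None):
    # Reject transitions the table does not allow, e.g. completed -> failed.
    if new_status not in TRANSITIONS[row["status"]]:
        raise ValueError(f"illegal transition {row['status']} -> {new_status}")
    row["status"] = new_status
    # Hypothetical column for surfacing provider errors to operators.
    row["error_message"] = error_message

job = {"status": "in_progress", "error_message": None}
advance(job, "failed", "rate_limit_exceeded")
print(job["status"], job["error_message"])  # failed rate_limit_exceeded
```

Guarding transitions centrally keeps background workers from writing inconsistent statuses when retries and failures interleave.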

Collaborator Author


Yes, but there’s still a case where file upload succeeds but fine-tuning job creation fails. If we retry, we’d end up re-downloading and re-uploading unless we explicitly delete the uploaded files on failure. Even with an error message column, we'd still be inserting DB entries without training/testing file IDs or the OpenAI job ID.

Collaborator

@vijay-T4D vijay-T4D left a comment


Approved

@nishika26 nishika26 merged commit 7ea5dad into feature/classification Aug 5, 2025
1 check passed
@nishika26 nishika26 deleted the feature/db_model branch August 5, 2025 12:56
@coderabbitai coderabbitai bot mentioned this pull request Aug 31, 2025
AkhileshNegi pushed a commit that referenced this pull request Sep 4, 2025
* Classification: db models and migration script (#305)

* db models and migration script

* Classification: Fine tuning Initiation and retrieve endpoint (#315)

* Fine-tuning core, initiation, and retrieval

* seperate session for bg task, and formating fixes

* fixing alembic revision

* Classification : Model evaluation of fine tuned models (#326)

* Model evaluation of fine tuned models

* fixing alembic revision

* alembic revision fix

* Classification : train and test data to s3 (#343)

* alembic file for adding and removing columns

* train and test s3 url column

* updating alembic revision

* formatting fix

* Classification : retaining prediction and fetching data from s3 for model evaluation (#359)

* adding new columns to model eval table

* test data and prediction data s3 url changes

* single migration file

* status enum columns

* document seeding

* Classification : small fixes and storage related changes (#365)

* first commit covering all

* changing model name to fine tuned model in model eval

* error handling in get cloud storage and document not found error handling

* fixing alembic revision

* uv lock

* new uv lock file

* updated uv lock file

* coderabbit suggestions and removing unused imports

* changes in uv lock file

* making csv a supported file format, changing uv lock and pyproject toml
@coderabbitai coderabbitai bot mentioned this pull request Sep 23, 2025
@coderabbitai coderabbitai bot mentioned this pull request Oct 8, 2025

Labels

enhancement New feature or request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Classification : DB Models and migration script

4 participants