2 changes: 1 addition & 1 deletion .github/workflows/benchmark.yml
@@ -80,7 +80,7 @@ jobs:
exit 1

- name: Upload benchmark results
-        uses: actions/upload-artifact@v4
+        uses: actions/upload-artifact@v6
with:
name: bench-${{ matrix.service }}-${{ matrix.dataset }}-${{ matrix.count }}.csv
path: bench-${{ matrix.service }}-${{ matrix.dataset }}-${{ matrix.count }}.csv
1 change: 1 addition & 0 deletions .github/workflows/cd-staging.yml
@@ -7,6 +7,7 @@ on:

jobs:
build:
if: false
Comment on lines 9 to +10

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Scripts executed (repository: ProjectTech4DevAI/kaapi-backend):

  cat -n .github/workflows/cd-staging.yml
  find .github/workflows -type f -name "*.yml" -o -name "*.yaml"
  cat -n .github/workflows/cd-production.yml
  git log -p --follow -S "if: false" -- .github/workflows/cd-staging.yml | head -100
  git log --oneline .github/workflows/cd-staging.yml | head -20
  git show 23c9685:.github/workflows/cd-staging.yml | head -50
  git log --all --oneline -- .github/workflows/cd-staging.yml | cat
  git show ef56025:.github/workflows/cd-staging.yml 2>/dev/null | head -50
  git log --all --reverse --oneline -- .github/workflows/cd-staging.yml
  git show ef56025 --stat | head -20
  git log --all --reverse --oneline | head -5
  grep -r "staging" README.md 2>/dev/null || echo "No README found or no staging mention"


Staging CD job is disabled; confirm intent and consider a toggleable condition.

`if: false` at line 10 disables the entire staging deployment job for every push to main. If this is intentional but temporary, use a repository variable to make the toggle explicit so the job can be re-enabled without code changes.

Suggested toggle via repo variable
 jobs:
   build:
-    if: false
+    if: ${{ vars.ENABLE_STAGING_CD == 'true' }}
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
   build:
-    if: false
+    if: ${{ vars.ENABLE_STAGING_CD == 'true' }}
🧰 Tools
🪛 actionlint (1.7.10)

10-10: constant expression "false" in condition. remove the if: section

(if-cond)

🤖 Prompt for AI Agents
In @.github/workflows/cd-staging.yml around lines 9-10, the staging CD job
currently uses a hard disable (`if: false`) on the `build` job which permanently
prevents runs; replace this with a toggleable condition using a repository or
workflow variable (e.g., `if: ${{ vars.STAGING_CD_ENABLED == 'true' }}` or a
workflow input) so the job can be re-enabled without editing the YAML; update
references to the `build` job and document the repo variable in repo settings or
the workflow inputs so maintainers can flip the toggle as needed.

runs-on: ubuntu-latest
environment: AWS_ENV_VARS

@@ -0,0 +1,109 @@
"""extend collection table for provider agnostic support

Revision ID: 041
Revises: 040
Create Date: 2026-01-15 16:53:19.495583

"""
from alembic import op
import sqlalchemy as sa
import sqlmodel.sql.sqltypes
from sqlalchemy.dialects import postgresql


# revision identifiers, used by Alembic.
revision = "041"
down_revision = "040"
branch_labels = None
depends_on = None

provider_type = postgresql.ENUM(
"OPENAI",
# aws
# gemini
name="providertype",
create_type=False,
)


def upgrade():

⚠️ Potential issue | 🟡 Minor

Add return type hints to migration functions.

Required by the repo’s Python typing guidelines.

✍️ Suggested update
-def upgrade():
+def upgrade() -> None:
@@
-def downgrade():
+def downgrade() -> None:

Also applies to: 74-74

🤖 Prompt for AI Agents
In `@backend/app/alembic/versions/041_extend_collection_table_for_provider_.py` at
line 29, add explicit return type hints to the Alembic migration functions:
change the signature of upgrade() (and the corresponding downgrade() at the
other location) to include a return type of None (e.g., def upgrade() -> None:
and def downgrade() -> None:) so the functions conform to the repo’s Python
typing guidelines; ensure any other migration functions in this file follow the
same pattern.

    provider_type.create(op.get_bind(), checkfirst=True)
    op.add_column(
        "collection",
        sa.Column(
            "provider",
            provider_type,
            nullable=False,
            comment="LLM provider used for this collection",
        ),
    )
    op.execute("UPDATE collection SET provider = 'OPENAI' WHERE provider IS NULL")
Comment on lines +31 to +40

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Scripts executed (repository: ProjectTech4DevAI/kaapi-backend):

  find . -name "041_extend_collection_table_for_provider_*.py" -type f
  find backend -name "*.py" -path "*alembic/versions*" | grep -i provider
  ls -la backend/app/alembic/versions/ 2>/dev/null | head -20
  find . -type d -name versions
  git ls-files | grep -i alembic | grep -i version | head -20
  find . -name "*041*" -type f 2>/dev/null
  rg "extend_collection_table_for_provider" -l
  rg "provider_type" -l | head -10
  git ls-files | grep alembic
  rg "op.add_column.*provider" -A 5 -B 2
  rg 'UPDATE collection SET provider' -B 5 -A 2
  ls -la
  git ls-files | wc -l
  git ls-files | grep -E "\.py$" | grep -i migration | head -20


🌐 Web query:

Alembic add_column non-nullable without server_default existing rows failure

💡 Result:

Problem

  • Adding a NOT NULL column to a table with existing rows fails because existing rows would have NULL for the new column and the DB enforces NOT NULL. Alembic autogenerate often emits op.add_column(... nullable=False), which triggers this. [1][2]

Safe solutions (pick one)

  1. Fast: add with a temporary server_default, then remove it
  • Add the column with server_default so the DB populates existing rows at add-time, then ALTER to drop the default while keeping nullable=False.
  • PostgreSQL can optimize ADD COLUMN with a non-volatile DEFAULT so this is fast on modern versions; older PG may rewrite the table. Example:
    • op.add_column('t', sa.Column('col', sa.String(), nullable=False, server_default='x'))
    • op.alter_column('t', 'col', server_default=None)
  • References: Alembic docs; PostgreSQL behavior discussion. [2][3]

  2. Three-step (most portable / safest)
  • Step A: add nullable column:
    • op.add_column('t', sa.Column('col', sa.String(), nullable=True))
  • Step B: populate existing rows (UPDATE in migration, possibly batched):
    • op.execute("UPDATE t SET col = 'x' WHERE col IS NULL")
  • Step C: make column NOT NULL:
    • op.alter_column('t', 'col', nullable=False)
  • Use this for fine-grained control, when you need per-row logic, or for DBs that don’t support fast server-default semantics. [1][4]

  3. Constraint/validation approach (avoid long exclusive locks)
  • Create a NOT VALID CHECK constraint enforcing col IS NOT NULL, update rows, then VALIDATE CONSTRAINT and drop the check if desired — reduces locking on large tables (Postgres-specific pattern). [5]

Notes and gotchas

  • SQLite requires batch mode (op.batch_alter_table) for some ALTERs. [2]
  • If you use autogenerate, manually edit the migration to add server_default or split into the three steps above.
  • After using a server_default to populate rows, remove the server_default if it was only intended for migration.

Concise Alembic examples

  • Server-default quick path:

    from alembic import op
    import sqlalchemy as sa

    def upgrade():
        op.add_column('users', sa.Column('plan', sa.String(), nullable=False, server_default='free'))
        op.alter_column('users', 'plan', server_default=None)

  • Three-step portable path:

    def upgrade():
        op.add_column('users', sa.Column('plan', sa.String(), nullable=True))
        op.execute("UPDATE users SET plan='free' WHERE plan IS NULL")
        op.alter_column('users', 'plan', nullable=False)

References
[1] Alembic operation docs (add_column / alter_column).
[2] Ross Gray — Alembic/Postgres defaults & performance (ADD COLUMN with DEFAULT optimization).
[3] Dev.to / blog posts showing server_default and three-step patterns.
[4] Community posts & examples resolving NotNull/IntegrityError when adding columns.
[5] Squawk docs — adding NOT NULL safely (NOT VALID check constraint pattern).

If you want, tell me your DB (Postgres/MySQL/SQLite) and I’ll produce a migration file tuned to it.


🏁 Scripts executed (repository: ProjectTech4DevAI/kaapi-backend):

  cat -n backend/app/alembic/versions/041_extend_collection_table_for_provider_.py
  wc -l backend/app/alembic/versions/041_extend_collection_table_for_provider_.py
  cat backend/app/alembic/versions/041_extend_collection_table_for_provider_.py
  cat backend/app/alembic/versions/012_added_provider_column_to_the_credential_.py
  head -100 backend/app/alembic/versions/041_extend_collection_table_for_provider_.py


Fix migration failure risk: non-nullable provider column is added without a server default.

On tables with existing rows, the ADD COLUMN with nullable=False will fail immediately without a server default, preventing the subsequent UPDATE statement from executing. Add server_default="OPENAI" to the column definition, then remove the server default after the update to allow future inserts without a default value.

Additionally, the upgrade() and downgrade() functions lack return type hints. Add -> None return type annotations to both functions per the coding guidelines.

🔧 Suggested migration fix
     op.add_column(
         "collection",
         sa.Column(
             "provider",
             provider_type,
             nullable=False,
+            server_default="OPENAI",
             comment="LLM provider used for this collection",
         ),
     )
     op.execute("UPDATE collection SET provider = 'OPENAI' WHERE provider IS NULL")
+    op.alter_column(
+        "collection",
+        "provider",
+        server_default=None,
+        existing_type=provider_type,
+    )

Also add return type hints:

-def upgrade():
+def upgrade() -> None:

-def downgrade():
+def downgrade() -> None:
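The failure mode described above is easy to reproduce outside Postgres. Below is a minimal, illustrative sketch using SQLite's in-memory database as a stand-in (the table and column names mirror the migration, but this is not the repo's code; note that SQLite rejects a no-default NOT NULL ADD COLUMN at DDL time even on an empty table, whereas Postgres fails only once existing rows violate the constraint):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE collection (id INTEGER PRIMARY KEY)")
conn.execute("INSERT INTO collection DEFAULT VALUES")  # pre-existing row

# Without a default, adding a NOT NULL column is rejected.
try:
    conn.execute("ALTER TABLE collection ADD COLUMN provider TEXT NOT NULL")
    failed = False
except sqlite3.OperationalError:
    failed = True

# With a server-side default, the same ALTER succeeds and existing
# rows are backfilled automatically.
conn.execute(
    "ALTER TABLE collection ADD COLUMN provider TEXT NOT NULL DEFAULT 'OPENAI'"
)
backfilled = conn.execute("SELECT provider FROM collection").fetchone()[0]
print(failed, backfilled)  # True OPENAI
```

This mirrors the suggested fix's structure: add the column with a default so the ALTER succeeds, then (in the real migration) drop the server default afterwards so future inserts must supply a value explicitly.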
🤖 Prompt for AI Agents
In `@backend/app/alembic/versions/041_extend_collection_table_for_provider_.py`
around lines 31-40, the migration adds a non-nullable "provider" column via
op.add_column and then executes an UPDATE, which fails on tables with rows
because there's no server default; modify the op.add_column call for "provider"
to include server_default="OPENAI" so the ALTER succeeds, then run the UPDATE
(op.execute) and finally remove the server default with an ALTER/sa.DDL or
op.alter_column to drop the server_default; also add return type hints -> None
to the upgrade() and downgrade() functions to match coding guidelines.

    op.add_column(
        "collection",
        sa.Column(
            "name",
            sqlmodel.sql.sqltypes.AutoString(),
            nullable=True,
            comment="Name of the collection",
        ),
    )
    op.add_column(
        "collection",
        sa.Column(
            "description",
            sqlmodel.sql.sqltypes.AutoString(),
            nullable=True,
            comment="Description of the collection",
        ),
    )
    op.alter_column(
        "collection",
        "llm_service_name",
        existing_type=sa.VARCHAR(),
        comment="Name of the LLM service",
        existing_comment="Name of the LLM service provider",
        existing_nullable=False,
    )
    op.create_unique_constraint(None, "collection", ["name"])

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Scripts executed (repository: ProjectTech4DevAI/kaapi-backend):

  fd -t f "041_extend_collection_table_for_provider_" backend/
  find . -path "*/alembic/versions/*" -name "*041*" -type f
  rg "exists_by_name" --type py -A 5
  cat -n backend/app/alembic/versions/041_extend_collection_table_for_provider_.py
  rg "class Collection" --type py -A 20 backend/app/models/
  cat -n backend/app/models/collection.py | head -60
  rg "project_id" backend/app/models/collection.py -B 2 -A 2
  cat -n backend/app/models/collection.py | sed -n '50,75p'
  rg "unique=True" backend/app/models/collection.py -B 5 -A 5
  rg "Collection.name" --type py -B 3 -A 3 backend/
  rg "unique_together\|UniqueConstraint" --type py backend/app/models/collection.py
  rg "Index\|unique" backend/app/models/collection.py
  python3 heredoc: scan backend/app/models/collection.py for __table_args__, UniqueConstraint, and Index definitions
  # Check if there's a previous migration that might have already defined a unique constraint differently
  git log --oneline backend/app/alembic/versions/ | head -20
  # Check for any other usages or assumptions about name uniqueness
  rg "\.name" backend/app/crud/collection/collection.py -B 2 -A 2
  # Verify the Collection model fully to see if unique=True constraint is defined at field level
  sed -n '22,85p' backend/app/models/collection.py
  # Check if there are any queries that might assume global name uniqueness
  rg "filter.*name\|where.*name" backend/app/crud/collection/ --type py -B 1 -A 1


Unique constraint must include project_id to be project-scoped.

The exists_by_name method filters by project_id, allowing the same collection name to exist in different projects. However, the global unique constraint on name alone prevents this. The DB constraint contradicts the application logic and will reject valid use cases where different projects have collections with identical names.

Required constraint change
-    op.create_unique_constraint(None, "collection", ["name"])
+    op.create_unique_constraint(
+        "uq_collection_project_id_name", "collection", ["project_id", "name"]
+    )
-    op.drop_constraint("collection_name_key", "collection", type_="unique")
+    op.drop_constraint("uq_collection_project_id_name", "collection", type_="unique")

Also update the Collection model to remove unique=True from the name field (line 52) and add a composite unique constraint via __table_args__ if ORM-level validation is needed.
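The project-scoped behavior this review asks for can be sanity-checked with a throwaway SQLite table (illustrative only; the real schema is Postgres and managed by Alembic):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE collection (
        id INTEGER PRIMARY KEY,
        project_id INTEGER NOT NULL,
        name TEXT,
        UNIQUE (project_id, name)  -- composite constraint, scoped per project
    )
    """
)
conn.execute("INSERT INTO collection (project_id, name) VALUES (1, 'faq')")
# Same name in a different project is allowed under the composite constraint.
conn.execute("INSERT INTO collection (project_id, name) VALUES (2, 'faq')")

# A duplicate within the same project is rejected.
try:
    conn.execute("INSERT INTO collection (project_id, name) VALUES (1, 'faq')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True
print(duplicate_rejected)  # True
```

Under a global unique constraint on name alone, the second insert above would already fail, which is exactly the valid use case the review says the migration breaks.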

📝 Committable suggestion


Suggested change
-    op.create_unique_constraint(None, "collection", ["name"])
+    op.create_unique_constraint(
+        "uq_collection_project_id_name", "collection", ["project_id", "name"]
+    )
🤖 Prompt for AI Agents
In `@backend/app/alembic/versions/041_extend_collection_table_for_provider_.py` at
line 67, the database unique constraint on collection.name is too strict; change
the migration that calls op.create_unique_constraint(None, "collection",
["name"]) to instead create a composite unique constraint on ("project_id",
"name") so names are scoped per project, and update the Collection ORM model to
remove unique=True from the name column and add a composite unique constraint in
__table_args__ (e.g., UniqueConstraint("project_id", "name")). Also ensure the
exists_by_name method still filters by project_id and will now align with the DB
constraint.

    op.drop_constraint(
        op.f("collection_organization_id_fkey"), "collection", type_="foreignkey"
    )
    op.drop_column("collection", "organization_id")


def downgrade():
    op.add_column(
        "collection",
        sa.Column(
            "organization_id",
            sa.INTEGER(),
            autoincrement=False,
            nullable=True,
            comment="Reference to the organization",
        ),
    )
    op.execute(
        """UPDATE collection SET organization_id = (SELECT organization_id FROM project
        WHERE project.id = collection.project_id)"""
    )
    op.alter_column("collection", "organization_id", nullable=False)
    op.create_foreign_key(
        op.f("collection_organization_id_fkey"),
        "collection",
        "organization",
        ["organization_id"],
        ["id"],
        ondelete="CASCADE",
    )
    op.drop_constraint("collection_name_key", "collection", type_="unique")
    op.alter_column(
        "collection",
        "llm_service_name",
        existing_type=sa.VARCHAR(),
        comment="Name of the LLM service provider",
        existing_comment="Name of the LLM service",
        existing_nullable=False,
    )
    op.drop_column("collection", "description")
    op.drop_column("collection", "name")
    op.drop_column("collection", "provider")
4 changes: 2 additions & 2 deletions backend/app/api/main.py
@@ -20,7 +20,7 @@
onboarding,
credentials,
cron,
-    evaluation,
+    evaluations,
fine_tuning,
model_evaluation,
collection_job,
@@ -37,7 +37,7 @@
api_router.include_router(cron.router)
api_router.include_router(documents.router)
api_router.include_router(doc_transformation_job.router)
-api_router.include_router(evaluation.router)
+api_router.include_router(evaluations.router)
api_router.include_router(llm.router)
api_router.include_router(login.router)
api_router.include_router(onboarding.router)
4 changes: 4 additions & 0 deletions backend/app/api/routes/collections.py
@@ -28,6 +28,7 @@
CollectionPublic,
)
from app.utils import APIResponse, load_description, validate_callback_url
+from app.services.collections.helpers import ensure_unique_name
from app.services.collections import (
create_collection as create_service,
delete_collection as delete_service,
@@ -87,6 +88,9 @@ def create_collection(
if request.callback_url:
validate_callback_url(str(request.callback_url))

+    if request.name:
+        ensure_unique_name(session, current_user.project_.id, request.name)

collection_job_crud = CollectionJobCrud(session, current_user.project_.id)
collection_job = collection_job_crud.create(
CollectionJobCreate(
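For context on the route change above, here is a plausible shape for the ensure_unique_name helper it imports. This is hypothetical: the actual implementation lives in app/services/collections/helpers.py, is not shown in this diff, and presumably uses a SQLModel session and an HTTPException rather than the sqlite3 connection and ValueError used in this sketch.

```python
import sqlite3


def ensure_unique_name(conn: sqlite3.Connection, project_id: int, name: str) -> None:
    """Raise if a collection with this name already exists in the project.

    Hypothetical sketch; the repo's helper likely queries via SQLModel and
    raises an HTTP 409 instead.
    """
    row = conn.execute(
        "SELECT 1 FROM collection WHERE project_id = ? AND name = ?",
        (project_id, name),
    ).fetchone()
    if row is not None:
        raise ValueError(
            f"Collection named {name!r} already exists in project {project_id}"
        )


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE collection (project_id INTEGER, name TEXT)")
conn.execute("INSERT INTO collection VALUES (1, 'docs')")

ensure_unique_name(conn, 2, "docs")  # different project: passes silently
try:
    ensure_unique_name(conn, 1, "docs")  # same project: rejected
    raised = False
except ValueError:
    raised = True
print(raised)  # True
```

Checking in the route keeps the error message friendly, while the composite DB constraint discussed earlier remains the authoritative guard against races between concurrent requests.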