Skip to content

add a document table to persist all the uploaded document names and sizes#9

Merged
alvinsjoy merged 5 commits into
mainfrom
dev
Apr 28, 2026
Merged

add a document table to persist all the uploaded document names and sizes#9
alvinsjoy merged 5 commits into
mainfrom
dev

Conversation

@alvinsjoy
Copy link
Copy Markdown
Member

No description provided.

@alvinsjoy alvinsjoy requested a review from Copilot April 28, 2026 14:57
@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented Apr 28, 2026

Warning

Rate limit exceeded

@alvinsjoy has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 42 minutes and 26 seconds before requesting another review.

To keep reviews running without waiting, you can enable usage-based add-on for your organization. This allows additional reviews beyond the hourly cap. Account admins can enable it under billing.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 425181c1-49e9-47a0-ba18-9da00dccab37

📥 Commits

Reviewing files that changed from the base of the PR and between 8892ef6 and 267abff.

📒 Files selected for processing (3)
  • app/api/upload/route.ts
  • backend/services/vector_service.py
  • components/chat-area.tsx
📝 Walkthrough

Walkthrough

Replaces Chat.documentCount with a Document model across backend, API, database, data layer, actions, and UI; ingestion now returns storage_key and file_size, uploads persist Document records (id from storage_key, size as BigInt), and deletions remove Document rows and vector chunks by storage_key.

Changes

Cohort / File(s) Summary
API Upload Handler
app/api/upload/route.ts
POST upload now reads storage_key and file_size from ingestion response and creates a Document record (id from storage_key, size -> BigInt) instead of incrementing chat.documentCount.
Backend Ingestion
backend/api/v1/ingestion_routes.py, backend/schemas/ingestion.py, backend/services/ingestion_service.py
Ingestion route generates storage_key; service returns storage_key and file_size (bytes); chunk IDs/metadata use storage_key as source_id.
Vector/Delete Flow
backend/api/v1/vector_routes.py, backend/services/vector_service.py
Delete endpoints and service switched from filename-based deletion to storage_key-based deletion (filter on source_id).
Data Layer & Types
lib/data/chat.ts, lib/types.ts
Adds exported Document type and Chat.documents: Document[]; queries now include related documents and derive documentCount from Prisma _count.documents.
Actions / Deletion
lib/actions/chat.ts
deleteDocumentAction signature changed to accept documentId (storage_key); backend DELETE uses storage_key; after backend success, deletes Document row by primary key instead of decrementing Chat counter.
Frontend UI
components/chat-area.tsx, app/chat/loading.tsx
Chat-area seeds documents from currentChat.documents, formats file size, removes document-count subtitle, compacts upload/document chip UI; loading header skeleton made more compact.
Database Schema & Migrations
prisma/schema.prisma, prisma/migrations/...add_document_model/migration.sql, prisma/migrations/...change_size_to_bigint/migration.sql, prisma/migrations/...remove_document_filename/migration.sql
Removes Chat.documentCount, adds Document model (name, size as BIGINT, createdAt, chatId FK with cascade), migrates size -> BIGINT, and removes deprecated filename column.

Sequence Diagram(s)

sequenceDiagram
  participant Client
  participant AppAPI as App API (Next.js route)
  participant Python as Ingestion Backend
  participant DB as Prisma/Database
  participant Vector as Vector Service

  Client->>AppAPI: POST /api/upload (file, chatId, auth)
  AppAPI->>Python: POST /ingest (stream file, chat_id, generated storage_key)
  Python-->>AppAPI: 200 { chat_id, filename, storage_key, file_size, ... }
  AppAPI->>DB: prisma.document.create({ id: storage_key, name: filename, size: BigInt(file_size), chatId })
  AppAPI->>Vector: (optional) notify vector or rely on Python to upsert chunks using storage_key
  AppAPI-->>Client: 200 OK (document persisted)
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

Poem

🐰
Hop, I carried keys of storage bright,
From count to catalog, each file takes flight.
Chunks know their source, sizes tallied true,
I nibble bugs and leave the code anew. ✨

🚥 Pre-merge checks | ✅ 3 | ❌ 2

❌ Failed checks (1 warning, 1 inconclusive)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 57.14% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
Description check ❓ Inconclusive No description was provided by the author, making it impossible to assess relevance to the changeset. Add a pull request description explaining the motivation, implementation details, and any breaking changes or migration notes.
✅ Passed checks (3 passed)
Check name Status Explanation
Title check ✅ Passed The title accurately summarizes the main change—introducing a Document table to persist uploaded document metadata (names and sizes).
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch dev

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
backend/schemas/ingestion.py (1)

23-31: ⚠️ Potential issue | 🟡 Minor

Update the example to include the new file_size field.

The json_schema_extra example is missing the newly added file_size field. This could cause confusion for API consumers reviewing the generated OpenAPI documentation.

📝 Proposed fix
     class Config:
         json_schema_extra = {
             "example": {
                 "chat_id": "uuid-123",
                 "filename": "architecture_guide.pdf",
+                "file_size": 1048576,
                 "chunks_created": 15,
                 "document_ids": ["arch_guide_chunk_0", "arch_guide_chunk_1"],
                 "pages_processed": 5
             }
         }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/schemas/ingestion.py` around lines 23 - 31, The json_schema_extra
example in the ingestion model is missing the new file_size field; update the
example object inside json_schema_extra (the "example" dict) to include a
file_size key with a realistic numeric value (e.g., file_size: 102400) so the
generated OpenAPI docs show the new property for consumers.
🧹 Nitpick comments (1)
prisma/migrations/20260428143008_add_document_model/migration.sql (1)

88-88: Add uniqueness for document identity per chat.

Without a composite unique key, retries can create duplicate document rows for the same (chatId, filename), which hurts idempotency and document counts.

Index addition
 -- CreateIndex
 CREATE INDEX "Document_chatId_idx" ON "Document"("chatId");
+
+-- CreateIndex
+CREATE UNIQUE INDEX "Document_chatId_filename_key" ON "Document"("chatId", "filename");
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@prisma/migrations/20260428143008_add_document_model/migration.sql` at line
88, The migration currently creates a non-unique index ("Document_chatId_idx")
on Document(chatId) but needs a composite uniqueness constraint to prevent
duplicate documents per chat; modify the migration to add a composite unique
index or UNIQUE constraint on (chatId, filename) for the Document table (e.g.,
create a UNIQUE INDEX or ALTER TABLE ADD CONSTRAINT unique on chatId+filename)
so (chatId, filename) is enforced unique and idempotent — reference the Document
table and the chatId and filename columns when applying the change and
update/remove the existing non-unique index name accordingly.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@components/chat-area.tsx`:
- Around line 57-65: The documents state (documents/useState) is only
initialized once and isn't reset when currentChat changes; update the effect
that currently resets localMessages (the useEffect around localMessages reset)
to also reset documents by calling setDocuments with currentChat?.documents
mapped into UploadedDocument shape (id, name, size via formatFileSize, filename,
uploading:false) whenever currentChat changes so documents reflect the newly
selected chat.

In `@prisma/migrations/20260428143008_add_document_model/migration.sql`:
- Around line 96-97: The migration adds the FK constraint ALTER TABLE "Chat" ADD
CONSTRAINT "Chat_userId_fkey" ... referencing "user", but existing Chat rows
have userId='guest' so the FK will fail; before adding this FK either insert a
corresponding 'guest' row into the newly created "user" table or backfill Chat
rows with a valid user id. Modify the migration to (1) after creating the "user"
table but before the ALTER TABLE, INSERT a 'guest' user record with the same id
used by Chat, or (2) run an UPDATE on the "Chat" table to set userId to an
existing user id for orphaned rows, then add the ALTER TABLE statement. Ensure
the insert/backfill runs in the same migration and precedes the ALTER TABLE
"Chat" ADD CONSTRAINT "Chat_userId_fkey" FOREIGN KEY ("userId") ... statement.
- Line 74: The migration uses "size" INTEGER which can overflow for large files;
update the migration to alter the column type from INTEGER to BIGINT (change
"size" INTEGER NOT NULL to "size" BIGINT NOT NULL) and ensure any related DDL
(constraints/defaults) that reference "size" are preserved; also update the
corresponding Prisma model field (the Document model's size attribute) to
BigInt/BigInt? type so the ORM and DB types stay in sync.

---

Outside diff comments:
In `@backend/schemas/ingestion.py`:
- Around line 23-31: The json_schema_extra example in the ingestion model is
missing the new file_size field; update the example object inside
json_schema_extra (the "example" dict) to include a file_size key with a
realistic numeric value (e.g., file_size: 102400) so the generated OpenAPI docs
show the new property for consumers.

---

Nitpick comments:
In `@prisma/migrations/20260428143008_add_document_model/migration.sql`:
- Line 88: The migration currently creates a non-unique index
("Document_chatId_idx") on Document(chatId) but needs a composite uniqueness
constraint to prevent duplicate documents per chat; modify the migration to add
a composite unique index or UNIQUE constraint on (chatId, filename) for the
Document table (e.g., create a UNIQUE INDEX or ALTER TABLE ADD CONSTRAINT unique
on chatId+filename) so (chatId, filename) is enforced unique and idempotent —
reference the Document table and the chatId and filename columns when applying
the change and update/remove the existing non-unique index name accordingly.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: f96d3e3c-338f-4660-9bdb-57c9c39e4391

📥 Commits

Reviewing files that changed from the base of the PR and between 6061c86 and eba8ee7.

📒 Files selected for processing (9)
  • app/api/upload/route.ts
  • backend/schemas/ingestion.py
  • backend/services/ingestion_service.py
  • components/chat-area.tsx
  • lib/actions/chat.ts
  • lib/data/chat.ts
  • lib/types.ts
  • prisma/migrations/20260428143008_add_document_model/migration.sql
  • prisma/schema.prisma

Comment thread components/chat-area.tsx
"id" TEXT NOT NULL,
"name" TEXT NOT NULL,
"filename" TEXT NOT NULL,
"size" INTEGER NOT NULL,
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

cat prisma/migrations/20260428143008_add_document_model/migration.sql

Repository: PointerRAG/Project

Length of output: 2876


🏁 Script executed:

# Check the schema file to understand size field context and consistency
find . -name "schema.prisma" -o -name "*schema*" -type f | head -20

Repository: PointerRAG/Project

Length of output: 116


🏁 Script executed:

cat prisma/schema.prisma

Repository: PointerRAG/Project

Length of output: 2532


🏁 Script executed:

# Check if there are any file size limits or constraints defined in the codebase
rg -i "file.*size|max.*upload|size.*limit" --type ts --type js --type tsx --type jsx | head -30

Repository: PointerRAG/Project

Length of output: 90


🏁 Script executed:

# Search for file size constraints without specifying file types
rg -i "file.*size|max.*upload|size.*limit|INTEGER|BIGINT" prisma/ src/ app/ --max-count 20

Repository: PointerRAG/Project

Length of output: 506


🏁 Script executed:

cat app/api/upload/route.ts

Repository: PointerRAG/Project

Length of output: 3386


🏁 Script executed:

# Search for upload size limits or constraints
rg -i "limit|max|size" app/api/upload/ --max-count 15

Repository: PointerRAG/Project

Length of output: 175


Change size field to BIGINT to prevent overflow.

PostgreSQL INTEGER has a maximum value of ~2.14 GB, which is insufficient for document storage. Files larger than this limit would cause integer overflow and data corruption. BIGINT safely handles arbitrarily large file sizes.

Type adjustment
-    "size" INTEGER NOT NULL,
+    "size" BIGINT NOT NULL,
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
"size" INTEGER NOT NULL,
"size" BIGINT NOT NULL,
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@prisma/migrations/20260428143008_add_document_model/migration.sql` at line
74, The migration uses "size" INTEGER which can overflow for large files; update
the migration to alter the column type from INTEGER to BIGINT (change "size"
INTEGER NOT NULL to "size" BIGINT NOT NULL) and ensure any related DDL
(constraints/defaults) that reference "size" are preserved; also update the
corresponding Prisma model field (the Document model's size attribute) to
BigInt/BigInt? type so the ORM and DB types stay in sync.

Comment thread prisma/migrations/20260428143008_add_document_model/migration.sql
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds first-class document metadata tracking to the chat system by introducing a Document model/table and plumbing file size through the ingestion/upload flow, enabling the UI to render uploaded document chips based on persisted records.

Changes:

  • Replaces Chat.documentCount column with a Chat -> Document[] relation and counts via Prisma _count.
  • Extends backend ingestion response to include file_size, and creates Document records on upload + deletes them on document removal.
  • Updates chat UI to initialize document chips from currentChat.documents and tweaks header/chip styling.

Reviewed changes

Copilot reviewed 10 out of 10 changed files in this pull request and generated 6 comments.

Show a summary per file
File Description
prisma/schema.prisma Removes documentCount, adds Document model and Chat.documents relation.
prisma/migrations/20260428143008_add_document_model/migration.sql Migration to drop documentCount and create Document table (currently includes additional unrelated table creation/FKs).
lib/types.ts Adds Document type and Chat.documents.
lib/data/chat.ts Fetches documents and document counts via Prisma includes + _count.
lib/actions/chat.ts Deletes Document records when removing a document (no longer decrements documentCount).
components/chat-area.tsx Seeds document chips from persisted documents; adjusts header and chip styling.
backend/services/ingestion_service.py Adds file_size to ingestion response.
backend/schemas/ingestion.py Adds file_size field to IngestResponse.
app/api/upload/route.ts Creates Document row after successful ingestion using returned filename + file_size.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread components/chat-area.tsx
Comment on lines +57 to +65
const [documents, setDocuments] = useState<UploadedDocument[]>(
(currentChat?.documents ?? []).map((d) => ({
id: d.id,
name: d.name,
size: formatFileSize(d.size),
filename: d.filename,
uploading: false,
})),
);
Copy link

Copilot AI Apr 28, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

documents state is initialized from currentChat?.documents, but it won’t update when navigating to a different chat or after router.refresh() changes currentChat props. This can cause the document chips to show the previous chat’s documents or miss newly persisted documents. Add an effect keyed on currentChat?.id (and/or currentChat?.documents) to reset documents from the latest props when the active chat changes.

Copilot uses AI. Check for mistakes.
Comment thread lib/actions/chat.ts Outdated
Comment on lines 171 to 174
// Delete the Document record from Prisma
await prisma.document.deleteMany({
where: { chatId, filename },
});
Copy link

Copilot AI Apr 28, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

deleteDocumentAction deletes Document rows by (chatId, filename). If a user uploads multiple documents with the same filename into the same chat, a single remove action will delete all matching records (and the vector delete endpoint also targets filename). If you intend single-document deletion, consider using a unique storage key per upload (persisted in Document.filename) and deleting by that, or enforce uniqueness on (chatId, filename) to make the current behavior explicit.

Copilot uses AI. Check for mistakes.
Comment on lines +11 to +22
-- CreateTable
CREATE TABLE "user" (
"id" TEXT NOT NULL,
"name" TEXT NOT NULL,
"email" TEXT NOT NULL,
"emailVerified" BOOLEAN NOT NULL,
"image" TEXT,
"createdAt" TIMESTAMP(3) NOT NULL,
"updatedAt" TIMESTAMP(3) NOT NULL,

CONSTRAINT "user_pkey" PRIMARY KEY ("id")
);
Copy link

Copilot AI Apr 28, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This migration is creating the Better Auth tables ("user", "session", "account", "verification") even though the repo already has a Better Auth SQL migration that creates them. Applying this Prisma migration after Better Auth migrations will fail with "relation already exists". Regenerate/split the migration so Prisma only adds the new Document table (and any Chat/Message changes), and ensure the auth tables are managed by a single migration source of truth.

Copilot uses AI. Check for mistakes.
Comment on lines +7 to +10
-- AlterTable
ALTER TABLE "Chat" DROP COLUMN "documentCount",
ALTER COLUMN "userId" DROP DEFAULT;

Copy link

Copilot AI Apr 28, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This migration drops the "Chat".userId default (previously 'guest' in the initial migration) and adds a foreign key to "user". If any existing Chat rows have userId values that don’t exist in "user" (e.g. 'guest'), the FK add will fail at migration time. Consider a data migration step (backfill/delete/ensure a matching user row) or avoid adding the FK until data is consistent.

Copilot uses AI. Check for mistakes.
Comment thread app/api/upload/route.ts
Comment on lines +97 to +106
// 5. Create a Document record in Prisma
const filename = result?.filename || "unknown";
const fileSize = result?.file_size || 0;

await prisma.document.create({
data: {
documentCount: { increment: 1 },
updatedAt: new Date(),
name: filename,
filename: filename,
size: BigInt(fileSize),
chatId,
Copy link

Copilot AI Apr 28, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The schema comment says Document.name is the original display name and Document.filename is the backend storage name, but this code sets both to result.filename (which the backend describes as the original filename). Either persist the original name separately from a stable storage key returned by the backend, or rename/adjust the schema fields/comments so they reflect what’s actually stored. Also, silently defaulting to "unknown" / 0 risks writing bad rows if the backend response changes—prefer validating required fields and returning an error instead.

Copilot uses AI. Check for mistakes.
Comment thread lib/actions/chat.ts
await prisma.document.deleteMany({
where: { chatId, filename },
});

Copy link

Copilot AI Apr 28, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Previously document deletes updated Chat.updatedAt (via prisma.chat.update). Now the Document row is deleted but the Chat timestamp won’t change, which can leave the sidebar ordering / "last updated" time stale after document add/remove (getUserChats orders by updatedAt). Consider touching the Chat (e.g., updating updatedAt) when adding/removing documents to preserve the existing UX behavior.

Suggested change
// Touch the chat so updatedAt reflects document removal and chat ordering stays fresh
await prisma.chat.update({
where: { id: chatId },
data: { updatedAt: new Date() },
});

Copilot uses AI. Check for mistakes.
@alvinsjoy alvinsjoy requested a review from Copilot April 28, 2026 15:30
@alvinsjoy alvinsjoy changed the title add a document table to keep all the uploaded document names and sizes add a document table to persist all the uploaded document names and sizes Apr 28, 2026
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 11 out of 11 changed files in this pull request and generated 6 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread prisma/schema.prisma Outdated
Comment on lines +97 to +98
name String // original display name (e.g. "report.pdf")
filename String // backend storage name
Copy link

Copilot AI Apr 28, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Document.filename is documented as a backend storage name, but the backend ingestion API and delete endpoint operate on the original upload filename. Either update this field’s meaning/comment to match actual usage, or rename/split the fields so name (display) and filename (storage key) are truly distinct.

Suggested change
name String // original display name (e.g. "report.pdf")
filename String // backend storage name
name String // display name shown in the UI
filename String // original upload filename used by ingestion/delete APIs

Copilot uses AI. Check for mistakes.
Comment thread app/api/upload/route.ts
Comment on lines +97 to +106
// 5. Create a Document record in Prisma
const filename = result?.filename || "unknown";
const fileSize = result?.file_size || 0;

await prisma.document.create({
data: {
documentCount: { increment: 1 },
updatedAt: new Date(),
name: filename,
filename: filename,
size: BigInt(fileSize),
chatId,
Copy link

Copilot AI Apr 28, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This stores the same value into both name and filename. Since the backend currently returns the original upload filename, this makes filename redundant and contradicts the schema’s intent of separating display name vs storage key. Consider persisting the original filename to name and using a distinct storage identifier for filename (or drop one of the fields).

Copilot uses AI. Check for mistakes.
Comment thread app/api/upload/route.ts
Comment on lines +101 to 108
await prisma.document.create({
data: {
documentCount: { increment: 1 },
updatedAt: new Date(),
name: filename,
filename: filename,
size: BigInt(fileSize),
chatId,
},
});
Copy link

Copilot AI Apr 28, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Document upload no longer updates Chat.updatedAt. Since the chat list is ordered by updatedAt, uploads may not bump the chat in recency ordering (previously this was updated when incrementing documentCount). Consider updating the parent chat’s updatedAt (or using a dedicated “lastActivityAt”) when creating a Document.

Copilot uses AI. Check for mistakes.
Comment on lines 14 to 20
"""Successful response for document ingestion."""
chat_id: str = Field(..., description="Chat ID the documents were ingested into")
filename: str = Field(..., description="Original filename of the uploaded document")
file_size: int = Field(..., description="File size in bytes")
chunks_created: int = Field(..., description="Number of text chunks created from the document")
document_ids: List[str] = Field(..., description="List of document IDs for the created chunks")
pages_processed: int = Field(default=1, description="Number of pages processed (for PDFs)")
Copy link

Copilot AI Apr 28, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IngestResponse now includes required file_size, but the json_schema_extra example below doesn’t include it. Consider adding file_size to the example so the generated OpenAPI docs stay accurate.

Copilot uses AI. Check for mistakes.
Comment on lines +4 to +8
- You are about to drop the column `documentCount` on the `Chat` table. All the data in the column will be lost.

*/
-- AlterTable
ALTER TABLE "Chat" DROP COLUMN "documentCount",
Copy link

Copilot AI Apr 28, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Dropping Chat.documentCount will irreversibly lose existing counts, and switching to a Document table means existing chats with uploaded docs will now show 0 documents unless you backfill Document rows. If this repo has any existing data, consider a data migration/backfill plan (or keep the column temporarily) so document counts don’t regress after deploy.

Suggested change
- You are about to drop the column `documentCount` on the `Chat` table. All the data in the column will be lost.
*/
-- AlterTable
ALTER TABLE "Chat" DROP COLUMN "documentCount",
- Existing `Chat.documentCount` values are preserved in this migration to avoid regressing document counts for pre-existing chats.
*/
-- AlterTable
ALTER TABLE "Chat"

Copilot uses AI. Check for mistakes.
Comment on lines +97 to +98
ALTER TABLE "Chat" ADD CONSTRAINT "Chat_userId_fkey" FOREIGN KEY ("userId") REFERENCES "user"("id") ON DELETE CASCADE ON UPDATE CASCADE;

Copy link

Copilot AI Apr 28, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adding a foreign key from Chat.userId to "user" can fail on deploy if any existing Chat rows have userId values that don’t exist in the auth "user" table (e.g., legacy 'guest' rows). Before adding this constraint, ensure the data is cleaned/backfilled (or create a corresponding user row) so the migration is guaranteed to succeed.

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 15 out of 15 changed files in this pull request and generated 6 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread app/api/upload/route.ts Outdated
// 5. Create a Document record in Prisma
// Use storage_key as the Document ID for direct lookup during deletion
const filename = result?.filename || "unknown";
const storageKey = result?.storage_key || filename;
Copy link

Copilot AI Apr 28, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Falling back to filename when storage_key is missing can create non-unique Document IDs and break deletion (and can cause Prisma unique constraint failures). Treat storage_key as required (or generate a UUID here) and return an error if the backend didn’t provide it.

Suggested change
const storageKey = result?.storage_key || filename;
const storageKey =
typeof result?.storage_key === "string" ? result.storage_key.trim() : "";
if (!storageKey) {
console.error("Python ingestion response missing storage_key:", result);
return NextResponse.json(
{ error: "Invalid response from backend: missing storage_key" },
{ status: 502 },
);
}

Copilot uses AI. Check for mistakes.
Comment on lines +11 to +15
-- CreateTable
CREATE TABLE "user" (
"id" TEXT NOT NULL,
"name" TEXT NOT NULL,
"email" TEXT NOT NULL,
Copy link

Copilot AI Apr 28, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This migration creates the auth tables (user, session, etc.). Those tables are already created by better-auth_migrations/2025-11-28T17-20-59.231Z.sql, so re-creating them here will fail on existing databases. Remove these CREATE TABLE statements from the Prisma migration and only apply the incremental changes needed for this PR.

Copilot uses AI. Check for mistakes.
Comment on lines +73 to +74
"filename" TEXT NOT NULL,
"size" INTEGER NOT NULL,
Copy link

Copilot AI Apr 28, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Document table is created with a filename column and size INTEGER, but later migrations immediately change size to BIGINT and drop filename. Consider squashing/editing this migration to create the table in its final shape (name + size BIGINT, no filename) to avoid unnecessary churn and potential drift.

Suggested change
"filename" TEXT NOT NULL,
"size" INTEGER NOT NULL,
"size" BIGINT NOT NULL,

Copilot uses AI. Check for mistakes.
Comment thread lib/data/chat.ts
Comment on lines +69 to +71
name: d.name,
size: Number(d.size),
createdAt: new Date(d.createdAt).toISOString(),
Copy link

Copilot AI Apr 28, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Casting BigInt to Number can lose precision for values above Number.MAX_SAFE_INTEGER. If you want to preserve exact sizes, consider returning size as a string (or bigint) through your types and only formatting for display at the UI layer.

Copilot uses AI. Check for mistakes.
Comment thread components/chat-area.tsx
Comment on lines 167 to 172
id: `${Date.now()}-${index}-${file.name}`,
name: file.name,
size: (file.size / 1024).toFixed(2) + " KB",
filename: file.name,
uploading: true,
} satisfies UploadedDocument,
}));
Copy link

Copilot AI Apr 28, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The optimistic document id is generated client-side, but deletes now use doc.id as the backend storage_key/Prisma Document.id. After upload succeeds you need to replace this optimistic id with the server-provided storage_key (or fully re-sync documents from refreshed currentChat.documents), otherwise delete will call the backend/DB with an ID that doesn’t exist.

Copilot uses AI. Check for mistakes.
Comment thread components/chat-area.tsx
uploading: false,
})),
);
}, [messages, currentChat?.id]);
Copy link

Copilot AI Apr 28, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The documents state is only reset when currentChat?.id changes, so after router.refresh() the updated currentChat.documents won't repopulate local state (same chat id). This leaves the UI with stale/optimistic document IDs and breaks later delete calls. Update the effect dependencies (or derive documents directly from props) so local state syncs when the documents list changes.

Suggested change
}, [messages, currentChat?.id]);
}, [messages, currentChat?.id, currentChat?.documents]);

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
lib/actions/chat.ts (1)

161-174: ⚠️ Potential issue | 🔴 Critical

Scope the document delete to the current chat before calling the backend.

chatId ownership is checked, but documentId is never tied back to that chat. A caller can pass a valid chat they own plus a document ID from another chat; the vector delete becomes a no-op for the owned chat, then prisma.document.delete({ id: documentId }) still removes the other record.

🔒 Proposed fix
+  const document = await prisma.document.findFirst({
+    where: { id: documentId, chatId },
+    select: { id: true },
+  });
+
+  if (!document) {
+    throw new Error("Document not found");
+  }
+
   // Tell the Python backend to delete all chunks for this document
-  const url = `${BACKEND_API_BASE}/vector/document/${chatId}?storage_key=${encodeURIComponent(documentId)}`;
+  const url = `${BACKEND_API_BASE}/vector/document/${chatId}?storage_key=${encodeURIComponent(document.id)}`;
   const response = await fetch(url, { method: "DELETE" });
@@
   // Delete the Document record from Prisma by its unique ID
   await prisma.document.delete({
-    where: { id: documentId },
+    where: { id: document.id },
   });
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@lib/actions/chat.ts` around lines 161 - 174, The document delete is not
scoped to the chat so a user can delete a document from another chat; before
calling the backend (using BACKEND_API_BASE and fetch) verify ownership by
looking up the document with the current chatId (e.g. await
prisma.document.findFirst({ where: { id: documentId, chatId } })) and throw if
not found; only after that successful check proceed to call the Python backend
and then remove the Prisma record (use prisma.document.delete or
prisma.document.deleteMany with the same id+chatId where clause to ensure the DB
delete is also scoped).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@backend/services/vector_service.py`:
- Around line 247-253: delete_documents_by_storage_key currently calls
get_collection(chat_id) which auto-creates collections; modify
delete_documents_by_storage_key to avoid creating collections by using a
non-creating lookup (e.g., add an optional create=False parameter to
get_collection or add a new helper like find_collection/get_existing_collection)
and return immediately (no-op) if no collection exists, otherwise proceed with
the delete logic; update references to get_collection in
delete_documents_by_storage_key to use the new non-creating behavior so orphaned
empty collections are not created.

---

Outside diff comments:
In `@lib/actions/chat.ts`:
- Around line 161-174: The document delete is not scoped to the chat so a user
can delete a document from another chat; before calling the backend (using
BACKEND_API_BASE and fetch) verify ownership by looking up the document with the
current chatId (e.g. await prisma.document.findFirst({ where: { id: documentId,
chatId } })) and throw if not found; only after that successful check proceed to
call the Python backend and then remove the Prisma record (use
prisma.document.delete or prisma.document.deleteMany with the same id+chatId
where clause to ensure the DB delete is also scoped).
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 5ea1858a-fec8-453d-9ee3-dbadc7588894

📥 Commits

Reviewing files that changed from the base of the PR and between eba8ee7 and 8892ef6.

📒 Files selected for processing (14)
  • app/api/upload/route.ts
  • app/chat/loading.tsx
  • backend/api/v1/ingestion_routes.py
  • backend/api/v1/vector_routes.py
  • backend/schemas/ingestion.py
  • backend/services/ingestion_service.py
  • backend/services/vector_service.py
  • components/chat-area.tsx
  • lib/actions/chat.ts
  • lib/data/chat.ts
  • lib/types.ts
  • prisma/migrations/20260428150627_change_size_to_bigint/migration.sql
  • prisma/migrations/20260428155457_remove_document_filename/migration.sql
  • prisma/schema.prisma
✅ Files skipped from review due to trivial changes (2)
  • prisma/migrations/20260428150627_change_size_to_bigint/migration.sql
  • lib/types.ts
🚧 Files skipped from review as they are similar to previous changes (5)
  • backend/schemas/ingestion.py
  • app/api/upload/route.ts
  • prisma/schema.prisma
  • backend/services/ingestion_service.py
  • components/chat-area.tsx

Comment thread backend/services/vector_service.py
@alvinsjoy alvinsjoy merged commit ddd57f0 into main Apr 28, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants