refactor(platform): collapse RAG indexing status onto fileMetadata.ragStatus#1840
Conversation
…gStatus RAG indexing status was stored in three places that drifted: documents.ragInfo + documents.indexed, fileMetadata.ragStatus, and the RAG service's own row, kept loosely in sync by polling. Make fileMetadata.ragStatus the single source of truth within Convex and project it on reads. - documents.ragInfo / documents.indexed are no longer written (kept @deprecated, optional, for existing rows). Reads go through a new getDocumentRagProjection helper (join documents.fileId == fileMetadata.storageId), so the response shape and the frontend are unchanged. - Add fileMetadata.ragIndexedAt (canonical completion timestamp) + a new by_organizationId_and_ragStatus index. getAgentScopedFileIds and listIndexedDocumentsForAgent query that index and resolve to documents, replacing the retired documents by_organizationId_and_indexed index. - Add a server-driven poller pollFileRagStatus (keyed by storageId), scheduled by uploadFileToRag, so Document-Hub uploads advance to completed even with no chat open. The documents-pipeline uploaders (uploadDocumentToRag, reindexDocumentInRag, retryRagIndexing) now write fileMetadata status and schedule this poller; checkRagDocumentStatus is retired-but-kept for drain. - Fix the one creation gap: REST POST /api/v1/documents with a fileId now creates the fileMetadata row. - One-time backfill copies legacy documents.ragInfo -> fileMetadata (fill-holes only, idempotent), registered in migrations runAll. Retrieval/search path and the RAG Python service are untouched; vectors stay in RAG's Postgres.
📝 WalkthroughWalkthroughThis PR migrates RAG (Retrieval-Augmented Generation) indexing state from the document-level Estimated code review effort🎯 4 (Complex) | ⏱️ ~75 minutes Possibly related PRs
🚥 Pre-merge checks | ✅ 3 | ❌ 2❌ Failed checks (2 warnings)
✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Warning Billing warning: we have not been able to collect payment for this subscription for more than 72 hours. Please update the payment method or pay any pending invoices in Billing to avoid service interruption. Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 6
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
services/platform/convex/documents/list_indexed_documents_for_agent.ts (2)
292-326:⚠️ Potential issue | 🔴 Critical | ⚡ Quick winSame filtering issue in the inline composite-cursor mock.
The ad-hoc mock at lines 304-309 also creates
fmsfor all docs without filtering byragInfo.status === 'completed'. Apply the same filter as the maincreateMockCtx.🛠️ Proposed fix
- const fms = docs.map((d) => ({ - storageId: d.fileId, - documentId: d._id, - ragStatus: 'completed' as const, - organizationId: 'org1', - })); + const fms = docs + .filter( + (d) => + (d.ragInfo as { status?: string } | undefined)?.status === 'completed', + ) + .map((d) => ({ + storageId: d.fileId, + documentId: d._id, + ragStatus: 'completed' as const, + organizationId: 'org1', + })); const docsById = new Map(docs.map((d) => [d._id, d]));🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@services/platform/convex/documents/list_indexed_documents_for_agent.ts` around lines 292 - 326, The inline composite-cursor mock in createMockCtx is building the fms array for all documents without filtering by ragInfo.status === 'completed'; update the ad-hoc mock (the composite-cursor branch that constructs fms at runtime) to apply the same filter used in the main createMockCtx so only documents with ragInfo?.status === 'completed' are included when populating fms (match the filtering logic and field checks used elsewhere in createMockCtx).
14-50:⚠️ Potential issue | 🔴 Critical | ⚡ Quick winMock does not filter by completion status.
The mock creates
fileMetadatarows withragStatus: 'completed'for all input docs (line 18-23), without filtering byragInfo.status. This breaks test accuracy:
- Test "excludes non-completed documents" (line 175) passes docs with
ragInfo: { status: 'queued' }but the mock will return them as completedfileMetadata, causing incorrect test results.- The production code queries
by_organizationId_and_ragStatuswith.eq('ragStatus', 'completed'), which should only return truly completed files.Compare to
get_agent_scoped_file_ids.test.ts(lines 24-28), which correctly filters:const completedFms = docs .filter( (d) => (d.ragInfo as { status?: string } | undefined)?.status === 'completed', ) .map((d) => ({...}));🛠️ Proposed fix
function createMockCtx(docs: MockDoc[]) { - // Completion is canonical on fileMetadata.ragStatus now: the query paginates - // completed fileMetadata (one per doc), then resolves each to its document - // via ctx.db.get(documentId). - const fms = docs.map((d) => ({ - storageId: d.fileId, - documentId: d._id, - ragStatus: 'completed' as const, - organizationId: 'org1', - })); + // Derive completed fileMetadata from docs where ragInfo.status === 'completed' + // to match the by_organizationId_and_ragStatus index filter in production. + const fms = docs + .filter( + (d) => + (d.ragInfo as { status?: string } | undefined)?.status === 'completed', + ) + .map((d) => ({ + storageId: d.fileId, + documentId: d._id, + ragStatus: 'completed' as const, + organizationId: 'org1', + })); const docsById = new Map(docs.map((d) => [d._id, d]));🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@services/platform/convex/documents/list_indexed_documents_for_agent.ts` around lines 14 - 50, The test mock that generates fileMetadata rows should only mark a doc as ragStatus: 'completed' when the source doc's ragInfo.status === 'completed'; update the mock generation logic to filter docs by (d.ragInfo?.status === 'completed') before mapping to fileMetadata so it mirrors the production query that uses by_organizationId_and_ragStatus.eq('ragStatus','completed'); locate the mock in the tests that reference AgentIndexedDocumentItem / AgentIndexedDocumentListResult (the place creating fileMetadata entries for docs) and change the mapping/filtering to match the pattern used in get_agent_scoped_file_ids.test.ts (filter then map).
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@services/platform/convex/documents/list_indexed_documents_for_agent.test.ts`:
- Line 45: The inline getter uses a type assertion (id as string); replace it
with a type guard to avoid assertions—update the get function (const get = async
(id: unknown) => ...) to check that id is a string (e.g., if (typeof id !==
'string') return null) before calling docsById.get(id) and return the result or
null; keep the function signature and semantics but remove the "as" assertion
and rely on the runtime type guard.
- Line 322: The inline type assertion in the get function (const get = async
(id: unknown) => docsById.get(id as string) ?? null;) should be replaced with a
type guard: check that id is a string (e.g., typeof id === "string") before
calling docsById.get and return null for non-strings; update the get function to
perform this guard so it no longer uses the `as string` assertion and adheres to
the project's type-safety guideline.
In `@services/platform/convex/documents/rest_api.ts`:
- Around line 78-95: When handling body.fileId in this block (where you call
toId<'_storage'>(body.fileId),
rc.ctx.runQuery(internal.file_metadata.internal_queries.getStorageMetadata, ...)
and
rc.ctx.runMutation(internal.file_metadata.internal_mutations.saveFileMetadata,
...)), add a defensive check for meta === null/undefined: if meta is missing,
emit a warning via the request context logger (e.g., rc.ctx.logger.warn or
rc.log.warn) that includes the storageId and fileId, and either abort with a
clear client error/throw or return early instead of saving a metadata row with
size: 0; ensure the warning message includes enough identifiers (storageId,
rc.org.organizationId, rc.user.userId) so the caller/debugger can correlate the
invalid fileId.
In `@services/platform/convex/documents/upsert_document_by_external_id.test.ts`:
- Around line 66-69: The test fixture's mock for the query builder (.first())
doesn't simulate lookups by storageId, so calls to fileMetadata.by_storageId
always return empty and the existingIndexed branch never gets exercised; update
the fixture that defines matched and the mock first() used in the RAG-status
reindex gate so it captures and responds to storageId queries (i.e., when
fileMetadata.by_storageId is invoked with a specific storageId return the
corresponding matched entry instead of null), ensuring the test triggers the
existingIndexed code path—look for the mocked first(), matched array,
fileMetadata.by_storageId lookup, and the existingIndexed branch to implement
this behavior.
In `@services/platform/convex/documents/upsert_document_by_external_id.ts`:
- Around line 163-170: The code currently does an unconditional lookup using
existingFileId -> ctx.db.query('fileMetadata').withIndex('by_storageId'...) to
set existingFm/existingIndexed even for updates that cannot trigger reindex;
gate that DB read behind the same reindex preconditions (the same
boolean/condition used elsewhere to decide reindexing) so you only query when
reindex is allowed. Modify the logic around
existingFileId/existingFm/existingIndexed (and the similar occurrence around the
block at the other occurrence) to check the reindex-enabled condition first, and
only call ctx.db.query('fileMetadata').withIndex('by_storageId', ...) when that
condition is true; default existingFm/null and existingIndexed=false when
skipped.
In `@services/platform/convex/file_metadata/internal_actions.ts`:
- Around line 195-200: The empty catch around "body = await response.json();"
should not swallow errors; change it to catch the error (e.g., catch (err)) and
log it with console.warn or console.error including context (such as the
response status or that parsing failed) before calling reschedule() and
returning null (or re-throw if you prefer). Update the try/catch in the same
block that calls response.json(), referencing the variables "body", "response",
and the helper "reschedule" so the log clearly identifies which response failed
to parse.
---
Outside diff comments:
In `@services/platform/convex/documents/list_indexed_documents_for_agent.ts`:
- Around line 292-326: The inline composite-cursor mock in createMockCtx is
building the fms array for all documents without filtering by ragInfo.status ===
'completed'; update the ad-hoc mock (the composite-cursor branch that constructs
fms at runtime) to apply the same filter used in the main createMockCtx so only
documents with ragInfo?.status === 'completed' are included when populating fms
(match the filtering logic and field checks used elsewhere in createMockCtx).
- Around line 14-50: The test mock that generates fileMetadata rows should only
mark a doc as ragStatus: 'completed' when the source doc's ragInfo.status ===
'completed'; update the mock generation logic to filter docs by
(d.ragInfo?.status === 'completed') before mapping to fileMetadata so it mirrors
the production query that uses
by_organizationId_and_ragStatus.eq('ragStatus','completed'); locate the mock in
the tests that reference AgentIndexedDocumentItem /
AgentIndexedDocumentListResult (the place creating fileMetadata entries for
docs) and change the mapping/filtering to match the pattern used in
get_agent_scoped_file_ids.test.ts (filter then map).
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: 1b65b587-7964-45b2-8f06-3b8771714a9b
⛔ Files ignored due to path filters (2)
services/platform/convex/_generated/api.d.tsis excluded by!**/_generated/**services/platform/convex/migrations/backfill_filemetadata_rag_status.tsis excluded by!**/migrations/**
📒 Files selected for processing (22)
services/platform/convex/documents/actions.tsservices/platform/convex/documents/get_agent_scoped_file_ids.test.tsservices/platform/convex/documents/get_agent_scoped_file_ids.tsservices/platform/convex/documents/get_document_rag_projection.tsservices/platform/convex/documents/internal_actions.tsservices/platform/convex/documents/list_indexed_documents_for_agent.test.tsservices/platform/convex/documents/list_indexed_documents_for_agent.tsservices/platform/convex/documents/rest_api.tsservices/platform/convex/documents/schema.tsservices/platform/convex/documents/transform_to_document_item.tsservices/platform/convex/documents/types.tsservices/platform/convex/documents/update_document_internal.tsservices/platform/convex/documents/update_document_rag_info.tsservices/platform/convex/documents/upsert_document_by_external_id.test.tsservices/platform/convex/documents/upsert_document_by_external_id.tsservices/platform/convex/documents/validators.tsservices/platform/convex/file_metadata/internal_actions.tsservices/platform/convex/file_metadata/internal_mutations.tsservices/platform/convex/file_metadata/internal_queries.tsservices/platform/convex/file_metadata/schema.tsservices/platform/convex/migrations.tsservices/platform/convex/projects/queries.ts
| }); | ||
|
|
||
| return { db: { query: queryFn } } as unknown as Parameters< | ||
| const get = async (id: unknown) => docsById.get(id as string) ?? null; |
There was a problem hiding this comment.
🧹 Nitpick | 🔵 Trivial | ⚡ Quick win
Replace type assertion with type guard.
The guideline prohibits using as type assertions. Use a type guard instead to maintain type safety.
♻️ Proposed fix
- const get = async (id: unknown) => docsById.get(id as string) ?? null;
+ const get = async (id: unknown) =>
+ typeof id === 'string' ? docsById.get(id) ?? null : null;📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| const get = async (id: unknown) => docsById.get(id as string) ?? null; | |
| const get = async (id: unknown) => | |
| typeof id === 'string' ? docsById.get(id) ?? null : null; |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@services/platform/convex/documents/list_indexed_documents_for_agent.test.ts`
at line 45, The inline getter uses a type assertion (id as string); replace it
with a type guard to avoid assertions—update the get function (const get = async
(id: unknown) => ...) to check that id is a string (e.g., if (typeof id !==
'string') return null) before calling docsById.get(id) and return the result or
null; keep the function signature and semantics but remove the "as" assertion
and rely on the runtime type guard.
Source: Coding guidelines
| }), | ||
| }); | ||
| const ctx = { db: { query: queryFn } } as unknown as Parameters< | ||
| const get = async (id: unknown) => docsById.get(id as string) ?? null; |
There was a problem hiding this comment.
🧹 Nitpick | 🔵 Trivial | ⚡ Quick win
Replace type assertion with type guard.
Same issue as line 45—use a type guard instead of as string to align with coding guidelines.
♻️ Proposed fix
- const get = async (id: unknown) => docsById.get(id as string) ?? null;
+ const get = async (id: unknown) =>
+ typeof id === 'string' ? docsById.get(id) ?? null : null;📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| const get = async (id: unknown) => docsById.get(id as string) ?? null; | |
| const get = async (id: unknown) => | |
| typeof id === 'string' ? docsById.get(id) ?? null : null; |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@services/platform/convex/documents/list_indexed_documents_for_agent.test.ts`
at line 322, The inline type assertion in the get function (const get = async
(id: unknown) => docsById.get(id as string) ?? null;) should be replaced with a
type guard: check that id is a string (e.g., typeof id === "string") before
calling docsById.get and return null for non-strings; update the get function to
perform this guard so it no longer uses the `as string` assertion and adheres to
the project's type-safety guideline.
Source: Coding guidelines
| if (body.fileId) { | ||
| const storageId = toId<'_storage'>(body.fileId); | ||
| const meta = await rc.ctx.runQuery( | ||
| internal.file_metadata.internal_queries.getStorageMetadata, | ||
| { storageId }, | ||
| ); | ||
| await rc.ctx.runMutation( | ||
| internal.file_metadata.internal_mutations.saveFileMetadata, | ||
| { | ||
| organizationId: rc.org.organizationId, | ||
| storageId, | ||
| fileName: body.title ?? 'document', | ||
| contentType: | ||
| body.mimeType ?? meta?.contentType ?? 'application/octet-stream', | ||
| size: meta?.size ?? 0, | ||
| uploadedBy: rc.user.userId, | ||
| }, | ||
| ); |
There was a problem hiding this comment.
🧹 Nitpick | 🔵 Trivial | 💤 Low value
Consider handling missing blob metadata more defensively.
When meta is null (blob doesn't exist), the code proceeds with size: 0 which may create an inconsistent fileMetadata row. If the caller passes an invalid fileId, the operation silently succeeds but the RAG pipeline will likely fail later.
This may be acceptable as a "caller error" scenario, but a warning log would help with debugging.
Optional: Add a warning for missing blob
if (body.fileId) {
const storageId = toId<'_storage'>(body.fileId);
const meta = await rc.ctx.runQuery(
internal.file_metadata.internal_queries.getStorageMetadata,
{ storageId },
);
+ if (!meta) {
+ console.warn(
+ `[createDocument] No blob found for fileId=${body.fileId}; fileMetadata row may be inconsistent`,
+ );
+ }
await rc.ctx.runMutation(🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@services/platform/convex/documents/rest_api.ts` around lines 78 - 95, When
handling body.fileId in this block (where you call
toId<'_storage'>(body.fileId),
rc.ctx.runQuery(internal.file_metadata.internal_queries.getStorageMetadata, ...)
and
rc.ctx.runMutation(internal.file_metadata.internal_mutations.saveFileMetadata,
...)), add a defensive check for meta === null/undefined: if meta is missing,
emit a warning via the request context logger (e.g., rc.ctx.logger.warn or
rc.log.warn) that includes the storageId and fileId, and either abort with a
clear client error/throw or return early instead of saving a metadata row with
size: 0; ensure the warning message includes enough identifiers (storageId,
rc.org.organizationId, rc.user.userId) so the caller/debugger can correlate the
invalid fileId.
| // The RAG-status reindex gate queries fileMetadata `by_storageId` | ||
| // and calls .first(); these fixtures have no fileMetadata, so the | ||
| // gate stays inactive (matched is empty for that lookup). | ||
| first: async () => matched[0] ?? null, |
There was a problem hiding this comment.
Mock .first() does not model fileMetadata.by_storageId behavior.
Line 69 adds .first(), but the query builder never captures storageId, so canonical gate lookups always resolve as empty in this fixture. That leaves the new existingIndexed branch effectively untested.
Suggested fixture enhancement
interface MockFolder {
@@
}
-function createMockCtx(initial: MockDoc[], initialFolders: MockFolder[] = []) {
+interface MockFileMetadata {
+ storageId: string;
+ ragStatus?: 'queued' | 'running' | 'completed' | 'failed';
+}
+
+function createMockCtx(
+ initial: MockDoc[],
+ initialFolders: MockFolder[] = [],
+ initialFileMetadata: MockFileMetadata[] = [],
+) {
@@
const folders = new Map<string, MockFolder>();
for (const f of initialFolders) folders.set(f._id, f);
+ const fileMetadata = new Map<string, MockFileMetadata>();
+ for (const fm of initialFileMetadata) fileMetadata.set(fm.storageId, fm);
@@
- query: () => ({
+ query: (table: string) => ({
withIndex: (
@@
let orgFilter: string | undefined;
let externalFilter: string | undefined;
+ let storageFilter: string | undefined;
const qb = {
eq: (field: string, value: unknown) => {
if (field === 'organizationId') orgFilter = value as string;
if (field === 'externalItemId') externalFilter = value as string;
+ if (field === 'storageId') storageFilter = value as string;
return qb;
},
};
cb(qb);
+ if (table === 'fileMetadata') {
+ const fm = storageFilter
+ ? fileMetadata.get(storageFilter) ?? null
+ : null;
+ return {
+ [Symbol.asyncIterator]: async function* () {
+ if (fm) yield fm;
+ },
+ first: async () => fm,
+ };
+ }
const matched: MockDoc[] = [];🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@services/platform/convex/documents/upsert_document_by_external_id.test.ts`
around lines 66 - 69, The test fixture's mock for the query builder (.first())
doesn't simulate lookups by storageId, so calls to fileMetadata.by_storageId
always return empty and the existingIndexed branch never gets exercised; update
the fixture that defines matched and the mock first() used in the RAG-status
reindex gate so it captures and responds to storageId queries (i.e., when
fileMetadata.by_storageId is invoked with a specific storageId return the
corresponding matched entry instead of null), ensuring the test triggers the
existingIndexed code path—look for the mocked first(), matched array,
fileMetadata.by_storageId lookup, and the existingIndexed branch to implement
this behavior.
| const existingFileId = existing.fileId; | ||
| const existingFm = existingFileId | ||
| ? await ctx.db | ||
| .query('fileMetadata') | ||
| .withIndex('by_storageId', (q) => q.eq('storageId', existingFileId)) | ||
| .first() | ||
| : null; | ||
| const existingIndexed = existingFm?.ragStatus === 'completed'; |
There was a problem hiding this comment.
Gate the fileMetadata read behind the same reindex preconditions.
Line 164 currently performs a by_storageId lookup on every update, including metadata/folder-only updates where reindex cannot run. This adds avoidable reads on a hot sync path.
Proposed fix
- const existingFileId = existing.fileId;
- const existingFm = existingFileId
- ? await ctx.db
- .query('fileMetadata')
- .withIndex('by_storageId', (q) => q.eq('storageId', existingFileId))
- .first()
- : null;
- const existingIndexed = existingFm?.ragStatus === 'completed';
+ let existingIndexed = false;
+ if (
+ contentChanged &&
+ existing.fileId &&
+ args.fileId &&
+ existing.fileId !== args.fileId
+ ) {
+ const existingFm = await ctx.db
+ .query('fileMetadata')
+ .withIndex('by_storageId', (q) => q.eq('storageId', existing.fileId))
+ .first();
+ existingIndexed = existingFm?.ragStatus === 'completed';
+ }Also applies to: 177-177
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@services/platform/convex/documents/upsert_document_by_external_id.ts` around
lines 163 - 170, The code currently does an unconditional lookup using
existingFileId -> ctx.db.query('fileMetadata').withIndex('by_storageId'...) to
set existingFm/existingIndexed even for updates that cannot trigger reindex;
gate that DB read behind the same reindex preconditions (the same
boolean/condition used elsewhere to decide reindexing) so you only query when
reindex is allowed. Modify the logic around
existingFileId/existingFm/existingIndexed (and the similar occurrence around the
block at the other occurrence) to check the reindex-enabled condition first, and
only call ctx.db.query('fileMetadata').withIndex('by_storageId', ...) when that
condition is true; default existingFm/null and existingIndexed=false when
skipped.
| try { | ||
| body = await response.json(); | ||
| } catch { | ||
| await reschedule(); | ||
| return null; | ||
| } |
There was a problem hiding this comment.
Empty catch block violates coding guidelines.
The catch block on line 197 is empty. Per coding guidelines, catch blocks should log with console.warn/console.error or re-throw to avoid hiding bugs.
Proposed fix
let body: unknown;
try {
body = await response.json();
- } catch {
+ } catch (parseErr) {
+ console.warn(
+ `[pollFileRagStatus] Failed to parse RAG response for ${args.storageId}:`,
+ parseErr,
+ );
await reschedule();
return null;
}📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| try { | |
| body = await response.json(); | |
| } catch { | |
| await reschedule(); | |
| return null; | |
| } | |
| try { | |
| body = await response.json(); | |
| } catch (parseErr) { | |
| console.warn( | |
| `[pollFileRagStatus] Failed to parse RAG response for ${args.storageId}:`, | |
| parseErr, | |
| ); | |
| await reschedule(); | |
| return null; | |
| } |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@services/platform/convex/file_metadata/internal_actions.ts` around lines 195
- 200, The empty catch around "body = await response.json();" should not swallow
errors; change it to catch the error (e.g., catch (err)) and log it with
console.warn or console.error including context (such as the response status or
that parsing failed) before calling reschedule() and returning null (or re-throw
if you prefer). Update the try/catch in the same block that calls
response.json(), referencing the variables "body", "response", and the helper
"reschedule" so the log clearly identifies which response failed to parse.
Source: Coding guidelines
Review follow-up to the fileMetadata.ragStatus collapse. - ragIndexedAt is now stamped in Unix seconds (was Date.now() ms). The RagStatusBadge renders `new Date(indexedAt * 1000)` and both the legacy ragInfo.indexedAt writer and the backfill store seconds, so writing ms made every fresh completion render a far-future date and mixed units in the column. - retryRagIndexing and uploadDocumentToRag now ensure the canonical fileMetadata row exists (new ensureFileMetadataForDocument mutation) before writing status. updateFileRagStatus silently no-ops on a missing row, which left the "Index" button reporting success while status stayed stuck on not_indexed. The ensure mutation reads size/contentType from _storage and schedules no upload, so it does not double-index. - Register backfillFileMetadataDocumentId in runAll ahead of the rag-status backfill so completed fileMetadata rows are linked and stay in agent RAG retrieval scope. - Add unit tests for the seconds stamping and ensureFileMetadataForDocument.
…ueries
Addresses medium+ findings from the review of the fileMetadata.ragStatus
single-source-of-truth cutover.
Migration (backfill_filemetadata_rag_status):
- Mirror only TERMINAL legacy ragInfo statuses. Copying queued/running would
pin the canonical row forever — nothing advances a non-terminal ragStatus
post-cutover and expireStaleRagQueue only rescues 'queued'.
- Create the fileMetadata row when missing instead of skipping, so a legacy
completed doc whose blob never got a row no longer regresses to not_indexed
and silently drops out of agent RAG retrieval. Idempotent (re-run finds the
created row and skips); OCC serializes overlapping chains.
- BATCH_SIZE 200 -> 100 for write-byte / OCC headroom now that batches insert.
- Rewrite the misleading runAll ordering comment (the two backfills are
independent; call order is cosmetic).
Agent-scope query scale (getAgentScopedFileIds, listIndexedDocumentsForAgent):
- Replace by_organizationId_and_ragStatus with composite
by_organizationId_and_ragStatus_and_documentId and query
.eq(org).eq('completed').gt('documentId', undefined). Convex orders a missing
optional field as the least value, so the bound seeks PAST chat-upload /
transcript rows (documentId absent) at the index level — the scan stays
bounded by Hub-doc count instead of the org's whole completed corpus.
- Lower list_indexed MAX_LIMIT 500 -> 100.
- Degrade gracefully (skip knowledge context / empty results) when the scope
query throws, instead of aborting the agent turn — in generate_response and
both rag_search tool paths.
Tests:
- New: backfill migration (terminal-only filter, insert-when-missing,
idempotency, self-schedule), pollFileRagStatus poller (status/HTTP branches),
update_document_internal reindex gate.
- Extended: upsert reindex gate (now exercised), get_agent_scoped /
list_indexed SSOT skip branches (documentId-missing, stale-blob, trashed),
transform_to_document_item RAG projection mapping.
Context
RAG indexing status was stored in three places that drift:
documents.ragInfo+documents.indexed,fileMetadata.ragStatus, and the RAG service's own Postgres row — kept loosely in sync by polling, with two independent writers inside Convex. This makesfileMetadata.ragStatusthe single source of truth within Convex and projects it on reads. It's a Convex-internal change: the RAG Python service, the synchronous search/retrieval path, and the receive-only boundary are untouched; vectors stay in RAG's Postgres.What changed
Single canonical owner
documents.ragInfo/documents.indexedare no longer written (kept@deprecated+ optional for existing rows). Every reader projects status through one helper,getDocumentRagProjection(joindocuments.fileId == fileMetadata.storageId), so the response shape and the frontend are unchanged and drift becomes structurally impossible. This also resolves the within-Convex asymmetry (chat uploads have nodocumentsrow) by keying onstorageId, which both upload kinds share.Schema + index swap
fileMetadata.ragIndexedAt(canonical completion timestamp) and aby_organizationId_and_ragStatusindex.getAgentScopedFileIdsandlistIndexedDocumentsForAgentquery that index and resolve to documents, replacing the retireddocuments.by_organizationId_and_indexed(kept, deprecate-don't-delete).Server-driven status advancement
pollFileRagStatus(storageId-keyed, self-scheduling), scheduled byuploadFileToRag, so Document-Hub uploads reachcompletedeven with no chat open — previously the only server-driven poller was the documents pipeline, andcheckFileRagStatusesis polled only by the chat UI. The documents-pipeline uploaders (uploadDocumentToRag,reindexDocumentInRag,retryRagIndexing) now writefileMetadatastatus and schedule this poller.checkRagDocumentStatusis retired-but-kept (@deprecated) so any pre-deploy scheduled job drains harmlessly.Coverage + backfill
POST /api/v1/documentswith afileIdnow creates thefileMetadatarow (the one creation path that didn't).backfill_filemetadata_rag_statuscopies legacydocuments.ragInfo→fileMetadata(fill-holes-only, idempotent), registered in migrationsrunAll.Verification
tsc --noEmit,oxlint --type-aware, andformat:checkall clean.documents/file_metadata/projects); updated the mocks inget_agent_scoped_file_ids,list_indexed_documents_for_agent, andupsert_document_by_external_idfor the fileMetadata-driven query path.convex dev --oncepushed successfully — schema validated against existing local data and the new index built.completed; agent retrieval; chat upload; an integration sync re-queue; REST create withfileId.Summary by CodeRabbit
Release Notes
New Features
Improvements