feat: contract comparison workflow with chunk-level RAG#722
Conversation
…rn_chunks option - JOIN chunks with documents table in search queries to include filename - Add sync=true parameter to upload endpoint for synchronous ingestion - Add return_chunks=true parameter to content endpoint for individual chunk arrays Refs #664
- Add sync option to UploadFileArgs, appending ?sync=true query param - Increase timeout to 120s for sync uploads vs 30s default - Skip polling and set ragInfo to 'completed' directly when sync=true - Add sync to RagActionParams type and parametersValidator Refs #664
…ction - Create fetch_document_content.ts with shared HTTP fetch logic for RAG content endpoint - Refactor retrieve_document.ts to delegate fetch to shared helper - Add 'retrieve' operation to document action (with access control via getAccessibleDocumentIds) - Add 'generate_docx' operation to document action (delegates to generateDocument internal action) - Support returnChunks option in fetch helper for chunk-level iteration Refs #664
- Add 'search' variant to RagActionParams and parametersValidator - POST to /api/v1/search with document_ids, top_k, similarity_threshold - Return raw SearchResult[] with content, score, document_id, filename - Add filename field to SearchResult interface Refs #664
Add example workflow definition for comparing legal contracts using chunk-level RAG search. Fix lint error in RAG search timeout handling. Refs #664
The create_workflow tool's Zod schema and internal mutation validator were missing 'output' as a valid stepType, despite it being supported by the schema, save_workflow_definition tool, and workflow engine. Refs #664
Replace lenient jsonRecordValidator with stepConfigValidator in the createWorkflowCreationApproval mutation so invalid step configs are caught before the approval card is shown, not at execution time. Also fix contract comparison workflow inputSchema to use the correct properties/required wrapper format. Refs #664
…RAG operations RAG service indexes files by their storage ID, not the application-level document ID. This aligns all RAG interactions (upload, delete, search, retrieve) to use file storage IDs consistently, adds indexedFileId tracking to ragInfo schema, automatic re-indexing on document file updates, backward-compatible param migration for existing workflows, and reliable MIME type resolution via extension fallback for uploads.
Introduce a fileMetadata schema and module that records organizationId, storageId, fileName, contentType, and size for every file stored in Convex storage. Integrate metadata persistence across all file upload paths: chat uploads, document creation, OneDrive imports, agent tools (excel/txt), document generation (docx/pptx/generic), integration sandbox storage, and webhook file uploads. The RAG upload_document helper now resolves fileName and contentType from fileMetadata, ensuring correct file extensions are sent to the RAG service.
…ment retrieve Add stop/cancel button to workflow run approval card allowing users to cancel running executions. Includes new cancelExecution mutation with proper status validation and component workflow cleanup. Remove redundant access check from document retrieve action since access is already validated at the workflow level. Update contract comparison config to use formatList template helper.
…s and executive overview Replace free-form LLM summary with structured difference tables per clause and a brief executive introduction, embedding clause-level results directly in the DOCX report for better readability and consistency.
Storage blobs from workflow executions are now deleted via scheduled jobs instead of immediately, preventing race conditions where concurrent reads may still reference old blobs. Intermediate storage uses a 24-hour retention period; final storage retains the existing 30-day window. Adds deferred cleanup to cancellation and stuck execution recovery paths.
…low output serializeAndCompleteExecution was JSON.parsing variables containing a _storageRef pointer instead of fetching the actual data from blob storage, causing workflow output to be lost for large payloads (>400KB).
There was a problem hiding this comment.
Your free trial has ended. If you'd like to continue receiving code reviews, you can add a payment method here.
…pletion - Add get_metadata document action to retrieve file names by storage ID - Include base and comparison document names in workflow config, LLM prompts, and final report - Replace passive system message with active agent generation on workflow completion - Add idempotency guard to prevent duplicate completion messages - Dynamic searchTopK based on number of comparison documents - Require LLM to output sections for all comparison docs even without RAG matches
Split the serialize-and-complete action into a serialize-only step (persistExecutionOutput) so output and variables are persisted before the component callback fires. The onWorkflowComplete callback remains the sole authority for transitioning execution status to terminal states, reading the already-persisted output instead of relying on the void returnValue from dynamic workflows.
…mprove workflow prompts Rename fileId/url to fileStorageId/downloadUrl across all document generation tools, types, and validators for consistent naming. Suppress LLM meta-commentary in contract comparison prompts and use system role for workflow completion messages.
Remove colored borders and relocate the status badge from the top-right corner (where it was truncated by overflow-hidden) to the bottom-right where there is sufficient space for full display.
Prefix unused `variables` parameter with underscore and replace unsafe `as` cast with `toId` helper for storage ID conversion.
700e49f to
71823c9
Compare
📝 WalkthroughWalkthroughThis PR implements a comprehensive file metadata persistence system and refactors the execution lifecycle for workflow runs. Key changes include: (1) introducing a new Estimated code review effort🎯 4 (Complex) | ⏱️ ~70 minutes Possibly related PRs
✨ Finishing Touches
🧪 Generate unit tests (beta)
|
There was a problem hiding this comment.
Actionable comments posted: 27
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (10)
services/platform/convex/agent_tools/files/pptx_tool.ts (1)
327-336:⚠️ Potential issue | 🔴 CriticalFix type mismatch between
GeneratePptxResult.fileStorageIdandGenerateResult.fileStorageId.The
generatePptxaction returnsfileStorageIdasId<'_storage'>, butGenerateResultexpects it asstring. The type assertionas GenerateResultmasks this type incompatibility. Either changeGenerateResult.fileStorageIdtoId<'_storage'>to match the upstream return type, or explicitly convert the ID before spreading the result.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@services/platform/convex/agent_tools/files/pptx_tool.ts` around lines 327 - 336, Type mismatch: GeneratePptxResult.fileStorageId is Id<'_storage'> but GenerateResult.fileStorageId is string and the current `as GenerateResult` hides this; fix by either updating the GenerateResult type to use Id<'_storage'> for fileStorageId or convert the returned ID to string before returning. Locate the return in generatePptx (the `result` object used in the block that logs 'tool:pptx generate success') and either change GenerateResult.fileStorageId to Id<'_storage'> or replace the spread with an explicit fileStorageId: String(result.fileStorageId) (or equivalent conversion) so the returned shape matches GenerateResult without a type assertion.services/platform/convex/agent_tools/rag/format_search_results.ts (1)
8-13:⚠️ Potential issue | 🟠 MajorSurface
filenamein the formatted chunk output.Adding
filenametoSearchResultis inert right now becauseformatSearchResultsstill emits only score + content. In the new multi-contract comparison flow, that drops source attribution for each chunk, so the agent can't reliably tell which contract an excerpt came from.♻️ Proposed fix
export function formatSearchResults( results: SearchResult[], ): string | undefined { if (results.length === 0) return undefined; return results .map((r, idx) => { const score = (r.score * 100).toFixed(1); - return `[${idx + 1}] (Relevance: ${score}%)\n${r.content}`; + const sourceLine = r.filename ? `Source: ${r.filename}\n` : ''; + return `[${idx + 1}] (Relevance: ${score}%)\n${sourceLine}${r.content}`; }) .join('\n\n---\n\n'); }Also applies to: 40-50
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@services/platform/convex/agent_tools/rag/format_search_results.ts` around lines 8 - 13, formatSearchResults currently only emits score and content, losing the new SearchResult.filename; update the function formatSearchResults (and any helper like formatChunk or buildFormattedResults) to include filename (and optionally document_id/metadata) in each formatted chunk output so the agent can see source attribution. Locate where SearchResult objects are mapped/serialized inside formatSearchResults and append a readable filename field (e.g., "filename: <value>") to the returned string/object for each result, ensuring undefined filenames are handled gracefully (skip or show "unknown").services/rag/app/routers/search.py (1)
35-41:⚠️ Potential issue | 🟡 MinorPropagate
filenamein/generatesources too.
/searchnow includes filenames, butgenerate()still strips them when buildingsources. That makes the two RAG endpoints expose different provenance and leaves generated answers without the same source attribution.♻️ Proposed fix
sources = [ SearchResult( content=s.get("content", ""), score=s.get("score", 0.0), document_id=s.get("document_id"), + filename=s.get("filename"), metadata=s.get("metadata"), ) for s in result.get("sources", []) ]Also applies to: 75-81
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@services/rag/app/routers/search.py` around lines 35 - 41, The generate() handler currently builds its sources from search results but omits the filename field, causing inconsistent provenance; update the code that maps search results into source objects (the block in generate() that builds "sources", referenced around the mapping of SearchResult/`r` into source entries) to include the filename (e.g., use r.get("filename") or sr.filename) and propagate it into each source's metadata or filename property so the /generate response includes the same filename attribution as /search; apply the same change where sources are constructed in the second occurrence (the other mapping around lines 75-81).services/rag/app/services/rag_service.py (1)
328-340:⚠️ Potential issue | 🟡 MinorMake
return_chunks=trueconsistent for empty ranges.
chunksis only populated on the non-empty path. When the requested window resolves to zero rows, the earlier return still omits it, so callers explicitly asking for chunk data receivenull/missing instead of[].♻️ Proposed fix
if not rows: - return { + result = { "document_id": document_id, "title": doc["filename"], "content": "", "chunk_range": {"start": 0, "end": 0}, "total_chunks": total_chunks, "total_chars": 0, - } + } + if return_chunks: + result["chunks"] = [] + return result🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@services/rag/app/services/rag_service.py` around lines 328 - 340, The result dict currently only adds "chunks" when there are rows, so callers with return_chunks=True get no "chunks" key for empty ranges; after constructing result (using variables like result, return_chunks, rows, combined, document_id) ensure that if return_chunks is True and "chunks" isn't already set you add an empty list (e.g., if return_chunks and "chunks" not in result: result["chunks"] = []). This guarantees callers always receive a chunks key (either populated or []) before the function returns.services/platform/convex/agent_tools/files/excel_tool.ts (1)
214-243:⚠️ Potential issue | 🟠 Major
downloadUrlstill points at the raw storage object.Now that the tool tells the model to surface
downloadUrlverbatim, returningctx.storage.getUrl(fileId)will download/open as the opaque storage blob rather thanresult.fileName. The document generators in this PR use the filename-preserving download endpoint for exactly this reason; the Excel tool should do the same.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@services/platform/convex/agent_tools/files/excel_tool.ts` around lines 214 - 243, The returned downloadUrl currently uses ctx.storage.getUrl(fileId) which points to the raw storage blob; instead return the filename-preserving download endpoint so clients get a download with result.fileName. Replace the getUrl call used to populate downloadUrl in the generate branch (where fileStorageId is fileId and file metadata was saved via internal.file_metadata.internal_mutations.saveFileMetadata) with the storage download endpoint that preserves filenames (e.g., use ctx.storage.getDownloadUrl or construct the filename-preserving download URL with fileId and result.fileName) and throw if that URL is unavailable.services/platform/convex/onedrive/internal_actions.ts (1)
102-125:⚠️ Potential issue | 🟠 MajorDon't silently fall back to the
'system'org here.If a caller misses the new
organizationId, this action still succeeds and saves metadata under a synthetic tenant. The imported file then disappears from org-scoped queries instead of failing fast, which makes propagation bugs very hard to catch. MakeorganizationIdrequired for this path.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@services/platform/convex/onedrive/internal_actions.ts` around lines 102 - 125, The handler for this action silently falls back to 'system' when args.organizationId is missing; change this so organizationId is required and the action fails fast. Update the action's input schema (replace v.optional(v.string()) for organizationId with v.string()) and in the handler/downloadAndStoreFileImpl path remove the defaulting (args.organizationId ?? 'system') and instead validate/throw or return a failure if args.organizationId is undefined before calling saveFileMetadata (and ensure saveFileMetadata is invoked with the provided args.organizationId).services/platform/convex/workflow_engine/__tests__/execution_lifecycle.test.ts (1)
195-219:⚠️ Potential issue | 🟡 MinorThese assertions won't catch an unexpected intermediate-cleanup job.
Line 206 and Line 218 only prove there was no synchronous
storage.delete().completeExecution()now defers cleanup throughscheduler.runAfter, so both tests still pass if an unwantedINTERMEDIATE_STORAGE_RETENTION_MSdeletion is scheduled.Suggested assertions
it('should not schedule deferred deletion when IDs match', async () => { @@ - expect(ctx._deletedStorageIds).not.toContain('same_storage'); + expect(ctx.scheduler.runAfter).not.toHaveBeenCalledWith( + INTERMEDIATE_STORAGE_RETENTION_MS, + expect.anything(), + { storageId: 'same_storage' }, + ); }); it('should NOT schedule deferred deletion when no replacement is provided', async () => { @@ - expect(ctx.storage.delete).not.toHaveBeenCalled(); + expect(ctx.scheduler.runAfter).not.toHaveBeenCalledWith( + INTERMEDIATE_STORAGE_RETENTION_MS, + expect.anything(), + { storageId: 'existing_storage' }, + ); });🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@services/platform/convex/workflow_engine/__tests__/execution_lifecycle.test.ts` around lines 195 - 219, The tests currently only assert storage.delete wasn't called synchronously but miss checking deferred deletions scheduled via scheduler.runAfter; update the two tests around completeExecution to also assert that scheduler.runAfter was not invoked (or was invoked with no deletion task) and/or that no job referencing INTERMEDIATE_STORAGE_RETENTION_MS or the storage id was scheduled; specifically, in the tests that set ctx._storedExecution.variablesStorageId and call completeExecution, add assertions against the mocked scheduler.runAfter (and/or inspect ctx._deletedStorageIds) to ensure no deferred deletion job for the storage id was scheduled after completeExecution.services/platform/convex/workflow_engine/helpers/engine/on_workflow_complete.ts (1)
53-76:⚠️ Potential issue | 🟠 MajorGuard
workflow.completedemission behind the same terminal-state check.
wasTerminalsuppresses the state transition and chat response, butemitEvent(...)still runs on every successful re-entry. A retried callback will therefore fireworkflow.completedagain and can retrigger downstream subscriptions for an execution that is already terminal.Suggested guard
if (kind === 'success') { // Only transition to completed if not already done (the serialize action // does NOT set status; this callback is the sole completion authority). // The idempotency guard in completeExecution also protects against races. if (!wasTerminal) { // Use the already-persisted output from the serialize action. // Fall back to result.returnValue for simple workflows that skip serialization. const output = exec.output ?? toConvexJsonValue(result.returnValue); await ctx.runMutation( internal.wf_executions.internal_mutations.completeExecution, { executionId: toId<'wfExecutions'>(exec._id), output, }, ); - } - const updatedExec = await ctx.db.get(exec._id); - if (updatedExec) { - await emitEvent(ctx, { - organizationId: updatedExec.organizationId, - eventType: 'workflow.completed', - eventData: { execution: updatedExec }, - }); + const updatedExec = await ctx.db.get(exec._id); + if (updatedExec?.status === 'completed') { + await emitEvent(ctx, { + organizationId: updatedExec.organizationId, + eventType: 'workflow.completed', + eventData: { execution: updatedExec }, + }); + } } }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@services/platform/convex/workflow_engine/helpers/engine/on_workflow_complete.ts` around lines 53 - 76, The emitEvent call for 'workflow.completed' is not guarded by the same terminal-state check (wasTerminal) used for the state transition, causing duplicate completion events on retries; update on_workflow_complete.ts to only call emitEvent when the execution was not already terminal (i.e., wrap or gate the emitEvent invocation with if (!wasTerminal) or move it inside the block that runs completeExecution) so that emitEvent (via emitEvent(ctx, {...})) only fires when ctx.runMutation(internal.wf_executions.internal_mutations.completeExecution, {...}) is performed and not on re-entries.services/platform/convex/documents/validators.ts (1)
81-89:⚠️ Potential issue | 🟡 MinorUse
v.id('_storage')forfileStorageIdto align with TypeScript types and codebase pattern.The
fileStorageIdfield usesv.string()but the TypeScript interfaces (GenerateDocxResult,GenerateDocxFromTemplateResult,GeneratePptxResult) all define it asId<'_storage'>. This is inconsistent with other storage ID validators in the codebase (e.g.,variablesStorageId,outputStorageId,iconStorageIdall usev.id('_storage')) and with the project learning to preferv.id('_storage')for storage IDs.Change to match:
Suggested diff
export const generateDocumentResponseValidator = v.object({ success: v.boolean(), - fileStorageId: v.string(), + fileStorageId: v.id('_storage'), downloadUrl: v.string(),🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@services/platform/convex/documents/validators.ts` around lines 81 - 89, The generateDocumentResponseValidator currently defines fileStorageId as v.string(); change it to use v.id('_storage') to match the TypeScript types (Id<'_storage'>) and the project's storage ID pattern used by other validators (e.g., variablesStorageId, outputStorageId, iconStorageId); update the fileStorageId property in generateDocumentResponseValidator to v.id('_storage') so the validator aligns with the interfaces GenerateDocxResult / GenerateDocxFromTemplateResult / GeneratePptxResult.services/platform/app/features/chat/hooks/use-convex-file-upload.ts (1)
108-120:⚠️ Potential issue | 🟡 MinorImplement compensating cleanup if metadata save fails after file upload.
If
saveFileMetadatafails after the upload succeeds (line 114), the file remains in storage but the metadata record is not created, and thestorageIdis lost in the error handler. There is no background cleanup mechanism for orphaned chat files. Either wrap both operations in a transaction-like pattern, delete the uploaded file if metadata save fails, or ensure a background job cleans up untracked storage blobs in the chat context.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@services/platform/app/features/chat/hooks/use-convex-file-upload.ts` around lines 108 - 120, When saveFileMetadata fails after a successful upload, perform compensating cleanup by deleting the uploaded blob using the returned storageId; specifically, wrap the metadata save call in a try/catch around saveFileMetadata and on error call the storage delete/cleanup API with storageId (the same storageId from result.json) before rethrowing the error, ensuring you reference fileToUpload.name and contentType/resolvedType only for logging; alternatively, if a delete API isn't available expose a cleanup task/queue invocation with storageId so orphaned blobs are removed later—make the change around the block that calls saveFileMetadata to guarantee no uploaded file remains without metadata.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@services/platform/convex/agent_tools/files/image_tool.ts`:
- Line 71: Update the return documentation in image_tool.ts to match the
GenerateImageResult type by adding the missing fileStorageId and extension
fields to the documented return spec; locate the comment block that lists
"Returns: { operation, success, downloadUrl, fileName, contentType, size }" and
include fileStorageId and extension so the docstring matches the
GenerateImageResult shape used by the code.
In `@services/platform/convex/agent_tools/files/pdf_tool.ts`:
- Line 51: The return documentation for GeneratePdfResult is missing the
fileStorageId and extension fields—update the doc comment that currently lists
"Returns: { success, downloadUrl, fileName, contentType, size }" to include
fileStorageId and extension so it matches the GeneratePdfResult type; locate the
comment in pdf_tool.ts (the doc block around GeneratePdfResult / the generatePdf
export) and add entries describing fileStorageId (storage identifier) and
extension (file extension) to the returned fields.
In `@services/platform/convex/agent_tools/files/txt_tool.ts`:
- Around line 113-126: Summary: Storing the blob (ctx.storage.store(blob))
before saving metadata
(internal.file_metadata.internal_mutations.saveFileMetadata) can leave orphaned
blobs if the metadata mutation fails; wrap the two steps with compensating logic
and error handling. Update the flow in the function that calls
ctx.storage.store, internal.file_metadata.internal_mutations.saveFileMetadata,
and ctx.storage.getUrl so that after successfully calling
ctx.storage.store(blob) you attempt the metadata save in a try/catch and on
failure call the storage cleanup API (e.g., ctx.storage.delete(fileId) or
equivalent) to remove the stored blob, then rethrow or surface the original
error; ensure you only call ctx.storage.getUrl(fileId) after metadata save
succeeds so the returned URL is consistent with saved metadata.
In `@services/platform/convex/conversations/send_message_via_integration.ts`:
- Around line 67-73: Replace the blind insert of file metadata with a
lookup-and-conditional-insert: in send_message_via_integration where
ctx.db.insert('fileMetadata', {...}) is called, first query for an existing
fileMetadata row by storageId (and optionally organizationId), if found validate
that the existing row.organizationId matches args.organizationId (or handle
mismatch as an error), and only insert a new row when none exists; alternatively
use a database upsert/insert-if-not-exists operation keyed on storageId to
ensure idempotence and deterministic cleanup behavior.
In `@services/platform/convex/custom_agents/webhooks/http_actions.ts`:
- Around line 105-114: The upload currently stores the blob with
ctx.storage.store(file) and then calls
internal.file_metadata.internal_mutations.saveFileMetadata, but if the mutation
throws the stored blob becomes orphaned; wrap the mutation call in a try/catch:
after obtaining storageId, run the saveFileMetadata inside try, and in catch
call ctx.storage.delete(storageId) (or the appropriate ctx.storage removal
method) to delete the uploaded blob, then log a warning if the delete fails and
rethrow the original mutation error so the httpAction still fails; reference
ctx.storage.store(file), storageId, and
internal.file_metadata.internal_mutations.saveFileMetadata when making the
change.
In
`@services/platform/convex/documents/__tests__/get_accessible_file_ids.test.ts`:
- Around line 13-35: The mock createMockCtx currently ignores the withIndex
callback and returns all docs; update it to capture the callback passed to
withIndex and use it to filter docs so the test asserts that
q.eq('organizationId', args.organizationId) is honored by getAccessibleFileIds.
Specifically, change the vi.fn() for query().withIndex to accept a predicate
(the index callback) and apply that predicate to each document when building the
asyncIterator (or throw/assert if the predicate doesn't filter by
organizationId), referencing createMockCtx, withIndex, and getAccessibleFileIds
so the test fails if the organizationId equality check is removed.
In `@services/platform/convex/documents/generate_document.ts`:
- Around line 116-125: The saveFileMetadata call can throw after the blob is
already uploaded to _storage, leaving an orphaned blob; wrap the
ctx.runMutation(internal.file_metadata.internal_mutations.saveFileMetadata, ...)
call in a try/catch and on error attempt a best-effort cleanup of the uploaded
blob (call the same _storage delete/remove method used for uploads with the
storageId), swallow any delete errors but ensure the original error is rethrown;
apply the same pattern to the PPTX helper that uploads to _storage so uploads
are deleted on metadata persistence failure or scheduled for cleanup if
immediate deletion fails.
In `@services/platform/convex/documents/generate_docx_from_template.ts`:
- Around line 128-137: Wrap the call to
ctx.runMutation(internal.file_metadata.internal_mutations.saveFileMetadata, ...)
in a try/catch so that if saveFileMetadata throws you delete or schedule
deletion of the uploaded blob identified by storageId before rethrowing; locate
the saveFileMetadata invocation in generate_docx_from_template.ts and in the
catch call your storage cleanup routine (the existing storage deletion helper or
API used elsewhere in this module) using storageId, then rethrow the original
error to preserve failure semantics.
In `@services/platform/convex/documents/generate_docx.ts`:
- Around line 122-131: The saveFileMetadata mutation call (ctx.runMutation with
internal.file_metadata.internal_mutations.saveFileMetadata) can fail after the
DOCX blob has been uploaded (storageId), leaving orphaned blobs; wrap the
ctx.runMutation call in a try/catch, and on any error call the same storage
deletion/cleanup routine used elsewhere (delete the uploaded blob by storageId
or equivalent cleanup method) before rethrowing the original error so callers
still see the failure but no orphaned file remains; ensure you reference
storageId and the docxBytes.length context and preserve the original error when
rethrowing.
In `@services/platform/convex/documents/get_accessible_document_ids.ts`:
- Around line 51-80: getAccessibleFileIds duplicates the document iteration and
team/rag visibility predicate used in getAccessibleDocumentIds; extract that
shared scan into a single helper (e.g., collectAccessibleDocuments or
scanAccessibleDocs) that accepts ctx and args.organizationId/userId, yields or
returns matching document records, and encapsulates the ragInfo.status ===
'completed' and team membership checks (using getUserTeamIds and the org/team
set logic). Replace the loop in both getAccessibleFileIds and
getAccessibleDocumentIds to call this helper and then map the resulting
documents to fileId or _id respectively at the call site.
In `@services/platform/convex/documents/mutations.ts`:
- Around line 130-137: Resolve and reuse a single normalized contentType
variable instead of using args.contentType directly: compute something like
resolvedContentType = args.contentType && args.contentType !== '' ?
args.contentType : 'application/octet-stream' (or trim-check for emptiness) and
use resolvedContentType in both the ctx.db.insert('fileMetadata', ...) call and
the subsequent createDocument(...) call so documents.mimeType is never stored as
an empty string; update places that reference args.contentType (e.g., the
fileMetadata insert block and the createDocument(...) invocation) to use the
normalized variable.
In `@services/platform/convex/file_metadata/internal_mutations.ts`:
- Around line 14-25: The current upsert looks up rows only by storageId
(query('fileMetadata').withIndex('by_storageId', q => q.eq('storageId',
args.storageId))) and will patch and return an existing row even if its
organizationId differs; update the logic in internal_mutations.ts so the lookup
is scoped to organizationId (either include organizationId in the query/index
predicate or, after fetching existing, check existing.organizationId !==
args.organizationId and throw/reject) before calling ctx.db.patch on the
existing._id; reference the fileMetadata lookup, withIndex('by_storageId'),
existing, args.storageId, args.organizationId and ctx.db.patch when making the
change.
In `@services/platform/convex/file_metadata/mutations.ts`:
- Around line 14-40: The handler in the file metadata upsert currently only
authenticates the caller via authComponent.getAuthUser and then inserts/patches
any storageId regardless of org membership; change the flow to authenticate →
call getOrganizationMember (or equivalent org membership check) for
args.organizationId and reject if not a member, and before patching verify
existing.organizationId === args.organizationId (throw if mismatched) so updates
cannot modify rows from other organizations; ensure inserts use the authorized
organizationId from args after membership validation and keep the same control
flow (authComponent.getAuthUser, getOrganizationMember, then patch or insert) in
the handler function.
In
`@services/platform/convex/node_only/integration_sandbox/helpers/create_convex_storage_provider.ts`:
- Around line 52-61: The metadata save mutation can fail after
ctx.storage.store() has committed, leaving orphaned blobs; wrap the calls that
persist metadata (the ctx.runMutation call using
internal.file_metadata.internal_mutations.saveFileMetadata and the surrounding
download/store metadata flow) in try-catch blocks so that if the mutation throws
you call ctx.storage.delete(blobKey) (or the appropriate delete method) to
remove the already-stored blob, then rethrow the error; apply the identical
try-catch-and-delete pattern to the other ctx.storage.store() usage (the store
invocation referenced around the store() block) to ensure no orphaned blobs
remain on metadata persistence failure.
In `@services/platform/convex/onedrive/download_and_store_file.ts`:
- Around line 17-21: storeFile() currently persists the blob and only then calls
saveFileMetadata(storageId, contentType, size), so if saveFileMetadata throws
you end up with orphaned/duplicated _storage blobs; modify the flow in
download_and_store_file.ts so that either (a) you persist metadata first (if
possible) before finalizing the blob, or (b) if metadata must be written after
the blob, ensure you perform a compensating delete of the stored blob when
saveFileMetadata throws (i.e., catch errors from saveFileMetadata and call the
storage cleanup/delete for the created storageId/_storage before rethrowing or
returning failure), applying the same cleanup logic around all calls to
saveFileMetadata() (including the occurrence at lines ~52) so no orphaned blobs
remain.
In `@services/platform/convex/onedrive/import_files.ts`:
- Around line 173-178: The import flow currently calls deps.storeFile (creating
a blob) then deps.saveFileMetadata(storageId, ...) separately, which can leave
the blob/storageId orphaned if saveFileMetadata or subsequent document writes
fail; change this by either (A) wrapping storage+metadata persistence in a
single atomic abstraction (e.g., add a deps.saveFileWithMetadata or
storageService.saveWithMetadata that calls storeFile and saveFileMetadata and
rolls back/delete the blob on any failure) or (B) add explicit cleanup logic:
after storeFile returns storageId, ensure any catch/finally around
deps.saveFileMetadata and later document writes will call
deps.deleteStorage(storageId) (or the equivalent cleanup method) on failure.
Update the code paths that call storeFile (the call site using storageId, blob,
downloadResult) to use the new atomic method or the cleanup branch so storageId
cannot be left orphaned.
In `@services/platform/convex/onedrive/upload_and_create_document.ts`:
- Around line 77-83: storageStore is non-transactional relative to
saveFileMetadata so if saveFileMetadata throws you must remove the orphaned
blob: after obtaining storageId from deps.storageStore(blob) wrap the
deps.saveFileMetadata(...) call in a try/catch; in the catch await a deletion
call (e.g. await deps.deleteStoredBlob(storageId) — or the appropriate cleanup
helper in this module) to remove the stored blob, handle/log any deletion error
but ensure you rethrow the original saveFileMetadata error, leaving
storageStore, saveFileMetadata, storageId and blob references intact.
In `@services/platform/convex/predefined_workflows/document_rag_sync.ts`:
- Around line 88-91: The workflow assumes every document returned by the step
find_unprocessed_document has storage-backed fields (fileId, title, mimeType)
which are optional; fix by either updating the find_unprocessed_document query
to filter out documents where fileId (and optionally title/mimeType) is missing
so upload_document always receives valid storage-backed rows, or add an explicit
branch after find_unprocessed_document that checks these fields and
marks/records the document as skipped/processed (so it won't be repeatedly
selected) before attempting upload_document; reference find_unprocessed_document
and upload_document when making the change.
In
`@services/platform/convex/workflow_engine/action_defs/document/document_action.ts`:
- Line 90: The execute function currently declares a third parameter named
variables that is unused; rename it to _variables (or prefix it with an
underscore) in the execute signature (async execute(ctx, params, _variables)) to
satisfy the linter and avoid unused-variable warnings—ensure any existing
references (if any) in the body are updated to _variables as well and keep the
function name execute unchanged.
- Around line 145-159: The unsafe type assertion occurs in the get_metadata
branch where fileId is cast as Id<'_storage'> when calling
ctx.runQuery(internal.file_metadata.internal_queries.getByStorageId); either
validate the incoming fileId values (e.g., check format/length/regex) before
casting and only call the query with a guaranteed valid storageId, or if you
intend to accept string boundaries here, add a targeted linter suppression
comment directly above the cast (with a short rationale like "string boundary ->
cast to Id<'_storage'> at call site by design") so the pipeline knows this is
intentional; update the mapping in the case 'get_metadata' block around
params.fileIds.map(...) and the ctx.runQuery call accordingly.
In
`@services/platform/convex/workflow_engine/helpers/engine/on_workflow_complete.ts`:
- Around line 118-150: The completion message currently treats kind ===
'canceled' as a failure and shows "unknown error"; update the messageContent
construction in on_workflow_complete.ts to handle the 'canceled' case separately
(check the local kind variable or result.kind), creating a distinct canceled
template that uses any cancellation reason available from result (e.g.,
result.error or result.reason) and does not label it as failed, while keeping
the existing success and failed branches intact; ensure the adjusted
messageContent is passed to
internal.agent_tools.workflows.internal_mutations.triggerWorkflowCompletionResponse
so canceled runs notify the user that the workflow was canceled with the
provided reason.
In
`@services/platform/convex/workflow_engine/helpers/validation/variables/action_schemas.ts`:
- Around line 394-397: The fileId property for rag.upload_document is currently
typed as a plain string; update its schema to use the project's storage-ID shape
instead (the same _storage ID model used elsewhere in this file/registry) so
downstream validation treats it as a storage identifier rather than an opaque
string—replace the current fileId schema (type: 'string') with the shared
_storage ID schema/reference used by other fields in action_schemas.ts (copy the
exact structure/reference from another storage ID field in this file).
In `@services/platform/convex/workflow_engine/workflow_syntax_compact.ts`:
- Around line 113-118: Add a new optional parameter to the rag.search contract
(e.g., returnChunks?: boolean or chunked?: boolean) and update the documentation
block for Output (search) to explicitly show both modes: when chunked is false
return data.results: array of {documentId/ragDocumentId, fileId, filename?,
snippet, score, metadata?} plus summary fields (totalHits, executionTimeMs);
when chunked is true return data.chunks: array of {chunkId, ragDocumentId,
sourceFileId/fileId, filename?, text, startOffset, endOffset, score, similarity}
and keep data.results (or references) plus executionTimeMs and
topK/similarityThreshold echoes; reference the rag.search contract name in
workflow_syntax_compact.ts and ensure parameter names, types, and output field
names match the rest of the workflow engine (so generated steps can request
chunk payloads and read steps.*.output.data.chunks or .results consistently).
- Around line 93-99: The compact syntax contract string currently lists only
"update, retrieve, generate_docx" so the new workflow operation is
undiscoverable; add a new operation entry "get_metadata" alongside the others
and document its params and output shape in the same style—Params
(get_metadata): documentId (required) and optionally fields? (array of strings)
or keys? (array of strings); Output (get_metadata): `{ data: { metadata: {...} }
}` to return the full metadata object; update the string that starts with "Ops:"
and the corresponding Params/Output blocks in workflow_syntax_compact.ts so
agents can discover and wire the get_metadata action (look for the literal block
"Ops: update, retrieve, generate_docx" and the Params/Output lines to modify).
In `@services/platform/convex/workflows/executions/complete_execution.ts`:
- Around line 49-60: The current cleanup only schedules deletion when a new
variablesStorageId replaces the old blob; update the condition in
complete_execution.ts to also schedule deletion when the execution is
transitioned from stored blob to inline content (variablesSerialized).
Concretely, modify the if that references oldVariablesStorageId and
args.variablesStorageId so it triggers when oldVariablesStorageId exists AND
(args.variablesStorageId is present and differs from oldVariablesStorageId OR
args.variablesSerialized is truthy), then call
ctx.scheduler.runAfter(INTERMEDIATE_STORAGE_RETENTION_MS,
internal.wf_executions.internal_mutations.deleteStorageBlob, { storageId:
oldVariablesStorageId }) as currently implemented; mirror the same logic used in
resume_execution.ts.
In `@services/platform/convex/workflows/executions/delete_storage_blob.ts`:
- Around line 19-23: The current try/catch around
ctx.storage.delete(args.storageId) swallows all errors; change it to catch the
error value, detect/ignore only the “not found/already deleted” case and
otherwise log the failure (e.g., console.warn or processLogger/error) so
deferred-cleanup failures are visible; update the catch in the
delete_storage_blob workflow to reference ctx.storage.delete and log unexpected
errors with context including args.storageId and the caught error.
---
Outside diff comments:
In `@services/platform/app/features/chat/hooks/use-convex-file-upload.ts`:
- Around line 108-120: When saveFileMetadata fails after a successful upload,
perform compensating cleanup by deleting the uploaded blob using the returned
storageId; specifically, wrap the metadata save call in a try/catch around
saveFileMetadata and on error call the storage delete/cleanup API with storageId
(the same storageId from result.json) before rethrowing the error, ensuring you
reference fileToUpload.name and contentType/resolvedType only for logging;
alternatively, if a delete API isn't available expose a cleanup task/queue
invocation with storageId so orphaned blobs are removed later—make the change
around the block that calls saveFileMetadata to guarantee no uploaded file
remains without metadata.
In `@services/platform/convex/agent_tools/files/excel_tool.ts`:
- Around line 214-243: The returned downloadUrl currently uses
ctx.storage.getUrl(fileId) which points to the raw storage blob; instead return
the filename-preserving download endpoint so clients get a download with
result.fileName. Replace the getUrl call used to populate downloadUrl in the
generate branch (where fileStorageId is fileId and file metadata was saved via
internal.file_metadata.internal_mutations.saveFileMetadata) with the storage
download endpoint that preserves filenames (e.g., use ctx.storage.getDownloadUrl
or construct the filename-preserving download URL with fileId and
result.fileName) and throw if that URL is unavailable.
In `@services/platform/convex/agent_tools/files/pptx_tool.ts`:
- Around line 327-336: Type mismatch: GeneratePptxResult.fileStorageId is
Id<'_storage'> but GenerateResult.fileStorageId is string and the current `as
GenerateResult` hides this; fix by either updating the GenerateResult type to
use Id<'_storage'> for fileStorageId or convert the returned ID to string before
returning. Locate the return in generatePptx (the `result` object used in the
block that logs 'tool:pptx generate success') and either change
GenerateResult.fileStorageId to Id<'_storage'> or replace the spread with an
explicit fileStorageId: String(result.fileStorageId) (or equivalent conversion)
so the returned shape matches GenerateResult without a type assertion.
In `@services/platform/convex/agent_tools/rag/format_search_results.ts`:
- Around line 8-13: formatSearchResults currently only emits score and content,
losing the new SearchResult.filename; update the function formatSearchResults
(and any helper like formatChunk or buildFormattedResults) to include filename
(and optionally document_id/metadata) in each formatted chunk output so the
agent can see source attribution. Locate where SearchResult objects are
mapped/serialized inside formatSearchResults and append a readable filename
field (e.g., "filename: <value>") to the returned string/object for each result,
ensuring undefined filenames are handled gracefully (skip or show "unknown").
In `@services/platform/convex/documents/validators.ts`:
- Around line 81-89: The generateDocumentResponseValidator currently defines
fileStorageId as v.string(); change it to use v.id('_storage') to match the
TypeScript types (Id<'_storage'>) and the project's storage ID pattern used by
other validators (e.g., variablesStorageId, outputStorageId, iconStorageId);
update the fileStorageId property in generateDocumentResponseValidator to
v.id('_storage') so the validator aligns with the interfaces GenerateDocxResult
/ GenerateDocxFromTemplateResult / GeneratePptxResult.
In `@services/platform/convex/onedrive/internal_actions.ts`:
- Around line 102-125: The handler for this action silently falls back to
'system' when args.organizationId is missing; change this so organizationId is
required and the action fails fast. Update the action's input schema (replace
v.optional(v.string()) for organizationId with v.string()) and in the
handler/downloadAndStoreFileImpl path remove the defaulting (args.organizationId
?? 'system') and instead validate/throw or return a failure if
args.organizationId is undefined before calling saveFileMetadata (and ensure
saveFileMetadata is invoked with the provided args.organizationId).
In
`@services/platform/convex/workflow_engine/__tests__/execution_lifecycle.test.ts`:
- Around line 195-219: The tests currently only assert storage.delete wasn't
called synchronously but miss checking deferred deletions scheduled via
scheduler.runAfter; update the two tests around completeExecution to also assert
that scheduler.runAfter was not invoked (or was invoked with no deletion task)
and/or that no job referencing INTERMEDIATE_STORAGE_RETENTION_MS or the storage
id was scheduled; specifically, in the tests that set
ctx._storedExecution.variablesStorageId and call completeExecution, add
assertions against the mocked scheduler.runAfter (and/or inspect
ctx._deletedStorageIds) to ensure no deferred deletion job for the storage id
was scheduled after completeExecution.
In
`@services/platform/convex/workflow_engine/helpers/engine/on_workflow_complete.ts`:
- Around line 53-76: The emitEvent call for 'workflow.completed' is not guarded
by the same terminal-state check (wasTerminal) used for the state transition,
causing duplicate completion events on retries; update on_workflow_complete.ts
to only call emitEvent when the execution was not already terminal (i.e., wrap
or gate the emitEvent invocation with if (!wasTerminal) or move it inside the
block that runs completeExecution) so that emitEvent (via emitEvent(ctx, {...}))
only fires when
ctx.runMutation(internal.wf_executions.internal_mutations.completeExecution,
{...}) is performed and not on re-entries.
In `@services/rag/app/routers/search.py`:
- Around line 35-41: The generate() handler currently builds its sources from
search results but omits the filename field, causing inconsistent provenance;
update the code that maps search results into source objects (the block in
generate() that builds "sources", referenced around the mapping of
SearchResult/`r` into source entries) to include the filename (e.g., use
r.get("filename") or sr.filename) and propagate it into each source's metadata
or filename property so the /generate response includes the same filename
attribution as /search; apply the same change where sources are constructed in
the second occurrence (the other mapping around lines 75-81).
In `@services/rag/app/services/rag_service.py`:
- Around line 328-340: The result dict currently only adds "chunks" when there
are rows, so callers with return_chunks=True get no "chunks" key for empty
ranges; after constructing result (using variables like result, return_chunks,
rows, combined, document_id) ensure that if return_chunks is True and "chunks"
isn't already set you add an empty list (e.g., if return_chunks and "chunks" not
in result: result["chunks"] = []). This guarantees callers always receive a
chunks key (either populated or []) before the function returns.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: 33beeeb0-7c94-4b9b-9d63-95512226939b
⛔ Files ignored due to path filters (1)
services/platform/convex/_generated/api.d.tsis excluded by!**/_generated/**
📒 Files selected for processing (98)
examples/workflows/contract-comparison/config.jsonservices/platform/app/features/automations/hooks/use-assistant-chat.tsservices/platform/app/features/chat/components/chat-interface.tsxservices/platform/app/features/chat/components/human-input-request-card.tsxservices/platform/app/features/chat/components/integration-approval-card.tsxservices/platform/app/features/chat/components/workflow-creation-approval-card.tsxservices/platform/app/features/chat/components/workflow-run-approval-card.tsxservices/platform/app/features/chat/hooks/use-convex-file-upload.tsservices/platform/app/features/chat/hooks/use-execution-status.tsservices/platform/app/features/custom-agents/hooks/use-test-chat.tsservices/platform/app/features/documents/components/documents-action-menu.tsxservices/platform/app/features/documents/hooks/mutations.tsservices/platform/convex/agent_tools/documents/__tests__/document_retrieve_tool.test.tsservices/platform/convex/agent_tools/documents/helpers/fetch_document_content.tsservices/platform/convex/agent_tools/documents/helpers/retrieve_document.tsservices/platform/convex/agent_tools/files/docx_tool.tsservices/platform/convex/agent_tools/files/excel_tool.tsservices/platform/convex/agent_tools/files/image_tool.tsservices/platform/convex/agent_tools/files/pdf_tool.tsservices/platform/convex/agent_tools/files/pptx_tool.tsservices/platform/convex/agent_tools/files/txt_tool.tsservices/platform/convex/agent_tools/rag/format_search_results.tsservices/platform/convex/agent_tools/rag/query_rag_context.tsservices/platform/convex/agent_tools/rag/rag_search_tool.tsservices/platform/convex/agent_tools/workflows/create_workflow_tool.tsservices/platform/convex/agent_tools/workflows/internal_mutations.tsservices/platform/convex/conversations/send_message_via_integration.tsservices/platform/convex/custom_agents/webhooks/http_actions.tsservices/platform/convex/documents/__tests__/get_accessible_file_ids.test.tsservices/platform/convex/documents/actions.tsservices/platform/convex/documents/generate_document.tsservices/platform/convex/documents/generate_docx.tsservices/platform/convex/documents/generate_docx_from_template.tsservices/platform/convex/documents/generate_pptx.tsservices/platform/convex/documents/get_accessible_document_ids.tsservices/platform/convex/documents/internal_actions.tsservices/platform/convex/documents/internal_queries.tsservices/platform/convex/documents/mutations.tsservices/platform/convex/documents/schema.tsservices/platform/convex/documents/types.tsservices/platform/convex/documents/update_document_internal.tsservices/platform/convex/documents/upload_base64_to_storage.tsservices/platform/convex/documents/validators.tsservices/platform/convex/file_metadata/helpers.tsservices/platform/convex/file_metadata/internal_mutations.tsservices/platform/convex/file_metadata/internal_queries.tsservices/platform/convex/file_metadata/mutations.tsservices/platform/convex/file_metadata/schema.tsservices/platform/convex/lib/agent_response/generate_response.tsservices/platform/convex/node_only/integration_sandbox/helpers/create_convex_storage_provider.tsservices/platform/convex/node_only/integration_sandbox/internal_actions.tsservices/platform/convex/onedrive/actions.tsservices/platform/convex/onedrive/download_and_store_file.tsservices/platform/convex/onedrive/import_files.tsservices/platform/convex/onedrive/import_files_deps.tsservices/platform/convex/onedrive/internal_actions.tsservices/platform/convex/onedrive/upload_and_create_document.tsservices/platform/convex/onedrive/upload_and_create_document_deps.tsservices/platform/convex/predefined_workflows/document_rag_sync.tsservices/platform/convex/schema.tsservices/platform/convex/wf_executions/internal_mutations.tsservices/platform/convex/wf_executions/mutations.tsservices/platform/convex/workflow_engine/__tests__/execution_lifecycle.test.tsservices/platform/convex/workflow_engine/__tests__/execution_variables.test.tsservices/platform/convex/workflow_engine/action_defs/document/document_action.tsservices/platform/convex/workflow_engine/action_defs/rag/helpers/delete_document.test.tsservices/platform/convex/workflow_engine/action_defs/rag/helpers/delete_document.tsservices/platform/convex/workflow_engine/action_defs/rag/helpers/types.tsservices/platform/convex/workflow_engine/action_defs/rag/helpers/upload_document.test.tsservices/platform/convex/workflow_engine/action_defs/rag/helpers/upload_document.tsservices/platform/convex/workflow_engine/action_defs/rag/helpers/upload_file_direct.test.tsservices/platform/convex/workflow_engine/action_defs/rag/helpers/upload_file_direct.tsservices/platform/convex/workflow_engine/action_defs/rag/rag_action.tsservices/platform/convex/workflow_engine/helpers/engine/dynamic_workflow_handler.tsservices/platform/convex/workflow_engine/helpers/engine/index.tsservices/platform/convex/workflow_engine/helpers/engine/on_workflow_complete.test.tsservices/platform/convex/workflow_engine/helpers/engine/on_workflow_complete.tsservices/platform/convex/workflow_engine/helpers/engine/serialize_and_complete_execution_handler.test.tsservices/platform/convex/workflow_engine/helpers/engine/serialize_and_complete_execution_handler.tsservices/platform/convex/workflow_engine/helpers/recovery/recover_stuck_executions.test.tsservices/platform/convex/workflow_engine/helpers/recovery/recover_stuck_executions.tsservices/platform/convex/workflow_engine/helpers/validation/variables/action_schemas.tsservices/platform/convex/workflow_engine/internal_actions.tsservices/platform/convex/workflow_engine/workflow_syntax_compact.tsservices/platform/convex/workflows/executions/cleanup_execution_storage.tsservices/platform/convex/workflows/executions/complete_execution.tsservices/platform/convex/workflows/executions/delete_storage_blob.tsservices/platform/convex/workflows/executions/persist_execution_output.tsservices/platform/convex/workflows/executions/resume_execution.tsservices/platform/convex/workflows/executions/update_execution_variables.tsservices/platform/lib/shared/__tests__/file-types.test.tsservices/platform/lib/shared/file-types.tsservices/platform/messages/en.jsonservices/rag/app/models.pyservices/rag/app/routers/documents.pyservices/rag/app/routers/search.pyservices/rag/app/services/rag_service.pyservices/rag/app/services/search_service.py
| - content: The HTML/Markdown you wrote, or a URL to capture | ||
| - quality: "low" (1x JPEG, smallest file, default), "standard" (1x PNG), "high" (2x PNG), "ultra" (4x PNG). Use "low" unless user requests higher quality. | ||
| Returns: { operation, success, url, fileName, contentType, size } | ||
| Returns: { operation, success, downloadUrl, fileName, contentType, size } |
There was a problem hiding this comment.
Documentation omits fileStorageId and extension from return spec.
The documented return object is missing two fields that GenerateImageResult declares. Consider updating for completeness:
- Returns: { operation, success, downloadUrl, fileName, contentType, size }
+ Returns: { operation, success, fileStorageId, downloadUrl, fileName, contentType, extension, size }📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| Returns: { operation, success, downloadUrl, fileName, contentType, size } | |
| Returns: { operation, success, fileStorageId, downloadUrl, fileName, contentType, extension, size } |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@services/platform/convex/agent_tools/files/image_tool.ts` at line 71, Update
the return documentation in image_tool.ts to match the GenerateImageResult type
by adding the missing fileStorageId and extension fields to the documented
return spec; locate the comment block that lists "Returns: { operation, success,
downloadUrl, fileName, contentType, size }" and include fileStorageId and
extension so the docstring matches the GenerateImageResult shape used by the
code.
| - extraCss: Additional CSS to inject | ||
| - wrapInTemplate: Whether to wrap in HTML template | ||
| Returns: { success, url, fileName, contentType, size } | ||
| Returns: { success, downloadUrl, fileName, contentType, size } |
There was a problem hiding this comment.
Documentation is missing fileStorageId and extension fields.
The return documentation omits fields that are part of GeneratePdfResult. This inconsistency may confuse consumers relying on the tool description.
📝 Proposed fix
- Returns: { success, downloadUrl, fileName, contentType, size }
+ Returns: { success, fileStorageId, downloadUrl, fileName, contentType, extension, size }📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| Returns: { success, downloadUrl, fileName, contentType, size } | |
| Returns: { success, fileStorageId, downloadUrl, fileName, contentType, extension, size } |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@services/platform/convex/agent_tools/files/pdf_tool.ts` at line 51, The
return documentation for GeneratePdfResult is missing the fileStorageId and
extension fields—update the doc comment that currently lists "Returns: {
success, downloadUrl, fileName, contentType, size }" to include fileStorageId
and extension so it matches the GeneratePdfResult type; locate the comment in
pdf_tool.ts (the doc block around GeneratePdfResult / the generatePdf export)
and add entries describing fileStorageId (storage identifier) and extension
(file extension) to the returned fields.
| const fileId = await ctx.storage.store(blob); | ||
|
|
||
| await ctx.runMutation( | ||
| internal.file_metadata.internal_mutations.saveFileMetadata, | ||
| { | ||
| organizationId: ctx.organizationId ?? 'system', | ||
| storageId: fileId, | ||
| fileName: filename, | ||
| contentType: 'text/plain; charset=utf-8', | ||
| size: blob.size, | ||
| }, | ||
| ); | ||
|
|
||
| const url = await ctx.storage.getUrl(fileId); |
There was a problem hiding this comment.
Potential for orphaned storage blobs if metadata save fails.
The ctx.storage.store(blob) call on line 113 executes in the action context, while saveFileMetadata runs as a separate mutation. If the mutation fails (e.g., due to validation or transient errors), the stored blob persists without associated metadata.
This isn't a critical issue since:
- The file is still usable via the returned URL
- Deferred storage cleanup (mentioned in PR objectives) may address orphans
However, consider wrapping both operations or adding compensating logic for production reliability.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@services/platform/convex/agent_tools/files/txt_tool.ts` around lines 113 -
126, Summary: Storing the blob (ctx.storage.store(blob)) before saving metadata
(internal.file_metadata.internal_mutations.saveFileMetadata) can leave orphaned
blobs if the metadata mutation fails; wrap the two steps with compensating logic
and error handling. Update the flow in the function that calls
ctx.storage.store, internal.file_metadata.internal_mutations.saveFileMetadata,
and ctx.storage.getUrl so that after successfully calling
ctx.storage.store(blob) you attempt the metadata save in a try/catch and on
failure call the storage cleanup API (e.g., ctx.storage.delete(fileId) or
equivalent) to remove the stored blob, then rethrow or surface the original
error; ensure you only call ctx.storage.getUrl(fileId) after metadata save
succeeds so the returned URL is consistent with saved metadata.
| await ctx.db.insert('fileMetadata', { | ||
| organizationId: args.organizationId, | ||
| storageId: att.storageId, | ||
| fileName: att.fileName, | ||
| contentType: att.contentType, | ||
| size: att.size, | ||
| }); |
There was a problem hiding this comment.
Avoid re-inserting metadata for an existing attachment blob.
This blindly creates a new fileMetadata row for every send. If the same storageId is reused, lookup/cleanup becomes non-deterministic, and an existing row is never validated before the blob is reused under this organization.
Possible fix
- await ctx.db.insert('fileMetadata', {
- organizationId: args.organizationId,
- storageId: att.storageId,
- fileName: att.fileName,
- contentType: att.contentType,
- size: att.size,
- });
+ const existingFileMetadata = await ctx.db
+ .query('fileMetadata')
+ .withIndex('by_storageId', (q) => q.eq('storageId', att.storageId))
+ .first();
+
+ if (
+ existingFileMetadata &&
+ existingFileMetadata.organizationId !== args.organizationId
+ ) {
+ throw new Error('Attachment does not belong to organization');
+ }
+
+ if (!existingFileMetadata) {
+ await ctx.db.insert('fileMetadata', {
+ organizationId: args.organizationId,
+ storageId: att.storageId,
+ fileName: att.fileName,
+ contentType: att.contentType,
+ size: att.size,
+ });
+ }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@services/platform/convex/conversations/send_message_via_integration.ts`
around lines 67 - 73, Replace the blind insert of file metadata with a
lookup-and-conditional-insert: in send_message_via_integration where
ctx.db.insert('fileMetadata', {...}) is called, first query for an existing
fileMetadata row by storageId (and optionally organizationId), if found validate
that the existing row.organizationId matches args.organizationId (or handle
mismatch as an error), and only insert a new row when none exists; alternatively
use a database upsert/insert-if-not-exists operation keyed on storageId to
ensure idempotence and deterministic cleanup behavior.
| await ctx.runMutation( | ||
| internal.file_metadata.internal_mutations.saveFileMetadata, | ||
| { | ||
| organizationId: webhook.organizationId, | ||
| storageId, | ||
| fileName: file.name, | ||
| contentType: file.type || 'application/octet-stream', | ||
| size: file.size, | ||
| }, | ||
| ); |
There was a problem hiding this comment.
Clean up the uploaded blob if saveFileMetadata fails.
This runs in an httpAction, so if the metadata mutation throws after ctx.storage.store(file) succeeds, the webhook fails and leaves an orphaned _storage file behind.
♻️ Suggested fix
if (file instanceof File) {
const storageId = await ctx.storage.store(file);
- await ctx.runMutation(
- internal.file_metadata.internal_mutations.saveFileMetadata,
- {
- organizationId: webhook.organizationId,
- storageId,
- fileName: file.name,
- contentType: file.type || 'application/octet-stream',
- size: file.size,
- },
- );
+ try {
+ await ctx.runMutation(
+ internal.file_metadata.internal_mutations.saveFileMetadata,
+ {
+ organizationId: webhook.organizationId,
+ storageId,
+ fileName: file.name,
+ contentType: file.type || 'application/octet-stream',
+ size: file.size,
+ },
+ );
+ } catch (error) {
+ await ctx.storage.delete(storageId);
+ throw error;
+ }
attachment = {Based on learnings, Convex mutations are transactional with storage, but actions are not, so this upload path needs explicit compensation.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@services/platform/convex/custom_agents/webhooks/http_actions.ts` around lines
105 - 114, The upload currently stores the blob with ctx.storage.store(file) and
then calls internal.file_metadata.internal_mutations.saveFileMetadata, but if
the mutation throws the stored blob becomes orphaned; wrap the mutation call in
a try/catch: after obtaining storageId, run the saveFileMetadata inside try, and
in catch call ctx.storage.delete(storageId) (or the appropriate ctx.storage
removal method) to delete the uploaded blob, then log a warning if the delete
fails and rethrow the original mutation error so the httpAction still fails;
reference ctx.storage.store(file), storageId, and
internal.file_metadata.internal_mutations.saveFileMetadata when making the
change.
| Ops: update, retrieve, generate_docx | ||
| Params (update): documentId (required), title?, content?, metadata?, mimeType?, extension?, sourceProvider? | ||
| Params (retrieve): fileId (required - file storage ID), chunkStart? (number), chunkEnd? (number), returnChunks? (boolean) | ||
| Params (generate_docx): fileName (required), sourceType (required: 'markdown'|'html'), content (required) | ||
| Output (update): \`{ data: {...document} }\` - returns full document entity | ||
| Output (retrieve): \`{ data: { content, chunks?, ... } }\` - returns document content from RAG | ||
| Output (generate_docx): \`{ data: { fileStorageId, downloadUrl, fileName } }\` |
There was a problem hiding this comment.
Add document.get_metadata to the compact syntax contract.
The PR objective adds get_metadata, but this reference only exposes update, retrieve, and generate_docx. Since this string is used to drive workflow generation, omitting the op means agents cannot discover or wire the new action correctly. Please add the operation and its params/output shape here.
Suggested update
- Ops: update, retrieve, generate_docx
+ Ops: update, retrieve, get_metadata, generate_docx
Params (update): documentId (required), title?, content?, metadata?, mimeType?, extension?, sourceProvider?
Params (retrieve): fileId (required - file storage ID), chunkStart? (number), chunkEnd? (number), returnChunks? (boolean)
+ Params (get_metadata): ...
Params (generate_docx): fileName (required), sourceType (required: 'markdown'|'html'), content (required)
Output (update): `{ data: {...document} }` - returns full document entity
Output (retrieve): `{ data: { content, chunks?, ... } }` - returns document content from RAG
+ Output (get_metadata): `{ data: {...metadata} }`
Output (generate_docx): `{ data: { fileStorageId, downloadUrl, fileName } }`🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@services/platform/convex/workflow_engine/workflow_syntax_compact.ts` around
lines 93 - 99, The compact syntax contract string currently lists only "update,
retrieve, generate_docx" so the new workflow operation is undiscoverable; add a
new operation entry "get_metadata" alongside the others and document its params
and output shape in the same style—Params (get_metadata): documentId (required)
and optionally fields? (array of strings) or keys? (array of strings); Output
(get_metadata): `{ data: { metadata: {...} } }` to return the full metadata
object; update the string that starts with "Ops:" and the corresponding
Params/Output blocks in workflow_syntax_compact.ts so agents can discover and
wire the get_metadata action (look for the literal block "Ops: update, retrieve,
generate_docx" and the Params/Output lines to modify).
| Ops: upload_document, delete_document, search | ||
| Params (upload_document): fileId (required - file storage ID), fileName? (optional), contentType? (optional), sync? (optional boolean) | ||
| Params (delete_document): fileId (required - file storage ID to delete from RAG) | ||
| Params (search): query (required), fileIds (required - array of file storage IDs), topK? (optional number), similarityThreshold? (optional number) | ||
| Output (upload_document): \`{ data: { success, executionTimeMs, fileId, ragDocumentId?, ... } }\` | ||
|
|
There was a problem hiding this comment.
Document the full rag.search interface, including chunked results.
This section exposes search, but it does not describe the new chunk-return flag or the search result shape. For the contract-comparison workflow, that means generated steps will have to guess both how to request chunk payloads and which fields to read back from steps.*.output.data for filenames/chunks. Please add the missing param and an explicit Output (search) contract.
Suggested update
Ops: upload_document, delete_document, search
Params (upload_document): fileId (required - file storage ID), fileName? (optional), contentType? (optional), sync? (optional boolean)
Params (delete_document): fileId (required - file storage ID to delete from RAG)
- Params (search): query (required), fileIds (required - array of file storage IDs), topK? (optional number), similarityThreshold? (optional number)
+ Params (search): query (required), fileIds (required - array of file storage IDs), topK? (optional number), similarityThreshold? (optional number), returnChunks? (optional boolean)
Output (upload_document): `{ data: { success, executionTimeMs, fileId, ragDocumentId?, ... } }`
+ Output (search): `{ data: [...results] }` - results include filename metadata and chunk fields when `returnChunks` is true🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@services/platform/convex/workflow_engine/workflow_syntax_compact.ts` around
lines 113 - 118, Add a new optional parameter to the rag.search contract (e.g.,
returnChunks?: boolean or chunked?: boolean) and update the documentation block
for Output (search) to explicitly show both modes: when chunked is false return
data.results: array of {documentId/ragDocumentId, fileId, filename?, snippet,
score, metadata?} plus summary fields (totalHits, executionTimeMs); when chunked
is true return data.chunks: array of {chunkId, ragDocumentId,
sourceFileId/fileId, filename?, text, startOffset, endOffset, score, similarity}
and keep data.results (or references) plus executionTimeMs and
topK/similarityThreshold echoes; reference the rag.search contract name in
workflow_syntax_compact.ts and ensure parameter names, types, and output field
names match the rest of the workflow engine (so generated steps can request
chunk payloads and read steps.*.output.data.chunks or .results consistently).
…n, and value sanitization Replace count-based message loading (recentMessages) with token-budget prioritized loading (maxHistoryTokens) — conversational messages load first, tool messages fill remaining budget with age-tiered truncation. Add workflow input validation against the start step's inputSchema before creating approval cards. Simplify contract comparison config by passing file names via input objects instead of metadata lookups. Add sanitizeConvexValue to handle undefined → null for Convex compatibility in set_variables action.
…arison Rewrite the contract comparison workflow (v2.0) to use paragraph-level text diffing instead of chunk-level RAG similarity search. Adds a diff service, compare endpoint, and structured JSON output from the LLM step to enable clause amendment frequency analysis in the report overview.
…ct comparison
Enrich diff output with word-level inline diffs ([-deleted-]{+added+} markers),
clause/section reference extraction (Section, Ziff., §, Art., etc.), and page
number annotations from DOCX page breaks. Update prompts to leverage these fields
for more precise legal analysis.
Rearchitect contract comparison to compare adjacent versions in a chronological chain instead of comparing all files against a single baseline. Add escape sequence normalization for JEXL-to-DOCX content and track version provenance (versionFrom/versionTo) in clause data.
…guage support Replace 4-way classification (Addition/Deletion/Modification/Rewording) with 3-tier significance model (Editorial/Structural/Substantive) that separates legal impact from textual change. Add configurable output language parameter with auto-detection fallback.
Summary
Test plan
🤖 Generated with Claude Code
Summary by CodeRabbit
Release Notes
New Features
Improvements