feat: contract comparison workflow with chunk-level RAG by larryro · Pull Request #722 · tale-project/tale

larryro · 2026-03-09T04:44:33Z

Summary

Add a predefined contract comparison workflow that uses chunk-level RAG search to compare multiple legal contracts against a base document, producing a structured DOCX comparison report
Extend RAG service with search operation, sync upload mode, filename in results, and return_chunks option
Add document action capabilities (retrieve, get_metadata, generate_docx) for workflow steps
Add file metadata tracking table and refactor RAG operations to use file storage IDs instead of document IDs
Add workflow execution cancellation, deferred storage cleanup, and separate output persistence from execution completion
Improve chat UI with better approval cards, file upload handling, and agent response triggering on workflow completion

Test plan

Existing tests updated and passing for execution lifecycle, variables, RAG helpers, and document retrieval
New tests added for file metadata access control, upload_document, upload_file_direct, on_workflow_complete, and file-types utilities
Manual end-to-end test: upload two+ contracts, run the contract comparison workflow, verify DOCX report output
Verify workflow cancellation works correctly mid-execution
Verify deferred storage cleanup runs after workflow completion

🤖 Generated with Claude Code

Summary by CodeRabbit

Release Notes

New Features
- Added contract comparison workflow for analyzing multiple contracts
- Workflow executions can now be stopped mid-run
- Enhanced document operations in workflows (retrieve content, generate DOCX)
- RAG search capability integrated into workflows
Improvements
- Status badges repositioned in approval cards for clearer visibility
- File uploads now scoped to organization context

…rn_chunks option - JOIN chunks with documents table in search queries to include filename - Add sync=true parameter to upload endpoint for synchronous ingestion - Add return_chunks=true parameter to content endpoint for individual chunk arrays Refs #664

- Add sync option to UploadFileArgs, appending ?sync=true query param - Increase timeout to 120s for sync uploads vs 30s default - Skip polling and set ragInfo to 'completed' directly when sync=true - Add sync to RagActionParams type and parametersValidator Refs #664

…ction - Create fetch_document_content.ts with shared HTTP fetch logic for RAG content endpoint - Refactor retrieve_document.ts to delegate fetch to shared helper - Add 'retrieve' operation to document action (with access control via getAccessibleDocumentIds) - Add 'generate_docx' operation to document action (delegates to generateDocument internal action) - Support returnChunks option in fetch helper for chunk-level iteration Refs #664

- Add 'search' variant to RagActionParams and parametersValidator - POST to /api/v1/search with document_ids, top_k, similarity_threshold - Return raw SearchResult[] with content, score, document_id, filename - Add filename field to SearchResult interface Refs #664

Add example workflow definition for comparing legal contracts using chunk-level RAG search. Fix lint error in RAG search timeout handling. Refs #664

…nt status

The create_workflow tool's Zod schema and internal mutation validator were missing 'output' as a valid stepType, despite it being supported by the schema, save_workflow_definition tool, and workflow engine. Refs #664

Replace lenient jsonRecordValidator with stepConfigValidator in the createWorkflowCreationApproval mutation so invalid step configs are caught before the approval card is shown, not at execution time. Also fix contract comparison workflow inputSchema to use the correct properties/required wrapper format. Refs #664

…RAG operations RAG service indexes files by their storage ID, not the application-level document ID. This aligns all RAG interactions (upload, delete, search, retrieve) to use file storage IDs consistently, adds indexedFileId tracking to ragInfo schema, automatic re-indexing on document file updates, backward-compatible param migration for existing workflows, and reliable MIME type resolution via extension fallback for uploads.

Introduce a fileMetadata schema and module that records organizationId, storageId, fileName, contentType, and size for every file stored in Convex storage. Integrate metadata persistence across all file upload paths: chat uploads, document creation, OneDrive imports, agent tools (excel/txt), document generation (docx/pptx/generic), integration sandbox storage, and webhook file uploads. The RAG upload_document helper now resolves fileName and contentType from fileMetadata, ensuring correct file extensions are sent to the RAG service.

…ment retrieve Add stop/cancel button to workflow run approval card allowing users to cancel running executions. Includes new cancelExecution mutation with proper status validation and component workflow cleanup. Remove redundant access check from document retrieve action since access is already validated at the workflow level. Update contract comparison config to use formatList template helper.

…s and executive overview Replace free-form LLM summary with structured difference tables per clause and a brief executive introduction, embedding clause-level results directly in the DOCX report for better readability and consistency.

Storage blobs from workflow executions are now deleted via scheduled jobs instead of immediately, preventing race conditions where concurrent reads may still reference old blobs. Intermediate storage uses a 24-hour retention period; final storage retains the existing 30-day window. Adds deferred cleanup to cancellation and stuck execution recovery paths.

…low output serializeAndCompleteExecution was JSON.parsing variables containing a _storageRef pointer instead of fetching the actual data from blob storage, causing workflow output to be lost for large payloads (>400KB).

greptile-apps

Your free trial has ended. If you'd like to continue receiving code reviews, you can add a payment method here.

…pletion - Add get_metadata document action to retrieve file names by storage ID - Include base and comparison document names in workflow config, LLM prompts, and final report - Replace passive system message with active agent generation on workflow completion - Add idempotency guard to prevent duplicate completion messages - Dynamic searchTopK based on number of comparison documents - Require LLM to output sections for all comparison docs even without RAG matches

Split the serialize-and-complete action into a serialize-only step (persistExecutionOutput) so output and variables are persisted before the component callback fires. The onWorkflowComplete callback remains the sole authority for transitioning execution status to terminal states, reading the already-persisted output instead of relying on the void returnValue from dynamic workflows.

…mprove workflow prompts Rename fileId/url to fileStorageId/downloadUrl across all document generation tools, types, and validators for consistent naming. Suppress LLM meta-commentary in contract comparison prompts and use system role for workflow completion messages.

Remove colored borders and relocate the status badge from the top-right corner (where it was truncated by overflow-hidden) to the bottom-right where there is sufficient space for full display.

Prefix unused `variables` parameter with underscore and replace unsafe `as` cast with `toId` helper for storage ID conversion.

coderabbitai · 2026-03-09T04:58:33Z

📝 Walkthrough

Walkthrough

This PR implements a comprehensive file metadata persistence system and refactors the execution lifecycle for workflow runs. Key changes include: (1) introducing a new fileMetadata table with mutations to persist file details alongside storage operations; (2) renaming result fields across file generation tools from fileId/url to fileStorageId/downloadUrl for clarity; (3) adding execution cancellation support with proper state transitions and storage cleanup scheduling; (4) refactoring execution serialization to separate output persistence from variable updates; (5) updating RAG service integration to use file IDs instead of record/document IDs and adding synchronous ingestion and search capabilities; (6) modifying file upload hooks to require and persist organizationId throughout the stack; (7) refining approval-card UI components for status badge placement; and (8) adding a new "output" step type to workflow definitions.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~70 minutes

Possibly related PRs

tale-project/tale\Replace Cognee with lightweight RAG pipeline #607 — Both PRs modify overlapping RAG/document/file-storage codepaths including fileId → fileStorageId/downloadUrl renames, file metadata persistence, RAG upload/search helpers, DOCX/Excel/Image tool result shapes, and workflow action changes.
tale-project/tale\fix(platform): improve workflow engine reliability and prevent execution buildup #433 — Both PRs modify execution serialization, persistence, storage cleanup scheduling, and recovery paths, including the introduction of deferred storage cleanup and INTERMEDIATE_STORAGE_RETENTION_MS.
tale-project/tale\feat(platform): improve chat UX with persisted drafts and error handling #502 — Both PRs modify the chat file-upload hook (use-convex-file-upload.ts), updating its API signature to require organizationId and manage file metadata persistence.

✨ Finishing Touches

📝 Generate docstrings (stacked PR)
📝 Generate docstrings (commit on current branch)

🧪 Generate unit tests (beta)

Create PR with unit tests
Post copyable unit tests in a comment
Commit unit tests in branch feat/664-contract-comparison-workflow

coderabbitai

Actionable comments posted: 27

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (10)

services/platform/convex/agent_tools/files/pptx_tool.ts (1)

327-336: ⚠️ Potential issue | 🔴 Critical

Fix type mismatch between GeneratePptxResult.fileStorageId and GenerateResult.fileStorageId.

The generatePptx action returns fileStorageId as Id<'_storage'>, but GenerateResult expects it as string. The type assertion as GenerateResult masks this type incompatibility. Either change GenerateResult.fileStorageId to Id<'_storage'> to match the upstream return type, or explicitly convert the ID before spreading the result.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@services/platform/convex/agent_tools/files/pptx_tool.ts` around lines 327 -
336, Type mismatch: GeneratePptxResult.fileStorageId is Id<'_storage'> but
GenerateResult.fileStorageId is string and the current `as GenerateResult` hides
this; fix by either updating the GenerateResult type to use Id<'_storage'> for
fileStorageId or convert the returned ID to string before returning. Locate the
return in generatePptx (the `result` object used in the block that logs
'tool:pptx generate success') and either change GenerateResult.fileStorageId to
Id<'_storage'> or replace the spread with an explicit fileStorageId:
String(result.fileStorageId) (or equivalent conversion) so the returned shape
matches GenerateResult without a type assertion.

services/platform/convex/agent_tools/rag/format_search_results.ts (1)

8-13: ⚠️ Potential issue | 🟠 Major

Surface filename in the formatted chunk output.

Adding filename to SearchResult is inert right now because formatSearchResults still emits only score + content. In the new multi-contract comparison flow, that drops source attribution for each chunk, so the agent can't reliably tell which contract an excerpt came from.

♻️ Proposed fix

 export function formatSearchResults(
   results: SearchResult[],
 ): string | undefined {
   if (results.length === 0) return undefined;

   return results
     .map((r, idx) => {
       const score = (r.score * 100).toFixed(1);
-      return `[${idx + 1}] (Relevance: ${score}%)\n${r.content}`;
+      const sourceLine = r.filename ? `Source: ${r.filename}\n` : '';
+      return `[${idx + 1}] (Relevance: ${score}%)\n${sourceLine}${r.content}`;
     })
     .join('\n\n---\n\n');
 }

Also applies to: 40-50

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@services/platform/convex/agent_tools/rag/format_search_results.ts` around
lines 8 - 13, formatSearchResults currently only emits score and content, losing
the new SearchResult.filename; update the function formatSearchResults (and any
helper like formatChunk or buildFormattedResults) to include filename (and
optionally document_id/metadata) in each formatted chunk output so the agent can
see source attribution. Locate where SearchResult objects are mapped/serialized
inside formatSearchResults and append a readable filename field (e.g.,
"filename: <value>") to the returned string/object for each result, ensuring
undefined filenames are handled gracefully (skip or show "unknown").

services/rag/app/routers/search.py (1)

35-41: ⚠️ Potential issue | 🟡 Minor

Propagate filename in /generate sources too.

/search now includes filenames, but generate() still strips them when building sources. That makes the two RAG endpoints expose different provenance and leaves generated answers without the same source attribution.

♻️ Proposed fix

         sources = [
             SearchResult(
                 content=s.get("content", ""),
                 score=s.get("score", 0.0),
                 document_id=s.get("document_id"),
+                filename=s.get("filename"),
                 metadata=s.get("metadata"),
             )
             for s in result.get("sources", [])
         ]

Also applies to: 75-81

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@services/rag/app/routers/search.py` around lines 35 - 41, The generate()
handler currently builds its sources from search results but omits the filename
field, causing inconsistent provenance; update the code that maps search results
into source objects (the block in generate() that builds "sources", referenced
around the mapping of SearchResult/`r` into source entries) to include the
filename (e.g., use r.get("filename") or sr.filename) and propagate it into each
source's metadata or filename property so the /generate response includes the
same filename attribution as /search; apply the same change where sources are
constructed in the second occurrence (the other mapping around lines 75-81).

services/rag/app/services/rag_service.py (1)

328-340: ⚠️ Potential issue | 🟡 Minor

Make return_chunks=true consistent for empty ranges.

chunks is only populated on the non-empty path. When the requested window resolves to zero rows, the earlier return still omits it, so callers explicitly asking for chunk data receive null/missing instead of [].

♻️ Proposed fix

         if not rows:
-            return {
+            result = {
                 "document_id": document_id,
                 "title": doc["filename"],
                 "content": "",
                 "chunk_range": {"start": 0, "end": 0},
                 "total_chunks": total_chunks,
                 "total_chars": 0,
-            }
+            }
+            if return_chunks:
+                result["chunks"] = []
+            return result

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@services/rag/app/services/rag_service.py` around lines 328 - 340, The result
dict currently only adds "chunks" when there are rows, so callers with
return_chunks=True get no "chunks" key for empty ranges; after constructing
result (using variables like result, return_chunks, rows, combined, document_id)
ensure that if return_chunks is True and "chunks" isn't already set you add an
empty list (e.g., if return_chunks and "chunks" not in result: result["chunks"]
= []). This guarantees callers always receive a chunks key (either populated or
[]) before the function returns.

services/platform/convex/agent_tools/files/excel_tool.ts (1)

214-243: ⚠️ Potential issue | 🟠 Major

downloadUrl still points at the raw storage object.

Now that the tool tells the model to surface downloadUrl verbatim, returning ctx.storage.getUrl(fileId) will download/open as the opaque storage blob rather than result.fileName. The document generators in this PR use the filename-preserving download endpoint for exactly this reason; the Excel tool should do the same.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@services/platform/convex/agent_tools/files/excel_tool.ts` around lines 214 -
243, The returned downloadUrl currently uses ctx.storage.getUrl(fileId) which
points to the raw storage blob; instead return the filename-preserving download
endpoint so clients get a download with result.fileName. Replace the getUrl call
used to populate downloadUrl in the generate branch (where fileStorageId is
fileId and file metadata was saved via
internal.file_metadata.internal_mutations.saveFileMetadata) with the storage
download endpoint that preserves filenames (e.g., use ctx.storage.getDownloadUrl
or construct the filename-preserving download URL with fileId and
result.fileName) and throw if that URL is unavailable.

services/platform/convex/onedrive/internal_actions.ts (1)

102-125: ⚠️ Potential issue | 🟠 Major

Don't silently fall back to the 'system' org here.

If a caller misses the new organizationId, this action still succeeds and saves metadata under a synthetic tenant. The imported file then disappears from org-scoped queries instead of failing fast, which makes propagation bugs very hard to catch. Make organizationId required for this path.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@services/platform/convex/onedrive/internal_actions.ts` around lines 102 -
125, The handler for this action silently falls back to 'system' when
args.organizationId is missing; change this so organizationId is required and
the action fails fast. Update the action's input schema (replace
v.optional(v.string()) for organizationId with v.string()) and in the
handler/downloadAndStoreFileImpl path remove the defaulting (args.organizationId
?? 'system') and instead validate/throw or return a failure if
args.organizationId is undefined before calling saveFileMetadata (and ensure
saveFileMetadata is invoked with the provided args.organizationId).

services/platform/convex/workflow_engine/__tests__/execution_lifecycle.test.ts (1)

195-219: ⚠️ Potential issue | 🟡 Minor

These assertions won't catch an unexpected intermediate-cleanup job.

Line 206 and Line 218 only prove there was no synchronous storage.delete(). completeExecution() now defers cleanup through scheduler.runAfter, so both tests still pass if an unwanted INTERMEDIATE_STORAGE_RETENTION_MS deletion is scheduled.

Suggested assertions

   it('should not schedule deferred deletion when IDs match', async () => {
@@
-    expect(ctx._deletedStorageIds).not.toContain('same_storage');
+    expect(ctx.scheduler.runAfter).not.toHaveBeenCalledWith(
+      INTERMEDIATE_STORAGE_RETENTION_MS,
+      expect.anything(),
+      { storageId: 'same_storage' },
+    );
   });

   it('should NOT schedule deferred deletion when no replacement is provided', async () => {
@@
-    expect(ctx.storage.delete).not.toHaveBeenCalled();
+    expect(ctx.scheduler.runAfter).not.toHaveBeenCalledWith(
+      INTERMEDIATE_STORAGE_RETENTION_MS,
+      expect.anything(),
+      { storageId: 'existing_storage' },
+    );
   });

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In
`@services/platform/convex/workflow_engine/__tests__/execution_lifecycle.test.ts`
around lines 195 - 219, The tests currently only assert storage.delete wasn't
called synchronously but miss checking deferred deletions scheduled via
scheduler.runAfter; update the two tests around completeExecution to also assert
that scheduler.runAfter was not invoked (or was invoked with no deletion task)
and/or that no job referencing INTERMEDIATE_STORAGE_RETENTION_MS or the storage
id was scheduled; specifically, in the tests that set
ctx._storedExecution.variablesStorageId and call completeExecution, add
assertions against the mocked scheduler.runAfter (and/or inspect
ctx._deletedStorageIds) to ensure no deferred deletion job for the storage id
was scheduled after completeExecution.

services/platform/convex/workflow_engine/helpers/engine/on_workflow_complete.ts (1)

53-76: ⚠️ Potential issue | 🟠 Major

Guard workflow.completed emission behind the same terminal-state check.

wasTerminal suppresses the state transition and chat response, but emitEvent(...) still runs on every successful re-entry. A retried callback will therefore fire workflow.completed again and can retrigger downstream subscriptions for an execution that is already terminal.

Suggested guard

   if (kind === 'success') {
     // Only transition to completed if not already done (the serialize action
     // does NOT set status; this callback is the sole completion authority).
     // The idempotency guard in completeExecution also protects against races.
     if (!wasTerminal) {
       // Use the already-persisted output from the serialize action.
       // Fall back to result.returnValue for simple workflows that skip serialization.
       const output = exec.output ?? toConvexJsonValue(result.returnValue);
       await ctx.runMutation(
         internal.wf_executions.internal_mutations.completeExecution,
         {
           executionId: toId<'wfExecutions'>(exec._id),
           output,
         },
       );
-    }
-    const updatedExec = await ctx.db.get(exec._id);
-    if (updatedExec) {
-      await emitEvent(ctx, {
-        organizationId: updatedExec.organizationId,
-        eventType: 'workflow.completed',
-        eventData: { execution: updatedExec },
-      });
+      const updatedExec = await ctx.db.get(exec._id);
+      if (updatedExec?.status === 'completed') {
+        await emitEvent(ctx, {
+          organizationId: updatedExec.organizationId,
+          eventType: 'workflow.completed',
+          eventData: { execution: updatedExec },
+        });
+      }
     }
   }

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In
`@services/platform/convex/workflow_engine/helpers/engine/on_workflow_complete.ts`
around lines 53 - 76, The emitEvent call for 'workflow.completed' is not guarded
by the same terminal-state check (wasTerminal) used for the state transition,
causing duplicate completion events on retries; update on_workflow_complete.ts
to only call emitEvent when the execution was not already terminal (i.e., wrap
or gate the emitEvent invocation with if (!wasTerminal) or move it inside the
block that runs completeExecution) so that emitEvent (via emitEvent(ctx, {...}))
only fires when
ctx.runMutation(internal.wf_executions.internal_mutations.completeExecution,
{...}) is performed and not on re-entries.

services/platform/convex/documents/validators.ts (1)

81-89: ⚠️ Potential issue | 🟡 Minor

Use v.id('_storage') for fileStorageId to align with TypeScript types and codebase pattern.

The fileStorageId field uses v.string() but the TypeScript interfaces (GenerateDocxResult, GenerateDocxFromTemplateResult, GeneratePptxResult) all define it as Id<'_storage'>. This is inconsistent with other storage ID validators in the codebase (e.g., variablesStorageId, outputStorageId, iconStorageId all use v.id('_storage')) and with the project learning to prefer v.id('_storage') for storage IDs.

Change to match:
Suggested diff
 export const generateDocumentResponseValidator = v.object({
   success: v.boolean(),
-  fileStorageId: v.string(),
+  fileStorageId: v.id('_storage'),
   downloadUrl: v.string(),
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@services/platform/convex/documents/validators.ts` around lines 81 - 89, The
generateDocumentResponseValidator currently defines fileStorageId as v.string();
change it to use v.id('_storage') to match the TypeScript types (Id<'_storage'>)
and the project's storage ID pattern used by other validators (e.g.,
variablesStorageId, outputStorageId, iconStorageId); update the fileStorageId
property in generateDocumentResponseValidator to v.id('_storage') so the
validator aligns with the interfaces GenerateDocxResult /
GenerateDocxFromTemplateResult / GeneratePptxResult.

services/platform/app/features/chat/hooks/use-convex-file-upload.ts (1)

108-120: ⚠️ Potential issue | 🟡 Minor

Implement compensating cleanup if metadata save fails after file upload.

If saveFileMetadata fails after the upload succeeds (line 114), the file remains in storage but the metadata record is not created, and the storageId is lost in the error handler. There is no background cleanup mechanism for orphaned chat files. Either wrap both operations in a transaction-like pattern, delete the uploaded file if metadata save fails, or ensure a background job cleans up untracked storage blobs in the chat context.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@services/platform/app/features/chat/hooks/use-convex-file-upload.ts` around
lines 108 - 120, When saveFileMetadata fails after a successful upload, perform
compensating cleanup by deleting the uploaded blob using the returned storageId;
specifically, wrap the metadata save call in a try/catch around saveFileMetadata
and on error call the storage delete/cleanup API with storageId (the same
storageId from result.json) before rethrowing the error, ensuring you reference
fileToUpload.name and contentType/resolvedType only for logging; alternatively,
if a delete API isn't available expose a cleanup task/queue invocation with
storageId so orphaned blobs are removed later—make the change around the block
that calls saveFileMetadata to guarantee no uploaded file remains without
metadata.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@services/platform/convex/agent_tools/files/image_tool.ts`:
- Line 71: Update the return documentation in image_tool.ts to match the
GenerateImageResult type by adding the missing fileStorageId and extension
fields to the documented return spec; locate the comment block that lists
"Returns: { operation, success, downloadUrl, fileName, contentType, size }" and
include fileStorageId and extension so the docstring matches the
GenerateImageResult shape used by the code.

In `@services/platform/convex/agent_tools/files/pdf_tool.ts`:
- Line 51: The return documentation for GeneratePdfResult is missing the
fileStorageId and extension fields—update the doc comment that currently lists
"Returns: { success, downloadUrl, fileName, contentType, size }" to include
fileStorageId and extension so it matches the GeneratePdfResult type; locate the
comment in pdf_tool.ts (the doc block around GeneratePdfResult / the generatePdf
export) and add entries describing fileStorageId (storage identifier) and
extension (file extension) to the returned fields.

In `@services/platform/convex/agent_tools/files/txt_tool.ts`:
- Around line 113-126: Summary: Storing the blob (ctx.storage.store(blob))
before saving metadata
(internal.file_metadata.internal_mutations.saveFileMetadata) can leave orphaned
blobs if the metadata mutation fails; wrap the two steps with compensating logic
and error handling. Update the flow in the function that calls
ctx.storage.store, internal.file_metadata.internal_mutations.saveFileMetadata,
and ctx.storage.getUrl so that after successfully calling
ctx.storage.store(blob) you attempt the metadata save in a try/catch and on
failure call the storage cleanup API (e.g., ctx.storage.delete(fileId) or
equivalent) to remove the stored blob, then rethrow or surface the original
error; ensure you only call ctx.storage.getUrl(fileId) after metadata save
succeeds so the returned URL is consistent with saved metadata.

In `@services/platform/convex/conversations/send_message_via_integration.ts`:
- Around line 67-73: Replace the blind insert of file metadata with a
lookup-and-conditional-insert: in send_message_via_integration where
ctx.db.insert('fileMetadata', {...}) is called, first query for an existing
fileMetadata row by storageId (and optionally organizationId), if found validate
that the existing row.organizationId matches args.organizationId (or handle
mismatch as an error), and only insert a new row when none exists; alternatively
use a database upsert/insert-if-not-exists operation keyed on storageId to
ensure idempotence and deterministic cleanup behavior.

In `@services/platform/convex/custom_agents/webhooks/http_actions.ts`:
- Around line 105-114: The upload currently stores the blob with
ctx.storage.store(file) and then calls
internal.file_metadata.internal_mutations.saveFileMetadata, but if the mutation
throws the stored blob becomes orphaned; wrap the mutation call in a try/catch:
after obtaining storageId, run the saveFileMetadata inside try, and in catch
call ctx.storage.delete(storageId) (or the appropriate ctx.storage removal
method) to delete the uploaded blob, then log a warning if the delete fails and
rethrow the original mutation error so the httpAction still fails; reference
ctx.storage.store(file), storageId, and
internal.file_metadata.internal_mutations.saveFileMetadata when making the
change.

In
`@services/platform/convex/documents/__tests__/get_accessible_file_ids.test.ts`:
- Around line 13-35: The mock createMockCtx currently ignores the withIndex
callback and returns all docs; update it to capture the callback passed to
withIndex and use it to filter docs so the test asserts that
q.eq('organizationId', args.organizationId) is honored by getAccessibleFileIds.
Specifically, change the vi.fn() for query().withIndex to accept a predicate
(the index callback) and apply that predicate to each document when building the
asyncIterator (or throw/assert if the predicate doesn't filter by
organizationId), referencing createMockCtx, withIndex, and getAccessibleFileIds
so the test fails if the organizationId equality check is removed.

In `@services/platform/convex/documents/generate_document.ts`:
- Around line 116-125: The saveFileMetadata call can throw after the blob is
already uploaded to _storage, leaving an orphaned blob; wrap the
ctx.runMutation(internal.file_metadata.internal_mutations.saveFileMetadata, ...)
call in a try/catch and on error attempt a best-effort cleanup of the uploaded
blob (call the same _storage delete/remove method used for uploads with the
storageId), swallow any delete errors but ensure the original error is rethrown;
apply the same pattern to the PPTX helper that uploads to _storage so uploads
are deleted on metadata persistence failure or scheduled for cleanup if
immediate deletion fails.

In `@services/platform/convex/documents/generate_docx_from_template.ts`:
- Around line 128-137: Wrap the call to
ctx.runMutation(internal.file_metadata.internal_mutations.saveFileMetadata, ...)
in a try/catch so that if saveFileMetadata throws you delete or schedule
deletion of the uploaded blob identified by storageId before rethrowing; locate
the saveFileMetadata invocation in generate_docx_from_template.ts and in the
catch call your storage cleanup routine (the existing storage deletion helper or
API used elsewhere in this module) using storageId, then rethrow the original
error to preserve failure semantics.

In `@services/platform/convex/documents/generate_docx.ts`:
- Around line 122-131: The saveFileMetadata mutation call (ctx.runMutation with
internal.file_metadata.internal_mutations.saveFileMetadata) can fail after the
DOCX blob has been uploaded (storageId), leaving orphaned blobs; wrap the
ctx.runMutation call in a try/catch, and on any error call the same storage
deletion/cleanup routine used elsewhere (delete the uploaded blob by storageId
or equivalent cleanup method) before rethrowing the original error so callers
still see the failure but no orphaned file remains; ensure you reference
storageId and the docxBytes.length context and preserve the original error when
rethrowing.

In `@services/platform/convex/documents/get_accessible_document_ids.ts`:
- Around line 51-80: getAccessibleFileIds duplicates the document iteration and
team/rag visibility predicate used in getAccessibleDocumentIds; extract that
shared scan into a single helper (e.g., collectAccessibleDocuments or
scanAccessibleDocs) that accepts ctx and args.organizationId/userId, yields or
returns matching document records, and encapsulates the ragInfo.status ===
'completed' and team membership checks (using getUserTeamIds and the org/team
set logic). Replace the loop in both getAccessibleFileIds and
getAccessibleDocumentIds to call this helper and then map the resulting
documents to fileId or _id respectively at the call site.

In `@services/platform/convex/documents/mutations.ts`:
- Around line 130-137: Resolve and reuse a single normalized contentType
variable instead of using args.contentType directly: compute something like
resolvedContentType = args.contentType && args.contentType !== '' ?
args.contentType : 'application/octet-stream' (or trim-check for emptiness) and
use resolvedContentType in both the ctx.db.insert('fileMetadata', ...) call and
the subsequent createDocument(...) call so documents.mimeType is never stored as
an empty string; update places that reference args.contentType (e.g., the
fileMetadata insert block and the createDocument(...) invocation) to use the
normalized variable.

In `@services/platform/convex/file_metadata/internal_mutations.ts`:
- Around line 14-25: The current upsert looks up rows only by storageId
(query('fileMetadata').withIndex('by_storageId', q => q.eq('storageId',
args.storageId))) and will patch and return an existing row even if its
organizationId differs; update the logic in internal_mutations.ts so the lookup
is scoped to organizationId (either include organizationId in the query/index
predicate or, after fetching existing, check existing.organizationId !==
args.organizationId and throw/reject) before calling ctx.db.patch on the
existing._id; reference the fileMetadata lookup, withIndex('by_storageId'),
existing, args.storageId, args.organizationId and ctx.db.patch when making the
change.

In `@services/platform/convex/file_metadata/mutations.ts`:
- Around line 14-40: The handler in the file metadata upsert currently only
authenticates the caller via authComponent.getAuthUser and then inserts/patches
any storageId regardless of org membership; change the flow to authenticate →
call getOrganizationMember (or equivalent org membership check) for
args.organizationId and reject if not a member, and before patching verify
existing.organizationId === args.organizationId (throw if mismatched) so updates
cannot modify rows from other organizations; ensure inserts use the authorized
organizationId from args after membership validation and keep the same control
flow (authComponent.getAuthUser, getOrganizationMember, then patch or insert) in
the handler function.

In
`@services/platform/convex/node_only/integration_sandbox/helpers/create_convex_storage_provider.ts`:
- Around line 52-61: The metadata save mutation can fail after
ctx.storage.store() has committed, leaving orphaned blobs; wrap the calls that
persist metadata (the ctx.runMutation call using
internal.file_metadata.internal_mutations.saveFileMetadata and the surrounding
download/store metadata flow) in try-catch blocks so that if the mutation throws
you call ctx.storage.delete(blobKey) (or the appropriate delete method) to
remove the already-stored blob, then rethrow the error; apply the identical
try-catch-and-delete pattern to the other ctx.storage.store() usage (the store
invocation referenced around the store() block) to ensure no orphaned blobs
remain on metadata persistence failure.

In `@services/platform/convex/onedrive/download_and_store_file.ts`:
- Around line 17-21: storeFile() currently persists the blob and only then calls
saveFileMetadata(storageId, contentType, size), so if saveFileMetadata throws
you end up with orphaned/duplicated _storage blobs; modify the flow in
download_and_store_file.ts so that either (a) you persist metadata first (if
possible) before finalizing the blob, or (b) if metadata must be written after
the blob, ensure you perform a compensating delete of the stored blob when
saveFileMetadata throws (i.e., catch errors from saveFileMetadata and call the
storage cleanup/delete for the created storageId/_storage before rethrowing or
returning failure), applying the same cleanup logic around all calls to
saveFileMetadata() (including the occurrence at lines ~52) so no orphaned blobs
remain.

In `@services/platform/convex/onedrive/import_files.ts`:
- Around line 173-178: The import flow currently calls deps.storeFile (creating
a blob) then deps.saveFileMetadata(storageId, ...) separately, which can leave
the blob/storageId orphaned if saveFileMetadata or subsequent document writes
fail; change this by either (A) wrapping storage+metadata persistence in a
single atomic abstraction (e.g., add a deps.saveFileWithMetadata or
storageService.saveWithMetadata that calls storeFile and saveFileMetadata and
rolls back/delete the blob on any failure) or (B) add explicit cleanup logic:
after storeFile returns storageId, ensure any catch/finally around
deps.saveFileMetadata and later document writes will call
deps.deleteStorage(storageId) (or the equivalent cleanup method) on failure.
Update the code paths that call storeFile (the call site using storageId, blob,
downloadResult) to use the new atomic method or the cleanup branch so storageId
cannot be left orphaned.

In `@services/platform/convex/onedrive/upload_and_create_document.ts`:
- Around line 77-83: storageStore is non-transactional relative to
saveFileMetadata so if saveFileMetadata throws you must remove the orphaned
blob: after obtaining storageId from deps.storageStore(blob) wrap the
deps.saveFileMetadata(...) call in a try/catch; in the catch await a deletion
call (e.g. await deps.deleteStoredBlob(storageId) — or the appropriate cleanup
helper in this module) to remove the stored blob, handle/log any deletion error
but ensure you rethrow the original saveFileMetadata error, leaving
storageStore, saveFileMetadata, storageId and blob references intact.

In `@services/platform/convex/predefined_workflows/document_rag_sync.ts`:
- Around line 88-91: The workflow assumes every document returned by the step
find_unprocessed_document has storage-backed fields (fileId, title, mimeType)
which are optional; fix by either updating the find_unprocessed_document query
to filter out documents where fileId (and optionally title/mimeType) is missing
so upload_document always receives valid storage-backed rows, or add an explicit
branch after find_unprocessed_document that checks these fields and
marks/records the document as skipped/processed (so it won't be repeatedly
selected) before attempting upload_document; reference find_unprocessed_document
and upload_document when making the change.

In
`@services/platform/convex/workflow_engine/action_defs/document/document_action.ts`:
- Line 90: The execute function currently declares a third parameter named
variables that is unused; rename it to _variables (or prefix it with an
underscore) in the execute signature (async execute(ctx, params, _variables)) to
satisfy the linter and avoid unused-variable warnings—ensure any existing
references (if any) in the body are updated to _variables as well and keep the
function name execute unchanged.
- Around line 145-159: The unsafe type assertion occurs in the get_metadata
branch where fileId is cast as Id<'_storage'> when calling
ctx.runQuery(internal.file_metadata.internal_queries.getByStorageId); either
validate the incoming fileId values (e.g., check format/length/regex) before
casting and only call the query with a guaranteed valid storageId, or if you
intend to accept string boundaries here, add a targeted linter suppression
comment directly above the cast (with a short rationale like "string boundary ->
cast to Id<'_storage'> at call site by design") so the pipeline knows this is
intentional; update the mapping in the case 'get_metadata' block around
params.fileIds.map(...) and the ctx.runQuery call accordingly.

In
`@services/platform/convex/workflow_engine/helpers/engine/on_workflow_complete.ts`:
- Around line 118-150: The completion message currently treats kind ===
'canceled' as a failure and shows "unknown error"; update the messageContent
construction in on_workflow_complete.ts to handle the 'canceled' case separately
(check the local kind variable or result.kind), creating a distinct canceled
template that uses any cancellation reason available from result (e.g.,
result.error or result.reason) and does not label it as failed, while keeping
the existing success and failed branches intact; ensure the adjusted
messageContent is passed to
internal.agent_tools.workflows.internal_mutations.triggerWorkflowCompletionResponse
so canceled runs notify the user that the workflow was canceled with the
provided reason.

In
`@services/platform/convex/workflow_engine/helpers/validation/variables/action_schemas.ts`:
- Around line 394-397: The fileId property for rag.upload_document is currently
typed as a plain string; update its schema to use the project's storage-ID shape
instead (the same _storage ID model used elsewhere in this file/registry) so
downstream validation treats it as a storage identifier rather than an opaque
string—replace the current fileId schema (type: 'string') with the shared
_storage ID schema/reference used by other fields in action_schemas.ts (copy the
exact structure/reference from another storage ID field in this file).

In `@services/platform/convex/workflow_engine/workflow_syntax_compact.ts`:
- Around line 113-118: Add a new optional parameter to the rag.search contract
(e.g., returnChunks?: boolean or chunked?: boolean) and update the documentation
block for Output (search) to explicitly show both modes: when chunked is false
return data.results: array of {documentId/ragDocumentId, fileId, filename?,
snippet, score, metadata?} plus summary fields (totalHits, executionTimeMs);
when chunked is true return data.chunks: array of {chunkId, ragDocumentId,
sourceFileId/fileId, filename?, text, startOffset, endOffset, score, similarity}
and keep data.results (or references) plus executionTimeMs and
topK/similarityThreshold echoes; reference the rag.search contract name in
workflow_syntax_compact.ts and ensure parameter names, types, and output field
names match the rest of the workflow engine (so generated steps can request
chunk payloads and read steps.*.output.data.chunks or .results consistently).
- Around line 93-99: The compact syntax contract string currently lists only
"update, retrieve, generate_docx" so the new workflow operation is
undiscoverable; add a new operation entry "get_metadata" alongside the others
and document its params and output shape in the same style—Params
(get_metadata): documentId (required) and optionally fields? (array of strings)
or keys? (array of strings); Output (get_metadata): `{ data: { metadata: {...} }
}` to return the full metadata object; update the string that starts with "Ops:"
and the corresponding Params/Output blocks in workflow_syntax_compact.ts so
agents can discover and wire the get_metadata action (look for the literal block
"Ops: update, retrieve, generate_docx" and the Params/Output lines to modify).

In `@services/platform/convex/workflows/executions/complete_execution.ts`:
- Around line 49-60: The current cleanup only schedules deletion when a new
variablesStorageId replaces the old blob; update the condition in
complete_execution.ts to also schedule deletion when the execution is
transitioned from stored blob to inline content (variablesSerialized).
Concretely, modify the if that references oldVariablesStorageId and
args.variablesStorageId so it triggers when oldVariablesStorageId exists AND
(args.variablesStorageId is present and differs from oldVariablesStorageId OR
args.variablesSerialized is truthy), then call
ctx.scheduler.runAfter(INTERMEDIATE_STORAGE_RETENTION_MS,
internal.wf_executions.internal_mutations.deleteStorageBlob, { storageId:
oldVariablesStorageId }) as currently implemented; mirror the same logic used in
resume_execution.ts.

In `@services/platform/convex/workflows/executions/delete_storage_blob.ts`:
- Around line 19-23: The current try/catch around
ctx.storage.delete(args.storageId) swallows all errors; change it to catch the
error value, detect/ignore only the “not found/already deleted” case and
otherwise log the failure (e.g., console.warn or processLogger/error) so
deferred-cleanup failures are visible; update the catch in the
delete_storage_blob workflow to reference ctx.storage.delete and log unexpected
errors with context including args.storageId and the caught error.

---

Outside diff comments:
In `@services/platform/app/features/chat/hooks/use-convex-file-upload.ts`:
- Around line 108-120: When saveFileMetadata fails after a successful upload,
perform compensating cleanup by deleting the uploaded blob using the returned
storageId; specifically, wrap the metadata save call in a try/catch around
saveFileMetadata and on error call the storage delete/cleanup API with storageId
(the same storageId from result.json) before rethrowing the error, ensuring you
reference fileToUpload.name and contentType/resolvedType only for logging;
alternatively, if a delete API isn't available expose a cleanup task/queue
invocation with storageId so orphaned blobs are removed later—make the change
around the block that calls saveFileMetadata to guarantee no uploaded file
remains without metadata.

In `@services/platform/convex/agent_tools/files/excel_tool.ts`:
- Around line 214-243: The returned downloadUrl currently uses
ctx.storage.getUrl(fileId) which points to the raw storage blob; instead return
the filename-preserving download endpoint so clients get a download with
result.fileName. Replace the getUrl call used to populate downloadUrl in the
generate branch (where fileStorageId is fileId and file metadata was saved via
internal.file_metadata.internal_mutations.saveFileMetadata) with the storage
download endpoint that preserves filenames (e.g., use ctx.storage.getDownloadUrl
or construct the filename-preserving download URL with fileId and
result.fileName) and throw if that URL is unavailable.

In `@services/platform/convex/agent_tools/files/pptx_tool.ts`:
- Around line 327-336: Type mismatch: GeneratePptxResult.fileStorageId is
Id<'_storage'> but GenerateResult.fileStorageId is string and the current `as
GenerateResult` hides this; fix by either updating the GenerateResult type to
use Id<'_storage'> for fileStorageId or convert the returned ID to string before
returning. Locate the return in generatePptx (the `result` object used in the
block that logs 'tool:pptx generate success') and either change
GenerateResult.fileStorageId to Id<'_storage'> or replace the spread with an
explicit fileStorageId: String(result.fileStorageId) (or equivalent conversion)
so the returned shape matches GenerateResult without a type assertion.

In `@services/platform/convex/agent_tools/rag/format_search_results.ts`:
- Around line 8-13: formatSearchResults currently only emits score and content,
losing the new SearchResult.filename; update the function formatSearchResults
(and any helper like formatChunk or buildFormattedResults) to include filename
(and optionally document_id/metadata) in each formatted chunk output so the
agent can see source attribution. Locate where SearchResult objects are
mapped/serialized inside formatSearchResults and append a readable filename
field (e.g., "filename: <value>") to the returned string/object for each result,
ensuring undefined filenames are handled gracefully (skip or show "unknown").

In `@services/platform/convex/documents/validators.ts`:
- Around line 81-89: The generateDocumentResponseValidator currently defines
fileStorageId as v.string(); change it to use v.id('_storage') to match the
TypeScript types (Id<'_storage'>) and the project's storage ID pattern used by
other validators (e.g., variablesStorageId, outputStorageId, iconStorageId);
update the fileStorageId property in generateDocumentResponseValidator to
v.id('_storage') so the validator aligns with the interfaces GenerateDocxResult
/ GenerateDocxFromTemplateResult / GeneratePptxResult.

In `@services/platform/convex/onedrive/internal_actions.ts`:
- Around line 102-125: The handler for this action silently falls back to
'system' when args.organizationId is missing; change this so organizationId is
required and the action fails fast. Update the action's input schema (replace
v.optional(v.string()) for organizationId with v.string()) and in the
handler/downloadAndStoreFileImpl path remove the defaulting (args.organizationId
?? 'system') and instead validate/throw or return a failure if
args.organizationId is undefined before calling saveFileMetadata (and ensure
saveFileMetadata is invoked with the provided args.organizationId).

In
`@services/platform/convex/workflow_engine/__tests__/execution_lifecycle.test.ts`:
- Around line 195-219: The tests currently only assert storage.delete wasn't
called synchronously but miss checking deferred deletions scheduled via
scheduler.runAfter; update the two tests around completeExecution to also assert
that scheduler.runAfter was not invoked (or was invoked with no deletion task)
and/or that no job referencing INTERMEDIATE_STORAGE_RETENTION_MS or the storage
id was scheduled; specifically, in the tests that set
ctx._storedExecution.variablesStorageId and call completeExecution, add
assertions against the mocked scheduler.runAfter (and/or inspect
ctx._deletedStorageIds) to ensure no deferred deletion job for the storage id
was scheduled after completeExecution.

In
`@services/platform/convex/workflow_engine/helpers/engine/on_workflow_complete.ts`:
- Around line 53-76: The emitEvent call for 'workflow.completed' is not guarded
by the same terminal-state check (wasTerminal) used for the state transition,
causing duplicate completion events on retries; update on_workflow_complete.ts
to only call emitEvent when the execution was not already terminal (i.e., wrap
or gate the emitEvent invocation with if (!wasTerminal) or move it inside the
block that runs completeExecution) so that emitEvent (via emitEvent(ctx, {...}))
only fires when
ctx.runMutation(internal.wf_executions.internal_mutations.completeExecution,
{...}) is performed and not on re-entries.

In `@services/rag/app/routers/search.py`:
- Around line 35-41: The generate() handler currently builds its sources from
search results but omits the filename field, causing inconsistent provenance;
update the code that maps search results into source objects (the block in
generate() that builds "sources", referenced around the mapping of
SearchResult/`r` into source entries) to include the filename (e.g., use
r.get("filename") or sr.filename) and propagate it into each source's metadata
or filename property so the /generate response includes the same filename
attribution as /search; apply the same change where sources are constructed in
the second occurrence (the other mapping around lines 75-81).

In `@services/rag/app/services/rag_service.py`:
- Around line 328-340: The result dict currently only adds "chunks" when there
are rows, so callers with return_chunks=True get no "chunks" key for empty
ranges; after constructing result (using variables like result, return_chunks,
rows, combined, document_id) ensure that if return_chunks is True and "chunks"
isn't already set you add an empty list (e.g., if return_chunks and "chunks" not
in result: result["chunks"] = []). This guarantees callers always receive a
chunks key (either populated or []) before the function returns.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 33beeeb0-7c94-4b9b-9d63-95512226939b

📥 Commits

Reviewing files that changed from the base of the PR and between ae1534b and 700e49f.

⛔ Files ignored due to path filters (1)

services/platform/convex/_generated/api.d.ts is excluded by !**/_generated/**

📒 Files selected for processing (98)

examples/workflows/contract-comparison/config.json
services/platform/app/features/automations/hooks/use-assistant-chat.ts
services/platform/app/features/chat/components/chat-interface.tsx
services/platform/app/features/chat/components/human-input-request-card.tsx
services/platform/app/features/chat/components/integration-approval-card.tsx
services/platform/app/features/chat/components/workflow-creation-approval-card.tsx
services/platform/app/features/chat/components/workflow-run-approval-card.tsx
services/platform/app/features/chat/hooks/use-convex-file-upload.ts
services/platform/app/features/chat/hooks/use-execution-status.ts
services/platform/app/features/custom-agents/hooks/use-test-chat.ts
services/platform/app/features/documents/components/documents-action-menu.tsx
services/platform/app/features/documents/hooks/mutations.ts
services/platform/convex/agent_tools/documents/__tests__/document_retrieve_tool.test.ts
services/platform/convex/agent_tools/documents/helpers/fetch_document_content.ts
services/platform/convex/agent_tools/documents/helpers/retrieve_document.ts
services/platform/convex/agent_tools/files/docx_tool.ts
services/platform/convex/agent_tools/files/excel_tool.ts
services/platform/convex/agent_tools/files/image_tool.ts
services/platform/convex/agent_tools/files/pdf_tool.ts
services/platform/convex/agent_tools/files/pptx_tool.ts
services/platform/convex/agent_tools/files/txt_tool.ts
services/platform/convex/agent_tools/rag/format_search_results.ts
services/platform/convex/agent_tools/rag/query_rag_context.ts
services/platform/convex/agent_tools/rag/rag_search_tool.ts
services/platform/convex/agent_tools/workflows/create_workflow_tool.ts
services/platform/convex/agent_tools/workflows/internal_mutations.ts
services/platform/convex/conversations/send_message_via_integration.ts
services/platform/convex/custom_agents/webhooks/http_actions.ts
services/platform/convex/documents/__tests__/get_accessible_file_ids.test.ts
services/platform/convex/documents/actions.ts
services/platform/convex/documents/generate_document.ts
services/platform/convex/documents/generate_docx.ts
services/platform/convex/documents/generate_docx_from_template.ts
services/platform/convex/documents/generate_pptx.ts
services/platform/convex/documents/get_accessible_document_ids.ts
services/platform/convex/documents/internal_actions.ts
services/platform/convex/documents/internal_queries.ts
services/platform/convex/documents/mutations.ts
services/platform/convex/documents/schema.ts
services/platform/convex/documents/types.ts
services/platform/convex/documents/update_document_internal.ts
services/platform/convex/documents/upload_base64_to_storage.ts
services/platform/convex/documents/validators.ts
services/platform/convex/file_metadata/helpers.ts
services/platform/convex/file_metadata/internal_mutations.ts
services/platform/convex/file_metadata/internal_queries.ts
services/platform/convex/file_metadata/mutations.ts
services/platform/convex/file_metadata/schema.ts
services/platform/convex/lib/agent_response/generate_response.ts
services/platform/convex/node_only/integration_sandbox/helpers/create_convex_storage_provider.ts
services/platform/convex/node_only/integration_sandbox/internal_actions.ts
services/platform/convex/onedrive/actions.ts
services/platform/convex/onedrive/download_and_store_file.ts
services/platform/convex/onedrive/import_files.ts
services/platform/convex/onedrive/import_files_deps.ts
services/platform/convex/onedrive/internal_actions.ts
services/platform/convex/onedrive/upload_and_create_document.ts
services/platform/convex/onedrive/upload_and_create_document_deps.ts
services/platform/convex/predefined_workflows/document_rag_sync.ts
services/platform/convex/schema.ts
services/platform/convex/wf_executions/internal_mutations.ts
services/platform/convex/wf_executions/mutations.ts
services/platform/convex/workflow_engine/__tests__/execution_lifecycle.test.ts
services/platform/convex/workflow_engine/__tests__/execution_variables.test.ts
services/platform/convex/workflow_engine/action_defs/document/document_action.ts
services/platform/convex/workflow_engine/action_defs/rag/helpers/delete_document.test.ts
services/platform/convex/workflow_engine/action_defs/rag/helpers/delete_document.ts
services/platform/convex/workflow_engine/action_defs/rag/helpers/types.ts
services/platform/convex/workflow_engine/action_defs/rag/helpers/upload_document.test.ts
services/platform/convex/workflow_engine/action_defs/rag/helpers/upload_document.ts
services/platform/convex/workflow_engine/action_defs/rag/helpers/upload_file_direct.test.ts
services/platform/convex/workflow_engine/action_defs/rag/helpers/upload_file_direct.ts
services/platform/convex/workflow_engine/action_defs/rag/rag_action.ts
services/platform/convex/workflow_engine/helpers/engine/dynamic_workflow_handler.ts
services/platform/convex/workflow_engine/helpers/engine/index.ts
services/platform/convex/workflow_engine/helpers/engine/on_workflow_complete.test.ts
services/platform/convex/workflow_engine/helpers/engine/on_workflow_complete.ts
services/platform/convex/workflow_engine/helpers/engine/serialize_and_complete_execution_handler.test.ts
services/platform/convex/workflow_engine/helpers/engine/serialize_and_complete_execution_handler.ts
services/platform/convex/workflow_engine/helpers/recovery/recover_stuck_executions.test.ts
services/platform/convex/workflow_engine/helpers/recovery/recover_stuck_executions.ts
services/platform/convex/workflow_engine/helpers/validation/variables/action_schemas.ts
services/platform/convex/workflow_engine/internal_actions.ts
services/platform/convex/workflow_engine/workflow_syntax_compact.ts
services/platform/convex/workflows/executions/cleanup_execution_storage.ts
services/platform/convex/workflows/executions/complete_execution.ts
services/platform/convex/workflows/executions/delete_storage_blob.ts
services/platform/convex/workflows/executions/persist_execution_output.ts
services/platform/convex/workflows/executions/resume_execution.ts
services/platform/convex/workflows/executions/update_execution_variables.ts
services/platform/lib/shared/__tests__/file-types.test.ts
services/platform/lib/shared/file-types.ts
services/platform/messages/en.json
services/rag/app/models.py
services/rag/app/routers/documents.py
services/rag/app/routers/search.py
services/rag/app/services/rag_service.py
services/rag/app/services/search_service.py

coderabbitai · 2026-03-09T04:58:36Z

   - content: The HTML/Markdown you wrote, or a URL to capture
   - quality: "low" (1x JPEG, smallest file, default), "standard" (1x PNG), "high" (2x PNG), "ultra" (4x PNG). Use "low" unless user requests higher quality.
-   Returns: { operation, success, url, fileName, contentType, size }
+   Returns: { operation, success, downloadUrl, fileName, contentType, size }


⚠️ Potential issue | 🟡 Minor

Documentation omits fileStorageId and extension from return spec.

The documented return object is missing two fields that GenerateImageResult declares. Consider updating for completeness:

- Returns: { operation, success, downloadUrl, fileName, contentType, size } + Returns: { operation, success, fileStorageId, downloadUrl, fileName, contentType, extension, size }

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

Returns: { operation, success, downloadUrl, fileName, contentType, size }

Returns: { operation, success, fileStorageId, downloadUrl, fileName, contentType, extension, size }

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@services/platform/convex/agent_tools/files/image_tool.ts` at line 71, Update the return documentation in image_tool.ts to match the GenerateImageResult type by adding the missing fileStorageId and extension fields to the documented return spec; locate the comment block that lists "Returns: { operation, success, downloadUrl, fileName, contentType, size }" and include fileStorageId and extension so the docstring matches the GenerateImageResult shape used by the code.

coderabbitai · 2026-03-09T04:58:36Z

   - extraCss: Additional CSS to inject
   - wrapInTemplate: Whether to wrap in HTML template
-   Returns: { success, url, fileName, contentType, size }
+   Returns: { success, downloadUrl, fileName, contentType, size }


⚠️ Potential issue | 🟡 Minor

Documentation is missing fileStorageId and extension fields.

The return documentation omits fields that are part of GeneratePdfResult. This inconsistency may confuse consumers relying on the tool description.

📝 Proposed fix

- Returns: { success, downloadUrl, fileName, contentType, size } + Returns: { success, fileStorageId, downloadUrl, fileName, contentType, extension, size }

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

Returns: { success, downloadUrl, fileName, contentType, size }

Returns: { success, fileStorageId, downloadUrl, fileName, contentType, extension, size }

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@services/platform/convex/agent_tools/files/pdf_tool.ts` at line 51, The return documentation for GeneratePdfResult is missing the fileStorageId and extension fields—update the doc comment that currently lists "Returns: { success, downloadUrl, fileName, contentType, size }" to include fileStorageId and extension so it matches the GeneratePdfResult type; locate the comment in pdf_tool.ts (the doc block around GeneratePdfResult / the generatePdf export) and add entries describing fileStorageId (storage identifier) and extension (file extension) to the returned fields.

coderabbitai · 2026-03-09T04:58:36Z

          const fileId = await ctx.storage.store(blob);
+
+          await ctx.runMutation(
+            internal.file_metadata.internal_mutations.saveFileMetadata,
+            {
+              organizationId: ctx.organizationId ?? 'system',
+              storageId: fileId,
+              fileName: filename,
+              contentType: 'text/plain; charset=utf-8',
+              size: blob.size,
+            },
+          );
+
          const url = await ctx.storage.getUrl(fileId);


⚠️ Potential issue | 🟡 Minor

Potential for orphaned storage blobs if metadata save fails.

The ctx.storage.store(blob) call on line 113 executes in the action context, while saveFileMetadata runs as a separate mutation. If the mutation fails (e.g., due to validation or transient errors), the stored blob persists without associated metadata.

This isn't a critical issue since:

The file is still usable via the returned URL

Deferred storage cleanup (mentioned in PR objectives) may address orphans

However, consider wrapping both operations or adding compensating logic for production reliability.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@services/platform/convex/agent_tools/files/txt_tool.ts` around lines 113 - 126, Summary: Storing the blob (ctx.storage.store(blob)) before saving metadata (internal.file_metadata.internal_mutations.saveFileMetadata) can leave orphaned blobs if the metadata mutation fails; wrap the two steps with compensating logic and error handling. Update the flow in the function that calls ctx.storage.store, internal.file_metadata.internal_mutations.saveFileMetadata, and ctx.storage.getUrl so that after successfully calling ctx.storage.store(blob) you attempt the metadata save in a try/catch and on failure call the storage cleanup API (e.g., ctx.storage.delete(fileId) or equivalent) to remove the stored blob, then rethrow or surface the original error; ensure you only call ctx.storage.getUrl(fileId) after metadata save succeeds so the returned URL is consistent with saved metadata.

coderabbitai · 2026-03-09T04:58:36Z

+        await ctx.db.insert('fileMetadata', {
+          organizationId: args.organizationId,
+          storageId: att.storageId,
+          fileName: att.fileName,
+          contentType: att.contentType,
+          size: att.size,
+        });


⚠️ Potential issue | 🟠 Major

Avoid re-inserting metadata for an existing attachment blob.

This blindly creates a new fileMetadata row for every send. If the same storageId is reused, lookup/cleanup becomes non-deterministic, and an existing row is never validated before the blob is reused under this organization.

Possible fix

- await ctx.db.insert('fileMetadata', { - organizationId: args.organizationId, - storageId: att.storageId, - fileName: att.fileName, - contentType: att.contentType, - size: att.size, - }); + const existingFileMetadata = await ctx.db + .query('fileMetadata') + .withIndex('by_storageId', (q) => q.eq('storageId', att.storageId)) + .first(); + + if ( + existingFileMetadata && + existingFileMetadata.organizationId !== args.organizationId + ) { + throw new Error('Attachment does not belong to organization'); + } + + if (!existingFileMetadata) { + await ctx.db.insert('fileMetadata', { + organizationId: args.organizationId, + storageId: att.storageId, + fileName: att.fileName, + contentType: att.contentType, + size: att.size, + }); + }

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@services/platform/convex/conversations/send_message_via_integration.ts` around lines 67 - 73, Replace the blind insert of file metadata with a lookup-and-conditional-insert: in send_message_via_integration where ctx.db.insert('fileMetadata', {...}) is called, first query for an existing fileMetadata row by storageId (and optionally organizationId), if found validate that the existing row.organizationId matches args.organizationId (or handle mismatch as an error), and only insert a new row when none exists; alternatively use a database upsert/insert-if-not-exists operation keyed on storageId to ensure idempotence and deterministic cleanup behavior.

coderabbitai · 2026-03-09T04:58:36Z

+      await ctx.runMutation(
+        internal.file_metadata.internal_mutations.saveFileMetadata,
+        {
+          organizationId: webhook.organizationId,
+          storageId,
+          fileName: file.name,
+          contentType: file.type || 'application/octet-stream',
+          size: file.size,
+        },
+      );


⚠️ Potential issue | 🟠 Major

Clean up the uploaded blob if saveFileMetadata fails.

This runs in an httpAction, so if the metadata mutation throws after ctx.storage.store(file) succeeds, the webhook fails and leaves an orphaned _storage file behind.

♻️ Suggested fix

if (file instanceof File) { const storageId = await ctx.storage.store(file); - await ctx.runMutation( - internal.file_metadata.internal_mutations.saveFileMetadata, - { - organizationId: webhook.organizationId, - storageId, - fileName: file.name, - contentType: file.type || 'application/octet-stream', - size: file.size, - }, - ); + try { + await ctx.runMutation( + internal.file_metadata.internal_mutations.saveFileMetadata, + { + organizationId: webhook.organizationId, + storageId, + fileName: file.name, + contentType: file.type || 'application/octet-stream', + size: file.size, + }, + ); + } catch (error) { + await ctx.storage.delete(storageId); + throw error; + } attachment = {

Based on learnings, Convex mutations are transactional with storage, but actions are not, so this upload path needs explicit compensation.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@services/platform/convex/custom_agents/webhooks/http_actions.ts` around lines 105 - 114, The upload currently stores the blob with ctx.storage.store(file) and then calls internal.file_metadata.internal_mutations.saveFileMetadata, but if the mutation throws the stored blob becomes orphaned; wrap the mutation call in a try/catch: after obtaining storageId, run the saveFileMetadata inside try, and in catch call ctx.storage.delete(storageId) (or the appropriate ctx.storage removal method) to delete the uploaded blob, then log a warning if the delete fails and rethrow the original mutation error so the httpAction still fails; reference ctx.storage.store(file), storageId, and internal.file_metadata.internal_mutations.saveFileMetadata when making the change.

coderabbitai · 2026-03-09T04:58:37Z

+Ops: update, retrieve, generate_docx
+Params (update): documentId (required), title?, content?, metadata?, mimeType?, extension?, sourceProvider?
+Params (retrieve): fileId (required - file storage ID), chunkStart? (number), chunkEnd? (number), returnChunks? (boolean)
+Params (generate_docx): fileName (required), sourceType (required: 'markdown'|'html'), content (required)
 Output (update): \`{ data: {...document} }\` - returns full document entity
+Output (retrieve): \`{ data: { content, chunks?, ... } }\` - returns document content from RAG
+Output (generate_docx): \`{ data: { fileStorageId, downloadUrl, fileName } }\`


⚠️ Potential issue | 🟠 Major

Add document.get_metadata to the compact syntax contract.

The PR objective adds get_metadata, but this reference only exposes update, retrieve, and generate_docx. Since this string is used to drive workflow generation, omitting the op means agents cannot discover or wire the new action correctly. Please add the operation and its params/output shape here.

Suggested update

- Ops: update, retrieve, generate_docx + Ops: update, retrieve, get_metadata, generate_docx Params (update): documentId (required), title?, content?, metadata?, mimeType?, extension?, sourceProvider? Params (retrieve): fileId (required - file storage ID), chunkStart? (number), chunkEnd? (number), returnChunks? (boolean) + Params (get_metadata): ... Params (generate_docx): fileName (required), sourceType (required: 'markdown'|'html'), content (required) Output (update): `{ data: {...document} }` - returns full document entity Output (retrieve): `{ data: { content, chunks?, ... } }` - returns document content from RAG + Output (get_metadata): `{ data: {...metadata} }` Output (generate_docx): `{ data: { fileStorageId, downloadUrl, fileName } }`

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@services/platform/convex/workflow_engine/workflow_syntax_compact.ts` around lines 93 - 99, The compact syntax contract string currently lists only "update, retrieve, generate_docx" so the new workflow operation is undiscoverable; add a new operation entry "get_metadata" alongside the others and document its params and output shape in the same style—Params (get_metadata): documentId (required) and optionally fields? (array of strings) or keys? (array of strings); Output (get_metadata): `{ data: { metadata: {...} } }` to return the full metadata object; update the string that starts with "Ops:" and the corresponding Params/Output blocks in workflow_syntax_compact.ts so agents can discover and wire the get_metadata action (look for the literal block "Ops: update, retrieve, generate_docx" and the Params/Output lines to modify).

coderabbitai · 2026-03-09T04:58:37Z

+Ops: upload_document, delete_document, search
+Params (upload_document): fileId (required - file storage ID), fileName? (optional), contentType? (optional), sync? (optional boolean)
+Params (delete_document): fileId (required - file storage ID to delete from RAG)
+Params (search): query (required), fileIds (required - array of file storage IDs), topK? (optional number), similarityThreshold? (optional number)
+Output (upload_document): \`{ data: { success, executionTimeMs, fileId, ragDocumentId?, ... } }\`



⚠️ Potential issue | 🟠 Major

Document the full rag.search interface, including chunked results.

This section exposes search, but it does not describe the new chunk-return flag or the search result shape. For the contract-comparison workflow, that means generated steps will have to guess both how to request chunk payloads and which fields to read back from steps.*.output.data for filenames/chunks. Please add the missing param and an explicit Output (search) contract.

Suggested update

Ops: upload_document, delete_document, search Params (upload_document): fileId (required - file storage ID), fileName? (optional), contentType? (optional), sync? (optional boolean) Params (delete_document): fileId (required - file storage ID to delete from RAG) - Params (search): query (required), fileIds (required - array of file storage IDs), topK? (optional number), similarityThreshold? (optional number) + Params (search): query (required), fileIds (required - array of file storage IDs), topK? (optional number), similarityThreshold? (optional number), returnChunks? (optional boolean) Output (upload_document): `{ data: { success, executionTimeMs, fileId, ragDocumentId?, ... } }` + Output (search): `{ data: [...results] }` - results include filename metadata and chunk fields when `returnChunks` is true

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@services/platform/convex/workflow_engine/workflow_syntax_compact.ts` around lines 113 - 118, Add a new optional parameter to the rag.search contract (e.g., returnChunks?: boolean or chunked?: boolean) and update the documentation block for Output (search) to explicitly show both modes: when chunked is false return data.results: array of {documentId/ragDocumentId, fileId, filename?, snippet, score, metadata?} plus summary fields (totalHits, executionTimeMs); when chunked is true return data.chunks: array of {chunkId, ragDocumentId, sourceFileId/fileId, filename?, text, startOffset, endOffset, score, similarity} and keep data.results (or references) plus executionTimeMs and topK/similarityThreshold echoes; reference the rag.search contract name in workflow_syntax_compact.ts and ensure parameter names, types, and output field names match the rest of the workflow engine (so generated steps can request chunk payloads and read steps.*.output.data.chunks or .results consistently).

…n, and value sanitization Replace count-based message loading (recentMessages) with token-budget prioritized loading (maxHistoryTokens) — conversational messages load first, tool messages fill remaining budget with age-tiered truncation. Add workflow input validation against the start step's inputSchema before creating approval cards. Simplify contract comparison config by passing file names via input objects instead of metadata lookups. Add sanitizeConvexValue to handle undefined → null for Convex compatibility in set_variables action.

…arison Rewrite the contract comparison workflow (v2.0) to use paragraph-level text diffing instead of chunk-level RAG similarity search. Adds a diff service, compare endpoint, and structured JSON output from the LLM step to enable clause amendment frequency analysis in the report overview.

…ct comparison Enrich diff output with word-level inline diffs ([-deleted-]{+added+} markers), clause/section reference extraction (Section, Ziff., §, Art., etc.), and page number annotations from DOCX page breaks. Update prompts to leverage these fields for more precise legal analysis.

Rearchitect contract comparison to compare adjacent versions in a chronological chain instead of comparing all files against a single baseline. Add escape sequence normalization for JEXL-to-DOCX content and track version provenance (versionFrom/versionTo) in clause data.

…guage support Replace 4-way classification (Addition/Deletion/Modification/Rewording) with 3-tier significance model (Editorial/Structural/Substantive) that separates legal impact from textual change. Add configurable output language parameter with auto-detection fallback.

larryro added 15 commits March 9, 2026 12:32

feat(platform): add contract comparison workflow example

410a201

Add example workflow definition for comparing legal contracts using chunk-level RAG search. Fix lint error in RAG search timeout handling. Refs #664

feat(platform): always show import menu regardless of Microsoft accou…

2497618

…nt status

chore: increase contract comparison workflow timeout to 2 hours

8f54dc3

greptile-apps Bot reviewed Mar 9, 2026

View reviewed changes

larryro added 5 commits March 9, 2026 12:47

fix(platform): move status badge from header to footer in approval cards

df89fb7

Remove colored borders and relocate the status badge from the top-right corner (where it was truncated by overflow-hidden) to the bottom-right where there is sufficient space for full display.

fix(platform): resolve lint errors in document action

71823c9

Prefix unused `variables` parameter with underscore and replace unsafe `as` cast with `toId` helper for storage ID conversion.

larryro force-pushed the feat/664-contract-comparison-workflow branch from 700e49f to 71823c9 Compare March 9, 2026 04:49

coderabbitai Bot requested changes Mar 9, 2026

View reviewed changes

larryro added 6 commits March 9, 2026 13:53

chore: increase contract comparison workflow timeout to 4 hours

7b24224

larryro merged commit ab3f6d6 into main Mar 9, 2026
16 checks passed

larryro deleted the feat/664-contract-comparison-workflow branch March 9, 2026 11:24

This was referenced Mar 11, 2026

feat(documents): enhance contract comparison with clause frequency analysis #750

Merged

feat: agent-scoped knowledge with team toggle and file uploads #794

Merged

This was referenced Mar 18, 2026

feat(agent): generate file download cards in chat via appendFilePart #810

Merged

feat: extract and surface document file dates across platform #862

Merged

coderabbitai Bot mentioned this pull request Mar 26, 2026

feat: source file dates, dbmate migrations, and subpath deployment support #870

Merged

7 tasks

yannickmonney pushed a commit that referenced this pull request Apr 8, 2026

feat: contract comparison workflow with chunk-level RAG (#722)

fc76570

coderabbitai Bot mentioned this pull request Apr 11, 2026

feat(platform): simplify file tools — auto RAG indexing, remove parse/template ops #1407

Merged

7 tasks

coderabbitai Bot mentioned this pull request May 6, 2026

fix(platform,rag): authenticate RAG service + add SSRF guard #1675

Closed

7 tasks

	Returns: { operation, success, downloadUrl, fileName, contentType, size }
	Returns: { operation, success, fileStorageId, downloadUrl, fileName, contentType, extension, size }

	Returns: { success, downloadUrl, fileName, contentType, size }
	Returns: { success, fileStorageId, downloadUrl, fileName, contentType, extension, size }

Conversation

larryro commented Mar 9, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Test plan

Summary by CodeRabbit

Release Notes

Uh oh!

greptile-apps Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot commented Mar 9, 2026

Walkthrough

Estimated code review effort

Possibly related PRs

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Mar 9, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Mar 9, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Mar 9, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Mar 9, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Mar 9, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Mar 9, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Mar 9, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

larryro commented Mar 9, 2026 •

edited by coderabbitai Bot

Loading