feat: add document_write agent tool with approval flow#807
Conversation
Add a new document_write tool that allows agents to save downloaded files as documents, gated behind an approval workflow. Includes crawler HTTP download utility, approval card UI, and updates to file tools for consistent document creation tracking.
- Add didRetry guard to prevent savedMessageId from overwriting full message content (including tool call parts) on non-retry generations - Add atomic claimDocumentWriteForExecution mutation to prevent double-execution race condition - Add organization ownership check on fileIds - Add random suffix to resourceId for uniqueness - Add rate limiting to createDocumentWriteApproval - Add metadata and organizationId to approval card memo comparator - Add batch-all-failed UI state for approval card - Extract hardcoded error strings to i18n - Replace fragile ValueError with DownloadDetectedException in crawler - Validate folder path segments early in tool handler - Add handler tests for document_write tool
There was a problem hiding this comment.
Your free trial has ended. If you'd like to continue receiving code reviews, you can add a payment method here.
Sort import blocks and prefix unused variable with underscore.
📝 WalkthroughWalkthroughThis PR implements an end-to-end document write approval workflow. In the crawler service, it adds download detection for URLs that trigger file downloads via a new Estimated code review effort🎯 4 (Complex) | ⏱️ ~70 minutes Possibly related PRs
🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches
🧪 Generate unit tests (beta)
📝 Coding Plan
Comment Tip CodeRabbit can suggest fixes for GitHub Check annotations.Configure the |
There was a problem hiding this comment.
Actionable comments posted: 15
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
services/crawler/app/services/pdf_service.py (1)
133-147:⚠️ Potential issue | 🟠 Major
wait_untilparameter is partially ignored inurl_to_pdf.Navigation is hardcoded to
domcontentloaded, and onlyloadgets an additional wait. Callers can passcommitornetworkidlevia the API (as documented inUrlToPdfRequest), but these values are not honored. Theimage_service.pydemonstrates the correct handling pattern already used in the codebase.💡 Proposed fix
- # Navigate with domcontentloaded first, detecting download-triggered responses + # Navigate while preserving explicit caller intent where possible try: - await page.goto(url, wait_until="domcontentloaded", timeout=timeout) + if wait_until == "commit": + await page.goto(url, wait_until="commit", timeout=timeout) + else: + await page.goto(url, wait_until="domcontentloaded", timeout=timeout) except PlaywrightError as e: if "Download is starting" in str(e): raise DownloadDetectedException( f"URL triggers a file download instead of rendering a page: {url}" ) from e raise - # If 'load' was requested, try to wait for it but don't fail if it times out - if wait_until == "load": + # If stricter load states were requested, try to wait without hard-failing on timeout + if wait_until in ("load", "networkidle"): with contextlib.suppress(PlaywrightTimeoutError): - await page.wait_for_load_state("load", timeout=timeout) + await page.wait_for_load_state(wait_until, timeout=timeout)🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@services/crawler/app/services/pdf_service.py` around lines 133 - 147, The url_to_pdf implementation ignores caller-supplied wait_until values by hardcoding page.goto(..., wait_until="domcontentloaded") and only conditionally waiting for "load"; update url_to_pdf to pass the incoming wait_until parameter into page.goto (e.g., page.goto(url, wait_until=wait_until, timeout=timeout)) and handle special cases like "load" or "commit"/"networkidle" the same way image_service.py does (preserve the DownloadDetectedException handling for download-triggering errors and still suppress PlaywrightTimeoutError when doing any extra wait_for_load_state or equivalent). Ensure the function name url_to_pdf and the UrlToPdfRequest semantics are preserved so callers that pass "commit" or "networkidle" receive the intended behavior.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@services/crawler/app/routers/image.py`:
- Around line 158-162: The Content-Disposition filename currently uses
request.options.image_type while media_type uses the actual downloaded content
type (ct); change the code that builds the filename (around the Response return
using file_bytes, ct, and request.options.image_type) to derive the file
extension from ct (e.g., map common MIME types like image/png, image/jpeg,
image/gif, image/webp -> png, jpg/jpeg, gif, webp) and use that extension in the
filename, falling back to request.options.image_type if ct is unknown or
unmapped, so the returned filename extension matches the actual media_type.
In `@services/crawler/app/routers/pdf.py`:
- Line 149: Change the unpacking of the fallback download so the unused
content_type is ignored to silence linters; instead of assigning to content_type
from the call to download_file(...) (the line with file_bytes, content_type =
await download_file(str(request.url), timeout_seconds)), discard the second
value (e.g., file_bytes, _ = await download_file(... ) or file_bytes,
_content_type = ...) so only file_bytes is used; keep the call to download_file,
request.url and timeout_seconds unchanged.
In `@services/crawler/app/utils/http_download.py`:
- Line 53: The debug log currently outputs the full URL
(logger.debug(f"Downloaded {total} bytes from {url}")), which can leak
credentials in query strings; change this to log a sanitized URL by parsing the
url variable (e.g., using urllib.parse.urlparse) and reconstructing only the
scheme, netloc and path or replace the query/fragment with a fixed token like
"<REDACTED>", then call logger.debug with the sanitized value (keep the same
total variable). Locate the logger.debug line that references url and replace it
with the sanitized-url approach so no raw query parameters or fragments are
logged.
- Around line 36-40: Guard the Content-Length parsing so malformed headers don't
raise ValueError and crash: when checking content_length against
MAX_FILE_DOWNLOAD_BYTES, parse it inside a try/except ValueError (e.g. assign
parsed_length = int(content_length) inside try), and if parsing fails simply
skip the size-based HTTPException check (or log the malformed header) so
streaming-size enforcement continues; update the block that references
content_length, int(content_length), MAX_FILE_DOWNLOAD_BYTES, HTTPException and
status.HTTP_400_BAD_REQUEST to use the safe parsed_length variable and only
raise the 400 when parsed_length is successfully obtained and exceeds
MAX_FILE_DOWNLOAD_BYTES.
In
`@services/platform/app/features/chat/components/document-write-approval-card.tsx`:
- Around line 186-307: The success banners and footer currently depend on the
top-level executionError; change their conditionals to derive success/failure
from per-file counts (failedCount and successCount) instead: update the two
"savedSuccessfully" blocks, the batch partial success block, the "All failed
(batch)" block, and the resolved footer (the ternary that shows
statusApprovedPartial/statusApprovedFailed/statusApprovedSuccess) to use
failedCount, successCount and files.length along with isBatch, executedAt and
status instead of checking executionError; keep the executionError/temporary UI
error block to display the top-level error when present and ensure single-file
execution error display still shows file.executionError where appropriate (refer
to variables executionError, failedCount, successCount, files.length, isBatch,
executedAt, status in document-write-approval-card.tsx).
In
`@services/platform/convex/agent_tools/documents/__tests__/document_write_tool.test.ts`:
- Around line 240-248: The test currently only checks that result.success is
defined, which passes even when the handler failed because the default mock
ctx's runQuery returns undefined; update the test that calls handler(...) with
folderPath 'a//b' to stub the context so runQuery returns valid file metadata
(and runMutation or whatever persistence method the handler uses is stubbed to
succeed) before invoking handler(createMockCtx()) and then assert explicitly
that result.success === true; refer to the test helper createMockCtx, the
handler under test, and the mocked runQuery/runMutation behavior to locate and
change the stubbing and the final assertion.
In `@services/platform/convex/agent_tools/documents/document_write_tool.ts`:
- Around line 130-159: The loop that looks up file metadata (using
toId<'_storage'>(file.fileId) and
ctx.runQuery(internal.file_metadata.internal_queries.getByStorageId, ...)) can
throw for malformed LLM-produced fileId values; wrap the validation and query in
a runtime-check + try/catch: first validate file.fileId (non-empty and matches
expected storage ID pattern/format) and if invalid return the same structured
user-facing error { success: false, message: "...invalid fileId...", error:
"Invalid fileId: ..." } as other checks do; if validation passes, run the query
inside try/catch and on any query/argument error return a descriptive
invalid-fileId failure instead of letting the exception bubble out (update the
code around the for-loop and the filesMetadata.push to only run after successful
validation/query).
In `@services/platform/convex/agent_tools/documents/internal_mutations.ts`:
- Around line 74-77: The patch currently uses executedAt as a claim flag which
can leave an approval permanently marked as done if a later step fails; change
the double-execution protection to use a separate claim/status field (e.g.,
claimedAt or status: 'claimed') instead of executedAt. Specifically, in the code
that reads const approval = await ctx.db.get(args.approvalId) and then does
await ctx.db.patch(args.approvalId, { executedAt: Date.now() }), update the
guard to check for approval.claimedAt (or approval.status === 'claimed') and
patch claimedAt: Date.now() (or status:'claimed') there; ensure executedAt is
only set later in the successful completion/result-persistence path (i.e., set
executedAt via ctx.db.patch after results are persisted) so exceptions do not
leave the approval unretryable.
- Around line 94-96: Replace the throw when the approval record is missing with
a silent early return: instead of throwing after const approval = await
ctx.db.get(args.approvalId), detect if (!approval) and return immediately
(no-op) so the result updater does not fail on raced/deleted approvals; keep the
existing executedAt check (if (approval.executedAt) return) and continue
treating an absent approval as a benign condition. Use the same
ctx.db.get(args.approvalId), args.approvalId, and approval.executedAt
identifiers to locate the code to change.
- Around line 29-53: The handler should validate args.files before creating the
approval: add a guard at the start of the handler (before constructing metadata
and computing fileCount/firstTitle) to reject with a clear, user-friendly error
if args.files is empty and to detect duplicate fileId values (e.g., by checking
a Set over args.files.map(f => f.fileId)) and reject if any duplicates exist;
return/throw a descriptive error (not silently create an Id<'approvals'>) so
callers get immediate runtime validation feedback.
In `@services/platform/convex/agent_tools/files/docx_tool.ts`:
- Around line 202-203: The documentation line incorrectly instructs passing
fileStorageId directly; update the guidance to show calling document_write with
a files array containing objects { fileId: fileStorageId } and an optional
folderPath. Specifically, modify the AFTER GENERATING note in docx_tool.ts to
state: call document_write with a payload like { files: [{ fileId: <returned
fileStorageId> }], folderPath: <optional path> } so consumers know to include
fileStorageId inside the files array rather than as a top-level param.
In `@services/platform/convex/agents/file/agent.ts`:
- Around line 119-123: Update the instruction for using the pdf tool to include
the required fileName parameter: when calling pdf_tool with operation="generate"
(the pdf tool handler), include fileName=<desired PDF name>, sourceType="url",
and content=<the URL> so the handler receives fileName and avoids "Missing
required 'fileName' for generate operation" errors; update the text that
currently shows operation="generate", sourceType="url", content=<the URL> to the
full form with fileName as specified.
In `@services/platform/convex/approvals/types.ts`:
- Around line 127-143: The current normalization unconditionally synthesizes a
blank file when raw.files is empty, causing fake file entries (fileId: '') to
bypass validations; update the logic in the normalization block that checks
raw.files to only synthesize a legacy file object when at least one legacy field
is present (check raw.fileId, raw.fileName, raw.title, raw.mimeType,
raw.fileSize, raw.createdDocumentId, raw.executionError), otherwise return an
empty files array (or leave raw.files empty) instead of creating a blank file;
modify the branch that builds the files array so it only constructs the file
object when any of those legacy fields are non-null/defined.
In `@services/platform/convex/lib/agent_response/generate_response.ts`:
- Around line 1065-1072: The mutation that updates the retry message (using
ctx.runMutation(components.agent.messages.updateMessage) when didRetry &&
savedMessageId && result.text) must be best-effort and not allowed to fail the
whole action; wrap that call in a try/catch and skip/return silently on any
error, and also avoid running it if the action has been cancelled (check
ctx.signal?.aborted or the action cancellation flag before attempting the
mutation). Ensure you reference the didRetry, savedMessageId, result.text,
ctx.runMutation, and components.agent.messages.updateMessage symbols so the fix
is applied to that exact patch site.
In
`@services/platform/convex/workflow_engine/action_defs/document/document_action.ts`:
- Around line 238-286: After fetching fileMetadata (variable fileMetadata) and
before calling internal.documents.internal_mutations.createDocument (where
documentId is created), validate that fileMetadata.organizationId matches the
workflow organizationId (variable organizationId) and throw an error if they
differ; i.e., add a check comparing fileMetadata.organizationId !==
organizationId and throw a clear Error("File does not belong to this
organization.") to prevent cross-organization document creation (mirroring the
same check in document_write_tool.ts).
---
Outside diff comments:
In `@services/crawler/app/services/pdf_service.py`:
- Around line 133-147: The url_to_pdf implementation ignores caller-supplied
wait_until values by hardcoding page.goto(..., wait_until="domcontentloaded")
and only conditionally waiting for "load"; update url_to_pdf to pass the
incoming wait_until parameter into page.goto (e.g., page.goto(url,
wait_until=wait_until, timeout=timeout)) and handle special cases like "load" or
"commit"/"networkidle" the same way image_service.py does (preserve the
DownloadDetectedException handling for download-triggering errors and still
suppress PlaywrightTimeoutError when doing any extra wait_for_load_state or
equivalent). Ensure the function name url_to_pdf and the UrlToPdfRequest
semantics are preserved so callers that pass "commit" or "networkidle" receive
the intended behavior.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: 752defd2-cc60-412c-85ff-dd762d9c3d2e
⛔ Files ignored due to path filters (1)
services/platform/convex/_generated/api.d.tsis excluded by!**/_generated/**
📒 Files selected for processing (44)
services/crawler/app/exceptions.pyservices/crawler/app/routers/image.pyservices/crawler/app/routers/pdf.pyservices/crawler/app/routers/web.pyservices/crawler/app/services/base_converter.pyservices/crawler/app/services/image_service.pyservices/crawler/app/services/pdf_service.pyservices/crawler/app/utils/http_download.pyservices/platform/app/features/chat/components/chat-interface.tsxservices/platform/app/features/chat/components/chat-messages.tsxservices/platform/app/features/chat/components/document-write-approval-card.tsxservices/platform/app/features/chat/hooks/mutations.tsservices/platform/app/features/chat/hooks/queries.tsservices/platform/app/features/chat/hooks/use-merged-chat-items.tsservices/platform/app/features/custom-agents/components/test-chat-panel/test-message-list.tsxservices/platform/app/features/custom-agents/components/tool-selector.tsxservices/platform/app/features/custom-agents/hooks/use-test-chat.tsservices/platform/convex/agent_tools/documents/__tests__/document_write_tool.test.tsservices/platform/convex/agent_tools/documents/document_write_tool.tsservices/platform/convex/agent_tools/documents/internal_actions.tsservices/platform/convex/agent_tools/documents/internal_mutations.tsservices/platform/convex/agent_tools/files/docx_tool.tsservices/platform/convex/agent_tools/files/excel_tool.tsservices/platform/convex/agent_tools/files/image_tool.tsservices/platform/convex/agent_tools/files/pdf_tool.tsservices/platform/convex/agent_tools/files/pptx_tool.tsservices/platform/convex/agent_tools/files/text_tool.tsservices/platform/convex/agent_tools/tool_names.tsservices/platform/convex/agent_tools/tool_registry.tsservices/platform/convex/agents/file/agent.tsservices/platform/convex/approvals/actions.tsservices/platform/convex/approvals/helpers.tsservices/platform/convex/approvals/schema.tsservices/platform/convex/approvals/types.tsservices/platform/convex/approvals/validators.tsservices/platform/convex/custom_agents/system_defaults.tsservices/platform/convex/documents/query_documents.tsservices/platform/convex/documents/schema.tsservices/platform/convex/documents/update_document.tsservices/platform/convex/documents/update_document_internal.tsservices/platform/convex/documents/validators.tsservices/platform/convex/lib/agent_response/generate_response.tsservices/platform/convex/workflow_engine/action_defs/document/document_action.tsservices/platform/messages/en.json
| return Response( | ||
| content=file_bytes, | ||
| media_type=ct, | ||
| headers={"Content-Disposition": f"attachment; filename=screenshot.{request.options.image_type}"}, | ||
| ) |
There was a problem hiding this comment.
Use filename extension derived from actual downloaded content type.
The response media_type is based on downloaded content (ct), but the filename extension is taken from the requested screenshot type. This can return mismatched files (for example, image/gif body with .png filename).
💡 Proposed fix
return Response(
content=file_bytes,
media_type=ct,
- headers={"Content-Disposition": f"attachment; filename=screenshot.{request.options.image_type}"},
+ headers={
+ "Content-Disposition": (
+ "attachment; filename=screenshot."
+ + ("jpg" if ct == "image/jpeg" else ct.split("/", 1)[1])
+ )
+ },
)📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| return Response( | |
| content=file_bytes, | |
| media_type=ct, | |
| headers={"Content-Disposition": f"attachment; filename=screenshot.{request.options.image_type}"}, | |
| ) | |
| return Response( | |
| content=file_bytes, | |
| media_type=ct, | |
| headers={ | |
| "Content-Disposition": ( | |
| "attachment; filename=screenshot." | |
| ("jpg" if ct == "image/jpeg" else ct.split("/", 1)[1]) | |
| ) | |
| }, | |
| ) |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@services/crawler/app/routers/image.py` around lines 158 - 162, The
Content-Disposition filename currently uses request.options.image_type while
media_type uses the actual downloaded content type (ct); change the code that
builds the filename (around the Response return using file_bytes, ct, and
request.options.image_type) to derive the file extension from ct (e.g., map
common MIME types like image/png, image/jpeg, image/gif, image/webp -> png,
jpg/jpeg, gif, webp) and use that extension in the filename, falling back to
request.options.image_type if ct is unknown or unmapped, so the returned
filename extension matches the actual media_type.
| logger.info(f"URL triggers download, falling back to HTTP: {request.url}") | ||
| try: | ||
| timeout_seconds = request.timeout / 1000 | ||
| file_bytes, content_type = await download_file(str(request.url), timeout_seconds) |
There was a problem hiding this comment.
Remove unused unpacked variable from fallback download.
content_type is never used after unpacking and should be ignored explicitly to avoid lint noise/failures.
💡 Proposed fix
- file_bytes, content_type = await download_file(str(request.url), timeout_seconds)
+ file_bytes, _content_type = await download_file(str(request.url), timeout_seconds)📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| file_bytes, content_type = await download_file(str(request.url), timeout_seconds) | |
| file_bytes, _content_type = await download_file(str(request.url), timeout_seconds) |
🧰 Tools
🪛 Ruff (0.15.6)
[warning] 149-149: Unpacked variable content_type is never used
Prefix it with an underscore or any other dummy variable pattern
(RUF059)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@services/crawler/app/routers/pdf.py` at line 149, Change the unpacking of the
fallback download so the unused content_type is ignored to silence linters;
instead of assigning to content_type from the call to download_file(...) (the
line with file_bytes, content_type = await download_file(str(request.url),
timeout_seconds)), discard the second value (e.g., file_bytes, _ = await
download_file(... ) or file_bytes, _content_type = ...) so only file_bytes is
used; keep the call to download_file, request.url and timeout_seconds unchanged.
| if content_length and int(content_length) > MAX_FILE_DOWNLOAD_BYTES: | ||
| raise HTTPException( | ||
| status_code=status.HTTP_400_BAD_REQUEST, | ||
| detail=f"File too large ({int(content_length)} bytes). Maximum: {MAX_FILE_DOWNLOAD_BYTES} bytes.", | ||
| ) |
There was a problem hiding this comment.
Guard Content-Length parsing to prevent malformed-header crashes.
At Line 36, int(content_length) can throw ValueError for invalid upstream headers, causing a 500 instead of continuing with streaming-size enforcement.
Proposed fix
- if content_length and int(content_length) > MAX_FILE_DOWNLOAD_BYTES:
- raise HTTPException(
- status_code=status.HTTP_400_BAD_REQUEST,
- detail=f"File too large ({int(content_length)} bytes). Maximum: {MAX_FILE_DOWNLOAD_BYTES} bytes.",
- )
+ if content_length:
+ try:
+ parsed_content_length = int(content_length)
+ except ValueError:
+ parsed_content_length = None
+
+ if (
+ parsed_content_length is not None
+ and parsed_content_length > MAX_FILE_DOWNLOAD_BYTES
+ ):
+ raise HTTPException(
+ status_code=status.HTTP_400_BAD_REQUEST,
+ detail=f"File too large ({parsed_content_length} bytes). Maximum: {MAX_FILE_DOWNLOAD_BYTES} bytes.",
+ )📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| if content_length and int(content_length) > MAX_FILE_DOWNLOAD_BYTES: | |
| raise HTTPException( | |
| status_code=status.HTTP_400_BAD_REQUEST, | |
| detail=f"File too large ({int(content_length)} bytes). Maximum: {MAX_FILE_DOWNLOAD_BYTES} bytes.", | |
| ) | |
| if content_length: | |
| try: | |
| parsed_content_length = int(content_length) | |
| except ValueError: | |
| parsed_content_length = None | |
| if ( | |
| parsed_content_length is not None | |
| and parsed_content_length > MAX_FILE_DOWNLOAD_BYTES | |
| ): | |
| raise HTTPException( | |
| status_code=status.HTTP_400_BAD_REQUEST, | |
| detail=f"File too large ({parsed_content_length} bytes). Maximum: {MAX_FILE_DOWNLOAD_BYTES} bytes.", | |
| ) |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@services/crawler/app/utils/http_download.py` around lines 36 - 40, Guard the
Content-Length parsing so malformed headers don't raise ValueError and crash:
when checking content_length against MAX_FILE_DOWNLOAD_BYTES, parse it inside a
try/except ValueError (e.g. assign parsed_length = int(content_length) inside
try), and if parsing fails simply skip the size-based HTTPException check (or
log the malformed header) so streaming-size enforcement continues; update the
block that references content_length, int(content_length),
MAX_FILE_DOWNLOAD_BYTES, HTTPException and status.HTTP_400_BAD_REQUEST to use
the safe parsed_length variable and only raise the 400 when parsed_length is
successfully obtained and exceeds MAX_FILE_DOWNLOAD_BYTES.
| ) | ||
| chunks.append(chunk) | ||
|
|
||
| logger.debug(f"Downloaded {total} bytes from {url}") |
There was a problem hiding this comment.
Avoid logging full download URLs to prevent credential/query leakage.
At Line 53, logging the raw URL can expose signed query params or tokens in logs.
Proposed fix
+from urllib.parse import urlsplit, urlunsplit
@@
- logger.debug(f"Downloaded {total} bytes from {url}")
+ parsed = urlsplit(url)
+ safe_url = urlunsplit((parsed.scheme, parsed.netloc, parsed.path, "", ""))
+ logger.debug("Downloaded {} bytes from {}", total, safe_url)🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@services/crawler/app/utils/http_download.py` at line 53, The debug log
currently outputs the full URL (logger.debug(f"Downloaded {total} bytes from
{url}")), which can leak credentials in query strings; change this to log a
sanitized URL by parsing the url variable (e.g., using urllib.parse.urlparse)
and reconstructing only the scheme, netloc and path or replace the
query/fragment with a fixed token like "<REDACTED>", then call logger.debug with
the sanitized value (keep the same total variable). Locate the logger.debug line
that references url and replace it with the sanitized-url approach so no raw
query parameters or fragments are logged.
| {/* Batch success summary */} | ||
| {status === 'approved' && executedAt && isBatch && !executionError && ( | ||
| <HStack gap={1} className="mb-3 text-xs text-green-600"> | ||
| <CheckCircle className="size-3" /> | ||
| {t('savedSuccessfully')} | ||
| </HStack> | ||
| )} | ||
|
|
||
| {/* Single file success */} | ||
| {status === 'approved' && executedAt && !isBatch && !executionError && ( | ||
| <HStack gap={1} className="mb-3 text-xs text-green-600"> | ||
| <CheckCircle className="size-3" /> | ||
| {t('savedSuccessfully')} | ||
| </HStack> | ||
| )} | ||
|
|
||
| {/* Partial success (batch) */} | ||
| {status === 'approved' && | ||
| executedAt && | ||
| isBatch && | ||
| failedCount > 0 && | ||
| successCount > 0 && ( | ||
| <HStack gap={1} className="mb-3 text-xs text-amber-600"> | ||
| <CheckCircle className="size-3" /> | ||
| {t('savedPartially', { | ||
| success: successCount, | ||
| total: files.length, | ||
| })} | ||
| </HStack> | ||
| )} | ||
|
|
||
| {/* Execution error (single file) */} | ||
| {status === 'approved' && executionError && !isBatch && ( | ||
| <HStack | ||
| gap={1} | ||
| align="start" | ||
| className="text-destructive mb-3 text-xs wrap-break-word" | ||
| > | ||
| <XCircle className="size-3 shrink-0" /> | ||
| <Text as="span" className="min-w-0"> | ||
| {executionError} | ||
| </Text> | ||
| </HStack> | ||
| )} | ||
|
|
||
| {/* All failed (batch) */} | ||
| {status === 'approved' && | ||
| executedAt && | ||
| isBatch && | ||
| failedCount === files.length && ( | ||
| <HStack | ||
| gap={1} | ||
| align="start" | ||
| className="text-destructive mb-3 text-xs wrap-break-word" | ||
| > | ||
| <XCircle className="size-3 shrink-0" /> | ||
| <Text as="span" className="min-w-0"> | ||
| {t('allFilesFailed', { count: files.length })} | ||
| </Text> | ||
| </HStack> | ||
| )} | ||
|
|
||
| {/* Temporary UI error */} | ||
| {error && ( | ||
| <HStack | ||
| gap={1} | ||
| align="start" | ||
| className="text-destructive mb-3 text-xs wrap-break-word" | ||
| > | ||
| <XCircle className="size-3 shrink-0" /> | ||
| <Text as="span" className="min-w-0"> | ||
| {error} | ||
| </Text> | ||
| </HStack> | ||
| )} | ||
|
|
||
| {/* Action buttons */} | ||
| {isPending && ( | ||
| <ActionRow gap={2}> | ||
| <Tooltip | ||
| content={isBatch ? t('approveTooltipBatch') : t('approveTooltip')} | ||
| > | ||
| <Button | ||
| size="sm" | ||
| variant="primary" | ||
| onClick={handleApprove} | ||
| disabled={isProcessing} | ||
| className="flex-1" | ||
| > | ||
| {isApproving && <Loader2 className="mr-1 size-4 animate-spin" />} | ||
| {t('approve')} | ||
| </Button> | ||
| </Tooltip> | ||
|
|
||
| <Tooltip | ||
| content={isBatch ? t('rejectTooltipBatch') : t('rejectTooltip')} | ||
| > | ||
| <Button | ||
| size="sm" | ||
| variant="secondary" | ||
| onClick={handleReject} | ||
| disabled={isProcessing} | ||
| className="flex-1" | ||
| > | ||
| {isRejecting && <Loader2 className="mr-1 size-4 animate-spin" />} | ||
| {t('reject')} | ||
| </Button> | ||
| </Tooltip> | ||
| </ActionRow> | ||
| )} | ||
|
|
||
| {/* Resolved status */} | ||
| {!isPending && ( | ||
| <HStack justify="between" align="center" className="mt-2"> | ||
| <Text as="div" variant="caption"> | ||
| {status === 'approved' && failedCount > 0 && successCount > 0 | ||
| ? t('statusApprovedPartial') | ||
| : status === 'approved' && executionError | ||
| ? t('statusApprovedFailed') | ||
| : status === 'approved' | ||
| ? t('statusApprovedSuccess') | ||
| : t('statusRejected')} |
There was a problem hiding this comment.
Derive the success UI from per-file results, not only executionError.
The card already renders failures from file.executionError, but every "success" branch here still keys off the top-level executionError. If execution writes only file-level errors, a single-file or batch approval can still show savedSuccessfully / statusApprovedSuccess next to failed rows. Use failedCount / successCount as the source of truth for the banners and resolved footer.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In
`@services/platform/app/features/chat/components/document-write-approval-card.tsx`
around lines 186 - 307, The success banners and footer currently depend on the
top-level executionError; change their conditionals to derive success/failure
from per-file counts (failedCount and successCount) instead: update the two
"savedSuccessfully" blocks, the batch partial success block, the "All failed
(batch)" block, and the resolved footer (the ternary that shows
statusApprovedPartial/statusApprovedFailed/statusApprovedSuccess) to use
failedCount, successCount and files.length along with isBatch, executedAt and
status instead of checking executionError; keep the executionError/temporary UI
error block to display the top-level error when present and ensure single-file
execution error display still shows file.executionError where appropriate (refer
to variables executionError, failedCount, successCount, files.length, isBatch,
executedAt, status in document-write-approval-card.tsx).
|
|
||
| AFTER GENERATING: To save the file to a folder in the documents hub, call document_write with the returned fileStorageId and the desired folderPath. |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Description: Verify document_write tool parameter names match the documentation
# Check the document_write_tool.ts file for its parameter schema
ast-grep --pattern $'const $_ = z.object({
$$$
})' services/platform/convex/agent_tools/documents/document_write_tool.ts
# Also check for any validator or args definition
rg -n -A 10 "fileStorageId|folderPath" services/platform/convex/agent_tools/documents/document_write_tool.tsRepository: tale-project/tale
Length of output: 6386
Fix documentation to reflect the correct document_write API structure.
The guidance on lines 202-203 is incomplete. Users must pass the fileStorageId inside a files array with the structure { fileId: fileStorageId }, not as a direct parameter. The corrected guidance should be:
AFTER GENERATING: To save the file to the documents hub, call document_write with the files array (where each entry has fileId set to the returned fileStorageId) and the desired folderPath.
Or more explicitly show the structure: AFTER GENERATING: Call document_write({ files: [{ fileId: <returned fileStorageId> }], folderPath: <optional path> }) to save to the documents hub.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@services/platform/convex/agent_tools/files/docx_tool.ts` around lines 202 -
203, The documentation line incorrectly instructs passing fileStorageId
directly; update the guidance to show calling document_write with a files array
containing objects { fileId: fileStorageId } and an optional folderPath.
Specifically, modify the AFTER GENERATING note in docx_tool.ts to state: call
document_write with a payload like { files: [{ fileId: <returned fileStorageId>
}], folderPath: <optional path> } so consumers know to include fileStorageId
inside the files array rather than as a top-level param.
| **DOWNLOADING FILES FROM URLS** | ||
| When the user provides URLs to existing PDF files and asks to download/save them: | ||
| 1. Use pdf tool with operation="generate", sourceType="url", content=<the URL> | ||
| 2. The tool automatically detects direct PDF links and downloads the original file | ||
| 3. To save to a folder, follow up with document_write using the returned fileStorageId |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
set -euo pipefail
PDF_TOOL_FILE="$(fd -t f pdf_tool.ts | head -n 1)"
echo "Inspecting: $PDF_TOOL_FILE"
sed -n '1,260p' "$PDF_TOOL_FILE"
echo "---- schema/arg key scan ----"
rg -n -C3 --type=ts 'operation|sourceType|content|args|createTool|z\.object|v\.object' "$PDF_TOOL_FILE"
echo "---- related tests/examples ----"
rg -n -C2 --type=ts 'sourceType.*url|operation.*generate|pdf' services/platform/convex/agent_toolsRepository: tale-project/tale
Length of output: 50373
🏁 Script executed:
cat -n services/platform/convex/agents/file/agent.ts | sed -n '119,123p'Repository: tale-project/tale
Length of output: 458
Update the instruction to include the required fileName parameter.
Line 121 omits fileName, which is required by the pdf_tool handler. The tool will reject calls without it, resulting in "Missing required 'fileName' for generate operation" errors.
Fix:
Replace line 121:
1. Use pdf tool with operation="generate", sourceType="url", content=<the URL>
With:
1. Use pdf tool with operation="generate", fileName=<desired PDF name>, sourceType="url", content=<the URL>
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@services/platform/convex/agents/file/agent.ts` around lines 119 - 123, Update
the instruction for using the pdf tool to include the required fileName
parameter: when calling pdf_tool with operation="generate" (the pdf tool
handler), include fileName=<desired PDF name>, sourceType="url", and
content=<the URL> so the handler receives fileName and avoids "Missing required
'fileName' for generate operation" errors; update the text that currently shows
operation="generate", sourceType="url", content=<the URL> to the full form with
fileName as specified.
| if (raw.files?.length) return raw; | ||
| return { | ||
| files: [ | ||
| { | ||
| fileId: raw.fileId ?? '', | ||
| fileName: raw.fileName ?? '', | ||
| title: raw.title ?? '', | ||
| mimeType: raw.mimeType ?? '', | ||
| fileSize: raw.fileSize ?? 0, | ||
| createdDocumentId: raw.createdDocumentId, | ||
| executionError: raw.executionError, | ||
| }, | ||
| ], | ||
| folderPath: raw.folderPath, | ||
| requestedAt: raw.requestedAt, | ||
| executedAt: raw.executedAt, | ||
| }; |
There was a problem hiding this comment.
Do not synthesize a blank file entry during metadata normalization.
If raw.files is empty and legacy fields are absent, this creates a fake file with fileId: ''. That can bypass empty-file validation and fail later in execution with misleading per-file errors.
✅ Safer normalization logic
export function normalizeDocumentWriteMetadata(
raw: DocumentWriteMetadata,
): DocumentWriteMetadata {
if (raw.files?.length) return raw;
+ const hasLegacyFileId =
+ typeof raw.fileId === 'string' && raw.fileId.trim().length > 0;
return {
- files: [
- {
- fileId: raw.fileId ?? '',
- fileName: raw.fileName ?? '',
- title: raw.title ?? '',
- mimeType: raw.mimeType ?? '',
- fileSize: raw.fileSize ?? 0,
- createdDocumentId: raw.createdDocumentId,
- executionError: raw.executionError,
- },
- ],
+ files: hasLegacyFileId
+ ? [
+ {
+ fileId: raw.fileId!,
+ fileName: raw.fileName ?? '',
+ title: raw.title ?? '',
+ mimeType: raw.mimeType ?? '',
+ fileSize: raw.fileSize ?? 0,
+ createdDocumentId: raw.createdDocumentId,
+ executionError: raw.executionError,
+ },
+ ]
+ : [],
folderPath: raw.folderPath,
requestedAt: raw.requestedAt,
executedAt: raw.executedAt,
};
}📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| if (raw.files?.length) return raw; | |
| return { | |
| files: [ | |
| { | |
| fileId: raw.fileId ?? '', | |
| fileName: raw.fileName ?? '', | |
| title: raw.title ?? '', | |
| mimeType: raw.mimeType ?? '', | |
| fileSize: raw.fileSize ?? 0, | |
| createdDocumentId: raw.createdDocumentId, | |
| executionError: raw.executionError, | |
| }, | |
| ], | |
| folderPath: raw.folderPath, | |
| requestedAt: raw.requestedAt, | |
| executedAt: raw.executedAt, | |
| }; | |
| export function normalizeDocumentWriteMetadata( | |
| raw: DocumentWriteMetadata, | |
| ): DocumentWriteMetadata { | |
| if (raw.files?.length) return raw; | |
| const hasLegacyFileId = | |
| typeof raw.fileId === 'string' && raw.fileId.trim().length > 0; | |
| return { | |
| files: hasLegacyFileId | |
| ? [ | |
| { | |
| fileId: raw.fileId!, | |
| fileName: raw.fileName ?? '', | |
| title: raw.title ?? '', | |
| mimeType: raw.mimeType ?? '', | |
| fileSize: raw.fileSize ?? 0, | |
| createdDocumentId: raw.createdDocumentId, | |
| executionError: raw.executionError, | |
| }, | |
| ] | |
| : [], | |
| folderPath: raw.folderPath, | |
| requestedAt: raw.requestedAt, | |
| executedAt: raw.executedAt, | |
| }; | |
| } |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@services/platform/convex/approvals/types.ts` around lines 127 - 143, The
current normalization unconditionally synthesizes a blank file when raw.files is
empty, causing fake file entries (fileId: '') to bypass validations; update the
logic in the normalization block that checks raw.files to only synthesize a
legacy file object when at least one legacy field is present (check raw.fileId,
raw.fileName, raw.title, raw.mimeType, raw.fileSize, raw.createdDocumentId,
raw.executionError), otherwise return an empty files array (or leave raw.files
empty) instead of creating a blank file; modify the branch that builds the files
array so it only constructs the file object when any of those legacy fields are
non-null/defined.
| if (didRetry && savedMessageId && result.text) { | ||
| await ctx.runMutation(components.agent.messages.updateMessage, { | ||
| messageId: savedMessageId, | ||
| patch: { | ||
| message: { role: 'assistant', content: result.text }, | ||
| }, | ||
| }); | ||
| } |
There was a problem hiding this comment.
Make retry-message patch best-effort and cancellation-aware.
Line 1065 performs a required mutation before the final cancellation gate and without error isolation. A transient patch failure can fail the whole action after a valid response was generated.
🔧 Suggested hardening
- if (didRetry && savedMessageId && result.text) {
- await ctx.runMutation(components.agent.messages.updateMessage, {
- messageId: savedMessageId,
- patch: {
- message: { role: 'assistant', content: result.text },
- },
- });
- }
+ if (didRetry && savedMessageId && result.text) {
+ // Avoid patching if generation was cancelled after retry completed.
+ if (!(await checkFreshAbort())) {
+ try {
+ await ctx.runMutation(components.agent.messages.updateMessage, {
+ messageId: savedMessageId,
+ patch: {
+ message: { role: 'assistant', content: result.text },
+ },
+ });
+ } catch (patchError) {
+ console.error(
+ '[generateAgentResponse] Failed to persist retry text to saved message:',
+ patchError,
+ );
+ }
+ }
+ }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@services/platform/convex/lib/agent_response/generate_response.ts` around
lines 1065 - 1072, The mutation that updates the retry message (using
ctx.runMutation(components.agent.messages.updateMessage) when didRetry &&
savedMessageId && result.text) must be best-effort and not allowed to fail the
whole action; wrap that call in a try/catch and skip/return silently on any
error, and also avoid running it if the action has been cancelled (check
ctx.signal?.aborted or the action cancellation flag before attempting the
mutation). Ensure you reference the didRetry, savedMessageId, result.text,
ctx.runMutation, and components.agent.messages.updateMessage symbols so the fix
is applied to that exact patch site.
| const fileMetadata = await ctx.runQuery( | ||
| internal.file_metadata.internal_queries.getByStorageId, | ||
| { storageId }, | ||
| ); | ||
|
|
||
| if (!fileMetadata) { | ||
| throw new Error( | ||
| `File metadata not found for storage ID "${params.fileId}". The file may not exist.`, | ||
| ); | ||
| } | ||
|
|
||
| const docTitle = params.title ?? fileMetadata.fileName; | ||
| const organizationId = | ||
| typeof _variables.organizationId === 'string' | ||
| ? _variables.organizationId | ||
| : undefined; | ||
| const userId = | ||
| typeof _variables.userId === 'string' ? _variables.userId : undefined; | ||
|
|
||
| if (!organizationId) { | ||
| throw new Error( | ||
| 'organizationId is required in workflow variables to create a document', | ||
| ); | ||
| } | ||
|
|
||
| let folderId: string | null = null; | ||
| if (params.folderPath) { | ||
| folderId = await ctx.runMutation( | ||
| internal.folders.internal_mutations.getOrCreateFolderPath, | ||
| { | ||
| organizationId, | ||
| pathSegments: params.folderPath.split('/').filter(Boolean), | ||
| createdBy: userId, | ||
| }, | ||
| ); | ||
| } | ||
|
|
||
| const documentId = await ctx.runMutation( | ||
| internal.documents.internal_mutations.createDocument, | ||
| { | ||
| organizationId, | ||
| title: docTitle, | ||
| fileId: storageId, | ||
| mimeType: fileMetadata.contentType, | ||
| sourceProvider: 'agent', | ||
| createdBy: userId, | ||
| ...(folderId ? { folderId: toId<'folders'>(folderId) } : {}), | ||
| }, | ||
| ); |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
set -euo pipefail
echo "=== documents/internal_mutations.ts ==="
fd 'internal_mutations.ts' services/platform/convex/documents -x sed -n '1,260p' {}
echo
echo "=== ownership-related references ==="
rg -n -C3 'createDocument|organizationId|fileId|storageId|getByStorageId' \
services/platform/convex/documents \
services/platform/convex/agent_tools/documents \
services/platform/convex/workflow_engine/action_defs/documentRepository: tale-project/tale
Length of output: 50374
🏁 Script executed:
#!/bin/bash
set -euo pipefail
echo "=== create_document helper implementation ==="
fd 'create_document.ts' services/platform/convex/documents -x cat {}
echo
echo "=== file_metadata internal queries (search for getByStorageId) ==="
fd 'internal_queries.ts' services/platform/convex/file_metadata -x head -80 {}
echo
echo "=== file_metadata schema/types (to see organizationId field) ==="
rg -n 'organizationId' services/platform/convex/file_metadata --max-count=20Repository: tale-project/tale
Length of output: 2469
🏁 Script executed:
#!/bin/bash
set -euo pipefail
echo "=== document_write_tool.ts: search for ownership/org check ==="
rg -n -B5 -A5 'fileMetadata|organizationId.*file|createDocument' \
services/platform/convex/agent_tools/documents/document_write_tool.ts | head -80Repository: tale-project/tale
Length of output: 1756
Add organization ownership check for file metadata before creating document.
The code fetches file metadata but never validates that fileMetadata.organizationId matches the workflow's organizationId. The document_write_tool.ts already enforces this check (lines 144–150), preventing cross-organization file attachment. Apply the same validation here before calling createDocument:
Suggested fix location
After fetching fileMetadata (line 238–241) and before calling createDocument (line 275), add:
if (fileMetadata.organizationId !== organizationId) {
throw new Error(`File does not belong to this organization.`);
}This ensures a document in one organization cannot reference a storage file belonging to another organization.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In
`@services/platform/convex/workflow_engine/action_defs/document/document_action.ts`
around lines 238 - 286, After fetching fileMetadata (variable fileMetadata) and
before calling internal.documents.internal_mutations.createDocument (where
documentId is created), validate that fileMetadata.organizationId matches the
workflow organizationId (variable organizationId) and throw an error if they
differ; i.e., add a check comparing fileMetadata.organizationId !==
organizationId and throw a clear Error("File does not belong to this
organization.") to prevent cross-organization document creation (mirroring the
same check in document_write_tool.ts).
Raise the default agent timeout from 7 minutes to 20 minutes and make it configurable per agent (1–25 min) via a new Advanced section in the agent settings UI. Update generate_response to trust the propagated deadline directly instead of capping it at 9 minutes, and bump Convex action limits in the Dockerfile to support longer-running agents.
… back to 'system'
The approval flow already gates document writes, making the rate limit redundant. The check referenced a rate limit name that was never registered, causing a runtime error.
Summary
document_writetool that allows agents to save downloaded files as documents, gated behind an approval workflow with a dedicated approval card UIdidRetryguard to fixgenerate_responseoverwrite bugTest plan
document_writetool and approval card renders correctlygenerate_responseno longer overwrites full message content on non-retry generationsdocument_write_tool.test.tsunit tests🤖 Generated with Claude Code
Summary by CodeRabbit
Release Notes