Skip to content

feat(platform): configurable file retention policies#1391

Merged
larryro merged 3 commits into
mainfrom
feat/configurable-file-retention-policies
Apr 11, 2026
Merged

feat(platform): configurable file retention policies#1391
larryro merged 3 commits into
mainfrom
feat/configurable-file-retention-policies

Conversation

@larryro

@larryro larryro commented Apr 11, 2026

Copy link
Copy Markdown
Collaborator

Summary

  • Add three independently configurable file retention categories: document retention, user temporary file cleanup, and agent temporary file cleanup
  • Each category has its own enable/disable toggle and retention period, configured via Governance > Retention settings tab
  • Cleanup is triggered on file upload with rate-limited scheduling (1/hour/org) using existing rate-limiter infrastructure
  • Add source field to fileMetadata to distinguish user vs agent file origins
  • Set DeepSeek V3.2 as default chat model for OpenRouter provider

Test plan

  • Verify retention editor UI renders in Governance settings tab
  • Test enabling/disabling each retention category independently
  • Confirm file uploads trigger rate-limited cleanup scheduling
  • Verify documents older than retention period are cleaned up
  • Check source field is correctly set for user and agent file uploads
  • Validate OpenRouter default model change works in chat agent

Summary by CodeRabbit

  • New Features

    • Added Retention tab to governance settings with configurable policies for document and temporary file retention, including separate controls for user and agent-generated temporary files with customizable retention periods and enable/disable toggles.
  • Chores

    • Updated default model selection in example configurations and enhanced internal file metadata tracking to record source attribution.

@greptile-apps greptile-apps Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Your free trial has ended. If you'd like to continue receiving code reviews, you can add a payment method here.

larryro added 2 commits April 11, 2026 09:41
Add three independently configurable retention categories:
- Document retention: auto-delete documents exceeding a retention period
- User temporary file cleanup: delete orphaned user-uploaded files
- Agent temporary file cleanup: delete orphaned agent-generated files

Each category has its own enable/disable toggle and retention period,
configured via the Governance > Retention settings tab. Cleanup is
triggered on file upload with rate-limited scheduling (1/hour/org)
using the existing @convex-dev/rate-limiter infrastructure.

Also adds a `source` field to fileMetadata to distinguish user vs
agent file origins, and fixes the previously unused scope filter in
listExpiredDocuments.
Prioritize DeepSeek V3.2 over Kimi K2.5 as the default chat model
in the OpenRouter provider config and chat agent supported models list.
@larryro larryro force-pushed the feat/configurable-file-retention-policies branch from 13131fb to fa05d2f Compare April 11, 2026 01:42
@coderabbitai

coderabbitai Bot commented Apr 11, 2026

Copy link
Copy Markdown
Contributor
📝 Walkthrough

Walkthrough

This pull request introduces a file source tracking system and enhances retention policy management across the platform. It adds a source field ('user' | 'agent') to file metadata persistence, tracked throughout file upload paths including chat uploads, document generation, OneDrive imports, and integration sandbox operations. The retention policy configuration is restructured to support separate temporary file retention rules for user and agent sources, replacing a prior scope-based model. A new governance UI component enables configuration of document and temporary file retention intervals. Supporting infrastructure includes database schema updates with a new index, rate limiting for cleanup operations, and expanded retention cleanup logic to handle source-specific temporary file deletion.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Summary

Key Areas of Change:

  • Retention Policy Configuration: Schema refactored from scope-based ('all'|'upload'|'agent') to source-based with separate user/agent temporary retention hours
  • File Metadata Tracking: source field added consistently across 15+ files (agent tools, document generators, import handlers, upload flows)
  • Governance UI: New RetentionEditor component with form controls for enabling/disabling and configuring retention intervals
  • Database Layer: Added source field to fileMetadata schema and new composite index by_organizationId_and_source_and_documentId
  • Retention Cleanup Logic: Expanded retention_cleanup.ts with separate workflows for document retention, user temporary file retention, and agent temporary file retention; added new listExpiredTempFiles query and deleteExpiredTempFile mutation
  • Rate Limiting: New 'cleanup:retention' rule (1 request/hour) to throttle retention maintenance operations
  • Test Infrastructure: Extended mocks to support rate limiter and scheduler in test contexts

Heterogeneity: While many files follow repetitive patterns (adding source field), the changes span distinct domains—UI, persistence, cleanup logic, and configuration—requiring separate reasoning for governance refactoring and retention cleanup workflow understanding.

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 37.50% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately summarizes the main feature: adding configurable file retention policies. It directly reflects the PR's core objective of introducing retention policy configuration options.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch feat/configurable-file-retention-policies

Comment @coderabbitai help to get the list of available commands and usage tips.

The retention cleanup rate limiting added to saveFileMetadata calls
checkOrganizationRateLimit which requires ctx.runMutation internally.
Mock the rate limiter helpers and add scheduler to the test context.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 8

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@services/platform/app/features/chat/hooks/use-convex-file-upload.ts`:
- Line 198: Replace the unnecessary type assertion on the `source` field in
use-convex-file-upload.ts: locate the object literal where `source: 'user' as
const` is set (inside the Convex saveFileMetadata call) and remove the `as
const` so it becomes `source: 'user'`; this matches the Convex schema
`v.optional(v.union(v.literal('user'), v.literal('agent')))` and avoids an
unneeded assertion.

In
`@services/platform/app/features/settings/governance/components/retention-editor.tsx`:
- Around line 107-109: The component returns null while isLoading which causes
layout shift; update the loading branch in the RetentionEditor component to
render a consistent placeholder by returning a skeleton component (e.g.,
RetentionEditorSkeleton) instead of null, import or create
RetentionEditorSkeleton if missing, and ensure the skeleton matches the editor's
dimensions so UI layout remains stable while isLoading is true.
- Around line 113-146: Replace hardcoded user-facing strings in the retention
editor with the translation hook: import and call the t() function and replace
literals used in PageSection title/description ("Document Retention", the long
description), the Switch label ("Enabled"), the Input label ("Retention Days"),
and the Text hint ("Documents older than this will be deleted.") with t('...')
keys; update the PageSection, Switch, Input, and Text usages to pass translated
strings (e.g., title={t('...')}, description={t('...')}, label={t('...')}) and
apply the same change to the other two PageSection blocks so all visible text
uses t() keys consistently.

In `@services/platform/app/routes/dashboard/`$id/settings/governance.tsx:
- Around line 58-62: The tab label is hardcoded as "Retention"; replace it with
a localized string by calling the project's translation function/hook used in
this module (e.g., useTranslations or t) and pass a key like
"governance.tabs.retention"; update the tab entry where value: 'retention' and
content: <RetentionEditor organizationId={organizationId} /> to use the
translated label instead of the literal "Retention" so the UI respects
localization.

In `@services/platform/convex/file_metadata/__tests__/internal_mutations.test.ts`:
- Around line 29-38: Tests lack assertions for the new rate-limit and scheduler
branches: add expectations that the mocked checkOrganizationRateLimit (from
'../../lib/rate_limiter/helpers') is called with the correct org/context
arguments and that ctx.scheduler.runAfter is invoked with the expected delay and
payload when scheduling occurs; additionally add two error-path tests using the
mocked RateLimitExceededError to assert that this specific error is suppressed
(no throw, and scheduling/rate-limit behavior as expected) while a
non-rate-limit error thrown from checkOrganizationRateLimit still propagates
(test should expect a thrown error). Reference the mocked symbols
checkOrganizationRateLimit, RateLimitExceededError and ctx.scheduler.runAfter in
your assertions and create separate test cases for the suppressed-rate-limit
branch and the non-rate-limit error branch.

In `@services/platform/convex/file_metadata/schema.ts`:
- Around line 8-20: Legacy fileMetadata rows with source undefined are skipped
by the cleanup because listExpiredTempFiles filters with .eq('source',
args.source) expecting 'user'|'agent'; update the cleanup and schema handling by
either adding a migration to backfill missing source values in fileMetadata (set
a sensible default like 'user' or infer from existing fields) and/or make
listExpiredTempFiles tolerate undefined by changing the query to include rows
where source is null/undefined (e.g., OR the predicate to match args.source OR
source IS_NULL/UNDEFINED) so legacy rows are returned; locate the query named
listExpiredTempFiles and the fileMetadata schema (fields: source, fileName,
contentType, size and indices by_organizationId, by_storageId,
by_organizationId_and_documentId, by_organizationId_and_source_and_documentId)
and implement the migration or broadened filter accordingly.

In `@services/platform/convex/governance/internal_mutations_retention.ts`:
- Around line 38-44: In deleteExpiredTempFile, after retrieving metadata via
ctx.db.get(args.fileMetadataId) add a defensive guard that only calls
deleteStorageWithMetadata when metadata.documentId is undefined; if
metadata.documentId is set, return null (or skip deletion) to prevent deleting
document-backed storage—i.e., check metadata.documentId before invoking
deleteStorageWithMetadata and short-circuit when it exists.

In `@services/platform/convex/governance/retention_cleanup.ts`:
- Around line 85-147: The two nearly identical blocks for user and agent temp
file cleanup should be extracted into a shared helper to remove duplication; add
a function like async cleanupTempFiles(ctx: ActionCtx, ragUrl: string,
organizationId: string, source: 'user' | 'agent', hours: number, batchSize:
number) that encapsulates calling
internal.governance.internal_queries.listExpiredTempFiles, iterating
expiredFiles, deleting the rag document via
fetch(`${ragUrl}/api/v1/documents/${encodeURIComponent(file.storageId)}`, {
method: 'DELETE', signal: AbortSignal.timeout(30000) }) with the same try/catch
logging, and calling
internal.governance.internal_mutations_retention.deleteExpiredTempFile for each
file, then replace both blocks to call cleanupTempFiles(...) using
config.userTempRetentionHours/config.agentTempRetentionHours and the existing
flags config.userTempEnabled/config.agentTempEnabled.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: dc0a14aa-2d71-4b97-8953-015cc55d4a5b

📥 Commits

Reviewing files that changed from the base of the PR and between 2051970 and d15669e.

📒 Files selected for processing (29)
  • examples/agents/chat-agent.json
  • examples/providers/openrouter.json
  • services/platform/app/features/chat/hooks/use-convex-file-upload.ts
  • services/platform/app/features/settings/governance/components/retention-editor.tsx
  • services/platform/app/routes/dashboard/$id/settings/governance.tsx
  • services/platform/convex/agent_tools/files/excel_tool.ts
  • services/platform/convex/agent_tools/files/text_tool.ts
  • services/platform/convex/agents/webhooks/http_actions.ts
  • services/platform/convex/documents/generate_document.ts
  • services/platform/convex/documents/generate_docx.ts
  • services/platform/convex/documents/generate_docx_from_template.ts
  • services/platform/convex/documents/generate_pptx.ts
  • services/platform/convex/documents/upload_base64_to_storage.ts
  • services/platform/convex/file_metadata/__tests__/internal_mutations.test.ts
  • services/platform/convex/file_metadata/__tests__/mutations.test.ts
  • services/platform/convex/file_metadata/internal_mutations.ts
  • services/platform/convex/file_metadata/mutations.ts
  • services/platform/convex/file_metadata/schema.ts
  • services/platform/convex/governance/internal_mutations_retention.ts
  • services/platform/convex/governance/internal_queries.ts
  • services/platform/convex/governance/retention_cleanup.ts
  • services/platform/convex/lib/rate_limiter/index.ts
  • services/platform/convex/node_only/integration_sandbox/helpers/create_convex_storage_provider.ts
  • services/platform/convex/onedrive/import_files_deps.ts
  • services/platform/convex/onedrive/internal_actions.ts
  • services/platform/convex/onedrive/upload_and_create_document_deps.ts
  • services/platform/convex/workflow_engine/action_defs/document/helpers/apply_docx_structured.ts
  • services/platform/lib/shared/schemas/governance.ts
  • services/platform/messages/en.json

fileName: fileToUpload.name,
contentType: resolvedType || 'application/octet-stream',
size: fileToUpload.size,
source: 'user' as const,

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

# First, let's find and read the file
fd -t f "use-convex-file-upload.ts" services/

Repository: tale-project/tale

Length of output: 128


🏁 Script executed:

# Read the file with context around line 198
cat -n services/platform/app/features/chat/hooks/use-convex-file-upload.ts | sed -n '190,210p'

Repository: tale-project/tale

Length of output: 909


🏁 Script executed:

# Get more context to see the full function/mutation call
cat -n services/platform/app/features/chat/hooks/use-convex-file-upload.ts | sed -n '170,220p'

Repository: tale-project/tale

Length of output: 2032


🏁 Script executed:

# Find the saveFileMetadata function definition
rg -t ts "saveFileMetadata" services/platform --max-count 10 -B 2

Repository: tale-project/tale

Length of output: 9817


🏁 Script executed:

# Search for the import of saveFileMetadata in the file
cat services/platform/app/features/chat/hooks/use-convex-file-upload.ts | head -50

Repository: tale-project/tale

Length of output: 1584


🏁 Script executed:

# Find the mutations.ts file for file_metadata
cat -n services/platform/convex/file_metadata/mutations.ts | head -100

Repository: tale-project/tale

Length of output: 2729


Remove the unnecessary type assertion on source.

Line 198: The Convex schema for saveFileMetadata explicitly defines source: v.optional(v.union(v.literal('user'), v.literal('agent'))), which means the literal string 'user' is already the correct type. Remove as const and use source: 'user' directly.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@services/platform/app/features/chat/hooks/use-convex-file-upload.ts` at line
198, Replace the unnecessary type assertion on the `source` field in
use-convex-file-upload.ts: locate the object literal where `source: 'user' as
const` is set (inside the Convex saveFileMetadata call) and remove the `as
const` so it becomes `source: 'user'`; this matches the Convex schema
`v.optional(v.union(v.literal('user'), v.literal('agent')))` and avoids an
unneeded assertion.

Comment on lines +107 to +109
if (isLoading) {
return null;
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick | 🔵 Trivial

Consider adding a loading skeleton.

Returning null during loading may cause layout shift. A skeleton component would provide a smoother user experience.

if (isLoading) {
  return <RetentionEditorSkeleton />;
}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@services/platform/app/features/settings/governance/components/retention-editor.tsx`
around lines 107 - 109, The component returns null while isLoading which causes
layout shift; update the loading branch in the RetentionEditor component to
render a consistent placeholder by returning a skeleton component (e.g.,
RetentionEditorSkeleton) instead of null, import or create
RetentionEditorSkeleton if missing, and ensure the skeleton matches the editor's
dimensions so UI layout remains stable while isLoading is true.

Comment on lines +113 to +146
<PageSection
title="Document Retention"
description="Automatically delete documents that exceed the retention period. Use with caution — deleted documents cannot be recovered."
action={
<Switch
label="Enabled"
checked={enabled}
onCheckedChange={(checked) => {
setEnabled(checked);
void saveConfig({ enabled: checked });
}}
disabled={cannotManage || upsertMutation.isPending}
/>
}
>
<div className="max-w-xs">
<Input
label="Retention Days"
type="number"
value={retentionDays}
onChange={(e) =>
setRetentionDays(e.target.value ? Number(e.target.value) : 0)
}
onBlur={() => void saveConfig({ retentionDays })}
disabled={cannotManage || !enabled}
size="sm"
placeholder="e.g. 90"
min={0}
/>
<Text className="text-muted-foreground mt-1 text-xs">
Documents older than this will be deleted.
</Text>
</div>
</PageSection>

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Hardcoded English strings should use translation hooks.

Multiple UI strings are hardcoded instead of using the t() function:

  • "Document Retention" (line 114)
  • "Automatically delete documents..." (line 115)
  • "Enabled" (line 118)
  • "Retention Days" (line 130)
  • "Documents older than this..." (line 143)

This applies to all three PageSection blocks. As per coding guidelines, use translation hooks for user-facing UI text.

Example fix for Document Retention section
       <PageSection
-        title="Document Retention"
-        description="Automatically delete documents that exceed the retention period. Use with caution — deleted documents cannot be recovered."
+        title={t('retentionPolicy.documentRetention.title')}
+        description={t('retentionPolicy.documentRetention.description')}
         action={
           <Switch
-            label="Enabled"
+            label={t('retentionPolicy.enabled')}
             checked={enabled}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@services/platform/app/features/settings/governance/components/retention-editor.tsx`
around lines 113 - 146, Replace hardcoded user-facing strings in the retention
editor with the translation hook: import and call the t() function and replace
literals used in PageSection title/description ("Document Retention", the long
description), the Switch label ("Enabled"), the Input label ("Retention Days"),
and the Text hint ("Documents older than this will be deleted.") with t('...')
keys; update the PageSection, Switch, Input, and Text usages to pass translated
strings (e.g., title={t('...')}, description={t('...')}, label={t('...')}) and
apply the same change to the other two PageSection blocks so all visible text
uses t() keys consistently.

Comment on lines +58 to +62
{
value: 'retention',
label: 'Retention',
content: <RetentionEditor organizationId={organizationId} />,
},

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Localize the new tab label instead of hardcoding English.

Line 60 introduces a hardcoded user-facing label ("Retention"). Use translation keys here to keep locale coverage consistent.

♻️ Suggested fix
-  const { t } = useT('accessDenied');
+  const { t } = useT('accessDenied');
+  const { t: tGovernance } = useT('governance');
@@
       {
         value: 'retention',
-        label: 'Retention',
+        label: tGovernance('retentionPolicy.title'),
         content: <RetentionEditor organizationId={organizationId} />,
       },
@@
-    [organizationId],
+    [organizationId, tGovernance],

As per coding guidelines, "Do NOT hardcode text, use the translation hooks/functions instead for user-facing UI".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@services/platform/app/routes/dashboard/`$id/settings/governance.tsx around
lines 58 - 62, The tab label is hardcoded as "Retention"; replace it with a
localized string by calling the project's translation function/hook used in this
module (e.g., useTranslations or t) and pass a key like
"governance.tabs.retention"; update the tab entry where value: 'retention' and
content: <RetentionEditor organizationId={organizationId} /> to use the
translated label instead of the literal "Retention" so the UI respects
localization.

Comment on lines +29 to +38
vi.mock('../../lib/rate_limiter/helpers', () => ({
checkOrganizationRateLimit: vi.fn(),
RateLimitExceededError: class extends Error {},
}));

vi.mock('../../_generated/api', () => ({
internal: {
governance: { retention_cleanup: { runRetentionCleanup: 'mock' } },
},
}));

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Add assertions for the new rate-limit and scheduling branches.
The test setup was updated, but there are no assertions that checkOrganizationRateLimit and ctx.scheduler.runAfter are invoked correctly, nor tests for RateLimitExceededError suppression vs non-rate-limit error propagation.

Suggested test additions
+ it('schedules retention cleanup when under the limit', async () => {
+   // assert checkOrganizationRateLimit(ctx, 'cleanup:retention', organizationId)
+   // assert ctx.scheduler.runAfter(0, internal.governance.retention_cleanup.runRetentionCleanup, {})
+ });
+
+ it('swallows RateLimitExceededError and still succeeds', async () => {
+   // mock checkOrganizationRateLimit to throw RateLimitExceededError
+   // assert mutation returns inserted/updated id and does not throw
+ });
+
+ it('rethrows non-rate-limit failures from scheduling path', async () => {
+   // mock runAfter to throw Error('boom')
+   // assert mutation rejects
+ });

As per coding guidelines, "Tests should cover happy paths, edge cases, and error conditions".

Also applies to: 52-54

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@services/platform/convex/file_metadata/__tests__/internal_mutations.test.ts`
around lines 29 - 38, Tests lack assertions for the new rate-limit and scheduler
branches: add expectations that the mocked checkOrganizationRateLimit (from
'../../lib/rate_limiter/helpers') is called with the correct org/context
arguments and that ctx.scheduler.runAfter is invoked with the expected delay and
payload when scheduling occurs; additionally add two error-path tests using the
mocked RateLimitExceededError to assert that this specific error is suppressed
(no throw, and scheduling/rate-limit behavior as expected) while a
non-rate-limit error thrown from checkOrganizationRateLimit still propagates
(test should expect a thrown error). Reference the mocked symbols
checkOrganizationRateLimit, RateLimitExceededError and ctx.scheduler.runAfter in
your assertions and create separate test cases for the suppressed-rate-limit
branch and the non-rate-limit error branch.

Comment on lines +8 to +20
source: v.optional(v.union(v.literal('user'), v.literal('agent'))),
fileName: v.string(),
contentType: v.string(),
size: v.number(),
})
.index('by_organizationId', ['organizationId'])
.index('by_storageId', ['storageId'])
.index('by_organizationId_and_documentId', ['organizationId', 'documentId']);
.index('by_organizationId_and_documentId', ['organizationId', 'documentId'])
.index('by_organizationId_and_source_and_documentId', [
'organizationId',
'source',
'documentId',
]);

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "Inspect source-scoped cleanup query usage:"
rg -n "listExpiredTempFiles|by_organizationId_and_source_and_documentId|source: 'user'|source: 'agent'|documentId" \
  services/platform/convex/governance/internal_queries.ts \
  services/platform/convex/governance/retention_cleanup.ts \
  services/platform/convex/file_metadata/schema.ts

echo
echo "Search for backfill/migration logic setting fileMetadata.source:"
rg -n "fileMetadata.*source|backfill|migration|migrate" services/platform/convex

Repository: tale-project/tale

Length of output: 14181


🏁 Script executed:

cat -n services/platform/convex/governance/internal_queries.ts | sed -n '53,75p'

Repository: tale-project/tale

Length of output: 943


🏁 Script executed:

cat -n services/platform/convex/governance/retention_cleanup.ts | sed -n '85,130p'

Repository: tale-project/tale

Length of output: 1984


Legacy rows without source will not be cleaned up by temporary file retention logic.

The listExpiredTempFiles query filters by .eq('source', args.source) where source is strictly 'user' or 'agent'. Existing fileMetadata rows with source: undefined (created before this schema change) cannot match either cleanup call and will be skipped. No migration backfills the source field, and no fallback logic handles undefined source, causing legacy temporary files to accumulate indefinitely.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@services/platform/convex/file_metadata/schema.ts` around lines 8 - 20, Legacy
fileMetadata rows with source undefined are skipped by the cleanup because
listExpiredTempFiles filters with .eq('source', args.source) expecting
'user'|'agent'; update the cleanup and schema handling by either adding a
migration to backfill missing source values in fileMetadata (set a sensible
default like 'user' or infer from existing fields) and/or make
listExpiredTempFiles tolerate undefined by changing the query to include rows
where source is null/undefined (e.g., OR the predicate to match args.source OR
source IS_NULL/UNDEFINED) so legacy rows are returned; locate the query named
listExpiredTempFiles and the fileMetadata schema (fields: source, fileName,
contentType, size and indices by_organizationId, by_storageId,
by_organizationId_and_documentId, by_organizationId_and_source_and_documentId)
and implement the migration or broadened filter accordingly.

Comment on lines +38 to +44
const metadata = await ctx.db.get(args.fileMetadataId);
if (!metadata) {
return null;
}

await deleteStorageWithMetadata(ctx, metadata.storageId);
return null;

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion | 🟠 Major

Add a defensive invariant to avoid accidental deletion of document-backed files.
deleteExpiredTempFile currently deletes storage for any fileMetadataId. Guarding metadata.documentId === undefined prevents accidental misuse from deleting non-temp document storage.

Suggested hardening
   const metadata = await ctx.db.get(args.fileMetadataId);
   if (!metadata) {
     return null;
   }
+  if (metadata.documentId !== undefined) {
+    return null;
+  }

   await deleteStorageWithMetadata(ctx, metadata.storageId);
   return null;
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
const metadata = await ctx.db.get(args.fileMetadataId);
if (!metadata) {
return null;
}
await deleteStorageWithMetadata(ctx, metadata.storageId);
return null;
const metadata = await ctx.db.get(args.fileMetadataId);
if (!metadata) {
return null;
}
if (metadata.documentId !== undefined) {
return null;
}
await deleteStorageWithMetadata(ctx, metadata.storageId);
return null;
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@services/platform/convex/governance/internal_mutations_retention.ts` around
lines 38 - 44, In deleteExpiredTempFile, after retrieving metadata via
ctx.db.get(args.fileMetadataId) add a defensive guard that only calls
deleteStorageWithMetadata when metadata.documentId is undefined; if
metadata.documentId is set, return null (or skip deletion) to prevent deleting
document-backed storage—i.e., check metadata.documentId before invoking
deleteStorageWithMetadata and short-circuit when it exists.

Comment on lines +85 to 147
// 2. User temporary file cleanup
if (config.userTempEnabled) {
const hours =
config.userTempRetentionHours ?? DEFAULT_TEMP_RETENTION_HOURS;
const cutoffMs = Date.now() - hours * 60 * 60 * 1000;

for (const doc of expiredDocs) {
if (doc.fileId) {
const expiredFiles = await ctx.runQuery(
internal.governance.internal_queries.listExpiredTempFiles,
{ organizationId, source: 'user', cutoffMs, batchSize },
);

for (const file of expiredFiles) {
try {
await fetch(
`${ragUrl}/api/v1/documents/${encodeURIComponent(doc.fileId)}`,
`${ragUrl}/api/v1/documents/${encodeURIComponent(file.storageId)}`,
{ method: 'DELETE', signal: AbortSignal.timeout(30000) },
);
} catch (error) {
console.warn(
`[RetentionCleanup] Failed to delete RAG entry for document ${doc._id}:`,
`[RetentionCleanup] Failed to delete RAG entry for temp file ${file._id}:`,
error,
);
}

await ctx.runMutation(
internal.governance.internal_mutations_retention
.deleteExpiredTempFile,
{ fileMetadataId: file._id },
);
}
}

await ctx.runMutation(
internal.governance.internal_mutations_retention
.deleteExpiredDocument,
{ documentId: doc._id },
// 3. Agent temporary file cleanup
if (config.agentTempEnabled) {
const hours =
config.agentTempRetentionHours ?? DEFAULT_TEMP_RETENTION_HOURS;
const cutoffMs = Date.now() - hours * 60 * 60 * 1000;

const expiredFiles = await ctx.runQuery(
internal.governance.internal_queries.listExpiredTempFiles,
{ organizationId, source: 'agent', cutoffMs, batchSize },
);

for (const file of expiredFiles) {
try {
await fetch(
`${ragUrl}/api/v1/documents/${encodeURIComponent(file.storageId)}`,
{ method: 'DELETE', signal: AbortSignal.timeout(30000) },
);
} catch (error) {
console.warn(
`[RetentionCleanup] Failed to delete RAG entry for temp file ${file._id}:`,
error,
);
}

await ctx.runMutation(
internal.governance.internal_mutations_retention
.deleteExpiredTempFile,
{ fileMetadataId: file._id },
);
}
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick | 🔵 Trivial

Consider extracting temp file cleanup into a helper.

The user and agent temp file cleanup blocks are nearly identical. A helper function could reduce duplication:

async function cleanupTempFiles(
  ctx: ActionCtx,
  ragUrl: string,
  organizationId: string,
  source: 'user' | 'agent',
  hours: number,
  batchSize: number,
) { /* shared logic */ }

However, the current explicit structure is readable and acceptable for a cleanup action.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@services/platform/convex/governance/retention_cleanup.ts` around lines 85 -
147, The two nearly identical blocks for user and agent temp file cleanup should
be extracted into a shared helper to remove duplication; add a function like
async cleanupTempFiles(ctx: ActionCtx, ragUrl: string, organizationId: string,
source: 'user' | 'agent', hours: number, batchSize: number) that encapsulates
calling internal.governance.internal_queries.listExpiredTempFiles, iterating
expiredFiles, deleting the rag document via
fetch(`${ragUrl}/api/v1/documents/${encodeURIComponent(file.storageId)}`, {
method: 'DELETE', signal: AbortSignal.timeout(30000) }) with the same try/catch
logging, and calling
internal.governance.internal_mutations_retention.deleteExpiredTempFile for each
file, then replace both blocks to call cleanupTempFiles(...) using
config.userTempRetentionHours/config.agentTempRetentionHours and the existing
flags config.userTempEnabled/config.agentTempEnabled.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant