optimize rag action by larryro · Pull Request #4 · tale-project/tale

larryro · 2025-12-03T10:26:24Z

Summary by CodeRabbit

Release Notes

Improvements
- Updated default scheduling for Document RAG Sync, Product RAG Sync, Website Pages RAG Sync, and Customer RAG Sync from hourly to every 20 minutes
- Standardized workflow processing terminology: renamed document-related fields to record-based identifiers for consistency across workflow execution and processing APIs
- Simplified RAG upload action parameters—now accepts recordId instead of multiple document fields
- Enhanced workflow execution tracking with root workflow definition reference

_{✏️ Tip: You can customize this high-level summary in your review settings.}

…ories - Move action helper files and types into dedicated /helpers subdirectories across multiple workflow action modules: - approval/ - conversation/ - crawler/ - imap/ - integration/ - rag/ - website/ - websitePages/ - workflow/ - workflow_processing_records/ - Simplify RAG action interface: - Replace documentId/organizationId params with single recordId - Remove forceReupload and includeMetadata options - Remove explicit timeout parameters from workflow definitions - Update upload_text and upload_document operations accordingly - Update predefined workflow definitions to use new recordId parameter: - customer_rag_sync.ts - document_rag_sync.ts - product_rag_sync.ts - website_pages_rag_sync.ts - Update all import paths across IMAP library files and action modules - Remove obsolete README documentation files (imap, product, rag) - Regenerate Convex API type definitions

- Rename `documentId` to `recordId` across workflow processing records - Rename `documentCreationTime` to `recordCreationTime` - Rename `workflowId` to `wfDefinitionId` for consistency with schema - Add `rootWfDefinitionId` to wfExecutions for tracking workflow family versions - Update schema indexes: `by_document` -> `by_record`, `by_org_table_workflow_*` -> `by_org_table_wfDefinition_*` - Update predefined workflows to use `{{rootWfDefinitionId}}` instead of `{{workflowId}}` - Change default RAG sync schedules from hourly to every 20 minutes - Remove unused `create_execution.ts` and related types - Rename helper function `isDocumentProcessed` to `isRecordProcessed`

coderabbitai · 2025-12-03T10:31:00Z

Caution

Review failed

The pull request is closed.

📝 Walkthrough

Walkthrough

This PR systematically refactors the workflow processing and execution system by renaming fields from document-centric to record-centric terminology (documentId → recordId, documentCreationTime → recordCreationTime), renaming workflowId to wfDefinitionId, and introducing rootWfDefinitionId to track workflow versions. It reorganizes helper modules into ./helpers/ subdirectories across multiple action types, refactors RAG upload APIs from multi-parameter to object-based signatures, removes the createExecution function, updates database schema indexes to reflect record terminology, and adjusts default RAG sync workflow schedules from hourly to 20 minutes. Import paths throughout are updated to accommodate the new directory structure.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~75 minutes

Specific areas requiring attention:

RAG action refactoring (rag_action.ts, upload_file_direct.ts, upload_text_document.ts): Signature changes from multi-parameter to object-based API, metadata field restructuring, and recordId propagation logic need verification across all call sites
Schema index renames (schema.ts): Verify all index reference updates (by_org_table_workflow → by_org_table_wfDefinition, by_document → by_record) are correctly reflected throughout model files and query builders
Workflow execution changes (start_workflow_handler.ts, initialize_execution_variables.ts): rootWfDefinitionId introduction and propagation into execution payload and variables must be traced through the entire workflow lifecycle
Predefined workflows (10+ files): Consistent parameter updates across all workflow definitions—verify wfDefinitionId/recordId substitutions match the new action reference signatures
Helper module relocations: Multiple files moved to helpers/ subdirectories with corresponding import path updates—verify all references are correctly updated and no circular dependencies introduced
Removed function impacts (createExecution removal): Verify all callers have been updated and the execution creation logic has been properly relocated/refactored

Possibly related PRs

tale-project/poc2#354: Directly modifies the same ACTION_REFERENCE metadata for record_processed operation with matching field renamings (documentId/workflowId → recordId/wfDefinitionId)
tale-project/poc2#447: Parallel coordinated changes to workflow identifiers and APIs, updating workflow start/processing helpers with wfDefinitionId and record terminology throughout
tale-project/poc2#393: Overlapping RAG upload API refactoring with identical parameter shape changes (documentId → recordId) and uploadTextDocument/uploadFileDirect signature updates

📜 Recent review details

Configuration used: CodeRabbit UI

Review profile: ASSERTIVE

Plan: Pro (Legacy)

📥 Commits

Reviewing files that changed from the base of the PR and between 3c13944 and e2378b4.

⛔ Files ignored due to path filters (1)

services/platform/convex/_generated/api.d.ts is excluded by !**/_generated/**

📒 Files selected for processing (97)

services/platform/convex/agent_tools/convex_tools/workflows/action_reference.ts (1 hunks)
services/platform/convex/lib/create_workflow_agent.ts (1 hunks)
services/platform/convex/lib/variables/jexl_instance.ts (1 hunks)
services/platform/convex/model/organizations/save_default_workflows.ts (3 hunks)
services/platform/convex/model/wf_executions/create_execution.ts (0 hunks)
services/platform/convex/model/wf_executions/index.ts (0 hunks)
services/platform/convex/model/wf_executions/types.ts (0 hunks)
services/platform/convex/model/workflow_processing_records/find_product_recommendation_by_status.ts (3 hunks)
services/platform/convex/model/workflow_processing_records/find_unprocessed.ts (3 hunks)
services/platform/convex/model/workflow_processing_records/find_unprocessed_open_conversation.ts (3 hunks)
services/platform/convex/model/workflow_processing_records/helpers/find_unprocessed_with_custom_query.ts (3 hunks)
services/platform/convex/model/workflow_processing_records/helpers/get_latest_processed_creation_time.ts (3 hunks)
services/platform/convex/model/workflow_processing_records/helpers/is_document_processed.ts (3 hunks)
services/platform/convex/model/workflow_processing_records/helpers/run_query.ts (3 hunks)
services/platform/convex/model/workflow_processing_records/index.ts (1 hunks)
services/platform/convex/model/workflow_processing_records/record_processed.ts (3 hunks)
services/platform/convex/model/workflow_processing_records/types.ts (1 hunks)
services/platform/convex/node_only/imap/lib/build_email_message.ts (1 hunks)
services/platform/convex/node_only/imap/lib/collect_thread_message_ids.ts (1 hunks)
services/platform/convex/node_only/imap/lib/fetch_and_parse_message.ts (1 hunks)
services/platform/convex/node_only/imap/lib/fetch_email_by_uid.ts (1 hunks)
services/platform/convex/node_only/imap/lib/fetch_messages_from_search_results.ts (1 hunks)
services/platform/convex/node_only/imap/lib/find_root_message.ts (1 hunks)
services/platform/convex/node_only/imap/lib/search_thread_messages.ts (1 hunks)
services/platform/convex/node_only/imap/retrieve_imap_emails.ts (1 hunks)
services/platform/convex/predefined_workflows/conversation_auto_reply.ts (2 hunks)
services/platform/convex/predefined_workflows/customer_rag_sync.ts (3 hunks)
services/platform/convex/predefined_workflows/document_rag_sync.ts (3 hunks)
services/platform/convex/predefined_workflows/general_customer_status_assessment.ts (2 hunks)
services/platform/convex/predefined_workflows/general_product_recommendation.ts (3 hunks)
services/platform/convex/predefined_workflows/loopi_customer_status_assessment.ts (2 hunks)
services/platform/convex/predefined_workflows/loopi_product_recommendation.ts (3 hunks)
services/platform/convex/predefined_workflows/onedrive_sync.ts (2 hunks)
services/platform/convex/predefined_workflows/product_rag_sync.ts (3 hunks)
services/platform/convex/predefined_workflows/product_recommendation_email.ts (2 hunks)
services/platform/convex/predefined_workflows/product_relationship_analysis.ts (2 hunks)
services/platform/convex/predefined_workflows/website_pages_rag_sync.ts (3 hunks)
services/platform/convex/schema.ts (2 hunks)
services/platform/convex/workflow/actions/approval/approval_action.ts (1 hunks)
services/platform/convex/workflow/actions/approval/helpers/create_approval.ts (1 hunks)
services/platform/convex/workflow/actions/approval/helpers/get_approval.ts (1 hunks)
services/platform/convex/workflow/actions/approval/helpers/get_approval_history.ts (1 hunks)
services/platform/convex/workflow/actions/approval/helpers/list_approvals_for_execution.ts (1 hunks)
services/platform/convex/workflow/actions/approval/helpers/list_pending_approvals.ts (1 hunks)
services/platform/convex/workflow/actions/approval/helpers/list_pending_approvals_for_execution.ts (1 hunks)
services/platform/convex/workflow/actions/approval/helpers/types.ts (1 hunks)
services/platform/convex/workflow/actions/approval/helpers/update_approval_status.ts (1 hunks)
services/platform/convex/workflow/actions/conversation/conversation_action.ts (1 hunks)
services/platform/convex/workflow/actions/conversation/helpers/add_message_to_conversation.ts (1 hunks)
services/platform/convex/workflow/actions/conversation/helpers/build_conversation_metadata.ts (1 hunks)
services/platform/convex/workflow/actions/conversation/helpers/build_email_metadata.ts (1 hunks)
services/platform/convex/workflow/actions/conversation/helpers/build_initial_message.ts (1 hunks)
services/platform/convex/workflow/actions/conversation/helpers/create_conversation.ts (2 hunks)
services/platform/convex/workflow/actions/conversation/helpers/create_conversation_from_email.ts (1 hunks)
services/platform/convex/workflow/actions/conversation/helpers/create_conversation_from_sent_email.ts (2 hunks)
services/platform/convex/workflow/actions/conversation/helpers/find_or_create_customer_from_email.ts (1 hunks)
services/platform/convex/workflow/actions/conversation/helpers/find_related_conversation.ts (1 hunks)
services/platform/convex/workflow/actions/conversation/helpers/get_conversation_by_id.ts (1 hunks)
services/platform/convex/workflow/actions/conversation/helpers/query_conversation_messages.ts (2 hunks)
services/platform/convex/workflow/actions/conversation/helpers/query_conversations.ts (2 hunks)
services/platform/convex/workflow/actions/conversation/helpers/query_latest_message_by_delivery_state.ts (2 hunks)
services/platform/convex/workflow/actions/conversation/helpers/types.ts (1 hunks)
services/platform/convex/workflow/actions/conversation/helpers/update_conversations.ts (2 hunks)
services/platform/convex/workflow/actions/conversation/helpers/update_message.ts (1 hunks)
services/platform/convex/workflow/actions/crawler/crawler_action.ts (1 hunks)
services/platform/convex/workflow/actions/crawler/helpers/types.ts (1 hunks)
services/platform/convex/workflow/actions/imap/README.md (0 hunks)
services/platform/convex/workflow/actions/imap/helpers/get_imap_credentials.ts (1 hunks)
services/platform/convex/workflow/actions/imap/helpers/types.ts (1 hunks)
services/platform/convex/workflow/actions/imap/imap_action.ts (1 hunks)
services/platform/convex/workflow/actions/integration/helpers/build_secrets_from_integration.ts (1 hunks)
services/platform/convex/workflow/actions/integration/integration_action.ts (1 hunks)
services/platform/convex/workflow/actions/product/README.md (0 hunks)
services/platform/convex/workflow/actions/rag/README.md (0 hunks)
services/platform/convex/workflow/actions/rag/helpers/get_document_info.ts (4 hunks)
services/platform/convex/workflow/actions/rag/helpers/get_rag_config.ts (1 hunks)
services/platform/convex/workflow/actions/rag/helpers/types.ts (3 hunks)
services/platform/convex/workflow/actions/rag/helpers/upload_file_direct.ts (4 hunks)
services/platform/convex/workflow/actions/rag/helpers/upload_text_document.ts (3 hunks)
services/platform/convex/workflow/actions/rag/rag_action.ts (3 hunks)
services/platform/convex/workflow/actions/website/website_action.ts (1 hunks)
services/platform/convex/workflow/actions/websitePages/helpers/types.ts (1 hunks)
services/platform/convex/workflow/actions/websitePages/websitePages_action.ts (1 hunks)
services/platform/convex/workflow/actions/workflow/helpers/types.ts (1 hunks)
services/platform/convex/workflow/actions/workflow/helpers/upload_workflows.ts (4 hunks)
services/platform/convex/workflow/actions/workflow/workflow_action.ts (1 hunks)
services/platform/convex/workflow/actions/workflow_processing_records/helpers/find_product_recommendation_by_status.ts (2 hunks)
services/platform/convex/workflow/actions/workflow_processing_records/helpers/find_unprocessed.ts (3 hunks)
services/platform/convex/workflow/actions/workflow_processing_records/helpers/find_unprocessed_open_conversation.ts (1 hunks)
services/platform/convex/workflow/actions/workflow_processing_records/helpers/record_processed.ts (1 hunks)
services/platform/convex/workflow/actions/workflow_processing_records/helpers/types.ts (2 hunks)
services/platform/convex/workflow/actions/workflow_processing_records/record_processed.ts (0 hunks)
services/platform/convex/workflow/actions/workflow_processing_records/workflow_processing_records_action.ts (8 hunks)
services/platform/convex/workflow/helpers/engine/start_workflow_handler.ts (2 hunks)
services/platform/convex/workflow/helpers/step_execution/initialize_execution_variables.ts (1 hunks)
services/platform/convex/workflow/helpers/step_execution/types.ts (1 hunks)
services/platform/convex/workflow_processing_records.ts (5 hunks)

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

…volos) Replaces the runtime patch_sops.py monkey-patch with an in-source change. Reads existing *.secrets.json as plain JSON (seeded by tale-init.sh from OPENROUTER_API_KEY) with OPENROUTER_API_KEY / OPENAI_API_KEY env fallback; saveProviderSecret writes plain JSON instead of SOPS-encrypting. Also documented in FORK.md as patch tale-project#4.

Round-2-confirmed critical bug. The docstring on `requestErasure` advertised `LEGAL_HOLD_BLOCKS_ERASURE` as a refusal code; the body had `const blocked: string[] = []` hard-coded as a "phase-8 placeholder" with `loadActiveHolds` commented out. Result: an org admin (or anyone with a compromised admin token) could spoliate litigation-hold data via this mutation, and the audit row would record `threadsBlocked: []` — actively misleading. Direct GDPR Art 17(3)(e) and FRCP 37(e) violation. `loadActiveHolds` ships in the same branch (legal_hold.ts:65) and is already consumed by retention_cleanup; only erasure ignored it. This commit: - Calls `loadActiveHolds` and partitions the subject's threads into held vs erasable. - Fails the whole request closed when `holds.orgHeld === true` or any of the subject's threads is held. Emits a `gdpr_erasure_blocked_by_hold` audit row before throwing so the refusal itself is logged in the tamper-evident chain (compliance teams need this). - Throws `ConvexError({ code: 'LEGAL_HOLD_BLOCKS_ERASURE', orgHeld, heldThreadIds })` matching the docstring contract. - Tracks per-thread cascade completion explicitly. The previous loop ran cascadeDeleteThreadChildren up to 50 times then incremented `erased` regardless of outcome; a thread with >50 pages of children was reported as erased while leaving residue. Now only counts threads whose cascade actually returned `done: true`, and emits the audit row with `status: 'failure'` + an explicit error message when any thread was partial. - Audit row records `heldThreadIds` (when present) and `threadsTargeted` (so reviewers can spot partial erasures). Out-of-scope here (separate work-streams): - Sub-thread cascade hold check (W1 #3 — cascade_helpers). - Snapshot-race fix in deleteExpired* mutations (W1 #4). - Subject scope expansion (W2: userMemories, BetterAuth, audit-PII scrub, RAG propagation, gdprErasureRequests state machine).

…ascade Round-1 reviewer-confirmed: 12 of 14 retention cleanup categories did not consult holds.orgHeld, and cascadeDeleteThreadChildren recursed into sub-threads with no hold check. Both let a litigation/preservation hold be silently bypassed for the table that records *why* the hold exists (auditLogs), every PII-bearing table, and any held sub-thread when its parent ages out — direct US FRCP 37(e) / EU GDPR Art 21 spoliation risk. retention_cleanup.ts (W1 #2) - Added `holds: ActiveHolds` to every cleanup category that lacked it: cleanupTempFiles (user + agent), cleanupAuditLogs, cleanupWorkflowLogs, cleanupUsageLedger, cleanupChatFilterEvents, cleanupPromptTemplates, cleanupMessageFeedback, cleanupMemoryAudit, cleanupCustomers, cleanupVendors, cleanupExternalConversations, cleanupMessageMetadata. - Each short-circuits with a clear info log when holds.orgHeld is true. - cleanupWorkflowLogs additionally consults holds.executionIds for per-execution holds (`targetType: 'execution'`) — until now those rows were silently ignored by cleanup. - The dispatcher's category list updated to thread `holds` to all 15 category invocations. cascade_helpers.ts (W1 #3 + partial W1 #4) - cascadeDeleteThreadChildren accepts an optional `holds` snapshot. If omitted AND organizationId is set, the helper re-loads `legalHolds` itself — closing the snapshot-once race where a hold placed mid-run provided zero protection because the dispatcher's pre-fetched Set was already stale by the time per-thread cascade fired. - Sub-thread recursion now passes the snapshot through, so the per-sub-thread hold check uses the same authoritative read. - The helper returns `{ done: true, remaining: 0 }` (no-op) on a held thread; callers (retention / erasure) treat that as "skip and continue" rather than "delete completed".

…OCTOU Adds assertSafeRetentionDelete to internal_mutations_retention.ts and threads the new cutoffMs through every retention dispatcher call site. The guard runs in-mutation (V8 transaction) and closes three round-2 findings simultaneously: - W1 #4 — snapshot-race vs legal hold. The dispatcher's loadActiveHoldsForOrg snapshot is up to 25 min stale; a hold placed AFTER snapshot but BEFORE the per-row mutation otherwise had zero protection. Re-reading inside each mutation makes the in-mutation read the only authoritative gate. - W6 #10 — cross-org corruption. Every deleteExpired* trusted args.organizationId blindly; a swapped id silently deleted org A's row while logging the audit under org B (and forking the per-org hash chain). Now we assert row.organizationId === args.organizationId. - W4 #13 — cutoff TOCTOU. Between the dispatcher's listExpired query and the per-row delete, a user can re-touch the row (chat thread.updatedAt bump, document patch). The mutation now re-evaluates cutoff against (updatedAt ?? _creationTime) and skips when the row is no longer eligible. Mutations updated (each accepts new optional cutoffMs): - deleteExpiredDocument (targetType: 'document') - deleteExpiredThread (targetType: 'thread') - deleteExpiredWorkflowExecution (targetType: 'execution') - deleteExpiredWorkflowTriggerLog (orgId + cutoff only) - deleteExpiredCustomer / Vendor / ExternalConversation / PromptTemplate / MessageFeedback / MemoryAuditRow / ChatFilterEvent / UsageLedgerRow (orgId + cutoff) Out-of-scope here: - deleteExpiredTempFile, deleteExpiredMessageMetadata, deleteExpiredTwoFactorAttempt, deleteExpiredLoginAttempt, deleteExpiredLoginBlockCounter — none of these carry organizationId on their args today (they're either cross-org-by-shape or have an indirect link). The orgHeld / cross-org checks are not applicable without an org backreference, which is itself a separate phase-10 follow-up. Their existing absence-of-row idempotence is preserved.

…eout Round-2 v15 confirmed: /config unauthenticated, /openapi.json + /docs + /redoc unauthenticated, RAG container ran as root, default token baked into image ENV, strict-mode env name diverged across the wire, non-constant-time token compare, plus three SSRF-guard gaps. services/rag/app/auth.py - W7 #3: hmac.compare_digest replaces == on the bearer compare. Removes the dead-code EXEMPT_PATHS frozenset. services/rag/app/routers/health.py - W7 #1: split into public_router (`/`, `/health`) and protected_router (`/config`). main.py mounts the protected one under Depends(verify_internal_token). Old `router` re-export stays for backwards compat. services/rag/app/main.py - W7 #2: docs_url / redoc_url / openapi_url are None outside debug. - W7 #4: CORS allow_credentials flipped to False (bearer rides Authorization, never cookies). - W7 #1 wiring: mount health-public + health-protected separately. services/rag/app/config.py - W7 #8: require_custom_internal_token accepts BOTH RAG_REQUIRE_CUSTOM_INTERNAL_TOKEN and TALE_REQUIRE_CUSTOM_RAG_TOKEN via pydantic AliasChoices. services/rag/Dockerfile + services/convex/Dockerfile - W7 #5: RAG container runs as non-root (uid:gid 1001:1001 `app`). RAG ingests untrusted PDFs/DOCX through native parsers; biggest blast radius in the stack, now hardened. - W7 #6: removed RAG_INTERNAL_TOKEN=tale-rag-dev-only ENV bake from both runtime + scratch-squash stages and the matching bake in services/convex/Dockerfile. Operators MUST supply via env / compose / k8s secret. services/platform/convex/lib/helpers/rag_config.ts - W7 #9 F1: `redirect: 'manual'` on every ragFetch. - W7 #9 F2: added fc00::/7 (IPv6 ULA) to v6 blocklist (AWS IPv6 IMDSv2). - W7 #9 F3: strip trailing `.` before hostname blocklist lookup. - W7 #9 F4: re-validate URL per ragFetch invocation (DNS rebinding + env rotation mitigation). - W7 #9 F9: deleted path.startsWith('http') override branch (future- bypass foot-gun). services/platform/convex/agent_tools/rag/helpers/fetch_document_chunks.ts - W7 #10: pass timeoutMs=60_000 (default 10s was a regression). - Plus MAX_ITERATIONS=30 cap and "cursor did not advance" break to defend against an adversarial RAG response.

… rag propagation W2 — completes the round-2-confirmed gaps in requestErasure: - v11 C2: erasure was a mutation, structurally unable to call the RAG service. Vector chunks + indexed text persisted after a "successful" erasure receipt. - v11 C2/C3: bounded loop reported partial cascades as success; receipt lied about coverage. No durable receipt for the subject (Art 19). - v12 H3: single-mutation cascade hit Convex transaction limits. Schema (gdprErasureRequestsTable) - New per-request state machine: pending → running → done | partial | failed. Carries threadsTargeted snapshot, threadsErased counter, threadsBlockedByHold, ragDocumentsRemoved, slaDeadlineAt (= requestedAt + 30 days per Art 12(3)), startedAt / completedAt. erasure.ts — split-phase pipeline - Public mutation requestErasure: auth + org-admin gate + legal-hold gate (refusal also audit-logged) + inserts request row + schedules the processor + returns { requestId, threadsTargeted } so the admin can poll the row for progress. - Internal action processErasureRequest: flips row to 'running', cascades each thread via cascadeDeleteThreadChildren (up to 50 pages per thread), propagates to RAG via ragFetch DELETE per subject-owned document, then finalizeProcessing transitions the row to done | partial | failed with explicit counts. - Three supporting internal mutations: beginProcessing, eraseThreadById, finalizeProcessing, listSubjectDocuments. Subject scope - Today: chat threads + their RAG-indexed descendants (departing- employee case). - Out of scope (documented in file header): userMemories, userPreferences, feedback, fileMetadata blobs, audit-chain PII scrub (W3 #4), BetterAuth tables, loginAttempts / twoFactorAttempts, policyAcknowledgements. Receipt = the gdprErasureRequests row. The admin UI can render the full state machine without scraping audit logs; SLA deadline is durable.

Two work-streams in one commit because they share the erasure processor's per-table cleanup path. Subject-scope expansion (W2 follow-up) - Adds eight per-table internal mutations called sequentially after the chat-thread cascade + RAG propagation: * eraseSubjectUserMemories * eraseSubjectUserPreferences * eraseSubjectMessageFeedback * eraseSubjectFileMetadata (deletes _storage blobs too) * eraseSubjectUsageLedger * eraseSubjectTwoFactorAttempts * eraseSubjectPolicyAcknowledgements * eraseSubjectOnedrive - loginAttempts / loginBlockCounters are email-keyed; new lookupSubjectEmail mutation reads the BetterAuth user row first, then eraseSubjectLoginAttempts wipes both tables. - BetterAuth `user`/`account`/`session`/`verification` rows are intentionally NOT touched here — the auth component owns its own delete-user flow and direct table manipulation could leave dangling sessions in flight. Documented as out-of-scope in the file header for a follow-up that goes through `authComponent`. Audit-chain PII scrub (W3 #4) - New `auditLogs.piiScrubbed` (+ `piiScrubbedAt`) field. The processor blanks `actorEmail`, `actorRole`, `ipAddress`, `userAgent`, `previousState`, `newState`, `metadata` on every row where `actorId === targetUserId`. Row existence + timestamp + action are preserved per Art 17(3)(b)/(e) accountability requirements. - New `auditLogCheckpoints.subtype` field discriminates 'retention' (existing) from 'pii_scrub'. The scrub mutation inserts a signed checkpoint with `scrubbedSubjectId` + `scrubbedRowCount` so an external auditor can confirm the divergence is bounded and intentional. - `verifyIntegrity` updated to recognize the boundary: rows with `piiScrubbed: true` (or whose actorId is in any pii_scrub checkpoint's subjectId set) skip the SHA-256 recompute step. Chain forward-link via `previousHash` is still validated — the scrub does not break order, only blanks the body. - `scrubSubjectAuditLogs` is in audit_logs/internal_mutations.ts so it can use the existing `signCheckpoint` HMAC helper directly. The processor finalizeProcessing audit log on a 'done' run now implicitly covers the scrub-marker checkpoint (admins reviewing the erasure see both rows in the chain). No client-callable surface added — scrub fires only as part of an authorized requestErasure flow.