Skip to content

docs: remove outdated documentation files#10

Merged
larryro merged 2 commits into
mainfrom
optimize-docs
Dec 11, 2025
Merged

docs: remove outdated documentation files#10
larryro merged 2 commits into
mainfrom
optimize-docs

Conversation

@larryro

@larryro larryro commented Dec 11, 2025

Copy link
Copy Markdown
Collaborator

Remove 33 documentation files that are no longer needed:

  • Admin setup and deployment guides
  • Workflow-related documentation (architecture, patterns, types)
  • Integration guides (OneDrive, email providers, Shopify/Circuly)
  • API documentation
  • Internal design documents

These docs have become stale and are being consolidated or deprecated.

Summary by CodeRabbit

  • Documentation
    • Removed comprehensive documentation for admin setup, deployment configuration, email providers, and integrations.
    • Removed OneDrive integration and debugging guides.
    • Removed workflow module documentation including architecture, patterns, database operations, and version management guides.
    • Removed deployment guides for database and RAG services.
    • Removed configuration and setup guides for local and self-hosted deployments.

✏️ Tip: You can customize this high-level summary in your review settings.

Remove 33 documentation files that are no longer needed:
- Admin setup and deployment guides
- Workflow-related documentation (architecture, patterns, types)
- Integration guides (OneDrive, email providers, Shopify/Circuly)
- API documentation
- Internal design documents

These docs have become stale and are being consolidated or deprecated.
@coderabbitai

coderabbitai Bot commented Dec 11, 2025

Copy link
Copy Markdown
Contributor
📝 Walkthrough

Walkthrough

This pull request removes 33 documentation files spanning multiple areas of the repository. Deletions include comprehensive guides for workflow architecture and patterns, email and OneDrive integration documentation, API references, deployment and setup guides, and design documentation for features like variables replacement and row-level security. No functional code is modified—only documentation is deleted.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Areas requiring attention:

  • Workflow documentation (12+ files): Verify that the removal of architectural guides (architecture.md, data-spec.md, developer-guide.md, etc.) does not leave the workflow system undocumented or confuse future maintainers
  • Integration guides (OneDrive and Email): Confirm that deletion of detailed setup and debugging guides (onedrive-background-sync.md, email-api-sending.md, etc.) aligns with any migration or consolidation of integration documentation
  • Deployment and setup guides: Ensure critical setup instructions (convex-self-hosted-setup.md, tale-db-deployment.md, etc.) are not orphaned and that users have alternate documentation resources
  • Variables design document: Verify that removal of the comprehensive design doc (services/platform/convex/lib/variables/DESIGN.md) does not impede future development or maintenance of the variables system
  • Cross-references: Check whether any remaining documentation, code comments, or external docs reference the deleted files and may need updates

Possibly related PRs

  • tale-project/poc2#354: Removes workflow and AI-workflow documentation files that overlap with similar removals in this PR
  • tale-project/poc2#338: Modifies the same workflow documentation files being deleted in this PR, indicating potential coordination or documentation reorganization efforts
  • talecorp/poc2#30: Adds or updates multiple workflow documentation files that align with the scope of deletions in this PR

📜 Recent review details

Configuration used: CodeRabbit UI

Review profile: ASSERTIVE

Plan: Pro (Legacy)

📥 Commits

Reviewing files that changed from the base of the PR and between 423bb04 and 22fde0f.

📒 Files selected for processing (33)
  • docs/admin_setup.md (0 hunks)
  • docs/ai-workflow-assistant.md (0 hunks)
  • docs/api/create-conversation-message.md (0 hunks)
  • docs/convex-local-setup.md (0 hunks)
  • docs/convex-self-hosted-setup.md (0 hunks)
  • docs/deployment-modes.md (0 hunks)
  • docs/email-api-sending.md (0 hunks)
  • docs/email-providers.md (0 hunks)
  • docs/integration-health-checks.md (0 hunks)
  • docs/integrations-schema.md (0 hunks)
  • docs/management-dashboard-user-guide.md (0 hunks)
  • docs/onedrive-background-sync.md (0 hunks)
  • docs/onedrive-debugging-guide.md (0 hunks)
  • docs/onedrive-integration-guide.md (0 hunks)
  • docs/onedrive-parent-directory-tracking.md (0 hunks)
  • docs/row-level-security-guide.md (0 hunks)
  • docs/tale-db-deployment.md (0 hunks)
  • docs/tale-rag-deployment.md (0 hunks)
  • docs/tone-of-voice-implementation.md (0 hunks)
  • docs/url-configuration.md (0 hunks)
  • docs/workflows/REFACTORING_SUMMARY.md (0 hunks)
  • docs/workflows/architecture.md (0 hunks)
  • docs/workflows/auto-processing-pattern.md (0 hunks)
  • docs/workflows/data-spec.md (0 hunks)
  • docs/workflows/database-operations.md (0 hunks)
  • docs/workflows/developer-guide.md (0 hunks)
  • docs/workflows/entity-finder-prompt-design.md (0 hunks)
  • docs/workflows/find-unprocessed-entities-tool.md (0 hunks)
  • docs/workflows/manual-configuration.md (0 hunks)
  • docs/workflows/version-management.md (0 hunks)
  • docs/workflows/workflow-patterns-guide.md (0 hunks)
  • docs/workflows/workflow-types.md (0 hunks)
  • services/platform/convex/lib/variables/DESIGN.md (0 hunks)
💤 Files with no reviewable changes (33)
  • docs/onedrive-parent-directory-tracking.md
  • docs/onedrive-integration-guide.md
  • docs/deployment-modes.md
  • docs/url-configuration.md
  • docs/tale-rag-deployment.md
  • docs/ai-workflow-assistant.md
  • docs/api/create-conversation-message.md
  • docs/convex-self-hosted-setup.md
  • services/platform/convex/lib/variables/DESIGN.md
  • docs/email-api-sending.md
  • docs/onedrive-debugging-guide.md
  • docs/tone-of-voice-implementation.md
  • docs/workflows/developer-guide.md
  • docs/workflows/REFACTORING_SUMMARY.md
  • docs/workflows/architecture.md
  • docs/email-providers.md
  • docs/row-level-security-guide.md
  • docs/onedrive-background-sync.md
  • docs/admin_setup.md
  • docs/management-dashboard-user-guide.md
  • docs/workflows/manual-configuration.md
  • docs/workflows/auto-processing-pattern.md
  • docs/workflows/find-unprocessed-entities-tool.md
  • docs/workflows/version-management.md
  • docs/workflows/data-spec.md
  • docs/workflows/workflow-patterns-guide.md
  • docs/workflows/workflow-types.md
  • docs/workflows/database-operations.md
  • docs/workflows/entity-finder-prompt-design.md
  • docs/convex-local-setup.md
  • docs/integrations-schema.md
  • docs/tale-db-deployment.md
  • docs/integration-health-checks.md

Comment @coderabbitai help to get the list of available commands and usage tips.

@larryro larryro merged commit 94f31c0 into main Dec 11, 2025
1 check passed
@larryro larryro deleted the optimize-docs branch December 11, 2025 01:32
larryro added a commit that referenced this pull request May 7, 2026
…OCTOU

Adds assertSafeRetentionDelete to internal_mutations_retention.ts and
threads the new cutoffMs through every retention dispatcher call site.
The guard runs in-mutation (V8 transaction) and closes three round-2
findings simultaneously:

- W1 #4 — snapshot-race vs legal hold. The dispatcher's
  loadActiveHoldsForOrg snapshot is up to 25 min stale; a hold placed
  AFTER snapshot but BEFORE the per-row mutation otherwise had zero
  protection. Re-reading inside each mutation makes the in-mutation
  read the only authoritative gate.
- W6 #10 — cross-org corruption. Every deleteExpired* trusted
  args.organizationId blindly; a swapped id silently deleted org A's
  row while logging the audit under org B (and forking the per-org
  hash chain). Now we assert row.organizationId === args.organizationId.
- W4 #13 — cutoff TOCTOU. Between the dispatcher's listExpired query
  and the per-row delete, a user can re-touch the row (chat
  thread.updatedAt bump, document patch). The mutation now re-evaluates
  cutoff against (updatedAt ?? _creationTime) and skips when the row
  is no longer eligible.

Mutations updated (each accepts new optional cutoffMs):
- deleteExpiredDocument            (targetType: 'document')
- deleteExpiredThread              (targetType: 'thread')
- deleteExpiredWorkflowExecution   (targetType: 'execution')
- deleteExpiredWorkflowTriggerLog  (orgId + cutoff only)
- deleteExpiredCustomer / Vendor / ExternalConversation /
  PromptTemplate / MessageFeedback / MemoryAuditRow /
  ChatFilterEvent / UsageLedgerRow (orgId + cutoff)

Out-of-scope here:
- deleteExpiredTempFile, deleteExpiredMessageMetadata,
  deleteExpiredTwoFactorAttempt, deleteExpiredLoginAttempt,
  deleteExpiredLoginBlockCounter — none of these carry organizationId
  on their args today (they're either cross-org-by-shape or have an
  indirect link). The orgHeld / cross-org checks are not applicable
  without an org backreference, which is itself a separate phase-10
  follow-up. Their existing absence-of-row idempotence is preserved.
larryro added a commit that referenced this pull request May 7, 2026
Three round-2-confirmed gaps in the dual-control flow.

placeLegalHold (W1 #11)
- targetType: 'org' is the nuclear "halt all retention" hold; a single
  admin placing it freezes cleanup org-wide, defeating dual-control.
  Refuse by default; deployments running a single admin opt back in via
  the existing TALE_LEGAL_HOLD_SINGLE_ADMIN_OK env. A future maker-
  checker placement flow (legalHoldPlaceRequests + approveLegalHoldPlacement)
  supersedes this; today's commit restores the contract.

approveLegalHoldRelease (W1 #12)
- 5-min minimum delay between request and approval (constant
  RELEASE_APPROVAL_MIN_DELAY_MS). Defeats the chained-call attack
  where one admin requests + a second admin instantly approves via the
  same automation flow. Skipped when self-approval is in effect (the
  escape hatch already opted out of dual-control entirely).
- Re-check that the original requester is still an org admin at
  approval time. A demoted/removed requester whose request lingered
  loses the ability to retroactively gate a destructive change.

closeLegalMatter (W1 #10 + W4 #6)
- Implements the docstring promise (round-2 v08 bonus): closing a
  matter now fans out PENDING release requests for every active hold
  linked via matterRef, leveraging the new
  by_organizationId_matterRef index. Approval still requires a
  second admin via approveLegalHoldRelease — matter-close does NOT
  auto-release, dual-control survives.
- Returns releaseRequestsFiled count so the UI can confirm the
  cascade. Audit row records linkedHolds + releaseRequestsFiled.
- Skips holds with an existing pending or approved request (idempotent
  against re-running close on a half-closed matter).
larryro added a commit that referenced this pull request May 7, 2026
…eout

Round-2 v15 confirmed: /config unauthenticated, /openapi.json + /docs +
/redoc unauthenticated, RAG container ran as root, default token baked
into image ENV, strict-mode env name diverged across the wire,
non-constant-time token compare, plus three SSRF-guard gaps.

services/rag/app/auth.py
- W7 #3: hmac.compare_digest replaces == on the bearer compare. Removes
  the dead-code EXEMPT_PATHS frozenset.

services/rag/app/routers/health.py
- W7 #1: split into public_router (`/`, `/health`) and protected_router
  (`/config`). main.py mounts the protected one under
  Depends(verify_internal_token). Old `router` re-export stays for
  backwards compat.

services/rag/app/main.py
- W7 #2: docs_url / redoc_url / openapi_url are None outside debug.
- W7 #4: CORS allow_credentials flipped to False (bearer rides
  Authorization, never cookies).
- W7 #1 wiring: mount health-public + health-protected separately.

services/rag/app/config.py
- W7 #8: require_custom_internal_token accepts BOTH
  RAG_REQUIRE_CUSTOM_INTERNAL_TOKEN and TALE_REQUIRE_CUSTOM_RAG_TOKEN
  via pydantic AliasChoices.

services/rag/Dockerfile + services/convex/Dockerfile
- W7 #5: RAG container runs as non-root (uid:gid 1001:1001 `app`).
  RAG ingests untrusted PDFs/DOCX through native parsers; biggest
  blast radius in the stack, now hardened.
- W7 #6: removed RAG_INTERNAL_TOKEN=tale-rag-dev-only ENV bake from
  both runtime + scratch-squash stages and the matching bake in
  services/convex/Dockerfile. Operators MUST supply via env / compose
  / k8s secret.

services/platform/convex/lib/helpers/rag_config.ts
- W7 #9 F1: `redirect: 'manual'` on every ragFetch.
- W7 #9 F2: added fc00::/7 (IPv6 ULA) to v6 blocklist (AWS IPv6 IMDSv2).
- W7 #9 F3: strip trailing `.` before hostname blocklist lookup.
- W7 #9 F4: re-validate URL per ragFetch invocation (DNS rebinding +
  env rotation mitigation).
- W7 #9 F9: deleted path.startsWith('http') override branch (future-
  bypass foot-gun).

services/platform/convex/agent_tools/rag/helpers/fetch_document_chunks.ts
- W7 #10: pass timeoutMs=60_000 (default 10s was a regression).
- Plus MAX_ITERATIONS=30 cap and "cursor did not advance" break to
  defend against an adversarial RAG response.
larryro added a commit that referenced this pull request May 7, 2026
Round-2 W8 batch (six independent UI / docs items, all small).

i18n keys (W8 #1)
- Add governance.retentionPolicy.group.deletionBehavior.{title,description}
  to en/de/fr — previously missing in all three locales, retention editor
  fell back to the inline English string in every locale.
- Add governance.retentionPolicy.chatFilterEvents.{title,description,
  placeholder,helper} to en/de/fr — previously the row title rendered
  the literal id "chatFilterEvents".

useTranslation namespace (W8 #2)
- use-data-classification-notice.ts now binds useTranslation('dataNotice')
  so the dataNotice.default fallback string is reachable; the prior
  default-namespace bind meant the DE/FR fallbacks were unreachable.

UI delete copy (W8 #7)
- en/de/fr.json deletePermanentMessage now matches the actual
  user-visible behavior (Trash + grace + retention) instead of saying
  "permanently deleted" while the code soft-trashes.

Stale test (W8 #10)
- delete_chat_thread.test.ts was asserting the legacy
  `{ status: 'deleted' }` patch shape; the helper now writes
  `{ status: 'trashed', statusChangedAt: <ms> }` in user-trash mode.

Docs cron + maker-checker (W8 #8)
- docs/{en,de,fr}/self-hosted/configuration/retention.md: cron 03:00
  UTC → 04:00 UTC (matches crons.ts), plus a sibling note about the
  new 01:00 UTC effectReleasesOnly cron.
- en docs: replace the non-existent `releaseLegalHold` reference with
  the correct dual-control flow names plus the new
  RELEASE_APPROVAL_MIN_DELAY_MS, ORG_HOLD_REQUIRES_DUAL_CONTROL, and
  closeLegalMatter fan-out behavior.
- retention_cleanup.ts stale "03:00 UTC" comment fixed.
larryro added a commit that referenced this pull request May 8, 2026
…OCTOU

Adds assertSafeRetentionDelete to internal_mutations_retention.ts and
threads the new cutoffMs through every retention dispatcher call site.
The guard runs in-mutation (V8 transaction) and closes three round-2
findings simultaneously:

- W1 #4 — snapshot-race vs legal hold. The dispatcher's
  loadActiveHoldsForOrg snapshot is up to 25 min stale; a hold placed
  AFTER snapshot but BEFORE the per-row mutation otherwise had zero
  protection. Re-reading inside each mutation makes the in-mutation
  read the only authoritative gate.
- W6 #10 — cross-org corruption. Every deleteExpired* trusted
  args.organizationId blindly; a swapped id silently deleted org A's
  row while logging the audit under org B (and forking the per-org
  hash chain). Now we assert row.organizationId === args.organizationId.
- W4 #13 — cutoff TOCTOU. Between the dispatcher's listExpired query
  and the per-row delete, a user can re-touch the row (chat
  thread.updatedAt bump, document patch). The mutation now re-evaluates
  cutoff against (updatedAt ?? _creationTime) and skips when the row
  is no longer eligible.

Mutations updated (each accepts new optional cutoffMs):
- deleteExpiredDocument            (targetType: 'document')
- deleteExpiredThread              (targetType: 'thread')
- deleteExpiredWorkflowExecution   (targetType: 'execution')
- deleteExpiredWorkflowTriggerLog  (orgId + cutoff only)
- deleteExpiredCustomer / Vendor / ExternalConversation /
  PromptTemplate / MessageFeedback / MemoryAuditRow /
  ChatFilterEvent / UsageLedgerRow (orgId + cutoff)

Out-of-scope here:
- deleteExpiredTempFile, deleteExpiredMessageMetadata,
  deleteExpiredTwoFactorAttempt, deleteExpiredLoginAttempt,
  deleteExpiredLoginBlockCounter — none of these carry organizationId
  on their args today (they're either cross-org-by-shape or have an
  indirect link). The orgHeld / cross-org checks are not applicable
  without an org backreference, which is itself a separate phase-10
  follow-up. Their existing absence-of-row idempotence is preserved.
larryro added a commit that referenced this pull request May 8, 2026
Three round-2-confirmed gaps in the dual-control flow.

placeLegalHold (W1 #11)
- targetType: 'org' is the nuclear "halt all retention" hold; a single
  admin placing it freezes cleanup org-wide, defeating dual-control.
  Refuse by default; deployments running a single admin opt back in via
  the existing TALE_LEGAL_HOLD_SINGLE_ADMIN_OK env. A future maker-
  checker placement flow (legalHoldPlaceRequests + approveLegalHoldPlacement)
  supersedes this; today's commit restores the contract.

approveLegalHoldRelease (W1 #12)
- 5-min minimum delay between request and approval (constant
  RELEASE_APPROVAL_MIN_DELAY_MS). Defeats the chained-call attack
  where one admin requests + a second admin instantly approves via the
  same automation flow. Skipped when self-approval is in effect (the
  escape hatch already opted out of dual-control entirely).
- Re-check that the original requester is still an org admin at
  approval time. A demoted/removed requester whose request lingered
  loses the ability to retroactively gate a destructive change.

closeLegalMatter (W1 #10 + W4 #6)
- Implements the docstring promise (round-2 v08 bonus): closing a
  matter now fans out PENDING release requests for every active hold
  linked via matterRef, leveraging the new
  by_organizationId_matterRef index. Approval still requires a
  second admin via approveLegalHoldRelease — matter-close does NOT
  auto-release, dual-control survives.
- Returns releaseRequestsFiled count so the UI can confirm the
  cascade. Audit row records linkedHolds + releaseRequestsFiled.
- Skips holds with an existing pending or approved request (idempotent
  against re-running close on a half-closed matter).
larryro added a commit that referenced this pull request May 8, 2026
…eout

Round-2 v15 confirmed: /config unauthenticated, /openapi.json + /docs +
/redoc unauthenticated, RAG container ran as root, default token baked
into image ENV, strict-mode env name diverged across the wire,
non-constant-time token compare, plus three SSRF-guard gaps.

services/rag/app/auth.py
- W7 #3: hmac.compare_digest replaces == on the bearer compare. Removes
  the dead-code EXEMPT_PATHS frozenset.

services/rag/app/routers/health.py
- W7 #1: split into public_router (`/`, `/health`) and protected_router
  (`/config`). main.py mounts the protected one under
  Depends(verify_internal_token). Old `router` re-export stays for
  backwards compat.

services/rag/app/main.py
- W7 #2: docs_url / redoc_url / openapi_url are None outside debug.
- W7 #4: CORS allow_credentials flipped to False (bearer rides
  Authorization, never cookies).
- W7 #1 wiring: mount health-public + health-protected separately.

services/rag/app/config.py
- W7 #8: require_custom_internal_token accepts BOTH
  RAG_REQUIRE_CUSTOM_INTERNAL_TOKEN and TALE_REQUIRE_CUSTOM_RAG_TOKEN
  via pydantic AliasChoices.

services/rag/Dockerfile + services/convex/Dockerfile
- W7 #5: RAG container runs as non-root (uid:gid 1001:1001 `app`).
  RAG ingests untrusted PDFs/DOCX through native parsers; biggest
  blast radius in the stack, now hardened.
- W7 #6: removed RAG_INTERNAL_TOKEN=tale-rag-dev-only ENV bake from
  both runtime + scratch-squash stages and the matching bake in
  services/convex/Dockerfile. Operators MUST supply via env / compose
  / k8s secret.

services/platform/convex/lib/helpers/rag_config.ts
- W7 #9 F1: `redirect: 'manual'` on every ragFetch.
- W7 #9 F2: added fc00::/7 (IPv6 ULA) to v6 blocklist (AWS IPv6 IMDSv2).
- W7 #9 F3: strip trailing `.` before hostname blocklist lookup.
- W7 #9 F4: re-validate URL per ragFetch invocation (DNS rebinding +
  env rotation mitigation).
- W7 #9 F9: deleted path.startsWith('http') override branch (future-
  bypass foot-gun).

services/platform/convex/agent_tools/rag/helpers/fetch_document_chunks.ts
- W7 #10: pass timeoutMs=60_000 (default 10s was a regression).
- Plus MAX_ITERATIONS=30 cap and "cursor did not advance" break to
  defend against an adversarial RAG response.
larryro added a commit that referenced this pull request May 8, 2026
Round-2 W8 batch (six independent UI / docs items, all small).

i18n keys (W8 #1)
- Add governance.retentionPolicy.group.deletionBehavior.{title,description}
  to en/de/fr — previously missing in all three locales, retention editor
  fell back to the inline English string in every locale.
- Add governance.retentionPolicy.chatFilterEvents.{title,description,
  placeholder,helper} to en/de/fr — previously the row title rendered
  the literal id "chatFilterEvents".

useTranslation namespace (W8 #2)
- use-data-classification-notice.ts now binds useTranslation('dataNotice')
  so the dataNotice.default fallback string is reachable; the prior
  default-namespace bind meant the DE/FR fallbacks were unreachable.

UI delete copy (W8 #7)
- en/de/fr.json deletePermanentMessage now matches the actual
  user-visible behavior (Trash + grace + retention) instead of saying
  "permanently deleted" while the code soft-trashes.

Stale test (W8 #10)
- delete_chat_thread.test.ts was asserting the legacy
  `{ status: 'deleted' }` patch shape; the helper now writes
  `{ status: 'trashed', statusChangedAt: <ms> }` in user-trash mode.

Docs cron + maker-checker (W8 #8)
- docs/{en,de,fr}/self-hosted/configuration/retention.md: cron 03:00
  UTC → 04:00 UTC (matches crons.ts), plus a sibling note about the
  new 01:00 UTC effectReleasesOnly cron.
- en docs: replace the non-existent `releaseLegalHold` reference with
  the correct dual-control flow names plus the new
  RELEASE_APPROVAL_MIN_DELAY_MS, ORG_HOLD_REQUIRES_DUAL_CONTROL, and
  closeLegalMatter fan-out behavior.
- retention_cleanup.ts stale "03:00 UTC" comment fixed.
larryro added a commit that referenced this pull request May 8, 2026
…OCTOU

Adds assertSafeRetentionDelete to internal_mutations_retention.ts and
threads the new cutoffMs through every retention dispatcher call site.
The guard runs in-mutation (V8 transaction) and closes three round-2
findings simultaneously:

- W1 #4 — snapshot-race vs legal hold. The dispatcher's
  loadActiveHoldsForOrg snapshot is up to 25 min stale; a hold placed
  AFTER snapshot but BEFORE the per-row mutation otherwise had zero
  protection. Re-reading inside each mutation makes the in-mutation
  read the only authoritative gate.
- W6 #10 — cross-org corruption. Every deleteExpired* trusted
  args.organizationId blindly; a swapped id silently deleted org A's
  row while logging the audit under org B (and forking the per-org
  hash chain). Now we assert row.organizationId === args.organizationId.
- W4 #13 — cutoff TOCTOU. Between the dispatcher's listExpired query
  and the per-row delete, a user can re-touch the row (chat
  thread.updatedAt bump, document patch). The mutation now re-evaluates
  cutoff against (updatedAt ?? _creationTime) and skips when the row
  is no longer eligible.

Mutations updated (each accepts new optional cutoffMs):
- deleteExpiredDocument            (targetType: 'document')
- deleteExpiredThread              (targetType: 'thread')
- deleteExpiredWorkflowExecution   (targetType: 'execution')
- deleteExpiredWorkflowTriggerLog  (orgId + cutoff only)
- deleteExpiredCustomer / Vendor / ExternalConversation /
  PromptTemplate / MessageFeedback / MemoryAuditRow /
  ChatFilterEvent / UsageLedgerRow (orgId + cutoff)

Out-of-scope here:
- deleteExpiredTempFile, deleteExpiredMessageMetadata,
  deleteExpiredTwoFactorAttempt, deleteExpiredLoginAttempt,
  deleteExpiredLoginBlockCounter — none of these carry organizationId
  on their args today (they're either cross-org-by-shape or have an
  indirect link). The orgHeld / cross-org checks are not applicable
  without an org backreference, which is itself a separate phase-10
  follow-up. Their existing absence-of-row idempotence is preserved.
larryro added a commit that referenced this pull request May 8, 2026
Three round-2-confirmed gaps in the dual-control flow.

placeLegalHold (W1 #11)
- targetType: 'org' is the nuclear "halt all retention" hold; a single
  admin placing it freezes cleanup org-wide, defeating dual-control.
  Refuse by default; deployments running a single admin opt back in via
  the existing TALE_LEGAL_HOLD_SINGLE_ADMIN_OK env. A future maker-
  checker placement flow (legalHoldPlaceRequests + approveLegalHoldPlacement)
  supersedes this; today's commit restores the contract.

approveLegalHoldRelease (W1 #12)
- 5-min minimum delay between request and approval (constant
  RELEASE_APPROVAL_MIN_DELAY_MS). Defeats the chained-call attack
  where one admin requests + a second admin instantly approves via the
  same automation flow. Skipped when self-approval is in effect (the
  escape hatch already opted out of dual-control entirely).
- Re-check that the original requester is still an org admin at
  approval time. A demoted/removed requester whose request lingered
  loses the ability to retroactively gate a destructive change.

closeLegalMatter (W1 #10 + W4 #6)
- Implements the docstring promise (round-2 v08 bonus): closing a
  matter now fans out PENDING release requests for every active hold
  linked via matterRef, leveraging the new
  by_organizationId_matterRef index. Approval still requires a
  second admin via approveLegalHoldRelease — matter-close does NOT
  auto-release, dual-control survives.
- Returns releaseRequestsFiled count so the UI can confirm the
  cascade. Audit row records linkedHolds + releaseRequestsFiled.
- Skips holds with an existing pending or approved request (idempotent
  against re-running close on a half-closed matter).
larryro added a commit that referenced this pull request May 8, 2026
…eout

Round-2 v15 confirmed: /config unauthenticated, /openapi.json + /docs +
/redoc unauthenticated, RAG container ran as root, default token baked
into image ENV, strict-mode env name diverged across the wire,
non-constant-time token compare, plus three SSRF-guard gaps.

services/rag/app/auth.py
- W7 #3: hmac.compare_digest replaces == on the bearer compare. Removes
  the dead-code EXEMPT_PATHS frozenset.

services/rag/app/routers/health.py
- W7 #1: split into public_router (`/`, `/health`) and protected_router
  (`/config`). main.py mounts the protected one under
  Depends(verify_internal_token). Old `router` re-export stays for
  backwards compat.

services/rag/app/main.py
- W7 #2: docs_url / redoc_url / openapi_url are None outside debug.
- W7 #4: CORS allow_credentials flipped to False (bearer rides
  Authorization, never cookies).
- W7 #1 wiring: mount health-public + health-protected separately.

services/rag/app/config.py
- W7 #8: require_custom_internal_token accepts BOTH
  RAG_REQUIRE_CUSTOM_INTERNAL_TOKEN and TALE_REQUIRE_CUSTOM_RAG_TOKEN
  via pydantic AliasChoices.

services/rag/Dockerfile + services/convex/Dockerfile
- W7 #5: RAG container runs as non-root (uid:gid 1001:1001 `app`).
  RAG ingests untrusted PDFs/DOCX through native parsers; biggest
  blast radius in the stack, now hardened.
- W7 #6: removed RAG_INTERNAL_TOKEN=tale-rag-dev-only ENV bake from
  both runtime + scratch-squash stages and the matching bake in
  services/convex/Dockerfile. Operators MUST supply via env / compose
  / k8s secret.

services/platform/convex/lib/helpers/rag_config.ts
- W7 #9 F1: `redirect: 'manual'` on every ragFetch.
- W7 #9 F2: added fc00::/7 (IPv6 ULA) to v6 blocklist (AWS IPv6 IMDSv2).
- W7 #9 F3: strip trailing `.` before hostname blocklist lookup.
- W7 #9 F4: re-validate URL per ragFetch invocation (DNS rebinding +
  env rotation mitigation).
- W7 #9 F9: deleted path.startsWith('http') override branch (future-
  bypass foot-gun).

services/platform/convex/agent_tools/rag/helpers/fetch_document_chunks.ts
- W7 #10: pass timeoutMs=60_000 (default 10s was a regression).
- Plus MAX_ITERATIONS=30 cap and "cursor did not advance" break to
  defend against an adversarial RAG response.
larryro added a commit that referenced this pull request May 8, 2026
Round-2 W8 batch (six independent UI / docs items, all small).

i18n keys (W8 #1)
- Add governance.retentionPolicy.group.deletionBehavior.{title,description}
  to en/de/fr — previously missing in all three locales, retention editor
  fell back to the inline English string in every locale.
- Add governance.retentionPolicy.chatFilterEvents.{title,description,
  placeholder,helper} to en/de/fr — previously the row title rendered
  the literal id "chatFilterEvents".

useTranslation namespace (W8 #2)
- use-data-classification-notice.ts now binds useTranslation('dataNotice')
  so the dataNotice.default fallback string is reachable; the prior
  default-namespace bind meant the DE/FR fallbacks were unreachable.

UI delete copy (W8 #7)
- en/de/fr.json deletePermanentMessage now matches the actual
  user-visible behavior (Trash + grace + retention) instead of saying
  "permanently deleted" while the code soft-trashes.

Stale test (W8 #10)
- delete_chat_thread.test.ts was asserting the legacy
  `{ status: 'deleted' }` patch shape; the helper now writes
  `{ status: 'trashed', statusChangedAt: <ms> }` in user-trash mode.

Docs cron + maker-checker (W8 #8)
- docs/{en,de,fr}/self-hosted/configuration/retention.md: cron 03:00
  UTC → 04:00 UTC (matches crons.ts), plus a sibling note about the
  new 01:00 UTC effectReleasesOnly cron.
- en docs: replace the non-existent `releaseLegalHold` reference with
  the correct dual-control flow names plus the new
  RELEASE_APPROVAL_MIN_DELAY_MS, ORG_HOLD_REQUIRES_DUAL_CONTROL, and
  closeLegalMatter fan-out behavior.
- retention_cleanup.ts stale "03:00 UTC" comment fixed.
larryro added a commit that referenced this pull request May 9, 2026
…stodian cascade

Round-1 #10/#12/#13 + round-2 V3 confirmed three custodian-cascade gaps
(P0-6 retention guard, P0-7 deleteDocumentById, P0-8 cleanupMessageMetadata
+ cleanupChatFilterEvents). The user's observation made the deeper truth
clear: in production, only `org` and `userMembership` hold target types
are operator-facing — `thread` / `document` / `execution` were modeled in
the original fine-grained hold design but never wired into the place-hold
UI (`PICKER_TARGET_TYPES = ['userMembership', 'org']`), and `usePlaceLegalHold`
is called from a single dialog. The corresponding `holds.threadIds` /
`documentIds` / `executionIds` Sets in `ActiveHolds` therefore stayed
empty in every deployment, while ~80 LOC of dead per-row checks
proliferated through retention, erasure, restore, folders, and the
cascade helper. This commit collapses the model to its actual shape.

Schema-level narrowing (legal_hold.ts):
  - HOLD_TARGET_TYPES = ['userMembership', 'org'] (write-side)
  - ActiveHolds = { orgHeld, userMembershipIds }
  - loadActiveHolds switch reduces to 2 cases; legacy thread/document/
    execution rows in the table (if any) are intentionally ignored —
    they're surfaced in admin UI as `(legacy)` via the read-side
    validator in legal_hold_queries.ts.
  - resolveAndAssertTarget keeps only `org` + `userMembership` cases.
  - isHeld() simplified to org + custodian.

Guard simplification (legal_hold_guard.ts):
  - assertNotHeld drops the per-row branches; `targetType`/`targetId`
    are now error-message context only. The user-custodian cascade
    via `authorUserId` is the sole remaining gate beyond org-wide.

assertSafeRetentionDelete (internal_mutations_retention.ts):
  - Drops `targetType?` / `targetId?` args; adds `authorUserId?`.
  - Internal handler is now: cross-org check → TOCTOU check →
    org-wide hold → user-custodian cascade. Single, clear contract.

P0-6 cascade wiring — every retention mutation passes the row's
author user id (~15 call sites in internal_mutations_retention.ts):
  - documents → doc.createdBy
  - threads → thread.userId
  - workflowExecution → execution.userId
  - workflowTriggerLog → exec.userId via wfExecutionId join
  - messageFeedback → row.userId
  - memoryAudit → row.subjectUserId (with thread-fallback)
  - chatFilterEvent → parent thread's userId via index lookup
  - usageLedger → row.userId
  - promptTemplate → row.createdBy
  - messageMetadata → parent thread's userId (closes P0-8 directly)
  - tempFile, customer, vendor, externalConversation: org-only
    (no author concept on the row)

P0-8 cascade fix — cleanupMessageMetadata + cleanupChatFilterEvents
no longer rely on the now-empty `holds.threadIds` Set; both look up
the parent thread's `userId` for the user-membership cascade
(messageMetadata cleanup mutation was already P0-8 fixed in commit 1).

Other dead-check removals:
  - retention_cleanup.ts: ~10 dead `holds.threadIds.has` /
    `documentIds.has` / `executionIds.has` checks removed; per-row
    custodian cascade either replaces them or the mutation re-checks.
  - governance/restore.ts: drops dead per-row branches; user-custodian
    cascade in restoreSoftDeletedRow (commit 1) covers the rest.
  - threads/cascade_helpers.ts: per-thread hold check replaced with
    a thread-metadata lookup → `userMembershipIds.has(thread.userId)`.
  - threads/delete_chat_thread.ts + restore_chat_thread.ts: drop
    `holds.threadIds.has(threadId)`; ownerHeld + orgHeld cover all
    refusal cases.
  - folders/mutations.ts assertNoHeldDescendantDocs: per-document
    hold check replaced with `userMembershipIds.has(doc.createdBy)`.
  - governance/erasure.ts requestErasure: drops heldThreadIds /
    heldDocumentIds collection — refusal is now `orgHeld ||
    userCustodianHeld`. eraseSubjectDocuments drops the per-row
    held-document check (subject is by definition the only author
    in the iteration; user-custodian gate applied at request time).
  - legal_hold_internal.ts: loadActiveHoldsForOrg returns the
    narrowed ActiveHolds shape across the action/query boundary.

Tests rewritten — `governance/__tests__/retention_legal_hold_gaps.test.ts`:
  - Drops thread/document/execution per-row hold cases (behaviour
    deleted on purpose).
  - Adds 6 cases covering the simplified contract: authorUserId
    cascade, TOCTOU + cross-org pre-checks, missing authorUserId
    grandfather, org-wide hold blanket refusal.
  - Source-grep regressions: every retention mutation passes the
    right authorUserId; cleanup actions pre-filter via the right
    `holds.userMembershipIds.has(...)`; per-row Sets stay deleted
    from the cleanup / erasure / guard sources.

Note: this is a one-way migration. Any deployment with active
legacy thread/document/execution holds loses enforcement of those
holds going forward; the read-side validator in legal_hold_queries
keeps showing them as `(legacy)`. Acceptable per project memory
(demo stage, no prod use). Pre-prod check: convert to user/org
holds first if any deployment has them.

Verified: typecheck clean; 851 tests pass across affected dirs;
full vitest run shows 5641/5642 pass (1 pre-existing flake in
canvas-pane.test.tsx, unrelated).
larryro added a commit that referenced this pull request May 9, 2026
- auditLogChainGenesis: per-org sentinel patched on every createAuditLog
  to force concurrent first-writers to conflict on a real document.
  Closes round-2 review CRITICAL #10 (genesis fork: two concurrent
  initial audit writes both observe lastEntry=null and commit with
  previousHash='', forking the chain permanently).

- activeLegalHoldClaims: per-target lock row for placeLegalHold to
  serialize concurrent placements via OCC. Closes CRITICAL #21 (TOCTOU
  duplicate-active-hold: two concurrent placeLegalHold both observe
  no active hold and both insert; row-level OCC does not detect
  range phantoms and Convex has no unique-constraint primitive).

- threads.by_org_user index: enables GDPR requestErasure to enumerate
  a single user's threads via paged scheduler-driven continuation
  (Phase D.4.1) instead of .collect()-on-org which silently truncates
  past the 16K per-transaction read limit on large orgs.

Schema-only; no migration needed (additive). All three close concrete
spoliation/IDOR/integrity risks identified by the review.
larryro added a commit that referenced this pull request May 9, 2026
…heck

- audit_hash: add lifecycleStatus + statusChangedAt to EXCLUDED_FIELDS so
  retention soft-delete (markRowExpiredGeneric) patching audit log rows
  doesn't poison the chain hash recompute. Pre-fix, ANY soft-deleted audit
  row caused verifyIntegrity to fail valid=false from that row forward.
  Round-2 review CRITICAL #8.

- audit_logs/validators: declare lifecycleStatus + statusChangedAt on
  auditLogItemValidator so query-return validation accepts soft-deleted
  rows. Defense-in-depth alongside the EXCLUDED_FIELDS fix.

- verify_integrity: anchor candidate filter accepts subtype === undefined
  (legacy retention checkpoints written before subtype field existed).
  Strict equality dropped them, breaking verifyIntegrity for any
  deployment that ran retention pre-upgrade. Match canonicalCheckpointPayload's
  `?? 'retention'` fallback. Round-2 review CRITICAL #9.

- verify_integrity: add fromTimestamp arg for paged resumption + suppress
  isFirstEntry head-anchor when paging mid-chain. Pre-fix, the response
  promised "page from lastVerifiedTimestamp + 1" but the query had no
  such arg — large-org chains could not be paged.

- verify_integrity: drop unsignedScrubSubjects Set (security-flavored dead
  code; unsignedScrubCount alone tracks the metric). The set was populated
  but never read; the actual gate is `!hasSigningKey`. Comment clarified.

- verify_integrity: type entries as Doc<'auditLogs'>[] instead of an
  open `[key: string]: unknown` index signature.

- audit_logs/internal_mutations: delete archiveOldLogs deprecated
  re-export — zero callers, AGENTS.md prohibits @deprecated tombstones.

- audit_logs/helpers (createAuditLog): introduce buildAuditRecordHashInput
  as single source of truth for the canonical record payload — both
  writer and self-check call it, eliminating drift risk that schema
  additions could change the hash output across writes vs verify.

- audit_logs/helpers (createAuditLog): genesis sentinel — read + patch
  the per-org auditLogChainGenesis row before each write. This forces
  OCC contention on a real document for the first audit write per org,
  closing the genesis-fork race where two concurrent first-writers both
  observe lastEntry=null and commit two roots with previousHash=''.
  Round-2 review CRITICAL #10.

- audit_logs/helpers (createAuditLog): inline self-check on every write
  recomputes the prior row's integrity hash and console.errors on
  mismatch. Catches naive scenario-1 tampering (field changed, hash not
  updated) at the next legitimate audit write — the only automated tamper
  detection today. Wrapped in try/catch and skips piiScrubbed rows so
  it cannot affect the legitimate write path. Round-2 review C.5.

Lint + typecheck clean. Convex codegen succeeded.
larryro added a commit that referenced this pull request May 9, 2026
- audit_hash.test: add three regression tests for round-2 CRITICAL #8.
  (1) lifecycleStatus + statusChangedAt are excluded from canonicalize
  (2) chainSuccessor + piiScrubbedAt are excluded
  (3) writer-vs-verifier hash equality after retention soft-delete patches
  the audit row. Pre-fix, soft-deleting any audit row poisoned the chain.

- delete_document.test: replace the "404 returns error" assertion with
  the post-fix contract: 404 is treated as `success: true, deletedCount: 0`
  for idempotency on retention re-runs. The previous test asserted 400
  to dodge the bug; real RAG returns 404 for not-found.

- audit_logs/helpers.test: extend the mock to support the new
  `auditLogChainGenesis` query + insert (the genesis-fork sentinel from
  CRITICAL #10) and filter assertions to count only `auditLogs` inserts
  so the sentinel write doesn't break the count expectations.

- delete_chat_thread.test: support `for await` over the
  agentWebhookUserThreads query — the cascade now iterates instead of
  collecting. Async iterator returns the pre-seeded mappings.

All 3155 server tests pass.
larryro added a commit that referenced this pull request May 9, 2026
…OCTOU

Adds assertSafeRetentionDelete to internal_mutations_retention.ts and
threads the new cutoffMs through every retention dispatcher call site.
The guard runs in-mutation (V8 transaction) and closes three round-2
findings simultaneously:

- W1 #4 — snapshot-race vs legal hold. The dispatcher's
  loadActiveHoldsForOrg snapshot is up to 25 min stale; a hold placed
  AFTER snapshot but BEFORE the per-row mutation otherwise had zero
  protection. Re-reading inside each mutation makes the in-mutation
  read the only authoritative gate.
- W6 #10 — cross-org corruption. Every deleteExpired* trusted
  args.organizationId blindly; a swapped id silently deleted org A's
  row while logging the audit under org B (and forking the per-org
  hash chain). Now we assert row.organizationId === args.organizationId.
- W4 #13 — cutoff TOCTOU. Between the dispatcher's listExpired query
  and the per-row delete, a user can re-touch the row (chat
  thread.updatedAt bump, document patch). The mutation now re-evaluates
  cutoff against (updatedAt ?? _creationTime) and skips when the row
  is no longer eligible.

Mutations updated (each accepts new optional cutoffMs):
- deleteExpiredDocument            (targetType: 'document')
- deleteExpiredThread              (targetType: 'thread')
- deleteExpiredWorkflowExecution   (targetType: 'execution')
- deleteExpiredWorkflowTriggerLog  (orgId + cutoff only)
- deleteExpiredCustomer / Vendor / ExternalConversation /
  PromptTemplate / MessageFeedback / MemoryAuditRow /
  ChatFilterEvent / UsageLedgerRow (orgId + cutoff)

Out-of-scope here:
- deleteExpiredTempFile, deleteExpiredMessageMetadata,
  deleteExpiredTwoFactorAttempt, deleteExpiredLoginAttempt,
  deleteExpiredLoginBlockCounter — none of these carry organizationId
  on their args today (they're either cross-org-by-shape or have an
  indirect link). The orgHeld / cross-org checks are not applicable
  without an org backreference, which is itself a separate phase-10
  follow-up. Their existing absence-of-row idempotence is preserved.
larryro added a commit that referenced this pull request May 9, 2026
Three round-2-confirmed gaps in the dual-control flow.

placeLegalHold (W1 #11)
- targetType: 'org' is the nuclear "halt all retention" hold; a single
  admin placing it freezes cleanup org-wide, defeating dual-control.
  Refuse by default; deployments running a single admin opt back in via
  the existing TALE_LEGAL_HOLD_SINGLE_ADMIN_OK env. A future maker-
  checker placement flow (legalHoldPlaceRequests + approveLegalHoldPlacement)
  supersedes this; today's commit restores the contract.

approveLegalHoldRelease (W1 #12)
- 5-min minimum delay between request and approval (constant
  RELEASE_APPROVAL_MIN_DELAY_MS). Defeats the chained-call attack
  where one admin requests + a second admin instantly approves via the
  same automation flow. Skipped when self-approval is in effect (the
  escape hatch already opted out of dual-control entirely).
- Re-check that the original requester is still an org admin at
  approval time. A demoted/removed requester whose request lingered
  loses the ability to retroactively gate a destructive change.

closeLegalMatter (W1 #10 + W4 #6)
- Implements the docstring promise (round-2 v08 bonus): closing a
  matter now fans out PENDING release requests for every active hold
  linked via matterRef, leveraging the new
  by_organizationId_matterRef index. Approval still requires a
  second admin via approveLegalHoldRelease — matter-close does NOT
  auto-release, dual-control survives.
- Returns releaseRequestsFiled count so the UI can confirm the
  cascade. Audit row records linkedHolds + releaseRequestsFiled.
- Skips holds with an existing pending or approved request (idempotent
  against re-running close on a half-closed matter).
larryro added a commit that referenced this pull request May 9, 2026
…eout

Round-2 v15 confirmed: /config unauthenticated, /openapi.json + /docs +
/redoc unauthenticated, RAG container ran as root, default token baked
into image ENV, strict-mode env name diverged across the wire,
non-constant-time token compare, plus three SSRF-guard gaps.

services/rag/app/auth.py
- W7 #3: hmac.compare_digest replaces == on the bearer compare. Removes
  the dead-code EXEMPT_PATHS frozenset.

services/rag/app/routers/health.py
- W7 #1: split into public_router (`/`, `/health`) and protected_router
  (`/config`). main.py mounts the protected one under
  Depends(verify_internal_token). Old `router` re-export stays for
  backwards compat.

services/rag/app/main.py
- W7 #2: docs_url / redoc_url / openapi_url are None outside debug.
- W7 #4: CORS allow_credentials flipped to False (bearer rides
  Authorization, never cookies).
- W7 #1 wiring: mount health-public + health-protected separately.

services/rag/app/config.py
- W7 #8: require_custom_internal_token accepts BOTH
  RAG_REQUIRE_CUSTOM_INTERNAL_TOKEN and TALE_REQUIRE_CUSTOM_RAG_TOKEN
  via pydantic AliasChoices.

services/rag/Dockerfile + services/convex/Dockerfile
- W7 #5: RAG container runs as non-root (uid:gid 1001:1001 `app`).
  RAG ingests untrusted PDFs/DOCX through native parsers; biggest
  blast radius in the stack, now hardened.
- W7 #6: removed RAG_INTERNAL_TOKEN=tale-rag-dev-only ENV bake from
  both runtime + scratch-squash stages and the matching bake in
  services/convex/Dockerfile. Operators MUST supply via env / compose
  / k8s secret.

services/platform/convex/lib/helpers/rag_config.ts
- W7 #9 F1: `redirect: 'manual'` on every ragFetch.
- W7 #9 F2: added fc00::/7 (IPv6 ULA) to v6 blocklist (AWS IPv6 IMDSv2).
- W7 #9 F3: strip trailing `.` before hostname blocklist lookup.
- W7 #9 F4: re-validate URL per ragFetch invocation (DNS rebinding +
  env rotation mitigation).
- W7 #9 F9: deleted path.startsWith('http') override branch (future-
  bypass foot-gun).

services/platform/convex/agent_tools/rag/helpers/fetch_document_chunks.ts
- W7 #10: pass timeoutMs=60_000 (default 10s was a regression).
- Plus MAX_ITERATIONS=30 cap and "cursor did not advance" break to
  defend against an adversarial RAG response.
larryro added a commit that referenced this pull request May 9, 2026
Round-2 W8 batch (six independent UI / docs items, all small).

i18n keys (W8 #1)
- Add governance.retentionPolicy.group.deletionBehavior.{title,description}
  to en/de/fr — previously missing in all three locales, retention editor
  fell back to the inline English string in every locale.
- Add governance.retentionPolicy.chatFilterEvents.{title,description,
  placeholder,helper} to en/de/fr — previously the row title rendered
  the literal id "chatFilterEvents".

useTranslation namespace (W8 #2)
- use-data-classification-notice.ts now binds useTranslation('dataNotice')
  so the dataNotice.default fallback string is reachable; the prior
  default-namespace bind meant the DE/FR fallbacks were unreachable.

UI delete copy (W8 #7)
- en/de/fr.json deletePermanentMessage now matches the actual
  user-visible behavior (Trash + grace + retention) instead of saying
  "permanently deleted" while the code soft-trashes.

Stale test (W8 #10)
- delete_chat_thread.test.ts was asserting the legacy
  `{ status: 'deleted' }` patch shape; the helper now writes
  `{ status: 'trashed', statusChangedAt: <ms> }` in user-trash mode.

Docs cron + maker-checker (W8 #8)
- docs/{en,de,fr}/self-hosted/configuration/retention.md: cron 03:00
  UTC → 04:00 UTC (matches crons.ts), plus a sibling note about the
  new 01:00 UTC effectReleasesOnly cron.
- en docs: replace the non-existent `releaseLegalHold` reference with
  the correct dual-control flow names plus the new
  RELEASE_APPROVAL_MIN_DELAY_MS, ORG_HOLD_REQUIRES_DUAL_CONTROL, and
  closeLegalMatter fan-out behavior.
- retention_cleanup.ts stale "03:00 UTC" comment fixed.
larryro added a commit that referenced this pull request May 9, 2026
…stodian cascade

Round-1 #10/#12/#13 + round-2 V3 confirmed three custodian-cascade gaps
(P0-6 retention guard, P0-7 deleteDocumentById, P0-8 cleanupMessageMetadata
+ cleanupChatFilterEvents). The user's observation made the deeper truth
clear: in production, only `org` and `userMembership` hold target types
are operator-facing — `thread` / `document` / `execution` were modeled in
the original fine-grained hold design but never wired into the place-hold
UI (`PICKER_TARGET_TYPES = ['userMembership', 'org']`), and `usePlaceLegalHold`
is called from a single dialog. The corresponding `holds.threadIds` /
`documentIds` / `executionIds` Sets in `ActiveHolds` therefore stayed
empty in every deployment, while ~80 LOC of dead per-row checks
proliferated through retention, erasure, restore, folders, and the
cascade helper. This commit collapses the model to its actual shape.

Schema-level narrowing (legal_hold.ts):
  - HOLD_TARGET_TYPES = ['userMembership', 'org'] (write-side)
  - ActiveHolds = { orgHeld, userMembershipIds }
  - loadActiveHolds switch reduces to 2 cases; legacy thread/document/
    execution rows in the table (if any) are intentionally ignored —
    they're surfaced in admin UI as `(legacy)` via the read-side
    validator in legal_hold_queries.ts.
  - resolveAndAssertTarget keeps only `org` + `userMembership` cases.
  - isHeld() simplified to org + custodian.

Guard simplification (legal_hold_guard.ts):
  - assertNotHeld drops the per-row branches; `targetType`/`targetId`
    are now error-message context only. The user-custodian cascade
    via `authorUserId` is the sole remaining gate beyond org-wide.

assertSafeRetentionDelete (internal_mutations_retention.ts):
  - Drops `targetType?` / `targetId?` args; adds `authorUserId?`.
  - Internal handler is now: cross-org check → TOCTOU check →
    org-wide hold → user-custodian cascade. Single, clear contract.

P0-6 cascade wiring — every retention mutation passes the row's
author user id (~15 call sites in internal_mutations_retention.ts):
  - documents → doc.createdBy
  - threads → thread.userId
  - workflowExecution → execution.userId
  - workflowTriggerLog → exec.userId via wfExecutionId join
  - messageFeedback → row.userId
  - memoryAudit → row.subjectUserId (with thread-fallback)
  - chatFilterEvent → parent thread's userId via index lookup
  - usageLedger → row.userId
  - promptTemplate → row.createdBy
  - messageMetadata → parent thread's userId (closes P0-8 directly)
  - tempFile, customer, vendor, externalConversation: org-only
    (no author concept on the row)

P0-8 cascade fix — cleanupMessageMetadata + cleanupChatFilterEvents
no longer rely on the now-empty `holds.threadIds` Set; both look up
the parent thread's `userId` for the user-membership cascade
(messageMetadata cleanup mutation was already P0-8 fixed in commit 1).

Other dead-check removals:
  - retention_cleanup.ts: ~10 dead `holds.threadIds.has` /
    `documentIds.has` / `executionIds.has` checks removed; per-row
    custodian cascade either replaces them or the mutation re-checks.
  - governance/restore.ts: drops dead per-row branches; user-custodian
    cascade in restoreSoftDeletedRow (commit 1) covers the rest.
  - threads/cascade_helpers.ts: per-thread hold check replaced with
    a thread-metadata lookup → `userMembershipIds.has(thread.userId)`.
  - threads/delete_chat_thread.ts + restore_chat_thread.ts: drop
    `holds.threadIds.has(threadId)`; ownerHeld + orgHeld cover all
    refusal cases.
  - folders/mutations.ts assertNoHeldDescendantDocs: per-document
    hold check replaced with `userMembershipIds.has(doc.createdBy)`.
  - governance/erasure.ts requestErasure: drops heldThreadIds /
    heldDocumentIds collection — refusal is now `orgHeld ||
    userCustodianHeld`. eraseSubjectDocuments drops the per-row
    held-document check (subject is by definition the only author
    in the iteration; user-custodian gate applied at request time).
  - legal_hold_internal.ts: loadActiveHoldsForOrg returns the
    narrowed ActiveHolds shape across the action/query boundary.

Tests rewritten — `governance/__tests__/retention_legal_hold_gaps.test.ts`:
  - Drops thread/document/execution per-row hold cases (behaviour
    deleted on purpose).
  - Adds 6 cases covering the simplified contract: authorUserId
    cascade, TOCTOU + cross-org pre-checks, missing authorUserId
    grandfather, org-wide hold blanket refusal.
  - Source-grep regressions: every retention mutation passes the
    right authorUserId; cleanup actions pre-filter via the right
    `holds.userMembershipIds.has(...)`; per-row Sets stay deleted
    from the cleanup / erasure / guard sources.

Note: this is a one-way migration. Any deployment with active
legacy thread/document/execution holds loses enforcement of those
holds going forward; the read-side validator in legal_hold_queries
keeps showing them as `(legacy)`. Acceptable per project memory
(demo stage, no prod use). Pre-prod check: convert to user/org
holds first if any deployment has them.

Verified: typecheck clean; 851 tests pass across affected dirs;
full vitest run shows 5641/5642 pass (1 pre-existing flake in
canvas-pane.test.tsx, unrelated).
larryro added a commit that referenced this pull request May 9, 2026
- auditLogChainGenesis: per-org sentinel patched on every createAuditLog
  to force concurrent first-writers to conflict on a real document.
  Closes round-2 review CRITICAL #10 (genesis fork: two concurrent
  initial audit writes both observe lastEntry=null and commit with
  previousHash='', forking the chain permanently).

- activeLegalHoldClaims: per-target lock row for placeLegalHold to
  serialize concurrent placements via OCC. Closes CRITICAL #21 (TOCTOU
  duplicate-active-hold: two concurrent placeLegalHold both observe
  no active hold and both insert; row-level OCC does not detect
  range phantoms and Convex has no unique-constraint primitive).

- threads.by_org_user index: enables GDPR requestErasure to enumerate
  a single user's threads via paged scheduler-driven continuation
  (Phase D.4.1) instead of .collect()-on-org which silently truncates
  past the 16K per-transaction read limit on large orgs.

Schema-only; no migration needed (additive). All three close concrete
spoliation/IDOR/integrity risks identified by the review.
larryro added a commit that referenced this pull request May 9, 2026
…heck

- audit_hash: add lifecycleStatus + statusChangedAt to EXCLUDED_FIELDS so
  retention soft-delete (markRowExpiredGeneric) patching audit log rows
  doesn't poison the chain hash recompute. Pre-fix, ANY soft-deleted audit
  row caused verifyIntegrity to fail valid=false from that row forward.
  Round-2 review CRITICAL #8.

- audit_logs/validators: declare lifecycleStatus + statusChangedAt on
  auditLogItemValidator so query-return validation accepts soft-deleted
  rows. Defense-in-depth alongside the EXCLUDED_FIELDS fix.

- verify_integrity: anchor candidate filter accepts subtype === undefined
  (legacy retention checkpoints written before subtype field existed).
  Strict equality dropped them, breaking verifyIntegrity for any
  deployment that ran retention pre-upgrade. Match canonicalCheckpointPayload's
  `?? 'retention'` fallback. Round-2 review CRITICAL #9.

- verify_integrity: add fromTimestamp arg for paged resumption + suppress
  isFirstEntry head-anchor when paging mid-chain. Pre-fix, the response
  promised "page from lastVerifiedTimestamp + 1" but the query had no
  such arg — large-org chains could not be paged.

- verify_integrity: drop unsignedScrubSubjects Set (security-flavored dead
  code; unsignedScrubCount alone tracks the metric). The set was populated
  but never read; the actual gate is `!hasSigningKey`. Comment clarified.

- verify_integrity: type entries as Doc<'auditLogs'>[] instead of an
  open `[key: string]: unknown` index signature.

- audit_logs/internal_mutations: delete archiveOldLogs deprecated
  re-export — zero callers, AGENTS.md prohibits @deprecated tombstones.

- audit_logs/helpers (createAuditLog): introduce buildAuditRecordHashInput
  as single source of truth for the canonical record payload — both
  writer and self-check call it, eliminating drift risk that schema
  additions could change the hash output across writes vs verify.

- audit_logs/helpers (createAuditLog): genesis sentinel — read + patch
  the per-org auditLogChainGenesis row before each write. This forces
  OCC contention on a real document for the first audit write per org,
  closing the genesis-fork race where two concurrent first-writers both
  observe lastEntry=null and commit two roots with previousHash=''.
  Round-2 review CRITICAL #10.

- audit_logs/helpers (createAuditLog): inline self-check on every write
  recomputes the prior row's integrity hash and console.errors on
  mismatch. Catches naive scenario-1 tampering (field changed, hash not
  updated) at the next legitimate audit write — the only automated tamper
  detection today. Wrapped in try/catch and skips piiScrubbed rows so
  it cannot affect the legitimate write path. Round-2 review C.5.

Lint + typecheck clean. Convex codegen succeeded.
larryro added a commit that referenced this pull request May 9, 2026
- audit_hash.test: add three regression tests for round-2 CRITICAL #8.
  (1) lifecycleStatus + statusChangedAt are excluded from canonicalize
  (2) chainSuccessor + piiScrubbedAt are excluded
  (3) writer-vs-verifier hash equality after retention soft-delete patches
  the audit row. Pre-fix, soft-deleting any audit row poisoned the chain.

- delete_document.test: replace the "404 returns error" assertion with
  the post-fix contract: 404 is treated as `success: true, deletedCount: 0`
  for idempotency on retention re-runs. The previous test asserted 400
  to dodge the bug; real RAG returns 404 for not-found.

- audit_logs/helpers.test: extend the mock to support the new
  `auditLogChainGenesis` query + insert (the genesis-fork sentinel from
  CRITICAL #10) and filter assertions to count only `auditLogs` inserts
  so the sentinel write doesn't break the count expectations.

- delete_chat_thread.test: support `for await` over the
  agentWebhookUserThreads query — the cascade now iterates instead of
  collecting. Async iterator returns the pre-seeded mappings.

All 3155 server tests pass.
larryro added a commit that referenced this pull request May 17, 2026
Closes #9, #10, #11, #12 — cascade correctness + GC durability.

- `personalization_cascade.ts:cascadeOnOrgDeleted` swaps delete order:
  `db.delete` runs FIRST, then `storage.delete` inside the try/catch.
  Matches the documented contract in `tts/cascade_helpers.ts:55-62` —
  Convex `_storage` writes are out-of-band and not rolled back on tx
  abort, so the reverse order leaves a surviving row pointing at a
  dead storageId (404 on `/api/tts-audio`).
- `threads/cascade_helpers.ts` step 7c TTS cleanup gets the same swap,
  for the same reason.
- `cascadeOnTtsForMemberRemoved` per-mutation page cap lowered from
  50 (~10K writes) to 30 (~6K writes) to stay under Convex's ~8K
  per-mutation write budget. `cascadeOnOrgDeleted` gets the same cap
  reduction. The hourly cron picks up whatever doesn't fit in a single
  pass — still well inside the 30-day Art 12(3) GDPR window.
- `gcOrgTtsChunks` now persists its org-cursor in a new singleton
  `ttsGcCursor` table between cron runs. A deployment with more orgs
  than `MAX_ORGS_PER_RUN` now advances through the full org list over
  successive hours instead of restarting from the lex-first org every
  time and starving lex-tail orgs forever. On reaching the end of the
  org list the cursor wraps to null and the next run starts over.
- `gcOrgTtsChunks` skip-empty: an org with no rows older than the
  retention cutoff no longer counts against `MAX_ORGS_PER_RUN`. Without
  this, a busy tail of stale orgs sandwiched behind quiet lex-leading
  orgs would never get reaped.
larryro added a commit that referenced this pull request May 17, 2026
Closes #9, #10, #11, #12 — cascade correctness + GC durability.

- `personalization_cascade.ts:cascadeOnOrgDeleted` swaps delete order:
  `db.delete` runs FIRST, then `storage.delete` inside the try/catch.
  Matches the documented contract in `tts/cascade_helpers.ts:55-62` —
  Convex `_storage` writes are out-of-band and not rolled back on tx
  abort, so the reverse order leaves a surviving row pointing at a
  dead storageId (404 on `/api/tts-audio`).
- `threads/cascade_helpers.ts` step 7c TTS cleanup gets the same swap,
  for the same reason.
- `cascadeOnTtsForMemberRemoved` per-mutation page cap lowered from
  50 (~10K writes) to 30 (~6K writes) to stay under Convex's ~8K
  per-mutation write budget. `cascadeOnOrgDeleted` gets the same cap
  reduction. The hourly cron picks up whatever doesn't fit in a single
  pass — still well inside the 30-day Art 12(3) GDPR window.
- `gcOrgTtsChunks` now persists its org-cursor in a new singleton
  `ttsGcCursor` table between cron runs. A deployment with more orgs
  than `MAX_ORGS_PER_RUN` now advances through the full org list over
  successive hours instead of restarting from the lex-first org every
  time and starving lex-tail orgs forever. On reaching the end of the
  org list the cursor wraps to null and the next run starts over.
- `gcOrgTtsChunks` skip-empty: an org with no rows older than the
  retention cutoff no longer counts against `MAX_ORGS_PER_RUN`. Without
  this, a busy tail of stale orgs sandwiched behind quiet lex-leading
  orgs would never get reaped.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant