Feature/refresh sync out 3645#109
Conversation
Additive migration. Adds pending_action, pending_action_target, pending_action_attempts, pending_action_last_attempt_at, and pending_action_last_error columns plus a CHECK constraint enforcing that pending_action and pending_action_target are both set or both null. Also gitignores docs/ for spec and plan artifacts. Subsequent PRs populate these columns from per-file handlers and read from the resync sweeper. PR 1 is observation-only — no code reads or writes the new columns yet. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Add two partial unique indexes scoped to live rows so a single assemblyFileId maps to at most one dbxFileId per portal+channel: (portal_id, channel_sync_id, assembly_file_id) WHERE deleted_at IS NULL (portal_id, channel_sync_id, dbx_file_id) WHERE deleted_at IS NULL Foundation for upcoming pre-insert work (INSERT ... ON CONFLICT DO NOTHING) that needs these as atomic race-protection targets. Shipping separately so the migration bakes before any dependent code paths land. Audit query for existing duplicates is in the migration SQL comments. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- Add IF NOT EXISTS to the CREATE UNIQUE INDEX statements so the migration is idempotent and tolerates manual pre-application. - Narrow each partial index WHERE clause to also require the indexed file_id column IS NOT NULL. Rows with a NULL file id can never trigger a uniqueness violation (NULLs are distinct in PG), so excluding them keeps the index lean. Snapshot updated to match. - Document the implication in the schema: any future ON CONFLICT targeting these indexes must repeat the partial predicate verbatim. - Expand audit-query comments into multi-line block format. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
The latest updates on your projects. Learn more about Vercel for GitHub.
|
…re-insert creates Make failed sync operations between Dropbox and Assembly durable and recoverable in either direction: - Per-file delete handlers wrap the side effect with a tombstone (pending_action* columns), soft-deleting on success and leaving the row for the sweeper on failure. - Create handlers (Assembly→Dropbox in syncAssemblyFilesToDropbox and Dropbox→Assembly leaf-file in createAndUploadFileToAssembly) now pre-insert the mapping row with a create-pending tombstone before the target-side API call. Race protection via INSERT ... ON CONFLICT DO NOTHING against the two partial unique indexes. - A Trigger.dev schedule (*/15 * * * *) sweeps rows with active tombstones (subject to MAX_ATTEMPTS=10 and per-attempt backoff), dispatches by (action, target) to the appropriate retry helper. - Update path keeps the existing delete+sync chain composition to avoid webhook ping-pong (FileDeleted/FileCreated with stale source IDs mid-flight). Cleanups: - Removed Vercel cron /api/workers/resync-failed-files and CRON_SECRET. - Removed recoverLegacySync and the legacy contentHash IS NULL branch in findFailedSyncs. Production cleanup of pre-PR-2 partial-create rows handled separately via one-time SQL script. - Shared normalizeError utility at src/utils/normalizeError.ts. Tests added for MapFilesService tombstone helpers and ResyncService per-portal fan-out. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The partial unique indexes on file_folder_sync were narrowed in OUT-3778
to also require the indexed file_id column IS NOT NULL. ON CONFLICT
clauses targeting those indexes must repeat the same predicate or
Postgres throws "no unique or exclusion constraint matching the
ON CONFLICT specification" at runtime.
In insertCreatePending, derive both the conflict columns and the WHERE
clause from `isDropboxTarget`:
- target=DROPBOX → match (portal, channel, assembly_file_id) index,
WHERE deleted_at IS NULL AND assembly_file_id IS NOT NULL
- target=ASSEMBLY → match (portal, channel, dbx_file_id) index,
WHERE deleted_at IS NULL AND dbx_file_id IS NOT NULL
Semantically the IS NOT NULL is always satisfied at insert time (the
caller pre-populates whichever id corresponds to the target side); the
clause exists only so Postgres can resolve the partial index.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- insertCreatePending now stamps pendingActionLastAttemptAt = NOW() so the sweeper's backoff window guards the row while the original completePending* call is still in-flight. Prevents the sweeper from racing the original call and producing duplicate remote files. - Removes markFailure from inside completePendingAssemblyCreate / completePendingDropboxCreate; errors now propagate to callers. Each caller (the original create path in Sync.service.ts and the sweeper retry path in retryFailedSyncsForPortal) records the failure exactly once, eliminating the double-markFailure on the sweeper path that was shortening the backoff window. - Test mock extended to cover the insert chain; new test asserts the pendingActionLastAttemptAt stamp. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Drop the sweeper from `*/15 * * * *` to `0 8,20 * * *` (08:00 + 20:00 UTC). 15-minute polling was overkill: the typical failure modes (transient Dropbox/Copilot 5xx, intermittent rate-limits) recover within minutes and the next sweep tick covers the same row regardless. Off-peak hours minimise contention with user-driven syncs during business hours. Tradeoff: in-flight transient failures now take up to ~12 hours to be swept (vs. ~15 min). The user-triggered Resync button (OUT-3784) is the on-demand recovery path. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…fileIdColumn insertCreatePending previously branched twice on `isDropboxTarget`: once to pick the conflict columns and again to build the WHERE clause. Both references resolve to the same column per branch, so collapse to a single `fileIdColumn` ref and define `conflictColumns` / `conflictWhere` once. Behaviour unchanged — drizzle interpolates the column ref by name in the sql template, so the generated SQL matches the same partial unique index as before. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Greptile SummaryThis PR adds a pending-action queue mechanism to the
Confidence Score: 3/5Safe to merge for schema correctness, but the second migration will exclusively lock file_folder_sync for the full index build, which poses a real availability risk on a production table before that is addressed. The column additions and check constraint are clean and low-risk. The unique-index migration runs without CONCURRENTLY, taking an ACCESS EXCLUSIVE lock on file_folder_sync for the entire build and blocking all reads and writes on what is likely a busy table. There is also no mechanism to reset pending_action_attempts when a new pending action cycle begins on a row, which could silently exhaust the retry budget. src/db/migrations/20260526094109_add_unique_indexes_on_portal_channel_assembly_and_dbx_file.sql needs attention for the non-concurrent index creation; src/db/schema/fileFolderSync.schema.ts warrants a second look around the attempts-counter reset semantics. Important Files Changed
Flowchart%%{init: {'theme': 'neutral'}}%%
flowchart TD
A[Event triggers sync change] --> B{Set pending_action + pending_action_target}
B --> C[file_folder_sync row - check constraint enforced]
C --> D[Worker polls: WHERE pending_action IS NOT NULL]
D --> E{Attempt action}
E -->|Success| F[Clear pending_action and pending_action_target to NULL]
E -->|Failure| G[Increment pending_action_attempts]
G --> H{attempts >= MAX?}
H -->|No| D
H -->|Yes| I[Dead-letter / give up]
F --> J[Row retains pending_action_attempts value - no auto-reset]
Reviews (1): Last reviewed commit: "refactor(OUT-3645): unify ON CONFLICT co..." | Re-trigger Greptile |
| assemblyFileId: uuid(), | ||
| pendingAction: PendingActionEnum(), | ||
| pendingActionTarget: PendingActionTargetEnum(), | ||
| pendingActionAttempts: integer().notNull().default(0), |
There was a problem hiding this comment.
pendingActionAttempts can carry over stale counts to a new pending cycle
There is no constraint or default reset that zeros pending_action_attempts when pending_action is cleared and then re-set on the same row. If application code checks attempts >= MAX_RETRIES before processing, a row that previously failed 3 times and was cleared will start its next pending action cycle with attempts = 3 and could be immediately abandoned.
| pendingAction: PendingActionEnum(), | ||
| pendingActionTarget: PendingActionTargetEnum(), | ||
| pendingActionAttempts: integer().notNull().default(0), | ||
| pendingActionLastAttemptAt: timestamp({ withTimezone: true, mode: 'date' }), | ||
| pendingActionLastError: text(), |
There was a problem hiding this comment.
No index to efficiently query rows with a pending action
The worker or poller that picks up rows needing processing will query something like WHERE pending_action IS NOT NULL. Without an index on pending_action, every such scan is a full table sweep on file_folder_sync. Since this column is sparse (most rows will have NULL), a partial index on (pending_action, pending_action_target) where pending_action IS NOT NULL would keep the working set tiny and make polling very cheap.
No description provided.