OUT-3800: dedupe parallel folder creation in Dropbox→Assembly sync #111

Merged: SandipBajracharya merged 3 commits into main from OUT-3800 on Jun 3, 2026

Conversation

@SandipBajracharya

Collaborator

Problem

Sibling files syncing in parallel (e.g. /abc/file1.txt, /abc/file2.txt) both try to create their shared parent folder /abc. Assembly's folder create is path-idempotent and returns the same assemblyFileId to both racers, so the second insertFileMap collides on the file_folder_sync_portal_channel_assembly_unique partial index and throws the "Failed query: insert into file_folder_sync ..." error reported in Sentry.
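
The race can be reproduced in a small in-memory model. This is only a sketch: createFolder, naiveInsert, the Row type, and the generated ids are hypothetical stand-ins for Assembly's API and the file_folder_sync table, not code from this repository.

```typescript
// Hypothetical in-memory model of the race; none of these names come from the repo.
type Row = {
  portalId: string;
  channelSyncId: string;
  assemblyFileId: string;
  deletedAt: Date | null;
};

// Stand-in for Assembly's path-idempotent folder create: same path, same id.
const folderIdsByPath = new Map<string, string>();
function createFolder(path: string): string {
  const existing = folderIdsByPath.get(path);
  if (existing !== undefined) return existing;
  const id = `asm-${folderIdsByPath.size + 1}`;
  folderIdsByPath.set(path, id);
  return id;
}

// Stand-in for the table and its unique index over live rows.
const rows: Row[] = [];
function naiveInsert(row: Row): void {
  const clash =
    row.deletedAt === null &&
    rows.some(
      (r) =>
        r.deletedAt === null &&
        r.portalId === row.portalId &&
        r.channelSyncId === row.channelSyncId &&
        r.assemblyFileId === row.assemblyFileId,
    );
  if (clash) {
    throw new Error("duplicate key: file_folder_sync_portal_channel_assembly_unique");
  }
  rows.push(row);
}

// Both siblings resolve the shared parent before either insert lands.
const idForFile1 = createFolder("/abc");
const idForFile2 = createFolder("/abc");

const base = { portalId: "p1", channelSyncId: "c1", deletedAt: null };
naiveInsert({ ...base, assemblyFileId: idForFile1 });

// The second racer carries the same assemblyFileId, so its insert is rejected.
let secondInsertError: string | null = null;
try {
  naiveInsert({ ...base, assemblyFileId: idForFile2 });
} catch (err) {
  secondInsertError = err instanceof Error ? err.message : String(err);
}
console.log(idForFile1 === idForFile2, secondInsertError);
```

In this model the two creates return the same id and only the first insert succeeds, which mirrors the collision described above.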

Fix

  • insertFileMap now uses onConflictDoNothing against the existing assembly unique index (no new index added) and returns null on conflict. The synced-files count only increments when a row was actually inserted.
  • createAndUploadFileToAssembly pre-checks the mapping table by path (getDbxMappedFileFromPath) to skip the redundant Assembly create in the common sequential case. onConflictDoNothing is the safety net for the true concurrent race.
  • The insert-race loser now stamps dbxFileId via handleFolderCreatedCase, so a folder entry that loses the insert race to an intermediate-segment writer isn't left with dbxFileId=null (which would break later deletes/moves).
  • Refactor: split the long createAndUploadFileToAssembly into createLeafFileInAssembly and createFolderInAssembly; the public method is now a thin dispatcher.
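
The pre-check plus conflict-tolerant insert described in the list above can be sketched roughly as follows. It is an in-memory model, not the Drizzle code from this PR: the FileMap shape, ensureFolder, createFolderInAssemblyStub, and the skipPreCheck flag (used only to simulate a racer that already passed the pre-check) are assumptions made for illustration.

```typescript
// Hypothetical model of the fix; type and function names are illustrative only.
type FileMap = {
  portalId: string;
  channelSyncId: string;
  assemblyFileId: string | null;
  dbxFileId: string | null;
  itemPath: string;
  deletedAt: Date | null;
};

const table: FileMap[] = [];
let syncedFilesCount = 0;
let assemblyCreateCalls = 0;

// Stand-in for Assembly's path-idempotent folder create.
function createFolderInAssemblyStub(path: string): string {
  assemblyCreateCalls += 1;
  return `asm:${path}`;
}

// Models INSERT ... ON CONFLICT DO NOTHING on the partial unique index
// (portalId, channelSyncId, assemblyFileId) WHERE deletedAt IS NULL AND
// assemblyFileId IS NOT NULL. Returns null when nothing was inserted.
function insertFileMap(row: FileMap): FileMap | null {
  const indexed = (r: FileMap) => r.deletedAt === null && r.assemblyFileId !== null;
  const conflict =
    indexed(row) &&
    table.some(
      (r) =>
        indexed(r) &&
        r.portalId === row.portalId &&
        r.channelSyncId === row.channelSyncId &&
        r.assemblyFileId === row.assemblyFileId,
    );
  if (conflict) return null;
  table.push(row);
  return row;
}

// Pre-check by path first; the conflict-tolerant insert is the safety net.
function ensureFolder(path: string, skipPreCheck = false): FileMap | null {
  if (!skipPreCheck) {
    const mapped = table.find((r) => r.deletedAt === null && r.itemPath === path);
    if (mapped !== undefined) return mapped; // sequential case: no Assembly call
  }
  const inserted = insertFileMap({
    portalId: "p1",
    channelSyncId: "c1",
    assemblyFileId: createFolderInAssemblyStub(path),
    dbxFileId: null,
    itemPath: path,
    deletedAt: null,
  });
  if (inserted !== null) syncedFilesCount += 1; // count only real inserts
  return inserted;
}

const first = ensureFolder("/abc"); // inserts the row
const sequential = ensureFolder("/abc"); // found by the pre-check
const raceLoser = ensureFolder("/abc", true); // simulates a racer past the pre-check
console.log(first !== null, sequential === first, raceLoser, syncedFilesCount);
```

The sequential caller never reaches Assembly, while the simulated race loser does call it but gets null back from the insert and leaves the counter untouched.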

Notes

  • No migration / no new index — reuses file_folder_sync_portal_channel_assembly_unique.
  • Relies on Assembly returning the same assemblyFileId for a duplicate folder (confirmed by the Sentry error). The existing "Folder already exists" catch covers the sequential variant.

Tests

Added coverage in MapFiles.tombstone.test.ts for the conflict target columns, the partial-index predicate, and the inserted / null-on-conflict return paths. pnpm typecheck, pnpm lint, and all 14 tests pass.

🤖 Generated with Claude Code

Sibling files syncing in parallel both create their shared parent folder.
Assembly's folder create is path-idempotent and returns the SAME
assemblyFileId to both racers, so the second insertFileMap collided on
file_folder_sync_portal_channel_assembly_unique and threw.

- insertFileMap uses onConflictDoNothing against the existing assembly
  unique index (no new index), returns null on conflict; synced count only
  increments on a real insert.
- createAndUploadFileToAssembly pre-checks the mapping table by path to skip
  the redundant Assembly create in the common sequential case.
- Insert-race-loser stamps dbxFileId via handleFolderCreatedCase so a folder
  entry that loses to an intermediate-segment writer isn't left null.
- Split createAndUploadFileToAssembly into createLeafFileInAssembly and
  createFolderInAssembly; public method is now a thin dispatcher.
- Cover insertFileMap conflict target, predicate, and null path in tests.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@linear-code

linear-code Bot commented Jun 1, 2026

OUT-3800

@vercel

vercel Bot commented Jun 1, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

| Project | Deployment | Actions | Updated (UTC) |
| --- | --- | --- | --- |
| dropbox-integration | Ready | Preview, Comment | Jun 1, 2026 11:33am |

@greptile-apps

greptile-apps Bot commented Jun 1, 2026

Greptile Summary

Fixes a file_folder_sync unique-index collision that occurred when sibling files syncing in parallel both tried to create their shared parent folder in Assembly, which is path-idempotent and returns the same assemblyFileId to both racers.

  • insertFileMap gains an onConflictDoNothing targeting the existing file_folder_sync_portal_channel_assembly_unique partial index (columns portalId, channelSyncId, assemblyFileId with predicate deletedAt IS NULL AND assemblyFileId IS NOT NULL), returning null on conflict. The synced-files counter is only incremented for the winning insert.
  • createAndUploadFileToAssembly is split into createLeafFileInAssembly and createFolderInAssembly; the folder path adds a path-based pre-check via getDbxMappedFileFromPath to short-circuit the common sequential case, while onConflictDoNothing serves as the safety net for the true concurrent race.
  • The insert-race loser now calls handleFolderCreatedCase to stamp dbxFileId onto the winning row, preventing null from being left when an intermediate-segment writer wins the insert for a folder entry that carries a Dropbox file ID.

Confidence Score: 4/5

Safe to merge; the deduplication logic is correct, the partial-index predicate exactly mirrors the schema, and the synced-files count increments exactly once per unique folder.

The partial-index predicate in insertFileMap exactly mirrors the schema so the deduplication fires correctly in Postgres. The entire concurrent-race branch relies on Assembly returning the same assemblyFileId for the same folder path, a property confirmed empirically but not enforced in code. If that contract breaks, silent duplicate rows could be inserted. Test coverage for the new insertFileMap behaviour is thorough.

src/features/sync/lib/Sync.service.ts — specifically the concurrent folder-creation path in createFolderInAssembly and the assumptions around Assembly idempotency.

Important Files Changed

| Filename | Overview |
| --- | --- |
| src/features/sync/lib/MapFiles.service.ts | insertFileMap now returns null on onConflictDoNothing; partial-index target and WHERE predicate exactly match the schema definition |
| src/features/sync/lib/Sync.service.ts | createAndUploadFileToAssembly split into createLeafFileInAssembly / createFolderInAssembly; path-based pre-check + onConflictDoNothing deduplication is logically sound but relies on Assembly returning the same assemblyFileId for concurrent folder creates |
| src/features/sync/lib/tests/MapFiles.tombstone.test.ts | New insertFileMap tests cover conflict-target columns, partial-index predicate, successful insert, and null-on-conflict return; insertReturning fixture shared correctly via module-level let |

Reviews (1): Last reviewed commit: "fix(OUT-3800): dedupe parallel folder cr..."

Comment thread src/features/sync/lib/Sync.service.ts
Comment thread src/features/sync/lib/Sync.service.ts
SandipBajracharya changed the title from "fix(OUT-3800): dedupe parallel folder creation in Dropbox→Assembly sync" to "OUT-3800: dedupe parallel folder creation in Dropbox→Assembly sync" on Jun 1, 2026

Note that handleFolderCreatedCase covers both sub-cases: needed when the
winner was an intermediate-segment writer (dbxFileId=null), no-op otherwise.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
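
The two sub-cases named in this commit message can be illustrated with a minimal sketch. The function below is a hypothetical stand-in, not handleFolderCreatedCase itself (which may do more); FolderRow, stampDbxFileId, and the sample ids are assumptions.

```typescript
// Hypothetical sketch; the real handleFolderCreatedCase may do more than this.
type FolderRow = {
  assemblyFileId: string;
  itemPath: string;
  dbxFileId: string | null;
};

// Fills in dbxFileId on the winning row only when it is still missing.
// Returns true when the row was changed, false when the call was a no-op.
function stampDbxFileId(winner: FolderRow, loserDbxFileId: string): boolean {
  if (winner.dbxFileId !== null) return false;
  winner.dbxFileId = loserDbxFileId;
  return true;
}

// Sub-case 1: an intermediate-segment writer won and left dbxFileId empty.
const wonByIntermediate: FolderRow = {
  assemblyFileId: "asm-1",
  itemPath: "/abc",
  dbxFileId: null,
};
const stamped = stampDbxFileId(wonByIntermediate, "dbx-folder-1");

// Sub-case 2: the winner already carried a Dropbox id, so nothing changes.
const wonByFolderEntry: FolderRow = {
  assemblyFileId: "asm-2",
  itemPath: "/xyz",
  dbxFileId: "dbx-folder-2",
};
const restamped = stampDbxFileId(wonByFolderEntry, "dbx-folder-2");
console.log(stamped, restamped);
```

Calling the same routine unconditionally from the race-loser path keeps the caller simple, since it does not need to know which kind of writer won.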
Defensive guard: getDbxMappedFileFromPath already fetches all live rows for
a path, so warn when >1 exists. Surfaces the case where the
onConflictDoNothing dedupe fails because Assembly stopped returning the same
assemblyFileId on duplicate folder create.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
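
A guard of this kind might look roughly like the sketch below. It assumes a lookup that already has all live rows for a path in hand; findMappedByPath, LiveRow, and the injected warn callback are hypothetical names, and the real getDbxMappedFileFromPath queries the database instead of filtering an array.

```typescript
// Hypothetical sketch of the guard; names and the warn callback are assumptions.
type LiveRow = {
  itemPath: string;
  assemblyFileId: string;
  deletedAt: Date | null;
};

function findMappedByPath(
  rows: readonly LiveRow[],
  path: string,
  warn: (message: string) => void,
): LiveRow | undefined {
  const live = rows.filter((r) => r.deletedAt === null && r.itemPath === path);
  if (live.length > 1) {
    // More than one live row means the insert-level dedupe did not fire,
    // e.g. because two creates returned different assemblyFileIds.
    warn(`expected at most 1 live mapping for ${path}, found ${live.length}`);
  }
  return live[0];
}

const warnings: string[] = [];
const collect = (message: string): void => {
  warnings.push(message);
};

const healthy: LiveRow[] = [
  { itemPath: "/abc", assemblyFileId: "asm-1", deletedAt: null },
];
const duplicated: LiveRow[] = [
  ...healthy,
  { itemPath: "/abc", assemblyFileId: "asm-2", deletedAt: null },
];

findMappedByPath(healthy, "/abc", collect); // no warning
const picked = findMappedByPath(duplicated, "/abc", collect); // one warning
console.log(warnings.length, picked?.assemblyFileId);
```

The guard only reports the anomaly and still returns the first row, so behaviour for callers stays the same while the broken idempotency assumption becomes visible in logs.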

@arpandhakal arpandhakal left a comment

Collaborator

Good to go. Let's reduce the number of comments and make them simple. Claude pollutes the codebase with extra comments as of now.

SandipBajracharya merged commit dcfd1f8 into main on Jun 3, 2026
4 checks passed

2 participants