fix(import): assign imported session to current project and directory#15826
Closed
Juhwa-Lee1023 wants to merge 3 commits into
Closed
fix(import): assign imported session to current project and directory#15826Juhwa-Lee1023 wants to merge 3 commits into
Juhwa-Lee1023 wants to merge 3 commits into
Conversation
When importing a session, override both project_id and directory with the current project context (Instance.project.id and Instance.worktree) instead of preserving the original session's project binding. Also include directory in onConflictDoUpdate so that re-importing a session correctly updates the directory to the current project. Fixes anomalyco#15797
Contributor
|
Thanks for updating your PR! It now meets our contributing guidelines. 👍 |
6 tasks
Contributor
|
Closing this pull request because it has had no updates for more than 60 days. If you plan to continue working on it, feel free to reopen or open a new PR. |
Acksell
added a commit
to Acksell/clank
that referenced
this pull request
May 14, 2026
* sync: session export/import substrate (push leg)
Adds the building blocks for syncing opencode sessions alongside code
checkpoints during `clank push --migrate`. Sessions are stored as
opaque opencode export blobs in the same S3 prefix as the code
bundles; one ownership-flip atomically covers both legs.
Empirically pinned `opencode import` semantics first (Step 0 in
TestOpenCodeImportSemantics) — confirmed §E path (a): import
preserves info.id and additive-merges messages by ID, so the
RegisterImportedSession upsert can key on host ULID + external_id.
New primitives:
- pkg/sync/checkpoint/sessionmanifest.go: SessionManifest schema +
versioning, mirrors Manifest's discipline.
- pkg/sync/storage/sessions.go: KeyForSession +
BlobSessionManifest, sibling layout under checkpoints/.
- internal/agent/opencode_export.go: thin wrappers around
`opencode export` / `import` subprocesses with ANSI-stripped
stdout parsing for the imported session ID.
- internal/host/sessions_export.go: Service.ExportSessions —
enumerate sessions for a worktree, abort busy ones immediately
(per user UX call: no idle wait), invoke opencode export per
session. Claude-code sessions are skipped with a warning.
- internal/host/sessions_import.go: Service.RegisterImportedSession
— idempotent upsert keyed on host ULID + worktreeID.
- internal/host/mux/sessions_sync.go: /sync/sessions/{build,upload,
delete,apply-from-urls} handlers mirroring the existing code
trio.
- pkg/sync/sessions.go: Server.PresignSessionPuts +
DownloadSessionURLs.
- pkg/gateway/migrate_sessions.go: gateway orchestration
(pushSessionsToSprite) — fetches the session manifest inline,
mints per-session GET URLs, POSTs to the sprite's
/sync/sessions/apply-from-urls. Wired into handleMigrateWorktree
between code apply and TransferOwnership.
What's NOT in this commit (follow-up):
- Laptop-side push.go orchestration of session build + upload
(gateway-facing HTTP route on the sync server + daemonclient/
syncclient methods).
- Symmetric pull-back session leg (migrate_back.go +
daemonclient.MaterializeResponse session URL fields + pull.go
apply).
- The inject-"please continue" message after migrate (v1.1).
- Auto-sync on regular `clank push` (v1.1).
- Claude-code backend session sync (future).
Tests:
- internal/agent: TestOpenCodeImportSemantics (Step-0 diagnostic),
TestOpenCodeExportImport_RoundTrip.
- internal/host: TestExportSessions_* (skip-claude, empty
worktree, happy-path with real opencode binary),
TestRegisterImportedSession_RoundTrip.
- internal/host/mux: TestSyncSessions{Build,Apply}_* input
validation, malformed manifest rejection.
- pkg/sync: TestPresignSessionPuts_*, TestDownloadSessionURLs_*.
- pkg/sync/checkpoint: TestSessionManifest_* round-trip + version
rejection.
- pkg/sync/storage: TestKeyForSession_* (valid, path-escape,
cross-tenant isolation, shares-checkpoint-prefix).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* sync: wire session leg into clank push --migrate
End-to-end push flow now ships session blobs alongside code:
laptop gateway sprite
────── ─────── ──────
clank push --migrate
├─ PushCheckpoint (code → S3 via presign)
├─ pushSessionLeg
│ ├─ BuildSessionCheckpoint → clank-host quiesces + exports
│ ├─ RequestSessionUploadURLs → /v1/checkpoints/{id}/sessions
│ └─ UploadSessionCheckpoint → clank-host PUTs blobs to S3
└─ MigrateWorktree → handleMigrateWorktree
├─ pushCheckpointToSprite (code apply)
├─ pushSessionsToSprite → /sync/sessions/apply-from-urls
│ (sprite imports via RegisterImportedSession)
└─ TransferOwnership (atomic flip)
New pieces:
- pkg/sync/sessions_handler.go: POST /v1/checkpoints/{id}/sessions
HTTP route on the sync server (tenancy + ownership gated).
- pkg/sync/client/sessions.go: Client.RequestSessionUploadURLs.
- internal/daemonclient/sessions_sync.go: BuildSessionCheckpoint +
UploadSessionCheckpoint targeting clank-host via the local
clankd proxy.
- internal/cli/clankcli/push_sessions.go: pushSessionLeg helper
that orchestrates the three-step flow between PushCheckpoint
and MigrateWorktree. Surfaces interrupted sessions and skipped
Claude sessions to the user.
Tests:
- pkg/sync: TestSessionPresignHandler_{HappyPath,EmptySessions,
UnknownCheckpointReturns404,WrongTenantForbidden}.
Still follow-up:
- Symmetric pull-back session leg (materialize → apply on the
laptop side; needs MaterializeResponse field expansion +
pull.go integration).
- The inject-"please continue" message after migration (v1.1).
- Auto-sync on regular push (v1.1).
- Claude-code backend session sync.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* gateway: narrow laptop-gateway sync-route block to code leg only
The proxy guard in pkg/gateway/proxy.go was blocking the entire
/sync/* namespace on laptop gateways (when cfg.Sync == nil),
including the new /sync/sessions/* family from the session-sync
substrate. Result: `clank push --migrate` failed with a 404 from
the gateway itself before any of the new code ran.
The original block exists to keep arbitrary socket-holding clients
from writing into ~/work/<repo>/ via /sync/build, /sync/builds/*,
/sync/apply-from-urls. /sync/sessions/* does not touch ~/work/ —
its handlers call into the host Service to drive opencode export
/import + host.db upserts, confined to ~/.local/share/opencode/
and the host DB. Same socket-access requirement, different blast
radius.
Carve out /sync/sessions/* via isBlockedCodeSyncRoute. The
existing TestProxy_BlocksSyncPathsWhenSyncNil is extended to
cover /sync/build and /sync/builds/{id}/upload too, and a new
TestProxy_AllowsSessionSyncOnLaptopGateway pins the carve-out
end-to-end.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* host: don't chdir into the source's projectDir on session import
RegisterImportedSession was passing entry.ProjectDir (the SOURCE
host's local path) as cmd.Dir to `opencode import` on the
DESTINATION. On the sprite this manifested as:
apply sessions to sprite: sprite sessions/apply-from-urls
(500, code=session_import_failed): register session 01K…:
opencode import /tmp/clank-session-import-…/01K….json:
chdir /Users/axelengstrom/github.com/acksell/mindmouth:
no such file or directory
`opencode import` reads/writes its own storage (HOME-relative) and
ignores cwd — confirmed by TestOpenCodeImportSemantics. The
projectDir pass-through was speculative "future-proofing" with no
real benefit; remove it on the import path.
Adds TestRegisterImportedSession_IgnoresSourceProjectDir to pin the
regression: the destination MUST succeed even when entry.ProjectDir
points at a path that doesn't exist locally.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* host: rewrite info.directory in opencode blob to destination path
Imported sessions on a sprite were landing under opencode's
synthetic 'global' project because info.directory in the export
blob held the SOURCE host's local path (e.g. /Users/foo/repo on
the laptop), which doesn't resolve on the sprite.
Add a small rewrite shim in RegisterImportedSession: read the
blob, replace info.directory with workRoot/<worktreeID> and
clear info.projectID so opencode rederives it on import. Then
pass the rewritten file to opencode import.
This is a workaround for upstream opencode not supporting a
target-directory or path-rebase flag on import. Once that lands,
both the rewrite and its regression test can be removed in
favor of passing the flag through:
- anomalyco/opencode#15797
- anomalyco/opencode#15826
Tests:
- TestRegisterImportedSession_RewritesDirectoryToDestination
seeds a blob with a deliberately-fake source directory +
projectID, imports via RegisterImportedSession, re-exports
the imported session, and asserts (a) info.directory matches
the destination's workRoot/<worktreeID>, (b) info.projectID
is no longer the source's hash (proxy for opencode having
rederived it).
- buildSyntheticOCBlob is refactored to delegate to a new
parameterized buildOCBlobWithDir so the new test can vary
directory + projectID independently.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* store: host.db time columns → INTEGER unix millis (modernc bug fix)
modernc.org/sqlite v1.x serialises time.Time parameters via
t.String() (including the monotonic-clock suffix m=+... for
time.Now() values), then can't parse the result back. Reproduced
empirically on v1.50.1 (latest as of this commit). Symptom in
production:
daemon: list sessions: sql: Scan error on column index 14,
name "created_at": unsupported Scan, storing driver.Value
type string into type *time.Time
The bug fires whenever any row written via time.Now() is read
back through ListSessions / GetSession. Some rows escape because
.UTC(), .Round(0), or JSON round-trips happen to strip monotonic.
This is why it's intermittent — every dataset has a mix of
parseable and unparseable timestamps.
Fix: store all time columns as INTEGER unix milliseconds, with
boundary conversion in store/sessions.go. modernc never gets to
"interpret" a time.Time at all, so the entire round-trip class
is gone.
Scope: host.db only (sessions + primary_agents). The provisioner-
side clank.db (hosts/worktrees/checkpoints) almost certainly has
the same problem and warrants its own migration, but it's not
firing today and the migration is its own concern.
Schema migration v3:
- Build sessions_v3 / primary_agents_v3 with INTEGER time cols.
- Read every row from the legacy table, scanning time columns
as raw strings (CAST AS TEXT, to avoid the very bug we're
fixing).
- parseLegacyTimeMs (timeparse.go) converts each string to
millis. Covers the four formats observed on real sprite DBs:
* Go String() + monotonic: "... +0000 UTC m=+..."
* Go String() UTC named: "... +0000 UTC"
* Go String() fixed offset: "... +0200 +0200"
* RFC3339Nano (JSON-derived)
Last-ditch fallback: extract millis from the row's ULID
(which encodes them directly), so a single unparseable row
never blocks startup.
- Drop legacy table, rename _v3 to the original name, recreate
indexes. Whole thing runs in a transaction.
Boundary conversion at the store layer:
- UpsertSession: time.Time → timeToMs(t) → int64
- sessionFromRow: int64 → msToTime(ms) → time.Time (UTC)
- SearchSessions: Since/Until filters pass int64 millis
- LastReadAt: sql.NullInt64 mirrors the nullable column
- UpsertPrimaryAgents: time.Now().UnixMilli()
Trade-off: lose sub-millisecond precision. For our use case (UI
sort order, inbox display, sweep timestamps) this is invisible.
Pinned by TestUpsertSession_TimeNowRoundTripsLosslessly, which
asserts at ms precision exactly because that's the new contract.
Tests:
- timeparse_test.go: every observed legacy format parses to
the expected millis; ULID fallback is deterministic; bogus
input + bogus ULID falls back to time.Now().
- TestUpsertSession_TimeNowRoundTripsLosslessly: the explicit
regression test for the original symptom — time.Now() is
inserted, ListSessions reads it back without driver errors.
- TestOpen_MigrationV2ToV3_BackfillsTimeFormats: hand-crafts
a v2 DB with one row per observed format, opens via store.
Open (triggering v3 migration), asserts all six rows survive
with the correct UnixMilli.
- TestOpen_MigrationV2ToV3_UnparseableTimeFallsBackToULID:
proves a single corrupt row doesn't block startup — its
timestamp gets backfilled from the ULID prefix.
- The existing persistence_e2e test that compared UpdatedAt
via time.Equal is loosened to ms precision (the test's
intent — "sweep didn't bump UpdatedAt" — is preserved).
Future direction: the user has expressed interest in migrating
the entire host store to entgo.io/ent, which handles all of this
natively. That's a separate, larger lift; this commit unblocks
the immediate failure mode without committing to that path yet.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* sync: wire session leg into clank pull --migrate
Symmetric to the push side. After the gateway materializes the
sprite's code checkpoint, it now also asks the sprite to export
its sessions, mints presigned PUT URLs from the sync server, and
has the sprite upload session blobs + the session manifest to S3.
The materialize response gains session_manifest_url and
session_blob_urls; the laptop's pull.go forwards those to its own
clank-host's /sync/sessions/apply-from-urls before committing the
ownership flip.
Materialize flow (additions in CAPS):
1. Sprite builds code bundles to local disk.
2. Gateway creates checkpoint row + mints code PUT URLs.
3. Sprite uploads code bundles via PUT URLs.
4. Gateway commits the checkpoint.
5. SPRITE BUILDS SESSION BLOBS TO LOCAL DISK.
6. GATEWAY MINTS SESSION PUT URLs VIA PresignSessionPuts.
7. SPRITE UPLOADS SESSION BLOBS + MANIFEST VIA PUT URLs.
8. Gateway mints GET URLs (code + session) for the laptop.
9. Return materializeResponse with everything.
Laptop flow (additions in CAPS):
1. Materialize.
2. Download + apply code checkpoint locally.
3. CALL LOCAL CLANK-HOST /sync/sessions/apply-from-urls WITH
THE SESSION URLs FROM THE RESPONSE.
4. Commit ownership transfer.
If step 3 fails, commit is skipped — the worktree stays
sprite-owned so the user can retry without a half-migrated state
(code applied locally, sessions still on the sprite).
Empty session manifest (sprite had no opencode sessions in the
worktree) is a silent no-op throughout — no per-session PUT/GET
URLs are minted, and the laptop's apply-from-urls call is skipped
entirely via the empty-URL fast path in ApplySessionCheckpoint.
Files:
- pkg/gateway/migrate_back.go: adds triggerSpriteSessionBuild,
triggerSpriteSessionUpload, deleteSpriteSessionBuild (mirror
the existing code-leg helpers); extends materializeResponse
with session URL fields; threads the session leg through
handleMigrateMaterialize between the code commit and the
GET-URL mint.
- pkg/gateway/migrate_test.go: extends the inline sprite stub
with no-op /sync/sessions/{build,builds/{id}/upload,
builds/{id}} handlers so TestMigrate_TwoPhaseRoundTrip's
empty-sessions path passes.
- internal/daemonclient/migrate.go: extends MaterializeResponse
with SessionManifestURL + SessionBlobURLs (omitempty).
- internal/daemonclient/sessions_sync.go: adds
ApplySessionCheckpoint, a thin wrapper around clank-host's
/sync/sessions/apply-from-urls. Empty manifest URL is a
silent no-op so the pull-side caller doesn't need a special
case.
- internal/cli/clankcli/pull.go: after applyRemoteCheckpoint,
invoke ApplySessionCheckpoint on a NewLocalClient before
CommitMigration. Logs "imported N session(s) locally".
Tests:
- TestApplySessionCheckpoint_EmptyManifestURLIsNoOp pins the
fast-path that makes pull --migrate keep working when the
sprite had no sessions.
- TestApplySessionCheckpoint_ForwardsBodyToDaemon pins the
wire shape against drift between the daemonclient wrapper
and the host mux handler.
- TestApplySessionCheckpoint_PropagatesDaemonError ensures a
daemon-side failure aborts the migration before the
ownership commit.
- TestApplySessionCheckpoint_RejectsEmptyWorktreeID covers
input validation.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* host: don't chdir into a stale LocalPath on session export
Mirror of the import-side fix in commit bd49c8b. ExportSessions
was passing info.GitRef.LocalPath to `opencode export` as cmd.Dir.
That field can hold a path that doesn't exist on the current host
— e.g. a session imported earlier from a laptop has the laptop's
filesystem path baked into project_dir. chdir fails before
opencode runs, surfaced on pull --migrate as:
sprite session build: sprite sessions/build 500:
{"code":"session_export_failed","error":"export sessions: 01K…:
opencode export ses_…: chdir /Users/…/repo:
no such file or directory"}
`opencode export` doesn't need cwd (verified by
TestOpenCodeImportSemantics), so passing "" is correct AND robust.
Regression: TestExportSessions_IgnoresStaleLocalPath stamps
host.db with a definitely-nonexistent SOURCE-style LocalPath,
then asserts ExportSessions still succeeds.
This pair (this commit + bd49c8b) closes the chdir-on-cross-
machine-path footgun in both directions.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* daemonclient: stop silently truncating migrate responses past 4 KiB
MaterializeMigration / CommitMigration / MigrateWorktree all read
the response via:
io.ReadAll(io.LimitReader(resp.Body, 4<<10))
io.ReadAll on a saturated LimitReader returns happily — there's no
"hit the cap" signal — so any response over 4 KiB gets silently
truncated and json.Unmarshal then fails with the unhelpful:
decode materialize response: unexpected end of JSON input
Hit in production by `clank pull --migrate` after the session-leg
work landed: each session adds a presigned S3 URL (~600 bytes
signed) to materializeResponse.SessionBlobURLs, so a worktree with
~5 sessions easily clears 4 KiB.
Fixes:
- Bump cap to 1 MiB. Generous; a worktree would need hundreds
of sessions to approach it.
- Replace io.ReadAll(LimitReader(N)) with readAllCapped(N) that
reads N+1 bytes and ERRORS when actual size > N. Future
overflows surface loudly with "exceeds N-byte cap (would
truncate)" instead of corrupting JSON parsing.
- Apply uniformly to all three migrate calls (CommitMigration
and MigrateWorktree have small responses today but the
silent-failure mode shouldn't be a footgun anywhere).
Tests:
- TestMaterializeMigration_LargeResponseDoesNotSilentlyTruncate
seeds an 8 KiB+ response (well past the old 4 KiB cap, well
under the new 1 MiB cap) and asserts every field round-trips
cleanly. Tightly-coupled to the regression — flips red the
moment anyone re-introduces a sub-cap LimitReader.
- TestMaterializeMigration_OversizedResponseErrorsLoudly streams
1.25 MiB of bytes back and asserts the loud-error path fires
instead of silent corruption. Documents the new contract.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* host: coerce undefined diff before/after to "" before opencode import
Second instance of opencode export emitting values its own import
validator rejects. Surfaced via pull --migrate on a real sprite
session:
Invalid input: expected string, received undefined
path: ["summary", "diffs", 0, "before"]
path: ["summary", "diffs", 0, "after"]
Inside messages[*].info.summary.diffs[*], opencode export
sometimes leaves before/after undefined while file/additions/
deletions are populated. The import validator then refuses the
whole blob — opencode never even starts writing the session,
and our daemon's RegisterImportedSession surfaces the Zod error
verbatim.
Extend the rewriteImportBlob shim (renamed from
rewriteImportBlobDir) to walk every diff entry and coerce
non-string before/after to "". Diff entries that already have
strings are untouched.
Renamed because the helper now applies multiple opencode
export/import workarounds; the call site in
RegisterImportedSession enumerates them. Each workaround can be
deleted independently when opencode tightens its export schema
or relaxes its import validator. No upstream issue identified
yet for the diff-undefined case — would be worth filing.
Tests:
- TestRegisterImportedSession_RepairsUndefinedDiffStrings
seeds a blob with the production shape (file/additions/
deletions set, before/after absent) and asserts import
succeeds. Tightly mirrors the failing payload — flips red
the moment the shim regresses.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* host: revert diff-coercion shim, keep only the directory rebase
The diff-before/after coercion in rewriteImportBlob was paving
over an opencode export/import schema asymmetry: opencode export
emits messages[*].info.summary.diffs[*].{before,after} as
undefined, opencode import then rejects undefined where it
expects strings. Coercing to "" satisfied the validator but
misrepresented the data — we don't actually know what before/
after were on the source, and "" is a lie.
The directory rebase, by contrast, is justified: info.directory
in the source's blob is a filesystem path that genuinely doesn't
exist on the destination. Without rebasing, opencode files the
imported session under a synthetic "global" project. That shim
stays.
Going forward, validator failures from `opencode import` surface
verbatim. Future work in handleSyncSessionsApplyFromURLs can
choose to skip-and-warn rather than abort the whole migration —
that's the architecturally correct response, not "fabricate the
field opencode wanted to see".
Reverts the shim + regression test added in 73f4102. The
TestRegisterImportedSession_RewritesDirectoryToDestination case
covers the remaining (justified) blob rewrite.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* host: skip-and-warn on orphan host.db rows during export
Service.ExportSessions used to abort the entire migration on the
first 'opencode export <external_id>' failure, including the
'Session not found' case where opencode's storage no longer holds
a session that host.db still references. That made one orphan row
fatal to push --migrate / pull --migrate.
How an orphan happens: the user (or any tool) runs 'opencode
session delete' directly. opencode prunes its storage; clank-host
keeps watching its own host.db; the next migration hits the dead
external_id and the whole flow dies at the first row.
Surfaced today after my own diagnostic exec on a sprite
accidentally deleted a session; the user's next pull --migrate
failed with:
sprite session build: sprite sessions/build 500:
{"code":"session_export_failed","error":"… Session not found:
ses_1dd78ccccffeN4I5i9kcgvmAkh"}
Fix: detect 'Session not found' from opencode's stderr and treat
it as a SkippedSession with a clear reason ("opencode storage
missing this session (host.db orphan; was it deleted via opencode
CLI?)") instead of returning an error from ExportSessions. Other
opencode-export failures (chdir errors, JSON corruption, etc.)
still abort the migration loudly — the skip is narrowly scoped to
"the row is provably orphaned".
Substring-matches on opencode's English error message because the
opencode CLI doesn't surface error codes to subprocess callers.
Documented; if opencode rephrases, tests catch it and we update
the matcher.
We deliberately do NOT auto-delete the host.db row. Leaving it in
place lets the user see what was lost (via the Skipped slice
surfaced through the CLI) before deciding whether to manually
clean up. Auto-self-heal is a reasonable follow-up but it's its
own design choice that deserves its own PR.
Test: TestExportSessions_SkipsMissingOpencodeSession seeds two
host.db rows — one mapping to a real opencode session, one with
a fabricated external_id opencode has never seen — and asserts
ExportSessions returns 1 entry + 1 Skipped, no error.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* clankcli: refuse migrations on incompatible opencode versions
Production case that motivated this: laptop on opencode 1.3.15,
sprite on 1.14.48. The 1.14 export schema replaced before/after
diff strings with patch/status — a forward-incompatible schema
change between minor versions. clank pull --migrate would happily
upload code, mint URLs, materialize on the laptop, then fail at
opencode import with an opaque Zod validator error and a
half-migrated state.
This commit catches the mismatch before any work happens.
Policy (in agent.AssertOpencodeVersionsCompatible):
- exact match → silent OK
- patch-only diff → log a one-line warning, proceed
- minor or major diff → refuse with upgrade hint
- either side empty/bogus → refuse (can't reason about compat)
Architecture:
- agent.OpenCodeVersion: runs `opencode --version` on the
local host, returns the bare version string.
- clank-host mux gains GET /opencode-version, returning
{"version": "1.14.48"}. Reachable through both the local
clankd proxy AND the cloud gateway proxy without any
new firewall changes (it's not under /sync/*).
- daemonclient.OpenCodeVersion: thin wrapper around the new
endpoint. Symmetric — same client targets both ends.
- clankcli.assertOpencodeCompatible: queries both versions,
runs the policy check, formats the user-facing error with
upgrade instructions.
- push.go and pull.go call it ONCE up-front (before
PushCheckpoint / MaterializeMigration), so we fail before
any expensive S3 / sprite work. Fails with:
opencode version mismatch (local=1.3.15, remote=1.14.48):
minor version differs — opencode has shipped breaking
schema changes between minors before; upgrade one side
to match the other before retrying
Upgrade either side to match the other:
laptop: `opencode upgrade`
remote: the sprite picks up a fresh opencode whenever
it's reprovisioned; ask your admin or rebuild
the sprite image
Tests:
- TestAssertOpencodeVersionsCompatible: 9-case table covering
exact match, patch drift, minor drift (the production
repro), major drift, empty/garbage inputs, and the
two-segment "1.14" parses-as-(1,14,0) edge case.
- TestOpencodeVersionEndpoint_ReturnsBareVersion: smoke-tests
the mux handler against a real opencode binary.
Refactor: pull.go now constructs localCli once at the top and
reuses it both for the version check and the later
ApplySessionCheckpoint call (was a redundant NewLocalClient
inside the session-apply block).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* flyio: pin opencode version on the sprite to agent.PinnedOpencodeVersion
Closes the door on the version-skew bug class at the source. The
sprite-side install used to do `bun install -g opencode-ai`, which
always picks up the latest npm version — meaning every fresh
sprite gets whatever opencode happened to be released the day it
was provisioned. Combined with laptops that drift behind on
manual installs, you get the 1.3 ↔ 1.14 schema mismatch we lived
through.
This commit makes the sprite always run a deliberately-chosen
opencode version (currently 1.14.49, defined in
agent.PinnedOpencodeVersion), with three layered defenses:
1. Probe-and-upgrade (not just probe-and-skip).
ensureOpenCodeInstalled now captures the installed version
and only fast-paths if it exactly matches the pin. A drift
(older opencode lingering on a long-lived sprite) triggers
reinstall instead of being silently accepted.
2. Install at the pinned version explicitly.
`bun install -g opencode-ai@<pinned>` (or npm fallback). The
pin is interpolated via __PINNED_VERSION__ template
substitution, gated by validatePinForShell which rejects
anything outside [A-Za-z0-9.-] — defense against a future
pin constant that would shell-inject.
3. Post-install version verification.
The script's tail now compares `opencode --version` to the
pin and exits 1 if they disagree. Catches the (rare but
possible) case where bun's registry routing serves a
different version than requested. Without this, we'd
silently lie to the caller about install success.
Companion change in clankcli: when the laptop-side version-skew
check refuses a migration (commit 7768138), the error message
now suggests the EXACT pinned version to upgrade to, not just
"match the other side":
clank pins opencode at version 1.14.49. Upgrade your laptop
to match:
opencode upgrade --version 1.14.49
Why pinning at the clank level + runtime check is belt-and-
suspenders, not redundancy:
- Pinning makes the common case impossible (sprite always at
the right version on next EnsureHost).
- The runtime check catches what pinning can't: laptops are
outside clank's control. A user on a 5-year-old opencode
that ignores `opencode upgrade` would still be caught.
- Together they give "boring, predictable opencode versions
everywhere" as the default state.
Bumping the pin is a deliberate, reviewable change:
1. Update agent.PinnedOpencodeVersion.
2. `make install` — laptop binaries learn the new pin.
3. Sprites probe-and-reinstall on next EnsureHost (~30-90s
one-shot cost per sprite).
4. Laptop users `opencode upgrade --version <new>` — runtime
check refuses migrations until they do.
Tests:
- TestValidatePinForShell_Allows / _Rejects: shell-injection
boundary (only [A-Za-z0-9.-] passes).
- TestPinnedOpencodeVersion_PassesShellSafetyCheck: meta-test
that the constant clank actually ships with isn't shaped in
a way the safety check rejects.
- TestOpencodeInstallScript_HasPinnedSubstitution: regression
against a future refactor that removes the
__PINNED_VERSION__ placeholder while leaving the
substitution call (which would silently install
opencode-ai@__PINNED_VERSION__, a nonexistent package).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* flyio: drop the validatePinForShell theatre
The pin is a Go constant compiled into clank's binary. It's not
configurable at runtime, not read from a file, not derived from
user input. Defending it against shell-injection assumes a threat
actor that doesn't exist: any clank developer who wanted to run
arbitrary shell on a sprite has direct access to the install
script three lines above.
Delete validatePinForShell, the call to it in
ensureOpenCodeInstalled, and the entire opencode_pin_test.go
file (-108 LOC).
Fat-finger protection comes for free: a malformed pin produces
`bun install -g opencode-ai@<bogus>` which npm rejects loudly
("no such version"), failing the install before anything
dangerous executes.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* clankcli: --timing flag printing per-phase breakdown of push/pull --migrate
Refactor-prep instrumentation. Adds opt-in latency measurement
across the laptop-side phases of clank push --migrate and
clank pull --migrate so we can see where wall-clock time
actually goes before deciding what to optimize.
Enable with --timing or CLANK_TIMING=1. Output (sorted largest-
first, with percentage of total):
pushed checkpoint 01K... (HEAD aca7ca8d)
pushed 5 session(s) for checkpoint 01K...
migrated worktree 01K... → remote/01K...
timing (sorted, largest first):
migrate worktree (gateway) 9.7s 44%
build session export 8.3s 38%
upload sessions 2.1s 10%
push checkpoint 1.4s 6%
version compatibility check 400ms 2%
request session upload URLs 150ms 1%
──────────────────────────────────────────
total 22.1s
Phases instrumented:
push --migrate
register worktree (first push only)
push checkpoint (git bundle build + S3 upload + commit)
version compatibility check (laptop + sprite version round-trip)
build session export (clank-host enumerate + opencode export per session)
request session upload URLs (gateway presign call)
upload sessions (clank-host PUTs blobs to S3)
migrate worktree (gateway) (EnsureHost + sprite apply for code+sessions + ownership flip)
pull --migrate
version compatibility check
materialize (gateway) (sprite build + presign + GETs minted)
apply code locally (download bundles + checkpoint.Apply)
apply sessions locally (clank-host /sync/sessions/apply-from-urls)
commit migration (ownership flip)
Design choices:
- Disabled by default; zero-overhead when off (one extra
method dispatch per Start, immeasurable at ms granularity).
- Sorted largest-first because that's how you read it for
"what to optimize next." Percentages because absolute
numbers don't tell you ratios at a glance.
- Closure-on-call shape (done := timer.Start(...); ...; done())
instead of wrapping every operation in a func — keeps the
call sites readable when most phases produce a value the
next line uses.
- humanDuration formats at the right granularity (µs / ms /
s.x) instead of d.String()'s default "1.234567s" noise.
What's NOT measured here:
- Server-side (gateway) sub-phases of "migrate worktree"
and "materialize". These are single round trips from the
laptop's perspective; subdividing them needs a Server-Timing
response header, which is a follow-up once the laptop
numbers say "we always lose 8s in materialize, let's
decompose it."
- Sub-phases of cli.PushCheckpoint (bundle build vs S3
upload vs commit). One call in pkg/sync/client today;
breaking it apart adds API surface — defer until the
higher-level numbers say it's worth it.
- Distributed tracing. Way too heavy for "I want to know
where 22 seconds go."
Tests:
- TestPhaseTimer_DisabledIsNoOp: zero-overhead when off.
- TestPhaseTimer_RecordsAndSorts: sort order and total
percentages.
- TestPhaseTimer_StartActuallyTimes: end-to-end timing.
- TestHumanDuration: formatting boundaries.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* clankcli: run version check BEFORE PushCheckpoint on --migrate
Wiring bug from the previous commit: the version-compatibility
check was inside the `if alsoMig {}` block but AFTER
cli.PushCheckpoint, so a sprite that's unreachable or runs the
wrong opencode version still cost the user a full code upload
before the migration aborted.
Move the pre-flight (NewRemoteClient + NewLocalClient + version
check) ahead of PushCheckpoint when --migrate is set. Both
clients are reused downstream for the session leg + migrate call;
no extra round trips.
Also improve the user-facing error when the version check fails
on `host unavailable` (the gateway's signal that EnsureHost
failed — sprite suspended, provisioner creds expired, etc.).
Previously the error read like a version mismatch:
Error: read remote opencode version: opencode-version:
daemon returned status 502: host unavailable
Now it tells the user what actually went wrong and where to look:
Error: can't reach the remote sprite: ...host unavailable
The gateway returned 'host unavailable', which means its
provisioner.EnsureHost failed. Check the gateway's logs for
the underlying reason. For the dev stack:
make docker-logs
Look for a line like 'gateway: ensure host for user …: …'.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* clank-host: generalise /opencode-version → /software-manifest, cache it,
asymmetric timeouts in compat check
Two changes wrapped into one because they share a stage:
1) Replace the narrow GET /opencode-version endpoint with a general
GET /software-manifest. The wire format is forward-extensible —
today populated only with opencode's version, but the struct
tag layout reserves room for claude, clank-host's own version,
and future tools without breaking the protocol. The CLI's
compat check now reads .OpenCode.Version off the manifest
instead of a top-level scalar.
Replacement is a clean swap: the only consumer of the old
endpoint was clankcli's compat check, which this commit moves
atomically. -113 LOC of one-off plumbing files, +123 LOC of
reusable manifest plumbing. Net wash on size, but the new
endpoint is what we'd want to evolve toward anyway.
2) Cache the manifest in-process on clank-host. First call probes
`opencode --version`, subsequent calls serve from memory in
nanoseconds. Eliminates ~300-500ms of JS startup cost from
every migration's compat-check phase — which timing
instrumentation just showed was the single largest phase even
when both ends were warm (1.1-1.6s consistently).
Cache invalidation: only on clank-host restart. Safe because
clank-host restarts whenever its binary changes (sidecar-hash
mechanic in the flyio provisioner), and the only reason
opencode's version would change under a running clank-host
is an out-of-band manual upgrade — which the runtime compat
check would catch on the laptop side immediately anyway.
Asymmetric timeouts in assertOpencodeCompatible:
localCompatCheckTimeout = 5 * time.Second
remoteCompatCheckTimeout = 2 * time.Minute
The previous symmetric 10s timeout caused spurious "context
deadline exceeded" errors on first push to a fresh sprite —
EnsureHost on a cold sprite legitimately takes 30-90s for the
embedded clank-host install + opencode install. Local clank-host
serves /software-manifest from cache in milliseconds; remote
clank-host may cold-start. Independent budgets reflect that.
Tests:
- TestSoftwareManifestEndpoint_ReturnsOpenCodeVersion: smoke-tests
the new endpoint against a real opencode binary.
- TestSoftwareManifestEndpoint_SubsequentCallsAreFast: pins
the caching contract — second call must be <50ms (catches
the regression where someone removes the cache and falls
back to re-shelling opencode per request, which would be
~200ms+ minimum from JS startup).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* mux: trim verbose handleSoftwareManifest comment
* make docker-down: remove volumes too
* fix: address coderabbit correctness findings on PR #18
- opencode_compat: parse before comparing, so unparseable-but-equal
inputs still refuse, and 1.14 ≡ 1.14.0 (semantic patch equality)
- software_manifest: detach sync.Once probe from the caller's ctx so
a canceled first request can't poison the process-global cache
- timeparse: pass pure-digit input through as int64 millis to make
parseLegacyTimeMs idempotent — a half-committed v3 migration can
re-enter without rewriting timestamps via fallbackTimeMs/now()
- sessions.ListSessionsByWorktree: return []SessionInfo{} (not nil)
for empty worktreeID to honor the docstring
- clankcli: error message says 'opencode upgrade v<version>'
(positional), not the wrong --version flag
Regression tests added for each, including a half-committed v3
re-entry case (sessions table already INTEGER, user_version
hand-rewound to 2).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* gateway: surface S3/sprite error bodies on the session-migrate path
- fetchURL non-200 path includes a 4 KiB body preview so SigV4 /
object-not-found XML errors stop showing up as opaque "GET 403"
- session retry loop captures the last sprite error and wraps it
into the "exhausted retries" message so the underlying cause
isn't silently dropped after url_expired ping-pong
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* test/doc: opt the opencode-import semantics test into t.Parallel and
retract the unverified phaseTimer concurrency claim
- TestOpenCodeImportSemantics is hermetic (per-test TempDir + per-cmd
env array), so add t.Parallel per CLAUDE.md's "use t.Parallel where
safe" rule
- phaseTimer's doc claimed nested/concurrent Start calls were fine, but
every caller (push/pull/push_sessions) drives migration phases
sequentially. Drop the aspirational sentence rather than add a
synchronization that nothing exercises — keeps the contract honest.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* gateway: TODO(coderabbit) marker for the duplicated sprite-request blocks
Six near-identical NewRequestWithContext+Authorization+Transport blocks
across the code and session legs of migrate_back.go are real
duplication, but collapsing them is out of scope for the
import-export-sessions PR (+5936 already). Anchor the deferred work at
the first occurrence so a future contributor editing any of the six
sites lands on the link.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* gateway: commit checkpoint AFTER session upload in materialize handler
CodeRabbit caught a real ordering bug. handleMigrateMaterialize used to
run g.cfg.Sync.CommitCheckpoint — which advances
latest_synced_checkpoint via UpdateWorktreePointer — BEFORE the session
leg. If session-build or session-upload failed afterwards, the response
was a 502 to the laptop but the worktree pointer had already moved to a
checkpoint whose session blobs were missing from storage. Subsequent
push/pull operations that read LatestSyncedCheckpoint would then 404 on
session-manifest.json and silently skip sessions.
Re-ordered: session build + session upload come first; CommitCheckpoint
only runs once both code AND session uploads succeeded. The session
DownloadSessionURLs call moves after the commit (it gates on
UploadedAt). The "no sessions in this worktree" path is unchanged.
Regression test pins the invariant: with a sprite mock that succeeds
on code-leg and returns 500 on session-upload, the worktree's
LatestSyncedCheckpoint must remain at the pre-materialize value. Test
fails without the fix (pointer advances to the new checkpoint id),
passes with it.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Issue for this PR
Closes #15797
Type of change
What does this PR do?
opencode importruns insidebootstrap(process.cwd(), ...), which sets up theInstancecontext for the current directory. However, the imported session row was built entirely from the exported session data viaSession.toRow(exportData.info), which preserves the original session'sdirectory. That means the session ends up pointing at wherever it was exported from, not the current project.Two fixes in one line each:
directorywithInstance.worktree(the resolved worktree of the current project) so the imported session is associated with the correct project directory.directoryto theonConflictDoUpdateset so that re-importing an already-present session also updates its directory, not justproject_id.How did you verify your code works?
bun test test/cli/import.test.ts→ 4 pass, 0 fail.bootstrap(process.cwd(), cb)→Instance.provide({ directory: process.cwd(), ... })→ insidecb,Instance.worktreeresolves to the current project's worktree. ConfirmedInstance.worktreeis the correct field to use (it is whatSession.createuses when creating new sessions in the same context).Screenshots / recordings
N/A — no UI changes.
Checklist