
Atlas foundation: codebase-knowledge layer in Pathfinder (off-by-default)#94

Merged
jpr5 merged 15 commits into main from blitz/atlas-foundation/integration on Jun 6, 2026

jpr5 commented Jun 6, 2026

Summary

Adds the Atlas foundation to Pathfinder: an agent-maintained codebase-knowledge layer (the "codebase-memory" quadrant alongside auto-memory, handoffs, and episodic memory). This PR lands the foundation dormant and off by default. The schema, providers, gardener, ratification endpoints, webhook ingestion, analytics, and a thin atlas CLI are all present and tested, but no operational loop is scheduled, and existing Pathfinder users see no behavior change.

What's included

  • Additive DB migration: atlas_seed_entries (durable inputs: decisions, corrections, inbox, schema) and atlas_cache_pages (regenerable derived pages). Durability attaches to the inputs, not to the wiki pages derived from them.
  • AtlasDataProvider (src/db/atlas.ts): seed and cache persistence.
  • Gardener (src/indexing/atlas-gardener.ts): regenerates cache pages from the seed, with a hardened error path (failures are logged and error bookkeeping is guarded).
  • Ratification endpoints (src/server.ts): body-param routes for approving or rejecting seed entries. The path-param variants were intentionally dropped (see Scope notes).
  • GitHub webhook PR ingestion (src/webhooks/): capture is webhook-driven on the server side, not agent-driven.
  • Retrieval analytics (src/db/analytics.ts): Atlas retrievals are excluded from the standard /analytics.
  • atlas-cli.ts: a thin, stateless MCP client that gives agents (especially Codex, which struggles with MCP reconnects) a first-class atlas search "<question>" access path without configuring an MCP server.

Scope notes

  • Off-by-default. Wired to the library boundary, not the running service. Merging this does not turn anything on.
  • Path-param ratification routes were dropped in favor of the body-param routes — the path variants introduced a double-URL-decode bug on slash-bearing keys (Express 5 already decodes wildcard segments) and were redundant. Body routes are the single supported path.

Deferred wiring follow-up (pilot prerequisite — NOT in this PR)

The operational loop is a deliberate follow-up, required before the pilot:

  • Gardener scheduler (periodic cache regeneration)
  • Retrieval-metric endpoint surfacing
  • service: session tagging on retrievals
  • seed_path wiring (seed lives in a private backoffice sidecar for the pilot, not in-repo)

Pilot repos: copilotkit/copilotkit + ag-ui-protocol/ag-ui.

Test plan

  • tsc --noEmit — 0 errors
  • prettier — clean
  • full vitest suite — 3673/3673 passing
  • build — succeeds
  • independent 7-agent CR + confirmation round converged to zero
  • Sandbox E2E (PGlite, reusing test fixtures, deterministic gardener stub) MUST run before merge; runbook prepared

jpr5 added 15 commits June 5, 2026 23:45
A transient failure in markAtlasCachePagesStaleForSources caused
executeJob to reject, so onReindexComplete never fired for a reindex
that actually succeeded — suppressing bash-instance refresh, llms.txt/
faq.txt cache clearing, and the reindex audit. Wrap the Atlas cache
invalidation call in its own try/catch that logs and continues, keeping
it before the callback but unable to suppress it.
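A minimal sketch of the ordering this fix describes. `executeJob`, `markAtlasCachePagesStaleForSources`, and `onReindexComplete` are named in the commit, but the `ReindexDeps` shape, the `runReindex` step, and all parameter lists below are hypothetical simplifications, not Pathfinder's actual signatures.

```typescript
// Hypothetical dependency bundle; Pathfinder's real signatures are not shown in this PR.
interface ReindexDeps {
  runReindex: (sources: string[]) => Promise<void>;
  markAtlasCachePagesStaleForSources: (sources: string[]) => Promise<void>;
  onReindexComplete: (sources: string[]) => void;
  log: (message: string) => void;
}

async function executeJob(sources: string[], deps: ReindexDeps): Promise<void> {
  // A failure in the reindex itself is real and still rejects the job.
  await deps.runReindex(sources);

  // Atlas invalidation runs before the callback, but its own try/catch
  // logs and continues, so a transient failure cannot suppress the callback.
  try {
    await deps.markAtlasCachePagesStaleForSources(sources);
  } catch (err) {
    deps.log(`Atlas cache invalidation failed; continuing: ${String(err)}`);
  }

  // Per the commit, bash-instance refresh, llms.txt/faq.txt cache clearing,
  // and the reindex audit depend on this callback firing.
  deps.onReindexComplete(sources);
}
```

A genuine failure in `runReindex` still rejects; only the Atlas invalidation step is demoted to log-and-continue.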
The per-page catch in gardenAtlasCachePages persisted the generation
error to the DB but never logged it, leaving operators blind. Worse, the
recordAtlasCachePageGenerationError call was unguarded: if it threw (e.g.
"Atlas cache page not found" on a concurrently deleted/re-keyed row, or
any transient DB error), the rejection escaped the loop and aborted the
entire gardening pass, losing all prior progress and never returning a
summary.

Now the generation failure is logged via console.error, and the
bookkeeping call is wrapped in its own try/catch that logs and continues
so a single page's bookkeeping failure can't poison the batch. Adds a
red-green test covering the bookkeeping-throws case.
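The fixed loop can be sketched as below, assuming a hypothetical `GardenerDeps` interface (`regenerate`, `logError`) and a `{ regenerated, failed }` summary shape. Only `gardenAtlasCachePages` and `recordAtlasCachePageGenerationError` come from the commit, and their real parameters may differ.

```typescript
// Hypothetical types for illustration; the real AtlasDataProvider API may differ.
interface CachePage {
  key: string;
}

interface GardenerDeps {
  regenerate: (page: CachePage) => Promise<void>;
  recordAtlasCachePageGenerationError: (key: string, message: string) => Promise<void>;
  logError: (message: string) => void;
}

interface GardenSummary {
  regenerated: number;
  failed: number;
}

async function gardenAtlasCachePages(pages: CachePage[], deps: GardenerDeps): Promise<GardenSummary> {
  const summary: GardenSummary = { regenerated: 0, failed: 0 };
  for (const page of pages) {
    try {
      await deps.regenerate(page);
      summary.regenerated++;
    } catch (err) {
      summary.failed++;
      const message = err instanceof Error ? err.message : String(err);
      // Log the generation failure so operators are no longer blind to it.
      deps.logError(`Atlas gardener: failed to regenerate ${page.key}: ${message}`);
      // Guard the bookkeeping write: if it throws (e.g. the row was deleted
      // concurrently), log and move on instead of aborting the whole pass.
      try {
        await deps.recordAtlasCachePageGenerationError(page.key, message);
      } catch (bookkeepingErr) {
        deps.logError(`Atlas gardener: could not record error for ${page.key}: ${String(bookkeepingErr)}`);
      }
    }
  }
  // Always reached, so prior progress is kept and a summary is returned.
  return summary;
}
```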
- parseSseMessages now skips empty/whitespace `data:` frames (keepalives)
  and wraps per-event JSON.parse so unparseable frames are skipped instead
  of crashing the search command with an opaque "Unexpected end of JSON
  input" error.
- DEFAULT_TOOL is now "atlas-search" to match the Atlas tool name in
  pathfinder.example.yaml, so `atlas search "x"` targets Atlas by default
  instead of the docs search tool.

Adds red-green tests: an empty `data:` SSE frame that previously crashed,
and a default-tool assertion pinned to "atlas-search".
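A sketch of frame parsing with both guards, assuming `parseSseMessages` takes the raw SSE body as a string and returns the decoded `data:` payloads (the real signature is not shown). It follows the SSE convention of separating events with blank lines, joining multi-line `data:` fields, and stripping one leading space after the colon.

```typescript
// Sketch only: skips keepalive (empty) and unparseable frames instead of throwing.
function parseSseMessages(body: string): unknown[] {
  const messages: unknown[] = [];
  // Normalize CRLF, then split into events on blank lines.
  for (const event of body.replace(/\r\n/g, "\n").split("\n\n")) {
    const data = event
      .split("\n")
      .filter((line) => line.startsWith("data:"))
      .map((line) => line.slice("data:".length).replace(/^ /, ""))
      .join("\n");
    // Empty or whitespace-only data: a keepalive, not a message.
    if (data.trim() === "") continue;
    try {
      messages.push(JSON.parse(data));
    } catch {
      // Unparseable frame: skip it rather than crash the search command.
    }
  }
  return messages;
}
```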
…path keys

Finding 1 (HIGH): approveAtlasCandidate silently returned 200 without
queuing a reindex when no orchestrator was wired (Atlas sources but no
search/knowledge tools). Now log a loud, actionable error and surface
reindexQueued:boolean in the JSON response. The orchestrator-present 200
path is unchanged.
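A framework-agnostic sketch of the Finding 1 behavior. The `Orchestrator` interface, `queueReindex`, `markApproved`, and the `{ status, body }` return shape are hypothetical stand-ins for the Express handler in src/server.ts; only `approveAtlasCandidate` and the `reindexQueued` response field come from the commit.

```typescript
// Hypothetical shapes; the real handler's internals are not shown in this PR.
interface Orchestrator {
  queueReindex: (reason: string) => void;
}

interface ApproveResult {
  status: number;
  body: { reindexQueued: boolean };
}

function approveAtlasCandidate(
  canonicalKey: string,
  markApproved: (key: string) => void,
  orchestrator: Orchestrator | undefined,
  logError: (message: string) => void,
): ApproveResult {
  markApproved(canonicalKey);
  let reindexQueued = false;
  if (orchestrator) {
    orchestrator.queueReindex(`atlas approval: ${canonicalKey}`);
    reindexQueued = true;
  } else {
    // Previously a silent 200; now loud, and the response says so.
    logError(
      `Atlas candidate ${canonicalKey} approved but no reindex was queued: ` +
        "no orchestrator is wired (configure a search/knowledge tool alongside Atlas sources).",
    );
  }
  return { status: 200, body: { reindexQueued } };
}
```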

Finding 2: the path-param approve/reject routes used :canonicalKey, which
a literal "/" in a real key (e.g. "github-pr:atlas:owner/repo:42") would
truncate, addressing the wrong key. Switch to an Express 5 wildcard param
(*canonicalKey) and reconstruct/decode the full key in atlasCanonicalKey,
so both %2F-escaped and literal-slash keys round-trip. Body-based routes
are untouched.

Tests: add red-green coverage for both findings in
atlas-ratification-endpoints.test.ts.
The path-param wildcard routes (POST /api/atlas/candidates/*canonicalKey/approve
and /reject) double-decoded the key (Express 5 decodes wildcard
segments, then decodeURIComponent ran again, corrupting %XX keys), were
body/path-inconsistent, and were fully redundant with the working
body-based routes. Drop both registrations and the now-unused
atlasCanonicalKey(req) helper. Keep the body routes, atlasCanonicalKeyFromBody,
and the approve-without-orchestrator fix. Convert the surviving tests to the
body route and remove the path-param-only slash-key test.
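The double-decode corruption can be reproduced with plain `encodeURIComponent`/`decodeURIComponent`. This is a simplified illustration: it treats the key as a single encoded segment, the function names are hypothetical, and the sample key `note:50%41off` is invented to contain a literal `%41`.

```typescript
// Client percent-encodes the key into the path, Express 5 decodes it once,
// and the removed helper then decoded it a second time.
function decodeTwice(canonicalKey: string): { afterExpress: string; afterHelper: string } {
  const onWire = encodeURIComponent(canonicalKey);
  const afterExpress = decodeURIComponent(onWire); // Express 5's own decode
  const afterHelper = decodeURIComponent(afterExpress); // the redundant second decode
  return { afterExpress, afterHelper };
}

// The body route never percent-decodes: the key travels as a JSON string.
function viaJsonBody(canonicalKey: string): string {
  const requestBody = JSON.stringify({ canonicalKey });
  return (JSON.parse(requestBody) as { canonicalKey: string }).canonicalKey;
}
```

The second decode turns the literal `%41` into `A`, silently addressing a different key; a key ending in a bare `%` would instead make the second `decodeURIComponent` throw a `URIError`. The JSON body route avoids both failure modes.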
Add an `atlas feedback` subcommand that wraps the submit-feedback collect
tool through a shared callTool helper. Harden response-frame selection:
coerced JSON-RPC id matching, id-less error/result fallback, result.isError
surfaced as exit 1, missing-response now fails loud, and non-array content
is guarded. Add fail-loud guards for the `--for` and `--tool` flags.
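A sketch of the response-frame selection rules listed above, assuming JSON-RPC 2.0 frames whose `tools/call` result follows MCP's `{ content: [...], isError? }` shape. `selectResponse`, `toOutcome`, `RpcFrame`, and the exit-code object are hypothetical names; the real `callTool` helper may be structured differently.

```typescript
// Hypothetical frame shape for a JSON-RPC 2.0 response to tools/call.
interface RpcFrame {
  id?: string | number | null;
  result?: { isError?: boolean; content?: unknown };
  error?: { code: number; message: string };
}

interface ToolOutcome {
  exitCode: number;
  text: string;
}

function selectResponse(frames: RpcFrame[], requestId: number): RpcFrame {
  // Coerce ids to strings so "7" and 7 match.
  const byId = frames.find((f) => f.id != null && String(f.id) === String(requestId));
  // Fall back to an id-less frame that still carries a result or an error.
  const fallback = frames.find((f) => f.id == null && (f.result !== undefined || f.error !== undefined));
  const frame = byId ?? fallback;
  // Missing response: fail loud instead of printing nothing.
  if (!frame) throw new Error(`No JSON-RPC response for request id ${requestId}`);
  return frame;
}

function textOf(part: unknown): string {
  if (typeof part === "object" && part !== null && "text" in part) {
    const text = (part as { text: unknown }).text;
    return typeof text === "string" ? text : "";
  }
  return "";
}

function toOutcome(frame: RpcFrame): ToolOutcome {
  if (frame.error) return { exitCode: 1, text: frame.error.message };
  const content = frame.result?.content;
  // Guard non-array content rather than assuming an array of parts.
  const parts = Array.isArray(content) ? content : [];
  const text = parts.map(textOf).filter((t) => t !== "").join("\n");
  // A tool-level error (result.isError) maps to exit code 1.
  return { exitCode: frame.result?.isError ? 1 : 0, text };
}
```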
jpr5 merged commit d43b88e into main Jun 6, 2026
5 checks passed
jpr5 deleted the blitz/atlas-foundation/integration branch June 6, 2026 19:09
jpr5 added a commit that referenced this pull request Jun 6, 2026
## Summary

Version-only release cut. Bumps `@copilotkit/pathfinder` **1.13.3 →
1.14.0** so the version-gated `publish-release.yml` workflow fires on
merge and publishes `@copilotkit/pathfinder@1.14.0` to npm — carrying
the new `bin.atlas` CLI. The server/source code is **unchanged from
#94** (commit `d43b88e`), which is already deployed to prod; this PR
only changes the version (`package.json`, `package-lock.json`,
`src/cli.ts`) and adds a CHANGELOG entry.

### Why
- The npm publish is version-gated: pushing to `main` publishes only
when `package.json` version is unpublished. `1.14.0` is not yet on npm
(verified `npm view @copilotkit/pathfinder@1.14.0` → 404), so merging
this fires the publish.
- Unblocks internal-skills #121, which needs to pin
`@copilotkit/pathfinder@1.14.0` for the `atlas` CLI.
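The gate's decision reduces to a membership check. This hypothetical `shouldPublish` helper only mirrors the rule described above; the actual logic lives in `publish-release.yml` and is not shown here. The published-version list would come from the registry, for example via `npm view <package> versions --json`.

```typescript
// Hypothetical helper; not the workflow's actual implementation.
function shouldPublish(localVersion: string, published: readonly string[]): boolean {
  // npm refuses to republish an existing version, so the gate fires only
  // for a version the registry has never seen.
  return !published.includes(localVersion);
}
```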

### What 1.14.0 ships (all from #94, now released)
- **`atlas` CLI** — first-party client as `bin.atlas`: `atlas search` +
`atlas feedback`, with hardened `tools/call` response handling.
- **`prepublishOnly` build guard** — the published tarball always ships
a fresh `dist/` (incl. `dist/atlas-cli.js`).
- **Atlas foundation** — off-by-default codebase-knowledge layer:
additive schema, ratification endpoints, gardener, webhook PR ingestion.
Disabled unless explicitly enabled; existing deployments unaffected.

### Bump rationale
Minor bump (new additive `atlas` CLI feature, backward compatible). Repo
uses a plain `package.json` version bump (no changesets/release-please).
`src/cli.ts` `.version()` bumped in lockstep to satisfy the
`version-sync` CI gate and the publish workflow's "Verify CLI version
matches package version" check.
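A TypeScript sketch of what the version-sync gate checks. The real gate is `scripts/check-version-sync.sh`; `checkVersionSync` is a hypothetical illustration, and its regex assumes `src/cli.ts` passes the version as a string literal to `.version(...)`.

```typescript
// Compares package.json's version with the literal passed to .version(...) in the CLI source.
function checkVersionSync(
  packageJsonText: string,
  cliSource: string,
): { ok: boolean; pkg: string; cli: string | null } {
  const pkg = (JSON.parse(packageJsonText) as { version: string }).version;
  // Assumes a call like program.version("1.14.0") with a quoted literal.
  const match = cliSource.match(/\.version\(\s*["']([^"']+)["']/);
  const cli = match ? match[1] : null;
  return { ok: cli === pkg, pkg, cli };
}
```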

## Local gate results (node 25 / `/tmp/pf-release`)
- prettier `--check` (package.json, src/cli.ts, CHANGELOG.md): **clean**
- `scripts/check-version-sync.sh`: **✓ in sync (1.14.0)**
- `npx tsc --noEmit`: **0 errors**
- `npm run build`: **succeeds**, emits `dist/atlas-cli.js`,
`dist/cli.js`, `dist/index.js`
- `npm test`: **3712 passed (254 files)** after build (CI test job
builds before testing, matching this order)

## Merge note
Do **not** auto-merge. The orchestrator will merge after the prod deploy
is confirmed healthy; merge fires the npm publish.