Pathfinder gap-report remediation: hybrid search + docs repoint + ag-ui sources + chunker/durability #93

Merged
jpr5 merged 8 commits into main from blitz/pathfinder-gaps/integration-engine on Jun 8, 2026

jpr5 (Contributor) commented Jun 6, 2026

What this PR actually is

This is the Pathfinder gap-report remediation (the 30-day docs-MCP gap analysis). Merging it deploys to the live docs-MCP (mcp.copilotkit.ai) via push-to-main → Docker → Railway, and runs a small additive DB migration. It bundles the engine-code fixes and the deploy/copilotkit-docs.yaml config quick-wins — so merging flips production behavior, not just code.

Config quick-wins (deploy/copilotkit-docs.yaml)

  • Hybrid search ON for all 4 search tools (search_mode: hybrid) + min_score: 0.3 floor; this is the report's #1 lever (prod was pure-vector).
  • Docs source repointed from the retired docs/content/docs/ tree to the live showcase/shell-docs/src/content/docs/ (+ strip_prefix / webhook.path_triggers).
  • Code source de-polluted: exclude examples/**, showcase/**, **/.next/**, **/*.d.ts (was ~79% boilerplate).
  • ag-ui code sources added: integrations/** (Python adapters), sdks/python/, sdks/community/ (JVM + ports); exclude generated/ + *.pb.*.
  • Tool descriptions sharpened with scope/exclusions (fixes cross-tool mis-routing).
  • Drop the two non-.mdx docs files that derived 404 URLs.
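
To make the strip_prefix repoint and the non-.mdx exclusion concrete, here is a minimal sketch of how a docs URL path might be derived from a file path. It is not the engine's code: deriveDocPath is a hypothetical helper, and the rules that only .mdx files map to pages and that an index file is served at its directory's URL are assumptions for illustration.

```typescript
// Hypothetical helper: maps a repo-relative file path to a docs URL path.
// Returns null when no page URL can be derived for the file.
function deriveDocPath(filePath: string, stripPrefix: string): string | null {
  // Files outside the configured docs tree are not docs pages.
  if (!filePath.startsWith(stripPrefix)) return null;
  // Assumption: only .mdx files render as pages; other files would derive 404 URLs.
  if (!filePath.endsWith(".mdx")) return null;
  // Drop the prefix and the extension.
  const relative = filePath.slice(stripPrefix.length, -".mdx".length);
  // Assumption: an index file is served at its directory's URL.
  const withoutIndex = relative.replace(/(^|\/)index$/, "");
  return "/" + withoutIndex;
}

const DOCS_PREFIX = "showcase/shell-docs/src/content/docs/";
const pagePath = deriveDocPath(DOCS_PREFIX + "guides/setup.mdx", DOCS_PREFIX);
const skipped = deriveDocPath(DOCS_PREFIX + "notes/README.md", DOCS_PREFIX);
```

Under these assumptions, a file that yields null is a candidate for the source's exclude list rather than something to index with a dead link.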

Engine changes (src/)

  • Markdown chunker hardening + large invariant test corpus (fence integrity, verbatim-substring fidelity, heading-path soundness, robust LLM-array recovery).
  • MDX <Snippet/> inlining (snippets.ts) — recovers snippet-composed pages (v2 Migration Guide, etc.).
  • Embed title + headingPath alongside content (pipeline.ts), not content-only.
  • Indexing durability: per-item partial-failure handling, composed with Atlas cache invalidation.
  • Request-source tagging (X-Pathfinder-Source → query_log.request_source + session_id) as anti-self-inflation groundwork. Migration is ADD COLUMN IF NOT EXISTS + CREATE INDEX IF NOT EXISTS on the append-only query_log (additive, idempotent, online-safe).
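
As an illustration of embedding title + headingPath alongside content, the sketch below builds the text that would be sent to the embedding model. The field names title and headingPath come from this PR; the Chunk shape, the buildEmbedText name, and the " > " and blank-line separators are assumptions, not the actual pipeline.ts code.

```typescript
// Assumed chunk shape; only the field names title and headingPath come from the PR.
interface Chunk {
  title: string;
  headingPath: string[];
  content: string;
}

// Builds the embedding input from title, heading path, and body.
// Empty parts are skipped so untitled chunks do not gain leading blank lines.
function buildEmbedText(chunk: Chunk): string {
  const parts = [chunk.title, chunk.headingPath.join(" > "), chunk.content]
    .map((part) => part.trim())
    .filter((part) => part.length > 0);
  return parts.join("\n\n");
}

const embedText = buildEmbedText({
  title: "v2 Migration Guide",
  headingPath: ["Upgrading", "Breaking changes"],
  content: "Rename the provider.",
});
```

The point of the design is that a short chunk such as "Rename the provider." carries little signal by itself, while the title and heading path tell the embedding what page and section it belongs to.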

After merge

  • Watch deploy-health-check (auto on push-to-main) + index-health-monitor (4h cron).
  • Retrieval gains (hybrid, repoint, new sources, embed-text) take full effect on the next nightly reindex (03:00 UTC) which re-chunks + re-embeds.
  • 4234 tests green; CI green; rebased onto current main (Atlas integrated).

Not included (follow-up)

  • Query-time alias/acronym expansion (HITL↔human-in-the-loop, etc.) — gap-report QW7.
  • Structural: split examples into their own source, content-hash dedup, AST chunking, cross-encoder reranker.

jpr5 added 8 commits June 6, 2026 13:35
…ctness and content integrity

Unify heading and fence detection behind one CommonMark-correct predicate, make
overlap and the line-split fallback fence-aware, normalize CRLF and re-normalize
inlined snippet bytes before chunking, and bound the from-import brace so half-open
fences, severed multi-backtick spans, and unclosed imports can no longer corrupt
served chunk text. De-duplicate identical snippet import lines and strip every copy
on inline. Add a comprehensive chunker-invariant oracle covering split completeness
and fenced-content preservation.
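
The fence handling described in this commit can be sketched as a single line classifier that a chunker consults before treating a line as a heading or a split point. This is a simplified illustration of the CommonMark fence rules, not the PR's predicate; insideFenceFlags is a hypothetical name, and it ignores fences nested in block quotes or list items.

```typescript
// A fence line: up to 3 spaces of indent, then a run of 3+ backticks or 3+ tildes.
const FENCE = /^ {0,3}(`{3,}|~{3,})(.*)$/;

// Returns one flag per line: true when the line is a fence delimiter or fenced content.
function insideFenceFlags(text: string): boolean[] {
  // Normalize CRLF so a trailing \r never hides a closing fence.
  const lines = text.replace(/\r\n/g, "\n").split("\n");
  const flags: boolean[] = [];
  let open: { char: string; length: number } | null = null;
  for (const line of lines) {
    const m = FENCE.exec(line);
    if (open === null) {
      // A backtick fence cannot open if its info string contains a backtick.
      const opens = m !== null && !(m[1][0] === "`" && m[2].indexOf("`") !== -1);
      if (m !== null && opens) {
        open = { char: m[1][0], length: m[1].length };
      }
      flags.push(opens);
    } else {
      flags.push(true);
      // A closing fence uses the same character, is at least as long, and has no info string.
      if (m !== null && m[1][0] === open.char && m[1].length >= open.length && m[2].trim() === "") {
        open = null;
      }
    }
  }
  return flags;
}
```

A splitter built on this would only cut at lines whose flag is false, and an unclosed fence keeps every remaining line flagged, which matches CommonMark's rule that an unclosed fence runs to the end of the document.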
…eries correct

Hold the state token on transient incremental read failures and propagate index
failures instead of advancing over or deleting unindexed items, make chunk
delete+upsert atomic, run deletion detection before the no-matching-changes
short-circuit, fix path:'.' indexing data-loss, include file size in the
change-detection hash so mtime-preserving edits reindex, and clear stale chunks on
zero-chunk items. Guard FAQ confidence casts with rollback, fetch FAQ metadata by
result id, over-fetch before the confidence filter, and order FAQ browse by global
recency. Coerce p95 latency and webhook by_decision counts, surface stat and
extract-fallback failures, and span all data for the All time range. Add coverage
across pipeline, file-provider, state-token, analytics, knowledge, and schema.
Span all data in the All time view, cap the range-mode per-day series, and exclude
the browse sentinel so the dashboard charts match the corrected analytics queries.
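
To illustrate why file size belongs in the change-detection hash, the sketch below fingerprints a file from its path, mtime, and size. The FileStat shape and the function names are hypothetical, and FNV-1a is a stand-in chosen to keep the example dependency-free; the commit does not say which hash the engine uses.

```typescript
// Assumed stat shape for illustration.
interface FileStat {
  path: string;
  mtimeMs: number;
  size: number;
}

// 32-bit FNV-1a over UTF-16 code units, returned as 8 hex digits.
function fnv1a(input: string): string {
  let hash = 0x811c9dc5;
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return ("0000000" + hash.toString(16)).slice(-8);
}

// Includes size so an edit that preserves mtime still changes the fingerprint.
function changeHash(stat: FileStat): string {
  return fnv1a([stat.path, stat.mtimeMs, stat.size].join("|"));
}

const before: FileStat = { path: "docs/guide.mdx", mtimeMs: 1717670100000, size: 2048 };
const after: FileStat = { path: "docs/guide.mdx", mtimeMs: 1717670100000, size: 2049 };
const needsReindex = changeHash(before) !== changeHash(after);
```

An edit that preserves both mtime and byte length would still slip past a fingerprint like this; catching that case needs a hash of the content itself.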
…filter fixture

Exclude the ag-ui .md sources whose derived URLs 404 from the docs deploy config,
and update the test-path-filter fixture to match the current code excludes and docs
repoint.
jpr5 force-pushed the blitz/pathfinder-gaps/integration-engine branch from 2aafd40 to cc5c8c5 on June 7, 2026 00:05
jpr5 changed the title from "Pathfinder retrieval upgrade + query-source observability" to "Pathfinder gap-report remediation: hybrid search + docs repoint + ag-ui sources + chunker/durability" on Jun 7, 2026
jpr5 merged commit 1d80c08 into main on Jun 8, 2026
5 checks passed
jpr5 deleted the blitz/pathfinder-gaps/integration-engine branch on June 8, 2026 03:42