Require proof before superseded PR closeout #196

Open
jesse-merhi wants to merge 4 commits into main from jesse/pr-supersession-proof

Conversation

@jesse-merhi
Member

Summary

  • Require hydrated PR A / PR B proof before normal apply can close PR A as superseded.
  • Add compact shared supersession proof helpers, prompt templates, and schema-backed model output checks.
  • Gate repair replacement closeout on source PR security/context checks and fall back to link-only when proof is missing or blocked.

Validation

  • fnm exec --using v24.14.1 pnpm run format
  • fnm exec --using v24.14.1 pnpm run build:all
  • fnm exec --using v24.14.1 node --test --test-name-pattern 'PRs superseded by linked pull requests|linked supersession proof is incomplete|same-file supersession is incomplete|security-labeled PRs open|comments contain security markers|supersession proof prompt' test/clawsweeper.test.ts
  • fnm exec --using v24.14.1 node --test test/repair/execute-fix-artifact-source.test.ts test/repair/execute-fix-github.test.ts
  • fnm exec --using v24.14.1 pnpm run check
  • git diff --check --cached

Copilot AI review requested due to automatic review settings May 25, 2026 06:10
@clawsweeper
Contributor

clawsweeper Bot commented May 25, 2026

Codex review: needs changes before merge. Reviewed May 26, 2026, 4:12 PM ET / 20:12 UTC.

Summary
The branch adds schema-backed supersession proof prompts/helpers, hydrates PR context for apply and repair closeout, adds Codex setup to the apply workflow, and expands tests around superseded PR close decisions.

Reproducibility: yes. Source inspection of the PR head shows replacementCloseoutBlockReason accepts proof from arbitrary PR body/comment/review text, and pullRequestFileContextBlockReason does not block filesTruncated views.

Review metrics: 2 noteworthy metrics.

  • Diff size: 14 files changed, 3469 added, 1691 deleted. The PR changes core apply logic, repair closeout, workflow setup, prompts, schema, and tests, so review must cover live automation behavior.
  • Workflow surface: 1 workflow changed. The apply-existing job gains Codex setup before close application, which affects live close automation rather than only local code paths.

Merge readiness
Overall: 🧂 unranked krab
Proof: 🌊 off-meta tidepool
Patch quality: 🧂 unranked krab
Result: blocked by patch quality or review findings.

Overall follows the weaker of proof and patch quality, so missing proof can cap an otherwise strong patch.

Rank-up moves:

  • Restrict replacement proof acceptance to controlled ClawSweeper labels or parsed ClawSweeper-authored durable review/report comments.
  • Block or link-only repair replacement closeout when source or replacement file context is truncated, with regression tests for >100-file PRs.

Risk before merge

  • Repair closeout currently treats arbitrary replacement PR body/comment/review text matching proof: sufficient or proof: override as proof-positive before privileged close automation runs.
  • The repair closeout path does not fail open when file context is truncated, so large source or replacement PRs can be compared from an incomplete file list.
  • The apply workflow now runs additional Codex proof checks in the live close path, so setup failures, timeouts, and thin context need to remain link-only or keep-open outcomes.

Maintainer options:

  1. Tighten proof trust and context gates (recommended)
    Accept positive proof only from ClawSweeper-managed proof labels or parsed ClawSweeper-authored durable review/report comments, and link-only when source or replacement file context is truncated.
  2. Accept broader repair-closeout trust
    Maintainers could explicitly allow PR text to satisfy proof for repair replacements, but that means contributor-authored text can influence whether another PR is closed.
  3. Pause the repair closeout part
    If the permanent repair-closeout proof source is still a policy question, land only the normal apply supersession proof path and defer the repair source closeout.
Copy recommended automerge instruction
@clawsweeper automerge

Special instructions:
Update the replacement-closeout proof gates so positive proof is accepted only from ClawSweeper-managed proof labels or parsed ClawSweeper-authored durable review/report comments, reject or link-only on filesTruncated for source or replacement PRs, and add focused tests for spoofed proof text and >100-file truncated contexts.
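
A minimal sketch of the controlled-proof rule requested above, assuming a simplified PR view. The interfaces, the `clawsweeper[bot]` login, the `<!-- clawsweeper-` marker prefix, and the function name `hasControlledProof` are hypothetical and are not taken from the PR; only the `proof: sufficient` / `proof: override` strings come from the review.

```typescript
// Hypothetical shapes: the real hydrated PR view in the repository may differ.
interface PrComment {
  authorLogin: string;
  body: string;
}

interface ReplacementPrView {
  labels: string[];
  body: string;
  comments: PrComment[];
}

// Assumed values for illustration only; none of these constants come from the PR.
const TRUSTED_BOT_LOGIN = "clawsweeper[bot]";
const PROOF_LABELS = ["proof: sufficient", "proof: override"];
const DURABLE_MARKER_PREFIX = "<!-- clawsweeper-";
const PROOF_TEXT = /proof:\s*(sufficient|override)/i;

function hasControlledProof(view: ReplacementPrView): boolean {
  // Labels are treated as controlled here (assumption: only ClawSweeper or
  // maintainers apply proof labels).
  const labelProof = view.labels.some((label) =>
    PROOF_LABELS.includes(label.trim().toLowerCase()),
  );
  if (labelProof) return true;

  // Comments count only when the trusted bot wrote them and they carry a
  // durable marker. view.body is never consulted: PR authors control it.
  return view.comments.some(
    (comment) =>
      comment.authorLogin === TRUSTED_BOT_LOGIN &&
      comment.body.includes(DURABLE_MARKER_PREFIX) &&
      PROOF_TEXT.test(comment.body),
  );
}
```

With this shape, a contributor who writes `proof: sufficient` in the replacement PR body or in their own comment does not satisfy the gate; a spoofed-text regression test would assert exactly that.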

Next step before merge
The remaining blockers are concrete repairable guardrail fixes in the replacement closeout proof path.

Security
Needs attention: The diff adds a concrete automation trust-boundary concern: uncontrolled replacement PR text can satisfy a privileged closeout proof gate.

Review findings

  • [P1] Require controlled proof before replacement closeout — src/repair/execute-fix-github.ts:129-131
  • [P1] Fail open on truncated PR file context — src/repair/execute-fix-github.ts:160-162
Review details

Best possible solution:

Keep the proof gate, but require controlled proof signals and fail open on truncated or missing context before any source PR can be closed as superseded.

Do we have a high-confidence way to reproduce the issue?

Yes. Source inspection of the PR head shows replacementCloseoutBlockReason accepts proof from arbitrary PR body/comment/review text, and pullRequestFileContextBlockReason does not block filesTruncated views.

Is this the best way to solve the issue?

No. The hydrated PR A/PR B comparison is a good direction, but the repair closeout gate should trust only controlled proof sources and should fail open when context is incomplete.

Full review comments:

  • [P1] Require controlled proof before replacement closeout — src/repair/execute-fix-github.ts:129-131
    This treats any proof: sufficient or proof: override string in the replacement PR body, issue comments, reviews, or review comments as positive proof. Those fields can include contributor-authored text, so an open replacement can satisfy the proof-positive gate without a ClawSweeper label or durable review record and then close the source PR after model coverage. Keep proofPassed tied to controlled labels or parsed ClawSweeper-authored review/report comments.
    Confidence: 0.88
  • [P1] Fail open on truncated PR file context — src/repair/execute-fix-github.ts:160-162
    fetchPullRequestFiles hydrates only one 100-file page and marks filesTruncated when changedFiles is larger, but this helper only rejects missing arrays. For a source or replacement with more than 100 files, repair closeout can ask the model to prove supersession from an incomplete file list and then close the source PR. Treat truncated file context as a link-only blocker, matching the normal apply path.
    Confidence: 0.82
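
The truncated-context finding could be addressed along these lines. This is a sketch under assumptions: `PrFileContext`, `fileContextBlockReason`, and `closeoutMode` are hypothetical names, and only the `filesTruncated` / `changedFiles` fields and the one-page (100-file) hydration behavior come from the review text.

```typescript
// Hypothetical shape for the hydrated file context of one PR.
interface PrFileContext {
  changedFiles: number; // total changed files reported for the PR
  files?: string[]; // hydrated paths; may cover only the first page
  filesTruncated?: boolean;
}

// Returns a human-readable block reason, or null when the context is complete.
function fileContextBlockReason(
  role: "source" | "replacement",
  context: PrFileContext,
): string | null {
  if (!Array.isArray(context.files)) {
    return `${role} PR file context is missing`;
  }
  if (context.filesTruncated === true || context.files.length < context.changedFiles) {
    return `${role} PR file context is truncated (${context.files.length} of ${context.changedFiles} files)`;
  }
  return null;
}

// Link-only when either side is incomplete; otherwise the proof check may run.
function closeoutMode(
  source: PrFileContext,
  replacement: PrFileContext,
): "link_only" | "run_proof" {
  const reason =
    fileContextBlockReason("source", source) ??
    fileContextBlockReason("replacement", replacement);
  return reason === null ? "run_proof" : "link_only";
}
```

Comparing `files.length` against `changedFiles` in addition to the flag is a defensive choice, so a view that lost its `filesTruncated` field still cannot be treated as complete.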

Overall correctness: patch is incorrect
Overall confidence: 0.88

AGENTS.md: found and applied where relevant.

Codex review notes: model gpt-5.5, reasoning high; reviewed against dc4a8a27df88.

Label changes

Label changes:

  • add merge-risk: 🚨 security-boundary: The repair closeout gate currently lets uncontrolled PR text satisfy a proof condition before privileged GitHub close automation runs.

Label justifications:

  • P1: This PR changes automation that can close real pull requests, and unsafe closeout regressions can affect active maintainer workflows.
  • merge-risk: 🚨 automation: The diff changes apply close promotion, repair replacement closeout, Codex proof execution, and apply workflow setup.
  • merge-risk: 🚨 security-boundary: The repair closeout gate currently lets uncontrolled PR text satisfy a proof condition before privileged GitHub close automation runs.
  • rating: 🧂 unranked krab: Overall readiness is 🧂 unranked krab; proof is 🌊 off-meta tidepool and patch quality is 🧂 unranked krab.
  • status: ⏳ waiting on author: ClawSweeper has contributor-facing work open and is waiting for author action. Not applicable: The external-contributor proof gate is not applicable because this is a member-authored maintainer automation PR; the body lists validation commands but separate contributor setup proof is not required.
Evidence reviewed

Security concerns:

  • [medium] Untrusted proof text can drive closeout — src/repair/execute-fix-github.ts:129
    replacementCloseoutBlockReason accepts proof: sufficient or proof: override from arbitrary PR body/comment/review text, allowing user-controlled text to contribute to a source PR close decision.
    Confidence: 0.82

Acceptance criteria:

  • fnm exec --using v24.14.1 node --test test/repair/execute-fix-github.test.ts test/repair/execute-fix-artifact-source.test.ts
  • fnm exec --using v24.14.1 pnpm run check

What I checked:

Likely related people:

  • brokemac79: Introduced the current-main unsafe replacement safeguards for superseded PR closes in the merged safety path this PR extends. (role: recent adjacent contributor; confidence: high; commits: f2ec021eb55c; files: src/clawsweeper.ts, test/clawsweeper.test.ts)
  • Peter Steinberger: Git blame and history show the stale PR close promotion and repair source closeout paths trace back to commits he authored. (role: introduced behavior; confidence: high; commits: c028d7905c2a, 97885cf4e291; files: src/clawsweeper.ts, src/repair/execute-fix-artifact.ts, src/repair/execute-fix-github.ts)
  • Dallin Romney: Recent current-main work touched repair lane and workflow concurrency near the same automation surface. (role: recent repair lane contributor; confidence: medium; commits: 50ca51d85c23; files: src/clawsweeper.ts, src/repair/execute-fix-artifact.ts, .github/workflows/sweep.yml)
What the crustacean ranks mean
  • 🦀 challenger crab: rare, exceptional readiness with strong proof, clean implementation, and convincing validation.
  • 🦞 diamond lobster: very strong readiness with only minor maintainer review expected.
  • 🐚 platinum hermit: good normal PR, likely mergeable with ordinary maintainer review.
  • 🦐 gold shrimp: useful signal, but proof or patch confidence is still limited.
  • 🦪 silver shellfish: thin signal; proof, validation, or implementation needs work.
  • 🧂 unranked krab: not merge-ready because proof is missing/unusable or there are serious correctness or safety concerns.
  • 🌊 off-meta tidepool: rating does not apply to this item.

Shiny media proof means a screenshot, video, or linked artifact directly shows the changed behavior. Runtime, network, CSP, and security claims still need visible diagnostics.

How this review workflow works
  • ClawSweeper keeps one durable marker-backed review comment per issue or PR.
  • Re-runs edit this comment so the latest verdict, findings, and automation markers stay together instead of adding duplicate bot comments.
  • A fresh review can be triggered by eligible @clawsweeper re-review comments, exact-item GitHub events, scheduled/background review runs, or manual workflow dispatch.
  • PR/issue authors and users with repository write access can comment @clawsweeper re-review or @clawsweeper re-run on an open PR or issue to request a fresh review only.
  • Maintainers can also comment @clawsweeper review to request a fresh review only.
  • Fresh-review commands do not start repair, autofix, rebase, CI repair, or automerge.
  • Maintainer-only repair and merge flows require explicit commands such as @clawsweeper autofix, @clawsweeper automerge, @clawsweeper fix ci, or @clawsweeper address review.
  • Maintainers can comment @clawsweeper explain to ask for more context, or @clawsweeper stop to stop active automation.


Copilot AI left a comment

Pull request overview

This PR strengthens the “superseded PR” auto-close flow by requiring explicit, schema-validated “supersession proof” (plus security/context gating) before ClawSweeper can close a source PR as superseded by a linked replacement PR.

Changes:

  • Adds shared supersession proof utilities + schema parsing/normalization to ensure “superseded” decisions have concrete coverage evidence.
  • Updates apply/repair flows to hydrate PR context (labels/comments/files/body) and gate closeout on security signals and model proof; falls back to link-only when blocked or incomplete.
  • Expands tests and mocks to cover proof-required behavior, security gating, and prompt/content constraints.

Reviewed changes

Copilot reviewed 10 out of 10 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| src/supersession-proof.ts | New shared helpers for compact PR views + schema-backed parsing/normalization of supersession proof output. |
| src/clawsweeper.ts | Adds supersession-proof runtime/prompt/model execution and blocks “superseded close” promotions unless proof allows it. |
| src/repair/execute-fix-github.ts | Hydrates PR view with labels/comments/files/body and introduces a security gate helper for source PRs. |
| src/repair/execute-fix-artifact.ts | Adds replacement closeout proof runner and enforces security + proof checks before gh pr close; otherwise link-only. |
| schema/clawsweeper-supersession-proof.schema.json | New JSON schema for model output validation. |
| prompts/supersession-proof.md | New prompt template for general supersession proof in apply flow. |
| prompts/repair/replacement-closeout-proof.md | New prompt template for repair replacement closeout proof. |
| test/clawsweeper.test.ts | Adds mocks + tests covering proof-required close behavior and security/incomplete-proof keep-open behavior. |
| test/repair/execute-fix-github.test.ts | Adds unit tests for the source PR security gating helper. |
| test/repair/execute-fix-artifact-source.test.ts | Adds source-level regression tests ensuring security/proof checks precede closeout calls and prompt changes are in place. |

Comment on lines 34 to +54

     export function fetchSourcePullRequestView({
       repo,
       number,
       targetDir,
     }: {
       repo: string;
       number: JsonValue;
       targetDir: string;
     }): LooseRecord {
       return JSON.parse(
         run(
           "gh",
   -       ["pr", "view", String(number), "--repo", repo, "--json", "author,state,mergedAt,title,url"],
   +       [
   +         "pr",
   +         "view",
   +         String(number),
   +         "--repo",
   +         repo,
   +         "--json",
   +         "author,state,mergedAt,title,url,body,labels,comments,files,headRefOid,updatedAt",
   +       ],

Comment on lines +1840 to +1841

   +   } catch {
   +     return { close: false, reason: "replacement closeout proof output was invalid JSON" };

@clawsweeper clawsweeper Bot added rating: 🐚 platinum hermit (Good normal PR readiness with ordinary maintainer review expected.) and status: 👀 ready for maintainer look (ClawSweeper has no concrete contributor-facing blocker left for this PR.) labels May 25, 2026
@clawsweeper
Contributor

clawsweeper Bot commented May 25, 2026

ClawSweeper PR egg

🔥 Warming up: real-behavior proof passed; findings, security review, or rank-up moves are still in progress.

Hatch command

Comment @clawsweeper hatch when this PR is hatchable.

Hatchability rules:

  • Merged PRs are hatchable.
  • Open PRs are hatchable when they are status: 👀 ready for maintainer look, status: 🚀 automerge armed, or labeled clawsweeper:automerge.
  • Closed unmerged PRs are hatchable only when one of those hatchable labels is still present in the durable record.
What is this egg doing here?
  • Eggs appear after the PR passes real-behavior proof. It is here for vibes, not verdicts: it does not change labels, ratings, merge decisions, or automation.
  • The shell reacts to review momentum: open follow-up work warms it up, re-review makes it wobble, and a clean final review lets it hatch.
  • Hatchability usually comes from sufficient real-behavior proof, no blocking P0/P1/P2 findings, no security attention needed, and clean correctness. A merged PR is already final, so merge makes the egg hatchable independently.
  • The hatch is seeded from this repository and PR number, so the same PR keeps the same creature; the reviewed head SHA can only change safe visual details.
  • Rarity is just collectible sparkle: 🥚 common, 🌱 uncommon, 💎 rare, ✨ glimmer, and 🌈 legendary.

@nxmxbbd

nxmxbbd commented May 25, 2026

Adding evidence that the supersession-proof gate in this PR would catch a real
case beyond the closed-unmerged replacement scenario.

openclaw/openclaw#86590 was auto-closed on 2026-05-25 as
duplicate_or_superseded by the merged #85063, but current main still lacks
the follow-up surfaces #86590 was adding:

  • SessionMcpRuntime.getOmittedServers() and the OmittedMcpServerReason enum
  • connect-timeout and list-tools-failed omitted-server reasons on
    materialized tools
  • the active-lease rebind guard message
    bundle-mcp runtime busy; cannot rebind workspace/config while active
  • compact's managed-runtime reuse via getOrCreateSessionMcpRuntime(...) in
    src/agents/pi-embedded-runner/compact.ts
  • the compact runtime-context omit for empty sandboxSessionKey

The ClawSweeper review body on #86590 explicitly said "Keep open for maintainer
review: ... sufficient after-fix proof and no blocking code-review findings"
and emitted <!-- clawsweeper-verdict:needs-human ... confidence=high -->,
but the close-applied marker still fired about eight minutes later. Same
family as openclaw/openclaw#86006 (verdict-vs-apply contract gap) and
#197 (supersede-by-closed-unmerged), except here PR B
(#85063) is merged and PR A (#86590) has unique uncovered work.

If the supersession-proof gate in this PR is fed PR A's file paths and PR B's
merged squash (07f500aa562d — bounded tools/list discovery only),
uniqueSourceWork should be non-empty and the decision should land as
keep_open, so this PR is likely to prevent the #86590-style close in
practice.

One question: is the marker-contract layer (apply lane gated independently on
clawsweeper-verdict:needs-human) also in scope here, or planned separately?
The model proof gate looks robust; a static marker check would be a cheap
second layer against future review-vs-apply drift.
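
A static marker check of the kind suggested here could look roughly like the following. It is a sketch only: the function names are hypothetical, the marker shape is inferred from the single `<!-- clawsweeper-verdict:needs-human ... confidence=high -->` example quoted above, and treating a missing marker as "no veto" is an assumption.

```typescript
// Collects verdict values from markers shaped like
// <!-- clawsweeper-verdict:needs-human ... -->, in document order.
function parseVerdictMarkers(body: string): string[] {
  const marker = /<!--\s*clawsweeper-verdict:([a-z0-9-]+)[\s\S]*?-->/gi;
  const verdicts: string[] = [];
  let match: RegExpExecArray | null;
  while ((match = marker.exec(body)) !== null) {
    verdicts.push(match[1].toLowerCase());
  }
  return verdicts;
}

// True when the most recent verdict marker says a human must decide.
// Assumption: a missing marker does not veto; other gates still apply.
function markerVetoesClose(reviewBody: string): boolean {
  const verdicts = parseVerdictMarkers(reviewBody);
  return verdicts.length > 0 && verdicts[verdicts.length - 1] === "needs-human";
}
```

Because it only reads the durable review comment, a check like this would not need a model call, which is what makes it cheap as a second layer.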

@jesse-merhi jesse-merhi requested a review from a team as a code owner May 26, 2026 00:03
@clawsweeper clawsweeper Bot added rating: 🧂 unranked krab (Not merge-ready due to missing proof or serious correctness/safety concerns.), status: ⏳ waiting on author (ClawSweeper has contributor-facing work open and is waiting for author action.), P1 (Urgent regression or broken agent/channel workflow affecting real users now.), and merge-risk: 🚨 automation (🚨 Merging this PR could break CI, automerge, proof capture, label sync, or automation.) and removed rating: 🐚 platinum hermit (Good normal PR readiness with ordinary maintainer review expected.) and status: 👀 ready for maintainer look (ClawSweeper has no concrete contributor-facing blocker left for this PR.) labels May 26, 2026
@clawsweeper clawsweeper Bot added rating: 🌊 off-meta tidepool (PR readiness rating does not apply to this item.) and removed rating: 🧂 unranked krab (Not merge-ready due to missing proof or serious correctness/safety concerns.) and status: ⏳ waiting on author (ClawSweeper has contributor-facing work open and is waiting for author action.) labels May 26, 2026
@clawsweeper clawsweeper Bot added rating: 🧂 unranked krab (Not merge-ready due to missing proof or serious correctness/safety concerns.) and status: ⏳ waiting on author (ClawSweeper has contributor-facing work open and is waiting for author action.) and removed rating: 🌊 off-meta tidepool (PR readiness rating does not apply to this item.) labels May 26, 2026
@RomneyDa RomneyDa force-pushed the jesse/pr-supersession-proof branch from 96d9315 to 97aa705 May 26, 2026 20:05
@clawsweeper
Contributor

clawsweeper Bot commented May 26, 2026

🦞🧹
ClawSweeper re-review requested.

I asked ClawSweeper to review this item again.
Action: item re-review queued (workflow sweep.yml, event repository_dispatch).
Result: the existing ClawSweeper review comment will be edited in place when the review finishes.

Re-review progress:

@clawsweeper clawsweeper Bot added the merge-risk: 🚨 security-boundary (🚨 Merging this PR could weaken sandboxing, authorization, credentials, or sensitive data.) label May 26, 2026
@clawsweeper
Contributor

clawsweeper Bot commented May 26, 2026

🦞🧹
ClawSweeper re-review requested.

I asked ClawSweeper to review this item again.
Action: item re-review queued (workflow sweep.yml, event repository_dispatch).
Result: the existing ClawSweeper review comment will be edited in place when the review finishes.

Re-review progress:

@RomneyDa RomneyDa force-pushed the jesse/pr-supersession-proof branch from 88250e6 to 96d9315 May 26, 2026 22:28
@RomneyDa
Member

noting that I accidentally pushed a commit to this pr and then force reverted it.

@nxmxbbd

nxmxbbd commented May 27, 2026

Adding two more instances of the same supersession-coverage class found in openclaw/openclaw since the original #86590 case linked above. Both are ClawSweeper auto-applied closes (close-applied marker present) with the same self-contradictory narrative-vs-action signature.

Case 2 — openclaw/openclaw#86817

  • Contributor: tianxiaochannel-oss88 (single PR, no post-close response)

  • Closed 2026-05-26T10:08:20Z by ClawSweeper auto-apply

  • Named canonical: fix(openai): avoid stale Responses message id replay openclaw#85277 (state: OPEN, labels include proof: sufficient and status: 👀 ready for maintainer look)

  • ClawSweeper's own narrative inside the close comment:

    Keep open: current main does not yet have this runner-level one-shot Responses continuation recovery, and the related open PRs are not a merged or clearly viable superset.
    Canonical path: Close this PR as superseded by #85277.

File-layer check:

  • #85277 changes 3 files (~57 lines) in the OpenAI Responses transport/replay layer: src/agents/openai-responses-replay.ts, src/agents/openai-responses.reasoning-replay.test.ts, src/agents/openai-transport-stream.ts.
  • #86817 changes 10+ files (~600+ lines) entirely in the embedded-runner layer: new src/agents/pi-embedded-runner/run.ts recovery hook (+145), new openai-responses-transport-replay-recovery.test.ts (+281), pi-embedded-runner/replay-history.ts, pi-embedded-runner/run/attempt.ts, plus runner-side helpers.

Zero file overlap. The canonical patches the transport layer; the closed PR is at the runner layer. The narrative's "runner-level ... recovery" phrase identifies the gap precisely, and the close action discards it.

Case 3 — openclaw/openclaw#87097

  • Contributor: Kaspre (single PR, no post-close response)

  • Closed 2026-05-27T06:28:11Z by ClawSweeper auto-apply

  • Named canonical: fix(cli): wait for gateway client teardown before exit openclaw#70691 (MERGED 2026-04-23, over a month before this close)

  • ClawSweeper's own narrative inside the close comment:

    Keep open for maintainer review: current main still lacks a post-completion local-agent process boundary, this PR has focused source/tests and real Crabbox proof, and I found no blocking patch defect; the remaining concern is the deliberate lifecycle and compatibility tradeoff.
    Canonical path: Close this PR as superseded by #70691.

File-layer check:

  • #70691 merged with changes only to src/gateway/call.ts (16+ / 6-) plus its test and CHANGELOG. Concern: gateway client teardown before exit.
  • #87097 adds new files src/cli/local-agent-lifetime.ts (+320) and src/cli/local-agent-lifetime.test.ts (+308) plus changes to src/entry.ts and src/index.ts. Concern: post-completion CLI lifetime and finite timeout boundary.

Different directory tree (src/cli/ vs src/gateway/), different concern, zero file overlap. Narrative again states "current main still lacks ..." then closes against the month-old merged canonical.
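
The file-layer checks in Cases 2 and 3 amount to a path-set intersection. A minimal sketch, using only the Case 3 paths named above (the test and CHANGELOG paths for #70691 are not spelled out in this thread, so they are left out); `fileOverlap` and `touchedDirectories` are hypothetical helper names:

```typescript
// Returns the paths changed by both PRs, preserving source order.
function fileOverlap(sourceFiles: string[], replacementFiles: string[]): string[] {
  const replacement = new Set(replacementFiles);
  return sourceFiles.filter((path) => replacement.has(path));
}

// Lists the distinct parent directories touched by a PR.
function touchedDirectories(files: string[]): string[] {
  const dirs = files.map((path) => {
    const cut = path.lastIndexOf("/");
    return cut === -1 ? "." : path.slice(0, cut);
  });
  return dirs.filter((dir, index) => dirs.indexOf(dir) === index);
}

// Paths as listed in the Case 3 file-layer check.
const pr70691Files = ["src/gateway/call.ts"];
const pr87097Files = [
  "src/cli/local-agent-lifetime.ts",
  "src/cli/local-agent-lifetime.test.ts",
  "src/entry.ts",
  "src/index.ts",
];

console.log(fileOverlap(pr87097Files, pr70691Files)); // []
console.log(touchedDirectories(pr87097Files)); // ["src/cli", "src"]
console.log(touchedDirectories(pr70691Files)); // ["src/gateway"]
```

Zero overlap is a signal rather than proof of unique work on its own (a replacement can cover the same behavior from different files), which is why it is useful as an input to the proof gate instead of a standalone rule.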

Common signature

All three cases (#86590 above, plus #86817 and #87097) carry the identical narrative-action contradiction inside a single ClawSweeper close comment: the narrative explicitly lists work that current main lacks or that the canonical does not cover, immediately followed by Canonical path: Close this PR as superseded by #X. The auto-apply close acts on the "Canonical path" sentence and discards the narrative.

This is the exact gap this PR's schema-backed coveredWork / uniqueSourceWork fields close: a superseded decision is only valid when uniqueSourceWork is empty, and the prose in each of the three close comments would have forced at least one item into uniqueSourceWork.

Not a #198 angle: #85277 carried proof: sufficient and status: 👀 ready for maintainer look at the time of close, so it sits inside #198's intentional "open replacements that are cleanly mergeable and proof-positive" carve-out. The unsafe property in all three cases is missing coverage proof, not replacement state.

3 distinct contributors, 3 PRs, ~36 hours, same auto-apply path.

@nxmxbbd

nxmxbbd commented May 27, 2026

Adding a code-path-level read of the bug, with two diagrams, since the narrative-vs-action contradiction is easier to evaluate when the two-step shape is named explicitly. References below are pinned to clawsweeper@main as of 2026-05-27 and PR #196 head 96d9315c; line numbers are paired with symbol names so they remain locatable if main shifts.

Why the narrative and the close action disagree on main

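The contradiction lives between two steps in src/clawsweeper.ts on current main (the model review and the apply-lane promotion, items 1 and 2 below), not inside one model output; items 3 and 4 describe how it then surfaces in the close comment: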

  1. Model review step produces a JSON decision with decision, verdict, bestSolution, and prose. For the three observed cases, the model returned decision: "keep_open", verdict: "needs-human", and a bestSolution that contained the "Keep open for maintainer review: ..." text now visible in the close comments.
  2. Apply-lane close-promotion step then runs heuristics that can upgrade a keep_open PR into a duplicate_or_superseded close: livePullRequestHasNoDiff (src/clawsweeper.ts L13871) and pullRequestClosePromotion (L13898 / function defined around L10669-10710 inside the upgrade path). The promotion path overwrites bestSolution with "Close this PR as superseded by ${linkedPull.url}." (upgradePullRequestClosePromotionReport site, L10710) and sets closeReason = "duplicate_or_superseded".
  3. Close-comment render then pulls the original review prose (still says "Keep open ...") and the post-promotion bestSolution (now says "Close as superseded"). Both end up in the same comment because they were written at different steps and the render path does not reconcile them.
  4. duplicateCanonicalPathLine / duplicateCanonicalLinks (L7548 / L7562, callsite around closeOutro rendering) then regex-extract the canonical PR link from the rewritten bestSolution text only, so the close-apply step never sees the keep-open verdict it overrode.

That is why all three observed cases share the same signature: original verdict prose says "keep open / current main lacks ...", the canonical-path sentence and the apply action come from the second-step overwrite.

Pre-#196 flow (the bug)

   STEP 1  Model review
   +---------------------------------------------------------+
   |  decision      = "keep_open"                            |
   |  verdict       = "needs-human"                          |
   |  bestSolution  = "Keep open for maintainer review ..."  |
   |  narrative     = "current main lacks <unique work> ..." |
   +---------------------------------------------------------+
                              |
                              v
   STEP 2  Apply-lane promotion heuristic
   +---------------------------------------------------------+
   |  fires when: no-diff  OR  (stale AND linked-PR present) |
   +---------------------------------------------------------+
                              | fires
                              v
   STEP 3  >>>  OVERWRITE  <<<      (clawsweeper.ts L10710)
   +=========================================================+
   |  closeReason   := "duplicate_or_superseded"             |
   |  bestSolution  := "Close this PR as superseded by #X."  |
   |  narrative     :  UNCHANGED (still says "keep open ...")|
   +=========================================================+
                              |
                              v
   STEP 4  Render close comment
   +---------------------------------------------------------+
   |  - narrative prose         (from STEP 1: keep_open)     |
   |  - "Canonical path: ..."   (regex of new bestSolution)  |
   |  ----> both end up in the same comment, contradictory   |
   +---------------------------------------------------------+
                              |
                              v
   STEP 5  Auto-apply close
   +---------------------------------------------------------+
   |  reads only the rewritten bestSolution                  |
   |  STEP 1's keep_open verdict is never re-consulted       |
   +---------------------------------------------------------+

The double-bordered box is the bug surface. The model's keep_open verdict and the prose narrative both survive into the close comment unchanged; only bestSolution is overwritten, which is the field the close-apply step actually reads.

Post-#196 flow (with the supersession-proof gate)

   STEP 1  Model review                                (unchanged)
   +---------------------------------------------------------+
   |  decision = "keep_open"   bestSolution = "Keep open..." |
   +---------------------------------------------------------+
                              |
                              v
   STEP 2  Apply-lane promotion heuristic              (unchanged)
   +---------------------------------------------------------+
   |  no-diff OR stale+linked-PR  --->  wants to promote     |
   +---------------------------------------------------------+
                              | wants to fire
                              v
   STEP 2.5  >>>  NEW: Supersession-proof gate  <<<
   +=========================================================+
   |  - hydrate PR A + PR B (files / patches / comments)     |
   |  - second LLM call with structured prompt               |
   |  - schema-validated JSON output:                        |
   |        coveredWork[]                                    |
   |        uniqueSourceWork[]                               |
   |        securityBlocked                                  |
   |        decision    in {superseded, keep_open}           |
   +=========================================================+
                              |
              +---------------+---------------+
              |                               |
        decision != superseded         decision == superseded
        OR uniqueSourceWork != []      AND uniqueSourceWork == []
              |                               |
              v                               v
   +-----------------------+         +-----------------------+
   |  promotion BLOCKED    |         |  promotion proceeds   |
   |  PR kept open,        |         |  STEP 3+ as before    |
   |  link-only fallback   |         |  (overwrite + close)  |
   +-----------------------+         +-----------------------+

The new gate runs after the heuristic decides to promote but before the apply path can overwrite bestSolution, and it operates on schema-validated fields (uniqueSourceWork[], coveredWork[], securityBlocked, decision) instead of free-form prose. The three observed close comments would each have produced at least one uniqueSourceWork[] entry because each one already names the unique work in its current narrative (runner-level recovery, post-completion local-agent process boundary, getOmittedServers + lease rebind guard).
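
The decision rule at the bottom of the diagram can be sketched as follows. The field names (`decision`, `coveredWork`, `uniqueSourceWork`, `securityBlocked`) are the ones named above; the types, the hand-written validation (standing in for the real JSON schema check), and the function names are assumptions for illustration, not the PR's implementation.

```typescript
interface SupersessionProof {
  decision: "superseded" | "keep_open";
  coveredWork: string[];
  uniqueSourceWork: string[];
  securityBlocked: boolean;
}

type GateOutcome = { promote: true } | { promote: false; reason: string };

function isStringArray(value: unknown): value is string[] {
  return Array.isArray(value) && value.every((item) => typeof item === "string");
}

// Hand-rolled stand-in for schema validation of the model output.
function parseProof(raw: string): SupersessionProof | null {
  let value: unknown;
  try {
    value = JSON.parse(raw);
  } catch {
    return null;
  }
  if (typeof value !== "object" || value === null) return null;
  const { decision, coveredWork, uniqueSourceWork, securityBlocked } =
    value as Record<string, unknown>;
  if (decision !== "superseded" && decision !== "keep_open") return null;
  if (!isStringArray(coveredWork) || !isStringArray(uniqueSourceWork)) return null;
  if (typeof securityBlocked !== "boolean") return null;
  return { decision, coveredWork, uniqueSourceWork, securityBlocked };
}

// Promotion proceeds only on a valid, unblocked "superseded" decision
// with no unique source work left over; everything else keeps the PR open.
function supersessionGate(raw: string): GateOutcome {
  const proof = parseProof(raw);
  if (proof === null) {
    return { promote: false, reason: "proof output was invalid or incomplete" };
  }
  if (proof.securityBlocked) {
    return { promote: false, reason: "security blocked" };
  }
  if (proof.decision !== "superseded") {
    return { promote: false, reason: "model decision was keep_open" };
  }
  if (proof.uniqueSourceWork.length > 0) {
    return { promote: false, reason: "source PR has unique uncovered work" };
  }
  return { promote: true };
}
```

Every failure mode, including unparseable output, resolves to `promote: false`, matching the link-only fallback described for blocked or incomplete proof.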

Why other PRs in the same window were not hit

The bug class is narrow because the promotion heuristic fires on a specific PR shape. In the same 36h window I scanned, ClawSweeper auto-closed ~70 PRs in openclaw/openclaw. Three were of the class above. The rest fall into separate, intentional paths:

  • ~50 PRs from one organization batch-submitting auto-generated fix-issue-XXXXX-bug-... branches against the same issue keys (5+ PRs per issue). These are literal duplicates among themselves and close correctly under the same duplicate_or_superseded reason.
  • ~10 PRs closed under implemented_on_main / mostly_implemented_on_main (CloseReason enum at src/clawsweeper.ts L175, closeIntro / closeOutro rendering at L7494+). Different decision class; does not go through the promotion path the proof gate guards. Some of those may be wrong on the merits, but they are a separate bug class from the one this PR addresses.
  • A couple closed under low_signal_unmergeable_pr (the won't-land branch) or against an issue marked not-planned. Different paths again.

The susceptibility profile for the bug this PR fixes is roughly: proof-positive PR + topically-adjacent merged or proof-positive open PR + different file surface (different directory tree or layer) + no contributor pushback to interrupt the auto-apply window. All three observed cases match this profile exactly.

Net: the failure shape is the apply-lane overwrite gap on main; the fix shape is a structured proof gate that runs at the same point and blocks the overwrite when uniqueSourceWork is non-empty. The three documented cases confirm the gate would have changed the outcome.

@Kaspre

Kaspre commented May 27, 2026

Thanks for raising this issue. It's very frustrating, especially when I have explicitly called out what the prior PR did, what the new PR does, and why they are not the same.

For example, "PR openclaw/openclaw#70691, merged on 2026-04-23 by @Takhoffman, fixed the gateway WebSocket teardown path by waiting for gateway client shutdown. This PR (#87097) covers the remaining local-agent CLI lifetime gap where work has completed but non-gateway handles, such as compaction loops, telemetry exporter timers, or plugin loops, still keep the process alive."

This seems straightforward, so I don't know why clawsweeper is still confused by this.

Some additional examples:

openclaw/openclaw#85630 "Close reason: duplicate or superseded."
openclaw/openclaw#85708 Replacement PR - Merged because it was NOT a duplicate of nor superseded by openclaw/openclaw#71465

openclaw/openclaw#85448 "Close reason: duplicate or superseded."
openclaw/openclaw#85707 Replacement PR - Merged because it was NOT a duplicate of nor superseded by openclaw/openclaw#80751

Maybe clawsweeper could give us a warning and a chance to explain before it closes the PR? There doesn't seem to be any way to reopen the PR and contest its decision after it has been closed. Thank you.

@Kaspre

Kaspre commented May 27, 2026

Additional feedback from other contributors:

Keshav G [Role icon, Clawtributors] — 5/23/2026 10:07 PM

Hello, my PR got autoclosed by @clawsweeper as superseded by another PR 83722. But that PR is exactly the reason I created a new PR to fix the performance regression introduced by it 🤯

Though still WIP
Can someone please reopen the PR so i can improve upon it

openclaw/openclaw#85853

Pinched-Nerve — 5/24/2026 12:00 AM

I think clawsweeper closed a bug in error. It claims 84261 is a duplicate of 71992.

While these two bugs do share similar symptoms, they are not the same.

84261: messages duplicate on occasion (specifically not all the time.)
71992: messages duplicate 100% of the time.

84261: issue started somewhere in the May 2026 betas. (specifically wasn't present before May)
71992: issue is reported in 2026.4.xx builds.

I think it's premature for clawsweeper to mark these as duplicates before a root cause is determined to be the same (or before a fix is made for one that also fixes the other.)


Labels

merge-risk: 🚨 automation 🚨
    Merging this PR could break CI, automerge, proof capture, label sync, or automation.
merge-risk: 🚨 security-boundary 🚨
    Merging this PR could weaken sandboxing, authorization, credentials, or sensitive data.
P1
    Urgent regression or broken agent/channel workflow affecting real users now.
rating: 🧂 unranked krab
    Not merge-ready due to missing proof or serious correctness/safety concerns.
status: ⏳ waiting on author
    ClawSweeper has contributor-facing work open and is waiting for author action.
