fix(samples): propagate handler isError to MCP response; add content-text defense-in-depth by Copilot · Pull Request #40060 · github/gh-aw

Copilot · 2026-06-18T13:12:29Z

In samples mode, apply_samples reported every replay as successful even when the safe-output handler returned {"result":"error", ...}. The handler correctly sets isError: true on its return value, but both MCP dispatch layers discarded it and hardcoded isError: false, making handler failures invisible to the replay driver and producing green runs with no outputs.jsonl.

Root cause

mcp_server_core.cjs (handleRequest / handleMessage) and safe_outputs_mcp_server_http.cjs both did:

const content = handlerResult?.content ?? [];
return { content, isError: false };  // handler's isError silently dropped

The handlers in safe_outputs_handlers.cjs already returned { content, isError: true } on failure — that flag was never forwarded.
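As a minimal sketch of the corrected behaviour (not the actual diff), the dispatch step can forward the handler's flag instead of overwriting it. The helper name `toToolCallResult` is hypothetical and used only for illustration; the result shape `{ content, isError }` is taken from the description above.

```javascript
// Hypothetical helper name; per the PR description the real dispatch code
// lives in handleRequest / handleMessage and the HTTP server registrations.
function toToolCallResult(handlerResult) {
  const content = handlerResult?.content ?? [];
  // Forward the handler's flag; anything other than `true` counts as success.
  const isError = handlerResult?.isError === true;
  return { content, isError };
}

// A failing handler result keeps its error flag.
const failed = toToolCallResult({
  content: [{ type: "text", text: JSON.stringify({ result: "error" }) }],
  isError: true,
});

// A handler result without the flag is still reported as success.
const ok = toToolCallResult({
  content: [{ type: "text", text: JSON.stringify({ result: "success" }) }],
});

console.log(failed.isError, ok.isError);
```

Coercing with `=== true` is one possible choice; it keeps the response field a strict boolean even if a handler omits the flag or returns a non-boolean value.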

Changes

mcp_server_core.cjs — handleRequest and handleMessage now preserve isError from the handler result instead of hardcoding false.
safe_outputs_mcp_server_http.cjs — same fix for both predefined-tool and dynamic safe-job registrations.
apply_samples.cjs — added sampleResultIsError(result) helper as a defense-in-depth fallback: detects {"result":"error"} in the content text even if an older server emits isError: false, keeping the driver resilient to protocol-version skew. The replay loop now uses this helper.
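The fallback described for apply_samples.cjs could look roughly like the sketch below. This is an assumption-laden illustration, not the merged implementation: in particular, scanning every content item (rather than only the first) and the `type: "text"` item shape are guesses.

```javascript
// Sketch only: the real sampleResultIsError in apply_samples.cjs may differ.
function sampleResultIsError(result) {
  if (!result) return false;
  // Fast path: the server forwarded the handler's flag.
  if (result.isError === true) return true;

  // Fallback: look for {"result":"error"} encoded in the content text,
  // which is what an older server would emit alongside isError: false.
  const content = Array.isArray(result.content) ? result.content : [];
  for (const item of content) {
    if (!item || typeof item.text !== "string") continue;
    try {
      const parsed = JSON.parse(item.text);
      if (parsed && parsed.result === "error") return true;
    } catch {
      // Non-JSON text is not treated as an error signal.
    }
  }
  return false;
}

// The pre-fix scenario: error payload in the text, flag hardcoded to false.
const preFix = {
  content: [{ type: "text", text: '{"result":"error","error":"boom"}' }],
  isError: false,
};
console.log(sampleResultIsError(preFix));
```

Checking the flag first keeps the common case cheap, and the try/catch means free-form text from a handler never causes the replay driver itself to throw.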

Tests

mcp_server_core.test.cjs: new case verifying a handler returning isError: true produces isError: true in the JSON-RPC response.
apply_samples.test.cjs: 6 unit tests for sampleResultIsError() — covers the success path, the isError flag, the content-text fallback (the pre-fix scenario), non-JSON content, null, and empty content array.

github-actions · 2026-06-18T13:49:45Z

✅ Test Quality Sentinel completed test quality analysis.

github-actions · 2026-06-18T13:49:50Z

✅ PR Code Quality Reviewer completed the code quality review.

github-actions · 2026-06-18T13:50:01Z

✅ Design Decision Gate 🏗️ completed the design decision gate check.

No ADR enforcement needed: PR #40060 does not have the 'implementation' label (has_implementation_label=false) and has 0 new lines of code in business logic directories (default_business_additions=0, threshold=100).

github-actions · 2026-06-18T13:50:04Z

🧠 Matt Pocock Skills Reviewer has completed the skills-based review. ✅

Copilot

⚠️ Not ready to approve

The newly added mcp_server_core.test.cjs test redundantly re-spies on process.stderr.write despite an existing spy in the enclosing beforeEach, which can cause Vitest to throw or behave inconsistently.

Pull request overview

This PR fixes error propagation in the sample replay path by preserving the isError flag from MCP tool handlers (instead of hardcoding false), and adds a defense-in-depth check in the samples driver to detect handler-encoded {"result":"error"} payloads even under protocol/version skew.

Changes:

Propagate isError from handler results through both MCP dispatch layers (mcp_server_core.cjs + safe_outputs_mcp_server_http.cjs).
Add sampleResultIsError() to apply_samples.cjs and use it in the replay loop as a fallback detector.
Add unit tests covering isError propagation and the new sampleResultIsError() helper.

File summaries

File	Description
actions/setup/js/safe_outputs_mcp_server_http.cjs	Preserves handler-provided `isError` when normalizing tool results for MCP responses.
actions/setup/js/mcp_server_core.cjs	Preserves handler-provided `isError` in both HTTP (`handleRequest`) and stdio (`handleMessage`) dispatch paths.
actions/setup/js/apply_samples.cjs	Adds `sampleResultIsError()` fallback detection and uses it to mark sample replays as failed.
actions/setup/js/mcp_server_core.test.cjs	Adds a regression test asserting `isError:true` is forwarded into the JSON-RPC tools/call result.
actions/setup/js/apply_samples.test.cjs	Adds unit tests for `sampleResultIsError()` covering multiple input shapes and fallbacks.

Copilot's findings

Files reviewed: 5/5 changed files
Comments generated: 1

github-actions · 2026-06-18T13:56:05Z

🧪 Test Quality Sentinel Report

✅ Test Quality Score: 86/100 — Excellent

Analyzed 7 test(s) across 2 JavaScript test files: 7 design tests, 0 implementation tests, 0 guideline violations.

📊 Metrics & Test Classification (7 tests analyzed)

Metric	Value
New/modified tests analyzed	7
✅ Design tests (behavioral contracts)	7 (100%)
⚠️ Implementation tests (low value)	0 (0%)
Tests with error/edge cases	6 (86%)
Duplicate test clusters	0
Test inflation detected	Yes — `mcp_server_core.test.cjs` (+44 lines) vs `mcp_server_core.cjs` (+6 lines) = 7.3x ratio
🚨 Coding-guideline violations	0

Test	File	Classification	Issues Detected
`returns false for a successful result`	`apply_samples.test.cjs:491`	✅ Design	—
`returns true when isError is true on the result`	`apply_samples.test.cjs:497`	✅ Design	—
`returns true (defense-in-depth) when content has result:error but isError false`	`apply_samples.test.cjs:505`	✅ Design	—
`returns false for non-JSON content text`	`apply_samples.test.cjs:513`	✅ Design	—
`returns false for null result`	`apply_samples.test.cjs:517`	✅ Design	—
`returns false for empty content array`	`apply_samples.test.cjs:521`	✅ Design	—
`should propagate isError:true from handler result to MCP response`	`mcp_server_core.test.cjs:190`	✅ Design	—

Go: 0 (*_test.go); JavaScript: 7 (*.test.cjs, *.test.js). No other languages detected.

Inflation note: The +44 / +6 ratio for mcp_server_core is expected — testing the handleMessage path requires full server scaffolding (create server, register tool with error handler, send JSON-RPC message, inspect reply). The 10-pt penalty is applied per rubric, but the test itself is high-quality and correctly targets the behavioral regression.

Mocking note: vi.spyOn(process.stderr, "write") in the mcp_server_core test mocks external I/O to suppress noise — this is the accepted pattern.

Verdict

✅ Check passed. 0% implementation tests (threshold: 30%). All 7 new tests verify observable behavioral contracts: sampleResultIsError input/output contracts across 6 scenarios (null safety, empty array, non-JSON, primary error flag, defense-in-depth content check, and happy path), plus end-to-end isError propagation through handleMessage. The PR’s core fix — preserving isError from handler results — is directly exercised.

🧪 Test quality analysis by Test Quality Sentinel

github-actions

✅ Test Quality Sentinel: 86/100. Test quality is excellent — 0% of new tests are implementation tests (threshold: 30%).

github-actions

Skills-Based Review 🧠

Applied /diagnose and /tdd — approving with non-blocking suggestions on test coverage.

📋 Key Themes & Highlights

Key Themes

Test coverage gaps: handleRequest and the HTTP server fix sites lack regression tests — the three fixes are structurally identical, but only handleMessage is covered by the new test
Test signal isolation: the isError: true unit test in apply_samples.test.cjs conflates two detection paths (the flag and the content-text fallback); pairing isError: true with success content would isolate the flag path

Positive Highlights

✅ Root cause correctly identified and fixed at all three dispatch sites (handleRequest, handleMessage, HTTP ×2) — no half-measures
✅ sampleResultIsError() is a well-designed defense layer: fast-path on isError, graceful fallback on content text, safe JSON parse with try/catch, and clean early-return structure
✅ The pre-fix scenario (content-text only, isError: false) is explicitly tested and commented — this is excellent as documentation of the original failure mode
✅ All six unit tests for sampleResultIsError cover meaningful boundaries (null, empty, non-JSON, success, flag, fallback)
✅ Export of sampleResultIsError is clearly flagged in the export comment as test-only

🧠 Reviewed using Matt Pocock's skills by Matt Pocock Skills Reviewer

github-actions

The core logic fix is correct and minimal — isError is now properly forwarded at all three dispatch sites, and the sampleResultIsError defense-in-depth helper is well-reasoned and well-tested.

🔍 Review themes

Test coverage gaps

Two of the three fixed code paths lack tests:

handleRequest in mcp_server_core.cjs has no unit test at all (only handleMessage is covered by the new test).
safe_outputs_mcp_server_http.cjs has no test file; neither the predefined-tool nor the dynamic safe-job isError fix is verified.

The sampleResultIsError and handleMessage tests are thorough; extending coverage to the other two paths would complete the picture.

Existing concern (already reviewed)

The new handleMessage test redundantly re-creates the server and calls vi.spyOn on process.stderr.write a second time, which the earlier review comment already flagged.

🔎 Code quality review by PR Code Quality Reviewer

- Simplify handleMessage isError test to reuse beforeEach server/stderr spy
- Add handleRequest isError propagation regression tests
- Isolate flag path in sampleResultIsError flag-precedence test (success content)
- Extract normalizeMcpToolResult helper in HTTP server + unit-test both reg paths

Initial plan

cef4429

Copilot AI assigned Copilot and dsyme Jun 18, 2026

Copilot started work on behalf of dsyme June 18, 2026 13:17

fix: preserve isError from handler result; add defense-in-depth conte…

0b9cb68

…nt check in apply_samples Co-authored-by: dsyme <7204669+dsyme@users.noreply.github.com>

Copilot AI changed the title ~~[WIP] Fix samples mode to report errors correctly from safe-output handler~~ fix(samples): propagate handler isError to MCP response; add content-text defense-in-depth Jun 18, 2026

Copilot finished work on behalf of dsyme June 18, 2026 13:29

Copilot AI requested a review from dsyme June 18, 2026 13:29

github-actions Bot mentioned this pull request Jun 18, 2026

[PR Triage Report] [PR Triage] Copilot Agent PR Queue — 2026-06-18 (Run #27762753975) #40066

Open

dsyme marked this pull request as ready for review June 18, 2026 13:48

Copilot AI review requested due to automatic review settings June 18, 2026 13:48

Copilot started reviewing on behalf of dsyme June 18, 2026 13:48

Copilot AI reviewed Jun 18, 2026

Comment thread actions/setup/js/mcp_server_core.test.cjs Outdated

github-actions Bot mentioned this pull request Jun 18, 2026

[aw] No-Op Runs #39849

Open

github-actions Bot approved these changes Jun 18, 2026

Potential fix for pull request finding

f419df0

Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>

github-actions Bot approved these changes Jun 18, 2026

Comment thread actions/setup/js/mcp_server_core.test.cjs

Comment thread actions/setup/js/apply_samples.test.cjs Outdated

Comment thread actions/setup/js/safe_outputs_mcp_server_http.cjs Outdated

github-actions Bot reviewed Jun 18, 2026

Comment thread actions/setup/js/mcp_server_core.cjs

Comment thread actions/setup/js/safe_outputs_mcp_server_http.cjs Outdated

dsyme merged commit 5c9d0f8 into main Jun 18, 2026
14 checks passed

dsyme deleted the copilot/fix-samples-mode-error-reporting branch June 18, 2026 14:18

dsyme merged 4 commits into main from copilot/fix-samples-mode-error-reporting

3 participants
