
fix(samples): propagate handler isError to MCP response; add content-text defense-in-depth#40060

Merged
dsyme merged 4 commits into main from copilot/fix-samples-mode-error-reporting Jun 18, 2026


Copilot AI commented Jun 18, 2026

Contributor

In samples mode, apply_samples reported every replay as successful even when the safe-output handler returned {"result":"error", ...}. The handler correctly sets isError: true on its return value, but both MCP dispatch layers discarded it and hardcoded isError: false, making handler failures invisible to the replay driver and producing green runs with no outputs.jsonl.

Root cause

mcp_server_core.cjs (handleRequest / handleMessage) and safe_outputs_mcp_server_http.cjs both did:

const content = handlerResult?.content ?? [];
return { content, isError: false };  // handler's isError silently dropped

The handlers in safe_outputs_handlers.cjs already returned { content, isError: true } on failure — that flag was never forwarded.
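A minimal sketch of the corrected dispatch step, assuming the handler result has the { content, isError } shape described above. The function name buildToolsCallResponse and the exact JSON-RPC envelope are illustrative assumptions, not the code in mcp_server_core.cjs:

```javascript
// Hypothetical sketch: names are illustrative, not the PR's actual code.
// Builds a JSON-RPC 2.0 response for an MCP "tools/call" request from a
// handler result shaped like { content, isError }.
function buildToolsCallResponse(id, handlerResult) {
  // Default to an empty content array, as the pre-fix code already did.
  const content = handlerResult?.content ?? [];
  // The fix: forward the handler's flag instead of hardcoding false.
  // Only a literal `true` counts as an error (an assumption of this sketch).
  const isError = handlerResult?.isError === true;
  return { jsonrpc: "2.0", id, result: { content, isError } };
}

// A failing handler result, shaped the way the handlers are described as returning it.
const failing = {
  content: [{ type: "text", text: JSON.stringify({ result: "error" }) }],
  isError: true,
};

console.log(buildToolsCallResponse(1, failing).result.isError); // true
console.log(buildToolsCallResponse(2, undefined).result.isError); // false
```

Treating only a literal true as an error keeps the behavior identical to the old hardcoded false for handlers that omit the flag, so only results that explicitly report failure change.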

Changes

  • mcp_server_core.cjs: handleRequest and handleMessage now preserve isError from the handler result instead of hardcoding false.
  • safe_outputs_mcp_server_http.cjs — same fix for both predefined-tool and dynamic safe-job registrations.
  • apply_samples.cjs — added sampleResultIsError(result) helper as a defense-in-depth fallback: detects {"result":"error"} in the content text even if an older server emits isError: false, keeping the driver resilient to protocol-version skew. The replay loop now uses this helper.
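The fallback described above can be sketched as follows. This is a hypothetical re-implementation written from the PR description (fast path on isError, then a guarded JSON parse of the content text), not the helper in apply_samples.cjs; scanning every text item rather than only the first, and the {"result":"success"} payload used below, are assumptions:

```javascript
// Hypothetical re-implementation for illustration only.
function sampleResultIsError(result) {
  // null/undefined results carry no error signal.
  if (!result) return false;
  // Fast path: trust an explicit isError flag from an up-to-date server.
  if (result.isError === true) return true;
  // Fallback: look for {"result":"error"} in the text content, which an
  // older server may still return alongside isError: false.
  const content = Array.isArray(result.content) ? result.content : [];
  for (const item of content) {
    if (!item || item.type !== "text" || typeof item.text !== "string") continue;
    try {
      const parsed = JSON.parse(item.text);
      if (parsed && parsed.result === "error") return true;
    } catch {
      // Non-JSON text is not an error signal; ignore it.
    }
  }
  return false;
}

// Pre-fix scenario: the server hardcodes isError: false.
const legacy = { isError: false, content: [{ type: "text", text: '{"result":"error"}' }] };
console.log(sampleResultIsError(legacy)); // true
console.log(sampleResultIsError({ content: [{ type: "text", text: "not json" }] })); // false
```

Passing { isError: true } together with success content returns true through the flag path alone, which is the isolation the reviewers asked for in the flag-precedence test.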

Tests

  • mcp_server_core.test.cjs: new case verifying a handler returning isError: true produces isError: true in the JSON-RPC response.
  • apply_samples.test.cjs: 6 unit tests for sampleResultIsError() — covers the success path, the isError flag, the content-text fallback (the pre-fix scenario), non-JSON content, null, and empty content array.

…nt check in apply_samples

Co-authored-by: dsyme <7204669+dsyme@users.noreply.github.com>
Copilot AI changed the title from "[WIP] Fix samples mode to report errors correctly from safe-output handler" to "fix(samples): propagate handler isError to MCP response; add content-text defense-in-depth" Jun 18, 2026
Copilot finished work on behalf of dsyme June 18, 2026 13:29
Copilot AI requested a review from dsyme June 18, 2026 13:29
@dsyme dsyme marked this pull request as ready for review June 18, 2026 13:48
Copilot AI review requested due to automatic review settings June 18, 2026 13:48
@github-actions

github-actions Bot commented Jun 18, 2026

Contributor

Test Quality Sentinel completed test quality analysis.

@github-actions

github-actions Bot commented Jun 18, 2026

Contributor

PR Code Quality Reviewer completed the code quality review.

@github-actions

github-actions Bot commented Jun 18, 2026

Contributor

Design Decision Gate 🏗️ completed the design decision gate check.

No ADR enforcement needed: PR #40060 does not have the 'implementation' label (has_implementation_label=false) and has 0 new lines of code in business logic directories (default_business_additions=0, threshold=100).

@github-actions

github-actions Bot commented Jun 18, 2026

Contributor

🧠 Matt Pocock Skills Reviewer has completed the skills-based review. ✅

Copilot AI left a comment

Contributor


⚠️ Not ready to approve

The newly added mcp_server_core.test.cjs test redundantly re-spies on process.stderr.write despite an existing spy in the enclosing beforeEach, which can cause Vitest to throw or behave inconsistently.

Pull request overview

This PR fixes error propagation in the sample replay path by preserving the isError flag from MCP tool handlers (instead of hardcoding false), and adds a defense-in-depth check in the samples driver to detect handler-encoded {"result":"error"} payloads even under protocol/version skew.

Changes:

  • Propagate isError from handler results through both MCP dispatch layers (mcp_server_core.cjs + safe_outputs_mcp_server_http.cjs).
  • Add sampleResultIsError() to apply_samples.cjs and use it in the replay loop as a fallback detector.
  • Add unit tests covering isError propagation and the new sampleResultIsError() helper.
File summaries
  • actions/setup/js/safe_outputs_mcp_server_http.cjs: Preserves handler-provided isError when normalizing tool results for MCP responses.
  • actions/setup/js/mcp_server_core.cjs: Preserves handler-provided isError in both HTTP (handleRequest) and stdio (handleMessage) dispatch paths.
  • actions/setup/js/apply_samples.cjs: Adds sampleResultIsError() fallback detection and uses it to mark sample replays as failed.
  • actions/setup/js/mcp_server_core.test.cjs: Adds a regression test asserting isError:true is forwarded into the JSON-RPC tools/call result.
  • actions/setup/js/apply_samples.test.cjs: Adds unit tests for sampleResultIsError() covering multiple input shapes and fallbacks.

Copilot's findings

  • Files reviewed: 5/5 changed files
  • Comments generated: 1




Comment thread actions/setup/js/mcp_server_core.test.cjs Outdated
@github-actions github-actions Bot mentioned this pull request Jun 18, 2026
@github-actions

Contributor

🧪 Test Quality Sentinel Report

Test Quality Score: 86/100 — Excellent

Analyzed 7 test(s) across 2 JavaScript test files: 7 design tests, 0 implementation tests, 0 guideline violations.

📊 Metrics & Test Classification (7 tests analyzed)
Metric | Value
New/modified tests analyzed | 7
✅ Design tests (behavioral contracts) | 7 (100%)
⚠️ Implementation tests (low value) | 0 (0%)
Tests with error/edge cases | 6 (86%)
Duplicate test clusters | 0
Test inflation detected | Yes — mcp_server_core.test.cjs (+44 lines) vs mcp_server_core.cjs (+6 lines) = 7.3x ratio
🚨 Coding-guideline violations | 0
Test | File | Classification | Issues Detected
returns false for a successful result | apply_samples.test.cjs:491 | ✅ Design | none
returns true when isError is true on the result | apply_samples.test.cjs:497 | ✅ Design | none
returns true (defense-in-depth) when content has result:error but isError false | apply_samples.test.cjs:505 | ✅ Design | none
returns false for non-JSON content text | apply_samples.test.cjs:513 | ✅ Design | none
returns false for null result | apply_samples.test.cjs:517 | ✅ Design | none
returns false for empty content array | apply_samples.test.cjs:521 | ✅ Design | none
should propagate isError:true from handler result to MCP response | mcp_server_core.test.cjs:190 | ✅ Design | none

Go: 0 (*_test.go); JavaScript: 7 (*.test.cjs, *.test.js). No other languages detected.

Inflation note: The +44 / +6 ratio for mcp_server_core is expected — testing the handleMessage path requires full server scaffolding (create server, register tool with error handler, send JSON-RPC message, inspect reply). The 10-pt penalty is applied per rubric, but the test itself is high-quality and correctly targets the behavioral regression.
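For reference, the inflation ratio and the edge-case percentage quoted in the metrics table follow directly from the reported counts:

```latex
\frac{44}{6} \approx 7.33 \quad (\text{reported as } 7.3\times), \qquad \frac{6}{7} \approx 0.857 \approx 86\%
```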

Mocking note: vi.spyOn(process.stderr, "write") in the mcp_server_core test mocks external I/O to suppress noise — this is the accepted pattern.

Verdict

Check passed. 0% implementation tests (threshold: 30%). All 7 new tests verify observable behavioral contracts: sampleResultIsError input/output contracts across 6 scenarios (null safety, empty array, non-JSON, primary error flag, defense-in-depth content check, and happy path), plus end-to-end isError propagation through handleMessage. The PR’s core fix — preserving isError from handler results — is directly exercised.

🧪 Test quality analysis by Test Quality Sentinel

@github-actions github-actions Bot left a comment

Contributor


✅ Test Quality Sentinel: 86/100. Test quality is excellent — 0% of new tests are implementation tests (threshold: 30%).

Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>

@github-actions github-actions Bot left a comment

Contributor


Skills-Based Review 🧠

Applied /diagnose and /tdd — approving with non-blocking suggestions on test coverage.

📋 Key Themes & Highlights

Key Themes

  • Test coverage gaps: handleRequest and the HTTP server fix sites lack regression tests — the three fixes are structurally identical, but only handleMessage is covered by the new test
  • Test signal isolation: the isError: true unit test in apply_samples.test.cjs conflates two detection paths (the flag and the content text); giving it success content would isolate the flag path

Positive Highlights

  • ✅ Root cause correctly identified and fixed at all three dispatch sites (handleRequest, handleMessage, HTTP ×2) — no half-measures
  • sampleResultIsError() is a well-designed defense layer: fast-path on isError, graceful fallback on content text, safe JSON parse with try/catch, and clean early-return structure
  • ✅ The pre-fix scenario (content-text only, isError: false) is explicitly tested and commented — this is excellent as documentation of the original failure mode
  • ✅ All six unit tests for sampleResultIsError cover meaningful boundaries (null, empty, non-JSON, success, flag, fallback)
  • ✅ Export of sampleResultIsError is clearly flagged in the export comment as test-only

🧠 Reviewed using Matt Pocock's skills by Matt Pocock Skills Reviewer

Comment thread actions/setup/js/mcp_server_core.test.cjs
Comment thread actions/setup/js/apply_samples.test.cjs Outdated
Comment thread actions/setup/js/safe_outputs_mcp_server_http.cjs Outdated

@github-actions github-actions Bot left a comment

Contributor


The core logic fix is correct and minimal — isError is now properly forwarded at all three dispatch sites, and the sampleResultIsError defense-in-depth helper is well-reasoned and well-tested.

🔍 Review themes

Test coverage gaps

Two of the three fixed code paths lack tests:

  • handleRequest in mcp_server_core.cjs has no unit test at all (only handleMessage is covered by the new test).
  • safe_outputs_mcp_server_http.cjs has no test file; neither the predefined-tool nor the dynamic safe-job isError fix is verified.

The sampleResultIsError and handleMessage tests are thorough; extending coverage to the other two paths would complete the picture.

Existing concern (already reviewed)

The new handleMessage test redundantly re-creates the server and calls vi.spyOn on process.stderr.write a second time, which the earlier review comment already flagged.

🔎 Code quality review by PR Code Quality Reviewer

Comment thread actions/setup/js/mcp_server_core.cjs
Comment thread actions/setup/js/safe_outputs_mcp_server_http.cjs Outdated
- Simplify handleMessage isError test to reuse beforeEach server/stderr spy
- Add handleRequest isError propagation regression tests
- Isolate flag path in sampleResultIsError flag-precedence test (success content)
- Extract normalizeMcpToolResult helper in HTTP server + unit-test both reg paths
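One way to structure the extracted helper so that both registration paths share it. Only the idea of a single shared normalizer comes from the commit message above; its real signature is not shown in this PR page, so normalizeToolResult, registerPredefinedTool, registerSafeJob, the Map-based registry and the tool names are hypothetical:

```javascript
// Hypothetical sketch: names and registry shape are illustrative assumptions.
// One shared normalizer, so both registration paths forward isError identically.
function normalizeToolResult(handlerResult) {
  return {
    content: handlerResult?.content ?? [],
    isError: handlerResult?.isError === true,
  };
}

// Tool name -> async wrapper that returns a normalized MCP result.
const tools = new Map();

// Path 1: a predefined tool with a fixed handler.
function registerPredefinedTool(name, handler) {
  tools.set(name, async args => normalizeToolResult(await handler(args)));
}

// Path 2: a dynamic safe job whose handler is built at registration time.
function registerSafeJob(jobName, makeHandler) {
  const handler = makeHandler(jobName);
  tools.set(jobName, async args => normalizeToolResult(await handler(args)));
}

registerPredefinedTool("create_issue", async () => ({ content: [], isError: true }));
registerSafeJob("deploy", () => async () => ({ content: [{ type: "text", text: "queued" }] }));

(async () => {
  console.log((await tools.get("create_issue")({})).isError); // true
  console.log((await tools.get("deploy")({})).isError); // false
})();
```

Because both wrappers call the same function, the flag logic is tested once on the normalizer, and each per-path test only has to confirm that its registration applies the wrapper.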
@dsyme dsyme merged commit 5c9d0f8 into main Jun 18, 2026
14 checks passed
@dsyme dsyme deleted the copilot/fix-samples-mode-error-reporting branch June 18, 2026 14:18

Labels

None yet

Projects

None yet


3 participants