Add a 4-process guardrail for Go MCP server child gh invocations #38544
Changes from all commits
fdbb911
77b9170
e9b2e46
254b1c9
e4eab79
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,40 @@ | ||
| # ADR-38544: Cap MCP Server Child Process Concurrency with a Shared Semaphore Guardrail | ||
|
|
||
| **Date**: 2026-06-11 | ||
| **Status**: Accepted — the Go MCP server now enforces a shared 4-process subprocess cap. | ||
| **Deciders**: pelikhan, Copilot | ||
|
|
||
| ## Context | ||
|
|
||
| The Go MCP server shells out to `gh` and `gh aw` child processes to service tool calls (workflow add/update/fix, logs, audit, diff, compile, inspect), as well as for supporting operations such as repository lookup, actor permission checks, config validation, and the `gh version` startup probe. Under concurrent tool usage these subprocess invocations were unbounded: every in-flight request that needed a child process spawned one immediately, so a burst of concurrent requests could fan out an arbitrary number of `gh`/`gh aw` processes and exhaust host resources (file descriptors, memory, process table). Nothing enforced a central ceiling, and each call site invoked `cmd.Output()` / `cmd.CombinedOutput()` independently. This change is preventive hardening: the goal is to add an explicit server-side guardrail before resource pressure shows up as flaky or host-specific failures. | ||
|
|
||
| ## Decision | ||
|
|
||
| We will cap the number of simultaneously active server-managed child processes at **4** using a single shared, context-aware guardrail. The guardrail (`mcpSubprocessGuardrail`) is a buffered-channel semaphore: each subprocess acquires one slot before executing and releases it when done. Acquisition is context-aware, so a cancelled request stops waiting rather than blocking behind queued subprocesses. All subprocess call sites route through the shared `defaultMCPSubprocessGuardrail` via the helpers `runMCPSubprocessOutput`, `runMCPSubprocessCombinedOutput`, `runMCPExecOutput`, and `runMCPExecCombinedOutput`, replacing direct `cmd.Output()`/`cmd.CombinedOutput()` calls. Output contracts and tool behavior are unchanged; only concurrency is bounded. | ||
|
|
||
| ## Alternatives Considered | ||
|
|
||
| ### Alternative 1: Leave subprocess spawning unbounded | ||
| Keep relying on the operating system and `gh`'s own behavior to absorb concurrent load. Rejected because the failure mode (resource exhaustion under a burst of concurrent tool calls) is silent and hard to diagnose, and the server has no backpressure mechanism of its own once limits are hit. | ||
|
|
||
| ### Alternative 2: Per-call-site or per-tool limits | ||
| Give each tool or call site its own concurrency limit rather than one shared cap. Rejected because the resource pressure is global — total live child processes is what matters — so independent per-tool counters could still sum well past a safe ceiling, and would duplicate limiting logic across many files. | ||
|
|
||
| ### Alternative 3: Explicit worker pool / job queue | ||
| Introduce a fixed pool of worker goroutines that own subprocess execution, with requests submitting jobs to a queue. Rejected as heavier than needed: it changes the execution model and call-site ergonomics, whereas a buffered-channel semaphore achieves the same bound with a minimal, drop-in wrapper around existing `exec.Cmd` calls. | ||
|
|
||
| ## Consequences | ||
|
|
||
| ### Positive | ||
| - The total number of concurrent server-managed child processes is bounded at 4, preventing unbounded fan-out and the associated resource exhaustion. | ||
| - A single chokepoint (`defaultMCPSubprocessGuardrail`) centralizes the limit; future call sites that use the helpers are covered automatically. | ||
| - Slot acquisition respects `context` cancellation, so cancelled or timed-out requests do not hang waiting for a slot. | ||
|
|
||
| ### Negative | ||
| - The limit `4` is a hardcoded constant (`maxActiveMCPChildProcesses`) and is not configurable per host or deployment; tuning requires a code change. That is intentional for this first guardrail because the goal is to add one deterministic ceiling without introducing new user-facing configuration surface. | ||
| - Under high concurrency, requests block while waiting for a free slot, which can increase tail latency for tool calls that previously ran immediately. | ||
| - Correctness depends on every subprocess call site using the guardrail helpers; a direct `cmd.Output()`/`cmd.CombinedOutput()` added later would silently bypass the cap. | ||
|
|
||
| ### Neutral | ||
| - The guardrail is process-global state (a package-level `defaultMCPSubprocessGuardrail`), shared across all MCP requests in the server process. | ||
| - Existing stdout/stderr separation (using `Output()` vs `CombinedOutput()` for JSON-producing commands) is preserved through distinct helper variants. |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,81 @@ | ||
| package cli | ||
|
|
||
| import ( | ||
| "context" | ||
| "os/exec" | ||
| ) | ||
|
|
||
| const maxActiveMCPChildProcesses = 4 | ||
|
|
||
| type mcpSubprocessGuardrail struct { | ||
| slots chan struct{} | ||
| } | ||
|
|
||
| var defaultMCPSubprocessGuardrail = newMCPSubprocessGuardrail(maxActiveMCPChildProcesses) | ||
|
|
||
| func newMCPSubprocessGuardrail(limit int) *mcpSubprocessGuardrail { | ||
| return &mcpSubprocessGuardrail{ | ||
| slots: make(chan struct{}, limit), | ||
| } | ||
| } | ||
|
|
||
| func (g *mcpSubprocessGuardrail) acquire(ctx context.Context) error { | ||
| if err := ctx.Err(); err != nil { | ||
| return err | ||
| } | ||
|
|
||
| select { | ||
| case g.slots <- struct{}{}: | ||
| if err := ctx.Err(); err != nil { | ||
| g.release() | ||
| return err | ||
| } | ||
| return nil | ||
| case <-ctx.Done(): | ||
| return ctx.Err() | ||
| } | ||
| } | ||
|
Contributor
💡 Detail and suggested fix

The companion test

Fix with a pre-check that makes the already-cancelled case deterministic:

    func (g *mcpSubprocessGuardrail) acquire(ctx context.Context) error {
        if err := ctx.Err(); err != nil {
            return err
        }
        select {
        case g.slots <- struct{}{}:
            return nil
        case <-ctx.Done():
            return ctx.Err()
        }
    }

The TOCTOU window after the check is harmless: a context cancelled right after still proceeds, but the cmd (created with |
||
|
|
||
| func (g *mcpSubprocessGuardrail) release() { | ||
| <-g.slots | ||
| } | ||
|
|
||
| func (g *mcpSubprocessGuardrail) output(ctx context.Context, cmd *exec.Cmd) ([]byte, error) { | ||
| if err := g.acquire(ctx); err != nil { | ||
| return nil, err | ||
| } | ||
| defer g.release() | ||
|
|
||
| return cmd.Output() | ||
|
Contributor
Context is silently discarded after slot acquisition —

💡 Detail

This is a real exposure in

At minimum, document the contract on

    // runMCPSubprocessOutput executes cmd under the guardrail semaphore.
    // ctx is used only for semaphore acquisition. To bound subprocess execution,
    // cmd must be created with exec.CommandContext or workflow.ExecGHContext
    // using the same ctx.
    func runMCPSubprocessOutput(ctx context.Context, cmd *exec.Cmd) ([]byte, error) {

The stronger fix is to enforce context-aware cmd creation at the call sites that currently use |
||
| } | ||
|
|
||
| func (g *mcpSubprocessGuardrail) combinedOutput(ctx context.Context, cmd *exec.Cmd) ([]byte, error) { | ||
| if err := g.acquire(ctx); err != nil { | ||
| return nil, err | ||
| } | ||
| defer g.release() | ||
|
|
||
| return cmd.CombinedOutput() | ||
| } | ||
|
|
||
| // runMCPSubprocessOutput executes cmd under the shared MCP subprocess guardrail. | ||
| // ctx governs slot acquisition and any subprocess cancellation only when cmd was | ||
| // created with the same context (for example via exec.CommandContext or ExecGHContext). | ||
| func runMCPSubprocessOutput(ctx context.Context, cmd *exec.Cmd) ([]byte, error) { | ||
| return defaultMCPSubprocessGuardrail.output(ctx, cmd) | ||
| } | ||
|
|
||
| // runMCPSubprocessCombinedOutput executes cmd under the shared MCP subprocess | ||
| // guardrail. ctx governs slot acquisition and any subprocess cancellation only | ||
| // when cmd was created with the same context. | ||
| func runMCPSubprocessCombinedOutput(ctx context.Context, cmd *exec.Cmd) ([]byte, error) { | ||
| return defaultMCPSubprocessGuardrail.combinedOutput(ctx, cmd) | ||
| } | ||
|
|
||
| func runMCPExecOutput(ctx context.Context, execCmd execCmdFunc, args ...string) ([]byte, error) { | ||
| return runMCPSubprocessOutput(ctx, execCmd(ctx, args...)) | ||
| } | ||
|
|
||
| func runMCPExecCombinedOutput(ctx context.Context, execCmd execCmdFunc, args ...string) ([]byte, error) { | ||
| return runMCPSubprocessCombinedOutput(ctx, execCmd(ctx, args...)) | ||
| } | ||