github · pelikhan · Jun 11, 2026 · Jun 11, 2026 · Jun 11, 2026 · Jun 11, 2026
diff --git a/docs/adr/38544-cap-mcp-server-child-process-concurrency.md b/docs/adr/38544-cap-mcp-server-child-process-concurrency.md
@@ -0,0 +1,40 @@
+# ADR-38544: Cap MCP Server Child Process Concurrency with a Shared Semaphore Guardrail
+
+**Date**: 2026-06-11
+**Status**: Accepted — the Go MCP server now enforces a shared 4-process subprocess cap.
+**Deciders**: pelikhan, Copilot
+
+## Context
+
+The Go MCP server shells out to `gh` and `gh aw` child processes to service tool calls (workflow add/update/fix, logs, audit, diff, compile, inspect), as well as for supporting operations such as repository lookup, actor permission checks, config validation, and the `gh version` startup probe. Under concurrent tool usage these subprocess invocations were unbounded: every in-flight request that needed a child process spawned one immediately, so a burst of concurrent requests could fan out an arbitrary number of `gh`/`gh aw` processes and exhaust host resources (file descriptors, memory, process table). There was no central place enforcing a ceiling, and each call site managed its own `cmd.Output()` / `cmd.CombinedOutput()` independently. This change is preventive hardening: the issue was to add an explicit server-side guardrail before resource pressure showed up as flaky or host-specific failures.
+
+## Decision
+
+We will cap the number of simultaneously-active server-managed child processes at **4** using a single shared, context-aware guardrail. The guardrail (`mcpSubprocessGuardrail`) is a buffered-channel semaphore: each subprocess acquires one slot before executing and releases it when done. Acquisition is context-aware, so a cancelled request stops waiting rather than blocking behind queued subprocesses. All subprocess call sites route through the shared `defaultMCPSubprocessGuardrail` via the helpers `runMCPSubprocessOutput`, `runMCPSubprocessCombinedOutput`, `runMCPExecOutput`, and `runMCPExecCombinedOutput`, replacing direct `cmd.Output()`/`cmd.CombinedOutput()` calls. Output contracts and tool behavior are unchanged; only concurrency is bounded.
+
+## Alternatives Considered
+
+### Alternative 1: Leave subprocess spawning unbounded
+Keep relying on the operating system and `gh`'s own behavior to absorb concurrent load. Rejected because the failure mode (resource exhaustion under a burst of concurrent tool calls) is silent and hard to diagnose, and the server has no backpressure mechanism of its own once limits are hit.
+
+### Alternative 2: Per-call-site or per-tool limits
+Give each tool or call site its own concurrency limit rather than one shared cap. Rejected because the resource pressure is global — total live child processes is what matters — so independent per-tool counters could still sum well past a safe ceiling, and would duplicate limiting logic across many files.
+
+### Alternative 3: Explicit worker pool / job queue
+Introduce a fixed pool of worker goroutines that own subprocess execution, with requests submitting jobs to a queue. Rejected as heavier than needed: it changes the execution model and call-site ergonomics, whereas a buffered-channel semaphore achieves the same bound with a minimal, drop-in wrapper around existing `exec.Cmd` calls.
+
+## Consequences
+
+### Positive
+- Total concurrent server-managed child processes is bounded at 4, preventing unbounded fan-out and the associated resource exhaustion.
+- A single chokepoint (`defaultMCPSubprocessGuardrail`) centralizes the limit; future call sites that use the helpers are covered automatically.
+- Slot acquisition respects `context` cancellation, so cancelled or timed-out requests do not hang waiting for a slot.
+
+### Negative
+- The limit `4` is a hardcoded constant (`maxActiveMCPChildProcesses`) and is not configurable per host or deployment; tuning requires a code change. That is intentional for this first guardrail because the goal is to add one deterministic ceiling without introducing new user-facing configuration surface.
+- Under high concurrency, requests block while waiting for a free slot, which can increase tail latency for tool calls that previously ran immediately.
+- Correctness depends on every subprocess call site using the guardrail helpers; a direct `cmd.Output()`/`cmd.CombinedOutput()` added later would silently bypass the cap.
+
+### Neutral
+- The guardrail is process-global state (a package-level `defaultMCPSubprocessGuardrail`), shared across all MCP requests in the server process.
+- Existing stdout/stderr separation (using `Output()` vs `CombinedOutput()` for JSON-producing commands) is preserved through distinct helper variants.
diff --git a/pkg/cli/mcp_helpers.go b/pkg/cli/mcp_helpers.go
@@ -7,7 +7,9 @@ package cli
 import (
 	"fmt"
 	"os"
+	"os/exec"
 	"path/filepath"
+	"strings"
 
 	"github.com/github/gh-aw/pkg/logger"
 )
@@ -67,3 +69,27 @@ func logAndValidateBinaryPath() (string, error) {
 	mcpHelpersLog.Printf("gh-aw binary path: %s", binaryPath)
 	return binaryPath, nil
 }
+
+func withNonInteractiveCIEnv(env []string) []string {
+	if env == nil {
+		env = os.Environ()
+	}
+
+	env = append([]string(nil), env...)
+	for i, entry := range env {
+		if strings.HasPrefix(entry, "CI=") {
+			env[i] = "CI=1"
+			return env
+		}
+	}
+
+	return append(env, "CI=1")
+}
+
+func setNonInteractiveCIEnv(cmd *exec.Cmd) {
+	if cmd == nil {
+		return
+	}
+
+	cmd.Env = withNonInteractiveCIEnv(cmd.Env)
+}
diff --git a/pkg/cli/mcp_permissions.go b/pkg/cli/mcp_permissions.go
@@ -34,7 +34,7 @@ func queryActorRole(ctx context.Context, actor string, repo string) (string, err
 	mcpLog.Printf("Querying GitHub API for %s's permission in %s", actor, repo)
 
 	cmd := workflow.ExecGHContext(ctx, "api", apiPath, "--jq", ".permission")
-	output, err := cmd.Output()
+	output, err := runMCPSubprocessOutput(ctx, cmd)
 	if err != nil {
 		mcpLog.Printf("Failed to query actor permission: %v", err)
 		return "", fmt.Errorf("failed to query actor permission: %w", err)
@@ -83,7 +83,7 @@ func checkActorPermission(ctx context.Context, actor string, validateActor bool,
 	}
 
 	// Get repository using cached lookup
-	repo, err := getRepository()
+	repo, err := getRepository(ctx)
 	if err != nil {
 		mcpLog.Printf("Tool %s: failed to get repository context, denying access: %v", toolName, err)
 		return newMCPError(jsonrpc.CodeInternalError, "permission check failed", map[string]any{

diff --git a/pkg/cli/mcp_repository.go b/pkg/cli/mcp_repository.go
@@ -1,6 +1,7 @@
 package cli
 
 import (
+	"context"
 	"errors"
 	"fmt"
 	"os"
@@ -12,7 +13,7 @@ import (
 // getRepository retrieves the current repository name (owner/repo format).
 // Results are cached for 1 hour to avoid repeated queries.
 // Checks GITHUB_REPOSITORY environment variable first, then falls back to gh repo view.
-func getRepository() (string, error) {
+func getRepository(ctx context.Context) (string, error) {
 	// Check cache first
 	if repo, ok := mcpCache.GetRepo(); ok {
 		mcpLog.Printf("Using cached repository: %s", repo)
@@ -29,8 +30,8 @@ func getRepository() (string, error) {
 
 	// Fall back to gh repo view
 	mcpLog.Print("Querying repository using gh repo view")
-	cmd := workflow.ExecGH("repo", "view", "--json", "nameWithOwner", "--jq", ".nameWithOwner")
-	output, err := cmd.Output()
+	cmd := workflow.ExecGHContext(ctx, "repo", "view", "--json", "nameWithOwner", "--jq", ".nameWithOwner")
+	output, err := runMCPSubprocessOutput(ctx, cmd)
 	if err != nil {
 		mcpLog.Printf("Failed to get repository: %v", err)
 		return "", fmt.Errorf("failed to get repository: %w", err)

diff --git a/pkg/cli/mcp_server.go b/pkg/cli/mcp_server.go
@@ -16,15 +16,21 @@ type execCmdFunc func(ctx context.Context, args ...string) *exec.Cmd
 // The second return value is a reserved SDK extension slot and must be nil for now.
 
 // createMCPServer creates and configures the MCP server with all tools
-func createMCPServer(cmdPath string, actor string, validateActor bool, manifestCacheFile string) *mcp.Server {
+func createMCPServer(cmdPath string, actor string, validateActor bool, manifestCacheFile string, env []string) *mcp.Server {
 	// Helper function to execute command with proper path
 	execCmd := func(ctx context.Context, args ...string) *exec.Cmd {
+		var cmd *exec.Cmd
 		if cmdPath != "" {
 			// Use custom command path
-			return exec.CommandContext(ctx, cmdPath, args...)
+			cmd = exec.CommandContext(ctx, cmdPath, args...)
+		} else {
+			// Use default gh aw command with proper token handling
+			cmd = workflow.ExecGHContext(ctx, append([]string{"aw"}, args...)...)
+		}
+		if env != nil {
+			cmd.Env = append([]string(nil), env...)
 		}
-		// Use default gh aw command with proper token handling
-		return workflow.ExecGHContext(ctx, append([]string{"aw"}, args...)...)
+		return cmd
 	}
 
 	// Log actor and validation settings

diff --git a/pkg/cli/mcp_server_command.go b/pkg/cli/mcp_server_command.go
@@ -3,10 +3,8 @@ package cli
 import (
 	"context"
 	"os"
-	"strings"
 
 	"github.com/github/gh-aw/pkg/logger"
-	"github.com/github/gh-aw/pkg/workflow"
 	"github.com/modelcontextprotocol/go-sdk/mcp"
 	"github.com/spf13/cobra"
 )
@@ -60,7 +58,7 @@ Examples:
   GITHUB_ACTOR=octocat gh aw mcp-server                # Set actor via environment variable for access control
   DEBUG=mcp:* GITHUB_ACTOR=octocat gh aw mcp-server    # Run with verbose debug logging and actor set via environment variable`,
 		RunE: func(cmd *cobra.Command, args []string) error {
-			return runMCPServer(port, cmdPath, validateActor)
+			return runMCPServer(cmd.Context(), port, cmdPath, validateActor)
 		},
 	}
 
@@ -71,24 +69,10 @@ Examples:
 	return cmd
 }
 
-// checkAndLogGHVersion checks if gh CLI is available and logs its version.
-// Diagnostics are emitted through the debug logger only.
-func checkAndLogGHVersion() {
-	cmd := workflow.ExecGH("version")
-	output, err := cmd.CombinedOutput()
-
-	if err != nil {
-		mcpLog.Print("WARNING: gh CLI not found in PATH")
-		return
-	}
-
-	// Parse and log the version
-	versionOutput := strings.TrimSpace(string(output))
-	mcpLog.Printf("gh CLI version: %s", versionOutput)
-}
-
 // runMCPServer starts the MCP server on stdio or HTTP transport
-func runMCPServer(port int, cmdPath string, validateActor bool) error {
+func runMCPServer(ctx context.Context, port int, cmdPath string, validateActor bool) error {
+	mcpServerEnv := withNonInteractiveCIEnv(nil)
+
 	// Get actor from environment variable
 	actor := os.Getenv("GITHUB_ACTOR")
 
@@ -127,13 +111,10 @@ func runMCPServer(port int, cmdPath string, validateActor bool) error {
 		mcpLog.Printf("WARNING: Failed to get current working directory: %v", err)
 	}
 
-	// Check and log gh CLI version
-	checkAndLogGHVersion()
-
 	// Validate that the CLI and secrets are properly configured
 	// Note: Validation failures are logged as warnings but don't prevent server startup
 	// This allows the server to start in test environments or non-repository directories
-	if err := validateMCPServerConfiguration(cmdPath); err != nil {
+	if err := validateMCPServerConfiguration(ctx, cmdPath, mcpServerEnv); err != nil {
 		mcpLog.Printf("Configuration validation warning: %v", err)
 	}
 
@@ -159,7 +140,7 @@ func runMCPServer(port int, cmdPath string, validateActor bool) error {
 	}
 
 	// Create the server configuration
-	server := createMCPServer(cmdPath, actor, validateActor, manifestCacheFile)
+	server := createMCPServer(cmdPath, actor, validateActor, manifestCacheFile, mcpServerEnv)
 
 	if port > 0 {
 		// Run HTTP server with SSE transport
@@ -168,5 +149,5 @@ func runMCPServer(port int, cmdPath string, validateActor bool) error {
 
 	// Run stdio transport
 	mcpLog.Print("MCP server ready on stdio")
-	return server.Run(context.Background(), &mcp.StdioTransport{})
+	return server.Run(ctx, &mcp.StdioTransport{})
 }
diff --git a/pkg/cli/mcp_server_unit_test.go b/pkg/cli/mcp_server_unit_test.go
@@ -19,7 +19,7 @@ import (
 // TestMCPServerUnit_ListTools verifies that the MCP server exposes exactly the
 // expected set of tools without spawning a subprocess.
 func TestMCPServerUnit_ListTools(t *testing.T) {
-	server := createMCPServer("", "", false, "")
+	server := createMCPServer("", "", false, "", nil)
 	session := connectInMemory(t, server)
 
 	ctx := context.Background()
@@ -43,7 +43,7 @@ func TestMCPServerUnit_ListTools(t *testing.T) {
 // TestMCPServerUnit_ServerCapabilities verifies that the server advertises the
 // Tools capability with ListChanged=false (tools are static, no notifications needed).
 func TestMCPServerUnit_ServerCapabilities(t *testing.T) {
-	server := createMCPServer("", "", false, "")
+	server := createMCPServer("", "", false, "", nil)
 	session := connectInMemory(t, server)
 
 	initResult := session.InitializeResult()
@@ -66,7 +66,7 @@ func TestMCPServerUnit_StatusTool(t *testing.T) {
 	require.NoError(t, os.Chdir(tmpDir), "should change to temp dir")
 	t.Cleanup(func() { _ = os.Chdir(origDir) })
 
-	server := createMCPServer("", "", false, "")
+	server := createMCPServer("", "", false, "", nil)
 	session := connectInMemory(t, server)
 
 	ctx := context.Background()

diff --git a/pkg/cli/mcp_subprocess_guardrail.go b/pkg/cli/mcp_subprocess_guardrail.go
@@ -0,0 +1,81 @@
+package cli
+
+import (
+	"context"
+	"os/exec"
+)
+
+const maxActiveMCPChildProcesses = 4
+
+type mcpSubprocessGuardrail struct {
+	slots chan struct{}
+}
+
+var defaultMCPSubprocessGuardrail = newMCPSubprocessGuardrail(maxActiveMCPChildProcesses)
+
+func newMCPSubprocessGuardrail(limit int) *mcpSubprocessGuardrail {
+	return &mcpSubprocessGuardrail{
+		slots: make(chan struct{}, limit),
+	}
+}
+
+func (g *mcpSubprocessGuardrail) acquire(ctx context.Context) error {
+	if err := ctx.Err(); err != nil {
+		return err
+	}
+
+	select {
+	case g.slots <- struct{}{}:
+		if err := ctx.Err(); err != nil {
+			g.release()
+			return err
+		}
+		return nil
+	case <-ctx.Done():
+		return ctx.Err()
+	}
+}
+
+func (g *mcpSubprocessGuardrail) release() {
+	<-g.slots
+}
+
+func (g *mcpSubprocessGuardrail) output(ctx context.Context, cmd *exec.Cmd) ([]byte, error) {
+	if err := g.acquire(ctx); err != nil {
+		return nil, err
+	}
+	defer g.release()
+
+	return cmd.Output()
+}
+
+func (g *mcpSubprocessGuardrail) combinedOutput(ctx context.Context, cmd *exec.Cmd) ([]byte, error) {
+	if err := g.acquire(ctx); err != nil {
+		return nil, err
+	}
+	defer g.release()
+
+	return cmd.CombinedOutput()
+}
+
+// runMCPSubprocessOutput executes cmd under the shared MCP subprocess guardrail.
+// ctx governs slot acquisition and any subprocess cancellation only when cmd was
+// created with the same context (for example via exec.CommandContext or ExecGHContext).
+func runMCPSubprocessOutput(ctx context.Context, cmd *exec.Cmd) ([]byte, error) {
+	return defaultMCPSubprocessGuardrail.output(ctx, cmd)
+}
+
+// runMCPSubprocessCombinedOutput executes cmd under the shared MCP subprocess
+// guardrail. ctx governs slot acquisition and any subprocess cancellation only
+// when cmd was created with the same context.
+func runMCPSubprocessCombinedOutput(ctx context.Context, cmd *exec.Cmd) ([]byte, error) {
+	return defaultMCPSubprocessGuardrail.combinedOutput(ctx, cmd)
+}
+
+func runMCPExecOutput(ctx context.Context, execCmd execCmdFunc, args ...string) ([]byte, error) {
+	return runMCPSubprocessOutput(ctx, execCmd(ctx, args...))
+}
+
+func runMCPExecCombinedOutput(ctx context.Context, execCmd execCmdFunc, args ...string) ([]byte, error) {
+	return runMCPSubprocessCombinedOutput(ctx, execCmd(ctx, args...))
+}