fix: isolate copilot_sdk_driver test session state writes to prevent false-positive tool-denial issues#39940
Conversation
Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
…false-positive tool-denial issues Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
There was a problem hiding this comment.
Pull request overview
This PR addresses a false-positive guard.tool_denials_exceeded detection by isolating Copilot SDK session-state JSONL writes during JavaScript validation tests, so test-only “tool denial” events don’t get picked up by post-agent log scanners.
Changes:
- Adds a configurable Copilot session state base directory in
runWithCopilotSDK(parameter +GH_AW_SESSION_STATE_BASE_DIRenv override). - Updates
copilot_sdk_drivertests to redirect session-state writes into a temporary per-suite directory and clean it up afterward. - Adjusts the header comment in the setup-cli
install.shscript (but the new wording appears incorrect/self-referential).
Show a summary per file
| File | Description |
|---|---|
| actions/setup/js/copilot_sdk_session.cjs | Allows redirecting Copilot session-state event log writes via option/env var to avoid polluting the default sandbox path. |
| actions/setup/js/copilot_sdk_driver.test.cjs | Sets GH_AW_SESSION_STATE_BASE_DIR to an isolated temp dir for the test suite to prevent false-positive tool-denial detection. |
| actions/setup-cli/install.sh | Changes the “kept in sync” header comment; current wording likely contradicts the documented generation flow. |
Copilot's findings
Tip
Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
- Files reviewed: 3/3 changed files
- Comments generated: 2
| set +o histexpand | ||
|
|
||
| # Kept in sync with install-gh-aw.sh — edit that file, then copy to this path. | ||
| # Kept in sync with actions/setup-cli/install.sh — edit this file, then copy to that path. |
| let testSessionStateDir; | ||
| beforeAll(() => { | ||
| testSessionStateDir = fs.mkdtempSync(path.join(os.tmpdir(), "gh-aw-test-session-state-")); | ||
| process.env.GH_AW_SESSION_STATE_BASE_DIR = testSessionStateDir; | ||
| }); | ||
| afterAll(() => { | ||
| delete process.env.GH_AW_SESSION_STATE_BASE_DIR; | ||
| fs.rmSync(testSessionStateDir, { recursive: true, force: true }); | ||
| }); |
|
Hey One thing to tighten up before merging:
If you'd like a hand splitting this up, here's a ready-to-use prompt: Warning Firewall blocked 1 domainThe following domain was blocked by the firewall during workflow execution:
network:
allowed:
- defaults
- "patchdiff.githubusercontent.com"See Network Configuration for more information.
|
|
✅ PR Code Quality Reviewer completed the code quality review. |
|
🧠 Matt Pocock Skills Reviewer has completed the skills-based review. ✅ |
|
✅ Test Quality Sentinel completed test quality analysis. |
|
✅ Design Decision Gate 🏗️ completed the design decision gate check. No ADR enforcement needed for PR #39940: PR does not have the 'implementation' label (has_implementation_label=false) and has 0 new lines of code in business logic directories (default_business_additions=0, threshold=100). Neither Condition A nor Condition B is met. |
🧪 Test Quality Sentinel Report✅ Test Quality Score: 90/100 — Excellent
📊 Metrics & Test Classification (0 tests analyzed)
Go: 0 ( Nature of the change: Production counterpart ( Verdict
|
There was a problem hiding this comment.
Skills-Based Review 🧠
Applied /diagnose and /tdd — clean targeted fix with a few improvement opportunities.
📋 Key Themes & Highlights
Key Themes
- Self-verification gap: The fix prevents test pollution via isolation hooks, but no assertion confirms the redirect actually fires — a future regression could silently undo the fix until
handle_agent_failure.cjsfires again. - Untested/unreachable parameter:
sessionStateBaseDiris added to therunWithCopilotSDKsignature but isn't forwarded by the production call site (copilot_sdk_driver.cjs) and isn't tested directly. Consider removing it or adding a test that exercises it. - Comment narrows env-var intent to tests:
GH_AW_SESSION_STATE_BASE_DIRis useful beyond tests (CI sandboxing, etc.); the comment should reflect that.
Positive Highlights
- ✅ Root cause correctly identified and addressed at the source, not just symptom-patched
- ✅
mkdtempSyncguarantees a unique, fresh directory per run — no path collision risk - ✅
delete process.env.GH_AW_SESSION_STATE_BASE_DIRinafterAllkeeps env clean for other test files - ✅
force: trueonrmSynctolerates a missing directory gracefully - ✅
install.shcomment correction is a nice housekeeping touch
🧠 Reviewed using Matt Pocock's skills by Matt Pocock Skills Reviewer
| }); | ||
| afterAll(() => { | ||
| delete process.env.GH_AW_SESSION_STATE_BASE_DIR; | ||
| fs.rmSync(testSessionStateDir, { recursive: true, force: true }); |
There was a problem hiding this comment.
[/tdd] afterAll calls fs.rmSync(testSessionStateDir, ...) but if mkdtempSync in beforeAll throws, testSessionStateDir will be undefined, causing rmSync to throw with an unhelpful error that masks the original failure.
💡 Add a null guard
afterAll(() => {
delete process.env.GH_AW_SESSION_STATE_BASE_DIR;
if (testSessionStateDir) {
fs.rmSync(testSessionStateDir, { recursive: true, force: true });
}
});This keeps cleanup errors from shadowing beforeAll failures during debugging.
| // Session state directory — mirrors the target path used by unified_timeline.cjs. | ||
| // /tmp/gh-aw/sandbox/agent/logs/copilot-session-state/{sessionId}/events.jsonl | ||
| const sessionStateBase = path.join(os.tmpdir(), "gh-aw", "sandbox", "agent", "logs", "copilot-session-state"); | ||
| // GH_AW_SESSION_STATE_BASE_DIR may be set in tests to redirect writes to an isolated directory. |
There was a problem hiding this comment.
[/diagnose] The comment frames GH_AW_SESSION_STATE_BASE_DIR as a test-only mechanism, but the env var is a legitimate runtime override too — useful for CI job isolation, ephemeral sandboxes, or any scenario where the caller can't pass sessionStateBaseDir directly. Consider rewording to make its general-purpose intent clear.
💡 Suggested rewording
// Override session-state base directory (e.g. for test isolation or CI sandboxing).
const sessionStateBase = sessionStateBaseDir ?? process.env.GH_AW_SESSION_STATE_BASE_DIR ?? defaultSessionStateBase;Coupling the comment to "tests" may mislead future readers who want to use it in other contexts.
| * RuntimeConnection: typeof import("@github/copilot-sdk").RuntimeConnection, | ||
| * approveAll: typeof import("@github/copilot-sdk").approveAll | ||
| * }, | ||
| * sessionStateBaseDir?: string, |
There was a problem hiding this comment.
[/tdd] sessionStateBaseDir is added to the JSDoc and function signature but the production call site in copilot_sdk_driver.cjs (line 121) doesn't forward it — making the parameter unreachable via the normal CLI entrypoint. No test exercises it directly either; all tests rely on the env var path instead.
💡 Options to consider
- Remove the parameter and rely solely on the env var (simpler API; the env var already satisfies test isolation needs).
- Keep the parameter but add a unit test that passes
sessionStateBaseDirexplicitly and asserts that event files land there — ensuring it doesn't silently regress:
it('respects explicit sessionStateBaseDir over env var', async () => {
const explicitDir = fs.mkdtempSync(path.join(os.tmpdir(), 'sdk-explicit-'));
try {
await runWithCopilotSDK({ ..., sessionStateBaseDir: explicitDir });
// assert files written to explicitDir, not testSessionStateDir
} finally {
fs.rmSync(explicitDir, { recursive: true, force: true });
}
});Either way, the current state leaves the parameter untested and unused in production.
|
|
||
| describe("copilot_sdk_driver.cjs", () => { | ||
| let testSessionStateDir; | ||
| beforeAll(() => { |
There was a problem hiding this comment.
[/diagnose] The beforeAll/afterAll hooks correctly prevent test-generated events from polluting the production session-state path, but there's no assertion confirming the redirect actually works. Without a self-verifying test, a future refactor that breaks the isolation could go undetected until handle_agent_failure.cjs fires again.
💡 Consider a smoke-test assertion
After a runWithCopilotSDK call in an existing test, you could assert:
// At the end of an existing test that exercises a session with a known sessionId:
const entries = fs.readdirSync(testSessionStateDir, { recursive: true });
expect(entries.length).toBeGreaterThan(0); // confirms writes landed in the isolated dirThis turns the fix into a regression guard, not just an isolation helper.
|
@copilot run pr-finisher skill |
There was a problem hiding this comment.
The core isolation fix — mkdtemp + env-var redirect in beforeAll/afterAll — correctly solves the false-positive tool-denial issue. Four concerns worth addressing before merge:
### Issues found (4)
Medium — install.sh comment references itself (install.sh:4)
The new comment says Kept in sync with actions/setup-cli/install.sh — that is the file it's in. install-gh-aw.sh was accidentally replaced with the current file's own path, and "that path" now has no referent. Should be install-gh-aw.sh — edit this file, then copy to that path.
Medium-high — GH_AW_SESSION_STATE_BASE_DIR is a production env-var override with no observability (copilot_sdk_session.cjs:112)
The comment calls this a test mechanism, but it lives unconditionally in the production fallback chain. If set in any real agent environment, all session-state writes are silently redirected and handle_agent_failure.cjs gets zero events (false negatives). The clean fix is to remove the env-var from the production path and pass the override exclusively via the sessionStateBaseDir parameter from the test harness.
Medium — sessionStateBaseDir parameter is untested dead code (copilot_sdk_session.cjs:89)
The test never passes this parameter — it relies solely on the env var. The sessionStateBaseDir ?? branch in the fallback chain is uncovered and could be silently broken by a future refactor.
Low — afterAll crashes with TypeError if beforeAll throws (copilot_sdk_driver.test.cjs:18)
force: true suppresses ENOENT, not type errors. If mkdtempSync fails, testSessionStateDir is undefined and fs.rmSync(undefined, ...) throws, masking the original failure. Guard with if (testSessionStateDir).
🔎 Code quality review by PR Code Quality Reviewer
| set +o histexpand | ||
|
|
||
| # Kept in sync with install-gh-aw.sh — edit that file, then copy to this path. | ||
| # Kept in sync with actions/setup-cli/install.sh — edit this file, then copy to that path. |
There was a problem hiding this comment.
Self-referential comment: this line says Kept in sync with actions/setup-cli/install.sh — but that is this file, making the sync target undefined and the install-gh-aw.sh reference silently dropped.
💡 Suggested fix
The new wording correctly flipped edit that file → edit this file, but accidentally replaced install-gh-aw.sh with the current file's own path. Keep the sibling reference:
# Kept in sync with install-gh-aw.sh — edit this file, then copy to that path.As written, no maintainer can tell which file is canonical or where to copy changes.
| const sessionStateBase = path.join(os.tmpdir(), "gh-aw", "sandbox", "agent", "logs", "copilot-session-state"); | ||
| // GH_AW_SESSION_STATE_BASE_DIR may be set in tests to redirect writes to an isolated directory. | ||
| const defaultSessionStateBase = path.join(os.tmpdir(), "gh-aw", "sandbox", "agent", "logs", "copilot-session-state"); | ||
| const sessionStateBase = sessionStateBaseDir ?? process.env.GH_AW_SESSION_STATE_BASE_DIR ?? defaultSessionStateBase; |
There was a problem hiding this comment.
Production code now has an undocumented env-var override that can silently lose all audit events: GH_AW_SESSION_STATE_BASE_DIR is described as a test mechanism but lives in the unconditional production fallback chain.
💡 Detail and suggested fix
If this env var is set in any real agent environment — by accident, CI variable leakage, or a misconfigured workflow — all session-state JSONL writes are silently redirected to an arbitrary path. handle_agent_failure.cjs reads the default path and finds nothing, turning every real tool-denial event into a silent false-negative. No warning is emitted.
The cleaner design is to keep the env-var lookup only in tests, pass the override via the sessionStateBaseDir parameter, and remove process.env.GH_AW_SESSION_STATE_BASE_DIR from this fallback chain:
// Production code — no env var
const sessionStateBase = sessionStateBaseDir ?? defaultSessionStateBase;In the test harness, pass the dir explicitly:
await runWithCopilotSDK({ ..., sessionStateBaseDir: testSessionStateDir });If keeping the env var, at minimum emit a visible log line when it overrides the default:
if (process.env.GH_AW_SESSION_STATE_BASE_DIR && !sessionStateBaseDir) {
logger?.warn?.(`[session] GH_AW_SESSION_STATE_BASE_DIR overrides default session state path`);
}| * @returns {Promise<{exitCode: number, output: string, hasOutput: boolean, durationMs: number}>} | ||
| */ | ||
| async function runWithCopilotSDK({ sdkUri, prompt, logger, attempt = 0, model, connectionToken, provider, maxToolDenials, permissionConfig, coreLogger, sdkModule }) { | ||
| async function runWithCopilotSDK({ sdkUri, prompt, logger, attempt = 0, model, connectionToken, provider, maxToolDenials, permissionConfig, coreLogger, sdkModule, sessionStateBaseDir }) { |
There was a problem hiding this comment.
sessionStateBaseDir parameter is added but never exercised in tests: the test suite bypasses it entirely, relying only on process.env.GH_AW_SESSION_STATE_BASE_DIR.
💡 Detail
The parameter was introduced specifically to give callers a clean injection point, but copilot_sdk_driver.test.cjs never passes it — beforeAll sets the env var instead. This means:
- The parameter override branch (
sessionStateBaseDir ?? ...) is dead code from a test-coverage perspective. - Any future breakage of the parameter path (e.g. a destructuring refactor that drops it) won't be caught.
The test should pass sessionStateBaseDir directly and stop relying on the env var, or the parameter should be removed if the env var is the intended permanent mechanism.
| }); | ||
| afterAll(() => { | ||
| delete process.env.GH_AW_SESSION_STATE_BASE_DIR; | ||
| fs.rmSync(testSessionStateDir, { recursive: true, force: true }); |
There was a problem hiding this comment.
fs.rmSync(undefined, ...) will throw a TypeError if beforeAll fails: { force: true } suppresses ENOENT but not type errors from an undefined path argument.
💡 Suggested fix
If fs.mkdtempSync throws (disk full, permissions), testSessionStateDir stays undefined. Vitest still runs afterAll, and fs.rmSync(undefined, ...) throws TypeError: The "path" argument must be of type string, masking the original failure.
Guard the cleanup:
afterAll(() => {
delete process.env.GH_AW_SESSION_STATE_BASE_DIR;
if (testSessionStateDir) fs.rmSync(testSessionStateDir, { recursive: true, force: true });
});Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
Addressed the review feedback in fix: restore pre-existing GH_AW_SESSION_STATE_BASE_DIR in test teardown:
Local validation passes ( |
|
@copilot merge main and recompile |
…-tool-denial # Conflicts: # actions/setup-cli/install.sh Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
Done. Merged |
The
jsweeprun that triggered this issue actually succeeded (created a PR), buthandle_agent_failure.cjsreported aguard.tool_denials_exceededevent anyway. The cause: when the agent rannpm run test:jsfor validation,copilot_sdk_driver.test.cjsexercisedrunWithCopilotSDKwith mocked sessions namedsession-max-tool-denialsetc., which wrote realguard.tool_denials_exceededentries into/tmp/gh-aw/sandbox/agent/logs/copilot-session-state/— the exact directoryhandle_agent_failure.cjsscans post-agent, producing a false-positive issue.Changes
copilot_sdk_session.cjs—runWithCopilotSDKnow accepts an optionalsessionStateBaseDirparameter and also checksGH_AW_SESSION_STATE_BASE_DIRenv var before falling back to the default path:copilot_sdk_driver.test.cjs— addsbeforeAll/afterAllhooks that redirect all session state writes to an isolatedmkdtempdirectory for the test suite lifetime and clean up afterward, keeping test artifacts out of the pathhandle_agent_failure.cjsreads: