ci(smoke): add token-usage sanity checks to smoke workflows by lpcox · Pull Request #5264 · github/gh-aw-firewall

lpcox · 2026-06-18T21:13:42Z

What

Adds a verify_token_usage job to the smoke-copilot, smoke-claude, and smoke-codex workflows. The job runs after the agent job, downloads the agent artifact, and runs scripts/ci/check-token-usage.js against it. The compiler wires the job into conclusion.needs, so a failure fails the workflow.

Checks enforced

The checker validates two engine-independent invariants over the agent artifact:

Internal consistency — the sum of per-response records in token-usage.jsonl must exactly equal the aggregated agent_usage.json (input/output/cache_read/cache_write). Hard fail on mismatch or missing aggregate.
cache_read != 0 — cache_read_tokens == 0 across multiple responses is a hard failure (the symptom of the cached-token normalization bug fixed in fix(api-proxy): map OpenAI Responses API cached tokens to cache_read #5262). Below the min-requests threshold it only warns.

ai_credits/ambient_context drift is reported as warnings only.
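The two invariants above can be sketched roughly as follows. This is a minimal illustration, not the code in scripts/ci/check-token-usage.js: the function names (`sumRecords`, `checkUsage`), the exact field names, and the default `minRequests` value of 2 are all assumptions made for the example.

```javascript
'use strict';

// Token categories compared between the per-response log and the aggregate.
// The exact key names are assumed; the PR only names the four categories.
const FIELDS = ['input_tokens', 'output_tokens', 'cache_read_tokens', 'cache_write_tokens'];

// Sum each tracked field across all per-response records.
function sumRecords(records) {
  const totals = Object.fromEntries(FIELDS.map((f) => [f, 0]));
  for (const rec of records) {
    for (const f of FIELDS) {
      totals[f] += Number(rec[f]) || 0;
    }
  }
  return totals;
}

// Returns { errors, warnings }. A non-empty errors array means a hard failure.
// minRequests (hypothetical default: 2) is the threshold below which a zero
// cache_read total only produces a warning.
function checkUsage(records, aggregate, minRequests = 2) {
  const errors = [];
  const warnings = [];
  if (!aggregate) {
    errors.push('missing aggregate');
    return { errors, warnings };
  }
  const totals = sumRecords(records);
  // Invariant 1: per-response sums must exactly equal the aggregate.
  for (const f of FIELDS) {
    const expected = Number(aggregate[f]) || 0;
    if (totals[f] !== expected) {
      errors.push(`${f}: per-response sum ${totals[f]} != aggregate ${expected}`);
    }
  }
  // Invariant 2: a zero cache_read total across enough responses is a failure.
  if (totals.cache_read_tokens === 0) {
    const msg = `cache_read_tokens is 0 across ${records.length} response(s)`;
    (records.length >= minRequests ? errors : warnings).push(msg);
  }
  return { errors, warnings };
}

// Usage with made-up numbers: two responses whose sums match the aggregate.
const records = [
  { input_tokens: 100, output_tokens: 20, cache_read_tokens: 0, cache_write_tokens: 0 },
  { input_tokens: 150, output_tokens: 30, cache_read_tokens: 80, cache_write_tokens: 0 },
];
const aggregate = { input_tokens: 250, output_tokens: 50, cache_read_tokens: 80, cache_write_tokens: 0 };
console.log(checkUsage(records, aggregate));
```

Returning errors and warnings separately lets a caller choose the exit code (non-zero only when `errors` is non-empty) while still printing the softer findings.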

Why internal consistency instead of engine-vs-proxy

Codex's engine-native telemetry (turn.completed) reports cumulative counts that diverge ~2x from the api-proxy per-request sum, making an engine-vs-proxy comparison infeasible. The internal-consistency invariant was verified exact for both codex and copilot real artifacts.

Tests

scripts/ci/check-token-usage.test.ts — 17 unit tests (parsing, summation, consistency, cache-read guard, file location, arg parsing).
Validated the checker against a real on-disk codex artifact: consistency passes, cache_read==0 correctly fails.

Notes

Checker is zero-dependency CommonJS, runnable with bare node (no setup-node/npm ci needed in the verify job).
.md sources edited, recompiled with gh aw compile, and post-processed via scripts/ci/postprocess-smoke-workflows.ts.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

github-actions · 2026-06-18T21:15:16Z

✅ Coverage Check Passed

Overall Coverage

Metric	Base	PR	Delta
Lines	97.54%	97.58%	📈 +0.04%
Statements	97.47%	97.50%	📈 +0.03%
Functions	98.85%	98.85%	➡️ +0.00%
Branches	92.87%	92.91%	📈 +0.04%

📁 Per-file Coverage Changes (1 files)

File	Lines (Before → After)	Statements (Before → After)
`src/workdir-setup.ts`	92.7% → 94.5% (+1.82%)	92.7% → 94.5% (+1.82%)

Coverage comparison generated by scripts/ci/compare-coverage.ts

Add a verify_token_usage job to smoke-copilot, smoke-claude, and smoke-codex that runs after the agent job on the downloaded agent artifact and fails the workflow when token accounting looks wrong.

The checker (scripts/ci/check-token-usage.js) enforces two invariants:

- Internal consistency: the sum of per-response records in token-usage.jsonl must exactly equal the aggregated agent_usage.json (input/output/cache_read/cache_write). This is engine-independent.
- cache_read_tokens must not be 0 across multiple responses, which is the symptom of the cached-token normalization bug.

ai_credits/ambient_context drift is reported as warnings only.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Copilot

Pull request overview

This PR adds a CI-side “token usage sanity check” to the smoke workflows by introducing a small Node.js checker script (plus unit tests) and wiring a new verify_token_usage job into the generated smoke workflow graphs so token-usage inconsistencies fail the workflow.

Changes:

Add scripts/ci/check-token-usage.js and scripts/ci/check-token-usage.test.ts to validate token-usage internal consistency and detect cache_read_tokens == 0 across multi-turn runs.
Extend smoke workflow sources (smoke-copilot.md, smoke-claude.md, smoke-codex.md) and regenerated lock workflows to run the checker against the downloaded agent artifact.
Update the gVisor firewall comparison workflow to wait for Squid/Envoy readiness (but see review comments on the Squid probe).

Show a summary per file

File	Description
`scripts/ci/check-token-usage.js`	New zero-dependency Node checker that locates usage files in the agent artifact and enforces invariants.
`scripts/ci/check-token-usage.test.ts`	Unit tests for parsing, summation/consistency, cache-read guard, path resolution, and arg parsing.
`.github/workflows/smoke-copilot.md`	Adds `verify_token_usage` job to run the checker after `agent`.
`.github/workflows/smoke-claude.md`	Adds `verify_token_usage` job to run the checker after `agent`.
`.github/workflows/smoke-codex.md`	Adds `verify_token_usage` job to run the checker after `agent`.
`.github/workflows/smoke-copilot.lock.yml`	Regenerated locked workflow including `verify_token_usage` in the job graph.
`.github/workflows/smoke-claude.lock.yml`	Regenerated locked workflow including `verify_token_usage` (also includes additional generated deltas).
`.github/workflows/smoke-codex.lock.yml`	Regenerated locked workflow including `verify_token_usage` in the job graph.
`.github/workflows/test-gvisor-firewall-comparison.yml`	Replaces fixed sleeps with readiness loops for Squid/Envoy startup.

Copilot's findings

Files reviewed: 8/8 changed files
Comments generated: 3

+    const text = fs.readFileSync(agentUsage, 'utf8').trim();
+    // agent_usage may be a single JSON object or a one-line JSONL file.
+    const parsed = parseJsonl(text);
+    aggregate = parsed.length > 0 ? parsed[parsed.length - 1] : null;
+  }


+    ]) ||
+    findFileRecursive(root, 'agent_usage.json');
+


      - name: Setup Scripts
        id: setup
-        uses: github/gh-aw-actions/setup@c0338fef4749d08c21f8f975fb0e37efa17dda47 # v0.79.8
+        uses: github/gh-aw-actions/setup@5c2fe865bb4dc46e1450f6ee0d0541d759aea73a # v0.79.6


lpcox · 2026-06-18T21:47:25Z

@copilot address review feedback

github-actions · 2026-06-18T21:56:36Z

⏳ Copilot review left inline comments.

@lpcox To proceed:

Ask @copilot to address the review feedback (reply to this comment or the review thread)
Once the fix is pushed, add the ready-for-aw label to trigger agentic CI smoke tests

- Parse agent_usage with JSON.parse() first, fallback to JSONL
- Recursive fallback also searches for agent_usage.jsonl
- Restore smoke-claude.lock.yml to v0.79.8 versions and add only the verify_token_usage job
- Add 3 unit tests (20 total)

Copilot · 2026-06-18T21:57:16Z

@copilot address review feedback

Addressed in e6e291c:

Pretty-printed JSON parsing — agent_usage is now parsed with JSON.parse() first; the JSONL path is only taken if that throws. This handles multi-line pretty-printed files correctly.
Recursive fallback for agent_usage.jsonl — locateUsageFiles() now falls back to findFileRecursive(root, 'agent_usage.jsonl') in addition to agent_usage.json, so the JSONL form is found even when the artifact layout changes.
smoke-claude.lock.yml version downgrade — reverted the file to the original v0.79.8 state and re-applied only the verify_token_usage job (matching what was done for copilot/codex). All v0.79.8 SHAs and variable names are preserved.

Three new unit tests added (20 total): findFileRecursive for .jsonl, the pretty-printed JSON path via main(), and findFileRecursive absent-file return.
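The parse order described above (whole-document JSON first, JSONL only if that throws) might look roughly like the sketch below. `parseAggregate` is a hypothetical name, and this `parseJsonl` is a simplified stand-in for the real helper of the same name, which may handle malformed lines differently.

```javascript
'use strict';

// Parse newline-delimited JSON, skipping blank lines.
// Simplified stand-in: this version throws on any malformed line.
function parseJsonl(text) {
  return text
    .split(/\r?\n/)
    .map((line) => line.trim())
    .filter((line) => line.length > 0)
    .map((line) => JSON.parse(line));
}

// Try the whole text as one JSON document first, which handles multi-line
// pretty-printed files. Only if that throws, treat the text as JSONL and
// take the last record as the aggregate.
function parseAggregate(text) {
  const trimmed = text.trim();
  if (trimmed === '') return null;
  try {
    return JSON.parse(trimmed);
  } catch {
    const records = parseJsonl(trimmed);
    return records.length > 0 ? records[records.length - 1] : null;
  }
}

// Usage: both forms resolve to a single aggregate object.
console.log(parseAggregate('{\n  "input_tokens": 5\n}'));
console.log(parseAggregate('{"input_tokens": 1}\n{"input_tokens": 2}'));
```

A purely line-based parser fails on the pretty-printed form because individual lines such as `{` are not valid JSON on their own, which is why the whole-document attempt comes first.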

github-actions · 2026-06-18T21:58:10Z

❌ Smoke Claude failed

github-actions · 2026-06-18T21:58:18Z

✅ Contribution Check completed successfully!

github-actions · 2026-06-18T21:59:02Z

🔑 Smoke Copilot PAT PAT auth validated. All systems operational. ✅

github-actions · 2026-06-18T21:59:09Z

✅ Smoke Copilot BYOK AOAI (api-key) completed. Copilot AOAI BYOK (api-key) mode operational. 🔓

github-actions · 2026-06-18T21:59:20Z

❌ Smoke Copilot BYOK reports failed. BYOK mode investigation needed...

github-actions · 2026-06-18T21:59:24Z

✅ Smoke Copilot BYOK AOAI (Entra) completed. Copilot AOAI BYOK (Entra) mode operational. 🔓

Completed smoke test summary with comment and labels

github-actions · 2026-06-18T21:59:25Z

📰 VERDICT: Smoke Copilot has concluded. All systems operational. This is a developing story. 🎤

github-actions · 2026-06-18T21:59:28Z

📡 Smoke OTel Tracing completed. All tracing scenarios validated. ✅

github-actions · 2026-06-18T21:59:47Z

🔌 Smoke Services — All services reachable! ✅

github-actions · 2026-06-18T21:59:58Z

Chroot tests passed! Smoke Chroot - All security and functionality tests succeeded.

github-actions · 2026-06-18T22:00:14Z

✨ The prophecy is fulfilled... Smoke Codex has completed its mystical journey. The stars align. 🌟

github-actions · 2026-06-18T22:01:11Z

✅ Build Test Suite completed successfully!

github-actions · 2026-06-18T22:03:41Z

🔥 Smoke Test: Copilot PAT — PASS

Test	Result
GitHub MCP connectivity	✅
GitHub.com HTTP	✅ 200
File write/read	✅

Overall: PASS · Auth mode: PAT (COPILOT_GITHUB_TOKEN)

@lpcox

🔑 PAT report filed by Smoke Copilot PAT

github-actions · 2026-06-18T22:05:11Z

✅ Smoke Gemini completed. All facets verified. 💎

Gemini smoke test completed with a FAIL status due to connectivity issues.

github-actions · 2026-06-18T22:06:37Z

@lpcox
MCP PR list: ci(smoke): add token-usage sanity checks to smoke workflows; fix(api-proxy): write placeholder token-usage record when usage extraction fails ✅
GitHub.com connectivity: 200 ✅
File write/read: agent-write-test ✅
Direct BYOK mode (COPILOT_PROVIDER_API_KEY + COPILOT_PROVIDER_BASE_URL) via api-proxy → Azure OpenAI (Foundry, o4-mini-aw) ✅
Overall: PASS

🔑 BYOK (AOAI api-key) report filed by Smoke Copilot BYOK AOAI (api-key)

github-actions · 2026-06-18T22:09:04Z

Chroot Version Comparison Results

Runtime	Host Version	Chroot Version	Match?
Python	Python 3.12.13	Python 3.12.3	❌
Node.js	v24.16.0	v22.22.3	❌
Go	go1.22.12	go1.22.12	✅

Overall: ❌ FAILED — Python and Node.js versions differ between host and chroot.

Tested by Smoke Chroot

github-actions · 2026-06-18T22:09:11Z

Smoke Test: API Proxy OpenTelemetry Tracing

Scenario	Result	Notes
S1: Module Loading	✅	`otel.js` loads cleanly; exports: `startRequestSpan`, `setTokenAttributes`, `setBudgetAttributes`, `endSpan`, `endSpanError`, `shutdown`, `isEnabled` + internal helpers
S2: Test Suite	✅	59 tests passed across `otel.test.js` + `otel-fanout.test.js` (2 suites, 0 failures)
S3: Env Var Forwarding	✅	`api-proxy-service-config.ts` forwards `OTEL_EXPORTER_OTLP_ENDPOINT`, `OTEL_EXPORTER_OTLP_HEADERS`, `GITHUB_AW_OTEL_TRACE_ID`, `GITHUB_AW_OTEL_PARENT_SPAN_ID`, `OTEL_SERVICE_NAME`; full `OTEL_*` suite forwarded via `agent-environment-credentials.ts`
S4: Token Tracker Integration	✅	`onUsage` callback exists in `token-tracker-http.js` (line 283/374) as the OTEL hook point
S5: OTEL Diagnostics	✅	No endpoint configured → spans written to `/var/log/api-proxy/otel.jsonl` (file exporter fallback); graceful degradation confirmed

All 5 scenarios passed. OTEL tracing integration is functioning correctly.

📡 OTel tracing validated by Smoke OTel Tracing

github-actions · 2026-06-18T22:09:48Z

🔬 Smoke Test Results

Test	Result
GitHub MCP connectivity	✅ PASS
GitHub.com HTTP connectivity	⚠️ N/A (pre-step template not rendered)
File write/read	⚠️ N/A (pre-step template not rendered)

PR: ci(smoke): add token-usage sanity checks to smoke workflows
Author: @lpcox

Overall: ⚠️ INCONCLUSIVE — MCP test passed; pre-computed HTTP and file test data unavailable (workflow template variables not expanded).

📰 BREAKING: Report filed by Smoke Copilot

github-actions · 2026-06-18T22:11:01Z

@lpcox Smoke Test Results for Direct BYOK (Azure OpenAI Entra):

GitHub MCP: ✅
GitHub.com Connectivity: ✅
File Write/Read Test: ❌
BYOK Inference Test: ✅
Running in direct BYOK mode (AWF_AUTH_TYPE=github-oidc + AWF_AUTH_AZURE_* + COPILOT_PROVIDER_BASE_URL) via api-proxy → Azure OpenAI (Foundry, o4-mini-aw)
Overall: FAIL

🪪 BYOK (AOAI Entra) report filed by Smoke Copilot BYOK AOAI (Entra)

github-actions · 2026-06-18T22:11:43Z

Smoke Test: GitHub Actions Services Connectivity

Check	Result
Redis PING	❌ No response (timeout)
PostgreSQL pg_isready	❌ No response (timeout)
PostgreSQL SELECT 1	❌ No response (timeout)

Overall: ❌ FAIL

host.docker.internal resolves to 172.17.0.1 but ports 6379 and 5432 are unreachable — service containers do not appear to be running in this environment.

🔌 Service connectivity validated by Smoke Services

github-actions · 2026-06-18T22:18:33Z

Gemini Smoke Test Results

GitHub MCP Testing: ✅
- #5262: fix(api-proxy): map OpenAI Responses API cached tokens to cache_read
- #5254: fix(api-proxy): copy token-tracker-shared + otel modules into image (fixes AIC=0)
GitHub.com Connectivity: ❌ (000/SSL error)
File Writing Testing: ✅
Bash Tool Testing: ✅

Overall Status: FAIL

Warning

Firewall blocked 1 domain

The following domain was blocked by the firewall during workflow execution:

localhost

To allow these domains, add them to the network.allowed list in your workflow frontmatter:

network:
  allowed:
    - defaults
    - "localhost"

See Network Configuration for more information.

💎 Faceted by Smoke Gemini

github-actions · 2026-06-18T22:31:13Z

🏗️ Build Test Suite Results

Ecosystem	Project	Build/Install	Tests	Status
Bun	elysia	✅	1/1 passed	✅ PASS
Bun	hono	✅	1/1 passed	✅ PASS
C++	fmt	✅	N/A	✅ PASS
C++	json	✅	N/A	✅ PASS
Deno	oak	N/A	1/1 passed	✅ PASS
Deno	std	N/A	1/1 passed	✅ PASS
.NET	hello-world	✅	N/A	✅ PASS
.NET	json-parse	✅	N/A	✅ PASS
Go	color	✅	ok	✅ PASS
Go	env	✅	ok	✅ PASS
Go	uuid	✅	ok	✅ PASS
Java	gson	✅	1/1 passed	✅ PASS
Java	caffeine	✅	1/1 passed	✅ PASS
Node.js	clsx	✅	All passed	✅ PASS
Node.js	execa	✅	All passed	✅ PASS
Node.js	p-limit	✅	All passed	✅ PASS
Rust	fd	✅	1/1 passed	✅ PASS
Rust	zoxide	✅	1/1 passed	✅ PASS

Overall: 8/8 ecosystems passed — ✅ PASS

Generated by Build Test Suite for issue #5264 · 160.9 AIC · ⊞ 7.5K · ◷

github-actions · 2026-06-18T22:34:01Z

Merged PRs:

fix(api-proxy): map OpenAI Responses API cached tokens to cache_read #5262
fix(api-proxy): copy token-tracker-shared + otel modules into image (fixes AIC=0) #5254

Checks:

GitHub read: ✅
Discussion comment: ✅
Playwright title: ✅
File write/read: ✅
Build: ✅

Overall: PASS

Warning

Firewall blocked 1 domain

The following domain was blocked by the firewall during workflow execution:

registry.npmjs.org

To allow these domains, add them to the network.allowed list in your workflow frontmatter:

network:
  allowed:
    - defaults
    - "registry.npmjs.org"

See Network Configuration for more information.

🔮 The oracle has spoken through Smoke Codex

Copilot AI review requested due to automatic review settings June 18, 2026 21:13

Copilot started reviewing on behalf of lpcox June 18, 2026 21:14 View session

lpcox force-pushed the add-smoke-token-usage-checks branch from 772a6b1 to 6cf652c Compare June 18, 2026 21:15

Copilot AI reviewed Jun 18, 2026

Copilot started work on behalf of lpcox June 18, 2026 21:47 View session

lpcox added the ready-for-aw label Jun 18, 2026

Copilot finished work on behalf of lpcox June 18, 2026 21:57

lpcox temporarily deployed to aoai-model June 18, 2026 21:59 — with GitHub Actions Inactive

github-actions Bot added the smoke-copilot-pat label Jun 18, 2026

github-actions Bot mentioned this pull request Jun 18, 2026

[aw] Smoke Copilot BYOK is missing required data #5265

Closed

github-actions Bot added the smoke-copilot-byok-aoai-apikey label Jun 18, 2026

lpcox temporarily deployed to aoai-model June 18, 2026 22:10 — with GitHub Actions Inactive

github-actions Bot added the smoke-copilot-byok-aoai-entra label Jun 18, 2026

lpcox temporarily deployed to aoai-model June 18, 2026 22:11 — with GitHub Actions Inactive

github-actions Bot added the build-test label Jun 18, 2026

github-actions Bot added the smoke-codex label Jun 18, 2026

lpcox merged commit 4eba441 into main Jun 18, 2026
83 of 94 checks passed

lpcox deleted the add-smoke-token-usage-checks branch June 18, 2026 22:36

This was referenced Jun 19, 2026

fix(containers): apt install fallback to archive.ubuntu.com #5266

Merged

[aw] No-Op Runs #5231

Closed

lpcox mentioned this pull request Jun 19, 2026

Bump DefaultFirewallVersion to v0.27.7 github/gh-aw#40207

Open

4 tasks

Copilot AI mentioned this pull request Jun 19, 2026

Bump default gh-aw-firewall to v0.27.7 and refresh generated artifacts github/gh-aw#40208

Merged
