ci(smoke): add token-usage sanity checks to smoke workflows #5264

Merged: lpcox merged 2 commits into main from add-smoke-token-usage-checks on Jun 18, 2026
@lpcox

@lpcox lpcox commented Jun 18, 2026

Collaborator

What

Adds a verify_token_usage job to the smoke-copilot, smoke-claude, and smoke-codex workflows. The job runs after the agent job, downloads the agent artifact, and runs scripts/ci/check-token-usage.js against it. The compiler wires the job into conclusion.needs, so a failure fails the workflow.

Checks enforced

The checker validates two engine-independent invariants over the agent artifact:

  1. Internal consistency: the sum of per-response records in token-usage.jsonl must exactly equal the aggregated agent_usage.json (input/output/cache_read/cache_write). Hard fail on mismatch or missing aggregate.
  2. cache_read != 0: cache_read_tokens == 0 across multiple responses is a hard failure (the symptom of the cached-token normalization bug fixed in #5262, "fix(api-proxy): map OpenAI Responses API cached tokens to cache_read"). Below the min-requests threshold it only warns.
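
For readers who want a feel for the first invariant, the following is a minimal sketch and not the code in scripts/ci/check-token-usage.js. The field names input_tokens, output_tokens, and cache_write_tokens, the helpers sumRecords and checkConsistency, and this particular parseJsonl body are assumptions made for illustration; only cache_read_tokens and the four counter categories come from the description above.

```javascript
'use strict';

// Token fields compared between the per-response log and the aggregate.
// Assumption: every counter uses a *_tokens suffix like cache_read_tokens.
const FIELDS = ['input_tokens', 'output_tokens', 'cache_read_tokens', 'cache_write_tokens'];

// Parse JSONL text into an array of objects, skipping blank lines.
function parseJsonl(text) {
  return text
    .split('\n')
    .map((line) => line.trim())
    .filter((line) => line.length > 0)
    .map((line) => JSON.parse(line));
}

// Sum each token field across all per-response records (missing fields count as 0).
function sumRecords(records) {
  const totals = Object.fromEntries(FIELDS.map((f) => [f, 0]));
  for (const record of records) {
    for (const f of FIELDS) totals[f] += Number(record[f] || 0);
  }
  return totals;
}

// Return a list of mismatch descriptions; an empty list means the invariant holds.
function checkConsistency(records, aggregate) {
  if (!aggregate) return ['missing aggregate'];
  const totals = sumRecords(records);
  return FIELDS
    .filter((f) => totals[f] !== Number(aggregate[f] || 0))
    .map((f) => `${f}: sum=${totals[f]} aggregate=${Number(aggregate[f] || 0)}`);
}

// Two per-response records and the aggregate they should add up to.
const jsonl =
  '{"input_tokens":100,"output_tokens":20,"cache_read_tokens":0}\n' +
  '{"input_tokens":150,"output_tokens":30,"cache_read_tokens":90}\n';
const aggregate = {
  input_tokens: 250,
  output_tokens: 50,
  cache_read_tokens: 90,
  cache_write_tokens: 0,
};
console.log(checkConsistency(parseJsonl(jsonl), aggregate)); // prints []
```

The comparison is exact equality per field rather than a tolerance, which matches the "must exactly equal" wording; any non-empty result would be treated as a hard failure.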

ai_credits/ambient_context drift is reported as warnings only.
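
The cache-read guard and its warn-below-threshold behavior can be sketched as a three-way classification. This is an illustrative sketch under stated assumptions: the default threshold of 2 and the names DEFAULT_MIN_REQUESTS and checkCacheRead are hypothetical, since the description mentions a min-requests threshold but not its value.

```javascript
'use strict';

// Hypothetical default; the real threshold value is not stated in this PR.
const DEFAULT_MIN_REQUESTS = 2;

// Classify the total cache_read_tokens across records as 'ok', 'warn', or 'fail'.
function checkCacheRead(records, minRequests = DEFAULT_MIN_REQUESTS) {
  const cacheRead = records.reduce(
    (sum, record) => sum + Number(record.cache_read_tokens || 0),
    0
  );
  if (cacheRead > 0) {
    return { level: 'ok', message: `cache_read_tokens=${cacheRead}` };
  }
  if (records.length >= minRequests) {
    // Enough responses that some cache reads were expected: hard failure.
    return {
      level: 'fail',
      message: `cache_read_tokens == 0 across ${records.length} responses`,
    };
  }
  // Too few responses to draw a conclusion: warn only.
  return {
    level: 'warn',
    message: `cache_read_tokens == 0 but only ${records.length} response(s)`,
  };
}

console.log(checkCacheRead([{ cache_read_tokens: 0 }, { cache_read_tokens: 0 }]).level); // prints fail
```

Separating the level from the message lets the caller decide the exit code in one place, so warnings such as ai_credits drift can share the same reporting path without failing the job.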

Why internal consistency instead of engine-vs-proxy

Codex's engine-native telemetry (turn.completed) reports cumulative counts that diverge ~2x from the api-proxy per-request sum, making an engine-vs-proxy comparison infeasible. The internal-consistency invariant was verified exact for both codex and copilot real artifacts.

Tests

  • scripts/ci/check-token-usage.test.ts — 17 unit tests (parsing, summation, consistency, cache-read guard, file location, arg parsing).
  • Validated the checker against a real on-disk codex artifact: consistency passes, cache_read==0 correctly fails.

Notes

  • Checker is zero-dependency CommonJS, runnable with bare node (no setup-node/npm ci needed in the verify job).
  • .md sources edited, recompiled with gh aw compile, and post-processed via scripts/ci/postprocess-smoke-workflows.ts.
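
As an illustration of what zero-dependency argument parsing can look like in plain CommonJS, here is a small sketch. The flag names --artifact-dir and --min-requests, the defaults, and the parseArgs shape are all hypothetical; the checker's actual command line is not shown in this PR.

```javascript
'use strict';

// Hypothetical flags mapped to option keys; the real checker's CLI may differ.
const FLAGS = new Map([
  ['--artifact-dir', 'artifactDir'],
  ['--min-requests', 'minRequests'],
]);

// Parse `--name value` and `--name=value` pairs using only built-in features.
function parseArgs(argv) {
  const options = { artifactDir: '.', minRequests: 2 };
  for (let i = 0; i < argv.length; i++) {
    const eq = argv[i].indexOf('=');
    const flag = eq === -1 ? argv[i] : argv[i].slice(0, eq);
    const key = FLAGS.get(flag);
    if (!key) throw new Error(`unknown argument: ${flag}`);
    // Take the inline value if present, otherwise consume the next argument.
    const value = eq === -1 ? argv[++i] : argv[i].slice(eq + 1);
    if (value === undefined || value === '') {
      throw new Error(`missing value for ${flag}`);
    }
    options[key] = key === 'minRequests' ? Number(value) : value;
  }
  if (!Number.isInteger(options.minRequests) || options.minRequests < 1) {
    throw new Error('--min-requests must be a positive integer');
  }
  return options;
}

console.log(parseArgs(['--artifact-dir', '/tmp/agent', '--min-requests=3']));
```

Keeping the parser a pure function of an argv array makes it straightforward to unit test without spawning a process.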

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Copilot AI review requested due to automatic review settings June 18, 2026 21:13
@github-actions

github-actions Bot commented Jun 18, 2026

Contributor

✅ Coverage Check Passed

Overall Coverage

| Metric | Base | PR | Delta |
|---|---|---|---|
| Lines | 97.54% | 97.58% | 📈 +0.04% |
| Statements | 97.47% | 97.50% | 📈 +0.03% |
| Functions | 98.85% | 98.85% | ➡️ +0.00% |
| Branches | 92.87% | 92.91% | 📈 +0.04% |

📁 Per-file Coverage Changes (1 files)

| File | Lines (Before → After) | Statements (Before → After) |
|---|---|---|
| src/workdir-setup.ts | 92.7% → 94.5% (+1.82%) | 92.7% → 94.5% (+1.82%) |

Coverage comparison generated by scripts/ci/compare-coverage.ts

Add a verify_token_usage job to smoke-copilot, smoke-claude, and
smoke-codex that runs after the agent job on the downloaded agent
artifact and fails the workflow when token accounting looks wrong.

The checker (scripts/ci/check-token-usage.js) enforces two invariants:
- Internal consistency: the sum of per-response records in
  token-usage.jsonl must exactly equal the aggregated agent_usage.json
  (input/output/cache_read/cache_write). This is engine-independent.
- cache_read_tokens must not be 0 across multiple responses, which is
  the symptom of the cached-token normalization bug.

ai_credits/ambient_context drift is reported as warnings only.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@lpcox lpcox force-pushed the add-smoke-token-usage-checks branch from 772a6b1 to 6cf652c Compare June 18, 2026 21:15

Copilot AI left a comment

Contributor

Pull request overview

This PR adds a CI-side “token usage sanity check” to the smoke workflows by introducing a small Node.js checker script (plus unit tests) and wiring a new verify_token_usage job into the generated smoke workflow graphs so token-usage inconsistencies fail the workflow.

Changes:

  • Add scripts/ci/check-token-usage.js and scripts/ci/check-token-usage.test.ts to validate token-usage internal consistency and detect cache_read_tokens == 0 across multi-turn runs.
  • Extend smoke workflow sources (smoke-copilot.md, smoke-claude.md, smoke-codex.md) and regenerated lock workflows to run the checker against the downloaded agent artifact.
  • Update the gVisor firewall comparison workflow to wait for Squid/Envoy readiness (but see review comments on the Squid probe).

| File | Description |
|---|---|
| scripts/ci/check-token-usage.js | New zero-dependency Node checker that locates usage files in the agent artifact and enforces invariants. |
| scripts/ci/check-token-usage.test.ts | Unit tests for parsing, summation/consistency, cache-read guard, path resolution, and arg parsing. |
| .github/workflows/smoke-copilot.md | Adds verify_token_usage job to run the checker after agent. |
| .github/workflows/smoke-claude.md | Adds verify_token_usage job to run the checker after agent. |
| .github/workflows/smoke-codex.md | Adds verify_token_usage job to run the checker after agent. |
| .github/workflows/smoke-copilot.lock.yml | Regenerated locked workflow including verify_token_usage in the job graph. |
| .github/workflows/smoke-claude.lock.yml | Regenerated locked workflow including verify_token_usage (also includes additional generated deltas). |
| .github/workflows/smoke-codex.lock.yml | Regenerated locked workflow including verify_token_usage in the job graph. |
| .github/workflows/test-gvisor-firewall-comparison.yml | Replaces fixed sleeps with readiness loops for Squid/Envoy startup. |

Copilot's findings

  • Files reviewed: 8/8 changed files
  • Comments generated: 3

Comment on lines +290 to +294
const text = fs.readFileSync(agentUsage, 'utf8').trim();
// agent_usage may be a single JSON object or a one-line JSONL file.
const parsed = parseJsonl(text);
aggregate = parsed.length > 0 ? parsed[parsed.length - 1] : null;
}
Comment on lines +233 to +235
]) ||
findFileRecursive(root, 'agent_usage.json');

Comment thread .github/workflows/smoke-claude.lock.yml Outdated
Comment on lines +105 to +107
      - name: Setup Scripts
        id: setup
    -   uses: github/gh-aw-actions/setup@c0338fef4749d08c21f8f975fb0e37efa17dda47 # v0.79.8
    +   uses: github/gh-aw-actions/setup@5c2fe865bb4dc46e1450f6ee0d0541d759aea73a # v0.79.6
@lpcox

lpcox commented Jun 18, 2026

Collaborator Author

@copilot address review feedback

@github-actions

Contributor

⏳ Copilot review left inline comments.

@lpcox To proceed:

  1. Ask @copilot to address the review feedback (reply to this comment or the review thread)
  2. Once the fix is pushed, add the ready-for-aw label to trigger agentic CI smoke tests

- Parse agent_usage with JSON.parse() first, fallback to JSONL
- Recursive fallback also searches for agent_usage.jsonl
- Restore smoke-claude.lock.yml to v0.79.8 versions and add
  only the verify_token_usage job
- Add 3 unit tests (20 total)

Copilot AI commented Jun 18, 2026

Contributor

@copilot address review feedback

Addressed in e6e291c:

  1. Pretty-printed JSON parsing: agent_usage is now parsed with JSON.parse() first; the JSONL path is only taken if that throws. This handles multi-line pretty-printed files correctly.

  2. Recursive fallback for agent_usage.jsonl: locateUsageFiles() now falls back to findFileRecursive(root, 'agent_usage.jsonl') in addition to agent_usage.json, so the JSONL form is found even when the artifact layout changes.

  3. smoke-claude.lock.yml version downgrade: reverted the file to the original v0.79.8 state and re-applied only the verify_token_usage job (matching what was done for copilot/codex). All v0.79.8 SHAs and variable names are preserved.
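
The parse order in item 1 can be sketched roughly as follows. This is an illustration of the "JSON.parse() first, JSONL only on throw" idea rather than the committed code; the function name parseAggregate is hypothetical, and keeping the last JSONL record mirrors the snippet quoted in the review.

```javascript
'use strict';

// Parse the aggregate usage file. Whole-document JSON is tried first so that
// multi-line pretty-printed files work; JSONL is used only if that throws.
function parseAggregate(text) {
  const trimmed = text.trim();
  if (trimmed === '') return null;
  try {
    return JSON.parse(trimmed);
  } catch (err) {
    // JSONL fallback: parse each non-empty line and keep the last record.
    const records = trimmed
      .split('\n')
      .filter((line) => line.trim() !== '')
      .map((line) => JSON.parse(line));
    return records.length > 0 ? records[records.length - 1] : null;
  }
}

// A pretty-printed object spans several lines but is still one JSON document.
console.log(parseAggregate('{\n  "input_tokens": 5\n}').input_tokens); // prints 5
```

Splitting on newlines first would break the pretty-printed case, because no single line of such a file is valid JSON on its own; trying the whole document first avoids that.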

Three new unit tests added (20 total): findFileRecursive for .jsonl, the pretty-printed JSON path via main(), and findFileRecursive absent-file return.
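
A recursive lookup with the .json and .jsonl fallback might look like the sketch below. It assumes findFileRecursive returns null when nothing is found and that the .json form is tried before .jsonl; neither detail is confirmed here, and locateAggregate is a hypothetical stand-in for the relevant part of locateUsageFiles().

```javascript
'use strict';
const fs = require('fs');
const path = require('path');

// Depth-first search for the first file called `name` under `root`.
// Returns the full path, or null when no such file exists (assumed convention).
function findFileRecursive(root, name) {
  let entries;
  try {
    entries = fs.readdirSync(root, { withFileTypes: true });
  } catch (err) {
    return null; // missing or unreadable directory
  }
  for (const entry of entries) {
    const full = path.join(root, entry.name);
    if (entry.isFile() && entry.name === name) return full;
    if (entry.isDirectory()) {
      const found = findFileRecursive(full, name);
      if (found) return found;
    }
  }
  return null;
}

// Try the JSON form first, then the JSONL form (assumed order).
function locateAggregate(root) {
  return (
    findFileRecursive(root, 'agent_usage.json') ||
    findFileRecursive(root, 'agent_usage.jsonl')
  );
}
```

Searching by file name rather than by a fixed relative path is what keeps the checker working when the artifact layout changes.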

@github-actions

github-actions Bot commented Jun 18, 2026

Contributor

Smoke Claude failed

@github-actions

github-actions Bot commented Jun 18, 2026

Contributor

Contribution Check completed successfully!

@github-actions

github-actions Bot commented Jun 18, 2026

Contributor

🔑 Smoke Copilot PAT PAT auth validated. All systems operational. ✅

@github-actions

github-actions Bot commented Jun 18, 2026

Contributor

Smoke Copilot BYOK AOAI (api-key) completed. Copilot AOAI BYOK (api-key) mode operational. 🔓

@github-actions

github-actions Bot commented Jun 18, 2026

Contributor

Smoke Copilot BYOK reports failed. BYOK mode investigation needed...

@github-actions

github-actions Bot commented Jun 18, 2026

Contributor

Smoke Copilot BYOK AOAI (Entra) completed. Copilot AOAI BYOK (Entra) mode operational. 🔓

Completed smoke test summary with comment and labels

@github-actions

github-actions Bot commented Jun 18, 2026

Contributor

📰 VERDICT: Smoke Copilot has concluded. All systems operational. This is a developing story. 🎤

@github-actions

github-actions Bot commented Jun 18, 2026

Contributor

📡 Smoke OTel Tracing completed. All tracing scenarios validated. ✅

@github-actions

github-actions Bot commented Jun 18, 2026

Contributor

🔌 Smoke Services — All services reachable! ✅

@github-actions

github-actions Bot commented Jun 18, 2026

Contributor

Chroot tests passed! Smoke Chroot - All security and functionality tests succeeded.

@github-actions

github-actions Bot commented Jun 18, 2026

Contributor

✨ The prophecy is fulfilled... Smoke Codex has completed its mystical journey. The stars align. 🌟

@github-actions

github-actions Bot commented Jun 18, 2026

Contributor

Build Test Suite completed successfully!

@github-actions

Contributor

🔥 Smoke Test: Copilot PAT — PASS

Test Result
GitHub MCP connectivity
GitHub.com HTTP ✅ 200
File write/read

Overall: PASS · Auth mode: PAT (COPILOT_GITHUB_TOKEN)

@lpcox

🔑 PAT report filed by Smoke Copilot PAT

@github-actions

github-actions Bot commented Jun 18, 2026

Contributor

Smoke Gemini completed. All facets verified. 💎

Gemini smoke test completed with a FAIL status due to connectivity issues.

@github-actions

Contributor

@lpcox
MCP PR list: ci(smoke): add token-usage sanity checks to smoke workflows; fix(api-proxy): write placeholder token-usage record when usage extraction fails ✅
GitHub.com connectivity: 200 ✅
File write/read: agent-write-test ✅
Direct BYOK mode (COPILOT_PROVIDER_API_KEY + COPILOT_PROVIDER_BASE_URL) via api-proxy → Azure OpenAI (Foundry, o4-mini-aw) ✅
Overall: PASS

🔑 BYOK (AOAI api-key) report filed by Smoke Copilot BYOK AOAI (api-key)

@github-actions

Contributor

Chroot Version Comparison Results

| Runtime | Host Version | Chroot Version | Match? |
|---|---|---|---|
| Python | Python 3.12.13 | Python 3.12.3 | ❌ |
| Node.js | v24.16.0 | v22.22.3 | ❌ |
| Go | go1.22.12 | go1.22.12 | ✅ |

Overall: ❌ FAILED — Python and Node.js versions differ between host and chroot.

Tested by Smoke Chroot

@github-actions

Contributor

Smoke Test: API Proxy OpenTelemetry Tracing

| Scenario | Result | Notes |
|---|---|---|
| S1: Module Loading | ✅ | otel.js loads cleanly; exports: startRequestSpan, setTokenAttributes, setBudgetAttributes, endSpan, endSpanError, shutdown, isEnabled + internal helpers |
| S2: Test Suite | ✅ | 59 tests passed across otel.test.js + otel-fanout.test.js (2 suites, 0 failures) |
| S3: Env Var Forwarding | ✅ | api-proxy-service-config.ts forwards OTEL_EXPORTER_OTLP_ENDPOINT, OTEL_EXPORTER_OTLP_HEADERS, GITHUB_AW_OTEL_TRACE_ID, GITHUB_AW_OTEL_PARENT_SPAN_ID, OTEL_SERVICE_NAME; full OTEL_* suite forwarded via agent-environment-credentials.ts |
| S4: Token Tracker Integration | ✅ | onUsage callback exists in token-tracker-http.js (line 283/374) as the OTEL hook point |
| S5: OTEL Diagnostics | ✅ | No endpoint configured → spans written to /var/log/api-proxy/otel.jsonl (file exporter fallback); graceful degradation confirmed |

All 5 scenarios passed. OTEL tracing integration is functioning correctly.

📡 OTel tracing validated by Smoke OTel Tracing

@github-actions

Contributor

🔬 Smoke Test Results

| Test | Result |
|---|---|
| GitHub MCP connectivity | ✅ PASS |
| GitHub.com HTTP connectivity | ⚠️ N/A (pre-step template not rendered) |
| File write/read | ⚠️ N/A (pre-step template not rendered) |

PR: ci(smoke): add token-usage sanity checks to smoke workflows
Author: @lpcox

Overall: ⚠️ INCONCLUSIVE — MCP test passed; pre-computed HTTP and file test data unavailable (workflow template variables not expanded).

📰 BREAKING: Report filed by Smoke Copilot

@github-actions

Contributor

@lpcox Smoke Test Results for Direct BYOK (Azure OpenAI Entra):

  • GitHub MCP: ✅
  • GitHub.com Connectivity: ✅
  • File Write/Read Test: ❌
  • BYOK Inference Test: ✅
    Running in direct BYOK mode (AWF_AUTH_TYPE=github-oidc + AWF_AUTH_AZURE_* + COPILOT_PROVIDER_BASE_URL) via api-proxy → Azure OpenAI (Foundry, o4-mini-aw)
    Overall: FAIL

🪪 BYOK (AOAI Entra) report filed by Smoke Copilot BYOK AOAI (Entra)

@github-actions

Contributor

Smoke Test: GitHub Actions Services Connectivity

| Check | Result |
|---|---|
| Redis PING | ❌ No response (timeout) |
| PostgreSQL pg_isready | ❌ No response (timeout) |
| PostgreSQL SELECT 1 | ❌ No response (timeout) |

Overall: ❌ FAIL

host.docker.internal resolves to 172.17.0.1 but ports 6379 and 5432 are unreachable — service containers do not appear to be running in this environment.

🔌 Service connectivity validated by Smoke Services

@github-actions

Contributor

Gemini Smoke Test Results

  1. GitHub MCP Testing: ✅
  2. GitHub.com Connectivity: ❌ (000/SSL error)
  3. File Writing Testing: ✅
  4. Bash Tool Testing: ✅

Overall Status: FAIL

Warning

Firewall blocked 1 domain

The following domain was blocked by the firewall during workflow execution:

  • localhost

To allow these domains, add them to the network.allowed list in your workflow frontmatter:

network:
  allowed:
    - defaults
    - "localhost"

See Network Configuration for more information.

💎 Faceted by Smoke Gemini

@github-actions

Contributor

🏗️ Build Test Suite Results

Ecosystem Project Build/Install Tests Status
Bun elysia 1/1 passed ✅ PASS
Bun hono 1/1 passed ✅ PASS
C++ fmt N/A ✅ PASS
C++ json N/A ✅ PASS
Deno oak N/A 1/1 passed ✅ PASS
Deno std N/A 1/1 passed ✅ PASS
.NET hello-world N/A ✅ PASS
.NET json-parse N/A ✅ PASS
Go color ok ✅ PASS
Go env ok ✅ PASS
Go uuid ok ✅ PASS
Java gson 1/1 passed ✅ PASS
Java caffeine 1/1 passed ✅ PASS
Node.js clsx All passed ✅ PASS
Node.js execa All passed ✅ PASS
Node.js p-limit All passed ✅ PASS
Rust fd 1/1 passed ✅ PASS
Rust zoxide 1/1 passed ✅ PASS

Overall: 8/8 ecosystems passed — ✅ PASS

Generated by Build Test Suite for issue #5264 · 160.9 AIC · ⊞ 7.5K

@github-actions

Contributor

Merged PRs:

Checks:

  • GitHub read: ✅
  • Discussion comment: ✅
  • Playwright title: ✅
  • File write/read: ✅
  • Build: ✅

Overall: PASS

Warning

Firewall blocked 1 domain

The following domain was blocked by the firewall during workflow execution:

  • registry.npmjs.org

To allow these domains, add them to the network.allowed list in your workflow frontmatter:

network:
  allowed:
    - defaults
    - "registry.npmjs.org"

See Network Configuration for more information.

🔮 The oracle has spoken through Smoke Codex

@lpcox lpcox merged commit 4eba441 into main Jun 18, 2026
83 of 94 checks passed
@lpcox lpcox deleted the add-smoke-token-usage-checks branch June 18, 2026 22:36
3 participants