[agentic-token-optimizer] Code Simplifier — eliminate Python bash-policy failures that waste 864 AIC/run #37266

@github-actions

Selected workflow: code-simplifier.md — highest-AIC eligible workflow in the 7-day window (864 AIC, 1 run captured) and currently on a 5-run failure streak (2026-06-02 → 2026-06-06).


Analysis period and runs audited

| Metric | Value |
| --- | --- |
| Analysis window | 2026-05-28 → 2026-06-06 (10 runs) |
| Runs audited | 10 (6 failures, 4 successes) |
| Failure rate | 60% overall; 100% in last 5 runs |
| Data sources | all-runs.json, job step logs (runs 27052709585, 26995892409, 26931428773, 26735803823, 26555122796) |

Cost profile

| Metric | Failure run (Jun 6) | Success run (May 28) | Delta |
| --- | --- | --- | --- |
| AIC | 864.23 | ~90 (est.) | ~9.6× worse |
| Raw tokens | 2,584,799 | 460,665 | 5.6× worse |
| Effective tokens | 25,759,102 | ~460,000 | 55× worse |
| Turns | 50 (max hit) | 10 | 5× worse |
| Duration | 13 min | ~4 min | 3× worse |
| Conclusion | failure (429 token cap) | success | |

The June 6 run consumed 25.76M effective tokens, exceeding the 25M hard cap and triggering a terminal CAPIError: 429 Maximum effective tokens exceeded after 5 retries.


Ranked recommendations

#1 — Add python3 to bash allowlist (or enforce jq-only JSON processing)

Estimated AIC savings: ~680–700 AIC per failure run

Root cause confirmed across 3 consecutive failure runs:
The agent consistently invokes python3 -c "..." to parse and explore JSON files (source-files.json, recent-prs.json, history-summary.json). python3 is not in the bash allowlist, so every call produces a Permission denied and could not request permission from user error. The harness classifies the run as a missing tool/permission issue after ≥ 11 denials and does not retry the full run.

Evidence from job step logs (runs 27052709585, 26995892409, 26931428773):

$ cat /tmp/copilot-tool-output-*.txt | python3 -c "import json,sys; files=json.load(sys.stdin); [print(f) for f in files]"
 Permission denied and could not request permission from user
$ cat /tmp/gh-aw/code-simplifier/history-summary.json | python3 -c "import json,sys; d=json.load(sys.stdin); print(json.dumps(d, indent=2))"
 Permission denied and could not request permission from user
$ cat /tmp/gh-aw/code-simplifier/recent-prs.json | python3 -c "import json,sys; ..."
 Permission denied and could not request permission from user

At least 11 permission-denied events per run in every failure run inspected; zero in the May 28 success run.

Actions (pick one or both):

  • A — Add to allowlist (immediate): add - "python3 -c *" and - "python3 -m json.tool" to the bash: tool list in the frontmatter.
  • B — Prompt guardrail (complementary): in ## Command Guardrails, add: "Do NOT use python3 for JSON parsing; use jq, cat, or head instead."

Why this fixes the failure streak: without Python permission errors, the agent processes files with jq in ≤15 turns instead of exhausting 50 turns and hitting the effective-token cap.
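As a rough sketch of option B, the denied python3 one-liners quoted in the evidence have direct jq equivalents. The sample files below are made up for illustration: the real schemas of source-files.json and history-summary.json are not shown in the logs, and the assumption that the file list is a flat JSON array is only inferred from the denied Python command.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical sample data standing in for the real inputs.
workdir="$(mktemp -d)"
printf '["pkg/a.go","pkg/b.go"]\n' > "$workdir/source-files.json"
printf '{"runs":3,"last":"success"}\n' > "$workdir/history-summary.json"

# Equivalent of: python3 -c "...; [print(f) for f in files]"
# -r prints raw strings, one array element per line.
jq -r '.[]' "$workdir/source-files.json"

# Equivalent of: python3 -c "...; print(json.dumps(d, indent=2))"
# jq pretty-prints with two-space indentation by default.
jq '.' "$workdir/history-summary.json"
```

Both commands read the file directly, so the `cat ... |` prefix used in the denied calls is not needed either.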


#2 — Fix jq arithmetic error in the history-summary deterministic step

Estimated AIC savings: ~50–80 AIC/run (indirect)

The "Prepare workflow history summary" setup step fails with:

jq: error (at <stdin>:0): string ("g") and number (2) cannot be added

on every run (confirmed in 3 failure runs, likely present in successes too but non-fatal). The jq filter operates on the GitHub workflow runs API response; the error suggests a field expected to be a number is a string (e.g., run_number or a count is being summed with a string value). A malformed history-summary.json deprives the agent of the precomputed context it needs, likely increasing turn count as it tries to reconstruct missing data.

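Action: Audit the jq filter in the "Prepare workflow history summary" step. The likely fix is to guard arithmetic on API fields that may be strings or null: coerce with tonumber and supply a default instead of applying + directly. Note that tonumber itself raises an error on a non-numeric string such as "g", so the coercion also needs error suppression (tonumber?) or a type check.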
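The actual filter is not shown in this issue, so the following is only a sketch of the guard pattern, using a hypothetical `count` field and made-up sample data. Summing a mix of strings and numbers reproduces the same class of "cannot be added" error, and `tonumber? // 0` turns anything non-numeric into 0 before the sum.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical input: a numeric field that sometimes arrives as a string or null.
sample='[{"count":"g"},{"count":2},{"count":"3"},{"count":null}]'

# Unguarded sum: fails because a string and a number cannot be added.
if printf '%s' "$sample" | jq 'map(.count) | add' >/dev/null 2>&1; then
  echo "unguarded sum unexpectedly succeeded"
else
  echo "unguarded sum failed, as expected"
fi

# Guarded sum: `tonumber?` suppresses the parse error for "g" and null,
# and `// 0` supplies a default when nothing is produced.
safe_sum='map(.count | (tonumber? // 0)) | add'
printf '%s' "$sample" | jq "$safe_sum"
# → 5
```

Defaulting to 0 silently hides bad input; if the step should surface malformed API data instead, a type check that fails loudly may be the better choice.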


#3 — Reduce effective token pressure with a turn-limit guardrail

Estimated AIC savings: ~100–150 AIC/run on any future failure runs

The June 6 run exhausted all 50 turns before hitting the 25M effective-token hard cap. The existing ## Command Guardrails section says to call report_incomplete when a command is blocked, but the agent keeps trying Python variations instead. Adding an explicit turn-count awareness instruction or reducing max-daily-ai-credits from 100M would create a softer ceiling before the hard cap is reached.

Action: Add to ## Command Guardrails: "If you encounter 3 or more consecutive Permission denied errors for the same type of command, stop immediately and call report_incomplete." This aligns with the existing "short-circuit instead of continuing retries" rule but adds a measurable threshold.
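To check past runs against the proposed threshold, the longest streak of consecutive denied commands can be counted from a step log. This is a sketch only: the log excerpt is hypothetical and assumes the `$ command` / result layout quoted in the evidence under #1, which real job logs may not follow exactly.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical log excerpt in the "$ command" / result layout.
log="$(mktemp)"
cat > "$log" <<'EOF'
$ cat a.json | python3 -c "..."
 Permission denied and could not request permission from user
$ cat b.json | python3 -c "..."
 Permission denied and could not request permission from user
$ jq '.' c.json
{}
$ cat d.json | python3 -c "..."
 Permission denied and could not request permission from user
EOF

# A "$ " line starts a new command; if the previous command was not denied,
# the streak resets. Each denied command extends the streak by one.
longest_streak="$(awk '
  /^\$ / { if (seen && !denied) streak = 0; seen = 1; denied = 0; next }
  /Permission denied/ { if (!denied) { denied = 1; streak++; if (streak > max) max = streak } }
  END { print max + 0 }
' "$log")"

echo "longest consecutive denial streak: $longest_streak"
if [ "$longest_streak" -ge 3 ]; then
  echo "threshold reached: the agent should have called report_incomplete"
fi
```

With this sample the successful jq call breaks the streak, so the longest streak is 2 and the threshold of 3 is not reached.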


#4 — Reduce blocked unknown-domain requests (46 per run)

Estimated AIC savings: ~5–10 AIC/run (minor)

Each run generates 46 blocked requests to (unknown) domains alongside the 108 allowed api.githubcopilot.com calls. These are likely from tool calls that attempt side-channel HTTP (e.g., pip install or Python module fetches triggered by python3). Fixing recommendation #1 eliminates python3 calls and should reduce this noise.

Action: No separate action needed; will be resolved by #1.


Caveats

  • Only 1 run was captured in the 7-day all-runs.json window; the 5-run streak was confirmed via GitHub Actions API historical lookup.
  • Success-run AIC (~90) is estimated from token count (460K raw tokens on May 28 success); no AIC field was available for that run in snapshots.
  • The jq error (recommendation #2) may be benign if history-summary.json is still written with partial data; impact is indirect and the savings estimate is conservative.
  • No inline sub-agent recommendations: the workflow already has scope-filter and simplification-scout sub-agents, and the failures are all tool-policy related, not prompt-structure issues.

Supporting run evidence

| Run ID | Date | Conclusion | Turns | Permission Denials | Python calls observed |
| --- | --- | --- | --- | --- | --- |
| §27052709585 | 2026-06-06 | failure (token cap) | 50 | 11 | Yes |
| §26995892409 | 2026-06-05 | failure | 50 (est.) | 11+ | Yes |
| §26931428773 | 2026-06-04 | failure | 50 (est.) | 11+ | Yes |
| §26864348833 | 2026-06-03 | failure | | | |
| §26799052440 | 2026-06-02 | failure | | | |
| §26735803823 | 2026-06-01 | success | 10 | 0 | No |
| §26703540010 | 2026-05-31 | failure | | | |
| §26674621344 | 2026-05-30 | success | | 0 | |
| §26618444742 | 2026-05-29 | success | | 0 | |
| §26555122796 | 2026-05-28 | success | 10 | 0 | No |

References: §27052709585 · §26995892409 · §26735803823

Generated by Agentic Workflow AIC Usage Optimizer · 654.5 AIC ·

  • expires on Jun 13, 2026, 5:52 AM UTC
