[agentic-token-optimizer] Code Simplifier — eliminate Python bash-policy failures that waste 864 AIC/run

**Selected workflow:** `code-simplifier.md` — highest-AIC eligible workflow in the 7-day window (864 AIC, 1 run captured) and currently on a **5-run failure streak** (2026-06-02 → 2026-06-06).

---

### Analysis period and runs audited

| Metric | Value |
|---|---|
| Analysis window | 2026-05-28 → 2026-06-06 (10 runs) |
| Runs audited | 10 (6 failures, 4 successes) |
| Failure rate | 60% overall; 100% in last 5 runs |
| Data sources | `all-runs.json`, job step logs (runs 27052709585, 26995892409, 26931428773, 26735803823, 26555122796) |

---

### Cost profile

| Metric | Failure run (Jun 6) | Success run (May 28) | Delta |
|---|---|---|---|
| AIC | 864.23 | ~90 (est.) | ~9.6× worse |
| Raw tokens | 2,584,799 | 460,665 | 5.6× worse |
| Effective tokens | 25,759,102 | ~460,000 | ~56× worse |
| Turns | 50 (max hit) | 10 | 5× worse |
| Duration | 13 min | ~4 min | 3× worse |
| Conclusion | failure (429 token cap) | success | — |

The June 6 run consumed **25.76M effective tokens**, exceeding the 25M hard cap and triggering a terminal `CAPIError: 429 Maximum effective tokens exceeded` after 5 retries.
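The Delta column and the cap overrun can be rechecked with a few lines of arithmetic. The sketch below uses only the figures from the table above; the ~90 AIC and ~460,000 effective-token values for May 28 are this report's own estimates, and the function names are illustrative.

```python
# Figures copied from the cost-profile table. The May 28 AIC (~90) and
# effective-token (~460,000) values are estimates, not measured fields.
failure = {"aic": 864.23, "raw_tokens": 2_584_799,
           "effective_tokens": 25_759_102, "turns": 50}
success = {"aic": 90.0, "raw_tokens": 460_665,
           "effective_tokens": 460_000, "turns": 10}

EFFECTIVE_TOKEN_CAP = 25_000_000  # hard cap cited in this report


def ratios(fail: dict, ok: dict) -> dict:
    """Return failure/success ratios per metric, rounded to one decimal."""
    return {key: round(fail[key] / ok[key], 1) for key in fail}


def cap_overrun(effective_tokens: int, cap: int = EFFECTIVE_TOKEN_CAP) -> int:
    """Return how many effective tokens exceed the cap (0 if under it)."""
    return max(0, effective_tokens - cap)


if __name__ == "__main__":
    for metric, ratio in ratios(failure, success).items():
        print(f"{metric}: {ratio}x")
    print("tokens over cap:", cap_overrun(failure["effective_tokens"]))
```

Because two of the success-run inputs are estimates, the AIC and effective-token ratios should be read as approximate.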

---

### Ranked recommendations

#### #1 — Add `python3` to bash allowlist (or enforce `jq`-only JSON processing)
**Estimated AIC savings: ~680–700 AIC per failure run**

**Root cause confirmed across 3 consecutive failure runs:**
The agent consistently invokes `python3 -c "..."` to parse and explore JSON files (`source-files.json`, `recent-prs.json`, `history-summary.json`). `python3` is **not in the bash allowlist**, so every call fails with `Permission denied and could not request permission from user`. Once a run accumulates ≥ 11 denials, the harness classifies it as a `missing tool/permission issue` and does not retry the full run.

**Evidence from job step logs (runs 27052709585, 26995892409, 26931428773):**
```
$ cat /tmp/copilot-tool-output-*.txt | python3 -c "import json,sys; files=json.load(sys.stdin); [print(f) for f in files]"
 Permission denied and could not request permission from user
$ cat /tmp/gh-aw/code-simplifier/history-summary.json | python3 -c "import json,sys; d=json.load(sys.stdin); print(json.dumps(d, indent=2))"
 Permission denied and could not request permission from user
$ cat /tmp/gh-aw/code-simplifier/recent-prs.json | python3 -c "import json,sys; ..."
 Permission denied and could not request permission from user
```
At least 11 permission-denied events per inspected failure run (exactly 11 on June 6); zero in the inspected success runs (May 28 and June 1).
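The denial count can be reproduced from a step log with a short script. This is a minimal sketch that assumes the log looks like the excerpt above (a `$ command` line followed by the denial message); real job logs may carry timestamps or other prefixes, and `summarize_denials` is a hypothetical helper, not part of the harness.

```python
DENIAL_MARKER = "Permission denied and could not request permission from user"

# Sample copied from the job step log excerpt in this report.
SAMPLE_LOG = '''\
$ cat /tmp/copilot-tool-output-*.txt | python3 -c "import json,sys; files=json.load(sys.stdin); [print(f) for f in files]"
 Permission denied and could not request permission from user
$ cat /tmp/gh-aw/code-simplifier/history-summary.json | python3 -c "import json,sys; d=json.load(sys.stdin); print(json.dumps(d, indent=2))"
 Permission denied and could not request permission from user
$ cat /tmp/gh-aw/code-simplifier/recent-prs.json | python3 -c "import json,sys; ..."
 Permission denied and could not request permission from user
'''


def summarize_denials(log_text: str) -> dict:
    """Count denied commands and how many of them invoke python3."""
    denied_commands = []
    last_command = None
    for line in log_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("$ "):
            last_command = stripped[2:]  # remember the most recent command
        elif DENIAL_MARKER in stripped and last_command is not None:
            denied_commands.append(last_command)
            last_command = None  # count each command at most once
    python_denials = sum("python3" in cmd.split() for cmd in denied_commands)
    return {"total": len(denied_commands), "python3": python_denials}


if __name__ == "__main__":
    print(summarize_denials(SAMPLE_LOG))
```

Running the same function over a success-run log gives a quick check that the denials are specific to the failure runs.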

**Actions (pick one or both):**
- **A — Add to allowlist** (immediate): add `- "python3 -c *"` and `- "python3 -m json.tool"` to the `bash:` tool list in the frontmatter.
- **B — Prompt guardrail** (complementary): in `## Command Guardrails`, add: _"Do NOT use `python3` for JSON parsing; use `jq`, `cat`, or `head` instead."_

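**Why this should end the failure streak:** once the denials stop, whether because `python3` is allowed (A) or because the agent is steered to `jq` (B), the agent should be able to process the JSON files in ≤15 turns (the successful runs took 10) instead of exhausting all 50 turns and hitting the effective-token cap.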

---

#### #2 — Fix `jq` arithmetic error in the history-summary deterministic step
**Estimated AIC savings: ~50–80 AIC/run (indirect)**

The "Prepare workflow history summary" setup step fails with:
```
jq: error (at <stdin>:0): string ("g") and number (2) cannot be added
```
in all 3 failure runs inspected; it is likely present in the successful runs too, where it is non-fatal. The jq filter operates on the GitHub workflow runs API response, and the error indicates that a string value is being added to a number: a field the filter treats as numeric (e.g., `run_number` or a count) actually holds a string. A malformed `history-summary.json` deprives the agent of the precomputed context it needs, which likely increases turn count as it tries to reconstruct the missing data.

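**Action:** Audit the jq filter in the "Prepare workflow history summary" step. The likely fix is to coerce potentially-string API fields before arithmetic, e.g., with `tonumber` plus a `null` guard. Note that the offending value in the error is the non-numeric string `"g"`, on which a bare `tonumber` would itself fail; a guarded form such as `(tonumber? // 0)` may be needed, or the filter may be selecting the wrong field altogether.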
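The coerce-then-guard logic named in the action can be illustrated outside jq. The sketch below is a Python analogue only, since the actual jq filter is not shown in this report; `to_number`, `safe_sum`, and the sample values are hypothetical.

```python
def to_number(value, default=0):
    """Coerce an API field to a number, falling back to `default`.

    Numbers pass through, numeric strings are parsed, and null or
    non-numeric strings (such as the "g" in the jq error) use the default.
    """
    if isinstance(value, bool):  # bool is a subclass of int; reject it
        return default
    if isinstance(value, (int, float)):
        return value
    if isinstance(value, str):
        for parse in (int, float):  # try integer first, then float
            try:
                return parse(value)
            except ValueError:
                continue
    return default


def safe_sum(values, default=0):
    """Sum a list of possibly-string, possibly-null fields without raising."""
    return sum(to_number(v, default) for v in values)


if __name__ == "__main__":
    # Hypothetical mix of field values: a number, a numeric string,
    # a non-numeric string, and a null.
    print(safe_sum([2, "3", "g", None]))
```

Silently defaulting to zero keeps the step from failing but can hide a wrong field selection, so logging the rejected values during the audit is worth considering.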

---

#### #3 — Reduce effective token pressure with a turn-limit guardrail
**Estimated AIC savings: ~100–150 AIC/run on any future failure runs**

The June 6 run exhausted all 50 turns before hitting the 25M effective-token hard cap. The existing `## Command Guardrails` section says to call `report_incomplete` when a command is blocked, but the agent keeps trying Python variations instead. Adding an explicit turn-count awareness instruction or reducing `max-daily-ai-credits` from `100M` would create a softer ceiling before the hard cap is reached.

**Action:** Add to `## Command Guardrails`: _"If you encounter 3 or more consecutive `Permission denied` errors for the same type of command, stop immediately and call `report_incomplete`."_ This aligns with the existing "short-circuit instead of continuing retries" rule but adds a measurable threshold.
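To make the threshold concrete, the rule can be written as a small predicate over the agent's command history. This is a sketch of one reading of the instruction: "same type of command" is assumed to mean the same tool name, and `should_report_incomplete` is a hypothetical helper, since the real guardrail is a prompt instruction rather than code.

```python
def should_report_incomplete(events, threshold=3):
    """Return True once `threshold` consecutive denials hit the same tool.

    `events` is an iterable of (tool_name, denied) tuples in execution
    order. A successful command or a denial of a different tool resets
    the streak.
    """
    streak_tool = None
    streak = 0
    for tool_name, denied in events:
        if denied and tool_name == streak_tool:
            streak += 1
        elif denied:
            streak_tool, streak = tool_name, 1  # new streak for a new tool
        else:
            streak_tool, streak = None, 0  # any success clears the streak
        if streak >= threshold:
            return True
    return False


if __name__ == "__main__":
    history = [("python3", True), ("python3", True), ("python3", True)]
    print(should_report_incomplete(history))
```

Under this reading, a run like June 6 would stop after the third consecutive `python3` denial rather than after 11 or more.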

---

#### #4 — Reduce blocked unknown-domain requests (46 per run)
**Estimated AIC savings: ~5–10 AIC/run (minor)**

Each run generates 46 blocked requests to `(unknown)` domains alongside the 108 allowed `api.githubcopilot.com` calls. These likely come from tool calls that attempt side-channel HTTP (e.g., `pip install` or Python module fetches associated with the `python3` attempts), although this attribution is unverified, since the denied `python3` commands never execute. If #1 is resolved via option B (no more `python3` calls), this noise should drop.

**Action:** No separate action needed for now; recheck the blocked-request count after #1 lands.

---

### Caveats

- Only 1 run was captured in the 7-day `all-runs.json` window; the 5-run streak was confirmed via GitHub Actions API historical lookup.
- Success-run AIC (~90) is estimated from token count (460K raw tokens on May 28 success); no AIC field was available for that run in snapshots.
- The jq error (#2) may be benign if `history-summary.json` is still written with partial data; its impact is indirect, and the savings estimate is deliberately conservative.
- No inline sub-agent recommendations: the workflow already has `scope-filter` and `simplification-scout` sub-agents, and the failures are all tool-policy related, not prompt-structure issues.

<details>
<summary>Supporting run evidence</summary>

| Run ID | Date | Conclusion | Turns | Permission Denials | Python calls observed |
|---|---|---|---|---|---|
| [§27052709585](https://github.com/github/gh-aw/actions/runs/27052709585) | 2026-06-06 | failure (token cap) | 50 | 11 | Yes |
| [§26995892409](https://github.com/github/gh-aw/actions/runs/26995892409) | 2026-06-05 | failure | 50 (est.) | 11+ | Yes |
| [§26931428773](https://github.com/github/gh-aw/actions/runs/26931428773) | 2026-06-04 | failure | 50 (est.) | 11+ | Yes |
| §26864348833 | 2026-06-03 | failure | — | — | — |
| §26799052440 | 2026-06-02 | failure | — | — | — |
| [§26735803823](https://github.com/github/gh-aw/actions/runs/26735803823) | 2026-06-01 | success | 10 | 0 | No |
| §26703540010 | 2026-05-31 | failure | — | — | — |
| §26674621344 | 2026-05-30 | success | — | 0 | — |
| §26618444742 | 2026-05-29 | success | — | 0 | — |
| §26555122796 | 2026-05-28 | success | 10 | 0 | No |

**References:** [§27052709585](https://github.com/github/gh-aw/actions/runs/27052709585) · [§26995892409](https://github.com/github/gh-aw/actions/runs/26995892409) · [§26735803823](https://github.com/github/gh-aw/actions/runs/26735803823)

</details>







> Generated by [Agentic Workflow AIC Usage Optimizer](https://github.com/github/gh-aw/actions/runs/27054053013) · 654.5 AIC · [◷](https://github.com/search?q=repo%3Agithub%2Fgh-aw+is%3Aissue+%22gh-aw-workflow-call-id%3A+github%2Fgh-aw%2Fagentic-token-optimizer%22&type=issues)
> - [x] expires on Jun 13, 2026, 5:52 AM UTC
