Closed
2 changes: 1 addition & 1 deletion .github/workflows/objective-impact-report.lock.yml

79 changes: 72 additions & 7 deletions .github/workflows/objective-impact-report.md
@@ -207,14 +207,20 @@ Create one issue titled:
Impact Efficiency Report - YYYY-MM-DD
```

Use **progressive disclosure** for the report body. Place the Executive Summary first as plain text (no collapsible wrapper). Wrap every other section individually in an HTML `<details>` block using the canonical structure (`<details>` on its own line, `<summary>…</summary>` on its own line, then a blank line before the section body, ending with `</details>`). Use descriptive summary labels that include the section's most important number where possible (e.g. `📋 Summary — 3 accepted outcomes, 43,055 AIC, IE 0.00476`, `🎯 Agentic Work by Objective — top: bug (170 value)`, `📊 Data Quality — 2 issues`).
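
As an illustration of the canonical structure described above, here is a minimal Python sketch. The helper names `render_details` and `render_report` are hypothetical and not part of the workflow; the sketch only shows the line layout (opening tag, summary line, blank line, body, closing tag) with the Executive Summary left unwrapped.

```python
def render_details(summary: str, body: str) -> str:
    """Wrap one report section in the canonical <details> layout."""
    return "\n".join([
        "<details>",                      # opening tag on its own line
        f"<summary>{summary}</summary>",  # summary on its own line
        "",                               # blank line before the section body
        body.strip("\n"),
        "</details>",
    ])


def render_report(executive_summary: str, sections: list[tuple[str, str]]) -> str:
    """Emit the Executive Summary as plain text, then each section collapsed."""
    parts = [executive_summary.strip("\n")]
    parts += [render_details(label, body) for label, body in sections]
    return "\n\n".join(parts) + "\n"
```

Keeping the blank line after the summary matters because GitHub only renders Markdown (tables, lists) inside `<details>` when the body is separated from the HTML tags.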

The report must include:

### Executive Summary

Write 2–4 sentences that directly answer: *What did the agent work on, what was the highest-impact agentic work, which workflows contributed most to that impact, how efficiently were AIC tokens spent, and what high-impact work was delivered outside agentic workflows (if any)?* Highlight the most impactful objective categories, the workflows contributing the most value, and any significant gaps (e.g., large AIC spend with no mapped objective value).

The Executive Summary must **not** be wrapped in a `<details>` block — it is always visible.

### Summary

*(Wrap this section in `<details><summary>📋 Summary — …</summary>…</details>`.)*

| Metric | Value |
|---|---:|

@@ -233,21 +239,25 @@ Include:

### Agentic Work by Objective

*(Wrap this section in `<details><summary>🎯 Agentic Work by Objective — …</summary>…</details>`.)*

Group all **accepted, mapped** outcomes by objective category (the highest-value objective label from the mapping). For each category, list:

- Objective category name and its mapping value
- Number of accepted outcomes in this category
- Total outcome value contributed
- AIC consumed by outcomes in this category
- Impact Efficiency for this category (total outcome value / AIC consumed)
- AIC consumed by outcomes in this category — use per-workflow AIC from `aic-by-workflow.json` only for workflows that are attributed to outcomes in this category; if attribution is unavailable, show `—` (not `N/A`) and add a note that workflow attribution is required to compute per-category AIC
- Impact Efficiency for this category (total outcome value / AIC consumed) — show `—` if AIC is unknown for this category
- Representative examples (up to 3 linked outcomes)

Sort categories by total outcome value descending. Also call out separately which category consumed the **most AIC** (highest denominator cost), so readers can see where budget was spent regardless of value delivered.
Sort categories by total outcome value descending. If per-category AIC is available, also call out separately which category consumed the **most AIC**; otherwise state that the most-AIC category cannot be determined without workflow attribution.
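
The per-category rules above can be sketched as follows. This is an illustration under stated assumptions, not the workflow's implementation: the outcome keys (`category`, `value`, `workflow`) and the shape of the AIC data (a mapping from workflow name to AIC) are hypothetical, since the actual layout of `aic-by-workflow.json` is not shown here. The sketch also counts a workflow's full AIC in every category it contributes to, which is a simplification.

```python
from collections import defaultdict

DASH = "\u2014"  # the placeholder the report shows for unknown values


def category_rows(outcomes, aic_by_workflow):
    """Aggregate accepted, mapped outcomes per objective category.

    outcomes: list of dicts with hypothetical keys "category", "value",
        and "workflow" (None when the outcome is unattributed).
    aic_by_workflow: assumed mapping of workflow name -> AIC consumed.
    """
    grouped = defaultdict(list)
    for outcome in outcomes:
        grouped[outcome["category"]].append(outcome)

    rows = []
    for category, items in grouped.items():
        total_value = sum(o["value"] for o in items)
        workflows = {o["workflow"] for o in items}
        # Per-category AIC is known only when every outcome is attributed
        # to a workflow that appears in the AIC data.
        if None in workflows or not workflows.issubset(aic_by_workflow):
            aic, ie = DASH, DASH
        else:
            aic = sum(aic_by_workflow[w] for w in workflows)
            ie = total_value / aic if aic else DASH
        rows.append({"category": category, "count": len(items),
                     "value": total_value, "aic": aic, "ie": ie})

    # Sort categories by total outcome value, highest first.
    rows.sort(key=lambda r: r["value"], reverse=True)
    return rows
```

When every row carries a numeric `aic`, the most-AIC category is simply the row with the largest `aic`; if any row shows the placeholder, the report should say that category cannot be determined.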

This section should make the most impactful work in the repository obvious at a glance.

### Which Workflows Drove That Impact

*(Wrap this section in `<details><summary>⚙️ Which Workflows Drove That Impact — …</summary>…</details>`.)*

Group all analyzed outcomes by the workflow that directly produced them. For each workflow, list:

- Workflow name
@@ -269,27 +279,39 @@ If any analyzed outcomes cannot be attributed to a workflow, report an unattribu

### Top outcomes by outcome value

*(Wrap this section in `<details><summary>🏆 Top Outcomes by Value — top outcome value: …</summary>…</details>`.)*

| Outcome | Workflow | Type | Root / Associated Objective | Objective Value | Outcome Value |
|---|---|---|---|---:|---:|

List the top 15 outcomes with highest Outcome Value. Include a link to the PR or issue.

### Unmapped outcomes

*(Wrap this section in `<details><summary>❓ Unmapped Outcomes — N unmapped</summary>…</details>`. If there are no unmapped outcomes, use the summary label `❓ Unmapped Outcomes — none`.)*

| Outcome | Type | Reason objective was not mapped |
|---|---|---|

Only include outcomes that were in scope (linked-issue PRs and safe-output issues) but had no matching label in the objective mapping. Do not include PRs that were excluded for lacking a linked issue — those are already counted in "PRs excluded".

### Interpretation

*(Wrap this section in `<details><summary>💡 Interpretation</summary>…</details>`.)*

Compare:

- accepted outcome count alone
- Impact Efficiency

Explain which one better reflects meaningful delivered value relative to cost.

Also compute and report a **scope-adjusted IE** when workflow attribution is available: use only AIC from workflows that are attributed to at least one PR or safe-output outcome in the analysis window. If no analyzed outcomes can be attributed to a workflow, show scope-adjusted IE as `—` and explain that workflow attribution is required to compute it.

Present both the full-denominator IE and the scope-adjusted IE; label them clearly.

Do **not** describe IE as "artificially" depressed. Instead, explain concretely what the denominator includes (all workflows, including reporting and analysis workflows that produced no PR outcomes), and present the scope-adjusted IE as the supplemental view.
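
A minimal sketch of the two IE views, assuming the AIC data is a mapping from workflow name to AIC and that the set of attributed workflows has already been determined (both are assumptions; the function and key names are hypothetical):

```python
DASH = "\u2014"  # placeholder shown when a value cannot be computed


def impact_efficiency(total_value, aic_by_workflow, attributed_workflows):
    """Return the full-denominator IE and the scope-adjusted IE.

    total_value: summed outcome value of accepted outcomes.
    aic_by_workflow: assumed mapping of workflow name -> AIC in the window.
    attributed_workflows: names of workflows attributed to at least one
        PR or safe-output outcome in the analysis window.
    """
    # Full denominator: every workflow, including reporting and analysis
    # workflows that produced no PR outcomes.
    full_aic = sum(aic_by_workflow.values())
    # Scope-adjusted denominator: only workflows that produced outcomes.
    scoped_aic = sum(aic for name, aic in aic_by_workflow.items()
                     if name in attributed_workflows)

    full_ie = total_value / full_aic if full_aic else DASH
    scoped_ie = total_value / scoped_aic if scoped_aic else DASH
    return {"full_denominator_ie": full_ie, "scope_adjusted_ie": scoped_ie}
```

With no attributed workflows the scoped denominator is zero, so the scope-adjusted IE falls back to the placeholder, matching the rule above.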

Call out the most significant findings:

- which objective category delivered the most value per AIC
@@ -298,6 +320,8 @@ Call out the most significant findings:

### Data quality

*(Wrap this section in `<details><summary>📊 Data Quality — N issues</summary>…</details>`. Count the number of ⚠️ and ❌ items and use that count in the summary label.)*

Mention missing or weak links in:

- PR root tracing and linked-closing-issue coverage (count of PRs excluded for lacking a linked issue)
@@ -309,11 +333,51 @@ State whether AI Credits came from deterministic precomputed data or from a live

If AI Credits are unavailable, still produce the delivered-value analysis and clearly state that the cost-normalized Impact Efficiency metric could not be computed.

### Recommendations

*(Wrap this section in `<details><summary>✅ Recommendations — N action items</summary>…</details>`. Count only the action items that apply this cycle (i.e., have concrete evidence from this report's data) and use that count in the summary label.)*

Generate **actionable, evidence-grounded recommendations** to improve the accuracy, determinism, and attribution of the next report. Base each recommendation directly on a gap identified in the Data Quality section or a finding from the analysis — do not produce generic advice.

For each recommendation, include:

- A short title (one line)
- The specific gap or finding from this cycle that motivates it (with concrete numbers where available, e.g., "993 PRs excluded for lacking `Closes #N`")
- The expected effect on report accuracy: which metric would improve and by how much, if estimable from available data (e.g., "would convert up to N excluded PRs into analyzable outcomes")
- The owner or mechanism (workflow name, script, PR template change, etc.)

Generate a recommendation for each of the following gap categories **only when the gap is confirmed by this cycle's data**:

1. **Workflow attribution**: If any analyzed outcomes are unattributed (no `workflow_run_id`/`workflow_name` link), recommend the specific change needed to emit that field — e.g., adding a `workflow_run_id` to PR bodies at creation time, or extending the dataset preparation script to join on head-branch naming conventions. Specify which workflow or script would need the change.

2. **Linked-issue coverage**: If a significant number of PRs were excluded for lacking a `Closes #N` link, recommend adding a PR body template or linter that enforces the link for agentic-workflow-created PRs. Include the exclusion count to quantify the impact.

3. **Objective label coverage**: If any in-scope outcomes were unmapped (objective value = 0 with labels present), recommend adding the missing labels to `.github/objective-mapping.json`. List the specific unmapped labels observed this cycle.

4. **AIC per-outcome attribution**: If per-category or per-workflow AIC could not be computed because attribution was missing, recommend the minimal dataset join that would enable it (e.g., matching workflow log entries to outcome PRs by branch name or run ID).

5. **PR dataset cap**: If the merged or closed-unmerged PR dataset was capped at fewer records than the window might contain (check `dataset-manifest.json` for cap warnings), recommend increasing `OBJECTIVE_IMPACT_PR_LIST_LIMIT` or paginating the fetch.

6. **Likely-agentic reclassification**: If a significant number of PRs were moved to the "likely agentic (attribution gap)" bucket in the Human Work section, recommend the specific attribution improvement (e.g., stamping the producing workflow name in the PR body) that would collapse that bucket in the next cycle.

Only include recommendations that are directly supported by data from this cycle. Omit any category where the data shows no gap (e.g., if attribution is fully resolved, omit recommendation 1). Sort recommendations by expected impact on report accuracy, highest first.

### Human Work

*(Wrap this section in `<details><summary>👤 Human Work — N merged PRs</summary>…</details>`.)*

This section is independent of AIC and the agentic efficiency analysis above. It captures pull requests merged in the analysis window that could not be attributed to any GitHub Agentic Workflow run in the deterministic logs.

Identify merged PRs from `/tmp/gh-aw/agent/objective-impact-report/merged-prs-linked.json` that have **no** matching run in `/tmp/gh-aw/agent/objective-impact-report/workflow-logs.json` (i.e., PRs whose author or head branch cannot be linked to any workflow run that produced an outcome). Treat these as human-authored contributions for reporting, but explicitly note that missing log coverage or attribution gaps can inflate this count.
Identify merged PRs from `/tmp/gh-aw/agent/objective-impact-report/merged-prs-linked.json` that have **no** matching run in `/tmp/gh-aw/agent/objective-impact-report/workflow-logs.json` (i.e., PRs whose author or head branch cannot be linked to any workflow run that produced an outcome).

Before reporting these as human-authored, apply the following filter to identify **likely-agentic PRs** that may appear human due to attribution gaps:

- PR title matches a pattern such as `[docs]`, `[linter-miner]`, `[fix]`, `[refactor]`, `[chore]`, or another known bot prefix used in this repository.
- PR author is a bot account (login ending in the literal suffix ``[bot]``, e.g. ``dependabot[bot]``) or a known agentic account.

Report likely-agentic PRs in a separate sub-table labelled **"Likely agentic (attribution gap)"** rather than counting them in the human total. This prevents attribution gaps from inflating the human work count. Explicitly note how many PRs were reclassified.
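
The filter could look roughly like the sketch below. It makes two assumptions that the text above does not settle: the title patterns are treated as case-insensitive prefixes, and a PR is flagged when either condition holds. The prefix tuple, the `KNOWN_AGENTIC_LOGINS` set, and the `title`/`author` keys are hypothetical and would need to reflect this repository's actual conventions.

```python
# Hypothetical prefix list; extend with the bot prefixes this repository uses.
AGENTIC_TITLE_PREFIXES = ("[docs]", "[linter-miner]", "[fix]", "[refactor]", "[chore]")
# Assumption: known agentic account logins are maintained per repository.
KNOWN_AGENTIC_LOGINS = frozenset()


def is_likely_agentic(title: str, author_login: str) -> bool:
    """Flag a PR that looks agentic even though no workflow run matched it."""
    has_bot_prefix = title.strip().lower().startswith(AGENTIC_TITLE_PREFIXES)
    is_bot_author = (author_login.endswith("[bot]")
                     or author_login in KNOWN_AGENTIC_LOGINS)
    return has_bot_prefix or is_bot_author


def split_unattributed(prs):
    """Split unattributed merged PRs into (likely_agentic, confirmed_human)."""
    likely_agentic, confirmed_human = [], []
    for pr in prs:
        bucket = (likely_agentic
                  if is_likely_agentic(pr["title"], pr["author"])
                  else confirmed_human)
        bucket.append(pr)
    return likely_agentic, confirmed_human
```

The length of the first list is the number of reclassified PRs to report.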

Treat the remaining PRs as human-authored contributions for reporting, and explicitly note that missing log coverage or attribution gaps can still inflate this count.

For each human-authored merged PR that has a linked closing issue (non-empty `linked_issue_numbers`), use precomputed objective fields from `merged-prs-with-objective.json` when available; otherwise resolve issue labels from linked issues and apply `objective-mapping.json`. Group results by objective category (highest-value mapped label) and report:

@@ -324,9 +388,10 @@ For each human-authored merged PR that has a linked closing issue (non-empty `li

Also report:

- Total number of human-authored merged PRs identified in the analysis window
- Number with a linked closing issue vs. without
- Number mapped to an objective vs. unmapped
- Total number of merged PRs in the dataset
- Of those: number likely-agentic (attribution gap) vs. confirmed-human
- Of confirmed-human: number with a linked closing issue vs. without
- Of confirmed-human with a linked issue: number mapped to an objective vs. unmapped
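
The nested breakdown can be sketched as a chain of filters. Only `linked_issue_numbers` comes from the dataset described above; the `classification` and `objective_value` keys are hypothetical stand-ins for however the earlier steps record their results.

```python
def human_work_counts(merged_prs):
    """Compute the nested Human Work counts from classified merged PRs.

    Each PR dict is assumed to carry:
      "classification": "agentic", "likely_agentic", or "human" (hypothetical)
      "linked_issue_numbers": list of linked closing issues
      "objective_value": mapped objective value, 0 when unmapped (hypothetical)
    """
    likely_agentic = [p for p in merged_prs
                      if p["classification"] == "likely_agentic"]
    human = [p for p in merged_prs if p["classification"] == "human"]
    linked = [p for p in human if p["linked_issue_numbers"]]
    mapped = [p for p in linked if p.get("objective_value", 0) > 0]
    return {
        "total_merged": len(merged_prs),
        "likely_agentic": len(likely_agentic),
        "confirmed_human": len(human),
        "human_with_linked_issue": len(linked),
        "human_without_linked_issue": len(human) - len(linked),
        "human_linked_mapped": len(mapped),
        "human_linked_unmapped": len(linked) - len(mapped),
    }
```

Each level filters the previous one, so the counts at every level sum to their parent.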

Sort categories by total objective value descending. Do **not** compute AIC or Impact Efficiency for this section — human work has no associated AI Credits cost.
