Releases: CodeAlive-AI/ai-driven-development
v3.4.0 — Artefact schemas, run aggregator, grading checklist
Summary
Three additions to skills-management, adopted and adapted from upstream anthropics/skills' skill-creator (commit `690f15c`, May 2026). The upstream is creator-focused; these adaptations target the management / optimisation context.
What's new
| File | Purpose |
|---|---|
| `references/optimization-artifacts-schemas.md` | Formal JSON schemas for every artefact written by `optimize_skill.py` and `log_skill_edit.py`: `splits.json`, `state.json`, `rollouts.jsonl`, `proposals.json`, `decision.json`, `edit_apply_report.json`, `rejected_buffer.json`, `meta_skill.json`, `optimization_report.md` frontmatter, `test_rollouts.jsonl`, `.skill_edit_log.jsonl`, `.skill_snapshots/`. Schema stability is a v3.x guarantee; breaking changes bump the major version. |
| `scripts/aggregate_runs.py` | Aggregate N optimisation runs into a single summary. Computes mean/stddev/min/max for `test_score`, `tokens_delta`, `test_delta_vs_baseline`, plus per-run lists for accepted/rejected edits. `--compare` for side-by-side; text / json / md output; robust to incomplete runs. |
| `references/optimization-grading-checklist.md` | Audit checklist applied to a finished optimisation run before shipping `best_skill.md`. Pre-flight + per-artefact review (best_skill, edit_apply_report, rejected_buffer, optimization_report+test_rollouts, meta_skill) + red flags + green-light decision paths. |
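The per-metric statistics the aggregator reports can be illustrated with a minimal sketch. This is not the code of `aggregate_runs.py`: the function names, the dict-per-run input shape, and the choice of sample standard deviation are assumptions made for illustration.

```python
import statistics

# Metric names taken from the release notes; the run layout is hypothetical.
METRICS = ("test_score", "tokens_delta", "test_delta_vs_baseline")


def summarize(values):
    """Return n/mean/stddev/min/max, skipping metrics missing from a run."""
    present = [v for v in values if v is not None]
    if not present:
        return None  # no run reported this metric
    return {
        "n": len(present),
        "mean": statistics.fmean(present),
        # Assumption: sample stddev; a single value has no spread.
        "stddev": statistics.stdev(present) if len(present) > 1 else 0.0,
        "min": min(present),
        "max": max(present),
    }


def aggregate(runs, metrics=METRICS):
    """Aggregate a list of per-run dicts into one summary per metric."""
    return {m: summarize([run.get(m) for run in runs]) for m in metrics}
```

Skipping missing values rather than failing is one way to stay robust to incomplete runs; the real script may handle them differently.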
What we deliberately skipped from upstream
These are creator-only machinery for description prose iteration, not a fit for managing/auditing existing skills:
- `agents/grader.md`, `agents/analyzer.md`, `agents/comparator.md` — eval-loop orchestration
- `scripts/run_loop.py`, `scripts/improve_description.py`, `scripts/run_eval.py` — description-iteration loop
- `eval-viewer/` — creator-side visualisation
SKILL.md changes
- Quick Reference adds the aggregator
- References table adds the two new docs
Compatibility
- Backwards compatible. All v3.3.0 scripts and schemas unchanged.
- Python 3.10+, stdlib only. No new dependencies.
Test plan
- `python3 scripts/aggregate_runs.py --help` exits 0
- Run `optimize_skill.py` once, then `aggregate_runs.py ` and confirm summary
- Run `optimize_skill.py` twice with different seeds, then `aggregate_runs.py r1 r2 --compare`
- Read `references/optimization-grading-checklist.md` end-to-end against a real run
🤖 Generated with Claude Code
v3.3.0 — SkillOpt training loop for skills-management
Summary
skills-management becomes a trainable skill manager, not just a CRUD tool. Based on SkillOpt (Microsoft, May 2026): treat the SKILL.md document as the external trainable state of a frozen agent, with the same discipline that makes weight-space optimisation reproducible — bounded edits, held-out validation gate, rejected-edit buffer, epoch-wise slow/meta update.
New scripts
| Script | Purpose |
|---|---|
| `scripts/optimize_skill.py` | Full SkillOpt loop: train/sel/test splits, rollout via `claude -p`, failure/success mini-batch reflection, hierarchical merge, ranked bounded apply (constant/linear/cosine `L_t` schedules), strict-greater validation gate, rejected-edit buffer, epoch-boundary slow update into a protected section, optimiser-side meta-skill. Supports `--dry-run` and `--resume`. |
| `scripts/log_skill_edit.py` | Append-only audit log with SHA chain, token delta, optional `--snapshot`. |
| `scripts/diff_skill_versions.py` | Diff between git commits, explicit files, or snapshot history; unified/stats/side-by-side formats. |
| `scripts/trigger_test.py` | Trigger tests with heuristic or claude-cli judge, P/R/F1 metrics. `--generate` seeds `cases.yaml` from the description. |
| `scripts/transfer_test.py` | Structural verification of a skill across all 42 supported agents; `--copy --to <agent>`, `--all`. |
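Two ideas from the `optimize_skill.py` row, the bounded-apply schedule and the strict-greater validation gate, can be sketched as follows. The exact formulas are not given in these notes, so the interpretation of `L_t` as a per-step edit budget, the parameter names (`l_max`, `l_min`), and the decay shapes are assumptions, not the script's implementation.

```python
import math


def edit_budget(step, total_steps, l_max, l_min=1, schedule="cosine"):
    """Hypothetical L_t: maximum number of edits applied at a given step."""
    if schedule == "constant" or total_steps <= 1:
        return l_max
    progress = step / (total_steps - 1)  # 0.0 at the first step, 1.0 at the last
    if schedule == "linear":
        value = l_max - (l_max - l_min) * progress
    elif schedule == "cosine":
        value = l_min + 0.5 * (l_max - l_min) * (1 + math.cos(math.pi * progress))
    else:
        raise ValueError(f"unknown schedule: {schedule}")
    return max(l_min, round(value))


def passes_gate(candidate_score, best_score):
    """Strict-greater gate: a tie on the held-out split is rejected."""
    return candidate_score > best_score
```

Rejecting ties means an edit that only adds tokens without improving the held-out score never lands, which matches the bounded-edit discipline described in the summary.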
New prompt contracts (verbatim §C.2 of the paper)
prompts/analyst_error.md, analyst_success.md, merge_failure.md, merge_success.md, merge_final.md, ranking.md, slow_update.md, meta_skill.md.
New reference
references/skill-optimization.md — when to optimise, five core principles, targets (300-2000 tokens, 1-4 accepted edits), one-page algorithm, hyperparameter defaults, six anti-patterns, transfer evidence.
review_skill.py upgrades
- Token footprint (300-2000 target per Table 6 of the paper; penalties at 2000/4000)
- Procedurality check (instance-specific markers — filenames, literal numbers, task references — should be rare)
- Patch-friendliness (anchor density + duplicate-anchor detection for reliable `insert_after` edits)
- Slow-update section integrity (balanced `<!-- SLOW_UPDATE_START -->` / `<!-- SLOW_UPDATE_END -->` markers)
JSON output gains body_tokens, references_tokens, slow_update_tokens, total_tokens, anchor_density, heading_count. CLI flags and exit codes unchanged.
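The slow-update integrity rule can be pictured as a balanced-marker check. The sketch below is an illustration only; the function name is hypothetical, and treating nested or reversed markers as unbalanced is an assumption about what "balanced" means in `review_skill.py`.

```python
import re

# Marker strings come from the release notes; everything else is illustrative.
MARKER = re.compile(r"<!-- SLOW_UPDATE_(START|END) -->")


def slow_update_markers_balanced(text):
    """Return True when every START is closed by an END, with no nesting."""
    open_section = False
    for match in MARKER.finditer(text):
        if match.group(1) == "START":
            if open_section:
                return False  # START inside an already open section
            open_section = True
        else:
            if not open_section:
                return False  # END without a preceding START
            open_section = False
    return not open_section  # an unclosed START is also a failure
```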
SKILL.md changes
- New section: Optimize a Skill (SkillOpt-style)
- Quick Reference gains 8 new commands (optimize, log, diff, trigger-test, transfer-test, generate cases)
- Documented `<!-- SLOW_UPDATE_START/END -->` protected-section convention
- Description extended with new trigger phrases: "optimise skill", "train skill on tasks", "iterate skill", "audit skill edits", "log skill edit", "diff skill versions", "trigger test skill", "transfer skill across agents"
Patterns reference
06-patterns-and-troubleshooting.md gains Pattern 6: Validated iterative refinement with the blind-rewrites anti-pattern.
Compatibility
- Backwards compatible. All existing scripts unchanged. `review_skill.py` keeps every existing rule.
- Python 3.10+, stdlib only. No new dependencies.
- `optimize_skill.py` shells out to `claude -p` for rollouts and optimiser calls, so it inherits whatever subscription/API the user has configured.
Test plan
- Run `python3 scripts/review_skill.py <any-skill>` and verify the new Token footprint block appears.
- Run `python3 scripts/optimize_skill.py <skill> --tasks tasks.jsonl --dry-run` and verify schedule + splits + prompt previews print without LLM calls.
- Run `python3 scripts/log_skill_edit.py <skill> --reason "test" --dry-run` and verify the planned entry.
- Run `python3 scripts/trigger_test.py <skill> --generate > cases.yaml` then `--cases cases.yaml`.
- Run `python3 scripts/transfer_test.py <skill> --all` to verify cross-agent placement.
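The "append-only audit log with SHA chain" that `log_skill_edit.py` maintains can be illustrated with a small hash-chain sketch. The entry layout (`prev`, `payload`, `hash`), the all-zero genesis value, and the JSON canonicalisation are assumptions; the real `.skill_edit_log.jsonl` schema is defined in the project's own reference docs.

```python
import hashlib
import json

GENESIS = "0" * 64  # hypothetical hash for the first entry's predecessor


def chain_entry(prev_hash, payload):
    """Build a log entry whose hash covers the previous hash and the payload."""
    body = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    digest = hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()
    return {"prev": prev_hash, "payload": payload, "hash": digest}


def verify_chain(entries, genesis=GENESIS):
    """Recompute every hash in order; any edited or reordered entry fails."""
    prev = genesis
    for entry in entries:
        expected = chain_entry(prev, entry["payload"])["hash"]
        if entry["prev"] != prev or entry["hash"] != expected:
            return False
        prev = entry["hash"]
    return True
```

Because each hash depends on its predecessor, changing an old entry invalidates every later one, which is what makes an append-only log auditable.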
refactoring-csharp v0.1.0
Roslyn-based C# rename refactorer packaged as an agent skill.

Assets include release installers, the skill archive, self-contained CLI binaries for macOS/Linux/Windows, and SHA256 checksums.

Quick install:

macOS/Linux:
curl -fsSL https://github.com/CodeAlive-AI/ai-driven-development/releases/download/refactoring-csharp-v0.1.0/install.sh | bash

Windows PowerShell:
irm https://github.com/CodeAlive-AI/ai-driven-development/releases/download/refactoring-csharp-v0.1.0/install.ps1 | iex
v3.1.0 — investigating-repository-history
What's new
Adds investigating-repository-history — the 19th skill in the umbrella collection.
investigating-repository-history
Reconstructs the historical intent behind code before risky edits. Returns a compact, cited history note instead of guessing.
- Local `git blame -w -M -C -C -C` + `git log --follow` + pickaxe (`-S`) for seed commits
- GitHub PR / review / inline-comment evidence via `gh` CLI (no direct `api.github.com` calls)
- Anomaly handling: squash merges, rebase merges, cherry-picks, backports, reverts/re-applies, rename/move lineage, mass-refactor downweighting, generated-file detection
- Decision-atom extraction (compatibility / security / performance / concurrency / rejected approaches / test requirements) with confidence scoring
- Output as JSON (machine-readable schema) or compact Markdown history note
- Self-contained test suite: 19 unittest cases (stdlib only) — structure, frontmatter, gh-only access policy, validator, and a local-only smoke test that builds a throw-away git repo
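The local evidence-gathering commands named in the list above can be assembled as argument lists for `subprocess.run`. The sketch below only builds the commands; the helper names, the `-L` line range, and the `--format` string are illustrative assumptions, not the skill's actual code.

```python
def blame_command(path, start, end):
    """git blame that ignores whitespace and follows moved/copied lines."""
    return ["git", "blame", "-w", "-M", "-C", "-C", "-C",
            "-L", f"{start},{end}", "--", path]


def follow_log_command(path):
    """History of one file across renames."""
    return ["git", "log", "--follow", "--format=%H %s", "--", path]


def pickaxe_command(needle):
    """Commits that changed the number of occurrences of a string."""
    return ["git", "log", f"-S{needle}", "--format=%H %s"]
```

Each list can be passed to `subprocess.run(cmd, capture_output=True, text=True, check=True)` inside a repository; keeping them as lists avoids shell quoting problems with paths or search strings.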
Other changes
- Bumps plugin and marketplace versions to 3.1.0
- README updated: 18 → 19 skills, "Engineering practices (4)", repo structure
- Bug fix: `cmd_commit_prs` no longer crashes on `args.no_gh` for non-`inspect` parsers
Install
# Single skill via Skills CLI
npx skills add CodeAlive-AI/ai-driven-development --skill investigating-repository-history
# Or all 19 skills
npx skills add CodeAlive-AI/ai-driven-development
# Claude Code plugin
/plugin marketplace add CodeAlive-AI/ai-driven-development
/plugin install ai-driven-development@ai-driven-development
Full Changelog: v3.0.0...v3.1.0
bash-guard v0.2.0 — git rule family
Adds git as its own rule family covering the everyday "lost work" surface every developer hits. Previously bash-guard caught only git push --force; v0.2.0 expands coverage to nine more destructive git operations plus BFG Repo-Cleaner.
What's new
| Trigger | Reason code |
|---|---|
| `git push -f` / `--force` / `--force-with-lease[=…]` / `+<refspec>` | `git.force_push` |
| `git push --delete` / `-d` / `origin :<branch>` | `git.push_delete` |
| `git reset --hard [<ref>]` | `git.reset_hard` |
| `git clean -f` / `--force` / `-fd` / `-fdx` | `git.clean_force` |
| `git checkout .` / `-- <pathspec>` | `git.checkout_pathspec` |
| `git restore .` (carve-outs: `--source`, `--staged`, `--cached`) | `git.restore_pathspec` |
| `git branch -D` / `--delete --force` | `git.branch_force_delete` |
| `git stash drop` / `clear` | `git.stash_loss` |
| `git filter-branch`, `git filter-repo` (carve-out: `--analyze`) | `git.history_rewrite` |
| `bfg …` (carve-outs: `--help`, `-h`, `--version`) | `git.history_rewrite` |
Reason codes for the existing force-push case were renamed infra.git_force_push → git.force_push (pre-1.0; cleaner audit-log scoping now that git has its own family).
False-positive carve-outs
- `git reset --soft` / `--mixed`: preserves working tree → allow
- `git clean -n` (dry-run) and bare `git clean` (no-op) → allow
- `git checkout <branch>`, `git checkout -b <name>`: branch switch, no pathspec → allow
- `git restore --source=<ref> .`: intentional restore from a specific source → allow
- `git restore --staged <file>`: unstage, no working-tree change → allow
- `git branch -d <name>` (lowercase): safe-delete refuses on unmerged → allow
- `git stash pop / apply / list / show / push`: non-destructive → allow
- `git filter-repo --analyze`: read-only report, no rewrite → allow
- `bfg --help`: usage print, no rewrite → allow
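How a trigger and its carve-outs interact can be shown with a deliberately simplified classifier for four of the rules. bash-guard itself is written in Go and works on a real shell AST; this Python sketch uses `shlex` token splitting instead, covers only a subset of the table, and treats a dry-run flag combined with force (`-fn`) as allowed, which is an assumption.

```python
import shlex


def classify_git(command):
    """Return a reason code for a destructive git command, or None to allow."""
    argv = shlex.split(command)
    if len(argv) < 2 or argv[0] != "git":
        return None
    sub, args = argv[1], argv[2:]
    flags = [a for a in args if a.startswith("-")]

    if sub == "reset" and "--hard" in args:
        return "git.reset_hard"  # --soft / --mixed fall through to allow

    if sub == "clean":
        # Short flags may be clustered (-fdx), so look inside the cluster.
        dry = any(f == "--dry-run" or (not f.startswith("--") and "n" in f)
                  for f in flags)
        force = any(f == "--force" or (not f.startswith("--") and "f" in f)
                    for f in flags)
        if force and not dry:
            return "git.clean_force"

    if sub == "branch" and ("-D" in args or
                            ("--delete" in args and "--force" in args)):
        return "git.branch_force_delete"  # lowercase -d stays allowed

    if sub == "stash" and args[:1] in (["drop"], ["clear"]):
        return "git.stash_loss"

    return None
```

A non-`None` result corresponds to an ask decision with that reason code; everything else is allowed without prompting.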
31 new golden fixtures (allow + ask + corner case per behaviour) — total fixture count now ~155.
Quick install (no Go required)
curl -fsSL https://raw.githubusercontent.com/CodeAlive-AI/ai-driven-development/main/hooks/balanced-safety-hooks/install-prebuilt.sh | sh
Detects host OS/arch (darwin / linux × arm64 / amd64), downloads the matching binary from this release, verifies SHA-256, and patches ~/.claude/settings.json.
To pin this version explicitly: BASH_GUARD_VERSION=bash-guard-v0.2.0.
Upgrading from v0.1.0
Re-run the installer — it replaces the binary in place and the next Bash hook fire picks it up. Audit-log consumers should note the reason-code rename: any dashboard/grep targeting infra.git_force_push should switch to git.force_push.
Why a "git" family
Git's destructive edges (reset --hard, clean -fd, branch -D, stash drop, filter-*) are exactly where banner-blindness Allow-mashing hurts most: the operations are rare in normal flow, almost always intentional, and the cost of a missed ask is unrecoverable lost work. Each rule was selected for low FP rate and high lost-work cost — the same opinionated bar as the other families.
bash-guard v0.1.0
First release of bash-guard under the consolidated ai-driven-development umbrella — a balanced Bash safety hook for autonomous coding agents in Go. It parses every command with a real shell AST and asks for human confirmation only on the genuinely destructive ones — gating the dangerous, allowing the rest, deliberately not using deny (modern agents trivially bypass it).
Note: This is the same binary as the original release in `CodeAlive-AI/awesome-agent-skills` (v0.1.0). The Go module path and install URLs were updated for the new repo location. Functionality is unchanged.
Quick install (no Go required)
curl -fsSL https://raw.githubusercontent.com/CodeAlive-AI/ai-driven-development/main/hooks/balanced-safety-hooks/install-prebuilt.sh | sh
Detects host OS/arch (darwin / linux × arm64 / amd64), downloads the matching binary from this release, verifies SHA-256, and patches ~/.claude/settings.json.
Highlights
- Real Bash AST via `mvdan.cc/sh`: heredoc, single-quoted prose, executor wrappers (`sudo`, `env`, `xargs`, `find -delete/-exec`, `bash -c`, `eval`, `ssh host "…"`, pipe-to-shell) all classified correctly.
- `ask` by default, no `deny`: modern agents trivially bypass `deny` by rephrasing/splitting/wrapping/switching tools, especially under prompt-injection. `ask` keeps the human in the loop.
- Safe-paths matrix with carve-outs: `/etc`, `/usr`, `$HOME` are protected; `/tmp` and explicitly trusted project paths allowed.
- PocketOS-class API coverage: vendor CLIs (railway / fly / heroku / aws / az / gcloud), `curl -X POST/PUT/PATCH/DELETE` against cloud control-plane and GraphQL mutations, DB clients (`psql -c "DROP DATABASE"`, `redis-cli FLUSHALL`), ORM migrations (alembic / prisma / drizzle / rails db:migrate / …).
- Asymmetric fail-open: pre-trigger parse failures → `allow`; post-trigger parse failures → `ask`.
- Trusted-projects allowlist: per-repo `.claude/bash-guard.toml` is opt-in via global `trusted-projects.toml`.
- Performance: ~0.16 ms quick-reject, <5 ms full parse + rule evaluation.
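The asymmetric fail-open rule can be read as "errors before anything suspicious is seen are harmless; errors after a trigger are not". A minimal sketch of that reading follows. The three callables (`quick_trigger`, `parse`, `evaluate`) are hypothetical stand-ins, and the exact boundary between pre-trigger and post-trigger in bash-guard may differ.

```python
def guard_decision(command, quick_trigger, parse, evaluate):
    """Return "allow" or "ask", failing open only before a trigger fires."""
    triggered = False
    try:
        if not quick_trigger(command):
            return "allow"  # quick-reject path: nothing suspicious seen
        triggered = True
        tree = parse(command)  # full parse only after a trigger
        return evaluate(tree)
    except Exception:
        # Before a trigger: do not block ordinary work on a guard bug.
        # After a trigger: something risky was seen, so keep the human in the loop.
        return "ask" if triggered else "allow"
```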
Binaries
Static, stripped, CGO-disabled:
| Platform | File |
|---|---|
| macOS (Apple Silicon) | bash-guard-darwin-arm64 |
| macOS (Intel) | bash-guard-darwin-amd64 |
| Linux (arm64) | bash-guard-linux-arm64 |
| Linux (amd64) | bash-guard-linux-amd64 |
SHA256SUMS is attached for verification.
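Verifying a downloaded binary against `SHA256SUMS` by hand amounts to the check sketched below. It assumes the file uses the common `sha256sum` layout (hex digest, whitespace, file name, with an optional `*` prefix for binary mode); the function names are illustrative and this is not the installer's code.

```python
import hashlib


def expected_digest(sums_text, filename):
    """Find the digest listed for a file in SHA256SUMS-style text."""
    for line in sums_text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[1].lstrip("*") == filename:
            return parts[0].lower()
    return None  # file not listed


def verify_download(data, sums_text, filename):
    """True only when the file is listed and its SHA-256 matches."""
    expected = expected_digest(sums_text, filename)
    return expected is not None and hashlib.sha256(data).hexdigest() == expected
```

An unlisted file counts as a failure rather than a pass, so a renamed or unexpected asset is never silently accepted.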
Build from source
git clone https://github.com/CodeAlive-AI/ai-driven-development.git
cd ai-driven-development/hooks/balanced-safety-hooks
./install.sh --live
Requires Go ≥ 1.21 and jq.
Full docs: hooks/balanced-safety-hooks/
v2.1.0 — OpenCode full support + 2026 actualization
Highlights
Full OpenCode support (anomalyco/opencode v1.14.x) across 6 of 7 skills. Plus a comprehensive 2026 refresh of Claude Code and Codex CLI coverage.
What's New
OpenCode (new — full coverage)
Six new reference docs, one per supported capability:
- `settings/references/opencode-settings.md`: JSON/JSONC config, deep-merge semantics, 8-layer file precedence, permission glob rules, provider blocks, env-var substitution
- `subagents/references/opencode-agents.md`: `agent` blocks (JSON + markdown forms), `mode: primary/subagent/all`, AGENTS.md hierarchy with CLAUDE.md fallback, `opencode agent create/list` CLI
- `mcp/references/opencode-mcp.md`: `type: local/remote`, `command` array, `environment` (not `env`), OAuth via `opencode mcp auth`, field-by-field comparison vs Claude Code/Codex
- `hooks/references/opencode-hooks.md`: plugin-based hooks (Bun/TypeScript/`@opencode-ai/plugin`), 25+ lifecycle events, `tool.execute.before` blocking pattern, MCP-hook caveat
- `plugins/references/opencode-plugins.md`: plugin context fields, custom tools via `tool()` helper with Zod schemas, distribution conventions, awesome-opencode + npm
- `skills/references/opencode-skills.md`: 6-path skill discovery, `permission.skill` glob rules, frontmatter constraints, AGENTS.md interplay
Codex CLI (actualized for 2026-04)
- Hooks GA in v0.124.0 (April 2026): all six events documented (`SessionStart`, `UserPromptSubmit`, `PreToolUse`, `PermissionRequest`, `PostToolUse`, `Stop`), inline TOML config, exit-code-2 vs JSON `permissionDecision: deny` blocking
- Subagents GA (March 2026): three built-ins (`default`, `explorer`, `worker`), `[agents]` orchestration block, custom subagent definitions in `~/.codex/agents/*.toml`
- Defaults: `gpt-5.5` model, granular approvals object (`approval_policy = on-failure` deprecated), new `[features]` flags
- MCP: OAuth 2.1 fields, `supports_parallel_tool_calls`, `default_tools_approval_mode`, `experimental_environment` for remote-stdio
Claude Code (actualized for 2026-04)
- 14 new hook events: `PostCompact`, `PostToolUseFailure`, `SubagentStart`, `TaskCreated`, `TaskCompleted`, `TeammateIdle`, `ConfigChange`, `FileChanged`, `CwdChanged`, `WorktreeCreate`, `WorktreeRemove`, `Elicitation`, `ElicitationResult`, `InstructionsLoaded`, `UserPromptExpansion` (28 events total)
- New handler types: `http`, `mcp_tool`, `prompt`, `agent` (was `command` only)
- Settings: `auto` permission mode (March 2026), `disableSkillShellExecution`, `prUrlTemplate`, `worktree.{sparsePaths,symlinkDirectories}`, `sandbox.network.deniedDomains`
- MCP: SSE end-of-life April 2026; Streamable HTTP promoted with OAuth 2.1, RFC 9728 PRM, CIMD; `_meta["anthropic/maxResultSizeChars"]` (500K)
- Plugins: new manifest fields (`outputStyles`, `themes`, `lspServers`, `monitors`, `userConfig`, `channels`, `dependencies`); `claude plugin tag`, `--keep-data`, `/reload-plugins`
- Subagents: `effort`, `maxTurns`, `isolation: worktree`, `background`, `color`, `memory`, `permissionMode: auto`
Upgrade
Plugin restart required after installation.
/plugin marketplace update agents-reflection-skills
/plugin install agents-reflection-skills@agents-reflection-skills
Stats
- 25 files changed, +3407 / -351 lines
- 6 new reference documents
- 11 commits across 3 parallel feature branches, integrated and shipped
v2.0.4 — harden permissionDecision guidance in hooks-management
hooks-management: permissionDecision guidance that agents can't miss on first read
Follow-up to v2.0.3: agents were still reaching for exit 2 or inventing their own confirmation schemes (env-var flags, osascript dialogs, bypass tokens) because the JSON decision control lived near the bottom of the SKILL.md and was only briefly mentioned elsewhere.
What changed
- Decision Control section moved up — now appears immediately after Common Patterns, before Codex / Validation, so it reads during a top-to-bottom pass.
- Quick Reference explicitly names JSON decision control as the default mechanism for PreToolUse and calls out anti-patterns to avoid.
- Script template for PreToolUse rewritten: JSON `ask`/`deny` are the primary path; `exit 2` is positioned as a fallback for simple blocking only.
- "User Says" translation table gained three rows: "require manual approval", "ask before dangerous", "block unless confirmed", all pointing to `permissionDecision: "ask"`.
- New gotcha documented: `"ask"` is silently bypassed when `permissions.allow` already matches the tool. Symptom = hook appears to do nothing; fix = narrow the allow rule.
Why
Agents invoking this skill kept falling into two traps: (1) using exit 2 when the intent was "ask the user", producing a UX worse than the native confirm prompt, and (2) reinventing confirmation with home-grown env vars or osascript dialogs that don't integrate with Claude Code's permission system. The fixes above put the right pattern on the first page the agent reads.
No behaviour changes in scripts, validators, or any other skill. Docs-only.
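For readers who have not seen the pattern, a PreToolUse hook that asks instead of exiting with code 2 might look like the sketch below. Only `permissionDecision` and its values `allow` / `deny` / `ask` are confirmed by these notes. The surrounding field names (`hookSpecificOutput`, `hookEventName`, `permissionDecisionReason`, `tool_name`, `tool_input`) follow the Claude Code hooks documentation as best understood here and should be checked against the current reference. The substring list is a hypothetical placeholder, not a recommended detection method.

```python
import json
import sys

# Hypothetical markers; real hooks should use sturdier matching.
DANGEROUS_MARKERS = ("rm -rf", "git push --force")


def pre_tool_use_decision(event):
    """Return a JSON-serialisable ask decision, or None to stay silent."""
    if event.get("tool_name") != "Bash":
        return None
    command = event.get("tool_input", {}).get("command", "")
    for marker in DANGEROUS_MARKERS:
        if marker in command:
            return {
                "hookSpecificOutput": {
                    "hookEventName": "PreToolUse",
                    "permissionDecision": "ask",
                    "permissionDecisionReason": f"Command contains '{marker}'",
                }
            }
    return None


def main():
    """Hook entry point: read the event from stdin, print a decision if any."""
    decision = pre_tool_use_decision(json.load(sys.stdin))
    if decision is not None:
        json.dump(decision, sys.stdout)
```

A hook script would call `main()` and exit 0 in every case. When no decision is printed, the normal permission flow applies. As noted in the gotcha above, an existing `permissions.allow` match can bypass the `ask`.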
v2.0.3
hooks-management: Add PreToolUse decision control documentation
- Document `permissionDecision` JSON output (`allow`, `deny`, `ask`) as the preferred alternative to bare exit codes for PreToolUse hooks
- Add working examples: "ask user before dangerous command" and "deny with reason"
- Fix translation table: "ask before dangerous commands" now correctly maps to PreToolUse (was PermissionRequest)
- Update script template and exit codes section to reference JSON decision control
v2.0.2 - Fix skills-management trigger and post-search review
Changes
- Fix skills-management trigger: Updated description with explicit trigger phrases ("find a skill for X", "search for a skill", "is there a skill for X") so the skill is invoked automatically for discovery requests
- Fix post-search review behavior: Changed from only reviewing ambiguous results to always suggesting review of found skills after a search
- Bump plugin.json version to 2.0.2