
feat(agent): enforce tool policy (allowlist + tags), prune dead surface #15

Open
joy-void-joy wants to merge 14 commits into lup-standalone from surface-pruning


@joy-void-joy (Owner)

Summary

Resolves the dead/aspirational surface per the reviewed wire-keep-delete table. Stacked on #14. Net −14 lines while adding three real capabilities.

Wired (consumer existed or was one step away)

  • AGENT_MAX_TURNS / AGENT_MAX_BUDGET_USD pass through to the SDK natively (both are real ClaudeAgentOptions fields; defaults stay None, meaning unlimited)
  • Tool availability is now enforced: allowed_tools= was ignored under bypassPermissions, so it is replaced with create_tool_allowlist_hook (the lib utility built for exactly this), fed by ToolPolicy.get_allowed_tools(servers), which introspects the registered SDK servers. BUILTIN_TOOLS is corrected (dropped the nonexistent TodoRead, added Edit/NotebookEdit), and StructuredOutput is allowlisted as a framework tool so the reflection-gated final output can't be bricked. Denial messages list the tools that are available.
  • Tag-based filtering implements lup_tool(tags=...)'s documented promise: tools self-declare requirements (tags=["requires:example-api"]) and ToolPolicy.filter_tools() drops them when the key is missing. This replaces name-list bookkeeping as the primary mechanism (name sets still work). search_example demonstrates it; fetch_example stays untagged as the counter-example.
  • Feedback devtools read history through the library (get_latest_session_json, list_all_session_ids, lup.client.TokenUsage, lup.metrics types); the drifted local copies, including two dead total_cost_usd fields, are gone. Version bumping uses lup.history.parse_semver.

Kept with weight (per review discussion): with_retry, tracked, create_nudge_hook and create_capture_hook get full what/when/why docstrings. tracked is repositioned (tools are tracked automatically inside lup_tool; the decorator is for non-tool functions), and core.py's stale claim is fixed. Behavior tests already existed and stay green.

Deleted: charts.py and the plotext dependency (superseded by usage.py's built-in renderers), the http_timeout_seconds and max_concurrent_requests settings (no consumers), the list_all_sessions alias, setup.py's duplicate subprocess-based clipboard code (a shared sh-based helper now lives in utils), the dead zoneinfo catch, iguana_necktie, and all three # type: ignore comments (replaced with real types).

Structure: the dev/feedback/trace typer apps moved out of __init__.py into app.py modules (git mv, history preserved), since __init__.py files are docstring-only per the repo convention. Old-style typer options are converted to Annotated, and ProjectEntry/ParsedBranch TypedDicts replace stringly typed dicts.
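A small illustration of what the TypedDicts buy: with a plain `dict[str, str]`, a misspelled key type-checks and only fails at runtime, whereas pyright rejects it against a TypedDict. The PR doesn't list the fields of `ParsedBranch`, so the `kind`/`slug` fields and the `kind/slug` branch-name convention below are hypothetical.

```python
from typing import TypedDict


class ParsedBranch(TypedDict):
    """Hypothetical fields; the repo's ParsedBranch may look different."""

    kind: str  # e.g. "feat"; empty when the branch has no "<kind>/" prefix
    slug: str  # e.g. "surface-pruning"


def parse_branch(name: str) -> ParsedBranch:
    """Split "feat/surface-pruning" into its kind and slug."""
    kind, sep, slug = name.partition("/")
    if not sep:  # no slash: the whole name is the slug
        return ParsedBranch(kind="", slug=name)
    return ParsedBranch(kind=kind, slug=slug)


branch = parse_branch("feat/surface-pruning")
print(branch["slug"])
# branch["slgu"]  # rejected by pyright (undefined TypedDict key); a dict[str, str] would pass
```

At runtime a TypedDict is still an ordinary dict, so this costs nothing beyond the class definition.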

Sandbox is optional: AGENT_SANDBOX_ENABLED=false runs the agent without code-execution tools (the sandbox lifecycle is managed with an ExitStack), so Docker is now genuinely an optional dependency, as the README claims.
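One way to get an optional, ExitStack-managed sandbox is sketched below. `docker_sandbox`, `agent_tools` and the `run_python`/`run_shell` tool names are hypothetical; the stand-in sandbox only records start/stop events instead of talking to Docker, so the sketch runs anywhere. The idea is that the disabled path never enters the sandbox at all, while the enabled path still gets guaranteed teardown.

```python
from collections.abc import Iterator
from contextlib import ExitStack, contextmanager

# Hypothetical tool names; the repo's real code-execution tools may differ.
CODE_EXECUTION_TOOLS = frozenset({"run_python", "run_shell"})


@contextmanager
def docker_sandbox(events: list[str]) -> Iterator[None]:
    """Stand-in for a Docker-backed sandbox: records start/stop instead of
    running a container. A real version could import its Docker client here,
    so the dependency is only needed when the sandbox is enabled."""
    events.append("start")
    try:
        yield
    finally:
        events.append("stop")


def agent_tools(sandbox_enabled: bool, base_tools: set[str], events: list[str]) -> set[str]:
    """Run the (simulated) agent, entering the sandbox only when enabled."""
    with ExitStack() as stack:
        tools = set(base_tools)
        if sandbox_enabled:
            stack.enter_context(docker_sandbox(events))
            tools |= CODE_EXECUTION_TOOLS
        events.append("agent ran")  # the agent would run here with `tools`
        return tools  # ExitStack tears down the sandbox (if any) on the way out
```

Compared with duplicating the agent call in both branches of an `if`, the ExitStack keeps a single code path for running the agent, whichever way the flag is set.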

Test plan

  • 14 new tests (allowlist enforcement + computation, tag filtering, sandbox toggle, option passthrough)
  • Full suite: 192 passed · ruff/pyright clean · whole-branch deletion audit done (only intended removals)
