fix(lup): import-safe standalone library + tool-gate primitive + behavior tests #14

Open
joy-void-joy wants to merge 23 commits into devtools-correctness from lup-standalone

@joy-void-joy

Owner

Summary

Makes packages/lup genuinely standalone and adds the library's missing behavior-test layer. Stacked on #13.

Import-safety / one path system

  • lup.paths resolves the project root lazily (a pip-installed lup no longer crashes at import outside a [tool.lup] project); configure() tolerates roots without a pyproject (version falls back to 0.0.0)
  • AGENT_NOTES_PATH / AGENT_LOGS_PATH are wired through lup.paths.configure() at CLI startup — they now relocate all session data, not just trace files; four devtools value-imports of AGENT_VERSION (frozen at import) migrated to the accessor

Silently-wrong-result APIs fixed

  • query(output_type=T, options=prebuilt) no longer drops the structured-output request (previously returned None silently)
  • update_session_metadata picks the latest session by parsed timestamp, not by lexicographic path order (which sorts 0.10.0 before 0.9.0)
  • run_code(timeout_seconds=0) means no deadline, as documented — it previously killed the REPL after 5s and lost all state
  • error ResultMessages are emitted/traced before raising, so failing sessions leave evidence

Permissions: the notes read-only grant no longer includes notes/traces/<version>/logs/ — the docstring's "agent cannot access logs" is now true (RO covers sessions/ and outputs/ across versions, logs excluded, pinned by tests).

create_tool_gate — the four existing deny-until-unlocked guards (reflection gate, stop guard, pending-event guard, meta-before-sleep) were one pattern implemented four ways; they're now thin presets over a single documented primitive in lup.hooks, with the pattern written up in PATTERNS.md.

Library hygiene: TraceLogger stores entries once (lines rendered at save; read_entries kept as the replay affordance); SleepResult.time is set; TIMESTAMP_FMT deduplicated; dated model defaults bumped to claude-sonnet-4-6; lup.lib.* ghost references removed; _-prefix sweep in background/realtime/throttle; file-level # claude: ignore moved out of the mcp docstring.

Packaging: mcp dependency declared (was only transitive), py.typed ships in the wheel, packages/lup/README.md added (pyproject already pointed at it).

Barrel: lup.__init__ now declares the complete public API with a drift test asserting every __all__ name resolves.
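
For reviewers unfamiliar with barrel drift tests, a minimal sketch of the idea follows. The helper name `unresolved_exports` is hypothetical and is not the test added in this PR; it only illustrates "every `__all__` name resolves".

```python
import importlib


def unresolved_exports(module_name: str) -> list[str]:
    """Return the names listed in __all__ that the module cannot resolve."""
    module = importlib.import_module(module_name)
    exported = getattr(module, "__all__", [])
    # hasattr() also triggers a module-level __getattr__, so lazily
    # exported names count as resolved if the lazy import succeeds.
    return [name for name in exported if not hasattr(module, name)]


# Demonstrated on a stdlib module; a drift test for this library would
# presumably assert the same thing for "lup".
print(unresolved_exports("json"))
```

A drift test built this way fails as soon as a name is added to `__all__` without a matching import, or an import is removed without updating `__all__`.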

Test plan

  • 64 new library behavior tests: tool-gate lock/unlock + all four presets, lup_tool error propagation, RW/RO permission hooks (incl. the logs exclusion), resolve_version fallback, latest-by-timestamp, lazy paths/configure, TraceLogger round-trip, truncation recursion, client option-injection, sandbox deadline computation, barrel drift
  • Full suite: 178 passed · ruff/pyright clean

lup.paths ran find_project_root() at import time, so a pip-installed
lup could not even be imported outside a [tool.lup] project. Root,
version, and base dirs now resolve on first accessor call (cached in
an internal PathConfig), configure() stays the override, and
configure(root=X) tolerates a missing pyproject (version falls back
to 0.0.0 unless given). The from-importable mutable globals are gone;
devtools read the version through the agent_version() accessor and
default typer version options to None (resolved per invocation).
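
A rough sketch of the lazy-resolution pattern described above, for illustration only. `PathConfig`, `configure`, `find_project_root`, and `agent_version` are named in this PR, but the bodies, the `get_config` helper, and the module-level `_config` variable here are assumptions; the sketch also skips reading the version from pyproject and goes straight to the 0.0.0 fallback.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Optional


@dataclass(frozen=True)
class PathConfig:
    root: Path
    version: str


# Nothing is resolved at import time; the cache starts empty.
_config: Optional[PathConfig] = None


def find_project_root(start: Path) -> Path:
    """Walk upwards looking for a pyproject.toml; fall back to `start`."""
    for candidate in (start, *start.parents):
        if (candidate / "pyproject.toml").is_file():
            return candidate
    return start


def configure(root: Path, version: Optional[str] = None) -> PathConfig:
    """Explicit override; tolerates a root that has no pyproject."""
    global _config
    _config = PathConfig(root=root, version=version or "0.0.0")
    return _config


def get_config() -> PathConfig:
    """Resolve on the first accessor call, then reuse the cached value."""
    if _config is None:
        return configure(find_project_root(Path.cwd()))
    return _config


def agent_version() -> str:
    # Accessor instead of a from-importable global frozen at import time.
    return get_config().version
```

Because callers go through `agent_version()` rather than importing a constant, a later `configure()` call is visible everywhere.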

The AGENT_NOTES_PATH / AGENT_LOGS_PATH settings existed but were never
applied, so session data always landed under the project root. The
typer callback now configures lup.paths on every invocation, anchoring
relative values at the project root and honoring absolute overrides.
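
The anchoring rule can be sketched as follows. `resolve_data_dir` is a hypothetical helper name, not a function from this PR.

```python
from pathlib import Path


def resolve_data_dir(project_root: Path, value: str) -> Path:
    """Anchor a relative setting at the project root; keep absolute ones."""
    path = Path(value)
    # Joining with "/" would also discard the left side for an absolute
    # right side, but the explicit branch makes the intent obvious.
    return path if path.is_absolute() else project_root / path


print(resolve_data_dir(Path("/proj"), "notes"))
print(resolve_data_dir(Path("/proj"), "/var/data/notes"))
```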

query(output_type=T, options=prebuilt) computed an output_format that
build_client then ignored, so structured output came back None with no
error. prepare_output_format() now injects the computed format into the
provided options (ValueError when they already set one), and
build_client raises on any keyword argument combined with pre-built
options instead of silently discarding it.
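
A simplified sketch of the injection behavior, under stated assumptions: `Options` is a stand-in dataclass rather than the SDK's options type, `build_client_options` is a hypothetical name, and the exception type raised for keyword arguments combined with pre-built options is a guess (the commit only says it raises).

```python
from dataclasses import dataclass, replace
from typing import Any, Optional


@dataclass(frozen=True)
class Options:
    """Hypothetical stand-in for the SDK options object."""
    model: str = "some-model"
    output_format: Optional[dict[str, Any]] = None


def prepare_output_format(options: Options, output_format: dict[str, Any]) -> Options:
    """Inject the computed format into pre-built options."""
    if options.output_format is not None:
        raise ValueError("options already set an output_format")
    return replace(options, output_format=output_format)


def build_client_options(options: Optional[Options] = None, **kwargs: Any) -> Options:
    """Refuse to silently discard keyword arguments."""
    if options is not None and kwargs:
        # Exception type is an assumption for this sketch.
        raise ValueError(f"cannot combine pre-built options with {sorted(kwargs)}")
    return options if options is not None else Options(**kwargs)
```

The point of both checks is the same: a conflicting request fails loudly instead of returning None later.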

An error ResultMessage caused the collector to raise before the yield,
so the failing message was never seen by iteration consumers, never
logged, and never reached the trace. The error is now logged, written
to the trace logger, and yielded; the RuntimeError fires when the
consumer resumes the generator.
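
The yield-then-raise ordering can be illustrated with a plain generator. `Message` and `collect` are hypothetical stand-ins, not the collector in this PR.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator


@dataclass
class Message:
    text: str
    is_error: bool = False


def collect(messages: Iterable[Message], trace: list[str]) -> Iterator[Message]:
    for message in messages:
        trace.append(message.text)  # every message reaches the trace, errors included
        yield message               # the consumer sees the error result first
        if message.is_error:
            # Executes only when the consumer resumes the generator.
            raise RuntimeError(message.text)


trace: list[str] = []
seen: list[str] = []
error = None
try:
    for msg in collect([Message("ok"), Message("boom", is_error=True)], trace):
        seen.append(msg.text)
except RuntimeError as exc:
    error = str(exc)

print(seen, trace, error)
```

Both `seen` and `trace` contain the failing message before the exception surfaces, which is the evidence the old ordering lost.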

update_session_metadata chose its target via a lexicographic sort of
full paths, which orders version directories wrong across versions
(0.10.0 < 0.9.0) and could update a stale file. Candidates are now
ranked by the filename timestamp via lup.paths.parse_timestamp.
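
To make the ordering bug concrete, here is a small sketch. The timestamp format, the file names, and the assumption that the file stem is exactly the timestamp are all illustrative; only the name `parse_timestamp` comes from this PR, and its real signature may differ.

```python
from datetime import datetime
from pathlib import Path

TIMESTAMP_FMT = "%Y%m%d_%H%M%S"  # assumed format for this sketch


def parse_timestamp(stem: str) -> datetime:
    return datetime.strptime(stem, TIMESTAMP_FMT)


def latest_session(paths: list[Path]) -> Path:
    """Rank candidates by the timestamp in the filename, not by path."""
    return max(paths, key=lambda p: parse_timestamp(p.stem))


candidates = [
    Path("notes/traces/0.10.0/sessions/20240301_120000.json"),  # newest
    Path("notes/traces/0.9.0/sessions/20240101_090000.json"),   # stale
]

# Lexicographic order compares "0.10.0" and "0.9.0" character by character,
# so the stale 0.9.0 file sorts last and would be picked as "latest".
print(sorted(candidates, key=str)[-1])
print(latest_session(candidates))
```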

run_code(timeout_seconds=0) is documented as no timeout, but the host
still enforced a 0+5 second deadline, killing the REPL connection and
losing all session state. compute_deadline() now returns None for
non-positive timeouts, recv_response blocks indefinitely in that case,
and the in-sandbox SIGALRM is already skipped for non-positive values.
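
A sketch of the deadline rule, assuming a monotonic clock and the 5-second host-side grace mentioned above. The `now` parameter and the `socket_timeout` helper are hypothetical additions for testability, not part of the real signature.

```python
import time
from typing import Optional

GRACE_SECONDS = 5.0  # host-side slack on top of the sandbox timeout


def compute_deadline(timeout_seconds: float, now: Optional[float] = None) -> Optional[float]:
    """Return an absolute deadline, or None when no timeout is requested."""
    if timeout_seconds <= 0:
        return None
    start = time.monotonic() if now is None else now
    return start + timeout_seconds + GRACE_SECONDS


def socket_timeout(deadline: Optional[float], now: float) -> Optional[float]:
    """Value suitable for socket.settimeout(); None means block indefinitely."""
    return None if deadline is None else max(0.0, deadline - now)
```

Returning None rather than a sentinel number lines up with `socket.settimeout(None)`, which puts a socket in blocking mode.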

setup_notes granted ro=[traces/<version>/], which contains logs/ —
contradicting the contract that trace logs are invisible to the agent.
The RO grant now lists the sessions/ and outputs/ directories of every
existing version (current one included), never logs/.
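
The shape of the new grant can be sketched like this. `read_only_grants` and `is_readable` are hypothetical helpers that take the version list as an argument; the real code presumably discovers versions on disk.

```python
from pathlib import Path

READABLE_SUBDIRS = ("sessions", "outputs")  # logs/ is deliberately absent


def read_only_grants(traces_dir: Path, versions: list[str]) -> list[Path]:
    """List per-version readable directories instead of granting traces/<version>/."""
    return [traces_dir / version / name for version in versions for name in READABLE_SUBDIRS]


def is_readable(path: Path, grants: list[Path]) -> bool:
    """True when `path` is a granted directory or lives under one."""
    return any(grant == path or grant in path.parents for grant in grants)


grants = read_only_grants(Path("notes/traces"), ["0.9.0", "0.10.0"])
print(is_readable(Path("notes/traces/0.10.0/sessions/a.json"), grants))
print(is_readable(Path("notes/traces/0.10.0/logs/run.log"), grants))
```

Granting the parent directory and trying to subtract logs/ afterwards is harder to get right than never granting it in the first place.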

Four hook factories shared one pattern: deny tool B (or Stop) with an
agent-readable message until condition A holds. create_tool_gate in
lup.hooks is the general primitive (gated tool(s), static or dynamic
message, input-aware unlock predicate, optional on_unlock_tool tracked
via PostToolUse, deny/block styles, PreToolUse or Stop). The existing
factories — create_reflection_gate, create_stop_guard,
create_pending_event_guard, create_meta_before_sleep_guard — keep
their public signatures and behavior as thin presets over it.
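
A stripped-down sketch of the deny-until-unlocked pattern, to show why one primitive can back all four guards. This is not the `create_tool_gate` signature: the `Gate` class, its method names, and the return convention (a denial string or None) are assumptions, and the deny/block styles and Stop handling are left out.

```python
from typing import Any, Callable, Iterable, Optional, Union

ToolInput = dict[str, Any]
GateMessage = Union[str, Callable[[ToolInput], str]]


class Gate:
    def __init__(
        self,
        gated_tools: Iterable[str],
        message: GateMessage,
        unlock_when: Optional[Callable[[ToolInput], bool]] = None,
        unlock_tool: Optional[str] = None,
    ) -> None:
        self.gated_tools = frozenset(gated_tools)
        self.message = message
        self.unlock_when = unlock_when
        self.unlock_tool = unlock_tool
        self.unlock_tool_seen = False

    def pre_tool_use(self, tool_name: str, tool_input: ToolInput) -> Optional[str]:
        """Return an agent-readable denial, or None to let the call through."""
        if tool_name not in self.gated_tools or self.unlock_tool_seen:
            return None
        if self.unlock_when is not None and self.unlock_when(tool_input):
            return None
        return self.message(tool_input) if callable(self.message) else self.message

    def post_tool_use(self, tool_name: str) -> None:
        """Track the unlocking tool once it has actually run."""
        if self.unlock_tool is not None and tool_name == self.unlock_tool:
            self.unlock_tool_seen = True


gate = Gate({"sleep"}, "Call reflect before sleeping.", unlock_tool="reflect")
print(gate.pre_tool_use("sleep", {}))
gate.post_tool_use("reflect")
print(gate.pre_tool_use("sleep", {}))
```

Under this shape, each preset is just a particular choice of gated tools, message, and unlock condition.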

truncate_str_fields dropped max_len_list on recursion, so nested lists
inside dicts or lists reverted to the default limit.
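
A sketch of the fixed recursion, assuming that max_len_list caps the number of list items kept (the commit does not spell out its meaning). The function name `truncate`, the defaults, and the "..." suffix are illustrative.

```python
from typing import Any


def truncate(value: Any, max_len: int = 200, max_len_list: int = 20) -> Any:
    """Truncate long strings and lists, threading both limits through every level."""
    if isinstance(value, str):
        return value if len(value) <= max_len else value[:max_len] + "..."
    if isinstance(value, dict):
        # Passing max_len_list explicitly is the fix: omitting it here
        # would reset nested lists to the default limit.
        return {k: truncate(v, max_len, max_len_list) for k, v in value.items()}
    if isinstance(value, list):
        return [truncate(v, max_len, max_len_list) for v in value[:max_len_list]]
    return value


data = {"a": [list(range(10))], "b": "x" * 10}
print(truncate(data, max_len=4, max_len_list=3))
```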

TraceLogger's lines list duplicated entries[].content, and only lines
fed save(), so entry edits could diverge from the saved file. save()
now renders the line stream from entries; read_entries() stays for
replaying recent trace context in persistent sessions.
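
The single-source-of-truth idea, sketched with hypothetical `Entry` and `TraceLog` classes. The rendering format and the `read_entries` signature are assumptions, not TraceLogger's actual API.

```python
from dataclasses import dataclass, field


@dataclass
class Entry:
    kind: str
    content: str


@dataclass
class TraceLog:
    entries: list[Entry] = field(default_factory=list)

    def log(self, kind: str, content: str) -> None:
        self.entries.append(Entry(kind, content))

    def render(self) -> str:
        """Build the line stream from entries at save time; nothing is stored twice."""
        return "\n".join(f"[{e.kind}] {e.content}" for e in self.entries)

    def read_entries(self, last: int) -> list[Entry]:
        """Replay the most recent entries."""
        return self.entries[-last:] if last > 0 else []


log = TraceLog()
log.log("user", "hello")
log.log("tool", "raw output")
log.entries[1].content = "redacted"  # an edit after logging
print(log.render())
```

Because rendering happens at save time, the edit to the second entry shows up in the output, which a separately stored lines list would have missed.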

The time field was declared on SleepResult but never populated; it is now set.

history.save_session and notes.setup_notes hardcoded the format string
that parse_timestamp expects; both now share the TIMESTAMP_FMT constant.

The file-level # claude: ignore marker in lup.mcp sat inside the module
docstring, where it rendered in help() and did not register as a
comment. It is now a bare comment line at the top of the file, where
the edits hook recognizes file-level markers.

The catch-all swallowed every failure during socket teardown; only
socket-layer close errors (OSError, ValueError on closed I/O) are
expected there.
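
A narrowed handler along these lines might look like the sketch below. `close_quietly` and the `Closable` protocol are hypothetical names; the real teardown code is not shown in this PR description.

```python
from typing import Protocol


class Closable(Protocol):
    def close(self) -> None: ...


def close_quietly(resource: Closable) -> None:
    """Ignore only the errors expected while closing a socket-like object."""
    try:
        resource.close()
    except (OSError, ValueError):
        # OSError: socket-layer failure; ValueError: I/O on a closed file object.
        pass
    # Any other exception propagates, so real bugs stay visible.
```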

Renames in background (task, wake_event, running, message_generator,
run, handle_response), realtime (scheduler attributes; the ideas
property becomes a plain attribute), and throttle (max_concurrent,
min_interval, loop_states) per the no-private-prefix convention.

dict[str, object] payloads that are inherently open (JSON Schema,
domain session JSON, SDK stream turns) carry claude: ignore markers.

lup.mcp imports the mcp package but the dependency was undeclared
(satisfied only transitively). The py.typed marker ships via
setuptools package-data so consumers get type information, and the
README documents the library since pyproject declares it.

Adds the genuinely public surface that was missing from __all__:
scheduler and guards, background agents, session history, notes setup,
throttle, retry, tool gate and hook factories, metrics tracking, and
timestamp helpers. Sandbox is exported via lazy module __getattr__ so
import lup works without the docker extra. In-repo imports stay
direct-from-module.

Covers create_tool_gate and all four presets (deny before unlock,
pass/allow after), lup_tool response paths (success, ToolError,
validation failure, direct call), permission hook RW/RO enforcement,
the notes RO grant excluding logs (regression), resolve_version
progressive fallback, latest-session selection by parsed timestamp
across versions (regression), lazy path configuration without a
pyproject, TraceLogger save/slice and nested truncation limits,
with_retry, tracked metrics, nudge/capture hooks, barrel drift,
query option injection with pre-built options, error-result tracing,
and the sandbox deadline computation.