[codex] fix code health findings by logpie · Pull Request #3 · logpie/otto

logpie · 2026-04-24T08:16:35Z

Summary

This PR fixes the actionable findings from the retried code-health audit:

hardens queue lifecycle behavior, including immediate shutdown escalation and queue spec path lookup
validates queue manifest task ids at the path-helper boundary
removes stale merge-resume UI/state surface instead of keeping a deferred hidden option
strengthens queue drain, env scoping, nightly fixture, and E2E assertions
removes dead helpers/imports and stale docs/ignore patterns

Audit artifacts remain local under audits/2026-04-24-0043/ and were not committed to keep the PR focused.

Validation

uv run pytest tests/test_paths.py tests/test_manifest.py tests/test_merge_orchestrator.py tests/test_cli_merge.py tests/test_queue_runner.py tests/test_env_bypass.py tests/test_otto_as_user_nightly.py -q -> 158 passed
uv run python scripts/e2e_runner.py B1 B5 B13 -> 3/3 passed
uv run pytest -q --maxfail=10 -> 911 passed
uv run ruff check otto scripts tests --select F401,F821,F841 -> passed
uv run python -m compileall -q otto scripts tests -> passed
uv run python -m pip check -> passed
git diff --check -> passed

coderabbitai · 2026-04-24T08:19:25Z

Important

Review skipped

Draft detected.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 02100dda-4bf2-4b6f-9845-0510a9b4cf8d

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Use the checkbox below for a quick retry:

🔍 Trigger review

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch codex/code-health-retry-fixes

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

CLUSTER C — bundle/build integrity (closes 1 CRITICAL + 4 IMPORTANT + 1 NOTE): CRITICAL closed: - `app.py:35`: `otto web` served checked-in bundle without verifying it matches source — developer skips `npm run web:build` → silent stale UI (codex-packaging-bundle #1). IMPORTANT closed: - Python build never built frontend; `pip install -e .` shipped whatever static was in tree (#2). Vite plugin emits `build-stamp.json` on every build with source-hash + timestamp + git commit; FastAPI startup verifies freshness via `verify_bundle_freshness()`. - `web:build` ran only Vite; `web:typecheck` advisory; no CI gates (#3). New `web:verify` script chains typecheck → build → committed-check. - Default cache headers underused hashed assets (#4). New `_CacheHeaderStaticFiles` subclass: `no-store` for shell + index.html, `public, max-age=31536000, immutable` for `/static/assets/*`. - Server didn't validate `index.html` referenced JS/CSS exist (#5). Startup parses index.html and asserts every referenced static path resolves; missing → fail-fast with `npm run web:build` guidance. NOTE closed: - `[tool.setuptools.package-data]` was flat (`static/*`, `static/assets/*`); future nested assets (fonts/, images/, locale/) would silently miss wheels. Now `static/**/*` recursive glob. Files: otto/web/bundle.py (new, 263 lines), otto/web/client/vite.config.ts (build-stamp emitter), scripts/build_stamp.py (CLI for manual stamp), scripts/check_bundle_committed.py (git-diff guard for CI), package.json (web:verify script), pyproject.toml (recursive package_data), otto/web/app.py (verify_bundle_freshness call + _CacheHeaderStaticFiles). Tests: tests/test_web_bundle_freshness.py (5 tests) + tests/test_web_cache_headers.py (2 tests, 8 actual checks). 15 new server-layer tests, all green. CLUSTER D — history pagination (closes 1 CRITICAL + 2 IMPORTANT): CRITICAL closed: - `total_rows=247` displayed but only first page rendered; power user with 200+ runs stuck on page 1 (heavy-user, codex-state-management #6, codex-long-string-overflow #3). IMPORTANT closed: - `/api/state` accepted `history_page` + `page_size`; client never sent `history_page` and rendered no controls. - Page-size selector now lives in the UI (10/25/50/100, default 25); server clamps to [1, 200] to refuse stale URLs requesting unbounded slices. Implementation: - `MissionControlFilters.history_page_size: int | None` for per-request override (server clamps to safe range). - App.tsx History pane: pagination footer (Page N of M · X runs · ←/→ · jump-to + page-size selector). URL persists `hp` + `ps` query params. - Filter changes reset page to 1. - Stale deep-link `?hp=99` (out-of-range) → "Page 99 doesn't exist; jump to page 1" with reset button. Files: otto/mission_control/{model.py,serializers.py} (history_page_size plumbing), otto/web/app.py (param wiring), otto/web/client/src/{App.tsx, api.ts,types.ts,styles.css} (pagination UI), tests/browser/test_history_pagination.py (10 paired Playwright tests, all green). Verification: - Browser suite: 25 passed (8 cluster A + 7 cluster B + 10 cluster D + new smoke set from cluster C cache headers via tests/test_web_cache_headers.py) - Default suite: 1091 passed (was 1076; +15 cluster C server tests) - npm run web:typecheck: clean - npm run web:build: 277.82 kB JS / 33.34 kB CSS (rebuilt with stamp) Note: cluster C agent hit an API overload during its final summary step but all files landed cleanly on disk; verified by independently running the new test files before committing. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

============================================================ 5 IMPORTANT closures ============================================================ 1. W2-IMPORTANT-3 — Watcher cancel-vs-dispatch race Cancel POST landing between begin_command_drain and _dispatch_new was deferred to next tick — by then the queued task had already spawned. Added a `late_drain` callback into _dispatch_new called once at entry AND before each task's spawn evaluation; deduped against already-applied command_ids. Lock semantics in begin_command_drain are idempotent so the second call is safe. 2. Codex info-density #6 — Mission Focus active headline Working state showed only `<N> tasks in flight`. Now sorts live items by elapsed_s ascending, picks the hottest, and assembles `task-id · branch · elapsed · cost · last-event`. Segments dropped when missing or when cost is "…" placeholder. 3. Codex info-density #3 — Status badge tones .task-status was one gray pill regardless of status/stage. Added statusTone() helper mapping status+stage → success/running/warning/danger/neutral. CSS adds tinted bg + text colors per tone using existing palette tokens. 4. Codex error-empty-states #11 — Filter-blind empty state Empty board read "No work queued" even when filters were hiding tasks. Added BoardEmptyReason + computeBoardEmptyReason + filtersAreActive helpers. Banner now shows reason-specific copy: "No matching tasks." + Clear-filters link (filtered) vs "No work queued..." + Queue-job CTA (true-empty). Subtitle and per-column empty also reflect filter awareness. 5. Codex first-time-user #19 — Queued cards clickable TaskCard had disabled={!task.runId}, so queued tasks not yet picked up by watcher had a dead button. Users couldn't open Details. Added onSelectQueued prop; selectQueuedTask state. RunDetailPanel renders "Waiting for watcher" placeholder with title/status/branch/intent/reason + Start-watcher CTA when stopped. useEffect auto-promotes selection to real runId once the watcher dispatches (live row's queue_task_id matches). ============================================================ Tests added (5 new files, 14 total tests) ============================================================ - tests/test_queue_cancel_race.py (3 tests) - tests/browser/test_mission_focus_active_headline.py (3 tests) - tests/browser/test_task_status_badge_tones.py (2 tests) - tests/browser/test_task_board_filter_blind_empty.py (3 tests) - tests/browser/test_queued_card_clickable.py (3 tests) ============================================================ Test counts ============================================================ - Default suite: 1136 passed (was 1133; +3) - Browser suite: 102 passed (was 88; +14) - npm web:typecheck: clean - npm web:build: 305.39 kB JS / 40.27 kB CSS ============================================================ Tally ============================================================ CRITICAL: 22 of 22 closed (100%) IMPORTANT: ~53 of 132 closed (~40%) Followups: - statusTone could extend to history table + live-runs panel pills - TaskBoard "search input" data-testid for stable test selectors - Queued placeholder could surface cancel/remove actions Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Closes findings #6, #7, #8, #9, #10, #12, #14 from the codex-long-string-overflow hunter. Implementation: - Shared utilities .mc-wrap-anywhere + .mc-line-clamp-3 so future free-text fields opt in by class - Toast / status-banner / job-dialog status: max-height + overflow-y:auto so long strings never push siblings (dismiss/submit) offscreen - Confirm dialog body + bulk-row branch: wrap-anywhere - Review packet (#10): replaced silent ellipsis with 3-line clamp + wrap-anywhere; existing title= attribute provides full text on hover - Changed-file <li>: wrap-anywhere; .review-packet code: display:block + pre-wrap so diff_command snippet wraps inside the drawer (inline code runs explicitly re-set to display:inline so override doesn't break) - Diff toolbar (#14): nowrap+ellipsis with title tooltip on right span; min-width:0 on flex children prevents toolbar drift; matches audit's suggested affordance Tests (tests/browser/test_long_string_overflow.py — 7 new): - test_toast_wraps_long_url - test_last_error_banner_clamps_long_message - test_job_dialog_status_does_not_push_submit_offscreen - test_confirm_dialog_wraps_long_branch_names - test_review_packet_long_text_wraps_consistently - test_changed_file_list_wraps_long_paths - test_diff_toolbar_branch_truncates_long_names Each test asserts (a) no document-level horizontal scroll, (b) the overflowing element fits within its parent's width, (c) the long string remains in DOM (CSS overflow, not JS truncation). Test counts: - Default: 1136 passing (no regression) - Browser: 109 passing (was 102; +7) - npm web:typecheck: clean - npm web:build: clean CRITICAL: 22 of 22 closed (100%) IMPORTANT: ~60 of 132 closed (~45%) Followups (deferred): - Long-string findings #2/#3/#4/#5/#11/#13 need React state/render changes (perf + virtualization) — not CSS-only. Bigger cluster. - Future PRs that add free-text fields should apply .mc-wrap-anywhere explicitly rather than deriving fresh rules. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

============================================================ W5-CRITICAL-1 — Merge preflight ignored untracked files in project root ============================================================ Real safety bug found by live W5 rerun. User had untracked DIRTY_FILE.txt; clicked Merge → HTTP 200 + ping.py landed in main. Fix: merge action handler now consults _project_is_user_dirty (which already correctly filters Otto-owned untracked files) BEFORE the merge runs. On dirty: returns HTTP 409 with structured {reason, files} payload. UI surfaces block in confirm dialog. Tests: - tests/test_merge_preflight_dirty_tree.py (server) - tests/browser/test_merge_blocked_dirty_tree.py (browser) - blocks on user-untracked file in root - allows when only Otto-owned untracked - blocks on tracked-file modification - blocks on staged changes - confirm dialog shows block reason + file list ============================================================ 5 IMPORTANT — Evidence trustworthiness cluster ============================================================ #3 — Proof drawer cached + no run_id validation service.py: _proof_provenance helper extracts generated_at, run_id, session_id, branch, head_sha, file_mtime, sha256, run_id_matches flag. Client cache invalidates on (run_id|version|sha256|mtime) key change. New <ProofProvenance> component renders metadata + warning banner on run_id mismatch. #4 — Per-round evidence not visible service.py _certification_summary gains rounds field via new _certification_round_history helper. Client <CertificationRoundTabs> component: tab strip with verdict/duration; per-round detail with diagnosis, failing story IDs, fix commits. #5 — Live vs Final indicator (already covered cluster B; verified) #6 — Binary artifacts decoded as garbage UTF-8 service.py artifact_content does MIME detection via magic bytes (PNG/GIF/JPG/PDF/WEBM/WEBP/MP4) + null-byte sniff; returns {previewable, mime_type, size_bytes}. New /api/runs/{id}/artifacts/ {i}/raw route serves binary via FileResponse. Client routes image MIMEs to <img src=raw>, video to <video>, others to "no text preview · download" card. #7 — Artifact list missing size/mtime/hash serializers.serialize_artifact extended with size_bytes, mtime (ISO UTC), sha256 (full hex; client truncates to 12 chars). SHA computation capped at 16 MB so list rendering stays cheap. #8 — Visual evidence has no manifest certifier writes sibling <artifact>.manifest.json with schema_version, captured_at, run_id, session_id, round, story_id, story_status, sha256, size_bytes, viewport, browser, certifier_mode, kind. Tests (15 new, all green): - tests/test_proof_provenance.py (9 server tests) - tests/browser/test_certification_round_tabs.py (4 browser tests) - tests/browser/test_artifact_binary_preview.py (2 browser tests) ============================================================ Test counts ============================================================ - Default: 1183 (was 1168; +15) - Browser: 165 (was 153; +12) - npm web:typecheck: clean - npm web:build: 327KB JS / 49KB CSS ============================================================ Tally ============================================================ CRITICAL: 25 of 25 closed (100%) IMPORTANT: ~94 of 132 closed (~71%) Followups: - mimetypes for .json may return application/json on some platforms; if certifier emits custom-extension JSON, extend allowlist - Manifest viewport/browser fields are placeholders — agent SDK doesn't expose hook at screenshot capture time; fill once cert owns capture path - TUI test_mission_control_status_cells_use_theme_colors: pre-existing STALE-vs-RUNNING heartbeat test failure unrelated to this work Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

fix code health findings

7a3dbf7

logpie merged commit ef3f030 into main Apr 25, 2026
1 check passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[codex] fix code health findings#3

[codex] fix code health findings#3
logpie merged 1 commit intomainfrom
codex/code-health-retry-fixes

logpie commented Apr 24, 2026

Uh oh!

coderabbitai Bot commented Apr 24, 2026

Review skipped

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

logpie commented Apr 24, 2026

Summary

Validation

Uh oh!

coderabbitai Bot commented Apr 24, 2026

Review skipped

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant