Skip to content

[codex] fix code health findings#3

Merged
logpie merged 1 commit intomainfrom
codex/code-health-retry-fixes
Apr 25, 2026
Merged

[codex] fix code health findings#3
logpie merged 1 commit intomainfrom
codex/code-health-retry-fixes

Conversation

@logpie
Copy link
Copy Markdown
Owner

@logpie logpie commented Apr 24, 2026

Summary

This PR fixes the actionable findings from the retried code-health audit:

  • hardens queue lifecycle behavior, including immediate shutdown escalation and queue spec path lookup
  • validates queue manifest task ids at the path-helper boundary
  • removes stale merge-resume UI/state surface instead of keeping a deferred hidden option
  • strengthens queue drain, env scoping, nightly fixture, and E2E assertions
  • removes dead helpers/imports and stale docs/ignore patterns

Audit artifacts remain local under audits/2026-04-24-0043/ and were not committed to keep the PR focused.

Validation

  • uv run pytest tests/test_paths.py tests/test_manifest.py tests/test_merge_orchestrator.py tests/test_cli_merge.py tests/test_queue_runner.py tests/test_env_bypass.py tests/test_otto_as_user_nightly.py -q -> 158 passed
  • uv run python scripts/e2e_runner.py B1 B5 B13 -> 3/3 passed
  • uv run pytest -q --maxfail=10 -> 911 passed
  • uv run ruff check otto scripts tests --select F401,F821,F841 -> passed
  • uv run python -m compileall -q otto scripts tests -> passed
  • uv run python -m pip check -> passed
  • git diff --check -> passed

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Apr 24, 2026

Important

Review skipped

Draft detected.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 02100dda-4bf2-4b6f-9845-0510a9b4cf8d

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Use the checkbox below for a quick retry:

  • 🔍 Trigger review
✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch codex/code-health-retry-fixes

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@logpie logpie merged commit ef3f030 into main Apr 25, 2026
1 check passed
logpie added a commit that referenced this pull request Apr 26, 2026
CLUSTER C — bundle/build integrity (closes 1 CRITICAL + 4 IMPORTANT + 1 NOTE):

CRITICAL closed:
- `app.py:35`: `otto web` served checked-in bundle without verifying it
  matches source — developer skips `npm run web:build` → silent stale UI
  (codex-packaging-bundle #1).

IMPORTANT closed:
- Python build never built frontend; `pip install -e .` shipped whatever
  static was in tree (#2). Vite plugin emits `build-stamp.json` on every
  build with source-hash + timestamp + git commit; FastAPI startup verifies
  freshness via `verify_bundle_freshness()`.
- `web:build` ran only Vite; `web:typecheck` advisory; no CI gates (#3).
  New `web:verify` script chains typecheck → build → committed-check.
- Default cache headers underused hashed assets (#4). New
  `_CacheHeaderStaticFiles` subclass: `no-store` for shell + index.html,
  `public, max-age=31536000, immutable` for `/static/assets/*`.
- Server didn't validate `index.html` referenced JS/CSS exist (#5).
  Startup parses index.html and asserts every referenced static path
  resolves; missing → fail-fast with `npm run web:build` guidance.

NOTE closed:
- `[tool.setuptools.package-data]` was flat (`static/*`, `static/assets/*`);
  future nested assets (fonts/, images/, locale/) would silently miss
  wheels. Now `static/**/*` recursive glob.

Files: otto/web/bundle.py (new, 263 lines), otto/web/client/vite.config.ts
(build-stamp emitter), scripts/build_stamp.py (CLI for manual stamp),
scripts/check_bundle_committed.py (git-diff guard for CI), package.json
(web:verify script), pyproject.toml (recursive package_data), otto/web/app.py
(verify_bundle_freshness call + _CacheHeaderStaticFiles).

Tests: tests/test_web_bundle_freshness.py (5 tests) + tests/test_web_cache_headers.py
(2 tests, 8 actual checks). 15 new server-layer tests, all green.

CLUSTER D — history pagination (closes 1 CRITICAL + 2 IMPORTANT):

CRITICAL closed:
- `total_rows=247` displayed but only first page rendered; power user with
  200+ runs stuck on page 1 (heavy-user, codex-state-management #6,
  codex-long-string-overflow #3).

IMPORTANT closed:
- `/api/state` accepted `history_page` + `page_size`; client never sent
  `history_page` and rendered no controls.
- Page-size selector now lives in the UI (10/25/50/100, default 25);
  server clamps to [1, 200] to refuse stale URLs requesting unbounded slices.

Implementation:
- `MissionControlFilters.history_page_size: int | None` for per-request
  override (server clamps to safe range).
- App.tsx History pane: pagination footer (Page N of M · X runs · ←/→ ·
  jump-to + page-size selector). URL persists `hp` + `ps` query params.
- Filter changes reset page to 1.
- Stale deep-link `?hp=99` (out-of-range) → "Page 99 doesn't exist; jump
  to page 1" with reset button.

Files: otto/mission_control/{model.py,serializers.py} (history_page_size
plumbing), otto/web/app.py (param wiring), otto/web/client/src/{App.tsx,
api.ts,types.ts,styles.css} (pagination UI), tests/browser/test_history_pagination.py
(10 paired Playwright tests, all green).

Verification:
- Browser suite: 25 passed (8 cluster A + 7 cluster B + 10 cluster D + new
  smoke set from cluster C cache headers via tests/test_web_cache_headers.py)
- Default suite: 1091 passed (was 1076; +15 cluster C server tests)
- npm run web:typecheck: clean
- npm run web:build: 277.82 kB JS / 33.34 kB CSS (rebuilt with stamp)

Note: cluster C agent hit an API overload during its final summary step but
all files landed cleanly on disk; verified by independently running the new
test files before committing.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
logpie added a commit that referenced this pull request Apr 26, 2026
============================================================
5 IMPORTANT closures
============================================================

1. W2-IMPORTANT-3 — Watcher cancel-vs-dispatch race
   Cancel POST landing between begin_command_drain and _dispatch_new
   was deferred to next tick — by then the queued task had already
   spawned. Added a `late_drain` callback into _dispatch_new called
   once at entry AND before each task's spawn evaluation; deduped
   against already-applied command_ids. Lock semantics in
   begin_command_drain are idempotent so the second call is safe.

2. Codex info-density #6 — Mission Focus active headline
   Working state showed only `<N> tasks in flight`. Now sorts
   live items by elapsed_s ascending, picks the hottest, and
   assembles `task-id · branch · elapsed · cost · last-event`.
   Segments dropped when missing or when cost is "…" placeholder.

3. Codex info-density #3 — Status badge tones
   .task-status was one gray pill regardless of status/stage.
   Added statusTone() helper mapping status+stage →
   success/running/warning/danger/neutral. CSS adds tinted bg +
   text colors per tone using existing palette tokens.

4. Codex error-empty-states #11 — Filter-blind empty state
   Empty board read "No work queued" even when filters were
   hiding tasks. Added BoardEmptyReason + computeBoardEmptyReason
   + filtersAreActive helpers. Banner now shows reason-specific
   copy: "No matching tasks." + Clear-filters link (filtered) vs
   "No work queued..." + Queue-job CTA (true-empty). Subtitle and
   per-column empty also reflect filter awareness.

5. Codex first-time-user #19 — Queued cards clickable
   TaskCard had disabled={!task.runId}, so queued tasks not yet
   picked up by watcher had a dead button. Users couldn't open
   Details. Added onSelectQueued prop; selectQueuedTask state.
   RunDetailPanel renders "Waiting for watcher" placeholder with
   title/status/branch/intent/reason + Start-watcher CTA when
   stopped. useEffect auto-promotes selection to real runId once
   the watcher dispatches (live row's queue_task_id matches).

============================================================
Tests added (5 new files, 14 total tests)
============================================================

- tests/test_queue_cancel_race.py (3 tests)
- tests/browser/test_mission_focus_active_headline.py (3 tests)
- tests/browser/test_task_status_badge_tones.py (2 tests)
- tests/browser/test_task_board_filter_blind_empty.py (3 tests)
- tests/browser/test_queued_card_clickable.py (3 tests)

============================================================
Test counts
============================================================

- Default suite: 1136 passed (was 1133; +3)
- Browser suite: 102 passed (was 88; +14)
- npm web:typecheck: clean
- npm web:build: 305.39 kB JS / 40.27 kB CSS

============================================================
Tally
============================================================

CRITICAL: 22 of 22 closed (100%)
IMPORTANT: ~53 of 132 closed (~40%)

Followups:
- statusTone could extend to history table + live-runs panel pills
- TaskBoard "search input" data-testid for stable test selectors
- Queued placeholder could surface cancel/remove actions

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
logpie added a commit that referenced this pull request Apr 26, 2026
Closes findings #6, #7, #8, #9, #10, #12, #14 from the
codex-long-string-overflow hunter.

Implementation:
- Shared utilities .mc-wrap-anywhere + .mc-line-clamp-3 so future
  free-text fields opt in by class
- Toast / status-banner / job-dialog status: max-height + overflow-y:auto
  so long strings never push siblings (dismiss/submit) offscreen
- Confirm dialog body + bulk-row branch: wrap-anywhere
- Review packet (#10): replaced silent ellipsis with 3-line clamp +
  wrap-anywhere; existing title= attribute provides full text on hover
- Changed-file <li>: wrap-anywhere; .review-packet code: display:block +
  pre-wrap so diff_command snippet wraps inside the drawer (inline code
  runs explicitly re-set to display:inline so override doesn't break)
- Diff toolbar (#14): nowrap+ellipsis with title tooltip on right span;
  min-width:0 on flex children prevents toolbar drift; matches audit's
  suggested affordance

Tests (tests/browser/test_long_string_overflow.py — 7 new):
- test_toast_wraps_long_url
- test_last_error_banner_clamps_long_message
- test_job_dialog_status_does_not_push_submit_offscreen
- test_confirm_dialog_wraps_long_branch_names
- test_review_packet_long_text_wraps_consistently
- test_changed_file_list_wraps_long_paths
- test_diff_toolbar_branch_truncates_long_names

Each test asserts (a) no document-level horizontal scroll, (b) the
overflowing element fits within its parent's width, (c) the long string
remains in DOM (CSS overflow, not JS truncation).

Test counts:
- Default: 1136 passing (no regression)
- Browser: 109 passing (was 102; +7)
- npm web:typecheck: clean
- npm web:build: clean

CRITICAL: 22 of 22 closed (100%)
IMPORTANT: ~60 of 132 closed (~45%)

Followups (deferred):
- Long-string findings #2/#3/#4/#5/#11/#13 need React state/render
  changes (perf + virtualization) — not CSS-only. Bigger cluster.
- Future PRs that add free-text fields should apply .mc-wrap-anywhere
  explicitly rather than deriving fresh rules.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
logpie added a commit that referenced this pull request Apr 26, 2026
============================================================
W5-CRITICAL-1 — Merge preflight ignored untracked files in project root
============================================================

Real safety bug found by live W5 rerun. User had untracked
DIRTY_FILE.txt; clicked Merge → HTTP 200 + ping.py landed in main.

Fix: merge action handler now consults _project_is_user_dirty (which
already correctly filters Otto-owned untracked files) BEFORE the
merge runs. On dirty: returns HTTP 409 with structured
{reason, files} payload. UI surfaces block in confirm dialog.

Tests:
- tests/test_merge_preflight_dirty_tree.py (server)
- tests/browser/test_merge_blocked_dirty_tree.py (browser)
  - blocks on user-untracked file in root
  - allows when only Otto-owned untracked
  - blocks on tracked-file modification
  - blocks on staged changes
  - confirm dialog shows block reason + file list

============================================================
5 IMPORTANT — Evidence trustworthiness cluster
============================================================

#3 — Proof drawer cached + no run_id validation
  service.py: _proof_provenance helper extracts generated_at, run_id,
  session_id, branch, head_sha, file_mtime, sha256, run_id_matches
  flag. Client cache invalidates on (run_id|version|sha256|mtime) key
  change. New <ProofProvenance> component renders metadata + warning
  banner on run_id mismatch.

#4 — Per-round evidence not visible
  service.py _certification_summary gains rounds field via new
  _certification_round_history helper. Client <CertificationRoundTabs>
  component: tab strip with verdict/duration; per-round detail with
  diagnosis, failing story IDs, fix commits.

#5 — Live vs Final indicator (already covered cluster B; verified)

#6 — Binary artifacts decoded as garbage UTF-8
  service.py artifact_content does MIME detection via magic bytes
  (PNG/GIF/JPG/PDF/WEBM/WEBP/MP4) + null-byte sniff; returns
  {previewable, mime_type, size_bytes}. New /api/runs/{id}/artifacts/
  {i}/raw route serves binary via FileResponse. Client routes image
  MIMEs to <img src=raw>, video to <video>, others to "no text
  preview · download" card.

#7 — Artifact list missing size/mtime/hash
  serializers.serialize_artifact extended with size_bytes, mtime
  (ISO UTC), sha256 (full hex; client truncates to 12 chars). SHA
  computation capped at 16 MB so list rendering stays cheap.

#8 — Visual evidence has no manifest
  certifier writes sibling <artifact>.manifest.json with schema_version,
  captured_at, run_id, session_id, round, story_id, story_status,
  sha256, size_bytes, viewport, browser, certifier_mode, kind.

Tests (15 new, all green):
- tests/test_proof_provenance.py (9 server tests)
- tests/browser/test_certification_round_tabs.py (4 browser tests)
- tests/browser/test_artifact_binary_preview.py (2 browser tests)

============================================================
Test counts
============================================================
- Default: 1183 (was 1168; +15)
- Browser: 165 (was 153; +12)
- npm web:typecheck: clean
- npm web:build: 327KB JS / 49KB CSS

============================================================
Tally
============================================================
CRITICAL: 25 of 25 closed (100%)
IMPORTANT: ~94 of 132 closed (~71%)

Followups:
- mimetypes for .json may return application/json on some platforms;
  if certifier emits custom-extension JSON, extend allowlist
- Manifest viewport/browser fields are placeholders — agent SDK
  doesn't expose hook at screenshot capture time; fill once cert
  owns capture path
- TUI test_mission_control_status_cells_use_theme_colors: pre-existing
  STALE-vs-RUNNING heartbeat test failure unrelated to this work

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant