Menubar and CLI hardening from multi-agent audit #257

Merged
iamtoruk merged 1 commit into main from fix/menubar-and-cli-hardening on May 7, 2026

iamtoruk commented on May 7, 2026

Two passes of validators across CLI accuracy, dashboard UX, menubar Swift, performance, security, and end-to-end smoke tests against real session data on a power-user machine.

Data correctness

  • parseLocalDate rejects month/day overflow. --from 2026-02-31 --to 2026-03-15 previously rolled to Mar 3 and silently dropped sessions on Feb 28 - Mar 2. Now throws with a clear reason. Leap-day correct (2024-02-29 valid; 2025-02-29 rejected).
  • CSV/JSON exports use the active currency's natural decimal places. Dashboard rendered ¥412 while CSV showed ¥412.37 — finance teams comparing the two saw a discrepancy. New roundForActiveCurrency rounds to 0 places for JPY/KRW/CLP and 2 for USD/EUR.
  • Copilot toolRequests is Array.isArray-guarded in both modern and legacy event branches. A corrupt session with toolRequests: null or "..." previously threw inside .map and aborted the whole file's parse loop, dropping every legitimate call after the bad event.
  • Codex token_count dedup uses a null sentinel for prevCumulativeTotal. Sessions that emit only last_token_usage report cumulativeTotal=0 on every event; with the prior 0-initialized prev, the first event matched the dedup guard and was dropped.
  • LiteLLM pricing values clamped to [0, 1] per token via safePerTokenRate. Defense in depth against a tampered upstream JSON propagating negative or absurd costs.
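
The overflow rejection in the first bullet can be illustrated with a round-trip check. This is a minimal sketch, not the PR's parseLocalDate: the name `parseLocalDateStrict` and the error wording are assumptions made for illustration.

```typescript
// Hypothetical sketch: strict YYYY-MM-DD parsing that rejects calendar overflow.
function parseLocalDateStrict(input: string): Date {
  const match = /^(\d{4})-(\d{2})-(\d{2})$/.exec(input);
  if (!match) {
    throw new Error(`Invalid date "${input}": expected YYYY-MM-DD`);
  }
  const year = Number(match[1]);
  const month = Number(match[2]);
  const day = Number(match[3]);

  // The Date constructor normalizes out-of-range fields (Feb 31 becomes Mar 3),
  // so build the date and then check that every field survived unchanged.
  const date = new Date(year, month - 1, day);
  if (
    date.getFullYear() !== year ||
    date.getMonth() !== month - 1 ||
    date.getDate() !== day
  ) {
    throw new Error(`Invalid date "${input}": no such calendar day`);
  }
  return date;
}
```

Under these assumptions, `parseLocalDateStrict("2024-02-29")` returns a date, while `"2025-02-29"` and `"2026-02-31"` throw instead of rolling forward.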

Performance

  • Cursor SQLite parse no longer stalls for minutes on multi-GB DBs. The per-conversation user-message buffer now uses an index pointer (previously Array.shift(), which is O(n) per call). A real ROWID cutoff via subquery limits the scan to the most recent 250k bubbles with a stderr warning.
  • Spawned codeburn CLI subprocesses are terminated on Task cancellation. Rapid period/provider tab clicks in the menubar were cancelling the Task but leaving the subprocess running to completion.
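
The index-pointer change in the Cursor bullet can be sketched as follows. `MessageQueue` is a hypothetical name and shape; the actual buffer in the Cursor provider is not shown in this PR text.

```typescript
// Hypothetical FIFO buffer: a read index replaces Array.shift().
class MessageQueue<T> {
  private items: T[] = [];
  private head = 0;

  push(item: T): void {
    this.items.push(item);
  }

  // O(1) per call: advance the pointer instead of removing the first element,
  // which would force the remaining elements to be re-indexed.
  next(): T | undefined {
    if (this.head >= this.items.length) return undefined;
    const item = this.items[this.head];
    this.head += 1;
    return item;
  }

  // Number of items pushed but not yet consumed.
  get remaining(): number {
    return this.items.length - this.head;
  }
}
```

The trade-off in this sketch is that consumed items stay in memory until the queue itself is dropped, which seems acceptable for a buffer that lives only for one conversation's parse.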

UX

  • Dashboard period switch flips to loading + clears projects synchronously before reloadData runs, eliminating the frame where the new period label rendered over the old period's numbers.
  • Optimize findings tab paginates 3-at-a-time with j/k scroll. Long findings lists were scrolling the StatusBar off the alt buffer top.
  • Custom --from/--to ranges hide the period tab strip and disable 1-5 / arrow keys so a stray press doesn't silently abandon the user's explicit range. A "Custom range: X to Y" banner replaces the tab strip.
  • OpenCode storage-format warning is per-table-set, rate-limited, and actionable — names the missing tables and points at OpenCode's migration step or the issue tracker.

Menubar / OAuth

  • Both Claude and Codex bootstrap (Reconnect button) honour the usageBlockedUntil 429 backoff. Spamming Reconnect during sustained rate-limits no longer hammers the upstream endpoint on every click.
  • Codex Retry-After HTTP header is parsed (delta-seconds + IMF-fixdate fallback) so we don't over-back-off.
  • Both credential cache files written via SafeFile.write (O_CREAT | O_EXCL | O_NOFOLLOW with explicit 0600) — no race window at default umask, no symlink-follow at the destination. Reads route through SafeFile.read with a 64 KiB cap.
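
The two Retry-After forms mentioned above (delta-seconds and an HTTP-date) can be handled roughly as below. The menubar code is Swift; this TypeScript sketch only illustrates the parsing order, and `parseRetryAfter` is a hypothetical helper, not the PR's function.

```typescript
// Hypothetical helper: returns the wait in whole seconds, or null if unparseable.
function parseRetryAfter(header: string, nowMs: number): number | null {
  const value = header.trim();

  // Form 1: delta-seconds, a non-negative decimal integer.
  if (/^\d+$/.test(value)) {
    return Number(value);
  }

  // Form 2: an HTTP-date such as "Thu, 07 May 2026 05:15:00 GMT".
  // Date.parse is lenient about other formats; a stricter parser may be preferable.
  const whenMs = Date.parse(value);
  if (Number.isNaN(whenMs)) return null;

  // A date in the past means there is nothing left to wait for.
  return Math.max(0, Math.ceil((whenMs - nowMs) / 1000));
}
```

Passing the current time in as `nowMs` keeps the function deterministic, which makes the date branch straightforward to test.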

CI signal

  • tsc --noEmit is now zero errors. Six pre-existing errors in src/providers/copilot.ts came from a permissive catch-all branch in the discriminated union. Removed it; runtime safely falls through unknown event types via the existing if/else chain.
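
A small sketch of why removing the catch-all helps, assuming made-up event shapes (the real Copilot event types are not shown in this PR). With only literal `type` members in the union, a check on `type` narrows `data` to one specific shape.

```typescript
// Hypothetical event shapes, for illustration only.
type AssistantMessage = {
  type: "assistant.message";
  data: { toolRequests?: unknown };
};
type UserMessage = {
  type: "user.message";
  data: { content: string };
};

// Adding `{ type: string; data: Record<string, unknown> }` to this union would
// keep that loose member alive after every `type` check and widen `data` again.
type SessionEvent = AssistantMessage | UserMessage;

function describeEvent(raw: { type: string; data?: unknown }): string {
  // Unrecognized types are still accepted at runtime: the cast only tells the
  // compiler which shapes are handled, and the chain below falls through.
  const event = raw as SessionEvent;
  if (event.type === "user.message") {
    return `user: ${event.data.content}`;
  } else if (event.type === "assistant.message") {
    return Array.isArray(event.data.toolRequests) ? "assistant with tools" : "assistant";
  }
  return "ignored";
}
```

In this sketch an event such as `{ type: "session.start" }` simply returns `"ignored"`, mirroring the fall-through behaviour the bullet describes.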

Tests

16 new, 555 total, all passing.

  • date-range-filter: month/day/year overflow rejection, leap-day correctness
  • currency-rounding: convertCost no-rounding contract; roundForActiveCurrency for USD/JPY/KRW/EUR
  • providers/copilot: malformed toolRequests does not abort the parse
  • providers/cursor-bubble-dedup: re-parse after token mutation does not double-count
  • providers/codex: first event with cumulativeTotal=0 not dropped; consecutive zero-cumulative duplicates still deduped
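
The malformed-toolRequests case covered by the providers/copilot test can be sketched like this. The event shape and the `name` field are assumptions made for illustration; only the Array.isArray guard itself comes from the PR.

```typescript
// Hypothetical shape: the real event and tool-request types are not shown here.
type RawEvent = { toolRequests?: unknown };

function toolNames(event: RawEvent): string[] {
  // null, a string, or a missing field all yield an empty list rather than
  // throwing inside .map and aborting the caller's loop.
  if (!Array.isArray(event.toolRequests)) return [];
  return event.toolRequests
    .filter(
      (req): req is { name: string } =>
        typeof req === "object" &&
        req !== null &&
        typeof (req as { name?: unknown }).name === "string",
    )
    .map((req) => req.name);
}

// One corrupt event no longer hides the valid events that follow it.
function countToolCalls(events: RawEvent[]): number {
  let total = 0;
  for (const event of events) total += toolNames(event).length;
  return total;
}
```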

Validation

  • npm test — 41 files, 555 tests, 0 failures
  • tsc --noEmit — 0 errors
  • swift build — clean
  • 13/13 real-data CLI smoke tests pass
  • Menubar runs cleanly under PID watch

Test plan

  • Full vitest suite green
  • tsc --noEmit clean
  • Swift build clean
  • All 13 CLI surfaces (status, today, month, export csv/json, optimize, compare, yield, plan, currency, model-alias) smoke-tested against live data
  • Menubar smoke (process up + log clean)
  • Multi-agent re-validation across accuracy / security / smoke

Two passes of validators across CLI accuracy, dashboard UX, menubar Swift,
performance, security, and end-to-end smoke tests on real session data.

Data-correctness fixes:

- parseLocalDate rejects month/day overflow. JS Date silently rolled
  Feb 31 to Mar 3, so --from 2026-02-31 --to 2026-03-15 quietly dropped
  sessions on Feb 28 - Mar 2. Now throws "Invalid date" with a clear
  reason. Leap-day case covered (2024-02-29 valid, 2025-02-29 rejected).

- CSV/JSON exports use the active currency's natural decimal places. The
  previous round2 helper produced ¥412.37 in CSV while the dashboard
  rendered ¥412 — finance teams comparing the two surfaces saw a
  discrepancy. New roundForActiveCurrency consults Intl.NumberFormat for
  the right precision (0 for JPY/KRW/CLP, 2 for USD/EUR, etc).

- Copilot toolRequests is Array.isArray-guarded in both modern and legacy
  event branches. Previously a corrupt session with toolRequests=null or
  a string aborted the whole file's parse loop and silently dropped every
  legitimate call after it.

- Codex token_count dedup uses a null sentinel for prevCumulativeTotal so
  the first event is never confused with a duplicate. Sessions that emit
  only last_token_usage (no total_token_usage) report cumulativeTotal=0
  on every event; with the previous 0-initialized prev, the first event
  matched the dedup guard and was dropped.

- LiteLLM pricing values are clamped to [0, 1] per token via safePerTokenRate.
  Defense in depth against a tampered upstream JSON shipping negative or
  absurdly large per-token costs that would otherwise propagate into all
  cost totals.
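
The currency-precision lookup described above can be approximated with Intl.NumberFormat. This is a sketch under assumptions: how the PR's roundForActiveCurrency learns the active currency is not shown, so this hypothetical `roundForCurrency` takes the currency code as a parameter.

```typescript
// Hypothetical sketch: ask Intl.NumberFormat how many fraction digits a currency uses.
function fractionDigitsFor(currency: string): number {
  const options = new Intl.NumberFormat("en", {
    style: "currency",
    currency,
  }).resolvedOptions();
  // Fall back to 2 places if the runtime does not report a value.
  return options.maximumFractionDigits ?? 2;
}

// Round an amount to the currency's natural precision (0 for JPY, 2 for USD).
function roundForCurrency(amount: number, currency: string): number {
  const factor = 10 ** fractionDigitsFor(currency);
  return Math.round(amount * factor) / factor;
}
```

With this approach the export and the dashboard can share one source of truth for precision, rather than a hard-coded list of zero-decimal currencies.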

Performance:

- Cursor SQLite parse no longer stalls for minutes on multi-GB DBs. Two
  changes: per-conversation user-message buffer uses an index pointer
  instead of Array.shift() (which was O(n) per call); and a real ROWID
  cutoff via subquery limits the scan to the most recent 250k bubbles
  with a stderr warning so power users get a partial report rather than
  a stalled CLI.

- Spawned codeburn CLI subprocesses are terminated when the calling Task
  is cancelled. Without this, rapid period/provider tab clicks in the
  menubar cancelled the Task but left the subprocess running to
  completion, so stray processes piled up.

UX:

- Dashboard period switch flips to loading and clears projects
  synchronously before reloadData runs, eliminating the frame where the
  new period label rendered over the old period's projects.

- Optimize findings tab paginates 3-at-a-time with j/k scroll. With 4
  new detectors plus 7 originals, 8-10 findings * 6 lines was scrolling
  the StatusBar off the alt buffer top.

- Custom --from/--to ranges hide the period tab strip and disable the
  1-5 / arrow keys so a stray period press no longer abandons the user's
  explicit range. A "Custom range: X to Y" banner replaces the tab strip.

- OpenCode storage-format warning is per-table-set, rate-limited to once
  per process, and points the user at OpenCode's migration step or the
  issue tracker. The previous all-or-nothing check fired the generic
  "format not recognized" string for any schema mismatch.

Menubar / OAuth:

- Both Claude and Codex bootstrap (Reconnect button) now honour the
  usageBlockedUntil 429 backoff that refreshIfBootstrapped respects.
  Spamming Reconnect during sustained rate-limit windows previously
  hammered the upstream endpoint on every click.

- Codex Retry-After HTTP header is parsed (delta-seconds plus IMF-fixdate
  fallback) so we don't over-back-off when ChatGPT tells us a shorter
  window than our 5-minute floor.

- Both credential cache files are written via SafeFile.write
  (O_CREAT | O_EXCL | O_NOFOLLOW with explicit 0600) so there is no race
  window where the temp file briefly exists at default umask, and a
  symlink at the destination cannot redirect the write. Reads now route
  through SafeFile.read with a 64 KiB cap, closing the symlink-follow gap
  on Data(contentsOf:).
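
The Reconnect gating in the first bullet of this group amounts to checking the backoff deadline before any upstream call. The menubar code is Swift; the TypeScript sketch below only illustrates the check, and everything except the `usageBlockedUntil` name is an assumption.

```typescript
// Hypothetical state shape: epoch milliseconds, or null when not rate-limited.
interface UsageState {
  usageBlockedUntil: number | null;
}

function canCallUpstream(state: UsageState, nowMs: number): boolean {
  return state.usageBlockedUntil === null || nowMs >= state.usageBlockedUntil;
}

// Returns true if the upstream call was made, false if the click was absorbed
// by the backoff window.
function reconnect(state: UsageState, nowMs: number, callUpstream: () => void): boolean {
  if (!canCallUpstream(state, nowMs)) return false;
  callUpstream();
  return true;
}
```

Routing both the bootstrap path and the periodic refresh through one check like `canCallUpstream` is one way to keep the two from drifting apart again.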

CI signal:

- TypeScript strict typecheck (tsc --noEmit) is now zero errors. The
  six errors in src/providers/copilot.ts came from a discriminated-union
  catch-all branch whose `data: Record<string, unknown>` shape TS picked
  over the specific event branches when narrowing on `type`. Removed the
  catch-all; runtime falls through unknown event types via the existing
  if/else chain.

Tests added: 16 new (now 555 total)
- date-range-filter: month/day/year overflow rejection, leap-day correctness
- currency-rounding: convertCost no-rounding contract, roundForActiveCurrency
  for USD/JPY/KRW/EUR
- providers/copilot: malformed toolRequests does not abort the parse
- providers/cursor-bubble-dedup: re-parse after token mutation does not
  double-count, single parse yields one call per bubble
- providers/codex: first event with cumulativeTotal=0 not dropped,
  consecutive zero-cumulative duplicates still deduped
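
The two providers/codex cases can be reproduced with a null sentinel, sketched here under assumptions: the event shape is reduced to the one field the dedup compares, and `dedupTokenEvents` is a hypothetical name.

```typescript
// Hypothetical minimal event shape.
interface TokenCountEvent {
  cumulativeTotal: number;
}

function dedupTokenEvents<T extends TokenCountEvent>(events: T[]): T[] {
  const kept: T[] = [];
  // null means "no event seen yet", which no real total can equal.
  // Starting from 0 instead would make a first event with total 0 look like a duplicate.
  let prevCumulativeTotal: number | null = null;
  for (const event of events) {
    if (prevCumulativeTotal !== null && event.cumulativeTotal === prevCumulativeTotal) {
      continue; // same cumulative total as the previous event: treat as a duplicate
    }
    prevCumulativeTotal = event.cumulativeTotal;
    kept.push(event);
  }
  return kept;
}
```
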
iamtoruk merged commit daa6734 into main on May 7, 2026
3 checks passed
iamtoruk deleted the fix/menubar-and-cli-hardening branch on May 7, 2026 05:15