Skip to content

feat: surface #7762 metrics in dive CSV + flag-enable in workflow#100

Merged
JohnMcLear merged 1 commit into
mainfrom
harness/dive-metrics-pass2
May 15, 2026
Merged

feat: surface #7762 metrics in dive CSV + flag-enable in workflow#100
JohnMcLear merged 1 commit into
mainfrom
harness/dive-metrics-pass2

Conversation

@JohnMcLear
Copy link
Copy Markdown
Member

Summary

Three follow-ups to make the next scaling-dive run actually use the new core metrics from ether/etherpad#7762:

  1. Workflow enables `scalingDiveMetrics`. Core's setting defaults to `false`. Without flipping it on, the three new `etherpad_pad_users` / `etherpad_changeset_apply_duration_seconds` / `etherpad_socket_emits_total` rows never appear on `/stats/prometheus`. JSON-patched via inline node so we don't depend on key ordering in settings.json.

  2. `evloop_p95_ms` → `evloop_p99_ms`. prom-client's `collectDefaultMetrics` emits `nodejs_eventloop_lag_p50/p90/p99_seconds` — there's no `p95`. The CSV column was always empty for that reason; the previous dive run made that visible. p99 is the closest tail metric.

  3. Two new curated CSV columns:

    • `apply_mean_ms` — `etherpad_changeset_apply_duration_seconds_{sum,count}` mean × 1000. Server-side apply-path latency, isolated from fan-out (the histogram in #7762 is now scoped correctly).
    • `emits_new_changes` — `etherpad_socket_emits_total{type=NEW_CHANGES}`. Dominant fan-out cost; the column makes the batching lever's payoff visible without digging into JSON.

Both new columns are populated only when the underlying metrics exist on the SUT; older builds get empty cells (the existing missing-metric pattern is preserved).

48 tests green.

Test Plan

  • CI: `pnpm test` green.
  • CI: legacy + sweep-integration jobs green.
  • After merge: trigger Scaling dive workflow; confirm artifacts contain non-empty `evloop_p99_ms`, `apply_mean_ms`, and `emits_new_changes` columns.

🤖 Generated with Claude Code

Three follow-ups to make the next dive run informative:

1. Workflow now patches settings.json to set
   scalingDiveMetrics=true (added to core in #7762 but defaulting
   off there per project rule). Without this the three new metric
   rows the harness wants would never appear on /stats/prometheus.

2. CSV column rename: evloop_p95_ms -> evloop_p99_ms. prom-client's
   collectDefaultMetrics emits nodejs_eventloop_lag_p99_seconds
   (p50/p90/p99 — no p95), so the previous lookup was always empty.
   MD table header updated to match.

3. Two new curated CSV columns:
   - apply_mean_ms: from etherpad_changeset_apply_duration_seconds_sum
     and _count (histogram mean), converted to ms. Lets the dive
     attribute server-side latency to the apply path vs fan-out.
   - emits_new_changes: from etherpad_socket_emits_total{type=NEW_CHANGES}.
     Dominant fan-out cost; the column makes the batching lever's
     payoff visible.

Both new columns are populated only when the underlying metrics
exist on the SUT; older builds get empty cells (existing pattern).

48 tests green.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@qodo-code-review
Copy link
Copy Markdown

Qodo reviews are paused for this user.

Troubleshooting steps vary by plan Learn more →

On a Teams plan?
Reviews resume once this user has a paid seat and their Git account is linked in Qodo.
Link Git account →

Using GitHub Enterprise Server, GitLab Self-Managed, or Bitbucket Data Center?
These require an Enterprise plan - Contact us
Contact us →

@JohnMcLear JohnMcLear merged commit 9f224f7 into main May 15, 2026
4 checks passed
@JohnMcLear JohnMcLear deleted the harness/dive-metrics-pass2 branch May 15, 2026 19:11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant