feat: surface #7762 metrics in dive CSV + flag-enable in workflow #100
Merged
Three follow-ups to make the next dive run informative:
1. Workflow now patches settings.json to set
scalingDiveMetrics=true (added to core in #7762 but defaulting
off there per project rule). Without this, the three new metric
rows the harness wants would never appear on /stats/prometheus.
2. CSV column rename: evloop_p95_ms -> evloop_p99_ms. prom-client's
collectDefaultMetrics emits nodejs_eventloop_lag_p99_seconds
(p50/p90/p99, no p95), so the previous lookup was always empty.
Markdown table header updated to match.
3. Two new curated CSV columns:
- apply_mean_ms: from etherpad_changeset_apply_duration_seconds_sum
and _count (histogram mean), converted to ms. Lets the dive
attribute server-side latency to the apply path vs fan-out.
- emits_new_changes: from etherpad_socket_emits_total{type=NEW_CHANGES}.
NEW_CHANGES emits are the dominant fan-out cost, so this column makes
the batching lever's payoff visible.
Both new columns are populated only when the underlying metrics
exist on the SUT; older builds get empty cells (existing pattern).
48 tests green.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Summary
Three follow-ups to make the next scaling-dive run actually use the new core metrics from ether/etherpad#7762:
Workflow enables `scalingDiveMetrics`. Core's setting defaults to `false`. Without flipping it on, the three new `etherpad_pad_users` / `etherpad_changeset_apply_duration_seconds` / `etherpad_socket_emits_total` rows never appear on `/stats/prometheus`. The setting is patched as JSON via an inline Node script, so we don't depend on key ordering in settings.json.
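A minimal sketch of what such an inline-Node patch could look like, assuming settings.json is strict JSON (`JSON.parse` rejects comments) and that re-serialising it with two-space indentation is acceptable. The function name `enableScalingDiveMetrics` is hypothetical, not taken from the workflow:

```javascript
// Hypothetical sketch: force scalingDiveMetrics on in an Etherpad settings file.
// Assumes the file is strict JSON; JSON.parse throws on comments.
const fs = require('node:fs');

function enableScalingDiveMetrics(file) {
  const settings = JSON.parse(fs.readFileSync(file, 'utf8'));
  // Assigning on the parsed object works wherever (or whether) the key
  // already appears in the file, so key ordering does not matter.
  settings.scalingDiveMetrics = true;
  fs.writeFileSync(file, JSON.stringify(settings, null, 2) + '\n');
  return settings;
}
```

In a workflow step this could run as a one-off `node -e` invocation. Parsing and re-serialising is what makes the edit independent of key order: a textual find-and-replace would need to know where, or whether, the key already appears.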
`evloop_p95_ms` → `evloop_p99_ms`. prom-client's `collectDefaultMetrics` emits `nodejs_eventloop_lag_p50/p90/p99_seconds` — there's no `p95`. The CSV column was always empty for that reason; the previous dive run made that visible. p99 is the closest tail metric.
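To illustrate why the old column was always empty, here is a sketch of a lookup against the Prometheus text exposition format. `sampleValue` and `evloopP99Ms` are hypothetical helpers, and the parser only handles unlabelled samples, which is an assumption about how the default metrics are exposed on this endpoint:

```javascript
// Hypothetical helper: find an unlabelled sample in Prometheus text
// exposition format and return its numeric value, or undefined if absent.
function sampleValue(text, name) {
  for (const line of text.split('\n')) {
    if (line.startsWith('#')) continue; // skip HELP/TYPE comment lines
    const [metric, value] = line.trim().split(/\s+/);
    if (metric === name) return Number(value);
  }
  return undefined;
}

// evloop_p99_ms: prom-client reports seconds; the CSV column is in ms.
// A missing metric yields an empty cell rather than 0.
function evloopP99Ms(text) {
  const seconds = sampleValue(text, 'nodejs_eventloop_lag_p99_seconds');
  return seconds === undefined ? '' : seconds * 1000;
}
```

Looking up `nodejs_eventloop_lag_p95_seconds` with the same helper returns `undefined` on every scrape, which is exactly the always-empty behaviour the rename fixes.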
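Two new curated CSV columns:
- `apply_mean_ms`: histogram mean of `etherpad_changeset_apply_duration_seconds` (`_sum` / `_count`), converted to ms. Lets the dive attribute server-side latency to the apply path vs. fan-out.
- `emits_new_changes`: from `etherpad_socket_emits_total{type=NEW_CHANGES}`. NEW_CHANGES emits are the dominant fan-out cost, so this column makes the batching lever's payoff visible.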
Both new columns are populated only when the underlying metrics exist on the SUT; older builds get empty cells (the existing missing-metric pattern is preserved).
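A sketch of how the two curated columns could be derived from a `/stats/prometheus` scrape, including the empty-cell fallback for builds that lack the metrics. The helper names (`parseSamples`, `sumWhere`, `curatedColumns`) are hypothetical, and the parser is deliberately simple: it assumes label values contain no commas, closing braces, or escaped quotes.

```javascript
// Hypothetical sketch: parse Prometheus text exposition into samples.
// Assumes label values contain no ',', '}' or escaped quotes.
function parseSamples(text) {
  const samples = [];
  for (const raw of text.split('\n')) {
    const line = raw.trim();
    if (line === '' || line.startsWith('#')) continue;
    const m = line.match(/^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{([^}]*)\})?\s+(\S+)/);
    if (!m) continue;
    const labels = {};
    for (const pair of (m[2] || '').split(',')) {
      const lm = pair.match(/^\s*([a-zA-Z_][a-zA-Z0-9_]*)="(.*)"\s*$/);
      if (lm) labels[lm[1]] = lm[2];
    }
    samples.push({ name: m[1], labels, value: Number(m[3]) });
  }
  return samples;
}

// Sum every sample of `name` whose labels include all entries in `match`.
// Returns undefined when nothing matches (metric absent on older builds).
function sumWhere(samples, name, match = {}) {
  let total;
  for (const s of samples) {
    if (s.name !== name) continue;
    if (!Object.entries(match).every(([k, v]) => s.labels[k] === v)) continue;
    total = (total ?? 0) + s.value;
  }
  return total;
}

function curatedColumns(text) {
  const samples = parseSamples(text);
  const sum = sumWhere(samples, 'etherpad_changeset_apply_duration_seconds_sum');
  const count = sumWhere(samples, 'etherpad_changeset_apply_duration_seconds_count');
  const emits = sumWhere(samples, 'etherpad_socket_emits_total', { type: 'NEW_CHANGES' });
  return {
    // Histogram mean in ms: total observed seconds / observations * 1000.
    // Empty when the metric is missing or has no observations yet.
    apply_mean_ms: sum === undefined || !count ? '' : (sum / count) * 1000,
    emits_new_changes: emits === undefined ? '' : emits,
  };
}
```

Summing `_sum` and `_count` across all series before dividing gives the overall mean even if the histogram carries extra labels; averaging per-series means would not. Returning `''` rather than `0` keeps "metric absent" distinguishable from "metric present but zero".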
48 tests green.
Test Plan
🤖 Generated with Claude Code