Add Prometheus metrics for gRPC performance monitoring#1436
Conversation
Introduce a custom Tower middleware layer that automatically instruments all gRPC endpoints with request counters and latency histograms, exposed via a Prometheus /metrics HTTP endpoint on a configurable port. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Organization UI Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (1)
🚧 Files skipped from review as they are similar to previous changes (1)
📝 WalkthroughWalkthroughPrometheus エクスポーターと gRPC 用の計測ミドルウェアを追加。依存関係を増やし、メトリクス用 HTTP サーバーを起動、Tonic サーバーに Changes
Sequence Diagram(s)sequenceDiagram
participant Client as gRPCクライアント
participant Layer as GrpcMetricsLayer
participant Service as InnerService
participant Metrics as MetricsCollector
participant Exporter as PrometheusExporter
Client->>Layer: gRPC リクエスト (HTTP)
activate Layer
Layer->>Metrics: increment grpc_requests_started_total(method)
Layer->>Service: forward request
activate Service
Service-->>Layer: response / error
deactivate Service
Layer->>Metrics: increment grpc_requests_total(method, status)
Layer->>Metrics: observe grpc_request_duration_seconds(method, status, duration)
Layer-->>Client: return response
deactivate Layer
Client->>Exporter: GET /metrics
Exporter->>Metrics: collect current metrics
Exporter-->>Client: metrics payload (HTTP)
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Possibly related PRs
Poem
🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
📝 Coding Plan
Comment |
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
Actionable comments posted: 3
🧹 Nitpick comments (4)
stationapi/src/main.rs (1)
63-74: メトリクスエクスポーターの初期化タイミングについて。Prometheusエクスポーターが
.env.localの読み込み(Line 76)より前に初期化されています。METRICS_PORTが.env.localで設定される場合、その値は反映されません。意図的な設計であれば問題ありませんが、確認をお勧めします。🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@stationapi/src/main.rs` around lines 63 - 74, Prometheus exporter is initialized before `.env.local` is loaded so METRICS_PORT from the file may be ignored; update initialization order by either moving the `metrics_exporter_prometheus::PrometheusBuilder` block (and the call to `fetch_metrics_port`) to after the `.env.local` loading code, or modify `fetch_metrics_port` to explicitly read/refresh environment variables from `.env.local` before parsing `METRICS_PORT`; adjust references to `metrics_addr`, `metrics_port`, and the `info!` log accordingly so the exporter uses the intended value.stationapi/src/presentation/middleware/metrics.rs (3)
41-51: ヘルスチェックとリフレクションリクエストもメトリクスに含まれます。
grpc.health.v1.Health/Checkやgrpc.reflection.v1.ServerReflectionのリクエストもメトリクスに記録されます。運用上問題なければそのままで構いませんが、アプリケーション固有のメトリクスのみを計測したい場合はフィルタリングを検討してください。🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@stationapi/src/presentation/middleware/metrics.rs` around lines 41 - 51, In call(), filter out health-check and reflection RPCs before incrementing the metric: detect the request path obtained in method (from req.uri().path()) and if it equals or matches the gRPC health and reflection routes (e.g. "/grpc.health.v1.Health/Check" and any reflection path such as starting with "/grpc.reflection.v1.ServerReflection"), skip the counter!("grpc_requests_started_total", ...) increment and proceed to construct GrpcMetricsFuture unchanged; update the logic in the call function accordingly so these method names are not recorded.
76-76: ステータスラベルがトランスポートエラーのみを区別しています。現在の実装では
Result::is_ok()でトランスポートレベルのエラーのみを検出しています。gRPCステータスコード(OK,NOT_FOUND,INTERNALなど)はtonic::Statusとしてレスポンスボディに含まれるため、成功したHTTPレスポンス内のgRPCエラーは "ok" として記録されます。これは初期実装として許容範囲ですが、より詳細なメトリクスが必要な場合は、レスポンスからgRPCステータスを抽出する拡張を検討してください。
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@stationapi/src/presentation/middleware/metrics.rs` at line 76, 現在の metrics ミドルウェアで行っている let status = if result.is_ok() { "ok" } else { "error" }; は transport レベルの失敗のみを判定しており、成功した HTTP レスポンス内に含まれる gRPC エラー(tonic::Status)を見逃します。metrics.rs の該当箇所(let status = ... の代替)を修正して、レスポンスから gRPC ステータスを抽出してラベルに反映するようにしてください(例: レスポンスボディ内の tonic::Status を検査するか、トレーラ/ヘッダの "grpc-status" を参照して OK/NOT_FOUND/INTERNAL 等を区別)。result.is_ok() 判定は残す場合でも、gRPC ステータスの存在を優先して status ラベルを設定するように変更してください。
74-86: 不要なclone()を削除してパフォーマンスを改善できます。Line 77で
this.method.clone()を行い、Line 79でさらにmethod.clone()しています。1回の clone で済むように最適化できます。♻️ 最適化案
Poll::Ready(result) => { let elapsed = this.start.elapsed().as_secs_f64(); let status = if result.is_ok() { "ok" } else { "error" }; - let method = this.method.clone(); + let method: &str = this.method; - counter!("grpc_requests_total", "method" => method.clone(), "status" => status) + counter!("grpc_requests_total", "method" => method.to_owned(), "status" => status) .increment(1); histogram!("grpc_request_duration_seconds", "method" => method, "status" => status) .record(elapsed);🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@stationapi/src/presentation/middleware/metrics.rs` around lines 74 - 86, Clone the request method only once and reuse it to avoid the redundant second clone: replace the two clones on this.method with a single let method = this.method.clone() (as already started on line 77) and pass references to that local variable into the counter! and histogram! macros (e.g. "method" => &method) instead of calling method.clone() again so the code records metrics without the extra allocation.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@stationapi/Cargo.toml`:
- Around line 33-37: The Cargo dependency for tower is set to version "0.5"
which conflicts with tonic 0.12.3 (tonic depends on tower ^0.4.7); either
downgrade the tower dependency to a 0.4.x series or upgrade tonic to >=0.14 so
versions are compatible—update the Cargo.toml entry for the tower dependency (or
the tonic entry) accordingly and run cargo update/cargo build to verify the
resolver picks compatible versions.
In `@stationapi/src/main.rs`:
- Around line 220-233: The fetch_metrics_port function has two issues: the
multi-line warn! macro breaks cargo fmt and the expression fetch_port() + 1 can
overflow when PORT==65535. Fix by collapsing the warn! call into a single line
(keep the same message) and replace the addition with a safe increment (e.g.,
use fetch_port().checked_add(1).unwrap_or(u16::MAX) or
fetch_port().saturating_add(1)) so it never overflows; implement these changes
inside the fetch_metrics_port function and keep references to fetch_port() and
warn! so the logic and message remain clear.
- Around line 63-74: The build is failing cargo fmt because the array passed to
PrometheusBuilder::set_buckets is on one long line; update the call in main (the
chain starting with metrics_exporter_prometheus::PrometheusBuilder::new()) so
the set_buckets(&[ ... ]) argument uses multiline formatting (each bucket value
on its own line or wrapped across multiple lines) to satisfy rustfmt, keeping
the same values and retaining the subsequent with_http_listener(metrics_addr)
and install() calls in the fluent chain.
---
Nitpick comments:
In `@stationapi/src/main.rs`:
- Around line 63-74: Prometheus exporter is initialized before `.env.local` is
loaded so METRICS_PORT from the file may be ignored; update initialization order
by either moving the `metrics_exporter_prometheus::PrometheusBuilder` block (and
the call to `fetch_metrics_port`) to after the `.env.local` loading code, or
modify `fetch_metrics_port` to explicitly read/refresh environment variables
from `.env.local` before parsing `METRICS_PORT`; adjust references to
`metrics_addr`, `metrics_port`, and the `info!` log accordingly so the exporter
uses the intended value.
In `@stationapi/src/presentation/middleware/metrics.rs`:
- Around line 41-51: In call(), filter out health-check and reflection RPCs
before incrementing the metric: detect the request path obtained in method (from
req.uri().path()) and if it equals or matches the gRPC health and reflection
routes (e.g. "/grpc.health.v1.Health/Check" and any reflection path such as
starting with "/grpc.reflection.v1.ServerReflection"), skip the
counter!("grpc_requests_started_total", ...) increment and proceed to construct
GrpcMetricsFuture unchanged; update the logic in the call function accordingly
so these method names are not recorded.
- Line 76: 現在の metrics ミドルウェアで行っている let status = if result.is_ok() { "ok" } else
{ "error" }; は transport レベルの失敗のみを判定しており、成功した HTTP レスポンス内に含まれる gRPC
エラー(tonic::Status)を見逃します。metrics.rs の該当箇所(let status = ... の代替)を修正して、レスポンスから
gRPC ステータスを抽出してラベルに反映するようにしてください(例: レスポンスボディ内の tonic::Status を検査するか、トレーラ/ヘッダの
"grpc-status" を参照して OK/NOT_FOUND/INTERNAL 等を区別)。result.is_ok() 判定は残す場合でも、gRPC
ステータスの存在を優先して status ラベルを設定するように変更してください。
- Around line 74-86: Clone the request method only once and reuse it to avoid
the redundant second clone: replace the two clones on this.method with a single
let method = this.method.clone() (as already started on line 77) and pass
references to that local variable into the counter! and histogram! macros (e.g.
"method" => &method) instead of calling method.clone() again so the code records
metrics without the extra allocation.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 06664cd5-6ebe-452a-956c-8c0ec201fee3
⛔ Files ignored due to path filters (1)
Cargo.lockis excluded by!**/*.lock
📒 Files selected for processing (5)
stationapi/Cargo.tomlstationapi/src/main.rsstationapi/src/presentation.rsstationapi/src/presentation/middleware.rsstationapi/src/presentation/middleware/metrics.rs
There was a problem hiding this comment.
Actionable comments posted: 2
♻️ Duplicate comments (1)
stationapi/src/main.rs (1)
222-229:⚠️ Potential issue | 🟠 Major
fetch_port() + 1はポート上限でオーバーフローします。Line 226 は
PORT=65535のとき不正値になります。checked_addで明示的に扱ってください。これは過去指摘と同一論点です。修正案
fn fetch_metrics_port() -> u16 { match env::var("METRICS_PORT") { Ok(s) => s.parse().expect("Failed to parse $METRICS_PORT"), Err(env::VarError::NotPresent) => { - let default = fetch_port() + 1; + let port = fetch_port(); + let default = port.checked_add(1).unwrap_or_else(|| { + panic!("PORT {} would overflow when computing default METRICS_PORT", port) + }); warn!("$METRICS_PORT is not set. Falling back to {}.", default); default } Err(VarError::NotUnicode(_)) => panic!("$METRICS_PORT should be written in Unicode."), } }#!/bin/bash # 目的: fetch_metrics_port のデフォルト計算が未チェック加算になっていないか確認 rg -n -C2 'fn fetch_metrics_port|fetch_port\(\)\s*\+\s*1|checked_add|saturating_add' stationapi/src/main.rs🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@stationapi/src/main.rs` around lines 222 - 229, fetch_metrics_port currently does an unchecked add (fetch_port() + 1) which overflows when PORT==65535; replace that with a checked add on the u16 from fetch_port(), e.g. call fetch_port().checked_add(1) and handle the None case explicitly: log a clear error via warn/error (including the offending port value) and exit(1) or return a safe fallback (e.g. u16::MAX) instead of letting wraparound occur; update the logic in fetch_metrics_port and any related tests to reflect this explicit overflow handling.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@stationapi/src/main.rs`:
- Line 65: Change the hard-coded bind address for metrics to be configurable via
a new METRICS_HOST env var (defaulting to "127.0.0.1") instead of always
"0.0.0.0"; update the place that builds metrics_addr (currently using
metrics_port and creating metrics_addr) to read METRICS_HOST, fall back to
"127.0.0.1" when unset, and then format SocketAddr from that host and
metrics_port so /metrics is loopback by default and only publicly exposed when
METRICS_HOST is explicitly set.
- Around line 63-80: The dotenv local file is being loaded after metrics
initialization so METRICS_PORT/PORT read by fetch_metrics_port() isn't picking
up .env.local overrides; call dotenv::from_filename(".env.local") (and handle
its Result) at startup before invoking fetch_metrics_port() and before creating
the Prometheus exporter (i.e., move the dotenv::from_filename(".env.local") call
to precede the code that calls fetch_metrics_port() and
metrics_exporter_prometheus::PrometheusBuilder::new()), ensuring any env
overrides are applied when fetch_metrics_port() and the Prometheus builder run.
---
Duplicate comments:
In `@stationapi/src/main.rs`:
- Around line 222-229: fetch_metrics_port currently does an unchecked add
(fetch_port() + 1) which overflows when PORT==65535; replace that with a checked
add on the u16 from fetch_port(), e.g. call fetch_port().checked_add(1) and
handle the None case explicitly: log a clear error via warn/error (including the
offending port value) and exit(1) or return a safe fallback (e.g. u16::MAX)
instead of letting wraparound occur; update the logic in fetch_metrics_port and
any related tests to reflect this explicit overflow handling.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 098cf9b8-279b-4e6b-8d28-70e58aaea3bf
📒 Files selected for processing (1)
stationapi/src/main.rs
…low safety - Move dotenv loading before metrics initialization so .env.local overrides are picked up by fetch_metrics_port()/fetch_port() - Add METRICS_HOST env var (default: 127.0.0.1) instead of hard-coded 0.0.0.0, so /metrics is loopback-only by default - Use checked_add(1) in fetch_metrics_port() to prevent u16 overflow when PORT=65535 Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
@coderabbitai 指摘ないならapproveしてください |
|
Tip For best results, initiate chat on the files or code changes. 🧩 Analysis chainLet me read the actual code before approving. 🏁 Script executed: Length of output: 11940
その他の実装( [approve] 🧠 Learnings used |
Summary
/metricsエンドポイントを別ポート(デフォルト: メインポート+1)で公開Metrics
grpc_requests_started_totalmethodgrpc_requests_totalmethod,statusgrpc_request_duration_secondsmethod,statusConfiguration
METRICS_PORT環境変数でポートを変更可能(デフォルト:PORT + 1)Test plan
SQLX_OFFLINE=true cargo buildでコンパイル確認済みcurl http://localhost:50052/metricsでPrometheusメトリクスが返ることを確認grpc_request_duration_secondsとgrpc_requests_totalが増加することを確認🤖 Generated with Claude Code
Summary by CodeRabbit
新機能
Chores