Skip to content

Add Prometheus metrics for gRPC performance monitoring#1436

Merged
TinyKitten merged 3 commits into
devfrom
feature/prometheus-metrics
Mar 20, 2026
Merged

Add Prometheus metrics for gRPC performance monitoring#1436
TinyKitten merged 3 commits into
devfrom
feature/prometheus-metrics

Conversation

@TinyKitten

@TinyKitten TinyKitten commented Mar 20, 2026

Copy link
Copy Markdown
Member

Summary

  • gRPC全エンドポイントのリクエスト数・レイテンシを自動計測するTower middlewareを追加
  • Prometheus /metrics エンドポイントを別ポート(デフォルト: メインポート+1)で公開
  • 既存のRPCハンドラへの変更不要

Metrics

メトリクス名 種別 ラベル
grpc_requests_started_total Counter method
grpc_requests_total Counter method, status
grpc_request_duration_seconds Histogram method, status

Configuration

  • METRICS_PORT 環境変数でポートを変更可能(デフォルト: PORT + 1

Test plan

  • SQLX_OFFLINE=true cargo build でコンパイル確認済み
  • サーバー起動後、curl http://localhost:50052/metrics でPrometheusメトリクスが返ることを確認
  • gRPCリクエスト送信後、grpc_request_duration_secondsgrpc_requests_totalが増加することを確認

🤖 Generated with Claude Code

Summary by CodeRabbit

  • 新機能

    • Prometheus形式のメトリクスをHTTPで公開するエクスポーターを追加しました。
    • gRPCリクエストの計測を導入し、リクエスト数・レイテンシ・結果ステータスを収集します。
    • メトリクス用ホスト/ポートを環境変数で指定可能。未指定時は既存のポートから自動決定します。
  • Chores

    • メトリクス収集に関する依存パッケージを追加しました。

Introduce a custom Tower middleware layer that automatically instruments
all gRPC endpoints with request counters and latency histograms, exposed
via a Prometheus /metrics HTTP endpoint on a configurable port.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@github-actions github-actions Bot added the feature 要望対応や課題解決 label Mar 20, 2026
@coderabbitai

coderabbitai Bot commented Mar 20, 2026

Copy link
Copy Markdown
Contributor

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 398c95ca-1386-48a6-a639-efb5ef728e09

📥 Commits

Reviewing files that changed from the base of the PR and between a6e2e86 and 6355f81.

📒 Files selected for processing (1)
  • stationapi/src/main.rs
🚧 Files skipped from review as they are similar to previous changes (1)
  • stationapi/src/main.rs

📝 Walkthrough

Walkthrough

Prometheus エクスポーターと gRPC 用の計測ミドルウェアを追加。依存関係を増やし、メトリクス用 HTTP サーバーを起動、Tonic サーバーに GrpcMetricsLayer を組み込む変更を導入しました。(50語以内)

Changes

Cohort / File(s) Summary
依存関係
stationapi/Cargo.toml
Prometheus メトリクスおよびミドルウェア実装に必要なクレートを追加(metrics, metrics-exporter-prometheus, pin-project-lite, http = "1", tower with util)。
サーバー初期化・メトリクス起動
stationapi/src/main.rs
環境変数読み取りヘルパー追加(METRICS_HOST/METRICS_PORT)、Prometheus エクスポーター起動、ヒストグラムバケツ設定、GrpcMetricsLayer を Tonic サーバーに組み込み(grpc-web 分岐含む)。
モジュール公開
stationapi/src/presentation.rs, stationapi/src/presentation/middleware.rs
middleware モジュールとその下位 metrics モジュールを公開 (pub mod middleware;, pub mod metrics;)。
メトリクスミドルウェア実装
stationapi/src/presentation/middleware/metrics.rs
GrpcMetricsLayer/GrpcMetricsService/GrpcMetricsFuture を実装。リクエスト開始カウンタ、完了カウンタ(status ラベル)、レイテンシヒストグラムを記録し、内部サービスへフォワードする Tower ミドルウェア。

Sequence Diagram(s)

sequenceDiagram
    participant Client as gRPCクライアント
    participant Layer as GrpcMetricsLayer
    participant Service as InnerService
    participant Metrics as MetricsCollector
    participant Exporter as PrometheusExporter

    Client->>Layer: gRPC リクエスト (HTTP)
    activate Layer
    Layer->>Metrics: increment grpc_requests_started_total(method)
    Layer->>Service: forward request
    activate Service
    Service-->>Layer: response / error
    deactivate Service
    Layer->>Metrics: increment grpc_requests_total(method, status)
    Layer->>Metrics: observe grpc_request_duration_seconds(method, status, duration)
    Layer-->>Client: return response
    deactivate Layer

    Client->>Exporter: GET /metrics
    Exporter->>Metrics: collect current metrics
    Exporter-->>Client: metrics payload (HTTP)
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

  • reflection対応 #1202: stationapi/src/main.rs のサーバー初期化周りを変更しており、Tonic サーバー設定の衝突や類似箇所が存在する可能性が高い。

Poem

🐰 メトリクスの畑で跳ねてみた
リクエスト数を数え、時をはかるよ
ラベルが踊り、ヒストグラムが歌う
Prometheus に道を示しながら
ぴょん、とサーバーは今日も元気 🥕

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately summarizes the main objective of the PR: adding Prometheus metrics for gRPC performance monitoring, which is reflected in the dependency additions, middleware implementation, and metrics endpoint configuration.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch feature/prometheus-metrics
📝 Coding Plan
  • Generate coding plan for human review comments

Comment @coderabbitai help to get the list of available commands and usage tips.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🧹 Nitpick comments (4)
stationapi/src/main.rs (1)

63-74: メトリクスエクスポーターの初期化タイミングについて。

Prometheusエクスポーターが .env.local の読み込み(Line 76)より前に初期化されています。METRICS_PORT.env.local で設定される場合、その値は反映されません。意図的な設計であれば問題ありませんが、確認をお勧めします。

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@stationapi/src/main.rs` around lines 63 - 74, Prometheus exporter is
initialized before `.env.local` is loaded so METRICS_PORT from the file may be
ignored; update initialization order by either moving the
`metrics_exporter_prometheus::PrometheusBuilder` block (and the call to
`fetch_metrics_port`) to after the `.env.local` loading code, or modify
`fetch_metrics_port` to explicitly read/refresh environment variables from
`.env.local` before parsing `METRICS_PORT`; adjust references to `metrics_addr`,
`metrics_port`, and the `info!` log accordingly so the exporter uses the
intended value.
stationapi/src/presentation/middleware/metrics.rs (3)

41-51: ヘルスチェックとリフレクションリクエストもメトリクスに含まれます。

grpc.health.v1.Health/Checkgrpc.reflection.v1.ServerReflection のリクエストもメトリクスに記録されます。運用上問題なければそのままで構いませんが、アプリケーション固有のメトリクスのみを計測したい場合はフィルタリングを検討してください。

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@stationapi/src/presentation/middleware/metrics.rs` around lines 41 - 51, In
call(), filter out health-check and reflection RPCs before incrementing the
metric: detect the request path obtained in method (from req.uri().path()) and
if it equals or matches the gRPC health and reflection routes (e.g.
"/grpc.health.v1.Health/Check" and any reflection path such as starting with
"/grpc.reflection.v1.ServerReflection"), skip the
counter!("grpc_requests_started_total", ...) increment and proceed to construct
GrpcMetricsFuture unchanged; update the logic in the call function accordingly
so these method names are not recorded.

76-76: ステータスラベルがトランスポートエラーのみを区別しています。

現在の実装では Result::is_ok() でトランスポートレベルのエラーのみを検出しています。gRPCステータスコード(OK, NOT_FOUND, INTERNAL など)は tonic::Status としてレスポンスボディに含まれるため、成功したHTTPレスポンス内のgRPCエラーは "ok" として記録されます。

これは初期実装として許容範囲ですが、より詳細なメトリクスが必要な場合は、レスポンスからgRPCステータスを抽出する拡張を検討してください。

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@stationapi/src/presentation/middleware/metrics.rs` at line 76, 現在の metrics
ミドルウェアで行っている let status = if result.is_ok() { "ok" } else { "error" }; は
transport レベルの失敗のみを判定しており、成功した HTTP レスポンス内に含まれる gRPC
エラー(tonic::Status)を見逃します。metrics.rs の該当箇所(let status = ... の代替)を修正して、レスポンスから
gRPC ステータスを抽出してラベルに反映するようにしてください(例: レスポンスボディ内の tonic::Status を検査するか、トレーラ/ヘッダの
"grpc-status" を参照して OK/NOT_FOUND/INTERNAL 等を区別)。result.is_ok() 判定は残す場合でも、gRPC
ステータスの存在を優先して status ラベルを設定するように変更してください。

74-86: 不要な clone() を削除してパフォーマンスを改善できます。

Line 77で this.method.clone() を行い、Line 79でさらに method.clone() しています。1回の clone で済むように最適化できます。

♻️ 最適化案
             Poll::Ready(result) => {
                 let elapsed = this.start.elapsed().as_secs_f64();
                 let status = if result.is_ok() { "ok" } else { "error" };
-                let method = this.method.clone();
+                let method: &str = this.method;

-                counter!("grpc_requests_total", "method" => method.clone(), "status" => status)
+                counter!("grpc_requests_total", "method" => method.to_owned(), "status" => status)
                     .increment(1);
                 histogram!("grpc_request_duration_seconds", "method" => method, "status" => status)
                     .record(elapsed);
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@stationapi/src/presentation/middleware/metrics.rs` around lines 74 - 86,
Clone the request method only once and reuse it to avoid the redundant second
clone: replace the two clones on this.method with a single let method =
this.method.clone() (as already started on line 77) and pass references to that
local variable into the counter! and histogram! macros (e.g. "method" =>
&method) instead of calling method.clone() again so the code records metrics
without the extra allocation.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@stationapi/Cargo.toml`:
- Around line 33-37: The Cargo dependency for tower is set to version "0.5"
which conflicts with tonic 0.12.3 (tonic depends on tower ^0.4.7); either
downgrade the tower dependency to a 0.4.x series or upgrade tonic to >=0.14 so
versions are compatible—update the Cargo.toml entry for the tower dependency (or
the tonic entry) accordingly and run cargo update/cargo build to verify the
resolver picks compatible versions.

In `@stationapi/src/main.rs`:
- Around line 220-233: The fetch_metrics_port function has two issues: the
multi-line warn! macro breaks cargo fmt and the expression fetch_port() + 1 can
overflow when PORT==65535. Fix by collapsing the warn! call into a single line
(keep the same message) and replace the addition with a safe increment (e.g.,
use fetch_port().checked_add(1).unwrap_or(u16::MAX) or
fetch_port().saturating_add(1)) so it never overflows; implement these changes
inside the fetch_metrics_port function and keep references to fetch_port() and
warn! so the logic and message remain clear.
- Around line 63-74: The build is failing cargo fmt because the array passed to
PrometheusBuilder::set_buckets is on one long line; update the call in main (the
chain starting with metrics_exporter_prometheus::PrometheusBuilder::new()) so
the set_buckets(&[ ... ]) argument uses multiline formatting (each bucket value
on its own line or wrapped across multiple lines) to satisfy rustfmt, keeping
the same values and retaining the subsequent with_http_listener(metrics_addr)
and install() calls in the fluent chain.

---

Nitpick comments:
In `@stationapi/src/main.rs`:
- Around line 63-74: Prometheus exporter is initialized before `.env.local` is
loaded so METRICS_PORT from the file may be ignored; update initialization order
by either moving the `metrics_exporter_prometheus::PrometheusBuilder` block (and
the call to `fetch_metrics_port`) to after the `.env.local` loading code, or
modify `fetch_metrics_port` to explicitly read/refresh environment variables
from `.env.local` before parsing `METRICS_PORT`; adjust references to
`metrics_addr`, `metrics_port`, and the `info!` log accordingly so the exporter
uses the intended value.

In `@stationapi/src/presentation/middleware/metrics.rs`:
- Around line 41-51: In call(), filter out health-check and reflection RPCs
before incrementing the metric: detect the request path obtained in method (from
req.uri().path()) and if it equals or matches the gRPC health and reflection
routes (e.g. "/grpc.health.v1.Health/Check" and any reflection path such as
starting with "/grpc.reflection.v1.ServerReflection"), skip the
counter!("grpc_requests_started_total", ...) increment and proceed to construct
GrpcMetricsFuture unchanged; update the logic in the call function accordingly
so these method names are not recorded.
- Line 76: 現在の metrics ミドルウェアで行っている let status = if result.is_ok() { "ok" } else
{ "error" }; は transport レベルの失敗のみを判定しており、成功した HTTP レスポンス内に含まれる gRPC
エラー(tonic::Status)を見逃します。metrics.rs の該当箇所(let status = ... の代替)を修正して、レスポンスから
gRPC ステータスを抽出してラベルに反映するようにしてください(例: レスポンスボディ内の tonic::Status を検査するか、トレーラ/ヘッダの
"grpc-status" を参照して OK/NOT_FOUND/INTERNAL 等を区別)。result.is_ok() 判定は残す場合でも、gRPC
ステータスの存在を優先して status ラベルを設定するように変更してください。
- Around line 74-86: Clone the request method only once and reuse it to avoid
the redundant second clone: replace the two clones on this.method with a single
let method = this.method.clone() (as already started on line 77) and pass
references to that local variable into the counter! and histogram! macros (e.g.
"method" => &method) instead of calling method.clone() again so the code records
metrics without the extra allocation.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 06664cd5-6ebe-452a-956c-8c0ec201fee3

📥 Commits

Reviewing files that changed from the base of the PR and between 5229d7c and 20c4813.

⛔ Files ignored due to path filters (1)
  • Cargo.lock is excluded by !**/*.lock
📒 Files selected for processing (5)
  • stationapi/Cargo.toml
  • stationapi/src/main.rs
  • stationapi/src/presentation.rs
  • stationapi/src/presentation/middleware.rs
  • stationapi/src/presentation/middleware/metrics.rs

Comment thread stationapi/Cargo.toml
Comment thread stationapi/src/main.rs
Comment thread stationapi/src/main.rs

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

♻️ Duplicate comments (1)
stationapi/src/main.rs (1)

222-229: ⚠️ Potential issue | 🟠 Major

fetch_port() + 1 はポート上限でオーバーフローします。

Line 226 は PORT=65535 のとき不正値になります。checked_add で明示的に扱ってください。これは過去指摘と同一論点です。

修正案
 fn fetch_metrics_port() -> u16 {
     match env::var("METRICS_PORT") {
         Ok(s) => s.parse().expect("Failed to parse $METRICS_PORT"),
         Err(env::VarError::NotPresent) => {
-            let default = fetch_port() + 1;
+            let port = fetch_port();
+            let default = port.checked_add(1).unwrap_or_else(|| {
+                panic!("PORT {} would overflow when computing default METRICS_PORT", port)
+            });
             warn!("$METRICS_PORT is not set. Falling back to {}.", default);
             default
         }
         Err(VarError::NotUnicode(_)) => panic!("$METRICS_PORT should be written in Unicode."),
     }
 }
#!/bin/bash
# 目的: fetch_metrics_port のデフォルト計算が未チェック加算になっていないか確認
rg -n -C2 'fn fetch_metrics_port|fetch_port\(\)\s*\+\s*1|checked_add|saturating_add' stationapi/src/main.rs
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@stationapi/src/main.rs` around lines 222 - 229, fetch_metrics_port currently
does an unchecked add (fetch_port() + 1) which overflows when PORT==65535;
replace that with a checked add on the u16 from fetch_port(), e.g. call
fetch_port().checked_add(1) and handle the None case explicitly: log a clear
error via warn/error (including the offending port value) and exit(1) or return
a safe fallback (e.g. u16::MAX) instead of letting wraparound occur; update the
logic in fetch_metrics_port and any related tests to reflect this explicit
overflow handling.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@stationapi/src/main.rs`:
- Line 65: Change the hard-coded bind address for metrics to be configurable via
a new METRICS_HOST env var (defaulting to "127.0.0.1") instead of always
"0.0.0.0"; update the place that builds metrics_addr (currently using
metrics_port and creating metrics_addr) to read METRICS_HOST, fall back to
"127.0.0.1" when unset, and then format SocketAddr from that host and
metrics_port so /metrics is loopback by default and only publicly exposed when
METRICS_HOST is explicitly set.
- Around line 63-80: The dotenv local file is being loaded after metrics
initialization so METRICS_PORT/PORT read by fetch_metrics_port() isn't picking
up .env.local overrides; call dotenv::from_filename(".env.local") (and handle
its Result) at startup before invoking fetch_metrics_port() and before creating
the Prometheus exporter (i.e., move the dotenv::from_filename(".env.local") call
to precede the code that calls fetch_metrics_port() and
metrics_exporter_prometheus::PrometheusBuilder::new()), ensuring any env
overrides are applied when fetch_metrics_port() and the Prometheus builder run.

---

Duplicate comments:
In `@stationapi/src/main.rs`:
- Around line 222-229: fetch_metrics_port currently does an unchecked add
(fetch_port() + 1) which overflows when PORT==65535; replace that with a checked
add on the u16 from fetch_port(), e.g. call fetch_port().checked_add(1) and
handle the None case explicitly: log a clear error via warn/error (including the
offending port value) and exit(1) or return a safe fallback (e.g. u16::MAX)
instead of letting wraparound occur; update the logic in fetch_metrics_port and
any related tests to reflect this explicit overflow handling.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 098cf9b8-279b-4e6b-8d28-70e58aaea3bf

📥 Commits

Reviewing files that changed from the base of the PR and between 20c4813 and a6e2e86.

📒 Files selected for processing (1)
  • stationapi/src/main.rs

Comment thread stationapi/src/main.rs Outdated
Comment thread stationapi/src/main.rs Outdated
…low safety

- Move dotenv loading before metrics initialization so .env.local
  overrides are picked up by fetch_metrics_port()/fetch_port()
- Add METRICS_HOST env var (default: 127.0.0.1) instead of hard-coded
  0.0.0.0, so /metrics is loopback-only by default
- Use checked_add(1) in fetch_metrics_port() to prevent u16 overflow
  when PORT=65535

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@TinyKitten

Copy link
Copy Markdown
Member Author

@coderabbitai 指摘ないならapproveしてください

@coderabbitai

coderabbitai Bot commented Mar 20, 2026

Copy link
Copy Markdown
Contributor

Tip

For best results, initiate chat on the files or code changes.

🧩 Analysis chain

Let me read the actual code before approving.


🏁 Script executed:

#!/bin/bash
echo "=== Cargo.toml (relevant section) ==="
grep -A5 -B2 "metrics\|pin-project\|http\|tower" stationapi/Cargo.toml

echo ""
echo "=== main.rs (new additions) ==="
cat stationapi/src/main.rs

echo ""
echo "=== presentation.rs ==="
cat stationapi/src/presentation.rs

echo ""
echo "=== middleware.rs ==="
cat stationapi/src/presentation/middleware.rs

echo ""
echo "=== metrics.rs (middleware) ==="
cat stationapi/src/presentation/middleware/metrics.rs

Length of output: 11940


@TinyKitten コードを確認しました。全体的に実装は問題ありません。1点だけ情報共有として:

GrpcMetricsFuturestatus ラベル("ok" / "error")は Tower の Service が返す Result のトランスポートレベルの成否を反映しています。gRPC のアプリケーションレベルのステータスコード(NOT_FOUNDINVALID_ARGUMENT 等)は HTTP 200 + grpc-status トレーラーとして返るため、これらは "ok" として記録されます。Tower ミドルウェアとしては一般的なアプローチであり、バグではありませんが、将来的に gRPC ステータスコード単位での詳細な計測が必要になった場合はレスポンスのトレーラーを検査する拡張が必要になります。

その他の実装(checked_add(1) によるオーバーフロー防止、127.0.0.1 へのデフォルトバインド、dotenv の事前ロード等)はすべて適切です。承認します。

[approve]


🧠 Learnings used
Learnt from: CR
Repo: TrainLCD/StationAPI PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-03-15T16:09:23.603Z
Learning: Applies to stationapi/src/main.rs : Enable health checks via `grpc.health.v1.Health/Check` and gRPC Reflection through `tonic-health` and `tonic-reflection` in the gRPC server

@TinyKitten TinyKitten merged commit 55bdb62 into dev Mar 20, 2026
11 checks passed
@TinyKitten TinyKitten deleted the feature/prometheus-metrics branch March 20, 2026 09:25
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

feature 要望対応や課題解決

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant