Optimize GetStationsByLineGroupId/GetStationsByLineIdList performance#1439
Conversation
…ance Address three performance bottlenecks in update_station_vec_with_attributes: 1. Early return for train type queries when line_group_id is None - SQL `column = NULL` always returns empty, so skip the query entirely 2. Cache nearby bus line lookups by ~100m coordinate grid - Eliminates N+1 get_nearby_bus_lines queries (2 DB hits per station) - Stations on the same line share similar coordinates, so cache hit rate is high 3. Parallelize independent DB queries with tokio::try_join! - Phase 1: stations_by_group_id + lines_by_station_group_id - Phase 2: companies + train_types Expected improvement: ~70-80% latency reduction for affected RPCs. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Organization UI Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (1)
🚧 Files skipped from review as they are similar to previous changes (1)
📝 WalkthroughWalkthrough
Changes
Sequence Diagram(s)sequenceDiagram
participant Client
participant Interactor
participant StationRepo
participant LineRepo
participant CompanyRepo
participant TrainTypeRepo
participant BusRepo
Client->>Interactor: update_station_vec_with_attributes(request)
par phase 1
Interactor->>StationRepo: get_stations_by_group_id_vec(station_group_ids)
Interactor->>LineRepo: get_lines_by_station_group_id_vec(station_group_ids)
StationRepo-->>Interactor: stations
LineRepo-->>Interactor: lines
end
Interactor->>CompanyRepo: find_company_by_id_vec(company_ids)
alt line_group_id present
Interactor->>TrainTypeRepo: get_train_types_by_station_id_vec(Some(line_group_id), station_ids)
else line_group_id absent
TrainTypeRepo-->>Interactor: Ok([])
end
par phase 2
CompanyRepo-->>Interactor: companies
TrainTypeRepo-->>Interactor: train_types
end
Interactor->>BusRepo: get_nearby_candidates_by_grid(grid_key) (cache)
alt grid miss
BusRepo-->>Interactor: nearby stop candidates
Interactor->>BusRepo: get_bus_lines_by_group_ids(group_id_set) (cache)
BusRepo-->>Interactor: bus lines
else cache hit
BusRepo-->>Interactor: cached bus lines
end
Interactor-->>Client: merged station info (stations + lines + companies + train_types + nearby bus)
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Poem
🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
📝 Coding Plan
Comment |
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@stationapi/src/use_case/interactor/query.rs`:
- Around line 2189-2191: テストが line_group_id 分岐を検証していないため、モックの
ConfigurableMockTrainTypeRepository::get_types_by_station_id_vec
を受け取った第2引数(line_group_id)に応じて返す値を変えるか、受け取った引数を assert
するように修正して、update_station_vec_with_attributes と build_route_tree_map
の挙動を厳密に検証してください;具体的には get_types_by_station_id_vec が現在常に self.train_types.clone()
を返している箇所を変更し、受け取った line_group_id に一致する TrainType のみを返すか(期待する line_group_id
を渡したときのみテストが通る)、引数が Some(1000) 等の期待値と一致することを assert するロジックを追加してテストの分岐を固定してください。
- Around line 296-297: The bus_line_cache currently keyed by rounded grid
coordinates causes order-dependent incorrect station.lines because
get_nearby_bus_lines returns strict "within 300m" results and the cached value
from one station is reused for another in the same grid bucket; change caching
to either use a stable per-station key (e.g., station_g_cd) when caching final
filtered results, or only cache the candidate set (unfiltered results from
get_nearby_bus_lines) keyed by the grid and then re-run the distance/300m filter
and apply the station-specific limit per station (reference:
get_nearby_bus_lines, bus_line_cache, station.lines, station_g_cd, and the
limit=50 candidate retrieval), and apply the same fix to the other occurrence
around the 332-342 region.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: c359f4c6-bab1-4275-8a10-25af083b3baa
📒 Files selected for processing (1)
stationapi/src/use_case/interactor/query.rs
1. Bus cache: cache candidate set from get_by_coordinates by grid key, re-apply 300m haversine filter per station's exact coordinates. Previously cached final results, which could include/exclude bus stops incorrectly for stations at different positions within the same grid cell. 2. Mock: ConfigurableMockTrainTypeRepository.get_types_by_station_id_vec now returns empty when line_group_id is None (matching real SQL behavior) and asserts expected_line_group_id when configured. 3. Tests: use create_configurable_interactor_with_line_group_id for train type tests; add dedicated test for line_group_id=None skipping train types. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Deduplicate station_group_ids and company_ids before batch queries to reduce IN clause size for GetStationsByLineIdList (multiple lines can share station groups) - Cache bus line query results by sorted station_group_ids to avoid repeated get_by_station_group_id_vec calls in the per-station loop - Fix rustfmt formatting Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The logic was inlined into update_station_vec_with_attributes with per-station distance filtering and caching, making this method dead code. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
Actionable comments posted: 2
♻️ Duplicate comments (1)
stationapi/src/use_case/interactor/query.rs (1)
297-305:⚠️ Potential issue | 🟠 Major前回の懸念どおり、グリッド共有した候補 50 件では近傍バス路線をまだ取りこぼします。
各駅で 300m フィルタを掛け直しても、再利用している
candidates自体は最初の駅座標でのget_by_coordinates(..., Some(50), ...)結果です。同じグリッド内の別駅では、その 50 件の外にある停留所が 300m 圏内に入るので、密なエリアではstation.linesから正しいバス路線が落ちます。正確性を維持するなら、キャッシュ単位をstation_g_cdベースの最終結果まで下げるか、候補取得側のlimit依存を外す必要があります。Based on learnings: Verify that helper methods such as
update_station_vec_with_attributesandbuild_route_tree_mapbehave correctly when modifyingQueryInteractorAlso applies to: 340-358
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@stationapi/src/use_case/interactor/query.rs` around lines 297 - 305, 現在のグリッド単位キャッシュ(bus_candidate_cache keyed by (i64,i64)) と get_by_coordinates(..., Some(50), ...) の組み合わせは、別駅で 300m に入る停留所を取りこぼすため、候補キャッシュを station_g_cd(駅ごと)まで細かくするか、候補取得の limit に依存しない設計に変更してください: 具体的には QueryInteractor 内で get_by_coordinates を呼ぶ箇所を見つけ(参照: get_by_coordinates 呼び出しと bus_candidate_cache の使用箇所)、キャッシュキーを station.station_g_cd に変更して駅ごとの最終候補(既に 300m フィルタ済み)を保存するか、あるいは grid キャッシュにするなら get_by_coordinates の limit を除去して完全な候補セットを返すようにして bus_candidate_cache の再利用で欠落が起きないようにしてください。また、update_station_vec_with_attributes と build_route_tree_map が QueryInteractor の状態を変更する場合に副作用がないか確認し、必要なら station ごとの処理でクローン/ローカル変数を使って QueryInteractor の共有状態を汚染しないよう修正してください。
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@stationapi/src/use_case/interactor/query.rs`:
- Around line 271-279: The code builds company_ids/company_map only from the
initial lines collection so later-added nearby bus lines lack company entries;
update QueryInteractor so that before calling find_company_by_id_vec (and before
building company_map used in the common enrichment loop) you first collect all
lines including any nearby bus lines (the same logic used in the common
enrichment that produces RailAndBus entries), then recompute company_ids (from
lines.iter().map(|l| l.company_cd as u32), sort_unstable, dedup) and then call
self.find_company_by_id_vec(&company_ids) so the company_map covers added nearby
bus lines; apply the same fix in the other block mentioned (around the 377-421
logic) and ensure get_train_types_by_station_id_vec still receives the full
station_ids set.
---
Duplicate comments:
In `@stationapi/src/use_case/interactor/query.rs`:
- Around line 297-305: 現在のグリッド単位キャッシュ(bus_candidate_cache keyed by (i64,i64)) と
get_by_coordinates(..., Some(50), ...) の組み合わせは、別駅で 300m に入る停留所を取りこぼすため、候補キャッシュを
station_g_cd(駅ごと)まで細かくするか、候補取得の limit に依存しない設計に変更してください: 具体的には QueryInteractor
内で get_by_coordinates を呼ぶ箇所を見つけ(参照: get_by_coordinates 呼び出しと bus_candidate_cache
の使用箇所)、キャッシュキーを station.station_g_cd に変更して駅ごとの最終候補(既に 300m フィルタ済み)を保存するか、あるいは
grid キャッシュにするなら get_by_coordinates の limit を除去して完全な候補セットを返すようにして
bus_candidate_cache の再利用で欠落が起きないようにしてください。また、update_station_vec_with_attributes
と build_route_tree_map が QueryInteractor の状態を変更する場合に副作用がないか確認し、必要なら station
ごとの処理でクローン/ローカル変数を使って QueryInteractor の共有状態を汚染しないよう修正してください。
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 24a6ecba-86a1-48f9-a837-540c92e82d48
📒 Files selected for processing (1)
stationapi/src/use_case/interactor/query.rs
1. company_map now owns values and is extended with bus-only operator companies after nearby bus lines are added. Previously, bus lines from operators not in the initial rail lines had company=None. 2. Bus candidate cache key changed from grid coordinates to station_g_cd. Grid+LIMIT(50) could miss bus stops within 300m when a different station in the same grid cell had a different top-50 set. station_g_cd guarantees identical physical location, making the cache correct. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Summary
line_group_idがNoneの場合、SQLでcolumn = NULLは常にfalseで空結果になるため、クエリをスキップget_by_coordinatesの候補セットをキャッシュし、300mフィルタは駅ごとに正確に再適用。バス路線クエリ(get_by_station_group_id_vec)もキャッシュして重複排除tokio::try_join!で4つの逐次クエリを2フェーズに並列化station_group_ids/company_idsをdedup してIN句サイズを削減ConfigurableMockTrainTypeRepositoryがline_group_idに応じて返す値を分岐するように修正し、line_group_id=Noneでtrain typeがスキップされるテストを追加Test plan
SQLX_OFFLINE=true cargo test全テスト通過(新規テスト含む)grpc_request_duration_secondsで改善を確認GetStationsByLineGroupIdとGetStationsByLineIdListの応答時間を実測比較🤖 Generated with Claude Code
Summary by CodeRabbit
パフォーマンス改善
最適化
変更
テスト