Optimize IPA computation with caching and batch train type queries#1434
Conversation
- IPA変換の二重計算を排除する統合関数(compute_ipa/compute_line_ipa)を追加 - LazyLock+RwLock<HashMap>によるプロセスグローバルなメモ化キャッシュを導入 同じ(name_katakana, name_roman)に対するIPA計算を1回に削減 - Station/Line/TrainType各DTOの3つの個別IPA呼び出しをキャッシュ付き統合関数に置換 - query.rs内のルート検索でのインラインIPA呼び出しも同様に置換 - train_type取得のN+1クエリをUNNESTベースのバッチクエリに置換 https://claude.ai/code/session_01C529f2ARzy4nrGEak9kgE4
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Organization UI Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (1)
✅ Files skipped from review due to trivial changes (1)
📝 WalkthroughWalkthroughIpaResultによるIPA/TTS出力の単一化と駅/路線向けグローバルキャッシュ導入、および列車種別リポジトリに複数(line_group_id, line_id)ペアを一括取得するバッチAPI追加。DTOやインタラクターはこれらの新APIへ切替済み。 変更内容
シーケンス図sequenceDiagram
participant Client as クライアント
participant Cache as IPAキャッシュ
participant Compute as IPA計算
participant DTO as DTO変換
Client->>Cache: compute_ipa_cached(katakana, roman)
Cache->>Cache: 読み取りロックで検索
alt キャッシュヒット
Cache-->>Client: IpaResult を返却
else キャッシュミス
Cache->>Compute: compute_ipa / compute_line_ipa を呼出し
Compute->>Compute: name_ipa, name_roman_ipa, tts_segments を生成
Compute-->>Cache: 計算結果を返却
Cache->>Cache: 書き込みロックで挿入
Cache-->>Client: IpaResult を返却
end
Client->>DTO: IpaResult からフィールドを抽出してマッピング
DTO-->>Client: Grpc メッセージ返却
推定コードレビュー工数🎯 4 (Complex) | ⏱️ ~60 minutes 関連PRの可能性
推奨ラベル
ポエム
🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
📝 Coding Plan
Comment |
There was a problem hiding this comment.
🧹 Nitpick comments (3)
stationapi/src/use_case/interactor/query.rs (2)
620-635:tt_lookup_pairsは重複排除してから渡したいです。同じ
line_group_cdに複数のtrain_typesがあると、同一の(line_group_cd, line_cd)がその数だけ配列へ積まれます。戻り値はHashMapなので重複に意味はなく、UNNESTの入力だけが膨らんでしまいます。今回の最適化効果を落とさないためにも、ここで一度集合化してからクエリへ渡すのがよさそうです。♻️ 例
- let tt_lookup_pairs: Vec<(u32, u32)> = train_types + let tt_lookup_pairs: Vec<(u32, u32)> = train_types .iter() .filter_map(|tt| tt.line_group_cd.map(|lgc| lgc as u32)) .flat_map(|lgc| { lines .iter() .filter(move |l| l.line_group_cd == Some(lgc as i32)) .map(move |l| (lgc, l.line_cd as u32)) }) + .collect::<HashSet<_>>() + .into_iter() .collect();🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@stationapi/src/use_case/interactor/query.rs` around lines 620 - 635, tt_lookup_pairs currently collects duplicate (line_group_cd, line_cd) tuples which unnecessarily expands the query input; change the building of tt_lookup_pairs (based on train_types and lines) to deduplicate pairs before calling self.train_type_repository.find_by_line_group_id_and_line_id_vec, e.g. collect into a HashSet or use iterator .unique() equivalent and then convert back to Vec so tt_lookup_pairs contains unique (u32,u32) pairs only; the rest of the flow (tt_by_pair) can remain unchanged.
1945-1950: このモックだと batch 経路の検証が空振りになります。
get_train_types_by_station_id()の新経路はfind_by_line_group_id_and_line_id_vec()に依存しますが、この実装はself.train_typesを見ずに常に空HashMapを返しています。今後この経路のテストを追加してもline.train_typeが必ずNoneになって、本番相当の検証ができません。少なくともテストデータから(line_group_id, line_id) -> TrainTypeを返せる形にしておきたいです。🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@stationapi/src/use_case/interactor/query.rs` around lines 1945 - 1950, The mock implementation of find_by_line_group_id_and_line_id_vec currently always returns an empty HashMap, causing batch-route tests to miss train_type data; update the method (find_by_line_group_id_and_line_id_vec) to iterate the provided key slice and build a HashMap by looking up matching entries in self.train_types (or the mock's test-data store), mapping each (line_group_id, line_id) to its TrainType when present, and return that map wrapped in Ok(...) while preserving the Result<..., DomainError> signature.stationapi/src/infrastructure/train_type_repository.rs (1)
420-504: この batch SQL には integration test を 1 本足しておきたいです。
UNNEST($1, $2)の並列ペアリングとDISTINCT ONの重複解消は、SQL を少し触っただけでも壊れやすいポイントです。重複 pair と miss を含むケースを固定しておくと、この最適化経路の回帰を拾いやすくなります。🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@stationapi/src/infrastructure/train_type_repository.rs` around lines 420 - 504, Add an integration test for get_by_line_group_id_and_line_id_vec that verifies UNNEST($1::int[], $2::int[]) pairs and DISTINCT ON deduplication: seed the DB with types, stations and station_station_types rows so one pair has multiple sst rows (different sst.id) and another pair has no matching stations; call get_by_line_group_id_and_line_id_vec with a pairs array containing duplicate pairs and a non-existent pair, then assert the returned HashMap contains exactly one TrainType for the duplicated pair (matching the sst row expected by ORDER BY sst.id) and no entry for the missing pair; ensure the test uses the same DB connection setup/transaction helpers your integration tests use so it can run isolated and roll back.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@stationapi/src/infrastructure/train_type_repository.rs`:
- Around line 420-504: Add an integration test for
get_by_line_group_id_and_line_id_vec that verifies UNNEST($1::int[], $2::int[])
pairs and DISTINCT ON deduplication: seed the DB with types, stations and
station_station_types rows so one pair has multiple sst rows (different sst.id)
and another pair has no matching stations; call
get_by_line_group_id_and_line_id_vec with a pairs array containing duplicate
pairs and a non-existent pair, then assert the returned HashMap contains exactly
one TrainType for the duplicated pair (matching the sst row expected by ORDER BY
sst.id) and no entry for the missing pair; ensure the test uses the same DB
connection setup/transaction helpers your integration tests use so it can run
isolated and roll back.
In `@stationapi/src/use_case/interactor/query.rs`:
- Around line 620-635: tt_lookup_pairs currently collects duplicate
(line_group_cd, line_cd) tuples which unnecessarily expands the query input;
change the building of tt_lookup_pairs (based on train_types and lines) to
deduplicate pairs before calling
self.train_type_repository.find_by_line_group_id_and_line_id_vec, e.g. collect
into a HashSet or use iterator .unique() equivalent and then convert back to Vec
so tt_lookup_pairs contains unique (u32,u32) pairs only; the rest of the flow
(tt_by_pair) can remain unchanged.
- Around line 1945-1950: The mock implementation of
find_by_line_group_id_and_line_id_vec currently always returns an empty HashMap,
causing batch-route tests to miss train_type data; update the method
(find_by_line_group_id_and_line_id_vec) to iterate the provided key slice and
build a HashMap by looking up matching entries in self.train_types (or the
mock's test-data store), mapping each (line_group_id, line_id) to its TrainType
when present, and return that map wrapped in Ok(...) while preserving the
Result<..., DomainError> signature.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: f997eaee-faca-4d24-8063-a0b61c59f465
📒 Files selected for processing (7)
stationapi/src/domain/ipa.rsstationapi/src/domain/repository/train_type_repository.rsstationapi/src/infrastructure/train_type_repository.rsstationapi/src/use_case/dto/line.rsstationapi/src/use_case/dto/station.rsstationapi/src/use_case/dto/train_type.rsstationapi/src/use_case/interactor/query.rs
…ne_id_vec - Collect tt_lookup_pairs into HashSet before converting to Vec to avoid sending duplicate (line_group_cd, line_cd) pairs to the DB query - Update ConfigurableMockTrainTypeRepository's find_by_line_group_id_and_line_id_vec to actually look up matching TrainType entries by line_group_cd instead of always returning an empty HashMap https://claude.ai/code/session_01C529f2ARzy4nrGEak9kgE4
There was a problem hiding this comment.
🧹 Nitpick comments (1)
stationapi/src/use_case/interactor/query.rs (1)
1947-1962: モック内で単体APIとバッチAPIの意味論を揃えてください。この実装だと
find_by_line_group_id_and_line_idは常にNoneなのに、find_by_line_group_id_and_line_id_vecは値を返すため、同一リポジトリ内で挙動が不一致です。テストが実装差分を見逃す原因になります。共通ヘルパーに寄せて、両メソッドの返却規則を一致させるのを推奨します。差分例(挙動統一の方向性)
impl TrainTypeRepository for ConfigurableMockTrainTypeRepository { async fn find_by_line_group_id_and_line_id( &self, - _: u32, - _: u32, + line_group_id: u32, + _line_id: u32, ) -> Result<Option<TrainType>, DomainError> { - Ok(None) + Ok(self + .train_types + .iter() + .find(|t| t.line_group_cd == Some(line_group_id as i32)) + .cloned()) } async fn find_by_line_group_id_and_line_id_vec( &self, pairs: &[(u32, u32)], ) -> Result<std::collections::HashMap<(u32, u32), TrainType>, DomainError> { let mut map = std::collections::HashMap::new(); for &(lg, lc) in pairs { - if let Some(tt) = self - .train_types - .iter() - .find(|t| t.line_group_cd == Some(lg as i32)) - { + if let Some(tt) = self.find_by_line_group_id_and_line_id(lg, lc).await? { map.insert((lg, lc), tt.clone()); } } Ok(map) } }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@stationapi/src/use_case/interactor/query.rs` around lines 1947 - 1962, 現在のモックは単体メソッド find_by_line_group_id_and_line_id が常に None になる一方で、バッチ find_by_line_group_id_and_line_id_vec は値を返しており振る舞いが不一致です。両者を同じ返却規則に揃えるため、マッチ判定ロジックを共通ヘルパー(例: matches_line(line_group_id, line_id, &TrainType))に抽出し、find_by_line_group_id_and_line_id はそのヘルパーを使って Option を返すように、find_by_line_group_id_and_line_id_vec も同じヘルパーで一致するペアのみを HashMap に挿入するよう修正してください(現在の実装が line_id を無視しているので必ず両方をチェックすること)。
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@stationapi/src/use_case/interactor/query.rs`:
- Around line 1947-1962: 現在のモックは単体メソッド find_by_line_group_id_and_line_id が常に
None になる一方で、バッチ find_by_line_group_id_and_line_id_vec
は値を返しており振る舞いが不一致です。両者を同じ返却規則に揃えるため、マッチ判定ロジックを共通ヘルパー(例:
matches_line(line_group_id, line_id,
&TrainType))に抽出し、find_by_line_group_id_and_line_id はそのヘルパーを使って Option
を返すように、find_by_line_group_id_and_line_id_vec も同じヘルパーで一致するペアのみを HashMap
に挿入するよう修正してください(現在の実装が line_id を無視しているので必ず両方をチェックすること)。
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: cab46d7d-ab16-4121-884e-0e23c023430a
📒 Files selected for processing (1)
stationapi/src/use_case/interactor/query.rs
Extract matches_pair(tt, line_group_id, line_id) that checks both line_group_cd and line_id (via tt.lines when populated). Both find_by_line_group_id_and_line_id and find_by_line_group_id_and_line_id_vec now use this helper, fixing the inconsistency where the single-pair method always returned None while the batch method returned data. https://claude.ai/code/session_01C529f2ARzy4nrGEak9kgE4
Summary
This PR improves performance by introducing IPA computation caching and optimizing train type lookups through batch queries instead of individual database calls.
Key Changes
IPA Computation Caching: Added
compute_ipa_cached()andcompute_line_ipa_cached()functions with thread-safe memoization usingLazyLockandRwLock. These functions consolidate redundant IPA computations (katakana-to-IPA, roman-to-IPA, and TTS segments) into a single pass, eliminating double-computation ofstation_name_to_tts_segments().Batch Train Type Queries: Introduced
find_by_line_group_id_and_line_id_vec()method to theTrainTypeRepositorytrait that accepts multiple (line_group_id, line_id) pairs and returns results in a HashMap. This replaces individual database queries in the query interactor with a single batch query.Query Optimization: Modified the station query interactor to collect all (line_group_cd, line_cd) pairs upfront and perform a single batch lookup instead of querying the database for each line-train type combination.
DTO Simplification: Updated
Station,Line, andTrainTypeDTO conversions to use the new cached IPA functions, reducing code duplication and improving maintainability.Database Query Enhancement: Implemented
get_by_line_group_id_and_line_id_vec()in the PostgreSQL repository usingUNNESTto efficiently join multiple pairs in a single query withDISTINCT ONto handle duplicates.Notable Implementation Details
UNNESTfunction to unnest two parallel arrays of IDs and join them efficiently.https://claude.ai/code/session_01C529f2ARzy4nrGEak9kgE4
Summary by CodeRabbit
リリースノート
パフォーマンス改善
Refactor
Tests