Improve name_ipa generation and add IPA audit#1429
Conversation
📝 WalkthroughWalkthrough駅名IPA変換にローマ字入力対応とトークン化処理を追加し、CSVを走査して未解決トークンを集計する読み取り専用のIPA監査ツールとMakefileターゲットを導入した。 Changes
Sequence Diagram(s)sequenceDiagram
autonumber
participant Dev as Makefile / Dev
participant Tool as ipa_audit (tools/ipa_audit.rs)
participant Domain as stationapi::domain::ipa
participant Data as CSV files (data/2!,3!,4!)
participant Out as Stdout
Dev->>Tool: make ipa-audit(コンパイル & 実行)
Tool->>Data: CSV を開いて行を読む
loop 各名称ごと
Tool->>Domain: romanized_name_to_ipa(name_roman) / word_to_ipa(token)
Domain-->>Tool: IPA 文字列または None
alt 未解決トークン
Tool->>Tool: トークン集計と例を記録
end
end
Tool->>Out: データセット別集計と未解決トップトークンを出力
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Possibly related PRs
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches
🧪 Generate unit tests (beta)
📝 Coding Plan
Comment |
There was a problem hiding this comment.
🧹 Nitpick comments (2)
stationapi/src/domain/ipa.rs (1)
452-466:number_to_ipaは単一桁のみをサポート現在の実装では単一桁(0-9)のみがIPAに変換されます。「Terminal 12」のような複数桁の数字は変換できません。
現状の使用ケースで問題がなければそのままで構いませんが、将来的に複数桁のサポートが必要になる可能性があることを認識しておいてください。
🔧 複数桁サポートの参考実装
fn number_to_ipa(word: &str) -> Option<String> { const DIGITS: &[&str] = &[ "zɪɹoʊ", "wʌn", "tuː", "θɹiː", "fɔɹ", "faɪv", "sɪks", "sɛvən", "eɪt", "naɪn", ]; let mut result = Vec::new(); for c in word.chars() { let digit = c.to_digit(10)? as usize; result.push(DIGITS[digit]); } Some(result.join(" ")) }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@stationapi/src/domain/ipa.rs` around lines 452 - 466, The current number_to_ipa function only handles single digits and returns Option<&'static str>; change number_to_ipa to accept &str and return Option<String>, build a const DIGITS array of IPA strings for 0–9, iterate word.chars(), use c.to_digit(10)? to map each char to an index, collect the corresponding DIGITS entries and join them with spaces into a single String, and update any callers of number_to_ipa to handle Option<String> instead of Option<&'static str>.tools/ipa_audit.rs (1)
137-163: カスタムCSVパーサーの制限事項このCSVパーサーはクォート内の改行を処理しません。CSVデータにクォートされたフィールド内で改行が含まれている場合、正しくパースされない可能性があります。
現在のデータセット(
lines.csv、stations.csv、types.csv)にそのようなデータが存在しないことを確認するか、必要に応じてcsvクレートの使用を検討してください。🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tools/ipa_audit.rs` around lines 137 - 163, The custom CSV parser function parse_csv_line does not handle quoted fields that contain embedded newlines, so multiline quoted fields will be parsed incorrectly; fix by either replacing parse_csv_line and its callers with the Rust csv crate (use csv::Reader to read records from the files lines.csv, stations.csv, types.csv so quoting/newlines are handled correctly) or, if you must keep the custom parser, modify the reading logic that feeds parse_csv_line to accumulate input lines until the quote state is balanced (i.e., keep appending subsequent physical lines while the number of unclosed quotes detected by parse_csv_line’s state machine is odd) before calling parse_csv_line.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@stationapi/src/domain/ipa.rs`:
- Around line 452-466: The current number_to_ipa function only handles single
digits and returns Option<&'static str>; change number_to_ipa to accept &str and
return Option<String>, build a const DIGITS array of IPA strings for 0–9,
iterate word.chars(), use c.to_digit(10)? to map each char to an index, collect
the corresponding DIGITS entries and join them with spaces into a single String,
and update any callers of number_to_ipa to handle Option<String> instead of
Option<&'static str>.
In `@tools/ipa_audit.rs`:
- Around line 137-163: The custom CSV parser function parse_csv_line does not
handle quoted fields that contain embedded newlines, so multiline quoted fields
will be parsed incorrectly; fix by either replacing parse_csv_line and its
callers with the Rust csv crate (use csv::Reader to read records from the files
lines.csv, stations.csv, types.csv so quoting/newlines are handled correctly)
or, if you must keep the custom parser, modify the reading logic that feeds
parse_csv_line to accumulate input lines until the quote state is balanced
(i.e., keep appending subsequent physical lines while the number of unclosed
quotes detected by parse_csv_line’s state machine is odd) before calling
parse_csv_line.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 74699f56-a88b-49ac-9267-ade76d065cc8
📒 Files selected for processing (6)
AGENTS.mdMakefilestationapi/src/domain/ipa.rsstationapi/src/use_case/dto/station.rsstationapi/src/use_case/interactor/query.rstools/ipa_audit.rs
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@stationapi/src/domain/ipa.rs`:
- Around line 128-130: word_to_ipa currently delegates numeric tokens to
number_to_ipa but only handles single-digit results, so multi-digit strings like
"10" return None and cause a full romanization fallback; change the numeric
branch in word_to_ipa to: call number_to_ipa(&normalized) first and if it
returns None, iterate normalized.chars(), convert each digit separately via
number_to_ipa(&digit.to_string()), collect and concatenate the per-digit IPA
results into a single String (return Some when all digits succeed, otherwise
return None). Apply the same change to the other numeric-handling site that
mirrors this logic (the later block around the second numeric-handling section).
In `@stationapi/src/use_case/dto/line.rs`:
- Around line 11-14: The computed name_ipa can become Some("") which should be
normalized to None; update the logic around station_name_to_ipa /
replace_line_name_suffix / katakana_to_ipa so any empty-string result is
converted to None (e.g., after either branch, filter out empty results or use
and_then to return None when the produced IPA is an empty string), referencing
the name_ipa binding and inputs line.line_name_r and line.line_name_k to ensure
downstream code always sees None instead of Some("").
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: f7cdd303-5e3d-4227-abf4-9b599f5c4acb
📒 Files selected for processing (2)
stationapi/src/domain/ipa.rsstationapi/src/use_case/dto/line.rs
There was a problem hiding this comment.
Actionable comments posted: 1
♻️ Duplicate comments (1)
stationapi/src/use_case/dto/line.rs (1)
11-16:⚠️ Potential issue | 🟡 Minor空白だけの
name_ipaはまだNoneに正規化されません。Line 16 は空文字だけを除外しているので、
line_name_kが空白のみのケースではkatakana_to_ipaの透過スペースがSome(" ")のまま残ります。最終結果をtrim()してから判定した方が downstream で扱いやすいです。修正案
let name_ipa = station_name_to_ipa("", line.line_name_r.as_deref()) .or_else(|| { let (stem, suffix_ipa) = replace_line_name_suffix(&line.line_name_k); katakana_to_ipa(stem).map(|ipa| format!("{ipa}{suffix_ipa}")) }) - .filter(|ipa| !ipa.is_empty()); + .map(|ipa| ipa.trim().to_string()) + .filter(|ipa| !ipa.is_empty());🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@stationapi/src/use_case/dto/line.rs` around lines 11 - 16, name_ipa may end up Some(" ") when line_name_k is only whitespace; change the pipeline around station_name_to_ipa / replace_line_name_suffix / katakana_to_ipa so that the produced IPA string is trimmed before empty check—e.g. after mapping to format!("{ipa}{suffix_ipa}") or after katakana_to_ipa, call trim() (or map to a trimmed String) and then filter out empty strings; update the chain that builds name_ipa (variables/functions: name_ipa, station_name_to_ipa, replace_line_name_suffix, katakana_to_ipa) to normalize whitespace to None.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@stationapi/src/domain/ipa.rs`:
- Around line 118-141: The word_to_ipa function misses mixed alphanumeric tokens
(e.g., "B快速") because the digits-only branch only handles pure digits and
everything else falls through to romaji_to_katakana; add a branch in word_to_ipa
that detects when normalized contains both ASCII letters and digits (or
interleaved digit runs) and then process the token by splitting it into runs:
for digit runs call number_to_ipa (per-run or per-digit) and append their IPA;
for ASCII letter runs call romaji_to_katakana and then katakana_to_ipa and
append that result; leave non-ASCII/non-alnum runs to the existing
romaji_to_katakana fallback. Reference: word_to_ipa, normalize_name_token,
number_to_ipa, romaji_to_katakana, katakana_to_ipa.
---
Duplicate comments:
In `@stationapi/src/use_case/dto/line.rs`:
- Around line 11-16: name_ipa may end up Some(" ") when line_name_k is only
whitespace; change the pipeline around station_name_to_ipa /
replace_line_name_suffix / katakana_to_ipa so that the produced IPA string is
trimmed before empty check—e.g. after mapping to format!("{ipa}{suffix_ipa}") or
after katakana_to_ipa, call trim() (or map to a trimmed String) and then filter
out empty strings; update the chain that builds name_ipa (variables/functions:
name_ipa, station_name_to_ipa, replace_line_name_suffix, katakana_to_ipa) to
normalize whitespace to None.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 8e891420-5e9f-4bee-b488-d72e2f31c1de
📒 Files selected for processing (2)
stationapi/src/domain/ipa.rsstationapi/src/use_case/dto/line.rs
There was a problem hiding this comment.
🧹 Nitpick comments (1)
stationapi/src/use_case/dto/line.rs (1)
16-17: 冗長なフィルターがありますが、動作は正しいです。
station_name_to_ipaは内部で既に.filter(|ipa| !ipa.is_empty())を適用しています(ipa.rs:72 参照)。そのため、Line 17 の外側の.filter(|ipa| !ipa.is_empty())は技術的には冗長です。ただし、防御的プログラミングとして残しておくことも合理的なので、必須の修正ではありません。
♻️ オプション: 冗長なフィルターを削除
- let name_roman_ipa = - station_name_to_ipa("", line.line_name_r.as_deref()).filter(|ipa| !ipa.is_empty()); + let name_roman_ipa = station_name_to_ipa("", line.line_name_r.as_deref());🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@stationapi/src/use_case/dto/line.rs` around lines 16 - 17, The assignment to name_roman_ipa applies .filter(|ipa| !ipa.is_empty()) twice; remove the outer filter so that name_roman_ipa is set directly from station_name_to_ipa("", line.line_name_r.as_deref()), since station_name_to_ipa already filters empty strings; update the expression that constructs name_roman_ipa (referencing station_name_to_ipa and line.line_name_r) to omit the redundant .filter call.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@stationapi/src/use_case/dto/line.rs`:
- Around line 16-17: The assignment to name_roman_ipa applies .filter(|ipa|
!ipa.is_empty()) twice; remove the outer filter so that name_roman_ipa is set
directly from station_name_to_ipa("", line.line_name_r.as_deref()), since
station_name_to_ipa already filters empty strings; update the expression that
constructs name_roman_ipa (referencing station_name_to_ipa and line.line_name_r)
to omit the redundant .filter call.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: effd7ae3-e612-4d12-bf89-d0180c904399
📒 Files selected for processing (5)
stationapi/protostationapi/src/use_case/dto/line.rsstationapi/src/use_case/dto/station.rsstationapi/src/use_case/dto/train_type.rsstationapi/src/use_case/interactor/query.rs
✅ Files skipped from review due to trivial changes (1)
- stationapi/proto
🚧 Files skipped from review as they are similar to previous changes (2)
- stationapi/src/use_case/interactor/query.rs
- stationapi/src/use_case/dto/station.rs
Summary
name_ipafrom official romanized/English names when available, with katakana fallbackmake ipa-auditreport forlines,stations, andtrain_typeswithout changingdata_validatorfailure behaviorVerification
rustfmt stationapi/src/domain/ipa.rs tools/ipa_audit.rsmake ipa-auditrustc --test /tmp/ipa_smoke.rs -o /tmp/ipa_smoke && /tmp/ipa_smoke --nocapture test_station_name_ipa_supports_english_and_digits test_station_name_ipa_supports_mixed_english_facility_words test_station_name_ipa_supports_common_line_words test_station_name_ipa_supports_bilabial_m_in_romaji test_station_name_ipa_supports_line_related_english_words test_station_name_ipa_supports_train_type_wordsSummary by CodeRabbit
新機能
ドキュメント
テスト