Improve name_ipa generation and add IPA audit by TinyKitten · Pull Request #1429 · TrainLCD/StationAPI

TinyKitten · 2026-03-14T21:16:37Z

Summary

- generate station name_ipa from official romanized/English names when available, with katakana fallback
- expand the IPA lexicon and romaji handling to cover mixed English/Japanese station, line, and train-type names
- add a read-only make ipa-audit report for lines, stations, and train_types without changing data_validator failure behavior
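A minimal sketch of the romanized-first, katakana-fallback order described above. It assumes names are tokenized on whitespace and hyphens, and that one unknown token makes the whole romanized conversion fail. Every function here (`lookup_word`, `romanized_to_ipa`, `katakana_to_ipa_stub`, `name_ipa`) and every lexicon entry is a hypothetical stand-in, not the real code in `stationapi/src/domain/ipa.rs`.

```rust
/// Hypothetical lexicon: maps a lowercase romanized/English token to IPA.
fn lookup_word(token: &str) -> Option<&'static str> {
    match token {
        "haneda" => Some("haneda"),
        "airport" => Some("ˈɛɹpɔɹt"),
        "terminal" => Some("ˈtɝmɪnəl"),
        _ => None,
    }
}

/// Converts a romanized name token by token. Any unknown token yields None,
/// so a partial result never hides the katakana fallback.
fn romanized_to_ipa(roman: &str) -> Option<String> {
    let tokens: Vec<String> = roman
        .split(|c: char| c.is_whitespace() || c == '-')
        .filter(|t| !t.is_empty())
        .map(|t| t.to_lowercase())
        .collect();
    if tokens.is_empty() {
        return None;
    }
    let mut parts = Vec::with_capacity(tokens.len());
    for token in &tokens {
        parts.push(lookup_word(token)?);
    }
    Some(parts.join(" "))
}

/// Hypothetical katakana converter that knows only three characters.
fn katakana_to_ipa_stub(kana: &str) -> Option<String> {
    let mut out = String::new();
    for c in kana.chars() {
        out.push_str(match c {
            'ハ' => "ha",
            'ネ' => "ne",
            'ダ' => "da",
            _ => return None,
        });
    }
    if out.is_empty() {
        None
    } else {
        Some(out)
    }
}

/// Prefers the romanized name and falls back to the katakana reading.
fn name_ipa(katakana: &str, roman: Option<&str>) -> Option<String> {
    roman
        .and_then(romanized_to_ipa)
        .or_else(|| katakana_to_ipa_stub(katakana))
}

fn main() {
    // Romanized path: every token is in the lexicon.
    println!("{:?}", name_ipa("ハネダ", Some("Haneda Airport Terminal")));
    // "Kuko" is unknown, so the katakana reading is used instead.
    println!("{:?}", name_ipa("ハネダ", Some("Haneda Kuko")));
}
```

Returning `None` on any unknown token, rather than a partial string, is what lets `or_else` reach the katakana path. The review thread below notes that an unresolved numeric token "causes a full romanization fallback", which matches this all-or-nothing shape, but the exact rule in the PR is an assumption here.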

Verification

rustfmt stationapi/src/domain/ipa.rs tools/ipa_audit.rs
make ipa-audit
rustc --test /tmp/ipa_smoke.rs -o /tmp/ipa_smoke && /tmp/ipa_smoke --nocapture test_station_name_ipa_supports_english_and_digits test_station_name_ipa_supports_mixed_english_facility_words test_station_name_ipa_supports_common_line_words test_station_name_ipa_supports_bilabial_m_in_romaji test_station_name_ipa_supports_line_related_english_words test_station_name_ipa_supports_train_type_words

Summary by CodeRabbit

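New Features
- Enhanced IPA output for station and train-type names: added IPA notation that prefers the romanized spelling and falls back to katakana. API responses now include an IPA field derived from the romanized name.
- Added an IPA audit tool and build target (ipa-audit) that reports unresolved tokens along with examples.
Documentation
- Documented how to use the IPA audit and what it outputs.
Tests
- Added numerous tests covering the new conversion behavior.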

coderabbitai · 2026-03-14T21:16:55Z

Walkthrough

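Adds romaji input support and tokenization to the station-name IPA conversion, and introduces a read-only IPA audit tool plus a Makefile target that scan the CSVs and tally unresolved tokens.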

Changes

Cohort / File(s)	Summary
Documentation `AGENTS.md`	Added an "IPA coverage audit" entry, run with `make ipa-audit` (describes the read-only audit).
Build / CI `Makefile`	Added `ipa-audit` to `.PHONY`, plus an `ipa-audit` target that compiles and runs the Rust tool, and a matching help entry.
IPA Conversion Logic `stationapi/src/domain/ipa.rs`	Added the public function `station_name_to_ipa()` together with `romanized_name_to_ipa`, `word_to_ipa`, and helpers for tokenization, normalization, an English lexicon, number→IPA, and romaji→katakana conversion, plus many tests.
DTO / Use-case updates `stationapi/src/use_case/dto/station.rs`, `stationapi/src/use_case/dto/line.rs`, `stationapi/src/use_case/dto/train_type.rs`, `stationapi/src/use_case/interactor/query.rs`	Added `name_roman_ipa` to DTO assembly and switched to romaji-first IPA output via `station_name_to_ipa` (related tests added/updated).
Audit tool `tools/ipa_audit.rs`	Added a CLI tool that parses `data/2!lines.csv`, `data/3!stations.csv`, and `data/4!types.csv` and tallies, with examples, the tokens that `romanized_name_to_ipa`/`word_to_ipa` cannot resolve. Includes a custom CSV parser and token statistics.
Proto submodule `stationapi/proto`	Updated the submodule reference commit (no API changes detected).

Sequence Diagram(s)

sequenceDiagram
    autonumber
    participant Dev as Makefile / Dev
    participant Tool as ipa_audit (tools/ipa_audit.rs)
    participant Domain as stationapi::domain::ipa
    participant Data as CSV files (data/2!,3!,4!)
    participant Out as Stdout

    Dev->>Tool: make ipa-audit (compile & run)
    Tool->>Data: Open the CSVs and read rows
    loop For each name
      Tool->>Domain: romanized_name_to_ipa(name_roman) / word_to_ipa(token)
      Domain-->>Tool: IPA string or None
      alt Unresolved token
        Tool->>Tool: Record token counts and examples
      end
    end
    Tool->>Out: Print per-dataset totals and top unresolved tokens
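The loop in the diagram boils down to counting the tokens the resolver rejects and remembering one example name per token. A minimal sketch, assuming whitespace tokenization and a closure standing in for `word_to_ipa`; `tally_unresolved` and the sample names are hypothetical and not taken from `tools/ipa_audit.rs`.

```rust
use std::collections::HashMap;

/// Counts tokens the resolver cannot convert, keeping the first name each one
/// appeared in. Returns (token, count, example) sorted by descending count, then token.
fn tally_unresolved<F>(names: &[&str], resolve: F) -> Vec<(String, usize, String)>
where
    F: Fn(&str) -> Option<String>,
{
    let mut stats: HashMap<String, (usize, String)> = HashMap::new();
    for name in names {
        for token in name.split_whitespace() {
            if resolve(token).is_none() {
                let entry = stats
                    .entry(token.to_string())
                    .or_insert_with(|| (0, name.to_string()));
                entry.0 += 1;
            }
        }
    }
    let mut rows: Vec<(String, usize, String)> = stats
        .into_iter()
        .map(|(token, (count, example))| (token, count, example))
        .collect();
    // HashMap order is arbitrary, so sort for a stable report.
    rows.sort_by(|a, b| b.1.cmp(&a.1).then_with(|| a.0.cmp(&b.0)));
    rows
}

fn main() {
    let names = ["Keisei Ueno", "Keisei Narita Airport", "Ueno"];
    // Hypothetical resolver: only "Ueno" and "Airport" are known.
    let resolve = |t: &str| match t {
        "Ueno" | "Airport" => Some(t.to_lowercase()),
        _ => None,
    };
    for (token, count, example) in tally_unresolved(&names, resolve) {
        println!("{token}\t{count}\te.g. {example}");
    }
}
```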

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

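Add katakana to IPA conversion for SSML phoneme tags #1410: directly related to the romaji support and tokenization extensions (overlaps with and builds on domain/ipa.rs and the DTOs).
fix: change the e-row vowel from ɛ to e in IPA generation to fix Google TTS pronunciation #1418: related because its changes to vowel length and long-vowel rules affect the IPA generation pipeline.
fix: remove the IPA vowel-lengthening rule to fix Google TTS misreadings #1419: its changes to syllable boundaries and vowel handling touch the behavior of domain/ipa.rs.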

Suggested reviewers

10mocy

Poem

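🐰 Romaji came hopping in,
Counting names, searching for sounds,
The unresolved we only gather and show,
Hop, hop through the forest of CSVs,
The song of station names, bit by bit, falls into tune.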

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 56.82% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (2 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The PR title accurately describes the main changes: improving IPA generation from romanized names and adding an audit tool for IPA coverage across datasets.


coderabbitai

🧹 Nitpick comments (2)

stationapi/src/domain/ipa.rs (1)
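452-466: number_to_ipa supports only single digits

The current implementation converts only single digits (0-9) to IPA, so multi-digit numbers such as "Terminal 12" cannot be converted.

If this causes no problems for the current use cases, it can stay as-is, but be aware that multi-digit support may be needed in the future.
🔧 Reference implementation for multi-digit support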
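fn number_to_ipa(word: &str) -> Option<String> {
    // IPA for the digits 0-9, indexed by digit value.
    const DIGITS: &[&str] = &[
        "zɪɹoʊ", "wʌn", "tuː", "θɹiː", "fɔɹ",
        "faɪv", "sɪks", "sɛvən", "eɪt", "naɪn",
    ];

    // An empty token is not a number; avoid returning Some("").
    if word.is_empty() {
        return None;
    }
    // Read each digit separately ("12" -> "one two"); any non-digit yields None.
    let mut result = Vec::new();
    for c in word.chars() {
        let digit = c.to_digit(10)? as usize;
        result.push(DIGITS[digit]);
    }
    Some(result.join(" "))
}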
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@stationapi/src/domain/ipa.rs` around lines 452 - 466, The current
number_to_ipa function only handles single digits and returns Option<&'static
str>; change number_to_ipa to accept &str and return Option<String>, build a
const DIGITS array of IPA strings for 0–9, iterate word.chars(), use
c.to_digit(10)? to map each char to an index, collect the corresponding DIGITS
entries and join them with spaces into a single String, and update any callers
of number_to_ipa to handle Option<String> instead of Option<&'static str>.
tools/ipa_audit.rs (1)
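137-163: Limitations of the custom CSV parser

This CSV parser does not handle newlines inside quoted fields. If a quoted field in the CSV data contains a newline, the row may be parsed incorrectly.

Confirm that the current datasets (lines.csv, stations.csv, types.csv) contain no such fields, or consider switching to the csv crate if needed.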
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tools/ipa_audit.rs` around lines 137 - 163, The custom CSV parser function
parse_csv_line does not handle quoted fields that contain embedded newlines, so
multiline quoted fields will be parsed incorrectly; fix by either replacing
parse_csv_line and its callers with the Rust csv crate (use csv::Reader to read
records from the files lines.csv, stations.csv, types.csv so quoting/newlines
are handled correctly) or, if you must keep the custom parser, modify the
reading logic that feeds parse_csv_line to accumulate input lines until the
quote state is balanced (i.e., keep appending subsequent physical lines while
the number of unclosed quotes detected by parse_csv_line’s state machine is odd)
before calling parse_csv_line.
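Instead of accumulating physical lines until the quote count balances, a parser can consume the whole file as one character stream, which handles embedded newlines by construction. A minimal sketch assuming RFC 4180-style quoting (a doubled `""` is a literal quote); `parse_csv` is hypothetical and is not the `parse_csv_line` in `tools/ipa_audit.rs`. The `csv` crate remains the more robust option.

```rust
/// Parses CSV text into records, honoring quoted fields that contain commas,
/// doubled quotes (""), and embedded newlines.
fn parse_csv(input: &str) -> Vec<Vec<String>> {
    let mut records = Vec::new();
    let mut record = Vec::new();
    let mut field = String::new();
    let mut in_quotes = false;
    let mut chars = input.chars().peekable();

    while let Some(c) = chars.next() {
        if in_quotes {
            match c {
                '"' if chars.peek() == Some(&'"') => {
                    field.push('"'); // escaped quote
                    chars.next();
                }
                '"' => in_quotes = false,
                _ => field.push(c), // includes '\n' inside quotes
            }
        } else {
            match c {
                '"' => in_quotes = true,
                ',' => record.push(std::mem::take(&mut field)),
                '\r' => {} // tolerate CRLF line endings
                '\n' => {
                    record.push(std::mem::take(&mut field));
                    records.push(std::mem::take(&mut record));
                }
                _ => field.push(c),
            }
        }
    }
    // Flush a final record that is not newline-terminated.
    if !field.is_empty() || !record.is_empty() {
        record.push(field);
        records.push(record);
    }
    records
}

fn main() {
    let data = "id,name\n1,\"Tokyo\nStation\"\n2,\"Say \"\"hi\"\"\"\n";
    for record in parse_csv(data) {
        println!("{record:?}");
    }
}
```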

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 74699f56-a88b-49ac-9267-ade76d065cc8

📥 Commits

Reviewing files that changed from the base of the PR and between c520850 and 431377b.

📒 Files selected for processing (6)

AGENTS.md
Makefile
stationapi/src/domain/ipa.rs
stationapi/src/use_case/dto/station.rs
stationapi/src/use_case/interactor/query.rs
tools/ipa_audit.rs

coderabbitai

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@stationapi/src/domain/ipa.rs`:
- Around line 128-130: word_to_ipa currently delegates numeric tokens to
number_to_ipa but only handles single-digit results, so multi-digit strings like
"10" return None and cause a full romanization fallback; change the numeric
branch in word_to_ipa to: call number_to_ipa(&normalized) first and if it
returns None, iterate normalized.chars(), convert each digit separately via
number_to_ipa(&digit.to_string()), collect and concatenate the per-digit IPA
results into a single String (return Some when all digits succeed, otherwise
return None). Apply the same change to the other numeric-handling site that
mirrors this logic (the later block around the second numeric-handling section).

In `@stationapi/src/use_case/dto/line.rs`:
- Around line 11-14: The computed name_ipa can become Some("") which should be
normalized to None; update the logic around station_name_to_ipa /
replace_line_name_suffix / katakana_to_ipa so any empty-string result is
converted to None (e.g., after either branch, filter out empty results or use
and_then to return None when the produced IPA is an empty string), referencing
the name_ipa binding and inputs line.line_name_r and line.line_name_k to ensure
downstream code always sees None instead of Some("").
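For the first inline comment above (multi-digit tokens in `word_to_ipa`), a minimal sketch of the try-whole-token-then-per-digit fallback, assuming a single-digit `number_to_ipa` that returns `Option<&'static str>` as the first review describes. The IPA table mirrors the earlier reference implementation; `numeric_token_to_ipa` is hypothetical, and joining digits with spaces is an assumption (the comment only says "concatenate").

```rust
/// Single-digit lookup, modeled on the behavior the review describes.
fn number_to_ipa(word: &str) -> Option<&'static str> {
    match word {
        "0" => Some("zɪɹoʊ"),
        "1" => Some("wʌn"),
        "2" => Some("tuː"),
        "3" => Some("θɹiː"),
        "4" => Some("fɔɹ"),
        "5" => Some("faɪv"),
        "6" => Some("sɪks"),
        "7" => Some("sɛvən"),
        "8" => Some("eɪt"),
        "9" => Some("naɪn"),
        _ => None,
    }
}

/// Numeric branch of a hypothetical word_to_ipa: try the whole token first,
/// then fall back to reading each digit separately.
fn numeric_token_to_ipa(normalized: &str) -> Option<String> {
    if let Some(ipa) = number_to_ipa(normalized) {
        return Some(ipa.to_string());
    }
    if normalized.is_empty() {
        return None;
    }
    let parts = normalized
        .chars()
        .map(|digit| number_to_ipa(&digit.to_string()))
        .collect::<Option<Vec<_>>>()?; // None if any char is not a digit
    Some(parts.join(" "))
}

fn main() {
    println!("{:?}", numeric_token_to_ipa("7"));
    println!("{:?}", numeric_token_to_ipa("10"));
    println!("{:?}", numeric_token_to_ipa("1a"));
}
```

Collecting into `Option<Vec<_>>` keeps the "all digits succeed, otherwise None" rule from the comment in a single expression.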

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: f7cdd303-5e3d-4227-abf4-9b599f5c4acb

📥 Commits

Reviewing files that changed from the base of the PR and between 431377b and 89505a8.

📒 Files selected for processing (2)

stationapi/src/domain/ipa.rs
stationapi/src/use_case/dto/line.rs

coderabbitai

Actionable comments posted: 1

♻️ Duplicate comments (1)

stationapi/src/use_case/dto/line.rs (1)

11-16: ⚠️ Potential issue | 🟡 Minor

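A whitespace-only name_ipa is still not normalized to None.

Line 16 filters out only the empty string, so when line_name_k consists solely of whitespace, the space that katakana_to_ipa passes through survives as Some(" "). Trimming the final result before the emptiness check makes it easier to handle downstream.

Suggested fix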

         let name_ipa = station_name_to_ipa("", line.line_name_r.as_deref())
             .or_else(|| {
                 let (stem, suffix_ipa) = replace_line_name_suffix(&line.line_name_k);
                 katakana_to_ipa(stem).map(|ipa| format!("{ipa}{suffix_ipa}"))
             })
-            .filter(|ipa| !ipa.is_empty());
+            .map(|ipa| ipa.trim().to_string())
+            .filter(|ipa| !ipa.is_empty());

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@stationapi/src/use_case/dto/line.rs` around lines 11 - 16, name_ipa may end
up Some(" ") when line_name_k is only whitespace; change the pipeline around
station_name_to_ipa / replace_line_name_suffix / katakana_to_ipa so that the
produced IPA string is trimmed before empty check—e.g. after mapping to
format!("{ipa}{suffix_ipa}") or after katakana_to_ipa, call trim() (or map to a
trimmed String) and then filter out empty strings; update the chain that builds
name_ipa (variables/functions: name_ipa, station_name_to_ipa,
replace_line_name_suffix, katakana_to_ipa) to normalize whitespace to None.
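Both line.rs comments reduce to one normalization step: trim, then treat an empty result as `None`. A minimal sketch of that step as standalone helpers; `normalize_ipa` and `build_name_ipa` are hypothetical names, not functions in the PR, and the fallback closure stands in for the `replace_line_name_suffix` / `katakana_to_ipa` branch.

```rust
/// Trims surrounding whitespace and maps an empty or whitespace-only IPA string to None.
fn normalize_ipa(ipa: Option<String>) -> Option<String> {
    ipa.map(|s| s.trim().to_string())
        .filter(|s| !s.is_empty())
}

/// Mirrors the shape of the suggested chain: romanized result first, then a
/// katakana-derived fallback, normalized once at the end.
fn build_name_ipa(
    roman_ipa: Option<String>,
    katakana_ipa: impl FnOnce() -> Option<String>,
) -> Option<String> {
    normalize_ipa(roman_ipa.or_else(katakana_ipa))
}

fn main() {
    // Whitespace-only katakana passthrough, e.g. from a blank line_name_k.
    println!("{:?}", build_name_ipa(None, || Some(" ".to_string())));
    println!("{:?}", build_name_ipa(None, || Some(" ueno ".to_string())));
}
```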

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@stationapi/src/domain/ipa.rs`:
- Around line 118-141: The word_to_ipa function misses mixed alphanumeric tokens
(e.g., "B快速") because the digits-only branch only handles pure digits and
everything else falls through to romaji_to_katakana; add a branch in word_to_ipa
that detects when normalized contains both ASCII letters and digits (or
interleaved digit runs) and then process the token by splitting it into runs:
for digit runs call number_to_ipa (per-run or per-digit) and append their IPA;
for ASCII letter runs call romaji_to_katakana and then katakana_to_ipa and
append that result; leave non-ASCII/non-alnum runs to the existing
romaji_to_katakana fallback. Reference: word_to_ipa, normalize_name_token,
number_to_ipa, romaji_to_katakana, katakana_to_ipa.
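The core of this suggestion is splitting a token into runs of a single character class before converting each run. A minimal sketch, assuming three classes (ASCII letters, ASCII digits, everything else); `RunKind` and `split_runs` are hypothetical, and dispatching each run to `number_to_ipa` or to `romaji_to_katakana` + `katakana_to_ipa` is left to the caller.

```rust
/// Character class of a run inside a token such as "B快速".
#[derive(Debug, PartialEq, Clone, Copy)]
enum RunKind {
    Letters,
    Digits,
    Other,
}

fn kind_of(c: char) -> RunKind {
    if c.is_ascii_alphabetic() {
        RunKind::Letters
    } else if c.is_ascii_digit() {
        RunKind::Digits
    } else {
        RunKind::Other
    }
}

/// Splits a token into maximal runs of the same kind, preserving order.
fn split_runs(token: &str) -> Vec<(RunKind, String)> {
    let mut runs: Vec<(RunKind, String)> = Vec::new();
    for c in token.chars() {
        let kind = kind_of(c);
        if let Some((last_kind, text)) = runs.last_mut() {
            if *last_kind == kind {
                text.push(c); // extend the current run
                continue;
            }
        }
        runs.push((kind, c.to_string())); // start a new run
    }
    runs
}

fn main() {
    for (kind, text) in split_runs("B快速") {
        println!("{kind:?}: {text}");
    }
    println!("{:?}", split_runs("T12a"));
}
```

`"B快速"` splits into a `Letters` run `"B"` and an `Other` run `"快速"`, so the ASCII part and the kanji part can take different conversion paths.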

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 8e891420-5e9f-4bee-b488-d72e2f31c1de

📥 Commits

Reviewing files that changed from the base of the PR and between 89505a8 and e2cca48.

📒 Files selected for processing (2)

stationapi/src/domain/ipa.rs
stationapi/src/use_case/dto/line.rs

coderabbitai

🧹 Nitpick comments (1)

stationapi/src/use_case/dto/line.rs (1)
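16-17: There is a redundant filter, but the behavior is correct.

station_name_to_ipa already applies .filter(|ipa| !ipa.is_empty()) internally (see ipa.rs:72), so the outer .filter(|ipa| !ipa.is_empty()) on Line 17 is technically redundant.

That said, keeping it as defensive programming is also reasonable, so this fix is not required.
♻️ Optional: remove the redundant filter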
-        let name_roman_ipa =
-            station_name_to_ipa("", line.line_name_r.as_deref()).filter(|ipa| !ipa.is_empty());
+        let name_roman_ipa = station_name_to_ipa("", line.line_name_r.as_deref());
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@stationapi/src/use_case/dto/line.rs` around lines 16 - 17, The assignment to
name_roman_ipa applies .filter(|ipa| !ipa.is_empty()) twice; remove the outer
filter so that name_roman_ipa is set directly from station_name_to_ipa("",
line.line_name_r.as_deref()), since station_name_to_ipa already filters empty
strings; update the expression that constructs name_roman_ipa (referencing
station_name_to_ipa and line.line_name_r) to omit the redundant .filter call.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: effd7ae3-e612-4d12-bf89-d0180c904399

📥 Commits

Reviewing files that changed from the base of the PR and between e2cca48 and 5f80a6d.

📒 Files selected for processing (5)

stationapi/proto
stationapi/src/use_case/dto/line.rs
stationapi/src/use_case/dto/station.rs
stationapi/src/use_case/dto/train_type.rs
stationapi/src/use_case/interactor/query.rs

✅ Files skipped from review due to trivial changes (1)

stationapi/proto

🚧 Files skipped from review as they are similar to previous changes (2)

stationapi/src/use_case/interactor/query.rs
stationapi/src/use_case/dto/station.rs

Improve name_ipa generation and add IPA audit

431377b

github-actions Bot added the feature label ("feature requests and issue resolution") Mar 14, 2026

coderabbitai Bot reviewed Mar 14, 2026

coderabbitai Bot approved these changes Mar 14, 2026

Fix Keisei IPA pronunciation

89505a8

TinyKitten self-assigned this Mar 15, 2026

coderabbitai Bot requested changes Mar 15, 2026

Comment thread stationapi/src/domain/ipa.rs

Comment thread stationapi/src/use_case/dto/line.rs Outdated

Handle multi-digit IPA tokens

e2cca48

coderabbitai Bot requested changes Mar 15, 2026

Comment thread stationapi/src/domain/ipa.rs

TinyKitten added 2 commits March 15, 2026 03:22

Split katakana and roman IPA fields

daa3e40

Update proto submodule for IPA fields

5f80a6d

coderabbitai Bot reviewed Mar 15, 2026

coderabbitai Bot approved these changes Mar 15, 2026

TinyKitten merged commit 6b28ca7 into dev Mar 15, 2026
11 checks passed

TinyKitten deleted the feature/ipa-audit-and-name-ipa branch March 15, 2026 03:40

coderabbitai Bot mentioned this pull request Mar 15, 2026

Fix IPA generation for compound station names #1430

Merged

TinyKitten merged 5 commits into dev from feature/ipa-audit-and-name-ipa

TinyKitten commented Mar 14, 2026 • edited by coderabbitai Bot

coderabbitai Bot commented Mar 14, 2026 • edited

1 participant
