Skip to content

Improve name_ipa generation and add IPA audit#1429

Merged
TinyKitten merged 5 commits into
devfrom
feature/ipa-audit-and-name-ipa
Mar 15, 2026
Merged

Improve name_ipa generation and add IPA audit#1429
TinyKitten merged 5 commits into
devfrom
feature/ipa-audit-and-name-ipa

Conversation

@TinyKitten

@TinyKitten TinyKitten commented Mar 14, 2026

Copy link
Copy Markdown
Member

Summary

  • generate station name_ipa from official romanized/English names when available, with katakana fallback
  • expand the IPA lexicon and romaji handling to cover mixed English/Japanese station, line, and train-type names
  • add a read-only make ipa-audit report for lines, stations, and train_types without changing data_validator failure behavior

Verification

  • rustfmt stationapi/src/domain/ipa.rs tools/ipa_audit.rs
  • make ipa-audit
  • rustc --test /tmp/ipa_smoke.rs -o /tmp/ipa_smoke && /tmp/ipa_smoke --nocapture test_station_name_ipa_supports_english_and_digits test_station_name_ipa_supports_mixed_english_facility_words test_station_name_ipa_supports_common_line_words test_station_name_ipa_supports_bilabial_m_in_romaji test_station_name_ipa_supports_line_related_english_words test_station_name_ipa_supports_train_type_words

Summary by CodeRabbit

  • 新機能

    • 駅名・種別名のIPA出力を強化:ローマ字表記を優先しカタカナへフォールバックするIPA表記を追加。API応答にローマ字由来のIPAフィールドが含まれます。
    • IPA監査ツールとビルドターゲット(ipa-audit)を追加し、未解決トークンと例をレポート。
  • ドキュメント

    • IPA監査の利用方法と出力内容を追記。
  • テスト

    • 新しい変換挙動を検証する多数のテストを追加。

@github-actions github-actions Bot added the feature 要望対応や課題解決 label Mar 14, 2026
@coderabbitai

coderabbitai Bot commented Mar 14, 2026

Copy link
Copy Markdown
Contributor
📝 Walkthrough

Walkthrough

駅名IPA変換にローマ字入力対応とトークン化処理を追加し、CSVを走査して未解決トークンを集計する読み取り専用のIPA監査ツールとMakefileターゲットを導入した。

Changes

Cohort / File(s) Summary
Documentation
AGENTS.md
make ipa-auditで実行する「IPA coverage audit」項目を追加(読み取り専用の監査説明)。
Build / CI
Makefile
.PHONYipa-auditを追加。Rustツールをコンパイルして実行するipa-auditターゲットとhelp出力を追加。
IPA Conversion Logic
stationapi/src/domain/ipa.rs
公開関数station_name_to_ipa()およびromanized_name_to_ipaword_to_ipa等のトークン化・正規化・英語辞書・数値→IPA・romaji→片仮名変換補助群と多数のテストを追加。
DTO / Use-case updates
stationapi/src/use_case/dto/station.rs, stationapi/src/use_case/dto/line.rs, stationapi/src/use_case/dto/train_type.rs, stationapi/src/use_case/interactor/query.rs
DTO組み立てにname_roman_ipaを追加し、station_name_to_ipaを利用してローマ字優先のIPA情報を出力するよう変更(関連テスト追加/更新)。
Audit tool
tools/ipa_audit.rs
data/2!lines.csv,data/3!stations.csv,data/4!types.csvを解析し、romanized_name_to_ipa/word_to_ipaで解決できないトークンを集計・例示するCLIツールを追加。カスタムCSVパーサとトークン統計を含む。
Proto submodule
stationapi/proto
submodule参照コミット更新(API変更は検出されず)。

Sequence Diagram(s)

sequenceDiagram
    autonumber
    participant Dev as Makefile / Dev
    participant Tool as ipa_audit (tools/ipa_audit.rs)
    participant Domain as stationapi::domain::ipa
    participant Data as CSV files (data/2!,3!,4!)
    participant Out as Stdout

    Dev->>Tool: make ipa-audit(コンパイル & 実行)
    Tool->>Data: CSV を開いて行を読む
    loop 各名称ごと
      Tool->>Domain: romanized_name_to_ipa(name_roman) / word_to_ipa(token)
      Domain-->>Tool: IPA 文字列または None
      alt 未解決トークン
        Tool->>Tool: トークン集計と例を記録
      end
    end
    Tool->>Out: データセット別集計と未解決トップトークンを出力
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

Suggested reviewers

  • 10mocy

Poem

🐰 ローマ字ぴょんと跳ねて来た、
名前を数えて音を探すよ、
未解決は集めて見せるだけ、
CSVの林をぴょんぴょん進む、
駅名の歌、少しずつ整うよ。

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 56.82% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The PR title accurately describes the main changes: improving IPA generation from romanized names and adding an audit tool for IPA coverage across datasets.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
  • 📝 Generate docstrings (stacked PR)
  • 📝 Generate docstrings (commit on current branch)
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch feature/ipa-audit-and-name-ipa
📝 Coding Plan
  • Generate coding plan for human review comments

Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (2)
stationapi/src/domain/ipa.rs (1)

452-466: number_to_ipa は単一桁のみをサポート

現在の実装では単一桁(0-9)のみがIPAに変換されます。「Terminal 12」のような複数桁の数字は変換できません。

現状の使用ケースで問題がなければそのままで構いませんが、将来的に複数桁のサポートが必要になる可能性があることを認識しておいてください。

🔧 複数桁サポートの参考実装
fn number_to_ipa(word: &str) -> Option<String> {
    const DIGITS: &[&str] = &[
        "zɪɹoʊ", "wʌn", "tuː", "θɹiː", "fɔɹ",
        "faɪv", "sɪks", "sɛvən", "eɪt", "naɪn",
    ];
    
    let mut result = Vec::new();
    for c in word.chars() {
        let digit = c.to_digit(10)? as usize;
        result.push(DIGITS[digit]);
    }
    Some(result.join(" "))
}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@stationapi/src/domain/ipa.rs` around lines 452 - 466, The current
number_to_ipa function only handles single digits and returns Option<&'static
str>; change number_to_ipa to accept &str and return Option<String>, build a
const DIGITS array of IPA strings for 0–9, iterate word.chars(), use
c.to_digit(10)? to map each char to an index, collect the corresponding DIGITS
entries and join them with spaces into a single String, and update any callers
of number_to_ipa to handle Option<String> instead of Option<&'static str>.
tools/ipa_audit.rs (1)

137-163: カスタムCSVパーサーの制限事項

このCSVパーサーはクォート内の改行を処理しません。CSVデータにクォートされたフィールド内で改行が含まれている場合、正しくパースされない可能性があります。

現在のデータセット(lines.csvstations.csvtypes.csv)にそのようなデータが存在しないことを確認するか、必要に応じて csv クレートの使用を検討してください。

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tools/ipa_audit.rs` around lines 137 - 163, The custom CSV parser function
parse_csv_line does not handle quoted fields that contain embedded newlines, so
multiline quoted fields will be parsed incorrectly; fix by either replacing
parse_csv_line and its callers with the Rust csv crate (use csv::Reader to read
records from the files lines.csv, stations.csv, types.csv so quoting/newlines
are handled correctly) or, if you must keep the custom parser, modify the
reading logic that feeds parse_csv_line to accumulate input lines until the
quote state is balanced (i.e., keep appending subsequent physical lines while
the number of unclosed quotes detected by parse_csv_line’s state machine is odd)
before calling parse_csv_line.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@stationapi/src/domain/ipa.rs`:
- Around line 452-466: The current number_to_ipa function only handles single
digits and returns Option<&'static str>; change number_to_ipa to accept &str and
return Option<String>, build a const DIGITS array of IPA strings for 0–9,
iterate word.chars(), use c.to_digit(10)? to map each char to an index, collect
the corresponding DIGITS entries and join them with spaces into a single String,
and update any callers of number_to_ipa to handle Option<String> instead of
Option<&'static str>.

In `@tools/ipa_audit.rs`:
- Around line 137-163: The custom CSV parser function parse_csv_line does not
handle quoted fields that contain embedded newlines, so multiline quoted fields
will be parsed incorrectly; fix by either replacing parse_csv_line and its
callers with the Rust csv crate (use csv::Reader to read records from the files
lines.csv, stations.csv, types.csv so quoting/newlines are handled correctly)
or, if you must keep the custom parser, modify the reading logic that feeds
parse_csv_line to accumulate input lines until the quote state is balanced
(i.e., keep appending subsequent physical lines while the number of unclosed
quotes detected by parse_csv_line’s state machine is odd) before calling
parse_csv_line.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 74699f56-a88b-49ac-9267-ade76d065cc8

📥 Commits

Reviewing files that changed from the base of the PR and between c520850 and 431377b.

📒 Files selected for processing (6)
  • AGENTS.md
  • Makefile
  • stationapi/src/domain/ipa.rs
  • stationapi/src/use_case/dto/station.rs
  • stationapi/src/use_case/interactor/query.rs
  • tools/ipa_audit.rs

@TinyKitten TinyKitten self-assigned this Mar 15, 2026

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@stationapi/src/domain/ipa.rs`:
- Around line 128-130: word_to_ipa currently delegates numeric tokens to
number_to_ipa but only handles single-digit results, so multi-digit strings like
"10" return None and cause a full romanization fallback; change the numeric
branch in word_to_ipa to: call number_to_ipa(&normalized) first and if it
returns None, iterate normalized.chars(), convert each digit separately via
number_to_ipa(&digit.to_string()), collect and concatenate the per-digit IPA
results into a single String (return Some when all digits succeed, otherwise
return None). Apply the same change to the other numeric-handling site that
mirrors this logic (the later block around the second numeric-handling section).

In `@stationapi/src/use_case/dto/line.rs`:
- Around line 11-14: The computed name_ipa can become Some("") which should be
normalized to None; update the logic around station_name_to_ipa /
replace_line_name_suffix / katakana_to_ipa so any empty-string result is
converted to None (e.g., after either branch, filter out empty results or use
and_then to return None when the produced IPA is an empty string), referencing
the name_ipa binding and inputs line.line_name_r and line.line_name_k to ensure
downstream code always sees None instead of Some("").

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: f7cdd303-5e3d-4227-abf4-9b599f5c4acb

📥 Commits

Reviewing files that changed from the base of the PR and between 431377b and 89505a8.

📒 Files selected for processing (2)
  • stationapi/src/domain/ipa.rs
  • stationapi/src/use_case/dto/line.rs

Comment thread stationapi/src/domain/ipa.rs
Comment thread stationapi/src/use_case/dto/line.rs Outdated

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (1)
stationapi/src/use_case/dto/line.rs (1)

11-16: ⚠️ Potential issue | 🟡 Minor

空白だけの name_ipa はまだ None に正規化されません。

Line 16 は空文字だけを除外しているので、line_name_k が空白のみのケースでは katakana_to_ipa の透過スペースが Some(" ") のまま残ります。最終結果を trim() してから判定した方が downstream で扱いやすいです。

修正案
         let name_ipa = station_name_to_ipa("", line.line_name_r.as_deref())
             .or_else(|| {
                 let (stem, suffix_ipa) = replace_line_name_suffix(&line.line_name_k);
                 katakana_to_ipa(stem).map(|ipa| format!("{ipa}{suffix_ipa}"))
             })
-            .filter(|ipa| !ipa.is_empty());
+            .map(|ipa| ipa.trim().to_string())
+            .filter(|ipa| !ipa.is_empty());
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@stationapi/src/use_case/dto/line.rs` around lines 11 - 16, name_ipa may end
up Some(" ") when line_name_k is only whitespace; change the pipeline around
station_name_to_ipa / replace_line_name_suffix / katakana_to_ipa so that the
produced IPA string is trimmed before empty check—e.g. after mapping to
format!("{ipa}{suffix_ipa}") or after katakana_to_ipa, call trim() (or map to a
trimmed String) and then filter out empty strings; update the chain that builds
name_ipa (variables/functions: name_ipa, station_name_to_ipa,
replace_line_name_suffix, katakana_to_ipa) to normalize whitespace to None.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@stationapi/src/domain/ipa.rs`:
- Around line 118-141: The word_to_ipa function misses mixed alphanumeric tokens
(e.g., "B快速") because the digits-only branch only handles pure digits and
everything else falls through to romaji_to_katakana; add a branch in word_to_ipa
that detects when normalized contains both ASCII letters and digits (or
interleaved digit runs) and then process the token by splitting it into runs:
for digit runs call number_to_ipa (per-run or per-digit) and append their IPA;
for ASCII letter runs call romaji_to_katakana and then katakana_to_ipa and
append that result; leave non-ASCII/non-alnum runs to the existing
romaji_to_katakana fallback. Reference: word_to_ipa, normalize_name_token,
number_to_ipa, romaji_to_katakana, katakana_to_ipa.

---

Duplicate comments:
In `@stationapi/src/use_case/dto/line.rs`:
- Around line 11-16: name_ipa may end up Some(" ") when line_name_k is only
whitespace; change the pipeline around station_name_to_ipa /
replace_line_name_suffix / katakana_to_ipa so that the produced IPA string is
trimmed before empty check—e.g. after mapping to format!("{ipa}{suffix_ipa}") or
after katakana_to_ipa, call trim() (or map to a trimmed String) and then filter
out empty strings; update the chain that builds name_ipa (variables/functions:
name_ipa, station_name_to_ipa, replace_line_name_suffix, katakana_to_ipa) to
normalize whitespace to None.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 8e891420-5e9f-4bee-b488-d72e2f31c1de

📥 Commits

Reviewing files that changed from the base of the PR and between 89505a8 and e2cca48.

📒 Files selected for processing (2)
  • stationapi/src/domain/ipa.rs
  • stationapi/src/use_case/dto/line.rs

Comment thread stationapi/src/domain/ipa.rs

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (1)
stationapi/src/use_case/dto/line.rs (1)

16-17: 冗長なフィルターがありますが、動作は正しいです。

station_name_to_ipa は内部で既に .filter(|ipa| !ipa.is_empty()) を適用しています(ipa.rs:72 参照)。そのため、Line 17 の外側の .filter(|ipa| !ipa.is_empty()) は技術的には冗長です。

ただし、防御的プログラミングとして残しておくことも合理的なので、必須の修正ではありません。

♻️ オプション: 冗長なフィルターを削除
-        let name_roman_ipa =
-            station_name_to_ipa("", line.line_name_r.as_deref()).filter(|ipa| !ipa.is_empty());
+        let name_roman_ipa = station_name_to_ipa("", line.line_name_r.as_deref());
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@stationapi/src/use_case/dto/line.rs` around lines 16 - 17, The assignment to
name_roman_ipa applies .filter(|ipa| !ipa.is_empty()) twice; remove the outer
filter so that name_roman_ipa is set directly from station_name_to_ipa("",
line.line_name_r.as_deref()), since station_name_to_ipa already filters empty
strings; update the expression that constructs name_roman_ipa (referencing
station_name_to_ipa and line.line_name_r) to omit the redundant .filter call.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@stationapi/src/use_case/dto/line.rs`:
- Around line 16-17: The assignment to name_roman_ipa applies .filter(|ipa|
!ipa.is_empty()) twice; remove the outer filter so that name_roman_ipa is set
directly from station_name_to_ipa("", line.line_name_r.as_deref()), since
station_name_to_ipa already filters empty strings; update the expression that
constructs name_roman_ipa (referencing station_name_to_ipa and line.line_name_r)
to omit the redundant .filter call.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: effd7ae3-e612-4d12-bf89-d0180c904399

📥 Commits

Reviewing files that changed from the base of the PR and between e2cca48 and 5f80a6d.

📒 Files selected for processing (5)
  • stationapi/proto
  • stationapi/src/use_case/dto/line.rs
  • stationapi/src/use_case/dto/station.rs
  • stationapi/src/use_case/dto/train_type.rs
  • stationapi/src/use_case/interactor/query.rs
✅ Files skipped from review due to trivial changes (1)
  • stationapi/proto
🚧 Files skipped from review as they are similar to previous changes (2)
  • stationapi/src/use_case/interactor/query.rs
  • stationapi/src/use_case/dto/station.rs

@TinyKitten TinyKitten merged commit 6b28ca7 into dev Mar 15, 2026
11 checks passed
@TinyKitten TinyKitten deleted the feature/ipa-audit-and-name-ipa branch March 15, 2026 03:40
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

feature 要望対応や課題解決

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant