Add katakana to IPA conversion for SSML phoneme tags #1410
Rule-based converter that handles:
- Palatalized sounds (拗音): キョ→kʲo, チョ→t͡ɕo, ジョ→ʤo, etc.
- Moraic nasal (撥音) assimilation: ン→m/ɲ/ŋ/n/ɴ based on context
- Geminate consonants (促音): ッ doubles following onset
- Long vowels: オウ→oː, オオ→oː, ー→vowelː

Replaces 31 hardcoded IPA mappings in Cloud Functions tts.ts with systematic pattern-based generation from katakana input.

https://claude.ai/code/session_01Pjo9E2fzdLZEkvNqxXAPeQ
Points to local branch adding optional name_ipa to Station and Line messages for TTS IPA transcription support. https://claude.ai/code/session_01Pjo9E2fzdLZEkvNqxXAPeQ
Compute name_ipa from name_katakana using katakana_to_ipa() in:
- Station DTO (From<Station> for GrpcStation)
- Line DTO (From<Line> for GrpcLine)
- StationMinimal construction in route queries

The IPA field is optional and only set when the katakana input produces a non-empty transcription.

https://claude.ai/code/session_01Pjo9E2fzdLZEkvNqxXAPeQ
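The "optional, only set when non-empty" rule from this commit could look roughly like the sketch below. `StationDto` and `to_dto` are hypothetical stand-ins (the real `GrpcStation` has many more fields), and the tiny `katakana_to_ipa` here is a placeholder that only knows two kana; it returns a plain `String`, as the converter did at this point in the PR.

```rust
/// Placeholder converter (assumption): only handles ア and カ.
/// At this commit the real function returned a plain String.
fn katakana_to_ipa(katakana: &str) -> String {
    katakana
        .chars()
        .filter_map(|c| match c {
            'ア' => Some("a"),
            'カ' => Some("ka"),
            _ => None,
        })
        .collect()
}

/// Hypothetical, trimmed-down DTO used only for illustration.
#[derive(Debug, PartialEq)]
struct StationDto {
    name_katakana: String,
    name_ipa: Option<String>,
}

/// Builds the DTO, leaving name_ipa unset when the transcription is empty.
fn to_dto(name_katakana: &str) -> StationDto {
    let ipa = katakana_to_ipa(name_katakana);
    StationDto {
        name_katakana: name_katakana.to_string(),
        name_ipa: if ipa.is_empty() { None } else { Some(ipa) },
    }
}
```

Keeping the field as `Option<String>` lets clients distinguish "no transcription available" from an empty string.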
No actionable comments were generated in the recent review. 🎉
ℹ️ Recent review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID:
📒 Files selected for processing (1)
📝 Walkthrough
A new domain module that converts katakana to IPA
Changes
Sequence Diagram(s)
sequenceDiagram
participant Client
participant Interactor
participant IPA_Module as IPA_Module
participant Proto as GrpcResponse
Client->>Interactor: Request station/line info
Interactor->>IPA_Module: katakana_to_ipa(name_katakana)
IPA_Module-->>Interactor: Option<String> (name_ipa)
Interactor->>Proto: Build GrpcStation/StationMinimal with name_ipa
Proto-->>Client: Response (includes name_ipa)
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes
Possibly related PRs
🚥 Pre-merge checks | ✅ 3 passed
Actionable comments posted: 3
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
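In `@stationapi/proto`:
- Line 1: CI is blocked because the submodule SHA cannot be resolved on the
remote, so fix the stationapi/proto submodule reference: either update the
target commit SHA to a reachable commit, or push that commit to the submodule's
correct remote repository, then run git submodule update --init --recursive in
the parent repository and confirm that the reference resolves (if necessary,
correct the URL in .gitmodules to point at the right remote).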
In `@stationapi/src/domain/ipa.rs`:
- Around line 283-285: The branch that detects voiced affricates for gemination
uses next_ipa.starts_with("d͡") which doesn't match this module's mapping for
the voiced affricate (dʑ / ʤ); update the condition in the block that checks
next_ipa (the one pushing into output) to test for the actual mapped symbol
(e.g., next_ipa.starts_with("dʑ") or the mapped "ʤ" variant used elsewhere)
instead of "d͡" so the "ッジ" gemination path behaves consistently with the
module's dʑ/ʤ mapping.
- Around line 225-256: The nasal_for_following function misses the palatal "j"
series, causing sequences like "シンヨ..." to yield "ɴj" instead of palatal
assimilation; update the palatal branch in nasal_for_following to include checks
for 'j' (and any common 'j' combos, e.g. starts_with('j') or starts_with("jʲ"))
so that those inputs return "ɲ" like the other palatal targets.
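The assimilation rule under discussion can be sketched as follows. This is a minimal illustration, not the module's actual code: the palatal prefix list follows the review's suggested fix, while the bilabial, velar, and alveolar prefix lists and the helper `starts_with_any` are assumptions made for the example.

```rust
/// Returns true when `s` begins with any of the given prefixes.
fn starts_with_any(s: &str, prefixes: &[&str]) -> bool {
    prefixes.iter().any(|p| s.starts_with(*p))
}

/// Picks the realization of ン from the IPA of the following mora.
fn nasal_for_following(next_ipa: &str) -> &'static str {
    if starts_with_any(next_ipa, &["m", "b", "p"]) {
        "m" // bilabial assimilation (prefix list is an assumption)
    } else if starts_with_any(next_ipa, &["ɲ", "dʑ", "ʤ", "ɕ", "j", "ç", "ɡʲ", "kʲ"]) {
        // Palatal branch, checked before the velar and alveolar branches so
        // that "kʲ" and "dʑ" are not caught by the plain "k" / "d" prefixes.
        "ɲ"
    } else if starts_with_any(next_ipa, &["k", "ɡ"]) {
        "ŋ" // velar assimilation (prefix list is an assumption)
    } else if starts_with_any(next_ipa, &["t", "d", "n", "ɾ"]) {
        "n" // alveolar assimilation (prefix list is an assumption)
    } else {
        "ɴ" // before vowels or at the end of the word
    }
}
```

With `j` in the palatal list, a following ヨ (`jo`) selects `ɲ` rather than falling through to `ɴ`.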
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 525e2427-ea7e-4295-a7c9-b3ad4af91f0d
📒 Files selected for processing (6)
- stationapi/proto
- stationapi/src/domain.rs
- stationapi/src/domain/ipa.rs
- stationapi/src/use_case/dto/line.rs
- stationapi/src/use_case/dto/station.rs
- stationapi/src/use_case/interactor/query.rs
♻️ Duplicate comments (2)
stationapi/src/domain/ipa.rs (2)
224-231: ⚠️ Potential issue | 🟠 Major
The palatal assimilation of ン does not account for the j series. With the condition at Lines 224-231, sequences such as シンヨ... become ...ɴj... and are not palatally assimilated. Please include j (and ç, if needed) in the palatal branch.
Suggested fix:
 fn nasal_for_following(next_ipa: &str) -> &'static str {
 @@
     } else if next_ipa.starts_with('ɲ')
         || next_ipa.starts_with("dʑ")
         || next_ipa.starts_with('ʤ')
         || next_ipa.starts_with('ɕ')
+        || next_ipa.starts_with('j')
+        || next_ipa.starts_with('ç')
         || next_ipa.starts_with("ɡʲ")
         || next_ipa.starts_with("kʲ")
     {
         "ɲ" // palatal assimilation
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@stationapi/src/domain/ipa.rs` around lines 224 - 231, The palatal assimilation branch that returns "ɲ" (the conditional checking next_ipa.starts_with(...) in stationapi/src/domain/ipa.rs) currently omits the palatal glide 'j' (and optionally 'ç'), so sequences like シンヨ... become ...ɴj... instead of assimilating; update that conditional (the block inspecting next_ipa) to also check for next_ipa.starts_with('j') (and add 'ç' if desired) so those cases are treated as palatal and return "ɲ".
273-277: ⚠️ Potential issue | 🟠 Major
The sokuon check for voiced affricates does not match the current mapping.
The starts_with("d͡") check on Line 275 does not match the ジ-series notation (dʑ/ʤ) used in this file. Gemination of ッジ sequences therefore does not work as intended, so the conditions should be aligned.
Suggested fix:
-    if next_ipa.starts_with("t͡ɕ")
-        || next_ipa.starts_with("t͡s")
-        || next_ipa.starts_with("d͡")
-    {
+    if next_ipa.starts_with("t͡ɕ")
+        || next_ipa.starts_with("t͡s")
+    {
         output.push('t');
+    } else if next_ipa.starts_with("dʑ") || next_ipa.starts_with('ʤ') {
+        output.push('d');
     } else {
         let (onset, _) = split_onset(next_ipa);
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@stationapi/src/domain/ipa.rs` around lines 273 - 277, The voicing-affricate check uses next_ipa.starts_with("d͡") which doesn't match the file's "ジ" mappings (dʑ / ʤ), so update the condition in the block that currently contains next_ipa.starts_with("t͡ɕ") || next_ipa.starts_with("t͡s") || next_ipa.starts_with("d͡") to instead check the actual mapped symbols, e.g. next_ipa.starts_with("dʑ") || next_ipa.starts_with("ʤ") (keep the t͡ɕ and t͡s checks as-is) so the sokuon handling that pushes 't' behaves consistently with the rest of the mapping.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 8e028ae0-d79e-4339-bbb2-eaf540a073dd
📒 Files selected for processing (4)
- stationapi/src/domain/ipa.rs
- stationapi/src/use_case/dto/line.rs
- stationapi/src/use_case/dto/station.rs
- stationapi/src/use_case/interactor/query.rs
🚧 Files skipped from review as they are similar to previous changes (2)
- stationapi/src/use_case/dto/line.rs
- stationapi/src/use_case/interactor/query.rs
…icate gemination
- Add 'j' and 'ç' to palatal assimilation check in nasal_for_following so ン before ヤ行/ヒ行 correctly produces ɲ (e.g. シンヨコハマ → ɕiɲjokohama)
- Fix sokuon (ッ) handling for voiced affricates: replace dead d͡ check with actual mapped symbols dʑ/ʤ, geminating with 'd' instead of 't'
🧹 Nitpick comments (1)
stationapi/src/domain/ipa.rs (1)
271-285: Please add a regression test for the voiced-affricate gemination logic.
Lines 277-278 fix the dʑ/ʤ branch, but the corresponding tests (ッジ/ッジュ cases) have not been added. Including at least one case tied directly to this fix will help prevent a recurrence.
🔧 Suggested addition:
 #[cfg(test)]
 mod tests {
     use super::*;
 @@
     fn test_shin_yokohama() {
         // ン before ヨ(j) → ɲ (palatal assimilation)
         assert_eq!(katakana_to_ipa("シンヨコハマ"), "ɕiɲjokohama");
     }
+
+    #[test]
+    fn test_geminate_voiced_affricate_ji() {
+        // ッ + ジ (ʤi): only the stop portion is geminated, giving dʤi
+        assert_eq!(katakana_to_ipa("エッジ"), "edʤi");
+    }

     #[test]
     fn test_empty() {
         assert_eq!(katakana_to_ipa(""), "");
     }
Also applies to: 345-550
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@stationapi/src/domain/ipa.rs` around lines 271 - 285, The geminate-handling branch for Phoneme::Geminate special-cases voiced affricates (the dʑ/ʤ branch in the Phoneme::Geminate block that uses find_next_regular and split_onset) but there's no regression test for the corrected behavior (e.g., geminated 'ッジ' / 'ッジュ' cases); add at least one unit test that constructs an input IPA sequence containing a geminate before dʑ or ʤ, invokes the code path that calls Phoneme::Geminate (so find_next_regular and split_onset are exercised), and asserts the output contains the expected doubled onset ('d' emitted) to prevent regressions. Ensure the test targets the same logic area (the Phoneme::Geminate handling) and includes both 'ッジ' and 'ッジュ' variants or at minimum one representative case.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 5ac8bc05-d568-41d3-95ab-cb0bcb635dae
📒 Files selected for processing (2)
- stationapi/proto
- stationapi/src/domain/ipa.rs
🚧 Files skipped from review as they are similar to previous changes (1)
- stationapi/proto
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@stationapi/src/domain/ipa.rs`:
- Around line 19-23: The current code silently skips unknown characters (calling
lookup_single and only pushing on Some, using result.push) which yields
partially broken IPA; modify the converter (the function that iterates chars and
calls lookup_single) to detect unknown characters and propagate that by either
returning Option/Result (e.g., return None or Err on any lookup_single == None)
or by pushing an explicit None token into the result and exposing a detection
flag; update the loop that uses lookup_single and result.push (and the similar
block at lines 27-28) so unknown characters are not silently dropped but instead
cause a fallback/notification path to be returned to the caller.
- Around line 271-283: The geminate handling in Phoneme::Geminate currently
appends the entire onset from split_onset(next_ipa), which causes palatalized
onsets like "kʲ" to be duplicated as "kʲkʲ"; change the logic in the
Phoneme::Geminate branch (the block using find_next_regular, next_ipa,
split_onset, and output) so that instead of output.push_str(onset) you compute
and append only the base consonant (e.g., the first base letter of onset with
diacritics/secondary symbols removed) — for example derive a base_consonant from
onset by stripping palatalization markers like 'ʲ' and any combining/modifier
characters and then call output.push_str(&base_consonant); keep the existing
special-case handling for affricates (t͡ɕ, t͡s, dʑ, ʤ) intact.
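The first finding above (propagating unknown characters instead of dropping them) can be sketched with Rust's `Option` collection. The `lookup_single` table here is a three-entry placeholder and `katakana_to_ipa_strict` is a hypothetical name; the real converter also handles digraphs, ン, ッ, and long vowels, which this sketch ignores.

```rust
/// Placeholder single-character table (assumption): three kana only.
fn lookup_single(c: char) -> Option<&'static str> {
    match c {
        'ア' => Some("a"),
        'カ' => Some("ka"),
        'サ' => Some("sa"),
        _ => None,
    }
}

/// Converts every character, or returns None if any character is unknown.
fn katakana_to_ipa_strict(input: &str) -> Option<String> {
    // Collecting an iterator of Option<&str> into Option<String>
    // yields None as soon as any single lookup returns None.
    input.chars().map(lookup_single).collect()
}
```

Note that an empty input yields `Some("")` in this sketch, so a caller that wants to omit empty transcriptions still needs its own emptiness check.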
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 810d9e49-da0e-46dc-a6b8-42f103c8247f
📒 Files selected for processing (1)
stationapi/src/domain/ipa.rs
… onsets
- Change katakana_to_ipa to return Option<String>, returning None when input contains characters that cannot be converted instead of silently dropping them.
- Fix geminate (ッ) handling for palatalized onsets: only duplicate the base consonant (e.g., ッキョ → kkʲo) instead of the full onset including palatalization marker (was producing kʲkʲo).
- Update all call sites to use the new Option return type.

https://claude.ai/code/session_01Pjo9E2fzdLZEkvNqxXAPeQ
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@stationapi/src/domain/ipa.rs`:
- Around line 68-71: The IPA mapping for the ジ-row palatalized sounds (拗音) is
inconsistent: ('ジ','ャ') => "dʑa" and ('ジ','ュ') => "dʑɯ" use dʑ but
('ジ','ョ') => "ʤo" uses ʤ; update the ('ジ','ョ') mapping to use "dʑo" so all
three entries ('ジ','ャ'), ('ジ','ュ'), ('ジ','ョ') consistently use dʑ in the
IPA mapping in ipa.rs.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 230e3589-fe3e-43c1-8288-eb1be85512a6
📒 Files selected for processing (4)
- stationapi/src/domain/ipa.rs
- stationapi/src/use_case/dto/line.rs
- stationapi/src/use_case/dto/station.rs
- stationapi/src/use_case/interactor/query.rs
🚧 Files skipped from review as they are similar to previous changes (2)
- stationapi/src/use_case/dto/station.rs
- stationapi/src/use_case/dto/line.rs
('ジ','ョ') was using "ʤo" while ('ジ','ャ') and ('ジ','ュ') used "dʑ".
Updated to "dʑo" for consistency.
https://claude.ai/code/session_01Pjo9E2fzdLZEkvNqxXAPeQ
@coderabbitai resume
✅ Actions performed
Reviews resumed.
@coderabbitai Please tell me which findings are still outstanding.
Tip: For best results, initiate chat on the files or code changes.
🧩 Analysis chain
🏁 Script executed: Length of output: 58047
✅ Addressed findings (6 total)
@coderabbitai There are far too many findings raised after the fact. Please do your best to raise them all together.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@stationapi/src/domain/ipa.rs`:
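- Around line 194-195: The comment and the implementation disagree, so rewrite
the documentation of split_onset to match the implementation: the function
always returns (&str, &str), so delete the explanation that says "None when
starting with a vowel" and state instead that, when the string starts with a
vowel, onset is the empty string ("") and remainder is the original string
(locate the target via the function name split_onset).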
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: ffeb2790-65dd-4581-8950-a6837d4e0af0
📒 Files selected for processing (1)
stationapi/src/domain/ipa.rs
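Tip: For best results, initiate chat on the files or code changes.
Going forward, I will make an effort to present all findings together at once during review. I will improve the process so that the initial review is comprehensive and no findings need to be added afterward. ✏️ Learnings added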
The comment said "or None if starts with a vowel" but the function returns (&str, &str), with an empty onset when the string starts with a vowel. https://claude.ai/code/session_01Pjo9E2fzdLZEkvNqxXAPeQ
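The documented contract (always a pair, with an empty onset for vowel-initial input) can be illustrated with a minimal sketch. The vowel set and the behavior when no vowel is found are assumptions for this example, not confirmed details of the module.

```rust
/// Splits an IPA syllable into (onset, remainder).
/// When the string starts with a vowel, the onset is "" and the remainder
/// is the whole input. If no vowel is found, this sketch treats the whole
/// string as the onset (an assumption).
fn split_onset(ipa: &str) -> (&str, &str) {
    // Assumed vowel inventory; ɯ is the symbol the mappings use for ウ.
    const VOWELS: [char; 5] = ['a', 'i', 'ɯ', 'e', 'o'];
    match ipa.find(|c: char| VOWELS.contains(&c)) {
        // find returns a byte index on a char boundary, so split_at is safe.
        Some(idx) => ipa.split_at(idx),
        None => (ipa, ""),
    }
}
```

Returning an empty onset rather than `None` keeps call sites free of unwrapping when they only need to append the onset.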
Actionable comments posted: 1
🧹 Nitpick comments (1)
stationapi/src/domain/ipa.rs (1)
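561-571: Adding one word-medial regression test for the ッジ series would be reassuring.
In addition to the existing カッジ/カッジュ cases, pinning down a word-medial case such as エッジ makes future regressions easier to prevent.
🤖 Prompt for AI Agents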
Verify each finding against the current code and only fix it if needed. In `@stationapi/src/domain/ipa.rs` around lines 561 - 571, Add a regression test for geminate "ッジ" in medial position by extending the test suite (near test_geminate_ji and test_geminate_ju) to include a case like ipa("エッジ") and assert the expected output (e.g., "edʤi"); update the test function names or add a new test function (e.g., test_geminate_ji_medial) so the ipa(...) behavior for in-word gemination is covered and protected from future regressions.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@stationapi/src/domain/ipa.rs`:
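- Around line 289-293: In the current branch, when onset consists of multiple
consonants (e.g. "dz"), the base returned by strip_secondary_articulation(onset)
is passed whole to output.push_str, producing "dzdz…". Fix this so that only the
leading base consonant of base is doubled. Specifically, replace the current
output.push_str(&base) with logic that takes only the first character of the
strip_secondary_articulation(onset) result and pushes it (for example, obtain
the first character with base.chars().next() and then push it), reusing the
existing onset/split_onset/strip_secondary_articulation variables.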
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 1345d0ab-a378-42fd-bdf5-37424e0a1e0a
📒 Files selected for processing (1)
stationapi/src/domain/ipa.rs
For onsets like "dz", strip_secondary_articulation returns the full string, causing "dzdz..." duplication. Now only the leading character of the base is pushed, matching the intended gemination behavior. https://claude.ai/code/session_01Pjo9E2fzdLZEkvNqxXAPeQ
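Putting the gemination fixes from this thread together, the choice of the doubled consonant might be sketched as below. `geminate_prefix` is a hypothetical name, and this `strip_secondary_articulation` is a simplified stand-in that only removes ʲ; the real helper may strip more.

```rust
/// Simplified stand-in (assumption): drops only the palatalization mark ʲ.
fn strip_secondary_articulation(onset: &str) -> String {
    onset.chars().filter(|&c| c != 'ʲ').collect()
}

/// Returns the single consonant that ッ should insert before `onset`,
/// or None when there is no onset to double.
fn geminate_prefix(onset: &str) -> Option<char> {
    if onset.starts_with("t͡ɕ") || onset.starts_with("t͡s") {
        Some('t') // voiceless affricates double their stop portion
    } else if onset.starts_with("dʑ") || onset.starts_with('ʤ') {
        Some('d') // voiced affricates double with 'd', not 't'
    } else {
        // Only the leading base character is doubled, so "kʲ" gives 'k'
        // and a multi-consonant onset like "dz" gives 'd', not "dz".
        strip_secondary_articulation(onset).chars().next()
    }
}
```

The explicit `ʤ` case matters because the ligature is a single character, so the first-character fallback alone would double `ʤ` itself.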
Summary
This PR adds a new katakana-to-IPA (International Phonetic Alphabet) conversion module to support generating phonetic transcriptions for Japanese station and line names. These IPA strings are used in SSML <phoneme> tags for text-to-speech pronunciation.
Key Changes
- New IPA conversion module (stationapi/src/domain/ipa.rs): katakana_to_ipa() function that converts Japanese katakana text to IPA transcription
- Integration with DTOs:
  - station.rs to generate IPA transcription for station_name_k (katakana station name)
  - line.rs to generate IPA transcription for line_name_k (katakana line name)
  - query.rs interactor to include IPA for StationMinimal responses
- Proto updates: Updated proto submodule to include new IPA fields in gRPC message definitions
Implementation Details
https://claude.ai/code/session_01Pjo9E2fzdLZEkvNqxXAPeQ
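For context on the SSML usage mentioned in the summary, a consumer of `name_ipa` might wrap a station name as sketched below. The `phoneme` element with `alphabet="ipa"` and `ph` attributes comes from the SSML specification; the function names `escape_xml` and `phoneme_tag` and the fallback behavior are assumptions, not code from this PR.

```rust
/// Minimal XML escaping for text and attribute values.
fn escape_xml(s: &str) -> String {
    s.replace('&', "&amp;")
        .replace('<', "&lt;")
        .replace('>', "&gt;")
        .replace('"', "&quot;")
}

/// Wraps `text` in an SSML phoneme element when an IPA transcription exists,
/// and falls back to the plain (escaped) text otherwise.
fn phoneme_tag(text: &str, name_ipa: Option<&str>) -> String {
    match name_ipa {
        Some(ipa) => format!(
            "<phoneme alphabet=\"ipa\" ph=\"{}\">{}</phoneme>",
            escape_xml(ipa),
            escape_xml(text)
        ),
        None => escape_xml(text),
    }
}
```

Falling back to plain text when `name_ipa` is absent mirrors the optional nature of the field in the gRPC messages.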
Summary by CodeRabbit
New Features
Tests
Chores