chore(deps): update actions/upload-artifact digest to ea165f8 by renovate[bot] · Pull Request #70 · block/sprout

renovate · 2026-03-16T01:19:21Z

This PR contains the following updates:

| Package | Type | Update | Change |
|---|---|---|---|
| actions/upload-artifact (changelog) | action | digest | `6546280` → `ea165f8` |

Configuration

📅 Schedule: Branch creation - Between 12:00 AM and 03:59 AM, only on Monday ( * 0-3 * * 1 ) (UTC), Automerge - At any time (no schedule defined).

🚦 Automerge: Disabled by config. Please merge this manually once you are satisfied.

♻ Rebasing: Whenever PR becomes conflicted, or you tick the rebase/retry checkbox.

🔕 Ignore: Close this PR and you won't be reminded about this update again.

- [ ] If you want to rebase/retry this PR, check this box

This PR was generated by Mend Renovate. View the repository job log.

* origin/main:
  feat: agent users:write scope + system messages in chat (#73)
  chore(deps): update swatinem/rust-cache digest to e18b497 (#71)
  chore(deps): update actions/upload-artifact digest to ea165f8 (#70)
  feat: NIP-50 search, NIP-10 threads, NIP-17 DMs, Sprout DM discovery (#74)

Pocket TTS' FlowLM has an autoregressive cold-start: the first 2-3 generation steps run without audio context in the KV cache, occasionally smearing or dropping the first phoneme of short utterances. Tyler reproduced this on 'I'm happy.' rendering as 'm happy', and on other 'I'm X' constructions across random seeds. The bug is documented upstream as kyutai-labs/pocket-tts #91 (8 comments, 2 collaborators acknowledged) and #70, with collateral discussion at sherpa-onnx #3180.

Earlier commits in this branch reduced but did not eliminate the failure:

- 773a2a1 added 8-space padding;
- 1dbfa2c restored sherpa's `frames_after_eos` default of 3 (fixing a separate static-burst regression);
- f570ec0 dropped the leading fade-in.

Empirical study at production settings (silence_scale=0, frames_after_eos=3 default) confirmed that temperature, silence_scale, seed, and pad tweaks are all insufficient: the model's stochastic sampling lands on a bad trajectory often enough to be perceptible on short prompts.

This commit applies the upstream-documented sacrificial-word workaround (ikidd in kyutai-labs/pocket-tts #70) with two refinements:

1. Sacrificial prefix '. . ' (two periods + space) instead of a word. The pair was empirically the only variant in our probe that produced a usable post-sacrificial silence gap on every random seed in the 8-seed × 8-variant matrix (`sacrificial_probe`, iterated locally during investigation); a single period failed on seed=99999. Periods render as low-amplitude breath rather than spoken audio.
2. Post-synth trim: scan from t=30ms looking for the first run of samples below 0.02 lasting >= 50 ms; that run is the sacrificial→main boundary. `Vec::drain` everything before the gap-end. If no gap is found or the boundary lies beyond 1.2 s (production max-drop bound), bail out and emit the raw buffer rather than corrupt the audio.
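
The post-synth trim in item 2 can be sketched roughly as follows. This is a minimal illustration and not the code in pocket.rs: the function name, the constant names, and the `sample_rate` parameter are hypothetical, and only the numeric values (30 ms, 0.02, 50 ms, 1.2 s) come from the description above. It also assumes that a silent run reaching the end of the buffer counts as "no gap found".

```rust
// Hypothetical names; the values mirror the commit text.
const SCAN_START_MS: usize = 30;
const MIN_GAP_MS: usize = 50;
const MAX_DROP_MS: usize = 1200;
const SILENCE_THRESHOLD: f32 = 0.02;

fn ms_to_samples(ms: usize, sample_rate: usize) -> usize {
    ms * sample_rate / 1000
}

/// Drops everything before the end of the first qualifying silence gap.
/// Returns true if the buffer was trimmed, false if it was left untouched.
fn trim_sacrificial_prefix(samples: &mut Vec<f32>, sample_rate: usize) -> bool {
    let scan_start = ms_to_samples(SCAN_START_MS, sample_rate);
    let min_gap = ms_to_samples(MIN_GAP_MS, sample_rate);
    let max_drop = ms_to_samples(MAX_DROP_MS, sample_rate);

    // Buffer shorter than the scan start: nothing to do, and no panic.
    if samples.len() <= scan_start || min_gap == 0 {
        return false;
    }

    let mut run_start: Option<usize> = None;
    for i in scan_start..samples.len() {
        if samples[i].abs() < SILENCE_THRESHOLD {
            // Start (or continue) a run of quiet samples.
            if run_start.is_none() {
                run_start = Some(i);
            }
        } else if let Some(start) = run_start.take() {
            // A quiet run just ended at `i`, the first loud sample after it.
            if i - start >= min_gap {
                if i > max_drop {
                    // Boundary lies beyond the max-drop bound: keep raw audio.
                    return false;
                }
                samples.drain(..i);
                return true;
            }
        }
    }
    // No qualifying gap followed by speech was found.
    false
}
```

Calling `trim_sacrificial_prefix(&mut pcm, 24_000)` on a buffer either leaves it untouched or makes its first sample the first loud sample after the gap; the 24 kHz rate here is only an example value.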

We don't insert a zero lead-in here because tts.rs's existing FIRST_APPEND_LEAD_IN_SAMPLES already provides the OS-device warm-up cushion on the first append of an utterance, and subsequent sentences are buffered by INTER_SENTENCE_SILENCE.

Both the prefix and the trim are gated on PreparedPrompt::is_short (<= 4 words after preprocessing, matches upstream's pad_with_spaces_for_short_inputs predicate). Long prompts pass through unchanged: the first phoneme of a long utterance has enough downstream context to avoid the smear, and a natural early pause like the comma in 'Hello, how can I help you?' would otherwise be misdetected by the trimmer as the sacrificial gap (Max caught this in review, thanks).

Also: bump TARGET_PEAK in tts.rs from -6 dBFS (0.501) to -3 dBFS (0.708) per Tyler. This is a ceiling on per-sentence loudness normalization, not a floor: quieter Pocket utterances under MAX_GAIN=8 will still land below the ceiling (bench-typical peak 0.076 lands at 0.608, ~-4.3 dBFS). Comment updated to reflect that nuance.

Probe data (see examples/prod_probe.rs; production GenerationConfig: silence_scale=0.0, frames_after_eos default 3, max_frames=100 short). Tested 5 prompts × 5 seeds with the new code path:

- Short prompts ('I'm happy', 'I'm sorry', 'I'm ready', 'Yep', 'I see you') with sacrificial prefix: 25/25 produced a >=50ms silence gap in the 30-340ms range. Trim drops 47-339ms; final audio 270-748ms.
- Long prompts without sacrificial (regression check): 'Hello, how can I help you today?' and 'Yes, that works. Let me try again.' generate normally; comma pauses preserved.

Tyler ear-confirmed the trimmed short-prompt output:

> these are much better! I like this!

Max reviewed twice: first flagging a silence_scale mismatch between probe (silence_scale=1.0) and production (0.0), then flagging the destructive-edge hazard if trim ran on un-sacrificed long prompts.

Both are addressed: prod_probe mirrors production GenerationConfig exactly (silence_scale=0.0, no frames_after_eos override per 1dbfa2c), and the trim is gated on is_short with a 1.2s max-drop bound as belt-and-suspenders against the destructive edge case.

Tests added (in pocket.rs):

- prepare_prompt_inserts_sacrificial_prefix_only_for_short: pins the exact ordering (pad + '. . ' + cleaned).
- prepare_prompt_threshold_is_inclusive_at_four_words extended to assert is_short and SACRIFICIAL_PREFIX absence on long input.
- trim_strips_sacrificial_and_keeps_only_speech: feed a synthetic sacrificial+gap+speech buffer; assert leading sample is speech.
- trim_is_noop_when_no_long_silence_gap_exists
- trim_is_noop_when_gap_is_shorter_than_threshold
- trim_is_noop_when_gap_is_beyond_max_drop_bound: guards the destructive-edge case Max flagged.
- trim_is_noop_on_buffer_smaller_than_scan_start: no panic.
- trim_constants_use_sane_units: pins millisecond meanings.

Tests added (in tts.rs):

- normalize_for_playback_clamps_at_max_gain_below_target: new behaviour under the -3 dBFS ceiling for bench-typical peaks.
- normalize_for_playback_hits_target_on_quiet_buffer updated for new MAX_GAIN saturation point (0.0885) on the input side.

All 330 cargo test --lib pass. cargo fmt --check and desktop/scripts/check-file-sizes.mjs are green. pocket.rs cap 620 → 900, tts.rs cap 1335 → 1380.

Signed-off-by: Tyler Longwell <tlongwell@squareup.com>
Signed-off-by: npub1cc3ha7z055mu0rwwu7806t2wt8mj3pvu0uv5mfp2c50dahaqhczshdalg6 <c6237ef84fa537c78dcee78efd2d4e59f728859c7f194da42ac51ededfa0be05@sprout-oss.stage.blox.sqprod.co>
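
The TARGET_PEAK and MAX_GAIN figures quoted in this message can be checked with a small sketch. It is an illustration under assumptions, not the code in tts.rs: the function bodies, the `playback_gain` and `dbfs` helpers, and the choice to leave a silent buffer at unity gain are all hypothetical; only the constant values and the `normalize_for_playback` name come from the text. The sketch also scales peaks above the target down to it, which the message does not confirm.

```rust
const TARGET_PEAK: f32 = 0.708; // -3 dBFS ceiling, per the commit text
const MAX_GAIN: f32 = 8.0;

/// Gain that would bring `peak` to TARGET_PEAK, capped at MAX_GAIN.
fn playback_gain(peak: f32) -> f32 {
    if peak <= 0.0 {
        return 1.0; // assumption: leave silent buffers untouched
    }
    (TARGET_PEAK / peak).min(MAX_GAIN)
}

/// Scales the buffer in place and returns the resulting peak amplitude.
fn normalize_for_playback(samples: &mut [f32]) -> f32 {
    let peak = samples.iter().fold(0.0_f32, |m, s| m.max(s.abs()));
    let gain = playback_gain(peak);
    for s in samples.iter_mut() {
        *s *= gain;
    }
    peak * gain
}

/// Converts a linear amplitude to dB relative to full scale.
fn dbfs(amplitude: f32) -> f32 {
    20.0 * amplitude.log10()
}
```

With these values the gain cap binds for any input peak below TARGET_PEAK / MAX_GAIN = 0.0885, so the bench-typical peak of 0.076 is multiplied by 8 and lands at 0.608 (about -4.3 dBFS), below the ceiling; an input peak of 0.2 needs a gain of only about 3.54 and reaches the 0.708 target.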

chore(deps): update actions/upload-artifact digest to ea165f8

edb52c6

renovate Bot requested a review from wesbillman as a code owner March 16, 2026 01:19

wesbillman approved these changes Mar 16, 2026

wesbillman merged commit a0c4d1b into main Mar 16, 2026
8 checks passed

wesbillman deleted the renovate/actions-upload-artifact-digest branch March 16, 2026 15:17
