feat: adjustable meditation duration with dynamic HLS fade #9

HatmanStack merged 2 commits into main
Conversation
Backend:
- Add duration_minutes parameter (3, 5, 10, 15, 20 min options)
- Dynamic fade detection: stream without fade, then re-process final segments with proper fade timing based on actual voice duration
- Fix ffprobe not being in the Lambda layer by using ffmpeg for duration detection
- Update word targets (~150 wpm) with emphasis on pauses/breathing
- Update to gpt-4o-mini-tts with the sage voice and meditation instructions
- Fix duration serialization in MeditationRequest.to_dict()

Frontend:
- Add duration selector pills (right-justified)
- Add a "New" button to dismiss the current meditation
- Fix "React hooks called conditionally" error
- Use useRef for duration to avoid a state race condition

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
📝 Walkthrough

This PR adds selectable meditation durations (3, 5, 10, 15, 20 minutes) end-to-end: UI, request model, Lambda handling, AI/TTS generation, and HLS audio processing. It also updates the TTS model/voice and audio constants, and introduces HLS fade post-processing with duration-estimation propagation.
Sequence Diagram(s)

```mermaid
sequenceDiagram
autonumber
actor User
participant FrontendUI as Frontend UI
participant Hook as useMeditation Hook
participant BackendCall as BackendMeditationCall
participant Lambda as Lambda Handler
participant AIService as AI Service
participant TTSProvider as TTS Provider
participant FFmpegService as FFmpeg Service
participant S3 as S3/HLS
Note over FrontendUI,Hook: User selects duration and triggers generation
User->>FrontendUI: Select duration (e.g., 10)
FrontendUI->>Hook: handleMeditationCall(10)
Hook->>BackendCall: BackendMeditationCallStreaming(..., durationMinutes: 10)
BackendCall->>Lambda: POST payload (duration_minutes: 10)
Note over Lambda,AIService: Meditation text generation includes duration
Lambda->>AIService: generate_meditation(input_data, duration_minutes: 10)
AIService-->>Lambda: meditation_text (targeted to duration)
Note over Lambda,TTSProvider: TTS streaming with estimated duration
Lambda->>TTSProvider: stream_speech(meditation_text)
TTSProvider-->>Lambda: audio stream (chunks), reports estimated_voice_duration
Note over Lambda,FFmpegService: HLS processing with two-phase fade
Lambda->>FFmpegService: process_stream_to_hls(voice_stream, music, ..., estimated_voice_duration)
FFmpegService->>FFmpegService: save voice temp file -> extract actual duration
FFmpegService->>FFmpegService: initial segments uploaded
FFmpegService->>FFmpegService: apply fade to tail segments (post-process)
FFmpegService->>S3: upload faded segments and finalize playlist
FFmpegService-->>Lambda: HLS metadata + segment durations
Lambda-->>BackendCall: streaming response (HLS URL, metadata)
BackendCall-->>Hook: update state with HLS URL and durations
Hook-->>FrontendUI: start playback
    FrontendUI->>User: Play meditation
```
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes
Actionable comments posted: 1
🧹 Nitpick comments (5)
backend/src/services/ffmpeg_audio_service.py (3)
493-512: Consider consolidating with the existing `get_audio_duration` method.

The `_get_audio_duration_from_file` method duplicates the logic in `get_audio_duration` (lines 70-87). Both parse ffmpeg stderr for the duration; the only differences are the timeout parameter and error handling.

🔎 Proposed consolidation

```diff
-    def _get_audio_duration_from_file(self, audio_path: str) -> float:
-        """Get audio duration using ffmpeg (ffprobe not available in Lambda layer)."""
-        try:
-            # Use ffmpeg to probe the file - same approach as get_audio_duration
-            result = subprocess.run(
-                [self.ffmpeg_executable, "-i", audio_path, "-f", "null", "-"],
-                stderr=subprocess.PIPE,
-                text=True,
-                timeout=30,
-            )
-            # Parse duration from stderr output
-            for line in result.stderr.split("\n"):
-                if "Duration" in line:
-                    duration_str = line.split(",")[0].split("Duration:")[1].strip()
-                    h, m, s = map(float, duration_str.split(":"))
-                    return h * 3600 + m * 60 + s
-            return 0.0
-        except Exception as e:
-            logger.warning(f"Could not get audio duration: {e}")
-            return 0.0
+    def _get_audio_duration_from_file(self, audio_path: str) -> float:
+        """Get audio duration using ffmpeg (ffprobe not available in Lambda layer)."""
+        return self.get_audio_duration(audio_path)
```

Then optionally add timeout support to the existing `get_audio_duration` method if needed.
556-557: Use `round()` instead of `int()` for the delay calculation.

Converting float seconds to integer milliseconds with `int()` truncates the fractional part, which may cause slight timing drift. Using `round()` provides more accurate timing.

🔎 Proposed fix

```diff
             f"[0:a]volume={DEFAULT_VOICE_BOOST}dB,"
-            f"adelay={int(max(0, DEFAULT_SILENCE_DURATION - redo_start_time) * 1000)}|{int(max(0, DEFAULT_SILENCE_DURATION - redo_start_time) * 1000)},"
+            f"adelay={round(max(0, DEFAULT_SILENCE_DURATION - redo_start_time) * 1000)}|{round(max(0, DEFAULT_SILENCE_DURATION - redo_start_time) * 1000)},"
             f"apad=pad_dur={trailing_music_seconds}[voice_padded];"
```
604-624: Clarify the `estimated_voice_duration` parameter usage in the docstring.

The docstring mentions "apply fade to final segments" but doesn't explain how `estimated_voice_duration` is used. This parameter drives the fade timing calculation in `_apply_fade_to_segments`.

🔎 Proposed enhancement

```diff
         estimated_voice_duration: float = 60.0,
     ) -> tuple[int, List[float]]:
         """
         Stream TTS to HLS segments, then apply fade to final segments.

         Strategy:
         1. Stream without fade for fast delivery
         2. Save TTS audio to temp file while streaming
         3. After TTS completes, re-process last few segments with proper fade

         Args:
             voice_generator: Iterator yielding audio chunks from TTS
             music_path: Path to downloaded background music file
             user_id: User identifier
             job_id: Job identifier
             progress_callback: Optional callback(segments_completed, segments_total)
-            estimated_voice_duration: Fallback if duration detection fails
+            estimated_voice_duration: Expected voice duration in seconds (used for fade timing if the actual duration cannot be detected)

         Returns:
             Tuple of (total_segments, segment_durations)
         """
```

backend/src/models/requests.py (1)
108-115: Consider reducing debug logging verbosity.

Lines 109 and 115 have detailed debug logging for `duration_minutes` parsing. While useful during development, this level of logging for every request might be excessive in production. Consider moving these to `logger.debug()` or removing them once the feature stabilizes.

🔎 Proposed refinement

```diff
         duration_minutes = body.get("duration_minutes", 5)
-        print(f"[MEDITATION_REQUEST] Raw duration_minutes from body: {body.get('duration_minutes')} -> parsed: {duration_minutes}")
+        logger.debug(f"Parsing duration_minutes: {body.get('duration_minutes')} -> {duration_minutes}")
         if isinstance(duration_minutes, str):
             try:
                 duration_minutes = int(duration_minutes)
             except ValueError:
                 duration_minutes = 5
-        print(f"[MEDITATION_REQUEST] Final duration_minutes: {duration_minutes}")
+        logger.debug(f"Final duration_minutes: {duration_minutes}")
```

frontend/components/ScreenComponents/MeditationControls.tsx (1)
325-348: Consider adding accessibility labels for screen reader support.

The duration selector works well visually. For better accessibility, consider adding `accessibilityLabel` and `accessibilityRole` to the duration option Pressables.

🔎 Suggested accessibility improvement

```diff
             <Pressable
               key={option.value}
               onPress={() => setSelectedDuration(option.value)}
               style={[
                 localStyles.durationOption,
                 selectedDuration === option.value && localStyles.durationOptionSelected,
               ]}
               testID={`duration-${option.value}`}
+              accessibilityRole="button"
+              accessibilityLabel={`${option.value} minute meditation`}
+              accessibilityState={{ selected: selectedDuration === option.value }}
             >
```
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (11)
- backend/src/config/constants.py
- backend/src/handlers/lambda_handler.py
- backend/src/models/requests.py
- backend/src/providers/openai_tts.py
- backend/src/services/ai_service.py
- backend/src/services/ffmpeg_audio_service.py
- backend/src/services/gemini_service.py
- frontend/app/(tabs)/explore.tsx
- frontend/components/BackendMeditationCall.tsx
- frontend/components/ScreenComponents/MeditationControls.tsx
- tests/frontend/unit/MeditationControls-test.tsx
🧰 Additional context used
🧬 Code graph analysis (8)
frontend/components/BackendMeditationCall.tsx (1)
- frontend/types/api.ts (1)
  - JobStatusResponse (36-47)

frontend/components/ScreenComponents/MeditationControls.tsx (3)
- frontend/constants/Colors.ts (1)
  - Colors (4-3888)
- frontend/components/ThemedText.tsx (1)
  - ThemedText (27-116)
- frontend/components/ThemedView.tsx (1)
  - ThemedView (17-26)

backend/src/providers/openai_tts.py (2)
- backend/src/services/tts_service.py (2)
  - TTSService (5-14)
  - stream_speech (11-11)
- backend/src/providers/gemini_tts.py (1)
  - stream_speech (18-45)

backend/src/services/gemini_service.py (1)
- backend/src/utils/logging_utils.py (1)
  - format (60-82)

backend/src/models/requests.py (1)
- backend/src/config/constants.py (1)
  - InferenceType (4-6)

backend/src/handlers/lambda_handler.py (2)
- backend/src/services/ai_service.py (1)
  - generate_meditation (13-13)
- backend/src/services/gemini_service.py (1)
  - generate_meditation (213-245)

backend/src/services/ai_service.py (1)
- backend/src/services/gemini_service.py (1)
  - generate_meditation (213-245)

backend/src/services/ffmpeg_audio_service.py (3)
- backend/src/services/audio_service.py (1)
  - get_audio_duration (8-8)
- backend/src/services/hls_service.py (2)
  - upload_segment_from_file (90-112)
  - finalize_playlist (183-194)
- backend/src/handlers/lambda_handler.py (2)
  - progress_callback (285-291)
  - progress_callback (341-356)
🔇 Additional comments (16)
tests/frontend/unit/MeditationControls-test.tsx (1)
49-96: LGTM!

Test updates correctly reflect the new duration-aware generation flow. The tests now verify that the default duration of 5 minutes is passed to `handleMeditationCall`, and that the button label change from "Generate Meditation" to "Generate" is properly validated.

backend/src/config/constants.py (1)
28-29: Verify the audio balance after the 10 dB music reduction.

The music volume reduction changed from -5 dB to -15 dB (a 10 dB decrease), making the background music approximately 3x quieter in amplitude. Combined with the new 3 dB voice boost, this creates a much more prominent voice-to-music ratio. Ensure this balance aligns with the desired meditation experience and doesn't make the music too subtle.
Consider testing the audio mix with a few sample meditations to confirm the new balance feels natural.
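As a quick sanity check on what a 10 dB change means, here is a minimal Python sketch (not from the PR) converting decibel changes to linear amplitude ratios under the standard 20·log10 convention:

```python
def db_to_amplitude_ratio(db: float) -> float:
    """Convert a decibel change to a linear amplitude ratio (20*log10 convention)."""
    return 10 ** (db / 20)

# Music moved from -5 dB to -15 dB, a 10 dB drop:
print(db_to_amplitude_ratio(-10))  # ~0.316, roughly one third of the previous amplitude

# Combined with the +3 dB voice boost, the voice-to-music gap widens by 13 dB:
print(db_to_amplitude_ratio(13))   # ~4.47x relative amplitude shift toward the voice
```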
backend/src/providers/openai_tts.py (2)
12-17: LGTM! Meditation-specific voice instructions enhance TTS quality.

The new `MEDITATION_VOICE_INSTRUCTIONS` constant provides clear guidance for pacing, tone, and pause handling, which should significantly improve the meditation audio quality.

30-36: No action needed. The model `gpt-4o-mini-tts-2025-12-15` and voice `sage` are both valid and available in the OpenAI Text-to-Speech API as documented. `gpt-4o-mini-tts-2025-12-15` is an actively maintained snapshot with documented improvements, and `sage` is a standard built-in voice for the Speech endpoint.

backend/src/models/requests.py (1)
37-44: LGTM! Duration validation and defaults are well-implemented.

The `duration_minutes` field, validated against the allowed values [3, 5, 10, 15, 20] with a fallback to 5, provides good guardrails for the meditation duration feature.
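For reference, a minimal sketch of the validation pattern described above; the names are illustrative, not the actual code in `requests.py`:

```python
ALLOWED_DURATIONS = {3, 5, 10, 15, 20}  # minutes
DEFAULT_DURATION = 5

def parse_duration_minutes(raw) -> int:
    """Coerce the incoming value to int, falling back to the default on any invalid input."""
    try:
        minutes = int(raw)
    except (TypeError, ValueError):
        return DEFAULT_DURATION
    return minutes if minutes in ALLOWED_DURATIONS else DEFAULT_DURATION

assert parse_duration_minutes("10") == 10  # string input is coerced
assert parse_duration_minutes(7) == 5      # unsupported value falls back
assert parse_duration_minutes(None) == 5   # missing value falls back
```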
backend/src/services/ai_service.py (1)

12-13: LGTM! Abstract method signature updated correctly.

The addition of the `duration_minutes` parameter with a default value of 5 is consistent with the PR objectives and aligns with the implementation in `GeminiAIService`.

frontend/app/(tabs)/explore.tsx (1)
69-69: LGTM! `useRef` correctly avoids a state race condition.

Using `useRef` for `durationRef` and setting it synchronously before triggering `setIsCalling(true)` (line 177) is the right approach to ensure the duration value is captured correctly before the effect runs. This avoids the race condition that could occur with asynchronous state updates.

Also applies to: 167-179
backend/src/handlers/lambda_handler.py (2)
116-116: LGTM! Duration parameter properly propagated.

The `duration_minutes` parameter is correctly passed through the meditation generation flow and included in logging for both the base64 and HLS streaming paths. This aligns with the PR objectives.

Also applies to: 198-207, 305-316
318-322: Verify the WPM reduction from ~150 to ~80 words/minute.

The TTS duration estimation was updated from ~150 wpm to ~80 wpm, and the buffer increased from 30 s to 90 s. This change reflects a much slower, more deliberate speaking pace appropriate for meditation. Confirm this aligns with the actual speaking rate of the new "sage" voice under the meditation instructions.
You may want to measure the actual speaking rate of a few generated meditations to calibrate this estimate more accurately. An incorrect estimate could result in fade timing being off.
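To make the estimate concrete, a small sketch of this kind of duration heuristic (illustrative only; the handler's actual formula may differ):

```python
def estimate_voice_seconds(text: str, wpm: float = 80.0, buffer_seconds: float = 90.0) -> float:
    """Estimate spoken duration from word count at a slow meditation pace, plus a safety buffer."""
    words = len(text.split())
    return words * 60.0 / wpm + buffer_seconds

# 400 words at ~80 wpm -> 300 s of speech, plus the 90 s buffer:
print(estimate_voice_seconds(" ".join(["breathe"] * 400)))  # 390.0
```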
backend/src/services/gemini_service.py (2)
108-152: LGTM! Well-structured duration-aware prompt template.

The template correctly uses double braces `{{` and `}}` to escape literal JSON braces within the Python format string. The duration targets (~150 wpm) are reasonable for spoken meditation with pauses.
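A quick illustration of the escaping rule, independent of the actual prompt template: `str.format` treats `{{` and `}}` as literal braces, so embedded JSON survives templating.

```python
template = 'Return JSON exactly like {{"duration_minutes": {minutes}}} with no extra keys.'
print(template.format(minutes=10))
# Return JSON exactly like {"duration_minutes": 10} with no extra keys.
```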
213-242: Implementation looks correct with good observability.

The fallback to 5-minute targets when an unsupported duration is requested is a safe defensive choice. Logging the generated word/char counts against the targets will help tune the prompt over time. The model name `gemini-2.0-flash` is the correct auto-updated alias for Google's Gemini 2.0 Flash API.

frontend/components/BackendMeditationCall.tsx (1)

309-322: LGTM! Clean integration of the duration parameter.

The parameter follows the JavaScript camelCase convention while the payload correctly uses snake_case to match the backend API contract. The default of 5 minutes aligns with the backend default.

frontend/components/ScreenComponents/MeditationControls.tsx (4)

10-19: Good placement of the constant outside the component.

Defining `DURATION_OPTIONS` at module scope avoids unnecessary array recreation on each render.

131-132: Correct hook placement to satisfy React rules.

Moving this state declaration above the early returns prevents the "React hooks called conditionally" error mentioned in the PR objectives.

210-265: Well-structured streaming mode UI with proper state management.

The layout correctly implements the "New" button requirement from the PR objectives. The error state with retry and the play/pause toggle are handled cleanly.

353-416: Clean, well-organized styles.

Using `StyleSheet.create` ensures styles are validated and optimized. The spacer pattern with `flex: 1` effectively centers the main button while right-aligning the duration selector.
Only replace existing segments when applying fade; don't create new segments beyond the original total_segments. Log a warning if fade processing generates more segments than expected.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Actionable comments posted: 2
🧹 Nitpick comments (2)
backend/src/services/ffmpeg_audio_service.py (2)
493-512: Eliminate code duplication with `get_audio_duration`.

This method duplicates the logic from `get_audio_duration` (lines 70-87). The only differences are the timeout parameter and the error handling approach. Consider refactoring to call the existing method or extracting the common parsing logic.

🔎 Proposed refactor

```diff
     def _get_audio_duration_from_file(self, audio_path: str) -> float:
-        """Get audio duration using ffmpeg (ffprobe not available in Lambda layer)."""
-        try:
-            # Use ffmpeg to probe the file - same approach as get_audio_duration
-            result = subprocess.run(
-                [self.ffmpeg_executable, "-i", audio_path, "-f", "null", "-"],
-                stderr=subprocess.PIPE,
-                text=True,
-                timeout=30,
-            )
-            # Parse duration from stderr output
-            for line in result.stderr.split("\n"):
-                if "Duration" in line:
-                    duration_str = line.split(",")[0].split("Duration:")[1].strip()
-                    h, m, s = map(float, duration_str.split(":"))
-                    return h * 3600 + m * 60 + s
-            return 0.0
-        except Exception as e:
-            logger.warning(f"Could not get audio duration: {e}")
-            return 0.0
+        """Get audio duration using ffmpeg (ffprobe not available in Lambda layer)."""
+        try:
+            duration = self.get_audio_duration(audio_path)
+            return duration if duration > 0 else 0.0
+        except Exception as e:
+            logger.warning(f"Could not get audio duration: {e}")
+            return 0.0
```

Note: You may want to add a timeout parameter to `get_audio_duration` as well, for consistency.
551-556: Simplify complex expressions for readability.

The nested ternary expression for the `-ss` parameter (line 551) and the `adelay` values (line 556) are difficult to read and verify. Consider extracting these calculations into named variables with comments explaining the logic.

🔎 Proposed refactor

```diff
+        # Calculate voice start offset (account for silence if needed)
+        voice_start_offset = max(0, redo_start_time - DEFAULT_SILENCE_DURATION)
+
+        # Calculate delay to apply to voice (if starting before silence ends)
+        voice_delay_ms = int(max(0, DEFAULT_SILENCE_DURATION - redo_start_time) * 1000)
+
         ffmpeg_fade_cmd = [
             self.ffmpeg_executable,
-            "-ss", str(redo_start_time - DEFAULT_SILENCE_DURATION) if redo_start_time > DEFAULT_SILENCE_DURATION else "0",
+            "-ss", str(voice_start_offset),
             "-i", voice_temp_path,
             "-stream_loop", "-1",
             "-ss", str(redo_start_time),
             "-i", music_path,
             "-filter_complex",
             f"[0:a]volume={DEFAULT_VOICE_BOOST}dB,"
-            f"adelay={int(max(0, DEFAULT_SILENCE_DURATION - redo_start_time) * 1000)}|{int(max(0, DEFAULT_SILENCE_DURATION - redo_start_time) * 1000)},"
+            f"adelay={voice_delay_ms}|{voice_delay_ms},"
             f"apad=pad_dur={trailing_music_seconds}[voice_padded];"
```
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
backend/src/services/ffmpeg_audio_service.py
🧰 Additional context used
🧬 Code graph analysis (1)
backend/src/services/ffmpeg_audio_service.py (3)
- backend/src/services/audio_service.py (1)
  - get_audio_duration (8-8)
- backend/src/services/hls_service.py (2)
  - upload_segment_from_file (90-112)
  - finalize_playlist (183-194)
- backend/src/handlers/lambda_handler.py (2)
  - progress_callback (285-291)
  - progress_callback (341-356)
🔇 Additional comments (3)
backend/src/services/ffmpeg_audio_service.py (3)
13-17: LGTM!

The addition of `DEFAULT_VOICE_BOOST` to the imports is appropriate and aligns with its usage in the audio processing filters.
743-752: Optimize temp file writing on FFmpeg early exit.

If FFmpeg exits early (detected at line 746), the code raises an exception but has already written the chunk to `voice_file` at line 752. Consider checking the FFmpeg status before writing to the temp file to avoid unnecessary I/O.

🔎 Relevant code

```python
# Stream voice data to FFmpeg stdin AND save to temp file
with open(voice_temp_path, "wb") as voice_file:
    for chunk in voice_generator:
        # Check if FFmpeg is still running before writing
        if process.poll() is not None:
            stderr = process.stderr.read().decode()
            logger.error(f"FFmpeg exited early: {stderr}")
            raise Exception(f"FFmpeg exited unexpectedly: {stderr}")
        process.stdin.write(chunk)
        process.stdin.flush()
        voice_file.write(chunk)  # Save for fade processing
```

On closer inspection, the current order is actually fine: both writes happen after the check. No change needed.
777-795: Fade processing and finalization workflow is well-structured.

The two-phase approach is sound:
- Stream without fade for fast initial delivery
- Re-process final segments with proper fade based on actual voice duration
- Finalize playlist with updated segment durations
This ensures both responsiveness and proper audio fade-out.
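To illustrate the timing arithmetic behind the second phase, here is a hypothetical sketch of how the tail-segment indices could be derived; the constants are assumptions, not the repo's actual values:

```python
HLS_SEGMENT_DURATION = 6.0  # seconds per segment (assumed)
FADE_DURATION = 10.0        # seconds of fade-out (assumed)

def tail_segments_to_redo(actual_voice_duration: float, total_segments: int) -> range:
    """Indices of the segments that overlap the fade window and must be re-rendered."""
    fade_start = max(0.0, actual_voice_duration - FADE_DURATION)
    first_index = int(fade_start // HLS_SEGMENT_DURATION)
    return range(first_index, total_segments)

# Voice ends at 63 s -> fade starts at 53 s -> segment 8 onward (with 6 s segments):
print(list(tail_segments_to_redo(63.0, total_segments=12)))  # [8, 9, 10, 11]
```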
```python
if segment_index >= total_segments:
    logger.warning(f"Skipping fade segment {i}: index {segment_index} >= total_segments {total_segments}")
    continue

seg_duration = self.get_audio_duration(fade_segment)
if seg_duration == 0:
    seg_duration = float(HLS_SEGMENT_DURATION)

self.hls_service.upload_segment_from_file(user_id, job_id, segment_index, fade_segment)
logger.info(f"Re-uploaded faded segment {segment_index}")

# Update duration (only for existing indices)
if segment_index < len(segment_durations):
    segment_durations[segment_index] = seg_duration
```
Past issue addressed, but verify duration list consistency.
The boundary check at line 579 correctly addresses the previous review concern by skipping segments beyond `total_segments`. However, there's a subtle issue at line 591: if `segment_index < total_segments` but `segment_index >= len(segment_durations)`, the segment will be uploaded without its duration being recorded. While `segment_durations` and `total_segments` should typically be in sync, defensive coding suggests handling this edge case.
🔎 Proposed fix
```diff
 for i, fade_segment in enumerate(fade_segments):
     segment_index = first_segment_to_redo + i
     if segment_index >= total_segments:
         logger.warning(f"Skipping fade segment {i}: index {segment_index} >= total_segments {total_segments}")
         continue
     seg_duration = self.get_audio_duration(fade_segment)
     if seg_duration == 0:
         seg_duration = float(HLS_SEGMENT_DURATION)
     self.hls_service.upload_segment_from_file(user_id, job_id, segment_index, fade_segment)
     logger.info(f"Re-uploaded faded segment {segment_index}")
-    # Update duration (only for existing indices)
-    if segment_index < len(segment_durations):
-        segment_durations[segment_index] = seg_duration
+    # Update duration (extend list if needed, but only within total_segments)
+    while len(segment_durations) <= segment_index:
+        segment_durations.append(float(HLS_SEGMENT_DURATION))
+    segment_durations[segment_index] = seg_duration
```
+ segment_durations[segment_index] = seg_duration🤖 Prompt for AI Agents
In backend/src/services/ffmpeg_audio_service.py around lines 579 to 592, the
code uploads a faded segment but only writes its duration if segment_index <
len(segment_durations); if segment_durations is shorter than total_segments this
leaves the uploaded segment without a recorded duration. Before assigning
segment_durations[segment_index] = seg_duration, ensure the list is long enough
by extending it (e.g., append default durations or repeat HLS_SEGMENT_DURATION)
up to segment_index, then set the duration; do this defensively so any uploaded
segment always has a corresponding duration entry.
```python
    user_id: str,
    job_id: str,
    progress_callback: Optional[Callable[[int, Optional[int]], None]] = None,
    estimated_voice_duration: float = 60.0,
```
Unused parameter should serve as fallback.
The `estimated_voice_duration` parameter is never used. If `_get_audio_duration_from_file` returns 0 (line 778), no fade is applied. Consider using `estimated_voice_duration` as a fallback when duration detection fails, so that fade processing always occurs.
🔎 Proposed fix
```diff
 # Get actual voice duration and apply fade to final segments
 actual_voice_duration = self._get_audio_duration_from_file(voice_temp_path)
+if actual_voice_duration == 0:
+    logger.warning(f"Duration detection failed, using estimate: {estimated_voice_duration}s")
+    actual_voice_duration = estimated_voice_duration
+
 if actual_voice_duration > 0:
     logger.info(f"Voice duration: {actual_voice_duration:.1f}s, applying fade to final segments")
```
logger.info(f"Voice duration: {actual_voice_duration:.1f}s, applying fade to final segments")Committable suggestion skipped: line range outside the PR's diff.
🤖 Prompt for AI Agents
In backend/src/services/ffmpeg_audio_service.py around line 606, the parameter
estimated_voice_duration is declared but never used; update the logic where you
call _get_audio_duration_from_file (around line ~778) so that if the detected
duration is 0 or falsy you fall back to estimated_voice_duration (validate it's
> 0), then use that resulting duration for fade calculations and processing;
ensure you clamp/validate the fallback to a sensible minimum and replace the
earlier branch that skipped fades when duration == 0 so fades always apply using
the detected-or-estimated duration.