feat(platform): audio transcription for chat uploads by larryro · Pull Request #1591 · tale-project/tale

larryro · 2026-04-21T06:21:30Z

Summary

Server-side ffmpeg pipeline: compress (silenceremove + 32 kbps Opus mono 16 kHz) → chunk <24 MB → sequential OpenAI-compatible /audio/transcriptions → concat. Handles up to 4 hours of audio.
Transcripts inline into the LLM prompt (process_attachments) and index to RAG for cross-chat search. Usage recorded to the existing usageLedger via a new centsPerAudioMinute cost model.
Provider-file system extended with transcription tag + provider default. Self-hosted Whisper (faster-whisper-server / LocalAI / vLLM) swaps in by changing baseUrl only.
Client: 4h duration cap via HTMLAudioElement metadata; send-gate blocks message until transcription finishes; user X on attachment cancels pending retries; watchdog cron recovers stuck running rows.
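
As a rough illustration of the chunking arithmetic in the pipeline above, the sketch below estimates how many pieces a compressed file would need. It assumes a constant 32 kbps bitrate, a binary 24 MB ceiling, and no container overhead; `planChunks`, `ChunkPlan`, and the constant names are hypothetical and are not taken from the PR's code.

```typescript
// Hypothetical helper: split a compressed audio stream into equal-duration
// chunks so each one stays under a byte ceiling. Assumes constant bitrate and
// ignores container overhead, so real chunk sizes will differ somewhat.
interface ChunkPlan {
  startSec: number;
  durationSec: number;
}

const BITRATE_BPS = 32_000; // 32 kbps Opus, per the summary
const MAX_CHUNK_BYTES = 24 * 1024 * 1024; // assumed reading of the "<24 MB" ceiling

function planChunks(
  totalDurationSec: number,
  bitrateBps: number = BITRATE_BPS,
  maxChunkBytes: number = MAX_CHUNK_BYTES,
): ChunkPlan[] {
  if (totalDurationSec <= 0) return [];
  const totalBytes = (totalDurationSec * bitrateBps) / 8;
  // floor + 1 keeps every chunk strictly below the ceiling
  const chunkCount = Math.floor(totalBytes / maxChunkBytes) + 1;
  const chunkDurationSec = totalDurationSec / chunkCount;
  return Array.from({ length: chunkCount }, (_, i) => ({
    startSec: i * chunkDurationSec,
    durationSec: chunkDurationSec,
  }));
}
```

Under these assumptions a 4-hour file (14,400 s × 32,000 bit/s ÷ 8 = 57.6 million bytes) splits into three chunks of about 19.2 million bytes each. The actual pipeline may cut at different boundaries, and silence removal shortens the audio before chunking.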

Test plan

Rebuild Convex container (Dockerfile adds ffmpeg); confirm `which ffmpeg` resolves inside the container
Add examples/providers/openai.json + SOPS secret; verify UI accepts transcription tag
Upload ~5 MB .m4a: status flows queued → running (transcribing) → completed; LLM references content
Upload ~2h recording: chunked path fires, chip shows "transcribing chunk 2 of 3"
Upload >4h file: client rejects with clear toast before any network request
Remove audio attachment mid-transcription: next retry logs transcription.cancelled, no further work
Verify ledger row appears in /metrics/usage with audioDurationSec populated
Swap baseUrl to local faster-whisper-server; end-to-end works with zero code change

Summary by CodeRabbit

New Features
- Added audio file upload and automatic transcription to chat
- Transcription status tracking with visual indicators (in progress, completed, failed)
- Skip and retry controls for transcription workflows
- OpenAI Whisper transcription provider support
- Audio transcript integration into chat context
Chores
- Added FFmpeg support for audio processing

Audio files uploaded to chat are server-side compressed via ffmpeg (silence removal + 32 kbps Opus mono 16 kHz), chunked into <24 MB pieces if needed, transcribed via an OpenAI-compatible Whisper provider, and inlined into the LLM prompt. Transcripts are also indexed to RAG for cross-chat search. Usage flows into the existing ledger with a new per-minute cost model; self-hosted Whisper swaps in by changing baseUrl only. 4-hour duration cap enforced client-side; watchdog cron recovers any stuck 'running' rows; user removal/cancel stops pending retries.

coderabbitai · 2026-04-21T06:33:05Z

📝 Walkthrough

This PR introduces audio transcription capabilities to the platform. It adds OpenAI's Whisper-1 as a configurable transcription provider, implements server-side audio preprocessing (compression, chunking via ffmpeg), and integrates transcription state management throughout the chat interface. The backend orchestrates transcription via Convex actions with OpenAI API integration, error recovery, retry logic, and cost tracking per audio duration. Client-side components display transcription status, validate audio file constraints (duration/size), and allow users to skip or retry transcriptions. Schema updates track transcription lifecycle (queued/running/completed/failed/skipped), and localization strings support EN/FR/DE interfaces.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~65 minutes

Possibly related PRs

refactor(platform): convert provider dialogs to sidepanels #1410: Modifies the same provider settings panel components (provider-add-panel.tsx, provider-edit-panel.tsx) to manage model tags and UI configuration.
feat(platform): configurable file retention policies #1391: Updates the core file-metadata save flow (saveFileMetadata in mutations/internal_mutations) and fileMetadata schema that this PR extends with transcription state fields.
feat(platform): improve chat UX with persisted drafts and error handling #502: Modifies the same chat UI components (chat-input.tsx, chat-interface.tsx, file-displays.tsx) and file upload hook (use-convex-file-upload.ts) that handle attachment state and validation.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 43.33% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title 'feat(platform): audio transcription for chat uploads' clearly summarizes the main change: adding audio transcription functionality to chat file uploads on the platform.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

coderabbitai

Actionable comments posted: 17

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (3)

services/platform/app/features/settings/providers/components/provider-edit-panel.tsx (1)
95-103: ⚠️ Potential issue | 🟠 Major

isDirty misses the new transcription default, so Save can stay disabled.

After adding transcription to the defaults UI (Line 178), the dirty-state check still stops at embedding (Line 100-103). If a user only changes transcription, the form won’t be considered dirty and cannot be saved.
💡 Minimal fix
   const isDirty =
     data?.ok &&
     (form.displayName !== data.config.displayName ||
       form.description !== (data.config.description ?? '') ||
       form.baseUrl !== data.config.baseUrl ||
       form.defaults.chat !== (data.config.defaults?.chat ?? NONE_VALUE) ||
       form.defaults.vision !== (data.config.defaults?.vision ?? NONE_VALUE) ||
       form.defaults.embedding !==
-        (data.config.defaults?.embedding ?? NONE_VALUE));
+        (data.config.defaults?.embedding ?? NONE_VALUE) ||
+      form.defaults.transcription !==
+        (data.config.defaults?.transcription ?? NONE_VALUE));
Also applies to: 178-180
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@services/platform/app/features/settings/providers/components/provider-edit-panel.tsx`
around lines 95 - 103, The isDirty computation in provider-edit-panel.tsx omits
the new transcription default, so changes to form.defaults.transcription won't
mark the form dirty; update the isDirty boolean to include a comparison of
form.defaults.transcription !== (data.config.defaults?.transcription ??
NONE_VALUE) along with the other defaults (chat, vision, embedding) and ensure
any other dirty checks that mirror this logic (e.g., the save-enabling checks
around the defaults UI) are updated similarly to reference
form.defaults.transcription and data.config.defaults?.transcription and use
NONE_VALUE for the fallback.
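
One way to keep the dirty check from drifting again is to derive it from a single list of default tags. The sketch below is illustrative only: `DEFAULT_TAGS`, `defaultsDirty`, the two type aliases, and the value assigned to `NONE_VALUE` are assumptions, since the real definitions in provider-edit-panel.tsx are not shown in this review.

```typescript
// Assumed sentinel; the actual NONE_VALUE in the component is not shown here.
const NONE_VALUE = '__none__';

// Single source of truth for the model-default tags the form exposes.
const DEFAULT_TAGS = ['chat', 'vision', 'embedding', 'transcription'] as const;
type DefaultTag = (typeof DEFAULT_TAGS)[number];

// Hypothetical shapes for the form state and the saved provider config.
type FormDefaults = Record<DefaultTag, string>;
type SavedDefaults = Partial<Record<DefaultTag, string>>;

// True when any default in the form differs from the saved config,
// treating a missing saved value as NONE_VALUE.
function defaultsDirty(form: FormDefaults, saved?: SavedDefaults): boolean {
  return DEFAULT_TAGS.some(
    (tag) => form[tag] !== (saved?.[tag] ?? NONE_VALUE),
  );
}
```

With this shape, a form that changes only `transcription` is reported dirty, and a future tag needs to be added to one list rather than to every comparison.
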
services/platform/convex/lib/attachments/process_attachments.ts (1)
8-13: ⚠️ Potential issue | 🟠 Major

Update the file-types mock before merging.

The Unit job is already failing because convex/lib/attachments/__tests__/process_attachments.test.ts mocks ../../../../lib/shared/file-types without an isAudio export. This import change needs a matching test update or the server suite stays red.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@services/platform/convex/lib/attachments/process_attachments.ts` around lines
8 - 13, The test mock for '../../../../lib/shared/file-types' is missing the
newly imported isAudio export used by process_attachments (imported as isAudio,
isImage, isSpreadsheet, isTextFile in process_attachments.ts); update the mock
in convex/lib/attachments/__tests__/process_attachments.test.ts to include a
stubbed isAudio export (matching the other mocked functions) or adjust the
test's mocked module to the updated import path so the test provides isAudio;
ensure the mock name/signature aligns with how process_attachments calls
isAudio.
services/platform/lib/shared/file-types.ts (1)
242-243: ⚠️ Potential issue | 🟡 Minor

Update CHAT_UPLOAD_ACCEPT to include audio file types.

CHAT_UPLOAD_ACCEPT currently aliases TEXT_FILE_ACCEPT, which does not include audio MIME types. The native file picker will reject audio files even though runtime validation (line 312 in chat-input.tsx) accepts them. Audio types are already defined in file-types.ts (audio/mpeg, audio/wav, audio/mp4, audio/webm, audio/ogg) but must be added to the accept list used by the input element.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@services/platform/lib/shared/file-types.ts` around lines 242 - 243,
CHAT_UPLOAD_ACCEPT currently aliases TEXT_FILE_ACCEPT and so omits audio MIME
types; update the CHAT_UPLOAD_ACCEPT constant to include the audio MIME types
defined in this module (add the audio MIME list or concatenate the existing
audio constant) so the input accept string allows audio files too — change
export const CHAT_UPLOAD_ACCEPT = TEXT_FILE_ACCEPT to a combined value (e.g.,
`${TEXT_FILE_ACCEPT},${AUDIO_FILE_ACCEPT}` or append the specific audio MIME
types like audio/mpeg,audio/wav,audio/mp4,audio/webm,audio/ogg) and ensure the
symbol CHAT_UPLOAD_ACCEPT is exported with the new value.
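
A minimal sketch of the combined accept string follows. The audio MIME types are the ones listed in the comment above; the `TEXT_FILE_ACCEPT` value is a placeholder because the real one is not shown in this review, and `AUDIO_MIME_TYPES` and `AUDIO_FILE_ACCEPT` are hypothetical names.

```typescript
// Audio MIME types as listed in the review comment.
const AUDIO_MIME_TYPES = [
  'audio/mpeg',
  'audio/wav',
  'audio/mp4',
  'audio/webm',
  'audio/ogg',
] as const;

// Placeholder only: the real TEXT_FILE_ACCEPT in file-types.ts is not shown.
const TEXT_FILE_ACCEPT = '.txt,.md,text/plain';

// The accept attribute takes a comma-separated list of MIME types/extensions.
const AUDIO_FILE_ACCEPT = AUDIO_MIME_TYPES.join(',');
const CHAT_UPLOAD_ACCEPT = [TEXT_FILE_ACCEPT, AUDIO_FILE_ACCEPT].join(',');
```

Building the string from an array keeps the picker's accept list and any runtime MIME check reading from the same data.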

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@services/platform/app/features/chat/components/chat-input.tsx`:
- Around line 312-363: The audio-attachment branch falls back to file size when
transcription metadata hasn't loaded; add a pending UI when the component-level
isTranscribing/isTranscriptionQueryLoading flag is true but
transcriptionStatuses.get(attachment.fileId) (the local variable info) is still
undefined. In the audio branch (inside attachment.fileType.startsWith('audio/'))
check if isTranscribing (or the prop passed from chat-interface, e.g.,
isTranscriptionQueryLoading via isTranscribing) and !info, and return the same
HStack Loader + caption (using tChat('transcription.transcribing') or
info?.progress) so attachments show a transcribing pending state instead of file
size while metadata is loading.

In `@services/platform/app/features/chat/hooks/use-convex-file-upload.ts`:
- Around line 104-112: The current Math.min(mergedConfig.maxFileSize,
getMaxFileSizeForType(resolvedType)) wrongly enforces the generic
CHAT_MAX_FILE_SIZE for audio; change the logic so audio uses the per-type
ceiling instead of being min'd with mergedConfig. Replace the Math.min
expression that sets perTypeLimit with a conditional that uses
getMaxFileSizeForType(resolvedType) when resolvedType is audio (or when
getMaxFileSizeForType returns a larger per-type ceiling), otherwise use
mergedConfig.maxFileSize; keep the rest of the rejection flow
(rejectedTooLarge.push(file)) unchanged.
- Around line 132-139: Replace the sentinel mutation pattern that sets (entry as
unknown as { _tooLong?: true })._tooLong with a type-safe Set: declare const
tooLongFiles = new Set<File>() before the Promise.all where you check durations;
inside the duration-check branch call tooLongFiles.add(entry.file) instead of
mutating entry; then change the filtered computation that currently references
validFiles.filter(/* sentinel check */) to const filtered =
validFiles.filter((v) => !tooLongFiles.has(v.file)); update references to the
sentinel (_tooLong) and remove the unsafe casts.

In `@services/platform/app/features/chat/hooks/use-file-transcription-status.ts`:
- Around line 55-61: The current map entry sets startedAt using m._creationTime
which is the file-metadata creation time and misrepresents when transcription
began; update the code that builds the map (the map.set call) to use a real
transcription start timestamp instead of m._creationTime—preferably a
backend-provided field like m.transcriptionStartedAt (or similar) that reflects
when status switched to "running"—and if such a field is not available, rename
the property exposed to callers from startedAt to createdAt so callers don't
treat it as the transcription start time; ensure any downstream uses that check
the 60-second skip threshold or elapsed UI use the correct transcription-start
field (e.g., transcriptionStartedAt) or the renamed createdAt to avoid
miscalculation.

In `@services/platform/app/features/chat/utils/get-audio-duration.ts`:
- Around line 15-24: The Promise in getAudioDuration can hang if neither
'loadedmetadata' nor 'error' fires; add a safety timeout (e.g., configurable ms)
that resolves to null after expiry, and ensure proper cleanup by removing the
'loadedmetadata' and 'error' listeners and clearing the timeout when any handler
runs so the Promise settles exactly once; update the Promise executor in
getAudioDuration (references: audio element, 'loadedmetadata' handler, 'error'
handler) to create and clear the timeout and to detach listeners in each branch.

In `@services/platform/convex/file_metadata/internal_mutations.ts`:
- Around line 255-270: The watchdog currently uses row._creationTime in
recoverStuckTranscriptions which false-positives when transcription started
later; add a transcriptionStartedAt?: number to the file metadata schema, set
transcriptionStartedAt = Date.now() when updateFileTranscription transitions
transcriptionStatus to 'running', and change recoverStuckTranscriptions to
compare transcriptionStartedAt (falling back to _creationTime if
transcriptionStartedAt is absent) against the cutoff before marking
transcriptionStatus 'failed' so only truly stuck transcriptions are timed out.

In `@services/platform/convex/file_metadata/mutations.ts`:
- Around line 196-210: The retry path currently unconditionally sets
transcriptionStatus to 'queued' and enqueues
internal.file_metadata.transcribe_audio.transcribeAudio, causing duplicate work;
change the logic in the retry handler in mutations.ts to first read the current
transcriptionStatus (via ctx.db.get or a conditional patch on metadata._id) and
only transition and schedule when the status is a terminal retryable state
(e.g., 'failed' or 'skipped'); if status is already 'queued', 'running', or
'completed' return no-op; ensure transcriptionError is cleared only when you
successfully transition to 'queued' and then call ctx.scheduler.runAfter with
storageId, fileName, contentType, and organizationId.
- Around line 149-167: The handler currently only compares args.organizationId
with the row, allowing any authenticated user who knows a storageId to mutate
transcription state and it doesn't verify the file is audio; update the mutation
handlers (the async handler functions shown and the similar handler at lines
181-194) to: after authComponent.getAuthUser(ctx) call, call the existing
getOrganizationMember (or equivalent helper) to verify the authUser is a member
of args.organizationId and throw on failure, then after loading metadata check
metadata.contentType exists and startsWith('audio/') (or otherwise indicates
audio) and throw an error if not audio; keep the existing
storageId/organizationId checks and avoid introducing authorizeRls() into these
mutation handlers.

In `@services/platform/convex/file_metadata/queries.ts`:
- Around line 91-105: The schema exposes sensitive transcript fields
(transcript, transcriptionError, _creationTime) in the getByStorageIds path
without scoping rows to the caller; update the lookup (getByStorageIds) to
require and verify organizationId/owner before returning those fields (either
add an organizationId parameter and include it in the query filter, or filter by
caller identity/ownership in the resolver) and apply the same fix to the other
query that exposes these fields in the 130-135 region so rows are only returned
with transcript-related fields when organization/ownership is confirmed.

In `@services/platform/convex/governance/internal_mutations.ts`:
- Around line 247-273: The current upsert (using existingQuery.first() then
ctx.db.insert('usageLedger', ...)) can race and create duplicate rows; replace
it with the same reconciliation/upsert approach used in incrementUsageLedger (or
extract that logic into a shared helper) so concurrent transcriptions reconcile
into a single row. Specifically, instead of blind insert after !match, either
call the existing incrementUsageLedger helper or implement the reconciliation:
attempt insert, catch duplicate-key errors, then re-query the matching row
(using existingQuery) and ctx.db.patch to atomically add audioDurationSec,
costEstimate, and increment requestCount; ensure the helper is used by this
mutation to avoid duplicating the race-prone first()+insert() pattern.
- Around line 236-245: The existing query building the upsert lookup
(existingQuery) uses the index 'by_org_user_period_team_agent_model' and keys on
organizationId, userId, periodKey, teamId, agentSlug, and model but omits
provider; update the lookup and upsert to include provider so rows are keyed
per-provider (either add .eq('provider', args.provider) to the withIndex chain
and use an index that includes provider, or change to/to create an index name
that includes provider such as 'by_org_user_period_team_agent_model_provider')
ensuring the same provider field is included in both the query (existingQuery)
and the upsert/write path so transcription usage for identical model names
across providers does not collide.

In `@services/platform/convex/governance/upload_enforcement.ts`:
- Around line 74-85: The code uses the caller-provided mimeType to pick a
per-MIME size override (checking mimeType and config.maxFileSizeLimits) which
allows clients to escalate limits; instead, only apply per-MIME overrides when
you have a trusted, server-derived media type (e.g., a
validated/detectedMimeType or a flag like contentValidated) or else fall back to
the global config.maxFileSizeBytes; update the branch around
mimeType/config.maxFileSizeLimits/limit so it checks a server-validated value
(or a validation flag) before selecting a match and never raises the cap based
solely on the caller-controlled mimeType.

In `@services/platform/convex/lib/attachments/process_attachments.ts`:
- Around line 130-149: The document-count heuristic still includes audio files;
update the documentCount calculation to exclude audio by using the
already-filtered documentAttachments (or subtract audioAttachments.length)
instead of counting all non-image/non-spreadsheet files. Locate the variables
documentCount and maxDocLength logic and replace the current count with
documentAttachments.length (or adjust to deduct audioAttachments) so audio
(audioAttachments) no longer reduces maxDocLength.

In `@services/platform/lib/shared/file-types.ts`:
- Line 336: The exported constant CHAT_AUDIO_MAX_FILE_SIZE is unused and
triggers knip; make it module-local by removing the export modifier (change
"export const CHAT_AUDIO_MAX_FILE_SIZE" to "const CHAT_AUDIO_MAX_FILE_SIZE") so
it remains available inside file-types.ts but not exported. If another module
actually requires the raw limit, instead add an explicit export where it's used
or import the value from a single authoritative place; otherwise leave it
non-exported to unblock CI.

In `@services/platform/lib/shared/schemas/governance.ts`:
- Around line 55-66: The settings UI currently doesn't preserve the new
maxFileSizeLimits field because upload-policy-editor.tsx still builds and saves
UploadPolicyConfig from legacy fields only; update the editor to round-trip the
new schema by reading maxFileSizeLimits into the editor state and including it
when constructing/saving the UploadPolicyConfig object. Locate the
UploadPolicyConfig construction and the state initialization/serialization in
upload-policy-editor.tsx (and any helper funcs that assemble the policy object)
and add handling for the maxFileSizeLimits array (validate entries, bind inputs
to mimeTypePrefix and maxBytes, and include the field when saving) so existing
per-MIME overrides are not lost on save.

In `@services/platform/messages/de.json`:
- Line 1578: fr.json is missing 149 keys that exist in en.json and de.json
(including the new keys like "tagTranscription" and related audio/transcription
keys); update fr.json to include the full set of keys present in en.json/de.json
so all base locales share an identical key set. Open fr.json, compare it against
en.json (or de.json) and add the missing keys with placeholder French values (or
empty strings) for each missing entry (at minimum add "tagTranscription" and all
new audio/transcription keys introduced in this PR), preserving the same key
names and nesting as in en.json/de.json to restore synchronization.

In `@services/platform/messages/fr.json`:
- Line 2865: Update the French message for the key "audioDurationExceeded" to
use the same informal chat tone as surrounding messages: keep the placeholders
{names} and {maxHours} but replace the formal "Veuillez le découper en segments
plus courts." with an informal phrasing such as "Découpe‑le en segments plus
courts." so the final string reads similarly to "{names} : l'audio dépasse la
limite de {maxHours} heures. Découpe‑le en segments plus courts."

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: f620b175-4e89-44e1-8c17-2dab27ed68a7

📥 Commits

Reviewing files that changed from the base of the PR and between 71acec5 and 4832add.

⛔ Files ignored due to path filters (1)

services/platform/convex/_generated/api.d.ts is excluded by !**/_generated/**

📒 Files selected for processing (33)

examples/providers/openai.json
examples/providers/openai.secrets.json
services/convex/Dockerfile
services/platform/app/features/chat/components/chat-input.tsx
services/platform/app/features/chat/components/chat-interface.tsx
services/platform/app/features/chat/components/message-bubble/file-displays.tsx
services/platform/app/features/chat/components/model-tag-icons.tsx
services/platform/app/features/chat/hooks/use-convex-file-upload.ts
services/platform/app/features/chat/hooks/use-file-transcription-status.ts
services/platform/app/features/chat/utils/get-audio-duration.ts
services/platform/app/features/settings/providers/components/provider-add-panel.tsx
services/platform/app/features/settings/providers/components/provider-edit-panel.tsx
services/platform/app/features/settings/providers/utils/model-tag-label.ts
services/platform/convex/crons.ts
services/platform/convex/file_metadata/audio_preprocess.ts
services/platform/convex/file_metadata/internal_mutations.ts
services/platform/convex/file_metadata/mutations.ts
services/platform/convex/file_metadata/queries.ts
services/platform/convex/file_metadata/schema.ts
services/platform/convex/file_metadata/transcribe_audio.ts
services/platform/convex/governance/cost_estimation.ts
services/platform/convex/governance/internal_mutations.ts
services/platform/convex/governance/schema.ts
services/platform/convex/governance/upload_enforcement.ts
services/platform/convex/lib/attachments/process_attachments.ts
services/platform/convex/providers/file_actions.ts
services/platform/convex/providers/resolve_model.ts
services/platform/lib/shared/file-types.ts
services/platform/lib/shared/schemas/governance.ts
services/platform/lib/shared/schemas/providers.ts
services/platform/messages/de.json
services/platform/messages/en.json
services/platform/messages/fr.json

coderabbitai · 2026-04-21T06:33:08Z

+                      if (attachment.fileType.startsWith('audio/')) {
+                        const info = transcriptionStatuses?.get(
+                          attachment.fileId,
+                        );
+                        const status = info?.status;
+                        if (status === 'queued' || status === 'running') {
+                          return (
+                            <HStack gap={1} align="center">
+                              <Loader className="text-muted-foreground/50 size-3 animate-spin" />
+                              <Text
+                                as="span"
+                                variant="caption"
+                                className="text-muted-foreground/50"
+                              >
+                                {info?.progress ||
+                                  tChat('transcription.transcribing')}
+                              </Text>
+                            </HStack>
+                          );
+                        }
+                        if (status === 'completed') {
+                          return (
+                            <Text
+                              as="span"
+                              variant="caption"
+                              className="text-muted-foreground/70"
+                            >
+                              {tChat('transcription.transcribed')}
+                            </Text>
+                          );
+                        }
+                        if (status === 'failed' || status === 'skipped') {
+                          return (
+                            <Text
+                              as="span"
+                              variant="caption"
+                              className="text-destructive"
+                            >
+                              {tChat('transcription.couldNotTranscribe')}
+                            </Text>
+                          );
+                        }
+                        return (
+                          <Text
+                            as="div"
+                            variant="caption"
+                            className="text-muted-foreground/50"
+                          >
+                            {formatFileSize(attachment.fileSize)}
+                          </Text>
+                        );
+                      }


🛠️ Refactor suggestion | 🟠 Major

Show a pending state while transcription metadata is still loading.

chat-interface.tsx can pass isTranscribing={isTranscribing || isTranscriptionQueryLoading}, but this branch still falls back to file size when the status map has not populated yet. That leaves the attachment looking idle while send is disabled for transcription.

♻️ Proposed adjustment

   if (status === 'failed' || status === 'skipped') {
     return (
       <Text
         as="span"
         variant="caption"
         className="text-destructive"
       >
         {tChat('transcription.couldNotTranscribe')}
       </Text>
     );
   }
+  if (isTranscribing) {
+    return (
+      <HStack gap={1} align="center">
+        <Loader className="text-muted-foreground/50 size-3 animate-spin" />
+        <Text
+          as="span"
+          variant="caption"
+          className="text-muted-foreground/50"
+        >
+          {tChat('transcription.transcribing')}
+        </Text>
+      </HStack>
+    );
+  }
   return (
     <Text
       as="div"
       variant="caption"
       className="text-muted-foreground/50"
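
The branch order in this suggestion can be checked separately from the JSX by reducing it to a pure function. The sketch below is an assumption-laden restatement, not code from the PR: `resolveChipState` and the `ChipState` labels are hypothetical, and only the five status values come from the review.

```typescript
type TranscriptionStatus =
  | 'queued'
  | 'running'
  | 'completed'
  | 'failed'
  | 'skipped';

// Hypothetical labels for what the attachment chip renders.
type ChipState = 'pending' | 'transcribed' | 'error' | 'fileSize';

// Mirrors the branch order of the audio attachment chip, including the
// proposed fallback for a status map that has not populated yet.
function resolveChipState(
  status: TranscriptionStatus | undefined,
  isTranscribing: boolean,
): ChipState {
  if (status === 'queued' || status === 'running') return 'pending';
  if (status === 'completed') return 'transcribed';
  if (status === 'failed' || status === 'skipped') return 'error';
  // No per-file status yet: show pending while the component-level flag is set.
  return isTranscribing ? 'pending' : 'fileSize';
}
```

Because a known status always wins over the component-level flag, a completed attachment would not flip back to a spinner while another file is still transcribing.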

coderabbitai · 2026-04-21T06:33:08Z

+        // Per-type ceiling: audio max file size is 1 GB (duration is the
+        // real gate — see audio duration check below); other types cap at
+        // the generic `maxFileSize`.
+        const perTypeLimit = Math.min(
+          mergedConfig.maxFileSize,
+          getMaxFileSizeForType(resolvedType),
+        );
+        if (file.size > perTypeLimit) {
          rejectedTooLarge.push(file);


⚠️ Potential issue | 🟠 Major

Math.min keeps audio capped at 100 MB.

With the default config, mergedConfig.maxFileSize is still CHAT_MAX_FILE_SIZE (100 MB). For audio files this computes Math.min(100 MB, 1 GB), so long audio gets rejected before the new 4-hour duration check ever runs. That breaks the intended audio-upload path and will also show the wrong size limit in the rejection toast.
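
The arithmetic can be made concrete with a small comparison of the current expression and one possible fix. This is a sketch under stated assumptions: the stand-in `getMaxFileSizeForType` only models the audio branch described in the review, binary megabytes are assumed, and `currentLimit` and `proposedLimit` are hypothetical names.

```typescript
const MB = 1024 * 1024; // binary megabytes assumed
const CHAT_MAX_FILE_SIZE = 100 * MB; // generic cap named in the review
const AUDIO_MAX_FILE_SIZE = 1024 * MB; // 1 GB audio ceiling per the code comment

// Stand-in for the real helper; only the audio branch is known from the review.
function getMaxFileSizeForType(mimeType: string): number {
  return mimeType.startsWith('audio/') ? AUDIO_MAX_FILE_SIZE : CHAT_MAX_FILE_SIZE;
}

// Behaviour flagged in the review: the generic cap always wins.
function currentLimit(configMax: number, mimeType: string): number {
  return Math.min(configMax, getMaxFileSizeForType(mimeType));
}

// One possible fix: audio uses its per-type ceiling, other types use the config.
function proposedLimit(configMax: number, mimeType: string): number {
  return mimeType.startsWith('audio/')
    ? getMaxFileSizeForType(mimeType)
    : configMax;
}
```

With the default config, `currentLimit` yields 100 MB for `audio/mp4` while `proposedLimit` yields 1 GB, so the duration check becomes the effective gate for audio.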

coderabbitai · 2026-04-21T06:33:08Z

+              // oxlint-disable-next-line typescript/no-unsafe-type-assertion -- sentinel used only for filter step below
+              (entry as unknown as { _tooLong?: true })._tooLong = true;
+            }
+          }),
+        );
+        const filtered = validFiles.filter(
+          // oxlint-disable-next-line typescript/no-unsafe-type-assertion -- inverse of the sentinel set above
+          (v) => !(v as unknown as { _tooLong?: true })._tooLong,


⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

# Get file info and view the relevant section
wc -l services/platform/app/features/chat/hooks/use-convex-file-upload.ts

Repository: tale-project/tale

Length of output: 132

🏁 Script executed:

# Read context around lines 132-139
sed -n '110,160p' services/platform/app/features/chat/hooks/use-convex-file-upload.ts

Repository: tale-project/tale

Length of output: 2156

🏁 Script executed:

# Read broader context including function signature and variable declarations
sed -n '90,145p' services/platform/app/features/chat/hooks/use-convex-file-upload.ts

Repository: tale-project/tale

Length of output: 2427

🏁 Script executed:

# Look for the type of validFiles and entry
sed -n '80,135p' services/platform/app/features/chat/hooks/use-convex-file-upload.ts | head -60

Repository: tale-project/tale

Length of output: 2405

🏁 Script executed:

# Verify the full function signature and context to ensure Set<File> is viable
sed -n '60,80p' services/platform/app/features/chat/hooks/use-convex-file-upload.ts

Repository: tale-project/tale

Length of output: 696

Replace the sentinel _tooLong pattern with a Set<File> to track oversized audio.

The current as unknown as { _tooLong?: true } approach mutates objects with shadow fields and violates the TypeScript guideline: "Never as, never any, never unknown in TypeScript. Use type guards, generics, discriminated unions, or never."

Instead, declare const tooLongFiles = new Set<File>() before the Promise.all block. Inside the duration check, call tooLongFiles.add(entry.file) instead of mutating the entry. Then replace the filtered assignment with:

const filtered = validFiles.filter((v) => !tooLongFiles.has(v.file));

This is type-safe, cleaner, and avoids mutations.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@services/platform/app/features/chat/hooks/use-convex-file-upload.ts` around lines 132 - 139, Replace the sentinel mutation pattern that sets (entry as unknown as { _tooLong?: true })._tooLong with a type-safe Set: declare const tooLongFiles = new Set<File>() before the Promise.all where you check durations; inside the duration-check branch call tooLongFiles.add(entry.file) instead of mutating entry; then change the filtered computation that currently references validFiles.filter(/* sentinel check */) to const filtered = validFiles.filter((v) => !tooLongFiles.has(v.file)); update references to the sentinel (_tooLong) and remove the unsafe casts.
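The Set-based replacement could look roughly like the sketch below. It is generic over the file type so it runs outside a browser; `dropTooLong`, `UploadEntry`, and the `getDurationSec` callback are hypothetical names standing in for the hook's own duration check.

```typescript
// Hypothetical entry shape: the hook's validFiles items carry a `file` field.
interface UploadEntry<F> {
  file: F;
}

// Track over-long files in a Set instead of mutating entries with a sentinel.
async function dropTooLong<F>(
  entries: UploadEntry<F>[],
  getDurationSec: (file: F) => Promise<number | null>,
  maxDurationSec: number,
): Promise<UploadEntry<F>[]> {
  const tooLongFiles = new Set<F>();
  await Promise.all(
    entries.map(async (entry) => {
      const duration = await getDurationSec(entry.file);
      // A null duration means "unknown"; this sketch lets such files through.
      if (duration !== null && duration > maxDurationSec) {
        tooLongFiles.add(entry.file);
      }
    }),
  );
  return entries.filter((entry) => !tooLongFiles.has(entry.file));
}
```

Because membership is checked by object identity, no casts or extra fields on the entries are needed.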

coderabbitai · 2026-04-21T06:33:08Z

+      map.set(m.storageId, {
+        status: m.transcriptionStatus,
+        error: m.transcriptionError,
+        transcript: m.transcript,
+        durationSec: m.transcriptionDurationSec,
+        progress: m.transcriptionProgress,
+        startedAt: m._creationTime,


⚠️ Potential issue | 🟠 Major

startedAt is populated with the wrong timestamp.

m._creationTime is the file-metadata row creation time, not the moment transcription entered running. If this value drives the 60-second skip threshold or any elapsed-time UI, queued time and retries will be measured incorrectly. Either expose a real transcription-start field from the backend or rename this to createdAt so callers do not misinterpret it.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@services/platform/app/features/chat/hooks/use-file-transcription-status.ts` around lines 55 - 61, The current map entry sets startedAt using m._creationTime which is the file-metadata creation time and misrepresents when transcription began; update the code that builds the map (the map.set call) to use a real transcription start timestamp instead of m._creationTime—preferably a backend-provided field like m.transcriptionStartedAt (or similar) that reflects when status switched to "running"—and if such a field is not available, rename the property exposed to callers from startedAt to createdAt so callers don't treat it as the transcription start time; ensure any downstream uses that check the 60-second skip threshold or elapsed UI use the correct transcription-start field (e.g., transcriptionStartedAt) or the renamed createdAt to avoid miscalculation.
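To illustrate why the two timestamps must stay separate, here is a small sketch. The field `transcriptionStartedAt` is the hypothetical backend field suggested above, and `SKIP_THRESHOLD_MS` assumes the 60-second skip threshold mentioned in the finding; neither is confirmed by the code shown here.

```typescript
interface TranscriptionInfo {
  createdAt: number; // file-metadata row creation time, in ms
  transcriptionStartedAt?: number; // hypothetical: set when status becomes "running"
}

const SKIP_THRESHOLD_MS = 60_000; // assumed 60-second skip threshold

// Elapsed running time is only defined once transcription has actually started.
function elapsedRunningMs(info: TranscriptionInfo, now: number): number | null {
  if (info.transcriptionStartedAt === undefined) return null;
  return Math.max(0, now - info.transcriptionStartedAt);
}

// Offer "skip" only after the job has been running for the threshold,
// so time spent queued or between retries is not counted.
function canOfferSkip(info: TranscriptionInfo, now: number): boolean {
  const elapsed = elapsedRunningMs(info, now);
  return elapsed !== null && elapsed >= SKIP_THRESHOLD_MS;
}
```

A row created two minutes ago that entered `running` ten seconds ago would not yet offer the skip control under this logic.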

coderabbitai · 2026-04-21T06:33:09Z

+    return await new Promise<number | null>((resolve) => {
+      const audio = document.createElement('audio');
+      audio.preload = 'metadata';
+      audio.src = url;
+      audio.addEventListener('loadedmetadata', () => {
+        const d = audio.duration;
+        resolve(Number.isFinite(d) && d > 0 ? d : null);
+      });
+      audio.addEventListener('error', () => resolve(null));
+    });


⚠️ Potential issue | 🟠 Major

Promise can hang indefinitely when metadata events never arrive.

getAudioDuration currently resolves only on loadedmetadata/error. In edge cases where neither fires, the upload flow can remain blocked.

💡 Suggested hardening (timeout + cleanup)

 export async function getAudioDuration(file: File): Promise<number | null> {
   const url = URL.createObjectURL(file);
   try {
     return await new Promise<number | null>((resolve) => {
       const audio = document.createElement('audio');
+      const timeoutId = window.setTimeout(() => {
+        cleanup();
+        resolve(null);
+      }, 5000);
+
+      const cleanup = () => {
+        window.clearTimeout(timeoutId);
+        audio.removeEventListener('loadedmetadata', onLoadedMetadata);
+        audio.removeEventListener('error', onError);
+      };
+
+      const onLoadedMetadata = () => {
+        const d = audio.duration;
+        cleanup();
+        resolve(Number.isFinite(d) && d > 0 ? d : null);
+      };
+
+      const onError = () => {
+        cleanup();
+        resolve(null);
+      };
+
       audio.preload = 'metadata';
       audio.src = url;
-      audio.addEventListener('loadedmetadata', () => {
-        const d = audio.duration;
-        resolve(Number.isFinite(d) && d > 0 ? d : null);
-      });
-      audio.addEventListener('error', () => resolve(null));
+      audio.addEventListener('loadedmetadata', onLoadedMetadata, {
+        once: true,
+      });
+      audio.addEventListener('error', onError, { once: true });
     });
   } finally {
     URL.revokeObjectURL(url);
   }
 }

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@services/platform/app/features/chat/utils/get-audio-duration.ts` around lines 15 - 24, The Promise in getAudioDuration can hang if neither 'loadedmetadata' nor 'error' fires; add a safety timeout (e.g., configurable ms) that resolves to null after expiry, and ensure proper cleanup by removing the 'loadedmetadata' and 'error' listeners and clearing the timeout when any handler runs so the Promise settles exactly once; update the Promise executor in getAudioDuration (references: audio element, 'loadedmetadata' handler, 'error' handler) to create and clear the timeout and to detach listeners in each branch.

coderabbitai · 2026-04-21T06:33:09Z

+  // Separate images, spreadsheets, audio, and other files (documents + text).
+  // Audio is inlined as text transcript; the raw bytes never go to the chat
+  // model, so it must be excluded from fileAttachments/documentAttachments.
  const imageAttachments = attachments.filter((a) => isImage(a.fileType));
  const spreadsheetAttachments = attachments.filter(
    (a) => !isImage(a.fileType) && isSpreadsheet(a.fileName),
  );
+  const audioAttachments = attachments.filter((a) => isAudio(a.fileType));
  const fileAttachments = attachments.filter(
-    (a) => !isImage(a.fileType) && !isSpreadsheet(a.fileName),
+    (a) =>
+      !isImage(a.fileType) &&
+      !isSpreadsheet(a.fileName) &&
+      !isAudio(a.fileType),
  );
  const documentAttachments = attachments.filter(
    (a) =>
      !isImage(a.fileType) &&
      !isSpreadsheet(a.fileName) &&
+      !isAudio(a.fileType) &&
      !isTextFile(a.fileType, a.fileName),


⚠️ Potential issue | 🟡 Minor

Exclude audio from the earlier document-count heuristic too.

After introducing audioAttachments here, the documentCount calculation at Lines 102-107 still counts audio files as documents. A message with one PDF and one MP3 now drops maxDocLength to the multi-doc limit even though the audio never goes through document parsing, so document context gets truncated more aggressively than intended.

🧰 Tools

🪛 GitHub Check: Unit

[failure] 137-137: [server] convex/lib/attachments/tests/process_attachments.test.ts > processAttachments > performance logging > calls debugLog with PERF_PARSE_ALL after all documents
Error: [vitest] No "isAudio" export is defined on the "../../../../lib/shared/file-types" mock. Did you forget to return it from "vi.mock"?
If you need to partially mock a module, you can use "importOriginal" helper inside:

vi.mock(import("../../../../lib/shared/file-types"), async (importOriginal) => {
const actual = await importOriginal()
return {
...actual,
// your mocked methods
}
})

❯ convex/lib/attachments/process_attachments.ts:137:54
❯ Module.processAttachments convex/lib/attachments/process_attachments.ts:137:40
❯ convex/lib/attachments/tests/process_attachments.test.ts:190:13

[failure] 137-137: [server] convex/lib/attachments/tests/process_attachments.test.ts > processAttachments > performance logging > calls debugLog with PERF_PARSE_FILE for each document
Error: [vitest] No "isAudio" export is defined on the "../../../../lib/shared/file-types" mock. Did you forget to return it from "vi.mock"?
If you need to partially mock a module, you can use "importOriginal" helper inside:

vi.mock(import("../../../../lib/shared/file-types"), async (importOriginal) => {
const actual = await importOriginal()
return {
...actual,
// your mocked methods
}
})

❯ convex/lib/attachments/process_attachments.ts:137:54
❯ Module.processAttachments convex/lib/attachments/process_attachments.ts:137:40
❯ convex/lib/attachments/tests/process_attachments.test.ts:169:13

[failure] 137-137: [server] convex/lib/attachments/tests/process_attachments.test.ts > processAttachments > multi-document truncation > uses full limit for single document
Error: [vitest] No "isAudio" export is defined on the "../../../../lib/shared/file-types" mock. Did you forget to return it from "vi.mock"?
If you need to partially mock a module, you can use "importOriginal" helper inside:

vi.mock(import("../../../../lib/shared/file-types"), async (importOriginal) => {
const actual = await importOriginal()
return {
...actual,
// your mocked methods
}
})

❯ convex/lib/attachments/process_attachments.ts:137:54
❯ Module.processAttachments convex/lib/attachments/process_attachments.ts:137:40
❯ convex/lib/attachments/tests/process_attachments.test.ts:134:28

[failure] 137-137: [server] convex/lib/attachments/tests/process_attachments.test.ts > processAttachments > multi-document truncation > uses reduced limit when multiple documents are attached
Error: [vitest] No "isAudio" export is defined on the "../../../../lib/shared/file-types" mock. Did you forget to return it from "vi.mock"?
If you need to partially mock a module, you can use "importOriginal" helper inside:

vi.mock(import("../../../../lib/shared/file-types"), async (importOriginal) => {
const actual = await importOriginal()
return {
...actual,
// your mocked methods
}
})

❯ convex/lib/attachments/process_attachments.ts:137:54
❯ Module.processAttachments convex/lib/attachments/process_attachments.ts:137:40
❯ convex/lib/attachments/tests/process_attachments.test.ts:103:28

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@services/platform/convex/lib/attachments/process_attachments.ts` around lines 130 - 149, The document-count heuristic still includes audio files; update the documentCount calculation to exclude audio by using the already-filtered documentAttachments (or subtract audioAttachments.length) instead of counting all non-image/non-spreadsheet files. Locate the variables documentCount and maxDocLength logic and replace the current count with documentAttachments.length (or adjust to deduct audioAttachments) so audio (audioAttachments) no longer reduces maxDocLength.
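A rough sketch of the adjusted heuristic, under stated assumptions: the predicates below are simplified stand-ins for the real helpers in `lib/shared/file-types`, and `pickMaxDocLength` with its two limits is a hypothetical name for the single-document versus multi-document choice.

```typescript
interface Attachment {
  fileName: string;
  fileType: string;
}

// Simplified stand-ins for the shared file-type helpers (assumptions).
const isImage = (fileType: string): boolean => fileType.startsWith('image/');
const isAudio = (fileType: string): boolean => fileType.startsWith('audio/');
const isSpreadsheet = (fileName: string): boolean => /\.(xlsx?|csv)$/i.test(fileName);

// Count only attachments that go through document parsing: audio is excluded
// because it is inlined as a transcript rather than parsed as a document.
function countDocuments(attachments: Attachment[]): number {
  return attachments.filter(
    (a) => !isImage(a.fileType) && !isSpreadsheet(a.fileName) && !isAudio(a.fileType),
  ).length;
}

// Hypothetical limit selection: the reduced limit applies only when more than
// one real document is attached.
function pickMaxDocLength(
  attachments: Attachment[],
  singleLimit: number,
  multiLimit: number,
): number {
  return countDocuments(attachments) > 1 ? multiLimit : singleLimit;
}
```

Under this sketch a message with one PDF and one MP3 keeps the full single-document limit.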

coderabbitai · 2026-04-21T06:33:09Z

+ * transcription, so the raw upload can be much larger than OpenAI's 25 MB cap.
+ * Duration is the real gate — see `CHAT_AUDIO_MAX_DURATION_SEC`.
+ */
+export const CHAT_AUDIO_MAX_FILE_SIZE = 1024 * 1024 * 1024;


⚠️ Potential issue | 🟠 Major

Drop the unused export to unblock CI.

knip is already failing on CHAT_AUDIO_MAX_FILE_SIZE being exported but never imported. Keep the constant module-local unless another file actually needs the raw limit value.

🧹 Minimal fix

-export const CHAT_AUDIO_MAX_FILE_SIZE = 1024 * 1024 * 1024;
+const CHAT_AUDIO_MAX_FILE_SIZE = 1024 * 1024 * 1024;

🧰 Tools

🪛 GitHub Actions: Lint

[error] 336-336: knip: Unused exports (1). CHAT_AUDIO_MAX_FILE_SIZE is reported as unused.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@services/platform/lib/shared/file-types.ts` at line 336, The exported constant CHAT_AUDIO_MAX_FILE_SIZE is unused and triggers knip; make it module-local by removing the export modifier (change "export const CHAT_AUDIO_MAX_FILE_SIZE" to "const CHAT_AUDIO_MAX_FILE_SIZE") so it remains available inside file-types.ts but not exported. If another module actually requires the raw limit, instead add an explicit export where it's used or import the value from a single authoritative place; otherwise leave it non-exported to unblock CI.

coderabbitai · 2026-04-21T06:33:09Z

+  // Optional per-MIME-prefix overrides. When the upload's MIME type matches
+  // any `mimeTypePrefix` entry, that `maxBytes` wins over `maxFileSizeBytes`.
+  // Example: `[{ mimeTypePrefix: 'audio/', maxBytes: 25 * 1024 * 1024 }]`
+  // caps audio at 25 MB while leaving other types at the global limit.
+  maxFileSizeLimits: z
+    .array(
+      z.object({
+        mimeTypePrefix: z.string().min(1),
+        maxBytes: z.number().nonnegative(),
+      }),
+    )
+    .optional(),


⚠️ Potential issue | 🟠 Major

The settings editor will currently erase this new field on save.

upload-policy-editor.tsx still builds UploadPolicyConfig from the legacy fields only, so once a policy containing maxFileSizeLimits is opened in Settings, any save rewrites the config without these overrides. Please add UI round-trip support before shipping the schema change, otherwise admins will silently lose their per-MIME limits.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@services/platform/lib/shared/schemas/governance.ts` around lines 55 - 66, The settings UI currently doesn't preserve the new maxFileSizeLimits field because upload-policy-editor.tsx still builds and saves UploadPolicyConfig from legacy fields only; update the editor to round-trip the new schema by reading maxFileSizeLimits into the editor state and including it when constructing/saving the UploadPolicyConfig object. Locate the UploadPolicyConfig construction and the state initialization/serialization in upload-policy-editor.tsx (and any helper funcs that assemble the policy object) and add handling for the maxFileSizeLimits array (validate entries, bind inputs to mimeTypePrefix and maxBytes, and include the field when saving) so existing per-MIME overrides are not lost on save.
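The sketch below illustrates both halves of this finding: how a per-MIME override might be resolved, and how an editor save could preserve it. The interfaces mirror the schema fields quoted above, but `resolveMaxBytes` and `applyEditorSave` are hypothetical helpers, and "first matching prefix wins" is an assumption the schema comment does not settle.

```typescript
interface SizeLimit {
  mimeTypePrefix: string;
  maxBytes: number;
}

interface UploadPolicyConfig {
  maxFileSizeBytes: number;
  maxFileSizeLimits?: SizeLimit[];
}

// Assumption: the first matching prefix wins over the global limit.
function resolveMaxBytes(policy: UploadPolicyConfig, mimeType: string): number {
  const override = (policy.maxFileSizeLimits ?? []).find((limit) =>
    mimeType.startsWith(limit.mimeTypePrefix),
  );
  return override ? override.maxBytes : policy.maxFileSizeBytes;
}

// Round-trip on save: start from the stored policy so fields the editor does
// not know about (such as maxFileSizeLimits) are carried over unchanged.
function applyEditorSave(
  stored: UploadPolicyConfig,
  edited: { maxFileSizeBytes: number },
): UploadPolicyConfig {
  return { ...stored, ...edited };
}
```

Spreading the stored policy first is the smallest change that avoids data loss; a full fix would also expose the overrides in the editor UI, as the review asks.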

coderabbitai · 2026-04-21T06:33:09Z

      "tagEmbedding": "Embedding",
      "tagImageGeneration": "Bildgenerierung",
      "tagImageEdit": "Bildbearbeitung",
+      "tagTranscription": "Transkription",


⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
set -euo pipefail
python - <<'PY'
import json
from pathlib import Path

base = Path("services/platform/messages")
files = {k: base / f"{k}.json" for k in ("en", "de", "fr")}
data = {k: json.loads(p.read_text(encoding="utf-8")) for k, p in files.items()}

def flatten_keys(obj, prefix=""):
    out = set()
    if isinstance(obj, dict):
        for k, v in obj.items():
            key = f"{prefix}.{k}" if prefix else k
            out.add(key)
            out |= flatten_keys(v, key)
    return out

keys = {k: flatten_keys(v) for k, v in data.items()}
en = keys["en"]
for loc in ("de", "fr"):
    missing = sorted(en - keys[loc])
    extra = sorted(keys[loc] - en)
    print(f"{loc}: missing={len(missing)} extra={len(extra)}")
    if missing:
        print(f"  sample missing: {missing[:10]}")
    if extra:
        print(f"  sample extra: {extra[:10]}")
PY

Repository: tale-project/tale

Length of output: 425

Update fr.json to match the key structure in en.json and de.json.

Verification confirms de.json is properly synchronized with en.json (missing=0, extra=0), but fr.json is missing 149 keys that exist in both en.json and de.json. Per coding guidelines, all base locales must maintain identical key sets on the same commit. Add the missing keys to fr.json (at minimum the new transcription and audio keys added in this PR, but ideally all 149 missing keys to fully resolve the translation sync violation).

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@services/platform/messages/de.json` at line 1578, fr.json is missing 149 keys that exist in en.json and de.json (including the new keys like "tagTranscription" and related audio/transcription keys); update fr.json to include the full set of keys present in en.json/de.json so all base locales share an identical key set. Open fr.json, compare it against en.json (or de.json) and add the missing keys with placeholder French values (or empty strings) for each missing entry (at minimum add "tagTranscription" and all new audio/transcription keys introduced in this PR), preserving the same key names and nesting as in en.json/de.json to restore synchronization.
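For a check that could live alongside the repository's TypeScript tests, the same key comparison might be sketched as below. This is an illustrative port of the verification idea, with hypothetical function names, and it takes already-parsed locale objects rather than reading the JSON files.

```typescript
type Json = string | number | boolean | null | Json[] | { [key: string]: Json };

// Collect dotted key paths for every nested object key (arrays are leaves).
function flattenKeys(obj: Json, prefix = ''): Set<string> {
  const out = new Set<string>();
  if (obj !== null && typeof obj === 'object' && !Array.isArray(obj)) {
    for (const k of Object.keys(obj)) {
      const key = prefix ? `${prefix}.${k}` : k;
      out.add(key);
      flattenKeys(obj[k], key).forEach((nested) => out.add(nested));
    }
  }
  return out;
}

// Keys present in the base locale but absent from the other locale.
function missingKeys(base: Json, locale: Json): string[] {
  const localeKeys = flattenKeys(locale);
  return Array.from(flattenKeys(base))
    .filter((k) => !localeKeys.has(k))
    .sort();
}
```

Calling `missingKeys(en, fr)` and `missingKeys(fr, en)` gives the "missing" and "extra" lists respectively.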

coderabbitai · 2026-04-21T06:33:09Z

    "invalidFiles": "Fichiers invalides",
    "filesNotSupported": "Certains fichiers sont trop volumineux (>100 Mo) ou non pris en charge. Formats acceptés : images, PDF, documents Word, fichiers texte.",
    "fileSizeExceededMultiple": "{names} dépasse(nt) la limite de taille de fichier de {maxSize} Mo.",
+    "audioDurationExceeded": "{names} : l'audio dépasse la limite de {maxHours} heures. Veuillez le découper en segments plus courts.",


🧹 Nitpick | 🔵 Trivial

Keep chat copy tone consistent with informal voice.

This line switches to formal wording (“Veuillez…”), while surrounding chat messages use informal tone.

✍️ Suggested wording tweak

-    "audioDurationExceeded": "{names} : l'audio dépasse la limite de {maxHours} heures. Veuillez le découper en segments plus courts.",
+    "audioDurationExceeded": "{names} : la durée audio dépasse la limite de {maxHours} heures. Merci de le découper en segments plus courts.",

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@services/platform/messages/fr.json` at line 2865, Update the French message for the key "audioDurationExceeded" to use the same informal chat tone as surrounding messages: keep the placeholders {names} and {maxHours} but replace the formal "Veuillez le découper en segments plus courts." with an informal phrasing such as "Découpe‑le en segments plus courts." so the final string reads similarly to "{names} : l'audio dépasse la limite de {maxHours} heures. Découpe‑le en segments plus courts."

- Audio chip with completed transcription shows an eye button in the bottom-right corner; clicking opens a ViewDialog with the full transcript text and duration subtitle.
- Include audio/* and explicit extensions (mp3/m4a/wav/webm/ogg/…) in CHAT_UPLOAD_ACCEPT so the OS file picker no longer hides audio files by default.

- Paragraph breaks derived from Whisper segments (>=1.5s pauses or 45s max) replace the wall-of-text look in transcript previews.
- Dedup identical uploads via Convex's built-in _storage.sha256: repeat uploads of the same content short-circuit to the cached transcript, with no ffmpeg, no OpenAI spend, and no ledger entry.
- Sent-message audio chip gets the same Eye preview button as the composer chip; opens a ViewDialog with the full transcript.
- Message text no longer embeds the raw transcript; it emits a compact "transcript indexed in knowledge base, call rag_search with fileId=..." reference, matching the PDF pattern. User bubbles stay clean while agents retrieve on demand via RAG.
- Wire audio handling into the real buildMessageWithAttachments path (the earlier Phase 1 edit to process_attachments.ts was dead code).
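The paragraph-break rule mentioned in this commit message (a break at pauses of at least 1.5 s, or once a paragraph reaches 45 s) could be sketched as follows. The `Segment` shape assumes Whisper-style segments with `start`, `end`, and `text`; the function name and the exact reading of "45s max" as a cap on paragraph span are assumptions, not the merged implementation.

```typescript
interface Segment {
  start: number; // seconds
  end: number; // seconds
  text: string;
}

// Group segments into paragraphs: break on a long pause, or when adding the
// next segment would make the paragraph span more than maxParagraphSec.
function toParagraphs(
  segments: Segment[],
  pauseSec = 1.5,
  maxParagraphSec = 45,
): string[] {
  const paragraphs: string[] = [];
  let current: string[] = [];
  let paragraphStart = 0;
  let prevEnd = 0;

  for (const seg of segments) {
    const text = seg.text.trim();
    if (text === '') continue; // ignore empty segments

    const pause = seg.start - prevEnd;
    const tooLong = seg.end - paragraphStart > maxParagraphSec;
    if (current.length > 0 && (pause >= pauseSec || tooLong)) {
      paragraphs.push(current.join(' '));
      current = [];
    }
    if (current.length === 0) paragraphStart = seg.start;
    current.push(text);
    prevEnd = seg.end;
  }
  if (current.length > 0) paragraphs.push(current.join(' '));
  return paragraphs;
}
```

Joining the returned paragraphs with blank lines yields the preview text.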

- document_retrieve was reporting "still being indexed (status: pending)" for completed audio uploads. Root cause: we gate out the regular uploadFileToRag for audio, so the audio row's ragStatus stayed undefined, but the tool only reads ragStatus, not transcriptRagStatus. indexTranscriptToRag now mirrors both fields in lockstep.
- Fix RAG 400 on upload: SUPPORTED_EXTENSIONS has no audio extensions, so passing the raw audio filename (meeting.mp3) was rejected. Append `.txt`: content is already text/plain and the original name stays visible via metadata.originalFileName.
- Citation block clicks on audio sources now open the transcript preview (ViewDialog) rather than DocumentPreviewDialog (which tried to render the audio bytes as a document).
- Point agents at document_retrieve (not rag_search) in the message reference; document_retrieve is designed for "read full content by fileId", which matches the transcript-summary use case.
- Re-index on dedup hit using the new storageId so citations in future chats point to the actually-uploaded file rather than the original.
- Add "indexing → indexed" UI phase (transcription completed but RAG still running) and block send until both phases complete.
- RAG service now logs every 4xx rejection with filename + reason at warning level, so operators can see why "400 Bad Request" happened.

Video files now flow through the same transcription pipeline as audio: ffmpeg's `-vn` flag strips the video track, the audio is silence-removed and re-encoded to 32 kbps Opus, and the transcript is indexed to RAG just like audio uploads. One pipeline, 11 container formats.

Supported extensions:
- mp4 / m4v (video/mp4, video/x-m4v): Zoom/Teams/Meet recordings
- mov / qt (video/quicktime): Mac screen recordings
- webm (video/webm): browser recordings
- mkv (video/x-matroska): OBS and general purpose
- avi (video/x-msvideo): legacy Windows
- mpeg / mpg (video/mpeg)
- ogv (video/ogg): Ogg Theora
- 3gp / 3g2 (video/3gpp): mobile recordings
- ts / m2ts (video/mp2t): MPEG transport stream

Changes:
- New `isVideo` + `isAudioOrVideo` helpers in lib/shared/file-types.
- Client: Film icon (indigo) for video chips, duration read via HTMLVideoElement so it works for both audio-only and video files.
- Server: mutations + buildMessageWithAttachments + file-displays + source-cards all route audio and video through the same transcription path. LLM sees `🎥` for video, `🎙️` for audio.
- Max file size raised to 2 GB for audio/video (covers 4-hour 720p meetings). Duration cap stays at 4 hours.
- i18n: `fileTypes.video` added to en/de/fr.

…ed exports
- process_attachments.test.ts mocked `../lib/shared/file-types` without `isAudio`, which broke after the real module started importing it. Added to the mock.
- file-types.test.ts expected `video/mp4` → undefined; now that we officially support video mp4, swap the assertion to an actually-unknown MIME.
- Knip flagged `isVideo` and `CHAT_AUDIO_MAX_FILE_SIZE` as unused exports; both are only consumed inside file-types.ts via `isAudioOrVideo` and `getMaxFileSizeForType`. Make them module-local.

- Split provider docs into UI-only (platform/admin/providers) and a new system-level reference (self-hosted/configuration/providers) covering JSON config schema, example files, SOPS secrets, self-hosted backends, Docker networking, and provider pinning; removed platform/integrations/providers in favour of the new reference.
- Document audio and video chat transcription (PR #1591) in chat/attachments and chat/basics across all locales: new transcription model tag, server-side pipeline, status UI, 4-hour duration cap, per-minute pricing via centsPerAudioMinute, and per-MIME upload size caps in governance.
- Reframe the Meetily meeting-transcription tutorial as the fully-local alternative to the new server-side path.
- Add a UI-vs-system placement rule to docs/AGENTS.md with a worked provider example so future changes keep end-user content under platform/ and filesystem/config content under self-hosted/.
- Update inbound links, tutorial cross-references, and docs.json nav across en/de/fr; add explicit {#upload-policy} anchor on governance headings for cross-locale link stability.

coderabbitai Bot requested changes Apr 21, 2026

larryro added 5 commits April 21, 2026 14:34

larryro merged commit c394a54 into main Apr 21, 2026
17 checks passed

larryro deleted the feat/chat-audio-transcription branch April 21, 2026 08:16
