feat(platform): audio transcription for chat uploads #1591
Audio files uploaded to chat are server-side compressed via ffmpeg (silence removal + 32 kbps Opus mono 16 kHz), chunked into <24 MB pieces if needed, transcribed via an OpenAI-compatible Whisper provider, and inlined into the LLM prompt. Transcripts are also indexed to RAG for cross-chat search. Usage flows into the existing ledger with a new per-minute cost model; self-hosted Whisper swaps in by changing baseUrl only. 4-hour duration cap enforced client-side; watchdog cron recovers any stuck 'running' rows; user removal/cancel stops pending retries.
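As a rough back-of-envelope check on the numbers above, the sketch below estimates how many chunks a recording needs after compression. It assumes a constant 32 kbps stream, reads "24 MB" as 24 MiB, and ignores container overhead and silence removal (which would only shrink the output). `estimateCompressedBytes` and `estimateChunkCount` are hypothetical names, not code from this PR.

```typescript
// Hypothetical helpers: estimate compressed size and chunk count for a recording.
const OPUS_BITRATE_BPS = 32_000; // 32 kbps, from the PR description
const MAX_CHUNK_BYTES = 24 * 1024 * 1024; // assumption: "24 MB" read as 24 MiB

function estimateCompressedBytes(durationSec: number): number {
  // bits per second divided by 8 gives bytes per second
  return Math.ceil((durationSec * OPUS_BITRATE_BPS) / 8);
}

function estimateChunkCount(durationSec: number): number {
  const bytes = estimateCompressedBytes(durationSec);
  // Always at least one chunk, even for very short clips.
  return Math.max(1, Math.ceil(bytes / MAX_CHUNK_BYTES));
}
```

Under these assumptions a 4-hour recording (the client-side cap) compresses to about 57.6 MB and needs three chunks, while a one-hour recording fits in a single chunk.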
📝 Walkthrough
This PR introduces audio transcription capabilities to the platform. It adds OpenAI's Whisper-1 as a configurable transcription provider, implements server-side audio preprocessing (compression, chunking via ffmpeg), and integrates transcription state management throughout the chat interface. The backend orchestrates transcription via Convex actions with OpenAI API integration, error recovery, retry logic, and cost tracking per audio duration. Client-side components display transcription status, validate audio file constraints (duration/size), and allow users to skip or retry transcriptions. Schema updates track the transcription lifecycle.
Estimated code review effort
🎯 4 (Complex) | ⏱️ ~65 minutes
🚥 Pre-merge checks | ✅ 4 | ❌ 1
❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
Actionable comments posted: 17
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (3)
services/platform/app/features/settings/providers/components/provider-edit-panel.tsx (1)
95-103: ⚠️ Potential issue | 🟠 Major
`isDirty` misses the new `transcription` default, so Save can stay disabled.
After adding `transcription` to the defaults UI (Line 178), the dirty-state check still stops at `embedding` (Line 100-103). If a user only changes transcription, the form won't be considered dirty and cannot be saved.
💡 Minimal fix
 const isDirty =
   data?.ok &&
   (form.displayName !== data.config.displayName ||
     form.description !== (data.config.description ?? '') ||
     form.baseUrl !== data.config.baseUrl ||
     form.defaults.chat !== (data.config.defaults?.chat ?? NONE_VALUE) ||
     form.defaults.vision !== (data.config.defaults?.vision ?? NONE_VALUE) ||
     form.defaults.embedding !==
-      (data.config.defaults?.embedding ?? NONE_VALUE));
+      (data.config.defaults?.embedding ?? NONE_VALUE) ||
+    form.defaults.transcription !==
+      (data.config.defaults?.transcription ?? NONE_VALUE));
Also applies to: 178-180
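One way to keep this check from drifting as new default slots are added is to compare over a single list of keys. The sketch below is illustrative only: `DEFAULT_KEYS`, `defaultsAreDirty`, and the `'none'` sentinel value are assumptions, not the panel's actual code.

```typescript
// Assumption: the real NONE_VALUE sentinel in the panel may differ.
const NONE_VALUE = 'none';

// Single source of truth for the default slots shown in the UI.
const DEFAULT_KEYS = ['chat', 'vision', 'embedding', 'transcription'] as const;
type DefaultKey = (typeof DEFAULT_KEYS)[number];

type FormDefaults = Record<DefaultKey, string>;
type SavedDefaults = Partial<Record<DefaultKey, string>>;

// True when any slot in the form differs from the saved config,
// treating a missing saved value as NONE_VALUE.
function defaultsAreDirty(form: FormDefaults, saved?: SavedDefaults): boolean {
  return DEFAULT_KEYS.some((key) => form[key] !== (saved?.[key] ?? NONE_VALUE));
}
```

With this shape, adding a fifth slot means extending `DEFAULT_KEYS` once, and the type checker then requires the form state to carry it.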
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@services/platform/app/features/settings/providers/components/provider-edit-panel.tsx` around lines 95 - 103, The isDirty computation in provider-edit-panel.tsx omits the new transcription default, so changes to form.defaults.transcription won't mark the form dirty; update the isDirty boolean to include a comparison of form.defaults.transcription !== (data.config.defaults?.transcription ?? NONE_VALUE) along with the other defaults (chat, vision, embedding) and ensure any other dirty checks that mirror this logic (e.g., the save-enabling checks around the defaults UI) are updated similarly to reference form.defaults.transcription and data.config.defaults?.transcription and use NONE_VALUE for the fallback.
services/platform/convex/lib/attachments/process_attachments.ts (1)
8-13: ⚠️ Potential issue | 🟠 Major
Update the `file-types` mock before merging.
The Unit job is already failing because `convex/lib/attachments/__tests__/process_attachments.test.ts` mocks `../../../../lib/shared/file-types` without an `isAudio` export. This import change needs a matching test update or the server suite stays red.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@services/platform/convex/lib/attachments/process_attachments.ts` around lines 8 - 13, The test mock for '../../../../lib/shared/file-types' is missing the newly imported isAudio export used by process_attachments (imported as isAudio, isImage, isSpreadsheet, isTextFile in process_attachments.ts); update the mock in convex/lib/attachments/__tests__/process_attachments.test.ts to include a stubbed isAudio export (matching the other mocked functions) or adjust the test's mocked module to the updated import path so the test provides isAudio; ensure the mock name/signature aligns with how process_attachments calls isAudio.
services/platform/lib/shared/file-types.ts (1)
242-243: ⚠️ Potential issue | 🟡 Minor
Update `CHAT_UPLOAD_ACCEPT` to include audio file types.
`CHAT_UPLOAD_ACCEPT` currently aliases `TEXT_FILE_ACCEPT`, which does not include audio MIME types. The native file picker will reject audio files even though runtime validation (line 312 in `chat-input.tsx`) accepts them. Audio types are already defined in `file-types.ts` (audio/mpeg, audio/wav, audio/mp4, audio/webm, audio/ogg) but must be added to the accept list used by the input element.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@services/platform/lib/shared/file-types.ts` around lines 242 - 243, CHAT_UPLOAD_ACCEPT currently aliases TEXT_FILE_ACCEPT and so omits audio MIME types; update the CHAT_UPLOAD_ACCEPT constant to include the audio MIME types defined in this module (add the audio MIME list or concatenate the existing audio constant) so the input accept string allows audio files too — change export const CHAT_UPLOAD_ACCEPT = TEXT_FILE_ACCEPT to a combined value (e.g., `${TEXT_FILE_ACCEPT},${AUDIO_FILE_ACCEPT}` or append the specific audio MIME types like audio/mpeg,audio/wav,audio/mp4,audio/webm,audio/ogg) and ensure the symbol CHAT_UPLOAD_ACCEPT is exported with the new value.
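A minimal sketch of the combined accept string follows. The `TEXT_FILE_ACCEPT` value here is a shortened placeholder and `AUDIO_MIME_TYPES` is a hypothetical name; only the five audio MIME types are taken from the comment above.

```typescript
// Placeholder: the real TEXT_FILE_ACCEPT in file-types.ts is assumed to be longer.
const TEXT_FILE_ACCEPT = '.txt,.md,text/plain';

// Audio MIME types listed in the review comment.
const AUDIO_MIME_TYPES = [
  'audio/mpeg',
  'audio/wav',
  'audio/mp4',
  'audio/webm',
  'audio/ogg',
] as const;

// Comma-separated list, the format the input element's accept attribute expects.
const AUDIO_FILE_ACCEPT = AUDIO_MIME_TYPES.join(',');

// Combined value for the chat file input.
const CHAT_UPLOAD_ACCEPT = `${TEXT_FILE_ACCEPT},${AUDIO_FILE_ACCEPT}`;
```

Deriving the accept string from the same array that runtime validation uses would keep the picker and the validator from disagreeing again.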
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@services/platform/app/features/chat/components/chat-input.tsx`:
- Around line 312-363: The audio-attachment branch falls back to file size when
transcription metadata hasn't loaded; add a pending UI when the component-level
isTranscribing/isTranscriptionQueryLoading flag is true but
transcriptionStatuses.get(attachment.fileId) (the local variable info) is still
undefined. In the audio branch (inside attachment.fileType.startsWith('audio/'))
check if isTranscribing (or the prop passed from chat-interface, e.g.,
isTranscriptionQueryLoading via isTranscribing) and !info, and return the same
HStack Loader + caption (using tChat('transcription.transcribing') or
info?.progress) so attachments show a transcribing pending state instead of file
size while metadata is loading.
In `@services/platform/app/features/chat/hooks/use-convex-file-upload.ts`:
- Around line 104-112: The current Math.min(mergedConfig.maxFileSize,
getMaxFileSizeForType(resolvedType)) wrongly enforces the generic
CHAT_MAX_FILE_SIZE for audio; change the logic so audio uses the per-type
ceiling instead of being min'd with mergedConfig. Replace the Math.min
expression that sets perTypeLimit with a conditional that uses
getMaxFileSizeForType(resolvedType) when resolvedType is audio (or when
getMaxFileSizeForType returns a larger per-type ceiling), otherwise use
mergedConfig.maxFileSize; keep the rest of the rejection flow
(rejectedTooLarge.push(file)) unchanged.
- Around line 132-139: Replace the sentinel mutation pattern that sets (entry as
unknown as { _tooLong?: true })._tooLong with a type-safe Set: declare const
tooLongFiles = new Set<File>() before the Promise.all where you check durations;
inside the duration-check branch call tooLongFiles.add(entry.file) instead of
mutating entry; then change the filtered computation that currently references
validFiles.filter(/* sentinel check */) to const filtered =
validFiles.filter((v) => !tooLongFiles.has(v.file)); update references to the
sentinel (_tooLong) and remove the unsafe casts.
In `@services/platform/app/features/chat/hooks/use-file-transcription-status.ts`:
- Around line 55-61: The current map entry sets startedAt using m._creationTime
which is the file-metadata creation time and misrepresents when transcription
began; update the code that builds the map (the map.set call) to use a real
transcription start timestamp instead of m._creationTime—preferably a
backend-provided field like m.transcriptionStartedAt (or similar) that reflects
when status switched to "running"—and if such a field is not available, rename
the property exposed to callers from startedAt to createdAt so callers don't
treat it as the transcription start time; ensure any downstream uses that check
the 60-second skip threshold or elapsed UI use the correct transcription-start
field (e.g., transcriptionStartedAt) or the renamed createdAt to avoid
miscalculation.
In `@services/platform/app/features/chat/utils/get-audio-duration.ts`:
- Around line 15-24: The Promise in getAudioDuration can hang if neither
'loadedmetadata' nor 'error' fires; add a safety timeout (e.g., configurable ms)
that resolves to null after expiry, and ensure proper cleanup by removing the
'loadedmetadata' and 'error' listeners and clearing the timeout when any handler
runs so the Promise settles exactly once; update the Promise executor in
getAudioDuration (references: audio element, 'loadedmetadata' handler, 'error'
handler) to create and clear the timeout and to detach listeners in each branch.
In `@services/platform/convex/file_metadata/internal_mutations.ts`:
- Around line 255-270: The watchdog currently uses row._creationTime in
recoverStuckTranscriptions which false-positives when transcription started
later; add a transcriptionStartedAt?: number to the file metadata schema, set
transcriptionStartedAt = Date.now() when updateFileTranscription transitions
transcriptionStatus to 'running', and change recoverStuckTranscriptions to
compare transcriptionStartedAt (falling back to _creationTime if
transcriptionStartedAt is absent) against the cutoff before marking
transcriptionStatus 'failed' so only truly stuck transcriptions are timed out.
In `@services/platform/convex/file_metadata/mutations.ts`:
- Around line 196-210: The retry path currently unconditionally sets
transcriptionStatus to 'queued' and enqueues
internal.file_metadata.transcribe_audio.transcribeAudio, causing duplicate work;
change the logic in the retry handler in mutations.ts to first read the current
transcriptionStatus (via ctx.db.get or a conditional patch on metadata._id) and
only transition and schedule when the status is a terminal retryable state
(e.g., 'failed' or 'skipped'); if status is already 'queued', 'running', or
'completed' return no-op; ensure transcriptionError is cleared only when you
successfully transition to 'queued' and then call ctx.scheduler.runAfter with
storageId, fileName, contentType, and organizationId.
- Around line 149-167: The handler currently only compares args.organizationId
with the row, allowing any authenticated user who knows a storageId to mutate
transcription state and it doesn't verify the file is audio; update the mutation
handlers (the async handler functions shown and the similar handler at lines
181-194) to: after authComponent.getAuthUser(ctx) call, call the existing
getOrganizationMember (or equivalent helper) to verify the authUser is a member
of args.organizationId and throw on failure, then after loading metadata check
metadata.contentType exists and startsWith('audio/') (or otherwise indicates
audio) and throw an error if not audio; keep the existing
storageId/organizationId checks and avoid introducing authorizeRls() into these
mutation handlers.
In `@services/platform/convex/file_metadata/queries.ts`:
- Around line 91-105: The schema exposes sensitive transcript fields
(transcript, transcriptionError, _creationTime) in the getByStorageIds path
without scoping rows to the caller; update the lookup (getByStorageIds) to
require and verify organizationId/owner before returning those fields (either
add an organizationId parameter and include it in the query filter, or filter by
caller identity/ownership in the resolver) and apply the same fix to the other
query that exposes these fields in the 130-135 region so rows are only returned
with transcript-related fields when organization/ownership is confirmed.
In `@services/platform/convex/governance/internal_mutations.ts`:
- Around line 247-273: The current upsert (using existingQuery.first() then
ctx.db.insert('usageLedger', ...)) can race and create duplicate rows; replace
it with the same reconciliation/upsert approach used in incrementUsageLedger (or
extract that logic into a shared helper) so concurrent transcriptions reconcile
into a single row. Specifically, instead of blind insert after !match, either
call the existing incrementUsageLedger helper or implement the reconciliation:
attempt insert, catch duplicate-key errors, then re-query the matching row
(using existingQuery) and ctx.db.patch to atomically add audioDurationSec,
costEstimate, and increment requestCount; ensure the helper is used by this
mutation to avoid duplicating the race-prone first()+insert() pattern.
- Around line 236-245: The existing query building the upsert lookup
(existingQuery) uses the index 'by_org_user_period_team_agent_model' and keys on
organizationId, userId, periodKey, teamId, agentSlug, and model but omits
provider; update the lookup and upsert to include provider so rows are keyed
per-provider (either add .eq('provider', args.provider) to the withIndex chain
and use an index that includes provider, or switch to (or create) an index whose
name includes provider, such as 'by_org_user_period_team_agent_model_provider'),
ensuring the same provider field is included in both the query (existingQuery)
and the upsert/write path so transcription usage for identical model names
across providers does not collide.
In `@services/platform/convex/governance/upload_enforcement.ts`:
- Around line 74-85: The code uses the caller-provided mimeType to pick a
per-MIME size override (checking mimeType and config.maxFileSizeLimits) which
allows clients to escalate limits; instead, only apply per-MIME overrides when
you have a trusted, server-derived media type (e.g., a
validated/detectedMimeType or a flag like contentValidated) or else fall back to
the global config.maxFileSizeBytes; update the branch around
mimeType/config.maxFileSizeLimits/limit so it checks a server-validated value
(or a validation flag) before selecting a match and never raises the cap based
solely on the caller-controlled mimeType.
In `@services/platform/convex/lib/attachments/process_attachments.ts`:
- Around line 130-149: The document-count heuristic still includes audio files;
update the documentCount calculation to exclude audio by using the
already-filtered documentAttachments (or subtract audioAttachments.length)
instead of counting all non-image/non-spreadsheet files. Locate the variables
documentCount and maxDocLength logic and replace the current count with
documentAttachments.length (or adjust to deduct audioAttachments) so audio
(audioAttachments) no longer reduces maxDocLength.
In `@services/platform/lib/shared/file-types.ts`:
- Line 336: The exported constant CHAT_AUDIO_MAX_FILE_SIZE is unused and
triggers knip; make it module-local by removing the export modifier (change
"export const CHAT_AUDIO_MAX_FILE_SIZE" to "const CHAT_AUDIO_MAX_FILE_SIZE") so
it remains available inside file-types.ts but not exported. If another module
actually requires the raw limit, instead add an explicit export where it's used
or import the value from a single authoritative place; otherwise leave it
non-exported to unblock CI.
In `@services/platform/lib/shared/schemas/governance.ts`:
- Around line 55-66: The settings UI currently doesn't preserve the new
maxFileSizeLimits field because upload-policy-editor.tsx still builds and saves
UploadPolicyConfig from legacy fields only; update the editor to round-trip the
new schema by reading maxFileSizeLimits into the editor state and including it
when constructing/saving the UploadPolicyConfig object. Locate the
UploadPolicyConfig construction and the state initialization/serialization in
upload-policy-editor.tsx (and any helper funcs that assemble the policy object)
and add handling for the maxFileSizeLimits array (validate entries, bind inputs
to mimeTypePrefix and maxBytes, and include the field when saving) so existing
per-MIME overrides are not lost on save.
In `@services/platform/messages/de.json`:
- Line 1578: fr.json is missing 149 keys that exist in en.json and de.json
(including the new keys like "tagTranscription" and related audio/transcription
keys); update fr.json to include the full set of keys present in en.json/de.json
so all base locales share an identical key set. Open fr.json, compare it against
en.json (or de.json) and add the missing keys with placeholder French values (or
empty strings) for each missing entry (at minimum add "tagTranscription" and all
new audio/transcription keys introduced in this PR), preserving the same key
names and nesting as in en.json/de.json to restore synchronization.
In `@services/platform/messages/fr.json`:
- Line 2865: Update the French message for the key "audioDurationExceeded" to
use the same informal chat tone as surrounding messages: keep the placeholders
{names} and {maxHours} but replace the formal "Veuillez le découper en segments
plus courts." with an informal phrasing such as "Découpe‑le en segments plus
courts." so the final string reads similarly to "{names} : l'audio dépasse la
limite de {maxHours} heures. Découpe‑le en segments plus courts."
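The retry guard requested for `convex/file_metadata/mutations.ts` in the list above can be expressed as a small pure decision function that the mutation consults before patching and scheduling. This is a sketch under assumptions: `planRetry` and `TranscriptionState` are hypothetical names, and the status values are the ones that appear in the reviewed UI code.

```typescript
type TranscriptionStatus =
  | 'queued'
  | 'running'
  | 'completed'
  | 'failed'
  | 'skipped';

interface TranscriptionState {
  transcriptionStatus: TranscriptionStatus;
  transcriptionError?: string;
}

// Returns the patch to apply, or null when the retry should be a no-op.
function planRetry(current: TranscriptionState): TranscriptionState | null {
  switch (current.transcriptionStatus) {
    case 'failed':
    case 'skipped':
      // Terminal, retryable states: re-queue and clear the previous error.
      return { transcriptionStatus: 'queued', transcriptionError: undefined };
    case 'queued':
    case 'running':
    case 'completed':
      // Work is already pending, in flight, or done: do not schedule again.
      return null;
  }
}
```

The caller would schedule the transcription action only when the result is non-null, so repeated clicks on retry cannot enqueue duplicate work.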
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: f620b175-4e89-44e1-8c17-2dab27ed68a7
⛔ Files ignored due to path filters (1)
- `services/platform/convex/_generated/api.d.ts` is excluded by `!**/_generated/**`
📒 Files selected for processing (33)
- examples/providers/openai.json
- examples/providers/openai.secrets.json
- services/convex/Dockerfile
- services/platform/app/features/chat/components/chat-input.tsx
- services/platform/app/features/chat/components/chat-interface.tsx
- services/platform/app/features/chat/components/message-bubble/file-displays.tsx
- services/platform/app/features/chat/components/model-tag-icons.tsx
- services/platform/app/features/chat/hooks/use-convex-file-upload.ts
- services/platform/app/features/chat/hooks/use-file-transcription-status.ts
- services/platform/app/features/chat/utils/get-audio-duration.ts
- services/platform/app/features/settings/providers/components/provider-add-panel.tsx
- services/platform/app/features/settings/providers/components/provider-edit-panel.tsx
- services/platform/app/features/settings/providers/utils/model-tag-label.ts
- services/platform/convex/crons.ts
- services/platform/convex/file_metadata/audio_preprocess.ts
- services/platform/convex/file_metadata/internal_mutations.ts
- services/platform/convex/file_metadata/mutations.ts
- services/platform/convex/file_metadata/queries.ts
- services/platform/convex/file_metadata/schema.ts
- services/platform/convex/file_metadata/transcribe_audio.ts
- services/platform/convex/governance/cost_estimation.ts
- services/platform/convex/governance/internal_mutations.ts
- services/platform/convex/governance/schema.ts
- services/platform/convex/governance/upload_enforcement.ts
- services/platform/convex/lib/attachments/process_attachments.ts
- services/platform/convex/providers/file_actions.ts
- services/platform/convex/providers/resolve_model.ts
- services/platform/lib/shared/file-types.ts
- services/platform/lib/shared/schemas/governance.ts
- services/platform/lib/shared/schemas/providers.ts
- services/platform/messages/de.json
- services/platform/messages/en.json
- services/platform/messages/fr.json
if (attachment.fileType.startsWith('audio/')) {
  const info = transcriptionStatuses?.get(
    attachment.fileId,
  );
  const status = info?.status;
  if (status === 'queued' || status === 'running') {
    return (
      <HStack gap={1} align="center">
        <Loader className="text-muted-foreground/50 size-3 animate-spin" />
        <Text
          as="span"
          variant="caption"
          className="text-muted-foreground/50"
        >
          {info?.progress ||
            tChat('transcription.transcribing')}
        </Text>
      </HStack>
    );
  }
  if (status === 'completed') {
    return (
      <Text
        as="span"
        variant="caption"
        className="text-muted-foreground/70"
      >
        {tChat('transcription.transcribed')}
      </Text>
    );
  }
  if (status === 'failed' || status === 'skipped') {
    return (
      <Text
        as="span"
        variant="caption"
        className="text-destructive"
      >
        {tChat('transcription.couldNotTranscribe')}
      </Text>
    );
  }
  return (
    <Text
      as="div"
      variant="caption"
      className="text-muted-foreground/50"
    >
      {formatFileSize(attachment.fileSize)}
    </Text>
  );
}
🛠️ Refactor suggestion | 🟠 Major
Show a pending state while transcription metadata is still loading.
chat-interface.tsx can pass isTranscribing={isTranscribing || isTranscriptionQueryLoading}, but this branch still falls back to file size when the status map has not populated yet. That leaves the attachment looking idle while send is disabled for transcription.
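The branch order at issue can be isolated as a pure function, which makes the missing case easy to see and to unit-test. This is a sketch only: `resolveAudioCaption` and the `AudioCaption` labels are hypothetical, and the real component renders JSX rather than returning a string.

```typescript
type AudioStatus = 'queued' | 'running' | 'completed' | 'failed' | 'skipped';
type AudioCaption =
  | 'transcribing'
  | 'transcribed'
  | 'couldNotTranscribe'
  | 'fileSize';

// Decide which caption an audio attachment should show.
// `status` is undefined while the status map has not loaded yet.
function resolveAudioCaption(
  status: AudioStatus | undefined,
  isTranscribing: boolean,
): AudioCaption {
  if (status === 'queued' || status === 'running') return 'transcribing';
  if (status === 'completed') return 'transcribed';
  if (status === 'failed' || status === 'skipped') return 'couldNotTranscribe';
  // No status yet: show the pending state if the parent reports
  // transcription (or the status query) as in flight.
  return isTranscribing ? 'transcribing' : 'fileSize';
}
```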
♻️ Proposed adjustment
if (status === 'failed' || status === 'skipped') {
return (
<Text
as="span"
variant="caption"
className="text-destructive"
>
{tChat('transcription.couldNotTranscribe')}
</Text>
);
}
+ if (isTranscribing) {
+ return (
+ <HStack gap={1} align="center">
+ <Loader className="text-muted-foreground/50 size-3 animate-spin" />
+ <Text
+ as="span"
+ variant="caption"
+ className="text-muted-foreground/50"
+ >
+ {tChat('transcription.transcribing')}
+ </Text>
+ </HStack>
+ );
+ }
return (
<Text
as="div"
variant="caption"
className="text-muted-foreground/50"
// Per-type ceiling: audio max file size is 1 GB (duration is the
// real gate — see audio duration check below); other types cap at
// the generic `maxFileSize`.
const perTypeLimit = Math.min(
  mergedConfig.maxFileSize,
  getMaxFileSizeForType(resolvedType),
);
if (file.size > perTypeLimit) {
  rejectedTooLarge.push(file);
Math.min keeps audio capped at 100 MB.
With the default config, mergedConfig.maxFileSize is still CHAT_MAX_FILE_SIZE (100 MB). For audio files this computes Math.min(100 MB, 1 GB), so long audio gets rejected before the new 4-hour duration check ever runs. That breaks the intended audio-upload path and will also show the wrong size limit in the rejection toast.
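The arithmetic behind this finding, and one possible shape for the fix, can be sketched as follows. The constants mirror the sizes quoted in the comment (interpreted here as binary megabytes, which is an assumption), `getMaxFileSizeForType` is a stand-in for the real helper, and `resolvePerTypeLimit` is a hypothetical name.

```typescript
const MB = 1024 * 1024; // assumption: sizes are binary megabytes
const CHAT_MAX_FILE_SIZE = 100 * MB; // generic cap quoted in the review
const AUDIO_MAX_FILE_SIZE = 1024 * MB; // 1 GB audio ceiling quoted in the review

// Stand-in for the real getMaxFileSizeForType helper.
function getMaxFileSizeForType(mimeType: string): number {
  return mimeType.startsWith('audio/')
    ? AUDIO_MAX_FILE_SIZE
    : CHAT_MAX_FILE_SIZE;
}

// Audio uses its own ceiling; every other type keeps the configured cap.
function resolvePerTypeLimit(mimeType: string, configuredMax: number): number {
  return mimeType.startsWith('audio/')
    ? getMaxFileSizeForType(mimeType)
    : configuredMax;
}
```

With the default config, `Math.min(CHAT_MAX_FILE_SIZE, AUDIO_MAX_FILE_SIZE)` evaluates to the 100 MB cap, which is the behavior the comment flags; the conditional above returns the 1 GB ceiling for audio instead.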
      // oxlint-disable-next-line typescript/no-unsafe-type-assertion -- sentinel used only for filter step below
      (entry as unknown as { _tooLong?: true })._tooLong = true;
    }
  }),
);
const filtered = validFiles.filter(
  // oxlint-disable-next-line typescript/no-unsafe-type-assertion -- inverse of the sentinel set above
  (v) => !(v as unknown as { _tooLong?: true })._tooLong,
🧩 Analysis chain
🏁 Script executed:
# Get file info and view the relevant section
wc -l services/platform/app/features/chat/hooks/use-convex-file-upload.ts
Repository: tale-project/tale
Length of output: 132
🏁 Script executed:
# Read context around lines 132-139
sed -n '110,160p' services/platform/app/features/chat/hooks/use-convex-file-upload.ts
Repository: tale-project/tale
Length of output: 2156
🏁 Script executed:
# Read broader context including function signature and variable declarations
sed -n '90,145p' services/platform/app/features/chat/hooks/use-convex-file-upload.ts
Repository: tale-project/tale
Length of output: 2427
🏁 Script executed:
# Look for the type of validFiles and entry
sed -n '80,135p' services/platform/app/features/chat/hooks/use-convex-file-upload.ts | head -60
Repository: tale-project/tale
Length of output: 2405
🏁 Script executed:
# Verify the full function signature and context to ensure Set<File> is viable
sed -n '60,80p' services/platform/app/features/chat/hooks/use-convex-file-upload.ts
Repository: tale-project/tale
Length of output: 696
Replace the sentinel _tooLong pattern with a Set<File> to track oversized audio.
The current as unknown as { _tooLong?: true } approach mutates objects with shadow fields and violates the TypeScript guideline: "Never as, never any, never unknown in TypeScript. Use type guards, generics, discriminated unions, or never."
Instead, declare const tooLongFiles = new Set<File>() before the Promise.all block. Inside the duration check, call tooLongFiles.add(entry.file) instead of mutating the entry. Then replace the filtered assignment with:
const filtered = validFiles.filter((v) => !tooLongFiles.has(v.file));
This is type-safe, cleaner, and avoids mutations.
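A self-contained sketch of the Set-based approach is shown below. It is generic over the file type so it runs outside the browser, and it takes precomputed durations instead of awaiting them. `dropTooLong`, `UploadEntry`, and the choice to keep files whose duration is unknown are assumptions for illustration, not the hook's actual code.

```typescript
interface UploadEntry<F> {
  file: F;
}

// Collect over-long files in a Set, then filter by membership.
// No entry is mutated and no type assertion is needed.
function dropTooLong<F>(
  entries: UploadEntry<F>[],
  durationsSec: ReadonlyMap<F, number | null>,
  maxDurationSec: number,
): { kept: UploadEntry<F>[]; tooLong: Set<F> } {
  const tooLong = new Set<F>();
  for (const { file } of entries) {
    const duration = durationsSec.get(file);
    // Assumption: an unknown duration (null or missing) does not reject the file.
    if (duration != null && duration > maxDurationSec) {
      tooLong.add(file);
    }
  }
  const kept = entries.filter((entry) => !tooLong.has(entry.file));
  return { kept, tooLong };
}
```

Returning the Set alongside the kept entries also gives the caller the list of rejected files for the "audio too long" toast without a second pass.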
map.set(m.storageId, {
  status: m.transcriptionStatus,
  error: m.transcriptionError,
  transcript: m.transcript,
  durationSec: m.transcriptionDurationSec,
  progress: m.transcriptionProgress,
  startedAt: m._creationTime,
startedAt is populated with the wrong timestamp.
m._creationTime is the file-metadata row creation time, not the moment transcription entered running. If this value drives the 60-second skip threshold or any elapsed-time UI, queued time and retries will be measured incorrectly. Either expose a real transcription-start field from the backend or rename this to createdAt so callers do not misinterpret it.
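To illustrate why the choice of timestamp matters, here is a minimal sketch of an elapsed-time check that prefers a dedicated start field and falls back to the row creation time. `transcriptionStartedAt` is a hypothetical field that does not exist in the schema under review, and treating the 60-second threshold as "skip becomes available after 60 s of running" is an assumption.

```typescript
interface TranscriptionMeta {
  _creationTime: number; // row creation time, ms since epoch
  transcriptionStartedAt?: number; // hypothetical: set when status becomes 'running'
}

const SKIP_THRESHOLD_MS = 60_000; // 60-second threshold mentioned in the review

// Elapsed time since transcription actually started, never negative.
function transcriptionElapsedMs(meta: TranscriptionMeta, now: number): number {
  const startedAt = meta.transcriptionStartedAt ?? meta._creationTime;
  return Math.max(0, now - startedAt);
}

// Assumed semantics: skipping is offered once the threshold has passed.
function canSkip(meta: TranscriptionMeta, now: number): boolean {
  return transcriptionElapsedMs(meta, now) >= SKIP_THRESHOLD_MS;
}
```

For a row created at t = 0 whose transcription started at t = 50 s, the check at t = 70 s reports 20 s elapsed with the dedicated field but 70 s with the fallback, so the skip option would appear 50 s too early.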
return await new Promise<number | null>((resolve) => {
  const audio = document.createElement('audio');
  audio.preload = 'metadata';
  audio.src = url;
  audio.addEventListener('loadedmetadata', () => {
    const d = audio.duration;
    resolve(Number.isFinite(d) && d > 0 ? d : null);
  });
  audio.addEventListener('error', () => resolve(null));
});
Promise can hang indefinitely when metadata events never arrive.
getAudioDuration currently resolves only on loadedmetadata/error. In edge cases where neither fires, the upload flow can remain blocked.
💡 Suggested hardening (timeout + cleanup)
 export async function getAudioDuration(file: File): Promise<number | null> {
   const url = URL.createObjectURL(file);
   try {
     return await new Promise<number | null>((resolve) => {
       const audio = document.createElement('audio');
+      const timeoutId = window.setTimeout(() => {
+        cleanup();
+        resolve(null);
+      }, 5000);
+
+      const cleanup = () => {
+        window.clearTimeout(timeoutId);
+        audio.removeEventListener('loadedmetadata', onLoadedMetadata);
+        audio.removeEventListener('error', onError);
+      };
+
+      const onLoadedMetadata = () => {
+        const d = audio.duration;
+        cleanup();
+        resolve(Number.isFinite(d) && d > 0 ? d : null);
+      };
+
+      const onError = () => {
+        cleanup();
+        resolve(null);
+      };
+
       audio.preload = 'metadata';
       audio.src = url;
-      audio.addEventListener('loadedmetadata', () => {
-        const d = audio.duration;
-        resolve(Number.isFinite(d) && d > 0 ? d : null);
-      });
-      audio.addEventListener('error', () => resolve(null));
+      audio.addEventListener('loadedmetadata', onLoadedMetadata, {
+        once: true,
+      });
+      audio.addEventListener('error', onError, { once: true });
     });
   } finally {
     URL.revokeObjectURL(url);
   }
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@services/platform/app/features/chat/utils/get-audio-duration.ts` around lines
15 - 24, The Promise in getAudioDuration can hang if neither 'loadedmetadata'
nor 'error' fires; add a safety timeout (e.g., configurable ms) that resolves to
null after expiry, and ensure proper cleanup by removing the 'loadedmetadata'
and 'error' listeners and clearing the timeout when any handler runs so the
Promise settles exactly once; update the Promise executor in getAudioDuration
(references: audio element, 'loadedmetadata' handler, 'error' handler) to create
and clear the timeout and to detach listeners in each branch.
| // Separate images, spreadsheets, audio, and other files (documents + text). | ||
| // Audio is inlined as text transcript; the raw bytes never go to the chat | ||
| // model, so it must be excluded from fileAttachments/documentAttachments. | ||
| const imageAttachments = attachments.filter((a) => isImage(a.fileType)); | ||
| const spreadsheetAttachments = attachments.filter( | ||
| (a) => !isImage(a.fileType) && isSpreadsheet(a.fileName), | ||
| ); | ||
| const audioAttachments = attachments.filter((a) => isAudio(a.fileType)); | ||
| const fileAttachments = attachments.filter( | ||
| (a) => !isImage(a.fileType) && !isSpreadsheet(a.fileName), | ||
| (a) => | ||
| !isImage(a.fileType) && | ||
| !isSpreadsheet(a.fileName) && | ||
| !isAudio(a.fileType), | ||
| ); | ||
| const documentAttachments = attachments.filter( | ||
| (a) => | ||
| !isImage(a.fileType) && | ||
| !isSpreadsheet(a.fileName) && | ||
| !isAudio(a.fileType) && | ||
| !isTextFile(a.fileType, a.fileName), |
Exclude audio from the earlier document-count heuristic too.
After introducing audioAttachments here, the documentCount calculation at Lines 102-107 still counts audio files as documents. A message with one PDF and one MP3 now drops maxDocLength to the multi-doc limit even though the audio never goes through document parsing, so document context gets truncated more aggressively than intended.
🧰 Tools
🪛 GitHub Check: Unit
[failure] 137-137: [server] convex/lib/attachments/tests/process_attachments.test.ts > processAttachments > performance logging > calls debugLog with PERF_PARSE_ALL after all documents
Error: [vitest] No "isAudio" export is defined on the "../../../../lib/shared/file-types" mock. Did you forget to return it from "vi.mock"?
If you need to partially mock a module, you can use "importOriginal" helper inside:
vi.mock(import("../../../../lib/shared/file-types"), async (importOriginal) => {
const actual = await importOriginal()
return {
...actual,
// your mocked methods
}
})
❯ convex/lib/attachments/process_attachments.ts:137:54
❯ Module.processAttachments convex/lib/attachments/process_attachments.ts:137:40
❯ convex/lib/attachments/tests/process_attachments.test.ts:190:13
[failure] 137-137: [server] convex/lib/attachments/tests/process_attachments.test.ts > processAttachments > performance logging > calls debugLog with PERF_PARSE_FILE for each document
Error: [vitest] No "isAudio" export is defined on the "../../../../lib/shared/file-types" mock. Did you forget to return it from "vi.mock"?
If you need to partially mock a module, you can use "importOriginal" helper inside:
vi.mock(import("../../../../lib/shared/file-types"), async (importOriginal) => {
const actual = await importOriginal()
return {
...actual,
// your mocked methods
}
})
❯ convex/lib/attachments/process_attachments.ts:137:54
❯ Module.processAttachments convex/lib/attachments/process_attachments.ts:137:40
❯ convex/lib/attachments/tests/process_attachments.test.ts:169:13
[failure] 137-137: [server] convex/lib/attachments/tests/process_attachments.test.ts > processAttachments > multi-document truncation > uses full limit for single document
Error: [vitest] No "isAudio" export is defined on the "../../../../lib/shared/file-types" mock. Did you forget to return it from "vi.mock"?
If you need to partially mock a module, you can use "importOriginal" helper inside:
vi.mock(import("../../../../lib/shared/file-types"), async (importOriginal) => {
const actual = await importOriginal()
return {
...actual,
// your mocked methods
}
})
❯ convex/lib/attachments/process_attachments.ts:137:54
❯ Module.processAttachments convex/lib/attachments/process_attachments.ts:137:40
❯ convex/lib/attachments/tests/process_attachments.test.ts:134:28
[failure] 137-137: [server] convex/lib/attachments/tests/process_attachments.test.ts > processAttachments > multi-document truncation > uses reduced limit when multiple documents are attached
Error: [vitest] No "isAudio" export is defined on the "../../../../lib/shared/file-types" mock. Did you forget to return it from "vi.mock"?
If you need to partially mock a module, you can use "importOriginal" helper inside:
vi.mock(import("../../../../lib/shared/file-types"), async (importOriginal) => {
const actual = await importOriginal()
return {
...actual,
// your mocked methods
}
})
❯ convex/lib/attachments/process_attachments.ts:137:54
❯ Module.processAttachments convex/lib/attachments/process_attachments.ts:137:40
❯ convex/lib/attachments/tests/process_attachments.test.ts:103:28
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@services/platform/convex/lib/attachments/process_attachments.ts` around lines
130 - 149, The document-count heuristic still includes audio files; update the
documentCount calculation to exclude audio by using the already-filtered
documentAttachments (or subtract audioAttachments.length) instead of counting
all non-image/non-spreadsheet files. Locate the variables documentCount and
maxDocLength logic and replace the current count with documentAttachments.length
(or adjust to deduct audioAttachments) so audio (audioAttachments) no longer
reduces maxDocLength.
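A minimal sketch of the adjusted heuristic follows. The `isImage`, `isAudio`, and `isSpreadsheet` functions here are simplified stand-ins for the real helpers in lib/shared/file-types, and `pickMaxDocLength` with its limit parameters is a hypothetical name, so treat this as an illustration of the counting rule rather than the actual implementation.

```typescript
interface Attachment {
  fileName: string;
  fileType: string; // MIME type
}

// Simplified stand-ins for the shared file-type helpers (assumed behaviour).
const isImage = (mime: string): boolean => mime.startsWith('image/');
const isAudio = (mime: string): boolean => mime.startsWith('audio/');
const isSpreadsheet = (name: string): boolean => /\.(xlsx|xls|csv)$/i.test(name);

// Count only attachments that actually go through document parsing.
function countParsedDocuments(attachments: Attachment[]): number {
  return attachments.filter(
    (a) =>
      !isImage(a.fileType) &&
      !isSpreadsheet(a.fileName) &&
      !isAudio(a.fileType),
  ).length;
}

// Hypothetical selector: the reduced limit applies only with 2+ parsed documents.
function pickMaxDocLength(
  documentCount: number,
  singleLimit: number,
  multiLimit: number,
): number {
  return documentCount > 1 ? multiLimit : singleLimit;
}
```

Under these assumptions, a message with one PDF and one MP3 counts as a single document and keeps the full single-document limit.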
| * transcription, so the raw upload can be much larger than OpenAI's 25 MB cap. | ||
| * Duration is the real gate — see `CHAT_AUDIO_MAX_DURATION_SEC`. | ||
| */ | ||
| export const CHAT_AUDIO_MAX_FILE_SIZE = 1024 * 1024 * 1024; |
Drop the unused export to unblock CI.
knip is already failing on CHAT_AUDIO_MAX_FILE_SIZE being exported but never imported. Keep the constant module-local unless another file actually needs the raw limit value.
🧹 Minimal fix
-export const CHAT_AUDIO_MAX_FILE_SIZE = 1024 * 1024 * 1024;
+const CHAT_AUDIO_MAX_FILE_SIZE = 1024 * 1024 * 1024;
🧰 Tools
🪛 GitHub Actions: Lint
[error] 336-336: knip: Unused exports (1). CHAT_AUDIO_MAX_FILE_SIZE is reported as unused.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@services/platform/lib/shared/file-types.ts` at line 336, The exported
constant CHAT_AUDIO_MAX_FILE_SIZE is unused and triggers knip; make it
module-local by removing the export modifier (change "export const
CHAT_AUDIO_MAX_FILE_SIZE" to "const CHAT_AUDIO_MAX_FILE_SIZE") so it remains
available inside file-types.ts but not exported. If another module actually
requires the raw limit, instead add an explicit export where it's used or import
the value from a single authoritative place; otherwise leave it non-exported to
unblock CI.
| // Optional per-MIME-prefix overrides. When the upload's MIME type matches | ||
| // any `mimeTypePrefix` entry, that `maxBytes` wins over `maxFileSizeBytes`. | ||
| // Example: `[{ mimeTypePrefix: 'audio/', maxBytes: 25 * 1024 * 1024 }]` | ||
| // caps audio at 25 MB while leaving other types at the global limit. | ||
| maxFileSizeLimits: z | ||
| .array( | ||
| z.object({ | ||
| mimeTypePrefix: z.string().min(1), | ||
| maxBytes: z.number().nonnegative(), | ||
| }), | ||
| ) | ||
| .optional(), |
The settings editor will currently erase this new field on save.
upload-policy-editor.tsx still builds UploadPolicyConfig from the legacy fields only, so once a policy containing maxFileSizeLimits is opened in Settings, any save rewrites the config without these overrides. Please add UI round-trip support before shipping the schema change, otherwise admins will silently lose their per-MIME limits.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@services/platform/lib/shared/schemas/governance.ts` around lines 55 - 66, The
settings UI currently doesn't preserve the new maxFileSizeLimits field because
upload-policy-editor.tsx still builds and saves UploadPolicyConfig from legacy
fields only; update the editor to round-trip the new schema by reading
maxFileSizeLimits into the editor state and including it when
constructing/saving the UploadPolicyConfig object. Locate the UploadPolicyConfig
construction and the state initialization/serialization in
upload-policy-editor.tsx (and any helper funcs that assemble the policy object)
and add handling for the maxFileSizeLimits array (validate entries, bind inputs
to mimeTypePrefix and maxBytes, and include the field when saving) so existing
per-MIME overrides are not lost on save.
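For illustration, a minimal sketch of both halves of this concern: resolving the effective limit from the overrides, and saving editor changes without dropping fields the editor does not render. The plain interfaces mirror the zod schema above; `resolveMaxBytes`, `applyEditorChanges`, and the first-match-wins rule are assumptions, not the project's confirmed behaviour.

```typescript
interface MaxFileSizeLimit {
  mimeTypePrefix: string;
  maxBytes: number;
}

interface UploadPolicyConfig {
  maxFileSizeBytes: number;
  maxFileSizeLimits?: MaxFileSizeLimit[];
}

// Assumption: the first matching prefix wins over the global limit.
function resolveMaxBytes(policy: UploadPolicyConfig, mimeType: string): number {
  const override = policy.maxFileSizeLimits?.find((limit) =>
    mimeType.startsWith(limit.mimeTypePrefix),
  );
  return override ? override.maxBytes : policy.maxFileSizeBytes;
}

// Round-trip safe save: start from the loaded policy so fields the editor
// does not know about (such as maxFileSizeLimits) survive the write.
function applyEditorChanges(
  loaded: UploadPolicyConfig,
  edited: Pick<UploadPolicyConfig, 'maxFileSizeBytes'>,
): UploadPolicyConfig {
  return { ...loaded, ...edited };
}
```

Spreading the loaded policy first is only a stopgap that prevents silent data loss; proper inputs for `mimeTypePrefix` and `maxBytes` are still needed for admins to edit the overrides.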
| "tagEmbedding": "Embedding", | ||
| "tagImageGeneration": "Bildgenerierung", | ||
| "tagImageEdit": "Bildbearbeitung", | ||
| "tagTranscription": "Transkription", |
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
set -euo pipefail
python - <<'PY'
import json
from pathlib import Path

base = Path("services/platform/messages")
files = {k: base / f"{k}.json" for k in ("en", "de", "fr")}
data = {k: json.loads(p.read_text(encoding="utf-8")) for k, p in files.items()}

def flatten_keys(obj, prefix=""):
    out = set()
    if isinstance(obj, dict):
        for k, v in obj.items():
            key = f"{prefix}.{k}" if prefix else k
            out.add(key)
            out |= flatten_keys(v, key)
    return out

keys = {k: flatten_keys(v) for k, v in data.items()}
en = keys["en"]
for loc in ("de", "fr"):
    missing = sorted(en - keys[loc])
    extra = sorted(keys[loc] - en)
    print(f"{loc}: missing={len(missing)} extra={len(extra)}")
    if missing:
        print(f"  sample missing: {missing[:10]}")
    if extra:
        print(f"  sample extra: {extra[:10]}")
PY
Repository: tale-project/tale
Length of output: 425
Update fr.json to match the key structure in en.json and de.json.
Verification confirms de.json is properly synchronized with en.json (missing=0, extra=0), but fr.json is missing 149 keys that exist in both en.json and de.json. Per coding guidelines, all base locales must maintain identical key sets on the same commit. Add the missing keys to fr.json (at minimum the new transcription and audio keys added in this PR, but ideally all 149 missing keys to fully resolve the translation sync violation).
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@services/platform/messages/de.json` at line 1578, fr.json is missing 149 keys
that exist in en.json and de.json (including the new keys like
"tagTranscription" and related audio/transcription keys); update fr.json to
include the full set of keys present in en.json/de.json so all base locales
share an identical key set. Open fr.json, compare it against en.json (or
de.json) and add the missing keys with placeholder French values (or empty
strings) for each missing entry (at minimum add "tagTranscription" and all new
audio/transcription keys introduced in this PR), preserving the same key names
and nesting as in en.json/de.json to restore synchronization.
| "invalidFiles": "Fichiers invalides", | ||
| "filesNotSupported": "Certains fichiers sont trop volumineux (>100 Mo) ou non pris en charge. Formats acceptés : images, PDF, documents Word, fichiers texte.", | ||
| "fileSizeExceededMultiple": "{names} dépasse(nt) la limite de taille de fichier de {maxSize} Mo.", | ||
| "audioDurationExceeded": "{names} : l'audio dépasse la limite de {maxHours} heures. Veuillez le découper en segments plus courts.", |
🧹 Nitpick | 🔵 Trivial
Keep chat copy tone consistent with informal voice.
This line switches to formal wording (“Veuillez…”), while surrounding chat messages use informal tone.
✍️ Suggested wording tweak
- "audioDurationExceeded": "{names} : l'audio dépasse la limite de {maxHours} heures. Veuillez le découper en segments plus courts.",
+ "audioDurationExceeded": "{names} : la durée audio dépasse la limite de {maxHours} heures. Merci de le découper en segments plus courts.",
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@services/platform/messages/fr.json` at line 2865, Update the French message
for the key "audioDurationExceeded" to use the same informal chat tone as
surrounding messages: keep the placeholders {names} and {maxHours} but replace
the formal "Veuillez le découper en segments plus courts." with an informal
phrasing such as "Découpe‑le en segments plus courts." so the final string reads
similarly to "{names} : l'audio dépasse la limite de {maxHours} heures.
Découpe‑le en segments plus courts."
- Audio chip with completed transcription shows an eye button in the bottom-right corner; clicking opens a ViewDialog with the full transcript text and duration subtitle.
- Include audio/* and explicit extensions (mp3/m4a/wav/webm/ogg/…) in CHAT_UPLOAD_ACCEPT so the OS file picker no longer hides audio files by default.
- Paragraph breaks derived from Whisper segments (>=1.5s pauses or 45s max) replace the wall-of-text look in transcript previews.
- Dedup identical uploads via Convex's built-in _storage.sha256: repeat uploads of the same content short-circuit to the cached transcript, with no ffmpeg, no OpenAI spend, and no ledger entry.
- Sent-message audio chip gets the same Eye preview button as the composer chip; opens a ViewDialog with the full transcript.
- Message text no longer embeds the raw transcript; it emits a compact "transcript indexed in knowledge base, call rag_search with fileId=..." reference, matching the PDF pattern. User bubbles stay clean while agents retrieve on demand via RAG.
- Wire audio handling into the real buildMessageWithAttachments path (the earlier Phase 1 edit to process_attachments.ts was dead code).
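As a rough illustration of the paragraph-break rule in the first bullet, here is a minimal sketch. It assumes segments shaped like the `start`/`end`/`text` entries of a Whisper `verbose_json` response; the function name and the exact reading of "45s max" (a paragraph is closed once adding a segment would make it span more than 45 seconds) are assumptions rather than the PR's actual code.

```typescript
interface WhisperSegment {
  start: number; // seconds
  end: number; // seconds
  text: string;
}

const PAUSE_BREAK_SEC = 1.5;
const MAX_PARAGRAPH_SEC = 45;

function segmentsToParagraphs(segments: WhisperSegment[]): string[] {
  const paragraphs: string[] = [];
  let current: string[] = [];
  let paragraphStart = 0;
  let prevEnd = 0;

  for (const seg of segments) {
    const text = seg.text.trim();
    if (!text) continue; // ignore empty segments

    if (current.length > 0) {
      const pause = seg.start - prevEnd;
      const span = seg.end - paragraphStart;
      // Close the paragraph on a long pause or when it would grow too long.
      if (pause >= PAUSE_BREAK_SEC || span > MAX_PARAGRAPH_SEC) {
        paragraphs.push(current.join(' '));
        current = [];
      }
    }

    if (current.length === 0) paragraphStart = seg.start;
    current.push(text);
    prevEnd = seg.end;
  }

  if (current.length > 0) paragraphs.push(current.join(' '));
  return paragraphs;
}
```

Joining the result with blank lines would give the paragraph-separated preview text.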
- document_retrieve was reporting "still being indexed (status: pending)" for completed audio uploads. Root cause: we gate out the regular uploadFileToRag for audio, so the audio row's ragStatus stayed undefined, but the tool only reads ragStatus, not transcriptRagStatus. indexTranscriptToRag now mirrors both fields in lockstep.
- Fix RAG 400 on upload: SUPPORTED_EXTENSIONS has no audio extensions, so passing the raw audio filename (meeting.mp3) was rejected. Append `.txt`: content is already text/plain and the original name stays visible via metadata.originalFileName.
- Citation block clicks on audio sources now open the transcript preview (ViewDialog) rather than DocumentPreviewDialog (which tried to render the audio bytes as a document).
- Point agents at document_retrieve (not rag_search) in the message reference; document_retrieve is designed for "read full content by fileId", which matches the transcript-summary use case.
- Re-index on dedup hit using the new storageId so citations in future chats point to the actually-uploaded file rather than the original.
- Add "indexing → indexed" UI phase (transcription completed but RAG still running) and block send until both phases complete.
- RAG service now logs every 4xx rejection with filename + reason at warning level, so operators can see why "400 Bad Request" happened.
Video files now flow through the same transcription pipeline as audio: ffmpeg's `-vn` flag strips the video track, the audio is silence-removed and re-encoded to 32 kbps Opus, and the transcript is indexed to RAG just like audio uploads. One pipeline, 11 container formats.

Supported extensions:
- mp4 / m4v (video/mp4, video/x-m4v): Zoom/Teams/Meet recordings
- mov / qt (video/quicktime): Mac screen recordings
- webm (video/webm): browser recordings
- mkv (video/x-matroska): OBS and general purpose
- avi (video/x-msvideo): legacy Windows
- mpeg / mpg (video/mpeg)
- ogv (video/ogg): Ogg Theora
- 3gp / 3g2 (video/3gpp): mobile recordings
- ts / m2ts (video/mp2t): MPEG transport stream

- New `isVideo` + `isAudioOrVideo` helpers in lib/shared/file-types.
- Client: Film icon (indigo) for video chips, duration read via HTMLVideoElement so it works for both audio-only and video files.
- Server: mutations + buildMessageWithAttachments + file-displays + source-cards all route audio and video through the same transcription path. LLM sees `🎥` for video, `🎙️` for audio.
- Max file size raised to 2 GB for audio/video (covers 4-hour 720p meetings). Duration cap stays at 4 hours.
- i18n: `fileTypes.video` added to en/de/fr.
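For readers unfamiliar with the ffmpeg side, the preprocessing described above (drop video, remove silence, 32 kbps Opus, mono, 16 kHz) could be expressed roughly as the argument list below. This is a sketch under assumptions: `buildFfmpegArgs` is a hypothetical helper, and the `silenceremove` thresholds are illustrative values, not the ones used in this PR.

```typescript
// Builds an ffmpeg argument list for speech-oriented compression.
// The silence thresholds below are assumed values for illustration only.
function buildFfmpegArgs(inputPath: string, outputPath: string): string[] {
  return [
    '-i', inputPath,
    '-vn', // drop any video stream; no-op for audio-only inputs
    '-af', 'silenceremove=stop_periods=-1:stop_duration=1:stop_threshold=-50dB',
    '-c:a', 'libopus', // Opus encoder
    '-b:a', '32k', // 32 kbps target bitrate
    '-ac', '1', // mono
    '-ar', '16000', // 16 kHz sample rate
    '-y', // overwrite the output file if it exists
    outputPath,
  ];
}
```

The resulting array can be passed to a process spawner such as Node's `child_process.spawn('ffmpeg', args)`; the output path would typically use an Opus-capable container such as `.ogg` or `.webm`.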
…ed exports
- process_attachments.test.ts mocked `../lib/shared/file-types` without `isAudio`, which broke after the real module started importing it. Added to the mock.
- file-types.test.ts expected `video/mp4` → undefined; now that we officially support video mp4, swap the assertion to an actually-unknown MIME.
- Knip flagged `isVideo` and `CHAT_AUDIO_MAX_FILE_SIZE` as unused exports; both are only consumed inside file-types.ts via `isAudioOrVideo` and `getMaxFileSizeForType`. Make them module-local.
- Split provider docs into UI-only (platform/admin/providers) and a new system-level reference (self-hosted/configuration/providers) covering JSON config schema, example files, SOPS secrets, self-hosted backends, Docker networking, and provider pinning; removed platform/integrations/providers in favour of the new reference.
- Document audio and video chat transcription (PR #1591) in chat/attachments and chat/basics across all locales: new transcription model tag, server-side pipeline, status UI, 4-hour duration cap, per-minute pricing via centsPerAudioMinute, and per-MIME upload size caps in governance.
- Reframe the Meetily meeting-transcription tutorial as the fully-local alternative to the new server-side path.
- Add a UI-vs-system placement rule to docs/AGENTS.md with a worked provider example so future changes keep end-user content under platform/ and filesystem/config content under self-hosted/.
- Update inbound links, tutorial cross-references, and docs.json nav across en/de/fr; add explicit {#upload-policy} anchor on governance headings for cross-locale link stability.
Summary
- `/audio/transcriptions` → concat. Handles up to 4 hours of audio.
- `usageLedger` via a new `centsPerAudioMinute` cost model.
- `transcription` tag + provider default. Self-hosted Whisper (`faster-whisper-server` / LocalAI / vLLM) swaps in by changing `baseUrl` only.
- `running` rows.
Test plan
- `which ffmpeg` inside
- `examples/providers/openai.json` + SOPS secret; verify UI accepts `transcription` tag
- `.m4a`: status flows `queued → running (transcribing) → completed`; LLM references content
- `transcription.cancelled`, no further work
- `audioDurationSec` populated
- `baseUrl` to local `faster-whisper-server`; end-to-end works with zero code change
Summary by CodeRabbit
New Features
Chores