feat: multimodal prompt for generateImage/generateVideo (image-to-image, image-to-video) #624
tombeckenham wants to merge 11 commits
📝 Walkthrough
Adds MediaPrompt types and resolver; threads modality-aware prompt types into image/video activities and adapters; implements provider-specific mapping and validation (OpenAI, Gemini, fal.ai, Grok, OpenRouter); updates client, examples, tests, docs, scripts, and e2e for image-to-image and image-to-video.
Changes
Media input conditioning across core and adapters
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
🚀 Changeset Version Preview: 8 package(s) bumped directly, 23 bumped as dependents.
View your CI Pipeline Execution ↗ for commit 1b52533
@tanstack/ai
@tanstack/ai-anthropic
@tanstack/ai-client
@tanstack/ai-code-mode
@tanstack/ai-code-mode-skills
@tanstack/ai-devtools-core
@tanstack/ai-elevenlabs
@tanstack/ai-event-client
@tanstack/ai-fal
@tanstack/ai-gemini
@tanstack/ai-grok
@tanstack/ai-groq
@tanstack/ai-isolate-cloudflare
@tanstack/ai-isolate-node
@tanstack/ai-isolate-quickjs
@tanstack/ai-mcp
@tanstack/ai-ollama
@tanstack/ai-openai
@tanstack/ai-openrouter
@tanstack/ai-preact
@tanstack/ai-react
@tanstack/ai-react-ui
@tanstack/ai-solid
@tanstack/ai-solid-ui
@tanstack/ai-svelte
@tanstack/ai-utils
@tanstack/ai-vue
@tanstack/ai-vue-ui
@tanstack/openai-base
@tanstack/preact-ai-devtools
@tanstack/react-ai-devtools
@tanstack/solid-ai-devtools
0740073 to 483a3d4
Actionable comments posted: 9
🧹 Nitpick comments (1)
packages/ai-fal/tests/image-inputs.test.ts (1)
1-249: ⚡ Quick win: Move this unit test alongside its source module.
This test lives under `packages/ai-fal/tests/` instead of next to the mapped source (e.g., near `packages/ai-fal/src/image/image-inputs.ts`), which diverges from the repo's unit-test placement rule.
As per coding guidelines, "`**/*.test.ts`: Place unit tests in `*.test.ts` files alongside source code".
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@packages/ai-fal/tests/image-inputs.test.ts` around lines 1 - 249, The test file image-inputs.test.ts needs to be moved to sit next to its source module image-inputs.ts (the module exporting mapImageInputsToFalFields and mapImageInputsToFalVideoFields); relocate the test file into the same directory as image-inputs.ts, update any relative imports (e.g., ../src/image/image-inputs to ./image-inputs) and the generated constant import if needed, and ensure the test runner still discovers it (adjust any package test config or export paths if required).
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@docs/media/image-generation.md`:
- Around line 151-155: Update the example calls to use the provider's current
image model instead of the hardcoded 'gpt-image-1': replace
openaiImage('gpt-image-1') with the latest model string from the OpenAI
adapter's model-meta.ts (use the adapter helper openaiImage(...) with that model
name) for every example instance (e.g., the generateImage call shown and the
other occurrences noted). Locate and edit the generateImage usage and any other
examples referencing openaiImage to pull the canonical model identifier from the
OpenAI adapter's model-meta.ts and substitute it in the docs so examples stay
up-to-date.
In `@packages/ai-fal/src/adapters/image.ts`:
- Around line 98-109: The current merge order in the input construction lets
mapped image fields from mapImageInputsToFalFields (inputFields) overwrite
options.modelOptions, contradicting the intended behavior; change the spread
order so options.modelOptions is applied last (i.e., merge sizeParams and
inputFields first, then ...options.modelOptions) so per-endpoint modelOptions
override derived image-input keys when creating the input object (look for the
input variable creation that includes mapImageInputsToFalFields, sizeParams, and
prompt).
In `@packages/ai-gemini/src/adapters/image.ts`:
- Around line 256-264: The imagePartToGeminiPart path currently calls
fetch(part.source.value) with no URL safety checks; add a fail-closed URL gate
before the fetch that (1) parses part.source.value, (2) enforces a protocol
allowlist (only https and optionally data/gs if your flow needs them), and (3)
resolves and blocks private/loopback/internal IP ranges (RFC1918, 127.0.0.0/8,
IPv6 equivalents, localhost, and link-local) as well as preventing hostnames
that resolve to those IPs; if the URL fails any check, throw an Error instead of
calling fetch. Locate this logic around imagePartToGeminiPart where
fetch(part.source.value) is invoked and ensure errors surface before converting
the response to blob/arrayBuffer and calling arrayBufferToBase64.
In `@packages/ai-grok/src/adapters/image.ts`:
- Around line 254-264: The fetch call in editImages is missing a bounded
timeout; create an AbortController inside the editImages function, pass
controller.signal to the fetch options, and start a setTimeout that calls
controller.abort() after a chosen timeout (e.g., 30s or a configurable
constant); ensure you clearTimeout when fetch resolves/rejects and handle abort
errors appropriately (keep existing error handling behavior). Update the fetch
invocation in packages/ai-grok/src/adapters/image.ts to include signal:
controller.signal and wrap the timer lifecycle so the request cannot hang
indefinitely.
In `@packages/ai-openai/src/adapters/image.ts`:
- Around line 193-198: The error thrown when maxImages === 0 gives incorrect
guidance by omitting supported models (it lists gpt-image-1, gpt-image-1-mini,
or dall-e-2 but not gpt-image-2); update the message in the block using
EDIT_MAX_IMAGES and the local variable maxImages (and this.name) to mention
gpt-image-2 or, better, mirror the keys/allowed models from EDIT_MAX_IMAGES so
the guidance matches the actual supported-image models.
In `@packages/ai-openai/src/adapters/video.ts`:
- Around line 90-95: validateVideoSize is only checking the root size variable
and can miss modelOptions.size used later when building the request; update the
code around the adapter where you destructure options (the block with const {
model, size, duration, modelOptions } and const { imageInputs, videoInputs,
audioInputs }) to compute an effectiveSize (e.g., size ?? modelOptions?.size)
and call validateVideoSize(model, effectiveSize) so the value actually sent in
the request is validated; do the same pattern for seconds already handled
(seconds = duration ?? modelOptions?.seconds) to ensure consistency.
In `@packages/ai-openai/src/image/image-input-to-file.ts`:
- Around line 49-55: The fetch of arbitrary part.source.value is unsafe and can
enable SSRF or hanging requests; update the image ingestion in
image-input-to-file.ts to (1) validate the URL before fetching by parsing
part.source.value with new URL and reject non-http(s) schemes, disallow
localhost/127.0.0.0/8 and RFC1918 private IP ranges (resolve hostname to IP and
check against denylist), and disallow file:, data: (handle data: separately),
and (2) perform the fetch with a bounded timeout using AbortController (and
limit redirects) so requests cannot hang; apply these checks and timeout around
the existing fetch/response logic that references part.source.value.
In `@packages/ai/skills/ai-core/media-generation/SKILL.md`:
- Around line 673-676: Update the wording that currently reads "Adapters throw a
clear runtime error when the caller passes `imageInputs` to a model that can't
honor it (dall-e-3, Imagen, Grok, OpenRouter)" to scope the limitation to
specific unsupported models rather than entire providers; keep the phrase about
adapters and `imageInputs` but replace the parenthetical list with explicit
unsupported model examples (e.g., "dall-e-3, Imagen") or rephrase to "certain
models (for example, dall-e-3 and Imagen)" and remove or clarify Grok/OpenRouter
as providers so you don't imply the whole provider lacks image-conditioned
routes. Ensure the corrected sentence still mentions adapters throwing a runtime
error for unsupported models.
In `@scripts/generate-fal-image-field-map.ts`:
- Around line 223-236: The arity sanity check is being skipped for
default-selected fields because the loop continues when chosen ===
DEFAULTS[role]; update the loop so it does not early-continue for
default-selected fields — only skip when no chosen — and ensure the existing
arity comparison using isList.get(chosen), LIST_FIELDS.has(chosen), and
endpointId still runs for DEFAULTS[role] values so mismatches between the
runtime type map (isList) and the static LIST_FIELDS are caught (also update any
comment to mention image-inputs.ts where LIST_FIELDS is defined).
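The URL-safety and timeout findings above (Gemini `imagePartToGeminiPart`, Grok `editImages`, OpenAI `image-input-to-file.ts`) share one shape, which the following minimal sketch tries to capture. The helper names `assertSafeImageUrl` and `fetchWithTimeout`, the https-only policy, and the 30s default are assumptions for illustration, not code from the PR. The sketch only inspects literal hosts; the DNS-resolution check the review asks for (a public hostname that resolves to a private address) is not covered here.

```typescript
// Hypothetical helpers: names, the https-only policy, and the 30s default are
// assumptions for illustration, not code from the PR.

/** Parse and vet a caller-supplied image URL, throwing before any network I/O. */
export function assertSafeImageUrl(raw: string): URL {
  let url: URL
  try {
    url = new URL(raw)
  } catch {
    throw new Error('Invalid image URL')
  }
  if (url.protocol !== 'https:') {
    throw new Error(`Blocked protocol: ${url.protocol}`)
  }

  // WHATWG URL normalizes IPv4 literals and keeps brackets on IPv6 hosts.
  const host = url.hostname.toLowerCase()
  const isLocalName = host === 'localhost' || host.endsWith('.localhost')
  const isIPv4 = /^\d{1,3}(\.\d{1,3}){3}$/.test(host)
  const isPrivateV4 =
    isIPv4 &&
    /^(0|10|127)\.|^169\.254\.|^192\.168\.|^172\.(1[6-9]|2\d|3[01])\./.test(host)
  // ::1 loopback, fc00::/7 unique-local, fe80::/10 link-local, IPv4-mapped.
  const isPrivateV6 = /^\[(::1\]|f[cd]|fe[89ab]|::ffff:)/.test(host)

  if (isLocalName || isPrivateV4 || isPrivateV6) {
    throw new Error(`Blocked internal host: ${host}`)
  }
  return url
}

/** fetch() that aborts if the response has not started within timeoutMs. */
export async function fetchWithTimeout(
  url: URL,
  timeoutMs = 30_000,
): Promise<Response> {
  const controller = new AbortController()
  const timer = setTimeout(() => controller.abort(), timeoutMs)
  try {
    // redirect: 'error' stops a vetted URL from bouncing to an unvetted one.
    return await fetch(url, { signal: controller.signal, redirect: 'error' })
  } finally {
    // The timer is cleared whether fetch resolved, rejected, or was aborted.
    clearTimeout(timer)
  }
}
```

A caller would run `fetchWithTimeout(assertSafeImageUrl(part.source.value))` in place of the bare `fetch`. Note that the timer in this sketch stops once response headers arrive, so a slow body read would need its own bound.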
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: c39fe612-aaa6-4d46-a616-b99323822c03
⛔ Files ignored due to path filters (1)
- packages/ai-fal/src/image/generated/image-field-overrides.ts is excluded by `!**/generated/**`
📒 Files selected for processing (29)
- .changeset/image-and-video-inputs.md
- .prettierignore
- docs/adapters/grok.md
- docs/media/image-generation.md
- docs/media/video-generation.md
- package.json
- packages/ai-event-client/src/index.ts
- packages/ai-fal/src/adapters/image.ts
- packages/ai-fal/src/adapters/video.ts
- packages/ai-fal/src/image/image-inputs.ts
- packages/ai-fal/tests/image-inputs.test.ts
- packages/ai-gemini/src/adapters/image.ts
- packages/ai-grok/src/adapters/image.ts
- packages/ai-grok/src/image/image-provider-options.ts
- packages/ai-grok/src/model-meta.ts
- packages/ai-grok/tests/grok-adapter.test.ts
- packages/ai-openai/src/adapters/image.ts
- packages/ai-openai/src/adapters/video.ts
- packages/ai-openai/src/image/image-input-to-file.ts
- packages/ai-openai/tests/image-adapter.test.ts
- packages/ai-openrouter/src/adapters/image.ts
- packages/ai-openrouter/tests/image-adapter.test.ts
- packages/ai/skills/ai-core/media-generation/SKILL.md
- packages/ai/src/activities/generateImage/index.ts
- packages/ai/src/activities/generateVideo/index.ts
- packages/ai/src/types.ts
- scripts/generate-fal-image-field-map.ts
- testing/e2e/src/lib/feature-support.ts
- testing/e2e/src/lib/types.ts
Actionable comments posted: 2
♻️ Duplicate comments (1)
packages/ai-fal/src/adapters/image.ts (1)
109-117: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win
Merge order still prevents `modelOptions` from overriding derived fields.
The past review comment on this issue remains valid. Current spread order has `inputFields` (line 112) override `modelOptions` (line 110), contradicting the comment on lines 105-107 that claims "user overrides win." Users cannot override derived image-input fields (e.g., `mask_url`, `reference_image_urls`) via `modelOptions` when `imageInputs` are present.
💡 Apply the past reviewer's fix

     const input = {
    -  ...options.modelOptions,
       ...sizeParams,
       ...inputFields,
    +  ...options.modelOptions,
       ...(resolved.text ? { prompt: resolved.text } : {}),
       num_images: options.numberOfImages,
     } as FalModelInput<TModel>

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@packages/ai-fal/src/adapters/image.ts` around lines 109 - 117, The merge order currently lets inputFields override options.modelOptions so user-provided modelOptions can't override derived image fields; update the spread order when building the input object (the variable named input of type FalModelInput) so that options.modelOptions is spread last (after sizeParams, inputFields and the conditional prompt); this ensures user modelOptions win and still preserves resolved.text and num_images handling.
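The merge-order finding comes down to object-spread precedence: when two spreads carry the same key, the later one wins. A minimal sketch with plain objects follows; the values are illustrative stand-ins, not the adapter's real types or data.

```typescript
// Illustrative stand-ins for the adapter's values; not the real fal types.
const sizeParams = { image_size: 'square_hd' }
const inputFields = { image_url: 'https://example.com/derived.png' }
const modelOptions = {
  image_url: 'https://example.com/override.png',
  seed: 7,
}

// modelOptions first: the derived field is spread later and clobbers the override.
const derivedWins = { ...modelOptions, ...sizeParams, ...inputFields }

// modelOptions last: the user-supplied override survives.
const userWins = { ...sizeParams, ...inputFields, ...modelOptions }

console.log(derivedWins.image_url) // https://example.com/derived.png
console.log(userWins.image_url) // https://example.com/override.png
```

Keys that appear in only one source (`seed`, `image_size`) are unaffected by the order, so moving `modelOptions` last changes behavior only where a user option and a derived field collide.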
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@packages/ai-fal/src/adapters/video.ts`:
- Around line 164-174: The current construction of the input object lets
media-derived fields (sizeParams, inputImageFields, videoFields, audioFields)
override user-supplied modelOptions; change the spread order so modelOptions is
spread after the derived media fields (i.e., build input from sizeParams,
inputImageFields, videoFields, audioFields, then ...modelOptions) while
preserving the conditional prompt and duration spreads (resolved.text and
duration) and the FalModelInput<TModel> typing so user-provided keys like
video_url, reference_video_urls, or audio_url in modelOptions take precedence.
In `@packages/ai-openai/src/adapters/video.ts`:
- Around line 99-127: Add unit tests covering createVideoJob multimodal
handling: (1) text-only prompt should call Videos.create with model and prompt
and no input_reference; (2) text+one image should convert the resolved image via
imagePartToFile and attach the returned File as request.input_reference — mock
imagePartToFile and the OpenAI_SDK.Videos.create call to assert the passed
params include input_reference; (3) more than one image should throw the same
error path (verify the thrown message). Also add a Playwright + aimock E2E test
that posts a single-image multimodal prompt and asserts the upstream Sora
request contains a single input_reference upload. Reference createVideoJob,
resolveMediaPrompt, imagePartToFile, and request.input_reference when locating
code to test and to mock.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 98656191-029e-4c17-87e3-a6eb4ac01845
📒 Files selected for processing (33)
- .changeset/image-and-video-inputs.md
- .gitignore
- docs/adapters/grok.md
- docs/media/image-generation.md
- docs/media/video-generation.md
- packages/ai-fal/src/adapters/image.ts
- packages/ai-fal/src/adapters/video.ts
- packages/ai-fal/src/image/image-inputs.ts
- packages/ai-fal/src/model-meta.ts
- packages/ai-gemini/src/adapters/image.ts
- packages/ai-gemini/src/image/image-provider-options.ts
- packages/ai-grok/src/adapters/image.ts
- packages/ai-grok/src/image/image-provider-options.ts
- packages/ai-grok/tests/grok-adapter.test.ts
- packages/ai-openai/src/adapters/image.ts
- packages/ai-openai/src/adapters/video.ts
- packages/ai-openai/src/image/image-provider-options.ts
- packages/ai-openai/src/video/video-provider-options.ts
- packages/ai-openai/tests/image-adapter.test.ts
- packages/ai-openrouter/src/adapters/image.ts
- packages/ai-openrouter/src/image/image-provider-options.ts
- packages/ai-openrouter/tests/image-adapter.test.ts
- packages/ai/skills/ai-core/media-generation/SKILL.md
- packages/ai/src/activities/generateImage/adapter.ts
- packages/ai/src/activities/generateImage/index.ts
- packages/ai/src/activities/generateVideo/adapter.ts
- packages/ai/src/activities/generateVideo/index.ts
- packages/ai/src/index.ts
- packages/ai/src/types.ts
- packages/ai/src/utilities/media-prompt.ts
- packages/ai/tests/image-per-model-type-safety.test.ts
- packages/ai/tests/media-prompt.test.ts
- testing/e2e/src/lib/feature-support.ts
✅ Files skipped from review due to trivial changes (2)
- docs/adapters/grok.md
- docs/media/video-generation.md
🚧 Files skipped from review as they are similar to previous changes (7)
- packages/ai-openrouter/src/adapters/image.ts
- packages/ai-grok/src/image/image-provider-options.ts
- packages/ai-grok/tests/grok-adapter.test.ts
- packages/ai-openai/src/adapters/image.ts
- packages/ai-fal/src/image/image-inputs.ts
- packages/ai-grok/src/adapters/image.ts
- packages/ai-gemini/src/adapters/image.ts
🧹 Nitpick comments (1)
packages/ai-openai/tests/video-adapter.test.ts (1)
81-112: ⚡ Quick win: Split combined rejection test into separate cases for clarity.
This single test case validates rejection of both video and audio parts. While the coverage is correct, combining two distinct validation scenarios in one `it()` block reduces test clarity and makes failure diagnostics less precise. If either rejection path fails, the error message will be less informative about which modality caused the failure.
♻️ Proposed refactor to separate test cases

    -  it('rejects video and audio prompt parts', async () => {
    +  it('rejects video prompt parts', async () => {
         const { adapter, mockCreate } = mockedAdapter()
         await expect(
           adapter.createVideoJob({
             model: 'sora-2',
             prompt: [
               { type: 'text', content: 'x' },
               {
                 type: 'video',
                 source: { type: 'url', value: 'https://example.com/v.mp4' },
               },
             ],
             logger: testLogger,
           }),
         ).rejects.toThrow(/video prompt parts/)
    +    expect(mockCreate).not.toHaveBeenCalled()
    +  })
    +  it('rejects audio prompt parts', async () => {
    +    const { adapter, mockCreate } = mockedAdapter()
         await expect(
           adapter.createVideoJob({
             model: 'sora-2',
             prompt: [
               { type: 'text', content: 'x' },
               {
                 type: 'audio',
                 source: { type: 'url', value: 'https://example.com/a.mp3' },
               },
             ],
             logger: testLogger,
           }),
         ).rejects.toThrow(/audio prompt parts/)
         expect(mockCreate).not.toHaveBeenCalled()
       })

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@packages/ai-openai/tests/video-adapter.test.ts` around lines 81 - 112, Split the combined rejection test into two separate tests to improve clarity: create one it() block that uses mockedAdapter() and testLogger to call adapter.createVideoJob with a prompt containing a video part and asserts rejects.toThrow(/video prompt parts/) and that mockCreate was not called, and a second it() block that does the same for a prompt containing an audio part asserting rejects.toThrow(/audio prompt parts/) and mockCreate not called; keep using mockedAdapter(), adapter.createVideoJob, mockCreate, and testLogger to locate and reuse the existing mocks and assertions.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 1a45a71f-c5d6-4a1d-a238-42e1ecbdd798
📒 Files selected for processing (12)
- .changeset/image-and-video-inputs.md
- docs/media/image-generation.md
- docs/media/video-generation.md
- packages/ai-fal/src/image/image-inputs.ts
- packages/ai-fal/tests/image-inputs.test.ts
- packages/ai-fal/tests/video-adapter.test.ts
- packages/ai-gemini/tests/image-adapter.test.ts
- packages/ai-openai/src/adapters/image.ts
- packages/ai-openai/src/adapters/video.ts
- packages/ai-openai/tests/image-adapter.test.ts
- packages/ai-openai/tests/video-adapter.test.ts
- packages/ai/skills/ai-core/media-generation/SKILL.md
✅ Files skipped from review due to trivial changes (2)
- docs/media/video-generation.md
- .changeset/image-and-video-inputs.md
🚧 Files skipped from review as they are similar to previous changes (6)
- packages/ai/skills/ai-core/media-generation/SKILL.md
- packages/ai-openai/tests/image-adapter.test.ts
- packages/ai-fal/tests/image-inputs.test.ts
- packages/ai-openai/src/adapters/video.ts
- packages/ai-openai/src/adapters/image.ts
- packages/ai-fal/src/image/image-inputs.ts
Actionable comments posted: 3
🧹 Nitpick comments (1)
examples/ts-react-media/src/components/ImageGenerator.tsx (1)
9-10: 💤 Low value: Fix import order per ESLint rules.
The ESLint `import/order` rule requires the type import from `@/lib/media` to come before the regular import from `@/lib/server-functions`. This is a formatting issue flagged by static analysis.
♻️ Suggested reordering

     import { generateImageFn } from '@/lib/server-functions'
     import { getRandomImagePrompt } from '@/lib/prompts'
     import { IMAGE_MODELS } from '@/lib/models'
    -import { readImageFile, toImagePart } from '@/lib/media'
     import type { AttachedImage } from '@/lib/media'
    +import { readImageFile, toImagePart } from '@/lib/media'

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@examples/ts-react-media/src/components/ImageGenerator.tsx` around lines 9 - 10, Reorder the imports so the type-only import AttachedImage is placed before the value imports readImageFile and toImagePart to satisfy the ESLint import/order rule; update the import lines in ImageGenerator.tsx so the `import type { AttachedImage } from '@/lib/media'` statement appears above `import { readImageFile, toImagePart } from '@/lib/media'`.
Source: Linters/SAST tools
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@examples/ts-react-media/src/components/ImageGenerator.tsx`:
- Around line 194-241: Update the reference-images note in the ImageGenerator
component: replace the incorrect "Supported by Gemini native (NanoBanana) models
only" text with a correct statement that image inputs are supported only by the
Gemini image-preview models (e.g., "Supported by Gemini native image-preview
models only: gemini-3.1-flash-image-preview, gemini-3-pro-image-preview"). Do
not reference NanoBanana (fal-ai/nano-banana-pro) since server-functions.ts uses
asTextPrompt for that model (see asTextPrompt and asImagePrompt usages) and
NanoBanana does not accept image prompts.
In `@testing/e2e/src/components/VideoGenUI.tsx`:
- Around line 21-35: The fileToBase64 function currently resolves an empty
string when reader.result.split(',')[1] is undefined; update fileToBase64 to
validate the Data URL format and reject with a clear Error instead of returning
an empty string. Specifically, in fileToBase64 (where reader.onload handles
result), check that result is a string and that result.includes(',') and that
split(',')[1] is a non-empty string; if not, call reject(new Error('Invalid data
URL format')) (or similar) so callers receive an explicit error rather than an
empty value.
- Around line 68-83: In handleGenerate, add validation before calling
fileToBase64: verify imageFile exists and that its MIME type is an image (e.g.,
imageFile.type startsWith('image/') or fallback to checking the file extension)
and enforce a max size threshold (e.g., 5MB) to avoid large base64 payloads; if
the file fails validation, early-return and surface a user-friendly
error/notification instead of proceeding. Perform these checks before the base64
conversion and only call generate with the image block (and include
imageFile.type as mimeType) when validations pass; keep the validation logic
close to the existing handleGenerate, referencing imageFile, fileToBase64, and
generate.
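The two `VideoGenUI.tsx` findings above (reject a malformed data URL, validate the file before encoding) can be sketched as two pure functions. This is a hedged illustration only: the function names, the error strings, and the structural `{ type, size }` parameter are assumptions, and the `FileReader` wiring is left out so the logic can run outside a browser. The 5MB limit is the example threshold the review itself suggests.

```typescript
// Hypothetical helpers; names and messages are illustrative, not from the PR.
const MAX_IMAGE_BYTES = 5 * 1024 * 1024 // 5MB, the review's example threshold

/** Extract the base64 payload from a FileReader result, or throw. */
export function base64FromDataUrl(result: unknown): string {
  if (typeof result !== 'string' || !result.startsWith('data:')) {
    throw new Error('Invalid data URL format')
  }
  const comma = result.indexOf(',')
  const payload = comma === -1 ? '' : result.slice(comma + 1)
  if (payload.length === 0) {
    // Previously this case resolved to an empty string.
    throw new Error('Invalid data URL format')
  }
  return payload
}

/** Return a user-facing error message, or null when the file is acceptable. */
export function validateImageFile(file: {
  type: string
  size: number
}): string | null {
  if (!file.type.startsWith('image/')) {
    return 'Please attach an image file.'
  }
  if (file.size > MAX_IMAGE_BYTES) {
    return 'Image is larger than 5MB.'
  }
  return null
}
```

In the component, `reader.onload` would call `base64FromDataUrl(reader.result)` inside a try/catch that forwards to `reject`, and `handleGenerate` would return early when `validateImageFile(imageFile)` is non-null.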
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 65b1dca2-a819-4d1e-9ae0-2c9144267443
📒 Files selected for processing (26)
- .changeset/image-and-video-inputs.md
- docs/media/video-generation.md
- examples/ts-react-media/src/components/ImageGenerator.tsx
- examples/ts-react-media/src/components/VideoGenerator.tsx
- examples/ts-react-media/src/lib/media.ts
- examples/ts-react-media/src/lib/server-functions.ts
- packages/ai-client/src/generation-types.ts
- packages/ai-client/tests/video-generation-client.test.ts
- packages/ai-fal/src/model-meta.ts
- packages/ai-fal/tests/video-adapter.test.ts
- packages/ai/src/client.ts
- testing/e2e/README.md
- testing/e2e/fixtures/image-to-image/basic.json
- testing/e2e/global-setup.ts
- testing/e2e/src/components/ImageGenUI.tsx
- testing/e2e/src/components/VideoGenUI.tsx
- testing/e2e/src/lib/feature-support.ts
- testing/e2e/src/lib/features.ts
- testing/e2e/src/lib/server-functions.ts
- testing/e2e/src/routes/$provider/$feature.tsx
- testing/e2e/src/routes/api.image.stream.ts
- testing/e2e/src/routes/api.image.ts
- testing/e2e/src/routes/api.video.stream.ts
- testing/e2e/src/routes/api.video.ts
- testing/e2e/tests/image-to-image.spec.ts
- testing/e2e/tests/image-to-video.spec.ts
✅ Files skipped from review due to trivial changes (6)
- testing/e2e/README.md
- testing/e2e/src/routes/api.image.stream.ts
- testing/e2e/fixtures/image-to-image/basic.json
- packages/ai-client/tests/video-generation-client.test.ts
- testing/e2e/src/routes/api.video.stream.ts
- .changeset/image-and-video-inputs.md
🚧 Files skipped from review as they are similar to previous changes (2)
- packages/ai-fal/tests/video-adapter.test.ts
- docs/media/video-generation.md
…e activity follow-ups

Closes #707.

- Add openRouterVideo: async jobs adapter for OpenRouter's dedicated video API (submit -> poll -> download). Per-model size/duration/option types are generated from GET /api/v1/videos/models; frame roles map onto frame_images[] / input_references[] per the MediaInputRole taxonomy.
- Teach the model-meta sync scripts the videos/models endpoint (openrouter.video-models.json + OPENROUTER_VIDEO_MODEL_META).
- Image adapter follow-ups from the #624 review: throw on unmapped sizes (the size union used a Unicode multiplication sign so every non-square size silently dropped its aspect ratio), throw on numberOfImages > 1 (live-verified: the gateway ignores all count keys), expose image_config.strength.
- Completed videos are returned as data: URLs (unsigned_urls 401 without the API key header) with gateway-reported cost on usage.cost. The SDK's getVideoContent is bypassed: its matcher only accepts application/octet-stream while the endpoint serves video/mp4.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
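The size bug mentioned in this commit message is easy to miss because `×` (U+00D7) and `x` look alike but are different characters, so a lookup keyed on one never matches the other. A minimal sketch of the failure and the "throw on unmapped sizes" remedy follows; the table contents and the function name `aspectRatioFor` are hypothetical, not the adapter's actual mapping.

```typescript
// Hypothetical size -> aspect-ratio table; the real adapter's table may differ.
const ASPECT_BY_SIZE: Record<string, string> = {
  '1024x1024': '1:1',
  '1536x1024': '3:2',
  '1024x1536': '2:3',
}

/** Look up the aspect ratio, failing loudly instead of silently dropping it. */
export function aspectRatioFor(size: string): string {
  const ratio = ASPECT_BY_SIZE[size]
  if (ratio === undefined) {
    throw new Error(`Unmapped image size "${size}"`)
  }
  return ratio
}

// The two spellings render almost identically but do not compare equal.
const withLatinX = '1536x1024'
const withMultiplicationSign = '1536\u00D71024'
console.log(withLatinX === withMultiplicationSign) // false
console.log(aspectRatioFor(withLatinX)) // 3:2
```

With a silent fallback, the `×` spelling would simply miss the table and lose its aspect ratio; throwing turns that into an immediate, visible error.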
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
packages/ai-fal/src/adapters/image.ts (1)
113-117: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win
Strip `prompt` out of `modelOptions` before the spread.
Line 113 lets `modelOptions.prompt` survive whenever `resolved.text` is empty, because Line 116 only reasserts `prompt` conditionally. That reopens a second text-prompt path for media-only requests and breaks the "top-level `prompt` is the multimodal surface / call-controlled fields win" contract.
Proposed fix

    +  const { prompt: _prompt, num_images: _numImages, ...modelOptions } =
    +    options.modelOptions ?? {}
       const input = {
         ...sizeParams,
         ...inputFields,
    -    ...options.modelOptions,
    +    ...modelOptions,
         // Media-only prompts (e.g. upscalers, background removal) omit the
         // prompt field entirely rather than sending an empty string.
         ...(resolved.text ? { prompt: resolved.text } : {}),
         num_images: options.numberOfImages,
       } as FalModelInput<TModel>

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@packages/ai-fal/src/adapters/image.ts` around lines 113 - 117, The spread of options.modelOptions currently allows modelOptions.prompt to leak through when resolved.text is empty; remove prompt from modelOptions before spreading so only the conditional top-level prompt (based on resolved.text) is used. Locate the creation of the request payload where options.modelOptions is spread (refer to options.modelOptions and resolved.text in adapters/image.ts) and replace the spread with a prompt-stripped object (e.g., destructure to drop prompt from options.modelOptions or shallow-copy and delete prompt) then spread that cleaned object, preserving num_images and the conditional ...(resolved.text ? { prompt: resolved.text } : {}) behavior.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: d43d1dcb-bf3a-4293-b67b-ff94ad4d392b
📒 Files selected for processing (6)
- examples/ts-react-media/src/components/ImageGenerator.tsx
- packages/ai-fal/src/adapters/image.ts
- packages/ai-fal/src/adapters/video.ts
- packages/ai-openai/src/adapters/video.ts
- scripts/generate-fal-image-field-map.ts
- testing/e2e/src/components/VideoGenUI.tsx
🚧 Files skipped from review as they are similar to previous changes (4)
- testing/e2e/src/components/VideoGenUI.tsx
- packages/ai-openai/src/adapters/video.ts
- examples/ts-react-media/src/components/ImageGenerator.tsx
- packages/ai-fal/src/adapters/video.ts
…tioned generation (closes #618)
Adds optional `imageInputs`, `videoInputs`, and `audioInputs` to `generateImage()` and `generateVideo()` for image-to-image, multi-reference, mask / inpaint, image-to-video, and starting-frame flows. Each input part may carry a `metadata.role` hint (`'reference' | 'mask' | 'control' | 'start_frame' | 'end_frame' | 'character'`) that adapters use to route to the provider-specific field.
Provider behavior:
- OpenAI image: gpt-image-1 / -mini route to `images.edit()` (up to 16 + mask); dall-e-2 routes to `images.edit()` with one source; dall-e-3 throws.
- OpenAI video: Sora-2 / -pro accept a single `input_reference`; throws on >1.
- Gemini: native models receive inputs as multimodal `contents` parts; Imagen throws (text-only).
- fal: 1 input → `image_url`, >1 → `image_urls`; metadata roles map to `mask_url` / `control_image_url` / `reference_image_urls`; video adds `start_image_url` / `end_image_url`. Interim mapping until the fal schemas library lands.
- Grok, OpenRouter: throw with a link back to #618 (pending native Imagine API rewrite and multimodal injection work respectively).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
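The interim fal mapping in this commit can be sketched roughly as follows. This is an illustration under assumptions, not the adapter's code: `ImageInput`, `mapFalImageFields`, and `setScalar` are hypothetical names, and routing every input without a `mask` / `control` / `reference` role into `image_url` / `image_urls` is a guess at the default.

```typescript
// Hypothetical input shape for illustration; not the adapter's real types.
type MediaInputRole =
  | 'reference' | 'mask' | 'control' | 'start_frame' | 'end_frame' | 'character'

interface ImageInput {
  url: string
  role?: MediaInputRole
}

type FalImageFields = Record<string, string | string[]>

// Scalar fields hold one URL; a second input for the same field is an error.
function setScalar(fields: FalImageFields, key: string, url: string): void {
  if (key in fields) throw new Error(`More than one input resolved to "${key}"`)
  fields[key] = url
}

function mapFalImageFields(inputs: ImageInput[]): FalImageFields {
  const fields: FalImageFields = {}
  const sources: string[] = []
  const references: string[] = []

  for (const input of inputs) {
    if (input.role === 'mask') setScalar(fields, 'mask_url', input.url)
    else if (input.role === 'control') setScalar(fields, 'control_image_url', input.url)
    else if (input.role === 'reference') references.push(input.url)
    else sources.push(input.url) // assumption: all other inputs are plain sources
  }

  // 1 source -> image_url, more than 1 -> image_urls
  const [first] = sources
  if (sources.length === 1 && first !== undefined) fields['image_url'] = first
  else if (sources.length > 1) fields['image_urls'] = sources
  if (references.length > 0) fields['reference_image_urls'] = references
  return fields
}
```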
…SDK type map
Replace the fal image-input field heuristic with a per-endpoint mapping generated from @fal-ai/client's EndpointTypeMap (scripts/generate-fal-image-field-map.ts, run via pnpm generate:fal-image-fields). The committed artifact stores only the 362 endpoints whose field names deviate from the defaults (e.g. nano-banana edit -> image_urls, Kling i2v start frame -> image_url, Veo first-last-frame -> first_frame_url / last_frame_url, Fooocus masks -> mask_image_url); the old heuristic remains the fallback for endpoints newer than the installed SDK.
Safety rails: the generated file `satisfies`-checks every field name against the SDK endpoint types (type-only, erased at runtime), and a unit test hashes the installed endpoints.d.ts against the recorded hash so an SDK bump without regeneration fails test:lib with the regen command.
Mappers are now typed: both return FalImageInputFields<TModel>, Pick'ed from the endpoint's real input type via a generated field-name union. Roles resolving to the same list field merge (source + reference on nano-banana); colliding scalar fields throw instead of overwriting.
Also fixes the remaining CI lint failures: duplicate @tanstack/ai import and non-null assertion in ai-fal video.ts, switch-exhaustiveness errors in image-inputs.ts (restructured away), and the non-null assertion in ai-openai image.ts.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…d generation
Grok: add the xAI Imagine API image models (grok-imagine-image,
grok-imagine-image-quality) to model-meta. With imageInputs they route to
xAI's JSON POST /v1/images/edits endpoint via direct fetch (the OpenAI
SDK's images.edit() sends multipart/form-data, which xAI rejects) — a
single input as image:{url}, 2-3 inputs as images:[...] referenceable in
the prompt as <IMAGE_0>/<IMAGE_1>; >3 inputs and mask/control roles throw.
Their generic `size` uses an aspectRatio_resolution template ('16:9_2k',
suffix optional), mirroring Gemini's native image models, and maps to the
Imagine aspect_ratio/resolution parameters on both the generate and edit
paths. grok-2-image-1212 stays text-to-image only with a clear error.
OpenRouter: imageInputs are injected as multimodal image_url content parts
alongside the prompt in the chat-completions message and forwarded to the
underlying image model.
Neither path fetches or base64-encodes URL sources in-process — URLs pass
through verbatim and are fetched by the provider; data sources become data
URIs. Bumps ai-grok and ai-openrouter to minor in the existing changeset.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
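The size template and the single-versus-multiple image routing described in this commit might look like the sketch below. It is illustrative only: `parseImagineSize` and `buildEditImageFields` are hypothetical helpers, and the `{ url }` shape of each `images` entry is an assumption, since the commit message elides the array contents.

```typescript
interface ImagineSize {
  aspect_ratio: string
  resolution?: string
}

// Parses an aspectRatio_resolution template such as '16:9_2k'.
// The resolution suffix is optional, so '16:9' alone is also accepted.
function parseImagineSize(size: string): ImagineSize {
  const [aspect, resolution, ...extra] = size.split('_')
  if (!aspect || extra.length > 0 || !/^\d+:\d+$/.test(aspect)) {
    throw new Error(`Unsupported size "${size}"`)
  }
  return resolution ? { aspect_ratio: aspect, resolution } : { aspect_ratio: aspect }
}

type EditImageFields =
  | { image: { url: string } }
  | { images: Array<{ url: string }> }

// One input -> `image`, two or three -> `images`, more than three throws.
// URLs are passed through verbatim; nothing is fetched here.
function buildEditImageFields(urls: string[]): EditImageFields {
  if (urls.length === 0) throw new Error('At least one image input is required')
  if (urls.length > 3) throw new Error('At most 3 image inputs are supported')
  const [first] = urls
  if (urls.length === 1 && first !== undefined) return { image: { url: first } }
  // Assumption: each entry mirrors the single-image `{ url }` shape.
  return { images: urls.map((url) => ({ url })) }
}
```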
… API drift
- Move the generated fal image-field map and the generator's paths from packages/typescript/ai-fal to packages/ai-fal (repo flattened the layout)
- Add gpt-image-2 to EDIT_MAX_IMAGES (new model on main; same 16-image edit limit as the other gpt-image models)
- Map edit-path usage through buildImagesUsage to match the new TokenUsage shape, and drop two now-unnecessary type assertions
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…s text through verbatim
Replace the imageInputs / videoInputs / audioInputs fields with a multimodal prompt: string | MediaPromptPart[]. Part order is meaningful: natively multimodal providers (Gemini, OpenRouter) receive parts in interleaved order; named-field providers (OpenAI, fal, xAI) extract media parts via the new resolveMediaPrompt() utility and flatten the text.
Zero magic: prompt text is always sent verbatim. The SDK never injects or rewrites in-prompt referencing markers; users write each provider's own convention (fal Kling/Seedance @image1, OpenAI/FLUX.2 "image 1" prose, Gemini content descriptions), now documented per provider in the media docs. An earlier grok <IMAGE_n> auto-injection was removed after research showed the convention is absent from xAI's official docs (images are addressed by request order).
- Per-model compile-time prompt narrowing via TModelInputModalitiesByName adapter generic (e.g. dall-e-3 / Imagen reject image parts as a type error); fal modality maps are derived at the type level from the SDK's endpoint input types
- metadata.tag added as an informational label (never read by adapters)
- Gemini now preserves true interleaving in contents; OpenRouter maps parts 1:1 onto chat content parts in order
Closes #618
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
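For named-field providers, the flatten-and-bucket step could be sketched like this. It is not the real `resolveMediaPrompt()`: the part shapes (`content`, `source`), the `resolvePromptSketch` name, and joining text parts with a newline are all assumptions made for illustration.

```typescript
// Hypothetical part shapes; the real MediaPromptPart types may differ.
interface TextPart {
  type: 'text'
  content: string
}

interface MediaPart {
  type: 'image' | 'video' | 'audio'
  source: string
  metadata?: { role?: string; tag?: string }
}

type MediaPrompt = string | Array<TextPart | MediaPart>

interface ResolvedPrompt {
  text: string
  images: MediaPart[]
  videos: MediaPart[]
  audios: MediaPart[]
}

function resolvePromptSketch(prompt: MediaPrompt): ResolvedPrompt {
  const resolved: ResolvedPrompt = { text: '', images: [], videos: [], audios: [] }
  if (typeof prompt === 'string') {
    resolved.text = prompt // plain strings pass through untouched
    return resolved
  }

  const texts: string[] = []
  for (const part of prompt) {
    if (part.type === 'text') texts.push(part.content)
    else if (part.type === 'image') resolved.images.push(part)
    else if (part.type === 'video') resolved.videos.push(part)
    else resolved.audios.push(part)
  }
  // Assumption: text parts are joined with a newline; each part stays verbatim.
  resolved.text = texts.join('\n')
  return resolved
}
```

Within each bucket the original part order is kept, which matters for providers that address images by request order.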
- openai: add gpt-image-2 to the editImages error message and JSDoc
(the model is edit-capable via EDIT_MAX_IMAGES but was omitted from
user-facing guidance); same fix in docs, SKILL.md, and the changeset
- openai: throw when the images.edit() response contains no usable
images (matching grok's guard) instead of resolving to { images: [] }
- openai: drop the unnecessary input_reference cast in the Sora
adapter — the SDK types the field, so assign directly
- fal: reject metadata.role 'mask'/'control' in the video mapper
instead of silently folding them into source frames
- docs: mark Veo role mappings as planned (no Veo adapter yet), note
the Gemini ~14-image limit is provider-side, bump samples to
gpt-image-2
- tests: cover the Gemini image-conditioned path (interleaved
contents, fileData vs inlineData vs fetch+inline, Imagen/video/audio
rejection), the Sora input_reference upload and guards (new file),
the fal video createVideoJob field assembly and audio guard, and the
openai empty-edit-response guard
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Same defect class as the editImages guard in the previous commit: the
text-to-image path silently resolved to { images: [] } when response
items had neither b64_json nor url. Surface it as an error instead.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…l field demotion
- ai-client: widen ImageGenerateInput.prompt / VideoGenerateInput.prompt from string to MediaPrompt so useGenerateImage/useGenerateVideo can carry image parts from the browser; re-export the MediaPrompt types from @tanstack/ai/client
- ai-fal: demote media-conditioning fields (FalImageFieldName set plus video_url/video_urls/reference_video_urls/audio_url) from required to optional in FalImageProviderOptions / FalVideoProviderOptions. i2v endpoints declare e.g. image_url as required, but with a multimodal prompt the start frame arrives as a prompt part; modelOptions stays available as the explicit escape hatch
- e2e: real coverage for image-to-image (OpenAI /v1/images/edits) and image-to-video (Sora multipart /v1/videos with input_reference). The installed aimock 1.29 mocks both multipart endpoints, so the previous "aimock can't mock this" empty provider sets were stale. New specs run all three transports and assert via aimock's request journal that the expected wire endpoint was hit. ImageGenUI/VideoGenUI gain a file input, feature routing/fixtures/onVideo registration added, README matrix updated
- examples/ts-react-media: ImageGenerator gains a multi-image reference picker (Gemini native models); VideoGenerator sends the start frame as a prompt part with role 'start_frame' instead of modelOptions URLs; server functions narrow the wire prompt per model and throw on unsupported part kinds instead of dropping them
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
- fal image/video: spread modelOptions after derived media fields so explicit user overrides win (matches documented intent)
- openai video: validate effective size (size ?? modelOptions.size)
- generate-fal-image-field-map: run arity check for default-selected fields too
- ts-react-media example: correct reference-image support comment (Gemini multimodal models, not NanoBanana)
- e2e VideoGenUI: reject on malformed data URL instead of resolving ''
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
0c65cc7 to acd7319
…e activity follow-ups
Closes #707.
- Add openRouterVideo: async jobs adapter for OpenRouter's dedicated video API (submit -> poll -> download). Per-model size/duration/option types are generated from GET /api/v1/videos/models; frame roles map onto frame_images[] / input_references[] per the MediaInputRole taxonomy.
- Teach the model-meta sync scripts the videos/models endpoint (openrouter.video-models.json + OPENROUTER_VIDEO_MODEL_META).
- Image adapter follow-ups from the #624 review: throw on unmapped sizes (the size union used a Unicode multiplication sign so every non-square size silently dropped its aspect ratio), throw on numberOfImages > 1 (live-verified: the gateway ignores all count keys), expose image_config.strength.
- Completed videos are returned as data: URLs (unsigned_urls 401 without the API key header) with gateway-reported cost on usage.cost. The SDK's getVideoContent is bypassed: its matcher only accepts application/octet-stream while the endpoint serves video/mp4.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
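The size bug mentioned above (a Unicode multiplication sign in the size union) is easy to reproduce with a plain lookup table. The sketch below uses a hypothetical `ASPECT_BY_SIZE` map with made-up entries; it only illustrates why throwing on an unmapped size surfaces the mismatch instead of silently dropping the aspect ratio.

```typescript
// Hypothetical size -> aspect-ratio table; the entries are illustrative only.
const ASPECT_BY_SIZE: Record<string, string> = {
  '1024x1024': '1:1',
  '1536x1024': '3:2',
  '1024x1536': '2:3',
}

function aspectRatioForSize(size: string): string {
  const ratio = ASPECT_BY_SIZE[size]
  if (ratio === undefined) {
    // Throwing here makes a key mismatch visible immediately.
    throw new Error(`Unsupported size "${size}"`)
  }
  return ratio
}

// '1536×1024' (U+00D7) looks like '1536x1024' (ASCII x) but is a different key.
const lookalike = '1536\u00D71024'
```

Calling `aspectRatioForSize(lookalike)` throws, whereas a lookup that fell back to a default would have hidden the typo.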
…n contract (#634)
Restacked on 618-image-to-image-and-image-to-video-support to adopt the multimodal MediaPrompt format, carrying a minimal additive port of the #534 typed-duration contract:
- @tanstack/ai (non-breaking): VideoAdapter/BaseVideoAdapter gain a TModelDurationByName generic (default Record<string, number> preserves existing duration?: number typing), DurationOptions, snapToDurationOption, and default availableDurations()/snapDuration() implementations. generateVideo's duration is typed via VideoDurationForAdapter.
- @tanstack/ai-gemini: GeminiVideoAdapter over generateVideos / getVideosOperation with per-model typed durations (Veo 3.x 4|6|8, Veo 2 5|6|8 per current Veo docs), MediaPrompt image routing (start_frame → image, end_frame → lastFrame, reference/character → referenceImages), RAI filter surfacing, geminiVideo/createGeminiVideo factories, and finalized Veo model-meta entries.
- E2E: gemini added to video-gen with a custom aimock mount for :predictLongRunning + operations polling; all transports pass.
- Docs + media-generation skill updated for Veo (typed durations, image-to-video role table).
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
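One plausible reading of "snapping" a requested duration onto a model's allowed set is nearest-value selection, sketched below. This is not the library's `snapToDurationOption`: `snapToNearestDuration` is a hypothetical helper, and both the nearest-value rule and the tie-break toward the earlier option are assumptions.

```typescript
// Hypothetical helper: picks the allowed duration closest to the request.
// Assumption: on a tie, the option listed first wins.
function snapToNearestDuration<T extends number>(
  requested: number,
  options: readonly [T, ...T[]],
): T {
  let best: T = options[0]
  for (const option of options) {
    if (Math.abs(option - requested) < Math.abs(best - requested)) {
      best = option
    }
  }
  return best
}

// Durations quoted in the commit message for Veo 3.x.
const VEO_3_DURATIONS = [4, 6, 8] as const
```

Typing the options as a non-empty tuple keeps the return type narrowed to the literal union (`4 | 6 | 8` here), which is the kind of per-model typing the commit describes.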
1b52533 to acd7319
Summary
Closes #618 by making `prompt` itself the multimodal surface for `generateImage()` / `generateVideo()`: a plain string, or an ordered array of content parts (`TextPart` / `ImagePart` / `VideoPart` / `AudioPart`) for image-to-image, reference-guided, multi-reference, edit/inpaint, and image-to-video flows.
Design
- Part order is meaningful: natively multimodal providers (Gemini `contents`, OpenRouter chat content parts) receive parts exactly as written. Named-field providers (OpenAI, fal, xAI) go through the new `resolveMediaPrompt()` utility: flattened text + per-modality part buckets, with `metadata.role` routing (mask, control, reference, start/end frame).
- Zero magic: prompt text is always sent verbatim. The SDK never injects or rewrites in-prompt referencing markers; users write each provider's own convention (fal Kling/Seedance `@Image1`, OpenAI/FLUX.2 `"image 1"` prose, Gemini content descriptions); the per-provider table is documented in `docs/media/image-generation.md`. A `metadata.tag` label exists for user bookkeeping only. (An earlier `<IMAGE_n>` auto-injection for grok was removed after source-verified research showed the convention is absent from xAI's official docs; xAI addresses edit images by request order.)
- Per-model compile-time prompt narrowing: the `TModelInputModalitiesByName` adapter generic narrows the accepted part types per model. Passing an image part to `dall-e-3` or Imagen is a type error, with the runtime throw kept as backstop. fal's maps are derived at the type level from the SDK's endpoint input types.
Provider mapping
- Gemini: `contents` 1:1, interleaving preserved; Imagen text-only
- OpenAI: `images.edit()` (mask role → `mask`); Sora single `input_reference`
- xAI (Grok): `/v1/images/edits` (≤3 images, request order, prompt verbatim)
- fal: per-endpoint fields (`image_url`, `mask_url`, …)
Testing
- `pnpm test:pr` green across all 34 projects (sherif, knip, docs, eslint, lib, types, build)
- Tests for `resolveMediaPrompt` (verbatim guarantee, ordering, buckets), per-model prompt-modality type assertions, adapter tests updated across openai/grok/openrouter/fal
🤖 Generated with Claude Code
Also includes the Gemini Veo video adapter on the typed-duration contract (merged in via #746/#741) — closes #634.