feat: multimodal prompt for generateImage/generateVideo (image-to-image, image-to-video) by tombeckenham · Pull Request #624 · TanStack/ai

tombeckenham · 2026-05-22T09:55:32Z

Summary

Closes #618 by making prompt itself the multimodal surface for generateImage() / generateVideo(): a plain string, or an ordered array of content parts (TextPart / ImagePart / VideoPart / AudioPart) for image-to-image, reference-guided, multi-reference, edit/inpaint, and image-to-video flows.

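```ts
import { generateImage } from '@tanstack/ai'
import { geminiImage } from '@tanstack/ai-gemini'

const badExampleUrl = 'https://example.com/bad.png' // placeholder
const goodExampleUrl = 'https://example.com/good.png' // placeholder

// Part order is preserved: each text part frames the image that follows it
await generateImage({
  adapter: geminiImage('gemini-3.1-flash-image-preview'),
  prompt: [
    { type: 'text', content: 'Not like this' },
    { type: 'image', source: { type: 'url', value: badExampleUrl } },
    { type: 'text', content: 'more like this' },
    { type: 'image', source: { type: 'url', value: goodExampleUrl } },
  ],
})
```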

Design

- Interleaving is canonical. Part order is meaningful: providers with natively multimodal prompts (Gemini `contents`, OpenRouter chat content parts) receive parts exactly as written. Named-field providers (OpenAI, fal, xAI) are mapped down through the new `resolveMediaPrompt()` utility, which yields flattened text plus per-modality part buckets, with `metadata.role` routing (mask, control, reference, start/end frame).
- Zero magic. Prompt text is sent verbatim; the SDK never injects or rewrites referencing markers. Users write each provider's own convention (fal Kling/Seedance `@Image1`, OpenAI/FLUX.2 "image 1" prose, Gemini content descriptions); the per-provider table is documented in `docs/media/image-generation.md`. A `metadata.tag` label exists for user bookkeeping only. (An earlier `<IMAGE_n>` auto-injection for Grok was removed after source-verified research showed the convention is absent from xAI's official docs; xAI addresses edit images by request order.)
- Per-model compile-time safety. A new `TModelInputModalitiesByName` adapter generic narrows the accepted part types per model: passing an image part to dall-e-3 or Imagen is a type error, with the runtime throw kept as a backstop. fal's modality maps are derived at the type level from the SDK's endpoint input types.
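The flattening step used for named-field providers can be pictured with a minimal sketch. It assumes simplified part shapes modeled on the example above and a single-space separator when joining text parts; `resolvePromptSketch` is a hypothetical name, and the real `resolveMediaPrompt()` in `packages/ai/src/utilities/media-prompt.ts` may differ in shape and separator.

```ts
// Simplified stand-ins for the prompt part shapes shown in the example above.
// These are assumptions, not @tanstack/ai's exported type definitions.
type Source = { type: 'url' | 'data'; value: string }
type TextPart = { type: 'text'; content: string }
type MediaPart = {
  type: 'image' | 'video' | 'audio'
  source: Source
  metadata?: { role?: string; tag?: string }
}
type PromptPart = TextPart | MediaPart
type Prompt = string | Array<PromptPart>

interface Resolved {
  text: string // text parts joined verbatim; no markers are injected
  images: Array<MediaPart>
  videos: Array<MediaPart>
  audios: Array<MediaPart>
}

// Hypothetical resolver: flattened text plus per-modality buckets.
// Request order is preserved inside each bucket.
function resolvePromptSketch(prompt: Prompt): Resolved {
  const out: Resolved = { text: '', images: [], videos: [], audios: [] }
  if (typeof prompt === 'string') {
    out.text = prompt
    return out
  }
  const texts: Array<string> = []
  for (const part of prompt) {
    if (part.type === 'text') texts.push(part.content)
    else if (part.type === 'image') out.images.push(part)
    else if (part.type === 'video') out.videos.push(part)
    else out.audios.push(part)
  }
  // Assumption: a single space separator; the real utility may differ.
  out.text = texts.join(' ')
  return out
}
```

Roles travel with each part in `metadata`, so a downstream mapper can still route a `mask` or `start_frame` image to the right field after flattening.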

Provider mapping

Provider	Pathway
Gemini (native image models)	parts → `contents` 1:1, interleaving preserved; Imagen text-only
OpenAI	edits via `images.edit()` (mask role → `mask`); Sora single `input_reference`
Grok	grok-imagine → `/v1/images/edits` (≤3 images, request order, prompt verbatim)
OpenRouter	parts → chat content parts 1:1 in order
fal	role + order → generated per-endpoint field map (`image_url`, `mask_url`, …)
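For named-field providers, role routing might look roughly like the following sketch. The `ImageInput` and `FieldSpec` shapes and `mapImagesToFields` are hypothetical; they only illustrate the idea behind fal's generated per-endpoint field map (role plus order, with a list/scalar arity check). Falling back to the endpoint's primary field for a missing or unmapped role is an assumption of this sketch.

```ts
// Hypothetical shapes: an image input with an optional role, and a
// per-endpoint spec saying which request field a role maps to and whether
// that field takes a list of URLs or a single URL.
interface ImageInput {
  url: string
  role?: string
}
interface FieldSpec {
  field: string
  list: boolean
}

function mapImagesToFields(
  images: Array<ImageInput>,
  roleFields: Record<string, FieldSpec | undefined>,
  primary: FieldSpec,
): Record<string, string | Array<string>> {
  // Group URLs by target field, keeping request order within each field.
  const grouped = new Map<string, { spec: FieldSpec; urls: Array<string> }>()
  for (const img of images) {
    // Assumption: a missing or unmapped role falls back to the primary field.
    const spec =
      (img.role !== undefined ? roleFields[img.role] : undefined) ?? primary
    const entry: { spec: FieldSpec; urls: Array<string> } =
      grouped.get(spec.field) ?? { spec, urls: [] }
    entry.urls.push(img.url)
    grouped.set(spec.field, entry)
  }
  const out: Record<string, string | Array<string>> = {}
  for (const [field, { spec, urls }] of grouped) {
    // Arity check: scalar fields accept exactly one image.
    if (!spec.list && urls.length > 1) {
      throw new Error(`"${field}" takes a single image, got ${urls.length}`)
    }
    out[field] = spec.list ? urls : urls[0]!
  }
  return out
}
```

For example, `[{ url: 'a.png' }, { url: 'm.png', role: 'mask' }]` with a `mask -> mask_url` mapping and `image_url` as the primary field yields `{ image_url: 'a.png', mask_url: 'm.png' }`.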

Testing

- `pnpm test:pr` passes across all 34 projects (sherif, knip, docs, eslint, lib, types, build)
- New unit tests: `resolveMediaPrompt` (verbatim guarantee, ordering, buckets), per-model prompt-modality type assertions, and updated adapter tests across openai/grok/openrouter/fal
- E2E suite: 254 passed (image-to-image/image-to-video matrix entries are present but empty pending aimock support for the edit endpoints; adapter mapping is covered by unit tests)

🤖 Generated with Claude Code

Summary by CodeRabbit

New Features
- Multimodal prompts for image/video generation (ordered text/image/video/audio parts) with per-model validation and provider routing; client hooks and examples now accept structured media prompts; utility to resolve/flatten media prompts.
Documentation
- Expanded guides, examples, provider support matrices, and role/tag conventions for image-conditioned and image-to-video workflows.
Tests & Tooling
- New unit and E2E tests for multimodal flows; generator/tooling added to maintain provider field mappings.

Also includes the Gemini Veo video adapter on the typed-duration contract (merged in via #746/#741) — closes #634.

coderabbitai · 2026-05-22T09:55:39Z

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

@coderabbitai resume to resume automatic reviews.
@coderabbitai review to trigger a single review.


📝 Walkthrough

Adds MediaPrompt types and resolver; threads modality-aware prompt types into image/video activities and adapters; implements provider-specific mapping and validation (OpenAI, Gemini, fal.ai, Grok, OpenRouter); updates client, examples, tests, docs, scripts, and e2e for image-to-image and image-to-video.

Changes

Media input conditioning across core and adapters

Layer / File(s)	Summary
Core types, resolver and activity wiring `packages/ai/src/types.ts`, `packages/ai/src/utilities/media-prompt.ts`, `packages/ai/src/activities/generateImage/index.ts`, `packages/ai/src/activities/generateVideo/index.ts`, `packages/ai/src/index.ts`, `packages/ai-event-client/src/index.ts`	Adds `MediaPrompt`, `MediaPromptPart`, `MediaInputRole`, `MediaInputMetadata`, `ResolvedMediaPrompt`, `resolveMediaPrompt`; updates Image/Video activity prompt typings to be model-aware; emits `imageInputCount`/`videoInputCount`/`audioInputCount`; re-exports resolver.
FAL field-override generator and runtime mappers `scripts/generate-fal-image-field-map.ts`, `packages/ai-fal/src/image/image-inputs.ts`, `packages/ai-fal/src/model-meta.ts`, `.prettierignore`, `package.json`	Adds a script to introspect `@fal-ai/client` endpoint types and emit `FAL_IMAGE_FIELD_OVERRIDES`; runtime mappers convert `ImagePart` inputs into per-model fal input fields with list/scalar arity checks and role bucketing.
FAL adapters: image & video integration and tests `packages/ai-fal/src/adapters/image.ts`, `packages/ai-fal/src/adapters/video.ts`, `packages/ai-fal/tests/image-inputs.test.ts`, `packages/ai-fal/tests/video-adapter.test.ts`	Fal adapters resolve media prompts, validate unsupported media parts, map media inputs into fal request fields (images/videos/audios), and add tests covering mapping behavior and generated-override integrity.
OpenAI image edit flow and file helper `packages/ai-openai/src/adapters/image.ts`, `packages/ai-openai/src/image/image-input-to-file.ts`, `packages/ai-openai/tests/image-adapter.test.ts`	Adds `imagePartToFile` helper; routes image-containing prompts to `images.edit()` with model edit limits, mask/source validation, and file conversion; tests assert routing and error cases.
OpenAI video: input_reference handling and typing `packages/ai-openai/src/adapters/video.ts`, `packages/ai-openai/src/video/video-provider-options.ts`, `packages/ai-openai/tests/video-adapter.test.ts`	Resolves media prompts for video jobs, rejects audio/video parts, enforces at most one image part mapped to `input_reference` `File`, and adds tests and per-model modality typing for Sora models.
Grok Imagine models & edits endpoint `packages/ai-grok/src/adapters/image.ts`, `packages/ai-grok/src/image/image-provider-options.ts`, `packages/ai-grok/src/model-meta.ts`, `packages/ai-grok/tests/grok-adapter.test.ts`	Adds `grok-imagine-*` model metadata and size parsing, maps generic `size` to `aspect_ratio`/`resolution`, routes image inputs to `/v1/images/edits` when supported, validates roles/count (≤3 sources), and tests payload shaping and errors.
Gemini multimodal contents `packages/ai-gemini/src/adapters/image.ts`, `packages/ai-gemini/src/image/image-provider-options.ts`, `packages/ai-gemini/tests/image-adapter.test.ts`	Resolves media prompts, rejects video/audio parts, constructs Gemini multimodal `contents` with ordered `parts` converted from `ImagePart` inputs (inlineData/fileData), and preserves Imagen text-only behavior.
OpenRouter multimodal message construction `packages/ai-openrouter/src/adapters/image.ts`, `packages/ai-openrouter/src/image/image-provider-options.ts`, `packages/ai-openrouter/tests/image-adapter.test.ts`	Resolves media prompts, converts `ImagePart` sources to `image_url` or data URIs, builds interleaved chat `content` array when images present, rejects audio/video parts, and adds tests for content shapes.
Client types, hooks, and example apps `packages/ai-client/src/generation-types.ts`, `examples/ts-react-media/src/**`, `examples/ts-react-media/src/lib/server-functions.ts`	Widens `ImageGenerateInput.prompt` and `VideoGenerateInput.prompt` to `MediaPrompt`; updates example Image/Video generators and server functions to build/validate `MediaPrompt` arrays (attachments, image-to-video start_frame).
Type-safety and resolver tests `packages/ai/tests/media-prompt.test.ts`, `packages/ai/tests/image-per-model-type-safety.test.ts`	Adds Vitest suite for `resolveMediaPrompt` and TypeScript per-model modality safety tests ensuring compile-time enforcement.
Docs, changeset, scripts, e2e flags and tests `docs/media/image-generation.md`, `docs/media/video-generation.md`, `docs/adapters/grok.md`, `packages/ai/skills/ai-core/media-generation/SKILL.md`, `.changeset/image-and-video-inputs.md`, testing/e2e/**, `.gitignore`	Extensive docs for image-conditioned generation, role hints, provider support matrix; adds `generate:fal-image-fields` script; updates e2e feature matrix, fixtures, and Playwright specs for `image-to-image` and `image-to-video`.

Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related issues

#707 — Overlaps in OpenRouter multimodal prompt handling and adapter message construction.

Suggested reviewers

crutchcorn
tannerlinsley


github-actions · 2026-05-22T09:56:23Z

🚀 Changeset Version Preview

8 package(s) bumped directly, 23 bumped as dependents.

🟥 Major bumps

Package	Version	Reason
`@tanstack/ai-event-client`	0.5.4 → 1.0.0	Changeset
`@tanstack/ai-fal`	0.7.23 → 1.0.0	Changeset
`@tanstack/ai-gemini`	0.15.1 → 1.0.0	Changeset
`@tanstack/ai-grok`	0.11.2 → 1.0.0	Changeset
`@tanstack/ai-openai`	0.14.1 → 1.0.0	Changeset
`@tanstack/ai-openrouter`	0.13.1 → 1.0.0	Changeset
`@tanstack/ai-anthropic`	0.15.1 → 1.0.0	Dependent
`@tanstack/ai-code-mode`	0.2.5 → 1.0.0	Dependent
`@tanstack/ai-code-mode-skills`	0.2.5 → 1.0.0	Dependent
`@tanstack/ai-elevenlabs`	0.2.20 → 1.0.0	Dependent
`@tanstack/ai-groq`	0.4.2 → 1.0.0	Dependent
`@tanstack/ai-isolate-node`	0.1.30 → 1.0.0	Dependent
`@tanstack/ai-isolate-quickjs`	0.1.30 → 1.0.0	Dependent
`@tanstack/ai-ollama`	0.8.1 → 1.0.0	Dependent
`@tanstack/ai-preact`	0.9.4 → 1.0.0	Dependent
`@tanstack/ai-react`	0.15.4 → 1.0.0	Dependent
`@tanstack/ai-react-ui`	0.8.6 → 1.0.0	Dependent
`@tanstack/ai-solid`	0.13.4 → 1.0.0	Dependent
`@tanstack/ai-solid-ui`	0.7.6 → 1.0.0	Dependent
`@tanstack/ai-svelte`	0.13.4 → 1.0.0	Dependent
`@tanstack/ai-vue`	0.13.4 → 1.0.0	Dependent
`@tanstack/openai-base`	0.8.1 → 1.0.0	Dependent

🟨 Minor bumps

Package	Version	Reason
`@tanstack/ai`	0.28.0 → 0.29.0	Changeset
`@tanstack/ai-client`	0.16.3 → 0.17.0	Changeset

🟩 Patch bumps

Package	Version	Reason
`@tanstack/ai-devtools-core`	0.4.8 → 0.4.9	Dependent
`@tanstack/ai-isolate-cloudflare`	0.2.21 → 0.2.22	Dependent
`@tanstack/ai-mcp`	0.1.0 → 0.1.1	Dependent
`@tanstack/ai-vue-ui`	0.2.16 → 0.2.17	Dependent
`@tanstack/preact-ai-devtools`	0.1.51 → 0.1.52	Dependent
`@tanstack/react-ai-devtools`	0.2.51 → 0.2.52	Dependent
`@tanstack/solid-ai-devtools`	0.2.51 → 0.2.52	Dependent

nx-cloud · 2026-05-22T09:57:30Z

CI Pipeline Execution for commit 1b52533

Command	Status	Duration
`nx run-many --targets=build --exclude=examples/...`	✅ Succeeded	1s

☁️ Nx Cloud last updated this comment at 2026-06-11 02:35:04 UTC

pkg-pr-new · 2026-05-22T09:59:24Z


@tanstack/ai

npm i https://pkg.pr.new/@tanstack/ai@624

@tanstack/ai-anthropic

npm i https://pkg.pr.new/@tanstack/ai-anthropic@624

@tanstack/ai-client

npm i https://pkg.pr.new/@tanstack/ai-client@624

@tanstack/ai-code-mode

npm i https://pkg.pr.new/@tanstack/ai-code-mode@624

@tanstack/ai-code-mode-skills

npm i https://pkg.pr.new/@tanstack/ai-code-mode-skills@624

@tanstack/ai-devtools-core

npm i https://pkg.pr.new/@tanstack/ai-devtools-core@624

@tanstack/ai-elevenlabs

npm i https://pkg.pr.new/@tanstack/ai-elevenlabs@624

@tanstack/ai-event-client

npm i https://pkg.pr.new/@tanstack/ai-event-client@624

@tanstack/ai-fal

npm i https://pkg.pr.new/@tanstack/ai-fal@624

@tanstack/ai-gemini

npm i https://pkg.pr.new/@tanstack/ai-gemini@624

@tanstack/ai-grok

npm i https://pkg.pr.new/@tanstack/ai-grok@624

@tanstack/ai-groq

npm i https://pkg.pr.new/@tanstack/ai-groq@624

@tanstack/ai-isolate-cloudflare

npm i https://pkg.pr.new/@tanstack/ai-isolate-cloudflare@624

@tanstack/ai-isolate-node

npm i https://pkg.pr.new/@tanstack/ai-isolate-node@624

@tanstack/ai-isolate-quickjs

npm i https://pkg.pr.new/@tanstack/ai-isolate-quickjs@624

@tanstack/ai-mcp

npm i https://pkg.pr.new/@tanstack/ai-mcp@624

@tanstack/ai-ollama

npm i https://pkg.pr.new/@tanstack/ai-ollama@624

@tanstack/ai-openai

npm i https://pkg.pr.new/@tanstack/ai-openai@624

@tanstack/ai-openrouter

npm i https://pkg.pr.new/@tanstack/ai-openrouter@624

@tanstack/ai-preact

npm i https://pkg.pr.new/@tanstack/ai-preact@624

@tanstack/ai-react

npm i https://pkg.pr.new/@tanstack/ai-react@624

@tanstack/ai-react-ui

npm i https://pkg.pr.new/@tanstack/ai-react-ui@624

@tanstack/ai-solid

npm i https://pkg.pr.new/@tanstack/ai-solid@624

@tanstack/ai-solid-ui

npm i https://pkg.pr.new/@tanstack/ai-solid-ui@624

@tanstack/ai-svelte

npm i https://pkg.pr.new/@tanstack/ai-svelte@624

@tanstack/ai-utils

npm i https://pkg.pr.new/@tanstack/ai-utils@624

@tanstack/ai-vue

npm i https://pkg.pr.new/@tanstack/ai-vue@624

@tanstack/ai-vue-ui

npm i https://pkg.pr.new/@tanstack/ai-vue-ui@624

@tanstack/openai-base

npm i https://pkg.pr.new/@tanstack/openai-base@624

@tanstack/preact-ai-devtools

npm i https://pkg.pr.new/@tanstack/preact-ai-devtools@624

@tanstack/react-ai-devtools

npm i https://pkg.pr.new/@tanstack/react-ai-devtools@624

@tanstack/solid-ai-devtools

npm i https://pkg.pr.new/@tanstack/solid-ai-devtools@624

commit: 1b52533

coderabbitai

Actionable comments posted: 9

🧹 Nitpick comments (1)

packages/ai-fal/tests/image-inputs.test.ts (1)
1-249: ⚡ Quick win

Move this unit test alongside its source module.

This test lives under packages/ai-fal/tests/ instead of next to the mapped source (e.g., near packages/ai-fal/src/image/image-inputs.ts), which diverges from the repo’s unit-test placement rule.

As per coding guidelines, "**/*.test.ts: Place unit tests in *.test.ts files alongside source code".
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/ai-fal/tests/image-inputs.test.ts` around lines 1 - 249, The test
file image-inputs.test.ts needs to be moved to sit next to its source module
image-inputs.ts (the module exporting mapImageInputsToFalFields and
mapImageInputsToFalVideoFields); relocate the test file into the same directory
as image-inputs.ts, update any relative imports (e.g., ../src/image/image-inputs
to ./image-inputs) and the generated constant import if needed, and ensure the
test runner still discovers it (adjust any package test config or export paths
if required).

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@docs/media/image-generation.md`:
- Around line 151-155: Update the example calls to use the provider's current
image model instead of the hardcoded 'gpt-image-1': replace
openaiImage('gpt-image-1') with the latest model string from the OpenAI
adapter's model-meta.ts (use the adapter helper openaiImage(...) with that model
name) for every example instance (e.g., the generateImage call shown and the
other occurrences noted). Locate and edit the generateImage usage and any other
examples referencing openaiImage to pull the canonical model identifier from the
OpenAI adapter's model-meta.ts and substitute it in the docs so examples stay
up-to-date.

In `@packages/ai-fal/src/adapters/image.ts`:
- Around line 98-109: The current merge order in the input construction lets
mapped image fields from mapImageInputsToFalFields (inputFields) overwrite
options.modelOptions, contradicting the intended behavior; change the spread
order so options.modelOptions is applied last (i.e., merge sizeParams and
inputFields first, then ...options.modelOptions) so per-endpoint modelOptions
override derived image-input keys when creating the input object (look for the
input variable creation that includes mapImageInputsToFalFields, sizeParams, and
prompt).

In `@packages/ai-gemini/src/adapters/image.ts`:
- Around line 256-264: The imagePartToGeminiPart path currently calls
fetch(part.source.value) with no URL safety checks; add a fail-closed URL gate
before the fetch that (1) parses part.source.value, (2) enforces a protocol
allowlist (only https and optionally data/gs if your flow needs them), and (3)
resolves and blocks private/loopback/internal IP ranges (RFC1918, 127.0.0.0/8,
IPv6 equivalents, localhost, and link-local) as well as preventing hostnames
that resolve to those IPs; if the URL fails any check, throw an Error instead of
calling fetch. Locate this logic around imagePartToGeminiPart where
fetch(part.source.value) is invoked and ensure errors surface before converting
the response to blob/arrayBuffer and calling arrayBufferToBase64.
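A fail-closed URL gate of the kind requested could start from a check like this. It is a sketch under stated assumptions: only `https:` is allowed, and only literal hostnames and IP addresses are inspected. Hostnames that resolve to private addresses still need a DNS lookup (for example via `node:dns`) plus a re-check, redirects need the same treatment, and `data:` URLs are rejected here on the assumption that they are decoded by a separate path. `assertSafeImageUrl` and its helpers are hypothetical names.

```ts
import { isIP } from 'node:net'

// Literal-address checks only: hostnames are NOT resolved here.
function isPrivateIPv4(ip: string): boolean {
  const parts = ip.split('.').map(Number)
  const a = parts[0] ?? -1
  const b = parts[1] ?? -1
  return (
    a === 0 || // "this" network
    a === 10 || // RFC 1918
    a === 127 || // loopback
    (a === 169 && b === 254) || // link-local (includes cloud metadata IPs)
    (a === 172 && b >= 16 && b <= 31) || // RFC 1918
    (a === 192 && b === 168) // RFC 1918
  )
}

function isPrivateIPv6(ip: string): boolean {
  const lower = ip.toLowerCase()
  return (
    lower === '::' ||
    lower === '::1' || // loopback
    /^fe[89ab][0-9a-f]:/.test(lower) || // link-local fe80::/10
    /^f[cd][0-9a-f]{2}:/.test(lower) || // unique local fc00::/7
    lower.startsWith('::ffff:') // IPv4-mapped: blocked outright (fail closed)
  )
}

function parseUrl(raw: string): URL {
  try {
    return new URL(raw)
  } catch {
    throw new Error(`Invalid image URL: ${raw}`)
  }
}

function assertSafeImageUrl(raw: string): URL {
  const url = parseUrl(raw)
  // Protocol allowlist: only https; http:, file:, data:, gs: etc. are rejected.
  if (url.protocol !== 'https:') {
    throw new Error(`Blocked image URL scheme: ${url.protocol}`)
  }
  // URL.hostname wraps IPv6 literals in brackets; strip them before isIP().
  const host = url.hostname.replace(/^\[|\]$/g, '').toLowerCase()
  if (host === 'localhost' || host.endsWith('.localhost')) {
    throw new Error(`Blocked image URL host: ${host}`)
  }
  const family = isIP(host)
  if (
    (family === 4 && isPrivateIPv4(host)) ||
    (family === 6 && isPrivateIPv6(host))
  ) {
    throw new Error(`Blocked private image URL host: ${host}`)
  }
  return url
}
```

The gate would run before `fetch(part.source.value)`, so a rejected URL surfaces as an error without any network request being made.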

In `@packages/ai-grok/src/adapters/image.ts`:
- Around line 254-264: The fetch call in editImages is missing a bounded
timeout; create an AbortController inside the editImages function, pass
controller.signal to the fetch options, and start a setTimeout that calls
controller.abort() after a chosen timeout (e.g., 30s or a configurable
constant); ensure you clearTimeout when fetch resolves/rejects and handle abort
errors appropriately (keep existing error handling behavior). Update the fetch
invocation in packages/ai-grok/src/adapters/image.ts to include signal:
controller.signal and wrap the timer lifecycle so the request cannot hang
indefinitely.
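The bounded-timeout pattern described above can be sketched as a small wrapper. `fetchWithTimeout` and the injectable `fetchImpl` parameter are hypothetical; in production the global `fetch` would normally be passed, and the 30 s figure from the comment would be the `timeoutMs` argument.

```ts
// Any fetch-like function that honors an AbortSignal.
type FetchLike<T> = (url: string, init: { signal: AbortSignal }) => Promise<T>

async function fetchWithTimeout<T>(
  url: string,
  timeoutMs: number,
  fetchImpl: FetchLike<T>,
): Promise<T> {
  const controller = new AbortController()
  // Abort the request if it has not settled within timeoutMs.
  const timer = setTimeout(() => controller.abort(), timeoutMs)
  try {
    return await fetchImpl(url, { signal: controller.signal })
  } finally {
    // Always clear the timer so a fast response does not leave it pending.
    clearTimeout(timer)
  }
}
```

A call such as `fetchWithTimeout(url, 30_000, fetch)` rejects with an `AbortError` on timeout, which existing error handling can catch or rewrap as a clearer timeout message.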

In `@packages/ai-openai/src/adapters/image.ts`:
- Around line 193-198: The error thrown when maxImages === 0 gives incorrect
guidance by omitting supported models (it lists gpt-image-1, gpt-image-1-mini,
or dall-e-2 but not gpt-image-2); update the message in the block using
EDIT_MAX_IMAGES and the local variable maxImages (and this.name) to mention
gpt-image-2 or, better, mirror the keys/allowed models from EDIT_MAX_IMAGES so
the guidance matches the actual supported-image models.
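One way to keep that guidance in sync is to derive the supported-model list from the limits table itself. The sketch assumes the table maps model names to a maximum edit-image count, with `0` meaning no edit support; `editSupportError` is a hypothetical helper, and the test uses made-up model names rather than the adapter's real limits.

```ts
// Build the "use one of ..." hint from the limits table, so adding a model
// such as gpt-image-2 to the table automatically updates the message.
function editSupportError(
  adapterName: string,
  model: string,
  editMaxImages: Readonly<Record<string, number>>,
): Error {
  const supported = Object.entries(editMaxImages)
    .filter(([, max]) => max > 0)
    .map(([name]) => name)
  return new Error(
    `${adapterName}: model "${model}" does not support image prompt parts. ` +
      `Use one of: ${supported.join(', ')}.`,
  )
}
```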

In `@packages/ai-openai/src/adapters/video.ts`:
- Around line 90-95: validateVideoSize is only checking the root size variable
and can miss modelOptions.size used later when building the request; update the
code around the adapter where you destructure options (the block with const {
model, size, duration, modelOptions } and const { imageInputs, videoInputs,
audioInputs }) to compute an effectiveSize (e.g., size ?? modelOptions?.size)
and call validateVideoSize(model, effectiveSize) so the value actually sent in
the request is validated; do the same pattern for seconds already handled
(seconds = duration ?? modelOptions?.seconds) to ensure consistency.

In `@packages/ai-openai/src/image/image-input-to-file.ts`:
- Around line 49-55: The fetch of arbitrary part.source.value is unsafe and can
enable SSRF or hanging requests; update the image ingestion in
image-input-to-file.ts to (1) validate the URL before fetching by parsing
part.source.value with new URL and reject non-http(s) schemes, disallow
localhost/127.0.0.0/8 and RFC1918 private IP ranges (resolve hostname to IP and
check against denylist), and disallow file:, data: (handle data: separately),
and (2) perform the fetch with a bounded timeout using AbortController (and
limit redirects) so requests cannot hang; apply these checks and timeout around
the existing fetch/response logic that references part.source.value.

In `@packages/ai/skills/ai-core/media-generation/SKILL.md`:
- Around line 673-676: Update the wording that currently reads "Adapters throw a
clear runtime error when the caller passes `imageInputs` to a model that can't
honor it (dall-e-3, Imagen, Grok, OpenRouter)" to scope the limitation to
specific unsupported models rather than entire providers; keep the phrase about
adapters and `imageInputs` but replace the parenthetical list with explicit
unsupported model examples (e.g., "dall-e-3, Imagen") or rephrase to "certain
models (for example, dall-e-3 and Imagen)" and remove or clarify Grok/OpenRouter
as providers so you don't imply the whole provider lacks image-conditioned
routes. Ensure the corrected sentence still mentions adapters throwing a runtime
error for unsupported models.

In `@scripts/generate-fal-image-field-map.ts`:
- Around line 223-236: The arity sanity check is being skipped for
default-selected fields because the loop continues when chosen ===
DEFAULTS[role]; update the loop so it does not early-continue for
default-selected fields — only skip when no chosen — and ensure the existing
arity comparison using isList.get(chosen), LIST_FIELDS.has(chosen), and
endpointId still runs for DEFAULTS[role] values so mismatches between the
runtime type map (isList) and the static LIST_FIELDS are caught (also update any
comment to mention image-inputs.ts where LIST_FIELDS is defined).
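The corrected loop might look like this sketch, which skips a role only when no field was chosen, so default-selected fields still go through the arity comparison. The function name and parameter shapes are assumptions; in the script the inputs would come from `isList`, `LIST_FIELDS`, and `endpointId`, and `fal-ai/example` in the test is a made-up endpoint id.

```ts
function findArityMismatches(
  endpointId: string,
  chosenByRole: Record<string, string | undefined>,
  isList: ReadonlyMap<string, boolean>,
  listFields: ReadonlySet<string>,
): Array<string> {
  const problems: Array<string> = []
  for (const [role, chosen] of Object.entries(chosenByRole)) {
    // Skip only when nothing was chosen; defaults are checked like any field.
    if (!chosen) continue
    const typeSaysList = isList.get(chosen)
    const staticSaysList = listFields.has(chosen)
    if (typeSaysList !== undefined && typeSaysList !== staticSaysList) {
      problems.push(
        `${endpointId}: ${role} -> ${chosen} ` +
          `(endpoint type list=${typeSaysList}, LIST_FIELDS=${staticSaysList})`,
      )
    }
  }
  return problems
}
```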



ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: c39fe612-aaa6-4d46-a616-b99323822c03

📥 Commits

Reviewing files that changed from the base of the PR and between c0af426 and 483a3d4.

⛔ Files ignored due to path filters (1)

packages/ai-fal/src/image/generated/image-field-overrides.ts is excluded by !**/generated/**

📒 Files selected for processing (29)

.changeset/image-and-video-inputs.md
.prettierignore
docs/adapters/grok.md
docs/media/image-generation.md
docs/media/video-generation.md
package.json
packages/ai-event-client/src/index.ts
packages/ai-fal/src/adapters/image.ts
packages/ai-fal/src/adapters/video.ts
packages/ai-fal/src/image/image-inputs.ts
packages/ai-fal/tests/image-inputs.test.ts
packages/ai-gemini/src/adapters/image.ts
packages/ai-grok/src/adapters/image.ts
packages/ai-grok/src/image/image-provider-options.ts
packages/ai-grok/src/model-meta.ts
packages/ai-grok/tests/grok-adapter.test.ts
packages/ai-openai/src/adapters/image.ts
packages/ai-openai/src/adapters/video.ts
packages/ai-openai/src/image/image-input-to-file.ts
packages/ai-openai/tests/image-adapter.test.ts
packages/ai-openrouter/src/adapters/image.ts
packages/ai-openrouter/tests/image-adapter.test.ts
packages/ai/skills/ai-core/media-generation/SKILL.md
packages/ai/src/activities/generateImage/index.ts
packages/ai/src/activities/generateVideo/index.ts
packages/ai/src/types.ts
scripts/generate-fal-image-field-map.ts
testing/e2e/src/lib/feature-support.ts
testing/e2e/src/lib/types.ts

coderabbitai

Actionable comments posted: 2

♻️ Duplicate comments (1)

packages/ai-fal/src/adapters/image.ts (1)
109-117: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Merge order still prevents modelOptions from overriding derived fields.

The past review comment on this issue remains valid. Current spread order has inputFields (line 112) override modelOptions (line 110), contradicting the comment on lines 105-107 that claims "user overrides win." Users cannot override derived image-input fields (e.g., mask_url, reference_image_urls) via modelOptions when imageInputs are present.
💡 Apply the past reviewer's fix
     const input = {
-      ...options.modelOptions,
       ...sizeParams,
       ...inputFields,
+      ...options.modelOptions,
       ...(resolved.text ? { prompt: resolved.text } : {}),
       num_images: options.numberOfImages,
     } as FalModelInput<TModel>
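Object spread is last-wins, which is why the order matters. A minimal sketch with hypothetical values standing in for the adapter's locals: with the order from the diff, `modelOptions` replaces a derived `image_url`, while keys written after it (`prompt`, `num_images`) still take precedence over `modelOptions`.

```ts
// Hypothetical values standing in for the adapter's locals.
const sizeParams = { image_size: 'square_hd' }
const inputFields = { image_url: 'https://example.com/derived.png' }
const modelOptions = {
  image_url: 'https://example.com/override.png',
  num_images: 4,
}

const input = {
  ...sizeParams,
  ...inputFields,
  ...modelOptions, // overrides derived image fields
  prompt: 'a red bicycle', // keys after the spread win over modelOptions
  num_images: 1,
}
```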
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/ai-fal/src/adapters/image.ts` around lines 109 - 117, The merge
order currently lets inputFields override options.modelOptions so user-provided
modelOptions can't override derived image fields; update the spread order when
building the input object (the variable named input of type FalModelInput) so
that options.modelOptions is spread last (after sizeParams, inputFields and the
conditional prompt) — this ensures user modelOptions win and still preserves
resolved.text and num_images handling.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@packages/ai-fal/src/adapters/video.ts`:
- Around line 164-174: The current construction of the input object lets
media-derived fields (sizeParams, inputImageFields, videoFields, audioFields)
override user-supplied modelOptions; change the spread order so modelOptions is
spread after the derived media fields (i.e., build input from sizeParams,
inputImageFields, videoFields, audioFields, then ...modelOptions) while
preserving the conditional prompt and duration spreads (resolved.text and
duration) and the FalModelInput<TModel> typing so user-provided keys like
video_url, reference_video_urls, or audio_url in modelOptions take precedence.

In `@packages/ai-openai/src/adapters/video.ts`:
- Around line 99-127: Add unit tests covering createVideoJob multimodal
handling: (1) text-only prompt should call Videos.create with model and prompt
and no input_reference; (2) text+one image should convert the resolved image via
imagePartToFile and attach the returned File as request.input_reference — mock
imagePartToFile and the OpenAI_SDK.Videos.create call to assert the passed
params include input_reference; (3) more than one image should throw the same
error path (verify the thrown message). Also add a Playwright + aimock E2E test
that posts a single-image multimodal prompt and asserts the upstream Sora
request contains a single input_reference upload. Reference createVideoJob,
resolveMediaPrompt, imagePartToFile, and request.input_reference when locating
code to test and to mock.



ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 98656191-029e-4c17-87e3-a6eb4ac01845

📥 Commits

Reviewing files that changed from the base of the PR and between 483a3d4 and 2e47d5e.

📒 Files selected for processing (33)

.changeset/image-and-video-inputs.md
.gitignore
docs/adapters/grok.md
docs/media/image-generation.md
docs/media/video-generation.md
packages/ai-fal/src/adapters/image.ts
packages/ai-fal/src/adapters/video.ts
packages/ai-fal/src/image/image-inputs.ts
packages/ai-fal/src/model-meta.ts
packages/ai-gemini/src/adapters/image.ts
packages/ai-gemini/src/image/image-provider-options.ts
packages/ai-grok/src/adapters/image.ts
packages/ai-grok/src/image/image-provider-options.ts
packages/ai-grok/tests/grok-adapter.test.ts
packages/ai-openai/src/adapters/image.ts
packages/ai-openai/src/adapters/video.ts
packages/ai-openai/src/image/image-provider-options.ts
packages/ai-openai/src/video/video-provider-options.ts
packages/ai-openai/tests/image-adapter.test.ts
packages/ai-openrouter/src/adapters/image.ts
packages/ai-openrouter/src/image/image-provider-options.ts
packages/ai-openrouter/tests/image-adapter.test.ts
packages/ai/skills/ai-core/media-generation/SKILL.md
packages/ai/src/activities/generateImage/adapter.ts
packages/ai/src/activities/generateImage/index.ts
packages/ai/src/activities/generateVideo/adapter.ts
packages/ai/src/activities/generateVideo/index.ts
packages/ai/src/index.ts
packages/ai/src/types.ts
packages/ai/src/utilities/media-prompt.ts
packages/ai/tests/image-per-model-type-safety.test.ts
packages/ai/tests/media-prompt.test.ts
testing/e2e/src/lib/feature-support.ts

✅ Files skipped from review due to trivial changes (2)

docs/adapters/grok.md
docs/media/video-generation.md

🚧 Files skipped from review as they are similar to previous changes (7)

packages/ai-openrouter/src/adapters/image.ts
packages/ai-grok/src/image/image-provider-options.ts
packages/ai-grok/tests/grok-adapter.test.ts
packages/ai-openai/src/adapters/image.ts
packages/ai-fal/src/image/image-inputs.ts
packages/ai-grok/src/adapters/image.ts
packages/ai-gemini/src/adapters/image.ts

coderabbitai

🧹 Nitpick comments (1)

packages/ai-openai/tests/video-adapter.test.ts (1)

81-112: ⚡ Quick win

Split combined rejection test into separate cases for clarity.

This single test case validates rejection of both video and audio parts. While the coverage is correct, combining two distinct validation scenarios in one it() block reduces test clarity and makes failure diagnostics less precise. If either rejection path fails, the error message will be less informative about which modality caused the failure.

♻️ Proposed refactor to separate test cases

-  it('rejects video and audio prompt parts', async () => {
+  it('rejects video prompt parts', async () => {
     const { adapter, mockCreate } = mockedAdapter()

     await expect(
       adapter.createVideoJob({
         model: 'sora-2',
         prompt: [
           { type: 'text', content: 'x' },
           {
             type: 'video',
             source: { type: 'url', value: 'https://example.com/v.mp4' },
           },
         ],
         logger: testLogger,
       }),
     ).rejects.toThrow(/video prompt parts/)
+    expect(mockCreate).not.toHaveBeenCalled()
+  })

+  it('rejects audio prompt parts', async () => {
+    const { adapter, mockCreate } = mockedAdapter()

     await expect(
       adapter.createVideoJob({
         model: 'sora-2',
         prompt: [
           { type: 'text', content: 'x' },
           {
             type: 'audio',
             source: { type: 'url', value: 'https://example.com/a.mp3' },
           },
         ],
         logger: testLogger,
       }),
     ).rejects.toThrow(/audio prompt parts/)
     expect(mockCreate).not.toHaveBeenCalled()
   })

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/ai-openai/tests/video-adapter.test.ts` around lines 81 - 112, Split
the combined rejection test into two separate tests to improve clarity: create
one it() block that uses mockedAdapter() and testLogger to call
adapter.createVideoJob with a prompt containing a video part and asserts
rejects.toThrow(/video prompt parts/) and that mockCreate was not called, and a
second it() block that does the same for a prompt containing an audio part
asserting rejects.toThrow(/audio prompt parts/) and mockCreate not called; keep
using mockedAdapter(), adapter.createVideoJob, mockCreate, and testLogger to
locate and reuse the existing mocks and assertions.

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 1a45a71f-c5d6-4a1d-a238-42e1ecbdd798

📥 Commits

Reviewing files that changed from the base of the PR and between 2e47d5e and 36d7683.

📒 Files selected for processing (12)

.changeset/image-and-video-inputs.md
docs/media/image-generation.md
docs/media/video-generation.md
packages/ai-fal/src/image/image-inputs.ts
packages/ai-fal/tests/image-inputs.test.ts
packages/ai-fal/tests/video-adapter.test.ts
packages/ai-gemini/tests/image-adapter.test.ts
packages/ai-openai/src/adapters/image.ts
packages/ai-openai/src/adapters/video.ts
packages/ai-openai/tests/image-adapter.test.ts
packages/ai-openai/tests/video-adapter.test.ts
packages/ai/skills/ai-core/media-generation/SKILL.md

✅ Files skipped from review due to trivial changes (2)

docs/media/video-generation.md
.changeset/image-and-video-inputs.md

🚧 Files skipped from review as they are similar to previous changes (6)

packages/ai/skills/ai-core/media-generation/SKILL.md
packages/ai-openai/tests/image-adapter.test.ts
packages/ai-fal/tests/image-inputs.test.ts
packages/ai-openai/src/adapters/video.ts
packages/ai-openai/src/adapters/image.ts
packages/ai-fal/src/image/image-inputs.ts

coderabbitai

Actionable comments posted: 3

🧹 Nitpick comments (1)

examples/ts-react-media/src/components/ImageGenerator.tsx (1)

9-10: 💤 Low value

Fix import order per ESLint rules.

The ESLint import/order rule requires the type-only import from @/lib/media to come before the value import from the same module. This is a formatting issue flagged by static analysis.

♻️ Suggested reordering

 import { generateImageFn } from '@/lib/server-functions'
 import { getRandomImagePrompt } from '@/lib/prompts'
 import { IMAGE_MODELS } from '@/lib/models'
-import { readImageFile, toImagePart } from '@/lib/media'
 import type { AttachedImage } from '@/lib/media'
+import { readImageFile, toImagePart } from '@/lib/media'

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@examples/ts-react-media/src/components/ImageGenerator.tsx` around lines 9 -
10, Reorder the imports so the type-only import AttachedImage is placed before
the value imports readImageFile and toImagePart to satisfy the ESLint
import/order rule; update the import lines in ImageGenerator.tsx so the
`import type { AttachedImage } from '@/lib/media'` statement appears above
`import { readImageFile, toImagePart } from '@/lib/media'`.

Source: Linters/SAST tools

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@examples/ts-react-media/src/components/ImageGenerator.tsx`:
- Around line 194-241: Update the reference-images note in the ImageGenerator
component: replace the incorrect "Supported by Gemini native (NanoBanana) models
only" text with a correct statement that image inputs are supported only by the
Gemini image-preview models (e.g., "Supported by Gemini native image-preview
models only: gemini-3.1-flash-image-preview, gemini-3-pro-image-preview"). Do
not reference NanoBanana (fal-ai/nano-banana-pro) since server-functions.ts uses
asTextPrompt for that model (see asTextPrompt and asImagePrompt usages) and
NanoBanana does not accept image prompts.

In `@testing/e2e/src/components/VideoGenUI.tsx`:
- Around line 21-35: The fileToBase64 function currently resolves an empty
string when reader.result.split(',')[1] is undefined; update fileToBase64 to
validate the Data URL format and reject with a clear Error instead of returning
an empty string. Specifically, in fileToBase64 (where reader.onload handles
result), check that result is a string and that result.includes(',') and that
split(',')[1] is a non-empty string; if not, call reject(new Error('Invalid data
URL format')) (or similar) so callers receive an explicit error rather than an
empty value.
- Around line 68-83: In handleGenerate, add validation before calling
fileToBase64: verify imageFile exists and that its MIME type is an image (e.g.,
imageFile.type.startsWith('image/') or fallback to checking the file extension)
and enforce a max size threshold (e.g., 5MB) to avoid large base64 payloads; if
the file fails validation, early-return and surface a user-friendly
error/notification instead of proceeding. Perform these checks before the base64
conversion and only call generate with the image block (and include
imageFile.type as mimeType) when validations pass; keep the validation logic
close to the existing handleGenerate, referencing imageFile, fileToBase64, and
generate.
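The two VideoGenUI suggestions above can be sketched as pure helpers that are easy to unit-test outside the FileReader callback. This is a minimal sketch under assumptions: the names `extractBase64Payload`, `validateImageFile`, and `ImageFileLike`, the accepted extensions, and the error messages are hypothetical; only the 5MB threshold and the checks themselves come from the review.

```typescript
// Hypothetical helpers for VideoGenUI; names, extensions, and messages are illustrative.
const MAX_IMAGE_BYTES = 5 * 1024 * 1024 // the 5MB threshold suggested in the review

// FileReader.readAsDataURL yields "data:<mime>;base64,<payload>".
// Reject malformed results instead of resolving an empty string.
function extractBase64Payload(result: unknown): string {
  if (typeof result !== 'string' || !result.startsWith('data:')) {
    throw new Error('Invalid data URL format')
  }
  const commaIndex = result.indexOf(',')
  const payload = commaIndex === -1 ? '' : result.slice(commaIndex + 1)
  if (payload === '') {
    throw new Error('Invalid data URL format')
  }
  return payload
}

// The subset of File that validation needs; a browser File satisfies it.
interface ImageFileLike {
  name: string
  type: string
  size: number
}

// Returns a user-facing error message, or null when the file may be sent.
function validateImageFile(file: ImageFileLike | null | undefined): string | null {
  if (!file) return 'Choose an image file first.'
  const isImage =
    file.type.startsWith('image/') || /\.(png|jpe?g|gif|webp)$/i.test(file.name)
  if (!isImage) return `"${file.name}" is not an image.`
  if (file.size > MAX_IMAGE_BYTES) return `"${file.name}" is larger than 5MB.`
  return null
}
```

Inside `reader.onload`, `fileToBase64` could wrap `extractBase64Payload(reader.result)` in a try/catch that forwards the error to `reject`, and `handleGenerate` could return early with the message from `validateImageFile` before any base64 conversion happens.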

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 65b1dca2-a819-4d1e-9ae0-2c9144267443

📥 Commits

Reviewing files that changed from the base of the PR and between 36d7683 and 349eb0d.

📒 Files selected for processing (26)

.changeset/image-and-video-inputs.md
docs/media/video-generation.md
examples/ts-react-media/src/components/ImageGenerator.tsx
examples/ts-react-media/src/components/VideoGenerator.tsx
examples/ts-react-media/src/lib/media.ts
examples/ts-react-media/src/lib/server-functions.ts
packages/ai-client/src/generation-types.ts
packages/ai-client/tests/video-generation-client.test.ts
packages/ai-fal/src/model-meta.ts
packages/ai-fal/tests/video-adapter.test.ts
packages/ai/src/client.ts
testing/e2e/README.md
testing/e2e/fixtures/image-to-image/basic.json
testing/e2e/global-setup.ts
testing/e2e/src/components/ImageGenUI.tsx
testing/e2e/src/components/VideoGenUI.tsx
testing/e2e/src/lib/feature-support.ts
testing/e2e/src/lib/features.ts
testing/e2e/src/lib/server-functions.ts
testing/e2e/src/routes/$provider/$feature.tsx
testing/e2e/src/routes/api.image.stream.ts
testing/e2e/src/routes/api.image.ts
testing/e2e/src/routes/api.video.stream.ts
testing/e2e/src/routes/api.video.ts
testing/e2e/tests/image-to-image.spec.ts
testing/e2e/tests/image-to-video.spec.ts

✅ Files skipped from review due to trivial changes (6)

testing/e2e/README.md
testing/e2e/src/routes/api.image.stream.ts
testing/e2e/fixtures/image-to-image/basic.json
packages/ai-client/tests/video-generation-client.test.ts
testing/e2e/src/routes/api.video.stream.ts
.changeset/image-and-video-inputs.md

🚧 Files skipped from review as they are similar to previous changes (2)

packages/ai-fal/tests/video-adapter.test.ts
docs/media/video-generation.md

…e activity follow-ups

Closes #707.

- Add openRouterVideo: async jobs adapter for OpenRouter's dedicated video API (submit -> poll -> download). Per-model size/duration/option types are generated from GET /api/v1/videos/models; frame roles map onto frame_images[] / input_references[] per the MediaInputRole taxonomy.
- Teach the model-meta sync scripts the videos/models endpoint (openrouter.video-models.json + OPENROUTER_VIDEO_MODEL_META).
- Image adapter follow-ups from the #624 review: throw on unmapped sizes (the size union used a Unicode multiplication sign, so every non-square size silently dropped its aspect ratio), throw on numberOfImages > 1 (live-verified: the gateway ignores all count keys), expose image_config.strength.
- Completed videos are returned as data: URLs (unsigned_urls return 401 without the API key header) with gateway-reported cost on usage.cost. The SDK's getVideoContent is bypassed: its matcher only accepts application/octet-stream, while the endpoint serves video/mp4.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
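The size bug described in this commit is easy to reproduce: a lookup keyed on `'1536×1024'` (U+00D7 MULTIPLICATION SIGN) never matches a caller's ASCII `'1536x1024'`, so a silent fallback drops the aspect ratio. A minimal sketch of the throw-on-unmapped and single-image guards follows; the table contents and the names `toAspectRatio` / `assertSingleImage` are illustrative assumptions, not the adapter's actual map or helpers.

```typescript
// Illustrative size table (hypothetical); keys use an ASCII "x", not U+00D7 "×".
const SIZE_TO_ASPECT_RATIO: Readonly<Record<string, string>> = {
  '1024x1024': '1:1',
  '1536x1024': '3:2',
  '1024x1536': '2:3',
}

function toAspectRatio(size: string): string {
  const ratio = SIZE_TO_ASPECT_RATIO[size]
  if (ratio === undefined) {
    // Failing loudly beats silently sending the request without an aspect ratio.
    throw new Error(`Unsupported size "${size}" for this image model`)
  }
  return ratio
}

// The gateway ignores count keys, so asking for more than one image is an error.
function assertSingleImage(numberOfImages: number | undefined): void {
  if (numberOfImages !== undefined && numberOfImages > 1) {
    throw new Error('numberOfImages > 1 is not supported by this adapter')
  }
}
```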

coderabbitai

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

packages/ai-fal/src/adapters/image.ts (1)

113-117: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Strip prompt out of modelOptions before the spread.

Line 113 lets modelOptions.prompt survive whenever resolved.text is empty, because Line 116 only reasserts prompt conditionally. That reopens a second text-prompt path for media-only requests and breaks the “top-level prompt is the multimodal surface / call-controlled fields win” contract.

Proposed fix

+    const { prompt: _prompt, num_images: _numImages, ...modelOptions } =
+      options.modelOptions ?? {}
     const input = {
       ...sizeParams,
       ...inputFields,
-      ...options.modelOptions,
+      ...modelOptions,
       // Media-only prompts (e.g. upscalers, background removal) omit the
       // prompt field entirely rather than sending an empty string.
       ...(resolved.text ? { prompt: resolved.text } : {}),
       num_images: options.numberOfImages,
     } as FalModelInput<TModel>
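The effect of the proposed destructure can be checked with a standalone sketch. `buildFalInput` and `FalModelOptionsSketch` are hypothetical simplifications of the adapter code (the real payload also spreads `sizeParams` and `inputFields` and is typed as `FalModelInput<TModel>`); the point is only that a `prompt` or `num_images` inside `modelOptions` can no longer leak into a media-only request.

```typescript
// Hypothetical, simplified stand-in for the fal image adapter's payload assembly.
type FalModelOptionsSketch = Record<string, unknown> & {
  prompt?: string
  num_images?: number
}

function buildFalInput(
  resolvedText: string,
  numberOfImages: number | undefined,
  rawModelOptions: FalModelOptionsSketch | undefined,
): Record<string, unknown> {
  // Drop the call-controlled fields so the top-level prompt stays the only text surface.
  const { prompt: _prompt, num_images: _numImages, ...modelOptions }: FalModelOptionsSketch =
    rawModelOptions ?? {}
  return {
    ...modelOptions,
    // Media-only prompts omit the prompt field entirely.
    ...(resolvedText ? { prompt: resolvedText } : {}),
    num_images: numberOfImages,
  }
}
```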

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/ai-fal/src/adapters/image.ts` around lines 113 - 117, The spread of
options.modelOptions currently allows modelOptions.prompt to leak through when
resolved.text is empty; remove prompt from modelOptions before spreading so only
the conditional top-level prompt (based on resolved.text) is used. Locate the
creation of the request payload where options.modelOptions is spread (refer to
options.modelOptions and resolved.text in adapters/image.ts) and replace the
spread with a prompt-stripped object (e.g., destructure to drop prompt from
options.modelOptions or shallow-copy and delete prompt) then spread that cleaned
object, preserving num_images and the conditional ...(resolved.text ? { prompt:
resolved.text } : {}) behavior.

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: d43d1dcb-bf3a-4293-b67b-ff94ad4d392b

📥 Commits

Reviewing files that changed from the base of the PR and between 349eb0d and 0c65cc7.

📒 Files selected for processing (6)

examples/ts-react-media/src/components/ImageGenerator.tsx
packages/ai-fal/src/adapters/image.ts
packages/ai-fal/src/adapters/video.ts
packages/ai-openai/src/adapters/video.ts
scripts/generate-fal-image-field-map.ts
testing/e2e/src/components/VideoGenUI.tsx

🚧 Files skipped from review as they are similar to previous changes (4)

testing/e2e/src/components/VideoGenUI.tsx
packages/ai-openai/src/adapters/video.ts
examples/ts-react-media/src/components/ImageGenerator.tsx
packages/ai-fal/src/adapters/video.ts

…tioned generation (closes #618)

Adds optional `imageInputs`, `videoInputs`, and `audioInputs` to `generateImage()` and `generateVideo()` for image-to-image, multi-reference, mask / inpaint, image-to-video, and starting-frame flows. Each input part may carry a `metadata.role` hint (`'reference' | 'mask' | 'control' | 'start_frame' | 'end_frame' | 'character'`) that adapters use to route to the provider-specific field.

Provider behavior:

- OpenAI image: gpt-image-1 / -mini route to `images.edit()` (up to 16 + mask); dall-e-2 routes to `images.edit()` with one source; dall-e-3 throws.
- OpenAI video: Sora-2 / -pro accept a single `input_reference`; throws on >1.
- Gemini: native models receive inputs as multimodal `contents` parts; Imagen throws (text-only).
- fal: 1 input → `image_url`, >1 → `image_urls`; metadata roles map to `mask_url` / `control_image_url` / `reference_image_urls`; video adds `start_image_url` / `end_image_url`. Interim mapping until the fal schemas library lands.
- Grok, OpenRouter: throw with a link back to #618 (pending native Imagine API rewrite and multimodal injection work respectively).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
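The interim fal mapping in this commit can be sketched as a small router. This is a hypothetical reconstruction from the commit message only: the `FalInputPartSketch` shape, the function name, and the choice to treat untagged parts and the `'character'` role as plain sources are assumptions, and a later commit replaced this heuristic with a generated per-endpoint field map.

```typescript
type InputRoleSketch =
  | 'reference'
  | 'mask'
  | 'control'
  | 'start_frame'
  | 'end_frame'
  | 'character'

// Hypothetical input shape: a URL plus an optional role hint.
interface FalInputPartSketch {
  url: string
  role?: InputRoleSketch
}

type FalFieldsSketch = Record<string, string | Array<string>>

function routeFalImageInputs(
  parts: ReadonlyArray<FalInputPartSketch>,
  isVideo: boolean,
): FalFieldsSketch {
  const fields: FalFieldsSketch = {}
  const sources: Array<string> = []
  const references: Array<string> = []
  for (const { url, role } of parts) {
    if (role === 'mask') fields['mask_url'] = url
    else if (role === 'control') fields['control_image_url'] = url
    else if (role === 'reference') references.push(url)
    else if (isVideo && role === 'start_frame') fields['start_image_url'] = url
    else if (isVideo && role === 'end_frame') fields['end_image_url'] = url
    // Assumption: untagged parts (and any other role) are plain sources.
    else sources.push(url)
  }
  // One source goes to the scalar field, several go to the list field.
  const [firstSource] = sources
  if (sources.length > 1) fields['image_urls'] = sources
  else if (firstSource !== undefined) fields['image_url'] = firstSource
  if (references.length > 0) fields['reference_image_urls'] = references
  return fields
}
```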

…SDK type map

Replace the fal image-input field heuristic with a per-endpoint mapping generated from @fal-ai/client's EndpointTypeMap (scripts/generate-fal-image-field-map.ts, run via pnpm generate:fal-image-fields). The committed artifact stores only the 362 endpoints whose field names deviate from the defaults (e.g. nano-banana edit -> image_urls, Kling i2v start frame -> image_url, Veo first-last-frame -> first_frame_url / last_frame_url, Fooocus masks -> mask_image_url); the old heuristic remains the fallback for endpoints newer than the installed SDK.

Safety rails: the generated file `satisfies`-checks every field name against the SDK endpoint types (type-only, erased at runtime), and a unit test hashes the installed endpoints.d.ts against the recorded hash so an SDK bump without regeneration fails test:lib with the regen command.

Mappers are now typed: both return FalImageInputFields<TModel>, Pick'ed from the endpoint's real input type via a generated field-name union. Roles resolving to the same list field merge (source + reference on nano-banana); colliding scalar fields throw instead of overwriting.

Also fixes the remaining CI lint failures: duplicate @tanstack/ai import and non-null assertion in ai-fal video.ts, switch-exhaustiveness errors in image-inputs.ts (restructured away), and the non-null assertion in ai-openai image.ts.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
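The merge rule described in this commit can be illustrated with a small helper: when two roles resolve to the same list field their URLs are concatenated in order, while a second write to a scalar field throws rather than silently replacing the first. The `setFalField` name and the `_urls` suffix test for list fields are assumptions for illustration; per the commit, the real mappers consult the generated per-endpoint field map.

```typescript
type FieldValueSketch = string | Array<string>
type FieldsSketch = Record<string, FieldValueSketch>

// Assumption: list-valued fal fields end in "_urls" (e.g. image_urls).
const isListField = (name: string): boolean => name.endsWith('_urls')

function setFalField(fields: FieldsSketch, name: string, urls: ReadonlyArray<string>): void {
  const existing = fields[name]
  if (isListField(name)) {
    // Roles that resolve to the same list field merge in order.
    const prior = Array.isArray(existing) ? existing : []
    fields[name] = [...prior, ...urls]
    return
  }
  if (existing !== undefined) {
    throw new Error(`Two inputs resolve to scalar field "${name}"`)
  }
  const [only, ...rest] = urls
  if (only === undefined || rest.length > 0) {
    throw new Error(`Scalar field "${name}" takes exactly one URL`)
  }
  fields[name] = only
}
```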

…d generation

Grok: add the xAI Imagine API image models (grok-imagine-image, grok-imagine-image-quality) to model-meta. With imageInputs they route to xAI's JSON POST /v1/images/edits endpoint via direct fetch (the OpenAI SDK's images.edit() sends multipart/form-data, which xAI rejects): a single input as image:{url}, 2-3 inputs as images:[...] referenceable in the prompt as <IMAGE_0>/<IMAGE_1>; >3 inputs and mask/control roles throw. Their generic `size` uses an aspectRatio_resolution template ('16:9_2k', suffix optional), mirroring Gemini's native image models, and maps to the Imagine aspect_ratio/resolution parameters on both the generate and edit paths. grok-2-image-1212 stays text-to-image only with a clear error.

OpenRouter: imageInputs are injected as multimodal image_url content parts alongside the prompt in the chat-completions message and forwarded to the underlying image model.

Neither path fetches or base64-encodes URL sources in-process; URLs pass through verbatim and are fetched by the provider, and data sources become data URIs.

Bumps ai-grok and ai-openrouter to minor in the existing changeset.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
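The size template and edit-body shape described in this commit can be sketched as two pure functions. This is an illustration under assumptions: `parseGrokSize` and `buildGrokEditImages` are hypothetical names, the `images` entries are modeled as `{ url }` objects by analogy with the `image:{url}` wording, and the request field names are taken from this commit message rather than from xAI's API reference.

```typescript
interface GrokSizeParamsSketch {
  aspect_ratio: string
  resolution?: string
}

// Parse the generic size template "<w>:<h>[_<resolution>]", e.g. "16:9_2k" or "1:1".
function parseGrokSize(size: string): GrokSizeParamsSketch {
  const match = /^(\d+:\d+)(?:_([a-z0-9]+))?$/i.exec(size)
  if (match === null) {
    throw new Error(`Expected "<w>:<h>" or "<w>:<h>_<resolution>", got "${size}"`)
  }
  const aspectRatio = match[1] ?? ''
  const resolution = match[2] // undefined when the suffix is omitted
  return resolution ? { aspect_ratio: aspectRatio, resolution } : { aspect_ratio: aspectRatio }
}

type GrokEditImagesSketch =
  | { image: { url: string } }
  | { images: Array<{ url: string }> }

// One input becomes `image`, two or three become `images`; more than three throws.
function buildGrokEditImages(urls: ReadonlyArray<string>): GrokEditImagesSketch {
  if (urls.length === 0) throw new Error('At least one input image is required')
  if (urls.length > 3) throw new Error('Grok image edits accept at most 3 input images')
  const entries = urls.map((url) => ({ url }))
  const [first] = entries
  return entries.length === 1 && first !== undefined ? { image: first } : { images: entries }
}
```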

… API drift

- Move the generated fal image-field map and the generator's paths from packages/typescript/ai-fal to packages/ai-fal (repo flattened the layout)
- Add gpt-image-2 to EDIT_MAX_IMAGES (new model on main; same 16-image edit limit as the other gpt-image models)
- Map edit-path usage through buildImagesUsage to match the new TokenUsage shape, and drop two now-unnecessary type assertions

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…s text through verbatim

Replace the imageInputs / videoInputs / audioInputs fields with a multimodal prompt: string | MediaPromptPart[]. Part order is meaningful: natively multimodal providers (Gemini, OpenRouter) receive parts in interleaved order; named-field providers (OpenAI, fal, xAI) extract media parts via the new resolveMediaPrompt() utility and flatten the text.

Zero magic: prompt text is always sent verbatim. The SDK never injects or rewrites in-prompt referencing markers; users write each provider's own convention (fal Kling/Seedance @image1, OpenAI/FLUX.2 "image 1" prose, Gemini content descriptions), now documented per provider in the media docs. An earlier grok <IMAGE_n> auto-injection was removed after research showed the convention is absent from xAI's official docs (images are addressed by request order).

- Per-model compile-time prompt narrowing via TModelInputModalitiesByName adapter generic (e.g. dall-e-3 / Imagen reject image parts as a type error); fal modality maps are derived at the type level from the SDK's endpoint input types
- metadata.tag added as an informational label (never read by adapters)
- Gemini now preserves true interleaving in contents; OpenRouter maps parts 1:1 onto chat content parts in order

Closes #618

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
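The downrev step for named-field providers can be pictured as a single pass over the prompt. The sketch below is hypothetical: the part types are simplified from the PR summary (only URL sources are modeled), `resolveMediaPromptSketch` and `partsWithRole` are illustrative names, joining text parts with a single space is an assumption, and the real `resolveMediaPrompt()` may differ in signature and output shape.

```typescript
type MediaRoleSketch = 'reference' | 'mask' | 'control' | 'start_frame' | 'end_frame' | 'character'

// Simplified, hypothetical part shapes (only URL sources are modeled).
interface UrlSourceSketch { type: 'url'; value: string }
interface TextPartSketch { type: 'text'; content: string }
interface MediaPartSketch {
  type: 'image' | 'video' | 'audio'
  source: UrlSourceSketch
  metadata?: { role?: MediaRoleSketch; tag?: string }
}
type PromptSketch = string | ReadonlyArray<TextPartSketch | MediaPartSketch>

interface ResolvedPromptSketch {
  text: string
  images: Array<MediaPartSketch>
  videos: Array<MediaPartSketch>
  audios: Array<MediaPartSketch>
}

function resolveMediaPromptSketch(prompt: PromptSketch): ResolvedPromptSketch {
  const resolved: ResolvedPromptSketch = { text: '', images: [], videos: [], audios: [] }
  if (typeof prompt === 'string') {
    resolved.text = prompt
    return resolved
  }
  const texts: Array<string> = []
  for (const part of prompt) {
    switch (part.type) {
      case 'text':
        texts.push(part.content) // verbatim: no referencing markers are injected
        break
      case 'image':
        resolved.images.push(part)
        break
      case 'video':
        resolved.videos.push(part)
        break
      case 'audio':
        resolved.audios.push(part)
        break
    }
  }
  resolved.text = texts.join(' ') // assumption: single-space join; '' for media-only prompts
  return resolved
}

// Pick the parts a named-field adapter would route to one role-specific field.
function partsWithRole(parts: ReadonlyArray<MediaPartSketch>, role: MediaRoleSketch): Array<MediaPartSketch> {
  return parts.filter((p) => p.metadata?.role === role)
}
```

Flattening necessarily discards the interleaving, which is why natively multimodal providers (Gemini, OpenRouter) receive the parts in order instead of going through this path.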

- openai: add gpt-image-2 to the editImages error message and JSDoc (the model is edit-capable via EDIT_MAX_IMAGES but was omitted from user-facing guidance); same fix in docs, SKILL.md, and the changeset
- openai: throw when the images.edit() response contains no usable images (matching grok's guard) instead of resolving to { images: [] }
- openai: drop the unnecessary input_reference cast in the Sora adapter; the SDK types the field, so assign directly
- fal: reject metadata.role 'mask'/'control' in the video mapper instead of silently folding them into source frames
- docs: mark Veo role mappings as planned (no Veo adapter yet), note the Gemini ~14-image limit is provider-side, bump samples to gpt-image-2
- tests: cover the Gemini image-conditioned path (interleaved contents, fileData vs inlineData vs fetch+inline, Imagen/video/audio rejection), the Sora input_reference upload and guards (new file), the fal video createVideoJob field assembly and audio guard, and the openai empty-edit-response guard

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Same defect class as the editImages guard in the previous commit: the text-to-image path silently resolved to { images: [] } when response items had neither b64_json nor url. Surface it as an error instead. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
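The guard added in these two commits amounts to collecting usable images and refusing to return an empty list. A minimal sketch, assuming response items shaped like the OpenAI Images API's `data` entries (`b64_json` / `url`); the `collectImages` function and `GeneratedImageSketch` type are hypothetical, not the adapter's own types.

```typescript
// Subset of an OpenAI Images API response item that this guard inspects.
interface ImagesResponseItemSketch {
  b64_json?: string | null
  url?: string | null
}

type GeneratedImageSketch = { b64Json: string } | { url: string }

function collectImages(
  data: ReadonlyArray<ImagesResponseItemSketch> | undefined,
  operation: 'generate' | 'edit',
): Array<GeneratedImageSketch> {
  const images: Array<GeneratedImageSketch> = []
  for (const item of data ?? []) {
    if (item.b64_json) images.push({ b64Json: item.b64_json })
    else if (item.url) images.push({ url: item.url })
  }
  if (images.length === 0) {
    // Previously this case resolved to { images: [] }; surface it as an error instead.
    throw new Error(`OpenAI images.${operation}() returned no usable images`)
  }
  return images
}
```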

…l field demotion

- ai-client: widen ImageGenerateInput.prompt / VideoGenerateInput.prompt from string to MediaPrompt so useGenerateImage/useGenerateVideo can carry image parts from the browser; re-export the MediaPrompt types from @tanstack/ai/client
- ai-fal: demote media-conditioning fields (FalImageFieldName set plus video_url/video_urls/reference_video_urls/audio_url) from required to optional in FalImageProviderOptions / FalVideoProviderOptions. i2v endpoints declare e.g. image_url as required, but with a multimodal prompt the start frame arrives as a prompt part; modelOptions stays available as the explicit escape hatch
- e2e: real coverage for image-to-image (OpenAI /v1/images/edits) and image-to-video (Sora multipart /v1/videos with input_reference). The installed aimock 1.29 mocks both multipart endpoints, so the previous "aimock can't mock this" empty provider sets were stale. New specs run all three transports and assert via aimock's request journal that the expected wire endpoint was hit. ImageGenUI/VideoGenUI gain a file input, feature routing/fixtures/onVideo registration added, README matrix updated
- examples/ts-react-media: ImageGenerator gains a multi-image reference picker (Gemini native models); VideoGenerator sends the start frame as a prompt part with role 'start_frame' instead of modelOptions URLs; server functions narrow the wire prompt per model and throw on unsupported part kinds instead of dropping them

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
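The required-to-optional demotion can be expressed as a small mapped type. This is a type-level sketch only: `DemoteToOptional`, the field subset in `MediaConditioningFieldSketch`, and the `ExampleI2VInput` shape are illustrative stand-ins for the generated fal types, not the package's definitions.

```typescript
// Make the listed keys optional while leaving every other field untouched.
type DemoteToOptional<T, K extends PropertyKey> = Omit<T, Extract<keyof T, K>> &
  Partial<Pick<T, Extract<keyof T, K>>>

// Illustrative subset of the media-conditioning field names.
type MediaConditioningFieldSketch = 'image_url' | 'image_urls' | 'video_url' | 'audio_url'

// Hypothetical i2v endpoint input that declares image_url as required.
interface ExampleI2VInput {
  prompt: string
  image_url: string
  duration?: '5' | '10'
}

type ExampleI2VOptions = DemoteToOptional<ExampleI2VInput, MediaConditioningFieldSketch>

// Compiles: the start frame can arrive as a prompt part instead of image_url.
const withoutImage: ExampleI2VOptions = { prompt: 'a cat', duration: '5' }
// Still allowed as an explicit modelOptions override.
const withImage: ExampleI2VOptions = { prompt: 'a cat', image_url: 'https://example.com/frame.png' }
```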

- fal image/video: spread modelOptions after derived media fields so explicit user overrides win (matches documented intent)
- openai video: validate effective size (size ?? modelOptions.size)
- generate-fal-image-field-map: run arity check for default-selected fields too
- ts-react-media example: correct reference-image support comment (Gemini multimodal models, not NanoBanana)
- e2e VideoGenUI: reject on malformed data URL instead of resolving ''

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…n contract (#634)

Restacked on 618-image-to-image-and-image-to-video-support to adopt the multimodal MediaPrompt format, carrying a minimal additive port of the #534 typed-duration contract:

- @tanstack/ai (non-breaking): VideoAdapter/BaseVideoAdapter gain a TModelDurationByName generic (default Record<string, number> preserves existing duration?: number typing), DurationOptions, snapToDurationOption, and default availableDurations()/snapDuration() implementations. generateVideo's duration is typed via VideoDurationForAdapter.
- @tanstack/ai-gemini: GeminiVideoAdapter over generateVideos / getVideosOperation with per-model typed durations (Veo 3.x 4|6|8, Veo 2 5|6|8 per current Veo docs), MediaPrompt image routing (start_frame → image, end_frame → lastFrame, reference/character → referenceImages), RAI filter surfacing, geminiVideo/createGeminiVideo factories, and finalized Veo model-meta entries.
- E2E: gemini added to video-gen with a custom aimock mount for :predictLongRunning + operations polling; all transports pass.
- Docs + media-generation skill updated for Veo (typed durations, image-to-video role table).

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
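Snapping to the nearest allowed option is one plausible reading of `snapToDurationOption`. The sketch below is hypothetical: its name, signature, and tie-breaking toward the longer option are assumptions, not the `@tanstack/ai` implementation. It uses the Veo 3.x option set quoted in this commit.

```typescript
// Snap a requested duration (seconds) to the closest allowed option.
// Assumption: ties go to the longer option.
function snapToNearestDuration(requested: number, options: ReadonlyArray<number>): number {
  if (options.length === 0) throw new Error('No duration options available')
  return options.reduce((best, option) => {
    const distance = Math.abs(option - requested)
    const bestDistance = Math.abs(best - requested)
    return distance < bestDistance || (distance === bestDistance && option > best) ? option : best
  })
}

const VEO_3_DURATIONS = [4, 6, 8] as const
```

For example, a requested 5 seconds lands on 6 and a requested 10 seconds clamps to 8.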

tombeckenham linked an issue May 22, 2026 that may be closed by this pull request

image-to-image and image-to-video support #618

Open

tombeckenham force-pushed the 618-image-to-image-and-image-to-video-support branch from 0740073 to 483a3d4 June 5, 2026 08:10

tombeckenham mentioned this pull request Jun 5, 2026

feat(ai-grok): video generation adapter for the grok-imagine video models #705

Open

tombeckenham marked this pull request as ready for review June 5, 2026 08:25

tombeckenham requested a review from a team as a code owner June 5, 2026 08:25

tombeckenham requested a review from AlemTuzlak June 5, 2026 08:27

coderabbitai Bot reviewed Jun 5, 2026

tombeckenham mentioned this pull request Jun 5, 2026

feat(ai-openrouter): video generation adapter (/api/v1/videos) + image activity follow-ups #707

Open

7 tasks

tombeckenham changed the title ~~feat(ai): add imageInputs / videoInputs / audioInputs for image-conditioned generation~~ feat: multimodal prompt for generateImage/generateVideo (image-to-image, image-to-video) Jun 7, 2026

coderabbitai Bot reviewed Jun 7, 2026

Comment thread packages/ai-fal/src/adapters/video.ts

Comment thread packages/ai-openai/src/adapters/video.ts

coderabbitai Bot reviewed Jun 7, 2026

Comment thread examples/ts-react-media/src/components/ImageGenerator.tsx

Comment thread testing/e2e/src/components/VideoGenUI.tsx

Comment thread testing/e2e/src/components/VideoGenUI.tsx

tombeckenham removed the request for review from AlemTuzlak June 7, 2026 11:48

This was referenced Jun 10, 2026

feat(ai-openrouter): video generation adapter (/api/v1/videos) + image activity follow-ups #740

Draft

feat(ai-grok): video generation adapter for the grok-imagine video models #742

Open

coderabbitai Bot reviewed Jun 10, 2026

tombeckenham and others added 9 commits June 11, 2026 10:13

ci: apply automated fixes

48d0f62

tombeckenham force-pushed the 618-image-to-image-and-image-to-video-support branch from 0c65cc7 to acd7319 June 11, 2026 00:18

tombeckenham mentioned this pull request Jun 11, 2026

feat(ai-gemini): add Google Veo video adapter on the typed-duration contract #741

Merged

4 tasks

tombeckenham force-pushed the 618-image-to-image-and-image-to-video-support branch from 1b52533 to acd7319 June 11, 2026 02:26

tombeckenham mentioned this pull request Jun 11, 2026

feat(ai,ai-gemini): add Google Veo video adapter on the typed-duration contract #746

Merged

4 tasks

Conversation

tombeckenham commented May 22, 2026 • edited

Summary

Design

Provider mapping

Testing

Summary by CodeRabbit

coderabbitai Bot commented May 22, 2026 • edited

Reviews paused

Walkthrough

Changes

github-actions Bot commented May 22, 2026 • edited

🚀 Changeset Version Preview

🟥 Major bumps

🟨 Minor bumps

🟩 Patch bumps

nx-cloud Bot commented May 22, 2026 • edited

pkg-pr-new Bot commented May 22, 2026 • edited

coderabbitai Bot left a comment

coderabbitai Bot left a comment

coderabbitai Bot left a comment

coderabbitai Bot left a comment

coderabbitai Bot left a comment

1 participant