feat: multimodal prompt for generateImage/generateVideo (image-to-image, image-to-video) #624

Open
tombeckenham wants to merge 11 commits into main from 618-image-to-image-and-image-to-video-support

Conversation

@tombeckenham tombeckenham commented May 22, 2026

Contributor

Summary

Closes #618 by making prompt itself the multimodal surface for generateImage() / generateVideo(): a plain string, or an ordered array of content parts (TextPart / ImagePart / VideoPart / AudioPart) for image-to-image, reference-guided, multi-reference, edit/inpaint, and image-to-video flows.

await generateImage({
  adapter: geminiImage('gemini-3.1-flash-image-preview'),
  prompt: [
    { type: 'text', content: 'Not like this' },
    { type: 'image', source: { type: 'url', value: badExampleUrl } },
    { type: 'text', content: 'more like this' },
    { type: 'image', source: { type: 'url', value: goodExampleUrl } },
  ],
})

Design

  • Interleaving is canonical. Part order is meaningful; providers with natively multimodal prompts (Gemini contents, OpenRouter chat content parts) receive parts exactly as written. Named-field providers (OpenAI, fal, xAI) are downconverted through the new resolveMediaPrompt() utility, which yields flattened text plus per-modality part buckets, with metadata.role routing (mask, control, reference, start/end frame).
  • Zero magic. Prompt text is sent verbatim — the SDK never injects or rewrites referencing markers. Users write each provider's own convention (fal Kling/Seedance @Image1, OpenAI/FLUX.2 "image 1" prose, Gemini content descriptions); the per-provider table is documented in docs/media/image-generation.md. A metadata.tag label exists for user bookkeeping only. (An earlier <IMAGE_n> auto-injection for grok was removed after source-verified research showed the convention is absent from xAI's official docs — xAI addresses edit images by request order.)
  • Per-model compile-time safety. A new TModelInputModalitiesByName adapter generic narrows the accepted part types per model — passing an image part to dall-e-3 or Imagen is a type error, with the runtime throw kept as backstop. fal's maps are derived at the type level from the SDK's endpoint input types.
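
As a rough illustration of the "flattened text + per-modality buckets" idea, here is a minimal sketch. It is not the PR's resolveMediaPrompt(): the type shapes are simplified, the function name resolvePromptSketch is hypothetical, and joining text parts with a newline is an assumption made only for this example. The text itself is passed through untouched, in line with the verbatim guarantee.

```typescript
// Simplified, hypothetical part shapes modelled on the PR description.
type UrlSource = { type: 'url'; value: string }
type TextPart = { type: 'text'; content: string }
type MediaPart<K extends 'image' | 'video' | 'audio'> = {
  type: K
  source: UrlSource
  metadata?: { role?: string; tag?: string }
}
type PromptPart =
  | TextPart
  | MediaPart<'image'>
  | MediaPart<'video'>
  | MediaPart<'audio'>
type Prompt = string | ReadonlyArray<PromptPart>

interface ResolvedSketch {
  text: string
  imageInputs: Array<MediaPart<'image'>>
  videoInputs: Array<MediaPart<'video'>>
  audioInputs: Array<MediaPart<'audio'>>
}

// Hypothetical helper: splits an ordered prompt into text plus buckets.
function resolvePromptSketch(prompt: Prompt): ResolvedSketch {
  const resolved: ResolvedSketch = {
    text: '',
    imageInputs: [],
    videoInputs: [],
    audioInputs: [],
  }
  if (typeof prompt === 'string') {
    resolved.text = prompt
    return resolved
  }
  const texts: Array<string> = []
  for (const part of prompt) {
    switch (part.type) {
      case 'text':
        texts.push(part.content) // text is never rewritten
        break
      case 'image':
        resolved.imageInputs.push(part) // order within a bucket is preserved
        break
      case 'video':
        resolved.videoInputs.push(part)
        break
      case 'audio':
        resolved.audioInputs.push(part)
        break
    }
  }
  // Assumption for this sketch only: text parts are joined with a newline.
  resolved.text = texts.join('\n')
  return resolved
}
```

A named-field adapter could then read the text and each bucket separately, while a natively multimodal adapter would skip this step and forward the parts in their original order.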

Provider mapping

| Provider | Pathway |
| --- | --- |
| Gemini (native image models) | parts → contents 1:1, interleaving preserved; Imagen text-only |
| OpenAI | edits via images.edit() (mask role → mask); Sora single input_reference |
| Grok | grok-imagine → /v1/images/edits (≤3 images, request order, prompt verbatim) |
| OpenRouter | parts → chat content parts 1:1 in order |
| fal | role + order → generated per-endpoint field map (image_url, mask_url, …) |
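
For the 1:1 pathways, the mapping can be pictured as a plain ordered map over the parts. The sketch below assumes the widely used chat content-part shape ({ type: 'text', text } and { type: 'image_url', image_url: { url } }); the function name toChatContentSketch and the simplified part types are hypothetical and are not taken from the adapter code.

```typescript
// Simplified, hypothetical prompt part shapes.
type Part =
  | { type: 'text'; content: string }
  | { type: 'image'; source: { type: 'url'; value: string } }

// Chat-style content parts (assumed target shape for this sketch).
type ChatContentPart =
  | { type: 'text'; text: string }
  | { type: 'image_url'; image_url: { url: string } }

// Converts each part in place, so the interleaving survives unchanged.
function toChatContentSketch(
  parts: ReadonlyArray<Part>,
): Array<ChatContentPart> {
  return parts.map(
    (part): ChatContentPart =>
      part.type === 'text'
        ? { type: 'text', text: part.content }
        : { type: 'image_url', image_url: { url: part.source.value } },
  )
}
```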

Testing

  • pnpm test:pr green across all 34 projects (sherif, knip, docs, eslint, lib, types, build)
  • New unit tests: resolveMediaPrompt (verbatim guarantee, ordering, buckets), per-model prompt-modality type assertions, adapter tests updated across openai/grok/openrouter/fal
  • E2E suite: 254 passed (image-to-image/image-to-video matrix entries are present but empty pending aimock support for the edit endpoints; adapter mapping covered by unit tests)
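
The per-model prompt-modality type assertions can be pictured with a small type-level sketch. Everything here is hypothetical: the model names, the InputModalitiesByModel map, and countParts are illustrative stand-ins rather than the PR's TModelInputModalitiesByName generic. The idea shown is that a per-model modality map narrows which part types an array prompt may contain.

```typescript
type TextPart = { type: 'text'; content: string }
type ImagePart = { type: 'image'; source: { type: 'url'; value: string } }
type PartByModality = { text: TextPart; image: ImagePart }

// Hypothetical per-model map of accepted input modalities.
type InputModalitiesByModel = {
  'text-only-model': 'text'
  'edit-capable-model': 'text' | 'image'
}
type ModelName = keyof InputModalitiesByModel

// A prompt is a plain string or an array of the parts this model accepts.
type PromptFor<M extends ModelName> =
  | string
  | ReadonlyArray<PartByModality[InputModalitiesByModel[M]]>

// Returns the number of parts; a string prompt counts as one part.
function countParts<M extends ModelName>(model: M, prompt: PromptFor<M>): number {
  return typeof prompt === 'string' ? 1 : prompt.length
}

// Accepted: the edit-capable model takes text and image parts.
countParts('edit-capable-model', [
  { type: 'text', content: 'make it brighter' },
  { type: 'image', source: { type: 'url', value: 'https://example.com/a.png' } },
])

// @ts-expect-error image parts are rejected for the text-only model
countParts('text-only-model', [{ type: 'image', source: { type: 'url', value: 'https://example.com/a.png' } }])
```

The type error only exists at compile time, which is why the description keeps a runtime throw as a backstop for callers that bypass the types.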

🤖 Generated with Claude Code

Summary by CodeRabbit

  • New Features

    • Multimodal prompts for image/video generation (ordered text/image/video/audio parts) with per-model validation and provider routing; client hooks and examples now accept structured media prompts; utility to resolve/flatten media prompts.
  • Documentation

    • Expanded guides, examples, provider support matrices, and role/tag conventions for image-conditioned and image-to-video workflows.
  • Tests & Tooling

    • New unit and E2E tests for multimodal flows; generator/tooling added to maintain provider field mappings.

Also includes the Gemini Veo video adapter on the typed-duration contract (merged in via #746/#741) — closes #634.

@tombeckenham tombeckenham linked an issue May 22, 2026 that may be closed by this pull request

coderabbitai Bot commented May 22, 2026

Contributor

Review Change Stack

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.


Walkthrough

Adds MediaPrompt types and resolver; threads modality-aware prompt types into image/video activities and adapters; implements provider-specific mapping and validation (OpenAI, Gemini, fal.ai, Grok, OpenRouter); updates client, examples, tests, docs, scripts, and e2e for image-to-image and image-to-video.

Changes

Media input conditioning across core and adapters

| Layer | File(s) | Summary |
| --- | --- | --- |
| Core types, resolver and activity wiring | packages/ai/src/types.ts, packages/ai/src/utilities/media-prompt.ts, packages/ai/src/activities/generateImage/index.ts, packages/ai/src/activities/generateVideo/index.ts, packages/ai/src/index.ts, packages/ai-event-client/src/index.ts | Adds MediaPrompt, MediaPromptPart, MediaInputRole, MediaInputMetadata, ResolvedMediaPrompt, resolveMediaPrompt; updates Image/Video activity prompt typings to be model-aware; emits imageInputCount/videoInputCount/audioInputCount; re-exports resolver. |
| FAL field-override generator and runtime mappers | scripts/generate-fal-image-field-map.ts, packages/ai-fal/src/image/image-inputs.ts, packages/ai-fal/src/model-meta.ts, .prettierignore, package.json | Adds a script to introspect @fal-ai/client endpoint types and emit FAL_IMAGE_FIELD_OVERRIDES; runtime mappers convert ImagePart inputs into per-model fal input fields with list/scalar arity checks and role bucketing. |
| FAL adapters: image & video integration and tests | packages/ai-fal/src/adapters/image.ts, packages/ai-fal/src/adapters/video.ts, packages/ai-fal/tests/image-inputs.test.ts, packages/ai-fal/tests/video-adapter.test.ts | Fal adapters resolve media prompts, validate unsupported media parts, map media inputs into fal request fields (images/videos/audios), and add tests covering mapping behavior and generated-override integrity. |
| OpenAI image edit flow and file helper | packages/ai-openai/src/adapters/image.ts, packages/ai-openai/src/image/image-input-to-file.ts, packages/ai-openai/tests/image-adapter.test.ts | Adds imagePartToFile helper; routes image-containing prompts to images.edit() with model edit limits, mask/source validation, and file conversion; tests assert routing and error cases. |
| OpenAI video: input_reference handling and typing | packages/ai-openai/src/adapters/video.ts, packages/ai-openai/src/video/video-provider-options.ts, packages/ai-openai/tests/video-adapter.test.ts | Resolves media prompts for video jobs, rejects audio/video parts, enforces at most one image part mapped to input_reference File, and adds tests and per-model modality typing for Sora models. |
| Grok Imagine models & edits endpoint | packages/ai-grok/src/adapters/image.ts, packages/ai-grok/src/image/image-provider-options.ts, packages/ai-grok/src/model-meta.ts, packages/ai-grok/tests/grok-adapter.test.ts | Adds grok-imagine-* model metadata and size parsing, maps generic size to aspect_ratio/resolution, routes image inputs to /v1/images/edits when supported, validates roles/count (≤3 sources), and tests payload shaping and errors. |
| Gemini multimodal contents | packages/ai-gemini/src/adapters/image.ts, packages/ai-gemini/src/image/image-provider-options.ts, packages/ai-gemini/tests/image-adapter.test.ts | Resolves media prompts, rejects video/audio parts, constructs Gemini multimodal contents with ordered parts converted from ImagePart inputs (inlineData/fileData), and preserves Imagen text-only behavior. |
| OpenRouter multimodal message construction | packages/ai-openrouter/src/adapters/image.ts, packages/ai-openrouter/src/image/image-provider-options.ts, packages/ai-openrouter/tests/image-adapter.test.ts | Resolves media prompts, converts ImagePart sources to image_url or data URIs, builds interleaved chat content array when images present, rejects audio/video parts, and adds tests for content shapes. |
| Client types, hooks, and example apps | packages/ai-client/src/generation-types.ts, examples/ts-react-media/src/**, examples/ts-react-media/src/lib/server-functions.ts | Widens ImageGenerateInput.prompt and VideoGenerateInput.prompt to MediaPrompt; updates example Image/Video generators and server functions to build/validate MediaPrompt arrays (attachments, image-to-video start_frame). |
| Type-safety and resolver tests | packages/ai/tests/media-prompt.test.ts, packages/ai/tests/image-per-model-type-safety.test.ts | Adds Vitest suite for resolveMediaPrompt and TypeScript per-model modality safety tests ensuring compile-time enforcement. |
| Docs, changeset, scripts, e2e flags and tests | docs/media/image-generation.md, docs/media/video-generation.md, docs/adapters/grok.md, packages/ai/skills/ai-core/media-generation/SKILL.md, .changeset/image-and-video-inputs.md, testing/e2e/**, .gitignore | Extensive docs for image-conditioned generation, role hints, provider support matrix; adds generate:fal-image-fields script; updates e2e feature matrix, fixtures, and Playwright specs for image-to-image and image-to-video. |

Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related issues

  • #707 — Overlaps in OpenRouter multimodal prompt handling and adapter message construction.

Suggested reviewers

  • crutchcorn
  • tannerlinsley

"A rabbit sketched a prompt and said,
Parts in order, frames ahead.
Text kept true, images in line,
Adapters map, the outputs shine.
Hop, stitch, and ship the media thread!"

github-actions Bot commented May 22, 2026

Contributor

🚀 Changeset Version Preview

8 package(s) bumped directly, 23 bumped as dependents.

🟥 Major bumps

| Package | Version | Reason |
| --- | --- | --- |
| @tanstack/ai-event-client | 0.5.4 → 1.0.0 | Changeset |
| @tanstack/ai-fal | 0.7.23 → 1.0.0 | Changeset |
| @tanstack/ai-gemini | 0.15.1 → 1.0.0 | Changeset |
| @tanstack/ai-grok | 0.11.2 → 1.0.0 | Changeset |
| @tanstack/ai-openai | 0.14.1 → 1.0.0 | Changeset |
| @tanstack/ai-openrouter | 0.13.1 → 1.0.0 | Changeset |
| @tanstack/ai-anthropic | 0.15.1 → 1.0.0 | Dependent |
| @tanstack/ai-code-mode | 0.2.5 → 1.0.0 | Dependent |
| @tanstack/ai-code-mode-skills | 0.2.5 → 1.0.0 | Dependent |
| @tanstack/ai-elevenlabs | 0.2.20 → 1.0.0 | Dependent |
| @tanstack/ai-groq | 0.4.2 → 1.0.0 | Dependent |
| @tanstack/ai-isolate-node | 0.1.30 → 1.0.0 | Dependent |
| @tanstack/ai-isolate-quickjs | 0.1.30 → 1.0.0 | Dependent |
| @tanstack/ai-ollama | 0.8.1 → 1.0.0 | Dependent |
| @tanstack/ai-preact | 0.9.4 → 1.0.0 | Dependent |
| @tanstack/ai-react | 0.15.4 → 1.0.0 | Dependent |
| @tanstack/ai-react-ui | 0.8.6 → 1.0.0 | Dependent |
| @tanstack/ai-solid | 0.13.4 → 1.0.0 | Dependent |
| @tanstack/ai-solid-ui | 0.7.6 → 1.0.0 | Dependent |
| @tanstack/ai-svelte | 0.13.4 → 1.0.0 | Dependent |
| @tanstack/ai-vue | 0.13.4 → 1.0.0 | Dependent |
| @tanstack/openai-base | 0.8.1 → 1.0.0 | Dependent |

🟨 Minor bumps

| Package | Version | Reason |
| --- | --- | --- |
| @tanstack/ai | 0.28.0 → 0.29.0 | Changeset |
| @tanstack/ai-client | 0.16.3 → 0.17.0 | Changeset |

🟩 Patch bumps

| Package | Version | Reason |
| --- | --- | --- |
| @tanstack/ai-devtools-core | 0.4.8 → 0.4.9 | Dependent |
| @tanstack/ai-isolate-cloudflare | 0.2.21 → 0.2.22 | Dependent |
| @tanstack/ai-mcp | 0.1.0 → 0.1.1 | Dependent |
| @tanstack/ai-vue-ui | 0.2.16 → 0.2.17 | Dependent |
| @tanstack/preact-ai-devtools | 0.1.51 → 0.1.52 | Dependent |
| @tanstack/react-ai-devtools | 0.2.51 → 0.2.52 | Dependent |
| @tanstack/solid-ai-devtools | 0.2.51 → 0.2.52 | Dependent |

nx-cloud Bot commented May 22, 2026


View your CI Pipeline Execution ↗ for commit 1b52533

| Command | Status | Duration | Result |
| --- | --- | --- | --- |
| nx run-many --targets=build --exclude=examples/... | ✅ Succeeded | 1s | View ↗ |

☁️ Nx Cloud last updated this comment at 2026-06-11 02:35:04 UTC

pkg-pr-new Bot commented May 22, 2026

@tanstack/ai

npm i https://pkg.pr.new/@tanstack/ai@624

@tanstack/ai-anthropic

npm i https://pkg.pr.new/@tanstack/ai-anthropic@624

@tanstack/ai-client

npm i https://pkg.pr.new/@tanstack/ai-client@624

@tanstack/ai-code-mode

npm i https://pkg.pr.new/@tanstack/ai-code-mode@624

@tanstack/ai-code-mode-skills

npm i https://pkg.pr.new/@tanstack/ai-code-mode-skills@624

@tanstack/ai-devtools-core

npm i https://pkg.pr.new/@tanstack/ai-devtools-core@624

@tanstack/ai-elevenlabs

npm i https://pkg.pr.new/@tanstack/ai-elevenlabs@624

@tanstack/ai-event-client

npm i https://pkg.pr.new/@tanstack/ai-event-client@624

@tanstack/ai-fal

npm i https://pkg.pr.new/@tanstack/ai-fal@624

@tanstack/ai-gemini

npm i https://pkg.pr.new/@tanstack/ai-gemini@624

@tanstack/ai-grok

npm i https://pkg.pr.new/@tanstack/ai-grok@624

@tanstack/ai-groq

npm i https://pkg.pr.new/@tanstack/ai-groq@624

@tanstack/ai-isolate-cloudflare

npm i https://pkg.pr.new/@tanstack/ai-isolate-cloudflare@624

@tanstack/ai-isolate-node

npm i https://pkg.pr.new/@tanstack/ai-isolate-node@624

@tanstack/ai-isolate-quickjs

npm i https://pkg.pr.new/@tanstack/ai-isolate-quickjs@624

@tanstack/ai-mcp

npm i https://pkg.pr.new/@tanstack/ai-mcp@624

@tanstack/ai-ollama

npm i https://pkg.pr.new/@tanstack/ai-ollama@624

@tanstack/ai-openai

npm i https://pkg.pr.new/@tanstack/ai-openai@624

@tanstack/ai-openrouter

npm i https://pkg.pr.new/@tanstack/ai-openrouter@624

@tanstack/ai-preact

npm i https://pkg.pr.new/@tanstack/ai-preact@624

@tanstack/ai-react

npm i https://pkg.pr.new/@tanstack/ai-react@624

@tanstack/ai-react-ui

npm i https://pkg.pr.new/@tanstack/ai-react-ui@624

@tanstack/ai-solid

npm i https://pkg.pr.new/@tanstack/ai-solid@624

@tanstack/ai-solid-ui

npm i https://pkg.pr.new/@tanstack/ai-solid-ui@624

@tanstack/ai-svelte

npm i https://pkg.pr.new/@tanstack/ai-svelte@624

@tanstack/ai-utils

npm i https://pkg.pr.new/@tanstack/ai-utils@624

@tanstack/ai-vue

npm i https://pkg.pr.new/@tanstack/ai-vue@624

@tanstack/ai-vue-ui

npm i https://pkg.pr.new/@tanstack/ai-vue-ui@624

@tanstack/openai-base

npm i https://pkg.pr.new/@tanstack/openai-base@624

@tanstack/preact-ai-devtools

npm i https://pkg.pr.new/@tanstack/preact-ai-devtools@624

@tanstack/react-ai-devtools

npm i https://pkg.pr.new/@tanstack/react-ai-devtools@624

@tanstack/solid-ai-devtools

npm i https://pkg.pr.new/@tanstack/solid-ai-devtools@624

commit: 1b52533

@tombeckenham tombeckenham force-pushed the 618-image-to-image-and-image-to-video-support branch from 0740073 to 483a3d4 June 5, 2026 08:10
@tombeckenham tombeckenham marked this pull request as ready for review June 5, 2026 08:25
@tombeckenham tombeckenham requested a review from a team as a code owner June 5, 2026 08:25
@tombeckenham tombeckenham requested a review from AlemTuzlak June 5, 2026 08:27

@coderabbitai coderabbitai Bot left a comment

Contributor

Actionable comments posted: 9

🧹 Nitpick comments (1)
packages/ai-fal/tests/image-inputs.test.ts (1)

1-249: ⚡ Quick win

Move this unit test alongside its source module.

This test lives under packages/ai-fal/tests/ instead of next to the mapped source (e.g., near packages/ai-fal/src/image/image-inputs.ts), which diverges from the repo’s unit-test placement rule.

As per coding guidelines, "**/*.test.ts: Place unit tests in *.test.ts files alongside source code".

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/ai-fal/tests/image-inputs.test.ts` around lines 1 - 249, The test
file image-inputs.test.ts needs to be moved to sit next to its source module
image-inputs.ts (the module exporting mapImageInputsToFalFields and
mapImageInputsToFalVideoFields); relocate the test file into the same directory
as image-inputs.ts, update any relative imports (e.g., ../src/image/image-inputs
to ./image-inputs) and the generated constant import if needed, and ensure the
test runner still discovers it (adjust any package test config or export paths
if required).
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@docs/media/image-generation.md`:
- Around line 151-155: Update the example calls to use the provider's current
image model instead of the hardcoded 'gpt-image-1': replace
openaiImage('gpt-image-1') with the latest model string from the OpenAI
adapter's model-meta.ts (use the adapter helper openaiImage(...) with that model
name) for every example instance (e.g., the generateImage call shown and the
other occurrences noted). Locate and edit the generateImage usage and any other
examples referencing openaiImage to pull the canonical model identifier from the
OpenAI adapter's model-meta.ts and substitute it in the docs so examples stay
up-to-date.

In `@packages/ai-fal/src/adapters/image.ts`:
- Around line 98-109: The current merge order in the input construction lets
mapped image fields from mapImageInputsToFalFields (inputFields) overwrite
options.modelOptions, contradicting the intended behavior; change the spread
order so options.modelOptions is applied last (i.e., merge sizeParams and
inputFields first, then ...options.modelOptions) so per-endpoint modelOptions
override derived image-input keys when creating the input object (look for the
input variable creation that includes mapImageInputsToFalFields, sizeParams, and
prompt).

In `@packages/ai-gemini/src/adapters/image.ts`:
- Around line 256-264: The imagePartToGeminiPart path currently calls
fetch(part.source.value) with no URL safety checks; add a fail-closed URL gate
before the fetch that (1) parses part.source.value, (2) enforces a protocol
allowlist (only https and optionally data/gs if your flow needs them), and (3)
resolves and blocks private/loopback/internal IP ranges (RFC1918, 127.0.0.0/8,
IPv6 equivalents, localhost, and link-local) as well as preventing hostnames
that resolve to those IPs; if the URL fails any check, throw an Error instead of
calling fetch. Locate this logic around imagePartToGeminiPart where
fetch(part.source.value) is invoked and ensure errors surface before converting
the response to blob/arrayBuffer and calling arrayBufferToBase64.

In `@packages/ai-grok/src/adapters/image.ts`:
- Around line 254-264: The fetch call in editImages is missing a bounded
timeout; create an AbortController inside the editImages function, pass
controller.signal to the fetch options, and start a setTimeout that calls
controller.abort() after a chosen timeout (e.g., 30s or a configurable
constant); ensure you clearTimeout when fetch resolves/rejects and handle abort
errors appropriately (keep existing error handling behavior). Update the fetch
invocation in packages/ai-grok/src/adapters/image.ts to include signal:
controller.signal and wrap the timer lifecycle so the request cannot hang
indefinitely.

In `@packages/ai-openai/src/adapters/image.ts`:
- Around line 193-198: The error thrown when maxImages === 0 gives incorrect
guidance by omitting supported models (it lists gpt-image-1, gpt-image-1-mini,
or dall-e-2 but not gpt-image-2); update the message in the block using
EDIT_MAX_IMAGES and the local variable maxImages (and this.name) to mention
gpt-image-2 or, better, mirror the keys/allowed models from EDIT_MAX_IMAGES so
the guidance matches the actual supported-image models.

In `@packages/ai-openai/src/adapters/video.ts`:
- Around line 90-95: validateVideoSize is only checking the root size variable
and can miss modelOptions.size used later when building the request; update the
code around the adapter where you destructure options (the block with const {
model, size, duration, modelOptions } and const { imageInputs, videoInputs,
audioInputs }) to compute an effectiveSize (e.g., size ?? modelOptions?.size)
and call validateVideoSize(model, effectiveSize) so the value actually sent in
the request is validated; do the same pattern for seconds already handled
(seconds = duration ?? modelOptions?.seconds) to ensure consistency.

In `@packages/ai-openai/src/image/image-input-to-file.ts`:
- Around line 49-55: The fetch of arbitrary part.source.value is unsafe and can
enable SSRF or hanging requests; update the image ingestion in
image-input-to-file.ts to (1) validate the URL before fetching by parsing
part.source.value with new URL and reject non-http(s) schemes, disallow
localhost/127.0.0.0/8 and RFC1918 private IP ranges (resolve hostname to IP and
check against denylist), and disallow file:, data: (handle data: separately),
and (2) perform the fetch with a bounded timeout using AbortController (and
limit redirects) so requests cannot hang; apply these checks and timeout around
the existing fetch/response logic that references part.source.value.

In `@packages/ai/skills/ai-core/media-generation/SKILL.md`:
- Around line 673-676: Update the wording that currently reads "Adapters throw a
clear runtime error when the caller passes `imageInputs` to a model that can't
honor it (dall-e-3, Imagen, Grok, OpenRouter)" to scope the limitation to
specific unsupported models rather than entire providers; keep the phrase about
adapters and `imageInputs` but replace the parenthetical list with explicit
unsupported model examples (e.g., "dall-e-3, Imagen") or rephrase to "certain
models (for example, dall-e-3 and Imagen)" and remove or clarify Grok/OpenRouter
as providers so you don't imply the whole provider lacks image-conditioned
routes. Ensure the corrected sentence still mentions adapters throwing a runtime
error for unsupported models.

In `@scripts/generate-fal-image-field-map.ts`:
- Around line 223-236: The arity sanity check is being skipped for
default-selected fields because the loop continues when chosen ===
DEFAULTS[role]; update the loop so it does not early-continue for
default-selected fields — only skip when no chosen — and ensure the existing
arity comparison using isList.get(chosen), LIST_FIELDS.has(chosen), and
endpointId still runs for DEFAULTS[role] values so mismatches between the
runtime type map (isList) and the static LIST_FIELDS are caught (also update any
comment to mention image-inputs.ts where LIST_FIELDS is defined).
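
The fail-closed URL gate requested for the Gemini and OpenAI image-fetch paths above could look roughly like the sketch below. It is only a partial illustration: assertSafeImageUrl and isPrivateIPv4 are hypothetical names, only https is allowed, every IPv6 literal is rejected for simplicity, and the DNS-resolution check mentioned in the review (hostnames that resolve to private addresses) is not implemented here.

```typescript
// Assumption for this sketch: only https is allowed.
const ALLOWED_PROTOCOLS = new Set(['https:'])

// True for loopback, RFC 1918, link-local, and 0.0.0.0/8 IPv4 literals.
function isPrivateIPv4(host: string): boolean {
  const match = /^(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})$/.exec(host)
  if (!match) return false
  const a = Number(match[1])
  const b = Number(match[2])
  return (
    a === 0 ||
    a === 10 ||
    a === 127 ||
    (a === 172 && b >= 16 && b <= 31) ||
    (a === 192 && b === 168) ||
    (a === 169 && b === 254)
  )
}

// Hypothetical gate: throws before any network request is made.
function assertSafeImageUrl(raw: string): URL {
  let url: URL
  try {
    url = new URL(raw)
  } catch {
    throw new Error(`Invalid image URL: ${raw}`)
  }
  if (!ALLOWED_PROTOCOLS.has(url.protocol)) {
    throw new Error(`Blocked protocol for image URL: ${url.protocol}`)
  }
  const host = url.hostname.toLowerCase()
  const blocked =
    host === 'localhost' ||
    host.endsWith('.localhost') ||
    host.startsWith('[') || // IPv6 literals are bracketed; all rejected here
    isPrivateIPv4(host)
  if (blocked) {
    throw new Error(`Blocked host for image URL: ${host}`)
  }
  return url
}
```

A caller would run the gate first and pass the returned URL to fetch, so a rejected URL never reaches the network layer.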

---

Nitpick comments:
In `@packages/ai-fal/tests/image-inputs.test.ts`:
- Around line 1-249: The test file image-inputs.test.ts needs to be moved to sit
next to its source module image-inputs.ts (the module exporting
mapImageInputsToFalFields and mapImageInputsToFalVideoFields); relocate the test
file into the same directory as image-inputs.ts, update any relative imports
(e.g., ../src/image/image-inputs to ./image-inputs) and the generated constant
import if needed, and ensure the test runner still discovers it (adjust any
package test config or export paths if required).
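
The bounded-timeout suggestion for editImages above follows a common AbortController pattern, sketched here. The helper name fetchWithTimeout, the injectable fetchImpl parameter, and the 30-second default are assumptions for illustration and are not taken from the adapter.

```typescript
// Hypothetical helper: aborts the request if it exceeds timeoutMs.
async function fetchWithTimeout(
  url: string,
  init: RequestInit = {},
  timeoutMs = 30_000,
  fetchImpl: typeof fetch = fetch,
): Promise<Response> {
  const controller = new AbortController()
  // Abort the in-flight request once the deadline passes.
  const timer = setTimeout(() => controller.abort(), timeoutMs)
  try {
    return await fetchImpl(url, { ...init, signal: controller.signal })
  } finally {
    // Always clear the timer, whether the request resolved or rejected.
    clearTimeout(timer)
  }
}
```

Clearing the timer in finally keeps a completed request from leaving a pending timeout behind, and an abort surfaces as a rejection that existing error handling can catch.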

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: c39fe612-aaa6-4d46-a616-b99323822c03

📥 Commits

Reviewing files that changed from the base of the PR and between c0af426 and 483a3d4.

⛔ Files ignored due to path filters (1)
  • packages/ai-fal/src/image/generated/image-field-overrides.ts is excluded by !**/generated/**
📒 Files selected for processing (29)
  • .changeset/image-and-video-inputs.md
  • .prettierignore
  • docs/adapters/grok.md
  • docs/media/image-generation.md
  • docs/media/video-generation.md
  • package.json
  • packages/ai-event-client/src/index.ts
  • packages/ai-fal/src/adapters/image.ts
  • packages/ai-fal/src/adapters/video.ts
  • packages/ai-fal/src/image/image-inputs.ts
  • packages/ai-fal/tests/image-inputs.test.ts
  • packages/ai-gemini/src/adapters/image.ts
  • packages/ai-grok/src/adapters/image.ts
  • packages/ai-grok/src/image/image-provider-options.ts
  • packages/ai-grok/src/model-meta.ts
  • packages/ai-grok/tests/grok-adapter.test.ts
  • packages/ai-openai/src/adapters/image.ts
  • packages/ai-openai/src/adapters/video.ts
  • packages/ai-openai/src/image/image-input-to-file.ts
  • packages/ai-openai/tests/image-adapter.test.ts
  • packages/ai-openrouter/src/adapters/image.ts
  • packages/ai-openrouter/tests/image-adapter.test.ts
  • packages/ai/skills/ai-core/media-generation/SKILL.md
  • packages/ai/src/activities/generateImage/index.ts
  • packages/ai/src/activities/generateVideo/index.ts
  • packages/ai/src/types.ts
  • scripts/generate-fal-image-field-map.ts
  • testing/e2e/src/lib/feature-support.ts
  • testing/e2e/src/lib/types.ts

Comment thread docs/media/image-generation.md
Comment thread packages/ai-fal/src/adapters/image.ts Outdated
Comment thread packages/ai-gemini/src/adapters/image.ts
Comment thread packages/ai-grok/src/adapters/image.ts
Comment thread packages/ai-openai/src/adapters/image.ts
Comment thread packages/ai-openai/src/adapters/video.ts
Comment thread packages/ai-openai/src/image/image-input-to-file.ts
Comment thread packages/ai/skills/ai-core/media-generation/SKILL.md Outdated
Comment thread scripts/generate-fal-image-field-map.ts
@tombeckenham tombeckenham changed the title from "feat(ai): add imageInputs / videoInputs / audioInputs for image-conditioned generation" to "feat: multimodal prompt for generateImage/generateVideo (image-to-image, image-to-video)" Jun 7, 2026

@coderabbitai coderabbitai Bot left a comment

Contributor

Actionable comments posted: 2

♻️ Duplicate comments (1)
packages/ai-fal/src/adapters/image.ts (1)

109-117: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Merge order still prevents modelOptions from overriding derived fields.

The past review comment on this issue remains valid. Current spread order has inputFields (line 112) override modelOptions (line 110), contradicting the comment on lines 105-107 that claims "user overrides win." Users cannot override derived image-input fields (e.g., mask_url, reference_image_urls) via modelOptions when imageInputs are present.

💡 Apply the past reviewer's fix
     const input = {
-      ...options.modelOptions,
       ...sizeParams,
       ...inputFields,
+      ...options.modelOptions,
       ...(resolved.text ? { prompt: resolved.text } : {}),
       num_images: options.numberOfImages,
     } as FalModelInput<TModel>
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/ai-fal/src/adapters/image.ts` around lines 109 - 117, The merge
order currently lets inputFields override options.modelOptions so user-provided
modelOptions can't override derived image fields; update the spread order when
building the input object (the variable named input of type FalModelInput) so
that options.modelOptions is spread last (after sizeParams, inputFields and the
conditional prompt) — this ensures user modelOptions win and still preserves
resolved.text and num_images handling.
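
The merge-order point rests on how object spread resolves duplicate keys: later spreads win. The sketch below mirrors the ordering in the proposed diff (derived fields, then modelOptions, then the prompt) using a hypothetical buildInputSketch helper and plain records instead of the adapter's FalModelInput type.

```typescript
type Fields = Record<string, unknown>

// Hypothetical helper mirroring the proposed spread order.
function buildInputSketch(
  derived: Fields,
  modelOptions: Fields,
  prompt?: string,
): Fields {
  return {
    ...derived, // fields derived from prompt parts, e.g. mask_url
    ...modelOptions, // spread after derived fields, so user overrides win
    ...(prompt ? { prompt } : {}), // prompt text is still applied afterwards
  }
}

const merged = buildInputSketch(
  { mask_url: 'https://example.com/derived-mask.png' },
  { mask_url: 'https://example.com/user-mask.png', seed: 7 },
  'replace the sky',
)
console.log(merged.mask_url) // the user-supplied value from modelOptions
```

With the original order (modelOptions first), the same call would keep the derived mask_url instead, which is the behavior the review flags.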
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@packages/ai-fal/src/adapters/video.ts`:
- Around line 164-174: The current construction of the input object lets
media-derived fields (sizeParams, inputImageFields, videoFields, audioFields)
override user-supplied modelOptions; change the spread order so modelOptions is
spread after the derived media fields (i.e., build input from sizeParams,
inputImageFields, videoFields, audioFields, then ...modelOptions) while
preserving the conditional prompt and duration spreads (resolved.text and
duration) and the FalModelInput<TModel> typing so user-provided keys like
video_url, reference_video_urls, or audio_url in modelOptions take precedence.

In `@packages/ai-openai/src/adapters/video.ts`:
- Around line 99-127: Add unit tests covering createVideoJob multimodal
handling: (1) text-only prompt should call Videos.create with model and prompt
and no input_reference; (2) text+one image should convert the resolved image via
imagePartToFile and attach the returned File as request.input_reference — mock
imagePartToFile and the OpenAI_SDK.Videos.create call to assert the passed
params include input_reference; (3) more than one image should throw the same
error path (verify the thrown message). Also add a Playwright + aimock E2E test
that posts a single-image multimodal prompt and asserts the upstream Sora
request contains a single input_reference upload. Reference createVideoJob,
resolveMediaPrompt, imagePartToFile, and request.input_reference when locating
code to test and to mock.

---

Duplicate comments:
In `@packages/ai-fal/src/adapters/image.ts`:
- Around line 109-117: The merge order currently lets inputFields override
options.modelOptions so user-provided modelOptions can't override derived image
fields; update the spread order when building the input object (the variable
named input of type FalModelInput) so that options.modelOptions is spread last
(after sizeParams, inputFields and the conditional prompt) — this ensures user
modelOptions win and still preserves resolved.text and num_images handling.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 98656191-029e-4c17-87e3-a6eb4ac01845

📥 Commits

Reviewing files that changed from the base of the PR and between 483a3d4 and 2e47d5e.

📒 Files selected for processing (33)
  • .changeset/image-and-video-inputs.md
  • .gitignore
  • docs/adapters/grok.md
  • docs/media/image-generation.md
  • docs/media/video-generation.md
  • packages/ai-fal/src/adapters/image.ts
  • packages/ai-fal/src/adapters/video.ts
  • packages/ai-fal/src/image/image-inputs.ts
  • packages/ai-fal/src/model-meta.ts
  • packages/ai-gemini/src/adapters/image.ts
  • packages/ai-gemini/src/image/image-provider-options.ts
  • packages/ai-grok/src/adapters/image.ts
  • packages/ai-grok/src/image/image-provider-options.ts
  • packages/ai-grok/tests/grok-adapter.test.ts
  • packages/ai-openai/src/adapters/image.ts
  • packages/ai-openai/src/adapters/video.ts
  • packages/ai-openai/src/image/image-provider-options.ts
  • packages/ai-openai/src/video/video-provider-options.ts
  • packages/ai-openai/tests/image-adapter.test.ts
  • packages/ai-openrouter/src/adapters/image.ts
  • packages/ai-openrouter/src/image/image-provider-options.ts
  • packages/ai-openrouter/tests/image-adapter.test.ts
  • packages/ai/skills/ai-core/media-generation/SKILL.md
  • packages/ai/src/activities/generateImage/adapter.ts
  • packages/ai/src/activities/generateImage/index.ts
  • packages/ai/src/activities/generateVideo/adapter.ts
  • packages/ai/src/activities/generateVideo/index.ts
  • packages/ai/src/index.ts
  • packages/ai/src/types.ts
  • packages/ai/src/utilities/media-prompt.ts
  • packages/ai/tests/image-per-model-type-safety.test.ts
  • packages/ai/tests/media-prompt.test.ts
  • testing/e2e/src/lib/feature-support.ts
✅ Files skipped from review due to trivial changes (2)
  • docs/adapters/grok.md
  • docs/media/video-generation.md
🚧 Files skipped from review as they are similar to previous changes (7)
  • packages/ai-openrouter/src/adapters/image.ts
  • packages/ai-grok/src/image/image-provider-options.ts
  • packages/ai-grok/tests/grok-adapter.test.ts
  • packages/ai-openai/src/adapters/image.ts
  • packages/ai-fal/src/image/image-inputs.ts
  • packages/ai-grok/src/adapters/image.ts
  • packages/ai-gemini/src/adapters/image.ts

Comment thread packages/ai-fal/src/adapters/video.ts
Comment thread packages/ai-openai/src/adapters/video.ts

@coderabbitai coderabbitai Bot left a comment

Contributor

🧹 Nitpick comments (1)
packages/ai-openai/tests/video-adapter.test.ts (1)

81-112: ⚡ Quick win

Split combined rejection test into separate cases for clarity.

This single test case validates rejection of both video and audio parts. While the coverage is correct, combining two distinct validation scenarios in one it() block reduces test clarity and makes failure diagnostics less precise. If either rejection path fails, the error message will be less informative about which modality caused the failure.

♻️ Proposed refactor to separate test cases
-  it('rejects video and audio prompt parts', async () => {
+  it('rejects video prompt parts', async () => {
     const { adapter, mockCreate } = mockedAdapter()

     await expect(
       adapter.createVideoJob({
         model: 'sora-2',
         prompt: [
           { type: 'text', content: 'x' },
           {
             type: 'video',
             source: { type: 'url', value: 'https://example.com/v.mp4' },
           },
         ],
         logger: testLogger,
       }),
     ).rejects.toThrow(/video prompt parts/)
+    expect(mockCreate).not.toHaveBeenCalled()
+  })

+  it('rejects audio prompt parts', async () => {
+    const { adapter, mockCreate } = mockedAdapter()

     await expect(
       adapter.createVideoJob({
         model: 'sora-2',
         prompt: [
           { type: 'text', content: 'x' },
           {
             type: 'audio',
             source: { type: 'url', value: 'https://example.com/a.mp3' },
           },
         ],
         logger: testLogger,
       }),
     ).rejects.toThrow(/audio prompt parts/)
     expect(mockCreate).not.toHaveBeenCalled()
   })
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/ai-openai/tests/video-adapter.test.ts` around lines 81 - 112, Split
the combined rejection test into two separate tests to improve clarity: create
one it() block that uses mockedAdapter() and testLogger to call
adapter.createVideoJob with a prompt containing a video part and asserts
rejects.toThrow(/video prompt parts/) and that mockCreate was not called, and a
second it() block that does the same for a prompt containing an audio part
asserting rejects.toThrow(/audio prompt parts/) and mockCreate not called; keep
using mockedAdapter(), adapter.createVideoJob, mockCreate, and testLogger to
locate and reuse the existing mocks and assertions.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 1a45a71f-c5d6-4a1d-a238-42e1ecbdd798

📥 Commits

Reviewing files that changed from the base of the PR and between 2e47d5e and 36d7683.

📒 Files selected for processing (12)
  • .changeset/image-and-video-inputs.md
  • docs/media/image-generation.md
  • docs/media/video-generation.md
  • packages/ai-fal/src/image/image-inputs.ts
  • packages/ai-fal/tests/image-inputs.test.ts
  • packages/ai-fal/tests/video-adapter.test.ts
  • packages/ai-gemini/tests/image-adapter.test.ts
  • packages/ai-openai/src/adapters/image.ts
  • packages/ai-openai/src/adapters/video.ts
  • packages/ai-openai/tests/image-adapter.test.ts
  • packages/ai-openai/tests/video-adapter.test.ts
  • packages/ai/skills/ai-core/media-generation/SKILL.md
✅ Files skipped from review due to trivial changes (2)
  • docs/media/video-generation.md
  • .changeset/image-and-video-inputs.md
🚧 Files skipped from review as they are similar to previous changes (6)
  • packages/ai/skills/ai-core/media-generation/SKILL.md
  • packages/ai-openai/tests/image-adapter.test.ts
  • packages/ai-fal/tests/image-inputs.test.ts
  • packages/ai-openai/src/adapters/video.ts
  • packages/ai-openai/src/adapters/image.ts
  • packages/ai-fal/src/image/image-inputs.ts

@coderabbitai coderabbitai Bot left a comment

Contributor

Actionable comments posted: 3

🧹 Nitpick comments (1)
examples/ts-react-media/src/components/ImageGenerator.tsx (1)

9-10: 💤 Low value

Fix import order per ESLint rules.

The ESLint import/order rule requires the type import from @/lib/media to come before the regular import from @/lib/server-functions. This is a formatting issue flagged by static analysis.

♻️ Suggested reordering
 import { generateImageFn } from '@/lib/server-functions'
 import { getRandomImagePrompt } from '@/lib/prompts'
 import { IMAGE_MODELS } from '@/lib/models'
-import { readImageFile, toImagePart } from '@/lib/media'
 import type { AttachedImage } from '@/lib/media'
+import { readImageFile, toImagePart } from '@/lib/media'
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@examples/ts-react-media/src/components/ImageGenerator.tsx` around lines 9 -
10, reorder the imports so the type-only import AttachedImage is placed before
the value imports readImageFile and toImagePart to satisfy the ESLint
import/order rule; update the import lines in ImageGenerator.tsx so the `import
type { AttachedImage } from '@/lib/media'` statement appears above `import {
readImageFile, toImagePart } from '@/lib/media'`.

Source: Linters/SAST tools

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@examples/ts-react-media/src/components/ImageGenerator.tsx`:
- Around line 194-241: Update the reference-images note in the ImageGenerator
component: replace the incorrect "Supported by Gemini native (NanoBanana) models
only" text with a correct statement that image inputs are supported only by the
Gemini image-preview models (e.g., "Supported by Gemini native image-preview
models only: gemini-3.1-flash-image-preview, gemini-3-pro-image-preview"). Do
not reference NanoBanana (fal-ai/nano-banana-pro) since server-functions.ts uses
asTextPrompt for that model (see asTextPrompt and asImagePrompt usages) and
NanoBanana does not accept image prompts.

In `@testing/e2e/src/components/VideoGenUI.tsx`:
- Around line 21-35: The fileToBase64 function currently resolves an empty
string when reader.result.split(',')[1] is undefined; update fileToBase64 to
validate the Data URL format and reject with a clear Error instead of returning
an empty string. Specifically, in fileToBase64 (where reader.onload handles
result), check that result is a string and that result.includes(',') and that
split(',')[1] is a non-empty string; if not, call reject(new Error('Invalid data
URL format')) (or similar) so callers receive an explicit error rather than an
empty value.
- Around line 68-83: In handleGenerate, add validation before calling
fileToBase64: verify imageFile exists and that its MIME type is an image (e.g.,
imageFile.type startsWith('image/') or fallback to checking the file extension)
and enforce a max size threshold (e.g., 5MB) to avoid large base64 payloads; if
the file fails validation, early-return and surface a user-friendly
error/notification instead of proceeding. Perform these checks before the base64
conversion and only call generate with the image block (and include
imageFile.type as mimeType) when validations pass; keep the validation logic
close to the existing handleGenerate, referencing imageFile, fileToBase64, and
generate.
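
The two VideoGenUI findings above come down to validating the file before encoding it and failing loudly on a malformed data URL. A minimal sketch of both checks as pure functions follows. The names `base64FromDataUrl`, `assertUsableImage`, and `MAX_IMAGE_BYTES` are hypothetical and not taken from the PR, and the 5 MB cap is only the example figure mentioned in the comment.

```typescript
// Hypothetical helpers; the names and the 5 MB cap are illustrative assumptions.
const MAX_IMAGE_BYTES = 5 * 1024 * 1024

/** Extracts the base64 payload from a data URL, throwing if it is malformed. */
function base64FromDataUrl(dataUrl: string): string {
  const commaIndex = dataUrl.indexOf(',')
  const payload = commaIndex === -1 ? '' : dataUrl.slice(commaIndex + 1)
  if (!dataUrl.startsWith('data:') || payload.length === 0) {
    throw new Error('Invalid data URL format')
  }
  return payload
}

/** Rejects non-image or oversized files before any base64 conversion runs. */
function assertUsableImage(file: { type: string; size: number }): void {
  if (!file.type.startsWith('image/')) {
    throw new Error(`Expected an image file, got "${file.type || 'unknown'}"`)
  }
  if (file.size > MAX_IMAGE_BYTES) {
    throw new Error(`Image exceeds the ${MAX_IMAGE_BYTES}-byte limit`)
  }
}
```

Inside a `FileReader` `onload` handler, the call to `base64FromDataUrl` could be wrapped in a `try`/`catch` that forwards the error to `reject`, so callers see an explicit failure instead of an empty string.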


ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 65b1dca2-a819-4d1e-9ae0-2c9144267443

📥 Commits

Reviewing files that changed from the base of the PR and between 36d7683 and 349eb0d.

📒 Files selected for processing (26)
  • .changeset/image-and-video-inputs.md
  • docs/media/video-generation.md
  • examples/ts-react-media/src/components/ImageGenerator.tsx
  • examples/ts-react-media/src/components/VideoGenerator.tsx
  • examples/ts-react-media/src/lib/media.ts
  • examples/ts-react-media/src/lib/server-functions.ts
  • packages/ai-client/src/generation-types.ts
  • packages/ai-client/tests/video-generation-client.test.ts
  • packages/ai-fal/src/model-meta.ts
  • packages/ai-fal/tests/video-adapter.test.ts
  • packages/ai/src/client.ts
  • testing/e2e/README.md
  • testing/e2e/fixtures/image-to-image/basic.json
  • testing/e2e/global-setup.ts
  • testing/e2e/src/components/ImageGenUI.tsx
  • testing/e2e/src/components/VideoGenUI.tsx
  • testing/e2e/src/lib/feature-support.ts
  • testing/e2e/src/lib/features.ts
  • testing/e2e/src/lib/server-functions.ts
  • testing/e2e/src/routes/$provider/$feature.tsx
  • testing/e2e/src/routes/api.image.stream.ts
  • testing/e2e/src/routes/api.image.ts
  • testing/e2e/src/routes/api.video.stream.ts
  • testing/e2e/src/routes/api.video.ts
  • testing/e2e/tests/image-to-image.spec.ts
  • testing/e2e/tests/image-to-video.spec.ts
✅ Files skipped from review due to trivial changes (6)
  • testing/e2e/README.md
  • testing/e2e/src/routes/api.image.stream.ts
  • testing/e2e/fixtures/image-to-image/basic.json
  • packages/ai-client/tests/video-generation-client.test.ts
  • testing/e2e/src/routes/api.video.stream.ts
  • .changeset/image-and-video-inputs.md
🚧 Files skipped from review as they are similar to previous changes (2)
  • packages/ai-fal/tests/video-adapter.test.ts
  • docs/media/video-generation.md

Comment thread examples/ts-react-media/src/components/ImageGenerator.tsx
Comment thread testing/e2e/src/components/VideoGenUI.tsx
Comment thread testing/e2e/src/components/VideoGenUI.tsx
@tombeckenham tombeckenham removed the request for review from AlemTuzlak June 7, 2026 11:48
tombeckenham added a commit that referenced this pull request Jun 10, 2026
…e activity follow-ups

Closes #707.

- Add openRouterVideo: async jobs adapter for OpenRouter's dedicated video
  API (submit -> poll -> download). Per-model size/duration/option types are
  generated from GET /api/v1/videos/models; frame roles map onto
  frame_images[] / input_references[] per the MediaInputRole taxonomy.
- Teach the model-meta sync scripts the videos/models endpoint
  (openrouter.video-models.json + OPENROUTER_VIDEO_MODEL_META).
- Image adapter follow-ups from the #624 review: throw on unmapped sizes
  (the size union used a Unicode multiplication sign so every non-square
  size silently dropped its aspect ratio), throw on numberOfImages > 1
  (live-verified: the gateway ignores all count keys), expose
  image_config.strength.
- Completed videos are returned as data: URLs (unsigned_urls 401 without
  the API key header) with gateway-reported cost on usage.cost. The SDK's
  getVideoContent is bypassed: its matcher only accepts
  application/octet-stream while the endpoint serves video/mp4.
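
The size follow-up above is easy to reproduce in isolation: a lookup keyed with ASCII `x` never matches a key typed with the Unicode multiplication sign (U+00D7), so a lenient lookup returns nothing and the aspect ratio is dropped without an error. The sketch below shows the throw-on-unmapped behavior. The table contents and the names `ASPECT_RATIO_BY_SIZE` and `aspectRatioFor` are hypothetical, not the adapter's real mapping.

```typescript
// Hypothetical size table keyed with ASCII 'x'; entries are illustrative only.
const ASPECT_RATIO_BY_SIZE: Record<string, string> = {
  '1024x1024': '1:1',
  '1536x1024': '3:2',
  '1024x1536': '2:3',
}

/** Looks up the aspect ratio for a size and throws when the size is unmapped. */
function aspectRatioFor(size: string): string {
  const ratio = ASPECT_RATIO_BY_SIZE[size]
  if (ratio === undefined) {
    // Throwing here surfaces typos such as '1536\u00D71024' instead of
    // silently sending the request without an aspect ratio.
    throw new Error(`Unmapped size "${size}"`)
  }
  return ratio
}
```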

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

@coderabbitai coderabbitai Bot left a comment

Contributor

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
packages/ai-fal/src/adapters/image.ts (1)

113-117: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Strip prompt out of modelOptions before the spread.

Line 113 lets modelOptions.prompt survive whenever resolved.text is empty, because Line 116 only reasserts prompt conditionally. That reopens a second text-prompt path for media-only requests and breaks the “top-level prompt is the multimodal surface / call-controlled fields win” contract.

Proposed fix
+    const { prompt: _prompt, num_images: _numImages, ...modelOptions } =
+      options.modelOptions ?? {}
     const input = {
       ...sizeParams,
       ...inputFields,
-      ...options.modelOptions,
+      ...modelOptions,
       // Media-only prompts (e.g. upscalers, background removal) omit the
       // prompt field entirely rather than sending an empty string.
       ...(resolved.text ? { prompt: resolved.text } : {}),
       num_images: options.numberOfImages,
     } as FalModelInput<TModel>
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/ai-fal/src/adapters/image.ts` around lines 113 - 117, The spread of
options.modelOptions currently allows modelOptions.prompt to leak through when
resolved.text is empty; remove prompt from modelOptions before spreading so only
the conditional top-level prompt (based on resolved.text) is used. Locate the
creation of the request payload where options.modelOptions is spread (refer to
options.modelOptions and resolved.text in adapters/image.ts) and replace the
spread with a prompt-stripped object (e.g., destructure to drop prompt from
options.modelOptions or shallow-copy and delete prompt) then spread that cleaned
object, preserving num_images and the conditional ...(resolved.text ? { prompt:
resolved.text } : {}) behavior.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: d43d1dcb-bf3a-4293-b67b-ff94ad4d392b

📥 Commits

Reviewing files that changed from the base of the PR and between 349eb0d and 0c65cc7.

📒 Files selected for processing (6)
  • examples/ts-react-media/src/components/ImageGenerator.tsx
  • packages/ai-fal/src/adapters/image.ts
  • packages/ai-fal/src/adapters/video.ts
  • packages/ai-openai/src/adapters/video.ts
  • scripts/generate-fal-image-field-map.ts
  • testing/e2e/src/components/VideoGenUI.tsx
🚧 Files skipped from review as they are similar to previous changes (4)
  • testing/e2e/src/components/VideoGenUI.tsx
  • packages/ai-openai/src/adapters/video.ts
  • examples/ts-react-media/src/components/ImageGenerator.tsx
  • packages/ai-fal/src/adapters/video.ts

tombeckenham and others added 9 commits June 11, 2026 10:13
…tioned generation (closes #618)

Adds optional `imageInputs`, `videoInputs`, and `audioInputs` to `generateImage()`
and `generateVideo()` for image-to-image, multi-reference, mask / inpaint,
image-to-video, and starting-frame flows. Each input part may carry a
`metadata.role` hint (`'reference' | 'mask' | 'control' | 'start_frame' |
'end_frame' | 'character'`) that adapters use to route to the provider-specific
field.

Provider behavior:
- OpenAI image: gpt-image-1 / -mini route to `images.edit()` (up to 16 + mask);
  dall-e-2 routes to `images.edit()` with one source; dall-e-3 throws.
- OpenAI video: Sora-2 / -pro accept a single `input_reference`; throws on >1.
- Gemini: native models receive inputs as multimodal `contents` parts; Imagen
  throws (text-only).
- fal: 1 input → `image_url`, >1 → `image_urls`; metadata roles map to
  `mask_url` / `control_image_url` / `reference_image_urls`; video adds
  `start_image_url` / `end_image_url`. Interim mapping until the fal schemas
  library lands.
- Grok, OpenRouter: throw with a link back to #618 (pending native Imagine API
  rewrite and multimodal injection work respectively).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…SDK type map

Replace the fal image-input field heuristic with a per-endpoint mapping
generated from @fal-ai/client's EndpointTypeMap (scripts/
generate-fal-image-field-map.ts, run via pnpm generate:fal-image-fields).
The committed artifact stores only the 362 endpoints whose field names
deviate from the defaults (e.g. nano-banana edit -> image_urls, Kling i2v
start frame -> image_url, Veo first-last-frame -> first_frame_url /
last_frame_url, Fooocus masks -> mask_image_url); the old heuristic
remains the fallback for endpoints newer than the installed SDK.

Safety rails: the generated file `satisfies`-checks every field name
against the SDK endpoint types (type-only, erased at runtime), and a unit
test hashes the installed endpoints.d.ts against the recorded hash so an
SDK bump without regeneration fails test:lib with the regen command.

Mappers are now typed: both return FalImageInputFields<TModel>, Pick'ed
from the endpoint's real input type via a generated field-name union.
Roles resolving to the same list field merge (source + reference on
nano-banana); colliding scalar fields throw instead of overwriting.
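
The merge-versus-throw rule described above can be sketched as follows. This is an illustration under stated assumptions, not the generated mapper: `Role`, `FieldSpec`, `exampleMap`, and `assignImageFields` are hypothetical names, and the example map only mirrors the nano-banana case mentioned in the commit (source and reference both resolving to `image_urls`).

```typescript
// Hypothetical sketch of role-to-field routing with list merging and
// scalar collision checks; not the generated fal mapper itself.
type Role = 'source' | 'reference' | 'mask'
type FieldSpec = { field: string; list: boolean }
type RoleFieldMap = Record<Role, FieldSpec>

// Illustrative mapping for an endpoint where source and reference share a list field.
const exampleMap: RoleFieldMap = {
  source: { field: 'image_urls', list: true },
  reference: { field: 'image_urls', list: true },
  mask: { field: 'mask_url', list: false },
}

function assignImageFields(
  inputs: Array<{ role: Role; url: string }>,
  map: RoleFieldMap,
): Record<string, string | string[]> {
  const out: Record<string, string | string[]> = {}
  for (const { role, url } of inputs) {
    const { field, list } = map[role]
    const existing = out[field]
    if (list) {
      // Roles that resolve to the same list field merge, preserving input order.
      out[field] = Array.isArray(existing) ? [...existing, url] : [url]
    } else if (existing !== undefined) {
      // A scalar field holds one value; a second write is an error, not an overwrite.
      throw new Error(`Multiple inputs resolve to scalar field "${field}"`)
    } else {
      out[field] = url
    }
  }
  return out
}
```

Throwing on a scalar collision keeps a second mask from silently replacing the first, which would otherwise be hard to spot in the provider's output.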

Also fixes the remaining CI lint failures: duplicate @tanstack/ai import
and non-null assertion in ai-fal video.ts, switch-exhaustiveness errors
in image-inputs.ts (restructured away), and the non-null assertion in
ai-openai image.ts.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…d generation

Grok: add the xAI Imagine API image models (grok-imagine-image,
grok-imagine-image-quality) to model-meta. With imageInputs they route to
xAI's JSON POST /v1/images/edits endpoint via direct fetch (the OpenAI
SDK's images.edit() sends multipart/form-data, which xAI rejects) — a
single input as image:{url}, 2-3 inputs as images:[...] referenceable in
the prompt as <IMAGE_0>/<IMAGE_1>; >3 inputs and mask/control roles throw.
Their generic `size` uses an aspectRatio_resolution template ('16:9_2k',
suffix optional), mirroring Gemini's native image models, and maps to the
Imagine aspect_ratio/resolution parameters on both the generate and edit
paths. grok-2-image-1212 stays text-to-image only with a clear error.
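
The `aspectRatio_resolution` size template described above could be parsed roughly like this. It is a sketch only: `parseImagineSize` and `ImagineSizeParams` are hypothetical names, and the validation rules are assumptions. Only the `'16:9_2k'` example, the optional suffix, and the `aspect_ratio` / `resolution` target parameters come from the commit message.

```typescript
// Hypothetical parser for the 'aspectRatio_resolution' size template, e.g. '16:9_2k'.
interface ImagineSizeParams {
  aspect_ratio: string
  resolution?: string
}

function parseImagineSize(size: string): ImagineSizeParams {
  const [aspectRatio, resolution, ...rest] = size.split('_')
  const malformed =
    !aspectRatio ||
    !/^\d+:\d+$/.test(aspectRatio) ||
    resolution === '' ||
    rest.length > 0
  if (malformed) {
    throw new Error(`Unsupported size "${size}"; expected e.g. "16:9" or "16:9_2k"`)
  }
  // The resolution suffix is optional, so it is only set when present.
  return resolution
    ? { aspect_ratio: aspectRatio, resolution }
    : { aspect_ratio: aspectRatio }
}
```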

OpenRouter: imageInputs are injected as multimodal image_url content parts
alongside the prompt in the chat-completions message and forwarded to the
underlying image model.

Neither path fetches or base64-encodes URL sources in-process — URLs pass
through verbatim and are fetched by the provider; data sources become data
URIs. Bumps ai-grok and ai-openrouter to minor in the existing changeset.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
… API drift

- Move the generated fal image-field map and the generator's paths from
  packages/typescript/ai-fal to packages/ai-fal (repo flattened the layout)
- Add gpt-image-2 to EDIT_MAX_IMAGES (new model on main; same 16-image
  edit limit as the other gpt-image models)
- Map edit-path usage through buildImagesUsage to match the new TokenUsage
  shape, and drop two now-unnecessary type assertions

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…s text through verbatim

Replace the imageInputs / videoInputs / audioInputs fields with a multimodal
prompt: string | MediaPromptPart[]. Part order is meaningful — natively
multimodal providers (Gemini, OpenRouter) receive parts in interleaved order;
named-field providers (OpenAI, fal, xAI) extract media parts via the new
resolveMediaPrompt() utility and flatten the text.
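
For named-field providers, the flattening step above amounts to collecting text into one string and media parts into per-modality buckets. The sketch below illustrates that idea; it is not the SDK's `resolveMediaPrompt()`. `flattenPrompt` and `ResolvedPrompt` are hypothetical names, joining text parts with a single space is an assumption, and audio parts, `metadata`, and non-URL sources are left out for brevity.

```typescript
// Part shapes follow the PR summary; audio parts and metadata are omitted.
type MediaSource = { type: 'url'; value: string }
type TextPart = { type: 'text'; content: string }
type ImagePart = { type: 'image'; source: MediaSource }
type VideoPart = { type: 'video'; source: MediaSource }
type MediaPromptPart = TextPart | ImagePart | VideoPart

interface ResolvedPrompt {
  text: string
  images: ImagePart[]
  videos: VideoPart[]
}

// Hypothetical resolver: the single-space join is an assumption of this sketch.
function flattenPrompt(prompt: string | MediaPromptPart[]): ResolvedPrompt {
  if (typeof prompt === 'string') return { text: prompt, images: [], videos: [] }
  const texts: string[] = []
  const images: ImagePart[] = []
  const videos: VideoPart[] = []
  for (const part of prompt) {
    // Text is passed through verbatim; media parts keep their original order per bucket.
    if (part.type === 'text') texts.push(part.content)
    else if (part.type === 'image') images.push(part)
    else videos.push(part)
  }
  return { text: texts.join(' '), images, videos }
}
```

Natively multimodal providers would skip this step and forward the parts array as written, since flattening discards the interleaving.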

Zero magic: prompt text is always sent verbatim. The SDK never injects or
rewrites in-prompt referencing markers — users write each provider's own
convention (fal Kling/Seedance @image1, OpenAI/FLUX.2 "image 1" prose, Gemini
content descriptions), now documented per provider in the media docs. An
earlier grok <IMAGE_n> auto-injection was removed after research showed the
convention is absent from xAI's official docs (images are addressed by
request order).

- Per-model compile-time prompt narrowing via TModelInputModalitiesByName
  adapter generic (e.g. dall-e-3 / Imagen reject image parts as a type
  error); fal modality maps are derived at the type level from the SDK's
  endpoint input types
- metadata.tag added as an informational label (never read by adapters)
- Gemini now preserves true interleaving in contents; OpenRouter maps parts
  1:1 onto chat content parts in order

Closes #618

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
- openai: add gpt-image-2 to the editImages error message and JSDoc
  (the model is edit-capable via EDIT_MAX_IMAGES but was omitted from
  user-facing guidance); same fix in docs, SKILL.md, and the changeset
- openai: throw when the images.edit() response contains no usable
  images (matching grok's guard) instead of resolving to { images: [] }
- openai: drop the unnecessary input_reference cast in the Sora
  adapter — the SDK types the field, so assign directly
- fal: reject metadata.role 'mask'/'control' in the video mapper
  instead of silently folding them into source frames
- docs: mark Veo role mappings as planned (no Veo adapter yet), note
  the Gemini ~14-image limit is provider-side, bump samples to
  gpt-image-2
- tests: cover the Gemini image-conditioned path (interleaved
  contents, fileData vs inlineData vs fetch+inline, Imagen/video/audio
  rejection), the Sora input_reference upload and guards (new file),
  the fal video createVideoJob field assembly and audio guard, and the
  openai empty-edit-response guard

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Same defect class as the editImages guard in the previous commit: the
text-to-image path silently resolved to { images: [] } when response
items had neither b64_json nor url. Surface it as an error instead.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…l field demotion

- ai-client: widen ImageGenerateInput.prompt / VideoGenerateInput.prompt
  from string to MediaPrompt so useGenerateImage/useGenerateVideo can
  carry image parts from the browser; re-export the MediaPrompt types
  from @tanstack/ai/client
- ai-fal: demote media-conditioning fields (FalImageFieldName set plus
  video_url/video_urls/reference_video_urls/audio_url) from required to
  optional in FalImageProviderOptions / FalVideoProviderOptions — i2v
  endpoints declare e.g. image_url as required, but with a multimodal
  prompt the start frame arrives as a prompt part; modelOptions stays
  available as the explicit escape hatch
- e2e: real coverage for image-to-image (OpenAI /v1/images/edits) and
  image-to-video (Sora multipart /v1/videos with input_reference) — the
  installed aimock 1.29 mocks both multipart endpoints, so the previous
  "aimock can't mock this" empty provider sets were stale. New specs run
  all three transports and assert via aimock's request journal that the
  expected wire endpoint was hit. ImageGenUI/VideoGenUI gain a file
  input, feature routing/fixtures/onVideo registration added, README
  matrix updated
- examples/ts-react-media: ImageGenerator gains a multi-image reference
  picker (Gemini native models); VideoGenerator sends the start frame as
  a prompt part with role 'start_frame' instead of modelOptions URLs;
  server functions narrow the wire prompt per model and throw on
  unsupported part kinds instead of dropping them

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
- fal image/video: spread modelOptions after derived media fields so
  explicit user overrides win (matches documented intent)
- openai video: validate effective size (size ?? modelOptions.size)
- generate-fal-image-field-map: run arity check for default-selected
  fields too
- ts-react-media example: correct reference-image support comment
  (Gemini multimodal models, not NanoBanana)
- e2e VideoGenUI: reject on malformed data URL instead of resolving ''

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@tombeckenham tombeckenham force-pushed the 618-image-to-image-and-image-to-video-support branch from 0c65cc7 to acd7319 Compare June 11, 2026 00:18
tombeckenham added a commit that referenced this pull request Jun 11, 2026
…e activity follow-ups

Closes #707.

- Add openRouterVideo: async jobs adapter for OpenRouter's dedicated video
  API (submit -> poll -> download). Per-model size/duration/option types are
  generated from GET /api/v1/videos/models; frame roles map onto
  frame_images[] / input_references[] per the MediaInputRole taxonomy.
- Teach the model-meta sync scripts the videos/models endpoint
  (openrouter.video-models.json + OPENROUTER_VIDEO_MODEL_META).
- Image adapter follow-ups from the #624 review: throw on unmapped sizes
  (the size union used a Unicode multiplication sign so every non-square
  size silently dropped its aspect ratio), throw on numberOfImages > 1
  (live-verified: the gateway ignores all count keys), expose
  image_config.strength.
- Completed videos are returned as data: URLs (unsigned_urls 401 without
  the API key header) with gateway-reported cost on usage.cost. The SDK's
  getVideoContent is bypassed: its matcher only accepts
  application/octet-stream while the endpoint serves video/mp4.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…n contract (#634)

Restacked on 618-image-to-image-and-image-to-video-support to adopt the
multimodal MediaPrompt format, carrying a minimal additive port of the
#534 typed-duration contract:

- @tanstack/ai (non-breaking): VideoAdapter/BaseVideoAdapter gain a
  TModelDurationByName generic (default Record<string, number> preserves
  existing duration?: number typing), DurationOptions, snapToDurationOption,
  and default availableDurations()/snapDuration() implementations.
  generateVideo's duration is typed via VideoDurationForAdapter.
- @tanstack/ai-gemini: GeminiVideoAdapter over generateVideos /
  getVideosOperation with per-model typed durations (Veo 3.x 4|6|8,
  Veo 2 5|6|8 per current Veo docs), MediaPrompt image routing
  (start_frame → image, end_frame → lastFrame, reference/character →
  referenceImages), RAI filter surfacing, geminiVideo/createGeminiVideo
  factories, and finalized Veo model-meta entries.
- E2E: gemini added to video-gen with a custom aimock mount for
  :predictLongRunning + operations polling; all transports pass.
- Docs + media-generation skill updated for Veo (typed durations,
  image-to-video role table).
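
The typed-duration contract above pairs a per-model set of allowed durations with a snapping helper. A minimal sketch of snapping a requested duration onto such a set follows. `snapToNearestDuration` is a hypothetical stand-in for `snapToDurationOption`; the nearest-value rule and the tie-break toward the shorter duration are assumptions, not behavior confirmed by the commit. The `4 | 6 | 8` set is the Veo 3.x list quoted above.

```typescript
// Hypothetical stand-in for snapToDurationOption. Nearest-value snapping and
// the tie-break toward the shorter duration are assumptions of this sketch.
function snapToNearestDuration<T extends number>(
  requested: number,
  allowed: readonly T[],
): T {
  if (allowed.length === 0) throw new Error('No durations available for this model')
  let best = allowed[0] as T
  for (const option of allowed) {
    const optionGap = Math.abs(option - requested)
    const bestGap = Math.abs(best - requested)
    // Prefer the closer option; on an exact tie prefer the shorter clip.
    if (optionGap < bestGap || (optionGap === bestGap && option < best)) {
      best = option
    }
  }
  return best
}

// Veo 3.x durations as listed in the commit message.
const VEO_3_DURATIONS = [4, 6, 8] as const
```

Because `allowed` is a readonly tuple of literals, the return type narrows to `4 | 6 | 8`, which is the kind of per-model typing the contract is after.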

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

feat(gemini): add Google Veo video adapter on the typed-duration contract
image-to-image and image-to-video support

1 participant