fix: support image input in OpenAI Chat user messages #26826

Open
KaguraTart wants to merge 2 commits into anomalyco:dev from KaguraTart:fix/openai-chat-vision

Conversation

@KaguraTart

Issue for this PR

Closes #20802

Type of change

  • Bug fix
  • New feature
  • Refactor / code improvement
  • Documentation

What does this PR do?

Adds image input support in the OpenAI Chat protocol layer by converting MediaPart to OpenAI's image_url content block format in user messages.

Root cause: In packages/llm/src/protocols/openai-chat.ts, the lowerUserMessage function only accepted TextPart content. When a MediaPart was encountered, it returned an unsupportedContent error, preventing image attachments from reaching vision-capable models.

Fix:

  1. Added OpenAIChatTextContentBlock and OpenAIChatImageUrlContentBlock schemas
  2. Updated user message schema to accept string | ContentBlock[]
  3. Added lowerUserPart function that converts:
    • TextPart → { type: "text", text: "..." }
    • MediaPart → { type: "image_url", image_url: { url: "data:<mediaType>;base64,<data>" } }
  4. Updated lowerUserMessage to use content blocks when media is present
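
A minimal sketch of the conversion described in the steps above, not the code from this PR. The `TextPart` and `MediaPart` shapes (including the `"media"` tag and the `mediaType` / `data` field names) are assumptions made for illustration, as is the choice to join text-only parts with an empty string. The real definitions live in packages/llm.

```typescript
// Hypothetical, simplified input part types (assumed for this sketch).
type TextPart = { type: "text"; text: string };
type MediaPart = { type: "media"; mediaType: string; data: string }; // data: base64 string (assumed)
type UserPart = TextPart | MediaPart;

// Output shapes follow the OpenAI Chat content block format named in the PR.
type OpenAIChatTextContentBlock = { type: "text"; text: string };
type OpenAIChatImageUrlContentBlock = {
  type: "image_url";
  image_url: { url: string };
};
type OpenAIChatContentBlock =
  | OpenAIChatTextContentBlock
  | OpenAIChatImageUrlContentBlock;

type OpenAIChatUserMessage = {
  role: "user";
  content: string | OpenAIChatContentBlock[];
};

// Convert a single part into one content block.
function lowerUserPart(part: UserPart): OpenAIChatContentBlock {
  if (part.type === "text") {
    return { type: "text", text: part.text };
  }
  // Media is embedded inline as a base64 data URL.
  return {
    type: "image_url",
    image_url: { url: `data:${part.mediaType};base64,${part.data}` },
  };
}

// Emit a plain string for text-only messages, content blocks otherwise.
function lowerUserMessage(parts: UserPart[]): OpenAIChatUserMessage {
  const texts: string[] = [];
  let hasMedia = false;
  for (const part of parts) {
    if (part.type === "text") texts.push(part.text);
    else hasMedia = true;
  }
  if (!hasMedia) {
    // Assumption: text parts are concatenated without a separator.
    return { role: "user", content: texts.join("") };
  }
  return { role: "user", content: parts.map(lowerUserPart) };
}
```

Keeping the plain-string form for text-only messages means requests without attachments are unchanged, which matters for OpenAI-compatible backends that may not accept the array form.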

This is a protocol-layer fix that complements the provider-layer fix in #21627. While #21627 addresses capability detection, this PR ensures the conversion logic at the protocol level correctly transforms media parts into the OpenAI-compatible format.

How did you verify your code works?

  1. 16 unit tests pass; the 1 failure is unrelated (missing API key)
  2. Added 3 new tests covering media handling:
    • prepares user message with media as image_url content block
    • prepares user message with mixed text and media
    • prepares user message with only text (no content blocks)
  3. Typecheck passes for all 14 packages

Checklist

  • I have tested my changes locally
  • I have not included unrelated changes in this PR

Comparison with #21627

#21627 fixes image support at the provider capability detection layer (1-line change in provider.ts).

This PR fixes image support at the protocol conversion layer in openai-chat.ts, ensuring MediaPart is correctly transformed to image_url content blocks. Both PRs address the same end goal but at different layers of the stack, and they are complementary.

Closes anomalyco#20802

Converts MediaPart to OpenAI image_url content block format in user messages.
@github-actions
Contributor

The following comment was made by an LLM and may be inaccurate:

Related PR Found:

Why it's related:
PR #21627 is the complementary provider-layer fix mentioned in the PR description. This PR (#26826) fixes the protocol conversion layer (transforming MediaPart to image_url content blocks in openai-chat.ts), while PR #21627 handles capability detection at the provider level. Both PRs work together to enable complete image support for OpenAI models, but they address different layers of the stack. They are not duplicates; they are complementary fixes that should both be merged.

Contributor

Copilot AI left a comment

Pull request overview

This PR fixes OpenAI Chat protocol request lowering to support multimodal user messages by translating MediaPart inputs into OpenAI Chat image_url content blocks, allowing image attachments to reach vision-capable OpenAI-compatible /chat/completions backends.

Changes:

  • Extended the OpenAI Chat request schema so user.content can be either a string or an array of {type: "text" | "image_url"} content blocks.
  • Implemented lowerUserPart / updated lowerUserMessage to convert MediaPart into image_url data URLs (base64), and emit content blocks when any media is present.
  • Added unit tests to cover media-only, mixed text+media, and text-only user message lowering behavior.
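
To illustrate the first bullet, here is a rough sketch of a runtime check for a `user.content` value that may be either a string or an array of content blocks. It uses plain TypeScript type guards rather than whatever schema library the repository uses; the names `isContentBlock` and `isUserContent` are hypothetical and do not come from the PR.

```typescript
type TextBlock = { type: "text"; text: string };
type ImageUrlBlock = { type: "image_url"; image_url: { url: string } };
type ContentBlock = TextBlock | ImageUrlBlock;
type UserContent = string | ContentBlock[];

// Narrow an unknown value to a non-null object with string keys.
function isRecord(value: unknown): value is Record<string, unknown> {
  return typeof value === "object" && value !== null;
}

// Accept only the two block shapes listed in the change summary.
function isContentBlock(value: unknown): value is ContentBlock {
  if (!isRecord(value)) return false;
  if (value.type === "text") {
    return typeof value.text === "string";
  }
  if (value.type === "image_url") {
    const image = value.image_url;
    return isRecord(image) && typeof image.url === "string";
  }
  return false;
}

// user.content is valid as a plain string or as an array of valid blocks.
function isUserContent(value: unknown): value is UserContent {
  if (typeof value === "string") return true;
  return Array.isArray(value) && value.every(isContentBlock);
}
```

For example, `isUserContent("hi")` and `isUserContent([{ type: "text", text: "hi" }])` both hold, while an `image_url` block with no `image_url.url` string is rejected.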

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.

| File | Description |
| --- | --- |
| packages/llm/src/protocols/openai-chat.ts | Adds schemas for multimodal content blocks and lowers MediaPart into OpenAI Chat image_url blocks for user messages. |
| packages/llm/test/provider/openai-chat.test.ts | Adds/updates tests asserting correct request-body lowering for media-only, mixed, and text-only user messages. |

Development

Successfully merging this pull request may close these issues.

Custom OpenAI-compatible providers: image file attachments do not reach vision-capable models correctly