fix: support image input in OpenAI Chat user messages #26826
Closes anomalyco#20802. Converts `MediaPart` to the OpenAI `image_url` content block format in user messages.
The following comment was made by an LLM, it may be inaccurate: Related PR Found:
Why it's related:
Pull request overview
This PR fixes OpenAI Chat protocol request lowering to support multimodal user messages by translating MediaPart inputs into OpenAI Chat image_url content blocks, allowing image attachments to reach vision-capable OpenAI-compatible /chat/completions backends.
Changes:
- Extended the OpenAI Chat request schema so `user.content` can be either a `string` or an array of `{type: "text" | "image_url"}` content blocks.
- Implemented `lowerUserPart` and updated `lowerUserMessage` to convert `MediaPart` into `image_url` data URLs (base64), and to emit content blocks when any media is present.
- Added unit tests to cover media-only, mixed text+media, and text-only user message lowering behavior.
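To make the widened `user.content` shape concrete, here is a minimal TypeScript sketch of the union and a runtime check for it. The type names and the guard functions (`isRecord`, `isContentBlock`, `isUserContent`) are hypothetical and written for illustration only; the PR's actual schemas live in `packages/llm/src/protocols/openai-chat.ts` and may be defined differently.

```typescript
// Illustrative types only; names are assumptions, not the PR's schemas.
type TextBlock = { type: "text"; text: string };
type ImageUrlBlock = { type: "image_url"; image_url: { url: string } };
type ContentBlock = TextBlock | ImageUrlBlock;

// user.content is either a plain string or an array of content blocks.
type UserContent = string | ContentBlock[];

function isRecord(value: unknown): value is Record<string, unknown> {
  return typeof value === "object" && value !== null;
}

// Accepts only the two block kinds named in the review summary.
function isContentBlock(value: unknown): value is ContentBlock {
  if (!isRecord(value)) return false;
  if (value["type"] === "text") {
    return typeof value["text"] === "string";
  }
  if (value["type"] === "image_url") {
    const image = value["image_url"];
    return isRecord(image) && typeof image["url"] === "string";
  }
  return false;
}

function isUserContent(value: unknown): value is UserContent {
  return (
    typeof value === "string" ||
    (Array.isArray(value) && value.every((item) => isContentBlock(item)))
  );
}
```

Under these assumptions, a plain string and a mixed array of text and `image_url` blocks both pass the check, while an `image_url` block whose `image_url` field is not an object is rejected.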
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| packages/llm/src/protocols/openai-chat.ts | Adds schemas for multimodal content blocks and lowers MediaPart into OpenAI Chat image_url blocks for user messages. |
| packages/llm/test/provider/openai-chat.test.ts | Adds/updates tests asserting correct request-body lowering for media-only, mixed, and text-only user messages. |
Issue for this PR
Closes #20802
Type of change
What does this PR do?
Adds image input support in the OpenAI Chat protocol layer by converting `MediaPart` to OpenAI's `image_url` content block format in user messages.

Root cause: In `packages/llm/src/protocols/openai-chat.ts`, the `lowerUserMessage` function only accepted `TextPart` content. When it encountered a `MediaPart`, it returned an `unsupportedContent` error, which prevented image attachments from reaching vision-capable models.

Fix:
- Added `OpenAIChatTextContentBlock` and `OpenAIChatImageUrlContentBlock` schemas
- Changed the user message content type to `string | ContentBlock[]`
- Added a `lowerUserPart` function that converts:
  - `TextPart` → `{ type: "text", text: "..." }`
  - `MediaPart` → `{ type: "image_url", image_url: { url: "data:<mediaType>;base64,<data>" } }`
- Updated `lowerUserMessage` to use content blocks when media is present

This is a protocol-layer fix that complements the provider-layer fix in #21627. While #21627 addresses capability detection, this PR ensures that the protocol-level conversion logic correctly transforms media parts into the OpenAI-compatible format.
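The lowering described above can be sketched roughly as follows. This is not the PR's code: the `TextPart` and `MediaPart` shapes (including the `mediaType` and `data` field names, with `data` assumed to already be base64), the simplified function signatures, and the choice to concatenate text parts in the text-only case are all assumptions made for illustration.

```typescript
// Assumed input part shapes; the real definitions are in the repository.
type TextPart = { type: "text"; text: string };
type MediaPart = { type: "media"; mediaType: string; data: string }; // data: base64 (assumed)
type UserPart = TextPart | MediaPart;

type ContentBlock =
  | { type: "text"; text: string }
  | { type: "image_url"; image_url: { url: string } };

type LoweredUserMessage = { role: "user"; content: string | ContentBlock[] };

// Convert one part into an OpenAI Chat content block.
function lowerUserPart(part: UserPart): ContentBlock {
  if (part.type === "text") {
    return { type: "text", text: part.text };
  }
  // Media becomes a data URL of the form data:<mediaType>;base64,<data>.
  return {
    type: "image_url",
    image_url: { url: `data:${part.mediaType};base64,${part.data}` },
  };
}

// Emit content blocks only when media is present; otherwise keep a string.
function lowerUserMessage(parts: readonly UserPart[]): LoweredUserMessage {
  const hasMedia = parts.some((part) => part.type === "media");
  if (!hasMedia) {
    // Assumption: text-only parts are concatenated into a single string.
    const text = parts.map((part) => (part.type === "text" ? part.text : "")).join("");
    return { role: "user", content: text };
  }
  return { role: "user", content: parts.map((part) => lowerUserPart(part)) };
}

// Example: one text part followed by one image part.
const lowered = lowerUserMessage([
  { type: "text", text: "What is in this image?" },
  { type: "media", mediaType: "image/png", data: "AAAA" },
]);
console.log(JSON.stringify(lowered, null, 2));
```

Keeping the plain-string form for text-only messages means requests without attachments stay unchanged on the wire, which matches the text-only test case listed in the PR.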
How did you verify your code works?
- `prepares user message with media as image_url content block`
- `prepares user message with mixed text and media`
- `prepares user message with only text (no content blocks)`

Checklist
Comparison with #21627
#21627 fixes image support at the provider capability detection layer (1-line change in `provider.ts`).

This PR fixes image support at the protocol conversion layer in `openai-chat.ts`, ensuring `MediaPart` is correctly transformed to `image_url` content blocks. Both PRs address the same end goal but at different layers of the stack, and they are complementary.