
fix(provider): stabilize LM Studio Qwen requests #26744

Open
ipogosov wants to merge 1 commit into anomalyco:dev from ipogosov:pr-lmstudio-qwen-cache-stability

Conversation

@ipogosov

@ipogosov ipogosov commented May 10, 2026

Issue for this PR

Closes #26750

Type of change

  • [x] Bug fix
  • [ ] New feature
  • [ ] Refactor / code improvement
  • [ ] Documentation

What does this PR do?

Note: re-saving the description after a pull_request_target race briefly mis-flagged the type-of-change checkbox.

LM Studio's prefix cache only hits when the tokenized prompt prefix is byte-stable across turns. When OpenCode replays a Qwen / QwQ-family model's conversation history through LM Studio's OpenAI-compatible endpoint, two things make that prefix unstable:

  1. Historical assistant reasoning content gets re-rendered differently by the Qwen chat template once a new user message is appended, so the same historical assistant turn tokenizes differently the second time it is in the prompt.
  2. Standalone role: "tool" messages get rendered inconsistently by Qwen-style templates, while <tool_response>...</tool_response> blocks embedded in the surrounding turn render stably; both shapes are illustrated below.
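
For illustration, a minimal sketch of the two shapes the same tool result can take in the outgoing OpenAI-compatible body (the values are hypothetical; the inlined form attaches to the previous turn, as described below):

```ts
// Shape A: a standalone OpenAI-compatible tool message, which Qwen-style
// templates render inconsistently across replays.
const standalone = { role: "tool", tool_call_id: "call_1", content: '{"ok":true}' }

// Shape B: the same result inlined onto the previous turn as a
// <tool_response> block, which renders byte-stably.
const inlinedText = '<tool_response>\n{"ok":true}\n</tool_response>'
```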

This PR makes the model-visible history stable for that specific provider/model shape, without changing behavior for non-Qwen OpenAI-compatible backends or for Anthropic/OpenAI proper:

  • Drop replayed assistant reasoning content for LM Studio Qwen-shaped requests. Reasoning is kept for the live turn (where the model emits it) but stripped from history before it is replayed, so the prefix that ends up in the cache does not contain content that re-renders unpredictably. (This rewrite and the tool-message inlining are sketched after this list.)
  • Inline tool messages as <tool_response> blocks. In the outgoing OpenAI-compatible JSON body, raw role: "tool" messages are converted into <tool_response>...</tool_response> text wrapped onto the previous turn for Qwen/QwQ-like LM Studio models. Non-Qwen targets keep the normal OpenAI-compatible tool-message shape.
  • Don't leak local compatibility options into the SDK call. The flags that drive this normalization are kept out of providerOptions so the underlying provider/SDK only sees standard fields.
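
A minimal sketch of the shape of these rewrites (the function names, detection heuristic, and flag name here are assumptions for illustration, not the actual OpenCode implementation):

```ts
type Msg = { role: string; content?: string; reasoning_content?: string }

// Assumed detection: LM Studio provider plus a Qwen/QwQ-looking model ID.
function isQwenShaped(providerID: string, modelID: string): boolean {
  return providerID === "lmstudio" && /qwen|qwq/i.test(modelID)
}

function normalizeQwenHistory(history: Msg[]): Msg[] {
  const out: Msg[] = []
  for (const msg of history) {
    if (msg.role === "assistant") {
      // Strip replayed reasoning so the historical turn tokenizes the
      // same way every time it appears in the prompt prefix.
      const { reasoning_content: _dropped, ...rest } = msg
      out.push(rest)
    } else if (msg.role === "tool") {
      // Fold the standalone tool message into the previous turn as a
      // <tool_response> block, which Qwen-style templates render stably.
      const prev = out[out.length - 1]
      const block = `<tool_response>\n${msg.content ?? ""}\n</tool_response>`
      if (prev) prev.content = [prev.content, block].filter(Boolean).join("\n")
      else out.push({ role: "user", content: block })
    } else {
      out.push(msg)
    }
  }
  return out
}

// Keep the local compat flags out of providerOptions so the SDK only
// sees standard fields (lmstudioQwenCompat is a hypothetical flag name).
function stripCompatFlags<T extends Record<string, unknown>>(opts: T) {
  const { lmstudioQwenCompat: _local, ...sdkVisible } = opts as T & { lmstudioQwenCompat?: unknown }
  return sdkVisible
}
```

The live turn is untouched: only replayed history goes through the normalization, so the model still emits and sees its own reasoning for the current turn.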

There are tests covering both the default normalization behavior and the opt-out (i.e. that a non-Qwen LM Studio model is not affected). The change is gated by provider+model detection; it does not run for Anthropic, OpenAI, or arbitrary OpenAI-compatible endpoints.

This is intentionally separate from the plan-mode reminder persistence fix in the sibling PR: that one fixes a general OpenCode history-stability bug, while this PR adds provider/model compatibility behavior for LM Studio Qwen-style chat templates.

How did you verify your code works?

  • bun typecheck from packages/opencode
  • bun test test/provider/transform.test.ts — covers the default tool-message-to-<tool_response> rewrite for LM Studio Qwen-shaped requests, the reasoning-strip on replayed assistant turns, and the opt-out path for non-Qwen models so they keep the standard OpenAI-compatible shape. A sketch of these tests follows this list.
  • Manually exercised against LM Studio running a Qwen-family model: confirmed that on turn 2+ the prompt prefix matches the prefix that was sent on turn 1, so LM Studio's prefix-cache hit ratio stops collapsing on multi-turn sessions.
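
For a sense of what those tests assert, a hypothetical bun:test sketch against the normalizeQwenHistory / isQwenShaped sketch above (the real tests live in test/provider/transform.test.ts):

```ts
import { describe, expect, test } from "bun:test"

describe("LM Studio Qwen request normalization (sketch)", () => {
  test("inlines tool messages as <tool_response> blocks", () => {
    const out = normalizeQwenHistory([
      { role: "assistant", content: "calling tool" },
      { role: "tool", content: '{"ok":true}' },
    ])
    expect(out).toHaveLength(1)
    expect(out[0].content).toContain("<tool_response>")
  })

  test("strips replayed reasoning from assistant history", () => {
    const out = normalizeQwenHistory([
      { role: "assistant", content: "answer", reasoning_content: "thinking..." },
    ])
    expect(out[0]).not.toHaveProperty("reasoning_content")
  })

  test("non-Qwen LM Studio models opt out", () => {
    expect(isQwenShaped("lmstudio", "llama-3.1-8b")).toBe(false)
  })
})
```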

Screenshots / recordings

N/A, non-UI change.

Checklist

  • [x] I have tested my changes locally
  • [x] I have not included unrelated changes in this PR

@github-actions github-actions Bot added the needs:compliance label May 10, 2026
@github-actions
Contributor

Thanks for your contribution!

This PR doesn't have a linked issue. All PRs must reference an existing issue.

Please:

  1. Open an issue describing the bug/feature (if one doesn't exist)
  2. Add Fixes #<number> or Closes #<number> to this PR description

See CONTRIBUTING.md for details.

@github-actions
Contributor

The following comment was made by an LLM; it may be inaccurate:

Based on my search results, here are the potentially related PRs:

  1. PR #15732, "feat(opencode): add dynamic configuration and context discovery for LM Studio": related because it also involves LM Studio provider configuration and handling.
  2. PR #14743, "fix(cache): improve Anthropic prompt cache hit rate with system split and tool stability": related because it addresses similar prompt-cache stability issues with tool messages.
  3. PR #25367, "fix(session): cache messages across prompt loop to preserve prompt cache byte-identity": related because it focuses on maintaining cache stability across requests.
  4. PR #25100, "feat(opencode): cache-aligned compaction to reuse prefix cache": related because it addresses prefix-cache optimization.

The most directly related appears to be PR #15732 since it specifically deals with LM Studio provider behavior. The others address related prompt caching and message stability concerns that may overlap with your Qwen request stabilization work.

@github-actions github-actions Bot removed the needs:issue and needs:compliance labels May 10, 2026
@github-actions
Contributor

Thanks for updating your PR! It now meets our contributing guidelines. 👍

@github-actions github-actions Bot added and then removed the needs:compliance label May 10, 2026

Development

Successfully merging this pull request may close these issues.

LM Studio + Qwen: prompt prefix is not byte-stable across turns, breaks prefix cache
