
fix(provider): stabilize LM Studio Qwen requests #26744

Open
ipogosov wants to merge 1 commit into anomalyco:dev from ipogosov:pr-lmstudio-qwen-cache-stability

Conversation

@ipogosov

@ipogosov ipogosov commented May 10, 2026

Issue for this PR

Closes #26750

Type of change

  • [x] Bug fix
  • [ ] New feature
  • [ ] Refactor / code improvement
  • [ ] Documentation

What does this PR do?

Note: re-saving the description after a pull_request_target race briefly mis-flagged the type-of-change checkbox.

LM Studio's prefix cache only hits when the tokenized prompt prefix is byte-stable across turns. When OpenCode replays a Qwen / QwQ-family model's conversation history through LM Studio's OpenAI-compatible endpoint, two things make that prefix unstable:

  1. Historical assistant reasoning content gets re-rendered differently by the Qwen chat template once a new user message is appended, so the same historical assistant turn tokenizes differently the second time it is in the prompt.
  2. Standalone role: "tool" messages get rendered inconsistently by Qwen-style templates, while <tool_response>...</tool_response> blocks embedded in the surrounding turn render stably; both shapes are illustrated below.
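
For illustration, a minimal sketch of the two shapes the same tool result can take in the outgoing OpenAI-compatible body (the values are hypothetical; the inlined form attaches to the previous turn, as described below):

```ts
// Shape A: a standalone OpenAI-compatible tool message, which Qwen-style
// templates render inconsistently across replays.
const standalone = { role: "tool", tool_call_id: "call_1", content: '{"ok":true}' }

// Shape B: the same result inlined onto the previous turn as a
// <tool_response> block, which renders byte-stably.
const inlinedText = '<tool_response>\n{"ok":true}\n</tool_response>'
```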

This PR makes the model-visible history stable for that specific provider/model shape, without changing behavior for non-Qwen OpenAI-compatible backends or for Anthropic/OpenAI proper:

  • Drop replayed assistant reasoning content for LM Studio Qwen-shaped requests. Reasoning is kept for the live turn (where the model emits it) but stripped from history before it is replayed, so the prefix that ends up in the cache does not contain content that re-renders unpredictably. (This rewrite and the tool-message inlining are sketched after this list.)
  • Inline tool messages as <tool_response> blocks. In the outgoing OpenAI-compatible JSON body, raw role: "tool" messages are converted into <tool_response>...</tool_response> text wrapped onto the previous turn for Qwen/QwQ-like LM Studio models. Non-Qwen targets keep the normal OpenAI-compatible tool-message shape.
  • Don't leak local compatibility options into the SDK call. The flags that drive this normalization are kept out of providerOptions so the underlying provider/SDK only sees standard fields.
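
A minimal sketch of the shape of these rewrites (the function names, detection heuristic, and flag name here are assumptions for illustration, not the actual OpenCode implementation):

```ts
type Msg = { role: string; content?: string; reasoning_content?: string }

// Assumed detection: LM Studio provider plus a Qwen/QwQ-looking model ID.
function isQwenShaped(providerID: string, modelID: string): boolean {
  return providerID === "lmstudio" && /qwen|qwq/i.test(modelID)
}

function normalizeQwenHistory(history: Msg[]): Msg[] {
  const out: Msg[] = []
  for (const msg of history) {
    if (msg.role === "assistant") {
      // Strip replayed reasoning so the historical turn tokenizes the
      // same way every time it appears in the prompt prefix.
      const { reasoning_content: _dropped, ...rest } = msg
      out.push(rest)
    } else if (msg.role === "tool") {
      // Fold the standalone tool message into the previous turn as a
      // <tool_response> block, which Qwen-style templates render stably.
      const prev = out[out.length - 1]
      const block = `<tool_response>\n${msg.content ?? ""}\n</tool_response>`
      if (prev) prev.content = [prev.content, block].filter(Boolean).join("\n")
      else out.push({ role: "user", content: block })
    } else {
      out.push(msg)
    }
  }
  return out
}

// Keep the local compat flags out of providerOptions so the SDK only
// sees standard fields (lmstudioQwenCompat is a hypothetical flag name).
function stripCompatFlags<T extends Record<string, unknown>>(opts: T) {
  const { lmstudioQwenCompat: _local, ...sdkVisible } = opts as T & { lmstudioQwenCompat?: unknown }
  return sdkVisible
}
```

The live turn is untouched: only replayed history goes through the normalization, so the model still emits and sees its own reasoning for the current turn.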

There are tests covering both the default normalization behavior and the opt-out (i.e. that a non-Qwen LM Studio model is not affected). The change is gated by provider+model detection; it does not run for Anthropic, OpenAI, or arbitrary OpenAI-compatible endpoints.

This is intentionally separate from the plan-mode reminder persistence fix in the sibling PR: that one fixes a general OpenCode history-stability bug, while this PR adds provider/model compatibility behavior for LM Studio Qwen-style chat templates.

How did you verify your code works?

  • bun typecheck from packages/opencode
  • bun test test/provider/transform.test.ts — covers the default tool-message-to-<tool_response> rewrite for LM Studio Qwen-shaped requests, the reasoning-strip on replayed assistant turns, and the opt-out path for non-Qwen models so they keep the standard OpenAI-compatible shape. A sketch of these tests follows this list.
  • Manually exercised against LM Studio running a Qwen-family model: confirmed that on turn 2+ the prompt prefix matches the prefix that was sent on turn 1, so LM Studio's prefix-cache hit ratio stops collapsing on multi-turn sessions.
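
For a sense of what those tests assert, a hypothetical bun:test sketch against the normalizeQwenHistory / isQwenShaped sketch above (the real tests live in test/provider/transform.test.ts):

```ts
import { describe, expect, test } from "bun:test"

describe("LM Studio Qwen request normalization (sketch)", () => {
  test("inlines tool messages as <tool_response> blocks", () => {
    const out = normalizeQwenHistory([
      { role: "assistant", content: "calling tool" },
      { role: "tool", content: '{"ok":true}' },
    ])
    expect(out).toHaveLength(1)
    expect(out[0].content).toContain("<tool_response>")
  })

  test("strips replayed reasoning from assistant history", () => {
    const out = normalizeQwenHistory([
      { role: "assistant", content: "answer", reasoning_content: "thinking..." },
    ])
    expect(out[0]).not.toHaveProperty("reasoning_content")
  })

  test("non-Qwen LM Studio models opt out", () => {
    expect(isQwenShaped("lmstudio", "llama-3.1-8b")).toBe(false)
  })
})
```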

Screenshots / recordings

N/A, non-UI change.

Checklist

  • [x] I have tested my changes locally
  • [x] I have not included unrelated changes in this PR

@github-actions github-actions Bot added the needs:compliance label May 10, 2026
@github-actions
Contributor

Thanks for your contribution!

This PR doesn't have a linked issue. All PRs must reference an existing issue.

Please:

  1. Open an issue describing the bug/feature (if one doesn't exist)
  2. Add Fixes #<number> or Closes #<number> to this PR description

See CONTRIBUTING.md for details.

@github-actions
Contributor

The following comment was made by an LLM; it may be inaccurate:

Based on my search results, here are the potentially related PRs:

  1. PR #15732, "feat(opencode): add dynamic configuration and context discovery for LM Studio": related because it also involves LM Studio provider configuration and handling.
  2. PR #14743, "fix(cache): improve Anthropic prompt cache hit rate with system split and tool stability": related because it addresses similar prompt-cache stability issues with tool messages.
  3. PR #25367, "fix(session): cache messages across prompt loop to preserve prompt cache byte-identity": related because it focuses on maintaining cache stability across requests.
  4. PR #25100, "feat(opencode): cache-aligned compaction to reuse prefix cache": related because it addresses prefix-cache optimization.

The most directly related appears to be PR #15732 since it specifically deals with LM Studio provider behavior. The others address related prompt caching and message stability concerns that may overlap with your Qwen request stabilization work.

@github-actions github-actions Bot removed the needs:issue and needs:compliance labels May 10, 2026
@github-actions
Contributor

Thanks for updating your PR! It now meets our contributing guidelines. 👍

@github-actions github-actions Bot added and then removed the needs:compliance label May 10, 2026

Development

Successfully merging this pull request may close these issues.

LM Studio + Qwen: prompt prefix is not byte-stable across turns, breaks prefix cache
