diff --git a/docs/_data/nav.yml b/docs/_data/nav.yml
index ce59a8490..5256bb387 100644
--- a/docs/_data/nav.yml
+++ b/docs/_data/nav.yml
@@ -144,6 +144,8 @@
items:
- title: Tips & Best Practices
url: /guides/tips/
+ - title: Thinking / Reasoning
+ url: /guides/thinking/
- title: Managing Secrets
url: /guides/secrets/
- title: Go SDK
diff --git a/docs/guides/thinking/index.md b/docs/guides/thinking/index.md
new file mode 100644
index 000000000..244848b06
--- /dev/null
+++ b/docs/guides/thinking/index.md
@@ -0,0 +1,330 @@
+---
+title: "Thinking / Reasoning"
+description: "Control how much a model reasons before responding. Works across OpenAI, Anthropic, Google Gemini, and AWS Bedrock."
+permalink: /guides/thinking/
+---
+
+# Thinking / Reasoning
+
+_Control how much a model reasons before responding. Works across OpenAI, Anthropic, Google Gemini, and AWS Bedrock._
+
+## What Is Thinking?
+
+Several modern models support an extended reasoning phase that happens before they produce visible output. During this phase the model plans, evaluates options, and works through the problem — internally, not shown in the response by default. This typically improves accuracy on complex tasks like coding, math, and multi-step planning, at the cost of higher token usage and latency.
+
+docker-agent exposes this through a single `thinking_budget` field on any named model. The value format differs slightly by provider, but the semantics are the same: higher effort means more thorough reasoning.
+
+
+
Think tool vs. thinking budget
+
+
The think tool is a scratchpad for models that lack native reasoning. If your model supports thinking_budget, you do not need the think tool.
+
+
+## Quick Reference
+
+| Provider | Format | Values | Default |
+| -------------- | ---------- | ------------------------------------------------------------ | ------------ |
+| OpenAI | string | `minimal`, `low`, `medium`, `high`, `xhigh`, `none`, `adaptive/` (`max` only via `adaptive/max`) | `medium` |
+| Anthropic | int or str | 1024–32768 tokens, or `adaptive`, `low`–`max`, `none` | off |
+| Gemini 2.5 | int | `0` (off), `-1` (dynamic), or token count (max 24576 / 32768) | `-1` (dynamic)|
+| Gemini 3 | string | `minimal`, `low`, `medium`, `high` | model-dependent |
+| AWS Bedrock | int or str | 1024–32768 tokens (`minimal`–`max` mapped to tokens) | off |
+| xAI / Mistral | string | `minimal`, `low`, `medium`, `high`, `xhigh`, `none` | off |
+
+## OpenAI
+
+OpenAI reasoning models (o-series, gpt-5, gpt-5-mini) use a string effort level that maps to their `reasoning_effort` API parameter.
+
+```yaml
+models:
+ gpt-thinker:
+ provider: openai
+ model: gpt-5-mini
+ thinking_budget: high # minimal | low | medium | high | xhigh
+```
+
+**Effort levels:**
+
+| Level | Description |
+| --------- | -------------------------------------------------------- |
+| `none` | Disable thinking (alias for `0`). Minimum output floor still applies. |
+| `minimal` | Fastest; lightest reasoning pass. |
+| `low` | Quick reasoning for straightforward tasks. |
+| `medium` | Balanced default. |
+| `high` | More thorough; recommended for complex tasks. |
+| `xhigh` | Near-maximum effort; slower but most accurate. |
+
+
+
Tokens and max_tokens
+
+
OpenAI reasoning models always reason internally — even with thinking_budget: none there are hidden reasoning tokens that count against max_tokens. docker-agent automatically raises the output-token floor so hidden reasoning cannot starve visible text output.
+
+
+### Adaptive thinking (OpenAI)
+
+Use `adaptive/` to let the model decide when to apply extended reasoning and scale effort dynamically:
+
+```yaml
+models:
+ gpt-adaptive:
+ provider: openai
+ model: gpt-5-mini
+ thinking_budget: adaptive/medium # adaptive/low | adaptive/medium | adaptive/high | adaptive/xhigh | adaptive/max
+```
+
+> `max` is only valid inside the `adaptive/` prefix — `thinking_budget: max` (bare) is not accepted by OpenAI.
+
+## Anthropic
+
+Anthropic Claude supports two thinking modes: a **token budget** (older models) and **adaptive / effort-based** thinking (newer models).
+
+### Token budget (Claude 4 and earlier)
+
+Set an explicit number of thinking tokens (1024–32768). This must be less than `max_tokens`:
+
+```yaml
+models:
+ claude-thinker:
+ provider: anthropic
+ model: claude-sonnet-4-5
+ thinking_budget: 16384 # tokens reserved for internal reasoning
+```
+
+docker-agent auto-adjusts `max_tokens` when you set a thinking budget but leave `max_tokens` at its default. If you set `max_tokens` explicitly, it must be greater than `thinking_budget`.
+
+### Adaptive thinking (Claude Opus 4.6+)
+
+Newer Claude models support adaptive thinking, where the model decides how much to think. Use `adaptive` or pair it with an effort level:
+
+```yaml
+models:
+ claude-adaptive:
+ provider: anthropic
+ model: claude-opus-4-6
+ thinking_budget: adaptive # model decides effort
+
+ claude-adaptive-low:
+ provider: anthropic
+ model: claude-opus-4-6
+ thinking_budget: low # adaptive with low effort: low | medium | high | max
+```
+
+**Adaptive effort levels:**
+
+| Level | Description |
+| --------- | ------------------------------------------------- |
+| `low` | Minimal thinking; fastest adaptive mode. |
+| `medium` | Moderate effort. |
+| `high` | Thorough reasoning; default for `adaptive`. |
+| `max` | Maximum effort. |
+
+### Disabling thinking
+
+```yaml
+thinking_budget: none # or 0
+```
+
+### Interleaved thinking
+
+By default, thinking is non-interleaved (the model reasons once before answering). Enable interleaved thinking to allow reasoning during tool calls — useful for complex agentic tasks:
+
+```yaml
+models:
+ claude-interleaved:
+ provider: anthropic
+ model: claude-sonnet-4-5
+ thinking_budget: 16384
+ provider_opts:
+ interleaved_thinking: true
+```
+
+
+
Temperature and top_p
+
+
When extended thinking is enabled, Anthropic requires temperature=1.0. docker-agent automatically suppresses any temperature or top_p settings you have configured — they are silently ignored while thinking is active.
+
+
+### Thinking display
+
+Claude Opus 4.7 hides thinking content by default. Use `thinking_display` in `provider_opts` to control what you receive:
+
+```yaml
+models:
+ opus-47:
+ provider: anthropic
+ model: claude-opus-4-7
+ thinking_budget: adaptive
+ provider_opts:
+ thinking_display: summarized # summarized | display | omitted
+```
+
+| Value | Behavior |
+| ------------ | ------------------------------------------------------------------------------------- |
+| `summarized` | Thinking blocks returned with a text summary (default for Claude 4 models pre-4.7). |
+| `display` | Full thinking blocks returned for display. |
+| `omitted` | Thinking blocks hidden — only the signature is returned (default for Opus 4.7). |
+
+Full thinking tokens are billed regardless of `thinking_display`.
+
+### Task budget (Anthropic)
+
+`task_budget` caps total tokens across an entire multi-step agentic task (thinking + tool calls + output combined):
+
+```yaml
+models:
+ opus-bounded:
+ provider: anthropic
+ model: claude-opus-4-7
+ thinking_budget: adaptive
+ task_budget: 128000 # total token ceiling for the whole task
+```
+
+See the [Anthropic provider page]({{ '/providers/anthropic/#task-budget' | relative_url }}) for details.
+
+## Google Gemini
+
+Gemini 2.5 and Gemini 3 use different formats.
+
+### Gemini 2.5 (token budget)
+
+```yaml
+models:
+ gemini-off:
+ provider: google
+ model: gemini-2.5-flash
+ thinking_budget: 0 # disable thinking
+
+ gemini-dynamic:
+ provider: google
+ model: gemini-2.5-flash
+ thinking_budget: -1 # let the model decide (default)
+
+ gemini-fixed:
+ provider: google
+ model: gemini-2.5-flash
+ thinking_budget: 8192 # fixed token budget (max 24576 for Flash, 32768 for Pro)
+```
+
+### Gemini 3 (level-based)
+
+```yaml
+models:
+ gemini3-flash:
+ provider: google
+ model: gemini-3-flash
+ thinking_budget: medium # minimal | low | medium | high
+
+ gemini3-pro:
+ provider: google
+ model: gemini-3-pro
+ thinking_budget: high # low | high (Pro supports fewer levels)
+```
+
+## AWS Bedrock (Claude)
+
+Bedrock Claude uses a token budget like Anthropic, but only supports integer token values. String effort levels (`minimal`–`max`) are mapped automatically:
+
+| Effort level | Token budget |
+| ------------ | ------------ |
+| `minimal` | 1,024 |
+| `low` | 2,048 |
+| `medium` | 8,192 |
+| `high` | 16,384 |
+| `xhigh`/`max`| 32,768 |
+
+```yaml
+models:
+ bedrock-claude-thinker:
+ provider: amazon-bedrock
+ model: global.anthropic.claude-sonnet-4-5-20250929-v1:0
+ thinking_budget: 8192 # or use an effort level: medium
+ provider_opts:
+ region: us-east-1
+
+ bedrock-claude-interleaved:
+ provider: amazon-bedrock
+ model: global.anthropic.claude-sonnet-4-5-20250929-v1:0
+ thinking_budget: high
+ provider_opts:
+ region: us-east-1
+ interleaved_thinking: true
+```
+
+
+
Bedrock thinking requirements
+
+
Bedrock Claude requires thinking_budget to be ≥ 1024 and less than max_tokens. docker-agent logs a warning and ignores the budget if either condition is violated. Interleaved thinking requires the interleaved-thinking-2025-05-14 beta header, which docker-agent adds automatically when interleaved_thinking: true is set.
+
+
+## xAI (Grok) and Mistral
+
+Both providers use the OpenAI-compatible API and accept the same effort strings:
+
+```yaml
+models:
+ grok-thinker:
+ provider: xai
+ model: grok-3
+ thinking_budget: high # minimal | low | medium | high | xhigh | none
+
+ mistral-thinker:
+ provider: mistral
+ model: mistral-large-latest
+ thinking_budget: medium # minimal | low | medium | high | xhigh | none
+```
+
+Thinking is **off by default** for these providers. Set `thinking_budget` explicitly to enable it.
+
+## Disabling Thinking
+
+Use `none` or `0` to disable thinking on any provider:
+
+```yaml
+models:
+ fast-model:
+ provider: openai
+ model: gpt-5-mini
+ thinking_budget: none
+
+ gemini-no-think:
+ provider: google
+ model: gemini-2.5-flash
+ thinking_budget: 0
+```
+
+## Choosing an Effort Level
+
+| Task complexity | Recommended level |
+| -------------------------------- | ----------------------- |
+| Simple factual Q&A | `none` / `minimal` |
+| General-purpose chat | `low` / `medium` |
+| Coding, debugging, analysis | `medium` / `high` |
+| Complex reasoning, planning | `high` / `xhigh` |
+| Research, difficult math/logic | `xhigh` / `max` |
+| Long agentic tasks (Anthropic) | `adaptive` |
+
+## Sharing Thinking Config Across Models
+
+Define a provider with a default `thinking_budget` and all models that reference it inherit it:
+
+```yaml
+providers:
+ deep-anthropic:
+ provider: anthropic
+ thinking_budget: high
+ max_tokens: 32768
+
+models:
+ claude-smart:
+ provider: deep-anthropic
+ model: claude-sonnet-4-5 # inherits thinking_budget: high
+
+ claude-faster:
+ provider: deep-anthropic
+ model: claude-haiku-4-5
+ thinking_budget: low # overrides to low
+```
+
+## Full Example
+
+See [`examples/thinking_budget.yaml`](https://github.com/docker/docker-agent/blob/main/examples/thinking_budget.yaml) for a runnable config covering all providers.
diff --git a/docs/providers/bedrock/index.md b/docs/providers/bedrock/index.md
index 9a0beba81..1f9e61e2a 100644
--- a/docs/providers/bedrock/index.md
+++ b/docs/providers/bedrock/index.md
@@ -80,7 +80,7 @@ models:
| `role_session_name` | string | docker-agent-bedrock-session | Session name for assumed role |
| `external_id` | string | — | External ID for role assumption |
| `endpoint_url` | string | — | Custom endpoint (VPC/testing) |
-| `interleaved_thinking` | bool | true | Reasoning during tool calls (Claude) |
+| `interleaved_thinking` | bool | false | Allow reasoning between tool calls (Claude); adds the required beta header automatically |
| `disable_prompt_caching` | bool | false | Disable automatic prompt caching |
## Inference Profiles
@@ -101,6 +101,56 @@ Use inference profile prefixes for optimal routing:
+## Thinking Budget (Claude on Bedrock)
+
+Bedrock Claude models support extended thinking — an internal reasoning phase before the model produces its response. Set `thinking_budget` to a token count (1024–32768) or an effort level string that maps automatically:
+
+| Effort level | Token budget |
+| ------------ | ------------ |
+| `minimal` | 1,024 |
+| `low` | 2,048 |
+| `medium` | 8,192 |
+| `high` | 16,384 |
+| `xhigh`/`max`| 32,768 |
+
+```yaml
+models:
+ bedrock-claude-thinking:
+ provider: amazon-bedrock
+ model: global.anthropic.claude-sonnet-4-5-20250929-v1:0
+ thinking_budget: 8192 # tokens, or use an effort string like "medium"
+ max_tokens: 16384 # must be > thinking_budget
+ provider_opts:
+ region: us-east-1
+```
+
+`thinking_budget` must be ≥ 1024 and less than `max_tokens`. Values outside this range are logged as a warning and ignored.
+
+
+
Temperature and top_p
+
+
Bedrock Claude suppresses temperature and top_p while extended thinking is active — Anthropic requires temperature=1.0 internally.
+
+
+## Interleaved Thinking (Claude on Bedrock)
+
+Interleaved thinking lets the model reason between tool calls, not just at the start. This is useful for complex agentic tasks. Enable it alongside a thinking budget:
+
+```yaml
+models:
+ bedrock-claude-interleaved:
+ provider: amazon-bedrock
+ model: global.anthropic.claude-sonnet-4-5-20250929-v1:0
+ thinking_budget: high
+ provider_opts:
+ region: us-east-1
+ interleaved_thinking: true
+```
+
+docker-agent automatically adds the `interleaved-thinking-2025-05-14` beta header when `interleaved_thinking: true` is set. If you set `interleaved_thinking: false` while a thinking budget is active, a warning is logged because the budget may be ignored by Bedrock without the beta header.
+
+See the [Thinking / Reasoning guide]({{ '/guides/thinking/' | relative_url }}) for a full cross-provider overview.
+
## Prompt Caching
Automatically enabled for supported models to reduce latency and costs. System prompts, tool definitions, and recent messages are cached with a 5-minute TTL.
diff --git a/docs/providers/openai/index.md b/docs/providers/openai/index.md
index 4c69a85b2..79d589d95 100644
--- a/docs/providers/openai/index.md
+++ b/docs/providers/openai/index.md
@@ -49,16 +49,49 @@ Find more model names at [modelnames.ai](https://modelnames.ai/) or in the [offi
## Thinking Budget
-OpenAI uses effort level strings:
+OpenAI reasoning models (o-series, gpt-5, gpt-5-mini) support extended thinking through the `reasoning_effort` API parameter. Set `thinking_budget` to control the effort level:
```yaml
models:
- gpt-thinking:
+ gpt-thinker:
provider: openai
model: gpt-5-mini
- thinking_budget: low # minimal | low | medium (default) | high | xhigh | max | none | adaptive/
+ thinking_budget: high # minimal | low | medium | high | xhigh
```
+**Effort levels:**
+
+| Level | Description |
+| --------- | -------------------------------------------------------- |
+| `none` | Disable thinking (a minimum output floor still applies). |
+| `minimal` | Fastest; lightest reasoning pass. |
+| `low` | Quick reasoning for straightforward tasks. |
+| `medium` | Balanced default. |
+| `high` | More thorough; recommended for complex tasks. |
+| `xhigh` | Near-maximum effort; slower but most accurate. |
+
+
+
Hidden reasoning tokens
+
+
OpenAI reasoning models always produce hidden reasoning tokens that count against max_tokens — even with thinking_budget: none. docker-agent automatically raises the output-token floor so reasoning cannot starve visible text output.
+
+
+### Adaptive Thinking
+
+Use `adaptive/` to let the model scale effort dynamically based on task complexity:
+
+```yaml
+models:
+ gpt-adaptive:
+ provider: openai
+ model: gpt-5-mini
+ thinking_budget: adaptive/medium # adaptive/low | adaptive/medium | adaptive/high | adaptive/xhigh | adaptive/max
+```
+
+> `max` is only valid inside the `adaptive/` prefix — `thinking_budget: max` (bare) is not accepted by OpenAI.
+
+See the [Thinking / Reasoning guide]({{ '/guides/thinking/' | relative_url }}) for a cross-provider overview.
+