diff --git a/docs/adapters/anthropic.md b/docs/adapters/anthropic.md index 50e7ef155..677028d70 100644 --- a/docs/adapters/anthropic.md +++ b/docs/adapters/anthropic.md @@ -39,12 +39,12 @@ const stream = chat({ import { chat } from "@tanstack/ai"; import { createAnthropicChat } from "@tanstack/ai-anthropic"; -const adapter = createAnthropicChat(process.env.ANTHROPIC_API_KEY!, { +const adapter = createAnthropicChat("claude-sonnet-4-5", process.env.ANTHROPIC_API_KEY!, { // ... your config options }); const stream = chat({ - adapter: adapter("claude-sonnet-4-5"), + adapter, messages: [{ role: "user", content: "Hello!" }], }); ``` @@ -52,13 +52,13 @@ const stream = chat({ ## Configuration ```typescript -import { createAnthropicChat, type AnthropicChatConfig } from "@tanstack/ai-anthropic"; +import { createAnthropicChat, type AnthropicTextConfig } from "@tanstack/ai-anthropic"; -const config: Omit = { +const config: Omit = { baseURL: "https://api.anthropic.com", // Optional, for custom endpoints }; -const adapter = createAnthropicChat(process.env.ANTHROPIC_API_KEY!, config); +const adapter = createAnthropicChat("claude-sonnet-4-5", process.env.ANTHROPIC_API_KEY!, config); ``` @@ -194,39 +194,20 @@ ANTHROPIC_API_KEY=sk-ant-... ## API Reference -### `anthropicText(config?)` +Every factory pair follows the same shape: the short factory (`anthropicText`, `anthropicSummarize`) reads `ANTHROPIC_API_KEY` from the environment, while `createAnthropicChat` / `createAnthropicSummarize` take an explicit API key. Both take `model` as the first argument. -Creates an Anthropic chat adapter using environment variables. +### `anthropicText(model, config?)` / `createAnthropicChat(model, apiKey, config?)` -**Returns:** An Anthropic chat adapter instance. - -### `createAnthropicChat(apiKey, config?)` - -Creates an Anthropic chat adapter with an explicit API key. +Creates an Anthropic chat adapter. 
**Parameters:** -- `apiKey` - Your Anthropic API key -- `config.baseURL?` - Custom base URL (optional) - -**Returns:** An Anthropic chat adapter instance. - -### `anthropicSummarize(config?)` - -Creates an Anthropic summarization adapter using environment variables. - -**Returns:** An Anthropic summarize adapter instance. - -### `createAnthropicSummarize(apiKey, config?)` - -Creates an Anthropic summarization adapter with an explicit API key. - -**Parameters:** +- `model` - Claude model id (e.g. `"claude-sonnet-4-5"`, `"claude-opus-4-6"`) +- `config?.baseURL` - Custom base URL (optional) -- `apiKey` - Your Anthropic API key -- `config.baseURL?` - Custom base URL (optional) +### `anthropicSummarize(model, config?)` / `createAnthropicSummarize(model, apiKey, config?)` -**Returns:** An Anthropic summarize adapter instance. +Creates an Anthropic summarization adapter. ## Limitations diff --git a/docs/adapters/elevenlabs.md b/docs/adapters/elevenlabs.md index 83a108ef9..13d4dab59 100644 --- a/docs/adapters/elevenlabs.md +++ b/docs/adapters/elevenlabs.md @@ -13,9 +13,16 @@ keywords: - adapter --- -The ElevenLabs adapter provides realtime conversational voice AI for TanStack AI. Unlike text-focused adapters, the ElevenLabs adapter is **voice-focused** -- it integrates with TanStack AI's realtime system to enable voice-to-voice conversations. It does not support `chat()`, `embedding()`, or `summarize()`. +The ElevenLabs adapter is **voice-focused**. It exposes four capabilities: -ElevenLabs uses an **agent-based architecture** where you configure your conversational AI agent in the [ElevenLabs dashboard](https://elevenlabs.io/) (voice, personality, knowledge base, tools) and then connect to it at runtime. The adapter wraps the `@11labs/client` SDK for seamless integration with `useRealtimeChat` and `RealtimeClient`. 
+- **Realtime voice agents** (`elevenlabsRealtime` / `elevenlabsRealtimeToken`) — full-duplex voice-to-voice conversations powered by ElevenLabs Conversational AI agents. +- **Text-to-speech** (`elevenlabsSpeech`) — one-shot speech generation via `generateSpeech()`. +- **Music & sound effects** (`elevenlabsAudio`) — one-shot audio generation via `generateAudio()`. +- **Transcription** (`elevenlabsTranscription`) — speech-to-text via `generateTranscription()`. + +It does not support text `chat()` or `summarize()` — use OpenAI, Anthropic, or Gemini for those. + +The realtime adapter uses an **agent-based architecture** where you configure your conversational AI agent in the [ElevenLabs dashboard](https://elevenlabs.io/) (voice, personality, knowledge base, tools) and then connect to it at runtime. The adapter wraps the `@11labs/client` SDK for seamless integration with `useRealtimeChat` and `RealtimeClient`. ## Installation @@ -252,6 +259,61 @@ ELEVENLABS_AGENT_ID=your-agent-id Get your API key from the [ElevenLabs dashboard](https://elevenlabs.io/). Create and configure agents in the **Conversational AI** section of the dashboard. 
+## Text-to-Speech + +For one-shot speech generation (not realtime), use `elevenlabsSpeech` with `generateSpeech()`: + +```typescript +import { generateSpeech } from "@tanstack/ai"; +import { elevenlabsSpeech } from "@tanstack/ai-elevenlabs"; + +const result = await generateSpeech({ + adapter: elevenlabsSpeech("eleven_v3"), + text: "Hello from ElevenLabs!", + voice: "Rachel", + format: "mp3", +}); + +console.log(result.audio); // Base64-encoded audio +``` + +## Music & Sound Effects + +`elevenlabsAudio` covers both music generation and sound effects depending on the model: + +```typescript +import { generateAudio } from "@tanstack/ai"; +import { elevenlabsAudio } from "@tanstack/ai-elevenlabs"; + +// Music generation +const music = await generateAudio({ + adapter: elevenlabsAudio("music_v1"), + prompt: "An upbeat synthwave track for a product launch", +}); + +// Sound effects +const sfx = await generateAudio({ + adapter: elevenlabsAudio("sound_effects_v1"), + prompt: "A glass shattering on concrete", +}); +``` + +## Transcription + +Transcribe audio with `elevenlabsTranscription`: + +```typescript +import { generateTranscription } from "@tanstack/ai"; +import { elevenlabsTranscription } from "@tanstack/ai-elevenlabs"; + +const result = await generateTranscription({ + adapter: elevenlabsTranscription("scribe_v1"), + audio: audioFile, +}); + +console.log(result.text); +``` + ## API Reference ### `elevenlabsRealtimeToken(options)` @@ -279,14 +341,26 @@ Creates an ElevenLabs realtime client adapter for use with `useRealtimeChat` or **Returns:** A `RealtimeAdapter` for use with `useRealtimeChat()` or `RealtimeClient`. +### `elevenlabsSpeech(model, config?)` / `createElevenLabsSpeech(model, apiKey, config?)` + +Creates an ElevenLabs text-to-speech adapter for use with `generateSpeech()`. 
+ +### `elevenlabsAudio(model, config?)` / `createElevenLabsAudio(model, apiKey, config?)` + +Creates an ElevenLabs audio adapter that covers both music generation and sound effects (selected via the model id) for use with `generateAudio()`. + +### `elevenlabsTranscription(model, config?)` / `createElevenLabsTranscription(model, apiKey, config?)` + +Creates an ElevenLabs transcription adapter for use with `generateTranscription()`. + ## Limitations -- **No text chat support** -- Use OpenAI, Anthropic, Gemini, or another text adapter for `chat()` -- **No embeddings or summarization** -- Use a text adapter for `embedding()` or `summarize()` -- **No image input** -- ElevenLabs realtime does not support sending images during a conversation -- **No runtime session updates** -- Session configuration is fixed at connection time -- **No time-domain audio data** -- Frequency data and volume levels are available, but waveform data is not -- **Agent required** -- You must create and configure an agent in the ElevenLabs dashboard before using this adapter +- **No text chat support** -- Use OpenAI, Anthropic, Gemini, or another text adapter for `chat()`. +- **No summarization** -- Use a text adapter for `summarize()`. +- **No image input** (realtime) -- ElevenLabs realtime does not support sending images during a conversation. +- **No runtime session updates** (realtime) -- Session configuration is fixed at connection time. +- **No time-domain audio data** (realtime) -- Frequency data and volume levels are available, but waveform data is not. +- **Agent required** (realtime) -- You must create and configure an agent in the ElevenLabs dashboard before using the realtime adapter. 
## Next Steps diff --git a/docs/adapters/fal.md b/docs/adapters/fal.md index 76ae71599..cc32c634d 100644 --- a/docs/adapters/fal.md +++ b/docs/adapters/fal.md @@ -405,10 +405,6 @@ Creates a fal.ai image adapter using the `FAL_KEY` environment variable or an ex **Returns:** A `FalImageAdapter` instance for use with `generateImage()`. -### `createFalImage(model, config?)` - -Alias for `falImage()`. - ### `falVideo(model, config?)` Creates a fal.ai video adapter using the `FAL_KEY` environment variable or an explicit config. @@ -421,10 +417,6 @@ Creates a fal.ai video adapter using the `FAL_KEY` environment variable or an ex **Returns:** A `FalVideoAdapter` instance for use with `generateVideo()` and `getVideoJobStatus()`. -### `createFalVideo(model, config?)` - -Alias for `falVideo()`. - ### `falSpeech(model, config?)` Creates a fal.ai text-to-speech adapter. diff --git a/docs/adapters/gemini.md b/docs/adapters/gemini.md index a3e4ff7e4..a19f491ac 100644 --- a/docs/adapters/gemini.md +++ b/docs/adapters/gemini.md @@ -42,12 +42,12 @@ const stream = chat({ import { chat } from "@tanstack/ai"; import { createGeminiChat } from "@tanstack/ai-gemini"; -const adapter = createGeminiChat(process.env.GEMINI_API_KEY!, { +const adapter = createGeminiChat("gemini-2.5-pro", process.env.GEMINI_API_KEY!, { // ... your config options }); const stream = chat({ - adapter: adapter("gemini-2.5-pro"), + adapter, messages: [{ role: "user", content: "Hello!" 
}], }); ``` @@ -55,13 +55,13 @@ const stream = chat({ ## Configuration ```typescript -import { createGeminiChat, type GeminiChatConfig } from "@tanstack/ai-gemini"; +import { createGeminiChat, type GeminiTextConfig } from "@tanstack/ai-gemini"; -const config: Omit = { +const config: Omit = { baseURL: "https://generativelanguage.googleapis.com/v1beta", // Optional }; -const adapter = createGeminiChat(process.env.GEMINI_API_KEY!, config); +const adapter = createGeminiChat("gemini-2.5-pro", process.env.GEMINI_API_KEY!, config); ``` @@ -324,69 +324,32 @@ These models use the dedicated `generateImages` API. ## API Reference -### `geminiText(config?)` +Every factory pair follows the same shape: the short factory (`geminiText`, `geminiImage`, …) reads `GEMINI_API_KEY` (or `GOOGLE_API_KEY`) from the environment, while the `create*` variant takes an explicit API key. Both take `model` as the first argument. -Creates a Gemini text/chat adapter using environment variables. +### `geminiText(model, config?)` / `createGeminiChat(model, apiKey, config?)` -**Returns:** A Gemini text adapter instance. - -### `createGeminiText(apiKey, config?)` - -Creates a Gemini text/chat adapter with an explicit API key. +Creates a Gemini text/chat adapter. **Parameters:** -- `apiKey` - Your Gemini API key -- `config.baseURL?` - Custom base URL (optional) - -**Returns:** A Gemini text adapter instance. - -### `geminiSummarize(config?)` - -Creates a Gemini summarization adapter using environment variables. - -**Returns:** A Gemini summarize adapter instance. - -### `createGeminiSummarize(apiKey, config?)` - -Creates a Gemini summarization adapter with an explicit API key. - -**Returns:** A Gemini summarize adapter instance. - -### `geminiImage(model, config?)` - -Creates a Gemini image adapter using environment variables. Automatically routes to the correct API based on model name — `gemini-*` models use `generateContent`, `imagen-*` models use `generateImages`. 
- -**Parameters:** - -- `model` - The model name (e.g., `"gemini-3.1-flash-image-preview"` or `"imagen-4.0-generate-001"`) -- `config.baseURL?` - Custom base URL (optional) - -**Returns:** A Gemini image adapter instance. - -### `createGeminiImage(model, apiKey, config?)` - -Creates a Gemini image adapter with an explicit API key. - -**Parameters:** +- `model` - Gemini chat model id (e.g. `"gemini-2.5-pro"`) +- `config?.baseURL` - Custom base URL (optional) -- `model` - The model name -- `apiKey` - Your Google API key -- `config.baseURL?` - Custom base URL (optional) +### `geminiSummarize(model, config?)` / `createGeminiSummarize(model, apiKey, config?)` -**Returns:** A Gemini image adapter instance. +Creates a Gemini summarization adapter. -### `geminiTTS(config?)` +### `geminiImage(model, config?)` / `createGeminiImage(model, apiKey, config?)` -Creates a Gemini TTS adapter using environment variables. +Creates a Gemini image adapter. Automatically routes to the correct API based on the model name — `gemini-*` models use `generateContent`, `imagen-*` models use `generateImages`. -**Returns:** A Gemini TTS adapter instance. +### `geminiSpeech(model, config?)` / `createGeminiSpeech(model, apiKey, config?)` -### `createGeminiTTS(apiKey, config?)` +Creates a Gemini text-to-speech adapter. _Experimental._ -Creates a Gemini TTS adapter with an explicit API key. +### `geminiAudio(model, config?)` / `createGeminiAudio(model, apiKey, config?)` -**Returns:** A Gemini TTS adapter instance. +Creates a Gemini Lyria music generation adapter. 
_Experimental._ ## Next Steps diff --git a/docs/adapters/grok.md b/docs/adapters/grok.md index b08cf4091..3f6db60b8 100644 --- a/docs/adapters/grok.md +++ b/docs/adapters/grok.md @@ -62,7 +62,7 @@ const adapter = createGrokText("grok-4", process.env.XAI_API_KEY!, config); ## Example: Chat Completion ```typescript -import { chat, toStreamResponse } from "@tanstack/ai"; +import { chat, toServerSentEventsResponse } from "@tanstack/ai"; import { grokText } from "@tanstack/ai-grok"; export async function POST(request: Request) { @@ -73,7 +73,7 @@ export async function POST(request: Request) { messages, }); - return toStreamResponse(stream); + return toServerSentEventsResponse(stream); } ``` @@ -155,6 +155,44 @@ const result = await generateImage({ console.log(result.images); ``` +## Text-to-Speech + +Generate speech with Grok TTS: + +```typescript +import { generateSpeech } from "@tanstack/ai"; +import { grokSpeech } from "@tanstack/ai-grok"; + +const result = await generateSpeech({ + adapter: grokSpeech("grok-tts"), + text: "Hello from Grok!", + voice: "default", + format: "mp3", +}); + +console.log(result.audio); // Base64-encoded audio +``` + +## Transcription + +Transcribe audio with Grok STT: + +```typescript +import { generateTranscription } from "@tanstack/ai"; +import { grokTranscription } from "@tanstack/ai-grok"; + +const result = await generateTranscription({ + adapter: grokTranscription("grok-stt"), + audio: audioFile, +}); + +console.log(result.text); +``` + +## Realtime Voice + +Grok also exposes a Realtime voice adapter (`grokRealtime`) and a token issuer (`grokRealtimeToken`) for low-latency voice conversations. See [Realtime Voice Chat](../media/realtime-chat) for the end-to-end flow. + ## Environment Variables Set your API key in environment variables: @@ -216,22 +254,24 @@ Creates a Grok summarization adapter with an explicit API key. **Returns:** A Grok summarize adapter instance. 
-### `grokImage(model, config?)` +### `grokImage(model, config?)` / `createGrokImage(model, apiKey, config?)` + +Creates a Grok image generation adapter. + +### `grokSpeech(model, config?)` / `createGrokSpeech(model, apiKey, config?)` -Creates a Grok image generation adapter using environment variables. +Creates a Grok text-to-speech adapter. -**Returns:** A Grok image adapter instance. +### `grokTranscription(model, config?)` / `createGrokTranscription(model, apiKey, config?)` -### `createGrokImage(model, apiKey, config?)` +Creates a Grok speech-to-text adapter. -Creates a Grok image generation adapter with an explicit API key. +### `grokRealtime(...)` / `grokRealtimeToken(...)` -**Returns:** A Grok image adapter instance. +Realtime voice adapter and token issuer. See [Realtime Voice Chat](../media/realtime-chat) for usage. ## Limitations -- **Text-to-Speech**: Grok does not support text-to-speech. Use OpenAI for TTS. -- **Transcription**: Grok does not support audio transcription. Use OpenAI's Whisper. - **Responses API Tools**: Server-side tools (web search, X search, code execution) are not supported through this adapter. Use the Chat Completions API with custom tools instead. ## Next Steps diff --git a/docs/adapters/groq.md b/docs/adapters/groq.md index b6ab11530..1c4c81644 100644 --- a/docs/adapters/groq.md +++ b/docs/adapters/groq.md @@ -44,7 +44,7 @@ const adapter = createGroqText("llama-3.3-70b-versatile", process.env.GROQ_API_K }); const stream = chat({ - adapter: adapter, + adapter, messages: [{ role: "user", content: "Hello!" 
}], }); ``` @@ -162,57 +162,6 @@ Groq offers a diverse selection of models from multiple providers: - `moonshotai/kimi-k2-instruct-0905` - Kimi K2 with 256K context - `qwen/qwen3-32b` - Qwen 3 with reasoning support -## Text-to-Speech - -Groq provides unique Text-to-Speech capabilities via Canopy Labs Orpheus models: - -```typescript -import { generateSpeech } from "@tanstack/ai"; -import { groqSpeech } from "@tanstack/ai-groq"; - -const result = await generateSpeech({ - adapter: groqSpeech("canopylabs/orpheus-v1-english"), - text: "Hello, welcome to TanStack AI!", - voice: "autumn", - format: "wav", -}); - -// result.audio contains base64-encoded audio -console.log(result.format); // "wav" -``` - -### English Voices - -Available voices: `autumn`, `diana`, `hannah`, `austin`, `daniel`, `troy` - -### Arabic Voices - -Available voices for Arabic model (`canopylabs/orpheus-arabic-saudi`): `fahad`, `sultan`, `lulwa`, `noura` - -### TTS Model Options - -```typescript -const result = await generateSpeech({ - adapter: groqSpeech("canopylabs/orpheus-v1-english"), - text: "High quality speech", - voice: "diana", - format: "wav", - modelOptions: { - sample_rate: 24000, // Audio sample rate in Hz - }, -}); -``` - -### Supported TTS Formats - -- `wav` (only format currently supported for Orpheus models) -- `mp3` -- `flac` -- `ogg` -- `mulaw` - -> **Note:** Additional formats (`mp3`, `flac`, `ogg`, `mulaw`) are defined for future compatibility but are not yet supported by Orpheus TTS models. - ## Environment Variables Set your API key in environment variables: @@ -248,31 +197,11 @@ Creates a Groq chat adapter with an explicit API key. **Returns:** A Groq chat adapter instance. -### `groqSpeech(model, config?)` - -Creates a Groq TTS adapter using environment variables. - -**Parameters:** - -- `model` - The TTS model name (e.g., `canopylabs/orpheus-v1-english`) - -**Returns:** A Groq speech adapter instance. 
- -### `createGroqSpeech(model, apiKey, config?)` - -Creates a Groq TTS adapter with an explicit API key. - -**Parameters:** - -- `model` - The TTS model name (e.g., `canopylabs/orpheus-v1-english`) -- `apiKey` - Your Groq API key -- `config.baseURL?` - Custom base URL (optional) - -**Returns:** A Groq speech adapter instance. - ## Limitations -- **Image Generation**: Groq does not support image generation. Use OpenAI or Gemini for image generation. +- **Text-to-Speech**: Groq does not currently expose a TTS adapter. Use OpenAI, Gemini, ElevenLabs, or fal for speech generation. +- **Image Generation**: Groq does not support image generation. Use OpenAI, Gemini, or fal for image generation. +- **Transcription**: Groq does not currently expose a transcription adapter through TanStack AI. ## Next Steps diff --git a/docs/adapters/ollama.md b/docs/adapters/ollama.md index 0a83335a4..1dc0a0458 100644 --- a/docs/adapters/ollama.md +++ b/docs/adapters/ollama.md @@ -40,10 +40,10 @@ const stream = chat({ import { chat } from "@tanstack/ai"; import { createOllamaChat } from "@tanstack/ai-ollama"; -const adapter = createOllamaChat("http://your-server:11434"); +const adapter = createOllamaChat("llama3", "http://your-server:11434"); const stream = chat({ - adapter: adapter("llama3"), + adapter, messages: [{ role: "user", content: "Hello!" }], }); ``` @@ -53,11 +53,14 @@ const stream = chat({ ```typescript import { createOllamaChat } from "@tanstack/ai-ollama"; -// Default localhost -const adapter = createOllamaChat(); +// Custom host (URL string) +const adapter = createOllamaChat("llama3", "http://your-server:11434"); -// Custom host -const adapter = createOllamaChat("http://your-server:11434"); +// Custom client config (e.g., custom headers, fetch) +const adapter2 = createOllamaChat("llama3", { + host: "http://your-server:11434", + headers: { Authorization: "Bearer ..." }, +}); ``` ## Available Models @@ -230,7 +233,7 @@ The server runs on `http://localhost:11434` by default. 
## Running on a Remote Server ```typescript -const adapter = createOllamaChat("http://your-server:11434"); +const adapter = createOllamaChat("llama3", "http://your-server:11434"); ``` To expose Ollama on a network interface: @@ -249,38 +252,26 @@ OLLAMA_HOST=http://localhost:11434 ## API Reference -### `ollamaText(options?)` +### `ollamaText(model)` -Creates an Ollama text/chat adapter. +Creates an Ollama text/chat adapter using `OLLAMA_HOST` from the environment (defaults to `http://localhost:11434`). **Parameters:** -- `options.model?` - Default model (optional) - -**Returns:** An Ollama text adapter instance. +- `model` - Model name (e.g. `"llama3"`, `"mistral:7b"`) -### `createOllamaText(host?, options?)` +### `createOllamaChat(model, hostOrConfig?)` -Creates an Ollama text/chat adapter with a custom host. +Creates an Ollama text/chat adapter with an explicit host or client config. **Parameters:** -- `host` - Ollama server URL (default: `http://localhost:11434`) -- `options.model?` - Default model (optional) - -**Returns:** An Ollama text adapter instance. - -### `ollamaSummarize(options?)` - -Creates an Ollama summarization adapter. - -**Returns:** An Ollama summarize adapter instance. - -### `createOllamaSummarize(host?, options?)` +- `model` - Model name +- `hostOrConfig?` - Either an `OLLAMA_HOST`-style URL string, or an `OllamaClientConfig` object (e.g. `{ host, headers, fetch }`). -Creates an Ollama summarization adapter with a custom host. +### `ollamaSummarize(model)` / `createOllamaSummarize(model, hostOrConfig?)` -**Returns:** An Ollama summarize adapter instance. +Creates an Ollama summarization adapter — same signature shape as the chat adapter. 
## Benefits of Ollama diff --git a/docs/adapters/openai.md b/docs/adapters/openai.md index ba93d8d65..e780a9a0e 100644 --- a/docs/adapters/openai.md +++ b/docs/adapters/openai.md @@ -85,12 +85,12 @@ Both adapters work identically with [Structured Outputs](../structured-outputs/o import { chat } from "@tanstack/ai"; import { createOpenaiChat } from "@tanstack/ai-openai"; -const adapter = createOpenaiChat(process.env.OPENAI_API_KEY!, { +const adapter = createOpenaiChat("gpt-5.2", process.env.OPENAI_API_KEY!, { // ... your config options }); const stream = chat({ - adapter: adapter("gpt-5.2"), + adapter, messages: [{ role: "user", content: "Hello!" }], }); ``` @@ -98,14 +98,14 @@ const stream = chat({ ## Configuration ```typescript -import { createOpenaiChat, type OpenAIChatConfig } from "@tanstack/ai-openai"; +import { createOpenaiChat, type OpenAITextConfig } from "@tanstack/ai-openai"; -const config: Omit = { +const config: Omit = { organization: "org-...", // Optional baseURL: "https://api.openai.com/v1", // Optional, for custom endpoints }; -const adapter = createOpenaiChat(process.env.OPENAI_API_KEY!, config); +const adapter = createOpenaiChat("gpt-5.2", process.env.OPENAI_API_KEY!, config); ``` ## Example: Chat Completion @@ -242,10 +242,10 @@ Generate speech from text: ```typescript import { generateSpeech } from "@tanstack/ai"; -import { openaiTTS } from "@tanstack/ai-openai"; +import { openaiSpeech } from "@tanstack/ai-openai"; const result = await generateSpeech({ - adapter: openaiTTS("tts-1"), + adapter: openaiSpeech("tts-1"), text: "Hello, welcome to TanStack AI!", voice: "alloy", format: "mp3", @@ -263,7 +263,7 @@ Available voices: `alloy`, `echo`, `fable`, `onyx`, `nova`, `shimmer`, `ash`, `b ```typescript const result = await generateSpeech({ - adapter: openaiTTS("tts-1-hd"), + adapter: openaiSpeech("tts-1-hd"), text: "High quality speech", modelOptions: { speed: 1.0, // 0.25 to 4.0 @@ -315,91 +315,53 @@ OPENAI_API_KEY=sk-... 
## API Reference -### `openaiText(config?)` - -Creates an OpenAI chat adapter using environment variables. - -**Returns:** An OpenAI chat adapter instance. +Every factory pair follows the same shape: the short factory (`openaiText`, `openaiImage`, …) reads `OPENAI_API_KEY` from the environment, while the `create*` variant takes an explicit API key. Both take `model` as the first argument. -### `createOpenaiChat(apiKey, config?)` +### `openaiText(model, config?)` -Creates an OpenAI chat adapter with an explicit API key. +Creates an OpenAI text adapter against the Responses API (`/v1/responses`) using `OPENAI_API_KEY` from the environment. **Parameters:** -- `apiKey` - Your OpenAI API key -- `config.organization?` - Organization ID (optional) -- `config.baseURL?` - Custom base URL (optional) +- `model` - OpenAI chat model id (e.g. `"gpt-5.2"`, `"gpt-4o-mini"`) +- `config?.organization` - Organization ID (optional) +- `config?.baseURL` - Custom base URL (optional) -**Returns:** An OpenAI chat adapter instance. +### `createOpenaiChat(model, apiKey, config?)` -### `openaiChatCompletions(model)` +Creates an OpenAI text adapter (Responses API) with an explicit API key. -Creates an OpenAI chat adapter that targets `/v1/chat/completions` instead of the Responses API. See [Chat Completions API](#chat-completions-api) for when to use this over `openaiText`. +### `openaiChatCompletions(model, config?)` -**Returns:** An OpenAI chat adapter instance using the Chat Completions wire format. +Creates an OpenAI text adapter that targets `/v1/chat/completions` instead of the Responses API. See [Chat Completions API](#chat-completions-api) for when to use this over `openaiText`. -### `createOpenaiChatCompletions(model, config)` +### `createOpenaiChatCompletions(model, apiKey, config?)` Creates an OpenAI chat-completions adapter with an explicit API key. -**Parameters:** - -- `model` - OpenAI model id (e.g. 
`"gpt-5.2"`, `"gpt-4o-mini"`) -- `config.apiKey` - Your OpenAI API key -- `config.organization?` - Organization ID (optional) -- `config.baseURL?` - Custom base URL (optional) -- `config.headers?` - Additional headers (optional) - -**Returns:** An OpenAI chat adapter instance using the Chat Completions wire format. - -### `openaiSummarize(config?)` - -Creates an OpenAI summarization adapter using environment variables. - -**Returns:** An OpenAI summarize adapter instance. - -### `createOpenaiSummarize(apiKey, config?)` - -Creates an OpenAI summarization adapter with an explicit API key. - -**Returns:** An OpenAI summarize adapter instance. - -### `openaiImage(config?)` - -Creates an OpenAI image generation adapter using environment variables. - -**Returns:** An OpenAI image adapter instance. - -### `createOpenaiImage(apiKey, config?)` - -Creates an OpenAI image generation adapter with an explicit API key. - -**Returns:** An OpenAI image adapter instance. - -### `openaiTTS(config?)` +### `openaiSummarize(model, config?)` / `createOpenaiSummarize(model, apiKey, config?)` -Creates an OpenAI TTS adapter using environment variables. +Creates an OpenAI summarization adapter. -**Returns:** An OpenAI TTS adapter instance. +### `openaiImage(model, config?)` / `createOpenaiImage(model, apiKey, config?)` -### `createOpenaiTTS(apiKey, config?)` +Creates an OpenAI image generation adapter (DALL-E, gpt-image). -Creates an OpenAI TTS adapter with an explicit API key. +### `openaiSpeech(model, config?)` / `createOpenaiSpeech(model, apiKey, config?)` -**Returns:** An OpenAI TTS adapter instance. +Creates an OpenAI text-to-speech adapter. -### `openaiTranscription(config?)` +### `openaiTranscription(model, config?)` / `createOpenaiTranscription(model, apiKey, config?)` -Creates an OpenAI transcription adapter using environment variables. +Creates an OpenAI transcription adapter (Whisper). -**Returns:** An OpenAI transcription adapter instance. 
+### `openaiVideo(model, config?)` / `createOpenaiVideo(model, apiKey, config?)` -### `createOpenaiTranscription(apiKey, config?)` +Creates an OpenAI video generation adapter (Sora). _Experimental._ -Creates an OpenAI transcription adapter with an explicit API key. +### `openaiRealtime(...)` / `openaiRealtimeToken(...)` -**Returns:** An OpenAI transcription adapter instance. +Realtime voice adapters. See [Realtime Voice Chat](../media/realtime-chat) for usage. ## Next Steps diff --git a/docs/advanced/middleware.md b/docs/advanced/middleware.md index d76e95c8f..7fd7f411c 100644 --- a/docs/advanced/middleware.md +++ b/docs/advanced/middleware.md @@ -468,7 +468,8 @@ const stream = chat({ Caches tool call results based on tool name and arguments. When a tool is called with the same name and arguments as a previous call, the cached result is returned immediately without re-executing the tool. ```typescript -import { chat, toolCacheMiddleware } from "@tanstack/ai"; +import { chat } from "@tanstack/ai"; +import { toolCacheMiddleware } from "@tanstack/ai/middlewares"; const stream = chat({ adapter: openaiText("gpt-4o"), @@ -520,7 +521,7 @@ By default the cache lives in-memory and is scoped to a single `toolCacheMiddlew The storage interface: ```typescript -import type { ToolCacheStorage, ToolCacheEntry } from "@tanstack/ai"; +import type { ToolCacheStorage, ToolCacheEntry } from "@tanstack/ai/middlewares"; interface ToolCacheStorage { getItem: (key: string) => ToolCacheEntry | undefined | Promise; @@ -537,7 +538,7 @@ All methods may return a `Promise` for async backends. 
The middleware handles TT ```typescript import { createClient } from "redis"; -import { toolCacheMiddleware, type ToolCacheStorage } from "@tanstack/ai"; +import { toolCacheMiddleware, type ToolCacheStorage } from "@tanstack/ai/middlewares"; const redis = createClient(); diff --git a/docs/advanced/multimodal-content.md b/docs/advanced/multimodal-content.md index 25683d233..86d561fdb 100644 --- a/docs/advanced/multimodal-content.md +++ b/docs/advanced/multimodal-content.md @@ -293,40 +293,40 @@ import type { AnthropicImageMetadata } from '@tanstack/ai-anthropic' import type { GeminiMediaMetadata } from '@tanstack/ai-gemini' ``` -### Handling Dynamic Messages +### Validating Dynamic Messages -When receiving messages from external sources (like `request.json()`), the data is typed as `any`, which can bypass TypeScript's type checking. Use `assertMessages` to restore type safety: +When receiving messages from external sources (like `request.json()`), the data is typed as `any`. TanStack AI does not ship a runtime message validator — define a schema with your preferred Standard-Schema library (Zod, Valibot, ArkType, …) and parse the body before handing it to `chat()`. 
```typescript -import { chat, assertMessages } from '@tanstack/ai' +import { chat } from '@tanstack/ai' import { openaiText } from '@tanstack/ai-openai' +import { z } from 'zod' + +const ContentPartSchema = z.discriminatedUnion('type', [ + z.object({ type: z.literal('text'), content: z.string() }), + z.object({ + type: z.literal('image'), + source: z.object({ type: z.enum(['url', 'data']), value: z.string() }), + }), +]) + +const MessageSchema = z.object({ + role: z.enum(['user', 'assistant', 'system']), + content: z.union([z.string(), z.array(ContentPartSchema)]), +}) -// In an API route handler -const { messages: incomingMessages } = await request.json() - -const adapter = openaiText('gpt-5.2') +const BodySchema = z.object({ messages: z.array(MessageSchema) }) -// Assert incoming messages are compatible with gpt-5.2 (text + image only) -const typedMessages = assertMessages({ adapter }, incomingMessages) +// In an API route handler +const { messages } = BodySchema.parse(await request.json()) -// Now TypeScript will properly check any additional messages you add const stream = chat({ - adapter, - messages: [ - ...typedMessages, - // This will error if you try to add unsupported content types - { - role: 'user', - content: [ - { type: 'text', content: 'What do you see?' }, - { type: 'image', source: { type: 'url', value: '...' } } - ] - } - ] + adapter: openaiText('gpt-5.2'), + messages, }) ``` -> **Note:** `assertMessages` is a type-level assertion only. It does not perform runtime validation. For runtime validation of message content, use a schema validation library like Zod. +The TypeScript types on `chat()` still constrain anything you append at the call site to the modalities supported by the selected model. 
## Best Practices diff --git a/docs/advanced/typed-options.md b/docs/advanced/typed-options.md new file mode 100644 index 000000000..df849bb0c --- /dev/null +++ b/docs/advanced/typed-options.md @@ -0,0 +1,160 @@ +--- +title: Typed Pre-Configured Options +id: typed-options +order: 11 +description: "Define typed, reusable option objects for chat, summarize, image, video, audio, speech, and transcription with createChatOptions and friends — share configuration across routes without losing per-model type safety." +keywords: + - tanstack ai + - createChatOptions + - createSummarizeOptions + - createImageOptions + - createSpeechOptions + - createTranscriptionOptions + - createAudioOptions + - createVideoOptions + - typed options + - shared configuration +--- + +You have a `chat()` (or `generateImage()`, `generateSpeech()`, …) configuration you want to reuse — across multiple routes, between a server function and its caller, or simply factored out of a handler for clarity. By the end of this guide, you'll have a single typed options object that infers the adapter's model, modalities, and provider options, and that you can spread into any call site without losing type safety. + +## The pattern + +Every activity in `@tanstack/ai` ships a paired `createXxxOptions` helper that takes the exact same options object as the activity itself and returns it unchanged — at runtime it's the identity function. The point is **type inference**: the returned object carries the adapter's full type, so when you spread it into the activity, TypeScript still narrows `modelOptions`, content modalities, and `outputSchema` to the adapter you chose. + +```typescript +import { chat, createChatOptions } from '@tanstack/ai' +import { openaiText } from '@tanstack/ai-openai' + +const chatOptions = createChatOptions({ + adapter: openaiText('gpt-5.2'), + // modelOptions, temperature, systemPrompts, tools — all type-checked + // against the adapter+model pair above. 
+ modelOptions: { + reasoning: { effort: 'medium' }, + }, +}) + +// Later, anywhere in your codebase: +const stream = chat({ ...chatOptions, messages }) +``` + +Without the helper you'd have to either inline the configuration at every call site, or type the object yourself with `TextActivityOptions<...>` and resolve the generics manually — `createChatOptions` does that for you. + +## When to reach for it + +- **Sharing a configuration across multiple routes** — define once, spread into each handler. +- **Passing options through a layer** (a server function, a wrapper, a test fixture) without erasing the adapter's model-specific types. +- **Branching on a runtime value while keeping types intact** — build different options objects and choose between them, instead of weaving conditionals into a single `chat({...})` call. +- **Co-locating tools, system prompts, and middleware** with the adapter they target. + +If you only call an activity once at one site, you don't need this helper. Inline the options. + +## Available helpers + +Each helper mirrors the activity it pairs with. Same options, same return type. + +| Helper | Activity | Adapter | +|---|---|---| +| `createChatOptions` | `chat()` | text adapter (e.g. `openaiText`, `anthropicText`) | +| `createSummarizeOptions` | `summarize()` | summarize adapter (e.g. `openaiSummarize`) | +| `createImageOptions` | `generateImage()` | image adapter (e.g. `openaiImage`, `falImage`) | +| `createAudioOptions` | `generateAudio()` | audio adapter (e.g. `falAudio`, `geminiAudio`) | +| `createVideoOptions` | `generateVideo()` / `getVideoJobStatus()` | video adapter (e.g. `falVideo`, `openaiVideo`) | +| `createSpeechOptions` | `generateSpeech()` | speech adapter (e.g. `openaiSpeech`, `elevenlabsSpeech`) | +| `createTranscriptionOptions` | `generateTranscription()` | transcription adapter (e.g. `openaiTranscription`, `falTranscription`) | + +All helpers are exported from `@tanstack/ai`. 
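To see why an identity function is enough for the pattern described above, here is a minimal sketch in plain TypeScript. `Adapter`, `ActivityOptions`, `fakeAdapter`, `createOptions`, and `runActivity` are hypothetical stand-ins invented for illustration; the real `@tanstack/ai` signatures are assumed to carry considerably more type information.

```typescript
// Hypothetical stand-ins; not the real @tanstack/ai types.
type Adapter<TModel extends string> = { model: TModel }

type ActivityOptions<TModel extends string> = {
  adapter: Adapter<TModel>
  temperature?: number
}

// Builds an adapter whose type remembers the exact model literal.
function fakeAdapter<TModel extends string>(model: TModel): Adapter<TModel> {
  return { model }
}

// Identity at runtime; the generic parameter is what preserves the model type.
function createOptions<TModel extends string>(
  options: ActivityOptions<TModel>,
): ActivityOptions<TModel> {
  return options
}

// A pretend activity that accepts the shared options plus a per-call field.
function runActivity<TModel extends string>(
  options: ActivityOptions<TModel> & { prompt: string },
): string {
  const temperature = options.temperature ?? 'default'
  return `${options.adapter.model} (temperature ${temperature}): ${options.prompt}`
}

const shared = createOptions({
  adapter: fakeAdapter('model-a'),
  temperature: 0.2,
})

// Compiles only because the literal type 'model-a' survived the helper.
const modelName: 'model-a' = shared.adapter.model

// Spread the shared options and add the per-call field.
const summary = runActivity({ ...shared, prompt: 'Hello' })
```

The helper adds nothing at runtime. Its value is that the generic parameter is inferred once, at the definition site, so callers never have to spell out the adapter's type by hand.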
+ +## Example: shared chat configuration across routes + +Suppose you have several routes that all hit the same model with the same provider options and tool set. Factor the configuration out once: + +```typescript +// lib/ai/chat-options.ts +import { createChatOptions, toolDefinition } from '@tanstack/ai' +import { openaiText } from '@tanstack/ai-openai' +import { z } from 'zod' + +const lookupOrderDef = toolDefinition({ + name: 'lookupOrder', + inputSchema: z.object({ orderId: z.string() }), +}) + +const lookupOrder = lookupOrderDef.server(async ({ orderId }) => { + return db.orders.findUnique({ where: { id: orderId } }) +}) + +export const supportChatOptions = createChatOptions({ + adapter: openaiText('gpt-5.2'), + systemPrompts: ['You are a customer-support assistant for Acme Corp.'], + tools: [lookupOrder], + modelOptions: { + reasoning: { effort: 'medium' }, + }, +}) +``` + +```typescript +// routes/api/support/chat.ts +import { chat, toServerSentEventsResponse } from '@tanstack/ai' +import { supportChatOptions } from '@/lib/ai/chat-options' + +export async function POST(request: Request) { + const { messages } = await request.json() + const stream = chat({ ...supportChatOptions, messages }) + return toServerSentEventsResponse(stream) +} +``` + +```typescript +// routes/api/support/draft-reply.ts — same adapter+tools, different schema +import { chat } from '@tanstack/ai' +import { supportChatOptions } from '@/lib/ai/chat-options' +import { z } from 'zod' + +export async function POST(request: Request) { + const { ticket } = await request.json() + const draft = await chat({ + ...supportChatOptions, + messages: [{ role: 'user', content: `Draft a reply to: ${ticket}` }], + outputSchema: z.object({ subject: z.string(), body: z.string() }), + stream: false, + }) + return Response.json(draft) +} +``` + +Both routes share the adapter, system prompt, tools, and reasoning settings; each adds what it needs. 
Override or omit any field at the call site; properties listed after the spread take precedence. + +## Example: typed pre-configured image generation + +```typescript +import { createImageOptions, generateImage } from '@tanstack/ai' +import { openaiImage } from '@tanstack/ai-openai' + +const heroImageOptions = createImageOptions({ + adapter: openaiImage('gpt-image-1'), + size: '1536x1024', + numberOfImages: 1, +}) + +const result = await generateImage({ + ...heroImageOptions, + prompt: 'A glass sphere refracting a sunset over a calm sea', +}) +``` + +The same pattern works for `createVideoOptions`, `createSpeechOptions`, `createTranscriptionOptions`, `createAudioOptions`, and `createSummarizeOptions` — the adapter is captured in the typed options object and every downstream call is narrowed to it. + +## What the helper does NOT do + +- **No runtime behavior.** `createChatOptions(opts)` is `opts`. There is no validation, freezing, cloning, or memoization. If you mutate the returned object after creation, the next call sees the mutation. Treat the result as immutable by convention. +- **No partial typing.** The helper expects the full options shape it'll be spread into. If you need to build options up incrementally, type the intermediate state yourself (`Partial<TextActivityOptions<...>>`) and only call the helper at the boundary where the shape is complete. +- **No request execution.** The helper does not call the model. Only the activity function (`chat`, `generateImage`, …) makes the request. + +## Related + +- [Per-Model Type Safety](./per-model-type-safety) — how the adapter+model pair drives `modelOptions` inference. +- [Tree-Shaking](./tree-shaking) — why each adapter is exported separately, and how the typed-options pattern keeps your bundle small. +- [Extend Adapter](./extend-adapter) — when you need to add custom models to an adapter without losing the same typed-options ergonomics. 
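The "no runtime behavior" caveat and the override-by-spread rule from this guide can be checked with plain objects. This sketch uses a hypothetical `identityOptions` helper standing in for `createChatOptions`, on the assumption stated in the guide that the real helper returns its argument unchanged.

```typescript
// Hypothetical stand-in for an identity-style options helper.
function identityOptions<T extends object>(options: T): T {
  return options
}

const base = { temperature: 0.2, systemPrompts: ['Be concise.'] }
const sharedOptions = identityOptions(base)

// Same reference: nothing was cloned or frozen.
const sameReference = sharedOptions === base

// Properties listed after the spread override the shared values.
const overridden = { ...sharedOptions, temperature: 0.9 }

// Mutating the shared object leaks into every later call site.
sharedOptions.systemPrompts.push('Answer in French.')
const later = { ...sharedOptions }

// Spread is shallow, so even the earlier copy shares the mutated array.
const sharesArray = overridden.systemPrompts === sharedOptions.systemPrompts
```

Because the spread copies only the top level, nested arrays such as `systemPrompts` stay shared between call sites. That is one practical reason to treat the options object as immutable.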
diff --git a/docs/api/ai-vue.md b/docs/api/ai-vue.md index d1831b6c1..d403e8d4a 100644 --- a/docs/api/ai-vue.md +++ b/docs/api/ai-vue.md @@ -105,7 +105,7 @@ interface UseChatReturn { } ``` -**Note:** Reactive state (`messages`, `isLoading`, `error`, `status`, `isSubscribed`, `connectionStatus`, `sessionGenerating`) is wrapped in `DeepReadonly>`. Access values with `.value` (e.g., `messages.value`). Cleanup is automatic via `onScopeDispose`. +**Note:** Reactive state (`messages`, `isLoading`, `error`, `status`, `isSubscribed`, `connectionStatus`, `sessionGenerating`) is wrapped in `DeepReadonly>`. In ` ``` diff --git a/docs/getting-started/quick-start.md b/docs/getting-started/quick-start.md index e8f21f7d4..b8e6a07d7 100644 --- a/docs/getting-started/quick-start.md +++ b/docs/getting-started/quick-start.md @@ -250,9 +250,9 @@ You now have a working chat application. The `useChat` hook handles: Since TanStack AI is framework-agnostic, you can define and use tools in any environment. Here's a quick example of defining a tool and using it in a chat: ```typescript -import { chat } from '@tanstack/ai' -import { toolDefinition } from '@tanstack/ai' +import { chat, toolDefinition } from '@tanstack/ai' import { openaiText } from '@tanstack/ai-openai' +import { z } from 'zod' const getProductsDef = toolDefinition({ name: 'getProducts', @@ -263,10 +263,10 @@ const getProducts = getProductsDef.server(async ({ query }) => { return await db.products.search(query) }) -chat({ +const stream = chat({ adapter: openaiText('gpt-5.2'), messages: [{ role: 'user', content: 'Find products' }], - tools: [getProducts] + tools: [getProducts], }) ``` diff --git a/docs/media/text-to-speech.md b/docs/media/text-to-speech.md index e281ff69b..ed960d0d4 100644 --- a/docs/media/text-to-speech.md +++ b/docs/media/text-to-speech.md @@ -31,14 +31,11 @@ Text-to-speech (TTS) is handled by TTS adapters that follow the same tree-shakea ```typescript import { generateSpeech } from '@tanstack/ai' -import { 
openaiTTS } from '@tanstack/ai-openai' +import { openaiSpeech } from '@tanstack/ai-openai' -// Create a TTS adapter (uses OPENAI_API_KEY from environment) -const adapter = openaiSpeech() - -// Generate speech from text +// Generate speech from text (uses OPENAI_API_KEY from environment) const result = await generateSpeech({ - adapter: openaiTTS('tts-1'), + adapter: openaiSpeech('tts-1'), text: 'Hello, welcome to TanStack AI!', voice: 'alloy', }) @@ -54,12 +51,9 @@ console.log(result.contentType) // 'audio/mpeg' import { generateSpeech } from '@tanstack/ai' import { geminiSpeech } from '@tanstack/ai-gemini' -// Create a TTS adapter (uses GOOGLE_API_KEY from environment) -const adapter = geminiSpeech() - -// Generate speech from text +// Generate speech from text (uses GOOGLE_API_KEY or GEMINI_API_KEY from environment) const result = await generateSpeech({ - adapter: geminiTTS('gemini-2.5-flash-preview-tts'), + adapter: geminiSpeech('gemini-2.5-flash-preview-tts'), text: 'Hello from Gemini TTS!', }) @@ -154,7 +148,7 @@ OpenAI provides several distinct voices: ```typescript const result = await generateSpeech({ - adapter: openaiTTS('tts-1-hd'), + adapter: openaiSpeech('tts-1-hd'), text: 'High quality speech synthesis', voice: 'nova', format: 'mp3', @@ -221,7 +215,7 @@ async function saveAudio(result: TTSResult, filename: string) { // Usage const result = await generateSpeech({ - adapter: openaiTTS('tts-1'), + adapter: openaiSpeech('tts-1'), text: 'Hello world!', }) @@ -239,7 +233,7 @@ TanStack AI provides React hooks and server-side streaming helpers to build full ```typescript // routes/api/generate/speech.ts import { generateSpeech, toServerSentEventsResponse } from '@tanstack/ai' -import { openaiTTS } from '@tanstack/ai-openai' +import { openaiSpeech } from '@tanstack/ai-openai' import { createFileRoute } from '@tanstack/react-router' export const Route = createFileRoute('/api/generate/speech')({ @@ -250,7 +244,7 @@ export const Route = 
createFileRoute('/api/generate/speech')({ const { text, voice, format, model } = body.data const stream = generateSpeech({ - adapter: openaiTTS(model ?? 'tts-1'), + adapter: openaiSpeech(model ?? 'tts-1'), text, voice, format, @@ -311,13 +305,13 @@ For non-streaming usage with TanStack Start server functions: // lib/server-functions.ts import { createServerFn } from '@tanstack/react-start' import { generateSpeech } from '@tanstack/ai' -import { openaiTTS } from '@tanstack/ai-openai' +import { openaiSpeech } from '@tanstack/ai-openai' export const generateSpeechFn = createServerFn({ method: 'POST' }) .inputValidator((data: { text: string; voice?: string }) => data) .handler(async ({ data }) => { return generateSpeech({ - adapter: openaiTTS('tts-1'), + adapter: openaiSpeech('tts-1'), text: data.text, voice: data.voice, }) @@ -344,14 +338,14 @@ For TanStack Start server functions that stream results. The fetcher receives ty // lib/server-functions.ts import { createServerFn } from '@tanstack/react-start' import { generateSpeech, toServerSentEventsResponse } from '@tanstack/ai' -import { openaiTTS } from '@tanstack/ai-openai' +import { openaiSpeech } from '@tanstack/ai-openai' export const generateSpeechStreamFn = createServerFn({ method: 'POST' }) .inputValidator((data: { text: string; voice?: string }) => data) .handler(({ data }) => { return toServerSentEventsResponse( generateSpeech({ - adapter: openaiTTS('tts-1'), + adapter: openaiSpeech('tts-1'), text: data.text, voice: data.voice, stream: true, @@ -470,7 +464,7 @@ TypeScript automatically infers the result type from your `onResult` return valu ```typescript try { const result = await generateSpeech({ - adapter: openaiTTS('tts-1'), + adapter: openaiSpeech('tts-1'), text: 'Hello!', }) } catch (error) { @@ -500,14 +494,14 @@ The TTS adapters use the same environment variables as other adapters: For production use or when you need explicit control: ```typescript -import { createOpenaiTTS } from '@tanstack/ai-openai' 
-import { createGeminiTTS } from '@tanstack/ai-gemini' +import { createOpenaiSpeech } from '@tanstack/ai-openai' +import { createGeminiSpeech } from '@tanstack/ai-gemini' // OpenAI -const openaiAdapter = createOpenaiTTS('your-openai-api-key') +const openaiAdapter = createOpenaiSpeech('tts-1', 'your-openai-api-key') // Gemini -const geminiAdapter = createGeminiTTS('your-google-api-key') +const geminiAdapter = createGeminiSpeech('gemini-2.5-flash-preview-tts', 'your-google-api-key') ``` ## Best Practices diff --git a/docs/reference/functions/generateSpeech.md b/docs/reference/functions/generateSpeech.md index 3b46735df..87edda8b5 100644 --- a/docs/reference/functions/generateSpeech.md +++ b/docs/reference/functions/generateSpeech.md @@ -39,10 +39,10 @@ Uses AI text-to-speech models to create audio from natural language text. ```ts import { generateSpeech } from '@tanstack/ai' -import { openaiTTS } from '@tanstack/ai-openai' +import { openaiSpeech } from '@tanstack/ai-openai' const result = await generateSpeech({ - adapter: openaiTTS('tts-1-hd'), + adapter: openaiSpeech('tts-1-hd'), text: 'Hello, welcome to TanStack AI!', voice: 'nova' }) @@ -52,7 +52,7 @@ console.log(result.audio) // base64-encoded audio ```ts const result = await generateSpeech({ - adapter: openaiTTS('tts-1'), + adapter: openaiSpeech('tts-1'), text: 'This is slower speech.', voice: 'alloy', format: 'wav', diff --git a/docs/tools/client-tools.md b/docs/tools/client-tools.md index f65cde7b6..a39bd94c3 100644 --- a/docs/tools/client-tools.md +++ b/docs/tools/client-tools.md @@ -244,7 +244,8 @@ Client tools go through a small set of observable lifecycle states you can surfa - `awaiting-input` — the model intends to call the tool but arguments haven't arrived yet. - `input-streaming` — the model is streaming the tool arguments (partial input may be available). - `input-complete` — all arguments have been received and the tool is executing. 
-- `completed` — the tool finished; part.output contains the result (or error details). +- `approval-requested` / `approval-responded` — only seen for tools with `needsApproval: true`. +- `complete` — the tool finished; `part.output` contains the result (or error details). Use these states to show loading indicators, streaming progress, and final success/error feedback. The example below maps each state to a simple UI message. @@ -261,8 +262,8 @@ function ToolCallDisplay({ part }: { part: ToolCallPart }) { if (part.state === "input-complete") { return
✓ Arguments received, executing...
; } - - if (part.output) { + + if (part.state === "complete") { return
✅ Tool completed successfully
; } diff --git a/docs/tools/tool-approval.md b/docs/tools/tool-approval.md index 67c597fa4..b92279b4a 100644 --- a/docs/tools/tool-approval.md +++ b/docs/tools/tool-approval.md @@ -13,13 +13,14 @@ keywords: - human-in-the-loop --- -The tool approval flow allows you to require user approval before executing sensitive tools, giving users control over actions like sending emails, making purchases, or deleting data. Tools go through these states during approval: +The tool approval flow allows you to require user approval before executing sensitive tools, giving users control over actions like sending emails, making purchases, or deleting data. A tool call moves through the `ToolCallState` lifecycle: -1. **`approval-requested`** - Waiting for user approval -2. **`executing`** - Approved, now executing -3. **`output-available`** - Execution completed -4. **`output-error`** - Execution failed -5. **`cancelled`** - User denied approval +1. **`awaiting-input`** — Tool call started, no arguments yet +2. **`input-streaming`** — Arguments arriving incrementally +3. **`input-complete`** — All arguments received +4. **`approval-requested`** — Waiting for user approval (only if `needsApproval: true`) +5. **`approval-responded`** — User approved or denied +6. **`complete`** — Tool finished executing (result available, or denial recorded) When a tool requires approval, the typical flow is: @@ -109,7 +110,7 @@ function ChatComponent() { return (

Approve: {part.name}

-{JSON.stringify(part.arguments, null, 2)}
+{JSON.stringify(part.input, null, 2)}