feat(server): multimodal image input endpoint (Gemma 4 gemma4uv content blocks)

Follow-up to #250 / PR #251. The CLI supports Gemma 4 12B image input (`--image`); the API server does not.

## Scope
Accept image **content blocks** and route them through the existing vision splice:
- [ ] `/v1/messages` (Anthropic): `image` content blocks (base64 `source`).
- [ ] `/v1/chat/completions` (OpenAI): `image_url` parts (data URL / base64; URL fetch optional).
- [ ] Decode → `ImagePreprocessor` → `GemmaUvVisionEmbedder` → splice via `ForwardEmbedding` at the `<|image|>` placeholder positions (reuse the CLI's `RunImagePrompt` logic).
- [ ] Multi-image per message (markers already supported in the CLI path).
- [ ] Smoke tests in `Tests.Server`.
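
Both endpoints carry images in different envelopes. A first step could normalize them into one list before the decode → `ImagePreprocessor` step. The following is a minimal Python sketch, not the server's C# code. `ImageInput`, `images_from_anthropic` and `images_from_openai` are hypothetical names. The block shapes used are the public Anthropic `image` block (`source.type == "base64"`, `media_type`, `data`) and the OpenAI `image_url` part carrying a `data:` URL. Remote URL fetching is treated as out of scope here, matching the "optional" note above. The returned list keeps the images in message order, which is the order the `<|image|>` placeholders would be filled for multi-image messages.

```python
import base64
import re
from dataclasses import dataclass


# Hypothetical normalized image; the server's real type is not specified in this issue.
@dataclass
class ImageInput:
    media_type: str
    data: bytes


# data:<image media type>;base64,<payload>
_DATA_URL = re.compile(r"^data:(?P<mt>image/[\w.+-]+);base64,(?P<b64>.*)$", re.DOTALL)


def images_from_anthropic(content):
    """Collect base64 `image` blocks from a /v1/messages content list, in order."""
    out = []
    for block in content:
        if block.get("type") != "image":
            continue  # text and other blocks are handled elsewhere
        src = block["source"]
        if src.get("type") != "base64":
            raise ValueError(f"unsupported image source type: {src.get('type')!r}")
        out.append(ImageInput(src["media_type"], base64.b64decode(src["data"], validate=True)))
    return out


def images_from_openai(content):
    """Collect `image_url` parts (data URLs only) from a chat message's content, in order."""
    if isinstance(content, str):
        return []  # plain-text message: no parts, so no images
    out = []
    for part in content:
        if part.get("type") != "image_url":
            continue
        url = part["image_url"]["url"]
        m = _DATA_URL.match(url)
        if m is None:
            # URL fetch is optional in the scope; this sketch rejects non-data URLs.
            raise ValueError("only base64 data: URLs are supported for image_url")
        out.append(ImageInput(m.group("mt"), base64.b64decode(m.group("b64"), validate=True)))
    return out
```

The decoded bytes would then go to the existing preprocessing path unchanged. Keeping the normalization separate means the two endpoints differ only in this thin parsing layer.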

## Dependencies / notes
- `SharpInference.Server` would need a reference to `SharpInference.Vision` and an mmproj-configured engine (new option, e.g. `SHARPI_MMPROJ`).
- The server currently runs the engine on its configured backend, but image input needs a backend with `SupportsEmbeddingInput` (today only CPU or full CUDA; see the hybrid/Vulkan follow-up).
- Only `gemma4uv` (12B) is wired up for vision; reject image content for any other model with a clear error.
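
The three preconditions above could be checked once per request, before any decoding, so unsupported setups fail fast with a readable message. The following is a minimal Python sketch under stated assumptions. `EngineInfo` and `check_image_request` are hypothetical. `mmproj_path` stands in for whatever the new option (e.g. `SHARPI_MMPROJ`) resolves to, and `supports_embedding_input` mirrors the backend's `SupportsEmbeddingInput`. The function returns an error string, or `None` when the request may proceed. Mapping that string to an HTTP status is left to the endpoint.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class EngineInfo:
    # Hypothetical snapshot of the server's engine configuration.
    architecture: str                 # e.g. "gemma4uv"
    mmproj_path: Optional[str]        # e.g. resolved from SHARPI_MMPROJ; None if unset
    supports_embedding_input: bool    # mirrors the backend's SupportsEmbeddingInput


def check_image_request(engine: EngineInfo, image_count: int) -> Optional[str]:
    """Return an error message if image input cannot be served, else None."""
    if image_count == 0:
        return None  # text-only requests are unaffected
    if engine.architecture != "gemma4uv":
        return f"image input is only supported for gemma4uv models (loaded: {engine.architecture})"
    if not engine.mmproj_path:
        return "image input requires an mmproj file to be configured"
    if not engine.supports_embedding_input:
        return "the configured backend does not support embedding input (use CPU or full CUDA)"
    return None
```

Checking the model first gives callers the most actionable message when they target the wrong model, ahead of server-side configuration problems.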

feat(server): multimodal image input endpoint (Gemma 4 gemma4uv content blocks) #253

Scope

Dependencies / notes

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

feat(server): multimodal image input endpoint (Gemma 4 gemma4uv content blocks) #253

Description

Scope

Dependencies / notes

Metadata

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Issue actions