Follow-up to #250 / PR #251. The CLI supports Gemma 4 12B image input (`--image`); the API server does not.
Scope
Accept image content blocks and route them through the existing vision splice:
- `/v1/messages` (Anthropic): `image` content blocks (base64 `source`).
- `/v1/chat/completions` (OpenAI): `image_url` parts (data URL / base64; URL fetch optional).
- `ImagePreprocessor` → `GemmaUvVisionEmbedder` → splice via `ForwardEmbedding` at the `<|image|>` placeholder positions (reuse the CLI's `RunImagePrompt` logic).
- Tests (`Tests.Server`).

Dependencies / notes

- `SharpInference.Server` would need a reference to `SharpInference.Vision` and an mmproj-configured engine (new option, e.g. `SHARPI_MMPROJ`).
- The server currently runs the engine on its configured backend. Image input needs a pass with `SupportsEmbeddingInput`, which today means CPU or full CUDA (see the hybrid/Vulkan follow-up).
- Only `gemma4uv` (12B) is wired. Reject image content for other models with a clear error.
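The two request shapes listed under Scope differ mainly in where the base64 payload sits. Below is a minimal, language-neutral sketch in Python that normalizes both into one hypothetical `ImageInput` (media type plus raw bytes). The input shapes follow the public Anthropic and OpenAI formats. `ImageInput`, `ImageContentError`, `from_anthropic_block` and `from_openai_part` are illustrative names, not existing SharpInference APIs. Remote URL fetching is treated as out of scope, matching "URL fetch optional".

```python
import base64
import binascii
from dataclasses import dataclass


@dataclass
class ImageInput:
    """Hypothetical normalized image payload handed to the preprocessor."""
    media_type: str
    data: bytes


class ImageContentError(ValueError):
    """Hypothetical error; a server would map it to an HTTP 400 response."""


def _decode(payload: str) -> bytes:
    # Strict decoding: reject characters outside the base64 alphabet.
    try:
        return base64.b64decode(payload, validate=True)
    except binascii.Error as exc:
        raise ImageContentError(f"invalid base64 image data: {exc}") from exc


def from_anthropic_block(block: dict) -> ImageInput:
    # Anthropic shape:
    # {"type": "image", "source": {"type": "base64", "media_type": "...", "data": "..."}}
    source = block.get("source") or {}
    if source.get("type") != "base64":
        raise ImageContentError("only base64 image sources are supported")
    return ImageInput(source.get("media_type", ""), _decode(source.get("data", "")))


def from_openai_part(part: dict) -> ImageInput:
    # OpenAI shape: {"type": "image_url", "image_url": {"url": "data:image/png;base64,..."}}
    url = (part.get("image_url") or {}).get("url", "")
    if not url.startswith("data:"):
        # This sketch does not fetch remote URLs; reject them explicitly.
        raise ImageContentError("only data: URLs are supported")
    header, sep, payload = url.partition(",")
    if not sep or not header.endswith(";base64"):
        raise ImageContentError("expected a base64-encoded data URL")
    media_type = header[len("data:"):-len(";base64")]
    return ImageInput(media_type, _decode(payload))
```

Both endpoints can then feed the same `ImageInput` list into the shared preprocessing and splice path. Only the request parsing needs to differ per endpoint.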
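The sketch below covers two steps. The first is the request-time guard: only `gemma4uv` is wired, and the pass must support embedding input. The second is the splice of image embeddings at placeholder positions. It assumes that each `<|image|>` placeholder token position receives exactly one row of image embeddings, in order. The CLI's `RunImagePrompt` logic may expand or order placeholders differently. `check_image_request`, `splice_embeddings`, `placeholder_id` and `embed_token` are hypothetical names, and the `supports_embedding_input` flag stands in for the engine's `SupportsEmbeddingInput` capability.

```python
from typing import Callable, List, Sequence

Vector = List[float]

# Per this issue, only the Gemma 4 12B vision arch is wired for image input.
SUPPORTED_VISION_ARCH = "gemma4uv"


class UnsupportedImageInput(ValueError):
    """Hypothetical error surfaced to the client as a clear 4xx message."""


def check_image_request(arch: str, supports_embedding_input: bool, has_images: bool) -> None:
    if not has_images:
        return  # text-only requests are unaffected
    if arch != SUPPORTED_VISION_ARCH:
        raise UnsupportedImageInput(
            f"image content is not supported for model '{arch}'; "
            f"only '{SUPPORTED_VISION_ARCH}' accepts images")
    if not supports_embedding_input:
        raise UnsupportedImageInput(
            "the configured backend cannot accept embedding input (CPU or full CUDA required)")


def splice_embeddings(
    token_ids: Sequence[int],
    placeholder_id: int,
    embed_token: Callable[[int], Vector],
    image_rows: Sequence[Vector],
) -> List[Vector]:
    """Build the per-position input: token embeddings, with image rows at placeholders."""
    positions = [i for i, t in enumerate(token_ids) if t == placeholder_id]
    if len(positions) != len(image_rows):
        raise ValueError(
            f"{len(positions)} placeholder positions but {len(image_rows)} image embedding rows")
    rows = iter(image_rows)
    # Placeholders consume image rows in order; every other token uses its text embedding.
    return [list(next(rows)) if t == placeholder_id else embed_token(t) for t in token_ids]
```

Validating before any embedding work means an unsupported model or backend fails fast with a readable error, rather than partway through a forward pass.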