Summary
Users may try --backend cuda based on the README mentioning a CUDA backend, but no such flag exists on the RunCommand. The flag is silently ignored by Spectre.Console and inference falls back to CPU. The CUDA backend (SharpInference.Cuda) is wired only to ImageCommand (image generation), not to LLM inference.
What exists
- -g 0 (default): CPU only
- -g N: hybrid, with N layers on the Vulkan GPU and the rest on CPU
- -g -1: all layers on the Vulkan GPU (auto-detect via TierPlanner)
GPU offload for chat inference is Vulkan only. Requesting CUDA for chat inference is not supported.
Suggested fixes
- Reject unknown options (Spectre.Console has a StrictParsing mode) instead of silently ignoring --backend cuda
- Add a --backend {cpu,vulkan} option that maps onto -g semantics, and emit an explicit error if cuda is requested for chat inference
- README cleanup: clarify that CUDA is image-only
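
A minimal sketch of how the proposed --backend option could map onto the existing -g semantics. None of this is taken from the SharpInference codebase: `BackendResolver` and `ResolveGpuLayers` are hypothetical names. It also assumes that --backend cpu overrides any -g value and that a bare --backend vulkan means "offload all layers".

```csharp
using System;

// Hypothetical helper; these names do not exist in SharpInference.
public static class BackendResolver
{
    // Maps a proposed --backend value onto the existing -g layer count:
    //   0 = CPU only, N > 0 = N layers on Vulkan, -1 = all layers on Vulkan.
    public static int ResolveGpuLayers(string? backend, int gpuLayers)
    {
        switch (backend?.Trim().ToLowerInvariant())
        {
            case null:
                // No --backend given: keep the current -g behavior unchanged.
                return gpuLayers;
            case "cpu":
                // Assumption: --backend cpu overrides any -g value.
                return 0;
            case "vulkan":
                // Assumption: with -g left at its default of 0,
                // --backend vulkan means "offload everything".
                return gpuLayers == 0 ? -1 : gpuLayers;
            case "cuda":
                // Explicit error instead of a silent CPU fallback.
                throw new ArgumentException(
                    "CUDA is only available for image generation; " +
                    "chat inference supports cpu and vulkan.");
            default:
                throw new ArgumentException(
                    $"Unknown backend '{backend}'. Expected cpu or vulkan.");
        }
    }
}
```

With this mapping, `ResolveGpuLayers("vulkan", 12)` keeps the hybrid split of 12 layers, while `ResolveGpuLayers("cuda", 0)` fails loudly. Rejecting unknown options is a separate change. In Spectre.Console.Cli it is usually a matter of setting `config.Settings.StrictParsing = true` inside `CommandApp.Configure`, though how the app is configured here is not shown in this issue.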