Add llama.cpp-named MoE placement flags (--cpu-moe / --n-cpu-moe) wrapping SHARPI_CPU_MOE #80

## Background

Surfaced while adding the MoE CLI flags in PR #77.

SharpInference already supports forcing MoE experts onto the CPU via the `SHARPI_CPU_MOE` env override (read in `CudaHybridGdnForwardPass`), but there is **no CLI flag** for it. llama.cpp exposes this as first-class, well-known flags:

- `--cpu-moe` / `-cmoe`: keep **all** routed-expert weights on CPU.
- `--n-cpu-moe N` / `-ncmoe N`: keep the routed experts of N layers on CPU (llama.cpp's help text describes these as the first N layers, i.e. counting up from layer 0).
- `-ot` / `--override-tensor`: regex tensor placement (more general; likely out of scope).

Per the project convention of matching llama.cpp arg names where an equivalent exists, we should expose `--cpu-moe` (and ideally `--n-cpu-moe`) mapping onto the existing override.
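
To make the `--n-cpu-moe` layer selection concrete, here is a minimal sketch of which layers would keep their routed experts on the CPU. It assumes the "first N layers" counting that llama.cpp's help text describes. `CpuMoePlacement` and `LayersOnCpu` are hypothetical names for illustration and do not exist in SharpInference or llama.cpp. Rejecting negative values and clamping an oversized N are choices made by this sketch.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper for illustration only; not an existing SharpInference API.
public static class CpuMoePlacement
{
    // Returns the indices of the layers whose routed experts stay on the CPU,
    // assuming "first N layers" counting (layers 0 .. N-1).
    public static IReadOnlyList<int> LayersOnCpu(int nCpuMoe, int layerCount)
    {
        if (nCpuMoe < 0) throw new ArgumentOutOfRangeException(nameof(nCpuMoe));
        if (layerCount < 0) throw new ArgumentOutOfRangeException(nameof(layerCount));

        // An N at or above the layer count selects every layer,
        // which is the same placement as --cpu-moe.
        int n = Math.Min(nCpuMoe, layerCount);
        return Enumerable.Range(0, n).ToArray();
    }
}
```

Under this sketch, `LayersOnCpu(2, 4)` selects layers 0 and 1, and `--cpu-moe` corresponds to passing an N equal to the model's layer count.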

## Scope

1. Add `[CommandOption("--cpu-moe|-cmoe")] bool CpuMoe` to `RunCommand.Settings`; when set, force CPU MoE (set `SHARPI_CPU_MOE=1` early in `Execute`, same plumbing pattern as the other MoE flags).
2. Add `--n-cpu-moe|-ncmoe <N>` if/when the engine supports a per-layer CPU/GPU expert split count (today the override is all-or-nothing; scope `--n-cpu-moe` to whatever the engine can honor, or defer it with a clear note).
3. Match llama.cpp semantics where feasible (document any deviation, e.g. layer-counting direction).
4. Help text + README mention.
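
The plumbing in item 1 could look roughly like the sketch below. It is written without the CLI framework so that it stays self-contained: `RunSettingsSketch` is a hypothetical stand-in for `RunCommand.Settings`, and `CpuMoeFlag.Apply` is an illustrative helper, not existing code. It assumes the forward pass reads `SHARPI_CPU_MOE` from the process environment after `Execute` has started.

```csharp
using System;

// Hypothetical stand-in for RunCommand.Settings; in the real command the
// property would carry the [CommandOption(...)] attribute from item 1.
public sealed class RunSettingsSketch
{
    public bool CpuMoe { get; set; }
}

// Illustrative helper; the name and shape are assumptions.
public static class CpuMoeFlag
{
    public const string EnvName = "SHARPI_CPU_MOE";

    // Intended to be called early in Execute, before the engine reads the override.
    public static void Apply(RunSettingsSketch settings)
    {
        if (settings.CpuMoe)
        {
            // Process-level variable; same effect as launching with SHARPI_CPU_MOE=1.
            Environment.SetEnvironmentVariable(EnvName, "1");
        }
        // When the flag is absent, any value already present in the environment
        // is left untouched, so the existing env override keeps working.
    }
}
```

Leaving the variable alone when the flag is not passed keeps the env override as a fallback; whether the flag should instead take strict precedence is a design decision for the implementation.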

## Acceptance

- [ ] `--cpu-moe` forces the CPU MoE path on a supported model, equivalent to `SHARPI_CPU_MOE=1`.
- [ ] Flag name/alias matches llama.cpp; behavior documented.
- [ ] `--n-cpu-moe` either implemented to llama.cpp semantics or explicitly deferred with rationale.

## Related

- PR #77 (MoE CLI flags + the "match llama.cpp names" discussion)
- `SHARPI_CPU_MOE` handling in `CudaHybridGdnForwardPass`
- llama.cpp `--cpu-moe` / `--n-cpu-moe` / `-ot`
