Background
CudaHybridGdnForwardPass rejects MoE MTP heads when !_cpuMoe, with the error:
```
"MoE MTP head requires SHARPI_CPU_MOE=1. GPU MoE path (SLRU expert cache)
doesn't reserve slots for the MTP block; enable CPU MoE mode to load this model."
```
So Qwen3.6-35B-A3B-MTP requires SHARPI_CPU_MOE=1. On a 12 GB card the model can't fit anyway (~22 GB at Q4_K_M), so this is moot today. On 24 GB+ cards, where the full routed-expert stack could live in VRAM, two changes are needed: the _expertSlotManager must be sized for (L+1) * numExperts instead of L * numExperts, and the GpuMoeFfn dispatch needs a parameterised variant that takes a layer's tensors instead of an array index.
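The sizing and dispatch change described above can be sketched roughly as follows. This is a language-neutral illustration written in Python, not the project's code: `expert_slot_count`, `BlockTensors`, `slot_base`, and `moe_ffn_core` are hypothetical names, the sizes are toy values, and the real SLRU cache may not map experts to slots this directly.

```python
from dataclasses import dataclass
from typing import List


def expert_slot_count(num_layers: int, num_experts: int,
                      has_mtp: bool, mtp_is_moe: bool) -> int:
    """Slots to reserve: L * E normally, (L + 1) * E when the MTP head is MoE."""
    moe_blocks = num_layers + (1 if has_mtp and mtp_is_moe else 0)
    return moe_blocks * num_experts


@dataclass
class BlockTensors:
    """Hypothetical handle for one block's routed-expert weights."""
    name: str
    slot_base: int  # first cache slot owned by this block


def moe_ffn_core(block: BlockTensors, selected_experts: List[int]) -> List[int]:
    """Tensor-parameterised core: resolves cache slots from the block itself,
    so it works for a regular layer and for the MTP block alike."""
    return [block.slot_base + e for e in selected_experts]


def gpu_moe_ffn(layer: int, blocks: List[BlockTensors],
                selected_experts: List[int]) -> List[int]:
    """Index-based wrapper kept for the regular layers."""
    return moe_ffn_core(blocks[layer], selected_experts)


# Toy sizes for illustration only (not the real model's dimensions).
L, E = 4, 8
total = expert_slot_count(L, E, has_mtp=True, mtp_is_moe=True)
blocks = [BlockTensors(f"layer{i}", i * E) for i in range(L)]
mtp_block = BlockTensors("mtp", L * E)  # the MTP block is treated as "layer L"

print(total)                            # 40
print(gpu_moe_ffn(1, blocks, [0, 3]))   # [8, 11]
print(moe_ffn_core(mtp_block, [0, 7]))  # [32, 39]
```

Because the core takes the block's tensors rather than a layer index, the MTP block does not need to live in the per-layer array; it only needs its own range of slots at the end of the allocation.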
Scope
Plumb the MTP block as "layer L" in _expertSlotManager allocations (L * _numExperts → (L+1) * _numExperts, conditional on _hasMtp && _mtpIsMoE).
Upload the MTP block's routed expert weights into the SLRU cache, indexed at layer L.
[…] GpuMoeFfn(int layer) into a tensor-parameterised core (same pattern as MoeFfnCore / CpuMoeFfnCore from "MoE-FFN MTP head: unblock Qwen3.6-35B-A3B-MTP loading", #44).
[…] !_cpuMoe and _mtpIsMoE.
Acceptance criteria
[…] SHARPI_CPU_MOE=1 on a card with enough VRAM.
Related