The current SIMD integer shift instructions (and even the ones proposed in flexible-vectors) only support "shift by scalar". However, CPUs do have instructions that shift a vector by another vector, for example VQSHL on ARM NEON.
I was trying to do parallel integer processing, but because the workload is extremely shift-heavy and the shift amounts vary per element, the SIMD version performs about the same as the scalar code (or actually worse).
My question is: why don't these instructions exist? Especially in WASM, where the runtime can trivially emit a scalarized version if the underlying hardware has no corresponding instruction.
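To make the difference concrete, here is a minimal Python model of the two semantics for 32-bit lanes. It assumes that a per-lane variable shift would reuse the existing WASM rule of taking the shift count modulo the lane width (as `i32x4.shl` does today). The function names `shl_by_scalar` and `shl_by_vector` are hypothetical and are not part of any proposal.

```python
LANES = 4                        # i32x4: four 32-bit lanes
LANE_BITS = 32
LANE_MASK = (1 << LANE_BITS) - 1


def shl_by_scalar(v, count):
    """Model of today's i32x4.shl: every lane is shifted by the same scalar
    count, taken modulo the lane width; results are truncated to 32 bits."""
    assert len(v) == LANES
    n = count % LANE_BITS
    return [(x << n) & LANE_MASK for x in v]


def shl_by_vector(v, counts):
    """Hypothetical per-lane variable shift: lane i of v is shifted by lane i
    of counts (assumed to be taken modulo the lane width as well). This
    per-lane loop is essentially what a scalarized fallback amounts to."""
    assert len(v) == len(counts) == LANES
    return [(x << (n % LANE_BITS)) & LANE_MASK for x, n in zip(v, counts)]


print(shl_by_scalar([1, 1, 1, 1], 3))              # [8, 8, 8, 8]
print(shl_by_vector([1, 1, 1, 1], [0, 1, 4, 31]))  # [1, 2, 16, 2147483648]
```

With a native per-lane shift, `shl_by_vector` would map to a single instruction. Without one, a scalarized fallback performs one scalar shift per lane, plus the lane extract/insert work. That is roughly why emulating variable shifts with today's instructions ends up no faster than plain scalar code.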