Current state of the SIMD/HWIntrinsic configuration knobs and proposed cleanup

We should finish fleshing out the design around the various `CLR Configuration Knobs` that allow users to control the various `System.Numerics.Vectors` and `System.Runtime.Intrinsics` support.

## Prior to HWIntrinsics

This section discusses the state of the world in netcoreapp1.0 through netcoreapp2.0. Desktop had similar, but slightly different behavior that I will attempt to call out where relevant.

### `COMPlus_EnableSSE3_4` and `COMPlus_EnableAVX`

The former (`COMPlus_EnableSSE3_4`) appears to be .NETCore only but defaults to 1 and is used in combination with a CPUID check that the VM does. It controls whether the `S.N.Vectors` intrinsics generate SSE3 through SSE4.2 instructions. The flag is completely ignored if `COMPlus_EnableAVX` and the corresponding VM check are `1`.

The latter (`COMPlus_EnableAVX`) is available for both Desktop and .NETCore, it defaults to 1 ans is used in combination with a CPUID check that the VM does. It controls:
* Whether the `S.N.Vectors` intrinsics generate AVX/AVX2 instructions
* The size of `Vector<T>` (setting it to zero, forces `Vector<T>` to be sized to 16)
* Whether the compiler emits `VEX` encoded instructions

### `COMPlus_FeatureSIMD`

This flag is available for both Desktop and .NETCore, it defaults to 1 and controls various bits of code related to `Vector<T>`, the `S.N.Vectors` compiler support, and the `TYP_SIMD*` support.

Setting this to 0 causes `Vector<T>` to be sized at 16, none of the `S.N.Vectors` code to be treated as intrinsic, and prevents the various types from being resolved as `TYP_SIMD*` (which also generally prevents these types from appearing).

## Current State

This section discusses the current state of the world for netcoreapp3.0.

### `COMPlus_Enable<ISA>`

We have a number of `Enable<ISA>` flags, including: the pre-existing `SSE3_4` and `AVX` flags as well as the `SSE`, `SSE2`, `SSE3`, SSSE3`, `SSE41, `SSE42`, `AVX2`, `AES`, `PCLMULQDQ`, `POPCNT`, FMA`, `LZCNT`, `BMI1`, and `BMI2` flags. All of these are used in combination with a corresponding CPUID check that the VM does.

These flags impact the compiler support for a given ISA and any ISAs that are "descendants" of that ISA (e.g. `SSE=0` would also disable `SSE2` which would disable any ISAs dependent on `SSE2`, etc). The flags are currently used primarily for the `HWIntrinsics` feature as that is the only thing that will cause the various instructions to be generated. In the future, these flags might be applicable more generally depending on other optimizations the JIT could consider. An exception to this is `SSE` and `SSE2` which are considered "baseline support" by RyuJIT. These ISAs only impact the corresponding `HWIntrinsic` ISAs and do not actually impact compiler support for generating these instructions.

The pre-existing `SSE3_4` flag is now treated as equivalent to `SSE3`. It impacts the `SSE3` ISA and any child ISAs (including `AVX`). It otherwise functions identically and continues impacting the codgen support for `S.N.Vectors`.

The pre-existing `AVX` flag continues impacting the size of `Vector<T>` and whether the compiler emits `VEX` encoded instructions. However, for the size of `Vector<T>` it now does so indirectly (in that disabling `AVX` also disables `AVX2`), as the check was shifted onto `AVX2`.

### `COMPlus_FeatureSIMD` and `COMPlus_EnableHWIntrinsic`

The `COMPlus_FeatureSIMD` flag continues functioning as it did before.

The `COMPlus_EnableHWIntrinsic` flag controls whether the `System.Runtime.Intrinsic` methods are treated as intrinsic, and therefore, whether they throw a `PNSE` or generate actual code when the given ISA is supported by the CPU/Compiler. There is currently a bug that setting `EnableHWIntrinsic=0` will also disable compiler support for all the various ISAs listed above. This also means that it currently impacts the size of `Vector<T>` and whether the compiler will emit VEX encoded instructions.

## Proposal for Cleanup

In this section, I will attempt to describe where we want to be with the various flags.

### New Flag: `EnableVEX`

Currently we control the `VEX` support for the compiler by checking the `EnableAVX` flag (and corresponding CPUID check done by the VM). However, there are two ISAs that require the VEX encoding but not for AVX to also be enabled, these are `BMI1` and `BMI2`. While we should never encounter a CPU that has `BMI1`/`BMI2` but that does not also support `AVX`, `AVX` requires an additional check that the OS supports saving/restoring the 256-bit YMM registers. This support is not guaranteed and, at least on Windows, can be toggled by the user. Due to this, we need the check to be updated so that the `BMI1`/`BMI2` ISAs  (and any future ISAs with similar requirements) can still use the VEX encoding. Additionally, the `VEX` encoding is generally more efficient (it removes the RMW requirement from most of the instructions and supports unaligned memory addresses) and it may be desirable to still emit the VEX encoded instructions for `SSE` through `SSE42` when the user has set `EnableAVX=0`.

The proposal is then to expose a new `COMPlus_EnableVEX` flag that is used to control the VEX encoding. Setting it to `0` would disable any ISA that requires the VEX encoding (`AVX`, `AVX2`, `FMA`, `BMI1`, and `BMI2`, as well as any future ISAs). Its default value (`1`) would allow the compiler to emit the VEX encoding for `SSE` through `SSE42` when the CPU/OS support AVX but when the user has set `EnableAVX=0`. It would also allow other ISAs not in the AVX hierarchy (`BMI1` and `BMI2`) to be emitted even when the OS does not support the saving/restoring the 256-bit YMM registers.

An alternative would be to not expose a new flag and instead just update the emitter to know that it can use the `VEX` encoding for the `BMI1`/`BMI2` ISAs. The only difference from the above would be that `SSE` through `SSE42` would not use the VEX encoding when `AVX=0` (and when the OS supports saving/restoring the 256-bit YMM registers). This might be a more accurate state since the `VEX` encoded forms of the `SSE` through `SSE42` instructions are technically part of the AVX instruction set.

### New Flag: `VectorTSize`

Currently we control the sizeof `Vector<T>` by defaulting it to `16` and changing it to `32` if `AVX2` is supported. However, this is not very extensible (what do we do when/if `AVX-512` becomes supported and the size can be `64`) and it is very much tied to x86 (you wouldn't want this to impact ARM if we add `SVE` support). It also means that if you need a smaller `Vector<T>`, you must also disable the general compiler support for the `AVX2` ISA (at a minimum). This also impacts the HWIntrinsics feature.

The proposal is then to expose a new `COMPlus_VectorTSize` flag that is used to control the sizeof `Vector<T>`. The value would default to `0` which would mean to follow the normal logic we have today (size to `16` by default and change to `32` if `AVX2` is supported). We would then come up with an additional scheme such that other values allow the user to explicitly size `Vector<T>` (to a supported size).

My current thinking is that any unsupported value is treated as `0` (default). Otherwise, the supported values are the exact sizes (16 or 32, in the future 64 if `AVX-512` becomes supported, etc). Another option would be that the value is treated as the nearest size that is less than the given size. As an example, if the user gives `31`, it would be sized `16`. If the user gave `64` and we only support `32` and `16`, it would be `32`. If the user gave `100` and we support `128`, `64`, `32`, and `16`; they would get `64`.

The flag would continue being used in conjunction with the `Enable<ISA>` checks for a given platform, as you can't size `Vector<T>` to `32` if `AVX` is not supported (for example).

### `COMPlus_Enable<ISA>`

These flags are currently in a fairly good state, some considerations might be:
* Should we be exposing `SSE` and `SSE2` or should they be folded back into the `EnableHWIntrinsic` flag (given that they are considered "baseline" for CoreCLR).
* Can we get rid of `SSE3_4`, since this is now covered by the individual `SSE3`, `SSSE3`, `SSE41`, and `SSE42` flags and since it is treated as equivalent to `SSE3` (which will also disable the child ISAs).

### `COMPlus_FeatureSIMD` and `COMPlus_EnableHWIntrinsic`

`COMPlus_FeatureSIMD` should have its scope reduced so that it only impacts the `S.N.Vectors` codegen. The `TYP_SIMD*` support should be split out into its own feature that `FEATURE_SIMD` and `FEATURE_HW_INTRINSICS` can sit ontop of.

`COMPlus_EnableHWIntrinsic` should be fixed so that it only impacts the `S.R.Intrinsics` codegen. It should have no impact on the various ISAs the compiler lists as supported.



category:implementation
theme:vector-codegen
skill-level:intermediate
cost:medium
impact:small

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Current state of the SIMD/HWIntrinsic configuration knobs and proposed cleanup #11701

Prior to HWIntrinsics

`COMPlus_EnableSSE3_4` and `COMPlus_EnableAVX`

`COMPlus_FeatureSIMD`

Current State

`COMPlus_Enable<ISA>`

`COMPlus_FeatureSIMD` and `COMPlus_EnableHWIntrinsic`

Proposal for Cleanup

New Flag: `EnableVEX`

New Flag: `VectorTSize`

`COMPlus_Enable<ISA>`

`COMPlus_FeatureSIMD` and `COMPlus_EnableHWIntrinsic`

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Current state of the SIMD/HWIntrinsic configuration knobs and proposed cleanup #11701

Description

Prior to HWIntrinsics

COMPlus_EnableSSE3_4 and COMPlus_EnableAVX

COMPlus_FeatureSIMD

Current State

COMPlus_Enable<ISA>

COMPlus_FeatureSIMD and COMPlus_EnableHWIntrinsic

Proposal for Cleanup

New Flag: EnableVEX

New Flag: VectorTSize

COMPlus_Enable<ISA>

COMPlus_FeatureSIMD and COMPlus_EnableHWIntrinsic

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions

`COMPlus_EnableSSE3_4` and `COMPlus_EnableAVX`

`COMPlus_FeatureSIMD`

`COMPlus_Enable<ISA>`

`COMPlus_FeatureSIMD` and `COMPlus_EnableHWIntrinsic`

New Flag: `EnableVEX`

New Flag: `VectorTSize`

`COMPlus_Enable<ISA>`

`COMPlus_FeatureSIMD` and `COMPlus_EnableHWIntrinsic`