Resolve the issues and make the modifications needed to support CUDA for the mixed precision pass in #8069.
Initial issues, as described by @Lunderberg:
On the CUDA side, it's failing a check that requires 16-bit floats to be used in pairs:
Check failed: lanes % 2 == 0 (1 vs. 0) : only support even lane for half type
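To illustrate the constraint behind this check, here is a minimal sketch in plain Python. It is not the TVM code generator; `half_lanes_to_words` is a hypothetical helper, and the assumption is that vectors of 16-bit floats are packed two per 32-bit word, so an odd lane count cannot be packed.

```python
def half_lanes_to_words(lanes: int) -> int:
    """Hypothetical helper: number of 32-bit words needed for `lanes` float16 values.

    Assumes float16 values are packed in pairs, mirroring the
    `lanes % 2 == 0` condition quoted in the error message.
    """
    if lanes % 2 != 0:
        # Same condition as the failing check: an odd lane count is rejected.
        raise ValueError(f"only support even lane for half type (lanes={lanes})")
    return lanes // 2


# 4 float16 lanes pack into 2 words; an odd count such as 3 raises ValueError.
words = half_lanes_to_words(4)
```

Under this assumption, a float16 vector with an odd number of lanes would trip the check, which is consistent with the `(1 vs. 0)` in the message above.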
This issue is complete when the unit tests pass for the CUDA target.