Boolean reduction performance improvements #1401
Similar to the changes made to sum, boolean reductions now traverse the iteration dimension fastest
- Aligns with similar changes to sum
View rendered docs @ https://intelpython.github.io/dpctl/pulls/1401/index.html

Array API standard conformance tests for dpctl=0.14.6dev5=py310ha25a700_4 ran successfully.

Using the same example as in #1364, the performance benefits are clear: Before: After:

Please add
It's been added and fixes the CI. I'll look into properly solving the problem in a separate PR.

Array API standard conformance tests for dpctl=0.14.6dev5=py310ha25a700_5 ran successfully.
This PR makes changes to boolean reductions which align with #1364.
Namely, the traversal pattern of work groups in boolean reductions has been changed to be fastest over the iteration dimension, rather than the reduction dimension, and a specialized kernel for reductions over axis 0 in matrices has been added. The original contiguous boolean reduction kernel has also been renamed to make the difference more apparent.
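
To illustrate the traversal change described above, here is a conceptual sketch in plain Python. It is not the dpctl kernel (the kernels themselves are not Python), and the names `group_to_coords`, `any_axis0`, and `batch` are hypothetical, chosen only for this example. It assumes a row-major matrix reduced over axis 0, so that the iteration dimension runs over columns, which are adjacent in memory.

```python
def group_to_coords(group_id, iter_nelems, reduction_groups, iteration_fastest=True):
    """Map a flat work-group id to (iteration index, reduction batch index).

    Hypothetical helper: it only models the two orderings discussed in the PR.
    """
    if iteration_fastest:
        # Consecutive group ids step through the iteration dimension first.
        return group_id % iter_nelems, group_id // iter_nelems
    # Alternative ordering: consecutive ids step through the reduction dimension first.
    return group_id // reduction_groups, group_id % reduction_groups


def any_axis0(matrix, batch=2, iteration_fastest=True):
    """Boolean 'any' over axis 0 of a list-of-lists matrix, one simulated group at a time."""
    n_rows, n_cols = len(matrix), len(matrix[0])
    reduction_groups = -(-n_rows // batch)  # ceiling division
    out = [False] * n_cols
    visited = []  # order in which (column, row batch) pairs are processed
    for gid in range(n_cols * reduction_groups):
        col, rbatch = group_to_coords(gid, n_cols, reduction_groups, iteration_fastest)
        visited.append((col, rbatch))
        rows = range(rbatch * batch, min((rbatch + 1) * batch, n_rows))
        # Combine the partial result of this batch into the output element.
        out[col] = out[col] or any(bool(matrix[r][col]) for r in rows)
    return out, visited


m = [[0, 0, 1],
     [0, 0, 0],
     [0, 1, 0]]
result, order = any_axis0(m)                                  # iteration dimension fastest
result_alt, order_alt = any_axis0(m, iteration_fastest=False) # reduction dimension fastest
```

Both orderings produce the same result; only the visiting order differs. With the iteration dimension fastest, consecutive groups touch neighbouring columns of the same rows, which in a row-major layout are neighbouring memory locations.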