Skip to content

Only commutative reductions can be parallelized#6609

Merged
steven-johnson merged 1 commit intohalide:masterfrom
Elarnon:patch-2
Feb 14, 2022
Merged

Only commutative reductions can be parallelized#6609
steven-johnson merged 1 commit intohalide:masterfrom
Elarnon:patch-2

Conversation

@Elarnon
Copy link
Copy Markdown
Contributor

@Elarnon Elarnon commented Feb 8, 2022

Because parallelization changes the order of computation within the reduction, parallelizing associative but non-commutative reductions can result in (non-deterministically) incorrect results in the same way reordering them can.

For instance Halide currently accepts the following code, but generates non-deterministic outputs on GPU. On CPU with .parallel(r.x), OpenMP rejects the generated code (correctly) stating that the #pragma omp atomic is invalid for the same reasons.

#include <stdio.h>

#include "Halide.h"

using namespace Halide;

int main(int argc, char **argv) {
        Halide::Func A("A"), B("B");
        Halide::Var i("i");

        A(i) = i;
        B() = -1;
        Halide::RDom r(0, 1024);
        B() = A(r.x);

        A.compute_root();
        B.update().atomic().gpu_blocks(r.x);

        B.compile_jit(get_host_target().with_feature(Target::CUDA));
        Halide::Buffer<int32_t> b = B.realize();
        printf("%d\n", b());

        return 0;
}

Because parallelization changes the order of computation within the reduction, parallelizing associative but non-commutative reductions can result in (non-deterministically) incorrect results in the same way `reorder`ing them can.

For instance Halide currently accepts the following code, but generates non-deterministic outputs on GPU.  On CPU with `.parallel(r.x)`, OpenMP rejects the generated code (correctly) stating that the `#pragma omp atomic` is invalid for the same reasons.

```c++
#include <stdio.h>

#include "Halide.h"

using namespace Halide;

int main(int argc, char **argv) {
        Halide::Func A("A"), B("B");
        Halide::Var i("i");

        A(i) = i;
        B() = -1;
        Halide::RDom r(0, 1024);
        B() = A(r.x);

        A.compute_root();
        B.update().atomic().gpu_blocks(r.x);

        B.compile_jit(get_host_target().with_feature(Target::CUDA));
        Halide::Buffer<int32_t> b = B.realize();
        printf("%d\n", b());

        return 0;
}
```
@abadams
Copy link
Copy Markdown
Member

abadams commented Feb 11, 2022

I believe this is correct. Thanks for the fix.

@steven-johnson steven-johnson merged commit 38032e8 into halide:master Feb 14, 2022
@Elarnon Elarnon deleted the patch-2 branch February 15, 2022 09:13
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants