Better validation of gpu schedules by abadams · Pull Request #8068 · halide/Halide

abadams · 2024-02-05T23:21:14Z

GPU loop constraints were checked in two different places. Checking them
in ScheduleFunctions was incorrect because it didn't consider update
definitions and specializations. Checking them in FuseGPUThreadLoops was
too late, because by then the Var names are gone (they've been renamed to
things like __thread_id_x). Furthermore, some problems were reported as
internal errors or runtime errors when they should have been user errors.
For example, we allowed 4D thread and block dimensions, but then hit an
internal error.

This PR centralizes checking of GPU loop structure in
CanonicalizeGPUVars and adds more helpful error messages that print the
problematic loop structure. For example:

```
Error: GPU thread loop over f$8.s0.v0 is inside three other GPU thread
loops. The maximum number of nested GPU thread loops is 3. The loop nest
is:
compute_at for g$8:
 for g$8.s0.v7:
  for g$8.s0.v6:
   for g$8.s0.v5:
    for g$8.s0.v4:
     gpu_block g$8.s0.v3:
      gpu_block g$8.s0.v2:
       gpu_thread g$8.s0.v1:
        gpu_thread g$8.s0.v0:
         store_at for f$8:
          compute_at for f$8:
           gpu_thread f$8.s0.v1:
            gpu_thread f$8.s0.v0:
```

Fixes the bug found in #7946
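The nesting check described above can be pictured with a small, self-contained C++ sketch. It is not Halide's code: `LoopKind`, `LoopLevel`, `label` and `validate_gpu_nest` are hypothetical names, and the nest is modeled only as the path from the outermost loop down to the loop being checked rather than as Halide's IR. The sketch counts GPU block and GPU thread loops separately, rejects a fourth nested loop of either kind, and prints the nest in the same indented style as the example above (the count is printed as a digit rather than a word).

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical, simplified model of one level on the path from the outermost
// loop down to the loop being checked. This is NOT Halide's IR.
enum class LoopKind { Serial, GPUBlock, GPUThread, ComputeAt, StoreAt };

struct LoopLevel {
    LoopKind kind;
    std::string name;  // a loop var like "g$8.s0.v0", or a Func like "f$8"
};

// Maximum nesting depth for each GPU loop kind (3, per the error text above).
constexpr int kMaxNestedGPULoops = 3;

// Renders one level in the style of the example loop-nest dump.
std::string label(const LoopLevel &l) {
    switch (l.kind) {
    case LoopKind::Serial: return "for " + l.name;
    case LoopKind::GPUBlock: return "gpu_block " + l.name;
    case LoopKind::GPUThread: return "gpu_thread " + l.name;
    case LoopKind::ComputeAt: return "compute_at for " + l.name;
    case LoopKind::StoreAt: return "store_at for " + l.name;
    }
    return l.name;  // unreachable; silences missing-return warnings
}

// Walks the nest outermost-first, counting GPU block and GPU thread loops
// separately. Returns a user-facing error (including the nest printed down to
// the offending loop) if either count would exceed the limit.
std::optional<std::string> validate_gpu_nest(const std::vector<LoopLevel> &nest) {
    int blocks = 0, threads = 0;
    for (std::size_t i = 0; i < nest.size(); i++) {
        const LoopLevel &l = nest[i];
        int *count = nullptr;
        const char *kind = nullptr;
        if (l.kind == LoopKind::GPUBlock) {
            count = &blocks;
            kind = "block";
        } else if (l.kind == LoopKind::GPUThread) {
            count = &threads;
            kind = "thread";
        } else {
            continue;  // serial loops and compute_at/store_at sites don't count
        }
        if (*count == kMaxNestedGPULoops) {
            std::ostringstream err;
            err << "Error: GPU " << kind << " loop over " << l.name
                << " is inside " << *count << " other GPU " << kind
                << " loops. The maximum number of nested GPU " << kind
                << " loops is " << kMaxNestedGPULoops << ". The loop nest is:\n";
            for (std::size_t j = 0; j <= i; j++) {
                // One space of indentation per level, as in the example.
                err << std::string(j, ' ') << label(nest[j]) << ":\n";
            }
            return err.str();
        }
        ++*count;
    }
    return std::nullopt;
}
```

Run on the nest from the example, this sketch reports f$8.s0.v0 as the offending loop; dropping that innermost loop leaves three nested thread loops, which it accepts.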


steven-johnson · 2024-02-05T23:36:47Z


```
    std::ostringstream err;

+    /*
```


If this is meant to be left here in commented-out form, it is essential that you add a comment explaining why. (Otherwise, just delete it.)

Oops, deleted.


steven-johnson · 2024-02-07T17:42:55Z

ready to land?


abadams added 3 commits February 5, 2024 12:22

Update makefile to use test/common/terminate_handler.cpp

5733f52

This means we actually print error messages when using exceptions and the makefile

Merge remote-tracking branch 'origin/main' into abadams/validate_gpu_schedules

13161ad
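The first commit above ("Update makefile to use test/common/terminate_handler.cpp") is about making error messages actually appear when Halide is built with exceptions. The contents of that file aren't shown on this page, so the following is only a generic sketch of the standard C++ technique, not Halide's code; `describe_exception`, `print_exception_and_abort` and `install_terminate_handler` are hypothetical names. The idea is that when an exception escapes, `std::terminate` runs the installed handler, and `std::current_exception()` still refers to the escaping exception there, so its message can be printed before aborting.

```cpp
#include <cassert>
#include <cstdio>
#include <cstdlib>
#include <exception>
#include <stdexcept>
#include <string>

// Hypothetical helper: turns an exception_ptr into a printable message.
std::string describe_exception(std::exception_ptr p) {
    if (!p) {
        return "terminate called without an active exception";
    }
    try {
        std::rethrow_exception(p);
    } catch (const std::exception &e) {
        return std::string("Error: ") + e.what();
    } catch (...) {
        return "terminate called with a non-std::exception object";
    }
}

// Hypothetical terminate handler: prints the escaping exception's message
// (if any) to stderr, then aborts as the default handler would.
[[noreturn]] void print_exception_and_abort() {
    std::fprintf(stderr, "%s\n", describe_exception(std::current_exception()).c_str());
    std::abort();
}

// Call once, e.g. at the start of main(), before anything can throw.
void install_terminate_handler() {
    std::set_terminate(print_exception_and_abort);
}
```

Installing the handler once per test binary is enough; only the message formatting lives in a separate function so it can be exercised without actually terminating.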

steven-johnson approved these changes Feb 5, 2024


abadams added 3 commits February 5, 2024 16:27

Delete dead code

65b9aef

Actually clear the ostringstream

3d4c3b5

Merge branch 'abadams/validate_gpu_schedules' of https://github.com/halide/Halide into abadams/validate_gpu_schedules

a2387dd
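The commit "Actually clear the ostringstream" (3d4c3b5) concerns emptying a `std::ostringstream` (the reviewed code above declares one, `err`). Its diff isn't included here, so this is only a generic sketch of the usual C++ idiom, not the commit's code; `reset_stream` is a hypothetical name. The pitfall is that `clear()` on a stream resets only its error-state flags and leaves the buffered text in place.

```cpp
#include <cassert>
#include <sstream>

// Hypothetical helper: makes an ostringstream reusable for a new message.
void reset_stream(std::ostringstream &s) {
    s.str("");  // discard the buffered text
    s.clear();  // reset error-state flags (failbit, badbit, eofbit)
}
```

Calling only `clear()` would leave the previous message in the buffer, so the next message would be appended to it.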

abadams merged commit 39e5c08 into main Feb 7, 2024

BrewTestBot mentioned this pull request Jul 17, 2024

halide 18.0.0 Homebrew/homebrew-core#177657 (closed)

abadams merged 6 commits into main from abadams/validate_gpu_schedules
