Skip to content

Better validation of gpu schedules#8068

Merged
abadams merged 6 commits intomainfrom
abadams/validate_gpu_schedules
Feb 7, 2024
Merged

Better validation of gpu schedules#8068
abadams merged 6 commits intomainfrom
abadams/validate_gpu_schedules

Conversation

@abadams
Copy link
Copy Markdown
Member

@abadams abadams commented Feb 5, 2024

GPU loop constraints were checked in two different places. Checking them
in ScheduleFunctions was incorrect because it didn't consider update
definitions and specializations. Checking them in FuseGPUThreadLoops was
too late, because the Var names have gone (they've been renamed to
things like __thread_id_x). Furthermore, some problems were internal
errors or runtime errors when they should have been user errors. We
allowed 4d thread and block dimensions, but then hit an internal error.

This PR centralizes checking of GPU loop structure in
CanonicalizeGPUVars and adds more helpful error messages that print the
problematic loop structure. E.g:

Error: GPU thread loop over f$8.s0.v0 is inside three other GPU thread
loops. The maximum number of nested GPU thread loops is 3. The loop nest
is:
compute_at for g$8:
 for g$8.s0.v7:
  for g$8.s0.v6:
   for g$8.s0.v5:
    for g$8.s0.v4:
     gpu_block g$8.s0.v3:
      gpu_block g$8.s0.v2:
       gpu_thread g$8.s0.v1:
        gpu_thread g$8.s0.v0:
         store_at for f$8:
          compute_at for f$8:
           gpu_thread f$8.s0.v1:
            gpu_thread f$8.s0.v0:

Fixes the bug found in #7946

This means we actually print error messages when using exceptions and
the makefile
GPU loop constraints were checked in two different places. Checking them
in ScheduleFunctions was incorrect because it didn't consider update
definitions and specializations. Checking them in FuseGPUThreadLoops was
too late, because the Var names have gone (they've been renamed to
things like __thread_id_x). Furthermore, some problems were internal
errors or runtime errors when they should have been user errors. We
allowed 4d thread and block dimensions, but then hit an internal error.

This PR centralizes checking of GPU loop structure in
CanonicalizeGPUVars and adds more helpful error messages that print the
problematic loop structure. E.g:

```
Error: GPU thread loop over f$8.s0.v0 is inside three other GPU thread
loops. The maximum number of nested GPU thread loops is 3. The loop nest
is:
compute_at for g$8:
 for g$8.s0.v7:
  for g$8.s0.v6:
   for g$8.s0.v5:
    for g$8.s0.v4:
     gpu_block g$8.s0.v3:
      gpu_block g$8.s0.v2:
       gpu_thread g$8.s0.v1:
        gpu_thread g$8.s0.v0:
         store_at for f$8:
          compute_at for f$8:
           gpu_thread f$8.s0.v1:
            gpu_thread f$8.s0.v0:
```

Fixes the bug found in #7946
Comment thread src/ScheduleFunctions.cpp Outdated

std::ostringstream err;

/*
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this is meant to left here in commented-out form, it is essential you add a comment explaining why. (Otherwise, just delete it.)

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oops, deleted.

@steven-johnson
Copy link
Copy Markdown
Contributor

ready to land?

@abadams abadams merged commit 39e5c08 into main Feb 7, 2024
ardier pushed a commit to ardier/Halide-mutation that referenced this pull request Mar 3, 2024
* Update makefile to use test/common/terminate_handler.cpp

This means we actually print error messages when using exceptions and
the makefile

* Better validate of GPU schedules

GPU loop constraints were checked in two different places. Checking them
in ScheduleFunctions was incorrect because it didn't consider update
definitions and specializations. Checking them in FuseGPUThreadLoops was
too late, because the Var names have gone (they've been renamed to
things like __thread_id_x). Furthermore, some problems were internal
errors or runtime errors when they should have been user errors. We
allowed 4d thread and block dimensions, but then hit an internal error.

This PR centralizes checking of GPU loop structure in
CanonicalizeGPUVars and adds more helpful error messages that print the
problematic loop structure. E.g:

```
Error: GPU thread loop over f$8.s0.v0 is inside three other GPU thread
loops. The maximum number of nested GPU thread loops is 3. The loop nest
is:
compute_at for g$8:
 for g$8.s0.v7:
  for g$8.s0.v6:
   for g$8.s0.v5:
    for g$8.s0.v4:
     gpu_block g$8.s0.v3:
      gpu_block g$8.s0.v2:
       gpu_thread g$8.s0.v1:
        gpu_thread g$8.s0.v0:
         store_at for f$8:
          compute_at for f$8:
           gpu_thread f$8.s0.v1:
            gpu_thread f$8.s0.v0:
```

Fixes the bug found in halide#7946

* Delete dead code

* Actually clear the ostringstream
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants