[TOPI] Fix GPU Dynamic Op Schedule #7117
Conversation
mbrookhart left a comment:
A couple of nitpicks, but overall it looks great. Awesome work.
```python
mod,
[np_indices_result, np_valid_box_count],
only_vm=False,
disable_targets=["nvptx"],
```
This tests the empty output VM change 👍
Why disable nvptx?
There is an issue causing a segfault from dynamic NMS for nvptx, and generally we need thrust for any dynamic-shape sorting. For now nvptx is not ready for these operations.
Makes sense. I'm trying to fix the default sort kernel in #7099, if you want to take a look.
```python
num_inputs = 25
x = [relay.var("x", shape=(relay.Any(),), dtype="float32") for _ in range(num_inputs)]
z = relay.op.concatenate(x, axis=0)
```
this tests the injective schedule 👍
```python
score_axis = score_index
score_shape = (batch_size, num_anchors)
score_tensor = te.compute(score_shape, lambda i, j: data[i, j, score_axis], tag=tag.ELEMWISE)
data_buf = tvm.tir.decl_buffer(data.shape, data.dtype, "data_buf", data_alignment=8)
```
This looks fine, but I'm a little surprised it's necessary. Do you have a test case that breaks the current code, or is this mostly for performance?
When the NMS workload is large, as in RCNN models, the general CUDA injective schedule can still cause runtime errors even with the improvements in this PR. It's common for any dynamic injective op to hit runtime issues with the current uniform CUDA injective schedule.
This problem is not directly related to NMS but to the CUDA injective schedule itself. Later we may need to revisit this part for GPU dynamic ops and find a better, more general solution (together with more tests).
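The core idea behind limiting resources can be illustrated with a small standalone sketch (this is not TVM's actual schedule code; the device limits below are assumed typical values): when the extent is dynamic, the launch configuration must be clamped and each thread loops over multiple elements, instead of sizing the grid exactly as a static-shape schedule would.

```python
# Illustrative sketch, not TVM's implementation. The limits are assumed
# typical CUDA device bounds, used here only for demonstration.
MAX_THREADS_PER_BLOCK = 1024  # assumed per-block thread limit
MAX_BLOCKS = 65535            # assumed grid-dimension limit

def launch_config(num_elements):
    """Pick a (blocks, threads, items_per_thread) triple for a 1-D kernel.

    With a dynamic extent we clamp both grid dimensions so the kernel
    never requests more resources than the device allows; each thread
    then strides over several elements if needed.
    """
    threads = min(num_elements, MAX_THREADS_PER_BLOCK)
    blocks = min((num_elements + threads - 1) // threads, MAX_BLOCKS)
    # ceil(num_elements / (blocks * threads)) elements handled per thread
    items_per_thread = -(-num_elements // (blocks * threads))
    return blocks, threads, items_per_thread
```

For a small workload this behaves like a normal one-element-per-thread launch; for a very large dynamic workload the grid is capped and the per-thread loop absorbs the remainder.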
```python
if cfg.is_fallback:
    N, F, Y, X = get_const_tuple(conv.shape)
    if not isinstance(N, int):
        N = 1
```
Can we add a test that hits this change?
Yeah, we do have a test for this. I've now enabled all targets.
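The fallback pattern in the diff above can be sketched in isolation (a hypothetical helper, not TVM code; in TVM the symbolic dim would be a `tir.Var` rather than the string used below): when a shape dimension is dynamic, substitute a concrete placeholder so a fallback tuning config can still be computed.

```python
# Hypothetical sketch of the "replace symbolic dims for fallback" idea.
def concrete_shape(shape, placeholder=1):
    """Replace any non-integer (symbolic/dynamic) dims with `placeholder`.

    Mirrors the pattern of checking `isinstance(dim, int)` and assuming
    a batch of 1 when the dimension is not statically known.
    """
    return tuple(d if isinstance(d, int) else placeholder for d in shape)
```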
Thanks @mbrookhart
* Fix GPU dynamic op schedules
* Fix dynamic shape nms
* Fix
* Fix test format
This PR limits the resources used by dynamic-shape GPU kernels to avoid runtime errors. It also skips `CallPacked` in the VM if a kernel has only one output and that output is empty, e.g. shape (1, 0, 6). After this PR, TF and PyTorch object detection models should be runnable on NVIDIA GPUs.
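The empty-output skip reduces to a simple check, sketched below (a minimal standalone illustration, not the actual VM code): a tensor whose shape contains a zero extent, such as (1, 0, 6), holds no elements, so invoking the kernel would be a no-op at best and can be skipped.

```python
# Minimal sketch of the "skip empty output" check; not the actual VM code.
from math import prod

def should_skip_kernel(output_shapes):
    """Return True when there is exactly one output and it is empty.

    An output is empty when the product of its extents is zero,
    e.g. (1, 0, 6) -> 1 * 0 * 6 == 0 elements.
    """
    return len(output_shapes) == 1 and prod(output_shapes[0]) == 0
```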
@zhiics @Laurawly @mbrookhart