[TOPI, CUDA] Bug fix: properly bind gpu threads to injective op #1603
masahi wants to merge 1 commit into apache:master
Conversation
This seems to be a problem with the fusor: currently we do not encourage fusing an injective op into conv2d; only elemwise is allowed.
OK, closing it for now. But to change the fusor, we need to realize conv + batchnorm and broadcast_mul before the final elemwise_add. I think the logic becomes very tricky.
The fusor's logic should prevent broadcast_mul from being fused into conv2d + bn, because broadcast_mul should be marked as injective (due to it being fused into reshape).
The difficulty seems to be that broadcast_mul is a sibling node of conv + bn, so it cannot 'see' the conv + bn node during the initial partitioning stage. But both are assigned FuseRule::kFuseToMaster toward the elemwise_add at the bottom, so they are fused during the last grouping stage.
We should be able to check the fused type of the group, which is injective rather than elemwise.
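As a rough illustration of that check, here is a hypothetical Python sketch; the real fusor is NNVM's C++ graph-fusion pass, and the helper below is invented for illustration, with pattern values mirroring NNVM's OpPatternKind ordering:

```python
# Invented helper for illustration only; not NNVM's actual code.
K_ELEMWISE, K_BROADCAST, K_INJECTIVE = 0, 1, 2
K_OUT_EWISE_FUSABLE = 4  # e.g. conv2d

def may_fuse_to_master(group_pattern, master_pattern):
    """Return True if a group may be fused into its master group."""
    if master_pattern == K_OUT_EWISE_FUSABLE:
        # Only elemwise groups may fuse into a conv2d master; an
        # injective group (e.g. one containing reshape, or the
        # broadcast_mul discussed above) must stay separate.
        return group_pattern == K_ELEMWISE
    return True
```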
OK, my plan is as follows:
We had an interesting error report on the discussion forum, where a reshape op is fused into a convolution. In a picture, it looks like this (nodes in red are fused by NNVM). This fusion happens even without the recent change I made in #1548.
The error occurred for the cuda target because the reshape op, which is an injective op, is not bound to GPU threads when fused with a convolution. The output of tvm.lower(...) is below.
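For context, TOPI's standalone injective schedule for CUDA makes such ops lowerable by fusing their axes and binding them to GPU threads. A minimal sketch of that kind of binding (using the current tvm.te API; the helper name and split factor are illustrative):

```python
from tvm import te

def bind_injective(sch, out, num_threads=64):
    # Fuse all output axes into one, split by the thread count, and
    # bind the pieces to GPU blocks/threads. Without such a binding
    # (or a compute_inline into a bound stage), lowering the fused
    # injective stage for CUDA fails.
    fused = sch[out].fuse(*sch[out].op.axis)
    bx, tx = sch[out].split(fused, factor=num_threads)
    sch[out].bind(bx, te.thread_axis("blockIdx.x"))
    sch[out].bind(tx, te.thread_axis("threadIdx.x"))
```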
I replaced the use of tag.is_broadcast with tag.is_injective to make sure injective ops, as well as broadcast ops, are inlined correctly. But I'm not quite sure whether this is a valid change, so please review, @tqchen @Laurawly @merrymercy. If this looks good, I will replace the other uses of tag.is_broadcast() in the other backends.
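For reference, a sketch of the traversal pattern this changes (current tvm.topi API; simplified from the real schedule, which also handles the conv2d stage itself):

```python
from tvm import te
from tvm.topi import tag

def traverse(sch, op, visited=None):
    # Walk back from the output, inlining every fused injective stage
    # (elemwise, broadcast, or general injective such as reshape) that
    # is not itself a schedule output. Checking tag.is_broadcast here
    # instead would leave a fused reshape stage unscheduled.
    visited = set() if visited is None else visited
    if op in visited:
        return
    visited.add(op)
    if tag.is_injective(op.tag):
        if op not in sch.outputs:
            sch[op].compute_inline()
        for tensor in op.input_tensors:
            if isinstance(tensor.op, te.ComputeOp):
                traverse(sch, tensor.op, visited)
```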
I added a simplified test case which fails without this PR.
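For illustration, a hypothetical reproduction at the TOPI level might look like the sketch below (current tvm.te/topi API; the shapes, names, and exact schedule entry point are assumptions, not the PR's actual test):

```python
import tvm
from tvm import te, topi

# A conv2d whose output feeds a reshape; the reshape is an injective
# op that the conv2d schedule must inline or bind to GPU threads.
data = te.placeholder((1, 3, 32, 32), name="data")
kernel = te.placeholder((8, 3, 3, 3), name="kernel")
conv = topi.nn.conv2d(data, kernel, strides=1, padding=1, dilation=1)
out = topi.reshape(conv, (1, 8 * 32 * 32))

# Requires a CUDA-enabled TVM build; without the fix, building fails
# because the fused reshape stage is left unbound.
with tvm.target.Target("cuda"):
    sch = topi.cuda.schedule_conv2d_nchw([out])
    func = tvm.build(sch, [data, kernel, out], target="cuda")
```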