[TOPI] Rewrite GPU argwhere using exclusive scan #7314
Merged
masahi merged 2 commits into apache:main on Jan 21, 2021
Contributor
Could we add a column for the performance of this PR without thrust (i.e., with the TIR exclusive scan)?
mbrookhart approved these changes on Jan 20, 2021
Contributor
mbrookhart left a comment
I'd like to include benchmarks without thrust in the PR for posterity, but otherwise this looks great, thanks! I'd wait to merge until @zhiics can review, since he wrote the existing kernel.
Member
Author
OK, updated the numbers to include the TIR scan result.
Contributor
👍 Not as fast as thrust, as expected, but it's good to see it's still a performance improvement.
85a91e9 to 63469a6
Member
Author
Thanks @mbrookhart @zhiics
alexwong pushed a commit to alexwong/tvm that referenced this pull request on Feb 11, 2021: use ex scan to write argwhere; add doc
electriclilies pushed a commit to electriclilies/tvm that referenced this pull request on Feb 18, 2021: use ex scan to write argwhere; add doc
Lokiiiiii pushed a commit to Lokiiiiii/tvm that referenced this pull request on Mar 2, 2021: use ex scan to write argwhere; add doc
trevor-m pushed a commit to neo-ai/tvm that referenced this pull request on Mar 2, 2021: use ex scan to write argwhere; add doc
This PR improves the implementation of GPU `argwhere` added in #6868, using exclusive scan (see #7303).

The current implementation of `argwhere` is very inefficient because it uses an atomic operation to update the write location. Since all threads compete for that single location, the kernel is effectively sequential. Moreover, since the output indices need to be lexicographically sorted, the current implementation involves sorting along each axis.

Since `argwhere` is literally an instance of stream compaction, it is a perfect application of exclusive scan. Now, `argwhere` simply consists of two steps, both of which are highly parallel operations. Thus, both the atomic and the sort are gone, vastly simplifying the implementation. Moreover, it also brings a huge speedup, as shown below.

All numbers are in milliseconds.

Please review @zhiics @Laurawly @mbrookhart @tkonolige @anijain2305 @trevor-m
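
For readers unfamiliar with stream compaction, here is a minimal pure-Python sketch of how an exclusive scan can drive `argwhere`. It is a sequential reference for illustration only, not the TIR or thrust code in this PR; the helper names `exclusive_scan`, `unravel`, and `argwhere_scan` are hypothetical, and the input is assumed to be a flattened row-major array.

```python
from itertools import accumulate


def exclusive_scan(values):
    """Exclusive prefix sum: out[i] == sum(values[:i])."""
    if not values:
        return []
    return [0] + list(accumulate(values))[:-1]


def unravel(flat_index, shape):
    """Convert a flat row-major index into a multi-dimensional index."""
    coords = []
    for dim in reversed(shape):
        coords.append(flat_index % dim)
        flat_index //= dim
    return tuple(reversed(coords))


def argwhere_scan(flat_data, shape):
    """Illustrative argwhere via stream compaction (not the PR's kernel)."""
    # Flag non-zero elements, then scan the flags. For a flagged element i,
    # write_pos[i] is the output row it owns.
    flags = [1 if x != 0 else 0 for x in flat_data]
    write_pos = exclusive_scan(flags)
    num_nonzero = write_pos[-1] + flags[-1] if flags else 0

    # Compaction: every iteration writes to a distinct slot, so on a GPU
    # each element could be handled by its own thread without atomics.
    out = [None] * num_nonzero
    for i, flag in enumerate(flags):
        if flag:
            out[write_pos[i]] = unravel(i, shape)
    return out


result = argwhere_scan([0, 3, 0, 0, 5, 7], (2, 3))
print(result)
```

Because the scan preserves the row-major order of the input, the output indices in this sketch come out lexicographically sorted without a separate sort step.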