[METAL] Split kernels and compile them separately#7980
Conversation
Refactor Metal module to build each kernel separately. This should help to avoid potential problem then generated blockSize is more than maxTotalThreadsPerThreadgroup.
| id<MTLLibrary> lib = nil; | ||
| std::string source; | ||
| auto kernel = parsed_kernels_.find(func_name); | ||
| if (kernel != parsed_kernels_.end()) |
There was a problem hiding this comment.
I would add here a comment that this behaviour is done for backward compatibility when we had all kernels going together without explicit separators, but looks good
There was a problem hiding this comment.
Do we have metal deployments that require the old behavior that can't be rebuilt easily? Otherwise we don't need to preserve backward compatibility.
There was a problem hiding this comment.
I cannot say for sure. I can imagine some use cases which might have issues with recompilation, but this is just assumptions.
If we do not declare backward compatibility for now, we can skip my comment and return original behaviour and rise an exception if we do not find kernel in parsed_kernels_ container.
There was a problem hiding this comment.
In that case I personally would prefer the metal backward compat be specific to the metal backend and the general SplitKernel util to contain the existence ICHECK on the function delimiter.
There was a problem hiding this comment.
@tqchen do we care about backward compatibility or not? If not, it might be ok so far, but we need to decide when we will start to care and what use cases should work and be regularly verified in the future. It must not be a deal only metal backend
@csullivan what bahavior do you propose in common SplitKernel? Metal backend uses it, it cannot just abort execution. Warnings in the runtime are also useles. It might be an error code, but empty returned map detects the absent of Function delimiter with the same efficiency as return code
|
cc @masahi @csullivan please help to review the PR |
8791c68 to
939fd09
Compare
939fd09 to
070bba4
Compare
|
@tqchen please take a look again |
tqchen
left a comment
There was a problem hiding this comment.
One additional comment on updating the bundles
| std::string delimiter) { | ||
| std::unordered_map<std::string, std::string> split_kernels; | ||
| if (source.size()) { | ||
| size_t begin = source.find(delimiter); |
There was a problem hiding this comment.
I noticed you deleted ICHECK here, probably because the error message referred to OpenCL specifically, but there still should be an
ICHECK(begin != std::string::npos)
on the first search for a kernel delimiter. If none are found, then it is better to catch an error in the code generation early.
There was a problem hiding this comment.
I did it to keep backward compatibility as @elvin-nnov requested. In case when the Metal library was generated before this change, we won't find any delimiters in the source code.
But I added some checks to opencl module:
https://github.com/apache/tvm/pull/7980/files#diff-9d619c5f05402f93afa552f93675dea84d789f8f6b0c553845aa183128f38494R192-R196
Also, I added check in metal module were kernels found or not: https://github.com/apache/tvm/pull/7980/files#diff-dc30040e2ec938d8dd0e6e87d6517870f037093775d4b52e0f249fdbebfdb9a3R95-R101
There was a problem hiding this comment.
I would prefer the metal backward compat be specific to the metal backend and the general SplitKernel util to contain the existence ICHECK on the function delimiter.
csullivan
left a comment
There was a problem hiding this comment.
Please see suggested change ⬇️ after that LGTM.
| std::string delimiter) { | ||
| std::unordered_map<std::string, std::string> split_kernels; | ||
| if (source.size()) { | ||
| size_t begin = source.find(delimiter); |
There was a problem hiding this comment.
I would prefer the metal backward compat be specific to the metal backend and the general SplitKernel util to contain the existence ICHECK on the function delimiter.
Refactor Metal module to build each kernel separately. This should help
to avoid potential problem then generated blockSize is more than
maxTotalThreadsPerThreadgroup.
Thanks for contributing to TVM! Please refer to guideline https://tvm.apache.org/docs/contribute/ for useful information and tips. After the pull request is submitted, please request code reviews from Reviewers by @ them in the pull request thread.