Generate dot() in the Metal backend #7085
Monday morning review ping
We should also think longer-term about how we want to do these kinds of pattern matches. @rootjalex @abadams In the future, should we consider moving some of these rules into earlier passes and doing them via pattern matching?
Marking for backport to release/15.x
@shoaibkamil We've discussed separating instruction selection from CodeGen (and I have a slightly stale but still active PR open for doing so on x86, #6884). I was planning on doing this for ARM/HVX too, and we could definitely do it for some of the GPU backends as well. I think we still need to spend some time on the right design of the IR for this sort of thing; @abadams and I have not yet come to a conclusion on a few design principles.
I know HVX already has HexagonOptimize, but I want to turn some of those passes into proper term-rewriting systems. The current model of "exact pattern goes to specific intrinsic" is rather restrictive and does not support many of the rules that my project has generated.
* dot() support for Metal backend
* Restrict dot() to floats
* Generate dot() in the Metal backend (#7085)
* dot() support for Metal backend
* Restrict dot() to floats
* Fix subtle CMake Install bugs (#7103)
* Update CMakeLists.txt
* Update CMakeLists.txt
* Fix some dead links to the 'master' branch (#7107)
* Attempt to fix pip build issues (#7098)
* Add evaluate() and evaluate_may_gpu() to Python bindings (#7108)
* Add evaluate() and evaluate_may_gpu() to Python bindings
* pacify clang-tidy
Co-authored-by: Volodymyr Kysenko <vksnk@google.com>
Co-authored-by: Andrew Adams <andrew.b.adams@gmail.com>
This generates a dot() call for vector_reduce(Add, mul(float, float)). I've tested it locally to confirm that dot() is actually emitted. It would be nice to have something similar to simd_op_check for GPU targets, but that doesn't exist (#7084).
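To illustrate the kind of match described above, here is a minimal, self-contained sketch. It is not the code from this PR and does not use Halide's IR classes: `Expr`, `Kind`, `ElemType`, `var`, `mul`, `vector_reduce_add`, and `emit` are all hypothetical names invented for the example, and `reduce_add(...)` is only a placeholder string for whatever the generic fallback would emit. The idea is simply that an add-reduction whose operand is a float multiply is printed as `dot(a, b)`, and everything else falls through.

```cpp
#include <cassert>
#include <memory>
#include <string>

// Hypothetical toy IR for illustration only; these are not Halide's classes.
enum class Kind { Var, Mul, VectorReduceAdd };
enum class ElemType { Float32, Int32 };

struct Expr;
using ExprPtr = std::shared_ptr<const Expr>;

struct Expr {
    Kind kind;
    ElemType type;
    int lanes;
    std::string name;  // only used by Var
    ExprPtr a, b;      // operands (b is null for unary nodes)
};

ExprPtr var(const std::string &name, ElemType type, int lanes) {
    return std::make_shared<const Expr>(Expr{Kind::Var, type, lanes, name, nullptr, nullptr});
}

ExprPtr mul(ExprPtr a, ExprPtr b) {
    // Both operands must agree in element type and vector width.
    assert(a->type == b->type && a->lanes == b->lanes);
    return std::make_shared<const Expr>(Expr{Kind::Mul, a->type, a->lanes, "", a, b});
}

ExprPtr vector_reduce_add(ExprPtr v) {
    // Summing all lanes of v produces a scalar (one lane).
    return std::make_shared<const Expr>(Expr{Kind::VectorReduceAdd, v->type, 1, "", v, nullptr});
}

// Print an expression as shader-like source text.
std::string emit(const ExprPtr &e) {
    switch (e->kind) {
    case Kind::Var:
        return e->name;
    case Kind::Mul:
        return "(" + emit(e->a) + " * " + emit(e->b) + ")";
    case Kind::VectorReduceAdd: {
        const ExprPtr &v = e->a;
        // The pattern of interest: add-reduce of a float multiply becomes dot().
        if (v->kind == Kind::Mul && v->type == ElemType::Float32) {
            return "dot(" + emit(v->a) + ", " + emit(v->b) + ")";
        }
        // Placeholder for the generic path; not a real shader function.
        return "reduce_add(" + emit(v) + ")";
    }
    }
    return "";
}
```

In this sketch, `emit(vector_reduce_add(mul(var("a", ElemType::Float32, 4), var("b", ElemType::Float32, 4))))` returns `dot(a, b)`, while the same expression built from `Int32` operands takes the placeholder path, mirroring the "Restrict dot() to floats" commit.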