[TIR] Handle axis_separators during FlattenBuffer#12652
Conversation
For buffers with more than one physical axis, the `axis_separators` are required in order to know which groups of logical axes to fuse into each physical axis. The implementation in `tir.FlattenBuffer` assumed that all buffers were being flattened to a single physical axis. Because `tir.LowerOpaqueBlock` replaces the `BlockNode::alloc_buffers` with `Allocate` nodes, `tir.FlattenBuffer` no longer has access to the axis separators and performs inconsistent flattening for `Allocate` as opposed to `BufferLoad`/`BufferStore`. This was introduced in apache#12172, which decoupled the lowering/flattening steps. The commit reorders the `tir.FlattenBuffer` to occur before `tir.LowerOpaqueBlock`, to make use of the axis separators. Any `Allocate` nodes that exist at that point (e.g. from hand-written schedules) are still flattened to 1-d physical buffers, but the `BlockNode::alloc_buffers` are flattened according to the axis separators.
|
LGTM. Here are my remaining some questions:
|
|
|
||
|
|
||
| class TestGPU(BaseCompare): | ||
| """Buffers allocated inside GPU-specific constructs are ignored. |
There was a problem hiding this comment.
Does the meaning of "ignored" refer to B?
There was a problem hiding this comment.
Whoops, I was mistaken in that comment, and was getting it mixed up with some of the allocation handling in StorageRewrite. Updated the comment.
I like that idea, and having independent order of passes would be better overall. There was some similar logic in
In principle, the Adding a test that validates the buffer shape after running through all of (Side-note: If |
The DeclBuffer node can be inserted during LowerOpaqueBlock, then provide the missing Buffer information required to flatten the allocation.
|
Inserting the I'm also reverting most of the unit test changes, since they should go back to operating on |
With the insertion of `DeclBuffer` nodes, `LowerOpaqueBlock` no longer needs to be before `FlattenBuffer`, and has been moved back to its original position. Revering the tests to use `T.allocate` instead of `T.alloc_buffer` more closely represents the functions as they are being lowered.
Previously, the test cases only tested TE-based schedules. This commit runs the same tests for equivalent TIR-based schedules as well. This is intended to catch Hexagon-specific regressions, such as the one resolved in apache#12652.
Previously, the test cases only tested TE-based schedules. This commit runs the same tests for equivalent TIR-based schedules as well. This is intended to catch Hexagon-specific regressions, such as the one resolved in apache#12652.
The DeclBuffer annotations aren't yet supported in all passes. This restricts them to being introduced in LowerOpaqueBuffer, then immediately removed in FlattenBuffer.
|
cc @cconvey |
23cf49b to
018f4f8
Compare
|
@tvm-bot rerun |
|
@wrongtest-intellif Requesting a re-review, if you have the time, as the implementation changed significantly from the reviewed/approved version. |
wrongtest-intellif
left a comment
There was a problem hiding this comment.
LGTM. Thanks for the great efforts~
Previously, the test cases only tested TE-based schedules. This commit runs the same tests for equivalent TIR-based schedules as well. This is intended to catch Hexagon-specific regressions, such as the one resolved in apache#12652.
) Previously, the test cases only tested TE-based schedules. This commit runs the same tests for equivalent TIR-based schedules as well. This is intended to catch Hexagon-specific regressions, such as the one resolved in #12652.
* [TIR] Moved tir.FlattenBuffer to occur before tir.LowerOpaqueBlock For buffers with more than one physical axis, the `axis_separators` are required in order to know which groups of logical axes to fuse into each physical axis. The implementation in `tir.FlattenBuffer` assumed that all buffers were being flattened to a single physical axis. Because `tir.LowerOpaqueBlock` replaces the `BlockNode::alloc_buffers` with `Allocate` nodes, `tir.FlattenBuffer` no longer has access to the axis separators and performs inconsistent flattening for `Allocate` as opposed to `BufferLoad`/`BufferStore`. This was introduced in apache#12172, which decoupled the lowering/flattening steps. The commit reorders the `tir.FlattenBuffer` to occur before `tir.LowerOpaqueBlock`, to make use of the axis separators. Any `Allocate` nodes that exist at that point (e.g. from hand-written schedules) are still flattened to 1-d physical buffers, but the `BlockNode::alloc_buffers` are flattened according to the axis separators. * Add unit test to validate non-flat memory after tvm.lower * Explicitly write T.reads for test on BufferRegion updates * Update incorrect docstring for test * Use DeclBuffer information in FlattenBuffer The DeclBuffer node can be inserted during LowerOpaqueBlock, then provide the missing Buffer information required to flatten the allocation. * Use T.allocate in unit tests With the insertion of `DeclBuffer` nodes, `LowerOpaqueBlock` no longer needs to be before `FlattenBuffer`, and has been moved back to its original position. Revering the tests to use `T.allocate` instead of `T.alloc_buffer` more closely represents the functions as they are being lowered. * Fix usage of T.decl_buffer in updated tests * Update LowerOpaqueBuffer to expect the DeclBuffer nodes * Strip DeclBuffer annotation in FlattenBuffer The DeclBuffer annotations aren't yet supported in all passes. This restricts them to being introduced in LowerOpaqueBuffer, then immediately removed in FlattenBuffer. * Strip out all DeclBuffer nodes in FlattenBuffer * Update unit tests to remove expectation of DeclBuffer nodes
…che#12662) Previously, the test cases only tested TE-based schedules. This commit runs the same tests for equivalent TIR-based schedules as well. This is intended to catch Hexagon-specific regressions, such as the one resolved in apache#12652.
For buffers with more than one physical axis, the
axis_separatorsare required in order to know which groups of logical axes to fuse into each physical axis. The implementation intir.FlattenBufferassumed that all buffers were being flattened to a single physical axis. Becausetir.LowerOpaqueBlockreplaces theBlockNode::alloc_bufferswithAllocatenodes,tir.FlattenBufferno longer has access to the axis separators and performs inconsistent flattening forAllocateas opposed toBufferLoad/BufferStore. This was introduced in #12172, which decoupled the lowering/flattening steps.The commit reorders theSee #12652 (comment).tir.FlattenBufferto occur beforetir.LowerOpaqueBlock, to make use of the axis separators. AnyAllocatenodes that exist at that point (e.g. from hand-written schedules) are still flattened to 1-d physical buffers, but theBlockNode::alloc_buffersare flattened according to the axis separators.This PR adds a
DeclBuffernode to the output oftir.LowerOpaqueBlock, which is then used duringtir.FlattenBufferto identify the axis separators.cc @Hzfengsy @junrushao1994