[Frontend][Tensorflow] Sparse_Dense Op CSR scheduling issue resolved for Cuda & X86 #7148
comaniac merged 4 commits into apache:main
…for both cuda & x86
tkonolige left a comment:
What was the issue holding up CSR scheduling?
We already talked about performance for this, right?
On CUDA, scheduling for the sparse_dense op is internally switched to sparse_dense_padded. That path works only when the size is a multiple of warp_size; when it is smaller than that, there is no fallback schedule for CSR, so I have resolved that part here. Please let me know if anything is unclear. Thanks!
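To make the described behaviour concrete, here is a minimal sketch of the kind of dispatch the comment refers to. It is not TVM's code: the function name, the returned labels, and the `size` argument are hypothetical, and the comment does not say which dimension is compared against `warp_size`.

```python
# Hypothetical illustration only; none of these names come from TVM.
WARP_SIZE = 32  # assumption: the usual warp size on NVIDIA GPUs


def choose_sparse_dense_schedule(size: int, warp_size: int = WARP_SIZE) -> str:
    """Pick the padded schedule only when `size` is a positive multiple of warp_size."""
    if size > 0 and size % warp_size == 0:
        return "sparse_dense_padded"
    # Without a branch like this there would be nothing to schedule
    # for sizes that are not a multiple of warp_size.
    return "sparse_dense_fallback"


print(choose_sparse_dense_schedule(128))
print(choose_sparse_dense_schedule(16))
```

The point of the sketch is only the second return: the padded path has a precondition, so a fallback has to exist for inputs that do not meet it.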
Gentle ping @tkonolige!
So I just hit the bug that this fixes. Can we add a test to make sure we don't hit it again in the future? Here is the test I wrote:
@tvm.testing.requires_cuda
def test_sparse_dense_padded_alter_op():
    with tvm.target.Target("cuda"):
        M = 128
        N = 16
        K = 128
        X_np = np.random.randn(M, K).astype("float32")
        W_sp_np = random_bsr_matrix(N, K, 2, 2, density=0.01, dtype="float32")
        x = relay.var("x", relay.TensorType(X_np.shape, "float32"))
        mult = relay.op.nn.sparse_dense(
            x,
            (
                relay.Constant(tvm.nd.array(W_sp_np.data)),
                relay.Constant(tvm.nd.array(W_sp_np.indices)),
                relay.Constant(tvm.nd.array(W_sp_np.indptr)),
            ),
        )
        f = relay.Function([x], mult)
        f_ = relay.transform.InferType()(tvm.IRModule.from_expr(f))
        f_ = relay.transform.AlterOpLayout()(f_)
        assert f_["main"].body.op.name == "nn.internal.sparse_dense_padded"

    # build with cuda and AlterOpLayout to ensure that sparse_dense_padded has an implementation
    with tvm.transform.PassContext(opt_level=3, required_pass="AlterOpLayout"):
        x = relay.build(tvm.IRModule.from_expr(f), target=tvm.target.Target("cuda"))

in tests/python/topi/python/test_topi_sparse.py
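The test above relies on a `random_bsr_matrix` helper that is not shown in this thread. As a rough sketch of what such a helper could look like, assuming NumPy and SciPy are available: the name `random_bsr_matrix_sketch`, the `seed` parameter, and the block-count rule are assumptions made for illustration, and the helper in the test file may differ.

```python
import itertools

import numpy as np
import scipy.sparse as sp


def random_bsr_matrix_sketch(M, N, BS_R, BS_C, density, dtype="float32", seed=None):
    """Build a random M x N block-sparse matrix with BS_R x BS_C blocks (illustrative only)."""
    assert M % BS_R == 0 and N % BS_C == 0
    rng = np.random.default_rng(seed)
    dense = np.zeros((M, N), dtype=dtype)
    # Assumption: aim for roughly density * M * N non-zeros, rounded up to whole blocks.
    num_blocks = int(density * M * N) // (BS_R * BS_C) + 1
    # Top-left corner of every possible block position.
    candidates = np.asarray(
        list(itertools.product(range(0, M, BS_R), range(0, N, BS_C)))
    )
    chosen = candidates[rng.choice(candidates.shape[0], size=num_blocks, replace=False)]
    for r, c in chosen:
        dense[r : r + BS_R, c : c + BS_C] = rng.standard_normal((BS_R, BS_C))
    # SciPy stores the result as (data, indices, indptr), the triple the test passes on.
    return sp.bsr_matrix(dense, blocksize=(BS_R, BS_C))


w = random_bsr_matrix_sketch(16, 128, 2, 2, density=0.01, seed=0)
print(w.data.shape, w.indices.shape, w.indptr.shape)
```

The `data`, `indices`, and `indptr` attributes of the returned matrix are the three arrays that the test wraps in `relay.Constant` and hands to `sparse_dense`.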
Thanks @tkonolige! The test case is added now.
Looks good! @comaniac @junrushao1994 I think this is ready to merge. (Assuming it passes CI).
Thanks @ANSHUMAN87 @tkonolige
…for Cuda & X86 (apache#7148)
* [Frontend][Tensorflow] Sparse_Dense Op CSR scheduling issue resolved for both cuda & x86
* [1] Review comments handled
* [2] Review comments handled
* [3] Review comments handled
This is a follow-up PR.
cc @tkonolige, @FrozenGene!