
[TIR] Tir constants integration into compilation pipeline#8509

Merged
manupak merged 26 commits intoapache:mainfrom
d-smirnov:tir_constant_2
Feb 22, 2022

Conversation

@d-smirnov
Contributor

This PR is a follow-up to integrate the TIR constant nodes introduced in #8472 into the compilation pipeline.

@d-smirnov
Contributor Author

d-smirnov requested review from anijain2305, areusch, comaniac, jroesch, junrushao1994, jwfromm, kparzysz-quic, MarisaKirisame, masahi, mbrookhart, merrymercy, slyubomirsky, tqchen, vinx13, wweic, yzhliu, zhiics and ZihengJiang as code owners 12 hours ago

Oh, not sure how I managed to do this. I am sorry for such a massive review request.

Contributor

@manupak manupak left a comment


Change-Id: Ia4adca9d3315b26fbdc006ef7c115900c081e303
Change-Id: Ice305f4fefd13fe95e97574e6d63ffeb664621df
Refactored ExtractPrimFuncConstants to IRModule pass.
deDup -> DeDup
Refactored logic of Applicator supplementary class

Change-Id: I6c120d175eb6790ba90f176c4f856bde8f0c7c94
Change-Id: Ie3ee6ea2479476a30f486baef74f20070f117942
Change-Id: I12c63731663b9c9ea574b9ed5cb17311ba3cf701
@manupak
Contributor

manupak commented Feb 22, 2022

Thanks @d-smirnov for working so long on this to enable the representation of non-scalar constants in TIR.

Thanks @giuseros for the initial work.
Thanks @junrushao1994 and @areusch for the reviews!

@kparzysz-quic
Contributor

We're seeing issues on Hexagon after this commit. They are directly caused by float16 not being supported for embedded constants, but there are other potential problems there (a difference between target and executor in fuse_ops). See https://discuss.tvm.apache.org/t/problem-with-fuseops-and-embedded-constants-in-tir/12165 for more info. I'm wondering what your thoughts are on this.

@tmoreau89
Contributor

@kparzysz-quic thanks for signaling the regression. Is there a way this problem can be triggered via a simple unit test? Perhaps we can get the authors to issue a patch, or we could even consider reverting this commit until the issue is resolved?

@manupak
Contributor

manupak commented Feb 25, 2022

We can look into this.

@kparzysz-quic, would you be able to submit a PR with an xfail test for what this PR broke?
(It is a bit harder to track tests that are not in-tree.)

They are directly caused by float16 not being supported for embedded constants

      case runtime::DataType::TypeCode::kFloat:
        switch (arr_type.bits()) {
          case 16:
            // NOTE: float16 is treated as uint16_t.
            element_type = llvm::Type::getIntNTy(*ctx, arr_type.bits());
            BuildLLVMVector<uint16_t>(element_type, arr->data, num_elements, &elements);
            break;
          case 32:
            element_type = llvm::Type::getFloatTy(*ctx);
            BuildLLVMVector<float>(element_type, arr->data, num_elements, &elements);
            break;
          case 64:
            element_type = llvm::Type::getDoubleTy(*ctx);
            BuildLLVMVector<double>(element_type, arr->data, num_elements, &elements);
            break;
          default:
            CHECK(false) << "CodegenParams: only support 32- or 64-bit floating point; saw "
                         << arr_type.bits() << "-bit array";
            break;
        }
        break;

      case runtime::DataType::TypeCode::kFloat: {
        os.fill(' ');
        os.setf(std::ios::left, std::ios::adjustfield);
        if (arr_type.bits() == 16) {
          // NOTE: print types not widely supported by C as uint16_t.
          PrintIntegralArray<uint16_t>(arr->data, num_elements, indent_chars, os);
        } else if (arr_type.bits() == 32) {
          PrintFloatingPointArray<float>(arr->data, num_elements, indent_chars, os);
        } else if (arr_type.bits() == 64) {
          PrintFloatingPointArray<double>(arr->data, num_elements, indent_chars, os);
        } else {
          CHECK(false) << "CodegenParams: only support 32- or 64-bit floating point; saw "
                       << arr_type.bits() << "-bit array";
        }
        break;
      }

The above are called, respectively, in the following locations:

auto array = NDArrayToLLVMArray(ctx_, data);

NDArrayDataToC(data, 4, decl_stream);

(This is the same code used to codegen the LinkedParam node.)
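As an aside on the float16-as-uint16 treatment in both snippets above: the codegen reinterprets the raw float16 bits as an unsigned 16-bit integer. A minimal stand-alone sketch of that bit reinterpretation, using only the Python standard library (not TVM code):

```python
import struct

def float16_bits(value: float) -> int:
    """Reinterpret a float16 value as its raw uint16 bit pattern,
    mirroring how the codegen above emits float16 data as uint16_t."""
    # '<e' packs an IEEE 754 binary16 value; '<H' unpacks the same
    # two bytes as an unsigned 16-bit integer.
    (bits,) = struct.unpack("<H", struct.pack("<e", value))
    return bits

print(hex(float16_bits(1.0)))   # 0x3c00
print(hex(float16_bits(-2.0)))  # 0xc000
```

The values round-trip losslessly because only the byte representation changes, not the bytes themselves.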

So, I am not following what is meant here. Can you elaborate further?

To understand a bit more, would you be able to share the relay.build(...) configuration (targets, executor, runtime) that you are using? Ideally, if the target does not support -link-params, it should be possible to disable it; if the target needs the constants linked (i.e. not handled via the executor), it needs to codegen the tir.AllocateConst node, for which support has been added for all backends of TVM (with tests). Naively looking at the Hexagon codegen, it seems to extend the LLVM codegen.

So what we need to understand is: what is the expectation of the Hexagon backend when linked-param is used?
(I suppose it requires LinkedParam nodes?)

I am sensing we might need to move link-params from being an executor attribute to a target attribute. cc: @Mousius @d-smirnov

I'm a bit concerned by the suggestion to revert based on out-of-tree tests. FYI: @u99127

@kparzysz-quic
Contributor

We have a temporary workaround: just reset the link_params variable to false in fuse_ops. So the urgency for us is lower.

There are two issues that need to be addressed:

  1. The link_params property is inconsistent with the CPU target in FuseOps when invoked through FoldConstants.
  2. Float16 is not supported in the AllocateConst visitors, e.g. https://github.com/apache/tvm/blob/main/src/relay/backend/te_compiler_cache.cc#L242

Hexagon does utilize the LinkedParams function, but we can change that as needed. So far, without any workarounds, we don't even reach the Hexagon codegen.
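Issue 2 above is the familiar failure mode of a dtype dispatch with a missing arm. Purely as an illustration (plain Python, not the actual te_compiler_cache.cc code), the shape of the problem is:

```python
def lower_constant(dtype: str) -> str:
    """Toy dtype dispatch with no float16 arm, mimicking the
    'float16 not handled' error mode (illustrative, not TVM code)."""
    handled = {
        "float32": "f32 const",
        "float64": "f64 const",
        "int32": "i32 const",
    }
    if dtype not in handled:
        # A model lowered with ToMixedPrecision("float16") hits this path.
        raise NotImplementedError(f"{dtype} not handled")
    return handled[dtype]

print(lower_constant("float32"))  # f32 const
try:
    lower_constant("float16")
except NotImplementedError as err:
    print(err)                    # float16 not handled
```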

@manupak
Contributor

manupak commented Feb 25, 2022

Hi @kparzysz-quic ,

This PR is about non-scalar constants; 2) links to scalar constants. Maybe that feature is yet to be supported anyway?

For 1), maybe let's create an issue. I'm trying to understand its relevance to this PR. @d-smirnov, thoughts?

@kparzysz-quic
Contributor

kparzysz-quic commented Feb 25, 2022

Right: the relevance is that I tried setting "relay.FuseOps.link_params" to False in the PassContext config, but it didn't have any effect, since FoldConstants creates its own empty context. It also uses a new CPU target for folding, which has link_params=False. However, FuseOps uses the Executor attribute from the IR module, which has link_params taken from the original target (i.e. True).
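The shadowing behaviour described here can be modeled with a toy stack of nested contexts. This is a self-contained sketch of the mechanism (an illustrative model, not TVM's actual PassContext implementation): each pass reads only the innermost config, so a pass that internally enters a fresh, empty context hides the user's outer settings.

```python
import contextlib

_config_stack = [{}]  # innermost config is the one passes see

@contextlib.contextmanager
def pass_context(config=None):
    """Toy model of nested pass contexts (not TVM code): entering a
    context pushes its config; passes only consult the top of stack."""
    _config_stack.append(config or {})
    try:
        yield
    finally:
        _config_stack.pop()

def current_config():
    return _config_stack[-1]

with pass_context({"relay.FuseOps.link_params": False}):
    # A pass like FoldConstants internally enters a fresh, empty context...
    with pass_context():
        # ...so the user's override is no longer visible here.
        print(current_config().get("relay.FuseOps.link_params", "unset"))
```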

@kparzysz-quic
Contributor

The change this PR made in fuse_ops (obtaining the value of link_params) altered the behavior even for scalar constants. I think the immediate issue is really with how the link_params variable is set in fuse_ops.

@manupak
Contributor

manupak commented Feb 25, 2022

However, FuseOps uses the Executor attribute from IR module, which has link_params taken from the original target (i.e. True).

I am still a bit puzzled.

https://github.com/apache/tvm/pull/8509/files#diff-8b95b87a2611e5e0c367ce17396b051fd868aa260c51ce0e50b94564fcb0e71fR1054-R1057

I read the above code as the PassContext overriding whatever is provided by the Executor.
cc: @d-smirnov

@manupak
Contributor

manupak commented Feb 25, 2022

Would you be able to describe what the "correct" behaviour should be?

@kparzysz-quic
Contributor

kparzysz-quic commented Feb 25, 2022

However, FuseOps uses the Executor attribute from IR module, which has link_params taken from the original target (i.e. True).

I am still a bit puzzled.

https://github.com/apache/tvm/pull/8509/files#diff-8b95b87a2611e5e0c367ce17396b051fd868aa260c51ce0e50b94564fcb0e71fR1054-R1057

I read the above code as PassContext overriding whatever provided by the Executor. cc : @d-smirnov

Line 1054: If an executor is present in the IRModule, set link_params to the value from the executor. This sets link_params to True, even though the current target is CPU (since the executor was created based on the original target "hexagon").

Line 1057: If the current config has the key "relay.FuseOps.link_params", set link_params to that value; otherwise, use the value from line 1054. Here the config is empty, since FoldConstants creates an empty one, so the default value is used (i.e. the one from line 1054).
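The precedence just described (the executor attribute provides the default, and the PassContext config key overrides it only when present) can be sketched as a small stand-alone function. This is an illustrative model of the logic, not the actual TVM code; the dict keys mirror the names used in the discussion:

```python
def resolve_link_params(executor_attrs: dict, pass_ctx_config: dict) -> bool:
    """Illustrative sketch of the precedence discussed above (not TVM code):
    start from the executor's link-params attribute, then let the PassContext
    config key override it only if that key is present. With FoldConstants
    creating an empty config, the executor's value wins."""
    link_params = executor_attrs.get("link-params", False)
    # An empty config (as created by FoldConstants) leaves the default intact.
    return pass_ctx_config.get("relay.FuseOps.link_params", link_params)

# Hexagon scenario: executor carries link-params=True, but the config is empty.
print(resolve_link_params({"link-params": True}, {}))  # True

# The user's override takes effect only when the config actually propagates.
print(resolve_link_params({"link-params": True},
                          {"relay.FuseOps.link_params": False}))  # False
```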

@kparzysz-quic
Contributor

I think you mentioned that link_params should be a property of the target. If fuse_ops got it from the target, everything would work fine, since it would get it from "CPU" instead of "hexagon". I think that would be the correct behavior.

@manupak
Contributor

manupak commented Feb 25, 2022

Hmmm, just to understand this a bit better, what is the target string being used in your case?

Secondly, does the Hexagon backend require the constants bound to the IR (eventually to be codegen'd), or does it require them to be provided at runtime?

@kparzysz-quic
Contributor

kparzysz-quic commented Feb 25, 2022

The first failing test case is resnet50.

This is the script that reproduces the crash:

import coremltools
import onnx
import onnxmltools
import tvm
import tvm.contrib.hexagon
from tvm import relay
from tvm.relay.transform import InferType, ToMixedPrecision

dtype_dict = {"data": "float32"}
shape_dict = {"data": [1,3,224,224]}

#input name and path for your caffe model
proto_file = './ResNet-50-deploy.prototxt'
input_caffe_path = './ResNet-50-model.caffemodel'

# Convert Caffe model to CoreML
coreml_model = coremltools.converters.caffe.convert((input_caffe_path, proto_file))
# Convert the Core ML model into ONNX
onnx_model = onnxmltools.convert_coreml(coreml_model)
onnxmltools.utils.save_model(onnx_model, 'resnet-50.onnx')
mod, params = relay.frontend.from_onnx(onnx_model, shape_dict)
mod = InferType()(mod)
mod = ToMixedPrecision("float16")(mod)

target = tvm.target.hexagon("v68", link_params=True)
config = {"relay.FuseOps.link_params":0}                    # <-- doesn't do anything
with tvm.transform.PassContext(opt_level=3, config=config):
    lib = relay.build(mod, target, target_host=target, params=params, mod_name="default")

At the bottom of the crash dump you should see

  0: tvm::relay::tec::ScheduleBuilder::VisitExpr_(tvm::relay::ConstantNode const*)::{lambda(tvm::runtime::Array<tvm::tir::Var, void> const&)#1}::operator()(tvm::runtime::Array<tvm::tir::Var, void> const&) const
  File "/w/src/dmlc/tvm/src/relay/backend/te_compiler_cache.cc", line 242
TVMError: float16 not handled

The target string is hexagon -keys=hexagon -link-params=1 -mattr=+hvxv68,+hvx-length128b -mcpu=hexagonv68 -mtriple=hexagon.

We don't yet have codegen for AllocateConst specific to Hexagon, but if the LLVM codegen handles it, it will probably work for us as well. Right now we use the link_params function, which contains all the constants in it, but is called at runtime to supply them to the model.

Edit: This will crash with the upstream code as well, so if you have the same resnet50 as we do, you should be able to reproduce this crash.

@manupak
Contributor

manupak commented Feb 25, 2022

config = {"relay.FuseOps.link_params":0}

def test_fuse_take(link_params):
    """Test fusion case involving concat and take"""

    def before():
        shape = (tvm.tir.const(10, "int64"), tvm.tir.const(1, "int64"))
        x = relay.var("x", shape=shape)
        concat = relay.concatenate([x, x], axis=-1)
        out = relay.op.take(concat, indices=relay.const([0], dtype="int64"))
        return relay.Function(relay.analysis.free_vars(out), out)

    def expected(link_params):
        shape1 = (tvm.tir.const(10, "int64"), tvm.tir.const(1, "int64"))
        shape2 = (tvm.tir.const(1, "int64"),)
        x = relay.var("x", shape=shape1)
        p0 = relay.var("p0", shape=shape1)
        p1 = relay.var("p1", shape=shape2, dtype="int64")
        c = relay.const([0], dtype="int64")
        concat = relay.concatenate([p0, p0], axis=-1)
        out = relay.op.take(concat, indices=c if link_params else p1)
        f0 = relay.Function([p0] if link_params else [p0, p1], out)
        f0 = f0.with_attr("Primitive", tvm.tir.IntImm("int32", 1))
        y = relay.Call(f0, [x] if link_params else [x, c])
        return relay.Function([x], y)

    after = run_opt_pass(expected(link_params), transform.InferType())
    with tvm.transform.PassContext(opt_level=2, config={"relay.FuseOps.link_params": link_params}):
        m = run_opt_pass(before(), transform.InferType())
        m = run_opt_pass(m, transform.FuseOps())
    assert tvm.ir.structural_equal(m, after)

I think this is respected. Maybe we need to verify the case when the Executor has link-params?

But after having read this all again -- the actual issue is that link-params will be set to True if the Executor has it. It seems like the Hexagon backend was relying on FoldConstant always having link-params set to False. Is that right?

If so, the solution might be to provide a link-params override for FoldConstants as well?

It feels like the link-params behaviour for the FuseOps pass in the main compilation pipeline needs to be True. Am I wrong here?
Or are you saying that if you hardcode FuseOps link_params to always be False, it all works for you?

@u99127

u99127 commented Feb 25, 2022

Can/should this discussion please be moved into a proper issue?

Ramana

@manupak
Contributor

manupak commented Feb 25, 2022

Yes, I think it makes sense -- @kparzysz-quic, let's create an issue and move the discussion there.

@driazati
Member

driazati commented Mar 3, 2022

@manupa-arm and @junrushao1994 can you tag me on changes to CI scripts in the future? The changes here in tests/scripts/ and tests/lint/python_format.sh broke linting CI on main in a roundabout way

@manupak
Contributor

manupak commented Mar 3, 2022

Oops, sorry about this! Will do.
Attn : @d-smirnov

@driazati
Member

driazati commented Mar 3, 2022

Thanks! BTW, it's fixed in #10469, so no action here is necessary.
