[RELAY][BACKEND] Enable PlanMemory in the graph runtime. #2120
tqchen merged 4 commits into apache:master from
Conversation
@tqchen quick question - are there plans to allow dynamic memory allocation in the graph runtime, which would allow variable shapes? I believe that's not currently supported, but was curious if you had plans there.
@ajtulloch yes, I think the plan is to write a new runtime system soon ™️. A few of us are working on a PLDI submission, and expect to ship a bunch of improvements/fixes/features post deadline.
Overall looks good to me. I'm a bit tired from the PLDI push, so maybe someone else should do a pass.
@tqchen Quite busy recently, but I will try my best to spend some time doing a review pass tonight if it is not too late.
As we move into NNVMv2 (Relay), we have a clear separation of compiler and runtime. The migration is a two-phase process, and we are in the first step, which moves the compiler but keeps the old graph runtime. I think we can expect the static graph runtime to exist for a while, but we can also explore the possibility of new backends that break different assumptions (e.g. dynamic memory allocation, control flow). Luckily the IR is expressive enough to represent all these workloads. There is also a tradeoff here, depending on whether we want to allow JIT, how big the runtime is, etc., so I can imagine we could end up building several of them. @ajtulloch I think it is a good time to hear opinions from everyone on what we need.
There is no existing RFC; how about we open a new one?
opened in #2122
@ajtulloch RFC seems like a great idea, would look forward to figuring out what everyone is interested in, and what people are looking to do.
@tqchen I only have some nit comments. Overall LGTM.
}

void VisitExpr_(const TupleNode* op) final {
  // Do nothing.

std::unordered_map<const ExprNode*, std::vector<StorageToken*> > token_map_;
/*!
 * \brief Call get token to get the necessary token.
}
// create token for the call node.
CreateToken(op, true);
// check if there is an orphaned output that can be released immediately.
struct StorageToken {
  /*! \brief Reference counter */
  int ref_counter{0};
  /*! \brief Number of bytes */
}

void VisitExpr_(const OpNode* op) final {
  // Do nothing.
Just trying to learn: what is the default behavior if such a function is not defined?

By default, it will recursively visit the children, which is fine; the override just makes the no-op explicit.
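To make the visitor question above concrete, here is a minimal sketch of the pattern (the names Node, Visitor, and NoRecurseVisitor are invented for illustration and are not Relay's actual classes): the base class recurses by default, and an explicit empty override stops that recursion.

#include <vector>

// Illustrative only: stand-ins for the real IR node and visitor classes.
struct Node {
  std::vector<const Node*> children;
};

class Visitor {
 public:
  virtual ~Visitor() = default;
  // Default behavior when no dedicated override exists:
  // recursively visit every child of the node.
  virtual void Visit(const Node* n) {
    for (const Node* child : n->children) {
      Visit(child);
    }
  }
};

class NoRecurseVisitor : public Visitor {
 public:
  // Explicit no-op override: traversal stops here instead of falling
  // back to the recursive default above.
  void Visit(const Node* /*n*/) final {
    // Do nothing.
  }
};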
        << ttype->shape;
    size *= static_cast<size_t>(pval[0]);
  }
  size *= (ttype->dtype.bits() * ttype->dtype.lanes() + 7) / 8;
Add comments for the magic numbers 7 and 8?

IMO this should be refactored into a round_up/div_round_up function.

+1, it might be necessary to have an "alignment" function which takes byte, word, or dword, etc.
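For reference, the + 7 and / 8 in the expression above round a bit count up to whole bytes. A hedged sketch of the helper suggested here (DivRoundUp and GetElementBytes are illustrative names, not the PR's actual functions) might look like this:

#include <cstddef>

// Divide and round up: adding (denominator - 1) before dividing rounds
// any partial remainder up to the next whole unit.
inline size_t DivRoundUp(size_t numerator, size_t denominator) {
  return (numerator + denominator - 1) / denominator;
}

// Bytes needed to hold `bits * lanes` bits; the 7 and 8 come from
// rounding a bit count up to whole 8-bit bytes.
inline size_t GetElementBytes(size_t bits, size_t lanes) {
  return DivRoundUp(bits * lanes, 8);
}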
Thanks @ajtulloch @yzhliu @zhiics, I have addressed the comments.
Thanks @ajtulloch @yzhliu @zhiics, this is merged.
This PR implements PlanMemory for the graph runtime codegen backend of Relay. It also contains a few other improvements.
The algorithm is basically the same as in NNVM. We do need to introduce a storage token and an initialization phase that propagates and calculates the expected reference count of each token before we run the greedy allocation algorithm.
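As a rough illustration of those two phases (a simplified sketch only; GreedyAllocator and its methods are invented for exposition and do not mirror the PR's exact code), the allocator below reuses a buffer once its precomputed reference count drops to zero:

#include <cstddef>
#include <memory>
#include <vector>

// Simplified sketch of token-based memory planning; all names are illustrative.
struct StorageToken {
  int ref_counter{0};   // consumers that still need to read this buffer
  size_t max_bytes{0};  // largest request bound to this storage so far
  int storage_id{-1};   // id assigned by the greedy allocation phase
};

class GreedyAllocator {
 public:
  // Request storage for `bytes` with `num_consumers` readers: reuse a free
  // token if one is large enough, otherwise create a new storage id.
  StorageToken* Request(size_t bytes, int num_consumers) {
    StorageToken* tok = nullptr;
    for (size_t i = 0; i < free_.size(); ++i) {
      if (free_[i]->max_bytes >= bytes) {
        tok = free_[i];
        free_.erase(free_.begin() + static_cast<std::ptrdiff_t>(i));
        break;
      }
    }
    if (tok == nullptr) {
      all_.push_back(std::make_unique<StorageToken>());
      tok = all_.back().get();
      tok->storage_id = static_cast<int>(all_.size()) - 1;
    }
    if (bytes > tok->max_bytes) tok->max_bytes = bytes;
    // The reference count comes from the initialization phase: one per consumer.
    tok->ref_counter = num_consumers;
    return tok;
  }

  // Called after each consumer has read the token; once the count reaches
  // zero the buffer is dead and can be recycled by later nodes.
  void Release(StorageToken* tok) {
    if (--tok->ref_counter == 0) {
      free_.push_back(tok);
    }
  }

 private:
  std::vector<StorageToken*> free_;
  std::vector<std::unique_ptr<StorageToken>> all_;
};

The key point carried over from NNVM is that the reference counts are computed up front, so the greedy pass knows exactly when each buffer becomes dead and can be reused.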