[RUNTIME] Fix the manual determination of cores in FillDataForMeasure #13849
echuraev merged 9 commits into apache:main
Thanks for contributing to TVM! Please refer to the contributing guidelines https://tvm.apache.org/docs/contribute/ for useful information and tips. Please request code reviews from Reviewers by @-ing them in a comment. Generated by tvm-bot
- void Run(int i) {
-   int64_t chunk_size = size / num_threads;
+ void Run(int i, int num_threads) {
+   int64_t chunk_size = ceil(size / num_threads);
If penv->num_task is a large number, is it still correct to use it as the divisor for size? And how does penv->num_task correlate with the number of threads?
Is it possible that size is less than num_threads?
Is it still correct to use this number as a divisor for size?
Yes and no. With the current implementation it is incorrect, because a check is missing on the line below: int64_t st = std::min(i * chunk_size, size);. Once this line is added, it will be correct.
How does penv->num_task correlate with the number of threads?
They are one and the same. See the TVMBackendParallelLaunch API reference.
Is it possible that size is less than num_threads?
Yes, it's possible.
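The chunking discussed in this thread can be sketched roughly as follows. This is a minimal illustration, not the code from the PR: ChunkBounds is a hypothetical helper name, and it assumes size is an int64_t and the task count is a positive int. Note that if size and num_threads are both integers, ceil(size / num_threads) truncates before ceil is applied, so the sketch uses integer ceiling division instead and clamps both ends of the range.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <utility>

// Hypothetical helper (not from the PR): returns the half-open range
// [begin, end) of elements that task `i` out of `num_tasks` should process.
std::pair<int64_t, int64_t> ChunkBounds(int64_t size, int i, int num_tasks) {
  assert(size >= 0 && num_tasks > 0 && i >= 0 && i < num_tasks);
  // Integer ceiling division; no floating-point ceil is involved.
  int64_t chunk_size = (size + num_tasks - 1) / num_tasks;
  // Clamp both ends so that trailing tasks receive an empty range
  // instead of running past `size` (covers size < num_tasks as well).
  int64_t begin = std::min(static_cast<int64_t>(i) * chunk_size, size);
  int64_t end = std::min(begin + chunk_size, size);
  return {begin, end};
}
```

With size = 3 and num_tasks = 8, tasks 0 to 2 each get one element and tasks 3 to 7 get an empty range, which is the "size is less than num_threads" case raised above.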
- void Run(int i) {
-   int64_t chunk_size = size / num_threads;
+ void Run(int i, int num_threads) {
- void Run(int i, int num_threads) {
+ void Run(int i, int num_tasks) {
@tvm-bot rerun
57cc953 to 7a7a8f3
Motivation: Assertion failed during tuning
Error message from thread_pool.cc:295:
Check failed: num_task <= num_workers_used_ (8 vs. 1) : Request parallel sync task larger than number of threads used workers=1 request=8
Main problem description:
Tuning for the ARM Snapdragon 888 CPU fails with the error above.
Suspected reason:
Incorrect (manual) determination of the number of threads. The number of threads is determined using MaxConcurrency, which returns 8 for this architecture, but the number of threads actually used is 4. This fix switches to automatic determination of the number of threads by passing 'zero' as the 'num_threads' attribute in 'TVMBackendParallelLaunch', which avoids the discrepancy described above.
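To illustrate the discrepancy, here is a small mock, assuming a pool that runs 4 workers while the caller requests 8 tasks. MockPool, Launch, and CountTasks are hypothetical names made up for this sketch. It is not the TVM runtime and does not reproduce the real TVMBackendParallelLaunch signature. It only mimics the convention described above, where passing zero lets the pool pick the task count itself.

```cpp
#include <cassert>

// Hypothetical stand-in for a thread pool; NOT the TVM runtime.
struct MockPool {
  int num_workers_used;  // workers the pool actually runs, e.g. 4

  // num_threads == 0 means "let the pool choose", resolved here to
  // num_workers_used. Returns -1 when more tasks are requested than
  // workers are in use (the situation the failed check reports), else 0.
  template <typename F>
  int Launch(F task, int num_threads) {
    assert(num_workers_used > 0 && num_threads >= 0);
    int n = (num_threads == 0) ? num_workers_used : num_threads;
    if (n > num_workers_used) return -1;
    for (int i = 0; i < n; ++i) task(i, n);  // sequential, for simplicity
    return 0;
  }
};

// Returns how many tasks ran, or -1 if the launch was rejected.
int CountTasks(MockPool& pool, int requested) {
  int ran = 0;
  int rc = pool.Launch([&ran](int /*task_id*/, int /*num_tasks*/) { ++ran; },
                       requested);
  return rc == 0 ? ran : -1;
}
```

In this mock, requesting 8 tasks on a pool with 4 workers is rejected, while requesting 0 runs exactly 4 tasks, because the count always comes from the pool rather than from a separate query.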