[AutoScheduler] Remove max_registers_per_block in HardwareParams #7040
Merged
masahi merged 2 commits into apache:main on Dec 5, 2020
cc @jcf94
comaniac
requested changes
Dec 5, 2020
device_api->GetAttr(ctx, tvm::runtime::DeviceAttrKind::kMaxRegistersPerBlock, &ret);
int max_registers_per_block = ret;
int max_local_memory_per_block = INT32_MAX;
Add a comment as the PR description.
// This setting looks working for Metal GPUs later than A10
int max_shared_memory_per_block = 32 * 1024;
int max_registers_per_block = 4 * 1024;
int max_local_memory_per_block = INT32_MAX;
ok I'll update my PR #7038 after we let this in first.
f9e7def to 26cd727
@comaniac Comments are addressed.
masahi
approved these changes
Dec 5, 2020
thanks @merrymercy @comaniac
TusharKanekiDey
pushed a commit
to TusharKanekiDey/tvm
that referenced
this pull request
Jan 20, 2021
…pache#7040) * [AutoScheduler] Fix hardware params * address comments
trevor-m
pushed a commit
to neo-ai/tvm
that referenced
this pull request
Jan 21, 2021
…pache#7040) * [AutoScheduler] Fix hardware params * address comments
electriclilies
pushed a commit
to electriclilies/tvm
that referenced
this pull request
Feb 18, 2021
…pache#7040) * [AutoScheduler] Fix hardware params * address comments
Previously, we used `hardware_params->max_registers_per_block`, obtained from the CUDA device query, as the value of `max_local_memory_per_block` in `VerifyGPUCode`. This is wrong: they are not the same thing.

Luckily, for NVIDIA GPUs this bug does not affect performance, because `kMaxRegistersPerBlock` returns a very large value, and a check in `VerifyGPUCode` against such a large value has almost no effect.

We have to rename `hardware_params->max_registers_per_block` to the correct name, `hardware_params->max_local_memory_per_block`, so that it is more meaningful for other backends.

A better way is to set it to `INT32_MAX` and simply skip this check, because the CUDA runtime imposes no hard limit on this value. Setting it to `INT32_MAX` can enlarge the search space while keeping most of the measured schedules valid.
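
To illustrate why `INT32_MAX` effectively disables the check, here is a minimal, self-contained sketch. It is not the TVM implementation: `HardwareLimits`, `CudaLikeLimits`, and `LocalMemoryWithinLimit` are hypothetical names invented for this example, and the comparison is only an assumed, simplified form of what a verifier such as `VerifyGPUCode` might do.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical, simplified stand-in for the hardware parameters discussed
// above. This is NOT the real TVM HardwareParams class.
struct HardwareLimits {
  int max_shared_memory_per_block;
  int max_local_memory_per_block;
};

// Hypothetical helper: build limits for a CUDA-like device. The shared
// memory limit would come from a device query; the local memory limit is
// set to INT32_MAX because there is no hard runtime limit to enforce.
HardwareLimits CudaLikeLimits(int queried_shared_memory_per_block) {
  return HardwareLimits{queried_shared_memory_per_block, INT32_MAX};
}

// Assumed form of the verifier check: a schedule passes when the local
// memory it uses per block does not exceed the configured limit.
bool LocalMemoryWithinLimit(int64_t local_bytes_per_block,
                            const HardwareLimits& limits) {
  assert(local_bytes_per_block >= 0);
  return local_bytes_per_block <=
         static_cast<int64_t>(limits.max_local_memory_per_block);
}
```

With the limit at `INT32_MAX`, any local memory usage that fits in a 32-bit signed integer passes, so the check no longer prunes candidates from the search space. A small limit such as the `4 * 1024` value quoted in the review would instead reject a schedule that uses 8 KiB of local memory per block.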