[CUDA] Initial support for dynamic shared memory #8466
    return "global" + tag;
  case StorageRank::kShared:
    return "shared" + tag;
  case StorageRank::kDynShared:
Let us use the same storage scope as shared, but introduce a different tag, e.g. shared.dyn. The scope rank is used to rank the memory hierarchy and also needs to be consistent with the thread rank. Dynamic shared memory is a special case of shared memory.
I changed dynamic shared memory to be a tagged memory under kShared, but I had to add a check e->scope.tag != ".dyn" in a few places in lower_device_storage_access_info.cc and storage_rewrite.cc to work around errors and segfaults.
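To make the tagged-scope idea concrete, here is a minimal, self-contained sketch of how a scope string such as shared.dyn could split into a rank and a tag. The simplified StorageRank enum and the helpers ParseScope and IsDynSharedMemory are hypothetical stand-ins, not TVM's actual definitions; only the kShared rank and the ".dyn" tag come from the discussion.

```cpp
#include <cassert>
#include <string>

// Simplified stand-in for a storage-rank enum (assumption, not TVM's definition).
enum class StorageRank { kGlobal, kShared, kLocal };

struct StorageScope {
  StorageRank rank;
  std::string tag;  // keeps the leading '.', e.g. ".dyn"; empty when untagged
};

// Returns true if `s` begins with `prefix`.
inline bool StartsWith(const std::string& s, const std::string& prefix) {
  return s.compare(0, prefix.size(), prefix) == 0;
}

// Splits "shared.dyn" into {kShared, ".dyn"}; the rank is taken from the prefix.
inline StorageScope ParseScope(const std::string& s) {
  if (StartsWith(s, "global")) return {StorageRank::kGlobal, s.substr(6)};
  if (StartsWith(s, "shared")) return {StorageRank::kShared, s.substr(6)};
  if (StartsWith(s, "local")) return {StorageRank::kLocal, s.substr(5)};
  assert(false && "unknown storage scope");
  return {StorageRank::kGlobal, ""};
}

// Dynamic shared memory keeps the shared rank and is distinguished only by its tag.
inline bool IsDynSharedMemory(const StorageScope& scope) {
  return scope.rank == StorageRank::kShared && scope.tag == ".dyn";
}
```

Because the rank stays kShared, any pass that orders scopes by rank sees dynamic shared memory as ordinary shared memory, and only code that inspects the tag needs a special case.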
 * Call(f,
 *      [arg1, arg2, ..., arg_n,
 *       work_size_1, work_size_2, ... work_size_m])
 *       work_size_1, work_size_2, ... work_size_m, dyn_shmem_size])
Let us introduce a special metadata attribute to indicate that dynamic shared memory is used. This keeps the calling convention backward compatible when dynamic shared memory is not provided.
constexpr const char* kDeviceUseDynSharedMemory = "tir.device_use_dyn_shared_memory";
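As a rough illustration of why an attribute keeps the calling convention backward compatible, the sketch below appends dyn_shmem_size to the packed call arguments only when the attribute is set. FuncAttrs, UsesDynSharedMemory, and BuildCallArgs are hypothetical names invented for this example; only the attribute key is taken from the comment above.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Attribute key quoted in the review comment.
constexpr const char* kDeviceUseDynSharedMemory = "tir.device_use_dyn_shared_memory";

// Hypothetical stand-in for a function's attribute dictionary.
using FuncAttrs = std::map<std::string, bool>;

// A missing attribute means "not used", so functions without it are unaffected.
inline bool UsesDynSharedMemory(const FuncAttrs& attrs) {
  auto it = attrs.find(kDeviceUseDynSharedMemory);
  return it != attrs.end() && it->second;
}

// Packs [arg_1..arg_n, work_size_1..work_size_m] and, only when the attribute
// is set, a trailing dyn_shmem_size.
inline std::vector<int64_t> BuildCallArgs(const FuncAttrs& attrs, std::vector<int64_t> args,
                                          const std::vector<int64_t>& work_sizes,
                                          int64_t dyn_shmem_size) {
  assert(dyn_shmem_size >= 0);
  args.insert(args.end(), work_sizes.begin(), work_sizes.end());
  if (UsesDynSharedMemory(attrs)) args.push_back(dyn_shmem_size);
  return args;
}
```

Kernels without the attribute get exactly the argument list they had before, which is the property the reviewer is asking for.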
      w.work_size[arg_index_map_[i]] = size;
    }
  }
  w.dyn_shmem_size = static_cast<size_t>(x.values[base_ + arg_index_map_.size()].v_int64);
We need to consider backward compatibility for kernels where dynamic shared memory is not used.
Update https://github.com/apache/tvm/blob/main/src/runtime/meta_data.h#L103 to add a use_dyn_shared_memory field (defaulting to false), and update the JSON reader to use DeclareOptionalField so that this field is read only when present. Only decode the value when the flag is true.
if (!reader->Read(&name)) return false;
if (!reader->Read(&arg_types)) return false;
if (!reader->Read(&thread_axis_tags)) return false;
if (!reader->Read(&use_dyn_shared_memory)) return false;
@tqchen I didn't find a way to optionally read this attribute for the binary reader.
I think we need another way to keep backward compatibility then, since binary backward compatibility is more important to us here.
Here is one solution that also helps future backward compatibility.
Let us rename thread_axis_tags => launch_param_tags, and use dyn_shared_memory as the tag for the dynamic shared memory launch parameter. We can then rename ThreadAxisConfig => LaunchParamConfig.
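A minimal sketch of the tag-based idea, under stated assumptions: the class LaunchParamConfigSketch, the LaunchParams struct, and the tag value "dyn_shared_memory" are illustrative placeholders, not the code in this PR. It shows how an old tag list (thread axes only) and a new one (thread axes plus a trailing dynamic shared memory tag) can be decoded by the same logic.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Placeholder tag value for illustration; the real constant's value may differ.
constexpr const char* kUseDynamicSharedMemoryTag = "dyn_shared_memory";

struct LaunchParams {
  std::vector<int64_t> work_size;  // one entry per thread-axis tag
  size_t dyn_shmem_size = 0;       // stays 0 when no dynamic shared memory tag exists
};

class LaunchParamConfigSketch {
 public:
  explicit LaunchParamConfigSketch(const std::vector<std::string>& launch_param_tags) {
    for (size_t i = 0; i < launch_param_tags.size(); ++i) {
      if (launch_param_tags[i] == kUseDynamicSharedMemoryTag) {
        // Requiring the tag to be last keeps thread-axis values at their old positions.
        assert(i + 1 == launch_param_tags.size());
        use_dyn_shared_memory_ = true;
      } else {
        ++num_thread_axes_;
      }
    }
  }

  // `values` holds one launch value per tag, in tag order.
  LaunchParams Extract(const std::vector<int64_t>& values) const {
    assert(values.size() == num_thread_axes_ + (use_dyn_shared_memory_ ? 1u : 0u));
    LaunchParams p;
    p.work_size.assign(values.begin(),
                       values.begin() + static_cast<std::ptrdiff_t>(num_thread_axes_));
    if (use_dyn_shared_memory_) {
      p.dyn_shmem_size = static_cast<size_t>(values[num_thread_axes_]);
    }
    return p;
  }

 private:
  size_t num_thread_axes_ = 0;
  bool use_dyn_shared_memory_ = false;
};
```

Since the presence of dynamic shared memory is encoded in the existing list of tags, a module saved before this change simply has no such tag, and no new serialized field is needed.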
Let us try the change above and add a test to make sure we can load old CUDA binaries.
If we do the rename thread_axis_tags => launch_param_tags, the JSON reader would break backward compatibility at helper.DeclareField("launch_param_tags", &launch_param_tags);
Or shall we do the rename but keep the old attribute name for the JSON reader/writer? e.g.
helper.DeclareField("thread_axis_tags", &launch_param_tags);
Or shall we declare both the new and old attribute names as optional fields? e.g.
helper.DeclareOptionalField("launch_param_tags", &launch_param_tags);
helper.DeclareOptionalField("thread_axis_tags", &launch_param_tags); // for backward compatibility
Went with the latter solution, and confirmed that an old saved module can be loaded via both JSON and binary.
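For readers unfamiliar with the optional-field approach, here is a small self-contained sketch of its behavior. It does not use the real JSON helper: JsonObject, ReadOptionalField, and LoadLaunchParamTags are hypothetical stand-ins that mimic reading the same target from either the new or the legacy key.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for a parsed JSON object whose values are string lists.
using JsonObject = std::map<std::string, std::vector<std::string>>;

// Copies obj[key] into *out when the key exists; a missing key is not an error.
inline bool ReadOptionalField(const JsonObject& obj, const std::string& key,
                              std::vector<std::string>* out) {
  assert(out != nullptr);
  auto it = obj.find(key);
  if (it == obj.end()) return false;
  *out = it->second;
  return true;
}

// Accepts both the new and the legacy field name, filling the same target.
inline std::vector<std::string> LoadLaunchParamTags(const JsonObject& obj) {
  std::vector<std::string> launch_param_tags;
  ReadOptionalField(obj, "launch_param_tags", &launch_param_tags);
  ReadOptionalField(obj, "thread_axis_tags", &launch_param_tags);  // for backward compatibility
  return launch_param_tags;
}
```

A file written by an older version carries only thread_axis_tags and a newer one only launch_param_tags, so in either case exactly one of the two reads is expected to populate the list.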
We can remove thread_axis_tags later.
@tqchen The backward compatibility issue has been addressed.
* send dyn shmem size to runtime
* add dyn shared storage scope
* associate buffer var and its storage scope in split_host_device
* tried NVPTX but failed with INVALID_PTX error
* test stub
* dynamic shmem reduce working
* log2 issue fixed
* nvptx working
* refactor llvm shmem allocation
* make linkage argument
* support rocm too
* send dyn shmem param to hip runtime
* remove alloc map from split_host_device.cc
* remove attr::storage_scope from split_host_device
* lint fix
* formatting
* update calling convention doc
* minor update to test
* remove log
* remove kDynShared, dyn.shared -> shared.dyn
* support backward compat
* update json/binary reader/writer
* thread_axis_tags -> launch_param_tags
* ThreadAxisConfig -> LaunchParamConfig
* remove use_dynamic_shared_memory from FunctionInfo meta data
* revert change in test_tir_ir_builder.py
* make sure kUseDynamicSharedMemoryTag is the last tag
* remove continue
* update doc string following name change
* more comment update following name change
Co-authored-by: masa <masa@pop-os.localdomain>
Co-authored-by: Masahiro Masuda <masahi@129@gmail.com>
Only one allocation is allowed for now. See the test case for usage.
@tqchen @junrushao1994 @vinx13