Add Adreno GPU target and topi supporting textures with dynamically allocated textures #11161
csullivan merged 9 commits into apache:main
Conversation
ddfa320 to fb29643
@csullivan Could you please take a look?
| * e.g. image2d[height=O, width=IHW]
| */
| kImage2DWeight,
| kTexture2DNHWC,
Note: we can now support arbitrary layouts with transform_layout, which I suggest we move to. It will require some rework of the TIR lowering. I don't suggest this block these schedules from being upstreamed now, but we should circle back on it soon.
Should we add an AR/TODO to the code?
I like that idea. Something like,
TODO(tvm-team): Uncouple use of storage scope and data layout by using the transform_layout schedule primitive to express the desired texture layout. This will require supporting Nd indices in BufferLoad and BufferStore in CodegenOpenCL, and ensuring Nd allocations for texture are correctly routed to the AllocateTexture packed function in the OpenCL DeviceAPI.
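To make the suggestion concrete, here is a minimal sketch of the transform_layout schedule primitive; the shapes, tensor names, and blocking lambda are illustrative assumptions, not this PR's code:

```python
import tvm
from tvm import te

# Toy compute whose buffer layout we transform; shapes are arbitrary examples.
A = te.placeholder((1, 32, 56, 56), name="data")
B = te.compute(A.shape, lambda n, c, h, w: A[n, c, h, w] * 2.0, name="scaled")
s = te.create_schedule(B.op)

# Express an NCHW4c-style blocking in the schedule instead of coupling the
# layout to the texture storage scope: channels are packed in blocks of 4.
s[B].transform_layout(lambda n, c, h, w: [n, c // 4, h, w, c % 4])
```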
| elif data_layout == "NHWC4c":
|     ic = data.shape[3] * data.shape[4]
| else:
|     # TODO(amalyshe) add proper error raising
| # specific language governing permissions and limitations
| # under the License.
| # pylint: disable=invalid-name,unused-variable,unused-argument,no-member
| """Conv2D alter op and legalize functions for x86"""
| from ..utils import get_const_tuple
|
| def getDiv(value, start):
snake_case to match the rest of the file
| ----------
| out: tuple of the (chunks, block, tail)
| """
| tail = trip_count % 4
| in_channel_tail: int
|     Tail in the latest chunk diffing original number of channels vs blocked one
|     If in_channel_tail != in_channel_block:
|         original_channels = in_channel_chunks * in_channel_block - in_channel_tail
nit: consider referring to this as padding_tail so that it's clear this isn't the remainder of a floordiv; anything to make this a little clearer upfront, as it took me a bit to understand under the current naming convention. Same comment for the filter API below.
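For reference, a hedged sketch of the blocking arithmetic as the docstring above describes it; the helper name is hypothetical, and note this padding convention differs from the plain floordiv remainder (`trip_count % 4`) used elsewhere:

```python
def split_to_chunks(channels, block=4):
    # Hypothetical helper: split `channels` into `chunks` blocks of size
    # `block`. `tail` is the padding needed to reach a full multiple, so that
    # original_channels == chunks * block - tail, matching the docstring above
    # (this is the reviewer's suggested "padding_tail", not channels % block).
    chunks = (channels + block - 1) // block
    tail = chunks * block - channels
    return chunks, block, tail

# Example: 30 channels -> (8, 4, 2), and indeed 8 * 4 - 2 == 30.
assert split_to_chunks(30) == (8, 4, 2)
```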
| def _reorder_data_nchw(*indices):
|     condition = []
|     condition.append(indices[1] == in_channel_chunks - 1)
|     condition.append(indices[4] >= in_channel_tail)
|     condition = tvm.tir.all(*condition)
|     return tvm.tir.if_then_else(
|         condition,
|         pad_value,
|         Input[indices[0], indices[1] * in_channel_block + indices[4], indices[2], indices[3]],
|     )
|
| def _reorder_data_nhwc(*indices):
|     condition = []
|     condition.append(indices[3] == in_channel_chunks - 1)
|     condition.append(indices[4] >= in_channel_tail)
|     condition = tvm.tir.all(*condition)
|     return tvm.tir.if_then_else(
|         condition,
|         pad_value,
|         Input[indices[0], indices[1], indices[2], indices[3] * in_channel_block + indices[4]],
|     )
Note: explicit buffer layout padding as part of transform_layout is on the roadmap and will appear in an RFC soon. Noting here that explicit layout transformations like this should become unnecessary in the future.
Added a comment and a reference to the RFC.
| in_height, in_width, kernel_h, kernel_w, dilation_h, dilation_w, padding, stride_h, stride_w
| ):
| """
| Expands spatial dimensions to be dividable by factor 4. This will allow us to do extrimely
Typos
| - Expands spatial dimensions to be dividable by factor 4. This will allow us to do extrimely
| + Expands spatial dimensions to be dividable by factor 4. This will allow us
Could you please point out where the typos are?
|     Height of the feature map
|
| in_width: int
|     Width of the featrue map
| - Width of the featrue map
| + Width of the feature map
| # certain limitation of the Qualcomm devices. Subject to be determined for certain device
| # individually, but until we have access to remote device during compilation, we have to
| # define it uniformly for all target devices
| limit = 16384
Let us use the Target attributes for this, and specifically use the attribute preprocessor as is done for cuda here. Add the image extent to the attribute list for the device API and use it when calling DetectDeviceFlag to query the size limits of the OpenCL image on the remote device.
I added a new texture_spatial_limit attribute to the opencl target and added it to DeviceAttrKind and runtime_ctypes in Python, but I'm not sure that was required, since I don't know how and when to use DetectDeviceFlag. I do have access to texture_spatial_limit in the Python part through tvm.target.Target.current().attrs["texture_spatial_limit"].
I would consider this "addressed", but I need to understand whether my solution is applicable and whether we need the parts related to DeviceAttrKind.
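A small sketch of the Python access path described above; the `-device=adreno` flag in the target string is an example, and the default comes from the target registration shown later in the diff:

```python
import tvm

# Read the texture_spatial_limit attribute registered on the opencl target.
with tvm.target.Target("opencl -device=adreno"):
    limit = tvm.target.Target.current().attrs["texture_spatial_limit"]
    print(limit)  # 16384 unless overridden on the target string
```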
Done
| pad_data, kernel = s[conv].op.input_tensors
|
| s[pad_data].compute_inline()
Do you mean to inline padding here? Your comment above implies that you intend to do otherwise.
It is inlined into the next stage, the cache read for textures:
AT = s.cache_read(pad_data, "global.texture", [conv])
bind_data_copy(s[AT])
If I do not add s[pad_data].compute_inline(), the schedule is not complete and complains about missing bindings.
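For context, a self-contained sketch of this pattern; the shapes and the conv op are assumed, and bind_data_copy is the PR's helper, stubbed out here:

```python
import tvm
from tvm import te, topi

data = te.placeholder((1, 32, 56, 56), name="data")
kernel = te.placeholder((32, 32, 3, 3), name="kernel")
conv = topi.nn.conv2d_nchw(data, kernel, stride=1, padding=1, dilation=1)
s = te.create_schedule(conv.op)

pad_data, kernel_in = s[conv].op.input_tensors
# Fold the pad stage into the stage that follows, so padding is computed
# on the fly during the copy into texture memory rather than materialized.
s[pad_data].compute_inline()
AT = s.cache_read(pad_data, "global.texture", [conv])
# bind_data_copy(s[AT])  # PR helper: binds thread axes for the copy stage
```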
| from tvm.contrib import graph_runtime
|
| def get_reference(mod, params1, input_shape, inputs):
Common utility shared in other test files, consider adding to the utils subdir.
Moved the shared functions into utils/adreno_utils.py.
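A hedged sketch of what this shared helper might look like, based only on its signature in the diff; the body and the "data" input name are assumptions:

```python
import tvm
from tvm import relay
from tvm.contrib import graph_runtime

def get_reference(mod, params1, input_shape, inputs):
    """Build the model for llvm and run it on the CPU to produce reference
    outputs to compare the OpenCL results against (input_shape unused here)."""
    with tvm.transform.PassContext(opt_level=3):
        lib = relay.build(mod, target="llvm", params=params1)
    m = graph_runtime.GraphModule(lib["default"](tvm.cpu(0)))
    m.set_input("data", inputs[0])  # input name assumed
    m.run()
    return [m.get_output(0).asnumpy()]
```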
| # build module run with opencl and cpu, compare results
| def build_run_compare(
Moved the shared functions into utils/adreno_utils.py.
| @tvm.testing.requires_opencl
| def test_conv2d_yolov3_v2_nchw_3c():
Do these tests pass on a local opencl device (e.g. with an nvgpu)? If not, it would be good to skip the tests that require a remote device when the RPC tracker env vars are not set.
I have not verified with an nvidia gpu, but they pass successfully on intel integrated graphics with opencl enabled in the platform and in tvm. I need to verify whether the tests run in CI, but cannot do so due to issues with the GPU build in CI.
@csullivan Looked into the CI test results and got the impression that all opencl tests are disabled. It seems we need to enable them in CI, but in a separate PR.
That's accurate, and I agree we can consider enabling them in CI in a separate PR. If you see that these tests pass when running locally and without an RPC tracker, that is sufficient.
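A hedged sketch of the gating the reviewer suggests; TVM_TRACKER_HOST follows the usual TVM RPC convention, and the exact skip condition is an assumption:

```python
import os
import pytest
import tvm.testing

@tvm.testing.requires_opencl
@pytest.mark.skipif(
    "TVM_TRACKER_HOST" not in os.environ,
    reason="no RPC tracker configured; remote Adreno device unavailable",
)
def test_conv2d_yolov3_v2_nchw_3c():
    # build_run_compare(...) against the remote device would go here
    pass
```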
| .add_attr_option<Bool>("system-lib")
| .add_attr_option<Integer>("max_num_threads", Integer(256))
| .add_attr_option<Integer>("thread_warp_size", Integer(1))
| .add_attr_option<Integer>("texture_spatial_limit", Integer(16384))
Thanks for adding this. An improvement would be to query the remote device with a call to the device API's GetAttr via the target attr preprocessor.
I still do not fully understand the usage model. For now I left only the definition of texture_spatial_limit in the opencl target and the access from Python, because adding kTextureSpatialLimit to DeviceAttrKind caused a compilation failure for cuda. Since I do not fully understand the usage model, I don't know how to fix that properly: whether I need to extend cuda as well for this constant or just ignore it, and if I ignore it, where kTextureSpatialLimit should be used.
csullivan left a comment
LGTM with a few final nits
…llocated textures (apache#11161)

* Add Adreno GPU target and topi supporting textures
  - There are 5 compute/schedules: conv2d for NCHW/NHWC, depthwise_conv2d for NCHW/NHWC, average pooling
  - Fix of dynamically allocated textures caching
  - Add texture-nhwc scope
  - Fix issue with codegen of vars having non-acceptable symbols
* Address comments
* Add vectorization into some adreno pool flow
* Fix adreno tests for running on the opencl host platform
* Remove unnecessary kDriverVersion in DeviceAttrKind
* Move utils adreno functions to a separate shared file
* Fix black hits

Co-authored-by: Chris Sullivan <csullivan@octoml.ai>
Co-authored-by: Egor Churaev <egor.churaev@gmail.com>
Co-authored-by: Li <quic_lih@quicinc.com>