Add sse4/avx2 support for fast x86 int8 (vpmaddubsw/vpmaddwd/vpaddd) by elvin-n · Pull Request #8897 · apache/tvm

elvin-n · 2021-09-01T11:07:59Z

Extend the list of supported targets for x86 topi
Extend the conv2d x86 int8 tests to cover fast int8 x86 platforms

In theory, this change can give up to a 2x speedup for int8 models over fp32 models; currently the gain is slightly less.
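For readers unfamiliar with the three instructions named in the title, the following is a minimal pure-Python sketch of their arithmetic as described in Intel's instruction documentation. It is not code from this PR, and all function names are illustrative: `pmaddubsw` multiplies unsigned bytes by signed bytes and adds adjacent pairs into saturated 16-bit values, `pmaddwd` (used here with a vector of ones) widens adjacent 16-bit pairs into 32-bit sums, and `paddd` accumulates the 32-bit lanes.

```python
def saturate_i16(x):
    """Clamp an integer to the signed 16-bit range."""
    return max(-32768, min(32767, x))


def wrap_i32(x):
    """Wrap an integer to the signed 32-bit range (two's complement)."""
    return (x + 2**31) % 2**32 - 2**31


def pmaddubsw(a_u8, b_s8):
    """u8 * s8 products, adjacent pairs added with signed 16-bit saturation."""
    assert len(a_u8) == len(b_s8) and len(a_u8) % 2 == 0
    return [
        saturate_i16(a_u8[i] * b_s8[i] + a_u8[i + 1] * b_s8[i + 1])
        for i in range(0, len(a_u8), 2)
    ]


def pmaddwd(a_i16, b_i16):
    """s16 * s16 products, adjacent pairs added into 32-bit lanes."""
    assert len(a_i16) == len(b_i16) and len(a_i16) % 2 == 0
    return [
        wrap_i32(a_i16[i] * b_i16[i] + a_i16[i + 1] * b_i16[i + 1])
        for i in range(0, len(a_i16), 2)
    ]


def paddd(a_i32, b_i32):
    """Lane-wise 32-bit addition with wraparound."""
    return [wrap_i32(a + b) for a, b in zip(a_i32, b_i32)]


def int8_dot_step(acc_i32, data_u8, kernel_s8):
    """One accumulation step: every 4 adjacent byte products feed one int32 lane."""
    pairs = pmaddubsw(data_u8, kernel_s8)        # 16 bytes -> 8 int16
    quads = pmaddwd(pairs, [1] * len(pairs))     # 8 int16  -> 4 int32
    return paddd(acc_i32, quads)


# A 128-bit (SSE-width) example: 16 bytes in, 4 int32 lanes out.
acc = int8_dot_step([0] * 4, [1] * 16, [2] * 16)
print(acc)
```

Note that the intermediate 16-bit sum saturates, so inputs near the extremes of both ranges (for example 255 * 127 twice) lose precision at the `pmaddubsw` step.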

Resnet50 performance:

| Throughput (FPS) | Core i7-1185G7 sse4 | Core i7-1185G7 avx2 | Core i7-1185G7 avx512 | Core i7-1185G7 VNNI | Core i7-8700B | Core i5-9400T |
|---|---|---|---|---|---|---|
| TVM FP32 | | 53 | 53 | 53 | 54 | 48 |
| TVM int32 | | 12 | | | 16 | |
| TVM int8 default | 34 | 61 | 92 | 142 | 78 | 62 |
| TVM int8 atvm | | 70 | | 134 | 95 | 79 |
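To relate these numbers to the "up to 2x" expectation, the ratios of int8 to FP32 throughput can be computed directly. The sketch below assumes the values line up with the columns in the order shown in the table above (empty cells are left out); the dictionary keys and the helper name are illustrative only.

```python
# FPS values copied from the table; missing cells are simply absent.
FPS = {
    "TVM FP32": {
        "avx2": 53, "avx512": 53, "VNNI": 53, "i7-8700B": 54, "i5-9400T": 48,
    },
    "TVM int8 default": {
        "sse4": 34, "avx2": 61, "avx512": 92, "VNNI": 142,
        "i7-8700B": 78, "i5-9400T": 62,
    },
    "TVM int8 atvm": {
        "avx2": 70, "VNNI": 134, "i7-8700B": 95, "i5-9400T": 79,
    },
}


def speedup_over_fp32(row, column):
    """Return row FPS divided by FP32 FPS, or None if either cell is empty."""
    base = FPS["TVM FP32"].get(column)
    value = FPS[row].get(column)
    if base is None or value is None:
        return None
    return value / base


for row in ("TVM int8 default", "TVM int8 atvm"):
    for column in ("sse4", "avx2", "avx512", "VNNI", "i7-8700B", "i5-9400T"):
        ratio = speedup_over_fp32(row, column)
        if ratio is not None:
            print(f"{row:18s} {column:10s} {ratio:.2f}x")
```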

jcf94

Thanks for your answer! @elvin-n

elvin-n · 2021-09-03T10:29:32Z

The change in get_fp32_len affected the ARM flow: it now blocks by 4 instead of the previous default of 8. This should not affect performance, since the NEON SIMD vector size is 64 or 128 bits, but it will affect the knowledge database of tuned kernels.

I will verify the performance aspect on ARM. Backward compatibility is still an open question. So far my impression is that we do not care about it that much.

elvin-n · 2021-09-03T14:37:52Z

I verified the ARM flow and confirm that it now uses 4 channel values instead of 8 for blocking, and that this did not affect performance (as I expected).
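The relationship between SIMD register width and the channel blocking factor discussed above can be illustrated with a small sketch. This is not the body of get_fp32_len; the helper names are hypothetical and only show the arithmetic: a 128-bit register holds four 32-bit values, so a block of 4 fills exactly one NEON register, while a block of 8 corresponds to a 256-bit register.

```python
def simd_32bit_lanes(vector_bits):
    """Number of 32-bit elements that fit in one SIMD register (hypothetical helper)."""
    if vector_bits <= 0 or vector_bits % 32 != 0:
        raise ValueError("vector width must be a positive multiple of 32 bits")
    return vector_bits // 32


def block_channels(num_channels, block):
    """Split a channel count into (outer blocks, inner block), as in an NCHW[x]c layout."""
    if num_channels % block != 0:
        raise ValueError("channel count must be divisible by the block size")
    return num_channels // block, block


# 128-bit registers (SSE, NEON) give 4 lanes; 256-bit (AVX2) gives 8; 512-bit gives 16.
for bits in (128, 256, 512):
    lanes = simd_32bit_lanes(bits)
    print(bits, lanes, block_channels(64, lanes))
```

Because tuned schedules are recorded for a specific blocking factor, records produced with a block of 8 would not match kernels that now block by 4, which is consistent with the concern raised about the knowledge database of tuned kernels.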

…pache#8897)
* Add sse4/avx2 support for vpmaddubsw/vpmaddwd/vpaddd
  - Extend the list of different target for x86 topi
  - Extend tests for conv2d x86 int8 for fast i8 x86 platforms
* fix code style
* Change x86-64-v2 to nahalem in test to support llvm11
* Change test target to get NCHW8c

Add sse4/avx2 support for vpmaddubsw/vpmaddwd/vpaddd

ff61cc9

- Extend the list of different target for x86 topi
- Extend tests for conv2d x86 int8 for fast i8 x86 platforms

elvin-n requested review from Huyuwei, Laurawly, ZihengJiang, anijain2305, areusch, comaniac, jcf94, jroesch, junrushao, jwfromm, kevinthesun, masahi, mbrookhart, merrymercy, tqchen, vinx13 and yzhliu as code owners September 1, 2021 11:07

elvin-n added 3 commits September 1, 2021 14:20

fix code style

7da5b4c

Change x86-64-v2 to nahalem in test to support llvm11

0742047

Change test target to get NCHW8c

051f9cc

vinx13 approved these changes Sep 2, 2021

jcf94 reviewed Sep 3, 2021

Comment thread python/tvm/topi/x86/utils.py

Comment thread tests/python/relay/test_op_level2.py

jcf94 approved these changes Sep 3, 2021

masahi merged commit 1bebd0a into apache:main Sep 9, 2021

junrushao mentioned this pull request Nov 1, 2021

Apache TVM v0.8 Release Note Candidate #9416

Closed

masahi merged 4 commits into apache:main from Deelvin:amalyshe/int8_avx2_sse4

4 participants
