This repository was archived by the owner on Nov 17, 2023. It is now read-only.

Simplified HybridBlock.forward commit made Sockeye 4% slower #18699

@kpuatamazon

I'm experiencing a 4% slowdown in Sockeye due to commit 83b5170 "Add simplified HybridBlock.forward without F (#17530)".

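| Commit  | Time (s) |
|---------|----------|
| 08528c5 | 295.90   |
| 56e7985 | 295.73   |
| d4052fd | 295.69   |
| 9a355eb | 295.28   |
| 3840786 | 293.25   |
| 83b5170 | 293.70   |
| 8e39518 | 281.37   |
| b133899 | 282.58   |
| 2f358fd | 281.95   |
| f01dc80 | 283.60   |
| 3667e9a | 283.79   |
| f7c4323 | 282.60   |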
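
For reference, the 4% figure can be reproduced from the two timings on either side of the commit. This is only a minimal sketch: `pct_slowdown` is a hypothetical helper name, not something from Sockeye or MXNet, and it compares single runs rather than averages.

```shell
#!/bin/bash
# Hypothetical helper: relative slowdown, in percent, of `after` versus `before`.
pct_slowdown() {
  awk -v before="$1" -v after="$2" \
    'BEGIN { printf "%.2f\n", (after - before) / before * 100 }'
}

# Timings copied from the table: 8e39518 (before) and 83b5170 (after).
pct_slowdown 281.37 293.70   # prints 4.38
```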

But it's slightly more complicated. At the beginning (f7c4323), the build worked with MKLDNN at cb2cc7ac. Then 3667e9a broke the build with an MKLDNN upgrade, a bunch of commits went in while MKLDNN was broken, so they don't compile, and 08528c5 fixed it by downgrading MKLDNN back to cb2cc7ac.

Hence I wrote this script, which downgrades MKLDNN so that every commit builds, to find the relevant commit:

#!/bin/bash
# Build MXNet at the commit given as $1 with MKLDNN pinned to cb2cc7ac, then benchmark.
export LD_PRELOAD=/opt/intel/mkl/lib/intel64/libmkl_rt.so
export CXXFLAGS="-O3 -march=native -DUSE_MKL -I/opt/intel/mkl/include -pipe"
set -e -o pipefail
. ~/test/bin/activate
cd ~/mxnet
# Check out the requested commit and discard all local state, including submodules.
git reset --hard
git checkout --force "$1"
git clean -xdff
git reset --hard
git submodule foreach --recursive git clean -ffxd
git submodule foreach --recursive git reset --hard
git submodule update --init --recursive
# Pin MKLDNN to the revision that builds.
cd 3rdparty/mkldnn/
git checkout cb2cc7ac17ff4e2ef50805c7048d33256d82be4d
cd ../..
# Fresh release build, CPU only.
rm -rf build
mkdir build
cd build
cmake -GNinja -DUSE_CUDA=OFF -DCMAKE_BUILD_TYPE=Release ..
ninja -j 4
# Install the Python package in place and run the benchmark.
cd ../python
pip3 install -e .
~/benchmark.sh
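
A search like this could in principle be automated with `git bisect run`, which treats exit code 0 as good, 1 as bad, and 125 as "skip this commit". The sketch below only shows the timing-to-verdict step. It rests on assumptions that are not in the issue: `bisect_verdict` is a hypothetical helper, and the 288 s threshold is just a guess at a value between the fast (about 282 s) and slow (about 294 s) groups in the table.

```shell
#!/bin/bash
# Assumed cut-off in seconds between the fast and slow groups (not from the issue).
THRESHOLD="${THRESHOLD:-288}"

# Hypothetical helper: map a measured time (seconds) to a bisect verdict.
# Returns 0 ("good") when the time is below the threshold, 1 ("bad") otherwise.
bisect_verdict() {
  awk -v t="$1" -v limit="$THRESHOLD" 'BEGIN { exit !(t < limit) }'
}
```

A wrapper passed to `git bisect run` could build and benchmark the current commit, feed the measured time to `bisect_verdict`, and exit with 125 for commits that fail to build so that bisect skips them. Since the differences here are only a few percent, repeated runs per commit would probably be needed before trusting a single verdict.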

Test conditions:

  • c5.2xlarge, specifically an "Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz"
  • OMP_NUM_THREADS=3
  • Forced MKL backend with export CXXFLAGS="-O3 -march=native -DUSE_MKL -I/opt/intel/mkl/include -pipe"
  • Sockeye 45d704a4
  • Batch size 1

More broadly, I'm trying to unpick performance differences seen in Sockeye as MXNet has changed since v1.5.x. This image shows commits since master diverged from v1.5.x. v1.5.x is on the left and cbbb864 is on the right.

[Image: performance across commits on master, from v1.5.x (left) to cbbb864 (right)]

The first big slowdown is an MKLDNN change on the left but that appears to have been fixed. Then there's a slowdown near the right that doesn't appear to be a single commit but rather a bunch of incremental changes. And this is the first of them I've been able to isolate.
