Description
There's a big performance regression in the augmentation pipeline for RecordIO: throughput drops from ~5000 samples/sec to ~3000 samples/sec for ResNet-50 on ImageNet. It is linked to PR #11027.
What the PR tries to do is not itself the problem: I can get 5k samples/sec with an older commit, d37f3a3, on that PR from May 24. But in the form in which it was merged, there is a big slowdown.
Environment info
Package used (Python/R/Scala/Julia): Python 3
Build info
pip nightly (mxnet-cu90-1.3.0b20180627), as well as builds from source from master at any commit after the above PR was merged
MXNet commit hash: N/A
Build config: Tried with and without USE_LIBJPEG_TURBO; enabling it increases the speed a bit (~3500 samples/sec), but that is still much slower than before. USE_CUDA and USE_CUDNN were also enabled.
Steps to reproduce
python example/image-classification/train_imagenet.py --gpus 0,1,2,3,4,5,6,7 --batch-size 2048 --dtype float16 --network resnet-v1b --data-nthreads 40 --optimizer sgd --data-train /media/ramdisk/pass-through/train-passthrough.rec --data-train-idx /media/ramdisk/pass-through/train-passthrough.idx --data-val /media/ramdisk/pass-through/val-passthrough.rec --data-val-idx /media/ramdisk/pass-through/val-passthrough.idx
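To check whether the slowdown comes from the data pipeline alone rather than from training, the iterator can be timed in isolation. The following is only a minimal sketch under stated assumptions: `measure_throughput` and `make_record_iter` are hypothetical helper names, and the augmentation arguments passed to `mx.io.ImageRecordIter` (`data_shape`, `rand_crop`, `rand_mirror`) are assumed values that may not match what `train_imagenet.py` actually sets.

```python
import time


def measure_throughput(batches, batch_size, warmup=5, max_batches=100):
    """Return samples/sec for an iterable of batches (hypothetical helper).

    The first `warmup` batches are consumed but not timed, so that thread
    start-up and prefetch buffers do not distort the measurement.
    """
    it = iter(batches)
    for _ in range(warmup):
        try:
            next(it)
        except StopIteration:
            raise ValueError("iterator exhausted during warm-up")

    timed = 0
    start = time.perf_counter()
    for _ in it:
        timed += 1
        if timed >= max_batches:
            break
    elapsed = time.perf_counter() - start

    if timed == 0:
        raise ValueError("no batches left to time after warm-up")
    # Guard against a zero reading on very coarse clocks.
    return timed * batch_size / max(elapsed, 1e-9)


def make_record_iter(rec_path, idx_path, batch_size, threads):
    """Build an ImageRecordIter (assumed settings, not the script's own)."""
    import mxnet as mx  # imported lazily so the timer works without MXNet

    return mx.io.ImageRecordIter(
        path_imgrec=rec_path,
        path_imgidx=idx_path,
        data_shape=(3, 224, 224),    # assumption: standard ImageNet crop
        batch_size=batch_size,
        preprocess_threads=threads,  # counterpart of --data-nthreads
        shuffle=True,
        rand_crop=True,              # assumption: some augmentation enabled
        rand_mirror=True,
    )


# Example use with the paths from the command above:
#   it = make_record_iter(
#       "/media/ramdisk/pass-through/train-passthrough.rec",
#       "/media/ramdisk/pass-through/train-passthrough.idx",
#       batch_size=2048, threads=40)
#   print(measure_throughput(it, batch_size=2048))
```

Running this on a build before and after the merge should show whether the iterator by itself reproduces the drop, without involving the GPUs.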
What have you tried to solve it?
I've tried profiling it with perf to see what might be wrong. It looks like OpenCV is causing a wait for some reason. Please see the call graph (figure 3).
Figure 1: perf summary now

Figure 2: perf summary from the May 24 commit

Figure 3: call graph using perf

@hetong007 @piiswrong Any ideas?