Hi, I just updated MXNet today from 0.10.0 to 1.0.0 in order to use some new features. Both versions were installed with pip, e.g. pip3 install mxnet-cu80==1.0.0. However, after a detailed benchmark I observed a significant speed drop when running ResNet inference, especially when the batch size is small. The results for ResNet-152 are below (the network JSON file is downloaded from here):
MXNet version: 0.10.0
########################################################
speed test for batch size: 1
avg forward speed: 85.146210 samples/s
avg forward time: mean = 0.011743 s, std = 0.000341 s
########################################################
speed test for batch size: 4
avg forward speed: 215.284806 samples/s
avg forward time: mean = 0.018579 s, std = 0.000175 s
########################################################
speed test for batch size: 16
avg forward speed: 297.233244 samples/s
avg forward time: mean = 0.053827 s, std = 0.001030 s
########################################################
speed test for batch size: 64
avg forward speed: 316.399717 samples/s
avg forward time: mean = 0.202272 s, std = 0.001754 s
########################################################
speed test for batch size: 128
avg forward speed: 321.620336 samples/s
avg forward time: mean = 0.397981 s, std = 0.002363 s
MXNet version: 1.0.0
########################################################
speed test for batch size: 1
avg forward speed: 67.866811 samples/s
avg forward time: mean = 0.014733 s, std = 0.000391 s
########################################################
speed test for batch size: 4
avg forward speed: 188.020417 samples/s
avg forward time: mean = 0.021272 s, std = 0.000563 s
########################################################
speed test for batch size: 16
avg forward speed: 286.253890 samples/s
avg forward time: mean = 0.055892 s, std = 0.000565 s
########################################################
speed test for batch size: 64
avg forward speed: 310.045353 samples/s
avg forward time: mean = 0.206418 s, std = 0.004860 s
########################################################
speed test for batch size: 128
avg forward speed: 320.566647 samples/s
avg forward time: mean = 0.399289 s, std = 0.002575 s
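To quantify the regression, here is a quick sanity-check snippet (not part of the benchmark itself) that computes the relative increase in mean forward time between the two runs, using the numbers reported above:

```python
# mean forward times (s) copied from the two benchmark runs above
t_010 = {1: 0.011743, 4: 0.018579, 16: 0.053827, 64: 0.202272, 128: 0.397981}
t_100 = {1: 0.014733, 4: 0.021272, 16: 0.055892, 64: 0.206418, 128: 0.399289}
for bs in sorted(t_010):
    pct = (t_100[bs] - t_010[bs]) / t_010[bs] * 100
    print('batch %3d: +%.1f%% mean forward time in 1.0.0' % (bs, pct))
```

This shows the regression shrinking from roughly 25% at batch size 1 to well under 1% at batch size 128, i.e. the slowdown is concentrated at small batch sizes.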
PS. I noticed that when the batch size is small (e.g. batch_size=1), GPU usage is 95~100% with MXNet 0.10.0 but only 80~83% with MXNet 1.0.0, which means the GPU is not fully utilized.
Software env: Ubuntu 16.04, Python 3.5, CUDA 8.0, cuDNN 5.1.
GPU: GTX 1080 Ti.
I also tested on a server with a Titan Xp and got similar results. The speed test script is pasted below:
#!/usr/bin/env python
# -*- coding: utf-8 -*-
import time
import argparse

import mxnet as mx
import numpy as np

if __name__ == "__main__":
    parser = argparse.ArgumentParser(description='speed test')
    parser.add_argument('--net', type=str, required=True,
                        help='network symbol json file')
    parser.add_argument('--size', type=int, default=224,
                        help='image size')
    parser.add_argument('--n-batch', type=int, default=100,
                        help='batch number for test')
    parser.add_argument('--gpu', type=int, default=0,
                        help='gpu device id')
    args = parser.parse_args()

    print('MXNet version: %s' % mx.__version__)
    ctx = mx.gpu(args.gpu)
    batch_size_list = [1, 4, 16, 64, 128]
    mod = mx.mod.Module(symbol=mx.sym.load(args.net),
                        context=ctx,
                        data_names=['data', ],
                        label_names=['softmax_label', ])
    mod.bind(data_shapes=[('data', (1, 3, args.size, args.size))],
             label_shapes=[('softmax_label', (1,))],
             for_training=False)
    mod.init_params(initializer=mx.init.Normal())
    for batch_size in batch_size_list:
        print('########################################################')
        print('speed test for batch size: %d' % batch_size)
        mod.reshape(data_shapes=[('data', (batch_size,
                                           3,
                                           args.size,
                                           args.size))],
                    label_shapes=[('softmax_label', (batch_size,))])
        # pre-allocate the input batch on the GPU
        batch_data = mx.nd.random_normal(0, 0.5, (batch_size, 3, args.size, args.size), ctx=ctx)
        # warm up GPU; asnumpy() blocks until the forward pass has finished
        for _ in range(50):
            mod.forward(mx.io.DataBatch(data=[batch_data, ],
                                        label=None),
                        is_train=False)
            out = mod.get_outputs()[0].asnumpy()
        t_start = time.time()
        t_fwd_list = []
        for _ in range(args.n_batch):
            t1 = time.time()
            mod.forward(mx.io.DataBatch(data=[batch_data, ],
                                        label=None),
                        is_train=False)
            out = mod.get_outputs()[0]
            # force synchronization so t2 - t1 measures the actual forward pass
            out.wait_to_read()
            t2 = time.time()
            t_fwd_list.append(t2 - t1)
        t_end = time.time()
        n_samples = args.n_batch * batch_size
        print('\tavg forward speed: %f samples/s' % (n_samples / (t_end - t_start)))
        print('\tavg forward time: mean = %f s, std = %f s' %
              (np.mean(t_fwd_list), np.std(t_fwd_list)))
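A note on the timing: since MXNet executes asynchronously, the out.wait_to_read() call is what makes t2 - t1 measure the actual forward pass. As a cross-check (not in the original post), the reported throughput should roughly equal batch_size divided by the mean forward time, and the numbers above line up:

```python
# (reported samples/s, reported mean forward time in s) from the 0.10.0 run above
reported = {1: (85.146210, 0.011743),
            16: (297.233244, 0.053827),
            128: (321.620336, 0.397981)}
for bs, (speed, t_mean) in sorted(reported.items()):
    implied = bs / t_mean
    # reported throughput and batch_size / mean_time agree to well under 1%
    print('batch %3d: reported %.1f, implied %.1f samples/s' % (bs, speed, implied))
```

The close agreement suggests the per-batch timing and the wall-clock throughput measurement are consistent with each other, so the slowdown is not an artifact of the measurement method.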