This repository was archived by the owner on Nov 17, 2023. It is now read-only.

[CI] run pytest in parallel #18146

Merged
szha merged 10 commits into apache:master from szha:parallel on May 4, 2020

Conversation

@szha (Member) commented Apr 23, 2020

Description

run pytest in parallel

Checklist

Essentials

Please feel free to remove inapplicable items for your PR.

  • Changes are complete (i.e. I finished coding on this PR)
  • To the best of my knowledge, examples are either not affected by this change, or have been fixed to be compatible with this change

Changes

  • run small tests in parallel
  • mark tests that require more resources
  • run large tests in serial
  • use built-in tmpdir fixtures for temporary files
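The split between parallel and serial tests described above can be sketched with pytest markers and pytest-xdist. The marker name `serial` and the commands shown are assumptions based on the PR description, not taken from the actual MXNet CI configuration:

```python
# Sketch: separating small (parallel-safe) tests from resource-heavy ones.
# The "serial" marker name and the invocations below are assumptions based
# on the PR description, not the actual MXNet CI setup.
import pytest

@pytest.mark.serial          # resource-heavy: excluded from the parallel run
def test_large_tensor_op():
    assert sum(range(1000)) == 499500

def test_small_op():         # small test: safe to run in parallel workers
    assert 2 + 2 == 4

# Typical invocations with pytest-xdist installed:
#   pytest -n auto -m "not serial" tests/   # small tests across all CPUs
#   pytest -m serial tests/                 # large tests, one at a time
```

With this layout, CI can run the unmarked tests with one worker per CPU and then run the marked tests in a separate serial stage.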

@mxnet-bot

Hey @szha, thanks for submitting the PR.
All tests are already queued to run once. If tests fail, you can trigger one or more tests again with the following commands:

  • To trigger all jobs: @mxnet-bot run ci [all]
  • To trigger specific jobs: @mxnet-bot run ci [job1, job2]

CI supported jobs: [website, clang, centos-gpu, unix-cpu, windows-cpu, unix-gpu, sanity, windows-gpu, centos-cpu, edge, miscellaneous]


Note:
Only the following 3 categories can trigger CI: PR Author, MXNet Committer, Jenkins Admin.
All CI tests must pass before the PR can be merged.

@szha szha force-pushed the parallel branch 5 times, most recently from 7103615 to 774f703 Compare April 24, 2020 23:47
@szha szha force-pushed the parallel branch 23 times, most recently from c183ced to 98d7154 Compare April 29, 2020 22:52
@szha szha force-pushed the parallel branch 6 times, most recently from 7781d4c to fddadd1 Compare May 2, 2020 21:14
@szha szha force-pushed the parallel branch 2 times, most recently from fb69023 to ac43291 Compare May 4, 2020 00:07
@leezu (Contributor) left a comment


Thanks.

I find that the runtime of some tests has increased, maybe due to thrashing, overhead of the threaded engine, or other problems. This can be improved in a separate PR, though.

For example, on the last two unix-cpu runs in this PR (Python3 MKL-CPU):

    505.28s call tests/python/unittest/test_optimizer.py::test_sparse_adam4
    460.96s call tests/python/unittest/test_optimizer.py::test_sparse_adam

but on master:

    150.76s call tests/python/unittest/test_optimizer.py::test_sparse_adam
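The timings quoted above match the format of pytest's built-in `--durations` report. A minimal, self-contained way to produce such a report (the test file below is synthetic, only illustrating the reporting mechanism) is:

```python
# Minimal sketch: generating a slowest-tests report like the one quoted
# above, using pytest's built-in --durations flag. The test file here is
# synthetic and only illustrates the reporting mechanism.
import os
import tempfile
import textwrap

import pytest

TEST_CODE = textwrap.dedent("""
    import time

    def test_fast():
        assert True

    def test_slow():
        time.sleep(0.2)   # should appear near the top of the report
        assert True
""")

with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, "test_timing_demo.py")
    with open(path, "w") as f:
        f.write(TEST_CODE)
    # --durations=0 reports test-phase timings in the "NNNs call ..." format
    exit_code = pytest.main([path, "--durations=0", "-q"])
```

Comparing such reports between a PR branch and master is one way to spot regressions like the one described here.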

@szha (Member, Author) commented May 4, 2020

Will look into it in a follow-up PR.

@szha szha merged commit 0580200 into apache:master May 4, 2020
@szha szha deleted the parallel branch May 4, 2020 23:45
@szha (Member, Author) commented May 5, 2020

@PatricZhao I noticed that the MKL/MKLDNN tests are taking a lot longer than non-MKL builds in the parallel test setting. I will try a couple more runs to verify, so this is just FYI. Example:
http://jenkins.mxnet-ci.amazon-ml.com/blue/organizations/jenkins/mxnet-validation%2Funix-cpu/detail/PR-18146/51/pipeline/366

@szha (Member, Author) commented May 5, 2020

Update: the MKLDNN builds are actually executing a different set of tests, which could explain the time difference. However, the MKL build is executing the same unit tests as the regular Python 3 CPU build and is consistently taking a lot longer.

@marcoabreu (Contributor) commented May 5, 2020 via email

@szha (Member, Author) commented May 5, 2020

@marcoabreu indeed that could be a likely cause. I reported my findings in #18244 and we can continue the discussion there.

AntiZpvoh pushed a commit to AntiZpvoh/incubator-mxnet that referenced this pull request Jul 6, 2020
* run pytest in parallel

* disable memory pool

* address flaky ftrl/fm test and layernorm timeout

* mark tests as serial

* use parametrize in numpy op tests

* fix io bugs

* fix gluon rnn cell test and doc

* replace xfail with raises scope

* fix flaky numpy, mkldnn quantize, and rnn tests

* fix tempfile/dir usage
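The "use built-in tmpdir fixtures" and "fix tempfile/dir usage" items above refer to pytest's `tmp_path`-style fixtures. A minimal sketch (the test body is illustrative, not taken from the PR's diff):

```python
# Sketch: replacing manual tempfile handling with pytest's built-in
# tmp_path fixture. Each test receives its own unique directory, and with
# pytest-xdist each worker gets its own base temp dir, so parallel tests
# cannot collide on file names. The test body is illustrative.
def test_save_load_params(tmp_path):
    params_file = tmp_path / "model.params"   # unique per test invocation
    params_file.write_text("serialized-weights")
    assert params_file.read_text() == "serialized-weights"
```

pytest also cleans these directories up automatically, which removes the leak-prone manual `tempfile.mkdtemp()` / cleanup pattern.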


4 participants