This repository was archived by the owner on Nov 17, 2023. It is now read-only.

[CI] run pytest in parallel #18146

Merged
szha merged 10 commits into apache:master from szha:parallel on May 4, 2020

Conversation

@szha (Member) commented Apr 23, 2020

Description

run pytest in parallel

Checklist

Essentials

Please feel free to remove inapplicable items for your PR.

  • Changes are complete (i.e. I finished coding on this PR)
  • To the best of my knowledge, examples are either not affected by this change, or have been fixed to be compatible with this change

Changes

  • run small tests in parallel
  • mark tests that require more resources
  • run large tests in serial
  • use built-in tmpdir fixtures for temporary files
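The split between parallel and serial tests described above can be sketched with pytest markers and pytest-xdist. The marker name `serial` and the commands shown are assumptions based on the PR description, not taken from the actual MXNet CI configuration:

```python
# Sketch: separating small (parallel-safe) tests from resource-heavy ones.
# The "serial" marker name and the invocations below are assumptions based
# on the PR description, not the actual MXNet CI setup.
import pytest

@pytest.mark.serial          # resource-heavy: excluded from the parallel run
def test_large_tensor_op():
    assert sum(range(1000)) == 499500

def test_small_op():         # small test: safe to run in parallel workers
    assert 2 + 2 == 4

# Typical invocations with pytest-xdist installed:
#   pytest -n auto -m "not serial" tests/   # small tests across all CPUs
#   pytest -m serial tests/                 # large tests, one at a time
```

With this layout, CI can run the unmarked tests with one worker per CPU and then run the marked tests in a separate serial stage.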

@mxnet-bot

Hey @szha, thanks for submitting the PR.
All tests are already queued to run once. If tests fail, you can trigger one or more tests again with the following commands:

  • To trigger all jobs: @mxnet-bot run ci [all]
  • To trigger specific jobs: @mxnet-bot run ci [job1, job2]

CI supported jobs: [website, clang, centos-gpu, unix-cpu, windows-cpu, unix-gpu, sanity, windows-gpu, centos-cpu, edge, miscellaneous]


Note:
Only the following 3 categories can trigger CI: PR Author, MXNet Committer, Jenkins Admin.
All CI tests must pass before the PR can be merged.

@szha szha force-pushed the parallel branch 5 times, most recently from 7103615 to 774f703 Compare April 24, 2020 23:47
@szha szha force-pushed the parallel branch 23 times, most recently from c183ced to 98d7154 Compare April 29, 2020 22:52
@szha szha force-pushed the parallel branch 6 times, most recently from 7781d4c to fddadd1 Compare May 2, 2020 21:14
@szha szha force-pushed the parallel branch 2 times, most recently from fb69023 to ac43291 Compare May 4, 2020 00:07
@leezu (Contributor) left a comment


Thanks.

I find that the runtime of some tests has increased, maybe due to thrashing, overhead of the threaded engine, or other problems. This can be improved in a separate PR, though.

For example, on the last two unix-cpu runs in this PR (Python3 MKL-CPU):

    505.28s call tests/python/unittest/test_optimizer.py::test_sparse_adam4
    460.96s call tests/python/unittest/test_optimizer.py::test_sparse_adam

but on master:

    150.76s call tests/python/unittest/test_optimizer.py::test_sparse_adam
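The timings quoted above match the format of pytest's built-in `--durations` report. A minimal, self-contained way to produce such a report (the test file below is synthetic, only illustrating the reporting mechanism) is:

```python
# Minimal sketch: generating a slowest-tests report like the one quoted
# above, using pytest's built-in --durations flag. The test file here is
# synthetic and only illustrates the reporting mechanism.
import os
import tempfile
import textwrap

import pytest

TEST_CODE = textwrap.dedent("""
    import time

    def test_fast():
        assert True

    def test_slow():
        time.sleep(0.2)   # should appear near the top of the report
        assert True
""")

with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, "test_timing_demo.py")
    with open(path, "w") as f:
        f.write(TEST_CODE)
    # --durations=0 reports test-phase timings in the "NNNs call ..." format
    exit_code = pytest.main([path, "--durations=0", "-q"])
```

Comparing such reports between a PR branch and master is one way to spot regressions like the one described here.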

@szha (Member, Author) commented May 4, 2020

Will look into it in a follow-up PR.

@szha szha merged commit 0580200 into apache:master May 4, 2020
@szha szha deleted the parallel branch May 4, 2020 23:45
@szha (Member, Author) commented May 5, 2020

@PatricZhao I noticed that the MKL/MKLDNN tests are taking a lot longer than non-MKL builds in the parallel test setting. I will try a couple more runs to verify, so this is just FYI. Example:
http://jenkins.mxnet-ci.amazon-ml.com/blue/organizations/jenkins/mxnet-validation%2Funix-cpu/detail/PR-18146/51/pipeline/366

@szha (Member, Author) commented May 5, 2020

Update: the MKLDNN builds are actually executing a different set of tests, which could explain the time difference. However, the MKL build is executing the same unit tests as the regular Python 3 CPU build and is consistently taking a lot longer.

@marcoabreu (Contributor) commented May 5, 2020 via email

@szha (Member, Author) commented May 5, 2020

@marcoabreu indeed that could be a likely cause. I reported my findings in #18244 and we can continue the discussion there.

AntiZpvoh pushed a commit to AntiZpvoh/incubator-mxnet that referenced this pull request Jul 6, 2020
* run pytest in parallel

* disable memory pool

* address flaky ftrl/fm test and layernorm timeout

* mark tests as serial

* use parametrize in numpy op tests

* fix io bugs

* fix gluon rnn cell test and doc

* replace xfail with raises scope

* fix flaky numpy, mkldnn quantize, and rnn tests

* fix tempfile/dir usage
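The "use built-in tmpdir fixtures" and "fix tempfile/dir usage" items above refer to pytest's `tmp_path`-style fixtures. A minimal sketch (the test body is illustrative, not taken from the PR's diff):

```python
# Sketch: replacing manual tempfile handling with pytest's built-in
# tmp_path fixture. Each test receives its own unique directory, and with
# pytest-xdist each worker gets its own base temp dir, so parallel tests
# cannot collide on file names. The test body is illustrative.
def test_save_load_params(tmp_path):
    params_file = tmp_path / "model.params"   # unique per test invocation
    params_file.write_text("serialized-weights")
    assert params_file.read_text() == "serialized-weights"
```

pytest also cleans these directories up automatically, which removes the leak-prone manual `tempfile.mkdtemp()` / cleanup pattern.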


4 participants