Optimize fmaxf etc. by kripken · Pull Request #9689 · emscripten-core/emscripten

kripken · 2019-10-22T19:19:26Z

The wasm builtins are very similar to the normal libc functions, except that NaNs are handled differently. Keep the musl NaN handling, and otherwise use the builtins. This saves ~41 bytes in each of the fmaxf etc. functions, and is 30% faster on this silly benchmark:

#include <math.h>
#include <stdio.h>

int main() {
  union {
    int i;
    float f;
  } u;
  float sum = 0;
  const int N = 20000;
  for (int i = 0; i < N; i++) {
    for (int j = 0; j < N; j++) {
      u.i = ((i << 15) + j + 5085) ^ j ^ (i >> 2);
      sum += fmaxf(u.f, 0.5);
    }
  }
  printf("%.2f\n", sum);
}

After the speedup we are about equal with gcc natively.

Verified this does not change the output of our tests on this, and added more test coverage.
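
For readers skimming the thread, here is a rough sketch of the idea described above: check for NaN first (libc semantics return the other operand), then defer everything else to the wasm max operation. This is not the actual patch. The names `wasm_max_f32`, `sketch_fmaxf`, and `sketch_fmaxf_examples` are made up for illustration, and the non-wasm branch is only an emulation of the wasm semantics (NaN if either operand is NaN, and -0 ordered below +0) so that the snippet also builds natively. `__builtin_wasm_max_f32` is the clang builtin assumed to be available when targeting wasm.

```c
#include <assert.h>
#include <math.h>

/* Hypothetical wrapper: on wasm use the clang builtin, elsewhere emulate
   wasm's f32.max (NaN-propagating, and -0 is treated as less than +0). */
float wasm_max_f32(float x, float y) {
#if defined(__wasm__)
  return __builtin_wasm_max_f32(x, y);
#else
  if (isnan(x) || isnan(y)) return NAN;
  if (x == 0 && y == 0) return signbit(x) ? y : x; /* prefer +0 over -0 */
  return x > y ? x : y;
#endif
}

/* Sketch of a libc-style fmaxf: a NaN operand is treated as missing data,
   so the other operand is returned; all other cases go to the builtin. */
float sketch_fmaxf(float x, float y) {
  if (isnan(x)) return y;
  if (isnan(y)) return x;
  return wasm_max_f32(x, y);
}

/* A few spot checks of the difference between the two behaviours. */
void sketch_fmaxf_examples(void) {
  assert(isnan(wasm_max_f32(NAN, 1.0f)));   /* wasm op propagates NaN */
  assert(sketch_fmaxf(NAN, 1.0f) == 1.0f);  /* libc-style ignores NaN */
  assert(sketch_fmaxf(1.0f, NAN) == 1.0f);
  assert(isnan(sketch_fmaxf(NAN, NAN)));    /* both NaN: result is NaN */
}
```

The two explicit `isnan` checks are the only part that differs from calling the wasm operation directly, which is consistent with the size and speed win coming from dropping the rest of the generic implementation.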

cc @sunfishcode

sunfishcode · 2019-10-22T19:37:31Z

Clever! I'm surprised by the magnitude of the speedup, but I guess that means those signbit calls aren't easy to optimize.

jgravelle-google · 2019-10-23T18:01:32Z

+
+// fmin etc. are not specced to be sensitive to negative zero, and LLVM does
+// depend on that for optimizations, so check only the absolute value there
+#define TESTS(name) \


yay macros that make things easier to read :)

kripken · 2019-10-23T19:51:43Z

I added more tests for negative zero. Wasm and musl do handle it correctly even if the libc spec doesn't require it, so nice to make sure we don't regress that.
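
A rough illustration of what a negative-zero check through a function pointer can look like. This is not the test added in this PR. The function names are hypothetical, the two implementations are modeled on the signbit-based approach sunfishcode refers to above, and routing the call through a `volatile` function pointer is an assumption about why a function pointer helps (it keeps the compiler from folding the call at compile time).

```c
#include <assert.h>
#include <math.h>

/* Illustrative signbit-based max: when the signs differ, pick the
   non-negative operand, so max(-0, +0) is +0. */
float zero_aware_fmaxf(float x, float y) {
  if (isnan(x)) return y;
  if (isnan(y)) return x;
  if (signbit(x) != signbit(y)) return signbit(x) ? y : x;
  return x < y ? y : x;
}

/* Illustrative signbit-based min: when the signs differ, pick the
   negative operand, so min(-0, +0) is -0. */
float zero_aware_fminf(float x, float y) {
  if (isnan(x)) return y;
  if (isnan(y)) return x;
  if (signbit(x) != signbit(y)) return signbit(x) ? x : y;
  return x < y ? x : y;
}

typedef float (*float_binop)(float, float);

/* Call through a volatile pointer so the call is made at run time. */
float call_via_pointer(float_binop f, float x, float y) {
  float_binop volatile fp = f;
  return fp(x, y);
}

/* Comparing with == cannot tell -0 from +0, so inspect the sign bit. */
int is_negative_zero(float v) { return v == 0.0f && signbit(v) != 0; }
int is_positive_zero(float v) { return v == 0.0f && signbit(v) == 0; }
```

A check would then be along the lines of `is_positive_zero(call_via_pointer(zero_aware_fmaxf, -0.0f, 0.0f))`, with the operands also swapped, since the result should not depend on argument order.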

Use wasm's builtin min and max operators to implement libc `fmin`, `fmax`, `fminf`, and `fmaxf`, by handling the NaN cases explicitly. Credit to emscripten-core/emscripten#9689 for spotting this opportunity!

Verified this does not change the output of our tests on this, and added more test coverage, including of negative zero, which libc is not guaranteed to get right, but the implementation actually does, and using wasm builtins preserves that.

kripken added 3 commits October 22, 2019 10:57

wip

a42c4f0

wip [ci skip]

4577898

fix

83c36cf

kripken requested review from sbc100 and tlively October 22, 2019 19:19

Merge remote-tracking branch 'origin/incoming' into fs

8fdc7ef

jgravelle-google approved these changes Oct 23, 2019

Add tests for negative zero, using a function pointer.

531da1f

kripken merged commit 3bd5e10 into incoming Oct 23, 2019

delete-merged-branch Bot deleted the fs branch October 23, 2019 22:44

sunfishcode mentioned this pull request Oct 25, 2019

Optimize fmin, fmax, etc. WebAssembly/wasi-libc#120

Merged

kripken merged 5 commits into incoming from fs


3 participants
