Skip to content

Optimize fmaxf etc.#9689

Merged
kripken merged 5 commits into
incomingfrom
fs
Oct 23, 2019
Merged

Optimize fmaxf etc.#9689
kripken merged 5 commits into
incomingfrom
fs

Conversation

@kripken

@kripken kripken commented Oct 22, 2019

Copy link
Copy Markdown
Member

The wasm builtins are very similar to the normal libc functions, except that nans are handled differently. Keep the musl nan handling, and otherwise use the builtins. This is ~41 bytes less in each fmaxf etc. function, and is 30% faster on this silly benchmark:

#include <math.h>
#include <stdio.h>

int main() {
  union {
    int i;
    float f;
  } u;
  float sum = 0;
  const int N = 20000;
  for (int i = 0; i < N; i++) {
    for (int j = 0; j < N; j++) {
      u.i = ((i << 15) + j + 5085) ^ j ^ (i >> 2);
      sum += fmaxf(u.f, 0.5);
    }
  }
  printf("%.2f\n", sum);
}

After the speedup we are about equal with gcc natively.

Verified this does not change the output of our tests on this, and added more test coverage.

cc @sunfishcode

@kripken kripken requested review from sbc100 and tlively October 22, 2019 19:19
@sunfishcode

Copy link
Copy Markdown
Collaborator

Clever! I'm surprised by the magnitude of the speedup, but I guess that means those signbit calls aren't easy to optimize.


// fmin etc. are not specced to be sensitive to negative zero, and LLVM does
// depend on that for optimizations, so check only the absolute value there
#define TESTS(name) \

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yay macros that make things easier to read :)

@kripken

kripken commented Oct 23, 2019

Copy link
Copy Markdown
Member Author

I added more tests for negative zero. Wasm and musl do handle it correctly even if the libc spec doesn't require it, so nice to make sure we don't regress that.

@kripken kripken merged commit 3bd5e10 into incoming Oct 23, 2019
@delete-merged-branch delete-merged-branch Bot deleted the fs branch October 23, 2019 22:44
sunfishcode added a commit to WebAssembly/wasi-libc that referenced this pull request Oct 25, 2019
Use wasm's builtin min and max operators to implement libc `fmin`,
`fmax, `fminf`, and `fmaxf`, by handling the NaN cases explicitly.

Credit to emscripten-core/emscripten#9689
for spotting this opportunity!
sunfishcode added a commit to WebAssembly/wasi-libc that referenced this pull request Nov 8, 2019
Use wasm's builtin min and max operators to implement libc `fmin`,
`fmax, `fminf`, and `fmaxf`, by handling the NaN cases explicitly.

Credit to emscripten-core/emscripten#9689
for spotting this opportunity!
sunfishcode added a commit to WebAssembly/wasi-libc that referenced this pull request Nov 20, 2019
Use wasm's builtin min and max operators to implement libc `fmin`,
`fmax, `fminf`, and `fmaxf`, by handling the NaN cases explicitly.

Credit to emscripten-core/emscripten#9689
for spotting this opportunity!
belraquib pushed a commit to belraquib/emscripten that referenced this pull request Dec 23, 2020
The wasm builtins are very similar to the normal libc functions, except that
nans are handled differently. Keep the musl nan handling, and otherwise use the
builtins. This is ~41 bytes less in each fmaxf etc. function, and is 30% faster
on this silly benchmark:

#include <math.h>
#include <stdio.h>

int main() {
  union {
    int i;
    float f;
  } u;
  float sum = 0;
  const int N = 20000;
  for (int i = 0; i < N; i++) {
    for (int j = 0; j < N; j++) {
      u.i = ((i << 15) + j + 5085) ^ j ^ (i >> 2);
      sum += fmaxf(u.f, 0.5);
    }
  }
  printf("%.2f\n", sum);
}

After the speedup we are about equal with gcc natively.

Verified this does not change the output of our tests on this, and added more
test coverage, including of negative zero which libc is not guaranteed to get
right, but the implementation actually does, and using wasm builtins preserves
that.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants