[MC] Explicitly use memcpy in emitBytes() (NFC)#177187
Conversation
We've observed a compile-time regression in LLVM 22 when including large blobs. The root cause way that emitBytes() was copying bytes one-by-one, which is much slower than using memcpy. Optimization of std::copy to memmove is apparently much less reliable than one might think. In particular, when using a non-bleeding-edge libstdc++ (anything older than version 15), this does not happen if the types of the input and output iterators do not match (like here, where there is a signed/unsigned mismatch). As this code is performance sensitive, I think it makes sense to directly use memcpy.
|
@llvm/pr-subscribers-llvm-mc Author: Nikita Popov (nikic) ChangesWe've observed a compile-time regression in LLVM 22 when including large blobs. The root cause was that emitBytes() was copying bytes one-by-one, which is much slower than using memcpy for large objects. Optimization of std::copy to memmove is apparently much less reliable than one might think. In particular, when using a non-bleeding-edge libstdc++ (anything older than version 15), this does not happen if the types of the input and output iterators do not match (like here, where there is a signed/unsigned mismatch). As this code is performance sensitive, I think it makes sense to directly use memcpy. Previously this code used SmallVector::append, which explicitly uses memcpy here: Full diff: https://github.com/llvm/llvm-project/pull/177187.diff 1 Files Affected:
diff --git a/llvm/lib/MC/MCObjectStreamer.cpp b/llvm/lib/MC/MCObjectStreamer.cpp
index 261e9a37ecb55..d44e14a35cac8 100644
--- a/llvm/lib/MC/MCObjectStreamer.cpp
+++ b/llvm/lib/MC/MCObjectStreamer.cpp
@@ -109,7 +109,9 @@ void MCObjectStreamer::addSpecialFragment(MCFragment *Frag) {
void MCObjectStreamer::appendContents(ArrayRef<char> Contents) {
ensureHeadroom(Contents.size());
assert(FragSpace >= Contents.size());
- llvm::copy(Contents, getCurFragEnd());
+ // As this is performance-sensitive code, explicitly use std::memcpy.
+ // Optimization of std::copy to memmove is unreliable.
+ std::memcpy(getCurFragEnd(), Contents.begin(), Contents.size());
CurFrag->FixedSize += Contents.size();
FragSpace -= Contents.size();
}
|
|
/cherry-pick 15e421d |
|
/pull-request #177320 |
We've observed a compile-time regression in LLVM 22 when including large blobs. The root cause was that emitBytes() was copying bytes one-by-one, which is much slower than using memcpy for large objects. Optimization of std::copy to memmove is apparently much less reliable than one might think. In particular, when using a non-bleeding-edge libstdc++ (anything older than version 15), this does not happen if the types of the input and output iterators do not match (like here, where there is a signed/unsigned mismatch). As this code is performance sensitive, I think it makes sense to directly use memcpy. Previously this code used SmallVector::append, which explicitly uses memcpy. (cherry picked from commit 15e421d)
|
Looks like this caused a ubsan failure: https://lab.llvm.org/buildbot/#/builders/85/builds/17944 . Can you please take a look? (Specifically the |
|
I've pushed a speculative fix at d064f39. |
|
/cherry-pick d064f39 |
|
/pull-request #177907 |
Update to LLVM 22 Scheduled release date: Feb 24 1.94 becomes stable: Mar 5 Changes: * Update to rc2, with one patch to work around our outdated illumos sysroot (rust-lang/llvm-project@41256ab). * Update the host toolchain as well, otherwise we lose cross-language LTO, in particular for jemalloc. * Adjust one loongarch assembly test. The split into r and s variants is based on the suggestion in #151134. Depends on: * [x] #151410 * [ ] #150756 * [x] llvm/llvm-project#175190 * [x] llvm/llvm-project#175912 * [x] llvm/llvm-project#175965 * [x] llvm/llvm-project#176195 * [x] llvm/llvm-project#157073 * [x] llvm/llvm-project#176421 * [x] llvm/llvm-project#176925 * [x] llvm/llvm-project#177187
Update to LLVM 22 Scheduled release date: Feb 24 1.94 becomes stable: Mar 5 Changes: * Update to rc2, with one patch to work around our outdated illumos sysroot (rust-lang/llvm-project@41256ab). * Update the host toolchain as well, otherwise we lose cross-language LTO, in particular for jemalloc. * Adjust one loongarch assembly test. The split into r and s variants is based on the suggestion in #151134. Depends on: * [x] #151410 * [ ] #150756 * [x] llvm/llvm-project#175190 * [x] llvm/llvm-project#175912 * [x] llvm/llvm-project#175965 * [x] llvm/llvm-project#176195 * [x] llvm/llvm-project#157073 * [x] llvm/llvm-project#176421 * [x] llvm/llvm-project#176925 * [x] llvm/llvm-project#177187
Update to LLVM 22 Scheduled release date: Feb 24 1.94 becomes stable: Mar 5 Changes: * Update to rc2, with one patch to work around our outdated illumos sysroot (rust-lang/llvm-project@41256ab). * Update the host toolchain as well, otherwise we lose cross-language LTO, in particular for jemalloc. * Adjust one loongarch assembly test. The split into r and s variants is based on the suggestion in rust-lang/rust#151134. Depends on: * [x] rust-lang/rust#151410 * [ ] rust-lang/rust#150756 * [x] llvm/llvm-project#175190 * [x] llvm/llvm-project#175912 * [x] llvm/llvm-project#175965 * [x] llvm/llvm-project#176195 * [x] llvm/llvm-project#157073 * [x] llvm/llvm-project#176421 * [x] llvm/llvm-project#176925 * [x] llvm/llvm-project#177187
Update to LLVM 22 Scheduled release date: Feb 24 1.94 becomes stable: Mar 5 Changes: * Update to rc2, with one patch to work around our outdated illumos sysroot (rust-lang/llvm-project@41256ab). * Update the host toolchain as well, otherwise we lose cross-language LTO, in particular for jemalloc. * Adjust one loongarch assembly test. The split into r and s variants is based on the suggestion in rust-lang/rust#151134. Depends on: * [x] rust-lang/rust#151410 * [ ] rust-lang/rust#150756 * [x] llvm/llvm-project#175190 * [x] llvm/llvm-project#175912 * [x] llvm/llvm-project#175965 * [x] llvm/llvm-project#176195 * [x] llvm/llvm-project#157073 * [x] llvm/llvm-project#176421 * [x] llvm/llvm-project#176925 * [x] llvm/llvm-project#177187
Update to LLVM 22 Scheduled release date: Feb 24 1.94 becomes stable: Mar 5 Changes: * Update to rc2, with one patch to work around our outdated illumos sysroot (rust-lang/llvm-project@41256ab). * Update the host toolchain as well, otherwise we lose cross-language LTO, in particular for jemalloc. * Adjust one loongarch assembly test. The split into r and s variants is based on the suggestion in rust-lang/rust#151134. Depends on: * [x] rust-lang/rust#151410 * [ ] rust-lang/rust#150756 * [x] llvm/llvm-project#175190 * [x] llvm/llvm-project#175912 * [x] llvm/llvm-project#175965 * [x] llvm/llvm-project#176195 * [x] llvm/llvm-project#157073 * [x] llvm/llvm-project#176421 * [x] llvm/llvm-project#176925 * [x] llvm/llvm-project#177187
Update to LLVM 22 Scheduled release date: Feb 24 1.94 becomes stable: Mar 5 Changes: * Update to rc2, with one patch to work around our outdated illumos sysroot (rust-lang/llvm-project@41256ab). * Update the host toolchain as well, otherwise we lose cross-language LTO, in particular for jemalloc. * Adjust one loongarch assembly test. The split into r and s variants is based on the suggestion in rust-lang/rust#151134. Depends on: * [x] rust-lang/rust#151410 * [ ] rust-lang/rust#150756 * [x] llvm/llvm-project#175190 * [x] llvm/llvm-project#175912 * [x] llvm/llvm-project#175965 * [x] llvm/llvm-project#176195 * [x] llvm/llvm-project#157073 * [x] llvm/llvm-project#176421 * [x] llvm/llvm-project#176925 * [x] llvm/llvm-project#177187
This reverts commit 15e421d.
We've observed a compile-time regression in LLVM 22 when including large blobs. The root cause was that emitBytes() was copying bytes one-by-one, which is much slower than using memcpy for large objects.
Optimization of std::copy to memmove is apparently much less reliable than one might think. In particular, when using a non-bleeding-edge libstdc++ (anything older than version 15), this does not happen if the types of the input and output iterators do not match (like here, where there is a signed/unsigned mismatch).
As this code is performance sensitive, I think it makes sense to directly use memcpy.
Previously this code used SmallVector::append, which explicitly uses memcpy here:
llvm-project/llvm/include/llvm/ADT/SmallVector.h
Line 519 in 82d7a52