Surface headers for Linking To #190
Merged
ms609 merged 13 commits into transfer-consensus from Mar 23, 2026
Codecov Report
❌ Patch coverage is
Additional details and impacted files

@@              Coverage Diff               @@
##   transfer-consensus    #190     +/-   ##
=============================================
- Coverage     95.76%   95.76%   -0.01%
=============================================
  Files            53       57       +4
  Lines          5267     5406     +139
=============================================
+ Hits           5044     5177     +133
- Misses          223      229       +6
Performance benchmark results
lap_impl.h: add __attribute__((optimize("align-functions=64",
"align-loops=16"))) on lap() (GCC only) to stabilise instruction
alignment across TU layout changes.
cost_matrix.h: add setWithTranspose() and markTransposed() methods.
lap.cpp: use combined fill+transpose in lapjv() to eliminate the extra
makeTranspose() O(n²) pass introduced by the header refactor.
tree_distance_functions.cpp: add cpp_mci_impl_score() wrapper that calls
TreeDist::mutual_clustering_score() from the installable header, covering
find_exact_matches_raw() and the MCI score computation.
test-mci_impl.R: exercises exact-match early exit and partial-match + LAP
paths through the header implementation.
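As a rough illustration of the combined fill+transpose idea, the sketch below writes each entry into both buffers in one pass, so no separate O(n²) transpose is needed afterwards. CostMatrixSketch, its members, and the exact meaning given to markTransposed() here are assumptions for illustration; TreeDist's real CostMatrix in cost_matrix.h may differ in types, layout and semantics.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a square cost matrix; not TreeDist's CostMatrix.
class CostMatrixSketch {
public:
  explicit CostMatrixSketch(std::size_t n)
      : n_(n), data_(n * n, 0), t_data_(n * n, 0), transposed_(false) {}

  // Write one entry into the row-major buffer and its transpose together.
  void setWithTranspose(std::size_t i, std::size_t j, std::int64_t value) {
    assert(i < n_ && j < n_);
    data_[i * n_ + j] = value;
    t_data_[j * n_ + i] = value;
  }

  // Assumed meaning: declare the transposed buffer up to date,
  // so that makeTranspose() has nothing left to do.
  void markTransposed() { transposed_ = true; }

  // Separate O(n^2) pass, only needed when the transpose was not filled.
  void makeTranspose() {
    if (transposed_) return;
    for (std::size_t i = 0; i < n_; ++i) {
      for (std::size_t j = 0; j < n_; ++j) {
        t_data_[j * n_ + i] = data_[i * n_ + j];
      }
    }
    transposed_ = true;
  }

  std::int64_t at(std::size_t i, std::size_t j) const {
    return data_[i * n_ + j];
  }
  std::int64_t atTransposed(std::size_t r, std::size_t c) const {
    return t_data_[r * n_ + c];
  }
  bool isTransposed() const { return transposed_; }

private:
  std::size_t n_;
  std::vector<std::int64_t> data_;    // data_[i * n + j]   == cost(i, j)
  std::vector<std::int64_t> t_data_;  // t_data_[j * n + i] == cost(i, j)
  bool transposed_;
};
```

A caller would fill every cell with setWithTranspose() and then call markTransposed() once, after which a later makeTranspose() returns immediately.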
The ASan/UBSan ~/.R/Makevars used on CI overrides PKG_CXXFLAGS, which dropped the inst/include path. PKG_CPPFLAGS is not overridden, so the include path is set there instead.
lap_impl.h is for downstream LinkingTo consumers only. Including it in TreeDist's own lap.cpp changed the TU context enough for GCC 14's register allocator to produce ~8% more instructions in the Dijkstra hot loop, causing a 20-25% regression on standalone LAPJV (n >= 400).

Fix: define lap() directly in lap.cpp (matching main's pattern) and add GCC align-functions=64 / align-loops=16 attributes. The lapjv() wrapper now fills the transposed buffer first (matching R's column-major storage), then untransposes, restoring the cache-friendly construction pattern.

Residual: ~5-9% on LAPJV 1999x1999 vs main, from the different CostMatrix class definition visible through the installed headers (different method set, namespace wrapping). Tree distance metrics (CID, MSD, PID, etc.) are unaffected: they call lap() from pairwise_distances.cpp.

cost_matrix.h: add dim8() accessor (needed by lapjv() matrix fill).
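To see why filling the transposed buffer first suits R's storage order: in a column-major n x n matrix, element (i, j) sits at x[i + j * n], which is exactly where a row-major transpose keeps it. The sketch below is a minimal illustration under that assumption; fillFromColumnMajor and FilledBuffers are hypothetical names, and it uses double where the real lapjv() wrapper may convert to another cost type.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical result type; not part of TreeDist's headers.
struct FilledBuffers {
  std::vector<double> row_major;   // row_major[i * n + j]  == cost(i, j)
  std::vector<double> transposed;  // transposed[j * n + i] == cost(i, j)
};

// x is an n x n matrix in column-major order, as R stores it.
FilledBuffers fillFromColumnMajor(const std::vector<double>& x,
                                  std::size_t n) {
  assert(x.size() == n * n);
  FilledBuffers out;

  // Column-major x[i + j * n] already has the transposed buffer's layout,
  // so this fill reads and writes memory sequentially.
  out.transposed.assign(x.begin(), x.end());

  // "Untranspose": derive the row-major buffer from the transposed one.
  out.row_major.resize(n * n);
  for (std::size_t i = 0; i < n; ++i) {
    for (std::size_t j = 0; j < n; ++j) {
      out.row_major[i * n + j] = out.transposed[j * n + i];
    }
  }
  return out;
}
```

The strided access is confined to the second loop nest; the first fill never jumps between columns.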
The expanded lap.o (direct lap() implementation) shifted the DLL layout
enough to regress PathDist by ~50% on Linux/GCC -O3. PathDist lives in a
completely different TU; it was a casualty of linker-level alignment.
Per-function __attribute__((optimize("align-functions=64"))) only
controls alignment within the object file; it cannot prevent neighbouring
functions from being placed at unfavorable offsets when another .o changes
size. A global -falign-functions=64 in PKG_CXXFLAGS ensures every
function entry point is 64-byte aligned regardless of link order.
Removed the per-function attribute from lap.cpp (now redundant).
Kept it in lap_impl.h for downstream LinkingTo consumers.
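For readers unfamiliar with the per-function attribute, a guarded version might look roughly like the sketch below. The macro name SKETCH_ALIGN_HOT and the function rowMinimaSum() are hypothetical stand-ins; only the attribute string itself comes from the commit messages above. The guard matters because the attribute is GCC-specific.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// GCC-only: request a 64-byte-aligned entry point and 16-byte-aligned loops
// for the annotated function. Clang also defines __GNUC__ but ignores this
// attribute with a warning, so it is excluded here.
#if defined(__GNUC__) && !defined(__clang__)
#define SKETCH_ALIGN_HOT \
  __attribute__((optimize("align-functions=64", "align-loops=16")))
#else
#define SKETCH_ALIGN_HOT
#endif

// Hypothetical hot function standing in for lap(): sums each row's minimum
// of an n x n row-major cost matrix.
SKETCH_ALIGN_HOT
long rowMinimaSum(const std::vector<long>& cost, std::size_t n) {
  assert(cost.size() == n * n);
  long total = 0;
  for (std::size_t i = 0; i < n; ++i) {
    long best = cost[i * n];
    for (std::size_t j = 1; j < n; ++j) {
      if (cost[i * n + j] < best) best = cost[i * n + j];
    }
    total += best;
  }
  return total;
}
```

On other compilers the macro expands to nothing, so the function compiles unchanged and only the alignment hint is lost.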
The 'Initialize ASan configuration' step overwrote src/Makevars with '>', destroying PKG_CPPFLAGS = -I../inst/include added by the expose-lapjv branch. Use sed to replace PKG_CXXFLAGS in-place instead, preserving PKG_CPPFLAGS and PKG_LIBS.
CRAN's R CMD check flags -falign-functions=64 in PKG_CXXFLAGS as non-portable (GCC-specific). Tree distance metrics were unaffected by the alignment regression; only standalone LAPJV microbenchmarks showed sensitivity.
No description provided.