fix(thinc): restore ln2/f_log_cosh to keep AMD omptarget module-data layout #1416
sbryngelson wants to merge 2 commits into MFlowCode:master
Claude Code Review
Head SHA: 7be7cf0
Files changed:
Findings
[Correctness]
Codecov Report
✅ All modified and coverable lines are covered by tests.

@@           Coverage Diff           @@
##           master    #1416   +/-   ##
=======================================
  Coverage   64.94%   64.94%
=======================================
  Files          72       72
  Lines       18861    18861
  Branches     1570     1570
=======================================
  Hits        12250    12250
  Misses       5638     5638
  Partials      973      973
Summary
Restores `ln2` and `f_log_cosh` to `m_thinc.fpp` (along with `GPU_DECLARE(create=[gq3_pts, gq3_wts, ln2])`) to fix spurious AMD `gpu-omp` CI failures introduced by #1401. The MTHINC normals math fix from #1401 (the actual feature work) stays untouched.

Background: first attempt was wrong
My initial diagnosis blamed the `create=` → `copyin=` macro-form swap in #1401. CI on the first revision of this PR (with `create=` instead of `copyin=`) failed with the identical `omptarget` error pattern, so that wasn't it.

Revised diagnosis
The only other module-data change in #1401 was the deletion of:
- the `ln2 = 0.6931471805599453_wp` static constant, and
- `f_log_cosh` (the only consumer of `ln2`).

Both were already dead code on master before #1401 (no callers of `f_log_cosh`), so removing them was a reasonable cleanup. But it shrank `m_thinc`'s `.data` segment by 8 bytes, which shifted the layout of subsequent declare-target entities (`mthinc_nhat`, `mthinc_d` descriptors) and adjacent module declares.

The AMD/ROCm omptarget runtime is sensitive to that layout: post-#1401 it now flags a 96-byte mapping whose tail extends 48 bytes into a sibling 2496/14784/5056-byte mapping with `omptarget message: explicit extension not allowed: ...`, and aborts with `omptarget fatal error 1: failure of target construct while offloading is mandatory`. Pre-#1401, with `ln2` present, the layout doesn't trigger that overlap.

Why bring back dead code
`ln2` and `f_log_cosh` are load-bearing for AMD's runtime layout, even though MFC's own code doesn't currently reference `f_log_cosh`. Keeping the function alive makes `ln2` a non-unused module variable, so compilers neither warn about it nor eliminate it as dead code.

The rest of #1401 (the `s_compute_mthinc_normals` non-uniform-grid math fix, the `int_comp` reconstruction routing in `m_rhs.fpp`, the `use m_nvtx` in `m_muscl.fpp`) stays.

Caveat
This is a hypothesis-driven fix. I cannot reproduce on Frontier AMD locally, so the proof will be in this PR's CI. If `Oak Ridge | Frontier (AMD) (gpu-omp [1/2])` and `[2/2]` go green, the layout-shift theory is confirmed and we should land this. If they fail again, the cause is elsewhere in #1401 and the fallback is reverting #1401 entirely and re-doing the math fix in a separately-tested PR.

Failing-test list (for verification)
13 1D tests per shard, all of which `use m_muscl` → `use m_thinc` and so activate the broken declare-target mapping at module init, even though they don't invoke MTHINC at runtime: `0879E062`, `8E3D99E6`, `1CCA82F5`, `02748F0F`, `9DAC4DDC`, `3A8359F6`, `461DCB09`, `F1CF01C4`, `F5512823`, `0288BDAD`, `B2EFB9F7`, `76C663B7`, `4A759316` (shard 1) and similar for shard 2.

Test plan
- `./mfc.sh precheck -j 8` passes.
- The Frontier (AMD) `gpu-omp [1/2]` and `[2/2]` shards return to green (the actual proof).
- The `gpu-omp [1/2]` and `[2/2]` shards on the working backend stay green (no regression).
- Tests `5126B21F`, `4E4FECA9`, `4F3722DB`, `4C4F339C`, `7A1719C6` pass.
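For readers unfamiliar with the "explicit extension not allowed" failure described under the revised diagnosis, here is a minimal sketch of the kind of range check involved. It is a simplified model, not the omptarget runtime's code: the names `Mapping`, `classify`, and `overlap_bytes` are hypothetical, and the addresses are invented for illustration. Only the sizes (a 96-byte mapping whose tail reaches 48 bytes into a 2496-byte sibling) come from the CI logs quoted above. The sketch does not model how the 8-byte `ln2` shift changes the real layout.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Mapping:
    """A host address range [begin, begin + size) known to the device runtime."""
    begin: int
    size: int

    @property
    def end(self) -> int:
        return self.begin + self.size


def classify(new: Mapping, existing: Mapping) -> str:
    """Classify how a newly requested mapping relates to an existing one."""
    if new.end <= existing.begin or new.begin >= existing.end:
        return "disjoint"   # no shared bytes: the new mapping is independent
    if existing.begin <= new.begin and new.end <= existing.end:
        return "contained"  # fully inside: the existing mapping can be reused
    # Partial overlap: some bytes of `new` fall outside `existing`.
    # This is the situation this model treats as an illegal extension.
    return "extension"


def overlap_bytes(a: Mapping, b: Mapping) -> int:
    """Number of bytes shared by two ranges (0 if they do not intersect)."""
    return max(0, min(a.end, b.end) - max(a.begin, b.begin))


# Hypothetical addresses; only the sizes match the PR description.
sibling = Mapping(begin=0x1000, size=2496)
small = Mapping(begin=0x1000 - 48, size=96)  # tail reaches 48 bytes into sibling

print(classify(small, sibling), overlap_bytes(small, sibling))  # extension 48
```

In this model a mapping is acceptable when it is disjoint from, or fully contained in, every existing mapping. Moving `small` so that it ends exactly at `sibling.begin` would make it disjoint.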