
p2p_api: add construct cleanup/unwind path (Bug 4016670) #1172

Open: fo40225 wants to merge 1 commit into NVIDIA:main from fo40225:fix_4016670


fo40225 commented May 30, 2026

p2papiConstruct_IMPL created the P2P mapping (kbusCreateP2PMapping) but, on any later failure, returned the error without removing it, leaking the peer IOMMU mapping and the bus P2P refcount. That leak is what trips the "left-over mappings in IOVAS" (io_vaspace.c) and "Sysmemdesc outlived its attached pGpu" (mem_desc.c) asserts on teardown.

Route every post-create failure exit to a single `cleanup:` label that removes the mapping(s). The exits covered are: the EGM mapping create, both kbusGetBar1P2PDmaInfo queries, the SR-IOV _p2papiReservePeerID reservations, the vGPU NV_RM_RPC_ALLOC_OBJECT, both kbusSetupBindFla local/remote binds (and the bare-metal early return between them), refAddDependant, and the alive-refcount overflow checks. On success the label is reached with status == NV_OK and nothing is removed. Mapping removal mirrors p2papiDestruct: the FLA bind and the RPC object are not reversed there either, so they are not reversed here.
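For readers unfamiliar with the pattern, here is a minimal, standalone sketch of the single-cleanup-label shape described above. It is not the driver code: `fake_bus_t`, `create_mapping`, `remove_mapping`, `query_dma_info`, `bind_peer`, and `construct` are hypothetical stand-ins, and the integer `mapping_refs` only models the mapping and refcount that the real path would leak.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins; none of these names come from the driver. */
typedef enum { STATUS_OK = 0, STATUS_ERR = 1 } status_t;

typedef struct {
    int  mapping_refs; /* models the P2P mapping / bus refcount */
    bool fail_query;   /* inject a failure in the first post-create step */
    bool fail_bind;    /* inject a failure in a later post-create step */
} fake_bus_t;

static status_t create_mapping(fake_bus_t *bus)
{
    bus->mapping_refs++;
    return STATUS_OK;
}

static void remove_mapping(fake_bus_t *bus)
{
    assert(bus->mapping_refs > 0); /* never remove what was not created */
    bus->mapping_refs--;
}

static status_t query_dma_info(const fake_bus_t *bus)
{
    return bus->fail_query ? STATUS_ERR : STATUS_OK;
}

static status_t bind_peer(const fake_bus_t *bus)
{
    return bus->fail_bind ? STATUS_ERR : STATUS_OK;
}

status_t construct(fake_bus_t *bus)
{
    status_t status = create_mapping(bus);
    if (status != STATUS_OK)
        return status; /* nothing has been created yet, so nothing to unwind */

    status = query_dma_info(bus);
    if (status != STATUS_OK)
        goto cleanup;

    /* Last step: fall through to the label with whatever status it returns. */
    status = bind_peer(bus);

cleanup:
    /* Reached on success too; only a failure undoes the mapping. */
    if (status != STATUS_OK)
        remove_mapping(bus);
    return status;
}
```

With this shape, a new failure exit added after the create only needs a `goto cleanup;` rather than its own copy of the removal logic, which is what keeps the error paths from drifting apart.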

Compile-verified on Linux 6.17 + gcc 13.3 (src/nvidia/.../p2p_api.o).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

CLAassistant commented May 30, 2026

CLA assistant check
All committers have signed the CLA.
