fix(moq-video): probe NVIDIA driver libs before NVENC init (avoid abort on GPU-less box) #1844
Closed
kixelated wants to merge 1 commit into
`Kind::Auto` (the default used by `publish_capture`) aborted the process on a box without the NVIDIA driver instead of falling back to software. cudarc and nvidia-video-codec-sdk resolve their entry points via dlopen and `panic!` when the library is missing rather than returning an error, and the workspace builds with `panic = "abort"`, so `CudaContext::new` took the whole process down before `backend::open` could try openh264.

This defeated the crate's "one portable binary falls back to software on a GPU-less machine" goal, and was newly reachable for H.265 once NVENC started advertising it.

`Nvenc::open` now dlopen-probes libcuda and libnvidia-encode first and returns a clean error if either is absent, so the fallback chain proceeds to openh264. A driver-present-but-GPU-absent box still fails through the normal `CUresult` path, which was already handled.

Add a regression test (`auto_h264_falls_back_without_driver`) that asserts `Auto` returns a backend rather than aborting.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01KVFm4YtH5u71sZzW6uaZC5
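The probe-then-fall-back idea in the commit message can be sketched roughly as follows. This is not the PR's code: the PR probes through `libloading`, whereas this sketch declares `dlopen`/`dlclose` by hand so it needs no external crates. The function names (`probe_library`, `open_hardware`, `open_auto`), the returned backend labels, and the exact sonames `libcuda.so.1` and `libnvidia-encode.so.1` are assumptions made for illustration.

```rust
use std::ffi::{c_char, c_int, c_void, CString};

// Hand-written declarations of the platform loader API (the PR uses `libloading`).
unsafe extern "C" {
    fn dlopen(filename: *const c_char, flags: c_int) -> *mut c_void;
    fn dlclose(handle: *mut c_void) -> c_int;
}

// RTLD_LAZY is 1 on Linux and macOS.
const RTLD_LAZY: c_int = 1;

/// Returns `Ok(())` if the dynamic loader can open `name`, `Err` otherwise.
/// A missing library becomes an ordinary error value, never a panic.
fn probe_library(name: &str) -> Result<(), String> {
    let c_name = CString::new(name).map_err(|e| e.to_string())?;
    // SAFETY: `c_name` is a valid NUL-terminated string that outlives the call.
    let handle = unsafe { dlopen(c_name.as_ptr(), RTLD_LAZY) };
    if handle.is_null() {
        return Err(format!("{name} not found by the dynamic loader"));
    }
    // SAFETY: `handle` was returned by the successful `dlopen` above.
    unsafe { dlclose(handle) };
    Ok(())
}

/// Hypothetical stand-in for the hardware backend's open step.
fn open_hardware() -> Result<&'static str, String> {
    // Assumed sonames; probe both before touching any code that may panic.
    for lib in ["libcuda.so.1", "libnvidia-encode.so.1"] {
        probe_library(lib)?;
    }
    // Real code would initialise CUDA and NVENC here.
    Ok("nvenc")
}

/// Hypothetical `Auto` selection: try hardware first, then fall back to software.
fn open_auto() -> &'static str {
    match open_hardware() {
        Ok(name) => name,
        Err(_) => "openh264",
    }
}
```

Calling `open_auto()` on a machine without the driver libraries yields the software label, because the probe fails with an `Err` before any panicking initialisation path is reached.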
Closing as superseded by #1819 ("make hardware encoders always-on"), which landed on Note: #1819 made the Linux hardware encoders always-on (dropped the Generated by Claude Code |
What

Reviewing the just-merged #1840 surfaced a pre-existing robustness bug that my NVENC H.265 change widened: `Kind::Auto` aborts the process on a box without the NVIDIA driver instead of falling back to software.

Root cause (verified empirically on a GPU-less Linux box)

- cudarc (`fallback-dynamic-loading`) and nvidia-video-codec-sdk (`dynamic-loading`) resolve their entry points via `dlopen` and `panic!` when the library is missing, rather than returning an error. cudarc's `culib()` calls `panic_no_lib_found(...)`.
- The workspace builds with `panic = "abort"` (both `[profile.dev]` and `[profile.release]`).
- `Nvenc::open` → `CudaContext::new(0)` panics → process aborts, before `backend::open` can fall through to openh264. The `.map_err(...)?` never runs.

This defeats the crate's central packaging goal ("one portable binary reaches the GPU at runtime, falls back to software where the driver is absent"). It's reachable via the default `publish_capture` path (`Options::kind` defaults to `Auto`). #1840 made it newly reachable for H.265, since NVENC now advertises that codec.

Proof: a temporary `Auto` + H264 test aborted with a cudarc panic backtrace at `CudaContext::new`; this box has no `libcuda` in `ldconfig`.

Fix

`Nvenc::open` now `dlopen`-probes `libcuda` / `libnvidia-encode` up front (via `libloading`, already in the tree through cudarc) and returns a clean `Err` if either is absent, so the fallback chain proceeds to openh264. A driver-present-but-GPU-absent box still fails through the normal `CUresult` path, which was already handled.

Probing presence (not `catch_unwind`) is the right tool here precisely because `panic = "abort"` makes unwinding-based recovery impossible.

Test

`auto_h264_falls_back_without_driver` (Linux + `nvenc` feature) asserts `backend::open` with `Auto` returns a backend rather than aborting. It holds on a GPU box (NVENC opens) and a driverless box (openh264 fallback) alike, guarding the panic regression. Full `cargo test -p moq-video --features nvenc`: 14 pass.

Related (not fixed here)

The VAAPI backend almost certainly has the identical latent bug: cros-libva `dlopen`s `libva` and likely panics on a miss, so `Auto` + `--features vaapi` on a libva-less box would also abort. I couldn't build/verify it in this environment (no libva headers for cros-libva's bindgen), so I left it out rather than fix blind. Tracked as a follow-up in #1837.

(Written by Claude)

🤖 Generated with Claude Code
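As a small illustration of the Fix section's remark about `catch_unwind`, the sketch below wraps a panicking initialiser in `std::panic::catch_unwind`. The function names are hypothetical and the panicking function merely stands in for a dependency that panics on a missing library. With the default `panic = "unwind"` strategy the panic is converted into an `Err`; when built with `panic = "abort"` the process terminates inside the closure and the `map_err` branch is never reached, which is why this approach cannot help in this workspace.

```rust
use std::panic;

/// Hypothetical stand-in for a dependency that panics instead of returning an error.
fn init_that_panics() -> u32 {
    panic!("driver library not found");
}

/// Tries to turn a panic into an `Err`.
/// Only effective when the crate is compiled with `panic = "unwind"`;
/// under `panic = "abort"` the process exits before this function returns.
fn guarded_init() -> Result<u32, String> {
    panic::catch_unwind(init_that_panics)
        .map_err(|_| "initialisation panicked".to_string())
}

/// Reports at compile time whether unwinding-based recovery is available.
fn recovery_possible() -> bool {
    cfg!(panic = "unwind")
}
```

A caller could check `recovery_possible()` before relying on `guarded_init()`. Probing for the library first avoids the question entirely, since no panic is raised in the first place.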