fix(moq-video): NVENC falls back instead of aborting when no NVIDIA driver #1825

Closed
kixelated wants to merge 1 commit into dev from claude/moq-video-nvenc-fallback

@kixelated

Collaborator

Summary

On a Linux host with no NVIDIA driver, moq-video's NVENC backend aborts the process instead of falling back to software. cudarc (CudaContext::new) and the NVENC SDK both dlopen their driver libraries lazily and panic! when the library is missing (cudarc's panic_no_lib_found, the SDK's "failed to dlopen the NVIDIA encode library"). Under the release profile's panic = "abort", that panic terminates the process; in tests, where Cargo forces unwinding, it shows up as a hard test failure.

So Kind::Auto / Kind::Hardware — the default for moq-cli capture — crashes on any GPU-less Linux box, contrary to the documented "falls back to software (see backend::open)" behavior.

This was surfaced by a Kind::Auto round-trip test in #1802 panicking on the GPU-less CI runner.

Fix

Probe the driver libraries with libloading before calling into cudarc / the SDK, and return Error::Codec if either is absent so backend::open moves on to the next candidate (openh264):

  • libcuda.so.1 / libcuda.so (cudarc's CUDA driver), and
  • libnvidia-encode.so.1 / libnvidia-encode.so (matches the SDK's own candidate list).
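A minimal sketch of the probe-then-fall-back shape described above, not the PR's actual code. The `Error` enum, `first_loadable`, `probe_driver`, `Opener`, and `open_first` are hypothetical names for illustration; only `Error::Codec` and the library file names come from the description. The loader is passed in as a closure so the sketch stays dependency-free; in the real backend it would presumably wrap `unsafe { libloading::Library::new(name) }.is_ok()`.

```rust
/// Hypothetical stand-in for the crate's error type; only `Codec` is named in the PR.
#[derive(Debug, PartialEq)]
pub enum Error {
    Codec(String),
}

/// Candidate file names listed in the PR description.
pub const CUDA: &[&str] = &["libcuda.so.1", "libcuda.so"];
pub const NVENC: &[&str] = &["libnvidia-encode.so.1", "libnvidia-encode.so"];

/// Returns the first candidate that `try_load` accepts, if any.
/// Assumption: the real `try_load` would attempt a dlopen via libloading.
pub fn first_loadable<'a>(
    candidates: &[&'a str],
    try_load: impl Fn(&str) -> bool,
) -> Option<&'a str> {
    candidates.iter().copied().find(|name| try_load(*name))
}

/// Pre-check both driver libraries and report a recoverable error,
/// so nothing downstream gets the chance to panic on a missing library.
pub fn probe_driver(try_load: impl Fn(&str) -> bool) -> Result<(), Error> {
    for (label, candidates) in [("CUDA driver", CUDA), ("NVIDIA encode library", NVENC)] {
        if first_loadable(candidates, &try_load).is_none() {
            return Err(Error::Codec(format!("{label} not found (tried {candidates:?})")));
        }
    }
    Ok(())
}

/// Hypothetical opener signature: each backend either yields an encoder or an error.
pub type Opener<T> = fn() -> Result<T, Error>;

/// Hypothetical model of the fallback: try each candidate in order,
/// return the first success, otherwise the last error seen.
pub fn open_first<T>(openers: &[(&'static str, Opener<T>)]) -> Result<(&'static str, T), Error> {
    let mut last = Error::Codec("no encoder candidates".to_string());
    for (name, open) in openers {
        match open() {
            Ok(encoder) => return Ok((*name, encoder)),
            Err(err) => last = err,
        }
    }
    Err(last)
}
```

The point of the sketch is that the fallback loop only works if every candidate reports failure as `Err`; a candidate that panics never returns control to the loop.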

A pre-check (not catch_unwind) is required because release builds use panic = "abort", where catch_unwind can't help. libloading is added as an nvenc-feature-gated optional dep, mirroring the existing cudarc / nvidia-video-codec-sdk gating.
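To illustrate why the pre-check is preferred, here is a small sketch contrasting the two approaches. `load_driver`, `guarded_load`, and `checked_load` are hypothetical stand-ins, not functions from cudarc or the SDK. `std::panic::catch_unwind` can only intercept a panic when the panic strategy is unwind; with panic = "abort" the process terminates before `guarded_load` could return anything.

```rust
use std::panic;

/// Hypothetical stand-in for a lazily loading driver call that panics
/// when its library is missing.
fn load_driver(present: bool) -> u32 {
    if !present {
        panic!("failed to dlopen the NVIDIA encode library");
    }
    1
}

/// Guard attempt: turns the panic into `Err`, but only under panic = "unwind".
/// Under panic = "abort" this never gets to return for `present == false`.
pub fn guarded_load(present: bool) -> Result<u32, String> {
    panic::catch_unwind(|| load_driver(present)).map_err(|payload| {
        payload
            .downcast_ref::<&str>()
            .map(|s| s.to_string())
            .or_else(|| payload.downcast_ref::<String>().cloned())
            .unwrap_or_else(|| "unknown panic".to_string())
    })
}

/// Pre-check: decide before calling, so no panic is raised under either strategy.
pub fn checked_load(present: bool) -> Result<u32, String> {
    if !present {
        return Err("driver library missing".to_string());
    }
    Ok(load_driver(present))
}
```

`checked_load` behaves the same in test and release profiles, which is the property the fix relies on.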

Test plan

  • cargo fmt --check clean; non-NVENC cargo build -p moq-video unaffected (macOS).
  • New #[cfg(all(target_os = "linux", feature = "nvenc"))] test missing_driver_errors_instead_of_panicking: on a host without the driver, Nvenc::open returns Err rather than panicking. This runs on the GPU-less Linux CI runner (cargo test -p moq-video --all-features), which is exactly where the abort happened before.
  • Could not compile-check the cfg(linux) path locally (macOS box; nix/rustup cross-toolchain split blocked a zigbuild). Relying on the Linux CI runner to compile + run the new test.

Note

VAAPI (moq-vaapi, dlopen'd libva) may have the same panic-on-missing-library shape; if so, Kind::Auto would still abort at the VAAPI candidate after NVENC falls back. Worth a follow-up check, but out of scope here.

(Written by Claude)

…ting

cudarc and the NVENC SDK dlopen their driver libraries lazily and panic (which
aborts the process under release's `panic = "abort"`) when the library is
absent. So on a GPU-less Linux host, Kind::Auto/Hardware crashed in
`CudaContext::new` instead of falling back to openh264, contrary to the
documented behavior.

Probe libcuda and libnvidia-encode with libloading before touching either
crate, and return Error::Codec when a library is missing so backend::open moves
on to the next candidate. Adds a Linux/nvenc test asserting open() errors
(rather than panics) when the driver is absent.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@kixelated

Collaborator Author

Folded into #1819 (commit b0e7682): that PR makes hardware encoders always-on, which is what turns this NVENC panic into a crash on every GPU-less Linux box, so the fallback fix belongs there rather than as a separate PR racing it. Same probe (libcuda + libnvidia-encode via libloading) + the GPU-less test, adapted to the always-on dep layout (libloading as a non-optional target.linux dep). Verified in podman: cargo test -p moq-video --all-features --locked passes with the driver absent.

(Written by Claude)

@kixelated kixelated closed this Jun 20, 2026
@kixelated kixelated deleted the claude/moq-video-nvenc-fallback branch June 20, 2026 04:29