A cross-platform webcam capture application featuring a real-time video/audio processing pipeline, RTSP streaming, MP4/MPEG-TS/AAC recording, and CNN-based face detection. On Rockchip platforms, it uses Rockchip MPP/RGA hardware acceleration for efficient H.264 encoding, JPEG decoding, and 2D image operations.
- Real-time webcam capture - Windows (DirectShow + VFW) and Linux (V4L2) support
- JPEG image capture - Save snapshots manually or automatically
- H.264 encoding - Hardware-accelerated video encoding via Rockchip MPP (Linux) or x264 (cross-platform)
- AAC audio encoding - Real-time audio capture and encoding via ALSA (Linux) or WaveIn (Windows), with support for audio-only recording
- YUV color space conversion - Fast I420 conversion using libyuv
- RTSP streaming - Built-in RTSP server for live video and audio streaming
- MP4/MPEG-TS recording - Save video and audio streams to MP4 (via mp4v2) or MPEG-TS files
- Multi-threaded pipeline - Optimized processing with separate capture, conversion, and encoding threads
- No OpenCV dependency - Lightweight implementation using native APIs
- Face detection - CNN-based face detection with bounding box overlay using libfacedetection
- CMake 3.10+
- C++14 compatible compiler
- Windows: Visual Studio 2022+ with Windows SDK (DirectShow / VFW)
- Linux: V4L2 development libraries
All third-party dependencies are vendored in `3rdparty/` and built automatically via CMake ExternalProject.
Strongly Recommended: Use Visual Studio 2022's built-in CMake support (see below). This provides the best development experience with integrated debugging, IntelliSense, and seamless CMake configuration.
- Open Microsoft Visual Studio 2022 Community (or a later edition).
- Select Open Local Folder, then open the folder containing the root `CMakeLists.txt`.
- VS2022 automatically detects and configures the CMake project. Choose `x64-Debug` (or `x64-Release`) from the toolbar dropdown.
- Press F5 to build and debug.
For more details, refer to: CMake projects in Visual Studio
Simplified (uses the default generator and builds Debug):

```bat
mkdir build
cd build
cmake ..
cmake --build .
```

Without `-G`, CMake uses a 64-bit Visual Studio generator by default and builds the Debug configuration. For Release, add `--config Release` to the build command.

Explicit (recommended for reproducible builds):

```bat
mkdir build
cd build
cmake .. -G "Visual Studio 17 2022" -A x64
cmake --build . --config Release
```
```bash
# Install dependencies
sudo apt-get update
sudo apt-get install libomp-dev          # Required for libfacedetection OpenMP support
sudo apt-get install nasm                # Required for x264 assembly code building
sudo apt-get install libasound2-plugins  # Required for ALSA (pulse plugin)

# Build (rockchip-mpp hardware encoder is auto-built on Linux)
mkdir build
cd build
cmake ..
make
```

The program has two operating modes:

- RTSP streaming mode (default)
  - Starts the RTSP server, encodes the camera feed to H.264, and streams it via RTSP.
- JPEG capture mode
  - No RTSP; captures JPEG snapshots only (manual or periodic).
| Key | Action |
|---|---|
| `s` | Save current frame as JPEG |
| `q` | Quit program |
- Auto-capture: saves an image every 30 frames
- RTSP streaming: available on a configurable port (default 8554)
- Manual capture output: `snapshot_<timestamp>.jpg`
- Auto capture output: `auto_capture_<timestamp>.jpg`
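The auto-capture cadence and file naming above can be sketched as follows. This is a minimal illustration, not the code in `main.cpp`: the helper names `snapshotFileName` and `shouldAutoCapture` are hypothetical, and the `YYYYMMDD_HHMMSS` rendering of `<timestamp>` is an assumption, since the README does not specify the format.

```cpp
#include <cstddef>
#include <ctime>
#include <string>

// Hypothetical helper: builds "<prefix>_<timestamp>.jpg".
// Assumption: <timestamp> is rendered as YYYYMMDD_HHMMSS.
std::string snapshotFileName(const std::string& prefix, const std::tm& when) {
    char stamp[32];
    // strftime returns the number of characters written (0 if the buffer is too small).
    std::size_t n = std::strftime(stamp, sizeof(stamp), "%Y%m%d_%H%M%S", &when);
    return prefix + "_" + std::string(stamp, n) + ".jpg";
}

// Hypothetical helper: true on every `interval`-th frame (30, 60, 90, ...),
// matching "saves an image every 30 frames". Frame numbers start at 1.
bool shouldAutoCapture(unsigned long frameNumber, unsigned long interval = 30) {
    return interval != 0 && frameNumber != 0 && frameNumber % interval == 0;
}
```

A capture loop would call `shouldAutoCapture(frameNumber)` once per frame and, when it returns true, save the frame under `snapshotFileName("auto_capture", *std::localtime(&now))`.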
webcam_capture can be configured via command-line options (as parsed in main.cpp):

```bash
webcam_capture [options]
```

Common options:

- `--vfw` - Use VFW capture backend (default: DirectShow) (Windows only)
- `--test-pattern` - Use virtual test-pattern generator (color bars, no camera required)
- `--test-audio-wave` - Use virtual audio wave generator (sine tone, no microphone required)
- `--device <index>` - Camera device index (default: 0)
- `--jpeg` - Start in JPEG capture mode (default is RTSP mode)
- `--rtsp` - Start in RTSP streaming mode (default)
- `--port <port>` - RTSP server port (default: 8554)
- `--width <width>` - Video width (default: 1280)
- `--height <height>` - Video height (default: 720)
- `--fps <fps>` - Frame rate (default: 30)
- `--bitrate <bps>` - Video bitrate (default: 2000000, ~2 Mbps)
- `--encoder <enc>` - Encoder type: `x264` (cross-platform, default) or `rockchip` (Linux, hardware-accelerated)
- `--decoder <dec>` - JPEG decoder type: `libyuv` (cross-platform, default) or `rockchip` (Linux, hardware-accelerated)
- `--no-osd` - Disable OSD overlay (default: enabled)
- `--detect` - Enable face detection with bounding box overlay (default: disabled)
- `--audio` - Enable audio capture (default: disabled)
- `--audio-device <idx>` - Audio device index: 0, 1, 2, ... (default: auto)
- `--audio-sample-rate <hz>` - Audio sample rate (default: 16000)
- `--audio-channels <n>` - Audio channels: 1 = mono, 2 = stereo (default: 1)
- `--audio-bitrate <bps>` - Audio bitrate (default: 128000, FDK-AAC)
- `--record [file]` - Enable recording, optionally specifying the output filename (default: `record_<timestamp>.<format>`)
- `--format <fmt>` - Recording format: `mp4` (default), `ts`, or `aac` (audio-only)
- `--segment-duration <sec>` - Max duration per file in seconds (default: 300)
- `--segment-size <MB>` - Max size per file in MB (default: 1024)
- `--max-files <num>` - Max files to keep in loop mode (default: 100)
- `--loop` - Enable loop overwrite mode (auto-delete oldest files when the limit is reached)
- `--min-disk-space <MB>` - Minimum free disk space threshold (default: 500)
- `--help` - Show help
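Most options take exactly one value, but `--record [file]` takes an optional one, which is the subtle case when parsing this list. The following is a minimal sketch of how a few of the options could be parsed; the `Options` struct and `parseOptions` function are hypothetical, cover only a subset of the flags, and may differ from the actual parser in `main.cpp`.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical subset of the options above; defaults mirror the list.
struct Options {
    int width = 1280;
    int height = 720;
    int fps = 30;
    int port = 8554;
    std::string encoder = "x264";  // "x264" or "rockchip"
    bool record = false;
    std::string recordFile;        // empty: fall back to record_<timestamp>.<format>
};

Options parseOptions(int argc, const char* const argv[]) {
    Options opt;
    for (int i = 1; i < argc; ++i) {
        const std::string arg = argv[i];
        // Fetches the mandatory value that follows an option such as --width.
        auto value = [&]() -> std::string {
            if (i + 1 >= argc) throw std::invalid_argument("missing value for " + arg);
            return argv[++i];
        };
        if (arg == "--width") opt.width = std::stoi(value());
        else if (arg == "--height") opt.height = std::stoi(value());
        else if (arg == "--fps") opt.fps = std::stoi(value());
        else if (arg == "--port") opt.port = std::stoi(value());
        else if (arg == "--encoder") {
            opt.encoder = value();
            if (opt.encoder != "x264" && opt.encoder != "rockchip")
                throw std::invalid_argument("unknown encoder: " + opt.encoder);
        } else if (arg == "--record") {
            opt.record = true;
            // [file] is optional: consume the next token only if it is not another option.
            if (i + 1 < argc && std::string(argv[i + 1]).rfind("--", 0) != 0)
                opt.recordFile = argv[++i];
        } else {
            throw std::invalid_argument("unsupported option in this sketch: " + arg);
        }
    }
    return opt;
}
```

With this approach, `--record --encoder rockchip` enables recording with the default filename, while `--record mycam` uses `mycam` as the base name.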
Examples:

```bash
# Default: RTSP mode, 1280x720@30fps, 2Mbps, port 8554, OSD enabled, x264 encoder
webcam_capture

# Test pattern mode: color-bar generator (no camera needed, useful for debugging)
webcam_capture --test-pattern

# Test audio-wave mode: sine-wave generator (no microphone needed, useful for audio pipeline debugging)
webcam_capture --test-audio-wave

# Full virtual pipeline: test-pattern + test-audio-wave (no hardware needed)
webcam_capture --test-pattern --test-audio-wave

# RTSP mode at 1280x720@30fps with OSD disabled
webcam_capture --rtsp --width 1280 --height 720 --fps 30 --no-osd

# RTSP mode with audio capture (default: 16000 Hz, mono, 128 kbps FDK-AAC)
webcam_capture --audio

# RTSP mode with audio capture on a specific device
webcam_capture --audio --audio-device 1

# RTSP mode with stereo audio at 16 kHz and 128 kbps
webcam_capture --audio --audio-sample-rate 16000 --audio-channels 2 --audio-bitrate 128000

# RTSP mode with Rockchip hardware encoder (Linux/Rockchip boards only)
webcam_capture --rtsp --encoder rockchip

# RTSP mode with Rockchip hardware encoder and decoder (Linux/Rockchip boards only)
webcam_capture --rtsp --encoder rockchip --decoder rockchip

# RTSP mode with Rockchip hardware encoder and MP4 recording
webcam_capture --rtsp --encoder rockchip --record output

# RTSP mode with MP4 recording (video only)
webcam_capture --rtsp --record output

# RTSP mode with audio and MP4 recording
webcam_capture --audio --record output

# RTSP mode with audio-only AAC recording (no video)
webcam_capture --audio --record output --format aac

# RTSP mode with MPEG-TS recording
webcam_capture --rtsp --record output --format ts

# RTSP mode with 5-minute segment duration and loop recording (max 10 files)
webcam_capture --rtsp --record mycam --segment-duration 300 --max-files 10 --loop

# JPEG capture mode
webcam_capture --jpeg
```

With the default port, the RTSP stream is available at:

```text
rtsp://<server-ip>:8554/live
```
Video pipeline:

```text
Webcam Capture (V4L2 / DirectShow / VFW / TestPattern)
↓
FrameQueue
↓
YUV Converter
├─ JpegDecoder (JPEG to YUV420P)
│  ├─ LibyuvJpegDecoder (libyuv, cross-platform)
│  └─ RockchipJpegDecoder (Rockchip MPP, Linux/Rockchip boards)
└─ RgbConverter (BGR888 to YUV420P)
   ├─ LibyuvRgbConverter (libyuv, cross-platform)
   └─ RgaRgbConverter (Rockchip RGA, Linux/Rockchip boards)
↓
FrameQueue
↓
Detector (optional)
└─ ScaleConverter (YUV420P to BGR888)
   ├─ LibyuvScaleConverter (libyuv, cross-platform)
   └─ RgaScaleConverter (Rockchip RGA, Linux/Rockchip boards)
↓
FrameQueue
↓
OSD Renderer (optional)
↓
FrameQueue
↓
H264Encoder (abstraction layer)
├─ X264Encoder (x264, cross-platform)
└─ RockchipEncoder (Rockchip MPP, Linux/Rockchip boards)
↓
FrameQueue
↓
RTSP Server ←→ Video Writer (parallel)
↓              ↓
Network        MP4/TS File
```

Audio pipeline:

```text
Audio Capture (ALSA / WaveIn / TestWave)
↓
FrameQueue
↓
AACEncoder (FDK-AAC)
↓
FrameQueue
├─→ RTSP Server (audio stream)
└─→ Video Writer (MP4/TS/AAC recording)
```
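Each `FrameQueue` in the diagrams decouples a producer thread from a consumer thread. The project's own `FrameQueue` is not shown in this README; the following is a minimal sketch of one common design for such a queue, a bounded producer/consumer queue built on `std::mutex` and `std::condition_variable`. The class name `BoundedFrameQueue`, the drop-oldest policy, and the `close()` shutdown method are assumptions for illustration.

```cpp
#include <condition_variable>
#include <cstddef>
#include <deque>
#include <mutex>
#include <utility>

// Hypothetical bounded queue connecting two pipeline stages.
// Assumption: when full, the oldest frame is dropped so a slow consumer
// does not add latency to the live stream.
template <typename T>
class BoundedFrameQueue {
public:
    explicit BoundedFrameQueue(std::size_t capacity)
        : capacity_(capacity == 0 ? 1 : capacity) {}

    // Producer side: never blocks. Returns false if the queue is closed
    // or if an old frame had to be dropped to make room.
    bool push(T item) {
        bool dropped = false;
        {
            std::lock_guard<std::mutex> lock(mutex_);
            if (closed_) return false;
            if (items_.size() >= capacity_) {
                items_.pop_front();
                dropped = true;
            }
            items_.push_back(std::move(item));
        }
        cv_.notify_one();
        return !dropped;
    }

    // Consumer side: blocks until an item arrives or the queue is closed.
    // Returns false once the queue is closed and fully drained.
    bool pop(T& out) {
        std::unique_lock<std::mutex> lock(mutex_);
        cv_.wait(lock, [this] { return !items_.empty() || closed_; });
        if (items_.empty()) return false;
        out = std::move(items_.front());
        items_.pop_front();
        return true;
    }

    // Wakes all waiting consumers so their threads can exit cleanly.
    void close() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            closed_ = true;
        }
        cv_.notify_all();
    }

private:
    std::size_t capacity_;
    std::deque<T> items_;
    std::mutex mutex_;
    std::condition_variable cv_;
    bool closed_ = false;
};
```

Dropping the oldest frame favors freshness over completeness, which suits live RTSP output; a recording-oriented queue might instead block the producer so that no frames are lost.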
On Windows, the DirectShow capture backend does not use the deprecated Sample Grabber filter. Instead, it is built from scratch around two custom COM objects, a sink filter and its input pin, which together form a minimal DirectShow filter with a single input pin and implement:

- `IBaseFilter`: filter lifecycle (AddRef/Release, state transitions, graph management)
- `IEnumPins`: enumerates the single input pin
- `IPin` (input): handles connection requests and media type negotiation
- `IMemInputPin`: receives video frames via `Receive()`
```text
Camera Filter (Capture Device)
↓ (PIN_CATEGORY_CAPTURE)
FrameSinkFilter::FrameSinkInputPin::Receive()
↓
FrameQueue<WebcamFrame>
```
Frames arrive as `IMediaSample` buffers (e.g. MJPG or YUY2) and are pushed directly into the `FrameQueue`, without passing through Sample Grabber callbacks.
| Format | GUID | Notes |
|---|---|---|
| MJPG | MEDIASUBTYPE_MJPG | Preferred for USB cameras |
| YUY2 | MEDIASUBTYPE_YUY2 | Uncompressed, higher bandwidth |
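The bandwidth note is easy to quantify: YUY2 stores 2 bytes per pixel, so a 1280x720 frame is 1280 × 720 × 2 = 1,843,200 bytes, about 55 MB/s at 30 fps, which is close to the 480 Mbit/s (60 MB/s) signaling rate of USB 2.0. This is why MJPG is the usual choice for USB cameras. YUY2 is packed 4:2:2, while the rest of the pipeline works in I420 (planar 4:2:0). The README does not say how YUY2 frames are converted; libyuv's `YUY2ToI420` is one option. The plain-C++ sketch below only illustrates what such a conversion does; the function name is hypothetical, and production converters such as libyuv are SIMD-optimized and may filter chroma differently.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Illustrative YUY2 (packed 4:2:2, byte order Y0 U Y1 V) to I420 (planar
// Y, then U, then V at 4:2:0) conversion. Assumes even width and height.
// Chroma is sampled from even rows only (no vertical averaging).
std::vector<uint8_t> yuy2ToI420(const uint8_t* src, int width, int height) {
    const std::size_t ySize = static_cast<std::size_t>(width) * height;
    const std::size_t cSize = ySize / 4;
    std::vector<uint8_t> dst(ySize + 2 * cSize);
    uint8_t* yPlane = dst.data();
    uint8_t* uPlane = yPlane + ySize;
    uint8_t* vPlane = uPlane + cSize;
    const int srcStride = width * 2;  // 2 bytes per pixel in YUY2

    for (int row = 0; row < height; ++row) {
        const uint8_t* line = src + static_cast<std::size_t>(row) * srcStride;
        for (int x = 0; x < width; x += 2) {
            const uint8_t* px = line + x * 2;  // one Y0 U Y1 V macropixel = 2 pixels
            yPlane[row * width + x] = px[0];
            yPlane[row * width + x + 1] = px[2];
            if (row % 2 == 0) {  // 4:2:0 keeps one chroma row per two luma rows
                const int c = (row / 2) * (width / 2) + x / 2;
                uPlane[c] = px[1];
                vPlane[c] = px[3];
            }
        }
    }
    return dst;
}
```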
If ICaptureGraphBuilder2::RenderStream fails to build the graph automatically, the code falls back to manual pin enumeration and IGraphBuilder::Connect.
The program supports three recording formats:
| Format | File Extension | Description |
|---|---|---|
| MP4 | `.mp4` | H.264 video + AAC audio in a standard container |
| MPEG-TS | `.ts` | H.264 + AAC in an MPEG Transport Stream container |
| AAC | `.aac` | Audio-only raw AAC stream with ADTS headers |
When `--format aac` is used, only audio is captured and saved; no video encoding or processing takes place. This is useful for standalone audio recording.
The output .aac file contains raw AAC frames with ADTS headers prepended, making it directly playable by most media players and compatible with standard AAC decoders.
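The ADTS header is what makes a raw `.aac` file self-describing: every frame carries its own profile, sample rate, channel count, and length. The sketch below builds the standard 7-byte ADTS header without CRC, as specified in ISO/IEC 13818-7 and ISO/IEC 14496-3. It assumes AAC-LC; the function name is hypothetical, and the README does not say whether this project writes these headers itself or lets FDK-AAC emit ADTS framing directly (FDK-AAC supports this through its `AACENC_TRANSMUX` parameter).

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <stdexcept>

// Builds a 7-byte ADTS header (protection_absent = 1, so no CRC) for one
// raw AAC frame of `payloadBytes` bytes.
// audioObjectType 2 = AAC-LC (assumed; the README does not state the profile).
std::array<uint8_t, 7> makeAdtsHeader(std::size_t payloadBytes, int sampleRate,
                                      int channels, int audioObjectType = 2) {
    static const int kRates[] = {96000, 88200, 64000, 48000, 44100, 32000, 24000,
                                 22050, 16000, 12000, 11025, 8000, 7350};
    int freqIndex = -1;
    for (int i = 0; i < 13; ++i) {
        if (kRates[i] == sampleRate) { freqIndex = i; break; }
    }
    if (freqIndex < 0) throw std::invalid_argument("unsupported sample rate");
    if (channels < 1 || channels > 7) throw std::invalid_argument("bad channel count");

    const std::size_t frameLength = payloadBytes + 7;  // length includes the header
    if (frameLength > 0x1FFF) throw std::length_error("frame too large for 13-bit field");
    const int profile = audioObjectType - 1;           // ADTS stores object type - 1

    std::array<uint8_t, 7> h{};
    h[0] = 0xFF;  // syncword 0xFFF, high 8 bits
    h[1] = 0xF1;  // syncword low 4 bits, ID = 0 (MPEG-4), layer = 00, no CRC
    h[2] = static_cast<uint8_t>((profile << 6) | (freqIndex << 2) | (channels >> 2));
    h[3] = static_cast<uint8_t>(((channels & 0x3) << 6) | (frameLength >> 11));
    h[4] = static_cast<uint8_t>((frameLength >> 3) & 0xFF);
    h[5] = static_cast<uint8_t>(((frameLength & 0x7) << 5) | 0x1F);  // buffer fullness 0x7FF (VBR)
    h[6] = 0xFC;  // remaining fullness bits, one raw data block per frame
    return h;
}
```

Writing the file then amounts to emitting `makeAdtsHeader(n, rate, ch)` followed by the `n` encoded bytes for each AAC frame.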
The program includes a built-in file manager supporting automatic segmentation and loop recording:
Segmentation: files are split by duration (`--segment-duration`) or size (`--segment-size`). When either limit is reached, a new file is created automatically:

```text
output_001.mp4  ← 5 min
output_002.mp4  ← 5 min
output_003.mp4  ← 5 min
```
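The naming scheme and the "either limit" rule can be sketched as two small helpers. Both names are hypothetical, and treating 1 MB as 1024 × 1024 bytes is an assumption; the defaults mirror `--segment-duration 300` and `--segment-size 1024`.

```cpp
#include <cstdint>
#include <cstdio>
#include <string>

// Hypothetical: builds "<base>_NNN.<ext>", e.g. output_001.mp4.
std::string segmentFileName(const std::string& base, int index, const std::string& ext) {
    char num[16];
    std::snprintf(num, sizeof(num), "%03d", index);  // zero-padded to 3 digits
    return base + "_" + num + "." + ext;
}

// Hypothetical: a new segment starts as soon as either limit is reached.
// Assumption: 1 MB = 1024 * 1024 bytes.
bool shouldStartNewSegment(double elapsedSec, std::uint64_t bytesWritten,
                           double maxDurationSec = 300, std::uint64_t maxSizeMB = 1024) {
    const std::uint64_t maxBytes = maxSizeMB * 1024ull * 1024ull;
    return elapsedSec >= maxDurationSec || bytesWritten >= maxBytes;
}
```

In practice, a writer would typically wait for the next IDR frame before switching files, so that each segment starts with a decodable picture; the README does not say whether this project does so.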
Loop recording: when `--loop` is enabled, the program automatically deletes the oldest files once `--max-files` is reached, which bounds the disk space used by recordings:

```bash
webcam_capture --rtsp --record mycam --segment-duration 300 --max-files 10 --loop
```

This keeps the 10 most recent segment files and deletes the oldest one when a new segment starts.
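The bookkeeping behind loop recording can be as simple as a FIFO of finished segment paths. The class below is a hypothetical sketch: it only decides which files have expired and leaves the actual deletion (for example with `std::remove` from `<cstdio>`) to the caller. Treating `maxFiles == 0` as "unlimited" is an assumption.

```cpp
#include <cstddef>
#include <deque>
#include <string>
#include <vector>

// Hypothetical loop-recording bookkeeping: remembers segments in creation
// order and reports which of the oldest ones must go to honor --max-files.
class LoopRetention {
public:
    explicit LoopRetention(std::size_t maxFiles) : maxFiles_(maxFiles) {}

    // Registers a newly created segment and returns the paths that should
    // now be deleted (oldest first). maxFiles == 0 means no limit.
    std::vector<std::string> addSegment(const std::string& path) {
        files_.push_back(path);
        std::vector<std::string> expired;
        while (maxFiles_ > 0 && files_.size() > maxFiles_) {
            expired.push_back(files_.front());
            files_.pop_front();
        }
        return expired;
    }

private:
    std::size_t maxFiles_;
    std::deque<std::string> files_;
};
```

This sketch only tracks files created during the current run; a real implementation would likely also scan the output directory at startup so that segments left by a previous run count toward the limit.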
The manager also monitors disk space (--min-disk-space) and stops recording gracefully if free space drops below the threshold.
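A free-space check like the one above can be sketched on Linux with POSIX `statvfs`. The function name is hypothetical; on Windows the equivalent query would be `GetDiskFreeSpaceExW`. Returning `false` when the query fails is a design assumption, so that an unreadable target directory also stops recording.

```cpp
#include <cstdint>
#include <sys/statvfs.h>

// POSIX-only sketch: true if the filesystem holding `dir` has at least
// `minFreeMB` megabytes available to unprivileged users.
bool hasEnoughDiskSpace(const char* dir, std::uint64_t minFreeMB) {
    struct statvfs st;
    if (statvfs(dir, &st) != 0) return false;  // treat query failure as "not enough"
    // f_bavail: free blocks available to non-root users; f_frsize: fragment (block) size.
    const std::uint64_t freeBytes =
        static_cast<std::uint64_t>(st.f_bavail) * static_cast<std::uint64_t>(st.f_frsize);
    return freeBytes / (1024ull * 1024ull) >= minFreeMB;
}
```

A recorder would call `hasEnoughDiskSpace(outputDir, 500)` periodically (500 being the `--min-disk-space` default) and finalize the current file when it returns false.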
When running the program on Linux, you may encounter the following error:

```text
ALSA lib ...: Cannot open shared library libasound_module_conf_pulse.so
ALSA lib ...: Unknown PCM default
```
This occurs because the ALSA library bundled with the program (built under `3rdparty/`) cannot locate the system's PulseAudio plugin. There are two solutions:
Method 1: Set the plugin directory via an environment variable (temporary)

- First, locate `libasound_module_conf_pulse.so` on your system:

  ```bash
  find / -name libasound_module_conf_pulse.so 2>/dev/null
  ```

  The output typically looks like this (the directory depends on your architecture):

  ```text
  /usr/lib/aarch64-linux-gnu/alsa-lib/libasound_module_conf_pulse.so
  ```

- Run the program with the plugin directory specified:

  ```bash
  ALSA_PLUGIN_DIR=/usr/lib/aarch64-linux-gnu/alsa-lib/ ./webcam_capture --device 1 --encoder rockchip --audio
  ```

Method 2: Create symbolic links (permanent)
If you prefer not to set the environment variable every time, create symbolic links pointing to the system's ALSA plugins:

```bash
sudo mkdir -p <build-dir>/3rdparty/alsa-lib-install/lib/alsa-lib/
sudo ln -s /usr/lib/aarch64-linux-gnu/alsa-lib/libasound_module_conf_pulse.so <build-dir>/3rdparty/alsa-lib-install/lib/alsa-lib/
sudo ln -s /usr/lib/aarch64-linux-gnu/alsa-lib/libasound_module_pcm_pulse.so <build-dir>/3rdparty/alsa-lib-install/lib/alsa-lib/
```

Replace `<build-dir>` with your actual build directory, e.g. `build` or `out/build/x64-Debug`.
x264: Compiling on Windows Produces Libraries Without Assembly Optimizations, Resulting in Low Efficiency
The unofficial CMake script used to compile x264 on Windows does not support assembling x264's assembly source files. As a result, the x264 library built on Windows is pure C, which makes video encoding significantly slower. One workaround is to replace the x264 library generated by this project with a pre-built one that includes assembly optimizations, then relink.
Steps:

- Perform a full build of the project.
- Download a pre-built x264 library with assembly optimizations from the ShiftMediaProject/x264 releases. This project uses Visual Studio 2022 (MSVC17) and x264 build 164, so download the package `libx264_0.164.r3194_msvc17.zip`.
- Copy `lib\x64\libx264.lib` from the archive to `<build-dir>\3rdparty\x264-install\lib\` (e.g. `out\build\x64-Debug\3rdparty\x264-install\lib\`), and rename it to `x264_static.lib`.
- Build again without cleaning (do not use Rebuild or Clean). Only the link step runs, so the assembly-optimized x264 library is linked into the target executable.
This project is licensed under the MIT License - see the LICENSE file for details.