Webcam Capture Program

A cross-platform webcam capture application featuring a real-time video/audio processing pipeline, RTSP streaming, MP4/MPEG-TS/AAC recording, and CNN-based face detection. On Rockchip platforms, Rockchip MPP/RGA hardware acceleration provides efficient H.264 encoding, JPEG decoding, and 2D image operations.

Features

Real-time webcam capture - Windows (DirectShow + VFW) and Linux (V4L2) support
JPEG image capture - Save snapshots manually or automatically
H.264 encoding - Hardware-accelerated encoding via Rockchip MPP (Linux, Rockchip boards) or software encoding via x264 (cross-platform)
AAC audio encoding - Real-time audio capture and encoding via ALSA (Linux) or WaveIn (Windows), with support for audio-only recording
YUV color space conversion - Fast I420 conversion using libyuv
RTSP streaming - Built-in RTSP server for live video and audio streaming
MP4/MPEG-TS recording - Save the video and audio streams to an MP4 file (via mp4v2) or an MPEG-TS file
Multi-threaded pipeline - Optimized processing with separate capture, conversion, and encoding threads
No OpenCV dependency - Lightweight implementation using native APIs
Face detection - CNN-based face detection with bounding box overlay using libfacedetection

Dependencies

CMake 3.10+
C++14 compatible compiler
Windows: Visual Studio 2022+ with Windows SDK (DirectShow / VFW)
Linux: V4L2 development libraries

All third-party dependencies are integrated in 3rdparty/ and built automatically via CMake ExternalProject:

Library	Purpose	URL
libjpeg-turbo	JPEG encoding/decoding	https://github.com/winlibs/libjpeg
libyuv	YUV color space conversion	https://github.com/lemenkov/libyuv
x264	H.264 software encoding (cross-platform)	https://github.com/Pawday/x264-cmake
rockchip-mpp	H.264 hardware encoding (Linux/Rockchip only)	https://github.com/rockchip-linux/mpp
rockchip-librga	2D hardware accelerator (Linux/Rockchip only)	https://github.com/tsukumijima/librga-rockchip
live555	RTSP streaming server	https://github.com/melchi45/live555
freetype	Font rendering for OSD text	https://download.savannah.gnu.org/releases/freetype/
libfacedetection	CNN-based face detection	https://github.com/ShiqiYu/libfacedetection
mp4v2	MP4 file recording	https://github.com/enzo1982/mp4v2
alsa-lib	Audio capture (Linux)	https://www.alsa-project.org/files/pub/lib/
fdk-aac	AAC audio encoding	https://github.com/mstorsjo/fdk-aac

Building

Windows

Strongly Recommended: Use Visual Studio 2022's built-in CMake support (see below). This provides the best development experience with integrated debugging, IntelliSense, and seamless CMake configuration.

Visual Studio 2022 (Recommended)

Open Microsoft Visual Studio 2022 Community (or later edition).
Select Open Local Folder, then open the folder containing the root CMakeLists.txt.
VS2022 will automatically detect and configure the CMake project. Choose x64-Debug (or x64-Release) from the toolbar dropdown.
Press F5 to build and debug.

For more details, refer to: CMake projects in Visual Studio

Command Line

Simplified (uses default generator, builds Debug by default)

mkdir build
cd build
cmake ..
cmake --build .

Explicit (recommended for reproducible builds)

mkdir build
cd build
cmake .. -G "Visual Studio 17 2022" -A x64
cmake --build . --config Release

The simplified form defaults to the Visual Studio 64-bit generator and builds Debug. For a Release build, add --config Release to the cmake --build command, as in the explicit form.

Linux

# Install dependencies
sudo apt-get update
sudo apt-get install libomp-dev    # Required for libfacedetection OpenMP support
sudo apt-get install nasm          # Required for x264 assembly code building
sudo apt-get install libasound2-plugins    # Required for ALSA (pulse plugin)

# Build (rockchip-mpp hardware encoder is auto-built on Linux)
mkdir build
cd build
cmake ..
make

Usage

Basic Modes

RTSP streaming mode (default)
- Start RTSP server, encode camera feed to H.264 and stream via RTSP.
JPEG capture mode
- No RTSP; capture JPEG snapshots only (manual or periodic).

Controls (JPEG mode)

Key	Action
`s`	Save current frame as JPEG
`q`	Quit program

Automatic Features

Auto-capture: Saves an image every 30 frames
RTSP streaming: Available on configurable port (default 8554)

Output Files

Manual capture: snapshot_<timestamp>.jpg
Auto capture: auto_capture_<timestamp>.jpg
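
The naming pattern and the 30-frame cadence above can be sketched as follows. This is a minimal illustration, not the project's code: the function names are hypothetical, and the timestamp layout (YYYYMMDD_HHMMSS in UTC) is an assumption, since only the `<prefix>_<timestamp>.jpg` pattern is documented.

```cpp
#include <cstdint>
#include <ctime>
#include <string>

// Builds "<prefix>_<timestamp>.jpg".
// Assumption: the timestamp is formatted as YYYYMMDD_HHMMSS in UTC.
std::string makeCaptureName(const std::string& prefix, std::time_t when) {
    char stamp[32] = {0};
    std::tm utc = *std::gmtime(&when);  // copy out of gmtime's static storage
    std::strftime(stamp, sizeof(stamp), "%Y%m%d_%H%M%S", &utc);
    return prefix + "_" + stamp + ".jpg";
}

// True on every `interval`-th frame (30, 60, 90, ... when counting from 1).
bool isAutoCaptureFrame(std::uint64_t frameNumber, std::uint64_t interval = 30) {
    return interval != 0 && frameNumber != 0 && frameNumber % interval == 0;
}
```

A capture loop could call isAutoCaptureFrame with its running frame counter and, when it returns true, write the frame to makeCaptureName("auto_capture", std::time(nullptr)).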

Program Arguments

webcam_capture can be configured via command-line options (same as in main.cpp):

webcam_capture [options]

Common options:

--vfw - Use VFW capture backend (default: DirectShow) (Windows only)
--test-pattern - Use virtual test-pattern generator (color bars, no camera required)
--test-audio-wave - Use virtual audio wave generator (sine tone, no microphone required)
--device <index> - Camera device index (default: 0)
--jpeg - Start in JPEG capture mode (default is RTSP mode)
--rtsp - Start in RTSP streaming mode (default)
--port <port> - RTSP server port (default: 8554)
--width <width> - Video width (default: 1280)
--height <height> - Video height (default: 720)
--fps <fps> - Frame rate (default: 30)
--bitrate <bps> - Video bitrate (default: 2000000, ~2 Mbps)
--encoder <enc> - Encoder type: x264 (cross-platform, default) or rockchip (Linux, hardware-accelerated)
--decoder <dec> - JPEG decoder type: libyuv (cross-platform, default) or rockchip (Linux, hardware-accelerated)
--no-osd - Disable OSD overlay (default: enabled)
--detect - Enable face detection with bounding box overlay (default: disabled)
--audio - Enable audio capture (default: disabled)
--audio-device <idx> - Audio device index: 0,1,2,... (default: auto)
--audio-sample-rate <hz> - Audio sample rate (default: 16000)
--audio-channels <n> - Audio channels: 1=mono, 2=stereo (default: 1)
--audio-bitrate <bps> - Audio bitrate (default: 128000, FDK-AAC)
--record [file] - Enable recording and optionally specify the output filename (default: record_<timestamp>.<format>)
--format <fmt> - Recording format: mp4 (default), ts, or aac (audio-only)
--segment-duration <sec> - Max duration per file in seconds (default: 300)
--segment-size <MB> - Max size per file in MB (default: 1024)
--max-files <num> - Max files to keep in loop mode (default: 100)
--loop - Enable loop overwrite mode (auto-delete oldest files when limit is reached)
--min-disk-space <MB> - Minimum free disk space threshold (default: 500)
--help - Show help
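
To illustrate how a few of these options and their defaults fit together, here is a minimal parsing sketch. The Options struct and parseOptions function are hypothetical names used only for this example; the actual parsing lives in main.cpp and may be structured differently. Only a subset of the options is handled.

```cpp
#include <cstddef>
#include <exception>
#include <string>
#include <vector>

// Hypothetical subset of the options, using the defaults listed above.
struct Options {
    bool jpegMode = false;         // --jpeg (RTSP mode is the default)
    int port = 8554;               // --port
    int width = 1280;              // --width
    int height = 720;              // --height
    int fps = 30;                  // --fps
    int bitrate = 2000000;         // --bitrate
    std::string encoder = "x264";  // --encoder
};

// Parses the arguments (without the program name). Returns false on an
// unknown option, a missing value, or a value that is not a valid integer.
bool parseOptions(const std::vector<std::string>& args, Options& out) {
    for (std::size_t i = 0; i < args.size(); ++i) {
        const std::string& key = args[i];
        if (key == "--jpeg") { out.jpegMode = true; continue; }
        if (key == "--rtsp") { out.jpegMode = false; continue; }
        if (i + 1 >= args.size()) return false;  // all other options take a value
        const std::string& value = args[++i];
        if (key == "--encoder") { out.encoder = value; continue; }

        int number = 0;
        try {
            std::size_t used = 0;
            number = std::stoi(value, &used);
            if (used != value.size()) return false;  // trailing junk such as "80x"
        } catch (const std::exception&) {
            return false;  // not a number, or out of range for int
        }
        if (key == "--port") out.port = number;
        else if (key == "--width") out.width = number;
        else if (key == "--height") out.height = number;
        else if (key == "--fps") out.fps = number;
        else if (key == "--bitrate") out.bitrate = number;
        else return false;  // unknown option
    }
    return true;
}
```

Options that are not given keep their defaults, so parsing an empty argument list yields the same configuration as running webcam_capture with no arguments.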

Examples:

# Default: RTSP mode, 1280x720@30fps, 2Mbps, port 8554, OSD enabled, x264 encoder
webcam_capture

# Test pattern mode: color-bar generator (no camera needed, useful for debugging)
webcam_capture --test-pattern

# Test audio-wave mode: sine-wave generator (no microphone needed, useful for audio pipeline debugging)
webcam_capture --test-audio-wave

# Full virtual pipeline: test-pattern + test-audio-wave (no hardware needed)
webcam_capture --test-pattern --test-audio-wave

# RTSP mode at 1280x720@30fps with OSD disabled
webcam_capture --rtsp --width 1280 --height 720 --fps 30 --no-osd

# RTSP mode with audio capture (default: 16000Hz, mono, 128kbps FDK-AAC)
webcam_capture --audio

# RTSP mode with audio capture on specific device
webcam_capture --audio --audio-device 1

# RTSP mode with stereo audio at 16kHz and 128kbps
webcam_capture --audio --audio-sample-rate 16000 --audio-channels 2 --audio-bitrate 128000

# RTSP mode with Rockchip hardware encoder (Linux/Rockchip boards only)
webcam_capture --rtsp --encoder rockchip

# RTSP mode with Rockchip hardware encoder and decoder (Linux/Rockchip boards only)
webcam_capture --rtsp --encoder rockchip --decoder rockchip

# RTSP mode with Rockchip hardware encoder and MP4 recording
webcam_capture --rtsp --encoder rockchip --record output

# RTSP mode with MP4 recording (video only)
webcam_capture --rtsp --record output

# RTSP mode with audio and MP4 recording
webcam_capture --audio --record output

# Audio-only AAC recording (no video is encoded or recorded)
webcam_capture --audio --record output --format aac

# RTSP mode with MPEG-TS recording
webcam_capture --rtsp --record output --format ts

# RTSP mode with 5-minute segment duration and loop recording (max 10 files)
webcam_capture --rtsp --record mycam --segment-duration 300 --max-files 10 --loop

# JPEG capture mode
webcam_capture --jpeg
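
The --encoder examples above switch between two implementations that sit behind the H264Encoder abstraction described under Pipeline Flow. One way such a selection could be wired up is sketched below. The class names come from this README, but the method signatures, the stub bodies, and the createEncoder factory are assumptions made for illustration.

```cpp
#include <cstdint>
#include <memory>
#include <string>
#include <vector>

// Hypothetical interface: the README names H264Encoder, X264Encoder and
// RockchipEncoder, but does not show their methods.
class H264Encoder {
public:
    virtual ~H264Encoder() = default;
    virtual std::string name() const = 0;
    // Takes one I420 frame and returns the encoded H.264 bytes for it.
    virtual std::vector<std::uint8_t> encode(const std::vector<std::uint8_t>& i420Frame) = 0;
};

// Placeholder body: a real implementation would call into x264.
class X264Encoder : public H264Encoder {
public:
    std::string name() const override { return "x264"; }
    std::vector<std::uint8_t> encode(const std::vector<std::uint8_t>&) override { return {}; }
};

// Placeholder body: a real implementation would call into Rockchip MPP.
class RockchipEncoder : public H264Encoder {
public:
    std::string name() const override { return "rockchip"; }
    std::vector<std::uint8_t> encode(const std::vector<std::uint8_t>&) override { return {}; }
};

// Maps the --encoder value to an implementation; nullptr for unknown values.
std::unique_ptr<H264Encoder> createEncoder(const std::string& type) {
    if (type == "x264") return std::unique_ptr<H264Encoder>(new X264Encoder());
    if (type == "rockchip") return std::unique_ptr<H264Encoder>(new RockchipEncoder());
    return nullptr;
}
```

With this shape, the rest of the pipeline only depends on the interface, so the encoder thread does not need to know which backend was requested.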

RTSP URL

rtsp://<server-ip>:8554/live

Pipeline Flow

Webcam Capture (V4L2 / DirectShow / VFW / TestPattern)
    ↓
FrameQueue
    ↓
YUV Converter
    ├─ JpegDecoder (JPEG to YUV420P)
    │      ├─ LibyuvJpegDecoder     (libyuv, cross-platform)
    │      └─ RockchipJpegDecoder   (Rockchip MPP, Linux/Rockchip boards)
    └─ RgbConverter (BGR888 to YUV420P)
           ├─ LibyuvRgbConverter    (libyuv, cross-platform)
           └─ RgaRgbConverter       (Rockchip RGA, Linux/Rockchip boards)
    ↓
FrameQueue
    ↓
Detector (optional)
    └─ ScaleConverter (YUV420P to BGR888)
           ├─ LibyuvScaleConverter  (libyuv, cross-platform)
           └─ RgaScaleConverter     (Rockchip RGA, Linux/Rockchip boards)
    ↓
FrameQueue
    ↓
OSD Renderer (optional)
    ↓
FrameQueue
    ↓
H264Encoder (abstraction layer)
    ├─ X264Encoder         (x264, cross-platform)
    └─ RockchipEncoder     (Rockchip MPP, Linux/Rockchip boards)
    ↓
FrameQueue
    ↓
RTSP Server ←→ Video Writer (parallel)
    ↓              ↓
Network       MP4/TS File
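
For readers unfamiliar with the RgbConverter step (BGR888 to YUV420P), the plain C++ sketch below shows what the conversion computes. It is a reference illustration only: the project delegates this work to libyuv or Rockchip RGA, and the commonly used BT.601 limited-range integer coefficients here are an assumption, so results may differ slightly from those libraries. The I420Image struct and bgrToI420 function are hypothetical names.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

struct I420Image {
    int width = 0;
    int height = 0;
    std::vector<std::uint8_t> y;  // width * height luma samples
    std::vector<std::uint8_t> u;  // (width/2) * (height/2) chroma samples
    std::vector<std::uint8_t> v;  // (width/2) * (height/2) chroma samples
};

// Converts packed BGR888 (3 bytes per pixel, no row padding) to I420.
// Width and height must be even. Chroma is computed from the average
// colour of each 2x2 pixel block.
I420Image bgrToI420(const std::uint8_t* bgr, int width, int height) {
    I420Image out;
    out.width = width;
    out.height = height;
    const std::size_t w = static_cast<std::size_t>(width);
    const std::size_t cw = w / 2;
    out.y.resize(w * static_cast<std::size_t>(height));
    out.u.resize(cw * static_cast<std::size_t>(height / 2));
    out.v.resize(out.u.size());

    for (int row = 0; row < height; row += 2) {
        for (int col = 0; col < width; col += 2) {
            int sumB = 0, sumG = 0, sumR = 0;
            for (int dy = 0; dy < 2; ++dy) {
                for (int dx = 0; dx < 2; ++dx) {
                    const std::size_t idx = static_cast<std::size_t>(row + dy) * w + (col + dx);
                    const std::uint8_t* p = bgr + idx * 3;
                    const int b = p[0], g = p[1], r = p[2];
                    out.y[idx] = static_cast<std::uint8_t>(((66 * r + 129 * g + 25 * b + 128) >> 8) + 16);
                    sumB += b; sumG += g; sumR += r;
                }
            }
            const int b = (sumB + 2) / 4, g = (sumG + 2) / 4, r = (sumR + 2) / 4;
            const std::size_t cidx = static_cast<std::size_t>(row / 2) * cw + col / 2;
            // 32768 (= 128 << 8) folds the +128 chroma offset into the sum,
            // so the value being shifted is never negative.
            out.u[cidx] = static_cast<std::uint8_t>((-38 * r - 74 * g + 112 * b + 128 + 32768) >> 8);
            out.v[cidx] = static_cast<std::uint8_t>((112 * r - 94 * g - 18 * b + 128 + 32768) >> 8);
        }
    }
    return out;
}
```

The output holds one luma sample per pixel and one U and one V sample per 2x2 block, which is why an I420 frame occupies 1.5 bytes per pixel compared with 3 for BGR888.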

Audio Pipeline (parallel)

Audio Capture (ALSA / WaveIn / TestWave)
    ↓
FrameQueue
    ↓
AACEncoder (FDK-AAC)
    ↓
FrameQueue
    ├─→ RTSP Server (audio stream)
    └─→ Video Writer (MP4/TS/AAC recording)
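
Both pipelines hand data from one stage to the next through a FrameQueue. The sketch below shows one plausible shape for such a queue: bounded, thread-safe, and blocking on the consumer side. The drop-oldest policy when the queue is full, the close() method, and the member names are assumptions for illustration; the project's real FrameQueue may behave differently.

```cpp
#include <condition_variable>
#include <cstddef>
#include <deque>
#include <mutex>
#include <utility>

// Bounded, thread-safe queue between two pipeline stages.
// Assumption: when full, the oldest frame is dropped so that a slow
// consumer never stalls the producer (for example, the capture thread).
template <typename T>
class FrameQueue {
public:
    explicit FrameQueue(std::size_t capacity)
        : capacity_(capacity == 0 ? 1 : capacity) {}

    // Adds a frame; if the queue is full, the oldest frame is discarded first.
    void push(T frame) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            if (closed_) return;
            if (frames_.size() == capacity_) {
                frames_.pop_front();
                ++dropped_;
            }
            frames_.push_back(std::move(frame));
        }
        ready_.notify_one();
    }

    // Blocks until a frame is available. Returns false once the queue has
    // been closed and drained.
    bool pop(T& out) {
        std::unique_lock<std::mutex> lock(mutex_);
        ready_.wait(lock, [this] { return closed_ || !frames_.empty(); });
        if (frames_.empty()) return false;
        out = std::move(frames_.front());
        frames_.pop_front();
        return true;
    }

    // Wakes all waiting consumers; later push() calls are ignored.
    void close() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            closed_ = true;
        }
        ready_.notify_all();
    }

    std::size_t dropped() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return dropped_;
    }

private:
    const std::size_t capacity_;
    mutable std::mutex mutex_;
    std::condition_variable ready_;
    std::deque<T> frames_;
    std::size_t dropped_ = 0;
    bool closed_ = false;
};
```

Each stage thread would loop on pop() from its input queue and push() to its output queue, exiting when pop() returns false after shutdown.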

DirectShow Backend (Windows)

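On Windows, the default capture backend is built on DirectShow but does not use the deprecated Sample Grabber filter. Instead, the capture graph ends in a sink implemented from scratch as two custom COM objects: the filter (FrameSinkFilter) and its input pin (FrameSinkInputPin).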

FrameSinkFilter

A minimal DirectShow filter with a single input pin that implements:

IBaseFilter — filter lifecycle (AddRef/Release, state transitions, graph management)
IEnumPins — enumerates the single input pin
IPin (input) — receives connection requests and media type negotiation
IMemInputPin — receives video frames via Receive()

Frame Flow

Camera Filter (Capture Device)
        ↓ (PIN_CATEGORY_CAPTURE)
FrameSinkFilter::FrameSinkInputPin::Receive()
        ↓
FrameQueue<WebcamFrame>

Frames arrive as IMediaSample buffers in the camera's native format (e.g. compressed MJPG or uncompressed YUY2) and are pushed directly into the FrameQueue, without an intermediate copy through Sample Grabber callbacks.

Format Support

Format	GUID	Notes
MJPG	MEDIASUBTYPE_MJPG	Preferred for USB cameras
YUY2	MEDIASUBTYPE_YUY2	Uncompressed, higher bandwidth

Fallback

If ICaptureGraphBuilder2::RenderStream fails to build the graph automatically, the code falls back to manual pin enumeration and IGraphBuilder::Connect.

Recording

The program supports three recording formats:

Format	File Extension	Description
MP4	`.mp4`	H.264 video + AAC audio in a standard container
MPEG-TS	`.ts`	H.264 + AAC in an MPEG Transport Stream container
AAC	`.aac`	Audio-only raw AAC stream with ADTS headers

AAC File Recording

When --format aac is used, only audio is captured and saved — no video encoding or processing occurs. This is useful for standalone audio recording scenarios.

The output .aac file contains raw AAC frames with ADTS headers prepended, making it directly playable by most media players and compatible with standard AAC decoders.
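
For reference, the 7-byte ADTS header that precedes each AAC frame can be built as in the sketch below. This illustrates the header layout only. Whether the project assembles the header itself or asks the encoder to emit ADTS framing is not stated here, and the function names are hypothetical. The sketch assumes AAC-LC, MPEG-4 signalling, and no CRC.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Returns the ADTS sampling_frequency_index for a sample rate, or -1 if the
// rate is not in the standard table.
int adtsSampleRateIndex(int sampleRate) {
    static const int kRates[] = {96000, 88200, 64000, 48000, 44100, 32000, 24000,
                                 22050, 16000, 12000, 11025, 8000,  7350};
    for (int i = 0; i < 13; ++i) {
        if (kRates[i] == sampleRate) return i;
    }
    return -1;
}

// Builds the 7-byte ADTS header (no CRC) for one AAC-LC frame.
// freqIndex comes from adtsSampleRateIndex(); channelConfig is 1 for mono
// and 2 for stereo. payloadBytes must be at most 8184, because the 13-bit
// frame length field counts the header plus the payload.
std::array<std::uint8_t, 7> makeAdtsHeader(int freqIndex, int channelConfig,
                                           std::size_t payloadBytes) {
    const unsigned frameLen = static_cast<unsigned>(payloadBytes + 7);
    const unsigned profile = 1;  // audio object type 2 (AAC-LC) minus 1
    const unsigned freq = static_cast<unsigned>(freqIndex);
    const unsigned chan = static_cast<unsigned>(channelConfig);

    std::array<std::uint8_t, 7> h{};
    h[0] = 0xFF;  // syncword, high 8 bits
    h[1] = 0xF1;  // syncword low 4 bits, MPEG-4, layer 0, protection_absent = 1
    h[2] = static_cast<std::uint8_t>((profile << 6) | (freq << 2) | (chan >> 2));
    h[3] = static_cast<std::uint8_t>(((chan & 3u) << 6) | (frameLen >> 11));
    h[4] = static_cast<std::uint8_t>((frameLen >> 3) & 0xFFu);
    h[5] = static_cast<std::uint8_t>(((frameLen & 7u) << 5) | 0x1Fu);  // buffer fullness, high bits
    h[6] = 0xFC;  // buffer fullness low bits (0x7FF overall), one raw data block
    return h;
}
```

Writing this header followed by the raw AAC payload, frame after frame, yields an .aac stream of the kind described above.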

File Management

The program includes a built-in file manager supporting automatic segmentation and loop recording:

Segmentation: Files can be split by duration (--segment-duration) or size (--segment-size). When a segment limit is reached, a new file is created automatically:

output_001.mp4  ← 5 min
output_002.mp4  ← 5 min
output_003.mp4  ← 5 min

Loop Recording: When --loop is enabled, the program automatically deletes the oldest files once --max-files is reached, which bounds the number of recordings kept on disk:

webcam_capture --rtsp --record mycam --segment-duration 300 --max-files 10 --loop

This keeps the 10 most recent segment files and deletes the oldest one each time a new segment starts.

The manager also monitors disk space (--min-disk-space) and stops recording gracefully if free space drops below the threshold.
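
The segmentation and loop rules can be summarised in a few lines of logic. The sketch below is illustrative only: SegmentLimits, shouldRotate, segmentName, and SegmentList are hypothetical names, 1 MB is assumed to mean 1024 × 1024 bytes, and actual file deletion and disk-space checks are left out.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <deque>
#include <string>
#include <vector>

// Defaults mirror --segment-duration, --segment-size and --max-files.
struct SegmentLimits {
    double maxSeconds = 300.0;
    std::uint64_t maxBytes = 1024ull * 1024ull * 1024ull;  // 1024 MB (assumed MiB)
    std::size_t maxFiles = 100;
};

// A new segment starts as soon as either limit is reached.
bool shouldRotate(const SegmentLimits& limits, double elapsedSeconds,
                  std::uint64_t bytesWritten) {
    return elapsedSeconds >= limits.maxSeconds || bytesWritten >= limits.maxBytes;
}

// Builds names such as output_001.mp4.
std::string segmentName(const std::string& base, unsigned index, const std::string& ext) {
    char number[16];
    std::snprintf(number, sizeof(number), "%03u", index);
    return base + "_" + number + "." + ext;
}

// Tracks segment files in creation order for loop recording.
class SegmentList {
public:
    explicit SegmentList(std::size_t maxFiles) : maxFiles_(maxFiles) {}

    // Registers a new segment and returns the oldest files that now exceed
    // the limit; the caller would delete those from disk.
    std::vector<std::string> add(const std::string& name) {
        files_.push_back(name);
        std::vector<std::string> expired;
        while (maxFiles_ > 0 && files_.size() > maxFiles_) {
            expired.push_back(files_.front());
            files_.pop_front();
        }
        return expired;
    }

    std::size_t size() const { return files_.size(); }

private:
    std::size_t maxFiles_;
    std::deque<std::string> files_;
};
```

With --max-files 10, the eleventh call to add() would return the first segment's name, matching the loop-recording behaviour described above.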

ALSA: `Cannot open shared library libasound_module_conf_pulse.so`

When running the program on Linux, you may encounter the following error:

ALSA lib ...: Cannot open shared library libasound_module_conf_pulse.so
ALSA lib ...: Unknown PCM default

This occurs because the ALSA library used by the program cannot locate the PulseAudio plugin. Here are two solutions:

Method 1: Set the plugin directory via environment variable (temporary)

First, locate libasound_module_conf_pulse.so on your system:

find / -name libasound_module_conf_pulse.so 2>/dev/null

The output typically looks like:

/usr/lib/aarch64-linux-gnu/alsa-lib/libasound_module_conf_pulse.so

Run the program with the plugin directory specified:

ALSA_PLUGIN_DIR=/usr/lib/aarch64-linux-gnu/alsa-lib/ ./webcam_capture --device 1 --encoder rockchip --audio

Method 2: Create symbolic links (permanent)

If you prefer not to set the environment variable every time, create symbolic links pointing to the system's ALSA plugins:

sudo mkdir -p <build-dir>/3rdparty/alsa-lib-install/lib/alsa-lib/
sudo ln -s /usr/lib/aarch64-linux-gnu/alsa-lib/libasound_module_conf_pulse.so <build-dir>/3rdparty/alsa-lib-install/lib/alsa-lib/
sudo ln -s /usr/lib/aarch64-linux-gnu/alsa-lib/libasound_module_pcm_pulse.so <build-dir>/3rdparty/alsa-lib-install/lib/alsa-lib/

Replace <build-dir> with your actual build directory, e.g. build or out/build/x64-Debug.

x264: Compiling on Windows Produces Libraries Without Assembly Optimizations, Resulting in Low Efficiency

The unofficial CMake script used to compile x264 on Windows does not support compiling assembly code files. As a result, the x264 library built on Windows is compiled in pure C, which leads to significantly lower video encoding performance. One workaround is to replace the x264 library generated by this project with a pre-built one that includes assembly optimizations, then relink.

Steps:

Perform a full build of the project.
Download a pre-built x264 library with assembly optimizations from ShiftMediaProject/x264 releases. This project uses Visual Studio 2022 (MSVC17), and the current x264 version is r164. Download the package libx264_0.164.r3194_msvc17.zip.
Copy lib\x64\libx264.lib from the archive to <build-dir>\build\x64-Debug\3rdparty\x264-install\lib\, and rename it to x264_static.lib.
Build again without cleaning (not a clean rebuild). Only the link step needs to run, and the x264 library with assembly optimizations is linked into the target executable.

License

This project is licensed under the MIT License - see the LICENSE file for details.
