leiddev/webcamcapture


Webcam Capture Program

A cross-platform webcam capture application featuring a real-time video/audio processing pipeline, RTSP streaming, MP4/MPEG-TS/AAC recording, and CNN-based face detection, with Rockchip MPP/RGA hardware acceleration on Rockchip platforms for high-efficiency H.264 encoding, JPEG decoding, and 2D image operations.

Features

  • Real-time webcam capture - Windows (DirectShow + VFW) and Linux (V4L2) support
  • JPEG image capture - Save snapshots manually or automatically
  • H.264 encoding - Hardware-accelerated encoding via Rockchip MPP (Linux/Rockchip) or software encoding via x264 (cross-platform)
  • AAC audio encoding - Real-time audio capture and encoding via ALSA (Linux) or WaveIn (Windows), with support for audio-only recording
  • YUV color space conversion - Fast I420 conversion using libyuv
  • RTSP streaming - Built-in RTSP server for live video and audio streaming
  • MP4/MPEG-TS recording - Save the video and audio streams to an MP4 file (via mp4v2) or an MPEG-TS file
  • Multi-threaded pipeline - Optimized processing with separate capture, conversion, and encoding threads
  • No OpenCV dependency - Lightweight implementation using native APIs
  • Face detection - CNN-based face detection with bounding box overlay using libfacedetection

Dependencies

  • CMake 3.10+
  • C++14 compatible compiler
  • Windows: Visual Studio 2022+ with Windows SDK (DirectShow / VFW)
  • Linux: V4L2 development libraries

All third-party dependencies are integrated in 3rdparty/ and built automatically via CMake ExternalProject:

| Library | Purpose | URL |
|---|---|---|
| libjpeg-turbo | JPEG encoding/decoding | https://github.com/winlibs/libjpeg |
| libyuv | YUV color space conversion | https://github.com/lemenkov/libyuv |
| x264 | H.264 software encoding (cross-platform) | https://github.com/Pawday/x264-cmake |
| rockchip-mpp | H.264 hardware encoding (Linux/Rockchip only) | https://github.com/rockchip-linux/mpp |
| rockchip-librga | 2D hardware accelerator (Linux/Rockchip only) | https://github.com/tsukumijima/librga-rockchip |
| live555 | RTSP streaming server | https://github.com/melchi45/live555 |
| freetype | Font rendering for OSD text | https://download.savannah.gnu.org/releases/freetype/ |
| libfacedetection | CNN-based face detection | https://github.com/ShiqiYu/libfacedetection |
| mp4v2 | MP4 file recording | https://github.com/enzo1982/mp4v2 |
| alsa-lib | Audio capture (Linux) | https://www.alsa-project.org/files/pub/lib/ |
| fdk-aac | AAC audio encoding | https://github.com/mstorsjo/fdk-aac |

Building

Windows

Strongly Recommended: Use Visual Studio 2022's built-in CMake support (see below). This provides the best development experience with integrated debugging, IntelliSense, and seamless CMake configuration.

Visual Studio 2022 (Recommended)

  1. Open Microsoft Visual Studio 2022 Community (or later edition).
  2. Select Open Local Folder, then open the folder containing the root CMakeLists.txt.
  3. VS2022 will automatically detect and configure the CMake project. Choose x64-Debug (or x64-Release) from the toolbar dropdown.
  4. Press F5 to build and debug.

For more details, refer to: CMake projects in Visual Studio

Command Line

Simplified (uses default generator, builds Debug by default)

mkdir build
cd build
cmake ..
cmake --build .

Explicit (recommended for reproducible builds)

mkdir build
cd build
cmake .. -G "Visual Studio 17 2022" -A x64
cmake --build . --config Release

The simplified form uses the default generator (Visual Studio, 64-bit) and builds Debug. For a Release build, add --config Release to the cmake --build command.

Linux

# Install dependencies
sudo apt-get update
sudo apt-get install libomp-dev    # Required for libfacedetection OpenMP support
sudo apt-get install nasm          # Required for x264 assembly code building
sudo apt-get install libasound2-plugins    # Required for ALSA (pulse plugin)

# Build (rockchip-mpp hardware encoder is auto-built on Linux)
mkdir build
cd build
cmake ..
make

Usage

Basic Modes

  • RTSP streaming mode (default)
    • Start RTSP server, encode camera feed to H.264 and stream via RTSP.
  • JPEG capture mode
    • No RTSP; capture JPEG snapshots only (manual or periodic).

Controls (JPEG mode)

| Key | Action |
|---|---|
| s | Save current frame as JPEG |
| q | Quit program |

Automatic Features

  • Auto-capture: Saves an image every 30 frames
  • RTSP streaming: Available on configurable port (default 8554)

Output Files

  • Manual capture: snapshot_<timestamp>.jpg
  • Auto capture: auto_capture_<timestamp>.jpg

Program Arguments

webcam_capture can be configured via command-line options (same as in main.cpp):

webcam_capture [options]

Common options:

  • --vfw - Use VFW capture backend (default: DirectShow) (Windows only)
  • --test-pattern - Use virtual test-pattern generator (color bars, no camera required)
  • --test-audio-wave - Use virtual audio wave generator (sine tone, no microphone required)
  • --device <index> - Camera device index (default: 0)
  • --jpeg - Start in JPEG capture mode (default is RTSP mode)
  • --rtsp - Start in RTSP streaming mode (default)
  • --port <port> - RTSP server port (default: 8554)
  • --width <width> - Video width (default: 1280)
  • --height <height> - Video height (default: 720)
  • --fps <fps> - Frame rate (default: 30)
  • --bitrate <bps> - Video bitrate (default: 2000000, ~2 Mbps)
  • --encoder <enc> - Encoder type: x264 (cross-platform, default) or rockchip (Linux, hardware-accelerated)
  • --decoder <dec> - JPEG decoder type: libyuv (cross-platform, default) or rockchip (Linux, hardware-accelerated)
  • --no-osd - Disable OSD overlay (default: enabled)
  • --detect - Enable face detection with bounding box overlay (default: disabled)
  • --audio - Enable audio capture (default: disabled)
  • --audio-device <idx> - Audio device index: 0,1,2,... (default: auto)
  • --audio-sample-rate <hz> - Audio sample rate (default: 16000)
  • --audio-channels <n> - Audio channels: 1=mono, 2=stereo (default: 1)
  • --audio-bitrate <bps> - Audio bitrate (default: 128000, FDK-AAC)
  • --record [file] - Enable recording, optionally specifying the output filename (default: record_<timestamp>.<format>)
  • --format <fmt> - Recording format: mp4 (default), ts, or aac (audio-only)
  • --segment-duration <sec> - Max duration per file in seconds (default: 300)
  • --segment-size <MB> - Max size per file in MB (default: 1024)
  • --max-files <num> - Max files to keep in loop mode (default: 100)
  • --loop - Enable loop overwrite mode (auto-delete oldest files when limit is reached)
  • --min-disk-space <MB> - Minimum free disk space threshold (default: 500)
  • --help - Show help
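
As a rough guide to how --bitrate, --audio-bitrate, --segment-duration, and --segment-size interact, the payload size of one segment can be estimated from the bitrates. The sketch below is illustrative only: the function names are hypothetical, container overhead and rate-control variation are ignored, and 1 MB is assumed to be 1,000,000 bytes (the program's own definition may differ).

```cpp
#include <cassert>
#include <cstdint>

// Rough payload size of one recording segment, in bytes.
// Ignores container overhead and encoder rate-control variation.
inline std::uint64_t estimatedSegmentBytes(std::uint64_t videoBps,
                                           std::uint64_t audioBps,
                                           std::uint64_t seconds) {
    return (videoBps + audioBps) * seconds / 8;  // bits -> bytes
}

// True when the duration limit is expected to trigger before the size limit.
// Assumption: 1 MB = 1,000,000 bytes.
inline bool durationLimitHitsFirst(std::uint64_t videoBps,
                                   std::uint64_t audioBps,
                                   std::uint64_t segmentSeconds,
                                   std::uint64_t segmentSizeMB) {
    assert(segmentSizeMB > 0);
    return estimatedSegmentBytes(videoBps, audioBps, segmentSeconds) <
           segmentSizeMB * 1000000ULL;
}
```

With the defaults (2,000,000 bps video, 128,000 bps audio, 300 s), estimatedSegmentBytes gives 79,800,000 bytes, so under these assumptions the duration limit would be expected to trigger well before the 1024 MB size limit.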

Examples:

# Default: RTSP mode, 1280x720@30fps, 2Mbps, port 8554, OSD enabled, x264 encoder
webcam_capture

# Test pattern mode: color-bar generator (no camera needed, useful for debugging)
webcam_capture --test-pattern

# Test audio-wave mode: sine-wave generator (no microphone needed, useful for audio pipeline debugging)
webcam_capture --test-audio-wave

# Full virtual pipeline: test-pattern + test-audio-wave (no hardware needed)
webcam_capture --test-pattern --test-audio-wave

# RTSP mode at 1280x720@30fps with OSD disabled
webcam_capture --rtsp --width 1280 --height 720 --fps 30 --no-osd

# RTSP mode with audio capture (default: 16000Hz, mono, 128kbps FDK-AAC)
webcam_capture --audio

# RTSP mode with audio capture on specific device
webcam_capture --audio --audio-device 1

# RTSP mode with stereo audio at 16kHz and 128kbps
webcam_capture --audio --audio-sample-rate 16000 --audio-channels 2 --audio-bitrate 128000

# RTSP mode with Rockchip hardware encoder (Linux/Rockchip boards only)
webcam_capture --rtsp --encoder rockchip

# RTSP mode with Rockchip hardware encoder and decoder (Linux/Rockchip boards only)
webcam_capture --rtsp --encoder rockchip --decoder rockchip

# RTSP mode with Rockchip hardware encoder and MP4 recording
webcam_capture --rtsp --encoder rockchip --record output

# RTSP mode with MP4 recording (video only)
webcam_capture --rtsp --record output

# RTSP mode with audio and MP4 recording
webcam_capture --audio --record output

# RTSP mode with audio-only AAC recording (no video)
webcam_capture --audio --record output --format aac

# RTSP mode with MPEG-TS recording
webcam_capture --rtsp --record output --format ts

# RTSP mode with 5-minute segment duration and loop recording (max 10 files)
webcam_capture --rtsp --record mycam --segment-duration 300 --max-files 10 --loop

# JPEG capture mode
webcam_capture --jpeg

RTSP URL

rtsp://<server-ip>:8554/live

Pipeline Flow

Webcam Capture (V4L2 / DirectShow / VFW / TestPattern)
    ↓
FrameQueue
    ↓
YUV Converter
    ├─ JpegDecoder (JPEG to YUV420P)
    |      ├─ LibyuvJpegDecoder     (libyuv, cross-platform)
    |      └─ RockchipJpegDecoder   (Rockchip MPP, Linux/Rockchip boards)
    └─ RgbConverter (BGR888 to YUV420P)
           ├─ LibyuvRgbConverter    (libyuv, cross-platform)
           └─ RgaRgbConverter       (Rockchip RGA, Linux/Rockchip boards)
    ↓
FrameQueue
    ↓
Detector (optional)
    └─ ScaleConverter (YUV420P to BGR888)
           ├─ LibyuvScaleConverter  (libyuv, cross-platform)
           └─ RgaScaleConverter     (Rockchip RGA, Linux/Rockchip boards)
    ↓
FrameQueue
    ↓
OSD Renderer (optional)
    ↓
FrameQueue
    ↓
H264Encoder (abstraction layer)
    ├─ X264Encoder         (x264, cross-platform)
    └─ RockchipEncoder     (Rockchip MPP, Linux/Rockchip boards)
    ↓
FrameQueue
    ↓
RTSP Server ←→ Video Writer (parallel)
    ↓              ↓
Network       MP4/TS File
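
Most stages of the flow above exchange YUV420P (I420) frames. As a minimal sketch of what that layout implies for buffer sizes, assuming tightly packed planes with no row padding (the struct and function names are hypothetical, not taken from the project):

```cpp
#include <cassert>
#include <cstddef>

// Plane sizes of a tightly packed YUV420P (I420) frame.
struct I420Layout {
    std::size_t ySize;   // full-resolution luma plane
    std::size_t uvSize;  // each chroma plane (U and V), subsampled 2x2
    std::size_t total() const { return ySize + 2 * uvSize; }
};

inline I420Layout i420Layout(std::size_t width, std::size_t height) {
    assert(width > 0 && height > 0);
    const std::size_t chromaW = (width + 1) / 2;  // round up for odd sizes
    const std::size_t chromaH = (height + 1) / 2;
    return I420Layout{width * height, chromaW * chromaH};
}
```

At the default 1280x720 this gives 921,600 bytes of luma plus two 230,400-byte chroma planes, 1,382,400 bytes per frame, which is half the size of the same frame in BGR888 (2,764,800 bytes).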

Audio Pipeline (parallel)

Audio Capture (ALSA / WaveIn / TestWave)
    ↓
FrameQueue
    ↓
AACEncoder (FDK-AAC)
    ↓
FrameQueue
    ├─→ RTSP Server (audio stream)
    └─→ Video Writer (MP4/TS/AAC recording)
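
Each FrameQueue in the two diagrams decouples a producer thread from a consumer thread. The project's actual FrameQueue is not shown in this README; the following is only a sketch of one common design, a bounded queue that drops the oldest frame when full so that a slow consumer cannot stall capture. The class name BoundedFrameQueue and the drop-oldest policy are assumptions.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <deque>
#include <mutex>
#include <utility>

template <typename T>
class BoundedFrameQueue {
public:
    explicit BoundedFrameQueue(std::size_t capacity) : capacity_(capacity) {
        assert(capacity > 0);
    }

    // Producer side: never blocks. When the queue is full, the oldest
    // frame is discarded and counted. Frames pushed after close() are ignored.
    void push(T frame) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            if (closed_) return;
            if (frames_.size() == capacity_) {
                frames_.pop_front();
                ++dropped_;
            }
            frames_.push_back(std::move(frame));
        }
        notEmpty_.notify_one();
    }

    // Consumer side: blocks until a frame is available or the queue is
    // closed. Returns false only when the queue is closed and drained.
    bool pop(T& out) {
        std::unique_lock<std::mutex> lock(mutex_);
        notEmpty_.wait(lock, [this] { return closed_ || !frames_.empty(); });
        if (frames_.empty()) return false;
        out = std::move(frames_.front());
        frames_.pop_front();
        return true;
    }

    // Wakes all waiting consumers so worker threads can exit cleanly.
    void close() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            closed_ = true;
        }
        notEmpty_.notify_all();
    }

    std::size_t droppedCount() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return dropped_;
    }

private:
    mutable std::mutex mutex_;
    std::condition_variable notEmpty_;
    std::deque<T> frames_;
    std::size_t capacity_;
    std::size_t dropped_ = 0;
    bool closed_ = false;
};
```

In a pipeline like the one above, each stage would run a loop of the form `while (in.pop(frame)) { out.push(process(frame)); }` on its own thread, and shutdown would call close() on every queue.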

DirectShow Backend (Windows)

On Windows, the default capture backend is DirectShow. Instead of the deprecated Sample Grabber filter, it uses a custom sink filter built from scratch with two custom COM objects:

FrameSinkFilter

A minimal DirectShow filter with a single input pin that implements:

  • IBaseFilter — filter lifecycle (AddRef/Release, state transitions, graph management)
  • IEnumPins — enumerates the single input pin
  • IPin (input) — receives connection requests and media type negotiation
  • IMemInputPin — receives video frames via Receive()

Frame Flow

Camera Filter (Capture Device)
        ↓ (PIN_CATEGORY_CAPTURE)
FrameSinkFilter::FrameSinkInputPin::Receive()
        ↓
FrameQueue<WebcamFrame>

Frames arrive as IMediaSample buffers in the negotiated format (e.g. compressed MJPG or uncompressed YUY2) and are pushed directly into the FrameQueue, without the intermediate copy that Sample Grabber callbacks would require.

Format Support

| Format | GUID | Notes |
|---|---|---|
| MJPG | MEDIASUBTYPE_MJPG | Preferred for USB cameras |
| YUY2 | MEDIASUBTYPE_YUY2 | Uncompressed, higher bandwidth |

Fallback

If ICaptureGraphBuilder2::RenderStream fails to build the graph automatically, the code falls back to manual pin enumeration and IGraphBuilder::Connect.

Recording

The program supports three recording formats:

| Format | File Extension | Description |
|---|---|---|
| MP4 | .mp4 | H.264 video + AAC audio in a standard container |
| MPEG-TS | .ts | H.264 + AAC in an MPEG Transport Stream container |
| AAC | .aac | Audio-only raw AAC stream with ADTS headers |

AAC File Recording

When --format aac is used, only audio is captured and saved — no video encoding or processing occurs. This is useful for standalone audio recording scenarios.

The output .aac file contains raw AAC frames with ADTS headers prepended, making it directly playable by most media players and compatible with standard AAC decoders.
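
For reference, the ADTS framing described above can be sketched as follows. This is a minimal illustration and not the project's code: makeAdtsHeader and adtsSampleRateIndex are hypothetical names, AAC-LC without CRC and mono or stereo input are assumed, and the project may instead have FDK-AAC emit ADTS frames directly.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Maps a sample rate to the 4-bit ADTS sampling_frequency_index,
// or returns -1 if the rate is not in the standard table.
inline int adtsSampleRateIndex(int sampleRate) {
    static const int kRates[] = {96000, 88200, 64000, 48000, 44100, 32000, 24000,
                                 22050, 16000, 12000, 11025, 8000,  7350};
    for (int i = 0; i < 13; ++i) {
        if (kRates[i] == sampleRate) return i;
    }
    return -1;
}

// Builds the 7-byte ADTS header (MPEG-4, AAC-LC, no CRC) that precedes
// one raw AAC frame of payloadBytes bytes.
inline std::array<std::uint8_t, 7> makeAdtsHeader(int sampleRate, int channels,
                                                  std::size_t payloadBytes) {
    const int index = adtsSampleRateIndex(sampleRate);
    const std::size_t frameLen = payloadBytes + 7;  // length field includes the header
    assert(index >= 0);
    assert(channels == 1 || channels == 2);
    assert(frameLen < (1u << 13));                  // 13-bit length field

    const unsigned sfi = static_cast<unsigned>(index);
    const unsigned ch = static_cast<unsigned>(channels);
    const unsigned len = static_cast<unsigned>(frameLen);
    const unsigned profile = 1;       // audio object type minus 1: AAC-LC
    const unsigned fullness = 0x7FF;  // conventional value for variable bitrate

    std::array<std::uint8_t, 7> h{};
    h[0] = 0xFF;                      // syncword, high 8 bits
    h[1] = 0xF1;                      // syncword low 4 bits, MPEG-4, layer 0, no CRC
    h[2] = static_cast<std::uint8_t>((profile << 6) | (sfi << 2) | (ch >> 2));
    h[3] = static_cast<std::uint8_t>(((ch & 0x3) << 6) | ((len >> 11) & 0x3));
    h[4] = static_cast<std::uint8_t>((len >> 3) & 0xFF);
    h[5] = static_cast<std::uint8_t>(((len & 0x7) << 5) | ((fullness >> 6) & 0x1F));
    h[6] = static_cast<std::uint8_t>((fullness & 0x3F) << 2);  // one raw data block
    return h;
}
```

A writer would emit these 7 bytes followed by the encoder's raw frame, repeated for every frame. With the default 16000 Hz mono settings, the sampling frequency index is 8.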

File Management

The program includes a built-in file manager supporting automatic segmentation and loop recording:

Segmentation: Files can be split by duration (--segment-duration) or size (--segment-size). When a segment limit is reached, a new file is created automatically:

output_001.mp4  ← 5 min
output_002.mp4  ← 5 min
output_003.mp4  ← 5 min

Loop Recording: When --loop is enabled, the program automatically deletes the oldest files once --max-files is reached, which bounds the number of segments kept on disk:

webcam_capture --rtsp --record mycam --segment-duration 300 --max-files 10 --loop

This keeps the last 10 segment files and deletes the oldest when a new one starts.

The manager also monitors disk space (--min-disk-space) and stops recording gracefully if free space drops below the threshold.
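
The naming and loop-deletion behavior described above can be sketched as below. This is an illustration under assumptions, not the built-in file manager: segmentName and trackSegment are hypothetical helpers, the three-digit index follows the listing above, and the actual deletion is left to the caller.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <deque>
#include <string>
#include <vector>

// Formats a segment filename with a zero-padded three-digit index,
// for example base "output", index 1, ext "mp4".
inline std::string segmentName(const std::string& base, unsigned index,
                               const std::string& ext) {
    char suffix[16];
    std::snprintf(suffix, sizeof(suffix), "_%03u.", index);
    return base + suffix + ext;
}

// Records a newly opened segment and returns the files that should be
// deleted (oldest first) so that at most maxFiles remain in `kept`.
inline std::vector<std::string> trackSegment(std::deque<std::string>& kept,
                                             const std::string& newFile,
                                             std::size_t maxFiles) {
    assert(maxFiles > 0);
    kept.push_back(newFile);
    std::vector<std::string> toDelete;
    while (kept.size() > maxFiles) {
        toDelete.push_back(kept.front());
        kept.pop_front();
    }
    return toDelete;
}
```

Keeping the bookkeeping separate from the filesystem calls makes the policy easy to test; a caller would pass each returned name to std::remove or a platform equivalent.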

ALSA: Cannot open shared library libasound_module_conf_pulse.so

When running the program on Linux, you may encounter the following error:

ALSA lib ...: Cannot open shared library libasound_module_conf_pulse.so
ALSA lib ...: Unknown PCM default

This occurs because the ALSA library used by the program cannot locate the PulseAudio plugin. Here are two solutions:

Method 1: Set the plugin directory via environment variable (temporary)

  1. First, locate libasound_module_conf_pulse.so on your system:

find / -name libasound_module_conf_pulse.so 2>/dev/null

The output typically looks like:

/usr/lib/aarch64-linux-gnu/alsa-lib/libasound_module_conf_pulse.so

  2. Run the program with the plugin directory specified:

ALSA_PLUGIN_DIR=/usr/lib/aarch64-linux-gnu/alsa-lib/ ./webcam_capture --device 1 --encoder rockchip --audio

Method 2: Create symbolic links (permanent)

If you prefer not to set the environment variable every time, create symbolic links pointing to the system's ALSA plugins:

sudo mkdir -p <build-dir>/3rdparty/alsa-lib-install/lib/alsa-lib/
sudo ln -s /usr/lib/aarch64-linux-gnu/alsa-lib/libasound_module_conf_pulse.so <build-dir>/3rdparty/alsa-lib-install/lib/alsa-lib/
sudo ln -s /usr/lib/aarch64-linux-gnu/alsa-lib/libasound_module_pcm_pulse.so <build-dir>/3rdparty/alsa-lib-install/lib/alsa-lib/

Replace <build-dir> with your actual build directory, e.g. build or out/build/x64-Debug.

x264: Compiling on Windows Produces Libraries Without Assembly Optimizations, Resulting in Low Efficiency

The unofficial CMake script used to compile x264 on Windows does not support compiling assembly source files. As a result, the x264 library built on Windows is pure C, which makes video encoding significantly slower. One workaround is to replace the x264 library generated by this project with a pre-built one that includes assembly optimizations, then relink.

Steps:

  1. Perform a full build of the project.
  2. Download a pre-built x264 library with assembly optimizations from ShiftMediaProject/x264 releases. This project uses Visual Studio 2022 (MSVC17), and the current x264 version is r164. Download the package libx264_0.164.r3194_msvc17.zip.
  3. Copy lib\x64\libx264.lib from the archive to <build-dir>\build\x64-Debug\3rdparty\x264-install\lib\, and rename it to x264_static.lib.
  4. Build again without cleaning (an incremental build, not a clean rebuild). Only the link step runs, and the x264 library with assembly optimizations is linked into the target executable.

License

This project is licensed under the MIT License - see the LICENSE file for details.
