AI-powered video highlight generation for educational/content video editing
First Release: Class Record Video Editor Agent (executed by OpenClaw)
Creating highlight reels from raw video footage is labor-intensive:
- Manual transcription and timestamp marking
- Subjective clip selection by watching entire footage
- Tedious cutting and concatenation in video editors
- Inconsistent quality across different editors
Typical workflow: 1 hour of raw footage → 30-60 minutes of manual editing → 3-5 minute highlight
An automated pipeline that:
- Transcribes speech with Whisper (word-level timestamps)
- Segments content by complete sentences (5s silence = boundary)
- Scores clips by content quality (keyword density, engagement signals)
- Selects top segments to match target duration (±10s tolerance)
- Exports ready-to-use highlight clips
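The segment/score/select steps above can be sketched in Python. This is a minimal illustration, not the pipeline's actual code: the Whisper-style word list, the `KEYWORDS` set, and all function names are assumptions.

```python
# Sketch of segmentation, scoring, and selection, assuming Whisper-style
# word-level timestamps: [{"word": str, "start": float, "end": float}, ...].
# The keyword set and function names are illustrative, not the real API.

SILENCE_GAP = 5.0                               # 5 s of silence = boundary
KEYWORDS = {"example", "important", "summary"}  # hypothetical keyword list

def segment_words(words, gap=SILENCE_GAP):
    """Split a word stream into segments at silences of >= `gap` seconds."""
    segments, current = [], []
    for w in words:
        if current and w["start"] - current[-1]["end"] >= gap:
            segments.append(current)
            current = []
        current.append(w)
    if current:
        segments.append(current)
    return segments

def score(segment):
    """Keyword density: hits per word, so long clips don't win by default."""
    hits = sum(1 for w in segment if w["word"].lower().strip(".,") in KEYWORDS)
    return hits / len(segment)

def select(segments, target=180.0, tol=10.0):
    """Greedily take the highest-scoring segments until within target +/- tol."""
    chosen, total = [], 0.0
    for seg in sorted(segments, key=score, reverse=True):
        dur = seg[-1]["end"] - seg[0]["start"]
        if total + dur <= target + tol:
            chosen.append(seg)
            total += dur
        if total >= target - tol:
            break
    # Re-sort chosen clips into chronological order for export
    return sorted(chosen, key=lambda s: s[0]["start"])
```

Selected segments would then be handed to the export step (e.g. ffmpeg cuts and concatenation).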
| Task | Manual | Our Pipeline |
|---|---|---|
| Transcription | 0.5-1x video duration | 0.1-0.2x video duration |
| Clip Selection | 10-30 min | Automatic (seconds) |
| Export | 5-15 min | Automatic (seconds) |
| Total | 30-60 min per highlight | ~5-20 min (mostly Whisper) |
Time saved: 50-80% (mostly in transcription and selection)
Because transcription runs on local Whisper (CPU/GPU) rather than a hosted API, token cost is zero.
| Video Duration | Pipeline Tokens | Cost |
|---|---|---|
| 10 min | ~10K | Free |
| 30 min | ~15K | Free |
| 60 min | ~20K | Free |
Note: The only cost is compute time for Whisper; there are no API charges.
- Scenario-specific templates:
  - `event_recap` — Event highlights (key moments, applause, Q&A)
  - `meeting_minutes` — Meeting summaries (decisions, action items)
  - `lecture_summary` — Educational content (concepts, examples)
  - `training_clip` — Training videos (step-by-step instructions)
- Multi-language support (non-English transcription)
- Scene detection fallback (for non-speech videos)
- Custom scoring templates per content type
- Web UI for configuration and monitoring
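One way such scenario-specific templates could be represented is as per-template keyword weights. This sketch is hypothetical: the template names mirror the roadmap above, but the weights and the data format are not the project's actual template format.

```python
# Hypothetical representation of scoring templates as keyword weights.
# Template names mirror the planned scenarios; weights are illustrative only.
TEMPLATES = {
    "event_recap":     {"applause": 2.0, "question": 1.5, "welcome": 1.0},
    "meeting_minutes": {"decision": 2.0, "action": 2.0, "deadline": 1.5},
    "lecture_summary": {"concept": 2.0, "example": 1.5, "definition": 1.5},
    "training_clip":   {"step": 2.0, "first": 1.0, "next": 1.0},
}

def weighted_score(text: str, template: str) -> float:
    """Sum the template's keyword weights, normalized per word of transcript."""
    weights = TEMPLATES[template]
    words = text.lower().split()
    return sum(weights.get(w, 0.0) for w in words) / max(len(words), 1)
```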
- ✅ Production-ready: Successfully processed multiple projects
- ✅ Sentence-aware segmentation: Respects natural speech breaks
- ✅ Normalized scoring: Longer clips don't automatically win
- ✅ Queue system: Multi-folder batch processing with notifications
- ✅ Heartbeat monitoring: Progress tracking per SOP
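The normalized-scoring point deserves a concrete illustration: with raw keyword counts a long rambling clip outscores a short dense one, while per-second normalization reverses that. The numbers below are made up for the example.

```python
# Why normalization matters: raw counts favor long clips, density does not.
def raw_score(hits: int) -> float:
    return float(hits)

def normalized_score(hits: int, duration_s: float) -> float:
    return hits / duration_s

long_clip  = {"hits": 6, "duration_s": 120.0}  # rambling: 6 keywords in 2 min
short_clip = {"hits": 4, "duration_s": 20.0}   # dense: 4 keywords in 20 s

assert raw_score(long_clip["hits"]) > raw_score(short_clip["hits"])
assert normalized_score(**short_clip) > normalized_score(**long_clip)
```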
# Clone the repo
git clone https://github.com/enderjiang/aiditor.git
cd aiditor
# Install dependencies
pip install openai-whisper
brew install ffmpeg # macOS
# Create a job config
# Copy jobs/00_TEMPLATE.json to jobs/my_project.json and edit the paths
# Run pipeline (via OpenClaw agent)
python pipeline.py --config jobs/my_project.json
# Or use the queue system for multi-folder processing
python queue.py --add # Interactive job creation
python queue.py # Run all jobs sequentially

This pipeline is designed to run as an OpenClaw agent task. The pipeline:
- Follows the SOP (Standard Operating Procedure) defined in SOP.txt
- Uses heartbeat for progress tracking
- Supports queue system for batch processing
- Can be triggered via OpenClaw's task system
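The heartbeat mechanism isn't specified here beyond "progress tracking per SOP"; a minimal sketch of what such a heartbeat could look like follows. The file name and JSON shape are assumptions, not the project's actual format.

```python
import json
import time
from pathlib import Path

def write_heartbeat(stage: str, progress: float, path: str = "heartbeat.json"):
    """Record the current pipeline stage so a supervisor (e.g. the OpenClaw
    agent) can detect stalled jobs. File name and fields are assumptions."""
    Path(path).write_text(json.dumps({
        "stage": stage,             # e.g. "transcribe", "score", "export"
        "progress": progress,       # 0.0-1.0 within the current stage
        "updated_at": time.time(),  # a stale timestamp signals a stall
    }))
```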
| Field | Description |
|---|---|
| `source_dir` | Folder containing input videos |
| `output_dir` | Where to save processed files |
| `target_duration` | Target clip length in seconds |
| `mode` | `individual` (one clip per video) or `single` (combine all videos) |
| `template` | Scoring template (default: `classrecap`) |
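Putting the fields together, a job config might look like this (values are illustrative; copy jobs/00_TEMPLATE.json for the authoritative template):

```json
{
  "source_dir": "/videos/raw/class-recording",
  "output_dir": "/videos/highlights",
  "target_duration": 180,
  "mode": "individual",
  "template": "classrecap"
}
```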
See SOP.txt for detailed documentation.
