htekdev/vidpipe


 ██╗   ██╗██╗██████╗ ██████╗ ██╗██████╗ ███████╗
 ██║   ██║██║██╔══██╗██╔══██╗██║██╔══██╗██╔════╝
 ██║   ██║██║██║  ██║██████╔╝██║██████╔╝█████╗  
 ╚██╗ ██╔╝██║██║  ██║██╔═══╝ ██║██╔═══╝ ██╔══╝  
  ╚████╔╝ ██║██████╔╝██║     ██║██║     ███████╗
   ╚═══╝  ╚═╝╚═════╝ ╚═╝     ╚═╝╚═╝     ╚══════╝

Your AI video editor and content ideation engine — turn raw recordings into shorts, reels, captions, social posts, and blog posts. Ideate, record, edit, publish.

An agentic video editor and content ideation platform that watches for new recordings and edits them into social-media-ready content — shorts, reels, captions, blog posts, and platform-tailored social posts — using GitHub Copilot SDK AI agents, OpenAI Whisper, and Google Gemini.


npm install -g vidpipe

✨ Features

VidPipe Features — Input → AI Processing → Outputs


  • 💡 Content Ideation (ID8): AI-generated, trend-backed video ideas
  • 🎙️ Whisper Transcription: Word-level timestamps
  • 📐 Split-Screen Layouts: Portrait, square, and feed
  • 🔇 AI Silence Removal: Context-aware, capped at 20%
  • 💬 Karaoke Captions: Word-by-word highlighting
  • ✂️ Short Clips: Best 15–60s moments, hook-first ordering
  • 🎞️ Medium Clips: 1–3 min with crossfade transitions
  • 📑 Chapter Detection: JSON, Markdown, YouTube, FFmeta
  • 📱 Social Posts: TikTok, YouTube, Instagram, LinkedIn, X
  • 📰 Blog Post: Dev.to style with web-sourced links
  • 🎨 Brand Voice: Custom tone, hashtags via brand.json
  • 🔍 Face Detection: ONNX-based webcam cropping
  • 🚀 Auto-Publish: Scheduled posting via Late API
  • 👁️ Gemini Vision: AI video analysis and scene detection

🚀 Quick Start

# Install globally
npm install -g vidpipe

# Set up your environment
# Unix/Mac
cp .env.example .env
# Windows (PowerShell)
Copy-Item .env.example .env

# Then edit .env and add your OpenAI API key (REQUIRED):
#   OPENAI_API_KEY=sk-your-key-here

# Verify all prerequisites are met
vidpipe --doctor

# Process a single video
vidpipe /path/to/video.mp4

# Watch a folder for new recordings
vidpipe --watch-dir ~/Videos/Recordings

# Generate a saved idea bank for future recordings
vidpipe ideate --topics "GitHub Copilot, Azure, TypeScript" --count 4

# Add a single idea with AI enrichment
vidpipe ideate --add --topic "Building CI/CD with GitHub Actions"

# Full example with options
vidpipe \
  --watch-dir ~/Videos/Recordings \
  --output-dir ~/Content/processed \
  --openai-key sk-... \
  --brand ./brand.json \
  --verbose

Prerequisites:

  • Node.js 20+
  • FFmpeg 6.0+ — Auto-bundled on common platforms (Windows x64, macOS, Linux x64) via ffmpeg-static. On other architectures, install system FFmpeg (see Troubleshooting). Override with FFMPEG_PATH env var if you need a specific build.
  • OpenAI API key (required) — Get one at platform.openai.com/api-keys. Needed for Whisper transcription and all AI features.
  • GitHub Copilot subscription — Required for AI agent features (shorts generation, social media posts, summaries, blog posts). See GitHub Copilot.

See Getting Started for full setup instructions.


🎮 CLI Usage

vidpipe [options] [video-path]
vidpipe init              # Interactive setup wizard
vidpipe review            # Open post review web app
vidpipe schedule          # View posting schedule
vidpipe realign           # Realign scheduled posts to match schedule.json
vidpipe realign --queue   # Queue-based realignment (reshuffleExisting)
vidpipe sync-queues       # Sync schedule.json queue definitions to Late API
vidpipe reschedule        # Reschedule idea-linked posts for optimal placement
vidpipe ideate            # Generate or list saved content ideas
vidpipe chat              # Interactive schedule management agent
vidpipe doctor            # Check all prerequisites

Process Options

Option                  Description
[video-path]            Process a specific video file (implies --once)
--watch-dir <path>      Folder to watch for new recordings
--output-dir <path>     Output directory (default: ./recordings)
--openai-key <key>      OpenAI API key
--exa-key <key>         Exa AI key for web search in social posts
--brand <path>          Path to brand.json (default: ./brand.json)
--ideas <ids>           Comma-separated idea IDs to link to this video
--once                  Process next video and exit
--no-silence-removal    Skip silence removal
--no-shorts             Skip short clip extraction
--no-medium-clips       Skip medium clip generation
--no-social             Skip social media posts
--no-social-publish     Skip social media queue-build stage
--no-captions           Skip caption generation/burning
--late-api-key <key>    Override Late API key
-v, --verbose           Debug-level logging
--progress              Emit structured JSON progress events to stderr
--doctor                Check that all prerequisites are installed

Ideate Options

Option                    Description
--topics <topics>         Comma-separated seed topics for trend research
--count <n>               Number of ideas to generate (default: 5)
--list                    List existing ideas instead of generating
--status <status>         Filter by status: draft, ready, recorded, published
--format <format>         Output format: table (default) or json
--output <dir>            Ideas directory (default: ./ideas)
--brand <path>            Brand config path (default: ./brand.json)
--add                     Create a single idea (AI-enriched by default)
--topic <topic>           Topic for the idea (required with --add)
--hook <hook>             Opening hook (AI-generated if omitted)
--audience <audience>     Target audience (default: "developers")
--platforms <list>        Comma-separated platforms: youtube,tiktok,instagram,linkedin,x
--key-takeaway <msg>      Core message (AI-generated if omitted)
--talking-points <list>   Comma-separated talking points
--tags <list>             Comma-separated categorization tags
--publish-by <date>       Publish-by date (default: 14 days from now)
--trend-context <text>    Trend research context
--no-ai                   Skip AI research agent, use CLI values + defaults

📁 Output Structure

recordings/
└── my-awesome-demo/
    ├── my-awesome-demo.mp4                  # Original video
    ├── my-awesome-demo-edited.mp4           # Silence-removed
    ├── my-awesome-demo-captioned.mp4        # With burned-in captions
    ├── transcript.json                      # Word-level transcript
    ├── transcript-edited.json               # Timestamps adjusted for silence removal
    ├── README.md                            # AI-generated summary with screenshots
    ├── captions/
    │   ├── captions.srt                     # SubRip subtitles
    │   ├── captions.vtt                     # WebVTT subtitles
    │   └── captions.ass                     # Advanced SSA (karaoke-style)
    ├── shorts/
    │   ├── catchy-title.mp4                 # Landscape base clip
    │   ├── catchy-title-captioned.mp4       # Landscape + burned captions
    │   ├── catchy-title-portrait.mp4        # 9:16 split-screen
    │   ├── catchy-title-portrait-captioned.mp4  # Portrait + captions + hook overlay
    │   ├── catchy-title-feed.mp4            # 4:5 split-screen
    │   ├── catchy-title-square.mp4          # 1:1 split-screen
    │   ├── catchy-title.md                  # Clip metadata
    │   └── catchy-title/
    │       └── posts/                       # Per-short social posts (5 platforms)
    ├── medium-clips/
    │   ├── deep-dive-topic.mp4              # Landscape base clip
    │   ├── deep-dive-topic-captioned.mp4    # With burned captions
    │   ├── deep-dive-topic.md               # Clip metadata
    │   └── deep-dive-topic/
    │       └── posts/                       # Per-clip social posts (5 platforms)
    ├── chapters/
    │   ├── chapters.json                    # Structured chapter data
    │   ├── chapters.md                      # Markdown table
    │   ├── chapters.ffmetadata              # FFmpeg metadata format
    │   └── chapters-youtube.txt             # YouTube description timestamps
    └── social-posts/
        ├── tiktok.md                        # Full-video social posts
        ├── youtube.md
        ├── instagram.md
        ├── linkedin.md
        ├── x.md
        └── devto.md                         # Dev.to blog post

💡 Content Ideation (ID8)

VidPipe includes a research-backed content ideation engine that generates video ideas before you record. Ideas are stored as GitHub Issues for full lifecycle tracking.

# Generate ideas backed by trend research
vidpipe ideate --topics "GitHub Copilot, TypeScript" --count 4

# List all saved ideas
vidpipe ideate --list

# Filter by status
vidpipe ideate --list --status ready

# JSON output for programmatic access (e.g., VidRecord integration)
vidpipe ideate --list --format json

# Link ideas to a recording
vidpipe process video.mp4 --ideas 12,15

Manual Idea Creation

Add a single idea with AI enrichment or direct CLI values:

# AI-researched — full IdeationAgent with MCP research tools
vidpipe ideate --add --topic "Building CI/CD with GitHub Actions"

# Direct — skip AI, use CLI flags + defaults
vidpipe ideate --add --topic "Quick Demo" --no-ai --hook "Ship it live" --audience "developers"

# JSON output for programmatic consumers (e.g., VidRecord Electron app)
vidpipe ideate --add --topic "My Topic" --format json

How It Works

The IdeationAgent uses MCP tools (Exa web search, YouTube, Perplexity) to research trending topics in your niche before generating ideas. Each idea includes:

  • Topic & hook — The angle that makes it compelling
  • Audience & key takeaway — Who it's for and what they'll learn
  • Talking points — Structured bullet points to guide your recording
  • Publish-by date — Based on timeliness (3–5 days for hot trends, months for evergreen)
  • Trend context — The research findings that back the idea

Idea Lifecycle

draft → ready → recorded → published

Status      Meaning
draft       Generated by AI, awaiting your review
ready       Approved, ready to record
recorded    Linked to a video via --ideas flag
published   Content from this idea has been published

Ideas automatically influence downstream content — when you link ideas to a recording with --ideas, the pipeline's agents (shorts, social posts, summaries, blog) reference your intended topic and hook for more focused output.
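
The lifecycle above can be modeled as a small state machine. This is an illustrative sketch only: the type and function names are hypothetical, and the rule that an idea moves exactly one step forward at a time is an assumption, not something the VidPipe docs state.

```typescript
// Hypothetical sketch: names are illustrative, not VidPipe's real API.
type IdeaStatus = 'draft' | 'ready' | 'recorded' | 'published'

// Lifecycle order taken from the status table.
const LIFECYCLE: readonly IdeaStatus[] = ['draft', 'ready', 'recorded', 'published']

// Assumption: an idea only ever moves one step forward.
function canTransition(from: IdeaStatus, to: IdeaStatus): boolean {
  return LIFECYCLE.indexOf(to) === LIFECYCLE.indexOf(from) + 1
}

// Next status in the lifecycle, or undefined once an idea is published.
function nextStatus(current: IdeaStatus): IdeaStatus | undefined {
  return LIFECYCLE[LIFECYCLE.indexOf(current) + 1]
}
```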


📺 Review App

VidPipe includes a built-in web app for reviewing, editing, and scheduling social media posts before publishing.

VidPipe Review UI: review and approve posts across YouTube, TikTok, Instagram, LinkedIn, and X/Twitter

# Launch the review app
vidpipe review

  • Platform tabs — Filter posts by platform (YouTube, TikTok, Instagram, LinkedIn, X)
  • Video preview — See the video thumbnail and content before approving
  • Keyboard shortcuts — Arrow keys to navigate, Enter to approve, Backspace to reject
  • Smart scheduling — Posts are queued with optimal timing per platform

🔄 Pipeline

graph LR
    A[📥 Ingest] --> B[🎙️ Transcribe]
    B --> C[🔇 Silence Removal]
    C --> D[💬 Captions]
    D --> E[🔥 Caption Burn]
    E --> F[✂️ Shorts]
    F --> G[🎞️ Medium Clips]
    G --> H[📑 Chapters]
    H --> I[📝 Summary]
    I --> J[📱 Social Media]
    J --> K[📱 Short Posts]
    K --> L[📱 Medium Posts]
    L --> M[📰 Blog]
    M --> N[📦 Queue Build]

    style A fill:#2d5a27,stroke:#4ade80
    style B fill:#1e3a5f,stroke:#60a5fa
    style E fill:#5a2d27,stroke:#f87171
    style F fill:#5a4d27,stroke:#fbbf24
    style N fill:#2d5a27,stroke:#4ade80

#   Stage              Description
1   Ingestion          Copies video, extracts metadata with FFprobe
2   Transcription      Extracts audio → OpenAI Whisper for word-level transcription
3   Silence Removal    AI detects dead-air segments; context-aware removals capped at 20%
4   Captions           Generates .srt, .vtt, and .ass subtitle files with karaoke word highlighting
5   Caption Burn       Burns ASS captions into video (single-pass encode when silence was also removed)
6   Shorts             AI identifies best 15–60s moments; extracts single and composite clips with 6 variants per short
7   Medium Clips       AI identifies 1–3 min standalone segments with crossfade transitions
8   Chapters           AI detects topic boundaries; outputs JSON, Markdown, FFmetadata, and YouTube timestamps
9   Summary            AI writes a Markdown README with captured screenshots
10  Social Media       Platform-tailored posts for TikTok, YouTube, Instagram, LinkedIn, and X
11  Short Posts        Per-short social media posts for all 5 platforms
12  Medium Clip Posts  Per-medium-clip social media posts for all 5 platforms
13  Blog               Dev.to blog post with frontmatter, web-sourced links via Exa
14  Queue Build        Builds publish queue from social posts with scheduled slots
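
Stage 3 caps removals at 20%. One way such a cap could work is sketched below, assuming the 20% refers to total video duration and that the longest silences are removed first. Both points are assumptions for illustration, and the names are hypothetical rather than VidPipe's actual algorithm.

```typescript
// Hypothetical sketch; the selection strategy is an assumption, not VidPipe's algorithm.
interface Segment {
  start: number // seconds
  end: number   // seconds
}

// Keep the longest silent segments first until the removal budget is spent.
function capRemovals(candidates: Segment[], videoDuration: number, maxFraction = 0.2): Segment[] {
  const budget = videoDuration * maxFraction
  const byLength = [...candidates].sort((a, b) => (b.end - b.start) - (a.end - a.start))
  const kept: Segment[] = []
  let removed = 0
  for (const seg of byLength) {
    const length = seg.end - seg.start
    if (removed + length <= budget) {
      kept.push(seg)
      removed += length
    }
  }
  // Return in timeline order so the cuts can be applied front to back.
  return kept.sort((a, b) => a.start - b.start)
}
```

For a 100-second video the budget is 20 seconds, so candidate silences of 10, 5, and 8 seconds would keep the 10- and 8-second removals and drop the 5-second one.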

Stages can be skipped independently with --no-* flags (for example --no-shorts or --no-captions; see Process Options). A stage failure does not abort the pipeline: subsequent stages proceed with whatever data is available.
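
The skip-and-continue behavior can be pictured with a small runner like the one below. It is a sketch under stated assumptions: the Stage interface and runStages function are hypothetical and are not taken from VidPipe's source.

```typescript
// Hypothetical sketch of a fault-tolerant stage runner; not VidPipe's actual code.
interface Stage {
  name: string
  skip?: boolean            // e.g. set from a --no-* flag
  run: () => Promise<void>
}

type StageOutcome = 'completed' | 'failed' | 'skipped'

async function runStages(stages: Stage[]): Promise<Map<string, StageOutcome>> {
  const outcomes = new Map<string, StageOutcome>()
  for (const stage of stages) {
    if (stage.skip) {
      outcomes.set(stage.name, 'skipped')
      continue
    }
    try {
      await stage.run()
      outcomes.set(stage.name, 'completed')
    } catch {
      // Record the failure and keep going so later stages still run.
      outcomes.set(stage.name, 'failed')
    }
  }
  return outcomes
}
```

Catching errors per stage, rather than around the whole loop, is what lets later stages proceed after an earlier one fails.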

Progress Events

Pass --progress to emit structured JSONL progress events to stderr while normal logs continue on stdout:

vidpipe process video.mp4 --progress 2>progress.jsonl

Each line is a self-contained JSON object:

{"event":"pipeline:start","videoPath":"video.mp4","totalStages":16,"timestamp":"..."}
{"event":"stage:start","stage":"ingestion","stageNumber":1,"totalStages":16,"name":"Ingestion","timestamp":"..."}
{"event":"stage:complete","stage":"ingestion","stageNumber":1,"totalStages":16,"name":"Ingestion","duration":423,"success":true,"timestamp":"..."}
{"event":"stage:skip","stage":"shorts","stageNumber":7,"totalStages":16,"name":"Shorts","reason":"SKIP_SHORTS","timestamp":"..."}
{"event":"pipeline:complete","totalDuration":45000,"stagesCompleted":14,"stagesFailed":0,"stagesSkipped":2,"timestamp":"..."}

Event types: pipeline:start, stage:start, stage:complete, stage:error, stage:skip, pipeline:complete.

Integrating tools can read stderr line-by-line to display a live progress UI (e.g., "Stage 3/16: Silence Removal").
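
A consumer might parse each stderr line along these lines. The field names come from the sample events above; the type names and helper functions are hypothetical, and fields not needed for display (timestamp, duration, and so on) are ignored.

```typescript
// Event shapes are inferred from the sample lines; extra fields are ignored.
interface StageEvent {
  event: 'stage:start' | 'stage:complete' | 'stage:error' | 'stage:skip'
  stage: string
  stageNumber: number
  totalStages: number
  name: string
}

interface PipelineEvent {
  event: 'pipeline:start' | 'pipeline:complete'
}

type ProgressEvent = StageEvent | PipelineEvent

// Parse one JSONL line; returns undefined for blank or non-JSON lines.
function parseProgressLine(line: string): ProgressEvent | undefined {
  const trimmed = line.trim()
  if (!trimmed.startsWith('{')) return undefined
  try {
    const parsed = JSON.parse(trimmed)
    return typeof parsed.event === 'string' ? (parsed as ProgressEvent) : undefined
  } catch {
    return undefined
  }
}

// Render a short status label for a live progress UI.
function formatProgress(e: ProgressEvent): string {
  if ('stageNumber' in e) {
    return `Stage ${e.stageNumber}/${e.totalStages}: ${e.name}`
  }
  return e.event === 'pipeline:start' ? 'Pipeline started' : 'Pipeline complete'
}
```

Returning undefined for unparseable lines keeps the reader tolerant of any stray non-JSON output on stderr.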


🤖 LLM Providers

VidPipe supports multiple LLM providers:

Provider            Env Var             Default Model     Notes
copilot (default)   (none)              Claude Opus 4.6   Uses GitHub Copilot auth
openai              OPENAI_API_KEY      gpt-4o            Direct OpenAI API
claude              ANTHROPIC_API_KEY   claude-opus-4.6   Direct Anthropic API

Set LLM_PROVIDER in your .env or pass via CLI. Override model with LLM_MODEL.

The pipeline tracks token usage and estimated cost across all providers, displaying a summary at the end of each run.


⚙️ Configuration

Configuration is resolved in order of precedence, highest first: CLI flags → environment variables → .env file → defaults.

# .env
OPENAI_API_KEY=sk-your-key-here
WATCH_FOLDER=/path/to/recordings
OUTPUT_DIR=/path/to/output
# EXA_API_KEY=your-exa-key       # Optional: enables web search in social/blog posts
# BRAND_PATH=./brand.json         # Optional: path to brand voice config
# FFMPEG_PATH=/usr/local/bin/ffmpeg
# FFPROBE_PATH=/usr/local/bin/ffprobe
# LATE_API_KEY=sk_your_key_here   # Optional: Late API for social publishing
# GITHUB_TOKEN=ghp_...            # Optional: GitHub token for ID8 idea storage
# IDEAS_REPO=owner/repo           # Optional: GitHub repo for storing ideas as Issues
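
The load order can be thought of as a first-match lookup across layers. The sketch below assumes the order listed is highest precedence first; resolveSetting and the layer objects are hypothetical names for illustration and do not describe VidPipe's actual loader.

```typescript
// Hypothetical sketch of layered config resolution; not VidPipe's actual loader.
type ConfigLayer = Record<string, string | undefined>

// Return the first defined value, checking layers from highest to lowest precedence.
function resolveSetting(key: string, layers: ConfigLayer[]): string | undefined {
  for (const layer of layers) {
    const value = layer[key]
    if (value !== undefined) return value
  }
  return undefined
}

// Example layers; keys reuse the env var names shown in the .env sample.
const cliFlags: ConfigLayer = { OUTPUT_DIR: '/tmp/out' }
const envVars: ConfigLayer = { OPENAI_API_KEY: 'sk-from-env' }
const dotEnvFile: ConfigLayer = { OPENAI_API_KEY: 'sk-from-dotenv', OUTPUT_DIR: './from-dotenv' }
const defaults: ConfigLayer = { OUTPUT_DIR: './recordings', BRAND_PATH: './brand.json' }

const layers = [cliFlags, envVars, dotEnvFile, defaults]
```

With these layers, OUTPUT_DIR resolves from the CLI flag, OPENAI_API_KEY from the environment, and BRAND_PATH from the defaults.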

Social media publishing is configured via schedule.json and the Late API. See Social Publishing Guide for details.


📚 Documentation

Guide                       Description
Getting Started             Prerequisites, installation, and first run
Configuration               All CLI flags, env vars, skip options, and examples
FFmpeg Setup                Platform-specific install (Windows, macOS, Linux, ARM64)
Brand Customization         Customize AI voice, vocabulary, hashtags, and content style
Social Publishing           Review, schedule, and publish social posts via Late API
Architecture (L0–L7)        Layer hierarchy, import rules, and testing strategy
Platform Content Strategy   Research-backed recommendations per social platform

Full reference docs are available at htekdev.github.io/vidpipe.


🏗️ Architecture

VidPipe uses a strict L0–L7 layered architecture where each layer can only import from specific lower layers. This enforces clean separation of concerns and makes every layer independently testable.

L7-app         CLI, servers, watchers          → L0, L1, L3, L6
L6-pipeline    Stage orchestration             → L0, L1, L5
L5-assets      Lazy-loaded asset + bridges     → L0, L1, L4
L4-agents      LLM agents (BaseAgent)          → L0, L1, L3
L3-services    Business logic + cost tracking  → L0, L1, L2
L2-clients     External API/process wrappers   → L0, L1
L1-infra       Infrastructure (config, logger) → L0
L0-pure        Pure functions, zero I/O        → (nothing)
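
The import rules in the table lend themselves to a simple lookup-based check. The sketch below transcribes the allowed targets from the table; the src/L4-agents/... path layout, the function names, and the treatment of same-layer imports as allowed are assumptions made for illustration.

```typescript
// Allowed imports per layer, transcribed from the layer table.
const ALLOWED_IMPORTS: Record<string, readonly string[]> = {
  L7: ['L0', 'L1', 'L3', 'L6'],
  L6: ['L0', 'L1', 'L5'],
  L5: ['L0', 'L1', 'L4'],
  L4: ['L0', 'L1', 'L3'],
  L3: ['L0', 'L1', 'L2'],
  L2: ['L0', 'L1'],
  L1: ['L0'],
  L0: [],
}

// Extract "L4" from a path such as "src/L4-agents/ShortsAgent.ts" (path layout is an assumption).
function layerOf(path: string): string | undefined {
  return path.match(/(?:^|\/)(L[0-7])-/)?.[1]
}

function isImportAllowed(fromPath: string, toPath: string): boolean {
  const from = layerOf(fromPath)
  const to = layerOf(toPath)
  if (!from || !to) return true   // outside the layered tree: not checked here
  if (from === to) return true    // assumption: same-layer imports are fine
  return ALLOWED_IMPORTS[from].includes(to)
}
```

Note that the table is not a plain "any lower layer" rule: L7 may import L6 and L3 but not L5, L4, or L2, which is why an explicit lookup is used instead of a numeric comparison.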

Each editing task is handled by a specialized AI agent built on the GitHub Copilot SDK:

graph TD
    BP[🧠 BaseAgent] --> SRA[SilenceRemovalAgent]
    BP --> SA[SummaryAgent]
    BP --> SHA[ShortsAgent]
    BP --> MVA[MediumVideoAgent]
    BP --> CA[ChapterAgent]
    BP --> SMA[SocialMediaAgent]
    BP --> BA[BlogAgent]
    BP --> IA[IdeationAgent]

    SRA -->|tools| T1[detect_silence, decide_removals]
    SHA -->|tools| T2[plan_shorts]
    MVA -->|tools| T3[plan_medium_clips]
    CA -->|tools| T4[generate_chapters]
    SA -->|tools| T5[capture_frame, write_summary]
    SMA -->|tools| T6[search_links, create_posts]
    BA -->|tools| T7[search_web, write_blog]
    IA -->|tools| T8[web_search, youtube_search, generate_ideas]

    style BP fill:#1e3a5f,stroke:#60a5fa,color:#fff
    style IA fill:#5a4d27,stroke:#fbbf24,color:#fff

Each agent communicates with the LLM through structured tool calls, ensuring reliable, parseable outputs. See the Architecture Guide for full details on layer rules and import enforcement.
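
To illustrate why structured tool calls give parseable output, here is a generic sketch of validating one call before acting on it. It does not use the GitHub Copilot SDK API: the ToolCall and PlannedShort shapes and the argument layout of plan_shorts are all assumptions made for the example.

```typescript
// Generic sketch of structured tool-call handling; this is not the GitHub Copilot SDK API.
interface ToolCall {
  name: string
  arguments: string // JSON-encoded arguments produced by the model
}

interface PlannedShort {
  title: string
  start: number // seconds
  end: number   // seconds
}

// Validate the payload of a plan_shorts call (argument shape is an assumption).
function parsePlanShorts(call: ToolCall): PlannedShort[] {
  if (call.name !== 'plan_shorts') throw new Error(`Unexpected tool: ${call.name}`)
  const args = JSON.parse(call.arguments)
  if (!Array.isArray(args.shorts)) throw new Error('plan_shorts: "shorts" must be an array')
  return args.shorts.map((s: any, i: number) => {
    if (typeof s.title !== 'string' || typeof s.start !== 'number' || typeof s.end !== 'number') {
      throw new Error(`plan_shorts: invalid entry at index ${i}`)
    }
    if (s.end <= s.start) throw new Error(`plan_shorts: entry ${i} ends before it starts`)
    return { title: s.title, start: s.start, end: s.end }
  })
}
```

Rejecting malformed arguments at this boundary means downstream code such as clip extraction only ever sees well-typed data.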


🛠️ Tech Stack

Technology           Purpose
TypeScript           Language (ES2022, ESM)
GitHub Copilot SDK   AI agent framework
OpenAI Whisper       Speech-to-text
Google Gemini        Vision-based video analysis
FFmpeg               Video/audio processing
Sharp                Image analysis (webcam detection)
Octokit              GitHub API (idea storage as Issues)
Commander.js         CLI framework
Chokidar             File system watching
Winston              Logging
Exa AI               Web search for social posts, blog, and ideation

🗺️ Roadmap

  • Automated social posting — Publish directly to platforms via Late API
  • Content ideation (ID8) — AI-generated, trend-backed video ideas with lifecycle tracking
  • Gemini Vision integration — AI-powered video analysis and scene detection
  • L0–L7 layered architecture — Strict separation of concerns with import enforcement
  • GitHub agentic workflows — Automated issue and PR triage via GitHub Actions
  • Hook-first clip ordering — Most engaging moment plays first in shorts
  • Multi-language support — Transcription and summaries in multiple languages
  • Custom templates — User-defined Markdown & social post templates
  • Batch processing — Process an entire folder of existing videos
  • Thumbnail generation — Auto-generate branded thumbnails for shorts

🔧 Troubleshooting

No binary found for architecture during install

ffmpeg-static (an optional dependency) bundles FFmpeg for common platforms. On unsupported architectures, it skips gracefully and vidpipe falls back to your system FFmpeg.

Fix: Install FFmpeg on your system:

  • Windows: winget install Gyan.FFmpeg
  • macOS: brew install ffmpeg
  • Linux: sudo apt install ffmpeg (Debian/Ubuntu) or sudo dnf install ffmpeg (Fedora)

You can also point to a custom binary: export FFMPEG_PATH=/path/to/ffmpeg

Run vidpipe doctor to verify your setup.


📄 License

ISC © htekdev


🧩 SDK Usage

VidPipe also ships as a Node.js ESM SDK for programmatic use:

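import { createVidPipe } from 'vidpipe'

// Configure the SDK with an API key and an output directory
const vidpipe = createVidPipe({
  openaiApiKey: process.env.OPENAI_API_KEY,
  outputDir: './recordings',
})

// Run the full pipeline on a single video
const result = await vidpipe.processVideo('./videos/demo.mp4', {
  skipGit: true,
})

// Inspect the results: the video's output folder and the number of shorts produced
console.log(result.video.videoDir)
console.log(result.shorts.length)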

SDK features include:

  • processVideo() for the full pipeline
  • ideate() plus ideas.* CRUD helpers
  • schedule.* helpers for slots, calendar, and realignment
  • video.* helpers for clips, captions, silence detection, variants, and frames
  • social.generatePosts() for quick platform-specific drafts
  • doctor() and config.* for diagnostics and configuration access

See docs/sdk.md for the full SDK guide.
