Summary
Implement zero-downtime deployments for the Tale platform using blue-green deployment strategy with Docker Compose on a single VPS/VM.
Current Pain Points
- Service restarts cause outages of roughly 90 seconds
- Database migrations block traffic
- Traffic does not resume until slow-starting containers finish booting
Proposed Solution
Use blue-green deployment with Caddy upstream pools. Two versions of stateless services run simultaneously during deployment, with traffic switched only after the new version is healthy.
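A minimal sketch of how the Caddy side could look, with the site block generated by the deploy script so that the preferred color can be flipped. With `lb_policy first`, Caddy sends each request to the first available upstream in list order, and active health checks (`health_uri`) keep an unhealthy color out of rotation. The domain `tale.example.com`, port 3000, the `/api/health` path, the `platform-<color>` hostnames and the helper name are all assumptions.

```bash
# Hypothetical helper: print a Caddyfile site block that prefers color $1.
# Domain, port 3000, /api/health and the platform-<color> hostnames are assumptions.
render_platform_site() {
  local primary="$1" fallback
  case "$primary" in
    blue) fallback=green ;;
    green) fallback=blue ;;
    *)
      echo "unknown color: $primary" >&2
      return 1
      ;;
  esac
  # lb_policy first: use the first available upstream in list order, so the
  # preferred color takes all traffic and the other one is only a fallback.
  # Active health checks mark a color unavailable until /api/health passes.
  cat <<EOF
tale.example.com {
    reverse_proxy platform-${primary}:3000 platform-${fallback}:3000 {
        lb_policy first
        lb_try_duration 10s
        health_uri /api/health
        health_interval 5s
    }
}
EOF
}
```

After the rendered block is written into the Caddyfile, a graceful reload such as `docker compose exec proxy caddy reload --config /etc/caddy/Caddyfile` would apply it without restarting the proxy. The service name and config path here are assumptions about the compose setup.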
How It Works
PHASE 1 (Blue serving): Internet → Caddy → [Blue ✅]
PHASE 2 (Green starting): Internet → Caddy → [Blue ✅] + [Green ⏳]
PHASE 3 (Both healthy, traffic switched to Green): Internet → Caddy → [Blue draining] + [Green ✅ NEW]
PHASE 4 (Cleanup): Internet → Caddy → [Green ✅]
Colors rotate: blue → green → blue → green...
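One way `scripts/deploy.sh` could track the rotation is a small state file holding the live color. The sketch below assumes a hypothetical `.deploy/active_color` path and helper names chosen for illustration. On a fresh host with no state, it picks blue, which matches the testing plan's first deployment.

```bash
# Hypothetical state-tracking helpers for scripts/deploy.sh.
# STATE_FILE is an assumed location, not part of the original plan.
STATE_FILE="${STATE_FILE:-.deploy/active_color}"

# Print the color currently receiving traffic, or "none" before the first deploy.
active_color() {
  if [ -f "$STATE_FILE" ]; then
    cat "$STATE_FILE"
  else
    echo none
  fi
}

# Print the color the next deployment should use: none -> blue -> green -> blue ...
next_color() {
  case "$1" in
    none | green) echo blue ;;
    blue) echo green ;;
    *)
      echo "unknown color: $1" >&2
      return 1
      ;;
  esac
}

# Record the new live color once traffic has been switched to it.
set_active_color() {
  mkdir -p "$(dirname "$STATE_FILE")"
  printf '%s\n' "$1" >"$STATE_FILE"
}
```

A deploy would call `next_color "$(active_color)"` to choose the target, and would call `set_active_color` only after Caddy has been switched. A failed health check therefore leaves the recorded color untouched.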
Implementation Tasks
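Files to Create
- scripts/deploy.sh - Deployment orchestration script
- compose.blue.yml - Blue deployment overlay (-blue suffix: platform-blue, rag-blue, etc.)
- compose.green.yml - Green deployment overlay (-green suffix: platform-green, rag-green, etc.)
Files to Modify
- compose.yml - Remove container_name from stateless services (platform, rag, crawler, search, graph-db); keep container_name for stateful services (db, proxy)
- services/proxy/Caddyfile - Upstream pools for both colors (lb_policy first)
- services/platform/docker-entrypoint.sh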
Database Consistency
Both Blue and Green share the same database. For schema changes, use the expand-contract pattern:
- EXPAND: Add new columns/tables (backward compatible) - run BEFORE deployment
- DEPLOY: Code works with both old and new schema
- CONTRACT: Remove old columns (only AFTER old code is gone)
Safe Migrations (EXPAND)
- ADD COLUMN (nullable, or with a DEFAULT)
- CREATE TABLE
- CREATE INDEX CONCURRENTLY
Unsafe Migrations (CONTRACT - run manually after deploy)
- DROP COLUMN
- RENAME COLUMN
- SET NOT NULL on an existing column
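To keep CONTRACT steps out of the automated path, the deploy script could refuse migration files that contain them. This is a minimal sketch with a hypothetical function name. It is a heuristic text match rather than a SQL parser, so it can miss variants such as `ALTER TABLE t DROP x` without the `COLUMN` keyword.

```bash
# Hypothetical pre-deploy guard for migration files.
# Heuristic pattern match, not a SQL parser: it flags the CONTRACT statements
# listed above (DROP COLUMN, RENAME COLUMN, SET NOT NULL) case-insensitively.
check_migration() {
  local file="$1"
  if [ ! -r "$file" ]; then
    echo "cannot read migration: $file" >&2
    return 2
  fi
  if grep -Eiq 'DROP[[:space:]]+COLUMN|RENAME[[:space:]]+COLUMN|SET[[:space:]]+NOT[[:space:]]+NULL' "$file"; then
    echo "contract statement in $file: run it manually after the old color is gone" >&2
    return 1
  fi
  return 0
}
```

Returning 2 for an unreadable file, rather than 0, prevents a typo in the path from silently passing the guard.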
Resource Requirements
Running two versions simultaneously requires ~2x memory during deployment:
| Phase | Memory Required |
|---|---|
| Normal operation | ~5-6 GB |
| During deployment | ~10-12 GB |
Recommendation: the VPS should have at least 12 GB of RAM, ideally 16 GB.
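Because the live color is already using its share, a pre-flight check only needs to confirm that enough memory is free for the second color. This sketch assumes that the new color needs about as much as the live one (~5-6 GB) and reads Linux's `/proc/meminfo`. The variable and function names are hypothetical.

```bash
# Hypothetical pre-flight check before starting the second color.
# Assumption: the new color needs about as much RAM as the live one
# (~5-6 GB per the table above), so require that much to be available now.
EXTRA_MEM_GB="${EXTRA_MEM_GB:-6}"

# Succeeds if MemAvailable (reported in KiB) covers EXTRA_MEM_GB GiB.
# $1: meminfo-formatted file, /proc/meminfo by default (Linux only).
has_enough_memory() {
  local meminfo="${1:-/proc/meminfo}" avail_kb
  avail_kb="$(awk '/^MemAvailable:/ { print $2 }' "$meminfo")"
  if [ -z "$avail_kb" ]; then
    echo "MemAvailable not found in $meminfo" >&2
    return 2
  fi
  [ "$avail_kb" -ge $((EXTRA_MEM_GB * 1024 * 1024)) ]
}
```

Checking `MemAvailable` rather than `MemTotal` accounts for what the running color and the database already use.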
Deployment Commands
```bash
# Normal deployment
./scripts/deploy.sh deploy

# Quick rollback (only works while the previous color's containers are still running)
./scripts/deploy.sh rollback

# Check status
./scripts/deploy.sh status
```
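The `deploy` command has to wait until the new color is healthy before it switches traffic. A minimal sketch of a generic retry helper follows. The name `wait_until` and its argument order are hypothetical, and the probe is passed in as a command so that the helper does not assume a particular health endpoint.

```bash
# Hypothetical helper: retry a probe command until it succeeds or a timeout expires.
# Usage: wait_until <timeout_seconds> <interval_seconds> <command...>
wait_until() {
  local timeout="$1" interval="$2" elapsed=0
  shift 2
  until "$@"; do
    if [ "$elapsed" -ge "$timeout" ]; then
      echo "timed out after ${timeout}s waiting for: $*" >&2
      return 1
    fi
    sleep "$interval"
    elapsed=$((elapsed + interval))
  done
}
```

For example, `wait_until 120 2 curl -fsS -o /dev/null http://localhost:3000/api/health` would poll for up to two minutes. The URL is hypothetical. In practice, the probe must reach the new color directly, not through Caddy, so that it cannot be answered by the old one. If the wait fails, the script would stop the new color and leave Caddy pointing at the old one.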
Testing Plan
- Deploy blue version initially
- Make code change and deploy (should switch to green)
- Verify zero dropped requests during switch
- Make another change and deploy (should switch back to blue)
- Test rollback scenario
- Test failed deployment scenario (new version doesn't pass health check)
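For the "zero dropped requests" check, one option is to send a steady stream of requests during the switch and count the non-2xx/3xx responses. This sketch only summarizes status codes read from stdin. The function name is hypothetical.

```bash
# Hypothetical helper: read one HTTP status code per line on stdin and fail
# if any is not 2xx/3xx (curl reports 000 when the connection itself fails).
check_no_dropped_requests() {
  awk '
    { total++ }
    $1 !~ /^[23][0-9][0-9]$/ { failed++ }
    END {
      printf "%d requests, %d failed\n", total, failed
      exit (failed > 0)
    }'
}
```

During a deploy, a second terminal could run `for i in $(seq 1 2000); do curl -s -o /dev/null -w '%{http_code}\n' https://tale.example.com/; sleep 0.05; done | check_no_dropped_requests`, where the URL is a placeholder.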
Alternative: Simpler Rolling Update
If blue-green is too resource-intensive, use rolling updates:
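```bash
docker compose up -d --no-deps --build platform
```
Caddy's `lb_try_duration` keeps retrying each request against the upstream while the container restarts, which absorbs brief unavailability. Expect a window of roughly 10-30 s in which some requests may still fail.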
Detailed Plan
See the comment below for the full implementation plan with diagrams.