A reproducible, minimal stack for learning, experimenting, and developing sustainable Retrieval-Augmented Generation (RAG) systems locally. Designed for rapid prototyping, collaborative learning, and easy extension to CI/CD and custom LLM training.
Before you begin, ensure you have the following installed:
- Docker & Docker Compose: Required to run the stack locally.
  - Install Docker Desktop (includes Docker Compose)
  - Verify installation:

    ```bash
    docker --version
    docker compose version
    ```

- Disk space: At least 20 GB free (for models, data, and volumes)
- RAM: At least 8 GB available (16 GB+ recommended for smooth operation)
- Make: For running convenience commands (optional but recommended)
  - macOS: Pre-installed, or install via `xcode-select --install`
  - Linux: `sudo apt-get install make` (Debian/Ubuntu)
  - Windows: Install GNU Make for Windows, or use WSL2 with Linux commands
This project provides a simple, local-first RAG pipeline using Infrastructure as Code principles for reproducible, maintainable environments:
- Ollama for running LLMs locally
- Weaviate as a vector database
- Jupyter Notebooks for experimentation and documentation
- OpenWebUI (optional) for LLM chat interface
The goal is to empower anyone to learn, work, and build with RAGs on their own hardware, with minimal setup and maximal transparency. By treating local infrastructure as code, we gain better control over hardware resources, simplified maintenance, and a foundation for scaling from development to production.
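Because every service is declared in `compose.yml`, routine inspection uses standard Docker tooling rather than anything project-specific. For example (service names follow the compose file; `weaviate` is one of them):

```bash
# Show the state of every service declared in compose.yml
docker compose ps

# Follow one service's logs while debugging (service name per compose.yml)
docker compose logs -f weaviate
```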
- Clone the repository:

  ```bash
  git clone https://github.com/escape/RAG-IAC.git
  cd rag-iac
  ```

- Start the stack:

  ```bash
  make smoke
  ```

  This will bring up all services, ingest a sample document, and run a test query.
Use the provided Makefile for common stack operations:
- `make up` — Start all services in the background
- `make down` — Stop all services
- `make reset` — Stop all services and destroy all volumes (irreversible)
- `make logs` — View live logs from all services
- `make ingest` — Run the ingestion script in Jupyter
- `make query q="your question"` — Run a query against the stack
- `make smoke` — Run the full smoke test (stack up, ingest, query)
These commands simplify stack management and troubleshooting. Use `make reset` to clear all persistent data and start fresh, or `make smoke` to validate the stack end-to-end.
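As a quick illustration, a typical first session might chain these targets (the question string is just a placeholder):

```bash
make up                                    # start all services in the background
make ingest                                # load the sample documents
make query q="What does the sample document cover?"   # placeholder question
make down                                  # stop services; volumes persist
```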
- Experiment in Jupyter:
- Open Jupyter at http://localhost:8889
- Try the provided notebooks for ingestion and querying.
All utilities run inside the Jupyter container. Paths must be container paths (your repo is mounted at `/home/jovyan`). The examples below assume input files live under `/home/jovyan/data/...`.
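For example, a file checked out at `./data/raw/notes.md` on the host (hypothetical name) is addressed by its container path:

```bash
# Repo root on the host == /home/jovyan inside the Jupyter container
make estimate path=/home/jovyan/data/raw/notes.md
```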
**Tips**
- Add `QUIET=true` to batch operations to print only a compact JSON summary.
- Estimation can be very verbose on large folders; prefer the summary-only output.
Turn ChatGPT-like conversation JSON into many small markdown files ready for ingestion.
**Examples**

- Convert one JSON file to a folder of `.md` files:

  ```bash
  make json-to-text input=/home/jovyan/data/raw/conversations.json out=/home/jovyan/data/raw/json2txt
  ```

- Optionally filter by filename pattern during conversion (keeps only matching titles/ids if the script supports it):

  ```bash
  make json-to-text input=/home/jovyan/data/raw/conversations.json out=/home/jovyan/data/raw/json2txt pattern='*'
  ```
Output goes to the `out` directory; each message pair becomes `Title__0001.md`, etc.
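To spot-check the result from the host, you can list the output directory through the Jupyter container (the `jupyter` service name is an assumption; use the name from your `compose.yml`):

```bash
# Filenames are derived from conversation titles, numbered per message pair
docker compose exec jupyter ls /home/jovyan/data/raw/json2txt | head
```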
Get per-file and total chunk counts using the current splitter settings (size ~900, overlap ~150).
**Examples**

- Whole folder (totals only):

  ```bash
  make estimate path=/home/jovyan/data/raw/json2txt QUIET=true
  ```

- Single file (full detail):

  ```bash
  make estimate path=/home/jovyan/data/raw/one_big.md
  ```
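As a sanity check on the estimates: a plain sliding-window splitter with size 900 and overlap 150 advances 750 characters per chunk, so a file of N characters yields roughly (N − 900)/750 + 1 chunks. A back-of-envelope sketch, assuming the splitter follows that window model:

```bash
# chunks ≈ (chars - size) / stride + 1, where stride = size - overlap
chars=75000; size=900; overlap=150           # hypothetical 75k-character file
stride=$(( size - overlap ))                 # 750
echo $(( (chars - size) / stride + 1 ))      # ~99 full windows; a trailing remainder adds one more
```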
Useful for giant files that you want to ingest in parallel or track progress on.
**Examples**

- Split a 200 MB file into ~5 MB parts:

  ```bash
  make split file=/home/jovyan/data/raw/one_big.md size=5MB out=/home/jovyan/data/raw/parts prefix=big
  ```

This creates `out/prefix.part-0001.md`, etc.
Two common flows are supported:
- Split-and-ingest a single large file
  - In one shot, split and then ingest the parts:

    ```bash
    make ingest-batch-file file=/home/jovyan/data/raw/one_big.md size=5MB out=/home/jovyan/data/raw/parts prefix=big QUIET=true
    ```

- Ingest a directory of files
  - Non-recursive (default) with an explicit filename pattern:

    ```bash
    make ingest-batch-dir dir=/home/jovyan/data/raw/json2txt pattern='*.md' QUIET=true
    ```

  - Recursive mode:

    ```bash
    make ingest-batch-dir dir=/home/jovyan/data/raw recursive=true QUIET=true
    ```
**Notes**

- `QUIET` mode returns a single final JSON summary line with totals (ingested chunks, files processed).
- The scripts wait for Ollama and Weaviate to be ready before starting.
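A readiness gate like this typically amounts to polling each service until its health endpoint answers. A minimal sketch of the idea, assuming the default ports (Ollama 11434, Weaviate 8080) are published to the host:

```bash
# Poll Ollama's model listing and Weaviate's readiness probe until both respond
until curl -sf http://localhost:11434/api/tags > /dev/null; do sleep 2; done
until curl -sf http://localhost:8080/v1/.well-known/ready > /dev/null; do sleep 2; done
echo "Ollama and Weaviate are ready"
```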
Ask questions against your ingested corpus using retrieval + local LLM generation.
**Example**

```bash
make query q='What are the key steps in the creative methodology breakdown?'
```
**Notes**
- The query path validates service readiness and will error if the question is empty.
- Uses the same embeddings/vector store as the ingestion utilities.
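Under the hood, a query is retrieval from Weaviate followed by generation with Ollama. The generation half can be sketched with Ollama's standard `/api/generate` route; the prompt below is a placeholder for what the scripts actually assemble from the retrieved chunks:

```bash
curl -s http://localhost:11434/api/generate -d '{
  "model": "phi3:mini",
  "prompt": "Answer using the context.\n\nContext: <retrieved chunks>\n\nQuestion: <your question>",
  "stream": false
}'
```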
- Ollama: Runs LLMs locally (e.g., `phi3:mini`). No cloud required.
- Weaviate: Stores and retrieves vector embeddings for context-aware queries.
- Jupyter: Interactive notebooks for code, notes, and experiments.
- OpenWebUI: Optional web interface for chatting with LLMs.
The stack is designed to maintain consistency across different interfaces:
- Weaviate Vector Store: Central source of truth for embeddings
  - Persistent across restarts via Docker volumes
  - Accessible simultaneously from all interfaces
  - Use `make reset` to clear if needed
- Programmatic (Python Scripts)
  - Direct API access to Weaviate and Ollama (see the sketch after this list)
  - Best for automated pipelines and testing
  - Use for bulk operations and systematic evaluation
- Interactive (Jupyter)
  - Same underlying APIs as the scripts
  - Great for exploration and learning
  - Maintains persistence with the main vector store
  - Perfect for LLM training experiments
- Chat (OpenWebUI)
  - Uses the same Ollama instance
  - Can access the same context via Weaviate
  - Ideal for quick testing and demonstrations
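The direct API access mentioned above can be sketched with plain HTTP calls against the two services (default ports assumed; the embedding model name is illustrative, not necessarily what this stack uses):

```bash
# Weaviate: inspect the schema that ingestion created
curl -s http://localhost:8080/v1/schema

# Ollama: request an embedding directly (model name is an assumption)
curl -s http://localhost:11434/api/embeddings \
  -d '{"model": "nomic-embed-text", "prompt": "hello world"}'
```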
- Use notebooks for exploration and training
- Scripts for production and testing
- OpenWebUI for demo and quick checks
- All interfaces share the same vector store
- Document significant changes to shared state
- Ingest data: Place text files in `data/raw/` and run the ingestion notebook or script.
- Query: Use the query notebook or script to ask questions and get context-aware answers.
- Extend: Add new notebooks, scripts, or data sources as needed.
- CI/CD: Easily add GitHub Actions or other CI/CD tools for automated testing and deployment.
- Custom LLM Training: The notebook approach supports step-by-step experimentation with fine-tuning and evaluation.
- Scalability: Start minimal, scale up as needed—add more data, models, or integrations.
- Understand the RAG pipeline end-to-end.
- Learn how to run LLMs and vector DBs locally.
- Experiment and document findings in notebooks.
- Build a foundation for sustainable, privacy-preserving AI development.
Pull requests, issues, and suggestions are welcome! Help us make local RAGs accessible to all.
This project is a starting point. Refine, extend, and share your improvements!
If you encounter issues (e.g., Weaviate stuck, stale data, or persistent errors), you can restart the stack from scratch:
- Stop all services:

  ```bash
  docker compose down
  ```

- Remove persistent volumes (clears all data):

  ```bash
  docker volume rm rag-iac_weaviate-data rag-iac_ollama-data
  ```

  (You can remove only `rag-iac_weaviate-data` if you want to keep Ollama models.)

- Restart the stack:

  ```bash
  make smoke
  ```
This will ensure a clean state for all services. Useful for troubleshooting, development, or sharing a reproducible environment.
Weaviate stores its data in the named Docker volume `rag-iac_weaviate-data` (mounted at `/var/lib/weaviate`). Here are safe, minimal recipes for maintaining that data.
**Important notes**

- Prefer stopping Weaviate before backup/restore to ensure consistency.
- Keep `CLUSTER_HOSTNAME` stable (already set to `node1` in `compose.yml`). If it changes, stale Raft state may require a targeted cleanup.
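One way to confirm the effective value before digging into Raft errors (the key name matches `compose.yml` above):

```bash
# Render the resolved compose configuration and check the hostname
docker compose config | grep CLUSTER_HOSTNAME   # expect: node1
```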
This clears Weaviate’s state without touching Ollama models.
```bash
# Stop the stack (or at least Weaviate)
docker compose down

# Remove only Weaviate data (irreversible)
docker volume rm rag-iac_weaviate-data

# Start and validate
make smoke
```

Creates a timestamped tarball under `./backups/`. Best done with Weaviate stopped.
```bash
# Stop only Weaviate to get a consistent snapshot
docker compose stop weaviate

# Ensure the backups folder exists
mkdir -p backups

# Create a compressed backup of the volume
docker run --rm \
  -v rag-iac_weaviate-data:/data:ro \
  -v "$PWD/backups:/backup" \
  alpine:3.19 sh -c "tar -C /data -czf /backup/weaviate-$(date +%Y%m%d-%H%M%S).tgz . && ls -lh /backup/weaviate-*.tgz"

# Bring Weaviate back up
docker compose start weaviate
```

Optional: back up Ollama models as well:
```bash
docker run --rm \
  -v rag-iac_ollama-data:/data:ro \
  -v "$PWD/backups:/backup" \
  alpine:3.19 sh -c "tar -C /data -czf /backup/ollama-$(date +%Y%m%d-%H%M%S).tgz . && ls -lh /backup/ollama-*.tgz"
```

Restores a previously created tarball into the `rag-iac_weaviate-data` volume.
```bash
# Stop the stack (or at least Weaviate)
docker compose down

# Pick the backup file name (from ./backups)
BACKUP_FILE="weaviate-YYYYMMDD-HHMMSS.tgz"  # replace with your filename

# Restore into the (emptied) volume
docker run --rm \
  -v rag-iac_weaviate-data:/data \
  -v "$PWD/backups:/backup" \
  alpine:3.19 sh -c "rm -rf /data/* && tar -C /data -xzf /backup/$BACKUP_FILE && ls -la /data | head"

# Start and validate
docker compose up -d
make smoke
```

To list a backup's contents without restoring it:

```bash
tar -tzf backups/weaviate-*.tgz | head
```

- If Weaviate reports 503 and its logs show Raft join errors, prefer the targeted cleanup above instead of a full reset.
- Backups can be large; ensure sufficient disk space before creating/restoring.
- You can also script these as `make` targets (e.g., `make backup-weaviate`, `make restore-weaviate`); they're left out by default to keep the stack minimal.
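If you do add them, a hypothetical target could simply wrap the backup recipe above (sketch only, not part of the shipped Makefile; recipe lines must be tab-indented):

```make
# Hypothetical convenience target wrapping the backup recipe above
backup-weaviate:
	docker compose stop weaviate
	mkdir -p backups
	docker run --rm -v rag-iac_weaviate-data:/data:ro -v "$$PWD/backups:/backup" \
		alpine:3.19 sh -c "tar -C /data -czf /backup/weaviate-$$(date +%Y%m%d-%H%M%S).tgz ."
	docker compose start weaviate
```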
