A reproducible, minimal stack for learning, experimenting, and developing sustainable Retrieval-Augmented Generation (RAG) systems locally. Designed for rapid prototyping, collaborative learning, and easy extension to CI/CD and custom LLM training.
Before you begin, ensure you have the following installed:
- Docker & Docker Compose: Required to run the stack locally.
  - Install Docker Desktop (includes Docker Compose)
  - Verify installation:

    ```bash
    docker --version
    docker compose version
    ```

- Disk space: At least 20 GB free (for models, data, and volumes)
- RAM: At least 8 GB available (16 GB+ recommended for smooth operation)
- Make: For running convenience commands (optional but recommended)
  - macOS: Pre-installed, or install via `xcode-select --install`
  - Linux: `sudo apt-get install make` (Debian/Ubuntu)
  - Windows: Install GNU Make for Windows, or use WSL2 with Linux commands
This project provides a simple, local-first RAG pipeline using Infrastructure as Code principles for reproducible, maintainable environments:
- Ollama for running LLMs locally
- Weaviate as a vector database
- Jupyter Notebooks for experimentation and documentation
- OpenWebUI (optional) for LLM chat interface
The goal is to empower anyone to learn, work, and build with RAGs on their own hardware, with minimal setup and maximal transparency. By treating local infrastructure as code, we gain better control over hardware resources, simplified maintenance, and a foundation for scaling from development to production.
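Because every service is declared in `compose.yml`, routine inspection uses standard Docker tooling rather than anything project-specific. For example (service names follow the compose file; `weaviate` is one of them):

```bash
# Show the state of every service declared in compose.yml
docker compose ps

# Follow one service's logs while debugging (service name per compose.yml)
docker compose logs -f weaviate
```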
- Clone the repository:

  ```bash
  git clone https://github.com/escape/RAG-IAC.git
  cd rag-iac
  ```

- Start the stack:

  ```bash
  make smoke
  ```

  This will bring up all services, ingest a sample document, and run a test query.
Use the provided Makefile for common stack operations:
- `make up` — Start all services in the background
- `make down` — Stop all services
- `make reset` — Stop all services and destroy all volumes (irreversible)
- `make logs` — View live logs from all services
- `make ingest` — Run the ingestion script in Jupyter
- `make query q="your question"` — Run a query against the stack
- `make smoke` — Run the full smoke test (stack up, ingest, query)
These commands simplify stack management and troubleshooting. Use `make reset` to clear all persistent data and start fresh, or `make smoke` to validate the stack end-to-end.
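As a quick illustration, a typical first session might chain these targets (the question string is just a placeholder):

```bash
make up                                    # start all services in the background
make ingest                                # load the sample documents
make query q="What does the sample document cover?"   # placeholder question
make down                                  # stop services; volumes persist
```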
- Experiment in Jupyter:
- Open Jupyter at http://localhost:8889
- Try the provided notebooks for ingestion and querying.
All utilities run inside the Jupyter container. Paths must be container paths (your repo is mounted at `/home/jovyan`). The examples below assume input files live under `/home/jovyan/data/...`.
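For example, a file checked out at `./data/raw/notes.md` on the host (hypothetical name) is addressed by its container path:

```bash
# Repo root on the host == /home/jovyan inside the Jupyter container
make estimate path=/home/jovyan/data/raw/notes.md
```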
**Tips**
- Add `QUIET=true` to batch operations to print only a compact JSON summary.
- Estimation can be very verbose on large folders; prefer the summary-only output.
Turn ChatGPT-like conversation JSON into many small markdown files ready for ingestion.
**Examples**

- Convert one JSON file to a folder of `.md` files:

  ```bash
  make json-to-text input=/home/jovyan/data/raw/conversations.json out=/home/jovyan/data/raw/json2txt
  ```

- Optionally filter by filename pattern during conversion (keeps only matching titles/ids if the script supports it):

  ```bash
  make json-to-text input=/home/jovyan/data/raw/conversations.json out=/home/jovyan/data/raw/json2txt pattern='*'
  ```
Output goes to the `out` directory; each message pair becomes `Title__0001.md`, etc.
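To spot-check the result from the host, you can list the output directory through the Jupyter container (the `jupyter` service name is an assumption; use the name from your `compose.yml`):

```bash
# Filenames are derived from conversation titles, numbered per message pair
docker compose exec jupyter ls /home/jovyan/data/raw/json2txt | head
```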
Get per-file and total chunk counts using the current splitter settings (size ~900, overlap ~150).
**Examples**

- Whole folder (totals only):

  ```bash
  make estimate path=/home/jovyan/data/raw/json2txt QUIET=true
  ```

- Single file (full detail):

  ```bash
  make estimate path=/home/jovyan/data/raw/one_big.md
  ```
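As a sanity check on the estimates: a plain sliding-window splitter with size 900 and overlap 150 advances 750 characters per chunk, so a file of N characters yields roughly (N − 900)/750 + 1 chunks. A back-of-envelope sketch, assuming the splitter follows that window model:

```bash
# chunks ≈ (chars - size) / stride + 1, where stride = size - overlap
chars=75000; size=900; overlap=150           # hypothetical 75k-character file
stride=$(( size - overlap ))                 # 750
echo $(( (chars - size) / stride + 1 ))      # ~99 full windows; a trailing remainder adds one more
```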
Useful for giant files that you want to ingest in parallel or track progress on.
**Examples**

- Split a 200 MB file into ~5 MB parts:

  ```bash
  make split file=/home/jovyan/data/raw/one_big.md size=5MB out=/home/jovyan/data/raw/parts prefix=big
  ```

This creates `out/prefix.part-0001.md`, etc.
Two common flows are supported:
- Split-and-ingest a single large file
  - In one shot, split and then ingest the parts:

    ```bash
    make ingest-batch-file file=/home/jovyan/data/raw/one_big.md size=5MB out=/home/jovyan/data/raw/parts prefix=big QUIET=true
    ```

- Ingest a directory of files
  - Non-recursive (default) with an explicit filename pattern:

    ```bash
    make ingest-batch-dir dir=/home/jovyan/data/raw/json2txt pattern='*.md' QUIET=true
    ```

  - Recursive mode:

    ```bash
    make ingest-batch-dir dir=/home/jovyan/data/raw recursive=true QUIET=true
    ```
**Notes**

- `QUIET` mode returns a single final JSON summary line with totals (ingested chunks, files processed).
- The scripts wait for Ollama and Weaviate to be ready before starting.
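A readiness gate like this typically amounts to polling each service until its health endpoint answers. A minimal sketch of the idea, assuming the default ports (Ollama 11434, Weaviate 8080) are published to the host:

```bash
# Poll Ollama's model listing and Weaviate's readiness probe until both respond
until curl -sf http://localhost:11434/api/tags > /dev/null; do sleep 2; done
until curl -sf http://localhost:8080/v1/.well-known/ready > /dev/null; do sleep 2; done
echo "Ollama and Weaviate are ready"
```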
Ask questions against your ingested corpus using retrieval + local LLM generation.
**Example**

```bash
make query q='What are the key steps in the creative methodology breakdown?'
```
**Notes**
- The query path validates service readiness and will error if the question is empty.
- Uses the same embeddings/vector store as the ingestion utilities.
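Under the hood, a query is retrieval from Weaviate followed by generation with Ollama. The generation half can be sketched with Ollama's standard `/api/generate` route; the prompt below is a placeholder for what the scripts actually assemble from the retrieved chunks:

```bash
curl -s http://localhost:11434/api/generate -d '{
  "model": "phi3:mini",
  "prompt": "Answer using the context.\n\nContext: <retrieved chunks>\n\nQuestion: <your question>",
  "stream": false
}'
```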
- Ollama: Runs LLMs locally (e.g., `phi3:mini`). No cloud required.
- Weaviate: Stores and retrieves vector embeddings for context-aware queries.
- Jupyter: Interactive notebooks for code, notes, and experiments.
- OpenWebUI: Optional web interface for chatting with LLMs.
The stack is designed to maintain consistency across different interfaces:
- Weaviate Vector Store: Central source of truth for embeddings
  - Persistent across restarts via Docker volumes
  - Accessible simultaneously from all interfaces
  - Use `make reset` to clear if needed
- Programmatic (Python Scripts)
  - Direct API access to Weaviate and Ollama (see the sketch after this list)
  - Best for automated pipelines and testing
  - Use for bulk operations and systematic evaluation
- Interactive (Jupyter)
  - Same underlying APIs as the scripts
  - Great for exploration and learning
  - Maintains persistence with the main vector store
  - Perfect for LLM training experiments
- Chat (OpenWebUI)
  - Uses the same Ollama instance
  - Can access the same context via Weaviate
  - Ideal for quick testing and demonstrations
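The direct API access mentioned above can be sketched with plain HTTP calls against the two services (default ports assumed; the embedding model name is illustrative, not necessarily what this stack uses):

```bash
# Weaviate: inspect the schema that ingestion created
curl -s http://localhost:8080/v1/schema

# Ollama: request an embedding directly (model name is an assumption)
curl -s http://localhost:11434/api/embeddings \
  -d '{"model": "nomic-embed-text", "prompt": "hello world"}'
```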
- Use notebooks for exploration and training
- Scripts for production and testing
- OpenWebUI for demo and quick checks
- All interfaces share the same vector store
- Document significant changes to shared state
- Ingest data: Place text files in `data/raw/` and run the ingestion notebook or script.
- Query: Use the query notebook or script to ask questions and get context-aware answers.
- Extend: Add new notebooks, scripts, or data sources as needed.
- CI/CD: Easily add GitHub Actions or other CI/CD tools for automated testing and deployment.
- Custom LLM Training: The notebook approach supports step-by-step experimentation with fine-tuning and evaluation.
- Scalability: Start minimal, scale up as needed—add more data, models, or integrations.
- Understand the RAG pipeline end-to-end.
- Learn how to run LLMs and vector DBs locally.
- Experiment and document findings in notebooks.
- Build a foundation for sustainable, privacy-preserving AI development.
Pull requests, issues, and suggestions are welcome! Help us make local RAGs accessible to all.
This project is a starting point. Refine, extend, and share your improvements!
If you encounter issues (e.g., Weaviate stuck, stale data, or persistent errors), you can restart the stack from scratch:
- Stop all services:

  ```bash
  docker compose down
  ```

- Remove persistent volumes (clears all data):

  ```bash
  docker volume rm rag-iac_weaviate-data rag-iac_ollama-data
  ```

  (You can remove only `rag-iac_weaviate-data` if you want to keep Ollama models.)

- Restart the stack:

  ```bash
  make smoke
  ```
This will ensure a clean state for all services. Useful for troubleshooting, development, or sharing a reproducible environment.
Weaviate stores its data in the named Docker volume `rag-iac_weaviate-data` (mounted at `/var/lib/weaviate`). Here are safe, minimal recipes for maintaining that data.
**Important notes**

- Prefer stopping Weaviate before backup/restore to ensure consistency.
- Keep `CLUSTER_HOSTNAME` stable (already set to `node1` in `compose.yml`). If it changes, stale Raft state may require a targeted cleanup.
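One way to confirm the effective value before digging into Raft errors (the key name matches `compose.yml` above):

```bash
# Render the resolved compose configuration and check the hostname
docker compose config | grep CLUSTER_HOSTNAME   # expect: node1
```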
This clears Weaviate’s state without touching Ollama models.
```bash
# Stop the stack (or at least Weaviate)
docker compose down

# Remove only Weaviate data (irreversible)
docker volume rm rag-iac_weaviate-data

# Start and validate
make smoke
```

Creates a timestamped tarball under `./backups/`. Best done with Weaviate stopped.
```bash
# Stop only Weaviate to get a consistent snapshot
docker compose stop weaviate

# Ensure the backups folder exists
mkdir -p backups

# Create a compressed backup of the volume
docker run --rm \
  -v rag-iac_weaviate-data:/data:ro \
  -v "$PWD/backups:/backup" \
  alpine:3.19 sh -c "tar -C /data -czf /backup/weaviate-$(date +%Y%m%d-%H%M%S).tgz . && ls -lh /backup/weaviate-*.tgz"

# Bring Weaviate back up
docker compose start weaviate
```

Optional: back up Ollama models as well:
```bash
docker run --rm \
  -v rag-iac_ollama-data:/data:ro \
  -v "$PWD/backups:/backup" \
  alpine:3.19 sh -c "tar -C /data -czf /backup/ollama-$(date +%Y%m%d-%H%M%S).tgz . && ls -lh /backup/ollama-*.tgz"
```

Restores a previously created tarball into the `rag-iac_weaviate-data` volume.
```bash
# Stop the stack (or at least Weaviate)
docker compose down

# Pick the backup file name (from ./backups)
BACKUP_FILE="weaviate-YYYYMMDD-HHMMSS.tgz"  # replace with your filename

# Restore into the (emptied) volume
docker run --rm \
  -v rag-iac_weaviate-data:/data \
  -v "$PWD/backups:/backup" \
  alpine:3.19 sh -c "rm -rf /data/* && tar -C /data -xzf /backup/$BACKUP_FILE && ls -la /data | head"

# Start and validate
docker compose up -d
make smoke
```

To list a backup's contents without restoring it:

```bash
tar -tzf backups/weaviate-*.tgz | head
```

- If Weaviate reports 503 and its logs show Raft join errors, prefer the targeted cleanup above instead of a full reset.
- Backups can be large; ensure sufficient disk space before creating/restoring.
- You can also script these as `make` targets (e.g., `make backup-weaviate`, `make restore-weaviate`); they're left out by default to keep the stack minimal.
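If you do add them, a hypothetical target could simply wrap the backup recipe above (sketch only, not part of the shipped Makefile; recipe lines must be tab-indented):

```make
# Hypothetical convenience target wrapping the backup recipe above
backup-weaviate:
	docker compose stop weaviate
	mkdir -p backups
	docker run --rm -v rag-iac_weaviate-data:/data:ro -v "$$PWD/backups:/backup" \
		alpine:3.19 sh -c "tar -C /data -czf /backup/weaviate-$$(date +%Y%m%d-%H%M%S).tgz ."
	docker compose start weaviate
```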
