Generate clean, token-optimized code structure maps using tree-sitter for AI-assisted development.
Inspired by: Aider's repomap.py
Purpose: Reduce context window tokens while maintaining code understanding, improving AI performance and reducing costs.
This repository includes complete automation for repomap management. The repomap is automatically updated via pre-commit hooks and validated in CI.
# Install dependencies (includes tree-sitter packages)
uv pip install -r requirements.txt
# Install pre-commit hooks (includes repomap auto-update)
pre-commit install
# Manual update (if needed)
./scripts/update-repomap.sh
# Validate repomap is current
./scripts/update-repomap.sh --check

That's it! The repomap will auto-update when you commit code changes.
- Python 3.11+
- pip or uv package manager
# Recommended method
uv pip install -r requirements.txt
# Or install individually with uv
uv pip install tree-sitter tree-sitter-python tree-sitter-javascript tree-sitter-typescript tree-sitter-go tree-sitter-bash

# Map current directory
python repomap.py .
# Map specific directory
python repomap.py /path/to/repo
# Show verbose output (processing details, warnings)
python repomap.py . --verbose
# Disable parallel processing
python repomap.py . --no-parallel
# Set maximum file size (default: 1MB)
python repomap.py . --max-file-size 2097152 # 2MB

# Save to file
python repomap.py . > repomap.txt
# Copy to clipboard (macOS)
python repomap.py . | pbcopy
# Copy to clipboard (Linux)
python repomap.py . | xclip -selection clipboard

Repomap generates a clean tree structure showing:
- File hierarchy
- Code symbols (classes, functions, methods, interfaces, structs)
- Nesting relationships
src/
  main.py
    class Application
      def __init__()
      def run()
    def main()
  utils/
    helpers.py
      def sanitize_string()
      def validate_slug()
  config.py
    class Config
      def load()
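
To make the output format concrete, here is a minimal sketch that produces the same kind of indented outline for a single Python source string. It uses Python's standard `ast` module as a stand-in for tree-sitter, so it only handles Python and is not how `repomap.py` itself is implemented; the function name `outline` and the two-space indent are assumptions made for illustration.

```python
import ast


def outline(source: str, indent: str = "  ") -> list[str]:
    """Return indented 'class X' / 'def f()' lines for a Python source string."""
    lines: list[str] = []

    def visit(node: ast.AST, depth: int) -> None:
        # Only direct children are inspected, so nesting depth mirrors the source.
        for child in ast.iter_child_nodes(node):
            if isinstance(child, ast.ClassDef):
                lines.append(f"{indent * depth}class {child.name}")
                visit(child, depth + 1)
            elif isinstance(child, (ast.FunctionDef, ast.AsyncFunctionDef)):
                lines.append(f"{indent * depth}def {child.name}()")
                visit(child, depth + 1)

    visit(ast.parse(source), 0)
    return lines


example = '''
class Application:
    def __init__(self):
        pass

    def run(self):
        pass


def main():
    Application().run()
'''

print("\n".join(outline(example)))
```

Methods appear one level below their class, and module-level functions appear at the top level, which matches the nesting shown in the example tree.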
| Language | Extensions | Symbols Extracted |
|---|---|---|
| Python | .py | Functions, Classes, Methods |
| JavaScript | .js | Functions, Classes |
| TypeScript | .ts, .tsx | Functions, Classes, Interfaces |
| Go | .go | Functions, Methods, Structs |
| Shell | .sh, .bash | Functions |
Repomap automatically skips:
- Binary files (detected via null bytes)
- Large files (>1MB by default, configurable)
- Files matching .gitignore patterns
- Common ignore patterns (.git, __pycache__, node_modules, .venv, etc.)
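
The binary and size checks above could look roughly like the following sketch. The 8 KB sample size and the function names are assumptions for illustration; only the null-byte heuristic and the 1MB default come from this README.

```python
import os

MAX_FILE_SIZE = 1_048_576  # 1MB default, as documented above


def looks_binary(path: str, sample_size: int = 8192) -> bool:
    """Treat a file as binary if its first bytes contain a null byte."""
    with open(path, "rb") as handle:
        return b"\0" in handle.read(sample_size)


def should_skip(path: str, max_file_size: int = MAX_FILE_SIZE) -> bool:
    """Check the size first (cheap), then sniff the content for null bytes."""
    if os.path.getsize(path) > max_file_size:
        return True
    return looks_binary(path)
```

Checking the size before opening the file avoids reading large generated files at all.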
- Parallel processing: Uses multiprocessing for faster parsing (disable with --no-parallel)
- Efficient parsing: Tree-sitter provides fast, incremental parsing
- Smart filtering: Skips irrelevant files before parsing
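
As a rough illustration of the parallel mode, the sketch below fans per-file work out to a `multiprocessing.Pool` and falls back to a plain loop when parallelism is disabled. The worker `count_defs` is a deliberately trivial stand-in for tree-sitter parsing, and `process_all` is a hypothetical name, not a function from `repomap.py`.

```python
from multiprocessing import Pool


def count_defs(source: str) -> int:
    """Stand-in for per-file parsing: count lines that start a Python function."""
    return sum(1 for line in source.splitlines() if line.lstrip().startswith("def "))


def process_all(sources: list[str], parallel: bool = True) -> list[int]:
    """Process every source, in worker processes unless parallel is False."""
    if not parallel or len(sources) < 2:
        return [count_defs(source) for source in sources]
    with Pool() as pool:
        # pool.map preserves input order, so results line up with sources.
        return pool.map(count_defs, sources)


if __name__ == "__main__":
    files = ["def a():\n    pass\n", "x = 1\n", "def b():\n    pass\ndef c():\n    pass\n"]
    print(process_all(files))
```

The sequential branch corresponds to what `--no-parallel` is described as doing: same results, one process, easier to debug.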
- Gracefully handles parse errors (continues processing other files)
- Skips unsupported file types silently
- Reports errors in verbose mode (--verbose)
# Generate map for AI context
python repomap.py . > repomap.txt
# Use in AI prompts
cat repomap.txt | pbcopy # Copy to clipboard

# .github/workflows/generate-repomap.yml
name: Generate Repomap
on: [push, pull_request]
jobs:
  repomap:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: '3.11'
      - name: Install dependencies
        run: |
          pip install uv
          uv pip install tree-sitter tree-sitter-python tree-sitter-javascript tree-sitter-typescript tree-sitter-go tree-sitter-bash
      - name: Generate repomap
        run: python repomap.py . > repomap.txt
      - name: Upload artifact
        uses: actions/upload-artifact@v4
        with:
          name: repomap
          path: repomap.txt

# .gitlab-ci.yml
generate_repomap:
  stage: build
  image: python:3.11
  script:
    - pip install uv
    - uv pip install tree-sitter tree-sitter-python tree-sitter-javascript tree-sitter-typescript tree-sitter-go tree-sitter-bash
    - python repomap.py . > repomap.txt
  artifacts:
    paths:
      - repomap.txt

This reference repository includes complete repomap automation:
The pre-commit hook automatically regenerates .repomap.txt when you commit changes to code files:
# .pre-commit-config.yaml
repos:
  - repo: local
    hooks:
      - id: repomap-update
        name: Update repository map
        entry: scripts/update-repomap.sh
        language: system
        pass_filenames: false
        files: '\.(py|js|ts|tsx|go|sh|bash)$'

Triggers on: .py, .js, .ts, .tsx, .go, .sh, .bash file changes
What it does: Runs ./scripts/update-repomap.sh to regenerate .repomap.txt before commit
GitHub Actions workflow validates repomap is current on every push/PR:
# .github/workflows/ci.yml
repomap-validation:
  runs-on: ubuntu-latest
  steps:
    - uses: actions/checkout@v4
    - name: Set up Python
      uses: actions/setup-python@v5
      with:
        python-version: '3.11'
    - name: Install repomap dependencies
      run: |
        pip install uv
        uv pip install tree-sitter tree-sitter-python tree-sitter-javascript \
          tree-sitter-typescript tree-sitter-go tree-sitter-bash
    - name: Validate repomap is current
      run: ./scripts/update-repomap.sh --check

What it does: Blocks merge if .repomap.txt is outdated
The scripts/update-repomap.sh script provides:
# Regenerate repomap
./scripts/update-repomap.sh
# Validate repomap is current (CI usage)
./scripts/update-repomap.sh --check
# Show help
./scripts/update-repomap.sh --help

Features:
- Clear error messages with dependency installation hints
- Validates that the committed repomap is up to date (for CI/CD)
- Used by pre-commit hook for auto-updates
If you prefer a simpler manual hook in other repositories:
#!/bin/bash
# Save as .git/hooks/pre-commit and make it executable (chmod +x .git/hooks/pre-commit)
python repomap.py . > .repomap.txt
git add .repomap.txt

Control maximum file size to parse:
# Default: 1MB (1048576 bytes)
python repomap.py . --max-file-size 2097152 # 2MB

Repomap automatically respects .gitignore patterns in the repository root.
Default ignore patterns (always applied):
- .git
- __pycache__
- node_modules
- .venv
- venv
- *.pyc
- .DS_Store
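
A simple way to apply patterns like these is to test each path component against each pattern with shell-style globbing. The sketch below does that with the standard `fnmatch` module. It is an assumption about how basic matching might work, not the matcher used by `repomap.py`, and it does not implement the full gitignore specification.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath

# Default ignore patterns as listed in this README.
DEFAULT_IGNORES = [".git", "__pycache__", "node_modules", ".venv", "venv", "*.pyc", ".DS_Store"]


def is_ignored(path: str, patterns: list[str] = DEFAULT_IGNORES) -> bool:
    """Return True if any component of a POSIX-style relative path matches a pattern."""
    parts = PurePosixPath(path).parts
    return any(fnmatchcase(part, pattern) for part in parts for pattern in patterns)


print(is_ignored("node_modules/react/index.js"))
print(is_ignored("src/main.py"))
```

Matching per component means a directory name such as `node_modules` excludes everything beneath it, while a glob such as `*.pyc` applies to file names at any depth.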
- No caching: Always recomputes (acceptable performance trade-off)
- Simple gitignore: Basic pattern matching (not full gitignore spec)
- No incremental updates: Generates full map on each run
- Limited symbol extraction: Focuses on primary code symbols (functions, classes)
Potential improvements for future versions:
- File-level caching (only reparse changed files)
- Full gitignore specification support
- Additional symbol types (variables, imports, exports)
- Configurable output formats (JSON, Markdown, etc.)
- Symbol filtering (include/exclude patterns)
Benchmark: Medium-sized repository (~1000 files)
- Sequential: ~15-20 seconds
- Parallel: ~5-8 seconds (on multi-core systems)
Optimization tips:
- Use --max-file-size to skip large generated files
- Ensure .gitignore excludes build artifacts and dependencies
- Use --no-parallel only for debugging (slower)
Problem: ImportError: No module named 'tree_sitter'
Solution: Install dependencies
uv pip install -r requirements.txt

Problem: Files not showing symbols or "Error" messages in verbose mode
Possible causes:
- Syntax errors in source files
- Unsupported language features
- Encoding issues (non-UTF-8 files)
Solution: Run with --verbose to see detailed error messages
python repomap.py . --verbose

Problem: Slow parsing on large repositories
Solutions:
- Reduce --max-file-size to skip large files
- Update .gitignore to exclude build artifacts
- Use parallel processing (default, faster on multi-core systems)
This is a reference implementation. Contributions welcome:
- Additional language support
- Performance optimizations
- Enhanced gitignore pattern matching
- Additional output formats
See LICENSE file in repository root.
Quickstart:
- Install dependencies: uv pip install -r requirements.txt
- Install pre-commit hooks: pre-commit install
- The repomap auto-updates on commits!
Manual usage (if needed):
./scripts/update-repomap.sh # Regenerate
./scripts/update-repomap.sh --check # Validate
python repomap.py . # Direct generation