AI-powered code review and repository intelligence platform.
BugViper ingests your repositories into a Neo4j knowledge graph via Tree-sitter AST parsing, then sends a LangGraph-powered agent to review pull requests, finding bugs, security issues, and code quality problems with full codebase context. It also ships a Query interface for full-text and semantic code search, plus an AI chat agent that reasons directly over your graph.
Each indexed repository shows live stats derived directly from the Neo4j graph: file count, function count, class count, and import count. Repositories are indexed once and stay up to date via GitHub push webhooks.
Search any symbol name or keyword across the entire graph. Results are anchored to the exact source line with an inline peek viewer: expand up or down to read surrounding context without leaving the page.
When full-text isn't enough, semantic search embeds your query and returns results ranked by cosine similarity from Neo4j vector indexes. Useful for finding code by intent: "embedding model configuration" returns EmbeddingModelName, RoastResponse, and other conceptually related nodes at 73%, 69%, 68% similarity.
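As a rough illustration of the ranking step, here is a minimal sketch of cosine-similarity scoring in plain Python. It is not BugViper's code: in the real system the ranking happens inside Neo4j's vector indexes, and the function names and toy vectors below are assumptions made for the example.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a zero vector has no direction; treat it as unrelated
    return dot / (norm_a * norm_b)


def rank_by_similarity(query_vec, candidates, top_k=3):
    """Return the top_k (name, score) pairs, highest score first.

    `candidates` maps a node name to its embedding vector (hypothetical shape).
    """
    scored = [(name, cosine_similarity(query_vec, vec))
              for name, vec in candidates.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]
```

A score of 1.0 means the two vectors point in the same direction; the percentages shown in the UI would correspond to scores such as 0.73.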
The Ask Agent page connects a ReAct LLM to your Neo4j graph. Ask natural-language questions: the agent reasons across 13+ tool calls, cites source files, and shows the relevant code inline. Ask "What embedding do we use?" and it finds the embedder, explains the batch flow, and shows the actual Cypher query.
BugViper materialises your codebase as a property graph: 312 nodes and 336 relationships shown here for a single repository, spanning Function, Class, File, Module, Variable, and Repository node types. Explore it directly in Neo4j Browser or query it from the API.
When a PR is opened the BugViper bot posts a structured top-level comment with:
- Model used and actionable comment count
- Walkthrough table: every changed file and a one-line summary of what changed
- Impact Analysis and Positive Findings sections
Each issue is posted as an inline diff comment pinned to the exact line. Here the agent flagged a bare `except Exception:` that catches KeyboardInterrupt and SystemExit (severity Low, confidence 7/10) and suggested a specific fix with a one-line code change you can commit directly from GitHub.
The same review run caught a Medium security issue: LLM error details (rate limits, model names, API keys) leaking into a user-facing response via str(e)[:100]. The agent suggested logging the error server-side and returning a clean fallback message, preventing accidental information disclosure.
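The suggested fix pattern can be sketched as follows. This is an illustrative example, not the code from the reviewed PR: `safe_llm_call`, `call_llm`, and `FALLBACK_MESSAGE` are hypothetical names.

```python
import logging

logger = logging.getLogger("bugviper.example")

# Generic text shown to the user; it never includes exception details.
FALLBACK_MESSAGE = "Something went wrong while generating a response. Please try again."


def safe_llm_call(call_llm, prompt):
    """Run call_llm(prompt); on failure, log details server-side only.

    `call_llm` is a hypothetical callable standing in for the real LLM client.
    """
    try:
        return call_llm(prompt)
    except Exception:
        # The full traceback (rate limits, model names, keys) stays in the logs.
        logger.exception("LLM call failed")
        return FALLBACK_MESSAGE
```

The caller receives either the model's answer or the fixed fallback string, so nothing from `str(e)` can reach the response body.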
When you add a repository, BugViper:
- Clones or downloads the repo
- Runs Tree-sitter parsers (17 languages) to produce ASTs
- Extracts `Function`, `Class`, `Variable`, `File`, `Module`, `Repository` nodes
- Writes the graph to Neo4j with relationships: `CONTAINS`, `DEFINES`, `CALLS`, `IMPORTS`, `INHERITS`
- Calculates cyclomatic complexity at parse time for every function
- Optionally batch-embeds all nodes with `text-embedding-3-small` and stores the vectors in Neo4j vector indexes
GitHub
   │
   ▼
Tree-sitter AST (17 languages)
   │
Graph Builder ─────────► Neo4j (nodes + relationships)
   │
Embedder (optional) ────► Neo4j (vector indexes: 1536-dim cosine)
Neo4j's full-text search is backed by Apache Lucene, the same engine that powers Elasticsearch. BugViper creates two Lucene indexes at setup time:

| Index | Node types | Fields |
|---|---|---|
| `code_search` | Function, Class, Variable | name, docstring, source_code |
| `file_content_search` | File | source_code |

Two-tier search strategy (`search_code()` in `db/queries.py`):
User query
    │
    ├─► Tier 1: `code_search` Lucene index
    │     Simple identifiers → phrase search   "parse_unified_diff"
    │     Special characters → AND-keywords    token1 AND token2
    │
    └─► Tier 2: fallback to `file_content_search` (if Tier 1 empty)
          Searches raw file content line-by-line
          Returns: path + line_number + matching line (no full source dump)
Searching parse_unified_diff hits the function node instantly by name. Searching "Authorization: Bearer" falls through to line-level file content search. Both paths return lean results; the Peek API (/code-finder/peek) then fetches a windowed view around any line on demand, keeping responses fast regardless of file size.
Lucene escaping is applied automatically: clean identifiers get phrase-quoted, anything with special characters is tokenised and joined with AND.
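The query-shaping and fallback rules described above can be sketched like this. It is a simplified approximation, not the actual `search_code()` implementation: the regexes, `build_lucene_query`, `two_tier_search`, and the two search callables are assumptions for illustration.

```python
import re

# A "clean identifier" here means a single word of letters, digits, underscores.
IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def build_lucene_query(user_query):
    """Phrase-quote clean identifiers; otherwise AND the word tokens together."""
    text = user_query.strip()
    if IDENTIFIER.match(text):
        return f'"{text}"'
    # Keeping only word characters drops Lucene operators such as ':' or '"'.
    tokens = re.findall(r"[A-Za-z0-9_]+", text)
    return " AND ".join(tokens)


def two_tier_search(user_query, search_symbols, search_file_lines):
    """Try the symbol index first; fall back to file content if it is empty.

    Both callables are hypothetical stand-ins for the two Lucene indexes.
    """
    query = build_lucene_query(user_query)
    results = search_symbols(query)
    if results:
        return results
    return search_file_lines(query)
```

With these rules, `parse_unified_diff` becomes the phrase query `"parse_unified_diff"`, while `Authorization: Bearer` becomes `Authorization AND Bearer`.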
The review pipeline is a two-phase LangGraph graph (code_review_agent/agent/):
PR opened / comment trigger
            │
            ▼
Build diff + context prompt
            │
┌───────────▼────────────────┐
│ Phase 1: ReAct Explorer    │
│ LangGraph StateGraph       │  LLM + 19 Neo4j tools
│ MAX_TOOL_ROUNDS = 6        │  Stops deterministically
└───────────┬────────────────┘
            │  accumulated messages (diff + tool results)
            ▼
┌───────────▼────────────────┐
│ Phase 2: Synthesizer       │
│ Plain LLM call             │  JSON schema embedded in prompt
│ Works on any OpenRouter    │  Robust JSON extraction (fence/prose/raw)
│ model                      │
└───────────┬────────────────┘
            │
Confidence filter ≥ 7 / 10
            │
            ▼
Post inline GitHub comments
Phase 1: ReAct Exploration
The agent receives the PR diff and iteratively calls tools against Neo4j to build context for the code under review. It is capped at 6 tool rounds using a `tool_rounds` counter in `ReviewExplorerState`. Because the cap does not rely on LangGraph's recursion limit, accumulated messages are always returned cleanly.
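The idea of an explicit round counter can be shown with a plain-Python loop. This sketch deliberately avoids LangGraph; `ExplorerState`, `run_explorer`, `decide_next_tool`, and `run_tool` are hypothetical stand-ins, not the names used in `review_graph.py`.

```python
from dataclasses import dataclass, field

MAX_TOOL_ROUNDS = 6  # same cap as described above


@dataclass
class ExplorerState:
    """Simplified stand-in for ReviewExplorerState."""
    messages: list = field(default_factory=list)
    tool_rounds: int = 0


def run_explorer(state, decide_next_tool, run_tool):
    """Loop until the model stops asking for tools or the cap is reached."""
    while state.tool_rounds < MAX_TOOL_ROUNDS:
        request = decide_next_tool(state.messages)
        if request is None:
            break  # the model has enough context and asks for no more tools
        state.messages.append(("tool_result", run_tool(request)))
        state.tool_rounds += 1
    # The state is returned normally in both cases, so no messages are lost.
    return state
```

Because the loop exits through its own condition rather than an exception, whatever has been gathered so far is always handed to the next phase.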
The agent has 19 tools:
| # | Tool | What it queries in Neo4j |
|---|---|---|
| 1 | `search_code` | Lucene full-text across Function / Class / Variable / File |
| 2 | `peek_code` | Line window from a file stored in the graph |
| 3 | `semantic_search` | Vector similarity search (embeddings) |
| 4 | `find_function` | Function node by exact or fuzzy name |
| 5 | `find_class` | Class node by exact or fuzzy name |
| 6 | `find_variable` | Variable by substring |
| 7 | `find_by_content` | Symbol bodies containing a pattern |
| 8 | `find_by_line` | Raw file content line-by-line |
| 9 | `find_module` | Module / package and which files import it |
| 10 | `find_imports` | Import statements referencing a module or alias |
| 11 | `find_method_usages` | All callers of a function |
| 12 | `find_callers` | Call chain tracing upstream |
| 13 | `get_class_hierarchy` | Inheritance tree: parents and children |
| 14 | `get_change_impact` | Blast radius: how many callers would break |
| 15 | `get_complexity` | Cyclomatic complexity for a specific function |
| 16 | `get_top_complex_functions` | Highest-risk functions in the repo |
| 17 | `get_file_source` | Full file content from the graph |
| 18 | `get_language_stats` | Per-language file / function / class counts |
| 19 | `get_repo_stats` | Overall graph statistics |
Phase 2: Structured Synthesis
After exploration, a second LLM call receives all accumulated messages plus a JSON schema embedded directly in the system prompt. The response is parsed robustly (it handles code fences, prose wrapping, and raw JSON), so any model on OpenRouter works without needing structured-output API support.
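A tolerant parser of this kind might look like the sketch below. It is one possible approach, not the project's actual implementation; `extract_json`, `filter_confident`, and the `confidence` key are assumptions for the example.

```python
import json
import re

# Matches a fenced block (three backticks, optional "json" tag) and captures its body.
FENCE = re.compile(r"`{3}(?:json)?\s*(.*?)`{3}", re.DOTALL)


def extract_json(text):
    """Try raw JSON, then a fenced block, then the outermost {...} span."""
    candidates = [text.strip()]
    fenced = FENCE.search(text)
    if fenced:
        candidates.append(fenced.group(1).strip())
    start, end = text.find("{"), text.rfind("}")
    if start != -1 and end > start:
        candidates.append(text[start:end + 1])
    for candidate in candidates:
        try:
            return json.loads(candidate)
        except json.JSONDecodeError:
            continue
    raise ValueError("no JSON found in model response")


def filter_confident(issues, threshold=7):
    """Keep issues whose confidence meets the threshold (7/10 in the pipeline)."""
    return [issue for issue in issues if issue.get("confidence", 0) >= threshold]
```

Trying the strictest interpretation first means well-behaved models pay no penalty, while chattier models still yield a usable result.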
Node types: Repository · File · Function · Class · Variable · Module
Relationships:
(Repository)-[:CONTAINS]──►(File)
(File)-[:CONTAINS]────────►(Function | Class | Variable)
(File)-[:IMPORTS]─────────►(Module)
(Class)-[:CONTAINS]───────►(Function)
(Class)-[:INHERITS]───────►(Class)
(Function)-[:CALLS]───────►(Function)
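To make the `CALLS` relationship concrete, here is a small in-memory sketch of the upstream traversal that a tool such as `find_callers` or `get_change_impact` would need. It works on plain `(caller, callee)` pairs rather than Neo4j, and `upstream_callers` is a hypothetical name, not a function from the codebase.

```python
from collections import deque


def upstream_callers(calls, target):
    """Return every function that reaches `target` through CALLS edges.

    `calls` is an iterable of (caller, callee) name pairs.
    """
    callers_of = {}
    for caller, callee in calls:
        callers_of.setdefault(callee, set()).add(caller)

    seen = set()
    queue = deque([target])
    while queue:
        current = queue.popleft()
        for caller in callers_of.get(current, ()):
            # Skip visited nodes and the target itself so cycles terminate.
            if caller not in seen and caller != target:
                seen.add(caller)
                queue.append(caller)
    return seen
```

In Neo4j the same question would typically be expressed as a variable-length path match over `[:CALLS*]`; the size of the returned set is one way to estimate blast radius.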
| Component | Technology |
|---|---|
| API framework | FastAPI + Uvicorn |
| Package manager | uv |
| Database | Neo4j |
| Code parsing | Tree-sitter (17 languages) |
| AI / LLM | LangGraph + LangChain + OpenRouter |
| Embeddings | openai/text-embedding-3-small via OpenRouter |
| GitHub integration | PyGithub + GitHub App webhooks |
| Auth / user data | Firebase Admin SDK + Firestore |
| Observability | Logfire |
| Component | Technology |
|---|---|
| Framework | Next.js 16 (App Router) + React 19 |
| Language | TypeScript (strict mode) |
| Styling | TailwindCSS 4 + shadcn/ui (Radix primitives) |
| Icons | Lucide React |
api/                              # FastAPI backend
├── app.py                        # Entry point, CORS, router registration
├── routers/
│   ├── ingestion.py              # POST /repository, /setup, /github
│   ├── query.py                  # GET /search, /stats, /code-finder/*
│   ├── repository.py             # GET/DELETE repositories
│   └── webhook.py                # POST /onPush, /onComment, /github
└── services/
    ├── review_service.py         # PR review pipeline orchestration
    └── push_service.py           # Incremental push handling
code_review_agent/                # LangGraph PR review agent
├── agent/
│   ├── review_graph.py           # Phase 1: ReAct exploration graph
│   ├── runner.py                 # Two-phase pipeline entry point
│   ├── tools.py                  # 19 Neo4j query tools
│   └── prompts.py                # System prompts
└── models/
    └── agent_schemas.py          # AgentFindings, ReviewResults, Issue
db/                               # Neo4j database layer
├── client.py                     # Connection management + retry
├── ingestion.py                  # Graph ingestion service
├── queries.py                    # CodeQueryService (search, stats, CRUD)
└── schema.py                     # Constraints, Lucene indexes, CYPHER_QUERIES
ingestion/                        # Code parsing & ingestion engine
├── repo_ingestion_engine.py      # Main orchestrator
├── graph_builder.py              # Graph construction from ASTs
├── code_search.py                # CodeFinder class
└── languages/                    # 17 per-language Tree-sitter parsers
common/                           # Shared utilities
├── embedder.py                   # Batch embedding via OpenRouter
├── diff_parser.py                # Unified diff parsing
└── bugviper_firebase_service.py
apps/frontend/                    # Next.js 16 frontend
├── app/(protected)/
│   ├── query/                    # Search + Analysis + CodeFinder + Review tabs
│   └── repositories/             # Repo management + ingestion
└── lib/
    ├── api.ts                    # All fetch wrappers
    └── auth-context.tsx
- Python 3.13+, `uv`
- Node.js 20+
- Neo4j (local or AuraDB)
- OpenRouter API key
uv sync
cp .env.example .env # fill in variables
uvicorn api.app:app --host 0.0.0.0 --port 8000 --reload

cd apps/frontend && npm install && npm run dev   # http://localhost:3000

./start.sh   # API + Frontend + Ngrok

# Neo4j
NEO4J_URI=bolt://localhost:7687
NEO4J_USERNAME=neo4j
NEO4J_PASSWORD=...
# LLM
OPENROUTER_API_KEY=...
REVIEW_MODEL=z-ai/glm-5 # any OpenRouter model
# GitHub App
GITHUB_APP_ID=...
GITHUB_PRIVATE_KEY_PATH=...
GITHUB_WEBHOOK_SECRET=...
# Firebase
SERVICE_FILE_LOC=path/to/service-account.json
# Optional
ENABLE_LOGFIRE=true
LOGFIRE_TOKEN=...
API_ALLOWED_ORIGINS=http://localhost:3000
INGESTION_SERVICE_URL=          # empty = local; set = Cloud Tasks

Cyclomatic complexity is stored on every Function node at ingestion time:
| Score | Risk |
|---|---|
| 1–5 | Simple |
| 6–10 | Moderate |
| 11–20 | Complex: refactor candidate |
| 20+ | High risk: bugs likely here |
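For intuition, the sketch below computes a simple cyclomatic complexity score for Python source and maps it to the bands in the table. Note the substitutions: it uses Python's standard `ast` module instead of Tree-sitter, counts only a basic set of branch constructs, and treats a score of exactly 20 as "Complex". `cyclomatic_complexity` and `risk_label` are hypothetical names.

```python
import ast

# Constructs that each add one decision point in this simplified count.
BRANCH_NODES = (ast.If, ast.For, ast.AsyncFor, ast.While,
                ast.ExceptHandler, ast.IfExp)


def cyclomatic_complexity(source):
    """Return 1 + the number of decision points found in `source`."""
    score = 1
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, BRANCH_NODES):
            score += 1
        elif isinstance(node, ast.BoolOp):
            # `a and b and c` adds two extra paths (one per additional operand).
            score += len(node.values) - 1
    return score


def risk_label(score):
    """Map a score to the risk bands from the table (20 counted as Complex)."""
    if score <= 5:
        return "Simple"
    if score <= 10:
        return "Moderate"
    if score <= 20:
        return "Complex"
    return "High risk"
```

A function with one `if` containing an `and`, plus one `for` loop, scores 4 under this count and falls in the "Simple" band.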
| Method | Path | Description |
|---|---|---|
| POST | `/api/v1/ingest/repository` | Ingest a repository |
| POST | `/api/v1/ingest/setup` | Init DB schema + indexes |
| GET | `/api/v1/repos/` | List all repositories |
| GET | `/api/v1/query/search` | Full-text code search |
| GET | `/api/v1/query/code-finder/function` | Find function by name |
| GET | `/api/v1/query/code-finder/peek` | Peek lines around a file location |
| GET | `/api/v1/query/code-finder/complexity/top` | Most complex functions |
| POST | `/api/v1/query/diff-context` | Build RAG context for a diff |
| POST | `/api/v1/webhook/github` | GitHub App webhook dispatcher |
Full API docs: /docs (Swagger) and /redoc when the server is running.
# Python
black . # format
ruff check . # lint
mypy . # type check
pytest # tests
pytest --cov # coverage
# Frontend
cd apps/frontend && npm run lint && npm run build

- Abstract review model: pluggable per-repo agent configs
- Improve incremental re-indexing on push
- Per-project CLAUDE.md / guidelines injected into review prompts
- Guardrails and output validation
- GitHub push, PR, and branch webhook coverage
- Auto-tag CLAUDE.md from ingested repo







