Digital Twin Builder for Databricks
OntoBricks is a web application that transforms Databricks tables into a materialized knowledge graph. It lets you design ontologies (OWL), map them to Unity Catalog tables via R2RML, materialize triples into a Delta or LadybugDB triple store, reason over the graph (OWL 2 RL, SWRL, SHACL), and query it through an auto-generated GraphQL API. The entire pipeline — from metadata import to a queryable knowledge graph — can run in four clicks using LLM-powered automation.
Please note that all projects in the /databrickslabs github account are provided for your exploration only, and are not formally supported by Databricks with Service Level Agreements (SLAs). They are provided AS-IS and we do not make any guarantees of any kind. Please do not submit a support ticket relating to any issues arising from the use of these projects.
Any issues discovered through the use of this project should be filed as GitHub Issues on the Repo. They will be reviewed as time permits, but there are no formal SLAs for support.
OntoBricks uses uv for dependency management. All dependencies are declared in pyproject.toml.
```shell
# Clone the repository
git clone <repository-url>
cd OntoBricks

# Install dependencies (uv resolves them from pyproject.toml)
uv sync

# Or use the setup script
scripts/setup.sh
```

- Python 3.10 or higher
- Databricks workspace access with a Personal Access Token
- A SQL Warehouse ID
- A Unity Catalog Volume for the domain registry
- (Optional) A Databricks Lakebase Postgres database — required only when the admin switches the registry storage backend from Volume (default) to Lakebase in Settings → Registry. OntoBricks targets Lakebase Autoscaling exclusively (Provisioned instances are not supported). Install the optional driver with `uv sync --extra lakebase`.
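For reference, a minimal sketch of what the credentials file might contain. The variable names below are assumptions based on common Databricks app conventions, not confirmed by this project — check `.env.example` for the names OntoBricks actually reads:

```shell
# Hypothetical .env contents — variable names are assumptions; verify against .env.example
cat > .env <<'EOF'
DATABRICKS_HOST=https://adb-1234567890123456.7.azuredatabricks.net
DATABRICKS_TOKEN=dapi-your-personal-access-token
DATABRICKS_WAREHOUSE_ID=abcdef1234567890
EOF
```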
```shell
# Configure credentials
cp .env.example .env
# Edit .env with your Databricks host, token, and warehouse ID

# Start the application
scripts/start.sh

# Open http://localhost:8000
```

```shell
# Install and configure the Databricks CLI
pip install databricks-cli
databricks configure --token

# Deploy
make deploy
# Or: scripts/deploy.sh
```

After deployment, bind the sql-warehouse and volume resources in the Databricks Apps UI (Compute > Apps > ontobricks > Resources). If the registry volume is empty, open the app and click Settings > Registry > Initialize.
Lakebase backend (optional). To deploy with the Lakebase Postgres backend alongside the Volume, deploy to the `dev-lakebase` target (`databricks bundle deploy -t dev-lakebase`) and tune the bundle variables `lakebase_project`, `lakebase_branch`, `lakebase_database_resource_segment` (the `db-…` id from `databricks postgres list-databases "projects/<id>/branches/<branch>" -o json`, not the Postgres database name shown in the SQL UI), and `lakebase_registry_schema` (mirror in `app.yaml` as `LAKEBASE_SCHEMA`). The DAB composes the full Apps `postgres.database` path and binds a `database` Apps resource so the runtime auto-injects `PGHOST`/`PGPORT`/`PGDATABASE`/`PGUSER`; the app mints the OAuth token automatically (no user secret required). The default `dev`/`prod` targets stay Volume-only and keep working as before.
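Putting the above together, a sketch of the Lakebase deploy. The `--var` overrides use the bundle variables named above, but every value here is a placeholder — substitute the real `db-…` id from `databricks postgres list-databases` and your own project/branch names:

```shell
# Deploy the dev-lakebase target, overriding the Lakebase bundle variables.
# All values are placeholders — look up the db-… id via:
#   databricks postgres list-databases "projects/<id>/branches/<branch>" -o json
databricks bundle deploy -t dev-lakebase \
  --var="lakebase_project=my-project" \
  --var="lakebase_branch=main" \
  --var="lakebase_database_resource_segment=db-0123456789abcdef" \
  --var="lakebase_registry_schema=ontobricks_registry"
```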
First deploy only:
`make deploy` runs `scripts/bootstrap-app-permissions.sh` automatically, which grants each app's service principal `CAN_MANAGE` on itself. Without that grant the middleware cannot read the app's own ACL and every first-time visitor — including the deploying `CAN_MANAGE` user — lands on the access-denied page. If you deploy via `databricks bundle deploy` directly, run `make bootstrap-perms` once afterwards (it is idempotent).
See Deployment Guide for the full checklist including resource configuration and permissions.
- Ensure all tests pass: `make test`
- Update the version in `pyproject.toml`
- Commit, tag, and push:

```shell
git add -A && git commit -m "Release vX.Y.Z"
git tag vX.Y.Z
git push origin main --tags
```

- Deploy the new version: `make deploy`
| Step | Action | What Happens |
|---|---|---|
| 1 | Import Metadata (Domain > Metadata) | Fetches table and column metadata from Unity Catalog |
| 2 | Generate Ontology (Ontology > Wizard) | LLM designs entities, relationships, and attributes from your metadata |
| 3 | Auto-Map (Mapping > Auto-Map) | LLM generates SQL mappings for every entity and relationship |
| 4 | Synchronize (Digital Twin > Status) | Executes mappings and populates the triple store |
- Ontology Designer — the main ontology graph view lives under Ontology → Designer (visual canvas + AI Assistant).
- Domain Cockpit (Validation) — Active Version shows which registry version is exposed via API / MCP; it can differ from the version you have loaded in the editor.
- Registry → Browse — only place to set the Active (API/MCP) version for a domain; Domain → Versions shows that status as a read-only badge.
- New domain — after New Domain, a full-page loading overlay runs until Domain Information finishes its first load.
- Domain Information — triple-store / snapshot / local graph paths update when you commit the domain name (blur or change) or change version (aligned with naming rules before save).
- Duplicate names — Save to Unity Catalog is blocked if the sanitized domain name already exists in the registry (inline check + confirmation before POST).
- Navbar — domain name and version in the top bar refresh after load, save, clear, import, and version switches (browser cache invalidated on those actions).
- Design an ontology visually using the OntoViz canvas, or import OWL/RDFS/industry standards (FIBO, CDISC, IOF)
- Map ontology entities to Databricks tables with column-level precision
- Build the Digital Twin — materializes triples into the triple store (incremental by default)
- Query through the GraphQL playground or explore the interactive knowledge graph
- Reason over the graph — run OWL 2 RL inference, SWRL rules, SHACL validation, and constraint checks
- Two-phase search — preview matching entities in a flat list, then select specific ones to expand into the full graph with relationships and neighbors
- Configurable search depth — control the maximum traversal depth and entity cap for graph expansion
- Bridge navigation — follow cross-domain bridges to automatically switch domains and focus on the target entity in the knowledge graph
- Data cluster detection — detect communities in the knowledge graph using Louvain, Label Propagation, or Greedy Modularity algorithms; available client-side (Graphology) for the visible subgraph and server-side (NetworkX) for the full graph; cluster results can be visualized with color-by-cluster mode and collapsed into super-nodes
- Data quality violation limits — cap the number of violations displayed per rule (configurable via dropdown, default 10) for faster quality checks
- Per-rule progress tracking — SWRL inference and data quality checks report progress for each individual rule
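Once the triple store is populated, the graph can also be queried programmatically instead of through the playground. A hedged sketch — the `/graphql` path, the port, and the `equipment` entity and its fields are all assumptions, not confirmed by this project; use the GraphQL playground to discover the actual schema generated for your domain:

```shell
# Hypothetical query against the auto-generated GraphQL API.
# Endpoint path, entity, and field names are assumptions — check the playground.
curl -s -X POST http://localhost:8000/graphql \
  -H "Content-Type: application/json" \
  -d '{"query": "{ equipment(limit: 5) { id name } }"}'
```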
The Ontology Designer view (Ontology → Designer) includes a floating AI Assistant (bottom-right of the canvas) that lets you modify your ontology through natural language commands — add entities, remove orphans, list relationships, and more. Conversation history is maintained within the session.
- Deep-linked sidebar sections — shareable URLs, browser Back/Forward support
- Breadcrumb navigation — always see your position (Registry > Domain > Ontology > Section)
- Keyboard shortcuts —
Cmd/Ctrl+Ssave,Cmd/Ctrl+Ksearch,?help overlay - SQL connection pooling — reusable database connections, no per-query TLS handshake
- CSRF protection — double-submit cookie for all state-changing requests
- Structured JSON logging — set
`LOG_FORMAT=json` for production-grade observability
OntoBricks exposes the knowledge graph to LLM agents via the Model Context Protocol. Deploy the companion mcp-ontobricks app and connect from Cursor, Claude Desktop, or the Databricks Playground.
Promote domains between Databricks environments with the `scripts/registry_transfer.sh` command-line tool — export a curated subset of domains/versions from a source registry into a .zip, then preview and commit it into the target registry. No UI, no HTTP endpoint. See Registry Import / Export (CLI) for the full reference and examples.
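A purely illustrative invocation sketch of the export → preview → commit flow described above. The subcommand names and flags here are hypothetical — consult the Registry Import / Export (CLI) guide for the tool's real interface:

```shell
# Illustrative only — subcommands and flags are hypothetical; see the CLI guide.
scripts/registry_transfer.sh export --domains sales,finance -o registry_export.zip
scripts/registry_transfer.sh import --preview registry_export.zip
scripts/registry_transfer.sh import --commit registry_export.zip
```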
Full documentation is available in docs/. For a comprehensive feature list and architecture details, see INFO.md.