Logic Network Generator

Generate logic networks from Reactome pathways by decomposing complexes and EntitySets into their components, expanding alternatives, and emitting a graph of input/output/catalyst/regulator/assembly/dissociation edges suitable for perturbation modeling.

Features

Position-aware UUIDs — the same entity at different pathway positions gets distinct identifiers, so perturbing a protein in one location doesn't unintentionally perturb it elsewhere.
Full EntitySet expansion with provenance — every alternative input becomes its own virtual reaction; decomposed_uid_mapping.csv records source_entity_id so leaves can be traced back to their parent set.
Boundary decomposition — root-input and terminal-output complexes get synthetic assembly / dissociation edges so individual proteins are perturbable at the network's edges; intermediate complexes stay intact (they're real species in the pathway).
Reaction-level AND/OR semantics — input/catalyst/positive-regulator edges are AND, negative regulators are OR (any one blocks). See docs/DESIGN_DECISIONS.md.
Bulk Cypher pre-fetch — pathway generation pulls all entity/reaction data in five queries up front, making per-reaction processing cache-only. Cell_Cycle's 474 reactions go from hours to minutes.
Validated against curator predictions — 70.55% end-to-end agreement with the MP-BioPath curator-prediction test set across 12,895 valid cases; 98.3% on cases where the network is the deciding factor. See validation_results/.

Quick Start

Prerequisites

Python 3.10+
Poetry
Docker (for the Reactome Neo4j database)

Installation

git clone https://github.com/reactome/logic-network-generator.git
cd logic-network-generator
poetry install

# Start a local Reactome Neo4j database
docker-compose up -d

By default the connection points at bolt://localhost:7687 with user neo4j / password test. Override via env vars NEO4J_URL, NEO4J_USER, NEO4J_PASSWORD.

Generate a Pathway

# Single pathway (use the R-HSA-prefixed stable ID)
poetry run python bin/create-pathways.py --pathway-id R-HSA-69620

# Batch from a TSV with `id` and `pathway_name` columns
poetry run python bin/create-pathways.py --pathway-list pathways.tsv

# Every Homo sapiens top-level pathway
poetry run python bin/create-pathways.py --top-level-pathways

Output

Each pathway generates a directory under output/:

output/<Pathway_Name>_R-HSA-<id>/
├── logic_network.csv          # Main output: edges of the perturbation graph
├── stid_to_uuid_mapping.csv   # UUID → Reactome stable ID
└── cache/
    ├── reaction_connections.csv
    ├── decomposed_uid_mapping.csv
    └── best_matches.csv

Logic Network Format

logic_network.csv columns:

Column	Description
`source_id`	UUID of source node (entity or virtual reaction)
`target_id`	UUID of target node
`pos_neg`	`pos` (activates / produces) or `neg` (negative regulator)
`and_or`	`and` (required), `or` (alternative source), or empty (single producer)
`edge_type`	`input`, `output`, `catalyst`, `regulator`, `assembly`, or `dissociation`
`stoichiometry`	Stoichiometric coefficient from Reactome

assembly and dissociation edges only appear at boundaries: a leaf protein assembles into a root-input complex, or a terminal-output complex dissociates into its components.

Validation

The generated networks have been benchmarked against the MP-BioPath curator-prediction test set (Sundararaman et al., 2017).

70.55% end-to-end accuracy on 12,895 valid test cases (vs ~75% published for MP-BioPath on its 10-pathway empirical subset)
98.3% network correctness on cases where the network's connectivity is the deciding factor (excludes propagator limitations and v86→v96 test-set drift)
1.2% of valid tests flagged as bug_candidate; on per-case investigation all examined cases were structural limitations of directed-flow Boolean propagation (substrate-consumption mass-action effects), not network-generation bugs

Full methodology, per-pathway reports, and failure-category analysis: validation_results/README.md.

Utilities

# Generate a database-ID-to-name mapping file
poetry run python bin/create-db-id-name-mapping-file.py

# Validate a generated set of networks against the MP-BioPath test set
poetry run python bin/validate-against-mpbiopath.py

# For each "no_path" failure, ask Neo4j whether the original pathway has a path
poetry run python bin/check-no-path-cases-in-neo4j.py

Testing

Three tiers, distinguished by what they need to run:

# Unit tier (CI runs this — no database, no generated artifacts)
poetry run pytest -m "not database and not integration"

# Integration tier (needs output/ artifacts from a prior run)
poetry run pytest -m integration

# Database tier (needs a running Reactome Neo4j)
poetry run pytest -m database

See tests/README.md for details on each tier.

Tracking new Reactome releases

The database-tier tests are version-agnostic — they discover pathways from the loaded graph rather than hard-coding stable IDs — so they should survive a Reactome bump. The workflow when a new Reactome release ships:

Update the image tag in docker-compose.yml (public.ecr.aws/reactome/graphdb:Release<N>).
docker-compose pull && docker-compose up -d.
poetry run pytest -m database -v — test_reactome_version.py records which release the run actually used; other tests will surface anything Reactome changed schematically.
If accuracy numbers in validation_results/ matter for that release, re-run poetry run python bin/validate-against-mpbiopath.py and update the README headline numbers.

Documentation

Architecture — system architecture and data flow
Design Decisions — behaviors that look surprising but are intentional (Complex vs EntitySet semantics, the two-layer decomposition model, surplus input/output fan-out)
Position-Aware UUIDs — why and how UUIDs are assigned per pathway position
Examples — usage examples and patterns
Validation Results — full benchmark methodology and per-pathway numbers

Development

# Start the Neo4j database
docker-compose up -d

# Lint and format
poetry run ruff check src/
poetry run ruff format src/

# Pre-commit hooks
poetry run pre-commit install
poetry run pre-commit run --all-files

See CONTRIBUTING.md for development guidelines.

License

Apache 2.0 — see LICENSE.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Logic Network Generator

Features

Quick Start

Prerequisites

Installation

Generate a Pathway

Output

Logic Network Format

Validation

Utilities

Testing

Tracking new Reactome releases

Documentation

Development

License

About

Uh oh!

Releases

Packages

Uh oh!

Uh oh!

Contributors

Uh oh!

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 148 Commits
.github		.github
bin		bin
docs		docs
examples		examples
scripts		scripts
src		src
tests		tests
validation_results		validation_results
.env		.env
.env.example		.env.example
.gitignore		.gitignore
.pre-commit-config.yaml		.pre-commit-config.yaml
CONTRIBUTING.md		CONTRIBUTING.md
LICENSE		LICENSE
README.md		README.md
SECURITY.md		SECURITY.md
docker-compose.yml		docker-compose.yml
pathways.tsv		pathways.tsv
poetry.lock		poetry.lock
pyproject.toml		pyproject.toml

Folders and files

Latest commit

History

Repository files navigation

Logic Network Generator

Features

Quick Start

Prerequisites

Installation

Generate a Pathway

Output

Logic Network Format

Validation

Utilities

Testing

Tracking new Reactome releases

Documentation

Development

License

About

Resources

License

Contributing

Security policy

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Uh oh!

Contributors

Uh oh!

Languages

Packages