Skip to content

reactome/logic-network-generator

Logic Network Generator

Tests Code Style Python Version License

Generate logic networks from Reactome pathways by decomposing complexes and EntitySets into their components, expanding alternatives, and emitting a graph of input/output/catalyst/regulator/assembly/dissociation edges suitable for perturbation modeling.

Features

  • Position-aware UUIDs — the same entity at different pathway positions gets distinct identifiers, so perturbing a protein in one location doesn't unintentionally perturb it elsewhere.
  • Full EntitySet expansion with provenance — every alternative input becomes its own virtual reaction; decomposed_uid_mapping.csv records source_entity_id so leaves can be traced back to their parent set.
  • Boundary decomposition — root-input and terminal-output complexes get synthetic assembly / dissociation edges so individual proteins are perturbable at the network's edges; intermediate complexes stay intact (they're real species in the pathway).
  • Reaction-level AND/OR semantics — input/catalyst/positive-regulator edges are AND, negative regulators are OR (any one blocks). See docs/DESIGN_DECISIONS.md.
  • Bulk Cypher pre-fetch — pathway generation pulls all entity/reaction data in five queries up front, making per-reaction processing cache-only. Cell_Cycle's 474 reactions go from hours to minutes.
  • Validated against curator predictions — 70.55% end-to-end agreement with the MP-BioPath curator-prediction test set across 12,895 valid cases; 98.3% on cases where the network is the deciding factor. See validation_results/.

Quick Start

Prerequisites

Installation

git clone https://github.com/reactome/logic-network-generator.git
cd logic-network-generator
poetry install

# Start a local Reactome Neo4j database
docker-compose up -d

By default the connection points at bolt://localhost:7687 with user neo4j / password test. Override via env vars NEO4J_URL, NEO4J_USER, NEO4J_PASSWORD.

Generate a Pathway

# Single pathway (use the R-HSA-prefixed stable ID)
poetry run python bin/create-pathways.py --pathway-id R-HSA-69620

# Batch from a TSV with `id` and `pathway_name` columns
poetry run python bin/create-pathways.py --pathway-list pathways.tsv

# Every Homo sapiens top-level pathway
poetry run python bin/create-pathways.py --top-level-pathways

Output

Each pathway generates a directory under output/:

output/<Pathway_Name>_R-HSA-<id>/
├── logic_network.csv          # Main output: edges of the perturbation graph
├── stid_to_uuid_mapping.csv   # UUID → Reactome stable ID
└── cache/
    ├── reaction_connections.csv
    ├── decomposed_uid_mapping.csv
    └── best_matches.csv

Logic Network Format

logic_network.csv columns:

Column Description
source_id UUID of source node (entity or virtual reaction)
target_id UUID of target node
pos_neg pos (activates / produces) or neg (negative regulator)
and_or and (required), or (alternative source), or empty (single producer)
edge_type input, output, catalyst, regulator, assembly, or dissociation
stoichiometry Stoichiometric coefficient from Reactome

assembly and dissociation edges only appear at boundaries: a leaf protein assembles into a root-input complex, or a terminal-output complex dissociates into its components.

Validation

The generated networks have been benchmarked against the MP-BioPath curator-prediction test set (Sundararaman et al., 2017).

  • 70.55% end-to-end accuracy on 12,895 valid test cases (vs ~75% published for MP-BioPath on its 10-pathway empirical subset)
  • 98.3% network correctness on cases where the network's connectivity is the deciding factor (excludes propagator limitations and v86→v96 test-set drift)
  • 1.2% of valid tests flagged as bug_candidate; on per-case investigation all examined cases were structural limitations of directed-flow Boolean propagation (substrate-consumption mass-action effects), not network-generation bugs

Full methodology, per-pathway reports, and failure-category analysis: validation_results/README.md.

Utilities

# Generate a database-ID-to-name mapping file
poetry run python bin/create-db-id-name-mapping-file.py

# Validate a generated set of networks against the MP-BioPath test set
poetry run python bin/validate-against-mpbiopath.py

# For each "no_path" failure, ask Neo4j whether the original pathway has a path
poetry run python bin/check-no-path-cases-in-neo4j.py

Testing

Three tiers, distinguished by what they need to run:

# Unit tier (CI runs this — no database, no generated artifacts)
poetry run pytest -m "not database and not integration"

# Integration tier (needs output/ artifacts from a prior run)
poetry run pytest -m integration

# Database tier (needs a running Reactome Neo4j)
poetry run pytest -m database

See tests/README.md for details on each tier.

Tracking new Reactome releases

The database-tier tests are version-agnostic — they discover pathways from the loaded graph rather than hard-coding stable IDs — so they should survive a Reactome bump. The workflow when a new Reactome release ships:

  1. Update the image tag in docker-compose.yml (public.ecr.aws/reactome/graphdb:Release<N>).
  2. docker-compose pull && docker-compose up -d.
  3. poetry run pytest -m database -vtest_reactome_version.py records which release the run actually used; other tests will surface anything Reactome changed schematically.
  4. If accuracy numbers in validation_results/ matter for that release, re-run poetry run python bin/validate-against-mpbiopath.py and update the README headline numbers.

Documentation

  • Architecture — system architecture and data flow
  • Design Decisions — behaviors that look surprising but are intentional (Complex vs EntitySet semantics, the two-layer decomposition model, surplus input/output fan-out)
  • Position-Aware UUIDs — why and how UUIDs are assigned per pathway position
  • Examples — usage examples and patterns
  • Validation Results — full benchmark methodology and per-pathway numbers

Development

# Start the Neo4j database
docker-compose up -d

# Lint and format
poetry run ruff check src/
poetry run ruff format src/

# Pre-commit hooks
poetry run pre-commit install
poetry run pre-commit run --all-files

See CONTRIBUTING.md for development guidelines.

License

Apache 2.0 — see LICENSE.

About

No description, website, or topics provided.

Resources

License

Contributing

Security policy

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors

Languages