Skip to content

[Model] PrizeCollectingSteinerForest #1026

@zhangjieqingithub

Description

@zhangjieqingithub

Motivation

The catalog already has Steiner-tree models, but it does not have the biology-facing prize-collecting Steiner forest variant used to recover multiple pathway components at once.

This model is useful because it:

  • generalizes prize-collecting Steiner tree from one connected tree to a forest,
  • keeps the paper's node-prize / edge-cost objective directly in the model,
  • separates the biological model from the artificial-root trick, which should live in a reduction rule rather than in the base problem definition.

Associated rule:

Definition

Name: PrizeCollectingSteinerForest
Reference:

  • Nurcan Tuncbag et al., "Simultaneous Reconstruction of Multiple Signaling Pathways via the Prize-Collecting Steiner Forest Problem," Journal of Computational Biology 20(2):124-136, 2013. https://doi.org/10.1089/cmb.2012.0092
  • Nurcan Tuncbag et al., "Simultaneous Reconstruction of Multiple Signaling Pathways via the Prize-Collecting Steiner Forest Problem," RECOMB 2012, LNBI 7262, pp. 287-301, 2012. https://doi.org/10.1007/978-3-642-29627-7_31

For a network
G = (V, E, c, p)
with nonnegative edge costs c(e) >= 0, nonnegative vertex prizes p(v) >= 0, and parameters beta >= 0, omega >= 0, find a forest subgraph
F = (V_F, E_F)
minimizing
beta * sum_{v notin V_F} p(v) + sum_{e in E_F} c(e) + omega * kappa(F),
where kappa(F) is the number of tree components of F.

Important conventions for this proposal:

  • singleton vertices are allowed as one-vertex tree components,
  • the empty forest is feasible,
  • at the paper level, the input network may be directed or undirected,
  • in the directed case, "forest" means acyclic after forgetting edge directions.

So this is a minimization problem.

Variables

Let n = |V| and m = |E|.

  • Count: n + m binary variables
  • Per-variable domain: {0,1}
  • Meaning:
    • x_v = 1 iff vertex v belongs to V_F
    • y_e = 1 iff edge or arc e belongs to E_F

A configuration is feasible iff the selected edges are incident only to selected vertices and the selected subgraph is a forest. Singleton selected vertices are allowed.

Schema (data type)

Type name: PrizeCollectingSteinerForest

Paper-level variants:

  • undirected networks
  • directed networks

Recommended first concrete repo variant:

  • SimpleGraph initially
  • weight types i32 and f64
  • directed-network support can be added later as a variant extension
Field Type Description
graph G The underlying network
vertex_prizes Vec<W> Vertex prizes p : V -> R_{>= 0}
edge_costs Vec<W> Edge or arc costs c : E -> R_{>= 0}
beta W Tradeoff coefficient on omitted prizes
omega W Cost per tree component

Complexity

A conservative exact baseline is brute-force over vertex selections and edge selections, checking whether the chosen subgraph is a forest and evaluating the objective.

This gives a conservative worst-case bound
O(2^(num_vertices + num_edges) * poly(num_vertices + num_edges))

so the registry complexity string should be
2^(num_vertices + num_edges).

I have not verified a sharper exact worst-case bound for this specific paper-style PCSF variant, so the proposal should use the conservative baseline above.

References for the model and its practical solver context:

Extra Remark

This is not the standard demand-pair penalty version of PCSF.

The intended model is the biology-paper variant with:

  • node prizes,
  • edge costs,
  • an explicit component penalty omega * kappa(F).

The artificial root is not part of the base model. It belongs in the companion reduction rule.

Reduction Rule Crossref

How to solve

Example Instance

Take the undirected path
0 - 1 - 2
with:

  • edge costs
    • c(0,1) = 1
    • c(1,2) = 6
  • vertex prizes
    • p(0) = 5
    • p(1) = 2
    • p(2) = 5
  • beta = 1
  • omega = 2

Expected Outcome

An optimal forest is:

  • V_F = {0,1,2}
  • E_F = {(0,1)}

So the forest has two components:

  • the tree on {0,1}
  • the singleton tree {2}

Its objective value is:

  • omitted-prize term: 0
  • edge-cost term: 1
  • component term: 2 * 2 = 4

Total:
1 + 4 = 5

This is better than:

  • the full path {(0,1),(1,2)}, which costs 1 + 6 + 2 = 9
  • three singleton vertices, which cost 0 + 0 + 3 * 2 = 6
  • the empty forest, which costs 5 + 2 + 5 = 12

So the optimum is 5.

BibTeX

@article{TuncbagEtAl2013PCSF,
  author  = {Nurcan Tuncbag and Alfredo Braunstein and Andrea Pagnani and Shao-Shan Carol Huang and Jennifer Chayes and Christian Borgs and Riccardo Zecchina and Ernest Fraenkel},
  title   = {Simultaneous Reconstruction of Multiple Signaling Pathways via the Prize-Collecting Steiner Forest Problem},
  journal = {Journal of Computational Biology},
  volume  = {20},
  number  = {2},
  pages   = {124--136},
  year    = {2013},
  doi     = {10.1089/cmb.2012.0092},
  url     = {https://doi.org/10.1089/cmb.2012.0092}
}

@inproceedings{TuncbagEtAl2012RECOMB,
  author    = {Nurcan Tuncbag and Alfredo Braunstein and Andrea Pagnani and Shao-Shan Carol Huang and Jennifer Chayes and Christian Borgs and Riccardo Zecchina and Ernest Fraenkel},
  title     = {Simultaneous Reconstruction of Multiple Signaling Pathways via the Prize-Collecting Steiner Forest Problem},
  booktitle = {Research in Computational Molecular Biology (RECOMB 2012)},
  series    = {Lecture Notes in Bioinformatics},
  volume    = {7262},
  pages     = {287--301},
  year      = {2012},
  publisher = {Springer},
  doi       = {10.1007/978-3-642-29627-7_31},
  url       = {https://doi.org/10.1007/978-3-642-29627-7_31}
}

Metadata

Metadata

Assignees

No one assigned

    Labels

    GoodAn issue passed all checks.modelA model problem to be implemented.

    Type

    No type
    No fields configured for issues without a type.

    Projects

    Status
    Done

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions