[Rule] PrizeCollectingSteinerForest to SteinerTree

## Source

PrizeCollectingSteinerForest

## Target

SteinerTree

## Motivation

This is the natural companion rule for the biology-paper PCSF model.

The key modeling point is:
- the artificial root belongs in the reduction rule,
- not in the base PCSF model.

This rule records the well-known auxiliary-root construction: turn a forest into a single tree in an augmented graph by adding an artificial root `r`, while compiling the omitted-prize tradeoff into the target tree formulation through an explicit per-vertex terminal-pair gadget.

## Reference

The cited papers reduce PCSF to **Prize-Collecting Steiner Tree (PCST)** (a target that still carries node prizes), not to plain Steiner Tree. We use the same artificial-root idea, plus a standard auxiliary-terminal gadget to translate the remaining prize term into ordinary Steiner-tree edge costs. This combined construction is treated as a well-known auxiliary-root gadget in the prize-collecting Steiner literature (see Bienstock-Goemans-Simchi-Levi-Williamson 1993 for the original prize/penalty framework predating Tuncbag).

- Daniel Bienstock, Michel X. Goemans, David Simchi-Levi, David Williamson, "A note on the prize collecting traveling salesman problem," *Mathematical Programming* 59 (1993), 413-420. https://doi.org/10.1007/BF01581256
- Nurcan Tuncbag et al., "Simultaneous Reconstruction of Multiple Signaling Pathways via the Prize-Collecting Steiner Forest Problem," *Journal of Computational Biology* 20(2):124-136, 2013. https://doi.org/10.1089/cmb.2012.0092
- Nurcan Tuncbag et al., "Simultaneous Reconstruction of Multiple Signaling Pathways via the Prize-Collecting Steiner Forest Problem," *RECOMB 2012*, LNBI 7262, pp. 287-301, 2012. https://doi.org/10.1007/978-3-642-29627-7_31

The Tuncbag paper gives the exact PCSF objective
`f'(F) = beta * sum_{v notin V_F} p(v) + sum_{e in E_F} c(e) + omega * kappa(F)`
and explains the artificial-root transformation by attaching a new root to every original vertex with cost `omega`. The remaining step — translating the prize term `beta * p(v)` for `v notin V_F` into plain Steiner-tree edge costs via a constant-size auxiliary-terminal gadget — is the routine extension recorded by this rule.

## Reduction Algorithm

This proposal is for the first concrete undirected repo variant.

Let the source instance be:
- a graph `G = (V, E)`
- vertex prizes `p(v) >= 0`
- edge costs `c(e) >= 0`
- parameters `beta >= 0`, `omega >= 0`

Let `V_p = { v in V : p(v) > 0 }` denote the vertices that actually carry a prize. Vertices with `p(v) = 0` contribute nothing to the omitted-prize term and need no gadget.

### Construction

Build the target graph `H = (V_H, E_H)` with edge costs `c_H` and terminal set `T_H` as follows.

1. **Add a new artificial root** `r`. Set `V_H = V cup {r}`.

2. **Keep every original edge** `e = (u, v) in E` with cost `c_H(e) = c(e)`.

3. **Add a root-attachment edge** `(r, v)` of cost `omega` for every original vertex `v in V`.

4. **Per-vertex prize gadget.** For each `v in V_p` (vertex with positive prize):
   - add an auxiliary vertex `t_v` (so `V_H` gains one node per prized vertex),
   - add edge `(v, t_v)` with cost `0`,
   - add edge `(r, t_v)` with cost `beta * p(v)`.

   Intuition:
   - if `v` is included in the projected forest (so `v` is connected to `r` through original vertices), the gadget is paid by taking the `0`-cost edge `(v, t_v)`,
   - if `v` is omitted, the only way to connect `t_v` to the tree is the `(r, t_v)` edge of cost `beta * p(v)`.

5. **Target terminal set.** `T_H = {r} cup { t_v : v in V_p }`. Original vertices are **non-terminals** (Steiner vertices).

### Scaling parameter beta

If both `p(v)` and `c(e)` are integral and `beta` is integral, take `beta = 1` (the construction above uses `beta` directly; no scaling needed because all charges are commensurate).

If any of `p(v)`, `c(e)`, or `beta` is rational, multiply all edge costs in `H` by a common denominator before solving the target instance (Steiner tree optima are preserved under positive scaling). If a future variant requires integer costs but the source has real-valued data, pick a scale factor `S >= max(c(e), beta*p(v)) + 1` to disambiguate ties; for the integral case used here, `beta = 1` with the original costs suffices.

### Solving and witness extraction

Solve the resulting `SteinerTree` instance on `H` to obtain an optimal Steiner tree `T*` spanning `T_H`.

To recover the source PCSF witness `(V_F, E_F)`:
- `E_F = T* cap E(G)`  (intersect the target tree edges with the *original* edge set `E`),
- `V_F = { v in V : t_v in V(T*) and the edge (v, t_v) belongs to T* } cup { v in V : v is an endpoint of some edge in E_F } cup { v in V : (r, v) in T* and v is the unique attachment of its component }`.

Equivalently, take the connected components of `T*` after removing `r` and all gadget vertices `t_v`; the union of all original vertices appearing in these components is `V_F`, and `E_F` is the set of original edges among them.

### Correctness intuition

- Every tree component of a source forest `F` becomes attached exactly once to the artificial root `r`, so `T*` pays exactly `omega * kappa(F)`.
- Original selected edges retain their original costs in the target objective.
- For `v in V_p`: if `v in V_F`, the gadget edge `(v, t_v)` costs `0`; if `v notin V_F`, the only way to reach the terminal `t_v` is the `(r, t_v)` edge of cost `beta * p(v)`. Summing over `V_p` reproduces `beta * sum_{v notin V_F} p(v)`.
- Vertices with `p(v) = 0` have no gadget and contribute nothing to the omitted-prize sum, matching the source objective.
- Conversely, any source forest `F` can be lifted to a feasible Steiner tree in `H` by attaching each component once to `r` and resolving each gadget locally (included `v`: take `(v, t_v)`; omitted `v`: take `(r, t_v)`).

The two objectives agree exactly: `cost_H(T*) = f'(F*)`.

## Size Overhead

Let:
- `n = num_vertices`
- `m = num_edges`
- `k = | V_p |` = number of original vertices with `p(v) > 0` (at most `n`)

Exact target sizes:

| Target metric | Formula |
|---------------|---------|
| `num_vertices` | `n + k + 1` |
| `num_edges` | `m + n + 2 * k`  (original edges + root-attachment edges + two gadget edges per prized vertex) |
| `num_terminals` | `k + 1` |

Conservative upper bounds (using `k <= n`):

| Target metric | Upper bound |
|---------------|-------------|
| `num_vertices` | `<= 2 * n + 1` |
| `num_edges` | `<= m + 3 * n` |
| `num_terminals` | `<= n + 1` |

The overhead is linear in the source graph size.

For the `#[reduction(overhead = {...})]` macro (which uses source-instance getters), the overhead expressions are:

```
num_vertices = "num_vertices + num_vertices_with_prize + 1"
num_edges    = "num_edges + num_vertices + 2 * num_vertices_with_prize"
num_terminals = "num_vertices_with_prize + 1"
```

(`num_vertices_with_prize` is the getter on the PCSF source type that returns `|V_p|`. If the source variant exposes only `num_vertices`, use the bound versions `2 * num_vertices + 1`, `num_edges + 3 * num_vertices`, `num_vertices + 1`.)

## Validation Method

- Compare the target Steiner-tree optimum against brute-force PCSF evaluation on small graphs (`n <= 6`).
- Closed-loop: lift the optimal source forest to a target tree, solve the target instance, and check `cost_H(T*) = f'(F*)`.
- Witness round-trip: extract `(V_F, E_F)` from `T*` and verify it is a forest on `G` whose PCSF objective matches.
- Include cases:
  - empty graph (`n = 0`),
  - empty forest optimum (large `beta * p(v)` is *not* paid — vertex omitted),
  - one connected tree optimum,
  - multiple-tree optimum from a large `omega` / edge-cost tradeoff,
  - **at least one vertex omitted at the optimum** so the `(r, t_v)` half of the gadget is exercised,
  - zero-prize vertices alongside positive-prize vertices.

## Example

We pick an instance where a prize vertex is **omitted at the optimum** (so the gadget is genuinely exercised), per the check-issue recommendation.

### Source instance

- path `0 - 1 - 2` with edges `(0,1)` and `(1,2)`
- edge costs `c(0,1) = 10`, `c(1,2) = 10`
- vertex prizes `p(0) = 5`, `p(1) = 1`, `p(2) = 5`
- `beta = 1`, `omega = 1`

### Feasible source forests and their objectives

For each candidate forest `F = (V_F, E_F)`:

| `V_F` | `E_F` | edge cost | omitted prize | components `kappa(F)` | `omega * kappa(F)` | **total** |
|------|------|-----------|---------------|------------------------|---------------------|-----------|
| `{}`        | `{}`           | 0  | `5+1+5=11` | 0 | 0 | `11` |
| `{0}`       | `{}`           | 0  | `1+5=6`    | 1 | 1 | `7`  |
| `{2}`       | `{}`           | 0  | `5+1=6`    | 1 | 1 | `7`  |
| `{0, 2}`    | `{}`           | 0  | `1`        | 2 | 2 | `3`  |
| `{0, 1}`    | `{(0,1)}`      | 10 | `5`        | 1 | 1 | `16` |
| `{1, 2}`    | `{(1,2)}`      | 10 | `5`        | 1 | 1 | `16` |
| `{0, 1, 2}` | `{(0,1),(1,2)}`| 20 | `0`        | 1 | 1 | `21` |

The **optimum** is `V_F* = {0, 2}`, `E_F* = {}`, objective `3`. Vertex `1` is **omitted** even though `p(1) = 1 > 0`, because paying `beta * p(1) = 1` is cheaper than paying the edge cost to connect it (`c(0,1) = 10` or `c(1,2) = 10`).

### Augmented target instance

Vertices: `V_H = {0, 1, 2, r, t_0, t_1, t_2}`  (so `|V_H| = n + k + 1 = 3 + 3 + 1 = 7`).

Edges and costs:

| edge | cost | origin |
|------|------|--------|
| `(0,1)` | 10 | original |
| `(1,2)` | 10 | original |
| `(r,0)` | 1  | root-attach (omega) |
| `(r,1)` | 1  | root-attach (omega) |
| `(r,2)` | 1  | root-attach (omega) |
| `(0, t_0)` | 0 | gadget include-edge |
| `(r, t_0)` | 5 | gadget omit-edge (`beta*p(0)`) |
| `(1, t_1)` | 0 | gadget include-edge |
| `(r, t_1)` | 1 | gadget omit-edge (`beta*p(1)`) |
| `(2, t_2)` | 0 | gadget include-edge |
| `(r, t_2)` | 5 | gadget omit-edge (`beta*p(2)`) |

Total target edges: `m + n + 2k = 2 + 3 + 6 = 11`. Terminal set: `T_H = {r, t_0, t_1, t_2}` (`|T_H| = k+1 = 4`).

### Lifted optimal target tree

The lift of the source optimum `V_F* = {0, 2}, E_F* = {}` is:

- root attach `0`: edge `(r, 0)` cost 1
- root attach `2`: edge `(r, 2)` cost 1
- include gadget for `0`: edge `(0, t_0)` cost 0
- include gadget for `2`: edge `(2, t_2)` cost 0
- omit gadget for `1`: edge `(r, t_1)` cost 1

This is a tree spanning `T_H = {r, t_0, t_1, t_2}` (vertices `0`, `2` are Steiner relays). Vertex `1` and its incident original edges are *not* in `T*`.

Total target objective: `1 + 1 + 0 + 0 + 1 = 3`. **Matches the source optimum** of `3`.

### Witness extraction

Removing `r` and all `t_v` from `T*` leaves the original vertices `{0, 2}` with no edges among them. So `V_F = {0, 2}`, `E_F = T* cap E(G) = {}`. This is the source optimum, confirming the round-trip.

### Why this example exercises the novel gadget

Vertex `1` is **omitted at the optimum**, so the gadget edge `(r, t_1)` (cost `beta * p(1) = 1`) is selected in `T*`. A bug that charged `beta * p(v)` when `v` is included (instead of omitted), or that omitted the `(r, t_v)` edge entirely, would change this example's optimum and be caught by the closed-loop test.

## BibTeX

```bibtex
@article{BienstockGoemansSimchiLeviWilliamson1993,
  author  = {Daniel Bienstock and Michel X. Goemans and David Simchi-Levi and David Williamson},
  title   = {A note on the prize collecting traveling salesman problem},
  journal = {Mathematical Programming},
  volume  = {59},
  number  = {1},
  pages   = {413--420},
  year    = {1993},
  doi     = {10.1007/BF01581256},
  url     = {https://doi.org/10.1007/BF01581256}
}

@article{TuncbagEtAl2013PCSF,
  author  = {Nurcan Tuncbag and Alfredo Braunstein and Andrea Pagnani and Shao-Shan Carol Huang and Jennifer Chayes and Christian Borgs and Riccardo Zecchina and Ernest Fraenkel},
  title   = {Simultaneous Reconstruction of Multiple Signaling Pathways via the Prize-Collecting Steiner Forest Problem},
  journal = {Journal of Computational Biology},
  volume  = {20},
  number  = {2},
  pages   = {124--136},
  year    = {2013},
  doi     = {10.1089/cmb.2012.0092},
  url     = {https://doi.org/10.1089/cmb.2012.0092}
}

@inproceedings{TuncbagEtAl2012RECOMB,
  author    = {Nurcan Tuncbag and Alfredo Braunstein and Andrea Pagnani and Shao-Shan Carol Huang and Jennifer Chayes and Christian Borgs and Riccardo Zecchina and Ernest Fraenkel},
  title     = {Simultaneous Reconstruction of Multiple Signaling Pathways via the Prize-Collecting Steiner Forest Problem},
  booktitle = {Research in Computational Molecular Biology (RECOMB 2012)},
  series    = {Lecture Notes in Bioinformatics},
  volume    = {7262},
  pages     = {287--301},
  year      = {2012},
  publisher = {Springer},
  doi       = {10.1007/978-3-642-29627-7_31},
  url       = {https://doi.org/10.1007/978-3-642-29627-7_31}
}
```


Target metric	Formula
`num_vertices`	`n + k + 1`
`num_edges`	`m + n + 2 * k` (original edges + root-attachment edges + two gadget edges per prized vertex)
`num_terminals`	`k + 1`

Target metric	Upper bound
`num_vertices`	`<= 2 * n + 1`
`num_edges`	`<= m + 3 * n`
`num_terminals`	`<= n + 1`

`V_F`	`E_F`	edge cost	omitted prize	components `kappa(F)`	`omega * kappa(F)`	total
`{}`	`{}`	0	`5+1+5=11`	0	0	`11`
`{0}`	`{}`	0	`1+5=6`	1	1	`7`
`{2}`	`{}`	0	`5+1=6`	1	1	`7`
`{0, 2}`	`{}`	0	`1`	2	2	`3`
`{0, 1}`	`{(0,1)}`	10	`5`	1	1	`16`
`{1, 2}`	`{(1,2)}`	10	`5`	1	1	`16`
`{0, 1, 2}`	`{(0,1),(1,2)}`	20	`0`	1	1	`21`

edge	cost	origin
`(0,1)`	10	original
`(1,2)`	10	original
`(r,0)`	1	root-attach (omega)
`(r,1)`	1	root-attach (omega)
`(r,2)`	1	root-attach (omega)
`(0, t_0)`	0	gadget include-edge
`(r, t_0)`	5	gadget omit-edge (`beta*p(0)`)
`(1, t_1)`	0	gadget include-edge
`(r, t_1)`	1	gadget omit-edge (`beta*p(1)`)
`(2, t_2)`	0	gadget include-edge
`(r, t_2)`	5	gadget omit-edge (`beta*p(2)`)

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[Rule] PrizeCollectingSteinerForest to SteinerTree #1027

Source

Target

Motivation

Reference

Reduction Algorithm

Construction

Scaling parameter beta

Solving and witness extraction

Correctness intuition

Size Overhead

Validation Method

Example

Source instance

Feasible source forests and their objectives

Augmented target instance

Lifted optimal target tree

Witness extraction

Why this example exercises the novel gadget

BibTeX

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

[Rule] PrizeCollectingSteinerForest to SteinerTree #1027

Description

Source

Target

Motivation

Reference

Reduction Algorithm

Construction

Scaling parameter beta

Solving and witness extraction

Correctness intuition

Size Overhead

Validation Method

Example

Source instance

Feasible source forests and their objectives

Augmented target instance

Lifted optimal target tree

Witness extraction

Why this example exercises the novel gadget

BibTeX

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions