Skip to content

[Model] SetBasis #400

@isPANN

Description

@isPANN

Motivation

SET BASIS (P134) from Garey & Johnson, A3 SP7. An NP-complete problem in set/storage theory: given a collection C of subsets of a finite set S, find a minimum-size "basis" B of subsets such that every set in C can be expressed as a union of sets in B. This has applications in data compression, database schema design, and Boolean function minimization -- finding a compact representation of a family of sets. The problem is closely related to set cover but has a fundamentally different structure: instead of covering elements, we must reconstruct exact sets via unions.

Associated reduction rules:

  • As target: R74 (VERTEX COVER -> SET BASIS)

Definition

Name: SetBasis
Canonical name: Set Basis Problem; also: Minimum Test Set Basis, Minimum Set Basis
Reference: Garey & Johnson, Computers and Intractability, A3 SP7

Mathematical definition:

INSTANCE: Collection C of subsets of a finite set S, positive integer K ≤ |C|.
QUESTION: Is there a collection B of subsets of S with |B| = K such that, for each c ∈ C, there is a subcollection of B whose union is exactly c?

The optimization version asks: find the minimum K such that a basis B of size K exists.

Variables

  • Count: K × |S| binary variables. The K basis elements are each encoded as a binary vector of length |S|, indicating which elements of S are included in that basis set.
  • Per-variable domain: binary {0, 1} — whether element j ∈ S belongs to basis set i.
  • Meaning: The configuration is a K × |S| binary matrix where row i defines basis element b_i ⊆ S. The assignment is valid (returns true) if every set c ∈ C can be exactly reconstructed as a union of some subcollection of B = {b_1, ..., b_K}.
  • dims() spec: vec![2; k * universe_size] — K × |S| binary variables, where variables [i*|S| .. (i+1)*|S|) encode basis element b_i.

Schema (data type)

Type name: SetBasis
Variants: none (pure set-theoretic problem)

Field Type Description
universe_size usize Size of the ground set S (=
collection Vec<Vec<usize>> The collection C of target subsets of S (each represented as sorted element indices)
k usize Maximum allowed basis size K

Size fields (for overhead expressions): universe_size (= |S|), num_sets (= |C|), basis_size (= K)

Notes:

  • This is a satisfaction (decision) problem: Metric = bool, implementing SatisfactionProblem.
  • Key getter methods: universe_size() (= |S|), num_sets() (= |C|), basis_size() (= K).
  • Basis elements are arbitrary subsets of S (not necessarily members of C).
  • Field naming follows MinimumSetCovering conventions (universe_size, not set_size).

Complexity

  • Decision complexity: NP-complete (Stockmeyer, 1975; transformation from VERTEX COVER). Remains NP-complete when all c ∈ C have |c| ≤ 3, but is trivial when all |c| ≤ 2.
  • Best known exact algorithm: No specialized exact exponential algorithm is known. The problem can be formulated as an ILP. For practical instances, constraint programming and SAT solvers are used.
  • Complexity string: "2^(basis_size * universe_size)" (brute force over K × |S| binary variables)

declare_variants! guidance:

crate::declare_variants! {
    SetBasis => "2^(basis_size * universe_size)",
}
  • References:
    • L. J. Stockmeyer (1975). "The Set Basis Problem is NP-Complete." IBM Research Report RC 5431, IBM Research Center, Yorktown Heights, NY.

Specialization

  • This is a special case of: General set representation / compression problems
  • Known special cases:
    • All sets in C have size ≤ 2: trivially solvable (each element is its own basis element)
    • All sets in C have size ≤ 3: still NP-complete
  • Related problems: SET COVER (basis elements must cover S, not reconstruct C), EXACT COVER (disjoint union), MINIMUM EQUIVALENT EXPRESSION

Extra Remark

Full book text:

INSTANCE: Collection C of subsets of a finite set S, positive integer K ≤ |C|.
QUESTION: Is there a collection B of subsets of S with |B| = K such that, for each c ∈ C, there is a subcollection of B whose union is exactly c?
Reference: [Stockmeyer, 1975]. Transformation from VERTEX COVER.
Comment: Remains NP-complete if all c ∈ C have |c| ≤ 3, but is trivial if all c ∈ C have |c| ≤ 2.

How to solve

  • It can be solved by (existing) bruteforce. Enumerate all possible collections of K subsets of S; for each collection B, check if every c ∈ C can be expressed as a union of a subcollection of B.
  • It can be solved by reducing to integer programming. Binary variable x_T for each candidate subset T ⊆ S; constrain sum(x_T) = K; for each c ∈ C, add constraints ensuring that c is expressible as the union of selected basis sets.
  • Other: SAT encoding; constraint programming.

Example Instance

Ground set S = {a, b, c, d} (|S| = 4), Collection C of 4 subsets, K = 3:

  • c_1 = {a, b}
  • c_2 = {b, c}
  • c_3 = {a, c}
  • c_4 = {a, b, c}

Question: Is there a basis B of 3 subsets of S such that each c_i is a union of sets in B?

Solution: B = { {a}, {b}, {c} } with |B| = 3.

  • c_1 = {a, b} = {a} ∪ {b} ✓
  • c_2 = {b, c} = {b} ∪ {c} ✓
  • c_3 = {a, c} = {a} ∪ {c} ✓
  • c_4 = {a, b, c} = {a} ∪ {b} ∪ {c} ✓

Answer for K = 3: YES.

Infeasibility for K = 2: Suppose B = {b_1, b_2}. Then c_1 = {a,b}, c_2 = {b,c}, and c_3 = {a,c} must each be either b_1, b_2, or b_1 ∪ b_2. Since b_1 ∪ b_2 can equal at most one of these three sets, the other two must each be b_1 or b_2 — but we only have two basis elements for three distinct sets. So K = 2 is infeasible.

Answer for K = 2: NO. Minimum basis size = 3.

Metadata

Metadata

Assignees

No one assigned

    Labels

    GoodAn issue passed all checks.modelA model problem to be implemented.

    Type

    No type
    No fields configured for issues without a type.

    Projects

    Status
    Done

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions