[Model] ConsecutiveSets #421

@isPANN

Motivation

CONSECUTIVE SETS (P166) from Garey & Johnson, A4 SR18, is an NP-complete problem from the storage-and-retrieval domain. Given a finite alphabet, a collection of subsets, and a length bound K, the question is whether there exists a string of length at most K over the alphabet in which each subset's elements appear as a consecutive block. It generalizes the consecutive ones property of matrices to a string-based formulation and arises in information retrieval and file organization.

Associated rules:

  • R112: Hamiltonian Path -> Consecutive Sets (as target)

Definition

Name: ConsecutiveSets
Canonical name: CONSECUTIVE SETS
Reference: Garey & Johnson, Computers and Intractability, A4 SR18

Mathematical definition:

INSTANCE: Finite alphabet Sigma, collection C = {Sigma_1, Sigma_2, ..., Sigma_n} of subsets of Sigma, and a positive integer K.
QUESTION: Is there a string w in Sigma* with |w| <= K such that, for each i, the elements of Sigma_i occur in a consecutive block of |Sigma_i| symbols of w?

Variables

  • Count: bound_k position variables, one per position in the string w.
  • Per-variable domain: Each position takes a value in {0, ..., alphabet_size} — values 0..alphabet_size-1 represent symbols, value alphabet_size represents "unused" (for strings shorter than bound_k).
  • dims(): vec![alphabet_size + 1; bound_k]
  • evaluate(): Interpret the configuration as a string w (strip trailing "unused" symbols). Return true iff for every subset Sigma_i in C, there exists a contiguous substring of w of length |Sigma_i| that contains exactly the elements of Sigma_i.
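The evaluate() rule above can be sketched as follows. This is a minimal illustration, not the project's implementation: the free functions `has_consecutive_block` and `evaluate` are hypothetical names, and two behaviors the description leaves open are filled in by assumption (an empty subset counts as satisfied, and an "unused" value that is not trailing makes the configuration invalid).

```rust
use std::collections::HashSet;

/// Hypothetical helper: true iff some window of `w` of length |subset|
/// contains exactly the elements of `subset`.
fn has_consecutive_block(w: &[usize], subset: &[usize]) -> bool {
    let target: HashSet<usize> = subset.iter().copied().collect();
    if target.is_empty() {
        return true; // assumption: an empty subset is trivially satisfied
    }
    // A window of |target| symbols whose set equals `target` holds each
    // element of the subset exactly once.
    w.windows(target.len()).any(|win| {
        let seen: HashSet<usize> = win.iter().copied().collect();
        seen == target
    })
}

/// Hypothetical evaluate(): `config` has one entry per position, and the
/// value `alphabet_size` marks an "unused" position.
fn evaluate(config: &[usize], alphabet_size: usize, subsets: &[Vec<usize>]) -> bool {
    // Strip trailing "unused" entries to obtain the string w.
    let end = config
        .iter()
        .rposition(|&v| v != alphabet_size)
        .map_or(0, |i| i + 1);
    let w = &config[..end];
    // Assumption: an "unused" value that is not trailing is rejected.
    if w.iter().any(|&v| v >= alphabet_size) {
        return false;
    }
    subsets.iter().all(|s| has_consecutive_block(w, s))
}
```

With the subsets of Instance 1 below, `evaluate(&[0, 4, 2, 5, 1, 3], 6, &subsets)` accepts and the identity configuration is rejected.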

Schema (data type)

Type name: ConsecutiveSets
Variants: None

| Field | Type | Description |
|---|---|---|
| alphabet_size | usize | Size of the alphabet Sigma (elements are {0, ..., alphabet_size - 1}) |
| subsets | Vec<Vec<usize>> | The collection C of subsets of Sigma |
| bound_k | usize | The positive integer K (max string length) |

Size fields (getter methods for overhead expressions and declare_variants!):

  • alphabet_size() — returns alphabet_size
  • num_subsets() — returns subsets.len() (n)
  • bound_k() — returns K
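A possible shape for the schema and its size getters is sketched below. The field and getter names come from the table and list above; the constructor `new`, its validation checks, and the derive list are assumptions added for illustration, and the trait wiring (SatisfactionProblem, declare_variants!) is deliberately left out because its exact form is not given here.

```rust
/// Sketch of the ConsecutiveSets schema; fields follow the table above.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct ConsecutiveSets {
    alphabet_size: usize,
    subsets: Vec<Vec<usize>>,
    bound_k: usize,
}

impl ConsecutiveSets {
    /// Hypothetical constructor; the validation rules are assumptions.
    pub fn new(alphabet_size: usize, subsets: Vec<Vec<usize>>, bound_k: usize) -> Self {
        assert!(bound_k > 0, "K must be a positive integer");
        assert!(
            subsets.iter().flatten().all(|&s| s < alphabet_size),
            "every subset element must be a symbol in 0..alphabet_size"
        );
        Self { alphabet_size, subsets, bound_k }
    }

    /// Size of the alphabet Sigma.
    pub fn alphabet_size(&self) -> usize {
        self.alphabet_size
    }

    /// Number of subsets n in the collection C.
    pub fn num_subsets(&self) -> usize {
        self.subsets.len()
    }

    /// The length bound K.
    pub fn bound_k(&self) -> usize {
        self.bound_k
    }

    /// One variable per position; the extra value encodes "unused".
    pub fn dims(&self) -> Vec<usize> {
        vec![self.alphabet_size + 1; self.bound_k]
    }
}
```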

Notes:

  • This is a satisfaction (decision) problem: Metric = bool, implementing SatisfactionProblem.
  • When K equals the number of distinct symbols appearing in the subsets (which is |Sigma| when every symbol occurs in some subset), the problem is equivalent to testing a matrix for the consecutive ones property, which is polynomial-time solvable.
  • The circular variant (blocks may wrap around from the end of w to its beginning) is also NP-complete [Booth, 1975].

Complexity

  • Best known exact algorithm: O(alphabet_size^bound_k * n) brute force, trying all strings of length at most bound_k over the alphabet and checking whether every subset forms a consecutive block. When bound_k equals the number of distinct symbols appearing in the subsets, each of those symbols must occur exactly once, so it suffices to enumerate permutations of them: O(alphabet_size! * n).
  • NP-completeness: NP-complete [Kou, 1977]. Transformation from HAMILTONIAN PATH.
  • declare_variants! complexity string: "alphabet_size^bound_k * num_subsets"
  • Polynomial special case: If K equals the number of distinct symbols appearing in the subsets, the problem reduces to testing a binary matrix for the consecutive ones property [Booth and Lueker, 1976], solvable in linear time.
  • References:
    • L. T. Kou (1977). "Polynomial complete consecutive information retrieval problems." SIAM Journal on Computing, 6(1):67-75.
    • K. S. Booth (1975). "PQ Tree Algorithms." Ph.D. thesis, University of California, Berkeley.
    • K. S. Booth and G. S. Lueker (1976). "Testing for the consecutive ones property, interval graphs, and graph planarity using PQ-tree algorithms." J. Computer and System Sciences, 13:335-379.
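The brute-force bound in the Complexity list corresponds to an enumeration like the one sketched here. It is an illustration under assumptions, not the existing solver: `next_string`, `has_consecutive_block`, and `brute_force` are hypothetical names, the input is assumed valid (alphabet_size >= 1, subset elements below alphabet_size), and an empty subset is treated as satisfied.

```rust
use std::collections::HashSet;

/// Hypothetical helper: some window of length |subset| holds exactly `subset`.
fn has_consecutive_block(w: &[usize], subset: &[usize]) -> bool {
    let target: HashSet<usize> = subset.iter().copied().collect();
    if target.is_empty() {
        return true; // assumption: an empty subset is trivially satisfied
    }
    w.windows(target.len())
        .any(|win| win.iter().copied().collect::<HashSet<usize>>() == target)
}

/// Advance `w` to the next string of the same length, odometer style.
/// Returns false once every string of this length has been visited.
fn next_string(w: &mut [usize], alphabet_size: usize) -> bool {
    for digit in w.iter_mut().rev() {
        *digit += 1;
        if *digit < alphabet_size {
            return true;
        }
        *digit = 0; // carry into the next position to the left
    }
    false
}

/// Try every string of length 0..=bound_k, shortest first, and return the
/// first one in which all subsets form consecutive blocks.
fn brute_force(
    alphabet_size: usize,
    subsets: &[Vec<usize>],
    bound_k: usize,
) -> Option<Vec<usize>> {
    for len in 0..=bound_k {
        let mut w = vec![0usize; len];
        loop {
            if subsets.iter().all(|s| has_consecutive_block(&w, s)) {
                return Some(w);
            }
            if !next_string(&mut w, alphabet_size) {
                break; // lengths exhausted for this `len`
            }
        }
    }
    None
}
```

Repeated symbols matter when some alphabet symbol appears in no subset. With alphabet_size = 5 and subsets {0, 1}, {0, 2}, {0, 3}, no string of length 4 works, but [1, 0, 2, 3, 0] of length 5 does, so a permutation-only search would be incomplete for bound_k = 5 there.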

Extra Remark

Full book text:

INSTANCE: Finite alphabet Sigma, collection C = {Sigma_1, Sigma_2, ..., Sigma_n} of subsets of Sigma, and a positive integer K.
QUESTION: Is there a string w in Sigma* with |w| <= K such that, for each i, the elements of Sigma_i occur in a consecutive block of |Sigma_i| symbols of w?
Reference: [Kou, 1977]. Transformation from HAMILTONIAN PATH.
Comment: The variant in which we ask only that the elements of each Sigma_i occur in a consecutive block of |Sigma_i| symbols of the string ww (i.e., we allow blocks that circulate from the end of w back to its beginning) is also NP-complete [Booth, 1975]. If K is the number of distinct symbols in the Sigma_i, then these problems are equivalent to determining whether a matrix has the consecutive ones property or the circular ones property and are solvable in polynomial time.

How to solve

  • It can be solved by (existing) bruteforce -- enumerate all strings w of length <= K over Sigma and verify the consecutive block condition for each subset.
  • It can be solved by reducing to integer programming -- assign position variables to symbols and linearize the consecutiveness constraints.
  • Other: Reduction to consecutive ones property testing for the special case where K equals the number of distinct symbols in the subsets; constraint programming for general instances.
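One way the integer-programming route could be linearized is sketched below. This is an assumed formulation for illustration, not one taken from the issue or the cited references: the binary variables x_{p,s} (position p holds symbol s) and y_{i,p} (the block for Sigma_i starts at position p) are hypothetical names, and the program is a pure feasibility problem.

```latex
\begin{align*}
& x_{p,s} \in \{0,1\} && 1 \le p \le K,\ s \in \Sigma \\
& y_{i,p} \in \{0,1\} && 1 \le i \le n,\ 1 \le p \le K - |\Sigma_i| + 1 \\
& \sum_{s \in \Sigma} x_{p,s} \le 1 && 1 \le p \le K \\
& \sum_{p=1}^{K - |\Sigma_i| + 1} y_{i,p} \ge 1 && 1 \le i \le n \\
& y_{i,p} \le \sum_{q=p}^{p + |\Sigma_i| - 1} x_{q,s} && 1 \le i \le n,\ 1 \le p \le K - |\Sigma_i| + 1,\ s \in \Sigma_i
\end{align*}
```

If y_{i,p} = 1, every symbol of Sigma_i occurs among the |Sigma_i| positions starting at p, and since each position holds at most one symbol, that window contains exactly the elements of Sigma_i. Positions with no symbol play the role of the "unused" value; no chosen window can contain one, so deleting them leaves every block contiguous and yields a string of length at most K.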

Example Instance

Instance 1 (YES instance):
alphabet_size = 6 (symbols: {0, 1, 2, 3, 4, 5})
Subsets: C = [{0, 4}, {2, 4}, {2, 5}, {1, 5}, {1, 3}]
bound_k = 6

The identity string [0, 1, 2, 3, 4, 5] fails: {0, 4} requires 0 and 4 adjacent, but they are 4 apart.

String w = [0, 4, 2, 5, 1, 3] (length 6 = bound_k):

  • {0, 4}: positions 0-1 = [0, 4] -- consecutive. YES.
  • {2, 4}: positions 1-2 = [4, 2] -- consecutive. YES.
  • {2, 5}: positions 2-3 = [2, 5] -- consecutive. YES.
  • {1, 5}: positions 3-4 = [5, 1] -- consecutive. YES.
  • {1, 3}: positions 4-5 = [1, 3] -- consecutive. YES.
    Answer: YES (the overlapping pair constraints force a chain: 0-4-2-5-1-3)

Instance 2 (NO instance):
alphabet_size = 6 (symbols: {0, 1, 2, 3, 4, 5})
Subsets: C = [{0, 2, 4}, {1, 3, 5}, {0, 1}, {2, 3}, {4, 5}]
bound_k = 6

Every symbol occurs in some subset, so w must contain all six symbols; with |w| <= 6, w is a permutation of {0, 1, 2, 3, 4, 5}. For any such w:

  • {0, 2, 4} must be consecutive: symbols 0, 2, 4 must appear in 3 adjacent positions.
  • {1, 3, 5} must be consecutive: symbols 1, 3, 5 must appear in 3 adjacent positions.
  • These two blocks of 3 must occupy positions [0-2] and [3-5] (in some order).
  • {0, 1} requires 0 and 1 adjacent, but 0 is in one block and 1 is in the other. They can only be adjacent at the boundary (positions 2 and 3).
  • Similarly {2, 3} and {4, 5} each require cross-block adjacency at the boundary.
  • Only one pair can be at the boundary, so at most one of {0,1}, {2,3}, {4,5} can be satisfied.
    Answer: NO
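The NO argument can be cross-checked mechanically. Because w must be a permutation of the six symbols here, it is enough to test all 720 permutations. The sketch below does that with a simple recursive swap enumeration; `has_block` and `any_permutation_satisfies` are hypothetical helper names, and `has_block` assumes each subset lists distinct symbols.

```rust
/// Hypothetical helper: some window of length |subset| contains every
/// element of `subset` (assumes the subset has no repeated symbols).
fn has_block(w: &[usize], subset: &[usize]) -> bool {
    subset.is_empty()
        || w.windows(subset.len())
            .any(|win| subset.iter().all(|s| win.contains(s)))
}

/// Enumerate permutations of `w[start..]` by swapping each candidate into
/// position `start`; return true as soon as one permutation satisfies
/// every subset.
fn any_permutation_satisfies(w: &mut [usize], start: usize, subsets: &[Vec<usize>]) -> bool {
    if start == w.len() {
        let current: &[usize] = w;
        return subsets.iter().all(|s| has_block(current, s));
    }
    for i in start..w.len() {
        w.swap(start, i);
        let found = any_permutation_satisfies(w, start + 1, subsets);
        w.swap(start, i); // restore the order before trying the next candidate
        if found {
            return true;
        }
    }
    false
}
```

Calling `any_permutation_satisfies(&mut w, 0, &subsets)` with `w` initialized to the symbols 0 through 5 and the subsets of Instance 2 finds no satisfying permutation, while the subsets of Instance 1 do admit one.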

Metadata

Labels: Good (an issue passed all checks), model (a model problem to be implemented)
Status: Done