Skip to content

[Model] AdditionalKey #445

@isPANN

Description

@isPANN

Motivation

ADDITIONAL KEY (P175) from Garey & Johnson, A4 SR27. A classical NP-complete problem from relational database theory that asks whether a relational schema has a candidate key not already in a given set of known keys. This problem is central to database normalization: when designing schemas, one must enumerate all candidate keys to determine normal forms (especially BCNF). The NP-completeness of this problem means that verifying completeness of key enumeration is computationally intractable.

Associated rules:

  • R121: Hitting Set -> Additional Key (this model is the target)

Definition

Name: AdditionalKey
Canonical name: Additional Key
Reference: Garey & Johnson, Computers and Intractability, A4 SR27, p.232

Mathematical definition:

INSTANCE: A set A of attribute names, a collection F of functional dependencies on A, a subset R ⊆ A, and a set K of keys for the relational scheme <R,F>.
QUESTION: Does R have a key not already contained in K, i.e., is there an R' ⊆ R such that R' not in K, (R',R) in F*, and for no R'' strictly contained in R' is (R'',R) in F*?

Variables

  • Count: |R| binary variables, one per attribute in the relation scheme R.
  • Per-variable domain: binary {0, 1} — whether the attribute is included in the candidate key R'.
  • dims(): vec![2; relation_attrs.len()]
  • evaluate(): Let R' = {relation_attrs[i] : config[i] = 1}. Compute the closure of R' under F*. Return true iff (1) closure(R') = R (the full relation attributes), (2) R' is minimal (no proper subset of R' also has closure = R), and (3) R' is not in known_keys.

Schema (data type)

Type name: AdditionalKey
Variants: none (no graph or weight parameters)

Field Type Description
num_attributes usize Number of attributes in A (attributes indexed 0..num_attributes)
dependencies Vec<(Vec<usize>, Vec<usize>)> Functional dependencies F; each pair (lhs, rhs) means lhs -> rhs
relation_attrs Vec<usize> Subset R ⊆ A: the relation scheme attributes
known_keys Vec<Vec<usize>> Set K of known candidate keys for <R,F>

Size fields (getter methods for overhead expressions and declare_variants!):

  • num_attributes() — returns num_attributes
  • num_dependencies() — returns dependencies.len()
  • num_relation_attrs() — returns relation_attrs.len()
  • num_known_keys() — returns known_keys.len()

Notes:

  • This is a satisfaction (decision) problem: Metric = bool, implementing SatisfactionProblem.
  • No weight type is needed.
  • The answer is YES if and only if there exists a candidate key for <R,F> that is not in the known set K.
  • The problem is related to key enumeration: if the Lucchesi-Osborne algorithm finds all keys, then checking if any are missing from K is straightforward, but the enumeration itself may produce exponentially many keys.
  • The input includes both the full attribute set A (for functional dependency context) and the relation subset R.

Complexity

  • Best known exact algorithm: Enumerate all subsets of R, check each for key property and absence from K. Worst case: O(2^num_relation_attrs * num_dependencies * num_attributes).
  • NP-completeness: NP-complete [Beeri and Bernstein, 1979], via transformation from HITTING SET.
  • declare_variants! complexity string: "2^num_relation_attrs * num_dependencies * num_attributes"
  • Relationship to Hitting Set: The reduction from Hitting Set encodes the covering constraint as a key-determination constraint, making the additional key search equivalent to finding an uncovered hitting set.
  • References:
    • C. Beeri and P. A. Bernstein (1979). "Computational problems related to the design of normal form relational schemas." ACM Transactions on Database Systems, 4(1), pp. 30-59.

Extra Remark

Full book text:

INSTANCE: A set A of attribute names, a collection F of functional dependencies on A, a subset R ⊆ A, and a set K of keys for the relational scheme <R,F>.
QUESTION: Does R have a key not already contained in K, i.e., is there an R' ⊆ R such that R' not in K, (R',R) ∈ F*, and for no R'' ⊆ R' is (R'',R) ∈ F*?
Reference: [Beeri and Bernstein, 1978]. Transformation from HITTING SET.

Connection to BCNF normalization: A relation scheme <R,F> is in Boyce-Codd Normal Form (BCNF) if and only if every non-trivial functional dependency X -> Y has X as a superkey. Checking BCNF requires knowing all candidate keys, which in turn requires solving the Additional Key problem to ensure no keys have been missed. The NP-completeness of Additional Key implies that BCNF testing is also intractable in general (Beeri and Bernstein, 1979).

How to solve

  • It can be solved by (existing) bruteforce -- enumerate all subsets of R, check each for key property (closure = R, minimality), and verify it is not in K.
  • It can be solved by reducing to integer programming -- binary variable per attribute in R; constraint that closure covers R; constraint that the selected set is not equal to any known key (expressed via at least one differing attribute per known key); minimality as additional constraint.
  • Other: Use Lucchesi-Osborne key enumeration algorithm to find all keys, then check against K. Alternatively, reduce to SAT: encode closure computation, minimality, and exclusion of known keys as Boolean constraints.

Example Instance

Instance 1 (YES — additional key exists):
num_attributes = 6 (attributes: {0, 1, 2, 3, 4, 5})
relation_attrs = [0, 1, 2, 3, 4, 5] (full relation)
Functional dependencies F:

  • {0, 1} -> {2, 3}
  • {2, 3} -> {4, 5}
  • {4, 5} -> {0, 1}
  • {0, 2} -> {3}
  • {3, 5} -> {1}

known_keys = [[0, 1], [2, 3], [4, 5]]

The three known keys form a cycle: {0,1} -> {2,3} -> {4,5} -> {0,1}. But there is a hidden key {0, 2} that requires a 3-step closure chain:

  • Start: {0, 2}
  • Apply {0, 2} -> {3}: closure = {0, 2, 3}
  • Apply {2, 3} -> {4, 5}: closure = {0, 2, 3, 4, 5}
  • Apply {4, 5} -> {0, 1}: closure = {0, 1, 2, 3, 4, 5} = A
  • {0, 2} is minimal: {0} alone has closure {0}, {2} alone has closure {2}. Neither determines A.
  • {0, 2} is not in known_keys.

Answer: YES, {0, 2} is an additional key not in known_keys.

Instance 2 (NO — no additional key):
num_attributes = 3 (attributes: {0, 1, 2})
relation_attrs = [0, 1, 2]
Functional dependencies F:

  • {0} -> {1, 2}

known_keys = [[0]]

Analysis:

  • {0}: closure = {0,1,2} = A. Key.
  • {1}: closure = {1}. Not a key.
  • {2}: closure = {2}. Not a key.
  • {0,1}: contains {0} which is already a key, so not minimal.
  • {0,2}: same, contains {0}.
  • {1,2}: closure = {1,2}. Not a key.
  • The only candidate key is {0}, which is already in known_keys.

Answer: NO, no additional key exists.

Metadata

Metadata

Assignees

No one assigned

    Labels

    GoodAn issue passed all checks.modelA model problem to be implemented.

    Type

    No type
    No fields configured for issues without a type.

    Projects

    Status
    Done

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions