Implement graph-driven cascade delete and restrict on Diagram #1407

Open
dimitri-yatsenko wants to merge 33 commits into master from design/restricted-diagram

dimitri-yatsenko commented Feb 21, 2026

Summary

Replace the error-driven cascade in Table.delete() with graph-driven restriction propagation using Diagram. The Diagram is purely a graph computation and inspection tool — all mutation logic (transactions, SQL execution, prompts) lives in Table.delete() and Table.drop().

Resolves: #865 (applying restrictions to a Diagram), #1110 (cascade delete fails on MySQL 8 with limited privileges)

Architecture

Table.delete() builds a Diagram, calls cascade() to compute the affected subgraph, then executes the delete itself in reverse topological order. Table.drop() follows the same pattern. The Diagram never executes mutations — it computes the cascade graph and provides preview() for inspection.
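
The flow described above can be sketched in plain Python. This is a minimal illustration, not the DataJoint implementation: the toy graph `CHILDREN` and the helpers `descendants` and `delete_order` are hypothetical names, and the standard-library `graphlib` stands in for whatever topological sort the Diagram uses internally.

```python
from graphlib import TopologicalSorter

# Hypothetical toy dependency graph: parent -> children (foreign-key direction).
CHILDREN = {
    "Lab": ["Subject"],
    "Subject": ["Session"],
    "Session": ["Scan", "Note"],
    "Scan": ["Segmentation"],
    "Note": [],
    "Segmentation": [],
}


def descendants(graph, seed):
    """Return the seed plus every table reachable from it."""
    seen, stack = {seed}, [seed]
    while stack:
        for child in graph.get(stack.pop(), []):
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return seen


def delete_order(graph, seed):
    """Tables affected by a delete from `seed`, children before parents."""
    affected = descendants(graph, seed)
    # TopologicalSorter expects node -> predecessors, so invert the edges.
    predecessors = {table: set() for table in affected}
    for parent in affected:
        for child in graph.get(parent, []):
            predecessors[child].add(parent)
    topo = list(TopologicalSorter(predecessors).static_order())
    return topo[::-1]  # reverse topological order


order = delete_order(CHILDREN, "Session")
```

In this sketch the ancestors `Lab` and `Subject` never appear in `order`, and `Session` always comes last, so every child row is removed before the parent row it references.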

New Diagram methods (graph computation / inspection only)

  • cascade(table_expr) — OR convergence, one-shot, trims to seed + descendants subgraph
  • restrict(table_expr) — AND convergence, chainable, preserves full graph (for export/subsetting)
  • preview() — show affected tables and row counts without modifying data
  • prune() — remove tables with zero matching rows from the diagram
  • _from_table() — lightweight internal factory for Table.delete/Table.drop

Table.delete() and Table.drop() changes

  • Table.delete() absorbs all execution logic: transaction management, reverse-topo-order SQL execution, IntegrityError handling for unloaded schemas, part integrity post-check, user confirmation
  • Table.drop() absorbs drop execution: part integrity pre-check, reverse-topo-order DROP, user confirmation
  • New dry_run parameter: delete(dry_run=True) and drop(dry_run=True) return affected row counts via Diagram.preview() without modifying data
  • part_integrity parameter: data-driven post-check avoids false positives when a Part table appears in the cascade graph but has zero affected rows

Restriction propagation rules

For edge Parent→Child with attr_map:

| Condition | Child restriction |
| --- | --- |
| Non-aliased AND parent_attrs ⊆ child.primary_key | Copy parent restriction directly |
| Aliased FK (fk_attrs ≠ pk_attrs) | parent.proj(**{fk: pk for fk, pk in attr_map.items()}) |
| Non-aliased AND parent_attrs ⊄ child.primary_key | parent.proj() |
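
The rule selection above can be expressed as a small decision function. This is a hedged sketch only: `propagation_rule` and its string return values are hypothetical, and it assumes `attr_map` maps each child foreign-key attribute to the parent attribute it references, so that `parent_attrs` are the values of the map.

```python
def propagation_rule(attr_map, child_primary_key):
    """Pick the propagation rule for one Parent -> Child edge.

    attr_map: dict of child FK attribute -> parent attribute (assumed layout).
    child_primary_key: iterable of the child's primary-key attribute names.
    """
    aliased = any(fk != pk for fk, pk in attr_map.items())
    if aliased:
        # Renamed FK: project the parent with the rename applied.
        return "project-with-rename"
    if set(attr_map.values()) <= set(child_primary_key):
        # The parent's attributes are all in the child's primary key.
        return "copy"
    # Non-aliased, but the FK is a secondary attribute of the child.
    return "project"
```

For example, an edge with `attr_map = {"subject_id": "subject_id"}` into a child keyed by `("subject_id", "session")` selects `"copy"`, while `{"source_id": "subject_id"}` selects `"project-with-rename"`.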

Convergence semantics

  • Cascade (delete): OR — a row is deleted if ANY ancestor path reaches it
  • Restrict (export): AND — a row is included only if ALL ancestor conditions match
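
The two convergence modes differ only in how row sets arriving along different ancestor paths are combined. The sketch below uses plain Python sets of primary-key values; `converge` is a hypothetical helper for illustration, not a Diagram method.

```python
def converge(path_results, mode):
    """Combine the row keys reaching one table via different ancestor paths."""
    sets = [set(rows) for rows in path_results]
    if not sets:
        return set()
    if mode == "cascade":
        # OR: a row is affected if any path reaches it.
        return set().union(*sets)
    if mode == "restrict":
        # AND: a row is kept only if every path reaches it.
        return set.intersection(*sets)
    raise ValueError(f"unknown mode: {mode!r}")


via_session = {1, 2, 3}
via_stimulus = {2, 3, 4}
to_delete = converge([via_session, via_stimulus], "cascade")
to_export = converge([via_session, via_stimulus], "restrict")
```

With these inputs the cascade affects rows 1 through 4, while the restriction keeps only rows 2 and 3.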

Bug fixes

  • Index declaration parsing: Allow inline comments on index lines (e.g., index(y, z) # comment)
  • SQL generation: Restrictions are applied via restrict() → make_condition() rather than by direct _restriction assignment, fixing invalid WHERE clauses

Advantages over previous implementation

| Scenario | Error-driven (prior) | Graph-driven (new) |
| --- | --- | --- |
| MySQL 8 + limited privileges | Crashes (#1110) | Works; no error parsing needed |
| PostgreSQL | Savepoint overhead per attempt | No errors triggered |
| part_integrity | Post-hoc check after delete | Data-driven post-check (no false positives) |
| Inspectability | Opaque recursive cascade | preview() / dry_run before executing |
| Reusability | Delete-only | Delete, drop, export, prune |

Files changed

| File | Change |
| --- | --- |
| src/datajoint/diagram.py | cascade(), restrict(), preview(), prune(), _restricted_table(); no delete()/drop() |
| src/datajoint/table.py | Table.delete() and Table.drop() own all execution logic, dry_run parameter |
| src/datajoint/declare.py | Fix index declaration regex to allow trailing comments |
| src/datajoint/user_tables.py | Part.delete() passes part_integrity and dry_run through |
| src/datajoint/version.py | Bump to 2.2.0dev0 |
| docs/design/thread-safe-mode.md | Removed (captured in datajoint-docs) |
| tests/integration/test_cascade_delete.py | New dry_run tests for delete and drop |
| tests/integration/test_erd.py | Prune tests |

Test plan

  • All existing cascade delete tests pass (MySQL + PostgreSQL)
  • New dry_run tests for delete and drop
  • All 20 datajoint-docs tutorials pass against PostgreSQL
  • Full suite passes
  • Pre-commit hooks pass

🤖 Generated with Claude Code

dimitri-yatsenko and others added 5 commits February 21, 2026 13:56
Graph-driven cascade delete using restricted Diagram nodes,
replacing error-message parsing with dependency graph traversal.
Addresses MySQL 8 privilege issues and PostgreSQL overhead.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Unrestricted nodes are not affected by operations
- Multiple restrict() calls create separate restriction sets
- Delete combines sets with OR (any taint → delete)
- Export combines sets with AND (all criteria → include)
- Within a set, multiple FK paths combine with OR (structural)
- Added open questions on lenient vs strict AND and same-table restrictions

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Delete: one restriction, propagated downstream only, OR at convergence
- Export: downstream + upstream context, AND at convergence
- Removed over-engineered "multiple restriction sets" abstraction
- Clarified alias nodes (same parent, multiple FKs) vs convergence (different parents)
- Non-downstream tables: excluded for delete, included for export

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- cascade(): OR at convergence, downstream only — for delete
- restrict(): AND at convergence, includes upstream context — for export
- Both propagate downstream via attr_map, differ only at convergence
- Table.delete() internally constructs diagram.cascade()
- part_integrity is a parameter of cascade(), not delete()

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Table.drop() rewritten as Diagram(table).drop()
- Shared infrastructure: reverse topo traversal, part_integrity pre-checks,
  unloaded-schema error handling, preview
- drop is DDL (no restrictions), delete is DML (with cascade restrictions)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
dimitri-yatsenko and others added 2 commits February 22, 2026 11:15
Replace the error-driven cascade in Table.delete() (~200 lines) with
graph-driven restriction propagation on Diagram. Table.delete() and
Table.drop() now delegate to Diagram.cascade().delete() and
Diagram.drop() respectively.

New Diagram methods:
- cascade(table_expr) — OR at convergence, one-shot, for delete
- restrict(table_expr) — AND at convergence, chainable, for export
- delete() — execute cascade delete in reverse topo order
- drop() — drop tables in reverse topo order
- preview() — show affected tables and row counts
- _from_table() — lightweight factory for Table.delete/drop

Restructure: single Diagram(nx.DiGraph) class always defined.
Only visualization methods gated on diagram_active.

Resolves #865, #1110.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Resolve conflicts in diagram.py and table.py:
- Adopt master's config access pattern (self._connection._config)
- Keep graph-driven cascade/restrict implementation
- Apply master's declare() config param, split_full_table_name(), _config in store context

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
dimitri-yatsenko changed the title from "Design: Restricted Diagrams for cascading operations" to "Implement graph-driven cascade delete and restrict on Diagram" on Feb 22, 2026
- Add assert after conditional config import to narrow type for mypy
  (filepath.py, attach.py)
- Add Any type annotation to untyped config parameters (hash_registry.py)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
dimitri-yatsenko and others added 10 commits February 23, 2026 13:54
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Cascade restrictions stored as plain lists (for OR semantics) were
being directly assigned to ft._restriction, causing list objects to
be stringified as Python repr ("[' condition ']") in SQL WHERE clauses.

Use restrict_in_place() which properly handles lists as OR conditions
through the standard restrict() path. Also fix version string to be
PEP 440 compliant.
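
The failure mode is easy to reproduce outside DataJoint. The sketch below is illustrative only: both helpers are hypothetical and build SQL text directly, whereas the actual fix routes the list through the library's restriction path.

```python
def where_clause_buggy(restriction):
    """Mimics direct assignment: the list is stringified as a Python repr."""
    return f"WHERE {restriction}"


def where_clause_or(conditions):
    """Treats a list of conditions as alternatives joined with OR."""
    return "WHERE " + " OR ".join(f"({c})" for c in conditions)


conditions = ["subject_id = 1", "subject_id = 2"]
bad = where_clause_buggy(conditions)
good = where_clause_or(conditions)
```

Here `bad` contains the brackets and quotes of the list repr, which is not valid SQL, while `good` is `WHERE (subject_id = 1) OR (subject_id = 2)`.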

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The delete() pre-check for part_integrity="enforce" was hardcoded and
did not respect the part_integrity parameter passed to cascade(). Also,
explicitly deleting from a part table (e.g. Website().delete()) would
always fail because the cascade seed is the part itself and its master
is never in the cascade graph.

Fix: store _part_integrity and _cascade_seed during cascade(), only run
the enforce check when part_integrity="enforce", and skip the seed node
since it was explicitly targeted by the caller.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The pre-check on the cascade graph was too conservative — it flagged
part tables that appeared in the graph but had zero rows to delete.
The old code checked actual deletions within a transaction.

Replace the graph-based pre-check with a post-hoc check on
deleted_tables (tables that actually had rows deleted). If a part
table had rows deleted without its master also having rows deleted,
roll back the transaction and raise DataJointError. This matches
the original part_integrity="enforce" semantics.
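
A minimal sketch of the data-driven post-check, assuming the delete loop records how many rows each table actually lost. The function name and both dictionaries are hypothetical; in the real code the violation triggers a rollback and a DataJointError.

```python
def part_integrity_violations(deleted_counts, master_of):
    """Return part tables that lost rows while their master lost none.

    deleted_counts: table name -> number of rows actually deleted.
    master_of: part table name -> master table name.
    """
    deleted = {table for table, n in deleted_counts.items() if n > 0}
    return sorted(
        part
        for part in deleted
        if part in master_of and master_of[part] not in deleted
    )


master_of = {"Experiment.Trial": "Experiment"}

# The part is in the cascade graph but had zero affected rows: no violation.
ok = part_integrity_violations({"Session": 2, "Experiment.Trial": 0}, master_of)

# The part lost rows and its master did not: flagged.
flagged = part_integrity_violations({"Session": 2, "Experiment.Trial": 7}, master_of)
```

Checking counts rather than graph membership is what avoids the false positive in the first case.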

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…ion_attributes

FreeTable._restriction_attributes is None by default. The property
accessor initializes it to set() on first access. The make_condition
call in part_integrity="cascade" upward propagation was using the
private attribute directly, causing AttributeError when columns=None.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Adds prune() method that removes tables with zero matching rows from
the diagram. Without prior restrictions, removes physically empty
tables. With restrictions (cascade or restrict), removes tables where
the restricted query yields zero rows. Returns a new Diagram.
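
As a rough illustration of the behavior, the sketch below prunes a toy graph given precomputed row counts. The `prune` function and its inputs are hypothetical stand-ins; the real method queries each (possibly restricted) table for its count and returns a new Diagram.

```python
def prune(children, counts):
    """Return a new graph without tables whose matching row count is zero.

    children: table -> list of child tables.
    counts: table -> number of rows matching the current restriction.
    """
    keep = {table for table in children if counts.get(table, 0) > 0}
    return {
        table: [child for child in children[table] if child in keep]
        for table in children
        if table in keep
    }


children = {"Session": ["Scan", "Note"], "Scan": [], "Note": []}
counts = {"Session": 3, "Scan": 5, "Note": 0}
pruned = prune(children, counts)
```

The input graph is left untouched, and pruning the result again with the same counts changes nothing, which mirrors the idempotency test mentioned below.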

Includes 5 integration tests: unrestricted prune, prune after restrict,
prune after cascade, idempotency, and prune-then-restrict chaining.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Add prune() method to both spec and design docs
- Rename _propagate_to_children → _propagate_restrictions + _apply_propagation_rule
- Fix delete() part_integrity: post-check with rollback, not pre-check
- Add _part_integrity instance attribute
- Update files affected, verification, and implementation phases
- Mark open questions as resolved with actual decisions
- Mark export/restore as future work

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Remove process artifacts (implementation phases, verification checklists,
resolved decisions, files-changed tables). Both documents now describe
the current system as-is, ready for migration into datajoint-docs.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
dimitri-yatsenko marked this pull request as draft March 2, 2026 20:49
dimitri-yatsenko and others added 2 commits March 6, 2026 18:15
…ost-check part integrity

Replace direct `_restriction` assignment with `restrict()` calls in Diagram
so that AndList and QueryExpression objects are converted to valid SQL via
`make_condition()`. Cascade delete uses OR convergence (a row is deleted if
ANY FK reference points to a deleted row), while restrict/export uses AND.

Part integrity enforcement uses a data-driven post-check: only raises when
rows were actually deleted from a Part without its master also being deleted.
This avoids false positives when a Part table appears in the cascade graph
but has zero affected rows.

Also adds dry_run support to delete()/drop(), prune() method, fixes CLI test
subprocess invocation, and updates test fixtures.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Merge restricted-diagram.md and restricted-diagram-spec.md into a single
document reflecting the final implementation: _restrict_freetable for SQL
generation, OR/AND convergence semantics, data-driven part_integrity
post-check, and dry_run support.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
dimitri-yatsenko marked this pull request as ready for review March 7, 2026 11:05
Move OR/AND convergence logic into a single Diagram method that returns
a FreeTable with the diagram's restrictions already applied. Callers no
longer need to know about modes or pass restriction lists explicitly.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
dimitri-yatsenko and others added 12 commits March 9, 2026 08:54
Update Part.delete() kwargs docstring to document the dry_run parameter.
Add integration test for Part.delete(dry_run=True) with part_integrity="ignore".

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Part.delete() was not returning the result from super().delete(),
causing dry_run=True to return None instead of the row count dict.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
cascade() now removes non-descendant nodes from the returned Diagram,
so the graph itself defines the delete scope. This eliminates the
redundant _cascade_restrictions membership check in delete() — it
simply walks all non-alias nodes in the trimmed graph.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
These warnings originate from matplotlib's internal pyparsing usage
(_fontconfig_pattern.py, _mathtext.py), not from datajoint code.
Filter them in pytest config to reduce noise.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The CI environment uses a newer pyparsing that doesn't have
PyparsingDeprecationWarning. Use a message-based DeprecationWarning
filter scoped to matplotlib instead.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The PyparsingDeprecationWarning only occurs in older matplotlib
versions. CI uses a newer version where it doesn't exist.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- cascade() now documents graph trimming step (step 4)
- delete() walks all non-alias nodes (graph already trimmed)
- _restrict_freetable() renamed to _restricted_table() (instance method)
- Sharpen distinction between cascade (delete) and restrict (subset)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Diagram is now purely a graph computation and inspection tool
(cascade, restrict, preview, prune). All mutation logic — transaction
management, SQL execution, prompts — lives in Table.delete() and
Table.drop(). Remove design docs superseded by datajoint-docs specs.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The regex matching index lines in table definitions required exact
end-of-line after the closing paren, rejecting valid declarations
like `index(y, z) # for efficient coronal slice queries`. Updated
regex to accept optional trailing comments and strip them before
passing to compile_index.
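
A pattern of this kind might look like the following. This is a sketch under assumptions: `INDEX_LINE` and `parse_index_line` are hypothetical and are not the regex used in declare.py; they only show how an optional trailing comment can be accepted and discarded.

```python
import re

# Hypothetical pattern: optional "unique", the attribute list in parentheses,
# then an optional "# comment" before end of line.
INDEX_LINE = re.compile(
    r"^\s*(?P<unique>unique\s+)?index\s*\((?P<args>[^)]*)\)\s*(#.*)?$",
    re.IGNORECASE,
)


def parse_index_line(line):
    """Return (is_unique, attribute_names), or None if not an index line."""
    match = INDEX_LINE.match(line)
    if match is None:
        return None
    attrs = [a.strip() for a in match.group("args").split(",") if a.strip()]
    return bool(match.group("unique")), attrs


parsed = parse_index_line("index(y, z) # for efficient coronal slice queries")
```

Text after the closing parenthesis that is not a comment still fails to match, so malformed lines are not silently accepted.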

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Diagram now supports Python iteration protocol, yielding FreeTable
objects in topological order. Table.delete() and Table.drop() use
reversed(diagram) instead of manual topo_sort loops.
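
The iteration protocol can be sketched with a toy class. `ToyDiagram` is hypothetical and yields table names rather than FreeTable objects; it assumes the graph is stored as a mapping from each node to its parents and uses the standard-library `graphlib` for ordering.

```python
from graphlib import TopologicalSorter


class ToyDiagram:
    """Toy stand-in: iterates node names in topological order."""

    def __init__(self, parents):
        # parents: node -> set of nodes it depends on
        self._parents = parents

    def __iter__(self):
        # Parents are yielded before their children.
        return iter(TopologicalSorter(self._parents).static_order())

    def __reversed__(self):
        # Children first: the order needed for delete and drop.
        return iter(list(self)[::-1])


diagram = ToyDiagram({"Session": {"Subject"}, "Scan": {"Session"}})
forward = list(diagram)
backward = list(reversed(diagram))
```

Defining `__reversed__` is what lets the built-in `reversed()` work on an object that is not a sequence.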

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Avoids confusion with QueryExpression.preview() which shows table
contents. Diagram.counts() returns row counts per table.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>