feat: Support grounding without parent_ids #27

Open
nicoloesch wants to merge 2 commits into main from 22-unconstrained-grounding

@nicoloesch nicoloesch commented Jul 2, 2026

Collaborator

Summary

Grounding currently requires one or more explicit parent_ids to anchor candidates within a known hierarchy. That precision comes at a cost, because grounding is unusable wherever no such anchor exists yet:

  • annotating free text with no known ontological context,
  • benchmark cases sourced from external tools that don't carry a parent, or
  • open-ended graph exploration (MCP/ACP) where a caller is searching the concept space itself rather than validating a hit against an already-known branch.

This PR adds a parent-less mode: candidates still resolve and standardize to their canonical OMOP form via lexical/semantic search and identity hops, just without ancestor verification. This trades disambiguation power for coverage where no anchor is available.

Implementation

  • find_standard_paths now accepts targets: Optional[Tuple[int, ...]]
    • With targets: behavior is unchanged (batched ancestor check, separation = ancestor-hierarchy distance to the required parent).
    • Without targets: the first standard concept reached via an edge (IDENTITY in grounding) is accepted directly:
      • no ancestor query is issued at all
      • separation is repurposed as the identity-hop distance from the original candidate
      • expansion is capped at max_depth hops to bound the walk
  • Added StandardConcept.identity_hops: the raw BFS hop count, captured unconditionally in both modes from item.iterations (previously computed and discarded). It is currently unused, but it could later support disambiguation in the constrained case.
  • Scoring itself (scoring.py) needed no changes: parsimony_penalty = alpha * separation already generalises correctly, since separation carries the right distance for either mode.
  • Deliberately did not add a depth/specificity-to-root signal.
    • The concept graph has no single root. It's a DAG with many disconnected per-domain/per-vocabulary top nodes.
    • Depth-to-root therefore isn't comparable across candidates (a Condition concept vs. a Drug concept, or even two branches of the same domain) and would read as a principled signal while actually being noise.
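
For reviewers, the two modes can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the in-memory identity_edges, standard_ids and ancestor_distance structures, the default max_depth of 3, and the choice to stop expanding a branch once it reaches a standard concept are all assumptions made so the example is self-contained. Only the names find_standard_paths, targets, separation, identity_hops, max_depth and parsimony_penalty come from the description above.

```python
from collections import deque
from dataclasses import dataclass
from typing import Dict, List, Optional, Set, Tuple


@dataclass(frozen=True)
class StandardConcept:
    concept_id: int
    separation: int      # distance consumed by scoring
    identity_hops: int   # raw BFS hop count, recorded in both modes


def find_standard_paths(
    start: int,
    identity_edges: Dict[int, List[int]],            # hypothetical adjacency map
    standard_ids: Set[int],                          # hypothetical set of standard concepts
    ancestor_distance: Dict[Tuple[int, int], int],   # hypothetical (concept, parent) -> distance
    targets: Optional[Tuple[int, ...]] = None,
    max_depth: int = 3,                              # assumed default
) -> List[StandardConcept]:
    """Breadth-first walk over identity edges, starting from one candidate."""
    results: List[StandardConcept] = []
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        concept, hops = queue.popleft()
        if concept in standard_ids:
            if targets is None:
                # Parent-less mode: accept directly, no ancestor lookup.
                # separation is the identity-hop distance from the candidate.
                results.append(StandardConcept(concept, hops, hops))
            else:
                # Constrained mode: keep the concept only if it descends from
                # a required parent; separation is the hierarchy distance.
                distances = [
                    ancestor_distance[(concept, t)]
                    for t in targets
                    if (concept, t) in ancestor_distance
                ]
                if distances:
                    results.append(StandardConcept(concept, min(distances), hops))
            continue  # assumption: do not expand past a standard concept
        if hops >= max_depth:
            continue  # bound the walk
        for neighbour in identity_edges.get(concept, []):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, hops + 1))
    return results


def parsimony_penalty(alpha: float, separation: int) -> float:
    """Same formula in both modes; only the meaning of separation differs."""
    return alpha * separation
```

Calling the sketch with targets=None returns standard concepts whose separation equals identity_hops, while passing targets filters by the ancestor lookup and reports the hierarchy distance instead. In both cases parsimony_penalty is applied unchanged, which mirrors why scoring.py did not need to be touched.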

@nicoloesch nicoloesch changed the title Support grounding without parent_ids feat: Support grounding without parent_ids Jul 2, 2026
@nicoloesch nicoloesch marked this pull request as ready for review July 2, 2026 23:10
@nicoloesch nicoloesch requested a review from gkennos July 2, 2026 23:10

Development

Successfully merging this pull request may close these issues.

Grounding without Parent ID
