feat: Centralised configuration utility by nicoloesch · Pull Request #17 · AustralianCancerDataNetwork/omop-graph

nicoloesch · 2026-06-09T05:47:53Z

Summary

Adapts omop-graph to the oa-configurator configuration layer, replacing the previous environment-variable-based setup with a typed TOML-backed config and a first-class omop-config configure omop_graph subcommand.

FIXES: Include oa-configurator for centralised configuration #15

Notes

Because of their importance, this PR also absorbs the following issues:

FIXES: Hardcodes COSINE #16
FIXES: Semantic similarity is also being used for non-embedding resolvers #23
FIXES: N+1 SQL round-trips in ancestry validation during concept_ground (find_standard_paths / find_standard_concepts) #24
FIXES: Reclassify loose/historical/lossy relationship types out of Identity in predicate_mapping.csv #25

Changes

OmopGraphConfig subclasses PackageConfigBase, exposing all package settings as typed Pydantic fields backed by [tools.omop_graph] in ~/.config/omop/config.toml
Entry point registered under omop.config so omop-config configure omop_graph prompts for package extras interactively or via named flags (--max-depth, --max-paths); omop-graph owns no database resource directly — it relies on the CDM resource configured by omop-alchemy
OmopGraphConfig.get_config() (inherited classmethod) replaces the old standalone get_config() function; all internal call sites updated
Resolver.from_active_config() replaces the old standalone get_resolver() function; all engine-creation helpers updated (db/session.py, oaklib_interface/omop_factory.py)
OmopGraphConfig.configure_logging(verbosity=…) (inherited classmethod) replaces the old standalone configure_logging() wrapper; extra_logging_namespaces = ("omop_alchemy", "omop_emb") declares transitive dependencies whose logs are also configured (omop_emb is optional but harmless to include)
docker-compose.yaml updated from --resource-set/--set flags to named flags matching the new CLI

gkennos

I think only the comments in paths and grounding actually block the release

gkennos · 2026-06-30T07:40:12Z


-        if max_concepts and len(found_standard_concepts) >= max_concepts:
+        if max_concepts and all(
+            found_count_per_target.get(t, 0) >= max_concepts for t in targets


If one of these targets has no concept_ancestor values, then all() is permanently False and max_concepts has no effect.

Valid edge case, but I think the current code is already the right trade-off. When all targets are reachable and all of them reach max_concepts, the break fires correctly. When some targets are unreachable, the break never fires and the BFS drains fully. That is the same as having no limit set at all, but it is what a per-target limit implies.

Instead of optimising the early-stopping condition, I optimised the number of round-trips to the DB by doing batched ancestry search. This should reduce the number of requests to the DB and speed the BFS up.

yup ok that's a good optimisation
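
The edge case discussed in this thread can be reproduced in isolation. The sketch below assumes the stopping condition has the shape shown in the diff; `should_stop` is a hypothetical helper name, while `found_count_per_target`, `targets` and `max_concepts` are taken from the diff.

```python
def should_stop(found_count_per_target, targets, max_concepts):
    """Per-target early stop: every target must have reached max_concepts."""
    return bool(max_concepts) and all(
        found_count_per_target.get(t, 0) >= max_concepts for t in targets
    )


targets = ("a", "b")

# Both targets reached the limit: the break fires.
all_reached = should_stop({"a": 2, "b": 2}, targets, max_concepts=2)

# Target "b" is unreachable (never counted): the break can never fire,
# so the search drains fully, as if no limit were set.
one_unreachable = should_stop({"a": 5}, targets, max_concepts=2)

# No limit configured: the condition is always False.
no_limit = should_stop({"a": 5, "b": 5}, targets, max_concepts=None)

print(all_reached, one_unreachable, no_limit)
```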

gkennos · 2026-06-30T08:20:06Z

+        standard_concepts=tuple(sc for sc in standard_concepts if sc.match_kind != LabelMatchKind.EMBEDDING),
        kg=kg,
-        nearest_concept_matches=nearest_concept_matches,
+        nearest_concept_matches=None,  # No embedding-based scoring for non-embedding matches


How come this was changed? If a synonym is very different textually but somehow overwhelmed in embedding space, it should not be penalised that hard. For example, CIN is a synonym of Cervical intraepithelial neoplasia but does not appear in the top n embedding concepts; I don't think it should be scored to effectively 0 for its lack of textual similarity. It should get its (kind of mid, but existent) embedding score, even though that score wasn't enough to place in the top n and be resolved by the embedding resolver.

Was line 190/191 a huge performance hit? Is there a specific reason to keep the list shorter there? Otherwise I think it should not be None here.

The split is intentional. Resolvers that find concepts textually (Exact/Synonym/Partial/FTS) score textually; the EmbeddingResolver scores semantically. Adding embedding scoring to FTS results reintroduces the dilution problem the split was designed to fix: FTS surfaces hundreds of NOS/body-part variants with nearly identical embeddings, and scoring all of them semantically swamps the concept-specific signal.

This is NOT a performance fix but a structural/syntactic (?) fix.

ok that is rational - just wasn't clear on why it was changed
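
A minimal sketch of the split described in this thread: candidates found textually go to textual scoring, and only embedding matches go to semantic scoring. Only `LabelMatchKind.EMBEDDING` and the `match_kind` attribute appear in the diff. The other enum members, the `Candidate` class, `split_by_scoring` and the concept IDs are assumptions made for illustration.

```python
from dataclasses import dataclass
from enum import Enum, auto


class LabelMatchKind(Enum):
    # Only EMBEDDING is confirmed by the diff; the rest mirror the
    # resolver names mentioned in the discussion and are assumptions.
    EXACT = auto()
    SYNONYM = auto()
    PARTIAL = auto()
    FTS = auto()
    EMBEDDING = auto()


@dataclass(frozen=True)
class Candidate:
    concept_id: int
    match_kind: LabelMatchKind


def split_by_scoring(candidates):
    """Return (textually scored, semantically scored) candidates."""
    textual = tuple(c for c in candidates if c.match_kind != LabelMatchKind.EMBEDDING)
    semantic = tuple(c for c in candidates if c.match_kind == LabelMatchKind.EMBEDDING)
    return textual, semantic


candidates = (
    Candidate(1, LabelMatchKind.SYNONYM),
    Candidate(2, LabelMatchKind.FTS),
    Candidate(3, LabelMatchKind.EMBEDDING),
)
textual, semantic = split_by_scoring(candidates)
print([c.concept_id for c in textual], [c.concept_id for c in semantic])
```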

nicoloesch · 2026-07-01T01:50:04Z

@gkennos Included requested changes. There are 3 open conversations I would like your feedback on, which is why they are not being marked as "Resolved".

In addition to the suggested changes, I added an optimisation to the BFS search that queries the DB for the concept ancestors of the entire heap at once. This should significantly reduce the number of DB queries, since the ancestry check is no longer done for EACH individual item. The improvement should become more noticeable the more concepts we have to check.

Let me know your thoughts on the implementation. It deliberately leaves room for assigning costs to specific predicate kinds in the future (which is why the heap is still there), so we can extend it if we want to. The docstrings should also indicate this.
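
The batching idea can be sketched against an in-memory SQLite table. This is not the code from the PR: `fetch_ancestors_batched` is a hypothetical helper, and the sample rows are invented. The `concept_ancestor` table and its `ancestor_concept_id` / `descendant_concept_id` columns follow the OMOP CDM.

```python
import sqlite3


def fetch_ancestors_batched(conn, child_ids):
    """Fetch ancestors for many concepts in one query instead of one per concept."""
    ids = list(dict.fromkeys(child_ids))  # de-duplicate, keep order
    result = {cid: set() for cid in ids}
    if not ids:
        return result
    placeholders = ",".join("?" for _ in ids)
    rows = conn.execute(
        "SELECT descendant_concept_id, ancestor_concept_id "
        f"FROM concept_ancestor WHERE descendant_concept_id IN ({placeholders})",
        ids,
    )
    for descendant, ancestor in rows:
        result[descendant].add(ancestor)
    return result


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE concept_ancestor ("
    "ancestor_concept_id INTEGER, descendant_concept_id INTEGER)"
)
conn.executemany(
    "INSERT INTO concept_ancestor VALUES (?, ?)",
    [(1, 10), (2, 10), (1, 20)],
)

# One round-trip covers every id currently on the heap; 30 has no ancestors.
ancestors = fetch_ancestors_batched(conn, [10, 20, 30])
print(ancestors)
```

For very large heaps the id list would need chunking, since databases cap the number of bound parameters per statement.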

nicoloesch added 23 commits June 4, 2026 01:00

oa-configuration + docker-compose + docs update + logging

3b5d9ca

Update the kg to have the correct imports for fulltext

94ebae4

Resolve params from cfg

2b1f9d9

Remove skip-if-configured

f08afcf

Consolidate cfg, rename primary_db to database

bacbba4

Update the logging interface

69c8ddd

Include more performant omop-cdm-db

8368c16

Update to new test capabilities of oa-configurator

847879e

Check ruff, pylance, mkdocs and pytest

fe8f600

Corrected import for cli_utils

416cc4d

Updated benchmark script to include new config utility

6b3417c

Update oa-configurator dependency

1a5796a

Small change to docs

e1f8095

Update formatting after ruff

276c465

Updated tracing, included amia cases, embeddings only for embedding r…

29d0594

…esolver

Support OHDSI gold standard concepts

94e19c5

Batched ancestry search

708c58c

Update trace example with progressbar and the option with the parent_…

e245907

…id_level

Update the predicate classification

9d41963

AMIA trace example version with updated plots

efae8b3

Remove old manual AMIA cases

5c9a06a

Remove conftest and unused variables

a885f16

Ruff check, finalised versioning

3fa56fa

nicoloesch marked this pull request as ready for review June 30, 2026 02:09

nicoloesch requested a review from gkennos June 30, 2026 02:09

gkennos requested changes Jun 30, 2026

nicoloesch added 3 commits July 1, 2026 01:28

Address PR comments, speed up BFS search

0340bb3

Fix ancestry cases with the new interface of multiple child_ids

409cae9

Further docstrings and cleanup, Force the grounding constraints to be…

82002b4

… IDENTITY

nicoloesch requested a review from gkennos July 1, 2026 01:50

gkennos approved these changes Jul 1, 2026

Bump omop-emb version with new hotfix

8d63581

nicoloesch merged commit 049cd6e into main Jul 2, 2026
4 checks passed

nicoloesch deleted the 15-oa-config branch July 2, 2026 06:18

nicoloesch merged 27 commits into main from 15-oa-config
