Skip to content

Make staging table location schema-aware instead of relying on the _staging_ prefix #15

Description

@nicoloesch

Notes

This is an issue created as a response to the discussion of omop-alchemy (see AustralianCancerDataNetwork/OMOP_Alchemy#14 (comment)). It therefore depends on PR 14 of `omop-alchemy to complete!

Context

Staging tables are disambiguated purely by name, never by schema. staging_tablename() (src/orm_loader/tables/loadable_table.py:71-86) returns _staging_{tablename}, and every statement in src/orm_loader/backends/postgres.py (create_staging_table, drop_staging_table, load_staging_fast, the merge methods) uses that name unqualified, e.g. CREATE UNLOGGED TABLE "{staging_name}" (LIKE "{target_name}" ...). orm_loader has no concept of a schema anywhere in this path.

Problem

A consumer that wants staging tables physically isolated in their own schema (kept out of backups, migrations, and introspection of the real schema) has no API for it. The only workaround today, used in OMOP_Alchemy's cli_vocab.py, is to put a dedicated staging schema first in search_path and rely on the _staging_ prefix never colliding with a real table name so the unqualified SQL resolves to the right schema by accident. That's fragile, and required a full search_path walkthrough in PR review for someone to understand which table an unqualified TRUNCATE was actually hitting.

Proposal

Thread an explicit schema through the staging table lifecycle (e.g. staging_schema: str | None on create_staging_table/drop_staging_table/load_staging_fast/the merge methods, or a class-level override alongside _staging_tablename), and qualify identifiers ("{schema}"."{name}") instead of leaving resolution to search_path. Consumers that don't pass a schema keep today's behavior.

Decision

We considered dropping the separate schema and leaning entirely on the _staging_ prefix instead, but went with making orm_loaderschema-aware so the schema separation OMOP_Alchemy wants doesn't depend on name non-collision.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions