
A query in scalar context should use the first column #75

@xitology


Consider a query used in a scalar context, such as a SELECT clause, an EXISTS expression, an IN expression, or an argument of a scalar function.

For simplicity, assume a query that returns one row, e.g., `@funsql from(person).limit(1)`. It still returns multiple columns: `person_id`, `gender_concept_id`, etc. How should this query be interpreted when used in a scalar context, such as:

```julia
@funsql select(from(person).limit(1))
```

This is a challenge because in a scalar context a query must return exactly one column. Currently, the returned value is NULL unless the column is explicitly specified with the `select()` combinator. Thus, `select(from(person).limit(1))` returns NULL, while `select(from(person).limit(1).select(person_id))` returns the value of `person_id`.

This interpretation lets us accept any query in a scalar context, which is particularly useful for EXISTS, where only the presence of rows matters. However, it may cause confusion when the query is used as an argument of IN or of a scalar function, since a query without `select()` silently produces NULL.

There is a better interpretation: A query used in a scalar context should return its first column.

This interpretation does not change the semantics of a query with an explicit `select()`. For a query without `select()`, it picks the first column of the table, which is typically its primary key. This allows us to write, for example:

```julia
@funsql begin
    cohort() = begin
        from(person)
        filter(gender_concept_id == 8532)
    end

    relevant_visit() = begin
        from(visit_occurrence)
        filter(person_id in cohort()) # rather than `in cohort().select(person_id)`
    end
end
```
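To make the difference between the two rules concrete, here is a minimal toy model in plain Julia. It does not use FunSQL at all: `ToyQuery`, `table_query`, `with_select`, `scalar_column_current`, and `scalar_column_proposed` are hypothetical names invented for this illustration. A query is reduced to an ordered list of column names plus an optional column picked by a trailing single-column `select()`. This is a sketch of the proposed rule under those assumptions, not FunSQL's implementation.

```julia
# Hypothetical toy model of how a query is reduced to a single column in a
# scalar context. None of these names exist in FunSQL; this is only a sketch.

struct ToyQuery
    columns::Vector{Symbol}           # columns the query exposes, in order
    selected::Union{Symbol,Nothing}   # column named by a trailing select(), if any
end

# A bare table query such as from(person): all columns, no explicit select().
table_query(columns::Vector{Symbol}) = ToyQuery(columns, nothing)

# Models q.select(col) with a single column.
function with_select(q::ToyQuery, col::Symbol)
    col in q.columns || error("unknown column: $col")
    return ToyQuery([col], col)
end

# Current rule: without an explicit select(), the scalar value is NULL
# (modeled here as `nothing`).
scalar_column_current(q::ToyQuery) = q.selected

# Proposed rule: an explicit select() wins; otherwise use the first column.
function scalar_column_proposed(q::ToyQuery)
    q.selected === nothing || return q.selected
    isempty(q.columns) && error("query exposes no columns")
    return first(q.columns)
end

# Usage, mirroring `person_id in cohort()` from the example above.
person = table_query([:person_id, :gender_concept_id])
println(scalar_column_current(person))     # prints "nothing" (NULL)
println(scalar_column_proposed(person))    # prints "person_id"
```

In SQL terms, `person_id IN (...)` against a subquery that yields only NULL is never true, so under the current rule the filter drops every row; under the proposed rule the comparison is made against `person_id`, as intended.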
