Skip to content

Restore nullability support for consuming Substrait fields. #12727

@westonpace

Description

@westonpace

Is your feature request related to a problem or challenge?

In #11130 the Substrait consumer was changed to always produce nullable fields.

However, I'm not entirely sure of the rationale. From that PR it states:

Arrow requires schema to match exactly to literals. Substrait contains nullability separate in literals and types, and the nullability of a literal may vary
row-by-row.

What does "Arrow requires schema to match exactly to literals" mean? Arrow doesn't really have a concept of literals that I'm aware of (neither does arrow-rs). So Arrow shouldn't really care if literals are nullable or not.

It appears that Datafusion maps literals to ScalarValue and there is no such thing in Datafusion as a non-nullable literal. So it makes sense to me that nullability is ignored when consuming literals.

However, arrow-rs and Substrait both have a concept of nullable fields and, as far as I can tell, those concepts are compatible. Can we map Substrait nullability to Arrow field nullability?

Describe the solution you'd like

The method from_substrait_type should either return (DataType, bool) or Field (with an empty name).
The method from_substrait_named_struct should return a schema that has the nullability set appropriately in fields.

Non-nullable fields can round-trip from DF->Substrait->DF without losing their non-nullability.

Describe alternatives you've considered

No response

Additional context

No response

Metadata

Metadata

Assignees

No one assigned

    Labels

    enhancementNew feature or request

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions