GetIndexedField doesn't support indexing Map types #7824

@swgillespie

Describe the bug

If you load a Parquet file that has a column of type Map, you can't write a query that indexes into that column with GetIndexedField. This appears to be because GetIndexedField supports only Struct and List types, not Map.

To Reproduce

DataFusion CLI v31.0.0
❯ create external table test stored as parquet location '../scratch';
0 rows in set. Query took 0.014 seconds.

❯ show columns from test;
+---------------+--------------+------------+-------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------------+
| table_catalog | table_schema | table_name | column_name | data_type | is_nullable |
+---------------+--------------+------------+-------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------------+
| datafusion    | public       | test       | ints        | Map(Field { name: "entries", data_type: Struct([Field { name: "key", data_type: Utf8, nullable: false, dict_id: 0, dict_is_ordered: false, metadata: {} }, Field { name: "value", data_type: Int64, nullable: false, dict_id: 0, dict_is_ordered: false, metadata: {} }]), nullable: false, dict_id: 0, dict_is_ordered: false, metadata: {} }, false) | NO          |
| datafusion    | public       | test       | strings     | Map(Field { name: "entries", data_type: Struct([Field { name: "key", data_type: Utf8, nullable: false, dict_id: 0, dict_is_ordered: false, metadata: {} }, Field { name: "value", data_type: Utf8, nullable: false, dict_id: 0, dict_is_ordered: false, metadata: {} }]), nullable: false, dict_id: 0, dict_is_ordered: false, metadata: {} }, false)  | NO          |
| datafusion    | public       | test       | timestamp   | Utf8 | NO          |
+---------------+--------------+------------+-------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------------+
❯ select avg(ints['bytes']), strings['method'] from test group by strings['method'];
Error during planning: The expression to get an indexed field is only valid for `List` or `Struct` types, got Map(Field { name: "entries", data_type: Struct([Field { name: "key", data_type: Utf8, nullable: false, dict_id: 0, dict_is_ordered: false, metadata: {} }, Field { name: "value", data_type: Int64, nullable: false, dict_id: 0, dict_is_ordered: false, metadata: {} }]), nullable: false, dict_id: 0, dict_is_ordered: false, metadata: {} }, false)

Expected behavior

I would expect the above query

SELECT avg(ints['bytes']), strings['method']
FROM test 
GROUP BY strings['method'];

to work and produce a result set with two columns.
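
To make the expected semantics concrete, here is a minimal sketch in plain Rust (standard library only, not DataFusion or Arrow code). It models each row of the `ints` column as a list of key/value entries, which mirrors the `entries` struct shown in the schema above. The names `MapRow`, `index_map_column`, and `avg_non_null` are hypothetical, and returning NULL for a missing key is an assumption about the desired behavior, not something DataFusion currently defines.

```rust
/// One row of a Map column: a list of (key, value) entries, mirroring the
/// `entries` struct with `key` and `value` fields from the schema.
/// (Hypothetical model type, not an Arrow or DataFusion type.)
type MapRow = Vec<(String, i64)>;

/// Hypothetical equivalent of `ints['bytes']`: look up `key` in every row.
/// Assumption: a row without the key yields None (SQL NULL).
fn index_map_column(column: &[MapRow], key: &str) -> Vec<Option<i64>> {
    column
        .iter()
        .map(|row| {
            row.iter()
                .find(|(k, _)| k.as_str() == key)
                .map(|(_, v)| *v)
        })
        .collect()
}

/// Average of the non-null values, since SQL `avg` skips NULLs.
/// Returns None when every value is NULL.
fn avg_non_null(values: &[Option<i64>]) -> Option<f64> {
    let present: Vec<i64> = values.iter().flatten().copied().collect();
    if present.is_empty() {
        None
    } else {
        Some(present.iter().sum::<i64>() as f64 / present.len() as f64)
    }
}
```

Under these assumptions, indexing a Map column produces one nullable value per row, so the result can feed an aggregate such as `avg` or a `GROUP BY` key the same way a Struct field access does.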

Additional context

No response

Labels: bug
