Support for grouping in UUID columns #46468

@Fokko

Describe the enhancement requested

Python 3.10.16 (main, Dec  3 2024, 17:27:57) [Clang 16.0.0 (clang-1600.0.26.4)]
Type 'copyright', 'credits' or 'license' for more information
IPython 8.31.0 -- An enhanced Interactive Python. Type '?' for help.

In [1]: import pyarrow as pa
   ...: import uuid
   ...: 
   ...: arr_table = pa.Table.from_pydict(
   ...:     {
   ...:         "uuid": [
   ...:             uuid.UUID("00000000-0000-0000-0000-000000000000").bytes,
   ...:             uuid.UUID("11111111-1111-1111-1111-111111111111").bytes,
   ...:         ],
   ...:     },
   ...:     schema=pa.schema(
   ...:         [
   ...:             pa.field("uuid", pa.uuid(), nullable=False),
   ...:         ]
   ...:     ),
   ...: )
   ...: 
   ...: arr_table.group_by('uuid').aggregate([])
---------------------------------------------------------------------------
ArrowNotImplementedError                  Traceback (most recent call last)
Cell In[1], line 18
      2 import uuid
      4 arr_table = pa.Table.from_pydict(
      5     {
      6         "uuid": [
   (...)
     15     ),
     16 )
---> 18 arr_table.group_by('uuid').aggregate([])

File /opt/homebrew/lib/python3.10/site-packages/pyarrow/table.pxi:6560, in pyarrow.lib.TableGroupBy.aggregate()

File /opt/homebrew/lib/python3.10/site-packages/pyarrow/acero.py:410, in _group_by(table, aggregates, keys, use_threads)
    404 def _group_by(table, aggregates, keys, use_threads=True):
    406     decl = Declaration.from_sequence([
    407         Declaration("table_source", TableSourceNodeOptions(table)),
    408         Declaration("aggregate", AggregateNodeOptions(aggregates, keys=keys))
    409     ])
--> 410     return decl.to_table(use_threads=use_threads)

File /opt/homebrew/lib/python3.10/site-packages/pyarrow/_acero.pyx:590, in pyarrow._acero.Declaration.to_table()

File /opt/homebrew/lib/python3.10/site-packages/pyarrow/error.pxi:155, in pyarrow.lib.pyarrow_internal_check_status()

File /opt/homebrew/lib/python3.10/site-packages/pyarrow/error.pxi:92, in pyarrow.lib.check_status()

ArrowNotImplementedError: Keys of type extension<arrow.uuid>

Looking at the stack trace, I think we need to change something here. The UUID is just a fixed-width column under the hood, so I think we can reuse that logic.

Thoughts from the Arrow maintainers?

Component(s)

Python
