[Rust][DataFusion] Improve performance of Array.slice #27229

@asfimport

Since #9271, DataFusion uses Array.slice to pass data into the accumulators, instead of paying the overhead of building many new arrays (each possibly holding only a few rows).

However, slicing currently seems quite inefficient: it accounts for about 1/6 of the instructions in hash aggregates and does some allocations under the hood instead of the promised "zero copy". That is much more than, for example, take, which copies / shuffles the entire array based on indices.
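
To make the "zero copy" expectation concrete, here is a minimal, std-only sketch. It is not Arrow's implementation: `SharedArray` and its methods are hypothetical names used only for illustration. A slice shares the underlying buffer and only adjusts an offset and a length, while a take-style gather has to allocate and fill a new buffer.

```rust
use std::sync::Arc;

/// Hypothetical, simplified stand-in for a primitive array:
/// a shared buffer plus an (offset, len) window into it.
#[derive(Clone, Debug)]
struct SharedArray {
    buffer: Arc<[i64]>,
    offset: usize,
    len: usize,
}

impl SharedArray {
    /// Builds an array that owns all of `values`.
    fn from_vec(values: Vec<i64>) -> Self {
        let len = values.len();
        SharedArray { buffer: Arc::from(values), offset: 0, len }
    }

    /// Zero-copy slice: bumps the reference count and narrows the window.
    /// No element data is copied.
    fn slice(&self, offset: usize, len: usize) -> Self {
        assert!(offset + len <= self.len, "slice out of bounds");
        SharedArray {
            buffer: Arc::clone(&self.buffer),
            offset: self.offset + offset,
            len,
        }
    }

    /// Copying gather, similar in spirit to a take kernel:
    /// allocates a new buffer and copies the selected elements into it.
    fn take(&self, indices: &[usize]) -> Self {
        let source = self.values();
        let values: Vec<i64> = indices.iter().map(|&i| source[i]).collect();
        SharedArray::from_vec(values)
    }

    /// The elements visible through this array's window.
    fn values(&self) -> &[i64] {
        &self.buffer[self.offset..self.offset + self.len]
    }
}
```

In this sketch a slice costs one atomic reference-count increment regardless of its length, which is the baseline the measurement above is being compared against.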

@jorgecarleitao

Yes, slicing is suboptimal atm. Also, IMO that method should not be implemented by Array itself, but by each array implementation individually. I haven't touched that part yet, though.
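
One possible reading of that suggestion, sketched with hypothetical types rather than Arrow's actual API: the shared trait exposes only what every array has in common, and each concrete type provides its own `slice` that returns the concrete type and touches only its own fields.

```rust
use std::sync::Arc;

/// Hypothetical minimal trait: only what is common to every array.
trait Array {
    fn len(&self) -> usize;
}

/// Hypothetical fixed-width array: a shared buffer plus a window.
struct Int64Array {
    values: Arc<[i64]>,
    offset: usize,
    len: usize,
}

/// Hypothetical string array: shared bytes plus a window into shared offsets.
/// Element i of the unsliced array spans bytes[offsets[i]..offsets[i + 1]].
struct Utf8Array {
    bytes: Arc<[u8]>,
    offsets: Arc<[usize]>,
    offset: usize,
    len: usize,
}

impl Int64Array {
    /// Type-specific slice: shares the buffer and returns the concrete type.
    fn slice(&self, offset: usize, len: usize) -> Int64Array {
        assert!(offset + len <= self.len, "slice out of bounds");
        Int64Array {
            values: Arc::clone(&self.values),
            offset: self.offset + offset,
            len,
        }
    }

    fn value(&self, i: usize) -> i64 {
        assert!(i < self.len, "index out of bounds");
        self.values[self.offset + i]
    }
}

impl Utf8Array {
    /// Type-specific slice: only the window moves; bytes and offsets are shared.
    fn slice(&self, offset: usize, len: usize) -> Utf8Array {
        assert!(offset + len <= self.len, "slice out of bounds");
        Utf8Array {
            bytes: Arc::clone(&self.bytes),
            offsets: Arc::clone(&self.offsets),
            offset: self.offset + offset,
            len,
        }
    }

    fn value(&self, i: usize) -> &str {
        assert!(i < self.len, "index out of bounds");
        let start = self.offsets[self.offset + i];
        let end = self.offsets[self.offset + i + 1];
        std::str::from_utf8(&self.bytes[start..end]).expect("valid UTF-8")
    }
}

impl Array for Int64Array {
    fn len(&self) -> usize {
        self.len
    }
}

impl Array for Utf8Array {
    fn len(&self) -> usize {
        self.len
    }
}
```

Because each `slice` returns a concrete type, a caller that already knows the array type does not need to go through a trait object or downcast afterwards.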
[Image: 105164296-42515780-5b15-11eb-87f0-a042c4287514.png]

Reporter: Daniël Heres / @Dandandan

Original Issue Attachments:

Note: This issue was originally created as ARROW-11331. Please see the migration documentation for further details.
