Describe the enhancement requested
Interoperation between numpy ndarrays and Arrow's list array types (ListArray, LargeListArray, FixedSizeListArray) is awkward in both directions.
Constructing values is hard: pa.array rejects ndarrays with more than one dimension, so one must first convert to a Python list-of-lists, which is unnecessarily expensive:
>>> import numpy as np
>>> import pyarrow as pa
>>> np_values = np.ones((3, 2), np.float64)
>>> pa_dtype = pa.list_(pa.float64())
>>> pa_values = pa.array(np_values, type=pa_dtype)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "pyarrow/array.pxi", line 323, in pyarrow.lib.array
File "pyarrow/array.pxi", line 83, in pyarrow.lib._ndarray_to_array
File "pyarrow/error.pxi", line 100, in pyarrow.lib.check_status
pyarrow.lib.ArrowInvalid: only handle 1-dimensional arrays
>>> pa_values = pa.array(np_values.tolist(), type=pa_dtype)
>>> pa_values
<pyarrow.lib.ListArray object at 0x11ba433a0>
[
[
1,
1
],
[
1,
1
],
[
1,
1
]
]
Likewise, converting a PyArrow list array to a numpy ndarray is tricky, as described in #35622. That issue covers FixedSizeListArrays, but the same is true of ListArrays, which often have equal-length lists in every entry, making them amenable to presentation as an ndarray.
I'd like to propose the following 6 new methods:
- FixedSizeListArray.from_numpy_ndarray(values, type):
  Constructs a new FixedSizeListArray from values, which must be a numpy ndarray with ndim == 2.
  type is optional; if unset, it is inferred from the ndarray's dtype.
  If type is set, values of the ndarray's dtype must be convertible to the provided type.
- FixedSizeListArray.to_numpy_ndarray(self):
  Returns the FixedSizeListArray's values as a numpy ndarray with a shape of (len(self), self.type.list_size).
  If any list entry of the FixedSizeListArray is itself null, raises an error.
  If any list contains a null element, returns an ndarray with nan in the null spots and dtype float64, or with None in the null spots and dtype object if a conversion to float64 is not possible. This matches the behavior of Array.to_numpy for primitive types.
- ListArray.from_numpy_ndarray(values, type):
  Works just like FixedSizeListArray.from_numpy_ndarray.
- ListArray.to_numpy_ndarray(self):
  Works like FixedSizeListArray.to_numpy_ndarray, with an additional check that all lists are of equal length. If any differ, raises an error.

The same goes for LargeListArray as for ListArray, bringing the total to 6.
The FixedSizeListArray methods already have an implementation in the FixedShapeTensor extension type. Those implementations are actually a bit more complicated because of tensors' support for permutations:
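arrow/python/pyarrow/array.pxi, line 3149 in 95c33d8: def to_numpy_ndarray(self):
arrow/python/pyarrow/array.pxi, line 3164 in 95c33d8: def from_numpy_ndarray(obj):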
Component(s)
Python