Is your feature request related to a problem or challenge?
Related to the discussion on #11192 with @Xuanwo
RisingWave has a library that automatically generates vectorized implementations of functions (e.g. ones that operate on Arrow arrays) from scalar implementations.
The library is here: https://github.com/risingwavelabs/arrow-udf
A blog post describing it is here: https://risingwave.com/blog/simplifying-sql-function-implementation-with-rust-procedural-macro/
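To make the idea concrete, here is a minimal sketch of what "vectorized from scalar" means. It does not use arrow or arrow-udf: a slice of `Option<&str>` stands in for a nullable string array, and `vectorize_binary` and `starts_with` are hypothetical names invented for this illustration. The point is only that the scalar function knows nothing about columns or nulls, and the lifting code is written once.

```rust
/// Scalar implementation: operates on single values and knows nothing
/// about columns or nulls.
fn starts_with(haystack: &str, prefix: &str) -> bool {
    haystack.starts_with(prefix)
}

/// Lift a binary scalar function into a column-at-a-time function.
/// `&[Option<&str>]` is an assumed stand-in for a nullable string array.
/// A null on either side produces a null in the output.
fn vectorize_binary<F>(
    left: &[Option<&str>],
    right: &[Option<&str>],
    f: F,
) -> Result<Vec<Option<bool>>, String>
where
    F: Fn(&str, &str) -> bool,
{
    if left.len() != right.len() {
        return Err(format!(
            "length mismatch: {} vs {}",
            left.len(),
            right.len()
        ));
    }
    Ok(left
        .iter()
        .zip(right.iter())
        .map(|(l, r)| match (l, r) {
            (Some(l), Some(r)) => Some(f(l, r)),
            _ => None,
        })
        .collect())
}
```

Calling `vectorize_binary(&left, &right, starts_with)` applies the scalar function row by row. A macro-based approach would generate code of roughly this shape for each function signature instead of requiring it to be written by hand.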
DataFusion uses macros to do something similar in binary.rs, but they are pretty hard to read and understand in my opinion:
    macro_rules! compute_utf8_op {
        ($LEFT:expr, $RIGHT:expr, $OP:ident, $DT:ident) => {{
            let ll = $LEFT
                .as_any()
                .downcast_ref::<$DT>()
                .expect("compute_op failed to downcast left side array");
            let rr = $RIGHT
                .as_any()
                .downcast_ref::<$DT>()
                .expect("compute_op failed to downcast right side array");
            Ok(Arc::new(paste::expr! {[<$OP _utf8>]}(&ll, &rr)?))
        }};
    }
One main benefit I can see to switching to https://github.com/risingwavelabs/arrow-udf is that we could then extend arrow-udf to support Dictionary and StringView and maybe other types to generate fast kernels for multiple different array layouts.
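As a rough illustration of what "kernels for multiple array layouts" could mean, the sketch below runs one scalar function over both a plain and a dictionary-encoded string column. Everything here is hypothetical and uses only the standard library: `StringColumn`, `PlainStrings`, `DictStrings`, and `apply_binary` are names made up for this example. They are not arrow or arrow-udf types, and this is not a claim about how arrow-udf would actually implement such support.

```rust
/// Hypothetical abstraction over "a column of nullable strings".
trait StringColumn {
    fn len(&self) -> usize;
    fn get(&self, i: usize) -> Option<&str>;
}

/// Plain layout: one optional string per row.
struct PlainStrings(Vec<Option<String>>);

/// Dictionary layout: each row stores a key into a table of distinct values.
struct DictStrings {
    keys: Vec<Option<usize>>,
    values: Vec<String>,
}

impl StringColumn for PlainStrings {
    fn len(&self) -> usize {
        self.0.len()
    }
    fn get(&self, i: usize) -> Option<&str> {
        self.0[i].as_deref()
    }
}

impl StringColumn for DictStrings {
    fn len(&self) -> usize {
        self.keys.len()
    }
    fn get(&self, i: usize) -> Option<&str> {
        // Resolve the row's key to the shared dictionary value.
        self.keys[i].map(|k| self.values[k].as_str())
    }
}

/// One generic kernel; the compiler monomorphizes it for each
/// combination of layouts it is called with.
fn apply_binary<L, R, F>(left: &L, right: &R, f: F) -> Vec<Option<bool>>
where
    L: StringColumn,
    R: StringColumn,
    F: Fn(&str, &str) -> bool,
{
    assert_eq!(left.len(), right.len(), "columns must have equal length");
    (0..left.len())
        .map(|i| match (left.get(i), right.get(i)) {
            (Some(l), Some(r)) => Some(f(l, r)),
            _ => None,
        })
        .collect()
}
```

With this shape, `apply_binary(&plain, &dict, |l, r| l == r)` compares a plain column against a dictionary column without first materializing the dictionary. A generated kernel could do the same per layout, which is the kind of extension described above.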
Describe the solution you'd like
I think it would be great if someone could evaluate the feasibility of using the macros in https://github.com/risingwavelabs/arrow-udf to implement DataFusion's operations (and maybe eventually functions, etc.).
Describe alternatives you've considered
I suggest a POC that picks one or two functions (maybe string equality or regexp_match) and tries to use arrow-udf's function macro instead.
Here is an example of how to use it: https://docs.rs/arrow-udf/0.3.0/arrow_udf/
Additional context
The macro quoted above is from datafusion/physical-expr/src/expressions/binary.rs (lines 118 to 130 at commit 7a23ea9).