Is your feature request related to a problem or challenge?
As part of making DataFusion even more customizable (#8045), it is valuable to let system designers mix and match different packages of functions to get the precise behavior they want (e.g. Postgres-style to_date or Spark-style to_date).
To support this functionality, as well as to ensure the ScalarUDF API exposes the full power of DataFusion, we are in the process of extracting the "built in" functions out of the core and into separate crates.
This epic tracks the work to actually move the functions out of the core datafusion crate (spread through datafusion_expr and datafusion-physical-expr) and into the new datafusion-functions / datafusion-functions-array crates.
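The mix-and-match idea described above can be sketched with a toy registry. This is only an illustration under stated assumptions, not DataFusion's actual API: `ToyRegistry`, `ScalarFn`, `postgres_package`, `spark_package`, `build_registry`, and the function bodies are all hypothetical, and real functions would be ScalarUDFs operating on Arrow arrays rather than plain strings.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a scalar function: takes a string, returns a string.
type ScalarFn = fn(&str) -> String;

/// Toy registry mapping function names to implementations.
/// Assumption: this only mimics the idea of a function registry.
#[derive(Default)]
struct ToyRegistry {
    functions: HashMap<String, ScalarFn>,
}

impl ToyRegistry {
    /// Register every function in a package; a later package overrides
    /// any earlier function that has the same name.
    fn register_package(&mut self, package: &[(&str, ScalarFn)]) {
        for (name, func) in package {
            self.functions.insert((*name).to_string(), *func);
        }
    }

    /// Look up a function by name and apply it, if it is registered.
    fn call(&self, name: &str, arg: &str) -> Option<String> {
        self.functions.get(name).map(|f| f(arg))
    }
}

// Two hypothetical implementations of `to_date` with different behavior.
fn postgres_style_to_date(input: &str) -> String {
    format!("postgres:{input}")
}

fn spark_style_to_date(input: &str) -> String {
    format!("spark:{input}")
}

fn upper(input: &str) -> String {
    input.to_uppercase()
}

/// Hypothetical package providing `to_date` and `upper`.
fn postgres_package() -> Vec<(&'static str, ScalarFn)> {
    vec![
        ("to_date", postgres_style_to_date as ScalarFn),
        ("upper", upper as ScalarFn),
    ]
}

/// Hypothetical package providing only a different `to_date`.
fn spark_package() -> Vec<(&'static str, ScalarFn)> {
    vec![("to_date", spark_style_to_date as ScalarFn)]
}

/// A system designer picks the packages and the order to register them in.
fn build_registry() -> ToyRegistry {
    let mut registry = ToyRegistry::default();
    registry.register_package(&postgres_package());
    registry.register_package(&spark_package());
    registry
}
```

In this sketch, `build_registry().call("to_date", "2024-01-01")` dispatches to the Spark-style implementation because that package was registered last, while `upper` still comes from the first package. Keeping each package in its own crate is what would let a designer choose which ones to register at all.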
Tasks:
Here is a list of many of the items necessary to complete this transition. Eventually there should be tickets for all tasks, and there are tickets for some already, but I don't want to make 100s of tickets all at once. I plan to make more as we make it through more of this project.
Anyone should feel free to make other tickets if they want to help with items below.
math_expressions
These should be located in the datafusion-functions crate (source link)
Code location: https://github.com/apache/arrow-datafusion/blob/main/datafusion/functions/src/math/mod.rs
Move nullif and isnan to datafusion-functions #9216
Move abs to datafusion_functions #9286
refactor: move acos() to function crate #9297
Abs, Asin,
Atan, Atan2, Acosh, Asinh, Atanh,
Cbrt, Ceil, Cos, Cosh, Degrees
Exp, Factorial Move ceil, exp, factorial to datafusion-functions crate #9939
Floor, Gcd, Lcm, Ln, Log, Log10, Log2, Pi, Power,
Radians, Signum, Sin, Sinh, Sqrt,
Tan, Tanh, Trunc, Cot, Round, iszero
array_expressions
Note that, given their size and specialization, these functions are put in their own subcrate, datafusion-functions-array.
ArrayToString Create datafusion-functions-array crate and move ArrayToString function into it #9113
ArrayDims, ArrayNdims, Cardinality move ArrayDims, ArrayNdims and Cardinality to datafusion-function-crate #9425
ArrayHas, ArrayHasAll, ArrayHasAny Port ArrayHas family to functions-array #9496
MakeArray, ArrayAppend, ArrayPrepend, ArrayConcat move make_array array_append array_prepend array_concat function to datafusion-functions-array crate #9343 Move make_array to datafusion-functions #9288 move make_array array_append array_prepend array_concat function to datafusion-functions-array crate #9504
Range, GenSeries port range function and change gen_series logic #9352
ArrayEmpty, ArrayLength port array_empty and array_length to datafusion-function-array crate #9510
Flatten port flatten to datafusion-function-array #9523
StringToArray Port StringToArray to function-arrays #9497
ArraySort Port ArraySort to function-arrays subcrate #9551
ArrayDistinct Port ArrayDistinct to functions-array subcrate #9549
ArrayRepeat Port ArrayRepeat to functions-array subcrate #9565
ArrayResize Port ArrayResize to functions-array subcrate #9570
ArrayElement, ArraySlice, ArrayPopFront, ArrayPopBack Port ArrayElem/Slice/PopFront/Back into functions-array #9615
ArrayPosition, ArrayPositions Port ArrayPosition and ArrayPositions to functions-array subcrate #9617
ArrayReverse Add array_reverse function to datafusion-function-* crate #9630
ArrayIntersect, ArrayUnion Port Array Union and Intersect to functions-array #9629
ArrayExcept Port ArrayExcept to functions-array subcrate #9634
ArrayRemove, ArrayRemoveN, ArrayRemoveAll Port ArrayRemove, ArrayRemoveN, ArrayRemoveAll to functions-array subcrate #9635
ArrayReplace, ArrayReplaceN, ArrayReplaceAll move array_replace family functions to datafusion-function-array crate #9651
MakeArray: construct an array from columns (union/except depends on this)
Move datafusion_array_function specific rewrite rules like to datafusion_functions_array crate #9519
Core functions
These should be located in the datafusion-functions crate (source link)
Code location: https://github.com/apache/arrow-datafusion/blob/main/datafusion/functions/src/core/mod.rs
Create core module, extract nullif: Move nullif and isnan to datafusion-functions #9216
arrow_cast to datafusion-functions crate #9287
ArrowTypeOf: return the arrow type of a value Port arrow_typeof to datafusion-function #9524
Coalesce: return the first non-null value
Struct: Create a struct
NullIf: return null if the two values are equal
Random: return a random number
Nanvl: return the first non-NaN value
crypto_expressions
These should be located in the datafusion-functions crate (source link)
Code location: https://github.com/apache/arrow-datafusion/blob/main/datafusion/functions/src/crypto/mod.rs
Create crypto module in datafusion/functions/src/crypto and crypto_expressions feature flag, move digest function
string_expressions
These should be located in the datafusion-functions crate (source link)
Code location: https://github.com/apache/arrow-datafusion/blob/main/datafusion/functions/src/string/mod.rs
concat, concat_ws, ends_with, initcap Move take concat, concat_ws, ends_with, initcap, to datafusion-functions #9540
Create string module in datafusion/functions/src/string and string_expressions feature flag, move ascii function
ascii, bit_length, btrim, chr,
instr, lower, ltrim, octet_length,
repeat, replace, rtrim, split_part,
starts_with, to_hex, trim, upper,
levenshtein, uuid, overlay
unicode_expressions
These should be located in the datafusion-functions crate (source link)
Code location: https://github.com/apache/arrow-datafusion/blob/main/datafusion/functions/src/unicode/mod.rs
Create unicode module in datafusion/functions/src/unicode and unicode_expressions feature flag, move charlength function
CharLength,
Left, Lpad, Reverse, Right, Rpad,
Strpos, Substr,
Translate, SubstrIndex, FindInSet
regex_expressions
These should be located in the datafusion-functions crate (source link)
Code location: https://github.com/apache/arrow-datafusion/blob/main/datafusion/functions/src/regexp/mod.rs
datetime_expressions
These should be located in the datafusion-functions crate (source link)
Code location: https://github.com/apache/arrow-datafusion/blob/main/datafusion/functions/src/datetime/mod.rs
Create datetime module in datafusion/functions/src/datetime and datetime_expressions feature flag, move date_part
Move the to_timestamp* functions to datafusion-functions #9291
port benchmarks to datafusion-functions crate
date_part, date_trunc, date_bin,
to_timestamp, to_timestamp_millis, to_timestamp_micros, to_timestamp_nanos, to_timestamp_seconds,
from_unixtime, now, current_date, current_time
Infrastructure
Describe alternatives you've considered
No response
Additional context
The organization was discussed in #9100