While working on #16589 (comment) we came to the realization that there are now two paths of casting / adaptation logic:

- `SchemaAdapter`, which now supports nested structs as of #16371 (Add nested struct casting support and integrate into SchemaAdapter)
- The `Cast` expr (i.e. `select 1::text` in SQL, or implicit casts), which uses the arrow cast kernel, which does not support nested structs and such

It would be good to unify these.

There was discussion of this very point in apache/arrow-rs#7176, and one thing that came up was to have arrow develop some sort of `SchemaAdapter` for itself.

One of the important issues to consider here in terms of performance, and maybe something to have a broader discussion on, is that one of the advantages of `SchemaAdapter` is that it can pre-compute the work to be done and then avoid any sort of introspection in the hot path. This is not possible with a `PhysicalExpr` today, since it has no step in which to do that preparation ahead of evaluation.

Thus I would like to propose the following rough course of action:

1. Unify the code paths. This can be something as naive as dynamically building a `SchemaAdapter` each time a `Cast` `PhysicalExpr` gets called, or something like refactoring the code to be shared.
2. Think about some sort of `PhysicalExpr::optimize(inputs)` that can, in this case, pre-compute the needed casts and build efficient data structures to apply them in a loop. I think this could benefit many other expressions as well that need to do prep work for each execution.
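To make the "pre-compute once, apply in a loop" idea concrete, here is a minimal sketch assuming a toy type system. `DataType`, `Value`, `CastPlan` and `CastExprSketch` are hypothetical names invented for illustration; they are not the arrow-rs or DataFusion APIs. `CastPlan::try_new` does all the type inspection up front (matching struct fields by name, recursing into nested structs, picking a per-field cast), and `CastExprSketch::optimize` stands in for the proposed `PhysicalExpr::optimize(inputs)` hook, so a `Cast` expression could share the same struct-aware logic a `SchemaAdapter` uses while `evaluate` only walks the prebuilt plan.

```rust
// Hypothetical, simplified stand-ins for Arrow types/arrays (NOT arrow-rs or DataFusion APIs).
#[derive(Debug, Clone, PartialEq)]
enum DataType {
    Int32,
    Utf8,
    Struct(Vec<(String, DataType)>),
}

#[derive(Debug, Clone, PartialEq)]
enum Value {
    Int32(i32),
    Utf8(String),
    Struct(Vec<Value>),
    Null,
}

/// A cast resolved ahead of time: every decision that depends only on the
/// source and target types is made once, when the plan is built.
#[derive(Debug)]
enum CastPlan {
    Identity,
    Int32ToUtf8,
    /// Per target field: (source field index, nested plan), or None to fill with nulls.
    Struct(Vec<Option<(usize, CastPlan)>>),
}

impl CastPlan {
    fn try_new(from: &DataType, to: &DataType) -> Result<CastPlan, String> {
        use DataType::*;
        match (from, to) {
            (a, b) if a == b => Ok(CastPlan::Identity),
            (Int32, Utf8) => Ok(CastPlan::Int32ToUtf8),
            (Struct(src), Struct(dst)) => {
                // Resolve each target field by name once, recursing into nested structs.
                let fields = dst
                    .iter()
                    .map(|(name, ty)| match src.iter().position(|(n, _)| n == name) {
                        Some(i) => CastPlan::try_new(&src[i].1, ty).map(|p| Some((i, p))),
                        None => Ok(None),
                    })
                    .collect::<Result<Vec<_>, String>>()?;
                Ok(CastPlan::Struct(fields))
            }
            _ => Err(format!("unsupported cast {from:?} -> {to:?}")),
        }
    }

    /// Hot path: follows the precomputed plan without re-inspecting types.
    fn apply(&self, v: &Value) -> Value {
        match (self, v) {
            (_, Value::Null) => Value::Null,
            (CastPlan::Identity, v) => v.clone(),
            (CastPlan::Int32ToUtf8, Value::Int32(x)) => Value::Utf8(x.to_string()),
            (CastPlan::Struct(fields), Value::Struct(children)) => {
                let cast = |f: &Option<(usize, CastPlan)>| match f {
                    Some((i, plan)) => plan.apply(&children[*i]),
                    None => Value::Null,
                };
                Value::Struct(fields.iter().map(cast).collect())
            }
            _ => panic!("value does not match the type the plan was built for"),
        }
    }
}

/// Hypothetical `Cast` expression: `optimize` plays the role of the proposed
/// `PhysicalExpr::optimize(inputs)` hook, which is not an existing DataFusion method.
struct CastExprSketch {
    target: DataType,
    plan: Option<CastPlan>,
}

impl CastExprSketch {
    fn optimize(&mut self, input_type: &DataType) -> Result<(), String> {
        self.plan = Some(CastPlan::try_new(input_type, &self.target)?);
        Ok(())
    }

    fn evaluate(&self, batch: &[Value]) -> Vec<Value> {
        let plan = self.plan.as_ref().expect("optimize must run before evaluate");
        batch.iter().map(|v| plan.apply(v)).collect()
    }
}

fn main() {
    let field = |n: &str, t: DataType| (n.to_string(), t);
    let src = DataType::Struct(vec![field("a", DataType::Int32), field("b", DataType::Int32)]);
    let dst = DataType::Struct(vec![
        field("b", DataType::Utf8),
        field("a", DataType::Int32),
        field("c", DataType::Utf8), // missing in `src`: filled with nulls
    ]);
    let mut expr = CastExprSketch { target: dst, plan: None };
    expr.optimize(&src).unwrap(); // all type inspection happens once, here
    let batch = [Value::Struct(vec![Value::Int32(1), Value::Int32(2)]), Value::Null];
    println!("{:?}", expr.evaluate(&batch)); // [Struct([Utf8("2"), Int32(1), Null]), Null]
}
```

Here field `b` is cast to a string, `a` passes through unchanged, and `c`, which the source lacks, is filled with nulls. A real implementation would operate column-wise on Arrow arrays rather than per value; the per-value `Value` enum is only a simplification to keep the plan/apply split visible.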