Conversation
Signed-off-by: Nicholas Gates <nick@nickgates.com>
| let coerced_dtypes = if node | ||
| .as_opt::<Binary>() | ||
| .is_some_and(|operator| operator.is_comparison()) | ||
| { | ||
| match comparison_literal_coercions(&node, &child_dtypes) { | ||
| Some(coerced_dtypes) => coerced_dtypes, | ||
| None => node.scalar_fn().coerce_args(&child_dtypes)?, | ||
| } | ||
| } else { | ||
| node.scalar_fn().coerce_args(&child_dtypes)? | ||
| }; |
There was a problem hiding this comment.
unrelated but can we use this in DF to deal with Decimal casting?
| }) | ||
| } | ||
|
|
||
| /// Plan an expression against a Vortex dtype. |
There was a problem hiding this comment.
Worth explaining what "planning" is.
| /// The [`DType`] returned by the scan, after applying the projection. | ||
| pub fn dtype(&self) -> VortexResult<DType> { | ||
| self.projection.return_dtype(self.layout_reader.dtype()) | ||
| plan_expression(self.projection.clone(), self.layout_reader.dtype())? |
There was a problem hiding this comment.
Ok after reading through all this PR, isn't this a significant behavior change? We used to talk about Vortex expressions as very strict and "physical", but now we implicitly cast and adapt them
There was a problem hiding this comment.
Hmmm, we do need some way of planning the query and executing expressions that don't necessarily match. Vortex expressions shouldn't do anything implicitly and instead we have this planning stage that inserts all the additional operations. However, there's still a limit on the autocoercions you'd want to do and we should limit them to only the nonfallible ones
There was a problem hiding this comment.
Using Vortex directly from Python feels much more like the front-end of a query engine.
Honestly, I'm not sure where to draw the line.
If you use Vortex from inside a query engine, you have almost certainly already done physical planning. If you use Vortex directly, you probably haven't. It's a shame that Arrow Datasets also don't do physical planning, else we could use that as our "engine interface".
There was a problem hiding this comment.
I just wanted to point out that this is a significant conceptual change which is bigger than what the title hints at.
One way to draw the line is to have a logical expression layer to handle this, which as far as I can tell is what pyarrow does, and is "bound" to a specific schema during execution (Like during dataset.to_table(..))
| Use `filter_policy="pushdown"` to raise when a PyArrow expression cannot be pushed into Vortex. Use | ||
| `filter_policy="fallback"` to read the rows and apply the PyArrow filter after the scan. |
| let lhs_literal = node.child(0).is::<Literal>(); | ||
| let rhs_literal = node.child(1).is::<Literal>(); | ||
| if lhs_literal == rhs_literal { | ||
| return None; | ||
| } | ||
|
|
||
| let literal_idx = usize::from(rhs_literal); | ||
| let context_idx = usize::from(lhs_literal); | ||
| let literal_dtype = &child_dtypes[literal_idx]; | ||
| let context_dtype = &child_dtypes[context_idx]; |
There was a problem hiding this comment.
nit: cute, match would have been more readable
Signed-off-by: Nicholas Gates <nick@nickgates.com>
Merging this PR will degrade performance by 18.85%
Performance Changes
Comparing Footnotes
|
Signed-off-by: Nicholas Gates <nick@nickgates.com>
Signed-off-by: Nicholas Gates <nick@nickgates.com>
No description provided.