Skip to content
Closed
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 1 addition & 1 deletion rust/datafusion/src/logical_plan/expr.rs
Original file line number Diff line number Diff line change
Expand Up @@ -49,7 +49,7 @@ use std::collections::HashSet;
/// # Ok(())
/// # }
/// ```
#[derive(Clone)]
#[derive(Clone, PartialEq)]
pub enum Expr {
/// An expression with a specific name.
Alias(Box<Expr>, String),
Expand Down
2 changes: 1 addition & 1 deletion rust/datafusion/src/physical_plan/functions.rs
Original file line number Diff line number Diff line change
Expand Up @@ -50,7 +50,7 @@ use fmt::{Debug, Formatter};
use std::{fmt, str::FromStr, sync::Arc};

/// A function's signature, which defines the function's supported argument types.
#[derive(Debug, Clone)]
#[derive(Debug, Clone, PartialEq)]
pub enum Signature {
/// arbitrary number of arguments of an common type out of a list of valid types
// A function such as `concat` is `Variadic(vec![DataType::Utf8, DataType::LargeUtf8])`
Expand Down
6 changes: 6 additions & 0 deletions rust/datafusion/src/physical_plan/udaf.rs
Original file line number Diff line number Diff line change
Expand Up @@ -64,6 +64,12 @@ impl Debug for AggregateUDF {
}
}

impl PartialEq for AggregateUDF {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am wondering, is this the PartialEq implementation we always want, or just for this purpose? Otherwise it might be better to create a normal function outside the PartialEq for it? Or do we need it now for a map/set somewhere?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wasn't sure if it was a suitable implementation, and Jorge's raised a good reason why it's not. See thread below.

The changes in this pull request depend heavily on determining equivalency of two expressions. For example, rebase_expr() and find_exprs_in_expr() use Vec::contains() (which requires the contained items implement the PartialEq trait).

fn eq(&self, other: &Self) -> bool {
self.name == other.name && self.signature == other.signature
}
}

impl AggregateUDF {
/// Create a new AggregateUDF
pub fn new(
Expand Down
6 changes: 6 additions & 0 deletions rust/datafusion/src/physical_plan/udf.rs
Original file line number Diff line number Diff line change
Expand Up @@ -56,6 +56,12 @@ impl Debug for ScalarUDF {
}
}

impl PartialEq for ScalarUDF {
fn eq(&self, other: &Self) -> bool {
self.name == other.name && self.signature == other.signature
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am afraid that this may not be sufficient: two UDFs may have the same name and signature and represent different semantics. In general UDFs are anonymous functions and thus do not have a partialEq. If we base equality on name and two UDFs are assigned the same name, there will be a semantic error in this equality.

Note that while this can't happen in SQL, because UDFs are registered via register_udf(name, ...), the DataFrame API supports using bypassing the context and calling UDFs without registering them on the context.

In the simple_udf.rs example, the line

let expr1 = pow.call(vec![col("a"), col("b")]);

demonstrates how to use a UDF without registering it on the context, in which case its name is only used when printing the plan.

Copy link
Copy Markdown
Contributor Author

@drusso drusso Dec 5, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for pointing this out, I was afraid there might be an issue like this.

I'm happy to back out the PartialEq derivation for Expr (as well as ScalarUDF and AggregateUDF) , and do the equivalency checks manually in SQL planner as needed.

Before I do that, I'm wondering if you think there is a more suitable implementation of PartialEq for ScalarUDF and AggregateUDF? Maybe their eq() implementations can:

  1. Compare the function pointers?
  2. Always return false?

I would advocate for some definition of equality for Expr, as, for example, I imagine an optimizer (outside the context of the SQL planner) should leverage equivalency of duplicated expressions. Of course equivalency doesn't need to be via PartialEq, but that would be a nice convenience.

Let me know what you think. Thanks!

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do not have a good answer here. Both options that you presented are reasonable. IMO we should just proceed with one, but let's double-check with @andygrove and @alamb ^_^

Copy link
Copy Markdown
Member

@jorgecarleitao jorgecarleitao Dec 6, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that if we are performing comparison of expressions, we should have partialEq and not a generic function, as otherwise we may end up with multiple variations of a eq, which is error-prone.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think to handle it in the general case we would have to require that the user defined function / aggregate itself define equality (perhaps with a default implementation that compares function pointers as suggested by @drusso ).

I think user defined functions in this kind of framework are also tagged with other properties (like if they have side effects, and thus can't be optimized away)

I personally suggest filing a ticket for this issue in the future -- it is kind of like the category of "more mature user defined function support".

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've filed ARROW-10963.

}
}

impl ScalarUDF {
/// Create a new ScalarUDF
pub fn new(
Expand Down
1 change: 1 addition & 0 deletions rust/datafusion/src/sql/mod.rs
Original file line number Diff line number Diff line change
Expand Up @@ -20,3 +20,4 @@

pub mod parser;
pub mod planner;
mod utils;
Loading