Plan {
version: Some(
Version {
major_number: 0,
minor_number: 31,
patch_number: 0,
git_hash: "",
producer: "datafusion",
},
),
extension_uris: [],
extensions: [
SimpleExtensionDeclaration {
mapping_type: Some(
ExtensionFunction(
ExtensionFunction {
extension_uri_reference: 4294967295,
function_anchor: 0,
name: "not_equal",
},
),
),
},
],
relations: [
PlanRel {
rel_type: Some(
Root(
RelRoot {
input: Some(
Rel {
rel_type: Some(
Project(
ProjectRel {
common: None,
input: Some(
Rel {
rel_type: Some(
Join(
JoinRel {
common: None,
left: Some(
Rel {
rel_type: Some(
Read(
ReadRel {
common: None,
base_schema: Some(
NamedStruct {
names: [
"a",
"b",
"c",
"d",
"e",
"f",
],
r#struct: None,
},
),
filter: None,
best_effort_filter: None,
projection: Some(
MaskExpression {
select: Some(
StructSelect {
struct_items: [
StructItem {
field: 0,
child: None,
},
StructItem {
field: 4,
child: None,
},
],
},
),
maintain_singular_struct: false,
},
),
advanced_extension: None,
read_type: Some(
NamedTable(
NamedTable {
names: [
"data",
],
advanced_extension: None,
},
),
),
},
),
),
},
),
right: Some(
Rel {
rel_type: Some(
Read(
ReadRel {
common: None,
base_schema: Some(
NamedStruct {
names: [
"a",
"b",
"c",
"d",
"e",
"f",
],
r#struct: None,
},
),
filter: None,
best_effort_filter: None,
projection: Some(
MaskExpression {
select: Some(
StructSelect {
struct_items: [
StructItem {
field: 4,
child: None,
},
],
},
),
maintain_singular_struct: false,
},
),
advanced_extension: None,
read_type: Some(
NamedTable(
NamedTable {
names: [
"data",
],
advanced_extension: None,
},
),
),
},
),
),
},
),
expression: None,
post_join_filter: Some(
Expression {
rex_type: Some(
ScalarFunction(
ScalarFunction {
function_reference: 0,
arguments: [
FunctionArgument {
arg_type: Some(
Value(
Expression {
rex_type: Some(
Selection(
FieldReference {
reference_type: Some(
DirectReference(
ReferenceSegment {
reference_type: Some(
StructField(
StructField {
field: 1,
child: None,
},
),
),
},
),
),
root_type: None,
},
),
),
},
),
),
},
FunctionArgument {
arg_type: Some(
Value(
Expression {
rex_type: Some(
Selection(
FieldReference {
reference_type: Some(
DirectReference(
ReferenceSegment {
reference_type: Some(
StructField(
StructField {
field: 2,
child: None,
},
),
),
},
),
),
root_type: None,
},
),
),
},
),
),
},
],
options: [],
output_type: None,
args: [],
},
),
),
},
),
r#type: Inner,
advanced_extension: None,
},
),
),
},
),
expressions: [
Expression {
rex_type: Some(
Selection(
FieldReference {
reference_type: Some(
DirectReference(
ReferenceSegment {
reference_type: Some(
StructField(
StructField {
field: 0,
child: None,
},
),
),
},
),
),
root_type: None,
},
),
),
},
],
advanced_extension: None,
},
),
),
},
),
names: [
"d1.a",
],
},
),
),
},
],
advanced_extensions: None,
expected_type_urls: [],
}
Preserve aliases in Substrait.
Is your feature request related to a problem or challenge?
If there is a SubqueryAlias relation, datafusion-substrait will bypass it. This works for the producer: the generated Substrait plans are correct. However, the DF plan generated by the consumer will be incorrect, since it has no way to distinguish between the different relations that read from the same table.
This can be demonstrated in the following example:
The original DF plan is:
Projection: d1.a
  Inner Join: Filter: d1.e != d2.e
    SubqueryAlias: d1
      TableScan: data projection=[a, e]
    SubqueryAlias: d2
      TableScan: data projection=[e]
Once this plan is fed through the producer, we get the correct Substrait plan:
(The full Plan dump is reproduced at the top of this issue.)
However, if we want to get back a DF plan and use the consumer, we'll get:
Notice that because there is no way for DF to distinguish between the left data table and the right data table, DF thinks they are from the same TableScan relation. Thus, the output DF plan is incorrect.
Describe the solution you'd like
Preserve aliases in Substrait.
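To illustrate why the alias has to survive the round trip, here is a minimal, self-contained sketch. It is a toy model only: the `Plan` enum and the helpers `qualified_columns`, `strip_aliases`, and `ambiguous` are hypothetical names made up for this example, not part of DataFusion or Substrait. `strip_aliases` stands in for a producer that bypasses alias nodes, and `ambiguous` shows what a name-based consumer is then left with.

```rust
use std::collections::HashSet;

// Toy logical plan. These types are hypothetical and are NOT the
// DataFusion or Substrait API.
#[derive(Debug, Clone, PartialEq)]
enum Plan {
    Scan { table: String, columns: Vec<String> },
    Alias { name: String, input: Box<Plan> },
    Join { left: Box<Plan>, right: Box<Plan> },
}

// Output columns of a plan, qualified by the outermost alias,
// or by the table name when no alias is present.
fn qualified_columns(plan: &Plan) -> Vec<String> {
    match plan {
        Plan::Scan { table, columns } => {
            columns.iter().map(|c| format!("{table}.{c}")).collect()
        }
        Plan::Alias { name, input } => qualified_columns(input)
            .into_iter()
            .map(|q| {
                // Replace the existing qualifier with the alias.
                let column = q.rsplit('.').next().unwrap_or(q.as_str());
                format!("{name}.{column}")
            })
            .collect(),
        Plan::Join { left, right } => {
            let mut out = qualified_columns(left);
            out.extend(qualified_columns(right));
            out
        }
    }
}

// Stand-in for a producer that bypasses alias nodes.
fn strip_aliases(plan: &Plan) -> Plan {
    match plan {
        Plan::Scan { .. } => plan.clone(),
        Plan::Alias { input, .. } => strip_aliases(input),
        Plan::Join { left, right } => Plan::Join {
            left: Box::new(strip_aliases(left)),
            right: Box::new(strip_aliases(right)),
        },
    }
}

// Qualified names that occur more than once and so cannot be resolved by name.
fn ambiguous(columns: &[String]) -> Vec<String> {
    let mut seen = HashSet::new();
    let mut dups: Vec<String> = Vec::new();
    for c in columns {
        if !seen.insert(c.as_str()) && !dups.contains(c) {
            dups.push(c.clone());
        }
    }
    dups
}

fn scan(columns: &[&str]) -> Plan {
    Plan::Scan {
        table: "data".to_string(),
        columns: columns.iter().map(|c| c.to_string()).collect(),
    }
}

fn alias(name: &str, input: Plan) -> Plan {
    Plan::Alias { name: name.to_string(), input: Box::new(input) }
}

// The join from this issue: d1 = data[a, e], d2 = data[e].
fn example_plan() -> Plan {
    Plan::Join {
        left: Box::new(alias("d1", scan(&["a", "e"]))),
        right: Box::new(alias("d2", scan(&["e"]))),
    }
}

fn main() {
    let original = example_plan();
    let stripped = strip_aliases(&original);
    println!("with aliases:    {:?}", qualified_columns(&original));
    println!("without aliases: {:?}", qualified_columns(&stripped));
    println!("ambiguous:       {:?}", ambiguous(&qualified_columns(&stripped)));
}
```

With the aliases in place, every column has a unique qualified name. Once they are stripped, both sides of the join expose a column qualified only by the table name, which is the ambiguity described above.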
Describe alternatives you've considered
N/A
Additional context
Additional example: