Ambiguous reference error for named columns #7790

@Blajda

Describe the bug

I'm using the DataFrame API to perform a join. Building the join itself works, but adding an additional column to the joined DataFrame with `with_column` fails. This is the logical plan of the join:

DataFrame {
    session_state: SessionState {
        session_id: "56e65554-2665-46a7-8f3f-6839b25e542c",
    },
    plan: Full Join:  Filter: target.id = source.id
      Projection: source.id, source.value, source.modified, Boolean(true) AS __delta_rs_source
        TableScan: source
      Projection: target.id, target.value, target.modified, Boolean(true) AS __delta_rs_target
        TableScan: target,
}

Adding the column fails with the following error:

called `Result::unwrap()` on an `Err` value: Generic("Schema error: Ambiguous reference to unqualified field id")
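
To illustrate what the error message describes, here is a toy model of unqualified-name lookup against a joined schema. It is only a sketch: `QualifiedField`, `ResolveError`, `resolve_unqualified`, and `joined_schema` are hypothetical names invented for this example, not DataFusion's actual types or implementation. It assumes the joined schema holds both `source.id` and `target.id`, as the logical plan above suggests.

```rust
// Toy model of name resolution; NOT DataFusion's real code.
// All type and function names here are hypothetical.

#[derive(Debug, Clone, PartialEq)]
struct QualifiedField {
    /// Table qualifier such as "source" or "target"; None for aliased literals.
    qualifier: Option<&'static str>,
    name: &'static str,
}

#[derive(Debug, PartialEq)]
enum ResolveError {
    NotFound(String),
    Ambiguous(String),
}

/// Look up a column by its bare name, ignoring the qualifier.
/// Fails when more than one field in the schema carries that name.
fn resolve_unqualified<'a>(
    schema: &'a [QualifiedField],
    name: &str,
) -> Result<&'a QualifiedField, ResolveError> {
    let mut matches = schema.iter().filter(|f| f.name == name);
    match (matches.next(), matches.next()) {
        (None, _) => Err(ResolveError::NotFound(name.to_string())),
        (Some(field), None) => Ok(field),
        (Some(_), Some(_)) => Err(ResolveError::Ambiguous(name.to_string())),
    }
}

/// Schema shaped like the output of the full join in the plan above.
fn joined_schema() -> Vec<QualifiedField> {
    let mut fields = Vec::new();
    for qualifier in ["source", "target"] {
        for name in ["id", "value", "modified"] {
            fields.push(QualifiedField { qualifier: Some(qualifier), name });
        }
    }
    fields.push(QualifiedField { qualifier: None, name: "__delta_rs_source" });
    fields.push(QualifiedField { qualifier: None, name: "__delta_rs_target" });
    fields
}

fn main() {
    let schema = joined_schema();
    // "id" exists under both qualifiers, so a bare lookup cannot pick one.
    println!("{:?}", resolve_unqualified(&schema, "id"));
    // A name that occurs only once resolves without a qualifier.
    println!("{:?}", resolve_unqualified(&schema, "__delta_rs_source"));
}
```

In this model a new column such as `test123` does not collide with anything, so an error of this kind would have to come from re-resolving the existing `id` columns by bare name. Whether that is what happens inside `with_column` is an assumption, not something this report confirms.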

To Reproduce

Original code that caused this issue is here: https://github.com/Blajda/delta-rs/blob/merge-logical/rust/src/operations/merge.rs#L649

Code that reproduces the issue:

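// Imports assumed for a standalone reproduction; the original snippet omits them,
// and the exact paths may differ between DataFusion versions.
use std::sync::Arc;

use arrow::datatypes::{DataType, Field, Schema as ArrowSchema};
use arrow::record_batch::RecordBatch;
use datafusion::datasource::provider_as_source;
use datafusion::logical_expr::LogicalPlanBuilder;
use datafusion::prelude::{col, lit, DataFrame, SessionContext};
use datafusion_common::TableReference;

// The statements below must run inside an async function (they use `.await`).
let schema = Arc::new(ArrowSchema::new(vec![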
    Field::new("id", DataType::Utf8, true),
    Field::new("value", DataType::Int32, true),
    Field::new("modified", DataType::Utf8, true),
]));

let ctx = SessionContext::new();
let batch = RecordBatch::try_new(
    Arc::clone(&schema),
    vec![
        Arc::new(arrow::array::StringArray::from(vec!["B", "C", "X"])),
        Arc::new(arrow::array::Int32Array::from(vec![10, 20, 30])),
        Arc::new(arrow::array::StringArray::from(vec![
            "2021-02-02",
            "2023-07-04",
            "2023-07-04",
        ])),
    ],
)
.unwrap();
let source = ctx.read_batch(batch).unwrap();

let batch = RecordBatch::try_new(
    Arc::clone(&schema),
    vec![
        Arc::new(arrow::array::StringArray::from(vec!["B", "D", "X"])),
        Arc::new(arrow::array::Int32Array::from(vec![10, 20, 30])),
        Arc::new(arrow::array::StringArray::from(vec![
            "2021-02-02",
            "2023-07-04",
            "2023-07-04",
        ])),
    ],
)
.unwrap();
let target = ctx.read_batch(batch).unwrap();

let source_name = TableReference::bare("source");
let source =
    LogicalPlanBuilder::scan(source_name, provider_as_source(source.into_view()), None)
        .unwrap()
        .build()
        .unwrap();
let source = DataFrame::new(ctx.state(), source);

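// Named "target" so that it matches `TableScan: target` in the plan above
// and the `target.id` reference in the join filter.
let target_name = TableReference::bare("target");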
let target =
    LogicalPlanBuilder::scan(target_name, provider_as_source(target.into_view()), None)
        .unwrap()
        .build()
        .unwrap();
let target = DataFrame::new(ctx.state(), target);

let join = source
    .join(
        target,
        datafusion_common::JoinType::Full,
        &[],
        &[],
        Some(col("source.id").eq(col("target.id"))),
    )
    .unwrap();
let proj = join.with_column("test123", lit(true)).unwrap();
proj.show().await.unwrap();

Expected behavior

I should be able to add a new, uniquely named column to this DataFrame.

Additional context

No response
