take schema from record batch when creating mem table if any #7637

Closed
matthewgapp wants to merge 1 commit into apache:main from matthewgapp:matt/fix/incorrect-schema-saved-in-create-table-ddl

@matthewgapp (Contributor) commented Sep 24, 2023

This is probably better resolved by embedding this information in the logical plan.

Which issue does this PR close?

Closes #7636

Rationale for this change

The schema saved by CREATE TABLE should be correct (i.e., it should capture nullable: false requirements on fields).

What changes are included in this PR?

Are these changes tested?

Are there any user-facing changes?

@github-actions bot added the core (Core DataFusion crate) label Sep 24, 2023
let physical = DataFrame::new(self.state(), input);

let batches: Vec<_> = physical.collect_partitioned().await?;
let schema = {
Contributor Author

This solves the bug, but it feels like input.schema() should already be correct. We shouldn't need to reach into the record batches to find out the true schema.
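
For readers skimming the thread, the workaround described here can be sketched roughly as follows. This is a minimal, std-only model and not the actual DataFusion change: `Field`, `Schema`, `RecordBatch`, and `schema_for_mem_table` are hypothetical stand-ins for the Arrow/DataFusion types, kept only detailed enough to show the "use the batch schema if any batch exists, otherwise the plan schema" fallback.

```rust
// Hypothetical, simplified stand-ins for Arrow's Field / Schema / RecordBatch.
#[derive(Debug, Clone, PartialEq)]
struct Field {
    name: String,
    nullable: bool,
}

#[derive(Debug, Clone, PartialEq)]
struct Schema {
    fields: Vec<Field>,
}

struct RecordBatch {
    schema: Schema,
}

/// Prefer the schema carried by the first collected batch (across all
/// partitions); fall back to the schema declared by the plan when no
/// batch was produced at all.
fn schema_for_mem_table(plan_schema: &Schema, partitions: &[Vec<RecordBatch>]) -> Schema {
    partitions
        .iter()
        .flatten() // walk the batches of every partition in order
        .next() // first batch, if any
        .map(|batch| batch.schema.clone())
        .unwrap_or_else(|| plan_schema.clone())
}
```

The fallback matters because an empty result has no batch to inspect, which is one reason this approach reads as a workaround and not as a fix for the declared schema itself.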

Contributor Author

I haven't investigated enough to know what is the "correct" fix.

Contributor

I agree input.schema() should be fixed (in the sense that the RecordBatches produced don't match the schema declared); that is the root cause of the issue.
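
One way to picture the mismatch described here is a check that compares the declared fields against the fields of a produced batch and reports where nullability differs. The sketch below is std-only and illustrative: `Field` and `nullability_mismatches` are hypothetical names, not DataFusion or Arrow APIs, and the comparison assumes both schemas list their fields in the same order.

```rust
// Hypothetical, simplified stand-in for Arrow's Field.
#[derive(Debug, Clone, PartialEq)]
struct Field {
    name: String,
    nullable: bool,
}

/// Return the names of fields that appear at the same position in both
/// schemas but disagree on nullability.
fn nullability_mismatches(declared: &[Field], produced: &[Field]) -> Vec<String> {
    declared
        .iter()
        .zip(produced.iter()) // pair fields positionally
        .filter(|(d, p)| d.name == p.name && d.nullable != p.nullable)
        .map(|(d, _)| d.name.clone())
        .collect()
}
```

A non-empty result would indicate the kind of plan-versus-execution disagreement discussed in this thread.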

Perhaps this is related to #7636 that you filed

Contributor

See also #7636 (comment)

Contributor Author

This was my initial and quick attempt at fixing #7636 before I dove deeper toward a more correct solution. I'm going to close this out since it should be replaced by #7638

Development

Successfully merging this pull request may close these issues.

CREATE TABLE DDL does not save correct schema, resulting in mismatched plan vs execution (record batch) schema

2 participants