InvalidArgumentError("Column 'COUNT(DISTINCT demo.name)[count distinct]' is declared as non-nullable but contains null values")'

**Describe the bug**
DataFusion panics when I run the query ``select app, count(distinct name) from `demo` group by app``.
Here is the stacktrace:
```
InvalidArgumentError("Column 'COUNT(DISTINCT demo.name)[count distinct]' is declared as non-nullable but contains null values")' at "/Users/michael/.cargo/git/checkouts/arrow-datafusion-b9eb4f789f8bda1f/d84ea9c/datafusion/core/src/physical_plan/repartition.rs:178"
   0: backtrace::backtrace::libunwind::trace
             at /Users/michael/.cargo/registry/src/github.com-1ecc6299db9ec823/backtrace-0.3.66/src/backtrace/mod.rs:66:5
      backtrace::backtrace::trace_unsynchronized
             at /Users/michael/.cargo/registry/src/github.com-1ecc6299db9ec823/backtrace-0.3.66/src/backtrace/mod.rs:66:5
      backtrace::backtrace::trace
             at /Users/michael/.cargo/registry/src/github.com-1ecc6299db9ec823/backtrace-0.3.66/src/backtrace/mod.rs:53:14
      backtrace::capture::Backtrace::create
             at /Users/michael/.cargo/registry/src/github.com-1ecc6299db9ec823/backtrace-0.3.66/src/capture.rs:176:9
      backtrace::capture::Backtrace::new
             at /Users/michael/.cargo/registry/src/github.com-1ecc6299db9ec823/backtrace-0.3.66/src/capture.rs:140:22
   1: common_util::panic::set_panic_hook::{{closure}}
             at common_util/src/panic.rs:41:18
   2: std::panicking::rust_panic_with_hook
             at /rustc/d394408fb38c4de61f765a3ed5189d2731a1da91/library/std/src/panicking.rs:702:17
   3: std::panicking::begin_panic_handler::{{closure}}
             at /rustc/d394408fb38c4de61f765a3ed5189d2731a1da91/library/std/src/panicking.rs:588:13
   4: std::sys_common::backtrace::__rust_end_short_backtrace
             at /rustc/d394408fb38c4de61f765a3ed5189d2731a1da91/library/std/src/sys_common/backtrace.rs:138:18
   5: rust_begin_unwind
             at /rustc/d394408fb38c4de61f765a3ed5189d2731a1da91/library/std/src/panicking.rs:584:5
   6: core::panicking::panic_fmt
             at /rustc/d394408fb38c4de61f765a3ed5189d2731a1da91/library/core/src/panicking.rs:142:14
   7: core::result::unwrap_failed
             at /rustc/d394408fb38c4de61f765a3ed5189d2731a1da91/library/core/src/result.rs:1814:5
   8: core::result::Result<T,E>::unwrap
             at /rustc/d394408fb38c4de61f765a3ed5189d2731a1da91/library/core/src/result.rs:1107:23
      datafusion::physical_plan::repartition::BatchPartitioner::partition
             at /Users/michael/.cargo/git/checkouts/arrow-datafusion-b9eb4f789f8bda1f/d84ea9c/datafusion/core/src/physical_plan/repartition.rs:178:33
   9: datafusion::physical_plan::repartition::RepartitionExec::pull_from_input::{{closure}}
             at /Users/michael/.cargo/git/checkouts/arrow-datafusion-b9eb4f789f8bda1f/d84ea9c/datafusion/core/src/physical_plan/repartition.rs:452:13
      <core::future::from_generator::GenFuture<T> as core::future::future::Future>::poll
             at /rustc/d394408fb38c4de61f765a3ed5189d2731a1da91/library/core/src/future/mod.rs:91:19

```
**To Reproduce**
1. Create a table like this:
```
CREATE TABLE `demo` (`app` string NULL,`name` string NULL, `value` double NOT NULL)
```
2. Insert data 
```
INSERT INTO demo(app, value) VALUES('app', 100)
```
3. Run a query like the following, with GROUP BY and COUNT(DISTINCT):
```
select `t`, count(distinct name) from demo group by `t`
```
**Expected behavior**
The query returns a result instead of panicking.

**Additional context**
I found this bug while using CeresDB: https://github.com/CeresDB/ceresdb/issues/302.  
If partition_num is set to more than 1, the error is the one shown above. If partition_num is set to 1, the error is the one reported in https://github.com/apache/arrow-datafusion/issues/1623.

Digging into the code, I found that the logical plan is:
```
 Projection: #demo.t, #COUNT(DISTINCT demo.name)
  Aggregate: groupBy=[[#demo.t]], aggr=[[COUNT(DISTINCT #demo.name)]]
    TableScan: demo projection=[t, name]
```

The physical plan is as follows. My guess is that the error is caused by the second AggregateExec's schema field "COUNT(DISTINCT demo.name)[count distinct]", which is declared as non-nullable (`nullable: false`) even though the column contains null values.
```
ProjectionExec {
    expr: [
        (
            Column {
                name: "t",
                index: 0,
            },
            "t",
        ),
        (
            Column {
                name: "COUNT(DISTINCT demo.name)",
                index: 1,
            },
            "COUNT(DISTINCT demo.name)",
        ),
    ],
    schema: Schema {
        fields: [
            Field {
                name: "t",
                data_type: Timestamp(
                    Millisecond,
                    None,
                ),
                nullable: false,
                dict_id: 0,
                dict_is_ordered: false,
                metadata: None,
            },
            Field {
                name: "COUNT(DISTINCT demo.name)",
                data_type: Int64,
                nullable: true,
                dict_id: 0,
                dict_is_ordered: false,
                metadata: None,
            },
        ],
        metadata: {},
    },
    input: AggregateExec {
        mode: FinalPartitioned,
        group_by: PhysicalGroupBy {
            expr: [
                (
                    Column {
                        name: "t",
                        index: 0,
                    },
                    "t",
                ),
            ],
            null_expr: [],
            groups: [
                [
                    false,
                ],
            ],
        },
        aggr_expr: [
            DistinctCount {
                name: "COUNT(DISTINCT demo.name)",
                data_type: Int64,
                state_data_types: [
                    Utf8,
                ],
                exprs: [
                    Column {
                        name: "name",
                        index: 1,
                    },
                ],
            },
        ],
        input: CoalesceBatchesExec {
            input: RepartitionExec {
                input: AggregateExec {
                    mode: Partial,
                    group_by: PhysicalGroupBy {
                        expr: [
                            (
                                Column {
                                    name: "t",
                                    index: 0,
                                },
                                "t",
                            ),
                        ],
                        null_expr: [],
                        groups: [
                            [
                                false,
                            ],
                        ],
                    },
                    aggr_expr: [
                        DistinctCount {
                            name: "COUNT(DISTINCT demo.name)",
                            data_type: Int64,
                            state_data_types: [
                                Utf8,
                            ],
                            exprs: [
                                Column {
                                    name: "name",
                                    index: 1,
                                },
                            ],
                        },
                    ],
                    input: ScanTable {
                        projected_schema: ProjectedSchema {
                            original_schema: Schema {
                                num_key_columns: 2,
                                timestamp_index: 0,
                                tsid_index: Some(
                                    1,
                                ),
                                enable_tsid_primary_key: true,
                                column_schemas: ColumnSchemas {
                                    columns: [
                                        ColumnSchema {
                                            id: 1,
                                            name: "t",
                                            data_type: Timestamp,
                                            is_nullable: false,
                                            is_tag: false,
                                            comment: "",
                                            escaped_name: "t",
                                            default_value: None,
                                        },
                                        ColumnSchema {
                                            id: 2,
                                            name: "tsid",
                                            data_type: UInt64,
                                            is_nullable: false,
                                            is_tag: false,
                                            comment: "",
                                            escaped_name: "tsid",
                                            default_value: None,
                                        },
                                        ColumnSchema {
                                            id: 3,
                                            name: "name",
                                            data_type: String,
                                            is_nullable: true,
                                            is_tag: true,
                                            comment: "",
                                            escaped_name: "name",
                                            default_value: None,
                                        },
                                        ColumnSchema {
                                            id: 4,
                                            name: "value",
                                            data_type: Double,
                                            is_nullable: false,
                                            is_tag: false,
                                            comment: "",
                                            escaped_name: "value",
                                            default_value: None,
                                        },
                                    ],
                                },
                                version: 1,
                            },
                            projection: Some(
                                [
                                    0,
                                    2,
                                ],
                            ),
                        },
                        table: "demo",
                        read_order: None,
                        read_parallelism: 8,
                        predicate: Predicate {
                            exprs: [],
                            time_range: TimeRange {
                                inclusive_start: Timestamp(
                                    -9223372036854775808,
                                ),
                                exclusive_end: Timestamp(
                                    9223372036854775807,
                                ),
                            },
                        },
                    },
                    schema: Schema {
                        fields: [
                            Field {
                                name: "t",
                                data_type: Timestamp(
                                    Millisecond,
                                    None,
                                ),
                                nullable: false,
                                dict_id: 0,
                                dict_is_ordered: false,
                                metadata: None,
                            },
                            Field {
                                name: "COUNT(DISTINCT demo.name)[count distinct]",
                                data_type: List(
                                    Field {
                                        name: "item",
                                        data_type: Utf8,
                                        nullable: true,
                                        dict_id: 0,
                                        dict_is_ordered: false,
                                        metadata: None,
                                    },
                                ),
                                nullable: false,
                                dict_id: 0,
                                dict_is_ordered: false,
                                metadata: None,
                            },
                        ],
                        metadata: {},
                    },
                    input_schema: Schema {
                        fields: [
                            Field {
                                name: "t",
                                data_type: Timestamp(
                                    Millisecond,
                                    None,
                                ),
                                nullable: false,
                                dict_id: 0,
                                dict_is_ordered: false,
                                metadata: Some(
                                    {
                                        "field::comment": "",
                                        "field::id": "1",
                                        "field::is_tag": "false",
                                    },
                                ),
                            },
                            Field {
                                name: "name",
                                data_type: Utf8,
                                nullable: true,
                                dict_id: 0,
                                dict_is_ordered: false,
                                metadata: Some(
                                    {
                                        "field::comment": "",
                                        "field::id": "3",
                                        "field::is_tag": "true",
                                    },
                                ),
                            },
                        ],
                        metadata: {
                            "schema:num_key_columns": "2",
                            "schema::enable_tsid_primary_key": "true",
                            "schema::timestamp_index": "0",
                            "schema::version": "1",
                        },
                    },
                    metrics: ExecutionPlanMetricsSet {
                        inner: Mutex {
                            data: MetricsSet {
                                metrics: [],
                            },
                        },
                    },
                },
                partitioning: Hash(
                    [
                        Column {
                            name: "t",
                            index: 0,
                        },
                    ],
                    8,
                ),
                state: Mutex {
                    data: RepartitionExecState {
                        channels: {},
                        abort_helper: AbortOnDropMany(
                            [],
                        ),
                    },
                },
                metrics: ExecutionPlanMetricsSet {
                    inner: Mutex {
                        data: MetricsSet {
                            metrics: [],
                        },
                    },
                },
            },
            target_batch_size: 4096,
            metrics: ExecutionPlanMetricsSet {
                inner: Mutex {
                    data: MetricsSet {
                        metrics: [],
                    },
                },
            },
        },
        schema: Schema {
            fields: [
                Field {
                    name: "t",
                    data_type: Timestamp(
                        Millisecond,
                        None,
                    ),
                    nullable: false,
                    dict_id: 0,
                    dict_is_ordered: false,
                    metadata: None,
                },
                Field {
                    name: "COUNT(DISTINCT demo.name)",
                    data_type: Int64,
                    nullable: true,
                    dict_id: 0,
                    dict_is_ordered: false,
                    metadata: None,
                },
            ],
            metadata: {},
        },
        input_schema: Schema {
            fields: [
                Field {
                    name: "t",
                    data_type: Timestamp(
                        Millisecond,
                        None,
                    ),
                    nullable: false,
                    dict_id: 0,
                    dict_is_ordered: false,
                    metadata: Some(
                        {
                            "field::comment": "",
                            "field::id": "1",
                            "field::is_tag": "false",
                        },
                    ),
                },
                Field {
                    name: "name",
                    data_type: Utf8,
                    nullable: true,
                    dict_id: 0,
                    dict_is_ordered: false,
                    metadata: Some(
                        {
                            "field::comment": "",
                            "field::id": "3",
                            "field::is_tag": "true",
                        },
                    ),
                },
            ],
            metadata: {
                "schema:num_key_columns": "2",
                "schema::enable_tsid_primary_key": "true",
                "schema::timestamp_index": "0",
                "schema::version": "1",
            },
        },
        metrics: ExecutionPlanMetricsSet {
            inner: Mutex {
                data: MetricsSet {
                    metrics: [],
                },
            },
        },
    },
    metrics: ExecutionPlanMetricsSet {
        inner: Mutex {
            data: MetricsSet {
                metrics: [],
            },
        },
    },
}
```
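
To illustrate the kind of check that seems to fail here, below is a minimal sketch in plain Rust. It is a simplified model, not the actual arrow-rs or DataFusion code: `Field` and `validate_column` are hypothetical stand-ins, and the null state value for the group is an assumption based on the error message. It only shows how a field declared with `nullable: false` is rejected as soon as its column holds a null.

```rust
// Simplified stand-in for a schema field; NOT the arrow-rs `Field` type.
#[derive(Debug)]
struct Field {
    name: String,
    nullable: bool,
}

/// Hypothetical model of the validation: a column whose field is declared
/// non-nullable must not contain any null (`None`) values.
fn validate_column<T>(field: &Field, values: &[Option<T>]) -> Result<(), String> {
    let null_count = values.iter().filter(|v| v.is_none()).count();
    if !field.nullable && null_count > 0 {
        return Err(format!(
            "Column '{}' is declared as non-nullable but contains null values",
            field.name
        ));
    }
    Ok(())
}

fn main() {
    // Field as it appears in the Partial AggregateExec schema above.
    let state_field = Field {
        name: "COUNT(DISTINCT demo.name)[count distinct]".to_string(),
        nullable: false,
    };

    // Assumption: the intermediate state for the single group is null,
    // because the inserted row has no value for `name`.
    let state_column: Vec<Option<Vec<String>>> = vec![None];

    match validate_column(&state_field, &state_column) {
        Ok(()) => println!("batch accepted"),
        Err(e) => println!("batch rejected: {e}"),
    }
}
```

In this model, declaring the field with `nullable: true` makes the same column pass validation, which is consistent with the guess above that the declared nullability of the state field is what triggers the error.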