Search before asking
I had searched in the issues and found no similar issues.
Description
Currently, Apache Doris async materialized views only allow date_trunc
as a valid function in PARTITION BY expressions. Any other scalar
function applied to a datetime column — such as CONVERT_TZ, DATE_ADD,
CAST, or combinations thereof — is rejected at DDL time with:
```
errCode = 2, detailMessage = Unable to find a suitable base table for
partitioning, the fail reason is partition column is not in group by or
window partition by, [column to check use invalid implicit expression,
invalid expression is convert_tz(created_on, 'UTC', 'Europe/Istanbul')]
```
This is a fundamental limitation that affects a wide range of real-world
use cases wherever data needs to be transformed, normalized, or
bucketed differently from how it is stored.
Real-World Use Cases
Case 1 — Timezone Conversion (UTC → Local Time)
Data is ingested in UTC. Business reporting requires partitioning by
local calendar day.
2025-07-24 21:00:00 UTC = 2025-07-25 00:00:00 in UTC+3 (Turkey).
Without conversion, this record falls into the wrong day partition.
```sql
-- Desired: partition by local calendar day
PARTITION BY (date_trunc(CONVERT_TZ(created_on, 'UTC', 'Europe/Istanbul'), 'day'))
```
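The boundary behavior described above is easy to verify outside Doris. A minimal sketch with the Python standard library (`zoneinfo`), using the exact timestamp from this case:

```python
from datetime import datetime
from zoneinfo import ZoneInfo

# A record ingested at 21:00 UTC on 2025-07-24...
utc_ts = datetime(2025, 7, 24, 21, 0, 0, tzinfo=ZoneInfo("UTC"))

# ...is already the next calendar day in Istanbul (UTC+3, no DST).
local_ts = utc_ts.astimezone(ZoneInfo("Europe/Istanbul"))

print(utc_ts.date())    # 2025-07-24 -> partition chosen without conversion
print(local_ts.date())  # 2025-07-25 -> partition the business actually wants
```

Every record in the three hours before UTC midnight lands in the wrong day partition if the conversion cannot be expressed in `PARTITION BY`.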
Case 2 — Non-UTC Source Timezone to UTC Normalization
Data arrives in a local timezone (e.g. America/New_York) and needs to
be normalized to UTC for partitioning to ensure consistency across
regions.
```sql
-- Normalize EST data to UTC before partitioning
PARTITION BY (date_trunc(CONVERT_TZ(event_time, 'America/New_York', 'UTC'), 'day'))
```
Case 3 — Fiscal / Business Day Offset
Some businesses define their "day" as starting at 06:00 instead of
00:00 (e.g. a scooter rental company where the operational day starts
at dawn).
```sql
-- Shift time by -6 hours so partitions align with business day
PARTITION BY (date_trunc(DATE_SUB(created_on, INTERVAL 6 HOUR), 'day'))
```
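The effect of the shift can be sketched in plain Python; the `timedelta` subtraction below mirrors `DATE_SUB(col, INTERVAL 6 HOUR)` followed by a day truncation (illustrative only, not Doris semantics):

```python
from datetime import datetime, timedelta

def business_day(ts: datetime) -> str:
    """Operational day for a business whose day starts at 06:00."""
    return (ts - timedelta(hours=6)).date().isoformat()

# 03:00 still belongs to the previous operational day...
print(business_day(datetime(2025, 7, 25, 3, 0)))  # 2025-07-24
# ...while 06:00 opens the new one.
print(business_day(datetime(2025, 7, 25, 6, 0)))  # 2025-07-25
```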
Case 4 — Unix Timestamp to Date
Raw data is stored as a Unix epoch integer. Partitioning requires
conversion to a date type first.
```sql
-- Convert epoch to date for partitioning
PARTITION BY (date_trunc(FROM_UNIXTIME(event_ts), 'day'))
```
Case 5 — Type Casting for Partitioning
A DATETIME column needs to be cast to DATE for partition alignment.
```sql
-- Cast DATETIME to DATE
PARTITION BY (CAST(event_time AS DATE))
```
Current Workaround
The only workaround today is to materialize the transformed column
directly in the base table:
```sql
ALTER TABLE my_table ADD COLUMN created_on_local DATE;

UPDATE my_table
SET created_on_local = CAST(
    date_trunc(CONVERT_TZ(created_on, 'UTC', 'Europe/Istanbul'), 'day') AS DATE
);
```
This is problematic because:
- It requires schema changes to potentially immutable source tables
- It pollutes the base table with derived/presentation-layer columns
- It requires changes to every ingestion pipeline writing to that table
- It duplicates storage for data that can be computed on the fly
- It is not always possible when the base table is an external or catalog table
Expected Behavior
Allow PARTITION BY in async materialized views to support any
deterministic scalar expression over a date/datetime column,
not just date_trunc. At minimum, the following should be supported:
| Expression | Description |
|---|---|
| `date_trunc(CONVERT_TZ(col, src_tz, dst_tz), unit)` | Timezone-aware day/hour partitioning |
| `date_trunc(DATE_ADD(col, INTERVAL N HOUR), unit)` | Business-hour offset partitioning |
| `date_trunc(FROM_UNIXTIME(col), unit)` | Epoch-to-date partitioning |
| `CAST(col AS DATE)` | Simple type-cast partitioning |
The key requirement is that the expression must be deterministic
and monotonically non-decreasing with respect to the underlying
column, so that Doris can still correctly map base table partitions
to materialized view partitions for incremental refresh.
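A sketch of what that validity condition means in practice: sampling timestamps in ascending order, each candidate expression must produce non-decreasing partition keys, otherwise one base partition could scatter across several MV partitions. This is plain Python, not Doris internals, and the helper name is made up for illustration:

```python
from datetime import datetime, timedelta
from zoneinfo import ZoneInfo

def is_non_decreasing(transform, sorted_samples):
    """True if the transform preserves order over ascending samples."""
    keys = [transform(s) for s in sorted_samples]
    return all(a <= b for a, b in zip(keys, keys[1:]))

# Hourly samples across a week of UTC timestamps.
start = datetime(2025, 7, 21, tzinfo=ZoneInfo("UTC"))
samples = [start + timedelta(hours=h) for h in range(7 * 24)]

# The candidate partition expressions from this issue, as Python closures.
candidates = {
    "convert_tz + day trunc": lambda t: t.astimezone(ZoneInfo("Europe/Istanbul")).date(),
    "6h offset + day trunc":  lambda t: (t - timedelta(hours=6)).date(),
    "cast to date":           lambda t: t.date(),
}

for name, fn in candidates.items():
    print(name, is_non_decreasing(fn, samples))  # all True
```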
Comparison with Other Engines
| Engine | Arbitrary expressions in MV partition |
|---|---|
| ClickHouse | ✅ Supported natively |
| BigQuery | ✅ Supported via expression-based partitioning |
| Snowflake | ✅ Supported |
| Apache Doris | ❌ Only `date_trunc` supported |
Impact
This limitation affects any deployment where:
- Data is stored in UTC and queried in local time
- Multiple timezones are involved (global products)
- Source systems use non-midnight day boundaries
- Raw timestamps require type conversion before partitioning
This is not a niche use case — it applies to virtually every
production deployment handling time-series data across timezones.
Version
Apache Doris 4.0.5
Are you willing to submit a PR?
No, but I am happy to provide detailed test cases, expected behaviors,
and feedback on any proposed implementation.
Use case
Any deployment storing datetime data needs to transform or normalize
timestamps before partitioning materialized views. Common scenarios include:
- **UTC storage, local-time reporting**: Data ingested in UTC must be partitioned by local calendar day (e.g. UTC+3 Turkey, UTC-5 New York). Without CONVERT_TZ support, records near midnight fall into the wrong day partition — a silent data correctness issue.
- **Non-midnight business day boundaries**: Operational systems (logistics, scooter rentals, restaurants) define their "day" starting at 06:00 or 08:00. Partitioning requires DATE_SUB(col, INTERVAL 6 HOUR) before date_trunc.
- **Epoch integer to date**: Raw event tables store Unix timestamps as BIGINT. Partitioning requires FROM_UNIXTIME(col) before date_trunc.
- **Cross-timezone SaaS platforms**: Multi-region products store events in UTC but must report per-tenant local time, requiring per-query or per-view timezone conversion.
In all cases above, the workaround — adding a pre-computed column to the
base table — is invasive, wastes storage, and is impossible when the
source is an external/catalog table.
Related issues
No directly related issues found.
Closest context: #35562 added date_trunc support for partition tracking,
but did not generalize to other deterministic scalar functions.