Skip to content

[Feature] Support arbitrary scalar functions in materialized view partition expressions (e.g. CONVERT_TZ, DATE_ADD, CAST) #62395

@hakanuzum

Description

@hakanuzum

Search before asking

  • I had searched in the issues and found no similar issues.

Description

Search before asking

I had searched in the issues and found no similar issues.

Description

Currently, Apache Doris async materialized views only allow date_trunc
as a valid function in PARTITION BY expressions. Any other scalar
function applied to a datetime column — such as CONVERT_TZ, DATE_ADD,
CAST, or combinations thereof — is rejected at DDL time with:

errCode = 2, detailMessage = Unable to find a suitable base table for
partitioning, the fail reason is partition column is not in group by or
window partition by, [column to check use invalid implicit expression,
invalid expression is convert_tz(created_on, 'UTC', 'Europe/Istanbul')]

This is a fundamental limitation that affects a wide range of real-world
use cases wherever data needs to be transformed, normalized, or
bucketed
differently from how it is stored.


Real-World Use Cases

Case 1 — Timezone Conversion (UTC → Local Time)

Data is ingested in UTC. Business reporting requires partitioning by
local calendar day.

2025-07-24 21:00:00 UTC = 2025-07-25 00:00:00 in UTC+3 (Turkey).
Without conversion, this record falls into the wrong day partition.

-- Desired: partition by local calendar day
PARTITION BY (date_trunc(CONVERT_TZ(created_on, 'UTC', 'Europe/Istanbul'), 'day'))

Case 2 — Non-UTC Source Timezone to UTC Normalization

Data arrives in a local timezone (e.g. America/New_York) and needs to
be normalized to UTC for partitioning to ensure consistency across
regions.

-- Normalize EST data to UTC before partitioning
PARTITION BY (date_trunc(CONVERT_TZ(event_time, 'America/New_York', 'UTC'), 'day'))

Case 3 — Fiscal / Business Day Offset

Some businesses define their "day" as starting at 06:00 instead of
00:00 (e.g. a scooter rental company where the operational day starts
at dawn).

-- Shift time by -6 hours so partitions align with business day
PARTITION BY (date_trunc(DATE_SUB(created_on, INTERVAL 6 HOUR), 'day'))

Case 4 — Unix Timestamp to Date

Raw data is stored as a Unix epoch integer. Partitioning requires
conversion to a date type first.

-- Convert epoch to date for partitioning
PARTITION BY (date_trunc(FROM_UNIXTIME(event_ts), 'day'))

Case 5 — Type Casting for Partitioning

A DATETIME column needs to be cast to DATE for partition alignment.

-- Cast DATETIME to DATE
PARTITION BY (CAST(event_time AS DATE))

Current Workaround

The only workaround today is to materialize the transformed column
directly in the base table:

ALTER TABLE my_table ADD COLUMN created_on_local DATE;
UPDATE my_table 
SET created_on_local = CAST(
  date_trunc(CONVERT_TZ(created_on, 'UTC', 'Europe/Istanbul'), 'day') AS DATE
);

This is problematic because:

  • It requires schema changes to potentially immutable source tables
  • It pollutes the base table with derived/presentation-layer columns
  • It requires changes to every ingestion pipeline writing to that table
  • It duplicates storage for data that can be computed on the fly
  • It is not always possible when the base table is an external or
    catalog table

Expected Behavior

Allow PARTITION BY in async materialized views to support any
deterministic scalar expression over a date/datetime column,
not just date_trunc. At minimum, the following should be supported:

Expression Description
date_trunc(CONVERT_TZ(col, src_tz, dst_tz), unit) Timezone-aware day/hour partitioning
date_trunc(DATE_ADD(col, INTERVAL N HOUR), unit) Business hour offset partitioning
date_trunc(FROM_UNIXTIME(col), unit) Epoch-to-date partitioning
CAST(col AS DATE) Simple type cast partitioning

The key requirement is that the expression must be deterministic
and monotonically non-decreasing with respect to the underlying
column, so that Doris can still correctly map base table partitions
to materialized view partitions for incremental refresh.


Comparison with Other Engines

Engine Arbitrary expressions in MV partition
ClickHouse ✅ Supported natively
BigQuery ✅ Supported via expression-based partitioning
Snowflake ✅ Supported
Apache Doris ❌ Only date_trunc supported

Impact

This limitation affects any deployment where:

  • Data is stored in UTC and queried in local time
  • Multiple timezones are involved (global products)
  • Source systems use non-midnight day boundaries
  • Raw timestamps require type conversion before partitioning

This is not a niche use case — it applies to virtually every
production deployment handling time-series data across timezones.


Version

Apache Doris 4.0.5

Are you willing to submit a PR?

No, but I am happy to provide detailed test cases, expected behaviors,
and feedback on any proposed implementation.

Use case

Any deployment storing datetime data needs to transform or normalize
timestamps before partitioning materialized views. Common scenarios include:

  1. UTC storage, local-time reporting: Data ingested in UTC must be
    partitioned by local calendar day (e.g. UTC+3 Turkey, UTC-5 New York).
    Without CONVERT_TZ support, records near midnight fall into the wrong
    day partition — a silent data correctness issue.

  2. Non-midnight business day boundaries: Operational systems (logistics,
    scooter rentals, restaurants) define their "day" starting at 06:00 or
    08:00. Partitioning requires DATE_SUB(col, INTERVAL 6 HOUR) before
    date_trunc.

  3. Epoch integer to date: Raw event tables store Unix timestamps as
    BIGINT. Partitioning requires FROM_UNIXTIME(col) before date_trunc.

  4. Cross-timezone SaaS platforms: Multi-region products store events in
    UTC but must report per-tenant local time, requiring per-query or
    per-view timezone conversion.

In all cases above, the workaround — adding a pre-computed column to the
base table — is invasive, wastes storage, and is impossible when the
source is an external/catalog table.

Related issues

No directly related issues found.
Closest context: #35562 added date_trunc support for partition tracking,
but did not generalize to other deterministic scalar functions.

Are you willing to submit PR?

  • Yes I am willing to submit a PR!

Code of Conduct

Metadata

Metadata

Assignees

No one assigned

    Labels

    kind/featureCategorizes issue or PR as related to a new feature.

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions