Skip to content

Antalya 26.1: Fix view over iceberg#1671

Closed
ianton-ru wants to merge 4 commits intoantalya-26.1from
bugfix/antalya-26.1/view_over_iceberg
Closed

Antalya 26.1: Fix view over iceberg#1671
ianton-ru wants to merge 4 commits intoantalya-26.1from
bugfix/antalya-26.1/view_over_iceberg

Conversation

@ianton-ru
Copy link
Copy Markdown

Changelog category (leave one):

  • Bug Fix (user-visible misbehavior in an official stable release)

Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):

Fix incorrect work view over Iceberg table

Documentation entry for user-facing changes

Solved #1669 - do not make null values for complex column names like materialized(time).
Solved #1629 - use ParquetReadRowGroups in test instead of S3GetObject/AzureGetObject.

CI/CD Options

Exclude tests:

  • Fast test
  • Integration Tests
  • Stateless tests
  • Stateful tests
  • Performance tests
  • All with ASAN
  • All with TSAN
  • All with MSAN
  • All with UBSAN
  • All with Coverage
  • All with Aarch64
  • All Regression
  • Disable CI Cache

Regression jobs to run:

  • Fast suites (mostly <1h)
  • Aggregate Functions (2h)
  • Alter (1.5h)
  • Benchmark (30m)
  • ClickHouse Keeper (1h)
  • Iceberg (2h)
  • LDAP (1h)
  • Parquet (1.5h)
  • RBAC (1.5h)
  • SSL Server (1h)
  • S3 (2h)
  • S3 Export (2h)
  • Swarms (30m)
  • Tiered Storage (2h)

@github-actions
Copy link
Copy Markdown

github-actions Bot commented Apr 21, 2026

Workflow [PR], commit [3cd97c6]

@ianton-ru ianton-ru added antalya bugfix port-antalya PRs to be ported to all new Antalya releases antalya-26.1 labels Apr 22, 2026
@ianton-ru ianton-ru changed the title Fix view over iceberg Antalya 26.1: Fix view over iceberg Apr 22, 2026
@ianton-ru
Copy link
Copy Markdown
Author

@codex review

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 024385ff6f

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".


/// With View over Iceberg table we have someting like 'materialize(time)' as column_name
ParserExpression parser;
ASTPtr expr = parseQuery(parser, column_name, 0, DBMS_DEFAULT_MAX_PARSER_DEPTH, DBMS_DEFAULT_MAX_PARSER_BACKTRACKS);
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Guard expression parsing for non-SQL-safe column names

parseQuery(parser, column_name, ...) throws on syntax errors, but column_name here comes from getNameInStorage() (raw storage names), not guaranteed to be a standalone SQL expression. When schema evolution leaves a nullable column absent in some files and that column name requires quoting (for example names with spaces/symbols), this new path raises SYNTAX_ERROR and aborts the read instead of applying the NULL-constant fallback.

Useful? React with 👍 / 👎.

Comment on lines +778 to +779
if (expr->as<ASTFunction>())
continue;
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Avoid treating arbitrary parsed expressions as generated columns

Using expr->as<ASTFunction>() as the skip condition is too broad: quoted physical column names like a-b are parsed as expression ASTs, so this branch skips NULL-constant synthesis for genuinely missing nullable columns. In evolved Iceberg schemas, that leaves such columns in requested_columns_copy, and later readers may try to fetch non-existent parquet fields instead of returning NULLs.

Useful? React with 👍 / 👎.

@ianton-ru ianton-ru force-pushed the bugfix/antalya-26.1/view_over_iceberg branch from 6a7740c to 12337b9 Compare April 23, 2026 10:33
@ianton-ru
Copy link
Copy Markdown
Author

@codex review

@chatgpt-codex-connector
Copy link
Copy Markdown

Codex Review: Didn't find any major issues. 👍

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

@ianton-ru
Copy link
Copy Markdown
Author

PR #1671 — Antalya 26.1: Fix view over Iceberg

AI audit note: This document was generated by AI (gpt-5.3-codex).

Scope: Altinity fork, branch vs antalya-26.1StorageObjectStorageSource.cpp, tests/integration/test_storage_iceberg_with_spark/test_read_constant_columns_optimization.py.


Confirmed defects

No confirmed defects in reviewed scope.

(Earlier revision had orphaned parser includes after dropping parseQuery; the current branch tip removes those includes, and the net diff vs antalya-26.1 adds only the materialize() heuristic in createReader.)


Coverage summary

Item Detail
Scope reviewed Iceberg experimental read optimization path that synthesizes NULL constants for nullable columns missing from per-file metadata, and skips that for names shaped like materialize(...) (view-over-Iceberg). Integration test changes: ParquetReadRowGroups assertions and new view test.
Categories failed
Categories passed No SQL parse of raw storage names (no throw on odd names); no broad “any function AST” skip; includes unchanged vs base in net diff; changed logic uses only local state (no new shared-state / locking concerns in the diff hunk).

Call graph (in-scope)

Entry: StorageObjectStorageSource::createReader
Branch: allow_experimental_iceberg_read_optimization → file metadata present → iterate columns_info (constants / all-null) → second loop over requested_columns_list keyed by getNameInStorage()
Effect: Optional constant_columns / constant_columns_with_valuesrequested_columns_copy.eraseNames → downstream pipe / count fast path.


Residual risks (not confirmed defects)

  1. Heuristic breadth: Only names matching materialize( prefix and ) suffix are skipped. If another computed storage name can appear in the same nullable-missing-metadata situation, wrong NULL synthesis remains theoretically possible; no in-repo reproduction was traced in review.
  2. Tests: Assertions use ParquetReadRowGroups and assume one row group per data file as in the fixture. check_events accepts is_cluster but the SQL does not use it (harmless, slightly misleading API).

Changelog alignment (from PR description)

  • Addresses incorrect behavior for views over Iceberg when complex names like materialize(time) must not be forced to NULL-from-missing-field.
  • Tests: prefer ParquetReadRowGroups over object-store get-object profile events where appropriate.

References

@ianton-ru
Copy link
Copy Markdown
Author

Failed tests looks unrelated

@ianton-ru
Copy link
Copy Markdown
Author

Rebase on 26.3: #1759

@ianton-ru ianton-ru removed the port-antalya PRs to be ported to all new Antalya releases label May 8, 2026
@ianton-ru ianton-ru closed this May 8, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants