
Antalya 26.1 - Forward port of parquet metadata caching #1039 (#1385)

Merged
zvonand merged 8 commits into antalya-26.1 from fp_antalya_26_1_parquet_metadata_cache on Mar 3, 2026
Conversation

@arthurpassos
Collaborator

@arthurpassos arthurpassos commented Feb 9, 2026

Changelog category (leave one):

  • New Feature

Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):

Forward port of parquet metadata caching #1039 and #1186

Documentation entry for user-facing changes

Forward port of parquet metadata caching #1039

CI/CD Options

Exclude tests:

  • Fast test
  • Integration Tests
  • Stateless tests
  • Stateful tests
  • Performance tests
  • All with ASAN
  • All with TSAN
  • All with MSAN
  • All with UBSAN
  • All with Coverage
  • All with Aarch64
  • All Regression
  • Disable CI Cache

Regression jobs to run:

  • Fast suites (mostly <1h)
  • Aggregate Functions (2h)
  • Alter (1.5h)
  • Benchmark (30m)
  • ClickHouse Keeper (1h)
  • Iceberg (2h)
  • LDAP (1h)
  • Parquet (1.5h)
  • RBAC (1.5h)
  • SSL Server (1h)
  • S3 (2h)
  • Tiered Storage (2h)

…a_caching

Antalya 25.8 - Forward port of #938 - Parquet metadata caching
@github-actions

github-actions Bot commented Feb 9, 2026

Workflow [PR], commit [9161c13]

@arthurpassos arthurpassos changed the title FP - do not review yet - Merge pull request #1039 from Altinity/fp_antaya_25_8_parquet_metadat… Antalya 26.1 - Forward port of parquet metadata caching #1039 Feb 9, 2026
@arthurpassos arthurpassos added antalya port-antalya PRs to be ported to all new Antalya releases antalya-26.1 labels Feb 9, 2026
Comment thread src/Core/SettingsChangesHistory.cpp Outdated
{"throw_if_deduplication_in_dependent_materialized_views_enabled_with_async_insert", true, false, "It becomes obsolete."},
{"database_datalake_require_metadata_access", true, true, "New setting."},
{"automatic_parallel_replicas_min_bytes_per_replica", 0, 1_MiB, "Better default value derived from testing results"},
{"input_format_parquet_use_metadata_cache", true, true, "New setting, turned ON by default"},
Collaborator Author


# Baselines generated with v25.12.1 (pre-release)

@arthurpassos
Collaborator Author

As far as I can tell, this one is ready for review. The failures look unrelated.

@zvonand
Collaborator

zvonand commented Mar 3, 2026

  • test_storage_delta/test.py::test_network_activity_with_system_tables -- fails in other PRs
  • Integration tests (arm_binary, distributed plan, 3/4) -- unstable suite, always gets killed by timeout

Regression suites did not run for an unknown reason.

@zvonand zvonand merged commit 1fb76a0 into antalya-26.1 Mar 3, 2026
308 of 316 checks passed
@Selfeer
Collaborator

Selfeer commented Mar 5, 2026

Audit Review - PR #1385

Confirmed defects first (fail-first): 1 High, 0 Medium, 0 Low.


1) Scope and partitions

PR audited: Antalya 26.1 - Forward port of parquet metadata caching #1039
Static review basis: diff bundle for PR #1385 plus referenced codepaths in repository.

Functional partitions used for deep audit:

  1. Parquet metadata cache core
    • src/Processors/Formats/Impl/ParquetFileMetaDataCache.{h,cpp}
    • src/Processors/Formats/Impl/ParquetBlockInputFormat.{h,cpp}
    • src/Processors/Formats/Impl/ParquetMetadataInputFormat.{h,cpp}
  2. Entry/config/control plane
    • src/Core/FormatFactorySettings.h
    • src/Core/ServerSettings.cpp
    • programs/server/Server.cpp
    • src/Interpreters/InterpreterSystemQuery.cpp
    • src/Parsers/ASTSystemQuery.{h,cpp}
    • src/Access/Common/AccessType.h
  3. Object-storage integration + tests/observability
    • src/Storages/ObjectStorage/StorageObjectStorageSource.cpp
    • src/Common/ProfileEvents.cpp
    • stateless/integration tests added in PR

Cross-partition deduplication: one root-cause defect identified; no additional independent root causes confirmed.


2) Call graph

2.1 Runtime read path (Parquet data / metadata formats)

  • Entrypoint: object-storage source creates input format via FormatFactory::getInput(...).
  • StorageObjectStorageSource injects storage identity via IInputFormat::setStorageRelatedUniqueKey(settings, path:etag).
  • ParquetBlockInputFormat::setStorageRelatedUniqueKey(...) / ParquetMetadataInputFormat::setStorageRelatedUniqueKey(...) store {key, use_cache}.
  • Read path:
    • ParquetBlockInputFormat::getFileMetaData() or getFileMetadata(...) helper in ParquetMetadataInputFormat.cpp
    • if cache disabled/missing key -> direct parquet::ReadMetaData(...)
    • else -> ParquetFileMetaDataCache::instance()->getOrSet(key, load_func)
  • Shared-state/cache interaction:
    • ParquetFileMetaDataCache extends CacheBase<String, parquet::FileMetaData>
    • cache max size configured in Server.cpp from input_format_parquet_metadata_cache_max_size.
  • Side effects:
    • profile counters ParquetMetaDataCacheHits/ParquetMetaDataCacheMisses.

2.2 Control path (SYSTEM DROP PARQUET METADATA CACHE)

  • SQL parse/AST enum: ASTSystemQuery::Type::DROP_PARQUET_METADATA_CACHE.
  • Interpreter dispatch: InterpreterSystemQuery switch case.
  • Access validation: checkAccess(AccessType::SYSTEM_DROP_PARQUET_METADATA_CACHE).
  • Mutation: ParquetFileMetaDataCache::instance()->clear().
  • Error path (compiled without parquet): throws SUPPORT_IS_DISABLED.

2.3 Config/load path

  • Session setting: input_format_parquet_use_metadata_cache (format setting, default ON).
  • Server setting: input_format_parquet_metadata_cache_max_size (intended bytes limit).
  • Bootstrap wiring: Server.cpp sets cache max size from server setting.

3) Transition matrix

Transition ID | Entry -> Processing -> State update -> Output | Key invariants
T1 | Parquet read -> evaluate use_cache && key -> bypass cache -> metadata returned | I1: no stale shared state mutation in bypass path
T2 | Parquet read -> getOrSet(key) miss -> load metadata -> insert cache -> return metadata | I2: cache bound must enforce configured memory cap
T3 | Parquet read -> getOrSet(key) hit -> return cached metadata | I3: metadata object lifetime safe across threads
T4 | SYSTEM DROP command -> access check -> clear() cache -> future reads repopulate | I4: clear is atomic with respect to in-flight inserts (no corruption)
T5 | Server start -> read server setting -> setMaxSizeInBytes(...) | I5: unit semantics of configured size match implementation accounting

Invariants used:

  • I1 Correctness invariant: bypass path must still parse full metadata exactly once per read.
  • I2 Resource invariant: cache growth must stay under configured max-size intent.
  • I3 Concurrency invariant: shared cache operations must be thread-safe and non-UAF.
  • I4 Management invariant: clear operation must not leave inconsistent cache internals.
  • I5 Config-contract invariant: setting names/descriptions and runtime behavior must align.

4) Logical code-path testing summary

Reviewed branches (static):

  • Cache disabled branch (use_cache=0) -> direct metadata read.
  • Cache enabled but empty key branch (local file/non-keyed path) -> direct metadata read.
  • Cache enabled + key branch:
    • miss (loaded=true) -> load and store.
    • hit (loaded=false) -> return existing.
  • System command branch:
    • parquet enabled build -> access checked + clear.
    • parquet disabled build -> explicit exception.
  • Setting/default branch:
    • session default ON for metadata cache.
    • server max-size setting consumed at startup.

Malformed/integration/timing paths considered:

  • missing object metadata (key not set) -> bypass path.
  • repeated reads for many unique object keys.
  • concurrent readers on same key and different keys.
  • concurrent clear() interleaving with getOrSet(...).

Observed fail-open/fail-closed behavior:

  • Access control for SYSTEM DROP is fail-closed (explicit privilege gate).
  • Cache absence/misconfiguration generally fails-open to direct file reads (availability-friendly).

5) Fault categories and category-by-category injection results

Fault categories defined from changed logic:

  1. C1 - Cache accounting/eviction contract faults
    • Scope: max-size setting, cache weight accounting, eviction trigger.
  2. C2 - Shared-state concurrency faults
    • Scope: getOrSet, clear, singleton cache synchronization.
  3. C3 - Keying correctness faults
    • Scope: key derivation (path:etag) and stale-hit prevention.
  4. C4 - Access-control/command dispatch faults
    • Scope: parser enum, interpreter switch, privilege mapping.
  5. C5 - Error-contract parity faults
    • Scope: unsupported-build branch and equivalent cache-drop command behavior.
  6. C6 - Performance/resource pressure faults
    • Scope: high-cardinality object sets; memory pressure behavior.
  7. C7 - C++ lifetime/UB/invalidation faults
    • Scope: object lifetimes, shared_ptr safety, container invalidation.
  8. C8 - Exception-safety/partial-update faults
    • Scope: loader exceptions in getOrSet, clear+insert transitions.

Category-by-category logical injection outcomes:

  • C1 Executed - FAILED - Defect confirmed
    Injected condition: high cardinality of unique parquet keys with non-trivial metadata sizes.
    Outcome: configured "max size" is not enforced as bytes; practical unbounded growth.
  • C2 Executed - Passed
    CacheBase uses internal mutex + insert tokens; no race/deadlock defect confirmed in reviewed paths.
  • C3 Executed - Passed (with residual risk noted)
    path:etag keying avoids obvious stale-hit on object updates where etag changes.
  • C4 Executed - Passed
    parser/interpreter/access additions are coherent for DROP command.
  • C5 Executed - Passed
    unsupported parquet build throws explicit error; no silent success.
  • C6 Executed - FAILED via C1 root cause
    resource-failure mode is memory-growth risk due to accounting mismatch.
  • C7 Executed - Passed
    no UAF/use-after-move/iterator invalidation defect confirmed in new code.
  • C8 Executed - Passed
    getOrSet exception path appears rollback-safe for token ownership and cache insertion.

5.1 Fault-category completion matrix

Category | Status | Outcome | Defects | Notes
C1 accounting/eviction | Executed | Fail | 1 | Root-cause defect
C2 concurrency | Executed | Pass | 0 | Mutex/token model reviewed
C3 keying correctness | Executed | Pass | 0 | ETag-based keying reviewed
C4 access/dispatch | Executed | Pass | 0 | Parser+Interpreter+AccessType aligned
C5 error contract | Executed | Pass | 0 | Unsupported build path explicit
C6 perf/resource | Executed | Fail | 1 | Same root cause as C1
C7 C++ lifetime/UB | Executed | Pass | 0 | No confirmed lifetime UB
C8 exception/partial-update | Executed | Pass | 0 | No confirmed partial-update defect

No category marked Not Applicable or Deferred.

5.2 Interleaving analysis (multithreaded paths)

  • I-A Thread A getOrSet(k) miss, Thread B getOrSet(k) concurrent:
    • token serialization in CacheBase allows single loader; second waits and reuses value.
    • no duplicate load defect confirmed.
  • I-B Thread A getOrSet(k) loading, Thread B clear():
    • clear() wipes insert_tokens; post-load insertion checks token presence before set.
    • prevents inconsistent insertion after clear; correctness preserved.
  • I-C Many threads getOrSet(k_i) distinct keys under memory pressure:
    • synchronization works, but eviction contract fault (C1) permits excessive retained memory.

6) Confirmed defects (High / Medium / Low)

High

H-1: Parquet metadata cache max-size setting is semantically broken (entries counted as weight 1, not bytes)

  • Impact: server memory can grow far beyond configured input_format_parquet_metadata_cache_max_size, risking OOM and availability loss under many unique parquet objects.
  • Severity rationale (rubric):
    • Correctness/availability impact: High (resource exhaustion / potential crash).
    • Likelihood: realistic for object storage workloads with high file cardinality.
    • Blast radius: process-wide (global singleton cache).
    • Exploitability: operationally easy (querying many unique objects).
  • Anchor:
    • src/Processors/Formats/Impl/ParquetFileMetaDataCache.h (ParquetFileMetaDataCache inheritance)
    • src/Common/ICachePolicy.h (EqualWeightFunction)
    • src/Core/ServerSettings.cpp (input_format_parquet_metadata_cache_max_size definition)
  • Trigger condition (fault injection mapping):
    enable metadata cache, run reads across many unique path:etag keys with non-trivial metadata; expected byte cap should evict, but does not as intended.
  • Affected transition: T2/T5 (cache insert + configured-size enforcement).
  • Why this is a defect (not preference):
    product contract says "maximum size" for metadata cache, but implementation uses default per-entry weight (=1), so byte-size semantics are violated.
  • Smallest logical repro:
    1. Keep input_format_parquet_use_metadata_cache=1 and default server max-size setting (500000000).
    2. Query Parquet files in object storage with many unique keys (distinct path/etag).
    3. Observe cache admission continues far past intended byte budget because each entry contributes weight 1.
  • Likely fix direction:
    • Implement a dedicated weight functor for parquet::FileMetaData approximating in-memory bytes.
    • Change cache type to CacheBase<String, parquet::FileMetaData, std::hash<String>, ParquetMetadataWeightFunction>.
    • Add validation docs/tests clarifying unit semantics (bytes vs entries).
  • Regression test direction:
    • Add unit test with synthetic metadata objects + tiny max-size to force eviction by byte weight.
    • Add integration test with many unique parquet files verifying bounded cache sizeInBytes()/count() progression.
    • Verify SYSTEM DROP PARQUET METADATA CACHE resets stats and memory footprint.
  • Affected subsystem / blast radius: Parquet object-storage read path, both Parquet and ParquetMetadata input formats; all server threads using global cache.
  • Code evidence snippets:
// src/Processors/Formats/Impl/ParquetFileMetaDataCache.h
class ParquetFileMetaDataCache : public CacheBase<String, parquet::FileMetaData>
{
public:
    static ParquetFileMetaDataCache * instance();
    // ...
};
// src/Common/ICachePolicy.h
template <typename T>
struct EqualWeightFunction
{
    size_t operator()(const T &) const
    {
        return 1;
    }
};
// src/Core/ServerSettings.cpp
DECLARE(UInt64, input_format_parquet_metadata_cache_max_size, 500000000,
    "Maximum size of parquet file metadata cache", 0)

Medium

No confirmed medium-severity defects.

Low

No confirmed low-severity defects.


7) Coverage accounting + stop-condition status

Call-graph node coverage:

  • Reviewed: 16/16 in-scope changed nodes.
  • Not reviewed: 0 (within PR scope).

Transition coverage:

  • Reviewed transitions: T1..T5 all covered.
  • Not reviewed transitions: none in scope.

Fault category coverage:

  • Executed: 8/8.
  • Not Applicable: 0.
  • Deferred: 0.

Coverage stop-condition: Satisfied
All in-scope call-graph nodes, transitions, and fault categories were either reviewed or executed with outcome.


8) Assumptions & limits

  • Static reasoning only; no runtime execution in this audit pass.
  • Memory-footprint impact is proven by code contract mismatch; exact growth curve depends on real metadata object sizes and workload.
  • Did not run TSAN/ASAN/UBSAN nor full integration suite in this pass.
  • Audit is limited to PR Antalya 26.1 - Forward port of parquet metadata caching #1039 #1385 diff scope and directly referenced cache-policy semantics.

9) Confidence rating and confidence-raising evidence

  • Overall confidence: High for defect H-1.
  • Why high:
    • direct, deterministic semantic mismatch between stated size contract and default cache weight function.
    • clear transition mapping from config (max_size) to cache implementation.
  • Evidence that would further raise confidence for operational impact quantification:
    • runtime memory telemetry under high-cardinality parquet workloads,
    • dedicated eviction tests demonstrating pre-fix/post-fix behavior.

10) Residual risks and untested paths

  • Residual risk (not confirmed defects):
    • keying strategy assumes etag quality/availability in all object-storage backends and proxy layers.
    • profile event semantics rely on getOrSet return bool interpretation remaining stable.
  • Untested paths (runtime):
    • behavior during long-running mixed workloads with frequent cache clears,
    • memory pressure interactions with other caches (query condition, object list cache).

@Selfeer
Collaborator

Selfeer commented Mar 6, 2026

PR #1385 Verification Report

PR: Antalya 26.1 - Forward port of parquet metadata caching #1039
Author: arthurpassos | State: MERGED | Branch: fp_antalya_26_1_parquet_metadata_cache -> antalya-26.1
Changes: +330 / -12 across 23 files | Build: 26.1.3.20001.altinityantalya
Regression Run: 22721284052 (x86, Keeper, with-analyzer)


Verdict: PASS — No regressions introduced by this PR

The PR-specific parquet metadata caching feature does not introduce any regressions. All 10 failures are either pre-existing baseline issues or caused by stale base branch (missing features added to antalya-26.1 after this PR branched off).


PR-Relevant Suites

Suite | Status | Report
parquet_minio | ✅ Pass | Job
parquet_aws_s3 | ✅ Pass | Job
parquet | ⚠️ Fail (pre-existing, not metadata-cache related) | Job
All S3 suites (minio 1-3, aws 1-2, azure 1-2, gcs 2) | ✅ Pass | 8/8 passed
All tiered_storage suites (minio, aws, gcs, local) | ✅ Pass | 4/4 passed
The parquet suite failure is only in the unsupportednull test (exitcode assertion mismatch — error code 636 vs expected), which is a pre-existing issue unrelated to parquet metadata caching. The same failure is observed in PR #1388 verification.


Regression Run Summary

59 passed · 10 failed · 1 skipped · 1 cancelled


Failed Suites Analysis

Category 1: Pre-Existing Failures (5 suites)

These same failures occur in the PR #1388 comparison runs.

Suite | Failing Test | Error Type | Job
aggregate_functions_3 | rankCorrState/with group by | SnapshotError | Job
ice | /ice/feature | Error | Job
parquet | datatypes/unsupportednull | AssertionError (exitcode 636 vs expected) | Job
settings | default values (mass SnapshotError) | SnapshotError | Job
version | embedded logos | SnapshotError | Job

Category 2: Stale Base Branch Failures (4 suites)

These suites fail because PR #1385's build is based on an older state of antalya-26.1 that lacks features added later (export part/partition from PR #1388, object_storage_cluster setting).

Suite | Root Cause | Error | Job
s3_minio_export_part | Missing/incomplete export part feature (PR #1388 merged after #1385) | Test failure in sanity/basic table | Job
s3_minio_export_partition | Missing/incomplete export partition feature (PR #1388 merged after #1385) | Test failure in sanity/basic table | Job
iceberg_1 | Missing object_storage_cluster setting | UNKNOWN_SETTING | Job
swarms | Missing object_storage_cluster / cluster discovery features | Multiple test failures | Job

Category 3: Environment-Specific (1 suite)

Suite | Failing Test | Error | Job
s3_gcs_1 | /s3/gcs/part 1 | ClickHouse PID still alive during restart (infra issue) | Job

Skipped / Cancelled

Suite | Status | Reason
hive_partitioning | ⏭️ Skipped | Not applicable to this run configuration
iceberg_2 | ❌ Cancelled | Likely dependent on iceberg_1 which failed

ClickHouse CI Summary

Category | Result
Builds (amd_release, arm_release, amd_debug, amd_binary, arm_binary) | ✅ All pass
Quick functional tests | ✅ Pass
Stateless tests (all shards/configs) | ✅ All pass
Integration tests | ⚠️ 3/5 amd pass, 2/4 arm pass (known flaky shards)
Docker images | ✅ Pass
Compatibility checks | ✅ Pass
Stress / AST fuzzer / BuzzHouse | ✅ Pass
Grype vulnerability scans | ⚠️ 1 non-alpine scan reported failure; alpine + keeper ✅
Sanitizer tests (ASAN, TSAN, MSAN, UBSAN) | ⏭️ Skipped per PR config

Integration test failures are on arm distributed plan shards 3/4 and amd shard 2/5, which are known unstable shards.


@Selfeer Selfeer added the verified-with-issues Verified by QA and issues found. label Mar 6, 2026
@Selfeer Selfeer added verified Approved for release and removed verified-with-issues Verified by QA and issues found. labels Mar 10, 2026
@arthurpassos arthurpassos removed the port-antalya PRs to be ported to all new Antalya releases label Apr 20, 2026
zvonand added a commit that referenced this pull request Apr 28, 2026
…ta_cache

Antalya 26.1 - Forward port of parquet metadata caching #1039

Source-PR: #1385 (#1385)

5 participants