feat: add sorted_series column for DataFusion streaming aggregation #6290
Conversation
force-pushed from 60d859c to 9522326
force-pushed from 53bc3e0 to cb1b4d2
force-pushed from 9522326 to b0344ba
force-pushed from 9a8b9a5 to 58f4810
@codex review
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 652d128d5e
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
force-pushed from 0a0adb5 to 720f498
commit history is messed up, pushed existing history to https://github.com/quickwit-oss/quickwit/tree/matthew.kim/gtt/sorted-series-key just in case, going to cherry-pick the commits we want
force-pushed from 652d128 to 069a5a1
💡 Codex Review
Reviewed commit: 069a5a1871
Compute a composite, lexicographically sortable binary column (sorted_series) at Parquet write time using storekey order-preserving encoding.

For each row the key encodes:
1. Non-null sort schema tag columns as (ordinal: u8, value: str)
2. timeseries_id (i64) as final discriminator

Identical timeseries always produce identical byte keys regardless of timestamp or value, enabling DataFusion's streaming AggregateExec and BoundedWindowAggExec with O(1) memory instead of O(N) hash tables.

Also fixes create_nullable_dict_array, which used the original array index as the dictionary key instead of the position in the unique values array, causing out-of-bounds panics for mixed null/non-null inputs.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
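The encoding described above can be sketched with a hand-rolled order-preserving scheme. This is a minimal stand-in for the storekey crate, and `encode_row_key` and its helpers are illustrative names, not the PR's actual API:

```rust
/// Append an order-preserving string encoding: raw bytes plus a 0x00
/// terminator, so no encoded value is an ambiguous prefix of another.
fn push_str(buf: &mut Vec<u8>, s: &str) {
    buf.extend_from_slice(s.as_bytes());
    buf.push(0x00);
}

/// Append an i64 so byte order matches numeric order:
/// big-endian with the sign bit flipped.
fn push_i64(buf: &mut Vec<u8>, v: i64) {
    buf.extend_from_slice(&(v ^ i64::MIN).to_be_bytes());
}

/// Encode one row: each non-null tag as (ordinal, value), then the
/// mandatory timeseries_id under its own ordinal. Null tags are skipped.
fn encode_row_key(tags: &[Option<&str>], ts_id_ordinal: u8, ts_id: i64) -> Vec<u8> {
    let mut key = Vec::new();
    for (ordinal, tag) in tags.iter().enumerate() {
        if let Some(value) = tag {
            key.push(ordinal as u8);
            push_str(&mut key, value);
        }
    }
    key.push(ts_id_ordinal);
    push_i64(&mut key, ts_id);
    key
}

fn main() {
    // Identical series always produce identical byte keys.
    let a = encode_row_key(&[Some("cpu"), Some("api")], 6, 42);
    let b = encode_row_key(&[Some("cpu"), Some("api")], 6, 42);
    assert_eq!(a, b);

    // A null tag is skipped entirely, so the key differs.
    let c = encode_row_key(&[Some("cpu"), None], 6, 42);
    assert_ne!(a, c);

    // Exact key length: ordinal(1) + str(2) + ordinal(1) + i64(8) = 12.
    assert_eq!(encode_row_key(&[Some("x")], 6, 7).len(), 12);
}
```

The length assertion mirrors the exact-length test described in a later commit: a one-character string encodes to 2 bytes (byte plus terminator), matching ordinal(1) + str(2) + ordinal(1) + i64(8) = 12.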
Without the ordinal, the timeseries_id bytes could collide with a subsequent tag column's ordinal+string encoding. Every component in the key now consistently gets an ordinal prefix from its sort schema position.
Add tests that assert:
- timeseries_id gets ordinal 6 prefix (its sort schema position)
- key length is exact: ordinal(1) + str(2) + ordinal(1) + i64(8) = 12
- when timeseries_id is absent, no trailing ordinal appears
Writes a 6-row batch with 4 distinct series (including null tags) through the ParquetWriter pipeline, reads back, and verifies:
- 4 distinct keys produced (series identity)
- series with 3 rows produces 3 identical keys
- null host differs from present host (ordinal skipping)
- all-null tags differ from partial-null tags
- ordinal bytes are correct (0x00 for metric_name, 0x01 for service, 0x06 for timeseries_id) even when intermediate tags are null
- equal keys are contiguous after sort (streaming aggregation ready)
- Regenerate storekey entry via dd-rust-license-tool (correct authors)
- Fix 4 rustfmt nightly formatting diffs in sorted_series tests
force-pushed from 069a5a1 to 7444e26
- Remove sea-query (moved to quickwit-metastore with InsertableParquetSplit)
- Remove tokio dev-dependency (unused)
- Fix extra blank line in writer.rs
…ys, zonemap)
…lently skipping

Tag columns in the sort schema must be string-typed (Dict or Utf8). Previously, UInt8 and unknown types returned None, silently dropping them from the sorted_series key and reducing discrimination. Now returns an error for any non-string type.
…l sort order

encode_row_key skips null columns, producing shorter keys that compare before longer keys with the same prefix. With nulls_first=true, a row with a null tag sorted before a non-null tag physically, but its key compared after — breaking the monotonicity invariant needed for DataFusion streaming aggregation.

Adds a monotonicity assertion to the integration test: sorted_series values must be non-decreasing in the writer's physical output.
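The mismatch this commit fixes can be reproduced with a small sketch (`encode` is a hypothetical stand-in for the PR's encode_row_key): when a null tag is skipped, the byte at the divergence point is the next present component's higher ordinal, so the null row's key compares after the non-null row's key. Physical sort order must therefore put nulls last (nulls_first=false) to agree with key order.

```rust
/// Illustrative key encoding: non-null tags as (ordinal, value, 0x00),
/// then timeseries_id under ordinal 6. Null tags are skipped.
fn encode(tags: &[Option<&str>], ts_id: i64) -> Vec<u8> {
    let mut key = Vec::new();
    for (ordinal, tag) in tags.iter().enumerate() {
        if let Some(v) = tag {
            key.push(ordinal as u8);
            key.extend_from_slice(v.as_bytes());
            key.push(0x00);
        }
    }
    key.push(6); // timeseries_id's sort schema ordinal
    key.extend_from_slice(&(ts_id ^ i64::MIN).to_be_bytes());
    key
}

fn main() {
    let non_null = encode(&[Some("cpu"), Some("api")], 1); // host = "api"
    let null_tag = encode(&[Some("cpu"), None], 2);        // host = NULL, skipped
    // At the first differing byte the null row's key holds ordinal 6
    // (timeseries_id) while the non-null row's key holds ordinal 1,
    // so the null row's key compares AFTER the non-null row's key.
    assert!(null_tag > non_null);
}
```

Hence nulls-first physical order would place the null row before the non-null row while its key compares after, which is exactly the monotonicity violation the assertion guards against.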
timeseries_id is the only guaranteed discriminator for series identity. Without it, different series sharing the same tags (e.g., same metric and tags but different metric_type) would collapse onto the same sorted_series key, causing incorrect group merges in streaming aggregation.

- resolve_key_columns now returns Result and errors if timeseries_id is missing from the sort schema or the batch
- encode_row_key takes &KeyColumn (not Option) for timeseries_id
- All test helpers and inline test batches include timeseries_id
- Two tests converted to verify error on missing timeseries_id
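A small sketch of why the final discriminator is mandatory (the `encode` helper is illustrative, not the PR's API): two series can share every tag value yet still be distinct series, and only the timeseries_id component keeps their keys apart.

```rust
/// Illustrative key encoding: tags as (ordinal, value, 0x00) components,
/// optionally followed by timeseries_id under ordinal 6.
fn encode(tags: &[&str], ts_id: Option<i64>) -> Vec<u8> {
    let mut key = Vec::new();
    for (ordinal, tag) in tags.iter().enumerate() {
        key.push(ordinal as u8);
        key.extend_from_slice(tag.as_bytes());
        key.push(0x00);
    }
    if let Some(id) = ts_id {
        key.push(6); // timeseries_id's sort schema ordinal
        key.extend_from_slice(&(id ^ i64::MIN).to_be_bytes());
    }
    key
}

fn main() {
    let tags = ["cpu.usage", "api"];
    // Two distinct series with identical tags (e.g. differing metric_type
    // upstream, reflected only in their timeseries_id values).
    let key_a = encode(&tags, Some(100));
    let key_b = encode(&tags, Some(200));
    assert_ne!(key_a, key_b); // distinct keys: no incorrect group merge

    // Without the discriminator the two series collapse onto one key.
    let collapsed_a = encode(&tags, None);
    let collapsed_b = encode(&tags, None);
    assert_eq!(collapsed_a, collapsed_b);
}
```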
…sage
The metrics and sketch pipeline E2E tests construct batches directly (bypassing OTLP ingest, which computes timeseries_id). Now that sorted_series encoding requires timeseries_id, these test batches need it too.
…elds

Previously, ParquetWriter::new silently degraded to an empty sort order on an unparseable sort_fields string (log + continue). This masked configuration errors and was inconsistent with prepare_write, which propagated parse errors from append_sorted_series_column.

Now both paths fail consistently: ParquetWriter::new returns Result and ParquetSplitWriter::new propagates it. Since sort_fields comes from ProductType::default_sort_fields (not user input), a parse failure is a programming error that should surface immediately.
…needs it

The enum and its impl block were added anticipating use by the row_keys and zonemap PRs, but are dead code in this PR. Remove rather than suppress with #[allow(dead_code)].
Summary
- Computes a sorted_series binary column at Parquet write time using storekey order-preserving encoding
- Encodes non-null sort schema tag columns as (ordinal: u8, value: str) pairs, then appends (ordinal: u8, timeseries_id: i64) as the final discriminator
- Enables DataFusion's streaming AggregateExec and BoundedWindowAggExec with O(1) memory instead of O(N) hash tables
- Reorders columns (reorder_columns) for optimal streaming read
- Fixes a create_nullable_dict_array bug: dictionary keys now correctly index into unique values (was using the original array index, causing panics for mixed null/non-null inputs)
- timeseries_id is mandatory — batches without it are rejected as malformed
- ParquetWriter::new returns Result — invalid sort fields are a hard error, not a silent degradation
- nulls_first=false so sorted_series keys are monotonic with physical sort order

Stacked on top of #6287 (column ordering) and timeseries_id work.
Design
Based on the Sorted Series Column design doc:
Null tag columns are skipped (no ordinal or value emitted). The ordinal prefix prevents cross-column byte collisions for sparse schemas.
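The cross-column collision the ordinal prefix prevents can be shown directly (hypothetical helpers, not the PR's API): with a sparse schema, a naive encoding that skips nulls without tagging each value by its column ordinal cannot tell ("b", NULL) apart from (NULL, "b").

```rust
/// Naive encoding: skipped nulls, no ordinal prefix. Ambiguous for
/// sparse schemas because the column position is lost.
fn naive_key(tags: &[Option<&str>]) -> Vec<u8> {
    let mut key = Vec::new();
    for tag in tags.iter().flatten() {
        key.extend_from_slice(tag.as_bytes());
        key.push(0x00); // terminator only
    }
    key
}

/// Ordinal-prefixed encoding: each present value carries its column
/// position, so sparse rows stay distinguishable.
fn ordinal_key(tags: &[Option<&str>]) -> Vec<u8> {
    let mut key = Vec::new();
    for (ordinal, tag) in tags.iter().enumerate() {
        if let Some(v) = tag {
            key.push(ordinal as u8);
            key.extend_from_slice(v.as_bytes());
            key.push(0x00);
        }
    }
    key
}

fn main() {
    let row_a = [Some("b"), None]; // value "b" in column 0
    let row_b = [None, Some("b")]; // value "b" in column 1
    assert_eq!(naive_key(&row_a), naive_key(&row_b));     // collision
    assert_ne!(ordinal_key(&row_a), ordinal_key(&row_b)); // distinct keys
}
```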
timeseries_id is always present as the final discriminator — it is the only guaranteed column that distinguishes series with identical tags.

Nulls sort last (nulls_first=false) in the physical sort order, which matches the key encoding: a skipped (null) ordinal produces a shorter key that compares before a present ordinal, and nulls-last ensures the physical row order agrees.

Test plan
- quickwit-parquet-engine: identity, discrimination, sort-order, null handling, stability, Parquet round-trip, structural ordinal verification, monotonicity, mandatory timeseries_id, proptests
- quickwit-indexing: metrics, sketch, file-backed metastore

🤖 Generated with Claude Code