Skip to content

feat: add sorted_series column for DataFusion streaming aggregation#6290

Merged
g-talbot merged 20 commits intomainfrom
gtt/sorted-series-key
Apr 21, 2026
Merged

feat: add sorted_series column for DataFusion streaming aggregation#6290
g-talbot merged 20 commits intomainfrom
gtt/sorted-series-key

Conversation

@g-talbot
Copy link
Copy Markdown
Contributor

@g-talbot g-talbot commented Apr 10, 2026

Summary

  • Compute a composite, lexicographically sortable sorted_series binary column at Parquet write time using storekey order-preserving encoding
  • For each row, encodes non-null sort schema tag columns as (ordinal: u8, value: str) pairs, then appends (ordinal: u8, timeseries_id: i64) as final discriminator
  • Identical timeseries always produce identical byte keys regardless of timestamp or value, enabling DataFusion's streaming AggregateExec and BoundedWindowAggExec with O(1) memory instead of O(N) hash tables
  • Column is placed after sort columns in physical layout (Phase 1b in reorder_columns) for optimal streaming read
  • Fixes create_nullable_dict_array bug: dictionary keys now correctly index into unique values (was using original array index, causing panics for mixed null/non-null inputs)
  • timeseries_id is mandatory — batches without it are rejected as malformed
  • ParquetWriter::new returns Result — invalid sort fields are a hard error, not a silent degradation
  • Writer sorts with nulls_first=false so sorted_series keys are monotonic with physical sort order

Stacked on top of #6287 (column ordering) and timeseries_id work.

Design

Based on the Sorted Series Column design doc:

Key structure for sort schema [metric_name(0), service(1), ..., host(5), timeseries_id(6)]:

┌──────────┬────────────────┬──────────┬──────────────┬──────────┬─────────────────┐
│ ordinal 0│ "cpu.usage"    │ ordinal 1│ "api"        │ ordinal 6│ timeseries_id   │
│ (u8)     │ (storekey str) │ (u8)     │ (storekey)   │ (u8)     │ (storekey i64)  │
└──────────┴────────────────┴──────────┴──────────────┴──────────┴─────────────────┘

Null tag columns are skipped (no ordinal or value emitted). The ordinal prefix prevents cross-column byte collisions for sparse schemas. timeseries_id is always present as the final discriminator — it is the only guaranteed column that distinguishes series with identical tags.

Nulls sort last (nulls_first=false) in the physical sort order, which matches the key encoding: a skipped (null) ordinal produces a shorter key that compares before a present ordinal, and nulls-last ensures the physical row order agrees.

Test plan

  • 216 tests pass on quickwit-parquet-engine (identity, discrimination, sort-order, null handling, stability, Parquet round-trip, structural ordinal verification, monotonicity, mandatory timeseries_id, proptests)
  • 3 E2E pipeline tests pass on quickwit-indexing (metrics, sketch, file-backed metastore)
  • Clippy clean (zero warnings), formatted, no unused deps
  • License headers pass
  • Docs compile

🤖 Generated with Claude Code

@g-talbot g-talbot changed the base branch from main to gtt/sorted-series-column April 10, 2026 14:12
@g-talbot g-talbot changed the base branch from gtt/sorted-series-column to main April 10, 2026 14:12
@g-talbot g-talbot changed the base branch from main to gtt/sorted-series-column April 10, 2026 14:14
@g-talbot g-talbot force-pushed the gtt/sorted-series-column branch from 60d859c to 9522326 Compare April 10, 2026 14:17
@g-talbot g-talbot force-pushed the gtt/sorted-series-key branch from 53bc3e0 to cb1b4d2 Compare April 10, 2026 14:25
@g-talbot g-talbot force-pushed the gtt/sorted-series-column branch from 9522326 to b0344ba Compare April 10, 2026 14:42
@g-talbot g-talbot force-pushed the gtt/sorted-series-key branch from 9a8b9a5 to 58f4810 Compare April 10, 2026 14:42
@g-talbot g-talbot requested a review from mattmkim April 10, 2026 16:05
Comment thread quickwit/quickwit-parquet-engine/src/sorted_series/mod.rs Outdated
@mattmkim
Copy link
Copy Markdown
Contributor

@codex review

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 652d128d5e

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread quickwit/quickwit-parquet-engine/src/sorted_series/mod.rs
Comment thread quickwit/quickwit-parquet-engine/src/sorted_series/mod.rs Outdated
@mattmkim mattmkim force-pushed the gtt/sorted-series-column branch 2 times, most recently from 0a0adb5 to 720f498 Compare April 15, 2026 21:27
@mattmkim
Copy link
Copy Markdown
Contributor

commit history is messed up, pushed existing history to https://github.com/quickwit-oss/quickwit/tree/matthew.kim/gtt/sorted-series-key just in case, going to cherry pick the commits we want

Base automatically changed from gtt/sorted-series-column to main April 15, 2026 22:27
@mattmkim mattmkim force-pushed the gtt/sorted-series-key branch from 652d128 to 069a5a1 Compare April 15, 2026 22:30
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 069a5a1871

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread quickwit/quickwit-parquet-engine/src/storage/writer.rs
Comment thread quickwit/quickwit-parquet-engine/src/sorted_series/mod.rs Outdated
g-talbot and others added 9 commits April 21, 2026 09:12
Compute a composite, lexicographically sortable binary column
(sorted_series) at Parquet write time using storekey order-preserving
encoding. For each row the key encodes:

  1. Non-null sort schema tag columns as (ordinal: u8, value: str)
  2. timeseries_id (i64) as final discriminator

Identical timeseries always produce identical byte keys regardless of
timestamp or value, enabling DataFusion's streaming AggregateExec and
BoundedWindowAggExec with O(1) memory instead of O(N) hash tables.

Also fixes create_nullable_dict_array which used the original array
index as dictionary key instead of the position in the unique values
array, causing out-of-bounds panics for mixed null/non-null inputs.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Without the ordinal, the timeseries_id bytes could collide with a
subsequent tag column's ordinal+string encoding. Every component in
the key now consistently gets an ordinal prefix from its sort schema
position.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add tests that assert:
- timeseries_id gets ordinal 6 prefix (its sort schema position)
- key length is exact: ordinal(1) + str(2) + ordinal(1) + i64(8) = 12
- when timeseries_id is absent, no trailing ordinal appears

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Writes a 6-row batch with 4 distinct series (including null tags)
through the ParquetWriter pipeline, reads back, and verifies:

- 4 distinct keys produced (series identity)
- series with 3 rows produces 3 identical keys
- null host differs from present host (ordinal skipping)
- all-null tags differ from partial-null tags
- ordinal bytes are correct (0x00 for metric_name, 0x01 for service,
  0x06 for timeseries_id) even when intermediate tags are null
- equal keys are contiguous after sort (streaming aggregation ready)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Regenerate storekey entry via dd-rust-license-tool (correct authors)
- Fix 4 rustfmt nightly formatting diffs in sorted_series tests

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@g-talbot g-talbot force-pushed the gtt/sorted-series-key branch from 069a5a1 to 7444e26 Compare April 21, 2026 13:14
- Remove sea-query (moved to quickwit-metastore with InsertableParquetSplit)
- Remove tokio dev-dependency (unused)
- Fix extra blank line in writer.rs

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
g-talbot and others added 10 commits April 21, 2026 09:52
…ys, zonemap)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…lently skipping

Tag columns in the sort schema must be string-typed (Dict or Utf8).
Previously, UInt8 and unknown types returned None, silently dropping
them from the sorted_series key and reducing discrimination. Now
returns an error for any non-string type.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…l sort order

encode_row_key skips null columns, producing shorter keys that compare
before longer keys with the same prefix. With nulls_first=true, a row
with a null tag sorted before a non-null tag physically, but its key
compared after — breaking the monotonicity invariant needed for
DataFusion streaming aggregation.

Adds a monotonicity assertion to the integration test: sorted_series
values must be non-decreasing in the writer's physical output.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
timeseries_id is the only guaranteed discriminator for series identity.
Without it, different series sharing the same tags (e.g., same metric
and tags but different metric_type) would collapse onto the same
sorted_series key, causing incorrect group merges in streaming
aggregation.

- resolve_key_columns now returns Result and errors if timeseries_id
  is missing from the sort schema or the batch
- encode_row_key takes &KeyColumn (not Option) for timeseries_id
- All test helpers and inline test batches include timeseries_id
- Two tests converted to verify error on missing timeseries_id

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…sage

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The metrics and sketch pipeline E2E tests construct batches directly
(bypassing OTLP ingest which computes timeseries_id). Now that
sorted_series encoding requires timeseries_id, these test batches
need it too.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…elds

Previously, ParquetWriter::new silently degraded to empty sort order on
an unparseable sort_fields string (log + continue). This masked
configuration errors and was inconsistent with prepare_write, which
propagated parse errors from append_sorted_series_column.

Now both paths fail consistently: ParquetWriter::new returns Result and
ParquetSplitWriter::new propagates it. Since sort_fields comes from
ProductType::default_sort_fields (not user input), a parse failure is
a programming error that should surface immediately.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…needs it

The enum and its impl block were added anticipating use by row_keys
and zonemap PRs, but are dead code in this PR. Remove rather than
suppress with #[allow(dead_code)].

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@g-talbot g-talbot merged commit 68990db into main Apr 21, 2026
8 checks passed
@g-talbot g-talbot deleted the gtt/sorted-series-key branch April 21, 2026 19:39
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants