
[WIP] Add timeseries / observability benchmarks #9017

Closed
alamb wants to merge 8 commits into apache:main from alamb:alamb/ts_bench

@alamb commented Jan 27, 2024

Which issue does this PR close?

Closes #8791

Rationale for this change

Several DataFusion users (such as IOx) store observability data, which is often characterized by low-cardinality string data encoded as dictionaries. While the current parquet_filter pushdown benchmarks (TODO LINK) cover this pattern, we don't have an end-to-end test that does.

This has caused problems: when we have made changes such as #7647 that should improve the performance of these queries, we had no reproducible way to measure their impact, and could not evaluate whether a change was beneficial enough to warrant the additional code complexity.
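To make the dictionary-encoding pattern concrete, here is a minimal, standard-library-only Rust sketch. It is an illustration under simplifying assumptions, not Arrow's `DictionaryArray` or DataFusion's grouping code, and `dictionary_encode` / `count_by_key` are hypothetical names. Each distinct string is stored once and every row holds a small integer key, so a group-by on a low-cardinality column can accumulate per key instead of hashing and comparing the full string for every row.

```rust
use std::collections::HashMap;

/// Hypothetical helper: dictionary-encode a string column. Each distinct value
/// is stored once in `values`; each row stores a u32 key pointing into `values`.
fn dictionary_encode(column: &[&str]) -> (Vec<String>, Vec<u32>) {
    let mut values: Vec<String> = Vec::new();
    let mut index: HashMap<&str, u32> = HashMap::new();
    let mut keys = Vec::with_capacity(column.len());
    for &s in column {
        // Assign the next key the first time a value is seen, reuse it afterwards.
        let key = *index.entry(s).or_insert_with(|| {
            values.push(s.to_string());
            (values.len() - 1) as u32
        });
        keys.push(key);
    }
    (values, keys)
}

/// Hypothetical helper: count rows per distinct value by indexing a small
/// array with the integer keys, without touching the strings per row.
fn count_by_key(values: &[String], keys: &[u32]) -> Vec<(String, usize)> {
    let mut counts = vec![0usize; values.len()];
    for &k in keys {
        counts[k as usize] += 1;
    }
    values.iter().cloned().zip(counts).collect()
}

fn main() {
    // A low-cardinality column, e.g. HTTP status codes from an access log.
    let status = ["200", "200", "404", "200", "500", "404"];
    let (values, keys) = dictionary_encode(&status);
    println!("dictionary = {:?}, keys = {:?}", values, keys);
    for (value, count) in count_by_key(&values, &keys) {
        println!("{value}: {count}");
    }
}
```

The fewer distinct values a column has, the smaller the dictionary and the cheaper per-key aggregation becomes relative to operating on the raw strings, which is the kind of effect an end-to-end benchmark on this data would let us measure.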

In addition, in systems such as IOx the data is very often sorted, and the sort order matters a great deal for performance. However, DataFusion's existing benchmarks do not include any pre-sorted data.
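One general reason sort order matters can be sketched in a few lines of standard-library Rust (the function name `count_sorted_runs` is hypothetical, and this is a generic technique, not a description of how DataFusion plans aggregations over sorted input). When input is already sorted on the grouping key, equal keys are adjacent, so groups can be counted in a single streaming pass with no hash table, and each group is final as soon as the key changes.

```rust
/// Hypothetical helper: count rows per group, assuming the input is already
/// sorted by the group key. Only the current (last) group is ever updated.
fn count_sorted_runs(sorted_keys: &[&str]) -> Vec<(String, usize)> {
    let mut out: Vec<(String, usize)> = Vec::new();
    for &k in sorted_keys {
        // Same key as the previous row: extend the current group.
        let same_group = out.last().map_or(false, |(last, _)| last.as_str() == k);
        if same_group {
            out.last_mut().unwrap().1 += 1;
        } else {
            // Key changed: the previous group is complete; start a new one.
            out.push((k.to_string(), 1));
        }
    }
    out
}

fn main() {
    // Service names from trace spans, pre-sorted as they might be in a sorted file.
    let services = ["api", "api", "db", "web", "web", "web"];
    for (service, count) in count_sorted_runs(&services) {
        println!("{service}: {count}");
    }
}
```

The result is only correct when the input really is sorted; on unsorted input the same key would show up as several separate runs, which is why benchmarks need genuinely pre-sorted data to exercise this kind of path.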

What changes are included in this PR?

  1. Add a DataFusion-specific dataset to model common patterns in timeseries data, specifically HTTP access logs, metrics, and tracing data. It uses the same generator as several other parts of DataFusion.
  2. Add an XXX benchmark to dfbench, runnable via bench.sh, along with several queries.

Are these changes tested?

All tests

Are there any user-facing changes?

No

TODO

  • add ticket / extend to model logging data as well

@github-actions

Thank you for your contribution. Unfortunately, this pull request is stale because it has been open 60 days with no activity. Please remove the stale label or comment or this will be closed in 7 days.

@github-actions added the Stale label Apr 13, 2024
@github-actions closed this Apr 20, 2024

Labels

Stale (PR has not had any activity for some time)

Development

Successfully merging this pull request may close these issues.

Add test coverage for grouping on dictionary encoded columns