Is your feature request related to a problem or challenge?
This pattern is common:
SELECT c1, c2
FROM t
ORDER BY c3
LIMIT 10
For example we have queries in IOx like the following (this is the same pattern @NGA-TRAN describes on #7162)
SELECT tag, value1, ...
FROM t
WHERE other_column = 'foo'
ORDER BY time
LIMIT 10
Describe the solution you'd like
If the data IS NOT already sorted, what happens today is a plan like
LIMIT(fetch=10)
SORT(sort_exprs=[c3] fetch=10)
SCAN(...)
And the Sort can take partial advantage of the fetch -- and it will be better after @gruuya 's change in #7180
We can probably do better still with a special operator like the following that uses some specialized structure (perhaps some type of heap)
TOPK(fetch=10, sort_exprs=[c3])
SCAN(...)
Describe alternatives you've considered
If the data is already sorted the right way, DataFusion can just read first N values and stop as described on #7162
Additional context
No response
Is your feature request related to a problem or challenge?
This pattern is common:
For example we have queries in IOx like the following (this is the same pattern @NGA-TRAN describes on #7162)
Describe the solution you'd like
If the data IS NOT already sorted, what happens today is a plan like
And the Sort can take partial advantage of the fetch -- and it will be better after @gruuya 's change in #7180
We can probably do better still with a special operator like the following that uses some specialized structure (perhaps some type of heap)
Describe alternatives you've considered
If the data is already sorted the right way, DataFusion can just read first N values and stop as described on #7162
Additional context
No response