Skip to content

Feature/query by row#745

Open
hongzhi-gao wants to merge 8 commits intoapache:developfrom
hongzhi-gao:feature/query-by-row
Open

Feature/query by row#745
hongzhi-gao wants to merge 8 commits intoapache:developfrom
hongzhi-gao:feature/query-by-row

Conversation

@hongzhi-gao
Copy link
Contributor

@hongzhi-gao hongzhi-gao commented Mar 17, 2026

Summary

Add queryByRow(paths/table, offset, limit) for both tree and table model. Results are equivalent to “full query then skip first offset rows and take at most limit rows,” but offset/limit are pushed down so that Chunk/Page-level skipping avoids decoding unnecessary data where possible.

Changes

Tree model

  • API: TsFileReader::queryByRow(path_list, offset, limit) / TsFileTreeReader::queryByRow(devices, measurements, offset, limit).
  • Pushdown: Single-path: set_row_range(offset, limit) on SSI → Chunk/Page skipped by count. Multi-path: offset/limit applied in merge loop; min_time_hint used to skip stale Chunks.
  • Tests: Correctness (no offset/limit, offset only, limit only, offset+limit, boundaries, multi-path merge) + QueryByRowFasterThanManualNext (timing: queryByRow faster than full query + manual next, 5% tolerance).

Table model

  • API: TsFileReader::queryByRow(table_name, column_names, offset, limit).
  • Pushdown:
    • Device: skip whole device when remaining_offset >= device_row_count (Dense (in this codebase) means: within one device, every queried column has the same number of rows and the same timestamps.).
    • SSI: when dense, set_row_range(offset, limit) on each column’s SSI → Chunk/Page skip by count.
    • TsBlock: when sparse or not fully consumed at SSI, offset/limit applied in merge loop.
  • Tests: Correctness (single/multi device, offset/limit, boundaries, equivalence with manual skip, SSI-level pushdown) + QueryByRowFasterThanManualNext (same timing check as tree).

Review focus

  • Semantics: queryByRow(offset, limit) matches “full query + skip offset + take limit” (existing equivalence tests).
  • Performance: New timing tests require queryByRow to be no slower than manual next within 5% (min of 5 runs); confirms pushdown is used in practice.

@hongzhi-gao hongzhi-gao deleted the feature/query-by-row branch March 17, 2026 09:23
@hongzhi-gao hongzhi-gao restored the feature/query-by-row branch March 17, 2026 09:24
@hongzhi-gao hongzhi-gao reopened this Mar 17, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant