fix(runs): honor Offset in ListActions so WatchActions loads all child actions#7585
Draft
pingsutw wants to merge 8 commits into
Draft
fix(runs): honor Offset in ListActions so WatchActions loads all child actions#7585pingsutw wants to merge 8 commits into
pingsutw wants to merge 8 commits into
Conversation
…d actions ListActions only implemented keyset (CursorToken) pagination and silently ignored the Offset field. WatchActions.listAndSendAllActions pages by Offset (offset += pageSize), so every iteration ran the identical query and re-read the same first page. Effects: - only the first pageSize (100) actions ever entered the run-state tree, capping children_phase_counts at 100 regardless of the real child count (observed with a 10k-element map task). - for runs with more than one page, len(batch) < pageSize was never true, so the initial-snapshot loop never returned and the live-update loop was never reached. Add the OFFSET clause to ListActions when Offset > 0. Regression test asserts offset shifts the page window and that walking by offset covers all actions and terminates. Signed-off-by: Kevin Su <pingsutw@apache.org>
Contributor
There was a problem hiding this comment.
Pull request overview
This PR fixes WatchActions getting stuck and undercounting children_phase_counts by making the runs action repository honor ListResourceInput.Offset during ListActions pagination.
Changes:
- Add
OFFSETsupport to theListActionsSQL query wheninput.Offset > 0. - Add a regression test ensuring offset-based pagination returns the expected window and can walk the full result set.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
runs/repository/impl/action.go |
Applies OFFSET in the ListActions query so offset pagination actually advances across pages. |
runs/repository/impl/action_test.go |
Adds a regression test validating Offset shifts the page window and supports full traversal. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com> Signed-off-by: Kevin Su <pingsutw@gmail.com>
Comment on lines
+355
to
+361
| if input.CursorToken != "" && input.Offset > 0 { | ||
| return nil, fmt.Errorf("CursorToken and Offset are mutually exclusive") | ||
| } | ||
| if input.Offset > 0 { | ||
| queryBuilder.WriteString(" OFFSET ?") | ||
| args = append(args, input.Offset) | ||
| } |
…tions A map task bulk-creates thousands of children with identical created_at. listAndSendAllActions paged the snapshot by Offset sorted on created_at alone; OFFSET over a non-unique ORDER BY has undefined order among ties, so pages overlap/skip and the run-state tree loses rows -- children_phase_counts came up short (e.g. 4.8K instead of 20000). Add a deterministic 'name' tiebreaker (unique within a run) to the sort so offset paging is stable. Signed-off-by: Kevin Su <pingsutw@apache.org>
Comment on lines
1324
to
1327
| SortParameters: []interfaces.SortParameter{ | ||
| impl.NewSortParameter("created_at", interfaces.SortOrderAscending), | ||
| impl.NewSortParameter("name", interfaces.SortOrderAscending), | ||
| }, |
Comment on lines
+349
to
+357
| // Offset-based pagination. Callers either page by CursorToken (keyset) or by | ||
| // Offset; the two are mutually exclusive. | ||
| if input.CursorToken != "" && input.Offset > 0 { | ||
| return nil, fmt.Errorf("CursorToken and Offset are mutually exclusive") | ||
| } | ||
| if input.Offset > 0 { | ||
| queryBuilder.WriteString(" OFFSET ?") | ||
| args = append(args, input.Offset) | ||
| } |
Comment on lines
+521
to
+523
| assert.Len(t, seen, total, "offset paging over tied created_at must cover all actions exactly once") | ||
| assert.True(t, sort1.IsSorted(sort1.StringSlice(ordered)), | ||
| "the name tiebreaker must give a total order so pages don't overlap/skip") |
2 tasks
listAndSendAllActions now passes a (created_at ASC, name ASC) sort; update TestListAndSendAllActionsUsesAscendingSort to assert both parameters. Signed-off-by: Kevin Su <pingsutw@apache.org>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Why are the changes needed?
WatchActionsreturnedchildren_phase_countscapped at 100 no matter the real child count (reproduced with a 10,000-element map task).Root cause is a pagination contract mismatch:
WatchActions.listAndSendAllActions(runs/service/run_service.go:1262) loads the initial snapshot by offset paging:ListActions(runs/repository/impl/action.go) only implemented keyset paging (CursorToken) and silently ignoredOffset— there was noOFFSETclause in the query.So every iteration ran the identical query and re-read the same first page. Consequences:
pageSize(100) actions ever entered the run-state tree, soattachChild/ChildPhaseCountsonly ever counted 100 children →children_phase_countscapped at 100.len(batch) < pageSizewas never true, solistAndSendAllActionsnever returned and the live-update loop (updatesCh) was never reached — the watch got stuck replaying the first page. (Runs with <100 actions worked, which is why this only showed up at scale.)The
ListResourceInputstruct already documentsOffsetas a supported, cursor-exclusive option (runs/repository/interfaces/common.go); it just wasn't wired into the query.What changes were proposed in this pull request?
Add the
OFFSETclause toListActionswhenOffset > 0:This makes the existing offset loop walk all actions and terminate. (Longer term the snapshot loop could move to the keyset
CursorTokenthe repo also supports, but that needs the cursor direction fixed for theASCsort plus a(created_at, name)tiebreaker — out of scope here.)How was this patch tested?
Added
TestListActionsOffsetPagination(runs/repository/impl/action_test.go): creates 150 root actions and asserts thatOffset=50starts atr050(notr000), andVerified it fails without the fix (the
Offset=50page returns the first page →r000 != r050) and passes with it. Fullruns/repository/implsuite passes.Labels
Check all the applicable boxes