TestJumpToDateEndpoint: Guard the parallel boundary subtests against millisecond collisions.#872
Open
jevolk wants to merge 1 commit intomatrix-org:mainfrom
Open
Conversation
…t millisecond collisions. Both subtests sample `time.Now()` and immediately probe a homeserver that stamps events using its own millisecond clock; when the two readings land in the same millisecond, `/timestamp_to_event`'s inclusive boundary semantics return an event the subtest does not want. `should_find_event_after_given_timestamp` races the trailing createRoom state event ahead of the message that a forward search is expected to find; `should_find_nothing_before_the_earliest_timestamp` races `m.room.create` against a backward search expected to find nothing. Pause around the relevant samples so the next homeserver stamp lands in a strictly later millisecond. Both flakes have been observed across multiple homeserver implementations on otherwise green runs. Closes matrix-org#868. Signed-off-by: Jason Volk <jason@zemos.net>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Closes #868.
The two
parallelboundary subtestsshould_find_event_after_given_timestampandshould_find_nothing_before_the_earliest_timestampflake against any homeserver fast enough to handle acreateRoomround-trip inside a single millisecond. The race is in the test, not the implementation:/timestamp_to_eventis millisecond-granular by spec and resolves the boundary inclusively (forward picks the earliest event withts >= query, backward picks the latest withts <= query), so when atime.Now()sample collides with anorigin_server_tsthe homeserver just stamped, the inclusive match returns an event the subtest does not want.should_find_event_after_given_timestampcreateTestRoomreadstime.Now()immediately afterMustCreateRoomreturns. The trailing state event of thepublic_chatpreset's createRoom batch (typicallym.room.guest_access, aftercreate,member/join,power_levels,join_rules,history_visibility) lands in the same millisecond as that sample on a fast deploy. The forward search anchored ontimeBeforeEventAthen iterates events withts >= samplefrom the earliest, hits the trailing state event beforeMessage A, and the assertion fails with the wrongevent_id.should_find_nothing_before_the_earliest_timestampThe subtest samples
timeBeforeRoomCreation := time.Now()and immediately callscreateTestRoom. Them.room.createevent the homeserver writes in response is stamped at the same millisecond. The backward search at that boundary inclusively returns the create event when the test wants 404.Solution
The race direction is different for the two subtests, so the placement of the guard differs:
createTestRoom, the offending event has already been stamped beforetime.Now()is called. The sleep goes before the sample (betweenMustCreateRoomreturning andtimeBeforeEventA := time.Now()), advancing the sample past the trailing state event's millisecond.time.Now(). The sleep goes after the sample (betweentimeBeforeRoomCreation := time.Now()andcreateTestRoom), pushingm.room.createinto a strictly later millisecond.A 2 ms pause is long enough to step past the 1 ms clock resolution exposed on every supported platform and short enough to be invisible against the normal homeserver round-trip. The constant
tsBoundaryGuardand a paragraph above it document the invariant so neither sleep gets removed as cosmetic later.The wire contract is unchanged:
/timestamp_to_eventcontinues to be millisecond-granular and continues to resolve the boundary inclusively. Only the test pacing changes.Empirical evidence
A local soak against tuwunel (
benchbuild,complement_count=N,complement_parallel=1, single docker invocation per batch, freshOldDeployper iteration):should_find_event_after_given_timestampshould_find_nothing_before_the_earliest_timestamp89 flakes vs 0 across the 200 paired trials. The unpatched tester image is the prior
tuwunel-changesHEAD (9e8d447); the patched tester image is the same Complement plus this commit. The testee binary, the homeserver deployment shape (hs1 + hs2 + AS bridge), and the test invocation flags are identical between batches.The same flake fired three times in matrix-construct/tuwunel CI in the days before we routed the two subtests onto a
complement_skipallowlist:25346770228, job74331724982(2026-05-04): both subtests failed.25363319978, job74376393945(2026-05-05):should_find_event_after_given_timestampfailed.25367139913, job74452002935(2026-05-05): both subtests failed.The patch only adjusts
tests/room_timestamp_to_event_test.go. Theshould_not_be_able_to_query_a_*_room_you_are_not_a_member_ofsubtests sample the sametimeBeforeRoomCreationpattern but assert HTTP 403, which the homeserver returns regardless of which event a backward search would surface, so the boundary does not matter there. Theparallel/federation/when_looking_backwards_before_the_room_was_created,_should_be_able_to_find_event_that_was_importedsubtest has the same pattern but its dominant failure mode is federation backfill, not the boundary; the guard is harmless if added later but is out of scope here.The matching guard is backported to the matrix-construct fork at
matrix-construct/complement@28df927on itstuwunel-changesbranch and is wired into matrix-construct/tuwunel CI (complement_refindocker/Dockerfile.complementis bumped to that hash, and the correspondingcomplement_skipentries are dropped) so the patched path is exercised against tuwunel ahead of merge here.Pull Request Checklist