Skip to content

Add defensive sequence number validation in replicated event persistence#2815

Draft
He-Pin wants to merge 1 commit intoapache:mainfrom
He-Pin:improve/validate-replica-seqnr
Draft

Add defensive sequence number validation in replicated event persistence#2815
He-Pin wants to merge 1 commit intoapache:mainfrom
He-Pin:improve/validate-replica-seqnr

Conversation

@He-Pin
Copy link
Copy Markdown
Member

@He-Pin He-Pin commented Mar 28, 2026

Motivation

When handleExternalReplicatedEventPersist receives a replicated event, it updates seenPerReplica with the event's originSequenceNr without validating it matches the expected next sequence number. This was flagged as a FIXME referencing akka#29259.

If events arrive out of order (e.g., seqNr 7 arrives when expecting 6), seenPerReplica advances past the gap, and the missed event (seqNr 6) will later be filtered as "already seen" — permanently lost.

Modification

  • Replaced the FIXME with a proper defensive validation check
  • Added warning log when originSequenceNr != expectedSeqNr for the replica
  • Documented the caller contracts explaining why only gap events (not duplicates) can reach this validation:
    • onPublishedEvent: Rejects both duplicates and gaps before calling — only exact-match seqNr passes
    • onReplicatedEvent: Filters duplicates via alreadySeen(), but gaps may pass through
  • Preserved backward compatibility: Events are still persisted even on mismatch (rejecting could stall replication)

Design Decision: Why != instead of separate < and > branches

Analysis confirmed that:

  1. The < expectedSeqNr (duplicate) branch would be dead code — callers already filter duplicates
  2. Only the > expectedSeqNr (gap) scenario can actually fire
  3. Using a single != check avoids dead code while remaining defensive against future caller changes

Result

  • Operators now get actionable warning logs when sequence number gaps occur in replication
  • The warning message includes the expected vs actual seqNr and notes that earlier events may be skipped
  • No behavior change for normal operation (exact-match events processed silently)
  • All 18 replicated event tests pass (ReplicatedEventSourcingSpec + ReplicatedEventPublishingSpec)

References

  • Resolves FIXME from akka#29259 (which is now Apache licensed)

Replace the FIXME (akka#29259) comment in handleExternalReplicatedEventPersist
with a proper defensive validation of the replica sequence number before
updating seenPerReplica.

The validation logs a warning when the incoming event's originSequenceNr
does not match the expected next sequence number for that replica. This
covers the gap scenario where events from a replica may arrive out of
order via the replication stream (onReplicatedEvent path). The event is
still persisted for backward compatibility — rejecting it could stall
the replication stream.

Key design decisions (confirmed by cross-review from GPT-5.4 and Sonnet 4.6):
- Only gap detection (seqNr > expected) can fire from current callers;
  onPublishedEvent filters both duplicates and gaps before calling.
  onReplicatedEvent filters duplicates via alreadySeen() but allows gaps.
- Uses != check (not separate < and > branches) to avoid dead code.
- Warning message includes the advancing seqNr to help operators diagnose
  potential event loss.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@He-Pin He-Pin force-pushed the improve/validate-replica-seqnr branch from afbf92c to 14f3ab7 Compare March 28, 2026 12:50
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant