
perf(iterator): seek-ahead MVCC version skipping #2303

Open
shaunpatterson wants to merge 1 commit into dgraph-io:main from shaunpatterson:feat/iter-version-skip


@shaunpatterson


Motivation

In parseItem, the iterator skips two kinds of entries one mi.Next() at a time through the whole merge tree (iterator.go): versions newer than readTs, and older duplicates of an already-returned key. For keys with long version chains (e.g. a frequently rewritten posting list or counter), this costs O(versions scanned) per read, with no seek-ahead.

What changed

Two linear version-stepping loops in parseItem become seeks, for forward + non-AllVersions iteration only:

  1. When the current version is > readTs, Seek(KeyWithTs(userKey, readTs)) jumps straight to the first version <= readTs of the same user key instead of stepping over every too-new version.
  2. After yielding the newest visible version of a key, Seek jumps to the next user key (KeyWithTs(userKey+0x00, MaxUint64)) instead of stepping through every older version.
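The two seek targets above can be sketched in isolation. This is a minimal model, not the code in this PR: keyWithTs and compareKeys are local helpers written on the assumption that they mirror the layout of y.KeyWithTs and y.CompareKeys (user key followed by 8 big-endian bytes of MaxUint64 - ts), and seek is a hypothetical stand-in for the merge iterator's Seek over a sorted in-memory slice.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
	"math"
	"sort"
)

// keyWithTs appends 8 big-endian bytes of (MaxUint64 - ts) to the user key,
// so newer versions of the same user key sort first (assumed layout).
func keyWithTs(userKey []byte, ts uint64) []byte {
	out := make([]byte, len(userKey)+8)
	copy(out, userKey)
	binary.BigEndian.PutUint64(out[len(userKey):], math.MaxUint64-ts)
	return out
}

// parseTs recovers the version timestamp from the 8-byte suffix.
func parseTs(key []byte) uint64 {
	return math.MaxUint64 - binary.BigEndian.Uint64(key[len(key)-8:])
}

// compareKeys orders by user key first, then by the timestamp suffix.
func compareKeys(a, b []byte) int {
	if c := bytes.Compare(a[:len(a)-8], b[:len(b)-8]); c != 0 {
		return c
	}
	return bytes.Compare(a[len(a)-8:], b[len(b)-8:])
}

// seek returns the index of the first key >= target in a sorted slice.
// Hypothetical stand-in for the merge iterator's Seek.
func seek(keys [][]byte, target []byte) int {
	return sort.Search(len(keys), func(i int) bool {
		return compareKeys(keys[i], target) >= 0
	})
}

// tooNewTarget is the seek target for case 1: the first version <= readTs.
func tooNewTarget(userKey []byte, readTs uint64) []byte {
	return keyWithTs(userKey, readTs)
}

// nextUserKeyTarget is the seek target for case 2: userKey+0x00 at MaxUint64.
func nextUserKeyTarget(userKey []byte) []byte {
	next := make([]byte, len(userKey)+1) // trailing byte is already 0x00
	copy(next, userKey)
	return keyWithTs(next, math.MaxUint64)
}

// buildKeys returns 100 versions of "k1" (ts 100 down to 1), then "k10"@7.
func buildKeys() [][]byte {
	var keys [][]byte
	for ts := uint64(100); ts >= 1; ts-- {
		keys = append(keys, keyWithTs([]byte("k1"), ts))
	}
	return append(keys, keyWithTs([]byte("k10"), 7))
}

func main() {
	keys := buildKeys()
	i := seek(keys, tooNewTarget([]byte("k1"), 40))
	fmt.Println(i, parseTs(keys[i])) // 60 40: one seek replaces 60 Next() steps
	j := seek(keys, nextUserKeyTarget([]byte("k1")))
	fmt.Println(j, string(keys[j][:len(keys[j])-8])) // 100 k10
}
```

In this model each seek is a binary search, so the cost of skipping a version chain no longer grows linearly with its length.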

The seek target uses userKey+0x00 because it is the smallest user key strictly greater than userKey: the seek lands past every version of userKey but never overshoots prefix-extension keys (for "k1" it still visits "k10", since "k1\x00" < "k10"). This holds because y.CompareKeys compares the user-key portion before the timestamp suffix.
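To make the ordering argument concrete, the sketch below compares the append-0x00 target with the alternative of incrementing the last byte. The helpers are local to this example and assume the same key layout as y.KeyWithTs and the same user-key-first ordering as y.CompareKeys; incrementLast is a hypothetical illustration of the rejected approach and assumes the last byte is not 0xff.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
	"math"
)

// keyWithTs appends 8 big-endian bytes of (MaxUint64 - ts) to the user key
// (assumed to match the y.KeyWithTs layout).
func keyWithTs(userKey []byte, ts uint64) []byte {
	out := make([]byte, len(userKey)+8)
	copy(out, userKey)
	binary.BigEndian.PutUint64(out[len(userKey):], math.MaxUint64-ts)
	return out
}

// compareKeys compares the user-key portion first, then the timestamp suffix.
func compareKeys(a, b []byte) int {
	if c := bytes.Compare(a[:len(a)-8], b[:len(b)-8]); c != 0 {
		return c
	}
	return bytes.Compare(a[len(a)-8:], b[len(b)-8:])
}

// appendZero returns userKey+0x00, the smallest user key greater than userKey.
func appendZero(userKey []byte) []byte {
	out := make([]byte, len(userKey)+1) // trailing byte is already 0x00
	copy(out, userKey)
	return out
}

// incrementLast returns userKey with its last byte incremented.
// Hypothetical alternative; assumes a non-empty key whose last byte is not 0xff.
func incrementLast(userKey []byte) []byte {
	out := append([]byte(nil), userKey...)
	out[len(out)-1]++
	return out
}

// overshoots reports whether a seek to (targetUserKey, MaxUint64) would skip
// the candidate user key, i.e. whether the candidate sorts before the target.
func overshoots(targetUserKey, candidate []byte) bool {
	target := keyWithTs(targetUserKey, math.MaxUint64)
	return compareKeys(keyWithTs(candidate, 1), target) < 0
}

func main() {
	k1, k10 := []byte("k1"), []byte("k10")
	fmt.Println(overshoots(appendZero(k1), k10))    // false: "k1\x00" < "k10"
	fmt.Println(overshoots(incrementLast(k1), k10)) // true: "k10" < "k2"
}
```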

Compatibility & correctness

Pure optimization — iteration results are unchanged. Reverse and AllVersions paths are untouched. A differential test compares seek-skip against forced step-skip byte-for-byte over a matrix of {reverse, AllVersions, prefixes, SinceTs, readTs} with deleted/expired versions and data spread across memtable + LSM levels.
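The idea behind the differential test can be illustrated with a much smaller in-memory model. This is only a sketch of the technique, not the test in this PR: it covers forward, newest-visible-version iteration over randomized versions and ignores deletes, expiry, prefixes, SinceTs, and the memtable/LSM split. The version type and the stepScan, seekScan, and differential functions are all hypothetical names.

```go
package main

import (
	"fmt"
	"math"
	"math/rand"
	"reflect"
	"sort"
)

// version models one entry: user key plus commit timestamp.
type version struct {
	key string
	ts  uint64
}

// before orders entries by user key ascending, then timestamp descending.
func before(a, b version) bool {
	if a.key != b.key {
		return a.key < b.key
	}
	return a.ts > b.ts
}

// seek returns the index of the first entry that does not sort before target.
func seek(data []version, target version) int {
	return sort.Search(len(data), func(i int) bool { return !before(data[i], target) })
}

// stepScan is the reference path: it visits every entry one step at a time.
func stepScan(data []version, readTs uint64) []version {
	var out []version
	last, have := "", false
	for _, v := range data {
		if v.ts > readTs || (have && v.key == last) {
			continue // too new, or an older duplicate of a yielded key
		}
		out = append(out, v)
		last, have = v.key, true
	}
	return out
}

// seekScan replaces both skips with seeks.
func seekScan(data []version, readTs uint64) []version {
	var out []version
	for i := 0; i < len(data); {
		v := data[i]
		if v.ts > readTs {
			i = seek(data, version{v.key, readTs}) // first version <= readTs
			continue
		}
		out = append(out, v)
		i = seek(data, version{v.key + "\x00", math.MaxUint64}) // next user key
	}
	return out
}

// randomData builds n distinct sorted versions over keys that include
// prefix extensions such as "k1" / "k10" / "k1\x00". Requires n <= 250.
func randomData(r *rand.Rand, n int) []version {
	keys := []string{"k", "k1", "k1\x00", "k10", "k2"}
	seen := map[version]bool{}
	var data []version
	for len(data) < n {
		v := version{keys[r.Intn(len(keys))], uint64(1 + r.Intn(50))}
		if !seen[v] {
			seen[v] = true
			data = append(data, v)
		}
	}
	sort.Slice(data, func(i, j int) bool { return before(data[i], data[j]) })
	return data
}

// differential reports whether both paths agree on every randomized round.
func differential(seed int64, rounds int) bool {
	r := rand.New(rand.NewSource(seed))
	for i := 0; i < rounds; i++ {
		data := randomData(r, 120)
		readTs := uint64(r.Intn(60))
		if !reflect.DeepEqual(stepScan(data, readTs), seekScan(data, readTs)) {
			return false
		}
	}
	return true
}

func main() {
	fmt.Println("paths agree:", differential(1, 200))
}
```

Keeping the slow path as the oracle means any divergence in the seek targets shows up as a mismatch rather than as a silently skipped key.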

Testing

go build, go vet, and the full go test . -count=1 run are green; the differential test also passes under -race.

🤖 Generated with Claude Code

https://claude.ai/code/session_01NtGkC4K2J2XYwcAKwjhHbM

Forward, non-AllVersions iteration previously walked the whole version
chain of each user key one mi.Next() at a time: (a) skipping versions
above readTs, and (b) skipping older duplicate versions after yielding a
key. Both are O(versions) linear walks through the merge tree and are
dgraph's #1 read cost on long posting-list version chains.

Replace both with a single Seek:
 - version > readTs: Seek(KeyWithTs(userKey, readTs)) jumps straight to
   the first version <= readTs of the same user key.
 - dedup of older versions: Seek(KeyWithTs(userKey+0x00, MaxUint64))
   jumps to the newest version of the next user key. Appending 0x00 (not
   incrementing the last byte) is required so prefix-extension keys like
   k10 are not overshot when skipping past k1.

Both optimizations are guarded to forward, non-AllVersions iteration;
reverse and AllVersions keep stepping. A forceStepSkip test hook lets a
differential test prove byte-identical iteration vs the step-skip path
over randomized multi-version data (deleted/expired/SinceTs/prefix/
reverse/allVersions, across memtable + LSM levels).

Point Get is already version-safe (db.get seeks to KeyWithTs(key,readTs)
and takes the max version across all tables), so no new method is added;
TestVersionSafePointGet documents and verifies this.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01NtGkC4K2J2XYwcAKwjhHbM
@shaunpatterson shaunpatterson requested a review from a team as a code owner June 22, 2026 18:24