
Refactor transaction timestamp management for cross shard data#313

Merged
bootjp merged 21 commits into main from feature/multi-shard-transaction
Feb 16, 2026

Conversation


bootjp (Owner) commented Feb 15, 2026

No description provided.

Copilot AI review requested due to automatic review settings February 15, 2026 11:42
gemini-code-assist bot (Contributor) commented

Summary of Changes

Hello @bootjp, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request refactors the core transaction timestamp management within the system, enabling robust distributed transaction capabilities. It introduces explicit phases for transactions (prepare, commit, abort) and manages their state using dedicated metadata and internal keys. The changes ensure consistent timestamping across transactional operations and provide mechanisms for automatic lock resolution during data access, significantly enhancing the system's transactional integrity and scalability.

Highlights

  • Distributed Transaction Support: Introduced comprehensive support for distributed transactions across multiple shards, including prewrite, commit, and abort phases, orchestrated by the ShardedCoordinator.
  • Refactored Timestamp Management: Significantly refactored how transaction timestamps are managed, separating raw request timestamping from transactional timestamping, and introducing explicit commit and abort timestamps.
  • Transaction Metadata and Internal Keys: Implemented new data structures (TxnMeta, txnLock, txnIntent) and encoding/decoding logic for transaction metadata, along with a system of internal keys (!txn|lock|, !txn|int|, etc.) to manage transaction state within the key-value store (a sketch of this key scheme follows this list).
  • MVCC Store Enhancements: Updated the MVCC store to correctly handle versioned values, ensuring they are inserted in sorted order by timestamp and improving the alignCommitTS logic.
  • Transaction Lock Resolution: Added logic to ShardStore to automatically resolve transaction locks during reads and scans, determining the transaction's status (committed, rolled back, or pending) and applying the appropriate resolution.
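
A compact sketch helps picture the internal-key scheme above. This is illustrative only: the prefixes are the ones this PR names (kv/txn_keys.go), but the helper shapes are assumptions and may differ from the merged code.

package kv

import "bytes"

const (
    txnLockPrefix   = "!txn|lock|"
    txnIntentPrefix = "!txn|int|"
    txnMetaPrefix   = "!txn|meta|"
)

// txnLockKey embeds the logical user key after the prefix, so that
// routeKey (kv/shard_key.go) can recover it for shard routing.
func txnLockKey(userKey []byte) []byte {
    return append([]byte(txnLockPrefix), userKey...)
}

// isTxnInternalKey reports whether a key belongs to the transaction
// machinery and must be hidden from user-facing reads and scans.
func isTxnInternalKey(key []byte) bool {
    return bytes.HasPrefix(key, []byte("!txn|"))
}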
Changelog
  • adapter/internal.go
    • Imported the bytes package.
    • Refactored stampTimestamps into stampRawTimestamps and stampTxnTimestamps for clearer timestamp assignment logic.
    • Added helper functions forwardedTxnStartTS, forwardedTxnMetaMutation, and fillForwardedTxnCommitTS to manage transaction metadata and commit timestamps for forwarded requests.
  • kv/coordinator.go
    • Imported the sort package.
    • Modified dispatchTxn to incorporate primary key, commit timestamp, and lock TTL into transaction requests.
    • Updated redirect to use the newly introduced primaryKeyForElems for transaction requests.
    • Refactored txnRequests to construct prepare and commit mutations with transaction metadata, primary key, and commit timestamp.
    • Added primaryKeyForElems function to determine the primary key from a list of elements.
    • Added bytesCompare utility function for byte slice comparison.
  • kv/fsm.go
    • Imported the bytes package.
    • Modified Apply to correctly extract the commit timestamp from transaction metadata for commit and abort phases.
    • Added validation in handleRawRequest to prevent mutations of transaction-internal keys and check for conflicting locks.
    • Updated handleCommitRequest and handleAbortRequest to utilize transaction metadata and pass commitTS for aborts.
    • Introduced uniqueMutations to filter duplicate mutations within a request.
    • Added buildPrepareStoreMutations, buildCommitStoreMutations, and buildAbortCleanupStoreMutations to construct store-level mutations for different transaction phases.
    • Implemented appendRollbackRecord to record transaction rollbacks.
    • Added numerous helper functions for transaction-specific operations, including prepareTxnMutation, txnIntentFromPBMutation, txnCleanupMutations, txnLockForCommit, txnIntentForCommit, storeMutationForIntent, commitTxnKeyMutations, shouldClearAbortKey, splitTxnMeta, and assertNoConflictingTxnLock.
  • kv/fsm_occ_test.go
    • Updated a test case to align with the new transaction prepare phase and metadata structure, changing a commit attempt to a prewrite attempt.
  • kv/hlc_wall.go
    • Added hlcWallNow function to retrieve the current wall clock time.
    • Added hlcWallFromNowMs function to calculate a future timestamp based on a millisecond delta.
  • kv/shard_key.go
    • Modified routeKey to extract the embedded logical user key from transaction-internal keys for correct shard routing.
  • kv/shard_store.go
    • Refactored GetAt to use localGetAt and leaderGetAt for improved read path logic.
    • Added isVerifiedRaftLeader helper to check for verified Raft leadership.
    • Introduced leaderGetAt to handle transaction lock resolution for non-transactional keys during leader reads.
    • Added localGetAt for direct access to the underlying store.
    • Modified scanRouteAt to apply resolveScanLocks to scan results.
    • Implemented maybeResolveTxnLock to resolve transaction locks during reads.
    • Added resolveScanLocks to filter transaction-internal keys and resolve locks during scans.
    • Introduced txnStatus enum and primaryTxnStatus function to determine the state of a primary transaction.
    • Added txnCommitTS, hasTxnRollback, and loadTxnLock for querying transaction records.
    • Implemented bestEffortAbortPrimary and applyTxnResolution for transaction cleanup and resolution.
    • Added cleanupTS to determine an appropriate timestamp for transaction cleanup.
  • kv/sharded_coordinator.go
    • Imported the bytes package.
    • Modified Dispatch to handle transactional requests via a new dispatchTxn method.
    • Added dispatchTxn to orchestrate distributed transactions through prewrite, primary commit, and secondary commit phases (see the flow sketch at the end of this changelog).
    • Introduced preparedGroup struct to track prepared transaction groups.
    • Added prewriteTxn for the prepare phase of distributed transactions.
    • Added commitPrimaryTxn for committing the primary key of a distributed transaction.
    • Added commitSecondaryTxns for committing secondary keys across shards.
    • Implemented abortPreparedTxn for rolling back prepared transactions.
    • Added txnGroupForID to retrieve a transaction group by ID.
    • Added nextTxnTSAfter to generate the next transaction timestamp after a given start timestamp.
    • Added abortTSFrom to derive an abort timestamp from a commit timestamp.
    • Added txnMetaMutation to create transaction metadata mutations.
    • Added engineGroupIDForKey to determine the group ID for a given key.
    • Updated txnLogs to handle single-shard transactions using the new metadata structure.
    • Added primaryKeyFromMutations to identify the primary key from a set of mutations.
    • Added keyMutations to extract unique keys from mutations.
  • kv/sharded_integration_test.go
    • Updated TestShardedCoordinatorDispatch_RejectsCrossShardTxn to reflect the new distributed transaction behavior, asserting the outcome of a cross-shard transaction rather than its rejection.
  • kv/transaction.go
    • Modified Commit to conditionally attempt transactional cleanup (Abort) only for actual transactional batches.
    • Modified Abort to properly construct abort requests using extracted transaction metadata and a derived abort timestamp.
    • Added extractTxnMeta function to parse transaction metadata from a list of mutations.
  • kv/txn_codec.go
    • Added TxnMeta struct for transaction metadata and implemented its EncodeTxnMeta and DecodeTxnMeta functions.
    • Added txnLock struct for transaction locks and implemented its encodeTxnLock and decodeTxnLock functions.
    • Added txnIntent struct for transaction intents and implemented its encodeTxnIntent and decodeTxnIntent functions.
    • Implemented encodeTxnCommitRecord and decodeTxnCommitRecord for transaction commit records.
    • Implemented encodeTxnRollbackRecord for transaction rollback records.
    • Added u64ToInt helper function for safe uint64 to int conversion.
  • kv/txn_consts.go
    • Added constants for defaultTxnLockTTLms and various mutation factors for transaction prepare, commit, and abort phases.
  • kv/txn_errors.go
    • Added new error types specific to transaction operations, such as ErrTxnMetaMissing, ErrTxnInvalidMeta, ErrTxnLocked, ErrTxnCommitTSRequired, ErrTxnAlreadyCommitted, and ErrTxnPrimaryKeyRequired.
  • kv/txn_keys.go
    • Defined prefixes for transaction-internal keys (txnLockPrefix, txnIntentPrefix, txnCommitPrefix, txnRollbackPrefix, txnMetaPrefix).
    • Implemented functions to construct transaction-internal keys (txnLockKey, txnIntentKey, txnCommitKey, txnRollbackKey).
    • Added isTxnInternalKey and isTxnMetaKey functions to identify transaction-related keys.
    • Added txnRouteKey to extract the logical user key from transaction-internal keys for routing purposes.
  • store/lsm_store.go
    • Removed logic in alignCommitTS that forced the commit timestamp to be strictly greater than the last commit timestamp plus one.
  • store/mvcc_store.go
    • Imported the sort package.
    • Modified putVersionLocked and deleteVersionLocked to use insertVersionSorted for maintaining sorted versions.
    • Added insertVersionSorted function to insert versioned values into a sorted slice.
    • Modified alignCommitTS to update lastCommitTS only if the new commitTS is greater.
  • store/mvcc_store_prop_test.go
    • Updated a property test for Delete to ensure the deletion timestamp (delTS) is strictly after the actual put timestamp (actualPutTS).
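
Taken together, these coordinator changes describe a prewrite, commit-primary, commit-secondaries pipeline. The self-contained sketch below shows that ordering in the abstract; every name in it is an illustrative stand-in, not the PR's actual dispatchTxn code.

package kv

import (
    "context"
    "fmt"
)

type phaseFn func(ctx context.Context) error

// runTwoPhaseCommit mirrors the ordering the changelog describes: all shards
// prewrite first, the primary key commits next (the atomic commit point), and
// secondaries commit last. A secondary that fails here is not fatal, because
// readers can resolve its lock later from the primary's commit record.
func runTwoPhaseCommit(ctx context.Context, prewrites []phaseFn, commitPrimary phaseFn, commitSecondaries []phaseFn, abort phaseFn) error {
    for _, prewrite := range prewrites {
        if err := prewrite(ctx); err != nil {
            _ = abort(ctx) // best-effort rollback of already-prepared groups
            return fmt.Errorf("prewrite failed: %w", err)
        }
    }
    if err := commitPrimary(ctx); err != nil {
        _ = abort(ctx) // nothing is durable until the primary commits
        return fmt.Errorf("primary commit failed: %w", err)
    }
    for _, commit := range commitSecondaries {
        _ = commit(ctx) // stragglers are resolved lazily at read time
    }
    return nil
}

The key design property is that the primary's commit record is the single source of truth: once it exists, the transaction is logically committed even if secondary locks linger.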

gemini-code-assist bot (Contributor) left a comment


Code Review

This pull request introduces a major and impressive refactoring to support distributed transactions using a 2PC-like protocol. The changes are extensive, touching coordination, state machine logic, and storage layers. The introduction of explicit transaction metadata, locks, and intents is well-structured into new txn_* files. The overall design moves timestamp management to the coordinator level, which is a solid choice for a distributed system.

I've found a few areas for improvement:

  • Some code duplication that can be refactored.
  • A couple of performance issues, one of which is significant for scans.
  • A potential bug in primary key selection logic.

Addressing these points will make the new transaction implementation more robust and performant. Great work on this complex feature.

Copilot AI (Contributor) left a comment


Pull request overview

This PR refactors transaction timestamp handling and adds MVCC-level plumbing for transactional metadata/locks so commits/aborts can carry an explicit commit timestamp and reads/scans can resolve transactional locks.

Changes:

  • Remove “auto-bump” commit timestamp alignment in MVCC stores and keep MVCC versions ordered by timestamp.
  • Introduce transaction internal keys/codecs (meta/lock/intent/commit/rollback) and propagate commit timestamps through coordinator/FSM paths.
  • Add shard-store read/scan lock-resolution behavior and update integration/property tests for the new semantics (a resolution sketch follows this list).
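
The read-time lock resolution in that last bullet boils down to a three-way decision against the primary's records. A minimal sketch, assuming variant names (the PR's txnStatus values may be spelled differently):

type txnStatus int

const (
    txnStatusPending txnStatus = iota
    txnStatusCommitted
    txnStatusRolledBack
)

// describeResolution sketches the three outcomes a reader acts on after
// checking the primary key's commit/rollback records.
func describeResolution(status txnStatus) string {
    switch status {
    case txnStatusCommitted:
        // Roll the intent forward at the primary's commit timestamp.
        return "commit"
    case txnStatusRolledBack:
        // Discard the intent and clear the lock.
        return "rollback"
    default:
        // Surface ErrTxnLocked, or best-effort abort the primary if the
        // lock's TTL has expired.
        return "pending"
    }
}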

Reviewed changes

Copilot reviewed 17 out of 17 changed files in this pull request and generated 6 comments.

Summary per file:

  • store/mvcc_store_prop_test.go: Ensures delete timestamps are strictly after puts in property tests.
  • store/mvcc_store.go: Keeps per-key version lists sorted; adjusts commitTS alignment behavior.
  • store/lsm_store.go: Removes commitTS auto-bump; relies on max-tracking for LastCommitTS.
  • kv/txn_keys.go: Adds txn internal key prefixes, key builders, and routing extraction.
  • kv/txn_errors.go: Adds typed txn errors for meta/locking/commitTS validation.
  • kv/txn_consts.go: Adds txn constants (TTL default, mutation sizing factors).
  • kv/txn_codec.go: Adds encoding/decoding for txn meta, locks, intents, and records.
  • kv/transaction.go: Updates commit/abort behavior and txn meta extraction for cleanup.
  • kv/sharded_integration_test.go: Updates cross-shard txn expectations (now succeeds) but needs renaming.
  • kv/sharded_coordinator.go: Implements distributed txn dispatch (prewrite/commit/abort helpers).
  • kv/shard_store.go: Adds read-time lock resolution and filters txn-internal keys from scans.
  • kv/shard_key.go: Extends routeKey() to route txn-internal keys by embedded user key.
  • kv/hlc_wall.go: Adds wall-time helpers for TTL/cleanup timestamp derivation.
  • kv/fsm_occ_test.go: Updates OCC conflict test to match prepare-phase conflicts.
  • kv/fsm.go: Implements prepare/commit/abort txn phases with locks/intents and commitTS enforcement.
  • kv/coordinator.go: Refactors txn request construction to include meta + explicit commitTS.
  • adapter/internal.go: Centralizes timestamp stamping for forwarded raw/txn requests and fills commitTS.


Comment on lines 295 to 299
func (s *ShardStore) maybeResolveTxnLock(ctx context.Context, g *ShardGroup, key []byte, readTS uint64) error {
    // Only consider locks visible at the read timestamp.
    lockBytes, err := g.Store.GetAt(ctx, txnLockKey(key), readTS)
    if err != nil {
        if errors.Is(err, store.ErrKeyNotFound) {

Copilot AI Feb 15, 2026


The new lock-resolution path introduced here (maybeResolveTxnLock and related helpers) changes observable read semantics (auto-resolving committed/rolled-back txns, returning ErrTxnLocked, filtering txn-internal keys on scans), but there are no focused tests exercising these behaviors. Please add unit/integration coverage for pending-lock reads, commit-driven resolution, and scan filtering of !txn|... keys.
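
For illustration, a focused test of the pending-lock read path might look like the following sketch; newTestShardStore and putPendingLock are hypothetical scaffolding, not helpers from this PR.

func TestGetAt_PendingLockReturnsErrTxnLocked(t *testing.T) {
    s, g := newTestShardStore(t)                  // hypothetical test fixture
    putPendingLock(t, g, []byte("k"), 10 /* startTS */) // hypothetical helper

    _, err := s.GetAt(context.Background(), []byte("k"), 20 /* readTS */)
    if !errors.Is(err, ErrTxnLocked) {
        t.Fatalf("expected ErrTxnLocked for a pending lock, got %v", err)
    }
}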

@bootjp bootjp requested a review from Copilot February 15, 2026 15:54

bootjp commented Feb 15, 2026

/gemini review

bootjp changed the title from "Refactor transaction timestamp management" to "Refactor transaction timestamp management for cross shard data" on Feb 15, 2026
gemini-code-assist bot (Contributor) left a comment


Code Review

This pull request introduces a major refactoring to implement a distributed transaction protocol, likely based on Percolator. The changes are extensive, touching coordination, state machine logic, storage, and adapters. The introduction of 2-phase commit (prewrite/commit), transaction metadata, lock resolution on read, and sharded transaction coordination is a significant step towards a robust distributed KV store. The code is well-structured with new concepts cleanly separated into new files (e.g., txn_codec.go, txn_keys.go). My review includes one suggestion for a minor performance improvement in the primary key selection logic.

Copilot AI (Contributor) left a comment


Pull request overview

Copilot reviewed 20 out of 20 changed files in this pull request and generated 3 comments.




func (c *ShardedCoordinator) nextTxnTSAfter(startTS uint64) uint64 {
    if c.clock == nil {
        return startTS + 1

Copilot AI Feb 15, 2026


The function nextTxnTSAfter doesn't check for overflow when c.clock is nil. If startTS equals ^uint64(0) (the maximum uint64 value), then startTS + 1 will overflow to 0. This could cause issues when comparing timestamps. Consider adding an overflow check similar to the one in abortTSFrom or documenting that timestamps near uint64 max are not supported.

Suggested change:

-    return startTS + 1
+    nextTS := startTS + 1
+    if nextTS == 0 {
+        return startTS
+    }
+    return nextTS

}

func (i *Internal) fillForwardedTxnCommitTS(reqs []*pb.Request, startTS uint64) {
    const metaPrefix = "!txn|meta|"

Copilot AI Feb 15, 2026


The transaction metadata prefix is duplicated here as a local constant. This constant is already defined in kv/txn_keys.go as txnMetaPrefix = "!txn|meta|". Consider importing and using the constant from kv package instead of duplicating it to maintain consistency and avoid potential issues if the prefix changes.


bootjp commented Feb 15, 2026

/gemini review

gemini-code-assist bot (Contributor) left a comment


Code Review

This pull request introduces a major refactoring of the transaction handling mechanism, moving from a simple optimistic concurrency control model to a full-fledged two-phase commit (2PC) protocol to support cross-shard transactions. This is a significant and complex change that touches many parts of the system, including the coordinator, the Raft FSM, the storage layer, and the Redis adapter.

The implementation appears to be well-structured and correct, introducing new concepts like transaction metadata, locks, intents, and commit/rollback records. The logic for the 2PC coordinator, read-time lock resolution, and timestamp management seems sound. The code is well-modularized with new files for codecs, key formats, and constants, which improves maintainability.

I have one suggestion regarding a potential performance issue in the scan lock resolution logic, which could lead to an N+1 query problem. Otherwise, this is an excellent piece of work.

Copilot AI (Contributor) left a comment


Pull request overview

Copilot reviewed 20 out of 20 changed files in this pull request and generated 4 comments.



Comment on lines 253 to 258
func abortTSFrom(commitTS uint64) uint64 {
    abortTS := commitTS + 1
    if abortTS == 0 {
        return commitTS
    }
    return abortTS

Copilot AI Feb 15, 2026


abortTSFrom returns commitTS when commitTS+1 overflows, which can yield abortTS == commitTS and potentially <= startTS. That makes ABORT requests invalid (kvFSM requires abortTS > startTS) and can strand locks. Prefer to detect overflow and return an explicit error (or choose a different cleanup timestamp strategy) instead of returning an invalid/ambiguous abort timestamp.
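
One way to make the overflow explicit, as this comment suggests; a sketch with an invented name, not the PR's code:

func abortTSFromChecked(commitTS uint64) (uint64, error) {
    abortTS := commitTS + 1
    if abortTS == 0 { // commitTS == ^uint64(0): the increment wrapped around
        return 0, errors.New("abort timestamp overflow")
    }
    return abortTS, nil
}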

    var requests []*pb.Request
    if reqs.IsTxn {
        requests = txnRequests(reqs.StartTS, reqs.Elems)
        primary := primaryKeyForElems(reqs.Elems)

Copilot AI Feb 15, 2026


In the follower redirect path, a txn with all-empty keys will generate a TxnMeta with an empty PrimaryKey and forward it to the leader. The leader will later fail with ErrTxnPrimaryKeyRequired, but the leader path returns that error earlier. To keep behavior consistent (and avoid forwarding obviously-invalid txns), validate that primaryKeyForElems(reqs.Elems) is non-empty here and return ErrTxnPrimaryKeyRequired/ErrInvalidRequest before calling txnRequests.

Suggested change:

     primary := primaryKeyForElems(reqs.Elems)
+    if len(primary) == 0 {
+        return nil, errors.WithStack(ErrTxnPrimaryKeyRequired)
+    }

@bootjp bootjp requested a review from Copilot February 16, 2026 18:59

bootjp commented Feb 16, 2026

/gemini review

gemini-code-assist bot (Contributor) left a comment


Code Review

The pull request refactors transaction timestamp management to support cross-shard data using a 2PC-like approach with locks and intents (Percolator-style). This is a significant improvement over the previous OCC-only model, providing better consistency guarantees for distributed operations. The implementation of lock resolution at read time and the distributed coordinator logic are well-structured. However, there are some opportunities for optimization in the lock resolution path during scans and a potential correctness issue regarding ts=0 handling in proxied reads that should be addressed to ensure reliability on follower nodes.

Comment on lines 360 to 368
    locked, err := scanKeyLockedAt(ctx, g, kvp.Key, ts)
    if err != nil {
        return nil, false, err
    }
    if !locked {
        return kvp, false, nil
    }

    v, err := s.leaderGetAt(ctx, g, kvp.Key, ts)
gemini-code-assist bot commented (severity: medium)

There is a redundant storage lookup here. scanKeyLockedAt performs a GetAt on the transaction lock key. If a lock is found, leaderGetAt is called, which eventually calls maybeResolveTxnLock, which performs the exact same GetAt on the lock key to retrieve the lock metadata. This doubles the number of storage lookups for every locked key encountered during a scan. Consider refactoring scanKeyLockedAt to return the lock data if present, or merging the logic into resolveScanKVP.
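
A sketch of the suggested refactor; loadLockAt is an invented name, and the call shapes are assumed from the snippet above:

// loadLockAt fetches the lock record once; callers hand lockBytes straight to
// the resolver instead of re-reading the same key in maybeResolveTxnLock.
func loadLockAt(ctx context.Context, g *ShardGroup, key []byte, ts uint64) ([]byte, bool, error) {
    lockBytes, err := g.Store.GetAt(ctx, txnLockKey(key), ts)
    if errors.Is(err, store.ErrKeyNotFound) {
        return nil, false, nil // no lock visible at this read timestamp
    }
    if err != nil {
        return nil, false, err
    }
    return lockBytes, true, nil
}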

Comment on lines 486 to 505
     primaryKey := primaryKeyFromMutations(muts)
     if len(primaryKey) == 0 {
         return nil, errors.WithStack(ErrTxnPrimaryKeyRequired)
     }
     logs = append(logs,
-        &pb.Request{IsTxn: true, Phase: pb.Phase_PREPARE, Ts: startTS, Mutations: muts},
-        &pb.Request{IsTxn: true, Phase: pb.Phase_COMMIT, Ts: startTS, Mutations: muts},
+        &pb.Request{
+            IsTxn: true,
+            Phase: pb.Phase_PREPARE,
+            Ts:    startTS,
+            Mutations: append([]*pb.Mutation{
+                {Op: pb.Op_PUT, Key: []byte(txnMetaPrefix), Value: EncodeTxnMeta(TxnMeta{PrimaryKey: primaryKey, LockTTLms: defaultTxnLockTTLms, CommitTS: 0})},
+            }, muts...),
+        },
+        &pb.Request{
+            IsTxn: true,
+            Phase: pb.Phase_COMMIT,
+            Ts:    startTS,
+            Mutations: append([]*pb.Mutation{
+                {Op: pb.Op_PUT, Key: []byte(txnMetaPrefix), Value: EncodeTxnMeta(TxnMeta{PrimaryKey: primaryKey, LockTTLms: 0, CommitTS: commitTS})},
+            }, keyMutations(muts)...),
gemini-code-assist bot commented (severity: medium)

In buildTxnLogs, the code performs two separate passes over the mutations: one in primaryKeyFromMutations and another in keyMutations. Both functions use a map to deduplicate keys. These can be combined into a single pass to reduce overhead, especially for large transaction batches.
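
A single-pass version might look like the sketch below. The primary-key policy shown (first unique key wins) is an assumption; the PR's own selection logic may differ.

func primaryKeyAndKeyMutationsSketch(muts []*pb.Mutation) ([]byte, []*pb.Mutation) {
    seen := make(map[string]struct{}, len(muts))
    var primary []byte
    out := make([]*pb.Mutation, 0, len(muts))
    for _, m := range muts {
        if m == nil || len(m.Key) == 0 {
            continue
        }
        if _, ok := seen[string(m.Key)]; ok {
            continue
        }
        seen[string(m.Key)] = struct{}{}
        if primary == nil {
            primary = m.Key // assumed policy: first unique key is the primary
        }
        out = append(out, &pb.Mutation{Op: pb.Op_PUT, Key: m.Key})
    }
    return primary, out
}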

Copilot AI (Contributor) left a comment


Pull request overview

Copilot reviewed 20 out of 20 changed files in this pull request and generated 2 comments.



Comment on lines 159 to 171
    seen := make(map[string]struct{}, len(muts))
    out := make([]*pb.Mutation, 0, len(muts))
    for _, mut := range muts {
        if mut == nil || len(mut.Key) == 0 {
            return nil, errors.WithStack(ErrInvalidRequest)
        }
        keyStr := string(mut.Key)
        if _, ok := seen[keyStr]; ok {
            continue
        }
        seen[keyStr] = struct{}{}
        out = append(out, mut)
    }

Copilot AI Feb 16, 2026


uniqueMutations silently drops subsequent mutations for the same key (it keeps the first occurrence). If a transactional batch contains multiple operations on the same key (e.g., Put then Del), the later operation will be ignored, leading to incorrect committed state. Consider either rejecting duplicate keys in a txn with a clear error, or preserving the last mutation per key ("last write wins") when deduplicating.
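
A last-write-wins variant could look like this sketch (same validation as the original; only the deduplication policy changes):

func uniqueMutationsLastWins(muts []*pb.Mutation) ([]*pb.Mutation, error) {
    last := make(map[string]int, len(muts))
    out := make([]*pb.Mutation, 0, len(muts))
    for _, mut := range muts {
        if mut == nil || len(mut.Key) == 0 {
            return nil, errors.WithStack(ErrInvalidRequest)
        }
        if i, ok := last[string(mut.Key)]; ok {
            out[i] = mut // overwrite the earlier op for this key in place
            continue
        }
        last[string(mut.Key)] = len(out)
        out = append(out, mut)
    }
    return out, nil
}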

        // Already resolved (committed/rolled back).
        return nil, nil
    }
    if lock.StartTS != startTS {

Copilot AI Feb 16, 2026


commitTxnKeyMutations only validates lock.StartTS but ignores lock.PrimaryKey, so a COMMIT request could accidentally (or via malformed input) commit/clean up a lock that belongs to a different primary as long as startTS matches. Since the lock encoding includes PrimaryKey, consider passing the expected primary key into commitTxnKeyMutations (and similarly for abort cleanup) and verifying it matches before applying/cleaning up the lock/intent.

Suggested change:

-    if lock.StartTS != startTS {
+    if lock.StartTS != startTS || !bytes.Equal(lock.PrimaryKey, key) {

@bootjp bootjp requested a review from Copilot February 16, 2026 19:49
gemini-code-assist bot (Contributor) left a comment


Code Review

This pull request is a significant and impressive refactoring of the transaction system, moving from a simple single-node OCC model to a full-fledged, sharded 2PC (Percolator-style) distributed transaction protocol. The changes are extensive, touching everything from the storage layer and FSM to the coordinator and client-facing adapters.

The new implementation appears robust and well-thought-out, correctly handling timestamp management with HLC, the two-phase commit protocol, and read-time lock resolution for failed or pending transactions. The addition of detailed tests for the new complex logic is also commendable.

I have one suggestion for improving robustness in the single-node coordinator's timestamp logic, which aligns with the principles of MVCC for consistent snapshot views. Overall, this is an excellent piece of work that greatly enhances the system's capabilities.

Comment on lines +110 to +114
    commitTS := c.clock.Next()
    if commitTS <= startTS {
        c.clock.Observe(startTS)
        commitTS = c.clock.Next()
    }
gemini-code-assist bot commented (severity: medium)

The logic to determine the commitTS is mostly correct, ensuring it's greater than startTS. However, there's a rare edge case where c.clock.Next() after c.clock.Observe(startTS) might still not produce a timestamp greater than startTS (e.g., if the logical component of the HLC overflows).

The sharded coordinator (kv/sharded_coordinator.go) includes a final defensive check for this. For consistency and improved robustness, I suggest adding a similar check here.

Suggested change:

     commitTS := c.clock.Next()
     if commitTS <= startTS {
         c.clock.Observe(startTS)
         commitTS = c.clock.Next()
     }
+    if commitTS <= startTS {
+        return nil, errors.WithStack(ErrTxnCommitTSRequired)
+    }
References
  1. In an MVCC system, all operations within a single transaction must share the same start timestamp to present a consistent snapshot. The proposed commitTS > startTS check preserves correct timestamp ordering within that model, which is what guarantees atomicity of the commit.

Copilot AI (Contributor) left a comment


Pull request overview

Copilot reviewed 22 out of 22 changed files in this pull request and generated 2 comments.



Comment on lines +415 to +419
func (s *ShardStore) planScanLockItem(ctx context.Context, g *ShardGroup, ts uint64, plan *scanLockPlan, kvp *store.KVPair) error {
    if kvp == nil || isTxnInternalKey(kvp.Key) {
        plan.items = append(plan.items, scanItem{skip: true})
        return nil
    }

Copilot AI Feb 16, 2026


ScanAt lock handling skips all txn-internal keys from the underlying store scan. That means pending locks for keys with no previously committed user-key version can be missed (the scan may return only the !txn|lock|... entry, which gets skipped), so ScanAt may incorrectly return no error / incomplete results instead of ErrTxnLocked or resolving the lock. Consider detecting txn lock keys in the scan results (e.g., handle the !txn|lock| prefix by decoding the lock and mapping it back to the embedded user key) or performing a separate lock-key scan over the same range before filtering internal keys.
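
A sketch of the alternative described here, assuming lock keys simply append the user key to the prefix (as the routing code suggests); the function name is invented:

// userKeyFromScanEntry recognizes a txn lock key in raw scan results and
// recovers the embedded user key, so a pending lock is visible even when the
// user key has no committed version in the scanned range.
func userKeyFromScanEntry(key []byte) (userKey []byte, isLock bool) {
    if bytes.HasPrefix(key, []byte(txnLockPrefix)) {
        return key[len(txnLockPrefix):], true
    }
    return nil, false
}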

adapter/redis.go (Outdated)
Comment on lines 1178 to 1180
    if resp.Value == nil {
        return nil, errors.WithStack(store.ErrKeyNotFound)
    }

Copilot AI Feb 16, 2026


RawGet/tryLeaderGetAt currently interprets a nil resp.Value as "key not found". In proto3, an explicitly-set empty byte slice and an unset bytes field are indistinguishable on the wire, so this can make it impossible to reliably store/read empty values across gRPC (an empty value may round-trip as nil and be treated as not found). If empty values should be supported, consider adding an explicit "exists" (or "found") flag to RawGetResponse / GetResponse or returning a NotFound gRPC status instead of encoding non-existence as a nil bytes field.

Suggested change (delete these lines):

-    if resp.Value == nil {
-        return nil, errors.WithStack(store.ErrKeyNotFound)
-    }

@bootjp bootjp requested a review from Copilot February 16, 2026 20:28

bootjp commented Feb 16, 2026

/gemini review

gemini-code-assist bot (Contributor) left a comment


Code Review

This pull request introduces a major refactoring of the transaction management system, moving from a simple optimistic concurrency control (OCC) model to a more robust two-phase commit (2PC) protocol. This change is crucial for supporting cross-shard transactions and ensuring data consistency in a distributed environment.

The changes are extensive, touching coordination, state machine logic, and storage adapters, and they introduce new concepts like transaction metadata, locks, and intents. The new implementation appears well thought out, with careful handling of timestamps (using an HLC), lock resolution on read, and proper routing of transactional operations. New tests cover the new 2PC logic, including overflow conditions and multi-key transaction semantics.

My main feedback is a minor suggestion to refactor some duplicated code for improved maintainability. Overall, this is a significant and well-executed improvement to the system's transactional capabilities.

Comment on lines +541 to 556
func keyMutations(muts []*pb.Mutation) []*pb.Mutation {
    out := make([]*pb.Mutation, 0, len(muts))
    seen := map[string]struct{}{}
    for _, m := range muts {
        if m == nil || len(m.Key) == 0 {
            continue
        }
        k := string(m.Key)
        if _, ok := seen[k]; ok {
            continue
        }
        seen[k] = struct{}{}
        out = append(out, &pb.Mutation{Op: pb.Op_PUT, Key: m.Key})
    }
    return out
}
gemini-code-assist bot commented (severity: medium)

The logic in this function for iterating through mutations and collecting unique keys is very similar to the logic in primaryKeyAndKeyMutations. This code duplication could be reduced to improve maintainability. Consider extracting a common helper function or refactoring one function to call the other.
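
One shape for the suggested deduplication; a sketch with invented helper names, not the PR's code:

// forEachUniqueKey runs fn once per unique, non-empty key, in input order.
// Both keyMutations and primaryKeyAndKeyMutations could build on it.
func forEachUniqueKey(muts []*pb.Mutation, fn func(m *pb.Mutation)) {
    seen := map[string]struct{}{}
    for _, m := range muts {
        if m == nil || len(m.Key) == 0 {
            continue
        }
        if _, ok := seen[string(m.Key)]; ok {
            continue
        }
        seen[string(m.Key)] = struct{}{}
        fn(m)
    }
}

// keyMutations then reduces to a thin wrapper.
func keyMutationsShared(muts []*pb.Mutation) []*pb.Mutation {
    out := make([]*pb.Mutation, 0, len(muts))
    forEachUniqueKey(muts, func(m *pb.Mutation) {
        out = append(out, &pb.Mutation{Op: pb.Op_PUT, Key: m.Key})
    })
    return out
}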

Copilot AI (Contributor) left a comment


Pull request overview

Copilot reviewed 29 out of 29 changed files in this pull request and generated 4 comments.



Comment on lines +157 to +179
     if g.Raft == nil {
         kvs, err := g.Store.ScanAt(ctx, start, end, limit, ts)
         if err != nil {
             return nil, errors.WithStack(err)
         }
-        return kvs, nil
+        // Keep ScanAt behavior consistent even when running without raft.
+        return filterTxnInternalKVs(kvs), nil
     }

     // Reads should come from the shard's leader to avoid returning stale or
     // incomplete results when this node is a follower for a given shard.
-    if g.Raft.State() == raft.Leader {
-        if err := g.Raft.VerifyLeader().Error(); err == nil {
-            kvs, err := g.Store.ScanAt(ctx, start, end, limit, ts)
-            if err != nil {
-                return nil, errors.WithStack(err)
-            }
-            return kvs, nil
-        }
-    }
+    if isVerifiedRaftLeader(g.Raft) {
+        return s.scanRouteAtLeader(ctx, g, start, end, limit, ts)
+    }

     kvs, err := s.proxyRawScanAt(ctx, g, start, end, limit, ts)
     if err != nil {
         return nil, err
     }
-    return kvs, nil
+    // The leader's RawScanAt is expected to perform lock resolution and filtering
+    // via ShardStore.ScanAt, so avoid N+1 proxy gets here.
+    return filterTxnInternalKVs(kvs), nil
 }

Copilot AI Feb 16, 2026


ScanAt filters txn-internal keys after calling the underlying store's ScanAt with the user-provided limit. Because the underlying ScanAt counts internal keys toward the limit, filtering afterward can return fewer than limit user keys (or even an empty result if the first N keys are internal). Consider over-fetching (loop/paginate until you collect limit non-internal keys or exhaust the range) or moving the filtering into the store scan so internal keys don't consume the limit.
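
An over-fetching loop along these lines could address it. A sketch only: scanUserKeys is an invented name, limit is assumed to be an int, and the cursor trick assumes inclusive-start scans.

func scanUserKeys(ctx context.Context, g *ShardGroup, start, end []byte, limit int, ts uint64) ([]*store.KVPair, error) {
    out := make([]*store.KVPair, 0, limit)
    cursor := start
    for len(out) < limit {
        batch, err := g.Store.ScanAt(ctx, cursor, end, limit, ts)
        if err != nil {
            return nil, err
        }
        if len(batch) == 0 {
            break // range exhausted
        }
        for _, kvp := range batch {
            if !isTxnInternalKey(kvp.Key) {
                out = append(out, kvp)
                if len(out) == limit {
                    break
                }
            }
        }
        // Resume just past the last key seen; appending a zero byte makes the
        // next inclusive-start scan strictly greater than that key.
        cursor = append(append([]byte{}, batch[len(batch)-1].Key...), 0)
    }
    return out, nil
}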

@bootjp bootjp requested a review from Copilot February 16, 2026 21:13

bootjp commented Feb 16, 2026

/gemini review

gemini-code-assist bot (Contributor) left a comment


Code Review

This is a substantial pull request that introduces distributed transactions across shards using a 2PC-like protocol. The changes are extensive and touch many parts of the system, from the gRPC layer to the storage FSM. The implementation of the 2PC protocol, including prewrite/commit phases and read-time lock resolution, appears well-thought-out and is a significant step forward for the system's capabilities. The addition of comprehensive tests for the new transaction logic is also commendable.

My review focuses on a couple of areas for potential improvement regarding code duplication and the retry strategy for secondary commits. Overall, this is a very impressive piece of work.

Copilot AI (Contributor) left a comment


Pull request overview

Copilot reviewed 30 out of 30 changed files in this pull request and generated 1 comment.



Comment on lines 862 to 868
func cleanupTS(startTS uint64) uint64 {
    now := hlcWallNow()
    next := startTS + 1
    if now > next {
        return now
    }
    return next

Copilot AI Feb 16, 2026


cleanupTS can overflow when startTS==^uint64(0), returning a value (hlcWallNow) that may be <= startTS. That makes ABORT resolution requests invalid (kvFSM requires resolveTS > startTS), potentially leaving locks permanently unresolved. Consider handling the overflow explicitly (e.g., detect startTS==max and return 0/error, or reuse abortTSFrom to compute a strictly-greater timestamp) so lock cleanup never emits a non-monotonic resolve timestamp.

bootjp (Owner, Author) commented:

@copilot open a new pull request to apply changes based on this feedback


Copilot AI commented Feb 16, 2026

@bootjp I've opened a new pull request, #314, to work on those changes. Once the pull request is ready, I'll request review from you.

Copilot AI and others added 7 commits February 16, 2026 21:25
  • Replace cleanupTS with abortTSFrom to handle overflow when startTS==^uint64(0). Add explicit overflow checks to prevent invalid abort timestamps that violate the FSM requirement that resolveTS > startTS, which could leave locks permanently unresolved.
  • Pass commitTS from primaryTxnStatus to abortTSFrom for semantic clarity and consistency with usage elsewhere in the codebase.
  • Explain why startTS is passed twice to abortTSFrom: no commitTS is available when aborting an expired lock with no commit record.
  • Use 0 instead of startTS as the second parameter to abortTSFrom to explicitly indicate that no commitTS is available in this context.
  • Remove extra blank lines left after deleting the cleanupTS function to fix the golangci-lint gci formatting check.
  • Clarify that while uint64 overflow is not expected in normal operation, the overflow checks are defensive programming for the edge case where startTS==^uint64(0) or a bug causes overflow, preventing violation of the FSM invariant resolveTS > startTS.
  • Fix timestamp overflow in lock cleanup causing permanent lock leaks.

(Each commit co-authored by: bootjp <1306365+bootjp@users.noreply.github.com>)
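
Reconstructing from these commit messages alone (the merged code may differ), the reworked helper plausibly takes both timestamps and refuses to wrap around:

// abortTSFromSketch derives an abort timestamp strictly greater than both
// startTS and commitTS; commitTS == 0 means "no commit record available".
// The overflow check is defensive, preserving the FSM invariant
// resolveTS > startTS even when base == ^uint64(0).
func abortTSFromSketch(startTS, commitTS uint64) (uint64, error) {
    base := startTS
    if commitTS > base {
        base = commitTS
    }
    abortTS := base + 1
    if abortTS == 0 {
        return 0, errors.New("abort timestamp overflow: base is uint64 max")
    }
    return abortTS, nil
}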
@bootjp bootjp enabled auto-merge February 16, 2026 21:43
@bootjp bootjp merged commit 8d7753a into main Feb 16, 2026
7 checks passed
@bootjp bootjp deleted the feature/multi-shard-transaction branch February 16, 2026 21:45
