Fix precision loss for large bigint PKs in table repair JSON roundtrip#98

Open
mason-sharp wants to merge 1 commit into main from fix/ACE-179/large-numeric-PK-repair
Conversation

@mason-sharp
Member

OrderedMap.UnmarshalJSON used Go's default JSON number decoding (float64), which silently truncates integers exceeding 2^53. For tables with large bigint primary keys (e.g. snowflake IDs like 415588913294348289), this caused:

  • PK corruption: 415588913294348289 → 415588913294348288 (off by 1)
  • PK collisions: adjacent PKs mapped to the same float64, causing rows to silently overwrite each other in repair maps
  • Wrong upserts/deletes: repairs targeted wrong rows or missed them
  • Growing diffs after repair: corrupted values replicated via spock

Fix: add dec.UseNumber() to OrderedMap.UnmarshalJSON so JSON numbers are preserved as json.Number strings. Update ConvertToPgxType to handle json.Number for integer types (lossless Int64 parse), numeric/decimal (exact string via pgtype.Numeric), and float types (Float64 parse). Also update comparePKValues/toFloat64 to recognize json.Number.
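The failure mode and the `UseNumber` fix can be reproduced outside the codebase. The sketch below is a standalone illustration (the helper names `decodeDefault`/`decodeExact` are for this example only, not from the PR):

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
)

// decodeDefault decodes with encoding/json's default behavior:
// every JSON number becomes a float64.
func decodeDefault(data []byte) float64 {
	var m map[string]any
	_ = json.Unmarshal(data, &m)
	return m["id"].(float64)
}

// decodeExact decodes with UseNumber, so the literal survives as a
// json.Number string and can be parsed losslessly.
func decodeExact(data []byte) int64 {
	dec := json.NewDecoder(bytes.NewReader(data))
	dec.UseNumber()
	var m map[string]any
	_ = dec.Decode(&m)
	i, _ := m["id"].(json.Number).Int64()
	return i
}

func main() {
	data := []byte(`{"id": 415588913294348289}`)
	fmt.Printf("default: %.0f\n", decodeDefault(data)) // default: 415588913294348288 (off by one)
	fmt.Printf("exact:   %d\n", decodeExact(data))     // exact:   415588913294348289
}
```

The default path rounds 415588913294348289 to the nearest representable float64 (a multiple of 64 at this magnitude), which is exactly the off-by-one corruption described above.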

@coderabbitai

coderabbitai bot commented Mar 30, 2026

📝 Walkthrough

Walkthrough

Added json.Number support and precise numeric comparison using math/big across conversion, comparison, and JSON decoding paths; replaced reflect/float64-based numeric handling with a CompareNumeric implementation and updated tests and integration tests to cover large-integer scenarios.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| **Type Conversion Utilities**<br>`pkg/common/utils.go` | Added `json.Number` handling for integer, float, and numeric/decimal conversions; removed reflect-based `isNumeric`/`toFloat64`; introduced `CompareNumeric` (int64 fast path, `math/big.Float` fallback) and `bigFloatPrec`; updated `comparePKValues` to use `CompareNumeric`. |
| **Type Conversion Tests**<br>`pkg/common/utils_test.go` | Added tests covering `json.Number` conversions to int64, float64, and `pgtype.Numeric`; verifies large-int precision, numeric/string handling, and `StringifyOrderedMapKey` behavior for adjacent large PK values. |
| **JSON Decode Config**<br>`pkg/types/types.go` | Enabled `dec.UseNumber()` in `OrderedMap.UnmarshalJSON` so numeric JSON tokens are decoded as `json.Number` instead of float64. |
| **OrderedMap JSON Tests**<br>`pkg/types/types_test.go` | Added JSON round-trip tests asserting `json.Number` preservation and exact string representations for large integers, decimals, and mixed numeric types. |
| **Integration: Table Repair**<br>`tests/integration/table_repair_test.go` | New test `TestTableRepair_LargeBigintPK` that exercises repair/diff logic with adjacent int64 PKs that would collide under float64 handling; asserts correct diff counts and successful repair. |
| **Repair & Parser Logic**<br>`internal/consistency/repair/executor.go`, `internal/consistency/repair/plan/parser/parser.go`, `internal/consistency/repair/table_repair.go` | Replaced local float-conversion helpers (`asFloat`/`asNumber`) with `utils.CompareNumeric` for PK matching, range comparisons, freshness ordering, and operator evaluation; enhanced `commit_ts` parsing to accept `json.Number`; removed deleted helper functions. |

Poem

🐰 I nibble digits, tidy and bright,
No float will gobble a bigint tonight,
Strings keep their shape, comparisons stay true,
Big numbers hop home—precision anew! ✨

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 21.43%, below the required threshold of 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title directly and clearly describes the main fix: addressing precision loss for large bigint primary keys in the table repair JSON roundtrip. |
| Description check | ✅ Passed | The description thoroughly explains the problem (float64 truncation of large integers), concrete impacts (PK corruption, collisions, wrong operations), and the solution implemented across multiple components. |

✏️ Tip: You can configure your own custom pre-merge checks in the settings.



@coderabbitai coderabbitai bot left a comment


🧹 Nitpick comments (1)
tests/integration/table_repair_test.go (1)

2215-2224: Minor: Cleanup only removes table from repset on node1.

The table is added to the repset on both nodes (lines 2211-2212), but repset_remove_table is only called on Node1Pool. For consistency, consider removing from both nodes:

Suggested cleanup improvement
```diff
 	t.Cleanup(func() {
-		_, _ = pgCluster.Node1Pool.Exec(ctx, fmt.Sprintf(`SELECT spock.repset_remove_table('default', '%s');`, qualifiedTableName))
+		for _, pool := range []*pgxpool.Pool{pgCluster.Node1Pool, pgCluster.Node2Pool} {
+			_, _ = pool.Exec(ctx, fmt.Sprintf(`SELECT spock.repset_remove_table('default', '%s');`, qualifiedTableName))
+		}
 		for _, pool := range []*pgxpool.Pool{pgCluster.Node1Pool, pgCluster.Node2Pool} {
 			_, _ = pool.Exec(ctx, fmt.Sprintf("DROP TABLE IF EXISTS %s CASCADE", qualifiedTableName))
 		}
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/integration/table_repair_test.go` around lines 2215 - 2224, The cleanup
currently only calls spock.repset_remove_table via pgCluster.Node1Pool; update
the Cleanup to remove the table from the repset on both nodes by invoking
spock.repset_remove_table for pgCluster.Node2Pool as well (use the same
qualifiedTableName), ensuring symmetry with the earlier addition on both
Node1Pool and Node2Pool and keeping the rest of the cleanup (DROP TABLE on both
pools, removing diff files) unchanged.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@tests/integration/table_repair_test.go`:
- Around line 2215-2224: The cleanup currently only calls
spock.repset_remove_table via pgCluster.Node1Pool; update the Cleanup to remove
the table from the repset on both nodes by invoking spock.repset_remove_table
for pgCluster.Node2Pool as well (use the same qualifiedTableName), ensuring
symmetry with the earlier addition on both Node1Pool and Node2Pool and keeping
the rest of the cleanup (DROP TABLE on both pools, removing diff files)
unchanged.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: ffbde1de-b77f-4381-8899-6dd23ebb6e19

📥 Commits

Reviewing files that changed from the base of the PR and between 005e1e0 and a123dee.

📒 Files selected for processing (5)
  • pkg/common/utils.go
  • pkg/common/utils_test.go
  • pkg/types/types.go
  • pkg/types/types_test.go
  • tests/integration/table_repair_test.go

OrderedMap.UnmarshalJSON used Go's default JSON number decoding (float64),
which silently truncates integers exceeding 2^53. For tables with large
bigint primary keys (e.g. snowflake IDs like 415588913294348289), this
caused:

- PK corruption: 415588913294348289 → 415588913294348288 (off by 1)
- PK collisions: adjacent PKs mapped to the same float64, causing rows
  to silently overwrite each other in repair maps
- Wrong upserts/deletes: repairs targeted wrong rows or missed them
- Growing diffs after repair: corrupted values replicated via spock

Fix: add dec.UseNumber() to OrderedMap.UnmarshalJSON so JSON numbers are
preserved as json.Number strings. Update ConvertToPgxType to handle
json.Number for integer types (lossless Int64 parse), numeric/decimal
(exact string via pgtype.Numeric), and float types (Float64 parse).

Replace the three duplicate lossy numeric helpers (toFloat64, asFloat,
asNumber) with a single precision-safe CompareNumeric in pkg/common that
uses an int64 fast path and falls back to math/big.Float (256-bit) for
json.Number decimals, large uint64, and native floats. Also handle
json.Number in origin timestamp extraction (table_repair.go).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@mason-sharp force-pushed the fix/ACE-179/large-numeric-PK-repair branch from a123dee to 051f8c1 on March 31, 2026 at 00:34

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@internal/consistency/repair/plan/parser/parser.go`:
- Around line 617-633: The numeric tokenization still parses all numbers to
float64 in scanNumber, causing precision loss before CompareNumeric runs; update
scanNumber to capture the original numeric text and store it as a json.Number
token (instead of using strconv.ParseFloat -> float64) so that
utils.CompareNumeric can operate on json.Number losslessly, and ensure any token
struct/enum used by parseExpression (the token carrying numeric values) is
adjusted to hold json.Number (and preserve existing paths that expect float64 by
converting only when necessary).

In `@pkg/common/utils.go`:
- Around line 548-563: The json.Number handling branch currently falls back to
parsing Float64 and casting to int64 which silently truncates fractional values;
update the block that handles val.(json.Number) so that after parsing f, you
verify it's an exact integer (e.g., compare f to math.Trunc(f) or check
fractional part) and if it has a fractional component return an error mentioning
the original json.Number and pgType instead of casting, while still keeping the
existing safe-range check for float64-to-int64 conversion.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 4af07048-e098-403f-a789-b8c5f02eb4cb

📥 Commits

Reviewing files that changed from the base of the PR and between a123dee and 051f8c1.

📒 Files selected for processing (8)
  • internal/consistency/repair/executor.go
  • internal/consistency/repair/plan/parser/parser.go
  • internal/consistency/repair/table_repair.go
  • pkg/common/utils.go
  • pkg/common/utils_test.go
  • pkg/types/types.go
  • pkg/types/types_test.go
  • tests/integration/table_repair_test.go
✅ Files skipped from review due to trivial changes (1)
  • pkg/types/types_test.go
🚧 Files skipped from review as they are similar to previous changes (2)
  • pkg/types/types.go
  • pkg/common/utils_test.go

Comment on lines +617 to 633
```diff
+			if cmp, ok := utils.CompareNumeric(left, right); ok {
 				switch op {
 				case tokEq:
-					return ln == rn, nil
+					return cmp == 0, nil
 				case tokNeq:
-					return ln != rn, nil
+					return cmp != 0, nil
 				case tokLt:
-					return ln < rn, nil
+					return cmp < 0, nil
 				case tokLte:
-					return ln <= rn, nil
+					return cmp <= 0, nil
 				case tokGt:
-					return ln > rn, nil
+					return cmp > 0, nil
 				case tokGte:
-					return ln >= rn, nil
+					return cmp >= 0, nil
 				default:
 					return false, fmt.Errorf("unsupported numeric operator %v", op)
 				}
```

⚠️ Potential issue | 🟠 Major

Numeric literals are still rounded before CompareNumeric sees them.

This only fixes the comparison step. scanNumber earlier in the file (lines 210-236) still materializes every literal as float64 via strconv.ParseFloat, so comparison clauses like `n1.id = 415588913294348289` are already lossy before CompareNumeric runs (integers >2^53 lose precision), and decimals such as 0.1 can fail to match exact json.Number row values.

The fix is valid: CompareNumeric already handles json.Number correctly—asInt64() and toBigFloat() both accept and parse it losslessly. Storing tokens as json.Number preserves the original text and allows accurate comparison.

Minimal fix (touches scanNumber earlier in this file)
```diff
 import (
+	"encoding/json"
 	"fmt"
-	"strconv"
 	"strings"
 	"unicode"

 	utils "github.com/pgedge/ace/pkg/common"
 )
@@
 func (l *lexer) scanNumber() (token, error) {
 	...
 	text := l.input[start:l.pos]
-	num, err := strconv.ParseFloat(text, 64)
-	if err != nil {
+	if _, err := json.Number(text).Float64(); err != nil {
 		return token{}, fmt.Errorf("invalid number %q at pos %d", text, start)
 	}
-	return token{typ: tokNumber, lit: text, pos: start, value: num}, nil
+	return token{typ: tokNumber, lit: text, pos: start, value: json.Number(text)}, nil
 }
```
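Applied in isolation, the suggested tokenization amounts to validating the literal's syntax while keeping its original text. A minimal standalone sketch (`numberToken` and `scanNumberToken` are hypothetical names, not the parser's actual types):

```go
package main

import (
	"encoding/json"
	"fmt"
)

// numberToken carries the literal as json.Number so a downstream exact
// comparator (utils.CompareNumeric in the PR) can stay lossless.
type numberToken struct {
	lit   string
	value json.Number
}

// scanNumberToken validates the literal's syntax via json.Number.Float64
// without keeping the rounded float64 result.
func scanNumberToken(text string) (numberToken, error) {
	if _, err := json.Number(text).Float64(); err != nil {
		return numberToken{}, fmt.Errorf("invalid number %q", text)
	}
	return numberToken{lit: text, value: json.Number(text)}, nil
}

func main() {
	tok, err := scanNumberToken("415588913294348289")
	fmt.Println(tok.value, err) // 415588913294348289 <nil>
	_, err = scanNumberToken("1.2.3")
	fmt.Println(err)
}
```

Float64 here is used purely as a syntax check; the token stores the untouched text, so precision is decided later at comparison time rather than at lexing time.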
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/consistency/repair/plan/parser/parser.go` around lines 617 - 633,
The numeric tokenization still parses all numbers to float64 in scanNumber,
causing precision loss before CompareNumeric runs; update scanNumber to capture
the original numeric text and store it as a json.Number token (instead of using
strconv.ParseFloat -> float64) so that utils.CompareNumeric can operate on
json.Number losslessly, and ensure any token struct/enum used by parseExpression
(the token carrying numeric values) is adjusted to hold json.Number (and
preserve existing paths that expect float64 by converting only when necessary).

Comment on lines +548 to +563
```go
if n, ok := val.(json.Number); ok {
	if i64, err := n.Int64(); err == nil {
		return i64, nil
	}
	// Handle fractional representation (e.g. "42.0") by parsing
	// as float64, but only when the value is in float64's exact
	// integer range to avoid silent precision loss.
	f, err := n.Float64()
	if err != nil {
		return nil, fmt.Errorf("failed to parse json.Number %q as integer for %s: %w", n.String(), pgType, err)
	}
	const maxSafeInt = 1 << 53
	if f > maxSafeInt || f < -maxSafeInt {
		return nil, fmt.Errorf("json.Number %q exceeds safe integer range for float64-to-int64 conversion (%s)", n.String(), pgType)
	}
	return int64(f), nil
```

⚠️ Potential issue | 🟡 Minor


Add integer validation to reject fractional json.Number values.

After Int64() fails, this code accepts any Float64() in the safe range and casts it to int64, so json.Number("42.5") silently becomes 42. Integer columns should fail fast rather than truncate fractional input.

Suggested fix
```diff
 			f, err := n.Float64()
 			if err != nil {
 				return nil, fmt.Errorf("failed to parse json.Number %q as integer for %s: %w", n.String(), pgType, err)
 			}
 			const maxSafeInt = 1 << 53
 			if f > maxSafeInt || f < -maxSafeInt {
 				return nil, fmt.Errorf("json.Number %q exceeds safe integer range for float64-to-int64 conversion (%s)", n.String(), pgType)
 			}
+			if math.Trunc(f) != f {
+				return nil, fmt.Errorf("json.Number %q is not an integer for %s", n.String(), pgType)
+			}
 			return int64(f), nil
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/common/utils.go` around lines 548 - 563, The json.Number handling branch
currently falls back to parsing Float64 and casting to int64 which silently
truncates fractional values; update the block that handles val.(json.Number) so
that after parsing f, you verify it's an exact integer (e.g., compare f to
math.Trunc(f) or check fractional part) and if it has a fractional component
return an error mentioning the original json.Number and pgType instead of
casting, while still keeping the existing safe-range check for float64-to-int64
conversion.
