codepuke/gobdotnet

gobdotnet

A pure C# encoder and decoder for Go's encoding/gob binary serialization format. Sister library to pygob.

Any byte stream produced by Go's encoder decodes correctly in C#. Any byte stream produced by gobdotnet decodes correctly in Go.

Targets: .NET 10, no external runtime dependencies, NativeAOT-compatible via source generator.


Quick Start

using GobDotNet;

// ── Decode a Go-produced gob stream ──────────────────────────────────────────

byte[] gobBytes = File.ReadAllBytes("data.gob");
var point = Gob.Decode<GobObject>(gobBytes);
Console.WriteLine($"X={point["X"]}, Y={point["Y"]}");

// ── Encode a struct and send it to a Go service ───────────────────────────────

byte[] encoded = Gob.Encode(new Point { X = 1, Y = 2 });

// ── Round-trip with time.Time and uuid.UUID ───────────────────────────────────

byte[] withTime = File.ReadAllBytes("event.gob");
var evt = Gob.Decode<GobObject>(withTime, DefaultCodecs.All);
// evt["CreatedAt"] is a DateTimeOffset; evt["Id"] is a Guid

// ── Type declarations (in a single file, these must follow top-level statements) ──

[GobStruct("Point")]
public partial class Point
{
    public long X { get; set; }
    public long Y { get; set; }
}
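
For orientation, the Go side of the Quick Start might look like the sketch below. It is a minimal illustration using only the standard library; the `Point` type and the helper names `encodePoint` and `decodePoint` are assumptions made for this example, not part of gobdotnet or of any particular Go service.

```go
package main

import (
	"bytes"
	"encoding/gob"
	"fmt"
)

// Point mirrors the C# Point class (illustrative; Go int64 pairs with C# long).
type Point struct {
	X, Y int64
}

// encodePoint returns the gob stream for a single Point value.
func encodePoint(p Point) ([]byte, error) {
	var buf bytes.Buffer
	if err := gob.NewEncoder(&buf).Encode(p); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

// decodePoint reads one Point back from a gob stream.
func decodePoint(data []byte) (Point, error) {
	var p Point
	err := gob.NewDecoder(bytes.NewReader(data)).Decode(&p)
	return p, err
}

func main() {
	data, err := encodePoint(Point{X: 1, Y: 2})
	if err != nil {
		panic(err)
	}
	// A real producer would write these bytes to data.gob or a socket.
	p, err := decodePoint(data)
	if err != nil {
		panic(err)
	}
	fmt.Printf("%d bytes, X=%d, Y=%d\n", len(data), p.X, p.Y)
}
```

Bytes produced this way are the kind of input `Gob.Decode<GobObject>` is described as accepting above.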

Installation

gobdotnet is not yet published to NuGet. Add it to your solution as a project reference:

<ItemGroup>
  <ProjectReference Include="../gobdotnet/GobDotNet/GobDotNet.csproj" />
  <!-- Optional: source generator for AOT support and compile-time validation -->
  <ProjectReference Include="../gobdotnet/GobDotNet.SourceGenerators/GobDotNet.SourceGenerators.csproj"
                    OutputItemType="Analyzer" ReferenceOutputAssembly="false" />
</ItemGroup>

Core Concepts

GobObject is what you get when decoding a Go struct with no registered C# type. It behaves like a read-only dictionary with string keys, and carries the Go type name and schema needed to re-encode it.

GobSchema describes a struct type: its Go name and ordered list of (string Name, GobFieldType Type) field descriptors. You can derive one automatically from a [GobStruct] class or build one by hand.

GobFieldType is the type descriptor for a field: GobFieldType.Int, GobFieldType.String, GobFieldType.SliceOf(...), etc. These correspond directly to Go's wire type IDs.

GobEncoded wraps the raw bytes for a Go type implementing GobEncoder, BinaryMarshaler, or TextMarshaler when no C# codec is registered. Inspect .TypeName and .Data, or register a codec to get a typed value instead.


API Reference

Gob — Convenience Functions

One-shot helpers that create a fresh encoder/decoder per call. Inherently thread-safe.

// Encode a value to a byte array.
byte[] Gob.Encode<T>(T value, IReadOnlyDictionary<string, IGobCodec>? codecs = null);

// Encode a dictionary with an explicit schema.
byte[] Gob.Encode(IDictionary<string, object?> value, GobSchema schema,
                  IReadOnlyDictionary<string, IGobCodec>? codecs = null);

// Decode the first value from a byte array.
object? Gob.Decode(byte[] data, IReadOnlyDictionary<string, IGobCodec>? codecs = null);

// Decode and cast to T. Throws InvalidCastException on type mismatch.
T Gob.Decode<T>(byte[] data, IReadOnlyDictionary<string, IGobCodec>? codecs = null);

GobEncoder

Stream-oriented, thread-safe encoder. Keeps type definition state across multiple Encode calls — the same type ID is reused for the same schema, matching Go's wire protocol.

// Construction
var enc = new GobEncoder(stream);                        // no codecs
var enc = new GobEncoder(stream, DefaultCodecs.All);     // with time/UUID codecs

// Encoding
enc.Encode(new Point { X = 1, Y = 2 });                 // [GobStruct] POCO
enc.Encode(dict, schema);                                // explicit schema
enc.Encode(gobObject);                                   // re-encode a decoded GobObject

// Post-construction registration
enc.Register("main.Point", schema);                      // for interface field concrete types
enc.RegisterCodec("MyType", myCodec);                    // custom marshaler codec

Caller owns the stream. GobEncoder does not implement IDisposable.
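
The effect of keeping type definition state can be observed on the Go side. The sketch below (standard library only; `Point` and `messageSizes` are names invented for this example) encodes the same struct type twice on one encoder and measures how many bytes each call adds. The first call carries the type definition, so later calls should be noticeably smaller.

```go
package main

import (
	"bytes"
	"encoding/gob"
	"fmt"
)

// Point is an illustrative struct; any struct type behaves the same way.
type Point struct{ X, Y int64 }

// messageSizes encodes two values of the same type on one encoder and
// reports how many bytes each Encode call appended to the stream.
func messageSizes() (first, second int, err error) {
	var buf bytes.Buffer
	enc := gob.NewEncoder(&buf)
	if err = enc.Encode(Point{X: 1, Y: 2}); err != nil {
		return 0, 0, err
	}
	first = buf.Len() // type definition message + value message
	if err = enc.Encode(Point{X: 3, Y: 4}); err != nil {
		return 0, 0, err
	}
	second = buf.Len() - first // value message only
	return first, second, nil
}

func main() {
	first, second, err := messageSizes()
	if err != nil {
		panic(err)
	}
	fmt.Printf("first Encode: %d bytes, second Encode: %d bytes\n", first, second)
}
```

This is also why a stream must be read by one decoder from the start: a decoder that joins mid-stream never sees the type definition.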

GobDecoder

Stream-oriented, thread-safe decoder. Maintains a type registry across calls — type definitions received in earlier messages are reused in later ones.

// Construction
var dec = new GobDecoder(stream);
var dec = new GobDecoder(stream, DefaultCodecs.All);

// Decoding
object? value  = dec.Decode();               // returns GobObject, long, List<object?>, etc.
Point point    = dec.Decode<Point>();        // casts; throws InvalidCastException on mismatch
bool ok        = dec.TryDecode(out var v);  // returns false at end of stream
bool ok        = dec.TryDecode<Point>(out var p);

// Multi-message stream: call Decode() / TryDecode() in a loop
while (dec.TryDecode<GobObject>(out var record))
    Process(record);

// Post-construction registration
dec.Register<Point>("Point");                // map Go struct name → C# type
dec.RegisterCodec("Time", TimeCodec.Instance);

Caller owns the stream. GobDecoder does not implement IDisposable. Decode() throws EndOfStreamException when the stream is exhausted; TryDecode returns false instead.


Type Mapping

| Go type | C# type | Notes |
|---|---|---|
| int, int64 | long | Go's default int size |
| uint, uint64 | ulong | |
| int32, int16, int8 | int, short, sbyte | Encoded as signed gob int |
| uint32, uint16, uint8 | uint, ushort, byte | Encoded as unsigned gob int |
| bool | bool | |
| float64 | double | |
| float32 | float | |
| complex128 | System.Numerics.Complex | |
| string | string | |
| []byte | byte[] | |
| []T | List<T> or T[] | Decoded as List<object?>; any IEnumerable<T> encodes |
| [N]T | fixed-length array via GobFieldType.ArrayOf | Length not preserved in decoded value |
| map[K]V | Dictionary<K, V> | Decoded as Dictionary<object, object?> |
| struct | [GobStruct] POCO or GobObject | POCO requires registration |
| interface{} | object? | Concrete value embedded; structs become GobObject |
| time.Time | DateTimeOffset | Requires DefaultCodecs.All |
| uuid.UUID | Guid | Requires DefaultCodecs.All |
| time.Duration | TimeSpan via GobFieldType.Duration | 100 ns tick precision |
| GobEncoder/BinaryMarshaler/TextMarshaler | GobEncoded or custom type | Register a codec for typed decoding |

Defining Schemas

[GobStruct] Attribute

The primary way to define an encodable struct. Add partial to enable the source generator (recommended for correctness and AOT compatibility).

[GobStruct("Point")]          // Go type name on the wire
public partial class Point
{
    public long X { get; set; }
    public long Y { get; set; }
}

[GobStruct("Person")]
public partial class Person
{
    [GobField(Order = 1)]             // explicit wire order when C# order differs from Go
    public string Name { get; set; } = "";

    [GobField(Order = 2)]
    public long Age { get; set; }

    [GobField(Order = 3, Name = "City")]   // override the wire name; Go field names must be exported (capitalized)
    public string HomeCity { get; set; } = "";

    [GobField(Ignore = true)]         // skip this property entirely
    public string CacheKey { get; set; } = "";
}

Field ordering on the wire must match the Go struct's source declaration order. Use [GobField(Order = N)] when your C# property order differs from Go's. If any property has Order set, all must.

Explicit GobSchema

For plain dictionaries, GobObject re-encoding, or any case where a POCO isn't practical:

var schema = new GobSchema("Point",
    ("X", GobFieldType.Int),
    ("Y", GobFieldType.Int));

var encoded = Gob.Encode(
    new Dictionary<string, object?> { ["X"] = 1L, ["Y"] = 2L },
    schema);

Derive a schema from a [GobStruct] type at runtime:

GobSchema schema = GobSchema.For<Point>();   // source generator path, then reflection fallback
GobSchema schema = GobSchema.For(typeof(Point));

GobFieldType Descriptors

// Primitives
GobFieldType.Bool
GobFieldType.Int        // signed: long, int, short, sbyte
GobFieldType.UInt       // unsigned: ulong, uint, ushort, byte
GobFieldType.Float      // double, float
GobFieldType.Bytes      // byte[]
GobFieldType.String
GobFieldType.Complex    // System.Numerics.Complex
GobFieldType.Interface  // object?

// Well-known semantic types
GobFieldType.Duration   // TimeSpan ↔ int64 nanoseconds

// Composites
GobFieldType.SliceOf(GobFieldType.Int)
GobFieldType.MapOf(GobFieldType.String, GobFieldType.Int)
GobFieldType.ArrayOf(GobFieldType.Int, length: 3)   // Go fixed-length array
GobFieldType.StructOf(nestedSchema)

// Marshaler types
GobFieldType.Marshaler("Time", "gob")     // Go time.Time (implements GobEncoder)
GobFieldType.Marshaler("UUID", "binary")  // uuid.UUID (implements BinaryMarshaler)

Duration example

GobFieldType.Duration converts TimeSpan to and from Go's int64 nanoseconds on the wire:

var schema = new GobSchema("Event",
    ("Name", GobFieldType.String),
    ("Duration", GobFieldType.Duration));   // TimeSpan ↔ int64 nanoseconds

var encoded = Gob.Encode(
    new Dictionary<string, object?> { ["Name"] = "ping", ["Duration"] = TimeSpan.FromSeconds(5) },
    schema);

var decoded = Gob.Decode<GobObject>(encoded);
// The decoder returns the raw wire value (int64 nanoseconds) when no Duration schema is supplied:
// decoded["Duration"] == 5_000_000_000L
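
The raw-nanoseconds behaviour can be reproduced on the Go side. In the sketch below (standard library only; `Event`, `rawEvent`, `wireNanos`, and `dotNetTicks` are names invented for this example), a `time.Duration` field is encoded and then decoded into a plain `int64` field, which works because gob transmits a duration as an ordinary signed integer. The second helper shows the division by 100 that a conversion to 100 ns ticks implies.

```go
package main

import (
	"bytes"
	"encoding/gob"
	"fmt"
	"time"
)

// Event is what a Go producer might send (illustrative).
type Event struct {
	Name     string
	Duration time.Duration
}

// rawEvent has the same field names but a plain int64, which is how a
// decoder with no Duration knowledge sees the value.
type rawEvent struct {
	Name     string
	Duration int64
}

// wireNanos encodes an Event and returns the integer a schema-less
// decoder would observe for the Duration field.
func wireNanos(d time.Duration) (int64, error) {
	var buf bytes.Buffer
	if err := gob.NewEncoder(&buf).Encode(Event{Name: "ping", Duration: d}); err != nil {
		return 0, err
	}
	var raw rawEvent
	if err := gob.NewDecoder(&buf).Decode(&raw); err != nil {
		return 0, err
	}
	return raw.Duration, nil
}

// dotNetTicks converts nanoseconds to 100 ns ticks, dropping any remainder.
func dotNetTicks(nanos int64) int64 { return nanos / 100 }

func main() {
	n, err := wireNanos(5 * time.Second)
	if err != nil {
		panic(err)
	}
	fmt.Println(n, dotNetTicks(n)) // 5000000000 50000000
}
```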

Source Generator

The GobDotNet.SourceGenerators project is an incremental Roslyn source generator that runs at compile time on any partial class decorated with [GobStruct].

What it generates for each eligible class:

  • A cached GobSchema static field derived from property declaration order (reliable, unlike MetadataToken ordering from reflection).
  • An IGobStructGenerated interface implementation with CreateFromFields and WriteFields — pure code with no reflection at runtime, fully NativeAOT-compatible.

GobSchema.For<T>() checks for IGobStructGenerated first and uses the generated schema, falling back to reflection only for non-partial classes.

Compile-time diagnostics:

| Code | Condition |
|---|---|
| GOB001 | [GobStruct] class is not partial — source generator cannot extend it |
| GOB002 | Mixed [GobField(Order = N)] usage — some properties have Order, others don't |
| GOB003 | Unsupported property type (e.g., BigInteger) |
| GOB004 | [GobStruct] on an abstract class or interface |

Non-partial classes silently fall back to reflection; GOB001 is informational, not an error.


Codecs — time.Time and uuid.UUID

Go's time.Time and UUID types implement marshaler interfaces. Pass DefaultCodecs.All to the encoder/decoder to handle them automatically:

// Decode a Go struct containing time.Time and uuid.UUID fields
var decoded = Gob.Decode<GobObject>(bytes, DefaultCodecs.All);
var createdAt = (DateTimeOffset)decoded["CreatedAt"]!;
var id        = (Guid)decoded["Id"]!;

// Without DefaultCodecs.All, marshaler values decode as GobEncoded:
var raw = Gob.Decode<GobObject>(bytes);
var enc = (GobEncoded)raw["CreatedAt"]!;  // enc.TypeName == "Time", enc.Data.Length == 15

TimeCodec (DefaultCodecs.All["Time"]):

  • Decodes Go time.Time → DateTimeOffset. UTC times decode with TimeSpan.Zero offset.
  • Encodes DateTimeOffset → Go time.Time wire format (15 bytes, version 1).
  • Precision: DateTimeOffset has 100 ns tick precision; sub-tick nanoseconds from Go are truncated on decode.
  • Offset narrowing: The wire format stores whole minutes only. Sub-minute offsets are silently truncated on encode (current civil time zones all use whole-minute offsets; only some historical local-mean-time zones do not).
  • Zone name loss: Go's IANA zone name (e.g., "America/New_York") cannot be stored in DateTimeOffset; only the numeric offset survives.
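
The 15-byte, version 1 layout mentioned above can be inspected from Go, since `time.Time`'s gob encoding is the output of `MarshalBinary`. The sketch below is illustrative only: `timeWire`, `sampleTime`, and `truncateToTicks` are names made up for this example. It reads back the size, the version byte, and the trailing two-byte zone offset in minutes, and shows the arithmetic behind 100 ns truncation.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
	"time"
)

// timeWire reports the encoded size, version byte, and zone offset field
// (whole minutes, big-endian int16; -1 marks UTC) of a time.Time.
func timeWire(t time.Time) (size int, version byte, offsetMin int16, err error) {
	b, err := t.MarshalBinary() // time.Time's GobEncode returns these same bytes
	if err != nil {
		return 0, 0, 0, err
	}
	if len(b) < 15 {
		return 0, 0, 0, errors.New("unexpectedly short time encoding")
	}
	offsetMin = int16(binary.BigEndian.Uint16(b[13:15]))
	return len(b), b[0], offsetMin, nil
}

// sampleTime builds a fixed instant in a zone with the given offset.
func sampleTime(offsetSeconds int) time.Time {
	return time.Date(2024, 1, 2, 3, 4, 5, 123456789, time.FixedZone("", offsetSeconds))
}

// truncateToTicks drops the part of a nanosecond count that does not fit
// in a 100 ns tick.
func truncateToTicks(nsec int) int { return nsec - nsec%100 }

func main() {
	size, version, off, err := timeWire(sampleTime(5*3600 + 30*60))
	if err != nil {
		panic(err)
	}
	fmt.Println(size, version, off)         // 15 1 330
	fmt.Println(truncateToTicks(123456789)) // 123456700
}
```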

GuidCodec (DefaultCodecs.All["UUID"]):

  • Decodes Go uuid.UUID (16-byte RFC 4122 big-endian) → Guid.
  • Compatible with github.com/google/uuid, github.com/gofrs/uuid, and github.com/satori/go.uuid — all produce the same wire format.
  • Uses new Guid(span, bigEndian: true) (.NET 8+) for direct big-endian construction.
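
Why the big-endian constructor matters: .NET's older `Guid(byte[])` constructor and `Guid.ToByteArray()` treat the first three fields as little-endian, so the same 16 bytes would otherwise map to a different GUID. The sketch below illustrates that reordering in Go; `toDotNetLayout` is a hypothetical helper written for this example, not something any of the UUID packages provide.

```go
package main

import "fmt"

// toDotNetLayout reorders RFC 4122 (big-endian) UUID bytes into the layout
// used by .NET's Guid(byte[]) and Guid.ToByteArray(), where the first
// three fields (4, 2, and 2 bytes) are little-endian. Applying it twice
// returns the original bytes.
func toDotNetLayout(u [16]byte) [16]byte {
	return [16]byte{
		u[3], u[2], u[1], u[0], // time_low, byte-swapped
		u[5], u[4], // time_mid, byte-swapped
		u[7], u[6], // time_hi_and_version, byte-swapped
		u[8], u[9], u[10], u[11], u[12], u[13], u[14], u[15], // unchanged
	}
}

func main() {
	wire := [16]byte{0x00, 0x11, 0x22, 0x33, 0x44, 0x55, 0x66, 0x77,
		0x88, 0x99, 0xaa, 0xbb, 0xcc, 0xdd, 0xee, 0xff}
	fmt.Printf("wire:   %x\n", wire)
	fmt.Printf("legacy: %x\n", toDotNetLayout(wire))
}
```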

Interface Values

Go's interface{} fields are fully supported on both sides of the wire.

Decoding (Go → C#)

No registration is required. The gob stream is self-describing: the concrete type definition is embedded inline, and the decoder reconstructs the value automatically. A struct held in an interface field decodes as a GobObject with GobType set to the qualified Go type name (e.g. "main.Point"):

var container = Gob.Decode<GobObject>(bytes);
var inner = (GobObject)container["Value"]!;
Console.WriteLine(inner.GobType);   // "main.Point"
Console.WriteLine(inner["X"]);      // 7L

Encoding (C# → Go)

Before encoding a struct that contains an interface{} field, register the concrete type's qualified Go name and its schema:

var pointSchema = new GobSchema("Point", ("X", GobFieldType.Int), ("Y", GobFieldType.Int));
var containerSchema = new GobSchema("Container",
    ("Name", GobFieldType.String),
    ("Value", GobFieldType.Interface));

var enc = new GobEncoder(stream);
enc.Register("main.Point", pointSchema);   // qualified Go name

var point = new GobObject("main.Point", pointSchema,
[
    new KeyValuePair<string, object?>("X", 7L),
    new KeyValuePair<string, object?>("Y", 13L)
]);
enc.Encode(new Dictionary<string, object?> { ["Name"] = "hello", ["Value"] = point },
    containerSchema);

Types annotated with [GobStruct] are auto-registered by the source generator — no manual Register() call is needed for those types.
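
The registration requirement mirrors Go's own rule: a concrete type carried in an interface field must be registered under a name that both sides agree on. A minimal Go-side sketch follows, assuming a receiving service that declares `Point` and `Container` as shown; these type definitions and the `roundTrip` helper are assumptions for this example.

```go
package main

import (
	"bytes"
	"encoding/gob"
	"errors"
	"fmt"
)

type Point struct{ X, Y int64 }

type Container struct {
	Name  string
	Value interface{}
}

func init() {
	// Explicit name so the example does not depend on the package path;
	// in package main, gob.Register(Point{}) would pick the same name.
	gob.RegisterName("main.Point", Point{})
}

// roundTrip encodes a Container and returns the Point recovered from its
// interface field.
func roundTrip(c Container) (Point, error) {
	var buf bytes.Buffer
	if err := gob.NewEncoder(&buf).Encode(c); err != nil {
		return Point{}, err
	}
	var out Container
	if err := gob.NewDecoder(&buf).Decode(&out); err != nil {
		return Point{}, err
	}
	p, ok := out.Value.(Point)
	if !ok {
		return Point{}, errors.New("interface field did not decode as Point")
	}
	return p, nil
}

func main() {
	p, err := roundTrip(Container{Name: "hello", Value: Point{X: 7, Y: 13}})
	if err != nil {
		panic(err)
	}
	fmt.Println(p.X, p.Y) // 7 13
}
```

The name passed to `enc.Register` on the C# side needs to match whatever name the Go program registered, since the name travels in the stream ahead of the value.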


Custom Codecs

Implement IGobCodec<T> to handle any Go type that implements GobEncoder, BinaryMarshaler, or TextMarshaler:

public sealed class MyTypeCodec : IGobCodec<MyType>
{
    public string MarshalerType => "binary";  // "gob", "binary", or "text"

    public MyType Decode(ReadOnlySpan<byte> data)
    {
        // Parse data → MyType
        return new MyType(data);
    }

    public byte[] Encode(MyType value)
    {
        // Serialize value → bytes
        return value.ToBytes();
    }
}

// Register at construction. Built-in codecs work here; a codec that implements only IGobCodec<T> is not invoked on this path (see the note below):
var codecs = new Dictionary<string, IGobCodec>
{
    ["MyType"] = new MyTypeCodec(),
    ["Time"]   = TimeCodec.Instance,
};
var dec = new GobDecoder(stream, codecs);

// Or register after construction; this path calls MyTypeCodec.Decode and returns a typed value:
var dec = new GobDecoder(stream);
dec.RegisterCodec("MyType", new MyTypeCodec());

Note: Codecs passed via the constructor dictionary use the ICodecObjectDecoder internal interface. If your codec implements only IGobCodec<T> (the public interface), use RegisterCodec<T>() after construction to ensure your Decode method is called. Codecs registered via the constructor that don't implement the internal interface will return GobEncoded instead.
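
For context, this is roughly what the Go side of a "binary" marshaler type looks like. The sketch below is a hypothetical stand-in for MyType: the 4-byte big-endian payload, the `Envelope` wrapper, and the `roundTrip` helper are all invented for this example. The byte slice returned by `MarshalBinary` is the opaque data a codec on the C# side would receive.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"encoding/gob"
	"errors"
	"fmt"
)

// MyType has no exported fields, so gob can only carry it through its
// marshaler methods (hypothetical example type).
type MyType struct{ id uint32 }

// MarshalBinary produces the opaque payload placed on the wire.
func (m MyType) MarshalBinary() ([]byte, error) {
	b := make([]byte, 4)
	binary.BigEndian.PutUint32(b, m.id)
	return b, nil
}

// UnmarshalBinary restores the value from that payload.
func (m *MyType) UnmarshalBinary(data []byte) error {
	if len(data) != 4 {
		return errors.New("MyType: expected 4 bytes")
	}
	m.id = binary.BigEndian.Uint32(data)
	return nil
}

// Envelope is an illustrative struct holding the marshaler-backed field.
type Envelope struct{ Payload MyType }

// roundTrip sends an Envelope through gob and returns the recovered id.
func roundTrip(id uint32) (uint32, error) {
	var buf bytes.Buffer
	if err := gob.NewEncoder(&buf).Encode(Envelope{Payload: MyType{id: id}}); err != nil {
		return 0, err
	}
	var out Envelope
	if err := gob.NewDecoder(&buf).Decode(&out); err != nil {
		return 0, err
	}
	return out.Payload.id, nil
}

func main() {
	id, err := roundTrip(42)
	if err != nil {
		panic(err)
	}
	fmt.Println(id) // 42
}
```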


Semantic Types

Go allows named primitive types (type Status string, type Count int64). Map them to C# types with semantic converters:

// type Count int64 in Go → int in C# (narrowing cast on decode)
var countType = GobFieldType.SemanticInt<int>(
    decode: l => (int)l,
    encode: i => (long)i,
    zero:   0);

// type Tag string in Go → enum in C#
var tagType = GobFieldType.SemanticString<Tag>(
    decode: s => Enum.Parse<Tag>(s),
    encode: t => t.ToString(),
    zero:   Tag.Unknown);

var schema = new GobSchema("Event",
    ("Count", countType),
    ("Tag",   tagType));

var encoded = Gob.Encode(
    new Dictionary<string, object?> { ["Count"] = 42, ["Tag"] = Tag.Important },
    schema);

Semantic types are encoder-side only: the decoder sees the underlying wire type (long, ulong, double, or string) and returns that. The conversion back to the C# type is the caller's responsibility.

Available factories: GobFieldType.SemanticInt<T>, GobFieldType.SemanticUInt<T>, GobFieldType.SemanticString<T>, GobFieldType.SemanticFloat<T>.
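
The reason the decoder only sees the underlying primitive is that gob does not transmit the names of named primitive types. A Go-side sketch of this, under the assumption of a producer with `Count` and `Status` types as declared below (all names here are illustrative):

```go
package main

import (
	"bytes"
	"encoding/gob"
	"fmt"
)

// Named primitive types, as a Go producer might declare them.
type Count int64
type Status string

type Event struct {
	Count Count
	Tag   Status
}

// plainEvent uses the underlying kinds only; gob treats it as compatible
// because the named types are sent as plain int and string.
type plainEvent struct {
	Count int64
	Tag   string
}

// underlying encodes an Event and decodes it as plainEvent.
func underlying(e Event) (int64, string, error) {
	var buf bytes.Buffer
	if err := gob.NewEncoder(&buf).Encode(e); err != nil {
		return 0, "", err
	}
	var out plainEvent
	if err := gob.NewDecoder(&buf).Decode(&out); err != nil {
		return 0, "", err
	}
	return out.Count, out.Tag, nil
}

func main() {
	n, tag, err := underlying(Event{Count: 42, Tag: "Important"})
	if err != nil {
		panic(err)
	}
	fmt.Println(n, tag) // 42 Important
}
```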


Error Handling

// End of stream
try { dec.Decode(); }
catch (EndOfStreamException) { /* stream exhausted */ }

// Or use TryDecode:
if (!dec.TryDecode(out var value)) { /* end of stream */ }

// Format errors
try { dec.Decode(); }
catch (GobDecodeException ex) { /* malformed data */ }

try { enc.Encode(value); }
catch (GobEncodeException ex) { /* unsupported type, missing codec, etc. */ }

// Type mismatch
try { dec.Decode<Point>(); }
catch (InvalidCastException) { /* decoded value is not a Point */ }

Exception hierarchy:

GobException : Exception
├── GobDecodeException   (malformed wire data)
└── GobEncodeException   (unsupported type, missing codec, schema error)

EndOfStreamException (BCL) is thrown by Decode() at end of stream; TryDecode returns false instead.

BigInteger as a [GobStruct] property type throws GobEncodeException at schema derivation — fail-loud, not silent truncation.


Thread Safety

GobEncoder and GobDecoder serialize concurrent method calls via an internal lock. All of Encode, Register, and RegisterCodec on the encoder, and Decode, TryDecode, Register, and RegisterCodec on the decoder, share the same lock per instance.

// Safe: multiple threads encoding on the same encoder (message order on the stream is unspecified)
var enc = new GobEncoder(sharedStream);
Parallel.ForEach(items, item => enc.Encode(item));

// Safe: multiple threads decoding from the same decoder
var dec = new GobDecoder(sharedStream);
// Note: each Decode() call returns the next value — concurrent calls
// get different values, not the same value duplicated.

Gob.Encode and Gob.Decode are inherently thread-safe because each call uses a fresh encoder/decoder instance.

GobObject, GobSchema, GobFieldType, and GobEncoded are immutable and safe to share across threads without synchronization.
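
Go's own `gob.Encoder` is documented as safe for concurrent use by multiple goroutines, so the same sharing pattern applies on the producing side. The sketch below is an illustration with invented names (`Item`, `encodeConcurrently`): many goroutines encode on one encoder, then a single decoder reads the stream back and tallies what arrived. Message order is not deterministic, so only the count and the sum are checked.

```go
package main

import (
	"bytes"
	"encoding/gob"
	"errors"
	"fmt"
	"io"
	"sync"
)

// Item is an illustrative payload type.
type Item struct{ ID int64 }

// encodeConcurrently has n goroutines encode Items 1..n on one shared
// encoder, then decodes the stream and returns the count and ID sum.
func encodeConcurrently(n int) (count int, sum int64, err error) {
	var buf bytes.Buffer
	enc := gob.NewEncoder(&buf) // the encoder serializes writes internally
	errs := make(chan error, n)
	var wg sync.WaitGroup
	for i := 1; i <= n; i++ {
		wg.Add(1)
		go func(id int64) {
			defer wg.Done()
			if e := enc.Encode(Item{ID: id}); e != nil {
				errs <- e
			}
		}(int64(i))
	}
	wg.Wait()
	close(errs)
	if e, ok := <-errs; ok {
		return 0, 0, e
	}
	dec := gob.NewDecoder(&buf)
	for {
		var it Item
		e := dec.Decode(&it)
		if errors.Is(e, io.EOF) {
			return count, sum, nil
		}
		if e != nil {
			return count, sum, e
		}
		count++
		sum += it.ID
	}
}

func main() {
	count, sum, err := encodeConcurrently(50)
	if err != nil {
		panic(err)
	}
	fmt.Println(count, sum) // 50 1275
}
```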


Benchmarks

Measured on Apple M3 Max, .NET 10.0.5, BenchmarkDotNet short job. Compared against Newtonsoft.Json for equivalent payloads.

| Scenario | Gob | JSON | Ratio (Gob / JSON) |
|---|---|---|---|
| Scalar int encode | 161 ns | 86 ns | 1.9× |
| Scalar int decode | 210 ns | 123 ns | 1.7× |
| Scalar string encode | 183 ns | 124 ns | 1.5× |
| Scalar string decode | 242 ns | 178 ns | 1.4× |
| Struct (2 fields) encode | 439 ns | 139 ns | 3.2× |
| Struct (2 fields) decode | 555 ns | 336 ns | 1.7× |
| Nested struct encode | 1,058 ns | 309 ns | 3.4× |
| Nested struct decode | 1,649 ns | 875 ns | 1.9× |
| Slice of 1000 encode | 15,489 ns | 17,119 ns | 0.9× (gob faster) |
| Slice of 1000 decode | 10,001 ns | 29,013 ns | 0.3× (gob faster) |
| Map of 1000 encode | 35,430 ns | 31,307 ns | 1.1× |
| Map of 1000 decode | 55,533 ns | 77,605 ns | 0.7× (gob faster) |
| Mixed round-trip | 1,382 ns | 876 ns | 1.6× |

Summary: Scalars, struct decode, and mixed payloads are within the 2× target. Struct encode is 3.2–3.4× slower than JSON (dictionary lookup overhead in the benchmark setup). Large collections are faster than JSON in three of the four cases (slice encode, slice decode, map decode) because the binary format is more compact and avoids text parsing; map encode is about 1.1× slower. See PROGRESS.md for context.


Wire Format Compatibility Notes

  • User type IDs start at 65. Go's encoding/gob constant is firstUserId = 65; gobdotnet matches this. Because Go's in-process type registry accumulates IDs across multiple encode calls, a fresh C# encoder and an in-process Go encoder assign different IDs to the same struct — even though the decoded values are identical. Struct output must be validated by decoding and comparing values structurally, not byte-for-byte. Byte-level comparison is only reliable for scalars, which carry no user type IDs.
  • Go map iteration is non-deterministic. Never byte-compare map-containing gob output; compare decoded values structurally.
  • Field order on the wire must match Go's source declaration order. Use [GobField(Order = N)] when your C# property declaration order differs.
  • Zero-valued fields are omitted on the wire. The decoder pre-populates all fields with zero values before the delta loop.
  • The partial modifier is optional but recommended. Without it, schema field order is inferred from MetadataToken, which matches source order on current .NET runtimes but is not ECMA-guaranteed.
  • Nested struct fields are unwrapped. A nested struct value is raw delta-encoded bytes — no type def, no byte-count prefix.
  • Collection wire types use empty CommonType.Name. The Id field arrives with delta=2 (skipping the absent Name).
  • Top-level non-struct values use a singleton wrapper. The payload is 0x00 encoded_value.
  • Float bytes are reversed. Go's float encoding is byte-reversed IEEE 754, then encoded as unsigned int.
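
The integer and float rules in the list above can be written out in a few lines. The sketch below is a re-implementation of those primitive encodings for illustration, not code taken from Go's standard library or from gobdotnet; the function names are made up. The expected outputs match the worked examples in the encoding/gob package documentation (17.0 encodes as FE 31 40, -129 as FE 01 01, 256 as FE 01 00).

```go
package main

import (
	"encoding/binary"
	"encoding/hex"
	"fmt"
	"math"
	"math/bits"
)

// encodeUint: values up to 0x7f are a single byte; larger values are a
// negated byte count followed by the minimal big-endian bytes.
func encodeUint(x uint64) []byte {
	if x <= 0x7f {
		return []byte{byte(x)}
	}
	var tmp [8]byte
	binary.BigEndian.PutUint64(tmp[:], x)
	n := 8 - bits.LeadingZeros64(x)/8 // number of significant bytes
	return append([]byte{byte(-n)}, tmp[8-n:]...)
}

// encodeInt: the sign goes in bit 0 (negatives are complemented first),
// and the result is sent as an unsigned integer.
func encodeInt(i int64) []byte {
	var u uint64
	if i < 0 {
		u = uint64(^i<<1) | 1
	} else {
		u = uint64(i << 1)
	}
	return encodeUint(u)
}

// encodeFloat: the IEEE 754 bits are byte-reversed so that small
// exponents and short mantissas yield short unsigned integers.
func encodeFloat(f float64) []byte {
	return encodeUint(bits.ReverseBytes64(math.Float64bits(f)))
}

// wireHex renders bytes as lowercase hex for display.
func wireHex(b []byte) string { return hex.EncodeToString(b) }

func main() {
	fmt.Println(wireHex(encodeFloat(17.0))) // fe3140
	fmt.Println(wireHex(encodeInt(-129)))   // fe0101
	fmt.Println(wireHex(encodeUint(256)))   // fe0100
	fmt.Println(wireHex(encodeUint(7)))     // 07
}
```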

Limitations

  • No async API. EncodeAsync/DecodeAsync are explicitly out of scope. Use a thread pool worker with the synchronous API if needed.
  • No pointer types. Go pointers are transparent in gob; gobdotnet does not model them.
  • No BigInteger. Throws GobEncodeException at schema derivation — fail-loud.
  • Array length not preserved on decode. Go [3]int decodes to object?[3]; the fixed-length annotation is lost (re-encoding with GobFieldType.ArrayOf restores wire fidelity).
  • time.Duration precision. TimeSpan has 100 ns tick precision; sub-tick nanoseconds are truncated on decode.
  • time.Time offset narrowing. Sub-minute offsets are truncated on encode (current civil time zones use whole-minute offsets).
  • Zone name loss. Go's IANA zone name does not survive a round-trip through DateTimeOffset.
  • Semantic type decode is encoder-side only. The decoder returns the underlying wire primitive (long, ulong, double, or string); callers convert back to the semantic type themselves.
  • Custom codecs via constructor don't call Decode. Pass a non-built-in codec via the codecs constructor parameter and the value returns as GobEncoded. Use RegisterCodec<T>() post-construction for typed decoding.

Development

# Run all tests
dotnet test

# Run only Go cross-validation tests (requires Go on PATH)
dotnet test --filter "Category=GoVerify"

# Verbose output
dotnet test --logger "console;verbosity=detailed"

# Run benchmarks
dotnet run -c Release --project GobDotNet.Benchmarks

# Regenerate testdata fixtures (requires Go)
go run GobDotNet.Tests/generate_testdata.go

# Manual Go verifier check
echo "" | go run ./GobDotNet.Tests/go_verify struct_simple

Test layers:

  1. Go → C# — decode every .gob file in testdata/ against its .json sidecar.
  2. C# → C# — round-trip property tests (FsCheck, 1000 iterations per shape) plus example-based round-trips.
  3. C# → Go — pipe C#-encoded output to go_verify; skipped when Go is not on PATH.
  4. Thread safety — 100-thread concurrent encode/decode stress tests.

Solution layout:

GobDotNet/                   Library (net10.0, no external deps, AOT-compatible)
GobDotNet.SourceGenerators/  [GobStruct] → compile-time schema (netstandard2.0)
GobDotNet.Tests/             xUnit + FsCheck + Go cross-validation
GobDotNet.Benchmarks/        BenchmarkDotNet vs. Newtonsoft.Json

Related Projects

  • pygob — Python port, the source of gobdotnet's testdata and mental model.
  • encoding/gob — Go's standard library implementation, the authoritative wire format specification.
