A pure-managed C# port of MinishLab Model2Vec
static-embedding inference. It loads Model2Vec model folders containing
model.safetensors, tokenizer.json, and config.json, then computes sentence
embeddings without Python, native libraries, or ONNX.
Model2Vec inference is:
- tokenize with the model tokenizer, without special tokens;
- remove unknown-token ids;
- truncate to maxLength tokens (default: 512);
- gather embedding rows for token ids;
- apply optional vocabulary-quantization weights if present;
- mean-pool over tokens, returning zeros for empty token lists;
- L2-normalize when config.json has "normalize": true.
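
The steps above can be sketched in a few lines of plain C#. This is an illustrative sketch, not the library's implementation: `Pool`, its parameters, and the tiny embedding table are hypothetical, tokenization is assumed to have already produced the token ids, and the optional weights are assumed to be indexed by token id.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper (not the library's API): turns token ids into one sentence vector.
static float[] Pool(
    IReadOnlyList<int> tokenIds,
    float[][] embeddings,        // one row per vocabulary entry
    int unknownId,
    int maxLength = 512,
    bool normalize = true,
    float[]? weights = null)     // optional per-token weights (assumed indexed by token id)
{
    int dim = embeddings[0].Length;
    var sum = new float[dim];
    int count = 0;

    foreach (int id in tokenIds)
    {
        if (id == unknownId) continue;    // remove unknown-token ids
        if (count == maxLength) break;    // truncate to maxLength kept tokens
        float w = weights is null ? 1f : weights[id];
        float[] row = embeddings[id];     // gather the embedding row
        for (int d = 0; d < dim; d++) sum[d] += w * row[d];
        count++;
    }

    if (count == 0) return sum;           // empty token list: return zeros

    for (int d = 0; d < dim; d++) sum[d] /= count;   // mean-pool over tokens

    if (normalize)
    {
        double squares = 0;
        foreach (float v in sum) squares += (double)v * v;
        float norm = (float)Math.Sqrt(squares);
        if (norm > 0)
        {
            for (int d = 0; d < dim; d++) sum[d] /= norm;   // L2-normalize
        }
    }

    return sum;
}

float[][] table = [[1f, 0f], [0f, 2f], [9f, 9f]];   // id 2 plays the unknown token
float[] pooled = Pool([0, 2, 1], table, unknownId: 2);
Console.WriteLine($"{pooled[0]:F4}, {pooled[1]:F4}");
```

With this toy table, ids 0 and 1 are averaged to (0.5, 1.0) and then scaled to unit length, giving roughly (0.4472, 0.8944).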
- Pure C# (net10.0), no native dependency, no P/Invoke.
- Reads the safetensors format directly.
- Supports Model2Vec embeddings tensors and Sentence Transformers embedding.weight tensors.
- Supports F32, F16, F64, I8, and U8 embedding tensors.
- Supports Model2Vec vocabulary-quantization weights and mapping tensors.
- Uses Microsoft.ML.Tokenizers for Hugging Face WordPiece and byte-level BPE tokenizers.
- SIMD-accelerated scaling and normalization via System.Numerics.Tensors.
- Implements Microsoft.Extensions.AI IEmbeddingGenerator<string, Embedding<float>> for use in the .NET AI ecosystem (RAG, vector stores, semantic search).
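
To illustrate why safetensors can be read without a native dependency: a file starts with an 8-byte little-endian header length, followed by a UTF-8 JSON header that lists each tensor's dtype, shape, and data offsets, followed by the raw tensor bytes. The sketch below parses that header from an in-memory stream. `ReadSafetensorsHeader` is a hypothetical helper written for this example, not the library's reader.

```csharp
using System;
using System.Buffers.Binary;
using System.IO;
using System.Text;
using System.Text.Json;

// Hypothetical helper: reads the JSON header at the start of a safetensors stream.
static JsonDocument ReadSafetensorsHeader(Stream stream)
{
    Span<byte> lengthBytes = stackalloc byte[8];
    stream.ReadExactly(lengthBytes);                 // 8-byte little-endian header size
    ulong headerLength = BinaryPrimitives.ReadUInt64LittleEndian(lengthBytes);

    byte[] headerBytes = new byte[checked((int)headerLength)];
    stream.ReadExactly(headerBytes);                 // UTF-8 JSON describing each tensor
    return JsonDocument.Parse(headerBytes);
}

// Build a tiny in-memory file: one 2x2 F32 tensor named "embeddings", all zeros.
string json = """{"embeddings":{"dtype":"F32","shape":[2,2],"data_offsets":[0,16]}}""";
byte[] header = Encoding.UTF8.GetBytes(json);
byte[] prefix = new byte[8];
BinaryPrimitives.WriteUInt64LittleEndian(prefix, (ulong)header.Length);

using var file = new MemoryStream();
file.Write(prefix);
file.Write(header);
file.Write(new byte[16]);
file.Position = 0;

using JsonDocument parsed = ReadSafetensorsHeader(file);
JsonElement info = parsed.RootElement.GetProperty("embeddings");

// data_offsets are relative to the first byte after the JSON header.
long dataStart = 8 + header.Length + info.GetProperty("data_offsets")[0].GetInt64();
Console.WriteLine($"{info.GetProperty("dtype").GetString()} tensor starts at byte {dataStart}");
```

From there, a reader only needs to convert the bytes between the two offsets according to the dtype.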
The library parses tokenizer.json, dispatches by model.type, and constructs the
corresponding Microsoft.ML.Tokenizers tokenizer:
- WordPiece: BertTokenizer, including BertNormalizer settings such as lowercase, accent stripping, CJK splitting, unknown token, continuation prefix, and maximum input characters per word.
- BPE: BpeTokenizer. Byte-level BPE tokenizers are supported, including GPT-2/Roberta byte-to-unicode preprocessing and add_prefix_space.
- Unigram: supported when a SentencePiece .model file is present alongside tokenizer.json; Hugging Face JSON-only Unigram vocabularies require follow-up support because Microsoft.ML.Tokenizers 2.0.0 loads SentencePiece from .model.
All tokenizers encode without special tokens, remove unknown-token ids before
pooling, and apply Model2Vec pre-truncation and final maxLength truncation.
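
A minimal sketch of the dispatch on model.type, using only System.Text.Json. `ResolveTokenizerKind` is a hypothetical helper that returns a label rather than constructing a real Microsoft.ML.Tokenizers instance, so the actual construction calls are deliberately left out.

```csharp
using System;
using System.Text.Json;

// Hypothetical helper: maps tokenizer.json's model.type to the tokenizer family to build.
static string ResolveTokenizerKind(string tokenizerJson, bool hasSentencePieceModel)
{
    using JsonDocument doc = JsonDocument.Parse(tokenizerJson);
    string? type = doc.RootElement.GetProperty("model").GetProperty("type").GetString();

    return type switch
    {
        "WordPiece" => "BertTokenizer",
        "BPE" => "BpeTokenizer",
        "Unigram" when hasSentencePieceModel => "SentencePiece (.model)",
        "Unigram" => throw new NotSupportedException(
            "Unigram needs a SentencePiece .model file next to tokenizer.json."),
        _ => throw new NotSupportedException($"Unsupported tokenizer model type '{type}'."),
    };
}

Console.WriteLine(ResolveTokenizerKind("""{"model":{"type":"WordPiece"}}""", false));
```

Failing early with NotSupportedException keeps an unsupported tokenizer from silently producing embeddings that do not match the Python reference.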
Model2Vec.Net implements the inference half of Model2Vec — loading a distilled static model and encoding text. It deliberately does not include model distillation or training:
- Distilling a new static model from a teacher sentence-transformer (forward-passing the vocabulary, PCA dimensionality reduction, and Zipf/SIF weighting).
- The tokenlearn corpus post-training step and classifier-head training (model2vec.train, model2vec.distill).
Why these are out of scope: distillation and training require running a full
transformer encoder and an autodiff/optimizer training loop. In .NET that means
taking a native deep-learning dependency (ONNX Runtime or libtorch), which would
break this package's defining property: pure-managed with no native dependency.
Distillation is also a one-time, offline, GPU-friendly step — you produce a model
once with the upstream Python tooling and load the resulting model.safetensors
here. If managed distillation is ever needed it belongs in a separate package
built on a DL runtime, keeping this inference core small and dependency-free.
using Model2VecNet;

var model = Model2VecModel.Load(@"C:\models\potion-base-2M");
float[] embedding = model.Encode("The quick brown fox jumps over the lazy dog.");
Console.WriteLine(model.Dimension);
Console.WriteLine(embedding.Length);

Batch encoding:

float[][] embeddings = model.Encode([
    "First sentence",
    "Second sentence",
]);

Model2VecModel is immutable after loading and safe to share across threads.
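
A common next step is ranking texts by similarity. For models whose config.json enables normalization, the embeddings are unit length, so the dot product equals cosine similarity. The sketch below uses hand-written stand-in vectors instead of real model output, and `Dot` is a hypothetical helper, so it runs without a model folder.

```csharp
using System;

// Dot product of two equal-length vectors; equals cosine similarity for unit-length inputs.
static float Dot(ReadOnlySpan<float> a, ReadOnlySpan<float> b)
{
    if (a.Length != b.Length) throw new ArgumentException("Vectors must have the same length.");
    float sum = 0f;
    for (int i = 0; i < a.Length; i++) sum += a[i] * b[i];
    return sum;
}

// Stand-ins for vectors that would come from model.Encode(...).
float[] query = [0.6f, 0.8f];
float[][] corpus = [[1f, 0f], [0.6f, 0.8f], [0f, 1f]];

// Pick the corpus entry with the highest similarity to the query.
int best = 0;
for (int i = 1; i < corpus.Length; i++)
{
    if (Dot(query, corpus[i]) > Dot(query, corpus[best])) best = i;
}

Console.WriteLine(best);   // prints 1: the entry identical to the query
```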
Model files are published on Hugging Face and are not bundled in this repository. The test suite downloads:
- minishlab/potion-base-2M
- Jarbas/ovos-model2vec-intents-distilroberta-base-ca-v2

Each model folder is expected to contain:
- model.safetensors
- tokenizer.json
- config.json

dotnet build -c Release
dotnet test -c Release

The oracle tests compare .NET embeddings against Python model2vec outputs for
the WordPiece and BPE test models with element-wise tolerance 1e-4.
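
An element-wise tolerance check of this kind can be sketched as follows. `AllClose` is a hypothetical helper, not taken from the test suite, and the sample vectors are made up for illustration.

```csharp
using System;

// Hypothetical helper: true when every pair of elements differs by at most `tolerance`.
static bool AllClose(ReadOnlySpan<float> actual, ReadOnlySpan<float> expected, float tolerance = 1e-4f)
{
    if (actual.Length != expected.Length) return false;
    for (int i = 0; i < actual.Length; i++)
    {
        // Written as a negated <= so that NaN values fail the comparison.
        if (!(MathF.Abs(actual[i] - expected[i]) <= tolerance)) return false;
    }
    return true;
}

float[] dotnetOutput = [0.12345f, -0.5f];
float[] pythonOutput = [0.12349f, -0.5f];   // differs by about 4e-5 in the first element
Console.WriteLine(AllClose(dotnetOutput, pythonOutput));
```

An absolute tolerance is a reasonable fit when embeddings are L2-normalized, because every element then lies in [-1, 1].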
The BenchmarkDotNet suite is under bench\Model2Vec.Net.Benchmarks and covers
single short-text encode, single long-text encode, batch encode, and model load.
Place potion-base-2M under the benchmark project's models folder or set
MODEL2VEC_POTION_BASE_2M.
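
One way such a lookup could work is sketched below. It assumes that MODEL2VEC_POTION_BASE_2M holds a path to the model folder and that the fallback is models/potion-base-2M under a base directory; both are assumptions for illustration, and `ResolveModelDirectory` is a hypothetical helper rather than the benchmark project's code.

```csharp
using System;
using System.IO;

// Hypothetical helper: prefer an explicit override, else fall back to <base>/models/potion-base-2M.
static string ResolveModelDirectory(string? overridePath, string baseDirectory)
{
    if (!string.IsNullOrWhiteSpace(overridePath)) return overridePath;
    return Path.Combine(baseDirectory, "models", "potion-base-2M");
}

string modelDirectory = ResolveModelDirectory(
    Environment.GetEnvironmentVariable("MODEL2VEC_POTION_BASE_2M"),
    AppContext.BaseDirectory);

Console.WriteLine(modelDirectory);
```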
MIT — see LICENSE. See THIRD-PARTY-NOTICES.md for attribution.