mobarski/rydz

Rydz

"Lepszy rydz niż nic" — Polish proverb meaning "better something than nothing"

Build and deploy LLM-based classifiers in minutes, not days.

Rydz uses logprobs to extract classification probabilities from LLMs in a single API call — no fine-tuning, no training data, no ML pipeline. Just a prompt and a model.

The key advantage: Rydz does not just return a label. It tells you how uncertain the model is.

Docs: DeepWiki

Practical agent-oriented examples: USAGE_PATTERNS.md

Why?

A perfect classifier you don't have time to build is worth less than a good-enough one you can deploy right now. Rydz gives you the latter — and, crucially, it tells you when the answer is shaky.

That matters because one of the standard complaints about AI systems is that they sound equally confident when they are right and when they are wrong. Rydz is useful precisely because it exposes that uncertainty instead of hiding it.

Rydz is an ad hoc classifier creation tool: define labels, write a prompt, and get usable probabilities immediately.

It also fits standard ML workflows. In practice, "complex ML model vs no classifier" is rarely a useful comparison. A better one is traditional ML vs an ad hoc LLM classifier: weigh the quality lift against the total cost (data collection, training, inference, maintenance) to make a real cost/benefit decision.

Fun fact: Contrary to popular belief, "rydz" in the Polish proverb doesn't refer to a mushroom. It's Camelina sativa — a humble oil plant that thrives where nothing else will grow. Seemed like a fitting name for a library that gets the job done when fancier solutions aren't an option.

Quick start

from rydz import get_logprobs_response, get_probability

prompt = """Classify the sentiment of this review as POSITIVE or NEGATIVE.

Review: "The battery lasts forever and the screen is gorgeous!"

Sentiment:"""

resp = get_logprobs_response("openai:gpt-4o-mini", prompt)
print(f"positive: {get_probability(resp, 'POSITIVE'):.1%}")
print(f"negative: {get_probability(resp, 'NEGATIVE'):.1%}")
# positive: 99.2%  negative: 0.8%
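
The two probabilities need not sum to exactly 1, because the model may put some mass on other first tokens. A minimal sketch of renormalizing over just the labels you care about; `normalize` is a hypothetical helper (not part of Rydz) and the input numbers are made up for illustration.

```python
def normalize(probs):
    """Rescale label probabilities so they sum to 1 (hypothetical helper)."""
    total = sum(probs.values())
    if total == 0:
        # The model put no mass on any target label; nothing to rescale.
        return {label: 0.0 for label in probs}
    return {label: p / total for label, p in probs.items()}

# Made-up raw values, standing in for get_probability(resp, "POSITIVE") etc.
raw = {"POSITIVE": 0.90, "NEGATIVE": 0.06}
scores = normalize(raw)
print(scores)
```

Whether to renormalize is a design choice: the raw values show how much mass fell outside your labels, which is itself a useful warning sign.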

Parallel classification

from rydz import get_logprobs_response, get_probability, tmap

def classify(text):
    prompt = f"Is this spam? Answer YES or NO.\n\n{text}\n\nAnswer:"
    resp = get_logprobs_response("xai:grok-4-1-fast-non-reasoning", prompt)
    return get_probability(resp, "YES")

texts = ["Win a FREE cruise now!!!", "Lunch at noon tomorrow?"]  # example inputs
results = list(tmap(classify, texts, workers=16))
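
Classification calls are I/O-bound, so threads are enough to overlap them. The sketch below shows the general idea using only the standard library. It is not Rydz's `tmap` implementation; `parallel_map` and `fake_classify` are hypothetical stand-ins.

```python
from concurrent.futures import ThreadPoolExecutor
import time

def parallel_map(fn, items, workers=16):
    """Apply fn to every item using a thread pool, preserving input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(fn, items))

def fake_classify(text):
    """Stand-in for a network call: sleeps briefly, returns a dummy score."""
    time.sleep(0.05)
    return 1.0 if "free money" in text.lower() else 0.0

texts = ["FREE MONEY inside!", "Meeting moved to 3pm", "Claim your free money"]
results = parallel_map(fake_classify, texts, workers=3)
print(results)
```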

How it works

  1. You craft a prompt that frames the classification task
  2. Rydz sends it to the LLM and requests logprobs for the first output token
  3. Probabilities of your target labels are extracted directly — no sampling, no repeated calls
  4. One API call → one classification with confidence scores

This means: high throughput, low cost, low latency. Thousands of input tokens, 1 output token.
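
To make step 3 concrete: a logprob is the natural logarithm of a token probability, so exponentiating it recovers the probability. The sketch below uses a made-up list of (token, logprob) pairs in place of a real API response. `label_probability` is a hypothetical helper, and its matching rule (strip whitespace, accept label prefixes) is an assumption, not necessarily what Rydz does internally.

```python
import math

def label_probability(top_logprobs, label):
    """Sum the probability of every candidate token that starts the label."""
    total = 0.0
    for token, logprob in top_logprobs:
        cleaned = token.strip()
        # Tokenizers may split a label ("POS" + "ITIVE"), so accept prefixes.
        if cleaned and label.startswith(cleaned):
            total += math.exp(logprob)
    return total

# Made-up first-token candidates with their log probabilities.
top_logprobs = [(" POSITIVE", -0.05), ("POS", -3.5), (" NEGATIVE", -4.6)]
p_pos = label_probability(top_logprobs, "POSITIVE")
p_neg = label_probability(top_logprobs, "NEGATIVE")
print(f"positive: {p_pos:.3f}  negative: {p_neg:.3f}")
```

Prefix matching can misfire when labels share a prefix, so labels with distinct first tokens are the safer choice under this scheme.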

It also means you can route low-confidence cases to humans, apply thresholds, or trigger fallback logic instead of blindly trusting every answer.
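
A minimal sketch of such routing, assuming you already have a P(YES) score for an item. The `route` function and its threshold values are illustrative and should be tuned on your own data.

```python
def route(p_yes, low=0.2, high=0.8):
    """Map a probability to an action; thresholds are illustrative defaults."""
    if p_yes >= high:
        return "accept"
    if p_yes <= low:
        return "reject"
    return "human_review"

for p in (0.97, 0.55, 0.03):
    print(p, route(p))
```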

Why logprobs?

| approach | calls per classification | confidence score | training data | setup time |
|---|---|---|---|---|
| Rydz (LLM + logprobs) | 1 | yes, native | none | minutes |
| LLM + text parsing OR structured output | 1 | no | none | minutes |
| LLM + repeated sampling | 5–20 | approximate | none | minutes |
| traditional ML | 1 | yes | thousands+ | days–weeks |

Logprobs give you a confidence score in a single call: no repeated sampling, no parsing "yes"/"no" from free text, and no training data collection. You get a probability distribution over your labels directly from the model's internals.
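
To see why repeated sampling only yields an approximate score: estimating a probability p from n independent sampled answers has a standard error of sqrt(p(1-p)/n). A short worked calculation, with p = 0.8 chosen purely for illustration:

```python
import math

def sampling_std_error(p, n):
    """Standard error of a proportion estimated from n independent samples."""
    return math.sqrt(p * (1 - p) / n)

for n in (5, 20, 100):
    print(f"n={n:>3}: 0.80 +/- {sampling_std_error(0.8, n):.3f}")
```

Even at 20 calls per item the estimate is uncertain by roughly 0.09, whereas the logprob route reads the model's probability directly from one call.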

This is the opposite of the usual "black box AI" problem: instead of a bare answer, you get an explicit signal of uncertainty.

Real-world benchmark: 280 book fragments × 8 classification criteria (~1.5M input tokens, 2200+ data points) — processed in 36 seconds for $0.25 using two cloud providers in parallel, or 4 minutes using a local model (Bielik) on a single RTX 3090.
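
The per-item figures implied by that benchmark can be worked out directly. This sketch assumes one API call per (fragment, criterion) pair, which is consistent with the "2200+ data points" quoted above but is not stated explicitly.

```python
fragments, criteria = 280, 8
calls = fragments * criteria            # assumed: one call per (fragment, criterion)
input_tokens = 1_500_000                # approximate, from the benchmark
cloud_seconds, cloud_cost = 36, 0.25
local_seconds = 4 * 60

print(f"calls:               {calls}")
print(f"tokens per call:     {input_tokens / calls:.0f}")
print(f"cost per call:       ${cloud_cost / calls:.5f}")
print(f"cloud calls/second:  {calls / cloud_seconds:.1f}")
print(f"local calls/second:  {calls / local_seconds:.1f}")
```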

Model selection

Rydz works best with instruction-tuned (chat) models — they follow the prompt and put the answer label as the first token, which is exactly what logprobs extraction needs.

Reasoning models (e.g. GLM-5, Kimi K2.5, DeepSeek V3.2) are also supported, but they emit chain-of-thought tokens before the answer, so they require a higher max_tokens value, cost more, and take longer.

To use a reasoning model, pass reasoning=True to get_logprobs_response or append the :reasoning suffix to the model name:

get_logprobs_response("together:moonshotai/Kimi-K2.5", prompt, reasoning=True)
get_logprobs_response("together:moonshotai/Kimi-K2.5:reasoning", prompt)
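
Conceptually, reasoning mode has to skip the chain-of-thought and read the logprobs of the first answer token. A rough sketch of that idea on a made-up token stream follows. The `</think>` delimiter and the `first_answer_index` helper are assumptions for illustration; real providers, and Rydz itself, may mark the boundary differently.

```python
def first_answer_index(tokens, end_marker="</think>"):
    """Index of the first non-whitespace token after the reasoning block."""
    try:
        start = tokens.index(end_marker) + 1
    except ValueError:
        start = 0  # no reasoning block found; the answer starts immediately
    for i in range(start, len(tokens)):
        if tokens[i].strip():
            return i
    return None  # the model produced no answer token

tokens = ["<think>", "The", " review", " praises", "</think>", "\n\n", "POSITIVE"]
idx = first_answer_index(tokens)
print(idx, tokens[idx])
```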

Beyond naive classifiers

A single model with a hand-crafted prompt is just the starting point. Rydz's low cost and high throughput make it practical to build more advanced systems:

  • Ensemble / majority voting — score the same input with multiple models and aggregate results for higher accuracy and resilience
  • Cache-friendly repeated queries — keep one long document fixed, vary short entity-relation questions at the end, and exploit prompt cache / KV cache for much cheaper repeated scoring
  • Automatic prompt optimization — integrate with frameworks like DSPy to optimize prompts systematically instead of relying on intuition alone
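
As a sketch of the ensemble idea from the list above: given one P(YES) score per model for the same input, you can average the scores, take a majority vote, or both. The `ensemble` helper, the model names, and the scores are all made up for illustration.

```python
from statistics import mean

def ensemble(scores, threshold=0.5):
    """Combine per-model P(YES) scores by averaging and by majority vote."""
    avg = mean(scores.values())
    votes = sum(p >= threshold for p in scores.values())
    return {
        "mean_probability": avg,
        "majority_yes": votes > len(scores) / 2,
        "agreement": max(votes, len(scores) - votes) / len(scores),
    }

# Made-up scores for one input from three models.
scores = {"model-a": 0.91, "model-b": 0.78, "model-c": 0.35}
result = ensemble(scores)
print(result)
```

Low agreement between models is another uncertainty signal that can be routed to review, in addition to each model's own probability.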

Rydz is also a practical baseline for production evaluation: compare your traditional ML pipeline against a strong ad hoc LLM classifier, then decide if the extra quality justifies the extra cost and complexity.

You don't have to pick one path upfront — start simple, then scale only where the cost/benefit is clear.
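
For the cache-friendly pattern mentioned above, the key detail is prompt layout: the long document goes first and stays byte-for-byte identical, and only the short question at the end changes. A minimal sketch; `build_prompts` is a hypothetical helper, and whether a shared prefix is actually cached depends on the provider.

```python
def build_prompts(document, questions):
    """Put the fixed document first so every prompt shares the same prefix."""
    prefix = f"Document:\n{document}\n\n"
    return [f"{prefix}Question: {q}\nAnswer YES or NO.\n\nAnswer:" for q in questions]

document = "Anna founded Acme in 2010. Bob joined as CTO in 2012."
questions = ["Did Anna found Acme?", "Is Bob the CEO of Acme?"]
prompts = build_prompts(document, questions)

# Every prompt starts with the same document prefix.
shared = f"Document:\n{document}\n\n"
print(all(p.startswith(shared) for p in prompts))
```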

Features

  • One thing, done well — logprobs-based classification
  • Uncertainty is visible — get probability distributions, not just labels
  • Multiple providers — OpenAI, xAI, Together, Fireworks, Hyperbolic, Cerebras, OpenRouter, LM Studio
  • Provider quirks handled — different APIs, limits, endpoints — all behind one interface
  • Parallel processing — classify thousands of items across providers in seconds
  • Minimal dependencies — just openai
  • Reasoning models — extract logprobs from first token after chain-of-thought (v0.2)

Installation

pip install git+https://github.com/mobarski/rydz

or

uv pip install git+https://github.com/mobarski/rydz

A PyPI release is planned.

Model format

Models use the provider:model_name convention, with an optional :reasoning suffix for reasoning models:

"openai:gpt-4.1-nano"
"xai:grok-4-1-fast-non-reasoning"
"together:moonshotai/Kimi-K2-Instruct-0905"
"hyperbolic:Qwen/Qwen3-Next-80B-A3B-Instruct"
"lmstudio:bielik-11b-v3.0-instruct"
"together:moonshotai/Kimi-K2.5:reasoning"

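To illustrate how strings in this format decompose, here is a small parser. It is a hypothetical sketch, not Rydz's own parsing code: it splits on the first colon for the provider and strips a trailing :reasoning suffix if present.

```python
def parse_model(spec):
    """Split 'provider:model[:reasoning]' into its parts (illustrative only)."""
    provider, sep, model = spec.partition(":")
    if not sep or not provider or not model:
        raise ValueError(f"expected 'provider:model_name', got {spec!r}")
    reasoning = model.endswith(":reasoning")
    if reasoning:
        model = model[: -len(":reasoning")]
    return provider, model, reasoning

print(parse_model("openai:gpt-4.1-nano"))
print(parse_model("together:moonshotai/Kimi-K2.5:reasoning"))
```
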
Supported providers

local

| provider | env variable | notes |
|---|---|---|
| lmstudio | LMSTUDIO_API_KEY | |

cloud

| provider | env variable | notes |
|---|---|---|
| openai | OPENAI_API_KEY | |
| xai | XAI_API_KEY | |
| together | TOGETHER_API_KEY | |
| hyperbolic | HYPERBOLIC_API_KEY | |
| fireworks | FIREWORKS_API_KEY | |
| cerebras | CEREBRAS_API_KEY | |
| openrouter | OPENROUTER_API_KEY | most inference providers = no logprobs |
| google | GOOGLE_API_KEY | no logprobs |

custom provider

from rydz import register_provider
register_provider("myprovider", "https://api.example.com/v1", quirks={"max_tokens": 4})
# uses MYPROVIDER_API_KEY env variable, model string: "myprovider:model-name"

You can also create aliases for existing providers — useful for multiple API keys (higher rate limits, separate billing):

from rydz import register_alias
register_alias("openai2", "openai")
# uses OPENAI2_API_KEY env variable, model string: "openai2:gpt-4.1-nano"

register_alias("openai3", "openai", quirks={"top_logprobs": 10})
# same as above but with custom quirks
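
One way to use several aliases is to rotate requests across them. The sketch below only builds the model strings and does not call Rydz; the round-robin policy and the `next_model` helper are assumptions for illustration.

```python
from itertools import cycle

# Provider names; "openai2" and "openai3" are the aliases registered above.
providers = cycle(["openai", "openai2", "openai3"])

def next_model(model_name="gpt-4.1-nano"):
    """Return the model string for the next provider in rotation."""
    return f"{next(providers)}:{model_name}"

assigned = [next_model() for _ in range(4)]
print(assigned)
```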

secret manager integration

By default, Rydz reads API keys from environment variables. In corporate environments you may need to fetch keys from a secret manager (AWS Secrets Manager, Azure Key Vault, HashiCorp Vault, etc.). Use the get_api_key quirk:

from rydz import set_quirk

set_quirk("openai", "get_api_key", lambda model: vault.get_secret("openai-api-key"))

The function receives the full model string (e.g. "openai:gpt-4.1-nano") so you can route different models to different secrets. Works with custom providers and aliases too:

set_quirk("myprovider", "get_api_key", my_secret_manager.get_key)
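
A sketch of a key function that routes by model string and caches lookups, so the secret store is queried once per model. `FAKE_VAULT`, the secret names, and the "-nano" routing rule are all hypothetical; replace them with your real secret manager client.

```python
from functools import lru_cache

# Hypothetical stand-in for a real secret manager client.
FAKE_VAULT = {"llm/openai/default": "sk-default", "llm/openai/nano": "sk-nano"}

@lru_cache(maxsize=None)
def get_key(model):
    """Pick a secret based on the full model string, e.g. 'openai:gpt-4.1-nano'."""
    provider, _, name = model.partition(":")
    tier = "nano" if name.endswith("-nano") else "default"
    return FAKE_VAULT[f"llm/{provider}/{tier}"]

print(get_key("openai:gpt-4.1-nano"))
print(get_key("openai:gpt-4o-mini"))
# To wire it in: set_quirk("openai", "get_api_key", get_key)
```

Caching trades freshness for fewer lookups; if your keys rotate, a time-limited cache would be the safer choice.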

Use cases

E-commerce & Marketing — sentiment analysis in product reviews, matching ad profiles to article content

Customer Support — automatic ticket categorization and priority routing, intent detection in customer messages

Media & Publishing — content moderation, scene and emotion tagging in narratives, content complexity matching to target audience

Legal & Compliance — contract risk clause detection, document confidentiality classification

Software & AI — safety analysis of AI-generated code, detecting policy violations in LLM outputs

These are just examples. For best results, point your favorite AI assistant to this repo and ask how Rydz can help in your business.

Planned features

  • Multimodal input — classify images alongside text using vision-capable models
  • More local inference providers — vllm, sglang, llama-server, ollama
  • More online inference providers — huggingface, featherless, nscale, zai
  • Advanced usage examples — evaluation, majority voting, automatic prompt optimization

License

MIT

Changelog

  • 0.2.1 - make reasoning the canonical name for thinking/reasoning in the API and in the docs
  • 0.2 - thinking/reasoning model support (logprobs extraction after chain-of-thought)

About

LLM-based classification via logprobs — one call, one token, full confidence scores
