Baseten Solutions Architect — Study Guide & POC Toolkit

Preparing for a Solutions Architect IC role at Baseten. This guide covers the end-to-end customer journey, core platform pillars, POC methodology, hardware selection, scripting, and inference engineering fundamentals.

How to Use This Guide

Work through each section in order. Each builds on the last:

00 — Inference Engineering Foundations The mental models you need before touching any platform. Covers TTFT, throughput, batching, quantization, KV cache, prefill vs decode — the physics of inference.
01 — The Customer Journey (End-to-End) What a customer experiences from first contact through production deployment. Maps to the SA role: where you add value at each stage.
02 — Baseten Core Pillars Deep dive into the four product areas: Model Performance, MCM/Infra, DevUI & Truss, and Post-Training. What differentiates each from competitors and on-prem alternatives.
03 — Hardware Selection Guide H100 vs B200 vs A100 — how hardware choice affects throughput, latency, cost-per-token, and what a customer ultimately selects.
04 — The POC Playbook How a Solutions Architect runs a proof-of-concept: model selection, deploy, optimize, benchmark, and present findings that prove better throughput/latency than what the customer has today.
05 — Scripting for SAs Python and bash scripts for benchmarking, deployment automation, metrics collection, and customer-facing reporting. Hands-on examples.
06 — Competitive Landscape & On-Prem How Baseten compares to running vLLM on your own GPUs, using Replicate/Modal/ RunPod/Together, and what arguments win deals.
07 — Blind Spots & Glossary Things that trip up people new to inference engineering. Terminology, common misconceptions, and interview-relevant gotchas.
08 — Book Corrections & Additions (READ THIS FIRST) Cross-referenced against Philip Kiely's "Inference Engineering" (Baseten Books, 2026). 20 gaps, corrections, and key additions. Covers disaggregation, EAGLE speculation, ops:byte ratio, SGLang, NVIDIA Dynamo, quantization sensitivity, cache-aware routing, H200/B300 GPUs, MIG, distillation, and more. This is the errata sheet — read alongside the originals.

Scripts

All in scripts/ — runnable examples that demonstrate SA-relevant workflows:

deploy_model.py — Deploy a model via Truss programmatically
benchmark.py — Load test an endpoint, measure TTFT/throughput/p95
compare_quantizations.py — Deploy same model at fp16/fp8/fp4, compare
cost_calculator.py — Calculate cost-per-token for different GPU configs
generate_report.py — Generate a customer-facing POC report from benchmark data
health_check.sh — Quick endpoint health/latency check (bash)

Quick Start

# Install dependencies
pip install truss openai requests numpy tabulate matplotlib

# Authenticate with Baseten
uvx truss login

# Run your first benchmark against a pre-optimized model
python scripts/benchmark.py --model "deepseek-v3" --requests 100

Name		Name	Last commit message	Last commit date
Latest commit History 14 Commits
notes		notes
scripts		scripts
Inference Engineering.pdf		Inference Engineering.pdf
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Baseten Solutions Architect — Study Guide & POC Toolkit

How to Use This Guide

Scripts

Quick Start

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

Baseten Solutions Architect — Study Guide & POC Toolkit

How to Use This Guide

Scripts

Quick Start

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages