AdamTheCreator/baseten-study

Baseten Solutions Architect — Study Guide & POC Toolkit

A study guide for preparing for a Solutions Architect (individual contributor) role at Baseten. It covers the end-to-end customer journey, the core platform pillars, POC methodology, hardware selection, scripting, and inference engineering fundamentals.

How to Use This Guide

Work through sections 00 to 07 in order; each builds on the last. Section 08 is the errata sheet, so read it first and keep it open alongside the others:

  1. 00 — Inference Engineering Foundations: The mental models you need before touching any platform. Covers the physics of inference: TTFT, throughput, batching, quantization, KV cache, and prefill vs decode.

  2. 01 — The Customer Journey (End-to-End): What a customer experiences from first contact through production deployment, mapped to the SA role and where you add value at each stage.

  3. 02 — Baseten Core Pillars: A deep dive into the four product areas (Model Performance, MCM/Infra, DevUI & Truss, and Post-Training) and what differentiates each from competitors and on-prem alternatives.

  4. 03 — Hardware Selection Guide: H100 vs B200 vs A100. How hardware choice affects throughput, latency, and cost-per-token, and how that drives what a customer ultimately selects.

  5. 04 — The POC Playbook: How a Solutions Architect runs a proof-of-concept, from model selection through deployment, optimization, and benchmarking, to presenting findings that prove better throughput and latency than what the customer has today.

  6. 05 — Scripting for SAs: Hands-on Python and bash scripts for benchmarking, deployment automation, metrics collection, and customer-facing reporting.

  7. 06 — Competitive Landscape & On-Prem: How Baseten compares to running vLLM on your own GPUs or using Replicate, Modal, RunPod, or Together, and which arguments win deals.

  8. 07 — Blind Spots & Glossary: Terminology, common misconceptions, and interview-relevant gotchas that trip up people new to inference engineering.

  9. 08 — Book Corrections & Additions (READ THIS FIRST): Cross-referenced against Philip Kiely's "Inference Engineering" (Baseten Books, 2026). Lists 20 gaps, corrections, and key additions, covering disaggregation, EAGLE speculation, ops:byte ratio, SGLang, NVIDIA Dynamo, quantization sensitivity, cache-aware routing, H200/B300 GPUs, MIG, distillation, and more. This is the errata sheet; read it alongside the originals.
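To make the latency vocabulary from section 00 concrete, here is a minimal sketch of how TTFT, per-request decode speed, and a percentile can be derived from request timestamps. The `RequestTiming` record, the sample numbers, and the nearest-rank percentile definition are all assumptions chosen for illustration; they are not taken from the guide's own scripts.

```python
import math
from dataclasses import dataclass


@dataclass
class RequestTiming:
    """Timestamps (in seconds) for one streamed request. Hypothetical record shape."""
    sent_at: float          # when the request was sent
    first_token_at: float   # when the first output token arrived
    finished_at: float      # when the last output token arrived
    output_tokens: int      # number of tokens generated


def ttft(r: RequestTiming) -> float:
    """Time to first token: queueing plus prefill, as seen by the client."""
    return r.first_token_at - r.sent_at


def decode_tokens_per_second(r: RequestTiming) -> float:
    """Per-request decode speed, measured after the first token arrives."""
    decode_time = r.finished_at - r.first_token_at
    # The first token is attributed to prefill, so it is excluded here.
    return (r.output_tokens - 1) / decode_time


def percentile(values: list[float], pct: float) -> float:
    """Nearest-rank percentile (one of several common definitions)."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


# Made-up sample data, for illustration only.
timings = [
    RequestTiming(0.0, 0.20, 2.20, 101),
    RequestTiming(0.0, 0.30, 2.30, 101),
    RequestTiming(0.0, 0.25, 4.25, 101),
    RequestTiming(0.0, 0.90, 2.90, 101),
]
ttfts = [ttft(r) for r in timings]
print(f"p50 TTFT: {percentile(ttfts, 50):.2f}s")
print(f"p95 TTFT: {percentile(ttfts, 95):.2f}s")
print(f"decode speed, request 0: {decode_tokens_per_second(timings[0]):.1f} tok/s")
```

Reporting a tail percentile next to the median matters because a single slow request, like the last one in the sample, barely moves p50 but sets p95.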

Scripts

All scripts live in scripts/ and are runnable examples of SA-relevant workflows:

  • deploy_model.py — Deploy a model via Truss programmatically
  • benchmark.py — Load test an endpoint and measure TTFT, throughput, and p95 latency
  • compare_quantizations.py — Deploy the same model at fp16/fp8/fp4 and compare the results
  • cost_calculator.py — Calculate cost-per-token for different GPU configs
  • generate_report.py — Generate a customer-facing POC report from benchmark data
  • health_check.sh — Quick endpoint health/latency check (bash)
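The cost-per-token comparison mentioned above comes down to simple arithmetic: hourly GPU spend divided by tokens produced per hour. The sketch below is not the repository's `cost_calculator.py`; the function name, the configuration labels, the hourly prices, and the throughput figures are hypothetical placeholders, not real pricing or benchmark results.

```python
def cost_per_million_tokens(gpu_hourly_usd: float, gpu_count: int,
                            tokens_per_second: float) -> float:
    """Cost in USD to generate one million tokens at a sustained throughput."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    tokens_per_hour = tokens_per_second * 3600
    return gpu_hourly_usd * gpu_count / tokens_per_hour * 1_000_000


# Placeholder configurations: (label, USD per GPU-hour, GPU count, tokens/sec).
# All numbers are invented for illustration.
configs = [
    ("config-a", 4.00, 1, 2000.0),
    ("config-b", 8.00, 1, 5000.0),
]

for label, hourly, count, tps in configs:
    cost = cost_per_million_tokens(hourly, count, tps)
    print(f"{label}: ${cost:.4f} per 1M tokens")
```

With these placeholder numbers the GPU that costs twice as much per hour is still cheaper per token, because its throughput is more than twice as high. The hourly price alone does not settle that trade-off.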

Quick Start

# Install dependencies
pip install truss openai requests numpy tabulate matplotlib

# Authenticate with Baseten (uvx requires uv; with the pip install above, `truss login` also works)
uvx truss login

# Run your first benchmark against a pre-optimized model
python scripts/benchmark.py --model "deepseek-v3" --requests 100
