Site tier-1 follow-up: per-model deep-dive page #23
Draft
MaxGhenis wants to merge 1 commit into
Statically generates a dedicated page for each of the 12 models in data.json, using generateStaticParams so the entire site stays a pure static export. Each page renders:

- Headline strip: provider mark, model name, global/US/UK scores, parse-rate pill, all sourced from globalStat.countryScores.
- Hardest outputs: top-5 lowest-scoring output groups (country × outputGroup) computed by reusing buildAllRows/scorePrediction from lib/sensitivity.ts and lib/scoring.ts, aggregated the same way as the headline scorer.
- Sample wrong predictions: up to 10 (scenario, variable) cells where relErr > 10% and score < 0.75, sorted by largest relative error, with prediction / ground-truth / error columns plus a collapsible model explanation and a link back to /#scenarios.
- Back to leaderboard link.

Reuses SiteHeader (alwaysExpanded + actionLink back to /), the Badge color scheme from ModelLeaderboard, and Tailwind v4 design-token classes throughout.

Build smoke-test: `bun run build` produces the /model/[id] SSG route with all 12 model paths; `bun run lint` is clean.

https://claude.ai/code/session_01DS3KJmEye7o7ff18RdthTC
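For reviewers less familiar with the static-export mechanism named in the commit message, here is a minimal sketch of how the model IDs might be collected at build time. It is not the PR's code: the `Dashboard` shape, the `countries` map, and the `model` field are assumptions made for illustration; only `global.modelStats`, country-level `modelStats`, and the `{ id }` entry shape come from the PR text.

```typescript
// Hypothetical data shapes. Only `global.modelStats`, per-country `modelStats`,
// and the `{ id }` param shape are taken from the PR description.
interface ModelStat {
  model: string; // assumed name of the model-ID field
}

interface Dashboard {
  global: { modelStats: ModelStat[] };
  countries: { [country: string]: { modelStats: ModelStat[] } }; // assumed container
}

// Union of model IDs across global and country-level stats, in first-seen order.
function collectModelIds(dashboard: Dashboard): string[] {
  const ids = new Set<string>();
  for (const stat of dashboard.global.modelStats) ids.add(stat.model);
  for (const country of Object.keys(dashboard.countries)) {
    for (const stat of dashboard.countries[country].modelStats) ids.add(stat.model);
  }
  return Array.from(ids);
}

// One `{ id }` entry per model. In a Next.js App Router page, an exported
// `generateStaticParams` function would return an array of this shape.
function toStaticParams(dashboard: Dashboard): { id: string }[] {
  return collectModelIds(dashboard).map((id) => ({ id }));
}
```

Using a `Set` keeps a model that appears in several country tables from producing duplicate routes.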
Summary
Follow-up to #8 and #9. Adds a statically generated per-model deep-dive page at `/model/[id]`, one page per model present in `data.json`.

Rendered sections

1. Headline strip (inside `SiteHeader` `expandedContent`, `alwaysExpanded`)
   - Provider mark (`ProviderMark`) + model name + provider label
   - Global / US / UK scores (`globalStat.countryScores`), Parse rate (`nParsed / n`)
2. Hardest outputs: top 5 lowest-scoring output groups for this model
   - Computed at the `(country, outputGroup)` level using `buildAllRows` + `scorePrediction` from `lib/sensitivity.ts` / `lib/scoring.ts`
   - Same aggregation as `scoresPerCountryModel`: per-row scores → output-group mean → displayed score
   - Output-group label (`getVariableLabel`), country tag, and a `Badge` (same color thresholds as `ModelLeaderboard`)
3. Sample wrong predictions: up to 10 distinct `(country, scenario, variable)` cells where relErr > 10% and score < 0.75
   - Collapsible `<details>` block with the model's explanation text
   - Link to `/#scenarios` for the scenario explorer, plus the scenario ID
4. Back to leaderboard link at page bottom
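The selection logic behind sections 2 and 3 can be sketched roughly as follows. This is an illustrative reconstruction, not the PR's implementation: the `Row` shape and the helper names `hardestOutputs`, `relErr`, and `sampleWrongPredictions` are hypothetical, the real `ScoreRow` type may differ, and skipping rows with a zero ground truth is an assumption. The thresholds (relErr > 10%, score < 0.75), the limits (5 and 10), and the sort by largest relative error come from the PR and commit text.

```typescript
// Hypothetical row shape; the real ScoreRow in lib/sensitivity.ts may differ.
interface Row {
  country: string;
  scenario: string;
  variable: string;
  outputGroup: string;
  prediction: number;
  truth: number;
  score: number; // per-row score in [0, 1]
}

interface GroupScore {
  country: string;
  outputGroup: string;
  score: number; // mean of per-row scores, scaled by 100
}

// Mean per-row score (x100) for each (country, outputGroup), lowest first.
function hardestOutputs(rows: Row[], limit = 5): GroupScore[] {
  const groups = new Map<string, { country: string; outputGroup: string; total: number; n: number }>();
  for (const r of rows) {
    const key = r.country + "\u0000" + r.outputGroup;
    const acc = groups.get(key) ?? { country: r.country, outputGroup: r.outputGroup, total: 0, n: 0 };
    acc.total += r.score * 100;
    acc.n += 1;
    groups.set(key, acc);
  }
  return Array.from(groups.values())
    .map((g) => ({ country: g.country, outputGroup: g.outputGroup, score: g.total / g.n }))
    .sort((a, b) => a.score - b.score)
    .slice(0, limit);
}

// Relative error against the ground truth; callers must exclude truth === 0.
function relErr(r: Row): number {
  return Math.abs(r.prediction - r.truth) / Math.abs(r.truth);
}

// Distinct (country, scenario, variable) cells with relErr > 10% and score < 0.75,
// largest relative error first. Rows with a zero ground truth are skipped (assumption).
function sampleWrongPredictions(rows: Row[], limit = 10): Row[] {
  const seen = new Set<string>();
  return rows
    .filter((r) => r.truth !== 0 && relErr(r) > 0.1 && r.score < 0.75)
    .sort((a, b) => relErr(b) - relErr(a))
    .filter((r) => {
      const key = [r.country, r.scenario, r.variable].join("\u0000");
      if (seen.has(key)) return false;
      seen.add(key);
      return true;
    })
    .slice(0, limit);
}
```

Requiring both conditions keeps out cells that are off by a large percentage but still score well, for example small absolute misses on near-zero values.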
Static routes generation
`generateStaticParams` collects all model IDs from `dashboard.global.modelStats` and the union of country-level `modelStats`, returning one `{ id }` entry per model. The current data produces 12 static routes.

Library reuse

- `lib/scoring.ts`: `scorePrediction`, `metricTypeForVariable`
- `lib/sensitivity.ts`: `buildAllRows`
- `ScoreRow[]` for all countries, filtered to the model
- `lib/bootstrap.ts`

Scoring math

For each `(country, outputGroup)` pair, the displayed score is the mean of per-row scores (each `scorePrediction` result × 100) across all scenarios and person-expanded variables that map to that output group. This is equivalent to the inner two levels of the 3-level mean in `scoresPerCountryModel`.

Smoke test
Build output excerpt:
Test plan
- `/model/gpt-5.5`: headline shows Global / US / UK scores, parse rate
- Explanation `<details>` block
- Link to `/#scenarios`
- Back to leaderboard link to `/`
- `/model/nonexistent-model`: returns 404

🤖 Generated with Claude Code