Save rankings feature by warreveys · Pull Request #49 · techwolf-ai/workrb

warreveys · 2026-05-06T10:07:38Z

Description

Add a `save_rankings: bool = False` flag to `workrb.evaluate()` that persists the full prediction matrix for each ranking-task dataset under `<output_folder>/rankings/<model_name>/<task>__<dataset_id>.json`. Each artifact carries a self-describing header: schema version, workrb version, model/task/dataset identity, sizes, and query/target canary strings.
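A minimal sketch of how the artifact path and self-describing header described above could fit together. It is not workrb's implementation: the function names, the header field names, `SCHEMA_VERSION = 1`, and the reading of "canary strings" as the first query and target texts are all assumptions made for illustration.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical schema version; the PR does not state the real value.
SCHEMA_VERSION = 1


def rankings_artifact_path(output_folder, model_name, task, dataset_id):
    # Layout from the PR: <output_folder>/rankings/<model_name>/<task>__<dataset_id>.json
    return Path(output_folder) / "rankings" / model_name / f"{task}__{dataset_id}.json"


def write_rankings_artifact(output_folder, model_name, task, dataset_id,
                            matrix, queries, targets, workrb_version):
    """Write one artifact per (task, dataset); header field names are illustrative."""
    header = {
        "schema_version": SCHEMA_VERSION,
        "workrb_version": workrb_version,
        "model_name": model_name,
        "task": task,
        "dataset_id": dataset_id,
        "n_queries": len(queries),
        "n_targets": len(targets),
        # Assumed canary: the first query/target string, used to catch misaligned replays.
        "query_canary": queries[0] if queries else None,
        "target_canary": targets[0] if targets else None,
    }
    path = rankings_artifact_path(output_folder, model_name, task, dataset_id)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps({"header": header, "prediction_matrix": matrix}))
    return path


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        p = write_rankings_artifact(tmp, "example-model", "example_task", "example_dataset",
                                    [[0.9, 0.1], [0.2, 0.8]], ["q1", "q2"], ["t1", "t2"], "0.0.0")
        print(p.relative_to(tmp).as_posix())  # rankings/example-model/example_task__example_dataset.json
```

Keeping the sizes and canaries in the header means a reader can detect a misaligned replay before it touches the matrix itself.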

To enable this without recomputing the prediction matrix, `RankingTask.evaluate` is split into `compute_prediction_matrix` + `compute_metrics_from_prediction_matrix`. The default behaviour of `evaluate()` is unchanged.
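The split can be pictured with a toy task. This is a hedged sketch, not workrb's `RankingTask`: `ToyRankingTask`, the `on_matrix` hook, and the use of mean reciprocal rank as the metric are illustrative assumptions. It only shows that once the expensive scoring step is separated from the metric step, the same matrix can be both saved and scored.

```python
from typing import Callable, Optional, Sequence

Matrix = list[list[float]]


class ToyRankingTask:
    """Hypothetical stand-in for a ranking task; not workrb's RankingTask."""

    def __init__(self, queries: Sequence[str], targets: Sequence[str],
                 relevant: Sequence[set[int]]):
        self.queries = list(queries)
        self.targets = list(targets)
        self.relevant = list(relevant)  # indices of relevant targets, one set per query

    def compute_prediction_matrix(self, score_fn: Callable[[str, str], float]) -> Matrix:
        # Expensive step: one model score per (query, target) pair.
        return [[score_fn(q, t) for t in self.targets] for q in self.queries]

    def compute_metrics_from_prediction_matrix(self, matrix: Matrix) -> dict[str, float]:
        # Cheap step: depends only on the matrix, so it also works on a saved one.
        reciprocal_ranks = []
        for scores, rel in zip(matrix, self.relevant):
            order = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
            first_hit = next((r for r, j in enumerate(order, start=1) if j in rel), None)
            reciprocal_ranks.append(1.0 / first_hit if first_hit else 0.0)
        return {"mrr": sum(reciprocal_ranks) / len(reciprocal_ranks)}

    def evaluate(self, score_fn: Callable[[str, str], float],
                 on_matrix: Optional[Callable[[Matrix], None]] = None) -> dict[str, float]:
        # Same output as an unsplit evaluate(); on_matrix is a hypothetical save hook.
        matrix = self.compute_prediction_matrix(score_fn)
        if on_matrix is not None:
            on_matrix(matrix)
        return self.compute_metrics_from_prediction_matrix(matrix)
```

With `on_matrix=None`, `evaluate()` behaves exactly as a single combined method would, which matches the "default behaviour is unchanged" claim in spirit.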

Adds a companion entry point, `workrb.evaluate_rankings(rankings_dir, tasks, ...)`, that replays saved artifacts to compute metrics without a model. A new `rankings` module handles loading, header validation (a schema-version mismatch or a structural mismatch is a hard reject; workrb-version drift only emits a warning), and matrix materialization. `RankingsArtifactMissing` and `RankingsArtifactInvalid` are exported.
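The validation policy can be sketched as follows, assuming the header layout of a JSON artifact with a top-level `header` object. The two exception classes are local stand-ins that only borrow the exported names (their real base classes may differ), and `load_and_validate`, `expected`, and `SUPPORTED_SCHEMA_VERSION` are hypothetical.

```python
import json
import warnings
from pathlib import Path


class RankingsArtifactMissing(Exception):
    """Local stand-in named after the PR's exported exception."""


class RankingsArtifactInvalid(Exception):
    """Local stand-in named after the PR's exported exception."""


SUPPORTED_SCHEMA_VERSION = 1  # hypothetical value


def load_and_validate(path, expected: dict, current_workrb_version: str) -> dict:
    path = Path(path)
    if not path.is_file():
        raise RankingsArtifactMissing(f"no rankings artifact at {path}")
    try:
        payload = json.loads(path.read_text())
    except json.JSONDecodeError as exc:
        raise RankingsArtifactInvalid(f"{path} is not valid JSON") from exc
    header = payload.get("header") if isinstance(payload, dict) else None
    if not isinstance(header, dict):
        raise RankingsArtifactInvalid(f"{path} has no header")
    # Hard reject: a schema version this reader does not understand.
    if header.get("schema_version") != SUPPORTED_SCHEMA_VERSION:
        raise RankingsArtifactInvalid(f"unsupported schema_version {header.get('schema_version')!r}")
    # Hard reject: structural mismatch (e.g. task, dataset, sizes, canaries).
    for key, want in expected.items():
        if header.get(key) != want:
            raise RankingsArtifactInvalid(f"{key}: artifact has {header.get(key)!r}, expected {want!r}")
    # Soft check: workrb-version drift only warns.
    if header.get("workrb_version") != current_workrb_version:
        warnings.warn(
            f"artifact written by workrb {header.get('workrb_version')!r}, "
            f"replaying with {current_workrb_version!r}",
            stacklevel=2,
        )
    return payload
```

Treating version drift as a warning rather than an error lets older artifacts be replayed after an upgrade, while anything that would silently misalign scores is refused.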

Also: `BenchmarkMetadata` gains `replayed_from_workrb_version` (`None` for normal runs), and `_get_dataset_ids_to_evaluate` is refactored so `ExecutionMode.ALL` consistently keeps every dataset regardless of aggregation mode.

Checklist

Added new tests for new functionality
Tested locally with example tasks
Code follows project style guidelines
Documentation updated
No new warnings introduced

Add a `save_rankings: bool = False` flag to `workrb.evaluate()` that persists per-target ranking score arrays for each ranking-task dataset under `<output_folder>/rankings/<model_name>/<task>__<dataset_id>.json`. Each artifact also records `model_name` in its payload so files remain self-describing if moved. To enable this without recomputing the prediction matrix, `RankingTask.evaluate` is split into `compute_prediction_matrix` + `compute_metrics_from_prediction_matrix`; default behavior is unchanged.
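The description above stores per-target score arrays rather than one nested matrix. A minimal sketch of that layout and of materializing the matrix back, assuming each target's array holds its score against every query in query order; `payload_from_matrix`, `materialize_matrix`, and the `per_target_scores` key are hypothetical names, not workrb's.

```python
import json


def payload_from_matrix(model_name, matrix):
    # Assumed layout: one score array per target, i.e. one column of the query x target matrix.
    n_targets = len(matrix[0]) if matrix else 0
    per_target = [[row[j] for row in matrix] for j in range(n_targets)]
    # model_name travels inside the payload so a moved file still identifies its model.
    return {"model_name": model_name, "per_target_scores": per_target}


def materialize_matrix(payload):
    per_target = payload["per_target_scores"]
    if len({len(col) for col in per_target}) > 1:
        raise ValueError("per-target score arrays have different lengths")
    # Transpose back so row i holds query i's score for every target.
    return [list(row) for row in zip(*per_target)]


if __name__ == "__main__":
    matrix = [[0.1, 0.9, 0.5], [0.7, 0.2, 0.3]]
    restored = json.loads(json.dumps(payload_from_matrix("example-model", matrix)))
    print(materialize_matrix(restored) == matrix)  # True
```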

warreveys added 2 commits May 6, 2026 11:40

ruff fixes (c603baf)

warreveys marked this pull request as draft May 6, 2026 10:07

warreveys marked this pull request as ready for review May 6, 2026 10:24

warreveys requested a review from Mattdl May 7, 2026 06:44

warreveys added 6 commits May 16, 2026 12:50

1 file per run (4dd2681)
Standardize the save + implementation of a reading counterpart (de9af5d)
documentation enhancement (2a84f1c)
lint (2133f2f)
tweaks (b38ccce)
Merge branch 'main' into save-rankings-feature (ba2342f)

warreveys merged commit dcba546 into techwolf-ai:main May 26, 2026
0 of 2 checks passed

warreveys merged 8 commits into techwolf-ai:main from warreveys:save-rankings-feature
