docs: lock hard downstream benchmark spec and tracker #418

Description

@AbdelStark

Parent

#417

What to build

Land the repo-level contract for the hard anti-saturation downstream reranking benchmark. This includes RFC-0016, the dedicated roadmap, the downstream benchmark/spec wording, implementation tracker entries, and docs tests that prevent future agents from weakening the anti-saturation gate.

Acceptance criteria

  • RFC-0016 defines the scientific question, candidate classes, anti-saturation filter, required baselines, metrics, split policy, security boundary, and claim gate.
  • The LLM/world-model harness spec points to the anti_saturation_semantic_v1 profile and the codelewm.downstream_anti_saturation_report.v1 diagnostic report.
  • The downstream benchmark docs explain why MBPP-Plus saturation motivated this follow-up.
  • Roadmap/tracker docs list the v1.5 issue order and preserve the v1.0 claim boundary.
  • Docs tests assert the new contract and issue references.
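A docs test of the kind described in the last criterion could look roughly like the sketch below. It is only an illustration: the helper name `missing_contract_terms`, the `REQUIRED_TERMS` tuple, and the inline sample text are assumptions, not the repository's actual test code, and a real test would read the spec and tracker files from disk instead of using a string literal.

```python
# Hypothetical docs-test sketch: the names and sample text here are
# illustrative assumptions, not taken from the repository.

# Identifiers quoted in this issue that the docs are expected to keep.
REQUIRED_TERMS = (
    "RFC-0016",
    "anti_saturation_semantic_v1",
    "codelewm.downstream_anti_saturation_report.v1",
    "#417",
)


def missing_contract_terms(doc_text, required=REQUIRED_TERMS):
    """Return the required terms that do not appear in doc_text, in order."""
    return [term for term in required if term not in doc_text]


def test_docs_keep_anti_saturation_contract():
    # A real test would load the spec file; a literal keeps the sketch
    # self-contained.
    sample = (
        "RFC-0016 (tracked under #417) selects the "
        "anti_saturation_semantic_v1 profile and emits the "
        "codelewm.downstream_anti_saturation_report.v1 report."
    )
    assert missing_contract_terms(sample) == []


if __name__ == "__main__":
    test_docs_keep_anti_saturation_contract()
```

Returning the list of missing terms, rather than a bare boolean, makes a failing assertion show exactly which reference was dropped. The check is plain substring matching, so it would also accept a longer identifier that merely contains a required term; a stricter test might match whole tokens.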

Blocked by

None - can start immediately.
