data: build public-safe hard-negative candidate pools (#420) by AbdelStark · Pull Request #426 · AbdelStark/CodeLeWM

AbdelStark · 2026-06-08T13:46:53Z

Summary

Adds the deterministic public-safe hard-negative candidate-pool generator and the sandbox label-construction path for the RFC-0016 hard downstream benchmark. A pack can now mix a passing reference with plausible wrong candidates (no-action / near-no-action baits + single-point AST mutants) that create real reranking headroom, instead of relying on easy syntax failures. The non-executing generator lives under codelewm/eval; the only code-executing piece (sandbox labeler) lives under codelewm/data, preserving the sandbox import boundary.
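A minimal sketch of the pool ordering described above, assuming candidate sources are plain strings. `Candidate`, `assemble_pool`, the class strings, and the use of a seeded `random.Random` to choose which mutants fill the remaining slots are all hypothetical. This is not codelewm's `HardNegativeCandidate` or `generate_hard_negative_pool`. It only illustrates how the output can be reproducible given (reference, seed, pool_size), and why mutants start out as `unknown`.

```python
from __future__ import annotations

import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """Hypothetical stand-in for one pool entry (not codelewm's HardNegativeCandidate)."""

    candidate_id: str
    hard_negative_class: str
    source: str
    label: str  # "pass", "fail", or "unknown"


def assemble_pool(
    before: str,
    reference: str,
    near_no_action: str,
    mutants: list[tuple[str, str]],
    seed: int,
    pool_size: int,
) -> list[Candidate]:
    """Reference first, then the two definitional baits, then seeded mutants."""
    pool = [
        Candidate("c00", "reference", reference, "pass"),
        # No-action bait: the unchanged before-state, wrong by definition.
        Candidate("c01", "no_action", before, "fail"),
        Candidate("c02", "near_no_action", near_no_action, "fail"),
    ]
    # Hypothetical: a private, seeded RNG picks which mutants fill the remaining slots.
    rng = random.Random(seed)
    slots = max(0, pool_size - len(pool))
    picked = rng.sample(mutants, k=min(slots, len(mutants)))
    for index, (mutant_class, mutant_source) in enumerate(picked, start=len(pool)):
        # Mutants are never asserted wrong without sandbox verification.
        pool.append(Candidate(f"c{index:02d}", mutant_class, mutant_source, "unknown"))
    return pool[:pool_size]
```

Seeding a private `random.Random` instance, rather than the module-level generator, keeps the mutant selection independent of any other code that draws random numbers.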

Linked Issue

Closes #420.

Spec / RFC Reference

Spec section: docs/spec/11-llm-world-model-harness.md, docs/spec/06-security.md
RFC: docs/rfcs/RFC-0016-hard-downstream-reranking-benchmark.md

Public Surface Impact

New Python API:

codelewm.eval: generate_hard_negative_pool, build_label_construction_report, HardNegativeCandidate, HardNegativePoolError, HARD_NEGATIVE_POOL_SCHEMA_VERSION, LABEL_CONSTRUCTION_REPORT_SCHEMA_VERSION.
codelewm.data.hard_negative_labeler: label_candidate, label_candidates, build_sandbox_label_construction_report, LabelTestCase, CandidateLabel, HardNegativeLabelerError.

New schema versions (additive): codelewm.hard_negative_pool.v1, codelewm.downstream_label_construction_report.v1, codelewm.hard_negative_labeled_candidate.v1.
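One possible way to tally the label-construction report's accounting, as a sketch. Only the schema-version string comes from this PR. `label_accounting` and every report field name are hypothetical, and the real `build_label_construction_report` may record more than this.

```python
from __future__ import annotations

from collections import Counter
from typing import Iterable

# Schema version named in this PR; the fields below are illustrative only.
REPORT_SCHEMA = "codelewm.downstream_label_construction_report.v1"


def label_accounting(candidates: Iterable[tuple[str, str]]) -> dict:
    """Count (hard_negative_class, label) pairs into a hypothetical report dict."""
    pairs = list(candidates)
    by_class = Counter(cls for cls, _ in pairs)
    by_label = Counter(label for _, label in pairs)
    return {
        "schema_version": REPORT_SCHEMA,
        "candidate_count": len(pairs),
        # Sorted keys keep the report byte-stable across runs.
        "by_class": dict(sorted(by_class.items())),
        "by_label": dict(sorted(by_label.items())),
        # Labels the generator did not assert and the sandbox has not yet verified.
        "unverified_count": by_label.get("unknown", 0),
    }
```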

New config key (optional, task-level): generated_pool (reference_after_path, seed, pool_size). DownstreamBenchmarkPackResult gains optional label_construction_report_path. No existing field/baseline/schema changed. No new CLI command — eval downstream-pack drives generation when the config sets generated_pool.
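A sketch of validating the optional task-level `generated_pool` spec, assuming the task config has already been loaded into a dict. Only the three key names come from this PR. `parse_generated_pool` and its validation rules (a strict key set, integer `seed`, `pool_size >= 1`) are assumptions, not the pack builder's actual checks.

```python
from __future__ import annotations

from typing import Any

# Key names from this PR; the validation rules below are assumptions.
GENERATED_POOL_KEYS = {"reference_after_path", "seed", "pool_size"}


def parse_generated_pool(task: dict[str, Any]) -> dict[str, Any] | None:
    """Return the optional generated_pool spec, or None when the task omits it."""
    spec = task.get("generated_pool")
    if spec is None:
        return None  # Task keeps its hand-written candidates; nothing is generated.
    if not isinstance(spec, dict):
        raise ValueError("generated_pool must be a mapping")
    missing = GENERATED_POOL_KEYS - set(spec)
    unknown = set(spec) - GENERATED_POOL_KEYS
    if missing or unknown:
        raise ValueError(f"generated_pool: missing={sorted(missing)} unknown={sorted(unknown)}")
    path = spec["reference_after_path"]
    if not isinstance(path, str) or not path:
        raise ValueError("reference_after_path must be a non-empty string")
    for key in ("seed", "pool_size"):
        # bool is a subclass of int, so reject it explicitly.
        if not isinstance(spec[key], int) or isinstance(spec[key], bool):
            raise ValueError(f"{key} must be an integer")
    if spec["pool_size"] < 1:
        raise ValueError("pool_size must be at least 1")
    return dict(spec)
```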

Validation

uv run pytest tests/eval/test_hard_downstream_pool.py tests/data/test_hard_negative_labeler.py -q   # 14 passed
uv run pytest tests/eval/test_downstream_pack.py tests/eval/test_downstream_rerank.py tests/eval/test_downstream_schema.py tests/eval/test_hard_downstream_schema.py tests/eval/test_hard_downstream_pack.py tests/security/test_sandbox_import_boundary.py tests/test_imports.py -q   # 34 passed
uv run python -m compileall -q codelewm/eval/hard_negative_pool.py codelewm/data/hard_negative_labeler.py codelewm/eval/downstream_pack.py codelewm/eval/__init__.py
git diff --check

Artifact Impact

A pack with a generated_pool task writes reports/label_construction_report.json (codelewm.downstream_label_construction_report.v1), materializes generated candidate files under tasks/<id>/candidates/, and records label_construction_report in the manifest. Each generated candidate's source carries hard_negative_class, checksum, generator, label_source, and source_license_status.
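A sketch of the per-candidate `source` block listed above. `json_sha256` is a hypothetical stand-in for `compute_json_sha256`, assuming the checksum is SHA-256 over canonical JSON (sorted keys, compact separators). This PR does not say what the real checksum covers, so the hashed payload is also an assumption, as are the example field values.

```python
from __future__ import annotations

import hashlib
import json
from typing import Any


def json_sha256(payload: Any) -> str:
    """Hypothetical stand-in for compute_json_sha256: SHA-256 of canonical JSON."""
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def candidate_source_record(
    candidate_id: str,
    hard_negative_class: str,
    code: str,
    label_source: str,
    source_license_status: str,
    generator: str = "hard_negative_pool",
) -> dict[str, str]:
    """Build a per-candidate source block carrying the five fields this PR lists."""
    # Assumption: the checksum covers the candidate id, class, and code text.
    checksum = json_sha256({"id": candidate_id, "class": hard_negative_class, "code": code})
    return {
        "hard_negative_class": hard_negative_class,
        "checksum": checksum,
        "generator": generator,
        "label_source": label_source,
        "source_license_status": source_license_status,
    }
```

Canonicalizing before hashing means two dicts with the same content but different key order produce the same checksum. This is what makes the checksum usable for determinism tests.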

Deprecations

none

Caveats / Follow-ups

Mutant candidates are labeled unknown by the deterministic generator (not asserted). The codelewm.data.hard_negative_labeler sandbox path upgrades them to verified pass/fail; feed those labels back into a config for headline runs.
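A sketch of the label-upgrade rule described in this caveat, under explicit assumptions. `SandboxVerdict` and `upgrade_label` are hypothetical (they are not `CandidateLabel` or `label_candidates`). The sketch treats each verdict as one test-case run and lets a candidate pass only if every case passes. A timeout or a failed determinism check leaves the label `unknown` rather than turning it into `fail`; that conservative policy is also an assumption.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class SandboxVerdict:
    """Hypothetical outcome of running one candidate against one test case."""

    passed: bool
    timed_out: bool
    deterministic: bool  # True if repeated runs produced identical results


def upgrade_label(generator_label: str, verdicts: list[SandboxVerdict]) -> tuple[str, str]:
    """Return (label, label_source). Only generator-unknown labels are upgraded."""
    if generator_label != "unknown":
        # The reference and the two definitional baits keep their generator label.
        return generator_label, "generator"
    if not verdicts:
        return "unknown", "generator"  # never assert a label without verification
    if any(v.timed_out or not v.deterministic for v in verdicts):
        return "unknown", "sandbox_inconclusive"
    # Assumption: a candidate passes only if every test case passes.
    return ("pass" if all(v.passed for v in verdicts) else "fail"), "sandbox"
```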
The bundled generated-pool fixture has one task (and identical token sets at n=1), so it stays claim-blocked; it exercises plumbing, not eligibility.
Follow-ups: #421 (LLM candidate ingestion: "harness: ingest LLM candidate packs into the hard benchmark"), #422 (scoring + claim gate: "eval: score hard benchmark baselines and CodeLeWM claim gate"), #423 (publication: "results: publish hard benchmark artifacts and claim audit").

Add the deterministic hard-negative candidate-pool generator and the sandbox label-construction path for the RFC-0016 hard downstream benchmark, so a pack can mix a passing reference with plausible wrong candidates that create real reranking headroom.

New `codelewm/eval/hard_negative_pool.py` (non-executing generator):
- `generate_hard_negative_pool` derives a pool from an accepted reference: the passing reference, a no-action bait (unchanged before-state), a near-no-action bait, then single-point AST mutants (`codelewm.data.wsd_mutations.generate_mutants`) mapped to wrong-symbol / wrong-branch / deterministic-mutant classes. Output is deterministic given (reference, seed, pool_size).
- Each candidate records a stable id, hard-negative class, checksum (`compute_json_sha256`), and static-check status via `ast.parse`. Mutant labels default to `unknown` (never asserted without verification); the two definitional baits are `fail` and the reference is `pass`.
- `build_label_construction_report` emits the `codelewm.downstream_label_construction_report.v1` accounting report.

The module never imports the sandbox (enforced by the eval import boundary).

New `codelewm/data/hard_negative_labeler.py` (data-prep, sandbox):
- `label_candidate` / `label_candidates` construct trustworthy pass/fail labels by executing candidates through the allowlisted stdlib-only sandbox (`run_one`) under timeouts, output limits, and the determinism check, and `build_sandbox_label_construction_report` records the sandbox policy version.

This is the only RFC-0016 path that runs candidate code; it stays under `codelewm/data` so no scoring path imports it.

`downstream_pack.py` gains an optional task-level `generated_pool` spec (`reference_after_path`, `seed`, `pool_size`). When present, the build generates the pool, materializes each candidate file, injects the hard-negative class / checksum / source-license status into the candidate source, writes `reports/label_construction_report.json`, and records it in the manifest. The source/license gate, split-leakage report (task_id + repo_id across splits), secret scan, and anti-saturation diagnostics from #419 all apply to generated pools.

Adds a generated-pool fixture plus tests for candidate-class accounting, checksums and determinism, the label-construction report, the pack-build integration, split-leakage rejection, source/license blockers, the non-execution boundary, and sandbox-verified labeling.

Closes #420.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
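To illustrate what a "single-point AST mutant" and an `ast.parse` static check look like, here is a small sketch that flips exactly one comparison operator per mutant. It is not `codelewm.data.wsd_mutations.generate_mutants`. The operator table and function names are hypothetical, and the sketch needs Python 3.9+ for `ast.unparse`.

```python
from __future__ import annotations

import ast

# Hypothetical operator swaps used only for this illustration.
_SWAPS = {
    ast.Lt: ast.GtE, ast.GtE: ast.Lt,
    ast.Gt: ast.LtE, ast.LtE: ast.Gt,
    ast.Eq: ast.NotEq, ast.NotEq: ast.Eq,
}


def static_check(source: str) -> str:
    """Return 'ok' if the source parses, else 'syntax_error'."""
    try:
        ast.parse(source)
    except SyntaxError:
        return "syntax_error"
    return "ok"


def comparison_mutants(source: str) -> list[str]:
    """Return one mutant per swappable comparison, each differing at a single point."""
    tree = ast.parse(source)
    # Collect every (Compare node, operator index) site before mutating anything.
    sites = [
        (node, i)
        for node in ast.walk(tree)
        if isinstance(node, ast.Compare)
        for i, op in enumerate(node.ops)
        if type(op) in _SWAPS
    ]
    mutants = []
    for node, i in sites:
        original = node.ops[i]
        node.ops[i] = _SWAPS[type(original)]()  # flip exactly one operator
        mutants.append(ast.unparse(tree))
        node.ops[i] = original  # restore before mutating the next site
    return mutants
```

Because each mutant still parses, it passes the static check. A plausible-but-wrong candidate like this is what distinguishes a hard negative from an easy syntax failure.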

AbdelStark merged commit 4c9b13f into main Jun 8, 2026
9 checks passed

AbdelStark deleted the issue-420-hard-negative-pools branch June 8, 2026 13:51

AbdelStark mentioned this pull request Jun 8, 2026 in [TRACKER] v1.5 hard anti-saturation downstream reranking benchmark #417 (closed, 7 tasks).

AbdelStark merged 1 commit into main from issue-420-hard-negative-pools
