data: build public-safe hard-negative candidate pools (#420) #426

Merged
AbdelStark merged 1 commit into main from issue-420-hard-negative-pools
Jun 8, 2026

@AbdelStark

Owner

Summary

Adds the deterministic public-safe hard-negative candidate-pool generator and the sandbox label-construction path for the RFC-0016 hard downstream benchmark. A pack can now mix a passing reference with plausible wrong candidates (no-action / near-no-action baits + single-point AST mutants) that create real reranking headroom, instead of relying on easy syntax failures. The non-executing generator lives under codelewm/eval; the only code-executing piece (sandbox labeler) lives under codelewm/data, preserving the sandbox import boundary.

Linked Issue

Closes #420.

Spec / RFC Reference

  • Spec section: docs/spec/11-llm-world-model-harness.md, docs/spec/06-security.md
  • RFC: docs/rfcs/RFC-0016-hard-downstream-reranking-benchmark.md

Public Surface Impact

New Python API:

  • codelewm.eval: generate_hard_negative_pool, build_label_construction_report, HardNegativeCandidate, HardNegativePoolError, HARD_NEGATIVE_POOL_SCHEMA_VERSION, LABEL_CONSTRUCTION_REPORT_SCHEMA_VERSION.
  • codelewm.data.hard_negative_labeler: label_candidate, label_candidates, build_sandbox_label_construction_report, LabelTestCase, CandidateLabel, HardNegativeLabelerError.

New schema versions (additive): codelewm.hard_negative_pool.v1, codelewm.downstream_label_construction_report.v1, codelewm.hard_negative_labeled_candidate.v1.

New config key (optional, task-level): generated_pool, with fields reference_after_path, seed, and pool_size. DownstreamBenchmarkPackResult gains an optional label_construction_report_path field. No existing field, baseline, or schema changed. No new CLI command: eval downstream-pack drives generation when the config sets generated_pool.
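
For reviewers who want a picture of the new key, here is a minimal sketch of how a task-level generated_pool block could be parsed and validated. Only the key names (generated_pool, reference_after_path, seed, pool_size) come from this PR; the GeneratedPoolSpec class, the parse_generated_pool helper, the validation rules, and the example path are assumptions for illustration, not the code in downstream_pack.py.

```python
from dataclasses import dataclass
from typing import Any, Mapping, Optional

REQUIRED_FIELDS = ("reference_after_path", "seed", "pool_size")


@dataclass(frozen=True)
class GeneratedPoolSpec:
    """Hypothetical container for the optional task-level generated_pool key."""

    reference_after_path: str
    seed: int
    pool_size: int


def parse_generated_pool(task: Mapping[str, Any]) -> Optional[GeneratedPoolSpec]:
    """Return a spec when the task sets generated_pool, otherwise None."""
    raw = task.get("generated_pool")
    if raw is None:
        # The key is optional, so tasks without it are left untouched.
        return None
    missing = [name for name in REQUIRED_FIELDS if name not in raw]
    if missing:
        raise ValueError(f"generated_pool is missing fields: {missing}")
    spec = GeneratedPoolSpec(
        reference_after_path=str(raw["reference_after_path"]),
        seed=int(raw["seed"]),
        pool_size=int(raw["pool_size"]),
    )
    if spec.pool_size < 1:
        # Assumed rule: a pool must hold at least the reference candidate.
        raise ValueError("pool_size must be at least 1")
    return spec


# Example task config; the task id and path are made up for this sketch.
example_task = {
    "task_id": "example-task",
    "generated_pool": {
        "reference_after_path": "tasks/example-task/reference_after.py",
        "seed": 7,
        "pool_size": 8,
    },
}
example_spec = parse_generated_pool(example_task)
```

A task without the key returns None, which mirrors the "optional, no existing field changed" claim above.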

Validation

uv run pytest tests/eval/test_hard_downstream_pool.py tests/data/test_hard_negative_labeler.py -q   # 14 passed
uv run pytest tests/eval/test_downstream_pack.py tests/eval/test_downstream_rerank.py tests/eval/test_downstream_schema.py tests/eval/test_hard_downstream_schema.py tests/eval/test_hard_downstream_pack.py tests/security/test_sandbox_import_boundary.py tests/test_imports.py -q   # 34 passed
uv run python -m compileall -q codelewm/eval/hard_negative_pool.py codelewm/data/hard_negative_labeler.py codelewm/eval/downstream_pack.py codelewm/eval/__init__.py
git diff --check

Artifact Impact

A pack with a generated_pool task writes reports/label_construction_report.json (codelewm.downstream_label_construction_report.v1), materializes generated candidate files under tasks/<id>/candidates/, and records label_construction_report in the manifest. Each generated candidate's source carries hard_negative_class, checksum, generator, label_source, and source_license_status.
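
As a rough illustration of the accounting this report is meant to provide, the sketch below tallies candidates by class and by label source. The schema version string and the per-candidate field names (hard_negative_class, label_source) are taken from this PR; the report layout, the sketch_report helper, and the example values are assumptions and may not match the real report.

```python
from collections import Counter

REPORT_SCHEMA = "codelewm.downstream_label_construction_report.v1"


def sketch_report(candidates: list) -> dict:
    """Tally candidates per class and per label source (hypothetical layout)."""
    by_class = Counter(c["hard_negative_class"] for c in candidates)
    by_label_source = Counter(c["label_source"] for c in candidates)
    return {
        "schema_version": REPORT_SCHEMA,
        "candidate_count": len(candidates),
        # Sorted keys keep the serialized report stable between runs.
        "by_hard_negative_class": dict(sorted(by_class.items())),
        "by_label_source": dict(sorted(by_label_source.items())),
    }


# Example candidate metadata; the class and label-source values are made up.
example_candidates = [
    {"hard_negative_class": "reference", "label_source": "definitional"},
    {"hard_negative_class": "no_action", "label_source": "definitional"},
    {"hard_negative_class": "wrong_branch", "label_source": "unverified"},
    {"hard_negative_class": "wrong_branch", "label_source": "unverified"},
]
example_report = sketch_report(example_candidates)
```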

Deprecations

none

Caveats / Follow-ups

Add the deterministic hard-negative candidate-pool generator and the
sandbox label-construction path for the RFC-0016 hard downstream benchmark,
so a pack can mix a passing reference with plausible wrong candidates that
create real reranking headroom.

New `codelewm/eval/hard_negative_pool.py` (non-executing generator):
- `generate_hard_negative_pool` derives a pool from an accepted reference:
  the passing reference, a no-action bait (unchanged before-state), a
  near-no-action bait, then single-point AST mutants
  (`codelewm.data.wsd_mutations.generate_mutants`) mapped to wrong-symbol /
  wrong-branch / deterministic-mutant classes. Output is deterministic given
  (reference, seed, pool_size).
- Each candidate records a stable id, hard-negative class, checksum
  (`compute_json_sha256`), and static-check status via `ast.parse`. Mutant
  labels default to `unknown` (never asserted without verification); the two
  definitional baits are `fail` and the reference is `pass`.
- `build_label_construction_report` emits the
  `codelewm.downstream_label_construction_report.v1` accounting report. The
  module never imports the sandbox (enforced by the eval import boundary).
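
The generator behaviour described above can be sketched roughly as follows. This is not the code in `hard_negative_pool.py`: the `Candidate` dataclass, `json_sha256`, `sketch_pool`, the class-name strings, the id format, and the way the near-no-action bait is built are all assumptions, and the mutants are passed in as plain strings instead of coming from `generate_mutants`. It only shows the shape of the idea: fixed labels for the reference and baits, `unknown` for mutants, a parse-only static check, and output that depends only on the inputs and the seed.

```python
import ast
import hashlib
import json
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    candidate_id: str
    hard_negative_class: str
    source: str
    checksum: str
    static_check: str  # "ok" or "syntax_error"
    label: str  # "pass", "fail", or "unknown"


def json_sha256(payload: dict) -> str:
    # Canonical JSON (sorted keys, fixed separators) keeps the hash stable.
    encoded = json.dumps(payload, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()


def static_check(source: str) -> str:
    # Parse only; the candidate is never executed here.
    try:
        ast.parse(source)
    except SyntaxError:
        return "syntax_error"
    return "ok"


def make_candidate(hard_negative_class: str, source: str, label: str) -> Candidate:
    checksum = json_sha256({"class": hard_negative_class, "source": source})
    return Candidate(
        candidate_id=f"{hard_negative_class}-{checksum[:12]}",
        hard_negative_class=hard_negative_class,
        source=source,
        checksum=checksum,
        static_check=static_check(source),
        label=label,
    )


def sketch_pool(before: str, reference: str, mutants: list, seed: int, pool_size: int) -> list:
    # The reference and the two definitional baits come first with known labels.
    pool = [
        make_candidate("reference", reference, "pass"),
        make_candidate("no_action", before, "fail"),
        # Hypothetical near-no-action bait: the before-state plus a comment.
        make_candidate("near_no_action", before + "\n# no functional change\n", "fail"),
    ]
    # Sort before shuffling so the result depends only on (inputs, seed).
    ordered = sorted(set(mutants))
    random.Random(seed).shuffle(ordered)
    for source in ordered:
        # Mutant labels stay "unknown" until something verifies them.
        pool.append(make_candidate("deterministic_mutant", source, "unknown"))
    return pool[:pool_size]
```

Calling `sketch_pool` twice with the same `before`, `reference`, mutant set, `seed`, and `pool_size` returns equal lists, regardless of the order in which the mutants were supplied.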

New `codelewm/data/hard_negative_labeler.py` (data-prep, sandbox):
- `label_candidate` / `label_candidates` construct trustworthy pass/fail
  labels by executing candidates through the allowlisted stdlib-only sandbox
  (`run_one`) under timeouts, output limits, and the determinism check, and
  `build_sandbox_label_construction_report` records the sandbox policy
  version. This is the only RFC-0016 path that runs candidate code; it stays
  under `codelewm/data` so no scoring path imports it.
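
To make the labeling flow concrete, here is a simplified stand-in. It does not use the project's sandbox or `run_one`, whose interface is not shown in this PR; a plain child interpreter started with `subprocess` is not a sandbox and is used here only to illustrate the timeout, output limit, and run-twice determinism check. `run_isolated`, `sketch_label`, and the default limits are hypothetical.

```python
import subprocess
import sys


def run_isolated(source: str, test_code: str, timeout_s: float = 5.0, max_output: int = 4096):
    """Run candidate plus test in a child interpreter; return (status, stdout)."""
    program = source + "\n" + test_code + "\n"
    try:
        proc = subprocess.run(
            [sys.executable, "-I", "-c", program],  # -I: isolated mode
            capture_output=True,
            text=True,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        return "timeout", ""
    status = "ok" if proc.returncode == 0 else "error"
    # Truncate captured output so a noisy candidate cannot flood the record.
    return status, proc.stdout[:max_output]


def sketch_label(source: str, test_code: str, timeout_s: float = 5.0) -> str:
    """Label a candidate pass/fail, or unknown if two runs disagree."""
    first = run_isolated(source, test_code, timeout_s)
    second = run_isolated(source, test_code, timeout_s)
    if first != second:
        # Differing status or output between runs fails the determinism check.
        return "unknown"
    return "pass" if first[0] == "ok" else "fail"
```

For example, `sketch_label("def add(a, b):\n    return a + b", "assert add(2, 3) == 5")` yields `pass`, while the same test against a body of `return a - b` yields `fail`.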

`downstream_pack.py` gains an optional task-level `generated_pool` spec
(`reference_after_path`, `seed`, `pool_size`). When present, the build
generates the pool, materializes each candidate file, injects the
hard-negative class / checksum / source-license status into the candidate
source, writes `reports/label_construction_report.json`, and records it in
the manifest. The source/license gate, split-leakage report (task_id +
repo_id across splits), secret scan, and anti-saturation diagnostics from
#419 all apply to generated pools.
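
The split-leakage check mentioned above can be pictured with a small sketch: flag any `task_id` or `repo_id` that shows up in more than one split. The `find_split_leakage` helper, the `split` field name, and the return layout are assumptions for illustration; the real report from #419 may be structured differently.

```python
from collections import defaultdict


def find_split_leakage(tasks: list) -> dict:
    """Return task_id / repo_id values that appear in more than one split."""
    seen = {"task_id": defaultdict(set), "repo_id": defaultdict(set)}
    for task in tasks:
        for key, by_identifier in seen.items():
            by_identifier[task[key]].add(task["split"])
    return {
        key: {
            identifier: sorted(splits)
            for identifier, splits in by_identifier.items()
            if len(splits) > 1  # present in two or more splits means leakage
        }
        for key, by_identifier in seen.items()
    }


# Example: one repository contributes tasks to both train and test.
example_tasks = [
    {"task_id": "t1", "repo_id": "repo-a", "split": "train"},
    {"task_id": "t2", "repo_id": "repo-a", "split": "test"},
    {"task_id": "t3", "repo_id": "repo-b", "split": "test"},
]
example_leakage = find_split_leakage(example_tasks)
```

A build that treats any non-empty entry as a blocker would reject the example above because `repo-a` spans two splits.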

Adds a generated-pool fixture plus tests for candidate-class accounting,
checksums and determinism, the label-construction report, the pack-build
integration, split-leakage rejection, source/license blockers, the
non-execution boundary, and sandbox-verified labeling.

Closes #420.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@AbdelStark AbdelStark merged commit 4c9b13f into main Jun 8, 2026
9 checks passed
@AbdelStark AbdelStark deleted the issue-420-hard-negative-pools branch June 8, 2026 13:51