data: build public-safe hard-negative candidate pools (#420) #426
Merged
Add the deterministic hard-negative candidate-pool generator and the sandbox label-construction path for the RFC-0016 hard downstream benchmark, so a pack can mix a passing reference with plausible wrong candidates that create real reranking headroom.

New `codelewm/eval/hard_negative_pool.py` (non-executing generator):
- `generate_hard_negative_pool` derives a pool from an accepted reference: the passing reference, a no-action bait (unchanged before-state), a near-no-action bait, then single-point AST mutants (`codelewm.data.wsd_mutations.generate_mutants`) mapped to wrong-symbol / wrong-branch / deterministic-mutant classes. Output is deterministic given (reference, seed, pool_size).
- Each candidate records a stable id, hard-negative class, checksum (`compute_json_sha256`), and static-check status via `ast.parse`. Mutant labels default to `unknown` (never asserted without verification); the two definitional baits are `fail` and the reference is `pass`.
- `build_label_construction_report` emits the `codelewm.downstream_label_construction_report.v1` accounting report. The module never imports the sandbox (enforced by the eval import boundary).

New `codelewm/data/hard_negative_labeler.py` (data-prep, sandbox):
- `label_candidate` / `label_candidates` construct trustworthy pass/fail labels by executing candidates through the allowlisted stdlib-only sandbox (`run_one`) under timeouts, output limits, and the determinism check, and `build_sandbox_label_construction_report` records the sandbox policy version. This is the only RFC-0016 path that runs candidate code; it stays under `codelewm/data` so no scoring path imports it.

`downstream_pack.py` gains an optional task-level `generated_pool` spec (`reference_after_path`, `seed`, `pool_size`). When present, the build generates the pool, materializes each candidate file, injects the hard-negative class / checksum / source-license status into the candidate source, writes `reports/label_construction_report.json`, and records it in the manifest. The source/license gate, split-leakage report (task_id + repo_id across splits), secret scan, and anti-saturation diagnostics from #419 all apply to generated pools.

Adds a generated-pool fixture plus tests for candidate-class accounting, checksums and determinism, the label-construction report, the pack-build integration, split-leakage rejection, source/license blockers, the non-execution boundary, and sandbox-verified labeling.

Closes #420.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
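
For readers who want a concrete picture of the non-executing generator described in this commit, here is a minimal, self-contained sketch. It is not the `codelewm` implementation. The names `generate_pool`, `single_point_mutants`, `checksum`, and `static_check` are hypothetical, the comparison-operator swap is an assumed stand-in for `generate_mutants`, the checksum uses plain `hashlib` rather than `compute_json_sha256`, and the near-no-action bait is omitted for brevity. It only illustrates the stated properties: candidates are parsed but never executed, mutant labels stay `unknown`, and the output depends only on (reference, seed, pool_size).

```python
import ast
import hashlib
import json
import random

# Assumed mutation operator: swap one comparison for a near neighbour.
_SWAPS = {ast.Lt: ast.LtE, ast.LtE: ast.Lt, ast.Gt: ast.GtE,
          ast.GtE: ast.Gt, ast.Eq: ast.NotEq, ast.NotEq: ast.Eq}


def checksum(payload: dict) -> str:
    """SHA-256 over a canonical JSON encoding (sorted keys, compact separators)."""
    encoded = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(encoded.encode("utf-8")).hexdigest()


def static_check(source: str) -> str:
    """Parse only; the candidate is never executed here."""
    try:
        ast.parse(source)
    except SyntaxError:
        return "syntax_error"
    return "ok"


def _sites(tree: ast.AST) -> list:
    """Comparison nodes whose first operator has a swap defined."""
    return [n for n in ast.walk(tree)
            if isinstance(n, ast.Compare) and type(n.ops[0]) in _SWAPS]


def single_point_mutants(source: str) -> list:
    """One mutant per comparison site, each differing at exactly one point."""
    mutants = []
    for index in range(len(_sites(ast.parse(source)))):
        tree = ast.parse(source)  # fresh tree so mutations never accumulate
        node = _sites(tree)[index]
        node.ops[0] = _SWAPS[type(node.ops[0])]()
        mutants.append(ast.unparse(tree))
    return mutants


def generate_pool(before: str, reference: str, seed: int, pool_size: int) -> list:
    """Reference first, then the no-action bait, then seeded-shuffled mutants."""
    mutants = single_point_mutants(reference)
    random.Random(seed).shuffle(mutants)  # seeded, so ordering is reproducible
    candidates = [("reference", reference, "pass"),
                  ("no_action", before, "fail")]
    candidates += [("deterministic_mutant", m, "unknown") for m in mutants]
    pool = []
    for index, (cls, src, label) in enumerate(candidates[:pool_size]):
        record = {"hard_negative_class": cls, "source": src,
                  "label": label, "static_check": static_check(src)}
        digest = checksum(record)
        record["checksum"] = digest
        record["candidate_id"] = f"cand-{index:03d}-{digest[:8]}"
        pool.append(record)
    return pool


BEFORE = "def clamp(x, lo, hi):\n    return x\n"
REFERENCE = (
    "def clamp(x, lo, hi):\n"
    "    if x < lo:\n        return lo\n"
    "    if x > hi:\n        return hi\n"
    "    return x\n"
)
POOL = generate_pool(BEFORE, REFERENCE, seed=7, pool_size=4)
```

Calling `generate_pool` twice with the same arguments yields identical records, which is the property the determinism tests in this change are described as checking. Only the reference and the no-action bait carry asserted labels in this sketch.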
Summary
Adds the deterministic public-safe hard-negative candidate-pool generator and the sandbox label-construction path for the RFC-0016 hard downstream benchmark. A pack can now mix a passing reference with plausible wrong candidates (no-action / near-no-action baits + single-point AST mutants) that create real reranking headroom, instead of relying on easy syntax failures. The non-executing generator lives under `codelewm/eval`; the only code-executing piece (the sandbox labeler) lives under `codelewm/data`, preserving the sandbox import boundary.
Linked Issue
Closes #420.
Spec / RFC Reference
- `docs/spec/11-llm-world-model-harness.md`, `docs/spec/06-security.md`
- `docs/rfcs/RFC-0016-hard-downstream-reranking-benchmark.md`
Public Surface Impact
- New Python API:
  - `codelewm.eval`: `generate_hard_negative_pool`, `build_label_construction_report`, `HardNegativeCandidate`, `HardNegativePoolError`, `HARD_NEGATIVE_POOL_SCHEMA_VERSION`, `LABEL_CONSTRUCTION_REPORT_SCHEMA_VERSION`.
  - `codelewm.data.hard_negative_labeler`: `label_candidate`, `label_candidates`, `build_sandbox_label_construction_report`, `LabelTestCase`, `CandidateLabel`, `HardNegativeLabelerError`.
- New schema versions (additive): `codelewm.hard_negative_pool.v1`, `codelewm.downstream_label_construction_report.v1`, `codelewm.hard_negative_labeled_candidate.v1`.
- New config key (optional, task-level): `generated_pool` (`reference_after_path`, `seed`, `pool_size`).
- `DownstreamBenchmarkPackResult` gains optional `label_construction_report_path`. No existing field/baseline/schema changed.
- No new CLI command: `eval downstream-pack` drives generation when the config sets `generated_pool`.
Validation
Artifact Impact
A pack with a `generated_pool` task writes `reports/label_construction_report.json` (`codelewm.downstream_label_construction_report.v1`), materializes generated candidate files under `tasks/<id>/candidates/`, and records `label_construction_report` in the manifest. Each generated candidate's `source` carries `hard_negative_class`, `checksum`, `generator`, `label_source`, and `source_license_status`.
Deprecations
none
Caveats / Follow-ups
Mutant labels are left `unknown` by the deterministic generator (not asserted). The `codelewm.data.hard_negative_labeler` sandbox path upgrades them to verified `pass`/`fail`; feed those labels back into a config for headline runs.
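
To make the upgrade step concrete, here is a rough sketch of how an `unknown` label could be turned into a verified one by running a candidate against test assertions. It is an illustration under stated assumptions, not the `codelewm.data.hard_negative_labeler` code. The names `run_once` and `verified_label` are hypothetical, and a plain subprocess with a timeout and an output cap stands in for the allowlisted sandbox (`run_one`), so it provides none of that sandbox's isolation guarantees. The determinism check is approximated by running twice and requiring identical outcomes.

```python
import subprocess
import sys


def run_once(candidate_source: str, test_source: str,
             timeout_s: float = 5.0, max_output: int = 4096) -> tuple:
    """Run candidate plus tests in a child interpreter; return (outcome, stdout)."""
    program = candidate_source + "\n" + test_source
    try:
        proc = subprocess.run(
            [sys.executable, "-I", "-c", program],  # -I: isolated mode
            capture_output=True, text=True, timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        return ("timeout", "")
    outcome = "pass" if proc.returncode == 0 else "fail"
    return (outcome, proc.stdout[:max_output])  # cap retained output


def verified_label(candidate_source: str, test_source: str,
                   timeout_s: float = 5.0) -> str:
    """Return 'pass' or 'fail' only when two runs agree; otherwise 'unknown'."""
    first = run_once(candidate_source, test_source, timeout_s)
    second = run_once(candidate_source, test_source, timeout_s)
    if first != second:
        return "unknown"  # non-deterministic behaviour is never asserted
    if first[0] == "timeout":
        return "fail"  # assumption: a repeatable timeout counts as a failure
    return first[0]


GOOD = "def clamp(x, lo, hi):\n    return max(lo, min(x, hi))\n"
BAD = "def clamp(x, lo, hi):\n    return x\n"
TESTS = "assert clamp(5, 0, 3) == 3\nassert clamp(-1, 0, 3) == 0\n"

LABELS = {"good": verified_label(GOOD, TESTS), "bad": verified_label(BAD, TESTS)}
```

Keeping `unknown` as the fallback mirrors the caveat above: a label is asserted only after an execution-backed check agrees with itself.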