feat(skills): remotion-to-hyperframes SKILL.md + orchestrator (7/7) #517

Merged
jrusso1020 merged 6 commits into main from skill/r2hf-skill-body
Apr 28, 2026

Conversation

@jrusso1020
Collaborator

What

The leaf PR of the 7-PR stack. Two pieces:

  1. Real SKILL.md body — replaces the placeholder from PR 1 with a 5-step workflow that loads per-topic references on demand (skill-creator's progressive-disclosure pattern).
  2. Top-level orchestrator at assets/test-corpus/run.sh that runs all four tiers and emits a pass/fail report.

End-to-end validation

Run on a clean checkout:

remotion-to-hyperframes corpus run
==================================
  ▶ tier-1-title-card (threshold 0.95, composition TitleCard)
    ⏳ render Remotion baseline
    ⏳ render HF translation
    ✓ pass (mean SSIM 0.9739, threshold 0.95)
  ▶ tier-2-multi-scene (threshold 0.95, composition MultiScene)
    ⏳ render Remotion baseline
    ⏳ render HF translation
    ✓ pass (mean SSIM 0.985292, threshold 0.95)
  ▶ tier-3-data-driven (threshold 0.9, composition Stargazed)
    ⏳ render Remotion baseline
    ⏳ render HF translation
    ✓ pass (mean SSIM 0.952941, threshold 0.9)
  ▶ tier-4-escape-hatch (lint-only)
    ✓ pass (8/8 cases)

==================================================
  passed 4/4, failed 0, skipped 0

SKILL.md design

Per skill-creator's progressive-disclosure pattern:

  • Frontmatter (always loaded): trigger phrases + out-of-scope list. Unchanged from PR 1.
  • Body (loaded when triggered): 5-step workflow under 100 lines. Lean by design.
  • References (loaded as needed): the 11 per-topic files from PR 6.

The body has a "Source contains → Load reference" table that tells the agent which references to pull based on the actual Remotion APIs the source uses. The agent doesn't load all 11 references — it loads 1–4 based on what the source needs.

| Source contains | Load reference |
|---|---|
| `Composition`, `defaultProps`, `schema`, `calculateMetadata` | parameters.md |
| `Sequence`, `Series`, `Loop`, `AbsoluteFill`, `Freeze` | sequencing.md |
| `useCurrentFrame`, `interpolate`, `spring`, `Easing` | timing.md |
| `Audio`, `Video`, `Img`, `IFrame`, `staticFile` | media.md |
| `TransitionSeries`, `@remotion/transitions` | transitions.md |
| `@remotion/lottie` | lottie.md |
| `@remotion/google-fonts/<Family>`, `Font.loadFont` | fonts.md |

Orchestrator design

assets/test-corpus/run.sh:

  • Iterates tier-1-* through tier-4-* directories.
  • For T1–T3: set up binary assets (T2 only) → lint Remotion source → lazy npm install → render Remotion baseline → render HF translation → SSIM diff at the fixture's per-tier threshold → generate frame strip on failure.
  • For T4: validate.sh (lint-only).
  • Emits run-report.json with per-tier pass/fail and aggregate counts.
  • Accepts a single-tier argument for fast iteration: ./run.sh tier-1-title-card.

The orchestrator is what makes this skill regression-testable: any future PR that touches the references can re-run ./run.sh to verify nothing broke.

Stack — complete

| # | PR | Status | What |
|---|---|---|---|
| 1 | #506 | draft | scaffold |
| 2 | #507 | draft | eval harness |
| 3 | #508 | draft | T1 + T2 fixtures (validated 0.974 / 0.985) |
| 4 | #509 | draft | T3 data-driven fixture (validated 0.953) |
| 5 | #515 | draft | T4 escape-hatch fixtures (8/8 cases) |
| 6 | #516 | draft | references (11 files, ~1500 LOC) |
| 7 | this PR | draft | SKILL.md body + orchestrator |

Test plan

  • SKILL.md passes oxfmt --check
  • ./run.sh runs all four tiers end-to-end on a clean checkout, all pass
  • ./run.sh tier-1-title-card runs a single tier (selective execution works)
  • Skill-creator's package_skill.py validates the final skill
  • Lefthook pre-commit (lint + format + typecheck) passes
  • run-report.json excluded from git via .gitignore (it's regenerated per run)

Reviewer notes

This is a substantive skill (~3000 LOC across the stack — scripts, fixtures, references, body) so it warrants more than a glance. Highest-leverage things to look at:

  1. SKILL.md body — the workflow has to be right because everything else hangs off of it. Specifically the "Source contains → Load reference" table — if any common Remotion idiom is missing a reference link, the skill will load the wrong context.
  2. references/timing.md spring → back.out table — every spring config in the wild needs a corresponding row. PR 6 has the validated 1.4 / 1.2 ratios; if you've seen springs the table doesn't cover, add them.
  3. The 0.95 / 0.95 / 0.90 / n/a thresholds — these came from a single calibration run on this machine. CI on a different host with a different Chromium version could drift. Wider thresholds = fewer false alarms but higher chance of missing a real regression.

Collaborator

@miguel-heygen miguel-heygen left a comment


The final orchestration layer has two blockers: the lambda-only policy contradicts the corpus/references, and the advertised corpus runner fails before it can run the lint-only tier in a clean checkout.

Comment thread skills/remotion-to-hyperframes/SKILL.md Outdated
- **Warnings** (translate after dropping): `delayRender`, `useCallback`, `useMemo`, custom hooks.
- **Info** (translate with note): `staticFile`, `interpolateColors`.

If any blocker fires, **stop**. Read [`references/escape-hatch.md`](references/escape-hatch.md) and surface the recommendation message. Do not produce HF output for a source that hasn't passed lint cleanly.

This makes every blocker a hard stop with no HF output, but the T4 corpus expects lambda-only sources to use drop_lambda_code_translate_remainder_if_clean, and escape-hatch.md also documents that edge case. Since lint_source.py currently emits r2hf/lambda-import as a blocker with exit code 1, the final skill will never take the documented lambda-only path. Please align the policy across SKILL.md, T4, references, and the linter severity before this lands.

STRIP="$SKILL_DIR/scripts/frame_strip.sh"
HF_CLI="$REPO_ROOT/packages/cli/dist/cli.js"

if [[ ! -f "$HF_CLI" ]]; then

This preflight runs before tier selection, so ./assets/test-corpus/run.sh tier-4-escape-hatch fails in a clean checkout even though T4 is lint-only and does not need the HF CLI. I reproduced the failure locally: error: HF CLI not built. Run 'bun run --filter @hyperframes/core build' .... The suggested command is also wrong for this check because the script needs packages/cli/dist/cli.js; building @hyperframes/core will not produce that file. Please only require the CLI for T1-T3 and point users at the CLI/root build command.

@jrusso1020 jrusso1020 force-pushed the skill/r2hf-references branch from 6717d96 to 662c4db on April 27, 2026 23:08
@jrusso1020 jrusso1020 force-pushed the skill/r2hf-skill-body branch from 01ed4b7 to 95b89ca on April 27, 2026 23:10
@jrusso1020
Collaborator Author

jrusso1020 commented Apr 27, 2026

@miguel-heygen — addressed in the amended commit 95b89ca0:

SKILL.md Step 1 — @remotion/lambda moved from Blockers list to Warnings list. Now consistent with #507's lint severity, T4's expected.json (#515), and the references (#516):

- **Blockers** (refuse + recommend interop): `useState`, `useReducer`, `useEffect` with non-empty deps, async `calculateMetadata`, `@remotion/lambda`, third-party React UI libraries...
+ **Blockers** (refuse + recommend interop): `useState`, `useReducer`, `useEffect`/`useLayoutEffect` with non-empty deps, async `calculateMetadata`, third-party React UI libraries...

- **Warnings** (translate after dropping): `delayRender`, `useCallback`, `useMemo`, custom hooks.
+ **Warnings** (translate after dropping the construct): `@remotion/lambda` config, `delayRender`, `useCallback`, `useMemo`, custom hooks.

The "What this skill explicitly does NOT do" list lost the lambda bullet and gained a parenthetical note pointing at escape-hatch.md for the warning-level treatment.

run.sh preflight fix: the HF CLI presence check (and the ffmpeg check) moved out of top-level into a require_render_tier_tools() function called from run_render_tier(). T4 (lint-only) skips it entirely. Also fixed the build command suggestion: bun run --filter @hyperframes/cli build (was @hyperframes/core — different package, doesn't produce cli/dist/cli.js).
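
The shape of that fix, sketched in Python for illustration (the real check lives in run.sh as a bash function). The `tier-4-` prefix test and the helper names here are assumptions, not the script's actual code.

```python
import shutil
from pathlib import Path
from typing import Callable, Optional


def needs_render_tools(fixture: str) -> bool:
    # Assumption: the lint-only tier is recognisable by its "tier-4-" prefix.
    return not fixture.startswith("tier-4-")


def missing_render_tools(
    cli_path: Path,
    which: Callable[[str], Optional[str]] = shutil.which,
) -> list[str]:
    """List the render-tier tools that are not available."""
    missing = []
    if not cli_path.is_file():
        missing.append(f"HF CLI ({cli_path})")
    if which("ffmpeg") is None:
        missing.append("ffmpeg")
    return missing


def preflight(
    fixture: str,
    cli_path: Path,
    which: Callable[[str], Optional[str]] = shutil.which,
) -> list[str]:
    # The check runs per fixture, after tier selection, so a lint-only tier
    # never trips over a missing CLI build or a missing ffmpeg.
    return missing_render_tools(cli_path, which) if needs_render_tools(fixture) else []
```

The `which` parameter exists only so the lookup can be stubbed in a test; by default it is `shutil.which`.
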

Verified ./run.sh tier-4-escape-hatch works on a clean checkout (no CLI build needed). Full ./run.sh still passes 4/4 end-to-end.

Collaborator

@miguel-heygen miguel-heygen left a comment


The lambda policy is now aligned and T4 can run without the CLI, but the full orchestrator still reports success when the render tiers never ran.

fixture_name=$(basename "$fixture_dir")
local expected="$fixture_dir/expected.json"

if ! require_render_tier_tools; then

Returning success here converts missing render-tier tooling into a skipped fixture, and the aggregate exits 0 as long as there are no explicit failures. In a clean checkout without packages/cli/dist/cli.js, I ran ./assets/test-corpus/run.sh; it skipped T1/T2/T3, ran only T4, printed passed 1/4, failed 0, skipped 3, and exited 0. That makes the advertised end-to-end corpus check pass without validating any render/diff tier. Missing CLI/ffmpeg for a requested render tier should be a failure, or the aggregate should exit non-zero when required tiers are skipped. Also the header comment still points users at @hyperframes/core build, while the runtime error correctly says @hyperframes/cli build.

@jrusso1020 jrusso1020 force-pushed the skill/r2hf-references branch from 662c4db to 5ef3b40 on April 27, 2026 23:31
@jrusso1020 jrusso1020 force-pushed the skill/r2hf-skill-body branch from 95b89ca to f29a062 on April 27, 2026 23:34
@jrusso1020
Collaborator Author

@miguel-heygen — addressed in f29a0627 (after #507's regex fix rebase):

The aggregator now treats skipped fixtures as failures so a clean checkout without the HF CLI doesn't accidentally report green:

sys.exit(0 if failed == 0 and skipped == 0 else 1)

The summary block also calls out skipped fixtures explicitly with their reason and notes that they count as failures, so the user sees what didn't run rather than just a small passed N/M line.

Verified all three modes:

# Happy path (CLI built, ffmpeg present)
$ ./run.sh ; echo "exit=$?"
  passed 4/4, failed 0, skipped 0
  exit=0

# T4-only (lint, no CLI needed)
$ ./run.sh tier-4-escape-hatch ; echo "exit=$?"
  passed 1/1, failed 0, skipped 0
  exit=0

# Clean checkout, full run, CLI not built (Miguel's repro)
$ ./run.sh ; echo "exit=$?"
  passed 1/4, failed 0, skipped 3
  ⚠ 3 skipped: tier-1-title-card, tier-2-multi-scene, tier-3-data-driven
    reason: render toolchain unavailable
  Skipped fixtures count as failures for the aggregate.
  exit=1

The single-tier mode (./run.sh tier-N-...) only counts the selected tier — tiers that weren't selected don't show up as skips, so picking T4 alone still cleanly exits 0.
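
A minimal sketch of that aggregate rule, assuming each result is a dict with a `status` of `pass`, `fail`, or `skip` (the field names are hypothetical; only the exit expression comes from the change above):

```python
def summarize(results: list[dict]) -> tuple[str, int]:
    """Build the summary line and exit code from the tiers that were selected.

    Unselected tiers are simply absent from `results`, so they are never
    counted as skips.
    """
    passed = sum(1 for r in results if r["status"] == "pass")
    failed = sum(1 for r in results if r["status"] == "fail")
    skipped = sum(1 for r in results if r["status"] == "skip")
    line = f"passed {passed}/{len(results)}, failed {failed}, skipped {skipped}"
    # Skipped fixtures count as failures for the aggregate exit code.
    exit_code = 0 if failed == 0 and skipped == 0 else 1
    return line, exit_code
```

Fed the clean-checkout case (three skipped render tiers plus a passing T4), this yields `passed 1/4, failed 0, skipped 3` with exit code 1; fed T4 alone, it yields `passed 1/1, failed 0, skipped 0` with exit code 0.
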

Adds the deterministic eval primitives the skill calls into:

  scripts/render_diff.sh    SSIM diff between two MP4s, JSON summary, configurable threshold
  scripts/frame_strip.sh    side-by-side comparison strip for visual debugging
  scripts/lint_source.py    pre-translation lint over Remotion source — blocks/warnings/infos

The harness is decoupled from the render pipeline: it accepts paths to
already-rendered MP4s. The skill orchestrator (PR 7) drives both renders
and feeds the outputs in. This keeps the harness usable in CI, in
sandboxes, and on any machine that has ffmpeg without needing the full
Remotion + HyperFrames toolchain.

Lint catches the patterns from the skill's out-of-scope list:
- useState / useReducer (state-machine driven animation)
- useEffect with deps (side effects)
- async calculateMetadata (Promise-returning composition metadata)
- @remotion/lambda imports
- third-party React UI libraries (MUI, Chakra, Mantine, antd, shadcn, Radix, NextUI)
- delayRender / useCallback / useMemo (warnings)
- staticFile / interpolateColors (info — translatable but flagged)
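
To make the three severities concrete, here is a deliberately tiny sketch of the idea. It is not lint_source.py: the regexes are simplified, and every rule id except `r2hf/use-state` is a hypothetical name chosen for this example.

```python
import re

# (rule id, severity, pattern). Only "r2hf/use-state" is a rule id taken from
# the corpus; the other two ids are illustrative assumptions.
RULES = [
    ("r2hf/use-state", "blocker", re.compile(r"\buseState\b")),
    ("r2hf/delay-render", "warning", re.compile(r"\bdelayRender\b")),
    ("r2hf/static-file", "info", re.compile(r"\bstaticFile\b")),
]


def lint(source: str) -> list[tuple[str, str, int]]:
    """Return (rule, severity, line number) for every match, in line order."""
    findings = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for rule, severity, pattern in RULES:
            if pattern.search(line):
                findings.append((rule, severity, lineno))
    return findings


def exit_code(findings: list[tuple[str, str, int]]) -> int:
    # Exit 1 only when a blocker fired; warnings and infos do not fail the lint.
    return 1 if any(sev == "blocker" for _, sev, _ in findings) else 0
```

A line-by-line regex scan like this would miss constructs that span several lines, which is one reason the sketch should not be read as the real linter's approach.
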

Smoke test (scripts/tests/smoke.sh) exercises all three scripts against
synthetic inputs: identical ffmpeg testsrc videos pass at threshold 0.99,
different ffmpeg testsrc videos fail at 0.99, frame_strip produces a
strip.png, lint produces 0 blockers on a clean fixture and >=3 blockers
on a fixture that uses useState + useEffect + MUI + async metadata.

Validated locally: smoke.sh exits 0.

Adds the first two test fixtures the skill is graded against. Each fixture
ships:
  - remotion-src/  full Remotion project (package.json, src/, remotion.config.ts, tsconfig.json)
  - hf-src/        hand-translated HyperFrames composition (index.html)
  - expected.json  tier metadata + SSIM threshold + translation notes + measured validation
  - README.md      human walk-through of the translation choices
  - setup.sh       (T2 only) generates binary assets (PNG, WAV) via ffmpeg

T1 — title-card-fade
- 3 s @ 30 fps, 1280x720
- Single AbsoluteFill, single useCurrentFrame interpolate
  with multi-segment input [0,15,75,90] -> [0,1,1,0]
- Validated mean SSIM 0.974, threshold 0.95
  (~0.025 gap from font-fallback divergence between Remotion's bundled
   Chromium and HF's chrome-headless-shell)
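
The multi-segment mapping [0,15,75,90] -> [0,1,1,0] is a piecewise-linear curve: fade in over frames 0 to 15, hold, fade out over frames 75 to 90. A small sketch of that arithmetic follows; it clamps outside the input range, which is a simplification and may differ from Remotion's own extrapolation defaults.

```python
def interpolate(frame: float, input_range: list[float], output_range: list[float]) -> float:
    """Piecewise-linear interpolation, clamped at both ends (sketch only)."""
    if frame <= input_range[0]:
        return output_range[0]
    if frame >= input_range[-1]:
        return output_range[-1]
    for i in range(len(input_range) - 1):
        x0, x1 = input_range[i], input_range[i + 1]
        if x0 <= frame <= x1:
            y0, y1 = output_range[i], output_range[i + 1]
            t = (frame - x0) / (x1 - x0)  # progress through this segment, 0..1
            return y0 + t * (y1 - y0)
    raise ValueError("input_range must be strictly increasing")


def opacity(frame: float) -> float:
    # The T1 title-card curve from the fixture description.
    return interpolate(frame, [0, 15, 75, 90], [0, 1, 1, 0])
```

At 30 fps the four breakpoints fall at 0 s, 0.5 s, 2.5 s, and 3 s, so a GSAP translation of this curve needs a 0.5 s fade at each end of the 3 s composition.
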

T2 — title-image-outro
- 6 s @ 30 fps, 1280x720, three Sequences (TitleScene, ImageScene, OutroScene)
- Exercises spring, interpolate, Audio, Img, staticFile
- Spring -> GSAP back.out(1.4) translation
- Validated mean SSIM 0.985, threshold 0.95
  (translation came out cleaner than predicted; spring->back.out drift was
   smaller than the ~0.05 budget I'd expected)
- setup.sh generates a 200x200 blue PNG and a 6 s silent WAV via ffmpeg
  so binaries stay out of the repo

Calibration done end-to-end: rendered Remotion baseline + HF translation,
ran scripts/render_diff.sh, set thresholds ~0.02 below measured p05.
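
The threshold rule of thumb can be written down as a short sketch. The nearest-rank percentile and the rounding to two decimals are assumptions made for this example; render_diff.sh may compute p05 differently.

```python
def p05(values: list[float]) -> float:
    """5th percentile by nearest rank (an assumed definition)."""
    ordered = sorted(values)
    rank = (5 * len(ordered) + 99) // 100  # ceil(0.05 * n) in integer arithmetic
    return ordered[rank - 1]


def suggest_threshold(per_frame_ssim: list[float], margin: float = 0.02) -> float:
    """Place the pass threshold `margin` below the measured p05."""
    return round(p05(per_frame_ssim) - margin, 2)
```

Anchoring on p05 rather than the mean keeps a handful of weak frames from sitting right at the pass line, at the cost of a threshold that is a few hundredths looser than the mean alone would suggest.
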

Critical Remotion config: setVideoImageFormat("png") + setColorSpace("bt709").
The default JPEG output writes yuvj420p (full-range) which costs ~0.05 SSIM
vs HF's yuv420p (limited-range). Both fixtures' remotion.config.ts encode
this so render_diff.sh measures translation fidelity, not encoder differences.

Both fixtures lint clean (0 blockers via scripts/lint_source.py).
T2 staticFile() references correctly flagged as info-level findings.

The fixtures are not yet wired into CI — that comes with PR 7's orchestrator.
For now, render and eval are documented in each README and run by hand.

Adds the data-driven tier — a purpose-built fixture (option 2 from the
stack discussion, not a port of PR #214's examples/remotion-full/) that
exercises the realistic shape of a production Remotion composition
without using the runtime adapter.

Stargazed.tsx (10s @ 30fps, 1280x720):
  Sequence 0-3s    TitleScene   (title + subtitle)
  Sequence 3-7s    StatsScene   (3 reused StatCards staggered 12 frames apart)
  Sequence 7-10s   OutroScene   (UnderlinedText with scaleX-from-left underline)

Composition shape exercises:
  - <Composition schema={z.object({...})} defaultProps={...} />
  - nested array prop (stats[]) materialized as repeated HTML
  - custom React subcomponents (StatCard, AnimatedNumber, UnderlinedText)
    reused with different props
  - per-instance delay via prop (delayInFrames -> GSAP timeline offset)
  - frame-driven count-up (AnimatedNumber, manual cubic ease-out)
  - two different spring configs in the same composition
    (damping:12 -> back.out(1.4), damping:14 -> back.out(1.2))
  - useCurrentFrame, useVideoConfig

Translation choices documented in README.md and expected.json:
  - Zod props -> data-* on root #stage div
  - Custom subcomponents inline as repeated HTML using prop interface
    as the template
  - AnimatedNumber's frame-driven count-up -> GSAP onUpdate tween on a
    { v: 0 } counter object, ease power3.out
  - Two different spring configs -> two different back.out overshoots
    (1.4 vs 1.2 approximates the damping difference)
  - delayInFrames={i * 12} -> GSAP offset (i * 0.4)s
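
The frame-to-seconds conversion behind the last item is plain division by the frame rate. A small worked sketch (the function names are illustrative):

```python
def frames_to_seconds(frames: int, fps: int = 30) -> float:
    """Convert a Remotion frame count into seconds for a GSAP timeline position."""
    return frames / fps


def stagger_offsets(count: int, delay_in_frames: int = 12, fps: int = 30) -> list[float]:
    # Card i starts i * delay_in_frames frames into its scene.
    return [frames_to_seconds(i * delay_in_frames, fps) for i in range(count)]
```

For the three StatCards at 30 fps this gives offsets of 0 s, 0.4 s, and 0.8 s relative to the start of StatsScene.
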

Validated end-to-end: rendered Remotion baseline + HF translation, ran
scripts/render_diff.sh.
  measured mean SSIM 0.953
  measured min  SSIM 0.927
  measured p05  SSIM 0.938
  threshold 0.90 (~0.04 below p05)

The wider gap vs T1/T2 reflects T3's bigger approximation budget
(2 spring instances + count-up timing + font fallback on multiple text
sizes). Mean SSIM below 0.90 = structural mismatch (wrong durations,
wrong stagger, missing prop wiring), not approximation drift.

Same Remotion config as PR 3: setVideoImageFormat("png") +
setColorSpace("bt709") to match HF's yuv420p output.

Lint: 9 files scanned, 0 blockers / 0 warnings / 0 infos.
oxlint, oxfmt, typecheck all pass.

The fixture is not yet wired into CI; render + diff is documented in
README.md and runs by hand via the harness from PR 2. PR 7's orchestrator
will wire all four tiers into a CI eval run.

Adds the escape-hatch tier — lint-only fixtures that test the skill's
ability to refuse translation cleanly when it sees patterns that don't map
to HF's seek-driven model.

Cases (8 total):
  01-use-state.tsx          blocker: r2hf/use-state
  02-use-effect-deps.tsx    blocker: r2hf/use-effect-deps (multi-line body
                            with internal commas — regression target for
                            the regex bug fix in PR 2)
  03-async-metadata.tsx     blocker: r2hf/async-metadata
  04-third-party-react.tsx  blocker: r2hf/third-party-react-ui (@mui/material)
  05-lambda-config.tsx      blocker: r2hf/lambda-import
  06-warnings-only.tsx      warnings: delayRender / useCallback / useMemo
                            (no blockers — translates after dropping wrappers)
  07-custom-hook.tsx        warning: r2hf/custom-hook (pure useFadeIn)
  08-mixed.tsx              multiple blockers + warnings (aggregate test)

Each case documents:
  - The Remotion pattern it demonstrates
  - Why it's a blocker / warning / info
  - What the skill should do (refuse / drop-and-translate / translate-as-is)

Validation harness (validate.sh):
  Runs lint_source.py against each case, asserts:
    - Each expected blocker rule fires with severity="blocker"
    - Each expected warning rule fires with severity="warning"
    - lint_source.py exit code is 1 when blockers expected, 0 otherwise

T4 has no renders to diff. The skill is graded on lint correctness — that's
the gate that decides whether to translate or recommend the runtime interop
pattern from PR #214.

Result: 8/8 cases pass.

@jrusso1020 jrusso1020 force-pushed the skill/r2hf-references branch from 5ef3b40 to 7509916 on April 27, 2026 23:55

Adds 11 progressively-disclosed reference files that the skill loads on
demand during translation. Total ~1500 LOC, every file under 200 lines
(skill-creator's progressive-disclosure budget).

  api-map.md         the comprehensive Remotion -> HF translation table
                     (the index; loaded at start of translation)
  timing.md          interpolate, spring (validated configs), easing,
                     count-up, stagger
  sequencing.md      Sequence, Series, Loop, Freeze, AbsoluteFill,
                     Composition root
  media.md           Audio, Video, Img, IFrame, OffthreadVideo,
                     staticFile, asset paths
  transitions.md     @remotion/transitions presentations -> manual GSAP
                     crossfades or HF shader-transitions
  lottie.md          @remotion/lottie -> HF lottie adapter (incl. AE
                     feature limitations note)
  fonts.md           Google Fonts loading, local @font-face, system
                     fallback noise floor
  parameters.md      Zod schemas, defaultProps, sync vs async
                     calculateMetadata
  escape-hatch.md    when to bow out + the runtime interop pattern
                     from PR #214
  limitations.md     known caveat patterns (volume ramps, Loop with
                     state, custom presentations, code-split components)
  eval.md            how to run the validation harness, threshold rule
                     of thumb, what the noise floor looks like

The references are evidence-driven rather than speculative: every spring
config, easing curve, and SSIM threshold is documented from the
validated T1/T2/T3 calibration runs (mean 0.974 / 0.985 / 0.953). The
escape-hatch boundaries match the lint blockers in PR 2 and the T4
fixtures in PR 5.

Replaces the placeholder .gitkeep from PR 1.

The leaf PR. Replaces the placeholder SKILL.md from PR 1 with the real
5-step workflow that loads the per-topic references on demand
(skill-creator's progressive-disclosure pattern), and adds a top-level
orchestrator that runs every tier and reports a pass/fail summary.

SKILL.md changes:
  - Frontmatter unchanged from PR 1 (already covers the trigger phrases
    and out-of-scope cases)
  - Body rewritten as a 5-step workflow:
      1. Lint (load escape-hatch.md if blockers)
      2. Plan (load api-map.md, then per-topic references on demand)
      3. Generate (HF index.html with paused GSAP timeline)
      4. Validate (render_diff.sh against per-tier threshold)
      5. Document gaps (TRANSLATION_NOTES.md if needed)
  - Includes a "Source contains -> Load reference" table so the agent
    only loads the references the source actually needs
  - Documents the validated baseline numbers (T1 0.974, T2 0.985,
    T3 0.953, T4 8/8) so reviewers can reproduce
  - Calls out the critical Remotion encoder config (PNG + BT.709) that
    avoids the ~0.05 SSIM hit from yuvj420p vs yuv420p

Orchestrator (assets/test-corpus/run.sh):
  - Iterates tier-1-* through tier-4-* directories
  - T1-T3: setup -> lint -> npm install (lazy) -> render Remotion ->
           render HF -> SSIM diff at the fixture's expected threshold ->
           generate strip on failure
  - T4: validate.sh (lint-only)
  - Emits run-report.json with per-tier pass/fail and aggregate counts
  - Accepts a single-tier argument for fast iteration: ./run.sh tier-1-title-card

Validated end-to-end on a clean checkout:
    ▶ tier-1-title-card → mean SSIM 0.9739 (≥ 0.95) ✓
    ▶ tier-2-multi-scene → mean SSIM 0.985292 (≥ 0.95) ✓
    ▶ tier-3-data-driven → mean SSIM 0.952941 (≥ 0.9) ✓
    ▶ tier-4-escape-hatch → 8/8 cases ✓
    passed 4/4, failed 0, skipped 0

Closes the 7-PR stack: scaffold, eval harness, 4 tiers of corpus,
references, and now the SKILL.md body that ties everything together.

@jrusso1020 jrusso1020 force-pushed the skill/r2hf-skill-body branch from f29a062 to b7769b2 on April 28, 2026 00:00
@jrusso1020
Collaborator Author

Ran /simplify on the stack. Landed in #517's amend (b7769b23):

Four run.sh cleanups:

1. Per-fixture JSON tempfiles instead of bash string concat. The old aggregator built each result as RESULTS+=("{\"fixture\":\"$name\",…}") and reassembled the array as [$(IFS=,; echo "${RESULTS[*]}")], then re-parsed it as Python source via <<PY. A fixture name with " or \ would corrupt both the JSON and the Python literal. Now each fixture writes its own $RESULTS_DIR/<fixture>.json and the aggregator globs *.json. No two-stage interpolation, no injection shape, and the per-result structure is enforced by json.dump.

2. composition_id lives in expected.json instead of being sniffed from Root.tsx via regex. This drops a python3 -c "import json, re; src = open('$file').read(); m = re.search(...)" per fixture (and a fragile shell-source-into-Python interpolation). The corresponding composition_id field was added to T1 / T2 / T3 expected.json in #508/#509.

3. Unreachable branch removed. The "missing expected.json" path in run_render_tier only fires if a fixture is locally deleted — every checked-in fixture has the file. Drop the runtime check; let read_json_value fail loudly if the file's gone.

4. Narrating comments dropped (e.g., # THIS_DIR is …, # Run a single render-tier fixture (T1, T2, T3). — restating the line above or the function name). Kept only WHY commentary (the JSON-tempfile rationale, the r2hf/lambda-import warning treatment, the skipped-fixtures-count-as-failures reasoning).
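
The per-fixture tempfile idea from item 1 can be sketched as follows. The helper names and the `status` / `reason` fields are assumptions for illustration; only the "one JSON file per fixture, then glob" structure comes from the description above.

```python
import json
from pathlib import Path


def write_result(results_dir: Path, fixture: str, status: str, **extra) -> Path:
    """Write one fixture's result to its own file under results_dir."""
    payload = {"fixture": fixture, "status": status, **extra}
    path = results_dir / f"{fixture}.json"
    with open(path, "w", encoding="utf-8") as fh:
        # json.dump escapes quotes and backslashes in values, so no string
        # is ever spliced into JSON or Python source by hand.
        json.dump(payload, fh)
    return path


def collect_results(results_dir: Path) -> list[dict]:
    """Load every per-fixture result, in a stable (sorted) order."""
    return [
        json.loads(p.read_text(encoding="utf-8"))
        for p in sorted(results_dir.glob("*.json"))
    ]
```

Because each result is serialized and parsed as data, awkward characters in a value survive the round trip unchanged.
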

All three exit-code modes still verified:

  • ./run.sh → 4/4 pass, exit 0
  • ./run.sh tier-4-escape-hatch → 1/1 pass, exit 0
  • ./run.sh (CLI not built) → 1/4 pass + 3 skipped, exit 1

Final run-report.json is structurally identical to before.

@jrusso1020 jrusso1020 marked this pull request as ready for review April 28, 2026 00:29
Collaborator

@miguel-heygen miguel-heygen left a comment


Re-checked the latest head against my requested-change thread. The full corpus runner now exits non-zero when render tiers are skipped because the HF CLI is missing (passed 1/4, skipped 3, rc=1), while ./run.sh tier-4-escape-hatch still runs cleanly without the CLI (rc=0). The stale @hyperframes/core build instruction is also corrected to @hyperframes/cli. This resolves my blocker.

@jrusso1020 jrusso1020 changed the base branch from skill/r2hf-references to main April 28, 2026 05:14
@jrusso1020 jrusso1020 merged commit d01685d into main Apr 28, 2026
20 checks passed
@jrusso1020 jrusso1020 deleted the skill/r2hf-skill-body branch April 28, 2026 05:19