feat(skills): remotion-to-hyperframes SKILL.md + orchestrator (7/7) #517
jrusso1020 merged 6 commits into main
Conversation
miguel-heygen left a comment
The final orchestration layer has two blockers: the lambda-only policy contradicts the corpus/references, and the advertised corpus runner fails before it can run the lint-only tier in a clean checkout.
> - **Warnings** (translate after dropping): `delayRender`, `useCallback`, `useMemo`, custom hooks.
> - **Info** (translate with note): `staticFile`, `interpolateColors`.
>
> If any blocker fires, **stop**. Read [`references/escape-hatch.md`](references/escape-hatch.md) and surface the recommendation message. Do not produce HF output for a source that hasn't passed lint cleanly.
This makes every blocker a hard stop with no HF output, but the T4 corpus expects lambda-only sources to use `drop_lambda_code_translate_remainder_if_clean`, and `escape-hatch.md` also documents that edge case. Since `lint_source.py` currently emits `r2hf/lambda-import` as a blocker with exit code 1, the final skill will never take the documented lambda-only path. Please align the policy across `SKILL.md`, T4, the references, and the linter severity before this lands.
```bash
STRIP="$SKILL_DIR/scripts/frame_strip.sh"
HF_CLI="$REPO_ROOT/packages/cli/dist/cli.js"

if [[ ! -f "$HF_CLI" ]]; then
```
This preflight runs before tier selection, so `./assets/test-corpus/run.sh tier-4-escape-hatch` fails in a clean checkout even though T4 is lint-only and does not need the HF CLI. I reproduced the failure locally: `error: HF CLI not built. Run 'bun run --filter @hyperframes/core build' ...`. The suggested command is also wrong for this check: the script needs `packages/cli/dist/cli.js`, and building `@hyperframes/core` will not produce that file. Please require the CLI only for T1-T3, and point users at the CLI/root build command.
(force-pushed 6717d96 → 662c4db)
(force-pushed 01ed4b7 → 95b89ca)
@miguel-heygen: addressed in the amended commit. SKILL.md Step 1:

```diff
- **Blockers** (refuse + recommend interop): `useState`, `useReducer`, `useEffect` with non-empty deps, async `calculateMetadata`, `@remotion/lambda`, third-party React UI libraries...
+ **Blockers** (refuse + recommend interop): `useState`, `useReducer`, `useEffect`/`useLayoutEffect` with non-empty deps, async `calculateMetadata`, third-party React UI libraries...
- **Warnings** (translate after dropping): `delayRender`, `useCallback`, `useMemo`, custom hooks.
+ **Warnings** (translate after dropping the construct): `@remotion/lambda` config, `delayRender`, `useCallback`, `useMemo`, custom hooks.
```

The "What this skill explicitly does NOT do" list lost the lambda bullet and gained a parenthetical note pointing at […]. Verified.
miguel-heygen left a comment
The lambda policy is now aligned and T4 can run without the CLI, but the full orchestrator still reports success when the render tiers never ran.
```bash
fixture_name=$(basename "$fixture_dir")
local expected="$fixture_dir/expected.json"

if ! require_render_tier_tools; then
```
Returning success here converts missing render-tier tooling into a skipped fixture, and the aggregate exits 0 as long as there are no explicit failures. In a clean checkout without `packages/cli/dist/cli.js`, I ran `./assets/test-corpus/run.sh`; it skipped T1/T2/T3, ran only T4, printed `passed 1/4, failed 0, skipped 3`, and exited 0. That makes the advertised end-to-end corpus check pass without validating any render/diff tier. A missing CLI or ffmpeg for a requested render tier should be a failure, or the aggregate should exit non-zero when required tiers are skipped. Also, the header comment still points users at `@hyperframes/core build`, while the runtime error correctly says `@hyperframes/cli build`.
(force-pushed 662c4db → 5ef3b40)
(force-pushed 95b89ca → f29a062)
@miguel-heygen: addressed in […]. The aggregator now treats skipped fixtures as failures, so a clean checkout without the HF CLI doesn't accidentally report green:

`sys.exit(0 if failed == 0 and skipped == 0 else 1)`

The summary block also calls out skipped fixtures explicitly with their reason and notes that they count as failures, so the user sees what didn't run rather than just a small […]. Verified all three modes; the single-tier mode […].
Adds the deterministic eval primitives the skill calls into:

- scripts/render_diff.sh: SSIM diff between two MP4s, JSON summary, configurable threshold
- scripts/frame_strip.sh: side-by-side comparison strip for visual debugging
- scripts/lint_source.py: pre-translation lint over Remotion source (blockers/warnings/infos)

The harness is decoupled from the render pipeline: it accepts paths to already-rendered MP4s. The skill orchestrator (PR 7) drives both renders and feeds the outputs in. This keeps the harness usable in CI, in sandboxes, and on any machine that has ffmpeg, without needing the full Remotion + HyperFrames toolchain.

Lint catches the patterns from the skill's out-of-scope list:

- useState / useReducer (state-machine-driven animation)
- useEffect with deps (side effects)
- async calculateMetadata (Promise-returning composition metadata)
- @remotion/lambda imports
- third-party React UI libraries (MUI, Chakra, Mantine, antd, shadcn, Radix, NextUI)
- delayRender / useCallback / useMemo (warnings)
- staticFile / interpolateColors (info: translatable but flagged)

Smoke test (scripts/tests/smoke.sh) exercises all three scripts against synthetic inputs: identical ffmpeg testsrc videos pass at threshold 0.99, different ffmpeg testsrc videos fail at 0.99, frame_strip produces a strip.png, and lint reports 0 blockers on a clean fixture and >=3 blockers on a fixture that uses useState + useEffect + MUI + async metadata.

Validated locally: smoke.sh exits 0.
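For orientation, here is the shape a lint finding plausibly takes, inferred from the rule ids and severities above (the field names are my assumptions, not lint_source.py's actual contract):

```ts
// Assumed finding shape. Only the rule ids and the three severity levels
// are confirmed by this PR; the field names are guesses.
type LintFinding = {
  rule: string;                             // e.g. "r2hf/use-state"
  severity: "blocker" | "warning" | "info";
  file: string;
  line: number;
  message: string;
};
```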
Adds the first two test fixtures the skill is graded against. Each fixture
ships:
- remotion-src/ full Remotion project (package.json, src/, remotion.config.ts, tsconfig.json)
- hf-src/ hand-translated HyperFrames composition (index.html)
- expected.json tier metadata + SSIM threshold + translation notes + measured validation
- README.md human walk-through of the translation choices
- setup.sh (T2 only) generates binary assets (PNG, WAV) via ffmpeg
T1 — title-card-fade
- 3 s @ 30 fps, 1280x720
- Single AbsoluteFill, single useCurrentFrame interpolate
with multi-segment input [0,15,75,90] -> [0,1,1,0]
- Validated mean SSIM 0.974, threshold 0.95
(~0.025 gap from font-fallback divergence between Remotion's bundled
Chromium and HF's chrome-headless-shell)
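For concreteness, the T1 opacity ramp reconstructed from the segment list above, using Remotion's real interpolate/useCurrentFrame API (component name and styling are illustrative, not copied from the fixture):

```tsx
import React from "react";
import { AbsoluteFill, interpolate, useCurrentFrame } from "remotion";

export const TitleCardFade: React.FC = () => {
  const frame = useCurrentFrame();
  // Fade in over frames 0–15, hold until 75, fade out by frame 90.
  const opacity = interpolate(frame, [0, 15, 75, 90], [0, 1, 1, 0]);
  return (
    <AbsoluteFill style={{ justifyContent: "center", alignItems: "center" }}>
      <h1 style={{ opacity }}>Title</h1>
    </AbsoluteFill>
  );
};
```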
T2 — title-image-outro
- 6 s @ 30 fps, 1280x720, three Sequences (TitleScene, ImageScene, OutroScene)
- Exercises spring, interpolate, Audio, Img, staticFile
- Spring -> GSAP back.out(1.4) translation
- Validated mean SSIM 0.985, threshold 0.95
(translation came out cleaner than predicted; spring->back.out drift was
smaller than the ~0.05 budget I'd expected)
- setup.sh generates a 200x200 blue PNG and a 6 s silent WAV via ffmpeg
so binaries stay out of the repo
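A minimal sketch of the HF side of that spring → back.out(1.4) translation; the selector, duration, and damping value shown are illustrative (the 1.4 overshoot for damping:12 comes from the T3 calibration below):

```ts
import { gsap } from "gsap";

// HF drives the timeline by seeking, so it is created paused.
const tl = gsap.timeline({ paused: true });

// Remotion side: const scale = spring({ frame, fps, config: { damping: 12 } });
// GSAP side: a back.out overshoot approximates the spring's settle.
tl.fromTo("#title", { scale: 0 }, { scale: 1, duration: 0.8, ease: "back.out(1.4)" });
```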
Calibration done end-to-end: rendered Remotion baseline + HF translation,
ran scripts/render_diff.sh, set thresholds ~0.02 below measured p05.
Critical Remotion config: setVideoImageFormat("png") + setColorSpace("bt709").
The default JPEG output writes yuvj420p (full-range) which costs ~0.05 SSIM
vs HF's yuv420p (limited-range). Both fixtures' remotion.config.ts encode
this so render_diff.sh measures translation fidelity, not encoder differences.
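Concretely, the remotion.config.ts lines in question look like this (Remotion v4 Config API; the import path assumes v4):

```ts
import { Config } from "@remotion/cli/config";

Config.setVideoImageFormat("png"); // default JPEG path writes full-range yuvj420p
Config.setColorSpace("bt709");     // match HF's limited-range yuv420p output
```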
Both fixtures lint clean (0 blockers via scripts/lint_source.py).
T2 staticFile() references correctly flagged as info-level findings.
The fixtures are not yet wired into CI — that comes with PR 7's orchestrator.
For now, render and eval are documented in each README and run by hand.
Adds the data-driven tier: a purpose-built fixture (option 2 from the stack discussion, not a port of PR #214's examples/remotion-full/) that exercises the realistic shape of a production Remotion composition without using the runtime adapter.

Stargazed.tsx (10 s @ 30 fps, 1280x720):
  Sequence 0-3s   TitleScene (title + subtitle)
  Sequence 3-7s   StatsScene (3 reused StatCards staggered 12 frames apart)
  Sequence 7-10s  OutroScene (UnderlinedText with scaleX-from-left underline)

Composition shape exercises:
- <Composition schema={z.object({...})} defaultProps={...} />
- nested array prop (stats[]) materialized as repeated HTML
- custom React subcomponents (StatCard, AnimatedNumber, UnderlinedText) reused with different props
- per-instance delay via prop (delayInFrames -> GSAP timeline offset)
- frame-driven count-up (AnimatedNumber, manual cubic ease-out)
- two different spring configs in the same composition (damping:12 -> back.out(1.4), damping:14 -> back.out(1.2))
- useCurrentFrame, useVideoConfig

Translation choices documented in README.md and expected.json:
- Zod props -> data-* on the root #stage div
- custom subcomponents inlined as repeated HTML, using the prop interface as the template
- AnimatedNumber's frame-driven count-up -> GSAP onUpdate tween on a { v: 0 } counter object, ease power3.out
- two different spring configs -> two different back.out overshoots (1.4 vs 1.2 approximates the damping difference)
- delayInFrames={i * 12} -> GSAP offset (i * 0.4)s

Validated end-to-end: rendered Remotion baseline + HF translation, ran scripts/render_diff.sh.
  measured mean SSIM  0.953
  measured min SSIM   0.927
  measured p05 SSIM   0.938
  threshold           0.90 (~0.04 below p05)

The wider gap vs T1/T2 reflects T3's bigger approximation budget (2 spring instances + count-up timing + font fallback on multiple text sizes). Mean SSIM below 0.90 = structural mismatch (wrong durations, wrong stagger, missing prop wiring), not approximation drift.

Same Remotion config as PR 3: setVideoImageFormat("png") + setColorSpace("bt709") to match HF's yuv420p output.

Lint: 9 files scanned, 0 blockers / 0 warnings / 0 infos. oxlint, oxfmt, typecheck all pass.

The fixture is not yet wired into CI; render + diff is documented in README.md and runs by hand via the harness from PR 2. PR 7's orchestrator will wire all four tiers into a CI eval run.
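A sketch of the count-up translation described above, combining the { v: 0 } counter, power3.out ease, and the i * 0.4 s stagger (targets and selectors are placeholders, not the fixture's real data):

```ts
import { gsap } from "gsap";

const tl = gsap.timeline({ paused: true }); // HF seeks this timeline per frame

// One count-up per StatCard, staggered 12 frames apart (0.4 s at 30 fps).
[42, 137, 9001].forEach((target, i) => {
  const counter = { v: 0 };
  const el = document.querySelector(`#stat-${i} .value`)!;
  tl.to(
    counter,
    {
      v: target,
      duration: 2,
      ease: "power3.out", // stands in for AnimatedNumber's manual cubic ease-out
      onUpdate: () => {
        el.textContent = String(Math.round(counter.v));
      },
    },
    3 + i * 0.4, // StatsScene starts at 3 s; delayInFrames={i * 12} → i * 0.4 s
  );
});
```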
Adds the escape-hatch tier — lint-only fixtures that test the skill's
ability to refuse translation cleanly when it sees patterns that don't map
to HF's seek-driven model.
Cases (8 total):
01-use-state.tsx blocker: r2hf/use-state
02-use-effect-deps.tsx blocker: r2hf/use-effect-deps (multi-line body
with internal commas — regression target for
the regex bug fix in PR 2)
03-async-metadata.tsx blocker: r2hf/async-metadata
04-third-party-react.tsx blocker: r2hf/third-party-react-ui (@mui/material)
05-lambda-config.tsx blocker: r2hf/lambda-import
06-warnings-only.tsx warnings: delayRender / useCallback / useMemo
(no blockers — translates after dropping wrappers)
07-custom-hook.tsx warning: r2hf/custom-hook (pure useFadeIn)
08-mixed.tsx multiple blockers + warnings (aggregate test)
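To make the first case's shape concrete, a component that trips r2hf/use-state (the fixture's actual source isn't shown in this PR, so this is an assumed minimal example):

```tsx
import React, { useState } from "react";
import { AbsoluteFill } from "remotion";

// State-machine-driven animation: visibility flips via setState rather than
// via the current frame, so it cannot be replayed by seeking an HF timeline.
export const StatefulTitle: React.FC = () => {
  const [visible, setVisible] = useState(false); // ← r2hf/use-state blocker
  return (
    <AbsoluteFill onClick={() => setVisible(true)}>
      {visible && <h1>Hello</h1>}
    </AbsoluteFill>
  );
};
```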
Each case documents:
- The Remotion pattern it demonstrates
- Why it's a blocker / warning / info
- What the skill should do (refuse / drop-and-translate / translate-as-is)
Validation harness (validate.sh):
Runs lint_source.py against each case, asserts:
- Each expected blocker rule fires with severity="blocker"
- Each expected warning rule fires with severity="warning"
- lint_source.py exit code is 1 when blockers expected, 0 otherwise
T4 has no renders to diff. The skill is graded on lint correctness — that's
the gate that decides whether to translate or recommend the runtime interop
pattern from PR #214.
Result: 8/8 cases pass.
(force-pushed 5ef3b40 → 7509916)
Adds 11 progressively-disclosed reference files that the skill loads on
demand during translation. Total ~1500 LOC, every file under 200 lines
(skill-creator's progressive-disclosure budget).
api-map.md the comprehensive Remotion -> HF translation table
(the index; loaded at start of translation)
timing.md interpolate, spring (validated configs), easing,
count-up, stagger
sequencing.md Sequence, Series, Loop, Freeze, AbsoluteFill,
Composition root
media.md Audio, Video, Img, IFrame, OffthreadVideo,
staticFile, asset paths
transitions.md @remotion/transitions presentations -> manual GSAP
crossfades or HF shader-transitions
lottie.md @remotion/lottie -> HF lottie adapter (incl. AE
feature limitations note)
fonts.md Google Fonts loading, local @font-face, system
fallback noise floor
parameters.md Zod schemas, defaultProps, sync vs async
calculateMetadata
escape-hatch.md when to bow out + the runtime interop pattern
from PR #214
limitations.md known caveat patterns (volume ramps, Loop with
state, custom presentations, code-split components)
eval.md how to run the validation harness, threshold rule
of thumb, what the noise floor looks like
The references are evidence-driven rather than speculative: every spring
config, easing curve, and SSIM threshold is documented from the
validated T1/T2/T3 calibration runs (mean 0.974 / 0.985 / 0.953). The
escape-hatch boundaries match the lint blockers in PR 2 and the T4
fixtures in PR 5.
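As a sketch, the two validated rows of that spring → back.out mapping (only these two damping values were calibrated in T2/T3; other configs are not claimed here):

```ts
// damping → GSAP back.out overshoot, from the T2/T3 calibration runs.
const springToBackOut: Record<number, string> = {
  12: "back.out(1.4)",
  14: "back.out(1.2)",
};
```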
Replaces the placeholder .gitkeep from PR 1.
The leaf PR. Replaces the placeholder SKILL.md from PR 1 with the real
5-step workflow that loads the per-topic references on demand
(skill-creator's progressive-disclosure pattern), and adds a top-level
orchestrator that runs every tier and reports a pass/fail summary.
SKILL.md changes:
- Frontmatter unchanged from PR 1 (already covers the trigger phrases
and out-of-scope cases)
- Body rewritten as a 5-step workflow:
1. Lint (load escape-hatch.md if blockers)
2. Plan (load api-map.md, then per-topic references on demand)
3. Generate (HF index.html with paused GSAP timeline)
4. Validate (render_diff.sh against per-tier threshold)
5. Document gaps (TRANSLATION_NOTES.md if needed)
- Includes a "Source contains -> Load reference" table so the agent
only loads the references the source actually needs
- Documents the validated baseline numbers (T1 0.974, T2 0.985,
T3 0.953, T4 8/8) so reviewers can reproduce
- Calls out the critical Remotion encoder config (PNG + BT.709) that
avoids the ~0.05 SSIM hit from yuvj420p vs yuv420p
Orchestrator (assets/test-corpus/run.sh):
- Iterates tier-1-* through tier-4-* directories
- T1-T3: setup -> lint -> npm install (lazy) -> render Remotion ->
render HF -> SSIM diff at the fixture's expected threshold ->
generate strip on failure
- T4: validate.sh (lint-only)
- Emits run-report.json with per-tier pass/fail and aggregate counts (assumed shape sketched below)
- Accepts a single-tier argument for fast iteration: ./run.sh tier-1-title-card
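A sketch of the shape run-report.json plausibly takes; the field names here are assumptions, not the script's actual contract:

```ts
type RunReport = {
  fixtures: Array<{
    name: string;                     // e.g. "tier-1-title-card"
    status: "pass" | "fail" | "skip";
    meanSsim?: number;                // render tiers (T1-T3) only
    threshold?: number;               // from the fixture's expected.json
    reason?: string;                  // why a fixture was skipped
  }>;
  passed: number;
  failed: number;
  skipped: number;
};
```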
Validated end-to-end on a clean checkout:
▶ tier-1-title-card → mean SSIM 0.9739 (≥ 0.95) ✓
▶ tier-2-multi-scene → mean SSIM 0.985292 (≥ 0.95) ✓
▶ tier-3-data-driven → mean SSIM 0.952941 (≥ 0.9) ✓
▶ tier-4-escape-hatch → 8/8 cases ✓
passed 4/4, failed 0, skipped 0
Closes the 7-PR stack: scaffold, eval harness, 4 tiers of corpus,
references, and now the SKILL.md body that ties everything together.
(force-pushed f29a062 → b7769b2)
Ran /simplify on the stack. Landed in #517's amend […]. Three […]:

1. Per-fixture JSON tempfiles instead of bash string concat. The old aggregator built each result as […]
2. […]
3. Unreachable branch removed. The "missing expected.json" path in […]
4. Narrating comments dropped (e.g., […]).

All three exit-code modes still verified: […]

Final report.json is structurally identical to before.
miguel-heygen left a comment
Re-checked the latest head against my requested-change thread. The full corpus runner now exits non-zero when render tiers are skipped because the HF CLI is missing (`passed 1/4, skipped 3, rc=1`), while `./run.sh tier-4-escape-hatch` still runs cleanly without the CLI (rc=0). The stale `@hyperframes/core build` instruction is also corrected to `@hyperframes/cli build`. This resolves my blocker.
What
The leaf PR of the 7-PR stack. Two pieces:
- SKILL.md body: replaces the placeholder from PR 1 with a 5-step workflow that loads per-topic references on demand (skill-creator's progressive-disclosure pattern).
- assets/test-corpus/run.sh: a top-level orchestrator that runs all four tiers and emits a pass/fail report.

End-to-end validation

Run on a clean checkout: […]
SKILL.md design
Per skill-creator's progressive-disclosure pattern:
The body has a "Source contains → Load reference" table that tells the agent which references to pull based on the actual Remotion APIs the source uses. The agent doesn't load all 11 references — it loads 1–4 based on what the source needs.
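A few illustrative rows, with pairings inferred from the PR 6 reference descriptions (the actual table in SKILL.md is authoritative):

| Source contains | Load reference |
| --- | --- |
| `spring(`, `interpolate(`, `Easing.` | `references/timing.md` |
| `<Sequence>`, `<Series>`, `<Loop>` | `references/sequencing.md` |
| `<Audio>`, `<OffthreadVideo>`, `staticFile(` | `references/media.md` |
| `@remotion/transitions` | `references/transitions.md` |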
Orchestrator design

assets/test-corpus/run.sh:

- Iterates tier-1-* through tier-4-* directories.
- T1-T3: npm install → render Remotion baseline → render HF translation → SSIM diff at the fixture's per-tier threshold → generate frame strip on failure.
- T4: validate.sh (lint-only).
- Emits run-report.json with per-tier pass/fail and aggregate counts.
- Accepts a single-tier argument for fast iteration: ./run.sh tier-1-title-card.

The orchestrator is what makes this skill regression-testable: any future PR that touches the references can re-run ./run.sh to verify nothing broke.

Stack — complete
Test plan
- ./run.sh runs all four tiers end-to-end on a clean checkout, all pass
- ./run.sh tier-1-title-card runs a single tier (selective execution works)
- package_skill.py validates the final skill
- run-report.json excluded from git via .gitignore (it's regenerated per run)

Reviewer notes
This is a substantive skill (~3000 LOC across the stack — scripts, fixtures, references, body) so it warrants more than a glance. Highest-leverage things to look at:
- SKILL.md body: the workflow has to be right because everything else hangs off of it. Specifically the "Source contains → Load reference" table: if any common Remotion idiom is missing a reference link, the skill will load the wrong context.
- references/timing.md spring → back.out table: every spring config in the wild needs a corresponding row. PR 6 has the validated 1.4 / 1.2 ratios; if you've seen springs the table doesn't cover, add them.