feat(skills): remotion-to-hyperframes corpus T3 (4/7) by jrusso1020 · Pull Request #509 · heygen-com/hyperframes

jrusso1020 · 2026-04-27T05:36:12Z

What

Tier 3 of the test corpus — a purpose-built data-driven fixture (option 2 from the stack discussion, not a port of PR #214's examples/remotion-full/) that exercises the realistic shape of a production Remotion composition.

Stargazed.tsx — 10 s @ 30 fps, 1280×720:

Sequence 0–3 s    TitleScene    (title + subtitle, spring + linear fade)
Sequence 3–7 s    StatsScene    (3 reused StatCards staggered 12 frames apart)
Sequence 7–10 s   OutroScene    (UnderlinedText with scaleX-from-left underline)
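Remotion schedules everything in frames, while a GSAP timeline positions tweens in seconds, so each boundary above is divided by the 30 fps frame rate. Below is a minimal sketch with hypothetical helper names (`framesToSeconds`, `statCardStartTimes`). The scene frame ranges are derived from the 0–3 s / 3–7 s / 7–10 s spans above; they are not copied from the fixture source.

```typescript
// Hypothetical helpers: convert Remotion frame counts to GSAP timeline seconds.
const FPS = 30; // Stargazed.tsx renders at 30 fps

function framesToSeconds(frames: number, fps: number = FPS): number {
  return frames / fps;
}

// Absolute start time (s) of each StatCard: scene start + i * staggerFrames.
function statCardStartTimes(
  sceneStartSec: number,
  count: number,
  staggerFrames: number,
  fps: number = FPS,
): number[] {
  const starts: number[] = [];
  for (let i = 0; i < count; i++) {
    starts.push(sceneStartSec + framesToSeconds(i * staggerFrames, fps));
  }
  return starts;
}

// Scene spans expressed as Sequence-style frame ranges (300 frames = 10 s).
const scenes = [
  { name: "TitleScene", from: 0, durationInFrames: 90 },
  { name: "StatsScene", from: 90, durationInFrames: 120 },
  { name: "OutroScene", from: 210, durationInFrames: 90 },
];

for (const s of scenes) {
  const start = framesToSeconds(s.from);
  const end = framesToSeconds(s.from + s.durationInFrames);
  console.log(`${s.name}: ${start}s to ${end}s`);
}

// StatsScene starts at 3 s; the three cards are staggered 12 frames apart.
console.log(statCardStartTimes(3, 3, 12));
```

The 12-frame stagger becomes a 0.4 s offset per card at 30 fps. That matches the `(i * 0.4)` s timeline offset used in the translation.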

Custom React subcomponents:

StatCard(label, value, color, delayInFrames) — used 3× with different props
AnimatedNumber(from, to, durationInFrames) — frame-driven count-up
UnderlinedText(text, color) — text with scaleX underline reveal
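To translate these subcomponents, the output stamps out one block of markup per instance and carries each prop as a `data-*` attribute. Below is a sketch of that materialization for the `stats[]` array. It assumes a `Stat` shape of `{ label, value, color }`, taken from StatCard's props. The class names, attribute names, and sample values are hypothetical and are not the fixture's actual markup.

```typescript
// Hypothetical shape of one entry in the Zod-typed `stats` array prop.
interface Stat {
  label: string;
  value: number;
  color: string;
}

// Minimal escaping for text content and double-quoted attribute values.
function escapeHtml(s: string): string {
  return s
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// One StatCard instance, using the component's props as the template.
// The number starts at 0; a count-up tween would rewrite its textContent.
function statCardHtml(stat: Stat, index: number, staggerFrames = 12): string {
  return [
    `<div class="stat-card" data-label="${escapeHtml(stat.label)}"`,
    ` data-value="${stat.value}" data-color="${escapeHtml(stat.color)}"`,
    ` data-delay-in-frames="${index * staggerFrames}">`,
    `<span class="stat-number" style="color:${escapeHtml(stat.color)}">0</span>`,
    `<span class="stat-label">${escapeHtml(stat.label)}</span>`,
    `</div>`,
  ].join("");
}

// The nested array prop becomes repeated markup, one block per entry.
function statsSceneHtml(stats: Stat[]): string {
  return stats.map((s, i) => statCardHtml(s, i)).join("\n");
}

// Illustrative sample values, not the fixture's defaultProps.
const html = statsSceneHtml([
  { label: "Stars", value: 1200, color: "#facc15" },
  { label: "Forks", value: 340, color: "#38bdf8" },
  { label: "Contributors", value: 57, color: "#a3e635" },
]);
console.log(html);
```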

End-to-end validation

Validated mean SSIM 0.953 (min 0.927, p05 0.938) against a 0.90 threshold, which sits 0.038 below p05.

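Frame-strip inspection: the count-up diverges mid-animation (Remotion shows 913, HF 1032 around 3.5 s) but converges to identical final values. The two easing formulas are not identical. Remotion's manual ease is cubic, 1 - (1-t)^3, while GSAP's power3.out is quartic, 1 - (1-t)^4; GSAP's cubic ease-out is power2.out. The quartic curve runs ahead of the cubic one mid-animation, which is consistent with HF showing the larger number. Sub-frame seek timing (when GSAP's onUpdate callback fires relative to Remotion's per-frame React render) may add a smaller offset. Both curves reach 1 at t = 1, so the final values match. No SSIM impact above the noise floor was observed.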
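For reference, GSAP defines its power eases as powerN.out(p) = 1 - (1 - p)^(N + 1). That makes power2.out the cubic ease-out and power3.out the quartic one. The sketch below evaluates Remotion's manual cubic against both at a few frames of a 45-frame count-up. `countUpAt`, the integer rounding, and the target of 1000 are illustrative assumptions; they are not the fixture's code or values.

```typescript
// Remotion side: manual cubic ease-out on 0..1 progress, as described above.
function remotionEaseOutCubic(t: number): number {
  return 1 - Math.pow(1 - t, 3);
}

// GSAP-style powerN.out: 1 - (1 - p)^(N + 1).
// n = 2 gives the cubic curve, n = 3 the quartic one.
function gsapPowerOut(n: number, p: number): number {
  return 1 - Math.pow(1 - p, n + 1);
}

// Hypothetical count-up: displayed integer at a local frame of the animation.
function countUpAt(
  frame: number,
  from: number,
  to: number,
  durationInFrames: number,
  ease: (t: number) => number,
): number {
  const t = Math.min(Math.max(frame / durationInFrames, 0), 1); // clamp progress
  return Math.round(from + (to - from) * ease(t));
}

const target = 1000; // illustrative target, not the fixture's value
for (const frame of [15, 30, 45]) {
  const cubic = countUpAt(frame, 0, target, 45, remotionEaseOutCubic);
  const power2 = countUpAt(frame, 0, target, 45, (t) => gsapPowerOut(2, t));
  const power3 = countUpAt(frame, 0, target, 45, (t) => gsapPowerOut(3, t));
  console.log(`frame ${frame}: cubic ${cubic}, power2.out ${power2}, power3.out ${power3}`);
}
```

At frame 15 of 45, the cubic curve (and power2.out) displays 704 and power3.out displays 802. All three reach 1000 at frame 45, which matches the converge-at-the-end behavior seen in the frame strip.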

T3's threshold sits further below p05 than T1/T2's (about 0.04 vs 0.02). This reflects T3's bigger approximation budget: 2 spring instances, count-up timing, and font fallback at multiple text sizes (160 px title, 72 px stat number, 80 px outro). A mean SSIM below 0.90 signals a structural mismatch (wrong durations, wrong stagger, missing prop wiring), not approximation drift.
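The gate described above compares the mean of per-frame SSIM scores against the threshold, while min and p05 describe the tail. Below is a minimal sketch of that summary step. It assumes per-frame scores are already available (for example, parsed from ffmpeg's ssim filter output) and uses the nearest-rank definition of p05. This PR does not show render_diff.sh's actual percentile method or JSON fields, so `summarize` and its field names are hypothetical.

```typescript
// Hypothetical summary of per-frame SSIM scores.
interface SsimSummary {
  mean: number;
  min: number;
  p05: number;
  pass: boolean; // gate: mean >= threshold
}

// Nearest-rank percentile over an ascending-sorted array (q in 0..1).
function percentile(sorted: number[], q: number): number {
  const rank = Math.max(1, Math.ceil(q * sorted.length));
  return sorted[rank - 1];
}

function summarize(scores: number[], threshold: number): SsimSummary {
  if (scores.length === 0) throw new Error("no SSIM scores");
  const sorted = [...scores].sort((a, b) => a - b);
  const mean = scores.reduce((acc, s) => acc + s, 0) / scores.length;
  return {
    mean,
    min: sorted[0],
    p05: percentile(sorted, 0.05),
    pass: mean >= threshold,
  };
}

// Synthetic per-frame scores, for illustration only.
console.log(summarize([0.96, 0.94, 0.93, 0.97, 0.95], 0.9));
```

With T3's reported numbers, the margin between p05 and the gate is 0.938 - 0.90 = 0.038.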

Why

This is #4 in the 7-PR stack and the largest tier in the corpus. T1 + T2 covered the basic API surface (Sequence, AbsoluteFill, interpolate, spring, Audio, Img, staticFile). T3 adds the shape of real production compositions:

<Composition schema={z.object({...})} defaultProps={...} /> — Zod-typed props
nested array prop (stats[]) materialized as repeated markup
custom React subcomponents reused with different props
frame-driven count-up animation (AnimatedNumber)
two different spring configs in the same composition
useVideoConfig() for fps

If a translation passes T3, the skill handles the patterns that, by rough estimate, 80% of real-world Remotion code uses.

The choice not to port examples/remotion-full/ from PR #214: that fixture's HF half uses the runtime adapter pattern (Remotion's render pipeline running inside the HF page). Including it in the corpus would mix runtime-adapter idioms into the translation evaluation, which is the wrong target — the skill produces pure HTML+GSAP. A purpose-built fixture exercising the same APIs is more honest.

How

Translation choices documented in README.md and expected.json:

Remotion	HyperFrames
`<Composition schema={z.object({...})} defaultProps={...} />`	data-* attrs on root `#stage` div
nested array prop (`stats[]`)	repeated HTML markup with per-instance `data-*` attrs
custom React subcomponent	inline repeated HTML using the component's prop interface as the template
`<AnimatedNumber from={0} to={value} durationInFrames={45} />` (cubic ease-out count-up)	tween on `{ v: 0 }` object with `onUpdate` rewriting `textContent`, ease `power3.out`
`spring({damping:12, stiffness:100})`	`back.out(1.4)` over ~0.7 s
`spring({damping:14, stiffness:90})`	`back.out(1.2)` over ~0.7 s
`delayInFrames={i * 12}` (per-instance)	GSAP timeline offset `(i * 0.4)` s
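The spring rows above map a more damped spring to a smaller `back.out` overshoot. To see why the direction is right, compare the peak overshoots of the two curves. This sketch rests on two assumptions. First, Remotion's `spring()` behaves like a unit-mass damped harmonic oscillator released from rest (mass 1 is Remotion's default). Second, GSAP's back.out(s) is 1 - easeIn(1 - p), with easeIn(q) = q^2((s + 1)q - s). The helper names are hypothetical. The sketch does not reproduce how 1.4 and 1.2 were chosen; it only checks that both mappings order the two configs the same way.

```typescript
// Peak overshoot of an underdamped spring going 0 -> 1 from rest:
// exp(-zeta * pi / sqrt(1 - zeta^2)), with damping ratio zeta = c / (2 sqrt(k m)).
function springOvershoot(damping: number, stiffness: number, mass = 1): number {
  const zeta = damping / (2 * Math.sqrt(stiffness * mass));
  if (zeta >= 1) return 0; // critically or over-damped: no overshoot
  return Math.exp((-zeta * Math.PI) / Math.sqrt(1 - zeta * zeta));
}

// GSAP-style back.out(s): 1 - easeIn(1 - p), easeIn(q) = q^2 * ((s + 1) q - s).
function backOut(s: number, p: number): number {
  const q = 1 - p;
  return 1 - q * q * ((s + 1) * q - s);
}

// easeIn reaches its minimum at q = 2s / (3(s + 1)), so back.out peaks there.
function backOutOvershoot(s: number): number {
  const q = (2 * s) / (3 * (s + 1));
  return backOut(s, 1 - q) - 1;
}

console.log("spring damping 12, stiffness 100:", springOvershoot(12, 100));
console.log("spring damping 14, stiffness 90:", springOvershoot(14, 90));
console.log("back.out(1.4):", backOutOvershoot(1.4));
console.log("back.out(1.2):", backOutOvershoot(1.2));
```

Under these assumptions, damping:12 overshoots by roughly 9.5% and damping:14 by roughly 3.2%. back.out(1.4) and back.out(1.2) peak at roughly 7.1% and 5.3%. The ordering matches; the magnitudes agree only approximately.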

Same Remotion config as PR #508: setVideoImageFormat("png") + setColorSpace("bt709") to match HF's yuv420p output.
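Both settings live in the fixture's remotion.config.ts. Below is a sketch of what that file plausibly contains, assuming Remotion v4's `Config` import path (`@remotion/cli/config`). The fixture's actual file may set other options as well.

```typescript
// remotion.config.ts (sketch)
import { Config } from "@remotion/cli/config";

// PNG frames avoid the default JPEG path, which writes full-range yuvj420p.
Config.setVideoImageFormat("png");
// BT.709 color space, matching HF's yuv420p output.
Config.setColorSpace("bt709");
```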

Stack

#506 (1/7) — scaffold
#507 (2/7) — eval harness
#508 (3/7) — T1 + T2 fixtures
this PR (4/7) — T3 data-driven fixture
5/7 — T4 escape-hatch fixtures
6/7 — references/*.md (translation map)
7/7 — SKILL.md body + corpus orchestrator

Test plan

lint_source.py over T3 Remotion source: 9 files scanned, 0 blockers / 0 warnings / 0 infos
All fixture .tsx, .ts, .json, .md files pass oxfmt --check and oxlint
Lefthook pre-commit (lint + format + typecheck) passes
End-to-end render + SSIM diff (mean 0.953, ≥ 0.90 threshold)

miguel-heygen

T3 has the same threshold-contract drift as T1/T2. The fixture itself is useful, but the README and executable metadata need to agree.

miguel-heygen · 2026-04-27T20:06:05Z

+to count from 0 to the target. `AnimatedNumber` itself derives the displayed
+value from `useCurrentFrame()` + a manual `1 - (1 - t)^3` ease.
+
+## The lossy parts (and why threshold = 0.85)


This section says the threshold is 0.85 and line 56 says below 0.85 is the structural-mismatch signal, but expected.json sets ssim_threshold to 0.90 and the PR body/final skill also cite 0.90. Please align this README with the threshold the orchestrator actually enforces.

jrusso1020 · 2026-04-27T23:10:16Z

@miguel-heygen — addressed in the amended commit 7e93bf8f:

T3 README: The "## The lossy parts (and why threshold = 0.85)" header is now "= 0.90" (matching expected.json). The closing paragraph "A mean SSIM below 0.85 in T3 indicates a structural mismatch" → "below 0.90". Validated mean (0.953) appended so the reader sees the calibrated number alongside the gate.

grep -n "0\.85" over T3 returns no matches now.

miguel-heygen

Re-reviewed the latest head. The T3 README now uses the same 0.90 threshold as expected.json and the final skill, so my prior blocker is resolved.

Adds the deterministic eval primitives the skill calls into:
- scripts/render_diff.sh: SSIM diff between two MP4s, JSON summary, configurable threshold
- scripts/frame_strip.sh: side-by-side comparison strip for visual debugging
- scripts/lint_source.py: pre-translation lint over Remotion source (blockers/warnings/infos)

The harness is decoupled from the render pipeline: it accepts paths to already-rendered MP4s. The skill orchestrator (PR 7) drives both renders and feeds the outputs in. This keeps the harness usable in CI, in sandboxes, and on any machine that has ffmpeg, without needing the full Remotion + HyperFrames toolchain.

Lint catches the patterns from the skill's out-of-scope list:
- useState / useReducer (state-machine driven animation)
- useEffect with deps (side effects)
- async calculateMetadata (Promise-returning composition metadata)
- @remotion/lambda imports
- third-party React UI libraries (MUI, Chakra, Mantine, antd, shadcn, Radix, NextUI)
- delayRender / useCallback / useMemo (warnings)
- staticFile / interpolateColors (info: translatable but flagged)

Smoke test (scripts/tests/smoke.sh) exercises all three scripts against synthetic inputs:
- identical ffmpeg testsrc videos pass at threshold 0.99
- different ffmpeg testsrc videos fail at 0.99
- frame_strip produces a strip.png
- lint produces 0 blockers on a clean fixture and >=3 blockers on a fixture that uses useState + useEffect + MUI + async metadata

Validated locally: smoke.sh exits 0.
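The lint categories listed above are, at heart, pattern matches over source text. Below is a minimal TypeScript sketch of that idea. It is not lint_source.py, which is Python. The rule set is a hypothetical subset, and the regexes are illustrative: they would miss or over-match some real code (for example, useEffect-with-deps detection is omitted).

```typescript
type Severity = "blocker" | "warning" | "info";

interface Rule {
  severity: Severity;
  pattern: RegExp; // tested per line; no `g` flag, so `test` keeps no state
  message: string;
}

interface Finding {
  severity: Severity;
  line: number; // 1-based
  message: string;
}

// Hypothetical subset of rules mirroring the categories listed above.
const RULES: Rule[] = [
  { severity: "blocker", pattern: /\buse(State|Reducer)\s*\(/, message: "state-machine driven animation" },
  { severity: "blocker", pattern: /\basync\s+calculateMetadata\b|calculateMetadata\s*[:=]\s*async\b/, message: "async calculateMetadata" },
  { severity: "blocker", pattern: /from\s+["']@remotion\/lambda/, message: "@remotion/lambda import" },
  { severity: "blocker", pattern: /from\s+["']@mui\//, message: "third-party React UI library (MUI)" },
  { severity: "warning", pattern: /\b(delayRender|useCallback|useMemo)\s*\(/, message: "delayRender/useCallback/useMemo" },
  { severity: "info", pattern: /\b(staticFile|interpolateColors)\s*\(/, message: "translatable but flagged" },
];

function lintSource(source: string): Finding[] {
  const findings: Finding[] = [];
  source.split("\n").forEach((text, i) => {
    for (const rule of RULES) {
      if (rule.pattern.test(text)) {
        findings.push({ severity: rule.severity, line: i + 1, message: rule.message });
      }
    }
  });
  return findings;
}

function countBySeverity(findings: Finding[]): Record<Severity, number> {
  const counts: Record<Severity, number> = { blocker: 0, warning: 0, info: 0 };
  for (const f of findings) counts[f.severity]++;
  return counts;
}

console.log(countBySeverity(lintSource("const [n, setN] = useState(0);")));
```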

Adds the first two test fixtures the skill is graded against. Each fixture ships:
- remotion-src/: full Remotion project (package.json, src/, remotion.config.ts, tsconfig.json)
- hf-src/: hand-translated HyperFrames composition (index.html)
- expected.json: tier metadata + SSIM threshold + translation notes + measured validation
- README.md: human walk-through of the translation choices
- setup.sh (T2 only): generates binary assets (PNG, WAV) via ffmpeg

T1: title-card-fade
- 3 s @ 30 fps, 1280x720
- Single AbsoluteFill, single useCurrentFrame interpolate with multi-segment input [0,15,75,90] -> [0,1,1,0]
- Validated mean SSIM 0.974, threshold 0.95 (~0.025 gap from font-fallback divergence between Remotion's bundled Chromium and HF's chrome-headless-shell)

T2: title-image-outro
- 6 s @ 30 fps, 1280x720, three Sequences (TitleScene, ImageScene, OutroScene)
- Exercises spring, interpolate, Audio, Img, staticFile
- Spring -> GSAP back.out(1.4) translation
- Validated mean SSIM 0.985, threshold 0.95 (translation came out cleaner than predicted; spring->back.out drift was smaller than the ~0.05 budget I'd expected)
- setup.sh generates a 200x200 blue PNG and a 6 s silent WAV via ffmpeg so binaries stay out of the repo

Calibration done end-to-end: rendered Remotion baseline + HF translation, ran scripts/render_diff.sh, set thresholds ~0.02 below measured p05.

Critical Remotion config: setVideoImageFormat("png") + setColorSpace("bt709"). The default JPEG output writes yuvj420p (full-range), which costs ~0.05 SSIM vs HF's yuv420p (limited-range). Both fixtures' remotion.config.ts encode this so render_diff.sh measures translation fidelity, not encoder differences.

Both fixtures lint clean (0 blockers via scripts/lint_source.py). T2 staticFile() references are correctly flagged as info-level findings.

The fixtures are not yet wired into CI; that comes with PR 7's orchestrator. For now, render and eval are documented in each README and run by hand.
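T1's multi-segment interpolate maps frames to opacity piecewise-linearly: fade in over frames 0–15, hold, then fade out over frames 75–90. The sketch below reproduces that mapping and converts the segment boundaries to seconds, which is where fade-in and fade-out tweens would sit on an HF timeline. `interpolateClamped` is a hypothetical stand-in for Remotion's `interpolate`. It clamps outside the input range, whereas Remotion's default extrapolation is "extend"; the two agree here because the T1 ranges cover the whole 90-frame composition.

```typescript
// Piecewise-linear interpolation over ascending `input` breakpoints,
// clamped to the first/last output value outside the input range.
function interpolateClamped(x: number, input: number[], output: number[]): number {
  if (input.length !== output.length || input.length < 2) {
    throw new Error("input and output must have the same length (>= 2)");
  }
  if (x <= input[0]) return output[0];
  if (x >= input[input.length - 1]) return output[output.length - 1];
  for (let i = 1; i < input.length; i++) {
    if (x <= input[i]) {
      const t = (x - input[i - 1]) / (input[i] - input[i - 1]);
      return output[i - 1] + t * (output[i] - output[i - 1]);
    }
  }
  return output[output.length - 1]; // unreachable for ascending input
}

// T1 opacity curve: [0, 15, 75, 90] -> [0, 1, 1, 0].
const opacityAt = (frame: number): number =>
  interpolateClamped(frame, [0, 15, 75, 90], [0, 1, 1, 0]);

// The same segments in seconds at 30 fps.
const fps = 30;
const fadeIn = { start: 0 / fps, duration: 15 / fps };
const fadeOut = { start: 75 / fps, duration: 15 / fps };

console.log({ fadeIn, fadeOut, midFadeIn: opacityAt(7.5), hold: opacityAt(40) });
```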

Adds the data-driven tier: a purpose-built fixture (option 2 from the stack discussion, not a port of PR #214's examples/remotion-full/) that exercises the realistic shape of a production Remotion composition without using the runtime adapter.

Stargazed.tsx (10s @ 30fps, 1280x720):
- Sequence 0-3s: TitleScene (title + subtitle)
- Sequence 3-7s: StatsScene (3 reused StatCards staggered 12 frames apart)
- Sequence 7-10s: OutroScene (UnderlinedText with scaleX-from-left underline)

Composition shape exercises:
- <Composition schema={z.object({...})} defaultProps={...} />
- nested array prop (stats[]) materialized as repeated HTML
- custom React subcomponents (StatCard, AnimatedNumber, UnderlinedText) reused with different props
- per-instance delay via prop (delayInFrames -> GSAP timeline offset)
- frame-driven count-up (AnimatedNumber, manual cubic ease-out)
- two different spring configs in the same composition (damping:12 -> back.out(1.4), damping:14 -> back.out(1.2))
- useCurrentFrame, useVideoConfig

Translation choices documented in README.md and expected.json:
- Zod props -> data-* on root #stage div
- Custom subcomponents inline as repeated HTML using prop interface as the template
- AnimatedNumber's frame-driven count-up -> GSAP onUpdate tween on a { v: 0 } counter object, ease power3.out
- Two different spring configs -> two different back.out overshoots (1.4 vs 1.2 approximates the damping difference)
- delayInFrames={i * 12} -> GSAP offset (i * 0.4)s

Validated end-to-end: rendered Remotion baseline + HF translation, ran scripts/render_diff.sh.
- measured mean SSIM 0.953
- measured min SSIM 0.927
- measured p05 SSIM 0.938
- threshold 0.90 (~0.04 below p05)

The wider gap vs T1/T2 reflects T3's bigger approximation budget (2 spring instances + count-up timing + font fallback on multiple text sizes). Mean SSIM below 0.90 = structural mismatch (wrong durations, wrong stagger, missing prop wiring), not approximation drift.

Same Remotion config as PR 3: setVideoImageFormat("png") + setColorSpace("bt709") to match HF's yuv420p output. Lint: 9 files scanned, 0 blockers / 0 warnings / 0 infos. oxlint, oxfmt, typecheck all pass. The fixture is not yet wired into CI; render + diff is documented in README.md and runs by hand via the harness from PR 2. PR 7's orchestrator will wire all four tiers into a CI eval run.

jrusso1020 force-pushed the skill/r2hf-corpus-t1-t2 branch from d8242cd to bb3e7d5 April 27, 2026 17:05

jrusso1020 force-pushed the skill/r2hf-corpus-t3 branch from 564a5b5 to a9a2bf7 April 27, 2026 17:07

jrusso1020 mentioned this pull request Apr 27, 2026 in feat(skills): remotion-to-hyperframes corpus T1+T2 (3/7) #508 (merged)

jrusso1020 force-pushed the skill/r2hf-corpus-t1-t2 branch from bb3e7d5 to 08fa028 April 27, 2026 18:40

jrusso1020 force-pushed the skill/r2hf-corpus-t3 branch from a9a2bf7 to 0697f35 April 27, 2026 18:41

This was referenced Apr 27, 2026:
- feat(skills): remotion-to-hyperframes corpus T4 (5/7) #515 (merged)
- feat(skills): remotion-to-hyperframes references (6/7) #516 (merged)
- feat(skills): remotion-to-hyperframes SKILL.md + orchestrator (7/7) #517 (merged)

miguel-heygen requested changes Apr 27, 2026

jrusso1020 force-pushed the skill/r2hf-corpus-t1-t2 branch from 08fa028 to abaa743 April 27, 2026 23:04

jrusso1020 force-pushed the skill/r2hf-corpus-t3 branch from 0697f35 to 7e93bf8 April 27, 2026 23:05

jrusso1020 requested a review from miguel-heygen April 27, 2026 23:11

miguel-heygen approved these changes Apr 27, 2026

jrusso1020 force-pushed the skill/r2hf-corpus-t1-t2 branch from abaa743 to 2649d6d April 27, 2026 23:31

jrusso1020 force-pushed the skill/r2hf-corpus-t3 branch from 7e93bf8 to 2ab99c6 April 27, 2026 23:31

jrusso1020 requested a review from miguel-heygen April 27, 2026 23:34

jrusso1020 added 2 commits April 27, 2026 23:54

jrusso1020 force-pushed the skill/r2hf-corpus-t1-t2 branch from 2649d6d to 9ff46d7 April 27, 2026 23:54

jrusso1020 force-pushed the skill/r2hf-corpus-t3 branch from 2ab99c6 to efa7164 April 27, 2026 23:54

jrusso1020 marked this pull request as ready for review April 28, 2026 00:29

jrusso1020 changed the base branch from skill/r2hf-corpus-t1-t2 to main April 28, 2026 05:13

jrusso1020 merged commit 1294523 into main Apr 28, 2026 (20 checks passed)

jrusso1020 deleted the skill/r2hf-corpus-t3 branch April 28, 2026 05:17

jrusso1020 merged 3 commits into main from skill/r2hf-corpus-t3
