feat(skills): remotion-to-hyperframes references (6/7)#516
Conversation
miguel-heygen
left a comment
The references still disagree on the lambda-only case, so the stack does not give agents one stable decision rule for when to stop versus translate the remainder.
> The user should use interop for the whole thing OR refactor the
> blocker patterns out of their Remotion source first.
>
> ## Edge case: `@remotion/lambda` is the only blocker
This edge case conflicts with the policy above: lines 10-23 say any blocker, including r2hf/lambda-import, should recommend interop, because translating blockers produces silently-wrong output, and api-map.md repeats the same any-blocker rule. If lambda-only is intended to be translatable, make it a warning/special action consistently across the linter, the references, and the final skill; if it is a true blocker, remove this edge case and update T4 case 05.
Force-pushed afe9d23 to a98c6d1
Force-pushed 6717d96 to 662c4db
@miguel-heygen — addressed in the amended commit. The lambda-only "Edge case" section in
Plus a short paragraph below the table explaining why lambda is a warning (deployment config, orthogonal to the rendered composition).
The blocker rules table at the top of
miguel-heygen
left a comment
Re-reviewed the latest head. The references now consistently treat @remotion/lambda as a warning/deployment gap rather than a blocker, matching T4 and the final skill. No remaining blockers on this layer.
Force-pushed a98c6d1 to 461b381
Force-pushed 662c4db to 5ef3b40
Adds the deterministic eval primitives the skill calls into:

- scripts/render_diff.sh — SSIM diff between two MP4s, JSON summary, configurable threshold
- scripts/frame_strip.sh — side-by-side comparison strip for visual debugging
- scripts/lint_source.py — pre-translation lint over Remotion source (blockers/warnings/infos)

The harness is decoupled from the render pipeline: it accepts paths to already-rendered MP4s. The skill orchestrator (PR 7) drives both renders and feeds the outputs in. This keeps the harness usable in CI, in sandboxes, and on any machine that has ffmpeg, without needing the full Remotion + HyperFrames toolchain.

Lint catches the patterns from the skill's out-of-scope list:
- useState / useReducer (state-machine-driven animation)
- useEffect with deps (side effects)
- async calculateMetadata (Promise-returning composition metadata)
- @remotion/lambda imports
- third-party React UI libraries (MUI, Chakra, Mantine, antd, shadcn, Radix, NextUI)
- delayRender / useCallback / useMemo (warnings)
- staticFile / interpolateColors (info — translatable but flagged)

Smoke test (scripts/tests/smoke.sh) exercises all three scripts against synthetic inputs: identical ffmpeg testsrc videos pass at threshold 0.99, different testsrc videos fail at 0.99, frame_strip produces a strip.png, and lint reports 0 blockers on a clean fixture and >=3 blockers on a fixture that uses useState + useEffect + MUI + async metadata. Validated locally: smoke.sh exits 0.
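The blocker/warning/info classification and the exit-code contract can be sketched as a small rule table. This is an illustrative stand-in, not the real lint_source.py: the r2hf/* rule ids mirror the case list in this stack, but the regex patterns here are simplified assumptions.

```python
import re

# Illustrative subset of the rule table. The r2hf/* ids mirror the stack's
# case list; the patterns are simplified stand-ins, not the real ones.
RULES = [
    ("r2hf/use-state", "blocker", re.compile(r"\buse(State|Reducer)\s*\(")),
    ("r2hf/async-metadata", "blocker", re.compile(r"calculateMetadata\s*=\s*\{?\s*async")),
    ("r2hf/lambda-import", "blocker", re.compile(r"from\s+[\"']@remotion/lambda")),
    ("r2hf/delay-render", "warning", re.compile(r"\bdelayRender\s*\(")),
    ("r2hf/static-file", "info", re.compile(r"\bstaticFile\s*\(")),
]

def lint(source: str) -> list[tuple[str, str]]:
    """Return (rule_id, severity) for every rule that fires on the source."""
    return [(rule_id, severity) for rule_id, severity, pattern in RULES
            if pattern.search(source)]

def exit_code(findings: list[tuple[str, str]]) -> int:
    """Blockers fail the run (exit 1); warnings and infos alone pass (exit 0)."""
    return 1 if any(severity == "blocker" for _, severity in findings) else 0
```

The exit-code rule is what the downstream scripts key off: a non-zero exit means "recommend interop", warnings alone mean "translate after dropping wrappers".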
Adds the first two test fixtures the skill is graded against. Each fixture
ships:
- remotion-src/ full Remotion project (package.json, src/, remotion.config.ts, tsconfig.json)
- hf-src/ hand-translated HyperFrames composition (index.html)
- expected.json tier metadata + SSIM threshold + translation notes + measured validation
- README.md human walk-through of the translation choices
- setup.sh (T2 only) generates binary assets (PNG, WAV) via ffmpeg
T1 — title-card-fade
- 3 s @ 30 fps, 1280x720
- Single AbsoluteFill, single useCurrentFrame interpolate
with multi-segment input [0,15,75,90] -> [0,1,1,0]
- Validated mean SSIM 0.974, threshold 0.95
(~0.025 gap from font-fallback divergence between Remotion's bundled
Chromium and HF's chrome-headless-shell)
T2 — title-image-outro
- 6 s @ 30 fps, 1280x720, three Sequences (TitleScene, ImageScene, OutroScene)
- Exercises spring, interpolate, Audio, Img, staticFile
- Spring -> GSAP back.out(1.4) translation
- Validated mean SSIM 0.985, threshold 0.95
(translation came out cleaner than predicted; spring->back.out drift was
smaller than the ~0.05 budget I'd expected)
- setup.sh generates a 200x200 blue PNG and a 6 s silent WAV via ffmpeg
so binaries stay out of the repo
Calibration done end-to-end: rendered Remotion baseline + HF translation,
ran scripts/render_diff.sh, set thresholds ~0.02 below measured p05.
Critical Remotion config: setVideoImageFormat("png") + setColorSpace("bt709").
The default JPEG output writes yuvj420p (full-range) which costs ~0.05 SSIM
vs HF's yuv420p (limited-range). Both fixtures' remotion.config.ts encode
this so render_diff.sh measures translation fidelity, not encoder differences.
Both fixtures lint clean (0 blockers via scripts/lint_source.py).
T2 staticFile() references correctly flagged as info-level findings.
The fixtures are not yet wired into CI — that comes with PR 7's orchestrator.
For now, render and eval are documented in each README and run by hand.
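The calibration rule above (threshold set ~0.02 below the measured p05) is simple enough to sketch as a calculation. This is a hedged illustration of the rule of thumb, not the code in render_diff.sh; `calibrate_threshold` and the nearest-rank percentile choice are my assumptions.

```python
import math

def calibrate_threshold(ssim_per_frame: list[float], margin: float = 0.02) -> dict:
    """Summarize per-frame SSIM and derive a pass threshold ~margin below p05."""
    xs = sorted(ssim_per_frame)
    # Nearest-rank 5th percentile: the smallest value with >= 5% of frames at or below it.
    rank = max(1, math.ceil(0.05 * len(xs)))
    p05 = xs[rank - 1]
    return {
        "mean": sum(xs) / len(xs),
        "min": xs[0],
        "p05": p05,
        "threshold": round(p05 - margin, 2),
    }
```

With T1's measured series (p05 around 0.97) this lands on the 0.95 threshold the fixture ships with; structurally wrong output falls well below the margin, while font-fallback noise stays above it.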
Adds the data-driven tier — a purpose-built fixture (option 2 from the stack discussion, not a port of PR #214's examples/remotion-full/) that exercises the realistic shape of a production Remotion composition without using the runtime adapter.

Stargazed.tsx (10 s @ 30 fps, 1280x720):
- Sequence 0-3 s — TitleScene (title + subtitle)
- Sequence 3-7 s — StatsScene (3 reused StatCards staggered 12 frames apart)
- Sequence 7-10 s — OutroScene (UnderlinedText with scaleX-from-left underline)

Composition shape exercises:
- <Composition schema={z.object({...})} defaultProps={...} />
- nested array prop (stats[]) materialized as repeated HTML
- custom React subcomponents (StatCard, AnimatedNumber, UnderlinedText) reused with different props
- per-instance delay via prop (delayInFrames -> GSAP timeline offset)
- frame-driven count-up (AnimatedNumber, manual cubic ease-out)
- two different spring configs in the same composition (damping:12 -> back.out(1.4), damping:14 -> back.out(1.2))
- useCurrentFrame, useVideoConfig

Translation choices documented in README.md and expected.json:
- Zod props -> data-* on the root #stage div
- custom subcomponents inlined as repeated HTML, using the prop interface as the template
- AnimatedNumber's frame-driven count-up -> GSAP onUpdate tween on a { v: 0 } counter object, ease power3.out
- two different spring configs -> two different back.out overshoots (1.4 vs 1.2 approximates the damping difference)
- delayInFrames={i * 12} -> GSAP offset (i * 0.4) s

Validated end-to-end: rendered Remotion baseline + HF translation, ran scripts/render_diff.sh.
- measured mean SSIM 0.953
- measured min SSIM 0.927
- measured p05 SSIM 0.938
- threshold 0.90 (~0.04 below p05)

The wider gap vs T1/T2 reflects T3's bigger approximation budget (2 spring instances + count-up timing + font fallback on multiple text sizes). Mean SSIM below 0.90 would indicate structural mismatch (wrong durations, wrong stagger, missing prop wiring), not approximation drift.
Same Remotion config as PR 3: setVideoImageFormat("png") + setColorSpace("bt709") to match HF's yuv420p output.

Lint: 9 files scanned, 0 blockers / 0 warnings / 0 infos. oxlint, oxfmt, typecheck all pass.

The fixture is not yet wired into CI; render + diff is documented in README.md and runs by hand via the harness from PR 2. PR 7's orchestrator will wire all four tiers into a CI eval run.
Adds the escape-hatch tier — lint-only fixtures that test the skill's
ability to refuse translation cleanly when it sees patterns that don't map
to HF's seek-driven model.
Cases (8 total):
01-use-state.tsx blocker: r2hf/use-state
02-use-effect-deps.tsx blocker: r2hf/use-effect-deps (multi-line body
with internal commas — regression target for
the regex bug fix in PR 2)
03-async-metadata.tsx blocker: r2hf/async-metadata
04-third-party-react.tsx blocker: r2hf/third-party-react-ui (@mui/material)
05-lambda-config.tsx blocker: r2hf/lambda-import
06-warnings-only.tsx warnings: delayRender / useCallback / useMemo
(no blockers — translates after dropping wrappers)
07-custom-hook.tsx warning: r2hf/custom-hook (pure useFadeIn)
08-mixed.tsx multiple blockers + warnings (aggregate test)
Each case documents:
- The Remotion pattern it demonstrates
- Why it's a blocker / warning / info
- What the skill should do (refuse / drop-and-translate / translate-as-is)
Validation harness (validate.sh):
Runs lint_source.py against each case, asserts:
- Each expected blocker rule fires with severity="blocker"
- Each expected warning rule fires with severity="warning"
- lint_source.py exit code is 1 when blockers expected, 0 otherwise
T4 has no renders to diff. The skill is graded on lint correctness — that's
the gate that decides whether to translate or recommend the runtime interop
pattern from PR #214.
Result: 8/8 cases pass.
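The contract validate.sh asserts per case can be sketched as a checker over lint findings. This is an illustrative reimplementation under assumptions (findings as dicts with "rule"/"severity" keys; `check_case` is my name), not the shell script itself:

```python
def check_case(findings: list[dict], expected_blockers: set[str],
               expected_warnings: set[str], exit_code: int) -> list[str]:
    """Return assertion failures for one lint case (empty list = case passes)."""
    errors = []
    fired = {(f["rule"], f["severity"]) for f in findings}
    for rule in sorted(expected_blockers):
        if (rule, "blocker") not in fired:
            errors.append(f"missing blocker {rule}")
    for rule in sorted(expected_warnings):
        if (rule, "warning") not in fired:
            errors.append(f"missing warning {rule}")
    # Exit code must be 1 iff blockers were expected, 0 otherwise.
    want_exit = 1 if expected_blockers else 0
    if exit_code != want_exit:
        errors.append(f"exit code {exit_code}, expected {want_exit}")
    return errors
```

Case 06 (warnings only) is the one that exercises the 0-exit branch: warnings must fire, but the run still passes.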
Force-pushed 461b381 to 7509916
Force-pushed 5ef3b40 to 7509916
Adds 11 progressively-disclosed reference files that the skill loads on
demand during translation. Total ~1500 LOC, every file under 200 lines
(skill-creator's progressive-disclosure budget).
api-map.md the comprehensive Remotion -> HF translation table
(the index; loaded at start of translation)
timing.md interpolate, spring (validated configs), easing,
count-up, stagger
sequencing.md Sequence, Series, Loop, Freeze, AbsoluteFill,
Composition root
media.md Audio, Video, Img, IFrame, OffthreadVideo,
staticFile, asset paths
transitions.md @remotion/transitions presentations -> manual GSAP
crossfades or HF shader-transitions
lottie.md @remotion/lottie -> HF lottie adapter (incl. AE
feature limitations note)
fonts.md Google Fonts loading, local @font-face, system
fallback noise floor
parameters.md Zod schemas, defaultProps, sync vs async
calculateMetadata
escape-hatch.md when to bow out + the runtime interop pattern
from PR #214
limitations.md known caveat patterns (volume ramps, Loop with
state, custom presentations, code-split components)
eval.md how to run the validation harness, threshold rule
of thumb, what the noise floor looks like
The references are evidence-driven rather than speculative: every spring
config, easing curve, and SSIM threshold is documented from the
validated T1/T2/T3 calibration runs (mean 0.974 / 0.985 / 0.953). The
escape-hatch boundaries match the lint blockers in PR 2 and the T4
fixtures in PR 5.
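The on-demand loading these references enable amounts to a small idiom-to-file routing table; a sketch, assuming a regex trigger per reference (the patterns and the `references_needed` helper are illustrative, derived from the file list above):

```python
import re

# Idiom -> reference file routing, per the list above (patterns are illustrative).
ROUTES = [
    (re.compile(r"\bspring\s*\(|\binterpolate\s*\("), "timing.md"),
    (re.compile(r"<(Sequence|Series|Loop|Freeze)\b"), "sequencing.md"),
    (re.compile(r"<(Audio|Video|Img|OffthreadVideo)\b|\bstaticFile\s*\("), "media.md"),
    (re.compile(r"@remotion/transitions"), "transitions.md"),
    (re.compile(r"@remotion/lottie"), "lottie.md"),
    (re.compile(r"\bz\.object\s*\(|\bcalculateMetadata\b"), "parameters.md"),
]

def references_needed(source: str) -> list[str]:
    """Reference files the agent should load for this source, beyond api-map.md."""
    return [ref for pattern, ref in ROUTES if pattern.search(source)]
```

api-map.md stays the always-loaded index; everything else is pulled in only when its trigger appears in the Remotion source, which is what keeps each step of the translation under the progressive-disclosure budget.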
Replaces the placeholder .gitkeep from PR 1.
What
Eleven reference files under skills/remotion-to-hyperframes/references/ that the skill loads progressively (per skill-creator's design pattern) when translating a specific Remotion idiom: api-map.md, timing.md, sequencing.md, media.md, transitions.md, lottie.md, fonts.md, parameters.md, escape-hatch.md, limitations.md, eval.md.
Total: ~1500 lines across 11 files, every file under 200 lines (skill-creator's progressive-disclosure budget).
Why
#6 in the 7-PR stack.
The references are the skill's working memory — SKILL.md (PR 7) is intentionally lean (a 5-step workflow) and links into these files as needed. When the agent encounters a spring() it loads timing.md; when it sees <TransitionSeries> it loads transitions.md. The agent never loads everything at once — that's the whole point of the progressive-disclosure pattern.

How

Evidence-driven, not speculative:
- The spring → back.out(N) mapping in timing.md is from the validated T2 / T3 renders (mean SSIM 0.985 / 0.953).
- The threshold rule of thumb in eval.md is from the calibration runs in PR 3 / PR 4.
- Each boundary in escape-hatch.md matches a lint_source.py rule and a T4 case.
- The Remotion config documented in eval.md (setVideoImageFormat("png") + setColorSpace("bt709")) is the empirical fix that bumped T1 from 0.958 → 0.974 during validation.

Stack
#506 (1/7) — scaffold
#507 (2/7) — eval harness
#508 (3/7) — T1 + T2 fixtures
#509 (4/7) — T3 data-driven fixture
#515 (5/7) — T4 escape-hatch fixtures
this PR (6/7) — references/*.md
7/7 — SKILL.md body + corpus orchestrator
Test plan
- [timing.md](timing.md)
- Replaces the placeholder .gitkeep from PR 1's scaffold