codegen: snapshot-driven generator + module split + correctness pass #60

Open
jonathanembleyriches wants to merge 5 commits into main from refactor/codegen-clang-ast

jonathanembleyriches commented Jun 9, 2026

Contributor

Summary

Replace URLab's hand-rolled mjsX import / spec-write code with a
snapshot-driven codegen under Scripts/codegen/. A rules JSON
plus four snapshots (MJCF schema, mjsX struct fields, mjxmacro
pointer tables, libclang AST scrape) drive a generator that emits
UE C++ between explicit CODEGEN_* markers in component files.
URLab.Build.cs runs generate_ue_components.py --check on every
editor compile, so hand-edits inside marker regions surface as
build failures rather than silent drift.

Components moved to codegen ownership: Geom, Site, Joint, Body,
Frame, Inertial, Camera, Sensor, Actuator, Equality, Tendon,
ContactPair, ContactExclude, Keyframe, Flexcomp, and the seven
view structs in MjBind.h.

Five commits; per-chunk detail in each commit message.

Motivation

Replace hand-maintained mirror code so that adding a new mjsX field
is a JSON edit, and so that drift between MuJoCo's C API and URLab's
mirror shows up at compile time rather than at runtime.

Linked issue

None.

Build + test evidence

=== URLab build+test summary ===
Timestamp : 2026-06-09 08:41:43 UTC
Git HEAD  : 26c67c94 (refactor/codegen-clang-ast)
Engine    : UE_5.7
Deps      : mj=6882095 coacd=c7436bf zmq=7d95ac0
Build     : Succeeded
Tests     : 207 / 207 passed (0 failed)  [207 tests performed]
Log sha256: 49caa6b965cd1e74
================================

pytest Scripts/codegen/tests also green at 281/281.

Manual verification steps

  • MJCF round-trip: import a representative MJCF (mixed joint types,
    actuators, sensors), confirm the Details panel hides Pos / Quat
    on geom and site (transform widget owns the value), export back
    to MJCF, diff for round-trip stability.
  • Slider-crank actuator: an MJCF actuator carrying both site=
    and slidersite= should land as TransmissionType=SliderCrank
    with site as TargetName (was misimported as Site).
  • mjOption override semantics: load options from an MJCF that
    sets timestep=0.001; with bOverride_Timestep left off,
    ApplyToSpec must preserve 0.001 instead of clobbering it
    with the UE struct's default.
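
The override semantics in the last step can be modelled in a few lines of Python. This is only a sketch: the real code is generated C++ (`FMjOptionGenerated::ApplyToSpec`), and `OptionField`, `apply_to_spec`, and the 0.002 default used here are hypothetical stand-ins chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class OptionField:
    """Hypothetical stand-in for one UE option property and its bOverride_X toggle."""
    value: float
    override: bool = False


def apply_to_spec(spec: dict, fields: dict) -> dict:
    """Write only the fields whose override toggle is set; leave the rest as loaded."""
    out = dict(spec)
    for name, field in fields.items():
        if field.override:
            out[name] = field.value
    return out


# Spec as loaded from an MJCF that sets timestep=0.001.
loaded = {"timestep": 0.001}
# UE-side struct default (illustrative value), with the toggle left off.
ue_fields = {"timestep": OptionField(value=0.002, override=False)}

result = apply_to_spec(loaded, ue_fields)
```

With the toggle off, `result["timestep"]` stays at the MJCF value; flipping `override` to `True` is the only way the UE-side value reaches the spec.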

Checklist

  • Builds locally against UE 5.7+
  • Full URLab.* automation suite passes (207/207)
  • Docs updated for user-facing changes (`docs/guides/codegen_contributors.md`)

…ator

URLab's MuJoCo component layer used to hand-roll its mjsX import,
spec-write, and view-bind code per class. This commit replaces that
with a snapshot-driven generator under Scripts/codegen/.

Pipeline:
- codegen_rules.json is the editorial source of truth; it points at
  four JSON snapshots (MJCF schema, mjsX struct fields + mjs_setTo
  signatures, mjxmacro pointer tables, libclang AST scrape).
- generate_ue_components.py reads rules + snapshots and emits UE
  C++ between explicit CODEGEN_*_START/END markers.
- regen_all.py orchestrates snapshot rebuilds + the codegen pass.
- URLab.Build.cs invokes generate_ue_components.py --check on every
  editor compile, so hand-edits inside marker regions surface as
  build failures rather than silent drift.
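
A rough Python sketch of the marker-region idea, assuming a single hypothetical marker pair written as C++ comments. The real marker names, the inject helper, and the `--check` implementation are not shown in this text, so everything named below is illustrative.

```python
import re

START = "// CODEGEN_PROPS_START"   # hypothetical marker names
END = "// CODEGEN_PROPS_END"


def inject_between_markers(text: str, body: str) -> str:
    """Replace whatever sits between START and END with `body`."""
    pattern = re.compile(
        re.escape(START) + r"\n.*?" + re.escape(END), re.DOTALL
    )
    if not pattern.search(text):
        raise ValueError("marker pair not found")
    replacement = f"{START}\n{body}{END}"
    # A callable replacement avoids backslash interpretation in generated code.
    return pattern.sub(lambda _m: replacement, text, count=1)


def check(text: str, body: str) -> bool:
    """Return True when the file already matches what the generator would emit."""
    return inject_between_markers(text, body) == text


generated = "    double Timestep = 0.002;\n"
clean = f"class Foo {{\n{START}\n{generated}{END}\n}};\n"
hand_edited = clean.replace("0.002", "0.005")
```

Here `check(clean, generated)` holds while `check(hand_edited, generated)` does not, which is the condition a check-only run would turn into a build failure.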

Components moved to codegen ownership: Geom (+ 8 primitive
subclasses via banner-mode overwrite), Site (+ 5 type subclasses),
Joint (+ Hinge / Slide / Ball / Free), Body, Frame, Inertial,
Camera, Sensor (+ 28 subtypes via auto-emitted type switch +
TagToType map), Actuator (+ 9 subtypes via subtype_setto),
Equality (+ 7 subtypes via multi_uclass + objtype_dispatch),
Tendon, ContactPair, ContactExclude, Keyframe, Flexcomp, plus all
seven view structs in MjBind.h and MjGeom's FinalType resolution
block.

Synthetic struct emission (whole file): FMjOptionGenerated
(replaces FMuJoCoOptions), FMjStatistic, six FMjVisualX structs,
FMjsCompilerOptions, FMjsSpec, MjOptionEnums
(EMjIntegrator / Cone / Solver), MjArticulationRegistry.

Spec-write helpers extracted to consolidate emission sites:
MjSetDoubleVec, MjSetString, RegisterAllOf<T>.

Rule shapes introduced: fully_emitted (gated by an audit
diagnostic on overwrite), objtype_dispatch (equality),
geom_final_type, extra_constructor, bind_override,
mjs_data_packed_attrs (slot ranges in mjsEquality.data),
target_collations, subtype_setto, synthetic_categories,
generated_enums, value_map_from_enum, unit_conversion (per-attr
per-Type op, e.g. joint cm/deg storage), vec3_convert: y_negate
(joint.axis handedness).

Vendored mjspecmacro.h is pinned via sync_vendored.py +
_VENDORED_FROM.md manifest and folded into mjxmacro_snapshot at
build time.

Test surface: ~95 pytest cases cover emission shapes, snapshot
parsers, and drift internals. URLab automation expands to 207
cases; all green. docs/guides/codegen_contributors.md documents
the rules shape, the snapshots, the drift gate, and the
--check / --strict flags.

Adds forward-maintainability drift checks (hand-enum drift,
type/shape drift, new-attr typing, orphan rule entries, mjxmacro
block coverage). Every check routes through a shared DiagBuffer;
--strict promotes a non-zero fired count to a non-zero exit code.
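
One way to picture the shared buffer and the strict gate, as a minimal sketch. The class below is a hypothetical reduction; only the names `DiagBuffer`, `pending`, and `fired_count` appear in the commit messages, and the `diag` / `exit_code` methods and the orphan-rule check are assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class DiagBuffer:
    """Hypothetical minimal buffer: collects messages and counts how many fired."""
    pending: list = field(default_factory=list)
    fired_count: int = 0

    def diag(self, source: str, message: str) -> None:
        self.pending.append(f"[{source}] {message}")
        self.fired_count += 1

    def exit_code(self, strict: bool) -> int:
        # Non-strict runs only report; strict runs fail when anything fired.
        return 1 if strict and self.fired_count > 0 else 0


def check_orphan_rules(rules: dict, schema_attrs: set, diags: DiagBuffer) -> None:
    """Illustrative drift check: rule entries that name no known schema attr."""
    for attr in sorted(rules):
        if attr not in schema_attrs:
            diags.diag("orphan_rule", f"rule entry '{attr}' matches no schema attr")


diags = DiagBuffer()
check_orphan_rules({"pos": {}, "quatt": {}}, {"pos", "quat"}, diags)
```

The misspelled `quatt` entry fires one diagnostic; the process exit code only changes when the run is strict.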

CLI: --strict + --require-introspect default on, so a stale or
missing snapshot fails CI loudly rather than producing degraded
output.

Single source of truth for UE types: _UE_TYPE_INFO is the only
type-list the codegen consults. Default-initializer choice,
drift-check shape, and Units-meta gate become thin views over it
(was three separate type lists).
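
A sketch of what "thin views over one table" could look like. The entries, field names, and helper names below are hypothetical; the actual contents of `_UE_TYPE_INFO` are not shown in this text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UETypeInfo:
    default_init: str      # initializer text emitted for a property of this type
    shape: str             # "scalar" or "vector", as a drift check might compare
    allows_units: bool     # whether a Units meta tag makes sense


# Hypothetical entries for illustration only.
UE_TYPE_INFO = {
    "bool": UETypeInfo("false", "scalar", False),
    "float": UETypeInfo("0.0f", "scalar", True),
    "double": UETypeInfo("0.0", "scalar", True),
    "FVector": UETypeInfo("FVector::ZeroVector", "vector", True),
}


def default_initializer(ue_type: str) -> str:
    return UE_TYPE_INFO[ue_type].default_init


def is_known_type(ue_type: str) -> bool:
    return ue_type in UE_TYPE_INFO


def units_capable_types() -> set:
    return {name for name, info in UE_TYPE_INFO.items() if info.allows_units}
```

Because every view reads the same dictionary, adding or removing a type is a one-place edit, and a lookup such as `is_known_type("TArray<flot>")` fails for a typo'd name.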

Refactor pass on the emission surface:
- compute_canon_absorbed + iter_category_attrs formalise the
  standard exclusion gate that every per-attr emit used to inline.
- emit_enum_switch helper.
- PhaseContext threads as a single arg through every emission
  phase.
- Canonicalisation registry replaces ad-hoc canon dispatch.
- Property emission, xml_enum property-decl emission, and
  emit_xml_passthrough_body each split into focused helpers.
- emit_schema_for_attrs collapses to a 3-line orchestrator.
- _emit_synthetic_struct_files splits into 4 helpers;
  _emit_drift_diagnostics factors into 6 _check_* helpers.

Inject helper: a multi-tag entry point with a brace-balance gate
replaces ad-hoc text-injection sites. A brace-aware ctor extractor
in the banner-overwrite audit replaces a regex that truncated on
inner braces.
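
The difference between a non-greedy regex and a brace-aware scan is easy to show on a toy constructor. The helpers below are a simplified sketch with hypothetical names; they ignore braces inside string literals and comments, which a production extractor would need to handle.

```python
import re


def extract_braced_body(text: str, open_index: int) -> str:
    """Return the text from the '{' at open_index through its matching '}'."""
    if text[open_index] != "{":
        raise ValueError("open_index must point at '{'")
    depth = 0
    for i in range(open_index, len(text)):
        if text[i] == "{":
            depth += 1
        elif text[i] == "}":
            depth -= 1
            if depth == 0:
                return text[open_index : i + 1]
    raise ValueError("unbalanced braces")


def braces_balanced(text: str) -> bool:
    """Gate: reject text whose braces do not pair up."""
    depth = 0
    for ch in text:
        depth += (ch == "{") - (ch == "}")
        if depth < 0:
            return False
    return depth == 0


ctor = "UMjGeom::UMjGeom() { if (bFoo) { Size = 1; } Type = 0; }"
naive = re.search(r"\{.*?\}", ctor).group(0)        # stops at the first '}'
full = extract_braced_body(ctor, ctor.index("{"))
```

The regex result ends at the inner closing brace and fails the balance gate, while the depth-counting scan returns the whole constructor body.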

libclang scrape of URLab's hand-written EMj* enums lands in the
introspect snapshot.

Editor UX: hide MJCF-overlapping properties (Pos, Quat, size, etc.)
when the UE Transform widget owns the value.

Generated articulation enums (MjArticulationEnums.h) are staged
behind extra_members + disabled flags; activation lands in a
follow-up once the runtime articulation surface consumes them.

generate_ue_components.py was a monolith. Extracted three sibling
modules:

- _codegen_core: DiagBuffer, _UE_TYPE_INFO + derived views, the
  5-strategy _resolve_mjs_field chain, _resolve_value_map, schema
  readers, FileWrite dataclass.
- _codegen_inject: text-injection helpers, brace-balance gate,
  multi-tag inject scaffolding.
- _codegen_checks: 17 drift checks + _emit_drift_diagnostics.

generate_ue_components.py becomes the slim orchestrator that
threads PhaseContext through 11 emission phases.

Resolver chain hardened: the 5 strategies (direct match, name
suffix, root-name-digits, underscore-normed, actuator-prefix)
each become a separate helper with its own test, with a lock-in
test pinning the dispatch order.
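
As an illustration of an ordered strategy chain with a pinned dispatch order, here is a reduced sketch with three strategies. The matching rules are guesses based on the strategy names alone and are not taken from `_resolve_mjs_field`; the function names and sample field names are hypothetical.

```python
from typing import Callable, Optional

Strategy = Callable[[str, set], Optional[str]]


def direct_match(attr: str, fields: set) -> Optional[str]:
    return attr if attr in fields else None


def name_suffix(attr: str, fields: set) -> Optional[str]:
    # Assumed rule: a field whose name ends with the attr name.
    for name in sorted(fields):
        if name != attr and name.endswith(attr):
            return name
    return None


def underscore_normed(attr: str, fields: set) -> Optional[str]:
    # Assumed rule: compare names with underscores removed.
    wanted = attr.replace("_", "")
    for name in sorted(fields):
        if name.replace("_", "") == wanted:
            return name
    return None


# Order matters: earlier strategies win when several could match.
STRATEGIES = (direct_match, name_suffix, underscore_normed)


def resolve_field(attr: str, fields: set) -> Optional[tuple]:
    """Return (field_name, strategy_name) for the first strategy that hits."""
    for strategy in STRATEGIES:
        hit = strategy(attr, fields)
        if hit is not None:
            return hit, strategy.__name__
    return None


fields = {"pos", "solref_limit", "gainprm"}
```

A lock-in test in this style would assert on the names in `STRATEGIES`, so that reordering the tuple fails loudly instead of silently changing which field an attribute resolves to.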

New drift checks: apply-mode validity, embedded-C++ references,
allowlist staleness (covers 4 of the 5 intentionally_* allowlists;
the fifth is covered indirectly by compiler-attrs coverage).

Snapshot pipeline simplification: build_mjspec_snapshot.py (the
regex scrape) is retired. The libclang introspect snapshot is the
single source of truth for the MuJoCo C API surface.

Behavior fixes: geom.fitscale is now double (matching the mjsGeom
field type), not bool. Site exposes Mesh + Hfield as additional
site types.

Inject helper: empty CODEGEN blocks collapse to adjacent markers,
no stray blank line. END marker indent mirrors START's.

Rule contract: a codegen_rules.json shape test pins every reader's
key shape so typos surface in pytest, not as UHT errors later.

MjArticulationEnums emission activated. EMjGainType, EMjBiasType,
EMjDynType, EMjGeomInertia, EMjFluidShape, EMjActuatorTrnType,
EMjDcMotorInput, EMjJointType, EMjGeomType, EMjSiteType all emit
from a single codegen-owned header that MjActuator, MjGeom,
MjCameraTypes, MjSensor, and MjFlexcomp include.

New plugin utilities: URLabAxisConv (MuJoCo <-> UE axis conversion
helpers) and URLabLocaleSafeFloat (locale-independent float parser
for MJCF imports under non-English locales).

Diagnostic hygiene:
- All coverage diagnostics carry consistent source= tags so
  --strict greps can target specific message buckets.
- _check_generated_enum_coverage diagnoses a missing from_mj_enum
  instead of silently continuing.
- editor_option_helpers and banner-safety paths surface their
  missing-marker / OSError conditions that previously went silent.
- apply_writes writes bytes + binary-compares so codegen output is
  byte-identical across OSes (was silently baking CRLF on Windows).
- main() asserts non-empty enums / structs / setto_functions after
  the introspect projection — a libclang scrape shape change used
  to silently empty the projection.
- An orphan-file walk catches stale Generated/ residue when a
  category is removed from rules.
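
The byte-level write mentioned in the list above can be sketched as follows. `write_if_changed` is a hypothetical name, and normalising CRLF to LF before encoding is an assumption about how byte-identical output across OSes might be achieved.

```python
import os
import tempfile


def write_if_changed(path: str, text: str) -> bool:
    """Write `text` as UTF-8 with LF newlines; return True only if bytes changed."""
    new_bytes = text.replace("\r\n", "\n").encode("utf-8")
    try:
        with open(path, "rb") as fh:
            if fh.read() == new_bytes:
                return False
    except FileNotFoundError:
        pass
    # Binary mode bypasses newline translation, so no CRLF is introduced on Windows.
    with open(path, "wb") as fh:
        fh.write(new_bytes)
    return True


with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "MjGeom.generated.h")
    first = write_if_changed(target, "int A;\nint B;\n")
    second = write_if_changed(target, "int A;\r\nint B;\r\n")
    with open(target, "rb") as fh:
        on_disk = fh.read()
```

The second call sees identical bytes after normalisation and leaves the file untouched, so a check-only run would report no drift regardless of the platform that produced the text.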

Dead code removed: all_owned_decl_names, two dead-store assigns,
a duplicate _attr_default_value branch, EmissionContext.category_
label, the unused mjspec param on _resolve_value_map + its
callers. Unused UE types pruned from _UE_TYPE_INFO. Orchestrator
import block shrinks from 66 to 27 names.

Contract tests: test_rule_shape_contract pins inner-key shapes
for categories, subtypes, element_rules, xml_enum_attrs,
canonicalizations, synthetic_categories, generated_enums. A
typo'd inner key now surfaces in pytest instead of letting every
reader's .get(..., {}) silently drop work. Canonicalisation and
EmissionPhase dataclass callables get real Callable signatures
(were Any). A _value_map_pair helper consolidates 4 inline shape
guards.
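
A minimal sketch of a shape contract in this spirit. The allowed-key set and the sample rules are illustrative, not the real `codegen_rules.json` schema, and treating `_`-prefixed keys as notes is an assumption based on the `_note_*` and `_doc_*` names mentioned elsewhere in these messages.

```python
# Illustrative allowed keys; the real contract pins a different, larger set.
ALLOWED_CATEGORY_KEYS = {"element", "exclude_attrs", "subtypes", "unit_conversion"}


def unknown_keys(rules: dict) -> list:
    """Return 'category.key' strings for every inner key outside the allowed set."""
    problems = []
    for cat_name, cat in sorted(rules.get("categories", {}).items()):
        for key in sorted(cat):
            # Keys starting with "_" are treated as notes and skipped.
            if key.startswith("_"):
                continue
            if key not in ALLOWED_CATEGORY_KEYS:
                problems.append(f"{cat_name}.{key}")
    return problems


rules = {
    "categories": {
        "geom": {"element": "geom", "exclude_attrs": ["pos"]},
        "joint": {"element": "joint", "exlude_attrs": ["axis"], "_note_hidden": "..."},
    }
}
# A reader using .get() would silently ignore the misspelled key.
silently_dropped = rules["categories"]["joint"].get("exclude_attrs", [])
```

The `.get(..., [])` reader returns an empty list for the misspelled key, while `unknown_keys(rules)` names it explicitly, which is the failure a pytest contract can assert on.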

Test surface cleanup: 10 low-signal duplicates / self-identity
assertions dropped. New focused tests: test_coverage_checks
(13 cases for 6 previously-untested drift checks),
test_xml_passthrough_emit (8 cases for emit_xml_passthrough_body,
which had zero direct coverage), test_canon_emitter_bodies
(5 cases for actuator_transmission + fromto_decompose canon
emitters).

Rules JSON consolidation: 15 empty exclude_attrs defaults dropped
(readers already fall back to []). A _doc_transform_widget_policy
top-level note consolidates 5 per-element _note_hidden prose
variants. Misleading _note_setto_param_defaults string-literal
claim corrected.

Legacy URLab-internal compat surface removed: diagnostic API
aliases, the retired build_mjspec_snapshot.py fallback paths,
pascal_case identity helper, the export_if + scalar_int_fields
rule features that no production rule uses, the
disable_schema_emission gate, default_subtype_key,
articulation_registry_path.

Exclusion-gate collapse: iter_category_attrs grows an extra_excl
kwarg; per-call inline if/continue chains collapse into a single
parameter. _CANON_MJS_FIELDS lifts inline next to its sole
consumer in _check_mjs_struct_field_drift.
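
The collapse can be illustrated with a simplified iterator. The signature below is hypothetical (only the `iter_category_attrs` name and the `extra_excl` kwarg come from the commit message), and the attribute names are sample data.

```python
from typing import Iterable, Iterator


def iter_category_attrs(
    attrs: Iterable[str],
    exclude_attrs: Iterable[str] = (),
    canon_absorbed: Iterable[str] = (),
    extra_excl: Iterable[str] = (),
) -> Iterator[str]:
    """Yield attrs that pass the standard exclusion gate plus any caller extras."""
    skip = set(exclude_attrs) | set(canon_absorbed) | set(extra_excl)
    for attr in attrs:
        if attr not in skip:
            yield attr


schema = ["name", "pos", "quat", "fromto", "size", "type"]

# Before: the caller filtered again after the shared gate.
before = []
for attr in iter_category_attrs(schema, exclude_attrs=["name"], canon_absorbed=["fromto"]):
    if attr in ("pos", "quat"):
        continue
    before.append(attr)

# After: the same filter expressed as a parameter.
after = list(
    iter_category_attrs(
        schema,
        exclude_attrs=["name"],
        canon_absorbed=["fromto"],
        extra_excl=["pos", "quat"],
    )
)
```

Both forms yield the same attributes; the second keeps the exclusion logic in one place.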

_check_new_attr_typing also surfaces type_mappings entries whose
UE type is not in _UE_TYPE_INFO (catches typo'd TArray<flot> at
JSON-edit time).

test_behavioral_emit_guarantees pins three semantic properties
that the substring lock-ins didn't cover: xml_enum import/export
roundtrip pairing, mjs_setTo arg order following the C signature,
and bOverride_X / X identifier agreement across every category.

Correctness fixes:
- FMjOptionGenerated::ApplyToSpec gates every write with
  bOverride_X. Previously wrote unconditionally, silently
  clobbering MJCF-XML defaults with UE struct defaults whenever
  the user had not overridden a field. Now matches the existing
  ApplyOverridesToModel semantics.
- Slider-crank actuator transmission misimport: the slidersite
  branch fired before site, then site clobbered the transmission
  type back to Site. Per MJCF, slidersite + site means SliderCrank
  with site as the target. Reorder so slidersite runs last, after
  the main target is captured.
- Unit-conversion + double-vec exports silently lost the
  attr_to_mjs_field remap when mjs_fields was empty (the
  --no-require-introspect path). Apply the per-attr export's
  fallback to both miss sites.
- _phase_objtype_dispatch and _phase_geom_final_type re-read the
  base .cpp from disk even when _phase_categories had already
  queued a FileWrite for the same path; the later phase would
  inject into stale text and apply_writes would overwrite the
  earlier phase's update on the first run after a rule change.
  _inject_tags_into_cpp now uses the pending FileWrite content
  when present and replaces the entry in-place.
- _emit_xml_enum_import sets the override toggle only when a
  value_map branch actually matched. A typo'd XML enum value
  (e.g. type="hindge") no longer silently masks itself by setting
  the toggle on the UE-default enum value.
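
The slider-crank ordering fix from the list above, modelled in Python. This is a sketch of the described behaviour only: the real importer is generated C++, the function name is hypothetical, and the set of main target attributes here is an illustrative subset.

```python
def resolve_transmission(attrs: dict) -> tuple:
    """Return (transmission_type, target_name) for an actuator's XML attributes."""
    trn_type, target = None, None
    # Main target attributes first; each sets both the type and the target.
    for xml_attr, kind in (("joint", "Joint"), ("tendon", "Tendon"), ("site", "Site")):
        if xml_attr in attrs:
            trn_type, target = kind, attrs[xml_attr]
    # slidersite runs last: it upgrades the type but keeps the captured target.
    if "slidersite" in attrs:
        trn_type = "SliderCrank"
    return trn_type, target


both = resolve_transmission({"site": "crank", "slidersite": "slider"})
site_only = resolve_transmission({"site": "crank"})
```

Running the `slidersite` branch before the `site` branch would let `site` reset the type to `Site`, which is the misimport described above.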

Latent hazards closed:
- test_ue_type_registry monkey-patched _UE_TYPE_INFO onto the
  gen module at import time, leaking into every subsequent test.
  Switched to direct imports from _codegen_core.
- _audit_banner_safety only diagnosed; apply_writes would still
  destroy hand-edited content before the --strict gate fired.
  main() now snapshots banner_overwrite diagnostics before
  apply_writes and refuses to write in strict mode.
- apply_writes stages to <path>.codegen.tmp + os.replace so a
  Ctrl-C between truncate and write doesn't leave a partial
  file UBT would pick up.
- Synthetic mirror 'count' parsing accepts both str and int —
  was brittle to a future mjxmacro snapshot shape change.
- Synthetic ApplyTo inline uses static_cast<decltype(X)>(V)
  instead of C-style (decltype(X))V so narrowing warnings
  surface at compile.
- New contract test rejects 'inf' / 'nan' / hex / empty strings
  in setto_param_defaults at JSON-edit time (values land verbatim
  in generated C++).
- Extend value_map shape contract to reject empty maps and
  empty-string xml keys (would silently set the override toggle
  without ever matching the enum).
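
The staged write can be sketched with the standard library alone. `atomic_write` is a hypothetical name; the `.codegen.tmp` suffix and the use of `os.replace` come from the commit message, while the cleanup in `finally` is an added assumption.

```python
import os
import tempfile


def atomic_write(path: str, data: bytes) -> None:
    """Stage to <path>.codegen.tmp, then swap it into place with os.replace."""
    tmp_path = path + ".codegen.tmp"
    try:
        with open(tmp_path, "wb") as fh:
            fh.write(data)
        # The rename is atomic on POSIX when both paths share a filesystem.
        os.replace(tmp_path, path)
    finally:
        # If anything failed before the swap, do not leave the staging file behind.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)


with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "MjBind.h")
    atomic_write(target, b"// v1\n")
    atomic_write(target, b"// v2\n")
    with open(target, "rb") as fh:
        final = fh.read()
    leftovers = sorted(os.listdir(tmp))
```

An interrupt during the write leaves the previous file intact, because the target path is only ever touched by the final rename.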

Hygiene + perf:
- inject_between_tags caches the compiled regex per tag (was
  re.compile-ing on every call).
- Drop spurious 'f' suffix on double-typed mjOption defaults
  (Timestep, Impratio, Tolerance, NoslipTolerance, CCD_Tolerance)
  — were triggering -Wnarrowing.
- Stale docstring + comment fixes: snapshot paths updated,
  retired build_mjspec_snapshot.py references replaced.
- test_synthetic_categories uses _reset_diags() instead of raw
  _DIAGS_BUFFER.pending.clear() so fired_count is zeroed too.
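
Per-tag regex caching, as mentioned in the first bullet above, might look like the sketch below. It uses `functools.lru_cache` as one possible mechanism; the actual caching approach in `inject_between_tags` is not shown in this text, and the marker comment syntax is assumed.

```python
import re
from functools import lru_cache


@lru_cache(maxsize=None)
def tag_pattern(tag: str) -> re.Pattern:
    """Compile the START/END pattern for `tag` once and reuse it afterwards."""
    start = re.escape(f"// CODEGEN_{tag}_START")
    end = re.escape(f"// CODEGEN_{tag}_END")
    return re.compile(start + r"\n(.*?)" + end, re.DOTALL)


text = "// CODEGEN_PROPS_START\nint X;\n// CODEGEN_PROPS_END\n"
first = tag_pattern("PROPS")
second = tag_pattern("PROPS")
body = first.search(text).group(1)
```

Repeated calls with the same tag return the same compiled object, so the compile cost is paid once per tag rather than once per injection.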

Refactor: a single _inject_or_diag helper replaces six hand-rolled
"try inject_between_tags then if-not-ok diag" stanzas across
emit_subclass_files, emit_base_class_injection,
emit_multi_uclass, and emit_bind_h_injection (~60 LoC of
boilerplate removed).

Rename: elem_rule (singular) -> elem_rules (plural) in
_codegen_checks.py to match the orchestrator's convention.

Labels

None yet

Projects

Status: Backlog
