codegen: snapshot-driven generator + module split + correctness pass #60
Open
jonathanembleyriches wants to merge 5 commits into
…ator

URLab's MuJoCo component layer used to hand-roll its mjsX import,
spec-write, and view-bind code per class. This commit replaces that
with a snapshot-driven generator under Scripts/codegen/.

Pipeline:
- codegen_rules.json is the editorial source of truth; it points at
  four JSON snapshots (MJCF schema, mjsX struct fields + mjs_setTo
  signatures, mjxmacro pointer tables, libclang AST scrape).
- generate_ue_components.py reads rules + snapshots and emits UE C++
  between explicit CODEGEN_*_START/END markers.
- regen_all.py orchestrates snapshot rebuilds + the codegen pass.
- URLab.Build.cs invokes generate_ue_components.py --check on every
  editor compile, so hand-edits inside marker regions surface as
  build failures rather than silent drift.

Components moved to codegen ownership: Geom (+ 8 primitive
subclasses via banner-mode overwrite), Site (+ 5 type subclasses),
Joint (+ Hinge / Slide / Ball / Free), Body, Frame, Inertial,
Camera, Sensor (+ 28 subtypes via auto-emitted type switch +
TagToType map), Actuator (+ 9 subtypes via subtype_setto), Equality
(+ 7 subtypes via multi_uclass + objtype_dispatch), Tendon,
ContactPair, ContactExclude, Keyframe, Flexcomp, plus all seven view
structs in MjBind.h and MjGeom's FinalType resolution block.

Synthetic struct emission (whole file): FMjOptionGenerated (replaces
FMuJoCoOptions), FMjStatistic, six FMjVisualX structs,
FMjsCompilerOptions, FMjsSpec, MjOptionEnums (EMjIntegrator / Cone /
Solver), MjArticulationRegistry.

Spec-write helpers extracted to consolidate emission sites:
MjSetDoubleVec, MjSetString, RegisterAllOf<T>.

Rule shapes introduced: fully_emitted (gated by an audit diagnostic
on overwrite), objtype_dispatch (equality), geom_final_type,
extra_constructor, bind_override, mjs_data_packed_attrs (slot ranges
in mjsEquality.data), target_collations, subtype_setto,
synthetic_categories, generated_enums, value_map_from_enum,
unit_conversion (per-attr per-Type op, e.g. joint cm/deg storage),
vec3_convert: y_negate (joint.axis handedness).

Vendored mjspecmacro.h is pinned via sync_vendored.py +
_VENDORED_FROM.md manifest and folded into mjxmacro_snapshot at
build time.

Test surface: ~95 pytest cases cover emission shapes, snapshot
parsers, and drift internals. URLab automation expands to 207 cases;
all green.

docs/guides/codegen_contributors.md documents the rules shape, the
snapshots, the drift gate, and the --check / --strict flags.
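The marker-region mechanism and the --check drift gate described in this commit can be sketched roughly as below. This is a minimal illustration, not the generator's actual code: the `// CODEGEN_<TAG>_START` comment form, the function signatures, and `check_drift` are assumptions made for the example.

```python
import re


def inject_between_tags(text: str, tag: str, body: str) -> tuple[str, bool]:
    """Replace the lines between CODEGEN_<tag>_START and CODEGEN_<tag>_END.

    Returns (new_text, ok); ok is False when the marker pair is missing.
    The '// CODEGEN_...' comment form is an assumption for this sketch.
    """
    pattern = re.compile(
        rf"(?P<start>^[ \t]*// CODEGEN_{re.escape(tag)}_START[^\n]*\n)"
        rf"(?P<body>.*?)"
        rf"(?P<end>^[ \t]*// CODEGEN_{re.escape(tag)}_END)",
        re.DOTALL | re.MULTILINE,
    )
    match = pattern.search(text)
    if match is None:
        return text, False
    # Only the region between the markers changes; hand-written code
    # outside the markers is left untouched.
    new_text = text[: match.start("body")] + body + text[match.end("body"):]
    return new_text, True


def check_drift(on_disk: str, tag: str, expected_body: str) -> bool:
    """Return True when the on-disk region differs from regenerated output."""
    regenerated, ok = inject_between_tags(on_disk, tag, expected_body)
    return (not ok) or regenerated != on_disk
```

In a check-only mode nothing is written: the generator regenerates in memory and reports drift, which is how a hand-edit inside a marker region could turn into a build failure.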
Adds forward-maintainability drift checks (hand-enum drift,
type/shape drift, new-attr typing, orphan rule entries, mjxmacro
block coverage). Every check routes through a shared DiagBuffer;
--strict promotes a non-zero fired count to a non-zero exit code.

CLI: --strict + --require-introspect default on, so a stale or
missing snapshot fails CI loudly rather than producing degraded
output.

Single source of truth for UE types: _UE_TYPE_INFO is the only
type-list the codegen consults. Default-initializer choice,
drift-check shape, and Units-meta gate become thin views over it
(was three separate type lists).

Refactor pass on the emission surface:
- compute_canon_absorbed + iter_category_attrs formalise the
  standard exclusion gate that every per-attr emit used to inline.
- emit_enum_switch helper.
- PhaseContext threads as a single arg through every emission phase.
- Canonicalisation registry replaces ad-hoc canon dispatch.
- Property emission, xml_enum property-decl emission, and
  emit_xml_passthrough_body each split into focused helpers.
- emit_schema_for_attrs collapses to a 3-line orchestrator.
- _emit_synthetic_struct_files splits into 4 helpers;
  _emit_drift_diagnostics factors into 6 _check_* helpers.

Inject helper: a multi-tag entry point with a brace-balance gate
replaces ad-hoc text-injection sites. A brace-aware ctor extractor
in the banner-overwrite audit replaces a regex that truncated on
inner braces.

libclang scrape of URLab's hand-written EMj* enums lands in the
introspect snapshot.

Editor UX: hide MJCF-overlapping properties (Pos, Quat, size, etc.)
when the UE Transform widget owns the value.

Generated articulation enums (MjArticulationEnums.h) are staged
behind extra_members + disabled flags; activation lands in a
follow-up once the runtime articulation surface consumes them.
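As an illustration of how a shared diagnostic buffer and --strict could interact, here is a minimal sketch. The names `DiagBuffer`, `pending`, and `fired_count` appear in the commit messages; the `diag`, `flush`, and `exit_code` helpers are hypothetical and only show the idea that flushing messages does not reset the fired count.

```python
from dataclasses import dataclass, field


@dataclass
class DiagBuffer:
    """Collects diagnostics from every drift check (sketch only)."""

    pending: list[str] = field(default_factory=list)
    fired_count: int = 0

    def diag(self, source: str, message: str) -> None:
        # A source= style tag lets callers filter message buckets.
        self.pending.append(f"[{source}] {message}")
        self.fired_count += 1

    def flush(self) -> list[str]:
        # Hand back pending messages; fired_count keeps the running total.
        out, self.pending = self.pending, []
        return out


def exit_code(buf: DiagBuffer, strict: bool) -> int:
    """In strict mode, any fired diagnostic yields a non-zero exit code."""
    return 1 if strict and buf.fired_count > 0 else 0
```

Keeping the count separate from the pending list is what makes a reset helper necessary in tests: clearing only `pending` would leave a stale `fired_count` behind.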
generate_ue_components.py was a monolith. Extracted three sibling
modules:
- _codegen_core: DiagBuffer, _UE_TYPE_INFO + derived views, the
  5-strategy _resolve_mjs_field chain, _resolve_value_map, schema
  readers, FileWrite dataclass.
- _codegen_inject: text-injection helpers, brace-balance gate,
  multi-tag inject scaffolding.
- _codegen_checks: 17 drift checks + _emit_drift_diagnostics.

generate_ue_components.py becomes the slim orchestrator that
threads PhaseContext through 11 emission phases.

Resolver chain hardened: the 5 strategies (direct match, name
suffix, root-name-digits, underscore-normed, actuator-prefix) each
become a separate helper with its own test, with a lock-in test
pinning the dispatch order.

New drift checks: apply-mode validity, embedded-C++ references,
allowlist staleness (covers 4 of the 5 intentionally_* allowlists;
the fifth is covered indirectly by compiler-attrs coverage).

Snapshot pipeline simplification: build_mjspec_snapshot.py (the
regex scrape) is retired. The libclang introspect snapshot is the
single source of truth for the MuJoCo C API surface.

Behavior fixes: geom.fitscale is now double (matching the mjsGeom
field type), not bool. Site exposes Mesh + Hfield as additional
site types.

Inject helper: empty CODEGEN blocks collapse to adjacent markers,
no stray blank line. END marker indent mirrors START's.

Rule contract: a codegen_rules.json shape test pins every reader's
key shape so typos surface in pytest, not as UHT errors later.

MjArticulationEnums emission activated. EMjGainType, EMjBiasType,
EMjDynType, EMjGeomInertia, EMjFluidShape, EMjActuatorTrnType,
EMjDcMotorInput, EMjJointType, EMjGeomType, EMjSiteType all emit
from a single codegen-owned header that MjActuator, MjGeom,
MjCameraTypes, MjSensor, and MjFlexcomp include.

New plugin utilities: URLabAxisConv (MuJoCo <-> UE axis conversion
helpers) and URLabLocaleSafeFloat (locale-independent float parser
for MJCF imports under non-English locales).
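A brace-balance gate of the kind _codegen_inject is described as providing might look like the sketch below. It is an assumption-laden illustration rather than the module's code: the function name is hypothetical, and it skips `//` comments, `/* */` comments, and quoted literals but does not handle C++ raw string literals.

```python
def braces_balanced(snippet: str) -> bool:
    """Return True when '{' and '}' pair up outside comments and literals."""
    depth = 0
    i, n = 0, len(snippet)
    while i < n:
        ch = snippet[i]
        two = snippet[i:i + 2]
        if two == "//":
            # Line comment: skip to end of line (or end of text).
            nl = snippet.find("\n", i)
            i = n if nl == -1 else nl
            continue
        if two == "/*":
            # Block comment: an unterminated one fails the gate.
            end = snippet.find("*/", i + 2)
            if end == -1:
                return False
            i = end + 2
            continue
        if ch in "\"'":
            # String or char literal: skip to the matching quote,
            # stepping over backslash escapes.
            i += 1
            while i < n and snippet[i] != ch:
                i += 2 if snippet[i] == "\\" else 1
            i += 1
            continue
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth < 0:
                return False  # closing brace with no opener
        i += 1
    return depth == 0
```

Running such a gate on each generated body before injection means an unbalanced snippet is rejected with a diagnostic instead of corrupting the surrounding hand-written C++.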
Diagnostic hygiene:
- All coverage diagnostics carry consistent source= tags so
--strict greps can target specific message buckets.
- _check_generated_enum_coverage diagnoses a missing from_mj_enum
instead of silently continuing.
- editor_option_helpers and banner-safety paths surface their
missing-marker / OSError conditions that previously went silent.
- apply_writes writes bytes + binary-compares so codegen output is
byte-identical across OSes (was silently baking CRLF on Windows).
- main() asserts non-empty enums / structs / setto_functions after
the introspect projection — a libclang scrape shape change used
to silently empty the projection.
- An orphan-file walk catches stale Generated/ residue when a
category is removed from rules.
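The byte-level write described in the list above can be sketched as follows. The helper name is hypothetical, and normalising to LF before encoding is an assumption about how byte-identical output across OSes would be achieved.

```python
from pathlib import Path


def write_if_changed(path: Path, text: str) -> bool:
    """Write LF-normalised UTF-8 bytes; return True if the file was rewritten.

    Writing bytes (not text) stops the platform from translating '\n'
    into '\r\n' on Windows, and the binary comparison avoids touching
    files whose content is already up to date.
    """
    payload = text.replace("\r\n", "\n").encode("utf-8")
    if path.exists() and path.read_bytes() == payload:
        return False
    path.write_bytes(payload)
    return True
```

Skipping unchanged files also keeps timestamps stable, which matters when a build system decides what to recompile from file modification times.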
Dead code removed: all_owned_decl_names, two dead-store assigns,
a duplicate _attr_default_value branch,
EmissionContext.category_label, the unused mjspec param on
_resolve_value_map + its callers. Unused UE types pruned from
_UE_TYPE_INFO. Orchestrator import block shrinks from 66 to 27
names.
Contract tests: test_rule_shape_contract pins inner-key shapes
for categories, subtypes, element_rules, xml_enum_attrs,
canonicalizations, synthetic_categories, generated_enums. A
typo'd inner key now surfaces in pytest instead of letting every
reader's .get(..., {}) silently drop work. Canonicalisation and
EmissionPhase dataclass callables get real Callable signatures
(were Any). A _value_map_pair helper consolidates 4 inline shape
guards.
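To show why pinning inner-key shapes helps, here is a minimal sketch of such a contract check. The allowed-key sets are invented for the example (only exclude_attrs, subtypes, and from_mj_enum are named in these commit messages), and `unknown_inner_keys` is a hypothetical helper.

```python
# Hypothetical allow-lists; the real contract lives in the test suite.
ALLOWED_KEYS = {
    "categories": {"mjs_struct", "exclude_attrs", "subtypes"},
    "generated_enums": {"from_mj_enum", "ue_name"},
}


def unknown_inner_keys(rules: dict) -> list[str]:
    """List every inner key that no reader would ever look up."""
    problems = []
    for section, allowed in ALLOWED_KEYS.items():
        for name, entry in rules.get(section, {}).items():
            for key in entry:
                if key.startswith("_"):
                    continue  # _note / _doc style comment keys
                if key not in allowed:
                    problems.append(f"{section}.{name}.{key}")
    return sorted(problems)
```

With a reader that does `entry.get("exclude_attrs", [])`, a misspelled key such as `exclude_atrs` is silently ignored; a check like this one reports it by path instead.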
Test surface cleanup: 10 low-signal duplicates / self-identity
assertions dropped. New focused tests: test_coverage_checks
(13 cases for 6 previously-untested drift checks),
test_xml_passthrough_emit (8 cases for emit_xml_passthrough_body,
which had zero direct coverage), test_canon_emitter_bodies
(5 cases for actuator_transmission + fromto_decompose canon
emitters).
Rules JSON consolidation: 15 empty exclude_attrs defaults dropped
(readers already fall back to []). A _doc_transform_widget_policy
top-level note consolidates 5 per-element _note_hidden prose
variants. Misleading _note_setto_param_defaults string-literal
claim corrected.
Legacy URLab-internal compat surface removed: diagnostic API
aliases, the retired build_mjspec_snapshot.py fallback paths,
pascal_case identity helper, the export_if + scalar_int_fields
rule features that no production rule uses, the
disable_schema_emission gate, default_subtype_key,
articulation_registry_path.
Exclusion-gate collapse: iter_category_attrs grows an extra_excl
kwarg; per-call inline if/continue chains collapse into a single
parameter. _CANON_MJS_FIELDS lifts inline next to its sole
consumer in _check_mjs_struct_field_drift.
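The exclusion-gate collapse can be pictured with a sketch like the one below. The parameter list is an assumption; only the names iter_category_attrs and extra_excl come from the commit message.

```python
from typing import Iterable, Iterator


def iter_category_attrs(
    attrs: Iterable[str],
    exclude_attrs: Iterable[str],
    canon_absorbed: Iterable[str],
    extra_excl: Iterable[str] = (),
) -> Iterator[str]:
    """Yield attrs that pass the standard gate plus any per-call exclusions."""
    blocked = set(exclude_attrs) | set(canon_absorbed) | set(extra_excl)
    for attr in attrs:
        if attr not in blocked:
            yield attr
```

A call site that used to write `if attr in ("pos", "quat"): continue` inside its loop can pass `extra_excl=("pos", "quat")` instead, so every exclusion goes through one place.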
_check_new_attr_typing also surfaces type_mappings entries whose
UE type is not in _UE_TYPE_INFO (catches typo'd TArray<flot> at
JSON-edit time).
test_behavioral_emit_guarantees pins three semantic properties
that the substring lock-ins didn't cover: xml_enum import/export
roundtrip pairing, mjs_setTo arg order following the C signature,
and bOverride_X / X identifier agreement across every category.
Correctness fixes:
- FMjOptionGenerated::ApplyToSpec gates every write with
  bOverride_X. Previously wrote unconditionally, silently clobbering
  MJCF-XML defaults with UE struct defaults whenever the user had
  not overridden a field. Now matches the existing
  ApplyOverridesToModel semantics.
- Slider-crank actuator transmission misimport: the slidersite
  branch fired before site, then site clobbered the transmission
  type back to Site. Per MJCF, slidersite + site means SliderCrank
  with site as the target. Reorder so slidersite runs last, after
  the main target is captured.
- Unit-conversion + double-vec exports silently lost the
  attr_to_mjs_field remap when mjs_fields was empty (the
  --no-require-introspect path). Apply the per-attr export's
  fallback to both miss sites.
- _phase_objtype_dispatch and _phase_geom_final_type re-read the
  base .cpp from disk even when _phase_categories had already queued
  a FileWrite for the same path; the later phase would inject into
  stale text and apply_writes would overwrite the earlier phase's
  update on the first run after a rule change.
  _inject_tags_into_cpp now uses the pending FileWrite content when
  present and replaces the entry in-place.
- _emit_xml_enum_import sets the override toggle only when a
  value_map branch actually matched. A typo'd XML enum value (e.g.
  type="hindge") no longer silently masks itself by setting the
  toggle on the UE-default enum value.

Latent hazards closed:
- test_ue_type_registry monkey-patched _UE_TYPE_INFO onto the gen
  module at import time, leaking into every subsequent test.
  Switched to direct imports from _codegen_core.
- _audit_banner_safety only diagnosed; apply_writes would still
  destroy hand-edited content before the --strict gate fired. main()
  now snapshots banner_overwrite diagnostics before apply_writes and
  refuses to write in strict mode.
- apply_writes stages to <path>.codegen.tmp + os.replace so a Ctrl-C
  between truncate and write doesn't leave a partial file UBT would
  pick up.
- Synthetic mirror 'count' parsing accepts both str and int; it was
  brittle to a future mjxmacro snapshot shape change.
- Synthetic ApplyTo inline uses static_cast<decltype(X)>(V) instead
  of C-style (decltype(X))V so narrowing warnings surface at
  compile.
- New contract test rejects 'inf' / 'nan' / hex / empty strings in
  setto_param_defaults at JSON-edit time (values land verbatim in
  generated C++).
- Extend value_map shape contract to reject empty maps and
  empty-string xml keys (would silently set the override toggle
  without ever matching the enum).

Hygiene + perf:
- inject_between_tags caches the compiled regex per tag (was
  re.compile-ing on every call).
- Drop spurious 'f' suffix on double-typed mjOption defaults
  (Timestep, Impratio, Tolerance, NoslipTolerance, CCD_Tolerance);
  these were triggering -Wnarrowing.
- Stale docstring + comment fixes: snapshot paths updated, retired
  build_mjspec_snapshot.py references replaced.
- test_synthetic_categories uses _reset_diags() instead of raw
  _DIAGS_BUFFER.pending.clear() so fired_count is zeroed too.

Refactor: a single _inject_or_diag helper replaces six hand-rolled
"try inject_between_tags then if-not-ok diag" stanzas across
emit_subclass_files, emit_base_class_injection, emit_multi_uclass,
and emit_bind_h_injection (~60 LoC of boilerplate removed).

Rename: elem_rule (singular) -> elem_rules (plural) in
_codegen_checks.py to match the orchestrator's convention.
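The setto_param_defaults contract mentioned above could be approximated with a validator like this one. It is a sketch under a stated assumption: that every default is a plain decimal literal, since the value is pasted verbatim into generated C++. The regex and the helper name are illustrative, not the test's actual implementation.

```python
import re

# Optional sign, digits with an optional fraction (or a bare fraction),
# and an optional exponent. Deliberately excludes inf, nan, and hex.
_DECIMAL_LITERAL = re.compile(r"[+-]?(\d+(\.\d*)?|\.\d+)([eE][+-]?\d+)?")


def is_safe_default(value: str) -> bool:
    """Return True when value can be emitted as a C++ numeric literal."""
    return bool(_DECIMAL_LITERAL.fullmatch(value))
```

Python's own `float()` would accept "inf" and "nan", which is why a literal-shaped pattern is used here rather than a parse-and-see check.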
Summary
Replace URLab's hand-rolled mjsX import / spec-write code with a
snapshot-driven codegen under Scripts/codegen/. A rules JSON plus
four snapshots (MJCF schema, mjsX struct fields, mjxmacro pointer
tables, libclang AST scrape) drive a generator that emits UE C++
between explicit CODEGEN_* markers in component files.
URLab.Build.cs runs generate_ue_components.py --check on every
editor compile, so hand-edits inside marker regions surface as
build failures rather than silent drift.
Components moved to codegen ownership: Geom, Site, Joint, Body,
Frame, Inertial, Camera, Sensor, Actuator, Equality, Tendon,
ContactPair, ContactExclude, Keyframe, Flexcomp, and the seven
view structs in MjBind.h.
Five commits; per-chunk detail in each commit message.
Motivation
Replace hand-maintained mirror code so that adding a new mjsX field
is a JSON edit, and so that drift between MuJoCo's C API and
URLab's mirror shows up at compile time rather than at runtime.
Linked issue
None.
Build + test evidence
pytest Scripts/codegen/tests also green at 281/281.
Manual verification steps
actuators, sensors), confirm the Details panel hides Pos / Quat
on geom and site (transform widget owns the value), export back
to MJCF, diff for round-trip stability.
site= and slidersite= should land as TransmissionType=SliderCrank
with site as TargetName (was misimported as Site).
sets timestep=0.001; with bOverride_Timestep left off, ApplyToSpec
must preserve 0.001 instead of clobbering it with the UE struct's
default.
Checklist