Refactor: rename enable_profiling -> enable_l2_swimlane; extend diagnostics bitmask by ChaoZheng109 · Pull Request #652 · hw-native-sys/simpler

ChaoZheng109 · 2026-04-22T08:43:52Z

Fixes #641

Summary

Introduce a three-layer naming scheme that cleanly separates the user-facing feature flag, the internal collection implementation, and the on-disk artifact:

| Layer | Prefix | Scope |
| --- | --- | --- |
| User-facing flag | `l2_swimlane` | pytest CLI `--enable-l2-swimlane`; Python kwarg; `ChipCallConfig` / `Runtime` field `enable_l2_swimlane`; mailbox offsets `_OFF_ENABLE_L2_SWIMLANE` / `MAILBOX_OFF_ENABLE_L2_SWIMLANE`; bitmask bit `PROFILING_FLAG_L2_SWIMLANE` |
| Internal implementation | `l2_perf` | class `L2PerfCollector`; files `l2_perf_collector{,_aicpu}.{h,cpp}`, `l2_perf_collector_aicore.h`, and `l2_perf_profiling.h`; function prefixes `l2_perf_aicpu_` / `l2_perf_aicore_`; data types `L2PerfRecord` / `L2PerfBuffer` / `L2PerfSetupHeader` / `L2PerfDataHeader` / `L2PerfFreeQueue` and all `L2PerfCallback` typedefs; scheduler counters `SchedL2PerfCounters` and member `sched_l2_perf_[]`; scheduler locals `l2_perf.l2_perf_enabled` / `l2_perf.phase_`; runtime fields `l2_perf_data_base` / `l2_perf_records_addr` / `l2_perf_buffer_status` |
| On-disk artifact | `l2_perf_records` | output JSON `l2_perf_records_<ts>.json`; per-subprocess subdir `outputs/l2_perf_records_<tag>/`; environment variable `SIMPLER_L2_PERF_RECORDS_OUTPUT_DIR`; tools CLI `--l2-perf-records-json`; Python helpers `flatten_l2_perf_records_subdirs` / `_snapshot_l2_perf_records_files` / `auto_select_l2_perf_records_json` |

The a2a3 and a5 backends are kept in lockstep: identical symbols and filenames, differing only by architecture subdirectory.

Motivation

Today the front-end uses enable_profiling to mean the perf swimlane feature alone, while dump_tensor and pmu are parallel one-off flags. This makes "profiling" mean two different things: an umbrella concept at the product level, and a perf-only sub-feature at the API surface. This PR makes the L2 swimlane an explicit sub-feature parallel to dump_tensor and pmu, and disentangles the naming so that a reader can tell at a glance whether a given identifier refers to the user toggle, the collection code, or the raw records on disk.

Changes

Feature flag rename (hard rename, no legacy alias)

Rename enable_profiling → enable_l2_swimlane end-to-end: pytest CLI --enable-l2-swimlane, Python kwargs, nanobind binding, ChipCallConfig field, C ABI parameter, mailbox offset, and Runtime struct field.
Extend the existing enable_profiling_flag umbrella bitmask so each sub-feature owns one bit: bit 0 = dump_tensor (unchanged), bit 1 = l2_swimlane (new), bit 2 = pmu (renumbered from bit 1 on a2a3 for cross-arch consistency; reserved on a5).
Wire the new l2_swimlane bit through every device_runner that publishes the bitmask to AICore handshakes (a5 and a2a3, sim and onboard).
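The one-bit-per-sub-feature layout described above can be sketched in Python as follows. This is a minimal illustration, not the project's code: the actual bitmask is published by each device_runner to the AICore handshakes. Only the bit positions and the existence of `PROFILING_FLAG_L2_SWIMLANE` come from this PR; the names `ProfilingFlag`, `pack_profiling_flag`, and `unpack_profiling_flag` are hypothetical.

```python
from enum import IntFlag


class ProfilingFlag(IntFlag):
    """Hypothetical Python mirror of the enable_profiling_flag umbrella bitmask."""

    DUMP_TENSOR = 1 << 0  # bit 0: dump_tensor (unchanged)
    L2_SWIMLANE = 1 << 1  # bit 1: l2_swimlane (new; cf. PROFILING_FLAG_L2_SWIMLANE)
    PMU = 1 << 2          # bit 2: pmu (renumbered from bit 1 on a2a3; reserved on a5)


def pack_profiling_flag(dump_tensor: bool, l2_swimlane: bool, pmu: bool) -> int:
    """Combine the three sub-feature toggles into one integer bitmask."""
    flag = ProfilingFlag(0)
    if dump_tensor:
        flag |= ProfilingFlag.DUMP_TENSOR
    if l2_swimlane:
        flag |= ProfilingFlag.L2_SWIMLANE
    if pmu:
        flag |= ProfilingFlag.PMU
    return int(flag)


def unpack_profiling_flag(value: int) -> dict:
    """Read each sub-feature back out of the bitmask, one bit per feature."""
    return {
        "dump_tensor": bool(value & ProfilingFlag.DUMP_TENSOR),
        "l2_swimlane": bool(value & ProfilingFlag.L2_SWIMLANE),
        "pmu": bool(value & ProfilingFlag.PMU),
    }
```

For example, `pack_profiling_flag(False, True, True)` yields `6` (`0b110`). Because each sub-feature owns its own bit, a reader on the device side only needs to test one bit per feature, and the three toggles never alias each other.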

Collector refactor

Rename the host-side class PerformanceCollector → L2PerfCollector and move files: performance_collector.{h,cpp} → l2_perf_collector.{h,cpp}, performance_collector_aicpu.{h,cpp} → l2_perf_collector_aicpu.{h,cpp}, performance_collector_aicore.h → l2_perf_collector_aicore.h, common/perf_profiling.h → common/l2_perf_profiling.h. All include paths and header guards updated.
Rename AICPU/AICore function prefixes perf_aicpu_* → l2_perf_aicpu_*, perf_aicore_* → l2_perf_aicore_*.
Rename all data types exposed by the collector: PerfRecord → L2PerfRecord, PerfBuffer → L2PerfBuffer, PerfSetupHeader → L2PerfSetupHeader, PerfDataHeader → L2PerfDataHeader, PerfFreeQueue → L2PerfFreeQueue, and every Perf*Callback typedef → L2Perf*Callback.
Rename Runtime fields that carry collector state: perf_data_base → l2_perf_data_base, perf_records_addr → l2_perf_records_addr, perf_buffer_status → l2_perf_buffer_status (the last is flagged for removal in a follow-up but renamed here for consistency).
Rename scheduler-side profiling counters: struct SchedProfilingCounters → SchedL2PerfCounters, SchedulerContext member sched_perf_[] → sched_l2_perf_[], field profiling_enabled → l2_perf_enabled, and local alias auto &perf = sched_l2_perf_[tid] → auto &l2_perf = ... across scheduler_dispatch.cpp, scheduler_completion.cpp, and scheduler_cold_path.cpp (≈80 occurrences).

On-disk artifacts

Rename the runtime's output file prefix perf_swimlane_*.json → l2_perf_records_*.json (the file contains raw per-task records; the swimlane visualization is produced downstream by swimlane_converter.py as merged_swimlane_*.json).
Rename the environment variable SIMPLER_PERF_OUTPUT_DIR → SIMPLER_L2_PERF_RECORDS_OUTPUT_DIR and the per-subprocess output subdirectory prefix outputs/perf_* → outputs/l2_perf_records_*.
Rename the test dispatcher helpers accordingly: flatten_perf_subdirs → flatten_l2_perf_records_subdirs, _snapshot_perf_files → _snapshot_l2_perf_records_files, _wait_new_perf_file → _wait_new_l2_perf_records_file.
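How the renamed environment variable, subdirectory prefix, and file prefix might fit together can be sketched as below. This is an assumption-laden illustration: the variable name and both prefixes come from this PR, but the fallback directory `outputs`, the timestamp format, and the helper names `subprocess_records_dir` / `records_json_path` are hypothetical. The sketch also assumes that a dispatcher points `SIMPLER_L2_PERF_RECORDS_OUTPUT_DIR` at a per-subprocess subdirectory before launching each run.

```python
import os
import time
from pathlib import Path
from typing import Mapping, Optional

# Names from this PR.
ENV_VAR = "SIMPLER_L2_PERF_RECORDS_OUTPUT_DIR"
FILE_PREFIX = "l2_perf_records_"
# Assumption: fallback directory when the environment variable is unset.
FALLBACK_DIR = "outputs"


def subprocess_records_dir(root: Path, tag: str) -> Path:
    """Per-subprocess subdirectory, e.g. outputs/l2_perf_records_<tag>/."""
    return root / f"{FILE_PREFIX}{tag}"


def records_json_path(env: Mapping[str, str] = os.environ, ts: Optional[str] = None) -> Path:
    """Where a run would write its raw l2_perf_records_<ts>.json file."""
    out_dir = Path(env.get(ENV_VAR, FALLBACK_DIR))
    if ts is None:
        ts = time.strftime("%Y%m%d_%H%M%S")  # hypothetical timestamp format
    return out_dir / f"{FILE_PREFIX}{ts}.json"
```

A dispatcher would set `env[ENV_VAR] = str(subprocess_records_dir(Path("outputs"), "case0"))` for one subprocess; that run's records would then land under `outputs/l2_perf_records_case0/`, where a flattening step can collect them.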

Tools

Update tools/swimlane_converter.py, tools/perf_to_mermaid.py, tools/sched_overhead_analysis.py, tools/device_log_resolver.py, and tools/README.md: CLI argument --perf-json → --l2-perf-records-json, internal function auto_select_perf_json → auto_select_l2_perf_records_json, and all references to the old file-name prefix.
Drive-by: fix a small number of pre-existing lint issues surfaced by touching these files (missing copyright header, E501 overflows, F841 unused-variable, one max(key=dict.get) pyright complaint) and several pre-existing markdownlint MD060/MD033 violations in tools/README.md.
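This PR names `auto_select_l2_perf_records_json` but does not show its body. As a hedged sketch of what such an auto-selection step could look like, assuming it picks the newest raw records file in a directory, here is a hypothetical stand-in named `pick_latest_l2_perf_records_json`. The glob follows the `l2_perf_records_<ts>.json` naming. Downstream `merged_swimlane_*.json` files use a different prefix, so they are never selected by mistake.

```python
from pathlib import Path
from typing import Optional


def pick_latest_l2_perf_records_json(directory: Path) -> Optional[Path]:
    """Return the most recently modified l2_perf_records_*.json file, or None.

    Hypothetical stand-in; not the repository's auto_select_l2_perf_records_json.
    """
    # Skip anything that is not a regular file (e.g. a subdirectory whose name matches).
    files = [p for p in directory.glob("l2_perf_records_*.json") if p.is_file()]
    # Break modification-time ties by file name so the choice is deterministic.
    return max(files, key=lambda p: (p.stat().st_mtime, p.name), default=None)
```

Keying on modification time rather than on the timestamp embedded in the name avoids having to parse a timestamp format that this sketch does not know.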

Docs

Update docs/testing.md, docs/task-flow.md, docs/profiling-name-map.md, the per-runtime RUNTIME_LOGIC and profiling_levels pages, and tools/README.md with the new umbrella/sub-feature story and renamed identifiers.
Add a one-line umbrella note where appropriate: "Profiling is the umbrella; the three sub-features are --enable-l2-swimlane, --dump-tensor, --enable-pmu."

Tests

Add a ChipCallConfig round-trip unit test that asserts all three diagnostics sub-feature flags travel together through the nanobind binding, guarding against drift where only two of the three fields get plumbed.
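The drift this test guards against can be illustrated without the nanobind module. In the sketch below, `ChipCallConfigStandIn`, `through_binding`, `drifted_binding`, and `round_trips` are hypothetical stand-ins. Of the field names, only `enable_l2_swimlane` is confirmed by this PR; `enable_dump_tensor` and `enable_pmu` are assumptions.

```python
import itertools
from dataclasses import dataclass, fields


@dataclass
class ChipCallConfigStandIn:
    enable_l2_swimlane: bool = False
    enable_dump_tensor: bool = False  # hypothetical field name
    enable_pmu: bool = False          # hypothetical field name


def through_binding(cfg: ChipCallConfigStandIn) -> ChipCallConfigStandIn:
    """Stand-in for crossing the binding: rebuild the config from every field."""
    return ChipCallConfigStandIn(**{f.name: getattr(cfg, f.name) for f in fields(cfg)})


def drifted_binding(cfg: ChipCallConfigStandIn) -> ChipCallConfigStandIn:
    """Simulated drift: only two of the three sub-feature fields are plumbed."""
    return ChipCallConfigStandIn(
        enable_l2_swimlane=cfg.enable_l2_swimlane,
        enable_dump_tensor=cfg.enable_dump_tensor,
    )


def round_trips(binding) -> bool:
    """True if every on/off combination of the three flags survives the binding."""
    for combo in itertools.product([False, True], repeat=3):
        cfg = ChipCallConfigStandIn(*combo)
        if binding(cfg) != cfg:
            return False
    return True
```

Here `round_trips(through_binding)` holds and `round_trips(drifted_binding)` does not. Checking all eight combinations, rather than a single all-enabled config, also catches a binding that swaps two fields, which an all-`True` check would miss.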

Test plan

pip install --no-build-isolation -e . builds cleanly on both a2a3 and a5
pytest tests/ut/py -x — 217 passed (one pre-existing unrelated failure in test_scene_test_cache.py on upstream/main, logged to local KNOWN_ISSUES.md)
Grep gate clean: rg '\benable_profiling\b' src/ python/ simpler_setup/ tests/ examples/ docs/ tools/ conftest.py returns no hits outside the intentional umbrella name enable_profiling_flag
Local pre-commit clean on the full diff (clang-format, clang-tidy, cpplint, ruff check, ruff format, pyright, markdownlint, check-headers, check-english-only)
Simulation scene test with --enable-l2-swimlane (the runner sandbox has no simulator access; please verify in CI)
Hardware smoke run combining --enable-l2-swimlane, --dump-tensor, and --enable-pmu (nice-to-have; simulation coverage is the primary gate)
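The grep gate depends on word-boundary semantics. The hypothetical `stale_hits` helper below is not part of the repository; it applies the same pattern with Python's `re` to make that behaviour explicit. Both Python's `re` and ripgrep treat `_` as a word character, so `\b` does not fire between `enable_profiling` and `_flag`, and the pattern excludes the umbrella name `enable_profiling_flag` by construction. The pattern targets only the snake_case spelling, so the hyphenated CLI form is outside its scope.

```python
import re

# Same pattern as the rg gate: "_" is a word character, so "\b" does not
# match inside "enable_profiling_flag".
STALE_FLAG = re.compile(r"\benable_profiling\b")


def stale_hits(text: str) -> list[int]:
    """Return 1-based line numbers that still use the old flag name."""
    return [n for n, line in enumerate(text.splitlines(), start=1) if STALE_FLAG.search(line)]
```

Given the three lines `x.enable_profiling_flag = 1`, `cfg.enable_profiling = True`, and `--enable-profiling`, only line 2 is reported.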

gemini-code-assist

Code Review

This pull request renames the --enable-profiling CLI option and its associated internal flags to --enable-perf to clarify its role as a sub-feature of the broader profiling diagnostics umbrella. The changes span across Python test configurations, documentation, C++ runtime headers, and device-side executors. Feedback highlights missing logic in the Python worker's bootstrap loop for unpacking and passing the new diagnostic flags. Additionally, it is suggested to read the performance flag from the handshake bitmask in several AICore executors to maintain consistency with other diagnostic features.

…nable_l2_swimlane

Fixes hw-native-sys#641

Today the front-end uses `enable_profiling` to mean perf swimlane only, while `dump_tensor` and `pmu` are parallel one-off flags. This makes "profiling" mean two different things: an umbrella concept at the product level vs. perf-only at the API surface.

Make L2 swimlane an explicit sub-feature parallel to dump_tensor and pmu:

- Rename `enable_profiling` -> `enable_l2_swimlane` end-to-end: pytest CLI (`--enable-l2-swimlane`), Python kwargs, nanobind binding, ChipCallConfig field, C ABI param, mailbox offset, runtime struct field. No legacy alias.
- Extend the existing `enable_profiling_flag` umbrella bitmask so each sub-feature owns one bit: bit0=dump_tensor (unchanged), bit1=l2_swimlane (new), bit2=pmu (renumbered from bit1 on a2a3 for cross-arch consistency; reserved on a5).
- Wire the new l2_swimlane bit through every device_runner that publishes the bitmask to handshakes (a5/a2a3 sim+onboard).
- Rename output artifacts and helpers: `perf_swimlane_*.json` -> `l2_swimlane_*.json`; env var `SIMPLER_PERF_OUTPUT_DIR` -> `SIMPLER_L2_SWIMLANE_OUTPUT_DIR`; per-subprocess output subdir prefix `outputs/perf_*` -> `outputs/l2_swimlane_*`.
- Update docs (testing, task-flow, profiling-name-map, tensor-dump, RUNTIME_LOGIC) and add a one-line umbrella note: "Profiling is the umbrella; the three sub-features are --enable-l2-swimlane, --dump-tensor, --enable-pmu."
- Add a ChipCallConfig round-trip test guarding against drift where only two of the three sub-features are plumbed.

gemini-code-assist Bot reviewed Apr 22, 2026


ChaoZheng109 force-pushed the fix/issue-641-unify-profiling-abstractions branch 4 times, most recently from d1caf62 to c8b71b6 on April 23, 2026 01:43

poursoul previously approved these changes Apr 23, 2026


ChaoZheng109 mentioned this pull request on Apr 23, 2026 in #658, "Refactor: remove perf state from generic Runtime struct" (Open, 5 tasks)

ChaoZheng109 dismissed poursoul’s stale review via f1ad890 April 23, 2026 03:10

ChaoZheng109 force-pushed the fix/issue-641-unify-profiling-abstractions branch 2 times, most recently from f1ad890 to 642801e on April 23, 2026 06:17

ChaoZheng109 changed the title from ~~Refactor: rename enable_profiling -> enable_perf; extend diagnostics bitmask~~ to Refactor: rename enable_profiling -> enable_l2_swimlane; extend diagnostics bitmask on Apr 23, 2026

ChaoZheng109 force-pushed the fix/issue-641-unify-profiling-abstractions branch 10 times, most recently from 3b08641 to 0be9313 on April 23, 2026 11:03

ChaoZheng109 force-pushed the fix/issue-641-unify-profiling-abstractions branch from 0be9313 to 2e161dd on April 23, 2026 11:28

ChaoWao approved these changes Apr 24, 2026


ChaoWao merged commit 737288d into hw-native-sys:main Apr 24, 2026
14 checks passed

ChaoWao merged 1 commit into hw-native-sys:main from ChaoZheng109:fix/issue-641-unify-profiling-abstractions