Skip to content

perf (alarm): fix O(N²·log N) table update in AlarmTableUI — closes #3504#3770

Merged
shroffk merged 3 commits intoControlSystemStudio:masterfrom
emilioheredia-source:alarm_handler_fixes
Apr 6, 2026
Merged

perf (alarm): fix O(N²·log N) table update in AlarmTableUI — closes #3504#3770
shroffk merged 3 commits intoControlSystemStudio:masterfrom
emilioheredia-source:alarm_handler_fixes

Conversation

@emilioheredia-source
Copy link
Copy Markdown

perf (alarm): fix O(N²·log N) table update in AlarmTableUI — closes #3504

PR link: https://github.com/emilioheredia-source/phoebus/pull/new/alarm_handler_fixes
Upstream issue: #3504

Summary

Fixes the performance regression reported in #3504, which was independently observed at CLS with ~2000 active alarms. When the alarm table contained a few thousand active alarms the UI became unusable, with multi-second freezes on every update cycle. This PR identifies the root cause as a quadratic-time interaction between JavaFX's ObservableList extractor and a continuously-bound SortedList comparator, and fixes it with a targeted change that reduces update cost from O(N²·log N) to a single O(N·log N) sort. Three headless regression tests are included that prove both the old behaviour and the fix without requiring a live alarm server.


Root cause analysis

AlarmTableUI holds two ObservableList<AlarmInfoRow> fields created with the CHANGING_PROPERTIES extractor:

FXCollections.observableArrayList(AlarmInfoRow.CHANGING_PROPERTIES)

The extractor tells JavaFX to watch all eight SimpleXxxProperty fields of each row (pv, description, severity, status, time, value, pv_severity, pv_status) and fire a list-change event whenever any of them changes. This is intentional: it keeps the SortedList wrapper up to date when an alarm's severity changes in place.

Each TableView wraps the raw list in a SortedList whose comparator is bound to the table's comparatorProperty:

final SortedList<AlarmInfoRow> sorted = new SortedList<>(rows);
sorted.comparatorProperty().bind(table.comparatorProperty());

The SortedList responds to every list-change event — including the item-property events from the extractor — by performing a full sort.

update() calls AlarmInfoRow.copy() on each existing row to move in the new data. copy() calls set() on each of the eight properties. Every set() fires a list-change event through the extractor. Every list-change event fires a full sort.

Total comparator invocations per update() call:

Factor Value
Properties set per copy() 8
Rows in the list N
Property-change events 8 × N
Sort depth per event O(N · log N)
Total O(8 · N² · log N)

At N = 2000 (a normal load) and a comparator cost of ~1 µs: roughly 2000² × log(2000) × 8 × 1 µs ≈ 180 seconds of comparator work per update cycle. The profiler screenshot in #3504 shows SortedList.sort() as the dominant hot-spot, consistent with this analysis.


Fix

The fix is in AlarmTableUI.update(). Before the copy loop, the SortedList comparator is unbound and set to null. With a null comparator the SortedList mirrors the source in insertion order and ignores all item-property events. After the loop completes, the comparator is rebound in a finally block — producing exactly one O(N · log N) sort over all N updated rows.

// Unbind so property-change events during copy() do not trigger re-sorts.
sorted.comparatorProperty().unbind();
sorted.setComparator(null);

try
{
    for (int i=0; i<N; ++i)
        items.get(i).copy(input.get(i));
    // ... add / trim logic unchanged ...
}
finally
{
    // Always rebind — even if an exception occurs — so the table is never
    // left with a dead comparator.
    sorted.comparatorProperty().bind(table.comparatorProperty());
}

The finally block is the key safety guarantee: if any exception is thrown inside the try (e.g. an IndexOutOfBoundsException from a concurrent modification, or a comparator throwing on malformed data), the sorted list is always left with a live comparator bound to the table. Without it, the table would silently stop sorting for the remainder of the session.

Updated complexity:

Step Cost
Null-comparator copy loop (8 × N set() calls, no sort) O(N)
Single final rebind + sort O(N · log N)
Total O(N · log N)

Regression tests

Three headless JUnit 5 tests in app/alarm/ui/src/test/…/AlarmTableUpdatePerformanceTest.java cover the fix without requiring a live alarm server or a JavaFX display (only javafx.base classes are used):

testOldBehaviorTriggersExcessiveSorts

Constructs N rows with an active comparator and runs the old code path (comparator present throughout copy() loop). Asserts that comparator invocations exceed the cost of a single sort (i.e. proves the bug). At N = 150: 152,060 comparator calls vs. an upper bound of 3,000 for one sort.

testFixSuppressesIntermediateSortsAndPreservesOrder

Runs the fixed code path (null → copy → rebind). Asserts that comparator invocations stay within one sort's cost and that the final list is in the correct ascending order. At N = 150: 696 comparator calls, final order verified at every index.

testFixOutperformsOldCodeAndProducesSameOrder

Runs both code paths on identical input and asserts at least 5× fewer comparator invocations in the fixed path (actual: 218×), and that both paths produce identical final row order.


Files changed

File Change
app/alarm/ui/…/AlarmTableUI.java update(): unbind comparator before copy loop, rebind in finally
app/alarm/ui/…/AlarmTableUpdatePerformanceTest.java New: three regression tests

Testing notes

What has been run

  • All existing alarm UI unit tests pass: AlarmTableModelTest, AlarmTreeValidatePathTest, AlarmURITest, AreaFilterTest.
  • The three new regression tests pass and quantitatively prove both the bug and the fix (152,060 comparator calls → 696; 218× reduction at N=150).
  • No live Kafka broker or alarm server is required to run the new tests; they use only javafx.base headless data classes.

End-to-end testing

End-to-end validation against a live alarm server has not yet been performed. We are currently setting up a local test instance of the Phoebus alarm backend to run a smoke-test with synthetic alarms at scale; this PR will be updated once that test is conducted.

In the meantime, the risk of the current change introducing a regression is low: the fix is confined to a single private method, the finally block guarantees the SortedList comparator is always restored regardless of any exception, and the only externally observable side-effect is that column sort order is applied once per update cycle instead of once per property-change event — which is both correct and the intended behaviour. We therefore opted to submit the fix now with the analytical and unit-test validation already in place rather than hold it until the server setup is complete.


With N active alarms the update() method called copy() on each AlarmInfoRow,
firing 8 property-change events per row. Each event triggered a full re-sort
of the SortedList (via the CHANGING_PROPERTIES extractor), resulting in
O(N²·log N) comparator work — measured as multi-second freezes at ~2000 rows
(issue ControlSystemStudio#3504).

Fix: temporarily unbind the SortedList comparator before the copy loop and
set it to null so property-change events are no-ops. On finally rebind to
table.comparatorProperty(), which performs exactly one O(N·log N) sort over
all updated rows.

Fixes ControlSystemStudio#3504
…ate()

Three headless JUnit tests (no JavaFX display required) that:

1. testOldBehaviorTriggersExcessiveSorts
   — proves that with the CHANGING_PROPERTIES extractor and an active
   comparator, calling copy() on N rows fires O(8N * N log N) comparator
   invocations (152,060 calls for N=150).

2. testFixSuppressesIntermediateSortsAndPreservesOrder
   — proves that setting the comparator to null before the loop and
   restoring it once after reduces calls to O(N log N) (696 calls for
   N=150) and that the final sort order is correct.

3. testFixOutperformsOldCodeAndProducesSameOrder
   — end-to-end comparison: asserts at least 5x fewer comparisons
   (actual: 218x) and that both paths produce the same sorted order.

Accompanies the fix in AlarmTableUI.java (issue ControlSystemStudio#3504).
- Remove 'public' modifier from class and test methods (java:S5786 —
  JUnit 5 does not require public visibility)
- Add assertThat(sorted.size(), equalTo(N)) in testOldBehaviorTriggersExcessiveSorts
  to resolve the unused 'sorted' variable warning (java:S1481) while
  also adding a meaningful assertion on the sorted view size
@sonarqubecloud
Copy link
Copy Markdown

sonarqubecloud bot commented Apr 4, 2026

@shroffk shroffk merged commit fea8088 into ControlSystemStudio:master Apr 6, 2026
4 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Phoebus drastically slows down when alarm table contains a few thousand updating alarms.

4 participants