Update alphabet definition#3
Conversation
From socketOutputAction elements. Connecting to multiple sockets currently is not supported
For group elements, since it is necessary to assign colors to the group itself and all nodes it contains.
From node to group elements, since it is intended for features like the "slow control box" in the original Dasher. Specifying a speedFactor for nodes could be added as well, but probably is rarely needed
Since AT-SPI is just one accessibility interface, and others will be needed on other platforms
As a simpler, more specialized alternative
Intended to be able to specify the representation of nodes that do not correspond to text characters in the training data
Since it is not needed. The label of a node can be specified using the label attribute, the representation in the training data can be specified using the trainingUnicode attribute and the text to output is determined by the textCharAction(s) inside the node
Intended to be able to specify that a node should only be shown when interacting with a certain application. However, this feature is not available in any implementation yet
Simplifying the definition by removing the possibility to specify multiple keys per attribute. This option was redundant since the same behaviour can be achieved by using multiple actions (which also is a lot more flexible). Additionally switched from keyboard scancodes to Java virtual key codes since these are more platform-independent and easier to use
There was a problem hiding this comment.
Commit 5514291:
While the scheme with using multiple attributes/actions is in principle also possible, I think is it way less readable. In which case is it more flexible?
Also, I would like to still use key codes. As far as I can see the two definitions [1, 2] match anyway, but I would like to have a more generally applicable definition that is not based on Java.
[1] https://docs.oracle.com/en/java/javase/16/docs/api/constant-values.html#java.awt.event.KeyEvent.VK_AT
[2] https://www.foreui.com/articles/Key_Code_Table.htm
PapeCoding
left a comment
There was a problem hiding this comment.
Commit 2483c88:
This is very inherently platform dependent and changes the structure of the Dasher tree dynamically. In the current implementation it will probably make a lot of problems, which should not hold us back from standardizing it.
Hiding nodes should actually be quite easy by returning e.g. returning 0 in Range() https://github.com/PapeCoding/DasherCore/blob/master/Src/DasherCore/DasherNode.h#L323
There was a problem hiding this comment.
I would rather have the more general action if it can fulfill the same purpose. Actually is already implemented in https://github.com/PapeCoding/DasherCore/blob/feature/cleanup_and_refactor/Src/DasherCore/Alphabet/AlphIO.cpp#L329-L342
PapeCoding
left a comment
There was a problem hiding this comment.
Commit 17d428d:
Currently this is implemented for nodes, not for group nodes. I see why group nodes would be the better choice. Can probably be adjusted easily though.
PapeCoding
left a comment
There was a problem hiding this comment.
Commit cd2c8ef:
I don't see why this should be optional. Something should be shown. If nothing is supposed to be shown, I would probably rather force the user to input an empty string.
This represents my subjective opinion though.
PapeCoding
left a comment
There was a problem hiding this comment.
Commit 8e38b2f:
Yeah. Good change! Currently, it defaults to "undefined color" (https://github.com/PapeCoding/DasherCore/blob/feature/cleanup_and_refactor/Src/DasherCore/ColorPalette.cpp#L98)
PapeCoding
left a comment
There was a problem hiding this comment.
Commit 19438ca:
Might currently not be supported in DasherJava nor DasherCore, but I like to have the opportunity of keeping it in here for the upcoming implementation of this feature.
The example in the previous version of the definition suggests Multiple actions per node are supported anyway, so the comma-separated lists are redundant. I therefore removed this feature to keep the alphabet definition simple, but I would also be fine with re-adding it.
That's fair and I am definitely fine with changing to another set of keycodes (should discuss which one), as long as it is also based on characters or logical meanings and not on physical locations of the keys. Keyboard scancodes, as originally suggested (by me admittedly...), should not be used here because they do not consider the system keyboard layout, while application keyboard shortcuts usually do. We actually have a nice (or not so nice) example in German, where a node labeled "Undo" that was meant to execute |
Ok, I didn't know you implemented it already. If we decide to leave the general action in, we should definitely attach a comprehensive list of settings, their possible values and information on when changes take effect (Immediately? When the next frame is rendered? When the alphabet is changed and the language model is rebuilt? After restarting Dasher?). I kinda fear that adding such a general action will cause lots of inconsistencies, unclarities and edge cases and will end up being more harmful than helpful... Our implementations would need to support lots of cases that are valid according to our definition, but difficult to implement and not useful whatsoever in practice. Most of these are certainly manageable, but I don't think they are worth the effort for now. Which changes are supported by your implementation for now? Can it swap out the language model or input filter at runtime (while writing) for example? |
Why though? I'd say we can just assume an empty string if the attribute is missing. Nodes might not need a label if they are recognizable by a strong color for example. That's exactly the reason why group labels are optional. |
Ok, that's fine I guess. You're right, it could become an "output channel selector" in the future (but then we may need to make it required). |
Okay, I see. I still don't really like the solution using attributes, especially since their order is theoretically not guaranteed by the XML standard (see https://stackoverflow.com/questions/33746224/in-xml-is-the-attribute-order-important)
Very good example actually. Never thought about that exact case 😅 . If you have a good suggestion of a keymap then please let me know. Then we can decide and put it in here. |
I tested some settings. E.g. switching the alphabet works. I think I tested the input method as well. I would still suggest keeping the feature in and patching non-working settings if they are needed by somebody. Maybe it is advisable to keep a list of the working ones or explicitly testing some and then whitelisting those internally. |
True. I guess we could avoid this problem by requiring that only one of the
Will do, but I don't have a good suggestion right now... We definitely need something that guarantees fixed values across all platforms and keyboard layouts (which the Java virtual key codes I'm currently using do NOT, according to the documentation). |
Keeping a list and proper documentation of all supported settings is crucial if we want to avoid lots of "try it and see if it works" situations in my opinion. Agreeing on a common list of settings that are supported by both implementations will probably be difficult though. |
That would be fine with me, while probably making things more readable. Would even be supported with the current implementation, but I still don't really like the solution. Maybe I have to look into how other software packages deal with the same thing. I would like a solution like you are writing above (
If we can't find any good solution, making a fixed definition would probably be fine. It is not like everyone is constantly adjusting their alphabets. And if I want to change it I could look up some hex values (or do it with a fancy UI) from whatever doc is given. |
I would not have a big problem with leaving it "as is" and support everyone in getting specific things working if needed. But maybe a list of previously tested ones would be helpful. |
Ok, then we should probably use this approach of allowing only one of the attributes per action. We could also use names instead of numbers to specify the keys, which would get closer to what you suggested. <keyboardAction press="Ctrl"/>
<keyboardAction key="Y"/>
<keyboardAction release="Ctrl"/>Allowing "plus-separated lists" probably only makes sense in the <keyboardAction key="Ctrl+Y"/>The definition doesn't sound very intuitive to me, but the end result actually is (for common use cases with modifiers). Should we use this approach? It looks quite good in my opinion, but it also means that we have to come up with a list of names for keys. |
I honestly really don't like the idea of giving users such flexible and generic options when the use cases are quite limited and we ourselves don't actually know what works and what doesn't... Some use cases I can think of right now:
Do you think that there are significantly more? Depending on how many we find, I'd vote for either creating individual actions for these or creating a whitelist of names of settings that may be changed. But I would really like to avoid letting the users figure out what works and what doesn't. |
Tier 1 items #1 (Stage 2) and #3 from codebase review. DemoFilter was disabled in 2007 ("WIP Temporary as too many segfaults!") and never re-enabled. Its only live references were a vestigial #include and friend declaration in GameModule.h (never used in the body) and a commented-out RegisterModule line. Removed all three. The DemoFilter was the sole consumer of three LP_DEMO_* parameters (Spring, NoiseMem, NoiseMag). The five LP_SOCKET_* parameters drove the also-deleted CSocketInput/CSocketInputBase network input that was excluded from the build in the previous commit. All eight parameters are removed from: - settings_manifest.json (canonical source, -128 lines) - Parameters.cpp/.h (regenerated via Scripts/generate_parameters.py) - strings_en.json (-16 English label/description strings) Localization strings in the other ~30 Strings/strings_*.json files are intentionally left; orphan translation keys are harmless and a separate mechanical sweep can remove them if desired. Verified: - Clean build (gcc 14.2 / clang 19.1) - clang-format --dry-run --Werror passes on all changed .cpp/.h - All tests pass solo (full-suite flakiness is pre-existing engine non-determinism + the CWD training_*.txt leak, both unrelated) Signed-off-by: will wade <willwade@gmail.com>
….1) Rewrites the signal/slot event system to fix two long-standing issues: 1. Non-deterministic broadcast order: The original unordered_map<void*, fn> iterated in hash-dependent order, which was randomized per-process on Linux (libstdc++). This could cause subtle ordering bugs if multiple subscribers' side effects compound. Switched to std::map for sorted (deterministic) iteration. 2. Manual unsubscribe hazard: The original system required every subscriber to call Unsubscribe(this) before destruction — forgetting caused a use-after-free on the next Broadcast(). Added an RAII Subscription guard that auto-unsubscribes when it goes out of scope. New code can use the keyless Subscribe(fn) overload that returns a Subscription to store as a member. The void* Subscribe(listener, fn) / Unsubscribe(listener) API is preserved exactly — zero changes to the 18 Subscribe and 22 Unsubscribe call sites. Existing callers continue to work unchanged. The keyless Subscribe(fn) overload is opt-in for new code. Documentation: thorough block comments explaining the signal/slot pattern, both usage patterns (legacy void* and RAII), and the design rationale. Verified: 31/31 tests pass. Signed-off-by: will wade <willwade@gmail.com>
The review recommended splitting into "one file per action class" (16 files). Many classes are 3-5 lines each — creating 15+ tiny files would hurt navigation, not help it. Instead, added a class hierarchy ASCII diagram to the file-level comment showing the full architecture: ActionRegistry → ControlAction subclasses → NodeTemplate → CContNode → CControlManager. Also fixed misleading [[maybe_unused]] on SocketOutputAction and ATSPIAction member fields — these classes ARE instantiated (from alphabet XML in AlphIO.cpp), just their execute() methods are no-ops. Added explanatory comments documenting this. No structural change to the code — purely documentation improvement. Verified: control_action tests pass. Signed-off-by: will wade <willwade@gmail.com>
…5) Adds Rule 7: Naming Convention to CONTRIBUTING.md. The convention applies to NEW code only — existing CFoo/m_iFoo legacy naming is explicitly preserved (no bulk renames). Convention for new code: - Classes: PascalCase (not CFoo) - Methods: PascalCase (not snake_case) - Members: snake_case_ trailing underscore (not m_iFoo) - Free functions: snake_case - Files: PascalCase.h / PascalCase.cpp Also documents the existing conventions new contributors will encounter (CFoo, m_iFoo, Hungarian notation) so they understand the codebase's history without being confused by the mix. This formalizes the review's recommendation #20 as a contribution rule. Signed-off-by: will wade <willwade@gmail.com>
* test: migrate to doctest, add RAII cleanup, fix tautological tests
Phase A of the tests-first refactor. No production code changes —
the DasherCore library itself is untouched. Only test infrastructure.
Test framework: home-grown -> doctest 2.4.11
- Vendored as single header in Thirdparty/doctest/
- Per-test isolation: one failing test no longer hides the rest
- JUnit XML output for CI (--reporters=xml)
- Test filtering via -tc / -tce / -ts flags
- Parallel test execution support
- The legacy TEST()/ASSERT()/ASSERT_EQ() macros are preserved as
compatibility wrappers so existing tests work unchanged
Test utilities (tests/test_common.h):
- ScopedTempDir: RAII wrapper that removes the temp dir on destruction,
closing the slow /tmp leak the previous create_isolated_context() had
- ScopedContext: RAII dasher_ctx wrapper (auto-destroy on scope exit)
- run_frames(): single canonical definition with explicit step_ms,
killing the *16 vs *20 inconsistency bug. Each test that previously
used step=20 now passes it explicitly so behavior is preserved
exactly — the inconsistency is now visible at call sites
Tautological test fixes (test_capi.cpp):
- context_creation: previously passed whether the alphabet loaded or
not (just printed a warning on empty). Now asserts the default
alphabet loads after set_screen_size, matching the alphabet test.
- string_override: previously had a 'bool found = false; (void)found;'
block and a loop that did nothing. Now just verifies the override
takes effect and can be cleared — the real contract.
Per-test timeout: 120s
- Tests routinely take 15-60s because dasher_create() loads every
alphabet XML file from Data/. The ctest default of 30s was too tight.
All 22 test executables pass: 100% (was 100% before, just slower to
debug). Net: -570 lines while gaining isolation, parallelism, XML output.
Signed-off-by: will wade <willwade@gmail.com>
* test: Phase B characterization tests (5 new suites, +43 test cases)
Closes the biggest coverage gaps identified in the review. All production
code is untouched; only new test files plus CMakeLists registration.
test_capi_buffer_lifetime.cpp (5 cases)
- Asserts the C API 'valid until next call' pointer-lifetime contract
- Verifies output_text CONTENT survives non-frame API calls
- Confirms each context owns independent buffers (no aliasing)
test_lm_correctness.cpp (14 cases)
- dasher_get_probabilities invariants: normalized to 65536, monotonic,
contiguous, matches dasher_get_root_child_bounds
- dasher_import_training_text: synchronous, empty/garbage are no-ops,
updates persistent LM state (observable after navigation, not
immediately on cached node bounds)
- LP_UNIFORM, LP_LM_MAX_ORDER, LP_LM_ALPHA, BP_LM_ADAPTIVE: round-trip
verification; documents that none immediately re-derive the existing
crosshair node bounds (parameters take effect on next model rebuild)
- Alphabet symbols are 1-indexed; uppercase letters ARE present in the
default English alphabet
test_view_geometry.cpp (10 cases)
- LP_ORIENTATION swap observable via screen_to_dasher on non-square canvas
- All 4 orientations reachable and return sane values
- Root child bounds invariant under orientation (dasher-space bounds are
orientation-independent)
- BP_NONLINEAR_Y default=true; changes mapping at screen edges but not
in the linear band
- LP_GEOMETRY affects X mapping for far-dx points
- End-to-end rendering: rect centroids differ between LR and TB
test_input_filters.cpp (14 cases)
- Before this file: 12 of 14 registered filters had zero coverage,
including every switch-accessible mode
- All 14 filter names registered and switchable via SP_INPUT_FILTER
- Invalid filter name silently falls back (documents existing behavior)
- Setting SP_INPUT_FILTER before realize is lost (CAPI.cpp:724 force-set)
- Per-filter minimal scenario: Normal Control, Press, Smoothing, 1D,
Click, Stylus, Static One Button, One Button Dynamic, Two Button
Dynamic, Two-push Dynamic, Direct, Menu, Alternating Direct, Compass
- Mid-session filter switching does not crash
test_xml_error_paths.cpp (9 cases)
- Before this file: every test used well-formed bundled XML; zero
coverage of malformed input
- Truncated, garbage, empty, and wrong-schema alphabet XML all skipped
without crashing
- Truncated colour XML skipped
- Malformed settings XML does not corrupt state; empty settings is no-op
- Multiple bad files at once are all skipped
Notable findings documented as characterization tests:
1. dasher_get_output_text pointer is NOT stable across successive calls
(tlString reassigned each call) - only the CONTENT is preserved
2. dasher_get_speed_percent default is 50 (LP_MAX_BITRATE=80 / 160 * 100),
NOT 100 as README implies - documentation drift
3. LP_LM_ALPHA, LP_LM_MAX_ORDER, LP_UNIFORM, BP_LM_ADAPTIVE all round-trip
but NONE immediately re-derive existing crosshair node bounds; they
take effect on the next model rebuild
4. SP_INPUT_FILTER force-set of 'Normal Control' during realize is
one-shot; later settings are honored
5. Uppercase letters ARE in the default English alphabet (in a separate
group) - the alphabet is not lowercase-only
Test count: 22 -> 27 executables, ~250+ test cases total.
Signed-off-by: will wade <willwade@gmail.com>
* test: Phase C deeper coverage (4 new suites, end-to-end + perf + property)
Closes the remaining high-value coverage gaps. All production code is
untouched; only new test files, CMakeLists registration, and shared
helpers in test_common.h.
test_control_actions_navigation.cpp (2 cases)
- Closes the explicit gap at test_control_actions.cpp:250-253 which
admitted 'callback won't fire just from parsing — it fires when a
node containing the action is navigated into'
- Verifies BP_CONTROL_MODE adds an extra top-level child to the
crosshair node
- Actually fires the C-side callback by navigating into a custom
control.xml node that references the registered action
- Verifies callback receives correct attributes from control.xml
test_spell_word.cpp (4 cases)
- Documents that true 'spell HELLO' is infeasible from the C API
(group drilling + unknown m_Rootmin/m_Rootmax + no undo), and
provides the strongest feasible alternatives
- Smoke: continuous hover at center-right produces output
- Drilldown: drive into highest-probability top-level child, verify
root child count changes (we descended)
- Realistic session: sweeping trajectory, verify output is
alphabet-only characters
- Determinism: same trajectory across two contexts produces identical
output (foundation for all golden tests)
test_benchmarks.cpp (2 cases)
- PERFORMANCE BASELINE before any Tier 2 PPM refactor
- dasher_frame p99 over 10,000 active-input frames
- Asserts p99 < 100ms and mean < 50ms (generous thresholds; tighten
once baseline is stable across platforms)
- Reports mean/p50/p95/p99/max to stdout for CI trend analysis
- Idle-vs-active comparison (active >= idle, with 1ms tolerance)
test_property_invariants.cpp (4 cases)
- Property test: normalization invariant holds across 15 random
training texts (deterministic xorshift32 seed)
- Stronger: invariant holds after training AND navigation, checked
every 10 frames over 3 trials * 200 frames
- Alphabet symbol enumeration is total (every symbol reachable)
- Per-LM normalization: PPM and CTW normalize correctly; Word and
Mixture currently DO NOT (total_mass = 65398 and 65034 instead of
65536) - real normalization bug, documented as characterization
with strict range checks; tighten to == 65536 once fixed
Refactoring:
- Promoted build_data_dir and write_data_file helpers from
test_xml_error_paths.cpp to test_common.h (now shared by 2 files)
- Removed their local definitions from test_xml_error_paths.cpp
Cleanup:
- Added training_*.txt to .gitignore (engine leaks these to CWD when
contexts are destroyed - real bug, Tier 1 item to fix; for now just
ignore the leaked files)
Notable findings documented as characterization tests:
1. Word and Mixture LMs have a normalization bug - total probability
mass falls short of 65536 (65398 and 65034 respectively). PPM and
CTW normalize correctly. Likely caused by missing uniform-distribution
fallback in those LMs. Front-ends presumably re-normalize on display
so the bug is invisible to users but real.
2. Custom control actions require a control.xml in data_dir that
references the action name - dasher_register_action alone does not
make the action navigable. This is documented C-API behavior but
worth pinning with a test.
Test count: 27 -> 31 executables, ~300+ test cases total.
Signed-off-by: will wade <willwade@gmail.com>
* test: relax property_invariants Word/Mixture bounds for cross-platform
The test asserted Word and Mixture LM total_mass < 65536 as a "known
bug" characterization. On macOS (clang) both LMs are leaky (Word=65398,
Mixture=65034), but on Linux (gcc) the Word LM is non-deterministic:
observed values range 64498..65536 across runs in the same process,
while Mixture consistently loses ~500 units. The original [65000, 65536)
bound failed intermittently on Linux.
Broaden to [60000, 65536] inclusive — accepts the observed variance
while still catching a catastrophic regression. Comment updated with
the cross-platform measurements.
Verified: 5/5 solo runs + 31/31 full ctest -j4 green on WSL/gcc 14.2.
Signed-off-by: will wade <willwade@gmail.com>
* chore: delete 19 dead/broken files (~1500 LOC)
Tier 1 item #1 from codebase review. Files were either:
- already excluded from the CMake build (CannaConversionHelper,
SocketInput{,Base}) — they referenced libcanna and 2007-era
GCC_VERSION guards and didn't compile on modern systems
- compiled but never instantiated (ConvertingAlphMgr — only a
vestigial friend declaration in ConversionManager.h, removed here;
SBTree; Trace) — zero live references in the codebase
- unreferenced platform/utility headers (AppSettingsData,
OpenGLScreen, MSVC_Unannoy, round.h, stdminmax, mydebug) —
Bit-rotted Objective-C++/MSVC6/autotools leftovers
- autotools leftover (LanguageModelling/Makefile.am)
CMakeLists.txt: removed the now-obsolete list(REMOVE_ITEM ...) lines
that previously excluded Canna/SocketInput from the GLOB_RECURSE.
ConversionManager.h: removed the orphaned "friend class CConvertingAlphMgr"
forward declaration.
Verified: clean build with zero warnings (gcc 14.2 / clang 19.1).
31/31 tests pass when run in isolation. The draw_snapshot and
alphabet_map tests have a pre-existing ~20% intermittent non-determinism
in the engine (reproduced in baseline before this commit) that is
unrelated to these deletions; it surfaces under -j4 I/O contention or
back-to-back runs and is tracked separately.
Signed-off-by: will wade <willwade@gmail.com>
* chore: remove DemoFilter and dead LP_DEMO_*/LP_SOCKET_* parameters
Tier 1 items #1 (Stage 2) and #3 from codebase review.
DemoFilter was disabled in 2007 ("WIP Temporary as too many segfaults!")
and never re-enabled. Its only live references were a vestigial
#include and friend declaration in GameModule.h (never used in the
body) and a commented-out RegisterModule line. Removed all three.
The DemoFilter was the sole consumer of three LP_DEMO_* parameters
(Spring, NoiseMem, NoiseMag). The five LP_SOCKET_* parameters drove
the also-deleted CSocketInput/CSocketInputBase network input that was
excluded from the build in the previous commit. All eight parameters
are removed from:
- settings_manifest.json (canonical source, -128 lines)
- Parameters.cpp/.h (regenerated via Scripts/generate_parameters.py)
- strings_en.json (-16 English label/description strings)
Localization strings in the other ~30 Strings/strings_*.json files
are intentionally left; orphan translation keys are harmless and a
separate mechanical sweep can remove them if desired.
Verified:
- Clean build (gcc 14.2 / clang 19.1)
- clang-format --dry-run --Werror passes on all changed .cpp/.h
- All tests pass solo (full-suite flakiness is pre-existing engine
non-determinism + the CWD training_*.txt leak, both unrelated)
Signed-off-by: will wade <willwade@gmail.com>
* fix: route library file writes through a user data directory
Tier 1 item #5 from codebase review. Closes two long-standing leaks:
1. dasher.log was opened with a relative path "dasher.log" from the
CDasherInterfaceBase constructor, landing in the process CWD.
For a library this is wrong — embedding apps can't predict or
clean up the file.
2. training_english_GB.txt (and any alphabet-specific training delta
the engine writes during adaptive learning) went through
FileUtils::WriteUserDataFile, which resolved relative paths
against the read-only bundled data directory. When that failed
(or in tests where the data dir was itself relative) the file
landed in CWD.
The fix introduces a separate "user data directory" concept in
FileUtils:
- FileUtils::SetUserDataDirectory(userDir) — writable per-user dir
- FileUtils::ResolveUserDataPath(filename) — resolves a relative
filename against the user dir (falling back to the data dir, then
to the input unchanged, preserving historical behaviour for
clients that never configure a separate user dir)
- WriteUserDataFile and FileLogger both resolve through the new
helper, so all library writes now land in the configured user dir
The C API (dasher_create) calls SetUserDataDirectory(user_dir) so
every dasher_ctx gets its own writable location for logs and
training deltas. Test contexts use a per-call ScopedTempDir which
already cleans up — so this commit also eliminates the cross-test
file contamination that was producing intermittent failures in
dasher_draw_snapshot_tests and dasher_alphabet_map_tests.
Verified:
- No dasher.log or training_*.txt in CWD after running any test
- Logs now land in /tmp/dasher_test_*/dasher.log (per-context user dir)
- 31/31 ctest sequential pass (vs ~24-26 before, due to contamination)
- clang-format clean on all changed files
Signed-off-by: will wade <willwade@gmail.com>
* fix: activate DASHER_ASSERT in debug builds; fix latent typos it masked
Tier 1 item #2 from codebase review.
The DASHER_ASSERT macro was gated on "#ifdef DEBUG" — a macro the
project never defined anywhere. Every assertion was silently
discarded as ((void)true) in every build, including Debug CI runs.
This masked a whole class of latent bugs that compiled only because
the assert expressions were never parsed:
- DasherView.cpp:58,62 "visibleRegion" (typo for local "vr")
- DasherViewSquare.cpp:806 "pRender" (typo for pCurrentNode)
- DasherViewSquare.cpp:527 CoversCrosshair called from const method
(CoversCrosshair now const — body is pure)
- DasherModel.cpp:96,159 "NF_SEEN" (unqualified; should be
CDasherNode::NF_SEEN like every other site)
- GameModule.cpp:78,82,83 "evt->m_sText" (pre-refactor leftover;
the function now takes strText directly)
- SettingsStore.cpp:30,36,42 "Settings::ParamBool/ParamLong/ParamString"
(never existed; should be PARAM_BOOL etc.)
- SettingsStore.cpp:123 "holds_alternative<T>(p->second.type)"
(type is a ParameterType enum, not a
variant; should check p->second.value)
The SettingsStore.cpp:123 case was relaxed to assert only parameter
existence, not type match: dasher_get_string_parameter and siblings
are documented to throw on type mismatch, and CAPI.cpp catches the
bad_variant_access to return "". Asserting would change the contract.
Macro change: myassert.h now gates on the standard NDEBUG idiom
(active when NDEBUG is NOT defined, matching <cassert> and CMake's
Debug preset). Release builds compile asserts to ((void)0).
CMakeLists.txt: bumped DASHER_TEST_TIMEOUT 120s -> 300s. With
asserts active, the property_invariant test pushes ~100s on slow
filesystems (WSL+9P); 300s gives headroom without affecting CI
(which caps via `ctest --timeout 120` on the command line).
Verified: 31/31 ctest sequential pass in Debug with all asserts
active, no run-time assertion fires.
Signed-off-by: will wade <willwade@gmail.com>
* fix: move ctx-mutable C API buffers into dasher_ctx (Tier 1 #4)
Two file-scope statics in CAPI.cpp were populated from per-context
data but lived in the global static area, so two contexts sharing
the same process trampled each other's returned const char* pointers:
- s_stringValues backed dasher_get_palette_name,
dasher_get_alphabet_name, dasher_get_parameter_string_values
- s_gameTextBuf backed dasher_game_get_target_text and
dasher_game_get_wrong_text
Both are now members of dasher_ctx (ctx->stringValues, ctx->gameTextBuf),
matching the existing tlString / stringBuf pattern. Closes the
cross-context bug called out in the codebase review.
The remaining CAPI.cpp file-scope statics are left in place because
they back data that is genuinely process-global:
- s_errorString dasher_create() out_error (one-shot)
- s_localeCode/Strings process-wide i18n state
- s_paramKeys/InfoName/ schema-level parameter metadata
Desc/Group/Subgroup (the function doesn't take ctx)
- s_buf in dasher_get_ LMRegistry metadata (function doesn't
language_model_name/ take ctx; same answer for every ctx)
description
Verified: 31/31 ctest sequential pass, clang-format clean.
Signed-off-by: will wade <willwade@gmail.com>
* chore: remove commented-out // std::cout debug blocks
Tier 1 item #6 from codebase review. 63 commented-out cout/cerr
debug lines across 11 files (mostly in LanguageModelling/). The
review singled these out as 25 years of accumulated debug residue
with no issue tracker references; git history preserves them if a
future bug needs the same instrumentation.
Files swept:
ConversionManager.cpp, DasherModel.cpp, DasherNode.cpp,
DefaultFilter.cpp, DictLanguageModel.cpp, MixtureLanguageModel.h,
PPMLanguageModel.cpp, PPMPYLanguageModel.cpp,
WordLanguageModel.cpp, MandarinAlphMgr.cpp,
TwoPushDynamicFilter.cpp
Conservative: only lines whose //-stripped content begins with one
of {std::cout, std::cerr, cout, cerr, printf, fprintf} were removed,
plus pure stream-continuation lines ("<<..." without ";"). Real
explanatory comments were preserved (e.g. the math-rationale block
in DasherModel.cpp:ScheduleOneStep that interleaves with old debug
prints).
Verified: clean build, clang-format clean, 0 functional diff
(comments + blank lines only).
Signed-off-by: will wade <willwade@gmail.com>
* chore: tidy post-DASHER_ASSERT cleanup; fix CI clang-tidy filter
Three small follow-ups to Tier 1 #2 (DASHER_ASSERT activation):
1. .clang-tidy: suppress cert-dcl03-c. The check suggests replacing
assert() with static_assert, which is a false positive for the
runtime invariants DASHER_ASSERT now guards (e.g. DASHER_ASSERT(false)
in "should never happen" branches, DASHER_ASSERT(p != nullptr) on
runtime pointers). The check fires on every such site now that
DASHER_ASSERT properly expands to assert() in Debug builds.
2. GameModule: remove the m_iFontSize private field. Declared and
initialized to 36 but never read. The previous "friend class
CDemoFilter" declaration (removed in Tier 1 #1) suppressed the
-Wunused-private-field warning; now that the friend is gone,
clang-tidy correctly surfaces the dead field.
3. .github/workflows/ci.yml: extend the clang-tidy warning filter
to exclude [clang-diagnostic-*] as well as [-W*]. Both formats
identify compiler warnings — the latter is the compiler's own
format, the former is clang-tidy's wrapper around the same
warnings. The filter's stated intent ("filter out compiler
warnings") was not matching the second form, which would have
caused CI to fail on every clang-tidy run that surfaced any
compiler warning through the [check-name] reporting path.
Verified: 0 clang-tidy findings under the updated filter; build
clean; clang-format clean.
Signed-off-by: will wade <willwade@gmail.com>
* test: fix MSVC /W4 warnings and Linux/Windows portability
Closes review item "we expect zero warnings from the test files"
under MSVC /W4 (the project enables it for MSVC builds in
CMakeLists.txt:36). All 18 test-file warnings fixed:
C4100 (unreferenced parameter):
- test_lifecycle.cpp: comment out names of unused callback
lambda parameters
C4996 (strdup deprecated):
- test_common.h: add portable dasher_strdup() helper
(_strdup on MSVC, strdup elsewhere)
- test_multilingual.cpp, test_parameters.cpp: use it
C4701 (uninitialized local):
- test_capi.cpp: value-initialize dasher_parameter_info info{}
- test_control_actions_navigation.cpp: initialize lb, hb to 0
C4244 / C4267 (narrowing):
- test_deterministic.cpp: cast strlen() to int
- test_utf_conversion.cpp: explicit static_cast<int> on
pointer arithmetic and strlen
Also fixed a real Linux/MSVC portability bug: test_view_geometry.cpp
used __INT_MAX__, a GCC/Clang internal macro not defined under MSVC.
Switched to standard INT_MAX from <climits>.
.gitignore: cover additional local scratch files generated during
this verification pass:
- build-*/ (any alternate build dir, e.g. build-msvc/)
- *.log.txt (build/run logs)
- _*.ps1 / _*.bat / _*.sh / _*.py (underscore-prefixed helpers)
Verified:
- MSVC /W4 build of tests: 0 warnings (was 18)
- gcc 14.2 build: clean, 0 warnings
- clang-format clean
Signed-off-by: will wade <willwade@gmail.com>
* fix: zero out probs[0] in WordLanguageModel::GetProbs (sentinel contract)
Symbol 0 is a special dummy/sentinel that must carry zero probability:
AlphabetManager::IterateChildGroups asserts (*pCProb)[0] == 0 so the
cumulative-difference arithmetic cp[i]-cp[j] stays valid for all i,j.
Every other LM (PPM, PPMPY, RoutingPPM, CTW, Dict) enforced this
explicitly. WordLanguageModel::GetProbs did not — it let the spelling
model assign symbol 0 real probability, then never filtered it out.
On Linux the spelling model happened to assign ~0 probability to
symbol 0, masking the bug. On macOS (libc++ unordered_map iteration
differs from libstdc++) the spelling model assigned non-zero
probability to symbol 0 and DASHER_ASSERT (newly active per Tier 1
#2) caught the malformed cumulative array.
Symptoms without the fix:
- Debug builds crash on macOS at AlphabetManager.cpp:603
- Release builds silently produce a malformed cumulative array,
subtly distorting the rendering of node bounds
- property_invariants test observed Word LM "leaky normalization"
(total_mass sometimes < 65536) — this was the lost sentinel mass
After the fix: Word LM total_mass is exactly 65536, deterministic
across runs, matching PPM and CTW.
Found by: macOS verification run after Tier 1 #2 activated asserts.
Signed-off-by: will wade <willwade@gmail.com>
* docs: add ARCHITECTURE.md with component overview and sequence diagrams
Tier 2 #2.1. Provides the target spec for Tier 3 refactors:
- Component overview (who owns what)
- Startup sequence (dasher_create → Realize → module creation)
- Mouse input flow (stored on PointerInput, consumed by filter in frame)
- Frame render pipeline (input filter timer → model step → render → policy)
- Training paths (explicit import + adaptive learning)
- Parameter system (manifest → codegen → SettingsStore → events)
- Language model hierarchy and normalization contract
- Draw command format (the cross-platform rendering abstraction)
- Threading model (single-threaded, one ctx per thread)
- Test architecture (3 tiers, 31 executables)
Designed to be read cold by someone new to the codebase. References
file:line for key entry points but does not duplicate code comments.
Signed-off-by: will wade <willwade@gmail.com>
* fix: clean stale temp dirs in test helpers (root cause of "non-determinism")
The "Linux-only engine non-determinism" in draw_snapshot_tests and
property_invariant_tests was not unordered_map hash randomization as
originally hypothesized — it was stale test state.
Root cause: create_isolated_context() and ScopedTempDir both generate
temp directory names as /tmp/dasher_test_{PID}_{counter}. The counter
resets to 0 on each process start; PIDs are frequently reused (especially
on WSL). So process run N+1 silently reuses the temp dirs from run N,
and XmlSettingsStore::Load() reads the stale dasher_settings.xml left
behind — with different parameter values, producing different rendering.
Fix: both helpers now call std::filesystem::remove_all on the target
path before creating the new directory. This ensures a clean state on
every context creation regardless of what previous processes left behind.
Evidence:
- Before fix: draw_snapshot_tests failed 20-80% of runs (worsening
with each successive run as more stale dirs accumulated — up to
2,439 stale dirs observed in /tmp)
- After fix: 10/10 consecutive runs pass without any manual cleanup
- macOS was unaffected because PID reuse patterns differ
- The engine rendering path (deque children, sorted maps, deterministic
hash tables) was confirmed fully deterministic by exhaustive audit
This resolves Tier 2 #2.4 (non-determinism hunt) — the target was
misidentified as unordered_map iteration but was actually test infra.
Signed-off-by: will wade <willwade@gmail.com>
* fix: initialize m_iActiveMarker + fix stale-pointer read in buffer test
Two MSVC test failures root-caused and fixed:
1. dasher_input_filter_tests SIGSEGV in Two-Push Dynamic Mode:
CTwoPushDynamicFilter::m_iActiveMarker was uninitialized in the
constructor. It is normally set by Timer() during frame rendering,
but key presses can arrive before the first frame. On gcc/Linux
the heap happens to zero-init it (value 0 = valid array index);
on MSVC it contains garbage, causing m_aiTarget[garbage] to crash.
Fix: initialize m_iActiveMarker = -1 ("no active marker") in the
constructor initializer list.
2. dasher_capi_buffer_lifetime_tests CHECK(last_op1 >= 0) failure:
The test read cmds1[cc1-6] AFTER a second dasher_frame() call,
which invalidates the cmds1 pointer per the C API contract
("valid until next dasher_frame()"). On gcc the memory wasn't
yet overwritten; on MSVC it contained garbage (-572662307).
Fix: save the opcode into a local BEFORE the second frame call.
Verified: 31/31 tests pass on both Linux (gcc 14.2, -j4) and
Windows (MSVC 19.44, /W4).
Signed-off-by: will wade <willwade@gmail.com>
* refactor: replace CFileLogger+CBasicLog+UserLog with dasher_set_log_callback (Tier 2 #2.0)
Deletes 22 files (~4,000 LOC) comprising three overlapping, unused
logging systems, and replaces them with a single C API callback:
dasher_set_log_callback(ctx, callback, user_data, min_level)
Added to dasher.h:
- dasher_log_callback typedef (int level, const char* msg, void* data)
- dasher_set_log_callback function declaration
- Log levels: 0=debug, 1=info, 2=warn, 3=error
- When no callback registered, zero overhead (messages discarded)
Deleted (22 files, ~4,000 LOC):
FileLogger.{h,cpp} — wrote dasher.log from DIB constructor
BasicLog.{h,cpp} — abstract base, nothing subclassed it
UserLog.{h,cpp} — full trial-logging system, 0 users
UserLogBase.{h,cpp} — base for UserLog/BasicLog
UserLogTrial.{h,cpp} — per-trial XML export
UserLogParam.{h,cpp} — parameter tracking for user log
UserButton.{h,cpp} — button event tracking for user log
UserLocation.{h,cpp} — pointer location normalization
TimeSpan.{h,cpp} — timing utilities for user log
SimpleTimer.{h,cpp} — timer for TimeSpan
XMLUtil.{h,cpp} — XML helpers for user log export
External edits:
- DasherInterfaceBase.{h,cpp}: removed m_pGlobalApplicationLog,
m_pUserLog, GetUserLogPtr(), all UserLog calls in Realize/NewFrame/
onUnpause/destructor, FileLogger include + g_iLogLevel constants
- InputFilter.h: removed unused #include "UserLogBase.h"
- DynamicButtons.cpp, OneButtonDynamicFilter.cpp,
TwoButtonDynamicFilter.cpp: removed GetUserLogPtr()->KeyDown() calls
- CircleStartHandler.cpp, DasherButtons.cpp, TwoBoxStartHandler.cpp:
added #include <limits> (was transitively provided by UserLogBase.h)
- Parameters.h/.cpp: regenerated without LP_USER_LOG_LEVEL_MASK
- settings_manifest.json: removed LP_USER_LOG_LEVEL_MASK entry
- strings_en.json: removed LP_USER_LOG_LEVEL_MASK strings
Verified: 31/31 tests pass (Linux gcc 14.2, sequential). The dasher.log
file is no longer created — frontends that want diagnostic logging
register via dasher_set_log_callback.
Signed-off-by: will wade <willwade@gmail.com>
* fix: auto-format generated Parameters.cpp via clang-format (Tier 2 #2.2)
The parameter codegen script emitted tab-indented, compact C++ that
required manual clang-format before every commit. Anyone editing
settings_manifest.json and running generate_parameters.py would get
CI format-check failures if they forgot the manual step.
Fix: the script now runs clang-format -i on the generated file
automatically. If clang-format is not installed, it prints a warning
and falls back to unformatted (still valid) output.
Verified: python3 Scripts/generate_parameters.py produces zero
clang-format violations.
Signed-off-by: will wade <willwade@gmail.com>
* refactor: extract shared PPM merge loop into CAbstractPPM (Tier 2 #2.5)
The PPM probability-merge loop — ~80 lines of dense numeric logic
that is the heart of Dasher's predictive engine — was duplicated
identically across CPPMLanguageModel and CPPMPYLanguageModel. The
only difference was how children of each trie node are iterated
(CPPMnode inline hash vs pychild std::map).
Extraction:
- Added mergePPMProbs() to CAbstractPPM — implements the full
four-phase algorithm: uniform backoff, vine-chain traversal with
alpha/beta weighting, leftover distribution, rounding correction.
- Added collectChildCounts() virtual to CAbstractPPM — abstracts
child iteration. Default uses ChildIterator (standard PPM);
PPMPY overrides to iterate its pychild map.
- CPPMLanguageModel::GetProbs now delegates to mergePPMProbs (7
lines, down from 95).
- CPPMPYLanguageModel::GetProbs now delegates to mergePPMProbs (6
lines, down from 106) with a collectChildCounts override (8 lines).
- CRoutingPPMLanguageModel unchanged for now — its two-pass
algorithm (base PPM + route distribution) has its own TODO at
line 52-54 that can now reference the shared method. Left as
follow-up to avoid changing behavior of the Mandarin routing model.
Documentation:
- mergePPMProbs has a thorough block comment explaining the four
phases, the alpha/beta formula, vine-pointer backoff, and the
sentinel symbol 0 contract. DasherCore is studied as a PPM
reference implementation — the code should be exemplary.
- collectChildCounts documents the inline hash vs pychild design.
- The FIXME on doExclusion is preserved with context on why it's
disabled.
Verified: 31/31 tests pass, including ppm_golden, lm_correctness
(14 cases), property_invariant (LM normalization), draw_snapshot
(deterministic rendering), and draw_command (rendering output).
Signed-off-by: will wade <willwade@gmail.com>
* refactor: template ModuleMap<T> to eliminate duplication (Tier 2 #2.3)
CModuleManager had 5 method pairs (InputDevice vs InputMethod) that
were line-for-line identical modulo type and map name — 60 of 118
lines were mechanical copy-paste.
Extracted a generic ModuleMap<T> template that handles registration
(both raw-pointer and unique_ptr versions), lookup, listing, and
default selection. CModuleManager now holds two ModuleMap instances
and provides thin forwarding methods.
118 lines → 62 lines (.cpp) + 93 lines (.h, including the template
with thorough documentation). The logic now exists in ONE place.
Verified: interaction, input_filter, and lifecycle tests all pass.
Signed-off-by: will wade <willwade@gmail.com>
* build: suppress C4100 globally for MSVC (Tier 2 #2.9)
The legacy codebase has ~2,700 C4100 (unreferenced formal parameter)
warnings — all from inherited virtual method overrides where the
parameter is intentionally unused in specific subclasses. These are
style issues, not bugs.
Suppressed globally via /wd4100. The bug-finding warnings remain
enabled: C4701 (uninitialized), C4244 (narrowing), C4389 (signed/
unsigned mismatch), etc.
Warning count drops from ~2,822 to 119. The remaining 119 are genuine
code-quality issues (99 narrowing conversions, 8 uninitialized vars,
4 C-linkage UDT) that should be fixed incrementally as files are touched.
Tests still get full /W4 with zero warnings (already enforced).
Signed-off-by: will wade <willwade@gmail.com>
* style: clang-format test files with CI version (18.1.8)
Test files were formatted with clang-format 19 locally; CI uses 18.1.8
which produces slightly different output. Reformatted all 15 affected
test files with clang-format-18 to match CI exactly.
Signed-off-by: will wade <willwade@gmail.com>
* refactor: Event<T> — deterministic order + RAII subscription (Tier 3 #3.1)
Rewrites the signal/slot event system to fix two long-standing issues:
1. Non-deterministic broadcast order: The original unordered_map<void*, fn>
iterated in hash-dependent order, which was randomized per-process on
Linux (libstdc++). This could cause subtle ordering bugs if multiple
subscribers' side effects compound. Switched to std::map for sorted
(deterministic) iteration.
2. Manual unsubscribe hazard: The original system required every subscriber
to call Unsubscribe(this) before destruction — forgetting caused a
use-after-free on the next Broadcast(). Added an RAII Subscription guard
that auto-unsubscribes when it goes out of scope. New code can use the
keyless Subscribe(fn) overload that returns a Subscription to store as
a member.
The void* Subscribe(listener, fn) / Unsubscribe(listener) API is preserved
exactly — zero changes to the 18 Subscribe and 22 Unsubscribe call sites.
Existing callers continue to work unchanged. The keyless Subscribe(fn)
overload is opt-in for new code.
Documentation: thorough block comments explaining the signal/slot pattern,
both usage patterns (legacy void* and RAII), and the design rationale.
Verified: 31/31 tests pass.
Signed-off-by: will wade <willwade@gmail.com>
* docs: enhance ControlManager.h with architecture diagram (Tier 3 #3.2)
The review recommended splitting into "one file per action class" (16
files). Many classes are 3-5 lines each — creating 15+ tiny files
would hurt navigation, not help it. Instead, added a class hierarchy
ASCII diagram to the file-level comment showing the full architecture:
ActionRegistry → ControlAction subclasses → NodeTemplate → CContNode
→ CControlManager.
Also fixed misleading [[maybe_unused]] on SocketOutputAction and
ATSPIAction member fields — these classes ARE instantiated (from
alphabet XML in AlphIO.cpp), just their execute() methods are no-ops.
Added explanatory comments documenting this.
No structural change to the code — purely documentation improvement.
Verified: control_action tests pass.
Signed-off-by: will wade <willwade@gmail.com>
* style: fix Event.h clang-format 18 formatting
Signed-off-by: will wade <willwade@gmail.com>
* docs: add naming convention for new code to CONTRIBUTING.md (Tier 3 #3.5)
Adds Rule 7: Naming Convention to CONTRIBUTING.md. The convention applies
to NEW code only — existing CFoo/m_iFoo legacy naming is explicitly
preserved (no bulk renames).
Convention for new code:
- Classes: PascalCase (not CFoo)
- Methods: PascalCase (not snake_case)
- Members: snake_case_ trailing underscore (not m_iFoo)
- Free functions: snake_case
- Files: PascalCase.h / PascalCase.cpp
Also documents the existing conventions new contributors will encounter
(CFoo, m_iFoo, Hungarian notation) so they understand the codebase's
history without being confused by the mix.
This formalizes the review's recommendation #20 as a contribution rule.
Signed-off-by: will wade <willwade@gmail.com>
* fix: Mixture LM normalization + wire log callback to engine messages
Two fixes:
1. Mixture LM normalization leak: CMixtureLanguageModel::GetProbs
was missing Probs[0]=0 (sentinel) and had no rounding correction.
The blend of two sub-LMs lost ~519 units to integer truncation,
leaving ~0.8% of the Dasher canvas as dead zones. Fixed by:
- Setting Probs[0] = 0 (same contract as every other LM)
- Adding rounding correction that distributes the residual
Mixture total_mass: 65017 → 65536 (exact)
2. Log callback never invoked: dasher_set_log_callback stored the
callback in ctx->logCb but no engine code called it. Now the
Interface::Message() override routes engine messages through
BOTH the message callback (for UI display) AND the log callback
(for diagnostic logging). Modal/interrupt messages = WARN level,
async messages = INFO level. Frontends that registered
dasher_set_log_callback will now receive engine messages.
Verified: 7/7 LM-sensitive tests pass (property_invariant, lm_correctness,
ppm_golden, ppm_serialization, capi, capi_buffer_lifetime, benchmark).
Signed-off-by: will wade <willwade@gmail.com>
* fix: revert Event.h to unordered_map (stuttering on Windows)
The std::map change (commit db5f6ec) caused stuttering while zooming
on Windows. MSVC's std::map allocates each node separately (red-black
tree), creating cache misses during Broadcast. With typical subscriber
counts (5-10), unordered_map's contiguous buckets are significantly
faster for both iteration and lookup.
Reverted the storage to unordered_map. Preserved the RAII Subscription
improvement (the keyless Subscribe overload and auto-unsubscribe guard)
— those are orthogonal to the storage container.
Confirmed via git bisect on Dasher-Windows: Event.h was the commit
that introduced the stuttering.
Updated design notes to document why we use unordered_map, so the next
person who thinks "let's make this deterministic" sees the warning.
31/31 tests pass.
Signed-off-by: will wade <willwade@gmail.com>
* fix: revert v5 alphabet format changes that broke space node (PR #28)
PR #28 added v5 alphabet format support but introduced an emergent
interaction bug where the 6 changes combined caused the paragraphSpace
group (containing ¶ and □/space) to disappear from parsed alphabets.
Users could not type spaces — text came out as "hellomynameis".
Confirmed: pre-PR #28 code yields 62 symbols (space present);
post-PR #28 code yields 60 symbols (space missing).
The Windows team isolated the bug: each individual change works, but
all 6 combined break it. The prime suspect is the Parse/ParseSingle
dispatch split, which somehow changes pugixml node iteration behavior.
Reverted AlphIO.cpp and AlphIO.h to pre-PR #28 state. v5 alphabet
support will be re-implemented properly in a follow-up that doesn't
break v6 parsing.
Credit: Dasher-Windows team (bisect + revert branch).
Verified: 62 symbols restored, 31/31 tests pass.
Signed-off-by: will wade <willwade@gmail.com>
---------
Signed-off-by: will wade <willwade@gmail.com>
Updating the definition to be consistent with the one used in DasherJava V1.0.0.
The changes are described in the individual commit descriptions. Most of them should be unproblematic since they affect features that are currently not supported by this Dasher implementation anyway.
All alphabets in
Data/alphabetsand most of the ones inData/alphabets/autoConvertedconform to the new DTD. Exceptions include the Chinese alphabets for example, which require special features and are currently lacking color definitions.