Kotlin Multiplatform (KMP) library that implements Unicode confusable detection based on Unicode Technical Standard #39 - Unicode Security Mechanisms.
It extends String with:
- toSkeleton(): returns the UTS #39 confusable skeleton (specifically, internalSkeleton).
- isConfusable(other): returns whether two strings have the same skeleton.
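To make the skeleton idea concrete, here is a minimal, JVM-only sketch of the internalSkeleton steps described in UTS #39: apply NFD, drop default-ignorable code points, replace each remaining code point with its prototype, then apply NFD again. It is not this library's implementation. The names `toySkeleton`, `toyIsConfusable`, `toyPrototypes`, and `toyIgnorable` are hypothetical, the two tables are tiny hand-picked subsets of the real data, and `java.text.Normalizer` is used only because it keeps the sketch short (it is not available in common KMP code).

```kotlin
import java.text.Normalizer

// Hypothetical, hand-picked subset of confusables.txt (source code point -> prototype).
private val toyPrototypes: Map<Int, String> = mapOf(
    0x0430 to "a", // CYRILLIC SMALL LETTER A
    0x0441 to "c", // CYRILLIC SMALL LETTER ES
    0x0435 to "e", // CYRILLIC SMALL LETTER IE
    0x043E to "o", // CYRILLIC SMALL LETTER O
    0x0440 to "p", // CYRILLIC SMALL LETTER ER
    0x0455 to "s", // CYRILLIC SMALL LETTER DZE
)

// Hypothetical subset of Default_Ignorable_Code_Point.
private val toyIgnorable: Set<Int> = setOf(0x00AD, 0x200B, 0x200C, 0x200D, 0xFEFF)

fun toySkeleton(input: String): String {
    // Step 1: canonical decomposition.
    val nfd = Normalizer.normalize(input, Normalizer.Form.NFD)
    val mapped = StringBuilder()
    var i = 0
    while (i < nfd.length) {
        val cp = nfd.codePointAt(i)
        i += Character.charCount(cp)
        // Step 2: drop default-ignorable code points.
        if (cp in toyIgnorable) continue
        // Step 3: replace the code point with its prototype, if it has one.
        val prototype = toyPrototypes[cp]
        if (prototype != null) mapped.append(prototype) else mapped.appendCodePoint(cp)
    }
    // Step 4: decompose again, since prototypes may themselves be composed.
    return Normalizer.normalize(mapped, Normalizer.Form.NFD)
}

// Two strings are treated as confusable when their skeletons are equal.
fun toyIsConfusable(a: String, b: String): Boolean = toySkeleton(a) == toySkeleton(b)
```

With these toy tables, `toySkeleton("\u0455\u0441\u043E\u0440\u0435")` yields `"scope"`, and `toyIsConfusable("pay\u200Bpal", "paypal")` is true because the zero-width space is removed before comparison.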
Warning
A skeleton is intended only for internal use when testing confusability; it is not suitable for display and should not be treated as a general “normalization” of identifiers.
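One way to respect this warning is to use the skeleton purely as a lookup key and always keep the original string for storage and display. The sketch below is an illustration, not part of this library: `NameRegistry` and its `skeletonOf` parameter are hypothetical names. With the library on the classpath, one would presumably pass a reference to the `toSkeleton()` extension as `skeletonOf`; the import path is not stated here, so the sketch takes the function as a parameter instead.

```kotlin
// Hypothetical registry that rejects names confusable with an existing one.
// The skeleton is only ever a map key; the original name is what gets stored.
class NameRegistry(private val skeletonOf: (String) -> String) {
    private val bySkeleton = mutableMapOf<String, String>()

    /**
     * Registers [name] and returns null on success, or returns the
     * already-registered name that [name] is confusable with.
     */
    fun register(name: String): String? {
        val key = skeletonOf(name)
        val existing = bySkeleton[key]
        if (existing != null) return existing
        bySkeleton[key] = name
        return null
    }
}
```

Callers only ever see the names they registered, never the skeleton itself, so the skeleton cannot leak into the UI or be mistaken for a normalized identifier.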
"paypal".isConfusable("p\u0430yp\u0430l") // => true (Cyrillic 'а')
"ѕсоре".toSkeleton() // => "scope"repositories {
mavenCentral()
}
kotlin {
sourceSets {
val commonMain by getting {
dependencies {
implementation("com.doist.x:confusables:1.0.0")
}
}
}
}This library embeds data from:
- UTS #39 confusables.txt (Unicode 17.0.0)
- UCD Default_Ignorable_Code_Point (Unicode 17.0.0)
Kotlin tables are generated into build/ at build time from the pinned resources/unicode-data/ inputs.
All Unicode data is subject to Unicode’s Terms of Use.
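For readers curious what turning confusables.txt into a lookup table involves, here is a rough sketch of parsing its data lines. It is not the generator this project uses. `parseConfusableLine` and `parseConfusables` are hypothetical names, and the code assumes the published line format: a hex source code point, a space-separated list of hex target code points, and a type field, separated by semicolons, with `#` starting a comment. It relies on `appendCodePoint`, so it targets the JVM.

```kotlin
// Parses one line of confusables.txt into (source code point, prototype string).
// Returns null for blank lines, comment-only lines, and malformed lines.
fun parseConfusableLine(line: String): Pair<Int, String>? {
    // Drop a leading byte order mark and everything after the first '#'.
    val data = line.removePrefix("\uFEFF").substringBefore('#').trim()
    if (data.isEmpty()) return null
    val fields = data.split(';').map { it.trim() }
    if (fields.size < 2) return null
    val source = fields[0].toIntOrNull(16) ?: return null
    // The target is one or more hex code points separated by spaces.
    val target = buildString {
        fields[1].split(' ')
            .filter { it.isNotEmpty() }
            .forEach { appendCodePoint(it.toInt(16)) }
    }
    return source to target
}

// Builds the full source -> prototype map from the lines of the file.
fun parseConfusables(lines: Sequence<String>): Map<Int, String> =
    lines.mapNotNull(::parseConfusableLine).toMap()
```

A real generator would then emit this map as Kotlin source, which is what lets the tables be used from common code without reading files at runtime.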
To refresh the pinned Unicode data, run:

./gradlew updateUnicodeData -PunicodeVersion=17.0.0

Released under the MIT License.