Kotlin Multiplatform (KMP) library for Unicode confusable detection (UTS #39)

Doist/doistx-confusables

doistx-confusables

Badges: version, Android, JVM, JS, iOS, macOS, Windows, Linux

A Kotlin Multiplatform (KMP) library that implements Unicode confusable detection as specified in Unicode Technical Standard #39 (UTS #39), Unicode Security Mechanisms.

It extends String with:

  • toSkeleton(): returns the UTS #39 confusable skeleton (specifically, internalSkeleton).
  • isConfusable(other): returns whether two strings have the same skeleton.
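
To make the skeleton idea concrete, here is a minimal JVM-only sketch of the UTS #39 internalSkeleton steps: apply NFD, drop default-ignorable code points, replace each code point with its prototype, then apply NFD again. It is not this library's implementation. The names sketchSkeleton, sketchIsConfusable, samplePrototypes and sampleIgnorables are hypothetical, and the two tables are tiny hand-picked samples rather than the full Unicode data.

```kotlin
import java.text.Normalizer

// Hypothetical sample of confusables.txt mappings (the real file has thousands).
private val samplePrototypes: Map<Int, String> = mapOf(
    0x0430 to "a", // CYRILLIC SMALL LETTER A
    0x0441 to "c", // CYRILLIC SMALL LETTER ES
    0x0435 to "e", // CYRILLIC SMALL LETTER IE
    0x043E to "o", // CYRILLIC SMALL LETTER O
    0x0440 to "p", // CYRILLIC SMALL LETTER ER
    0x0455 to "s", // CYRILLIC SMALL LETTER DZE
)

// Hypothetical sample of Default_Ignorable_Code_Point values.
private val sampleIgnorables: Set<Int> = setOf(0x00AD, 0x200B, 0x200C, 0x200D)

fun sketchSkeleton(input: String): String {
    // Step 1: canonical decomposition.
    val nfd = Normalizer.normalize(input, Normalizer.Form.NFD)
    val mapped = StringBuilder()
    var i = 0
    while (i < nfd.length) {
        val cp = nfd.codePointAt(i)
        i += Character.charCount(cp)
        // Step 2: drop default-ignorable code points.
        if (cp in sampleIgnorables) continue
        // Step 3: substitute the prototype when one is known.
        val prototype = samplePrototypes[cp]
        if (prototype != null) mapped.append(prototype) else mapped.appendCodePoint(cp)
    }
    // Step 4: decompose again, since prototypes may introduce composed characters.
    return Normalizer.normalize(mapped, Normalizer.Form.NFD)
}

// Two strings are treated as confusable when their skeletons are equal.
fun sketchIsConfusable(a: String, b: String): Boolean =
    sketchSkeleton(a) == sketchSkeleton(b)
```

With these sample tables, sketchSkeleton("p\u0430yp\u0430l") yields "paypal", so the two spellings compare as confusable. The sketch relies on java.text.Normalizer, so unlike the library it only runs on the JVM.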

Warning

A skeleton is intended only for internal use when testing confusability; it is not suitable for display and should not be treated as a general “normalization” of identifiers.
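
One way to respect this warning is to use the skeleton purely as a lookup key and keep the original string for storage and display. The sketch below assumes a hypothetical NameRegistry class that receives the skeleton function as a parameter, so it is self-contained and makes no claim about how this library is meant to be integrated.

```kotlin
// Hypothetical helper: indexes names by skeleton but only ever returns
// the original, user-supplied spelling.
class NameRegistry(private val skeletonOf: (String) -> String) {
    private val bySkeleton = mutableMapOf<String, String>()

    /**
     * Registers [name] and returns null, or returns the previously
     * registered name whose skeleton equals that of [name].
     */
    fun register(name: String): String? {
        val key = skeletonOf(name)
        val existing = bySkeleton[key]
        if (existing != null) return existing
        bySkeleton[key] = name // store the original, not the skeleton
        return null
    }
}
```

With the library on the classpath, the registry could be created as NameRegistry { it.toSkeleton() }. The skeleton never leaves the map key, so it is neither shown to users nor written back as a "normalized" identifier.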

Usage

"paypal".isConfusable("p\u0430yp\u0430l") // => true (Cyrillic 'а')
"ѕсоре".toSkeleton() // => "scope"

Setup

repositories {
   mavenCentral()
}

kotlin {
   sourceSets {
      val commonMain by getting {
         dependencies {
            implementation("com.doist.x:confusables:1.0.0")
         }
      }
   }
}

Unicode data

This library embeds data from:

  • UTS #39 confusables.txt (Unicode 17.0.0)
  • UCD Default_Ignorable_Code_Point (Unicode 17.0.0)

Kotlin tables are generated into build/ at build time from the pinned resources/unicode-data/ inputs.
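
For orientation, each data line in confusables.txt has the form "source ; target sequence ; type # comment", with code points written in hexadecimal. The following is a rough sketch of parsing one such line into a (source code point, prototype string) pair. The function name parseConfusableLine is hypothetical, and this is not the generator this repository uses.

```kotlin
// Hypothetical parser for a single confusables.txt line.
// Returns null for blank lines, comment-only lines, and lines without a target field.
fun parseConfusableLine(line: String): Pair<Int, String>? {
    // Everything after '#' is a comment.
    val data = line.substringBefore('#').trim()
    if (data.isEmpty()) return null

    val fields = data.split(';').map { it.trim() }
    if (fields.size < 2) return null

    // Field 0: a single source code point in hex.
    val source = fields[0].toInt(16)

    // Field 1: one or more space-separated target code points in hex.
    val target = buildString {
        fields[1].split(' ')
            .filter { it.isNotEmpty() }
            .forEach { appendCodePoint(it.toInt(16)) }
    }
    return source to target
}
```

A build step along these lines could fold the parsed pairs into a lookup table. The third field (the mapping type) is ignored here because the sketch only needs the source and target.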

All Unicode data is subject to Unicode’s Terms of Use.

Updating Unicode data

Run:

./gradlew updateUnicodeData -PunicodeVersion=17.0.0

License

Released under the MIT License.
