After updating from v1.2.1 to v1.3.1 we noticed a performance regression in Unicode.String.Break.next/4, which is now slower by a factor of ~50. It is still present in v1.4.0.
I did some preliminary profiling and found that a lot of time seems to be spent in regex compilation.
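To illustrate why repeated regex compilation could account for a slowdown of this size, here is a minimal sketch that compares compiling a pattern on every call with compiling it once and reusing it. It uses only the standard library. The pattern is a made-up stand-in, not one taken from unicode_string, and the sketch makes no claim about where the library actually compiles its regexes.

```elixir
# Hypothetical pattern and input, chosen only for illustration.
# They are NOT taken from unicode_string's break rules.
pattern = "\\p{L}+\\d*"
input = "test123 "

# Case 1: compile the regex on every iteration, then run it.
{recompile_usecs, _} =
  :timer.tc(fn ->
    Enum.each(1..1_000, fn _ ->
      regex = Regex.compile!(pattern, "u")
      Regex.run(regex, input)
    end)
  end)

# Case 2: compile once up front and reuse the compiled regex.
regex = Regex.compile!(pattern, "u")

{reuse_usecs, _} =
  :timer.tc(fn ->
    Enum.each(1..1_000, fn _ -> Regex.run(regex, input) end)
  end)

IO.inspect(recompile_usecs, label: "recompile usecs")
IO.inspect(reuse_usecs, label: "reuse usecs")
```

The first figure is typically much larger than the second, which would be consistent with the profile if the newer versions compile break rules per call rather than once.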
Reproduction
```elixir
[unicode_string_ver] = System.argv()

Mix.install([
  {:unicode_string, unicode_string_ver}
])

:timer.tc(fn ->
  Enum.each(1..100, fn _ ->
    {_, _} = Unicode.String.Break.next("test123 ", "root", :word, [])
  end)
end)
|> elem(0)
|> IO.inspect(label: "usecs")
```
```
> elixir repro.exs "~> 1.2.1"
usecs: 28131
> elixir repro.exs "~> 1.3.1"
usecs: 1456437
> elixir repro.exs "~> 1.4.0"
usecs: 1431304
```
Erlang/OTP 25 [erts-13.2.2.4] [source] [64-bit] [smp:16:16] [ds:16:16:10] [async-threads:1] [jit:ns]
Elixir 1.15.7 (compiled with Erlang/OTP 25)