Conversation
This comment has been minimized.
This comment has been minimized.
6b6b35f to
5d8c3ac
Compare
Because it has to handle multibyte encodings, `dirname` scans the entire string from the start and maps all the separators before copying the relevant part. The overwhelming majority of the time, the function is called with UTF-8 or ASCII strings, and `level = 1`. If we optimize for that case specifically, we can use a much simpler and faster reverse search. In addition, there is a lot needless and/or duplicated safety checks performed by `FilePathStringValue` and `StringValueCStr`, so a lot of overhead can be removed by calling a simpler check function instead. Similarly, the function was using a lot of generic `rb_enc_` functions that waste a lot of time performing type checks, when we know we're dealing with strings. ``` | |compare-ruby|built-ruby| |:------|-----------:|---------:| |long | 3.925M| 25.221M| | | -| 6.43x| |short | 15.550M| 28.697M| | | -| 1.85x| ```
5d8c3ac to
98d9072
Compare
|
I'm gonna wait on this, because instead of and addtional fast path, we could perhaps always consider paths as ASCII compatible, and get most of the win with while reducing complexity: byroot@56aec70 As far as I can tell, >> File.dirname("/foo/bar/baz".encode(Encoding::UTF_16BE))
(irb):1:in 'File.dirname': path name must be ASCII-compatible (UTF-16BE): "/foo/bar/baz" (Encoding::CompatibilityError)So I don't understand why we'd need all that multibyte handling code. It was added in January 2012 in ed46983, but the encoding check was added later in October 2012 ad54de2, so perhaps it's just "dead code". |
|
It is not dead code. OTOH, you may be able to simplify the macro in the case |
That makes a lot of sense.
Interesting idea. Or perhaps pass |
|
Closing in favor of #15907 |
Because it has to handle multibyte encodings,
dirnamescans the entire string from the start and maps all the separators before copying the relevant part.The overwhelming majority of the time, the function is called with UTF-8 or ASCII strings, and
level = 1.If we optimize for that case specifically, we can use a much simpler and faster reverse search.
In addition, there is a lot needless and/or duplicated safety checks performed by
FilePathStringValueandStringValueCStr, so a lot of overhead can be removed by calling a simpler check function instead.Similarly, the function was using a lot of generic
rb_enc_functions that waste a lot of time performing type checks, when we know we're dealing with strings.