string_decoder: support Uint8Array input to methods#11613
string_decoder: support Uint8Array input to methods#11613addaleax wants to merge 3 commits intonodejs:masterfrom
Conversation
test/parallel/test-string-decoder.js
Outdated
lib/buffer.js
Outdated
There was a problem hiding this comment.
nit: I'm really not a fan of this syntax.. but oh well.
There was a problem hiding this comment.
Me neither. but what are the alternatives if I want to avoid the cost of property lookups? Splitting this into 6 lines? That’s something I can do if you prefer
There was a problem hiding this comment.
nah, this is fine as is. Just had to gripe about it ;-)
|
@mscdex Any further thoughts on this? Otherwise I’d like to land this in the next 1 or 2 days. |
lib/string_decoder.js
Outdated
There was a problem hiding this comment.
This will create hidden classes, which could add to the lookup overhead.
lib/string_decoder.js
Outdated
There was a problem hiding this comment.
Using a prototypeless object would be better, but I'm not sure that having a lookup object like this is best.
Taking into account my comment from below, I would suggest creating a prototypeless object with the properties assigned at the same time using Object.create(null, { ... }).
There was a problem hiding this comment.
Also, it might be worthwhile comparing other lookup strategies, such as using a function that returns the correct function based on the encoding.
|
Shouldn't this be semver-major if we're now (explicitly) changing the behavior of strings passed to |
lib/string_decoder.js
Outdated
There was a problem hiding this comment.
I don't understand the comment here. Is the suggestion that in the future it should throw on a string? Otherwise the comment seems at odds with the string check below.
There was a problem hiding this comment.
I don't understand the comment here. Is the suggestion that in the future it should throw on a string? Otherwise the comment seems at odds with the string check below.
Yes… do you have different thoughts? It doesn’t really make sense to pass in a string here, does it?
There was a problem hiding this comment.
Maybe just leave the behavior about string inputs as it is and make it throw in another PR? (that would be semver-major, I guess) (EDIT: OK this PR is already semver-major..)
|
Also, have you benchmarked the |
|
ping @addaleax :-) |
7014a3e to
db0181e
Compare
|
@jasnell Thanks for the ping… I’ve rebased this and edited a bit, @mscdex was right to ask for individual benchmarks for the Here’s the current benchmark situation: I would be okay with accepting these, especially given how the improvements tend to affect the more common encodings (esp.
@mscdex I wouldn’t consider string input covered as part of the API, but it’s a reasonable point of view. I’m changing the label to be careful. |
lib/string_decoder.js
Outdated
There was a problem hiding this comment.
These additions concern me a bit because they make the function size exceed Crankshaft's max inlineable source size.
There was a problem hiding this comment.
@mscdex … yeah. Do you have a better alternative in mind?
|
@jasnell I plan on taking a look this week. |
|
FWIW I think I've found some more optimizations to avoid the current performance regressions and then some. I am running benchmarks now ... |
|
Ok, here are the results (compared to the current node master branch) with this PR + my changes to |
|
@mscdex Are your modifications pushed somewhere? Are you okay with this change (possibly pending applying them)? |
|
Not yet, I wasn't sure where to push it for review. |
|
You can just push to this branch if you like. |
|
CI is green. @jasnell Do you mind taking another look? |
|
After #12223, I wonder if it would make sense to add support for all ArrayBuffer views instead of just Uint8Array, here rather than in a later PR. |
|
@TimothyGu I am not sure that makes sense … for something that decodes byte sequences, shouldn’t the input be an Uint8Array? |
lib/string_decoder.js
Outdated
There was a problem hiding this comment.
Would it affect performance if we use constants with names instead of number literals for indices?
Makes the string slice methods of buffers available on the binding object in addition to the `Buffer` prototype. This enables subsequent `string_decoder` changes to use these methods directly without performance loss, since all parameters are known by the string decoder in those cases.
This is a bit odd since `string_decoder` does currently not perform any type checking. Also, this adds an explicit check for `string` input, which does not really make sense but is relied upon by our test suite.
b67e1e9 to
525fabd
Compare
|
Rebased. @mscdex Do my changes LGTY? This is basically only waiting for a second CTC member approval. |
| Local<Value> buffer_arg, | ||
| Local<Value> start_arg, | ||
| Local<Value> end_arg) { | ||
| Isolate* isolate = args.GetIsolate(); |
There was a problem hiding this comment.
Perhaps we could match StringSlice() above and instead do Isolate* isolate = env->isolate(); for consistency?
|
LGTM with one minor nit that shouldn't block this from landing. CI again: https://ci.nodejs.org/job/node-test-pull-request/7898/ |
| sequence.forEach((write) => { | ||
| output += decoder.write(input.slice(write[0], write[1])); | ||
| for (const useUint8array of [ false, true ]) { | ||
| sequences.forEach((sequence) => { |
There was a problem hiding this comment.
While at it, change this to a for-of loop?
Well maybe, but I feel it is plausible for the user to use a Uint16Array for UTF-16/UCS-2 input, for example |
mcollina
left a comment
There was a problem hiding this comment.
LGTM with a polyfill for https://github.com/rvagg/string_decoder (it can be down there too!)
|
Needs a rebase but I guess this is otherwise pretty much good to go? |
|
Ping @addaleax I would love to get this in and I think this only needs a rebase. Otherwise I would go ahead and close this. |
|
@BridgeAR benchmarks would probably need to be re-ran before merging because this was all done before TurboFan. |
|
@BridgeAR @mcollina’s review was dependent on there being a polyfill for the corresponding npm module, which I haven’t done yet; feel free to take this over if you like @mscdex I agree, but I wouldn’t expect much of a difference since I don’t think TurboFan had impact on how native bindings are called |
|
@addaleax I was referring more to the js-land stuff, especially the commit I pushed. |
|
@BridgeAR this can land independently, we pick the content from core releases, so we can fetch them But good catch on the other PR, I'll get it updated and landed. This is semver-major anyway, so we have time. |
|
Ping @addaleax |
|
Closing due to long inactivity. @addaleax please reopen if you want to follow up on this :-) (in that case the benchmarks should be rerun though). |
|
Yeah, I think there’s no point in pursuing this given that we now have |
This includes a bit of refactoring for the
Bufferinternals to keep up performance. Some quick benchmark results (only thestring-decoderbenchmark, excluding the bigger input/chunk sizes and with reducedn):Checklist
make -j4 test(UNIX), orvcbuild test(Windows) passesAffected core subsystem(s)
string_decoder, buffer