lib: use utf8 fast path for streaming TextDecoder #61549
Conversation
@ChALkeR can you run the benchmark CI that applies to this pull request?
@anonrig I can't do anything until #61553. Also, we don't have a benchmark for the streaming path. I used this helper:

```js
function decodeChunked(bytes) {
  const chunk = 1000;
  const max = bytes.length - chunk;
  let i = 0;
  for (; i < max; i += chunk) decoder.decode(bytes.subarray(i, i + chunk), { stream: true });
  decoder.decode(bytes.subarray(i));
}
```

And using that instead of …
anonrig left a comment:
Thank you for the kind responses and amazing work.
Codecov Report: ✅ All modified and coverable lines are covered by tests.

```
@@           Coverage Diff            @@
##             main   #61549   +/-   ##
========================================
  Coverage   89.77%   89.77%
========================================
  Files         672      673      +1
  Lines      203755   203875    +120
  Branches    39167    39190     +23
========================================
+ Hits       182922   183038    +116
- Misses      13164    13172      +8
+ Partials     7669     7665      -4
```
@anonrig I added a benchmark for streaming Unicode TextDecoder.
Benchmark CI output: chunks of 25 bytes are slightly slower than before; larger chunks are significantly faster.
cc @nodejs/performance perhaps
gurgunday left a comment:
lgtm
I reviewed and tested to the best of my ability, and this is indeed great work @ChALkeR
@gurgunday Once everything is fixed, the plan is to update WPT and bring in my extra tests. They would fail now due to other bugs, so this can't be done yet without reorganizing the tests, and that's likely not worth it if we can just fix the bugs first.
Can we find a middle ground and avoid regressions like this with this pull request?

```
util/text-decoder-stream.js type='SharedArrayBuffer' n=1000 chunks=10 len=256 unicode=1 fatal=1 ignoreBOM=1 encoding='utf-8' *** -21.55 % ±4.82% ±6.42% ±8.36%
```
@anonrig That's streaming with chunks of 25 bytes, which is not a realistic use case of this API.
Tracking: #61041
A continuation of #61409
This is based on the logic in https://github.com/ExodusOSS/bytes
Unifies stream and non-stream codepath for UTF-8 both for intl and no-intl variants.
Previously, ICU (in with-intl builds) and `string_decoder` (in without-intl builds) were used for the streaming UTF-8 implementation. Instead, this does minor quick slices on the JS side and uses the single stateless native decoding API, which is fast on large chunks.
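The JS-side slicing described above can be sketched roughly like this (a hedged illustration, not the PR's actual code): trim any incomplete trailing multi-byte sequence off the current chunk so that the complete prefix can be handed to a one-shot stateless decode, carrying the trimmed tail over into the next chunk.

```javascript
// Hedged sketch: split a byte chunk into a prefix of complete UTF-8
// sequences and a tail holding an incomplete trailing sequence (if any).
// Only the last 3 bytes ever need inspecting, since UTF-8 sequences are
// at most 4 bytes long.
function splitCompletePrefix(bytes) {
  let i = bytes.length;
  for (let back = 1; back <= 3 && i - back >= 0; back++) {
    const b = bytes[i - back];
    if (b < 0x80) break;          // ASCII byte: everything before it is complete
    if (b >= 0xc0) {              // lead byte: check if its sequence is complete
      const need = b >= 0xf0 ? 4 : b >= 0xe0 ? 3 : 2; // expected sequence length
      if (back < need) i -= back; // incomplete: cut the partial sequence off
      break;
    }
    // 0x80..0xbf is a continuation byte: keep scanning backwards
  }
  return [bytes.subarray(0, i), bytes.subarray(i)];
}
```

The prefix can then go through a stateless whole-buffer decode, while the tail is prepended to the next incoming chunk; this is the general shape of the unified stream/non-stream codepath, not a claim about the exact implementation.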
Benchmarks
Previously: #61131
Non-chunked, 25.5.0
Non-chunked, PR
(should not be affected)
Chunked (1000 byte chunks), 25.5.0
Chunked (1000 byte chunks), PR
The benchmark creates roughly 70-256 chunks in each test. The improvement is significant.