Use TextDecoder for toString('utf8') #286

mischnic · 2021-01-07T10:46:47Z

Closes #268

This uses TextDecoder for toString('utf8') and toString().
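For readers unfamiliar with the API, the idea can be sketched roughly as follows. This is only an illustration, not the code in this PR: the helper name `utf8ToString` is hypothetical, and passing `ignoreBOM: true` (so that a leading byte order mark is kept instead of being dropped) is an assumption about the desired behaviour.

```javascript
// One shared decoder instance; constructing a TextDecoder per call is wasteful.
// ignoreBOM: true keeps a leading U+FEFF in the output (an assumption, see above).
const decoderUTF8 = new TextDecoder('utf-8', { ignoreBOM: true });

// Hypothetical helper: decode bytes[start, end) as UTF-8.
// Works for any Uint8Array, including Buffer instances.
function utf8ToString(bytes, start = 0, end = bytes.length) {
  // subarray() creates a view on the same memory, so nothing is copied here.
  return decoderUTF8.decode(bytes.subarray(start, end));
}

const encoded = new TextEncoder().encode('héllo');
console.log(utf8ToString(encoded));       // 'héllo'
console.log(utf8ToString(encoded, 1, 3)); // 'é' (two bytes in UTF-8)
```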

I needed to update some tests so that they are in line with Node's native Buffer (which also makes them pass with TextDecoder). I hope this was correct?

Technically, TextDecoder also supports latin1 and utf-16le, but the conversion differs from Node's for strings that aren't representable in these encodings:

latin1:

Buffer content: 'Ö' = <UTF8 Buffer c3 96>
Output:
 TextDecoder latin1: 'Ã–' <Buffer c3 13>
 Node Buffer latin1: 'Ã�' <Buffer c3 96>

utf16: TextDecoder adds a "�" (U+FFFD) at the end, see https://www.compart.com/en/unicode/U+FFFD

Buffer content: 'abc' = <UTF8 Buffer 61 62 63>
Output:
 TextDecoder utf-16le: e6 89 a1 ef bf bd
 Node utf16le:         e6 89 a1
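The utf-16le difference above can be reproduced with a few lines in Node. This is just a sketch for comparison; the `codePoints` helper is only there for printing and is not part of either API.

```javascript
// The UTF-8 bytes of 'abc': three bytes, so the last one cannot form
// a complete 16-bit code unit when read as UTF-16LE.
const bytes = Uint8Array.from([0x61, 0x62, 0x63]);

const viaTextDecoder = new TextDecoder('utf-16le').decode(bytes);
const viaNodeBuffer = Buffer.from(bytes).toString('utf16le');

// Helper for printing: list the code points of a string in hex.
const codePoints = (s) => Array.from(s, (c) => c.codePointAt(0).toString(16));

console.log(codePoints(viaTextDecoder)); // [ '6261', 'fffd' ]
console.log(codePoints(viaNodeBuffer));  // [ '6261' ]
```

TextDecoder replaces the dangling byte with U+FFFD, while Node's Buffer silently drops it.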

martinheidegger · 2021-01-11T05:19:14Z

Great initiative @mischnic. I am using Buffer as a drop-in replacement for Node's version. Changing the tests wouldn't work for me, because Buffer could then no longer be used for that. Would it be possible to adjust the output of decoderUTF8.decode(buf.slice(start, end)) to account for these cases? Maybe remove the "�" with utf16le and replace 0x13 with 0x96 with the latin1 encoding?
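For the utf16le half of this suggestion, one possible approach (a sketch, not something this PR implements) is to avoid producing the "�" in the first place by dropping a dangling odd byte before decoding. The function name `utf16leToString` is hypothetical, and other differences from Node, such as unpaired surrogates or a leading byte order mark, are not handled here.

```javascript
const decoderUTF16 = new TextDecoder('utf-16le');

// Hypothetical helper: decode as UTF-16LE but ignore a trailing odd byte,
// so that no U+FFFD is appended for it.
function utf16leToString(bytes) {
  const evenLength = bytes.length - (bytes.length % 2);
  return decoderUTF16.decode(bytes.subarray(0, evenLength));
}

const odd = Uint8Array.from([0x61, 0x62, 0x63]);
console.log(utf16leToString(odd) === Buffer.from(odd).toString('utf16le')); // true
```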

mischnic · 2021-01-11T09:39:10Z

I only adjusted those tests that were apparently deviating from Node's Buffer. For example, try running this:

> new Buffer([0xF4, 0x8F, 0x80]).toString().length
1

so this test was apparently wrong:

  t.equal(
    new B([0xF4, 0x8F, 0x80]).toString(),
    '\uFFFD\uFFFD\uFFFD'
  )

I only used TextDecoder for utf8 because it seems to align with Buffer.toString("utf8"). The handling of the other encodings (utf16, latin1) is still the same.
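The truncated sequence from the test above can also be checked against TextDecoder directly. A minimal sketch: the three bytes are the start of a four-byte UTF-8 sequence whose last byte is missing, and the decoder replaces the whole incomplete sequence with a single U+FFFD rather than one per byte.

```javascript
// F4 8F 80 is the beginning of a 4-byte sequence with the final byte missing.
const truncated = Uint8Array.from([0xf4, 0x8f, 0x80]);

const decoded = new TextDecoder('utf-8').decode(truncated);

console.log(decoded.length);       // 1
console.log(decoded === '\uFFFD'); // true
```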

mischnic · 2021-06-05T22:45:08Z

@feross ?

mischnic · 2021-12-08T10:16:44Z

There is apparently some break-even point where TextDecoder becomes faster than the existing implementation:

Using node perf/readUtf8.js, testing 256-byte buffers and new Buffer('7c'.repeat(5e7), 'hex') for the "big" variants:

master:
	BrowserBuffer#readUtf8 x 414,259 ops/sec ±2.91% (85 runs sampled)
	NodeBuffer#readUtf8 x 486,114 ops/sec ±3.01% (84 runs sampled)
	BrowserBuffer#readUtf8 big x 0.98 ops/sec ±5.58% (7 runs sampled)
	NodeBuffer#readUtf8 big x 34.31 ops/sec ±1.56% (58 runs sampled)

this:
	BrowserBuffer#readUtf8 x 195,525 ops/sec ±2.06% (86 runs sampled)
	NodeBuffer#readUtf8 x 486,587 ops/sec ±2.24% (79 runs sampled)
	BrowserBuffer#readUtf8 big x 18.61 ops/sec ±11.77% (38 runs sampled)
	NodeBuffer#readUtf8 big x 35.19 ops/sec ±1.76% (61 runs sampled)
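Given these numbers (slower for 256-byte buffers, much faster for the big ones), one way to get the best of both is to pick the decoder by size. The sketch below only illustrates that idea: the threshold value is a made-up placeholder that would have to be found by benchmarking, `makeUtf8ToString` and `decodeSmall` are hypothetical names, and the Node Buffer call merely stands in for the existing pure-JS decoder.

```javascript
const decoderUTF8 = new TextDecoder('utf-8', { ignoreBOM: true });

// Placeholder value, NOT measured; the real break-even point needs benchmarking.
const TEXT_DECODER_THRESHOLD = 4096;

// Build a toString helper that uses decodeSmall for short inputs and
// TextDecoder once the input reaches the threshold.
function makeUtf8ToString(decodeSmall, threshold = TEXT_DECODER_THRESHOLD) {
  return function utf8ToString(bytes, start = 0, end = bytes.length) {
    const view = bytes.subarray(start, end);
    return view.length >= threshold
      ? decoderUTF8.decode(view)
      : decodeSmall(view);
  };
}

// Stand-in for the existing implementation, used here only so the sketch runs.
const decodeSmall = (view) => Buffer.from(view).toString('utf8');
const utf8ToString = makeUtf8ToString(decodeSmall);

console.log(utf8ToString(new TextEncoder().encode('héllo')));         // 'héllo'
console.log(utf8ToString(new Uint8Array(10000).fill(0x7c)).length);   // 10000
```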

mischnic force-pushed the textdecoder branch from fa2190e to aa8e06e Compare January 7, 2021 12:48

mischnic force-pushed the textdecoder branch from aa8e06e to 99d4fd5 Compare June 5, 2021 23:35

mischnic force-pushed the textdecoder branch from 99d4fd5 to cd0c859 Compare December 8, 2021 11:53

Use TextDecoder for big toString('utf8')

b17b5e2

mischnic force-pushed the textdecoder branch from cd0c859 to b17b5e2 Compare November 25, 2023 14:58
