Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Problem about message missing when write CJK string #139

Closed
hicdre opened this issue Apr 12, 2022 · 10 comments · Fixed by #188
Closed

Problem about message missing when write CJK string #139

hicdre opened this issue Apr 12, 2022 · 10 comments · Fixed by #188

Comments

@hicdre
Copy link

hicdre commented Apr 12, 2022

const SonicBoom = require('sonic-boom');
const sonic = new SonicBoom({ fd: process.stdout.fd }); 

sonic.write('测试一二三四五六七八九十\n');
sonic.write('零\n');  // this is missing in stdout

console.log('finish');

The problem is about line below

sonic-boom/index.js

Lines 164 to 165 in 7b4e25f

const n = fs.writeSync(this.fd, this._writingBuf, 'utf8')
this._len -= n

n is the byte length of this._writingBuf, but it may not equals the characters length of the string which actually encoded in utf16. So this._len would be less than zero. It will lead to message missing.

console.assert('测试一二三四五六七八九十\n'.length === 13);
console.assert(Buffer.from('测试一二三四五六七八九十\n', 'utf8').length === 37);
@ronag
Copy link
Contributor

ronag commented Apr 12, 2022

Good catch! @mcollina I don't see a good way to fix this without some form of perf regression. I think we have 2 options:

In write we:

  1. Buffer.byteLength(str)
  2. Always convert to Buffer in write.

@jsumners
Copy link
Member

This module's readme clearly states it is a utf8 only stream module.

@ronag
Copy link
Contributor

ronag commented Apr 12, 2022

All strings in javascript are utf-16...

@jsumners
Copy link
Member

All strings in javascript are utf-16...

So they are https://262.ecma-international.org/11.0/#sec-ecmascript-language-types-string-type

@hicdre
Copy link
Author

hicdre commented Apr 12, 2022

pinojs/pino-pretty#324

@zbinlin
Copy link

zbinlin commented Jul 24, 2022

Can you fix this issue? @mcollina @ronag @jsumners
I use Fastify v4, it will cause the log not to be flushed.

@mcollina
Copy link
Member

That's unlikely to happen in the short term. I'd be happy to see a PR that fixes it (without regressing the throughput).

@mcollina
Copy link
Member

Here is a PR with a partial fix: #154

@mcollina
Copy link
Member

The next step would be to convert this._writingBuf from being a String to being a Buffer.

@jsumners
Copy link
Member

For whomever tackles this by swapping out the String backing to a Buffer, keep in mind that there are multiple types of multi-byte utf-8 characters. See https://github.com/ldapjs/filter/blob/1423612281ccd43fb5f45dc9b534d91969d3cc6e/lib/utils/escape-filter-value.js for some code that covers 2 and 3 byte characters.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants