doc: improve Buffer's encoding documentation
- Add a paragraph about case-insensitivity of encoding options.
- Document "utf-8", "utf-16le" and "ucs-2" aliases.
- Always use "utf8" in documentation for defaults and examples.

PR-URL: #37945
Reviewed-By: Antoine du Hamel <duhamelantoine1995@gmail.com>
Reviewed-By: Colin Ihrig <cjihrig@gmail.com>
Reviewed-By: James M Snell <jasnell@gmail.com>
Reviewed-By: Luigi Pinca <luigipinca@gmail.com>
targos authored and MylesBorins committed Apr 4, 2021
1 parent f3fabb5 commit a57dc06
Showing 3 changed files with 20 additions and 16 deletions.
30 changes: 17 additions & 13 deletions doc/api/buffer.md
@@ -79,17 +79,21 @@ console.log(Buffer.from('fhqwhgads', 'utf16le'));
// Prints: <Buffer 66 00 68 00 71 00 77 00 68 00 67 00 61 00 64 00 73 00>
```

+Node.js buffers accept all case variations of encoding strings that they
+receive. For example, UTF-8 can be specified as `'utf8'`, `'UTF8'` or `'uTf8'`.
+
The character encodings currently supported by Node.js are the following:

-* `'utf8'`: Multi-byte encoded Unicode characters. Many web pages and other
-document formats use [UTF-8][]. This is the default character encoding.
-When decoding a `Buffer` into a string that does not exclusively contain
-valid UTF-8 data, the Unicode replacement character `U+FFFD` � will be used
-to represent those errors.
+* `'utf8'` (alias: `'utf-8'`): Multi-byte encoded Unicode characters. Many web
+pages and other document formats use [UTF-8][]. This is the default character
+encoding. When decoding a `Buffer` into a string that does not exclusively
+contain valid UTF-8 data, the Unicode replacement character `U+FFFD` � will be
+used to represent those errors.

-* `'utf16le'`: Multi-byte encoded Unicode characters. Unlike `'utf8'`, each
-character in the string will be encoded using either 2 or 4 bytes.
-Node.js only supports the [little-endian][endianness] variant of [UTF-16][].
+* `'utf16le'` (alias: `'utf-16le'`): Multi-byte encoded Unicode characters.
+Unlike `'utf8'`, each character in the string will be encoded using either 2
+or 4 bytes. Node.js only supports the [little-endian][endianness] variant of
+[UTF-16][].

* `'latin1'`: Latin-1 stands for [ISO-8859-1][]. This character encoding only
supports the Unicode characters from `U+0000` to `U+00FF`. Each character is
@@ -132,11 +136,11 @@ The following legacy character encodings are also supported:
* `'binary'`: Alias for `'latin1'`. See [binary strings][] for more background
on this topic. The name of this encoding can be very misleading, as all of the
encodings listed here convert between strings and binary data. For converting
-between strings and `Buffer`s, typically `'utf-8'` is the right choice.
+between strings and `Buffer`s, typically `'utf8'` is the right choice.

-* `'ucs2'`: Alias of `'utf16le'`. UCS-2 used to refer to a variant of UTF-16
-that did not support characters that had code points larger than U+FFFF.
-In Node.js, these code points are always supported.
+* `'ucs2'`, `'ucs-2'`: Aliases of `'utf16le'`. UCS-2 used to refer to a variant
+of UTF-16 that did not support characters that had code points larger than
+U+FFFF. In Node.js, these code points are always supported.

```js
Buffer.from('1ag', 'hex');
@@ -900,7 +904,7 @@ Returns `true` if `encoding` is the name of a supported character encoding,
or `false` otherwise.

```js
-console.log(Buffer.isEncoding('utf-8'));
+console.log(Buffer.isEncoding('utf8'));
// Prints: true

console.log(Buffer.isEncoding('hex'));
4 changes: 2 additions & 2 deletions doc/api/http.md
@@ -2323,7 +2323,7 @@ changes:
-->

* `chunk` {string | Buffer}
-* `encoding` {string} Optional, **Default**: `utf-8`
+* `encoding` {string} Optional, **Default**: `utf8`
* `callback` {Function} Optional
* Returns: {this}

@@ -2565,7 +2565,7 @@ changes:
-->

* `chunk` {string | Buffer}
-* `encoding` {string} **Default**: `utf-8`
+* `encoding` {string} **Default**: `utf8`
* `callback` {Function}
* Returns {boolean}

2 changes: 1 addition & 1 deletion doc/api/stream.md
@@ -2759,7 +2759,7 @@ const fs = require('fs');

pipeline(
fs.createReadStream('object.json')
-.setEncoding('utf-8'),
+.setEncoding('utf8'),
new Transform({
decodeStrings: false, // Accept string input rather than Buffers
construct(callback) {
