Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

perf(web): optimize single pass utf8 decoding #16593

Merged
merged 6 commits into from Nov 11, 2022
Merged
Show file tree
Hide file tree
Changes from 5 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Jump to
Jump to file
Failed to load files.
Diff view
Diff view
61 changes: 35 additions & 26 deletions ext/web/08_text_encoding.js
Expand Up @@ -16,14 +16,14 @@
const ops = core.ops;
const webidl = window.__bootstrap.webidl;
const {
ArrayBufferIsView,
ObjectPrototypeIsPrototypeOf,
PromiseReject,
PromiseResolve,
StringPrototypeCharCodeAt,
StringPrototypeSlice,
TypedArrayPrototypeSubarray,
Uint8Array,
ObjectPrototypeIsPrototypeOf,
ArrayBufferIsView,
Uint32Array,
} = window.__bootstrap.primordials;

Expand All @@ -34,6 +34,8 @@
#fatal;
/** @type {boolean} */
#ignoreBOM;
/** @type {boolean} */
#utf8SinglePass;

/** @type {number | null} */
#rid = null;
Expand All @@ -56,6 +58,7 @@
this.#encoding = encoding;
this.#fatal = options.fatal;
this.#ignoreBOM = options.ignoreBOM;
this.#utf8SinglePass = encoding === "utf-8" && !options.fatal;
this[webidl.brand] = webidl.brand;
}

Expand All @@ -81,7 +84,7 @@
* @param {BufferSource} [input]
* @param {TextDecodeOptions} options
*/
decode(input = new Uint8Array(), options = {}) {
decode(input = new Uint8Array(), options = undefined) {
webidl.assertBranded(this, TextDecoderPrototype);
const prefix = "Failed to execute 'decode' on 'TextDecoder'";
if (input !== undefined) {
Expand All @@ -91,40 +94,46 @@
allowShared: true,
});
}
options = webidl.converters.TextDecodeOptions(options, {
prefix,
context: "Argument 2",
});
let stream = false;
if (options !== undefined) {
options = webidl.converters.TextDecodeOptions(options, {
prefix,
context: "Argument 2",
});
stream = options.stream;
}
Comment on lines +97 to +104
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should come up with a general improvement in webidl for cases like this. I think the optimal way would require new Function codegen.

Copy link
Contributor

@vimirage vimirage Nov 11, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For what it's worth, Evan (Discord) had been saying that for over a year. The entire WebIDL bindings setup should be via JiT-compiled codegen.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah :(

never too late though


try {
try {
if (ArrayBufferIsView(input)) {
input = new Uint8Array(
input.buffer,
input.byteOffset,
input.byteLength,
);
} else {
input = new Uint8Array(input);
}
} catch {
// If the buffer is detached, just create a new empty Uint8Array.
input = new Uint8Array();
}
// Note from spec: implementations are strongly encouraged to use an implementation strategy that avoids this copy.
// When doing so they will have to make sure that changes to input do not affect future calls to decode().
if (
ObjectPrototypeIsPrototypeOf(
SharedArrayBuffer.prototype,
Comment on lines 110 to 111
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
ObjectPrototypeIsPrototypeOf(
SharedArrayBuffer.prototype,
ObjectPrototypeIsPrototypeOf(
SharedArrayBufferPrototype,

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we please defer these primordials changes? I don't want to make the code any slower

Copy link
Contributor

@vimirage vimirage Nov 11, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure; I'll introduce these next month, myself.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lets see what others think. i personally don't like primordials in fast paths for mitigating a hypothetical case. Node.js has also removed primoridals in hot paths previously nodejs/node#38248

input.buffer,
input || input.buffer,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is polymorphic

)
) {
// We clone the data into a non-shared ArrayBuffer so we can pass it
// to Rust.
// `input` is now a Uint8Array, and calling the TypedArray constructor
// with a TypedArray argument copies the data.
input = new Uint8Array(input);
if (ArrayBufferIsView(input)) {
input = new Uint8Array(
input.buffer,
input.byteOffset,
input.byteLength,
Comment on lines +121 to +123
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well... now these are all polymorphic...

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

wasn't this already the case before this patch?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed! ...No one fixed it

);
} else {
input = new Uint8Array(input);
}
}

if (!options.stream && this.#rid === null) {
// Fast path for single pass encoding.
if (!stream && this.#rid === null) {
// Fast path for utf8 single pass encoding.
if (this.#utf8SinglePass) {
return ops.op_encoding_decode_utf8(input, this.#ignoreBOM);
}

return ops.op_encoding_decode_single(
input,
this.#encoding,
Expand All @@ -140,9 +149,9 @@
this.#ignoreBOM,
);
}
return ops.op_encoding_decode(input, this.#rid, options.stream);
return ops.op_encoding_decode(input, this.#rid, stream);
} finally {
if (!options.stream && this.#rid !== null) {
if (!stream && this.#rid !== null) {
core.close(this.#rid);
this.#rid = null;
}
Expand Down
34 changes: 34 additions & 0 deletions ext/web/lib.rs
Expand Up @@ -91,6 +91,7 @@ pub fn init<P: TimersPermission + 'static>(
op_base64_btoa::decl(),
op_encoding_normalize_label::decl(),
op_encoding_decode_single::decl(),
op_encoding_decode_utf8::decl(),
op_encoding_new_decoder::decl(),
op_encoding_decode::decl(),
op_encoding_encode_into::decl(),
Expand Down Expand Up @@ -179,6 +180,39 @@ fn op_encoding_normalize_label(label: String) -> Result<String, AnyError> {
Ok(encoding.name().to_lowercase())
}

#[op(v8)]
fn op_encoding_decode_utf8<'a>(
scope: &mut v8::HandleScope<'a>,
zero_copy: &[u8],
ignore_bom: bool,
) -> Result<serde_v8::Value<'a>, AnyError> {
let buf = &zero_copy;

let buf = if !ignore_bom
&& buf.len() >= 3
&& buf[0] == 0xef
&& buf[1] == 0xbb
&& buf[2] == 0xbf
{
&buf[3..]
} else {
buf
};

// If `String::new_from_utf8()` returns `None`, this means that the
// length of the decoded string would be longer than what V8 can
// handle. In this case we return `RangeError`.
//
// For more details see:
// - https://encoding.spec.whatwg.org/#dom-textdecoder-decode
// - https://github.com/denoland/deno/issues/6649
// - https://github.com/v8/v8/blob/d68fb4733e39525f9ff0a9222107c02c28096e2a/include/v8.h#L3277-L3278
match v8::String::new_from_utf8(scope, buf, v8::NewStringType::Normal) {
littledivy marked this conversation as resolved.
Show resolved Hide resolved
littledivy marked this conversation as resolved.
Show resolved Hide resolved
Some(text) => Ok(serde_v8::from_v8(scope, text.into())?),
None => Err(type_error("buffer exceeds maximum length")),
littledivy marked this conversation as resolved.
Show resolved Hide resolved
}
}

#[op]
fn op_encoding_decode_single(
data: &[u8],
Expand Down
37 changes: 24 additions & 13 deletions ops/lib.rs
Expand Up @@ -910,22 +910,33 @@ fn codegen_u8_slice(core: &TokenStream2, idx: usize) -> TokenStream2 {
let value = args.get(#idx as i32);
match #core::v8::Local::<#core::v8::ArrayBuffer>::try_from(value) {
Ok(b) => {
let store = b.data() as *mut u8;
// SAFETY: rust guarantees that lifetime of slice is no longer than the call.
unsafe { ::std::slice::from_raw_parts_mut(store, b.byte_length()) }
// Handles detached buffers.
let byte_length = b.byte_length();
if byte_length == 0 {
&mut []
} else {
let store = b.data() as *mut u8;
// SAFETY: rust guarantees that lifetime of slice is no longer than the call.
unsafe { ::std::slice::from_raw_parts_mut(store, byte_length) }
}
littledivy marked this conversation as resolved.
Show resolved Hide resolved
},
Err(_) => {
if let Ok(view) = #core::v8::Local::<#core::v8::ArrayBufferView>::try_from(value) {
let (offset, len) = (view.byte_offset(), view.byte_length());
let buffer = match view.buffer(scope) {
Some(v) => v,
None => {
return #core::_ops::throw_type_error(scope, format!("Expected ArrayBufferView at position {}", #idx));
}
};
let store = buffer.data() as *mut u8;
// SAFETY: rust guarantees that lifetime of slice is no longer than the call.
unsafe { ::std::slice::from_raw_parts_mut(store.add(offset), len) }
let len = view.byte_length();
if len == 0 {
&mut []
} else {
let offset = view.byte_offset();
let buffer = match view.buffer(scope) {
Some(v) => v,
None => {
return #core::_ops::throw_type_error(scope, format!("Expected ArrayBufferView at position {}", #idx));
}
};
let store = buffer.data() as *mut u8;
// SAFETY: rust guarantees that lifetime of slice is no longer than the call.
unsafe { ::std::slice::from_raw_parts_mut(store.add(offset), len) }
littledivy marked this conversation as resolved.
Show resolved Hide resolved
}
} else {
return #core::_ops::throw_type_error(scope, format!("Expected ArrayBufferView at position {}", #idx));
}
Expand Down