Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Interning internal enum representations and field names #3468

Open
grovesNL opened this issue Jun 6, 2023 · 16 comments · May be fixed by #3915
Open

Interning internal enum representations and field names #3468

grovesNL opened this issue Jun 6, 2023 · 16 comments · May be fixed by #3915

Comments

@grovesNL
Copy link
Contributor

grovesNL commented Jun 6, 2023

Motivation

We use a lot of enums for WebGPU at high frequency, e.g. https://docs.rs/web-sys/0.3.63/web_sys/struct.GpuRenderPassDepthStencilAttachment.html#method.stencil_load_op

These enums are represented as strings in the WebIDL, e.g.

enum GPULoadOp {
"load",
"clear"
};

This is problematic because we end up going through JsValue::from_str and creating a significant amount of temporary string allocations for both string enum values and field names.

Proposed Solution

Values like load and clear in the example above could be internally interned (i.e., cached on the JS side) so we don't create temporary strings for these.

I don't know how we'd expose this option, but ideally we'd want to be able to control whether string enum values and field names are interned somehow. This might need to be fine-grained so that only high-frequency enums/field names are interned.

We could consider lazy-loading all string enum values and field names into a cache for interning but that might be excessive.

Alternatives

We could use wasm-bindgen's existing enable-interning feature somehow but we can't use the intern function on enums or field names since they're generated internally in the WebIDL bindings.

We could also consider internally representing these as something other than a string (e.g., remapping to u32 and mapping it back to a string on the JS side).

Additional Context

This adds significant overhead in real use cases so it's not a micro-optimization. I raised this to the WebGPU specification in the early days (hoping we could use u32 instead to avoid the string overhead). We're seeing the performance problems show up in projects now since wasm host bindings aren't available yet.

I'd guess this affects a lot of other generated bindings that are called at high frequencies too (e.g., DOM access, audio) so this could improve performance for a lot of projects, especially if they're sensitive to GC hitching (like most WebGPU applications).

@daxpedda
Copy link
Collaborator

daxpedda commented Jun 6, 2023

We could consider lazy-loading all string enum values and field names into a cache for interning but that might be excessive.

That's the solution I came up with immediately as well. It doesn't sound too excessive to be honest, considering applications that use WebGPU are probably not small.

I'm not sure if there were better proposals in the past, but we could of course change the way bindings like this are generated to be in JS instead of using Reflect, though this would blow up the bindings of course, I assume this tradeoff was made in the past for a reason.
It could make sense to implement some more fine-grained control over this in the macro or do it for enums only.

@grovesNL
Copy link
Contributor Author

grovesNL commented Jun 6, 2023

It doesn't sound too excessive to be honest, considering applications that use WebGPU are probably not small.

I was thinking we'd apply the solution here across all WebIDL bindings instead of just WebGPU, or at least be able to opt-in to it per-WebIDL file or something. I think the string intern cache could be pretty big depending on where we apply it

@daxpedda
Copy link
Collaborator

daxpedda commented Jun 6, 2023

I was thinking we'd apply the solution here across all WebIDL bindings instead of just WebGPU, or at least be able to opt-in to it per-WebIDL file or something.

I don't really think that's an option, it has to be opt-in for users in my opinion.


I did a bit more research on this and I'm not sure why exactly it was decided to implement dictionaries like this, it doesn't seem like a good trade-off to me. I would absolutely be in favor of implementing dictionaries without Reflect, unless I'm missing something here.

@daxpedda
Copy link
Collaborator

daxpedda commented Jun 6, 2023

Looking at the original implementation #702 it wasn't discussed.

Probably the best solution forward would be to just implement dictionaries with the wasm_bindgen macro instead of using Reflect. This shouldn't be that hard to implement, just some changes in the WebIDL generator, but is a big change and I don't feel comfortable enough making this call on my own, so I would like at least another maintainer to chime in. But it's an idea that I would approve of so far.

On that note, nothing is really stopping you from doing this already in wgpu, you can just generate the bindings yourself instead of relying on web-sys. But obviously improving this in web-sys is a desirable goal.

Enums are a bit more involved. This probably requires some adjustments in the wasm_bindgen macro and obviously some work to generate the JS bindings to support it. Again, I would be in favor of that improvement.

All of that should be possible without a breaking change. Happy to provide guidance to anybody wanting to work on this.

@grovesNL
Copy link
Contributor Author

grovesNL commented Jun 6, 2023

I don't really think that's an option, it has to be opt-in for users in my opinion.

Right, this is why I was thinking it might be excessive if it's only useful for high frequency APIs.

Probably the best solution forward would be to just implement dictionaries with the wasm_bindgen macro instead of using Reflect.

What are the benefits of going through the macro vs. interning the strings (e.g. wrapping field names and enum values in intern(value) in the generated bindings)?

On that note, nothing is really stopping you from doing this already in wgpu, you can just generate the bindings yourself instead of relying on web-sys. But obviously improving this in web-sys is a desirable goal.

Some projects already use the web-sys types directly and they're part of wgpu's public API, so we can't really do this in a practical way without slightly breaking the existing ecosystem.

Happy to provide guidance to anybody wanting to work on this.

Could you outline the rough steps here a bit? I've changed the WebIDL generator before but not sure how involved this change would be.

@daxpedda
Copy link
Collaborator

daxpedda commented Jun 6, 2023

What are the benefits of going through the macro vs. interning the strings (e.g. wrapping field names and enum values in intern(value) in the generated bindings)?

Going through the macro will generate JS bindings, so instead of using Reflect, which accesses properties by a string which you have to pass to JS, it will just call a generated JS function that accesses the property in JS. Meaning there won't be any strings involved at all, except parameters, e.g. enums or string parameters. I outlined how to remove the whole string debacle from enums above as well.

Could you outline the rough steps here a bit? I've changed the WebIDL generator before but not sure how involved this change would be.

Currently the WebIDL generator converts this:

dictionary Foo {
    float bar;
};

... roughly into this:

#[wasm_bindgen]
extern "C" {
    #[wasm_bindgen(extends = Object)]
    type Foo;
}

impl Foo {
    fn bar(&mut self, val: f32) -> &mut Self {
        Reflect::set(&self, "bar".into(), val.into());
        self
    }
}

What I'm proposing is to let the wasm_bindgen macro actually do it's job and instead generate the following:

#[wasm_bindgen]
extern "C" {
    #[wasm_bindgen(extends = Object)]
    type Foo;

    // don't expose this method
    #[wasm_bindgen(method, getter, js_name = bar)]
    fn bar_internal(this: &Foo, val: f32);
}

// here we wrap the method above to keep the interface the same; no breaking changes after all
impl Foo {
    fn bar(&mut self, val: f32) -> &mut Self {
        self.bar_internal(val);
        self
    }
}

Though as I said above, I would like to get the input of another maintainer first.

@Liamolucko
Copy link
Collaborator

Though as I said above, I would like to get the input of another maintainer first.

This all sounds reasonable to me 👍

Also, I looked into the string cache a bit and it would be able to work with the current design if not for the fact that JsValue::from_str doesn't currently use it; it's only used when directly passing an &str over the ABI.

You don't need to call intern every time you pass a string to JS, you just need to call it once with a string and it will then be cached every time you pass it to JS from then on. So, if JsValue::from_str were to be fixed to use the cache, you could just call intern("load") once, and from then on passing GpuLoadOp::Load to JS would load the string from the cache every time.

@daxpedda
Copy link
Collaborator

daxpedda commented Jun 7, 2023

@grovesNL I think we crystallized a couple of solutions here for you:

  1. Modify the WebIDL generator to generate proper JS bindings for dictionaries, see Interning internal enum representations and field names #3468 (comment). This should significantly reduce the issue already.
  2. Fix JsValue::from_str() to use the cache, which would allow you to cache all the enum strings you need in wgpu beforehand. You can also use this to improve the calls to dictionaries without 1., though 1. is easy enough and is a way more comprehensive solution.
  3. Support enums in #[wasm_bindgen] extern "C" { ... }, which would generate JS bindings for enums, completely avoiding strings altogether.

I think solution 1. and 2. are easy enough that any Rust contributor can tackle them, 3. is much more involved, but definitely not impossible for somebody new to wasm-bindgen. Happy to provide guidance for the implementation 😃.

@grovesNL
Copy link
Contributor Author

@daxpedda ok great! In that case solution 3 sounds like the proper fix in the long term. Could you provide any implementation guidance in this thread? I'd be happy to start looking into it.

@daxpedda
Copy link
Collaborator

daxpedda commented Jun 13, 2023

Solution 1. is probably what you should start with. Solution 3. only replaces solution 2.

--

Solution 1.

Should be fairly easy, you just need to change what the WebIDL generator outputs in the case of dictionaries. So this looks like it is already everything you need to change.

For details on what you should actually let the WebIDL generator generate, look at #3468 (comment).

Solution 3.

Is more complicated and involves two parts.
You have to implement enum support to #[wasm_bindgen] extern "C".

  1. First you probably have to learn how types are "described" to the CLI. So you should read this part in the wasm-bindgen Guide.
  2. Implement a "description" for enums, probably starting somewhere here.
  3. Generate proper JS somewhere here.
  4. Let the proc-macro do the whole job here.

Probably what you want to end up with is just passing through an integer. The enum should generate a JS function that turns this integer into the proper string, which is then called. To generate the enum JS function, you will have to encode the variant strings into the "description".

The second part is just again, like in solution 1., hooking it up into the WebIDL generator to use the feature you just implemented.


If you get blocked on anything please feel free to just contact me on Matrix.

@Liamolucko
Copy link
Collaborator

  1. You have to implement enum support to #[wasm_bindgen] extern "C".

#[wasm_bindgen] already supports enums, and web-sys is already using them, it's just that that support currently consists of turning them into JsValues before passing them over the ABI. So they're pretty much entirely handled by the macro right now:

impl ToTokens for ast::ImportEnum {
fn to_tokens(&self, tokens: &mut TokenStream) {
let vis = &self.vis;
let name = &self.name;
let expect_string = format!("attempted to convert invalid {} into JSValue", name);
let variants = &self.variants;
let variant_strings = &self.variant_values;
let attrs = &self.rust_attrs;
let mut current_idx: usize = 0;
let variant_indexes: Vec<Literal> = variants
.iter()
.map(|_| {
let this_index = current_idx;
current_idx += 1;
Literal::usize_unsuffixed(this_index)
})
.collect();
// Borrow variant_indexes because we need to use it multiple times inside the quote! macro
let variant_indexes_ref = &variant_indexes;
// A vector of EnumName::VariantName tokens for this enum
let variant_paths: Vec<TokenStream> = self
.variants
.iter()
.map(|v| quote!(#name::#v).into_token_stream())
.collect();
// Borrow variant_paths because we need to use it multiple times inside the quote! macro
let variant_paths_ref = &variant_paths;
let wasm_bindgen = &self.wasm_bindgen;
(quote! {
#(#attrs)*
#vis enum #name {
#(#variants = #variant_indexes_ref,)*
#[automatically_derived]
#[doc(hidden)]
__Nonexhaustive,
}
#[automatically_derived]
impl #name {
fn from_str(s: &str) -> Option<#name> {
match s {
#(#variant_strings => Some(#variant_paths_ref),)*
_ => None,
}
}
fn to_str(&self) -> &'static str {
match self {
#(#variant_paths_ref => #variant_strings,)*
#name::__Nonexhaustive => panic!(#expect_string),
}
}
#vis fn from_js_value(obj: &#wasm_bindgen::JsValue) -> Option<#name> {
obj.as_string().and_then(|obj_str| Self::from_str(obj_str.as_str()))
}
}
// It should really be using &str for all of these, but that requires some major changes to cli-support
#[automatically_derived]
impl #wasm_bindgen::describe::WasmDescribe for #name {
fn describe() {
<#wasm_bindgen::JsValue as #wasm_bindgen::describe::WasmDescribe>::describe()
}
}
#[automatically_derived]
impl #wasm_bindgen::convert::IntoWasmAbi for #name {
type Abi = <#wasm_bindgen::JsValue as #wasm_bindgen::convert::IntoWasmAbi>::Abi;
#[inline]
fn into_abi(self) -> Self::Abi {
<#wasm_bindgen::JsValue as #wasm_bindgen::convert::IntoWasmAbi>::into_abi(self.into())
}
}
#[automatically_derived]
impl #wasm_bindgen::convert::FromWasmAbi for #name {
type Abi = <#wasm_bindgen::JsValue as #wasm_bindgen::convert::FromWasmAbi>::Abi;
unsafe fn from_abi(js: Self::Abi) -> Self {
let s = <#wasm_bindgen::JsValue as #wasm_bindgen::convert::FromWasmAbi>::from_abi(js);
#name::from_js_value(&s).unwrap_or(#name::__Nonexhaustive)
}
}
#[automatically_derived]
impl #wasm_bindgen::convert::OptionIntoWasmAbi for #name {
#[inline]
fn none() -> Self::Abi { <::js_sys::Object as #wasm_bindgen::convert::OptionIntoWasmAbi>::none() }
}
#[automatically_derived]
impl #wasm_bindgen::convert::OptionFromWasmAbi for #name {
#[inline]
fn is_none(abi: &Self::Abi) -> bool { <::js_sys::Object as #wasm_bindgen::convert::OptionFromWasmAbi>::is_none(abi) }
}
#[automatically_derived]
impl From<#name> for #wasm_bindgen::JsValue {
fn from(obj: #name) -> #wasm_bindgen::JsValue {
#wasm_bindgen::JsValue::from(obj.to_str())
}
}
}).to_tokens(tokens);
}
}

Everything else you said is right, it's just that the existing implementation needs to be modified rather than making a new one.

@daxpedda
Copy link
Collaborator

#[wasm_bindgen] already supports enums, and web-sys is already using them, it's just that that support currently consists of turning them into JsValues before passing them over the ABI.

So my thinking was that we might want to leave whatever #[wasm_bindgen] enum Test { ... } is doing right now in place and introduce a new #[wasm_bindgen] extern "C" { enum Test { ... } }.

After all they do represent two different things, export vs import. On the other hand this might be pretty useless, as JS isn't typesafe in this way and enums don't actually exist like this per se.

WDYT?

@Liamolucko
Copy link
Collaborator

So my thinking was that we might want to leave whatever #[wasm_bindgen] enum Test { ... } is doing right now in place and introduce a new #[wasm_bindgen] extern "C" { enum Test { ... } }.

After all they do represent two different things, export vs import. On the other hand this might be pretty useless, as JS isn't typesafe in this way and enums don't actually exist like this per se.

While #[wasm_bindgen] enum Test { ... } should represent exporting an enum, it doesn't in the case of string enums; it actually represents importing them. Nothing gets exported to JS, and it's internally represented as an import. I agree that the extern "C" syntax would make more sense, but even if we were to add that it should share the implementation; I don't want to have two different features that do the exact same thing.

Applying #[wasm_bindgen] to a non-string enum, on the other hand, does actually export it.

@daxpedda
Copy link
Collaborator

I was not aware of this behavior ...
Sounds good to me, I don't think extending proc-macro support to enums in extern "C" is even necessary at all then.

@Davidster
Copy link
Contributor

Is this currently being worked on? If no, would the info in this issue be enough for a wasm bindgen noob to work on it? I wouldn't mind giving it a shot. It's the currently the biggest perf bottleneck in my engine's web target.

@daxpedda
Copy link
Collaborator

See #3468 (comment) for a summary of what needs to be done.

The 1st solution outlined there should be pretty easy to implement.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants