Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[JS] Reduce WASM call overhead #1545

Open
Omxjep3434 opened this issue Jan 10, 2024 · 6 comments
Open

[JS] Reduce WASM call overhead #1545

Omxjep3434 opened this issue Jan 10, 2024 · 6 comments

Comments

@Omxjep3434
Copy link

Omxjep3434 commented Jan 10, 2024

The problem

It seems that Yoga API calls are relatively expensive because of WASM overhead. I've noticed that getting all computed values of a tree takes almost as long as the initial Yoga layout calculation. Correct me if I'm wrong, but I believe this is because of the barrier between JS and WASM. To get all computed values, 13 API calls must be made for each node: getComputedPadding/getComputedMargin/getComputedBorder must all be called 4 times for each edge. getComputedLayout must also be called.

My use case

Every time Yoga executes, I must get all computed values and transfer them to my custom layout system.

Benchmarks

I did a few benchmarks by setting up a tree with N nodes, and then getting the computed values of each node. Environment: Windows 10, Chrome 120.0.6099.200, Yoga 2.0.1. The benchmarks in Firefox have lower times, but the ratio between Yoga execution and getting the computed results is the same, if not worse.

100 nodes:

  • Yoga execution: 1.1ms
  • Getting all computed values (13 API calls): 1.3ms
  • Calling getComputedLayout() only: 0.6ms

625 nodes:

  • Yoga execution: 4.8ms
  • Getting all computed values (13 API calls): 4.4ms
  • Calling getComputedLayout() only: 2ms

2,500 nodes:

  • Yoga execution: 18ms
  • Getting all computed values (13 API calls): 14ms
  • Calling getComputedLayout() only: 6.5ms

API proposal

I'm proposing a general getComputedValues() method that returns an object that includes all computed values for all edges so that the computed values of a node can be retrieved with only 1 API call. To make the method more versatile, there could be parameters that specify which values and edges to return. Perhaps a 'recursive' option could be added as well so that the computed values of an entire subtree could be returned with 1 API call.

@nicoburns
Copy link
Contributor

This seems like a sensible proposal to me. Crossing the WASM boundary is indeed expensive. And in my testing, batching calls helps significantly (esp. in Firefox, which was already faster than Chrome). I have some performance experiments with WASM builds of Taffy and Yoga on this branch DioxusLabs/taffy#394

It's also the case that Chrome's WASM implementation is just generally not very optimised compared to the one in Firefox. So if you're testing in Chrome then there is probably room for optimisation in the WASM-implementation itself.

@NickGerleman
Copy link
Contributor

NickGerleman commented Jan 11, 2024

Wow, that’s a lot of overhead.

One more layer we have in here is “embind”. https://emscripten.org/docs/porting/connecting_cpp_and_javascript/embind.html

I’ve been keen on removing it for some unrelated reasons (#1507), but a solution for both of these might be to move the bindings to Emscripten direct function calls. https://emscripten.org/docs/porting/connecting_cpp_and_javascript/Interacting-with-code.html#interacting-with-code-direct-function-calls

I think this might lower call/bridging overhead, and the Yoga public API being C should have nice mappings for this most of the time. But I haven’t measured if this has a real impact.

@NickGerleman
Copy link
Contributor

Would you be able to share the code used for the example, where chunkier accessor improved performance?

@NickGerleman
Copy link
Contributor

If you are using Yoga in a case where your source may mutate over time, incremental Yoga layout can avoid the majority of the traversal, both for calculating, and later reading layout results.

But we are missing a trivial binding for the api for this #681

@Omxjep3434
Copy link
Author

Omxjep3434 commented Jan 11, 2024

Here's some more info about my benchmark. I updated my original post as well. Environment: Windows 10, Chrome 120.0.6099.200, Yoga 2.0.1. The benchmarks in Firefox have lower times, but the ratio between Yoga execution and getting the computed results is the same, if not worse.

If you guys can double check the results, that would be great. Hopefully there is not something with my environment or configuration that is causing these results.

Benchmark source code:

async function main() {
  const yoga = await loadYoga();
  postProcessingBenchmark(yoga, 2500);
}

function postProcessingBenchmark(yoga, numChildren) {
  const root = createRoot(yoga, numChildren);
  let start;

  start = performance.now();
  root.calculateLayout(10, 10);
  console.log("Yoga computation: ", performance.now() - start);
  start = performance.now();
  postProcessingLoop(root);
  console.log("Getting computed values: ", performance.now() - start);
}

function createRoot(yoga, numChildren) {
  const root = yoga.Node.create();
  root.setWidth(10);
  root.setHeight(10);

  for (let i = 0; i < numChildren; i += 1) {
    const node = yoga.Node.create();
    root.insertChild(node, i);
  }

  return root;
}

function postProcessingLoop(node) {
  const result = {
    paddingLeft: null, paddingRight: null, paddingTop: null, paddingBottom: null,
    marginLeft: null, marginRight: null, marginTop: null, marginBottom: null,
    borderLeft: null, borderRight: null, borderTop: null, borderBottom: null,
    computedLayout: null
  };

  result.paddingLeft = node.getComputedPadding(Edge.Left);
  result.paddingRight = node.getComputedPadding(Edge.Right);
  result.paddingTop = node.getComputedPadding(Edge.Top);
  result.paddingBottom = node.getComputedPadding(Edge.Bottom);

  result.marginLeft = node.getComputedMargin(Edge.Left);
  result.marginRight = node.getComputedMargin(Edge.Right);
  result.marginTop = node.getComputedMargin(Edge.Top);
  result.marginBottom = node.getComputedMargin(Edge.Bottom);

  result.borderLeft = node.getComputedBorder(Edge.Left);
  result.borderRight = node.getComputedBorder(Edge.Right);
  result.borderTop = node.getComputedBorder(Edge.Top);
  result.borderBottom = node.getComputedBorder(Edge.Bottom);

  result.computedLayout = node.getComputedLayout();

  for (let i = 0; i < node.getChildCount(); i += 1)
    postProcessingLoop(node.getChild(i));
}

@Omxjep3434
Copy link
Author

Omxjep3434 commented Jan 12, 2024

I read the 'embind' documentation you posted Nick.

From the performance section at the bottom of the page:

The call overhead for simple functions has been measured at about 200 ns. While there is room for further optimisation, so far its performance in real-world applications has proved to be more than acceptable.

This makes perfect sense with what we're seeing. 200ns * 2500 nodes * 13 API calls = 6.5ms. I didn't account for the calls to getChildCount() and getChild() as well while iterating through nodes, so that adds even more time. This doesn't account for all of the time, but it's certainly a ton of overhead.

@NickGerleman NickGerleman changed the title [JS] API proposal to more efficiently get computed values [JS] Reduce WASM call overhead Jan 19, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants