
perf(composeMatrix): 25% improv by restoring v5 implementation #9851

Merged: 7 commits into fabricjs:master from perf/compose-matrix, May 19, 2024

Conversation

@jiayihu (Contributor) commented May 5, 2024

Based on the findings of fabricjs/canvas-engines-comparison#1 and my own experiments (I'm writing an article about analysing performance, soon to be published 😏), I saw that composeMatrix performance can be significantly improved for non-rotated/non-scaled objects by just returning createTranslateMatrix(translateX, translateY).

Then I remembered that @ShaMan123 mentioned to me that he changed the functions to be more declarative in v6, so I looked back at how it was and noticed that it was indeed more performant before: d0d0cfb#diff-c771d7d885099c09929056e6d495faa0cd32bd1269dda11623cd1dddf4122b59

Previously we would avoid multiplying matrices when not needed. In my tests this simple change results in ~25% better performance, without any API change.
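For reference, the restored approach looks roughly like this. This is a sketch reconstructed from the linked v5 diff and the utility names used elsewhere in this thread (createTranslateMatrix, createRotateMatrix, calcDimensionsMatrix, multiplyTransformMatrices), assuming they are reachable under fabric.util the way composeMatrix is; it is not the exact merged code:

// Sketch only: skip matrix multiplications entirely for translate-only objects.
// Assumes the `fabric` namespace is available (as in the CodeSandbox benchmark).
const composeMatrixSketch = (options) => {
  const { translateX = 0, translateY = 0, angle = 0 } = options;
  // Best case: a plain translate matrix, zero multiplications.
  let matrix = fabric.util.createTranslateMatrix(translateX, translateY);
  if (angle) {
    matrix = fabric.util.multiplyTransformMatrices(
      matrix,
      fabric.util.createRotateMatrix({ angle })
    );
  }
  const { scaleX = 1, scaleY = 1, flipX, flipY, skewX = 0, skewY = 0 } = options;
  if (scaleX !== 1 || scaleY !== 1 || flipX || flipY || skewX || skewY) {
    matrix = fabric.util.multiplyTransformMatrices(
      matrix,
      fabric.util.calcDimensionsMatrix(options)
    );
  }
  return matrix;
};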

You can test it yourself @asturur; I've prepared a benchmark clone to try out changes: https://codesandbox.io/p/sandbox/fabric-bench-forked-4xfvnd. Just change this line in OptimisedRect#calcOwnMatrix:

const value = composeMatrix(options);

To

const value = fabric.util.composeMatrix(options);

The top-right FPS meter is by no means accurate, but it's a decent indicator of the percentage of improvement. The Chrome DevTools flame chart showed me that renderCanvas time went from 45 ms to 35 ms for 16k objects, around a 28% improvement. Alternatively you can use the Chrome FPS meter. Whatever you use to measure, there's no doubt about the performance change.

You can also try out the other changes yourself: for example, removing the cache key completely in calcOwnMatrix improves the FPS meter by 50% for 16k objects and 80% for 8k objects. But changing the cache key is less trivial, so we can discuss that in a separate PR/issue.

@jiayihu changed the title from "perf(composeMatrix): restore v5 implementation" to "perf(composeMatrix): 25% improv by restoring v5 implementation" on May 5, 2024
@asturur (Member) commented May 6, 2024

I know the old one was faster because multiplyTransformMatrixArray has an extra array and a whole wasted multiplication by the identity. I think I tried to bring up this conversation in the past too.

Please leave a comment in both places in the code linking to the old PR (#8894), noting that this code was reverted for performance reasons and that if we want to change it back we first need to be sure the new implementation has some gains compared to the old one.

When we change some code and then change it back for performance reasons, I try to make a habit of leaving a small note in a benchmark folder that compares the old code with the new one. I will add that so you don't have to bother.

@asturur asturur requested a review from ShaMan123 May 6, 2024 07:28
@ShaMan123 (Contributor) left a comment

Who is to blame? Array#reduce or the array creation or both?
I am interested
I would argue then that perhaps we should do a manual calculation in this hot path not relying on multiplyTransformMatrices if that increases perf even more.
It is not a lot of code

let a,b,c,d,e,f
e = translateX
f = translateY
...
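A possible shape for that manual path, limited to the translate + rotate part (illustrative only, not code from the PR; fabric angles are in degrees, and the scale/skew/flip case is left to the existing utilities):

// Illustrative sketch: compose T(translateX, translateY) · R(angle) inline,
// with no intermediate matrices and no multiplications.
// TMat2D layout: [a, b, c, d, e, f].
const composeTranslateRotate = (
  translateX: number,
  translateY: number,
  angle: number
): [number, number, number, number, number, number] => {
  const theta = (angle * Math.PI) / 180; // degrees to radians
  const cos = Math.cos(theta);
  const sin = Math.sin(theta);
  return [cos, sin, -sin, cos, translateX, translateY];
};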

@jiayihu (Contributor, Author) commented May 6, 2024

Who is to blame? Array#reduce or the array creation or both?

It's the array creation and multiplication. Mostly multiplication.

I would argue then that perhaps we should do a manual calculation in this hot path not relying on multiplyTransformMatrices if that increases perf even more.

You can try it out quickly in the CodeSandbox to see if it improves FPS with 8/16k objects.

@asturur (Member) commented May 6, 2024

To blame is generic code over specialized code, in general.

It is not reduce per se; it is adding a function stack frame for each multiplication, plus one entire multiplication for free.

In the case of translation only:

export const multiplyTransformMatrixArray = (
  matrices: (TMat2D | undefined | null | false)[],
  is2x2?: boolean
) =>
  matrices.reduceRight(
    (product: TMat2D, curr) =>
      curr ? multiplyTransformMatrices(curr, product, is2x2) : product,
    iMatrix
  );

We arrive here with an array that we created to wrap the matrix, we start a reduce and we multiply the translate matrix by the identity matrix that serves as initial value.

When N goes up the initial multiplication cost is diluted, but for a single matrix it is 100%. And given that our typical case is 3-4 matrices, this extra multiplication is always a substantial percentage.
On top of that, for each multiplication we have to enter and exit the function's stack frame.

In the best-case scenario the old implementation just passes the matrix through the functions and returns it.

The other possible improvement is to change the code so that the translate matrix is created only when there is no rotation, but that is not useful as long as we still have originX and originY.

Comment on lines +280 to 283
let matrix = createScaleMatrix(
  flipX ? -scaleX : scaleX,
  flipY ? -scaleY : scaleY
);
Member:

I'm sure at some point we were also checking for scale !== 1 and flip being truthy before doing this.
I'm not sure it's worth creating it rather than returning iMatrix as a reference and then checking by strict equality.

calcOwnMatrix has been split up for reuse across fabric, but maybe that was a bad idea; it was good as a single code block doing its thing.
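To illustrate the strict-equality idea at a call site (a hypothetical sketch, not code from the PR, assuming a composeMatrix variant that returns the shared fabric.iMatrix reference when there is nothing to compose):

// Hypothetical: if composeMatrix returned the shared identity reference for the
// "no transform" case, callers could skip work with a cheap reference check.
// (Today composeMatrix always returns a fresh array, so this is illustrative only.)
const matrix = fabric.util.composeMatrix({ translateX: 0, translateY: 0, angle: 0 });
if (matrix === fabric.iMatrix) {
  // identity transform: nothing to apply
}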

@jiayihu (Contributor, Author) May 6, 2024

I'm not sure it's worth creating it rather than returning iMatrix as a reference and then checking by strict equality.

I thought about that, but returning the same reference is dangerous in fabric, as the iMatrix array could be mutated with terrible consequences.

Contributor:

iMatrix is frozen and will throw if you try to mutate it; we can't return it.

Contributor:

but maybe it is quicker to create it than to clone it

Member:

We don't mutate the matrix, we always create a new one, so I don't see any issue with returning iMatrix when necessary. But anyway this PR is basically a revert, so it is fine as it is; it does not need more changes apart from the code comments.

@ShaMan123 (Contributor):

Who is to blame? Array#reduce or the array creation or both?

It's the array creation and multiplication. Mostly multiplication.

But it shouldn't do more multiplications, should it? It should skip if an entry is falsy

@asturur (Member) commented May 6, 2024

Who is to blame? Array#reduce or the array creation or both?

It's the array creation and multiplication. Mostly multiplication.

But it shouldn't do more multiplications, should it? It should skip if an entry is falsy

The initial value is always multiplied by the first matrix that comes from the array, so the first matrix is always multiplied by iMatrix.
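Concretely, for a single matrix that whole multiplication is overhead (illustrative, using the utilities quoted above):

// reduceRight starts from iMatrix, so even a single translate matrix costs one
// full multiplication whose result is numerically identical to its input:
multiplyTransformMatrixArray([createTranslateMatrix(10, 20)]);
// equivalent to: multiplyTransformMatrices(createTranslateMatrix(10, 20), iMatrix)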

@ShaMan123 (Contributor):

We arrive here with an array that we created to wrap the matrix, we start a reduce and we multiply the translate matrix by the identity matrix that serves as initial value.

Nice catch! That is the problem here: the initial value.
So I want to try out the reduce without an initial value, because since there is always an initial value the condition never applies to the reduce calculation! curr is always defined!
I am to blame!

export const multiplyTransformMatrixArray = (
  matrices: (TMat2D | undefined | null | false)[],
  is2x2?: boolean
) =>
  matrices.reduceRight(
    (product: TMat2D, curr) =>
      curr ? multiplyTransformMatrices(curr, product, is2x2) : product,
    iMatrix
  );

@jiayihu (Contributor, Author) commented May 6, 2024

Bear in mind that multiplyTransformMatrixArray is no longer used in composeMatrix in this PR, so this is true:

But it shouldn't do more multiplications, should it? It should skip if an entry is falsy

@asturur (Member) commented May 6, 2024

We do not need to dig into this; it is a matter of time to dedicate to other things.
We figured out there is a substantial difference between new and old, and we put back the old.
We move on.
We know we will need to remove originX and originY and the matrix logic will change again; there is also matrix caching in the middle of this conversation, so this code needs to be re-evaluated anyway.
Now, trying to add code inside the reduce function to skip the initial multiplication, while keeping the function stack and the array loop: why? What gain does it bring?

@ShaMan123 (Contributor) commented May 6, 2024

A simple gain.
We understand what was wrong and we fix it.
Why not?
multiplyTransformMatrixArray is heavily used everywhere. Why leave this bug? What gain does that give?

Here is the fix:


export const multiplyTransformMatrixArray = (
  matrices: (TMat2D | undefined | null | false)[],
  is2x2?: boolean
) =>
  matrices.reduceRight(
    (product: TMat2D | undefined, curr) =>
      curr && product
        ? fabric.util.multiplyTransformMatrices(curr, product, is2x2)
        : curr || product,
    undefined
  ) || fabric.iMatrix.concat();

const composeMatrix = ({
  translateX,
  translateY,
  angle,
  scaleX,
  scaleY,
  skewX,
  skewY,
  flipX,
  flipY,
}) => {
  return multiplyTransformMatrixArray([
    (translateX || translateY) && createTranslateMatrix(translateX, translateY),
    angle && createRotateMatrix({ angle }),
    ((scaleX && scaleX !== 1) || (scaleY && scaleY !== 1) || skewX || skewY || flipX || flipY) &&
      calcDimensionsMatrix({ scaleX, scaleY, skewX, skewY, flipX, flipY }),
  ]);
};

@ShaMan123 (Contributor) commented May 6, 2024

@jiayihu please validate that the perf is the same compared to v5
I do not see any difference using https://codesandbox.io/p/sandbox/fabric-bench-composematrix-forked-r2w2sz

@asturur (Member) commented May 6, 2024

Yes, you can fix multiplyTransformMatrixArray, but it shouldn't be in hot code.

@jiayihu (Contributor, Author) commented May 6, 2024

((scaleX && scaleX !== 1) || (scaleY && scaleY !== 1) || skewX || skewY || flipX || flipY) && calcDimensionsMatrix({ scaleX, scaleY, skewX,skewY,flipX,flipY,})

I find this ugly; I'd keep this PR and additionally do only this:

export const multiplyTransformMatrixArray = (
  matrices: (TMat2D | undefined | null | false)[],
  is2x2?: boolean
) =>
  matrices.reduceRight(
    (product: TMat2D | undefined, curr) =>
      curr && product
        ? fabric.util.multiplyTransformMatrices(curr, product, is2x2)
        : curr || product,
    undefined
  ) || fabric.iMatrix.concat();

@ShaMan123 (Contributor) commented May 6, 2024

I don't mind cleaning up the code a bit, e.g. exposing a util like hasMatrixDimensionsProps.
I did it as such due to your V8 optimization comments.
The demo proves that with the fix there is no difference between the v5 code and the v6 code, so now I believe you guys should defend this PR. You wish to revert to old code that is less compact and adds no benefit.
Why? Please explain.

@jiayihu (Contributor, Author) commented May 6, 2024

I think the old code is simpler to understand and it worked well. Extracting a function typically doesn't solve the problem; you're just adding more indirection.

@ShaMan123 (Contributor):

Though I believe we should fix bugs, do as you please.
Feel free to merge or close #9860

@ShaMan123 ShaMan123 closed this May 6, 2024
@ShaMan123 ShaMan123 reopened this May 6, 2024
@asturur (Member) commented May 6, 2024

Though I believe we should fix bugs, do as you please.

^^^ This is a comment left in a rush.

The change for the extra multiplication is one topic.
Not using loops of functions over matrices in hot code is another topic.

If there are bugs in those two areas, they would eventually be introduced by swapping the order of matrix multiplications and by float precision.

Neither of the open PRs addresses any known bug.

@asturur (Member) commented May 6, 2024

Can you clarify why we can't fix the implementation of multiplyTransformMatrixArray and at the same time switch the important code to a leaner implementation? What is the issue there?

@ShaMan123 (Contributor):

No issue, I just don't understand why.
You guys found a perf hit.
Then we found the bug that caused it.
I opened a PR to fix it and it seems to have fixed the perf issue that is the reason for this PR.
If there is no benefit why change the code?

@asturur (Member) commented May 7, 2024

Because this loop over matrices in hot code is not a good idea. It was brought in as a cleanup, and I let go of discussing its consequences after many messages in the old PR.
Terse code is good for a utility, but not in one of the most frequently run pieces of code.

The perf hit gave us the occasion to re-discuss it; we can fix the utility and still restore the old code, which is more verbose and more byte-consuming but maybe more appropriate for the situation.

For me it was also a bad idea to have calcTransformMatrix in a method that calls a utility to compose the matrices, because even just creating the object to pass down the information and then trashing it is wasted time and resources.

When we get to performance work I think we will roll back a bunch of other functions to uglier code.

@jiayihu (Contributor, Author) commented May 15, 2024

Any update on the final decision?

@asturur (Member) commented May 17, 2024

Sorry, I took a little break and I just restarted working on the website today.
I definitely want to merge this and the other one.
I'll clean up what is left to clean up and I'll merge.

@asturur (Member) commented May 19, 2024

Removing the spread operator from the options seems to bring the difference up to 50%.
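For illustration only (the exact call site isn't quoted here), the difference is roughly between building a fresh options object on every call and passing an existing one through:

// Hypothetical example, not the actual fabric call site.
const options = { translateX: 10, translateY: 20, angle: 30 };

// Spreading allocates and copies a fresh object on every call in the hot path:
const withSpread = fabric.util.composeMatrix({ ...options });

// Passing the existing object through avoids that per-call allocation:
const withoutSpread = fabric.util.composeMatrix(options);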

@asturur (Member) commented May 19, 2024

The wild thing is that if I apply the fix to the transform matrix array multiplication in this way:

util2.multiplyTransformMatrixArray = (matrices, is2x2) =>
  matrices.reduceRight(
    (product, curr) =>
      curr && product
        ? util2.multiplyTransformMatrices(curr, product, is2x2)
        : curr || product,
    undefined
  ) || [1, 0, 0, 1, 0, 0];

The benchmark is more or less the same in percentage terms, but everything drops to half, and I can't explain that.

@asturur asturur merged commit 759ee7c into fabricjs:master May 19, 2024
20 of 22 checks passed
@jiayihu (Contributor, Author) commented May 19, 2024

The benchmark is more or less the same in percentage terms, but everything drops to half, and I can't explain that

Maybe because you're allocating an array each time and it must be garbage collected. In the benchmark (if you're referring to the Node.js one) maybe it's not significant because the GC doesn't run during it; it can be delayed until the synchronous benchmark iterations have finished. In the browser, however, it has to run between frames, dropping them by half.

@jiayihu jiayihu deleted the perf/compose-matrix branch May 19, 2024 10:29