Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Performance issue when eager loading with cycle #4213

Closed
SpencerKaiser opened this issue Apr 10, 2023 · 13 comments
Closed

Performance issue when eager loading with cycle #4213

SpencerKaiser opened this issue Apr 10, 2023 · 13 comments

Comments

@SpencerKaiser
Copy link
Contributor

SpencerKaiser commented Apr 10, 2023

Describe the bug
When an entity uses a jsonb column in addition to circular and eagerly loaded properties, the post-query processing has an exponentially growing performance issue

e.g.,

Book
  - jsonb property
  - Eager: Author
    - Eager: Publisher
      - Eager: Books
        - ...

Stack trace
N/A

To Reproduce
See repro below. Important notes:

// ~~~~~ Benchmark ~~~~~
// 50 book records populated

// MacBook Pro
// M1 Max
// 64GB Ram

// SQLite: ~5 seconds
// PostgreSQL: ~65 seconds

// NOTES: Removing either of the two `eager` loads within relevant entities will resolve the issue

While those performance times don't seem like much, the amount grows exponentially as the data set increases... a 100-record query came back in about 50ms and was still processing after 15 minutes of waiting.

Repro

Expected behavior
The query and the post-query processing will be relatively quick and will beO(n log(n)) or better as the record set increases

Additional context
Add any other context about the problem here.

Versions

| Dependency | Version |
| mikro-orm* | 6.0.0-dev.54 |
| node | 16 |
| typescript | 4.9.5 |
| mikro-orm | ? |
| your-driver | PostgreSQL or SQLite |

@SpencerKaiser SpencerKaiser changed the title v6 significant jsonb performance issue when used with eager: true v6 - significant jsonb performance issue when used with eager: true Apr 10, 2023
@SpencerKaiser SpencerKaiser changed the title v6 - significant jsonb performance issue when used with eager: true v6: Significant jsonb performance issue when used with eager: true Apr 10, 2023
@SpencerKaiser
Copy link
Contributor Author

@B4nan were you able to validate the issue? I just had to limit a query of ours to 5 records because at 15 it causes the server to exceed CPU limits and the whole thing crashes 🙃

@B4nan
Copy link
Member

B4nan commented Apr 11, 2023

Only saw the stackblitz, its valid for sure, but no insights yet, need to finish with other WIP first.

@B4nan
Copy link
Member

B4nan commented Apr 11, 2023

I don't think this is about the JSON property, maybe it makes things more visible as the hydration requires parsing, but it looks like its super slow without it too. Also with select-in strategy it gets into some infinite loop apparenlty.

edit: the problem is in EntityFactory.mergeData(), apparently it gets called way too many times

@SpencerKaiser
Copy link
Contributor Author

Only saw the stackblitz, its valid for sure, but no insights yet, need to finish with other WIP first.

Cool, yea no rush I was mostly asking to make sure you thought it was something that should be addressed.

The details from your last comment all make sense; I figured it was something near-infinite-loop-y. If I can help at all, let me know!

@B4nan
Copy link
Member

B4nan commented Apr 12, 2023

It's funny how such a simple test case can uncover so many issues. It all boils down to the mergeData call which is indeed overused, and while I identified ~4 problematic parts in that method, it looks like the culprit is actually the initial result mapping - the first factory.create call gets data where the publisher has a books collection with 2500 items, so for each book there are 50 copies in there. It all gets deduplicated during hydration, but this results in huge overprocessing and recursion.

Not gonna lie, the combination of joined strategy and eager loading is quite bad. While the issue is indeed valid, I would generally discourage the usage of eager loading, it's almost always a bad practice. But it's not working at all with select-in strategy, so... :D

@SpencerKaiser
Copy link
Contributor Author

Oh yea, tests are invaluable and I've learned that more and more as I've grown over the years. However, it's still the first thing to get cut when time is tight 🥲

Ah alright that makes sense... and yea I almost figured you'd come back with "Yea but don't do that" as your first reply, so I was certainly expecting the note about it not being ideal. It felt a little dirty as I wrote it, but a lot had to do with compensation for types, which I no longer have to do. I'll remove all the eager uses and just swap to populate 😬 thanks for the update!

@B4nan
Copy link
Member

B4nan commented Apr 12, 2023

After a lot of struggle, I think I found a way to fix it. I had to add another meta property to keep a list of populated relations, that way I can skip them and escape the cycle when eager loading. It works for both loading strategies, the tests are now running super fast, when I run a single one locally it takes just ~12ms.

@B4nan B4nan changed the title v6: Significant jsonb performance issue when used with eager: true Performance issue when eager loading with cycle Apr 12, 2023
@B4nan B4nan closed this as completed in ecbecfc Apr 12, 2023
@SpencerKaiser
Copy link
Contributor Author

Oh snap that's incredible! So should I just leave our queries the way they are or would you still recommend we avoid eager?

@B4nan
Copy link
Member

B4nan commented Apr 13, 2023

I would still discourage eager loading of to-many relations, unless you are absolutely sure the collection will be small.

@SpencerKaiser
Copy link
Contributor Author

Ah ok that's a good clarification; yea they're all very small collections (1-5). I'll go update the docs to make that recommendation clear.

@SpencerKaiser
Copy link
Contributor Author

@B4nan the fix certainly improved the load time but it's still quite a lot... I think it's best for me to rip it out entirely 😬

@B4nan
Copy link
Member

B4nan commented Apr 15, 2023

Well, you should drop the joined strategy in this case - it will always explode this way due to how sql works. So any to-many relation you want to eager load should have select-in strategy, that should help.

@SpencerKaiser
Copy link
Contributor Author

Just tried overriding it for those properties and it actually makes a second, redundant query for each...

A -> B -> C

turns into

A
  B
    C
      A
        B
          C

From the very beginning I was hesitant about using eager; I'm just gonna remove it... I think it's helpful but it's way too brittle in terms of ways it can blow up 😬

jsprw pushed a commit to jsprw/mikro-orm-full-text-operators that referenced this issue May 7, 2023
B4nan added a commit that referenced this issue Jun 1, 2023
)

This reworks the fix for #4213 which broke populating in parallel.

Closes #4343
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants