Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We鈥檒l occasionally send you account related emails.

Already on GitHub? Sign in to your account

Anormal time spent in initializeCollections (quadratic behaviour somewhere ?) #4171

Closed
rvion opened this issue Mar 27, 2023 · 9 comments
Closed

Comments

@rvion
Copy link

rvion commented Mar 27, 2023

performance issue in initializeCollections

(using "5.6.16-dev.6", but the issue was present in previous versions too)

馃憢 Hello @B4nan

I have a worker that process quite a few rows from various tables in batch.

Initialize Collections takes way more time that we expect 馃

image

some batch hints about some quadratic behaviours: here:

  • 馃煝 150ms only to find and hydrate models
  • 馃敶 12seconds to initialize collections

image


possible culprit:

at https://github.com/mikro-orm/mikro-orm/blob/master/packages/core/src/entity/EntityLoader.ts#L223

image

  • could it be something around this filter within a for ?
  • or the fact helper(entity).getPrimaryKey() is not cached above the filter closure ?
  • or some hydratation checks that runs too many times on already instanciated entities ?
  private initializeOneToMany<T>(filtered: T[], children: AnyEntity[], prop: EntityProperty, field: keyof T): void {
    for (const entity of filtered) {
      const items = children.filter(child => {
        if (prop.targetMeta!.properties[prop.mappedBy].mapToPk) {
          return child[prop.mappedBy] as unknown === helper(entity).getPrimaryKey();
        }

        return Reference.unwrapReference(child[prop.mappedBy]) as unknown === entity;
      });

      (entity[field] as unknown as Collection<AnyEntity>).hydrate(items);
    }
  }

@B4nan Not quite sure what is the best way to help you reproduce this. not event sure if it's worth anything in case you'll simply spot the problem by just looking at the code.

I'm a bit lost there, but tell me if I can do anything to help !

Best

@B4nan
Copy link
Member

B4nan commented Mar 27, 2023

Do you use composite PKs, or FKs as PKs heavily? That could add some overhead to the getPrimaryKey call. But that part is executed only when you use mapToPk, do you use that?

Could be more about the hydration, especially if your entities are huge. The Collection.hydrate does quite a lot of things, including propagation. How many items do you have in those collections?

Can't help much without a complete reproduction, this will be probably very specific to your entity definition and number of items in the context. You know what method does it, so it should be quite easy, or what am I missing? It is about populating some to-many relation, so just create the entity definitions, populate with a lot of random data and do the find call.

@rvion
Copy link
Author

rvion commented Mar 27, 2023

thanks for the quick response; I'll make a repro ! I understand those kind of issues without context can be quite annoying x)

In the meantime, just sharing this in case it rings a bell.

image

our collections are large (a few dozen to a few thouthands items max) but nothing too crazy; never more than 3/4000 instances.

I think we should not be using mapToPk as we only have basic PKs(id) and non composite Foreign keys 馃 but the execution cost near the lines 170 / 171 / 174 feels wrong.

@B4nan
Copy link
Member

B4nan commented Mar 27, 2023

That's indeed very interesting/weird. But at the same time, I feel like this will be much easier to fix than a hydration/propagation issue :]

I guess we will just need to chunk the children array into groups in one go instead of filtering it again and again for every entity.

@rvion
Copy link
Author

rvion commented Mar 27, 2023

Yeah, that was my initial guess; with ~1000 items in batch, each having ~1000 children, fusing the loop, or optimising the comparaison could probably brings a 2 or 3 magnitude order speedup.

@B4nan
Copy link
Member

B4nan commented Mar 27, 2023

Yeah, that was my initial guess; with ~1000 items in batch, each having ~1000 children, fusing the loop, or optimising the comparaison could probably brings a 2 or 3 magnitude order speedup.

Well, if you want to have 1 million entities in context, I'm afraid it will never be very fast :D

Looks like this is enough to reproduce:

test('perf: populating many large one to many collections (#4171)', async () => {
  console.time('create');

  for (let i = 1; i <= 1000; i++) {
    const author = new Author2('Jon Snow ' + i, `snow-${i}@wall.st`);

    for (let j = 1; j <= 20; j++) {
      author.books.add(new Book2(`My Life on The Wall, part ${i}/${j}`, author));
    }

    await orm.em.fork().persist(author).flush();
  }

  console.timeEnd('create');

  console.time('find');
  await orm.em.fork().find(Author2, {}, { populate: ['books'] });
  console.timeEnd('find');
});

If I comment the problematic filter line and use an empty array instead, the find time goes from 9s to 0,5s.

@gggdomi
Copy link

gggdomi commented Mar 27, 2023

Starting from your repro, a quick benchmark which may help: https://stackblitz.com/edit/mikro-orm-repro-bzsaor?file=test.ts

It's surprising:

  • test 1: many authors, few books per author: create is fast (scales ~linearly with number of authors), find does not scale linearly at all
  • test 2+3: few authors, many books per author: find is fast (scales ~linearly with number of books), create does not scale linearly at all

Capture d鈥檈虂cran 2023-03-27 a虁 14 06 15

@B4nan
Copy link
Member

B4nan commented Mar 27, 2023

The slow part is actually reading the relation property (so the child[prop.mappedBy] bit), as that is defined as a lazy getter on the prototype to allow propagation. That itself could be hopefully improved if we define the get/set pair directly on the instance after we create it, and it could be gotten around by using child.__helper.__data[prop.mappedBy] instead, so using the underlying value instead of the getter.

@rvion
Copy link
Author

rvion commented Mar 27, 2023

I tried to quickly patch the code like this:

initializeOneToMany(filtered, children, prop, field) {
   for (const entity of filtered) {
       const items = children.filter(child => {
           // /* 馃敶 */ if (prop.targetMeta.properties[prop.mappedBy].mapToPk) {
           // /* 馃敶 */     return child[prop.mappedBy] === (0, wrap_1.helper)(entity).getPrimaryKey();
           // /* 馃敶 */ }
           // /* 馃敶 */ return Reference_1.Reference.unwrapReference(child[prop.mappedBy]) === entity;

           return Reference_1.Reference.unwrapReference(child.__helper.__data[prop.mappedBy]) === entity;
       });
       entity[field].hydrate(items);
   }
}

Everything is indeed faster 馃憤, but the cost remains high in unexpected places:

image

and there is still some weird flamegraph

image

馃

@rvion
Copy link
Author

rvion commented Mar 27, 2023

so I guess the nested filter within the for loop also is at play ? 馃

@B4nan B4nan closed this as completed in 875d966 Mar 27, 2023
jsprw pushed a commit to jsprw/mikro-orm-full-text-operators that referenced this issue May 7, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants