Anormal time spent in `initializeCollections` (quadratic behaviour somewhere ?) #4171

rvion · 2023-03-27T09:37:25Z

performance issue in `initializeCollections`

(using "5.6.16-dev.6", but the issue was present in previous versions too)

👋 Hello @B4nan

I have a worker that process quite a few rows from various tables in batch.

`Initialize Collections` takes way more time that we expect 🤔

some batch hints about some quadratic behaviours: here:

🟢 150ms only to find and hydrate models
🔴 12seconds to initialize collections

possible culprit:

at https://github.com/mikro-orm/mikro-orm/blob/master/packages/core/src/entity/EntityLoader.ts#L223

could it be something around this filter within a for ?
or the fact helper(entity).getPrimaryKey() is not cached above the filter closure ?
or some hydratation checks that runs too many times on already instanciated entities ?

  private initializeOneToMany<T>(filtered: T[], children: AnyEntity[], prop: EntityProperty, field: keyof T): void {
    for (const entity of filtered) {
      const items = children.filter(child => {
        if (prop.targetMeta!.properties[prop.mappedBy].mapToPk) {
          return child[prop.mappedBy] as unknown === helper(entity).getPrimaryKey();
        }

        return Reference.unwrapReference(child[prop.mappedBy]) as unknown === entity;
      });

      (entity[field] as unknown as Collection<AnyEntity>).hydrate(items);
    }
  }

@B4nan Not quite sure what is the best way to help you reproduce this. not event sure if it's worth anything in case you'll simply spot the problem by just looking at the code.

I'm a bit lost there, but tell me if I can do anything to help !

Best

The text was updated successfully, but these errors were encountered:

B4nan · 2023-03-27T10:03:08Z

Do you use composite PKs, or FKs as PKs heavily? That could add some overhead to the getPrimaryKey call. But that part is executed only when you use mapToPk, do you use that?

Could be more about the hydration, especially if your entities are huge. The Collection.hydrate does quite a lot of things, including propagation. How many items do you have in those collections?

Can't help much without a complete reproduction, this will be probably very specific to your entity definition and number of items in the context. You know what method does it, so it should be quite easy, or what am I missing? It is about populating some to-many relation, so just create the entity definitions, populate with a lot of random data and do the find call.

rvion · 2023-03-27T10:15:45Z

thanks for the quick response; I'll make a repro ! I understand those kind of issues without context can be quite annoying x)

In the meantime, just sharing this in case it rings a bell.

our collections are large (a few dozen to a few thouthands items max) but nothing too crazy; never more than 3/4000 instances.

I think we should not be using mapToPk as we only have basic PKs(id) and non composite Foreign keys 🤔 but the execution cost near the lines 170 / 171 / 174 feels wrong.

B4nan · 2023-03-27T10:20:06Z

That's indeed very interesting/weird. But at the same time, I feel like this will be much easier to fix than a hydration/propagation issue :]

I guess we will just need to chunk the children array into groups in one go instead of filtering it again and again for every entity.

rvion · 2023-03-27T10:25:26Z

Yeah, that was my initial guess; with ~1000 items in batch, each having ~1000 children, fusing the loop, or optimising the comparaison could probably brings a 2 or 3 magnitude order speedup.

B4nan · 2023-03-27T10:45:21Z

Yeah, that was my initial guess; with ~1000 items in batch, each having ~1000 children, fusing the loop, or optimising the comparaison could probably brings a 2 or 3 magnitude order speedup.

Well, if you want to have 1 million entities in context, I'm afraid it will never be very fast :D

Looks like this is enough to reproduce:

test('perf: populating many large one to many collections (#4171)', async () => {
  console.time('create');

  for (let i = 1; i <= 1000; i++) {
    const author = new Author2('Jon Snow ' + i, `snow-${i}@wall.st`);

    for (let j = 1; j <= 20; j++) {
      author.books.add(new Book2(`My Life on The Wall, part ${i}/${j}`, author));
    }

    await orm.em.fork().persist(author).flush();
  }

  console.timeEnd('create');

  console.time('find');
  await orm.em.fork().find(Author2, {}, { populate: ['books'] });
  console.timeEnd('find');
});

If I comment the problematic filter line and use an empty array instead, the find time goes from 9s to 0,5s.

gggdomi · 2023-03-27T11:51:34Z

Starting from your repro, a quick benchmark which may help: https://stackblitz.com/edit/mikro-orm-repro-bzsaor?file=test.ts

It's surprising:

test 1: many authors, few books per author: create is fast (scales ~linearly with number of authors), find does not scale linearly at all
test 2+3: few authors, many books per author: find is fast (scales ~linearly with number of books), create does not scale linearly at all

B4nan · 2023-03-27T12:55:26Z

The slow part is actually reading the relation property (so the child[prop.mappedBy] bit), as that is defined as a lazy getter on the prototype to allow propagation. That itself could be hopefully improved if we define the get/set pair directly on the instance after we create it, and it could be gotten around by using child.__helper.__data[prop.mappedBy] instead, so using the underlying value instead of the getter.

rvion · 2023-03-27T15:20:08Z

I tried to quickly patch the code like this:

initializeOneToMany(filtered, children, prop, field) {
   for (const entity of filtered) {
       const items = children.filter(child => {
           // /* 🔴 */ if (prop.targetMeta.properties[prop.mappedBy].mapToPk) {
           // /* 🔴 */     return child[prop.mappedBy] === (0, wrap_1.helper)(entity).getPrimaryKey();
           // /* 🔴 */ }
           // /* 🔴 */ return Reference_1.Reference.unwrapReference(child[prop.mappedBy]) === entity;

           return Reference_1.Reference.unwrapReference(child.__helper.__data[prop.mappedBy]) === entity;
       });
       entity[field].hydrate(items);
   }
}

Everything is indeed faster 👍, but the cost remains high in unexpected places:

and there is still some weird flamegraph

🤔

rvion · 2023-03-27T15:24:09Z

so I guess the nested filter within the for loop also is at play ? 🤔

Closes mikro-orm#4171

B4nan added the performance label Mar 27, 2023

B4nan closed this as completed in 875d966 Mar 27, 2023

jsprw pushed a commit to jsprw/mikro-orm-full-text-operators that referenced this issue May 7, 2023

perf(core): populating many large 1:M collections

af914f8

Closes mikro-orm#4171

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Anormal time spent in `initializeCollections` (quadratic behaviour somewhere ?) #4171

Anormal time spent in `initializeCollections` (quadratic behaviour somewhere ?) #4171

rvion commented Mar 27, 2023

B4nan commented Mar 27, 2023 •

edited

rvion commented Mar 27, 2023

B4nan commented Mar 27, 2023

rvion commented Mar 27, 2023

B4nan commented Mar 27, 2023

gggdomi commented Mar 27, 2023 •

edited

B4nan commented Mar 27, 2023

rvion commented Mar 27, 2023 •

edited

rvion commented Mar 27, 2023

Anormal time spent in initializeCollections (quadratic behaviour somewhere ?) #4171

Anormal time spent in initializeCollections (quadratic behaviour somewhere ?) #4171

Comments

rvion commented Mar 27, 2023

performance issue in initializeCollections

Initialize Collections takes way more time that we expect 🤔

some batch hints about some quadratic behaviours: here:

possible culprit:

B4nan commented Mar 27, 2023 • edited

rvion commented Mar 27, 2023

B4nan commented Mar 27, 2023

rvion commented Mar 27, 2023

B4nan commented Mar 27, 2023

gggdomi commented Mar 27, 2023 • edited

B4nan commented Mar 27, 2023

rvion commented Mar 27, 2023 • edited

rvion commented Mar 27, 2023

Anormal time spent in `initializeCollections` (quadratic behaviour somewhere ?) #4171

Anormal time spent in `initializeCollections` (quadratic behaviour somewhere ?) #4171

performance issue in `initializeCollections`

`Initialize Collections` takes way more time that we expect 🤔

B4nan commented Mar 27, 2023 •

edited

gggdomi commented Mar 27, 2023 •

edited

rvion commented Mar 27, 2023 •

edited