Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

MongoDB nested/include query slow #17142

Closed
foghina opened this issue Jan 4, 2023 · 9 comments
Closed

MongoDB nested/include query slow #17142

foghina opened this issue Jan 4, 2023 · 9 comments
Labels
bug/1-unconfirmed Bug should have enough information for reproduction, but confirmation has not happened yet. kind/bug A reported bug. team/client Issue for team Client. topic: include topic: mongodb topic: performance/queries
Milestone

Comments

@foghina
Copy link

foghina commented Jan 4, 2023

Bug description

It seems that, when using include with MongoDB, the generated query uses "in" instead of "eq", resulting in significantly slower queries.

How to reproduce

Relevant schema bits:

model Series {
  id       String    @id @default(auto()) @map("_id") @db.ObjectId
  imdbId   String    @unique
  title    Title
  episodes Episode[]

  @@index([title.primaryTitle])
}

model Episode {
  id           String @id @default(auto()) @map("_id") @db.ObjectId
  imdbId       String @unique
  season       Int?
  episode      Int?
  title        Title
  Series       Series @relation(fields: [seriesImdbId], references: [imdbId])
  seriesImdbId String

  @@index([seriesImdbId])
}

Now, say I have the imdbId for a series, and I want to fetch it + all the episodes. The following takes under 130ms to run on my dev machine:

console.time("my-query");
const series = await prisma.series.findUnique({
  where: { imdbId: tconst },
});
const episodes = await prisma.episode.findMany({
  where: { seriesImdbId: tconst },
});
console.timeEnd("my-query");

However, the following takes over 2.5s:

console.time("my-query");
const series = await prisma.series.findUnique({
  where: { imdbId: tconst },
  include: { episodes: true },
});
const episodes = series.episodes;
console.timeEnd("my-query");

Now, I've enabled query logging and saw the following:

Slow:
Query: db.Episode.aggregate([ { $match: { $expr: { $and: [ { $in: [ "$seriesImdbId", [ { $literal: "tt0106179", }, ], ], }, { $ne: [ "$seriesImdbId", "$$REMOVE", ], }, ], }, }, }, { $project: { _id: 1, imdbId: 1, season: 1, episode: 1, title.primaryTitle: 1, title.originalTitle: 1, title.isAdult: 1, title.startYear: 1, title.endYear: 1, title.runtimeMins: 1, title.genres: 1, title.rating.numVotes: 1, title.rating.avgRating: 1, seriesImdbId: 1, }, }, ])

Fast:
Query: db.Episode.aggregate([ { $match: { $expr: { $and: [ { $eq: [ "$seriesImdbId", { $literal: "tt0106179", }, ], }, { $ne: [ "$seriesImdbId", "$$REMOVE", ], }, ], }, }, }, { $project: { _id: 1, imdbId: 1, season: 1, episode: 1, title.primaryTitle: 1, title.originalTitle: 1, title.isAdult: 1, title.startYear: 1, title.endYear: 1, title.runtimeMins: 1, title.genres: 1, title.rating.numVotes: 1, title.rating.avgRating: 1, seriesImdbId: 1, }, }, ])

I think the key here is that the slow query uses operator $in, whereas the fast query uses $eq.

Expected behavior

The query using include should not take 20x longer.

Prisma information

included in repro

Environment & setup

  • Fedora 37
  • MongoDB 6.0
  • Node v19.2.0

Prisma Version

prisma                  : 4.8.0
@prisma/client          : 4.8.0
Current platform        : rhel-openssl-3.0.x
Query Engine (Node-API) : libquery-engine d6e67a83f971b175a593ccc12e15c4a757f93ffe (at node_modules/@prisma/engines/libquery_engine-rhel-openssl-3.0.x.so.node)
Migration Engine        : migration-engine-cli d6e67a83f971b175a593ccc12e15c4a757f93ffe (at node_modules/@prisma/engines/migration-engine-rhel-openssl-3.0.x)
Introspection Engine    : introspection-core d6e67a83f971b175a593ccc12e15c4a757f93ffe (at node_modules/@prisma/engines/introspection-engine-rhel-openssl-3.0.x)
Format Binary           : prisma-fmt d6e67a83f971b175a593ccc12e15c4a757f93ffe (at node_modules/@prisma/engines/prisma-fmt-rhel-openssl-3.0.x)
Format Wasm             : @prisma/prisma-fmt-wasm 4.8.0-61.d6e67a83f971b175a593ccc12e15c4a757f93ffe
Default Engines Hash    : d6e67a83f971b175a593ccc12e15c4a757f93ffe
Studio                  : 0.479.0
@foghina foghina added the kind/bug A reported bug. label Jan 4, 2023
@jkomyno jkomyno added team/client Issue for team Client. bug/1-unconfirmed Bug should have enough information for reproduction, but confirmation has not happened yet. topic: performance/queries topic: mongodb labels Jan 5, 2023
@janpio
Copy link
Member

janpio commented Jan 6, 2023

Thanks for the nice investigation @foghina 👍

@sebastiansas1
Copy link

sebastiansas1 commented Apr 25, 2023

Hi all. Do you guys have any updates on this?

I can confirm there's a significant delay when using include with MongoDB.

It is significantly impacting our production environment. Simple queries with include are taking on average 10x more time to complete.

PNG image

@rjzheng
Copy link

rjzheng commented Jun 19, 2023

Happening to us as well, it wouldn't be usable at scale.

@rjzheng
Copy link

rjzheng commented Jul 25, 2023

I did some digging and this is what I found:
Prisma uses $expr and $in and indexes cannot be used for that combination in Mongo. Although it may eventually be supported, it does not at this point. Because of that, the query would scan every document in the collection.
image
image

If only Prisma uses $expr and $or + $eq instead, it'll be able to use indexes and speed up queries by magnitudes
image
image

I'm not sure if this would give Prisma a head start or not but we are currently going to convert all of our queries to not use include or connect, just very simple queries and if we need complex lookups, we'll just use aggregateRaw to write our own aggregation. It's really bad DX but can't do much about it :(

@EricVentor
Copy link

My issue is related, but isn't for includes. When trying to do a query with Prisma using in where the values were ids, Mongo was doing a table scan.

Slow solution:

where.id = {
  in: ids.map((id) => id),
};

In order to use the index for each id I did:

where.OR = ids.map((id) => ({ id: { equals: id} }));

@macrozone
Copy link

select has the same problem.

i think this is a very serious issue and should be tackled with high priority.

@offchan42
Copy link

offchan42 commented Jan 31, 2024

My issue is related, but isn't for includes. When trying to do a query with Prisma using in where the values were ids, Mongo was doing a table scan.

Slow solution:

where.id = {
  in: ids.map((id) => id),
};

In order to use the index for each id I did:

where.OR = ids.map((id) => ({ id: { equals: id} }));

Yep, it's because of the $expr $and $in operator.
Here is another example from me. This code is also slow even if there's index on the key:

export function getCreationsFromShareTokens(shareTokens: string[]) {
  return prisma.creation.findMany({
    where: {
      shareToken: { in: shareTokens },
    },
  })
}

@Rieranthony
Copy link

Any news on this specific problem? I feel like mongodb isn't really being taken care of on prisma anymore

@janpio janpio changed the title MongoDB nested/include query slow MongoDB nested/include query slow Apr 24, 2024
@laplab
Copy link
Contributor

laplab commented May 13, 2024

This performance problem is fixed in Prisma 5.14.0

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug/1-unconfirmed Bug should have enough information for reproduction, but confirmation has not happened yet. kind/bug A reported bug. team/client Issue for team Client. topic: include topic: mongodb topic: performance/queries
Projects
None yet
Development

No branches or pull requests

10 participants