Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

"Join" performance: (Not so) huge data set throws an error #13306

Closed
seromenho opened this issue May 11, 2022 · 6 comments
Closed

"Join" performance: (Not so) huge data set throws an error #13306

seromenho opened this issue May 11, 2022 · 6 comments
Assignees
Labels
bug/1-unconfirmed Bug should have enough information for reproduction, but confirmation has not happened yet. kind/bug A reported bug. team/client Issue for team Client. tech/engines Issue for tech Engines. topic: joins topic: performance/queries topic: performance
Milestone

Comments

@seromenho
Copy link

seromenho commented May 11, 2022

Bug description

"Joining" 5-6 tables with a data set > 20k rows (not that big) is a bottleneck on the DB given how Prisma does the "joins". Each join creates queries with all the ids from the previous one.
This is a non-issue for the DB when using a join but the way Prisma is doing the "join" it is not performant and the CPU doesn't handle it.

Executing

await prisma.a.findMany({
      where: {
        name: '40k records',
      },
      select: {
        B: true,
        C: true,
        D: true,
        E: true,
        F: true,
        G: true
      }
    })

Where table A will return > 20K rows throws an error

      Can't reach database server at `localhost`:`5432`

    Please make sure your database server is running at `localhost`:`5432`.

Additional context
Could be related to #4744 and a query optimisation can also resolve this

How to reproduce

Created a repo here: https://github.com/seromenho/prisma-join-performance
Failing test here: https://github.com/seromenho/prisma-join-performance/runs/6396987212?check_suite_focus=true

Expected behaviour

Should not throw an error and execute the query.
Meanwhile, it's not resolved even the error message might be improved as I suspect the reason is that it can't connect but instead related to the length of the queries.
I would expect a real join instead of this design choice of splitting it into multiple smaller sub-queries.

Prisma information

generator client {
  provider = "prisma-client-js"
}

datasource db {
  provider = "postgresql"
  url      = "postgresql://postgres:postgres@localhost:5432/mydb?schema=public"
}

model A {
  id String @id @default(dbgenerated("uuid_generate_v4()")) @db.Uuid
  name String
  B  B[]
  C  C[]
  D  D[]
  E  E[]
  F  F[]
  G  G[]
}

model B {
  id  String @id @default(dbgenerated("uuid_generate_v4()")) @db.Uuid
  a   A      @relation(fields: [aId], references: [id])
  aId String @db.Uuid
}

model C {
  id  String @id @default(dbgenerated("uuid_generate_v4()")) @db.Uuid
  a   A      @relation(fields: [aId], references: [id])
  aId String @db.Uuid
}

model D {
  id  String @id @default(dbgenerated("uuid_generate_v4()")) @db.Uuid
  a   A      @relation(fields: [aId], references: [id])
  aId String @db.Uuid
}

model E {
  id  String @id @default(dbgenerated("uuid_generate_v4()")) @db.Uuid
  a   A      @relation(fields: [aId], references: [id])
  aId String @db.Uuid
}

model F {
  id  String @id @default(dbgenerated("uuid_generate_v4()")) @db.Uuid
  a   A      @relation(fields: [aId], references: [id])
  aId String @db.Uuid
}

model G {
  id  String @id @default(dbgenerated("uuid_generate_v4()")) @db.Uuid
  a   A      @relation(fields: [aId], references: [id])
  aId String @db.Uuid
}

Environment & setup

  • OS:
  • Database:
  • Node.js version:

Prisma Version


  • Connector: PostgreSQL 14.2
  • Prisma: 3.14.0
  • Prisma client: 3.14.0
@seromenho seromenho added the kind/bug A reported bug. label May 11, 2022
@Weakky Weakky added team/client Issue for team Client. tech/engines Issue for tech Engines. topic: performance bug/1-unconfirmed Bug should have enough information for reproduction, but confirmation has not happened yet. labels May 14, 2022
@seromenho seromenho changed the title "Join" resulting in a big data set throws an error (performance) "Join" performance: Huge data set throws an error May 17, 2022
@seromenho seromenho changed the title "Join" performance: Huge data set throws an error "Join" performance: (Not so) huge data set throws an error May 17, 2022
@Gerrit-K
Copy link

Gerrit-K commented Jun 27, 2022

I think we're hitting the same issue. Any ideas on what exact limit is hit, causing the request to fail? I was first thinking about the parameter count, but prisma seems to limit that to 2^15. The next thing I could imagine would be some payload limit (because our IDs are UUIDs, so having at least ~32k*40 characters explodes the query size), but I don't know how to debug the engine. Is there perhaps any workaround? (e.g. by manually increasing some of those limits?)

@Gerrit-K
Copy link

So I've logged one of the "offending" queries in our case, which resulted in a ~1.2MB sql file. I then manually executed it on our postgresql server. It took a while (~1.5min, including the network time to download and log the results), but it worked first try - which, to me, rules out any misconfiguration on the DB server and confirms that the prisma engine is the culprit here.

@cgil
Copy link

cgil commented Aug 10, 2022

Hitting similar issues with 9 JOINS, raw SQL had a 10x speedup.

@Gerrit-K
Copy link

Sorry, forgot about this issue and didn't post my latest update here. We stumbled upon this discussion and this issue, which seemed related.

In our case, I got thrown off a bit by the number of query parameters. It seemed that prisma indeed started to chunk queries to not exceed 2^15 parameters, so I was thinking it must be something else. But actually, 2^15 is exactly one parameter too much, because the limit in JDBC when connecting to a postgres database is 32,767 (or 2^15 - 1).

The workaround was quite simple in that case: set the (unfortunately not well documented) QUERY_BATCH_SIZE to something lower. This resolved the Can't reach database server issue, but of course it doesn't really make your queries run faster ... but at least they finish eventually.

@jkomyno
Copy link
Contributor

jkomyno commented Sep 16, 2022

Thank you for opening this issue. Closing this in favour of #8832, which we were recently able to reproduce (different Prisma Client query, but same problem under the hood).

We're also aware of performance concerns in some particular cases, which we are investigating.

@jkomyno jkomyno closed this as completed Sep 16, 2022
@jkomyno jkomyno self-assigned this Sep 16, 2022
@janpio janpio added this to the 4.4.0 milestone Sep 23, 2022
@jkomyno
Copy link
Contributor

jkomyno commented Sep 28, 2022

We have closed the original issue in prisma@4.4.0, please see this comment.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug/1-unconfirmed Bug should have enough information for reproduction, but confirmation has not happened yet. kind/bug A reported bug. team/client Issue for team Client. tech/engines Issue for tech Engines. topic: joins topic: performance/queries topic: performance
Projects
None yet
Development

No branches or pull requests

8 participants