"Join" performance: (Not so) huge data set throws an error #13306

seromenho · 2022-05-11T23:24:27Z

Bug description

"Joining" 5-6 tables with a data set > 20k rows (not that big) is a bottleneck on the DB given how Prisma does the "joins". Each join creates queries with all the ids from the previous one.
This is a non-issue for the DB when using a join but the way Prisma is doing the "join" it is not performant and the CPU doesn't handle it.

Executing

await prisma.a.findMany({
      where: {
        name: '40k records',
      },
      select: {
        B: true,
        C: true,
        D: true,
        E: true,
        F: true,
        G: true
      }
    })

Where table A will return > 20K rows throws an error

      Can't reach database server at `localhost`:`5432`

    Please make sure your database server is running at `localhost`:`5432`.

Additional context
Could be related to #4744 and a query optimisation can also resolve this

How to reproduce

Created a repo here: https://github.com/seromenho/prisma-join-performance
Failing test here: https://github.com/seromenho/prisma-join-performance/runs/6396987212?check_suite_focus=true

Expected behaviour

Should not throw an error and execute the query.
Meanwhile, it's not resolved even the error message might be improved as I suspect the reason is that it can't connect but instead related to the length of the queries.
I would expect a real join instead of this design choice of splitting it into multiple smaller sub-queries.

Prisma information

generator client {
  provider = "prisma-client-js"
}

datasource db {
  provider = "postgresql"
  url      = "postgresql://postgres:postgres@localhost:5432/mydb?schema=public"
}

model A {
  id String @id @default(dbgenerated("uuid_generate_v4()")) @db.Uuid
  name String
  B  B[]
  C  C[]
  D  D[]
  E  E[]
  F  F[]
  G  G[]
}

model B {
  id  String @id @default(dbgenerated("uuid_generate_v4()")) @db.Uuid
  a   A      @relation(fields: [aId], references: [id])
  aId String @db.Uuid
}

model C {
  id  String @id @default(dbgenerated("uuid_generate_v4()")) @db.Uuid
  a   A      @relation(fields: [aId], references: [id])
  aId String @db.Uuid
}

model D {
  id  String @id @default(dbgenerated("uuid_generate_v4()")) @db.Uuid
  a   A      @relation(fields: [aId], references: [id])
  aId String @db.Uuid
}

model E {
  id  String @id @default(dbgenerated("uuid_generate_v4()")) @db.Uuid
  a   A      @relation(fields: [aId], references: [id])
  aId String @db.Uuid
}

model F {
  id  String @id @default(dbgenerated("uuid_generate_v4()")) @db.Uuid
  a   A      @relation(fields: [aId], references: [id])
  aId String @db.Uuid
}

model G {
  id  String @id @default(dbgenerated("uuid_generate_v4()")) @db.Uuid
  a   A      @relation(fields: [aId], references: [id])
  aId String @db.Uuid
}

Environment & setup

OS:
Database:
Node.js version:

Prisma Version

Connector: PostgreSQL 14.2
Prisma: 3.14.0
Prisma client: 3.14.0

The text was updated successfully, but these errors were encountered:

Gerrit-K · 2022-06-27T09:31:36Z

I think we're hitting the same issue. Any ideas on what exact limit is hit, causing the request to fail? I was first thinking about the parameter count, but prisma seems to limit that to 2^15. The next thing I could imagine would be some payload limit (because our IDs are UUIDs, so having at least ~32k*40 characters explodes the query size), but I don't know how to debug the engine. Is there perhaps any workaround? (e.g. by manually increasing some of those limits?)

Gerrit-K · 2022-06-28T13:50:45Z

So I've logged one of the "offending" queries in our case, which resulted in a ~1.2MB sql file. I then manually executed it on our postgresql server. It took a while (~1.5min, including the network time to download and log the results), but it worked first try - which, to me, rules out any misconfiguration on the DB server and confirms that the prisma engine is the culprit here.

cgil · 2022-08-10T00:11:50Z

Hitting similar issues with 9 JOINS, raw SQL had a 10x speedup.

Gerrit-K · 2022-08-10T08:00:24Z

Sorry, forgot about this issue and didn't post my latest update here. We stumbled upon this discussion and this issue, which seemed related.

In our case, I got thrown off a bit by the number of query parameters. It seemed that prisma indeed started to chunk queries to not exceed 2^15 parameters, so I was thinking it must be something else. But actually, 2^15 is exactly one parameter too much, because the limit in JDBC when connecting to a postgres database is 32,767 (or 2^15 - 1).

The workaround was quite simple in that case: set the (unfortunately not well documented) QUERY_BATCH_SIZE to something lower. This resolved the Can't reach database server issue, but of course it doesn't really make your queries run faster ... but at least they finish eventually.

jkomyno · 2022-09-16T12:03:55Z

Thank you for opening this issue. Closing this in favour of #8832, which we were recently able to reproduce (different Prisma Client query, but same problem under the hood).

We're also aware of performance concerns in some particular cases, which we are investigating.

jkomyno · 2022-09-28T14:17:54Z

We have closed the original issue in prisma@4.4.0, please see this comment.

seromenho added the kind/bug A reported bug. label May 11, 2022

Weakky added team/client Issue for team Client. tech/engines Issue for tech Engines. topic: performance bug/1-unconfirmed Bug should have enough information for reproduction, but confirmation has not happened yet. labels May 14, 2022

seromenho mentioned this issue May 16, 2022

Prisma join performances prisma/prisma1#4744

Closed

seromenho changed the title ~~"Join" resulting in a big data set throws an error (performance)~~ "Join" performance: Huge data set throws an error May 17, 2022

seromenho changed the title ~~"Join" performance: Huge data set throws an error~~ "Join" performance: (Not so) huge data set throws an error May 17, 2022

Janushan mentioned this issue Jun 18, 2022

Intermittent P1001 - "Can't reach database server..." #13770

Closed

dpetrick added size/unknown labels Jul 8, 2022

matthewmueller added topic: joins topic: core query optimization topic: performance/queries and removed topic: core query optimization labels Jul 8, 2022

pranavbadami mentioned this issue Sep 3, 2022

Prisma generated query takes 10x as long than a manually composed one #14402

Open

jkomyno closed this as completed Sep 16, 2022

jkomyno self-assigned this Sep 16, 2022

janpio added this to the 4.4.0 milestone Sep 23, 2022

seromenho mentioned this issue Jun 5, 2023

Use JOINs in more queries #5184

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

"Join" performance: (Not so) huge data set throws an error #13306

"Join" performance: (Not so) huge data set throws an error #13306

seromenho commented May 11, 2022 •

edited

Gerrit-K commented Jun 27, 2022 •

edited

Gerrit-K commented Jun 28, 2022

cgil commented Aug 10, 2022

Gerrit-K commented Aug 10, 2022

jkomyno commented Sep 16, 2022

jkomyno commented Sep 28, 2022

"Join" performance: (Not so) huge data set throws an error #13306

"Join" performance: (Not so) huge data set throws an error #13306

Comments

seromenho commented May 11, 2022 • edited

Bug description

How to reproduce

Expected behaviour

Prisma information

Environment & setup

Prisma Version

Gerrit-K commented Jun 27, 2022 • edited

Gerrit-K commented Jun 28, 2022

cgil commented Aug 10, 2022

Gerrit-K commented Aug 10, 2022

jkomyno commented Sep 16, 2022

jkomyno commented Sep 28, 2022

seromenho commented May 11, 2022 •

edited

Gerrit-K commented Jun 27, 2022 •

edited