Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Restart engine on panic #2100

Closed
Sytten opened this issue Apr 5, 2020 · 9 comments
Closed

Restart engine on panic #2100

Sytten opened this issue Apr 5, 2020 · 9 comments
Assignees
Labels
kind/improvement An improvement to existing feature and code. tech/typescript Issue for tech TypeScript.
Milestone

Comments

@Sytten
Copy link
Contributor

Sytten commented Apr 5, 2020

Problem

As I understand it, if the engine panics during a request, the client will be rendered nonoperational. Since I only create one client for my application (shared via the Context in an Apollo server), it would mean that the service would need to be restarted. This is problematic for uptime since it would affect other customers that have queries that work.

Solution

It would be best if the client would restart the engine automatically. Otherwise it would be best to document that we need to create a client or call connect for each request.

Additional context

This is needed for a production deployment since errors will happen and they need to be not catastrophic.

@Sytten
Copy link
Contributor Author

Sytten commented Apr 5, 2020

I dig a bit and I have some findings:

  • For each test, I killed the engine process manually (simulating a panic)
  • Calling connect on the client for each request did not restart the process
  • Creating a new client for each request starts a new process

@schickling schickling added bug/1-unconfirmed Bug should have enough information for reproduction, but confirmation has not happened yet. process/candidate labels Apr 5, 2020
@entrptaher
Copy link

entrptaher commented Apr 6, 2020

+1 for this.

I just tested the prisma.connect() for each request and it has a heavy burden of 15-35ms in any given case. Here is a random case for just one request.

connect: 31.921ms
promises:push(user.create): 24.488ms
disconnect: 2.218ms

If I don't connect it myself, then it connects on first request,

promises:push(user.create): 57.864ms
disconnect: 2.666ms

I'm disconnecting purposefully because last night I saw a lot of zombie processes during a little benchmark of prisma2. It's already half slower than the prisma 1 in that case just because we have to avoid the panic.

However, repeated queries are fast (even faster) enough, but the engine panic still remains an issue,

➜  prisma-1 node index.js
createUser: 24.935ms
createUser: 9.794ms
createUser: 8.186ms
createUser: 8.543ms
createUser: 8.417ms
createUser: 8.088ms
createUser: 8.178ms
createUser: 7.846ms
createUser: 8.537ms
createUser: 8.128ms

➜  prisma-2 node index.js
connect: 37.185ms
user.create: 24.196ms
user.create: 3.773ms
user.create: 3.126ms
user.create: 2.653ms
user.create: 2.588ms
user.create: 2.713ms
user.create: 2.811ms
user.create: 3.830ms
user.create: 3.053ms
user.create: 2.302ms
disconnect: 2.423ms

Zombie Process

I posted this over the prisma.slack.com thread,
and I think this might be related, if there are panic, it should be cleaned properly.

image

@pantharshit00 pantharshit00 added kind/improvement An improvement to existing feature and code. and removed bug/1-unconfirmed Bug should have enough information for reproduction, but confirmation has not happened yet. labels Apr 8, 2020
@divyenduz divyenduz added this to the Beta 3 milestone Apr 9, 2020
@divyenduz divyenduz added tech/typescript Issue for tech TypeScript. and removed process/candidate labels Apr 9, 2020
@mavilein
Copy link
Member

@Sytten : If there's a panic during request execution the engine process will not die. The only panics that may kill the process are the ones that happen before the server has finished starting.

@Sytten
Copy link
Contributor Author

Sytten commented Apr 16, 2020

@mavilein I was discussing it with @timsuchanek and he explain that, but my whole premise is based on the experience I had on a panic that crashed the rust process. It is probably fixed now, but I still think that it can happen until the Neon binding is implemented. Meaning that if the child dies in the JS code, it should be restarted (if the client wants that).
So the way we are currently planning would be:

  1. In the async intercept in NodeEngine, we know if the process died/exited/crashed
  2. We have to bubble that to the PrismaClient and remove the connectPromise
  3. The client can then call connect on each request in a middleware to restart the engine
  4. Requests will wait for the engine to restart

@wSedlacek
Copy link

I am running into an issue with limited memory with a large database and exposing queries pubically. The result is the query engine dieing when memory runs out which is to be expected then all future request failing due to the query engine being dead. While I should limit request so they don't end up this large of someone does find a request that kills the engine it would be nice if that engine didn't require a manual restart of the application.

@kindywu
Copy link

kindywu commented May 20, 2020

engine crash and never come back

findMany(
{
skip: -1
}
)

@janpio janpio modified the milestones: Beta 6, Beta 7 May 26, 2020
@timsuchanek
Copy link
Contributor

Thanks a lot for reporting 🙏
This issue is fixed in the latest alpha version of @prisma/cli.
You can try it out with npm i -g @prisma/cli@alpha.

In case it’s not fixed for you - please let us know and we’ll reopen this issue!

@entrptaher
Copy link

Oh very interesting. Gonna test out.

@Sytten
Copy link
Contributor Author

Sytten commented May 29, 2020

Will test it too this weekend.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind/improvement An improvement to existing feature and code. tech/typescript Issue for tech TypeScript.
Projects
None yet
Development

No branches or pull requests

10 participants