[Error 500] "Socket Hang Up" Randomly Occurring on any Routes in Production Mode #60148
Replies: 74 comments 6 replies
-
This issue will be easier to assess if you provide a simple project that reproduces it. Nevertheless, based on your stack trace, it looks like you are trying to connect to a TLS/SSL socket (which I doubt Next.js handles itself; it is probably one of your libraries). Based on your dependencies, my best guess is that you are somehow connecting to a database to authenticate a user. This is a wild guess, but I think the connection between your web server and your database is being closed (or is unstable, or anything in between). That is out of Next.js's scope, but as a quick fix you could check the connection to your database, or simply restart your Next.js server if possible, since that will re-instantiate the database variable and database connection.
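One common mitigation along the lines of the restart suggestion above is to cache the database connection and lazily recreate it when it has been closed. This is only a hedged sketch: `createConnection` and the `closed` flag are hypothetical stand-ins for whatever connect call and liveness signal your actual driver exposes.

```javascript
// Hypothetical stand-in for a real driver's connect call; the `closed`
// flag stands in for whatever liveness signal your driver exposes.
function createConnection() {
  return { closed: false, query: (sql) => `result of ${sql}` };
}

// Cache the connection on globalThis so it survives module re-evaluation,
// and reconnect only if the previous connection was closed.
function getDb() {
  if (!globalThis.__db || globalThis.__db.closed) {
    globalThis.__db = createConnection();
  }
  return globalThis.__db;
}
```

The point of the `globalThis` cache is that a healthy connection is reused across requests, while a dropped one is transparently replaced instead of poisoning every subsequent request.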
-
Another thing I overlooked: perhaps the socket hang-up is from the Vercel side? They handle HTTPS on their side, and if the client suddenly closes the connection while the request isn't complete, maybe it throws an error? But I don't think that's the case either; if it were, many Vercel users would have reported it already. Maybe you could provide a simple reproduction, and I'll see if I can reproduce it myself on Vercel.
-
We cannot recreate the issue with the provided information. Please add a reproduction in order for us to be able to investigate. Why was this issue marked with the
-
We're facing the same issue, also shortly after upgrading to Next 13. Logs are looking like this: and then thousands of errors: This is random but never on fresh instances; so far each time (we've had this problem 3 times) it occurred days after a deploy. Looks like once the socket breaks it can't be recreated? I saw @SebastienSusini is using Vercel; I'm using AWS ECS tasks.
-
Btw, I can't share the whole project, and since it can happen after a week and a million requests, I'm not sure how easy it would be to recreate. Maybe an easier route would be to enable some debug logs? @SebastienSusini are you getting heavy traffic on that project? Are you experiencing these errors only after some time has passed since deploy, or can they happen randomly a few minutes/hours after deploy? When did you upgrade to 13, and how many of these incidents have you had?
-
I'm also seeing this same error; it occurs on roughly 1 in 10 requests reliably in production, but I cannot reproduce it locally with
Notably, my app renders the page, but occasionally throws this error in
-
Seems possibly related to #49587
-
@SebastienSusini I also isolated this issue in my app to have started in next
-
13.4.12 with the same problem, man
-
We are also running into this issue, with the same circumstances as described before:

```
Error: socket hang up
    at connResetException (node:internal/errors:705:14)
    at Socket.socketOnEnd (node:_http_client:518:23)
    at Socket.emit (node:events:525:35)
    at Socket.emit (node:domain:489:12)
    at endReadableNT (node:internal/streams/readable:1358:12)
    at processTicksAndRejections (node:internal/process/task_queues:83:21) {
  code: 'ECONNRESET'
}
```

We will now downgrade to For info, we are running on AWS EC2 instances. @0xadada do I understand correctly that in your case subsequent requests do get handled? For us it seems to completely stop the server from being able to handle any requests after that point.
-
Socket hang-ups do occur from time to time if the client aborts the connection, and it seems that after the abort, Next.js is still actively waiting for incoming TCP packets. There are a few candidates for where this error could occur, but since it happens in production, where incoming traffic might be huge and really hard to reproduce at small scale, pinpointing the exact part is hard. Nonetheless, I have some rough ideas about where this problem (or problems) could be, based on the effects some of you have mentioned.
In both cases, Next.js uses http-proxy to forward requests between processes. I might write a proposal to rewrite the IPC communication between Next.js processes to handle requests better (support for IPC callbacks, passing the req/res pair to another process, etc.). @dbrxnds With the first point described, does the subsequent request after the error fail immediately, or is there a timeout before it fails? @0xadada I need to confirm: are you deploying this on your own machine or on shared hosting (Vercel, etc.)? Do you use appDir, pageDir, or a combination of both?
-
Appreciate the well-written response, @NadhifRadityo. I am fairly certain subsequent requests just hang, at least for a good while. We end up getting an error response saying "the upstream server returned an invalid response", but I assume that is just the load balancer or some other part doing its thing. Requests just remain pending in your network tab until that point.
-
This seems unlikely, but is there a chance your Next.js project does I/O operations synchronously, or runs heavy synchronous tasks? Also, I need to confirm: the request hangs for any route, right? (dynamic page, static page, static resources) And to make sure, can you do a process list with process arguments, before and after the error? Search for something like And for the record, do you use appDir only, pageDir, or a combination of both? I will try to eliminate IPC communication first, as it makes the most sense in my opinion. I'll try manually killing the worker process and see if I can reproduce the problem.
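On the synchronous-I/O question: one cheap way to check whether something is blocking the event loop (which would make every route hang at once, matching the symptoms) is to measure how late a zero-delay timer actually fires. A hedged sketch, with `blockFor` standing in for heavy synchronous work:

```javascript
// Schedule a 0 ms timer and report how late it actually fired; a large
// lag means the event loop was blocked and could not service sockets.
function measureEventLoopLag(report) {
  const start = Date.now();
  setTimeout(() => report(Date.now() - start), 0);
}

// Stand-in for heavy synchronous work (sync file I/O, a huge
// JSON.parse, a tight computation loop, etc.).
function blockFor(ms) {
  const end = Date.now() + ms;
  while (Date.now() < end) { /* busy-wait */ }
}
```

Logging this lag periodically in production (Node also ships `perf_hooks.monitorEventLoopDelay` for a production-grade version) would distinguish "the event loop is starved" from "a specific socket is broken".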
-
@NadhifRadityo yes, I've got the next.js process running in a shared Docker container with a Ruby webserver. The Ruby web client makes HTTP requests to our next process on localhost running
-
I faced the same issue when I tried to upgrade the Node.js version from 16 to 20, and it only occurs in the production environment.
However, after upgrading Next.js to the latest (14.0.3), it seems the issue is gone.
-
"next": "^14.0.3" the same issue when running custom server |
Beta Was this translation helpful? Give feedback.
-
I have just tried v14.0.4-canary.47 and the issue persists. I also tried Node.js v18 and v20. We are only using the App Router. We do not use Prisma or NextAuth. This is affecting builds hosted on Vercel.com (including production). It takes a little while for the issue to pop up after deploying, but after a few RSC renders, it happens quite often (~15% of the time).
-
Same issue here, happening very often; impossible to find where it comes from. I am using "next": "^14.0.4", with NextAuth and Next.js middleware (my app also uses Wundergraph/sdk). Any update on this issue? Thanks a lot.
-
In case anyone is using Sentry, our issue turned out to be related to a bug with the https://github.com/orgs/vercel/discussions/3248#discussioncomment-7851868
-
I am not using Sentry, but I would be very interested to understand which type of error from Sentry was solved. Indeed, I get the "socket hang up" error quite often but have found no way to track down the issue so far... Thanks. I am using Next.js 14.0.4, NextAuth with middleware, and Wundergraph as a backend.
-
Also have this exact problem, using Next.js 14.0.3. The error is not caught by our Next.js error boundary, and the user sees the default Vercel 500 error page (black background, white text). The site works after simply refreshing. My initial thought (before finding this thread) was: could this have something to do with cookies from Vercel preview deployments?
-
Also experiencing this issue. Node 18/20, Next 14.0.4, next-auth 4.24.5. Specifically, my
Not using Vercel; this is on a Windows Server 2022 VM and locally on my M1 Mac. It works fine running
I was seeing this intermittently, but over the last week something has changed and I am seeing it 100% of the time. Again, this all works perfectly when running Other issues I've found along the way that may be related:
-
Exact same issue on Node 18, Next 14.0.4, next-auth 4.24.5. In
Not sure how it's related, but the random freezes we experienced in production every few days are now completely gone (2 weeks in a row without this issue)! I'd guess a user occasionally got an unexpected error (from authenticating, in our case), which triggered this 500 error page, and since
-
Hello guys, if you're using Sentry and keep getting 500 errors, refer to this thread: Error 500. This fixed my random error 500.
-
Same thing here with
for any request that takes more than 30s to complete. It seems related to the previous issue: I can see that the remote server continues to process, eventually completes, and returns a correct response, not the 500 that Next.js is claiming. Is there any way to eliminate the timeout or customize it to a longer period?
-
Hi everyone, I will be moving this issue to our We encourage folks to file a new issue with a consistently reproducible Happy 2024!
-
Hello, I had the same issue: Why do I mention these dependencies? Because in my case, the error occurred when the user tried to log in. At that moment, the bcrypt.compare function was executed, and, for some strange reason, if the credentials were correct, the app would crash. I hope my case can help someone.
-
I have the following error:
after recently moving to Next v14 and using
-
Don't know if this will be useful for someone, but in my case I was wrongly calling a route segment from a layout. The call was being cached (revalidation was set to 10 minutes), so at build time we were getting the socket hang up error, and that remained the response until the revalidation period ended, at which point we would finally get the result of actually calling the external API.
-
Verify canary release
Provide environment information
```
Operating System:
  Platform: darwin
  Arch: x64
  Version: Darwin Kernel Version 21.6.0: Mon Aug 22 20:17:10 PDT 2022; root:xnu-8020.140.49~2/RELEASE_X86_64
Binaries:
  Node: 16.14.2
  npm: 8.5.0
  Yarn: 1.22.15
  pnpm: 6.11.0
Relevant packages:
  next: 13.4.6
  eslint-config-next: 13.4.2
  react: 18.2.0
  react-dom: 18.2.0
  typescript: 4.9.5
```
Which area(s) of Next.js are affected? (leave empty if unsure)
No response
Link to the code that reproduces this issue or a replay of the bug
Not possible (confidential).
To Reproduce
This is our package.json:
Our next.config.js:
Our middleware.ts:
Describe the Bug
We are experiencing a bug that occurs randomly for some of our users, only in production, on any route of the site, and it has never been reported on Sentry. We can only see it in the Vercel logs.
The full error message is as follows:
```
Uncaught Exception
{
  "errorType": "Error",
  "errorMessage": "socket hang up",
  "code": "ECONNRESET",
  "stack": [
    "Error: socket hang up",
    "    at connResetException (node:internal/errors:717:14)",
    "    at TLSSocket.socketOnEnd (node:_http_client:526:23)",
    "    at TLSSocket.emit (node:events:525:35)",
    "    at TLSSocket.emit (node:domain:489:12)",
    "    at endReadableNT (node:internal/streams/readable:1359:12)",
    "    at process.processTicksAndRejections (node:internal/process/task_queues:82:21)"
  ]
}
Unknown application error occurred
Runtime.Unknown
```
We think (but can't verify) that this bug appeared when we updated to Next.js 13. However, none of our pages use the App Router; we're still using the Pages Router for the time being. We've seen that rewrites can cause socket hang-ups, but as you can see in our next.config.js, we don't use rewrites.
This can happen on SSG (Static Site Generation), SSR (Server-Side Rendering), or Client-side rendered pages.
It can also happen on any browser or device.
Honestly, we have no clue or way of reproducing this problem because even in our development environment, we don't encounter any problems.
Expected Behavior
I expect the application to work seamlessly without any errors or disruptions. Specifically, I anticipate that the mentioned "Socket Hang Up" error will not occur randomly in production mode on any route of the site. Additionally, I hope that better error handling mechanisms will be implemented to address any potential issues that may arise.
Which browser are you using? (if relevant)
No response
How are you deploying your application? (if relevant)
Vercel