Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Socket.io client keeps disconnecting and reconnecting at a fixed interval because of ping timeout #1584

Open
valentin-at-edlio opened this issue May 25, 2023 · 19 comments
Labels
needs investigation This issue needs further investigations

Comments

@valentin-at-edlio
Copy link

valentin-at-edlio commented May 25, 2023

Recently I've updated my socket.io server and client to 4.6.1 and ever since then my client keeps disconnectiong and reconnecting at a fixed interval. The disconnection reason on the client is ping timeout...

It does not matter whether the tab is in the foreground or background.
I cannot reproduce this issue on localhost.

I still have a client that uses 2.2.0 and it does not reconnect at all and seems to be working correctly.

Left is 2.2.0, right is 4.6.1. They both connect to the same server.
Screenshot_601

And more importantly the 4.6.1 client receives the ping (2) and sends the pong (3) message to the server, but it does not seem to matter.
image

Socket.IO server version: 4.6.1

Server

    this.io = new Server(httpServer, {
      allowEIO3: true,
      serveClient: false
    });

Socket.IO client version: 4.6.1

Client

    this.socket = io(url, {
      transports: ['websocket'],
      reconnection: true,
      reconnectionAttempts: 50,
      reconnectionDelay: 5000,
      reconnectionDelayMax: 25000,
      auth: {
        token: *****
      }
    });

Expected behavior
I expect the 4.6.1 client not to reconnect at a fixed interval.

Platform:
I can reproduce with:

  • Chrome Version 113.0.5672.127 (Official Build) (64-bit)
  • Firefox 113.0.1 (64-bit)
  • Microsoft Edge Version 113.0.1774.50 (Official build) (64-bit)
@valentin-at-edlio valentin-at-edlio added the to triage Waiting to be triaged by a member of the team label May 25, 2023
@valentin-at-edlio valentin-at-edlio changed the title Socket.io client keeps disconnecting and reconnecting at a fixed interval Socket.io client keeps disconnecting and reconnecting at a fixed interval because of ping timeout May 25, 2023
@darrachequesne
Copy link
Member

Hi! Thanks for the details 👍

This "1.2 minute" delay is weird indeed. It does not seem related to the heartbeat mechanism, as pingInterval/pingTimeout values default to 25s/20s (your screenshot seems to agree with that).

Reference: https://socket.io/docs/v4/server-options/#pinginterval

I cannot reproduce this issue on localhost.

Could it be linked to the platform where your server is deployed? Does it happen if you regularly send some data from server to client (or vice versa)?

@valentin-at-edlio
Copy link
Author

valentin-at-edlio commented May 26, 2023

Could it be linked to the platform where your server is deployed?

Not sure since as I said an older client can connect just fine. Currently my server is running as an app service on Azure. I'll try to test it on a different platform...

Does it happen if you regularly send some data from server to client (or vice versa)?

Yes, it does. I tried spamming the server with an event to which it is supposed to respond with all online users. It worked right until the first ping-pong, then the server stopped answering.

image

I see that the new 4.x client receives a ping then sends a pong while the 2.x client sends the ping then receives the pong. I'm not sure if it matters, just pointing out.

I've tested different client versions and I can reproduce the issue with 3.1.3, but 2.5.0 works.

@selvaganes

This comment was marked as off-topic.

@selvaganes
Copy link

selvaganes commented May 29, 2023

After doing a bit more testing on different scenarios, I am adding a bit more information on the topic discussed here.

A) When I install up to version 4.4.1 using npm. I get the following message post install npm audit:-
npm audit report

socket.io-parser 4.0.4 - 4.2.2
Severity: critical
Insufficient validation when decoding a Socket.IO packet - GHSA-qm95-pgcg-qqfq
Insufficient validation when decoding a Socket.IO packet - GHSA-cqmj-92xf-r6r9
fix available via npm audit fix
node_modules/socket.io-client/node_modules/socket.io-parser
socket.io-client 1.0.0-pre - 1.0.1 || 4.3.0 - 4.4.1
Depends on vulnerable versions of socket.io-parser
node_modules/socket.io-client

2 critical severity vulnerabilities

To address all issues, run:
npm audit fix

If I were to run this unfixed without 'npm audit fix', everything runs fine on the event disconnect on the client
If I were to run 'npm audit fix' the disconnect event on the client with the reason 'parse error' is triggered and new socket sessions are created on the server side.

B) When I install Version 4.5.1. Get a clean audit report with no vulnerabilities. However, on the browser which uses the socket.io-client, the following error is thrown from the browser:-

[Error] ERROR – Error: invalid payload — index.js:215
Error: invalid payload — index.js:215decodeString — index.js:215add — index.js:118ondata — manager.js:192ondata(anonymous function) — index.mjs:136onPacket — socket.js:324onPacket(anonymous function) — index.mjs:136onPacket — transport.js:105onData — transport.js:97wrapFn — zone.js:769onInvokeTask — core.mjs:23899onInvokeTask — core.mjs:24197runTask — zone.js:178invokeTask — zone.js:490invokeTask — zone.js:1664globalCallback — zone.js:1695
handleError (vendor.js:71258)
next (vendor.js:87405)
next
next (vendor.js:41150)
_next (vendor.js:41119)
next (vendor.js:41092)
(anonymous function) (vendor.js:40929)
errorContext (vendor.js:44238)
next (vendor.js:40922)
emit (vendor.js:83398)
run (polyfills.js:7358)
onHandleError (vendor.js:86716)
runTask (polyfills.js:7406)
invokeTask (polyfills.js:7679)
invokeTask (polyfills.js:8830)
globalCallback (polyfills.js:8860)

The disconnect event on the client is not triggered. On the server side, the disconnect event that is displayed is "ping timeout"

C) On version 4.6.1, the audit report is clean with no vulnerabilities.
The disconnect event on the client with the reason 'parse error' is triggered.
The disconnect event on the server with the reason 'transport close' or 'transport error' is triggered as there seem to be many socket sessions for each new reconnect that was done.

The above scenario applies to both on localhost and remote servers.

Hope the above information helps more in finding the cause of this issue

@selvaganes
Copy link

D) On version 4.6.2, upgraded both server and client to the new version. Issue and symptoms are still the same.
The disconnect event on the client with the reason 'parse error' is triggered.
The disconnect event on the server with the reason 'transport close'

Additional Information.

When a request is sent, the client establishes a connection and performs the request. Then when another request is sent out, the disconnect happens, and a new connection is formed

The disconnect event is not fired on the client and server at the same time. While the disconnect with the reason 'parse error' is triggered immediately, there is no disconnect event on the server side. However, when the connection is terminated/disconnected intentionally, the server disconnect event is triggered multiple times with reason 'transport close' ( depending on how many requests were previously made ).

@yakirs57
Copy link

yakirs57 commented Jun 1, 2023

I have experienced the same issues. I had to revert back to the older version of socketio. If required I can also supply some logs and data.

@darrachequesne
Copy link
Member

@selvaganes I was not able to reproduce the "parse error" issue, could you please provide a reproducible example by forking the example here please?

"parse error" would mean that the server has sent some invalid packet, which seems weird. Could you please check the packet in the devtools?

@yakirs57 same question, could you please provide some way to reproduce the issue?

@valentin-at-edlio so, if I understand correctly, server v4 + client v3/v4 does not work, but server v4 + client v2 does?

I see that the new 4.x client receives a ping then sends a pong while the 2.x client sends the ping then receives the pong. I'm not sure if it matters, just pointing out.

Yes, that's indeed the main difference between v2 and v3/v4:

  • in v2, the client sends a PING and the server responds with a PONG
  • in v3/v4, the server sends a PING and the client responds with a PONG

@darrachequesne darrachequesne added needs investigation This issue needs further investigations and removed to triage Waiting to be triaged by a member of the team labels Jun 4, 2023
@selvaganes
Copy link

selvaganes commented Jun 6, 2023

@darrachequesne @yakirs57 @valentin-at-edlio
The error can be seen clearly here on the tweaked version of your fork.

The test scenarios described in the README

These is a null value in the packets, but I think you will also be able to see directly for better understanding.

Hope this is helpful and if you have anything further do let me know.

@darrachequesne
Copy link
Member

darrachequesne commented Jun 6, 2023

@selvaganes thanks a lot for taking the time to provide a reproducible test case ❤️

It seems there is an error in your example here: https://github.com/selvaganes/socket.io-fiddle-ouput/blob/e74a5909162bcc80e3f595ffe1832bf259f84f3d/server.js#L20

socket.on('request', (sneeze, callback ) => {
  console.log(`Request received from ${socket.id}: ${sneeze}`)
  socket.emit(callback(`Responded ${socket.id}: Bless you!!!`))
});

It should be either:

socket.on('request', (sneeze, callback ) => {
  console.log(`Request received from ${socket.id}: ${sneeze}`)
  callback(`Responded ${socket.id}: Bless you!!!`);
});

or

socket.on('request', (sneeze) => {
  console.log(`Request received from ${socket.id}: ${sneeze}`)
  socket.emit('response', `Responded ${socket.id}: Bless you!!!`);
});

In your example, socket.emit() is called with undefined (the return value of the callback()), which is now forbidden by the library (since socketio/socket.io-parser@3b78117) and cause the "parse error" disconnection.

@selvaganes
Copy link

Yes it works. Thank you very very very much :)

@valentin-at-edlio
Copy link
Author

valentin-at-edlio commented Jun 7, 2023

@valentin-at-edlio so, if I understand correctly, server v4 + client v3/v4 does not work, but server v4 + client v2 does?

@darrachequesne Yes, at least this is what I have experienced. I've upgraded my prod server to 4.6.1, but left all clients at 2.2.0 and everything seems to work well.

@darrachequesne
Copy link
Member

@valentin-at-edlio well, that's really weird. Do you have a test environment where we could investigate?

@nikhildigde
Copy link

Hi @darrachequesne ,
I am using a flask server with socket.io-client. Im too getting the same errors mentioned above by @valentin-at-edlio .

parse error when using 4.5.2 and above
v4.5.1 -> invalid payload -> decodeString

Funny enough, if I use this project (https://github.com/adrianhuber17/webSocket-App/blob/main/front-end/package-lock.json) as is (with the package-lock.json) things work well. But if I delete it and reinstall with 4.6.1 it again has the parse error.
I guess the deps then were fine :D

Any idea how this can be resolved?

@darrachequesne
Copy link
Member

@nikhildigde thanks for the reproducible example ❤️

It's because you are using reserved event names:

Those event names have a special meaning and cannot be used in your application.

Reference: https://socket.io/docs/v4/emit-cheatsheet/#reserved-events

The parser now includes a check: socketio/socket.io-parser@d9db473 (included in version 4.6.x)

@nikhildigde
Copy link

@darrachequesne - Thanks for the prompt response! Will change accordingly.

@zhengcling
Copy link

@valentin-at-edlio so, if I understand correctly, server v4 + client v3/v4 does not work, but server v4 + client v2 does?

I had the same problem in polling mode.

When decodeString is executed, the input parameters are as follows:

2["joined", { ...messages }]\x00\x03\x06\x01�42["joined", { ...messages }]\x00\x03\x05\t�42["joined", { ...messages }]\x00\x01\x00\x02\x01�42["joined", { ...messages }]42["joined", { ...messages }]

My server uses netty-socketio@2.0.3 and the client version is 4.7.2

@darrachequesne Any idea how this can be resolved? Thank you.

@maxime1992
Copy link

maxime1992 commented Feb 8, 2024

I suspect I'm having the same issue. 3 intense days of debugging and I still haven't found the root cause nor how to fix it.

After 1.2mn, the client is timing out with the reason "ping timeout". When this happens, it terminates the socket and opens a new one:

image

What's even worst is that the second time it does not try to reconnect:

image

This started to happen after a recent migration of socket.io in our codebase.

On the server here's my conf:

{
  cors: {
    origin: ['http://localhost:4200'],
  },
  transports: ['polling', 'websocket'],
}

and on the client:

{
  secure: true,
  transports: ['polling', 'websocket'],
  auth: { token },
}

Any idea? I'm happy to provide more context if needed I just don't know what else can be relevant.

EDIT: Actually, I'll share part of my yarn.lock so that you can see exactly what version is running

"@socket.io/component-emitter@~3.1.0":
  version "3.1.0"
  resolved "https://registry.npmjs.org/@socket.io/component-emitter/-/component-emitter-3.1.0.tgz#96116f2a912e0c02817345b3c10751069920d553"
  integrity sha512-+9jVqKhRSpsc591z5vX+X5Yyw+he/HCB4iQ/RYxw35CEPaY1gnsNE43nf9n9AaYjAQrTiI/mOwKUKdUs9vf7Xg==

"@socket.io/redis-adapter@8.2.1":
  version "8.2.1"
  resolved "https://registry.npmjs.org/@socket.io/redis-adapter/-/redis-adapter-8.2.1.tgz#36f75afc518d0e1fa4fa7c29e6d042f53ee7563b"
  integrity sha512-6Dt7EZgGSBP0qvXeOKGx7NnSr2tPMbVDfDyL97zerZo+v69hMfL99skMCL3RKZlWVqLyRme2T0wcy3udHhtOsg==
  dependencies:
    debug "~4.3.1"
    notepack.io "~3.0.1"
    uid2 "1.0.0"

socket.io-adapter@~2.5.2:
  version "2.5.2"
  resolved "https://registry.npmjs.org/socket.io-adapter/-/socket.io-adapter-2.5.2.tgz#5de9477c9182fdc171cd8c8364b9a8894ec75d12"
  integrity sha512-87C3LO/NOMc+eMcpcxUBebGjkpMDkNBS9tf7KJqcDsmL936EChtVva71Dw2q4tQcuVC+hAUy4an2NO/sYXmwRA==
  dependencies:
    ws "~8.11.0"

socket.io-client@4.7.4, socket.io-client@^4.4.1:
  version "4.7.4"
  resolved "https://registry.npmjs.org/socket.io-client/-/socket.io-client-4.7.4.tgz#5f0e060ff34ac0a4b4c5abaaa88e0d1d928c64c8"
  integrity sha512-wh+OkeF0rAVCrABWQBaEjLfb7DVPotMbu0cgWgyR0v6eA4EoVnAwcIeIbcdTE3GT/H3kbdLl7OoH2+asoDRIIg==
  dependencies:
    "@socket.io/component-emitter" "~3.1.0"
    debug "~4.3.2"
    engine.io-client "~6.5.2"
    socket.io-parser "~4.2.4"

socket.io-parser@~4.2.4:
  version "4.2.4"
  resolved "https://registry.npmjs.org/socket.io-parser/-/socket.io-parser-4.2.4.tgz#c806966cf7270601e47469ddeec30fbdfda44c83"
  integrity sha512-/GbIKmo8ioc+NIWIhwdecY0ge+qVBSMdgxGygevmdHj24bsfgtCmcUUcQ5ZzcylGFHsN3k4HB4Cgkl96KVnuew==
  dependencies:
    "@socket.io/component-emitter" "~3.1.0"
    debug "~4.3.1"

socket.io@4.7.4, socket.io@^4.4.1:
  version "4.7.4"
  resolved "https://registry.npmjs.org/socket.io/-/socket.io-4.7.4.tgz#2401a2d7101e4bdc64da80b140d5d8b6a8c7738b"
  integrity sha512-DcotgfP1Zg9iP/dH9zvAQcWrE0TtbMVwXmlV4T4mqsvY+gw+LqUGPfx2AoVyRk0FLME+GQhufDMyacFmw7ksqw==
  dependencies:
    accepts "~1.3.4"
    base64id "~2.0.0"
    cors "~2.8.5"
    debug "~4.3.2"
    engine.io "~6.5.2"
    socket.io-adapter "~2.5.2"
    socket.io-parser "~4.2.4"

EDIT 1: I just found the issue that was preventing it to reconnect properly so it's something pretty good already, but now I'm at the same stage as everyone here I believe, with all the websockets that are being created every minute or so

image

EDIT 2: I also have a question about this part of the doc: https://socket.io/docs/v4/server-options/#pinginterval

The server sends a ping packet every pingInterval ms, and if the client does not answer with a pong within pingTimeout ms, the server considers that the connection is closed.

There's nothing for the ping / pong that we need to implement ourselves right? It's built into the lib I'd guess? I'd think so and found nothing in the doc that says otherwise but as I keep hitting the "ping timeout" I start to wonder

@darrachequesne
Copy link
Member

@maxime1992 since the interval before disconnection is fairly stable (~1.2 minutes), I'd say something between your server and your client is closing the connection. Do you have any proxy (nginx, httpd) between them?

There's nothing for the ping / pong that we need to implement ourselves right? It's built into the lib I'd guess?

Yes it is. The server sends a ping every pingInterval (25s by default), and expects a pong from the client before pingTimeout (20s by default).

Are you able to reproduce the issue in local? In that case, could you please provide some code?

@maxime1992
Copy link

maxime1992 commented Feb 12, 2024

since the interval before disconnection is fairly stable (~1.2 minutes), I'd say something between your server and your client is closing the connection

Most likely yes, the interval is indeed really stable so quite suspicious...

Do you have any proxy (nginx, httpd) between them?

It's an angular app, so it is served through angular cli but I don't think it matters in this case.

As for the backend handling the websockets, it's built using NestJS so maybe something on that side even though I'd question my own code first before questioning NestJS code 🙃.

since the interval before disconnection is fairly stable (~1.2 minutes), I'd say something between your server and your client is closing the connection. Do you have any proxy (nginx, httpd) between them?

There's nothing for the ping / pong that we need to implement ourselves right? It's built into the lib I'd guess?

Yes it is. The server sends a ping every pingInterval (25s by default), and expects a pong from the client before pingTimeout (20s by default).

Are you able to reproduce the issue in local? In that case, could you please provide some code?

I have tried pretty hard to repro in 3 different configs:

  • Pure node js and socket IO
  • Pure node js and socket IO + redis (as I suspected initially it may have been related)
  • NestJS app with socket IO and redis

Unfortunately I still haven't managed to repro even though the code looks similar...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
needs investigation This issue needs further investigations
Projects
None yet
Development

No branches or pull requests

7 participants