[Bug]: fatal "InvalidStateError: readyState not OPEN" error after losing connection #3519

Macil · 2024-03-31T03:42:46Z

Runtime

Deno

Runtime Version

1.42.0

Version

18.0.1

Describe the bug

Sometimes when I leave a Discord bot running on my computer, let the computer go to sleep, and then wake it up again, I find that the bot has exited with this error output:

error: Uncaught (in promise) InvalidStateError: readyState not OPEN
      shard.socket?.send(
                    ^
    at WebSocket.send (ext:deno_websocket/01_websocket.js:326:13)
    at https://deno.land/x/discordeno@18.0.1/gateway/shard/startHeartbeating.ts:52:21
    at eventLoopTick (ext:core/01_core.js:203:13)

This doesn't always happen when my bot is running while my computer falls asleep. Sometimes it works fine.

I have not looked into Discordeno's code, but having previously developed programs using WebSockets and run into the same "readyState not OPEN" error, I know that this can be caused by calling .send() on a WebSocket that is still connecting to a server or whose connection has already been closed. I think it's likely that Discordeno has a heartbeat timer that can call .send() while the connection is closed and not yet reopened and/or while the connection has previously been closed and is still reconnecting. The fix would be either to make the heartbeat function check the current state of the WebSocket before sending with it, or to make the heartbeat timer shut down entirely once the WebSocket has been closed and restart once a new WebSocket connection has been established.
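A minimal sketch of both suggested fixes, assuming nothing about Discordeno's internals: `SocketLike`, `sendHeartbeatIfOpen`, and `startHeartbeat` are hypothetical names made up for this illustration. The guard skips the send unless readyState is OPEN, and the returned stop function is meant to be called from the socket's close handler.

```typescript
// Hypothetical minimal socket shape; a real WebSocket satisfies it.
interface SocketLike {
  readyState: number;
  send(data: string): void;
}

// Numeric value of WebSocket.OPEN in the WebSocket API.
const OPEN = 1;

// Fix 1: check the socket state immediately before sending.
// Returns true if the heartbeat was sent, false if it was skipped.
function sendHeartbeatIfOpen(
  socket: SocketLike | undefined,
  lastSequence: number | null,
): boolean {
  if (socket === undefined || socket.readyState !== OPEN) return false;
  // Gateway heartbeat payload: opcode 1 with the last sequence number.
  socket.send(JSON.stringify({ op: 1, d: lastSequence }));
  return true;
}

// Fix 2: tie the timer to one connection and stop it when that connection closes.
// The caller invokes the returned function from its close handler.
function startHeartbeat(
  socket: SocketLike,
  intervalMs: number,
  getSequence: () => number | null,
): () => void {
  const timer = setInterval(() => {
    sendHeartbeatIfOpen(socket, getSequence());
  }, intervalMs);
  return () => clearInterval(timer);
}
```

Either fix alone would avoid the crash above; combining them also covers a timer tick that fires between the close and the cleanup.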

What should've happened?

Discordeno should always be able to handle a connection closing and reopening without a fatal error.

Code to reproduce the bug

import "https://deno.land/std@0.220.1/dotenv/load.ts";
import {
  createBot,
  Intents,
  startBot,
} from "https://deno.land/x/discordeno@18.0.1/mod.ts";

const bot = createBot({
  token: Deno.env.get("DISCORD_TOKEN")!,
  intents: Intents.Guilds | Intents.GuildMessages | Intents.DirectMessages |
    Intents.MessageContent,
  events: {
    ready(bot, payload) {
      console.log(new Date(), "ready");
    },
    messageCreate(bot, message) {
      console.log(new Date(), "messageCreate");
    },
  },
});
await startBot(bot);

Macil · 2024-04-29T03:50:38Z

I decided to look into the code behind this so I could fix it myself, but it looks like this issue was already solved in 2714e1e, as the refactored startHeartbeating method in Shard.ts now checks the socket's readyState (through a call to this.isOpen()) immediately before calling send on it, so it won't try to send a heartbeat for a connection that's closed or still connecting. However, there haven't been any new releases since that commit. Is a release planned?

Fleny113 · 2024-04-29T04:39:40Z

The Node migration is v19, which is still in development and not yet released, but there is a per-commit version ~~(stuck to an old commit rn, see #2977 and #3270)~~¹. Since you are using Deno, you need to use Deno's npm compatibility with the @discordeno/bot package.

If you want to update to v19, there are some changes that aren't well documented at the moment (like desired properties), but if you run into any issue you can ask on our Discord server (https://discord.gg/ddeno).

¹ We re-enabled the release on every commit, so the npm version is now up to date.

Macil · 2024-04-29T08:06:45Z

Trying to use the latest version (19.0.0-next.6ad4e1d) of the @discordeno/bot package fails with an error "Could not find npm package '@discordeno/gateway' matching '19.0.0-alpha.1'." (And trying to use the previous version, 19.0.0-next.d81b28a, fails because my events.messageCreate callback gets called with a message object that's just { bitfield: ToggleBitfield { bitfield: 0 }, flags: ToggleBitfield { bitfield: 0 } }).

(For the moment I've made a fork of the repo published on deno.land at 18.0.1 with my attempt at a fix.)

Fleny113 · 2024-04-29T11:11:24Z

I will look into the npm error. For the version before, it's because of desired properties: you need to be explicit about which properties you want on the Message object, the Interaction object, and so on, via bot.transformers.desiredProperties.

Macil · 2024-04-30T02:04:15Z

I've actually just run into a new similar error while waking up my computer while using Discordeno, that seems to be in parts of code that are unchanged in v19:

error: Uncaught (in promise) InvalidStateError: readyState not OPEN
  shard.socket?.send(JSON.stringify(message));
                ^
    at WebSocket.send (ext:deno_websocket/01_websocket.js:326:13)
    at send (https://deno.land/x/discordeno_patched@18.0.2/gateway/shard/send.ts:26:17)
    at eventLoopTick (ext:core/01_core.js:168:7)
    at async Object.send (https://deno.land/x/discordeno_patched@18.0.2/gateway/shard/createShard.ts:147:14)

The relevant parts of send.ts (in v18, but v19 still has this same code as-is refactored into Shard.ts):

discordeno/gateway/shard/send.ts

Lines 3 to 27 in c522cd9

    
async function checkOffline(shard: Shard, highPriority: boolean): Promise<void> {
  if (!shard.isOpen()) {
    await new Promise((resolve) => {
      if (highPriority) {
        // Higher priority requests get added at the beginning of the array.
        shard.offlineSendQueue.unshift(resolve);
      } else {
        shard.offlineSendQueue.push(resolve);
      }
    });
  }
}

export async function send(shard: Shard, message: ShardSocketRequest, highPriority: boolean): Promise<void> {
  // Before acquiring a token from the bucket, check whether the shard is currently offline or not.
  // Else bucket and token wait time just get wasted.
  await checkOffline(shard, highPriority);

  await shard.bucket.acquire(1, highPriority);

  // It's possible, that the shard went offline after a token has been acquired from the bucket.
  await checkOffline(shard, highPriority);

  shard.socket?.send(JSON.stringify(message));
}

It seems like when await checkOffline(...); finished, the socket wasn't in the open state. Looking at the body of checkOffline(), the function could resolve while the socket is closed if the offlineSendQueue was resolved while the socket was closed.
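One way to make the wait itself robust against this, sketched under assumptions rather than taken from Discordeno: re-check the state in a loop after every wake-up instead of checking once. `ShardLike` and `waitUntilOpen` are hypothetical names for this illustration.

```typescript
// Hypothetical stand-in for the parts of Shard that the wait touches.
interface ShardLike {
  isOpen(): boolean;
  offlineSendQueue: Array<() => void>;
}

// Like checkOffline, but loops: if the queue is resolved while the socket is
// still (or again) closed, the caller re-queues itself instead of proceeding.
async function waitUntilOpen(shard: ShardLike, highPriority: boolean): Promise<void> {
  while (!shard.isOpen()) {
    await new Promise<void>((resolve) => {
      if (highPriority) {
        // Higher priority requests get added at the beginning of the array.
        shard.offlineSendQueue.unshift(resolve);
      } else {
        shard.offlineSendQueue.push(resolve);
      }
    });
  }
}
```

This only works if whoever resolves the queue also empties it, otherwise stale resolvers accumulate on every loop iteration.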

Here are the places that the offlineSendQueue is resolved (v19 code, also identical here to v18):

discordeno/packages/gateway/src/Shard.ts

Lines 429 to 450 in 687c29d

    
       case 'RESUMED':
         this.state = ShardState.Connected
         this.events.resumed?.(this)

         // Continue the requests which have been queued since the shard went offline.
         this.offlineSendQueue.map((resolve) => resolve())

         this.resolves.get('RESUMED')?.(packet)
         this.resolves.delete('RESUMED')
         break
       case 'READY': {
         // Important for future resumes.
         const payload = packet.d as DiscordReady

         this.resumeGatewayUrl = payload.resume_gateway_url

         this.sessionId = payload.session_id
         this.state = ShardState.Connected

         // Continue the requests which have been queued since the shard went offline.
         // Important when this is a re-identify
         this.offlineSendQueue.map((resolve) => resolve())

My guess is that the socket can be closed at this point, possibly because of any awaits above this code within the same function or because of other asynchronous steps between the message being read on the socket and this function being called. This seems confirmed to me because of this line above within the same function handling this exact condition, which would prevent this issue if done for RESUMED and READY packet handling too:

discordeno/packages/gateway/src/Shard.ts

Lines 340 to 343 in 687c29d

    
    switch (packet.op) {
      case GatewayOpcodes.Heartbeat: {
        // TODO: can this actually happen
        if (!this.isOpen()) return

I'm not sure whether adding if (!this.isOpen()) return above the RESUMED and READY packet handling would be the best solution, or whether code in the rest of the function is still important to run even when the socket is closed, in which case a tighter if (this.isOpen()) { ... } block around just the code dealing with offlineSendQueue would be better.

Also I noticed that the offlineSendQueue is never emptied. The current code is a memory leak, though a very minor one because I think already-called Promise resolvers don't keep references to anything. And kind of a nitpick, but an offlineSendQueue.forEach call would be a little better than an offlineSendQueue.map call if you don't want a new result list to be returned from the call. Here's a suggested fix which clears it. The resolve() calls don't do any real work synchronously and only cause the promise to execute its callbacks later in a microtask, so there's no risk that more items will be added to the offlineSendQueue in between the time we resolve everything in it and when we empty it.

       case 'RESUMED':
+        if (!this.isOpen()) return
         this.state = ShardState.Connected
         this.events.resumed?.(this)

         // Continue the requests which have been queued since the shard went offline.
-        this.offlineSendQueue.map((resolve) => resolve())
+        this.offlineSendQueue.forEach((resolve) => resolve())
+        this.offlineSendQueue.length = 0

         this.resolves.get('RESUMED')?.(packet)
         this.resolves.delete('RESUMED')
         break
       case 'READY': {
+        if (!this.isOpen()) return
         // Important for future resumes.
         const payload = packet.d as DiscordReady

         this.resumeGatewayUrl = payload.resume_gateway_url

         this.sessionId = payload.session_id
         this.state = ShardState.Connected

         // Continue the requests which have been queued since the shard went offline.
         // Important when this is a re-identify
-        this.offlineSendQueue.map((resolve) => resolve())
+        this.offlineSendQueue.forEach((resolve) => resolve())
+        this.offlineSendQueue.length = 0
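
The ordering claim behind this fix can be checked in isolation. A minimal standalone sketch, not tied to Discordeno (`resolverQueue`, `eventLog`, `waitForFlush`, and `flushQueue` are hypothetical names): the waiters' continuations only run after the synchronous flush, including the clear, has finished.

```typescript
const resolverQueue: Array<() => void> = [];
const eventLog: string[] = [];

// Stand-in for a caller parked in checkOffline(): it queues its resolver and
// records when its continuation actually runs.
function waitForFlush(label: string): Promise<void> {
  return new Promise<void>((resolve) => {
    resolverQueue.push(resolve);
  }).then(() => {
    eventLog.push(`${label} resumed`);
  });
}

// Resolve every waiter, then empty the queue, all in one synchronous step.
function flushQueue(): void {
  resolverQueue.forEach((resolve) => resolve());
  resolverQueue.length = 0;
  eventLog.push("flush finished");
}

const allWaiters = Promise.all([waitForFlush("first"), waitForFlush("second")]);
flushQueue();
// Here eventLog holds only "flush finished"; the "resumed" entries are
// appended later, when the microtask queue runs.
```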

Fleny113 · 2024-04-30T05:19:36Z

You've made me notice another problem: if the event handlers we call do something synchronous and blocking, couldn't that also cause the promise to resolve after the shard has already been closed by Discord? If so, that needs to be accounted for too.

Macil · 2024-04-30T19:13:22Z

When this.offlineSendQueue.forEach((resolve) => resolve()) is called, everything waiting on checkOffline() including the Shard send() method resumes in a microtask which is scheduled after the current JS execution ends but before any more IO events get dispatched to JS (like a websocket close event), so I think with my suggested fix, the only way the socket can close between checkOffline() ending and the send call in the Shard send() method is if Discordeno itself calls socket.close() in that time. (For example, this could happen if there are multiple places in Discordeno that are concurrently waiting on calls to await checkOffline(...);, the offlineSendQueue gets resolved, the first waiter on checkOffline() immediately calls socket.close(), and then all of the other subsequent waiters find the socket is closed. However, if Discordeno never calls socket.close(), or at least never calls it immediately after an await checkOffline(...);, then there shouldn't be any risk of that.)

If there's a risk of that or we want the code to be solid against that kind of thing, then offlineSendQueue needs not to be a list of promise resolvers that are all resolved together all at once, but a list of callbacks that are sequentially executed with a fresh isOpen() check before each one starts, so if one callback calls socket.close() then the subsequent callbacks stay queued:

class Shard {
  socket: WebSocket | undefined;
  isOpen() {
    return this.socket?.readyState === WebSocket.OPEN;
  }
  #runWhenOnlineCallbacks: Array<() => void> = [];
  #executeRunWhenOnlineCallbacks() {
    while (this.#runWhenOnlineCallbacks.length) {
      if (!this.isOpen()) break;
      const callback = this.#runWhenOnlineCallbacks.shift()!;
      callback();
    }
  }
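  runWhenOnline<T>(callback: () => T, highPriority: boolean): Promise<T> {
    return new Promise((resolve, reject) => {
      const queuedCallback = () => {
        // Forward a throwing callback to the caller instead of aborting the queue.
        try {
          resolve(callback());
        } catch (error) {
          reject(error);
        }
      };
      if (highPriority) {
        // Higher priority requests get added at the beginning of the array.
        this.#runWhenOnlineCallbacks.unshift(queuedCallback);
      } else {
        this.#runWhenOnlineCallbacks.push(queuedCallback);
      }
      // If the socket is already open, run the queue right away instead of
      // waiting for the next RESUMED or READY packet.
      this.#executeRunWhenOnlineCallbacks();
    });
  }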
  async send(message: ShardSocketRequest, highPriority: boolean): Promise<void> {
    await this.runWhenOnline(() => {
      this.socket!.send(JSON.stringify(message));
    }, highPriority);
  }
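  async discordMessage(message: { type: string }) {
    // ...
    // some awaits etc
    // Both RESUMED and READY mean the shard is connected again.
    if (message.type === 'RESUMED' || message.type === 'READY') {
      this.#executeRunWhenOnlineCallbacks();
    }
    // ...
  }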
}
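
To illustrate the behaviour this design aims for, here is a self-contained variant with a fake socket. `FakeSocket` and `OnlineQueue` are hypothetical names used only for this demonstration. When one callback closes the socket, the remaining callbacks stay queued until the next drain on an open socket.

```typescript
// Fake socket with just enough state for the demonstration.
class FakeSocket {
  open = true;
  sent: string[] = [];
  send(data: string): void {
    if (!this.open) throw new Error("readyState not OPEN");
    this.sent.push(data);
  }
  close(): void {
    this.open = false;
  }
}

class OnlineQueue {
  private callbacks: Array<() => void> = [];
  private socket: FakeSocket;

  constructor(socket: FakeSocket) {
    this.socket = socket;
  }

  pendingCount(): number {
    return this.callbacks.length;
  }

  enqueue(callback: () => void): void {
    this.callbacks.push(callback);
  }

  // Run callbacks one at a time, re-checking the socket before each one.
  drain(): void {
    while (this.callbacks.length > 0 && this.socket.open) {
      const callback = this.callbacks.shift()!;
      callback();
    }
  }
}

const demoSocket = new FakeSocket();
const demoQueue = new OnlineQueue(demoSocket);
demoQueue.enqueue(() => demoSocket.send("a"));
demoQueue.enqueue(() => demoSocket.close());
demoQueue.enqueue(() => demoSocket.send("b"));
demoQueue.drain();
// "a" was sent, the socket was closed, and the send of "b" is still queued.
```

With the all-at-once promise resolution, the third waiter would have resumed and called send on the closed socket.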

Macil added t-bug Something isn't working w-unverified This has not been verified labels Mar 31, 2024

Macil added a commit to Macil/discordeno that referenced this issue Apr 29, 2024

fix discordeno#3519

b30b647

Fleny113 mentioned this issue Apr 29, 2024

fix: bump script ignoring some @discordeno deps #3547

Merged

Macil added a commit to Macil/discordeno that referenced this issue Apr 30, 2024

More closed socket fixes

3a7c4b2

Re: discordeno#3519

Macil added a commit to Macil/discordeno that referenced this issue Apr 30, 2024

More closed socket fixes

e265d36

Re: discordeno#3519

Fleny113 added w-verified This had been verified and removed w-unverified This has not been verified labels May 18, 2024
