Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Bug]: fatal "InvalidStateError: readyState not OPEN" error after losing connection #3519

Open
Macil opened this issue Mar 31, 2024 · 7 comments
Labels
t-bug Something isn't working w-verified This had been verified

Comments

@Macil
Copy link

Macil commented Mar 31, 2024

Runtime

Deno

Runtime Version

1.42.0

Version

18.0.1

Describe the bug

Sometimes when I leave a Discord bot running on my computer, let it fall asleep, and then come back on, I find that the bot has exited with this error output:

error: Uncaught (in promise) InvalidStateError: readyState not OPEN
      shard.socket?.send(
                    ^
    at WebSocket.send (ext:deno_websocket/01_websocket.js:326:13)
    at https://deno.land/x/discordeno@18.0.1/gateway/shard/startHeartbeating.ts:52:21
    at eventLoopTick (ext:core/01_core.js:203:13)

This doesn't always happen when my bot is running while my computer falls asleep. Sometimes it works fine.

I have not looked into Discordeno's code, but having previously developed programs using Websockets and running into the same "readyState not OPEN" error, I know that this can be caused by calling .send() on a Websocket that is still connecting to a server or when the connection has been closed already. I think it's likely that Discordeno has a heartbeat timer that can call .send() while the connection is closed and not yet reopened and/or while the connection has been previously closed and is still reconnnecting now. The fix would be to either make the heartbeat function check the current state of the Websocket before sending with it, or to make the heartbeat timer shut down entirely once the websocket has been closed and restart once a new websocket connection has been established.

What should've happened?

Discordeno should always be able to handle a connection closing and reopening without a fatal error.

Code to reproduce the bug

import "https://deno.land/std@0.220.1/dotenv/load.ts";
import {
  createBot,
  Intents,
  startBot,
} from "https://deno.land/x/discordeno@18.0.1/mod.ts";

const bot = createBot({
  token: Deno.env.get("DISCORD_TOKEN")!,
  intents: Intents.Guilds | Intents.GuildMessages | Intents.DirectMessages |
    Intents.MessageContent,
  events: {
    ready(bot, payload) {
      console.log(new Date(), "ready");
    },
    messageCreate(bot, message) {
      console.log(new Date(), "messageCreate");
    },
  },
});
await startBot(bot);
@Macil Macil added t-bug Something isn't working w-unverified This has not been verified labels Mar 31, 2024
@Macil
Copy link
Author

Macil commented Apr 29, 2024

I decided to look into the code behind this so I could fix it myself, but it looks like this issue was already solved in 2714e1e, as the refactored startHeartbeating method in Shard.ts now checks the socket's readyState (through a call to this.isOpen()) immediately before calling send on it, so it won't try to send a heartbeat for a connection that's closed or still connecting. However, there haven't been any new releases since that commit. Is a release planned?

@Fleny113
Copy link
Contributor

Fleny113 commented Apr 29, 2024

I decided to look into the code behind this so I could fix it myself, but it looks like this issue was already solved in 2714e1e, as the refactored startHeartbeating method in Shard.ts now checks the socket's readyState (through a call to this.isOpen()) immediately before calling send on it, so it won't try to send a heartbeat for a connection that's closed or still connecting. However, there haven't been any new releases since that commit. Is a release planned?

The node migration is v19 that is still in development, so it's not yet released but there is a per-commit version (stuck to an old commit rn, see #2977 and #3270)1 . You are using Deno so to use the package you need to use the deno npm compatibility and use the @discordeno/bot package.

If you want to update to v19 there are some changes that aren't too much documented at the moment (like desired proprieties) but if you face any issue you can ask on our discord server (https://discord.gg/ddeno)

Footnotes

  1. We re-enabled the relase on every commit, so now the npm version is up to date

Macil added a commit to Macil/discordeno that referenced this issue Apr 29, 2024
@Macil
Copy link
Author

Macil commented Apr 29, 2024

Trying to use the latest version (19.0.0-next.6ad4e1d) of the @discordeno/bot package fails with an error "Could not find npm package '@discordeno/gateway' matching '19.0.0-alpha.1'." (And trying to use the previous version, 19.0.0-next.d81b28a, fails because my events.messageCreate callback gets called with a message object that's just { bitfield: ToggleBitfield { bitfield: 0 }, flags: ToggleBitfield { bitfield: 0 } }).

(For the moment I've made a fork of the repo published on deno.land at 18.0.1 with my attempt at a fix.)

@Fleny113
Copy link
Contributor

Trying to use the latest version (19.0.0-next.6ad4e1d) of the @discordeno/bot package fails with an error "Could not find npm package '@discordeno/gateway' matching '19.0.0-alpha.1'." (And trying to use the previous version, 19.0.0-next.d81b28a, fails because my events.messageCreate callback gets called with a message object that's just { bitfield: ToggleBitfield { bitfield: 0 }, flags: ToggleBitfield { bitfield: 0 } }).

(For the moment I've made a fork of the repo published on deno.land at 18.0.1 with my attempt at a fix.)

I will look into the npm error, for the version before it's because of desired proprieties, you need to be explit on what proprieties you want on the Message object, the Interaction object and so on from bot.transformers.desiredProprieties

Macil added a commit to Macil/discordeno that referenced this issue Apr 30, 2024
@Macil
Copy link
Author

Macil commented Apr 30, 2024

I've actually just run into a new similar error while waking up my computer while using Discordeno, that seems to be in parts of code that are unchanged in v19:

error: Uncaught (in promise) InvalidStateError: readyState not OPEN
  shard.socket?.send(JSON.stringify(message));
                ^
    at WebSocket.send (ext:deno_websocket/01_websocket.js:326:13)
    at send (https://deno.land/x/discordeno_patched@18.0.2/gateway/shard/send.ts:26:17)
    at eventLoopTick (ext:core/01_core.js:168:7)
    at async Object.send (https://deno.land/x/discordeno_patched@18.0.2/gateway/shard/createShard.ts:147:14)

The relevant parts of send.ts (in v18, but v19 still has this same code as-is refactored into Shard.ts):

async function checkOffline(shard: Shard, highPriority: boolean): Promise<void> {
if (!shard.isOpen()) {
await new Promise((resolve) => {
if (highPriority) {
// Higher priority requests get added at the beginning of the array.
shard.offlineSendQueue.unshift(resolve);
} else {
shard.offlineSendQueue.push(resolve);
}
});
}
}
export async function send(shard: Shard, message: ShardSocketRequest, highPriority: boolean): Promise<void> {
// Before acquiring a token from the bucket, check whether the shard is currently offline or not.
// Else bucket and token wait time just get wasted.
await checkOffline(shard, highPriority);
await shard.bucket.acquire(1, highPriority);
// It's possible, that the shard went offline after a token has been acquired from the bucket.
await checkOffline(shard, highPriority);
shard.socket?.send(JSON.stringify(message));
}

It seems like when await checkOffline(...); finished, the socket wasn't in the open state. Looking at the body of checkOffline(), the function could resolve while the socket is closed if the offlineSendQueue was resolved while the socket was closed.

Here are the places that the offlineSendQueue is resolved (v19 code, also identical here to v18):

case 'RESUMED':
this.state = ShardState.Connected
this.events.resumed?.(this)
// Continue the requests which have been queued since the shard went offline.
this.offlineSendQueue.map((resolve) => resolve())
this.resolves.get('RESUMED')?.(packet)
this.resolves.delete('RESUMED')
break
case 'READY': {
// Important for future resumes.
const payload = packet.d as DiscordReady
this.resumeGatewayUrl = payload.resume_gateway_url
this.sessionId = payload.session_id
this.state = ShardState.Connected
// Continue the requests which have been queued since the shard went offline.
// Important when this is a re-identify
this.offlineSendQueue.map((resolve) => resolve())

My guess is that the socket can be closed at this point, possibly because of any awaits above this code within the same function or because of other asynchronous steps between the message being read on the socket and this function being called. This seems confirmed to me because of this line above within the same function handling this exact condition, which would prevent this issue if done for RESUMED and READY packet handling too:

switch (packet.op) {
case GatewayOpcodes.Heartbeat: {
// TODO: can this actually happen
if (!this.isOpen()) return

I'm not sure if if (!this.isOpen()) return above handling RESUMED and READY packets would be the best solution, or if code in the rest of the function is still important to run even when the socket is closed in which case a tighter if (this.isOpen()) { block around some of the code dealing with offlineSendQueue would be better.

Also I noticed that the offlineSendQueue is never emptied. The current code is a memory leak, though a very minor one because I think already-called Promise resolvers don't keep references to anything. And kind of a nitpick but a offlineSendQueue.forEach call would be a little better than a offlineSendQueue.map call if you don't want a new result list to be returned from the call. Here's a suggested fix which clears it. The resolve() calls don't do any real work synchronously and only cause the promise to execute its callbacks later in a microtask, so there's no risk that more items will be added to the offlineSendQueue in between the time we resolve everything in it and when we empty it.

       case 'RESUMED':
+        if (!this.isOpen()) return
         this.state = ShardState.Connected
         this.events.resumed?.(this)

         // Continue the requests which have been queued since the shard went offline.
-        this.offlineSendQueue.map((resolve) => resolve())
+        this.offlineSendQueue.forEach((resolve) => resolve())
+        this.offlineSendQueue.length = 0

         this.resolves.get('RESUMED')?.(packet)
         this.resolves.delete('RESUMED')
         break
       case 'READY': {
+        if (!this.isOpen()) return
         // Important for future resumes.
         const payload = packet.d as DiscordReady

         this.resumeGatewayUrl = payload.resume_gateway_url

         this.sessionId = payload.session_id
         this.state = ShardState.Connected

         // Continue the requests which have been queued since the shard went offline.
         // Important when this is a re-identify
-        this.offlineSendQueue.map((resolve) => resolve())
+        this.offlineSendQueue.forEach((resolve) => resolve())
+        this.offlineSendQueue.length = 0

Macil added a commit to Macil/discordeno that referenced this issue Apr 30, 2024
@Fleny113
Copy link
Contributor

At this point you made me notice another problem, if the events we call are sync calls to something sync blocking wouldn't that also cause the promise to resolve while the shard could have got closed by discord? Because if yes then there is also that to account for

@Macil
Copy link
Author

Macil commented Apr 30, 2024

When this.offlineSendQueue.forEach((resolve) => resolve()) is called, everything waiting on checkOffline() including the Shard send() method resumes in a microtask which is scheduled after the current JS execution ends but before any more IO events get dispatched to JS (like a websocket close event), so I think with my suggested fix, the only way the socket can close between checkOffline() ending and the send call in the Shard send() method is if Discordeno itself calls socket.close() in that time. (For example, this could happen if there are multiple places in Discordeno that are concurrently waiting on calls to await checkOffline(...);, the offlineSendQueue gets resolved, the first waiter on checkOffline() immediately calls socket.close(), and then all of the other subsequent waiters find the socket is closed. However, if Discordeno never calls socket.close(), or at least never calls it immediately after an await checkOffline(...);, then there shouldn't be any risk of that.)

If there's a risk of that or we want the code to be solid against that kind of thing, then offlineSendQueue needs not to be a list of promise resolvers that are all resolved together all at once, but a list of callbacks that are sequentially executed with a fresh isOpen() check before each one starts, so if one callback calls socket.close() then the subsequent callbacks stay queued:

class Shard {
  socket: WebSocket | undefined;
  isOpen() {
    return this.socket?.readyState === WebSocket.OPEN;
  }
  #runWhenOnlineCallbacks: Array<() => void> = [];
  #executeRunWhenOnlineCallbacks() {
    while (this.#runWhenOnlineCallbacks.length) {
      if (!this.isOpen()) break;
      const callback = this.#runWhenOnlineCallbacks.shift()!;
      callback();
    }
  }
  runWhenOnline<T>(callback: () => T, highPriority: boolean): Promise<T> {
    return new Promise((resolve) => {
      const queuedCallback = () => {
        resolve(callback());
      };
      if (highPriority) {
        // Higher priority requests get added at the beginning of the array.
        this.#runWhenOnlineCallbacks.unshift(queuedCallback);
      } else {
        this.#runWhenOnlineCallbacks.push(queuedCallback);
      }
    });
  }
  async send(message: ShardSocketRequest, highPriority: boolean): Promise<void> {
    await this.runWhenOnline(() => {
      this.socket!.send(JSON.stringify(message));
    }, highPriority);
  }
  async discordMessage(message) {
    // ...
    // some awaits etc
    if (message.type === 'RESUME') {
      this.#executeRunWhenOnlineCallbacks();
    }
    // ...
  }
}

@Fleny113 Fleny113 added w-verified This had been verified and removed w-unverified This has not been verified labels May 18, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
t-bug Something isn't working w-verified This had been verified
Projects
None yet
Development

No branches or pull requests

2 participants