Roadmap: raw data storage and state changes #6567

kyranet · 2021-08-30T13:07:39Z

In many places, we have methods and properties that return either raw API data or full structures, most notably the Message | APIMessage union type used in BaseCommandInteraction. What if we applied this raw-data concept to Structure#toJSON(), so it could be used to clone structures? We could then use new Message(client, message.toJSON()), for example; however, we want to keep the current structures' versatility in consolidating data that Discord might not send us.

Introducing Structure#data (or Structure#raw), a property containing the raw data from Discord, which brings numerous advantages:

Consolidates raw and cached data into one central cached structure, which means no more Message | APIMessage unions. Yay!
Simplifies Structure#toJSON() and significantly improves its performance, because the data is already raw:
```
toJSON() {
  // Shallow copy, so callers can't replace top-level fields of the stored raw data
  return { ...this.data };
}
```
We could use discord-api-types to strictly type the return values of those toJSON() methods.
Makes cloning structures easier:
```
clone() {
  // Works for any subclass whose constructor takes (client, data)
  return new this.constructor(this.client, this.toJSON());
}
```
Extending fix(Message): make #channel and #guild getters #6271 to more structures, so that their constructors are decoupled from state and accept only two parameters (client and data), may be a valuable change.
Eliminates the need for new releases to support new data emitted by Discord (although the typings may conflict).

Simplifies patching:

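```
_patch(data) {
  // Split out fields that are stored elsewhere (e.g. in managers)
  const { roles, channels, ...raw } = data;

  // Merge into a new object instead of mutating `data` or the previous `this.data`
  this.data = { ...this.data, ...raw };

  if (roles) {
    // Handle roles
  }

  if (channels) {
    // Handle channels
  }

  // ...
}
```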

Avoids mutating the incoming raw data, so new Message(client, Object.freeze(data)) won't throw.
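Putting these advantages together, a minimal sketch of the idea might look like the following. The Structure and Message classes here are hypothetical stand-ins (not discord.js's actual implementation), the client is a plain object, and mentions and future_field are example keys. The sketch only shows that patching never writes to frozen input, that clones are independent, and that unknown fields pass through untouched.

```
// Hypothetical classes illustrating the proposal; not discord.js's actual implementation.
class Structure {
  constructor(client, data) {
    this.client = client;
    this.data = {};
    this._patch(data);
  }

  _patch(data) {
    // Spread into a new object: the caller's (possibly frozen) input is never written to
    this.data = { ...this.data, ...data };
  }

  toJSON() {
    return { ...this.data };
  }

  clone() {
    // `this.constructor` is the concrete subclass, e.g. Message
    return new this.constructor(this.client, this.toJSON());
  }
}

class Message extends Structure {
  _patch(data) {
    // Fields that would live in managers are split out rather than stored in `data`
    const { mentions, ...raw } = data;
    this.data = { ...this.data, ...raw };

    if (mentions) {
      // Handle mentions (e.g. fill a user manager); omitted in this sketch
    }
  }
}

const client = {}; // stand-in for a Client instance
const frozen = Object.freeze({ id: "1", content: "hello", mentions: [], future_field: true });

const message = new Message(client, frozen); // no TypeError, since `frozen` is never mutated
const copy = message.clone();
copy._patch({ content: "edited" });

console.log(message.data.content); // hello
console.log(copy.data.content); // edited
console.log(message.toJSON().future_field); // true: unknown fields are kept without code changes
```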

Overall, this is a huge improvement with few downsides (alas, not zero). Now for the hard parts.

The raw data must be exposed so users can access it, but do we also want to expose getters, so they can use message.content instead of message.data.content? That would make the entire change semver: minor, but at a maintenance cost. On the other hand, forcing use of the new data field is similar to the cache change from feat: remove datastores and implement Managers #3696: despite bringing consistency, it's a massive breaking change (semver: major).
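For the semver: minor route, one way to reduce the maintenance cost of the getters could be to generate them from a mapping of property names to raw keys. The defineRawGetters helper and the mapping below are hypothetical; fields that need conversion (timestamps, for example) would still need hand-written getters.

```
// Hypothetical helper: defines read-only accessors on the prototype that forward to `this.data`.
function defineRawGetters(Target, mapping) {
  for (const [property, rawKey] of Object.entries(mapping)) {
    Object.defineProperty(Target.prototype, property, {
      get() {
        return this.data[rawKey];
      },
      configurable: true,
      enumerable: false,
    });
  }
}

class Message {
  constructor(client, data) {
    this.client = client;
    this.data = { ...data };
  }
}

// Property name on the structure -> key in Discord's raw payload
defineRawGetters(Message, { content: "content", channelId: "channel_id", tts: "tts" });

const message = new Message({}, { content: "hello", channel_id: "2", tts: false });
console.log(message.content); // hello
console.log(message.channelId); // 2

message.data = { ...message.data, content: "edited" };
console.log(message.content); // edited: getters always read the current raw data
```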
What about managers, e.g. guild.channels? I have three possible solutions:
1. Store the raw data (guild.data.channels) and construct the manager's cache on demand, which reduces memory usage when the cache isn't used. However, we use managers often internally via DataManager#resolve and DataManager#resolveId, which would create maps repeatedly. Imagine the performance impact on Client#emojis (Performance issue with client.emojis getter #6248), or on the much larger Client#users.
2. Store the raw data as an array-backed binary search tree.
  Doing so has the same positive memory impact as the prior solution (~4x less memory usage) at the cost of CPU time: creating the BST requires a full sort (O(n log n)), compared to the single iteration (O(n)) required to populate a Map. Furthermore, get and has operations cost O(log n), compared to Map's O(1). On the other hand, the lack of hashing removes the CPU overhead of hashing keys during lookups, and the array allows index access (from start to end and vice versa, as well as random access), so I'd argue that the reading cost is offset by those features.
  Unfortunately, when we want to write to the BST, performance drops through the floor and down to hell as it requires a search (O(log n)) followed by an Array#splice (O(n)) call. This makes array-backed BSTs perform nearly the same as ordered maps when reading, but significantly slower when writing. Sadly, we require writes to be fast, especially for Guild#members and Guild#presences as those stores change their data often, so this solution is infeasible without addressing those drawbacks.
3. Don't store the raw arrays, and keep constructing maps. toJSON() can still include the managers' data via {...super.toJSON(), channels: this.channels.cache.toJSON(), /* ... */ }.
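To make the trade-offs of solution 2 concrete, here is a minimal sketch of an array kept sorted by snowflake ID, with binary search for get/has and Array#splice for inserts. The SortedStore class is hypothetical. IDs are compared by length first and then lexicographically, which orders decimal snowflake strings numerically (they have no leading zeros) without hashing or BigInt conversion.

```
// Hypothetical sketch: compares two decimal snowflake strings numerically.
function compareIds(a, b) {
  if (a.length !== b.length) return a.length - b.length;
  return a < b ? -1 : a > b ? 1 : 0;
}

class SortedStore {
  constructor(entries = []) {
    // Construction needs a full sort: O(n log n), versus O(n) to fill a Map
    this.items = [...entries].sort((x, y) => compareIds(x.id, y.id));
  }

  // Binary search: index of `id` if present, otherwise -(insertionPoint + 1)
  indexOf(id) {
    let low = 0;
    let high = this.items.length - 1;
    while (low <= high) {
      const mid = (low + high) >>> 1;
      const cmp = compareIds(this.items[mid].id, id);
      if (cmp === 0) return mid;
      if (cmp < 0) low = mid + 1;
      else high = mid - 1;
    }
    return -(low + 1);
  }

  get(id) {
    const index = this.indexOf(id); // O(log n)
    return index >= 0 ? this.items[index] : undefined;
  }

  has(id) {
    return this.indexOf(id) >= 0;
  }

  set(entry) {
    const index = this.indexOf(entry.id);
    if (index >= 0) {
      this.items[index] = entry; // update in place: O(log n)
    } else {
      this.items.splice(-(index + 1), 0, entry); // O(log n) search + O(n) shift
    }
    return this;
  }

  // Index access from either end, which a Map does not offer directly
  at(index) {
    return this.items[index < 0 ? this.items.length + index : index];
  }
}

const store = new SortedStore([{ id: "300" }, { id: "100" }, { id: "20" }]);
store.set({ id: "150" });
console.log(store.items.map((entry) => entry.id).join(",")); // 20,100,150,300
console.log(store.at(-1).id); // 300
```

The splice in set() is exactly the O(n) write cost described above, which is why this layout struggles with frequently updated stores such as Guild#members.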

kyranet · 2021-12-22T10:14:34Z

A few more things that also need to be discussed:

Should we do raw-data transformations, besides excluding arrays (since they'll most likely be stored... in stores, 2.iii)? Fields such as GuildMember#joined_at would massively benefit from data transformations, namely converting ISO 8601 strings into UNIX timestamps: a 24-character string (256 bits, or 32 bytes, including the string's length field) versus a double (64 bits, or 8 bytes).
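A transformation like this could run once, before the payload is stored in Structure#data. The transformMember function below is a hypothetical sketch: it replaces joined_at with the milliseconds since the UNIX epoch returned by Date.parse, and leaves partial payloads without joined_at untouched.

```
// Hypothetical transformation applied before storing a raw guild member payload.
function transformMember(raw) {
  if (typeof raw.joined_at !== "string") return { ...raw };
  // Date.parse returns milliseconds since the UNIX epoch, stored as a single double
  return { ...raw, joined_at: Date.parse(raw.joined_at) };
}

const stored = transformMember({ joined_at: "2021-08-30T13:07:39.000Z", nick: null });
console.log(typeof stored.joined_at); // number
console.log(new Date(stored.joined_at).toISOString()); // 2021-08-30T13:07:39.000Z
```

If toJSON() must stay API-compatible, new Date(ms).toISOString() recovers the same instant, though not necessarily the exact string Discord originally sent.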
Following on from raw-data transformations, should we provide a mechanism to exclude properties we don't want to store in Structure#data? For example, some of us might be interested in a member's roles, but not in their banner or accent color.
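One possible shape for such a mechanism is a client option listing the raw keys to drop per structure. The excludedRawFields option and the User class below are hypothetical, not existing discord.js options; banner and accent_color are the raw user keys matching the example above.

```
// Hypothetical: structures drop raw keys listed in client.options.excludedRawFields[ClassName].
class Structure {
  constructor(client, data) {
    this.client = client;
    this.data = {};
    this._patch(data);
  }

  _patch(data) {
    const excluded = this.client.options.excludedRawFields?.[this.constructor.name] ?? [];
    const kept = Object.fromEntries(Object.entries(data).filter(([key]) => !excluded.includes(key)));
    this.data = { ...this.data, ...kept };
  }
}

class User extends Structure {}

const client = { options: { excludedRawFields: { User: ["banner", "accent_color"] } } };
const user = new User(client, { id: "1", username: "example", banner: "a_1234", accent_color: 16711680 });

console.log(Object.keys(user.data).join(", ")); // id, username
```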
Should we have multiple Channel classes?
1. A single Channel class would give all channels the same shape, although the structure of Channel#data would vary. It'd also allow us to implement Channel#guild and many other things that would otherwise not be possible. We can still use a typing mechanism similar to the one from types: make channel types a lot stricter #7120 to keep it strict enough. One large advantage of this approach is that channel partials become a lot easier, since we don't depend on knowing the channel's type; it also makes constructing channels much simpler and more performant, while seamlessly allowing unknown channel types to be received in the future.
2. An abstract Channel class, inherited by DMChannel and GuildChannel, is an alternative approach that keeps things strict and separates what can be done in a DM from what can be done in a GuildChannel; furthermore, it allows getters and other things to be simplified. The largest advantages of this approach are that we make a better distinction between the two kinds of channels, while also allowing channel type switching (TextChannel -> NewsChannel) without creating a new instance.
3. Keep the current behavior, alongside its complicated and ever-growing instantiation switches.
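Solution 1 could look roughly like the following. The Channel class is a hypothetical sketch: ChannelType lists a subset of Discord's numeric channel types, the set of text-based types is deliberately simplified, and client is a stand-in object with a guilds.cache Map. It shows that partials and unknown channel types need no special handling, because nothing branches on the type at construction time.

```
// Subset of Discord's numeric channel types, for illustration only.
const ChannelType = { GuildText: 0, DM: 1, GuildVoice: 2, GroupDM: 3, GuildCategory: 4, GuildNews: 5 };
// Simplified assumption of which types are text-based
const TextBasedTypes = new Set([ChannelType.GuildText, ChannelType.DM, ChannelType.GroupDM, ChannelType.GuildNews]);

class Channel {
  constructor(client, data) {
    this.client = client;
    this.data = { ...data };
  }

  get id() {
    return this.data.id;
  }

  get type() {
    return this.data.type;
  }

  // Available on every channel; DMs and partials without guild_id return null
  get guild() {
    const guildId = this.data.guild_id;
    return guildId ? this.client.guilds.cache.get(guildId) ?? null : null;
  }

  isTextBased() {
    return TextBasedTypes.has(this.data.type);
  }
}

const client = { guilds: { cache: new Map([["10", { id: "10", name: "Example" }]]) } };

const text = new Channel(client, { id: "1", type: ChannelType.GuildText, guild_id: "10" });
const dm = new Channel(client, { id: "2", type: ChannelType.DM });
const unknown = new Channel(client, { id: "3", type: 99, guild_id: "10" }); // a type from the future
const partial = new Channel(client, { id: "4" }); // type not known yet

console.log(text.guild.name); // Example
console.log(dm.guild); // null
console.log(unknown.isTextBased()); // false
console.log(partial.type); // undefined
```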

kyranet added caching feature request performance labels Aug 30, 2021

kyranet added this to the Version 14 milestone Aug 30, 2021

kyranet mentioned this issue Aug 30, 2021: Roadmap: cache and sweeping improvements #6539 (Open)

ImRodry mentioned this issue Nov 8, 2021: Remove *Data types from Discord.js types #6958 (Open)

kyranet mentioned this issue Dec 22, 2021: refactor: switch to /builders Embed #7067 (Merged)

iCrawl added the packages:discord.js label Jan 7, 2022

kyranet mentioned this issue Jan 22, 2022: Roadmap: unified Channel class #7321 (Open)

KhafraDev mentioned this issue Feb 16, 2022: feat: add missing v13 component methods #7466 (Merged)

kyranet modified the milestones: discord.js v14, TypeScript rewrite Feb 16, 2022

kyranet mentioned this issue Feb 17, 2022: Roadmap: make the library more resilient to the lack of cache #7487 (Open)
