Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Roadmap: raw data storage and state changes #6567

Open
kyranet opened this issue Aug 30, 2021 · 1 comment
Open

Roadmap: raw data storage and state changes #6567

kyranet opened this issue Aug 30, 2021 · 1 comment

Comments

@kyranet
Copy link
Member

kyranet commented Aug 30, 2021

In many places, we have methods and properties that either return raw or filled API data. Particularly, the union type Message | APIMessage, notably used in BaseCommandInteraction. What if we apply this raw data return concept to Structure#toJSON() so it can be used to clone structures? We could then use new Message(client, message.toJSON()), for example; however, we want to keep the current type's versatility in consolidating data Discord might not give us.

Introducing: Structure#data (or Structure#raw), a property containing the raw data from Discord, creating numerous advantages:

  1. Consolidates raw and cached data into one central cached structure, which means no more Message | APIMessage unions - yay!
  2. Simplifies Structure#toJSON() and improves its performance significantly because the data is raw already:
    toJSON() {
      return { ...this.data };
    }
    We could use discord-api-types to strict-type the return of those toJSON() methods.
  3. Makes cloning structures easier:
    clone() {
      return new this.constructor(this.client, this.toJSON());
    }
    Expanding fix(Message): make #channel and #guild getters #6271 to more structures so they're unbound from state and accept 2 parameters (client and data) may be a valuable change.
  4. Eliminates the need for new releases to support Discord emitting new data (although types may conflict).
  5. Simplifies patching:
    _patch(data) {
      const { roles, channels, ...raw } = data;
    
      this.data = { ...this.data, ...raw };
    
      if (roles) {
        // Handle roles
      }
    
      if (channels) {
        // Handle channels
      }
    
      // ...
    }
  6. Avoids mutating the raw data, which means new Message(client, Object.freeze(data)) won't throw.

Overall, this is a huge improvement with few downsides - alas, not zero. Now for the hard parts.

  1. The raw data must be exposed so users can access it, but do we want to expose getters so they can use message.content instead of message.data.content? That'd make the entire change semver: minor, but at a maintenance cost. On the other hand, forcing the usage of the new data field is similar to the cache change from feat: remove datastores and implement Managers #3696, and despite it bringing consistency, it's a massive breaking change (semver: major).
  2. What about managers, e.g. guild.channels? I have three solutions:
    1. Store the raw data (guild.data.channels) and construct the manager's cache on demand, reducing memory use - when the cache isn't used. We use them often internally via DataManager#resolve{,Id}, which would create maps repeatedly. Imagine the performance impact on Client#emojis (Performance issue with client.emojis getter #6248), or the much larger Client#users.
    2. Store the raw data as an array-backed binary search tree.
      Doing so has the positive memory impact of the prior solution (~4x less memory usage) at the cost of CPU time: creating the BST requires a full sort (O(n * log n)), compared to the single iteration (O(n)) required for a Map to be populated. Furthermore, get and has operations have an O(log n) computational cost compared to Map's O(1). The lack of hashing removes CPU overhead required for the search, and allows index accesses (from start to end and vice versa, as well as random access), so I'd argue that the cost for reading is offset by those features.
      Unfortunately, when we want to write to the BST, performance drops through the floor and down to hell as it requires a search (O(log n)) followed by an Array#splice (O(n)) call. This makes array-backed BSTs perform nearly the same as ordered maps when reading, but significantly slower when writing. Sadly, we require writes to be fast, especially for Guild#members and Guild#presences as those stores change their data often, so this solution is infeasible without addressing those drawbacks.
    3. We don't store the raw arrays, and continue constructing maps. We can still use {...super.toJSON(), channels: this.channels.cache.toJSON(), /* ... */ }.
@kyranet
Copy link
Member Author

kyranet commented Dec 22, 2021

A few more things that also need to be discussed:

  1. Should we do raw-data transformations? Besides excluding any array since they'll most likely be stored... in stores (2.iii). Things such as GuildMember#joined_at would massively benefit from data transformations, namely converting the ISO 8601 strings into UNIX timestamps, since it's a string of 24 characters (alongside the string length field: 256 bits, or 32 bytes) versus a double number (64 bits, or 8 bytes).
  2. Following on raw-data transformations, should we allow a mechanism to exclude properties we don't want to store in Structure#data? E.g. some of us might be interested in knowing a member's roles, but not in their banner or accent color.
  3. Should we have multiple Channel classes?
    1. A single Channel class would allow all channels to have the same shape, although the structure of Channel#data would vary. It'd also allow us to do Channel#guild and many other things that would otherwise not be possible. We can still use a typing mechanism similar to the one from types: make channel types a lot stricter #7120 to make sure it's strict enough. One large advantage of this approach is that channel partials are a lot easier since we don't depend on knowing the channel's type, and also makes constructing channels a lot simpler and performant, while also seamlessly allowing unknown channels to be received in the future.
    2. An abstract Channel class which DMChannel and GuildChannel inherit is an alternative approach to make things strict and separate what can be done in a DM and what can be done in a GuildChannel, furthermore, it allows getters and other things to be simplified. The largest advantages of this approach is that we make a better distinction between two kinds of channels, while we also allow for channel type switching (TextChannel -> NewsChannel) without creating a new instance.
    3. Keep the current behavior, alongside its complicated and ever-growing instantiation switches.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Status: Todo
Development

No branches or pull requests

2 participants