Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DB Index and Schema #183

Open
merowin opened this issue Feb 4, 2024 · 8 comments
Open

DB Index and Schema #183

merowin opened this issue Feb 4, 2024 · 8 comments

Comments

@merowin
Copy link
Collaborator

merowin commented Feb 4, 2024

After shortly talking about this topic here: #171 , let's discuss this a bit more.

When our database (almost inevitably) gets to a large size, queries with filters will get slow and strain our resources. So I think we should make plans for adding some indexes early rather than late. Here's relevant documentation on how to do so: https://www.mongodb.com/docs/drivers/node/current/fundamentals/indexes/

It seems that creating an index is as simple as

const result = await db.collection("games").createIndex({ variant: 1 });

But how should we handle db migrations in the first place? Manually migrating the db via mongodb atlas seems to be possible. But I think a workflow where developers can add migrations in the source code would be preferable, though of course we need to be careful with db migrations in general.

It might be worth adding a db schema before that, even if only to document how we want the data to be structured (roughly). I believe here is relevant information: https://www.mongodb.com/basics/json-schema-examples

@merowin
Copy link
Collaborator Author

merowin commented Feb 4, 2024

An alternative approach could be to try and keep the database small, e.g. by adding a service that automatically deletes games that are inactive for a long time (timeout service 2.0 for games? 😝 ). I'm guessing this would be even more resource friendly, but make a lot of future features impossible , e.g. statistics on games etc.
Personally I don't care a lot about such statistics, but also some users like viewing their entire game history, and might feel put-off by their games being deleted.

We can even think about combining both approaches.

@merowin
Copy link
Collaborator Author

merowin commented Feb 4, 2024

As a follow-up thought about the "delete games"-approach, we could mitigate the negative implications by adding a "download game"-feature. Basically, we'd put the burden of storing past games they want to keep to the user, which is fine imo. However a feature a la "upload and view stored game" would require us to add strict validation, I think.

I realize this is an issue for way in the future though 😅

@benjaminpjones
Copy link
Collaborator

Manually migrating the db via mongodb atlas seems to be possible. But I think a workflow where developers can add migrations in the source code would be preferable

100% agree, I would prefer migrations to be in source control. We get code review, and it's easier to perform across multiple db instances.

It might be worth adding a db schema before that

I don't think one is dependent on the other, but I am supportive of a schema now that the db structures are relatively stable.

future features impossible , e.g. statistics on games etc.

There might be ways to collect aggregated stats without keeping the entire archive around. As an analogy, when computing a running average, you only need to track the sum and count, but don't need to keep the data around.

@merowin
Copy link
Collaborator Author

merowin commented Feb 4, 2024

I just realised that I don't even know how the collections for users and games were initialised. I didn't find anything in the code, were they added manually?

@benjaminpjones
Copy link
Collaborator

benjaminpjones commented Feb 4, 2024

Probably implicitly created by gamesCollection().insertOne(game) and usersCollection().insertOne(user) - mongo is very flexy, and collections are created when you start adding documents to them.

There does exist a createCollection function, but the documentation says:

Because MongoDB creates a collection implicitly when the collection is first referenced in a command, this method is used primarily for creating new collections that use specific options.


So far, I don't think I've touched the database directly. Everything in our yarn commands is sufficient to get a deployment up, assuming the mongo server is already running.

@merowin
Copy link
Collaborator Author

merowin commented Feb 4, 2024

If we just call
const result = await db.collection("games").createIndex({ variant: 1 });
... each time on startup, I wonder if this would lead to multiple equivalent (superfluous) indexes being created. We need to make sure this doesn't happen, and I don't know if mongodb checks that itself.

// Edit: I believe I found the answer: https://stackoverflow.com/questions/35019313/checking-if-an-index-exists-in-mongodb

I have a little bit of experience with migrations of a Postgres DB with .Net Entity Framework. There a specific table is used to hold information on which migrations were applied. We could do something similar, to ensure that migrations are applied exactly once (and in the correct order).

@benjaminpjones
Copy link
Collaborator

benjaminpjones commented Feb 4, 2024

I think the SO link you posted says the same, but here's a quote from the docs.

When you run a create index command in the MongoDB Shell or a driver, MongoDB only creates the index if an index of the same specification does not exist.

So I think createIndex is okay to do on startup.


There a specific table is used to hold information on which migrations were applied. We could do something similar, to ensure that migrations are applied exactly once (and in the correct order).

Interesting! I would be interested to see what that looks like!

By the way, what types of migrations are you thinking of? The one example I can think of is updating the players array

In this case, the migration strategy was just to:

  1. update the code to work with both the old and new "schema"
  2. [TODO] write migration script add the missing field to all older games
    • side note: this type of migration I'd prefer to decouple from start up.
  3. [TODO] delete the code relevant to the old schema

Our migration script would need to check that the field doesn't already exist anyway. In that case, running the script more than once probably wouldn't cause issues.

@merowin
Copy link
Collaborator Author

merowin commented Feb 4, 2024

Interesting! I would be interested to see what that looks like!

It is mentioned e.g. here: https://learn.microsoft.com/en-us/ef/core/managing-schemas/migrations/history-table
These docs also include how to apply migrations on runtime, where I believe this table is important.

By the way, what types of migrations are you thinking of? The one example I can think of is updating the players array

I'm thinking of everything that changes the structure of the data base as a migration. But maybe this kind of thinking is uncalled for with mongo db, and we don't need anything special for the majority of such changes.

Yes the players array is a good example.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants