New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use separate PostgreSQL schemas for sprocs #4412
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This makes sense to me - left a few comments, but I don't think they're blocking. I can't see any way this could have a negative impact (famous last words, right)?
I don't see anything that specifically instructs Postgres to create sprops in the random schema - will Postgres automatically create them in the first element of search_path
?
Yep. This is also why it's important to run migrations (which might create tables) before setting the random schema. See https://www.postgresql.org/docs/current/ddl-schemas.html#DDL-SCHEMAS-PATH where it says:
I also checked that we really are creating the sprocs in the random schema, as expected. |
An alternative which I considered is to explicitly set the schema when you create objects. For example, we could do
for all our sprocs to make sure they are created in exactly the schema we want. However, this is pretty invasive and I believe that using the implicit "create in first search_path entry" behavior is safe. |
Yep, as long as that behavior is documented, I'm good with relying on it! Thanks for the link to the docs. |
@nwalters512 Another question is whether we should do any automatic cleanup of either:
My current thoughts are to defer (1) until after we've deployed this PR and are happy with things. We could then add a migration to clean up the public sprocs. For (2) we can either ignore it for now, relying on the fact that having extra schemas lying around doesn't really hurt anything, or we could add code to automatically delete any old schemas for our current server name. That is, the schemas are named like Thoughts? |
Yeah, I'm happy to ignore the ones in the public schema almost indefinitely (though we can of course clean them up in the future). I believe (2) gets trickier once we throw auto scaling and general transient servers into the mix. We're not ever guaranteed to even restart a server on an instance once the process is killed or dies. I'm inclined to instead use the date somehow, e.g. deleting all schemas with creation dates over 30 days ago. That would of course cause issues if there happens to be a server that's alive for 30 days. I'll have to think more about this! |
Sounds good.
That's a good point that autoscaling would mean instances never come back. However, for autoscaling we will soon need to add a table like server_loads (which is for the grader hosts) but for |
@nwalters512 I updated a bunch of things and I believe it's ready to go. Mind taking another quick look? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Awesome! Fingers crossed that this makes deploys even more robust 🤞
Co-authored-by: Nathan Walters <nathan@prairielearn.com>
Co-authored-by: Nathan Walters <nathan@prairielearn.com>
Commit 4a966d1 above changes all sprocs from the old pattern:
to the new pattern:
The reason for this is that the old pattern causes a bug when deploying in an existing multi-server environment. The problem is:
The same problem happens for sprocs without a To fix this, we could either:
I went with (2) even though it's a much more invasive code change because it makes the resulting code much simpler: we have a simpler startup procedure and also individual sproc functions are simpler and cleaner. |
In the process of doing 4a966d1 I also discovered that we were using an explicit Line 2 in 27e279f
Line 2 in 27e279f
I removed both of these schema references so that we will default to the current random schema. I personally don't like these sprocs because they are using names which are much too generic ( |
This PR changes
sql-db
so that it can use a local DB schema for all queries. We use this to store a separate set of sprocs (stored procedures) for each invocation of the DB. The startup sequence is:public
schema.public
).This is intended to avoid problems where one server startup modifies sprocs that are currently in use by another server. Although Postgres has transactional DDL, this can still cause problems with deadlocking via explicit locks. I think this is responsible for #3824
Note that this PR requires #4411 to work, because the changes are in PrairieLib and so we need to be using it directly to pick up the new code.