
Use separate PostgreSQL schemas for sprocs #4412

Merged
mwest1066 merged 30 commits into master from sproc-schemas on Jul 7, 2021
Conversation

mwest1066 (Member)

This PR changes sql-db so that it can use a local DB schema for all queries. We use this to store a separate set of sprocs (stored procedures) for each invocation of the DB. The startup sequence (sketched in SQL below) is:

  1. Start with only the default public schema.
  2. Run migrations, which will modify the global public tables.
  3. Set the default schema to a random string (with fallback to public).
  4. Run sproc init, which will create all the sprocs from scratch in the random schema.
  5. For all future SQL calls, use the random schema so that we get our local copy of the sprocs.
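
In rough SQL terms, steps 3-5 look like this (the schema name here is made up; the real names are generated per invocation):

```sql
-- Step 3: create a per-invocation schema and make it the default,
-- keeping public as a fallback so the global tables stay visible.
CREATE SCHEMA sprocs_1234abcd;
SET search_path TO sprocs_1234abcd, public;

-- Step 4: an unqualified CREATE lands in the first search_path entry,
-- so sproc init populates the new schema from scratch.
CREATE FUNCTION my_sproc() RETURNS void AS $$ BEGIN END $$ LANGUAGE plpgsql;

-- Step 5: unqualified calls now resolve to the local copy first.
SELECT my_sproc();
```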

This is intended to avoid problems where one server startup modifies sprocs that are currently in use by another server. Although Postgres has transactional DDL, this can still cause deadlocks via explicit locks; I think this is responsible for #3824.

Note that this PR requires #4411 to work, because the changes are in PrairieLib and so we need to be using it directly to pick up the new code.

@nwalters512 (Contributor) left a comment

This makes sense to me; I left a few comments, but I don't think they're blocking. I can't see any way this could have a negative impact (famous last words, right?).

I don't see anything that specifically instructs Postgres to create sprocs in the random schema. Will Postgres automatically create them in the first element of search_path?

Review comments on: prairielib/lib/sql-db.js, server.js
@mwest1066 (Member, Author)

> I don't see anything that specifically instructs Postgres to create sprocs in the random schema. Will Postgres automatically create them in the first element of search_path?

Yep. This is also why it's important to run migrations (which might create tables) before setting the random schema.

See https://www.postgresql.org/docs/current/ddl-schemas.html#DDL-SCHEMAS-PATH where it says:

> The first schema in the search path that exists is the default location for creating new objects.

I also checked that we really are creating the sprocs in the random schema, as expected.
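
For anyone who wants to verify this locally, a quick check along these lines (names made up) shows unqualified CREATEs landing in the first search_path entry:

```sql
CREATE SCHEMA demo_random;
SET search_path TO demo_random, public;

-- No schema qualifier, so this lands in demo_random.
CREATE FUNCTION demo_func() RETURNS integer AS 'SELECT 1' LANGUAGE sql;

-- Confirm where it was created.
SELECT n.nspname
FROM pg_proc p
JOIN pg_namespace n ON n.oid = p.pronamespace
WHERE p.proname = 'demo_func';
-- => demo_random
```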

@mwest1066 (Member, Author)

An alternative I considered is to explicitly schema-qualify every object we create. For example, we could write

```sql
CREATE FUNCTION ${schemaName}.variants_insert
```

for all our sprocs to make sure they are created in exactly the schema we want. However, this is pretty invasive, and I believe relying on the implicit "create in the first search_path entry" behavior is safe.

@nwalters512 (Contributor)

Yep, as long as that behavior is documented, I'm good with relying on it! Thanks for the link to the docs.

@mwest1066 (Member, Author)

@nwalters512 Another question is whether we should do any automatic cleanup of either:

  1. The old sprocs in the public schema.
  2. The new schemas that we are creating every time we restart the server.

My current thought is to defer (1) until after we've deployed this PR and are happy with things; we could then add a migration to clean up the public sprocs.

For (2), we can either ignore it for now, relying on the fact that extra schemas lying around don't really hurt anything, or we can add code to automatically delete any old schemas for our current server name. The schemas are named like i_329c8aef93_2021-06-21-SDLFK, where the first part is the EC2 instance ID, so on startup we can delete any schemas that start with our own instance ID; a hypothetical sketch follows. We always run all servers with unique IDs, although it would be possible to override this, which would lead to problems if we auto-delete.
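
Something along these lines (the instance-ID prefix is taken from the example name above):

```sql
-- Hypothetical startup cleanup: drop leftover schemas created by
-- earlier invocations on this same instance.
DO $$
DECLARE
    s text;
BEGIN
    FOR s IN
        SELECT nspname
        FROM pg_namespace
        WHERE nspname LIKE 'i\_329c8aef93\_%'  -- our own instance-ID prefix
          AND nspname <> current_schema        -- keep the schema we just created
    LOOP
        EXECUTE format('DROP SCHEMA %I CASCADE', s);
    END LOOP;
END $$;
```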

Thoughts?

@nwalters512 (Contributor)

Yeah, I'm happy to ignore the ones in the public schema almost indefinitely (though we can of course clean them up in the future).

I believe (2) gets trickier once we throw autoscaling and generally transient servers into the mix: we're not guaranteed to ever restart a server on an instance once the process is killed or dies. I'm inclined to instead use the date somehow, e.g. deleting all schemas whose embedded creation date is more than 30 days old. That would of course cause issues if a server happens to stay alive for more than 30 days. I'll have to think more about this!
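
Since the creation date is embedded in the schema name, an age-based sweep could be as simple as this (a sketch, assuming the naming scheme above):

```sql
-- Find schemas whose embedded creation date is more than 30 days old,
-- e.g. i_329c8aef93_2021-06-21-SDLFK -> 2021-06-21.
SELECT nspname
FROM pg_namespace
WHERE nspname LIKE 'i\_%'
  AND substring(nspname FROM '\d{4}-\d{2}-\d{2}')::date
      < now() - interval '30 days';
```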

@mwest1066 (Member, Author)

> Yeah, I'm happy to ignore the ones in the public schema almost indefinitely (though we can of course clean them up in the future).

Sounds good.

> I believe (2) gets trickier once we throw autoscaling and generally transient servers into the mix: we're not guaranteed to ever restart a server on an instance once the process is killed or dies. I'm inclined to instead use the date somehow, e.g. deleting all schemas whose embedded creation date is more than 30 days old. That would of course cause issues if a server happens to stay alive for more than 30 days. I'll have to think more about this!

That's a good point that autoscaling means instances never come back. However, for autoscaling we will soon need a table like server_loads (which we have for the grader hosts) but for webservers, say webserver_loads, so we can track how many webservers there are and their load levels. Each server updates its row every 10 seconds while it's alive, so it's easy to tell when servers are dead. We already use this for grader hosts to kill them off; the same logic would let us clean up old schemas.
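
A hypothetical shape for that table (names made up, mirroring the existing server_loads):

```sql
-- Hypothetical heartbeat table for webservers, analogous to server_loads.
CREATE TABLE webserver_loads (
    instance_id text PRIMARY KEY,
    schema_name text NOT NULL,                      -- random schema this server created
    heartbeat_at timestamptz NOT NULL DEFAULT now() -- refreshed every 10 seconds
);

-- A reaper could then drop schemas whose owning server has stopped
-- heartbeating, e.g. rows where heartbeat_at < now() - interval '1 minute'.
```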

@mwest1066 (Member, Author)

@nwalters512 I updated a bunch of things and I believe it's ready to go. Mind taking another quick look?

@nwalters512 (Contributor) left a comment

Awesome! Fingers crossed that this makes deploys even more robust 🤞

Review comments on: docs/dev-guide.md, prairielib/lib/sql-db.js, sprocs/array_and_number.sql
mwest1066 and others added 4 commits on June 30, 2021 (co-authored by Nathan Walters <nathan@prairielearn.com>)
@mwest1066 (Member, Author)

Commit 4a966d1 above changes all sprocs from the old pattern:

```sql
DROP FUNCTION IF EXISTS my_func()
CREATE OR REPLACE FUNCTION my_func()
```

to the new pattern:

```sql
CREATE FUNCTION my_func()
```

The reason for this is that the old pattern causes a bug when deploying into an existing multi-server environment. The problem is:

  1. Consider running with an existing prod server and a chunk server, both of which are using the pre-existing code with all sprocs in the public schema.
  2. Deploy this new random-schema code to prod.
  3. The prod server runs code like DROP FUNCTION IF EXISTS my_func(), which deletes the copy of my_func() in the public schema (sketched below).
  4. The prod server creates a new my_func() in its random schema.
  5. The chunk server is still using only the public schema, so when it tries to call my_func() it dies with a SQL error of "cache lookup failed for function".
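
Concretely, steps 3 and 4 play out like this (schema name made up):

```sql
-- The newly deployed prod server has switched its search_path:
SET search_path TO sprocs_1234abcd, public;

-- Name resolution follows search_path, so this finds and drops the only
-- existing my_func(): the copy in public that the chunk server still uses.
DROP FUNCTION IF EXISTS my_func();

-- The replacement is created in sprocs_1234abcd, invisible to servers
-- whose search_path is still just public.
CREATE OR REPLACE FUNCTION my_func() RETURNS void AS $$ BEGIN END $$ LANGUAGE plpgsql;
```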

The same problem happens for sprocs without a DROP FUNCTION, because they all use CREATE OR REPLACE FUNCTION, which is equivalent to a DROP FUNCTION followed by a CREATE FUNCTION. In the case where the existing function is in the public schema and we have search_path = <random_schema>,public, running CREATE OR REPLACE FUNCTION removes the public function and creates a new one in the random schema.

To fix this, we could either:

  1. On server startup, first use just the public schema to run migrations, then set search_path to only the random schema (with no fallback to public) to run all the DROP/CREATE code for sprocs. Finally, set search_path to <random_schema>,public for actual running.
  2. Change all the sprocs to just do plain CREATE with no DROP or OR REPLACE.

I went with (2): even though it's a much more invasive code change, it makes the resulting code much simpler, with a simpler startup procedure and cleaner individual sproc definitions.

@mwest1066 (Member, Author) commented on Jul 1, 2021

In the process of doing 4a966d1 I also discovered that we were using an explicit public schema here:

```sql
CREATE OR REPLACE FUNCTION public.first_agg ( anyelement, anyelement )
```

and here:

```sql
CREATE OR REPLACE FUNCTION public.last_agg ( anyelement, anyelement )
```

I removed both of these schema references so that we will default to the current random schema.
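
For reference, assuming the usual first/last aggregate pattern, the fixed definition now looks roughly like this (a sketch, not the exact file contents):

```sql
-- Unqualified CREATE, so the function lands in the current (random)
-- schema instead of being pinned to public.
CREATE FUNCTION first_agg(anyelement, anyelement) RETURNS anyelement
    LANGUAGE sql IMMUTABLE STRICT
    AS 'SELECT $1';

CREATE AGGREGATE first (anyelement) (
    SFUNC = first_agg,
    STYPE = anyelement
);
```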

I personally don't like these sprocs because their names (first and last) are much too generic and could easily cause confusion or conflicts with built-in functions. I'm leaving them as-is for now, however.

@mwest1066 mwest1066 merged commit d5c37b3 into master Jul 7, 2021
@mwest1066 mwest1066 deleted the sproc-schemas branch July 7, 2021 21:18
@github-actions github-actions bot locked and limited conversation to collaborators Jul 7, 2021