PostgreSQL state store v2 #3250

Merged: ItalyPaleAle merged 23 commits into dapr:main on Dec 18, 2023

@ItalyPaleAle (Contributor) commented Nov 28, 2023

Fixes #2956

This PR introduces v2 of the PostgreSQL state store component, which is meant to live side by side with v1. The implementation is very different, and the data it stores is not compatible with v1. However, users need to explicitly opt into v2, so existing apps will not break.

There's currently no way to upgrade from a "v1" to a "v2" state store. If that is needed, we could look into it in the future.

The main changes compared to v1 are:

  • There is only one implementation, used by all PostgreSQL-compatible databases, including CockroachDB.
  • Data is now stored as binary rather than in a JSON column.
    • This should improve performance quite a bit when the stored data is binary, such as all data that comes in through the gRPC API (previously, it was base64-encoded every time). It also improves performance in general, because an opaque column type is used instead of JSONB (data in JSONB columns is always validated as correct JSON).
    • The biggest consequence is that v2 of this component does not support the state store query APIs. This is by design, as those APIs are being deprecated.
  • All implementations now use a UUID as the etag, rather than relying on the xmin column (which was always a bit "hacky" and only worked on PostgreSQL) or on a deterministic integer.

The new component has conformance and certification tests enabled.

PS: In both "v2" and "v1", some metadata properties have aliases:
  • cleanupInterval = cleanupIntervalInSeconds (it also accepts a Go duration)
  • timeout = timeoutInSeconds (it also accepts a Go duration)
  • connectionString = url

@ItalyPaleAle (Contributor Author):

/ok-to-test

@ItalyPaleAle (Contributor Author):

/ok-to-test

@dapr-bot (Collaborator):

Complete Build Matrix

The build status is currently not updated here. Please visit the action run below directly.

🔗 Link to Action run

Commit ref: 9586773

@dapr-bot (Collaborator) commented Nov 29, 2023:

Components certification test

🔗 Link to Action run

Commit ref: 9586773

✅ All certification tests passed

All tests have reported a successful status

@dapr-bot (Collaborator) commented Nov 29, 2023:

Components conformance test

🔗 Link to Action run

Commit ref: 9586773

✅ All conformance tests passed

All tests have reported a successful status

```go
// This is the only way to also ensure we are not running multiple "CREATE TABLE IF NOT EXISTS" at the exact same time
// See: https://www.postgresql.org/message-id/CA+TgmoZAdYVtwBfp1FL2sMZbiHCWT4UPrzRLNnX1Nb30Ku3-gg@mail.gmail.com
const lockID = 42
// Ensure the metadata table exists
```

@ItalyPaleAle (Contributor Author):

Reviewers: I had to change how we lock before migrations because advisory locks aren't available in CockroachDB. This is functionally equivalent (and the certification tests validate it), but because we call EnsureMetadataTable regardless of the lock, that operation may fail and may need to be retried automatically (see the implementation of EnsureMetadataTable).

@berndverst (Member):

Fix the conflict please :)

Also, just to confirm: v2 components don't have the query API, right?

@ItalyPaleAle (Contributor Author):

> Fix the conflict please :)

Done!

> Also just to confirm, v2 components don't have query API right?

Correct, they do not.

@olitomlinson:

@ItalyPaleAle

> The data is now stored as binary and not in a JSON column.

Does this mean there is no convenient way to inspect the value stored against a key using something like pgAdmin?

I get it, and I get the reasoning, but this is a -1 for convenience.

My developers have often relied on inspecting the state directly when getting to grips with Dapr, and they still dive into the data layer at times when debugging. It sounds like this would not be possible with v2?

@ItalyPaleAle (Contributor Author):

> Does this mean there is no convenient way to inspect the value of what has been stored against a key using something like PgAdmin?

You still can, and it shouldn't be significantly less convenient. In some cases it may even be more convenient.

First, keep in mind that Dapr is a bit of a mess in the way it passes data to the state stores:

  • HTTP APIs store data by passing objects that will be encoded as JSON
  • gRPC APIs pass []byte to the state store

In v1 of the component, data from gRPC was interpreted as "binary" and stored base64-encoded in a JSONB column. So if you sent "hi!" using gRPC, the database would contain "aGkh" (exactly like that, with quotes around the base64-encoded value).

In v2, everything is stored as-is in a BYTEA column. If the data is a []byte, it is stored as-is. If it's an object, it's serialized as JSON, which produces a []byte that is stored as-is.
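The difference between the two encodings can be reproduced with Go's encoding/json, which serializes a []byte as a base64 string. This is a hedged approximation: encodeV1 and encodeV2 are hypothetical helpers that reproduce the observable results described above ("aGkh" with quotes in v1, the raw bytes in v2), not the component's actual code.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// encodeV1 approximates what ended up in the v1 JSONB column: the value is
// JSON-encoded, and encoding/json turns a []byte into a quoted base64 string.
func encodeV1(v any) ([]byte, error) {
	return json.Marshal(v)
}

// encodeV2 approximates v2's BYTEA behavior: a []byte is kept as-is, and
// anything else is serialized to JSON and those bytes are stored.
func encodeV2(v any) ([]byte, error) {
	if b, ok := v.([]byte); ok {
		return b, nil
	}
	return json.Marshal(v)
}

func main() {
	grpcValue := []byte("hi!")                // what the gRPC API hands to the store
	httpValue := map[string]int{"counter": 1} // what the HTTP API hands to the store

	v1, _ := encodeV1(grpcValue)
	v2, _ := encodeV2(grpcValue)
	fmt.Printf("v1: %s\nv2: %s\n", v1, v2) // prints v1: "aGkh", then v2: hi!

	obj, _ := encodeV2(httpValue)
	fmt.Printf("v2 object: %s\n", obj) // prints v2 object: {"counter":1}
}
```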

pgAdmin shows the column as binary by default, but if you know the value is UTF-8-encoded text (as is the case for JSON data), you can run:

```sql
SELECT convert_from(value, 'utf-8') FROM state;
```

@DeepanshuA (Contributor) left a review:

It mostly LGTM. I only have a few small comments.

```
@@ -37,33 +37,58 @@ type Migrations struct {

// Perform the required migrations
func (m Migrations) Perform(ctx context.Context, migrationFns []commonsql.MigrationFn) error {
```

@DeepanshuA (Contributor):

So, is this migration meant to migrate data from some existing database to PostgreSQL? If it does that kind of migration, should we use something like https://github.com/golang-migrate/migrate instead of devising our own locking strategy?

@ItalyPaleAle (Contributor Author):

No, this is for schema migrations. They are uncommon and normally happen only on major Dapr updates. We were already using this mechanism; I just had to make some changes to support non-Postgres databases.

```go
}

// Init sets up Postgres connection and performs migrations
func (p *PostgreSQL) Init(ctx context.Context, meta state.Metadata) error {
```

@DeepanshuA (Contributor):

Can we include somewhere here (apart from the docs) a summary of how v2 differs from v1 and the benefits of using v2, so that a developer can see the reason for v2 upfront?

@ItalyPaleAle (Contributor Author):

Done, in the comment on the NewPostgreSQLStateStore method.

Resolved review thread on state/postgresql/v2/postgresql.go
ItalyPaleAle and others added 3 commits December 18, 2023 16:47
ItalyPaleAle merged commit 73997a2 into dapr:main Dec 18, 2023 (88 checks passed)
ItalyPaleAle added this to the v1.13 milestone Dec 22, 2023
ItalyPaleAle deleted the postgres-v2 branch January 31, 2024 22:47

Successfully merging this pull request may close these issues.

PostgreSQL state store v2
5 participants