Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

KV: changing discard policy would fail on srv upgrade #917

Merged
merged 2 commits into from Mar 15, 2022

Conversation

kozlovic
Copy link
Member

This is related to #900 that introduced the change of discard
policy if connecting to a server v2.7.2+.

The CreateKeyValue() API calls AddStream(). If the stream already
exists and the configuration is identical, the call succeeds
(idempotent). However, since we now try to set the discard policy
to "new" when connecting to a server v2.7.2+, the call will fail
if the KV store already existed (CreateKeyValue() was called with
a v2.7.1 server and stream was created with DiscardOld).

The approach here is that if AddStream() fails with an "already in use"
error, then we will lookup the stream info, and if we detect that
the configuration is same (except for the discard policy), then
we call UpdateStream() with the new discard policy.

The problematic part is that the client side configuration does
not set some of the fields (or their value is 0), but then the
server sets either some defaults (like Duplicates set to 2min)
or replaces 0 to -1. So the info we get back and the config
we have need to be tweaked before being compared. This is really
hacky and prone to break if server were to change some defaults.

Signed-off-by: Ivan Kozlovic ivan@synadia.com

This is related to #900 that introduced the change of discard
policy if connecting to a server v2.7.2+.

The CreateKeyValue() API calls AddStream(). If the stream already
exists and the configuration is identical, the call succeeds
(idempotent). However, since we now try to set the discard policy
to "new" when connecting to a server v2.7.2+, the call will fail
if the KV store already existed (CreateKeyValue() was called with
a v2.7.1 server and stream was created with DiscardOld).

The approach here is that if AddStream() fails with an "already in use"
error, then we will lookup the stream info, and if we detect that
the configuration is same (except for the discard policy), then
we call UpdateStream() with the new discard policy.

The problematic part is that the client side configuration does
not set some of the fields (or their value is 0), but then the
server sets either some defaults (like Duplicates set to 2min)
or replaces 0 to -1. So the info we get back and the config
we have need to be tweaked before being compared. This is really
hacky and prone to break if server were to change some defaults.

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
@coveralls
Copy link

coveralls commented Feb 19, 2022

Coverage Status

Coverage increased (+0.3%) to 85.532% when pulling 5bdaa1f on kv_create_updgrade_issue into 0096b1b on main.

kv.go Outdated Show resolved Hide resolved
kv.go Show resolved Hide resolved
kv.go Outdated Show resolved Hide resolved
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
Copy link
Member

@derekcollison derekcollison left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

scottf
scottf previously requested changes Feb 20, 2022
Copy link
Contributor

@scottf scottf left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I completely understand the need to update the stream configuration to ensure it has the latest settings i.e. discard policy of new, but I object to changing the behavior of "Create" to actually be create or update. You are building in a side effect that may have unintended consequences, not to mention, for consistency and no surprises, it should be an error when you try to create something that already exists. We have individual create/update key API for the exact same reason, we have stream create and update api for that reason and we know that client developers have complained that consumer only has update.

I would instead prefer some management api that can check a store for compliance with a server version and update the stream. It also seems like this is more of a CLI function than a client function, as it should rarely be done or an automatic behavior of the client.

I also wonder if this is even necessary. Consider that KV is currently experimental. Someone starting to use KV now will use the latest client and server so this use case feels like an edge case

Copy link
Member

@ColinSullivan1 ColinSullivan1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great points @scottf. Technically I agree 100% However, from a broader product perspective I think the PR has merit and should be merged, even if it's not best long term w/r/t client library implementation.

It'd be better UX to do the update for the time being. Without the update users could suddenly have things "break" with a nebulous reason why (couldn't create stream), likely resulting in number new issues or users abandoning NATS KV.

This PR could be categorized as fixing a systemic design bug that could potentially result in data loss, justifying this approach.

In the future, we can update the client to remove the stream update and have mitigation instructions on a FAQ in case users skipped a number of versions.

@scottf
Copy link
Contributor

scottf commented Feb 24, 2022

@ColinSullivan1 This code also assumes that the users calls CreateKeyValue repeatedly (every time the code runs), which cannot be guaranteed, which means the fix may never be applied. Also, again, I think that is behavior that we should discourage. The stream [config] is not idempotent and creating a kv is part of management, meaning we've separated management from use. Also consider that we steer users to the CLI for management.

If anything, when a kv context is opened on a bucket (KeyValueManager.KeyValue(bucket string)), the stream configuration should be checked to see if it is up to date and if not, that's when it should be updated. This open path is guaranteed to be called.

@scottf
Copy link
Contributor

scottf commented Feb 25, 2022

And as I think about it, updating it on the open could have account/security problems. I still really think we should just add an update api, or at least a CreateOrUpdateApi instead of changing the Create behavior.

@tbeets
Copy link
Contributor

tbeets commented Feb 25, 2022

I agree with @scottf . I don't think we should auto-magically change the (non-identical) durable configuration at server from extent client application code. I think the needed stream updates (existing buckets) should be documented in the release notes (with a NATS CLI example of updating the policy on a bucket). This seems like an okay tax on KV experimental adopters (who have existing buckets they need to preserve).

@ripienaar
Copy link
Contributor

There is a “nats kv upgrade BUCKET” command that was used in a previous scenario that required stream updates. Could add this to it.

@derekcollison
Copy link
Member

Calling stream create with same config works by design to simplify program design. I don't think you are saying to change that.

With respect to the KV, it was my call to specifically upgrade an underlying stream created with an older client library that would discard old but valid keys/values. This was an oversight on my behalf but a pretty bad customer experience when you ran into it and would definitely violate path of least surprise.

I made the call to upgrade the underlying stream on the fly with just changing discard old to discard new if the server was the correct version. I did not expect folks to upgrade it themselves tbh. And they would most definitely call us out when we lost their data.

@scottf
Copy link
Contributor

scottf commented Feb 28, 2022

@derekcollison We all agree that there needs to be an easy way to apply this change. Putting the on the fly upgrade into the create call leaves out users who do not code create into the normal code path. We can guarantee that all code paths will open a KV context, but I'm not sure we can guarantee that the user will have authorization at that point. If we can guarantee auth, the get/ open kv context is the best place to do this upgrade.

If we are going to go through the effort of documenting that a user must call create each time, why don't we just provide an out of band way to fix the stream config and document it, since this is really just a one time fix for a problem that only existed for a short time. It would be easy enough to provide a piece of code in every language that does the upgrade. Or by adding an update api we can put this code under that. Also having an update api may provide useful against future changes.

@derekcollison
Copy link
Member

IIRC we have encouraged apps to just use stream create (and hence everything that utilizes that like K/V) to avoid excess boilerplate code of lookup, if it's not there, create etc. Hence the choice to place it there. Agree we can not control what everyone does but that is what we were encouraging I believe. I also prefer just doing a client lib update to get the fix vs relying on new code being put into the app layer or relying on the user to run an external command.

What is your alternative suggestion?

@scottf
Copy link
Contributor

scottf commented Feb 28, 2022

Alternatives.

  1. Build a one off piece of code in GO so it can be compiled multi-platform and run as an executable.
  2. Augment the CLI to be able to update a KV stream.

I prefer option 2 because this would set the table for future updates of this nature.

@scottf
Copy link
Contributor

scottf commented Feb 28, 2022

Regarding what we encourage, we are completely inconsistent. We have both stream create and stream update api in the JetStreamManager. We have put/create/update for keys. We have only create for consumers, which we as client developers have complained about. maybe this is a "we'll care about it in the next major version of the client"

@derekcollison
Copy link
Member

The consistency I was speaking of is that any app (or many at the same time) can call AddStream, or AddConsumer and that should work, one will create and the others are essentially no-ops as long as config is exactly the same.

What I am expressing here is that folks will most likely not run an external tool or even the cli, so I thought this was the best place to protect their data.

This is a one time and temporary fix, but if we lose their data that will be very bad. And they will not care of we had a tool and they did not use it, imo.

@scottf
Copy link
Contributor

scottf commented Mar 1, 2022

We cannot guarantee that the developer will call create, so that path is no better at ensuring that an upgrade will be executed and data will not be lost. All options require that we document and communicate the solution.

Having an upgrade path in the CLI is the best solution since we promote the CLI for management. Having a one off tool is the 2nd best solution.

This solution requires both adding unnecessary code to the client and having the user add code to their codebase or to write a one off of their own, so it is more work for a user. That does not feel like a better UX experience.

I built a small Java program. It connects to the server, looks for all streams with the KV_ prefix and updates the policy. It only requires a user set their connection and account information like they would normally do. It could be made to be fully command line, but it's purpose was to demonstrate how simple the one off is for us to write. If I was better at go, I would have already written the same.

@derekcollison
Copy link
Member

My point is we do suggest always calling create unless there is a separate management function that had created it. This allows apps to be independent of a management plane and nomadic.

If there is a management plane, I believe you can upgrade the stream underlying the KV with the current NATS cli.

@scottf
Copy link
Contributor

scottf commented Mar 2, 2022

I have a PR for the CLI that adds fixing the discard policy as part of the kv upgrade <bucket> command, which previously upgraded just allow_rollup_hdrs. nats-io/natscli#364
This is provided as an additional / alternate way to upgrade and is not intended as a replacement for this PR.

@scottf scottf dismissed their stale review March 15, 2022 16:23

resolved in discussion

@kozlovic kozlovic merged commit 77460b8 into main Mar 15, 2022
@kozlovic kozlovic deleted the kv_create_updgrade_issue branch March 15, 2022 16:24
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

7 participants