New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
KV: changing discard policy would fail on srv upgrade #917
Conversation
This is related to #900 that introduced the change of discard policy if connecting to a server v2.7.2+. The CreateKeyValue() API calls AddStream(). If the stream already exists and the configuration is identical, the call succeeds (idempotent). However, since we now try to set the discard policy to "new" when connecting to a server v2.7.2+, the call will fail if the KV store already existed (CreateKeyValue() was called with a v2.7.1 server and stream was created with DiscardOld). The approach here is that if AddStream() fails with an "already in use" error, then we will lookup the stream info, and if we detect that the configuration is same (except for the discard policy), then we call UpdateStream() with the new discard policy. The problematic part is that the client side configuration does not set some of the fields (or their value is 0), but then the server sets either some defaults (like Duplicates set to 2min) or replaces 0 to -1. So the info we get back and the config we have need to be tweaked before being compared. This is really hacky and prone to break if server were to change some defaults. Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I completely understand the need to update the stream configuration to ensure it has the latest settings i.e. discard policy of new, but I object to changing the behavior of "Create" to actually be create or update. You are building in a side effect that may have unintended consequences, not to mention, for consistency and no surprises, it should be an error when you try to create something that already exists. We have individual create/update key API for the exact same reason, we have stream create and update api for that reason and we know that client developers have complained that consumer only has update.
I would instead prefer some management api that can check a store for compliance with a server version and update the stream. It also seems like this is more of a CLI function than a client function, as it should rarely be done or an automatic behavior of the client.
I also wonder if this is even necessary. Consider that KV is currently experimental. Someone starting to use KV now will use the latest client and server so this use case feels like an edge case
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great points @scottf. Technically I agree 100% However, from a broader product perspective I think the PR has merit and should be merged, even if it's not best long term w/r/t client library implementation.
It'd be better UX to do the update for the time being. Without the update users could suddenly have things "break" with a nebulous reason why (couldn't create stream), likely resulting in number new issues or users abandoning NATS KV.
This PR could be categorized as fixing a systemic design bug that could potentially result in data loss, justifying this approach.
In the future, we can update the client to remove the stream update and have mitigation instructions on a FAQ in case users skipped a number of versions.
@ColinSullivan1 This code also assumes that the users calls If anything, when a kv context is opened on a bucket ( |
And as I think about it, updating it on the open could have account/security problems. I still really think we should just add an update api, or at least a CreateOrUpdateApi instead of changing the Create behavior. |
I agree with @scottf . I don't think we should auto-magically change the (non-identical) durable configuration at server from extent client application code. I think the needed stream updates (existing buckets) should be documented in the release notes (with a NATS CLI example of updating the policy on a bucket). This seems like an okay tax on KV experimental adopters (who have existing buckets they need to preserve). |
There is a “nats kv upgrade BUCKET” command that was used in a previous scenario that required stream updates. Could add this to it. |
Calling stream create with same config works by design to simplify program design. I don't think you are saying to change that. With respect to the KV, it was my call to specifically upgrade an underlying stream created with an older client library that would discard old but valid keys/values. This was an oversight on my behalf but a pretty bad customer experience when you ran into it and would definitely violate path of least surprise. I made the call to upgrade the underlying stream on the fly with just changing discard old to discard new if the server was the correct version. I did not expect folks to upgrade it themselves tbh. And they would most definitely call us out when we lost their data. |
@derekcollison We all agree that there needs to be an easy way to apply this change. Putting the on the fly upgrade into the create call leaves out users who do not code create into the normal code path. We can guarantee that all code paths will open a KV context, but I'm not sure we can guarantee that the user will have authorization at that point. If we can guarantee auth, the get/ open kv context is the best place to do this upgrade. If we are going to go through the effort of documenting that a user must call create each time, why don't we just provide an out of band way to fix the stream config and document it, since this is really just a one time fix for a problem that only existed for a short time. It would be easy enough to provide a piece of code in every language that does the upgrade. Or by adding an update api we can put this code under that. Also having an update api may provide useful against future changes. |
IIRC we have encouraged apps to just use stream create (and hence everything that utilizes that like K/V) to avoid excess boilerplate code of lookup, if it's not there, create etc. Hence the choice to place it there. Agree we can not control what everyone does but that is what we were encouraging I believe. I also prefer just doing a client lib update to get the fix vs relying on new code being put into the app layer or relying on the user to run an external command. What is your alternative suggestion? |
Alternatives.
I prefer option 2 because this would set the table for future updates of this nature. |
Regarding what we encourage, we are completely inconsistent. We have both stream create and stream update api in the JetStreamManager. We have put/create/update for keys. We have only create for consumers, which we as client developers have complained about. maybe this is a "we'll care about it in the next major version of the client" |
The consistency I was speaking of is that any app (or many at the same time) can call AddStream, or AddConsumer and that should work, one will create and the others are essentially no-ops as long as config is exactly the same. What I am expressing here is that folks will most likely not run an external tool or even the cli, so I thought this was the best place to protect their data. This is a one time and temporary fix, but if we lose their data that will be very bad. And they will not care of we had a tool and they did not use it, imo. |
We cannot guarantee that the developer will call create, so that path is no better at ensuring that an upgrade will be executed and data will not be lost. All options require that we document and communicate the solution. Having an upgrade path in the CLI is the best solution since we promote the CLI for management. Having a one off tool is the 2nd best solution. This solution requires both adding unnecessary code to the client and having the user add code to their codebase or to write a one off of their own, so it is more work for a user. That does not feel like a better UX experience. I built a small Java program. It connects to the server, looks for all streams with the |
My point is we do suggest always calling create unless there is a separate management function that had created it. This allows apps to be independent of a management plane and nomadic. If there is a management plane, I believe you can upgrade the stream underlying the KV with the current NATS cli. |
I have a PR for the CLI that adds fixing the discard policy as part of the |
This is related to #900 that introduced the change of discard
policy if connecting to a server v2.7.2+.
The CreateKeyValue() API calls AddStream(). If the stream already
exists and the configuration is identical, the call succeeds
(idempotent). However, since we now try to set the discard policy
to "new" when connecting to a server v2.7.2+, the call will fail
if the KV store already existed (CreateKeyValue() was called with
a v2.7.1 server and stream was created with DiscardOld).
The approach here is that if AddStream() fails with an "already in use"
error, then we will lookup the stream info, and if we detect that
the configuration is same (except for the discard policy), then
we call UpdateStream() with the new discard policy.
The problematic part is that the client side configuration does
not set some of the fields (or their value is 0), but then the
server sets either some defaults (like Duplicates set to 2min)
or replaces 0 to -1. So the info we get back and the config
we have need to be tweaked before being compared. This is really
hacky and prone to break if server were to change some defaults.
Signed-off-by: Ivan Kozlovic ivan@synadia.com