Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improved timestamps option handling in bulkWrite #14546

Merged
merged 2 commits into from
Apr 29, 2024

Conversation

sderrow
Copy link
Contributor

@sderrow sderrow commented Apr 28, 2024

Fixes #14536

Summary

There's a fairly inconsistent experience with the timestamps option in bulkWrite right now:

  • The only way to influence insertOne operations is by specifying timestamps in the the top-level bulkWrite options
  • updateOne and updateMany have their timestamps option set purely at the operation level
  • There's no way to set the timestamps option for replaceOne operations

Let's debate desired behavior here, but this PR introduces a hierarchy in order to (mostly) prevent breaking changes:

  1. If timestamps is specified within a given operation (e.g., op.updateOne.timestamps), then use it
  2. Next look at the timestamps setting in the top-level bulkWrite options and use it if it exists
  3. Finally, default is true

Pros

  • Consistency!
  • Doesn't represent a breaking change for many use-cases of officially-documented options, given insertOne is fully controlled by a different level than updateOne and updateMany. But it's not 100% non-breaking (see below)
  • Gives users most flexibility. In my own experience, I'm generally either trying to set timestamps for all operations or none at all, which makes the top-level option useful. But if you do have a mix of commands where you want to set timestamps vs not, it's nice not having to split them into fully separate bulkWrite commands. I've run into this scenario myself, so I understand the benefit

Cons

  • Complexity! Being able to specify the same option at multiple levels and keep track of what level overrides which other level is maybe too much.
  • Technically this is a breaking change. If you send a bulkWrite with a mix of insertOne and updateOne commands, you currently control insertOne at the top level and updateOne at the operation level. Suppose you want to set timestamps for all updateOne commands but NOT for any insertOne commands. You would do that today by NOT specifying anything for timestamps in the updateOne commands and specifying timestamps: false in the top-level options. Moving forward, with this PR as written, the same bulkWrite will no longer set timestamps on anything.

Alternatives

  1. Make this truly non-breaking, and just add insertOne.timestamps and replaceOne.timestamps options. But we need to leave the top-level timestamps option in order to not break insertOne timestamp control for existing users. This will remain an inconsistent experience, as top-level will only control insertOne commands
  2. Embrace the breakage while also keeping it simple by removing top-level timestamps altogether and just doing timestamps specification at the command level for all command types. Since it's a breaking change, delay landing this until 9.X
  3. Make the argument that the top-level timestamps option for bulkWrite has never been officially documented, so we can remove it without it being a breaking change. Add insertOne.timestamps and replaceOne.timestamps options, and ship with 8.X. Honestly this is probably the optimal outcome from a simplicity/consistency/speed perspective...

Conclusion

After writing out everything above, I'm thinking my current solution as written is the worst of both worlds! Out of the alternatives listed above, I'm partial to alternative 3, but I don't know how others think about the argument that removing top-level timestamps is non-breaking.

@vkarpov15 vkarpov15 added this to the 8.3.3 milestone Apr 29, 2024
@vkarpov15
Copy link
Collaborator

Honestly, I think this PR is the way to go. Allowing setting timestamps on both individual operations as well as for the entire bulkWrite() is the most complete solution and most consistent with the intent of the API as originally written. IMO it's only backwards breaking inasmuch as any bug fix is backwards breaking.

@vkarpov15 vkarpov15 merged commit 434dac8 into Automattic:master Apr 29, 2024
21 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Consistent top-level timestamps option for bulkWrite operations
2 participants