Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(NODE-6136): parse cursor responses on demand #4112

Open
wants to merge 27 commits into
base: main
Choose a base branch
from

Conversation

nbbeeken
Copy link
Contributor

@nbbeeken nbbeeken commented May 15, 2024

Description

What is changing?

  • Migrate all cursors to use on demand mechanisms for parsing documents as the cursor's are iterated.
  • Fixed the issue with encryption not supporting MongoDBResponse by changing the internal APIs to only operate on Uint8Arrays
  • "decorate" functionality moved into CryptoConnection, supports building the array of keys that were encrypted
  • A new ExplainCursorResponse class defers refactoring explain handling by simulating a cursor response similar to how the driver works now.
  • Preserve the elements generated in the first and only call to parseToElements, previously error detection tossed the element parsing work.
  • Consolidate the definition of "error". A MongoDB command has only failed when ok is set to 0, all other conditions were approximations and led to unnecessary parsing work (remove isError, only check ok === 0)
    • WriteConcernErrors have not been modified but clarified. The makeWriteConcernResultObject would delete properties when ok was set to 0 which is never the case for a writeConcernError because the "command succeeded" the "write concern/requirements failed". The delete logic would not run and the unit tests that asserted the properties did not exist only passed because the mock data fed in was inaccurate.
    • MongoWriteConcernError now takes an entire response (for simplicity it takes a plain object returned from BSON deserialize) if the response contains a writeConcernError property.
Is there new documentation needed for these changes?

No.

What is the motivation for this change?

Performance and enhancing internal capabilities for discrete BSON parsing.

Release Highlight

Cursor responses are now parsed lazily 🦥

MongoDB cursors (find, aggregate, etc.) operate on batches of documents equal to batchSize. Each time the driver runs out of documents for the current batch it gets more (getMore) and returns each document one at a time through APIs like cursor.next() or for await (const doc of cursor).

Prior to this change, the Node.js driver was designed in such a way that the entire BSON response was decoded after it was received. Parsing BSON, just like parsing JSON, is a synchronous blocking operation. This means that throughout a cursor's lifetime invocations of .next() that need to fetch a new batch hold up on parsing batchSize (default 1000) documents before returning to the user.

In an effort to provide more responsiveness, the driver now decodes BSON "on demand". By operating on the layers of data returned by the server, the driver now receives a batch, and only obtains metadata like size, and if there are more documents to iterate after this batch. After that, each document is parsed out of the BSON as the cursor is iterated.

A perfect example of where this comes in handy is our beloved mongosh! 💚

test> db.test.find()
[
	{ _id: ObjectId('665f7fc5c9d5d52227434c65'), ... },
  ...
]
Type "it" for more

That Type "it" for more message would now print after parsing only the documents displayed rather than after the entire batch is parsed.

Double check the following

  • Ran npm run check:lint script
  • Self-review completed using the steps outlined here
  • PR title follows the correct format: type(NODE-xxxx)[!]: description
    • Example: feat(NODE-1234)!: rewriting everything in coffeescript
  • Changes are covered by tests
  • New TODOs have a related JIRA ticket

@nbbeeken nbbeeken force-pushed the NODE-6136-cursor-response branch 2 times, most recently from df69639 to e6de40a Compare May 16, 2024 14:53
@nbbeeken nbbeeken force-pushed the NODE-6136-cursor-response branch 2 times, most recently from 31efa9c to a37ec31 Compare June 4, 2024 20:28
@nbbeeken nbbeeken marked this pull request as ready for review June 4, 2024 21:25
@baileympearson baileympearson self-requested a review June 5, 2024 19:54
@baileympearson baileympearson self-assigned this Jun 5, 2024
@baileympearson baileympearson added the Primary Review In Review with primary reviewer, not yet ready for team's eyes label Jun 5, 2024
src/client-side-encryption/state_machine.ts Outdated Show resolved Hide resolved
ns,
encrypted,
options,
(responseType ?? MongoDBResponse) as any
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How come casting is necessary here? I'd expect responseType ?? MongoDBResponse to produce a type of MongoDBResponse, which should satisfy super.command()'s types without casting, and should produce the correct return type without as unknown as MongoDBResponse.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Technically, I'm committing type crimes. I added a comment clarifying this, but it is not right that we are saying command of T and then passing in MongoDBResponse when responseType is nullish because now that may not match T. I can move to the type crime into the generic or here on the argument, but it will exist until we can only pass in a responseType without defaulting to a different subtype constraint.

typeof MongoDBResponse' is assignable to the constraint of type 'T', but 'T' could be instantiated with a different subtype of constraint 'MongoDBResponseConstructor'

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why can't we remove the generic on super? then super.command() should resolve to a mongodbresponse instance.

If there is no response type provided, line 777 resolves to a plain document of type T. So there wouldn't be TS issues?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I got it, TY, needed a non-nullish overload on the superclass otherwise this was still saying the result could be Document. Fixed!

src/cmap/wire_protocol/responses.ts Show resolved Hide resolved
src/cursor/change_stream_cursor.ts Outdated Show resolved Hide resolved
src/operations/execute_operation.ts Outdated Show resolved Hide resolved
@@ -13,7 +13,7 @@ export interface CountDocumentsOptions extends AggregateOptions {
}

/** @internal */
export class CountDocumentsOperation extends AggregateOperation<number> {
export class CountDocumentsOperation extends AggregateOperation {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
export class CountDocumentsOperation extends AggregateOperation {
export class CountDocumentsOperation extends AggregateOperation<number> {

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AggregateOperation is not generic and if it were it can't be that a subclass returns an incompatible type from the parent. (ex. Parent -> A -> B, B cannot have a method incompatible with Parent otherwise things that expect "Parent" will break when given Bs) This and the command below are TODO_NODE_3286 related. The operations layer TS does not work well and here in particular because the expectation (and reality) is that every command returns an object, I think returning a "number" should be handled up in the coll.countDocument API. Taking it even further I don't see the value in this subclass? can we just call this.aggregate from within collection.countDocuments?

Stuck the appropriate type annotation on here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, agreed - the preferable approach would be to remove this class entirely (up to you). We do have this precedent with our index management APIs too.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

test/unit/error.test.ts Outdated Show resolved Hide resolved
test/unit/cmap/wire_protocol/responses.test.ts Outdated Show resolved Hide resolved
test/integration/crud/explain.test.ts Show resolved Hide resolved
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Primary Review In Review with primary reviewer, not yet ready for team's eyes
Projects
None yet
2 participants