Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(NODE-3803): Fix _id typing on collection create operations #3077

Merged
merged 16 commits into from Dec 16, 2021

Conversation

baileympearson
Copy link
Contributor

@baileympearson baileympearson commented Dec 13, 2021

Description

This PR improves type inference around the _id fields for schemas used by the Collection class.

What is changing?

This PR updates the type inference that extracts the type of the _id field from schemas to

  • be easier to understand
  • correctly handle the case where an optional _id is passed in

See https://jira.mongodb.org/browse/NODE-3761.

This PR also updates the parameter types for write operations on the collection class to match the following behavior (discussed in this slack thread)

  • _id not defined on collection means the default ObjectId will be used, and won't be required on inserts;
  • _id defined as required on a collection means that the user will provide the _id on every insert
  • _id defined as optional means that the user is expecting to rely on a pkFactory method to generate a consistent id
Is there new documentation needed for these changes?

Yes (I think)

What is the motivation for this change?

Improved typing of the _id field on schemas results in a better developer experience for typescript consumers of the node driver.

Double check the following

  • Ran npm run check:lint script
  • Self-review completed using the steps outlined here
  • PR title follows the correct format: <type>(NODE-xxxx)<!>: <description>
  • Changes are covered by tests

- Update the inferType helper type to correctly infer optional _id types
- Update the OptionalId helper to make the _id optional in all cases
- Create a new helper to require _id if the _id is defined on the schema
@baileympearson baileympearson changed the title WIP fix(NODE-3803): Fix _id typing on collection create operations fix(NODE-3803): Fix _id typing on collection create operations Dec 13, 2021
@baileympearson baileympearson marked this pull request as ready for review December 13, 2021 19:53
@dariakp dariakp added the Primary Review In Review with primary reviewer, not yet ready for team's eyes label Dec 14, 2021
src/mongo_types.ts Outdated Show resolved Hide resolved
- Remove union type for optional _id fields
- Add tests for insertX operations to test custom defined _id types
test/types/schema_helpers.test-d.ts Outdated Show resolved Hide resolved
src/mongo_types.ts Show resolved Hide resolved
test/types/community/collection/filterQuery.test-d.ts Outdated Show resolved Hide resolved
@dariakp dariakp added Team Review Needs review from team and removed Primary Review In Review with primary reviewer, not yet ready for team's eyes labels Dec 16, 2021
Copy link
Contributor

@dariakp dariakp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks!

@dariakp dariakp merged commit f1979db into main Dec 16, 2021
@dariakp dariakp deleted the NODE-3803/fix-_id-typing-on-collection-methods branch December 16, 2021 22:33
@octogonz
Copy link

octogonz commented Feb 20, 2022

@baileympearson @dariakp I don't understand the intended API usage after this change.

Consider this example:

old API usage (before this PR)

interface Book {
  _id: ObjectId;
  title: string
}

const books: Collection<Book> = db.collection('books');

// Let's add a book. Normally books must have an _id, 
// but during creation the service will generate the _id for us,
// so we omit that field for this particular operation:
const newBook: WithoutId<Book> = {
  title: 'My Book'
};
await books.insertOne(newBook);

// Later if I retrieve the book, its _id field is present:
const foundBook: Book | null = await books.findOne({ title: 'My Book' });
if (foundBook) {
  console.log('The book's ID is: ' + foundBook._id);
}

The above code worked fine prior to this PR. However after upgrading the mongodb package, it seems that now we are supposed to change the interface to have an optional _id:

new API usage (after this PR)

interface Book {
  // "_id defined as optional means that the user is expecting to rely on 
  // a pkFactory method to generate a consistent id"
  _id?: ObjectId; 
  title: string
}

But then the type of foundBook._id becomes ObjectId | undefined. This requires me to check for undefined whenever I try to access it.

This doesn't make sense. The foundBook._id cannot ever be undefined, so why should I need these checks? The only situation where _id might be missing is the partially constructed input for APIs like insertOne().

Am I misunderstanding something? (Maybe we just need to expand the unit tests to better indicate the intended usage for this API.)

@dariakp
Copy link
Contributor

dariakp commented Feb 22, 2022

@octogonz Thanks for reaching out. The intended usage for the API is to not specify the _id on the collection at all if you are intending to rely on the automatic _id insertion. The _id should only be specified (as optional or required) if you are intending to manage it manually. Please see my comment here for more detail about the reasoning behind this implementation and the use cases we are trying to support. As far as the _id coming back as ObjectId | undefined, it should not do that on findOne since findOne always returns a WithId<TSchema> where the _id will always be defined.

@octogonz
Copy link

@dariakp Thanks for explaining. I've removed the _id field from all our interfaces as suggested.

However the result is a bit awkward: Basically everywhere that used to say e.g. Book now must say WithId<Book>. If before it said Promise<Book[]>, now it needs to be Promise<WithId<Book>[]>. If before it said Map<string, Partial<Book>>, now it has to be Map<string, Partial<WithId<Book>>>.

In practice, this effect cascades across the code base, because you never know where a function may need to read the _id field of a Book.

And in theory, "book" objects always have an ID. So it seems odd to declare Book without an _id, and to complicate the 99% of non-insertion scenarios, versus for example doing WithoutId<> for the occasional call to insertOne().

I wonder if you might reconsider this API design choice.

@dariakp
Copy link
Contributor

dariakp commented Feb 23, 2022

@octogonz We do appreciate the concern here, but as I highlighted in the linked comment:

We are aware that this may be different from the intuitive notion of a collection document type; unfortunately, since it is impossible to reverse engineer the desired insert shape from the final document shape, only vice versa, we cannot use the final shape of the document to fully define the collection for all driver operations.

The purpose of the collection type is to inform the driver of the expected document shape across all supported operations, hence this is currently the only viable api choice that would allow us to correctly assert _id types on inserts. To mitigate the transition pain, you could keep the _id on Book type and pass WithoutId<Book> when specifying the collection type, which should reduce the amount of code changes required overall.

@justinmchase
Copy link

justinmchase commented Jul 8, 2022

This seems like a regression to me, its very frustrating. I'm reading the thought process behind it and its not good logic.

...since it is impossible to reverse engineer the desired insert shape from the final document shape...

    insertOne(doc: OptionalId<TSchema>): Promise<InsertOneResult<TSchema>>;

Boom done.

I need to reference the _id field all over my code so I need it to be non-optional on my schema. But now I'm stuck, unable to insert documents with the .insertOne(book) api because of this regression, unless I go down the path that @octogonz described, where I have to wrap everything with the WithId<Book> interface, its brutal.

Did you all test this out on a real code base before implementing it? I can't really see how this choice would have gotten past the prototype phase if it had actually been applied to a codebase even a single time.

This whole notion that you shouldn't require it to be defined on your schemas because you're using auto id's on creation seems to completely overlook the fact that you'll need to reference the _id on the schema still.

My workaround is to just cast it when inserting, even though that sucks too for different reasons:

const book = WithoutId<Book> {
  title: 'How to Design APIs'
} 
await books.insertOne(book as Book)

Please reconsider the comments of @octogonz because he's 100% right and this feels like a regression, what was there before was better.

@justinmchase
Copy link

Looking at the code in this PR in more detail, I would just recommend straight up reverting it.

@dariakp
Copy link
Contributor

dariakp commented Jul 8, 2022

Please see reply here: #3100 (comment)

tl;dr There are users who need the id to be required on inserts. What you pass in as the collection type is the insert shape, and regardless of the insert shape, finds will always return a document with a correctly typed required id field, therefore there is no need to cast anything in ANY of the individual methods (read or write) if you accurately specify the id as optional in the collection insert schema.

interface IdPet {
  _id: ObjectId;
  name: string;
  age: number;
}

const database = client.db("<your database>");
const collection = db.collection<OptionalId<IdPet>>("<your collection>");
 
collection.insertOne({
  name: "Spot",
  age: 2
});

const puppy = collection.findOne();  // type of find result is IdPet
// puppy: IdPet

const puppyId = collection.findOne()._id; //_id is defined and accessible
// puppyId: ObjectId

We have more examples in our docs: https://www.mongodb.com/docs/drivers/node/master/fundamentals/typescript/

@justinmchase
Copy link

Ok not gonna lie that's pretty confusing, in my schema its not optional, that's probably the source of the confusion. If I set it to optional on the schema then its optional on the insert but included in the results. Doesn't make sense but it does work.

@dariakp
Copy link
Contributor

dariakp commented Jul 12, 2022

I agree that it is counterintuitive, but it should make sense if you think of it from the perspective of what a typescript schema in the driver can actually enforce. When you specify a typescript schema for the collection, the driver has no way to know whether those types are actually accurate in the database itself - it is not uncommon for more than one microservice to share a collection and modify different attributes. So, you aren't expected to describe the exact shape of the data that exists in the database, instead, you are expected to tell the driver what fields your service will be interacting with (i.e., reading and writing). *(see note)

Ultimately, the collection schema is the "fields you care about when using the driver methods" schema with an indication of which fields you care about always and which only sometimes.

  • If you don't expect to touch a field at all, don't specify it at all. The driver won't know about it and so won't allow it on inserts or return it on finds (unless it's the _id).

  • If you care that a given field is always present on a document when your application interacts with it (i.e. when calling read AND write methods), specify the field as required. The driver will ensure that it's required on inserts and know to expect it to be returned on finds.

  • If you only expect your application to sometimes interact with a field, specify the field as optional. The driver will allow but not require it on inserts and will expect it to be optionally (unless it's the _id) present on finds.

The _id field is only special in that the driver knows that documents without an _id field cannot exist, so it always at least allows the _id field on inserts and knows that it is always present on finds with the default type of ObjectId OR the type you specify in your schema - even if you only sometimes or never care about interacting with it. However, for the users who always care about interacting with it, we allow the tightening of that condition from optional to required on inserts, since there is no harm in allowing you to set a property you ultimately do not use, but there is harm in not enforcing the requirement of a property you care about personally setting.

*Note: In fact, you could have objects of different shapes living side by side in the same collection, and if you have a module in your application that you wish to only work with a specific type of object, then you could specify only that object type as the collection schema used by that module and would get typescript-backed safety for operations performed by the driver inside that module.

@octogonz
Copy link

instead, you are expected to tell the driver what fields your service will be interacting with (i.e., reading and writing).

I guess my point was that reading should be favored over writing. In other words, this is the important basic scenario:

const foundBook: Book | null = await books.findOne({ title: 'My Book' });
if (foundBook) {
  console.log('The book's ID is: ' + foundBook._id);
}

In this scenario we expect Book to have a simple type expression (not a clumsy WithId<Book>). The interface should define our idea of what a "book" is, even if some other microservice might consider a "book" to have extra fields or different required/optional status.

By contrast, writing is the messier scenario:

const newBook: WithoutId<Book> = {
  title: 'My Book'
};
await books.insertOne(newBook);

The clunky WithoutId<Book> is more natural here, because we're not really describing a "book," we're describing the partial inputs to be used to initialize a new document.

tl;dr There are users who need the id to be required on inserts.

In the old design, those users would accomplish that by using Book with insertOne instead of WithoutId<Book>. What I understand from #3100 (comment) is that you wanted to provide an extra safeguard against mistakes -- a policy ensuring that insertOne() will reject WithoutId<Book>.

In the new implementation, this policy is controlled by the Book._id declaration:

export declare type OptionalUnlessRequiredId<TSchema> = TSchema extends {
    _id: any;
} ? TSchema : OptionalId<TSchema>;

The intended relationship between the schema and the _id in the typings is as follows:

  • If no _id is specified on the schema, it is inferred to be of type ObjectId and is optional on inserts.

  • If an _id is specified on the schema as required, then the _id type is inferred to be of the specified type and is required on inserts.

  • If an _id is specified on the schema as optional, it is inferred to be of the specified type and is optional on inserts:

But the effect is more like this:

  • For the minority of users who require the _id on inserts, explicitly declare Book._id as non-optional, mission accomplished 👍👍
  • For everyone else, insertOne() is broken unless you declare Book._id as optional. Oops, your Book interface no longer accurately describes the document. The mitigation is to use WithId<Book> everywhere. 😥

By adding a feature to help the minority, it seems like things got worse for the majority.

Isn't there some better way to solve this? Like maybe instead of TSchema extends { _id: any; } we check for some kind of _id: RequiredOnInsert<TSchema>. Or maybe Collection<T> has a parameterization as Collection<T, IdRequiredOnInsert> or CollectionWithExplicitId<T>. I don't have time to solve this type algebra problem, but maybe someone else can think of something...

@justinmchase It might help to create a proper GitHub issue rather than discussing this problem in a closed PR thread.

@dariakp
Copy link
Contributor

dariakp commented Jul 16, 2022

@octogonz I appreciate the thoughtful feedback here. Opening a ticket in our Jira project: https://jira.mongodb.org/projects/NODE would be the best way to keep track of the discussion and potential changes.

Write schema is preferred over read schema because the write schema is what has the highest risk of leading to data inconsistency. However, if it truly is a major pain point for many users and there are suggestions for a better implementation, we are always open to hearing them.

@octogonz
Copy link

octogonz commented Jul 25, 2022

Here's a ticket : https://jira.mongodb.org/browse/NODE-4470

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Team Review Needs review from team
Projects
None yet
5 participants