Greedy selection of merged fields with dependencies #1959

Closed
gmac opened this issue Aug 28, 2020 · 66 comments
Comments

@gmac
Contributor

gmac commented Aug 28, 2020

It appears that fields with selection dependencies are greedily selected early if merging visits their service prior to collecting their dependencies. Logistically we'd need to visit their service but not collect them, go elsewhere for dependencies, and then return for the dependent fields.

Actually, I'm not sure this is specifically an issue with field-level selections. Using a single static selectionSet: '{ id networkId }' also has the same issue. Either way, I may have led us down a bad road here with the nudge to support federation-style injection patterns – your ethos of "if a service has a field it should be able to fulfill it" avoids this completely. If that's a fundamental divide in the goals of the algorithm, it would be totally reasonable if you want to stick to your original path and drop official support for these dependency patterns. We could pull injected keys from docs and all mentions of Federation similarities, letting it just be its own thing.

import { graphql } from 'graphql';
import { makeExecutableSchema } from '@graphql-tools/schema';
import { stitchSchemas } from '@graphql-tools/stitch';

describe('federation pattern', () => {

  const contentSchema = makeExecutableSchema({
    typeDefs: `
      type Post {
        id: ID!
        title: String!
        networkId: ID!
      }
      type Query {
        _postsContent(ids: [ID!]!): [Post]!
      }
    `,
    resolvers: {
      Query: {
        _postsContent: (root, args) => args.ids.map(id => ({ id, title: `Post`, networkId: Number(id)+1 })),
      },
    },
  });

  const layoutSchema = makeExecutableSchema({
    typeDefs: `
      type Post {
        id: ID!
        network: Network! @deprecated(reason: "gateway access only")
      }
      type Network {
        id: ID!
        name: String!
      }

      scalar Any
      type Query {
        postLayout(id: ID!): Post
        _partials(representations: [Any!]!): [Post]!
      }
    `,
    resolvers: {
      Query: {
        postLayout: (root, args) => ({ id: args.id }),
        _partials: (obj, args) => args.representations,
      },
      Post: {
        network(post, args, ctx, info) {
          return {
            id: post.networkId,
            name: `network`
          };
        }
      }
    },
  });

  const stitchedSchema = stitchSchemas({
    subschemas: [
      {
        schema: contentSchema,
        merge: {
          Post: {
            selectionSet: '{ id }',
            fieldName: '_postsContent',
            key: ({ id }) => id,
            argsFromKeys: (ids) => ({ ids }),
          }
        }
      },
      {
        schema: layoutSchema,
        merge: {
          Post: {
            selectionSet: '{ id }',
            fields: {
              network: { selectionSet: '{ networkId }' }
            },
            fieldName: '_partials',
            key: ({ id, networkId }) => ({ id, networkId }),
            argsFromKeys: (representations) => ({ representations }),
          }
        }
      },
    ],
    mergeTypes: true,
  });

  test('works', async () => {
    // enters through contentSchema and
    // collects field dependencies for layout service
    const result = await graphql(
      stitchedSchema, `
        query {
          _postsContent(ids: [23]) {
            id
            title
            network {
              id
            }
          }
        }
      `,
    );

    console.log(JSON.stringify(result, null, 2))
  });

  test('doesnt work', async () => {
    // enters through layoutSchema and attempts
    // to make the full selection including network,
    // which has an unmet field dependency
    const result = await graphql(
      stitchedSchema, `
        query {
          postLayout(id: 23) {
            id
            title
            network {
              id
            }
          }
        }
      `,
    );

    console.log(JSON.stringify(result, null, 2))
  });
});
@yaacovCR
Collaborator

not sure this is a field selection set issue

our algorithm currently just assumes that each subservice should be visited only once

it therefore subtracts the fields from the subschemas visited so far and visits only subschemas not yet visited

but, even without a field selection set, i.e. if networkId is moved to the object-level selectionSet, the originating subservice in this example must be visited twice

this sounds like a true bug in the algorithm
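The visit-once behavior described above can be modeled in a small sketch. This is a toy planner with invented names, not the actual graphql-tools implementation: each "service" owns some Post fields, and `network` depends on `networkId`, which only the content service provides. The greedy planner takes every requested field a service owns on first visit and never returns, so entering through the layout service selects `network` before its dependency has been fetched:

```typescript
// Toy model of the visit-once planning problem (names invented; this is
// not the graphql-tools planner itself).
type Service = { name: string; fields: Set<string>; deps: Record<string, string[]> };

const content: Service = { name: "content", fields: new Set(["id", "title", "networkId"]), deps: {} };
const layout: Service = { name: "layout", fields: new Set(["id", "network"]), deps: { network: ["networkId"] } };

// Greedy visit-once planner: walk services in order, take every requested
// field a service owns, and never revisit a service.
function planVisitOnce(requested: string[], services: Service[]): Array<[string, string[]]> {
  const remaining = new Set(requested);
  const plan: Array<[string, string[]]> = [];
  for (const svc of services) {
    const take = Array.from(remaining).filter(f => svc.fields.has(f));
    take.forEach(f => remaining.delete(f));
    if (take.length > 0) plan.push([svc.name, take]);
  }
  return plan;
}

// Entering through the layout service (as in the failing `postLayout` test),
// `network` is taken on the first visit even though `networkId` has not been
// fetched from the content service yet -- the dependency is unmet.
const plan = planVisitOnce(["id", "title", "network"], [layout, content]);
```

Satisfying the dependency would require visiting the layout service a second time after collecting `networkId`, which the subtraction-based plan never does.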

@yaacovCR
Collaborator

Ah, I see: not a bug in the field selectionSet, and not a bug in the algorithm either... We just assumed that each subservice can truly work independently, which differs from federation...

Deferring field resolution could be considered as an enhancement, of course with algorithm changes, but I guess for now working as designed

@gmac
Contributor Author

gmac commented Aug 28, 2020 via email

@gmac
Contributor Author

gmac commented Aug 29, 2020

Circling back, to distill down the insights learned here... it sounds like what we have are static and dynamic types:

  • Static types have entirely static (disk) data furnished by their service, and they work fine.
  • Dynamic types build entirely dynamic data based on input from the gateway, and they also work fine.

So where we run into trouble is with a hybrid type that has both static and dynamic data. This doesn’t work because the static data could originate in a subservice without gateway input present during the delegation round. I think the docs can be updated to articulate this differentiation. I do think there’s a strong practical case to be made for considering a staged resolution approach that builds around dependency groups rather than just schema membership in the long run.

@yaacovCR
Collaborator

Hmmm. Not sure I understand all of above, but I'm looking forward to reading through the example more and trying to get it.

What I am wondering from example above is whether the network id should be wrapped in a network type by transform prior to merging...

Would that solve/change anything?

@yaacovCR
Collaborator

I think I understand a bit more and that it wouldn't.

Overall, would like to get query batching in play to replace this federation-type behavior.

I think the easiest pattern to teach and implement is where each GraphQL schema is valid on its own. Would be in favor of docs noting that federation type schemas are not necessarily recommended.

@gmac
Contributor Author

gmac commented Aug 30, 2020 via email

@gmac
Contributor Author

gmac commented Aug 30, 2020 via email

@yaacovCR
Collaborator

yaacovCR commented Aug 31, 2020

I think best would be to provide some generic options to configure how types and fields are merged.

For example, this code picks the last description for a type, and lifts into the gateway the last version of each field from across the subschemas:

const descriptions = pluck<string>('description', candidates);
const description = descriptions[descriptions.length - 1];

const configs = candidates.map(candidate => (candidate.type as GraphQLObjectType).toConfig());
const fields = configs.reduce<GraphQLFieldConfigMap<any, any>>(
  (acc, config) => ({
    ...acc,
    ...config.fields,
  }),
  {}
);

Something like onTypeConflict for when types are not merged, maybe onDescriptionConflict, onFieldConflict, etc.
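The hook idea above might look something like this. Note that `onFieldConflict` is a proposed option, not an existing stitchSchemas API; the `FieldCandidate` shape and all names here are invented for illustration. The hook would replace the last-one-wins reduce shown above with a caller-supplied merge policy:

```typescript
// Hypothetical sketch of an onFieldConflict hook (invented names; not an
// existing graphql-tools option). Candidates are the per-subschema versions
// of one field, reduced pairwise by the hook.
type FieldCandidate = { subschemaName: string; description?: string; deprecationReason?: string };

type OnFieldConflict = (left: FieldCandidate, right: FieldCandidate) => FieldCandidate;

function mergeFieldCandidates(candidates: FieldCandidate[], onFieldConflict: OnFieldConflict): FieldCandidate {
  // Pairwise reduction: the hook decides how each conflict resolves.
  return candidates.reduce((acc, next) => onFieldConflict(acc, next));
}

// Example policy: keep the first non-empty description, and drop deprecations
// (cf. the "gateway access only" deprecation pattern in this thread).
const result = mergeFieldCandidates(
  [
    { subschemaName: "layout", description: "The post network", deprecationReason: "gateway access only" },
    { subschemaName: "content", deprecationReason: undefined },
  ],
  (left, right) => ({
    subschemaName: right.subschemaName,
    description: left.description ?? right.description,
    deprecationReason: undefined,
  })
);
```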

yaacovCR added a commit that referenced this issue Aug 31, 2020
WIP:

When `batch` is set to true for a given subschemaConfig, batches all delegated root fields into a combined request passed to the executor. Moreover, batches all requests to a given subschema into the minimum number of requests, collecting queries and mutations separately, preserving operation order. Distributes properly pathed errors to the originating requests.

Adapted from Gatsby query batcher by @vladar.

Caveats:
* Uses a Dataloader under the hood, which is created anew upon each request -- relies on a unique context argument per request to make this happen!
* Does not pass `info` argument to the batched executor call, disabling any executors that used extensions annotated onto info.

TODO:
- Add testing!
- Extensions support should be added by a custom option?

Related:

gatsbyjs/gatsby#22347 (comment)
#1710 (comment)
#1959 (comment)
#1954
yaacovCR added a commit that referenced this issue Aug 31, 2020
@yaacovCR yaacovCR mentioned this issue Aug 31, 2020
Merged
yaacovCR added a commit that referenced this issue Aug 31, 2020
@gmac
Contributor Author

gmac commented Aug 31, 2020

onDescriptionConflict would be perfect. An onDeprecationConflict would also be useful.

yaacovCR added a commit that referenced this issue Aug 31, 2020
WIP:

When `batch` is set to true for a given subschemaConfig, batches all delegated root fields into a combined request passed to the executor. Moreover, batches all requests to a given subschema into the minimum number of requests, collecting queries and mutations separately, preserving operation order. Distributes properly pathed errors to the originating requests.

Adapted from Gatsby query batcher by @vladar.

Caveats:
* Uses a Dataloader under the hood, which is created anew upon each request -- relies on a unique context argument per request to make this happen!
* Passed `info` argument from first executor call to the batched executor call, making info argument unreliable.

TODO:
- Add testing!

Related:

gatsbyjs/gatsby#22347 (comment)
#1710 (comment)
#1959 (comment)
#1954

fix
yaacovCR added a commit that referenced this issue Aug 31, 2020
yaacovCR added a commit that referenced this issue Aug 31, 2020
@yaacovCR
Collaborator

Hmmm. I am wondering if instead of using onDescriptionConflict and onDeprecatedConflict, it would be better to have one simple onFieldConflict option...

I think I would prefer that...

@gmac
Contributor Author

gmac commented Aug 31, 2020 via email

@yaacovCR
Collaborator

Ah, made a mistake already, onDescriptionConflict is with respect to types, while onDeprecatedConflict is with respect to fields. Have to have a longer think about this, your thoughts welcome. The idea would be to make these as generic as possible.

yaacovCR added a commit that referenced this issue Aug 31, 2020
@yaacovCR
Collaborator

I see, so there’s already an onTypeConflict that can handle type-level docstring mashing? This is another prime area for documentation explaining how to do this.

Not exactly. onTypeConflict is a function that can be used when stitching schemas without merging types; it is meant to select a type when there is a conflict. The parameters given are the two conflicting types, as well as metadata about the schemas from whence they came, and the result should be a new type.

When merging types, the fields and attributes require a host of custom merging that stitchSchemas gives you for free, but because of that, onTypeConflict does not exactly work. I am just trying to think of a way we could make this as generic as possible.

I think we will need separate functions for each type attribute that could conflict, i.e. onTypeDescriptionConflict but that we could have a single generic function for fields that conflict, i.e. onFieldConflict.

@yaacovCR
Collaborator

The default onTypeConflict behavior when stitching btw is to just take the last type.

yaacovCR added a commit that referenced this issue Aug 31, 2020
yaacovCR added a commit that referenced this issue Aug 31, 2020
yaacovCR added a commit that referenced this issue Aug 31, 2020
yaacovCR added a commit that referenced this issue Aug 31, 2020
@gmac
Contributor Author

gmac commented Aug 31, 2020 via email

@gmac
Contributor Author

gmac commented Sep 4, 2020 via email

@gmac
Contributor Author

gmac commented Sep 4, 2020

I think your mention of a plugins system is on the money. An interface that works something like this would be really modular and extensible:

const gatewaySchema = stitchSchemas({
  subschemas: [ ... ],
  mergeTypes: true,
  plugins: [
    new ComputedFieldsPlugin(),
    new CombineDescriptionsPlugin({ excludePrefix: '!omit' }),
    new UndeprecateFieldsPlugin({ withReason: 'gateway access only' }),
  ]
});

@yaacovCR
Collaborator

yaacovCR commented Sep 5, 2020

I imagine each plugin would take each SubschemaConfig object as a parameter and return a modified version for the next plugin in the pipeline?

@gmac
Contributor Author

gmac commented Sep 5, 2020

I was envisioning that plugins would be at the level of the stitched schema, not the level of subschemas. That way a plugin could tap into numerous aspects of the stitched configuration. I just pushed an update to #1989 that demonstrates the proposed pattern.

@yaacovCR
Collaborator

yaacovCR commented Sep 5, 2020

Actually, I guess the plugins would have to take the whole options object as input and return a modified version.

Should we allow plugins to add additional plugins then? I would imagine not 🙂
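The pipeline described here could be sketched as a simple reduce over the options object. This is a minimal sketch under stated assumptions: `StitchOptions`, `Plugin`, and `transform` are invented names, not an existing graphql-tools API.

```typescript
// Hypothetical plugin pipeline: each plugin takes the whole stitchSchemas
// options object and returns a modified version for the next plugin.
// All names here are illustrative, not a real graphql-tools interface.
type StitchOptions = { subschemas: string[]; mergeTypes: boolean; plugins?: Plugin[] };

interface Plugin {
  transform(options: StitchOptions): StitchOptions;
}

function applyPlugins(options: StitchOptions): StitchOptions {
  // Only the originally configured plugin list runs: plugins added by
  // other plugins during the pipeline are deliberately ignored.
  const plugins = options.plugins ?? [];
  return plugins.reduce((opts, plugin) => plugin.transform(opts), options);
}

// Example: a plugin that flips mergeTypes on.
const enableMergePlugin: Plugin = {
  transform: opts => ({ ...opts, mergeTypes: true }),
};

const finalOptions = applyPlugins({
  subschemas: ["accounts", "products"],
  mergeTypes: false,
  plugins: [enableMergePlugin],
});
```

Reading the plugin list once up front is what makes "plugins adding plugins" a non-feature here: the pipeline length is fixed before the first transform runs.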

@gmac
Contributor Author

gmac commented Sep 5, 2020 via email

@yaacovCR
Collaborator

yaacovCR commented Sep 9, 2020

See my latest changes in #1989: it appears that field selection set isolation can cause a subschema that properly returns its fields to fail, if it returns new data not present in the other subschemas.

mostStockedProduct: () => ({ upc: '3', price: 1, weight: 8560 }),

When a product with upc of 4 is returned from the inventory schema along with price and weight, the inventory schema on its own can calculate the shippingEstimate. If you enable field selection set isolation, the gateway cannot, because upc of 4 is not present in productSchema. Similarly, for upc of 3, when enabling field selection set isolation, price and weight are read from the product schema, even though supplied by the inventory schema, so results may differ from schema to gateway.

What should the default behavior be? Presumably, it should be up to configuration on the gateway, but I do not think it is obvious that isolation should be enabled by default. It depends on the expectations of how the schema resolvers were written. If, for example, price and weight are never returned, presumably field selection set isolation should be enabled. But that is difficult to automatically introspect....

@gmac
Contributor Author

gmac commented Sep 10, 2020

Trying to follow along... it's a little confusing because I don't see a upc of 4 in that example anymore. But I think I follow what you're saying, though I'm not convinced that it matches the goals of computed fields...

When a product with upc of 4 is returned from the inventory schema along with price and weight, the inventory schema on its own can calculate the shippingEstimate

The inventory schema doesn't provide fields for price and weight in its schema, so I don't feel there's any reasonable expectation that it should be able to return early with a precomputed value. When fields are marked as @required, the expectation is that the gateway will send them their dependency data, by design. Any local data caches are presumably untrusted and therefore ignored.

If you enable field selection set isolation, the gateway cannot, because upc of 4 is not present in productSchema

This sounds like a data error by any standard that has nothing to do with computed fields. Products schema is the only place that defines price and weight fields... If a record doesn't exist in that service, then there will always be an error when attempting to access price or weight on that record because the lookup will fail in the only service that provides the fields.

Similarly, for upc of 3, when enabling field selection set isolation, price and weight are read from the product schema, even though supplied by the inventory schema, so that results may differ from schema to gateway.

This sounds right to me for computed fields flow. The only thing that sounds off is that inventory schema is doing anything with local price and weight values when the expectation is that the gateway provides those values.

Bottom line: when a field is marked as required, it is expecting its value to come from the gateway (and, in turn, from some other service elsewhere). The developer has enabled a hard @requires because we presumably wouldn't trust a local value even if we had one (this is that exact case I was telling you about: while my Entry may have a cache of data, I can't trust it not to be stale, and therefore want the data to come from the source-of-truth service via the gateway). All that is to say: extra back-and-forth is often preferable for data accuracy over doing the least amount of work possible to return the data in an untrusted form. The @requires mechanism is most definitely not a feature that promotes efficiency; it is a tool that enables accuracy.

@gmac
Contributor Author

gmac commented Sep 10, 2020

ALL THAT SAID – I feel like we're getting into a gray area here where we've added a pretty high-touch feature that (with all due respect) doesn't sound like you're super comfortable with the implementation or use of. I wouldn't be the least bit offended if we were to scrap #1989 and scratch it off the board. It's going to be a bigger liability than boon in the long term if the primary author isn't comfortable with it being there. Type merging is still damn good without it, and personally I can navigate my way around without it there. I was only promoting it because I saw an opportunity for a general purpose feature that others looking for a federation alternative could benefit from.

@gmac
Contributor Author

gmac commented Sep 10, 2020

Just to make a final appeal here before scrapping this or shipping an enableFieldSelectionSetIsolation option (which I don't think makes any intuitive sense as an option)... I think your implementation of a dependency pattern was quite a ways off, and you were encountering errors of the example's own creation. This is what a federation implementation looks like... would you consider backing out the enableFieldSelectionSetIsolation stuff and running the original code against this setup? The mostStockedProduct field you added works as expected with enableFieldSelectionSetIsolation turned on...

Main note from the original test:

mostStockedProduct: () => ({ upc: '3', price: 1, weight: 8560 }),

^^ That service isn't realistic for several reasons. The inventory array represents a database, so this mostStockedProduct is returning a record that doesn't match the database schema... then, the inventory service doesn't offer price or weight GraphQL fields, and does offer inStock, which isn't on that returned object. The way this would realistically work is that upc 3 would be looked up from the local database, i.e.: mostStockedProduct: () => inventory.find(i => i.upc === '3'). All other fields would be resolved from other services, and shippingEstimate gets resolved onto the most stocked product after visiting the products schema for the required fields specified.

import { graphql } from 'graphql';

import { makeExecutableSchema } from '@graphql-tools/schema';
import { ExecutionResult } from '@graphql-tools/utils';
import { stitchSchemas } from '@graphql-tools/stitch';

describe('merging using type merging', () => {
  const users = [
    {
      id: '1',
      name: 'Ada Lovelace',
      birthDate: '1815-12-10',
      username: '@ada'
    },
    {
      id: '2',
      name: 'Alan Turing',
      birthDate: '1912-06-23',
      username: '@complete',
    },
  ];

  const accountsSchema = makeExecutableSchema({
    typeDefs: `
      type Query {
        me: User
        _userById(id: ID!): User
        _usersById(ids: [ID!]!): [User]
      }
      type User {
        id: ID!
        name: String
        username: String
      }
    `,
    resolvers: {
      Query: {
        me: () => users[0],
        _userById: (_root, { id }) => users.find(user => user.id === id),
        _usersById: (_root, { ids }) => ids.map((id: any) => users.find(user => user.id === id)),
      },
    },
  });

  const inventory = [
    { upc: '1', inStock: true },
    { upc: '2', inStock: false },
    { upc: '3', inStock: true }
  ];

  const inventorySchema = makeExecutableSchema({
    typeDefs: `
      directive @requires(selectionSet: String!) on FIELD_DEFINITION

      input ProductRepresentation {
        upc: String!
        price: Int
        weight: Int
      }
      type Product {
        upc: String!
        inStock: Boolean
        shippingEstimate: Int @requires(selectionSet: "{ price weight }")
      }
      type Query {
        mostStockedProduct: Product
        _products(representations: [ProductRepresentation!]!): [Product]!
      }
    `,
    resolvers: {
      Product: {
        shippingEstimate: product => {
          if (product.price > 1000) {
            return 0 // free for expensive items
          }
          return Math.round(product.weight * 0.5) || null; // estimate is based on weight
        }
      },
      Query: {
        mostStockedProduct: () => inventory.find(i => i.upc === '3'),
        _products: (_root, { representations }) => {
          return representations.map(rep => ({ ...rep, ...inventory.find(i => i.upc === rep.upc) }));
        },
      },
    },
  });

  const products = [
    {
      upc: '1',
      name: 'Table',
      price: 899,
      weight: 100
    },
    {
      upc: '2',
      name: 'Couch',
      price: 1299,
      weight: 1000
    },
    {
      upc: '3',
      name: 'Chair',
      price: 54,
      weight: 50
    }
  ];

  const productsSchema = makeExecutableSchema({
    typeDefs: `
      type Query {
        topProducts(first: Int = 2): [Product]
        _productByUpc(upc: String!): Product
        _productsByUpc(upcs: [String!]!): [Product]
      }
      type Product {
        upc: String!
        name: String
        price: Int
        weight: Int
      }
    `,
    resolvers: {
      Query: {
        topProducts: (_root, args) => products.slice(0, args.first),
        _productByUpc: (_root, { upc }) => products.find(product => product.upc === upc),
        _productsByUpc: (_root, { upcs }) => upcs.map((upc: any) => products.find(product => product.upc === upc)),
      }
    },
  });

  const usernames = [
    { id: '1', username: '@ada' },
    { id: '2', username: '@complete' },
  ];

  const reviews = [
    {
      id: '1',
      authorId: '1',
      product: { upc: '1' },
      body: 'Love it!',
    },
    {
      id: '2',
      authorId: '1',
      product: { upc: '2' },
      body: 'Too expensive.',
    },
    {
      id: '3',
      authorId: '2',
      product: { upc: '3' },
      body: 'Could be better.',
    },
    {
      id: '4',
      authorId: '2',
      product: { upc: '1' },
      body: 'Prefer something else.',
    },
  ];

  const reviewsSchema = makeExecutableSchema({
    typeDefs: `
      type Review {
        id: ID!
        body: String
        author: User
        product: Product
      }
      type User {
        id: ID!
        username: String
        numberOfReviews: Int
        reviews: [Review]
      }
      type Product {
        upc: String!
        reviews: [Review]
      }
      type Query {
        _userById(id: ID!): User
        _usersById(ids: [ID!]!): [User]
        _reviewById(id: ID!): Review
        _productByUpc(upc: String!): Product
        _productsByUpc(upcs: [String!]!): [Product]
      }
    `,
    resolvers: {
      Review: {
        author: (review) => ({ __typename: 'User', id: review.authorId }),
      },
      User: {
        reviews: (user) => reviews.filter(review => review.authorId === user.id),
        numberOfReviews: (user) => reviews.filter(review => review.authorId === user.id).length,
        username: (user) => {
          const found = usernames.find(username => username.id === user.id)
          return found ? found.username : null
        },
      },
      Product: {
        reviews: (product) => reviews.filter(review => review.product.upc === product.upc),
      },
      Query: {
        _reviewById: (_root, { id }) => reviews.find(review => review.id === id),
        _userById: (_root, { id }) => ({ id }),
        _usersById: (_root, { ids }) => ids.map((id: string) => ({ id })),
        _productByUpc: (_, { upc }) => ({ upc }),
        _productsByUpc: (_, { upcs }) => upcs.map((upc: string) => ({ upc })),
      },
    }
  });

  const stitchedSchema = stitchSchemas({
    subschemas: [
      {
        schema: accountsSchema,
        merge: {
          User: {
            selectionSet: '{ id }',
            fieldName: '_userById',
            args: ({ id }) => ({ id })
          }
        },
        batch: true
      },
      {
        schema: inventorySchema,
        enableFieldSelectionSetIsolation: true,
        merge: {
          Product: {
            selectionSet: '{ upc }',
            fieldName: '_products',
            key: ({ upc, weight, price }) => ({ upc, weight, price }),
            argsFromKeys: (representations) => ({ representations }),
          }
        },
        batch: true
      },
      {
        schema: productsSchema,
        merge: {
          Product: {
            selectionSet: '{ upc }',
            fieldName: '_productByUpc',
            args: ({ upc }) => ({ upc }),
          }
        },
        batch: true
      },
      {
        schema: reviewsSchema,
        merge: {
          User: {
            selectionSet: '{ id }',
            fieldName: '_usersById',
            argsFromKeys: (ids) => ({ ids }),
            key: ({ id }) => id,
          },
          Product: {
            selectionSet: '{ upc }',
            fieldName: '_productByUpc',
            args: ({ upc }) => ({ upc }),
          },
        },
        batch: true
      }],
    mergeTypes: true,
  });

  it('gets most stocked product', async () => {
    const result = await graphql(
      stitchedSchema, `
        query {
          topProducts(first: 2) {
            upc
            shippingEstimate
          }
          mostStockedProduct {
            upc
            shippingEstimate
          }
        }
      `,
      undefined,
      {},
    );

    console.log(JSON.stringify(result, null, 2))
  });
});

@gmac
Contributor Author

gmac commented Sep 10, 2020

Hmmm... actually, it looks like the present implementation is back to only visiting each subschema once, or else isolation isn't working properly anymore.

This has proper shipping estimates but no inStock:

  it('gets most stocked product', async () => {
    const result = await graphql(
      stitchedSchema, `
        query {
          topProducts(first: 2) {
            upc
            shippingEstimate
            inStock
          }
          mostStockedProduct {
            upc
            shippingEstimate
            inStock
          }
        }
      `,
      undefined,
      {},
    );

    console.log(JSON.stringify(result, null, 2))
  });

While this has inStock with no shipping estimates (note selection order):

  it('gets most stocked product', async () => {
    const result = await graphql(
      stitchedSchema, `
        query {
          topProducts(first: 2) {
            upc
            inStock
            shippingEstimate
          }
          mostStockedProduct {
            upc
            inStock
            shippingEstimate
          }
        }
      `,
      undefined,
      {},
    );

    console.log(JSON.stringify(result, null, 2))
  });

It would appear that we're back to only visiting each subservice once here, and the two portions of the inventory schema are together only visited once?
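
To illustrate what I'd expect conceptually (a purely hypothetical sketch with made-up names, not the actual graphql-tools internals): the inventory subschema should effectively be treated as two parts sharing one executor, so the gateway can hit the base portion early and come back later for isolated fields once their dependencies are gathered.

```typescript
// Hypothetical sketch: one subschema config split into a "base" part and
// an "isolated" part sharing the same endpoint. Names are illustrative.
interface SubschemaPart {
  fields: string[];
  requiredSelectionSets: Record<string, string>;
}

const inventoryBase: SubschemaPart = {
  // Fields with no gateway-provided dependencies; safe to visit early.
  fields: ['inStock', 'mostStockedProduct'],
  requiredSelectionSets: {},
};

const inventoryIsolated: SubschemaPart = {
  // Fields that must wait until their dependencies are collected elsewhere.
  fields: ['shippingEstimate'],
  requiredSelectionSets: { shippingEstimate: '{ price weight }' },
};

// A merge pass should treat these as two visits to the same endpoint,
// not collapse them into a single greedy visit.
const visitOrder = [inventoryBase, /* ...other services... */ inventoryIsolated];
```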

@yaacovCR
Collaborator

yaacovCR commented Sep 10, 2020

There's potentially no single products database; as you mentioned regarding stale data, there could be multiple, especially with remote schemas not under our control.

So which is stale? Preferring the database with exported fields is one choice, but it is perfectly valid for the product object to contain metadata that is not available via GraphQL, yet possibly more accurate, for calculating the shipping estimate.

I think defaulting the option to true makes sense if you are in control of all the subschemas, as in your use case, but potentially false when dealing with remote schemas not under your control.

The easiest thing is putting it under an option, set either once for the entire stitched schema, per subschema, or per field...

Instead of a flag, we can also consider putting the selectionSet data for fields meant to be isolated under something other than fields, like gatewayFields; that is essentially enabling the option on a per-field basis...
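
A hypothetical shape for that idea (gatewayFields is just the name floated here, not a shipped graphql-tools option): the presence of an entry under gatewayFields would itself opt the field into isolation.

```typescript
// Hypothetical merge config sketch; `gatewayFields` is the name suggested
// above, not an actual graphql-tools option.
const inventoryMergeConfig = {
  Product: {
    selectionSet: '{ upc }',
    fieldName: '_products',
    key: ({ upc, weight, price }: any) => ({ upc, weight, price }),
    argsFromKeys: (representations: any[]) => ({ representations }),
    // Any field listed here would be isolated so that its selection set
    // is always sent by the gateway before the field is resolved.
    gatewayFields: {
      shippingEstimate: { selectionSet: '{ price weight }' },
    },
  },
};
```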

As for the bug, we'll have to investigate! All tests pass under the current implementation, so it looks like we need more tests prior to shipping.

@yaacovCR
Collaborator

Can you add the failing tests and I will work on it?

@yaacovCR
Collaborator

yaacovCR commented Sep 10, 2020

I could not reproduce your failing test; in the meantime I switched the test that I did add to use the field selection set isolation method.

I renamed the property switch to useGatewayData

@yaacovCR
Collaborator

It's a bit snazzier than enableFieldSelectionSetIsolation, and it explains what is being done rather than how.

@gmac
Contributor Author

gmac commented Sep 10, 2020

I guess I still don't think that a single useGatewayData flag is very intuitive as an option. It's too arcane. I still feel pretty strongly that the pattern that makes the most sense would be to activate this on a per-field basis, as the feature was originally written: only fields with an @requires or a required: true attribute would be isolated, and the behavior would be implicit in the presence of a requirement. Having an alternate fields namespace like gatewayFields would also work here, although it creates its own confusion. Personally, I think @requires or field: { selectionSet: ..., required: true } is extremely expressive of the config and not something you're going to do accidentally; and it makes the system extremely granular (one subservice-level config doesn't give field-level control, which is more useful than I had originally imagined).

Then to some points above:

So which is stale? Preferring the database with exported fields is one choice, but it is perfectly valid for the product object to contain metadata that is not available via GraphQL, yet possibly more accurate, for calculating the shipping estimate.

Sure, but in this case there's no reason to put an @requires on the field. If you have an internal cache of data that you trust, there's no need to require the field to come from the gateway. If it's an either/or where you may want to use gateway data or prefer your own, that's fine too: it's quite easy to differentiate a database record from a representation sent by the gateway (though we've admittedly been munging that distinction in our simple tests), at which point you can write logic that selects one over the other. This resolver discussion is kind of burying the simple function of an @requires field: the directive simply means that the gateway always sends the required data. What the resolver does with it is beside the point.

I think defaulting the option to true makes sense if you are in control of all the subschemas, as in your use case, but potentially false when dealing with remote schemas not under your control.

I guess I'd disagree here. Let's say we're consuming a third-party federation service and we read a field marked with @requires... that is not a soft requirement; the field expects that data to be sent by the gateway. We have to always send that data if we're going to successfully consume the field. The only case where we wouldn't need to isolate a field is the off-chance that someone is using an @requires directive in a non-federation context, at which point we've isolated the field for no reason (benign). At this point, @requires feels ubiquitous enough as federation behavior that I think it makes sense to trigger isolation in response to it. Also, nothing says that static config couldn't be written as fieldName: { selectionSet: ..., required: false }, and we'd always believe static config over SDL directives.

So, all told – I still think this feature makes the most sense as it was originally written: a field marked as required is isolated for the simple purpose of making it always send its required data. No required fields means nothing is isolated.
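
Concretely, the shape being described might look something like this (a sketch of the proposal only; required is the name under discussion here, not a finalized graphql-tools option, while @requires is federation's existing directive):

```typescript
// Sketch of the proposed per-field config: `required: true` means the
// gateway must always send the field's selection set, which implicitly
// isolates the field. Only the flagged field is isolated.
const inventoryMerge = {
  Product: {
    selectionSet: '{ upc }',
    fieldName: '_products',
    key: ({ upc, weight, price }: any) => ({ upc, weight, price }),
    argsFromKeys: (representations: any[]) => ({ representations }),
    fields: {
      shippingEstimate: { selectionSet: '{ price weight }', required: true },
    },
  },
};

// The SDL equivalent mirrors federation's directive:
const typeDefs = /* GraphQL */ `
  type Product {
    upc: String!
    inStock: Boolean
    shippingEstimate: Int @requires(fields: "price weight")
  }
`;
```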

@yaacovCR
Collaborator

useGatewayData: false with field selection sets is still useful, because when the field is not requested, the gateway does not need to add the selection set.

That would break a schema that is not expecting it when the fresh data is at the remote subschema...

But it sounds like you are agreeing that it is okay to have a flag with this behavior off by default.

I think it makes sense for the directive to be called requires, but within the mergeConfig that name kind of doesn't make sense, because all fields "require" the selection set; it makes sense to call it federate: true or something like that...

@gmac
Contributor Author

gmac commented Sep 10, 2020

Just added this test here that demonstrates the failure: e1533fc

When querying topProducts (which originates in the products service), only one of shippingEstimate or inStock will be resolved, whichever field is selected first. This leads me to believe that merging is not visiting both the base and the computed portions of the inventory schema, perhaps because they share an endpoint? Both fields work together on mostStockedProduct, which originates in the inventory service and returns to it in a later generation with the required field.

@yaacovCR
Collaborator

And then the default for the requires directive could be federate: true, with federate: false available optionally.

@gmac
Contributor Author

gmac commented Sep 10, 2020

but it sounds like you are agreeing that it is okay to have a flag with this behavior off by default

I still don't fully understand the subservice-level flag; but at the field level, yes, I would agree that this should be a field-level flag that is off by default and opted into. I guess I'm not willing to go to battle over what the field is called in field-level config, but I still think @requires and required: true make the most sense for parity and because the selection set is required to be sent for the field (i.e., it is not an optional "send it if you're going that direction anyway" kind of field).

So does this mean you have to enable a subservice-level flag in order to activate @requires behavior? If so, it feels weird to me that there's a setting to activate a setting.

@yaacovCR
Collaborator

If we have federate or some other field-level flag, we don't need a subschema-level flag, correct.

@yaacovCR
Collaborator

The problem with required as a name in the merge config is that setting it to false is a bit misleading: the gateway is still required to request the selection set in order to resolve the field, unless the data originates from that subservice...

Within the directive, requires is the name and we are never setting it to false; we would have a different flag, perhaps with a different default, such as federate, to indicate whether we are using the isolating behavior... This sounds like the best option to me...

@yaacovCR
Collaborator

Not quite sure about that failing test; I wonder what happens when you turn batching off... Hopefully I will be able to take a look at it in a few hours.

@yaacovCR
Collaborator

Sorry, I force-pushed; I probably should have merged master.

@yaacovCR
Collaborator

#1989 merged!

@yaacovCR yaacovCR added the waiting-for-release Fixed/resolved, and waiting for the next stable release label Sep 13, 2020
@yaacovCR
Collaborator

yaacovCR commented Oct 1, 2020

Available in latest release...

@yaacovCR yaacovCR closed this as completed Oct 1, 2020
@yaacovCR yaacovCR removed the waiting-for-release Fixed/resolved, and waiting for the next stable release label Oct 1, 2020