Greedy selection of merged fields with dependencies #1959

Closed
gmac opened this issue Aug 28, 2020 · 66 comments
Comments

@gmac
Contributor

gmac commented Aug 28, 2020

It appears that fields with selection dependencies are greedily selected early if merging visits their service prior to collecting their dependencies. Logistically we'd need to visit their service but not collect them, go elsewhere for dependencies, and then return for the dependent fields.

Actually, I'm not sure this is specifically an issue with field-level selections. Using a single static selectionSet: '{ id networkId }' also has the same issue. Either way, I may have led us down a bad road here with the nudge to support federation-style injection patterns – your ethos of "if a service has a field it should be able to fulfill it" avoids this completely. If that's a fundamental divide in the goals of the algorithm, it would be totally reasonable if you want to stick to your original path and drop official support for these dependency patterns. We could pull injected keys from docs and all mentions of Federation similarities, letting it just be its own thing.

import { graphql } from 'graphql';
import { makeExecutableSchema } from '@graphql-tools/schema';
import { stitchSchemas } from '@graphql-tools/stitch';

describe('federation pattern', () => {

  const contentSchema = makeExecutableSchema({
    typeDefs: `
      type Post {
        id: ID!
        title: String!
        networkId: ID!
      }
      type Query {
        _postsContent(ids: [ID!]!): [Post]!
      }
    `,
    resolvers: {
      Query: {
        _postsContent: (root, args) => args.ids.map(id => ({ id, title: `Post`, networkId: Number(id)+1 })),
      },
    },
  });

  const layoutSchema = makeExecutableSchema({
    typeDefs: `
      type Post {
        id: ID!
        network: Network! @deprecated(reason: "gateway access only")
      }
      type Network {
        id: ID!
        name: String!
      }

      scalar Any
      type Query {
        postLayout(id: ID!): Post
        _partials(representations: [Any!]!): [Post]!
      }
    `,
    resolvers: {
      Query: {
        postLayout: (root, args) => ({ id: args.id }),
        _partials: (obj, args) => args.representations,
      },
      Post: {
        network(post, args, ctx, info) {
          return {
            id: post.networkId,
            name: `network`
          };
        }
      }
    },
  });

  const stitchedSchema = stitchSchemas({
    subschemas: [
      {
        schema: contentSchema,
        merge: {
          Post: {
            selectionSet: '{ id }',
            fieldName: '_postsContent',
            key: ({ id }) => id,
            argsFromKeys: (ids) => ({ ids }),
          }
        }
      },
      {
        schema: layoutSchema,
        merge: {
          Post: {
            selectionSet: '{ id }',
            fields: {
              network: { selectionSet: '{ networkId }' }
            },
            fieldName: '_partials',
            key: ({ id, networkId }) => ({ id, networkId }),
            argsFromKeys: (representations) => ({ representations }),
          }
        }
      },
    ],
    mergeTypes: true,
  });

  test('works', async () => {
    // enters through contentSchema and
    // collects field dependencies for layout service
    const result = await graphql(
      stitchedSchema, `
        query {
          _postsContent(ids: [23]) {
            id
            title
            network {
              id
            }
          }
        }
      `,
    );

    console.log(JSON.stringify(result, null, 2))
  });

  test('doesnt work', async () => {
    // enters through layoutSchema and attempts
    // to make the full selection including network,
    // which has an unmet field dependency
    const result = await graphql(
      stitchedSchema, `
        query {
          postLayout(id: 23) {
            id
            title
            network {
              id
            }
          }
        }
      `,
    );

    console.log(JSON.stringify(result, null, 2))
  });
});
@yaacovCR
Collaborator

not sure this is a field selection set issue

our algorithm currently just assumes that each subservice should be visited only once

it therefore subtracts the fields from the subschemas visited so far and visits only subschemas not yet visited

but, even without a field selection set, i.e. if networkId is moved to the object-level selectionSet, the originating subservice in this example must be visited twice

this sounds like a true bug in the algorithm
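The visit-once behavior described above can be modeled in a small sketch. This is a toy planner with invented names, not the actual graphql-tools implementation: each "service" owns some Post fields, and `network` depends on `networkId`, which only the content service provides. The greedy planner takes every requested field a service owns on first visit and never returns, so entering through the layout service selects `network` before its dependency has been fetched:

```typescript
// Toy model of the visit-once planning problem (names invented; this is
// not the graphql-tools planner itself).
type Service = { name: string; fields: Set<string>; deps: Record<string, string[]> };

const content: Service = { name: "content", fields: new Set(["id", "title", "networkId"]), deps: {} };
const layout: Service = { name: "layout", fields: new Set(["id", "network"]), deps: { network: ["networkId"] } };

// Greedy visit-once planner: walk services in order, take every requested
// field a service owns, and never revisit a service.
function planVisitOnce(requested: string[], services: Service[]): Array<[string, string[]]> {
  const remaining = new Set(requested);
  const plan: Array<[string, string[]]> = [];
  for (const svc of services) {
    const take = Array.from(remaining).filter(f => svc.fields.has(f));
    take.forEach(f => remaining.delete(f));
    if (take.length > 0) plan.push([svc.name, take]);
  }
  return plan;
}

// Entering through the layout service (as in the failing `postLayout` test),
// `network` is taken on the first visit even though `networkId` has not been
// fetched from the content service yet -- the dependency is unmet.
const plan = planVisitOnce(["id", "title", "network"], [layout, content]);
```

Satisfying the dependency would require visiting the layout service a second time after collecting `networkId`, which the subtraction-based plan never does.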

@yaacovCR
Collaborator

Ah, I see: not a bug in the field selectionSet, and not a bug in the algorithm either... We just assumed that each subservice can truly work independently, which differs from federation...

Deferring field resolution could be considered as an enhancement, of course with algorithm changes, but I guess for now working as designed

@gmac
Contributor Author

gmac commented Aug 28, 2020 via email

@gmac
Contributor Author

gmac commented Aug 29, 2020

Circling back, to distill down the insights learned here... it sounds like what we have are static and dynamic types:

  • Static types have entirely static (disk) data furnished by their service, and they work fine.
  • Dynamic types build entirely dynamic data based on input from the gateway, and they also work fine.

So where we run into trouble is with a hybrid type that has both static and dynamic data. This doesn’t work because the static data could originate in a subservice without gateway input present during the delegation round. I think the docs can be updated to articulate this differentiation. I do think there’s a strong practical case to be made for considering a staged resolution approach that builds around dependency groups rather than just schema membership in the long run.

@yaacovCR
Collaborator

Hmmm. Not sure I understand all of above, but I'm looking forward to reading through the example more and trying to get it.

What I am wondering from example above is whether the network id should be wrapped in a network type by transform prior to merging...

Would that solve/change anything?

@yaacovCR
Collaborator

I think I understand a bit more and that it wouldn't.

Overall, would like to get query batching in play to replace this federation-type behavior.

I think the easiest pattern to teach and implement is where each GraphQL schema is valid on its own. Would be in favor of docs noting that federation type schemas are not necessarily recommended.

@gmac
Contributor Author

gmac commented Aug 30, 2020 via email

@gmac
Contributor Author

gmac commented Aug 30, 2020 via email

@yaacovCR
Collaborator

yaacovCR commented Aug 31, 2020

I think best would be to provide some generic options to configure how types and fields are merged.

For example, this code picks the last description for a type, and lifts into the gateway the last version of each field from across the subschemas:

const descriptions = pluck<string>('description', candidates);
const description = descriptions[descriptions.length - 1];

const configs = candidates.map(candidate => (candidate.type as GraphQLObjectType).toConfig());
const fields = configs.reduce<GraphQLFieldConfigMap<any, any>>(
  (acc, config) => ({
    ...acc,
    ...config.fields,
  }),
  {}
);

Something like onTypeConflict for when types are not merged, maybe onDescriptionConflict, onFieldConflict, etc.
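The hook idea above might look something like this. Note that `onFieldConflict` is a proposed option, not an existing stitchSchemas API; the `FieldCandidate` shape and all names here are invented for illustration. The hook would replace the last-one-wins reduce shown above with a caller-supplied merge policy:

```typescript
// Hypothetical sketch of an onFieldConflict hook (invented names; not an
// existing graphql-tools option). Candidates are the per-subschema versions
// of one field, reduced pairwise by the hook.
type FieldCandidate = { subschemaName: string; description?: string; deprecationReason?: string };

type OnFieldConflict = (left: FieldCandidate, right: FieldCandidate) => FieldCandidate;

function mergeFieldCandidates(candidates: FieldCandidate[], onFieldConflict: OnFieldConflict): FieldCandidate {
  // Pairwise reduction: the hook decides how each conflict resolves.
  return candidates.reduce((acc, next) => onFieldConflict(acc, next));
}

// Example policy: keep the first non-empty description, and drop deprecations
// (cf. the "gateway access only" deprecation pattern in this thread).
const result = mergeFieldCandidates(
  [
    { subschemaName: "layout", description: "The post network", deprecationReason: "gateway access only" },
    { subschemaName: "content", deprecationReason: undefined },
  ],
  (left, right) => ({
    subschemaName: right.subschemaName,
    description: left.description ?? right.description,
    deprecationReason: undefined,
  })
);
```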

yaacovCR added a commit that referenced this issue Aug 31, 2020
WIP:

When `batch` is set to true for a given subschemaConfig, batches all delegated root fields into a combined request passed to the executor. Moreover, batches all requests to a given subschema into the minimum number of requests, collecting queries and mutations separately, preserving operation order. Distributes properly pathed errors to the originating requests.

Adapted from Gatsby query batcher by @vladar.

Caveats:
* Uses a Dataloader under the hood, which is created anew upon each request -- relies on a unique context argument per request to make this happen!
* Does not pass `info` argument to the batched executor call, disabling any executors that used extensions annotated onto info.

TODO:
- Add testing!
- Extensions support should be added by a custom option?

Related:

gatsbyjs/gatsby#22347 (comment)
#1710 (comment)
#1959 (comment)
#1954
yaacovCR added a commit that referenced this issue Aug 31, 2020
@yaacovCR yaacovCR mentioned this issue Aug 31, 2020
Merged
yaacovCR added a commit that referenced this issue Aug 31, 2020
@gmac
Contributor Author

gmac commented Aug 31, 2020

onDescriptionConflict would be perfect. An onDeprecationConflict would also be useful.

yaacovCR added a commit that referenced this issue Aug 31, 2020
WIP:

When `batch` is set to true for a given subschemaConfig, batches all delegated root fields into a combined request passed to the executor. Moreover, batches all requests to a given subschema into the minimum number of requests, collecting queries and mutations separately, preserving operation order. Distributes properly pathed errors to the originating requests.

Adapted from Gatsby query batcher by @vladar.

Caveats:
* Uses a Dataloader under the hood, which is created anew upon each request -- relies on a unique context argument per request to make this happen!
* Passed `info` argument from first executor call to the batched executor call, making info argument unreliable.

TODO:
- Add testing!

Related:

gatsbyjs/gatsby#22347 (comment)
#1710 (comment)
#1959 (comment)
#1954

fix
yaacovCR added a commit that referenced this issue Aug 31, 2020
yaacovCR added a commit that referenced this issue Aug 31, 2020
@yaacovCR
Collaborator

Hmmm. I am wondering if instead of using onDescriptionConflict and onDeprecatedConflict, it would be better to have one simple onFieldConflict option...

I think I would prefer that...

@gmac
Contributor Author

gmac commented Aug 31, 2020 via email

@yaacovCR
Collaborator

Ah, made a mistake already, onDescriptionConflict is with respect to types, while onDeprecatedConflict is with respect to fields. Have to have a longer think about this, your thoughts welcome. The idea would be to make these as generic as possible.

yaacovCR added a commit that referenced this issue Aug 31, 2020
@yaacovCR
Collaborator

I see, so there’s already an onTypeConflict that can handle type-level docstring mashing? This is another prime area for documentation explaining how to do this.

Not exactly. onTypeConflict is a function that can be used when stitching schemas without merging types; it is meant to select a type when there is a conflict. The parameters given are the two conflicting types, as well as metadata about the schemas from whence they came, and the result should be a new type.

When merging types, the fields and attributes require a host of custom merging that stitchSchemas gives you for free, but because of that, onTypeConflict does not exactly work. I am just trying to think of a way we could make this as generic as possible.

I think we will need separate functions for each type attribute that could conflict, i.e. onTypeDescriptionConflict but that we could have a single generic function for fields that conflict, i.e. onFieldConflict.

@yaacovCR
Collaborator

The default onTypeConflict behavior when stitching btw is to just take the last type.

yaacovCR added a commit that referenced this issue Aug 31, 2020
yaacovCR added a commit that referenced this issue Aug 31, 2020
yaacovCR added a commit that referenced this issue Aug 31, 2020
yaacovCR added a commit that referenced this issue Aug 31, 2020
@gmac
Contributor Author

gmac commented Aug 31, 2020 via email

@gmac
Contributor Author

gmac commented Sep 4, 2020 via email

@gmac
Contributor Author

gmac commented Sep 4, 2020

I think your mention of a plugins system is on the money. An interface that works something like this would be really modular and extensible:

const gatewaySchema = stitchSchemas({
  subschemas: [ ... ],
  mergeTypes: true,
  plugins: [
    new ComputedFieldsPlugin(),
    new CombineDescriptionsPlugin({ excludePrefix: '!omit' }),
    new UndeprecateFieldsPlugin({ withReason: 'gateway access only' }),
  ]
});

@yaacovCR
Collaborator

yaacovCR commented Sep 5, 2020

I imagine each plugin would take each SubschemaConfig object as a parameter and return a modified version for the next plugin in the pipeline?

@gmac
Contributor Author

gmac commented Sep 5, 2020

I was envisioning that plugins would be at the level of the stitched schema, not the level of subschemas. That way a plugin could tap into numerous aspects of the stitched configuration. I just pushed an update to #1989 that demonstrates the proposed pattern.

@yaacovCR
Collaborator

yaacovCR commented Sep 5, 2020

Actually, I guess the plugins would have to take the whole options object as input and return a modified version.

Should we allow plugins to add additional plugins then? I would imagine not 🙂
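The pipeline described here could be sketched as a simple reduce over the options object. This is a minimal sketch under stated assumptions: `StitchOptions`, `Plugin`, and `transform` are invented names, not an existing graphql-tools API.

```typescript
// Hypothetical plugin pipeline: each plugin takes the whole stitchSchemas
// options object and returns a modified version for the next plugin.
// All names here are illustrative, not a real graphql-tools interface.
type StitchOptions = { subschemas: string[]; mergeTypes: boolean; plugins?: Plugin[] };

interface Plugin {
  transform(options: StitchOptions): StitchOptions;
}

function applyPlugins(options: StitchOptions): StitchOptions {
  // Only the originally configured plugin list runs: plugins added by
  // other plugins during the pipeline are deliberately ignored.
  const plugins = options.plugins ?? [];
  return plugins.reduce((opts, plugin) => plugin.transform(opts), options);
}

// Example: a plugin that flips mergeTypes on.
const enableMergePlugin: Plugin = {
  transform: opts => ({ ...opts, mergeTypes: true }),
};

const finalOptions = applyPlugins({
  subschemas: ["accounts", "products"],
  mergeTypes: false,
  plugins: [enableMergePlugin],
});
```

Reading the plugin list once up front is what makes "plugins adding plugins" a non-feature here: the pipeline length is fixed before the first transform runs.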

@gmac
Contributor Author

gmac commented Sep 5, 2020 via email

@yaacovCR
Collaborator

yaacovCR commented Sep 9, 2020

See my latest changes in #1989: it appears that field selection set isolation can cause a subschema that properly returns its fields to fail, if it returns new data not present in the other subschemas.

mostStockedProduct: () => ({ upc: '3', price: 1, weight: 8560 }),

When a product with upc of 4 is returned from the inventory schema along with price and weight, the inventory schema on its own can calculate the shippingEstimate. If you enable field selection set isolation, the gateway cannot, because upc of 4 is not present in productSchema. Similarly, for upc of 3, when enabling field selection set isolation, price and weight are read from the product schema, even though supplied by the inventory schema, so results may differ from schema to gateway.

What should the default behavior be? Presumably, it should be up to configuration on the gateway, but I do not think it is obvious that isolation should be enabled by default. It depends on the expectations of how the schema resolvers were written. If, for example, price and weight are never returned, presumably field selection set isolation should be enabled. But that is difficult to automatically introspect....

@gmac
Contributor Author

gmac commented Sep 10, 2020

Trying to follow along... it's a little confusing because I don't see a upc of 4 in that example anymore. But I think I follow what you're saying, though I'm not convinced that it matches the goals of computed fields...

When a product with upc of 4 is returned from the inventory schema along with price and weight, the inventory schema on its own can calculate the shippingEstimate

The inventory schema doesn't provide fields for price and weight in its schema, so I don't feel there's any reasonable expectation that it should be able to return early with a precomputed value. When fields are marked as @required, the expectation is that the gateway will send them their dependency data, by design. Any local data caches are presumably untrusted and therefore ignored.

If you enable field selection set isolation, the gateway cannot, because upc of 4 is not present in productSchema

This sounds like a data error by any standard that has nothing to do with computed fields. Products schema is the only place that defines price and weight fields... If a record doesn't exist in that service, then there will always be an error when attempting to access price or weight on that record because the lookup will fail in the only service that provides the fields.

Similarly, for upc of 3, when enabling field selection set isolation, price and weight are read from the product schema, even though supplied by the inventory schema, so that results may differ from schema to gateway.

This sounds right to me for computed fields flow. The only thing that sounds off is that inventory schema is doing anything with local price and weight values when the expectation is that the gateway provides those values.

Bottom line: when a field is marked as required, it is expecting its value to come from the gateway (and, in turn, from some other service elsewhere). The developer has enabled a hard @requires because we presumably wouldn't trust a local value even if we had one (this is that exact case I was telling you about: while my Entry may have a cache of data, I can't trust it not to be stale, and therefore want the data to come from the source-of-truth service via the gateway). All that is to say: extra back-and-forth is often preferable for data accuracy over doing the least amount of work possible to return the data in an untrusted form. The @requires mechanism is most definitely not a feature that promotes efficiency; it is a tool that enables accuracy.

@gmac
Contributor Author

gmac commented Sep 10, 2020

ALL THAT SAID – I feel like we're getting into a gray area here where we've added a pretty high-touch feature that (with all due respect) doesn't sound like you're super comfortable with the implementation or use of. I wouldn't be the least bit offended if we were to scrap #1989 and scratch it off the board. It's going to be a bigger liability than boon in the long term if the primary author isn't comfortable with it being there. Type merging is still damn good without it, and personally I can navigate my way around without it there. I was only promoting it because I saw an opportunity for a general purpose feature that others looking for a federation alternative could benefit from.

@gmac
Contributor Author

gmac commented Sep 10, 2020

Just to make a final appeal here before scrapping this or shipping an enableFieldSelectionSetIsolation option (which I don't think makes any intuitive sense as an option)... I think your implementation of a dependency pattern was quite a ways off, and you were encountering errors of the example's own creation. This is what a federation implementation looks like... would you consider backing out the enableFieldSelectionSetIsolation stuff and running the original code against this setup? The mostStockedProduct field you added works as expected with enableFieldSelectionSetIsolation turned on...

Main note from the original test:

mostStockedProduct: () => ({ upc: '3', price: 1, weight: 8560 }),

^^ That service isn't realistic for several reasons. The inventory array represents a database, so this mostStockedProduct is returning a record that doesn't match the database schema... then, the inventory service doesn't offer price or weight GraphQL fields, and does offer inStock, which isn't on that returned object. The way this would realistically work is that upc 3 would be looked up from the local database, i.e.: mostStockedProduct: () => inventory.find(i => i.upc === '3'). All other fields would be resolved from other services, and shippingEstimate gets resolved onto the most stocked product after visiting the products schema for the required fields specified.

import { graphql } from 'graphql';

import { makeExecutableSchema } from '@graphql-tools/schema';
import { ExecutionResult } from '@graphql-tools/utils';
import { stitchSchemas } from '@graphql-tools/stitch';

describe('merging using type merging', () => {
  const users = [
    {
      id: '1',
      name: 'Ada Lovelace',
      birthDate: '1815-12-10',
      username: '@ada'
    },
    {
      id: '2',
      name: 'Alan Turing',
      birthDate: '1912-06-23',
      username: '@complete',
    },
  ];

  const accountsSchema = makeExecutableSchema({
    typeDefs: `
      type Query {
        me: User
        _userById(id: ID!): User
        _usersById(ids: [ID!]!): [User]
      }
      type User {
        id: ID!
        name: String
        username: String
      }
    `,
    resolvers: {
      Query: {
        me: () => users[0],
        _userById: (_root, { id }) => users.find(user => user.id === id),
        _usersById: (_root, { ids }) => ids.map((id: any) => users.find(user => user.id === id)),
      },
    },
  });

  const inventory = [
    { upc: '1', inStock: true },
    { upc: '2', inStock: false },
    { upc: '3', inStock: true }
  ];

  const inventorySchema = makeExecutableSchema({
    typeDefs: `
      directive @requires(selectionSet: String!) on FIELD_DEFINITION

      input ProductRepresentation {
        upc: String!
        price: Int
        weight: Int
      }
      type Product {
        upc: String!
        inStock: Boolean
        shippingEstimate: Int @requires(selectionSet: "{ price weight }")
      }
      type Query {
        mostStockedProduct: Product
        _products(representations: [ProductRepresentation!]!): [Product]!
      }
    `,
    resolvers: {
      Product: {
        shippingEstimate: product => {
          if (product.price > 1000) {
            return 0 // free for expensive items
          }
          return Math.round(product.weight * 0.5) || null; // estimate is based on weight
        }
      },
      Query: {
        mostStockedProduct: () => inventory.find(i => i.upc === '3'),
        _products: (_root, { representations }) => {
          return representations.map(rep => ({ ...rep, ...inventory.find(i => i.upc === rep.upc) }));
        },
      },
    },
  });

  const products = [
    {
      upc: '1',
      name: 'Table',
      price: 899,
      weight: 100
    },
    {
      upc: '2',
      name: 'Couch',
      price: 1299,
      weight: 1000
    },
    {
      upc: '3',
      name: 'Chair',
      price: 54,
      weight: 50
    }
  ];

  const productsSchema = makeExecutableSchema({
    typeDefs: `
      type Query {
        topProducts(first: Int = 2): [Product]
        _productByUpc(upc: String!): Product
        _productsByUpc(upcs: [String!]!): [Product]
      }
      type Product {
        upc: String!
        name: String
        price: Int
        weight: Int
      }
    `,
    resolvers: {
      Query: {
        topProducts: (_root, args) => products.slice(0, args.first),
        _productByUpc: (_root, { upc }) => products.find(product => product.upc === upc),
        _productsByUpc: (_root, { upcs }) => upcs.map((upc: any) => products.find(product => product.upc === upc)),
      }
    },
  });

  const usernames = [
    { id: '1', username: '@ada' },
    { id: '2', username: '@complete' },
  ];

  const reviews = [
    {
      id: '1',
      authorId: '1',
      product: { upc: '1' },
      body: 'Love it!',
    },
    {
      id: '2',
      authorId: '1',
      product: { upc: '2' },
      body: 'Too expensive.',
    },
    {
      id: '3',
      authorId: '2',
      product: { upc: '3' },
      body: 'Could be better.',
    },
    {
      id: '4',
      authorId: '2',
      product: { upc: '1' },
      body: 'Prefer something else.',
    },
  ];

  const reviewsSchema = makeExecutableSchema({
    typeDefs: `
      type Review {
        id: ID!
        body: String
        author: User
        product: Product
      }
      type User {
        id: ID!
        username: String
        numberOfReviews: Int
        reviews: [Review]
      }
      type Product {
        upc: String!
        reviews: [Review]
      }
      type Query {
        _userById(id: ID!): User
        _usersById(ids: [ID!]!): [User]
        _reviewById(id: ID!): Review
        _productByUpc(upc: String!): Product
        _productsByUpc(upcs: [String!]!): [Product]
      }
    `,
    resolvers: {
      Review: {
        author: (review) => ({ __typename: 'User', id: review.authorId }),
      },
      User: {
        reviews: (user) => reviews.filter(review => review.authorId === user.id),
        numberOfReviews: (user) => reviews.filter(review => review.authorId === user.id).length,
        username: (user) => {
          const found = usernames.find(username => username.id === user.id)
          return found ? found.username : null
        },
      },
      Product: {
        reviews: (product) => reviews.filter(review => review.product.upc === product.upc),
      },
      Query: {
        _reviewById: (_root, { id }) => reviews.find(review => review.id === id),
        _userById: (_root, { id }) => ({ id }),
        _usersById: (_root, { ids }) => ids.map((id: string) => ({ id })),
        _productByUpc: (_, { upc }) => ({ upc }),
        _productsByUpc: (_, { upcs }) => upcs.map((upc: string) => ({ upc })),
      },
    }
  });

  const stitchedSchema = stitchSchemas({
    subschemas: [
      {
        schema: accountsSchema,
        merge: {
          User: {
            selectionSet: '{ id }',
            fieldName: '_userById',
            args: ({ id }) => ({ id })
          }
        },
        batch: true
      },
      {
        schema: inventorySchema,
        enableFieldSelectionSetIsolation: true,
        merge: {
          Product: {
            selectionSet: '{ upc }',
            fieldName: '_products',
            key: ({ upc, weight, price }) => ({ upc, weight, price }),
            argsFromKeys: (representations) => ({ representations }),
          }
        },
        batch: true
      },
      {
        schema: productsSchema,
        merge: {
          Product: {
            selectionSet: '{ upc }',
            fieldName: '_productByUpc',
            args: ({ upc }) => ({ upc }),
          }
        },
        batch: true
      },
      {
        schema: reviewsSchema,
        merge: {
          User: {
            selectionSet: '{ id }',
            fieldName: '_usersById',
            argsFromKeys: (ids) => ({ ids }),
            key: ({ id }) => id,
          },
          Product: {
            selectionSet: '{ upc }',
            fieldName: '_productByUpc',
            args: ({ upc }) => ({ upc }),
          },
        },
        batch: true
      }],
    mergeTypes: true,
  });

  it('gets most stocked product', async () => {
    const result = await graphql(
      stitchedSchema, `
        query {
          topProducts(first: 2) {
            upc
            shippingEstimate
          }
          mostStockedProduct {
            upc
            shippingEstimate
          }
        }
      `,
      undefined,
      {},
    );

    console.log(JSON.stringify(result, null, 2))
  });
});

@gmac
Contributor Author

gmac commented Sep 10, 2020

Hmmm... actually, it looks like the present implementation is back to only visiting each subschema once, or else isolation isn't working properly anymore.

This has proper shipping estimates but no inStock:

  it('gets most stocked product', async () => {
    const result = await graphql(
      stitchedSchema, `
        query {
          topProducts(first: 2) {
            upc
            shippingEstimate
            inStock
          }
          mostStockedProduct {
            upc
            shippingEstimate
            inStock
          }
        }
      `,
      undefined,
      {},
    );

    console.log(JSON.stringify(result, null, 2))
  });

While this has inStock with no shipping estimates (note selection order):

  it('gets most stocked product', async () => {
    const result = await graphql(
      stitchedSchema, `
        query {
          topProducts(first: 2) {
            upc
            inStock
            shippingEstimate
          }
          mostStockedProduct {
            upc
            inStock
            shippingEstimate
          }
        }
      `,
      undefined,
      {},
    );

    console.log(JSON.stringify(result, null, 2))
  });

It would appear that we're back to only visiting each subservice once here, and the two portions of the inventory schema are together only visited once?
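
To illustrate what I'd expect conceptually (a purely hypothetical sketch with made-up names, not the actual graphql-tools internals): the inventory subschema should effectively be treated as two parts sharing one executor, so the gateway can hit the base portion early and come back later for isolated fields once their dependencies are gathered.

```typescript
// Hypothetical sketch: one subschema config split into a "base" part and
// an "isolated" part sharing the same endpoint. Names are illustrative.
interface SubschemaPart {
  fields: string[];
  requiredSelectionSets: Record<string, string>;
}

const inventoryBase: SubschemaPart = {
  // Fields with no gateway-provided dependencies; safe to visit early.
  fields: ['inStock', 'mostStockedProduct'],
  requiredSelectionSets: {},
};

const inventoryIsolated: SubschemaPart = {
  // Fields that must wait until their dependencies are collected elsewhere.
  fields: ['shippingEstimate'],
  requiredSelectionSets: { shippingEstimate: '{ price weight }' },
};

// A merge pass should treat these as two visits to the same endpoint,
// not collapse them into a single greedy visit.
const visitOrder = [inventoryBase, /* ...other services... */ inventoryIsolated];
```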

@yaacovCR
Collaborator

yaacovCR commented Sep 10, 2020

There's potentially no single products database; as you mentioned regarding stale data, there could be multiple, especially with remote schemas not under our control.

So which is stale? Preferring the database with exported fields is one choice, but it is perfectly valid for the product object to contain metadata that is not available via GraphQL, yet possibly more accurate, for calculating the shipping estimate.

I think defaulting the option to true makes sense if you are in control of all the subschemas, as in your use case, but potentially false when dealing with remote schemas not under your control.

The easiest thing is putting it under an option, set either once for the entire stitched schema, per subschema, or per field...

Instead of a flag, we can also consider putting the selectionSet data for fields meant to be isolated under something other than fields, like gatewayFields; that is essentially enabling the option on a per-field basis...
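
A hypothetical shape for that idea (gatewayFields is just the name floated here, not a shipped graphql-tools option): the presence of an entry under gatewayFields would itself opt the field into isolation.

```typescript
// Hypothetical merge config sketch; `gatewayFields` is the name suggested
// above, not an actual graphql-tools option.
const inventoryMergeConfig = {
  Product: {
    selectionSet: '{ upc }',
    fieldName: '_products',
    key: ({ upc, weight, price }: any) => ({ upc, weight, price }),
    argsFromKeys: (representations: any[]) => ({ representations }),
    // Any field listed here would be isolated so that its selection set
    // is always sent by the gateway before the field is resolved.
    gatewayFields: {
      shippingEstimate: { selectionSet: '{ price weight }' },
    },
  },
};
```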

As for the bug, we'll have to investigate! All tests pass under the current implementation, so it looks like we need more tests prior to shipping.

@yaacovCR
Collaborator

Can you add the failing tests and I will work on it?

@yaacovCR
Collaborator

yaacovCR commented Sep 10, 2020

I could not reproduce your failing test; in the meantime I switched the test that I did add to use the field selection set isolation method.

I renamed the property switch to useGatewayData

@yaacovCR
Collaborator

It's a bit snazzier than enableFieldSelectionSetIsolation, and it explains what is being done rather than how.

@gmac
Contributor Author

gmac commented Sep 10, 2020

I guess I still don't think that a single useGatewayData flag is very intuitive as an option. It's too arcane. I still feel pretty strongly that the pattern that makes the most sense would be to activate this on a per-field basis, as the feature was originally written: only fields with an @requires or a required: true attribute would be isolated, and the behavior would be implicit in the presence of a requirement. Having an alternate fields namespace like gatewayFields would also work here, although it creates its own confusion. Personally, I think @requires or field: { selectionSet: ..., required: true } is extremely expressive of the config and not something you're going to do accidentally; and it makes the system extremely granular (one subservice-level config doesn't give field-level control, which is more useful than I had originally imagined).

Then to some points above:

So which is stale? Preferring the database with exported fields is one choice, but it is perfectly valid for the product object to contain metadata that is not available via GraphQL, yet possibly more accurate, for calculating the shipping estimate.

Sure, but in this case there's no reason to put an @requires on the field. If you have an internal cache of data that you trust, there's no need to require the field to come from the gateway. If it's an either/or where you may want to use gateway data or prefer your own, that's fine too: it's quite easy to differentiate a database record from a representation sent by the gateway (though we've admittedly been munging that distinction in our simple tests), at which point you can write logic that selects one over the other. This resolver discussion is kind of burying the simple function of an @requires field: the directive simply means that the gateway always sends the required data. What the resolver does with it is beside the point.

I think defaulting the option to true makes sense if you are in control of all the subschemas, as in your use case, but potentially false when dealing with remote schemas not under your control.

I guess I'd disagree here. Let's say we're consuming a third-party federation service and we read a field marked with @requires... that is not a soft requirement; the field expects that data to be sent by the gateway. We have to always send that data if we're going to successfully consume the field. The only case where we wouldn't need to isolate a field is the off-chance that someone is using an @requires directive in a non-federation context, at which point we've isolated the field for no reason (benign). At this point, @requires feels ubiquitous enough as federation behavior that I think it makes sense to trigger isolation in response to it. Also, nothing says that static config couldn't be written as fieldName: { selectionSet: ..., required: false }, and we'd always believe static config over SDL directives.

So, all told – I still think this feature makes the most sense as it was originally written: a field marked as required is isolated for the simple purpose of making it always send its required data. No required fields means nothing is isolated.
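
Concretely, the shape being described might look something like this (a sketch of the proposal only; required is the name under discussion here, not a finalized graphql-tools option, while @requires is federation's existing directive):

```typescript
// Sketch of the proposed per-field config: `required: true` means the
// gateway must always send the field's selection set, which implicitly
// isolates the field. Only the flagged field is isolated.
const inventoryMerge = {
  Product: {
    selectionSet: '{ upc }',
    fieldName: '_products',
    key: ({ upc, weight, price }: any) => ({ upc, weight, price }),
    argsFromKeys: (representations: any[]) => ({ representations }),
    fields: {
      shippingEstimate: { selectionSet: '{ price weight }', required: true },
    },
  },
};

// The SDL equivalent mirrors federation's directive:
const typeDefs = /* GraphQL */ `
  type Product {
    upc: String!
    inStock: Boolean
    shippingEstimate: Int @requires(fields: "price weight")
  }
`;
```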

@yaacovCR
Collaborator

useGatewayData: false with field selection sets is still useful, because when the field is not requested, the gateway does not need to add the selection set.

That would break a schema that is not expecting it when the fresh data is at the remote subschema...

But it sounds like you are agreeing that it is okay to have a flag with this behavior off by default.

I think it makes sense for the directive to be called requires, but within the mergeConfig that name kind of doesn't make sense, because all fields "require" the selection set; it makes sense to call it federate: true or something like that...

@gmac
Contributor Author

gmac commented Sep 10, 2020

Just added this test here that demonstrates the failure: e1533fc

When querying topProducts (which originates in the products service), only one of shippingEstimate or inStock will be resolved, whichever field is selected first. This leads me to believe that merging is not visiting both the base and the computed portions of the inventory schema, perhaps because they share an endpoint? Both fields work together on mostStockedProduct, which originates in the inventory service and returns to it in a later generation with the required field.

@yaacovCR
Collaborator

And then the default for the requires directive could be federate: true, with federate: false available optionally.

@gmac
Contributor Author

gmac commented Sep 10, 2020

but it sounds like you are agreeing that it is okay to have a flag with this behavior off by default

I still don't fully understand the subservice-level flag; but at the field level, yes, I would agree that this should be a field-level flag that is off by default and opted into. I guess I'm not willing to go to battle over what the field is called in field-level config, but I still think @requires and required: true make the most sense for parity and because the selection set is required to be sent for the field (i.e., it is not an optional "send it if you're going that direction anyway" kind of field).

So does this mean you have to enable a subservice-level flag in order to activate @requires behavior? If so, it feels weird to me that there's a setting to activate a setting.

@yaacovCR
Collaborator

If we have federate or some other field-level flag, we don't need a subschema-level flag, correct.

@yaacovCR
Collaborator

The problem with required as a name in the merge config is that setting it to false is a bit misleading: the gateway is still required to request the selection set in order to resolve the field, unless the data originates from that subservice...

Within the directive, requires is the name and we are never setting it to false; we would have a different flag, perhaps with a different default, such as federate, to indicate whether we are using the isolating behavior... This sounds like the best option to me...

@yaacovCR
Collaborator

Not quite sure about that failing test; I wonder what happens when you turn batching off... Hopefully I will be able to take a look at it in a few hours.

@yaacovCR
Collaborator

Sorry, I force-pushed; I probably should have merged master.

@yaacovCR
Collaborator

#1989 merged!

@yaacovCR yaacovCR added the waiting-for-release Fixed/resolved, and waiting for the next stable release label Sep 13, 2020
@yaacovCR
Collaborator

yaacovCR commented Oct 1, 2020

Available in latest release...

@yaacovCR yaacovCR closed this as completed Oct 1, 2020
@yaacovCR yaacovCR removed the waiting-for-release Fixed/resolved, and waiting for the next stable release label Oct 1, 2020