Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Select only necessary columns for optimized queries #1356

Open
kplgr opened this issue May 10, 2020 · 28 comments · May be fixed by #1626
Open

Select only necessary columns for optimized queries #1356

kplgr opened this issue May 10, 2020 · 28 comments · May be fixed by #1626
Labels
discussion Requires input from multiple people performance Fix or optimization of performance

Comments

@kplgr
Copy link

kplgr commented May 10, 2020

What problem does this feature proposal attempt to solve?
I am trying out GraphQL, testing various features in order to include it in some of the web apps I am developing with laravel. Being curious, I created a Schema and fired up all sorts of queries, followed by poking around in the SQL server to see what was going on.

What I discovered was this:

Suppose I have a Type defined which corresponds to a Model, and the database table has an ID column and 26 more columns (A, B C, ..., Z).

No matter what the Type definition is, which in my case only includes some of the columns ( say, ID and A, B, C), I can see that the SQL server is asked to fetch all 27 columns from the DB, using a SELECT * FROM [Table] statement, even though given the definition of the type at most 4 of them would be returned via the resolver.

This is an unnecessary data transfer between the DB and PHP, in my opinion.

Which possible solutions should be considered?

I propose that we construct a more refined SQL query, selecting only the columns that are requested in the GraphQL query.

I am not sure I am competent enough to try and implement this on my own, but I will give it a shot. In any case, I thought I'd share my thoughts.

@spawnia spawnia added the discussion Requires input from multiple people label May 10, 2020
@spawnia
Copy link
Collaborator

spawnia commented May 10, 2020

Technically, this can be done by constraining the SELECT to the requested fields. The fourth resolver argument ResolveInfo can be used to know which fields a client selected.

There are some pitfalls to this which require discussion:

  • Fields may not match database columns 1-1
  • Laravel assumes SELECT * by default for things like relations, virtual properties, etc.

@mazedlx
Copy link

mazedlx commented May 28, 2020

Technically, this can be done by constraining the SELECT to the requested fields. The fourth resolver argument ResolveInfo can be used to know which fields a client selected.

There are some pitfalls to this which require discussion:

* Fields may not match database columns 1-1

* Laravel assumes `SELECT *` by default for things like relations, virtual properties, etc.

Would you be so awesome as to provide an example? I've searched through the docs but I can't seem to find what I am looking for :-(

Let's say I have a model Nerd with the fields glasses and beard,

type Nerd {
    glasses: Int!
    beard: String!
}

and I want to query them with

type Query {
   nerds: [Nerd] @all
}

and I only want the SQL statement to be SELECT glasses, beard FROM nerds. Thanks in advance!

@spawnia
Copy link
Collaborator

spawnia commented May 28, 2020

public function __invoke($root, array $args, GraphQLContext $context, ResolveInfo $resolveInfo)
{
    $resolveInfo->getFieldSelection();
    $resolveInfo->lookAhead();
    // TODO use the information from those methods to SELECT only the given fields
}

@mazedlx
Copy link

mazedlx commented May 28, 2020

public function __invoke($root, array $args, GraphQLContext $context, ResolveInfo $resolveInfo)
{
    $resolveInfo->getFieldSelection();
    $resolveInfo->lookAhead();
    // TODO use the information from those methods to SELECT only the given fields
}

Thanks! Just one more stupid question: where do I put that method? On the Eloquent model?

@spawnia
Copy link
Collaborator

spawnia commented May 28, 2020

https://lighthouse-php.com/master/the-basics/fields.html

@mazedlx
Copy link

mazedlx commented May 28, 2020

Awesome, thanks!

@canatufkansu
Copy link

This kind of feature would be a huge performance improvement for the package. I'm implementing an endpoint for multi million record table with at least 50+ columns. I'm using query builder to make search in a given bounding box (https://wiki.openstreetmap.org/wiki/Bounding_Box) in PostgreSQL also implemented complex where conditions to filter the results. Only issue is Laravel generates SELECT * queries which affects the performance. I could use custom resolvers but when I do that I have to deal with pagination.

Following is my current Query schema:

listings(bbox: BoundingBoxInput! @builder(method: "App\\GraphQL\\Builders\\BoundingBoxSearch@bbSearch"), where: _ @whereConditions(columnsEnum: "ListingColumn")) : [Listing!]! @middleware(checks: ["auth:api"]) @paginate(maxCount: 50 defaultCount: 10)

Is there any way I can modify select columns without writing custom resolver ? Because by this way with very little code I can do so much.

@spawnia
Copy link
Collaborator

spawnia commented Jun 22, 2020

I am not planning to implement such functionality in Lighthouse myself, but am open to accepting PRs.

@canatufkansu
Copy link

I am not planning to implement such functionality in Lighthouse myself, but am open to accepting PRs.

I understand the complexity and side affects like you mentioned above about relations and virtual properties. It would be nice to have some way to control select columns like in query builder @builder directive. Thanks, even like this Lighthouse is very useful.

@spawnia spawnia added help wanted duplicate Already present in another issue labels Jun 22, 2020
@spawnia
Copy link
Collaborator

spawnia commented Jun 22, 2020

Just noticed this is a duplicate of #1214. An implementation attempt was also started #1253

@andershagbard
Copy link
Contributor

Can't quite see how this is duplicate.

Could this be solved by implementing a select directive?

class User extends Model
{
  public function getNameAttribute()
  {
    return $this->first_name . ' ' . $this->last_name;
  }
}
type User {
  id: ID!
  first_name: String
  last_name: String
  name: String @select(fields: ["first_name", "last_name"])
}

@spawnia spawnia changed the title Why select all the fields of a table, when resolving a query? Select only necessary columns for optimized queries Jul 28, 2020
@spawnia spawnia removed the duplicate Already present in another issue label Jul 28, 2020
@spawnia spawnia reopened this Jul 28, 2020
@spawnia
Copy link
Collaborator

spawnia commented Jul 28, 2020

@andershagbard right, there is some overlap between the issue, but the intent is different.

Could this be solved by implementing a select directive?

That would be one part of the puzzle, good thinking.

We might need a mechanism to make the optimized selects opt-in per field / type. Otherwise, it could be quite hard to migrate existing applications towards it, since now Lighthouse always selects all the fields.

@andershagbard
Copy link
Contributor

We might need a mechanism to make the optimized selects opt-in per field / type. Otherwise, it could be quite hard to migrate existing applications towards it, since now Lighthouse always selects all the fields.

Trying to think of a good solution for this.

Could this be an opt-in in the config file? If enabled it would use the field(s) in the @select, if @select is absent, use the field name.

@spawnia
Copy link
Collaborator

spawnia commented Jul 28, 2020

We might do that as an MVP. I am a bit worried about migrating big applications towards optimized selects, but maybe it is not a big issue in practice.

I do not plan to implement such a feature myself, but am happy to aid in reviewing and refining a PR.

@nuwave nuwave deleted a comment from vanthao03596 Sep 12, 2020
@DDynamic
Copy link

@spawnia I am working on an implementation for this, and I am having trouble accessing child directives via $resolveInfo. Accessing the directives property on any selection in a selectionSet returns an empty NodeList.

@spawnia
Copy link
Collaborator

spawnia commented Nov 23, 2020

@spawnia I am working on an implementation for this, and I am having trouble accessing child directives via $resolveInfo. Accessing the directives property on any selection in a selectionSet returns an empty NodeList.

That would give you access to client directives.

This feature requires looking at schema directives. Look at other Lighthouse directives to see how that works.

@DDynamic
Copy link

Could you give me an example of a parent directive searching through child directives? I can't think of any off the top of my head.

@spawnia
Copy link
Collaborator

spawnia commented Nov 23, 2020

I do not know what you mean by parent or child directives. Consider the FieldValue class and its uses, maybe you will find what you need there.

Try posting a draft PR and we can discuss implementation details.

@nuwave nuwave deleted a comment from petecoop Nov 26, 2020
@DDynamic DDynamic linked a pull request Nov 27, 2020 that will close this issue
3 tasks
@spawnia spawnia added the performance Fix or optimization of performance label Nov 27, 2020
@Paula2001
Copy link

I got to the point that I have created at trait that get the fields that should get selected but I am facing a serious issue with relations I can't apply the select method to belongsTo and hasOne and HasMany directives for some reason is there's any easier way to make the select rule more generic and we can apply it to all directives cause I don't see a way in here

@nuwave nuwave deleted a comment from egyjs Aug 15, 2021
@dmouse
Copy link

dmouse commented Dec 15, 2021

Hi all! other thing that I notice is about @count and @paginate:

if in the @count directive we add the columns to include in the count() method, you can choose some "indexed fields", this improve the performance

directive @count(
  """
  .....
  """

  """
  Columns
  """
  columns: [String!]
) on FIELD_DEFINITION
GRAPHQL;

something similar for @paginate might improve the performance, but in this case it's more complicated:

https://github.com/nuwave/lighthouse/blob/master/src/Pagination/PaginationArgs.php#L107

if we modify the below line to include the columns for the count process, the column parameter only affects the return columns and doesn't affect the count that execute the paginator.

return $builder->{$methodName}($this->first, ['row_id'], 'page', $this->page);

image

This is because the Laravel paginate method don't use the columns in the count execution. This is more a question for the Laravel issue queue, but maybe add another parameter in the paginate method to modify the columns in the count process

https://github.com/laravel/framework/blob/8.x/src/Illuminate/Database/Eloquent/Builder.php#L808
https://github.com/laravel/framework/blob/8.x/src/Illuminate/Database/Query/Builder.php#L2445

@dmouse
Copy link

dmouse commented Dec 20, 2021

Update: I opened a discussion about the paginate method laravel/framework#40114

@chadidi
Copy link

chadidi commented Dec 27, 2021

@dmouse that's just a misconception count(*) is not the same as SELECT *, check this link for more details: https://www.sqlservercentral.com/articles/advice-on-using-count

@nuwave nuwave deleted a comment from hmingv Jul 25, 2022
@Stevemoretz
Copy link
Contributor

public function __invoke($root, array $args, GraphQLContext $context, ResolveInfo $resolveInfo)
{
    $resolveInfo->getFieldSelection();
    $resolveInfo->lookAhead();
    // TODO use the information from those methods to SELECT only the given fields
}

Trying to do something similar and I found that brilliant piece you written here!!
Though what is the lookahead for? Any issues with using the following?

$resolveInfo->getFieldSelection(100);

It gives the result in the most beautiful output I could have imagined to get, not sure about performance issues...

@spawnia
Copy link
Collaborator

spawnia commented Nov 18, 2022

$resolveInfo->getFieldSelection(100);

That is absolutely fine, if it has the information you need. lookAhead() gives you more detailed information about the selection, such as the types of the sub-fields and their arguments.

@Stevemoretz
Copy link
Contributor

Stevemoretz commented Nov 18, 2022

$resolveInfo->getFieldSelection(100);

That is absolutely fine, if it has the information you need. lookAhead() gives you more detailed information about the selection, such as the types of the sub-fields and their arguments.

Thank you so much for explaining that, so now I know how to access arguments of sub-fields thanks to that method!
Just two questions to make my knowledge a little more about this,

  1. can we grab info from a parent-field in the tree from a sub-node?
$_, array $args, GraphQLContext $context, ResolveInfo $resolveInfo

$_ here gives the parent result, but can't access for instance parent args or args from parent of the parent...

  1. can we pass in data from upper-fields nodes to lower ones?

Let's say that I have an array which I got based on the info I was provided in an upper-field now in a sub node (in the same branch) I want to access those data, how could I pass data from upper node to lower nodes in the branch? I thought context was there for this but I was wrong!

@spawnia
Copy link
Collaborator

spawnia commented Nov 20, 2022

I think those questions are off-topic, you may ask in StackOverflow or Slack.

@spawnia
Copy link
Collaborator

spawnia commented Aug 29, 2023

Right now, two pull requests are trying to solve this issue:

Going forward, I think the minimal viable solution for this issue is to implement the manual approach first. A marker directive needs to be added to fields to inform Lighthouse which columns need to be selected for it, e.g.:

"""
Defines which columns must be selected from the database to resolve the field.

If specified on at least one field of a type, Lighthouse will `SELECT` only the explicitly
mentioned columns matching the queried fields. For fields without this directive,
no columns will be selected, so make sure to specify this for all fields that need it.
"""
directive @select(columns: [String!]!) on FIELD_DEFINITION

type User {
  id: ID! @select(columns: ["id"])
  first_name: String! @select(columns: ["first_name"])
  last_name: String! @select(columns: ["last_name"])
  full_name: String! @select(columns: ["first_name", "last_name"])
  rights: [Right!]! @hasMany
}

Only if the fields of a type are explicitly marked with @select will Lighthouse replace the default SELECT * with a constrained SELECT, based upon the fields selected in the query.

{ users { id last_name full_name } }
# SELECT id, last_name, first_name

@petecoop
Copy link

@spawnia I agree on your approach on it being an entirely opt-in feature through the @select.

I've been following this for a while as I have one table that has a couple of large blob columns (not the best design but it's a legacy project), without constraining the SELECT the performance is terrible. I've had to fix this through a builder that modifies the query, but some way of doing it as you've shown would be great.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
discussion Requires input from multiple people performance Fix or optimization of performance
Projects
None yet
Development

Successfully merging a pull request may close this issue.