
Batch big IN/NOT IN selections #575

Merged
merged 4 commits into from Mar 17, 2020

Conversation

Contributor

@pimeys pimeys commented Mar 11, 2020

This fixes the issues with joining and fetching a large selection of records, where the database can no longer handle queries of that size.

What it does: it walks the filter, and if it finds exactly ONE scalar filter that is an IN/NOT IN with more than 5000 items, it splits that filter into batches of at most 5000 items and runs the resulting queries in parallel.

So this is going to be batched:

```graphql
query {
  findManyUser(where: { id_lt: 10000 }) {
    posts(where: { id_lt: 10000 }) {
      id
    }
  }
}
```

The batch size can be controlled via QUERY_BATCH_SIZE environment variable for test purposes.
Hopefully on Friday I can get the last query working, and also write tests for this somehow.
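The splitting described above can be sketched roughly like this. This is a hypothetical illustration: the function and constant names are made up and are not the actual prisma-engines code.

```rust
// Sketch of the batching idea; names are illustrative, not the real API.
const DEFAULT_BATCH_SIZE: usize = 5000;

/// Batch size, overridable via the QUERY_BATCH_SIZE environment variable.
fn batch_size() -> usize {
    std::env::var("QUERY_BATCH_SIZE")
        .ok()
        .and_then(|v| v.parse().ok())
        .unwrap_or(DEFAULT_BATCH_SIZE)
}

/// Split the values of an IN/NOT IN filter into chunks of at most
/// `batch_size()` items; each chunk becomes its own query, and the
/// queries then run in parallel.
fn split_in_values(values: &[i64]) -> Vec<Vec<i64>> {
    // .max(1) guards against a zero batch size, which would make chunks() panic.
    values.chunks(batch_size().max(1)).map(|c| c.to_vec()).collect()
}

fn main() {
    let ids: Vec<i64> = (0..12_000).collect();
    let batches = split_in_values(&ids);
    assert_eq!(batches.len(), 3);
    assert_eq!(batches[0].len(), 5000);
    assert_eq!(batches[2].len(), 2000);
    println!("{} batches", batches.len());
}
```

With 12,000 ids and the default size of 5000, this yields two full batches and one of 2000.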

Contributor

@tomhoule tomhoule left a comment


I'm not familiar enough with the implementation of the filters yet to say if there's high level problems with the design, but it looks good to me 💯

One super nitpicky thing: the PR description is helpful; it would be nice to have it in the commit message for future reference (sometimes I do use that :p)

Contributor


I was thinking a bit more about this: if the list here is large (which it is if we are batching), we iterate over it element by element, but if we computed the indexes for the batches up front, we could iterate over those and push whole batches at a time instead of one element at a time. The per-element version pays costs like the conditional and the last_mut lookup (with its bounds check) on every item. But that's an optimization we can do later, once we've decided we do want to do it this way and the API settles down.
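A rough sketch of the two approaches being compared; the types and function names here are illustrative, not the actual code under review:

```rust
// Per-element batching, roughly what the PR does: every value pays for a
// length check and a bounds-checked last_mut() lookup.
fn batch_per_element(values: Vec<i64>, max: usize) -> Vec<Vec<i64>> {
    let mut batches: Vec<Vec<i64>> = vec![Vec::new()];
    for v in values {
        if batches.last().map_or(false, |b| b.len() >= max) {
            batches.push(Vec::new());
        }
        batches.last_mut().unwrap().push(v);
    }
    batches
}

// Index-based batching as suggested: compute the chunk boundaries once
// and copy whole slices, avoiding the per-element bookkeeping.
fn batch_by_index(values: &[i64], max: usize) -> Vec<Vec<i64>> {
    values.chunks(max).map(|c| c.to_vec()).collect()
}

fn main() {
    let values: Vec<i64> = (0..7).collect();
    // Both versions produce the same batches for non-empty input.
    assert_eq!(batch_per_element(values.clone(), 3), batch_by_index(&values, 3));
}
```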

Contributor Author


Yeah, this isn't the final version; it's the first version that somehow works. Now we need some discussion about this. The first limitation I'd like to lift is that this doesn't work if we have more than one IN statement in the filters.

Julius de Bruijn and others added 2 commits March 16, 2020 12:24
This fixes the issues with joining and fetching a large selection
of records, where the database can no longer handle queries of that size.

What it does: it walks the filter, and if it finds exactly ONE scalar
filter that is an `IN`/`NOT IN` with more than 5000 items, it splits that
filter into batches of at most 5000 items and runs the queries in parallel.

So this is going to be batched:

```graphql
query {
  findManyUser(where: { id_lt: 10000 }) {
    posts(where: { id_lt: 10000 }) {
      id
    }
  }
}
```

The batch size can be controlled via `QUERY_BATCH_SIZE` environment
variable for test purposes.
Contributor

@dpetrick dpetrick left a comment


Apart from the discussed ordering issues, this looks alright.

Contributor Author

pimeys commented Mar 16, 2020

Addresses #579

Contributor

@tomhoule tomhoule left a comment


My only doubt is whether this will interact badly with other features and we'd miss it, because the default is 5000 and most tests don't have that many records in the database.

libs/prisma-models/src/order_by.rs
```rust
}

if let Some(ref order_by) = order {
    records.order_by(order_by)
```
Contributor


Hmm so here we actually sort ourselves right?

Contributor Author


yep

@pimeys pimeys merged commit fb04976 into master Mar 17, 2020
@pimeys pimeys deleted the batching branch March 17, 2020 11:35
Contributor Author

pimeys commented Mar 17, 2020

> My only doubt is whether this will interact badly with other features and we'd miss it, because the default is 5000 and most tests don't have that many records in the database.

We'll keep an eye on this. For now, with lots of manual testing and the integration tests, I was able to find two problems:

  • The selection values should be deduplicated, which is fixed here
  • The ordering needs to happen in the application layer (connector), which is fixed here
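The two fixes could be sketched like this. The helper names and the record type are hypothetical, not the actual connector code:

```rust
use std::collections::HashSet;

// Fix 1: deduplicate the IN selection values before batching, keeping
// the first occurrence of each value in its original position.
fn dedup_values(values: Vec<i64>) -> Vec<i64> {
    let mut seen = HashSet::new();
    values.into_iter().filter(|v| seen.insert(*v)).collect()
}

// Fix 2: once the parallel batch results are merged, the database's
// ORDER BY no longer covers the combined set, so the application layer
// (the connector) sorts the merged records itself.
fn order_merged_records(records: &mut Vec<(i64, String)>) {
    records.sort_by_key(|(id, _)| *id);
}

fn main() {
    assert_eq!(dedup_values(vec![3, 1, 3, 2, 1]), vec![3, 1, 2]);

    let mut recs = vec![(2, "b".to_string()), (1, "a".to_string())];
    order_merged_records(&mut recs);
    assert_eq!(recs[0].0, 1);
}
```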
