Extended Query: Support Batch Execution #2257

Open
mpareja opened this issue Jun 26, 2020 · 12 comments · May be fixed by #2942

Comments

mpareja commented Jun 26, 2020

Hi!

The Extended Query protocol enables drivers to submit multiple BIND messages before a SYNC. One of the big benefits of Extended Queries is that you can efficiently send a batch of executions without waiting on a round trip for each one. Pair that with prepared statements and, simplifying a bit: you send a single PARSE, a bunch of BIND/EXECUTE pairs, and one SYNC to find out how things went.

In other words, you'd be able to support something like the following without needing 4 entire round trips. (I'm not recommending this API since it would be a terrible breaking change.)

await client.query({
  name: 'my_query',
  text: 'insert into sometable (id, val) values ($1, $2)',
  values: [
    [ 1, 'asdf' ],
    [ 2, 'fdsa' ],
    [ 3, 'qwer' ],
    [ 4, 'uytr' ]
  ]
})

For more information, check out how the JDBC Postgres driver handles a batched execution. There are a few layers to dig through, but this appears to be the core of the code that sends a batch of messages and subsequently sends a single SYNC. NOTE: their driver imposes a limit of 128 records per batch as (apparently) further batching does not improve performance.
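For the example above, the wire traffic for the whole batch would fit in a single round trip, roughly like this (a sketch of frontend protocol messages, not pg API calls):

Parse("my_query", "insert into sometable (id, val) values ($1, $2)")
Bind("my_query", [1, 'asdf'])   Execute
Bind("my_query", [2, 'fdsa'])   Execute
Bind("my_query", [3, 'qwer'])   Execute
Bind("my_query", [4, 'uytr'])   Execute
Sync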

@rightaway

Isn't it already possible in this case? #1190 (comment)

mpareja (Author) commented Sep 8, 2021

While the two approaches look similar, they have very different performance characteristics. Sending a large number of SQL statements with different parameters will perform much worse than sending a single prepared statement and binding many parameter sets to it.
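
To illustrate: with today's API, a named prepared statement already avoids re-parsing, but every execution still pays a full round trip. A sketch, assuming client is a connected pg Client and reusing the hypothetical table from the example above:

// The statement is parsed once on first use (thanks to `name`), but
// each iteration is still its own Bind/Execute/Sync round trip.
for (const vals of [[1, 'asdf'], [2, 'fdsa'], [3, 'qwer'], [4, 'uytr']]) {
  await client.query({
    name: 'my_query',
    text: 'insert into sometable (id, val) values ($1, $2)',
    values: vals,
  })
}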

brianc (Owner) commented Sep 8, 2021

yeah a proper "batched query" would be nice. Probably a separate object you pass to client.query or pool.query, something like

const batch = new BatchQuery({
  name: 'optional',
  text: 'INSERT INTO foo (bar) VALUES ($1)',
  values: [
    ['first'],
    ['second']
  ]
})
const result = await client.query(batch)
for (const res of result) {
  for (const row of res) {
  }
}

Then the batch query execution could throw if validation like this fails:

for (const row of config.values) {
  if (!Array.isArray(row)) {
    throw new Error('Batch commands require each set of values to be an array. e.g. values: any[][]')
  }
}

something like that. Then it would be explicit.

@damianobarbati

@brianc is this supported?

brianc (Owner) commented Aug 29, 2022

> is this supported?

from a protocol perspective, yes. But I haven't actually implemented the code yet.

@iamkhush

Hi @brianc, I am interested in contributing since I believe this would be helpful for my use case in production.
Any guidance would be appreciated.

brianc (Owner) commented Aug 29, 2022

That'd be cool! I'd suggest making this a separate module like pg-cursor or pg-query-stream. It's fine to inline it into this repo as another module here, but best to keep it out of the core of pg to keep bloat to a minimum. So, w/ that in mind, we can look at pg-cursor to see how to do something like this...

Basically anything passed to client.query will be sniffed to see if it has a submit function. If it does, that function is called, passing in the connection object. From that point forward it can fully take over the underlying connection object (which is basically low-level functions to send/receive postgres packets directly) and do anything it wants. Once it emits end, it'll need to have cleaned up after itself. It's not the world's most well-designed API, particularly from my current skill level, but it is what I came up w/ many years ago, and in the interest of backwards compatibility it is what is there today.
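
To make that concrete, here is a rough sketch of a submittable batch query in that style, modeled loosely on pg-cursor. The BatchQuery class is hypothetical; the connection methods (parse, bind, execute, sync) and the handle* callbacks follow pg's internal Connection/query contract, which is not a stable public API and may differ between versions:

const { EventEmitter } = require('events')

class BatchQuery extends EventEmitter {
  constructor({ text, values }) {
    super()
    this.text = text
    this.values = values
    this._results = []
  }

  // pg calls this with the low-level connection when the query is dequeued
  submit(connection) {
    connection.parse({ text: this.text })
    for (const row of this.values) {
      // real code would run each value through pg's prepareValue first
      connection.bind({ values: row })
      connection.execute({ rows: 0 })
    }
    connection.sync() // single Sync for the whole batch
  }

  // pg routes incoming protocol messages for the active query to these
  handleRowDescription(msg) {}
  handleDataRow(msg) { this._results.push(msg.fields) }
  handleCommandComplete(msg) {} // one per Execute
  handleError(err) { this.emit('error', err) }
  handleReadyForQuery() { this.emit('end', this._results) } // after Sync
}

As with pg-cursor, client.query(new BatchQuery(...)) would hand back the submitted object itself, so the caller listens for 'end' and 'error' (or wraps them in a promise).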

@aleclarson

It would be great if this feature was implemented with support for multiple queries, rather than multiple value arrays only.

const batch = new BatchQuery({
  name: 'optional',
  queries: [
    ['INSERT INTO foo (bar) VALUES ($1)', ['first']],
    ['DELETE FROM foo WHERE id = $1', [1]],
  ],
})

@charmander (Collaborator)

@aleclarson That would be pipelining, not multiple bind.
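
In protocol terms, the two shapes would look roughly like this (a sketch of frontend message flow, not pg API calls):

pipelining (the queries array above):
  Parse("INSERT INTO foo ...")  Bind  Execute
  Parse("DELETE FROM foo ...")  Bind  Execute
  Sync

multiple bind (this issue):
  Parse("INSERT INTO foo ...")
  Bind  Execute
  Bind  Execute
  Sync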

@abenhamdine (Contributor)

for pipelining, see the experiment in #2706

@iamkhush linked a pull request Mar 29, 2023 that will close this issue
@iamkhush

Hey @brianc @mpareja, I have put up a PR.
Can you please have a look at it and provide feedback?

Remarkably, the bench.ts file shows inserts getting a 100% increase in queries per second.
