New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
lib/client.js this.activeQuery sometimes null #3174
Comments
First of all: sorry that's happening. That's troubling to hear. Do you have any self contained way to reproduce the issue or more about your environment you could describe? Having a way to reproduce this would be wonderful as I'm sure its an easy fix when it comes down to it. That code is fine to completely skip or comment out on a fork of your own. I doubt you're even using the Are you using pg-bouncer? I'll take a look at the code a bit more in the coming weeks & keep this issue in mind. I'm in the process of spinning down my full-time job right now so I can focus more on open source, starting early April! |
Hello again and thanks for the quick reply!! After further review of past 24 hours we have also found another seemingly related trace:
This seems to be pointing to: node-postgres/packages/pg/lib/client.js Line 382 in 91de4b9
Another optimistic edit would be change this: _handleCommandComplete(msg) {
// delegate commandComplete to active query
this.activeQuery.handleCommandComplete(msg, this.connection)
} To this: _handleCommandComplete(msg) {
// delegate commandComplete to active query
if (this.activeQuery) {
this.activeQuery.handleCommandComplete(msg, this.connection)
}
} More about our environment: We are Timescale customers and we use their managed db as a service In our prod environment we do make use of pgbouncer we are at version 1.21.0-p2 Additionally we are not using the pg native option Thanks again! |
Many of our services use a mix of SELECT, UPDATE, and INSERT however I can say that for one of the services which has been effected today this service is 100% SELECT statements and the service is not using prepared statements. |
Thanks! The only thing I can think of that's triggering this is postgres response messages are coming in "out of order" and the client is getting confused. Not sure if pgbouncer or pgproxy is the culprit or something else but this is how the (extended) protocol works (mostly found here)
Only when the pg client receives My hunch (might be wildly wrong) is that one of the proxies or pg-bouncer is getting the postgres response messages out of order or sent to the wrong client? Particularly when you say this:
If a service isn't using prepared statements then its never even getting into the extended query protocol and those clients should never receieve a |
I think something I could work on with ya at some point possibly is at least making this not a catastrophic uncatchable error which is likely crashing the node process, but instead makes the client emit an error you can handle. This way you could catch it and then (I would highly recommend) close the client out & throw it away as it somehow got its internal state machine poisoned w/ what is likely out of order messaging (which is v strange) |
OK I've already forked the repo and I've started down the path of making some edits to the lib which we can deploy to prod
About this we use pg.Pool in every service which uses pg, is it possible to tell the pool to destroy/evict a specific connection? Thanks!! |
OK I've modified lib/client.js to check that this.activeQuery is non-null before every access and I think this code is probably good to deploy to the SELECT-only prod service. About the "stalled connection" we use |
Note that “prepared statements” in the sense of the extended query protocol also includes any query with a parameter. Like the other times this error has come up in issues: I would definitely not suggest ignoring the problem by checking for
is absolutely a possibility.
|
Hey @charmander thanks for writing back! OK I was not aware that parameterized queries count towards prepared statements we do use 99% parameterized queries We have a support plan with Timescale and I can point them towards this issue to see if maybe they can help server-side So since we are in the context of the extended query protocol it does make sense to me that maybe the naive approach of skipping over the Like results/rows for one query are returned to the wrong caller If you have any recommendations of further instrumentation/debugging we can add to our stack to investigate this we'll do what we can to get it in place Thanks again!! |
Could it be of any use to trial the pg native feature with our SELECT only service ? |
You could try that out - it might not have the problem. It might throw away messages out of order internally or be more lax w/ them in some way? Note: node-libpq doesn't compile on node 20 yet - that's also on my list for next month!
Two ways to do this:
Sorry yeah I should have been more clear - any query which contains parameters goes through the "extended query protocol."
In side I gotta go to bed - got my eyes dialated today and man alive it sucks to read. I'll be back w/ more during the day tomorrow. Sucks so bad we can't reproduce this in any isolated way - would really love to figure out what this is and fix it. |
Hey @brianc oh my yes I remember the last time I had my eyes dilated and I had to find my way onto the subway after it was not easy! Last night just before I passed out something @charmander raised about the parameterized query made me remember about the new prepared statement support in pgbouncer.
https://www.pgbouncer.org/config.html#max_prepared_statements In prod we are using pgbouncer 1.21.0 so the max_prepared_statements config param is supported but we have not specified a value for this in our config and the docs say the default value is zero (disabled). For pgbouncer we use the "transaction" pooling mode and when it was first recommended to us to use pgbouncer we asked about this parameterized queries and it was explained to us that parameterized queries dont count towards prepared statements but it seems maybe that was not correct. It seems likely to me from both of your responses that we should probably give Just a question about this bit
I think what they are saying is if the client lib is using the binary protocol to prepare statements then pgbouncer will rewrite and support them but if the library is using text strings this pgbouncer feature will not work with transaction pooling mode, do you read this the same way and can you say if the library is using the binary protocol or text strings? About rolling this out I am going to spend the day doing some testing locally with Thanks!! |
yeah I am pretty sure what that's saying is if you put a As far as I know there's not a lot of difference between a prepared statement and a parameterized query. From my understanding every time you execute a parameterized query you technically create a prepared statement without a name so every parameterized query replaces the unnamed prepared statement. If you do this await client.query({
text: 'SELECT $1::text',
params: ['hello world'],
}) you end up telling postgres
However, if you do this await client.query({
text: 'SELECT $1::text',
params: ['hello world'],
name: 'hello world query'
}) you end up telling postgres
the next time you execute a query on the same client named For example this: await client.query({
text: 'SELECT $1::text',
params: ['goodbye for now'],
name: 'hello world query'
})
Effectively you skip the parse step (which is where the query is planned for the most part AFAIK). This saves 1 network roundtrip and perhaps a millisecond or two for query planning. I wouldn't recommend using this all the time, but I've used in the past when we had a couple extremely large query texts (like 1000 lines long) and we execute that query regularly, we can skip the network round-trip of sending a 100kb of query text over the network hundreds of time a second. In my experience its a micro-optimization that doesn't really do much...but anyways I'm just getting into the weeds here to explain how things work, and to explain why I often conflate the terms "prepared statement" and "parameterized query" - they're very, very similar from a protocol perspective. |
@brianc thanks a lot for this explanation and how you describe it makes good sense to me, I spent this morning trying to create a POC which would reproduce the bug locally however I have yet to reproduce it maybe you have some ideas on what can be changed to increase the chance of reproducing https://github.com/rhodey/pgbouncetest What the test does
The idea is to create a large number of different parameterized queries and exec them in parallel while connected to pgbouncer with max_prepared_statements disabled. What I think is happening in prod is that ConnectionA is getting prepared statement messages which are meant for a query on ConnectionB do you agree? My laptop is relatively low performance w/ 4 CPUs and I was unable to have any of the children exit with an error, I had a co-worker with a 24 CPU machine run this test for 5 minutes and they also had no children processes crash. Note that pgbouncer max_db_connections is set to 2 which I believe is what you'd want to maximize the chance of one connection seeing prepared statement messages from another connection. I would be real happy to reproduce this issue locally and then have the max_prepared_statements = 200 config prevent the problem and then it would be a no brainer to deploy this to prod but so far that hasn't happened I think if today goes by and we are still unable to reproduce the issue locally before tonight we may still end up deploying a modified pgbouncer with max_prepared_statements = 200 Actually in prod today we have yet to see a single instance of either of these errors turn up, not since last night March 20 22:26 Eastern Time Something about this test is that it is unlikely that any of the children get one or more rows returned from the random queries just based on how the code has been designed, I did not think it was important to have the queries returning data but perhaps you would know better and I could refactor, Thanks again!! |
Hello All, thanks to all the contributors who keep this project going, I have an issue to report
Before today I ran into this issue in prod approx 6 times total in ~12 months, but today for some reason this issue has turned up six times.
We have approx 24 services with 200 tasks running on AWS ECS and 90% of these services make use of this pg library in one way or another, another maybe important note is that they all use the pg.Pool
So the stack trace we get in prod from multiple services is:
So most of that stack trace does not say a lot but I think what it is saying is to see this line here:
node-postgres/packages/pg/lib/client.js
Line 389 in 91de4b9
And it must be saying that
this.activeQuery
is nullNow I could open a PR and modify this function:
And simply change it to:
There is a chance that is all that is needed but someone more knowledgeable about the library would know if this is maybe a symptom of a larger problem. Anyone online care to comment ??
I am concerned about patching the library and testing it out on our prod stack because I don't want my edit to cause the lib to somehow be confused and make a serious error.
Can anyone comment on if this edit seems like a safe thing to try we are not able to reproduce this outside of prod.
I can say that the services effected today none of them use transactions maybe that is helpful info
pg version 8.10.0
Thanks!!
The text was updated successfully, but these errors were encountered: