Can't add new command when connection is in closed state #2321
Comments
Which db, and how to reproduce? Has it worked in some earlier version? Also, please try all knex versions to pinpoint the release where that feature started working differently. |
Updated. |
Ok, so it seems to be a new bug. Could you provide the code for reproducing this problem? |
I'm not sure how it happens, but when it does, even a simple query fails with this error. |
@elhigu - per #1833 (comment), I'm seeing this error when I have long-running connections (typically 1-3 days old) that have perhaps timed out or just disconnected. I'm also seeing:
... just like @tuananh |
Yes, it seems to be the same in my case as well. |
Hi, just today we've observed this (with the help of a few good ol' `console.log`s). As seen in the code above, the pool will only actually run the validator function if the `testOnBorrow` option is enabled. |
@rkaw92 sounds good, now we just need a test case, and to make knex enable that setting always. I will do that once I get the other bug I'm working on fixed first. |
I debugged issue #2325 last night; it is also pool related, and I already have a test case for it. I probably need to review the whole pool code and check why connection error exceptions are not passed to knex and why dead connections are not evicted. |
@elhigu As far as I can tell, knex does react to connection errors by setting a flag on the connection. Now, I'm also tracking this pg pull request, which fixes some pg-specific issues regarding connection loss; it may be relevant if you were testing with pg. |
Thanks for tracking this @rkaw92, appreciated. FWIW, I'm using mysql2. Hoping we can figure this out soon in our build -- at least once every day or two, the API process needs restarting because of this. Order forms, member dashboards, everything goes down. I wound up putting in a hard restart as a stopgap. |
@leebenson One thing to try without code modification is to pass `testOnBorrow` in the pool options:

```js
const db = require('knex')({
  client: '...',
  connection: '...',
  pool: { testOnBorrow: true }
})
```

This should tell you if it's caused by the connection recycling bug. Of course, it's just a temporary workaround, but it would be helpful if we can see whether it's this or another sneaky one. If successful, at the time when the connection fails, you're going to see this printed to console.log:
And the next thing should be the pool allocating a fresh connection for you. |
Good to know, thanks. Should I pass any other pool options alongside it? |
Yeah, you can pass other pool options, too. |
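For illustration only (not from the thread), here is roughly what a fuller pool config might look like; `min` and `max` are standard knex pool settings, and the values here are placeholders rather than recommendations:

```js
// Hypothetical example: min/max control pool size; testOnBorrow asks the pool
// to validate each connection before handing it out.
const db = require('knex')({
  client: 'mysql2',
  connection: process.env.DATABASE_URL, // placeholder connection string
  pool: {
    min: 2,
    max: 10,
    testOnBorrow: true
  }
})
```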
@rkaw92 I have been testing this workaround since yesterday morning and it seems to work fine. Before that I had to restart API processes at least once a day, otherwise the database connection would eventually close without reconnecting. |
Ah, drats. Seeing this in the logs, even with `testOnBorrow: true`:
This socket has been ended by the other party
|
It's possible - sockets do get closed, after all. If a connection fails between acquiring and returning to the pool, you are still going to get errors. The important thing is, is your pool getting starved?
One easy way to check if dead connections are being recycled is to enable `DEBUG=knex:client`, or manually log the client object's `__knexUid` property along with your errors. The Uid is allocated on connection creation, so if you get errors from the same Uid on the same knex instance, you know the pool is misbehaving.
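To make the manual-logging option concrete, here is a hedged sketch (using the `db` instance from the earlier snippet); it assumes, as was the case in knex versions of that era, that the `query-error` event payload exposes the `__knexUid` of the connection that ran the statement:

```js
// Sketch: count failures per pooled connection. If the same __knexUid keeps
// failing, the pool is recycling a dead connection instead of evicting it.
const failures = new Map()

db.on('query-error', (err, data) => {
  const uid = data && data.__knexUid
  const count = (failures.get(uid) || 0) + 1
  failures.set(uid, count)
  console.error(`query failed on connection ${uid} (failure #${count}): ${err.message}`)
})
```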
|
Thanks, I'll add the extra debug checks. And of course, sockets do get closed. But the expectation of a pool is that it's going to anticipate sockets being closed, and re-connect automatically/silently to reattempt the query. Isn't that the point of a pool? If something drops, handle the logic to reconnect? Throwing an error every time a long-running socket disconnects is definitely not expected. Connections will drop multiple times per day, which is expected. What's not expected is that I have to design for that scenario in code. I'm not even sure how that'd work. Wrap each SQL call in a block that traps an error, and reattempts the same SQL query? Implement timeouts? Handle retry logic? I figured all that stuff was the domain of a connection pool. |
@leebenson You're correct in saying that the job of the pool is to handle reconnection. However, it does this at very specific points: before handing out the client (connection) object to the user. When the client is in the user's hands, there are several reasons why it can't transparently reconnect or retry queries, chief among them that the connection may be carrying state (an open transaction, session variables) that a silent reconnect would throw away.
Assuming the pool now works (i.e. the connections it gives out are healthy at the time of dispensing from the pool), all that remains is handling errors that occur mid-transaction. A common strategy is to retry the entire transaction and make the transaction block self-sufficient (i.e. it assumes no previous state; the sketch below illustrates this). It doesn't seem too different from retrying transactions that fail to COMMIT because of serialization failures (in SERIALIZABLE), either.

Of course, if the periods between acquiring and releasing a connection from the pool in your code are supposed to be very brief (i.e. no long-lived transactions with prolonged idle periods mid-transaction), and you're still getting socket errors, something might be going on (you'd expect connections to time out when there's no activity, not in the middle of it!). Still, some load-balancing solutions (haproxy?) like to limit the max total duration of a connection and can be set to not care about idle-ness. Definitely check whether the connections you're getting are the same old broken ones, or new ones that just disconnect because of network conditions. Also, determining whether the DB is behind some kind of proxy might tell you something about who to blame for the disconnects. |
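A minimal sketch of that retry-the-whole-transaction idea (my own illustration, not code from the thread); the error matching is deliberately crude and would need adapting to the driver's real error codes:

```js
// Sketch: re-run a self-contained transaction block when the connection dies
// mid-transaction. The block must not depend on state outside the transaction,
// otherwise re-running it is not safe.
async function withTransactionRetry(knex, work, attempts = 3) {
  for (let attempt = 1; attempt <= attempts; attempt++) {
    try {
      return await knex.transaction(trx => work(trx))
    } catch (err) {
      const retryable = /closed state|ended by the other party|ECONNRESET/i.test(String(err.message))
      if (!retryable || attempt === attempts) throw err
      // otherwise loop: the pool should hand out a fresh connection next time
    }
  }
}

// Usage: withTransactionRetry(db, trx => trx('orders').insert({ total: 100 }))
```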
Thanks for the detailed insight, @rkaw92. In this particular case, the MySQL instance is hosted on Google Cloud. No proxies are being used. Something is definitely wrong, somewhere -- I'm still seeing the error even with `testOnBorrow: true` set.
Are there any code examples for handling timeouts and testing whether a connection is still alive before executing SQL? I also noticed this in the generic-pool readme:
Are any of those specified by Knex (like a default idle timeout)? I appreciate your help with this. Knex is a really awesome query builder, but this connection stuff has me stumped. |
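Not an official recipe, but one blunt way to test liveness and bound the wait before running real SQL is a cheap probe query raced against a timer (the 2000 ms figure is arbitrary):

```js
// Sketch: probe the pool with a trivial query and fail fast if it hangs,
// instead of letting the real statement hit a dead socket.
function probeDb(knex, ms = 2000) {
  const timeout = new Promise((_resolve, reject) =>
    setTimeout(() => reject(new Error('DB probe timed out')), ms)
  )
  return Promise.race([knex.raw('select 1'), timeout])
}

// probeDb(db).then(() => db('orders').select('*'))
```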
Indeed, it looks like a validate rejection doesn't automatically try to find a working connection... I already have a test case for this, but still need to figure out a workaround. |
Hm, doesn't the pool automatically call the factory when it needs to replace an invalid resource? That should result in a new resource being fetched, it seems. An easy test for this might involve generic-pool itself and a fake resource that fails after a random timeout (see the sketch below); I don't think we need to involve actual networks if the pool recycler algorithm is the culprit.
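Something along those lines might look like the following hedged sketch, written against generic-pool 3.x's documented API (`createPool` with a `validate` factory method plus `testOnBorrow`); the fake resource simply "breaks" itself after a short random delay:

```js
// Sketch: exercise the pool recycler with no real network involved.
// With testOnBorrow on, acquire() should never return a broken resource;
// the pool should evict it and create a replacement instead.
const genericPool = require('generic-pool')

const factory = {
  create: async () => {
    const resource = { broken: false }
    setTimeout(() => { resource.broken = true }, Math.random() * 100)
    return resource
  },
  validate: async (resource) => !resource.broken,
  destroy: async () => {}
}

const pool = genericPool.createPool(factory, { max: 1, testOnBorrow: true })

async function checkOnce() {
  const resource = await pool.acquire()
  console.log('acquired resource, broken =', resource.broken) // expect false
  await pool.release(resource)
}
```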
|
@leebenson Knex passes a pool validator function which checks a flag and rejects the connection if the flag has been set. In theory, when the connection breaks, the underlying driver should asynchronously set the flag so that it's picked up when the connection is being acquired from the pool. I'm starting to think this may be mysql2-related: sidorares/node-mysql2#447. If mysql2 suffers from an issue along those lines, the flag may never get set and the validator will keep approving dead connections. |
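Conceptually, the contract being described looks something like this sketch; the `_fatalError` flag name is made up for illustration (knex uses its own internal marker), but the shape is the same: the driver's error handler marks the connection, and the pool's validate step checks the mark on acquire:

```js
// Sketch only: hypothetical flag name, not knex's real internals.
function watchConnection(connection) {
  connection.on('error', () => {
    connection._fatalError = true // socket died; never hand this connection out again
  })
}

// Runs when testOnBorrow is enabled, before the pool dispenses a connection.
// Returning false should make the pool evict it and create a fresh one.
const validate = (connection) => !connection._fatalError
```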
FWIW I have also verified this bug using the standard mysql driver (and mysql2), and it was fixed when bumping the version back down. It also occurred during long-running connections, but I can repro by restarting mysql while the app using knex is running. @elhigu thanks for the quick response to the issue! |
@rkaw92 @tswayne now that I have written a test for it and tried it out, the validate factory method indeed seems to do the reasonable thing: if a resource is bad, it evicts it from the pool, tries to create a new resource to replace it, and ultimately returns the fresh resource. I'll be releasing 0.14.1 in a few hours. |
Hmm.. looks like there is still something fishy going on... running the test case alone worked fine, but running all tests still gives errors. Edit: actually this last problem seems to be a timing issue. The DB error event of the closed connection had not been emitted yet when I tried to make new queries on killed connections, so validate did pass but the connection didn't work. I don't think there is anything we can do to make that part work better (well, we could make an async check, but that would be too wasteful); a 2ms delay was enough locally to get the error events registered before making new queries. |
Please test this with 0.14.1 and let's reopen if there are still problems. |
same issue with 0.14.1:
|
@leebenson please open a new issue, share a test case or instructions on how to reproduce, and mention which mysql driver you are using. The only way this can happen, as far as I can tell, is that for some reason the connection-closed event is not recognised... but according to the test case that I wrote, it is. There might be a window of a few milliseconds between the connection closing and the pool noticing it, so if you manage to hit that window with a query you would still get that error. |
@elhigu - this is my connection string:
I'm accessing the DB very infrequently - literally 2-5 SQL commands per day, max. It's used as part of an API server that is processing sales via a new (limited access) order form. The API server is loaded from a Docker instance, running on Google Container Engine. MySQL is running via Google managed DB, and connected to directly over public IPv4. I still see the error. I'll need to switch to another lib as a priority, because the issue is surfacing daily at point-of-sale and effectively blocking the API. I'll try and get some test cases up when I have some time to play with it. But I think if I'm facing this issue, others must be too... I'm not doing anything exciting, just connecting to the MySQL DB and issuing very infrequent SQL against it. |
@leebenson for 0.14.1 I wrote a test case which killed connections and then tried to use them. I had that exact error message on 0.14.0 when testOnBorrow was set; in 0.14.1 I fixed that problem... so if you cannot provide some test case, I cannot know whether the problem is now in your setup or happening generally to everyone. On 0.14.1 testOnBorrow is set automatically, so you can drop that option from your config at least. |
Will set-up some test cases as soon as I can. Out of interest, how did you kill the connection in your test? Did you do that via knex, or did you somehow (more abruptly) pull the plug? Did you also test timeouts? Thanks for your help. |
@leebenson I just told mysql to kill the connection (https://github.com/tgriesser/knex/blob/master/test/tape/pool.js#L105). There are also some docker-based tests where the actual server process is shut down. No, I haven't tested mysql timeouts... maybe I could check whether they can be set for each connection separately. |
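For anyone wanting to reproduce this locally, the gist of that approach is to ask MySQL for a connection's own id and then KILL it; a rough sketch using plain `knex.raw` (not the actual test code, and the result shape assumes the mysql/mysql2 drivers, which return `[rows, fields]`):

```js
// Sketch: simulate an abrupt server-side disconnect of a pooled connection.
async function killOneConnection(db) {
  const [rows] = await db.raw('SELECT CONNECTION_ID() AS id')
  const id = rows[0].id
  // If the KILL happens to run on the same connection it is killing,
  // the statement itself errors out; that's fine for a test.
  await db.raw('KILL ?', [id]).catch(() => {})
}
```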
@elhigu - after a bit of local testing, it seems that Google Cloud SQL simply times out after 10/15 minutes of idle usage. Apologies if this has been answered elsewhere (couldn't see anything obvious in the docs), but is there a way to set an 'idle' connection timeout so that an 'old' connection won't even be attempted? |
@leebenson it actually really isn't even attempted. It is attempted only in the sense that the pool validates the connection first and replaces it if it has already been flagged as broken.
Knex uses https://github.com/coopernurse/node-pool under the hood, and there are also parameters for setting an idle timeout. I believe that in addition to the idle timeout config (defaults to 30s) one must set `evictionRunIntervalMillis`, so that the eviction checks actually run (see the example below). |
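As a concrete (and hedged) example of that combination, assuming knex forwards these options to generic-pool as the thread suggests; the numbers are arbitrary, the point is to evict idle connections well before Cloud SQL's roughly 10-15 minute idle cutoff:

```js
const db = require('knex')({
  client: 'mysql2',
  connection: process.env.DATABASE_URL, // placeholder
  pool: {
    min: 0,
    max: 10,
    idleTimeoutMillis: 5 * 60 * 1000,     // treat a connection as idle after 5 minutes
    evictionRunIntervalMillis: 60 * 1000  // actually run the eviction sweep every minute
  }
})
```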
Thanks @elhigu. There's a thread I've commented on here that @rkaw92 mentioned earlier, which I think is the real underlying issue. The problem seems to be the mysql2 driver not evicting connections. A workaround suggested in that thread (using sequelize) is to attempt a query first so that a dead connection is detected and replaced. I'm baffled that default logic for handling socket timeouts seems to have been overlooked in the driver. This doesn't just affect Knex; any ORM that uses mysql2 is affected. In low-traffic environments that use Google Cloud SQL and other managed cloud DBs, which typically have an idle timeout, it'll result in exactly this kind of error surfacing daily. I'm amazed this hasn't been reported more. |
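The application-side variant of that workaround is a periodic keepalive ping against the knex instance (`db` here); a sketch only, and it papers over the eviction problem rather than fixing it:

```js
// Sketch: touch the pool regularly so a managed DB's idle timeout
// (e.g. Cloud SQL's) doesn't silently close the underlying sockets.
const KEEPALIVE_MS = 4 * 60 * 1000 // comfortably under a ~10 minute idle cutoff

setInterval(() => {
  db.raw('select 1').catch(err => {
    console.error('keepalive ping failed:', err.message)
  })
}, KEEPALIVE_MS).unref() // don't keep the process alive just for the ping
```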
I'll test this today, thanks @elhigu. |
That might be the case sometimes, but it didn't show up in my tests. I'll try to reproduce and see if that really is a problem in mysql2. If it is, maybe knex with the mysql2 driver should set some defaults for generic-pool so that the pool evicts the connection before mysql closes it. |
I'm still wondering if there's a distinction between a connection that is explicitly terminated, and one where the connected socket simply times out. |
Just tried it by setting an idle connection timeout:
Seems to work as expected. Maybe the linked issue exists only when using the pool from the mysql driver. |
FWIW, I switched to Sequelize and the problem disappeared. Not sure if they're setting different pool defaults under the hood. |
Any update on this issue, @elhigu? I just updated to 0.14.2 in production and began seeing the warning from this issue. Oddly enough, this did not occur locally for me running node 7.10.1 on a MacBook Pro. Is there a reason that running transactions would cause the server to close down connections? I ended up downgrading to 0.13.0, which resolved the issue. |
@cyorobert do you have some code example that I could use to reproduce the behaviour? |
I found that this was being caused by importing a large file. I fixed it by setting "max_allowed_packet=32505856".
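For anyone checking whether this applies to them, the server's current value can be read through knex before touching anything (a sketch; on managed databases such as Cloud SQL the value usually has to be changed via instance flags rather than SET GLOBAL):

```js
// Sketch: show the current max_allowed_packet in bytes.
db.raw("SHOW VARIABLES LIKE 'max_allowed_packet'")
  .then(([rows]) => console.log(rows[0])) // e.g. { Variable_name: 'max_allowed_packet', Value: '4194304' }
```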
This happened to me today when I upgraded to 0.14.0. Not sure when the bug was introduced. Also, I saw this along with the above message:
This socket has been ended by the other party
The previously working version seems to be 0.13.0.
node 8.9.1, knex 0.14.0, "mysql2": "1.4.2"