
Connection leak? #1777

Closed
johanneswuerbach opened this issue Dec 1, 2018 · 12 comments


@johanneswuerbach
Contributor

We recently updated some of our postgres client libraries and experienced almost total pool exhaustion roughly 2-3h after deploying the new version in production (this had never happened before), on 2 out of 20 DBs.

We updated the following libraries.
pg 7.4.3 ~> 7.6.1
pg-pool 2.0.3 ~> 2.0.4
pg-promise 7.5.4 ~> 8.5.2

Once the pool is slowly being exhausted, a majority of queries hitting an affected process fail with "timeout exceeded when trying to connect" (https://github.com/brianc/node-pg-pool/blob/v2.0.4/index.js#L178); the percentage increases over time and the condition is persistent (~30min until revert).

I've been debugging this for multiple days now, but I'm having a hard time identifying the exact root cause. So far I've suspected:

brianc/node-pg-pool#86, which means we now generally queue more work, since pending queue items are no longer all dropped; but we rarely saw "timeout exceeded when trying to connect" errors before, so this seems unlikely

#1503, some kind of race condition here, as both affected DBs are occasionally hit by queries running into statement timeouts.

Any ideas, pointers, or potential areas for races would be really appreciated.

//cc @vitaly-t I know that pg-promise is not part of pg distribution, but you seemed really active here and I would prefer a single spot for discussion. Any insight would be appreciated.

We mainly (but not exclusively) use nested transactions via https://github.com/vitaly-t/pg-promise#transactions starting with SET LOCAL statement_timeout = 30000;
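For context, the pattern looks roughly like the following minimal sketch, assuming a configured pg-promise `db` instance; the function name and query are illustrative, not from our codebase:

```js
// Sketch of a transaction that scopes a statement timeout to itself
// via SET LOCAL, using pg-promise's db.tx API. `db` is assumed to be a
// configured pg-promise database instance; runWithTimeout and the
// SELECT are illustrative names only.
async function runWithTimeout(db, userId) {
  return db.tx(async t => {
    // SET LOCAL only affects the current transaction
    await t.none('SET LOCAL statement_timeout = 30000')
    return t.oneOrNone('SELECT * FROM users WHERE id = $1', [userId])
  })
}
```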

@vitaly-t
Contributor

vitaly-t commented Dec 1, 2018

Can you identify which version exactly introduced the issue? So we can narrow down possible breaking changes.

@johanneswuerbach
Contributor Author

Not really, as once the condition hits we return a lot of errors on our APIs, which are used by customers.

I'm trying to replicate this in our staging environment or in specific integration tests, but haven't been lucky so far.

@vitaly-t
Contributor

vitaly-t commented Dec 1, 2018

Well, we'd need to know which version broke it, unless there is something wrong in your code. For example, sometimes developers use an obsolete approach to using connections, and once support of it is removed, upgrading the library may result in bugs like that.

Also, it is a little confusing that you list several versions of pg, pg-pool and pg-promise, because if you use pg-promise, you do not include anything else in your project.

@johanneswuerbach
Contributor Author

Yes, those are only sub-dependencies, which have been updated from version x ~> version y. I listed them here as each of them could have potentially introduced this.

sometimes developers use an absolute approach to using connections

Absolutely, but afaik we only use https://github.com/vitaly-t/pg-promise#transactions with async/await or db.{oneOrNone,manyOrNone,result} and never manage connections explicitly in our code.

@vitaly-t
Contributor

vitaly-t commented Dec 2, 2018

Misspelled - I meant obsolete approach 😄

and never manage connections explicitly in our code.

That's a good start 😄

I hope you will find from which version your code stopped working 😉

Maybe you've got a leak somewhere, like a module that, after upgrading, no longer releases connections correctly.

@skipjack

skipjack commented Dec 2, 2018

I think I may be seeing something similar on pg@7.7.1. I'm using pg with node-cron, and on the first execution of the task all the SQL queries fire perfectly. However, come the second execution (I'm using 30-second intervals for testing), it either hangs or errors out with:

```
Error: timeout exceeded when trying to connect
    at Timeout.setTimeout [as _onTimeout] (/.../node_modules/pg-pool/index.js:178:27)
    at ontimeout (timers.js:478:11)
    at tryOnTimeout (timers.js:302:5)
    at Timer.listOnTimeout (timers.js:262:5)
```

if I have connectionTimeoutMillis set. The code looks roughly like this:

cron.js

```js
const { CronJob } = require('cron')
const { someJob } = require('./jobs')

// Fire someJob every 30 seconds
new CronJob('*/30 * * * * *', someJob, null, true)
```

jobs.js

```js
const { Pool } = require('pg')

// Passing `connectionTimeoutMillis: 5000` here makes it error out instead of hanging
const pool = new Pool()

const someJob = async () => {
  // This is the line that hangs or errors on the second execution
  let { rows } = await pool.query('...', [ ... ])

  rows.forEach(async row => {
    let client = await pool.connect()

    try {
      await client.query('...')
      await client.query('COMMIT')
    } catch (err) {
      await client.query('ROLLBACK')
    }
  })
}
```

The above example is obviously contrived, but it captures my underlying usage. I can try to throw together a reproducible repo, but I'm not sure what db it would hit since I obviously can't expose the service I'm working on.

Aside from this cron job, pools have worked fine for me throughout the rest of the application I'm building (a koa-based api). I have read through the docs to see if I'm missing some cleanup steps or something and tried a few things like pool.end(), all to no avail. Any advice would be much appreciated. I'm fairly new to pg and postgres in general, so I'm not sure where to go next from the `timeout exceeded when trying to connect` error.

@skipjack

skipjack commented Dec 2, 2018

The issue (for me at least) was that I wasn't calling client.release() in a finally at the end of the try ... catch. @johanneswuerbach if this isn't your issue, I'm happy to delete these comments so you don't have extra noise on your ticket (sorry bout that 😞).
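For anyone landing here, the fix can be sketched as a small helper (illustrative only, not part of pg's API) that checks a client out of the pool, runs some work with it, and guarantees `client.release()` runs in a `finally`, whether the work succeeds or throws:

```js
// Hypothetical helper: acquires a client from a pg pool, runs `fn` with it,
// and always returns the client to the pool, even if `fn` throws.
async function withClient(pool, fn) {
  const client = await pool.connect()
  try {
    return await fn(client)
  } finally {
    // Without this, each call permanently consumes a pool slot; after
    // `max` (default 10) checkouts, pool.connect() starts timing out.
    client.release()
  }
}
```

With something like this in place, the cron job above would wrap each per-row transaction in `withClient(pool, async client => { ... })` instead of calling `pool.connect()` directly.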

I noticed this after seeing the default max of 10. Once let client = await pool.connect() was called ten times I started seeing the timeout errors, and I confirmed this by bumping max and seeing more connects fire before failing. @brianc @vitaly-t would there be a way for this package to throw an error like the following when the max is exceeded (rather than the cryptic timeouts):

Error: You've exceeded the maximum number of clients in your pool. Make sure to call `client.release()` to release clients back into the pool and bump `max` to more than 10 if you actually need 10+ clients in the pool at once.

@johanneswuerbach
Contributor Author

@skipjack yes, we might have a similar missing release somewhere, but sadly I haven't found anything so far.

Especially as usually (and before the upgrade) everything just works ™️, but at some point something is killing the pool. Below is an example of a process generally serving ~100-200 req/min until the pool is gone and only 500s caused by timeout exceeded when trying to connect are returned.

[screenshot: request-rate graph showing ~100-200 req/min dropping to all 500 responses once the pool is exhausted]

@johanneswuerbach
Contributor Author

Found two potential candidates brianc/node-pg-pool#109 🎉

@skipjack

skipjack commented Dec 2, 2018

@johanneswuerbach nice. Yeah, so I think we were running into the same issue (the pool gets filled and then timeouts are thrown) but with different root causes. In my case, I was missing the client.release() call. In yours, it sounds like an actual bug, since you verified client.release() is called and the issue arose after picking up a new pg version. Anyways, hope your pr resolves whatever's causing the bug.

@johanneswuerbach
Contributor Author

We are running with brianc/node-pg-pool#109 in production now for some time and it looks like this fully resolved our issues.

@johanneswuerbach
Contributor Author

We are running latest versions of all libraries now and no longer see this issue.
