
(Potential) Memory Leak In insert() Call #710

Closed
mortenbo opened this issue May 29, 2020 · 7 comments
Labels
  • api: bigquery – Issues related to the googleapis/nodejs-bigquery API.
  • priority: p2 – Moderately-important priority. Fix may not be included in next release.
  • type: bug – Error or flaw in code with unintended results or allowing sub-optimal usage patterns.

Comments

@mortenbo

Environment details

  • OS: Alpine Linux, macOS
  • Node.js version: v14.3.0
  • npm version: 6.14.5
  • @google-cloud/bigquery version: ^4.7.0

Steps to Reproduce

const {BigQuery} = require('@google-cloud/bigquery');

(async () => {
    const client = new BigQuery({
        keyFilename: './google-creds.json',
        projectId: 'myproductid'
    });
    const run = async () => {
        await client
            .dataset('events')
            .table('analytics')
            .insert([{
                insertId: '1',
                json: {
                    test: 1
                }
            }, {
                insertId: '2',
                json: {
                    test: 2
                }
            }], {raw: true});
    };

    setInterval(async () => {
        await run();
    }, 5000);
})();
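
To make the growth visible without an external profiler, the same repro can log memory stats on each tick; a minimal sketch:

// Minimal sketch: log Node's memory stats alongside the inserts above so the
// growth in `external` can be watched directly from the process.
setInterval(() => {
    const {external, heapTotal, heapUsed} = process.memoryUsage();
    console.log(JSON.stringify({external, heapTotal, heapUsed}));
}, 5000);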

Here's a production screenshot showing external memory (top), total heap (middle), and used heap (bottom):

[Screenshot: Screen Shot 2020-05-29 at 16.12.15]

As you can see, external memory slowly goes up. If I remove the BigQuery insert statements from my endpoint, external memory is released and doesn't gradually increase.

Here's an isolated external memory chart for one of the pods:

[Screenshot: Screen Shot 2020-05-29 at 16.10.13]

I've found that it may be related to teeny-request being called in node_modules/@google-cloud/common/build/src/util.js, specifically this function:

    /**
     * Make a request through the `retryRequest` module with built-in error
     * handling and exponential back off.
     *
     * @param {object} reqOpts - Request options in the format `request` expects.
     * @param {object=} config - Configuration object.
     * @param {boolean=} config.autoRetry - Automatically retry requests if the
     *     response is related to rate limits or certain intermittent server
     * errors. We will exponentially backoff subsequent requests by default.
     * (default: true)
     * @param {number=} config.maxRetries - Maximum number of automatic retries
     *     attempted before returning the error. (default: 3)
     * @param {object=} config.request - HTTP module for request calls.
     * @param {function} callback - The callback function.
     */
    makeRequest(reqOpts, config, callback) {
        const options = {
            request: teeny_request_1.teenyRequest.defaults(requestDefaults),
            retries: config.autoRetry !== false ? config.maxRetries || 3 : 0,
            shouldRetryFn(httpRespMessage) {
                const err = util.parseHttpRespMessage(httpRespMessage).err;
                return err && util.shouldRetryRequest(err);
            },
        };
        if (typeof reqOpts.maxRetries === 'number') {
            options.retries = reqOpts.maxRetries;
        }
        if (!config.stream) {
            return retryRequest(reqOpts, options, (err, response, body) => {
                util.handleResp(err, response, body, callback);
            });
        }
        // ... (remainder of makeRequest omitted)
    }

If I return before the retryRequest call, external memory doesn't go up. Maybe I'm wrong, but this is as far as I got before concluding that I need some help from the authors.
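
Roughly, the experiment was a temporary patch along these lines inside makeRequest (a sketch of the idea, not the exact change):

// Debugging-only patch in node_modules/@google-cloud/common/build/src/util.js:
// skip the HTTP request entirely, but still invoke the callback so callers
// don't hang on the returned promise.
makeRequest(reqOpts, config, callback) {
    setImmediate(() => callback(null, /* response */ {}, /* body */ {}));
    return;
    // ...the original retryRequest logic below is never reached...
}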

I hope you can help! Thanks!

@product-auto-label bot added the api: bigquery label on May 29, 2020
@stephenplusplus
Contributor

If I return before the retryRequest call, external memory doesn't go up.

Do you mean you do not make the API request? The call to retryRequest(reqOpts, ...) is where the logic of the API request begins.

In general, while investigating potential memory leaks, I've noticed that it's most often Node's internal memory management choosing not to release the memory we used yet. It has its own lifecycle that can make it look like there's a leak, and I'm not sure exactly what triggers the eventual cleanup: when something else asks for memory? When the available system memory drops below some threshold? When the memory usage of the Node process hits a preset value? When the Node process isn't busy for a predetermined amount of time?
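
One way to tell lazy collection apart from a genuine leak is to force a collection and see whether external memory actually drops; a minimal sketch (start Node with --expose-gc):

// Run with: node --expose-gc app.js
// If `external` keeps climbing even after a forced GC, that points to a real
// leak rather than the collector simply not having run yet.
setInterval(() => {
    const before = process.memoryUsage().external;
    if (global.gc) global.gc();
    const after = process.memoryUsage().external;
    console.log(`external: ${before} -> ${after} bytes`);
}, 30000);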

However, if the process crashes with an out-of-memory error, we definitely have a problem!

In any case, I'm working on reproducing this. If you have any additional details to share from further investigation on your end, feel free to post them in the meantime. Thanks for reporting!

@yoshi-automation added the triage me label on May 30, 2020
@mortenbo
Author

@stephenplusplus Yeah, if the actual HTTP request isn't made, external memory is released shortly after.

We have multiple Node.js services and they all look normal, releasing memory often and regularly. This is the only service with this pattern; it uses BigQuery insert and doesn't really do anything other than that. This is the status after 44 hours:

[Screenshot: memory usage after 44 hours]

Memory usage is still increasing and not being released.

Have you found anything? Looking into it more, it seems related to “new Response” in your code.

I’ve also found a “bug” in your code where you check for the content type “application/json” or “application/json; charset=utf-8”, but the actual response has the charset in uppercase (“charset=UTF-8”). That means it falls back to text() instead of json(). Not a big issue, but it’s incorrect. It happens here: https://github.com/googleapis/teeny-request/blob/88ff2d0d8e0fc25a4219ef5625b8de353ed4aa29/src/index.ts#L217
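
A case-insensitive check on the header would cover it; something along these lines (just a sketch, not a patch against your actual code):

// Sketch: lower-case the Content-Type header before matching, so
// "application/json; charset=UTF-8" is still parsed as JSON.
const contentType = (res.headers.get('content-type') || '').toLowerCase();
const body = contentType.startsWith('application/json')
    ? await res.json()
    : await res.text();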

I also find this library a bit hard to follow due to the many nested dependencies. Is there a reason you don’t use e.g. axios directly in this library? Or can I use axios myself, circumventing teeny-request?

@mortenbo
Author

It almost seems like the JSON response from insertAll leaks. It’s a very small response payload, and the growth seems to correlate with our number of requests.

@mortenbo
Author

mortenbo commented Jun 2, 2020

Yesterday I replaced the official BigQuery client with a custom axios-based client with HTTPS connection pooling, etc. The leak is now gone:

[Screenshot: external memory after switching to the custom axios client]
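
Roughly, the replacement looks like this (a simplified sketch rather than the production code; the token handling via google-auth-library is just for illustration): a keep-alive https.Agent shared by axios, calling the tabledata.insertAll REST endpoint directly.

const axios = require('axios');
const https = require('https');
const {GoogleAuth} = require('google-auth-library');

// Reuse one keep-alive agent so connections are pooled across requests.
const agent = new https.Agent({keepAlive: true, maxSockets: 10});
const auth = new GoogleAuth({
    keyFilename: './google-creds.json',
    scopes: ['https://www.googleapis.com/auth/bigquery.insertdata'],
});

async function insertRows(projectId, datasetId, tableId, rows) {
    const token = await auth.getAccessToken();
    const url = `https://bigquery.googleapis.com/bigquery/v2/projects/${projectId}` +
        `/datasets/${datasetId}/tables/${tableId}/insertAll`;
    // rows uses the same raw shape as in the repro above: [{insertId, json}, ...]
    const res = await axios.post(url, {rows}, {
        httpsAgent: agent,
        headers: {Authorization: `Bearer ${token}`},
    });
    return res.data; // contains insertErrors, if any
}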

@yoshi-automation added the 🚨 This issue needs some love. label on Jun 3, 2020
@bcoe added the type: bug and priority: p2 labels on Jun 3, 2020
@yoshi-automation removed the triage me and 🚨 This issue needs some love. labels on Jun 3, 2020
@mortenbo
Author

Update: It seems the memory leak is specific to certain versions of Node.js: nodejs/node#33468

The application/json bug is still present in your code.

@meredithslota
Contributor

The issue nodejs/node#33468 is marked as fixed, and I believe patches for the impacted versions have been released. @mortenbo, have you seen this issue recur since?

@mortenbo
Author

No, all good here. It was a bug in core Node.js that leaked memory. Thanks for taking the time to review this!
