Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Referenced Object never be recycled even if napi_delete_reference called #39915

Open
Brooooooklyn opened this issue Aug 27, 2021 · 14 comments
Open
Labels
node-api Issues and PRs related to the Node-API.

Comments

@Brooooooklyn
Copy link

Version

14.17.5

Platform

All

Subsystem

No response

What steps will reproduce the bug?

See https://github.com/Brooooooklyn/node-ref-object-leak for detail

I have a js function in native side, which accept a JsBuffer:

#[js_function(1)]
fn ref_buffer(ctx: napi::CallContext) -> napi::Result<napi::JsNumber> {
    let buffer = ctx.get::<napi::JsBuffer>(0)?;
    let ref_buffer = buffer.into_ref()?;
    let ref_count = ref_buffer.unref(ctx.env.clone())?;
    ctx.env.create_uint32(ref_count)
}

The napi_create_reference, napi_reference_unref and napi_delete_reference are called in this function.

In the JavaScript side:

const { readFileSync } = require('fs')

const { refBuffer } = require('./index.node')
const { displayMemoryUsageFromNode } = require('./util')

const BigBuffer = readFileSync('yarn.lock')

const initial = process.memoryUsage()

async function main() {
  for (const [i] of Array.from({ length: 10000000 }).entries()) {
    let refCount = refBuffer(BigBuffer)
    if (refCount) {
      throw new Error(`refCount should be 0, but got ${refCount}`)
    }
    if (i % 1000000 === 0) {
      await sleep() // Async function to give GC a chance
      if (typeof global.gc === 'function') {
        global.gc()
      }
      displayMemoryUsageFromNode(initial)
    }
  }
}

function sleep() {
  return new Promise((resolve) => {
    setTimeout(resolve, 1000)
  })
}

main()

We can see the memory is growing:

╔══════════╤═══════════╤══════════╤══════════╤══════════════╗
║ rss      │ heapTotal │ heapUsed │ external │ arrayBuffers ║
╟──────────┼───────────┼──────────┼──────────┼──────────────╢
║ +80.1 MB │ +80 MB    │ +80 MB   │ +40 B    │  0 B         ║
╚══════════╧═══════════╧══════════╧══════════╧══════════════╝

╔═════════╤═══════════╤══════════╤══════════╤══════════════╗
║ rss     │ heapTotal │ heapUsed │ external │ arrayBuffers ║
╟─────────┼───────────┼──────────┼──────────┼──────────────╢
║ +250 MB │ +85.3 MB  │ +79 MB   │ -170 kB  │ -162 kB      ║
╚═════════╧═══════════╧══════════╧══════════╧══════════════╝

╔═════════╤═══════════╤══════════╤══════════╤══════════════╗
║ rss     │ heapTotal │ heapUsed │ external │ arrayBuffers ║
╟─────────┼───────────┼──────────┼──────────┼──────────────╢
║ +414 MB │ +85.3 MB  │ +79.7 MB │ -170 kB  │ -162 kB      ║
╚═════════╧═══════════╧══════════╧══════════╧══════════════╝

╔═════════╤═══════════╤══════════╤══════════╤══════════════╗
║ rss     │ heapTotal │ heapUsed │ external │ arrayBuffers ║
╟─────────┼───────────┼──────────┼──────────┼──────────────╢
║ +576 MB │ +85.3 MB  │ +79.3 MB │ -170 kB  │ -162 kB      ║
╚═════════╧═══════════╧══════════╧══════════╧══════════════╝

╔═════════╤═══════════╤══════════╤══════════╤══════════════╗
║ rss     │ heapTotal │ heapUsed │ external │ arrayBuffers ║
╟─────────┼───────────┼──────────┼──────────┼──────────────╢
║ +739 MB │ +85.3 MB  │ +79.1 MB │ -170 kB  │ -162 kB      ║
╚═════════╧═══════════╧══════════╧══════════╧══════════════╝

╔═════════╤═══════════╤══════════╤══════════╤══════════════╗
║ rss     │ heapTotal │ heapUsed │ external │ arrayBuffers ║
╟─────────┼───────────┼──────────┼──────────┼──────────────╢
║ +902 MB │ +85.3 MB  │ +79 MB   │ -170 kB  │ -162 kB      ║
╚═════════╧═══════════╧══════════╧══════════╧══════════════╝

╔══════════╤═══════════╤══════════╤══════════╤══════════════╗
║ rss      │ heapTotal │ heapUsed │ external │ arrayBuffers ║
╟──────────┼───────────┼──────────┼──────────┼──────────────╢
║ +1.06 GB │ +85.3 MB  │ +79 MB   │ -170 kB  │ -162 kB      ║
╚══════════╧═══════════╧══════════╧══════════╧══════════════╝

╔══════════╤═══════════╤══════════╤══════════╤══════════════╗
║ rss      │ heapTotal │ heapUsed │ external │ arrayBuffers ║
╟──────────┼───────────┼──────────┼──────────┼──────────────╢
║ +1.23 GB │ +85.3 MB  │ +79.1 MB │ -170 kB  │ -162 kB      ║
╚══════════╧═══════════╧══════════╧══════════╧══════════════╝

╔══════════╤═══════════╤══════════╤══════════╤══════════════╗
║ rss      │ heapTotal │ heapUsed │ external │ arrayBuffers ║
╟──────────┼───────────┼──────────┼──────────┼──────────────╢
║ +1.39 GB │ +85.3 MB  │ +79 MB   │ -170 kB  │ -162 kB      ║
╚══════════╧═══════════╧══════════╧══════════╧══════════════╝

╔══════════╤═══════════╤══════════╤══════════╤══════════════╗
║ rss      │ heapTotal │ heapUsed │ external │ arrayBuffers ║
╟──────────┼───────────┼──────────┼──────────┼──────────────╢
║ +1.55 GB │ +85.3 MB  │ +79 MB   │ -170 kB  │ -162 kB      ║
╚══════════╧═══════════╧══════════╧══════════╧══════════════╝

How often does it reproduce? Is there a required condition?

Always

What is the expected behavior?

Buffer is deallocated

What do you see instead?

Buffer is never been deallocated

Additional information

Brooooooklyn/snappy#25

@mhdawson
Copy link
Member

There is never any guarantee when the gc will collect objects, when finalizers will be run and also no real guarantee that RSS will go down when memory is released so its often hard to be sure if there is a real leak or not.

Have you tried running for a long period and seeing if it actually fails due to an out of memory at some point?

@Brooooooklyn
Copy link
Author

Brooooooklyn commented Aug 28, 2021

After this change:

async function main() {
    let i = 0
++  while (true) {
--  for (const [i] of Array.from({ length: 10000000 }).entries()) {
    let refCount = refBuffer(BigBuffer)
    if (refCount) {
      throw new Error(`refCount should be 0, but got ${refCount}`)
    }
    if (i % 1000000 === 0) {
      await sleep()
      if (typeof global.gc === 'function') {
        global.gc()
      }
      displayMemoryUsageFromNode(initial)
    }
++  i++
  }
}

And running this program for a while, it stucks and the console output the memory usage:

╔══════════╤═══════════╤══════════╤══════════╤══════════════╗
║ rss      │ heapTotal │ heapUsed │ external │ arrayBuffers ║
╟──────────┼───────────┼──────────┼──────────┼──────────────╢
║ +17.1 GB │ -3.14 MB  │ -960 kB  │ -178 kB  │ -170 kB      ║
╚══════════╧═══════════╧══════════╧══════════╧══════════════╝

╔══════════╤═══════════╤══════════╤══════════╤══════════════╗
║ rss      │ heapTotal │ heapUsed │ external │ arrayBuffers ║
╟──────────┼───────────┼──────────┼──────────┼──────────────╢
║ +17.3 GB │ -3.14 MB  │ -960 kB  │ -178 kB  │ -170 kB      ║
╚══════════╧═══════════╧══════════╧══════════╧══════════════╝

╔══════════╤═══════════╤══════════╤══════════╤══════════════╗
║ rss      │ heapTotal │ heapUsed │ external │ arrayBuffers ║
╟──────────┼───────────┼──────────┼──────────┼──────────────╢
║ +17.1 GB │ -3.14 MB  │ -960 kB  │ -178 kB  │ -170 kB      ║
╚══════════╧═══════════╧══════════╧══════════╧══════════════╝

And I removed the reference things in the native side:

#[js_function(1)]
fn ref_buffer(ctx: napi::CallContext) -> napi::Result<napi::JsNumber> {
    let buffer = ctx.get::<napi::JsBuffer>(0)?;
++    let ref_buffer = buffer.into_value()?;
++    let ref_count = ref_buffer.len();
++    ctx.env.create_uint32(ref_count as u32)
--    let ref_buffer = buffer.into_ref()?;
--    let ref_count = ref_buffer.unref(ctx.env.clone())?;
--    ctx.env.create_uint32(ref_count)
}

And the program seems to be fine afterward:

╔═════════╤═══════════╤══════════╤══════════╤══════════════╗
║ rss     │ heapTotal │ heapUsed │ external │ arrayBuffers ║
╟─────────┼───────────┼──────────┼──────────┼──────────────╢
║ +590 kB │ -262 kB   │ -1.41 MB │ -8.4 kB  │ -162 kB      ║
╚═════════╧═══════════╧══════════╧══════════╧══════════════╝

╔═════════╤═══════════╤══════════╤══════════╤══════════════╗
║ rss     │ heapTotal │ heapUsed │ external │ arrayBuffers ║
╟─────────┼───────────┼──────────┼──────────┼──────────────╢
║ +700 kB │ -262 kB   │ -1.27 MB │ -170 kB  │ -162 kB      ║
╚═════════╧═══════════╧══════════╧══════════╧══════════════╝

╔══════════╤═══════════╤══════════╤══════════╤══════════════╗
║ rss      │ heapTotal │ heapUsed │ external │ arrayBuffers ║
╟──────────┼───────────┼──────────┼──────────┼──────────────╢
║ +1.06 MB │ -262 kB   │ -1.25 MB │ -170 kB  │ -162 kB      ║
╚══════════╧═══════════╧══════════╧══════════╧══════════════╝

╔══════════╤═══════════╤══════════╤══════════╤══════════════╗
║ rss      │ heapTotal │ heapUsed │ external │ arrayBuffers ║
╟──────────┼───────────┼──────────┼──────────┼──────────────╢
║ +1.07 MB │ -262 kB   │ -1.25 MB │ -170 kB  │ -162 kB      ║
╚══════════╧═══════════╧══════════╧══════════╧══════════════╝

╔═════════╤═══════════╤══════════╤══════════╤══════════════╗
║ rss     │ heapTotal │ heapUsed │ external │ arrayBuffers ║
╟─────────┼───────────┼──────────┼──────────┼──────────────╢
║ +823 kB │ -3.41 MB  │ -1.25 MB │ -170 kB  │ -162 kB      ║
╚═════════╧═══════════╧══════════╧══════════╧══════════════╝

╔══════════╤═══════════╤══════════╤══════════╤══════════════╗
║ rss      │ heapTotal │ heapUsed │ external │ arrayBuffers ║
╟──────────┼───────────┼──────────┼──────────┼──────────────╢
║ +20.5 kB │ -3.41 MB  │ -1.24 MB │ -170 kB  │ -162 kB      ║
╚══════════╧═══════════╧══════════╧══════════╧══════════════╝

╔═════════╤═══════════╤══════════╤══════════╤══════════════╗
║ rss     │ heapTotal │ heapUsed │ external │ arrayBuffers ║
╟─────────┼───────────┼──────────┼──────────┼──────────────╢
║ -692 kB │ -3.41 MB  │ -1.24 MB │ -170 kB  │ -162 kB      ║
╚═════════╧═══════════╧══════════╧══════════╧══════════════╝

@Mesteery Mesteery added the node-api Issues and PRs related to the Node-API. label Aug 28, 2021
@Brooooooklyn
Copy link
Author

I've updated the codes in https://github.com/Brooooooklyn/node-ref-object-leak

#[js_function(1)]
fn ref_buffer(ctx: napi::CallContext) -> napi::Result<napi::JsNumber> {
--  let buffer = ctx.get::<napi::JsBuffer>(0)?;
--  let ref_buffer = buffer.into_ref()?;
++  let buffer = ctx.get::<napi::JsObject>(0)?;
++  let ref_buffer = ctx.env.create_reference(buffer)?;
    let ref_count = ref_buffer.unref(ctx.env.clone())?;
    ctx.env.create_uint32(ref_count)
}

And if I pass the Buffer directly into ref_buffer function, the memory will grow until the system memory outage.
If I passed the Buffer wrapped with plain Object, the memory seems to be stable in 10min:

const { readFileSync } = require('fs')

const { refBuffer } = require('./index.node')
const { displayMemoryUsageFromNode } = require('./util')

const BigBuffer = readFileSync('yarn.lock')

const initial = process.memoryUsage()

async function main() {
  let i = 0
  while (true) {
--  let refCount = refBuffer(BigBuffer) // Memory leak
++  let refCOunt = refBuffer({ BigBuffer }) //fine
    if (refCount) {
      throw new Error(`refCount should be 0, but got ${refCount}`)
    }
    if (i % 1000000 === 0) {
      await sleep()
      if (typeof global.gc === 'function') {
        global.gc()
      }
      displayMemoryUsageFromNode(initial)
    }
    i++
  }
}

function sleep() {
  return new Promise((resolve) => {
    setTimeout(resolve, 1000)
  })
}

main()

@mhdawson
Copy link
Member

mhdawson commented Sep 1, 2021

Looking at Node-api code

napi_status napi_create_reference(napi_env env,
I don't see anything that would change based on what is being passed is a buffer or an object that references an object. I think there are differences in finalization of buffers, but I don't think that should be related either

One thing that change does do is introduce an allocation point where gc might be triggered. If you run with --trace-gc output do you see a difference between the behaviour?

I also see that you ref the same BigBuffer, so unless I misunderstand the size of that should not make a difference since there is only one of those objects regardless of how many times it is referenced.

@mhdawson
Copy link
Member

mhdawson commented Sep 7, 2021

Spent some more time looking at this today.

Changing the code to be:

const { readFileSync } = require('fs')

const { refBuffer } = require('./index.node')
const { displayMemoryUsageFromNode } = require('./util')

const BigBuffer = new Object();

const initial = process.memoryUsage()

async function main() {
  let i = 0
  while (true) {
    let refCount = refBuffer(new Object())
    if (refCount) {
      throw new Error(`refCount should be 0, but got ${refCount}`)
    }
    if (i % 1000000 === 0) {
      await sleep()
      if (typeof global.gc === 'function') {
        global.gc()
      }
      displayMemoryUsageFromNode(initial)
    }
    i++
  }
}

function sleep() {
  return new Promise((resolve) => {
    setTimeout(resolve, 100)
  })
}

main()

I see that rss grows to about 22MB and then stays there.

If I then edit the rust code to remove the unref:

#[macro_use]
extern crate napi_derive;

#[module_exports]
fn init(mut exports: napi::JsObject) -> napi::Result<()> {
    exports.create_named_method("refBuffer", ref_buffer)?;
    Ok(())
}

#[js_function(1)]
fn ref_buffer(ctx: napi::CallContext) -> napi::Result<napi::JsNumber> {
    let buffer = ctx.get::<napi::JsObject>(0)?;
    let ref_buffer = ctx.env.create_reference(buffer)?;
//   let ref_count = ref_buffer.unref(ctx.env.clone())?;
    let ref_count = 0;
    ctx.env.create_uint32(ref_count)
}

the the rss quickly grows reaching several GB in 3-4 seconds. Fixing the code to add the unref back I then see that rss is stable again. That confirms that unref is doing what we think in the case were we are passing in a new object each time.

If I then change the code so that instead of a new object it reference the same object (one created through simply new Object()) every time I again see the increasing rss. This is the change to the code above to see the difference in behaviour:

    let refCount = refBuffer(BigBuffer)
//    let refCount = refBuffer(new Object())

@mhdawson
Copy link
Member

mhdawson commented Sep 7, 2021

Next I modified the code so that every time the stats were printed out we would create a new object and then use that for the call to refBuffer.

      .
      .
      .
      displayMemoryUsageFromNode(initial)
      BigBuffer = new Object();
    }

After doing that the rss grew to 393MB (It went up higher then shrunk down and stayed there) and then stayed steady at that.

@mhdawson
Copy link
Member

mhdawson commented Sep 8, 2021

Based on the above this is what I think is happening:

The following code creates a reference to Object A. That reference is then unrefed, but the reference itself is not deleted. When the finalizer for the object runs it will delete the reference for you.

let ref_buffer = ctx.env.create_reference(buffer)?;
let ref_count = ref_buffer.unref(ctx.env.clone())?;

BUT before the gc runs, triggering the finalizer, a another reference is created to the same object object A. This means it will not be collected by the gc, the finalizer is not run and the first reference that was created will not be cleaned up.

This matches the other data where if we use a new object every time we don't see any growth in RSS. Since it's a new Object each time, creating the second reference does not prevent the earlier one from being cleaned up. It also matches the case were we re-use the same Object for a period of time and then start using another one. We see RSS grow, but then stabilize as we end up with a maximum number of references held alive related to the same object.

So I think this is working as designed, even if its not exactly what we'd like. We need to use the finalizer for the object to clean up the C structure allocated for the reference. If you prevent the Object from being collected and therefore the finalizer being run by creating another reference you will retain memory until the Object can be collected. Therefore, repeatedly creating a new reference on the same object will cause a memory leak as the Object will never be able to be collected. That is something that is easy to make happen in a simple test but unlikely to occur in a real life scenario.

@mhdawson
Copy link
Member

mhdawson commented Sep 8, 2021

I'll add that even if you delete the reference in addition to unrefing it the same behaviour may occur. This is because if a finalizer is queued we need to wait to delete the reference until after the finalizer runs. You can see the code that could end up defering the deleted until after the finalizer runs here:

void RefBase::Delete(RefBase* reference) {

However, if you avoid unrefing before you delete, the delete will take place immediately as we know it could not have been queued for finalization since the reference count was >0.

@gabrielschulhof
Copy link
Contributor

@mhdawson IINM when we create a buffer on the native side with napi_create_external_buffer() we use a subclass of Finalizer (called BufferFinalizer) for assigning to it its napi_finalize. Perhaps we need a special case in napi_create_reference keyed on the type of the JavaScript object to which we wish to make a reference, that will go down the napi_create_external_buffer code path if given a Buffer object, and down the usual path otherwise.

@mhdawson
Copy link
Member

@gabrielschulhof I need a bit more context/info on how that might resolve the issue.

@mhdawson
Copy link
Member

Based on discussion in the Node-API team meeting we agreed that adding some additional explanation to the documentation would help, with the recommendation to do a delete instead of unref when that makes sense.

@mhdawson
Copy link
Member

Created this PR to update the documentation - #42035

@mhdawson
Copy link
Member

There is now a concept of a nogc callback which helps with this problem https://nodejs.org/docs/latest/api/n-api.html#node_api_nogc_finalize.

@Brooooooklyn can you validate that this fixes the issue for you?

@Brooooooklyn
Copy link
Author

@mhdawson I'll try it asap, thank you!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
node-api Issues and PRs related to the Node-API.
Projects
None yet
Development

No branches or pull requests

4 participants