Potential memory leak on rbd image copy #904

Open

shell-skrimp opened this issue Jul 19, 2023 · 6 comments

shell-skrimp commented Jul 19, 2023

The tool I have written uses < 70 MB RSS to perform the exact same operations for qcow2 images (basically raw-copying them). I can do this in rapid succession and it never breaks 100 MB. However, if I do the same thing with ceph (copying an image in a pool) at the exact same throughput as the qcow2-based implementation, after all is said and done my tool is using 1127 MB RSS and that memory is never freed. The ramp-up to this amount takes a while; initially it only uses about 40 MB more than it would if I weren't talking to ceph. So I think there is some kind of memory leak, I'm just not entirely sure where to find it or how to track it down with go-ceph/cgo.

I ran memleak-bpfcc -p <pid> against the process, then started a copy, and saw the following:

[20:07:34] Top 10 stacks with outstanding allocations:
        addr = 7f0a54038b00 size = 4
        // various omitted addresses from 4-1024
        addr = 7f0a44b3eb20 size = 1024
        addr = 7f0a54038500 size = 1232
        addr = 7f0a55712d90 size = 1280
        addr = 7f0a54003880 size = 1280
        addr = 7f0a557152b0 size = 1336
        addr = 7f0a54b10490 size = 1336
        addr = 7f0a54b6c570 size = 1336
        addr = 7f0a54002d90 size = 1432
        addr = 7f0a680099a0 size = 1792
        addr = 7f0a5c00fa30 size = 1792
        addr = 7f0a5403ead0 size = 1800
        addr = 7f0a556eb9b0 size = 1856
        addr = 7f0a54bcf990 size = 1856
        addr = 7f0a54b11dc0 size = 2304
        addr = 7f0a54ae2960 size = 3416
        addr = 7f0a54bda240 size = 3712
        addr = 7f0a54b67180 size = 4032
        addr = 7f0a54001d80 size = 4096
        addr = 7f0a4803f040 size = 4096
        addr = 7f0a4c01f040 size = 4096
        addr = 7f0a4cb63040 size = 4096
        addr = 7f0a496f3040 size = 4096
        addr = 7f0a4cb71a20 size = 4464
        addr = 7f0a44020810 size = 4464
        addr = 7f0a48bd2f80 size = 4464
        addr = 7f0a54b6a560 size = 8192
        addr = 7f0a54b1fd80 size = 8192
        addr = 7f0a54bd7bb0 size = 8192
        addr = 7f0a557132a0 size = 8192
        addr = 7f0a54b8ac90 size = 15320
        addr = 7f0a55690860 size = 33264
        addr = 7f0a54062ef0 size = 33264
        addr = 7f0a54b74ce0 size = 60000
        addr = 7f0a54bb8520 size = 60000
        addr = 7f0a54b25310 size = 60000
        addr = 7f0a5563f3a0 size = 65536
        addr = 7f0a54b90b60 size = 120000
        addr = 7f0a54aee750 size = 120000
        addr = 7f0a5403fde0 size = 120000
        addr = 7f0a54bdef90 size = 10880000
        6488 bytes in 38 allocations from stack
                operator new(unsigned long)+0x1c [libstdc++.so.6.0.30]
        13392 bytes in 3 allocations from stack
                operator new(unsigned long)+0x1c [libstdc++.so.6.0.30]
                ProtocolV2::send_auth_request(std::vector<unsigned int, std::allocator<unsigned int> >&)+0xe8 [libceph-common.so.2]
                [unknown]
                ProtocolV2::~ProtocolV2()+0x0 [libceph-common.so.2]
                [unknown]
        15320 bytes in 1 allocations from stack
                operator new(unsigned long)+0x1c [libstdc++.so.6.0.30]
                [unknown] [librados.so.2.0.0]
        19336 bytes in 15 allocations from stack
                posix_memalign+0x1a8 [libc.so.6]
                [unknown]
        39408 bytes in 21 allocations from stack
                operator new(unsigned long)+0x1c [libstdc++.so.6.0.30]
                [unknown]
                AsyncConnection::~AsyncConnection()+0x0 [libceph-common.so.2]
                [unknown]
        113285 bytes in 1653 allocations from stack
                operator new(unsigned long)+0x1c [libstdc++.so.6.0.30]
        180000 bytes in 3 allocations from stack
                EpollDriver::init(EventCenter*, int)+0x35 [libceph-common.so.2]
        267232 bytes in 3586 allocations from stack
                operator new(unsigned long)+0x1c [libstdc++.so.6.0.30]
                [unknown]
        467277 bytes in 681 allocations from stack
                operator new(unsigned long)+0x1c [libstdc++.so.6.0.30]
        10945592 bytes in 3 allocations from stack
                operator new(unsigned long)+0x1c [libstdc++.so.6.0.30]

There is also a less significant one:

[20:10:45] Top 10 stacks with outstanding allocations:
        addr = 7f0a54b20890 size = 448
        448 bytes in 1 allocations from stack
                _dl_allocate_tls+0x3a [ld-linux-x86-64.so.2]

So, what I'm trying to understand is whether these allocations are reaped at some point, or whether there is indeed a memory leak.

I have also checked the pprof dumps, and from what I can tell I am releasing memory properly (closing connections, ioctxs, images, etc.); the Go heap uses about 512 KB after testing. So it appears that all of the memory in use is outside the Go runtime, which leads me to believe it's librados/go-ceph (especially since this started occurring once I added ceph support).
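For reference, the kind of setup that produces such heap dumps might look like the minimal sketch below, using the standard net/http/pprof package; the port and the use of an HTTP endpoint are my own assumptions, not details from the report. Note that the Go heap profile only covers allocations made by the Go runtime, not memory allocated by librados/librbd through cgo, which is consistent with the small heap shown above.

package main

import (
	"log"
	"net/http"
	_ "net/http/pprof" // registers /debug/pprof/* handlers on the default mux
)

func main() {
	// Serve the runtime profiles; the heap can then be inspected with:
	//   go tool pprof http://localhost:6060/debug/pprof/heap
	go func() {
		log.Println(http.ListenAndServe("localhost:6060", nil))
	}()

	// ... application code that performs the image copies ...
	select {}
}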

The code I use to do the copy:

func GetCephConn() (*rados.Conn, error) {
	conn, err := rados.NewConnWithClusterAndUser(cc.Cluster, cc.ID)
	if err != nil {
		return nil, err
	}
	if err1 := conn.ReadConfigFile(cc.Conf); err1 != nil {
		return nil, err1
	}
	if err1 := conn.SetConfigOption("client_mount_timeout", ClientMountTimeout); err1 != nil {
		return nil, err1
	}
	if err1 := conn.SetConfigOption("rados_mon_op_timeout", RADOSMONOPTimeout); err1 != nil {
		return nil, err1
	}
	if err1 := conn.SetConfigOption("rados_osd_op_timeout", RADOSOSDOPTimeout); err1 != nil {
		return nil, err1
	}
	if err1 := conn.Connect(); err1 != nil {
		return nil, err1
	}

	return conn, nil
}


func CopyImage(pool, srcImg, dstImg string) error {
	conn, err := GetCephConn()
	if err != nil {
		return err
	}
	defer conn.Shutdown()

	srcIoCtx, err := conn.OpenIOContext(pool)
	if err != nil {
		return err
	}
	defer srcIoCtx.Destroy()

	rbdImg, err := rbd.OpenImageReadOnly(srcIoCtx, srcImg, rbd.NoSnapshot)
	if err != nil {
		return err
	}
	defer rbdImg.Close()

	return rbdImg.Copy(srcIoCtx, dstImg)
}

To show that it's not anything I'm doing in the Go runtime, here is the top of the pprof heap profile:

Showing top 10 nodes out of 28
      flat  flat%   sum%        cum   cum%
         0     0%     0%  1227.97kB 54.04%  github.com/valyala/fasthttp.init
 1227.97kB 54.04% 54.04%  1227.97kB 54.04%  github.com/valyala/fasthttp/stackless.NewFunc
         0     0% 54.04%  1227.97kB 54.04%  runtime.doInit
         0     0% 54.04%  1227.97kB 54.04%  runtime.main

If this is a memory leak, what more can I do to help gather useful information?

Using Go 1.20 with go-ceph v0.22.0 on a local Quincy test cluster.

@shell-skrimp (Author)

Going to give some context for any users that run into a similar problem. Looking at the code above, you can see how I handle closing the connection (via defer). For some reason that does not free up resources. I decided to use heaptrack and, lo and behold, rados_create2 leaks (see screenshot).

[heaptrack screenshot: rados_create2 leak]

If I drop the defer and instead call Destroy/Shutdown directly where needed (on error, and before exit), there is no memory leak:

ioctx.Destroy()
conn.Shutdown()

It's really weird; it's as though the deferred calls to conn.Shutdown() or ioctx.Destroy() never get executed.
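For illustration, the non-deferred variant of the CopyImage function above might look roughly like the following; this is a sketch reconstructed from the description, and the exact placement of the cleanup calls is an assumption:

func CopyImageExplicit(pool, srcImg, dstImg string) error {
	conn, err := GetCephConn()
	if err != nil {
		return err
	}

	srcIoCtx, err := conn.OpenIOContext(pool)
	if err != nil {
		conn.Shutdown()
		return err
	}

	rbdImg, err := rbd.OpenImageReadOnly(srcIoCtx, srcImg, rbd.NoSnapshot)
	if err != nil {
		srcIoCtx.Destroy()
		conn.Shutdown()
		return err
	}

	copyErr := rbdImg.Copy(srcIoCtx, dstImg)

	// Explicit cleanup instead of defer, in reverse order of acquisition.
	rbdImg.Close()
	srcIoCtx.Destroy()
	conn.Shutdown()

	return copyErr
}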

@shell-skrimp (Author)

@phlogistonjohn go-ceph indeed has a memory leak with how Shutdown is called. See above code and golang/go#43363 (comment).

If you return a *rados.Conn and then immediately defer conn.Shutdown(), it will not be evaluated properly and will therefore leak.

I have worked around this by just calling Shutdown()/Destroy() at the end of whatever it is I'm doing. However, it may also be possible to defer func() { conn.Shutdown() }() to keep the code more idiomatic.
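The closure-based defer mentioned here would look roughly like this; a sketch of mine rather than code from the issue, and whether it actually changes the leak behaviour is exactly what is being debated:

func CopyImageDeferClosure(pool, srcImg, dstImg string) error {
	conn, err := GetCephConn()
	if err != nil {
		return err
	}
	// Wrap the call in a closure so conn.Shutdown() is resolved when the
	// surrounding function returns, rather than binding the method value
	// at the point the defer statement is executed.
	defer func() { conn.Shutdown() }()

	ioctx, err := conn.OpenIOContext(pool)
	if err != nil {
		return err
	}
	defer func() { ioctx.Destroy() }()

	img, err := rbd.OpenImageReadOnly(ioctx, srcImg, rbd.NoSnapshot)
	if err != nil {
		return err
	}
	defer func() { _ = img.Close() }()

	return img.Copy(ioctx, dstImg)
}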

Since the defer is evaluated at defer time, the freeConn(c) and/or ensureConnected calls must be buggy in some way.

Basically, the connection I return after conn.Connect() isn't "complete" enough for defer conn.Shutdown() to properly reap resources. I might experiment with it more, but if this is done rapidly in a goroutine the connections can leak.

@phlogistonjohn (Collaborator) commented Jul 24, 2023

OK, thanks for the update. Without having done much investigation on my part yet, an issue with Shutdown seems more plausible to me. I'm reopening this issue since it was automatically closed by the other PR. We'll look into it soon.

@shell-skrimp (Author)

It's very hard to reproduce. Heaptrack indicates it affects only about 6% of connections. I tried a custom program that just did these checks endlessly and it never leaked. In the program where I originally discovered this, it takes 10+ hours for the leak to occur. I also made my own changes to go-ceph: I tried removing runtime.SetFinalizer and tried freeing directly if conn.cluster != nil, and it still leaked.

@ansiwen (Collaborator) commented Aug 3, 2023

It's very hard to reproduce. Heaptrack indicates it affects only about 6% of connections. I tried a custom program that just did these checks endlessly and it never leaked. In the program where I originally discovered this, it takes 10+ hours for the leak to occur. I also made my own changes to go-ceph: I tried removing runtime.SetFinalizer and tried freeing directly if conn.cluster != nil, and it still leaked.

@shell-skrimp So, I'm a bit confused now. Which of your findings from above (defer vs. direct call, etc.) are still valid? It now sounds like there is a leak no matter what. To be honest, it even sounds like there might be a race within Ceph itself. But I just started to look into this, so it's just a gut feeling.

@shell-skrimp (Author)

@ansiwen Neither is still valid. I thought that calling directly was better than defer because initial testing showed no memory leak, but in the end there was still a leak; it just took thousands of new connections to the ceph cluster to reproduce it.

What I did in my testing:

  • Try removal of runtime.SetFinalizer; no difference.
  • Try removal of defer and directly free/close/destroy; no difference.
  • Change Shutdown to something like if c.cluster != nil { c.rados_shutdown(...) } (going by memory on this one); no difference.

In the meantime I switched to a long-lived ceph connection, and that seems to have fixed the issue for now.
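A long-lived connection along those lines might look roughly like the sketch below; the sync.Once wrapper and the function names are my own illustration (assuming the standard sync package is imported), not code from this report:

var (
	cephConnOnce sync.Once
	cephConn     *rados.Conn
	cephConnErr  error
)

// SharedCephConn returns a single process-wide connection that is created
// once and reused, instead of opening and shutting down a connection per copy.
func SharedCephConn() (*rados.Conn, error) {
	cephConnOnce.Do(func() {
		cephConn, cephConnErr = GetCephConn()
	})
	return cephConn, cephConnErr
}

func CopyImageShared(pool, srcImg, dstImg string) error {
	conn, err := SharedCephConn()
	if err != nil {
		return err
	}

	ioctx, err := conn.OpenIOContext(pool)
	if err != nil {
		return err
	}
	defer ioctx.Destroy()

	img, err := rbd.OpenImageReadOnly(ioctx, srcImg, rbd.NoSnapshot)
	if err != nil {
		return err
	}
	defer img.Close()

	// The connection is intentionally not shut down here; it lives for
	// the lifetime of the process.
	return img.Copy(ioctx, dstImg)
}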
