cmd/load: reduce memory usage #4777

jiefenghuang · 2024-04-25T07:51:33Z

test:

meta: 31M total entries, biggest dir has 11M files
cost: loaded to badger with 14GiB memory

echo $((14 * 1024 * 1024 * 1024)) > /sys/fs/cgroup/memory/limited_mem_group/memory.limit_in_bytes
sudo cgexec -g memory:limited_mem_group ./juicefs load badger://test.badger /data/code/com/juicedata/test/dump.json.gz

Signed-off-by: jiefenghuang <jiefeng@juicedata.io>

davies · 2024-04-26T01:57:53Z

pkg/meta/dump.go

@@ -587,6 +596,11 @@ func decodeEntry(dec *json.Decoder, parent Ino, cs *DumpedCounters, parents map[
 			return nil, fmt.Errorf("decode %v: %s", name, err)
 		}
 	}
+	if decodeCounter >= gcThreshold || len(e.CompressedEntries) > gcThreshold {
+		decodeCounter = 0
+		runtime.GC()


It's better to use a memory pool for DumpedEntry and compressedAttr.
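
As an illustration of the pooling idea, here is a minimal sketch using `sync.Pool`. The `dumpedEntry` type, its fields, and the `getEntry`/`putEntry` helpers are hypothetical stand-ins, not the actual definitions in pkg/meta/dump.go; the sketch only shows the reuse pattern.

```go
package main

import (
	"fmt"
	"sync"
)

// dumpedEntry is a hypothetical stand-in for DumpedEntry.
// The fields here are illustrative only.
type dumpedEntry struct {
	Name    string
	Entries map[string]*dumpedEntry
}

// entryPool hands out reusable *dumpedEntry values so the decoder
// does not allocate a fresh struct for every entry.
var entryPool = sync.Pool{
	New: func() interface{} { return new(dumpedEntry) },
}

// getEntry takes an entry from the pool and sets its name.
func getEntry(name string) *dumpedEntry {
	e := entryPool.Get().(*dumpedEntry)
	e.Name = name
	return e
}

// putEntry clears the entry (so it holds no stale references)
// and returns it to the pool.
func putEntry(e *dumpedEntry) {
	*e = dumpedEntry{}
	entryPool.Put(e)
}

func main() {
	e := getEntry("a")
	fmt.Println(e.Name)
	putEntry(e) // only safe once nothing else references e

	e2 := getEntry("b")
	fmt.Println(e2.Name, e2.Entries == nil)
}
```

A pool reduces allocation churn, but objects that are still referenced (for example, a whole directory held in a map) cannot be returned to it, so it does not by itself bound peak memory.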

Yes, a memory pool can improve performance and reduce GC pressure.
However, testing shows that in this scenario, with 11M files in a single folder, memory keeps growing if GC is not triggered in time, which leads to OOM.

We don't need to over-optimize for this single case.

BTW, since we don't use JSON to serialize the whole tree directly, we can change the type of Entries in DumpedEntry to a slice of compressAttr.

BTW, we can optimize this after 1.2.

> We don't need to over-optimize for this single case.
>
> BTW, since we don't use JSON to serialize the whole tree directly, we can change the type of Entries in DumpedEntry to a slice of compressAttr.

``Entries map[string]*DumpedEntry `json:"entries,omitempty"` ``
This field is still necessary during the dump and load processes.

cmd/load: reduce memory usage

28b3bd5

Signed-off-by: jiefenghuang <jiefeng@juicedata.io>

jiefenghuang added the kind/enhancement label Apr 25, 2024

jiefenghuang marked this pull request as ready for review April 25, 2024 08:09

davies reviewed Apr 26, 2024

jiefenghuang marked this pull request as draft April 26, 2024 03:34

jiefenghuang commented Apr 25, 2024

davies Apr 26, 2024 • edited

jiefenghuang Apr 26, 2024

davies Apr 26, 2024

davies Apr 26, 2024

jiefenghuang Apr 29, 2024
