New blob storage scheme avoiding large base dir count #13884

Draft
wants to merge 1 commit into base: develop from


@kasey (Contributor) commented Apr 16, 2024

What type of PR is this?

Bug fix

What does this PR do? Why is it needed?

This PR changes the naming scheme for blob file storage. Currently the scheme is a single blobs directory containing one subdirectory per unique block root, within which blobs are stored by their index number, e.g.:

blobs/0xff09895e2ff6232ac1ef6f08b1940e0177aaed5d8b682a30574a1cbd3ec6b487/0.ssz

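The problem with this approach is that every block root gets its own subdirectory directly under blobs, so the blobs directory accumulates a very large number of entries. Older filesystems cap the number of subdirectories in a single directory (ext4 without the dir_nlink feature allows at most 65,000 links per directory, i.e. 64,998 subdirectories), and once that limit is reached the node can no longer store new blobs and stops syncing. With this PR, an extra level of subdirectories groups block root directories by the first byte of the root, e.g. the directory: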

blobs/0xff09895e2ff6232ac1ef6f08b1940e0177aaed5d8b682a30574a1cbd3ec6b487

is renamed to:

blobs/0xff/0xff09895e2ff6232ac1ef6f08b1940e0177aaed5d8b682a30574a1cbd3ec6b487
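As a minimal sketch of the two layouts, the Go snippet below builds the legacy and nested paths for a single blob. The helper names legacyBlobPath and nestedBlobPath are hypothetical, and the sketch assumes roots are 0x-prefixed hex strings as in the examples above. Because the prefix is a single byte, the blobs directory holds at most 256 prefix directories, and each prefix directory holds roughly 1/256 of the block root directories.

```go
package main

import (
	"fmt"
	"path/filepath"
)

// legacyBlobPath builds the old flat layout: <base>/<root>/<index>.ssz.
// Hypothetical helper; rootHex is assumed to be a 0x-prefixed hex block root.
func legacyBlobPath(base, rootHex string, index uint64) string {
	return filepath.Join(base, rootHex, fmt.Sprintf("%d.ssz", index))
}

// nestedBlobPath builds the grouped layout: <base>/<prefix>/<root>/<index>.ssz,
// where prefix is "0x" plus the first two hex characters (the root's first byte).
// Hypothetical helper; it panics if rootHex is shorter than 4 characters.
func nestedBlobPath(base, rootHex string, index uint64) string {
	prefix := rootHex[:4] // e.g. "0xff"
	return filepath.Join(base, prefix, rootHex, fmt.Sprintf("%d.ssz", index))
}

func main() {
	root := "0xff09895e2ff6232ac1ef6f08b1940e0177aaed5d8b682a30574a1cbd3ec6b487"
	fmt.Println(legacyBlobPath("blobs", root, 0)) // blobs/0xff0989.../0.ssz
	fmt.Println(nestedBlobPath("blobs", root, 0)) // blobs/0xff/0xff0989.../0.ssz
}
```

Only the directory structure changes; the per-index .ssz file names inside each root directory stay the same.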

To minimize changes to the storage code and reduce the risk of new bugs, we perform a one-time migration of the legacy layout during the blob cache warm-up, which runs at node startup. For each subdirectory in the old format, the migration creates the appropriate enclosing directory (e.g. 0xff in the example above) and calls Rename to move the existing subdirectory into it. On most systems this should be a single atomic rename syscall, and it should be cheap because only directory entries change; no blob data is copied.

Which issue(s) does this PR fix?

Fixes #13880

Other notes for review

We're keeping this as a draft PR while we assess whether it makes sense to release before a minor version bump. We would prefer to wait for a version bump because this change is not backwards compatible: nodes running previous releases won't understand the new directory structure. The dir_nlink ext4 feature flag appears to be enabled by default on modern ext4 systems, so the subdirectory limit should only affect users running older kernels.

@kasey force-pushed the nested-blobs-dir branch 4 times, most recently from dd35864 to 33be337 on April 17, 2024 19:34
Successfully merging this pull request may close these issues:

Beacon node stops syncing because >64998 subdirectories in blobs folder