-
Notifications
You must be signed in to change notification settings - Fork 587
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
make fd_limit an env var to let operater increase beyond 65k default #10963
base: master
Are you sure you want to change the base?
Conversation
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #10963 +/- ##
==========================================
+ Coverage 71.49% 71.52% +0.02%
==========================================
Files 758 758
Lines 152276 152487 +211
Branches 152276 152487 +211
==========================================
+ Hits 108868 109059 +191
- Misses 38899 38918 +19
- Partials 4509 4510 +1
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The idea is fine for me even though I would consider it a "use at own risk" kind of a config.
On implementation side it should be done as part of config, not an environment variable. Please change the code to read the value from config instead. You may need to move the code a bit further away from main but this is fine.
cc @posvyatokum FYI in case you have any comments, if not I'll approve once the comments are addressed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree with Wac, that this limit should come from
nearcore/core/store/src/config.rs
Line 10 in 2c13078
pub struct StoreConfig { |
Otherwise, no objections
My opinion is that the node shouldn't need to bother with this at all, since this limit is ambient and can/should be set by the actual environment (e.g. systemd) My understanding of this rlimit setter was effectively to circumvent issues arising from such ambient state misconfiguration by setting a known-good value. If we aren't able to come up with such a value correctly, then we have no business trying to modify this state at all. At the very least we should read out the current limit and not reduce it if it's actually higher than our "reasonable limit". |
This PR introduces the ability to customize the file descriptor (FD) limit for the neard process through an environment variable. Currently, neard has a hardcoded file descriptor hard limit of 65,000. While this default setting accommodates typical operation, edge cases such as the resharding event have demonstrated that archival nodes can exhaust this limit due to the opening of multiple RocksDB instances, leading to errors from too many open files.
Motivation
During intensive operations like resharding, archival nodes encounter the FD hard limit, resulting in failure due to "too many open files" errors. Providing operators with the flexibility to adjust the FD limit as needed based on their configuration and operational demands will enhance the stability and adaptability of neard.
Changes
Environment Variable FD_LIMIT: Operators can now specify an FD_LIMIT environment variable to set the file descriptor limit, offering a way to increase it beyond the default 65,000 when necessary.
Default Behavior: In the absence of this environment variable, the file descriptor limit will remain at the default setting of 65,000. This ensures backward compatibility and maintains current performance expectations under standard operational conditions.