-
Notifications
You must be signed in to change notification settings - Fork 35.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
init: fixes file descriptor accounting #30065
base: master
Are you sure you want to change the base?
Conversation
The following sections might be updated with supplementary metadata relevant to reviewers and maintainers. Code CoverageFor detailed information about the code coverage, see the test coverage report. ReviewsSee the guideline for information on the review process.
If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update. ConflictsReviewers, this pull request conflicts with the following ones:
If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first. |
src/init.cpp
Outdated
nMaxConnections = std::max(std::min<int>(nMaxConnections, fd_max - nBind - MIN_CORE_FILEDESCRIPTORS - MAX_ADDNODE_CONNECTIONS - NUM_FDS_MESSAGE_CAPTURE), 0); | ||
if (nFD < MIN_CORE_FILEDESCRIPTORS) | ||
return InitError(_("Not enough file descriptors available.")); | ||
nMaxConnections = std::min(nFD - MIN_CORE_FILEDESCRIPTORS - MAX_ADDNODE_CONNECTIONS - NUM_FDS_MESSAGE_CAPTURE, nMaxConnections); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Notice how the latter nMaxConnections
trim is redundant once we limit nFD = std::min(FD_SETSIZE, nFD);
. Moving things around, plus accounting for nBind
(which is currently not being accounted for in master), we get:
nMaxConnections = std::max(std::min<int>(nMaxConnections, nFD - nBind - MIN_CORE_FILEDESCRIPTORS - MAX_ADDNODE_CONNECTIONS - NUM_FDS_MESSAGE_CAPTURE), 0);
nMaxConnections = std::min(nMaxConnections, nFD - nBind - MIN_CORE_FILEDESCRIPTORS - MAX_ADDNODE_CONNECTIONS - NUM_FDS_MESSAGE_CAPTURE,);
Concept ACK |
I think it is worth mentioning that the current approach is defining the minimum amount of file descriptions required without accounting for a single connection*, this means that if we are at the bare minimum, we will run but won't be able to connect to any nodes. This is consistent with the current logic in master, but I think it's not the way it should be. I'm happy to add another commit addressing this, but I've rathered start approaching this sticking to the same assumptions as master. * This actually accounts for manual connections, which may never happen, but not to any of our outbound. If we happen to not create any manuals we'd have 8 slots ( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Concept ACK
Concept ACK, I stumbled on this recently in #29415 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
src/init.cpp
Outdated
#else | ||
int fd_max = FD_SETSIZE; | ||
// Reserve enough FDs to account for the bare minimum, plus any manual connections, plus the bound interfaces | ||
int nMinRequiredFds = MIN_CORE_FDS + MAX_ADDNODE_CONNECTIONS + nBind; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
according to doc/developer-notes.md
variables should use snake_case and no need to denote the type in the variable name (n
for integer): min_required_fds
.
But more importantly, the name nMinRequiredFds
is misleading because we accept a smaller number - a bit below we check if (nFD < MIN_CORE_FDS) return error...
, so it is not "minimum required". I think this variable here should be: min_required_fds = MIN_CORE_FDS + nBind
, that check should be if (nFD < min_required_fds) return error
and then:
- nFD = RaiseFileDescriptorLimit(nMaxConnections + nMinRequiredFds);
+ nFD = RaiseFileDescriptorLimit(min_required_fds + MAX_ADDNODE_CONNECTIONS + nMaxConnections);
and
- nMaxConnections = std::max(std::min<int>(nMaxConnections, nFD - nMinRequiredFds), 0);
+ nMaxConnections = std::max(std::min<int>(nMaxConnections, available_fds - min_required_fds - MAX_ADDNODE_CONNECTIONS), 0);
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree the if check should be updated, but I don't necessarily agree with splitting MAX_ADDNODE_CONNECTIONS
from min_required_fds
.
There is the outstanding question of whether the minimum required fds to run Core should account for some of the connections. Currently, we are not accounting for any (given the check is performed against MIN_CORE_FDS
), but I think we should be accounting for at least, outbound + manuals
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, no strong opinion whether to include or exclude MAX_ADDNODE_CONNECTIONS
in min_required_fds
.
I guess better to update the check. It is funny to have code like
min_required = ...;
if (available < 10) error
it should really be
min_required = ...;
if (available < min_required) error
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've fixed that, plus increased the min_required_fds
to also account for automatic outbounds in the last commit. I'm open to discuss about the latter, but I think it's something we should also account for
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The newly added int auto_outbounds = std::min(MAX_OUTBOUND_FULL_RELAY_CONNECTIONS + MAX_BLOCK_RELAY_ONLY_CONNECTIONS + MAX_FEELER_CONNECTIONS, nUserMaxConnections);
seems to be duplicating this logic, but not exactly:
Lines 1069 to 1073 in be100cf
m_max_automatic_connections = connOptions.m_max_automatic_connections; | |
m_max_outbound_full_relay = std::min(MAX_OUTBOUND_FULL_RELAY_CONNECTIONS, m_max_automatic_connections); | |
m_max_outbound_block_relay = std::min(MAX_BLOCK_RELAY_ONLY_CONNECTIONS, m_max_automatic_connections - m_max_outbound_full_relay); | |
m_max_automatic_outbound = m_max_outbound_full_relay + m_max_outbound_block_relay + m_max_feeler; | |
m_max_inbound = std::max(0, m_max_automatic_connections - m_max_automatic_outbound); |
and I think this is getting complicated again. I would suggest to keep it simple - drop the last commit f569b06 init: fix min required fds check and account for outbounds
and only apply this change to 3982543 init: improves file descriptors accounting and docs
:
- if (available_fds < MIN_CORE_FDS)
+ if (available_fds < min_required_fds)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is duplicating part of that, given CConnman::m_max_automatic_outbound
is private.
I agree it is adding more complexity, but not accounting for it in the minimum means that could be in a situation where no single outbound connection is accounted for, so the user is basically only connected to manuals.
I acknowledge that's fairly rare, but the two reasons why we are currently not facing this issue is, first, because the minimum is fairly low, so it would need a system with really limited resources to become an issue, and second, because If we were in that situation it would also need to be that case that the node is defining manual connections, otherwise the 8 reserved slots for manual would be available for automatic.
Despite that, this feels like bad design, so I'd argue accounting for automatic is something we should do (though not necessarily as done in the current approach)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the node is connected to 8 manually defined outbound peers and 0 inbounds and 0 automatic outbounds, then it is still operational? No need to prohibit startup in this case if there is a chance that this might happen?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it though? If the user has not specified that he wants no automatic connections (-maxconnections != 0
) I'd argue he'd expect them to happen.
In any case, I think this is such an edge case that it is not worth blocking the PR on it. If you have a strong opinion on this, I'm happy to drop it and squash the rest of the commit to one of the previous ones
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am somewhat ~0 on erroring in the case of 8manual/0in/0out, but my bigger anxiety with the latest commit is that it makes it confusing again (or too complicated in my judgement). I tried to code a simple way to account for the automatic outbounds but gave up. Also, most naturally, as the "minimum required" name suggests, the check should be if (... < min_required)
. I find if (... < min_required + something more)
confusing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with the "making it more confusing". Given this is an edge case, I'll remove the outbound count and we can account for it if someone has a better way of doing so, or if someone opens an issue about it.
We are computing our file descriptors limits based on whether we use poll or select. However, we are taking that into account only partially (subtracting from fd_max in one case, but from nFD later on). Moreover, nBind is also only accounted for partially. Simplify and fix this logic
Thanks for the suggestions @vasild, I've partially covered them and commented on some of the pending ones |
🚧 At least one of the CI tasks failed. Make sure to run all tests locally, according to the Possibly this is due to a silent merge conflict (the changes in this pull request being Leave a comment here, if you need help tracking down a confusing failure. |
fe92353
to
fa42bf7
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ACK fa42bf7
This is already an improvement over master
. Would be nice to resolve #30065 (comment) as well.
8ed9261
to
f569b06
Compare
The current logic for file descriptor accounting is pretty convoluted and hard to follow. This is partially caused by the lack of documentation plus non-intuitive variable naming (which was more intuitive when fewer things were accounted for, but hasn't aged well). This has led to this accounting being error-prone and hard to maintain (as shown in he previous commit). Redefine some of the constants, plus document what are we accounting for so this can be extended more easily Remove FreeBSD workaround to bitcoin#2695
We were setting a minimum required file descriptor count but testing against a smaller value (notice this is independent from the refactor, it was already the case). Also, while automatic outbounds were accounted for in nUserMaxConnections, they were not used to check against our minimum required number of fds.
Addressed all comments now |
The current logic for file descriptor accounting is pretty convoluted and hard to follow. This is partially caused by the lack of documentation plus non-intuitive variable naming (which was more intuitive when fewer things were accounted for, but
hasn't aged well). This has led to this accounting being error-prone and hard to maintain (as shown in the first commit of this PR).
Redefine some of the constants, plus document what are we accounting for so this can be extended more easily
Fixes #18911