Report drifting close time on local node #4101

Open
wants to merge 1 commit into base: master

marta-lokhova
Contributor

potentially resolves #1815

This change reports a scenario that we frequently observe on pubnet. Often one or two validators will have a lagging clock, which causes them to adjust their close time during nomination. This change adds a new metric to track the frequency of such occurrences and warns operators if it happens consistently over 15 minutes (we want to report these things over time to avoid noise).
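To illustrate the idea of warning only when adjustments happen consistently over a window, here is a minimal standalone sketch. It is not the code from this PR: the class `ClockDriftTracker`, its methods, and the rate threshold passed to it are all hypothetical, and time is passed in explicitly rather than read from the node's clock.

```cpp
#include <chrono>
#include <cstddef>
#include <deque>

// Hypothetical sketch (not the PR's implementation): tracks how often the
// local node had to adjust its close time, and reports whether the rate of
// adjustments stayed above a threshold for a full observation window.
class ClockDriftTracker
{
  public:
    using Seconds = std::chrono::seconds;

    // `window` is the observation period (e.g. 15 minutes); `rateTrigger`
    // is an assumed per-ledger rate above which a warning is warranted.
    ClockDriftTracker(Seconds window, double rateTrigger)
        : mWindow(window), mRateTrigger(rateTrigger)
    {
    }

    // Record one triggered ledger at time `now`. `adjusted` is true when the
    // proposed close time had to be bumped past the last closed ledger's.
    void
    recordLedger(Seconds now, bool adjusted)
    {
        if (!mStarted)
        {
            mStart = now;
            mStarted = true;
        }
        mLast = now;
        mSamples.push_back({now, adjusted});
        if (adjusted)
        {
            ++mAdjusted;
        }
        // Evict samples that are older than the observation window.
        while (!mSamples.empty() && now - mSamples.front().when > mWindow)
        {
            if (mSamples.front().adjusted)
            {
                --mAdjusted;
            }
            mSamples.pop_front();
        }
    }

    // True only once a full window has been observed and the fraction of
    // adjusted ledgers inside it reaches the trigger rate.
    bool
    shouldWarn() const
    {
        if (!mStarted || mSamples.empty() || mLast - mStart < mWindow)
        {
            return false;
        }
        double rate = static_cast<double>(mAdjusted) /
                      static_cast<double>(mSamples.size());
        return rate >= mRateTrigger;
    }

  private:
    struct Sample
    {
        Seconds when;
        bool adjusted;
    };

    Seconds mWindow;
    double mRateTrigger;
    std::deque<Sample> mSamples;
    std::size_t mAdjusted{0};
    Seconds mStart{0};
    Seconds mLast{0};
    bool mStarted{false};
};
```

Requiring a full window before warning is what keeps a single late ledger from producing noise; the exact threshold would need to be tuned against real pubnet behavior.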

Contributor

SirTyson left a comment

LGTM, just a few questions.

@@ -1323,6 +1328,29 @@ HerderImpl::triggerNextLedger(uint32_t ledgerSeqToTrigger,
    if (nextCloseTime <= lcl.header.scpValue.closeTime)
    {
        nextCloseTime = lcl.header.scpValue.closeTime + 1;
        // LCL close time is when _previous_ value was nominated (normally

What is the purpose of BAD_CLOCK_PER_LEDGER_RATE_TRIGGER? Is the idea that a healthy node will occasionally drift by a second or two and be slightly behind the rest of the network? If so, why not just check that the network and local clocks are within some allowed drift window?
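For comparison, the drift-window alternative raised in this question could look roughly like the sketch below. The helper `isWithinDriftWindow` and its `maxDriftSeconds` parameter are hypothetical names used only for illustration; nothing here is taken from the PR or from stellar-core.

```cpp
#include <cstdint>

// Hypothetical helper (not part of the PR): returns true when the local
// clock and the network close time differ by at most `maxDriftSeconds`.
// The difference is computed without unsigned underflow, so a clock that is
// behind and a clock that is ahead are treated the same way.
inline bool
isWithinDriftWindow(std::uint64_t localTime, std::uint64_t networkCloseTime,
                    std::uint64_t maxDriftSeconds)
{
    std::uint64_t diff = localTime >= networkCloseTime
                             ? localTime - networkCloseTime
                             : networkCloseTime - localTime;
    return diff <= maxDriftSeconds;
}
```

A single-sample check like this is simpler than a rate over time, but it reacts to every momentary deviation, which is the kind of noise the PR description says it wants to avoid.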

@@ -1323,6 +1328,29 @@ HerderImpl::triggerNextLedger(uint32_t ledgerSeqToTrigger,
if (nextCloseTime <= lcl.header.scpValue.closeTime)

Is there any particular reason we only check if a clock is drifting behind and not drifting ahead?

Successfully merging this pull request may close these issues.

Rethink how we check whether node's time is synchronized