Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

vmagent: remote write respect Retry-After in header #6124

Open
wants to merge 4 commits into
base: master
Choose a base branch
from

Conversation

jiekun
Copy link
Contributor

@jiekun jiekun commented Apr 17, 2024

Describe Your Changes

related issue: #6097

Changed

  • Remote write retry policy in vmagent is changed into:
    1. Calculate next retry duration by the backoff policy (x2) and the max retry duration limit (1m). (Current situation)
    2. If Retry-After exists in the response header, use max(Retry-After duration, next retry duration), which may exceed the max retry duration limit (1m). (Added in this PR)

Docs

  • CHANGELOG.md.

Note regarding backoff and Retry-After

  • If Retry-After is lower than 1s (which is the initial value of backoff policy retry), it will be ignored.
  • If Retry-After is lower than 1m, the next retry will still act like the current backoff policy, multiplied by 2 until reaching 1m.
  • If Retry-After is higher than 1m, the next retry always uses the duration retrieved from Retry-After.

This modification will reduce the number of retries when Retry-After is greater than 1m. When Retry-After is less than 1m, it will behave like the existing back-off policy (with a different initial retry duration).

Also see the test cases for more info.


Checklist

The following checks are mandatory:

  • I have read the Contributing Guidelines
  • All commits are signed and include Signed-off-by line. Use git commit -s to include Signed-off-by your commits. See this doc about how to sign your commits.
  • Tests are passing locally. Use make test to run all tests locally.
  • Linting is passing locally. Use make check-all to run all linters locally.

Further checks are optional for External Contributions:

  • Include a link to the GitHub issue in the commit message, if issue exists.

  • Mention the change in the Changelog. Explain what has changed and why. If there is a related issue or documentation change - link them as well.

    Tips for writing a good changelog message::

    • Write a human-readable changelog message that describes the problem and solution.
    • Include a link to the issue or pull request in your changelog message.
    • Use specific language identifying the fix, such as an error message, metric name, or flag name.
    • Provide a link to the relevant documentation for any new features you add or modify.
  • After your pull request is merged, please add a message to the issue with instructions for how to test the fix or try the feature you added. Here is an example

  • Do not close the original issue before the change is released. Please note, in some cases Github can automatically close the issue once PR is merged. Re-open the issue in such case.

  • If the change somehow affects public interfaces (a new flag was added or updated, or some behavior has changed) - add the corresponding change to documentation.

Copy link

codecov bot commented Apr 17, 2024

Codecov Report

Attention: Patch coverage is 84.21053% with 3 lines in your changes are missing coverage. Please review.

Project coverage is 54.84%. Comparing base (8aaa828) to head (d9e8c28).
Report is 715 commits behind head on master.

Files Patch % Lines
app/vmagent/remotewrite/client.go 84.21% 2 Missing and 1 partial ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #6124      +/-   ##
==========================================
- Coverage   60.37%   54.84%   -5.54%     
==========================================
  Files         411      594     +183     
  Lines       76609    80133    +3524     
==========================================
- Hits        46253    43948    -2305     
- Misses      27794    33149    +5355     
- Partials     2562     3036     +474     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

var retryAfterDuration time.Duration

// Retry-After could be in "Mon, 02 Jan 2006 15:04:05 GMT" format.
parsedTime, err := time.Parse(http.TimeFormat, retryAfterString)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think header parsing should be a separate function. It can be tested separately and calculateRetryDuration will be simplified signifacntly.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's now seperated from calculateRetryDuration and called in sendBlockHTTP

// testFunc evaluate if the result of calculateRetryDuration is
// 1. >= expectMinDuration
// 2. <= expectMinDuration + 10% (see timeutil.AddJitterToDuration)
testFunc := func(name string, retryAfterString string, retryDuration, expectMinDuration time.Duration) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

wdyt about emulating retryDuration persistence in this test? This would make it similar to real use case - consecutive calls to calculateRetryDuration slowly increase the retryDuration.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I love this idea, and it brings up some scenarios that I had already forgotten.

Please review the test cases in the Call calculateRetryDuration for multiple times block.


// calculateRetryAfterDuration calculate the retry duration.
// 1. Calculate next retry duration by backoff policy (x2) and max retry duration limit.
// 2. If Retry-After exists in response header, use max(Retry-After duration, next retry duration).
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

// 2. If Retry-After exists in response header, use max(Retry-After duration, next retry duration).

This function doesn't know about this header. This info needs to be placed elsewhere

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have simplified it and removed unrelated content.

@jiekun jiekun force-pushed the feature/respect-retry-after branch from cea4491 to 3a6c7bd Compare May 22, 2024 14:10
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants