Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[exporter/splunkhec] Add the ability to specify a Failover Endpoint and Enable a Circuit Breaker #23822

Conversation

cparkins
Copy link
Contributor

Description:
Add the ability to specify an alternative Splunk HEC Endpoint when the primary endpoint is failing. In addition, if failures are occurring on a regular basis allow each endpoint to be disabled with a Circuit Breaker pattern.

Link to tracking Issue:
23821

Testing:
Testing locally by setting a primary endpoint that would always fail and a secondary that would fail or succeed. Working on testing end-to-end in Test Environment.

Documentation:
Not completed yet.

@cparkins cparkins requested review from a team, atoulme and dmitryax as code owners June 29, 2023 00:06
@linux-foundation-easycla
Copy link

CLA Missing ID CLA Not Signed

@cparkins cparkins force-pushed the feature/splunkhecexporter/failover-circuit-breaker branch from 64d00ff to c80026c Compare June 29, 2023 00:12
@cparkins
Copy link
Contributor Author

Code rebased to resolve CLA issue with older commits.

@atoulme
Copy link
Contributor

atoulme commented Jun 29, 2023

@cparkins I can say right now that while this is an awesome idea and a great contribution, I loathe turning this code which already feels like a lot of spaghetti into something even more complex.

I will take a good look once tests pass but it's not something I'm looking forward to support, and I have posted comments on the issue with my thoughts.

@cparkins
Copy link
Contributor Author

cparkins commented Jun 29, 2023

@cparkins I can say right now that while this is an awesome idea and a great contribution, I loathe turning this code which already feels like a lot of spaghetti into something even more complex.

@atoulme Fortunately the implementation is pretty simple. I'd love to see this turn into something more like a failover connector but I don't really know how to code that yet. My first implementation was done in the core repository but I couldn't get it to work 👎

I will take a good look once tests pass but it's not something I'm looking forward to support, and I have posted comments on the issue with my thoughts.

I've responded to all of your comments. Hopefully with some explanation. I think adding resiliency feature is worthwhile but there may be better ways to implement some or all of what we are trying to do. I have personally rebased this code about 4 times since 0.74 and can understand your trepidation in making this exporter even more complex than it already is.

@atoulme
Copy link
Contributor

atoulme commented Jun 29, 2023

No worries. It will help going forward if you can open issues ahead of time, as we have changes we're also considering to help with batching that may affect the design here.

@cparkins
Copy link
Contributor Author

No worries. It will help going forward if you can open issues ahead of time, as we have changes we're also considering to help with batching that may affect the design here.

@atoulme I thought there was one, but it was related to the failover connector. This code has been something that I've been working on for quite some time and we have tested in previous versions of the exporter. If the batching changes again it would probably be similar to the last update. Unfortunately this solution seems to require a second buffer to send the failover request without destroying the payload.

if failoverBuf != nil {
_, failoverErr := failoverBuf.Write(b)
if failoverErr != nil {
c.logger.Info("Error writing failover data", zap.Error(failoverErr))
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will remove these log entries once we are done with end-to-end testing.

}
permanentErrors = append(permanentErrors, consumererror.NewPermanent(fmt.Errorf(
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was changed to match how Traces and Logs were handling these errors.

- Allow Failover URI to be configured and used in case of a failed request.
- Allow a Circuit Breaker to be configured and used for the primary and failover url.
@cparkins cparkins force-pushed the feature/splunkhecexporter/failover-circuit-breaker branch from 54c8278 to 44693e8 Compare July 6, 2023 17:43
@cparkins
Copy link
Contributor Author

cparkins commented Jul 6, 2023

@atoulme I think this is finally ready for review!

@github-actions
Copy link
Contributor

This PR was marked stale due to lack of activity. It will be closed in 14 days.

@github-actions github-actions bot added the Stale label Jul 21, 2023
@dmitryax dmitryax removed the Stale label Jul 21, 2023
@cparkins
Copy link
Contributor Author

@atoulme Tests are now passing, any chance I can get a review?

@atoulme atoulme added the discussion needed Community discussion needed label Aug 4, 2023
@github-actions
Copy link
Contributor

This PR was marked stale due to lack of activity. It will be closed in 14 days.

@github-actions github-actions bot added the Stale label Aug 19, 2023
@github-actions
Copy link
Contributor

github-actions bot commented Sep 3, 2023

Closed as inactive. Feel free to reopen if this PR is still being worked on.

@github-actions github-actions bot closed this Sep 3, 2023
@sethallen
Copy link

I'm curious what's next here @atoulme ?

I see that "discussion needed" was labeled on this PR and that the originating issue 23821 is on it's way to being closed in about 45 days due to being stale as well. How does this discussion proceed? Can I help with something here? Thanks!

@atoulme
Copy link
Contributor

atoulme commented Sep 8, 2023

I didn't have time to review the changes and the PR lapsed.
I'd much rather we follow up with the failover connector discussed here: #20766

Anyone can bring this up in a SIG meeting - hence the "discussion needed" label.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants