Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add syslog batching poc implementation #491

Open
wants to merge 6 commits into
base: main
Choose a base branch
from

Conversation

nicklas-dohrn
Copy link

@nicklas-dohrn nicklas-dohrn commented Feb 13, 2024

Description

This is our proposal to implement syslog batching for sending logs via https.
it includes a switch between the normal syslog one log per request mode via a syslog query parameter.
This can be done with the query parameter batching=true:

https://<your-drain-url>/syslog?batching=true

If you enable the syslog batching behaviour, it will currently write syslogbatches, where single messages are newline delimited (\n).
Currently, the batch sizes are hardwired to be around 256kb, which is already sufficient for speeding up throughput by a factor of 10x at least.
making it configurable would be an option, but I did not see the need so far.
please let me know what you think of the current approach.

Copy link

linux-foundation-easycla bot commented Feb 13, 2024

CLA Not Signed

@nicklas-dohrn nicklas-dohrn marked this pull request as ready for review April 4, 2024 05:19
@nicklas-dohrn nicklas-dohrn requested a review from a team as a code owner April 4, 2024 05:19
Copy link
Member

@ctlong ctlong left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general, it looks fine to me. I don't seem where it adds the newline character to delimit between syslog lines though...

Would love to see a demo at the next ARP WG meeting.

(I sort of disregarded that this was a POC at points in time and some of my comments are more implementation-focused, sorry about that 😅 )

src/pkg/egress/syslog/https.go Outdated Show resolved Hide resolved
src/pkg/egress/syslog/https.go Outdated Show resolved Hide resolved
src/pkg/egress/syslog/https.go Outdated Show resolved Hide resolved
src/pkg/egress/syslog/https.go Outdated Show resolved Hide resolved
}

if w.sendTimer == nil {
w.sendTimer = time.AfterFunc(w.sendInterval, func() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a race here if the send is happening in its own goroutine?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, unfortunately, there where some race conditions, when the batch was still sending, so the timer and the batch need to be cleared out as soon as possible.
will look into if there is more here

@nicklas-dohrn
Copy link
Author

The newline is already part of the syslog messages, so these are added already by a method beforhand (linked for anyone curious):
msg := appendNewline(removeNulls(env.GetLog().Payload))

this is true for all possible syslog messages, so I do not even need to add this, which is really convenient.

This addresses all the comments by @ctlong.
It fixes the unneded if else branching for adding to the message
batch which is just not needed anymore.
It fixes the egressMetric to behave similar to the single message
implementation, to not count erroneous logs.
@nicklas-dohrn
Copy link
Author

Adressed all the comments and additions by @ctlong above, if sufficient, please close the threads :)

The refactor is mainly reshuffeling

The new timer implementation makes it more clear what the actual logic
is, and might also prevent some unresolvable states.
It now only has two states:
- Running if a batch is not yet full or time triggered
- Not running if there was a batch send either through a time or a size
based trigger
Copy link
Member

@ctlong ctlong left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've still some specific concerns, which I've left as comments in this review.

In general, the implementation looks fine, though I'm not sure that I understand the necessity of the new TriggerTimer struct.

src/pkg/egress/syslog/triggerTimer.go Outdated Show resolved Hide resolved
src/pkg/egress/syslog/https.go Outdated Show resolved Hide resolved
@ctlong
Copy link
Member

ctlong commented Apr 17, 2024

@nicklas-dohrn can you please sign the CLA. We can't merge this unless you've done so.

@chombium
Copy link
Contributor

@ctlong I will take care about the CLA. @nicklas-dohrn has to be added to one of our GitHub orgs.

This is a new approach to switch between http and http batching.
It only is different in this regard from the previous attempts,
and only contains refactorings besides this change.
Copy link
Member

@ctlong ctlong left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Conceptually, I think this proof of concept is correct. Implementation-wise the timer still has some issues.

Once those are fixed, I would suggest rebasing this off #573 and testing the two changes together to see if it achieves the throughput you want. Then we're all ready for a real implementation (with tests).

🙏 Could you please also update the PR description, thanks.

src/pkg/egress/syslog/https_batch.go Outdated Show resolved Hide resolved
src/pkg/egress/syslog/https_batch.go Outdated Show resolved Hide resolved
src/pkg/egress/syslog/https_batch.go Outdated Show resolved Hide resolved
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: Waiting for Changes | Open for Contribution
Development

Successfully merging this pull request may close these issues.

None yet

4 participants