chore(splunk hec sink)!: json encoding should get message first and then parse #20297

Draft · wants to merge 2 commits into base: master

Conversation


@pront pront commented Apr 12, 2024

No description provided.

@pront pront requested a review from a team as a code owner April 12, 2024 23:19
@pront pront requested a review from jszwedko April 12, 2024 23:19
@github-actions github-actions bot added the domain: sinks Anything related to the Vector's sinks label Apr 12, 2024

pront commented Apr 12, 2024

Putting this out there to potentially get some feedback, I will revisit on Monday.


datadog-vectordotdev bot commented Apr 12, 2024

Datadog Report

Branch report: pront/OPA-1526_splunk_sink_enc_fix
Commit report: dbde866
Test service: vector

❌ 11 Failed (0 Known Flaky), 2146 Passed, 0 Skipped, 19m 46.3s Wall Time

❌ Failed Tests (11)

This report shows up to 5 failed tests.

  • sinks::humio::metrics::tests::multi_value_tags - vector
  • sinks::humio::metrics::tests::smoke_json - vector
  • sinks::splunk_hec::logs::integration_tests::splunk_auto_extracted_timestamp - vector
  • sinks::splunk_hec::logs::integration_tests::splunk_configure_hostname - vector
  • sinks::splunk_hec::logs::integration_tests::splunk_custom_fields - vector

.ok()?,
)
} else {
encoder.encode(event, &mut bytes).ok()?;
@pront pront Apr 14, 2024

🗒️ Encoders call get_message() internally.

.map_err(|error| emit!(SplunkEventEncodeError { error }))
.ok()?,
)
let hec_event = if let Some(message) = event.as_log().get_message() {
@pront pront Apr 15, 2024
TODO: This is expected to break all integration tests that use JsonSerializerConfig. For example, in splunk_custom_fields, looking up by message is no longer correct; the code needs to be updated to access the value directly from the event.

Also, all of these tests rely on the legacy namespace. I might go ahead and use the Vector namespace.
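Sketched in simplified form, the flow under discussion looks roughly like this. The types below are hypothetical stand-ins for illustration, not Vector's actual LogEvent/HecEvent definitions:

```rust
use std::collections::BTreeMap;

// Simplified stand-ins for Vector's types, for illustration only.
#[derive(Debug, PartialEq)]
enum HecEvent {
    // Value that ends up under the payload's `event` key.
    Json(String),
    // Plain-text fallback.
    Text(String),
}

// Proposed flow: look up the semantic `message` first and encode only
// that value; if it is absent, publish an event with an empty message
// (as described in the changelog entry below in this PR).
fn to_hec_event(log: &BTreeMap<String, String>) -> HecEvent {
    match log.get("message") {
        Some(message) => HecEvent::Json(message.clone()),
        None => HecEvent::Text(String::new()),
    }
}
```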

Contributor Author

cc @vladimir-dd and @jszwedko for confirming before I go ahead and edit all those.

Contributor

I think we should move all the other fields from LogEvent to HecData.fields; otherwise we could lose some fields.

Contributor

Sorry, I didn't notice that we pass fields through metadata.fields. But since we do not expose indexed_fields for the sink, do we always pass an empty fields? Is this expected behaviour? 🤔

Member

I feel like I'm getting a bit turned around by this now 😵

I think I'd talked to Pavlos about only encoding the message field, but, thinking about it more now, the field in the HEC payload is actually called event which makes me think maybe it should be encoding the entire event and not just the message key. Their definition of event:

Event data can be assigned to the "event" key within the JSON object in the HTTP request, or it can be raw text. The "event" key is at the same level within the JSON event packet as the metadata keys. Within the "event" key-value curly brackets, the data can be in whatever format you want: a string, a number, another JSON object, and so on.

You can batch multiple events in one event packet by combining them within the request. By batching the events, you're specifying that any event metadata within the request is to apply to all of the events contained in the request. Batching events can significantly speed performance when you need to index large quantities of data.

https://docs.splunk.com/Documentation/Splunk/9.2.1/Data/FormateventsforHTTPEventCollector#Event_data:~:text=from%20the%20request.-,Event%20data,-Event%20data%20can

fields is defined as:

The fields key isn't applicable to raw data. This key specifies a JSON object that contains a flat (not nested) list of explicit custom fields to be defined at index time. Requests containing the "fields" property must be sent to the /collector/event endpoint, or else they aren't indexed. For more information, see Indexed field extractions.
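Putting the two quoted definitions together, a minimal /collector/event payload looks roughly like this (field names follow the Splunk docs quoted above; the values are illustrative):

```json
{
  "time": 1712968761,
  "host": "web-01",
  "source": "vector",
  "sourcetype": "_json",
  "event": {
    "message": "connection accepted",
    "status": 200
  },
  "fields": {
    "env": "prod",
    "region": "us-east-1"
  }
}
```

Note that `event` sits at the same level as the metadata keys and can hold any JSON value, while `fields` must stay a flat object of indexed custom fields.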

I agree with @vladimir-dd that if we only encode message as event that maybe the rest of the fields should go into fields; however, now I'm not so sure that the entire event shouldn't just go in event.

This might be a case where we need to play around with it a bit and see how it looks in the Splunk interface under various conditions (like Splunk HEC -> Splunk and Syslog -> Splunk).

Apologies for what feels like going in circles a bit here. I'm happy to chat about this if it helps resolve it more quickly.

Contributor Author

Marking it as a draft for now. It's not urgent to get to the bottom of this given we have other priorities.

@sebastiantia sebastiantia left a comment

Didn't mean to approve this PR 😓

@sebastiantia sebastiantia self-requested a review April 15, 2024 21:52

pront commented Apr 15, 2024

Didn't mean to approve this PR 😓

I think this will be blocked until you re-approve. I suggest re-approving 😅
I won't merge until Vlad and Jesse take a look.

} else {
encoder.encode(event, &mut bytes).ok()?;
HecEvent::Text(String::from_utf8_lossy(&bytes))
HecEvent::Text("".into())
Contributor

I think we should use the same logic as above here to avoid losing information - but instead of encoding message, encode the whole event (either as JSON or as text)?
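A hypothetical sketch of that fallback, using a simplified stand-in event type (not Vector's actual LogEvent), which serializes the whole event as text rather than emitting an empty string:

```rust
use std::collections::BTreeMap;

// Hypothetical fallback: when there is no semantic `message`, render
// every key/value pair of the (simplified) event as text so no
// information is dropped. `key=value` formatting is illustrative.
fn whole_event_as_text(event: &BTreeMap<String, String>) -> String {
    event
        .iter()
        .map(|(key, value)| format!("{key}={value}"))
        .collect::<Vec<_>>()
        .join(" ")
}
```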

@vladimir-dd vladimir-dd left a comment

Left some comments.

@@ -525,10 +525,7 @@ async fn splunk_auto_extracted_timestamp() {
// Splunk will determine the timestamp from the *message* field in the event.
// Thus, we expect the `timestamp` field to still be present.
assert_eq!(
Contributor Author

🗒️ cargo vdev -v int start -a splunk works on Ubuntu. Also, when working on Ubuntu, cargo vdev -v int test --retries 2 -a splunk fails to run the tests.

@@ -525,10 +525,7 @@ async fn splunk_auto_extracted_timestamp() {
// Splunk will determine the timestamp from the *message* field in the event.
Contributor Author

General comment for these tests, they all use the Legacy namespace.

@jszwedko jszwedko changed the title chore!(splunk hec sink): json encoding should get message first and then parse chore(splunk hec sink)!: json encoding should get message first and then parse Apr 16, 2024
@@ -0,0 +1,2 @@
The Splunk HEC sink now uses the `message` semantic meaning to retrieve the relevant value and encodes only that value. Note that if the retrieved value is `None`, an event with an empty message will be published.
Member

A few questions:

  • Do we know how Splunk reacts if you send it an event with an empty event field?
  • Is the semantic field required? I.e. will Vector fail to boot if there is no semantic message defined?
  • Also, is this only true for the /event endpoint and not for /raw?


@neuronull (Contributor)

👋
Noting that we elected to hold off on this for the time being.
We can revisit it if needed next week when Pavlos returns.

Thanks for all the input @jszwedko 🙇

@pront pront marked this pull request as draft April 22, 2024 14:22
Labels
domain: sinks Anything related to the Vector's sinks