Fluentd json parser to support multiple timestamp format for same set of logs #3248

dontreboot · 2021-02-11T01:46:51Z

Is your feature request related to a problem? Please describe.

Fluentd's JSON parser supports only one timestamp format per filter rule. I am using Fluentd to parse Kubernetes Docker logs, which use different timestamp formats: some are in Unix time and others are in RFC 3339. To support this use case, I have to define separate filters and specify a different time format in each for those JSON logs.

Describe the solution you'd like
A Fluentd JSON parser plugin that accepts multiple timestamp formats and uses them to parse JSON logs. There are probably better ways to do it, as trying several formats doesn't sound very efficient.
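To make the request concrete, here is a minimal Python sketch of the "try each format in order" idea. It is not Fluentd code: the helper name `parse_time`, the `"unixtime"` marker, and the default format list are assumptions made up for illustration.

```python
from datetime import datetime, timezone


def parse_time(value, formats=("unixtime", "%Y-%m-%dT%H:%M:%S%z")):
    """Hypothetical helper: try each format in order, return the first match as UTC."""
    for fmt in formats:
        try:
            if fmt == "unixtime":
                # Accepts ints, floats, or numeric strings (seconds since the epoch).
                return datetime.fromtimestamp(float(value), tz=timezone.utc)
            # On Python 3.7+, %z also accepts a trailing "Z" as UTC.
            return datetime.strptime(str(value), fmt).astimezone(timezone.utc)
        except (ValueError, TypeError, OverflowError, OSError):
            continue  # this format did not match, fall through to the next one
    raise ValueError(f"no format matched: {value!r}")


if __name__ == "__main__":
    print(parse_time(1613008011))              # Unix time
    print(parse_time("2021-02-11T01:46:51Z"))  # RFC 3339 style
```

Every record that misses the first format pays for one failed attempt before the next format is tried, which is where the efficiency concern comes from.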

Describe alternatives you've considered
I use multiple filters to specify different timestamp formats.

<source>
  @type http
  port 9880
  bind 0.0.0.0
</source>

<filter vault.**>
  @type parser
  <parse>
    @type json
    json_parser json
    time_type string
    time_format "%Y-%m-%dT%H:%M:%S"
  </parse>
  key_name log
  hash_value_field json_log
  replace_invalid_sequence true
  emit_invalid_record_to_error false
  remove_key_name_field true
  reserve_data true
</filter>

<filter **>
  @type parser
  <parse>
    @type json
    json_parser json
  </parse>
  key_name log
  hash_value_field json_log
  replace_invalid_sequence true
  emit_invalid_record_to_error true
  remove_key_name_field true
  reserve_data true
</filter>

<match **>
  @type stdout
</match>


kenhys · 2021-02-16T06:18:50Z

It may be possible to switch parsers, but I assume there would be a performance penalty (just a guess).
I'm thinking about this feature.

kenhys · 2021-02-17T07:46:35Z

@dontreboot

I've implemented a PoC PR. How about something like this? #3252

dontreboot · 2021-02-22T21:30:35Z

@kenhys Thanks for the PoC PR! That looks promising. Any idea how big the performance hit will be with time_format_fallbacks, assuming the simplest case with just two timestamp formats? I understand there are other variables that could affect performance, but I think a ballpark number would be useful. I guess the worst case will be about 2x slower, if logs always end up matching the fallback timestamp format.

cosmo0920 · 2021-03-02T01:54:45Z

To get a ballpark number for the performance question above, it should be enough to run a tiny benchmark with a benchmark library, like the following?
#3095 (comment)
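As a rough illustration of such a micro-benchmark, the sketch below uses Python's `timeit` as a stand-in for the benchmark library mentioned above. It does not measure Fluentd itself: `parse_with_fallback`, `bench`, and the `FALLBACK` format are hypothetical, and only `PRIMARY` comes from the filter in the issue. It compares a value that matches the first format with one that only matches the fallback.

```python
import timeit
from datetime import datetime

PRIMARY = "%Y-%m-%dT%H:%M:%S"      # time_format from the filter in the issue
FALLBACK = "%Y-%m-%dT%H:%M:%S%z"   # assumed second format, for illustration only


def parse_with_fallback(value, formats):
    """Hypothetical helper: return the first successful strptime result."""
    for fmt in formats:
        try:
            return datetime.strptime(value, fmt)
        except ValueError:
            continue  # format mismatch, try the next one
    raise ValueError(f"no format matched: {value!r}")


def bench(value, formats, number=20000):
    """Total seconds spent parsing `value` `number` times."""
    return timeit.timeit(lambda: parse_with_fallback(value, formats), number=number)


if __name__ == "__main__":
    formats = (PRIMARY, FALLBACK)
    hit = bench("2021-02-11T01:46:51", formats)    # matches the primary format
    miss = bench("2021-02-11T01:46:51Z", formats)  # only matches the fallback
    print(f"primary hit:  {hit:.3f}s")
    print(f"fallback hit: {miss:.3f}s (ratio {miss / hit:.2f}x)")
```

The ratio printed here reflects Python's `strptime` and exception handling, so it only hints at the shape of the cost. Actual numbers for time_format_fallbacks would have to come from a benchmark against Fluentd.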

kenhys added feature request enhancement and removed feature request labels Feb 15, 2021

kenhys mentioned this issue Feb 17, 2021

Support multiple kind of timestamp format #3252

Merged

cosmo0920 closed this as completed in #3252 Mar 8, 2021
