Fluentd json parser to support multiple timestamp format for same set of logs #3248

Closed
dontreboot opened this issue Feb 11, 2021 · 4 comments · Fixed by #3252

@dontreboot

Is your feature request related to a problem? Please describe.

The Fluentd JSON parser supports only a single timestamp format per filter rule. I am using Fluentd to parse Kubernetes Docker logs, and their timestamp formats differ: some are in Unix time format and others are in RFC 3339 format. To support this use case, I have to define separate filters explicitly and specify a different time format for each set of JSON logs.

Describe the solution you'd like
A Fluentd JSON parser plugin that accepts multiple timestamp formats and uses them to parse JSON logs. There are probably better ways to do it, as trying several formats doesn't sound very efficient.
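
To make the idea concrete, here is a minimal Ruby sketch of "try each format in order until one parses". It is only an illustration using the standard library, not Fluentd's implementation: the helper name `parse_with_fallbacks`, the `:unixtime` marker, and the exact format string are assumptions made for this example.

```ruby
require "time"

# Hypothetical helper for illustration only; this is not Fluentd's code.
# Each entry in `formats` is either :unixtime or a strptime format string.
def parse_with_fallbacks(value, formats)
  text = value.to_s
  formats.each do |fmt|
    if fmt == :unixtime
      # Accept only a plain integer number of seconds since the epoch.
      return Time.at(text.to_i).utc if text.match?(/\A\d+\z/)
    else
      begin
        return Time.strptime(text, fmt).utc
      rescue ArgumentError
        next # this format did not match; try the next one
      end
    end
  end
  raise ArgumentError, "no time format matched #{value.inspect}"
end

# Assumed formats: Unix time first, then an RFC 3339 style string.
FORMATS = [:unixtime, "%Y-%m-%dT%H:%M:%S%z"].freeze

unix_style    = parse_with_fallbacks("1613001600", FORMATS)
rfc3339_style = parse_with_fallbacks("2021-02-11T00:00:00Z", FORMATS)
```

Both calls refer to the same instant, so a single rule could cover both kinds of log line. The cost is that every record matching only the last format pays for the failed attempts before it.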

Describe alternatives you've considered
I use multiple filters to specify different timestamp formats.

<source>
  @type http
  port 9880
  bind 0.0.0.0
</source>

<filter vault.**>
  @type parser
  <parse>
    @type json
    json_parser json
    time_type string
    time_format "%Y-%m-%dT%H:%M:%S"
  </parse>
  key_name log
  hash_value_field json_log
  replace_invalid_sequence true
  emit_invalid_record_to_error false
  remove_key_name_field true
  reserve_data true
</filter>

<filter **>
  @type parser
  <parse>
    @type json
    json_parser json
  </parse>
  key_name log
  hash_value_field json_log
  replace_invalid_sequence true
  emit_invalid_record_to_error true
  remove_key_name_field true
  reserve_data true
</filter>

<match **>
  @type stdout
</match>

Additional context

@kenhys
Contributor

kenhys commented Feb 16, 2021

It may be possible to switch parsers, but I assume there would be a performance penalty (just a guess).
I'm thinking about this feature.

@kenhys
Contributor

kenhys commented Feb 17, 2021

@dontreboot

I've implemented a PoC PR. How about something like this? #3252

@dontreboot
Author

> @dontreboot
>
> I've implemented a PoC PR. How about something like this? #3252

@kenhys Thanks for the PoC PR! That looks promising. Any idea how much of a performance hit there will be with time_format_fallbacks, assuming the simplest case with just two timestamp formats? I understand there are other variables that could affect performance, but I think a ballpark number would be useful. I guess the worst case will be 2x slower, if logs always end up matching the fallback timestamp format.

@cosmo0920
Contributor

> Any idea how much of a performance hit there will be with time_format_fallbacks, assuming the simplest case with just two timestamp formats? I understand there are other variables that could affect performance, but I think a ballpark number would be useful. I guess the worst case will be 2x slower, if logs always end up matching the fallback timestamp format.

Wouldn't it be enough to run a tiny benchmark with the benchmark library, like the following?
#3095 (comment)
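
The linked comment is not reproduced here, but a tiny benchmark of this kind might look like the sketch below. It uses only Ruby's standard `benchmark` and `time` libraries and a hypothetical `parse` helper that stands in for fallback parsing. It does not exercise Fluentd's parser, and the two format strings and the iteration count are assumptions chosen for illustration.

```ruby
require "benchmark"
require "time"

PRIMARY  = "%Y-%m-%dT%H:%M:%S%z"   # assumed primary format
FALLBACK = "%d/%b/%Y:%H:%M:%S %z"  # assumed fallback format
FORMATS  = [PRIMARY, FALLBACK].freeze

# Hypothetical stand-in for fallback parsing; not Fluentd's implementation.
# Returns the first successful parse, or nil if no format matches.
def parse(text, formats)
  formats.each do |fmt|
    begin
      return Time.strptime(text, fmt)
    rescue ArgumentError
      next # format did not match; try the next one
    end
  end
  nil
end

N    = 20_000
hit  = "2021-02-11T00:00:00+0000"    # matches PRIMARY on the first try
miss = "11/Feb/2021:00:00:00 +0000"  # fails PRIMARY, then matches FALLBACK

Benchmark.bm(16) do |x|
  x.report("primary match")  { N.times { parse(hit, FORMATS) } }
  x.report("fallback match") { N.times { parse(miss, FORMATS) } }
end
```

The ratio between the two rows gives a rough feel for the always-falls-back worst case. Absolute numbers depend on the machine and Ruby version, and a real measurement should run against the actual parser from the PR.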
