Logs Missing During Frequent LogRotations #3429

Closed
ayushiaks opened this issue Jun 21, 2021 · 2 comments

ayushiaks commented Jun 21, 2021

Describe the bug

We've been missing logs when log files are rotated very frequently (under high load on the application). We're running on Azure Kubernetes Service. Upon investigating, we confirmed the following:

  1. Logs are flowing perfectly from our application.
  2. Using fluentd.conf, we write the output to files; this is where we find logs missing.
  3. Upon checking the compressed rotated files (.gz), we find the missing logs in there.
  4. These logs sit somewhere in the middle of the compressed file, not at the beginning or the end.

Our rotate_wait param is set to the default (5 secs), and logs are being rotated properly according to the fluentd logs.
Since the missing logs do show up in the compressed rotated files, is the position file not being updated correctly? That doesn't quite make sense either, given that the logs are missing from random places.
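
For reference, a minimal hypothetical in_tail sketch (placeholder path, pos_file, and tag) highlighting the parameters that govern rotation handling; follow_inodes is only available on fluentd v1.12 and later:

<source>
  @type tail
  path /var/log/containers/*.log                 # placeholder path
  pos_file /var/log/example-containers.log.pos   # placeholder pos file
  tag example.*
  read_from_head true     # read newly discovered files from the beginning
  rotate_wait 5           # keep watching a rotated file for 5 seconds after rotation
  follow_inodes true      # track files by inode so rotated files are not lost or re-read
  <parse>
    @type none            # keep raw lines; parsing is not the point of this sketch
  </parse>
</source>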

To Reproduce

Expected behavior

Ideally, no logs should go missing.

Your Environment

  • Fluentd or td-agent version: fluentd --version or td-agent --version
    fluentd 1.12.3
  • Operating system: cat /etc/os-release
NAME="Ubuntu"
VERSION="18.04.5 LTS (Bionic Beaver)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 18.04.5 LTS"
VERSION_ID="18.04"
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
VERSION_CODENAME=bionic
UBUNTU_CODENAME=bionic
  • Kernel version: uname -r
    5.4.0-1047-azure


Your Configuration


Kubernetes.conf

# This file collects and filters all Kubernetes container logs. Should rarely need to modify it.

# Do not directly collect fluentd's own logs to avoid infinite loops.
<label @FLUENT_LOG>
  <match fluent.**>
    @type null
  </match>
</label>

<source>
  @type tail
  path /var/log/containers/*.log
  pos_file /var/log/fluentd-containers.log.pos
  tag kubernetes.*
  read_from_head true
  rotate_wait 5s
  <parse>
    @type multi_format
    # Read logs in JSON format for Kubernetes v1.18-
    <pattern>
      format json
      time_format "%Y-%m-%dT%H:%M:%S.%NZ"
      keep_time_key true
    </pattern>
    # Reads logs in CRI format for Kubernetes v1.19+
    # The CRI format is documented here: https://github.com/kubernetes/community/blob/master/contributors/design-proposals/node/kubelet-cri-logging.md
    <pattern>
      format regexp
      expression /^(?<time>.+) (?<stream>stdout|stderr)( (?<logtag>.))? (?<log>.*)$/
      time_format '%Y-%m-%dT%H:%M:%S.%N%:z'
      keep_time_key true
    </pattern>
  </parse>
</source>

# Used for collecting fluentd metrics
<source>
  @type prometheus
  bind 0.0.0.0
  port 24231
  metrics_path /metrics
</source>

<source>
  @type prometheus_output_monitor
  interval 10
  <labels>
    hostname ${hostname}
  </labels>
</source>

<filter kubernetes.var.log.containers.**.log>
  @type kubernetes_metadata
</filter>

# Exclude events from Geneva containers since they just seem to echo events from other containers
<filter kubernetes.var.log.containers.geneva**.log>
  @type grep
  <exclude>
    key log
    pattern .*
  </exclude>
</filter>

<filter kubernetes.var.log.containers.**.log>
  @type prometheus
  <metric>
    name fluentd_input_status_num_records_total
    type counter
    desc The total number of incoming records
    <labels>
      tag ${tag}
      hostname ${hostname}
    </labels>
  </metric>
</filter>

# Make the IfxAuditLogs JSON parsable
<filter kubernetes.var.log.containers.**.log>
  @type record_modifier  
  <record>    
    _temp_ ${ if record.has_key?("log"); record["log"] = record["log"].to_s.gsub(/ifxaudit(appl|mgmt|fail):*/,''); end; nil }
  </record>
  remove_keys _temp_
</filter>

# Flatten fields nested within the 'log' field if it is JSON
<filter kubernetes.var.log.containers.**.log>
  @type parser
  key_name log
  <parse>
    @type json
    json_parser json
  </parse>
  reserve_data true # this preserves fields from the original record
  remove_key_name_field true # this removes the log field if successfully parsed as JSON
  reserve_time # the time was already parsed in the source, we don't want to overwrite it with current time.
  emit_invalid_record_to_error false # In case of unparsable log lines or CRI logs. Keep fluentd's error log clean
</filter>

# Flatten fields nested within the 'kubernetes' field and remove unnecessary fields
<filter kubernetes.var.log.containers.**.log>
  @type record_transformer
  enable_ruby   
  <record>    
    ContainerName ${record["kubernetes"]["container_name"]}
    NamespaceName ${record["kubernetes"]["namespace_name"]}    
    PodName ${record["kubernetes"]["pod_name"]}
    Node ${record["kubernetes"]["host"]}
  </record>
  # https://github.com/kubernetes/community/blob/master/contributors/design-proposals/node/kubelet-cri-logging.md
  remove_keys docker,kubernetes,stream,logtag
</filter>

fluentd.conf

@include kubernetes.conf

# Retag to prefix all logging event types
# Presently retagging only for activityLogEvent

<match kubernetes.var.log.containers.**.log>
  @type copy
  <store>
    @type prometheus
    <metric>
      name fluentd_output_status_num_records_total
      type counter
      desc The total number of outgoing records
      <labels>
        tag ${tag}
        hostname ${hostname}
      </labels>
    </metric>
  </store>
  <store>
    @type rewrite_tag_filter
    <rule>
      key     TableName
      pattern ^(ActivityLog)$
      tag     activitylogevent.logging
    </rule>
  </store>
</match>

# Retag to prefix all other container events with k8scontainers
<match kubernetes.var.log.containers.**.log>
  @type rewrite_tag_filter
  <rule>
    key     ContainerName
    pattern ^(.+)$
    tag     k8scontainers.$1
  </rule>
</match>

# Rename serilog keys
<filter **.logging>
  @type record_transformer
  enable_ruby true
  <record>
    Level ${record["@l"] == nil ? "Information" : record["@l"]}
    Exception ${record["@x"]}
  </record>
  remove_keys @mt,@t,@l,@x,TableName
</filter>

# Send activityLogEvents events to MDSD
<match activitylogevent.**>
  @type copy
  <store>
    @type file
    path /var/log/activitylog/aysriva
    @log_level info
    <buffer>
      timekey 30
      timekey_use_utc true
      timekey_wait 30000
    </buffer>
  </store>
</match>

Your Error Log


Additional context

ashie (Member) commented Jun 23, 2021

First, please use v1.12.4 or later, or v1.11.x.
v1.12.0 - v1.12.3 have some bugs in in_tail, e.g. #3393.

If it still reproduces with those versions, please check the following things:

  • Is it really related to log rotation? Does it still happen when rotation is disabled?
  • Do you have an example of the dropped logs? (Because multi-format-parser ignores child parsers' errors, the logs might be silently dropped by it; see the sketch below.)
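
For illustration only, one way to keep lines that no child parser matches (assuming unmatched lines are indeed what is being dropped) is to add a catch-all none pattern as the last entry of multi_format, so rejected lines are kept verbatim instead of silently discarded; the message_key value below is just a choice:

<parse>
  @type multi_format
  <pattern>
    format json
    time_format "%Y-%m-%dT%H:%M:%S.%NZ"
    keep_time_key true
  </pattern>
  <pattern>
    format regexp
    expression /^(?<time>.+) (?<stream>stdout|stderr)( (?<logtag>.))? (?<log>.*)$/
    time_format '%Y-%m-%dT%H:%M:%S.%N%:z'
    keep_time_key true
  </pattern>
  # Catch-all: anything the patterns above reject is kept as a raw string
  <pattern>
    format none
    message_key log       # store the raw line under "log"
  </pattern>
</parse>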

ayushiaks (Author) commented

Thanks for pointing this out @ashie. After investigating, we found that a few logs are not being parsed.
We're using two different logging libraries, and they're cutting into each other's log lines, rendering them unparseable.
We're trying to figure out how to handle multithreaded logging scenarios like this.

Thanks a lot for your help, closing this issue.
Still, do let me know if you have any leads on how to solve it.
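
(For anyone hitting the same thing: a hedged sketch of how the rejected records could be surfaced for inspection rather than suppressed, by letting the parser filter emit parse failures to the @ERROR label and writing them to a file; the tag pattern and output path are placeholders.)

<filter kubernetes.var.log.containers.**.log>
  @type parser
  key_name log
  reserve_data true
  remove_key_name_field true
  emit_invalid_record_to_error true   # send records that fail to parse to the @ERROR label
  <parse>
    @type json
  </parse>
</filter>

# Catch rejected records so the interleaved/garbled lines can be inspected
<label @ERROR>
  <match **>
    @type file
    path /var/log/fluentd-parse-errors   # placeholder path
  </match>
</label>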
