
Kubernetes container logs - in_tail loses some rotated logs when rotation is quite fast #3434

Closed
doge95 opened this issue Jun 24, 2021 · 2 comments

doge95 commented Jun 24, 2021

Describe the bug

We were doing a stress test on our EFK stack. We deployed a simple openresty web server that writes access logs to stdout, and used the vegeta tool to send HTTP requests at a constant rate (QPS: 10000 for 180 seconds). We found that when the log file is rotated quickly, it takes a long time (around 2 minutes) for in_tail to discover the new log file, and in_tail loses some of the rotated files.

To Reproduce

Docker logging driver config:

{
  "exec-opts": ["native.cgroupdriver=systemd"],
  "log-driver": "json-file",
  "log-opts": {
    "max-size": "100m",
    "max-file": "50",
    "compress": "true"
  },
  "storage-driver": "overlay2"
}
# echo "GET http://172.16.151.239" | vegeta attack -rate=10000 -duration=180s | tee results.bin | vegeta report
Requests      [total, rate, throughput]         1799999, 9999.95, 9999.94
Duration      [total, attack, wait]             3m0s, 3m0s, 324.017µs
Latencies     [min, mean, 50, 90, 95, 99, max]  107.842µs, 904.156µs, 306.604µs, 1.96ms, 3.064ms, 8.051ms, 114.653ms
Bytes In      [total, mean]                     0, 0.00
Bytes Out     [total, mean]                     0, 0.00
Success       [ratio]                           100.00%
Status Codes  [code:count]                      204:1799999
Error Set:
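
The rotation speed during the attack can be observed by watching the container's log directory, e.g. (a rough sketch; it assumes the default Docker data root /var/lib/docker and uses the container ID from the Docker logs section below):

# watch -n 1 'ls -lh /var/lib/docker/containers/8a703c5ad9672f8f8101d6ad47c4ed61b4b04d69621e84cbd0851ba1ee3d3d5f/*-json.log*'

The line counts further below (about 588k lines per rotated 100m file during a 180 second run) suggest a rotation roughly every minute.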

Expected behavior
We expect Kibana to receive all 1799999 logs. However, it only receives 622756 entries.
[Screenshot: Kibana, 2021-06-24 10:36 PM]

Your Environment

  • Fluentd or td-agent version: v1.12.0
  • Operating system: Debian GNU/Linux 10 (buster)
  • Kernel version: 5.4.0-48-generic

Your Configuration

<source>
  @type tail
  @log_level debug
  @id in_tail_container_logs
  path /var/log/containers/*.log
  pos_file /var/log/fluentd-containers.log.pos
  tag kubernetes.*
  exclude_path ["/var/log/containers/*_kube-system_*.log", "/var/log/containers/*_default_*.log", "/var/log/containers/*_rook-ceph_*.log", "/var/log/containers/*fluent*.log"]
  read_from_head true
  follow_inodes true
  <parse>
    @type json
    time_format "%Y-%m-%dT%H:%M:%S.%NZ"
    keep_time_key true
  </parse>
</source>
 
<match kubernetes.**>
  @type elasticsearch_data_stream
  data_stream_name logs-efk-container
  @log_level debug
  hosts https://elasticsearch-client.default.svc.cluster.local:9200
  user elastic
  password xxxxxx
  ssl_verify false
  reload_connections false
  reconnect_on_error true
  reload_on_failure true
  log_es_400_reason false
  request_timeout 1000s
  include_timestamp true
  <buffer>
    flush_thread_count 8
    flush_interval 1s
    retry_max_interval 5
    retry_forever true
  </buffer>
</match>
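
For reference, in_tail only re-scans the path glob every refresh_interval (60 seconds by default) and keeps following a rotated file for rotate_wait seconds (5 by default). With rotation this fast, those two settings are worth experimenting with; a minimal sketch of extra lines for the <source> block above, with illustrative values that were not verified in this test:

  # illustrative values, not verified in this test
  # re-scan `path` for new/rotated files more often (default 60s)
  refresh_interval 5s
  # keep reading a rotated file for longer after rotation (default 5s)
  rotate_wait 30s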

Additional context
Docker logs

# wc -l *.log*
    34143 8a703c5ad9672f8f8101d6ad47c4ed61b4b04d69621e84cbd0851ba1ee3d3d5f-json.log
   588621 8a703c5ad9672f8f8101d6ad47c4ed61b4b04d69621e84cbd0851ba1ee3d3d5f-json.log.1
   588621 8a703c5ad9672f8f8101d6ad47c4ed61b4b04d69621e84cbd0851ba1ee3d3d5f-json.log.2
   588620 8a703c5ad9672f8f8101d6ad47c4ed61b4b04d69621e84cbd0851ba1ee3d3d5f-json.log.3
  1800005 total

In Kibana, the last log before the gap is at 22:24. It is at line 588614 of the log file 8a703c5ad9672f8f8101d6ad47c4ed61b4b04d69621e84cbd0851ba1ee3d3d5f-json.log.3:
[Screenshot: Kibana, 2021-06-24 10:30 PM]

# grep '2021-06-24T14:24:00.849102284Z' 8a703c5ad9672f8f8101d6ad47c4ed61b4b04d69621e84cbd0851ba1ee3d3d5f-json.log.3 -n
588614:{"log":"172.24.0.214 - - [24/Jun/2021:22:24:00 +0800] \"GET / HTTP/1.1\" 204 0 \"-\" \"Go-http-client/1.1\"\n","stream":"stdout","time":"2021-06-24T14:24:00.849102284Z"}

In Kibana, the first log after the gap appears at 22:28. It is the first line of the log file 8a703c5ad9672f8f8101d6ad47c4ed61b4b04d69621e84cbd0851ba1ee3d3d5f-json.log:
[Screenshot: Kibana, 2021-06-24 10:50 PM]

# grep '2021-06-24T14:25:58.573922367Z' 8a703c5ad9672f8f8101d6ad47c4ed61b4b04d69621e84cbd0851ba1ee3d3d5f-json.log -n
1:{"log":"172.24.0.214 - - [24/Jun/2021:22:25:58 +0800] \"GET / HTTP/1.1\" 204 0 \"-\" \"Go-http-client/1.1\"\n","stream":"stdout","time":"2021-06-24T14:25:58.573922367Z"}

This proves that in_tail fails to check the two files below when log rotation happens quickly:

8a703c5ad9672f8f8101d6ad47c4ed61b4b04d69621e84cbd0851ba1ee3d3d5f-json.log.1.gz
8a703c5ad9672f8f8101d6ad47c4ed61b4b04d69621e84cbd0851ba1ee3d3d5f-json.log.2.gz
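
If that is the case, those two files should hold roughly the 1799999 - 622756 = 1177243 missing entries. A quick way to check (assuming zcat is available on the node):

# zcat 8a703c5ad9672f8f8101d6ad47c4ed61b4b04d69621e84cbd0851ba1ee3d3d5f-json.log.1.gz 8a703c5ad9672f8f8101d6ad47c4ed61b4b04d69621e84cbd0851ba1ee3d3d5f-json.log.2.gz | wc -l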

doge95 commented Jun 25, 2021

If we use parse type "none", then Fluentd is able to tail all the logs.

  ## Fluentd configurations:
  fileConfigs:
    01_sources.conf: |-
      <source>
        @type tail
        @log_level debug
        @id in_tail_container_logs
        path /var/log/containers/*.log
        pos_file /var/log/fluentd-containers.log.pos
        tag kubernetes.*
        exclude_path ["/var/log/containers/*_kube-system_*.log", "/var/log/containers/*_tseacloud_*.log", "/var/log/containers/*_default_*.log", "/var/log/containers/*_rook-ceph_*.log", "/var/log/containers/*fluent*.log", "/var/log/containers/*_cloudsql_*.log"]
        read_from_head true
        follow_inodes true
        <parse>
          @type none
        </parse>
      </source>
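
One way to cross-check the received count on the Elasticsearch side is to query the document count of the data stream directly, e.g. (same host, user, and redacted password as in the output config above):

# curl -sk -u elastic:xxxxxx 'https://elasticsearch-client.default.svc.cluster.local:9200/logs-efk-container/_count?pretty'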
