Fluentd not picking new log files #3239

Closed
indrajithgihan opened this issue Jan 28, 2021 · 36 comments

@indrajithgihan commented Jan 28, 2021

Check the CONTRIBUTING guideline first; here is the list to help us investigate the problem.

Describe the bug
I have a situation where Fluentd, running as a DaemonSet in a Kubernetes cluster, is not picking up new log files, and this happens randomly. Sometimes restarting Fluentd works. Here is my config. I am not seeing the app.log.pos file being updated either. I would appreciate it if somebody could help me with this.

To Reproduce
Run Fluentd as a DaemonSet in a Kubernetes cluster and create the log file directory /data/logs; logs will be generated by pods under multiple subdirectories.

Expected behavior
Fluentd should be able to pick up new log files and update the app.log.pos file.

Your Environment

  • Fluentd or td-agent version: fluentd:v1.12.0-debian-1.0
  • Operating system: Red Hat Enterprise Linux 7.9
  • Kernel version: 3.10.0-1160.6.1.el7.x86_64

If you hit the problem with an older fluentd version, try the latest version first.

Your Configuration

   <source>
     @type tail
     path /data/logs/*/app/*.log
     pos_file /data/logs/app.log.pos
     path_key tailed_path
     tag ms-logs-application
     read_from_head true
     follow_inodes true
     refresh_interval 20s
     enable_stat_watcher false
     <parse>
       @type none
     </parse>
     #format json
     time_format %Y-%m-%dT%H:%M:%S.%NZ      
   </source>
   <filter ms-logs-application>
     @type concat
     key message
     multiline_start_regexp /\d{4}-\d{1,2}-\d{1,2}/
     flush_interval 10
     timeout_label @NORMAL
   </filter>
   <match ms-logs-application>
     @type relabel
     num_threads 8
     @label @NORMAL
   </match>
   <label @NORMAL>
     <filter ms-logs-application>
      @type parser
      key_name message
      reserve_data true
       <parse>
         @type grok
     	grok_failure_key grokfailure
     	<grok>
         pattern (?<message>[^\]]+ (?<timestamp>%{HOUR}:%{MINUTE}:%{SECOND}.%{NONNEGINT})\|\[(?<thread>[^\]]+)\]\|%{IPORHOST:pod_instance}\|(?<severity>([Aa]lert|ALERT|[Tt]race|TRACE|[Dd]ebug|DEBUG|[Nn]otice|NOTICE|[Ii]nfo?(?:rmation)?| INFO?(?:RMATION)?|[Ww]arn?(?:ing)?|WARN?(?:ING)?|[Ee]rr?(?:or)?|ERR?(?:OR)?|[Cc]rit?(?:ical)?|CRIT?(?:ICAL)?|[Ff]atal|FATAL|[Ss]evere|SEVERE|EMERG(?:ENCY)?|[Ee]merg(?:ency)?))\|%{GREEDYDATA:log_type}\|%{GREEDYDATA:application}\|%{GREEDYDATA:microservice}\|%{UUID:uuid}\|(?<message_type>[^\]]+)\|(?<fullmessage>(.|\r|\n)*))
         </grok>		
       </parse>
     </filter>  
     
     <filter ms-logs-application>
       @type record_transformer
       remove_keys fullmessage
       enable_ruby
       <record>
         host.name ${hostname}
         remote_ip "#{(Socket.ip_address_list.detect do |intf| intf.ipv4_private? end).ip_address}"
         log.file.path "${record['tailed_path']}"
     	#remote_ip "%#{@metadata.ip_address}"
       </record>
     </filter>
     
     <match ms-logs-application>
       @type rewrite_tag_filter
       num_threads 8
       <rule>
         key grokfailure
         pattern /.*/
         tag grokfailure_log_app
       </rule>
       <rule>
         key application
         pattern /.*/
         tag ms-logs-app-matched
       </rule>     
     </match>
     
     <match ms-logs-app-matched>
       @type elasticsearch_dynamic
       num_threads 8
       @log_level info
       host <IP>
       suppress_type_name true
       include_tag_key true
       reload_connections true
       #port 9200
       logstash_format true
       #index_name fluentd.${tag}.%Y%m%d
       
       #%{application}-%{+YYYY.MM.dd}
       logstash_prefix myapp-application-${record['application']}
       <buffer>
          @type file
          path /data/logs/*/app/*.log
          flush_mode interval
          retry_type exponential_backoff
          flush_thread_count 8
          flush_interval 5s
          retry_forever true
          retry_max_interval 30
          chunk_limit_size 2M
          queue_limit_length 32
          overflow_action throw_exception
         </buffer>
     </match>  
     
     <match grokfailure_log_app>
       @type elasticsearch_dynamic
       num_threads 8
       @log_level info
       suppress_type_name true
       include_tag_key true
       reload_connections true
       hosts <ip>
       #port 9200
       logstash_format true
       #%{application}-%{+YYYY.MM.dd}
       logstash_prefix app-nonematch
       #type_name fluentd.${tag}.%Y%m%d
     </match>    
   </label>   
   <filter ms-logs-application>
    @type parser
    key_name message
    reserve_data true
     <parse>
       @type grok
   	grok_failure_key grokfailure
   	<grok>
       pattern (?<message>[^\]]+ (?<timestamp>%{HOUR}:%{MINUTE}:%{SECOND}.%{NONNEGINT})\|\[(?<thread>[^\]]+)\]\|%{IPORHOST:pod_instance}\|(?<severity>([Aa]lert|ALERT|[Tt]race|TRACE|[Dd]ebug|DEBUG|[Nn]otice|NOTICE|[Ii]nfo?(?:rmation)?| INFO?(?:RMATION)?|[Ww]arn?(?:ing)?|WARN?(?:ING)?|[Ee]rr?(?:or)?|ERR?(?:OR)?|[Cc]rit?(?:ical)?|CRIT?(?:ICAL)?|[Ff]atal|FATAL|[Ss]evere|SEVERE|EMERG(?:ENCY)?|[Ee]merg(?:ency)?))\|%{GREEDYDATA:log_type}\|%{GREEDYDATA:application}\|%{GREEDYDATA:microservice}\|%{UUID:uuid}\|(?<message_type>[^\]]+)\|(?<fullmessage>(.|\r|\n)*))
       </grok>		
     </parse>
   </filter>  
   
   <filter ms-logs-application>
     @type record_transformer
     remove_keys fullmessage
     enable_ruby     
     <record>
       host.name ${hostname}
       remote_ip "#{(Socket.ip_address_list.detect do |intf| intf.ipv4_private? end).ip_address}"
       log.file.path "${record['tailed_path']}"
   	#remote_ip "%#{@metadata.ip_address}"
     </record>
   </filter>
   
   <match ms-logs-application>
     @type rewrite_tag_filter
     num_threads 8
     <rule>
       key grokfailure
       pattern /.*/
       tag grokfailure_log_app
     </rule>
     <rule>
       key application
       pattern /.*/
       tag ms-logs-app-matched
     </rule>     
   </match>
   
   <match ms-logs-app-matched>
     @type elasticsearch_dynamic
     
----
   </match>   
   
   <match grokfailure_log_app>
     @type elasticsearch_dynamic
  -----
   </match>     

Your Error Log


Additional context

@pawankkamboj commented Feb 15, 2021

We are facing the same issue after upgrading to 1.12: it is not reading some files.

@ashie (Member) commented Feb 16, 2021

Does this also reproduce with v1.11.x? @indrajithgihan @pawankkamboj

@ashie (Member) commented Feb 16, 2021

Recently test_rotate_file_with_open_on_every_update sometimes (often?) fails: https://travis-ci.org/github/fluent/fluentd/jobs/759131293
It may be related to this issue.

@indrajithgihan (Author) commented Feb 16, 2021

@ashie @repeatedly
I had the same issue with v1.11.5 as well. I am using the fluent-plugin-concat plugin and observed high CPU usage within the Fluentd pod, with lots of timeout flushes in the log. Could this be a reason for Fluentd not detecting new log files?

2021-02-16 10:08:05 +0545 [info]: #0 Timeout flush: ms-logs-application:default
2021-02-16 10:08:35 +0545 [info]: #0 Timeout flush: ms-logs-application:default
2021-02-16 10:08:57 +0545 [info]: #0 Timeout flush: ms-logs-application:default
2021-02-16 10:09:23 +0545 [info]: #0 Timeout flush: ms-logs-application:default
2021-02-16 10:09:55 +0545 [info]: #0 Timeout flush: ms-logs-application:default
2021-02-16 10:10:23 +0545 [info]: #0 Timeout flush: ms-logs-application:default
2021-02-16 10:10:41 +0545 [info]: #0 Timeout flush: ms-logs-application:default
2021-02-16 10:11:17 +0545 [info]: #0 Timeout flush: ms-logs-application:default
2021-02-16 10:11:32 +0545 [info]: #0 Timeout flush: ms-logs-application:default
2021-02-16 10:12:08 +0545 [info]: #0 Timeout flush: ms-logs-application:default
2021-02-16 10:12:26 +0545 [info]: #0 Timeout flush: ms-logs-application:default
2021-02-16 10:12:41 +0545 [info]: #0 Timeout flush: ms-logs-application:default
2021-02-16 10:12:53 +0545 [info]: #0 Timeout flush: ms-logs-application:default
2021-02-16 10:13:17 +0545 [info]: #0 Timeout flush: ms-logs-application:default
2021-02-16 10:13:38 +0545 [info]: #0 Timeout flush: ms-logs-application:default
2021-02-16 10:13:55 +0545 [info]: #0 Timeout flush: ms-logs-application:default
2021-02-16 10:14:09 +0545 [info]: #0 Timeout flush: ms-logs-application:default
2021-02-16 10:14:36 +0545 [info]: #0 Timeout flush: ms-logs-application:default
2021-02-16 10:14:54 +0545 [info]: #0 Timeout flush: ms-logs-application:default
2021-02-16 10:15:15 +0545 [info]: #0 Timeout flush: ms-logs-application:default
2021-02-16 10:15:30 +0545 [info]: #0 Timeout flush: ms-logs-application:default
2021-02-16 10:15:54 +0545 [info]: #0 Timeout flush: ms-logs-application:default
2021-02-16 10:16:30 +0545 [info]: #0 Timeout flush: ms-logs-application:default
2021-02-16 10:16:43 +0545 [info]: #0 Timeout flush: ms-logs-application:default
2021-02-16 10:16:56 +0545 [info]: #0 Timeout flush: ms-logs-application:default
2021-02-16 10:17:29 +0545 [info]: #0 Timeout flush: ms-logs-application:default
2021-02-16 10:17:41 +0545 [info]: #0 Timeout flush: ms-logs-application:default
2021-02-16 10:18:01 +0545 [info]: #0 Timeout flush: ms-logs-application:default
2021-02-16 10:18:21 +0545 [info]: #0 Timeout flush: ms-logs-application:default

@hulucc commented Mar 15, 2021

Had the same issue here. I found that Fluentd randomly stops picking up logs after a deployment rolling update, and restarting Fluentd helps. Version 1.12.

@joshbranham commented Mar 22, 2021

Had the same issue here. I found that Fluentd randomly stops picking up logs after a deployment rolling update, and restarting Fluentd helps. Version 1.12.

Confirmed we were seeing this with 1.12.1 across many different Kubernetes clusters. Rolling back to 1.11.x has fixed it.

@snorwin commented Mar 23, 2021

We are facing the same issue using fluentd version 1.12.1 on Red Hat Enterprise Linux 7.9 with kernel version 3.10.0-1160.2.2.el7.x86_64, running dockerd with --log-driver json-file --log-opt max-size=20M --log-opt max-file=3.

It seems that the positions in the pos file for some pods are randomly stuck at the max file size of ~20 MiB:

$ cat /var/log/fluentd-containers.log.pos | grep xxxxxxx-xxxxxxxxxxxx-xxx-xx-xxxxx
/var/log/containers/xxxxxxx-xxxxxxxxxxxx-xxx-xx-xxxxx_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx-xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx.log       0000000001312d81        000000002c2a3401
/var/log/containers/xxxxxxx-xxxxxxxxxxxx-xxx-xx-xxxxx_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx-xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx.log       00000000013136db        000000002c2a3403
/var/log/containers/xxxxxxx-xxxxxxxxxxxx-xxx-xx-xxxxx_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx-xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx.log       0000000001312db2        000000002c2a3404

If we follow the symbolic link:

$ readlink -f  /var/log/containers/xxxxxxx-xxxxxxxxxxxx-xxx-xx-xxxxx_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx-xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx.log
/var/lib/docker/containers/xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx/xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx-json.log

and check the inode and the file size:

$ stat /var/lib/docker/containers/xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx/xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx-json.log
  File: ‘/var/lib/docker/containers/xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx/xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx-json.log’
  Size: 7805313         Blocks: 16384      IO Block: 4096   regular file
Device: fd09h/64777d    Inode: 740963332   Links: 1
Access: (0400/-r--------)  Uid: (    0/    root)   Gid: (    0/    root)
Context: system_u:object_r:container_var_lib_t:s0
Access: 2021-03-23 09:37:24.959581215 +0100
Modify: 2021-03-23 11:17:23.688752202 +0100
Change: 2021-03-23 11:17:23.688752202 +0100
 Birth: -

The inode reported by stat (740963332, i.e. 2c2a3404 in hex) matches the pos file entry; however, the recorded position (0000000001312d81, ~20 MiB) was not reset properly after rotation, even though the current file size is only 7805313 bytes (771981 in hex).

Input configuration in fluent.conf:

<source>
  # See https://docs.fluentd.org/input/tail 
  @type tail
  @label @CONCAT
  path /var/log/containers/*.log
  pos_file "#{ENV['POS_FILE_PATH']}/fluentd-containers.log.pos"
  tag kubernetes.*
  read_from_head true
  follow_inodes true
  refresh_interval 15
  rotate_wait 5
  enable_stat_watcher false
  <parse>
    time_format %Y-%m-%dT%H:%M:%S.%N%Z
    keep_time_key true
    @type json
    time_type string
  </parse>
</source>

@nvtkaszpir commented:

Can you provide info about these settings on your nodes:

sysctl fs.inotify.max_user_watches
sysctl fs.inotify.max_user_instances

@snorwin commented Mar 23, 2021

$ sysctl fs.inotify.max_user_watches
fs.inotify.max_user_watches = 65536
$ sysctl fs.inotify.max_user_instances
fs.inotify.max_user_instances = 128

@joshbranham commented:

root@fluentd-2656g:/fluentd# sysctl fs.inotify.max_user_watches
fs.inotify.max_user_watches = 524288
root@fluentd-2656g:/fluentd# sysctl fs.inotify.max_user_instances
fs.inotify.max_user_instances = 8192
root@fluentd-2656g:/fluentd#

@ashie (Member) commented Mar 24, 2021

Although I'm not sure whether all of your problems are the same, #3274, #3224 or #3292 may be the same issue.
These bugs were introduced in 1.12.0 (#3182), so your problem is a different one if it also reproduces with 1.11 (#3239 (comment) seems to be a different one). Could you try it and report whether your issue reproduces with 1.11 or not?

They are already fixed in the master branch (#3275 and #3294) but not released yet.
We are preparing to release 1.12.2 by the end of this month; please try it after it's released.
If it still reproduces with 1.12.2 (or 1.11), I'll take a look at it.

ashie self-assigned this Mar 24, 2021
ashie added the bug label Mar 24, 2021
@indrajithgihan (Author) commented:

sysctl fs.inotify.max_user_instances

fs.inotify.max_user_instances = 128

sysctl fs.inotify.max_user_watches

fs.inotify.max_user_watches = 8192

I had the same issue with v1.11.5 as well. I am using the fluent-plugin-concat plugin and observed high CPU usage within the Fluentd pod, with lots of timeout flushes in the log. Could this be a reason for Fluentd not detecting new log files?

@nvtkaszpir commented:

@ashie I think the issues may be related: if someone leaves the sysctl inotify parameters at the default values for a given operating system and has a lot of containers per node (e.g. hitting the per-instance limits, with pods coming and going or pods running multiple sidecars), then the issue of not tailing logs may also be triggered, though AFAIR there should be a different error message in that case.

We had that problem with a different log shipper, Promtail.

@snorwin commented Mar 24, 2021

@ashie we rolled back fluentd to 1.11.5 yesterday and so far haven't encountered any issues.

@TomasKohout commented:

We also had to roll back to 1.11.x due to this error. 🙁

@ashie (Member) commented Mar 25, 2021

It seems that several different problems are mixed into this issue.
The original report by @indrajithgihan isn't 1.12-specific, so we won't treat 1.12-specific problems in this issue.

  • Reproduces with both 1.11 and 1.12: we'll continue to investigate it in this issue. (@indrajithgihan @nvtkaszpir)

  • 1.12-specific: probably it will be resolved by 1.12.2 (we'll release it next week). Please file a new issue if your problem still reproduces with 1.12.2. (@joshbranham @snorwin @TomasKohout)

  • TBD (probably 1.12-specific?): rolling back to 1.11 may resolve your issue. (@pawankkamboj @hulucc)

@pawankkamboj commented Mar 25, 2021 via email

@Cryptophobia commented Apr 8, 2021

Can this be closed now? Has anyone tested with 1.12.2 to confirm that the in_tail fixes have provided stability?

If nobody can report back, we will build and test today as well.

@ashie (Member) commented Apr 9, 2021

We have received no 1.12-specific bug reports since releasing 1.12.2.
We hope the 1.12-specific bugs are resolved by 1.12.2.

Can this be closed now?

We won't close this issue yet because:

The original report by @indrajithgihan isn't 1.12-specific, so we won't treat 1.12-specific problems in this issue.

The original comment is probably a different issue, one that has existed for a long time and occurs only rarely.

@Cryptophobia commented:

@ashie we have received reports from an internal user managing many nodes that in_tail is not tailing all of the logs. Could this be a kernel issue or something else further down the stack?

Are there any guidelines for the minimum kernel version and the dependencies required for fluentd v1.12.2?

@ashie (Member) commented Apr 9, 2021

Again, we no longer handle 1.12-specific problems in this issue.
If your problem reproduces only with v1.12 or later, please open a new issue and describe your problem in as much detail as possible.

ashie removed the bug label Apr 9, 2021
@ashie (Member) commented Apr 9, 2021

I removed the bug label because we haven't yet identified the cause of the original report in this issue.

@SimonForsman commented:

I'm having this issue in 1.11.5 running on Kubernetes. The issue only seems to occur (or it might just be a coincidence) after a few weeks of uptime, and only on nodes that start temporary pods frequently. Could this be caused by large .pos files, or has that been ruled out already?

I've enabled compaction now, but it might be a few weeks before I know whether it helped.
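
For reference, a minimal sketch of an in_tail source with pos-file compaction enabled, as discussed in this comment; the path, tag, and interval below are illustrative, not taken from this report:

<source>
  @type tail
  # Illustrative path and pos file location
  path /var/log/containers/*.log
  pos_file /var/log/containers.log.pos
  # Periodically rewrite the pos file so entries for files that are no
  # longer watched are dropped and the file does not grow without bound
  pos_file_compaction_interval 1h
  read_from_head true
  tag kubernetes.*
  <parse>
    @type none
  </parse>
</source>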

@Cryptophobia commented:

@ashie, yes, this is not a 1.12.2-specific problem; we verified yesterday. It is reproducible for us with both 1.11.2 and 1.12.2. We tried both of those versions on nodes running Ubuntu 20.04 with kernel 5.4. This is why I am asking whether this problem is related to dependencies or kernel versions.

@ashie (Member) commented Apr 12, 2021

@Cryptophobia We don't have any decisive clues for this issue yet, and several different issues may still be mixed together.
@nvtkaszpir pointed out the sysctl parameters, but I think they're not applicable to @indrajithgihan's original comment: because enable_stat_watcher false is used, it doesn't use inotify.

If you have an environment that doesn't reproduce this issue, please let me know the differences between the environments in as much detail as possible.

@Cryptophobia commented:

Issues #3333 and #3332 are the last two remaining issues we are seeing in our environment with fluentd v1.12.2. We recently switched to using gracefulReload after upgrading to v1.12.2, instead of the reload endpoint we used previously.

@ashie (Member) commented Apr 28, 2021

#3357 might be the same issue.

It seems caused by a big log file (350MB)

This would indicate that it took 18 min to read that file because, once it was done reading that file, it moved on. I suspect the cause here is a large log file.

ashie added the bug label Apr 28, 2021
@ashie (Member) commented Apr 30, 2021

#3357 might be the same issue.

It seems caused by a big log file (350MB)

This would indicate that it took 18 min to read that file because, once it was done reading that file, it moved on. I suspect the cause here is a large log file.

Hmm, I've confirmed that the in_tail plugin cannot run the refresh_watchers method while reading a big log file, which means in_tail can't detect and read new log files while reading a big one. FYI: #3323 (comment)
The log throttling feature (#3185) may resolve this issue.

@SimonForsman commented:

We've not encountered the issue since April 9th, when we enabled compaction of the position file (so ~30 days of Fluentd uptime without issues).

@ashie (Member) commented May 10, 2021

We've not encountered the issue since April 9th, when we enabled compaction of the position file (so ~30 days of Fluentd uptime without issues).

Good.
For our reference, could you tell me your config for compacting the pos file (pos_file_compaction_interval)?
@SimonForsman

@ashie (Member) commented May 11, 2021

It seems caused by a big log file (350MB)

Probably #2478 is the same issue as this.

@SimonForsman commented May 11, 2021

For our reference, could you tell me your config for compacting the pos file (pos_file_compaction_interval)?
@SimonForsman

<source>
  @type tail
  path /var/log/containers/*.log
  pos_file /var/log/containers.log.pos
  enable_stat_watcher false
  enable_watch_timer true
  pos_file_compaction_interval 1h
  read_from_head true
  tag kubernetes.*
  <parse>
    keep_time_key true
    time_format %Y-%m-%dT%H:%M:%S.%NZ
    time_key {{ .Values.k8sLogTimeKey }}
    @type {{ .Values.k8sLogFormat }}
  </parse>
</source>

I don't know whether disabling the stat/inotify watcher matters (it was one of the first things we tried, and we've never reverted it even though it didn't solve the issue, at least not on its own).

Also, if this is an issue with large files in general and not just large pos files, it might not help for everyone. (We have a lot of short-lived containers, so our pos file got significantly larger than our log files, and we're still monitoring things since we're not sure whether this helped or whether we've just been lucky for the past month.)

@ashie (Member) commented May 26, 2021

It seems caused by a big log file (350MB)

Probably #2478 is the same issue as this.

The read_bytes_limit_per_second feature (#3185, #3364 and #3379), which should be effective against the big-file issue, has been merged.
It's included in v1.13.0, which will be released at the end of this month (May 2021).
Please try it if you still have this issue.

Usage: https://github.com/fluent/fluentd-docs-gitbook/pull/259/files
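
For illustration, a minimal in_tail sketch that enables read_bytes_limit_per_second; the path, tag, and the ~8 MiB/s value are assumptions for the example, not settings recommended in this thread:

<source>
  @type tail
  # Illustrative path and pos file location
  path /var/log/containers/*.log
  pos_file /var/log/containers.log.pos
  # Throttle reads so one very large file cannot keep in_tail busy and
  # delay refresh_watchers from detecting newly created files
  # (8388608 bytes/sec is roughly 8 MiB/s, chosen only as an example)
  read_bytes_limit_per_second 8388608
  read_from_head true
  tag kubernetes.*
  <parse>
    @type none
  </parse>
</source>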

@indrajithgihan (Author) commented Jun 4, 2021

It seems caused by a big log file (350MB)

Probably #2478 is the same issue as this.

The read_bytes_limit_per_second feature (#3185, #3364 and #3379), which should be effective against the big-file issue, has been merged.
It's included in v1.13.0, which will be released at the end of this month (May 2021).
Please try it if you still have this issue.

Usage: https://github.com/fluent/fluentd-docs-gitbook/pull/259/files

@ashie Does this feature require disabling the stat watcher?

enable_stat_watcher false

@ashie (Member) commented Jun 4, 2021

@ashie Does this feature require disabling the stat watcher?

enable_stat_watcher false

No, keeping the stat watcher enabled is also supported.

@ashie (Member) commented Jul 12, 2021

The read_bytes_limit_per_second feature (#3185, #3364 and #3379), which should be effective against the big-file issue, has been merged.
It's included in v1.13.0, which will be released at the end of this month (May 2021).
Please try it if you still have this issue.

Usage: https://github.com/fluent/fluentd-docs-gitbook/pull/259/files

The effectiveness of this feature was confirmed in #3423.
Most similar issues can probably be resolved by it, so I'm closing this issue now.
If you still have this issue even when using the read_bytes_limit_per_second feature, please open a new issue.
