Fluentd not picking new log files #3239

Closed
indrajithgihan opened this issue Jan 28, 2021 · 36 comments

@indrajithgihan commented Jan 28, 2021

Check the CONTRIBUTING guideline first; here is the list to help us investigate the problem.

Describe the bug
I have a situation where Fluentd, running as a DaemonSet in a Kubernetes cluster, is not picking up new log files, and this happens randomly. Sometimes restarting Fluentd works. Here is my config. I am not seeing the app.log.pos file being updated either. I would appreciate it if somebody could help me with this.

To Reproduce
Run Fluentd as a DaemonSet in a Kubernetes cluster and create the log file directory /data/logs; logs will be generated by pods under multiple subdirectories.

Expected behavior
Fluentd should be able to pick up new log files and update the app.log.pos file.

Your Environment

  • Fluentd or td-agent version: fluentd:v1.12.0-debian-1.0
  • Operating system: Red Hat Enterprise Linux 7.9
  • Kernel version: 3.10.0-1160.6.1.el7.x86_64

If you hit the problem with an older fluentd version, try the latest version first.

Your Configuration

   <source>
     @type tail
     path /data/logs/*/app/*.log
     pos_file /data/logs/app.log.pos
     path_key tailed_path
     tag ms-logs-application
     read_from_head true
     follow_inodes true
     refresh_interval 20s
     enable_stat_watcher false
     <parse>
       @type none
     </parse>
     #format json
     time_format %Y-%m-%dT%H:%M:%S.%NZ      
   </source>
   <filter ms-logs-application>
     @type concat
     key message
     multiline_start_regexp /\d{4}-\d{1,2}-\d{1,2}/
     flush_interval 10
     timeout_label @NORMAL
   </filter>
   <match ms-logs-application>
     @type relabel
     num_threads 8
     @label @NORMAL
   </match>
   <label @NORMAL>
     <filter ms-logs-application>
      @type parser
      key_name message
      reserve_data true
       <parse>
         @type grok
     	grok_failure_key grokfailure
     	<grok>
         pattern (?<message>[^\]]+ (?<timestamp>%{HOUR}:%{MINUTE}:%{SECOND}.%{NONNEGINT})\|\[(?<thread>[^\]]+)\]\|%{IPORHOST:pod_instance}\|(?<severity>([Aa]lert|ALERT|[Tt]race|TRACE|[Dd]ebug|DEBUG|[Nn]otice|NOTICE|[Ii]nfo?(?:rmation)?| INFO?(?:RMATION)?|[Ww]arn?(?:ing)?|WARN?(?:ING)?|[Ee]rr?(?:or)?|ERR?(?:OR)?|[Cc]rit?(?:ical)?|CRIT?(?:ICAL)?|[Ff]atal|FATAL|[Ss]evere|SEVERE|EMERG(?:ENCY)?|[Ee]merg(?:ency)?))\|%{GREEDYDATA:log_type}\|%{GREEDYDATA:application}\|%{GREEDYDATA:microservice}\|%{UUID:uuid}\|(?<message_type>[^\]]+)\|(?<fullmessage>(.|\r|\n)*))
         </grok>		
       </parse>
     </filter>  
     
     <filter ms-logs-application>
       @type record_transformer
       remove_keys fullmessage
       enable_ruby
       <record>
         host.name ${hostname}
         remote_ip "#{(Socket.ip_address_list.detect do |intf| intf.ipv4_private? end).ip_address}"
         log.file.path "${record['tailed_path']}"
     	#remote_ip "%#{@metadata.ip_address}"
       </record>
     </filter>
     
     <match ms-logs-application>
       @type rewrite_tag_filter
       num_threads 8
       <rule>
         key grokfailure
         pattern /.*/
         tag grokfailure_log_app
       </rule>
       <rule>
         key application
         pattern /.*/
         tag ms-logs-app-matched
       </rule>     
     </match>
     
     <match ms-logs-app-matched>
       @type elasticsearch_dynamic
       num_threads 8
       @log_level info
       host <IP>
       suppress_type_name true
       include_tag_key true
       reload_connections true
       #port 9200
       logstash_format true
       #index_name fluentd.${tag}.%Y%m%d
       
       #%{application}-%{+YYYY.MM.dd}
       logstash_prefix myapp-application-${record['application']}
       <buffer>
          @type file
          path /data/logs/*/app/*.log
          flush_mode interval
          retry_type exponential_backoff
          flush_thread_count 8
          flush_interval 5s
          retry_forever true
          retry_max_interval 30
          chunk_limit_size 2M
          queue_limit_length 32
          overflow_action throw_exception
         </buffer>
     </match>  
     
     <match grokfailure_log_app>
       @type elasticsearch_dynamic
       num_threads 8
       @log_level info
       suppress_type_name true
       include_tag_key true
       reload_connections true
       hosts <ip>
       #port 9200
       logstash_format true
       #%{application}-%{+YYYY.MM.dd}
       logstash_prefix app-nonematch
       #type_name fluentd.${tag}.%Y%m%d
     </match>    
   </label>   
   <filter ms-logs-application>
    @type parser
    key_name message
    reserve_data true
     <parse>
       @type grok
   	grok_failure_key grokfailure
   	<grok>
       pattern (?<message>[^\]]+ (?<timestamp>%{HOUR}:%{MINUTE}:%{SECOND}.%{NONNEGINT})\|\[(?<thread>[^\]]+)\]\|%{IPORHOST:pod_instance}\|(?<severity>([Aa]lert|ALERT|[Tt]race|TRACE|[Dd]ebug|DEBUG|[Nn]otice|NOTICE|[Ii]nfo?(?:rmation)?| INFO?(?:RMATION)?|[Ww]arn?(?:ing)?|WARN?(?:ING)?|[Ee]rr?(?:or)?|ERR?(?:OR)?|[Cc]rit?(?:ical)?|CRIT?(?:ICAL)?|[Ff]atal|FATAL|[Ss]evere|SEVERE|EMERG(?:ENCY)?|[Ee]merg(?:ency)?))\|%{GREEDYDATA:log_type}\|%{GREEDYDATA:application}\|%{GREEDYDATA:microservice}\|%{UUID:uuid}\|(?<message_type>[^\]]+)\|(?<fullmessage>(.|\r|\n)*))
       </grok>		
     </parse>
   </filter>  
   
   <filter ms-logs-application>
     @type record_transformer
     remove_keys fullmessage
     enable_ruby     
     <record>
       host.name ${hostname}
       remote_ip "#{(Socket.ip_address_list.detect do |intf| intf.ipv4_private? end).ip_address}"
       log.file.path "${record['tailed_path']}"
   	#remote_ip "%#{@metadata.ip_address}"
     </record>
   </filter>
   
   <match ms-logs-application>
     @type rewrite_tag_filter
     num_threads 8
     <rule>
       key grokfailure
       pattern /.*/
       tag grokfailure_log_app
     </rule>
     <rule>
       key application
       pattern /.*/
       tag ms-logs-app-matched
     </rule>     
   </match>
   
   <match ms-logs-app-matched>
     @type elasticsearch_dynamic
     
----
   </match>   
   
   <match grokfailure_log_app>
     @type elasticsearch_dynamic
  -----
   </match>     

Your Error Log


Additional context

@pawankkamboj commented Feb 15, 2021

We are facing the same issue after upgrading to 1.12: it is not reading some files.

@ashie (Member) commented Feb 16, 2021

Does this also reproduce with v1.11.x? @indrajithgihan @pawankkamboj

@ashie (Member) commented Feb 16, 2021

Recently test_rotate_file_with_open_on_every_update sometimes (often?) fails: https://travis-ci.org/github/fluent/fluentd/jobs/759131293
It may be related to this issue.

@indrajithgihan (Author) commented Feb 16, 2021

@ashie @repeatedly
I had the same issue with v1.11.5 as well. I am using the fluent-plugin-concat plugin and observed high CPU usage within the Fluentd pod, with lots of timeout flushes in the log. Could this be a reason for Fluentd not detecting new log files?

2021-02-16 10:08:05 +0545 [info]: #0 Timeout flush: ms-logs-application:default
2021-02-16 10:08:35 +0545 [info]: #0 Timeout flush: ms-logs-application:default
2021-02-16 10:08:57 +0545 [info]: #0 Timeout flush: ms-logs-application:default
2021-02-16 10:09:23 +0545 [info]: #0 Timeout flush: ms-logs-application:default
2021-02-16 10:09:55 +0545 [info]: #0 Timeout flush: ms-logs-application:default
2021-02-16 10:10:23 +0545 [info]: #0 Timeout flush: ms-logs-application:default
2021-02-16 10:10:41 +0545 [info]: #0 Timeout flush: ms-logs-application:default
2021-02-16 10:11:17 +0545 [info]: #0 Timeout flush: ms-logs-application:default
2021-02-16 10:11:32 +0545 [info]: #0 Timeout flush: ms-logs-application:default
2021-02-16 10:12:08 +0545 [info]: #0 Timeout flush: ms-logs-application:default
2021-02-16 10:12:26 +0545 [info]: #0 Timeout flush: ms-logs-application:default
2021-02-16 10:12:41 +0545 [info]: #0 Timeout flush: ms-logs-application:default
2021-02-16 10:12:53 +0545 [info]: #0 Timeout flush: ms-logs-application:default
2021-02-16 10:13:17 +0545 [info]: #0 Timeout flush: ms-logs-application:default
2021-02-16 10:13:38 +0545 [info]: #0 Timeout flush: ms-logs-application:default
2021-02-16 10:13:55 +0545 [info]: #0 Timeout flush: ms-logs-application:default
2021-02-16 10:14:09 +0545 [info]: #0 Timeout flush: ms-logs-application:default
2021-02-16 10:14:36 +0545 [info]: #0 Timeout flush: ms-logs-application:default
2021-02-16 10:14:54 +0545 [info]: #0 Timeout flush: ms-logs-application:default
2021-02-16 10:15:15 +0545 [info]: #0 Timeout flush: ms-logs-application:default
2021-02-16 10:15:30 +0545 [info]: #0 Timeout flush: ms-logs-application:default
2021-02-16 10:15:54 +0545 [info]: #0 Timeout flush: ms-logs-application:default
2021-02-16 10:16:30 +0545 [info]: #0 Timeout flush: ms-logs-application:default
2021-02-16 10:16:43 +0545 [info]: #0 Timeout flush: ms-logs-application:default
2021-02-16 10:16:56 +0545 [info]: #0 Timeout flush: ms-logs-application:default
2021-02-16 10:17:29 +0545 [info]: #0 Timeout flush: ms-logs-application:default
2021-02-16 10:17:41 +0545 [info]: #0 Timeout flush: ms-logs-application:default
2021-02-16 10:18:01 +0545 [info]: #0 Timeout flush: ms-logs-application:default
2021-02-16 10:18:21 +0545 [info]: #0 Timeout flush: ms-logs-application:default

@hulucc commented Mar 15, 2021

Had the same issue here. I found that Fluentd randomly stops picking up logs after a deployment rolling update, and restarting Fluentd helps. Version 1.12.

@joshbranham commented Mar 22, 2021

Had the same issue here. I found that Fluentd randomly stops picking up logs after a deployment rolling update, and restarting Fluentd helps. Version 1.12.

Confirmed we were seeing this with 1.12.1 across many different Kubernetes clusters. Rolling back to 1.11.x has fixed it.

@snorwin commented Mar 23, 2021

We are facing the same issue using fluentd version 1.12.1 on Red Hat Enterprise Linux 7.9 with kernel version 3.10.0-1160.2.2.el7.x86_64, running dockerd with --log-driver json-file --log-opt max-size=20M --log-opt max-file=3.

It seems that the positions in the pos file for some pods are randomly stuck at the max file size of ~20 MiB:

$ cat /var/log/fluentd-containers.log.pos | grep xxxxxxx-xxxxxxxxxxxx-xxx-xx-xxxxx
/var/log/containers/xxxxxxx-xxxxxxxxxxxx-xxx-xx-xxxxx_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx-xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx.log       0000000001312d81        000000002c2a3401
/var/log/containers/xxxxxxx-xxxxxxxxxxxx-xxx-xx-xxxxx_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx-xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx.log       00000000013136db        000000002c2a3403
/var/log/containers/xxxxxxx-xxxxxxxxxxxx-xxx-xx-xxxxx_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx-xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx.log       0000000001312db2        000000002c2a3404

If we follow the symbolic link:

$ readlink -f  /var/log/containers/xxxxxxx-xxxxxxxxxxxx-xxx-xx-xxxxx_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx-xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx.log
/var/lib/docker/containers/xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx/xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx-json.log

and check the inode and the file size:

$ stat /var/lib/docker/containers/xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx/xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx-json.log
  File: ‘/var/lib/docker/containers/xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx/xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx-json.log’
  Size: 7805313         Blocks: 16384      IO Block: 4096   regular file
Device: fd09h/64777d    Inode: 740963332   Links: 1
Access: (0400/-r--------)  Uid: (    0/    root)   Gid: (    0/    root)
Context: system_u:object_r:container_var_lib_t:s0
Access: 2021-03-23 09:37:24.959581215 +0100
Modify: 2021-03-23 11:17:23.688752202 +0100
Change: 2021-03-23 11:17:23.688752202 +0100
 Birth: -

The inode reported by stat (740963332, i.e. 2c2a3404 in hex) matches the pos file entry; however, the recorded position (0000000001312d81, ~20 MiB) was not reset properly after rotation, even though the current file size is only 7805313 bytes (771981 in hex).

Input configuration in fluent.conf:

<source>
  # See https://docs.fluentd.org/input/tail 
  @type tail
  @label @CONCAT
  path /var/log/containers/*.log
  pos_file "#{ENV['POS_FILE_PATH']}/fluentd-containers.log.pos"
  tag kubernetes.*
  read_from_head true
  follow_inodes true
  refresh_interval 15
  rotate_wait 5
  enable_stat_watcher false
  <parse>
    time_format %Y-%m-%dT%H:%M:%S.%N%Z
    keep_time_key true
    @type json
    time_type string
  </parse>
</source>

@nvtkaszpir commented:

Can you provide info about these settings on your nodes:

sysctl fs.inotify.max_user_watches
sysctl fs.inotify.max_user_instances

@snorwin commented Mar 23, 2021

$ sysctl fs.inotify.max_user_watches
fs.inotify.max_user_watches = 65536
$ sysctl fs.inotify.max_user_instances
fs.inotify.max_user_instances = 128

@joshbranham commented:

root@fluentd-2656g:/fluentd# sysctl fs.inotify.max_user_watches
fs.inotify.max_user_watches = 524288
root@fluentd-2656g:/fluentd# sysctl fs.inotify.max_user_instances
fs.inotify.max_user_instances = 8192
root@fluentd-2656g:/fluentd#

@ashie (Member) commented Mar 24, 2021

Although I'm not sure whether all of your problems are the same, #3274, #3224 or #3292 may be the same issue.
These bugs were introduced in 1.12.0 (#3182), so your problem is a different one if it also reproduces with 1.11 (#3239 (comment) seems to be a different one). Could you try it and report whether your issue reproduces with 1.11 or not?

They are already fixed in the master branch (#3275 and #3294) but not released yet.
We are preparing to release 1.12.2 by the end of this month; please try it after it's released.
If it still reproduces with 1.12.2 (or 1.11), I'll take a look at it.

ashie self-assigned this Mar 24, 2021
ashie added the bug label Mar 24, 2021
@indrajithgihan (Author) commented:

sysctl fs.inotify.max_user_instances

fs.inotify.max_user_instances = 128

sysctl fs.inotify.max_user_watches

fs.inotify.max_user_watches = 8192

I had the same issue with v1.11.5 as well. I am using the fluent-plugin-concat plugin and observed high CPU usage within the Fluentd pod, with lots of timeout flushes in the log. Could this be a reason for Fluentd not detecting new log files?

@nvtkaszpir commented:

@ashie I think the issues may be related: if someone leaves the sysctl inotify parameters at the default values for a given operating system and has a lot of containers per node (e.g. hitting the per-instance limits, with pods coming and going or pods running multiple sidecars), then the issue of not tailing logs may also be triggered, though AFAIR there should be a different error message in that case.

We had that problem with a different log shipper, Promtail.

@snorwin commented Mar 24, 2021

@ashie we rolled back fluentd to 1.11.5 yesterday and so far haven't encountered any issues.

@TomasKohout commented:

We also had to roll back to 1.11.x due to this error. 🙁

@ashie (Member) commented Mar 25, 2021

It seems that several different problems are mixed into this issue.
The original report by @indrajithgihan isn't 1.12-specific, so we won't treat 1.12-specific problems in this issue.

  • Reproduces with both 1.11 and 1.12: we'll continue to investigate it in this issue. (@indrajithgihan @nvtkaszpir)

  • 1.12-specific: probably it will be resolved by 1.12.2 (we'll release it next week). Please file a new issue if your problem still reproduces with 1.12.2. (@joshbranham @snorwin @TomasKohout)

  • TBD (probably 1.12-specific?): rolling back to 1.11 may resolve your issue. (@pawankkamboj @hulucc)

@pawankkamboj commented Mar 25, 2021 via email

@Cryptophobia commented Apr 8, 2021

Can this be closed now? Has anyone tested with 1.12.2 to confirm that the in_tail fixes have provided stability?

If nobody can report back, we will build and test today as well.

@ashie (Member) commented Apr 9, 2021

We have received no 1.12-specific bug reports since releasing 1.12.2.
We hope the 1.12-specific bugs are resolved by 1.12.2.

Can this be closed now?

We won't close this issue yet because:

The original report by @indrajithgihan isn't 1.12-specific, so we won't treat 1.12-specific problems in this issue.

The original comment is probably a different issue, one that has existed for a long time and occurs only rarely.

@Cryptophobia commented:

@ashie we have received reports from an internal user managing many nodes that in_tail is not tailing all of the logs. Could this be a kernel issue or something else further down the stack?

Are there any guidelines for the minimum kernel version and the dependencies required for fluentd v1.12.2?

@ashie (Member) commented Apr 9, 2021

Again, we no longer handle 1.12-specific problems in this issue.
If your problem reproduces only with v1.12 or later, please open a new issue and describe your problem in as much detail as possible.

ashie removed the bug label Apr 9, 2021
@ashie (Member) commented Apr 9, 2021

I removed the bug label because we haven't yet identified the cause of the original report in this issue.

@SimonForsman commented:

I'm having this issue in 1.11.5 running on Kubernetes. The issue only seems to occur (or it might just be a coincidence) after a few weeks of uptime, and only on nodes that start temporary pods frequently. Could this be caused by large .pos files, or has that been ruled out already?

I've enabled compaction now, but it might be a few weeks before I know whether it helped.
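
For reference, a minimal sketch of an in_tail source with pos-file compaction enabled, as discussed in this comment; the path, tag, and interval below are illustrative, not taken from this report:

<source>
  @type tail
  # Illustrative path and pos file location
  path /var/log/containers/*.log
  pos_file /var/log/containers.log.pos
  # Periodically rewrite the pos file so entries for files that are no
  # longer watched are dropped and the file does not grow without bound
  pos_file_compaction_interval 1h
  read_from_head true
  tag kubernetes.*
  <parse>
    @type none
  </parse>
</source>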

@Cryptophobia commented:

@ashie, yes, this is not a 1.12.2-specific problem; we verified yesterday. It is reproducible for us with both 1.11.2 and 1.12.2. We tried both of those versions on nodes running Ubuntu 20.04 with kernel 5.4. This is why I am asking whether this problem is related to dependencies or kernel versions.

@ashie (Member) commented Apr 12, 2021

@Cryptophobia We don't have any decisive clues for this issue yet, and several different issues may still be mixed together.
@nvtkaszpir pointed out the sysctl parameters, but I think they're not applicable to @indrajithgihan's original comment: because enable_stat_watcher false is used, it doesn't use inotify.

If you have an environment that doesn't reproduce this issue, please let me know the differences between the environments in as much detail as possible.

@Cryptophobia commented:

Issues #3333 and #3332 are the last two remaining issues we are seeing in our environment with fluentd v1.12.2. We recently switched to using gracefulReload after upgrading to v1.12.2, instead of the reload endpoint we used previously.

@ashie (Member) commented Apr 28, 2021

#3357 might be the same issue.

It seems caused by a big log file (350MB)

This would indicate that it took 18 min to read that file because, once it was done reading that file, it moved on. I suspect the cause here is a large log file.

ashie added the bug label Apr 28, 2021
@ashie (Member) commented Apr 30, 2021

#3357 might be the same issue.

It seems caused by a big log file (350MB)

This would indicate that it took 18 min to read that file because, once it was done reading that file, it moved on. I suspect the cause here is a large log file.

Hmm, I've confirmed that the in_tail plugin cannot run the refresh_watchers method while reading a big log file, which means in_tail can't detect and read new log files while reading a big one. FYI: #3323 (comment)
The log throttling feature (#3185) may resolve this issue.

@SimonForsman commented:

We've not encountered the issue since April 9th, when we enabled compaction of the position file (so ~30 days of Fluentd uptime without issues).

@ashie (Member) commented May 10, 2021

We've not encountered the issue since April 9th, when we enabled compaction of the position file (so ~30 days of Fluentd uptime without issues).

Good.
For our reference, could you tell me your config for compacting the pos file (pos_file_compaction_interval)?
@SimonForsman

@ashie (Member) commented May 11, 2021

It seems caused by a big log file (350MB)

Probably #2478 is the same issue as this.

@SimonForsman commented May 11, 2021

For our reference, could you tell me your config for compacting the pos file (pos_file_compaction_interval)?
@SimonForsman

<source>
  @type tail
  path /var/log/containers/*.log
  pos_file /var/log/containers.log.pos
  enable_stat_watcher false
  enable_watch_timer true
  pos_file_compaction_interval 1h
  read_from_head true
  tag kubernetes.*
  <parse>
    keep_time_key true
    time_format %Y-%m-%dT%H:%M:%S.%NZ
    time_key {{ .Values.k8sLogTimeKey }}
    @type {{ .Values.k8sLogFormat }}
  </parse>
</source>

I don't know whether disabling the stat/inotify watcher matters (it was one of the first things we tried, and we've never reverted it even though it didn't solve the issue, at least not on its own).

Also, if this is an issue with large files in general and not just large pos files, it might not help for everyone. (We have a lot of short-lived containers, so our pos file got significantly larger than our log files, and we're still monitoring things since we're not sure whether this helped or whether we've just been lucky for the past month.)

@ashie (Member) commented May 26, 2021

It seems caused by a big log file (350MB)

Probably #2478 is the same issue as this.

The read_bytes_limit_per_second feature (#3185, #3364 and #3379), which should be effective against the big-file issue, has been merged.
It's included in v1.13.0, which will be released at the end of this month (May 2021).
Please try it if you still have this issue.

Usage: https://github.com/fluent/fluentd-docs-gitbook/pull/259/files
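
For illustration, a minimal in_tail sketch that enables read_bytes_limit_per_second; the path, tag, and the ~8 MiB/s value are assumptions for the example, not settings recommended in this thread:

<source>
  @type tail
  # Illustrative path and pos file location
  path /var/log/containers/*.log
  pos_file /var/log/containers.log.pos
  # Throttle reads so one very large file cannot keep in_tail busy and
  # delay refresh_watchers from detecting newly created files
  # (8388608 bytes/sec is roughly 8 MiB/s, chosen only as an example)
  read_bytes_limit_per_second 8388608
  read_from_head true
  tag kubernetes.*
  <parse>
    @type none
  </parse>
</source>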

@indrajithgihan (Author) commented Jun 4, 2021

It seems caused by a big log file (350MB)

Probably #2478 is the same issue as this.

The read_bytes_limit_per_second feature (#3185, #3364 and #3379), which should be effective against the big-file issue, has been merged.
It's included in v1.13.0, which will be released at the end of this month (May 2021).
Please try it if you still have this issue.

Usage: https://github.com/fluent/fluentd-docs-gitbook/pull/259/files

@ashie Does this feature require disabling the stat watcher?

enable_stat_watcher false

@ashie (Member) commented Jun 4, 2021

@ashie Does this feature require disabling the stat watcher?

enable_stat_watcher false

No, keeping the stat watcher enabled is also supported.

@ashie (Member) commented Jul 12, 2021

The read_bytes_limit_per_second feature (#3185, #3364 and #3379), which should be effective against the big-file issue, has been merged.
It's included in v1.13.0, which will be released at the end of this month (May 2021).
Please try it if you still have this issue.

Usage: https://github.com/fluent/fluentd-docs-gitbook/pull/259/files

The effectiveness of this feature was confirmed in #3423.
Most similar issues can probably be resolved by it, so I'm closing this issue now.
If you still have this issue even when using the read_bytes_limit_per_second feature, please open a new issue.
