Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Log forwarding stopped after "No such file or directory" error in in_tail #3224

Closed
yvespp opened this issue Jan 8, 2021 · 3 comments
Closed

Comments

@yvespp
Copy link

yvespp commented Jan 8, 2021

Describe the bug

Fluentd stops forwarding logs after error:

Unexpected error raised. Stopping the timer. title=:in_tail_refresh_watchers error_class=Errno::ENOENT error="No such file or directory @ rb_file_s_stat - 

This error only startted happening when I updated to fluentd 1.12.0 and set follow_inodes true in the tail plugin.
Sometime there is also this warn:

#0 Unparsable line in pos_file:

To Reproduce
Happens randomly in our environment, happens more often on servers with more log volume.

Expected behavior
No error should happen. If an error happens the log should still be forwarded.

Your Environment

  • Fluentd or td-agent version: fluentd 1.12.0
  • Operating system: we use the docker image fluent/fluentd:v1.12.0-1.0 on Ubuntu 20.04.1 LTS
  • Kernel version: 5.4.0-54-generic
  • containerd: v1.4.3
  • Kubernetes: v1.19.5

Your Configuration

<source>
  @type tail
  path /var/log/pods/*/*/*.log
  pos_file /fluentd/log/fluent_output.log.pos
  pos_file_compaction_interval 1h
  follow_inodes true
  read_from_head true
  refresh_interval 2
  tag except.kubernetes.*
  path_key tailed_path
  <parse>
    @type regexp
    expression "/^(?<time>.+) (stdout|stderr) [FP] (?<log>.+)?$/."
    time_format %Y-%m-%dT%H:%M:%S.%NZ
  </parse>
</source>

# drop empty log lines
<filter except.kubernetes.**.log>
  @type grep
  <exclude>
    key log
    pattern /^$/
  </exclude>
</filter>

# get kubernetes pod infos from log path
<filter except.kubernetes.**.log>
  @type parser
  key_name tailed_path
  reserve_time true
  reserve_data true
  remove_key_name_field true
  hash_value_field kubernetes
  <parse>
    @type regexp
    expression /^\/var\/log\/pods\/(?<namespace_name>.+)_(?<pod_name>.+)_.*\/(?<container_name>.*)\/.*\.log$/
  </parse>
</filter>

<filter except.kubernetes.**.log>
  @type record_transformer
  enable_ruby true
  <record>
    # Add cluster name
    # Line breaks may be trimmed when collecting from files. Add them back so that
    # multi line logs are still in multi line after combined by detect_exceptions.
    # Remove this if https://github.com/GoogleCloudPlatform/fluent-plugin-detect-exceptions/pull/10 is released
    # strings in ruby are mutable, so make use of it
    log ${ record['kubernetes']['cluster'] = "prod"; record["log"] << "\n" }
  </record>
</filter>

# expose metrics in prometheus format
<source>
  @type prometheus
  bind 0.0.0.0
  port 24231
  metrics_path /metrics
</source>
<source>
  @type prometheus_output_monitor
  interval 10
  <labels>
    hostname ${hostname}
  </labels>
</source>

# After the exception detection match the record reenter the processing
# from the beginning. Since except is removed from the tag its not processed
# a second time here.
<match except.kubernetes.**.log>
  @type detect_exceptions
  message log
  remove_tag_prefix except
  multiline_flush_interval 1
</match>

# Drop Logs from AMA / Spoud forwarding, to prevent Log-Loops
<filter kubernetes.**.log>
  @type grep
  <or>
    <exclude>
      key $.kubernetes.namespace_name
      pattern /^ama-[a-z]$/
    </exclude>
    <exclude>
      key log
      pattern /^\s+$/
    </exclude>
  </or>
</filter>

# filter out spamy nginx-ingress warnings
<filter kubernetes.**.log>
  @type grep
  <and>
    <exclude>
      key $.kubernetes.namespace_name
      pattern /^cop-nginx-ingress$/
    </exclude>
    <exclude>
      key log
      pattern /W0806|\]TCP2000/
    </exclude>
  </and>
</filter>

# tag AMA beats
<match kubernetes.**.log>
  @type rewrite_tag_filter
  <rule>
    key log
    pattern /beatType/
    tag amabeat.${tag}
  </rule>
  <rule>
    key log
    pattern /.+/
    tag k8s.${tag}
  </rule>
</match>

<match **>
  @type relabel
  @label @standardLog
</match>

# send k8s.** to splunk and amabeat.** to ama endpoint
<label @standardLog>
  # add tag field to use it as source in splunk
  <filter **.kubernetes.**.log>
    @type record_transformer
    <record>
      tag ${tag}
    </record>
  </filter>

  <match k8s.**>
    @type splunk_hec
    port 8088
    host <edited>
    token <edited>
    default_index idx_sep_pro
    source_key tag
    remove_source_key true
    ssl_verify false
    use_ssl true
    default_host "#{ENV['HOSTNAME']}"
    default_sourcetype dockerjson
    time_as_integer false

    <buffer>
      flush_thread_count 8
      flush_interval 2s
      chunk_limit_size 8M
      retry_max_interval 30
      retry_forever true
    </buffer>
  </match>


  <match amabeat.**>
    @type http
    endpoint_url <edited>
    ssl_no_verify true
    bulk_request true
    compress_request false
    <buffer>
      flush_thread_count 8
      flush_interval 2s
      chunk_limit_size 2M
      retry_max_interval 30
    </buffer>
  </match>

</label>

# Drop fluentd log as it's written to stdout anyway
<label @FLUENT_LOG>
  <match fluent.*>
    @type null
  </match>
</label>

Your Error Log
This show only the last log lines.

2021-01-07 14:58:18 +0000 [info]: #0 detected rotation of /var/log/pods/ns-a_pod-a-76c45d5b59-kmrfl_86c23b54-348c-426d-ac46-2ca58bb05f3c/pod-a/0.log; waiting 5 seconds
2021-01-07 15:01:18 +0000 [info]: #0 following tail of /var/log/pods/ns-a_pod-a-76c45d5b59-kmrfl_86c23b54-348c-426d-ac46-2ca58bb05f3c/pod-a/0.log
2021-01-07 15:01:18 +0000 [info]: #0 detected rotation of /var/log/pods/ns-a_pod-a-76c45d5b59-kmrfl_86c23b54-348c-426d-ac46-2ca58bb05f3c/pod-a/0.log; waiting 5 seconds
2021-01-07 15:04:29 +0000 [info]: #0 detected rotation of /var/log/pods/ns-a_pod-a-76c45d5b59-kmrfl_86c23b54-348c-426d-ac46-2ca58bb05f3c/pod-a/0.log; waiting 5 seconds
2021-01-07 15:04:29 +0000 [info]: #0 following tail of /var/log/pods/ns-a_pod-a-76c45d5b59-kmrfl_86c23b54-348c-426d-ac46-2ca58bb05f3c/pod-a/0.log
2021-01-07 15:05:00 +0000 [warn]: #0 Unparsable line in pos_file: /var/log/pods/ns-a_pod-a-76c45d5b59-kmrfl_86c23b54-348c-426d-ac46-2ca58bb05f3c/pod-a/0.lo000000000020101354	00000000003b113a
2021-01-07 15:07:42 +0000 [info]: #0 following tail of /var/log/pods/ns-a_pod-a-76c45d5b59-kmrfl_86c23b54-348c-426d-ac46-2ca58bb05f3c/pod-a/0.log
2021-01-07 15:07:42 +0000 [info]: #0 detected rotation of /var/log/pods/ns-a_pod-a-76c45d5b59-kmrfl_86c23b54-348c-426d-ac46-2ca58bb05f3c/pod-a/0.log; waiting 5 seconds
2021-01-07 15:10:54 +0000 [info]: #0 following tail of /var/log/pods/ns-a_pod-a-76c45d5b59-kmrfl_86c23b54-348c-426d-ac46-2ca58bb05f3c/pod-a/0.log
2021-01-07 15:10:54 +0000 [info]: #0 detected rotation of /var/log/pods/ns-a_pod-a-76c45d5b59-kmrfl_86c23b54-348c-426d-ac46-2ca58bb05f3c/pod-a/0.log; waiting 5 seconds
2021-01-07 15:14:06 +0000 [info]: #0 following tail of /var/log/pods/ns-a_pod-a-76c45d5b59-kmrfl_86c23b54-348c-426d-ac46-2ca58bb05f3c/pod-a/0.log
2021-01-07 15:14:06 +0000 [info]: #0 detected rotation of /var/log/pods/ns-a_pod-a-76c45d5b59-kmrfl_86c23b54-348c-426d-ac46-2ca58bb05f3c/pod-a/0.log; waiting 5 seconds
2021-01-07 15:17:16 +0000 [info]: #0 following tail of /var/log/pods/ns-a_pod-a-76c45d5b59-kmrfl_86c23b54-348c-426d-ac46-2ca58bb05f3c/pod-a/0.log
2021-01-07 15:17:16 +0000 [info]: #0 detected rotation of /var/log/pods/ns-a_pod-a-76c45d5b59-kmrfl_86c23b54-348c-426d-ac46-2ca58bb05f3c/pod-a/0.log; waiting 5 seconds
2021-01-07 15:20:28 +0000 [info]: #0 detected rotation of /var/log/pods/ns-a_pod-a-76c45d5b59-kmrfl_86c23b54-348c-426d-ac46-2ca58bb05f3c/pod-a/0.log; waiting 5 seconds
2021-01-07 15:20:28 +0000 [info]: #0 following tail of /var/log/pods/ns-a_pod-a-76c45d5b59-kmrfl_86c23b54-348c-426d-ac46-2ca58bb05f3c/pod-a/0.log
2021-01-07 15:23:39 +0000 [info]: #0 following tail of /var/log/pods/ns-a_pod-a-76c45d5b59-kmrfl_86c23b54-348c-426d-ac46-2ca58bb05f3c/pod-a/0.log
2021-01-07 15:23:39 +0000 [info]: #0 detected rotation of /var/log/pods/ns-a_pod-a-76c45d5b59-kmrfl_86c23b54-348c-426d-ac46-2ca58bb05f3c/pod-a/0.log; waiting 5 seconds
2021-01-07 15:26:49 +0000 [info]: #0 detected rotation of /var/log/pods/ns-a_pod-a-76c45d5b59-kmrfl_86c23b54-348c-426d-ac46-2ca58bb05f3c/pod-a/0.log; waiting 5 seconds
2021-01-07 15:26:49 +0000 [info]: #0 following tail of /var/log/pods/ns-a_pod-a-76c45d5b59-kmrfl_86c23b54-348c-426d-ac46-2ca58bb05f3c/pod-a/0.log
2021-01-07 15:30:11 +0000 [error]: #0 Unexpected error raised. Stopping the timer. title=:in_tail_refresh_watchers error_class=Errno::ENOENT error="No such file or directory @ rb_file_s_stat - /var/log/pods/ns-a_pod-a-76c45d5b59-kmrfl_86c23b54-348c-426d-ac46-2ca58bb05f3c/pod-a/0.log"
  2021-01-07 15:30:11 +0000 [error]: #0 /usr/lib/ruby/gems/2.7.0/gems/fluentd-1.12.0/lib/fluent/plugin/in_tail.rb:316:in `stat'
  2021-01-07 15:30:11 +0000 [error]: #0 /usr/lib/ruby/gems/2.7.0/gems/fluentd-1.12.0/lib/fluent/plugin/in_tail.rb:316:in `block in expand_paths'
  2021-01-07 15:30:11 +0000 [error]: #0 /usr/lib/ruby/gems/2.7.0/gems/fluentd-1.12.0/lib/fluent/plugin/in_tail.rb:315:in `each'
  2021-01-07 15:30:11 +0000 [error]: #0 /usr/lib/ruby/gems/2.7.0/gems/fluentd-1.12.0/lib/fluent/plugin/in_tail.rb:315:in `expand_paths'
  2021-01-07 15:30:11 +0000 [error]: #0 /usr/lib/ruby/gems/2.7.0/gems/fluentd-1.12.0/lib/fluent/plugin/in_tail.rb:344:in `refresh_watchers'
  2021-01-07 15:30:11 +0000 [error]: #0 /usr/lib/ruby/gems/2.7.0/gems/fluentd-1.12.0/lib/fluent/plugin_helper/timer.rb:80:in `on_timer'
  2021-01-07 15:30:11 +0000 [error]: #0 /usr/lib/ruby/gems/2.7.0/gems/cool.io-1.7.0/lib/cool.io/loop.rb:88:in `run_once'
  2021-01-07 15:30:11 +0000 [error]: #0 /usr/lib/ruby/gems/2.7.0/gems/cool.io-1.7.0/lib/cool.io/loop.rb:88:in `run'
  2021-01-07 15:30:11 +0000 [error]: #0 /usr/lib/ruby/gems/2.7.0/gems/fluentd-1.12.0/lib/fluent/plugin_helper/event_loop.rb:93:in `block in start'
  2021-01-07 15:30:11 +0000 [error]: #0 /usr/lib/ruby/gems/2.7.0/gems/fluentd-1.12.0/lib/fluent/plugin_helper/thread.rb:78:in `block in thread_create'
2021-01-07 15:30:11 +0000 [error]: #0 Timer detached. title=:in_tail_refresh_watchers
2021-01-07 15:30:11 +0000 [info]: #0 detected rotation of /var/log/pods/ns-a_pod-a-76c45d5b59-kmrfl_86c23b54-348c-426d-ac46-2ca58bb05f3c/pod-a/0.log; waiting 5 seconds
2021-01-07 15:30:11 +0000 [info]: #0 following tail of /var/log/pods/ns-a_pod-a-76c45d5b59-kmrfl_86c23b54-348c-426d-ac46-2ca58bb05f3c/pod-a/0.log
2021-01-07 15:33:45 +0000 [info]: #0 detected rotation of /var/log/pods/ns-a_pod-a-76c45d5b59-kmrfl_86c23b54-348c-426d-ac46-2ca58bb05f3c/pod-a/0.log; waiting 5 seconds

Additional context
Before updating to fluentd 1.12.0 we also got unreadable warnings like below from time to time. I think it has to with the way the kubelet rotates the containerd logs and the new log isn't readable while it's being created:

2021-01-08 07:51:49 +0000 [warn]: #0 /var/log/pods/ns-b_pod-b_9a6734a0-96e4-4005-9dc3-8d46e9e9f9ba/pod-b/89.log unreadable. It is excluded and would be examined next time.
@yvespp
Copy link
Author

yvespp commented Jan 12, 2021

Also tried without follow_inodes true and the No such file or directory error still happens.

@repeatedly
Copy link
Member

I will check recent code changes.

@jgehrcke
Copy link
Contributor

jgehrcke commented Mar 2, 2021

Attempt to fix: #3275 -- quick review and feedback appreciated. Thanks!

@ashie ashie closed this as completed in 48ff30d Mar 4, 2021
ashie added a commit that referenced this issue Mar 4, 2021
in_tail: expect ENOENT during stat (try to fix #3274 and #3224)
ashie added a commit to ashie/fluentd that referenced this issue Mar 4, 2021
ashie added a commit to ashie/fluentd that referenced this issue Mar 12, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants