Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

in_tail may not send rotated logs when mv is used to rotate #3292

Closed
aYukiSekiguchi opened this issue Mar 15, 2021 · 8 comments
Closed

in_tail may not send rotated logs when mv is used to rotate #3292

aYukiSekiguchi opened this issue Mar 15, 2021 · 8 comments
Assignees

Comments

@aYukiSekiguchi
Copy link

aYukiSekiguchi commented Mar 15, 2021

Describe the bug
After upgrading fluentd from 0.14.24 to 1.12.1, tail plugin + my script which rotates the watched log using mv sometimes won't watch the rotated log file and send the first part of rotated logs.

To Reproduce

Watch /tmp/script.log

<source>
  @type tail
  path /tmp/script.log
  tag my-tag
  pos_file /var/log/td-agent/script.pos
  <parse>
    @type json
    time_key time
  </parse>
</source>

mv /tmp/script.log to /tmp/script.log.20210314 and write to /tmp/script.log

Expected behavior
The content in /tmp/script.log should be sent.

Your Environment

  • Fluentd or td-agent version: td-agent 4.1.0 fluentd 1.12.1 (e3effa3)
  • Operating system: Amazon Linux 2
  • Kernel version: 4.14.192-147.314.amzn2.x86_64

Your Configuration

<source>
  @type tail
  path /tmp/script.log
  tag my-tag
  pos_file /var/log/td-agent/script.pos
  <parse>
    @type json
    time_key time
  </parse>
</source>

Your Error Log

none (No rotate detection log)

Additional context

I'm trying to upgrade fluentd from 0.14.24 to 1.12.1 and faced two problems.

My script is called by cron every day. At the start of the script, it mvs log to rotate, do something great and dump logs to a new file.

FILE_NAME = '/tmp/script.log'

def rotate_file_if_exist
  if File.exists?(FILE_NAME)
    new_file_name = "#{FILE_NAME}.#{Time.now.strftime('%Y%m%d')}"
    FileUtils.mv(FILE_NAME, new_file_name)
  end
end

fluentd is watching the file name. My config is:

<source>
  @type tail
  path /tmp/script.log
  tag my-tag
  pos_file /var/log/td-agent/script.pos
  <parse>
    @type json
    time_key time
  </parse>
</source>

Here are problems I faced:

  1. TailWatcher sometimes won't restart
  2. TailWatcher may not send the first part of rotated logs

I debugged them using mv and cp command.

TailWatcher won't restart

When I run mv, the file of the filename becomes none. Then, TailWatcher#@rotate_handler becomes nil. This stops to watch the inode.

https://github.com/fluent/fluentd/blob/v1.12.1/lib/fluent/plugin/in_tail.rb#L763

          else # file is rotated and new file not found
            # Clear RotateHandler to avoid duplicated file watch in same path.
            @rotate_handler = nil
            watcher_needs_update = true
          end

          if watcher_needs_update
            # No need to update a watcher if stat is nil (file not present), because moving to inodes will create
            # new watcher, and old watcher will be closed by stop_watcher in refresh_watchers method
            if stat
              target_info = TargetInfo.new(@path, stat.ino)
              if @follow_inodes
                # don't want to swap state because we need latest read offset in pos file even after rotate_wait
                @update_watcher.call(target_info, @pe)
              else
                @update_watcher.call(target_info, swap_state(@pe))
              end
            end

The comment says "moving to inodes will create new watcher, and old watcher will be closed by stop_watcher in refresh_watchers method". However, this only happens if refresh_watchers() runs when there is no file in the path and refresh_watchers() runs later when a file is created in the path. If a file is mved and a new file is created between refresh_watchers()s, refresh_watchers() won't run start_watchers() and stop_watchers() for the path because target_paths_hash always contains the path.

TailWatcher may not send the first part of rotated logs

refresh_watchers() will create a new TailWatcher if a new file is created. refresh_watchers() runs every refresh_interval seconds, 60 seconds as default.

The problem is when a new file is created and logs are written to the file during refresh_interval, the new TimeWatcher won't send the written logs because it watch the tail of the new file which contains the written logs.

read_from_head true fixes this problem, but I'm not sure whether this is intentional fix in #3182. Even if it is intentional, it would be better to write a document about this behaviour.

Why fluentd 0.14.24 worked?

fluentd 0.14.24 creates a new TailWatcher soon even if a new file doesn't exist:
https://github.com/fluent/fluentd/blob/v0.14.24/lib/fluent/plugin/in_tail.rb#L584

          if watcher_needs_update
            @update_watcher.call(@path, swap_state(@pe))
          else

It looks like this is the reason why old fluentd detected the rotation but new fluentd doesn't.

Adding FileUtils.touch(FILE_NAME) to my script sometimes fixes these problems. However, since mv and touch aren't atomic, StatWatcher can detect the timing that there is no file for the filename, IIUC. Even if enable_stat_watcher is false, there is rare timing TimerTrigger calls on_notify when there is no file for a file name.

@cosmo0920
Copy link
Contributor

cosmo0920 commented Mar 15, 2021

I'd sent a PR to fix the former "TailWatcher won't restart" issue: #3294
Could you try it out?

@aYukiSekiguchi
Copy link
Author

Thank you for the quick response!
I applied your PR to my local fluentd.

I'll answer the result of next log rotation tomorrow.

@aYukiSekiguchi
Copy link
Author

@cosmo0920 #3294 fixed this problem. Thank you!

@cosmo0920
Copy link
Contributor

Thanks for the confirmation, @aYukiSekiguchi !

@cosmo0920
Copy link
Contributor

I'd confirmed that the latter issue of TailWatcher may not send the first part of rotated logs is fixed with #3294.

@aYukiSekiguchi
Copy link
Author

Thank you for the quick fix!

@hikkie3110
Copy link

@cosmo0920
I have a question I would like you to tell me.

Will this PR also work for windows server?
and do the behavior mentioned in #3292 also occur on windows server?

I'm trying to assess the impact on windows server.

@daipom
Copy link
Contributor

daipom commented Apr 18, 2022

@hikkie3110 This problem occurs also on Windows. However only td-agent 4.1.0 (Fluentd 1.12.0, 1.12.1) has this problem.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants