Problem with file output plugin after upgrade to 4.4.0 #411

Closed
piotr-janek opened this issue Aug 18, 2022 · 8 comments · Fixed by fluent/fluentd#3864

Comments

@piotr-janek

td-agent 4.4.0 ships fluentd 1.15.1, which changes the file output plugin: it now creates a /tmp/fluentd-lock-* directory. When td-agent runs without the --daemon flag, everything works: the temp dir is created and the app behaves as expected. With --daemon, however, the temp directory is created and then removed, and only afterwards is its name passed to the child processes. The children therefore cannot function properly and throw lots of "No such file or directory" errors.

Related strace output

06:31:37.550299 mkdir("/tmp/fluentd-lock-20220818-5506-14yftxz", 0700) = 0
06:31:37.579237 lstat("/tmp/fluentd-lock-20220818-5506-14yftxz", {st_dev=makedev(259, 3), st_ino=25167781, st_mode=S_IFDIR|0700, st_nlink=2, st_uid=995, st_gid=992, st_blksize=4096, st_blocks=0, st_size=6, st_atime=1660804297 /* 2022-08-18T06:31:37.549858004+0000 */, st_atime_nsec=549858004, st_mtime=1660804297 /* 2022-08-18T06:31:37.549858004+0000 */, st_mtime_nsec=549858004, st_ctime=1660804297 /* 2022-08-18T06:31:37.549858004+0000 */, st_ctime_nsec=549858004}) = 0
06:31:37.580017 openat(AT_FDCWD, "/tmp/fluentd-lock-20220818-5506-14yftxz", O_RDONLY|O_NONBLOCK|O_CLOEXEC|O_DIRECTORY) = 10
06:31:37.582363 rmdir("/tmp/fluentd-lock-20220818-5506-14yftxz") = 0
06:31:37.600303 execve("/opt/td-agent/bin/ruby", ["/opt/td-agent/bin/ruby", "-Eascii-8bit:ascii-8bit", "/opt/td-agent/bin/fluentd", "--log", "/var/log/td-agent/td-agent.log", "--daemon", "/var/run/td-agent/td-agent.pid", "--under-supervisor"], ["FLUENT_PLUGIN=/etc/td-agent/plugin", "GEM_PATH=/opt/td-agent/lib/ruby/gems/2.7.0/", "GEM_HOME=/opt/td-agent/lib/ruby/gems/2.7.0/", "TD_AGENT_LOG_FILE=/var/log/td-agent/td-agent.log", "FLUENT_SOCKET=/var/run/td-agent/td-agent.sock", "LD_PRELOAD=/opt/td-agent/lib/libjemalloc.so", "FLUENT_CONF=/etc/td-agent/td-agent.conf", "XDG_SESSION_ID=c1593", "HOSTNAME=<redacted>", "SHELL=/bin/bash", "TERM=xterm-256color", "HISTSIZE=1000", "USER=td-agent", "LS_COLORS=rs=0:di=38;5;27:ln=38;5;51:mh=44;38;5;15:pi=40;38;5;11:so=38;5;13:do=38;5;5:bd=48;5;232;38;5;11:cd=48;5;232;38;5;3:or=48;5;232;38;5;9:mi=05;48;5;232;38;5;15:su=48;5;196;38;5;15:sg=48;5;11;38;5;16:ca=48;5;196;38;5;226:tw=48;5;10;38;5;16:ow=48;5;10;38;5;21:st=48;5;21;38;5;15:ex=38;5;34:*.tar=38;5;9:*.tgz=38;5;9:*.arc=38;5;9:*.arj=38;5;9:*.taz=38;5;9:*.lha=38;5;9:*.lz4=38;5;9:*.lzh=38;5;9:*.lzma=38;5;9:*.tlz=38;5;9:*.txz=38;5;9:*.tzo=38;5;9:*.t7z=38;5;9:*.zip=38;5;9:*.z=38;5;9:*.Z=38;5;9:*.dz=38;5;9:"..., "MAIL=/var/spool/mail/td-agent", "PATH=/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/usr/local/bin:/usr/local/sbin", "PWD=/var/lib/td-agent", "LANG=en_US.UTF-8", "HISTCONTROL=ignoredups", "SHLVL=1", "HOME=/var/lib/td-agent", "LOGNAME=td-agent", "LESSOPEN=||/usr/bin/lesspipe.sh %s", "XDG_RUNTIME_DIR=/run/user/995", "_=/opt/td-agent/bin/fluentd", "FLUENTD_LOCK_DIR=/tmp/fluentd-lock-20220818-5506-14yftxz", "SERVERENGINE_SOCKETMANAGER_PATH=/tmp/SERVERENGINE_SOCKETMANAGER_2022-08-18T06:31:37Z_5695", "SERVERENGINE_WORKER_ID=0", "SERVERENGINE_SOCKETMANAGER_INTERNAL_TOKEN=7b50c9427382eebbf950bffb9d6b1809"]) = 0

06:33:06.155735 open("/tmp/fluentd-lock-20220818-5506-14yftxz/fluentd-_var_log_<redacted>_log.lock", O_WRONLY|O_CREAT|O_TRUNC|O_CLOEXEC, 0666) = -1 ENOENT (No such file or directory)
06:33:06.157334 write(5, "2022-08-18 06:33:06 +0000 [warn]: #3 failed to flush the buffer. retry_times=0 next_retry_time=2022-08-18 06:33:07 +0000 chunk=\"5e67e2765ec0c69c43f4717523cd22b1\" error_class=Errno::ENOENT error=\"No such file or directory @ rb_sysopen - /tmp/fluentd-lock-20220818-5506-14yftxz/fluentd-_var_log_<redacted>_log.lock\"\n", 383) = 383

The problem does not occur in earlier td-agent versions, as they use a version of fluentd that does not create this directory.
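
The failure in the last two strace lines can be reproduced in isolation. The following is a minimal sketch, not Fluentd's actual code: `with_worker_lock` and `simulate_flush_after_cleanup` are hypothetical names, and the lock file naming only imitates the path seen in the trace. It exports a directory name through FLUENTD_LOCK_DIR, lets the directory be removed, and then opens a lock file under it in "w" mode (O_WRONLY|O_CREAT|O_TRUNC, as in the trace).

```ruby
require "tmpdir"

# Hypothetical helper: take an exclusive lock on a file under FLUENTD_LOCK_DIR.
# Mode "w" maps to O_WRONLY|O_CREAT|O_TRUNC, matching the open() in the strace.
def with_worker_lock(name)
  lock_path = File.join(ENV.fetch("FLUENTD_LOCK_DIR"), "fluentd-#{name}.lock")
  File.open(lock_path, "w") do |f|
    f.flock(File::LOCK_EX)
    yield
  end
end

# Imitates the --daemon sequence: the directory is created, its name is
# exported, and the directory is removed before any lock file is opened.
def simulate_flush_after_cleanup
  Dir.mktmpdir("fluentd-lock-") { |dir| ENV["FLUENTD_LOCK_DIR"] = dir }
  with_worker_lock("example") { :flushed }
rescue Errno::ENOENT => e
  e.class
end

puts simulate_flush_after_cleanup
```

The open call fails because O_CREAT creates only the file, not its missing parent directory, so the sketch ends up rescuing Errno::ENOENT instead of returning :flushed.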

What kind of information should I attach to make it easier for you to find the solution?

@ashie
Member

ashie commented Aug 18, 2022

@fujimotos @daipom Could you take a look at this?

@daipom
Contributor

daipom commented Aug 19, 2022

Sure! I will.

@fujimotos
Member

This is my bug. Evidently se.run exits early with --daemon, so the block below finishes and the temp directory is removed before the workers start using it.
I implemented the tempdir creation as follows:

https://github.com/fluent/fluentd/blob/master/lib/fluent/supervisor.rb#L874-L877

      Dir.mktmpdir("fluentd-lock-") do |lock_dir|
        ENV['FLUENTD_LOCK_DIR'] = lock_dir
        se.run
      end

So what we probably need is to revert fluent/fluentd@75ef92f,
which introduced the automatic cleanup based on the PR feedback.
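
To illustrate the difference between the two forms of Dir.mktmpdir, here is a small self-contained sketch. It is not the actual patch; the method names are made up for illustration, and the early return of se.run is imitated by a block that simply returns right away.

```ruby
require "tmpdir"
require "fileutils"

# Block form: Dir.mktmpdir removes the directory as soon as the block returns.
# The block here returns immediately, standing in for se.run exiting early.
def lock_dir_with_block
  Dir.mktmpdir("fluentd-lock-") do |lock_dir|
    lock_dir # the path outlives the directory it names
  end
end

# Non-block form: the directory stays until the caller removes it.
def lock_dir_without_block
  Dir.mktmpdir("fluentd-lock-")
end

removed = lock_dir_with_block
kept = lock_dir_without_block
puts "block form still exists:     #{Dir.exist?(removed)}"
puts "non-block form still exists: #{Dir.exist?(kept)}"

# With the non-block form, cleanup becomes the caller's job.
FileUtils.remove_entry(kept)
```

With the non-block form the directory survives an early return, at the cost of having to decide explicitly when (or whether) to clean it up.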

@fujimotos
Member

fujimotos commented Aug 19, 2022

And this is the fix: fluent/fluentd#3864
I confirmed it now works with --daemon with the following config:

<system>
  workers 3
</system>

<source>
  @type dummy
  tag test.log
</source>

<match test.**>
  @type file
  path test.log
  append true
  <buffer>
    @type memory
    flush_interval 3s
    flush_mode interval
  </buffer>
</match>

and by running Fluentd as follows:

$ fluentd --daemon test.pid --log test.log -c test.conf
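
For anyone who wants to check an already running worker, one possible diagnostic is sketched below. It is a hypothetical script, not part of Fluentd, and it assumes Linux (it reads /proc/<pid>/environ) and enough permissions to read that file. It looks up FLUENTD_LOCK_DIR in the process environment and reports whether the directory still exists.

```ruby
# Hypothetical diagnostic, not part of Fluentd.

# Extracts FLUENTD_LOCK_DIR from the raw, NUL-separated contents of
# /proc/<pid>/environ. Returns nil if the variable is not set.
def lock_dir_from_environ(raw)
  raw.split("\0").each do |entry|
    key, value = entry.split("=", 2)
    return value if key == "FLUENTD_LOCK_DIR"
  end
  nil
end

# Reports whether the lock directory named in the process environment exists.
def lock_dir_status(pid)
  dir = lock_dir_from_environ(File.binread("/proc/#{pid}/environ"))
  return "FLUENTD_LOCK_DIR is not set" if dir.nil?

  Dir.exist?(dir) ? "#{dir} exists" : "#{dir} is missing"
end

# Only runs when a PID is given on the command line.
if __FILE__ == $PROGRAM_NAME && ARGV[0]
  puts lock_dir_status(Integer(ARGV[0]))
end
```

Pass it the PID of a worker process; a result of "is missing" corresponds to the ENOENT errors reported above.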

@fujimotos
Member

@ashie Can we go ahead and release td-agent v4.4.1? Since --daemon is included
in the default config, I think we should make a point release for it.

ExecStart=/opt/td-agent/bin/fluentd --log $TD_AGENT_LOG_FILE --daemon <%= Shellwords.shellescape("/var/run/#{project_name}/#{project_name}.pid") %> $TD_AGENT_OPTIONS
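
To see what that template line expands to, it can be rendered with Ruby's standard ERB library. This is only a sketch: the value "td-agent" for project_name is an assumption (it matches the pid path in the strace above), and `render_exec_start` is a made-up helper name.

```ruby
require "erb"
require "shellwords"

# The ExecStart line from the unit template, kept verbatim.
# The quoted heredoc prevents Ruby from interpolating #{...} before ERB runs.
EXEC_START_TEMPLATE = <<~'TEMPLATE'.chomp
  ExecStart=/opt/td-agent/bin/fluentd --log $TD_AGENT_LOG_FILE --daemon <%= Shellwords.shellescape("/var/run/#{project_name}/#{project_name}.pid") %> $TD_AGENT_OPTIONS
TEMPLATE

# Renders the template for a given project name (assumed, not from the source).
def render_exec_start(project_name)
  ERB.new(EXEC_START_TEMPLATE).result_with_hash(project_name: project_name)
end

puts render_exec_start("td-agent")
```

The rendered line always contains --daemon followed by the pid file path, which is why a default installation hits this bug.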

@ashie
Member

ashie commented Aug 19, 2022

Yea, we should release it ASAP.
In addition, I want to include fluent-plugin-kafka's fix: fluent/fluent-plugin-kafka#466

@fujimotos
Member

Fixed by fluent/fluentd#3864. Scheduled to be released early next week.

@piotr-janek
Author

Thanks, I did not expect this to happen so fast. You are awesome.
