in_tail takes a long time to find rotated files #2478

Closed
metayd opened this issue Jul 3, 2019 · 8 comments · Fixed by #3390


metayd commented Jul 3, 2019

Related to issue #2477.

Will there be any risk if I change read_more = true to read_more = false in my own copy of the in_tail source code, such as losing records or other issues?

We came across a problem here. We have ten log files. Our log generator writes records (0.5 KB per record) to each of them at 1000 lines/s and rotates a file by renaming it to another name (so the inode changes but the original filename stays the same) when the file reaches 100 MB. We use in_tail to collect these files.

We found two problems.

  1. When a file is rotated, it takes a long time for in_tail to discover the new inode of the log file, and after several rotations it seems in_tail simply loses some of the rotated files (rotation is quite fast).
  2. When starting fluentd, I found it takes a long time for in_tail to start_watchers for these files, judging from the fluentd log (I added some debug logging after setup_watcher). Combined with the rotation issue, it seems fluentd eventually loses some of the log files.

I'm not familiar with in_tail's performance characteristics, so maybe it's not in_tail's problem but the way I use it. CPU usage is not high and the flush speed is OK.

For the moment, I have worked around these two problems by changing read_more = true to read_more = false (it takes effect), but I'm not sure whether there is any risk with this change.

It would be very kind of you to give me some advice on these two problems.
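
For reference, a minimal in_tail configuration sketch matching the setup described above; the path, pos_file, tag, and parse settings here are placeholders I have assumed, not values taken from this report:

```
<source>
  @type tail
  # Hypothetical values; the report only says there are ten log files
  # rotated by rename when they reach 100 MB.
  path /var/log/app/*.log
  pos_file /var/log/fluentd/app.log.pos
  tag app.logs
  <parse>
    @type none
  </parse>
</source>
```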

metayd added the bug label Jul 3, 2019
metayd (Author) commented Jul 3, 2019

It seems that if we don't break the read loop at https://github.com/fluent/fluentd/blob/master/lib/fluent/plugin/in_tail.rb#L741, in_tail takes a very long time to setup_watcher and to read content from the other files when the log files are big.
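
For readers following along, here is a self-contained sketch of the control flow being discussed. It is not the actual in_tail source; the names (drain, emit, BATCH_LIMIT) are illustrative, and only the read_more flag and the 8192-byte readpartial chunk size come from the snippets quoted in this thread:

```ruby
# Illustrative sketch only, not the in_tail implementation.
BATCH_LIMIT = 1000  # hypothetical stand-in for read_lines_limit

def emit(lines)
  puts "emitted #{lines.size} lines"  # stand-in for handing lines to the router
end

def drain(io, read_more_enabled)
  loop do
    read_more = false
    lines = []
    begin
      while true
        chunk = io.readpartial(8192)      # same 8 KiB chunk size quoted above
        lines.concat(chunk.split("\n"))   # naive split; chunk-boundary handling omitted
        if lines.size >= BATCH_LIMIT
          # With read_more enabled the outer loop runs again immediately, so one
          # large file is drained to EOF before anything else gets a turn.
          # With it disabled, control returns to the caller after one batch.
          read_more = read_more_enabled
          break
        end
      end
    rescue EOFError
      # no more data in this file for now
    end
    emit(lines) unless lines.empty?
    break unless read_more
  end
end

# Usage with a hypothetical file:
#   File.open("big.log") { |f| drain(f, true)  }  # reads until EOF in one call
#   File.open("big.log") { |f| drain(f, false) }  # reads at most one batch per call
```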

repeatedly (Member) commented Jul 5, 2019

> When a file is rotated, it takes a long time for in_tail to discover the new inode of the log file, and after several rotations it seems in_tail simply loses some of the rotated files (rotation is quite fast).

Does this mean that detecting the new file takes a long time, not reading the data, right?

How about this parameter? https://docs.fluentd.org/input/tail#skip_refresh_on_startup

> For the moment, I have worked around these two problems by changing read_more = true to read_more = false (it takes effect), but I'm not sure whether there is any risk with this change.

I see. I understand the background of the previous issue now. The current implementation mainly aims to hit EOF frequently, so your approach is worth considering.

metayd (Author) commented Jul 5, 2019

@repeatedly

> Does this mean that detecting the new file takes a long time, not reading the data, right?

Yes. But there is also another problem: if read_more = true, in_tail seems to read only one of the log files for a long time (maybe several minutes) before it starts to read another file (I found that by watching the interval between pos file entry updates); it looks like in_tail gets stuck on one of them. If read_more = false, in_tail seems to read from all files evenly.

Changing read_more = true to read_more = false is the only way I have found to solve these two problems; I wish there were another way.

> How about this parameter? https://docs.fluentd.org/input/tail#skip_refresh_on_startup

It is already set to true.

> I see. I understand the background of the previous issue now. The current implementation mainly aims to hit EOF frequently, so your approach is worth considering.

The modified version of in_tail has been running for 2 days and it looks fine (though I'm not sure whether there are potential issues). I want to know what kind of risk changing read_more=true to read_more=false carries: record loss? high CPU or memory usage? anything else?

I don't get the meaning of "the current implementation mainly aims to hit EOF frequently". Does it mean there is a risk of losing some records at the end of a log file when it is rotated? I think in_tail would eventually reach the end of the file if the file were never rotated. How about setting read_more = true only after the file is rotated?

repeatedly (Member) commented Jul 16, 2019

> The modified version of in_tail has been running for 2 days and it looks fine (though I'm not sure whether there are potential issues). I want to know what kind of risk changing read_more=true to read_more=false carries: record loss? high CPU or memory usage? anything else?

Good to hear that there is no problem in your production environment. We will check several situations with your suggestion, and if there is no problem, the change will be included in v1.7.0.

metayd (Author) commented Jul 17, 2019

@repeatedly

> Good to hear that there is no problem in your production environment. We will check several situations with your suggestion, and if there is no problem, the change will be included in v1.7.0.

I'm sorry, but there is bad news.

I found that some records may be lost. I added some logging to print the file size and the fd position when the io handler is closed, and I found that sometimes the io handler is closed before fd.pos reaches the end of the file, which means some records are lost forever.

> The current implementation mainly aims to hit EOF frequently.

I think I get the meaning of that now.

When rotation happens, in_tail waits just 5 seconds according to the default rotate_wait setting. I guess that when read_more = true, handle_notify always hits the end of the file (though I'm not very sure). But when read_more=false, in_tail only continues to read the rotated file for rotate_wait seconds, reading 8192 bytes (according to io.readpartial(8192, @iobuf)) each time. So if fd.pos is far from the file size, or the flush speed of the output is slow, records at the end of the file are lost when it is rotated.

I resolved this by setting rotate_wait to a bigger value (the value was determined by testing in_tail against our log generation speed and output flush speed), but I don't know whether that is suitable for fluentd in general.
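
For illustration, a minimal sketch of that workaround in configuration form; the path, pos_file, and tag are placeholders, and the rotate_wait value is an assumption that would have to be tuned against the actual write and flush rates, as described above:

```
<source>
  @type tail
  path /var/log/app/*.log            # hypothetical path
  pos_file /var/log/fluentd/app.log.pos
  tag app.logs
  # Default is 5s; a larger value keeps watching the rotated (renamed) file
  # longer so in_tail can catch up before the old inode is dropped.
  rotate_wait 60s
  <parse>
    @type none
  </parse>
</source>
```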

> We found two problems.
> 1. When a file is rotated, it takes a long time for in_tail to discover the new inode of the log file, and after several rotations it seems in_tail simply loses some of the rotated files (rotation is quite fast).
> 2. When starting fluentd, I found it takes a long time for in_tail to start_watchers for these files, judging from the fluentd log (I added some debug logging after setup_watcher). Combined with the rotation issue, it seems fluentd eventually loses some of the log files.

These two problems remain, and their root cause is not clear. Changing read_more=true to read_more=false may not be a good solution.

repeatedly added a commit that referenced this issue Jul 26, 2019

Projoke commented Aug 6, 2019

I also modified read_more to false a while ago. In my test environment, this change reduces the CPU usage of the fluentd process but also decreases throughput.

ashie (Member) commented May 20, 2021

#3185 (with #3364 and #3379) has been merged, which is a similar fix to #2525.
read_bytes_limit_per_second 8192 will have the same effect as #2525.
It's included in v1.13.0 which will be released at the end of this month (May 2021).
Please try it.
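
For reference, a minimal sketch of how that parameter might be used (path, pos_file, and tag are placeholders; 8192 is the value suggested above):

```
<source>
  @type tail
  path /var/log/app/*.log            # hypothetical path
  pos_file /var/log/fluentd/app.log.pos
  tag app.logs
  # Available since v1.13.0: caps how many bytes in_tail reads from a file
  # per second, so a single busy file cannot monopolize the read loop.
  read_bytes_limit_per_second 8192
  <parse>
    @type none
  </parse>
</source>
```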

TODO:

> But when read_more=false, in_tail only continues to read the rotated file for rotate_wait seconds, reading 8192 bytes (according to io.readpartial(8192, @iobuf)) each time. So if fd.pos is far from the file size, or the flush speed of the output is slow, records at the end of the file are lost when it is rotated.

It's not addressed in #3185 yet. It should probably keep reading until EOF is reached.
I'm considering adding a fix for it.

ashie added commits to ashie/fluentd that referenced this issue on May 25–26, 2021: Fix fluent#2478
ashie (Member) commented Jun 1, 2021

> I'm considering adding a fix for it.

WIP: #3390

ashie added commits to ashie/fluentd that referenced this issue on Jun 4, Jun 10, and Jul 9, 2021: Fix fluent#2478