Add fanotify support #114

Open
purpleidea opened this issue Jan 20, 2016 · 56 comments · May be fixed by #542

Comments

@purpleidea

Would there be any objections if someone sent in patches to add support for fanotify?

@nathany
Contributor

nathany commented Jan 21, 2016

That would be great.

The only thing is, how does someone using Linux choose between them? We may need something along the lines of #104 first.

@purpleidea
Author

Just wanted to get the approximate ACK, so that I can point people here if they're interested in fanotify benefits. Thanks!

@nathany
Contributor

nathany commented Jan 31, 2017

@purpleidea I think it would be best to build a fanotify Go wrapper as a separate repo first and then look at integrating it after that.

@nathany
Contributor

nathany commented Jan 31, 2017

@amir73il is working on a "super block watch" for Linux, providing "the ability to set a single (fanotify) watch on a root directory and get notified on all the legacy inotify events without the need to recursively add watches on all directories." https://lkml.org/lkml/2016/12/20/312

This could avoid the need for a user-space recursive watcher (#16) on modern Linux kernels.

@purpleidea
Author

@nathany Thanks for the info! Looking forward to @amir73il's patches!

Cheers

@amir73il

Well, the patches are already out there on my GitHub (applied to kernel 4.9), but for those of you hoping for this functionality to land upstream, I suggest being patient.

I have no doubt it is going to be some time before this feature can be merged into an official kernel. My bet is that I will have to maintain it out of tree for a while, and only after real users show genuine interest in the feature will it be seriously considered for upstream.

This is where you guys can be of help. So far I have had only one person rooting for my patches on LKML, and he has also tested them on his system.

When promoting a feature for upstream it is important to bring solid use cases that require the feature and argue that the same cannot be achieved by user library code and existing kernel functionality.

However, if you can't test my work on a distro kernel, it is going to be harder to claim that it is beneficial for your use cases. To solve this chicken-and-egg problem, I plan to provide installable kernel modules for commonly used Linux distros, so using the fanotify super block watch should be as easy as, e.g., apt-get install fsnotify-tools.

I cannot guarantee when I will get to providing this level of installation, though, so if any of you out there are not afraid of building a custom kernel, I will gladly assist you if you want to test my patches.

Cheers.

@nathany
Contributor

nathany commented Jan 31, 2017

Thanks Amir.

Perhaps another option to make the patched kernel available would be to maintain a Vagrant box built with Packer. That way we could test fanotify super block using a VirtualBox VM from any operating system.

@amir73il

Yes, that could work. And I promise to assist the person who volunteers to work on this setup.

@tiwaana

tiwaana commented Jan 31, 2017

Amir, which kernel version would you like to target?

@purpleidea
Author

@amir73il I have pinged some kernel engineers at my company to look into your patch. In the meantime, if you have a moment, could you look into and recommend an algorithm or suggest an improvement to the recursive file watching which I've implemented for mgmt? The code is available here:
https://github.com/purpleidea/mgmt/blob/master/recwatch/recwatch.go#L134

Cheers!

@amir73il

amir73il commented Feb 1, 2017

@tiwaana the question is moot. I would like to target the earliest kernel version possible, but since this is neither a bug fix nor a trivial improvement, some things have to happen first. Not all of them depend on me, and they are not necessarily in this order:

  1. Technical review of the patches (I am working on getting that)
  2. Design review of the patches (ditto)
  3. Review of the proposed kernel-user API
  4. Demonstrate a clear-cut benefit to the Linux user community
  5. Demonstrate no performance regressions for users not using the feature

@amir73il

amir73il commented Feb 1, 2017

@purpleidea thanks for the ping. If your company shows interest in the super block watch, that can be a game changer. Wrt your recursive watcher: I am new to golang and have zero knowledge of the fsnotify library, but it appears your code does not call addSubFolders() recursively from Init() beyond one level of depth, so if you never get events on the direct subfolders, you will never add watchers for level-2 subdirs. I may be missing something, though. Also, I don't see any handling of Move events for dirs, unless that is handled in the library by generating a Rename/Create event pair.

@purpleidea
Author

purpleidea commented Feb 1, 2017 via email

@isage

isage commented Apr 17, 2017

> @amir73il is working on a "super block watch" for Linux, providing "the ability to set a single (fanotify) watch on a root directory and get notified on all the legacy inotify events without the need to recursively add watches on all directories." https://lkml.org/lkml/2016/12/20/312

You do know that fanotify supports recursive watch on (any, even bind) mountpoint with FAN_MARK_MOUNT, right?

@amir73il

@isage focus on the part 'all the legacy inotify events', namely create/move/delete.
A fanotify mount watch does not provide those events.

@pabs3

pabs3 commented Sep 25, 2019

Linux fanotify added directory events (move/delete/etc) back in 2017:

https://lwn.net/Articles/717060/

@amir73il

@pabs3 I was not aware of any distro that picked up the patch you mentioned,
but actually, Linux did get fanotify directory events back in May.

@nathany, sorry I forgot to update you when the feature got merged upstream:
https://kernelnewbies.org/Linux_5.1#Improved_fanotify_for_better_file_system_monitorization

Man pages were already updated:
http://man7.org/linux/man-pages/man7/fanotify.7.html

On my GitHub, you can find a demo conversion of the inotifywait tool to use a fanotify super block watch instead of a recursive inotify watch:
https://github.com/amir73il/inotify-tools/commits/fanotify_dirent

Please note that at this time, the feature enables the user to listen on ALL directory events in the filesystem, and any sort of filtering by subtree would have to be implemented in user space.
Implementing a subtree filter in the kernel is on my roadmap, but I cannot promise anything yet.

Let me know if you are interested in using fanotify and if you have any questions.

@nathany
Contributor

nathany commented Oct 5, 2019

Other than requiring a newer Linux kernel, is there any disadvantage to using fanotify? Could we detect support for fanotify and fallback to inotify if not available?

Would two or more people be interested in building out a stand-alone fanotify module/package, either in a separate repository or a subfolder of fsnotify? Then we could look at integrating it into fsnotify after that.

@amir73il

amir73il commented Oct 6, 2019

To detect support, you just need to execute fanotify_init(FAN_REPORT_FID, 0).
If you do not get EINVAL, you can use the feature.

The disadvantage compared to recursive inotify is that there is no subtree-level filtering in the kernel.
When you set a watch with FAN_MARK_FILESYSTEM, you get all events on the filesystem and need to filter them by path prefix in userspace.
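
The userspace filtering step could be sketched in Go roughly as below. `inSubtree` is a hypothetical helper, and the sketch assumes the event has already been resolved to an absolute path. It compares paths lexically with `filepath.Rel` rather than a raw string prefix, so that `/srv/database` is not mistaken for a child of `/srv/data`; it does not resolve symlinks.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// inSubtree reports whether path is root itself or lies somewhere below it.
// The comparison is purely lexical; symlinks are not resolved.
func inSubtree(root, path string) bool {
	rel, err := filepath.Rel(filepath.Clean(root), filepath.Clean(path))
	if err != nil {
		return false // e.g. one path is absolute and the other is relative
	}
	if rel == "." {
		return true
	}
	// Anything that has to climb out of root with ".." is outside the subtree.
	return rel != ".." && !strings.HasPrefix(rel, ".."+string(filepath.Separator))
}

func main() {
	root := "/srv/data"
	for _, p := range []string{
		"/srv/data",
		"/srv/data/a/b.txt",
		"/srv/database/x",
		"/etc/passwd",
	} {
		fmt.Printf("%-20s in subtree: %v\n", p, inSubtree(root, p))
	}
}
```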

At the moment, directory modification events are NOT supported along with FAN_MARK_MOUNT due to constraints of the Linux VFS implementation.

@s3rj1k

s3rj1k commented Nov 12, 2019

@amir73il any update on this issue?

@amir73il

@s3rj1k which updates are you expecting?
There is no timeline or any guarantee that in-kernel subtree filtering will ever be available,
but that shouldn't matter - it's just an optimization.

The way I see it, the kernel code is ready and waiting for volunteers to implement the userspace recursive watcher. I even provided sample C code.

I forgot to mention in my answer to @nathany that, unlike inotify, fanotify requires CAP_SYS_ADMIN. Not sure if that is a problem for fsnotify.

@s3rj1k

s3rj1k commented Nov 12, 2019

@amir73il Hi, basic support for fanotify in fsnotify.
I actually need only the FS_MOUNT watcher, as it can be used for recursive watcher.

@s3rj1k

s3rj1k commented Nov 22, 2019

@amir73il

For the record, the remaining bits of fanotify filesystem watch have been merged to kernel v5.9:
https://kernelnewbies.org/Linux_5.9#Core_.28various.29

Man pages were updated for using modes like FAN_REPORT_DFID_NAME, which most closely resembles the inotify event information:
https://www.man7.org/linux/man-pages/man2/fanotify_init.2.html

@benmccann

> Btrfs: watching a btrfs subvolume is currently not supported, see EXDEV error in
> http://man7.org/linux/man-pages/man2/fanotify_mark.2.html
> It's a challenge to fix that, but if there is a requirement I can look into it.

@amir73il I'd like to use fanotify inside a Docker container on a btrfs filesystem. Does this mean that wouldn't work?

I'm not that familiar with how LXC/LXD is implemented on top of btrfs and so don't know if a "btrfs subvolume" would always be used when running Docker on btrfs or only in certain setups...

@benmccann

Hmm. I guess the other issue is that using this API within Docker would require granting CAP_SYS_ADMIN, which is discouraged since it's an overloaded permission which grants access to many things. It's a shame it's not able to use some more granular permission.

@amir73il

> > Btrfs: watching a btrfs subvolume is currently not supported, see EXDEV error in
> > http://man7.org/linux/man-pages/man2/fanotify_mark.2.html
> > It's a challenge to fix that, but if there is a requirement I can look into it.
>
> @amir73il I'd like to use fanotify inside a Docker container on a btrfs filesystem. Does this mean that wouldn't work?
>
> I'm not that familiar with how LXC/LXD is implemented on top of btrfs and so don't know if a "btrfs subvolume" would always be used when running Docker on btrfs or only in certain setups...

Since kernel v6.8 commit 30ad1938326b ("fanotify: allow "weak" fsid when watching a single filesystem"),
inode watches are allowed on btrfs subvolumes, so fsnotifywatch --fanotify --recursive will work.
But I am assuming that you wanted to use fsnotifywatch --filesystem?
That is currently not supported on btrfs subvolumes.

@amir73il

> Hmm. I guess the other issue is that using this API within Docker would require granting CAP_SYS_ADMIN, which is discouraged since it's an overloaded permission which grants access to many things. It's a shame it's not able to use some more granular permission.

This problem is a bit easier to solve using idmapped mounts.
I have a relatively simple kernel patch to allow setting filesystem marks inside a container without the need for global CAP_SYS_ADMIN:
https://github.com/amir73il/linux/commits/fanotify_userns/
The other side of the problem is that open_by_handle_at() requires global CAP_DAC_READ_SEARCH,
so we need to make that userns-aware as well.
None of this is very controversial, I think, but I had other priorities and no one has made assertive requests for this.

@brauner do I remember anything that was holding this back?

@benmccann

Thanks for all the details @amir73il!

> so fsnotifywatch --fanotify --recursive will work, but I am assuming that you wanted to use fsnotifywatch --filesystem?

I really just need to watch a specific directory. So before stumbling upon this thread, I would have said the former. However, your comment from a few years ago above seemed to suggest always using the latter (#114 (comment)), so I guess it depends on if your advice from back then still holds today.

@amir73il

> Thanks for all the details @amir73il!
>
> > so fsnotifywatch --fanotify --recursive will work, but I am assuming that you wanted to use fsnotifywatch --filesystem?
>
> I really just need to watch a specific directory. So before stumbling upon this thread, I would have said the former. However, your comment from a few years ago above seemed to suggest always using the latter (#114 (comment)), so I guess it depends on if your advice from back then still holds today.

@benmccann I am not sure if my comment is relevant to your use case.
If you want to watch a single directory within a container, you should be fine with just fsnotifywatch --fanotify.

You should also be fine with inotifywatch, as there is not much difference in this case.
Or did you mean that you want to watch a single directory and its subdirectories recursively?
There are some advantages to watching with --filesystem over --recursive, but mostly for very large directory trees,
and as you noticed, --filesystem does not work on btrfs subvols and currently does not work inside an unprivileged container, so that's not an option for you.

@benmccann

I want to watch the directory recursively. Sorry for not making that clear.

I want to use fanotify both because the directory may be large and I'd like to avoid inotify limits and because I'd like to detect file moves and it appears that can be done in a reasonable way with fanotify whereas it is "inherently racy" with inotify (per the man pages)

@amir73il

OK, fsnotifywatch --fanotify --recursive will have similar limits, but following renamed files' paths may be more reliable.
Please note, though, that fsnotifywatch mostly uses fanotify as a drop-in replacement for inotify;
for example, the new FAN_RENAME event, which replaces the disjoint FAN_MOVED_FROM/TO events, is not watched.
Not sure if that practically matters to your use case; you will have to try and see.

@benmccann

> fsnotifywatch --fanotify --recursive will have similar limits

Just to be sure I understood correctly, similar limits to --filesystem meaning it does not work inside an unprivileged container? Though it sounds like --recursive does work with btrfs subvols unlike --filesystem (#114 (comment))

@amir73il

> > fsnotifywatch --fanotify --recursive will have similar limits
>
> Just to be sure I understood correctly, similar limits to --filesystem meaning it does not work inside an unprivileged container? Though it sounds like --recursive does work with btrfs subvols unlike --filesystem (#114 (comment))

No, I meant that --fanotify --recursive has similar scaling limitations to --inotify --recursive.
It does not have the limitations of --filesystem for working inside containers, but you need a very recent kernel (v6.8) for --fanotify --recursive to work on a btrfs subvol.
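
The constraints stated in the last few comments can be summarized in a short Go sketch. It encodes only what is said in this thread (filesystem watches do not work on btrfs subvolumes or inside an unprivileged container; recursive fanotify inode watches on a btrfs subvolume need kernel v6.8 or later) and nothing else about fanotify availability. The `env` type and both function names are hypothetical.

```go
package main

import "fmt"

// env describes the watch target using only the properties discussed
// in this thread. Hypothetical type for illustration.
type env struct {
	kernelMajor, kernelMinor int
	btrfsSubvolume           bool
	unprivilegedContainer    bool
}

// kernelAtLeast reports whether the kernel version in e is >= major.minor.
func kernelAtLeast(e env, major, minor int) bool {
	return e.kernelMajor > major ||
		(e.kernelMajor == major && e.kernelMinor >= minor)
}

// filesystemWatchUsable covers the --filesystem mode: per the thread it does
// not work on btrfs subvolumes or inside an unprivileged container.
func filesystemWatchUsable(e env) bool {
	return !e.btrfsSubvolume && !e.unprivilegedContainer
}

// recursiveFanotifyUsable covers --fanotify --recursive: per the thread the
// only constraint mentioned is kernel v6.8+ when watching a btrfs subvolume.
func recursiveFanotifyUsable(e env) bool {
	return !e.btrfsSubvolume || kernelAtLeast(e, 6, 8)
}

func main() {
	docker := env{kernelMajor: 6, kernelMinor: 8, btrfsSubvolume: true, unprivilegedContainer: true}
	fmt.Println("filesystem watch usable:", filesystemWatchUsable(docker))
	fmt.Println("recursive fanotify usable:", recursiveFanotifyUsable(docker))
}
```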

@arp242
Member

arp242 commented May 23, 2024

Is any of this related to implementing a fanotify backend in the fsnotify library? I don't want to come off as too much of a curmudgeon, but this is not a generic fanotify discussion thread, and having tons of off-topic stuff rather detracts from the purpose of this issue.

@amir73il

> Is any of this related to implementing a fanotify backend in the fsnotify library? I don't want to come off as too much of a curmudgeon, but this is not a generic fanotify discussion thread, and having tons of off-topic stuff rather detracts from the purpose of this issue.

The answer is maybe. Your question is a bit broad for a yes or no answer.
The relation is that if the fsnotify library abstraction was created to reflect inotify semantics, then the abstraction may need to be enhanced to get the full benefits of a fanotify filesystem watch. But if the fsnotify library already knows how to do filesystem watches on macOS and Windows, then it should be easier to implement a fanotify filesystem watch backend.

@benmccann

> Is any of this related to implementing a fanotify backend in the fsnotify library?

I think it'd probably be helpful when implementing an fanotify backend to at least document some of the limitations such as whether it works on Docker and in what scenarios. It could potentially impact what fanotify APIs are chosen to build on top of as well. In any case, I can take some of this discussion elsewhere. Sorry if this discussion felt like noise

9 participants