From: Amir Goldstein <email@example.com>
To: Jan Kara <firstname.lastname@example.org>
Cc: Mo Re Ra <email@example.com>, linux-fsdevel <firstname.lastname@example.org>
Subject: Re: File monitor problem
Date: Wed, 4 Dec 2019 20:37:09 +0200
Message-ID: <CAOQ4uxjda6iQ1D0QEVB18TcrttVpd7uac++WX0xAyLvxz0x7Ew@mail.gmail.com>
In-Reply-To: <20191204173455.GJ8206@quack2.suse.cz>

On Wed, Dec 4, 2019 at 7:34 PM Jan Kara <email@example.com> wrote:
>
> Hello Mohammad,
>
> On Wed 04-12-19 17:54:48, Mo Re Ra wrote:
> > On Wed, Dec 4, 2019 at 4:23 PM Amir Goldstein <firstname.lastname@example.org> wrote:
> > > On Wed, Dec 4, 2019 at 12:03 PM Mo Re Ra <email@example.com> wrote:
> > > > I don't know if this is the correct place to express my issue or not.
> > > > I have a big problem. For my project, a Directory Monitor, I've
> > > > researched about dnotify, inotify and fanotify.
> > > > dnotify is the worst choice.
> > > > inotify is a good choice but has a problem. It does not work
> > > > recursively. When you implement this feature by inotify, you would
> > > > miss immediately events after subdir creation.
> > > > fanotify is the last choice. It has a big change since Kernel 5.1. But
> > > > It does not meet my requirement.
> > > >
> > > > I need to monitor a directory with CREATE, DELETE, MOVE_TO, MOVE_FROM
> > > > and CLOSE_WRITE events would be happened in its subdirectories.
> > > > Filename of the events happened on that (without any miss) is
> > > > mandatory for me.
> > > >
> > > > I've searched and found a contribution from @amiril73 which
> > > > unfortunately has not been merged. Here is the link:
> > > > https://github.com/amir73il/fsnotify-utils/issues/1
> > > >
> > > > I'd really appreciate it If you could resolve this issue.
> > > >
> > >
> > > Hi Mohammad,
> > >
> > > Thanks for taking an interest in fanotify.
> > >
> > > Can you please elaborate about why filename in events are mandatory
> > > for your application.
> > >
> > > Could your application use the FID in FAN_DELETE_SELF and
> > > FAN_MOVE_SELF events to act on file deletion/rename instead of filename
> > > information in FAN_DELETE/FAN_MOVED_xxx events?
> > >
> > > Will it help if you could get a FAN_CREATE_SELF event with FID information
> > > of created file?
> > >
> > > Note that it is NOT guaranteed that your application will be able to resolve
> > > those FID to file path, for example if file was already deleted and no open
> > > handles for this file exist or if file has a hardlink, you may resolve the path
> > > of that hardlink instead.
> > >
> > > Jan,
> > >
> > > I remember we discussed the optional FAN_REPORT_FILENAME  and
> > > you had some reservations, but I am not sure how strong they were.
> > > Please refresh my memory.
> > >

Hi Jan!

Ah, so it was a human engineering issue mostly that concerns you.
Let's see if I can argue against that ...

> > >  https://github.com/amir73il/linux/commit/d3e2fec74f6814cecb91148e6b9984a56132590f
> > >
> > Fanotify project had a big change since Kernel 5.1 but did not meet
> > some primary needs.
> > For example in my application, I'm watching on a specific directory to
> > sync it (through a socket connection and considering some policies)
> > with a directory in a remote system which a user working on that. Some
> > subdirectories may contain two millions of files or more. I need these
> > two directories be synced completely as soon as possible without any
> > missed notification.
> > So, I need a syscall with complete set of flags to help to watch on a
> > directory and all of its subdirectories recursively without any
> > missed notification.
> >
> > Unfortunately, in current version of Fanotify, the notification just
> > expresses a change has been occurred in a directory but does not
> > specify which file! I could not iterate over millions of file to
> > determine which file was that. That would not be helpful.
>
> The problem is there's no better reliable way.
> For example even if fanotify
> event provided a name as in the Amir's commit you reference, this name is
> not very useful. Because by the time your application gets to processing
> that fanotify event, the file under that name need not exist anymore, or

For a DELETE event, the file is not expected to exist, so the filename in
the event is always "reliable" (i.e. this name was unlinked).

> there may be a different file under that name already. That is my main
> objection to providing file names with fanotify events - they are not
> reliable but they are reliable enough that application developers will use
> them as a reliable thing which then leads to hard to debug bugs. Also
> fanotify was never designed to guarantee event ordering so it is impossible
> to reconstruct exact state of a directory in userspace just by knowing some
> past directory state and then "replaying" changes as reported by fanotify.
>
> I could imagine fanotify events would provide FID information of the target
> file e.g. on create so you could then use that with open_by_handle() to
> open the file and get reliable access to file data (provided the file still
> exists). However there still remains the problem that you don't know the
> file name and the problem that directory changes while you are looking...
>

IMO, there are two distinct issues you raise:
1. Filenames are not reliable to describe the current state of fs.
2. Application developers may use unreliable information to write bugs.

The problem I see with that argument is that for 99% of the cases,
filenames in events are going to be useful information for app developers
and allow for much more efficient implementations.
We are punishing the common case for the rare case.
The fact that developers can ignore documentation and write bugs should
not be a show stopper for providing very useful information which cannot
be obtained efficiently otherwise.

Even if we decide that we want to provide only FID and let users use
open_by_handle_at() to try and resolve that FID to a path, what actually
happens in the kernel in the slow path is a readdir on parent looking for
the inode - not a very efficient way of finding a path.

The most efficient way to deliver path information to the user IMO, which
does not involve ambiguity in the face of hardlinks and rename races, is to
provide:
parent FID; child FID; child name

The user application needs to:
- Open parent by FID
- Do name_to_handle_at(parent_fd, child_name)
- Compare the child handle with event child FID

That will cover the 99% of cases where the event does represent the current
state and be 100% reliable.
In the 1% where one of the steps fails, the application needs to fall back
to the slow path and look up the file using open_by_handle_at(), do the
readdir implementation itself or whatever.

About the human engineering factor, I am not sure what to say to that.
Your concern is valid, but all we can do is document properly and provide
correct demo code.

In the end, I think kernel fsnotify events should be handled by an "expert"
system daemon (fsnotifyd), similar to macOS fseventsd that was built on top
of kevent. This daemon would be able to handle subscribe requests by subtree,
as so many people want, and may be able to provide persistent notification
streams to some extent.

But even this envisioned daemon won't be able to provide the information
of an unlinked hardlink (for example) correctly without the filename
information in the event.

Thoughts?

Thanks,
Amir.
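
The three-step check described above (open parent, name_to_handle_at(),
compare handles) could look roughly like the sketch below. It is only an
illustration under assumptions: the function names fid_equal() and
verify_child() are made up here, the event is assumed to have already
delivered the child FID as a struct file_handle plus the child name, and
parent_fd is assumed to have been opened by the caller from the parent FID
(for example with open_by_handle_at(), which needs CAP_DAC_READ_SEARCH).
The fsid part of an FID is not compared in this sketch.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE   /* name_to_handle_at(), struct file_handle, MAX_HANDLE_SZ */
#endif
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: 1 if both handles are byte-for-byte identical, else 0. */
static int fid_equal(const struct file_handle *a, const struct file_handle *b)
{
    return a->handle_type == b->handle_type &&
           a->handle_bytes == b->handle_bytes &&
           memcmp(a->f_handle, b->f_handle, a->handle_bytes) == 0;
}

/*
 * Hypothetical fast-path check: does child_name under parent_fd still refer
 * to the object identified by event_fid?
 *
 * Returns 1 if the name still maps to the same object, 0 if the name is gone
 * or now maps to a different object, and -1 on any other error. On 0 or -1
 * the caller is expected to fall back to the slow path.
 */
static int verify_child(int parent_fd, const char *child_name,
                        const struct file_handle *event_fid)
{
    struct file_handle *cur;
    int mount_id;
    int ret;

    /* Room for the largest handle the kernel may return. */
    cur = malloc(sizeof(*cur) + MAX_HANDLE_SZ);
    if (!cur)
        return -1;
    cur->handle_bytes = MAX_HANDLE_SZ;

    /* flags = 0: a trailing symlink is not followed. */
    if (name_to_handle_at(parent_fd, child_name, cur, &mount_id, 0) == -1)
        ret = (errno == ENOENT) ? 0 : -1;
    else
        ret = fid_equal(cur, event_fid);

    free(cur);
    return ret;
}
```

A caller would invoke verify_child(parent_fd, name, child_fid) once per
event and trust the name only when it returns 1. Note that the result
describes the moment of the call; the directory can still change afterwards.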