All of lore.kernel.org
 help / color / mirror / Atom feed
* File monitor problem
@ 2019-12-04 10:02 Mo Re Ra
  2019-12-04 12:53 ` Amir Goldstein
  0 siblings, 1 reply; 25+ messages in thread
From: Mo Re Ra @ 2019-12-04 10:02 UTC (permalink / raw)
  To: linux-fsdevel

Hi,

I don`t know if this is the correct place to express my issue or not.
I have a big problem. For my project, a Directory Monitor, I`ve
researched about dnotify, inotify and fanotify.
dnotify is the worst choice.
inotify is a good choice but has a problem. It does not work
recursively. When you implement this feature by inotify, you would
miss immediately events after subdir creation.
fanotify is the last choice. It has a big change since Kernel 5.1. But
It does not meet my requirement.

I need to monitor a directory with CREATE, DELETE, MOVE_TO, MOVE_FROM
and CLOSE_WRITE events would be happened in its subdirectories.
Filename of the events happened on that (without any miss) is
mandatory for me.

I`ve searched and found a contribution from @amiril73 which
unfortunately has not been merged. Here is the link:
https://github.com/amir73il/fsnotify-utils/issues/1

I`d really appreciate it If you could resolve this issue.

Regards,
Mohammad Reza

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: File monitor problem
  2019-12-04 10:02 File monitor problem Mo Re Ra
@ 2019-12-04 12:53 ` Amir Goldstein
  2019-12-04 14:24   ` Mo Re Ra
  0 siblings, 1 reply; 25+ messages in thread
From: Amir Goldstein @ 2019-12-04 12:53 UTC (permalink / raw)
  To: Mo Re Ra, Jan Kara; +Cc: linux-fsdevel

On Wed, Dec 4, 2019 at 12:03 PM Mo Re Ra <more7.rev@gmail.com> wrote:
>
> Hi,
>
> I don`t know if this is the correct place to express my issue or not.
> I have a big problem. For my project, a Directory Monitor, I`ve
> researched about dnotify, inotify and fanotify.
> dnotify is the worst choice.
> inotify is a good choice but has a problem. It does not work
> recursively. When you implement this feature by inotify, you would
> miss immediately events after subdir creation.
> fanotify is the last choice. It has a big change since Kernel 5.1. But
> It does not meet my requirement.
>
> I need to monitor a directory with CREATE, DELETE, MOVE_TO, MOVE_FROM
> and CLOSE_WRITE events would be happened in its subdirectories.
> Filename of the events happened on that (without any miss) is
> mandatory for me.
>
> I`ve searched and found a contribution from @amiril73 which
> unfortunately has not been merged. Here is the link:
> https://github.com/amir73il/fsnotify-utils/issues/1
>
> I`d really appreciate it If you could resolve this issue.
>

Hi Mohammad,

Thanks for taking an interest in fanotify.

Can you please elaborate about why filename in events are mandatory
for your application.

Could your application use the FID in FAN_DELETE_SELF and
FAN_MOVE_SELF events to act on file deletion/rename instead of filename
information in FAN_DELETE/FAN_MOVED_xxx events?

Will it help if you could get a FAN_CREATE_SELF event with FID information
of created file?

Note that it is NOT guarantied that your application will be able to resolve
those FID to file path, for example if file was already deleted and no open
handles for this file exist or if file has a hardlink, you may resolve the path
of that hardlink instead.

Jan,

I remember we discussed the optional FAN_REPORT_FILENAME [1] and
you had some reservations, but I am not sure how strong they were.
Please refresh my memory.

Thanks,
Amir.

[1] https://github.com/amir73il/linux/commit/d3e2fec74f6814cecb91148e6b9984a56132590f

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: File monitor problem
  2019-12-04 12:53 ` Amir Goldstein
@ 2019-12-04 14:24   ` Mo Re Ra
  2019-12-04 17:34     ` Jan Kara
  0 siblings, 1 reply; 25+ messages in thread
From: Mo Re Ra @ 2019-12-04 14:24 UTC (permalink / raw)
  To: Amir Goldstein; +Cc: Jan Kara, linux-fsdevel

On Wed, Dec 4, 2019 at 4:23 PM Amir Goldstein <amir73il@gmail.com> wrote:
>
> On Wed, Dec 4, 2019 at 12:03 PM Mo Re Ra <more7.rev@gmail.com> wrote:
> >
> > Hi,
> >
> > I don`t know if this is the correct place to express my issue or not.
> > I have a big problem. For my project, a Directory Monitor, I`ve
> > researched about dnotify, inotify and fanotify.
> > dnotify is the worst choice.
> > inotify is a good choice but has a problem. It does not work
> > recursively. When you implement this feature by inotify, you would
> > miss immediately events after subdir creation.
> > fanotify is the last choice. It has a big change since Kernel 5.1. But
> > It does not meet my requirement.
> >
> > I need to monitor a directory with CREATE, DELETE, MOVE_TO, MOVE_FROM
> > and CLOSE_WRITE events would be happened in its subdirectories.
> > Filename of the events happened on that (without any miss) is
> > mandatory for me.
> >
> > I`ve searched and found a contribution from @amiril73 which
> > unfortunately has not been merged. Here is the link:
> > https://github.com/amir73il/fsnotify-utils/issues/1
> >
> > I`d really appreciate it If you could resolve this issue.
> >
>
> Hi Mohammad,
>
> Thanks for taking an interest in fanotify.
>
> Can you please elaborate about why filename in events are mandatory
> for your application.
>
> Could your application use the FID in FAN_DELETE_SELF and
> FAN_MOVE_SELF events to act on file deletion/rename instead of filename
> information in FAN_DELETE/FAN_MOVED_xxx events?
>
> Will it help if you could get a FAN_CREATE_SELF event with FID information
> of created file?
>
> Note that it is NOT guarantied that your application will be able to resolve
> those FID to file path, for example if file was already deleted and no open
> handles for this file exist or if file has a hardlink, you may resolve the path
> of that hardlink instead.
>
> Jan,
>
> I remember we discussed the optional FAN_REPORT_FILENAME [1] and
> you had some reservations, but I am not sure how strong they were.
> Please refresh my memory.
>
> Thanks,
> Amir.
>
> [1] https://github.com/amir73il/linux/commit/d3e2fec74f6814cecb91148e6b9984a56132590f



Hi Amir,
Thanks for your attention.

Fanotify project had a big change since Kernel 5.1 but did not meet
some primiry needs.
For example in my application, I`m watching on a specific directory to
sync it (through a socket connection and considering some policies)
with a directory in a remote system which a user working on that. Some
subdirectoires may contain two milions of files or more. I need these
two directoires be synced completely as soon as possible without any
missed notification.
So, I need a syscall with complete set of flags to help to watch on a
directory and all of its subdirectories recuresively without any
missed notification.

Unfortuantely, in current version of Fanotify, the notification just
expresses a change has been occured in a directory but dot not
specifiy which file! I could not iterate over millions of file to
determine which file was that. That would not be helpful.

Inevitably, xxx_SELF would not help me to meets all I need. Just to
clarify, I dont mean xxx_SELF flags are useless. I mean Fanotify is a
good project but the current version of that is not a project which
meets some basic needs.

Thanks,
Mohammad Reza.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: File monitor problem
  2019-12-04 14:24   ` Mo Re Ra
@ 2019-12-04 17:34     ` Jan Kara
  2019-12-04 18:37       ` Amir Goldstein
  2019-12-07 12:36       ` Mo Re Ra
  0 siblings, 2 replies; 25+ messages in thread
From: Jan Kara @ 2019-12-04 17:34 UTC (permalink / raw)
  To: Mo Re Ra; +Cc: Amir Goldstein, Jan Kara, linux-fsdevel

Hello Mohammad,

On Wed 04-12-19 17:54:48, Mo Re Ra wrote:
> On Wed, Dec 4, 2019 at 4:23 PM Amir Goldstein <amir73il@gmail.com> wrote:
> > On Wed, Dec 4, 2019 at 12:03 PM Mo Re Ra <more7.rev@gmail.com> wrote:
> > > I don`t know if this is the correct place to express my issue or not.
> > > I have a big problem. For my project, a Directory Monitor, I`ve
> > > researched about dnotify, inotify and fanotify.
> > > dnotify is the worst choice.
> > > inotify is a good choice but has a problem. It does not work
> > > recursively. When you implement this feature by inotify, you would
> > > miss immediately events after subdir creation.
> > > fanotify is the last choice. It has a big change since Kernel 5.1. But
> > > It does not meet my requirement.
> > >
> > > I need to monitor a directory with CREATE, DELETE, MOVE_TO, MOVE_FROM
> > > and CLOSE_WRITE events would be happened in its subdirectories.
> > > Filename of the events happened on that (without any miss) is
> > > mandatory for me.
> > >
> > > I`ve searched and found a contribution from @amiril73 which
> > > unfortunately has not been merged. Here is the link:
> > > https://github.com/amir73il/fsnotify-utils/issues/1
> > >
> > > I`d really appreciate it If you could resolve this issue.
> > >
> >
> > Hi Mohammad,
> >
> > Thanks for taking an interest in fanotify.
> >
> > Can you please elaborate about why filename in events are mandatory
> > for your application.
> >
> > Could your application use the FID in FAN_DELETE_SELF and
> > FAN_MOVE_SELF events to act on file deletion/rename instead of filename
> > information in FAN_DELETE/FAN_MOVED_xxx events?
> >
> > Will it help if you could get a FAN_CREATE_SELF event with FID information
> > of created file?
> >
> > Note that it is NOT guarantied that your application will be able to resolve
> > those FID to file path, for example if file was already deleted and no open
> > handles for this file exist or if file has a hardlink, you may resolve the path
> > of that hardlink instead.
> >
> > Jan,
> >
> > I remember we discussed the optional FAN_REPORT_FILENAME [1] and
> > you had some reservations, but I am not sure how strong they were.
> > Please refresh my memory.
> >
> > Thanks,
> > Amir.
> >
> > [1] https://github.com/amir73il/linux/commit/d3e2fec74f6814cecb91148e6b9984a56132590f
> 

> Fanotify project had a big change since Kernel 5.1 but did not meet
> some primiry needs.
> For example in my application, I`m watching on a specific directory to
> sync it (through a socket connection and considering some policies)
> with a directory in a remote system which a user working on that. Some
> subdirectoires may contain two milions of files or more. I need these
> two directoires be synced completely as soon as possible without any
> missed notification.
> So, I need a syscall with complete set of flags to help to watch on a
> directory and all of its subdirectories recuresively without any
> missed notification.
> 
> Unfortuantely, in current version of Fanotify, the notification just
> expresses a change has been occured in a directory but dot not
> specifiy which file! I could not iterate over millions of file to
> determine which file was that. That would not be helpful.

The problem is there's no better reliable way. For example even if fanotify
event provided a name as in the Amir's commit you reference, this name is
not very useful. Because by the time your application gets to processing
that fanotify event, the file under that name need not exist anymore, or
there may be a different file under that name already. That is my main
objection to providing file names with fanotify events - they are not
reliable but they are reliable enough that application developers will use
them as a reliable thing which then leads to hard to debug bugs. Also
fanotify was never designed to guarantee event ordering so it is impossible
to reconstruct exact state of a directory in userspace just by knowing some
past directory state and then "replaying" changes as reported by fanotify.

I could imagine fanotify events would provide FID information of the target
file e.g. on create so you could then use that with open_by_handle() to
open the file and get reliable access to file data (provided the file still
exists). However there still remains the problem that you don't know the
file name and the problem that directory changes while you are looking...

So changing fanotify to suit your usecase requires more than a small tweak.

For what you want, it seems e.g. btrfs send-receive functionality will
provide what you need but then that's bound to a particular filesystem.

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: File monitor problem
  2019-12-04 17:34     ` Jan Kara
@ 2019-12-04 18:37       ` Amir Goldstein
  2019-12-04 19:02         ` Matthew Wilcox
  2019-12-07 12:36       ` Mo Re Ra
  1 sibling, 1 reply; 25+ messages in thread
From: Amir Goldstein @ 2019-12-04 18:37 UTC (permalink / raw)
  To: Jan Kara; +Cc: Mo Re Ra, linux-fsdevel

On Wed, Dec 4, 2019 at 7:34 PM Jan Kara <jack@suse.cz> wrote:
>
> Hello Mohammad,
>
> On Wed 04-12-19 17:54:48, Mo Re Ra wrote:
> > On Wed, Dec 4, 2019 at 4:23 PM Amir Goldstein <amir73il@gmail.com> wrote:
> > > On Wed, Dec 4, 2019 at 12:03 PM Mo Re Ra <more7.rev@gmail.com> wrote:
> > > > I don`t know if this is the correct place to express my issue or not.
> > > > I have a big problem. For my project, a Directory Monitor, I`ve
> > > > researched about dnotify, inotify and fanotify.
> > > > dnotify is the worst choice.
> > > > inotify is a good choice but has a problem. It does not work
> > > > recursively. When you implement this feature by inotify, you would
> > > > miss immediately events after subdir creation.
> > > > fanotify is the last choice. It has a big change since Kernel 5.1. But
> > > > It does not meet my requirement.
> > > >
> > > > I need to monitor a directory with CREATE, DELETE, MOVE_TO, MOVE_FROM
> > > > and CLOSE_WRITE events would be happened in its subdirectories.
> > > > Filename of the events happened on that (without any miss) is
> > > > mandatory for me.
> > > >
> > > > I`ve searched and found a contribution from @amiril73 which
> > > > unfortunately has not been merged. Here is the link:
> > > > https://github.com/amir73il/fsnotify-utils/issues/1
> > > >
> > > > I`d really appreciate it If you could resolve this issue.
> > > >
> > >
> > > Hi Mohammad,
> > >
> > > Thanks for taking an interest in fanotify.
> > >
> > > Can you please elaborate about why filename in events are mandatory
> > > for your application.
> > >
> > > Could your application use the FID in FAN_DELETE_SELF and
> > > FAN_MOVE_SELF events to act on file deletion/rename instead of filename
> > > information in FAN_DELETE/FAN_MOVED_xxx events?
> > >
> > > Will it help if you could get a FAN_CREATE_SELF event with FID information
> > > of created file?
> > >
> > > Note that it is NOT guarantied that your application will be able to resolve
> > > those FID to file path, for example if file was already deleted and no open
> > > handles for this file exist or if file has a hardlink, you may resolve the path
> > > of that hardlink instead.
> > >
> > > Jan,
> > >
> > > I remember we discussed the optional FAN_REPORT_FILENAME [1] and
> > > you had some reservations, but I am not sure how strong they were.
> > > Please refresh my memory.
> > >

Hi Jan!

Ah, so it was a human engineering issue mostly that concerns you.
Let's see if I can argue against that ...

> > > [1] https://github.com/amir73il/linux/commit/d3e2fec74f6814cecb91148e6b9984a56132590f
> >
>
> > Fanotify project had a big change since Kernel 5.1 but did not meet
> > some primiry needs.
> > For example in my application, I`m watching on a specific directory to
> > sync it (through a socket connection and considering some policies)
> > with a directory in a remote system which a user working on that. Some
> > subdirectoires may contain two milions of files or more. I need these
> > two directoires be synced completely as soon as possible without any
> > missed notification.
> > So, I need a syscall with complete set of flags to help to watch on a
> > directory and all of its subdirectories recuresively without any
> > missed notification.
> >
> > Unfortuantely, in current version of Fanotify, the notification just
> > expresses a change has been occured in a directory but dot not
> > specifiy which file! I could not iterate over millions of file to
> > determine which file was that. That would not be helpful.
>
> The problem is there's no better reliable way. For example even if fanotify
> event provided a name as in the Amir's commit you reference, this name is
> not very useful. Because by the time your application gets to processing
> that fanotify event, the file under that name need not exist anymore, or

For DELETE event, file is not expected to exist, the filename in event is
always "reliable" (i.e. this name was unlinked).

> there may be a different file under that name already. That is my main
> objection to providing file names with fanotify events - they are not
> reliable but they are reliable enough that application developers will use
> them as a reliable thing which then leads to hard to debug bugs. Also
> fanotify was never designed to guarantee event ordering so it is impossible
> to reconstruct exact state of a directory in userspace just by knowing some
> past directory state and then "replaying" changes as reported by fanotify.
>
> I could imagine fanotify events would provide FID information of the target
> file e.g. on create so you could then use that with open_by_handle() to
> open the file and get reliable access to file data (provided the file still
> exists). However there still remains the problem that you don't know the
> file name and the problem that directory changes while you are looking...
>

IMO, there are two distinct issues you raise:
1. Filenames are not reliable to describe the current state of fs.
2. Application developers may use unreliable information to write bugs.

The problem I see with that argument is that for 99% of the cases,
filename is events are going to be useful information for app developers
and allow for much more efficient implementations.
We are punishing the common case for the rare case.

The fact that developers can ignore documentation and write bugs
should not be a show stopper for proving very useful information which
cannot be obtained efficiently otherwise.

Even if we decide that we want to provide only FID and let users use
open_by_handle_at() to try and resolve that FID to a path, what actually
happens in the kernel in the slow path is a readdir on parent looking for
the inode - not very efficient way of finding a path.

The most efficient way to deliver path information to user IMO, which
does not involve ambiguity in face of hardlinks and rename races is to
provide: parent FID; child FID; child name

The user application needs to:
- Open parent by FID
- Do name_to_handle_at(parent_fd, child_name)
- Compare the child handle with event child FID

That will cover the 99% of cases where event does represent the current
state and be 100% reliable.

In the 1% where one of the steps fail, application needs to fall back to
slow path and lookup the file using open_by_handle_at(), do the
readdir implementation itself or whatever.

About the human engineering factor, I am not sure what to say to that.
Your concern is valid, but all we can do is document properly and provide
correct demo code.

In the end, I think kernel fsnotify events should be handled by an "expert"
system daemon (fsnotifyd), similar to MacOS fseventsd that was build
on top of kevent.  This deamon would be able to handle subscribe
request by subtree, as so many people want, and may be able to provide
persistent notification streams to some extent.

But even this envisioned daemon won't be able to provide the information
of an unlinked hardlink (for example) correctly without the filename
information in the event.

Thoughts?

Thanks,
Amir.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: File monitor problem
  2019-12-04 18:37       ` Amir Goldstein
@ 2019-12-04 19:02         ` Matthew Wilcox
  2019-12-04 20:27           ` Amir Goldstein
  0 siblings, 1 reply; 25+ messages in thread
From: Matthew Wilcox @ 2019-12-04 19:02 UTC (permalink / raw)
  To: Amir Goldstein; +Cc: Jan Kara, Mo Re Ra, linux-fsdevel

On Wed, Dec 04, 2019 at 08:37:09PM +0200, Amir Goldstein wrote:
> On Wed, Dec 4, 2019 at 7:34 PM Jan Kara <jack@suse.cz> wrote:
> > The problem is there's no better reliable way. For example even if fanotify
> > event provided a name as in the Amir's commit you reference, this name is
> > not very useful. Because by the time your application gets to processing
> > that fanotify event, the file under that name need not exist anymore, or
> 
> For DELETE event, file is not expected to exist, the filename in event is
> always "reliable" (i.e. this name was unlinked).

Jan already pointed out that events may be reordered.  So a CREATE event
and DELETE event may arrive out of order for the same file.  This will
confuse any agent.

> > there may be a different file under that name already. That is my main
> > objection to providing file names with fanotify events - they are not
> > reliable but they are reliable enough that application developers will use
> > them as a reliable thing which then leads to hard to debug bugs. Also
> > fanotify was never designed to guarantee event ordering so it is impossible
> > to reconstruct exact state of a directory in userspace just by knowing some
> > past directory state and then "replaying" changes as reported by fanotify.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: File monitor problem
  2019-12-04 19:02         ` Matthew Wilcox
@ 2019-12-04 20:27           ` Amir Goldstein
  2019-12-11 10:06             ` Jan Kara
  0 siblings, 1 reply; 25+ messages in thread
From: Amir Goldstein @ 2019-12-04 20:27 UTC (permalink / raw)
  To: Matthew Wilcox; +Cc: Jan Kara, Mo Re Ra, linux-fsdevel

On Wed, Dec 4, 2019 at 9:02 PM Matthew Wilcox <willy@infradead.org> wrote:
>
> On Wed, Dec 04, 2019 at 08:37:09PM +0200, Amir Goldstein wrote:
> > On Wed, Dec 4, 2019 at 7:34 PM Jan Kara <jack@suse.cz> wrote:
> > > The problem is there's no better reliable way. For example even if fanotify
> > > event provided a name as in the Amir's commit you reference, this name is
> > > not very useful. Because by the time your application gets to processing
> > > that fanotify event, the file under that name need not exist anymore, or
> >
> > For DELETE event, file is not expected to exist, the filename in event is
> > always "reliable" (i.e. this name was unlinked).
>
> Jan already pointed out that events may be reordered.  So a CREATE event
> and DELETE event may arrive out of order for the same file.  This will
> confuse any agent.
>

Right. Re-ordering of events is an issue that needs to be addressed.
But the truth is that events for the same file are not re-ordered, they
are merged (though a DELETE_SELF and CREATE could be re-ordered
because they are not on the same object).

The way to frame this correctly IMO is that fsnotify events let application
know that "something has changed", without any ordering guaranty
beyond "sometime before the event was read".

So far, that "something" can be a file (by fd), an inode (by fid),
more specifically a directory inode (by fid) where in an entry has
changed.

Adding filename info extends that concept to "something has changed
in the namespace at" (by parent fid+name).
All it means is that application should pay attention to that part of
the namespace and perform a lookup to find out what has changed.

Maybe the way to mitigate wrong assumptions about ordering and
existence of the filename in the namespace is to omit the event type
for "filename events", for example: { FAN_CHANGE, pfid, name }.

I know that defining a good API is challenging, but I have no doubt
that the need is real.
If anyone doubts that, please present an alternative that users can
actually use.

Ignoring users demand and leaving Linux filesystem notification API
functionality decades behind Windows and MacOS is not acceptable IMO.
We have made a big step forward with fanotify dirent events, but I think
we will need to make a few more adjustments to the API before we get to
what most users need.

Thanks,
Amir.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: File monitor problem
  2019-12-04 17:34     ` Jan Kara
  2019-12-04 18:37       ` Amir Goldstein
@ 2019-12-07 12:36       ` Mo Re Ra
  2019-12-10 16:55         ` Jan Kara
  1 sibling, 1 reply; 25+ messages in thread
From: Mo Re Ra @ 2019-12-07 12:36 UTC (permalink / raw)
  To: Jan Kara; +Cc: Amir Goldstein, linux-fsdevel

Hello Jan,

On Wed, Dec 4, 2019 at 9:04 PM Jan Kara <jack@suse.cz> wrote:
>
> Hello Mohammad,
>
> On Wed 04-12-19 17:54:48, Mo Re Ra wrote:
> > On Wed, Dec 4, 2019 at 4:23 PM Amir Goldstein <amir73il@gmail.com> wrote:
> > > On Wed, Dec 4, 2019 at 12:03 PM Mo Re Ra <more7.rev@gmail.com> wrote:
> > > > I don`t know if this is the correct place to express my issue or not.
> > > > I have a big problem. For my project, a Directory Monitor, I`ve
> > > > researched about dnotify, inotify and fanotify.
> > > > dnotify is the worst choice.
> > > > inotify is a good choice but has a problem. It does not work
> > > > recursively. When you implement this feature by inotify, you would
> > > > miss immediately events after subdir creation.
> > > > fanotify is the last choice. It has a big change since Kernel 5.1. But
> > > > It does not meet my requirement.
> > > >
> > > > I need to monitor a directory with CREATE, DELETE, MOVE_TO, MOVE_FROM
> > > > and CLOSE_WRITE events would be happened in its subdirectories.
> > > > Filename of the events happened on that (without any miss) is
> > > > mandatory for me.
> > > >
> > > > I`ve searched and found a contribution from @amiril73 which
> > > > unfortunately has not been merged. Here is the link:
> > > > https://github.com/amir73il/fsnotify-utils/issues/1
> > > >
> > > > I`d really appreciate it If you could resolve this issue.
> > > >
> > >
> > > Hi Mohammad,
> > >
> > > Thanks for taking an interest in fanotify.
> > >
> > > Can you please elaborate about why filename in events are mandatory
> > > for your application.
> > >
> > > Could your application use the FID in FAN_DELETE_SELF and
> > > FAN_MOVE_SELF events to act on file deletion/rename instead of filename
> > > information in FAN_DELETE/FAN_MOVED_xxx events?
> > >
> > > Will it help if you could get a FAN_CREATE_SELF event with FID information
> > > of created file?
> > >
> > > Note that it is NOT guarantied that your application will be able to resolve
> > > those FID to file path, for example if file was already deleted and no open
> > > handles for this file exist or if file has a hardlink, you may resolve the path
> > > of that hardlink instead.
> > >
> > > Jan,
> > >
> > > I remember we discussed the optional FAN_REPORT_FILENAME [1] and
> > > you had some reservations, but I am not sure how strong they were.
> > > Please refresh my memory.
> > >
> > > Thanks,
> > > Amir.
> > >
> > > [1] https://github.com/amir73il/linux/commit/d3e2fec74f6814cecb91148e6b9984a56132590f
> >
>
> > Fanotify project had a big change since Kernel 5.1 but did not meet
> > some primiry needs.
> > For example in my application, I`m watching on a specific directory to
> > sync it (through a socket connection and considering some policies)
> > with a directory in a remote system which a user working on that. Some
> > subdirectoires may contain two milions of files or more. I need these
> > two directoires be synced completely as soon as possible without any
> > missed notification.
> > So, I need a syscall with complete set of flags to help to watch on a
> > directory and all of its subdirectories recuresively without any
> > missed notification.
> >
> > Unfortuantely, in current version of Fanotify, the notification just
> > expresses a change has been occured in a directory but dot not
> > specifiy which file! I could not iterate over millions of file to
> > determine which file was that. That would not be helpful.
>
> The problem is there's no better reliable way. For example even if fanotify
> event provided a name as in the Amir's commit you reference, this name is
> not very useful. Because by the time your application gets to processing
> that fanotify event, the file under that name need not exist anymore, or
> there may be a different file under that name already. That is my main
> objection to providing file names with fanotify events - they are not
> reliable but they are reliable enough that application developers will use
> them as a reliable thing which then leads to hard to debug bugs. Also
> fanotify was never designed to guarantee event ordering so it is impossible
> to reconstruct exact state of a directory in userspace just by knowing some
> past directory state and then "replaying" changes as reported by fanotify.
>
> I could imagine fanotify events would provide FID information of the target
> file e.g. on create so you could then use that with open_by_handle() to
> open the file and get reliable access to file data (provided the file still
> exists). However there still remains the problem that you don't know the
> file name and the problem that directory changes while you are looking...
>
> So changing fanotify to suit your usecase requires more than a small tweak.
>
> For what you want, it seems e.g. btrfs send-receive functionality will
> provide what you need but then that's bound to a particular filesystem.
>
>                                                                 Honza
> --
> Jan Kara <jack@suse.com>
> SUSE Labs, CR

I understand your concerns about reliablity. But I think functionality
and reliablity are two different things in this case. We`d better
entrust the reliability to the user.
Consider a user just want monitor all of filesystem changes but does
not intend to do anything according the received notifications.
I think we do not make decision for users by restricting them and
ignoring their necessary demands. We shuold introduce the best
available tools with all of concerns about them (which are
documented). So, we would put the user in charge of organizing his
projects. The user may care or not according his demands.
To sum up, I think Fanotify is the best available tools. But we are
really looking forward to addressing the mentioned concerns or
introducing a better alternative.

Thanks,
Mohammad Reza.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: File monitor problem
  2019-12-07 12:36       ` Mo Re Ra
@ 2019-12-10 16:55         ` Jan Kara
  2019-12-10 20:49           ` Amir Goldstein
  0 siblings, 1 reply; 25+ messages in thread
From: Jan Kara @ 2019-12-10 16:55 UTC (permalink / raw)
  To: Mo Re Ra; +Cc: Jan Kara, Amir Goldstein, linux-fsdevel

Hello Mohammad!

On Sat 07-12-19 16:06:41, Mo Re Ra wrote:
> On Wed, Dec 4, 2019 at 9:04 PM Jan Kara <jack@suse.cz> wrote:
> >
> > Hello Mohammad,
> >
> > On Wed 04-12-19 17:54:48, Mo Re Ra wrote:
> > > On Wed, Dec 4, 2019 at 4:23 PM Amir Goldstein <amir73il@gmail.com> wrote:
> > > > On Wed, Dec 4, 2019 at 12:03 PM Mo Re Ra <more7.rev@gmail.com> wrote:
> > > > > I don`t know if this is the correct place to express my issue or not.
> > > > > I have a big problem. For my project, a Directory Monitor, I`ve
> > > > > researched about dnotify, inotify and fanotify.
> > > > > dnotify is the worst choice.
> > > > > inotify is a good choice but has a problem. It does not work
> > > > > recursively. When you implement this feature by inotify, you would
> > > > > miss immediately events after subdir creation.
> > > > > fanotify is the last choice. It has a big change since Kernel 5.1. But
> > > > > It does not meet my requirement.
> > > > >
> > > > > I need to monitor a directory with CREATE, DELETE, MOVE_TO, MOVE_FROM
> > > > > and CLOSE_WRITE events would be happened in its subdirectories.
> > > > > Filename of the events happened on that (without any miss) is
> > > > > mandatory for me.
> > > > >
> > > > > I`ve searched and found a contribution from @amiril73 which
> > > > > unfortunately has not been merged. Here is the link:
> > > > > https://github.com/amir73il/fsnotify-utils/issues/1
> > > > >
> > > > > I`d really appreciate it If you could resolve this issue.
> > > > >
> > > >
> > > > Hi Mohammad,
> > > >
> > > > Thanks for taking an interest in fanotify.
> > > >
> > > > Can you please elaborate about why filename in events are mandatory
> > > > for your application.
> > > >
> > > > Could your application use the FID in FAN_DELETE_SELF and
> > > > FAN_MOVE_SELF events to act on file deletion/rename instead of filename
> > > > information in FAN_DELETE/FAN_MOVED_xxx events?
> > > >
> > > > Will it help if you could get a FAN_CREATE_SELF event with FID information
> > > > of created file?
> > > >
> > > > Note that it is NOT guarantied that your application will be able to resolve
> > > > those FID to file path, for example if file was already deleted and no open
> > > > handles for this file exist or if file has a hardlink, you may resolve the path
> > > > of that hardlink instead.
> > > >
> > > > Jan,
> > > >
> > > > I remember we discussed the optional FAN_REPORT_FILENAME [1] and
> > > > you had some reservations, but I am not sure how strong they were.
> > > > Please refresh my memory.
> > > >
> > > > Thanks,
> > > > Amir.
> > > >
> > > > [1] https://github.com/amir73il/linux/commit/d3e2fec74f6814cecb91148e6b9984a56132590f
> > >
> >
> > > Fanotify project had a big change since Kernel 5.1 but did not meet
> > > some primiry needs.
> > > For example in my application, I`m watching on a specific directory to
> > > sync it (through a socket connection and considering some policies)
> > > with a directory in a remote system which a user working on that. Some
> > > subdirectoires may contain two milions of files or more. I need these
> > > two directoires be synced completely as soon as possible without any
> > > missed notification.
> > > So, I need a syscall with complete set of flags to help to watch on a
> > > directory and all of its subdirectories recuresively without any
> > > missed notification.
> > >
> > > Unfortuantely, in current version of Fanotify, the notification just
> > > expresses a change has been occured in a directory but dot not
> > > specifiy which file! I could not iterate over millions of file to
> > > determine which file was that. That would not be helpful.
> >
> > The problem is there's no better reliable way. For example even if fanotify
> > event provided a name as in the Amir's commit you reference, this name is
> > not very useful. Because by the time your application gets to processing
> > that fanotify event, the file under that name need not exist anymore, or
> > there may be a different file under that name already. That is my main
> > objection to providing file names with fanotify events - they are not
> > reliable but they are reliable enough that application developers will use
> > them as a reliable thing which then leads to hard to debug bugs. Also
> > fanotify was never designed to guarantee event ordering so it is impossible
> > to reconstruct exact state of a directory in userspace just by knowing some
> > past directory state and then "replaying" changes as reported by fanotify.
> >
> > I could imagine fanotify events would provide FID information of the target
> > file e.g. on create so you could then use that with open_by_handle() to
> > open the file and get reliable access to file data (provided the file still
> > exists). However there still remains the problem that you don't know the
> > file name and the problem that directory changes while you are looking...
> >
> > So changing fanotify to suit your usecase requires more than a small tweak.
> >
> > For what you want, it seems e.g. btrfs send-receive functionality will
> > provide what you need but then that's bound to a particular filesystem.
> >
> >                                                                 Honza
> > --
> > Jan Kara <jack@suse.com>
> > SUSE Labs, CR
> 
> I understand your concerns about reliablity. But I think functionality
> and reliablity are two different things in this case. We`d better
> entrust the reliability to the user.
> Consider a user just want monitor all of filesystem changes but does
> not intend to do anything according the received notifications.
> I think we do not make decision for users by restricting them and
> ignoring their necessary demands. We shuold introduce the best
> available tools with all of concerns about them (which are
> documented). So, we would put the user in charge of organizing his
> projects. The user may care or not according his demands.

I disgree. This is not how API design works in the Linux kernel. First, you
have to have a good and sound use case for the functionality (and I
understand and acknowledge your need to monitor a large directory and
reliably synchronize changes to another place) and then we try to implement
API that would fulfil the needs of the usecase. For you, extending fanotify
with file names will *not* fulfil the needs of your use case. The mechanism
will be as racy as with inotify, just with somewhat smaller set of races.
I'm not willing to introduce another file change notification API to the
kernel that works with 99% reliability. We've been through that with
inotify and it just adds maintenance burden and shifts the problem couple
years down the road before people start hitting the races with the new API.

And we don't add new APIs to the kernel just because someone could find
some use for them. Because APIs in the kernel are really costly in terms of
maintenance. Once you introduce userspace facing API, you have to maintain
it basically forever which constrains future development and adds needs for
compatibility layers etc.

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: File monitor problem
  2019-12-10 16:55         ` Jan Kara
@ 2019-12-10 20:49           ` Amir Goldstein
  2019-12-11 22:06             ` Wez Furlong
  0 siblings, 1 reply; 25+ messages in thread
From: Amir Goldstein @ 2019-12-10 20:49 UTC (permalink / raw)
  To: Jan Kara; +Cc: Mo Re Ra, linux-fsdevel, Wez Furlong

[cc: Watchman maintainer]

> > > I could imagine fanotify events would provide FID information of the target
> > > file e.g. on create so you could then use that with open_by_handle() to
> > > open the file and get reliable access to file data (provided the file still
> > > exists). However there still remains the problem that you don't know the
> > > file name and the problem that directory changes while you are looking...
> > >
> > > So changing fanotify to suit your usecase requires more than a small tweak.
> > >
> > > For what you want, it seems e.g. btrfs send-receive functionality will
> > > provide what you need but then that's bound to a particular filesystem.
> > >
> > >                                                                 Honza
> > > --
> > > Jan Kara <jack@suse.com>
> > > SUSE Labs, CR
> >
> > I understand your concerns about reliablity. But I think functionality
> > and reliablity are two different things in this case. We`d better
> > entrust the reliability to the user.
> > Consider a user just want monitor all of filesystem changes but does
> > not intend to do anything according the received notifications.
> > I think we do not make decision for users by restricting them and
> > ignoring their necessary demands. We shuold introduce the best
> > available tools with all of concerns about them (which are
> > documented). So, we would put the user in charge of organizing his
> > projects. The user may care or not according his demands.
>
> I disgree. This is not how API design works in the Linux kernel. First, you
> have to have a good and sound use case for the functionality (and I
> understand and acknowledge your need to monitor a large directory and
> reliably synchronize changes to another place) and then we try to implement
> API that would fulfil the needs of the usecase.

For the record, although I am the author of filename patches and represent
users that use them, I myself am not fully convinced that we need to
extend the API much further. For the past few months, I have been trying
to convert our in-house filesystem monitor to work without filename in events.
I haven't yet been able to prove (for performance of all interesting workloads)
that more information in events is not needed, but haven't been able to prove
that it is not needed either. CREATE_SELF events are needed for functionality.

I have also been looking at other filesystem monitor implementations to
see if they could be converted to fanotify without any extra information
in events. I mostly focused on Watchman, which looks like the most
promising open source filesystem monitor implementation around.
It was hard for me to figure out myself if Watchman can benefit from
new fanotify API and what it is missing from the new API.

I have already implemented unprivileged fanotify (this was posted
first even before FAN_REPORT_FID), but looking for a way to demo
its usefulness - how it can avoid races compared to inotify.

One way I am considering to tackle the missing information is to
provide  unprivileged access to open_by_handle_at(2) -
Currently, this syscall requires CAP_DAC_READ_SEARCH, because
it can open files without having search access to ancestor directories.

My idea is that if process has no CAP_DAC_READ_SEARCH, then
mountfd argument will be assumed to be the direct parent of the file.
Search access will be verified on mountfd and then a restrictive
acceptable() callback will make sure that only dentry whose parent
is mountfd is decoded. Alternatively a new syscall could be used.

A special variant of exportfs_decode_fh() would be used that take
the parent as argument instead of getting parent from
s_export_op->fh_to_parent() or s_export_op->get_parent().

The end result would be that events could report parent fid and child fid.
If monitor application is watching a single directory or has a map of watched
directories (like inotifywatch does), then child could be found by handle -
as long as the file is still inside the watched directory. If there is a single
hardlink in the directory, the child name would be non ambiguous.

Child fid with FAN_DELETE/FAN_MOVE_FROM would only be useful
if monitor application keeps a map of the files in every watched directory
(I believe Watchman does anyway).

Jan, does that sound like something that would address your concerns?

Does that sound like and API that would provide an added value to users?

Am I missing anything?

Thanks,
Amir.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: File monitor problem
  2019-12-04 20:27           ` Amir Goldstein
@ 2019-12-11 10:06             ` Jan Kara
  2019-12-11 13:58               ` Amir Goldstein
  0 siblings, 1 reply; 25+ messages in thread
From: Jan Kara @ 2019-12-11 10:06 UTC (permalink / raw)
  To: Amir Goldstein; +Cc: Matthew Wilcox, Jan Kara, Mo Re Ra, linux-fsdevel

On Wed 04-12-19 22:27:31, Amir Goldstein wrote:
> On Wed, Dec 4, 2019 at 9:02 PM Matthew Wilcox <willy@infradead.org> wrote:
> >
> > On Wed, Dec 04, 2019 at 08:37:09PM +0200, Amir Goldstein wrote:
> > > On Wed, Dec 4, 2019 at 7:34 PM Jan Kara <jack@suse.cz> wrote:
> > > > The problem is there's no better reliable way. For example even if fanotify
> > > > event provided a name as in the Amir's commit you reference, this name is
> > > > not very useful. Because by the time your application gets to processing
> > > > that fanotify event, the file under that name need not exist anymore, or
> > >
> > > For DELETE event, file is not expected to exist, the filename in event is
> > > always "reliable" (i.e. this name was unlinked).
> >
> > Jan already pointed out that events may be reordered.  So a CREATE event
> > and DELETE event may arrive out of order for the same file.  This will
> > confuse any agent.
> >
> 
> Right. Re-ordering of events is an issue that needs to be addressed.
> But the truth is that events for the same file are not re-ordered, they
> are merged (though a DELETE_SELF and CREATE could be re-ordered
> because they are not on the same object).

Well, once you start mixing cross-directory rename into the game, things
get even more interesting. Currently, rename events can get reordered
basically arbitrarily against any other operation on that directory because
fsnotify_move() calls happen outside of any locks.

> The way to frame this correctly IMO is that fsnotify events let application
> know that "something has changed", without any ordering guaranty
> beyond "sometime before the event was read".
> 
> So far, that "something" can be a file (by fd), an inode (by fid),
> more specifically a directory inode (by fid) where in an entry has
> changed.
> 
> Adding filename info extends that concept to "something has changed
> in the namespace at" (by parent fid+name).
> All it means is that application should pay attention to that part of
> the namespace and perform a lookup to find out what has changed.
>
> Maybe the way to mitigate wrong assumptions about ordering and
> existence of the filename in the namespace is to omit the event type
> for "filename events", for example: { FAN_CHANGE, pfid, name }.

So this event would effectively mean: In directory pfid, some filename
event has happened with name "name" - i.e. "name" was created (could mean
also mkdir), deleted, moved. Am I right? And the application would then
open_by_handle(2) + open_at(2) + fstat(2) the object pointed to by
(pfid, name) pair and copy whatever it finds to the other end (or delete on
the other end in case of ENOENT)?

After some thought, yes, I think this is difficult to misuse (or infer some
false guarantees out of it). As far as I was thinking it also seems good
enough to implement more efficient syncing of directories. Mohammad, would
this kind of event be enough for your needs? Frankly, I'd like to see a
sample program (say dir-tree-sync) that uses this event before merging the
kernel change so that we can verify that indeed this event is usable for
practical purposes in a race-free way...

> I know that defining a good API is challenging, but I have no doubt
> that the need is real.
> If anyone doubts that, please present an alternative that users can
> actually use.

I agree the need is real when synchronizing large directories. For smaller
directories, readdir + stat is fine but with growing directories this gets
pretty inefficient. btrfs send+receive is actually a nice solution to this
(and is reliable because it is effectively just a comparison of two COW
snapshots - i.e., two static things) but then it has also downsides (in
particular that it is fs specific).

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: File monitor problem
  2019-12-11 10:06             ` Jan Kara
@ 2019-12-11 13:58               ` Amir Goldstein
  2019-12-16 15:00                 ` Amir Goldstein
  0 siblings, 1 reply; 25+ messages in thread
From: Amir Goldstein @ 2019-12-11 13:58 UTC (permalink / raw)
  To: Jan Kara; +Cc: Matthew Wilcox, Mo Re Ra, linux-fsdevel, Wez Furlong

On Wed, Dec 11, 2019 at 12:06 PM Jan Kara <jack@suse.cz> wrote:
>
> On Wed 04-12-19 22:27:31, Amir Goldstein wrote:
[...]
> > The way to frame this correctly IMO is that fsnotify events let application
> > know that "something has changed", without any ordering guaranty
> > beyond "sometime before the event was read".
> >
> > So far, that "something" can be a file (by fd), an inode (by fid),
> > more specifically a directory inode (by fid) where in an entry has
> > changed.
> >
> > Adding filename info extends that concept to "something has changed
> > in the namespace at" (by parent fid+name).
> > All it means is that application should pay attention to that part of
> > the namespace and perform a lookup to find out what has changed.
> >
> > Maybe the way to mitigate wrong assumptions about ordering and
> > existence of the filename in the namespace is to omit the event type
> > for "filename events", for example: { FAN_CHANGE, pfid, name }.
>
> So this event would effectively mean: In directory pfid, some filename
> event has happened with name "name" - i.e. "name" was created (could mean
> also mkdir), deleted, moved. Am I right?

Exactly.

> And the application would then
> open_by_handle(2) + open_at(2) + fstat(2) the object pointed to by

open_by_handle(2) + fstatat(2) to be exact.

> (pfid, name) pair and copy whatever it finds to the other end (or delete on
> the other end in case of ENOENT)?

Basically, yes.
Although a modern sync tool may also keep some local map of
remote name -> local fid, to detect a local rename and try to perform a
remote rename.

>
> After some thought, yes, I think this is difficult to misuse (or infer some
> false guarantees out of it). As far as I was thinking it also seems good
> enough to implement more efficient syncing of directories.

Great, so I will work on the patches.

> Mohammad, would
> this kind of event be enough for your needs? Frankly, I'd like to see a
> sample program (say dir-tree-sync) that uses this event before merging the
> kernel change so that we can verify that indeed this event is usable for
> practical purposes in a race-free way...
>

I will prepare demo code for the new API based on inotifywatch.
Mohammad, if you like you could use the demo code to present a sync tool.
I am hoping to be able to integrate the new API with Watchman as a demo.

Thanks,
Amir.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: File monitor problem
  2019-12-10 20:49           ` Amir Goldstein
@ 2019-12-11 22:06             ` Wez Furlong
  2019-12-12  5:56               ` Amir Goldstein
  0 siblings, 1 reply; 25+ messages in thread
From: Wez Furlong @ 2019-12-11 22:06 UTC (permalink / raw)
  To: Amir Goldstein, Jan Kara; +Cc: Mo Re Ra, linux-fsdevel

On 12/10/19 12:49, Amir Goldstein wrote:

> [cc: Watchman maintainer]

Hi, I'm the Watchman creator and maintainer, and I also work on a FUSE 
based virtual filesystem called EdenFS that works with the source 
control systems that we use at Facebook.

I don't have much context on fanotify yet, but I do have a lot of 
practical experience with Watchman on various operating systems with 
very large recursive directory trees.

Amir asked me to participate in this discussion, and I think it's 
probably helpful to give a little bit of context on how we deal with 
some of the different watcher interfaces, and also how we see the 
consumers of Watchman making use of this sort of data.  There are tens 
of watchman consuming applications in common use inside FB, and a long 
tail of ad-hoc consumers that are not on my radar.

I don't want to flood you with data that may not feel relevant so I'm 
going to try to summarize some key points; I'd be happy to elaborate if 
you'd like more context!  These are written out as numbered statements 
to make it easier to reference in further discussion, and are not 
intended to be taken as any kind of prescriptive manifesto!

1. Humans think in terms of filenames.  Applications largely only care 
about filenames.  It's rare (it came up as a hypothetical for only one 
integrating application at FB in the past several years) that they care 
about optimizing for the various rename cases so long as they get 
notified that the old name is no longer visible in the filesystem and 
that a new name is now visible elsewhere in the portion of the 
filesystem that they are watching.

2. Application authors don't want to deal with the complexities of file 
watching, they just want to reliably know if/when a named file has 
changed.  Rename cookies and overflow events are too difficult for most 
applications to deal with at all/correctly.

3. Overflow events are painful to deal with.  In Watchman we deal with 
inotify overflow by re-crawling and examining the directory structure to 
re-synchronize with the filesystem state.  For very large trees this can 
take a very long time.

4. Partially related to 3., restarting the watchman server is an 
expensive event because we have to re-crawl everything to re-create the 
directory watches with inotify.  If the system provided a recursive 
watch function and some kind of a change journal that told watchman a 
set of N directories to crawl (where N < the M overall number of 
directories) and we had a stable identifier for files, then we could 
persist state across restarts and cheaply re-synchronize.

5. Is also related to 3. and 4.  We use btrfs subvolumes in our CI to 
snapshot large repos and make them available to jobs running in 
different containers potentially on different hosts.  If the journal 
mechanism from 4. were available in this situation it would make it 
super cheap to bring up watchman in those environments.

6. A downside to recursive watches on macOS is that fseventsd has very 
limited ability to add exceptions.  A common pattern at FB is that the 
buck build system maintains a build artifacts directory called 
`buck-out` in the repo.  On Linux we can ignore change notifications for 
this directory with zero cost by simply not registering it with 
inotify.  On macOS, the kernel interface allows for a maximum of 8 
exclusions.  The rest of the changes are delivered to fseventsd which 
stats and records everything in a sqlite database.  This is a 
performance hotspot for us because the number of excluded directories in 
a repo exceeds 8, and the uninteresting bulky build artifact writes then 
need to transit fseventsd and into watchman before we can decide to 
ignore them.

7. Windows has a journal mechanism that could potentially be used as 
suggested in 4. above, but it requires privileged access.  I happen to 
know from someone at MS that worked on a similar system that there is 
also a way to access a subset of this data that doesn't require 
privileged access, but that isn't documented.  I mention this because 
elsewhere in this thread is a discussion about privileged access to 
similar sounding information.

8. Related to 6. and 7., if there is a privileged system daemon to act 
as the interface between userspace<->kernel, care needs to be taken to 
avoid the sort of performance hotspot we see on macOS with 6. above.


OK, hopefully that doesn't feel too off the mark!  I don't think 
everything above needs to be handled directly at the kernel interface.  
Some of these details could be handled on the userspace side, either by 
a daemon (eg: watchman) or a suitably well designed client library 
(although that can make it difficult to consume in some programming 
environments).


--Wez


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: File monitor problem
  2019-12-11 22:06             ` Wez Furlong
@ 2019-12-12  5:56               ` Amir Goldstein
  0 siblings, 0 replies; 25+ messages in thread
From: Amir Goldstein @ 2019-12-12  5:56 UTC (permalink / raw)
  To: Wez Furlong; +Cc: Jan Kara, Mo Re Ra, linux-fsdevel

On Thu, Dec 12, 2019 at 12:06 AM Wez Furlong <wez@fb.com> wrote:
>
> On 12/10/19 12:49, Amir Goldstein wrote:
>
> > [cc: Watchman maintainer]
>
> Hi, I'm the Watchman creator and maintainer, and I also work on a FUSE
> based virtual filesystem called EdenFS that works with the source
> control systems that we use at Facebook.
>

Hi Wez!

Thank you for joining.

> I don't have much context on fanotify yet, but I do have a lot of
> practical experience with Watchman on various operating systems with
> very large recursive directory trees.
>
>
> Amir asked me to participate in this discussion, and I think it's
> probably helpful to give a little bit of context on how we deal with
> some of the different watcher interfaces, and also how we see the
> consumers of Watchman making use of this sort of data.  There are tens
> of watchman consuming applications in common use inside FB, and a long
> tail of ad-hoc consumers that are not on my radar.
>
> I don't want to flood you with data that may not feel relevant so I'm
> going to try to summarize some key points; I'd be happy to elaborate if
> you'd like more context!  These are written out as numbered statements
> to make it easier to reference in further discussion, and are not
> intended to be taken as any kind of prescriptive manifesto!
>

I will try to explain how fanotify may be able to address some of these
points. Many times, new kernel APIs are written years before the actual
users start using them and sometimes when that time comes, the actual
users find that the new API doesn't quite fit their needs. My hope is
that with this early discussion, we will be able to avoid this pattern.

> 1. Humans think in terms of filenames.  Applications largely only care
> about filenames.  It's rare (it came up as a hypothetical for only one
> integrating application at FB in the past several years) that they care
> about optimizing for the various rename cases so long as they get
> notified that the old name is no longer visible in the filesystem and
> that a new name is now visible elsewhere in the portion of the
> filesystem that they are watching.
>

The current plan is to add a new event that notifies on namepsace
changes. At the moment there is no intention to distinguish between
name made visible and name made invisible. That is something that
application can easily figure out with fstatat(2) and avoid rename cookies
etc.

The information in the event will be the parent's fid (NFS file handle)
and name of child.
For a privileged super block watch, parent fid can be resolved to current
directory path (as opposed to path at the time of the change).
For unprivileged directory watches, the parent fid is the key by which
the monitoring app identifies a "watch" (like the inotify wd).

File modifications can now (since kernel 5.1) be reported with the file's
fid (FAN_REPORT_FID). It is a bit more challenging to resolve a
file's fid to its current path, especially by an unprivileged directory
watcher. My plan is to provide a way to report parent fid + name
also for file modifications.

It that doesn't sound right, please shout.

If you are asking yourself, what do I gain from replacing inotify with
unprivileged fanotify, you are not alone.
I cannot sell this as a major functionality improvement over inotify for
unprivileged users.

A minor improvement is that if watchman process can be granted
CAP_DAC_READ_SEARCH, then handling a watched directory rename
on FAN_MOVE_SELF would be less racy and far simpler than the
current rename cookie dance.

The major functionality improvements to end users will have to be
mediated by a privileged service.

Did you ever consider making Watchman a system service?
After all, if several users on the same machine are watching the same
directory tree than spawning multiple watchman processes, each one
indexing the same tree is not as efficient.
I realize that that creates a security issue - no idea how to solve it.

> 2. Application authors don't want to deal with the complexities of file
> watching, they just want to reliably know if/when a named file has
> changed.  Rename cookies and overflow events are too difficult for most
> applications to deal with at all/correctly.
>

...and the fanotify man page gets harder and harder to follow...
I believe that for advanced filesystem monitoring, the kernel provided API
would be better used by expert mediator library/service.

> 3. Overflow events are painful to deal with.  In Watchman we deal with
> inotify overflow by re-crawling and examining the directory structure to
> re-synchronize with the filesystem state.  For very large trees this can
> take a very long time.
>

The new FAN_MARK_FILESYSTEM feature partially solves that.
I wrote a "global watch" demo using inotifywait to show how.
It requires privileged user.

> 4. Partially related to 3., restarting the watchman server is an
> expensive event because we have to re-crawl everything to re-create the
> directory watches with inotify.  If the system provided a recursive
> watch function and some kind of a change journal that told watchman a
> set of N directories to crawl (where N < the M overall number of
> directories) and we had a stable identifier for files, then we could
> persist state across restarts and cheaply re-synchronize.
>

See: https://lwn.net/Articles/755277/
We do that in CTERA using my other project, Overlayfs snapshots:
https://lwn.net/Articles/719772/
An overlay snapshot maintains a persistent "changed subtree" since the
last crawl started.
The current crawler consults the "changed subtree" and skips
unmodified subtrees.
An fanotify super block watch notifies on all changes since this crawl started.

I have long wanted to publish a working demo with git fsmonitor, but
never got to it. This also requires privileged user as well as some other
limitations, so its not a general purpose solution. We can discuss this
solution in another thread.

> 5. Is also related to 3. and 4.  We use btrfs subvolumes in our CI to
> snapshot large repos and make them available to jobs running in
> different containers potentially on different hosts.  If the journal
> mechanism from 4. were available in this situation it would make it
> super cheap to bring up watchman in those environments.
>

I suppose you use btrfs send/recv to transfer the subvolume snapshots.
While those are block level snapshots, I believe btrfs has enough metadata
information to be able to link all changed blocks back to inodes and inodes
back to parent dirs, so in theory, a btrfs tool can be written to compose the
"changed subtree" from a subvolume diff.

> 6. A downside to recursive watches on macOS is that fseventsd has very
> limited ability to add exceptions.  A common pattern at FB is that the
> buck build system maintains a build artifacts directory called
> `buck-out` in the repo.  On Linux we can ignore change notifications for
> this directory with zero cost by simply not registering it with
> inotify.  On macOS, the kernel interface allows for a maximum of 8
> exclusions.  The rest of the changes are delivered to fseventsd which
> stats and records everything in a sqlite database.  This is a
> performance hotspot for us because the number of excluded directories in
> a repo exceeds 8, and the uninteresting bulky build artifact writes then
> need to transit fseventsd and into watchman before we can decide to
> ignore them.
>

fanotify has "ignore masks" that could be used to some extent.
For example, FAN_MARK_FILESYSTEM watch with FAN_CREATE mask
and FAN_MARK_INODE watch on `buck-out` directory with FAN_CREATE
in ignore mask.

> 7. Windows has a journal mechanism that could potentially be used as
> suggested in 4. above, but it requires privileged access.  I happen to
> know from someone at MS that worked on a similar system that there is
> also a way to access a subset of this data that doesn't require
> privileged access, but that isn't documented.  I mention this because
> elsewhere in this thread is a discussion about privileged access to
> similar sounding information.
>

Persistent change journal for Linux is a long term goal of mine.
It will require some support at filesystem level.
I see no reason for the kernel to provide unprivileged access to this sort
of information. IMO, it should be the job of a system service to hand out
filtered information to unprivileged users, according to their credentials.

> 8. Related to 6. and 7., if there is a privileged system daemon to act
> as the interface between userspace<->kernel, care needs to be taken to
> avoid the sort of performance hotspot we see on macOS with 6. above.
>

Some people use FAN_MARK_MOUNT to filter events on file modifications
by path, but that only works for modified files, not for create/delete/rename.
I spent a lot of time trying to figure out an efficient way to filter events
by subtree in the kernel. No success yet.

> OK, hopefully that doesn't feel too off the mark!  I don't think
> everything above needs to be handled directly at the kernel interface.
> Some of these details could be handled on the userspace side, either by
> a daemon (eg: watchman) or a suitably well designed client library
> (although that can make it difficult to consume in some programming
> environments).
>

Thank you Wez for this excellent summary.
Hopefully, I did not overwhelm you with too much information.
I will keep you posted when I have a POC of the new API.

Amir.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: File monitor problem
  2019-12-11 13:58               ` Amir Goldstein
@ 2019-12-16 15:00                 ` Amir Goldstein
  2019-12-19  7:33                   ` Amir Goldstein
  0 siblings, 1 reply; 25+ messages in thread
From: Amir Goldstein @ 2019-12-16 15:00 UTC (permalink / raw)
  To: Jan Kara
  Cc: Matthew Wilcox, Mo Re Ra, linux-fsdevel, Wez Furlong,
	Matthew Bobrowski, linux-api

[cc: linux-api]

On Wed, Dec 11, 2019 at 3:58 PM Amir Goldstein <amir73il@gmail.com> wrote:
>
> On Wed, Dec 11, 2019 at 12:06 PM Jan Kara <jack@suse.cz> wrote:
> >
> > On Wed 04-12-19 22:27:31, Amir Goldstein wrote:
> [...]
> > > The way to frame this correctly IMO is that fsnotify events let application
> > > know that "something has changed", without any ordering guaranty
> > > beyond "sometime before the event was read".
> > >
> > > So far, that "something" can be a file (by fd), an inode (by fid),
> > > more specifically a directory inode (by fid) where in an entry has
> > > changed.
> > >
> > > Adding filename info extends that concept to "something has changed
> > > in the namespace at" (by parent fid+name).
> > > All it means is that application should pay attention to that part of
> > > the namespace and perform a lookup to find out what has changed.
> > >
> > > Maybe the way to mitigate wrong assumptions about ordering and
> > > existence of the filename in the namespace is to omit the event type
> > > for "filename events", for example: { FAN_CHANGE, pfid, name }.
> >
> > So this event would effectively mean: In directory pfid, some filename
> > event has happened with name "name" - i.e. "name" was created (could mean
> > also mkdir), deleted, moved. Am I right?
>
> Exactly.
>
> > And the application would then
> > open_by_handle(2) + open_at(2) + fstat(2) the object pointed to by
>
> open_by_handle(2) + fstatat(2) to be exact.
>
> > (pfid, name) pair and copy whatever it finds to the other end (or delete on
> > the other end in case of ENOENT)?
>
> Basically, yes.
> Although a modern sync tool may also keep some local map of
> remote name -> local fid, to detect a local rename and try to perform a
> remote rename.
>
> >
> > After some thought, yes, I think this is difficult to misuse (or infer some
> > false guarantees out of it). As far as I was thinking it also seems good
> > enough to implement more efficient syncing of directories.
>
> Great, so I will work on the patches.
>

Hi Jan,

I have something working.

Patches:
https://github.com/amir73il/linux/commits/fanotify_name

Simple test:
https://github.com/amir73il/ltp/commits/fanotify_name

I will post the patches after I have a working demo, but in the mean while here
is the gist of the API from the commit log in case you or anyone has comments
on the API.

Note that in the new event flavor, event mask is given as input
(e.g. FAN_CREATE) to filter the type of reported events, but
the event types are hidden when event is reported.

Besides the dirent event types, events "on child" (i.e. MODIFY) can also be
reported with name to a directory watcher.

For now, "on child" events cannot be requested for filesystem/mount
watch, but I think we should consider this possibility so I added
a check to return EINVAL if this combination is attempted.

Let me know what you think.

Thanks,
Amir.

commit 91e0af27ac329f279167e74761fb5303ebbc1c08
Author: Amir Goldstein <amir73il@gmail.com>
Date:   Mon Dec 16 08:39:21 2019 +0200

    fanotify: report name info with FAN_REPORT_FID_NAME

    With init flags FAN_REPORT_FID_NAME, report events with name in variable
    length fanotify_event_info record similar to how fid's are reported.
    When events are reported with name, the reported fid identifies the
    directory and the name follows the fid. The info record type for this
    event info is FAN_EVENT_INFO_TYPE_FID_NAME.

    There are several ways that an application can use this information:

    1. When watching a single directory, the name is always relative to
    the watched directory, so application need to fstatat(2) the name
    relative to the watched directory.

    2. When watching a set of directories, the application could keep a map
    of dirfd for all watched directories and hash the map by fid obtained
    with name_to_handle_at(2).  When getting a name event, the fid in the
    event info could be used to lookup the base dirfd in the map and then
    call fstatat(2) with that dirfd.

    3. When watching a filesystem (FAN_MARK_FILESYSTEM) or a large set of
    directories, the application could use open_by_handle_at(2) with the fid
    in event info to obtain dirfd for the directory where event happened and
    call fstatat(2) with this dirfd.

    The last option scales better for a large number of watched directories.
    The first two options may be available in the future also for non
    privileged fanotify watchers, because open_by_handle_at(2) requires
    the CAP_DAC_READ_SEARCH capability.

    Legacy inotify events are reported with name and event mask (e.g. "foo",
    FAN_CREATE | FAN_ONDIR).  That can lead users to the conclusion that
    there is *currently* an entry "foo" that is a sub-directory, when in fact
    "foo" may be negative or non-dir by the time user gets the event.

    To make it clear that the current state of the named entry is unknown,
    the new fanotify event intentionally hides this information and reports
    only the flag FAN_WITH_NAME in event mask.  This should make it harder
    for users to make wrong assumptions and write buggy applications.

    We reserve the combination of FAN_EVENT_ON_CHILD on a filesystem/mount
    mark and FAN_REPORT_NAME group for future use, so for now this
    combination is invalid.

    Signed-off-by: Amir Goldstein <amir73il@gmail.com>

commit 76a509dbc06fd58ec6636484f87896044cd99022
Author: Amir Goldstein <amir73il@gmail.com>
Date:   Fri Dec 13 11:58:02 2019 +0200

    fanotify: implement basic FAN_REPORT_FID_NAME logic

    Dirent events will be reported in one of two flavors depending on
    fanotify init flags:

    1. Dir fid info + mask that includes the specific event types and
       optional FAN_ONDIR flag.
    2. Dir fid info + name + mask that includes only FAN_WITH_NAME flag.

    To request the second event flavor, user will need to set the
    FAN_REPORT_FID_NAME flags in fanotify_init().

    The first flavor is already supported since kernel v5.1 and is
    intended to be used for watching directories in "batch mode" - user
    is notified when directory is changed and re-scans the directory
    content in response.  This event flavor is stored more compactly in
    event queue, so it is optimal for workloads with frequent directory
    changes (e.g. many files created/deleted).

    The second event flavor is intended to be used for watching large
    directories, where the cost of re-scan of the directory on every change
    is considered too high.  The watcher getting the event with the directory
    fid and entry name is expected to call fstatat(2) to query the content of
    the entry after the change.

    Events "on child" will behave similarly to dirent events, with a small
    difference - the first event flavor without name reports the child fid.
    The second flavor with name info reports the parent fid, because the
    name is relative to the parent directory.

    At the moment, event name info reporting is not implemented, so the
    FAN_REPORT_NAME flag is not yet valid as input to fanotify_init().

    Signed-off-by: Amir Goldstein <amir73il@gmail.com>

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: File monitor problem
  2019-12-16 15:00                 ` Amir Goldstein
@ 2019-12-19  7:33                   ` Amir Goldstein
  2019-12-23 18:19                     ` Jan Kara
  0 siblings, 1 reply; 25+ messages in thread
From: Amir Goldstein @ 2019-12-19  7:33 UTC (permalink / raw)
  To: Jan Kara
  Cc: Matthew Wilcox, Mo Re Ra, linux-fsdevel, Wez Furlong,
	Matthew Bobrowski, Linux API

On Mon, Dec 16, 2019 at 5:00 PM Amir Goldstein <amir73il@gmail.com> wrote:
>
> [cc: linux-api]

CC for real this time.
Leaving entire message for people that join late.

>
> On Wed, Dec 11, 2019 at 3:58 PM Amir Goldstein <amir73il@gmail.com> wrote:
> >
> > On Wed, Dec 11, 2019 at 12:06 PM Jan Kara <jack@suse.cz> wrote:
> > >
> > > On Wed 04-12-19 22:27:31, Amir Goldstein wrote:
> > [...]
> > > > The way to frame this correctly IMO is that fsnotify events let application
> > > > know that "something has changed", without any ordering guaranty
> > > > beyond "sometime before the event was read".
> > > >
> > > > So far, that "something" can be a file (by fd), an inode (by fid),
> > > > more specifically a directory inode (by fid) where in an entry has
> > > > changed.
> > > >
> > > > Adding filename info extends that concept to "something has changed
> > > > in the namespace at" (by parent fid+name).
> > > > All it means is that application should pay attention to that part of
> > > > the namespace and perform a lookup to find out what has changed.
> > > >
> > > > Maybe the way to mitigate wrong assumptions about ordering and
> > > > existence of the filename in the namespace is to omit the event type
> > > > for "filename events", for example: { FAN_CHANGE, pfid, name }.
> > >
> > > So this event would effectively mean: In directory pfid, some filename
> > > event has happened with name "name" - i.e. "name" was created (could mean
> > > also mkdir), deleted, moved. Am I right?
> >
> > Exactly.
> >
> > > And the application would then
> > > open_by_handle(2) + open_at(2) + fstat(2) the object pointed to by
> >
> > open_by_handle(2) + fstatat(2) to be exact.
> >
> > > (pfid, name) pair and copy whatever it finds to the other end (or delete on
> > > the other end in case of ENOENT)?
> >
> > Basically, yes.
> > Although a modern sync tool may also keep some local map of
> > remote name -> local fid, to detect a local rename and try to perform a
> > remote rename.
> >
> > >
> > > After some thought, yes, I think this is difficult to misuse (or infer some
> > > false guarantees out of it). As far as I was thinking it also seems good
> > > enough to implement more efficient syncing of directories.
> >
> > Great, so I will work on the patches.
> >
>
> Hi Jan,
>
> I have something working.
>
> Patches:
> https://github.com/amir73il/linux/commits/fanotify_name
>
> Simple test:
> https://github.com/amir73il/ltp/commits/fanotify_name
>
> I will post the patches after I have a working demo, but in the mean while here
> is the gist of the API from the commit log in case you or anyone has comments
> on the API.
>
> Note that in the new event flavor, event mask is given as input
> (e.g. FAN_CREATE) to filter the type of reported events, but
> the event types are hidden when event is reported.
>
> Besides the dirent event types, events "on child" (i.e. MODIFY) can also be
> reported with name to a directory watcher.
>
> For now, "on child" events cannot be requested for filesystem/mount
> watch, but I think we should consider this possibility so I added
> a check to return EINVAL if this combination is attempted.
>

Hi Jan,

Thinking out loud again.

Assuming the concept of FAN_REPORT_FID_NAME as described in the
commit messages below is acceptable, the way to deal with dirent events
is clear as well as the way to deal with events "on child" for a watched dir
and those are what the branch fanotify_name implemented.

I've spend the last few days trying to figure out the "best" way to handle the
rest of the events. And by "best" I mean, least to explain in man page, while
providing the needed functionality to users.

This is what I got to so far. Patches are shaping up on branch
fanotify_name-wip same branch name for ltp tests:

For a group initialized with FAN_REPORT_FID_NAME:
1. Events report mask with only FAN_WITH_NAME flag
2. Reported name follows fid but may be empty in some cases
3. Dirent events (create/delete/move) report a non-empty name
4. Events "on child" on watched dir report a non-empty name
5. Events "on self" (delete_self/move_self) report an empty name
6. Events "possible on child" (open/access/modify/close/attrib) are
    reported only in their "on child" flavor when set on a sb/mount mark
7. The flag FAN_EVENT_ON_CHILD on a sb/mount mark is ignored
    (as in current upstream kernel and man page), but the events are
    reported with non-empty name and parent dir fid, same as in the
    case where all directories under sb/mount have been marked
    with FAN_EVENT_ON_CHILD (a.k.a slow recursive watch)

There are some open questions regarding the fine details of items 5-7:
- Should "self" events with empty name on dir be reported?
- Should "self" events with empty name on non-dir be reported?
- Should open/access/attrib on watched dir itself report an
  event with empty name?
- Should open/access/attrib on root sb/mount root dir report an
  event with empty name?
- Should open/access/modify/attrib on non-dir report an event
  with empty name?

For full disclosure, in the out-of-tree patches [1] we use in CTERA
the answer to all the open questions above is:
"Yes, but the filesystem monitor is only using the self events on dirs".

The problem with this approach is that there is currently no way
for users to request certain events ONLY_ONDIR. A typical filesystem
monitor is only interested in self events on directories, but requesting
self events will fill the queue with unneeded self events on files.

Another valid answer to all these questions could be:
"No, because user can already get those events by opening another
group with FAN_REPORT_FID".

The problem with this approach is that it is harder to document (?)
and harder for users to use (?).

Therefore, I am leaning toward this middle ground solution:

8. If a non-empty name is reported, fid is identifying a directory
9. Events on non-directory with empty name are not reported.
    user may use another group with FAN_REPORT_FID to get
    those events

I could use some guidance here.

Thanks,
Amir.

[1] https://github.com/amir73il/linux/commits/fanotify_filename

>
> commit 91e0af27ac329f279167e74761fb5303ebbc1c08
> Author: Amir Goldstein <amir73il@gmail.com>
> Date:   Mon Dec 16 08:39:21 2019 +0200
>
>     fanotify: report name info with FAN_REPORT_FID_NAME
>
>     With init flags FAN_REPORT_FID_NAME, report events with name in variable
>     length fanotify_event_info record similar to how fid's are reported.
>     When events are reported with name, the reported fid identifies the
>     directory and the name follows the fid. The info record type for this
>     event info is FAN_EVENT_INFO_TYPE_FID_NAME.
>
>     There are several ways that an application can use this information:
>
>     1. When watching a single directory, the name is always relative to
>     the watched directory, so application need to fstatat(2) the name
>     relative to the watched directory.
>
>     2. When watching a set of directories, the application could keep a map
>     of dirfd for all watched directories and hash the map by fid obtained
>     with name_to_handle_at(2).  When getting a name event, the fid in the
>     event info could be used to lookup the base dirfd in the map and then
>     call fstatat(2) with that dirfd.
>
>     3. When watching a filesystem (FAN_MARK_FILESYSTEM) or a large set of
>     directories, the application could use open_by_handle_at(2) with the fid
>     in event info to obtain dirfd for the directory where event happened and
>     call fstatat(2) with this dirfd.
>
>     The last option scales better for a large number of watched directories.
>     The first two options may be available in the future also for non
>     privileged fanotify watchers, because open_by_handle_at(2) requires
>     the CAP_DAC_READ_SEARCH capability.
>
>     Legacy inotify events are reported with name and event mask (e.g. "foo",
>     FAN_CREATE | FAN_ONDIR).  That can lead users to the conclusion that
>     there is *currently* an entry "foo" that is a sub-directory, when in fact
>     "foo" may be negative or non-dir by the time user gets the event.
>
>     To make it clear that the current state of the named entry is unknown,
>     the new fanotify event intentionally hides this information and reports
>     only the flag FAN_WITH_NAME in event mask.  This should make it harder
>     for users to make wrong assumptions and write buggy applications.
>
>     We reserve the combination of FAN_EVENT_ON_CHILD on a filesystem/mount
>     mark and FAN_REPORT_NAME group for future use, so for now this
>     combination is invalid.
>
>     Signed-off-by: Amir Goldstein <amir73il@gmail.com>
>
> commit 76a509dbc06fd58ec6636484f87896044cd99022
> Author: Amir Goldstein <amir73il@gmail.com>
> Date:   Fri Dec 13 11:58:02 2019 +0200
>
>     fanotify: implement basic FAN_REPORT_FID_NAME logic
>
>     Dirent events will be reported in one of two flavors depending on
>     fanotify init flags:
>
>     1. Dir fid info + mask that includes the specific event types and
>        optional FAN_ONDIR flag.
>     2. Dir fid info + name + mask that includes only FAN_WITH_NAME flag.
>
>     To request the second event flavor, user will need to set the
>     FAN_REPORT_FID_NAME flags in fanotify_init().
>
>     The first flavor is already supported since kernel v5.1 and is
>     intended to be used for watching directories in "batch mode" - user
>     is notified when directory is changed and re-scans the directory
>     content in response.  This event flavor is stored more compactly in
>     event queue, so it is optimal for workloads with frequent directory
>     changes (e.g. many files created/deleted).
>
>     The second event flavor is intended to be used for watching large
>     directories, where the cost of re-scan of the directory on every change
>     is considered too high.  The watcher getting the event with the directory
>     fid and entry name is expected to call fstatat(2) to query the content of
>     the entry after the change.
>
>     Events "on child" will behave similarly to dirent events, with a small
>     difference - the first event flavor without name reports the child fid.
>     The second flavor with name info reports the parent fid, because the
>     name is relative to the parent directory.
>
>     At the moment, event name info reporting is not implemented, so the
>     FAN_REPORT_NAME flag is not yet valid as input to fanotify_init().
>
>     Signed-off-by: Amir Goldstein <amir73il@gmail.com>

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: File monitor problem
  2019-12-19  7:33                   ` Amir Goldstein
@ 2019-12-23 18:19                     ` Jan Kara
  2019-12-23 19:14                       ` Amir Goldstein
  0 siblings, 1 reply; 25+ messages in thread
From: Jan Kara @ 2019-12-23 18:19 UTC (permalink / raw)
  To: Amir Goldstein
  Cc: Jan Kara, Matthew Wilcox, Mo Re Ra, linux-fsdevel, Wez Furlong,
	Matthew Bobrowski, Linux API

On Thu 19-12-19 09:33:24, Amir Goldstein wrote:
> On Mon, Dec 16, 2019 at 5:00 PM Amir Goldstein <amir73il@gmail.com> wrote:
> > On Wed, Dec 11, 2019 at 3:58 PM Amir Goldstein <amir73il@gmail.com> wrote:
> > Hi Jan,
> >
> > I have something working.
> >
> > Patches:
> > https://github.com/amir73il/linux/commits/fanotify_name
> >
> > Simple test:
> > https://github.com/amir73il/ltp/commits/fanotify_name
> >
> > I will post the patches after I have a working demo, but in the mean while here
> > is the gist of the API from the commit log in case you or anyone has comments
> > on the API.
> >
> > Note that in the new event flavor, event mask is given as input
> > (e.g. FAN_CREATE) to filter the type of reported events, but
> > the event types are hidden when event is reported.

Makes some sense I guess but at the same time won't this be rather
confusing for users of the API?

Also I've read you proposal and I'm somewhat wondering whether we are not
overengineering this. I can see the need for FAN_DIR_MODIFIED_WITH_NAME
(stupid name, I know) - generated when something changed with names in a
particular directory, reported with FID of the directory and the name
inside that directory involved with the change. Directory watching
application needs this to keep track of "names to check". Is the name
useful with any other type of event? _SELF events cannot even sensibly have
it so no discussion there as you mention below. Then we have OPEN, CLOSE,
ACCESS, ATTRIB events. Do we have any use for names with those?

								Honza

> > Besides the dirent event types, events "on child" (i.e. MODIFY) can also be
> > reported with name to a directory watcher.
> >
> > For now, "on child" events cannot be requested for filesystem/mount
> > watch, but I think we should consider this possibility so I added
> > a check to return EINVAL if this combination is attempted.
> >
> 
> Hi Jan,
> 
> Thinking out loud again.
> 
> Assuming the concept of FAN_REPORT_FID_NAME as described in the
> commit messages below is acceptable, the way to deal with dirent events
> is clear as well as the way to deal with events "on child" for a watched dir
> and those are what the branch fanotify_name implemented.
> 
> I've spend the last few days trying to figure out the "best" way to handle the
> rest of the events. And by "best" I mean, least to explain in man page, while
> providing the needed functionality to users.
> 
> This is what I got to so far. Patches are shaping up on branch
> fanotify_name-wip same branch name for ltp tests:
> 
> For a group initialized with FAN_REPORT_FID_NAME:
> 1. Events report mask with only FAN_WITH_NAME flag
> 2. Reported name follows fid but may be empty in some cases
> 3. Dirent events (create/delete/move) report a non-empty name
> 4. Events "on child" on watched dir report a non-empty name
> 5. Events "on self" (delete_self/move_self) report an empty name
> 6. Events "possible on child" (open/access/modify/close/attrib) are
>     reported only in their "on child" flavor when set on a sb/mount mark
> 7. The flag FAN_EVENT_ON_CHILD on a sb/mount mark is ignored
>     (as in current upstream kernel and man page), but the events are
>     reported with non-empty name and parent dir fid, same as in the
>     case where all directories under sb/mount have been marked
>     with FAN_EVENT_ON_CHILD (a.k.a slow recursive watch)
> 
> There are some open questions regarding the fine details of items 5-7:
> - Should "self" events with empty name on dir be reported?
> - Should "self" events with empty name on non-dir be reported?
> - Should open/access/attrib on watched dir itself report an
>   event with empty name?
> - Should open/access/attrib on root sb/mount root dir report an
>   event with empty name?
> - Should open/access/modify/attrib on non-dir report an event
>   with empty name?
> 
> For full disclosure, in the out-of-tree patches [1] we use in CTERA
> the answer to all the open questions above is:
> "Yes, but the filesystem monitor is only using the self events on dirs".
> 
> The problem with this approach is that there is currently no way
> for users to request certain events ONLY_ONDIR. A typical filesystem
> monitor is only interested in self events on directories, but requesting
> self events will fill the queue with unneeded self events on files.
> 
> Another valid answer to all these questions could be:
> "No, because user can already get those events by opening another
> group with FAN_REPORT_FID".
> 
> The problem with this approach is that it is harder to document (?)
> and harder for users to use (?).
> 
> Therefore, I am leaning toward this middle ground solution:
> 
> 8. If a non-empty name is reported, fid is identifying a directory
> 9. Events on non-directory with empty name are not reported.
>     user may use another group with FAN_REPORT_FID to get
>     those events
> 
> I could use some guidance here.
> 
> Thanks,
> Amir.
> 
> [1] https://github.com/amir73il/linux/commits/fanotify_filename
> 
> >
> > commit 91e0af27ac329f279167e74761fb5303ebbc1c08
> > Author: Amir Goldstein <amir73il@gmail.com>
> > Date:   Mon Dec 16 08:39:21 2019 +0200
> >
> >     fanotify: report name info with FAN_REPORT_FID_NAME
> >
> >     With init flags FAN_REPORT_FID_NAME, report events with name in variable
> >     length fanotify_event_info record similar to how fid's are reported.
> >     When events are reported with name, the reported fid identifies the
> >     directory and the name follows the fid. The info record type for this
> >     event info is FAN_EVENT_INFO_TYPE_FID_NAME.
> >
> >     There are several ways that an application can use this information:
> >
> >     1. When watching a single directory, the name is always relative to
> >     the watched directory, so application need to fstatat(2) the name
> >     relative to the watched directory.
> >
> >     2. When watching a set of directories, the application could keep a map
> >     of dirfd for all watched directories and hash the map by fid obtained
> >     with name_to_handle_at(2).  When getting a name event, the fid in the
> >     event info could be used to lookup the base dirfd in the map and then
> >     call fstatat(2) with that dirfd.
> >
> >     3. When watching a filesystem (FAN_MARK_FILESYSTEM) or a large set of
> >     directories, the application could use open_by_handle_at(2) with the fid
> >     in event info to obtain dirfd for the directory where event happened and
> >     call fstatat(2) with this dirfd.
> >
> >     The last option scales better for a large number of watched directories.
> >     The first two options may be available in the future also for non
> >     privileged fanotify watchers, because open_by_handle_at(2) requires
> >     the CAP_DAC_READ_SEARCH capability.
> >
> >     Legacy inotify events are reported with name and event mask (e.g. "foo",
> >     FAN_CREATE | FAN_ONDIR).  That can lead users to the conclusion that
> >     there is *currently* an entry "foo" that is a sub-directory, when in fact
> >     "foo" may be negative or non-dir by the time user gets the event.
> >
> >     To make it clear that the current state of the named entry is unknown,
> >     the new fanotify event intentionally hides this information and reports
> >     only the flag FAN_WITH_NAME in event mask.  This should make it harder
> >     for users to make wrong assumptions and write buggy applications.
> >
> >     We reserve the combination of FAN_EVENT_ON_CHILD on a filesystem/mount
> >     mark and FAN_REPORT_NAME group for future use, so for now this
> >     combination is invalid.
> >
> >     Signed-off-by: Amir Goldstein <amir73il@gmail.com>
> >
> > commit 76a509dbc06fd58ec6636484f87896044cd99022
> > Author: Amir Goldstein <amir73il@gmail.com>
> > Date:   Fri Dec 13 11:58:02 2019 +0200
> >
> >     fanotify: implement basic FAN_REPORT_FID_NAME logic
> >
> >     Dirent events will be reported in one of two flavors depending on
> >     fanotify init flags:
> >
> >     1. Dir fid info + mask that includes the specific event types and
> >        optional FAN_ONDIR flag.
> >     2. Dir fid info + name + mask that includes only FAN_WITH_NAME flag.
> >
> >     To request the second event flavor, user will need to set the
> >     FAN_REPORT_FID_NAME flags in fanotify_init().
> >
> >     The first flavor is already supported since kernel v5.1 and is
> >     intended to be used for watching directories in "batch mode" - user
> >     is notified when directory is changed and re-scans the directory
> >     content in response.  This event flavor is stored more compactly in
> >     event queue, so it is optimal for workloads with frequent directory
> >     changes (e.g. many files created/deleted).
> >
> >     The second event flavor is intended to be used for watching large
> >     directories, where the cost of re-scan of the directory on every change
> >     is considered too high.  The watcher getting the event with the directory
> >     fid and entry name is expected to call fstatat(2) to query the content of
> >     the entry after the change.
> >
> >     Events "on child" will behave similarly to dirent events, with a small
> >     difference - the first event flavor without name reports the child fid.
> >     The second flavor with name info reports the parent fid, because the
> >     name is relative to the parent directory.
> >
> >     At the moment, event name info reporting is not implemented, so the
> >     FAN_REPORT_NAME flag is not yet valid as input to fanotify_init().
> >
> >     Signed-off-by: Amir Goldstein <amir73il@gmail.com>
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: File monitor problem
  2019-12-23 18:19                     ` Jan Kara
@ 2019-12-23 19:14                       ` Amir Goldstein
  2019-12-24  3:49                         ` Amir Goldstein
  0 siblings, 1 reply; 25+ messages in thread
From: Amir Goldstein @ 2019-12-23 19:14 UTC (permalink / raw)
  To: Jan Kara
  Cc: Matthew Wilcox, Mo Re Ra, linux-fsdevel, Wez Furlong,
	Matthew Bobrowski, Linux API

On Mon, Dec 23, 2019 at 8:20 PM Jan Kara <jack@suse.cz> wrote:
>
> On Thu 19-12-19 09:33:24, Amir Goldstein wrote:
> > On Mon, Dec 16, 2019 at 5:00 PM Amir Goldstein <amir73il@gmail.com> wrote:
> > > On Wed, Dec 11, 2019 at 3:58 PM Amir Goldstein <amir73il@gmail.com> wrote:
> > > Hi Jan,
> > >
> > > I have something working.
> > >
> > > Patches:
> > > https://github.com/amir73il/linux/commits/fanotify_name
> > >
> > > Simple test:
> > > https://github.com/amir73il/ltp/commits/fanotify_name
> > >
> > > I will post the patches after I have a working demo, but in the mean while here
> > > is the gist of the API from the commit log in case you or anyone has comments
> > > on the API.
> > >
> > > Note that in the new event flavor, event mask is given as input
> > > (e.g. FAN_CREATE) to filter the type of reported events, but
> > > the event types are hidden when event is reported.
>
> Makes some sense I guess but at the same time won't this be rather
> confusing for users of the API?
>

<shrug/> I guess it will be confusing for users that we obfuscate the event
mask in the first place. I though that taking away the ability to
filter by event
type would be too much, but I can change that if you think otherwise.


> Also I've read you proposal and I'm somewhat wondering whether we are not
> overengineering this.

Possibly. Won't be my first time..

> I can see the need for FAN_DIR_MODIFIED_WITH_NAME
> (stupid name, I know) - generated when something changed with names in a
> particular directory, reported with FID of the directory and the name
> inside that directory involved with the change. Directory watching
> application needs this to keep track of "names to check". Is the name
> useful with any other type of event? _SELF events cannot even sensibly have
> it so no discussion there as you mention below. Then we have OPEN, CLOSE,
> ACCESS, ATTRIB events. Do we have any use for names with those?
>

The problem is that unlike dir fid, file fid cannot be reliably resolved
to path, that is the reason that I implemented  FAN_WITH_NAME
for events "possible on child" (see branch fanotify_name-wip).

A filesystem monitor typically needs to be notified on name changes and on
data/metadata modifications.

So maybe add just two new event types:
FAN_DIR_MODIFY
FAN_CHILD_MODIFY

Both those events are reported with name and allowed only with init flag
FAN_REPORT_FID_NAME.
User cannot filter FAN_DIR_MODIFY by part of create/delete/move.
User cannot filter FAN_CHILD_MODIFY by part of attrib/modify/close_write.
FAN_CHILD_MODIFY implies FAN_EVENT_ON_CHILD for directory watch,
but also works on sb/mount mark as if all directories are watched.

And there is nothing stopping user from requesting also _SELF events
and other events without name on the same group.

Does that sound better?

Thanks,
Amir.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: File monitor problem
  2019-12-23 19:14                       ` Amir Goldstein
@ 2019-12-24  3:49                         ` Amir Goldstein
  2019-12-31 11:53                           ` Amir Goldstein
  2020-01-07 17:10                           ` Jan Kara
  0 siblings, 2 replies; 25+ messages in thread
From: Amir Goldstein @ 2019-12-24  3:49 UTC (permalink / raw)
  To: Jan Kara
  Cc: Matthew Wilcox, Mo Re Ra, linux-fsdevel, Wez Furlong,
	Matthew Bobrowski, Linux API

> > I can see the need for FAN_DIR_MODIFIED_WITH_NAME
> > (stupid name, I know) - generated when something changed with names in a
> > particular directory, reported with FID of the directory and the name
> > inside that directory involved with the change. Directory watching
> > application needs this to keep track of "names to check". Is the name
> > useful with any other type of event? _SELF events cannot even sensibly have
> > it so no discussion there as you mention below. Then we have OPEN, CLOSE,
> > ACCESS, ATTRIB events. Do we have any use for names with those?
> >
>
> The problem is that unlike dir fid, file fid cannot be reliably resolved
> to path, that is the reason that I implemented  FAN_WITH_NAME
> for events "possible on child" (see branch fanotify_name-wip).
>
> A filesystem monitor typically needs to be notified on name changes and on
> data/metadata modifications.
>
> So maybe add just two new event types:
> FAN_DIR_MODIFY
> FAN_CHILD_MODIFY
>
> Both those events are reported with name and allowed only with init flag
> FAN_REPORT_FID_NAME.
> User cannot filter FAN_DIR_MODIFY by part of create/delete/move.
> User cannot filter FAN_CHILD_MODIFY by part of attrib/modify/close_write.

Nah, that won't do. I now remember discussing this with out in-house monitor
team and they said they needed to filter out FAN_MODIFY because it was too
noisy and rely on FAN_CLOSE_WRITE. And other may want open/access as
well.

There is another weird way to obfuscate the event type.
I am not sure if users will be less confused about it:
Each event type belongs to a group (i.e. self, dirent, poss_on_child)
User may set any event type in the mask (e.g. create|delete|open|close)
When getting an event from event group A (e.g. create), all event types
of that group will be reported (e.g. create|delete).

To put it another way:
#define FAN_DIR_MODIFY (FAN_CREATE | FAN_MOVE | FAN_DELETE)

For example in fanotify_group_event_mask():
if (event_with_name) {
    if (marks_mask & test_mask & FAN_DIR_MODIFY)
        test_mask |= marks_mask & FAN_DIR_MODIFY
...

Did somebody say over-engineering? ;)

TBH, I don't see how we can do event type obfuscation
that is both usable and not confusing, because the concept is
confusing. I understand the reasoning behind it, but I don't think
that many users will.

I'm hoping that you can prove me wrong and find a way to simplify
the API while retaining fair usability.

Thanks,
Amir.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: File monitor problem
  2019-12-24  3:49                         ` Amir Goldstein
@ 2019-12-31 11:53                           ` Amir Goldstein
  2020-01-07 17:10                           ` Jan Kara
  1 sibling, 0 replies; 25+ messages in thread
From: Amir Goldstein @ 2019-12-31 11:53 UTC (permalink / raw)
  To: Jan Kara
  Cc: Matthew Wilcox, Mo Re Ra, linux-fsdevel, Wez Furlong,
	Matthew Bobrowski, Linux API

On Tue, Dec 24, 2019 at 5:49 AM Amir Goldstein <amir73il@gmail.com> wrote:
>
> > > I can see the need for FAN_DIR_MODIFIED_WITH_NAME
> > > (stupid name, I know) - generated when something changed with names in a
> > > particular directory, reported with FID of the directory and the name
> > > inside that directory involved with the change. Directory watching
> > > application needs this to keep track of "names to check". Is the name
> > > useful with any other type of event? _SELF events cannot even sensibly have
> > > it so no discussion there as you mention below. Then we have OPEN, CLOSE,
> > > ACCESS, ATTRIB events. Do we have any use for names with those?
> > >
> >
> > The problem is that unlike dir fid, file fid cannot be reliably resolved
> > to path, that is the reason that I implemented  FAN_WITH_NAME
> > for events "possible on child" (see branch fanotify_name-wip).
> >
> > A filesystem monitor typically needs to be notified on name changes and on
> > data/metadata modifications.
> >

And just before 2019 ends, here is the promised demo.

The kernel branch was added support for events "on child" with name [1].
The inotifywatch demo created for FAN_REPORT_FID was extended to
watch events with FAN_REPORT_FID_NAME [2].

[1] https://github.com/amir73il/linux/commits/fanotify_name
[2] https://github.com/amir73il/inotify-tools/commits/fanotify_name

The demo branch includes the script test_demo.sh, whose output can be
seen here below, that does:

1. Create a small data set (in a filesystem mounted at /vdf)
2. Set up an fanotify filesystem mark watching for all fs changes
3. Make some filesystem changes on files and dirs
4. Read events with fid+name info after 2 seconds delay
5. Check the uptodate path of events with open_by_handle_at+faccessat
6. Print summary of all paths that require "attention"
7. Paths that are currently ENOENT are marked with "(deleted)"

The report includes all the information needed to sync filesystem
changes to mirror or to re-index filesystem after changes.

For example, the file a/b/c/1 was moved to a/b/c/d/e/f/g/1 and then
directory a/b/c/d/e/f/g was moved to a/b/c/d/e/G.
In the report, the paths a/b/c/1 and a/b/c/d/e/f/g are listed as (deleted)
and the paths a/b/c/d/e/G and a/b/c/d/e/G/1 are listed as changed.
There is no record in the report of the intermediate path a/b/c/d/e/f/g/1.

Happy new year!
Amir.

---------------
# ./test_demo.sh /vdf
+ WD=/vdf
+ cd /vdf
+ rm -rf a
+ mkdir -p a/b/c/d/e/f/g/
+ touch a/b/c/0 a/b/c/1 a/b/c/d/e/f/g/0
+ sleep 1
+ inotifywatch --global --writes --timeout -2 /vdf
Establishing filesystem global watch...
Finished establishing watches, now collecting statistics.
Sleeping for 2 seconds...
+
+ t=Create files and dirs...
+ touch a/0 a/1 a/2 a/3
+ mkdir a/dir0 a/dir1 a/dir2
+
+ t=Rename files and dirs...
+ mv a/0 a/3
+ mv a/dir0 a/dir3
+
+ t=Delete files and dirs...
+ rm a/1
+ rmdir a/dir1
+
+ t=Modify files and dirs...
+ chmod +x a/b/c/d
+ touch a/b/c/0
+
+ t=Move files and dirs...
+ mv a/b/c/1 a/b/c/d/e/f/g/1
+ mv a/b/c/d/e/f/g a/b/c/d/e/G
+
[fid=fd50.0.2007402;name='0'] /vdf/a/0 (deleted)
[fid=fd50.0.2007402;name='1'] /vdf/a/1 (deleted)
[fid=fd50.0.2007402;name='2'] /vdf/a/2
[fid=fd50.0.2007402;name='3'] /vdf/a/3
[fid=fd50.0.2007402;name='dir0'] /vdf/a/dir0 (deleted)
[fid=fd50.0.2007402;name='dir1'] /vdf/a/dir1 (deleted)
[fid=fd50.0.2007402;name='dir2'] /vdf/a/dir2
[fid=fd50.0.2007402;name='dir3'] /vdf/a/dir3
[fid=fd50.0.8c;name='d'] /vdf/a/b/c/d
[fid=fd50.0.8c;name='0'] /vdf/a/b/c/0
[fid=fd50.0.8c;name='1'] /vdf/a/b/c/1 (deleted)
[fid=fd50.0.2007403;name='1'] /vdf/a/b/c/d/e/G/1
[fid=fd50.0.10000c2;name='g'] /vdf/a/b/c/d/e/f/g (deleted)
[fid=fd50.0.2007403;name='G'] /vdf/a/b/c/d/e/G
total  filename
2      /vdf/a/0 (deleted)
2      /vdf/a/1 (deleted)
2      /vdf/a/3
2      /vdf/a/dir0 (deleted)
2      /vdf/a/dir1 (deleted)
1      /vdf/a/2
1      /vdf/a/dir2
1      /vdf/a/dir3
1      /vdf/a/b/c/d
1      /vdf/a/b/c/0
1      /vdf/a/b/c/1 (deleted)
1      /vdf/a/b/c/d/e/G/1
1      /vdf/a/b/c/d/e/f/g (deleted)
1      /vdf/a/b/c/d/e/G

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: File monitor problem
  2019-12-24  3:49                         ` Amir Goldstein
  2019-12-31 11:53                           ` Amir Goldstein
@ 2020-01-07 17:10                           ` Jan Kara
  2020-01-07 18:56                             ` Amir Goldstein
  1 sibling, 1 reply; 25+ messages in thread
From: Jan Kara @ 2020-01-07 17:10 UTC (permalink / raw)
  To: Amir Goldstein
  Cc: Jan Kara, Matthew Wilcox, Mo Re Ra, linux-fsdevel, Wez Furlong,
	Matthew Bobrowski, Linux API

On Tue 24-12-19 05:49:42, Amir Goldstein wrote:
> > > I can see the need for FAN_DIR_MODIFIED_WITH_NAME
> > > (stupid name, I know) - generated when something changed with names in a
> > > particular directory, reported with FID of the directory and the name
> > > inside that directory involved with the change. Directory watching
> > > application needs this to keep track of "names to check". Is the name
> > > useful with any other type of event? _SELF events cannot even sensibly have
> > > it so no discussion there as you mention below. Then we have OPEN, CLOSE,
> > > ACCESS, ATTRIB events. Do we have any use for names with those?
> > >
> >
> > The problem is that unlike dir fid, file fid cannot be reliably resolved
> > to path, that is the reason that I implemented  FAN_WITH_NAME
> > for events "possible on child" (see branch fanotify_name-wip).

Ok, but that seems to be a bit of an abuse, isn't it? Because with parent
fid + name you may reconstruct the path but you won't be able to reliably
identify the object where the operation happened? Even worse users can
mistakenly think that parent fid + name identify the object but that is
racy... This is exactly the kind of confusion I'd like to avoid with the
new API.

OTOH I understand that e.g. a file monitor may want to monitor CLOSE_WRITE
like you mention below just to record directory FID + name as something
that needs resyncing. So I agree that names in events other than directory
events are useful as well. And I also agree that for that usecase what you
propose would be fine.

> > A filesystem monitor typically needs to be notified on name changes and on
> > data/metadata modifications.
> >
> > So maybe add just two new event types:
> > FAN_DIR_MODIFY
> > FAN_CHILD_MODIFY
> >
> > Both those events are reported with name and allowed only with init flag
> > FAN_REPORT_FID_NAME.
> > User cannot filter FAN_DIR_MODIFY by part of create/delete/move.
> > User cannot filter FAN_CHILD_MODIFY by part of attrib/modify/close_write.
> 
> Nah, that won't do. I now remember discussing this with out in-house monitor
> team and they said they needed to filter out FAN_MODIFY because it was too
> noisy and rely on FAN_CLOSE_WRITE. And other may want open/access as
> well.

So for open/close/modify/read/attrib I don't see a need to obfuscate the
event type. They are already abstract enough so I don't see how they could
be easily misinterpretted. With directory events the potential for
"optimizations" that are subtly wrong is IMHO much bigger.

> There is another weird way to obfuscate the event type.
> I am not sure if users will be less confused about it:
> Each event type belongs to a group (i.e. self, dirent, poss_on_child)
> User may set any event type in the mask (e.g. create|delete|open|close)
> When getting an event from event group A (e.g. create), all event types
> of that group will be reported (e.g. create|delete).
> 
> To put it another way:
> #define FAN_DIR_MODIFY (FAN_CREATE | FAN_MOVE | FAN_DELETE)
> 
> For example in fanotify_group_event_mask():
> if (event_with_name) {
>     if (marks_mask & test_mask & FAN_DIR_MODIFY)
>         test_mask |= marks_mask & FAN_DIR_MODIFY
> ...
> 
> Did somebody say over-engineering? ;)
> 
> TBH, I don't see how we can do event type obfuscation
> that is both usable and not confusing, because the concept is
> confusing. I understand the reasoning behind it, but I don't think
> that many users will.
> 
> I'm hoping that you can prove me wrong and find a way to simplify
> the API while retaining fair usability.

I was thinking about this. If I understand the problem right, depending on
the usecase we may need with each event some subset of 'object fid',
'directory fid', 'name in directory'. So what if we provided all these
three things in each event? Events will get somewhat bloated but it may be
bearable.

With this information we could reliably reconstruct (some) path (we always
have directory fid + name), we can reliably identify the object involved in
the change (we always have object fid). I'd still prefer if we obfuscated
directory events, without possibility of filtering based of
CREATE/DELETE/MOVE (i.e., just one FAN_DIR_MODIFY event for this fanotify
group) - actually I have hard time coming with a usecase where application
would care about one type of event and not the other one. The other events
remain as they are. What do you think?

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: File monitor problem
  2020-01-07 17:10                           ` Jan Kara
@ 2020-01-07 18:56                             ` Amir Goldstein
  2020-01-08  9:04                               ` Jan Kara
  0 siblings, 1 reply; 25+ messages in thread
From: Amir Goldstein @ 2020-01-07 18:56 UTC (permalink / raw)
  To: Jan Kara
  Cc: Matthew Wilcox, Mo Re Ra, linux-fsdevel, Wez Furlong,
	Matthew Bobrowski, Linux API

On Tue, Jan 7, 2020 at 7:10 PM Jan Kara <jack@suse.cz> wrote:
>
> On Tue 24-12-19 05:49:42, Amir Goldstein wrote:
> > > > I can see the need for FAN_DIR_MODIFIED_WITH_NAME
> > > > (stupid name, I know) - generated when something changed with names in a
> > > > particular directory, reported with FID of the directory and the name
> > > > inside that directory involved with the change. Directory watching
> > > > application needs this to keep track of "names to check". Is the name
> > > > useful with any other type of event? _SELF events cannot even sensibly have
> > > > it so no discussion there as you mention below. Then we have OPEN, CLOSE,
> > > > ACCESS, ATTRIB events. Do we have any use for names with those?
> > > >
> > >
> > > The problem is that unlike dir fid, file fid cannot be reliably resolved
> > > to path, that is the reason that I implemented  FAN_WITH_NAME
> > > for events "possible on child" (see branch fanotify_name-wip).
>
> Ok, but that seems to be a bit of an abuse, isn't it? Because with parent
> fid + name you may reconstruct the path but you won't be able to reliably
> identify the object where the operation happened? Even worse users can
> mistakenly think that parent fid + name identify the object but that is
> racy... This is exactly the kind of confusion I'd like to avoid with the
> new API.
>
> OTOH I understand that e.g. a file monitor may want to monitor CLOSE_WRITE
> like you mention below just to record directory FID + name as something
> that needs resyncing. So I agree that names in events other than directory
> events are useful as well. And I also agree that for that usecase what you
> propose would be fine.
>
> > > A filesystem monitor typically needs to be notified on name changes and on
> > > data/metadata modifications.
> > >
> > > So maybe add just two new event types:
> > > FAN_DIR_MODIFY
> > > FAN_CHILD_MODIFY
> > >
> > > Both those events are reported with name and allowed only with init flag
> > > FAN_REPORT_FID_NAME.
> > > User cannot filter FAN_DIR_MODIFY by part of create/delete/move.
> > > User cannot filter FAN_CHILD_MODIFY by part of attrib/modify/close_write.
> >
> > Nah, that won't do. I now remember discussing this with out in-house monitor
> > team and they said they needed to filter out FAN_MODIFY because it was too
> > noisy and rely on FAN_CLOSE_WRITE. And other may want open/access as
> > well.
>
> So for open/close/modify/read/attrib I don't see a need to obfuscate the
> event type. They are already abstract enough so I don't see how they could
> be easily misinterpretted. With directory events the potential for
> "optimizations" that are subtly wrong is IMHO much bigger.
>

OK, that simplifies things quite a bit.

> > There is another weird way to obfuscate the event type.
> > I am not sure if users will be less confused about it:
> > Each event type belongs to a group (i.e. self, dirent, poss_on_child)
> > User may set any event type in the mask (e.g. create|delete|open|close)
> > When getting an event from event group A (e.g. create), all event types
> > of that group will be reported (e.g. create|delete).
> >
> > To put it another way:
> > #define FAN_DIR_MODIFY (FAN_CREATE | FAN_MOVE | FAN_DELETE)
> >
> > For example in fanotify_group_event_mask():
> > if (event_with_name) {
> >     if (marks_mask & test_mask & FAN_DIR_MODIFY)
> >         test_mask |= marks_mask & FAN_DIR_MODIFY
> > ...
> >
> > Did somebody say over-engineering? ;)
> >
> > TBH, I don't see how we can do event type obfuscation
> > that is both usable and not confusing, because the concept is
> > confusing. I understand the reasoning behind it, but I don't think
> > that many users will.
> >
> > I'm hoping that you can prove me wrong and find a way to simplify
> > the API while retaining fair usability.
>
> I was thinking about this. If I understand the problem right, depending on
> the usecase we may need with each event some subset of 'object fid',
> 'directory fid', 'name in directory'. So what if we provided all these
> three things in each event? Events will get somewhat bloated but it may be
> bearable.
>

I agree.

What I like about the fact that users don't need to choose between
'parent fid' and 'object fid' is that it makes some hard questions go away:
1. How are "self" events reported? simple - just with 'object id'
2. How are events on disconnected dentries reported? simple - just
with 'object id'
3. How are events on the root of the watch reported? same answer

Did you write 'directory fid' as opposed to 'parent fid' for a reason?
Was it your intention to imply that events on directories (e.g.
open/close/attrib) are
never reported with 'parent fid' , 'name in directory'?

I see no functional problem with making that distinction between directory and
non-directory, but I have a feeling that 'parent fid', 'name in
directory', 'object id',
regardless of dir/non-dir is going to be easier to document and less confusing
for users to understand, so this is my preference.

> With this information we could reliably reconstruct (some) path (we always
> have directory fid + name), we can reliably identify the object involved in
> the change (we always have object fid). I'd still prefer if we obfuscated
> directory events, without possibility of filtering based of
> CREATE/DELETE/MOVE (i.e., just one FAN_DIR_MODIFY event for this fanotify
> group) - actually I have hard time coming with a usecase where application
> would care about one type of event and not the other one. The other events
> remain as they are. What do you think?

That sounds like a plan.
I have no problem with the FAN_DIR_MODIFY obfuscation.

Will re-work the patches and demo.

Thanks,
Amir.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: File monitor problem
  2020-01-07 18:56                             ` Amir Goldstein
@ 2020-01-08  9:04                               ` Jan Kara
  2020-01-08 10:25                                 ` Amir Goldstein
  0 siblings, 1 reply; 25+ messages in thread
From: Jan Kara @ 2020-01-08  9:04 UTC (permalink / raw)
  To: Amir Goldstein
  Cc: Jan Kara, Matthew Wilcox, Mo Re Ra, linux-fsdevel, Wez Furlong,
	Matthew Bobrowski, Linux API

On Tue 07-01-20 20:56:20, Amir Goldstein wrote:
> On Tue, Jan 7, 2020 at 7:10 PM Jan Kara <jack@suse.cz> wrote:
> > > There is another weird way to obfuscate the event type.
> > > I am not sure if users will be less confused about it:
> > > Each event type belongs to a group (i.e. self, dirent, poss_on_child)
> > > User may set any event type in the mask (e.g. create|delete|open|close)
> > > When getting an event from event group A (e.g. create), all event types
> > > of that group will be reported (e.g. create|delete).
> > >
> > > To put it another way:
> > > #define FAN_DIR_MODIFY (FAN_CREATE | FAN_MOVE | FAN_DELETE)
> > >
> > > For example in fanotify_group_event_mask():
> > > if (event_with_name) {
> > >     if (marks_mask & test_mask & FAN_DIR_MODIFY)
> > >         test_mask |= marks_mask & FAN_DIR_MODIFY
> > > ...
> > >
> > > Did somebody say over-engineering? ;)
> > >
> > > TBH, I don't see how we can do event type obfuscation
> > > that is both usable and not confusing, because the concept is
> > > confusing. I understand the reasoning behind it, but I don't think
> > > that many users will.
> > >
> > > I'm hoping that you can prove me wrong and find a way to simplify
> > > the API while retaining fair usability.
> >
> > I was thinking about this. If I understand the problem right, depending on
> > the usecase we may need with each event some subset of 'object fid',
> > 'directory fid', 'name in directory'. So what if we provided all these
> > three things in each event? Events will get somewhat bloated but it may be
> > bearable.
> >
> 
> I agree.
> 
> What I like about the fact that users don't need to choose between
> 'parent fid' and 'object fid' is that it makes some hard questions go away:
> 1. How are "self" events reported? simple - just with 'object id'
> 2. How are events on disconnected dentries reported? simple - just
> with 'object id'
> 3. How are events on the root of the watch reported? same answer
> 
> Did you write 'directory fid' as opposed to 'parent fid' for a reason?
> Was it your intention to imply that events on directories (e.g.
> open/close/attrib) are
> never reported with 'parent fid' , 'name in directory'?

Yes, that was what I thought.
 
> I see no functional problem with making that distinction between directory and
> non-directory, but I have a feeling that 'parent fid', 'name in
> directory', 'object id',
> regardless of dir/non-dir is going to be easier to document and less confusing
> for users to understand, so this is my preference.

Understood. The reason why I decided like this is that for a directory,
the parent may be actually on a different filesystem (so generating fid
will be more difficult) and also that what you get from dentry->d_parent
need not be the dir through which you actually reached the directory (think
of bind mounts) which could be a bit confusing. So I have no problem with
always providing 'parent fid' if we can give good answers to these
questions...

								Honza

-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: File monitor problem
  2020-01-08  9:04                               ` Jan Kara
@ 2020-01-08 10:25                                 ` Amir Goldstein
  2020-01-08 12:04                                   ` Jan Kara
  0 siblings, 1 reply; 25+ messages in thread
From: Amir Goldstein @ 2020-01-08 10:25 UTC (permalink / raw)
  To: Jan Kara
  Cc: Matthew Wilcox, Mo Re Ra, linux-fsdevel, Wez Furlong,
	Matthew Bobrowski, Linux API

> > What I like about the fact that users don't need to choose between
> > 'parent fid' and 'object fid' is that it makes some hard questions go away:
> > 1. How are "self" events reported? simple - just with 'object id'
> > 2. How are events on disconnected dentries reported? simple - just
> > with 'object id'
> > 3. How are events on the root of the watch reported? same answer
> >
> > Did you write 'directory fid' as opposed to 'parent fid' for a reason?
> > Was it your intention to imply that events on directories (e.g.
> > open/close/attrib) are
> > never reported with 'parent fid' , 'name in directory'?
>
> Yes, that was what I thought.
>
> > I see no functional problem with making that distinction between directory and
> > non-directory, but I have a feeling that 'parent fid', 'name in
> > directory', 'object id',
> > regardless of dir/non-dir is going to be easier to document and less confusing
> > for users to understand, so this is my preference.
>
> Understood. The reason why I decided like this is that for a directory,
> the parent may be actually on a different filesystem (so generating fid
> will be more difficult) and also that what you get from dentry->d_parent
> need not be the dir through which you actually reached the directory (think
> of bind mounts) which could be a bit confusing. So I have no problem with
> always providing 'parent fid' if we can give good answers to these
> questions...
>

Actually, my current code in branch fanotify_name already takes care of
some of those cases and it is rather easy to deal with the bind mount case
if path is available:

      if (path && path->mnt->mnt_root != dentry)
               mnt = real_mount(path->mnt);

      /* Not reporting parent fid + name for fs root, bind mount root and
         disconnected dentry */
      if (!IS_ROOT(dentry) && (!path || mnt))
               marks_mask |= fsnotify_watches_children(
                                               dentry->d_sb->s_fsnotify_mask);
      if (mnt)
               marks_mask |= fsnotify_watches_children(
                                               mnt->mnt_fsnotify_mask);

Note that a non-dir can also be bind mounted, so the concern you raised is
actually not limited to directories.
It is true that with the code above, FAN_ATTRIB and FAN_MODIFY (w/o path)
could still be reported to sb mark with the parent/name under the bind mount,
but that is not wrong at all IMO - I would say that is the expected behavior for
a filesystem mark.

IOW, the answer to your question, phrased in man page terminology is:
(parent fid + name) information is not guarantied to be available except for
FAN_DIR_MODIFY, but it may be available as extra info in addition to object fid
for events that are possible on child.

For example, an application relying on (parent fid + name) information for
FAN_MODIFY (e.g. for remote mirror) simply cannot get this information when
nfsd opens a file with a disconnected dentry and writes to it.

TBH, I am not convinced myself that reporting (parent fid + name) for
directories
is indeed easier to document/implement than treating directories different that
non-directories, but I am going to try and implement it this way and prepare a
draft man page update to see how it looks - I may yet change my mind after
going through this process...

Thanks,
Amir.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: File monitor problem
  2020-01-08 10:25                                 ` Amir Goldstein
@ 2020-01-08 12:04                                   ` Jan Kara
  0 siblings, 0 replies; 25+ messages in thread
From: Jan Kara @ 2020-01-08 12:04 UTC (permalink / raw)
  To: Amir Goldstein
  Cc: Jan Kara, Matthew Wilcox, Mo Re Ra, linux-fsdevel, Wez Furlong,
	Matthew Bobrowski, Linux API

On Wed 08-01-20 12:25:55, Amir Goldstein wrote:
> > > What I like about the fact that users don't need to choose between
> > > 'parent fid' and 'object fid' is that it makes some hard questions go away:
> > > 1. How are "self" events reported? simple - just with 'object id'
> > > 2. How are events on disconnected dentries reported? simple - just
> > > with 'object id'
> > > 3. How are events on the root of the watch reported? same answer
> > >
> > > Did you write 'directory fid' as opposed to 'parent fid' for a reason?
> > > Was it your intention to imply that events on directories (e.g.
> > > open/close/attrib) are
> > > never reported with 'parent fid' , 'name in directory'?
> >
> > Yes, that was what I thought.
> >
> > > I see no functional problem with making that distinction between directory and
> > > non-directory, but I have a feeling that 'parent fid', 'name in
> > > directory', 'object id',
> > > regardless of dir/non-dir is going to be easier to document and less confusing
> > > for users to understand, so this is my preference.
> >
> > Understood. The reason why I decided like this is that for a directory,
> > the parent may be actually on a different filesystem (so generating fid
> > will be more difficult) and also that what you get from dentry->d_parent
> > need not be the dir through which you actually reached the directory (think
> > of bind mounts) which could be a bit confusing. So I have no problem with
> > always providing 'parent fid' if we can give good answers to these
> > questions...
> >
> 
> Actually, my current code in branch fanotify_name already takes care of
> some of those cases and it is rather easy to deal with the bind mount case
> if path is available:
> 
>       if (path && path->mnt->mnt_root != dentry)
>                mnt = real_mount(path->mnt);
> 
>       /* Not reporting parent fid + name for fs root, bind mount root and
>          disconnected dentry */
>       if (!IS_ROOT(dentry) && (!path || mnt))
>                marks_mask |= fsnotify_watches_children(
>                                                dentry->d_sb->s_fsnotify_mask);
>       if (mnt)
>                marks_mask |= fsnotify_watches_children(
>                                                mnt->mnt_fsnotify_mask);
> 
> Note that a non-dir can also be bind mounted, so the concern you raised is
> actually not limited to directories.
> It is true that with the code above, FAN_ATTRIB and FAN_MODIFY (w/o path)
> could still be reported to sb mark with the parent/name under the bind mount,
> but that is not wrong at all IMO - I would say that is the expected behavior for
> a filesystem mark.

Yes, good points.

> IOW, the answer to your question, phrased in man page terminology is:
> (parent fid + name) information is not guarantied to be available except for
> FAN_DIR_MODIFY, but it may be available as extra info in addition to object fid
> for events that are possible on child.
> 
> For example, an application relying on (parent fid + name) information for
> FAN_MODIFY (e.g. for remote mirror) simply cannot get this information when
> nfsd opens a file with a disconnected dentry and writes to it.

And another good point.

> TBH, I am not convinced myself that reporting (parent fid + name) for
> directories
> is indeed easier to document/implement than treating directories different that
> non-directories, but I am going to try and implement it this way and prepare a
> draft man page update to see how it looks - I may yet change my mind after
> going through this process...

Ok, fair enough. I guess you've convinced me that both 'parent fid' and
'directory fid' approaches are somewhat messy so whatever ends up being
simplier / easier to understand is good :).

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 25+ messages in thread

end of thread, other threads:[~2020-01-08 12:04 UTC | newest]

Thread overview: 25+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-12-04 10:02 File monitor problem Mo Re Ra
2019-12-04 12:53 ` Amir Goldstein
2019-12-04 14:24   ` Mo Re Ra
2019-12-04 17:34     ` Jan Kara
2019-12-04 18:37       ` Amir Goldstein
2019-12-04 19:02         ` Matthew Wilcox
2019-12-04 20:27           ` Amir Goldstein
2019-12-11 10:06             ` Jan Kara
2019-12-11 13:58               ` Amir Goldstein
2019-12-16 15:00                 ` Amir Goldstein
2019-12-19  7:33                   ` Amir Goldstein
2019-12-23 18:19                     ` Jan Kara
2019-12-23 19:14                       ` Amir Goldstein
2019-12-24  3:49                         ` Amir Goldstein
2019-12-31 11:53                           ` Amir Goldstein
2020-01-07 17:10                           ` Jan Kara
2020-01-07 18:56                             ` Amir Goldstein
2020-01-08  9:04                               ` Jan Kara
2020-01-08 10:25                                 ` Amir Goldstein
2020-01-08 12:04                                   ` Jan Kara
2019-12-07 12:36       ` Mo Re Ra
2019-12-10 16:55         ` Jan Kara
2019-12-10 20:49           ` Amir Goldstein
2019-12-11 22:06             ` Wez Furlong
2019-12-12  5:56               ` Amir Goldstein

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.