All of lore.kernel.org
 help / color / mirror / Atom feed
* [GIT PULL] notification tree - try 37!
@ 2010-08-06 15:58 Eric Paris
  2010-08-06 23:34 ` Matt Helsley
  0 siblings, 1 reply; 46+ messages in thread
From: Eric Paris @ 2010-08-06 15:58 UTC (permalink / raw)
  To: torvalds; +Cc: linux-kernel, viro, akpm

Here it is again!  Another notification pull request!  There is still
future work to be done on notification, but nothing that I believe
others would call blocking or functional.  The work I plan to do
includes:

1) Al has discussed the addition of a file_clone() call in the VFS which
would eliminate my use of __dentry_open() and would allow the removal of
the rather hideous FMODE_/O_NONOTIFY.
2) Al ask that the multiplexer from hell interfaces between fsnotify.h
and fsnotify.c be broken into multiple smaller cleaner functions.
3) Andrew has ask that I reduce the amount of code in the static inlines
in fsnotify.h by moving more to fsnotify.c.

- Eric


The following changes since commit fc1caf6eafb30ea185720e29f7f5eccca61ecd60:
  Linus Torvalds (1):
        Merge branch 'drm-core-next' of git://git.kernel.org/.../airlied/drm-2.6

are available in the git repository at:

  git://git.infradead.org/users/eparis/notify.git for-linus

Alexey Dobriyan (1):
      dnotify: move dir_notify_enable declaration

Andreas Gruenbacher (16):
      fsnotify: kill FSNOTIFY_EVENT_FILE
      fsnotify: take inode->i_lock inside fsnotify_find_mark_entry()
      fanotify: create_fd cleanup
      fanotify: Add pids to events
      fsnotify/vfsmount: add fsnotify fields to struct vfsmount
      fsnotify: Infrastructure for per-mount watches
      fanotify: remove fanotify_update_mark
      fanotify: do not call fanotify_update_object_mask in fanotify_remove_mark
      fanotify: do not call fanotify_update_object_mask in fanotify_add_mark
      fanotify: do not return pointer from fanotify_add_*_mark
      fanotify: remove fanotify_add_mark
      fanotify: rename FAN_MARK_ON_VFSMOUNT to FAN_MARK_MOUNT
      fanotify: split fanotify_remove_mark
      fanotify: remove fanotify.h declarations
      fanotify: remove outgoing function checks in fanotify.h
      fsnotify: Exchange list heads instead of moving elements

Dave Young (1):
      sysctl extern cleanup: inotify

Eric Paris (107):
      inotify: simplify the inotify idr handling
      Audit: clean up the audit_watch split
      audit: convert audit watches to use fsnotify instead of inotify
      audit: redo audit watch locking and refcnt in light of fsnotify
      audit: do not get and put just to free a watch
      fsnotify: duplicate fsnotify_mark_entry data between 2 marks
      fsnotify: allow addition of duplicate fsnotify marks
      audit: reimplement audit_trees using fsnotify rather than inotify
      Audit: audit watches depend on fsnotify
      Audit: split audit watch Kconfig
      Audit: audit watch init should not be before fsnotify init
      fsnotify: use fsnotify_create_event to allocate the q_overflow event
      inotify: use container_of instead of casting
      fsnotify: kzalloc fsnotify groups
      fsnotify: use kmem_cache_zalloc to simplify event initialization
      inotify: do not reuse watch descriptors
      inotify: remove inotify in kernel interface
      inotify: do not spam console without limit
      fsnotify: provide the data type to should_send_event
      fsnotify: include data in should_send calls
      fsnotify: pass a file instead of an inode to open, read, and write
      fsnotify: send struct file when sending events to parents when possible
      fsnotify: per group notification queue merge types
      fsnotify: clone existing events
      fsnotify: replace an event on a list
      fsnotify: lock annotation for event replacement
      fsnotify: remove group_num altogether
      fsnotify: fsnotify_obtain_group kzalloc cleanup
      fsnotify: fsnotify_obtain_group should be fsnotify_alloc_group
      Audit: only set group mask when something is being watched
      fsnotify: drop mask argument from fsnotify_alloc_group
      fsnotify: rename fsnotify_groups to fsnotify_inode_groups
      fsnotify: initialize the group->num_marks in a better place
      fsnotify: add groups to fsnotify_inode_groups when registering inode watch
      fsnotify: mount point listeners list and global mask
      fsnotify: include vfsmount in should_send_event when appropriate
      fsnotify: put inode specific fields in an fsnotify_mark in a union
      fsnotify: add vfsmount specific fields to the fsnotify_mark_entry union
      fsnotify: add flags to fsnotify_mark_entries
      fsnotify: rename fsnotify_mark_entry to just fsnotify_mark
      fsnotify: rename fsnotify_find_mark_entry to fsnotify_find_mark
      fsnotify: rename mark_entry to just mark
      inotify: rename mark_entry to just mark
      dnotify: rename mark_entry to mark
      vfs: introduce FMODE_NONOTIFY
      fanotify: fscking all notification system
      fanotify:drop notification if they exist in the outgoing queue
      fanotify: merge notification events with different masks
      fanotify: do not clone on merge unless needed
      fanotify: fanotify_init syscall declaration
      fanotify: fanotify_init syscall implementation
      fanotify: sys_fanotify_mark declartion
      fanotify: fanotify_mark syscall implementation
      fanotify: send events using read
      fsnotify: split generic and inode specific mark code
      fsnotify: clear marks to 0 in fsnotify_init_mark
      fsnotify: vfsmount marks generic functions
      fanotify: should_send_event needs to handle vfsmounts
      fanotify: infrastructure to add an remove marks on vfsmounts
      fanotify: hooks the fanotify_mark syscall to the vfsmount code
      fsnotify: allow marks to not pin inodes in core
      fsnotify: ignored_mask - excluding notification
      fanotify: ignored_mask to ignore events
      fanotify: allow users to set an ignored_mask
      fsnotify: clear ignored mask on modify
      fsnotify: allow ignored_mask to survive modification
      fanotify: allow ignored_masks to survive modify
      fanotify: clear all fanotify marks
      fsnotify: add group priorities
      fsnotify: intoduce a notification merge argument
      fanotify: use merge argument to determine actual event added to queue
      fsnotify: use unsigned char * for dentry->d_name.name
      fsnotify: new fsnotify hooks and events types for access decisions
      fanotify: permissions and blocking
      fanotify: userspace interface for permission responses
      fanotify: do not return 0 in a void function
      fsnotify: call iput on inodes when no longer marked
      fsnotify: initialize mask in fsnotify_perm
      fanotify: default Kconfig to n
      fanotify: drop the useless priority argument
      inotify: fix inotify oneshot support
      inotify: send IN_UNMOUNT events
      inotify: allow users to request not to recieve events on unlinked children
      inotify: force inotify and fsnotify use same bits
      fsnotify: check to make sure all fsnotify bits are unique
      fanotify: groups can specify their f_flags for new fd
      fsnotify: add pr_debug throughout
      fsnotify: fsnotify_add_notify_event should return an event
      fsnotify: store struct file not struct path
      vfs/fsnotify: fsnotify_close can delay the final work in fput
      fsnotify: place marks on object in order of group memory address
      fsnotify: use _rcu functions for mark list traversal
      fsnotify: use an explicit flag to indicate fsnotify_destroy_mark has been called
      fsnotify: srcu to protect read side of inode and vfsmount locks
      fsnotify: send fsnotify_mark to groups in event handling functions
      inotify: use the mark in handler functions
      dnotify: use the mark in handler functions
      audit: use the mark in handler functions
      fanotify: use the mark in handler functions
      fsnotify: cleanup should_send_event
      fsnotify: remove the global masks
      fsnotify: remove group->mask
      fsnotify: remove global fsnotify groups lists
      fsnotify: rework ignored mark flushing
      fsnotify: walk the inode and vfsmount lists simultaneously
      fsnotify: pass both the vfsmount mark and inode mark
      fanotify: use both marks when possible

H Hartley Sweeten (1):
      inotify_user.c: make local symbol static

Heiko Carstens (1):
      fanotify: CONFIG_HAVE_SYSCALL_WRAPPERS for sys_fanotify_mark

Jean-Christophe Dubois (1):
      fanotify: do not always return 0 in fsnotify

Jerome Marchand (1):
      inotify: Fix mask checks

Paul Mundt (1):
      fanotify: select ANON_INODES.

Signed-off-by: Wu Fengguang (1):
      fanotify: FMODE_NONOTIFY and __O_SYNC in sparc conflict

Tejun Heo (1):
      fsnotify: update gfp/slab.h includes

 Documentation/feature-removal-schedule.txt |    8 -
 arch/x86/ia32/ia32entry.S                  |    2 +
 arch/x86/ia32/sys_ia32.c                   |    9 +
 arch/x86/include/asm/sys_ia32.h            |    3 +
 arch/x86/include/asm/unistd_32.h           |    4 +-
 arch/x86/include/asm/unistd_64.h           |    4 +
 arch/x86/kernel/syscall_table_32.S         |    2 +
 fs/compat.c                                |    5 +-
 fs/exec.c                                  |    4 +-
 fs/file_table.c                            |    9 +
 fs/inode.c                                 |    8 +-
 fs/namei.c                                 |    2 +-
 fs/namespace.c                             |    5 +
 fs/nfsd/vfs.c                              |    4 +-
 fs/notify/Kconfig                          |    1 +
 fs/notify/Makefile                         |    4 +-
 fs/notify/dnotify/dnotify.c                |  213 +++----
 fs/notify/fanotify/Kconfig                 |   26 +
 fs/notify/fanotify/Makefile                |    1 +
 fs/notify/fanotify/fanotify.c              |  212 +++++++
 fs/notify/fanotify/fanotify_user.c         |  760 ++++++++++++++++++++++++
 fs/notify/fsnotify.c                       |  201 ++++++--
 fs/notify/fsnotify.h                       |   27 +-
 fs/notify/group.c                          |  182 +------
 fs/notify/inode_mark.c                     |  331 ++++--------
 fs/notify/inotify/Kconfig                  |   15 -
 fs/notify/inotify/Makefile                 |    1 -
 fs/notify/inotify/inotify.c                |  873 ----------------------------
 fs/notify/inotify/inotify.h                |    7 +-
 fs/notify/inotify/inotify_fsnotify.c       |  151 ++++--
 fs/notify/inotify/inotify_user.c           |  369 ++++++++-----
 fs/notify/mark.c                           |  371 ++++++++++++
 fs/notify/notification.c                   |  236 +++++---
 fs/notify/vfsmount_mark.c                  |  187 ++++++
 fs/open.c                                  |    3 +-
 fs/read_write.c                            |    8 +-
 include/asm-generic/fcntl.h                |    8 +
 include/linux/Kbuild                       |    1 +
 include/linux/dnotify.h                    |    1 +
 include/linux/fanotify.h                   |  105 ++++
 include/linux/fs.h                         |   16 +-
 include/linux/fsnotify.h                   |  161 +++---
 include/linux/fsnotify_backend.h           |  211 +++++---
 include/linux/inotify.h                    |  185 +------
 include/linux/mount.h                      |    6 +-
 include/linux/security.h                   |    1 +
 include/linux/syscalls.h                   |    4 +
 init/Kconfig                               |   10 +-
 kernel/Makefile                            |    5 +-
 kernel/audit.c                             |    1 -
 kernel/audit.h                             |   26 +-
 kernel/audit_tree.c                        |  237 +++++----
 kernel/audit_watch.c                       |  274 +++++----
 kernel/auditfilter.c                       |   39 +-
 kernel/auditsc.c                           |   10 +-
 kernel/sys_ni.c                            |    4 +
 kernel/sysctl.c                            |    7 +-
 security/security.c                        |   16 +-
 58 files changed, 3197 insertions(+), 2379 deletions(-)
 create mode 100644 fs/notify/fanotify/Kconfig
 create mode 100644 fs/notify/fanotify/Makefile
 create mode 100644 fs/notify/fanotify/fanotify.c
 create mode 100644 fs/notify/fanotify/fanotify_user.c
 delete mode 100644 fs/notify/inotify/inotify.c
 create mode 100644 fs/notify/mark.c
 create mode 100644 fs/notify/vfsmount_mark.c
 create mode 100644 include/linux/fanotify.h


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [GIT PULL] notification tree - try 37!
  2010-08-06 15:58 [GIT PULL] notification tree - try 37! Eric Paris
@ 2010-08-06 23:34 ` Matt Helsley
  2010-08-07  0:06   ` Christoph Hellwig
  0 siblings, 1 reply; 46+ messages in thread
From: Matt Helsley @ 2010-08-06 23:34 UTC (permalink / raw)
  To: Eric Paris; +Cc: torvalds, linux-kernel, viro, akpm

On Fri, Aug 06, 2010 at 11:58:39AM -0400, Eric Paris wrote:
> Here it is again!  Another notification pull request!  There is still
> future work to be done on notification, but nothing that I believe
> others would call blocking or functional.  The work I plan to do
> includes:
> 
> 1) Al has discussed the addition of a file_clone() call in the VFS which
> would eliminate my use of __dentry_open() and would allow the removal of
> the rather hideous FMODE_/O_NONOTIFY.

I did a quick search and can't find a mailing list post on this. Was
it a private discussion or is there something I can read about what
file_clone() will do?

Cheers,
	-Matt Helsley

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [GIT PULL] notification tree - try 37!
  2010-08-06 23:34 ` Matt Helsley
@ 2010-08-07  0:06   ` Christoph Hellwig
  2010-08-07 19:15     ` Eric Paris
  0 siblings, 1 reply; 46+ messages in thread
From: Christoph Hellwig @ 2010-08-07  0:06 UTC (permalink / raw)
  To: Matt Helsley; +Cc: Eric Paris, torvalds, linux-kernel, viro, akpm

On Fri, Aug 06, 2010 at 04:34:31PM -0700, Matt Helsley wrote:
> On Fri, Aug 06, 2010 at 11:58:39AM -0400, Eric Paris wrote:
> > Here it is again!  Another notification pull request!  There is still
> > future work to be done on notification, but nothing that I believe
> > others would call blocking or functional.  The work I plan to do
> > includes:
> > 
> > 1) Al has discussed the addition of a file_clone() call in the VFS which
> > would eliminate my use of __dentry_open() and would allow the removal of
> > the rather hideous FMODE_/O_NONOTIFY.
> 
> I did a quick search and can't find a mailing list post on this. Was
> it a private discussion or is there something I can read about what
> file_clone() will do?

I'm also totally missing on any re-post of these patches or discussion
of the changes during the last development window.


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [GIT PULL] notification tree - try 37!
  2010-08-07  0:06   ` Christoph Hellwig
@ 2010-08-07 19:15     ` Eric Paris
  2010-08-07 20:55       ` Matt Helsley
  2010-08-16 20:32       ` Andreas Gruenbacher
  0 siblings, 2 replies; 46+ messages in thread
From: Eric Paris @ 2010-08-07 19:15 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Matt Helsley, torvalds, linux-kernel, viro, akpm

On Fri, 2010-08-06 at 20:06 -0400, Christoph Hellwig wrote:
> On Fri, Aug 06, 2010 at 04:34:31PM -0700, Matt Helsley wrote:
> > On Fri, Aug 06, 2010 at 11:58:39AM -0400, Eric Paris wrote:
> > > Here it is again!  Another notification pull request!  There is still
> > > future work to be done on notification, but nothing that I believe
> > > others would call blocking or functional.  The work I plan to do
> > > includes:
> > > 
> > > 1) Al has discussed the addition of a file_clone() call in the VFS which
> > > would eliminate my use of __dentry_open() and would allow the removal of
> > > the rather hideous FMODE_/O_NONOTIFY.
> > 
> > I did a quick search and can't find a mailing list post on this. Was
> > it a private discussion or is there something I can read about what
> > file_clone() will do?

No, it was from a face to face meeting and a couple of irc conversations
talk about all of this stuff.  My understanding was that it was going to
be a lot like dentry_open() only it was going to require a valid struct
file and would return a new struct file.  One of the purposes of the new
interface being the ability to set f_mode at a better time to eliminate
the FMODE/O_ overlapping horror that fanotify requires to prevent
recursion and deadlock.

> I'm also totally missing on any re-post of these patches or discussion
> of the changes during the last development window.

I just searched lkml an fsdevel where I usually send everything don't
see then.  I totally failed.  I'm used to not hearing a peep about my
patches so I never noticed the lack of feedback.  I would have sworn the
last set of patches I sent to both Al and the list, but apparently I
only ever sent stuff to Al.  Looks like all changes between
f874e1ac21d770846 and 1968f5eed54ce47bde4 are the ones of interest.
They are all notification internal except for one:  

http://git.infradead.org/users/eparis/notify.git/commitdiff/c1e5c954020e123d30b4abf4038ce501861bcf9f

It screws with struct file refcnting.

Diffstat of when I don't see posted (well some of it is, but most isn't)
is below....

 fs/file_table.c                      |    9 +
 fs/notify/dnotify/dnotify.c          |   43 +-----
 fs/notify/fanotify/fanotify.c        |  221 +++++++++++++---------------------
 fs/notify/fanotify/fanotify_user.c   |   35 +----
 fs/notify/fsnotify.c                 |  226 ++++++++++++++++++++---------------
 fs/notify/fsnotify.h                 |   18 --
 fs/notify/group.c                    |  168 --------------------------
 fs/notify/inode_mark.c               |   48 +++++--
 fs/notify/inotify/inotify_fsnotify.c |   91 +++++---------
 fs/notify/inotify/inotify_user.c     |   34 +++--
 fs/notify/mark.c                     |   78 +++++++++---
 fs/notify/notification.c             |   88 +++++++++----
 fs/notify/vfsmount_mark.c            |   48 ++++---
 include/linux/fsnotify.h             |   37 ++---
 include/linux/fsnotify_backend.h     |   84 +++++--------
 kernel/audit_tree.c                  |   12 +
 kernel/audit_watch.c                 |   47 +------
 17 files changed, 578 insertions(+), 709 deletions(-)


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [GIT PULL] notification tree - try 37!
  2010-08-07 19:15     ` Eric Paris
@ 2010-08-07 20:55       ` Matt Helsley
  2010-08-16 20:32       ` Andreas Gruenbacher
  1 sibling, 0 replies; 46+ messages in thread
From: Matt Helsley @ 2010-08-07 20:55 UTC (permalink / raw)
  To: Eric Paris
  Cc: Christoph Hellwig, Matt Helsley, torvalds, linux-kernel, viro, akpm

On Sat, Aug 07, 2010 at 03:15:14PM -0400, Eric Paris wrote:
> On Fri, 2010-08-06 at 20:06 -0400, Christoph Hellwig wrote:
> > On Fri, Aug 06, 2010 at 04:34:31PM -0700, Matt Helsley wrote:
> > > On Fri, Aug 06, 2010 at 11:58:39AM -0400, Eric Paris wrote:
> > > > Here it is again!  Another notification pull request!  There is still
> > > > future work to be done on notification, but nothing that I believe
> > > > others would call blocking or functional.  The work I plan to do
> > > > includes:
> > > > 
> > > > 1) Al has discussed the addition of a file_clone() call in the VFS which
> > > > would eliminate my use of __dentry_open() and would allow the removal of
> > > > the rather hideous FMODE_/O_NONOTIFY.
> > > 
> > > I did a quick search and can't find a mailing list post on this. Was
> > > it a private discussion or is there something I can read about what
> > > file_clone() will do?
> 
> No, it was from a face to face meeting and a couple of irc conversations
> talk about all of this stuff.  My understanding was that it was going to
> be a lot like dentry_open() only it was going to require a valid struct
> file and would return a new struct file.  One of the purposes of the new
> interface being the ability to set f_mode at a better time to eliminate
> the FMODE/O_ overlapping horror that fanotify requires to prevent
> recursion and deadlock.

Thanks Eric, that's the information I was looking for. I was curious
because there's a chance file_clone() as you described it may also be useful
for checkpoint/restart.

Cheers,
	-Matt Helsley

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [GIT PULL] notification tree - try 37!
  2010-08-07 19:15     ` Eric Paris
  2010-08-07 20:55       ` Matt Helsley
@ 2010-08-16 20:32       ` Andreas Gruenbacher
  2010-08-17  3:39         ` Eric Paris
  2010-08-18 15:47         ` [GIT PULL] notification tree: directory events Andreas Gruenbacher
  1 sibling, 2 replies; 46+ messages in thread
From: Andreas Gruenbacher @ 2010-08-16 20:32 UTC (permalink / raw)
  To: Eric Paris
  Cc: Christoph Hellwig, Matt Helsley, torvalds, linux-kernel, viro,
	akpm, Michael Kerrisk

On Saturday 07 August 2010 21:15:14 Eric Paris wrote:
> On Fri, 2010-08-06 at 20:06 -0400, Christoph Hellwig wrote:
> > I'm also totally missing on any re-post of these patches or discussion
> > of the changes during the last development window.
> 
> I just searched lkml an fsdevel where I usually send everything don't
> see then.  I totally failed.

Oh yes.

This introduces two new syscalls which will be impossible to fix up after the 
fact, and those system calls are poorly documented: commits 2a3edf86 and 
52c923dd document the initial versions (in the commit message!), but 
subsequent commits then extend that interface.  The interface for replying to 
events is not documented at all beyond the example code [1].  There is no 
documentation in Documentation/filesystems/, either.

	[1] http://people.redhat.com/~eparis/fanotify/


Q: What happens when a process watching for FAN_OPEN_PERM or FAN_ACCESS_PERM 
events exits or dies while events are in flight?  I can't see anything in the 
code that would wake sleeping processes up when the fsnotify_group of the 
listener is torn down.


Q: What prevents the system from going out of memory when a listener decides 
to stop reading events or simply can't keep up?  There doesn't seem to be a 
limit on the queue depth.  Listeners currently need CAP_SYS_ADMIN, but somehow 
limiting the queue depth and throttling when things start to go bad still 
sounds like a reasonable thing to do, right?)


Don't get me wrong, I really appreciate your work on this (this shows through 
the patches I've committed), and what we have now is a lot better than what we 
had before.


Thanks,
Andreas

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [GIT PULL] notification tree - try 37!
  2010-08-16 20:32       ` Andreas Gruenbacher
@ 2010-08-17  3:39         ` Eric Paris
  2010-08-17  4:03           ` Matt Helsley
                             ` (4 more replies)
  2010-08-18 15:47         ` [GIT PULL] notification tree: directory events Andreas Gruenbacher
  1 sibling, 5 replies; 46+ messages in thread
From: Eric Paris @ 2010-08-17  3:39 UTC (permalink / raw)
  To: Andreas Gruenbacher
  Cc: Christoph Hellwig, Matt Helsley, torvalds, linux-kernel, viro,
	akpm, Michael Kerrisk

On Mon, 2010-08-16 at 22:32 +0200, Andreas Gruenbacher wrote:
> On Saturday 07 August 2010 21:15:14 Eric Paris wrote:
> > On Fri, 2010-08-06 at 20:06 -0400, Christoph Hellwig wrote:
> > > I'm also totally missing on any re-post of these patches or discussion
> > > of the changes during the last development window.
> > 
> > I just searched lkml an fsdevel where I usually send everything don't
> > see then.  I totally failed.
> 
> Oh yes.
> 
> This introduces two new syscalls which will be impossible to fix up after the 
> fact, and those system calls are poorly documented: commits 2a3edf86 and 
> 52c923dd document the initial versions (in the commit message!), but 
> subsequent commits then extend that interface.  The interface for replying to 
> events is not documented at all beyond the example code [1].  There is no 
> documentation in Documentation/filesystems/, either.
> 
> 	[1] http://people.redhat.com/~eparis/fanotify/

I'll work on documentation.  Although it should be pointed out that the
interface was sent to list many times with lots of discussion and
feedback.  The only patches that didn't make the list were the last
couple which changed internal notification semantics (and fscked with
fput() but that patch, which caused problems, was specifically pointed
out in this thread and reverted).

> Q: What happens when a process watching for FAN_OPEN_PERM or FAN_ACCESS_PERM 
> events exits or dies while events are in flight?  I can't see anything in the 
> code that would wake sleeping processes up when the fsnotify_group of the 
> listener is torn down.

We can get stuck.  There was code which cleaned that up, but it got
accidentally removed long ago when, upon review on list, I was told to
remove all timeout code.  It's easy enough to fix up.  I'll post a patch
this week.

> Q: What prevents the system from going out of memory when a listener decides 
> to stop reading events or simply can't keep up?  There doesn't seem to be a 
> limit on the queue depth.  Listeners currently need CAP_SYS_ADMIN, but somehow 
> limiting the queue depth and throttling when things start to go bad still 
> sounds like a reasonable thing to do, right?)

It's an interesting question and obviously one that I've thought about.
You remember when we talked previously I said the hardest part left was
allowing non-root users to use the interface.  It gets especially
difficult when thinking about perm-events.  I was specifically told not
to timeout or drop those.  But when dealing with non-root users using
perm events?   As for pure notification we can do something like inotify
does quite easily.

I'm not certain exactly what the best semantics are for non trusted
users, so I didn't push any patches that way.  Suggestions welcome   :)

-Eric


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [GIT PULL] notification tree - try 37!
       [not found]           ` <1282016387.21419.113.camel-u/cB4NFi02V49Jlha2NJH1aTQe2KTcn/@public.gmane.org>
@ 2010-08-17  4:03             ` Matt Helsley
  0 siblings, 0 replies; 46+ messages in thread
From: Matt Helsley @ 2010-08-17  4:03 UTC (permalink / raw)
  To: Eric Paris
  Cc: Biederman Eric Biederman, Containers, Michael Kerrisk,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA, Christoph Hellwig,
	akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b, Andreas Gruenbacher,
	torvalds-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b,
	viro-RmSDqhL/yNMiFSDQTTA3OLVCufUGDwFn

On Mon, Aug 16, 2010 at 11:39:47PM -0400, Eric Paris wrote:
> On Mon, 2010-08-16 at 22:32 +0200, Andreas Gruenbacher wrote:
> > On Saturday 07 August 2010 21:15:14 Eric Paris wrote:
> > > On Fri, 2010-08-06 at 20:06 -0400, Christoph Hellwig wrote:
> > > > I'm also totally missing on any re-post of these patches or discussion
> > > > of the changes during the last development window.
> > > 
> > > I just searched lkml an fsdevel where I usually send everything don't
> > > see then.  I totally failed.
> > 
> > Oh yes.
> > 
> > This introduces two new syscalls which will be impossible to fix up after the 
> > fact, and those system calls are poorly documented: commits 2a3edf86 and 
> > 52c923dd document the initial versions (in the commit message!), but 
> > subsequent commits then extend that interface.  The interface for replying to 
> > events is not documented at all beyond the example code [1].  There is no 
> > documentation in Documentation/filesystems/, either.
> > 
> > 	[1] http://people.redhat.com/~eparis/fanotify/
> 
> I'll work on documentation.  Although it should be pointed out that the
> interface was sent to list many times with lots of discussion and
> feedback.  The only patches that didn't make the list were the last
> couple which changed internal notification semantics (and fscked with
> fput() but that patch, which caused problems, was specifically pointed
> out in this thread and reverted).
> 
> > Q: What happens when a process watching for FAN_OPEN_PERM or FAN_ACCESS_PERM 
> > events exits or dies while events are in flight?  I can't see anything in the 
> > code that would wake sleeping processes up when the fsnotify_group of the 
> > listener is torn down.
> 
> We can get stuck.  There was code which cleaned that up, but it got
> accidentally removed long ago when, upon review on list, I was told to
> remove all timeout code.  It's easy enough to fix up.  I'll post a patch
> this week.
> 
> > Q: What prevents the system from going out of memory when a listener decides 
> > to stop reading events or simply can't keep up?  There doesn't seem to be a 
> > limit on the queue depth.  Listeners currently need CAP_SYS_ADMIN, but somehow 
> > limiting the queue depth and throttling when things start to go bad still 
> > sounds like a reasonable thing to do, right?)
> 
> It's an interesting question and obviously one that I've thought about.
> You remember when we talked previously I said the hardest part left was
> allowing non-root users to use the interface.  It gets especially
> difficult when thinking about perm-events.  I was specifically told not
> to timeout or drop those.  But when dealing with non-root users using
> perm events?   As for pure notification we can do something like inotify
> does quite easily.
> 
> I'm not certain exactly what the best semantics are for non trusted
> users, so I didn't push any patches that way.  Suggestions welcome   :)

Hi Eric,

Sorry I haven't had a chance to look at the perm events. I think
user namespace and containers folks might be interested in them for
non-root users though.

[Adding userns/containers folks to Cc]

I'm guessing non-trusted users can be restricted to only get perm events
on stuff they already own. Plus make perm events by non-trusted users
unreliable yet failsafe -- return EACESS/EPERM when we have to timeout or
drop the events if I understand the issues correctly. Those rules plus user
namespaces would still be quite useful I think. Do they seem reasonable to
implement? Am I forgetting anything important?

Cheers,
	-Matt Helsley

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [GIT PULL] notification tree - try 37!
  2010-08-17  3:39         ` Eric Paris
@ 2010-08-17  4:03           ` Matt Helsley
       [not found]           ` <1282016387.21419.113.camel-u/cB4NFi02V49Jlha2NJH1aTQe2KTcn/@public.gmane.org>
                             ` (3 subsequent siblings)
  4 siblings, 0 replies; 46+ messages in thread
From: Matt Helsley @ 2010-08-17  4:03 UTC (permalink / raw)
  To: Eric Paris
  Cc: Andreas Gruenbacher, Christoph Hellwig, Matt Helsley, torvalds,
	linux-kernel, viro, akpm, Michael Kerrisk, Serge Hallyn,
	Biederman Eric Biederman, Containers

On Mon, Aug 16, 2010 at 11:39:47PM -0400, Eric Paris wrote:
> On Mon, 2010-08-16 at 22:32 +0200, Andreas Gruenbacher wrote:
> > On Saturday 07 August 2010 21:15:14 Eric Paris wrote:
> > > On Fri, 2010-08-06 at 20:06 -0400, Christoph Hellwig wrote:
> > > > I'm also totally missing on any re-post of these patches or discussion
> > > > of the changes during the last development window.
> > > 
> > > I just searched lkml an fsdevel where I usually send everything don't
> > > see then.  I totally failed.
> > 
> > Oh yes.
> > 
> > This introduces two new syscalls which will be impossible to fix up after the 
> > fact, and those system calls are poorly documented: commits 2a3edf86 and 
> > 52c923dd document the initial versions (in the commit message!), but 
> > subsequent commits then extend that interface.  The interface for replying to 
> > events is not documented at all beyond the example code [1].  There is no 
> > documentation in Documentation/filesystems/, either.
> > 
> > 	[1] http://people.redhat.com/~eparis/fanotify/
> 
> I'll work on documentation.  Although it should be pointed out that the
> interface was sent to list many times with lots of discussion and
> feedback.  The only patches that didn't make the list were the last
> couple which changed internal notification semantics (and fscked with
> fput() but that patch, which caused problems, was specifically pointed
> out in this thread and reverted).
> 
> > Q: What happens when a process watching for FAN_OPEN_PERM or FAN_ACCESS_PERM 
> > events exits or dies while events are in flight?  I can't see anything in the 
> > code that would wake sleeping processes up when the fsnotify_group of the 
> > listener is torn down.
> 
> We can get stuck.  There was code which cleaned that up, but it got
> accidentally removed long ago when, upon review on list, I was told to
> remove all timeout code.  It's easy enough to fix up.  I'll post a patch
> this week.
> 
> > Q: What prevents the system from going out of memory when a listener decides 
> > to stop reading events or simply can't keep up?  There doesn't seem to be a 
> > limit on the queue depth.  Listeners currently need CAP_SYS_ADMIN, but somehow 
> > limiting the queue depth and throttling when things start to go bad still 
> > sounds like a reasonable thing to do, right?)
> 
> It's an interesting question and obviously one that I've thought about.
> You remember when we talked previously I said the hardest part left was
> allowing non-root users to use the interface.  It gets especially
> difficult when thinking about perm-events.  I was specifically told not
> to timeout or drop those.  But when dealing with non-root users using
> perm events?   As for pure notification we can do something like inotify
> does quite easily.
> 
> I'm not certain exactly what the best semantics are for non trusted
> users, so I didn't push any patches that way.  Suggestions welcome   :)

Hi Eric,

Sorry I haven't had a chance to look at the perm events. I think
user namespace and containers folks might be interested in them for
non-root users though.

[Adding userns/containers folks to Cc]

I'm guessing non-trusted users can be restricted to only get perm events
on stuff they already own. Plus make perm events by non-trusted users
unreliable yet failsafe -- return EACESS/EPERM when we have to timeout or
drop the events if I understand the issues correctly. Those rules plus user
namespaces would still be quite useful I think. Do they seem reasonable to
implement? Am I forgetting anything important?

Cheers,
	-Matt Helsley

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [GIT PULL] notification tree - try 37!
  2010-08-17  3:39         ` Eric Paris
  2010-08-17  4:03           ` Matt Helsley
       [not found]           ` <1282016387.21419.113.camel-u/cB4NFi02V49Jlha2NJH1aTQe2KTcn/@public.gmane.org>
@ 2010-08-17  8:09           ` Andreas Gruenbacher
  2010-08-17 15:08             ` Eric Paris
  2010-08-20  0:00             ` Andreas Gruenbacher
  2010-08-17  8:38           ` Andreas Gruenbacher
  2010-08-17  9:45           ` Tvrtko Ursulin
  4 siblings, 2 replies; 46+ messages in thread
From: Andreas Gruenbacher @ 2010-08-17  8:09 UTC (permalink / raw)
  To: Eric Paris
  Cc: Christoph Hellwig, Matt Helsley, torvalds, linux-kernel, viro,
	akpm, Michael Kerrisk

On Tuesday 17 August 2010 05:39:47 Eric Paris wrote:
> On Mon, 2010-08-16 at 22:32 +0200, Andreas Gruenbacher wrote:
> > Q: What happens when a process watching for FAN_OPEN_PERM or
> > FAN_ACCESS_PERM events exits or dies while events are in flight?  I
> > can't see anything in the code that would wake sleeping processes up
> > when the fsnotify_group of the listener is torn down.
> 
> We can get stuck.  There was code which cleaned that up, but it got
> accidentally removed long ago when, upon review on list, I was told to
> remove all timeout code.  It's easy enough to fix up.  I'll post a patch
> this week.

This needs to be fixed then.  Not such a big deal, but it shows that the tree 
wasn't ready for being merged yet and needs further review.

> > Q: What prevents the system from going out of memory when a listener
> > decides to stop reading events or simply can't keep up?  There doesn't
> > seem to be a limit on the queue depth.  Listeners currently need
> > CAP_SYS_ADMIN, but somehow limiting the queue depth and throttling when
> > things start to go bad still sounds like a reasonable thing to do,
> > right?
> 
> It's an interesting question and obviously one that I've thought about.
> You remember when we talked previously I said the hardest part left was
> allowing non-root users to use the interface.  It gets especially
> difficult when thinking about perm-events.  I was specifically told not
> to timeout or drop those.  But when dealing with non-root users using
> perm events?   As for pure notification we can do something like inotify
> does quite easily.
> 
> I'm not certain exactly what the best semantics are for non trusted
> users, so I didn't push any patches that way.  Suggestions welcome   :)

The system will happily go OOM for trusted users and non-perm events if the 
listener doesn't keep up, so some throttling, dropping, or both needs to 
happen for non-perm events.  This is the critical case.  Doing what inotify 
does (queue an overflow event and drop further events) seems to make sense 
here.

The situation with perm-events is less severe because the number of 
outstanding perm events is bounded by the number of running processes.  This 
may be enough of a limit.

I don't think we need to worry about perm-events for untrusted users.  We can 
start supporting some kinds of non-perm-events for untrusted users later; this 
won't change the existing interface.

Thanks,
Andreas

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [GIT PULL] notification tree - try 37!
  2010-08-17  3:39         ` Eric Paris
                             ` (2 preceding siblings ...)
  2010-08-17  8:09           ` Andreas Gruenbacher
@ 2010-08-17  8:38           ` Andreas Gruenbacher
  2010-08-17 15:24             ` Eric Paris
  2010-08-17  9:45           ` Tvrtko Ursulin
  4 siblings, 1 reply; 46+ messages in thread
From: Andreas Gruenbacher @ 2010-08-17  8:38 UTC (permalink / raw)
  To: Eric Paris
  Cc: Christoph Hellwig, Matt Helsley, torvalds, linux-kernel, viro,
	akpm, Michael Kerrisk

On Tuesday 17 August 2010 05:39:47 Eric Paris wrote:
> On Mon, 2010-08-16 at 22:32 +0200, Andreas Gruenbacher wrote:
> > On Saturday 07 August 2010 21:15:14 Eric Paris wrote:
> > > On Fri, 2010-08-06 at 20:06 -0400, Christoph Hellwig wrote:
> > > > I'm also totally missing on any re-post of these patches or
> > > > discussion of the changes during the last development window.
> > > 
> > > I just searched lkml an fsdevel where I usually send everything don't
> > > see then.  I totally failed.
> > 
> > Oh yes.
> > 
> > This introduces two new syscalls which will be impossible to fix up after
> > the fact, and those system calls are poorly documented: commits 2a3edf86
> > and 52c923dd document the initial versions (in the commit message!), but
> > subsequent commits then extend that interface.  The interface for
> > replying to events is not documented at all beyond the example code [1].
> >  There is no documentation in Documentation/filesystems/, either.
> > 
> > 	[1] http://people.redhat.com/~eparis/fanotify/

Oh ... this example doesn't actually build; both syscall prototypes are wrong.  
What have you been testing this with?

> I'll work on documentation.  Although it should be pointed out that the
> interface was sent to list many times with lots of discussion and
> feedback.

One of the wonky remaining bits is the way how files are reopened with 
dentry_open() with the f_flags passed to fanotify_init().  The open can fail, 
in which case the user is left with an error condition but with no indication 
as to which object the error happened for.  What the heck?

Thanks,
Andreas

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [GIT PULL] notification tree - try 37!
  2010-08-17  3:39         ` Eric Paris
                             ` (3 preceding siblings ...)
  2010-08-17  8:38           ` Andreas Gruenbacher
@ 2010-08-17  9:45           ` Tvrtko Ursulin
  2010-08-17 10:01             ` Andreas Gruenbacher
  4 siblings, 1 reply; 46+ messages in thread
From: Tvrtko Ursulin @ 2010-08-17  9:45 UTC (permalink / raw)
  To: Eric Paris
  Cc: Andreas Gruenbacher, Christoph Hellwig, Matt Helsley, torvalds,
	linux-kernel, viro, akpm, Michael Kerrisk

On Tuesday 17 Aug 2010 04:39:47 Eric Paris wrote:
> > Q: What prevents the system from going out of memory when a listener
> > decides to stop reading events or simply can't keep up?  There doesn't
> > seem to be a limit on the queue depth.  Listeners currently need
> > CAP_SYS_ADMIN, but somehow limiting the queue depth and throttling when
> > things start to go bad still sounds like a reasonable thing to do,
> > right?)
>
> It's an interesting question and obviously one that I've thought about.
> You remember when we talked previously I said the hardest part left was
> allowing non-root users to use the interface.  It gets especially
> difficult when thinking about perm-events.  I was specifically told not
> to timeout or drop those.  But when dealing with non-root users using
> perm events?   As for pure notification we can do something like inotify
> does quite easily.

Why no timeouts? It sounds like a feasible way to work around listeners which
have stopped working. (Timeout and -ETIME for example to be clear, not
allowing access).

Alternative might be to expose queue size per group (and some additional group
info) so a daemon could keep an eye on listeners which are not making progress
and act accordingly. Sometimes appropriate action would be to restart, or to
kill, or even spawn a new one. Last bit is especially useful with some FUSE
filesystems to avoid deadlocks. Otherwise listener can get a perm event for
the top level, and then another perm event is generated when FUSE opens the
underlying object and there is noone to handle it.

But this can also work together with timeouts.

Tvrtko

Sophos Plc, The Pentagon, Abingdon Science Park, Abingdon, OX14 3YP, United Kingdom.
Company Reg No 2096520. VAT Reg No GB 348 3873 20.

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [GIT PULL] notification tree - try 37!
  2010-08-17  9:45           ` Tvrtko Ursulin
@ 2010-08-17 10:01             ` Andreas Gruenbacher
  2010-08-17 10:12               ` Tvrtko Ursulin
  0 siblings, 1 reply; 46+ messages in thread
From: Andreas Gruenbacher @ 2010-08-17 10:01 UTC (permalink / raw)
  To: Tvrtko Ursulin
  Cc: Eric Paris, Christoph Hellwig, Matt Helsley, torvalds,
	linux-kernel, viro, akpm, Michael Kerrisk

On Tuesday 17 August 2010 11:45:25 Tvrtko Ursulin wrote:
> Why no timeouts? It sounds like a feasible way to work around listeners
> which have stopped working. (Timeout and -ETIME for example to be clear,
> not allowing access).

>From the kernel's point of view, there is no way to guess how long those 
timeouts should be.  Watching for progress can be implemented in user space 
though.

Setting errno to ETIME as a result of trying to access a file is likely to 
break some applications which are not prepared to receive this error 
condition; we cannot do that.

I'm quite sure that both of these issues have been discussed already.

Andreas

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [GIT PULL] notification tree - try 37!
  2010-08-17 10:01             ` Andreas Gruenbacher
@ 2010-08-17 10:12               ` Tvrtko Ursulin
  2010-08-17 10:55                 ` Tvrtko Ursulin
  0 siblings, 1 reply; 46+ messages in thread
From: Tvrtko Ursulin @ 2010-08-17 10:12 UTC (permalink / raw)
  To: Andreas Gruenbacher
  Cc: Eric Paris, Christoph Hellwig, Matt Helsley, torvalds,
	linux-kernel, viro, akpm, Michael Kerrisk

On Tuesday 17 Aug 2010 11:01:09 Andreas Gruenbacher wrote:
> On Tuesday 17 August 2010 11:45:25 Tvrtko Ursulin wrote:
> > Why no timeouts? It sounds like a feasible way to work around listeners
> > which have stopped working. (Timeout and -ETIME for example to be clear,
> > not allowing access).
>
> From the kernel's point of view, there is no way to guess how long those
> timeouts should be.  Watching for progress can be implemented in user space
> though.

That is why I said the timeout can be configurable. But I agree (and raise you
double :) that not only the kernel can not guess how long the timeout should
be, but the application can not know as well. One possible way to let fanotify
know that listeners is making progress is via some sort of a heartbeat
message.  Then application can set the timeout as a fail safe and perm events
can take as long as needed and we still prevent clogging the system.

How would you create a robust solution purely from userspace? With the current
interface?

> Setting errno to ETIME as a result of trying to access a file is likely to
> break some applications which are not prepared to receive this error
> condition; we cannot do that.

Why you think EPERM will be handled better?

> I'm quite sure that both of these issues have been discussed already.

Ok, I obviosuly missed it. Do you have a pointer perhaps?

Tvrtko

Sophos Plc, The Pentagon, Abingdon Science Park, Abingdon, OX14 3YP, United Kingdom.
Company Reg No 2096520. VAT Reg No GB 348 3873 20.

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [GIT PULL] notification tree - try 37!
  2010-08-17 10:12               ` Tvrtko Ursulin
@ 2010-08-17 10:55                 ` Tvrtko Ursulin
  2010-08-17 15:27                   ` Eric Paris
  0 siblings, 1 reply; 46+ messages in thread
From: Tvrtko Ursulin @ 2010-08-17 10:55 UTC (permalink / raw)
  To: Andreas Gruenbacher
  Cc: Eric Paris, Christoph Hellwig, Matt Helsley, torvalds,
	linux-kernel, viro, akpm, Michael Kerrisk

On Tuesday 17 Aug 2010 11:12:41 Tvrtko Ursulin wrote:
> On Tuesday 17 Aug 2010 11:01:09 Andreas Gruenbacher wrote:
> > I'm quite sure that both of these issues have been discussed already.
>
> Ok, I obviosuly missed it. Do you have a pointer perhaps?

I have found the thread. It has been discussed but I do not find it had a
clear outcome. At the end Eric has proposed the heartbeat/in-progress option
(which IMHO can only work together with a timeout) to which no-one objected. I
did not find other arguments against such functionality solid.

Tvrtko


Sophos Plc, The Pentagon, Abingdon Science Park, Abingdon, OX14 3YP, United Kingdom.
Company Reg No 2096520. VAT Reg No GB 348 3873 20.

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [GIT PULL] notification tree - try 37!
  2010-08-17  8:09           ` Andreas Gruenbacher
@ 2010-08-17 15:08             ` Eric Paris
  2010-08-19 20:24               ` Andreas Gruenbacher
  2010-08-20  0:00             ` Andreas Gruenbacher
  1 sibling, 1 reply; 46+ messages in thread
From: Eric Paris @ 2010-08-17 15:08 UTC (permalink / raw)
  To: Andreas Gruenbacher
  Cc: Christoph Hellwig, Matt Helsley, torvalds, linux-kernel, viro,
	akpm, Michael Kerrisk

On Tue, 2010-08-17 at 10:09 +0200, Andreas Gruenbacher wrote:
> On Tuesday 17 August 2010 05:39:47 Eric Paris wrote:
> > On Mon, 2010-08-16 at 22:32 +0200, Andreas Gruenbacher wrote:
> > > Q: What happens when a process watching for FAN_OPEN_PERM or
> > > FAN_ACCESS_PERM events exits or dies while events are in flight?  I
> > > can't see anything in the code that would wake sleeping processes up
> > > when the fsnotify_group of the listener is torn down.
> > 
> > We can get stuck.  There was code which cleaned that up, but it got
> > accidentally removed long ago when, upon review on list, I was told to
> > remove all timeout code.  It's easy enough to fix up.  I'll post a patch
> > this week.
> 
> This needs to be fixed then.  Not such a big deal, but it shows that the tree 
> wasn't ready for being merged yet and needs further review.

Code with bugs, shocking!  Two other bugs have been found and patches
for those will be coming shortly.  I've begged for review how many
times?  I don't care when review it comes, I'll address any issues as
they come up.

-Eric


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [GIT PULL] notification tree - try 37!
  2010-08-17  8:38           ` Andreas Gruenbacher
@ 2010-08-17 15:24             ` Eric Paris
  2010-08-17 15:48               ` Andreas Gruenbacher
  2010-08-18 14:18               ` Andreas Gruenbacher
  0 siblings, 2 replies; 46+ messages in thread
From: Eric Paris @ 2010-08-17 15:24 UTC (permalink / raw)
  To: Andreas Gruenbacher
  Cc: Christoph Hellwig, Matt Helsley, torvalds, linux-kernel, viro,
	akpm, Michael Kerrisk

On Tue, 2010-08-17 at 10:38 +0200, Andreas Gruenbacher wrote:
> On Tuesday 17 August 2010 05:39:47 Eric Paris wrote:
> > On Mon, 2010-08-16 at 22:32 +0200, Andreas Gruenbacher wrote:
> > > On Saturday 07 August 2010 21:15:14 Eric Paris wrote:
> > > > On Fri, 2010-08-06 at 20:06 -0400, Christoph Hellwig wrote:
> > > > > I'm also totally missing on any re-post of these patches or
> > > > > discussion of the changes during the last development window.
> > > > 
> > > > I just searched lkml an fsdevel where I usually send everything don't
> > > > see then.  I totally failed.
> > > 
> > > Oh yes.
> > > 
> > > This introduces two new syscalls which will be impossible to fix up after
> > > the fact, and those system calls are poorly documented: commits 2a3edf86
> > > and 52c923dd document the initial versions (in the commit message!), but
> > > subsequent commits then extend that interface.  The interface for
> > > replying to events is not documented at all beyond the example code [1].
> > >  There is no documentation in Documentation/filesystems/, either.
> > > 
> > > 	[1] http://people.redhat.com/~eparis/fanotify/
> 
> Oh ... this example doesn't actually build; both syscall prototypes are wrong.  
> What have you been testing this with?

I updated that code, I didn't realize just how out of date it got.

> > I'll work on documentation.  Although it should be pointed out that the
> > interface was sent to list many times with lots of discussion and
> > feedback.
> 
> One of the wonky remaining bits is the way how files are reopened with 
> dentry_open() with the f_flags passed to fanotify_init().  The open can fail, 
> in which case the user is left with an error condition but with no indication 
> as to which object the error happened for.  What the heck?

What else can be done?  When notification is based on an open fd and you
can't give them an open fd, there's nothing left....


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [GIT PULL] notification tree - try 37!
  2010-08-17 10:55                 ` Tvrtko Ursulin
@ 2010-08-17 15:27                   ` Eric Paris
  0 siblings, 0 replies; 46+ messages in thread
From: Eric Paris @ 2010-08-17 15:27 UTC (permalink / raw)
  To: Tvrtko Ursulin
  Cc: Andreas Gruenbacher, Christoph Hellwig, Matt Helsley, torvalds,
	linux-kernel, viro, akpm, Michael Kerrisk

On Tue, 2010-08-17 at 11:55 +0100, Tvrtko Ursulin wrote:
> On Tuesday 17 Aug 2010 11:12:41 Tvrtko Ursulin wrote:
> > On Tuesday 17 Aug 2010 11:01:09 Andreas Gruenbacher wrote:
> > > I'm quite sure that both of these issues have been discussed already.
> >
> > Ok, I obviosuly missed it. Do you have a pointer perhaps?
> 
> I have found the thread. It has been discussed but I do not find it had a
> clear outcome. At the end Eric has proposed the heartbeat/in-progress option
> (which IMHO can only work together with a timeout) to which no-one objected. I
> did not find other arguments against such functionality solid.

You'll notice that anything anyone I respected objected to, even if I
didn't agree, I dropped or replaced.  I had such code.  We can bring it
back.  But the objection was: 'what's the point?"  They believed that
everyone would just do it in a library and end up in the 'block forever'
situation we have today.

-Eric


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [GIT PULL] notification tree - try 37!
  2010-08-17 15:24             ` Eric Paris
@ 2010-08-17 15:48               ` Andreas Gruenbacher
  2010-08-18 14:18               ` Andreas Gruenbacher
  1 sibling, 0 replies; 46+ messages in thread
From: Andreas Gruenbacher @ 2010-08-17 15:48 UTC (permalink / raw)
  To: Eric Paris, viro
  Cc: Christoph Hellwig, Matt Helsley, torvalds, linux-kernel, akpm,
	Michael Kerrisk

On Tuesday 17 August 2010 17:24:28 Eric Paris wrote:
> On Tue, 2010-08-17 at 10:38 +0200, Andreas Gruenbacher wrote:
> > One of the wonky remaining bits is the way how files are reopened with
> > dentry_open() with the f_flags passed to fanotify_init().  The open can
> > fail, in which case the user is left with an error condition but with no
> > indication as to which object the error happened for.  What the heck?
> 
> What else can be done?  When notification is based on an open fd and you
> can't give them an open fd, there's nothing left....

The main point would be to allow the listener to object the event is about.  
This could be done by returning st_dev and st_ino in the event. (i_generation 
might be useful too but we don't even return that in struct stat today.)

Another way would be to return a file pointer to a bad inode that can be 
stat() normally, and to return the error code separately.

Al might have an opinion on that.

Andreas

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [GIT PULL] notification tree - try 37!
  2010-08-17 15:24             ` Eric Paris
  2010-08-17 15:48               ` Andreas Gruenbacher
@ 2010-08-18 14:18               ` Andreas Gruenbacher
  1 sibling, 0 replies; 46+ messages in thread
From: Andreas Gruenbacher @ 2010-08-18 14:18 UTC (permalink / raw)
  To: Eric Paris
  Cc: Christoph Hellwig, Matt Helsley, torvalds, linux-kernel, viro,
	akpm, Michael Kerrisk

On Tuesday 17 August 2010 17:24:28 Eric Paris wrote:
> On Tue, 2010-08-17 at 10:38 +0200, Andreas Gruenbacher wrote:
> > On Tuesday 17 August 2010 05:39:47 Eric Paris wrote:
> > > On Mon, 2010-08-16 at 22:32 +0200, Andreas Gruenbacher wrote:
> > > > 	[1] http://people.redhat.com/~eparis/fanotify/
> > 
> > Oh ... this example doesn't actually build; both syscall prototypes are
> > wrong. What have you been testing this with?
> 
> I updated that code, I didn't realize just how out of date it got.

Thanks.  Please grab some cleanups from here:

	http://git.kernel.org/?p=linux/kernel/git/agruen/fanotify-example.git

Andreas

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [GIT PULL] notification tree: directory events
  2010-08-16 20:32       ` Andreas Gruenbacher
  2010-08-17  3:39         ` Eric Paris
@ 2010-08-18 15:47         ` Andreas Gruenbacher
  2010-08-18 15:59           ` Eric Paris
  1 sibling, 1 reply; 46+ messages in thread
From: Andreas Gruenbacher @ 2010-08-18 15:47 UTC (permalink / raw)
  To: Eric Paris
  Cc: Christoph Hellwig, Matt Helsley, torvalds, linux-kernel, viro,
	akpm, Michael Kerrisk

Eric,

one more thing that doesn't make sense is events on directories: when watching 
a directory, some actions like reading a directory or creat()ing a file in the 
directory will generate events (even open_perm and access_perm), but other 
actions like hard linking files into the directory or removing files will not 
create any events.  This means that fanotify currently cannot be used for 
watching for directory changes.

In my opinion, events on directories should either be made to actually work, 
or else no directory events at all should be generated.

Also, I can think of users of fanotify which are not concerned with directory 
events at all.  It would make sense to allow such users to subscribe only to 
file events and not receive any directory events which they will end up 
ignoring anyway.

Again it shows that this code just wasn't ready to be merged yet ...

Andreas

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [GIT PULL] notification tree: directory events
  2010-08-18 15:47         ` [GIT PULL] notification tree: directory events Andreas Gruenbacher
@ 2010-08-18 15:59           ` Eric Paris
  2010-08-18 16:42             ` Christoph Hellwig
  2010-08-19 12:44             ` Andreas Gruenbacher
  0 siblings, 2 replies; 46+ messages in thread
From: Eric Paris @ 2010-08-18 15:59 UTC (permalink / raw)
  To: Andreas Gruenbacher
  Cc: Christoph Hellwig, Matt Helsley, torvalds, linux-kernel, viro,
	akpm, Michael Kerrisk

On Wed, 2010-08-18 at 17:47 +0200, Andreas Gruenbacher wrote:
> Eric,
> 
> one more thing that doesn't make sense is events on directories: when watching 
> a directory, some actions like reading a directory or creat()ing a file in the 
> directory will generate events (even open_perm and access_perm), but other 
> actions like hard linking files into the directory or removing files will not 
> create any events.  This means that fanotify currently cannot be used for 
> watching for directory changes.
> 
> In my opinion, events on directories should either be made to actually work, 
> or else no directory events at all should be generated.
> 
> Also, I can think of users of fanotify which are not concerned with directory 
> events at all.  It would make sense to allow such users to subscribe only to 
> file events and not receive any directory events which they will end up 
> ignoring anyway.
> 
> Again it shows that this code just wasn't ready to be merged yet ...

I'm going to file your e-mail into my todo list and hopefully I get the
time to look at the ability to ignore directory events.  Nothing hard
about it.  It's as easy as defining a flag and adding a conditional in
the code but it's not high on my list.

Thus far your e-mails have pointed out one bug in the permissions
implementation I am currently working fixing and a bunch of complaining
about features you can imagine someone might someday want but which
noone has actually stood up and said 'I will use this' or 'this sucks
for my use case'.  I can find all sorts of things around the kernel
where I can imagine some mythical users might want to do something
different but it isn't a reason to prevent merger.  I'd love to have
more review, I'm certainly going to look at your wish list, but don't
expect response to future trolling messages.

-Eric


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [GIT PULL] notification tree: directory events
  2010-08-18 15:59           ` Eric Paris
@ 2010-08-18 16:42             ` Christoph Hellwig
  2010-08-18 17:07               ` Eric Paris
  2010-08-19 12:44             ` Andreas Gruenbacher
  1 sibling, 1 reply; 46+ messages in thread
From: Christoph Hellwig @ 2010-08-18 16:42 UTC (permalink / raw)
  To: Eric Paris
  Cc: Andreas Gruenbacher, Christoph Hellwig, Matt Helsley, torvalds,
	linux-kernel, viro, akpm, Michael Kerrisk

On Wed, Aug 18, 2010 at 11:59:06AM -0400, Eric Paris wrote:
> Thus far your e-mails have pointed out one bug in the permissions
> implementation I am currently working fixing and a bunch of complaining
> about features you can imagine someone might someday want but which
> noone has actually stood up and said 'I will use this' or 'this sucks
> for my use case'.  I can find all sorts of things around the kernel
> where I can imagine some mythical users might want to do something
> different but it isn't a reason to prevent merger.  I'd love to have
> more review, I'm certainly going to look at your wish list, but don't
> expect response to future trolling messages.

Eric, please stop that crap.  You've sent a pull request for stuff
that's not only been contentious but also not reviewed at all in this
form to Linus behind everyones back.  Andreas actually takes his time
to review the clusterfuck you created, so better be really quite and
listen to him. 


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [GIT PULL] notification tree: directory events
  2010-08-18 16:42             ` Christoph Hellwig
@ 2010-08-18 17:07               ` Eric Paris
  0 siblings, 0 replies; 46+ messages in thread
From: Eric Paris @ 2010-08-18 17:07 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Andreas Gruenbacher, Matt Helsley, torvalds, linux-kernel, viro,
	akpm, Michael Kerrisk

On Wed, 2010-08-18 at 12:42 -0400, Christoph Hellwig wrote:
> On Wed, Aug 18, 2010 at 11:59:06AM -0400, Eric Paris wrote:
> > Thus far your e-mails have pointed out one bug in the permissions
> > implementation I am currently working fixing and a bunch of complaining
> > about features you can imagine someone might someday want but which
> > noone has actually stood up and said 'I will use this' or 'this sucks
> > for my use case'.  I can find all sorts of things around the kernel
> > where I can imagine some mythical users might want to do something
> > different but it isn't a reason to prevent merger.  I'd love to have
> > more review, I'm certainly going to look at your wish list, but don't
> > expect response to future trolling messages.
> 
> Eric, please stop that crap.  You've sent a pull request for stuff
> that's not only been contentious but also not reviewed at all in this
> form to Linus behind everyones back.  Andreas actually takes his time
> to review the clusterfuck you created, so better be really quite and
> listen to him.

I admit that there were a number of patches created since the last merge
request was held up based on Al's review that weren't sent to list.  I
said I screwed up and pointed out what was missed before it was merged.
Clearly those changes didn't live in linux-next long enough to catch all
of their problems (namely the f_count thing everyone agreed was dirty
and broke sound)  I wasn't the only person to look at most of those
changes, but they absolutely should have been on list.  I've screwed up
on that twice.

But an implication that the idea, the interface, the event types sent
and received, how things worked, or anything like that wasn't sent to
list or that I didn't beg for review just isn't true (all of which has
been implied).

I fucked up not posting some of my internal notification reworks to
improve system performance, maintainability, and reliability.  But any
belief that 'contentious' portions of code just magically showed up
behind anyone's back or at the last minute isn't true.

Like I said, I'd love more review.  I'll gladly add more things to my
todo list if people have useful ideas.  But multiple messages suggesting
code should be reverted because it doesn't implement some imagined
feature or because the code has a bug for I'm betting well over a year
obviously bothers me.

-Eric


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [GIT PULL] notification tree: directory events
  2010-08-18 15:59           ` Eric Paris
  2010-08-18 16:42             ` Christoph Hellwig
@ 2010-08-19 12:44             ` Andreas Gruenbacher
  2010-08-19 15:00               ` Eric Paris
  1 sibling, 1 reply; 46+ messages in thread
From: Andreas Gruenbacher @ 2010-08-19 12:44 UTC (permalink / raw)
  To: Eric Paris
  Cc: Christoph Hellwig, Matt Helsley, torvalds, linux-kernel, viro,
	akpm, Michael Kerrisk

On Wednesday 18 August 2010 17:59:06 Eric Paris wrote:
> I'm going to file your e-mail into my todo list and hopefully I get the
> time to look at the ability to ignore directory events.

As far as I can remember, several people involved in the previous discussion 
agreed that a reasonable goal would be to make fanotify  a superset of 
inotify.  My understanding was that this would be the general direction of 
development.  The code apparently is not there yet; it only reports a subset 
of the relevant directory events.  (In other words, the directory event part 
of the code is currently useless.)  Given that, I was surprised to see the 
code getting merged.

Fanotify has a subset of functionality for watching and vetting regular file 
accesses which seems to be useful in its own right; the anti-malware folks 
want this part.  Implementing only this part was not what was originally 
discussed, but I can see some arguments for putting this functionality in now 
(or rather leaving it in) and adding the rest later.

The half-thought-out directory events are not part of this subset though.  
They are not useful in their own right and only generate overheads, and much 
worse, they could even get in the way when proper directory event support is 
eventually added.  So that part should really be removed ASAP.

I expect more from you than just ignoring my concerns as you imply.

> Nothing hard about it.  It's as easy as defining a flag and adding a
> conditional in the code but it's not high on my list.

We are not talking about Eric's own private syscalls here.  Things we screw up 
now may be very hard or impossible to fix later; syscalls don't just change 
from release to release.

This also applies to the error reporting mess I have commented on.  At least 
the interface should be changed so that it can report a valid file descriptor 
and an error condition at the same time; otherwise, we will end up with a 
weakness in the interface that we won't be able to fix.

Andreas

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [GIT PULL] notification tree: directory events
  2010-08-19 12:44             ` Andreas Gruenbacher
@ 2010-08-19 15:00               ` Eric Paris
  2010-08-19 23:41                 ` Andreas Gruenbacher
  0 siblings, 1 reply; 46+ messages in thread
From: Eric Paris @ 2010-08-19 15:00 UTC (permalink / raw)
  To: Andreas Gruenbacher
  Cc: Christoph Hellwig, Matt Helsley, torvalds, linux-kernel, viro,
	akpm, Michael Kerrisk

On Thu, 2010-08-19 at 14:44 +0200, Andreas Gruenbacher wrote:
> On Wednesday 18 August 2010 17:59:06 Eric Paris wrote:

> The half-thought-out directory events are not part of this subset though.  
> They are not useful in their own right and only generate overheads, and much 
> worse, they could even get in the way when proper directory event support is 
> eventually added.  So that part should really be removed ASAP.

They aren't something I specifically added so you can call them
zero-thought-out.  fanotify is just a userspace interface on top of the
notification infrastructure in the kernel.  The events the
infrastructure delivers are the events it sends.

> > Nothing hard about it.  It's as easy as defining a flag and adding a
> > conditional in the code but it's not high on my list.
> 
> We are not talking about Eric's own private syscalls here.  Things we screw up 
> now may be very hard or impossible to fix later; syscalls don't just change 
> from release to release.

Which is why there was so much discussion and reworking of the
interface.  It went through many iterations to end up here.  What all
did we have in those discussions?  2 magic proc files, ioctls on a char
dev, netlink, a socket with get/setsockopt and eventually we landed on 2
syscalls that noone found fault with.  For this particular feature
request the syscall in question looks like so:

SYSCALL_DEFINE2(fanotify_init, unsigned int, flags, unsigned int, event_f_flags)

Your changes could be as simple as defining a new flag and then add an
if (flag && S_IFDIR(i_mode)) to the fanotify_should_send_event handler.
I'd love to hear objections, failings, short comings, or suggestions if
you think the interface is inadequate to address these or other needs.

> This also applies to the error reporting mess I have commented on.  At least 
> the interface should be changed so that it can report a valid file descriptor 
> and an error condition at the same time; otherwise, we will end up with a 
> weakness in the interface that we won't be able to fix.

Can you describe your problem for me again.  I'm not sure I understand
your request.  I don't understand how we have an error and a valid fd at
the same time.  Are you really intending to restating the complaint that
when the system is unable to deliver notification there is not a lot of
information about what notification it failed to deliver?  You suggested
possibly adding st_ino and st_dev to the notification in that situation.
The notification messages are easy extensible assuming userspace
properly used the provided macros FAN_EVENT_NEXT and FAN_EVENT_OK to do
its processing.  Nothing prevents us from expanding what information is
provided in the message.  (idea stolen from netlink for just that
purpose)

I understand you want new features but I'm not seeing failures of the
interface to be forward looking or failures in the feature set that is
provided.

-Eric


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [GIT PULL] notification tree - try 37!
  2010-08-17 15:08             ` Eric Paris
@ 2010-08-19 20:24               ` Andreas Gruenbacher
  2010-08-19 20:32                 ` Andreas Gruenbacher
  2010-08-19 20:42                 ` Eric Paris
  0 siblings, 2 replies; 46+ messages in thread
From: Andreas Gruenbacher @ 2010-08-19 20:24 UTC (permalink / raw)
  To: Eric Paris
  Cc: Christoph Hellwig, Matt Helsley, torvalds, linux-kernel, viro,
	akpm, Michael Kerrisk

On Tuesday 17 August 2010 17:08:26 Eric Paris wrote:
> On Tue, 2010-08-17 at 10:09 +0200, Andreas Gruenbacher wrote:
> > On Tuesday 17 August 2010 05:39:47 Eric Paris wrote:
> > > On Mon, 2010-08-16 at 22:32 +0200, Andreas Gruenbacher wrote:
> > > > Q: What happens when a process watching for FAN_OPEN_PERM or
> > > > FAN_ACCESS_PERM events exits or dies while events are in flight?  I
> > > > can't see anything in the code that would wake sleeping processes up
> > > > when the fsnotify_group of the listener is torn down.
> > > 
> > > We can get stuck.  There was code which cleaned that up, but it got
> > > accidentally removed long ago when, upon review on list, I was told to
> > > remove all timeout code.  It's easy enough to fix up.  I'll post a
> > > patch this week.
> > 
> > This needs to be fixed then.  Not such a big deal, but it shows that the
> > tree wasn't ready for being merged yet and needs further review.
> 
> Code with bugs, shocking!  Two other bugs have been found and patches
> for those will be coming shortly.  I've begged for review how many
> times?  I don't care when review it comes, I'll address any issues as
> they come up.

Here is one more bug: when watching a directory with inotify, doing an ls 
gives me:

	Watching d
	d was opened
	d not opened for writing was closed

Watching the same directory with fanotify results in:

	.../d: pid=... open_perm
	.../d: pid=... open
	.../d: pid=... access_perm
	.../d: pid=... access_perm
	.../d: pid=... close

Five events seem a bit excessive; I can't explain why so many are generated.  
The real issue is when watching the same directory both with inotify and 
fanotify, though: the fanotify result stays the same, but 

	Watching d
	d has not changed
	d was opened
	d has not changed
	d has not changed
	d not opened for writing was closed

In other words, watching a directory with fanotify causes extra inotify events 
with mask == 0.

Andreas

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [GIT PULL] notification tree - try 37!
  2010-08-19 20:24               ` Andreas Gruenbacher
@ 2010-08-19 20:32                 ` Andreas Gruenbacher
  2010-08-19 20:42                 ` Eric Paris
  1 sibling, 0 replies; 46+ messages in thread
From: Andreas Gruenbacher @ 2010-08-19 20:32 UTC (permalink / raw)
  To: Eric Paris
  Cc: Christoph Hellwig, Matt Helsley, torvalds, linux-kernel, viro,
	akpm, Michael Kerrisk

On Thursday 19 August 2010 22:24:11 Andreas Gruenbacher wrote:
> Watching the same directory with fanotify results in:
> 
> 	.../d: pid=... open_perm
> 	.../d: pid=... open
> 	.../d: pid=... access_perm
> 	.../d: pid=... access_perm
> 	.../d: pid=... close

One more thing: the fanotify example code automatically sets ignore marks by 
default so the second access_perm check above will normally be suppressed.  
Use the -n option of the copy here to turn ignore marks off:

	http://git.kernel.org/?p=linux/kernel/git/agruen/fanotify-example.git

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [GIT PULL] notification tree - try 37!
  2010-08-19 20:24               ` Andreas Gruenbacher
  2010-08-19 20:32                 ` Andreas Gruenbacher
@ 2010-08-19 20:42                 ` Eric Paris
  2010-08-19 21:07                   ` Andreas Gruenbacher
  1 sibling, 1 reply; 46+ messages in thread
From: Eric Paris @ 2010-08-19 20:42 UTC (permalink / raw)
  To: Andreas Gruenbacher
  Cc: Christoph Hellwig, Matt Helsley, torvalds, linux-kernel, viro,
	akpm, Michael Kerrisk

On Thu, 2010-08-19 at 22:24 +0200, Andreas Gruenbacher wrote:
> On Tuesday 17 August 2010 17:08:26 Eric Paris wrote:
> > On Tue, 2010-08-17 at 10:09 +0200, Andreas Gruenbacher wrote:
> > > On Tuesday 17 August 2010 05:39:47 Eric Paris wrote:
> > > > On Mon, 2010-08-16 at 22:32 +0200, Andreas Gruenbacher wrote:
> > > > > Q: What happens when a process watching for FAN_OPEN_PERM or
> > > > > FAN_ACCESS_PERM events exits or dies while events are in flight?  I
> > > > > can't see anything in the code that would wake sleeping processes up
> > > > > when the fsnotify_group of the listener is torn down.
> > > > 
> > > > We can get stuck.  There was code which cleaned that up, but it got
> > > > accidentally removed long ago when, upon review on list, I was told to
> > > > remove all timeout code.  It's easy enough to fix up.  I'll post a
> > > > patch this week.
> > > 
> > > This needs to be fixed then.  Not such a big deal, but it shows that the
> > > tree wasn't ready for being merged yet and needs further review.
> > 
> > Code with bugs, shocking!  Two other bugs have been found and patches
> > for those will be coming shortly.  I've begged for review how many
> > times?  I don't care when review it comes, I'll address any issues as
> > they come up.
> 
> Here is one more bug: when watching a directory with inotify, doing an ls 
> gives me:
> 
> 	Watching d
> 	d was opened
> 	d not opened for writing was closed
> 
> Watching the same directory with fanotify results in:
> 
> 	.../d: pid=... open_perm
> 	.../d: pid=... open
> 	.../d: pid=... access_perm
> 	.../d: pid=... access_perm
> 	.../d: pid=... close
> 
> Five events seem a bit excessive; I can't explain why so many are generated.  
> The real issue is when watching the same directory both with inotify and 
> fanotify, though: the fanotify result stays the same, but 

The extra events are plainly the new events that inotify doesn't
support: namely permissions events.  You ask for and received extra
events....

> 	Watching d
> 	d has not changed
> 	d was opened
> 	d has not changed
> 	d has not changed
> 	d not opened for writing was closed
> 
> In other words, watching a directory with fanotify causes extra inotify events 
> with mask == 0.

I can't reproduce it.

inotifywait -m /mnt/tmp
fanotify -p /mnt/tmp
ls /mnt/tmp

All I see is:

# /tmp/inotifywait.strace -- inotifywait -m /mnt/tmp/
Setting up watches.  
Watches established.
/mnt/tmp/ OPEN,ISDIR 
/mnt/tmp/ CLOSE_NOWRITE,CLOSE,ISDIR

# /storage/tmp/fanotify/fanotify -p /mnt/tmp
/mnt/tmp: pid=508 open_perm
/mnt/tmp: pid=508 open
/mnt/tmp: pid=508 access_perm
/mnt/tmp: pid=508 close

# ls
file  lost+found

You must have some other testing methodology.....


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [GIT PULL] notification tree - try 37!
  2010-08-19 20:42                 ` Eric Paris
@ 2010-08-19 21:07                   ` Andreas Gruenbacher
  2010-08-19 21:22                     ` Andreas Gruenbacher
  2010-08-20  3:50                     ` Eric Paris
  0 siblings, 2 replies; 46+ messages in thread
From: Andreas Gruenbacher @ 2010-08-19 21:07 UTC (permalink / raw)
  To: Eric Paris
  Cc: Christoph Hellwig, Matt Helsley, torvalds, linux-kernel, viro,
	akpm, Michael Kerrisk

On Thursday 19 August 2010 22:42:45 Eric Paris wrote:
> The extra events are plainly the new events that inotify doesn't
> support: namely permissions events.  You ask for and received extra
> events....

How can it make sense that inotify suddenly starts seeing events it does not
support?

> I can't reproduce it. You must have some other testing methodology.....

Apparently.  Here is a trace if what I'm getting through inotify (the inotify
events were dumped with gdb, the rest is from strace):

inotify_init()                          = 3
inotify_add_watch(3, "d", IN_ACCESS|IN_MODIFY|IN_ATTRIB|IN_CLOSE_WRITE|IN_CLOSE_NOWRITE|IN_OPEN|IN_MOVED_FROM|IN_MOVED_TO|IN_CREATE|
IN_DELETE|IN_DELETE_SELF|IN_MOVE_SELF) = 1
read(3, "\1\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\1\0\0\0 \0\0@\0\0\0\0\0\0\0\0", 4210688) = 32

	=> {wd = 1, mask = 0, cookie = 0, len = 0, name = ...}
	=> {wd = 1, mask = 1073741856, cookie = 0, len = 0, name = ...}

read(3, "\1\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\1\0\0\0\20\0\0@\0\0\0\0\0\0\0\0", 4210688) = 32

	=> {wd = 1, mask = 0, cookie = 0, len = 0, name = ...}
	=> {wd = 1, mask = 1073741840, cookie = 0, len = 0, name = ...}

read(3, 0x7fff56db15e0, 4210688)        = -1 EINTR (Interrupted system call)


Andreas

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [GIT PULL] notification tree - try 37!
  2010-08-19 21:07                   ` Andreas Gruenbacher
@ 2010-08-19 21:22                     ` Andreas Gruenbacher
  2010-08-20  3:50                     ` Eric Paris
  1 sibling, 0 replies; 46+ messages in thread
From: Andreas Gruenbacher @ 2010-08-19 21:22 UTC (permalink / raw)
  To: Eric Paris
  Cc: Christoph Hellwig, Matt Helsley, torvalds, linux-kernel, viro,
	akpm, Michael Kerrisk

On Thursday 19 August 2010 23:07:31 Andreas Gruenbacher wrote:
> inotify_init()                          = 3
> inotify_add_watch(3, "d",
> IN_ACCESS|IN_MODIFY|IN_ATTRIB|IN_CLOSE_WRITE|IN_CLOSE_NOWRITE|IN_OPEN|IN_M
> OVED_FROM|IN_MOVED_TO|IN_CREATE| IN_DELETE|IN_DELETE_SELF|IN_MOVE_SELF) = 1
> read(3, "\1\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\1\0\0\0 \0\0@\0\0\0\0\0\0\0\0",
> 4210688) = 32
> 
> 	=> {wd = 1, mask = 0, cookie = 0, len = 0, name = ...}
> 	=> {wd = 1, mask = 1073741856, cookie = 0, len = 0, name = ...}
> 
> read(3, "\1\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\1\0\0\0\20\0\0@\0\0\0\0\0\0\0\0",
> 4210688) = 32
> 
> 	=> {wd = 1, mask = 0, cookie = 0, len = 0, name = ...}

When disabling ignore masks (-n), I get another of these here:

	=> {wd = 1, mask = 0, cookie = 0, len = 0, name = ...}

> 	=> {wd = 1, mask = 1073741840, cookie = 0, len = 0, name = ...}
> 
> read(3, 0x7fff56db15e0, 4210688)        = -1 EINTR (Interrupted system
> call)

Andreas

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [GIT PULL] notification tree: directory events
  2010-08-19 15:00               ` Eric Paris
@ 2010-08-19 23:41                 ` Andreas Gruenbacher
  2010-08-20  3:38                   ` Eric Paris
  0 siblings, 1 reply; 46+ messages in thread
From: Andreas Gruenbacher @ 2010-08-19 23:41 UTC (permalink / raw)
  To: Eric Paris
  Cc: Christoph Hellwig, Matt Helsley, torvalds, linux-kernel, viro,
	akpm, Michael Kerrisk, linux-fsdevel

[Adding linux-fsdevel to the list as this really is a filesystem discussion.]

On Thursday 19 August 2010 17:00:12 Eric Paris wrote:
> On Thu, 2010-08-19 at 14:44 +0200, Andreas Gruenbacher wrote:
> > On Wednesday 18 August 2010 17:59:06 Eric Paris wrote:
> > 
> > The half-thought-out directory events are not part of this subset though.
> > They are not useful in their own right and only generate overheads, and
> > much worse, they could even get in the way when proper directory event
> > support is eventually added.  So that part should really be removed
> > ASAP.
> 
> They aren't something I specifically added so you can call them
> zero-thought-out.  fanotify is just a userspace interface on top of the
> notification infrastructure in the kernel.  The events the
> infrastructure delivers are the events it sends.

Apparently the infrastructure delivers a number of directory events like file 
creation, deletion, and renames through inotify that are not delivered through 
fanotify, and fanotify only sees a subset of all directory events.  The result 
is that directory events for inotify are useful because they allow to watch a 
directory for changes, and the directory events in fanotify are not useful for 
that right now.

>[...]
>> Your changes could be as simple as defining a new flag and then add an
> if (flag && S_IFDIR(i_mode)) to the fanotify_should_send_event handler.
> I'd love to hear objections, failings, short comings, or suggestions if
> you think the interface is inadequate to address these or other needs.

I don't think the events that fanotify delivers for directories (open_perm, 
open, access_perm, close) are useful at all, or that we should allow perm 
events for directories.

So let's please get rid of those directory events unconditionally now, and add 
them back in their proper, final form later, after 2.6.36.

> > We are not talking about Eric's own private syscalls here.  Things we
> > screw up now may be very hard or impossible to fix later; syscalls don't
> > just change from release to release.
> 
> Which is why there was so much discussion and reworking of the
> interface.  It went through many iterations to end up here.  What all
> did we have in those discussions?  2 magic proc files, ioctls on a char
> dev, netlink, a socket with get/setsockopt and eventually we landed on 2
> syscalls that noone found fault with.

Unfortunately there wasn't a lot of discussion about which events should be 
generated when.  Fanotify was merged before turning into a superset of 
inotify, and there was even less discussion about how fanotify should look if 
it isn't a superset of inotify.

> > This also applies to the error reporting mess I have commented on.  At
> > least the interface should be changed so that it can report a valid file
> > descriptor and an error condition at the same time; otherwise, we will
> > end up with a weakness in the interface that we won't be able to fix.
> 
> Can you describe your problem for me again.  I'm not sure I understand
> your request.  I don't understand how we have an error and a valid fd at
> the same time.

Yes.  Right now, struct fanotify_event_metadata contains a file descriptor 
which is the only information we have about the object the event refers to, or 
a negative error code if a file descriptor could not be opened with 
dentry_open() or some other error occurred.

Being able to identify the object that an event refers to is important.  There 
are two ways to do that:

	(1) Include fields like st_dev and st_ino in struct
	    fanotify_event_metadata.  This is not ideal because the listener
	    won't be able to find out any additional information about the file
	    (like an idea about its name, for example).

		 For example, if inode->i_generation is ever exposed to user-space
	    through stat(), that information would still be missing in struct
	    fanotify_event_metadata.

	(2) Construct a file descriptor that refers to the file that could not be
	    dentry_open()ed, but which cannot be used for any I/O, and pass the
	    error condition in a separate field.  The kernel has all the
	    information needed for doing that, and it shouldn't be hard to
	    implement.

	    That way, the listener always has a file descriptor to poke around
	    with.

Failing to do (2) right now, I think it still makes sense to separate the file 
descriptor from the error code in struct fanotify_event_metadata; this would 
enable us to do (2) later if we decide to.

> I understand you want new features but I'm not seeing failures of the
> interface to be forward looking or failures in the feature set that is
> provided.

I do see failures of being forward looking, inconsistencies in the feature set 
which might lead to trouble as we extend fanotify in the future, and bugs that 
would have been shaken out in a proper review (OOMing the system when a 
listener stops listening or blocking access to files when a listener dies; I 
have reported that).

Sorry to say, but this code really should not have been merged yet.  (And mind 
I'm not saying this because I want to block fanotify from making progress -- 
quite the contrary.)

Andreas

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [GIT PULL] notification tree - try 37!
  2010-08-17  8:09           ` Andreas Gruenbacher
  2010-08-17 15:08             ` Eric Paris
@ 2010-08-20  0:00             ` Andreas Gruenbacher
  1 sibling, 0 replies; 46+ messages in thread
From: Andreas Gruenbacher @ 2010-08-20  0:00 UTC (permalink / raw)
  To: Eric Paris
  Cc: Christoph Hellwig, Matt Helsley, torvalds, linux-kernel, viro,
	akpm, Michael Kerrisk, linux-fsdevel

[Adding linux-fsdevel here as well.]

On Tuesday 17 August 2010 10:09:50 Andreas Gruenbacher wrote:
> > > Q: What prevents the system from going out of memory when a listener
> > > decides to stop reading events or simply can't keep up?  There doesn't
> > > seem to be a limit on the queue depth.  Listeners currently need
> > > CAP_SYS_ADMIN, but somehow limiting the queue depth and throttling when
> > > things start to go bad still sounds like a reasonable thing to do,
> > > right?
> > 
> > It's an interesting question and obviously one that I've thought about.
> > You remember when we talked previously I said the hardest part left was
> > allowing non-root users to use the interface.  It gets especially
> > difficult when thinking about perm-events.  I was specifically told not
> > to timeout or drop those.  But when dealing with non-root users using
> > perm events?   As for pure notification we can do something like inotify
> > does quite easily.
> > 
> > I'm not certain exactly what the best semantics are for non trusted
> > users, so I didn't push any patches that way.  Suggestions welcome   :)
> 
> The system will happily go OOM for trusted users and non-perm events if the
> listener doesn't keep up, so some throttling, dropping, or both needs to
> happen for non-perm events.  This is the critical case.  Doing what inotify
> does (queue an overflow event and drop further events) seems to make sense
> here.
> 
> The situation with perm-events is less severe because the number of
> outstanding perm events is bounded by the number of running processes. 
> This may be enough of a limit.
> 
> I don't think we need to worry about perm-events for untrusted users.  We
> can start supporting some kinds of non-perm-events for untrusted users
> later; this won't change the existing interface.

Another case where fanotify fails to generate useful events is when a listener 
runs out of file descriptors; events will simply end up with fd == -EMFILE in 
that case.  I don't think this behavior is useful; instead, reading from the 
fanotify file descriptor (he one returned by fanotify_init()) should fail to 
give the listener a chance to react.

Andreas

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [GIT PULL] notification tree: directory events
  2010-08-19 23:41                 ` Andreas Gruenbacher
@ 2010-08-20  3:38                   ` Eric Paris
  2010-08-20  5:19                     ` Andreas Dilger
                                       ` (4 more replies)
  0 siblings, 5 replies; 46+ messages in thread
From: Eric Paris @ 2010-08-20  3:38 UTC (permalink / raw)
  To: Andreas Gruenbacher
  Cc: Christoph Hellwig, Matt Helsley, torvalds, linux-kernel, viro,
	akpm, Michael Kerrisk, linux-fsdevel

On Fri, 2010-08-20 at 01:41 +0200, Andreas Gruenbacher wrote:
> [Adding linux-fsdevel to the list as this really is a filesystem discussion.]

> Yes.  Right now, struct fanotify_event_metadata contains a file descriptor 
> which is the only information we have about the object the event refers to, or 
> a negative error code if a file descriptor could not be opened with 
> dentry_open() or some other error occurred.
> 
> Being able to identify the object that an event refers to is important.  There 
> are two ways to do that:
> 
> 	(1) Include fields like st_dev and st_ino in struct
> 	    fanotify_event_metadata.  This is not ideal because the listener
> 	    won't be able to find out any additional information about the file
> 	    (like an idea about its name, for example).
> 
> 		 For example, if inode->i_generation is ever exposed to user-space
> 	    through stat(), that information would still be missing in struct
> 	    fanotify_event_metadata.
> 
> 	(2) Construct a file descriptor that refers to the file that could not be
> 	    dentry_open()ed, but which cannot be used for any I/O, and pass the
> 	    error condition in a separate field.  The kernel has all the
> 	    information needed for doing that, and it shouldn't be hard to
> 	    implement.
> 
> 	    That way, the listener always has a file descriptor to poke around
> 	    with.
> 
> Failing to do (2) right now, I think it still makes sense to separate the file 
> descriptor from the error code in struct fanotify_event_metadata; this would 
> enable us to do (2) later if we decide to.

In reference to (2), I don't even understand what an fd is that can't be
used for anything.  I'll let Al or Christoph respond if they feel like
it, but it sounds crazy to me.  You want to just magic up some struct
file and attach a struct path to it even though dentry_open() failed?
So you can do what with it?

On a more realistic note, I'm not opposed to (1), however, your
arguments would lead one to reject inotify as the IN_OVERFLOW or oom
conditions will result in silently lost events or events which provide
no useful information since the notification system has broken down.
When the appropriate use of notification is impossible I'm certainly not
opposed to patches which add best effort information, but you are
already outside the bounds of a reasonably functional system and there
is no good solution.

> bugs that 
> would have been shaken out in a proper review (OOMing the system when a 
> listener stops listening or blocking access to files when a listener dies; I 
> have reported that).

So the actual bugs you have reported, I see two.

The (nearly) unbounded number of potential outstanding notifications
events is a known situation, pointed out in previous discussions well
before this commit and is one of the (numerous) reasons why fanotify is
at this time CAP_SYS_ADMIN only.  It is something that is difficult to
address while still making fanotify useful for permissions gating.  But
the issue is clearly noted.

As to when a listener dies:  You have to define 'dies'.  If the process
just stops respond it is supposed to lock forever.  I was explicitly ask
to remove timeouts (even though I've already been ask off list to put
them back)  In Linus' tree there is a race in which both processes (the
listener and the process doing an action waiting on the listener) can
deadlock and hang forever.  A (much less racy but not right) patch to
fix this has already been posted.  Do you have comments on that patch?
I have a better version which has worked well for me today which I will
likely post tomorrow morning after I look over it again.  I'd love your
feedback.

-Eric


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [GIT PULL] notification tree - try 37!
  2010-08-19 21:07                   ` Andreas Gruenbacher
  2010-08-19 21:22                     ` Andreas Gruenbacher
@ 2010-08-20  3:50                     ` Eric Paris
  2010-08-20 12:38                       ` Andreas Gruenbacher
  1 sibling, 1 reply; 46+ messages in thread
From: Eric Paris @ 2010-08-20  3:50 UTC (permalink / raw)
  To: Andreas Gruenbacher
  Cc: Christoph Hellwig, Matt Helsley, torvalds, linux-kernel, viro,
	akpm, Michael Kerrisk

On Thu, 2010-08-19 at 23:07 +0200, Andreas Gruenbacher wrote:
> On Thursday 19 August 2010 22:42:45 Eric Paris wrote:

[necessary context to the conversation reinserted]

>>> Watching the same directory with fanotify results in:
>>> 
>>>       .../d: pid=... open_perm
>>>       .../d: pid=... open
>>>       .../d: pid=... access_perm
>>>       .../d: pid=... access_perm
>>>       .../d: pid=... close
>>> 
>>> Five events seem a bit excessive; I can't explain why so many are generated.  
>>> The real issue is when watching the same directory both with inotify and 
>>> fanotify, though: the fanotify result stays the same, but

> > The extra events are plainly the new events that inotify doesn't
> > support: namely permissions events.  You ask for and received extra
> > events....
> 
> How can it make sense that inotify suddenly starts seeing events it does not
> support?

I inline commented in the wrong place to make it clear what I was
responding to.  The question I was was answering was: "Five events seem
a bit excessive; I can't explain why so many are generated."  I answered
it.

Now onto the the question of extra inotify events:

> > I can't reproduce it. You must have some other testing methodology.....
> 
> Apparently.  Here is a trace if what I'm getting through inotify (the inotify
> events were dumped with gdb, the rest is from strace):
> 
> inotify_init()                          = 3
> inotify_add_watch(3, "d", IN_ACCESS|IN_MODIFY|IN_ATTRIB|IN_CLOSE_WRITE|IN_CLOSE_NOWRITE|IN_OPEN|IN_MOVED_FROM|IN_MOVED_TO|IN_CREATE|
> IN_DELETE|IN_DELETE_SELF|IN_MOVE_SELF) = 1
> read(3, "\1\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\1\0\0\0 \0\0@\0\0\0\0\0\0\0\0", 4210688) = 32
> 
> 	=> {wd = 1, mask = 0, cookie = 0, len = 0, name = ...}
> 	=> {wd = 1, mask = 1073741856, cookie = 0, len = 0, name = ...}
> 
> read(3, "\1\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\1\0\0\0\20\0\0@\0\0\0\0\0\0\0\0", 4210688) = 32
> 
> 	=> {wd = 1, mask = 0, cookie = 0, len = 0, name = ...}
> 	=> {wd = 1, mask = 1073741840, cookie = 0, len = 0, name = ...}
> 
> read(3, 0x7fff56db15e0, 4210688)        = -1 EINTR (Interrupted system call)

inotify_add_watch(3, "/mnt/tmp/", IN_ACCESS|IN_MODIFY|IN_ATTRIB|IN_CLOSE_WRITE|IN_CLOSE_NOWRITE|IN_OPEN|IN_MOVED_FROM|IN_MOVED_TO|IN_CREATE|IN_DELETE|IN_DELETE_SELF|IN_MOVE_SELF) = 1
lstat("/mnt/tmp/", {st_mode=S_IFDIR|0755, st_size=1024, ...}) = 0
write(2, "Watches established.\n", 21)  = 21
select(4, [3], NULL, NULL, NULL)        = 1 (in [3])
ioctl(3, FIONREAD, [16])                = 0
read(3, "\1\0\0\0 \0\0@\0\0\0\0\0\0\0\0", 65536) = 16
fstat(1, {st_mode=S_IFCHR|0620, st_rdev=makedev(136, 0), ...}) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f8fef0d3000
write(1, "/mnt/tmp/ OPEN,ISDIR \n", 22) = 22
select(4, [3], NULL, NULL, NULL)        = 1 (in [3])
ioctl(3, FIONREAD, [16])                = 0
read(3, "\1\0\0\0\20\0\0@\0\0\0\0\0\0\0\0", 65536) = 16
write(1, "/mnt/tmp/ CLOSE_NOWRITE,CLOSE,IS"..., 37) = 37
select(4, [3], NULL, NULL, NULL)        = ? ERESTARTNOHAND (To be restarted)
--- SIGINT (Interrupt) @ 0 (0) ---
+++ killed by SIGINT +++

We must be doing something different...  What kernel?  what kconfig?
What exact FS setup?  What exact steps are you taking?  What programs
are you using to test east side?  If you want to spam me with something
I added a whole lot of pr_debug statements throughout notification code
just in case it wasn't perfect (haha) so if you built with dynamic debug
you could run my debug script:

#!/bin/bash

echo "file fs/notify/inode_mark.c +p" > /sys/kernel/debug/dynamic_debug/control
echo "file fs/notify/mark.c +p" > /sys/kernel/debug/dynamic_debug/control
echo "file fs/notify/notification.c +p" > /sys/kernel/debug/dynamic_debug/control
echo "file fs/notify/fanotify/fanotify_user.c +p" > /sys/kernel/debug/dynamic_debug/control
echo "file fs/notify/fanotify/fanotify.c +p" > /sys/kernel/debug/dynamic_debug/control
echo "file fs/notify/vfsmount_mark.c +p" > /sys/kernel/debug/dynamic_debug/control
echo "file fs/notify/group.c +p" > /sys/kernel/debug/dynamic_debug/control
echo "file fs/notify/inotify/inotify_fsnotify.c +p" > /sys/kernel/debug/dynamic_debug/control
echo "file fs/notify/inotify/inotify_user.c +p" > /sys/kernel/debug/dynamic_debug/control
echo "file fs/notify/dnotify/dnotify.c +p" > /sys/kernel/debug/dynamic_debug/control
echo "file fs/notify/fsnotify.c +p" > /sys/kernel/debug/dynamic_debug/control

reproduce and send me the dmesg results.  Maybe I can glean something
out of it....

-Eric


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [GIT PULL] notification tree: directory events
  2010-08-20  3:38                   ` Eric Paris
@ 2010-08-20  5:19                     ` Andreas Dilger
  2010-08-20  9:21                       ` Christoph Hellwig
  2010-08-20  9:09                     ` Tvrtko Ursulin
                                       ` (3 subsequent siblings)
  4 siblings, 1 reply; 46+ messages in thread
From: Andreas Dilger @ 2010-08-20  5:19 UTC (permalink / raw)
  To: Eric Paris
  Cc: Andreas Gruenbacher, Christoph Hellwig, Matt Helsley, torvalds,
	linux-kernel, viro, akpm, Michael Kerrisk, linux-fsdevel

On 2010-08-19, at 21:38, Eric Paris wrote:
> On Fri, 2010-08-20 at 01:41 +0200, Andreas Gruenbacher wrote:
>> Being able to identify the object that an event refers to is important.  There 
>> are two ways to do that:
>> 
>> 	(1) Include fields like st_dev and st_ino in struct
>> 	    fanotify_event_metadata.
> 
> On a more realistic note, I'm not opposed to (1), however, your
> arguments would lead one to reject inotify as the IN_OVERFLOW or oom
> conditions will result in silently lost events or events which provide
> no useful information since the notification system has broken down.
> When the appropriate use of notification is impossible I'm certainly not
> opposed to patches which add best effort information, but you are
> already outside the bounds of a reasonably functional system and there
> is no good solution.

What about unifying the file identification here with the file handle used for open_by_handle()?  That keeps a consistent identifier for the inode between multiple system calls (always a good thing), and allows the inode to be opened again via open_by_handle() if needed.

Cheers, Andreas






^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [GIT PULL] notification tree: directory events
  2010-08-20  3:38                   ` Eric Paris
  2010-08-20  5:19                     ` Andreas Dilger
@ 2010-08-20  9:09                     ` Tvrtko Ursulin
  2010-08-20 11:07                     ` Andreas Gruenbacher
                                       ` (2 subsequent siblings)
  4 siblings, 0 replies; 46+ messages in thread
From: Tvrtko Ursulin @ 2010-08-20  9:09 UTC (permalink / raw)
  To: Eric Paris
  Cc: Andreas Gruenbacher, Christoph Hellwig, Matt Helsley, torvalds,
	linux-kernel, viro, akpm, Michael Kerrisk, linux-fsdevel

On Friday 20 Aug 2010 04:38:17 Eric Paris wrote:
> On Fri, 2010-08-20 at 01:41 +0200, Andreas Gruenbacher wrote:
> >     (2) Construct a file descriptor that refers to the file that could not
> >     be
> >
> >         dentry_open()ed, but which cannot be used for any I/O, and pass
the
> >         error condition in a separate field.  The kernel has all the
> >         information needed for doing that, and it shouldn't be hard to
> >         implement.
> >
> >         That way, the listener always has a file descriptor to poke around
> >         with.
> >
> > Failing to do (2) right now, I think it still makes sense to separate the
> > file descriptor from the error code in struct fanotify_event_metadata;
> > this would enable us to do (2) later if we decide to.
>
> In reference to (2), I don't even understand what an fd is that can't be
> used for anything.  I'll let Al or Christoph respond if they feel like
> it, but it sounds crazy to me.  You want to just magic up some struct
> file and attach a struct path to it even though dentry_open() failed?
> So you can do what with it?

I think Andreas would like to get a path even if open failed so it could be
for that use? If creating such file object will be impossible or too hacky,
but the general idea accepted, it may be better just to attach the path to the
event.

Tvrtko

Sophos Plc, The Pentagon, Abingdon Science Park, Abingdon, OX14 3YP, United Kingdom.
Company Reg No 2096520. VAT Reg No GB 348 3873 20.

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [GIT PULL] notification tree: directory events
  2010-08-20  5:19                     ` Andreas Dilger
@ 2010-08-20  9:21                       ` Christoph Hellwig
  2010-08-20 15:29                         ` Andreas Gruenbacher
  0 siblings, 1 reply; 46+ messages in thread
From: Christoph Hellwig @ 2010-08-20  9:21 UTC (permalink / raw)
  To: Andreas Dilger
  Cc: Eric Paris, Andreas Gruenbacher, Christoph Hellwig, Matt Helsley,
	torvalds, linux-kernel, viro, akpm, Michael Kerrisk,
	linux-fsdevel

On Thu, Aug 19, 2010 at 11:19:07PM -0600, Andreas Dilger wrote:
> What about unifying the file identification here with the file handle used for open_by_handle()?  That keeps a consistent identifier for the inode between multiple system calls (always a good thing), and allows the inode to be opened again via open_by_handle() if needed.

That's what the dmapi callouts that are intendeded to do just about the
same as fanotify always did.  I'm pretty sure I brought this up earlier
in the discussion.

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [GIT PULL] notification tree: directory events
  2010-08-20  3:38                   ` Eric Paris
  2010-08-20  5:19                     ` Andreas Dilger
  2010-08-20  9:09                     ` Tvrtko Ursulin
@ 2010-08-20 11:07                     ` Andreas Gruenbacher
  2010-08-20 11:25                     ` Andreas Gruenbacher
  2010-08-20 12:16                     ` Andreas Gruenbacher
  4 siblings, 0 replies; 46+ messages in thread
From: Andreas Gruenbacher @ 2010-08-20 11:07 UTC (permalink / raw)
  To: Eric Paris, David Howells
  Cc: Christoph Hellwig, Matt Helsley, torvalds, linux-kernel, viro,
	akpm, Michael Kerrisk, linux-fsdevel

On Friday 20 August 2010 05:38:17 Eric Paris wrote:
> On Fri, 2010-08-20 at 01:41 +0200, Andreas Gruenbacher wrote:
> > 	(2) Construct a file descriptor that refers to the file that could not be
> > 	    dentry_open()ed, but which cannot be used for any I/O, and pass the
> > 	    error condition in a separate field.  The kernel has all the
> > 	    information needed for doing that, and it shouldn't be hard to
> > 	    implement.
> > 	    
> > 	    That way, the listener always has a file descriptor to poke around
> > 	    with.
> > 
> > Failing to do (2) right now, I think it still makes sense to separate the
> > file descriptor from the error code in struct fanotify_event_metadata;
> > this would enable us to do (2) later if we decide to.
> 
> In reference to (2), I don't even understand what an fd is that can't be
> used for anything.  I'll let Al or Christoph respond if they feel like
> it, but it sounds crazy to me.  You want to just magic up some struct
> file and attach a struct path to it even though dentry_open() failed?
> So you can do what with it?

I've never said the fd can't be used for anything.  The idea would be to
allow the file to be stat()ed, to poke around in /proc/fd/ in the hope to
get a somewhat useful pathname out, or to use something like xstat() on the
fd one day (David Howells is going to be looking into that).

It would be a good thing to be able to use the same mechanisms for
identifying the file that are available for a "normal" file descriptor.

> On a more realistic note, I'm not opposed to (1), however, your
> arguments would lead one to reject inotify as the IN_OVERFLOW or oom
> conditions will result in silently lost events or events which provide
> no useful information since the notification system has broken down.

Nonsense.  An overflow event obviously is not associated with any one file,
so that would be the (only) case where you should get an event that doesn't
refer to a file.

Andreas

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [GIT PULL] notification tree: directory events
  2010-08-20  3:38                   ` Eric Paris
                                       ` (2 preceding siblings ...)
  2010-08-20 11:07                     ` Andreas Gruenbacher
@ 2010-08-20 11:25                     ` Andreas Gruenbacher
  2010-08-20 12:16                     ` Andreas Gruenbacher
  4 siblings, 0 replies; 46+ messages in thread
From: Andreas Gruenbacher @ 2010-08-20 11:25 UTC (permalink / raw)
  To: Eric Paris
  Cc: Christoph Hellwig, Matt Helsley, torvalds, linux-kernel, viro,
	akpm, Michael Kerrisk, linux-fsdevel

On Friday 20 August 2010 05:38:17 Eric Paris wrote:
> So the actual bugs you have reported, I see two.
> 
> The (nearly) unbounded number of potential outstanding notifications
> events is a known situation, pointed out in previous discussions well
> before this commit and is one of the (numerous) reasons why fanotify is
> at this time CAP_SYS_ADMIN only.  It is something that is difficult to
> address while still making fanotify useful for permissions gating.  But
> the issue is clearly noted.

Clearly noting and blissfully ignoring the problem is not enough
unfortunately; this needs to be addressed now.  I have already pointed out 
(quoted below) that permission gating is not the worst problem here; the worst 
problem are listeners whose fanotify event queue just grows and grows.  There 
is no throttling, and no guarantee that even a listener which simply reads and 
completely ignores all events will manage to keep up.  The system will run out 
of memory eventually.

Here is a quote from a previous message about this problem that only went to 
linux-kernel:

On Tuesday 17 August 2010 05:39:47 Eric Paris wrote:
> On Mon, 2010-08-16 at 22:32 +0200, Andreas Gruenbacher wrote:
> > Q: What prevents the system from going out of memory when a listener
> > decides to stop reading events or simply can't keep up?  There doesn't
> > seem to be a limit on the queue depth.  Listeners currently need
> > CAP_SYS_ADMIN, but somehow limiting the queue depth and throttling when
> > things start to go bad still sounds like a reasonable thing to do,
> > right?
> 
> It's an interesting question and obviously one that I've thought about.
> You remember when we talked previously I said the hardest part left was
> allowing non-root users to use the interface.  It gets especially
> difficult when thinking about perm-events.  I was specifically told not
> to timeout or drop those.  But when dealing with non-root users using
> perm events?   As for pure notification we can do something like inotify
> does quite easily.
> 
> I'm not certain exactly what the best semantics are for non trusted
> users, so I didn't push any patches that way.  Suggestions welcome   :)

The system will happily go OOM for trusted users and non-perm events if the 
listener doesn't keep up, so some throttling, dropping, or both needs to 
happen for non-perm events.  This is the critical case.  Doing what inotify 
does (queue an overflow event and drop further events) seems to make sense 
here.

The situation with perm-events is less severe because the number of 
outstanding perm events is bounded by the number of running processes.  This 
may be enough of a limit.

I don't think we need to worry about perm-events for untrusted users.  We can 
start supporting some kinds of non-perm-events for untrusted users later; this 
won't change the existing interface.

Andreas

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [GIT PULL] notification tree: directory events
  2010-08-20  3:38                   ` Eric Paris
                                       ` (3 preceding siblings ...)
  2010-08-20 11:25                     ` Andreas Gruenbacher
@ 2010-08-20 12:16                     ` Andreas Gruenbacher
  4 siblings, 0 replies; 46+ messages in thread
From: Andreas Gruenbacher @ 2010-08-20 12:16 UTC (permalink / raw)
  To: Eric Paris
  Cc: Christoph Hellwig, Matt Helsley, torvalds, linux-kernel, viro,
	akpm, Michael Kerrisk, linux-fsdevel

On Friday 20 August 2010 05:38:17 Eric Paris wrote:
> As to when a listener dies:  You have to define 'dies'.

I meant a process that gets killed or simply exits while there are outstanding 
perm events.  Nothing in the code would wake up the processes blocked on the 
perm event; they will simply be stuck forever.  (They cannot even be killed 
with SIGKILL.)

This is easy to reproduce with a listener that is killed while processing a 
perm event: just run the fanotify example [1] with the new -s option and hit 
^C while it is sleeping.

Bonus points would be awarded to make a process blocked on a listener killable 
with SIGKILL or other lethal signals.

[1] http://git.kernel.org/?p=linux/kernel/git/agruen/fanotify-example.git

The listener will also hang forever in that case; not sure why that is the 
case.

I already told you about this before (http://lkml.org/lkml/2010/8/16/349); not 
sure why everything needs to repeated multiple times until it sinks in.

> If the process just stops respond it is supposed to lock forever.

I agree.

> I was explicitly ask to remove timeouts (even though I've already been ask
> off list to put them back)

I disagree with putting timeouts back in.

> In Linus' tree there is a race in which both processes (the
> listener and the process doing an action waiting on the listener) can
> deadlock and hang forever.  A (much less racy but not right) patch to
> fix this has already been posted.
>
> I have a better version which has worked well for me today which I will
> likely post tomorrow morning after I look over it again.  I'd love your
> feedback.

Do you have a pointer to that?

Thanks,
Andreas

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [GIT PULL] notification tree - try 37!
  2010-08-20  3:50                     ` Eric Paris
@ 2010-08-20 12:38                       ` Andreas Gruenbacher
  2010-08-23 16:46                         ` Eric Paris
  0 siblings, 1 reply; 46+ messages in thread
From: Andreas Gruenbacher @ 2010-08-20 12:38 UTC (permalink / raw)
  To: Eric Paris, linux-fsdevel
  Cc: Christoph Hellwig, Matt Helsley, torvalds, linux-kernel, viro,
	akpm, Michael Kerrisk

On Friday 20 August 2010 05:50:36 Eric Paris wrote:
> We must be doing something different...  What kernel?  what kconfig?
> What exact FS setup?  What exact steps are you taking?  What programs
> are you using to test east side?

I'm runnning 2.6.36-rc1, with CONFIG_FANOTIFY and 
CONFIG_FANOTIFY_ACCESS_PERMISSIONS on apparently.  I am watching the same 
directory with inotify and fanotify at the same time, that is, with both an 
inotify and an fanotify listener running in two separate processes.  The 
inotify listener is code I cannot send so easily, but I've shown the resulting 
strace.  The fanotify listener is the one from [1].

	[1] http://git.kernel.org/?p=linux/kernel/git/agruen/fanotify-example.git

Together with the traces I've provided this should give you way enough clues 
to be able to look up in the code why listening for fanotify events apparently 
causes a concurrent inotify listener to return an inotify event with struct 
inotify_event->mask == 0 for each fanotify perm event.

Andreas

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [GIT PULL] notification tree: directory events
  2010-08-20  9:21                       ` Christoph Hellwig
@ 2010-08-20 15:29                         ` Andreas Gruenbacher
  2010-08-20 20:39                           ` Andreas Dilger
  0 siblings, 1 reply; 46+ messages in thread
From: Andreas Gruenbacher @ 2010-08-20 15:29 UTC (permalink / raw)
  To: Christoph Hellwig, Andreas Dilger
  Cc: Eric Paris, Matt Helsley, torvalds, linux-kernel, viro, akpm,
	Michael Kerrisk, linux-fsdevel

On Friday 20 August 2010 11:21:27 Christoph Hellwig wrote:
> On Thu, Aug 19, 2010 at 11:19:07PM -0600, Andreas Dilger wrote:
> > What about unifying the file identification here with the file handle
> > used for open_by_handle()?  That keeps a consistent identifier for the
> > inode between multiple system calls (always a good thing), and allows
> > the inode to be opened again via open_by_handle() if needed.
> 
> That's what the dmapi callouts that are intendeded to do just about the
> same as fanotify always did.  I'm pretty sure I brought this up earlier
> in the discussion.

I remember you bringing this up.

The persistent handles require CAP_DAC_READ_SEARCH for using open_by_handle() 
to prevent an unprivileged process from forging handles and bypassing 
directory permission checks.  This would be okay for users of fanotify which 
can listen to all events in their namespace (and which requires CAP_SYS_ADMIN 
anyway).

I don't see how open_by_handle() could be made safe to use by arbitrary 
processes; I don't think we can make the handles cryptographically strong, for 
example.  But I may be overlooking something here.

[Side note: checking for CAP_DAC_READ_SEARCH in fanotify would probably be 
enough when no perm events are involved because dentry_open() performs tail 
permission checks anyway.]

In the future it will make sense to extend fanotify to allow unprivileged 
processes to listen to "their own" events, for example, like inotify does 
today.  (You know that providing a better inotify replacement was one of the 
goals of fanotify before it got merged anyway.)  Unprivileged processes 
wouldn't be allowed to use open_by_handle() anymore though, and so file 
handles still look like a better choice for fanotify to me.

Thanks,
Andreas

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [GIT PULL] notification tree: directory events
  2010-08-20 15:29                         ` Andreas Gruenbacher
@ 2010-08-20 20:39                           ` Andreas Dilger
  0 siblings, 0 replies; 46+ messages in thread
From: Andreas Dilger @ 2010-08-20 20:39 UTC (permalink / raw)
  To: Andreas Gruenbacher
  Cc: Christoph Hellwig, Eric Paris, Matt Helsley, torvalds,
	linux-kernel, viro, akpm, Michael Kerrisk, linux-fsdevel

On 2010-08-20, at 09:29, Andreas Gruenbacher wrote:
> On Friday 20 August 2010 11:21:27 Christoph Hellwig wrote:
>> On Thu, Aug 19, 2010 at 11:19:07PM -0600, Andreas Dilger wrote:
>>> What about unifying the file identification here with the file handle
>>> used for open_by_handle()?  That keeps a consistent identifier for the
>>> inode between multiple system calls (always a good thing), and allows
>>> the inode to be opened again via open_by_handle() if needed.
>> 
>> That's what the dmapi callouts that are intended to do just about the
>> same as fanotify always did.  I'm pretty sure I brought this up earlier
>> in the discussion.
> 
> I remember you bringing this up.
> 
> The persistent handles require CAP_DAC_READ_SEARCH for using open_by_handle() 
> to prevent an unprivileged process from forging handles and bypassing 
> directory permission checks.  This would be okay for users of fanotify which 
> can listen to all events in their namespace (and which requires CAP_SYS_ADMIN 
> anyway).
> 
> I don't see how open_by_handle() could be made safe to use by arbitrary 
> processes; I don't think we can make the handles cryptographically strong, for 
> example.  But I may be overlooking something here.

If the file handles only need to have a limited lifetime for unprivileged processes, then they can contain a random salt that is kept on the in-core inode.  For me and my intended HPC use case this would be a useful addition for open_by_handle() to allow unprivileged process access.  At worst, if the inode is evicted from memory the process would redo the name_to_handle(), or do the slower open-by-path().

I suspect it would also be possible to have an array of per-superblock (or global) crypto keys that are hashed with the file handle data.  That avoids the per-inode memory, and allows a well-defined lifetime for the handle (minutes, hours, days) only as a function of how quickly the crypto key needs to rotate (based on key length and attack speed) and the size of the array that is kept.

> In the future it will make sense to extend fanotify to allow unprivileged 
> processes to listen to "their own" events, for example, like inotify does 
> today.  (You know that providing a better inotify replacement was one of the 
> goals of fanotify before it got merged anyway.)  Unprivileged processes 
> wouldn't be allowed to use open_by_handle() anymore though, and so file 
> handles still look like a better choice for fanotify to me.


Cheers, Andreas






^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [GIT PULL] notification tree - try 37!
  2010-08-20 12:38                       ` Andreas Gruenbacher
@ 2010-08-23 16:46                         ` Eric Paris
  2010-08-23 22:38                           ` Andreas Gruenbacher
  0 siblings, 1 reply; 46+ messages in thread
From: Eric Paris @ 2010-08-23 16:46 UTC (permalink / raw)
  To: Andreas Gruenbacher
  Cc: linux-fsdevel, Christoph Hellwig, Matt Helsley, torvalds,
	linux-kernel, viro, akpm, Michael Kerrisk

On Fri, 2010-08-20 at 14:38 +0200, Andreas Gruenbacher wrote:
> On Friday 20 August 2010 05:50:36 Eric Paris wrote:
> > We must be doing something different...  What kernel?  what kconfig?
> > What exact FS setup?  What exact steps are you taking?  What programs
> > are you using to test east side?
> 
> I'm runnning 2.6.36-rc1, with CONFIG_FANOTIFY and 
> CONFIG_FANOTIFY_ACCESS_PERMISSIONS on apparently.  I am watching the same 
> directory with inotify and fanotify at the same time, that is, with both an 
> inotify and an fanotify listener running in two separate processes.  The 
> inotify listener is code I cannot send so easily, but I've shown the resulting 
> strace.  The fanotify listener is the one from [1].
> 
> 	[1] http://git.kernel.org/?p=linux/kernel/git/agruen/fanotify-example.git
> 
> Together with the traces I've provided this should give you way enough clues 
> to be able to look up in the code why listening for fanotify events apparently 
> causes a concurrent inotify listener to return an inotify event with struct 
> inotify_event->mask == 0 for each fanotify perm event.

Spent a bit of the weekend trying to figure out what you were doing and
couldn't reproduce it or find it in the code because I had already fixed
it (albeit for slightly different reasons).  The patch in question was:

http://marc.info/?l=linux-kernel&m=128214903125780&w=2

the vfsmount_test_mask was always initialized but since no vfsmount
marks were found it was never cleared.  This left the code thinking that
the given (inode) mark was interested in the event.  I think you could
reproduce it differently

inotifywait -m -e open /mnt/tmp
inotifywait -m -e close /mnt/tmp

and you would get both event types for both watches.

-Eric


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [GIT PULL] notification tree - try 37!
  2010-08-23 16:46                         ` Eric Paris
@ 2010-08-23 22:38                           ` Andreas Gruenbacher
  0 siblings, 0 replies; 46+ messages in thread
From: Andreas Gruenbacher @ 2010-08-23 22:38 UTC (permalink / raw)
  To: Eric Paris
  Cc: linux-fsdevel, Christoph Hellwig, Matt Helsley, torvalds,
	linux-kernel, viro, akpm, Michael Kerrisk

On Monday 23 August 2010 18:46:13 Eric Paris wrote:
> Spent a bit of the weekend trying to figure out what you were doing and
> couldn't reproduce it or find it in the code because I had already fixed
> it (albeit for slightly different reasons).  The patch in question was:
> 
> http://marc.info/?l=linux-kernel&m=128214903125780&w=2
> 
> the vfsmount_test_mask was always initialized but since no vfsmount
> marks were found it was never cleared.  This left the code thinking that
> the given (inode) mark was interested in the event.

That seems to explain it.  I can no longer reproduce the bug with your current 
for-linus tree (3ba67c8a) using my previous method.

Sorry this bug was difficult for you to track down; I was not even aware of 
this fix until today.

> I think you could reproduce it differently
> 
> inotifywait -m -e open /mnt/tmp
> inotifywait -m -e close /mnt/tmp
> 
> and you would get both event types for both watches.

No, the open listener gets:
	d/ OPEN,ISDIR

and the close listener gets:
	d/ CLOSE_NOWRITE,CLOSE,ISDIR

I tried to reproduce the previously failing case with inotifywait. I can see 
that inotifywait receives five inotify events at the syscall level, but it 
will not report events with mask == 0.

Thanks,
Andreas

^ permalink raw reply	[flat|nested] 46+ messages in thread

end of thread, other threads:[~2010-08-23 22:38 UTC | newest]

Thread overview: 46+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2010-08-06 15:58 [GIT PULL] notification tree - try 37! Eric Paris
2010-08-06 23:34 ` Matt Helsley
2010-08-07  0:06   ` Christoph Hellwig
2010-08-07 19:15     ` Eric Paris
2010-08-07 20:55       ` Matt Helsley
2010-08-16 20:32       ` Andreas Gruenbacher
2010-08-17  3:39         ` Eric Paris
2010-08-17  4:03           ` Matt Helsley
     [not found]           ` <1282016387.21419.113.camel-u/cB4NFi02V49Jlha2NJH1aTQe2KTcn/@public.gmane.org>
2010-08-17  4:03             ` Matt Helsley
2010-08-17  8:09           ` Andreas Gruenbacher
2010-08-17 15:08             ` Eric Paris
2010-08-19 20:24               ` Andreas Gruenbacher
2010-08-19 20:32                 ` Andreas Gruenbacher
2010-08-19 20:42                 ` Eric Paris
2010-08-19 21:07                   ` Andreas Gruenbacher
2010-08-19 21:22                     ` Andreas Gruenbacher
2010-08-20  3:50                     ` Eric Paris
2010-08-20 12:38                       ` Andreas Gruenbacher
2010-08-23 16:46                         ` Eric Paris
2010-08-23 22:38                           ` Andreas Gruenbacher
2010-08-20  0:00             ` Andreas Gruenbacher
2010-08-17  8:38           ` Andreas Gruenbacher
2010-08-17 15:24             ` Eric Paris
2010-08-17 15:48               ` Andreas Gruenbacher
2010-08-18 14:18               ` Andreas Gruenbacher
2010-08-17  9:45           ` Tvrtko Ursulin
2010-08-17 10:01             ` Andreas Gruenbacher
2010-08-17 10:12               ` Tvrtko Ursulin
2010-08-17 10:55                 ` Tvrtko Ursulin
2010-08-17 15:27                   ` Eric Paris
2010-08-18 15:47         ` [GIT PULL] notification tree: directory events Andreas Gruenbacher
2010-08-18 15:59           ` Eric Paris
2010-08-18 16:42             ` Christoph Hellwig
2010-08-18 17:07               ` Eric Paris
2010-08-19 12:44             ` Andreas Gruenbacher
2010-08-19 15:00               ` Eric Paris
2010-08-19 23:41                 ` Andreas Gruenbacher
2010-08-20  3:38                   ` Eric Paris
2010-08-20  5:19                     ` Andreas Dilger
2010-08-20  9:21                       ` Christoph Hellwig
2010-08-20 15:29                         ` Andreas Gruenbacher
2010-08-20 20:39                           ` Andreas Dilger
2010-08-20  9:09                     ` Tvrtko Ursulin
2010-08-20 11:07                     ` Andreas Gruenbacher
2010-08-20 11:25                     ` Andreas Gruenbacher
2010-08-20 12:16                     ` Andreas Gruenbacher

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.