linux-kernel.vger.kernel.org archive mirror
* [PATCH -V2 01/13] mutex: add atomic_dec_and_mutex_lock
@ 2009-03-27 20:05 Eric Paris
  2009-03-27 20:05 ` [PATCH -V2 02/13] fsnotify: unified filesystem notification backend Eric Paris
                   ` (12 more replies)
  0 siblings, 13 replies; 26+ messages in thread
From: Eric Paris @ 2009-03-27 20:05 UTC (permalink / raw)
  To: linux-kernel; +Cc: viro, hch, alan, sfr, john, rlove, akpm

Much like atomic_dec_and_lock(), which takes and holds a spin_lock if it
drops the atomic to 0, this function takes and holds the mutex if it
decrements the atomic to 0.
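
A rough userspace analogue may help show the semantics described above. This is an illustration only, assuming C11 <stdatomic.h> and POSIX threads stand in for atomic_t and struct mutex. add_unless() and dec_and_mutex_lock() are hypothetical names modelled on atomic_add_unless() and the helper in this patch, not kernel API.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

/*
 * Hypothetical stand-in for atomic_add_unless(v, a, u):
 * add a to *v unless *v == u.  Returns true if the add happened.
 */
static bool add_unless(atomic_int *v, int a, int u)
{
	int old = atomic_load(v);

	while (old != u) {
		/* On failure, old is refreshed with the current value. */
		if (atomic_compare_exchange_weak(v, &old, old + a))
			return true;
	}
	return false;
}

/*
 * Returns true with *lock held if this call dropped *cnt to 0.
 * Returns false, with *lock not held, otherwise.
 */
static bool dec_and_mutex_lock(atomic_int *cnt, pthread_mutex_t *lock)
{
	int prev;

	/* Fast path: the count is not 1, so this decrement cannot hit 0. */
	if (add_unless(cnt, -1, 1))
		return false;

	/* We might hit 0, so take the lock before the final decrement. */
	pthread_mutex_lock(lock);
	prev = atomic_fetch_sub(cnt, 1);
	assert(prev > 0);	/* catch refcount underflow in this model */
	if (prev != 1) {
		/* Another reference appeared meanwhile; we did not hit 0. */
		pthread_mutex_unlock(lock);
		return false;
	}
	return true;		/* hit 0 and the lock is held */
}
```

The fast path never touches the mutex, so only the caller that may drop the last reference pays for the lock. fsnotify_put_group() later in this series relies on that: once it returns holding the mutex, no new references can appear.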

Signed-off-by: Eric Paris <eparis@redhat.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Paul Mackerras <paulus@samba.org>
LKML-Reference: <20090323172417.410913479@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
---

 include/linux/mutex.h |   23 +++++++++++++++++++++++
 1 files changed, 23 insertions(+), 0 deletions(-)

diff --git a/include/linux/mutex.h b/include/linux/mutex.h
index 3069ec7..93054fc 100644
--- a/include/linux/mutex.h
+++ b/include/linux/mutex.h
@@ -151,4 +151,27 @@ extern int __must_check mutex_lock_killable(struct mutex *lock);
 extern int mutex_trylock(struct mutex *lock);
 extern void mutex_unlock(struct mutex *lock);
 
+/**
+ * atomic_dec_and_mutex_lock - return holding mutex if we dec to 0
+ * @cnt: the atomic which we are to dec
+ * @lock: the mutex to return holding if we dec to 0
+ *
+ * return true and hold lock if we dec to 0, return false otherwise
+ */
+static inline int atomic_dec_and_mutex_lock(atomic_t *cnt, struct mutex *lock)
+{
+	/* dec if we can't possibly hit 0 */
+	if (atomic_add_unless(cnt, -1, 1))
+		return 0;
+	/* we might hit 0, so take the lock */
+	mutex_lock(lock);
+	if (!atomic_dec_and_test(cnt)) {
+		/* when we actually did the dec, we didn't hit 0 */
+		mutex_unlock(lock);
+		return 0;
+	}
+	/* we hit 0, and we hold the lock */
+	return 1;
+}
+
 #endif


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH -V2 02/13] fsnotify: unified filesystem notification backend
  2009-03-27 20:05 [PATCH -V2 01/13] mutex: add atomic_dec_and_mutex_lock Eric Paris
@ 2009-03-27 20:05 ` Eric Paris
  2009-04-07 23:05   ` Andrew Morton
  2009-04-07 23:06   ` Andrew Morton
  2009-03-27 20:05 ` [PATCH -V2 03/13] fsnotify: add group priorities Eric Paris
                   ` (11 subsequent siblings)
  12 siblings, 2 replies; 26+ messages in thread
From: Eric Paris @ 2009-03-27 20:05 UTC (permalink / raw)
  To: linux-kernel; +Cc: viro, hch, alan, sfr, john, rlove, akpm

fsnotify is a backend for filesystem notification.  fsnotify does
not provide any userspace interface but does provide the basis
needed for other notification schemes such as dnotify.  fsnotify
can be extended to be the backend for inotify or the upcoming
fanotify.  fsnotify provides a mechanism for "groups" to register for
some set of filesystem events and to then deliver those events to
those groups for processing.
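
To make the group model concrete, here is a small single-threaded userspace sketch of how fsnotify() in this patch filters events: a global mask gives a cheap early exit, each group's mask selects who receives the event, the event is allocated lazily only when some group wants it, and the creation reference is dropped at the end. The struct layouts, EV_* values and the keep_event() handler (standing in for a group's handle_event op that queues the event) are hypothetical; locking, SRCU and real kernel types are deliberately left out.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical event bits; the patch uses FS_ACCESS etc. */
#define EV_ACCESS 0x001u
#define EV_MODIFY 0x002u
#define EV_CREATE 0x100u

struct event {
	uint32_t mask;
	int refcnt;		/* plain int: this model is single-threaded */
};

struct group {
	uint32_t mask;		/* events this group cares about */
	int (*handle_event)(struct group *g, struct event *e);
	struct group *next;
	int seen;		/* model-only: events delivered to this group */
	struct event *last;	/* model-only: reference kept by this group */
};

static struct group *groups;	/* all registered groups */
static uint32_t global_mask;	/* OR of every group's mask */
static int events_allocated;	/* model-only allocation counter */

static void put_event(struct event *e)
{
	if (e && --e->refcnt == 0)
		free(e);
}

static void recalc_global_mask(void)
{
	uint32_t m = 0;

	for (struct group *g = groups; g; g = g->next)
		m |= g->mask;
	global_mask = m;
}

static void add_group(struct group *g)
{
	g->next = groups;
	groups = g;
	recalc_global_mask();
}

/* Handler that "queues" the event by holding its own reference. */
static int keep_event(struct group *g, struct event *e)
{
	assert(e->refcnt > 0);
	e->refcnt++;
	put_event(g->last);	/* release the previously kept event */
	g->last = e;
	g->seen++;
	return 0;
}

static void notify(uint32_t mask)
{
	struct event *event = NULL;

	/* Cheap exit when no group cares about this event type. */
	if (!(mask & global_mask))
		return;

	for (struct group *g = groups; g; g = g->next) {
		if (!(mask & g->mask))
			continue;
		if (!event) {
			/* Allocate only once some group actually wants it. */
			event = malloc(sizeof(*event));
			if (!event)
				break;	/* out of memory: the event is dropped */
			event->mask = mask;
			event->refcnt = 1;	/* creation reference */
			events_allocated++;
		}
		g->handle_event(g, event);
	}
	/* Drop the creation reference; groups keep their own. */
	put_event(event);
}
```

Events that no group asked for never cause an allocation, which is the reasoning behind not pre-allocating on this hot path.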

fsnotify has a number of benefits, the first being that it actually shrinks the
size of an inode.  Before fsnotify, to support both dnotify and inotify an inode had:

        unsigned long           i_dnotify_mask; /* Directory notify events */
        struct dnotify_struct   *i_dnotify; /* for directory notifications */
        struct list_head        inotify_watches; /* watches on this inode */
        struct mutex            inotify_mutex;  /* protects the watches list */

But with fsnotify this same functionality (and more) is done with just:

        __u32                   i_fsnotify_mask; /* all events for this inode */
        struct hlist_head       i_fsnotify_mark_entries; /* marks on this inode */

That's right, inotify, dnotify, and fanotify all in 64 bits.  We used that
much space just in inotify_watches alone, before this patch set.
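
The size argument can be checked mechanically. The sketch below uses userspace stand-ins, assuming pthread_mutex_t in place of struct mutex and locally defined list_head/hlist_head with the kernel's pointer layout, to compare the per-inode fields before and after fsnotify. Exact byte counts depend on the architecture and C library.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

/* Stand-ins with the same pointer layout as the kernel list heads. */
struct list_head { struct list_head *next, *prev; };
struct hlist_node;
struct hlist_head { struct hlist_node *first; };
struct dnotify_struct;

/* Per-inode notification fields before fsnotify. */
struct old_notify_fields {
	unsigned long i_dnotify_mask;
	struct dnotify_struct *i_dnotify;
	struct list_head inotify_watches;
	pthread_mutex_t inotify_mutex;	/* stand-in for struct mutex */
};

/* Per-inode notification fields with fsnotify. */
struct new_notify_fields {
	uint32_t i_fsnotify_mask;
	struct hlist_head i_fsnotify_mark_entries;
};

/* The new fields take no more room than inotify_watches alone did. */
static_assert(sizeof(struct new_notify_fields) <= sizeof(struct list_head),
	      "fsnotify fields larger than one list_head");
```

On a typical 64-bit build the new pair is padded to the size of one list_head, while the old set also carried a whole mutex.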

fsnotify object lifetime and locking is MUCH better than what we have today.
inotify locking is incredibly complex.  See 8f7b0ba1c8539 as an example of
what's been busted since inception.  inotify needs to know internal semantics
of superblock destruction and unmounting to function.  The inode pinning and
vfs contortions are horrible.

No fsnotify implementers do allocation under locks.  This means things like
f04b30de3, which (due to an overabundance of caution) changes GFP_KERNEL to
GFP_NOFS, can be reverted.  There are no longer any allocation rules when using
or implementing your own fsnotify listener.
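
fsnotify_obtain_group() in this patch shows the pattern that makes this possible: allocate first, then take the lock only to search and either publish the new object or throw it away. A minimal userspace sketch of that find-or-create step follows, assuming a pthread mutex in place of fsnotify_grp_mutex and a plain int refcount protected by it. obtain_group() and struct group here are hypothetical simplifications (no ops pointer, no RCU list).

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdint.h>
#include <stdlib.h>

struct group {
	unsigned int group_num;
	uint32_t mask;
	int refcnt;		/* protected by grp_lock in this model */
	struct group *next;
};

static pthread_mutex_t grp_lock = PTHREAD_MUTEX_INITIALIZER;
static struct group *groups;

/* Returns a referenced group, or NULL with *err set to -ENOMEM/-EEXIST. */
static struct group *obtain_group(unsigned int num, uint32_t mask, int *err)
{
	struct group *fresh, *g;

	assert(err != NULL);

	/* Allocate before taking the lock, as fsnotify_obtain_group() does. */
	fresh = malloc(sizeof(*fresh));
	if (!fresh) {
		*err = -ENOMEM;
		return NULL;
	}
	fresh->group_num = num;
	fresh->mask = mask;
	fresh->refcnt = 1;
	fresh->next = NULL;

	pthread_mutex_lock(&grp_lock);
	for (g = groups; g; g = g->next) {
		if (g->group_num != num)
			continue;
		if (g->mask != mask) {
			/* Same number, different definition: refuse. */
			g = NULL;
			*err = -EEXIST;
		} else {
			g->refcnt++;	/* reuse the existing group */
		}
		pthread_mutex_unlock(&grp_lock);
		free(fresh);	/* the speculative allocation was not needed */
		return g;
	}
	/* Not found: publish the new group. */
	fresh->next = groups;
	groups = fresh;
	pthread_mutex_unlock(&grp_lock);
	return fresh;
}
```

The cost is an occasional wasted allocation when the group already exists, which the comment above fsnotify_obtain_group() accepts because the function is rarely called.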

fsnotify paves the way for fanotify.  People may not care for the original
companies that pushed for TALPA, but fanotify was designed with flexibility in
mind.  The first group that wants fanotify-like interfaces is the readahead
group, so that they can profile at boot time in order to dynamically tune
readahead and improve boot speed.  I've got other ideas for how to use fanotify
to speed up boot when dealing with encrypted mounts, but I'm not ready to say
more yet since I don't know if my idea will work.

This patch series just builds fsnotify to the point that it can implement
dnotify and inotify_user.  Patches exist and will be sent soon after
acceptance to finish the in-kernel inotify conversion (audit) and to implement
fanotify.

Signed-off-by: Eric Paris <eparis@redhat.com>
---

 fs/notify/Kconfig                |   13 +++
 fs/notify/Makefile               |    2 
 fs/notify/fsnotify.c             |   79 ++++++++++++++++++
 fs/notify/fsnotify.h             |   17 ++++
 fs/notify/group.c                |  168 ++++++++++++++++++++++++++++++++++++++
 fs/notify/notification.c         |  116 ++++++++++++++++++++++++++
 include/linux/fsnotify.h         |  129 +++++++++++++++++++++--------
 include/linux/fsnotify_backend.h |  135 +++++++++++++++++++++++++++++++
 8 files changed, 622 insertions(+), 37 deletions(-)
 create mode 100644 fs/notify/fsnotify.c
 create mode 100644 fs/notify/fsnotify.h
 create mode 100644 fs/notify/group.c
 create mode 100644 fs/notify/notification.c
 create mode 100644 include/linux/fsnotify_backend.h

diff --git a/fs/notify/Kconfig b/fs/notify/Kconfig
index 50914d7..31dac7e 100644
--- a/fs/notify/Kconfig
+++ b/fs/notify/Kconfig
@@ -1,2 +1,15 @@
+config FSNOTIFY
+	bool "Filesystem notification backend"
+	default y
+	---help---
+	   fsnotify is a backend for filesystem notification.  fsnotify does
+	   not provide any userspace interface but does provide the basis
+	   needed for other notification schemes such as dnotify, inotify,
+	   and fanotify.
+
+	   Say Y here to enable fsnotify support.
+
+	   If unsure, say Y.
+
 source "fs/notify/dnotify/Kconfig"
 source "fs/notify/inotify/Kconfig"
diff --git a/fs/notify/Makefile b/fs/notify/Makefile
index 5a95b60..db5467b 100644
--- a/fs/notify/Makefile
+++ b/fs/notify/Makefile
@@ -1,2 +1,4 @@
+obj-$(CONFIG_FSNOTIFY)		+= fsnotify.o notification.o group.o
+
 obj-y			+= dnotify/
 obj-y			+= inotify/
diff --git a/fs/notify/fsnotify.c b/fs/notify/fsnotify.c
new file mode 100644
index 0000000..56bee0f
--- /dev/null
+++ b/fs/notify/fsnotify.c
@@ -0,0 +1,79 @@
+/*
+ *  Copyright (C) 2008 Red Hat, Inc., Eric Paris <eparis@redhat.com>
+ *
+ *  This program is free software; you can redistribute it and/or modify
+ *  it under the terms of the GNU General Public License as published by
+ *  the Free Software Foundation; either version 2, or (at your option)
+ *  any later version.
+ *
+ *  This program is distributed in the hope that it will be useful,
+ *  but WITHOUT ANY WARRANTY; without even the implied warranty of
+ *  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ *  GNU General Public License for more details.
+ *
+ *  You should have received a copy of the GNU General Public License
+ *  along with this program; see the file COPYING.  If not, write to
+ *  the Free Software Foundation, 675 Mass Ave, Cambridge, MA 02139, USA.
+ */
+
+#include <linux/dcache.h>
+#include <linux/fs.h>
+#include <linux/init.h>
+#include <linux/module.h>
+#include <linux/srcu.h>
+
+#include <linux/fsnotify_backend.h>
+#include "fsnotify.h"
+
+/*
+ * This is the main call to fsnotify.  The VFS calls into hook-specific functions
+ * in linux/fsnotify.h, and those functions in turn call here.  This function
+ * calls out to every registered fsnotify_group, and each group can then use
+ * the notification event in whatever way it sees fit.
+ */
+void fsnotify(struct inode *to_tell, __u32 mask, void *data, int data_is)
+{
+	struct fsnotify_group *group;
+	struct fsnotify_event *event = NULL;
+	int idx;
+
+	if (list_empty(&fsnotify_groups))
+		return;
+
+	if (!(mask & fsnotify_mask))
+		return;
+
+	/*
+	 * SRCU!!  the groups list is very very much read only and the path is
+	 * very hot.  The VAST majority of events are not going to need to do
+	 * anything other than walk the list so it's crazy to pre-allocate.
+	 */
+	idx = srcu_read_lock(&fsnotify_grp_srcu);
+	list_for_each_entry_rcu(group, &fsnotify_groups, group_list) {
+		if (mask & group->mask) {
+			if (!event) {
+				event = fsnotify_create_event(to_tell, mask, data, data_is);
+				/* shit, we OOM'd and now we can't tell, maybe
+				 * someday someone else will want to do something
+				 * here */
+				if (!event)
+					break;
+			}
+			group->ops->handle_event(group, event);
+		}
+	}
+	srcu_read_unlock(&fsnotify_grp_srcu, idx);
+	/*
+	 * fsnotify_create_event() took a reference so the event can't be cleaned
+	 * up while we are still trying to add it to lists, drop that one.
+	 */
+	if (event)
+		fsnotify_put_event(event);
+}
+EXPORT_SYMBOL_GPL(fsnotify);
+
+static __init int fsnotify_init(void)
+{
+	return init_srcu_struct(&fsnotify_grp_srcu);
+}
+subsys_initcall(fsnotify_init);
diff --git a/fs/notify/fsnotify.h b/fs/notify/fsnotify.h
new file mode 100644
index 0000000..bf41e60
--- /dev/null
+++ b/fs/notify/fsnotify.h
@@ -0,0 +1,17 @@
+#ifndef _LINUX_FSNOTIFY_PRIVATE_H
+#define _LINUX_FSNOTIFY_PRIVATE_H
+
+#include <linux/dcache.h>
+#include <linux/list.h>
+#include <linux/fs.h>
+#include <linux/path.h>
+#include <linux/spinlock.h>
+
+#include <linux/fsnotify.h>
+
+#include <asm/atomic.h>
+
+extern struct srcu_struct fsnotify_grp_srcu;
+extern struct list_head fsnotify_groups;
+extern __u32 fsnotify_mask;
+#endif	/* _LINUX_FSNOTIFY_PRIVATE_H */
diff --git a/fs/notify/group.c b/fs/notify/group.c
new file mode 100644
index 0000000..88d040b
--- /dev/null
+++ b/fs/notify/group.c
@@ -0,0 +1,168 @@
+/*
+ *  Copyright (C) 2008 Red Hat, Inc., Eric Paris <eparis@redhat.com>
+ *
+ *  This program is free software; you can redistribute it and/or modify
+ *  it under the terms of the GNU General Public License as published by
+ *  the Free Software Foundation; either version 2, or (at your option)
+ *  any later version.
+ *
+ *  This program is distributed in the hope that it will be useful,
+ *  but WITHOUT ANY WARRANTY; without even the implied warranty of
+ *  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ *  GNU General Public License for more details.
+ *
+ *  You should have received a copy of the GNU General Public License
+ *  along with this program; see the file COPYING.  If not, write to
+ *  the Free Software Foundation, 675 Mass Ave, Cambridge, MA 02139, USA.
+ */
+
+#include <linux/list.h>
+#include <linux/mutex.h>
+#include <linux/slab.h>
+#include <linux/srcu.h>
+#include <linux/rculist.h>
+#include <linux/wait.h>
+
+#include <linux/fsnotify_backend.h>
+#include "fsnotify.h"
+
+#include <asm/atomic.h>
+
+DEFINE_MUTEX(fsnotify_grp_mutex);
+struct srcu_struct fsnotify_grp_srcu;
+LIST_HEAD(fsnotify_groups);
+__u32 fsnotify_mask;
+
+void fsnotify_recalc_global_mask(void)
+{
+	struct fsnotify_group *group;
+	__u32 mask = 0;
+	int idx;
+
+	idx = srcu_read_lock(&fsnotify_grp_srcu);
+	list_for_each_entry_rcu(group, &fsnotify_groups, group_list) {
+		mask |= group->mask;
+	}
+	srcu_read_unlock(&fsnotify_grp_srcu, idx);
+	fsnotify_mask = mask;
+}
+
+static void fsnotify_add_group(struct fsnotify_group *group)
+{
+	list_add_rcu(&group->group_list, &fsnotify_groups);
+	group->evicted = 0;
+}
+
+static void fsnotify_get_group(struct fsnotify_group *group)
+{
+	atomic_inc(&group->refcnt);
+}
+
+static void fsnotify_destroy_group(struct fsnotify_group *group)
+{
+	if (group->ops->free_group_priv)
+		group->ops->free_group_priv(group);
+
+	kfree(group);
+}
+
+static void __fsnotify_evict_group(struct fsnotify_group *group)
+{
+	BUG_ON(!mutex_is_locked(&fsnotify_grp_mutex));
+
+	if (!group->evicted)
+		list_del_rcu(&group->group_list);
+	group->evicted = 1;
+}
+
+void fsnotify_evict_group(struct fsnotify_group *group)
+{
+	mutex_lock(&fsnotify_grp_mutex);
+	__fsnotify_evict_group(group);
+	mutex_unlock(&fsnotify_grp_mutex);
+}
+
+void fsnotify_put_group(struct fsnotify_group *group)
+{
+	if (!atomic_dec_and_mutex_lock(&group->refcnt, &fsnotify_grp_mutex))
+		return;
+
+	/* OK, now we know that there's no other users *and* we hold mutex,
+	 * so no new references will appear */
+	__fsnotify_evict_group(group);
+
+	/* now it's off the list, so the only thing we might care about is
+	 * srcu access.... */
+	mutex_unlock(&fsnotify_grp_mutex);
+	synchronize_srcu(&fsnotify_grp_srcu);
+
+	/* and now it is really dead. _Nothing_ could be seeing it */
+	fsnotify_recalc_global_mask();
+	fsnotify_destroy_group(group);
+}
+
+static struct fsnotify_group *fsnotify_find_group(unsigned int group_num, __u32 mask,
+						  const struct fsnotify_ops *ops)
+{
+	struct fsnotify_group *group_iter;
+	struct fsnotify_group *group = NULL;
+
+	BUG_ON(!mutex_is_locked(&fsnotify_grp_mutex));
+
+	list_for_each_entry_rcu(group_iter, &fsnotify_groups, group_list) {
+		if (group_iter->group_num == group_num) {
+			if ((group_iter->mask == mask) &&
+			    (group_iter->ops == ops)) {
+				fsnotify_get_group(group_iter);
+				group = group_iter;
+			} else
+				group = ERR_PTR(-EEXIST);
+		}
+	}
+	return group;
+}
+
+/*
+ * Either finds an existing group which matches the group_num, mask, and ops or
+ * creates a new group and adds it to the global group list.  In either case we
+ * take a reference for the group returned.
+ *
+ * low use function, could be faster to check if the group is there before we do
+ * the allocation and the initialization, but this is only called when notification
+ * systems make changes, so why make it more complex?
+ */
+struct fsnotify_group *fsnotify_obtain_group(unsigned int group_num, __u32 mask,
+					     const struct fsnotify_ops *ops)
+{
+	struct fsnotify_group *group, *tgroup;
+
+	group = kmalloc(sizeof(struct fsnotify_group), GFP_KERNEL);
+	if (!group)
+		return ERR_PTR(-ENOMEM);
+
+	atomic_set(&group->refcnt, 1);
+	group->evicted = 1;	/* not on fsnotify_groups until added */
+	group->group_num = group_num;
+	group->mask = mask;
+
+	group->ops = ops;
+
+	mutex_lock(&fsnotify_grp_mutex);
+	tgroup = fsnotify_find_group(group_num, mask, ops);
+	if (tgroup) {
+		/* group already exists */
+		mutex_unlock(&fsnotify_grp_mutex);
+		/* destroy the new one we made */
+		fsnotify_put_group(group);
+		return tgroup;
+	}
+
+	/* group not found, add a new one */
+	fsnotify_add_group(group);
+	mutex_unlock(&fsnotify_grp_mutex);
+
+	if (mask)
+		fsnotify_recalc_global_mask();
+
+	return group;
+}
diff --git a/fs/notify/notification.c b/fs/notify/notification.c
new file mode 100644
index 0000000..eb23a69
--- /dev/null
+++ b/fs/notify/notification.c
@@ -0,0 +1,116 @@
+/*
+ *  Copyright (C) 2008 Red Hat, Inc., Eric Paris <eparis@redhat.com>
+ *
+ *  This program is free software; you can redistribute it and/or modify
+ *  it under the terms of the GNU General Public License as published by
+ *  the Free Software Foundation; either version 2, or (at your option)
+ *  any later version.
+ *
+ *  This program is distributed in the hope that it will be useful,
+ *  but WITHOUT ANY WARRANTY; without even the implied warranty of
+ *  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ *  GNU General Public License for more details.
+ *
+ *  You should have received a copy of the GNU General Public License
+ *  along with this program; see the file COPYING.  If not, write to
+ *  the Free Software Foundation, 675 Mass Ave, Cambridge, MA 02139, USA.
+ */
+
+#include <linux/fs.h>
+#include <linux/init.h>
+#include <linux/kernel.h>
+#include <linux/list.h>
+#include <linux/mount.h>
+#include <linux/mutex.h>
+#include <linux/namei.h>
+#include <linux/path.h>
+#include <linux/slab.h>
+#include <linux/spinlock.h>
+
+#include <asm/atomic.h>
+
+#include <linux/fsnotify_backend.h>
+#include "fsnotify.h"
+
+static struct kmem_cache *event_kmem_cache;
+
+void fsnotify_get_event(struct fsnotify_event *event)
+{
+	atomic_inc(&event->refcnt);
+}
+
+void fsnotify_put_event(struct fsnotify_event *event)
+{
+	if (!event)
+		return;
+
+	if (atomic_dec_and_test(&event->refcnt)) {
+		if (event->data_type == FSNOTIFY_EVENT_PATH) {
+			path_put(&event->path);
+			event->path.dentry = NULL;
+			event->path.mnt = NULL;
+		}
+
+		event->mask = 0;
+
+		kmem_cache_free(event_kmem_cache, event);
+	}
+}
+
+struct fsnotify_event *fsnotify_create_event(struct inode *to_tell, __u32 mask, void *data, int data_type)
+{
+	struct fsnotify_event *event;
+
+	event = kmem_cache_alloc(event_kmem_cache, GFP_KERNEL);
+	if (!event)
+		return NULL;
+
+	atomic_set(&event->refcnt, 1);
+
+	spin_lock_init(&event->lock);
+
+	event->path.dentry = NULL;
+	event->path.mnt = NULL;
+	event->inode = NULL;
+
+	event->to_tell = to_tell;
+
+	switch (data_type) {
+	case FSNOTIFY_EVENT_FILE: {
+		struct file *file = data;
+		struct path *path = &file->f_path;
+		event->path.dentry = path->dentry;
+		event->path.mnt = path->mnt;
+		path_get(&event->path);
+		event->data_type = FSNOTIFY_EVENT_PATH;
+		break;
+	}
+	case FSNOTIFY_EVENT_PATH: {
+		struct path *path = data;
+		event->path.dentry = path->dentry;
+		event->path.mnt = path->mnt;
+		path_get(&event->path);
+		event->data_type = FSNOTIFY_EVENT_PATH;
+		break;
+	}
+	case FSNOTIFY_EVENT_INODE:
+		event->inode = data;
+		event->data_type = FSNOTIFY_EVENT_INODE;
+		break;
+	default:
+		BUG();
+	}
+
+	event->mask = mask;
+
+	return event;
+}
+
+__init int fsnotify_notification_init(void)
+{
+	event_kmem_cache = kmem_cache_create("fsnotify_event", sizeof(struct fsnotify_event), 0, SLAB_PANIC, NULL);
+
+	return 0;
+}
+subsys_initcall(fsnotify_notification_init);
+
diff --git a/include/linux/fsnotify.h b/include/linux/fsnotify.h
index 00fbd5b..3d68058 100644
--- a/include/linux/fsnotify.h
+++ b/include/linux/fsnotify.h
@@ -13,6 +13,7 @@
 
 #include <linux/dnotify.h>
 #include <linux/inotify.h>
+#include <linux/fsnotify_backend.h>
 #include <linux/audit.h>
 
 /*
@@ -35,6 +36,17 @@ static inline void fsnotify_d_move(struct dentry *entry)
 }
 
 /*
+ * fsnotify_inoderemove - an inode is going away
+ */
+static inline void fsnotify_inoderemove(struct inode *inode)
+{
+	inotify_inode_queue_event(inode, IN_DELETE_SELF, 0, NULL, NULL);
+	inotify_inode_is_dead(inode);
+
+	fsnotify(inode, FS_DELETE_SELF, inode, FSNOTIFY_EVENT_INODE);
+}
+
+/*
  * fsnotify_move - file old_name at old_dir was moved to new_name at new_dir
  */
 static inline void fsnotify_move(struct inode *old_dir, struct inode *new_dir,
@@ -43,28 +55,42 @@ static inline void fsnotify_move(struct inode *old_dir, struct inode *new_dir,
 {
 	struct inode *source = moved->d_inode;
 	u32 cookie = inotify_get_cookie();
+	__u32 old_dir_mask = 0;
+	__u32 new_dir_mask = 0;
 
-	if (old_dir == new_dir)
+	if (old_dir == new_dir) {
 		inode_dir_notify(old_dir, DN_RENAME);
-	else {
+		old_dir_mask = FS_DN_RENAME;
+	} else {
 		inode_dir_notify(old_dir, DN_DELETE);
+		old_dir_mask = FS_DELETE;
 		inode_dir_notify(new_dir, DN_CREATE);
+		new_dir_mask = FS_CREATE;
 	}
 
-	if (isdir)
+	if (isdir) {
 		isdir = IN_ISDIR;
+		old_dir_mask |= FS_IN_ISDIR;
+		new_dir_mask |= FS_IN_ISDIR;
+	}
+
+	old_dir_mask |= FS_MOVED_FROM;
+	new_dir_mask |= FS_MOVED_TO;
+
 	inotify_inode_queue_event(old_dir, IN_MOVED_FROM|isdir,cookie,old_name,
 				  source);
 	inotify_inode_queue_event(new_dir, IN_MOVED_TO|isdir, cookie, new_name,
 				  source);
 
-	if (target) {
-		inotify_inode_queue_event(target, IN_DELETE_SELF, 0, NULL, NULL);
-		inotify_inode_is_dead(target);
-	}
+	fsnotify(old_dir, old_dir_mask, old_dir, FSNOTIFY_EVENT_INODE);
+	fsnotify(new_dir, new_dir_mask, new_dir, FSNOTIFY_EVENT_INODE);
+
+	if (target)
+		fsnotify_inoderemove(target);
 
 	if (source) {
 		inotify_inode_queue_event(source, IN_MOVE_SELF, 0, NULL, NULL);
+		fsnotify(source, FS_MOVE_SELF, moved->d_inode, FSNOTIFY_EVENT_INODE);
 	}
 	audit_inode_child(new_name, moved, new_dir);
 }
@@ -75,26 +101,19 @@ static inline void fsnotify_move(struct inode *old_dir, struct inode *new_dir,
 static inline void fsnotify_nameremove(struct dentry *dentry, int isdir)
 {
 	if (isdir)
-		isdir = IN_ISDIR;
+		isdir = FS_IN_ISDIR;
 	dnotify_parent(dentry, DN_DELETE);
 	inotify_dentry_parent_queue_event(dentry, IN_DELETE|isdir, 0, dentry->d_name.name);
 }
 
 /*
- * fsnotify_inoderemove - an inode is going away
- */
-static inline void fsnotify_inoderemove(struct inode *inode)
-{
-	inotify_inode_queue_event(inode, IN_DELETE_SELF, 0, NULL, NULL);
-	inotify_inode_is_dead(inode);
-}
-
-/*
  * fsnotify_link_count - inode's link count changed
  */
 static inline void fsnotify_link_count(struct inode *inode)
 {
 	inotify_inode_queue_event(inode, IN_ATTRIB, 0, NULL, NULL);
+
+	fsnotify(inode, FS_ATTRIB, inode, FSNOTIFY_EVENT_INODE);
 }
 
 /*
@@ -106,6 +125,8 @@ static inline void fsnotify_create(struct inode *inode, struct dentry *dentry)
 	inotify_inode_queue_event(inode, IN_CREATE, 0, dentry->d_name.name,
 				  dentry->d_inode);
 	audit_inode_child(dentry->d_name.name, dentry, inode);
+
+	fsnotify(inode, FS_CREATE, dentry->d_inode, FSNOTIFY_EVENT_INODE);
 }
 
 /*
@@ -120,6 +141,8 @@ static inline void fsnotify_link(struct inode *dir, struct inode *inode, struct
 				  inode);
 	fsnotify_link_count(inode);
 	audit_inode_child(new_dentry->d_name.name, new_dentry, dir);
+
+	fsnotify(dir, FS_CREATE, inode, FSNOTIFY_EVENT_INODE);
 }
 
 /*
@@ -131,6 +154,8 @@ static inline void fsnotify_mkdir(struct inode *inode, struct dentry *dentry)
 	inotify_inode_queue_event(inode, IN_CREATE | IN_ISDIR, 0, 
 				  dentry->d_name.name, dentry->d_inode);
 	audit_inode_child(dentry->d_name.name, dentry, inode);
+
+	fsnotify(inode, FS_CREATE | FS_IN_ISDIR, dentry->d_inode, FSNOTIFY_EVENT_INODE);
 }
 
 /*
@@ -139,14 +164,16 @@ static inline void fsnotify_mkdir(struct inode *inode, struct dentry *dentry)
 static inline void fsnotify_access(struct dentry *dentry)
 {
 	struct inode *inode = dentry->d_inode;
-	u32 mask = IN_ACCESS;
+	__u32 mask = FS_ACCESS;
 
 	if (S_ISDIR(inode->i_mode))
-		mask |= IN_ISDIR;
+		mask |= FS_IN_ISDIR;
 
 	dnotify_parent(dentry, DN_ACCESS);
 	inotify_dentry_parent_queue_event(dentry, mask, 0, dentry->d_name.name);
 	inotify_inode_queue_event(inode, mask, 0, NULL, NULL);
+
+	fsnotify(inode, mask, inode, FSNOTIFY_EVENT_INODE);
 }
 
 /*
@@ -155,14 +182,16 @@ static inline void fsnotify_access(struct dentry *dentry)
 static inline void fsnotify_modify(struct dentry *dentry)
 {
 	struct inode *inode = dentry->d_inode;
-	u32 mask = IN_MODIFY;
+	__u32 mask = FS_MODIFY;
 
 	if (S_ISDIR(inode->i_mode))
-		mask |= IN_ISDIR;
+		mask |= FS_IN_ISDIR;
 
 	dnotify_parent(dentry, DN_MODIFY);
 	inotify_dentry_parent_queue_event(dentry, mask, 0, dentry->d_name.name);
 	inotify_inode_queue_event(inode, mask, 0, NULL, NULL);
+
+	fsnotify(inode, mask, inode, FSNOTIFY_EVENT_INODE);
 }
 
 /*
@@ -171,13 +200,15 @@ static inline void fsnotify_modify(struct dentry *dentry)
 static inline void fsnotify_open(struct dentry *dentry)
 {
 	struct inode *inode = dentry->d_inode;
-	u32 mask = IN_OPEN;
+	__u32 mask = FS_OPEN;
 
 	if (S_ISDIR(inode->i_mode))
-		mask |= IN_ISDIR;
+		mask |= FS_IN_ISDIR;
 
 	inotify_dentry_parent_queue_event(dentry, mask, 0, dentry->d_name.name);
 	inotify_inode_queue_event(inode, mask, 0, NULL, NULL);
+
+	fsnotify(inode, mask, inode, FSNOTIFY_EVENT_INODE);
 }
 
 /*
@@ -189,13 +220,15 @@ static inline void fsnotify_close(struct file *file)
 	struct inode *inode = dentry->d_inode;
 	const char *name = dentry->d_name.name;
 	fmode_t mode = file->f_mode;
-	u32 mask = (mode & FMODE_WRITE) ? IN_CLOSE_WRITE : IN_CLOSE_NOWRITE;
+	__u32 mask = (mode & FMODE_WRITE) ? FS_CLOSE_WRITE : FS_CLOSE_NOWRITE;
 
 	if (S_ISDIR(inode->i_mode))
-		mask |= IN_ISDIR;
+		mask |= FS_IN_ISDIR;
 
 	inotify_dentry_parent_queue_event(dentry, mask, 0, name);
 	inotify_inode_queue_event(inode, mask, 0, NULL, NULL);
+
+	fsnotify(inode, mask, file, FSNOTIFY_EVENT_FILE);
 }
 
 /*
@@ -204,13 +237,15 @@ static inline void fsnotify_close(struct file *file)
 static inline void fsnotify_xattr(struct dentry *dentry)
 {
 	struct inode *inode = dentry->d_inode;
-	u32 mask = IN_ATTRIB;
+	__u32 mask = FS_ATTRIB;
 
 	if (S_ISDIR(inode->i_mode))
-		mask |= IN_ISDIR;
+		mask |= FS_IN_ISDIR;
 
 	inotify_dentry_parent_queue_event(dentry, mask, 0, dentry->d_name.name);
 	inotify_inode_queue_event(inode, mask, 0, NULL, NULL);
+
+	fsnotify(inode, mask, inode, FSNOTIFY_EVENT_INODE);
 }
 
 /*
@@ -224,31 +259,31 @@ static inline void fsnotify_change(struct dentry *dentry, unsigned int ia_valid)
 	u32 in_mask = 0;
 
 	if (ia_valid & ATTR_UID) {
-		in_mask |= IN_ATTRIB;
+		in_mask |= FS_ATTRIB;
 		dn_mask |= DN_ATTRIB;
 	}
 	if (ia_valid & ATTR_GID) {
-		in_mask |= IN_ATTRIB;
+		in_mask |= FS_ATTRIB;
 		dn_mask |= DN_ATTRIB;
 	}
 	if (ia_valid & ATTR_SIZE) {
-		in_mask |= IN_MODIFY;
+		in_mask |= FS_MODIFY;
 		dn_mask |= DN_MODIFY;
 	}
 	/* both times implies a utime(s) call */
 	if ((ia_valid & (ATTR_ATIME | ATTR_MTIME)) == (ATTR_ATIME | ATTR_MTIME))
 	{
-		in_mask |= IN_ATTRIB;
+		in_mask |= FS_ATTRIB;
 		dn_mask |= DN_ATTRIB;
 	} else if (ia_valid & ATTR_ATIME) {
-		in_mask |= IN_ACCESS;
+		in_mask |= FS_ACCESS;
 		dn_mask |= DN_ACCESS;
 	} else if (ia_valid & ATTR_MTIME) {
-		in_mask |= IN_MODIFY;
+		in_mask |= FS_MODIFY;
 		dn_mask |= DN_MODIFY;
 	}
 	if (ia_valid & ATTR_MODE) {
-		in_mask |= IN_ATTRIB;
+		in_mask |= FS_ATTRIB;
 		dn_mask |= DN_ATTRIB;
 	}
 
@@ -256,20 +291,40 @@ static inline void fsnotify_change(struct dentry *dentry, unsigned int ia_valid)
 		dnotify_parent(dentry, dn_mask);
 	if (in_mask) {
 		if (S_ISDIR(inode->i_mode))
-			in_mask |= IN_ISDIR;
+			in_mask |= FS_IN_ISDIR;
 		inotify_inode_queue_event(inode, in_mask, 0, NULL, NULL);
 		inotify_dentry_parent_queue_event(dentry, in_mask, 0,
 						  dentry->d_name.name);
+		fsnotify(inode, in_mask, inode, FSNOTIFY_EVENT_INODE);
 	}
 }
 
-#ifdef CONFIG_INOTIFY	/* inotify helpers */
+#if defined(CONFIG_INOTIFY) || defined(CONFIG_FSNOTIFY)	/* notify helpers */
 
 /*
  * fsnotify_oldname_init - save off the old filename before we change it
  */
 static inline const char *fsnotify_oldname_init(const char *name)
 {
+	BUILD_BUG_ON(IN_ACCESS != FS_ACCESS);
+	BUILD_BUG_ON(IN_MODIFY != FS_MODIFY);
+	BUILD_BUG_ON(IN_ATTRIB != FS_ATTRIB);
+	BUILD_BUG_ON(IN_CLOSE_WRITE != FS_CLOSE_WRITE);
+	BUILD_BUG_ON(IN_CLOSE_NOWRITE != FS_CLOSE_NOWRITE);
+	BUILD_BUG_ON(IN_OPEN != FS_OPEN);
+	BUILD_BUG_ON(IN_MOVED_FROM != FS_MOVED_FROM);
+	BUILD_BUG_ON(IN_MOVED_TO != FS_MOVED_TO);
+	BUILD_BUG_ON(IN_CREATE != FS_CREATE);
+	BUILD_BUG_ON(IN_DELETE != FS_DELETE);
+	BUILD_BUG_ON(IN_DELETE_SELF != FS_DELETE_SELF);
+	BUILD_BUG_ON(IN_MOVE_SELF != FS_MOVE_SELF);
+	BUILD_BUG_ON(IN_Q_OVERFLOW != FS_Q_OVERFLOW);
+
+	BUILD_BUG_ON(IN_UNMOUNT != FS_UNMOUNT);
+	BUILD_BUG_ON(IN_ISDIR != FS_IN_ISDIR);
+	BUILD_BUG_ON(IN_IGNORED != FS_IN_IGNORED);
+	BUILD_BUG_ON(IN_ONESHOT != FS_IN_ONESHOT);
+
 	return kstrdup(name, GFP_KERNEL);
 }
 
@@ -281,7 +336,7 @@ static inline void fsnotify_oldname_free(const char *old_name)
 	kfree(old_name);
 }
 
-#else	/* CONFIG_INOTIFY */
+#else	/* CONFIG_INOTIFY || CONFIG_FSNOTIFY */
 
 static inline const char *fsnotify_oldname_init(const char *name)
 {
diff --git a/include/linux/fsnotify_backend.h b/include/linux/fsnotify_backend.h
new file mode 100644
index 0000000..0523333
--- /dev/null
+++ b/include/linux/fsnotify_backend.h
@@ -0,0 +1,135 @@
+/*
+ * Filesystem access notification for Linux
+ *
+ *  Copyright (C) 2008 Red Hat, Inc., Eric Paris <eparis@redhat.com>
+ */
+
+#ifndef _LINUX_FSNOTIFY_BACKEND_H
+#define _LINUX_FSNOTIFY_BACKEND_H
+
+#ifdef __KERNEL__
+
+#include <linux/fs.h> /* struct inode */
+#include <linux/list.h>
+#include <linux/path.h> /* struct path */
+#include <linux/spinlock.h>
+#include <linux/wait.h>
+
+#include <asm/atomic.h>
+
+/*
+ * IN_* from inotify.h lines up EXACTLY with FS_* so that we can easily
+ * convert between them.  dnotify only needs conversion at watch creation,
+ * so there is no perf loss there.  fanotify isn't defined yet, so it can use
+ * the holes if it needs more events.
+ */
+#define FS_ACCESS		0x00000001ul	/* File was accessed */
+#define FS_MODIFY		0x00000002ul	/* File was modified */
+#define FS_ATTRIB		0x00000004ul	/* Metadata changed */
+#define FS_CLOSE_WRITE		0x00000008ul	/* Writable file was closed */
+#define FS_CLOSE_NOWRITE	0x00000010ul	/* Unwritable file closed */
+#define FS_OPEN			0x00000020ul	/* File was opened */
+#define FS_MOVED_FROM		0x00000040ul	/* File was moved from X */
+#define FS_MOVED_TO		0x00000080ul	/* File was moved to Y */
+#define FS_CREATE		0x00000100ul	/* Subfile was created */
+#define FS_DELETE		0x00000200ul	/* Subfile was deleted */
+#define FS_DELETE_SELF		0x00000400ul	/* Self was deleted */
+#define FS_MOVE_SELF		0x00000800ul	/* Self was moved */
+
+#define FS_UNMOUNT		0x00002000ul	/* inode on umount fs */
+#define FS_Q_OVERFLOW		0x00004000ul	/* Event queued overflowed */
+#define FS_IN_IGNORED		0x00008000ul	/* last inotify event here */
+
+#define FS_IN_ISDIR		0x40000000ul	/* event occurred against dir */
+#define FS_IN_ONESHOT		0x80000000ul	/* only send event once */
+
+#define FS_DN_RENAME		0x10000000ul	/* file renamed */
+#define FS_DN_MULTISHOT		0x20000000ul	/* dnotify multishot */
+
+#define FS_EVENT_ON_CHILD	0x08000000ul
+
+struct fsnotify_group;
+struct fsnotify_event;
+
+/*
+ * Each group must define these ops.
+ *
+ * handle_event - main call for a group to handle an fs event
+ * free_group_priv - called when a group refcnt hits 0 to clean up the private union
+ */
+struct fsnotify_ops {
+	int (*handle_event)(struct fsnotify_group *group, struct fsnotify_event *event);
+	void (*free_group_priv)(struct fsnotify_group *group);
+};
+
+/*
+ * A group is a "thing" that wants to receive notification about filesystem
+ * events.  The mask holds the subset of event types this group cares about.
+ * refcnt on a group is up to the implementor, and if it ever goes to 0
+ * everything will be cleaned up.
+ */
+struct fsnotify_group {
+	struct list_head group_list;	/* list of all groups on the system */
+	__u32 mask;			/* mask of events this group cares about */
+	atomic_t refcnt;		/* num of processes with a special file open */
+	unsigned int group_num;		/* the 'name' of the event */
+
+	const struct fsnotify_ops *ops;	/* how this group handles things */
+
+	unsigned int evicted:1;		/* has this group been evicted? */
+
+	/* groups can define private fields here */
+	union {
+	};
+};
+
+/*
+ * all of the information about the original object we want to now send to
+ * a group.  If you want to carry more info from the accessing task to the
+ * listener this structure is where you need to be adding fields.
+ */
+struct fsnotify_event {
+	spinlock_t lock;	/* protection for the associated event_holder and private_list */
+	struct inode *to_tell;
+	/*
+	 * depending on the event type we should have either a path or inode
+	 * we should never have more than one....
+	 */
+	union {
+		struct path path;
+		struct inode *inode;
+	};
+/* when calling fsnotify tell it if the data is a path or inode */
+#define FSNOTIFY_EVENT_PATH	1
+#define FSNOTIFY_EVENT_INODE	2
+#define FSNOTIFY_EVENT_FILE	3
+	int data_type;		/* which of the above union we have */
+	atomic_t refcnt;	/* how many groups still are using/need to send this event */
+	__u32 mask;		/* the type of access */
+};
+
+#ifdef CONFIG_FSNOTIFY
+
+/* called from the vfs to signal fs events */
+extern void fsnotify(struct inode *to_tell, __u32 mask, void *data, int data_is);
+
+/* called from fsnotify interfaces, such as fanotify or dnotify */
+extern void fsnotify_recalc_global_mask(void);
+extern struct fsnotify_group *fsnotify_obtain_group(unsigned int group_num, __u32 mask, const struct fsnotify_ops *ops);
+extern void fsnotify_put_group(struct fsnotify_group *group);
+
+extern void fsnotify_get_event(struct fsnotify_event *event);
+extern void fsnotify_put_event(struct fsnotify_event *event);
+extern struct fsnotify_event_private_data *fsnotify_get_priv_from_event(struct fsnotify_group *group, struct fsnotify_event *event);
+
+/* put here because inotify does some weird stuff when destroying watches */
+extern struct fsnotify_event *fsnotify_create_event(struct inode *to_tell, __u32 mask, void *data, int data_is);
+#else
+
+static inline void fsnotify(struct inode *to_tell, __u32 mask, void *data, int data_is);
+{}
+#endif	/* CONFIG_FSNOTIFY */
+
+#endif	/* __KERNEL __ */
+
+#endif	/* _LINUX_FSNOTIFY_BACKEND_H */



* [PATCH -V2 03/13] fsnotify: add group priorities
  2009-03-27 20:05 [PATCH -V2 01/13] mutex: add atomic_dec_and_mutex_lock Eric Paris
  2009-03-27 20:05 ` [PATCH -V2 02/13] fsnotify: unified filesystem notification backend Eric Paris
@ 2009-03-27 20:05 ` Eric Paris
  2009-04-07 23:06   ` Andrew Morton
  2009-03-27 20:05 ` [PATCH -V2 04/13] fsnotify: add in inode fsnotify markings Eric Paris
                   ` (10 subsequent siblings)
  12 siblings, 1 reply; 26+ messages in thread
From: Eric Paris @ 2009-03-27 20:05 UTC
  To: linux-kernel; +Cc: viro, hch, alan, sfr, john, rlove, akpm

This introduces an ordering to fsnotify groups.  This matters because a
hierarchical storage manager (HSM) would need to run before a typical access
notifier, and an access control system would need to run between the two.

Signed-off-by: Eric Paris <eparis@redhat.com>
---

 fs/notify/group.c                |   29 ++++++++++++++++++++++-------
 include/linux/fsnotify_backend.h |    4 +++-
 2 files changed, 25 insertions(+), 8 deletions(-)

diff --git a/fs/notify/group.c b/fs/notify/group.c
index 88d040b..dd1d18d 100644
--- a/fs/notify/group.c
+++ b/fs/notify/group.c
@@ -49,8 +49,21 @@ void fsnotify_recalc_global_mask(void)
 
 static void fsnotify_add_group(struct fsnotify_group *group)
 {
-	list_add_rcu(&group->group_list, &fsnotify_groups);
+	int priority = group->priority;
+	struct fsnotify_group *group_iter;
+
 	group->evicted = 0;
+	list_for_each_entry(group_iter, &fsnotify_groups, group_list) {
+		/* insert in front of this one? */
+		if (priority < group_iter->priority) {
+			/* I used list_add_tail() to insert in front of group_iter...  */
+			list_add_tail_rcu(&group->group_list, &group_iter->group_list);
+			return;
+		}
+	}
+
+	/* apparently we need to be the last entry */
+	list_add_tail_rcu(&group->group_list, &fsnotify_groups);
 }
 
 static void fsnotify_get_group(struct fsnotify_group *group)
@@ -101,8 +114,8 @@ void fsnotify_put_group(struct fsnotify_group *group)
 	fsnotify_destroy_group(group);
 }
 
-static struct fsnotify_group *fsnotify_find_group(unsigned int group_num, __u32 mask,
-						  const struct fsnotify_ops *ops)
+static struct fsnotify_group *fsnotify_find_group(unsigned int priority, unsigned int group_num,
+						  __u32 mask, const struct fsnotify_ops *ops)
 {
 	struct fsnotify_group *group_iter;
 	struct fsnotify_group *group = NULL;
@@ -110,8 +123,9 @@ static struct fsnotify_group *fsnotify_find_group(unsigned int group_num, __u32
 	BUG_ON(!mutex_is_locked(&fsnotify_grp_mutex));
 
 	list_for_each_entry_rcu(group_iter, &fsnotify_groups, group_list) {
-		if (group_iter->group_num == group_num) {
+		if (group_iter->priority == priority) {
 			if ((group_iter->mask == mask) &&
+			    (group_iter->group_num == group_num) &&
 			    (group_iter->ops == ops)) {
 				fsnotify_get_group(group_iter);
 				group = group_iter;
@@ -131,8 +145,8 @@ static struct fsnotify_group *fsnotify_find_group(unsigned int group_num, __u32
  * the allocation and the initialization, but this is only called when notification
  * systems make changes, so why make it more complex?
  */
-struct fsnotify_group *fsnotify_obtain_group(unsigned int group_num, __u32 mask,
-					     const struct fsnotify_ops *ops)
+struct fsnotify_group *fsnotify_obtain_group(unsigned int priority, unsigned int group_num,
+					     __u32 mask, const struct fsnotify_ops *ops)
 {
 	struct fsnotify_group *group, *tgroup;
 
@@ -142,13 +156,14 @@ struct fsnotify_group *fsnotify_obtain_group(unsigned int group_num, __u32 mask,
 
 	atomic_set(&group->refcnt, 1);
 
+	group->priority = priority;
 	group->group_num = group_num;
 	group->mask = mask;
 
 	group->ops = ops;
 
 	mutex_lock(&fsnotify_grp_mutex);
-	tgroup = fsnotify_find_group(group_num, mask, ops);
+	tgroup = fsnotify_find_group(priority, group_num, mask, ops);
 	if (tgroup) {
 		/* group already exists */
 		mutex_unlock(&fsnotify_grp_mutex);
diff --git a/include/linux/fsnotify_backend.h b/include/linux/fsnotify_backend.h
index 0523333..a349691 100644
--- a/include/linux/fsnotify_backend.h
+++ b/include/linux/fsnotify_backend.h
@@ -76,6 +76,7 @@ struct fsnotify_group {
 
 	const struct fsnotify_ops *ops;	/* how this group handles things */
 
+	unsigned int priority;		/* order this group should receive msgs.  low first */
 	unsigned int evicted:1;		/* has this group been evicted? */
 
 	/* groups can define private fields here */
@@ -115,7 +116,8 @@ extern void fsnotify(struct inode *to_tell, __u32 mask, void *data, int data_is)
 
 /* called from fsnotify interfaces, such as fanotify or dnotify */
 extern void fsnotify_recalc_global_mask(void);
-extern struct fsnotify_group *fsnotify_obtain_group(unsigned int group_num, __u32 mask, const struct fsnotify_ops *ops);
+extern struct fsnotify_group *fsnotify_obtain_group(unsigned int priority, unsigned int group_num,
+						    __u32 mask, const struct fsnotify_ops *ops);
 extern void fsnotify_put_group(struct fsnotify_group *group);
 
 extern void fsnotify_get_event(struct fsnotify_event *event);



* [PATCH -V2 04/13] fsnotify: add in inode fsnotify markings
  2009-03-27 20:05 [PATCH -V2 01/13] mutex: add atomic_dec_and_mutex_lock Eric Paris
  2009-03-27 20:05 ` [PATCH -V2 02/13] fsnotify: unified filesystem notification backend Eric Paris
  2009-03-27 20:05 ` [PATCH -V2 03/13] fsnotify: add group priorities Eric Paris
@ 2009-03-27 20:05 ` Eric Paris
  2009-04-07 23:06   ` Andrew Morton
  2009-03-27 20:05 ` [PATCH -V2 05/13] fsnotify: parent event notification Eric Paris
                   ` (9 subsequent siblings)
  12 siblings, 1 reply; 26+ messages in thread
From: Eric Paris @ 2009-03-27 20:05 UTC
  To: linux-kernel; +Cc: viro, hch, alan, sfr, john, rlove, akpm

This patch creates in-inode fsnotify markings.  dnotify will use in-inode
markings to record which inodes it wishes to send events for, and fanotify
will use them to record which inodes it does not wish to send events for.

Signed-off-by: Eric Paris <eparis@redhat.com>
---

 Documentation/filesystems/fsnotify.txt |  180 +++++++++++++++++++++++++
 fs/inode.c                             |    9 +
 fs/notify/Makefile                     |    2 
 fs/notify/fsnotify.c                   |   10 +
 fs/notify/fsnotify.h                   |    3 
 fs/notify/group.c                      |   33 ++++-
 fs/notify/inode_mark.c                 |  229 ++++++++++++++++++++++++++++++++
 include/linux/fs.h                     |    5 +
 include/linux/fsnotify.h               |    9 +
 include/linux/fsnotify_backend.h       |   58 ++++++++
 10 files changed, 535 insertions(+), 3 deletions(-)
 create mode 100644 Documentation/filesystems/fsnotify.txt
 create mode 100644 fs/notify/inode_mark.c

diff --git a/Documentation/filesystems/fsnotify.txt b/Documentation/filesystems/fsnotify.txt
new file mode 100644
index 0000000..e1c90f5
--- /dev/null
+++ b/Documentation/filesystems/fsnotify.txt
@@ -0,0 +1,180 @@
+fsnotify inode mark locking, lifetime, and refcounting
+
+struct fsnotify_mark_entry {
+        __u32 mask;                     /* mask this mark entry is for */
+        /* we hold ref for each i_list and g_list.  also one ref for each 'thing'
+         * in kernel that found and may be using this mark. */
+        atomic_t refcnt;                /* active things looking at this mark */
+        struct inode *inode;            /* inode this entry is associated with */
+        struct fsnotify_group *group;   /* group this mark entry is for */
+        struct hlist_node i_list;       /* list of mark_entries by inode->i_fsnotify_mark_entries */
+        struct list_head g_list;        /* list of mark_entries by group->mark_entries */
+        spinlock_t lock;                /* protect group and inode */
+        struct list_head free_i_list;   /* tmp list used when freeing this mark */
+        struct list_head free_g_list;   /* tmp list used when freeing this mark */
+        void (*free_mark)(struct fsnotify_mark_entry *entry); /* called on final put+free */
+};
+
+REFCNT:
+The mark->refcnt tells how many "things" in the kernel currently are
+referencing this object.  The object typically will live inside the kernel
+with a refcnt of 2, one for each list it is on (i_list, g_list).  Any task
+that finds this object while holding the appropriate locks can take a reference,
+and the object itself is guaranteed to survive until that reference is dropped.
+
+LOCKING:
+There are 3 spinlocks involved with fsnotify inode marks and they MUST
+be taken in the following order:
+
+entry->lock
+group->mark_lock
+inode->i_lock
+
+entry->lock protects 2 things, entry->group and entry->inode.  You must hold
+that lock to dereference either of them (and even with the lock held, either
+may be NULL).
+
+group->mark_lock protects the mark_entries list anchored inside a given group;
+each entry is hooked onto it via g_list.  It also loosely protects
+free_g_list, which, when in use, is anchored on a private list on the stack of
+the task that held group->mark_lock.
+
+inode->i_lock protects the i_fsnotify_mark_entries list anchored inside a
+given inode; each entry is hooked onto it via i_list.  It also loosely
+protects free_i_list in the same way.
+
+
+LIFETIME:
+Inode marks live from the time they are added to an inode until their
+refcnt reaches 0.
+
+The inode mark can be cleared for a number of different reasons including:
+- The inode is unlinked for the last time.  (fsnotify_inoderemove)
+- The inode is being evicted from cache. (fsnotify_inode_delete)
+- The fs the inode is on is unmounted.  (fsnotify_inode_delete/fsnotify_unmount_inodes)
+- Something explicitly requests that it be removed.  (fsnotify_destroy_mark_by_entry)
+- The fsnotify_group associated with the mark is going away and all such marks
+  need to be cleaned up. (fsnotify_clear_marks_by_group)
+
+Worst case, we are given an inode and need to clean up all the marks on that
+inode.  We take i_lock and walk i_fsnotify_mark_entries safely.  For each
+mark on the list we take a reference (so the mark can't disappear under us).
+We remove that mark from the inode's list of marks and add it to a
+private list anchored on the stack using free_i_list.  At this point nothing
+can find the mark through the inode's list of marks any more.
+
+We can then safely walk the private list on the stack of everything we just
+detached from the inode without holding i_lock.  For each mark on the private
+list we take mark->lock and can thus dereference mark->group and mark->inode.
+If the group and inode are not NULL we take their locks too.  Holding all
+3 locks, we can remove the mark so that no other task can find it in the
+future.  Remember, 10 other things might already be referencing this mark, but
+each of them must be holding a ref.  We then drop the reference we took before
+unhooking the mark from the inode.  When the refcnt hits 0 the mark is freed.
+
+Freeing by group works the same way, except that it uses free_g_list.
+
+This has the useful property that it can run concurrently with teardown from
+any (or all) of the other directions.  Let's walk through what happens when
+several tasks simultaneously try to mark this entry for destruction.
+
+(A) finds this mark by some means and takes a reference.  (This could be any
+means, including, in the case of inotify, through an idr, which is known to be
+safe since the idr entry itself holds a reference.)
+(B) finds this mark by some means and takes a reference.
+
+At this point.
+	refcnt == 4
+	i_list -> inode
+	inode -> inode
+	g_list -> group
+	group -> group
+	free_i_list -> NULL
+	free_g_list -> NULL
+
+(C) comes in and tries to free all of the fsnotify marks attached to an inode.
+---- C  takes the i_lock and walks the i_fsnotify_mark_entries list, calling
+	hlist_del_init() on i_list, adding the entry to its private list via
+	free_i_list, and taking a reference.  C releases the i_lock, starts
+	walking the private list, and blocks on entry->lock (held by A
+	below).
+
+At this point.
+	refcnt == 5
+	i_list -> NULL
+	inode -> inode
+	g_list -> group
+	group -> group
+	free_i_list -> private list on (C) stack
+	free_g_list -> NULL
+
+(D) comes in and tries to free all of the marks attached to the same inode.
+---- D  takes the i_lock, does not find this entry on the list, and does
+	nothing.  (This is the end of D.)
+
+(E) comes along and wants to free all of the marks in the group.
+---- E  takes group->mark_lock and walks group->mark_entries.  It grabs a
+	reference to the mark, calls list_del_init() on g_list, and adds the mark
+	to its private list via free_g_list.  It releases group->mark_lock, starts
+	walking the new private list, and blocks on entry->lock.
+
+**This is actually the point where the kernel can no longer find this mark**
+
+At this point.
+	refcnt == 6
+	i_list -> NULL
+	inode -> inode
+	g_list -> NULL
+	group -> group
+	free_i_list -> private list on (C) stack
+	free_g_list -> private list on (E) stack
+
+(A) finally decides it wants to kill this entry for some reason.
+---- A  takes entry->lock.  If mark->group is non-NULL it takes
+	mark->group->mark_lock (it may have blocked here on E above).  If
+	->inode is set it takes mark->inode->i_lock (again, it may have been
+	blocking on C).  Now owning all the locks, it calls hlist_del_init() on
+	i_list and list_del_init() on g_list, sets ->inode and ->group to NULL,
+	and drops the two list references.  It unlocks i_lock, mark_lock, and
+	entry->lock, then drops its own reference.  (This is the end of A.)
+
+**With a different sequence of events, this could instead be the point where
+the object can no longer be found**
+
+At this point.
+	refcnt == 3
+	i_list -> NULL
+	inode -> NULL
+	g_list -> NULL
+	group -> NULL
+	free_i_list -> private list on (C) stack
+	free_g_list -> private list on (E) stack
+
+(E) happens to be the one to win the entry->lock.
+---- E  sees that ->inode and ->group are NULL, so it doesn't bother to
+	grab those locks (if they are NULL we know this mark is off the
+	relevant lists).  E has nothing else to do: the mark is already
+	off the lists, so all it needs to do is drop its reference.
+
+At this point.
+	refcnt == 2
+	i_list -> NULL
+	inode -> NULL
+	g_list -> NULL
+	group -> NULL
+	free_i_list -> private list on (C) stack
+	free_g_list -> undefined
+
+(C) does the same thing as E, and the mark now looks like:
+
+At this point.
+	refcnt == 1
+	i_list -> NULL
+	inode -> NULL
+	g_list -> NULL
+	group -> NULL
+	free_i_list -> undefined
+	free_g_list -> undefined
+
+(B) is the only thing left with a reference; when it drops that reference, the
+object will be freed.
diff --git a/fs/inode.c b/fs/inode.c
index f2e0f3d..6a9a98e 100644
--- a/fs/inode.c
+++ b/fs/inode.c
@@ -22,6 +22,7 @@
 #include <linux/cdev.h>
 #include <linux/bootmem.h>
 #include <linux/inotify.h>
+#include <linux/fsnotify.h>
 #include <linux/mount.h>
 #include <linux/async.h>
 
@@ -189,6 +190,10 @@ struct inode *inode_init_always(struct super_block *sb, struct inode *inode)
 	inode->i_private = NULL;
 	inode->i_mapping = mapping;
 
+#ifdef CONFIG_FSNOTIFY
+	inode->i_fsnotify_mask = 0;
+#endif
+
 	return inode;
 
 out_free_security:
@@ -220,6 +225,7 @@ void destroy_inode(struct inode *inode)
 {
 	BUG_ON(inode_has_buffers(inode));
 	security_inode_free(inode);
+	fsnotify_inode_delete(inode);
 	if (inode->i_sb->s_op->destroy_inode)
 		inode->i_sb->s_op->destroy_inode(inode);
 	else
@@ -251,6 +257,9 @@ void inode_init_once(struct inode *inode)
 	INIT_LIST_HEAD(&inode->inotify_watches);
 	mutex_init(&inode->inotify_mutex);
 #endif
+#ifdef CONFIG_FSNOTIFY
+	INIT_HLIST_HEAD(&inode->i_fsnotify_mark_entries);
+#endif
 }
 
 EXPORT_SYMBOL(inode_init_once);
diff --git a/fs/notify/Makefile b/fs/notify/Makefile
index db5467b..0922cc8 100644
--- a/fs/notify/Makefile
+++ b/fs/notify/Makefile
@@ -1,4 +1,4 @@
-obj-$(CONFIG_FSNOTIFY)		+= fsnotify.o notification.o group.o
+obj-$(CONFIG_FSNOTIFY)		+= fsnotify.o notification.o group.o inode_mark.o
 
 obj-y			+= dnotify/
 obj-y			+= inotify/
diff --git a/fs/notify/fsnotify.c b/fs/notify/fsnotify.c
index 56bee0f..4cc2d46 100644
--- a/fs/notify/fsnotify.c
+++ b/fs/notify/fsnotify.c
@@ -25,6 +25,12 @@
 #include <linux/fsnotify_backend.h>
 #include "fsnotify.h"
 
+void __fsnotify_inode_delete(struct inode *inode)
+{
+	fsnotify_clear_marks_by_inode(inode);
+}
+EXPORT_SYMBOL_GPL(__fsnotify_inode_delete);
+
 /*
  * This is the main call to fsnotify.  The VFS calls into hook specific functions
  * in linux/fsnotify.h.  Those functions then in turn call here.  Here will call
@@ -43,6 +49,8 @@ void fsnotify(struct inode *to_tell, __u32 mask, void *data, int data_is)
 	if (!(mask & fsnotify_mask))
 		return;
 
+	if (!(mask & to_tell->i_fsnotify_mask))
+		return;
 	/*
 	 * SRCU!!  the groups list is very very much read only and the path is
 	 * very hot.  The VAST majority of events are not going to need to do
@@ -51,6 +59,8 @@ void fsnotify(struct inode *to_tell, __u32 mask, void *data, int data_is)
 	idx = srcu_read_lock(&fsnotify_grp_srcu);
 	list_for_each_entry_rcu(group, &fsnotify_groups, group_list) {
 		if (mask & group->mask) {
+			if (!group->ops->should_send_event(group, to_tell, mask))
+				continue;
 			if (!event) {
 				event = fsnotify_create_event(to_tell, mask, data, data_is);
 				/* shit, we OOM'd and now we can't tell, maybe
diff --git a/fs/notify/fsnotify.h b/fs/notify/fsnotify.h
index bf41e60..48d4372 100644
--- a/fs/notify/fsnotify.h
+++ b/fs/notify/fsnotify.h
@@ -14,4 +14,7 @@
 extern struct srcu_struct fsnotify_grp_srcu;
 extern struct list_head fsnotify_groups;
 extern __u32 fsnotify_mask;
+
+extern void fsnotify_final_destroy_group(struct fsnotify_group *group);
+extern void fsnotify_clear_marks_by_inode(struct inode *inode);
 #endif	/* _LINUX_FSNOTIFY_PRIVATE_H */
diff --git a/fs/notify/group.c b/fs/notify/group.c
index dd1d18d..b6b32fa 100644
--- a/fs/notify/group.c
+++ b/fs/notify/group.c
@@ -47,6 +47,24 @@ void fsnotify_recalc_global_mask(void)
 	fsnotify_mask = mask;
 }
 
+void fsnotify_recalc_group_mask(struct fsnotify_group *group)
+{
+	__u32 mask = 0;
+	__u32 old_mask = group->mask;
+	struct fsnotify_mark_entry *entry;
+
+	spin_lock(&group->mark_lock);
+	list_for_each_entry(entry, &group->mark_entries, g_list) {
+		mask |= entry->mask;
+	}
+	spin_unlock(&group->mark_lock);
+
+	group->mask = mask;
+
+	if (old_mask != mask)
+		fsnotify_recalc_global_mask();
+}
+
 static void fsnotify_add_group(struct fsnotify_group *group)
 {
 	int priority = group->priority;
@@ -71,13 +89,22 @@ static void fsnotify_get_group(struct fsnotify_group *group)
 	atomic_inc(&group->refcnt);
 }
 
-static void fsnotify_destroy_group(struct fsnotify_group *group)
+void fsnotify_final_destroy_group(struct fsnotify_group *group)
 {
 	if (group->ops->free_group_priv)
 		group->ops->free_group_priv(group);
 
 	kfree(group);
 }
+static void fsnotify_destroy_group(struct fsnotify_group *group)
+{
+	/* clear all inode mark entries for this group */
+	fsnotify_clear_marks_by_group(group);
+
+	/* past the point of no return, matches the initial value of 1 */
+	if (atomic_dec_and_test(&group->num_marks))
+		fsnotify_final_destroy_group(group);
+}
 
 static void __fsnotify_evict_group(struct fsnotify_group *group)
 {
@@ -160,6 +187,10 @@ struct fsnotify_group *fsnotify_obtain_group(unsigned int priority, unsigned int
 	group->group_num = group_num;
 	group->mask = mask;
 
+	spin_lock_init(&group->mark_lock);
+	atomic_set(&group->num_marks, 1);
+	INIT_LIST_HEAD(&group->mark_entries);
+
 	group->ops = ops;
 
 	mutex_lock(&fsnotify_grp_mutex);
diff --git a/fs/notify/inode_mark.c b/fs/notify/inode_mark.c
new file mode 100644
index 0000000..0271e65
--- /dev/null
+++ b/fs/notify/inode_mark.c
@@ -0,0 +1,229 @@
+/*
+ *  Copyright (C) 2008 Red Hat, Inc., Eric Paris <eparis@redhat.com>
+ *
+ *  This program is free software; you can redistribute it and/or modify
+ *  it under the terms of the GNU General Public License as published by
+ *  the Free Software Foundation; either version 2, or (at your option)
+ *  any later version.
+ *
+ *  This program is distributed in the hope that it will be useful,
+ *  but WITHOUT ANY WARRANTY; without even the implied warranty of
+ *  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ *  GNU General Public License for more details.
+ *
+ *  You should have received a copy of the GNU General Public License
+ *  along with this program; see the file COPYING.  If not, write to
+ *  the Free Software Foundation, 675 Mass Ave, Cambridge, MA 02139, USA.
+ */
+
+#include <linux/fs.h>
+#include <linux/init.h>
+#include <linux/kernel.h>
+#include <linux/module.h>
+#include <linux/mutex.h>
+#include <linux/slab.h>
+#include <linux/spinlock.h>
+
+#include <asm/atomic.h>
+
+#include <linux/fsnotify_backend.h>
+#include "fsnotify.h"
+
+void fsnotify_get_mark(struct fsnotify_mark_entry *entry)
+{
+	atomic_inc(&entry->refcnt);
+}
+
+void fsnotify_put_mark(struct fsnotify_mark_entry *entry)
+{
+	if (atomic_dec_and_test(&entry->refcnt))
+		entry->free_mark(entry);
+}
+
+/*
+ * recalculate the mask of events relevant to a given inode; i_lock must be held.
+ */
+static void fsnotify_recalc_inode_mask_locked(struct inode *inode)
+{
+	struct fsnotify_mark_entry *entry;
+	struct hlist_node *pos;
+	__u32 new_mask = 0;
+
+	assert_spin_locked(&inode->i_lock);
+
+	hlist_for_each_entry(entry, pos, &inode->i_fsnotify_mark_entries, i_list) {
+		new_mask |= entry->mask;
+	}
+	inode->i_fsnotify_mask = new_mask;
+}
+
+/*
+ * recalculate the mask of events relevant to a given inode.
+ */
+void fsnotify_recalc_inode_mask(struct inode *inode)
+{
+	spin_lock(&inode->i_lock);
+	fsnotify_recalc_inode_mask_locked(inode);
+	spin_unlock(&inode->i_lock);
+}
+
+/*
+ * Any time a mark is getting freed we end up here.
+ * The caller had better be holding a reference to this mark so we don't actually
+ * do the final put under the entry->lock
+ */
+void fsnotify_destroy_mark_by_entry(struct fsnotify_mark_entry *entry)
+{
+	struct fsnotify_group *group;
+	struct inode *inode;
+
+	spin_lock(&entry->lock);
+
+	group = entry->group;
+	inode = entry->inode;
+
+	BUG_ON(group && !inode);
+	BUG_ON(!group && inode);
+
+	/* if !group something else already marked this to die */
+	if (!group) {
+		spin_unlock(&entry->lock);
+		return;
+	}
+
+	/* this just tests that the caller held a reference */
+	if (unlikely(atomic_read(&entry->refcnt) < 3))
+		BUG();
+
+	spin_lock(&group->mark_lock);
+	spin_lock(&inode->i_lock);
+
+	hlist_del_init(&entry->i_list);
+	entry->inode = NULL;
+	fsnotify_put_mark(entry); /* for i_list */
+
+	list_del_init(&entry->g_list);
+	entry->group = NULL;
+	fsnotify_put_mark(entry); /* for g_list */
+
+	fsnotify_recalc_inode_mask_locked(inode);
+
+	spin_unlock(&inode->i_lock);
+	spin_unlock(&group->mark_lock);
+	spin_unlock(&entry->lock);
+
+	group->ops->freeing_mark(entry, group);
+
+	if (atomic_dec_and_test(&group->num_marks))
+		fsnotify_final_destroy_group(group);
+}
+
+void fsnotify_clear_marks_by_group(struct fsnotify_group *group)
+{
+	struct fsnotify_mark_entry *lentry, *entry;
+	LIST_HEAD(free_list);
+
+	spin_lock(&group->mark_lock);
+	list_for_each_entry_safe(entry, lentry, &group->mark_entries, g_list) {
+		list_add(&entry->free_g_list, &free_list);
+		list_del_init(&entry->g_list);
+		fsnotify_get_mark(entry);
+	}
+	spin_unlock(&group->mark_lock);
+
+	list_for_each_entry_safe(entry, lentry, &free_list, free_g_list) {
+		fsnotify_destroy_mark_by_entry(entry);
+		fsnotify_put_mark(entry);
+	}
+}
+
+void fsnotify_clear_marks_by_inode(struct inode *inode)
+{
+	struct fsnotify_mark_entry *entry, *lentry;
+	struct hlist_node *pos, *n;
+	LIST_HEAD(free_list);
+
+	spin_lock(&inode->i_lock);
+	hlist_for_each_entry_safe(entry, pos, n, &inode->i_fsnotify_mark_entries, i_list) {
+		list_add(&entry->free_i_list, &free_list);
+		hlist_del_init(&entry->i_list);
+		fsnotify_get_mark(entry);
+	}
+	spin_unlock(&inode->i_lock);
+
+	list_for_each_entry_safe(entry, lentry, &free_list, free_i_list) {
+		fsnotify_destroy_mark_by_entry(entry);
+		fsnotify_put_mark(entry);
+	}
+}
+
+struct fsnotify_mark_entry *fsnotify_find_mark_entry(struct fsnotify_group *group, struct inode *inode)
+{
+	struct fsnotify_mark_entry *entry;
+	struct hlist_node *pos;
+
+	assert_spin_locked(&inode->i_lock);
+
+	hlist_for_each_entry(entry, pos, &inode->i_fsnotify_mark_entries, i_list) {
+		if (entry->group == group) {
+			fsnotify_get_mark(entry);
+			return entry;
+		}
+	}
+	return NULL;
+}
+
+void fsnotify_init_mark(struct fsnotify_mark_entry *entry, void (*free_mark)(struct fsnotify_mark_entry *entry))
+
+{
+	spin_lock_init(&entry->lock);
+	atomic_set(&entry->refcnt, 1);
+	INIT_HLIST_NODE(&entry->i_list);
+	entry->group = NULL;
+	entry->mask = 0;
+	entry->inode = NULL;
+	entry->free_mark = free_mark;
+}
+
+int fsnotify_add_mark(struct fsnotify_mark_entry *entry, struct fsnotify_group *group, struct inode *inode)
+{
+	struct fsnotify_mark_entry *lentry;
+	int ret = 0;
+
+	/*
+	 * LOCKING ORDER!!!!
+	 * entry->lock
+	 * group->mark_lock
+	 * inode->i_lock
+	 */
+	spin_lock(&entry->lock);
+	spin_lock(&group->mark_lock);
+	spin_lock(&inode->i_lock);
+
+	entry->group = group;
+	entry->inode = inode;
+
+	lentry = fsnotify_find_mark_entry(group, inode);
+	if (!lentry) {
+		hlist_add_head(&entry->i_list, &inode->i_fsnotify_mark_entries);
+		list_add(&entry->g_list, &group->mark_entries);
+
+		fsnotify_get_mark(entry); /* for i_list */
+		fsnotify_get_mark(entry); /* for g_list */
+
+		atomic_inc(&group->num_marks);
+
+		fsnotify_recalc_inode_mask_locked(inode);
+	}
+
+	spin_unlock(&inode->i_lock);
+	spin_unlock(&group->mark_lock);
+	spin_unlock(&entry->lock);
+
+	if (lentry) {
+		ret = -EEXIST;
+		fsnotify_put_mark(lentry);
+	}
+
+	return ret;
+}
diff --git a/include/linux/fs.h b/include/linux/fs.h
index b228538..d391ab4 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -696,6 +696,11 @@ struct inode {
 
 	__u32			i_generation;
 
+#ifdef CONFIG_FSNOTIFY
+	__u32			i_fsnotify_mask; /* all events this inode cares about */
+	struct hlist_head	i_fsnotify_mark_entries; /* fsnotify mark entries */
+#endif
+
 #ifdef CONFIG_DNOTIFY
 	unsigned long		i_dnotify_mask; /* Directory notify events */
 	struct dnotify_struct	*i_dnotify; /* for directory notifications */
diff --git a/include/linux/fsnotify.h b/include/linux/fsnotify.h
index 3d68058..4e04ab2 100644
--- a/include/linux/fsnotify.h
+++ b/include/linux/fsnotify.h
@@ -36,6 +36,14 @@ static inline void fsnotify_d_move(struct dentry *entry)
 }
 
 /*
+ * fsnotify_inode_delete - an inode is being evicted from cache, cleanup is needed
+ */
+static inline void fsnotify_inode_delete(struct inode *inode)
+{
+	__fsnotify_inode_delete(inode);
+}
+
+/*
  * fsnotify_inoderemove - an inode is going away
  */
 static inline void fsnotify_inoderemove(struct inode *inode)
@@ -44,6 +52,7 @@ static inline void fsnotify_inoderemove(struct inode *inode)
 	inotify_inode_is_dead(inode);
 
 	fsnotify(inode, FS_DELETE_SELF, inode, FSNOTIFY_EVENT_INODE);
+	__fsnotify_inode_delete(inode);
 }
 
 /*
diff --git a/include/linux/fsnotify_backend.h b/include/linux/fsnotify_backend.h
index a349691..fc71e88 100644
--- a/include/linux/fsnotify_backend.h
+++ b/include/linux/fsnotify_backend.h
@@ -13,7 +13,6 @@
 #include <linux/list.h>
 #include <linux/path.h> /* struct path */
 #include <linux/spinlock.h>
-#include <linux/wait.h>
 
 #include <asm/atomic.h>
 
@@ -50,16 +49,24 @@
 
 struct fsnotify_group;
 struct fsnotify_event;
+struct fsnotify_mark_entry;
 
 /*
  * Each group much define these ops.
  *
+ * should_send_event - given a group, inode, and mask this function determines
+ *		if the group is interested in this event.
  * handle_event - main call for a group to handle an fs event
  * free_group_priv - called when a group refcnt hits 0 to clean up the private union
+ * freeing_mark - this means that a mark has been flagged to die when everything
+ *		finishes using it.  The function is supplied with the mark and what
+ *		must be a valid group to use to clean up.
  */
 struct fsnotify_ops {
+	int (*should_send_event)(struct fsnotify_group *group, struct inode *inode, __u32 mask);
 	int (*handle_event)(struct fsnotify_group *group, struct fsnotify_event *event);
 	void (*free_group_priv)(struct fsnotify_group *group);
+	void (*freeing_mark)(struct fsnotify_mark_entry *entry, struct fsnotify_group *group);
 };
 
 /*
@@ -76,6 +83,13 @@ struct fsnotify_group {
 
 	const struct fsnotify_ops *ops;	/* how this group handles things */
 
+	/* stores all fastpath entries assoc with this group so they can be cleaned on unregister */
+	spinlock_t mark_lock;		/* protect mark_entries list */
+	atomic_t num_marks;		/* 1 for each mark entry and 1 for not being
+					 * past the point of no return when freeing
+					 * a group */
+	struct list_head mark_entries;	/* all inode mark entries for this group */
+
 	unsigned int priority;		/* order this group should receive msgs.  low first */
 	unsigned int evicted:1;		/* has this group been evicted? */
 
@@ -109,13 +123,40 @@ struct fsnotify_event {
 	__u32 mask;		/* the type of access */
 };
 
+/*
+ * a mark is simply an entry attached to an in core inode which allows an
+ * fsnotify listener to indicate they are either no longer interested in events
+ * of a type matching mask or only interested in those events.
+ *
+ * these are flushed when an inode is evicted from core and may be flushed
+ * when the inode is modified (as seen by fsnotify_access).  Some fsnotify users
+ * (such as dnotify) will flush these when the open fd is closed and not at
+ * inode eviction or modification.
+ */
+struct fsnotify_mark_entry {
+	__u32 mask;			/* mask this mark entry is for */
+	/* we hold ref for each i_list and g_list.  also one ref for each 'thing'
+	 * in kernel that found and may be using this mark. */
+	atomic_t refcnt;		/* active things looking at this mark */
+	struct inode *inode;		/* inode this entry is associated with */
+	struct fsnotify_group *group;	/* group this mark entry is for */
+	struct hlist_node i_list;	/* list of mark_entries by inode->i_fsnotify_mark_entries */
+	struct list_head g_list;	/* list of mark_entries by group->mark_entries */
+	spinlock_t lock;		/* protect group and inode */
+	struct list_head free_i_list;	/* tmp list used when freeing this mark */
+	struct list_head free_g_list;	/* tmp list used when freeing this mark */
+	void (*free_mark)(struct fsnotify_mark_entry *entry); /* called on final put+free */
+};
+
 #ifdef CONFIG_FSNOTIFY
 
 /* called from the vfs to signal fs events */
 extern void fsnotify(struct inode *to_tell, __u32 mask, void *data, int data_is);
+extern void __fsnotify_inode_delete(struct inode *inode);
 
 /* called from fsnotify interfaces, such as fanotify or dnotify */
 extern void fsnotify_recalc_global_mask(void);
+extern void fsnotify_recalc_group_mask(struct fsnotify_group *group);
 extern struct fsnotify_group *fsnotify_obtain_group(unsigned int priority, unsigned int group_num,
 						    __u32 mask, const struct fsnotify_ops *ops);
 extern void fsnotify_put_group(struct fsnotify_group *group);
@@ -124,12 +165,27 @@ extern void fsnotify_get_event(struct fsnotify_event *event);
 extern void fsnotify_put_event(struct fsnotify_event *event);
 extern struct fsnotify_event_private_data *fsnotify_get_priv_from_event(struct fsnotify_group *group, struct fsnotify_event *event);
 
+/* functions used to manipulate the marks attached to inodes */
+extern void fsnotify_recalc_inode_mask(struct inode *inode);
+extern void fsnotify_init_mark(struct fsnotify_mark_entry *entry, void (*free_mark)(struct fsnotify_mark_entry *entry));
+extern struct fsnotify_mark_entry *fsnotify_find_mark_entry(struct fsnotify_group *group, struct inode *inode);
+extern int fsnotify_add_mark(struct fsnotify_mark_entry *entry, struct fsnotify_group *group, struct inode *inode);
+extern void fsnotify_destroy_mark_by_entry(struct fsnotify_mark_entry *entry);
+extern void fsnotify_clear_marks_by_group(struct fsnotify_group *group);
+extern void fsnotify_get_mark(struct fsnotify_mark_entry *entry);
+extern void fsnotify_put_mark(struct fsnotify_mark_entry *entry);
+
 /* put here because inotify does some weird stuff when destroying watches */
 extern struct fsnotify_event *fsnotify_create_event(struct inode *to_tell, __u32 mask, void *data, int data_is);
+
 #else
 
 static inline void fsnotify(struct inode *to_tell, __u32 mask, void *data, int data_is);
 {}
+
+static inline void __fsnotify_inode_delete(struct inode *inode)
+{}
+
 #endif	/* CONFIG_FSNOTIFY */
 
 #endif	/* __KERNEL __ */
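The mark lifetime described in the struct fsnotify_mark_entry comment above works as follows. Each list holds one reference, each in-kernel user that found the mark holds another, and free_mark runs on the final put. The following is a minimal userspace sketch of that lifetime, assuming C11 atomics in place of the kernel's atomic_t. The names struct mark, init_mark(), get_mark(), put_mark() and free_mark_cb() are hypothetical stand-ins, not the fsnotify API.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical userspace stand-in for struct fsnotify_mark_entry: only the
 * reference count and the owner's free callback are modelled. */
struct mark {
	atomic_int refcnt;
	void (*free_mark)(struct mark *m);
};

int freed;	/* how many times the free callback has run */

/* example owner callback, playing the role of dnotify_free_mark() */
void free_mark_cb(struct mark *m)
{
	freed++;
	free(m);
}

/* like fsnotify_init_mark(): whoever creates the mark holds one reference */
void init_mark(struct mark *m, void (*free_fn)(struct mark *))
{
	atomic_init(&m->refcnt, 1);
	m->free_mark = free_fn;
}

/* taken by every list or user that keeps a pointer to the mark */
void get_mark(struct mark *m)
{
	atomic_fetch_add(&m->refcnt, 1);
}

/* dropping the last reference hands the memory back to the owner */
void put_mark(struct mark *m)
{
	int old = atomic_fetch_sub(&m->refcnt, 1);

	assert(old > 0);	/* a put without a matching get is a bug */
	if (old == 1)
		m->free_mark(m);
}
```

In this model, a lookup such as fsnotify_find_mark_entry() corresponds to get_mark(). That pairing is why callers that look a mark up finish with a matching fsnotify_put_mark().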



* [PATCH -V2 05/13] fsnotify: parent event notification
  2009-03-27 20:05 [PATCH -V2 01/13] mutex: add atomic_dec_and_mutex_lock Eric Paris
                   ` (2 preceding siblings ...)
  2009-03-27 20:05 ` [PATCH -V2 04/13] fsnotify: add in inode fsnotify markings Eric Paris
@ 2009-03-27 20:05 ` Eric Paris
  2009-04-07 23:06   ` Andrew Morton
  2009-03-27 20:05 ` [PATCH -V2 06/13] dnotify: reimplement dnotify using fsnotify Eric Paris
                   ` (8 subsequent siblings)
  12 siblings, 1 reply; 26+ messages in thread
From: Eric Paris @ 2009-03-27 20:05 UTC (permalink / raw)
  To: linux-kernel; +Cc: viro, hch, alan, sfr, john, rlove, akpm

inotify and dnotify both use a similar parent notification mechanism.  We
add a generic parent notification mechanism to fsnotify for both of these
to use.  This new mechanism also brings to dnotify the dentry flag
optimization that already exists for inotify.
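The dentry flag optimization caches, in each child, whether its parent directory watches child events. An event on the child can then skip the parent lookup entirely when the flag is clear. This is a minimal userspace sketch of the idea, assuming simplified fake_inode/fake_dentry structs and a hypothetical PARENT_WATCHED bit standing in for DCACHE_FSNOTIFY_PARENT_WATCHED. FS_EVENT_ON_CHILD comes from this series. The other FS_* values are assumed to match the inotify-compatible bits. The sketch compares only event bits in the final check, so it is not a line-for-line copy of fsnotify_parent().

```c
#include <assert.h>
#include <stdint.h>

/* FS_EVENT_ON_CHILD is from this series; the rest are assumed values */
#define FS_ACCESS		0x00000001u
#define FS_MODIFY		0x00000002u
#define FS_ATTRIB		0x00000004u
#define FS_CLOSE_WRITE		0x00000008u
#define FS_CLOSE_NOWRITE	0x00000010u
#define FS_OPEN			0x00000020u
#define FS_MOVED_FROM		0x00000040u
#define FS_MOVED_TO		0x00000080u
#define FS_CREATE		0x00000100u
#define FS_DELETE		0x00000200u
#define FS_EVENT_ON_CHILD	0x08000000u

#define FS_EVENTS_POSS_ON_CHILD (FS_ACCESS | FS_MODIFY | FS_ATTRIB | \
				 FS_CLOSE_WRITE | FS_CLOSE_NOWRITE | FS_OPEN | \
				 FS_MOVED_FROM | FS_MOVED_TO | FS_CREATE | \
				 FS_DELETE)

/* hypothetical stand-in for DCACHE_FSNOTIFY_PARENT_WATCHED */
#define PARENT_WATCHED 0x0080u

struct fake_inode  { uint32_t fsnotify_mask; };
struct fake_dentry { unsigned int d_flags; struct fake_inode *parent; };

/* same test as fsnotify_inode_watches_children() */
int watches_children(const struct fake_inode *dir)
{
	if (!(dir->fsnotify_mask & FS_EVENT_ON_CHILD))
		return 0;
	return (dir->fsnotify_mask & FS_EVENTS_POSS_ON_CHILD) != 0;
}

/* cache the answer in the child, as fsnotify_update_dentry_child_flags() does */
void update_child_flag(struct fake_dentry *child)
{
	if (watches_children(child->parent))
		child->d_flags |= PARENT_WATCHED;
	else
		child->d_flags &= ~PARENT_WATCHED;
}

/* fast path: a bitwise AND tests the flag; an OR would always be non-zero */
int should_notify_parent(const struct fake_dentry *child, uint32_t mask)
{
	if (!(child->d_flags & PARENT_WATCHED))
		return 0;
	return (child->parent->fsnotify_mask & mask) != 0;
}
```

The flag test in the fast path must use a bitwise AND. Written with `|`, the expression is always non-zero, and the early return never happens.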

Signed-off-by: Eric Paris <eparis@redhat.com>
---

 fs/notify/inode_mark.c           |   48 +++++++++++++++++++++++++++++++++
 include/linux/dcache.h           |    3 +-
 include/linux/fsnotify.h         |   56 ++++++++++++++++++++++++++++++++++++--
 include/linux/fsnotify_backend.h |   43 +++++++++++++++++++++++++++++
 4 files changed, 146 insertions(+), 4 deletions(-)

diff --git a/fs/notify/inode_mark.c b/fs/notify/inode_mark.c
index 0271e65..e59a198 100644
--- a/fs/notify/inode_mark.c
+++ b/fs/notify/inode_mark.c
@@ -41,6 +41,48 @@ void fsnotify_put_mark(struct fsnotify_mark_entry *entry)
 }
 
 /*
+ * Given an inode, first check if we care what happens to our children.  Inotify
+ * and dnotify both tell their parents about events.  If we care about any event
+ * on a child we walk all of our children and set a dentry flag saying that the
+ * parent cares.  Thus when an event happens on a child it can quickly tell
+ * if there is a need to find a parent and send the event to the parent.
+ */
+static inline void fsnotify_update_dentry_child_flags(struct inode *inode)
+{
+	struct dentry *alias;
+	int watched;
+
+	if (!S_ISDIR(inode->i_mode))
+		return;
+
+	/* determine if the children should tell inode about their events */
+	watched = fsnotify_inode_watches_children(inode);
+
+	spin_lock(&dcache_lock);
+	/* run all of the dentries associated with this inode.  Since this is a
+	 * directory, there damn well better only be one item on this list */
+	list_for_each_entry(alias, &inode->i_dentry, d_alias) {
+		struct dentry *child;
+
+		/* run all of the children of the original inode and fix their
+		 * d_flags to indicate parental interest (their parent is the
+		 * original inode) */
+		list_for_each_entry(child, &alias->d_subdirs, d_u.d_child) {
+			if (!child->d_inode)
+				continue;
+
+			spin_lock(&child->d_lock);
+			if (watched)
+				child->d_flags |= DCACHE_FSNOTIFY_PARENT_WATCHED;
+			else
+				child->d_flags &= ~DCACHE_FSNOTIFY_PARENT_WATCHED;
+			spin_unlock(&child->d_lock);
+		}
+	}
+	spin_unlock(&dcache_lock);
+}
+
+/*
  * recalculate the mask of events relevant to a given inode locked.
  */
 static void fsnotify_recalc_inode_mask_locked(struct inode *inode)
@@ -65,6 +107,8 @@ void fsnotify_recalc_inode_mask(struct inode *inode)
 	spin_lock(&inode->i_lock);
 	fsnotify_recalc_inode_mask_locked(inode);
 	spin_unlock(&inode->i_lock);
+
+	fsnotify_update_dentry_child_flags(inode);
 }
 
 /*
@@ -114,6 +158,8 @@ void fsnotify_destroy_mark_by_entry(struct fsnotify_mark_entry *entry)
 
 	group->ops->freeing_mark(entry, group);
 
+	fsnotify_update_dentry_child_flags(inode);
+
 	if (atomic_dec_and_test(&group->num_marks))
 		fsnotify_final_destroy_group(group);
 }
@@ -223,6 +269,8 @@ int fsnotify_add_mark(struct fsnotify_mark_entry *entry, struct fsnotify_group *
 	if (lentry) {
 		ret = -EEXIST;
 		fsnotify_put_mark(lentry);
+	} else {
+		fsnotify_update_dentry_child_flags(inode);
 	}
 
 	return ret;
diff --git a/include/linux/dcache.h b/include/linux/dcache.h
index c66d224..2b935f2 100644
--- a/include/linux/dcache.h
+++ b/include/linux/dcache.h
@@ -180,7 +180,8 @@ d_iput:		no		no		no       yes
 #define DCACHE_REFERENCED	0x0008  /* Recently used, don't discard. */
 #define DCACHE_UNHASHED		0x0010	
 
-#define DCACHE_INOTIFY_PARENT_WATCHED	0x0020 /* Parent inode is watched */
+#define DCACHE_INOTIFY_PARENT_WATCHED	0x0020 /* Parent inode is watched by inotify */
+#define DCACHE_FSNOTIFY_PARENT_WATCHED	0x0080 /* Parent inode is watched by some fsnotify listener */
 
 #define DCACHE_COOKIE		0x0040	/* For use by dcookie subsystem */
 
diff --git a/include/linux/fsnotify.h b/include/linux/fsnotify.h
index 4e04ab2..7b45f29 100644
--- a/include/linux/fsnotify.h
+++ b/include/linux/fsnotify.h
@@ -20,10 +20,43 @@
  * fsnotify_d_instantiate - instantiate a dentry for inode
  * Called with dcache_lock held.
  */
-static inline void fsnotify_d_instantiate(struct dentry *entry,
-						struct inode *inode)
+static inline void fsnotify_d_instantiate(struct dentry *dentry, struct inode *inode)
 {
-	inotify_d_instantiate(entry, inode);
+	__fsnotify_d_instantiate(dentry, inode);
+
+	/* call the legacy inotify shit */
+	inotify_d_instantiate(dentry, inode);
+}
+
+/* Notify this dentry's parent about a child's events. */
+static inline void fsnotify_parent(struct dentry *dentry, __u32 mask)
+{
+	struct dentry *parent;
+	struct inode *p_inode;
+	char send = 0;
+
+	if (!(dentry->d_flags & DCACHE_FSNOTIFY_PARENT_WATCHED))
+		return;
+
+	/* we are notifying a parent so come up with the new mask which
+	 * specifies these are events which came from a child. */
+	mask |= FS_EVENT_ON_CHILD;
+
+	spin_lock(&dentry->d_lock);
+	parent = dentry->d_parent;
+	p_inode = parent->d_inode;
+
+	if (p_inode->i_fsnotify_mask & mask) {
+		dget(parent);
+		send = 1;
+	}
+
+	spin_unlock(&dentry->d_lock);
+
+	if (send) {
+		fsnotify(p_inode, mask, dentry->d_inode, FSNOTIFY_EVENT_INODE);
+		dput(parent);
+	}
 }
 
 /*
@@ -32,6 +65,14 @@ static inline void fsnotify_d_instantiate(struct dentry *entry,
  */
 static inline void fsnotify_d_move(struct dentry *entry)
 {
+	struct dentry *parent;
+
+	parent = entry->d_parent;
+	if (fsnotify_inode_watches_children(parent->d_inode))
+		entry->d_flags |= DCACHE_FSNOTIFY_PARENT_WATCHED;
+	else
+		entry->d_flags &= ~DCACHE_FSNOTIFY_PARENT_WATCHED;
+
 	inotify_d_move(entry);
 }
 
@@ -113,6 +154,8 @@ static inline void fsnotify_nameremove(struct dentry *dentry, int isdir)
 		isdir = FS_IN_ISDIR;
 	dnotify_parent(dentry, DN_DELETE);
 	inotify_dentry_parent_queue_event(dentry, IN_DELETE|isdir, 0, dentry->d_name.name);
+
+	fsnotify_parent(dentry, FS_DELETE|isdir);
 }
 
 /*
@@ -182,6 +225,7 @@ static inline void fsnotify_access(struct dentry *dentry)
 	inotify_dentry_parent_queue_event(dentry, mask, 0, dentry->d_name.name);
 	inotify_inode_queue_event(inode, mask, 0, NULL, NULL);
 
+	fsnotify_parent(dentry, mask);
 	fsnotify(inode, mask, inode, FSNOTIFY_EVENT_INODE);
 }
 
@@ -200,6 +244,7 @@ static inline void fsnotify_modify(struct dentry *dentry)
 	inotify_dentry_parent_queue_event(dentry, mask, 0, dentry->d_name.name);
 	inotify_inode_queue_event(inode, mask, 0, NULL, NULL);
 
+	fsnotify_parent(dentry, mask);
 	fsnotify(inode, mask, inode, FSNOTIFY_EVENT_INODE);
 }
 
@@ -217,6 +262,7 @@ static inline void fsnotify_open(struct dentry *dentry)
 	inotify_dentry_parent_queue_event(dentry, mask, 0, dentry->d_name.name);
 	inotify_inode_queue_event(inode, mask, 0, NULL, NULL);
 
+	fsnotify_parent(dentry, mask);
 	fsnotify(inode, mask, inode, FSNOTIFY_EVENT_INODE);
 }
 
@@ -237,6 +283,7 @@ static inline void fsnotify_close(struct file *file)
 	inotify_dentry_parent_queue_event(dentry, mask, 0, name);
 	inotify_inode_queue_event(inode, mask, 0, NULL, NULL);
 
+	fsnotify_parent(dentry, mask);
 	fsnotify(inode, mask, file, FSNOTIFY_EVENT_FILE);
 }
 
@@ -254,6 +301,7 @@ static inline void fsnotify_xattr(struct dentry *dentry)
 	inotify_dentry_parent_queue_event(dentry, mask, 0, dentry->d_name.name);
 	inotify_inode_queue_event(inode, mask, 0, NULL, NULL);
 
+	fsnotify_parent(dentry, mask);
 	fsnotify(inode, mask, inode, FSNOTIFY_EVENT_INODE);
 }
 
@@ -304,6 +352,8 @@ static inline void fsnotify_change(struct dentry *dentry, unsigned int ia_valid)
 		inotify_inode_queue_event(inode, in_mask, 0, NULL, NULL);
 		inotify_dentry_parent_queue_event(dentry, in_mask, 0,
 						  dentry->d_name.name);
+
+		fsnotify_parent(dentry, in_mask);
 		fsnotify(inode, in_mask, inode, FSNOTIFY_EVENT_INODE);
 	}
 }
diff --git a/include/linux/fsnotify_backend.h b/include/linux/fsnotify_backend.h
index fc71e88..0d380bc 100644
--- a/include/linux/fsnotify_backend.h
+++ b/include/linux/fsnotify_backend.h
@@ -45,8 +45,17 @@
 #define FS_DN_RENAME		0x10000000ul	/* file renamed */
 #define FS_DN_MULTISHOT		0x20000000ul	/* dnotify multishot */
 
+/* this inode cares about things that happen to its children.  Always set for
+ * dnotify and inotify.  Never set for fanotify */
 #define FS_EVENT_ON_CHILD	0x08000000ul
 
+/* this is a list of all events that may get sent to a parent based on fs events
+ * happening to inodes inside that directory */
+#define FS_EVENTS_POSS_ON_CHILD   (FS_ACCESS | FS_MODIFY | FS_ATTRIB |\
+				   FS_CLOSE_WRITE | FS_CLOSE_NOWRITE | FS_OPEN |\
+				   FS_MOVED_FROM | FS_MOVED_TO | FS_CREATE |\
+				   FS_DELETE)
+
 struct fsnotify_group;
 struct fsnotify_event;
 struct fsnotify_mark_entry;
@@ -154,6 +163,37 @@ struct fsnotify_mark_entry {
 extern void fsnotify(struct inode *to_tell, __u32 mask, void *data, int data_is);
 extern void __fsnotify_inode_delete(struct inode *inode);
 
+static inline int fsnotify_inode_watches_children(struct inode *inode)
+{
+	/* FS_EVENT_ON_CHILD is set if the inode may care */
+	if (!(inode->i_fsnotify_mask & FS_EVENT_ON_CHILD))
+		return 0;
+	/* this inode might care about child events, does it care about the
+	 * specific set of events that can happen on a child? */
+	return inode->i_fsnotify_mask & FS_EVENTS_POSS_ON_CHILD;
+}
+
+/*
+ * __fsnotify_d_instantiate - instantiate a dentry for inode
+ * Called with dcache_lock held.
+ */
+static inline void __fsnotify_d_instantiate(struct dentry *dentry, struct inode *inode)
+{
+	struct dentry *parent;
+	struct inode *p_inode;
+
+	if (!inode)
+		return;
+
+	spin_lock(&dentry->d_lock);
+	parent = dentry->d_parent;
+	p_inode = parent->d_inode;
+
+	if (fsnotify_inode_watches_children(p_inode))
+		dentry->d_flags |= DCACHE_FSNOTIFY_PARENT_WATCHED;
+	spin_unlock(&dentry->d_lock);
+}
+
 /* called from fsnotify interfaces, such as fanotify or dnotify */
 extern void fsnotify_recalc_global_mask(void);
 extern void fsnotify_recalc_group_mask(struct fsnotify_group *group);
@@ -186,6 +226,9 @@ static inline void fsnotify(struct inode *to_tell, __u32 mask, void *data, int d
 static inline void __fsnotify_inode_delete(struct inode *inode)
 {}
 
+static inline void __fsnotify_d_instantiate(struct dentry *dentry, struct inode *inode)
+{}
+
 #endif	/* CONFIG_FSNOTIFY */
 
 #endif	/* __KERNEL __ */



* [PATCH -V2 06/13] dnotify: reimplement dnotify using fsnotify
  2009-03-27 20:05 [PATCH -V2 01/13] mutex: add atomic_dec_and_mutex_lock Eric Paris
                   ` (3 preceding siblings ...)
  2009-03-27 20:05 ` [PATCH -V2 05/13] fsnotify: parent event notification Eric Paris
@ 2009-03-27 20:05 ` Eric Paris
  2009-04-07 23:06   ` Andrew Morton
  2009-03-27 20:05 ` [PATCH -V2 07/13] fsnotify: generic notification queue and waitq Eric Paris
                   ` (7 subsequent siblings)
  12 siblings, 1 reply; 26+ messages in thread
From: Eric Paris @ 2009-03-27 20:05 UTC (permalink / raw)
  To: linux-kernel; +Cc: viro, hch, alan, sfr, john, rlove, akpm

Reimplement dnotify using fsnotify.
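The core of the rewrite is that a userspace fcntl(F_NOTIFY) argument is translated once, at watch creation, into internal FS_* bits. The translation is done by convert_arg() in dnotify.c below. This is a minimal userspace copy of that mapping, assuming the DN_* values from <linux/fcntl.h>, and the FS_EVENT_ON_CHILD and FS_DN_* values from this series. The remaining FS_* values are assumed to match the inotify-compatible bits.

```c
#include <assert.h>
#include <stdint.h>

/* userspace dnotify request bits, as defined in <linux/fcntl.h> */
#define DN_ACCESS	0x00000001ul
#define DN_MODIFY	0x00000002ul
#define DN_CREATE	0x00000004ul
#define DN_DELETE	0x00000008ul
#define DN_RENAME	0x00000010ul
#define DN_ATTRIB	0x00000020ul
#define DN_MULTISHOT	0x80000000ul

/* internal bits: FS_EVENT_ON_CHILD and FS_DN_* come from this series,
 * the rest are assumed to match the inotify-compatible values */
#define FS_ACCESS		0x00000001u
#define FS_MODIFY		0x00000002u
#define FS_ATTRIB		0x00000004u
#define FS_MOVED_FROM		0x00000040u
#define FS_MOVED_TO		0x00000080u
#define FS_CREATE		0x00000100u
#define FS_DELETE		0x00000200u
#define FS_EVENT_ON_CHILD	0x08000000u
#define FS_DN_RENAME		0x10000000u
#define FS_DN_MULTISHOT		0x20000000u

/* userspace copy of convert_arg(): translate one fcntl(F_NOTIFY) argument */
uint32_t convert_arg(unsigned long arg)
{
	uint32_t new_mask = FS_EVENT_ON_CHILD;	/* dnotify always watches children */

	if (arg & DN_MULTISHOT)
		new_mask |= FS_DN_MULTISHOT;
	if (arg & DN_DELETE)
		new_mask |= FS_DELETE | FS_MOVED_FROM;	/* renamed away counts as a delete */
	if (arg & DN_MODIFY)
		new_mask |= FS_MODIFY;
	if (arg & DN_ACCESS)
		new_mask |= FS_ACCESS;
	if (arg & DN_ATTRIB)
		new_mask |= FS_ATTRIB;
	if (arg & DN_RENAME)
		new_mask |= FS_DN_RENAME;
	if (arg & DN_CREATE)
		new_mask |= FS_CREATE | FS_MOVED_TO;	/* renamed in counts as a create */

	return new_mask;
}
```

Because DN_DELETE and DN_CREATE also pick up FS_MOVED_FROM and FS_MOVED_TO, a cross-directory rename is reported to dnotify watchers as a delete in one directory and a create in the other. The old inode_dir_notify(old_dir, DN_DELETE) and inode_dir_notify(new_dir, DN_CREATE) calls in fsnotify_move() used to produce exactly that.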

Signed-off-by: Eric Paris <eparis@redhat.com>
---

 MAINTAINERS                      |    2 
 fs/notify/dnotify/Kconfig        |    1 
 fs/notify/dnotify/dnotify.c      |  405 +++++++++++++++++++++++++++++---------
 include/linux/dnotify.h          |   29 +--
 include/linux/fs.h               |    5 
 include/linux/fsnotify.h         |   70 ++-----
 include/linux/fsnotify_backend.h |    3 
 7 files changed, 340 insertions(+), 175 deletions(-)

diff --git a/MAINTAINERS b/MAINTAINERS
index 1030a0d..3877ec4 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -1431,6 +1431,8 @@ S:	Orphan
 DIRECTORY NOTIFICATION (DNOTIFY)
 P:	Stephen Rothwell
 M:	sfr@canb.auug.org.au
+P:	Eric Paris
+M:	eparis@parisplace.org
 L:	linux-kernel@vger.kernel.org
 S:	Supported
 
diff --git a/fs/notify/dnotify/Kconfig b/fs/notify/dnotify/Kconfig
index 26adf5d..904ff8d 100644
--- a/fs/notify/dnotify/Kconfig
+++ b/fs/notify/dnotify/Kconfig
@@ -1,5 +1,6 @@
 config DNOTIFY
 	bool "Dnotify support"
+	depends on FSNOTIFY
 	default y
 	help
 	  Dnotify is a directory-based per-fd file change notification system
diff --git a/fs/notify/dnotify/dnotify.c b/fs/notify/dnotify/dnotify.c
index b0aa2cd..f69e0c4 100644
--- a/fs/notify/dnotify/dnotify.c
+++ b/fs/notify/dnotify/dnotify.c
@@ -3,6 +3,9 @@
  *
  * Copyright (C) 2000,2001,2002 Stephen Rothwell
  *
+ * Copyright (C) 2009 Eric Paris <Red Hat Inc>
+ * dnotify was largely rewritten to use the new fsnotify infrastructure
+ *
  * This program is free software; you can redistribute it and/or modify it
  * under the terms of the GNU General Public License as published by the
  * Free Software Foundation; either version 2, or (at your option) any
@@ -21,24 +24,140 @@
 #include <linux/spinlock.h>
 #include <linux/slab.h>
 #include <linux/fdtable.h>
+#include <linux/fsnotify_backend.h>
 
 int dir_notify_enable __read_mostly = 1;
 
 static struct kmem_cache *dn_cache __read_mostly;
+static struct kmem_cache *dnotify_inode_mark_cache __read_mostly;
+static struct fsnotify_group *dnotify_group __read_mostly;
+static DEFINE_MUTEX(dnotify_mark_mutex);
+
+struct dnotify_mark_entry {
+	struct fsnotify_mark_entry fsn_entry;
+	struct dnotify_struct *dn;
+};
 
-static void redo_inode_mask(struct inode *inode)
+static void dnotify_recalc_inode_mask(struct fsnotify_mark_entry *entry)
 {
-	unsigned long new_mask;
+	__u32 new_mask, old_mask;
 	struct dnotify_struct *dn;
+	struct dnotify_mark_entry *dnentry = container_of(entry, struct dnotify_mark_entry, fsn_entry);
 
+	assert_spin_locked(&entry->lock);
+
+	old_mask = entry->mask;
 	new_mask = 0;
-	for (dn = inode->i_dnotify; dn != NULL; dn = dn->dn_next)
-		new_mask |= dn->dn_mask & ~DN_MULTISHOT;
-	inode->i_dnotify_mask = new_mask;
+	dn = dnentry->dn;
+	for (; dn != NULL; dn = dn->dn_next)
+		new_mask |= (dn->dn_mask & ~FS_DN_MULTISHOT);
+	entry->mask = new_mask;
+
+	if (old_mask == new_mask)
+		return;
+
+	if (entry->inode)
+		fsnotify_recalc_inode_mask(entry->inode);
 }
 
+static int dnotify_handle_event(struct fsnotify_group *group, struct fsnotify_event *event)
+{
+	struct fsnotify_mark_entry *entry = NULL;
+	struct dnotify_mark_entry *dnentry;
+	struct inode *to_tell;
+	struct dnotify_struct *dn;
+	struct dnotify_struct **prev;
+	struct fown_struct *fown;
+
+	to_tell = event->to_tell;
+
+	spin_lock(&to_tell->i_lock);
+	entry = fsnotify_find_mark_entry(group, to_tell);
+	spin_unlock(&to_tell->i_lock);
+
+	/* unlikely since we already passed dnotify_should_send_event() */
+	if (unlikely(!entry))
+		return 0;
+	dnentry = container_of(entry, struct dnotify_mark_entry, fsn_entry);
+
+	spin_lock(&entry->lock);
+	prev = &dnentry->dn;
+	while ((dn = *prev) != NULL) {
+		if ((dn->dn_mask & event->mask) == 0) {
+			prev = &dn->dn_next;
+			continue;
+		}
+		fown = &dn->dn_filp->f_owner;
+		send_sigio(fown, dn->dn_fd, POLL_MSG);
+		if (dn->dn_mask & FS_DN_MULTISHOT)
+			prev = &dn->dn_next;
+		else {
+			*prev = dn->dn_next;
+			kmem_cache_free(dn_cache, dn);
+			dnotify_recalc_inode_mask(entry);
+		}
+	}
+
+	spin_unlock(&entry->lock);
+	fsnotify_put_mark(entry);
+
+	return 0;
+}
+
+static int dnotify_should_send_event(struct fsnotify_group *group, struct inode *inode, __u32 mask)
+{
+	struct fsnotify_mark_entry *entry;
+	int send;
+
+	/* !dir_notify_enable should never get here, don't waste time checking
+	if (!dir_notify_enable)
+		return 0; */
+
+	/* not a dir, dnotify doesn't care */
+	if (!S_ISDIR(inode->i_mode))
+		return 0;
+
+	spin_lock(&inode->i_lock);
+	entry = fsnotify_find_mark_entry(group, inode);
+	spin_unlock(&inode->i_lock);
+
+	/* no mark means no dnotify watch */
+	if (!entry)
+		return 0;
+
+	spin_lock(&entry->lock);
+	send = !!(mask & entry->mask);
+	spin_unlock(&entry->lock);
+	fsnotify_put_mark(entry);
+
+	return send;
+}
+
+static void dnotify_freeing_mark(struct fsnotify_mark_entry *entry, struct fsnotify_group *group)
+{
+	/* dnotify doesn't care that an inode is on the way out */
+}
+
+static void dnotify_free_mark(struct fsnotify_mark_entry *entry)
+{
+	struct dnotify_mark_entry *dnentry = container_of(entry, struct dnotify_mark_entry, fsn_entry);
+
+	BUG_ON(dnentry->dn);
+
+	kmem_cache_free(dnotify_inode_mark_cache, dnentry);
+}
+
+static struct fsnotify_ops dnotify_fsnotify_ops = {
+	.handle_event = dnotify_handle_event,
+	.should_send_event = dnotify_should_send_event,
+	.free_group_priv = NULL,
+	.freeing_mark = dnotify_freeing_mark,
+};
+
 void dnotify_flush(struct file *filp, fl_owner_t id)
 {
+	struct fsnotify_mark_entry *entry;
+	struct dnotify_mark_entry *dnentry;
 	struct dnotify_struct *dn;
 	struct dnotify_struct **prev;
 	struct inode *inode;
@@ -46,145 +165,229 @@ void dnotify_flush(struct file *filp, fl_owner_t id)
 	inode = filp->f_path.dentry->d_inode;
 	if (!S_ISDIR(inode->i_mode))
 		return;
+
 	spin_lock(&inode->i_lock);
-	prev = &inode->i_dnotify;
+	entry = fsnotify_find_mark_entry(dnotify_group, inode);
+	spin_unlock(&inode->i_lock);
+	if (!entry)
+		return;
+	dnentry = container_of(entry, struct dnotify_mark_entry, fsn_entry);
+
+	mutex_lock(&dnotify_mark_mutex);
+
+	spin_lock(&entry->lock);
+	prev = &dnentry->dn;
 	while ((dn = *prev) != NULL) {
 		if ((dn->dn_owner == id) && (dn->dn_filp == filp)) {
 			*prev = dn->dn_next;
-			redo_inode_mask(inode);
 			kmem_cache_free(dn_cache, dn);
+			dnotify_recalc_inode_mask(entry);
 			break;
 		}
 		prev = &dn->dn_next;
 	}
-	spin_unlock(&inode->i_lock);
+
+	spin_unlock(&entry->lock);
+
+	/* nothing else could have found us thanks to the dnotify_mark_mutex */
+	if (dnentry->dn == NULL)
+		fsnotify_destroy_mark_by_entry(entry);
+
+	fsnotify_recalc_group_mask(dnotify_group);
+
+	mutex_unlock(&dnotify_mark_mutex);
+
+	fsnotify_put_mark(entry);
 }
 
-int fcntl_dirnotify(int fd, struct file *filp, unsigned long arg)
+/* this conversion is done only at watch creation */
+static inline __u32 convert_arg(unsigned long arg)
+{
+	__u32 new_mask = FS_EVENT_ON_CHILD;
+
+	if (arg & DN_MULTISHOT)
+		new_mask |= FS_DN_MULTISHOT;
+	if (arg & DN_DELETE)
+		new_mask |= (FS_DELETE | FS_MOVED_FROM);
+	if (arg & DN_MODIFY)
+		new_mask |= FS_MODIFY;
+	if (arg & DN_ACCESS)
+		new_mask |= FS_ACCESS;
+	if (arg & DN_ATTRIB)
+		new_mask |= FS_ATTRIB;
+	if (arg & DN_RENAME)
+		new_mask |= FS_DN_RENAME;
+	if (arg & DN_CREATE)
+		new_mask |= (FS_CREATE | FS_MOVED_TO);
+
+	return new_mask;
+}
+
+static int attach_dn(struct dnotify_struct *dn, struct dnotify_mark_entry *dnentry, fl_owner_t id,
+		     int fd, struct file *filp, __u32 mask)
 {
-	struct dnotify_struct *dn;
 	struct dnotify_struct *odn;
 	struct dnotify_struct **prev;
-	struct inode *inode;
-	fl_owner_t id = current->files;
-	struct file *f;
-	int error = 0;
 
-	if ((arg & ~DN_MULTISHOT) == 0) {
-		dnotify_flush(filp, id);
-		return 0;
-	}
-	if (!dir_notify_enable)
-		return -EINVAL;
-	inode = filp->f_path.dentry->d_inode;
-	if (!S_ISDIR(inode->i_mode))
-		return -ENOTDIR;
-	dn = kmem_cache_alloc(dn_cache, GFP_KERNEL);
-	if (dn == NULL)
-		return -ENOMEM;
-	spin_lock(&inode->i_lock);
-	prev = &inode->i_dnotify;
+	prev = &dnentry->dn;
 	while ((odn = *prev) != NULL) {
+		/* do we already have a dnotify struct and we are just adding more events? */
 		if ((odn->dn_owner == id) && (odn->dn_filp == filp)) {
 			odn->dn_fd = fd;
-			odn->dn_mask |= arg;
-			inode->i_dnotify_mask |= arg & ~DN_MULTISHOT;
-			goto out_free;
+			odn->dn_mask |= mask;
+			return -EEXIST;
 		}
 		prev = &odn->dn_next;
 	}
 
-	rcu_read_lock();
-	f = fcheck(fd);
-	rcu_read_unlock();
-	/* we'd lost the race with close(), sod off silently */
-	/* note that inode->i_lock prevents reordering problems
-	 * between accesses to descriptor table and ->i_dnotify */
-	if (f != filp)
-		goto out_free;
-
-	error = __f_setown(filp, task_pid(current), PIDTYPE_PID, 0);
-	if (error)
-		goto out_free;
-
-	dn->dn_mask = arg;
+	dn->dn_mask = mask;
 	dn->dn_fd = fd;
 	dn->dn_filp = filp;
 	dn->dn_owner = id;
-	inode->i_dnotify_mask |= arg & ~DN_MULTISHOT;
-	dn->dn_next = inode->i_dnotify;
-	inode->i_dnotify = dn;
-	spin_unlock(&inode->i_lock);
-	return 0;
+	dn->dn_next = dnentry->dn;
+	dnentry->dn = dn;
 
-out_free:
-	spin_unlock(&inode->i_lock);
-	kmem_cache_free(dn_cache, dn);
-	return error;
+	return 0;
 }
 
-void __inode_dir_notify(struct inode *inode, unsigned long event)
+int fcntl_dirnotify(int fd, struct file *filp, unsigned long arg)
 {
-	struct dnotify_struct *	dn;
-	struct dnotify_struct **prev;
-	struct fown_struct *	fown;
-	int			changed = 0;
+	struct dnotify_mark_entry *new_dnentry, *dnentry;
+	struct fsnotify_mark_entry *new_entry, *entry;
+	struct dnotify_struct *dn;
+	struct inode *inode;
+	fl_owner_t id = current->files;
+	struct file *f;
+	int destroy = 0, error = 0;
+	__u32 mask;
 
-	spin_lock(&inode->i_lock);
-	prev = &inode->i_dnotify;
-	while ((dn = *prev) != NULL) {
-		if ((dn->dn_mask & event) == 0) {
-			prev = &dn->dn_next;
-			continue;
-		}
-		fown = &dn->dn_filp->f_owner;
-		send_sigio(fown, dn->dn_fd, POLL_MSG);
-		if (dn->dn_mask & DN_MULTISHOT)
-			prev = &dn->dn_next;
-		else {
-			*prev = dn->dn_next;
-			changed = 1;
-			kmem_cache_free(dn_cache, dn);
-		}
+	/* we use these to tell if we need to kfree */
+	new_entry = NULL;
+	dn = NULL;
+
+	if (!dir_notify_enable) {
+		error = -EINVAL;
+		goto out_err;
 	}
-	if (changed)
-		redo_inode_mask(inode);
-	spin_unlock(&inode->i_lock);
-}
 
-EXPORT_SYMBOL(__inode_dir_notify);
+	if ((arg & ~DN_MULTISHOT) == 0) {
+		dnotify_flush(filp, id);
+		error = 0;
+		goto out_err;
+	}
 
-/*
- * This is hopelessly wrong, but unfixable without API changes.  At
- * least it doesn't oops the kernel...
- *
- * To safely access ->d_parent we need to keep d_move away from it.  Use the
- * dentry's d_lock for this.
- */
-void dnotify_parent(struct dentry *dentry, unsigned long event)
-{
-	struct dentry *parent;
+	inode = filp->f_path.dentry->d_inode;
+	if (!S_ISDIR(inode->i_mode)) {
+		error = -ENOTDIR;
+		goto out_err;
+	}
 
-	if (!dir_notify_enable)
-		return;
+	/* expect most fcntl to add new rather than augment old */
+	dn = kmem_cache_alloc(dn_cache, GFP_KERNEL);
+	if (!dn) {
+		error = -ENOMEM;
+		goto out_err;
+	}
+
+	new_dnentry = kmem_cache_alloc(dnotify_inode_mark_cache, GFP_KERNEL);
+	if (!new_dnentry) {
+		error = -ENOMEM;
+		goto out_err;
+	}
+
+	/* convert the userspace DN_* "arg" to the internal FS_* defines in fsnotify */
+	mask = convert_arg(arg);
 
-	spin_lock(&dentry->d_lock);
-	parent = dentry->d_parent;
-	if (parent->d_inode->i_dnotify_mask & event) {
-		dget(parent);
-		spin_unlock(&dentry->d_lock);
-		__inode_dir_notify(parent->d_inode, event);
-		dput(parent);
+	/* set up the new_entry and new_dnentry */
+	new_entry = &new_dnentry->fsn_entry;
+	fsnotify_init_mark(new_entry, dnotify_free_mark);
+	new_entry->mask = mask;
+	new_dnentry->dn = NULL;
+
+	mutex_lock(&dnotify_mark_mutex);
+
+	/* add the new_entry or find an old one. */
+	spin_lock(&inode->i_lock);
+	entry = fsnotify_find_mark_entry(dnotify_group, inode);
+	spin_unlock(&inode->i_lock);
+	if (entry) {
+		dnentry = container_of(entry, struct dnotify_mark_entry, fsn_entry);
+		spin_lock(&entry->lock);
 	} else {
-		spin_unlock(&dentry->d_lock);
+		fsnotify_add_mark(new_entry, dnotify_group, inode);
+		spin_lock(&new_entry->lock);
+		entry = new_entry;
+		dnentry = new_dnentry;
+		/* we used new_entry, so don't free it */
+		new_entry = NULL;
+	}
+
+	rcu_read_lock();
+	f = fcheck(fd);
+	rcu_read_unlock();
+
+	/* if (f != filp) means that we lost a race and another task/thread
+	 * actually closed the fd we are still playing with before we grabbed
+	 * the dnotify_mark_mutex and entry->lock.  Since closing the fd is the
+	 * only time we clean up the mark entries we need to get our mark off
+	 * the list. */
+	if (f != filp) {
+		/* if we added ourselves, shoot ourselves, it's possible that
+		 * the flush actually did shoot this entry.  That's fine too
+		 * since multiple calls to destroy_mark is perfectly safe */
+		if (dnentry == new_dnentry)
+			destroy = 1;
+		/* if we just found a dnentry already there, just sod off
+		 * silently as the flush at close time dealt with it */
+		goto out;
 	}
+
+	error = __f_setown(filp, task_pid(current), PIDTYPE_PID, 0);
+	if (error) {
+		/* if we added, we must shoot */
+		if (dnentry == new_dnentry)
+			destroy = 1;
+		goto out;
+	}
+
+	error = attach_dn(dn, dnentry, id, fd, filp, mask);
+	/* !error means that we attached the dn to the dnentry, so don't free it */
+	if (!error)
+		dn = NULL;
+	/* -EEXIST means that we didn't add this new dn and used an old one.
+	 * that isn't an error (and the unused dn should be freed) */
+	else if (error == -EEXIST)
+		error = 0;
+
+	dnotify_recalc_inode_mask(entry);
+out:
+	spin_unlock(&entry->lock);
+
+	if (destroy)
+		fsnotify_destroy_mark_by_entry(entry);
+
+	fsnotify_recalc_group_mask(dnotify_group);
+
+	mutex_unlock(&dnotify_mark_mutex);
+	fsnotify_put_mark(entry);
+out_err:
+	if (new_entry)
+		fsnotify_put_mark(new_entry);
+	if (dn)
+		kmem_cache_free(dn_cache, dn);
+	return error;
 }
-EXPORT_SYMBOL_GPL(dnotify_parent);
 
 static int __init dnotify_init(void)
 {
 	dn_cache = kmem_cache_create("dnotify_cache",
 		sizeof(struct dnotify_struct), 0, SLAB_PANIC, NULL);
+	dnotify_inode_mark_cache = kmem_cache_create("dnotify_inode_mark",
+		sizeof(struct dnotify_mark_entry), 0, SLAB_PANIC, NULL);
+	dnotify_group = fsnotify_obtain_group(DNOTIFY_GROUP_NUM, DNOTIFY_GROUP_NUM,
+					      0, &dnotify_fsnotify_ops);
+	if (IS_ERR(dnotify_group))
+		panic("unable to allocate fsnotify group for dnotify\n");
 	return 0;
 }
 
diff --git a/include/linux/dnotify.h b/include/linux/dnotify.h
index 102a902..e8c4256 100644
--- a/include/linux/dnotify.h
+++ b/include/linux/dnotify.h
@@ -10,7 +10,7 @@
 
 struct dnotify_struct {
 	struct dnotify_struct *	dn_next;
-	unsigned long		dn_mask;
+	__u64			dn_mask;
 	int			dn_fd;
 	struct file *		dn_filp;
 	fl_owner_t		dn_owner;
@@ -21,23 +21,18 @@ struct dnotify_struct {
 
 #ifdef CONFIG_DNOTIFY
 
-extern void __inode_dir_notify(struct inode *, unsigned long);
+#define ALL_DNOTIFY_EVENTS (FS_DELETE | FS_DELETE_CHILD |\
+			    FS_MODIFY | FS_MODIFY_CHILD |\
+			    FS_ACCESS | FS_ACCESS_CHILD |\
+			    FS_ATTRIB | FS_ATTRIB_CHILD |\
+			    FS_CREATE | FS_DN_RENAME |\
+			    FS_MOVED_FROM | FS_MOVED_TO)
+
 extern void dnotify_flush(struct file *, fl_owner_t);
 extern int fcntl_dirnotify(int, struct file *, unsigned long);
-extern void dnotify_parent(struct dentry *, unsigned long);
-
-static inline void inode_dir_notify(struct inode *inode, unsigned long event)
-{
-	if (inode->i_dnotify_mask & (event))
-		__inode_dir_notify(inode, event);
-}
 
 #else
 
-static inline void __inode_dir_notify(struct inode *inode, unsigned long event)
-{
-}
-
 static inline void dnotify_flush(struct file *filp, fl_owner_t id)
 {
 }
@@ -47,14 +42,6 @@ static inline int fcntl_dirnotify(int fd, struct file *filp, unsigned long arg)
 	return -EINVAL;
 }
 
-static inline void dnotify_parent(struct dentry *dentry, unsigned long event)
-{
-}
-
-static inline void inode_dir_notify(struct inode *inode, unsigned long event)
-{
-}
-
 #endif /* CONFIG_DNOTIFY */
 
 #endif /* __KERNEL __ */
diff --git a/include/linux/fs.h b/include/linux/fs.h
index d391ab4..9e5f3a9 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -701,11 +701,6 @@ struct inode {
 	struct hlist_head	i_fsnotify_mark_entries; /* fsnotify mark entries */
 #endif
 
-#ifdef CONFIG_DNOTIFY
-	unsigned long		i_dnotify_mask; /* Directory notify events */
-	struct dnotify_struct	*i_dnotify; /* for directory notifications */
-#endif
-
 #ifdef CONFIG_INOTIFY
 	struct list_head	inotify_watches; /* watches on this inode */
 	struct mutex		inotify_mutex;	/* protects the watches list */
diff --git a/include/linux/fsnotify.h b/include/linux/fsnotify.h
index 7b45f29..8b9539c 100644
--- a/include/linux/fsnotify.h
+++ b/include/linux/fsnotify.h
@@ -109,13 +109,7 @@ static inline void fsnotify_move(struct inode *old_dir, struct inode *new_dir,
 	__u32 new_dir_mask = 0;
 
 	if (old_dir == new_dir) {
-		inode_dir_notify(old_dir, DN_RENAME);
 		old_dir_mask = FS_DN_RENAME;
-	} else {
-		inode_dir_notify(old_dir, DN_DELETE);
-		old_dir_mask = FS_DELETE;
-		inode_dir_notify(new_dir, DN_CREATE);
-		new_dir_mask = FS_CREATE;
 	}
 
 	if (isdir) {
@@ -152,7 +146,6 @@ static inline void fsnotify_nameremove(struct dentry *dentry, int isdir)
 {
 	if (isdir)
 		isdir = FS_IN_ISDIR;
-	dnotify_parent(dentry, DN_DELETE);
 	inotify_dentry_parent_queue_event(dentry, IN_DELETE|isdir, 0, dentry->d_name.name);
 
 	fsnotify_parent(dentry, FS_DELETE|isdir);
@@ -173,7 +166,6 @@ static inline void fsnotify_link_count(struct inode *inode)
  */
 static inline void fsnotify_create(struct inode *inode, struct dentry *dentry)
 {
-	inode_dir_notify(inode, DN_CREATE);
 	inotify_inode_queue_event(inode, IN_CREATE, 0, dentry->d_name.name,
 				  dentry->d_inode);
 	audit_inode_child(dentry->d_name.name, dentry, inode);
@@ -188,7 +180,6 @@ static inline void fsnotify_create(struct inode *inode, struct dentry *dentry)
  */
 static inline void fsnotify_link(struct inode *dir, struct inode *inode, struct dentry *new_dentry)
 {
-	inode_dir_notify(dir, DN_CREATE);
 	inotify_inode_queue_event(dir, IN_CREATE, 0, new_dentry->d_name.name,
 				  inode);
 	fsnotify_link_count(inode);
@@ -202,7 +193,6 @@ static inline void fsnotify_link(struct inode *dir, struct inode *inode, struct
  */
 static inline void fsnotify_mkdir(struct inode *inode, struct dentry *dentry)
 {
-	inode_dir_notify(inode, DN_CREATE);
 	inotify_inode_queue_event(inode, IN_CREATE | IN_ISDIR, 0, 
 				  dentry->d_name.name, dentry->d_inode);
 	audit_inode_child(dentry->d_name.name, dentry, inode);
@@ -221,7 +211,6 @@ static inline void fsnotify_access(struct dentry *dentry)
 	if (S_ISDIR(inode->i_mode))
 		mask |= FS_IN_ISDIR;
 
-	dnotify_parent(dentry, DN_ACCESS);
 	inotify_dentry_parent_queue_event(dentry, mask, 0, dentry->d_name.name);
 	inotify_inode_queue_event(inode, mask, 0, NULL, NULL);
 
@@ -240,7 +229,6 @@ static inline void fsnotify_modify(struct dentry *dentry)
 	if (S_ISDIR(inode->i_mode))
 		mask |= FS_IN_ISDIR;
 
-	dnotify_parent(dentry, DN_MODIFY);
 	inotify_dentry_parent_queue_event(dentry, mask, 0, dentry->d_name.name);
 	inotify_inode_queue_event(inode, mask, 0, NULL, NULL);
 
@@ -312,49 +300,35 @@ static inline void fsnotify_xattr(struct dentry *dentry)
 static inline void fsnotify_change(struct dentry *dentry, unsigned int ia_valid)
 {
 	struct inode *inode = dentry->d_inode;
-	int dn_mask = 0;
-	u32 in_mask = 0;
+	__u32 mask = 0;
+
+	if (ia_valid & ATTR_UID)
+		mask |= FS_ATTRIB;
+	if (ia_valid & ATTR_GID)
+		mask |= FS_ATTRIB;
+	if (ia_valid & ATTR_SIZE)
+		mask |= FS_MODIFY;
 
-	if (ia_valid & ATTR_UID) {
-		in_mask |= FS_ATTRIB;
-		dn_mask |= DN_ATTRIB;
-	}
-	if (ia_valid & ATTR_GID) {
-		in_mask |= FS_ATTRIB;
-		dn_mask |= DN_ATTRIB;
-	}
-	if (ia_valid & ATTR_SIZE) {
-		in_mask |= FS_MODIFY;
-		dn_mask |= DN_MODIFY;
-	}
 	/* both times implies a utime(s) call */
 	if ((ia_valid & (ATTR_ATIME | ATTR_MTIME)) == (ATTR_ATIME | ATTR_MTIME))
-	{
-		in_mask |= FS_ATTRIB;
-		dn_mask |= DN_ATTRIB;
-	} else if (ia_valid & ATTR_ATIME) {
-		in_mask |= FS_ACCESS;
-		dn_mask |= DN_ACCESS;
-	} else if (ia_valid & ATTR_MTIME) {
-		in_mask |= FS_MODIFY;
-		dn_mask |= DN_MODIFY;
-	}
-	if (ia_valid & ATTR_MODE) {
-		in_mask |= FS_ATTRIB;
-		dn_mask |= DN_ATTRIB;
-	}
+		mask |= FS_ATTRIB;
+	else if (ia_valid & ATTR_ATIME)
+		mask |= FS_ACCESS;
+	else if (ia_valid & ATTR_MTIME)
+		mask |= FS_MODIFY;
+
+	if (ia_valid & ATTR_MODE)
+		mask |= FS_ATTRIB;
 
-	if (dn_mask)
-		dnotify_parent(dentry, dn_mask);
-	if (in_mask) {
+	if (mask) {
 		if (S_ISDIR(inode->i_mode))
-			in_mask |= FS_IN_ISDIR;
-		inotify_inode_queue_event(inode, in_mask, 0, NULL, NULL);
-		inotify_dentry_parent_queue_event(dentry, in_mask, 0,
+			mask |= FS_IN_ISDIR;
+		inotify_inode_queue_event(inode, mask, 0, NULL, NULL);
+		inotify_dentry_parent_queue_event(dentry, mask, 0,
 						  dentry->d_name.name);
 
-		fsnotify_parent(dentry, in_mask);
-		fsnotify(inode, in_mask, inode, FSNOTIFY_EVENT_INODE);
+		fsnotify_parent(dentry, mask);
+		fsnotify(inode, mask, inode, FSNOTIFY_EVENT_INODE);
 	}
 }
 
diff --git a/include/linux/fsnotify_backend.h b/include/linux/fsnotify_backend.h
index 0d380bc..c9a4da6 100644
--- a/include/linux/fsnotify_backend.h
+++ b/include/linux/fsnotify_backend.h
@@ -56,6 +56,9 @@
 				   FS_MOVED_FROM | FS_MOVED_TO | FS_CREATE |\
 				   FS_DELETE)
 
+/* listeners that hard code group numbers near the top */
+#define DNOTIFY_GROUP_NUM	UINT_MAX
+
 struct fsnotify_group;
 struct fsnotify_event;
 struct fsnotify_mark_entry;


* [PATCH -V2 07/13] fsnotify: generic notification queue and waitq
  2009-03-27 20:05 [PATCH -V2 01/13] mutex: add atomic_dec_and_mutex_lock Eric Paris
                   ` (4 preceding siblings ...)
  2009-03-27 20:05 ` [PATCH -V2 06/13] dnotify: reimplement dnotify using fsnotify Eric Paris
@ 2009-03-27 20:05 ` Eric Paris
  2009-04-07 23:06   ` Andrew Morton
  2009-03-27 20:05 ` [PATCH -V2 08/13] fsnotify: include pathnames with entries when possible Eric Paris
                   ` (6 subsequent siblings)
  12 siblings, 1 reply; 26+ messages in thread
From: Eric Paris @ 2009-03-27 20:05 UTC (permalink / raw)
  To: linux-kernel; +Cc: viro, hch, alan, sfr, john, rlove, akpm

inotify needs to do asynchronous notification, in which event information is
stored on a queue until the listener is ready to receive it.  This patch
implements a generic notification queue for inotify (and later fanotify) to
store events to be sent at a later time.
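Below is a minimal, single-threaded userspace sketch of the queueing rules this patch adds. The queue is bounded by max_events. Once it is full, a single shared overflow event is queued in place of new events. An event identical to the last queued one is collapsed into it. All `sk_*` names and the SK_Q_OVERFLOW value are hypothetical. The sketch leaves out the notification_mutex, the per-event spinlock, the embedded in-event holder and the wait queue. Its comparison also checks only mask and object, not everything event_compare() looks at.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define SK_Q_OVERFLOW 0x4000u	/* hypothetical flag value, illustration only */

/* Hypothetical userspace stand-ins for fsnotify_event / event_holder. */
struct sk_event {
	uint32_t mask;		/* what happened */
	const void *inode;	/* object the event refers to */
	int refcnt;		/* one reference per queue slot holding it */
};

struct sk_holder {
	struct sk_event *event;
	struct sk_holder *next;
};

struct sk_queue {
	struct sk_holder *head, *tail;
	unsigned int q_len;
	unsigned int max_events;
};

/* One shared overflow event; its refcnt starts at 1 and never reaches 0. */
static struct sk_event sk_overflow = { SK_Q_OVERFLOW, NULL, 1 };

static bool sk_same(const struct sk_event *a, const struct sk_event *b)
{
	return a->mask == b->mask && a->inode == b->inode;
}

/* Queue an event; returns 0 on success (including a merge), -1 on OOM. */
static int sk_add(struct sk_queue *q, struct sk_event *ev)
{
	struct sk_holder *h;

	if (q->q_len >= q->max_events)
		ev = &sk_overflow;	/* full: report an overflow instead */

	if (q->tail && sk_same(q->tail->event, ev))
		return 0;		/* identical to the last queued event */

	h = malloc(sizeof(*h));
	if (!h)
		return -1;
	h->event = ev;
	h->next = NULL;
	ev->refcnt++;			/* the queue now holds a reference */
	if (q->tail)
		q->tail->next = h;
	else
		q->head = h;
	q->tail = h;
	q->q_len++;
	return 0;
}

/* Dequeue the oldest event; the caller owns the returned reference. */
static struct sk_event *sk_remove(struct sk_queue *q)
{
	struct sk_holder *h = q->head;
	struct sk_event *ev;

	if (!h)
		return NULL;
	ev = h->event;
	q->head = h->next;
	if (!q->head)
		q->tail = NULL;
	free(h);
	q->q_len--;
	return ev;
}
```

As in fsnotify_add_notif_event(), once the limit is reached every further event becomes the overflow marker. Consecutive overflow markers collapse into one, so q_len can exceed max_events by at most one.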

Signed-off-by: Eric Paris <eparis@redhat.com>
---

 fs/notify/fsnotify.h             |    3 +
 fs/notify/group.c                |    9 ++
 fs/notify/notification.c         |  214 ++++++++++++++++++++++++++++++++++++--
 include/linux/fsnotify_backend.h |   35 ++++++
 4 files changed, 250 insertions(+), 11 deletions(-)

diff --git a/fs/notify/fsnotify.h b/fs/notify/fsnotify.h
index 48d4372..b258d44 100644
--- a/fs/notify/fsnotify.h
+++ b/fs/notify/fsnotify.h
@@ -15,6 +15,9 @@ extern struct srcu_struct fsnotify_grp_srcu;
 extern struct list_head fsnotify_groups;
 extern __u32 fsnotify_mask;
 
+extern void fsnotify_flush_notif(struct fsnotify_group *group);
+
 extern void fsnotify_final_destroy_group(struct fsnotify_group *group);
+
 extern void fsnotify_clear_marks_by_inode(struct inode *inode);
 #endif	/* _LINUX_FSNOTIFY_PRIVATE_H */
diff --git a/fs/notify/group.c b/fs/notify/group.c
index b6b32fa..87e6b70 100644
--- a/fs/notify/group.c
+++ b/fs/notify/group.c
@@ -91,6 +91,9 @@ static void fsnotify_get_group(struct fsnotify_group *group)
 
 void fsnotify_final_destroy_group(struct fsnotify_group *group)
 {
+	/* clear the notification queue of all events */
+	fsnotify_flush_notif(group);
+
 	if (group->ops->free_group_priv)
 		group->ops->free_group_priv(group);
 
@@ -187,6 +190,12 @@ struct fsnotify_group *fsnotify_obtain_group(unsigned int priority, unsigned int
 	group->group_num = group_num;
 	group->mask = mask;
 
+	mutex_init(&group->notification_mutex);
+	INIT_LIST_HEAD(&group->notification_list);
+	init_waitqueue_head(&group->notification_waitq);
+	group->q_len = 0;
+	group->max_events = UINT_MAX;
+
 	spin_lock_init(&group->mark_lock);
 	atomic_set(&group->num_marks, 1);
 	INIT_LIST_HEAD(&group->mark_entries);
diff --git a/fs/notify/notification.c b/fs/notify/notification.c
index eb23a69..b21b9bb 100644
--- a/fs/notify/notification.c
+++ b/fs/notify/notification.c
@@ -33,6 +33,21 @@
 #include "fsnotify.h"
 
 static struct kmem_cache *event_kmem_cache;
+static struct kmem_cache *event_holder_kmem_cache;
+/*
+ * this is a magic event we send when the q is too full.  since it doesn't
+ * hold real event information we just keep one system wide and use it any time
+ * it is needed.  Its refcnt is set to 1 at kernel init time and will never
+ * get set to 0 so it will never get 'freed'
+ */
+static struct fsnotify_event q_overflow_event;
+
+/* return 1 if something is available, return 0 otherwise */
+int fsnotify_check_notif_queue(struct fsnotify_group *group)
+{
+	BUG_ON(!mutex_is_locked(&group->notification_mutex));
+	return !list_empty(&group->notification_list);
+}
 
 void fsnotify_get_event(struct fsnotify_event *event)
 {
@@ -45,26 +60,180 @@ void fsnotify_put_event(struct fsnotify_event *event)
 		return;
 
 	if (atomic_dec_and_test(&event->refcnt)) {
-		if (event->data_type == FSNOTIFY_EVENT_PATH) {
+		if (event->data_type == FSNOTIFY_EVENT_PATH)
 			path_put(&event->path);
-			event->path.dentry = NULL;
-			event->path.mnt = NULL;
-		}
+		kmem_cache_free(event_kmem_cache, event);
+	}
+}
 
-		event->mask = 0;
+struct fsnotify_event_holder *alloc_event_holder(void)
+{
+	return kmem_cache_alloc(event_holder_kmem_cache, GFP_KERNEL);
+}
 
-		kmem_cache_free(event_kmem_cache, event);
+void fsnotify_destroy_event_holder(struct fsnotify_event_holder *holder)
+{
+	kmem_cache_free(event_holder_kmem_cache, holder);
+}
+
+/*
+ * check if 2 events contain the same information.
+ */
+static inline int event_compare(struct fsnotify_event *old, struct fsnotify_event *new)
+{
+	if ((old->mask == new->mask) &&
+	    (old->to_tell == new->to_tell) &&
+	    (old->data_type == new->data_type)) {
+		switch (old->data_type) {
+		case (FSNOTIFY_EVENT_INODE):
+			if (old->inode == new->inode)
+				return 1;
+			break;
+		case (FSNOTIFY_EVENT_PATH):
+			/* do not fall through: differing paths mean differing events */
+			return (old->path.mnt == new->path.mnt) &&
+			       (old->path.dentry == new->path.dentry);
+		case (FSNOTIFY_EVENT_NONE):
+			return 1;
+		};
 	}
+	return 0;
 }
 
-struct fsnotify_event *fsnotify_create_event(struct inode *to_tell, __u32 mask, void *data, int data_type)
+/*
+ * Add an event to the group notification queue.  The group can later pull this
+ * event off the queue to deal with.
+ */
+int fsnotify_add_notif_event(struct fsnotify_group *group, struct fsnotify_event *event)
+{
+	struct fsnotify_event_holder *holder = NULL;
+	struct list_head *list = &group->notification_list;
+	struct fsnotify_event_holder *last_holder;
+	struct fsnotify_event *last_event;
+
+	/*
+	 * Check if we expect to be able to use the in event holder.  If not alloc
+	 * a new holder.
+	 * For the overflow event it's possible that something will use the in
+	 * event holder before we get the lock so we may need to jump back and
+	 * alloc a new holder.
+	 */
+	if (!list_empty(&event->holder.event_list)) {
+alloc_holder:
+		holder = alloc_event_holder();
+		if (!holder)
+			return -ENOMEM;
+	}
+
+	mutex_lock(&group->notification_mutex);
+
+	if (group->q_len >= group->max_events)
+		event = &q_overflow_event;
+
+	spin_lock(&event->lock);
+
+	if (list_empty(&event->holder.event_list)) {
+		if (unlikely(holder))
+			fsnotify_destroy_event_holder(holder);
+		holder = &event->holder;
+	} else if (unlikely(!holder)) {
+		/* between the time we checked above and got the lock the in
+		 * event holder was used, go back and get a new one */
+		spin_unlock(&event->lock);
+		mutex_unlock(&group->notification_mutex);
+		goto alloc_holder;
+	}
+
+	if (!list_empty(list)) {
+		last_holder = list_entry(list->prev, struct fsnotify_event_holder, event_list);
+		last_event = last_holder->event;
+		if (event_compare(last_event, event)) {
+			spin_unlock(&event->lock);
+			mutex_unlock(&group->notification_mutex);
+			if (holder != &event->holder)
+				fsnotify_destroy_event_holder(holder);
+			return 0;
+		}
+	}
+
+	group->q_len++;
+	holder->event = event;
+
+	fsnotify_get_event(event);
+	list_add_tail(&holder->event_list, list);
+	spin_unlock(&event->lock);
+	mutex_unlock(&group->notification_mutex);
+
+	wake_up(&group->notification_waitq);
+	return 0;
+}
+
+/*
+ * remove and return the first event from the notification list.  There is a
+ * reference held on this event since it was on the list.  It is the responsibility
+ * of the caller to drop this reference.
+ */
+struct fsnotify_event *fsnotify_remove_notif_event(struct fsnotify_group *group)
 {
 	struct fsnotify_event *event;
+	struct fsnotify_event_holder *holder;
 
-	event = kmem_cache_alloc(event_kmem_cache, GFP_KERNEL);
-	if (!event)
-		return NULL;
+	BUG_ON(!mutex_is_locked(&group->notification_mutex));
+
+	holder = list_first_entry(&group->notification_list, struct fsnotify_event_holder, event_list);
+
+	event = holder->event;
+
+	spin_lock(&event->lock);
+	holder->event = NULL;
+	list_del_init(&holder->event_list);
+	spin_unlock(&event->lock);
+
+	/* holder == &event->holder means we used the in event holder; don't free it */
+	if (holder != &event->holder)
+		fsnotify_destroy_event_holder(holder);
+
+	group->q_len--;
+
+	return event;
+}
+
+/*
+ * this will not remove the event, that must be done with fsnotify_remove_notif_event()
+ */
+struct fsnotify_event *fsnotify_peek_notif_event(struct fsnotify_group *group)
+{
+	struct fsnotify_event *event;
+	struct fsnotify_event_holder *holder;
+
+	BUG_ON(!mutex_is_locked(&group->notification_mutex));
+
+	holder = list_first_entry(&group->notification_list, struct fsnotify_event_holder, event_list);
+	event = holder->event;
+
+	return event;
+}
+
+/*
+ * called when a group is being torn down to clean up any outstanding
+ * event notifications.
+ */
+void fsnotify_flush_notif(struct fsnotify_group *group)
+{
+	struct fsnotify_event *event;
 
+	mutex_lock(&group->notification_mutex);
+	while (fsnotify_check_notif_queue(group)) {
+		event = fsnotify_remove_notif_event(group);
+		fsnotify_put_event(event);
+	}
+	mutex_unlock(&group->notification_mutex);
+}
+
+static void initialize_event(struct fsnotify_event *event)
+{
+	event->holder.event = NULL;
+	INIT_LIST_HEAD(&event->holder.event_list);
 	atomic_set(&event->refcnt, 1);
 
 	spin_lock_init(&event->lock);
@@ -72,7 +241,28 @@ struct fsnotify_event *fsnotify_create_event(struct inode *to_tell, __u32 mask,
 	event->path.dentry = NULL;
 	event->path.mnt = NULL;
 	event->inode = NULL;
+	event->data_type = FSNOTIFY_EVENT_NONE;
 
+	event->to_tell = NULL;
+}
+
+/*
+ * create a new event to send to all interested fsnotify implementors.
+ * @to_tell the inode which is supposed to receive the event (sometimes a
+ *	parent of the inode to which the event happened.
+ * @mask what actually happened.
+ * @data pointer to the object which was actually affected
+ * @data_type flag indication if the data is a file, path, inode, nothing...
+ */
+struct fsnotify_event *fsnotify_create_event(struct inode *to_tell, __u32 mask, void *data, int data_type)
+{
+	struct fsnotify_event *event;
+
+	event = kmem_cache_alloc(event_kmem_cache, GFP_KERNEL);
+	if (!event)
+		return NULL;
+
+	initialize_event(event);
 	event->to_tell = to_tell;
 
 	switch (data_type) {
@@ -109,6 +299,10 @@ struct fsnotify_event *fsnotify_create_event(struct inode *to_tell, __u32 mask,
 __init int fsnotify_notification_init(void)
 {
 	event_kmem_cache = kmem_cache_create("fsnotify_event", sizeof(struct fsnotify_event), 0, SLAB_PANIC, NULL);
+	event_holder_kmem_cache = kmem_cache_create("fsnotify_event_holder", sizeof(struct fsnotify_event_holder), 0, SLAB_PANIC, NULL);
+
+	initialize_event(&q_overflow_event);
+	q_overflow_event.mask = FS_Q_OVERFLOW;
 
 	return 0;
 }
diff --git a/include/linux/fsnotify_backend.h b/include/linux/fsnotify_backend.h
index c9a4da6..ddcc9da 100644
--- a/include/linux/fsnotify_backend.h
+++ b/include/linux/fsnotify_backend.h
@@ -95,6 +95,13 @@ struct fsnotify_group {
 
 	const struct fsnotify_ops *ops;	/* how this group handles things */
 
+	/* needed to send notification to userspace */
+	struct mutex notification_mutex;	/* protect the notification_list */
+	struct list_head notification_list;	/* list of event_holder this group needs to send to userspace */
+	wait_queue_head_t notification_waitq;	/* read() on the notification file blocks on this waitq */
+	unsigned int q_len;			/* events on the queue */
+	unsigned int max_events;		/* maximum events allowed on the list */
+
 	/* stores all fastapth entries assoc with this group so they can be cleaned on unregister */
 	spinlock_t mark_lock;		/* protect mark_entries list */
 	atomic_t num_marks;		/* 1 for each mark entry and 1 for not being
@@ -111,11 +118,32 @@ struct fsnotify_group {
 };
 
 /*
+ * A single event can be queued in multiple group->notification_lists.
+ *
+ * each group->notification_list will point to an event_holder which in turn points
+ * to the actual event that needs to be sent to userspace.
+ *
+ * It seemed cheaper to create one refcnt'd event and a small holder for every
+ * group than to create a different event for every group.
+ *
+ */
+struct fsnotify_event_holder {
+	struct fsnotify_event *event;
+	struct list_head event_list;
+};
+
+/*
  * all of the information about the original object we want to now send to
  * a group.  If you want to carry more info from the accessing task to the
  * listener this structure is where you need to be adding fields.
  */
 struct fsnotify_event {
+	/*
+	 * If we create an event we are also likely going to need a holder
+	 * to link to a group, so embed one holder in the event.  This means
+	 * only one allocation in the common case where there is only one group.
+	 */
+	struct fsnotify_event_holder holder;
 	spinlock_t lock;	/* protection for the associated event_holder and private_list */
 	struct inode *to_tell;
 	/*
@@ -127,6 +155,7 @@ struct fsnotify_event {
 		struct inode *inode;
 	};
 /* when calling fsnotify tell it if the data is a path or inode */
+#define FSNOTIFY_EVENT_NONE	0
 #define FSNOTIFY_EVENT_PATH	1
 #define FSNOTIFY_EVENT_INODE	2
 #define FSNOTIFY_EVENT_FILE	3
@@ -206,7 +235,11 @@ extern void fsnotify_put_group(struct fsnotify_group *group);
 
 extern void fsnotify_get_event(struct fsnotify_event *event);
 extern void fsnotify_put_event(struct fsnotify_event *event);
-extern struct fsnotify_event_private_data *fsnotify_get_priv_from_event(struct fsnotify_group *group, struct fsnotify_event *event);
+
+extern int fsnotify_add_notif_event(struct fsnotify_group *group, struct fsnotify_event *event);
+extern int fsnotify_check_notif_queue(struct fsnotify_group *group);
+extern struct fsnotify_event *fsnotify_peek_notif_event(struct fsnotify_group *group);
+extern struct fsnotify_event *fsnotify_remove_notif_event(struct fsnotify_group *group);
 
 /* functions used to manipulate the marks attached to inodes */
 extern void fsnotify_recalc_inode_mask(struct inode *inode);


* [PATCH -V2 08/13] fsnotify: include pathnames with entries when possible
  2009-03-27 20:05 [PATCH -V2 01/13] mutex: add atomic_dec_and_mutex_lock Eric Paris
                   ` (5 preceding siblings ...)
  2009-03-27 20:05 ` [PATCH -V2 07/13] fsnotify: generic notification queue and waitq Eric Paris
@ 2009-03-27 20:05 ` Eric Paris
  2009-03-27 20:05 ` [PATCH -V2 09/13] fsnotify: add correlations between events Eric Paris
                   ` (5 subsequent siblings)
  12 siblings, 0 replies; 26+ messages in thread
From: Eric Paris @ 2009-03-27 20:05 UTC (permalink / raw)
  To: linux-kernel; +Cc: viro, hch, alan, sfr, john, rlove, akpm

When inotify sends a directory an event about one of its children, it
includes the name of the original file.  This patch collects that filename
and makes it available for notification.
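Here is a userspace sketch of the lifetime rule this patch gives the name. fsnotify_create_event() keeps a private copy, and the whole event fails if the copy cannot be allocated. The last reference frees the copy together with the event. All `sk_*` names are hypothetical, and malloc()/free() stand in for kstrdup()/kfree(). Our assumption about why the copy is needed: the event can outlive the caller's string, for example a dentry name that later changes on rename. The patch does not state this reason.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for an fsnotify_event carrying a child's name. */
struct sk_name_event {
	char *file_name;	/* private copy of the name, or NULL */
	size_t name_len;	/* strlen(file_name), 0 when there is no name */
	int refcnt;
};

/* Portable replacement for kstrdup()/strdup(). */
static char *sk_strdup(const char *s)
{
	size_t len = strlen(s) + 1;
	char *copy = malloc(len);

	if (copy)
		memcpy(copy, s, len);
	return copy;
}

static struct sk_name_event *sk_create_event(const char *name)
{
	struct sk_name_event *ev = malloc(sizeof(*ev));

	if (!ev)
		return NULL;
	/* defaults, as initialize_event() sets them */
	ev->file_name = NULL;
	ev->name_len = 0;
	ev->refcnt = 1;
	if (name) {
		ev->file_name = sk_strdup(name);
		if (!ev->file_name) {
			/* like the patch: no name copy means no event */
			free(ev);
			return NULL;
		}
		ev->name_len = strlen(ev->file_name);
	}
	return ev;
}

/* Drop a reference; the last one frees the name and the event. */
static void sk_put_event(struct sk_name_event *ev)
{
	if (--ev->refcnt == 0) {
		free(ev->file_name);	/* free(NULL) is a no-op, as is kfree(NULL) */
		free(ev);
	}
}
```

Events that carry no name, such as FS_DELETE_SELF or an access on the inode itself, pass NULL. They then keep file_name == NULL and name_len == 0.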

Signed-off-by: Eric Paris <eparis@redhat.com>
---

 fs/notify/fsnotify.c             |    4 ++--
 fs/notify/fsnotify.h             |    4 ++++
 fs/notify/notification.c         |   19 ++++++++++++++++++-
 include/linux/fsnotify.h         |   30 +++++++++++++++---------------
 include/linux/fsnotify_backend.h |    9 ++++++---
 5 files changed, 45 insertions(+), 21 deletions(-)

diff --git a/fs/notify/fsnotify.c b/fs/notify/fsnotify.c
index 4cc2d46..6816b81 100644
--- a/fs/notify/fsnotify.c
+++ b/fs/notify/fsnotify.c
@@ -37,7 +37,7 @@ EXPORT_SYMBOL_GPL(__fsnotify_inode_delete);
  * out to all of the registered fsnotify_group.  Those groups can then use the
  * notification event in whatever means they feel necessary.
  */
-void fsnotify(struct inode *to_tell, __u32 mask, void *data, int data_is)
+void fsnotify(struct inode *to_tell, __u32 mask, void *data, int data_is, const char *file_name)
 {
 	struct fsnotify_group *group;
 	struct fsnotify_event *event = NULL;
@@ -62,7 +62,7 @@ void fsnotify(struct inode *to_tell, __u32 mask, void *data, int data_is)
 			if (!group->ops->should_send_event(group, to_tell, mask))
 				continue;
 			if (!event) {
-				event = fsnotify_create_event(to_tell, mask, data, data_is);
+				event = fsnotify_create_event(to_tell, mask, data, data_is, file_name);
 				/* shit, we OOM'd and now we can't tell, maybe
 				 * someday someone else will want to do something
 				 * here */
diff --git a/fs/notify/fsnotify.h b/fs/notify/fsnotify.h
index b258d44..05d1bf2 100644
--- a/fs/notify/fsnotify.h
+++ b/fs/notify/fsnotify.h
@@ -20,4 +20,8 @@ extern void fsnotify_flush_notif(struct fsnotify_group *group);
 extern void fsnotify_final_destroy_group(struct fsnotify_group *group);
 
 extern void fsnotify_clear_marks_by_inode(struct inode *inode);
+
+extern struct fsnotify_event_holder *fsnotify_alloc_event_holder(void);
+extern void fsnotify_destroy_event_holder(struct fsnotify_event_holder *holder);
+
 #endif	/* _LINUX_FSNOTIFY_PRIVATE_H */
diff --git a/fs/notify/notification.c b/fs/notify/notification.c
index b21b9bb..89e6422 100644
--- a/fs/notify/notification.c
+++ b/fs/notify/notification.c
@@ -62,6 +62,8 @@ void fsnotify_put_event(struct fsnotify_event *event)
 	if (atomic_dec_and_test(&event->refcnt)) {
 		if (event->data_type == FSNOTIFY_EVENT_PATH)
 			path_put(&event->path);
+
+		kfree(event->file_name);
 		kmem_cache_free(event_kmem_cache, event);
 	}
 }
@@ -244,6 +246,9 @@ static void initialize_event(struct fsnotify_event *event)
 	event->data_type = FSNOTIFY_EVENT_NONE;
 
 	event->to_tell = NULL;
+
+	event->file_name = NULL;
+	event->name_len = 0;
 }
 
 /*
@@ -254,7 +259,7 @@ static void initialize_event(struct fsnotify_event *event)
  * @data pointer to the object which was actually affected
  * @data_type flag indication if the data is a file, path, inode, nothing...
  */
-struct fsnotify_event *fsnotify_create_event(struct inode *to_tell, __u32 mask, void *data, int data_type)
+struct fsnotify_event *fsnotify_create_event(struct inode *to_tell, __u32 mask, void *data, int data_type, const char *name)
 {
 	struct fsnotify_event *event;
 
@@ -263,6 +268,15 @@ struct fsnotify_event *fsnotify_create_event(struct inode *to_tell, __u32 mask,
 		return NULL;
 
 	initialize_event(event);
+
+	if (name) {
+		event->file_name = kstrdup(name, GFP_KERNEL);
+		if (!event->file_name) {
+			kmem_cache_free(event_kmem_cache, event);
+			return NULL;
+		}
+		event->name_len = strlen(event->file_name);
+	}
 	event->to_tell = to_tell;
 
 	switch (data_type) {
@@ -288,6 +302,9 @@ struct fsnotify_event *fsnotify_create_event(struct inode *to_tell, __u32 mask,
 		event->data_type = FSNOTIFY_EVENT_INODE;
 		break;
 	default:
+		event->path.dentry = NULL;
+		event->path.mnt = NULL;
+		event->inode = NULL;
 		BUG();
 	};
 
diff --git a/include/linux/fsnotify.h b/include/linux/fsnotify.h
index 8b9539c..48762b1 100644
--- a/include/linux/fsnotify.h
+++ b/include/linux/fsnotify.h
@@ -54,7 +54,7 @@ static inline void fsnotify_parent(struct dentry *dentry, __u32 mask)
 	spin_unlock(&dentry->d_lock);
 
 	if (send) {
-		fsnotify(p_inode, mask, dentry->d_inode, FSNOTIFY_EVENT_INODE);
+		fsnotify(p_inode, mask, dentry->d_inode, FSNOTIFY_EVENT_INODE, dentry->d_name.name);
 		dput(parent);
 	}
 }
@@ -92,7 +92,7 @@ static inline void fsnotify_inoderemove(struct inode *inode)
 	inotify_inode_queue_event(inode, IN_DELETE_SELF, 0, NULL, NULL);
 	inotify_inode_is_dead(inode);
 
-	fsnotify(inode, FS_DELETE_SELF, inode, FSNOTIFY_EVENT_INODE);
+	fsnotify(inode, FS_DELETE_SELF, inode, FSNOTIFY_EVENT_INODE, NULL);
 	__fsnotify_inode_delete(inode);
 }
 
@@ -126,15 +126,15 @@ static inline void fsnotify_move(struct inode *old_dir, struct inode *new_dir,
 	inotify_inode_queue_event(new_dir, IN_MOVED_TO|isdir, cookie, new_name,
 				  source);
 
-	fsnotify(old_dir, old_dir_mask, old_dir, FSNOTIFY_EVENT_INODE);
-	fsnotify(new_dir, new_dir_mask, new_dir, FSNOTIFY_EVENT_INODE);
+	fsnotify(old_dir, old_dir_mask, old_dir, FSNOTIFY_EVENT_INODE, old_name);
+	fsnotify(new_dir, new_dir_mask, new_dir, FSNOTIFY_EVENT_INODE, new_name);
 
 	if (target)
 		fsnotify_inoderemove(target);
 
 	if (source) {
 		inotify_inode_queue_event(source, IN_MOVE_SELF, 0, NULL, NULL);
-		fsnotify(source, FS_MOVE_SELF, moved->d_inode, FSNOTIFY_EVENT_INODE);
+		fsnotify(source, FS_MOVE_SELF, moved->d_inode, FSNOTIFY_EVENT_INODE, NULL);
 	}
 	audit_inode_child(new_name, moved, new_dir);
 }
@@ -158,7 +158,7 @@ static inline void fsnotify_link_count(struct inode *inode)
 {
 	inotify_inode_queue_event(inode, IN_ATTRIB, 0, NULL, NULL);
 
-	fsnotify(inode, FS_ATTRIB, inode, FSNOTIFY_EVENT_INODE);
+	fsnotify(inode, FS_ATTRIB, inode, FSNOTIFY_EVENT_INODE, NULL);
 }
 
 /*
@@ -170,7 +170,7 @@ static inline void fsnotify_create(struct inode *inode, struct dentry *dentry)
 				  dentry->d_inode);
 	audit_inode_child(dentry->d_name.name, dentry, inode);
 
-	fsnotify(inode, FS_CREATE, dentry->d_inode, FSNOTIFY_EVENT_INODE);
+	fsnotify(inode, FS_CREATE, dentry->d_inode, FSNOTIFY_EVENT_INODE, dentry->d_name.name);
 }
 
 /*
@@ -185,7 +185,7 @@ static inline void fsnotify_link(struct inode *dir, struct inode *inode, struct
 	fsnotify_link_count(inode);
 	audit_inode_child(new_dentry->d_name.name, new_dentry, dir);
 
-	fsnotify(dir, FS_CREATE, inode, FSNOTIFY_EVENT_INODE);
+	fsnotify(dir, FS_CREATE, inode, FSNOTIFY_EVENT_INODE, new_dentry->d_name.name);
 }
 
 /*
@@ -197,7 +197,7 @@ static inline void fsnotify_mkdir(struct inode *inode, struct dentry *dentry)
 				  dentry->d_name.name, dentry->d_inode);
 	audit_inode_child(dentry->d_name.name, dentry, inode);
 
-	fsnotify(inode, FS_CREATE | FS_IN_ISDIR, dentry->d_inode, FSNOTIFY_EVENT_INODE);
+	fsnotify(inode, FS_CREATE | FS_IN_ISDIR, dentry->d_inode, FSNOTIFY_EVENT_INODE, dentry->d_name.name);
 }
 
 /*
@@ -215,7 +215,7 @@ static inline void fsnotify_access(struct dentry *dentry)
 	inotify_inode_queue_event(inode, mask, 0, NULL, NULL);
 
 	fsnotify_parent(dentry, mask);
-	fsnotify(inode, mask, inode, FSNOTIFY_EVENT_INODE);
+	fsnotify(inode, mask, inode, FSNOTIFY_EVENT_INODE, NULL);
 }
 
 /*
@@ -233,7 +233,7 @@ static inline void fsnotify_modify(struct dentry *dentry)
 	inotify_inode_queue_event(inode, mask, 0, NULL, NULL);
 
 	fsnotify_parent(dentry, mask);
-	fsnotify(inode, mask, inode, FSNOTIFY_EVENT_INODE);
+	fsnotify(inode, mask, inode, FSNOTIFY_EVENT_INODE, NULL);
 }
 
 /*
@@ -251,7 +251,7 @@ static inline void fsnotify_open(struct dentry *dentry)
 	inotify_inode_queue_event(inode, mask, 0, NULL, NULL);
 
 	fsnotify_parent(dentry, mask);
-	fsnotify(inode, mask, inode, FSNOTIFY_EVENT_INODE);
+	fsnotify(inode, mask, inode, FSNOTIFY_EVENT_INODE, NULL);
 }
 
 /*
@@ -272,7 +272,7 @@ static inline void fsnotify_close(struct file *file)
 	inotify_inode_queue_event(inode, mask, 0, NULL, NULL);
 
 	fsnotify_parent(dentry, mask);
-	fsnotify(inode, mask, file, FSNOTIFY_EVENT_FILE);
+	fsnotify(inode, mask, file, FSNOTIFY_EVENT_FILE, NULL);
 }
 
 /*
@@ -290,7 +290,7 @@ static inline void fsnotify_xattr(struct dentry *dentry)
 	inotify_inode_queue_event(inode, mask, 0, NULL, NULL);
 
 	fsnotify_parent(dentry, mask);
-	fsnotify(inode, mask, inode, FSNOTIFY_EVENT_INODE);
+	fsnotify(inode, mask, inode, FSNOTIFY_EVENT_INODE, NULL);
 }
 
 /*
@@ -328,7 +328,7 @@ static inline void fsnotify_change(struct dentry *dentry, unsigned int ia_valid)
 						  dentry->d_name.name);
 
 		fsnotify_parent(dentry, mask);
-		fsnotify(inode, mask, inode, FSNOTIFY_EVENT_INODE);
+		fsnotify(inode, mask, inode, FSNOTIFY_EVENT_INODE, NULL);
 	}
 }
 
diff --git a/include/linux/fsnotify_backend.h b/include/linux/fsnotify_backend.h
index ddcc9da..c1f89da 100644
--- a/include/linux/fsnotify_backend.h
+++ b/include/linux/fsnotify_backend.h
@@ -162,6 +162,9 @@ struct fsnotify_event {
 	int data_type;		/* which of the above union we have */
 	atomic_t refcnt;	/* how many groups still are using/need to send this event */
 	__u32 mask;		/* the type of access */
+
+	char *file_name;
+	size_t name_len;
 };
 
 /*
@@ -192,7 +195,7 @@ struct fsnotify_mark_entry {
 #ifdef CONFIG_FSNOTIFY
 
 /* called from the vfs to signal fs events */
-extern void fsnotify(struct inode *to_tell, __u32 mask, void *data, int data_is);
+extern void fsnotify(struct inode *to_tell, __u32 mask, void *data, int data_is, const char *name);
 extern void __fsnotify_inode_delete(struct inode *inode);
 
 static inline int fsnotify_inode_watches_children(struct inode *inode)
@@ -252,11 +255,11 @@ extern void fsnotify_get_mark(struct fsnotify_mark_entry *entry);
 extern void fsnotify_put_mark(struct fsnotify_mark_entry *entry);
 
 /* put here because inotify does some weird stuff when destroying watches */
-extern struct fsnotify_event *fsnotify_create_event(struct inode *to_tell, __u32 mask, void *data, int data_is);
+extern struct fsnotify_event *fsnotify_create_event(struct inode *to_tell, __u32 mask, void *data, int data_is, const char *name);
 
 #else
 
-static inline void fsnotify(struct inode *to_tell, __u32 mask, void *data, int data_is);
+static inline void fsnotify(struct inode *to_tell, __u32 mask, void *data, int data_is, const char *name)
 {}
 
 static inline void __fsnotify_inode_delete(struct inode *inode)


* [PATCH -V2 09/13] fsnotify: add correlations between events
  2009-03-27 20:05 [PATCH -V2 01/13] mutex: add atomic_dec_and_mutex_lock Eric Paris
                   ` (6 preceding siblings ...)
  2009-03-27 20:05 ` [PATCH -V2 08/13] fsnotify: include pathnames with entries when possible Eric Paris
@ 2009-03-27 20:05 ` Eric Paris
  2009-04-07 23:06   ` Andrew Morton
  2009-03-27 20:06 ` [PATCH -V2 10/13] fsnotify: allow groups to add private data to events Eric Paris
                   ` (4 subsequent siblings)
  12 siblings, 1 reply; 26+ messages in thread
From: Eric Paris @ 2009-03-27 20:05 UTC (permalink / raw)
  To: linux-kernel; +Cc: viro, hch, alan, sfr, john, rlove, akpm

inotify sends userspace a correlation cookie between events when they are
related (i.e. when dentries are moved).  This patch adds the same support to
all fsnotify events.
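The following userspace sketch shows the correlation scheme. fsnotify_get_cookie() hands out a fresh value from a global atomic counter. A rename then stamps that single value on both the MOVED_FROM and MOVED_TO events, while unrelated events pass 0. All `sk_*` names and the flag values are hypothetical. C11 atomic_fetch_add() returns the old value, so adding 1 emulates the kernel's atomic_inc_return().

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

#define SK_MOVED_FROM 0x40u	/* hypothetical flag values */
#define SK_MOVED_TO   0x80u

struct sk_cookie_event {
	uint32_t mask;
	uint32_t cookie;	/* callers outside a move pass 0 */
	const char *name;
};

/* Static atomics are zero-initialised, like ATOMIC_INIT(0). */
static atomic_uint sk_cookie_counter;

/* Analogue of atomic_inc_return(): return the incremented value. */
static uint32_t sk_get_cookie(void)
{
	return (uint32_t)atomic_fetch_add(&sk_cookie_counter, 1u) + 1u;
}

/* A rename produces two events that share one freshly allocated cookie. */
static void sk_move(const char *old_name, const char *new_name,
		    struct sk_cookie_event *from, struct sk_cookie_event *to)
{
	uint32_t cookie = sk_get_cookie();

	from->mask = SK_MOVED_FROM;
	from->cookie = cookie;
	from->name = old_name;

	to->mask = SK_MOVED_TO;
	to->cookie = cookie;
	to->name = new_name;
}
```

A listener can pair a MOVED_FROM with the MOVED_TO that carries the same cookie, even if other events are queued between them.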

Signed-off-by: Eric Paris <eparis@redhat.com>
---

 fs/notify/fsnotify.c             |    4 ++--
 fs/notify/notification.c         |   15 ++++++++++++++-
 include/linux/fsnotify.h         |   37 +++++++++++++++++++------------------
 include/linux/fsnotify_backend.h |   15 ++++++++++++---
 4 files changed, 47 insertions(+), 24 deletions(-)

diff --git a/fs/notify/fsnotify.c b/fs/notify/fsnotify.c
index 6816b81..5b4622b 100644
--- a/fs/notify/fsnotify.c
+++ b/fs/notify/fsnotify.c
@@ -37,7 +37,7 @@ EXPORT_SYMBOL_GPL(__fsnotify_inode_delete);
  * out to all of the registered fsnotify_group.  Those groups can then use the
  * notification event in whatever means they feel necessary.
  */
-void fsnotify(struct inode *to_tell, __u32 mask, void *data, int data_is, const char *file_name)
+void fsnotify(struct inode *to_tell, __u32 mask, void *data, int data_is, const char *file_name, u32 cookie)
 {
 	struct fsnotify_group *group;
 	struct fsnotify_event *event = NULL;
@@ -62,7 +62,7 @@ void fsnotify(struct inode *to_tell, __u32 mask, void *data, int data_is, const
 			if (!group->ops->should_send_event(group, to_tell, mask))
 				continue;
 			if (!event) {
-				event = fsnotify_create_event(to_tell, mask, data, data_is, file_name);
+				event = fsnotify_create_event(to_tell, mask, data, data_is, file_name, cookie);
 				/* shit, we OOM'd and now we can't tell, maybe
 				 * someday someone else will want to do something
 				 * here */
diff --git a/fs/notify/notification.c b/fs/notify/notification.c
index 89e6422..420769e 100644
--- a/fs/notify/notification.c
+++ b/fs/notify/notification.c
@@ -20,6 +20,7 @@
 #include <linux/init.h>
 #include <linux/kernel.h>
 #include <linux/list.h>
+#include <linux/module.h>
 #include <linux/mount.h>
 #include <linux/mutex.h>
 #include <linux/namei.h>
@@ -41,6 +42,13 @@ static struct kmem_cache *event_holder_kmem_cache;
  * get set to 0 so it will never get 'freed'
  */
 static struct fsnotify_event q_overflow_event;
+static atomic_t fsnotify_sync_cookie = ATOMIC_INIT(0);
+
+u32 fsnotify_get_cookie(void)
+{
+	return atomic_inc_return(&fsnotify_sync_cookie);
+}
+EXPORT_SYMBOL_GPL(fsnotify_get_cookie);
 
 /* return 1 if something is available, return 0 otherwise */
 int fsnotify_check_notif_queue(struct fsnotify_group *group)
@@ -249,6 +257,8 @@ static void initialize_event(struct fsnotify_event *event)
 
 	event->file_name = NULL;
 	event->name_len = 0;
+
+	event->sync_cookie = 0;
 }
 
 /*
@@ -259,7 +269,8 @@ static void initialize_event(struct fsnotify_event *event)
  * @data pointer to the object which was actually affected
  * @data_type flag indication if the data is a file, path, inode, nothing...
  */
-struct fsnotify_event *fsnotify_create_event(struct inode *to_tell, __u32 mask, void *data, int data_type, const char *name)
+struct fsnotify_event *fsnotify_create_event(struct inode *to_tell, __u32 mask, void *data,
+					     int data_type, const char *name, u32 cookie)
 {
 	struct fsnotify_event *event;
 
@@ -277,6 +288,8 @@ struct fsnotify_event *fsnotify_create_event(struct inode *to_tell, __u32 mask,
 		}
 		event->name_len = strlen(event->file_name);
 	}
+
+	event->sync_cookie = cookie;
 	event->to_tell = to_tell;
 
 	switch (data_type) {
diff --git a/include/linux/fsnotify.h b/include/linux/fsnotify.h
index 48762b1..7f4efee 100644
--- a/include/linux/fsnotify.h
+++ b/include/linux/fsnotify.h
@@ -54,7 +54,7 @@ static inline void fsnotify_parent(struct dentry *dentry, __u32 mask)
 	spin_unlock(&dentry->d_lock);
 
 	if (send) {
-		fsnotify(p_inode, mask, dentry->d_inode, FSNOTIFY_EVENT_INODE, dentry->d_name.name);
+		fsnotify(p_inode, mask, dentry->d_inode, FSNOTIFY_EVENT_INODE, dentry->d_name.name, 0);
 		dput(parent);
 	}
 }
@@ -92,7 +92,7 @@ static inline void fsnotify_inoderemove(struct inode *inode)
 	inotify_inode_queue_event(inode, IN_DELETE_SELF, 0, NULL, NULL);
 	inotify_inode_is_dead(inode);
 
-	fsnotify(inode, FS_DELETE_SELF, inode, FSNOTIFY_EVENT_INODE, NULL);
+	fsnotify(inode, FS_DELETE_SELF, inode, FSNOTIFY_EVENT_INODE, NULL, 0);
 	__fsnotify_inode_delete(inode);
 }
 
@@ -104,7 +104,8 @@ static inline void fsnotify_move(struct inode *old_dir, struct inode *new_dir,
 				 int isdir, struct inode *target, struct dentry *moved)
 {
 	struct inode *source = moved->d_inode;
-	u32 cookie = inotify_get_cookie();
+	u32 in_cookie = inotify_get_cookie();
+	u32 fs_cookie = fsnotify_get_cookie();
 	__u32 old_dir_mask = 0;
 	__u32 new_dir_mask = 0;
 
@@ -121,20 +122,20 @@ static inline void fsnotify_move(struct inode *old_dir, struct inode *new_dir,
 	old_dir_mask |= FS_MOVED_FROM;
 	new_dir_mask |= FS_MOVED_TO;
 
-	inotify_inode_queue_event(old_dir, IN_MOVED_FROM|isdir,cookie,old_name,
+	inotify_inode_queue_event(old_dir, IN_MOVED_FROM|isdir, in_cookie, old_name,
 				  source);
-	inotify_inode_queue_event(new_dir, IN_MOVED_TO|isdir, cookie, new_name,
+	inotify_inode_queue_event(new_dir, IN_MOVED_TO|isdir, in_cookie, new_name,
 				  source);
 
-	fsnotify(old_dir, old_dir_mask, old_dir, FSNOTIFY_EVENT_INODE, old_name);
-	fsnotify(new_dir, new_dir_mask, new_dir, FSNOTIFY_EVENT_INODE, new_name);
+	fsnotify(old_dir, old_dir_mask, old_dir, FSNOTIFY_EVENT_INODE, old_name, fs_cookie);
+	fsnotify(new_dir, new_dir_mask, new_dir, FSNOTIFY_EVENT_INODE, new_name, fs_cookie);
 
 	if (target)
 		fsnotify_inoderemove(target);
 
 	if (source) {
 		inotify_inode_queue_event(source, IN_MOVE_SELF, 0, NULL, NULL);
-		fsnotify(source, FS_MOVE_SELF, moved->d_inode, FSNOTIFY_EVENT_INODE, NULL);
+		fsnotify(source, FS_MOVE_SELF, moved->d_inode, FSNOTIFY_EVENT_INODE, NULL, 0);
 	}
 	audit_inode_child(new_name, moved, new_dir);
 }
@@ -158,7 +159,7 @@ static inline void fsnotify_link_count(struct inode *inode)
 {
 	inotify_inode_queue_event(inode, IN_ATTRIB, 0, NULL, NULL);
 
-	fsnotify(inode, FS_ATTRIB, inode, FSNOTIFY_EVENT_INODE, NULL);
+	fsnotify(inode, FS_ATTRIB, inode, FSNOTIFY_EVENT_INODE, NULL, 0);
 }
 
 /*
@@ -170,7 +171,7 @@ static inline void fsnotify_create(struct inode *inode, struct dentry *dentry)
 				  dentry->d_inode);
 	audit_inode_child(dentry->d_name.name, dentry, inode);
 
-	fsnotify(inode, FS_CREATE, dentry->d_inode, FSNOTIFY_EVENT_INODE, dentry->d_name.name);
+	fsnotify(inode, FS_CREATE, dentry->d_inode, FSNOTIFY_EVENT_INODE, dentry->d_name.name, 0);
 }
 
 /*
@@ -185,7 +186,7 @@ static inline void fsnotify_link(struct inode *dir, struct inode *inode, struct
 	fsnotify_link_count(inode);
 	audit_inode_child(new_dentry->d_name.name, new_dentry, dir);
 
-	fsnotify(dir, FS_CREATE, inode, FSNOTIFY_EVENT_INODE, new_dentry->d_name.name);
+	fsnotify(dir, FS_CREATE, inode, FSNOTIFY_EVENT_INODE, new_dentry->d_name.name, 0);
 }
 
 /*
@@ -197,7 +198,7 @@ static inline void fsnotify_mkdir(struct inode *inode, struct dentry *dentry)
 				  dentry->d_name.name, dentry->d_inode);
 	audit_inode_child(dentry->d_name.name, dentry, inode);
 
-	fsnotify(inode, FS_CREATE | FS_IN_ISDIR, dentry->d_inode, FSNOTIFY_EVENT_INODE, dentry->d_name.name);
+	fsnotify(inode, FS_CREATE | FS_IN_ISDIR, dentry->d_inode, FSNOTIFY_EVENT_INODE, dentry->d_name.name, 0);
 }
 
 /*
@@ -215,7 +216,7 @@ static inline void fsnotify_access(struct dentry *dentry)
 	inotify_inode_queue_event(inode, mask, 0, NULL, NULL);
 
 	fsnotify_parent(dentry, mask);
-	fsnotify(inode, mask, inode, FSNOTIFY_EVENT_INODE, NULL);
+	fsnotify(inode, mask, inode, FSNOTIFY_EVENT_INODE, NULL, 0);
 }
 
 /*
@@ -233,7 +234,7 @@ static inline void fsnotify_modify(struct dentry *dentry)
 	inotify_inode_queue_event(inode, mask, 0, NULL, NULL);
 
 	fsnotify_parent(dentry, mask);
-	fsnotify(inode, mask, inode, FSNOTIFY_EVENT_INODE, NULL);
+	fsnotify(inode, mask, inode, FSNOTIFY_EVENT_INODE, NULL, 0);
 }
 
 /*
@@ -251,7 +252,7 @@ static inline void fsnotify_open(struct dentry *dentry)
 	inotify_inode_queue_event(inode, mask, 0, NULL, NULL);
 
 	fsnotify_parent(dentry, mask);
-	fsnotify(inode, mask, inode, FSNOTIFY_EVENT_INODE, NULL);
+	fsnotify(inode, mask, inode, FSNOTIFY_EVENT_INODE, NULL, 0);
 }
 
 /*
@@ -272,7 +273,7 @@ static inline void fsnotify_close(struct file *file)
 	inotify_inode_queue_event(inode, mask, 0, NULL, NULL);
 
 	fsnotify_parent(dentry, mask);
-	fsnotify(inode, mask, file, FSNOTIFY_EVENT_FILE, NULL);
+	fsnotify(inode, mask, file, FSNOTIFY_EVENT_FILE, NULL, 0);
 }
 
 /*
@@ -290,7 +291,7 @@ static inline void fsnotify_xattr(struct dentry *dentry)
 	inotify_inode_queue_event(inode, mask, 0, NULL, NULL);
 
 	fsnotify_parent(dentry, mask);
-	fsnotify(inode, mask, inode, FSNOTIFY_EVENT_INODE, NULL);
+	fsnotify(inode, mask, inode, FSNOTIFY_EVENT_INODE, NULL, 0);
 }
 
 /*
@@ -328,7 +329,7 @@ static inline void fsnotify_change(struct dentry *dentry, unsigned int ia_valid)
 						  dentry->d_name.name);
 
 		fsnotify_parent(dentry, mask);
-		fsnotify(inode, mask, inode, FSNOTIFY_EVENT_INODE, NULL);
+		fsnotify(inode, mask, inode, FSNOTIFY_EVENT_INODE, NULL, 0);
 	}
 }
 
diff --git a/include/linux/fsnotify_backend.h b/include/linux/fsnotify_backend.h
index c1f89da..f1b61b7 100644
--- a/include/linux/fsnotify_backend.h
+++ b/include/linux/fsnotify_backend.h
@@ -163,6 +163,7 @@ struct fsnotify_event {
 	atomic_t refcnt;	/* how many groups still are using/need to send this event */
 	__u32 mask;		/* the type of access */
 
+	u32 sync_cookie;	/* used to correlate events, namely inotify mv events */
 	char *file_name;
 	size_t name_len;
 };
@@ -195,8 +196,9 @@ struct fsnotify_mark_entry {
 #ifdef CONFIG_FSNOTIFY
 
 /* called from the vfs to signal fs events */
-extern void fsnotify(struct inode *to_tell, __u32 mask, void *data, int data_is, const char *name);
+extern void fsnotify(struct inode *to_tell, __u32 mask, void *data, int data_is, const char *name, u32 cookie);
 extern void __fsnotify_inode_delete(struct inode *inode);
+extern u32 fsnotify_get_cookie(void);
 
 static inline int fsnotify_inode_watches_children(struct inode *inode)
 {
@@ -255,11 +257,13 @@ extern void fsnotify_get_mark(struct fsnotify_mark_entry *entry);
 extern void fsnotify_put_mark(struct fsnotify_mark_entry *entry);
 
 /* put here because inotify does some weird stuff when destroying watches */
-extern struct fsnotify_event *fsnotify_create_event(struct inode *to_tell, __u32 mask, void *data, int data_is, const char *name);
+extern struct fsnotify_event *fsnotify_create_event(struct inode *to_tell, __u32 mask, void *data,
+						    int data_is, const char *name, __u32 cookie);
 
 #else
 
-static inline void fsnotify(struct inode *to_tell, __u32 mask, void *data, int data_is, const char *name);
+static inline void fsnotify(struct inode *to_tell, __u32 mask, void *data, int data_is,
+			    const char *name, u32 cookie)
 {}
 
 static inline void __fsnotify_inode_delete(struct inode *inode)
@@ -268,6 +272,11 @@ static inline void __fsnotify_inode_delete(struct inode *inode)
 static inline void __fsnotify_d_instantiate(struct dentry *dentry, struct inode *inode)
 {}
 
+static inline u32 fsnotify_get_cookie(void)
+{
+	return 0;
+}
+
 #endif	/* CONFIG_FSNOTIFY */
 
 #endif	/* __KERNEL __ */
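
For readers following the rename path: fsnotify_move() above now draws one value from fsnotify_get_cookie() and passes it with both the FS_MOVED_FROM and the FS_MOVED_TO event, while every other caller passes 0. A listener can therefore pair the two halves of a rename by comparing sync_cookie. The following is a minimal userspace sketch of that idea, not kernel code: C11 <stdatomic.h> stands in for atomic_t, the two mask values are copied from the inotify ABI, and every demo_* name is hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical stand-ins; values match IN_MOVED_FROM / IN_MOVED_TO. */
#define DEMO_MOVED_FROM 0x00000040u
#define DEMO_MOVED_TO   0x00000080u

struct demo_event {
	uint32_t mask;
	uint32_t sync_cookie;   /* 0 means "not correlated with anything" */
	const char *name;
};

static atomic_uint demo_cookie_counter = 0;

/*
 * Mirrors atomic_inc_return(): hand out the incremented value, so the
 * first cookie is 1 and 0 stays reserved for "no cookie" (until the
 * 32-bit counter wraps).
 */
static uint32_t demo_get_cookie(void)
{
	return atomic_fetch_add(&demo_cookie_counter, 1) + 1;
}

/* One rename produces two events that share a single cookie. */
static void demo_move(const char *old_name, const char *new_name,
		      struct demo_event *from, struct demo_event *to)
{
	uint32_t cookie = demo_get_cookie();

	assert(from && to);
	*from = (struct demo_event){ DEMO_MOVED_FROM, cookie, old_name };
	*to   = (struct demo_event){ DEMO_MOVED_TO,   cookie, new_name };
}

/* A listener pairs MOVED_FROM with MOVED_TO by comparing cookies. */
static int demo_is_same_move(const struct demo_event *a,
			     const struct demo_event *b)
{
	return a->sync_cookie != 0 && a->sync_cookie == b->sync_cookie;
}
```

Events that are not part of a move carry a cookie of 0, which is why demo_is_same_move() refuses to match on 0.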



* [PATCH -V2 10/13] fsnotify: allow groups to add private data to events
  2009-03-27 20:05 [PATCH -V2 01/13] mutex: add atomic_dec_and_mutex_lock Eric Paris
                   ` (7 preceding siblings ...)
  2009-03-27 20:05 ` [PATCH -V2 09/13] fsnotify: add correlations between events Eric Paris
@ 2009-03-27 20:06 ` Eric Paris
  2009-03-27 20:06 ` [PATCH -V2 11/13] fsnotify: fsnotify marks on inodes pin them in core Eric Paris
                   ` (3 subsequent siblings)
  12 siblings, 0 replies; 26+ messages in thread
From: Eric Paris @ 2009-03-27 20:06 UTC (permalink / raw)
  To: linux-kernel; +Cc: viro, hch, alan, sfr, john, rlove, akpm

inotify needs per-group information attached to events.  This patch allows
groups to attach private data to an event and adds a callback
(free_event_priv) so that this data can be freed when the event is destroyed.
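
A minimal userspace sketch of the pattern, assuming a singly linked list in place of list_head and using hypothetical demo_* names throughout: each group embeds a generic header (the role played by struct fsnotify_event_private_data) at the start of its own data, links it onto the event, finds it again by group pointer as fsnotify_get_priv_from_event() does, and must free it before the event is released.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical userspace model; not kernel code. */
struct demo_group { const char *name; };

/* Generic header each group embeds first in its private data. */
struct demo_priv {
	struct demo_group *group;
	struct demo_priv *next;      /* singly linked instead of list_head */
};

struct demo_event {
	struct demo_priv *private_data;  /* per-group private data */
};

/* inotify-style private data: header first, then the watch descriptor. */
struct demo_inotify_priv {
	struct demo_priv base;
	int wd;
};

static struct demo_inotify_priv *demo_new_inotify_priv(struct demo_group *g, int wd)
{
	struct demo_inotify_priv *ip = malloc(sizeof(*ip));

	if (!ip)
		return NULL;
	ip->base.group = g;
	ip->base.next = NULL;
	ip->wd = wd;
	return ip;
}

/* Prepends; the kernel patch uses list_add_tail() instead. */
static void demo_attach_priv(struct demo_event *ev, struct demo_priv *priv)
{
	priv->next = ev->private_data;
	ev->private_data = priv;
}

/* Same search as fsnotify_get_priv_from_event(): first entry owned by group. */
static struct demo_priv *demo_find_priv(struct demo_event *ev, struct demo_group *group)
{
	for (struct demo_priv *p = ev->private_data; p; p = p->next)
		if (p->group == group)
			return p;
	return NULL;
}

/* The free_event_priv role: unlink and free this group's entry. */
static void demo_free_priv(struct demo_event *ev, struct demo_group *group)
{
	struct demo_priv **pp = &ev->private_data;

	while (*pp) {
		if ((*pp)->group == group) {
			struct demo_priv *victim = *pp;

			*pp = victim->next;
			/* header is the first member, so this is the allocation start */
			free((struct demo_inotify_priv *)victim);
			return;
		}
		pp = &(*pp)->next;
	}
}

/* Mirrors the BUG_ON() added to fsnotify_put_event(). */
static void demo_destroy_event(struct demo_event *ev)
{
	assert(ev->private_data == NULL);
}
```

Placing the header first is what allows the cast back to the group's own struct; container_of() would remove that ordering requirement.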

Signed-off-by: Eric Paris <eparis@redhat.com>
---

 fs/notify/dnotify/dnotify.c      |    1 +
 fs/notify/notification.c         |   31 +++++++++++++++++++++++++++----
 include/linux/fsnotify_backend.h |   15 ++++++++++++++-
 3 files changed, 42 insertions(+), 5 deletions(-)

diff --git a/fs/notify/dnotify/dnotify.c b/fs/notify/dnotify/dnotify.c
index f69e0c4..26c07c5 100644
--- a/fs/notify/dnotify/dnotify.c
+++ b/fs/notify/dnotify/dnotify.c
@@ -152,6 +152,7 @@ static struct fsnotify_ops dnotify_fsnotify_ops = {
 	.should_send_event = dnotify_should_send_event,
 	.free_group_priv = NULL,
 	.freeing_mark = dnotify_freeing_mark,
+	.free_event_priv = NULL,
 };
 
 void dnotify_flush(struct file *filp, fl_owner_t id)
diff --git a/fs/notify/notification.c b/fs/notify/notification.c
index 420769e..2750703 100644
--- a/fs/notify/notification.c
+++ b/fs/notify/notification.c
@@ -71,6 +71,7 @@ void fsnotify_put_event(struct fsnotify_event *event)
 		if (event->data_type == FSNOTIFY_EVENT_PATH)
 			path_put(&event->path);
 
+		BUG_ON(!list_empty(&event->private_data_list));
 		kfree(event->file_name);
 		kmem_cache_free(event_kmem_cache, event);
 	}
@@ -86,8 +87,23 @@ void fsnotify_destroy_event_holder(struct fsnotify_event_holder *holder)
 	kmem_cache_free(event_holder_kmem_cache, holder);
 }
 
+struct fsnotify_event_private_data *fsnotify_get_priv_from_event(struct fsnotify_group *group, struct fsnotify_event *event)
+{
+	struct fsnotify_event_private_data *lpriv;
+	struct fsnotify_event_private_data *priv = NULL;
+
+	list_for_each_entry(lpriv, &event->private_data_list, event_list) {
+		if (lpriv->group == group) {
+			priv = lpriv;
+			break;
+		}
+	}
+	return priv;
+}
+
 /*
- * check if 2 events contain the same information.
+ * check if 2 events contain the same information.  we do not compare private data
+ * but at this moment that isn't a problem.
  */
 static inline int event_compare(struct fsnotify_event *old, struct fsnotify_event *new)
 {
@@ -111,10 +127,11 @@ static inline int event_compare(struct fsnotify_event *old, struct fsnotify_even
 }
 
 /*
- * Add an event to the group notification queue.  The group can later pull this
- * event off the queue to deal with.
+ * Add events to the generic group notification queue.  We test if the event
+ * is the same as the last event in the queue, and if so, we do not add it.
+ * Events added to this queue must be removed with fsnotify_remove_notif_event.
  */
-int fsnotify_add_notif_event(struct fsnotify_group *group, struct fsnotify_event *event)
+int fsnotify_add_notif_event(struct fsnotify_group *group, struct fsnotify_event *event, struct fsnotify_event_private_data *priv)
 {
 	struct fsnotify_event_holder *holder = NULL;
 	struct list_head *list = &group->notification_list;
@@ -171,6 +188,8 @@ alloc_holder:
 
 	fsnotify_get_event(event);
 	list_add_tail(&holder->event_list, list);
+	if (priv)
+		list_add_tail(&priv->event_list, &event->private_data_list);
 	spin_unlock(&event->lock);
 	mutex_unlock(&group->notification_mutex);
 
@@ -235,6 +254,8 @@ void fsnotify_flush_notif(struct fsnotify_group *group)
 	mutex_lock(&group->notification_mutex);
 	while (fsnotify_check_notif_queue(group)) {
 		event = fsnotify_remove_notif_event(group);
+		if (group->ops->free_event_priv)
+			group->ops->free_event_priv(group, event);
 		fsnotify_put_event(event);
 	}
 	mutex_unlock(&group->notification_mutex);
@@ -253,6 +274,8 @@ static void initialize_event(struct fsnotify_event *event)
 	event->inode = NULL;
 	event->data_type = FSNOTIFY_EVENT_NONE;
 
+	INIT_LIST_HEAD(&event->private_data_list);
+
 	event->to_tell = NULL;
 
 	event->file_name = NULL;
diff --git a/include/linux/fsnotify_backend.h b/include/linux/fsnotify_backend.h
index f1b61b7..2da9790 100644
--- a/include/linux/fsnotify_backend.h
+++ b/include/linux/fsnotify_backend.h
@@ -79,6 +79,7 @@ struct fsnotify_ops {
 	int (*handle_event)(struct fsnotify_group *group, struct fsnotify_event *event);
 	void (*free_group_priv)(struct fsnotify_group *group);
 	void (*freeing_mark)(struct fsnotify_mark_entry *entry, struct fsnotify_group *group);
+	void (*free_event_priv)(struct fsnotify_group *group, struct fsnotify_event *event);
 };
 
 /*
@@ -133,6 +134,15 @@ struct fsnotify_event_holder {
 };
 
 /*
+ * Inotify needs to tack data onto an event.  This struct lets us later find the
+ * correct private data of the correct group.
+ */
+struct fsnotify_event_private_data {
+	struct fsnotify_group *group;
+	struct list_head event_list;
+};
+
+/*
  * all of the information about the original object we want to now send to
  * a group.  If you want to carry more info from the accessing task to the
  * listener this structure is where you need to be adding fields.
@@ -166,6 +176,8 @@ struct fsnotify_event {
 	u32 sync_cookie;	/* used to correlate events, namely inotify mv events */
 	char *file_name;
 	size_t name_len;
+
+	struct list_head private_data_list;	/* groups can store private data here */
 };
 
 /*
@@ -241,10 +253,11 @@ extern void fsnotify_put_group(struct fsnotify_group *group);
 extern void fsnotify_get_event(struct fsnotify_event *event);
 extern void fsnotify_put_event(struct fsnotify_event *event);
 
-extern int fsnotify_add_notif_event(struct fsnotify_group *group, struct fsnotify_event *event);
+extern int fsnotify_add_notif_event(struct fsnotify_group *group, struct fsnotify_event *event, struct fsnotify_event_private_data *priv);
 extern int fsnotify_check_notif_queue(struct fsnotify_group *group);
 extern struct fsnotify_event *fsnotify_peek_notif_event(struct fsnotify_group *group);
 extern struct fsnotify_event *fsnotify_remove_notif_event(struct fsnotify_group *group);
+extern struct fsnotify_event_private_data *fsnotify_get_priv_from_event(struct fsnotify_group *group, struct fsnotify_event *event);
 
 /* functions used to manipulate the marks attached to inodes */
 extern void fsnotify_recalc_inode_mask(struct inode *inode);



* [PATCH -V2 11/13] fsnotify: fsnotify marks on inodes pin them in core
  2009-03-27 20:05 [PATCH -V2 01/13] mutex: add atomic_dec_and_mutex_lock Eric Paris
                   ` (8 preceding siblings ...)
  2009-03-27 20:06 ` [PATCH -V2 10/13] fsnotify: allow groups to add private data to events Eric Paris
@ 2009-03-27 20:06 ` Eric Paris
  2009-03-27 20:06 ` [PATCH -V2 12/13] fsnotify: handle filesystem unmounts with fsnotify marks Eric Paris
                   ` (2 subsequent siblings)
  12 siblings, 0 replies; 26+ messages in thread
From: Eric Paris @ 2009-03-27 20:06 UTC (permalink / raw)
  To: linux-kernel; +Cc: viro, hch, alan, sfr, john, rlove, akpm

This patch pins in core any inode that has an fsnotify mark.  The idea is
that the inode is iput as soon as the mark is removed from the
inode->fsnotify_mark_entries list.  In reality it doesn't work exactly this
way: the igrab happens when the mark is added to an inode, but the iput
happens when the inode pointer inside the mark is set to NULL.

Two racing paths may try to remove the mark from different directions: one
because of an explicit request, the other because the inode was deleted.
The deletion path may be the one that removes the mark from the inode's
list, while the explicit request is the one that sets entry->inode = NULL
and calls iput().  This is safe.
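
The ownership rule can be sketched in userspace under simplifying assumptions: a plain int stands in for i_count, the code is single-threaded (in the kernel the check-and-clear of the inode pointer must be serialized by a lock), and all demo_* names are hypothetical. Whichever path clears mark->inode performs the one and only put, so the reference taken when the mark was added is dropped exactly once.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

struct demo_inode {
	int i_count;    /* reference count; 0 means the inode is going away */
	int nr_marks;   /* marks on this inode's list */
};

struct demo_mark {
	struct demo_inode *inode;  /* non-NULL while the mark pins the inode */
	bool on_inode_list;
};

/* igrab()-like: refuse to pin an inode that is already being freed. */
static struct demo_inode *demo_igrab(struct demo_inode *inode)
{
	if (inode->i_count == 0)
		return NULL;
	inode->i_count++;
	return inode;
}

static void demo_iput(struct demo_inode *inode)
{
	assert(inode->i_count > 0);
	inode->i_count--;
}

static int demo_add_mark(struct demo_mark *mark, struct demo_inode *in_inode)
{
	struct demo_inode *inode = demo_igrab(in_inode);

	if (!inode)
		return -EINVAL;
	mark->inode = inode;
	mark->on_inode_list = true;
	inode->nr_marks++;
	return 0;
}

/* Deletion path: only unlinks the mark from the inode's list. */
static void demo_unlink_from_inode(struct demo_mark *mark)
{
	if (mark->on_inode_list) {
		mark->on_inode_list = false;
		mark->inode->nr_marks--;
	}
}

/* Explicit removal: whoever clears mark->inode owns the single iput. */
static void demo_destroy_mark(struct demo_mark *mark)
{
	struct demo_inode *inode = mark->inode;

	if (!inode)
		return;            /* someone else already dropped the pin */
	demo_unlink_from_inode(mark);
	mark->inode = NULL;
	demo_iput(inode);
}
```

Calling the two paths in either order leaves i_count back where it started, and a repeated destroy is a no-op.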

Signed-off-by: Eric Paris <eparis@redhat.com>
---

 fs/notify/inode_mark.c |    9 ++++++++-
 1 files changed, 8 insertions(+), 1 deletions(-)

diff --git a/fs/notify/inode_mark.c b/fs/notify/inode_mark.c
index e59a198..e9e89d2 100644
--- a/fs/notify/inode_mark.c
+++ b/fs/notify/inode_mark.c
@@ -160,6 +160,8 @@ void fsnotify_destroy_mark_by_entry(struct fsnotify_mark_entry *entry)
 
 	fsnotify_update_dentry_child_flags(inode);
 
+	iput(inode);
+
 	if (atomic_dec_and_test(&group->num_marks))
 		fsnotify_final_destroy_group(group);
 }
@@ -231,11 +233,16 @@ void fsnotify_init_mark(struct fsnotify_mark_entry *entry, void (*free_mark)(str
 	entry->free_mark = free_mark;
 }
 
-int fsnotify_add_mark(struct fsnotify_mark_entry *entry, struct fsnotify_group *group, struct inode *inode)
+int fsnotify_add_mark(struct fsnotify_mark_entry *entry, struct fsnotify_group *group, struct inode *in_inode)
 {
 	struct fsnotify_mark_entry *lentry;
+	struct inode *inode;
 	int ret = 0;
 
+	inode = igrab(in_inode);
+	if (unlikely(!inode))
+		return -EINVAL;
+
 	/*
 	 * LOCKING ORDER!!!!
 	 * entry->lock



* [PATCH -V2 12/13] fsnotify: handle filesystem unmounts with fsnotify marks
  2009-03-27 20:05 [PATCH -V2 01/13] mutex: add atomic_dec_and_mutex_lock Eric Paris
                   ` (9 preceding siblings ...)
  2009-03-27 20:06 ` [PATCH -V2 11/13] fsnotify: fsnotify marks on inodes pin them in core Eric Paris
@ 2009-03-27 20:06 ` Eric Paris
  2009-04-07 23:06   ` Andrew Morton
  2009-03-27 20:06 ` [PATCH -V2 13/13] inotify: reimplement inotify using fsnotify Eric Paris
  2009-04-07 23:06 ` [PATCH -V2 01/13] mutex: add atomic_dec_and_mutex_lock Andrew Morton
  12 siblings, 1 reply; 26+ messages in thread
From: Eric Paris @ 2009-03-27 20:06 UTC (permalink / raw)
  To: linux-kernel; +Cc: viro, hch, alan, sfr, john, rlove, akpm

When a filesystem is unmounted while an fsnotify mark entry is attached to
one of its inodes, we need to destroy that mark entry and, like inotify,
send an unmount event.
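
A single-threaded userspace sketch of the walk done by fsnotify_unmount_inodes() below, assuming a plain singly linked list for sb->s_inodes, an int for i_count, one hypothetical DEMO_I_FREEING flag in place of I_CLEAR | I_FREEING | I_WILL_FREE, and a hypothetical hook that counts one FS_UNMOUNT per watch before dropping the watches. It shows the two skip conditions and why the next inode is pinned before inode_lock would be released.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for I_CLEAR | I_FREEING | I_WILL_FREE. */
#define DEMO_I_FREEING 0x1u

struct demo_inode {
	int i_count;             /* reference count */
	unsigned int i_state;    /* DEMO_I_FREEING while being torn down */
	int nr_marks;            /* watches attached to this inode */
	int unmount_events;      /* FS_UNMOUNT events delivered for it */
	struct demo_inode *next; /* stand-in for the sb->s_inodes list */
};

/* The two tests applied before taking a reference. */
static int demo_can_pin(const struct demo_inode *inode)
{
	return inode->i_count > 0 && !(inode->i_state & DEMO_I_FREEING);
}

static void demo_iput(struct demo_inode *inode)
{
	assert(inode->i_count > 0);
	inode->i_count--;
}

/* Hypothetical hook: one FS_UNMOUNT per watch, then drop the watches. */
static void demo_send_unmount_and_drop_marks(struct demo_inode *inode)
{
	inode->unmount_events += inode->nr_marks;
	inode->nr_marks = 0;
}

static void demo_unmount_inodes(struct demo_inode *head)
{
	struct demo_inode *inode, *next;

	for (inode = head; inode; inode = next) {
		int pinned_next = 0;

		next = inode->next;
		/* unused or being freed: it cannot carry watches, skip it */
		if (!demo_can_pin(inode))
			continue;

		inode->i_count++;                 /* __iget(inode) */
		if (next && demo_can_pin(next)) { /* keep next alive as well */
			next->i_count++;
			pinned_next = 1;
		}

		/* inode_lock would be dropped here; both inodes stay pinned */
		demo_send_unmount_and_drop_marks(inode);
		demo_iput(inode);
		/* ...and retaken here before moving on */

		if (pinned_next)
			demo_iput(next);
	}
}
```

The real function carries next_i's extra reference into the following iteration through need_iput instead of dropping it at once; the sketch releases it at the end of each step for simplicity.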

Signed-off-by: Eric Paris <eparis@redhat.com>
---

 fs/inode.c                       |    1 +
 fs/notify/inode_mark.c           |   73 ++++++++++++++++++++++++++++++++++++++
 include/linux/fsnotify_backend.h |    4 ++
 3 files changed, 78 insertions(+), 0 deletions(-)

diff --git a/fs/inode.c b/fs/inode.c
index 6a9a98e..7922d0b 100644
--- a/fs/inode.c
+++ b/fs/inode.c
@@ -406,6 +406,7 @@ int invalidate_inodes(struct super_block * sb)
 	mutex_lock(&iprune_mutex);
 	spin_lock(&inode_lock);
 	inotify_unmount_inodes(&sb->s_inodes);
+	fsnotify_unmount_inodes(&sb->s_inodes);
 	busy = invalidate_list(&sb->s_inodes, &throw_away);
 	spin_unlock(&inode_lock);
 
diff --git a/fs/notify/inode_mark.c b/fs/notify/inode_mark.c
index e9e89d2..06a00be 100644
--- a/fs/notify/inode_mark.c
+++ b/fs/notify/inode_mark.c
@@ -23,6 +23,7 @@
 #include <linux/mutex.h>
 #include <linux/slab.h>
 #include <linux/spinlock.h>
+#include <linux/writeback.h> /* for inode_lock */
 
 #include <asm/atomic.h>
 
@@ -282,3 +283,75 @@ int fsnotify_add_mark(struct fsnotify_mark_entry *entry, struct fsnotify_group *
 
 	return ret;
 }
+
+/**
+ * fsnotify_unmount_inodes - an sb is unmounting.  handle any watched inodes.
+ * @list: list of inodes being unmounted (sb->s_inodes)
+ *
+ * Called with inode_lock held, protecting the unmounting super block's list
+ * of inodes, and with iprune_mutex held, keeping shrink_icache_memory() at bay.
+ * We temporarily drop inode_lock, however, and CAN block.
+ */
+void fsnotify_unmount_inodes(struct list_head *list)
+{
+	struct inode *inode, *next_i, *need_iput = NULL;
+
+	list_for_each_entry_safe(inode, next_i, list, i_sb_list) {
+		struct inode *need_iput_tmp;
+
+		/*
+		 * If i_count is zero, the inode cannot have any watches and
+		 * doing an __iget/iput with MS_ACTIVE clear would actually
+		 * evict all inodes with zero i_count from icache which is
+		 * unnecessarily violent and may in fact be illegal to do.
+		 */
+		if (!atomic_read(&inode->i_count))
+			continue;
+
+		/*
+		 * We cannot __iget() an inode in state I_CLEAR, I_FREEING, or
+		 * I_WILL_FREE which is fine because by that point the inode
+		 * cannot have any associated watches.
+		 */
+		if (inode->i_state & (I_CLEAR | I_FREEING | I_WILL_FREE))
+			continue;
+
+		need_iput_tmp = need_iput;
+		need_iput = NULL;
+
+		/* In case fsnotify_inode_delete() drops a reference. */
+		if (inode != need_iput_tmp)
+			__iget(inode);
+		else
+			need_iput_tmp = NULL;
+
+		/* In case the dropping of a reference would nuke next_i. */
+		if ((&next_i->i_sb_list != list) &&
+		    atomic_read(&next_i->i_count) &&
+		    !(next_i->i_state & (I_CLEAR | I_FREEING | I_WILL_FREE))) {
+			__iget(next_i);
+			need_iput = next_i;
+		}
+
+		/*
+		 * We can safely drop inode_lock here because we hold
+		 * references on both inode and next_i.  Also no new inodes
+		 * will be added since the umount has begun.  Finally,
+		 * iprune_mutex keeps shrink_icache_memory() away.
+		 */
+		spin_unlock(&inode_lock);
+
+		if (need_iput_tmp)
+			iput(need_iput_tmp);
+
+		/* for each watch, send FS_UNMOUNT and then remove it */
+		fsnotify(inode, FS_UNMOUNT, inode, FSNOTIFY_EVENT_INODE, NULL, 0);
+
+		fsnotify_inode_delete(inode);
+
+		iput(inode);
+
+		spin_lock(&inode_lock);
+	}
+}
+EXPORT_SYMBOL_GPL(fsnotify_unmount_inodes);
diff --git a/include/linux/fsnotify_backend.h b/include/linux/fsnotify_backend.h
index 2da9790..f2daf59 100644
--- a/include/linux/fsnotify_backend.h
+++ b/include/linux/fsnotify_backend.h
@@ -268,6 +268,7 @@ extern void fsnotify_destroy_mark_by_entry(struct fsnotify_mark_entry *entry);
 extern void fsnotify_clear_marks_by_group(struct fsnotify_group *group);
 extern void fsnotify_get_mark(struct fsnotify_mark_entry *entry);
 extern void fsnotify_put_mark(struct fsnotify_mark_entry *entry);
+extern void fsnotify_unmount_inodes(struct list_head *list);
 
 /* put here because inotify does some weird stuff when destroying watches */
 extern struct fsnotify_event *fsnotify_create_event(struct inode *to_tell, __u32 mask, void *data,
@@ -290,6 +291,9 @@ static inline u32 fsnotify_get_cookie(void)
 	return 0;
 }
 
+static inline void fsnotify_unmount_inodes(struct list_head *list)
+{}
+
 #endif	/* CONFIG_FSNOTIFY */
 
 #endif	/* __KERNEL __ */



* [PATCH -V2 13/13] inotify: reimplement inotify using fsnotify
  2009-03-27 20:05 [PATCH -V2 01/13] mutex: add atomic_dec_and_mutex_lock Eric Paris
                   ` (10 preceding siblings ...)
  2009-03-27 20:06 ` [PATCH -V2 12/13] fsnotify: handle filesystem unmounts with fsnotify marks Eric Paris
@ 2009-03-27 20:06 ` Eric Paris
  2009-04-07 23:06   ` Andrew Morton
  2009-04-07 23:06 ` [PATCH -V2 01/13] mutex: add atomic_dec_and_mutex_lock Andrew Morton
  12 siblings, 1 reply; 26+ messages in thread
From: Eric Paris @ 2009-03-27 20:06 UTC (permalink / raw)
  To: linux-kernel; +Cc: viro, hch, alan, sfr, john, rlove, akpm

Reimplement inotify_user using fsnotify.  This should be feature-for-feature
identical to the original inotify_user.  It does not change the in-kernel
inotify feature used by audit; those patches (and the eventual removal of
in-kernel inotify) will follow once the new inotify_user has proven to work
correctly.
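
One piece that keeps the reimplementation simple is visible in the new fs/notify/inotify/inotify.h: the BUILD_BUG_ON() checks force every IN_* bit to equal its FS_* counterpart, so inotify_arg_to_mask() and inotify_mask_to_arg() reduce to bit filtering. A userspace sketch of the same two conversions, assuming the Linux <sys/inotify.h> header for the IN_* values; DEMO_EVENT_ON_CHILD is a hypothetical stand-in for the kernel-internal FS_EVENT_ON_CHILD, and its value is an assumption chosen not to collide with any IN_* bit.

```c
#include <assert.h>
#include <stdint.h>
#include <sys/inotify.h>   /* Linux: IN_* constants */

/* Hypothetical stand-in for FS_EVENT_ON_CHILD; value is an assumption. */
#define DEMO_EVENT_ON_CHILD 0x08000000u

/* Same spirit as the BUILD_BUG_ON() checks: the bits must not overlap. */
static_assert((IN_ALL_EVENTS & DEMO_EVENT_ON_CHILD) == 0,
	      "demo child bit must not overlap IN_* event bits");

/* inotify_add_watch() argument -> internal mask (cf. inotify_arg_to_mask). */
static uint32_t demo_arg_to_mask(uint32_t arg)
{
	/* every watch receives its own IN_IGNORED and cares about children */
	uint32_t mask = IN_IGNORED | DEMO_EVENT_ON_CHILD;

	/* keep event bits and IN_ONESHOT; drop IN_ONLYDIR, IN_MASK_ADD, ... */
	mask |= arg & (IN_ALL_EVENTS | IN_ONESHOT);
	return mask;
}

/* Internal event mask -> value reported to userspace (cf. inotify_mask_to_arg). */
static uint32_t demo_mask_to_arg(uint32_t mask)
{
	return mask & (IN_ALL_EVENTS | IN_ISDIR | IN_UNMOUNT |
		       IN_IGNORED | IN_Q_OVERFLOW);
}
```

IN_ONESHOT survives into the internal mask, where the watch logic needs it, but the reverse conversion strips it because it is a watch flag rather than something reported in an event.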

Signed-off-by: Eric Paris <eparis@redhat.com>
---

 MAINTAINERS                          |    2 
 fs/notify/inotify/Kconfig            |   20 +
 fs/notify/inotify/Makefile           |    2 
 fs/notify/inotify/inotify.h          |  107 ++++++
 fs/notify/inotify/inotify_fsnotify.c |  156 +++++++++
 fs/notify/inotify/inotify_kernel.c   |  276 ++++++++++++++++
 fs/notify/inotify/inotify_user.c     |  584 ++++++++--------------------------
 include/linux/fsnotify_backend.h     |   11 +
 8 files changed, 707 insertions(+), 451 deletions(-)
 create mode 100644 fs/notify/inotify/inotify.h
 create mode 100644 fs/notify/inotify/inotify_fsnotify.c
 create mode 100644 fs/notify/inotify/inotify_kernel.c

diff --git a/MAINTAINERS b/MAINTAINERS
index 3877ec4..6fa9f03 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -2252,6 +2252,8 @@ P:	John McCutchan
 M:	john@johnmccutchan.com
 P:	Robert Love
 M:	rlove@rlove.org
+P:	Eric Paris
+M:	eparis@parisplace.org
 L:	linux-kernel@vger.kernel.org
 S:	Maintained
 
diff --git a/fs/notify/inotify/Kconfig b/fs/notify/inotify/Kconfig
index 4467928..5356884 100644
--- a/fs/notify/inotify/Kconfig
+++ b/fs/notify/inotify/Kconfig
@@ -1,26 +1,30 @@
 config INOTIFY
 	bool "Inotify file change notification support"
-	default y
+	default n
 	---help---
-	  Say Y here to enable inotify support.  Inotify is a file change
-	  notification system and a replacement for dnotify.  Inotify fixes
-	  numerous shortcomings in dnotify and introduces several new features
-	  including multiple file events, one-shot support, and unmount
-	  notification.
+	  Say Y here to enable legacy in-kernel inotify support.  Inotify is a
+	  file change notification system.  It is a replacement for dnotify.
+	  This option only provides the legacy in-kernel inotify API.  There
+	  are no in-tree kernel users of this interface since it is deprecated.
+	  You only need this if you are loading an out-of-tree kernel module
+	  that uses inotify.
 
 	  For more information, see <file:Documentation/filesystems/inotify.txt>
 
-	  If unsure, say Y.
+	  If unsure, say N.
 
 config INOTIFY_USER
 	bool "Inotify support for userspace"
-	depends on INOTIFY
+	depends on FSNOTIFY
 	default y
 	---help---
 	  Say Y here to enable inotify support for userspace, including the
 	  associated system calls.  Inotify allows monitoring of both files and
 	  directories via a single open fd.  Events are read from the file
 	  descriptor, which is also select()- and poll()-able.
+	  Inotify fixes numerous shortcomings in dnotify and introduces several
+	  new features including multiple file events, one-shot support, and
+	  unmount notification.
 
 	  For more information, see <file:Documentation/filesystems/inotify.txt>
 
diff --git a/fs/notify/inotify/Makefile b/fs/notify/inotify/Makefile
index e290f3b..aff7f68 100644
--- a/fs/notify/inotify/Makefile
+++ b/fs/notify/inotify/Makefile
@@ -1,2 +1,2 @@
 obj-$(CONFIG_INOTIFY)		+= inotify.o
-obj-$(CONFIG_INOTIFY_USER)	+= inotify_user.o
+obj-$(CONFIG_INOTIFY_USER)	+= inotify_fsnotify.o inotify_kernel.o inotify_user.o
diff --git a/fs/notify/inotify/inotify.h b/fs/notify/inotify/inotify.h
new file mode 100644
index 0000000..04922de
--- /dev/null
+++ b/fs/notify/inotify/inotify.h
@@ -0,0 +1,107 @@
+/*
+ * fs/inotify_user.c - inotify support for userspace
+ *
+ * Authors:
+ *	John McCutchan	<ttb@tentacle.dhs.org>
+ *	Robert Love	<rml@novell.com>
+ *
+ * Copyright (C) 2005 John McCutchan
+ * Copyright 2006 Hewlett-Packard Development Company, L.P.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the
+ * Free Software Foundation; either version 2, or (at your option) any
+ * later version.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * General Public License for more details.
+ */
+
+#include <linux/kernel.h>
+#include <linux/sched.h>
+#include <linux/slab.h>
+#include <linux/fs.h>
+#include <linux/file.h>
+#include <linux/fsnotify_backend.h>
+#include <linux/limits.h>
+#include <linux/module.h>
+#include <linux/mount.h>
+#include <linux/namei.h>
+#include <linux/poll.h>
+#include <linux/init.h>
+#include <linux/list.h>
+#include <linux/inotify.h>
+#include <linux/syscalls.h>
+#include <linux/string.h>
+#include <linux/magic.h>
+#include <linux/writeback.h>
+#include <linux/fsnotify.h>
+
+#include <asm/ioctls.h>
+
+extern struct kmem_cache *event_priv_cachep;
+extern int inotify_max_user_watches;
+
+struct inotify_event_private_data {
+	struct fsnotify_event_private_data fsnotify_event_priv_data;
+	int wd;
+};
+
+struct inotify_inode_mark_entry {
+	/* fsnotify_mark_entry MUST be the first thing */
+	struct fsnotify_mark_entry fsn_entry;
+	int wd;
+};
+
+static inline __u32 inotify_arg_to_mask(u32 arg)
+{
+	__u32 mask;
+
+	/* FS_* damn sure better equal IN_* */
+	BUILD_BUG_ON(IN_ACCESS != FS_ACCESS);
+	BUILD_BUG_ON(IN_MODIFY != FS_MODIFY);
+	BUILD_BUG_ON(IN_ATTRIB != FS_ATTRIB);
+	BUILD_BUG_ON(IN_CLOSE_WRITE != FS_CLOSE_WRITE);
+	BUILD_BUG_ON(IN_CLOSE_NOWRITE != FS_CLOSE_NOWRITE);
+	BUILD_BUG_ON(IN_OPEN != FS_OPEN);
+	BUILD_BUG_ON(IN_MOVED_FROM != FS_MOVED_FROM);
+	BUILD_BUG_ON(IN_MOVED_TO != FS_MOVED_TO);
+	BUILD_BUG_ON(IN_CREATE != FS_CREATE);
+	BUILD_BUG_ON(IN_DELETE != FS_DELETE);
+	BUILD_BUG_ON(IN_DELETE_SELF != FS_DELETE_SELF);
+	BUILD_BUG_ON(IN_MOVE_SELF != FS_MOVE_SELF);
+	BUILD_BUG_ON(IN_Q_OVERFLOW != FS_Q_OVERFLOW);
+
+	BUILD_BUG_ON(IN_UNMOUNT != FS_UNMOUNT);
+	BUILD_BUG_ON(IN_ISDIR != FS_IN_ISDIR);
+	BUILD_BUG_ON(IN_IGNORED != FS_IN_IGNORED);
+	BUILD_BUG_ON(IN_ONESHOT != FS_IN_ONESHOT);
+
+	/* everything should accept their own ignored and cares about children */
+	mask = (FS_IN_IGNORED | FS_EVENT_ON_CHILD);
+
+	/* mask off the flags used to open the fd */
+	mask |= (arg & (IN_ALL_EVENTS | IN_ONESHOT));
+
+	return mask;
+}
+
+static inline u32 inotify_mask_to_arg(__u32 mask)
+{
+	u32 arg;
+
+	arg = (mask & (IN_ALL_EVENTS | IN_ISDIR | IN_UNMOUNT | IN_IGNORED | IN_Q_OVERFLOW));
+
+	return arg;
+}
+
+
+extern int find_inode(const char __user *dirname, struct path *path, unsigned flags);
+extern void inotify_destroy_mark_entry(struct fsnotify_mark_entry *entry, struct fsnotify_group *group);
+extern int inotify_update_watch(struct fsnotify_group *group, struct inode *inode, u32 arg);
+extern struct fsnotify_group *inotify_new_group(struct user_struct *user, unsigned int max_events);
+extern void __inotify_free_event_priv(struct inotify_event_private_data *event_priv);
+
+extern const struct fsnotify_ops inotify_fsnotify_ops;
diff --git a/fs/notify/inotify/inotify_fsnotify.c b/fs/notify/inotify/inotify_fsnotify.c
new file mode 100644
index 0000000..ed6906b
--- /dev/null
+++ b/fs/notify/inotify/inotify_fsnotify.c
@@ -0,0 +1,156 @@
+/*
+ * fs/inotify_user.c - inotify support for userspace
+ *
+ * Authors:
+ *	John McCutchan	<ttb@tentacle.dhs.org>
+ *	Robert Love	<rml@novell.com>
+ *
+ * Copyright (C) 2005 John McCutchan
+ * Copyright 2006 Hewlett-Packard Development Company, L.P.
+ *
+ * Copyright (C) 2009 Eric Paris <Red Hat Inc>
+ * inotify was largely rewritten to make use of the fsnotify infrastructure
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the
+ * Free Software Foundation; either version 2, or (at your option) any
+ * later version.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * General Public License for more details.
+ */
+
+#include <linux/kernel.h>
+#include <linux/sched.h>
+#include <linux/slab.h>
+#include <linux/fs.h>
+#include <linux/file.h>
+#include <linux/limits.h>
+#include <linux/module.h>
+#include <linux/mount.h>
+#include <linux/namei.h>
+#include <linux/poll.h>
+#include <linux/idr.h>
+#include <linux/init.h>
+#include <linux/inotify.h>
+#include <linux/list.h>
+#include <linux/syscalls.h>
+#include <linux/string.h>
+#include <linux/magic.h>
+#include <linux/writeback.h>
+
+#include "inotify.h"
+
+#include <asm/ioctls.h>
+
+static int inotify_handle_event(struct fsnotify_group *group, struct fsnotify_event *event)
+{
+	struct fsnotify_mark_entry *entry;
+	struct inotify_inode_mark_entry *ientry;
+	struct inode *to_tell;
+	struct inotify_event_private_data *event_priv;
+	int wd, ret;
+
+	to_tell = event->to_tell;
+
+	spin_lock(&to_tell->i_lock);
+	entry = fsnotify_find_mark_entry(group, to_tell);
+	spin_unlock(&to_tell->i_lock);
+	/* race with watch removal? */
+	if (!entry)
+		return 0;
+	ientry = container_of(entry, struct inotify_inode_mark_entry, fsn_entry);
+
+	wd = ientry->wd;
+
+	fsnotify_put_mark(entry);
+
+	event_priv = kmem_cache_alloc(event_priv_cachep, GFP_KERNEL);
+	if (unlikely(!event_priv))
+		return -ENOMEM;
+
+	event_priv->fsnotify_event_priv_data.group = group;
+	event_priv->wd = wd;
+
+	ret = fsnotify_add_notif_event(group, event, &event_priv->fsnotify_event_priv_data);
+	if (ret) {
+		__inotify_free_event_priv(event_priv);
+		if (ret == -EALREADY)
+			ret = 0;
+	}
+
+	return ret;
+}
+
+static void inotify_freeing_mark(struct fsnotify_mark_entry *entry, struct fsnotify_group *group)
+{
+	inotify_destroy_mark_entry(entry, group);
+}
+
+static int inotify_should_send_event(struct fsnotify_group *group, struct inode *inode, __u32 mask)
+{
+	struct fsnotify_mark_entry *entry;
+	int send;
+
+	spin_lock(&inode->i_lock);
+	entry = fsnotify_find_mark_entry(group, inode);
+	spin_unlock(&inode->i_lock);
+	if (!entry)
+		return 0;
+
+	spin_lock(&entry->lock);
+	send = !!(entry->mask & mask);
+	spin_unlock(&entry->lock);
+
+	/* find took a reference */
+	fsnotify_put_mark(entry);
+
+	return send;
+}
+
+static int idr_callback(int id, void *p, void *data)
+{
+	BUG();
+	return 0;
+}
+
+static void inotify_free_group_priv(struct fsnotify_group *group)
+{
+	/* ideally the idr is empty and we won't hit the BUG in the callback */
+	idr_for_each(&group->inotify_data.idr, idr_callback, NULL);
+	idr_remove_all(&group->inotify_data.idr);
+	idr_destroy(&group->inotify_data.idr);
+}
+
+void __inotify_free_event_priv(struct inotify_event_private_data *event_priv)
+{
+	list_del_init(&event_priv->fsnotify_event_priv_data.event_list);
+	kmem_cache_free(event_priv_cachep, event_priv);
+}
+
+static void inotify_free_event_priv(struct fsnotify_group *group, struct fsnotify_event *event)
+{
+	struct fsnotify_event_private_data *fsn_event_priv;
+	struct inotify_event_private_data *event_priv;
+
+	spin_lock(&event->lock);
+
+	fsn_event_priv = fsnotify_get_priv_from_event(group, event);
+	BUG_ON(!fsn_event_priv);
+
+	event_priv = container_of(fsn_event_priv, struct inotify_event_private_data, fsnotify_event_priv_data);
+
+	__inotify_free_event_priv(event_priv);
+
+	spin_unlock(&event->lock);
+}
+
+const struct fsnotify_ops inotify_fsnotify_ops = {
+	.handle_event = inotify_handle_event,
+	.should_send_event = inotify_should_send_event,
+	.free_group_priv = inotify_free_group_priv,
+	.free_event_priv = inotify_free_event_priv,
+	.freeing_mark = inotify_freeing_mark,
+};
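The filter in inotify_should_send_event() above boils down to one bit test on the mark's mask, wrapped in a lookup that takes a reference and a put that drops it. The following is a minimal single-threaded userspace sketch of that decision, not the kernel code: struct mark, find_mark(), put_mark() and should_send() are hypothetical stand-ins for struct fsnotify_mark_entry, fsnotify_find_mark_entry() and fsnotify_put_mark(). The sketch uses the userspace IN_* constants from <sys/inotify.h> as stand-ins for the kernel-side mask bits, and models neither inode->i_lock nor entry->lock.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/inotify.h>

/* Hypothetical stand-in for struct fsnotify_mark_entry (no locking). */
struct mark {
	uint32_t mask;   /* events this watch asked for */
	int refcnt;      /* references held on the mark */
};

/* Hypothetical lookup: like fsnotify_find_mark_entry(), hands back the
 * mark with an extra reference held, or NULL if the inode is unwatched. */
static struct mark *find_mark(struct mark *m)
{
	if (m)
		m->refcnt++;
	return m;
}

/* Hypothetical counterpart of fsnotify_put_mark(). */
static void put_mark(struct mark *m)
{
	m->refcnt--;
}

/* Same decision as inotify_should_send_event(): send only if the watch
 * mask and the event mask share at least one bit. */
static int should_send(struct mark *watched, uint32_t event_mask)
{
	struct mark *m = find_mark(watched);
	int send;

	if (!m)
		return 0;                  /* no mark: nothing to deliver */

	send = !!(m->mask & event_mask);   /* normalise to 0 or 1 */
	put_mark(m);                       /* "find took a reference" */
	return send;
}
```

Note that the reference taken by the lookup is always dropped before returning, so a watch's refcount is unchanged by the filter, which is the invariant the kernel version relies on.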
diff --git a/fs/notify/inotify/inotify_kernel.c b/fs/notify/inotify/inotify_kernel.c
new file mode 100644
index 0000000..43f4ecb
--- /dev/null
+++ b/fs/notify/inotify/inotify_kernel.c
@@ -0,0 +1,276 @@
+/*
+ * fs/notify/inotify/inotify_kernel.c - inotify support for userspace
+ *
+ * Authors:
+ *	John McCutchan	<ttb@tentacle.dhs.org>
+ *	Robert Love	<rml@novell.com>
+ *
+ * Copyright (C) 2005 John McCutchan
+ * Copyright 2006 Hewlett-Packard Development Company, L.P.
+ *
+ * Copyright (C) 2009 Eric Paris <Red Hat Inc>
+ * inotify was largely rewritten to make use of the fsnotify infrastructure
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the
+ * Free Software Foundation; either version 2, or (at your option) any
+ * later version.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * General Public License for more details.
+ */
+
+#include <linux/kernel.h>
+#include <linux/sched.h>
+#include <linux/slab.h>
+#include <linux/fs.h>
+#include <linux/file.h>
+#include <linux/limits.h>
+#include <linux/module.h>
+#include <linux/mount.h>
+#include <linux/namei.h>
+#include <linux/poll.h>
+#include <linux/idr.h>
+#include <linux/init.h>
+#include <linux/inotify.h>
+#include <linux/list.h>
+#include <linux/syscalls.h>
+#include <linux/string.h>
+#include <linux/magic.h>
+#include <linux/writeback.h>
+
+#include "inotify.h"
+
+#include <asm/ioctls.h>
+
+static struct kmem_cache *inotify_inode_mark_cachep __read_mostly;
+struct kmem_cache *event_priv_cachep __read_mostly;
+static struct fsnotify_event *inotify_ignored_event;
+
+atomic_t inotify_grp_num;
+
+/*
+ * find_inode - resolve a user-given path to a specific inode
+ */
+int find_inode(const char __user *dirname, struct path *path, unsigned flags)
+{
+	int error;
+
+	error = user_path_at(AT_FDCWD, dirname, flags, path);
+	if (error)
+		return error;
+	/* you can only watch an inode if you have read permissions on it */
+	error = inode_permission(path->dentry->d_inode, MAY_READ);
+	if (error)
+		path_put(path);
+	return error;
+}
+
+/*
+ * When, for whatever reason, inotify is done with a mark (or what used to be a
+ * watch) we need to remove that watch from the idr and we need to send IN_IGNORED
+ * for the given wd.
+ *
+ * There is a bit of recursion here.  The loop looks like:
+ * 	inotify_destroy_mark_entry -> fsnotify_destroy_mark_by_entry ->
+ *	inotify_freeing_mark -> inotify_destroy_mark_entry -> restart
+ * But the loop is broken in 2 places.  fsnotify_destroy_mark_by_entry sets
+ * entry->group = NULL before the call to inotify_freeing_mark, so the if (egroup)
+ * test below will not call back to fsnotify again.  But even if that test wasn't
+ * there this would still be safe since fsnotify_destroy_mark_by_entry() is
+ * safe from recursion.
+ */
+void inotify_destroy_mark_entry(struct fsnotify_mark_entry *entry, struct fsnotify_group *group)
+{
+	struct inotify_inode_mark_entry *ientry;
+	struct inotify_event_private_data *event_priv;
+	struct fsnotify_group *egroup;
+	struct idr *idr;
+	int ret;
+
+	spin_lock(&entry->lock);
+	egroup = entry->group;
+
+	/* if egroup is still set we aren't really done and something might still
+	 * send events for this inode; we'll send IN_IGNORED from the callback */
+	if (egroup) {
+		spin_unlock(&entry->lock);
+		fsnotify_destroy_mark_by_entry(entry);
+		return;
+	}
+	spin_unlock(&entry->lock);
+
+	ientry = container_of(entry, struct inotify_inode_mark_entry, fsn_entry);
+
+	event_priv = kmem_cache_alloc(event_priv_cachep, GFP_KERNEL);
+	if (unlikely(!event_priv))
+		goto skip_send_ignore;
+
+	event_priv->fsnotify_event_priv_data.group = group;
+	event_priv->wd = ientry->wd;
+
+	ret = fsnotify_add_notif_event(group, inotify_ignored_event, &event_priv->fsnotify_event_priv_data);
+	if (ret)
+		__inotify_free_event_priv(event_priv);
+
+skip_send_ignore:
+
+	/* remove this entry from the idr */
+	spin_lock(&group->inotify_data.idr_lock);
+	idr = &group->inotify_data.idr;
+	idr_remove(idr, ientry->wd);
+	spin_unlock(&group->inotify_data.idr_lock);
+
+	/* removed from idr, drop that reference */
+	fsnotify_put_mark(entry);
+}
+
+/* ding dong the mark is dead */
+static void inotify_free_mark(struct fsnotify_mark_entry *entry)
+{
+	struct inotify_inode_mark_entry *ientry = container_of(entry, struct inotify_inode_mark_entry, fsn_entry);
+
+	kmem_cache_free(inotify_inode_mark_cachep, ientry);
+}
+
+int inotify_update_watch(struct fsnotify_group *group, struct inode *inode, u32 arg)
+{
+	struct fsnotify_mark_entry *entry = NULL;
+	struct inotify_inode_mark_entry *ientry;
+	int ret = 0;
+	int add = (arg & IN_MASK_ADD);
+	__u32 mask;
+	__u32 old_mask, new_mask;
+
+	/* don't allow invalid bits: we don't want flags set */
+	mask = inotify_arg_to_mask(arg);
+	if (unlikely(!mask))
+		return -EINVAL;
+
+	ientry = kmem_cache_alloc(inotify_inode_mark_cachep, GFP_KERNEL);
+	if (unlikely(!ientry))
+		return -ENOMEM;
+	/* we set the mask at the end after attaching it */
+	fsnotify_init_mark(&ientry->fsn_entry, inotify_free_mark);
+	ientry->wd = 0;
+
+find_entry:
+	spin_lock(&inode->i_lock);
+	entry = fsnotify_find_mark_entry(group, inode);
+	spin_unlock(&inode->i_lock);
+	if (entry) {
+		kmem_cache_free(inotify_inode_mark_cachep, ientry);
+		ientry = container_of(entry, struct inotify_inode_mark_entry, fsn_entry);
+	} else {
+		if (atomic_read(&group->inotify_data.user->inotify_watches) >= inotify_max_user_watches) {
+			ret = -ENOSPC;
+			goto out_err;
+		}
+
+		ret = fsnotify_add_mark(&ientry->fsn_entry, group, inode);
+		if (ret == -EEXIST)
+			goto find_entry;
+		else if (ret)
+			goto out_err;
+
+		entry = &ientry->fsn_entry;
+retry:
+		ret = -ENOMEM;
+		if (unlikely(!idr_pre_get(&group->inotify_data.idr, GFP_KERNEL)))
+			goto out_err;
+
+		spin_lock(&group->inotify_data.idr_lock);
+		/* if entry is added to the idr we keep the reference obtained
+		 * through fsnotify_add_mark.  remember to drop this reference
+		 * when entry is removed from idr */
+		ret = idr_get_new_above(&group->inotify_data.idr, entry,
+					++group->inotify_data.last_wd,
+					&ientry->wd);
+		spin_unlock(&group->inotify_data.idr_lock);
+		if (ret) {
+			if (ret == -EAGAIN)
+				goto retry;
+			goto out_err;
+		}
+		atomic_inc(&group->inotify_data.user->inotify_watches);
+	}
+
+	spin_lock(&entry->lock);
+
+	old_mask = entry->mask;
+	if (add) {
+		entry->mask |= mask;
+		new_mask = entry->mask;
+	} else {
+		entry->mask = mask;
+		new_mask = entry->mask;
+	}
+
+	spin_unlock(&entry->lock);
+
+	if (old_mask != new_mask) {
+		/* more bits in old than in new? */
+		int dropped = (old_mask & ~new_mask);
+		/* more bits in this entry than the inode's mask? */
+		int do_inode = (new_mask & ~inode->i_fsnotify_mask);
+		/* more bits in this entry than the group? */
+		int do_group = (new_mask & ~group->mask);
+
+		/* update the inode with this new entry */
+		if (dropped || do_inode)
+			fsnotify_recalc_inode_mask(inode);
+
+		/* update the group mask with the new mask */
+		if (dropped || do_group)
+			fsnotify_recalc_group_mask(group);
+	}
+
+	return ientry->wd;
+
+out_err:
+	/* this isn't supposed to happen, just kill the watch */
+	if (entry) {
+		fsnotify_destroy_mark_by_entry(entry);
+		fsnotify_put_mark(entry);
+	}
+	return ret;
+}
+
+struct fsnotify_group *inotify_new_group(struct user_struct *user, unsigned int max_events)
+{
+	struct fsnotify_group *group;
+	unsigned int grp_num;
+
+	/* fsnotify_obtain_group took a reference to group, we put this when we kill the file in the end */
+	grp_num = (INOTIFY_GROUP_NUM - atomic_inc_return(&inotify_grp_num));
+	group = fsnotify_obtain_group(grp_num, grp_num, 0, &inotify_fsnotify_ops);
+	if (IS_ERR(group))
+		return group;
+
+	group->max_events = max_events;
+
+	spin_lock_init(&group->inotify_data.idr_lock);
+	idr_init(&group->inotify_data.idr);
+	group->inotify_data.last_wd = 0;
+	group->inotify_data.user = user;
+	group->inotify_data.fa = NULL;
+
+	return group;
+}
+
+static int __init inotify_kernel_setup(void)
+{
+	inotify_inode_mark_cachep = kmem_cache_create("inotify_mark_entry",
+					sizeof(struct inotify_inode_mark_entry),
+					0, SLAB_PANIC, NULL);
+	event_priv_cachep = kmem_cache_create("inotify_event_priv_cache",
+					sizeof(struct inotify_event_private_data),
+					0, SLAB_PANIC, NULL);
+	inotify_ignored_event = fsnotify_create_event(NULL, FS_IN_IGNORED, NULL, FSNOTIFY_EVENT_INODE, NULL, 0);
+	if (!inotify_ignored_event)
+		panic("unable to allocate the inotify ignored event\n");
+	return 0;
+}
+subsys_initcall(inotify_kernel_setup);
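The tail of inotify_update_watch() above decides when the cached inode and group masks must be recomputed. With IN_MASK_ADD the new bits are OR'd into the mark's mask; otherwise the mask is replaced. A recalculation is needed only if the mask changed and either some bits were dropped, or the new mask has bits the inode's (or group's) mask does not yet cover. The sketch below isolates that arithmetic in userspace. struct recalc, apply_mask() and need_recalc() are hypothetical names, and the IN_* constants from <sys/inotify.h> stand in for the kernel-side mask bits.

```c
#include <assert.h>
#include <stdint.h>
#include <sys/inotify.h>

/* Which recalculations inotify_update_watch() would trigger. */
struct recalc {
	int inode;   /* fsnotify_recalc_inode_mask() needed */
	int group;   /* fsnotify_recalc_group_mask() needed */
};

/* IN_MASK_ADD semantics: OR into the existing mask, else replace it. */
static uint32_t apply_mask(uint32_t old_mask, uint32_t mask, int add)
{
	return add ? (old_mask | mask) : mask;
}

static struct recalc need_recalc(uint32_t old_mask, uint32_t new_mask,
				 uint32_t inode_mask, uint32_t group_mask)
{
	struct recalc r = { 0, 0 };

	if (old_mask != new_mask) {
		uint32_t dropped  = old_mask & ~new_mask;   /* bits removed */
		uint32_t do_inode = new_mask & ~inode_mask; /* bits inode lacks */
		uint32_t do_group = new_mask & ~group_mask; /* bits group lacks */

		r.inode = (dropped || do_inode);
		r.group = (dropped || do_group);
	}
	return r;
}
```

For example, adding IN_ACCESS to a watch that only had IN_MODIFY, on an inode whose mask is just IN_MODIFY but in a group that already covers both bits, requires an inode recalculation but not a group one.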
diff --git a/fs/notify/inotify/inotify_user.c b/fs/notify/inotify/inotify_user.c
index bed766e..906c03f 100644
--- a/fs/notify/inotify/inotify_user.c
+++ b/fs/notify/inotify/inotify_user.c
@@ -8,6 +8,9 @@
  * Copyright (C) 2005 John McCutchan
  * Copyright 2006 Hewlett-Packard Development Company, L.P.
  *
+ * Copyright (C) 2009 Eric Paris <Red Hat Inc>
+ * inotify was largely rewritten to make use of the fsnotify infrastructure
+ *
  * This program is free software; you can redistribute it and/or modify it
  * under the terms of the GNU General Public License as published by the
  * Free Software Foundation; either version 2, or (at your option) any
@@ -24,89 +27,32 @@
 #include <linux/slab.h>
 #include <linux/fs.h>
 #include <linux/file.h>
+#include <linux/limits.h>
+#include <linux/module.h>
 #include <linux/mount.h>
 #include <linux/namei.h>
 #include <linux/poll.h>
 #include <linux/init.h>
-#include <linux/list.h>
 #include <linux/inotify.h>
+#include <linux/list.h>
 #include <linux/syscalls.h>
+#include <linux/string.h>
 #include <linux/magic.h>
+#include <linux/writeback.h>
 
-#include <asm/ioctls.h>
+#include "inotify.h"
 
-static struct kmem_cache *watch_cachep __read_mostly;
-static struct kmem_cache *event_cachep __read_mostly;
+#include <asm/ioctls.h>
 
 static struct vfsmount *inotify_mnt __read_mostly;
 
+/* this just sits here and wastes global memory.  used to just pad userspace messages with zeros */
+static struct inotify_event nul_inotify_event;
+
 /* these are configurable via /proc/sys/fs/inotify/ */
 static int inotify_max_user_instances __read_mostly;
-static int inotify_max_user_watches __read_mostly;
 static int inotify_max_queued_events __read_mostly;
-
-/*
- * Lock ordering:
- *
- * inotify_dev->up_mutex (ensures we don't re-add the same watch)
- * 	inode->inotify_mutex (protects inode's watch list)
- * 		inotify_handle->mutex (protects inotify_handle's watch list)
- * 			inotify_dev->ev_mutex (protects device's event queue)
- */
-
-/*
- * Lifetimes of the main data structures:
- *
- * inotify_device: Lifetime is managed by reference count, from
- * sys_inotify_init() until release.  Additional references can bump the count
- * via get_inotify_dev() and drop the count via put_inotify_dev().
- *
- * inotify_user_watch: Lifetime is from create_watch() to the receipt of an
- * IN_IGNORED event from inotify, or when using IN_ONESHOT, to receipt of the
- * first event, or to inotify_destroy().
- */
-
-/*
- * struct inotify_device - represents an inotify instance
- *
- * This structure is protected by the mutex 'mutex'.
- */
-struct inotify_device {
-	wait_queue_head_t 	wq;		/* wait queue for i/o */
-	struct mutex		ev_mutex;	/* protects event queue */
-	struct mutex		up_mutex;	/* synchronizes watch updates */
-	struct list_head 	events;		/* list of queued events */
-	struct user_struct	*user;		/* user who opened this dev */
-	struct inotify_handle	*ih;		/* inotify handle */
-	struct fasync_struct    *fa;            /* async notification */
-	atomic_t		count;		/* reference count */
-	unsigned int		queue_size;	/* size of the queue (bytes) */
-	unsigned int		event_count;	/* number of pending events */
-	unsigned int		max_events;	/* maximum number of events */
-};
-
-/*
- * struct inotify_kernel_event - An inotify event, originating from a watch and
- * queued for user-space.  A list of these is attached to each instance of the
- * device.  In read(), this list is walked and all events that can fit in the
- * buffer are returned.
- *
- * Protected by dev->ev_mutex of the device in which we are queued.
- */
-struct inotify_kernel_event {
-	struct inotify_event	event;	/* the user-space event */
-	struct list_head        list;	/* entry in inotify_device's list */
-	char			*name;	/* filename, if any */
-};
-
-/*
- * struct inotify_user_watch - our version of an inotify_watch, we add
- * a reference to the associated inotify_device.
- */
-struct inotify_user_watch {
-	struct inotify_device	*dev;	/* associated device */
-	struct inotify_watch	wdata;	/* inotify watch data */
-};
+int inotify_max_user_watches __read_mostly;
 
 #ifdef CONFIG_SYSCTL
 
@@ -149,280 +95,17 @@ ctl_table inotify_table[] = {
 };
 #endif /* CONFIG_SYSCTL */
 
-static inline void get_inotify_dev(struct inotify_device *dev)
-{
-	atomic_inc(&dev->count);
-}
-
-static inline void put_inotify_dev(struct inotify_device *dev)
-{
-	if (atomic_dec_and_test(&dev->count)) {
-		atomic_dec(&dev->user->inotify_devs);
-		free_uid(dev->user);
-		kfree(dev);
-	}
-}
-
-/*
- * free_inotify_user_watch - cleans up the watch and its references
- */
-static void free_inotify_user_watch(struct inotify_watch *w)
-{
-	struct inotify_user_watch *watch;
-	struct inotify_device *dev;
-
-	watch = container_of(w, struct inotify_user_watch, wdata);
-	dev = watch->dev;
-
-	atomic_dec(&dev->user->inotify_watches);
-	put_inotify_dev(dev);
-	kmem_cache_free(watch_cachep, watch);
-}
-
-/*
- * kernel_event - create a new kernel event with the given parameters
- *
- * This function can sleep.
- */
-static struct inotify_kernel_event * kernel_event(s32 wd, u32 mask, u32 cookie,
-						  const char *name)
-{
-	struct inotify_kernel_event *kevent;
-
-	kevent = kmem_cache_alloc(event_cachep, GFP_NOFS);
-	if (unlikely(!kevent))
-		return NULL;
-
-	/* we hand this out to user-space, so zero it just in case */
-	memset(&kevent->event, 0, sizeof(struct inotify_event));
-
-	kevent->event.wd = wd;
-	kevent->event.mask = mask;
-	kevent->event.cookie = cookie;
-
-	INIT_LIST_HEAD(&kevent->list);
-
-	if (name) {
-		size_t len, rem, event_size = sizeof(struct inotify_event);
-
-		/*
-		 * We need to pad the filename so as to properly align an
-		 * array of inotify_event structures.  Because the structure is
-		 * small and the common case is a small filename, we just round
-		 * up to the next multiple of the structure's sizeof.  This is
-		 * simple and safe for all architectures.
-		 */
-		len = strlen(name) + 1;
-		rem = event_size - len;
-		if (len > event_size) {
-			rem = event_size - (len % event_size);
-			if (len % event_size == 0)
-				rem = 0;
-		}
-
-		kevent->name = kmalloc(len + rem, GFP_KERNEL);
-		if (unlikely(!kevent->name)) {
-			kmem_cache_free(event_cachep, kevent);
-			return NULL;
-		}
-		memcpy(kevent->name, name, len);
-		if (rem)
-			memset(kevent->name + len, 0, rem);
-		kevent->event.len = len + rem;
-	} else {
-		kevent->event.len = 0;
-		kevent->name = NULL;
-	}
-
-	return kevent;
-}
-
-/*
- * inotify_dev_get_event - return the next event in the given dev's queue
- *
- * Caller must hold dev->ev_mutex.
- */
-static inline struct inotify_kernel_event *
-inotify_dev_get_event(struct inotify_device *dev)
-{
-	return list_entry(dev->events.next, struct inotify_kernel_event, list);
-}
-
-/*
- * inotify_dev_get_last_event - return the last event in the given dev's queue
- *
- * Caller must hold dev->ev_mutex.
- */
-static inline struct inotify_kernel_event *
-inotify_dev_get_last_event(struct inotify_device *dev)
-{
-	if (list_empty(&dev->events))
-		return NULL;
-	return list_entry(dev->events.prev, struct inotify_kernel_event, list);
-}
-
-/*
- * inotify_dev_queue_event - event handler registered with core inotify, adds
- * a new event to the given device
- *
- * Can sleep (calls kernel_event()).
- */
-static void inotify_dev_queue_event(struct inotify_watch *w, u32 wd, u32 mask,
-				    u32 cookie, const char *name,
-				    struct inode *ignored)
-{
-	struct inotify_user_watch *watch;
-	struct inotify_device *dev;
-	struct inotify_kernel_event *kevent, *last;
-
-	watch = container_of(w, struct inotify_user_watch, wdata);
-	dev = watch->dev;
-
-	mutex_lock(&dev->ev_mutex);
-
-	/* we can safely put the watch as we don't reference it while
-	 * generating the event
-	 */
-	if (mask & IN_IGNORED || w->mask & IN_ONESHOT)
-		put_inotify_watch(w); /* final put */
-
-	/* coalescing: drop this event if it is a dupe of the previous */
-	last = inotify_dev_get_last_event(dev);
-	if (last && last->event.mask == mask && last->event.wd == wd &&
-			last->event.cookie == cookie) {
-		const char *lastname = last->name;
-
-		if (!name && !lastname)
-			goto out;
-		if (name && lastname && !strcmp(lastname, name))
-			goto out;
-	}
-
-	/* the queue overflowed and we already sent the Q_OVERFLOW event */
-	if (unlikely(dev->event_count > dev->max_events))
-		goto out;
-
-	/* if the queue overflows, we need to notify user space */
-	if (unlikely(dev->event_count == dev->max_events))
-		kevent = kernel_event(-1, IN_Q_OVERFLOW, cookie, NULL);
-	else
-		kevent = kernel_event(wd, mask, cookie, name);
-
-	if (unlikely(!kevent))
-		goto out;
-
-	/* queue the event and wake up anyone waiting */
-	dev->event_count++;
-	dev->queue_size += sizeof(struct inotify_event) + kevent->event.len;
-	list_add_tail(&kevent->list, &dev->events);
-	wake_up_interruptible(&dev->wq);
-	kill_fasync(&dev->fa, SIGIO, POLL_IN);
-
-out:
-	mutex_unlock(&dev->ev_mutex);
-}
-
-/*
- * remove_kevent - cleans up the given kevent
- *
- * Caller must hold dev->ev_mutex.
- */
-static void remove_kevent(struct inotify_device *dev,
-			  struct inotify_kernel_event *kevent)
-{
-	list_del(&kevent->list);
-
-	dev->event_count--;
-	dev->queue_size -= sizeof(struct inotify_event) + kevent->event.len;
-}
-
-/*
- * free_kevent - frees the given kevent.
- */
-static void free_kevent(struct inotify_kernel_event *kevent)
-{
-	kfree(kevent->name);
-	kmem_cache_free(event_cachep, kevent);
-}
-
-/*
- * inotify_dev_event_dequeue - destroy an event on the given device
- *
- * Caller must hold dev->ev_mutex.
- */
-static void inotify_dev_event_dequeue(struct inotify_device *dev)
-{
-	if (!list_empty(&dev->events)) {
-		struct inotify_kernel_event *kevent;
-		kevent = inotify_dev_get_event(dev);
-		remove_kevent(dev, kevent);
-		free_kevent(kevent);
-	}
-}
-
-/*
- * find_inode - resolve a user-given path to a specific inode
- */
-static int find_inode(const char __user *dirname, struct path *path,
-		      unsigned flags)
-{
-	int error;
-
-	error = user_path_at(AT_FDCWD, dirname, flags, path);
-	if (error)
-		return error;
-	/* you can only watch an inode if you have read permissions on it */
-	error = inode_permission(path->dentry->d_inode, MAY_READ);
-	if (error)
-		path_put(path);
-	return error;
-}
-
-/*
- * create_watch - creates a watch on the given device.
- *
- * Callers must hold dev->up_mutex.
- */
-static int create_watch(struct inotify_device *dev, struct inode *inode,
-			u32 mask)
-{
-	struct inotify_user_watch *watch;
-	int ret;
-
-	if (atomic_read(&dev->user->inotify_watches) >=
-			inotify_max_user_watches)
-		return -ENOSPC;
-
-	watch = kmem_cache_alloc(watch_cachep, GFP_KERNEL);
-	if (unlikely(!watch))
-		return -ENOMEM;
-
-	/* save a reference to device and bump the count to make it official */
-	get_inotify_dev(dev);
-	watch->dev = dev;
-
-	atomic_inc(&dev->user->inotify_watches);
-
-	inotify_init_watch(&watch->wdata);
-	ret = inotify_add_watch(dev->ih, &watch->wdata, inode, mask);
-	if (ret < 0)
-		free_inotify_user_watch(&watch->wdata);
-
-	return ret;
-}
-
-/* Device Interface */
-
+/* inotify userspace file descriptor functions */
 static unsigned int inotify_poll(struct file *file, poll_table *wait)
 {
-	struct inotify_device *dev = file->private_data;
+	struct fsnotify_group *group = file->private_data;
 	int ret = 0;
 
-	poll_wait(file, &dev->wq, wait);
-	mutex_lock(&dev->ev_mutex);
-	if (!list_empty(&dev->events))
+	poll_wait(file, &group->notification_waitq, wait);
+	mutex_lock(&group->notification_mutex);
+	if (fsnotify_check_notif_queue(group))
 		ret = POLLIN | POLLRDNORM;
-	mutex_unlock(&dev->ev_mutex);
+	mutex_unlock(&group->notification_mutex);
 
 	return ret;
 }
@@ -432,26 +115,29 @@ static unsigned int inotify_poll(struct file *file, poll_table *wait)
  * enough to fit in "count". Return an error pointer if
  * not large enough.
  *
- * Called with the device ev_mutex held.
+ * Called with the group->notification_mutex held.
  */
-static struct inotify_kernel_event *get_one_event(struct inotify_device *dev,
-						  size_t count)
+static struct fsnotify_event *get_one_event(struct fsnotify_group *group,
+					    size_t count)
 {
 	size_t event_size = sizeof(struct inotify_event);
-	struct inotify_kernel_event *kevent;
+	struct fsnotify_event *event;
 
-	if (list_empty(&dev->events))
+	if (!fsnotify_check_notif_queue(group))
 		return NULL;
 
-	kevent = inotify_dev_get_event(dev);
-	if (kevent->name)
-		event_size += kevent->event.len;
+	event = fsnotify_peek_notif_event(group);
+
+	event_size += roundup(event->name_len, event_size);
 
 	if (event_size > count)
 		return ERR_PTR(-EINVAL);
 
-	remove_kevent(dev, kevent);
-	return kevent;
+	/* held the notification_mutex the whole time, so this is the
+	 * same event we peeked above */
+	fsnotify_remove_notif_event(group);
+
+	return event;
 }
 
 /*
@@ -460,51 +146,82 @@ static struct inotify_kernel_event *get_one_event(struct inotify_device *dev,
  * We already checked that the event size is smaller than the
  * buffer we had in "get_one_event()" above.
  */
-static ssize_t copy_event_to_user(struct inotify_kernel_event *kevent,
+static ssize_t copy_event_to_user(struct fsnotify_group *group,
+				  struct fsnotify_event *event,
 				  char __user *buf)
 {
+	struct inotify_event inotify_event;
+	struct inotify_event_private_data *priv;
 	size_t event_size = sizeof(struct inotify_event);
+	size_t name_len;
+
+	/* we get the inotify watch descriptor from the event private data */
+	spin_lock(&event->lock);
+	priv = (struct inotify_event_private_data *)fsnotify_get_priv_from_event(group, event);
+	inotify_event.wd = priv->wd;
+	__inotify_free_event_priv(priv);
+	spin_unlock(&event->lock);
+
+	/* round up event->name_len so it is a multiple of event_size */
+	name_len = roundup(event->name_len, event_size);
+	inotify_event.len = name_len;
 
-	if (copy_to_user(buf, &kevent->event, event_size))
+	inotify_event.mask = inotify_mask_to_arg(event->mask);
+	inotify_event.cookie = event->sync_cookie;
+
+	/* send the main event */
+	if (copy_to_user(buf, &inotify_event, event_size))
 		return -EFAULT;
 
-	if (kevent->name) {
-		buf += event_size;
+	buf += event_size;
 
-		if (copy_to_user(buf, kevent->name, kevent->event.len))
+	/*
+	 * fsnotify only stores the pathname, so here we have to send the pathname
+	 * and then pad that pathname out to a multiple of sizeof(inotify_event)
+	 * with zeros.  I get my zeros from the nul_inotify_event.
+	 */
+	if (name_len) {
+		unsigned int len_to_zero = name_len - event->name_len;
+		/* copy the path name */
+		if (copy_to_user(buf, event->file_name, event->name_len))
 			return -EFAULT;
+		buf += event->name_len;
 
-		event_size += kevent->event.len;
+		/* fill userspace with 0's from nul_inotify_event */
+		if (copy_to_user(buf, &nul_inotify_event, len_to_zero))
+			return -EFAULT;
+		buf += len_to_zero;
+		event_size += name_len;
 	}
+
 	return event_size;
 }
 
 static ssize_t inotify_read(struct file *file, char __user *buf,
 			    size_t count, loff_t *pos)
 {
-	struct inotify_device *dev;
+	struct fsnotify_group *group;
+	struct fsnotify_event *kevent;
 	char __user *start;
 	int ret;
 	DEFINE_WAIT(wait);
 
 	start = buf;
-	dev = file->private_data;
+	group = file->private_data;
 
 	while (1) {
-		struct inotify_kernel_event *kevent;
+		prepare_to_wait(&group->notification_waitq, &wait, TASK_INTERRUPTIBLE);
 
-		prepare_to_wait(&dev->wq, &wait, TASK_INTERRUPTIBLE);
-
-		mutex_lock(&dev->ev_mutex);
-		kevent = get_one_event(dev, count);
-		mutex_unlock(&dev->ev_mutex);
+		mutex_lock(&group->notification_mutex);
+		kevent = get_one_event(group, count);
+		mutex_unlock(&group->notification_mutex);
 
 		if (kevent) {
 			ret = PTR_ERR(kevent);
 			if (IS_ERR(kevent))
 				break;
-			ret = copy_event_to_user(kevent, buf);
-			free_kevent(kevent);
+			ret = copy_event_to_user(group, kevent, buf);
+			fsnotify_put_event(kevent);
 			if (ret < 0)
 				break;
 			buf += ret;
@@ -525,7 +242,7 @@ static ssize_t inotify_read(struct file *file, char __user *buf,
 		schedule();
 	}
 
-	finish_wait(&dev->wq, &wait);
+	finish_wait(&group->notification_waitq, &wait);
 	if (start != buf && ret != -EFAULT)
 		ret = buf - start;
 	return ret;
@@ -533,25 +250,19 @@ static ssize_t inotify_read(struct file *file, char __user *buf,
 
 static int inotify_fasync(int fd, struct file *file, int on)
 {
-	struct inotify_device *dev = file->private_data;
+	struct fsnotify_group *group = file->private_data;
 
-	return fasync_helper(fd, file, on, &dev->fa) >= 0 ? 0 : -EIO;
+	return fasync_helper(fd, file, on, &group->inotify_data.fa) >= 0 ? 0 : -EIO;
 }
 
 static int inotify_release(struct inode *ignored, struct file *file)
 {
-	struct inotify_device *dev = file->private_data;
-
-	inotify_destroy(dev->ih);
+	struct fsnotify_group *group = file->private_data;
 
-	/* destroy all of the events on this device */
-	mutex_lock(&dev->ev_mutex);
-	while (!list_empty(&dev->events))
-		inotify_dev_event_dequeue(dev);
-	mutex_unlock(&dev->ev_mutex);
+	fsnotify_clear_marks_by_group(group);
 
-	/* free this device: the put matching the get in inotify_init() */
-	put_inotify_dev(dev);
+	/* free this group, matching get was inotify_init->fsnotify_obtain_group */
+	fsnotify_put_group(group);
 
 	return 0;
 }
@@ -559,16 +270,25 @@ static int inotify_release(struct inode *ignored, struct file *file)
 static long inotify_ioctl(struct file *file, unsigned int cmd,
 			  unsigned long arg)
 {
-	struct inotify_device *dev;
+	struct fsnotify_group *group;
+	struct fsnotify_event_holder *holder;
+	struct fsnotify_event *event;
 	void __user *p;
 	int ret = -ENOTTY;
+	size_t send_len = 0;
 
-	dev = file->private_data;
+	group = file->private_data;
 	p = (void __user *) arg;
 
 	switch (cmd) {
 	case FIONREAD:
-		ret = put_user(dev->queue_size, (int __user *) p);
+		mutex_lock(&group->notification_mutex);
+		list_for_each_entry(holder, &group->notification_list, event_list) {
+			event = holder->event;
+			send_len += sizeof(struct inotify_event) + roundup(event->name_len, sizeof(struct inotify_event));
+		}
+		mutex_unlock(&group->notification_mutex);
+		ret = put_user(send_len, (int __user *) p);
 		break;
 	}
 
@@ -576,23 +296,18 @@ static long inotify_ioctl(struct file *file, unsigned int cmd,
 }
 
 static const struct file_operations inotify_fops = {
-	.poll           = inotify_poll,
-	.read           = inotify_read,
-	.fasync         = inotify_fasync,
-	.release        = inotify_release,
-	.unlocked_ioctl = inotify_ioctl,
+	.poll		= inotify_poll,
+	.read		= inotify_read,
+	.fasync		= inotify_fasync,
+	.release	= inotify_release,
+	.unlocked_ioctl	= inotify_ioctl,
 	.compat_ioctl	= inotify_ioctl,
 };
 
-static const struct inotify_operations inotify_user_ops = {
-	.handle_event	= inotify_dev_queue_event,
-	.destroy_watch	= free_inotify_user_watch,
-};
-
+/* inotify syscalls */
 SYSCALL_DEFINE1(inotify_init1, int, flags)
 {
-	struct inotify_device *dev;
-	struct inotify_handle *ih;
+	struct fsnotify_group *group;
 	struct user_struct *user;
 	struct file *filp;
 	int fd, ret;
@@ -621,45 +336,27 @@ SYSCALL_DEFINE1(inotify_init1, int, flags)
 		goto out_free_uid;
 	}
 
-	dev = kmalloc(sizeof(struct inotify_device), GFP_KERNEL);
-	if (unlikely(!dev)) {
-		ret = -ENOMEM;
+	/* fsnotify_obtain_group took a reference to group, we put this when we kill the file in the end */
+	group = inotify_new_group(user, inotify_max_queued_events);
+	if (IS_ERR(group)) {
+		ret = PTR_ERR(group);
 		goto out_free_uid;
 	}
 
-	ih = inotify_init(&inotify_user_ops);
-	if (IS_ERR(ih)) {
-		ret = PTR_ERR(ih);
-		goto out_free_dev;
-	}
-	dev->ih = ih;
-	dev->fa = NULL;
-
 	filp->f_op = &inotify_fops;
 	filp->f_path.mnt = mntget(inotify_mnt);
 	filp->f_path.dentry = dget(inotify_mnt->mnt_root);
 	filp->f_mapping = filp->f_path.dentry->d_inode->i_mapping;
 	filp->f_mode = FMODE_READ;
 	filp->f_flags = O_RDONLY | (flags & O_NONBLOCK);
-	filp->private_data = dev;
-
-	INIT_LIST_HEAD(&dev->events);
-	init_waitqueue_head(&dev->wq);
-	mutex_init(&dev->ev_mutex);
-	mutex_init(&dev->up_mutex);
-	dev->event_count = 0;
-	dev->queue_size = 0;
-	dev->max_events = inotify_max_queued_events;
-	dev->user = user;
-	atomic_set(&dev->count, 0);
-
-	get_inotify_dev(dev);
+	filp->private_data = group;
+
 	atomic_inc(&user->inotify_devs);
+
 	fd_install(fd, filp);
 
 	return fd;
-out_free_dev:
-	kfree(dev);
+
 out_free_uid:
 	free_uid(user);
 	put_filp(filp);
@@ -676,8 +373,8 @@ SYSCALL_DEFINE0(inotify_init)
 SYSCALL_DEFINE3(inotify_add_watch, int, fd, const char __user *, pathname,
 		u32, mask)
 {
+	struct fsnotify_group *group;
 	struct inode *inode;
-	struct inotify_device *dev;
 	struct path path;
 	struct file *filp;
 	int ret, fput_needed;
@@ -699,19 +396,19 @@ SYSCALL_DEFINE3(inotify_add_watch, int, fd, const char __user *, pathname,
 		flags |= LOOKUP_DIRECTORY;
 
 	ret = find_inode(pathname, &path, flags);
-	if (unlikely(ret))
+	if (ret)
 		goto fput_and_out;
 
-	/* inode held in place by reference to path; dev by fget on fd */
+	/* inode held in place by reference to path; group by fget on fd */
 	inode = path.dentry->d_inode;
-	dev = filp->private_data;
+	group = filp->private_data;
 
-	mutex_lock(&dev->up_mutex);
-	ret = inotify_find_update_watch(dev->ih, inode, mask);
-	if (ret == -ENOENT)
-		ret = create_watch(dev, inode, mask);
-	mutex_unlock(&dev->up_mutex);
+	/* create/update an inode mark */
+	ret = inotify_update_watch(group, inode, mask);
+	if (unlikely(ret))
+		goto path_put_and_out;
 
+path_put_and_out:
 	path_put(&path);
 fput_and_out:
 	fput_light(filp, fput_needed);
@@ -720,9 +417,10 @@ fput_and_out:
 
 SYSCALL_DEFINE2(inotify_rm_watch, int, fd, __s32, wd)
 {
+	struct fsnotify_group *group;
+	struct fsnotify_mark_entry *entry;
 	struct file *filp;
-	struct inotify_device *dev;
-	int ret, fput_needed;
+	int ret = 0, fput_needed;
 
 	filp = fget_light(fd, &fput_needed);
 	if (unlikely(!filp))
@@ -734,10 +432,20 @@ SYSCALL_DEFINE2(inotify_rm_watch, int, fd, __s32, wd)
 		goto out;
 	}
 
-	dev = filp->private_data;
+	group = filp->private_data;
 
-	/* we free our watch data when we get IN_IGNORED */
-	ret = inotify_rm_wd(dev->ih, wd);
+	spin_lock(&group->inotify_data.idr_lock);
+	entry = idr_find(&group->inotify_data.idr, wd);
+	if (unlikely(!entry)) {
+		spin_unlock(&group->inotify_data.idr_lock);
+		ret = -EINVAL;
+		goto out;
+	}
+	fsnotify_get_mark(entry);
+	spin_unlock(&group->inotify_data.idr_lock);
+
+	inotify_destroy_mark_entry(entry, group);
+	fsnotify_put_mark(entry);
 
 out:
 	fput_light(filp, fput_needed);
@@ -753,9 +461,9 @@ inotify_get_sb(struct file_system_type *fs_type, int flags,
 }
 
 static struct file_system_type inotify_fs_type = {
-    .name           = "inotifyfs",
-    .get_sb         = inotify_get_sb,
-    .kill_sb        = kill_anon_super,
+    .name	= "inotifyfs",
+    .get_sb	= inotify_get_sb,
+    .kill_sb	= kill_anon_super,
 };
 
 /*
@@ -779,14 +487,6 @@ static int __init inotify_user_setup(void)
 	inotify_max_user_instances = 128;
 	inotify_max_user_watches = 8192;
 
-	watch_cachep = kmem_cache_create("inotify_watch_cache",
-					 sizeof(struct inotify_user_watch),
-					 0, SLAB_PANIC, NULL);
-	event_cachep = kmem_cache_create("inotify_event_cache",
-					 sizeof(struct inotify_kernel_event),
-					 0, SLAB_PANIC, NULL);
-
 	return 0;
 }
-
 module_init(inotify_user_setup);
diff --git a/include/linux/fsnotify_backend.h b/include/linux/fsnotify_backend.h
index f2daf59..2c42691 100644
--- a/include/linux/fsnotify_backend.h
+++ b/include/linux/fsnotify_backend.h
@@ -9,6 +9,7 @@
 
 #ifdef __KERNEL__
 
+#include <linux/idr.h> /* inotify uses this */
 #include <linux/fs.h> /* struct inode */
 #include <linux/list.h>
 #include <linux/path.h> /* struct path */
@@ -58,6 +59,7 @@
 
 /* listeners that hard code group numbers near the top */
 #define DNOTIFY_GROUP_NUM	UINT_MAX
+#define INOTIFY_GROUP_NUM	(DNOTIFY_GROUP_NUM-1)
 
 struct fsnotify_group;
 struct fsnotify_event;
@@ -115,6 +117,15 @@ struct fsnotify_group {
 
 	/* groups can define private fields here */
 	union {
+#ifdef CONFIG_INOTIFY_USER
+		struct inotify_group_private_data {
+			spinlock_t	idr_lock;
+			struct idr      idr;
+			u32             last_wd;
+			struct fasync_struct    *fa;    /* async notification */
+			struct user_struct      *user;
+		} inotify_data;
+#endif
 	};
 };
 


^ permalink raw reply related	[flat|nested] 26+ messages in thread
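The new inotify_rm_watch() in the diff above follows a fixed order. It looks the mark up under idr_lock and takes a reference with fsnotify_get_mark() before dropping the lock. It then destroys the mark and drops its own reference. Below is a minimal userspace sketch of that lookup-then-pin pattern. A pthread mutex and a fixed-size array stand in for the spinlock and the idr, and every name in it (struct mark, rm_watch(), MAX_WD and so on) is hypothetical, not kernel API.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdlib.h>

#define MAX_WD 16	/* hypothetical table size standing in for the idr */

struct mark {
	atomic_int refcnt;
	int wd;
};

struct group {
	pthread_mutex_t idr_lock;	/* stands in for inotify_data.idr_lock */
	struct mark *idr[MAX_WD];	/* stands in for inotify_data.idr: wd -> mark */
};

static void get_mark(struct mark *m)
{
	atomic_fetch_add(&m->refcnt, 1);
}

static void put_mark(struct mark *m)
{
	if (atomic_fetch_sub(&m->refcnt, 1) == 1)
		free(m);
}

/* Unhook the mark from the table and drop the table's reference. */
static void destroy_mark(struct group *g, struct mark *m)
{
	int unhooked = 0;

	pthread_mutex_lock(&g->idr_lock);
	if (g->idr[m->wd] == m) {
		g->idr[m->wd] = NULL;
		unhooked = 1;
	}
	pthread_mutex_unlock(&g->idr_lock);

	if (unhooked)
		put_mark(m);	/* the table's reference */
}

static int rm_watch(struct group *g, int wd)
{
	struct mark *m;

	if (wd < 0 || wd >= MAX_WD)
		return -EINVAL;

	pthread_mutex_lock(&g->idr_lock);
	m = g->idr[wd];
	if (!m) {
		pthread_mutex_unlock(&g->idr_lock);
		return -EINVAL;
	}
	get_mark(m);	/* pin the mark so it survives dropping the lock */
	pthread_mutex_unlock(&g->idr_lock);

	destroy_mark(g, m);
	put_mark(m);	/* our lookup reference; this may free the mark */
	return 0;
}
```

Once the lookup reference is taken, the mark's memory stays valid even though the lock protecting the table is no longer held.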

* Re: [PATCH -V2 02/13] fsnotify: unified filesystem notification backend
  2009-03-27 20:05 ` [PATCH -V2 02/13] fsnotify: unified filesystem notification backend Eric Paris
@ 2009-04-07 23:05   ` Andrew Morton
  2009-04-08  0:37     ` Eric Paris
  2009-04-07 23:06   ` Andrew Morton
  1 sibling, 1 reply; 26+ messages in thread
From: Andrew Morton @ 2009-04-07 23:05 UTC (permalink / raw)
  To: Eric Paris; +Cc: linux-kernel, viro, hch, alan, sfr, john, rlove

On Fri, 27 Mar 2009 16:05:15 -0400
Eric Paris <eparis@redhat.com> wrote:

> fsnotify paves the way for fanotify.

It'd kinda help if the changelog were to tell us what fanotify is.

google takes me to http://lwn.net/Articles/306804/.  It's not
immediately obvious how fanotify differs from the existing stuff
(perhaps after some enhancements), apart from having a different
userspace interface?

Generally speaking it'd be nice if we were to be given a better
understanding of where all this is leading and what we can expect to
see as a consequence of merging this patch series.  If it cleans up the
existing stuff then that's cool, and might be sufficient grounds for a
merge.  But it's a bit of a worry if it commits us to merging things
which aren't well understood yet.


>  people may not care for the original
> companies that pushed for TALPA, but fanotify was designed with flexibility in
> mind.  A first group that wants fanotify like interfaces is the readahead
> group.  So they can be profiling at boot time in order to dynamically tune
> readahead to help with boot speed.  I've got other ideas how to use fanotify
> to speed up boot when dealing with encrypted mounts, but I'm not ready to say
> it yet since I don't know if my idea will work.

TALPA is virus scanning.  But that didn't make it onto your list of
potential applications of fanotify?

I'm inclined to merge this patch series if only to get us an
inotify/dnotify maintainer ;)


General comment on the patches: complex.  I found them depressingly
hard to understand (and hence review) - the lack of high-level
commentary in the code is pretty severe.  There's a nice-looking design
document there, but like everyone else, I didn't look at it much ;)
It's not really a successful substitute for carefully-chosen comments
at the appropriate codesites and data structures.


* Re: [PATCH -V2 01/13] mutex: add atomic_dec_and_mutex_lock
  2009-03-27 20:05 [PATCH -V2 01/13] mutex: add atomic_dec_and_mutex_lock Eric Paris
                   ` (11 preceding siblings ...)
  2009-03-27 20:06 ` [PATCH -V2 13/13] inotify: reimplement inotify using fsnotify Eric Paris
@ 2009-04-07 23:06 ` Andrew Morton
  2009-04-28 20:08   ` Andrew Morton
  12 siblings, 1 reply; 26+ messages in thread
From: Andrew Morton @ 2009-04-07 23:06 UTC (permalink / raw)
  To: Eric Paris; +Cc: linux-kernel, viro, hch, alan, sfr, john, rlove

On Fri, 27 Mar 2009 16:05:08 -0400
Eric Paris <eparis@redhat.com> wrote:

> Much like the atomic_dec_and_lock() function in which we take and hold a
> spin_lock if we drop the atomic to 0 this function takes and holds the
> mutex if we dec the atomic to 0.
> 
> Signed-off-by: Eric Paris <eparis@redhat.com>
> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
> Cc: Paul Mackerras <paulus@samba.org>
> LKML-Reference: <20090323172417.410913479@chello.nl>
> Signed-off-by: Ingo Molnar <mingo@elte.hu>
> ---
> 
>  include/linux/mutex.h |   23 +++++++++++++++++++++++
>  1 files changed, 23 insertions(+), 0 deletions(-)
> 
> diff --git a/include/linux/mutex.h b/include/linux/mutex.h
> index 3069ec7..93054fc 100644
> --- a/include/linux/mutex.h
> +++ b/include/linux/mutex.h
> @@ -151,4 +151,27 @@ extern int __must_check mutex_lock_killable(struct mutex *lock);
>  extern int mutex_trylock(struct mutex *lock);
>  extern void mutex_unlock(struct mutex *lock);
>  
> +/**
> + * atomic_dec_and_mutex_lock - return holding mutex if we dec to 0
> + * @cnt: the atomic which we are to dec
> + * @lock: the mutex to return holding if we dec to 0
> + *
> + * return true and hold lock if we dec to 0, return false otherwise
> + */
> +static inline int atomic_dec_and_mutex_lock(atomic_t *cnt, struct mutex *lock)
> +{
> +	/* dec if we can't possibly hit 0 */
> +	if (atomic_add_unless(cnt, -1, 1))
> +		return 0;
> +	/* we might hit 0, so take the lock */
> +	mutex_lock(lock);
> +	if (!atomic_dec_and_test(cnt)) {
> +		/* when we actually did the dec, we didn't hit 0 */
> +		mutex_unlock(lock);
> +		return 0;
> +	}
> +	/* we hit 0, and we hold the lock */
> +	return 1;
> +}
> +

This looks too large to be inlined?
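Whether or not it stays inline, the logic of atomic_dec_and_mutex_lock() can be exercised outside the kernel. Below is a minimal userspace sketch, written as an ordinary non-inline static function. C11 <stdatomic.h> and POSIX threads stand in for atomic_t and struct mutex. The names add_unless() and dec_and_mutex_lock() are hypothetical, and add_unless() only mimics the kernel's atomic_add_unless().

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for atomic_add_unless(): add a to *v unless *v == u.
 * Returns true if the add was performed. */
static bool add_unless(atomic_int *v, int a, int u)
{
	int c = atomic_load(v);

	while (c != u) {
		/* On failure, c is refreshed with the current value and we retry. */
		if (atomic_compare_exchange_weak(v, &c, c + a))
			return true;
	}
	return false;
}

/* Returns true with *lock held only if this call took *cnt to 0. */
static bool dec_and_mutex_lock(atomic_int *cnt, pthread_mutex_t *lock)
{
	/* Fast path: the count is not 1, so this decrement cannot reach 0. */
	if (add_unless(cnt, -1, 1))
		return false;

	/* We might hit 0, so take the lock before decrementing. */
	pthread_mutex_lock(lock);
	if (atomic_fetch_sub(cnt, 1) != 1) {
		/* The count was raised after our check; we did not hit 0. */
		pthread_mutex_unlock(lock);
		return false;
	}
	/* We hit 0 and we hold the lock; the caller must unlock it. */
	return true;
}
```

The fast path never touches the mutex. Only a caller that might be dropping the last reference pays for the lock.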


* Re: [PATCH -V2 02/13] fsnotify: unified filesystem notification backend
  2009-03-27 20:05 ` [PATCH -V2 02/13] fsnotify: unified filesystem notification backend Eric Paris
  2009-04-07 23:05   ` Andrew Morton
@ 2009-04-07 23:06   ` Andrew Morton
  1 sibling, 0 replies; 26+ messages in thread
From: Andrew Morton @ 2009-04-07 23:06 UTC (permalink / raw)
  To: Eric Paris; +Cc: linux-kernel, viro, hch, alan, sfr, john, rlove

On Fri, 27 Mar 2009 16:05:15 -0400
Eric Paris <eparis@redhat.com> wrote:

> fsnotify is a backend for filesystem notification.  fsnotify does
> not provide any userspace interface but does provide the basis
> needed for other notification schemes such as dnotify.  fsnotify
> can be extended to be the backend for inotify or the upcoming
> fanotify.  fsnotify provides a mechanism for "groups" to register for
> some set of filesystem events and to then deliver those events to
> those groups for processing.
> 
> fsnotify has a number of benefits, the first being actually shrinking the size
> of an inode.  Before fsnotify to support both dnotify and inotify an inode had
> 
>         unsigned long           i_dnotify_mask; /* Directory notify events */
>         struct dnotify_struct   *i_dnotify; /* for directory notifications */
>         struct list_head        inotify_watches; /* watches on this inode */
>         struct mutex            inotify_mutex;  /* protects the watches list
> 
> But with fsnotify this same functionality (and more) is done with just
> 
>         __u32                   i_fsnotify_mask; /* all events for this inode */
>         struct hlist_head       i_fsnotify_mark_entries; /* marks on this inode */
> 
> That's right, inotify, dnotify, and fanotify all in 64 bits.  We used that
> much space just in inotify_watches alone, before this patch set.
> 
> fsnotify object lifetime and locking is MUCH better than what we have today.
> inotify locking is incredibly complex.  See 8f7b0ba1c8539 as an example of
> what's been busted since inception.  inotify needs to know internal semantics
> of superblock destruction and unmounting to function.  The inode pinning and
> vfs contortions are horrible.
> 
> no fsnotify implementers do allocation under locks.  This means things like
> f04b30de3 which (due to an overabundance of caution) changes GFP_KERNEL to
> GFP_NOFS can be reverted.  There are no longer any allocation rules when using
> or implementing your own fsnotify listener.
> 
> fsnotify paves the way for fanotify.  people may not care for the original
> companies that pushed for TALPA, but fanotify was designed with flexibility in
> mind.  A first group that wants fanotify like interfaces is the readahead
> group.  So they can be profiling at boot time in order to dynamically tune
> readahead to help with boot speed.  I've got other ideas how to use fanotify
> to speed up boot when dealing with encrypted mounts, but I'm not ready to say
> it yet since I don't know if my idea will work.
> 
> This patch series just builds fsnotify to the point that it can implement
> dnotify and inotify_user.  Patches exist and will be sent soon after
> acceptance to finish the in kernel inotify conversion (audit) and implement
> fanotify.
> 
>
> ...
>
> +void fsnotify(struct inode *to_tell, __u32 mask, void *data, int data_is)
> +{
> +	struct fsnotify_group *group;
> +	struct fsnotify_event *event = NULL;
> +	int idx;
> +
> +	if (list_empty(&fsnotify_groups))
> +		return;
> +
> +	if (!(mask & fsnotify_mask))
> +		return;
> +
> +	/*
> +	 * SRCU!!  the groups list is very very much read only and the path is

I hinted to paulmck that he might like to review this ;)

> +	 * very hot.  The VAST majority of events are not going to need to do
> +	 * anything other than walk the list so it's crazy to pre-allocate.
> +	 */
> +	idx = srcu_read_lock(&fsnotify_grp_srcu);
> +	list_for_each_entry_rcu(group, &fsnotify_groups, group_list) {
> +		if (mask & group->mask) {
> +			if (!event) {
> +				event = fsnotify_create_event(to_tell, mask, data, data_is);
> +				/* shit, we OOM'd and now we can't tell, maybe
> +				 * someday someone else will want to do something
> +				 * here */
> +				if (!event)
> +					break;
> +			}
> +			group->ops->handle_event(group, event);
> +		}
> +	}
> +	srcu_read_unlock(&fsnotify_grp_srcu, idx);
> +	/*
> +	 * fsnotify_create_event() took a reference so the event can't be cleaned
> +	 * up while we are still trying to add it to lists, drop that one.
> +	 */
> +	if (event)
> +		fsnotify_put_event(event);
> +}
>
> ...
>
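The fsnotify() loop quoted above does not allocate an event until it finds the first group whose mask matches. After every group has been offered the event, it drops the reference that fsnotify_create_event() took. Below is a minimal single-threaded userspace sketch of that lazy-allocation pattern. A plain array replaces the SRCU-protected group list and a plain int serves as the refcount. Names such as dispatch() and events_allocated are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

struct event {
	int refcnt;		/* plain int: this sketch is single-threaded */
	uint32_t mask;
};

struct group {
	uint32_t mask;		/* event types this group cares about */
	int handled;		/* how many events were delivered to it */
};

static int events_allocated;	/* instrumentation for the sketch only */

static struct event *create_event(uint32_t mask)
{
	struct event *e = malloc(sizeof(*e));

	if (!e)
		return NULL;
	e->refcnt = 1;		/* the dispatcher's own reference */
	e->mask = mask;
	events_allocated++;
	return e;
}

static void put_event(struct event *e)
{
	if (e && --e->refcnt == 0)
		free(e);
}

static void dispatch(struct group *groups, size_t n, uint32_t mask)
{
	struct event *event = NULL;

	for (size_t i = 0; i < n; i++) {
		if (!(mask & groups[i].mask))
			continue;
		if (!event) {
			/* First interested group: allocate now, not before. */
			event = create_event(mask);
			if (!event)
				break;	/* out of memory: nobody gets told */
		}
		/* A real handler that queues the event would take its own reference. */
		groups[i].handled++;
	}
	put_event(event);	/* drop the dispatcher's reference (NULL is fine) */
}
```

An event that no group matches costs only the loop and never allocates anything.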
> --- /dev/null
> +++ b/fs/notify/fsnotify.h
> @@ -0,0 +1,17 @@
> +#ifndef _LINUX_FSNOTIFY_PRIVATE_H
> +#define _LINUX_FSNOTIFY_PRIVATE_H

The #define doesn't match the filename?

> +#include <linux/dcache.h>
> +#include <linux/list.h>
> +#include <linux/fs.h>
> +#include <linux/path.h>
> +#include <linux/spinlock.h>
> +
>
> ...
>
> +
> +DEFINE_MUTEX(fsnotify_grp_mutex);

This has global scope, but isn't declared in fsnotify.h

> +struct srcu_struct fsnotify_grp_srcu;
> +LIST_HEAD(fsnotify_groups);
> +__u32 fsnotify_mask;

It's nice to provide comments explaining the role of key globals such
as these.  fsnotify_mask is particularly unobvious.

> +void fsnotify_recalc_global_mask(void)
> +{
> +	struct fsnotify_group *group;
> +	__u32 mask = 0;
> +	int idx;
> +
> +	idx = srcu_read_lock(&fsnotify_grp_srcu);
> +	list_for_each_entry_rcu(group, &fsnotify_groups, group_list) {
> +		mask |= group->mask;
> +	}

unneeded braces.

> +	srcu_read_unlock(&fsnotify_grp_srcu, idx);
> +	fsnotify_mask = mask;
> +}

What does this function do?

It's hard to review code when things such as this are left unexplained.
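Going only by the quoted code, fsnotify_mask appears to be the union of every registered group's mask. fsnotify_recalc_global_mask() ORs group->mask across fsnotify_groups, and fsnotify() returns early when (mask & fsnotify_mask) is zero. As a result, an event that nobody listens for costs a single AND. Below is a minimal userspace sketch of that idea, with an array in place of the SRCU-protected list. All names in it are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct group {
	uint32_t mask;		/* event types this group cares about */
};

/* Union of every group's mask; recomputed when groups come or go. */
static uint32_t global_mask;

static void recalc_global_mask(const struct group *groups, size_t n)
{
	uint32_t mask = 0;

	for (size_t i = 0; i < n; i++)
		mask |= groups[i].mask;
	global_mask = mask;
}

/* The hot-path filter: a zero result means no group can want this event. */
static int anyone_cares(uint32_t event_mask)
{
	return (event_mask & global_mask) != 0;
}
```

This would also explain why fsnotify_obtain_group() recalculates only when the new group's mask is nonzero: an empty mask cannot change the union.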

> +static void fsnotify_add_group(struct fsnotify_group *group)
> +{
> +	list_add_rcu(&group->group_list, &fsnotify_groups);
> +	group->evicted = 0;
> +}

Locking?  Seems to require that callers hold fsnotify_grp_mutex?

> +static void fsnotify_get_group(struct fsnotify_group *group)
> +{
> +	atomic_inc(&group->refcnt);
> +}
> +
> +static void fsnotify_destroy_group(struct fsnotify_group *group)
> +{
> +	if (group->ops->free_group_priv)
> +		group->ops->free_group_priv(group);
> +
> +	kfree(group);
> +}
> +
> +static void __fsnotify_evict_group(struct fsnotify_group *group)
> +{
> +	BUG_ON(!mutex_is_locked(&fsnotify_grp_mutex));
> +
> +	if (!group->evicted)
> +		list_del_rcu(&group->group_list);
> +	group->evicted = 1;
> +}

Why is this called "evict"?  In Linux, the term "eviction" implies some
sort of involuntary asynchronous reclamation.  But afaict (and lacking
explanatory documentation) this function seems to be a plain old
freeing function.  So why is it not called fsnotify_free_group()?

This is all a bit unapproachable.

> +void fsnotify_evict_group(struct fsnotify_group *group)
> +{
> +	mutex_lock(&fsnotify_grp_mutex);
> +	__fsnotify_evict_group(group);
> +	mutex_unlock(&fsnotify_grp_mutex);
> +}
> +
> +void fsnotify_put_group(struct fsnotify_group *group)
> +{
> +	if (!atomic_dec_and_mutex_lock(&group->refcnt, &fsnotify_grp_mutex))
> +		return;
> +
> +	/* OK, now we know that there's no other users *and* we hold mutex,
> +	 * so no new references will appear */

The usual commenting style is

	/*
	 * OK, now we know that there's no other users *and* we hold mutex,
	 * so no new references will appear
	 */

> +	__fsnotify_evict_group(group);
> +
> +	/* now it's off the list, so the only thing we might care about is
> +	 * srcu acces.... */

"access"

> +	mutex_unlock(&fsnotify_grp_mutex);
> +	synchronize_srcu(&fsnotify_grp_srcu);
> +
> +	/* and now it is really dead. _Nothing_ could be seeing it */
> +	fsnotify_recalc_global_mask();
> +	fsnotify_destroy_group(group);
> +}
> +
>
> ...
>
> +/*
> + * Either finds an existing group which matches the group_num, mask, and ops or
> + * creates a new group and adds it to the global group list.  In either case we
> + * take a reference for the group returned.
> + *
> + * low use function, could be faster to check if the group is there before we do
> + * the allocation and the initialization, but this is only called when notification
> + * systems make changes, so why make it more complex?

Yup.  But that would seem to be a pretty simple change to make?

> + */
> +struct fsnotify_group *fsnotify_obtain_group(unsigned int group_num, __u32 mask,
> +					     const struct fsnotify_ops *ops)
> +{
> +	struct fsnotify_group *group, *tgroup;
> +
> +	group = kmalloc(sizeof(struct fsnotify_group), GFP_KERNEL);
> +	if (!group)
> +		return ERR_PTR(-ENOMEM);
> +
> +	atomic_set(&group->refcnt, 1);
> +
> +	group->group_num = group_num;
> +	group->mask = mask;
> +
> +	group->ops = ops;
> +
> +	mutex_lock(&fsnotify_grp_mutex);
> +	tgroup = fsnotify_find_group(group_num, mask, ops);
> +	if (tgroup) {
> +		/* group already exists */
> +		mutex_unlock(&fsnotify_grp_mutex);
> +		/* destroy the new one we made */
> +		fsnotify_put_group(group);
> +		return tgroup;
> +	}
> +
> +	/* group not found, add a new one */
> +	fsnotify_add_group(group);

This is the only fsnotify_add_group() callsite and it's just two lines. 
Open-code it here?

> +	mutex_unlock(&fsnotify_grp_mutex);
> +
> +	if (mask)
> +		fsnotify_recalc_global_mask();

I'd understand this if I knew what fsnotify_mask does :(

> +	return group;
> +}
>
> ...
>
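Andrew's suggestion above is to check for an existing group before allocating. One way to do that is sketched below in userspace. A pthread mutex and a singly linked list replace fsnotify_grp_mutex and the RCU list. The sketch matches on group_num and mask only (the real code also compares ops), and it returns NULL rather than ERR_PTR(-ENOMEM). obtain_group() and find_group_locked() are hypothetical names. The lock is dropped around the allocation, so the lookup must be repeated afterwards.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>
#include <stdlib.h>

struct group {
	struct group *next;
	unsigned int group_num;
	uint32_t mask;
	int refcnt;		/* protected by groups_lock in this sketch */
};

static pthread_mutex_t groups_lock = PTHREAD_MUTEX_INITIALIZER;
static struct group *groups;

/* Caller holds groups_lock.  Takes a reference on the group it returns. */
static struct group *find_group_locked(unsigned int group_num, uint32_t mask)
{
	for (struct group *g = groups; g; g = g->next) {
		if (g->group_num == group_num && g->mask == mask) {
			g->refcnt++;
			return g;
		}
	}
	return NULL;
}

static struct group *obtain_group(unsigned int group_num, uint32_t mask)
{
	struct group *g, *fresh;

	/* Common case: the group exists already, so nothing is allocated. */
	pthread_mutex_lock(&groups_lock);
	g = find_group_locked(group_num, mask);
	pthread_mutex_unlock(&groups_lock);
	if (g)
		return g;

	/* Allocate without holding the lock. */
	fresh = calloc(1, sizeof(*fresh));
	if (!fresh)
		return NULL;
	fresh->group_num = group_num;
	fresh->mask = mask;
	fresh->refcnt = 1;

	/* Re-check: another caller may have added the group meanwhile. */
	pthread_mutex_lock(&groups_lock);
	g = find_group_locked(group_num, mask);
	if (!g) {
		fresh->next = groups;
		groups = fresh;
		g = fresh;
		fresh = NULL;
	}
	pthread_mutex_unlock(&groups_lock);

	free(fresh);		/* lost the race; free(NULL) is a no-op */
	return g;
}
```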
> +void fsnotify_put_event(struct fsnotify_event *event)
> +{
> +	if (!event)
> +		return;
> +
> +	if (atomic_dec_and_test(&event->refcnt)) {
> +		if (event->data_type == FSNOTIFY_EVENT_PATH) {
> +			path_put(&event->path);
> +			event->path.dentry = NULL;
> +			event->path.mnt = NULL;

Why are these fields zeroed here?  If it's for debugging then slab
poisoning should suffice?

> +		}
> +
> +		event->mask = 0;
> +
> +		kmem_cache_free(event_kmem_cache, event);
> +	}
> +}
> +
> +struct fsnotify_event *fsnotify_create_event(struct inode *to_tell, __u32 mask, void *data, int data_type)
> +{
> +	struct fsnotify_event *event;
> +
> +	event = kmem_cache_alloc(event_kmem_cache, GFP_KERNEL);
> +	if (!event)
> +		return NULL;
> +
> +	atomic_set(&event->refcnt, 1);
> +
> +	spin_lock_init(&event->lock);
> +
> +	event->path.dentry = NULL;
> +	event->path.mnt = NULL;
> +	event->inode = NULL;
> +
> +	event->to_tell = to_tell;
> +
> +	switch (data_type) {
> +	case FSNOTIFY_EVENT_FILE: {
> +		struct file *file = data;
> +		struct path *path = &file->f_path;
> +		event->path.dentry = path->dentry;
> +		event->path.mnt = path->mnt;
> +		path_get(&event->path);
> +		event->data_type = FSNOTIFY_EVENT_PATH;
> +		break;
> +	}
> +	case FSNOTIFY_EVENT_PATH: {
> +		struct path *path = data;
> +		event->path.dentry = path->dentry;
> +		event->path.mnt = path->mnt;
> +		path_get(&event->path);
> +		event->data_type = FSNOTIFY_EVENT_PATH;
> +		break;
> +	}
> +	case FSNOTIFY_EVENT_INODE:
> +		event->inode = data;
> +		event->data_type = FSNOTIFY_EVENT_INODE;
> +		break;
> +	default:
> +		BUG();
> +	};

unneeded semicolon

> +	event->mask = mask;
> +
> +	return event;
> +}
> +
> +__init int fsnotify_notification_init(void)
> +{
> +	event_kmem_cache = kmem_cache_create("fsnotify_event", sizeof(struct fsnotify_event), 0, SLAB_PANIC, NULL);

Can use the cheesy KMEM_CACHE() macro?

> +	return 0;
> +}
> +subsys_initcall(fsnotify_notification_init);
> +
> diff --git a/include/linux/fsnotify.h b/include/linux/fsnotify.h
> index 00fbd5b..3d68058 100644
> --- a/include/linux/fsnotify.h
> +++ b/include/linux/fsnotify.h
> @@ -13,6 +13,7 @@
>  
>  #include <linux/dnotify.h>
>  #include <linux/inotify.h>
> +#include <linux/fsnotify_backend.h>
>  #include <linux/audit.h>
>  
>  /*
> @@ -35,6 +36,17 @@ static inline void fsnotify_d_move(struct dentry *entry)
>  }
>  
>  /*
> + * fsnotify_inoderemove - an inode is going away
> + */
> +static inline void fsnotify_inoderemove(struct inode *inode)

inode_remove?

> +{
> +	inotify_inode_queue_event(inode, IN_DELETE_SELF, 0, NULL, NULL);
> +	inotify_inode_is_dead(inode);
> +
> +	fsnotify(inode, FS_DELETE_SELF, inode, FSNOTIFY_EVENT_INODE);
> +}
> +
>
> ...
>
> +/*
> + * IN_* from inotfy.h lines up EXACTLY with FS_*, this is so we can easily
> + * convert between them.  dnotify only needs conversion at watch creation
> + * so no perf loss there.  fanotify isn't defined yet, so it can use the
> + * wholes if it needs more events.
> + */
> +#define FS_ACCESS		0x00000001ul	/* File was accessed */
> +#define FS_MODIFY		0x00000002ul	/* File was modified */
> +#define FS_ATTRIB		0x00000004ul	/* Metadata changed */
> +#define FS_CLOSE_WRITE		0x00000008ul	/* Writtable file was closed */
> +#define FS_CLOSE_NOWRITE	0x00000010ul	/* Unwrittable file closed */
> +#define FS_OPEN			0x00000020ul	/* File was opened */
> +#define FS_MOVED_FROM		0x00000040ul	/* File was moved from X */
> +#define FS_MOVED_TO		0x00000080ul	/* File was moved to Y */
> +#define FS_CREATE		0x00000100ul	/* Subfile was created */
> +#define FS_DELETE		0x00000200ul	/* Subfile was deleted */
> +#define FS_DELETE_SELF		0x00000400ul	/* Self was deleted */
> +#define FS_MOVE_SELF		0x00000800ul	/* Self was moved */
> +
> +#define FS_UNMOUNT		0x00002000ul	/* inode on umount fs */
> +#define FS_Q_OVERFLOW		0x00004000ul	/* Event queued overflowed */
> +#define FS_IN_IGNORED		0x00008000ul	/* last inotify event here */
> +
> +#define FS_IN_ISDIR		0x40000000ul	/* event occurred against dir */
> +#define FS_IN_ONESHOT		0x80000000ul	/* only send event once */
> +
> +#define FS_DN_RENAME		0x10000000ul	/* file renamed */
> +#define FS_DN_MULTISHOT		0x20000000ul	/* dnotify multishot */
> +
> +#define FS_EVENT_ON_CHILD	0x08000000ul

All the "ul"s seem redundant?
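On the "ul" question: every FS_* value fits in 32 bits. In C, a hex constant too large for int, such as 0x80000000, already has type unsigned int, so the suffix does not appear necessary for a __u32 mask. The sketch below assumes a Linux userspace build with <sys/inotify.h>. It redefines some of the FS_* values above without the suffix and checks at compile time that they match the userspace IN_* constants, as the quoted comment claims. inotify_mask_to_fs() is a hypothetical helper.

```c
#include <assert.h>
#include <stdint.h>
#include <sys/inotify.h>

/* A subset of the FS_* values quoted above, written without "ul". */
#define FS_ACCESS		0x00000001
#define FS_MODIFY		0x00000002
#define FS_CLOSE_WRITE		0x00000008
#define FS_CLOSE_NOWRITE	0x00000010
#define FS_DELETE_SELF		0x00000400
#define FS_UNMOUNT		0x00002000
#define FS_Q_OVERFLOW		0x00004000
#define FS_IN_IGNORED		0x00008000
#define FS_IN_ISDIR		0x40000000
#define FS_IN_ONESHOT		0x80000000	/* too big for int: type is unsigned int */

/* Compile-time checks that the userspace IN_* ABI matches FS_* bit for bit. */
_Static_assert(FS_ACCESS == IN_ACCESS, "IN_ACCESS");
_Static_assert(FS_MODIFY == IN_MODIFY, "IN_MODIFY");
_Static_assert(FS_CLOSE_WRITE == IN_CLOSE_WRITE, "IN_CLOSE_WRITE");
_Static_assert(FS_CLOSE_NOWRITE == IN_CLOSE_NOWRITE, "IN_CLOSE_NOWRITE");
_Static_assert(FS_DELETE_SELF == IN_DELETE_SELF, "IN_DELETE_SELF");
_Static_assert(FS_UNMOUNT == IN_UNMOUNT, "IN_UNMOUNT");
_Static_assert(FS_Q_OVERFLOW == IN_Q_OVERFLOW, "IN_Q_OVERFLOW");
_Static_assert(FS_IN_IGNORED == IN_IGNORED, "IN_IGNORED");
_Static_assert(FS_IN_ISDIR == IN_ISDIR, "IN_ISDIR");
_Static_assert(FS_IN_ONESHOT == IN_ONESHOT, "IN_ONESHOT");

/* With identical bits, converting an inotify mask to FS_* is the identity. */
static uint32_t inotify_mask_to_fs(uint32_t in_mask)
{
	return in_mask;
}
```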

> +struct fsnotify_group;
> +struct fsnotify_event;
> +
> +/*
> + * Each group much define these ops.
> + *
> + * handle_event - main call for a group to handle an fs event
> + * free_group_priv - called when a group refcnt hits 0 to clean up the private union
> + */
> +struct fsnotify_ops {
> +	int (*handle_event)(struct fsnotify_group *group, struct fsnotify_event *event);
> +	void (*free_group_priv)(struct fsnotify_group *group);

"free_group_private"

> +};
> +
> +/*
> + * A group is a "thing" that wants to receive notification about filesystem
> + * events.  The mask holds the subset of event types this group cares about.

It's unclear what the "event types" are.  FS_* from above?

Perhaps things would be clearer if they were named FS_EVENT_*, or FSE_*?

> + * refcnt on a group is up to the implementor and at any moment if it goes 0
> + * everything will be cleaned up.
> + */
> +struct fsnotify_group {
> +	struct list_head group_list;	/* list of all groups on the system */
> +	__u32 mask;			/* mask of events this group cares about */
> +	atomic_t refcnt;		/* num of processes with a special file open */
> +	unsigned int group_num;		/* the 'name' of the event */
> +
> +	const struct fsnotify_ops *ops;	/* how this group handles things */
> +
> +	unsigned int evicted:1;		/* has this group been evicted? */

If someone adds another bitfield here then they will share the same
word and will hence need locking.  It'd be less risky to just make this
a plain old `unsigned'.  Or `bool'.

> +	/* groups can define private fields here */
> +	union {
> +	};
> +};
> +
> +/*
> + * all of the information about the original object we want to now send to
> + * a group.  If you want to carry more info from the accessing task to the
> + * listener this structure is where you need to be adding fields.
> + */
> +struct fsnotify_event {
> +	spinlock_t lock;	/* protection for the associated event_holder and private_list */
> +	struct inode *to_tell;

Does the existence of a `struct fsnotify_event' cause a reference to be
taken on fsnotify_event.to_tell?

If so, that's useful information to add here.

Either way, a few words about the design of the lifetime management
would be helpful.

>
> ...
>



* Re: [PATCH -V2 03/13] fsnotify: add group priorities
  2009-03-27 20:05 ` [PATCH -V2 03/13] fsnotify: add group priorities Eric Paris
@ 2009-04-07 23:06   ` Andrew Morton
  0 siblings, 0 replies; 26+ messages in thread
From: Andrew Morton @ 2009-04-07 23:06 UTC (permalink / raw)
  To: Eric Paris; +Cc: linux-kernel, viro, hch, alan, sfr, john, rlove

On Fri, 27 Mar 2009 16:05:20 -0400
Eric Paris <eparis@redhat.com> wrote:

> This introduces an ordering to fsnotify groups.  It's most interesting because
> an HSM would need to run before a typical access notifier.  And an access
> control system would need to run between the two.

HSM == Hierarchical storage management?  google says "High School Musical" ;)

Some considerably more detailed explanation of the use-cases would be
helpful here.  It's all whizzing over my head.



* Re: [PATCH -V2 04/13] fsnotify: add in inode fsnotify markings
  2009-03-27 20:05 ` [PATCH -V2 04/13] fsnotify: add in inode fsnotify markings Eric Paris
@ 2009-04-07 23:06   ` Andrew Morton
  0 siblings, 0 replies; 26+ messages in thread
From: Andrew Morton @ 2009-04-07 23:06 UTC (permalink / raw)
  To: Eric Paris; +Cc: linux-kernel, viro, hch, alan, sfr, john, rlove

On Fri, 27 Mar 2009 16:05:26 -0400
Eric Paris <eparis@redhat.com> wrote:

> This patch creates in inode fsnotify markings.  dnotify will make use of in
> inode markings to mark which inodes it wishes to send events for.

The text appears to be using the term "in inode" as a term-of-art.  But
it's a new one to me.

>  fanotify
> will use this to mark which inodes it does not wish to send events for.
> 

<but i dont know what fanotify is>

> ---
> 
>  Documentation/filesystems/fsnotify.txt |  180 +++++++++++++++++++++++++
>  fs/inode.c                             |    9 +
>  fs/notify/Makefile                     |    2 
>  fs/notify/fsnotify.c                   |   10 +
>  fs/notify/fsnotify.h                   |    3 
>  fs/notify/group.c                      |   33 ++++-
>  fs/notify/inode_mark.c                 |  229 ++++++++++++++++++++++++++++++++
>  include/linux/fs.h                     |    5 +
>  include/linux/fsnotify.h               |    9 +
>  include/linux/fsnotify_backend.h       |   58 ++++++++
>  10 files changed, 535 insertions(+), 3 deletions(-)
>  create mode 100644 Documentation/filesystems/fsnotify.txt
>  create mode 100644 fs/notify/inode_mark.c
> 
> diff --git a/Documentation/filesystems/fsnotify.txt b/Documentation/filesystems/fsnotify.txt
> new file mode 100644
> index 0000000..e1c90f5
> --- /dev/null
> +++ b/Documentation/filesystems/fsnotify.txt
> @@ -0,0 +1,180 @@
> +fsnotify inode mark locking/lifetime/and refcnting
> +
> +struct fsnotify_mark_entry {
> +        __u32 mask;                     /* mask this mark entry is for */
> +        /* we hold ref for each i_list and g_list.  also one ref for each 'thing'
> +         * in kernel that found and may be using this mark. */
> +        atomic_t refcnt;                /* active things looking at this mark */
> +        struct inode *inode;            /* inode this entry is associated with */
> +        struct fsnotify_group *group;   /* group this mark entry is for */
> +        struct hlist_node i_list;       /* list of mark_entries by inode->i_fsnotify_mark_entries */
> +        struct list_head g_list;        /* list of mark_entries by group->i_fsnotify_mark_entries */
> +        spinlock_t lock;                /* protect group, inode, and killme */
> +        struct list_head free_i_list;   /* tmp list used when freeing this mark */
> +        struct list_head free_g_list;   /* tmp list used when freeing this mark */
> +        void (*free_mark)(struct fsnotify_mark_entry *entry); /* called on final put+free */
> +};
> +
> +REFCNT:
> +The mark->refcnt tells how many "things" in the kernel currectly are

"currently"

>
> ...
>
> +The inode mark can be cleared for a number of different reasons including:
> +- The inode is unlinked for the last time.  (fsnotify_inoderemove)
> +- The inode is being evicted from cache. (fsnotify_inode_delete)
> +- The fs the inode is on is unmounted.  (fsnotify_inode_delete/fsnotify_unmount_inodes)

So it did want to be called fsnotify_inode_remove.

> +- Something explicitly requests that it be removed.  (fsnotify_destroy_mark_by_entry)
> +- The fsnotify_group associated with the mark is going away and all such marks
> +  need to be cleaned up. (fsnotify_clear_marks_by_group)
> +
> +Worst case we are given an inode and need to clean up all the marks on that
> +inode.  We take i_lock and walk the i_fsnotify_mark_entries safely.  For each
> +mark on the list we take a reference (so the mark can't disappear under us).
> +We remove that mark from the inode's list of marks and we add this mark to a
> +private list anchored on the stack using i_free_list;  At this point we no
> +longer fear anything finding the mark using the inode's list of marks.

damn, I wish this was all written in the C files instead.
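The quoted document gives the refcounting rules for a mark. The creator holds a reference from fsnotify_init_mark(), the inode's list holds one, the group's list holds one, and each user that found the mark holds one. Taking those rules at face value, a single-object userspace sketch might read as follows. It uses C11 atomics, replaces the real i_list/g_list membership with booleans, and omits all locking. Every name in it is hypothetical. The assertion mirrors the `refcnt < 3` check in fsnotify_destroy_mark_by_entry().

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

struct mark {
	atomic_int refcnt;
	bool on_inode_list;	/* stands in for i_list membership */
	bool on_group_list;	/* stands in for g_list membership */
};

static int marks_freed;		/* instrumentation for the sketch only */

static void get_mark(struct mark *m)
{
	atomic_fetch_add(&m->refcnt, 1);
}

static void put_mark(struct mark *m)
{
	if (atomic_fetch_sub(&m->refcnt, 1) == 1) {
		free(m);
		marks_freed++;
	}
}

/* The creator holds the initial reference. */
static struct mark *new_mark(void)
{
	struct mark *m = calloc(1, sizeof(*m));

	if (m)
		atomic_init(&m->refcnt, 1);
	return m;
}

/* Each list the mark is on holds a reference of its own. */
static void attach_mark(struct mark *m)
{
	get_mark(m);
	m->on_inode_list = true;
	get_mark(m);
	m->on_group_list = true;
}

/* Caller must hold its own reference, so refcnt >= 3 on entry. */
static void destroy_mark(struct mark *m)
{
	/* The kernel code uses entry->group == NULL for "already dead". */
	if (!m->on_group_list)
		return;
	assert(atomic_load(&m->refcnt) >= 3);

	m->on_inode_list = false;
	put_mark(m);		/* the inode list's reference */
	m->on_group_list = false;
	put_mark(m);		/* the group list's reference */
}
```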

>
> ...
>
> +void fsnotify_recalc_group_mask(struct fsnotify_group *group)
> +{
> +	__u32 mask = 0;
> +	__u32 old_mask = group->mask;
> +	struct fsnotify_mark_entry *entry;
> +
> +	spin_lock(&group->mark_lock);
> +	list_for_each_entry(entry, &group->mark_entries, g_list) {
> +		mask |= entry->mask;
> +	}

unneeded {}  (numerous places)

> +	spin_unlock(&group->mark_lock);
> +
> +	group->mask = mask;
> +
> +	if (old_mask != mask)
> +		fsnotify_recalc_global_mask();
> +}
> +
>  static void fsnotify_add_group(struct fsnotify_group *group)
>  {
>  	int priority = group->priority;
> @@ -71,13 +89,22 @@ static void fsnotify_get_group(struct fsnotify_group *group)
>  	atomic_inc(&group->refcnt);
>  }
>  
> -static void fsnotify_destroy_group(struct fsnotify_group *group)
> +void fsnotify_final_destroy_group(struct fsnotify_group *group)
>  {
>  	if (group->ops->free_group_priv)
>  		group->ops->free_group_priv(group);
>  
>  	kfree(group);
>  }

missing newline

> +static void fsnotify_destroy_group(struct fsnotify_group *group)
> +{
> +	/* clear all inode mark entries for this group */
> +	fsnotify_clear_marks_by_group(group);
> +
> +	/* past the point of no return, matches the initial value of 1 */
> +	if (atomic_dec_and_test(&group->num_marks))
> +		fsnotify_final_destroy_group(group);
> +}
>  
>
> ...
>
> +void fsnotify_destroy_mark_by_entry(struct fsnotify_mark_entry *entry)
> +{
> +	struct fsnotify_group *group;
> +	struct inode *inode;
> +
> +	spin_lock(&entry->lock);
> +
> +	group = entry->group;
> +	inode = entry->inode;
> +
> +	BUG_ON(group && !inode);
> +	BUG_ON(!group && inode);
> +
> +	/* if !group something else already marked this to die */
> +	if (!group) {
> +		spin_unlock(&entry->lock);
> +		return;
> +	}
> +
> +	/* this just tests that the caller held a reference */
> +	if (unlikely(atomic_read(&entry->refcnt) < 3))
> +		BUG();

BUG_ON()

> +	spin_lock(&group->mark_lock);
> +	spin_lock(&inode->i_lock);
> +
> +	hlist_del_init(&entry->i_list);
> +	entry->inode = NULL;
> +	fsnotify_put_mark(entry); /* for i_list */
> +
> +	list_del_init(&entry->g_list);
> +	entry->group = NULL;
> +	fsnotify_put_mark(entry); /* for g_list */
> +
> +	fsnotify_recalc_inode_mask_locked(inode);
> +
> +	spin_unlock(&inode->i_lock);
> +	spin_unlock(&group->mark_lock);
> +	spin_unlock(&entry->lock);
> +
> +	group->ops->freeing_mark(entry, group);
> +
> +	if (atomic_dec_and_test(&group->num_marks))
> +		fsnotify_final_destroy_group(group);
> +}
> +
>
> ...
>
> +void fsnotify_init_mark(struct fsnotify_mark_entry *entry, void (*free_mark)(struct fsnotify_mark_entry *entry))

you must have a wide monitor.

> +
> +{
> +	spin_lock_init(&entry->lock);
> +	atomic_set(&entry->refcnt, 1);
> +	INIT_HLIST_NODE(&entry->i_list);
> +	entry->group = NULL;
> +	entry->mask = 0;
> +	entry->inode = NULL;
> +	entry->free_mark = free_mark;
> +}
> +
>
> ...
>
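The num_marks handling above is a biased reference count. The group starts at 1 for itself, each mark presumably takes one more (the add path is elided above), fsnotify_destroy_group() drops the initial 1, and whichever path reaches zero calls fsnotify_final_destroy_group(). Below is a minimal userspace sketch of that pattern with C11 atomics. All of its names (demo_group, demo_group_put() and so on) are hypothetical stand-ins, not the fsnotify API:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct fsnotify_group: just the counter. */
struct demo_group {
	atomic_int num_marks;	/* 1 for the group itself + 1 per live mark */
	int *freed;		/* set by the final free, so a test can observe it */
};

static struct demo_group *demo_group_alloc(int *freed)
{
	struct demo_group *g = malloc(sizeof(*g));

	if (!g)
		return NULL;
	atomic_init(&g->num_marks, 1);	/* the initial bias */
	g->freed = freed;
	*freed = 0;
	return g;
}

/* Analogue of fsnotify_final_destroy_group(). */
static void demo_group_final_free(struct demo_group *g)
{
	*g->freed = 1;
	free(g);
}

/* Each mark attached to the group takes one reference. */
static void demo_group_add_mark(struct demo_group *g)
{
	int old = atomic_fetch_add(&g->num_marks, 1);

	assert(old > 0);	/* adding a mark to a dying group is a bug */
	(void)old;
}

/*
 * Drop one reference (group teardown or a freed mark).
 * Returns true if this call performed the final free.
 */
static bool demo_group_put(struct demo_group *g)
{
	/* atomic_fetch_sub returns the old value; 1 means we reached 0 */
	if (atomic_fetch_sub(&g->num_marks, 1) == 1) {
		demo_group_final_free(g);
		return true;
	}
	return false;
}
```

Neither the teardown path nor the last mark has to know which of them runs last. Both simply drop a reference.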



* Re: [PATCH -V2 05/13] fsnotify: parent event notification
  2009-03-27 20:05 ` [PATCH -V2 05/13] fsnotify: parent event notification Eric Paris
@ 2009-04-07 23:06   ` Andrew Morton
  0 siblings, 0 replies; 26+ messages in thread
From: Andrew Morton @ 2009-04-07 23:06 UTC (permalink / raw)
  To: Eric Paris; +Cc: linux-kernel, viro, hch, alan, sfr, john, rlove

On Fri, 27 Mar 2009 16:05:32 -0400
Eric Paris <eparis@redhat.com> wrote:

> inotify and dnotify both use a similar parent notification mechanism.  We
> add a generic parent notification mechanism to fsnotify for both of these
> to use.  This new mechanism also adds the dentry flag optimization which
> exists for inotify to dnotify.
> 
>
> ...
>
>  /*
> + * Given an inode, first check if we care what happens to out children.  Inotify

"our"

> + * and dnotify both tell their parents about events.  If we care about any event
> + * on a child we run all of our children and set a dentry flag saying that the
> + * parent cares.  Thus when an event happens on a child it can quickly tell if
> + * if there is a need to find a parent and send the event to the parent.
> + */
> +static inline void fsnotify_update_dentry_child_flags(struct inode *inode)
> +{
> +	struct dentry *alias;
> +	int watched;
> +
> +	if (!S_ISDIR(inode->i_mode))
> +		return;
> +
> +	/* determine if the children should tell inode about their events */
> +	watched = fsnotify_inode_watches_children(inode);
> +
> +	spin_lock(&dcache_lock);
> +	/* run all of the dentries associated with this inode.  Since this is a
> +	 * directory, there damn well better only be one item on this list */
> +	list_for_each_entry(alias, &inode->i_dentry, d_alias) {
> +		struct dentry *child;
> +
> +		/* run all of the children of the original inode and fix their
> +		 * d_flags to indicate parental interest (their parent is the
> +		 * original inode) */
> +		list_for_each_entry(child, &alias->d_subdirs, d_u.d_child) {
> +			if (!child->d_inode)
> +				continue;
> +
> +			spin_lock(&child->d_lock);
> +			if (watched)
> +				child->d_flags |= DCACHE_FSNOTIFY_PARENT_WATCHED;
> +			else
> +				child->d_flags &= ~DCACHE_FSNOTIFY_PARENT_WATCHED;
> +			spin_unlock(&child->d_lock);
> +		}
> +	}
> +	spin_unlock(&dcache_lock);
> +}

Huge function, three callsites, way too large to inline!

AFAICT all these DCACHE_FSNOTIFY_PARENT_WATCHED bits are left set
without suitable locks being held.  What prevents different threads of
control from setting and clearing them under each other's feet?

The comment should be updated to answer this, please.

>
> ...
>
> +static inline void fsnotify_d_instantiate(struct dentry *dentry, struct inode *inode)
>  {
> -	inotify_d_instantiate(entry, inode);
> +	__fsnotify_d_instantiate(dentry, inode);
> +
> +	/* call the legacy inotify shit */
> +	inotify_d_instantiate(dentry, inode);
> +}
> +
> +/* Notify this dentry's parent about a child's events. */
> +static inline void fsnotify_parent(struct dentry *dentry, __u32 mask)
> +{
> +	struct dentry *parent;
> +	struct inode *p_inode;
> +	char send = 0;
> +
> +	if (!(dentry->d_flags | DCACHE_FSNOTIFY_PARENT_WATCHED))
> +		return;
> +
> +	/* we are notifying a parent so come up with the new mask which
> +	 * specifies these are events which came from a child. */
> +	mask |= FS_EVENT_ON_CHILD;
> +
> +	spin_lock(&dentry->d_lock);
> +	parent = dentry->d_parent;
> +	p_inode = parent->d_inode;
> +
> +	if (p_inode->i_fsnotify_mask & mask) {
> +		dget(parent);
> +		send = 1;
> +	}
> +
> +	spin_unlock(&dentry->d_lock);
> +
> +	if (send) {
> +		fsnotify(p_inode, mask, dentry->d_inode, FSNOTIFY_EVENT_INODE);
> +		dput(parent);
> +	}
>  }

I hereby revoke your inlining license.
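Separately from the inlining, the early exit at the top of fsnotify_parent() tests `!(dentry->d_flags | DCACHE_FSNOTIFY_PARENT_WATCHED)`. OR-ing in a nonzero constant always yields a nonzero value, so that condition is never true and every event takes the slow path. Testing a single bit needs `&`. Here is a userspace sketch with a made-up flag value (DEMO_PARENT_WATCHED and both helpers are hypothetical):

```c
#include <assert.h>
#include <stdbool.h>

/* Made-up bit value, for illustration only. */
#define DEMO_PARENT_WATCHED 0x0010u

/* As quoted: x | FLAG is nonzero for every x, so this is always true. */
static bool demo_parent_watched_buggy(unsigned int d_flags)
{
	return (d_flags | DEMO_PARENT_WATCHED) != 0;
}

/* Intended test: is that one bit set? */
static bool demo_parent_watched(unsigned int d_flags)
{
	return (d_flags & DEMO_PARENT_WATCHED) != 0;
}
```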

>
> ...
>
> --- a/include/linux/fsnotify_backend.h
> +++ b/include/linux/fsnotify_backend.h
> @@ -45,8 +45,17 @@
>  #define FS_DN_RENAME		0x10000000ul	/* file renamed */
>  #define FS_DN_MULTISHOT		0x20000000ul	/* dnotify multishot */
>  
> +/* this inode cares about things that happen to it's children.  Always set for

"its"

> + * dnotify and inotify.  never set for fanotify */

You might want to go through the comments and start sentences with
capital letters :(

>  #define FS_EVENT_ON_CHILD	0x08000000ul
>  
> +/* this is a list of all events that may get sent to a parernt based on fs event
> + * happening to inodes inside that directory */
>
> ...
>



* Re: [PATCH -V2 06/13] dnotify: reimplement dnotify using fsnotify
  2009-03-27 20:05 ` [PATCH -V2 06/13] dnotify: reimplement dnotify using fsnotify Eric Paris
@ 2009-04-07 23:06   ` Andrew Morton
  0 siblings, 0 replies; 26+ messages in thread
From: Andrew Morton @ 2009-04-07 23:06 UTC (permalink / raw)
  To: Eric Paris; +Cc: linux-kernel, viro, hch, alan, sfr, john, rlove

On Fri, 27 Mar 2009 16:05:37 -0400
Eric Paris <eparis@redhat.com> wrote:

> Reimplement dnotify using fsnotify.
> 
>
> ...
>
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -1431,6 +1431,8 @@ S:	Orphan
>  DIRECTORY NOTIFICATION (DNOTIFY)
>  P:	Stephen Rothwell
>  M:	sfr@canb.auug.org.au
> +P:	Eric Paris
> +M:	eparis@parisplace.org

hah!

>  L:	linux-kernel@vger.kernel.org
>  S:	Supported
>  
>
> ...
>
> +static int dnotify_should_send_event(struct fsnotify_group *group, struct inode *inode, __u32 mask)

could return bool, if you like that sort of thing.
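A bool version could look like the sketch below. The `!!` disappears because conversion to bool already maps any nonzero value to true. The struct is a hypothetical stand-in for the mark, and the i_lock/entry->lock handling and the fsnotify_put_mark() call are omitted:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the mark: only the event mask matters here. */
struct demo_mark {
	uint32_t mask;
};

static bool demo_should_send_event(const struct demo_mark *mark, uint32_t mask)
{
	/* no mark means no dnotify watch */
	if (!mark)
		return false;
	/* implicit conversion to bool replaces !! */
	return mask & mark->mask;
}
```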

> +{
> +	struct fsnotify_mark_entry *entry;
> +	int send;
> +
> +	/* !dir_notify_enable should never get here, don't waste time checking
> +	if (!dir_notify_enable)
> +		return 0; */
> +
> +	/* not a dir, dnotify doesn't care */
> +	if (!S_ISDIR(inode->i_mode))
> +		return 0;
> +
> +	spin_lock(&inode->i_lock);
> +	entry = fsnotify_find_mark_entry(group, inode);
> +	spin_unlock(&inode->i_lock);
> +
> +	/* no mark means no dnotify watch */
> +	if (!entry)
> +		return 0;
> +
> +	spin_lock(&entry->lock);
> +	send = !!(mask & entry->mask);
> +	spin_unlock(&entry->lock);
> +	fsnotify_put_mark(entry);
> +
> +	return send;
> +}
> +
>
> ...
>
> -int fcntl_dirnotify(int fd, struct file *filp, unsigned long arg)
> +/* this conversion is done only at watch creation */
> +static inline __u32  convert_arg(unsigned long arg)

The compiler will inline this anyway.

s/  / /
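Apart from the inline and the stray space, the if-chain in convert_arg() is a fixed mapping, so a lookup table is one alternative way to write it. The sketch below assumes the bit values shown. They are local copies of the DN_* constants from <linux/fcntl.h> and of the FS_* constants used in this series, and in-tree code would use the real headers:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Local copies of the dnotify request bits (<linux/fcntl.h>). */
#define DN_ACCESS	0x00000001ul
#define DN_MODIFY	0x00000002ul
#define DN_CREATE	0x00000004ul
#define DN_DELETE	0x00000008ul
#define DN_RENAME	0x00000010ul
#define DN_ATTRIB	0x00000020ul
#define DN_MULTISHOT	0x80000000ul

/* Local copies of the fsnotify event bits used by this series. */
#define FS_ACCESS		0x00000001u
#define FS_MODIFY		0x00000002u
#define FS_ATTRIB		0x00000004u
#define FS_MOVED_FROM		0x00000040u
#define FS_MOVED_TO		0x00000080u
#define FS_CREATE		0x00000100u
#define FS_DELETE		0x00000200u
#define FS_EVENT_ON_CHILD	0x08000000u
#define FS_DN_RENAME		0x10000000u
#define FS_DN_MULTISHOT		0x20000000u

/* One row per DN_* request bit: the FS_* events it subscribes to. */
static const struct {
	unsigned long dn;
	uint32_t fs;
} dn_to_fs[] = {
	{ DN_MULTISHOT,	FS_DN_MULTISHOT },
	{ DN_DELETE,	FS_DELETE | FS_MOVED_FROM },
	{ DN_MODIFY,	FS_MODIFY },
	{ DN_ACCESS,	FS_ACCESS },
	{ DN_ATTRIB,	FS_ATTRIB },
	{ DN_RENAME,	FS_DN_RENAME },
	{ DN_CREATE,	FS_CREATE | FS_MOVED_TO },
};

/* Done only at watch creation; every dnotify watch cares about children. */
static uint32_t convert_arg(unsigned long arg)
{
	uint32_t new_mask = FS_EVENT_ON_CHILD;
	size_t i;

	for (i = 0; i < sizeof(dn_to_fs) / sizeof(dn_to_fs[0]); i++)
		if (arg & dn_to_fs[i].dn)
			new_mask |= dn_to_fs[i].fs;
	return new_mask;
}
```

Whether a table beats the if-chain for seven entries is a matter of taste. Its main advantage is that the mapping reads as data.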

> +{
> +	__u32 new_mask = FS_EVENT_ON_CHILD;
> +
> +	if (arg & DN_MULTISHOT)
> +		new_mask |= FS_DN_MULTISHOT;
> +	if (arg & DN_DELETE)
> +		new_mask |= (FS_DELETE | FS_MOVED_FROM);
> +	if (arg & DN_MODIFY)
> +		new_mask |= FS_MODIFY;
> +	if (arg & DN_ACCESS)
> +		new_mask |= FS_ACCESS;
> +	if (arg & DN_ATTRIB)
> +		new_mask |= FS_ATTRIB;
> +	if (arg & DN_RENAME)
> +		new_mask |= FS_DN_RENAME;
> +	if (arg & DN_CREATE)
> +		new_mask |= (FS_CREATE | FS_MOVED_TO);
> +
> +	return new_mask;
> +}
> +
> +static int attach_dn(struct dnotify_struct *dn, struct dnotify_mark_entry *dnentry, fl_owner_t id,
> +		     int fd, struct file *filp, __u32 mask)

Given that the definition is already broken over two lines, there's
nothing to be gained by making it look messy in 80-cols?

>  {
> -	struct dnotify_struct *dn;
>  	struct dnotify_struct *odn;
>  	struct dnotify_struct **prev;
> -	struct inode *inode;
> -	fl_owner_t id = current->files;
> -	struct file *f;
> -	int error = 0;
>  
> -	if ((arg & ~DN_MULTISHOT) == 0) {
> -		dnotify_flush(filp, id);
> -		return 0;
> -	}
> -	if (!dir_notify_enable)
> -		return -EINVAL;
> -	inode = filp->f_path.dentry->d_inode;
> -	if (!S_ISDIR(inode->i_mode))
> -		return -ENOTDIR;
> -	dn = kmem_cache_alloc(dn_cache, GFP_KERNEL);
> -	if (dn == NULL)
> -		return -ENOMEM;
> -	spin_lock(&inode->i_lock);
> -	prev = &inode->i_dnotify;
> +	prev = &dnentry->dn;
>  	while ((odn = *prev) != NULL) {
> +		/* do we already have a dnotify struct and we are just adding more events? */
>  		if ((odn->dn_owner == id) && (odn->dn_filp == filp)) {
>  			odn->dn_fd = fd;
> -			odn->dn_mask |= arg;
> -			inode->i_dnotify_mask |= arg & ~DN_MULTISHOT;
> -			goto out_free;
> +			odn->dn_mask |= mask;
> +			return -EEXIST;
>  		}
>  		prev = &odn->dn_next;
>  	}
>  
> -	rcu_read_lock();
> -	f = fcheck(fd);
> -	rcu_read_unlock();
> -	/* we'd lost the race with close(), sod off silently */
> -	/* note that inode->i_lock prevents reordering problems
> -	 * between accesses to descriptor table and ->i_dnotify */
> -	if (f != filp)
> -		goto out_free;
> -
> -	error = __f_setown(filp, task_pid(current), PIDTYPE_PID, 0);
> -	if (error)
> -		goto out_free;
> -
> -	dn->dn_mask = arg;
> +	dn->dn_mask = mask;
>  	dn->dn_fd = fd;
>  	dn->dn_filp = filp;
>  	dn->dn_owner = id;
> -	inode->i_dnotify_mask |= arg & ~DN_MULTISHOT;
> -	dn->dn_next = inode->i_dnotify;
> -	inode->i_dnotify = dn;
> -	spin_unlock(&inode->i_lock);
> -	return 0;
> +	dn->dn_next = dnentry->dn;
> +	dnentry->dn = dn;
>  
> -out_free:
> -	spin_unlock(&inode->i_lock);
> -	kmem_cache_free(dn_cache, dn);
> -	return error;
> +	return 0;
>  }
>  
>
> ...
>
> -extern void __inode_dir_notify(struct inode *, unsigned long);
> +#define ALL_DNOTIFY_EVENTS (FS_DELETE | FS_DELETE_CHILD |\
> +			    FS_MODIFY | FS_MODIFY_CHILD |\
> +			    FS_ACCESS | FS_ACCESS_CHILD |\
> +			    FS_ATTRIB | FS_ATTRIB_CHILD |\
> +			    FS_CREATE | FS_DN_RENAME |\
> +			    FS_MOVED_FROM | FS_MOVED_TO)

"DNOTIFY_ALL_EVENTS"?

>  extern void dnotify_flush(struct file *, fl_owner_t);
>  extern int fcntl_dirnotify(int, struct file *, unsigned long);
> -extern void dnotify_parent(struct dentry *, unsigned long);
> -
> -static inline void inode_dir_notify(struct inode *inode, unsigned long event)
> -{
> -	if (inode->i_dnotify_mask & (event))
> -		__inode_dir_notify(inode, event);
> -}
>  
>
> ...
>



* Re: [PATCH -V2 07/13] fsnotify: generic notification queue and waitq
  2009-03-27 20:05 ` [PATCH -V2 07/13] fsnotify: generic notification queue and waitq Eric Paris
@ 2009-04-07 23:06   ` Andrew Morton
  0 siblings, 0 replies; 26+ messages in thread
From: Andrew Morton @ 2009-04-07 23:06 UTC (permalink / raw)
  To: Eric Paris; +Cc: linux-kernel, viro, hch, alan, sfr, john, rlove

On Fri, 27 Mar 2009 16:05:43 -0400
Eric Paris <eparis@redhat.com> wrote:

> inotify needs to do async notification in which event information is stored
> on a queue until the listener is ready to receive it.  This patch
> implements a generic notification queue for inotify (and later fanotify) to
> store events to be sent at a later time.
> 
>
> ...
>
> +/* return 1 if something is available, return 0 otherwise */
> +int fsnotify_check_notif_queue(struct fsnotify_group *group)
> +{
> +	BUG_ON(!mutex_is_locked(&group->notification_mutex));
> +	return !list_empty(&group->notification_list);
> +}

It's a poorly named function, because the name ("check") doesn't convey
information about the return value.

Better would be

	bool fsnotify_notif_queue_nonempty(...)

or

	bool fsnotify_notif_queue_empty(...)

(and invert test in callers)

	
and the abbreviation of "notify" to "notif" just makes it harder to
remember its name.

best is

	bool fsnotify_notify_queue_is_empty(...)
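Here is the suggested name in use, as a sketch over a minimal circular list modelled on the kernel's list_head (demo_list_head and its helpers are hypothetical). The kernel version would keep the BUG_ON() that checks the caller holds group->notification_mutex:

```c
#include <assert.h>
#include <stdbool.h>

/* Minimal circular doubly-linked list head, modelled on list_head. */
struct demo_list_head {
	struct demo_list_head *next, *prev;
};

static void demo_init_list_head(struct demo_list_head *h)
{
	h->next = h;
	h->prev = h;
}

/* Insert n just before h, i.e. at the tail of the queue. */
static void demo_list_add_tail(struct demo_list_head *n, struct demo_list_head *h)
{
	n->prev = h->prev;
	n->next = h;
	h->prev->next = n;
	h->prev = n;
}

/* The name says what "true" means; no need to read the body. */
static bool demo_notify_queue_is_empty(const struct demo_list_head *h)
{
	return h->next == h;
}
```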

>  void fsnotify_get_event(struct fsnotify_event *event)
>  {
> @@ -45,26 +60,180 @@ void fsnotify_put_event(struct fsnotify_event *event)
>  		return;
>  
>  	if (atomic_dec_and_test(&event->refcnt)) {
> -		if (event->data_type == FSNOTIFY_EVENT_PATH) {
> +		if (event->data_type == FSNOTIFY_EVENT_PATH)
>  			path_put(&event->path);
> -			event->path.dentry = NULL;
> -			event->path.mnt = NULL;
> -		}
> +		kmem_cache_free(event_kmem_cache, event);
> +	}
> +}
>  
> -		event->mask = 0;
> +struct fsnotify_event_holder *alloc_event_holder(void)
> +{
> +	return kmem_cache_alloc(event_holder_kmem_cache, GFP_KERNEL);
> +}

That's a pretty generic-sounding name for a global symbol.

> -		kmem_cache_free(event_kmem_cache, event);
> +void fsnotify_destroy_event_holder(struct fsnotify_event_holder *holder)
> +{
> +	kmem_cache_free(event_holder_kmem_cache, holder);
> +}

That one's better.

> +/*
> + * check if 2 events contain the same information.
> + */
> +static inline int event_compare(struct fsnotify_event *old, struct fsnotify_event *new)
> +{
> +	if ((old->mask == new->mask) &&
> +	    (old->to_tell == new->to_tell) &&
> +	    (old->data_type == new->data_type)) {
> +		switch (old->data_type) {
> +		case (FSNOTIFY_EVENT_INODE):
> +			if (old->inode == new->inode)
> +				return 1;
> +			break;
> +		case (FSNOTIFY_EVENT_PATH):
> +			if ((old->path.mnt == new->path.mnt) &&
> +			    (old->path.dentry == new->path.dentry))
> +				return 1;
> +		case (FSNOTIFY_EVENT_NONE):
> +			return 1;
> +		};
>  	}
> +	return 0;
>  }

The compiler would have inlined this.
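Besides the inlining, in the quoted version the FSNOTIFY_EVENT_PATH case has no `break`. Two events whose paths differ therefore fall through into the FSNOTIFY_EVENT_NONE case and compare equal. The following minimal sketch returns a result from every case. It uses simplified hypothetical types, with opaque pointers standing in for the inode, vfsmount and dentry:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum demo_data_type { DEMO_EVENT_NONE, DEMO_EVENT_PATH, DEMO_EVENT_INODE };

/* Simplified, hypothetical event carrying only the compared fields. */
struct demo_event {
	uint32_t mask;
	const void *to_tell;
	enum demo_data_type data_type;
	const void *inode;
	const void *mnt;
	const void *dentry;
};

static bool demo_event_compare(const struct demo_event *old_ev,
			       const struct demo_event *new_ev)
{
	if (old_ev->mask != new_ev->mask ||
	    old_ev->to_tell != new_ev->to_tell ||
	    old_ev->data_type != new_ev->data_type)
		return false;

	switch (old_ev->data_type) {
	case DEMO_EVENT_INODE:
		return old_ev->inode == new_ev->inode;
	case DEMO_EVENT_PATH:
		/* explicit result: a path mismatch must not reach the NONE case */
		return old_ev->mnt == new_ev->mnt &&
		       old_ev->dentry == new_ev->dentry;
	case DEMO_EVENT_NONE:
		return true;
	}
	return false;
}
```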

> -struct fsnotify_event *fsnotify_create_event(struct inode *to_tell, __u32 mask, void *data, int data_type)
> +/*
> + * Add an event to the group notification queue.  The group can later pull this
> + * event off the queue to deal with.
> + */
> +int fsnotify_add_notif_event(struct fsnotify_group *group, struct fsnotify_event *event)

s/notif/notify/

> +{
> +	struct fsnotify_event_holder *holder = NULL;
> +	struct list_head *list = &group->notification_list;
> +	struct fsnotify_event_holder *last_holder;
> +	struct fsnotify_event *last_event;
> +
> +	/*
> +	 * Check if we expect to be able to use the in event holder.  If not alloc
> +	 * a new holder.
> +	 * For the overflow event it's possible that something will use the in
> +	 * event holder before we get the lock so we may need to jump back and
> +	 * alloc a new holder.
> +	 */

The term "in event" is unclear to this reader.
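As far as the code shows, "in event holder" means the fsnotify_event_holder embedded in each struct fsnotify_event (event->holder). An event can sit on one group's queue using that embedded holder at no cost, and only further queues need a separately allocated holder. Below is a simplified single-threaded sketch of that choice, with hypothetical names and no locking:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

struct demo_event;

struct demo_holder {
	struct demo_event *event;
	bool queued;	/* stands in for !list_empty(&holder->event_list) */
	bool embedded;	/* true only for the holder inside the event */
};

struct demo_event {
	struct demo_holder holder;	/* the "in event" holder */
};

static void demo_event_init(struct demo_event *ev)
{
	ev->holder.event = ev;
	ev->holder.queued = false;
	ev->holder.embedded = true;
}

/* Use the event's own holder if it is free, otherwise allocate one. */
static struct demo_holder *demo_get_holder(struct demo_event *ev)
{
	struct demo_holder *h;

	if (!ev->holder.queued) {
		h = &ev->holder;
	} else {
		h = malloc(sizeof(*h));
		if (!h)
			return NULL;
		h->embedded = false;
	}
	h->event = ev;
	h->queued = true;
	return h;
}

/* On dequeue, only separately allocated holders are freed. */
static void demo_put_holder(struct demo_holder *h)
{
	if (h->embedded)
		h->queued = false;
	else
		free(h);
}
```

The race the quoted comment describes arises because the "is the embedded holder free?" check happens before the lock is taken.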

> +	if (!list_empty(&event->holder.event_list)) {
> +alloc_holder:
> +		holder = alloc_event_holder();
> +		if (!holder)
> +			return -ENOMEM;
> +	}
> +
> +	mutex_lock(&group->notification_mutex);
> +
> +	if (group->q_len >= group->max_events)
> +		event = &q_overflow_event;
> +
> +	spin_lock(&event->lock);
> +
> +	if (list_empty(&event->holder.event_list)) {
> +		if (unlikely(holder))
> +			fsnotify_destroy_event_holder(holder);
> +		holder = &event->holder;
> +	} else if (unlikely(!holder)) {
> +		/* between the time we checked above and got the lock the in
> +		 * event holder was used, go back and get a new one */
> +		spin_unlock(&event->lock);
> +		mutex_unlock(&group->notification_mutex);
> +		goto alloc_holder;
> +	}
> +
> +	if (!list_empty(list)) {
> +		last_holder = list_entry(list->prev, struct fsnotify_event_holder, event_list);
> +		last_event = last_holder->event;
> +		if (event_compare(last_event, event)) {
> +			spin_unlock(&event->lock);
> +			mutex_unlock(&group->notification_mutex);
> +			if (holder != &event->holder)
> +				fsnotify_destroy_event_holder(holder);
> +			return 0;
> +		}
> +	}
> +
> +	group->q_len++;
> +	holder->event = event;
> +
> +	fsnotify_get_event(event);
> +	list_add_tail(&holder->event_list, list);
> +	spin_unlock(&event->lock);
> +	mutex_unlock(&group->notification_mutex);
> +
> +	wake_up(&group->notification_waitq);
> +	return 0;
> +}
> +
>
> ...
>



* Re: [PATCH -V2 09/13] fsnotify: add correlations between events
  2009-03-27 20:05 ` [PATCH -V2 09/13] fsnotify: add correlations between events Eric Paris
@ 2009-04-07 23:06   ` Andrew Morton
  0 siblings, 0 replies; 26+ messages in thread
From: Andrew Morton @ 2009-04-07 23:06 UTC (permalink / raw)
  To: Eric Paris; +Cc: linux-kernel, viro, hch, alan, sfr, john, rlove

On Fri, 27 Mar 2009 16:05:54 -0400
Eric Paris <eparis@redhat.com> wrote:

> inotify sends userspace a correlation between events when they are related
> (aka when dentries are moved).  This adds that same support for all
> fsnotify events.
> 

So is this a new userspace-visible feature?

>
> ...
>
> +static atomic_t fsnotify_sync_cookie = ATOMIC_INIT(0);
> +
> +u32 fsnotify_get_cookie(void)
> +{
> +	return atomic_inc_return(&fsnotify_sync_cookie);
> +}
> +EXPORT_SYMBOL_GPL(fsnotify_get_cookie);

Please make a policy of documenting the global symbols.  Especially the
exported-to-modules symbols.  At least.
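For instance, a kernel-doc header for fsnotify_get_cookie() might say what the value is for and what callers can rely on. The sketch below uses userspace C11 atomics. demo_get_cookie() and the comment wording are illustrative, not the patch's:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Static storage: zero-initialised, which is a valid atomic state. */
static atomic_uint demo_sync_cookie;

/**
 * demo_get_cookie - allocate a cookie for correlating related events
 *
 * Each call returns a distinct value (until the counter wraps), so that
 * related events, e.g. the two halves of a rename, can be paired by
 * userspace.  Safe to call concurrently.
 */
static uint32_t demo_get_cookie(void)
{
	/* atomic_fetch_add returns the old value; +1 mirrors atomic_inc_return() */
	return atomic_fetch_add(&demo_sync_cookie, 1) + 1;
}
```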

>
> ...
>



* Re: [PATCH -V2 12/13] fsnotify: handle filesystem unmounts with fsnotify marks
  2009-03-27 20:06 ` [PATCH -V2 12/13] fsnotify: handle filesystem unmounts with fsnotify marks Eric Paris
@ 2009-04-07 23:06   ` Andrew Morton
  0 siblings, 0 replies; 26+ messages in thread
From: Andrew Morton @ 2009-04-07 23:06 UTC (permalink / raw)
  To: Eric Paris; +Cc: linux-kernel, viro, hch, alan, sfr, john, rlove

On Fri, 27 Mar 2009 16:06:11 -0400
Eric Paris <eparis@redhat.com> wrote:

> When an fs is unmounted with an fsnotify mark entry attached to one of its
> inodes we need to destroy that mark entry and we also (like inotify) send
> an unmount event.
> 
>
> ...
>
> +/**
> + * fsnotify_unmount_inodes - an sb is unmounting.  handle any watched inodes.
> + * @list: list of inodes being unmounted (sb->s_inodes)
> + *
> + * Called with inode_lock held, protecting the unmounting super block's list
> + * of inodes, and with iprune_mutex held, keeping shrink_icache_memory() at bay.
> + * We temporarily drop inode_lock, however, and CAN block.
> + */
> +void fsnotify_unmount_inodes(struct list_head *list)
> +{
> +	struct inode *inode, *next_i, *need_iput = NULL;
> +
> +	list_for_each_entry_safe(inode, next_i, list, i_sb_list) {
> +		struct inode *need_iput_tmp;
> +
> +		/*
> +		 * If i_count is zero, the inode cannot have any watches and
> +		 * doing an __iget/iput with MS_ACTIVE clear would actually
> +		 * evict all inodes with zero i_count from icache which is
> +		 * unnecessarily violent and may in fact be illegal to do.
> +		 */
> +		if (!atomic_read(&inode->i_count))
> +			continue;
> +
> +		/*
> +		 * We cannot __iget() an inode in state I_CLEAR, I_FREEING, or
> +		 * I_WILL_FREE which is fine because by that point the inode
> +		 * cannot have any associated watches.
> +		 */
> +		if (inode->i_state & (I_CLEAR | I_FREEING | I_WILL_FREE))
> +			continue;
> +
> +		need_iput_tmp = need_iput;
> +		need_iput = NULL;
> +
> +		/* In case fsnotify_inode_delete() drops a reference. */
> +		if (inode != need_iput_tmp)
> +			__iget(inode);
> +		else
> +			need_iput_tmp = NULL;
> +
> +		/* In case the dropping of a reference would nuke next_i. */
> +		if ((&next_i->i_sb_list != list) &&
> +		    atomic_read(&next_i->i_count) &&
> +		    !(next_i->i_state & (I_CLEAR | I_FREEING | I_WILL_FREE))) {
> +			__iget(next_i);
> +			need_iput = next_i;
> +		}
> +
> +		/*
> +		 * We can safely drop inode_lock here because we hold
> +		 * references on both inode and next_i.  Also no new inodes
> +		 * will be added since the umount has begun.  Finally,
> +		 * iprune_mutex keeps shrink_icache_memory() away.
> +		 */
> +		spin_unlock(&inode_lock);
> +
> +		if (need_iput_tmp)
> +			iput(need_iput_tmp);

iput(NULL) is legal.
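Because iput() returns early on a NULL inode, the `if (need_iput_tmp)` guard is redundant, and a bare `iput(need_iput_tmp);` would do. Here is the same convention in a hypothetical userspace put helper (demo_obj and demo_put() are illustrative only):

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical refcounted object; only the put path matters here. */
struct demo_obj {
	int refcount;
	int *freed;	/* set when the object is released, for testing */
};

/* Like iput() (and free()), accept NULL so callers need no guard. */
static void demo_put(struct demo_obj *o)
{
	if (!o)
		return;
	if (--o->refcount == 0) {
		*o->freed = 1;
		free(o);
	}
}
```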

> +		/* for each watch, send FS_UNMOUNT and then remove it */
> +		fsnotify(inode, FS_UNMOUNT, inode, FSNOTIFY_EVENT_INODE, NULL, 0);
> +
> +		fsnotify_inode_delete(inode);
> +
> +		iput(inode);
> +
> +		spin_lock(&inode_lock);
> +	}
> +}
> +EXPORT_SYMBOL_GPL(fsnotify_unmount_inodes);

Why is it exported?


* Re: [PATCH -V2 13/13] inotify: reimplement inotify using fsnotify
  2009-03-27 20:06 ` [PATCH -V2 13/13] inotify: reimplement inotify using fsnotify Eric Paris
@ 2009-04-07 23:06   ` Andrew Morton
  0 siblings, 0 replies; 26+ messages in thread
From: Andrew Morton @ 2009-04-07 23:06 UTC (permalink / raw)
  To: Eric Paris; +Cc: linux-kernel, viro, hch, alan, sfr, john, rlove

On Fri, 27 Mar 2009 16:06:17 -0400
Eric Paris <eparis@redhat.com> wrote:

> Reimplement inotify_user using fsnotify.  This should be feature for feature
> exactly the same as the original inotify_user.  This does not make any changes
> to the in kernel inotify feature used by audit.  Those patches (and the eventual
> removal of in kernel inotify) will come after the new inotify_user proves to be
> working correctly.
> 
>
> ...
>
> +static inline __u32 inotify_arg_to_mask(u32 arg)
> +{
> +	__u32 mask;
> +
> +	/* FS_* damn sure better equal IN_* */
> +	BUILD_BUG_ON(IN_ACCESS != FS_ACCESS);
> +	BUILD_BUG_ON(IN_MODIFY != FS_MODIFY);
> +	BUILD_BUG_ON(IN_ATTRIB != FS_ATTRIB);
> +	BUILD_BUG_ON(IN_CLOSE_WRITE != FS_CLOSE_WRITE);
> +	BUILD_BUG_ON(IN_CLOSE_NOWRITE != FS_CLOSE_NOWRITE);
> +	BUILD_BUG_ON(IN_OPEN != FS_OPEN);
> +	BUILD_BUG_ON(IN_MOVED_FROM != FS_MOVED_FROM);
> +	BUILD_BUG_ON(IN_MOVED_TO != FS_MOVED_TO);
> +	BUILD_BUG_ON(IN_CREATE != FS_CREATE);
> +	BUILD_BUG_ON(IN_DELETE != FS_DELETE);
> +	BUILD_BUG_ON(IN_DELETE_SELF != FS_DELETE_SELF);
> +	BUILD_BUG_ON(IN_MOVE_SELF != FS_MOVE_SELF);
> +	BUILD_BUG_ON(IN_Q_OVERFLOW != FS_Q_OVERFLOW);
> +
> +	BUILD_BUG_ON(IN_UNMOUNT != FS_UNMOUNT);
> +	BUILD_BUG_ON(IN_ISDIR != FS_IN_ISDIR);
> +	BUILD_BUG_ON(IN_IGNORED != FS_IN_IGNORED);
> +	BUILD_BUG_ON(IN_ONESHOT != FS_IN_ONESHOT);

These checks can be placed anywhere.  Putting them in a header file
means that they are performed multiple times per build and slows the
build a bit.
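In the kernel, that would mean moving the BUILD_BUG_ON()s into a function in a single .c file, e.g. the subsystem's init function. The sketch below shows the same idea in userspace C11, where _Static_assert can sit at file scope. It takes the IN_* values from <sys/inotify.h>. The FS_* values are local copies for illustration, and demo_inotify_arg_to_mask() is a hypothetical mirror of inotify_arg_to_mask():

```c
#include <assert.h>
#include <stdint.h>
#include <sys/inotify.h>

/* Local copies of a few fsnotify bits, for illustration only. */
#define FS_ACCESS		0x00000001u
#define FS_MODIFY		0x00000002u
#define FS_CREATE		0x00000100u
#define FS_DELETE		0x00000200u
#define FS_IN_IGNORED		0x00008000u
#define FS_EVENT_ON_CHILD	0x08000000u

/* File-scope checks: compiled once, in the one .c file that holds them. */
_Static_assert(IN_ACCESS == FS_ACCESS, "IN_ACCESS != FS_ACCESS");
_Static_assert(IN_MODIFY == FS_MODIFY, "IN_MODIFY != FS_MODIFY");
_Static_assert(IN_CREATE == FS_CREATE, "IN_CREATE != FS_CREATE");
_Static_assert(IN_DELETE == FS_DELETE, "IN_DELETE != FS_DELETE");
_Static_assert(IN_IGNORED == FS_IN_IGNORED, "IN_IGNORED != FS_IN_IGNORED");

static uint32_t demo_inotify_arg_to_mask(uint32_t arg)
{
	/* every watch accepts its own IGNORED event and cares about children */
	return FS_IN_IGNORED | FS_EVENT_ON_CHILD |
	       (arg & (IN_ALL_EVENTS | IN_ONESHOT));
}
```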

> +	/* everything should accept their own ignored and cares about children */
> +	mask = (FS_IN_IGNORED | FS_EVENT_ON_CHILD);
> +
> +	/* mask off the flags used to open the fd */
> +	mask |= (arg & (IN_ALL_EVENTS | IN_ONESHOT));
> +
> +	return mask;
> +}
> +
> +static inline u32 inotify_mask_to_arg(__u32 mask)
> +{
> +	u32 arg;
> +
> +	arg = (mask & (IN_ALL_EVENTS | IN_ISDIR | IN_UNMOUNT | IN_IGNORED | IN_Q_OVERFLOW));
> +
> +	return arg;
> +}

	return mask & (IN_ALL_EVENTS|IN_ISDIR|IN_UNMOUNT|IN_IGNORED|
			IN_Q_OVERFLOW);

would suffice.

>
> ...
>
> --- /dev/null
> +++ b/fs/notify/inotify/inotify_fsnotify.c
> @@ -0,0 +1,156 @@
> +/*
> + * fs/inotify_user.c - inotify support for userspace
> + *
> + * Authors:
> + *	John McCutchan	<ttb@tentacle.dhs.org>
> + *	Robert Love	<rml@novell.com>
> + *
> + * Copyright (C) 2005 John McCutchan
> + * Copyright 2006 Hewlett-Packard Development Company, L.P.
> + *
> + * Copyright (C) 2009 Eric Paris <Red Hat Inc>
> + * inotify was largely rewriten to make use of the fsnotify infrastructure
> + *
> + * This program is free software; you can redistribute it and/or modify it
> + * under the terms of the GNU General Public License as published by the
> + * Free Software Foundation; either version 2, or (at your option) any
> + * later version.
> + *
> + * This program is distributed in the hope that it will be useful, but
> + * WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> + * General Public License for more details.
> + */
> +
> +#include <linux/kernel.h>
> +#include <linux/sched.h>
> +#include <linux/slab.h>
> +#include <linux/fs.h>
> +#include <linux/file.h>
> +#include <linux/limits.h>
> +#include <linux/module.h>
> +#include <linux/mount.h>
> +#include <linux/namei.h>
> +#include <linux/poll.h>
> +#include <linux/idr.h>
> +#include <linux/init.h>
> +#include <linux/inotify.h>
> +#include <linux/list.h>
> +#include <linux/syscalls.h>
> +#include <linux/string.h>
> +#include <linux/magic.h>
> +#include <linux/writeback.h>
> +
> +#include "inotify.h"
> +
> +#include <asm/ioctls.h>

Is this include needed?
>
> ...
>
> +void __inotify_free_event_priv(struct inotify_event_private_data *event_priv)
> +{
> +	list_del_init(&event_priv->fsnotify_event_priv_data.event_list);
> +	kmem_cache_free(event_priv_cachep, event_priv);
> +}

Locking for this?  Seems to be event->lock, but inotify_handle_event()
calls this without locks.

>
> ...
>
> --- /dev/null
> +++ b/fs/notify/inotify/inotify_kernel.c
> @@ -0,0 +1,276 @@
> +/*
> + * fs/inotify_user.c - inotify support for userspace
> + *
> + * Authors:
> + *	John McCutchan	<ttb@tentacle.dhs.org>
> + *	Robert Love	<rml@novell.com>
> + *
> + * Copyright (C) 2005 John McCutchan
> + * Copyright 2006 Hewlett-Packard Development Company, L.P.
> + *
> + * Copyright (C) 2009 Eric Paris <Red Hat Inc>
> + * inotify was largely rewriten to make use of the fsnotify infrastructure
> + *
> + * This program is free software; you can redistribute it and/or modify it
> + * under the terms of the GNU General Public License as published by the
> + * Free Software Foundation; either version 2, or (at your option) any
> + * later version.
> + *
> + * This program is distributed in the hope that it will be useful, but
> + * WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> + * General Public License for more details.
> + */
> +
> +#include <linux/kernel.h>
> +#include <linux/sched.h>
> +#include <linux/slab.h>
> +#include <linux/fs.h>
> +#include <linux/file.h>
> +#include <linux/limits.h>
> +#include <linux/module.h>
> +#include <linux/mount.h>
> +#include <linux/namei.h>
> +#include <linux/poll.h>
> +#include <linux/idr.h>
> +#include <linux/init.h>
> +#include <linux/inotify.h>
> +#include <linux/list.h>
> +#include <linux/syscalls.h>
> +#include <linux/string.h>
> +#include <linux/magic.h>
> +#include <linux/writeback.h>
> +
> +#include "inotify.h"
> +
> +#include <asm/ioctls.h>

Needed?

> +static struct kmem_cache *inotify_inode_mark_cachep __read_mostly;
> +struct kmem_cache *event_priv_cachep __read_mostly;
> +static struct fsnotify_event *inotify_ignored_event;
> +
> +atomic_t inotify_grp_num;

In some places you initialise static atomic_t's with ATOMIC_INIT(0). 
In others, not.

We seem to have given up on always initialising these things, so
omitting the initialiser is OK.

> +/*
> + * find_inode - resolve a user-given path to a specific inode
> + */
> +int find_inode(const char __user *dirname, struct path *path, unsigned flags)

A poorly-chosen global identifier.

> +{
> +	int error;
> +
> +	error = user_path_at(AT_FDCWD, dirname, flags, path);
> +	if (error)
> +		return error;
> +	/* you can only watch an inode if you have read permissions on it */
> +	error = inode_permission(path->dentry->d_inode, MAY_READ);
> +	if (error)
> +		path_put(path);
> +	return error;
> +}
> +
>
> ...
>
> +/* ding dong the mark is dead */
> +static void inotify_free_mark(struct fsnotify_mark_entry *entry)
> +{
> +	struct inotify_inode_mark_entry *ientry = (struct inotify_inode_mark_entry *)entry;

container_of(), please.
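A cast only works while the embedded fsnotify_mark_entry stays the first member of inotify_inode_mark_entry. container_of() computes the enclosing pointer from the member's offset, so it keeps working if the layout changes. Below is a userspace sketch with hypothetical structs in which the embedded member deliberately is not first:

```c
#include <assert.h>
#include <stddef.h>

/* Userspace equivalent of the kernel's container_of() (minus its type check). */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Simplified, hypothetical stand-ins for the two structures. */
struct demo_mark_entry {
	int refcnt;
};

struct demo_inode_mark_entry {
	int wd;				/* deliberately placed first */
	struct demo_mark_entry fsn_entry;
};

static struct demo_inode_mark_entry *to_ientry(struct demo_mark_entry *entry)
{
	return container_of(entry, struct demo_inode_mark_entry, fsn_entry);
}
```

With this layout a plain cast would return a pointer offset by sizeof(int), and container_of() returns the right one.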

> +
> +	kmem_cache_free(inotify_inode_mark_cachep, ientry);
> +}
> +
>
> ...
>
> +static int __init inotify_kernel_setup(void)
> +{
> +	inotify_inode_mark_cachep = kmem_cache_create("inotify_mark_entry",
> +					sizeof(struct inotify_inode_mark_entry),
> +					0, SLAB_PANIC, NULL);
> +	event_priv_cachep = kmem_cache_create("inotify_event_priv_cache",
> +					sizeof(struct inotify_event_private_data),
> +					0, SLAB_PANIC, NULL);

KMEM_CACHE()?
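KMEM_CACHE(struct_name, flags) derives the cache name, object size and alignment from the struct itself, e.g. `KMEM_CACHE(inotify_inode_mark_entry, SLAB_PANIC)`. The resulting cache would be named after the struct rather than "inotify_mark_entry". The sketch below shows the same stringizing/sizeof trick in userspace. It uses a hypothetical descriptor struct in place of a kmem_cache_create() call:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical descriptor holding what kmem_cache_create() would be given. */
struct demo_cache_desc {
	const char *name;
	size_t size;
	size_t align;
	unsigned long flags;
};

/* Same shape as KMEM_CACHE(): name, size and alignment all come from the
 * struct tag, so they cannot drift apart. */
#define DEMO_KMEM_CACHE(__struct, __flags)				\
	((struct demo_cache_desc){ #__struct, sizeof(struct __struct),	\
				   _Alignof(struct __struct), (__flags) })

struct demo_inode_mark_entry {
	int wd;
	void *inode;
};
```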

> +	inotify_ignored_event = fsnotify_create_event(NULL, FS_IN_IGNORED, NULL, FSNOTIFY_EVENT_INODE, NULL, 0);
> +	if (!inotify_ignored_event)
> +		panic("unable to allocate the inotify ignored event\n");
> +	return 0;
> +}
>
> ...
>



* Re: [PATCH -V2 02/13] fsnotify: unified filesystem notification backend
  2009-04-07 23:05   ` Andrew Morton
@ 2009-04-08  0:37     ` Eric Paris
  0 siblings, 0 replies; 26+ messages in thread
From: Eric Paris @ 2009-04-08  0:37 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-kernel, viro, hch, alan, sfr, john, rlove

On Tue, 2009-04-07 at 16:05 -0700, Andrew Morton wrote:
> On Fri, 27 Mar 2009 16:05:15 -0400
> Eric Paris <eparis@redhat.com> wrote:
> 
> > fsnotify paves the way for fanotify.
> 
> It'd kinda help if the changelog were to tell us what fanotify is.

In the next patch I'm sorta inclined to drop all references to fanotify,
since any hint of fanotify/TALPA/anti-malware leaves a bad taste in
people's mouths.  This patch set is an improvement by itself.  It gives
us a wonderful plugin mechanism I can use to build fanotify (the
mechanism in fsnotify is based on the old fanotify code; using this
foundation I can do the whole fanotify implementation in less than 1100
lines), but we aren't committing to anything.  These 13 patches (plus a
couple more for audit) should be taken on their own.  They stand alone
as an improvement.

But I'll explain fanotify right here.  fanotify is different for 2 main
reasons.  1) There isn't the huge race between the event and the
notification.  When an "event" is given to an fanotify listener it comes
with a magically opened file descriptor in the context of the listener.
With inotify I either have to keep the fd open for everything I'm
listening to or watch the parent directory and based on the name in the
inotify event try to open the file in question some time after the
original open.  By that time the file could have been moved, deleted, or
who knows what.

2) inotify and dnotify are both based on naming the couple of specific
files or directories you care about.  fanotify is about giving you every
single event on the system and letting you instead describe what you do
NOT care about.

I've got 3 users for fanotify.  The first user is the AV community.  The
ones who pushed hard enough that I'm writing all of this.  But we have 2
other users who are interested.

boot profiling: they currently get open/close/read/write type info by
hijacking audit.  fanotify fits perfectly.

the desktop search/filesystem indexer group has also expressed interest
http://lkml.org/lkml/2009/3/27/166 and it's simple to do what they want.

> I'm inclined to merge this patch series if only to get us an
> inotify/dnotify maintainer ;)

That's in here, and we actually have an inotify patch floating in the
ether I'm going to just stick into this series to see if anyone
notices....   :)

> General comment on the patches: complex.  I found them depressingly
> hard to understand (and hence review) - the lack of high-level
> commentary in the code is pretty severe.  There's a nice-looking design
> document there, but like everyone else, I didn't look at it much ;)
> It's not really a successful substitute for carefully-chosen comments
> at the appropriate codesites and data structures.

I'll make another pass based on your comments to the rest of the series
and try to throw more random comments even when you didn't ask.  I
realize I need to make it accessible to more than just Al.

-Eric



* Re: [PATCH -V2 01/13] mutex: add atomic_dec_and_mutex_lock
  2009-04-07 23:06 ` [PATCH -V2 01/13] mutex: add atomic_dec_and_mutex_lock Andrew Morton
@ 2009-04-28 20:08   ` Andrew Morton
  0 siblings, 0 replies; 26+ messages in thread
From: Andrew Morton @ 2009-04-28 20:08 UTC (permalink / raw)
  To: eparis, linux-kernel, viro, hch, alan, sfr, john, rlove

On Tue, 7 Apr 2009 16:06:01 -0700
Andrew Morton <akpm@linux-foundation.org> wrote:

> On Fri, 27 Mar 2009 16:05:08 -0400
> Eric Paris <eparis@redhat.com> wrote:
> 
> > Much like the atomic_dec_and_lock() function in which we take and hold a
> > spin_lock if we drop the atomic to 0 this function takes and holds the
> > mutex if we dec the atomic to 0.
> > 
> > Signed-off-by: Eric Paris <eparis@redhat.com>
> > Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
> > Cc: Paul Mackerras <paulus@samba.org>
> > LKML-Reference: <20090323172417.410913479@chello.nl>
> > Signed-off-by: Ingo Molnar <mingo@elte.hu>
> > ---
> > 
> >  include/linux/mutex.h |   23 +++++++++++++++++++++++
> >  1 files changed, 23 insertions(+), 0 deletions(-)
> > 
> > diff --git a/include/linux/mutex.h b/include/linux/mutex.h
> > index 3069ec7..93054fc 100644
> > --- a/include/linux/mutex.h
> > +++ b/include/linux/mutex.h
> > @@ -151,4 +151,27 @@ extern int __must_check mutex_lock_killable(struct mutex *lock);
> >  extern int mutex_trylock(struct mutex *lock);
> >  extern void mutex_unlock(struct mutex *lock);
> >  
> > +/**
> > + * atomic_dec_and_mutex_lock - return holding mutex if we dec to 0
> > + * @cnt: the atomic which we are to dec
> > + * @lock: the mutex to return holding if we dec to 0
> > + *
> > + * return true and hold lock if we dec to 0, return false otherwise
> > + */
> > +static inline int atomic_dec_and_mutex_lock(atomic_t *cnt, struct mutex *lock)
> > +{
> > +	/* dec if we can't possibly hit 0 */
> > +	if (atomic_add_unless(cnt, -1, 1))
> > +		return 0;
> > +	/* we might hit 0, so take the lock */
> > +	mutex_lock(lock);
> > +	if (!atomic_dec_and_test(cnt)) {
> > +		/* when we actually did the dec, we didn't hit 0 */
> > +		mutex_unlock(lock);
> > +		return 0;
> > +	}
> > +	/* we hit 0, and we hold the lock */
> > +	return 1;
> > +}
> > +
> 
> This looks too large to be inlined?

It still looks too large to be inlined.

Take a look at atomic_add_unless(), and split your sides laughing.

Once you factor in all the lockdep and other debug goop, the code
generation here is pretty bewildering, but it won't be small.
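Andrew's size argument is easier to see once atomic_add_unless() is spelled out. The sketch below is a userspace analogue, not the kernel code. C11 <stdatomic.h> and POSIX mutexes stand in for atomic_t and struct mutex, and the names add_unless() and dec_and_mutex_lock() are hypothetical. It keeps the patch's logic but moves the function out of line so that a header would only carry a declaration. That is one possible answer to the "too large to be inlined" objection. Even the "unless" helper is a compare-and-swap retry loop, and in the kernel mutex_lock() also pulls in the lockdep and debug code Andrew mentions.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

/* What a header would carry: a declaration only, no inline body. */
int dec_and_mutex_lock(atomic_int *cnt, pthread_mutex_t *lock);

/*
 * Hypothetical stand-in for atomic_add_unless(): add @a to *@v unless
 * *@v == @u.  Returns true if the add was performed.  Note that this
 * is a compare-and-swap retry loop, not a single instruction.
 */
static bool add_unless(atomic_int *v, int a, int u)
{
	int c = atomic_load(v);

	while (c != u) {
		/* On failure, c is reloaded with the current value. */
		if (atomic_compare_exchange_weak(v, &c, c + a))
			return true;
	}
	return false;
}

/*
 * Out-of-line analogue of atomic_dec_and_mutex_lock(): returns 1 with
 * @lock held if the decrement reached 0, and 0 otherwise.
 */
int dec_and_mutex_lock(atomic_int *cnt, pthread_mutex_t *lock)
{
	/* Fast path: decrement without the lock if we can't hit 0. */
	if (add_unless(cnt, -1, 1))
		return 0;

	/* The count was 1, so we might hit 0: take the lock first. */
	pthread_mutex_lock(lock);
	if (atomic_fetch_sub(cnt, 1) != 1) {
		/* Someone took a reference meanwhile; we didn't hit 0. */
		pthread_mutex_unlock(lock);
		return 0;
	}
	/* We hit 0 and return holding the lock. */
	return 1;
}
```

A caller that drops the last reference gets 1 back and must tear down the object and then unlock the mutex itself. Every other caller gets 0, and callers whose decrement cannot reach zero never touch the lock.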



end of thread, other threads:[~2009-04-28 20:14 UTC | newest]

Thread overview: 26+ messages
2009-03-27 20:05 [PATCH -V2 01/13] mutex: add atomic_dec_and_mutex_lock Eric Paris
2009-03-27 20:05 ` [PATCH -V2 02/13] fsnotify: unified filesystem notification backend Eric Paris
2009-04-07 23:05   ` Andrew Morton
2009-04-08  0:37     ` Eric Paris
2009-04-07 23:06   ` Andrew Morton
2009-03-27 20:05 ` [PATCH -V2 03/13] fsnotify: add group priorities Eric Paris
2009-04-07 23:06   ` Andrew Morton
2009-03-27 20:05 ` [PATCH -V2 04/13] fsnotify: add in inode fsnotify markings Eric Paris
2009-04-07 23:06   ` Andrew Morton
2009-03-27 20:05 ` [PATCH -V2 05/13] fsnotify: parent event notification Eric Paris
2009-04-07 23:06   ` Andrew Morton
2009-03-27 20:05 ` [PATCH -V2 06/13] dnotify: reimplement dnotify using fsnotify Eric Paris
2009-04-07 23:06   ` Andrew Morton
2009-03-27 20:05 ` [PATCH -V2 07/13] fsnotify: generic notification queue and waitq Eric Paris
2009-04-07 23:06   ` Andrew Morton
2009-03-27 20:05 ` [PATCH -V2 08/13] fsnotify: include pathnames with entries when possible Eric Paris
2009-03-27 20:05 ` [PATCH -V2 09/13] fsnotify: add correlations between events Eric Paris
2009-04-07 23:06   ` Andrew Morton
2009-03-27 20:06 ` [PATCH -V2 10/13] fsnotify: allow groups to add private data to events Eric Paris
2009-03-27 20:06 ` [PATCH -V2 11/13] fsnotify: fsnotify marks on inodes pin them in core Eric Paris
2009-03-27 20:06 ` [PATCH -V2 12/13] fsnotify: handle filesystem unmounts with fsnotify marks Eric Paris
2009-04-07 23:06   ` Andrew Morton
2009-03-27 20:06 ` [PATCH -V2 13/13] inotify: reimplement inotify using fsnotify Eric Paris
2009-04-07 23:06   ` Andrew Morton
2009-04-07 23:06 ` [PATCH -V2 01/13] mutex: add atomic_dec_and_mutex_lock Andrew Morton
2009-04-28 20:08   ` Andrew Morton
