All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/9] fsnotify: change locking order
@ 2011-06-14 15:29 Lino Sanfilippo
  2011-06-14 15:29 ` [PATCH 1/9] inotify, fanotify: replace fsnotify_put_group() with fsnotify_destroy_group() Lino Sanfilippo
                   ` (9 more replies)
  0 siblings, 10 replies; 20+ messages in thread
From: Lino Sanfilippo @ 2011-06-14 15:29 UTC (permalink / raw)
  To: eparis; +Cc: linux-fsdevel, viro


Hi Eric,

After the patch series "decouple mark lock and marks fsobject lock" 
I sent on Feb. 21 this is another attempt to change the locking order
used in fsnotify from 

	mark->lock
	group->mark_lock
	inode/mnt->lock
to 
	group->mark_lock
	mark->lock
	inode/mnt->lock

The former series still contained some flaws:
1. inodes/mounts that have marks and are not pinned could dissappear while 
another thread still wants to access a marks inode/mount (see https://lkml.org/lkml/2011/3/1/373)
2. in the new introduced function remove_mark_locked() a group could be freed 
while the groups mark mutex is held

Both issues should be fixed with this series.

The reason for changing the locking order is, as you may remember, that there are 
still some races when adding/removing marks to a group 
(see also https://lkml.org/lkml/2010/11/30/189).

So what this series does is change the locking order (PATCH 4) to

	group->mark_mutex
	mark->lock
	inode/mnt->lock

Beside this the group mark_lock is turned into a mutex (PATCH 6), which allows us to 
call functions that may sleep while this lock is held. 

At some places there is only a mark and no group available 
(i.e. clear_marks_by_[inode|mount]()), so we first have to get the group from the mark
to use the group->mark_mutex (PATCH 7).  

Since we cant get a group from a mark and use it without the danger that the group is 
unregistered and freed, reference counting for groups is introduced and used 
(PATCHES 1 to 3) for each mark that is added to the group. 
With this freeing a group does not longer depend on the number of marks, but is 
simply destroyed when the reference counter hits 0.

Since a group ref is now taken for each mark that is added to the group we first
have to explicitly call fsnotify_destroy_group() to remove all marks from the group
and release all group references held by those marks. This will also release the
- possibly final - ref of the group held by the caller (PATCH 1).

Furthermore we now can take the mark lock with the group mutex held so we dont need an
extra "clear list" any more if we clear all marks by group (PATCH 9).

For [add|remove]_mark() locked versions are introduced (PATCH 8) that can be 
used if a caller already has the mark lock held. Thus we now have the possibility
to use the groups mark mutex to also synchronize addition/removal of a mark or to
replace the dnotify mutex. 
This is not part of these patches. It would be the next step if these patches are 
accepted to be merged.

This is against 2.6.39














	

^ permalink raw reply	[flat|nested] 20+ messages in thread

* [PATCH 1/9] inotify, fanotify: replace fsnotify_put_group() with fsnotify_destroy_group()
  2011-06-14 15:29 [PATCH 0/9] fsnotify: change locking order Lino Sanfilippo
@ 2011-06-14 15:29 ` Lino Sanfilippo
  2011-06-14 15:29 ` [PATCH 2/9] fsnotify: introduce fsnotify_get_group() Lino Sanfilippo
                   ` (8 subsequent siblings)
  9 siblings, 0 replies; 20+ messages in thread
From: Lino Sanfilippo @ 2011-06-14 15:29 UTC (permalink / raw)
  To: eparis; +Cc: linux-fsdevel, viro, Lino Sanfilippo

Currently in fsnotify_put_group() the ref count of a group is decremented and if
it becomes 0 fsnotify_destroy_group() is called. Since a groups ref count is only
at group creation set to 1 and never increased after that a call to fsnotify_put_group()
always results in a call to fsnotify_destroy_group().
With this patch fsnotify_destroy_group() is called directly.

Signed-off-by: Lino Sanfilippo <LinoSanfilippo@gmx.de>
---
 fs/notify/fanotify/fanotify_user.c |   14 +++++++-------
 fs/notify/group.c                  |    2 +-
 fs/notify/inotify/inotify_user.c   |    8 +++-----
 include/linux/fsnotify_backend.h   |    3 ++-
 4 files changed, 13 insertions(+), 14 deletions(-)

diff --git a/fs/notify/fanotify/fanotify_user.c b/fs/notify/fanotify/fanotify_user.c
index 9fde1c0..a85752d 100644
--- a/fs/notify/fanotify/fanotify_user.c
+++ b/fs/notify/fanotify/fanotify_user.c
@@ -417,7 +417,7 @@ static int fanotify_release(struct inode *ignored, struct file *file)
 	wake_up(&group->fanotify_data.access_waitq);
 #endif
 	/* matches the fanotify_init->fsnotify_alloc_group */
-	fsnotify_put_group(group);
+	fsnotify_destroy_group(group);
 
 	return 0;
 }
@@ -730,13 +730,13 @@ SYSCALL_DEFINE2(fanotify_init, unsigned int, flags, unsigned int, event_f_flags)
 		break;
 	default:
 		fd = -EINVAL;
-		goto out_put_group;
+		goto out_destroy_group;
 	}
 
 	if (flags & FAN_UNLIMITED_QUEUE) {
 		fd = -EPERM;
 		if (!capable(CAP_SYS_ADMIN))
-			goto out_put_group;
+			goto out_destroy_group;
 		group->max_events = UINT_MAX;
 	} else {
 		group->max_events = FANOTIFY_DEFAULT_MAX_EVENTS;
@@ -745,7 +745,7 @@ SYSCALL_DEFINE2(fanotify_init, unsigned int, flags, unsigned int, event_f_flags)
 	if (flags & FAN_UNLIMITED_MARKS) {
 		fd = -EPERM;
 		if (!capable(CAP_SYS_ADMIN))
-			goto out_put_group;
+			goto out_destroy_group;
 		group->fanotify_data.max_marks = UINT_MAX;
 	} else {
 		group->fanotify_data.max_marks = FANOTIFY_DEFAULT_MAX_MARKS;
@@ -753,12 +753,12 @@ SYSCALL_DEFINE2(fanotify_init, unsigned int, flags, unsigned int, event_f_flags)
 
 	fd = anon_inode_getfd("[fanotify]", &fanotify_fops, group, f_flags);
 	if (fd < 0)
-		goto out_put_group;
+		goto out_destroy_group;
 
 	return fd;
 
-out_put_group:
-	fsnotify_put_group(group);
+out_destroy_group:
+	fsnotify_destroy_group(group);
 	return fd;
 }
 
diff --git a/fs/notify/group.c b/fs/notify/group.c
index d309f38..5e638a2 100644
--- a/fs/notify/group.c
+++ b/fs/notify/group.c
@@ -50,7 +50,7 @@ void fsnotify_final_destroy_group(struct fsnotify_group *group)
  * situtation, the fsnotify_final_destroy_group will get called when that final
  * mark is freed.
  */
-static void fsnotify_destroy_group(struct fsnotify_group *group)
+void fsnotify_destroy_group(struct fsnotify_group *group)
 {
 	/* clear all inode marks for this group */
 	fsnotify_clear_marks_by_group(group);
diff --git a/fs/notify/inotify/inotify_user.c b/fs/notify/inotify/inotify_user.c
index 8445fbc..dbafbfc 100644
--- a/fs/notify/inotify/inotify_user.c
+++ b/fs/notify/inotify/inotify_user.c
@@ -293,10 +293,8 @@ static int inotify_release(struct inode *ignored, struct file *file)
 
 	pr_debug("%s: group=%p\n", __func__, group);
 
-	fsnotify_clear_marks_by_group(group);
-
 	/* free this group, matching get was inotify_init->fsnotify_obtain_group */
-	fsnotify_put_group(group);
+	fsnotify_destroy_group(group);
 
 	return 0;
 }
@@ -712,7 +710,7 @@ static struct fsnotify_group *inotify_new_group(unsigned int max_events)
 
 	if (atomic_inc_return(&group->inotify_data.user->inotify_devs) >
 	    inotify_max_user_instances) {
-		fsnotify_put_group(group);
+		fsnotify_destroy_group(group);
 		return ERR_PTR(-EMFILE);
 	}
 
@@ -741,7 +739,7 @@ SYSCALL_DEFINE1(inotify_init1, int, flags)
 	ret = anon_inode_getfd("inotify", &inotify_fops, group,
 				  O_RDONLY | flags);
 	if (ret < 0)
-		fsnotify_put_group(group);
+		fsnotify_destroy_group(group);
 
 	return ret;
 }
diff --git a/include/linux/fsnotify_backend.h b/include/linux/fsnotify_backend.h
index 69ad89b..877764a 100644
--- a/include/linux/fsnotify_backend.h
+++ b/include/linux/fsnotify_backend.h
@@ -364,7 +364,8 @@ static inline void __fsnotify_d_instantiate(struct dentry *dentry, struct inode
 extern struct fsnotify_group *fsnotify_alloc_group(const struct fsnotify_ops *ops);
 /* drop reference on a group from fsnotify_alloc_group */
 extern void fsnotify_put_group(struct fsnotify_group *group);
-
+/* destroy group */
+extern void fsnotify_destroy_group(struct fsnotify_group *group);
 /* take a reference to an event */
 extern void fsnotify_get_event(struct fsnotify_event *event);
 extern void fsnotify_put_event(struct fsnotify_event *event);
-- 
1.5.6.5


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH 2/9] fsnotify: introduce fsnotify_get_group()
  2011-06-14 15:29 [PATCH 0/9] fsnotify: change locking order Lino Sanfilippo
  2011-06-14 15:29 ` [PATCH 1/9] inotify, fanotify: replace fsnotify_put_group() with fsnotify_destroy_group() Lino Sanfilippo
@ 2011-06-14 15:29 ` Lino Sanfilippo
  2011-06-14 15:29 ` [PATCH 3/9] fsnotify: use reference counting for groups Lino Sanfilippo
                   ` (7 subsequent siblings)
  9 siblings, 0 replies; 20+ messages in thread
From: Lino Sanfilippo @ 2011-06-14 15:29 UTC (permalink / raw)
  To: eparis; +Cc: linux-fsdevel, viro, Lino Sanfilippo

Introduce fsnotify_get_group() which increments the reference counter of a group.

Signed-off-by: Lino Sanfilippo <LinoSanfilippo@gmx.de>
---
 fs/notify/group.c                |    8 ++++++++
 include/linux/fsnotify_backend.h |    4 +++-
 2 files changed, 11 insertions(+), 1 deletions(-)

diff --git a/fs/notify/group.c b/fs/notify/group.c
index 5e638a2..0fd8f5e 100644
--- a/fs/notify/group.c
+++ b/fs/notify/group.c
@@ -63,6 +63,14 @@ void fsnotify_destroy_group(struct fsnotify_group *group)
 }
 
 /*
+ * Get reference to a group.
+ */
+void fsnotify_get_group(struct fsnotify_group *group)
+{
+	atomic_inc(&group->refcnt);
+}
+
+/*
  * Drop a reference to a group.  Free it if it's through.
  */
 void fsnotify_put_group(struct fsnotify_group *group)
diff --git a/include/linux/fsnotify_backend.h b/include/linux/fsnotify_backend.h
index 877764a..073b72b 100644
--- a/include/linux/fsnotify_backend.h
+++ b/include/linux/fsnotify_backend.h
@@ -360,8 +360,10 @@ static inline void __fsnotify_d_instantiate(struct dentry *dentry, struct inode
 
 /* called from fsnotify listeners, such as fanotify or dnotify */
 
-/* get a reference to an existing or create a new group */
+/* create a new group */
 extern struct fsnotify_group *fsnotify_alloc_group(const struct fsnotify_ops *ops);
+/* get reference to a group */
+extern void fsnotify_get_group(struct fsnotify_group *group);
 /* drop reference on a group from fsnotify_alloc_group */
 extern void fsnotify_put_group(struct fsnotify_group *group);
 /* destroy group */
-- 
1.5.6.5


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH 3/9] fsnotify: use reference counting for groups
  2011-06-14 15:29 [PATCH 0/9] fsnotify: change locking order Lino Sanfilippo
  2011-06-14 15:29 ` [PATCH 1/9] inotify, fanotify: replace fsnotify_put_group() with fsnotify_destroy_group() Lino Sanfilippo
  2011-06-14 15:29 ` [PATCH 2/9] fsnotify: introduce fsnotify_get_group() Lino Sanfilippo
@ 2011-06-14 15:29 ` Lino Sanfilippo
  2011-06-14 15:29 ` [PATCH 4/9] fsnotify: take groups mark_lock before mark lock Lino Sanfilippo
                   ` (6 subsequent siblings)
  9 siblings, 0 replies; 20+ messages in thread
From: Lino Sanfilippo @ 2011-06-14 15:29 UTC (permalink / raw)
  To: eparis; +Cc: linux-fsdevel, viro, Lino Sanfilippo

Get a group ref for each mark that is added to the groups list and release that
ref when the mark is freed in fsnotify_put_mark().
We also use get a group reference for duplicated marks and for private event
data.
Now we dont free a group any more when the number of marks becomes 0 but when
the groups ref count does. Since this will only happen when all marks are removed
from a groups mark list, we dont have to set the groups number of marks to 1 at
group creation.

Beside clearing all marks in fsnotify_destroy_group() we do also flush the
groups event queue. This is since events may hold references to groups (due to
private event data) and we have to put those references first before we get a
chance to put the final ref, which will result in a call to
fsnotify_final_destroy_group().

Signed-off-by: Lino Sanfilippo <LinoSanfilippo@gmx.de>
---
 fs/notify/group.c                    |   28 ++++++++++------------------
 fs/notify/inotify/inotify_fsnotify.c |    2 ++
 fs/notify/inotify/inotify_user.c     |    1 +
 fs/notify/mark.c                     |   22 +++++++++++++---------
 4 files changed, 26 insertions(+), 27 deletions(-)

diff --git a/fs/notify/group.c b/fs/notify/group.c
index 0fd8f5e..64ad3f0 100644
--- a/fs/notify/group.c
+++ b/fs/notify/group.c
@@ -33,9 +33,6 @@
  */
 void fsnotify_final_destroy_group(struct fsnotify_group *group)
 {
-	/* clear the notification queue of all events */
-	fsnotify_flush_notify(group);
-
 	if (group->ops->free_group_priv)
 		group->ops->free_group_priv(group);
 
@@ -43,12 +40,10 @@ void fsnotify_final_destroy_group(struct fsnotify_group *group)
 }
 
 /*
- * Trying to get rid of a group.  We need to first get rid of any outstanding
- * allocations and then free the group.  Remember that fsnotify_clear_marks_by_group
- * could miss marks that are being freed by inode and those marks could still
- * hold a reference to this group (via group->num_marks)  If we get into that
- * situtation, the fsnotify_final_destroy_group will get called when that final
- * mark is freed.
+ * Trying to get rid of a group. Remove all marks, flush all events and release
+ * the group reference.
+ * Note that another thread calling fsnotify_clear_marks_by_group() may still
+ * hold a ref to the group.
  */
 void fsnotify_destroy_group(struct fsnotify_group *group)
 {
@@ -57,9 +52,10 @@ void fsnotify_destroy_group(struct fsnotify_group *group)
 
 	synchronize_srcu(&fsnotify_mark_srcu);
 
-	/* past the point of no return, matches the initial value of 1 */
-	if (atomic_dec_and_test(&group->num_marks))
-		fsnotify_final_destroy_group(group);
+	/* clear the notification queue of all events */
+	fsnotify_flush_notify(group);
+
+	fsnotify_put_group(group);
 }
 
 /*
@@ -76,7 +72,7 @@ void fsnotify_get_group(struct fsnotify_group *group)
 void fsnotify_put_group(struct fsnotify_group *group)
 {
 	if (atomic_dec_and_test(&group->refcnt))
-		fsnotify_destroy_group(group);
+		fsnotify_final_destroy_group(group);
 }
 
 /*
@@ -92,11 +88,7 @@ struct fsnotify_group *fsnotify_alloc_group(const struct fsnotify_ops *ops)
 
 	/* set to 0 when there a no external references to this group */
 	atomic_set(&group->refcnt, 1);
-	/*
-	 * hits 0 when there are no external references AND no marks for
-	 * this group
-	 */
-	atomic_set(&group->num_marks, 1);
+	atomic_set(&group->num_marks, 0);
 
 	mutex_init(&group->notification_mutex);
 	INIT_LIST_HEAD(&group->notification_list);
diff --git a/fs/notify/inotify/inotify_fsnotify.c b/fs/notify/inotify/inotify_fsnotify.c
index e3cbd74..74977fb 100644
--- a/fs/notify/inotify/inotify_fsnotify.c
+++ b/fs/notify/inotify/inotify_fsnotify.c
@@ -118,6 +118,7 @@ static int inotify_handle_event(struct fsnotify_group *group,
 
 	fsn_event_priv = &event_priv->fsnotify_event_priv_data;
 
+	fsnotify_get_group(group);
 	fsn_event_priv->group = group;
 	event_priv->wd = wd;
 
@@ -210,6 +211,7 @@ void inotify_free_event_priv(struct fsnotify_event_private_data *fsn_event_priv)
 	event_priv = container_of(fsn_event_priv, struct inotify_event_private_data,
 				  fsnotify_event_priv_data);
 
+	fsnotify_put_group(fsn_event_priv->group);
 	kmem_cache_free(event_priv_cachep, event_priv);
 }
 
diff --git a/fs/notify/inotify/inotify_user.c b/fs/notify/inotify/inotify_user.c
index dbafbfc..246250f 100644
--- a/fs/notify/inotify/inotify_user.c
+++ b/fs/notify/inotify/inotify_user.c
@@ -531,6 +531,7 @@ void inotify_ignored_and_remove_idr(struct fsnotify_mark *fsn_mark,
 
 	fsn_event_priv = &event_priv->fsnotify_event_priv_data;
 
+	fsnotify_get_group(group);
 	fsn_event_priv->group = group;
 	event_priv->wd = i_mark->wd;
 
diff --git a/fs/notify/mark.c b/fs/notify/mark.c
index 252ab1f..d246a64 100644
--- a/fs/notify/mark.c
+++ b/fs/notify/mark.c
@@ -109,8 +109,11 @@ void fsnotify_get_mark(struct fsnotify_mark *mark)
 
 void fsnotify_put_mark(struct fsnotify_mark *mark)
 {
-	if (atomic_dec_and_test(&mark->refcnt))
+	if (atomic_dec_and_test(&mark->refcnt)) {
+		if (mark->group)
+			fsnotify_put_group(mark->group);
 		mark->free_mark(mark);
+	}
 }
 
 /*
@@ -125,12 +128,13 @@ void fsnotify_destroy_mark(struct fsnotify_mark *mark)
 
 	spin_lock(&mark->lock);
 
+	fsnotify_get_group(mark->group);
 	group = mark->group;
 
 	/* something else already called this function on this mark */
 	if (!(mark->flags & FSNOTIFY_MARK_FLAG_ALIVE)) {
 		spin_unlock(&mark->lock);
-		return;
+		goto put_group;
 	}
 
 	mark->flags &= ~FSNOTIFY_MARK_FLAG_ALIVE;
@@ -180,14 +184,10 @@ void fsnotify_destroy_mark(struct fsnotify_mark *mark)
 
 	if (inode && (mark->flags & FSNOTIFY_MARK_FLAG_OBJECT_PINNED))
 		iput(inode);
+	atomic_dec(&group->num_marks);
 
-	/*
-	 * it's possible that this group tried to destroy itself, but this
-	 * this mark was simultaneously being freed by inode.  If that's the
-	 * case, we finish freeing the group here.
-	 */
-	if (unlikely(atomic_dec_and_test(&group->num_marks)))
-		fsnotify_final_destroy_group(group);
+put_group:
+	fsnotify_put_group(group);
 }
 
 void fsnotify_set_mark_mask_locked(struct fsnotify_mark *mark, __u32 mask)
@@ -232,6 +232,7 @@ int fsnotify_add_mark(struct fsnotify_mark *mark,
 
 	mark->flags |= FSNOTIFY_MARK_FLAG_ALIVE;
 
+	fsnotify_get_group(group);
 	mark->group = group;
 	list_add(&mark->g_list, &group->marks_list);
 	atomic_inc(&group->num_marks);
@@ -263,6 +264,7 @@ int fsnotify_add_mark(struct fsnotify_mark *mark,
 err:
 	mark->flags &= ~FSNOTIFY_MARK_FLAG_ALIVE;
 	list_del_init(&mark->g_list);
+	fsnotify_put_group(group);
 	mark->group = NULL;
 	atomic_dec(&group->num_marks);
 
@@ -315,6 +317,8 @@ void fsnotify_duplicate_mark(struct fsnotify_mark *new, struct fsnotify_mark *ol
 	assert_spin_locked(&old->lock);
 	new->i.inode = old->i.inode;
 	new->m.mnt = old->m.mnt;
+	if (old->group)
+		fsnotify_get_group(old->group);
 	new->group = old->group;
 	new->mask = old->mask;
 	new->free_mark = old->free_mark;
-- 
1.5.6.5


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH 4/9] fsnotify: take groups mark_lock before mark lock
  2011-06-14 15:29 [PATCH 0/9] fsnotify: change locking order Lino Sanfilippo
                   ` (2 preceding siblings ...)
  2011-06-14 15:29 ` [PATCH 3/9] fsnotify: use reference counting for groups Lino Sanfilippo
@ 2011-06-14 15:29 ` Lino Sanfilippo
  2011-06-14 15:29 ` [PATCH 5/9] fanotify: add an extra flag to mark_remove_from_mask that indicates wheather a mark should be destroyed Lino Sanfilippo
                   ` (5 subsequent siblings)
  9 siblings, 0 replies; 20+ messages in thread
From: Lino Sanfilippo @ 2011-06-14 15:29 UTC (permalink / raw)
  To: eparis; +Cc: linux-fsdevel, viro, Lino Sanfilippo

Race-free addition and removal of a mark to a groups mark list would be easier
if we could lock the mark list of group before we lock the specific mark.
This patch changes the order used to add/remove marks to/from mark lists from

1. mark->lock
2. group->mark_lock
3. inode->i_lock

to

1. group->mark_lock
2. mark->lock
3. inode->i_lock

Signed-off-by: Lino Sanfilippo <LinoSanfilippo@gmx.de>
---
 fs/notify/mark.c |   26 ++++++++++++++++----------
 1 files changed, 16 insertions(+), 10 deletions(-)

diff --git a/fs/notify/mark.c b/fs/notify/mark.c
index d246a64..e621115 100644
--- a/fs/notify/mark.c
+++ b/fs/notify/mark.c
@@ -127,13 +127,22 @@ void fsnotify_destroy_mark(struct fsnotify_mark *mark)
 	struct inode *inode = NULL;
 
 	spin_lock(&mark->lock);
-
+	/* dont get the group from a mark that is not alive yet */
+	if (!(mark->flags & FSNOTIFY_MARK_FLAG_ALIVE)) {
+		spin_unlock(&mark->lock);
+		return;
+	}
 	fsnotify_get_group(mark->group);
 	group = mark->group;
+	spin_unlock(&mark->lock);
+
+	spin_lock(&group->mark_lock);
+	spin_lock(&mark->lock);
 
 	/* something else already called this function on this mark */
 	if (!(mark->flags & FSNOTIFY_MARK_FLAG_ALIVE)) {
 		spin_unlock(&mark->lock);
+		spin_unlock(&group->mark_lock);
 		goto put_group;
 	}
 
@@ -142,8 +151,6 @@ void fsnotify_destroy_mark(struct fsnotify_mark *mark)
 	/* 1 from caller and 1 for being on i_list/g_list */
 	BUG_ON(atomic_read(&mark->refcnt) < 2);
 
-	spin_lock(&group->mark_lock);
-
 	if (mark->flags & FSNOTIFY_MARK_FLAG_INODE) {
 		inode = mark->i.inode;
 		fsnotify_destroy_inode_mark(mark);
@@ -154,8 +161,8 @@ void fsnotify_destroy_mark(struct fsnotify_mark *mark)
 
 	list_del_init(&mark->g_list);
 
-	spin_unlock(&group->mark_lock);
 	spin_unlock(&mark->lock);
+	spin_unlock(&group->mark_lock);
 
 	spin_lock(&destroy_lock);
 	list_add(&mark->destroy_list, &destroy_list);
@@ -223,13 +230,13 @@ int fsnotify_add_mark(struct fsnotify_mark *mark,
 
 	/*
 	 * LOCKING ORDER!!!!
-	 * mark->lock
 	 * group->mark_lock
+	 * mark->lock
 	 * inode->i_lock
 	 */
-	spin_lock(&mark->lock);
 	spin_lock(&group->mark_lock);
 
+	spin_lock(&mark->lock);
 	mark->flags |= FSNOTIFY_MARK_FLAG_ALIVE;
 
 	fsnotify_get_group(group);
@@ -250,13 +257,12 @@ int fsnotify_add_mark(struct fsnotify_mark *mark,
 		BUG();
 	}
 
-	spin_unlock(&group->mark_lock);
-
 	/* this will pin the object if appropriate */
 	fsnotify_set_mark_mask_locked(mark, mark->mask);
-
 	spin_unlock(&mark->lock);
 
+	spin_unlock(&group->mark_lock);
+
 	if (inode)
 		__fsnotify_update_child_dentry_flags(inode);
 
@@ -268,8 +274,8 @@ err:
 	mark->group = NULL;
 	atomic_dec(&group->num_marks);
 
-	spin_unlock(&group->mark_lock);
 	spin_unlock(&mark->lock);
+	spin_unlock(&group->mark_lock);
 
 	spin_lock(&destroy_lock);
 	list_add(&mark->destroy_list, &destroy_list);
-- 
1.5.6.5


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH 5/9] fanotify: add an extra flag to mark_remove_from_mask that indicates wheather a mark should be destroyed
  2011-06-14 15:29 [PATCH 0/9] fsnotify: change locking order Lino Sanfilippo
                   ` (3 preceding siblings ...)
  2011-06-14 15:29 ` [PATCH 4/9] fsnotify: take groups mark_lock before mark lock Lino Sanfilippo
@ 2011-06-14 15:29 ` Lino Sanfilippo
  2011-06-14 15:29 ` [PATCH 6/9] fsnotify: use a mutex instead of a spinlock to protect a groups mark list Lino Sanfilippo
                   ` (4 subsequent siblings)
  9 siblings, 0 replies; 20+ messages in thread
From: Lino Sanfilippo @ 2011-06-14 15:29 UTC (permalink / raw)
  To: eparis; +Cc: linux-fsdevel, viro, Lino Sanfilippo

This patch adds an extra flag to mark_remove_from_mask() to inform the caller if
the mark should be destroyed.
With this we dont destroy the mark implicitly in the function itself any more
but let the caller handle it.

Signed-off-by: Lino Sanfilippo <LinoSanfilippo@gmx.de>
---
 fs/notify/fanotify/fanotify_user.c |   19 ++++++++++++++-----
 1 files changed, 14 insertions(+), 5 deletions(-)

diff --git a/fs/notify/fanotify/fanotify_user.c b/fs/notify/fanotify/fanotify_user.c
index a85752d..8424bee 100644
--- a/fs/notify/fanotify/fanotify_user.c
+++ b/fs/notify/fanotify/fanotify_user.c
@@ -513,7 +513,8 @@ out:
 
 static __u32 fanotify_mark_remove_from_mask(struct fsnotify_mark *fsn_mark,
 					    __u32 mask,
-					    unsigned int flags)
+					    unsigned int flags,
+					    int *destroy)
 {
 	__u32 oldmask;
 
@@ -527,8 +528,7 @@ static __u32 fanotify_mark_remove_from_mask(struct fsnotify_mark *fsn_mark,
 	}
 	spin_unlock(&fsn_mark->lock);
 
-	if (!(oldmask & ~mask))
-		fsnotify_destroy_mark(fsn_mark);
+	*destroy = !(oldmask & ~mask);
 
 	return mask & oldmask;
 }
@@ -539,12 +539,17 @@ static int fanotify_remove_vfsmount_mark(struct fsnotify_group *group,
 {
 	struct fsnotify_mark *fsn_mark = NULL;
 	__u32 removed;
+	int destroy_mark;
 
 	fsn_mark = fsnotify_find_vfsmount_mark(group, mnt);
 	if (!fsn_mark)
 		return -ENOENT;
 
-	removed = fanotify_mark_remove_from_mask(fsn_mark, mask, flags);
+	removed = fanotify_mark_remove_from_mask(fsn_mark, mask, flags,
+						 &destroy_mark);
+	if (destroy_mark)
+		fsnotify_destroy_mark(fsn_mark);
+
 	fsnotify_put_mark(fsn_mark);
 	if (removed & mnt->mnt_fsnotify_mask)
 		fsnotify_recalc_vfsmount_mask(mnt);
@@ -558,12 +563,16 @@ static int fanotify_remove_inode_mark(struct fsnotify_group *group,
 {
 	struct fsnotify_mark *fsn_mark = NULL;
 	__u32 removed;
+	int destroy_mark;
 
 	fsn_mark = fsnotify_find_inode_mark(group, inode);
 	if (!fsn_mark)
 		return -ENOENT;
 
-	removed = fanotify_mark_remove_from_mask(fsn_mark, mask, flags);
+	removed = fanotify_mark_remove_from_mask(fsn_mark, mask, flags,
+						 &destroy_mark);
+	if (destroy_mark)
+		fsnotify_destroy_mark(fsn_mark);
 	/* matches the fsnotify_find_inode_mark() */
 	fsnotify_put_mark(fsn_mark);
 	if (removed & inode->i_fsnotify_mask)
-- 
1.5.6.5


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH 6/9] fsnotify: use a mutex instead of a spinlock to protect a groups mark list
  2011-06-14 15:29 [PATCH 0/9] fsnotify: change locking order Lino Sanfilippo
                   ` (4 preceding siblings ...)
  2011-06-14 15:29 ` [PATCH 5/9] fanotify: add an extra flag to mark_remove_from_mask that indicates wheather a mark should be destroyed Lino Sanfilippo
@ 2011-06-14 15:29 ` Lino Sanfilippo
  2011-06-14 15:29 ` [PATCH 7/9] fsnotify: pass group to fsnotify_destroy_mark() Lino Sanfilippo
                   ` (3 subsequent siblings)
  9 siblings, 0 replies; 20+ messages in thread
From: Lino Sanfilippo @ 2011-06-14 15:29 UTC (permalink / raw)
  To: eparis; +Cc: linux-fsdevel, viro, Lino Sanfilippo

Replaces the groups mark_lock spinlock with a mutex. Using a mutex instead
of a spinlock results in more flexibility (i.e it allows to sleep while the
lock is held).

Signed-off-by: Lino Sanfilippo <LinoSanfilippo@gmx.de>
---
 fs/notify/group.c                |    2 +-
 fs/notify/inode_mark.c           |    4 ++--
 fs/notify/mark.c                 |   18 +++++++++---------
 fs/notify/vfsmount_mark.c        |    4 ++--
 include/linux/fsnotify_backend.h |    2 +-
 5 files changed, 15 insertions(+), 15 deletions(-)

diff --git a/fs/notify/group.c b/fs/notify/group.c
index 64ad3f0..d0eea75 100644
--- a/fs/notify/group.c
+++ b/fs/notify/group.c
@@ -95,7 +95,7 @@ struct fsnotify_group *fsnotify_alloc_group(const struct fsnotify_ops *ops)
 	init_waitqueue_head(&group->notification_waitq);
 	group->max_events = UINT_MAX;
 
-	spin_lock_init(&group->mark_lock);
+	mutex_init(&group->mark_mutex);
 	INIT_LIST_HEAD(&group->marks_list);
 
 	group->ops = ops;
diff --git a/fs/notify/inode_mark.c b/fs/notify/inode_mark.c
index 07ea8d3..d14ff75 100644
--- a/fs/notify/inode_mark.c
+++ b/fs/notify/inode_mark.c
@@ -63,8 +63,8 @@ void fsnotify_destroy_inode_mark(struct fsnotify_mark *mark)
 {
 	struct inode *inode = mark->i.inode;
 
+	BUG_ON(!mutex_is_locked(&mark->group->mark_mutex));
 	assert_spin_locked(&mark->lock);
-	assert_spin_locked(&mark->group->mark_lock);
 
 	spin_lock(&inode->i_lock);
 
@@ -191,8 +191,8 @@ int fsnotify_add_inode_mark(struct fsnotify_mark *mark,
 
 	mark->flags |= FSNOTIFY_MARK_FLAG_INODE;
 
+	BUG_ON(!mutex_is_locked(&group->mark_mutex));
 	assert_spin_locked(&mark->lock);
-	assert_spin_locked(&group->mark_lock);
 
 	spin_lock(&inode->i_lock);
 
diff --git a/fs/notify/mark.c b/fs/notify/mark.c
index e621115..bfb148e 100644
--- a/fs/notify/mark.c
+++ b/fs/notify/mark.c
@@ -136,13 +136,13 @@ void fsnotify_destroy_mark(struct fsnotify_mark *mark)
 	group = mark->group;
 	spin_unlock(&mark->lock);
 
-	spin_lock(&group->mark_lock);
+	mutex_lock(&group->mark_mutex);
 	spin_lock(&mark->lock);
 
 	/* something else already called this function on this mark */
 	if (!(mark->flags & FSNOTIFY_MARK_FLAG_ALIVE)) {
 		spin_unlock(&mark->lock);
-		spin_unlock(&group->mark_lock);
+		mutex_unlock(&group->mark_mutex);
 		goto put_group;
 	}
 
@@ -162,7 +162,7 @@ void fsnotify_destroy_mark(struct fsnotify_mark *mark)
 	list_del_init(&mark->g_list);
 
 	spin_unlock(&mark->lock);
-	spin_unlock(&group->mark_lock);
+	mutex_unlock(&group->mark_mutex);
 
 	spin_lock(&destroy_lock);
 	list_add(&mark->destroy_list, &destroy_list);
@@ -230,11 +230,11 @@ int fsnotify_add_mark(struct fsnotify_mark *mark,
 
 	/*
 	 * LOCKING ORDER!!!!
-	 * group->mark_lock
+	 * group->mark_mutex
 	 * mark->lock
 	 * inode->i_lock
 	 */
-	spin_lock(&group->mark_lock);
+	mutex_lock(&group->mark_mutex);
 
 	spin_lock(&mark->lock);
 	mark->flags |= FSNOTIFY_MARK_FLAG_ALIVE;
@@ -261,7 +261,7 @@ int fsnotify_add_mark(struct fsnotify_mark *mark,
 	fsnotify_set_mark_mask_locked(mark, mark->mask);
 	spin_unlock(&mark->lock);
 
-	spin_unlock(&group->mark_lock);
+	mutex_unlock(&group->mark_mutex);
 
 	if (inode)
 		__fsnotify_update_child_dentry_flags(inode);
@@ -275,7 +275,7 @@ err:
 	atomic_dec(&group->num_marks);
 
 	spin_unlock(&mark->lock);
-	spin_unlock(&group->mark_lock);
+	mutex_unlock(&group->mark_mutex);
 
 	spin_lock(&destroy_lock);
 	list_add(&mark->destroy_list, &destroy_list);
@@ -294,7 +294,7 @@ void fsnotify_clear_marks_by_group_flags(struct fsnotify_group *group,
 	struct fsnotify_mark *lmark, *mark;
 	LIST_HEAD(free_list);
 
-	spin_lock(&group->mark_lock);
+	mutex_lock(&group->mark_mutex);
 	list_for_each_entry_safe(mark, lmark, &group->marks_list, g_list) {
 		if (mark->flags & flags) {
 			list_add(&mark->free_g_list, &free_list);
@@ -302,7 +302,7 @@ void fsnotify_clear_marks_by_group_flags(struct fsnotify_group *group,
 			fsnotify_get_mark(mark);
 		}
 	}
-	spin_unlock(&group->mark_lock);
+	mutex_unlock(&group->mark_mutex);
 
 	list_for_each_entry_safe(mark, lmark, &free_list, free_g_list) {
 		fsnotify_destroy_mark(mark);
diff --git a/fs/notify/vfsmount_mark.c b/fs/notify/vfsmount_mark.c
index e86577d..8d09677 100644
--- a/fs/notify/vfsmount_mark.c
+++ b/fs/notify/vfsmount_mark.c
@@ -85,8 +85,8 @@ void fsnotify_destroy_vfsmount_mark(struct fsnotify_mark *mark)
 {
 	struct vfsmount *mnt = mark->m.mnt;
 
+	BUG_ON(!mutex_is_locked(&mark->group->mark_mutex));
 	assert_spin_locked(&mark->lock);
-	assert_spin_locked(&mark->group->mark_lock);
 
 	spin_lock(&mnt->mnt_root->d_lock);
 
@@ -146,8 +146,8 @@ int fsnotify_add_vfsmount_mark(struct fsnotify_mark *mark,
 
 	mark->flags |= FSNOTIFY_MARK_FLAG_VFSMOUNT;
 
+	BUG_ON(!mutex_is_locked(&group->mark_mutex));
 	assert_spin_locked(&mark->lock);
-	assert_spin_locked(&group->mark_lock);
 
 	spin_lock(&mnt->mnt_root->d_lock);
 
diff --git a/include/linux/fsnotify_backend.h b/include/linux/fsnotify_backend.h
index 073b72b..b0c3908 100644
--- a/include/linux/fsnotify_backend.h
+++ b/include/linux/fsnotify_backend.h
@@ -141,7 +141,7 @@ struct fsnotify_group {
 	unsigned int priority;
 
 	/* stores all fastpath marks assoc with this group so they can be cleaned on unregister */
-	spinlock_t mark_lock;		/* protect marks_list */
+	struct mutex mark_mutex;	/* protect marks_list */
 	atomic_t num_marks;		/* 1 for each mark and 1 for not being
 					 * past the point of no return when freeing
 					 * a group */
-- 
1.5.6.5


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH 7/9] fsnotify: pass group to fsnotify_destroy_mark()
  2011-06-14 15:29 [PATCH 0/9] fsnotify: change locking order Lino Sanfilippo
                   ` (5 preceding siblings ...)
  2011-06-14 15:29 ` [PATCH 6/9] fsnotify: use a mutex instead of a spinlock to protect a groups mark list Lino Sanfilippo
@ 2011-06-14 15:29 ` Lino Sanfilippo
  2011-06-14 15:29 ` [PATCH 8/9] fsnotify: introduce locked versions of fsnotify_add_mark() and fsnotify_remove_mark() Lino Sanfilippo
                   ` (2 subsequent siblings)
  9 siblings, 0 replies; 20+ messages in thread
From: Lino Sanfilippo @ 2011-06-14 15:29 UTC (permalink / raw)
  To: eparis; +Cc: linux-fsdevel, viro, Lino Sanfilippo

In fsnotify_destroy_mark() dont get the group from the passed mark anymore,
but pass the group itself as an additional parameter to the function.

Signed-off-by: Lino Sanfilippo <LinoSanfilippo@gmx.de>
---
 fs/notify/dnotify/dnotify.c          |    4 ++--
 fs/notify/fanotify/fanotify_user.c   |    4 ++--
 fs/notify/inode_mark.c               |   10 +++++++++-
 fs/notify/inotify/inotify_fsnotify.c |    2 +-
 fs/notify/inotify/inotify_user.c     |    2 +-
 fs/notify/mark.c                     |   21 ++++-----------------
 fs/notify/vfsmount_mark.c            |   10 +++++++++-
 include/linux/fsnotify_backend.h     |    5 +++--
 kernel/audit_tree.c                  |   10 +++++-----
 kernel/audit_watch.c                 |    4 ++--
 10 files changed, 38 insertions(+), 34 deletions(-)

diff --git a/fs/notify/dnotify/dnotify.c b/fs/notify/dnotify/dnotify.c
index 3344bdd..08b886f 100644
--- a/fs/notify/dnotify/dnotify.c
+++ b/fs/notify/dnotify/dnotify.c
@@ -201,7 +201,7 @@ void dnotify_flush(struct file *filp, fl_owner_t id)
 
 	/* nothing else could have found us thanks to the dnotify_mark_mutex */
 	if (dn_mark->dn == NULL)
-		fsnotify_destroy_mark(fsn_mark);
+		fsnotify_destroy_mark(fsn_mark, dnotify_group);
 
 	mutex_unlock(&dnotify_mark_mutex);
 
@@ -385,7 +385,7 @@ out:
 	spin_unlock(&fsn_mark->lock);
 
 	if (destroy)
-		fsnotify_destroy_mark(fsn_mark);
+		fsnotify_destroy_mark(fsn_mark, dnotify_group);
 
 	mutex_unlock(&dnotify_mark_mutex);
 	fsnotify_put_mark(fsn_mark);
diff --git a/fs/notify/fanotify/fanotify_user.c b/fs/notify/fanotify/fanotify_user.c
index 8424bee..88630be 100644
--- a/fs/notify/fanotify/fanotify_user.c
+++ b/fs/notify/fanotify/fanotify_user.c
@@ -548,7 +548,7 @@ static int fanotify_remove_vfsmount_mark(struct fsnotify_group *group,
 	removed = fanotify_mark_remove_from_mask(fsn_mark, mask, flags,
 						 &destroy_mark);
 	if (destroy_mark)
-		fsnotify_destroy_mark(fsn_mark);
+		fsnotify_destroy_mark(fsn_mark, group);
 
 	fsnotify_put_mark(fsn_mark);
 	if (removed & mnt->mnt_fsnotify_mask)
@@ -572,7 +572,7 @@ static int fanotify_remove_inode_mark(struct fsnotify_group *group,
 	removed = fanotify_mark_remove_from_mask(fsn_mark, mask, flags,
 						 &destroy_mark);
 	if (destroy_mark)
-		fsnotify_destroy_mark(fsn_mark);
+		fsnotify_destroy_mark(fsn_mark, group);
 	/* matches the fsnotify_find_inode_mark() */
 	fsnotify_put_mark(fsn_mark);
 	if (removed & inode->i_fsnotify_mask)
diff --git a/fs/notify/inode_mark.c b/fs/notify/inode_mark.c
index d14ff75..e0a02d8 100644
--- a/fs/notify/inode_mark.c
+++ b/fs/notify/inode_mark.c
@@ -99,8 +99,16 @@ void fsnotify_clear_marks_by_inode(struct inode *inode)
 	spin_unlock(&inode->i_lock);
 
 	list_for_each_entry_safe(mark, lmark, &free_list, i.free_i_list) {
-		fsnotify_destroy_mark(mark);
+		struct fsnotify_group *group;
+
+		spin_lock(&mark->lock);
+		fsnotify_get_group(mark->group);
+		group = mark->group;
+		spin_unlock(&mark->lock);
+
+		fsnotify_destroy_mark(mark, group);
 		fsnotify_put_mark(mark);
+		fsnotify_put_group(group);
 	}
 }
 
diff --git a/fs/notify/inotify/inotify_fsnotify.c b/fs/notify/inotify/inotify_fsnotify.c
index 74977fb..871569c 100644
--- a/fs/notify/inotify/inotify_fsnotify.c
+++ b/fs/notify/inotify/inotify_fsnotify.c
@@ -132,7 +132,7 @@ static int inotify_handle_event(struct fsnotify_group *group,
 	}
 
 	if (inode_mark->mask & IN_ONESHOT)
-		fsnotify_destroy_mark(inode_mark);
+		fsnotify_destroy_mark(inode_mark, group);
 
 	return ret;
 }
diff --git a/fs/notify/inotify/inotify_user.c b/fs/notify/inotify/inotify_user.c
index 246250f..00ff82f 100644
--- a/fs/notify/inotify/inotify_user.c
+++ b/fs/notify/inotify/inotify_user.c
@@ -816,7 +816,7 @@ SYSCALL_DEFINE2(inotify_rm_watch, int, fd, __s32, wd)
 
 	ret = 0;
 
-	fsnotify_destroy_mark(&i_mark->fsn_mark);
+	fsnotify_destroy_mark(&i_mark->fsn_mark, group);
 
 	/* match ref taken by inotify_idr_find */
 	fsnotify_put_mark(&i_mark->fsn_mark);
diff --git a/fs/notify/mark.c b/fs/notify/mark.c
index bfb148e..5c335de 100644
--- a/fs/notify/mark.c
+++ b/fs/notify/mark.c
@@ -121,21 +121,11 @@ void fsnotify_put_mark(struct fsnotify_mark *mark)
  * The caller had better be holding a reference to this mark so we don't actually
  * do the final put under the mark->lock
  */
-void fsnotify_destroy_mark(struct fsnotify_mark *mark)
+void fsnotify_destroy_mark(struct fsnotify_mark *mark,
+			   struct fsnotify_group *group)
 {
-	struct fsnotify_group *group;
 	struct inode *inode = NULL;
 
-	spin_lock(&mark->lock);
-	/* dont get the group from a mark that is not alive yet */
-	if (!(mark->flags & FSNOTIFY_MARK_FLAG_ALIVE)) {
-		spin_unlock(&mark->lock);
-		return;
-	}
-	fsnotify_get_group(mark->group);
-	group = mark->group;
-	spin_unlock(&mark->lock);
-
 	mutex_lock(&group->mark_mutex);
 	spin_lock(&mark->lock);
 
@@ -143,7 +133,7 @@ void fsnotify_destroy_mark(struct fsnotify_mark *mark)
 	if (!(mark->flags & FSNOTIFY_MARK_FLAG_ALIVE)) {
 		spin_unlock(&mark->lock);
 		mutex_unlock(&group->mark_mutex);
-		goto put_group;
+		return;
 	}
 
 	mark->flags &= ~FSNOTIFY_MARK_FLAG_ALIVE;
@@ -192,9 +182,6 @@ void fsnotify_destroy_mark(struct fsnotify_mark *mark)
 	if (inode && (mark->flags & FSNOTIFY_MARK_FLAG_OBJECT_PINNED))
 		iput(inode);
 	atomic_dec(&group->num_marks);
-
-put_group:
-	fsnotify_put_group(group);
 }
 
 void fsnotify_set_mark_mask_locked(struct fsnotify_mark *mark, __u32 mask)
@@ -305,7 +292,7 @@ void fsnotify_clear_marks_by_group_flags(struct fsnotify_group *group,
 	mutex_unlock(&group->mark_mutex);
 
 	list_for_each_entry_safe(mark, lmark, &free_list, free_g_list) {
-		fsnotify_destroy_mark(mark);
+		fsnotify_destroy_mark(mark, group);
 		fsnotify_put_mark(mark);
 	}
 }
diff --git a/fs/notify/vfsmount_mark.c b/fs/notify/vfsmount_mark.c
index 8d09677..53987c6 100644
--- a/fs/notify/vfsmount_mark.c
+++ b/fs/notify/vfsmount_mark.c
@@ -44,8 +44,16 @@ void fsnotify_clear_marks_by_mount(struct vfsmount *mnt)
 	spin_unlock(&mnt->mnt_root->d_lock);
 
 	list_for_each_entry_safe(mark, lmark, &free_list, m.free_m_list) {
-		fsnotify_destroy_mark(mark);
+		struct fsnotify_group *group;
+
+		spin_lock(&mark->lock);
+		fsnotify_get_group(mark->group);
+		group = mark->group;
+		spin_unlock(&mark->lock);
+
+		fsnotify_destroy_mark(mark, group);
 		fsnotify_put_mark(mark);
+		fsnotify_put_group(group);
 	}
 }
 
diff --git a/include/linux/fsnotify_backend.h b/include/linux/fsnotify_backend.h
index b0c3908..acf5f6d 100644
--- a/include/linux/fsnotify_backend.h
+++ b/include/linux/fsnotify_backend.h
@@ -408,8 +408,9 @@ extern void fsnotify_set_mark_mask_locked(struct fsnotify_mark *mark, __u32 mask
 /* attach the mark to both the group and the inode */
 extern int fsnotify_add_mark(struct fsnotify_mark *mark, struct fsnotify_group *group,
 			     struct inode *inode, struct vfsmount *mnt, int allow_dups);
-/* given a mark, flag it to be freed when all references are dropped */
-extern void fsnotify_destroy_mark(struct fsnotify_mark *mark);
+/* given a group and a mark, flag mark to be freed when all references are dropped */
+extern void fsnotify_destroy_mark(struct fsnotify_mark *mark,
+				  struct fsnotify_group *group);
 /* run all the marks in a group, and clear all of the vfsmount marks */
 extern void fsnotify_clear_vfsmount_marks_by_group(struct fsnotify_group *group);
 /* run all the marks in a group, and clear all of the inode marks */
diff --git a/kernel/audit_tree.c b/kernel/audit_tree.c
index e99dda0..01c0d05 100644
--- a/kernel/audit_tree.c
+++ b/kernel/audit_tree.c
@@ -255,7 +255,7 @@ static void untag_chunk(struct node *p)
 		list_del_rcu(&chunk->hash);
 		spin_unlock(&hash_lock);
 		spin_unlock(&entry->lock);
-		fsnotify_destroy_mark(entry);
+		fsnotify_destroy_mark(entry, audit_tree_group);
 		fsnotify_put_mark(entry);
 		goto out;
 	}
@@ -298,7 +298,7 @@ static void untag_chunk(struct node *p)
 		owner->root = new;
 	spin_unlock(&hash_lock);
 	spin_unlock(&entry->lock);
-	fsnotify_destroy_mark(entry);
+	fsnotify_destroy_mark(entry, audit_tree_group);
 	fsnotify_put_mark(entry);
 	goto out;
 
@@ -338,7 +338,7 @@ static int create_chunk(struct inode *inode, struct audit_tree *tree)
 		spin_unlock(&hash_lock);
 		chunk->dead = 1;
 		spin_unlock(&entry->lock);
-		fsnotify_destroy_mark(entry);
+		fsnotify_destroy_mark(entry, audit_tree_group);
 		fsnotify_put_mark(entry);
 		return 0;
 	}
@@ -418,7 +418,7 @@ static int tag_chunk(struct inode *inode, struct audit_tree *tree)
 		spin_unlock(&chunk_entry->lock);
 		spin_unlock(&old_entry->lock);
 
-		fsnotify_destroy_mark(chunk_entry);
+		fsnotify_destroy_mark(chunk_entry, audit_tree_group);
 
 		fsnotify_put_mark(chunk_entry);
 		fsnotify_put_mark(old_entry);
@@ -449,7 +449,7 @@ static int tag_chunk(struct inode *inode, struct audit_tree *tree)
 	spin_unlock(&hash_lock);
 	spin_unlock(&chunk_entry->lock);
 	spin_unlock(&old_entry->lock);
-	fsnotify_destroy_mark(old_entry);
+	fsnotify_destroy_mark(old_entry, audit_tree_group);
 	fsnotify_put_mark(old_entry); /* pair to fsnotify_find mark_entry */
 	fsnotify_put_mark(old_entry); /* and kill it */
 	return 0;
diff --git a/kernel/audit_watch.c b/kernel/audit_watch.c
index e683869..a0825e7 100644
--- a/kernel/audit_watch.c
+++ b/kernel/audit_watch.c
@@ -349,7 +349,7 @@ static void audit_remove_parent_watches(struct audit_parent *parent)
 	}
 	mutex_unlock(&audit_filter_mutex);
 
-	fsnotify_destroy_mark(&parent->mark);
+	fsnotify_destroy_mark(&parent->mark, audit_watch_group);
 }
 
 /* Get path information necessary for adding watches. */
@@ -475,7 +475,7 @@ void audit_remove_watch_rule(struct audit_krule *krule)
 
 		if (list_empty(&parent->watches)) {
 			audit_get_parent(parent);
-			fsnotify_destroy_mark(&parent->mark);
+			fsnotify_destroy_mark(&parent->mark, audit_watch_group);
 			audit_put_parent(parent);
 		}
 	}
-- 
1.5.6.5


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH 8/9] fsnotify: introduce locked versions of fsnotify_add_mark() and fsnotify_remove_mark()
  2011-06-14 15:29 [PATCH 0/9] fsnotify: change locking order Lino Sanfilippo
                   ` (6 preceding siblings ...)
  2011-06-14 15:29 ` [PATCH 7/9] fsnotify: pass group to fsnotify_destroy_mark() Lino Sanfilippo
@ 2011-06-14 15:29 ` Lino Sanfilippo
  2011-06-14 15:29 ` [PATCH 9/9] fsnotify: dont put marks on temporary list when clearing marks by group Lino Sanfilippo
  2011-08-01 20:38 ` [PATCH 0/9] fsnotify: change locking order Eric Paris
  9 siblings, 0 replies; 20+ messages in thread
From: Lino Sanfilippo @ 2011-06-14 15:29 UTC (permalink / raw)
  To: eparis; +Cc: linux-fsdevel, viro, Lino Sanfilippo

This patch introduces fsnotify_add_mark_locked() and fsnotify_remove_mark_locked()
which are essentially the same as fsnotify_add_mark() and fsnotify_remove_mark() but
assume that the caller has already taken the groups mark mutex.

Signed-off-by: Lino Sanfilippo <LinoSanfilippo@gmx.de>
---
 fs/notify/mark.c                 |   42 +++++++++++++++++++++++++++----------
 include/linux/fsnotify_backend.h |    4 +++
 2 files changed, 34 insertions(+), 12 deletions(-)

diff --git a/fs/notify/mark.c b/fs/notify/mark.c
index 5c335de..43a5154 100644
--- a/fs/notify/mark.c
+++ b/fs/notify/mark.c
@@ -121,18 +121,18 @@ void fsnotify_put_mark(struct fsnotify_mark *mark)
  * The caller had better be holding a reference to this mark so we don't actually
  * do the final put under the mark->lock
  */
-void fsnotify_destroy_mark(struct fsnotify_mark *mark,
-			   struct fsnotify_group *group)
+void fsnotify_destroy_mark_locked(struct fsnotify_mark *mark,
+				  struct fsnotify_group *group)
 {
 	struct inode *inode = NULL;
 
-	mutex_lock(&group->mark_mutex);
+	BUG_ON(!mutex_is_locked(&group->mark_mutex));
+
 	spin_lock(&mark->lock);
 
 	/* something else already called this function on this mark */
 	if (!(mark->flags & FSNOTIFY_MARK_FLAG_ALIVE)) {
 		spin_unlock(&mark->lock);
-		mutex_unlock(&group->mark_mutex);
 		return;
 	}
 
@@ -152,6 +152,8 @@ void fsnotify_destroy_mark(struct fsnotify_mark *mark,
 	list_del_init(&mark->g_list);
 
 	spin_unlock(&mark->lock);
+
+	/* release lock temporarily */
 	mutex_unlock(&group->mark_mutex);
 
 	spin_lock(&destroy_lock);
@@ -182,6 +184,16 @@ void fsnotify_destroy_mark(struct fsnotify_mark *mark,
 	if (inode && (mark->flags & FSNOTIFY_MARK_FLAG_OBJECT_PINNED))
 		iput(inode);
 	atomic_dec(&group->num_marks);
+
+	mutex_lock(&group->mark_mutex);
+}
+
+void fsnotify_destroy_mark(struct fsnotify_mark *mark,
+			   struct fsnotify_group *group)
+{
+	mutex_lock(&group->mark_mutex);
+	fsnotify_destroy_mark_locked(mark, group);
+	mutex_unlock(&group->mark_mutex);
 }
 
 void fsnotify_set_mark_mask_locked(struct fsnotify_mark *mark, __u32 mask)
@@ -206,14 +218,15 @@ void fsnotify_set_mark_ignored_mask_locked(struct fsnotify_mark *mark, __u32 mas
  * These marks may be used for the fsnotify backend to determine which
  * event types should be delivered to which group.
  */
-int fsnotify_add_mark(struct fsnotify_mark *mark,
-		      struct fsnotify_group *group, struct inode *inode,
-		      struct vfsmount *mnt, int allow_dups)
+int fsnotify_add_mark_locked(struct fsnotify_mark *mark,
+			     struct fsnotify_group *group, struct inode *inode,
+			     struct vfsmount *mnt, int allow_dups)
 {
 	int ret = 0;
 
 	BUG_ON(inode && mnt);
 	BUG_ON(!inode && !mnt);
+	BUG_ON(!mutex_is_locked(&group->mark_mutex));
 
 	/*
 	 * LOCKING ORDER!!!!
@@ -221,8 +234,6 @@ int fsnotify_add_mark(struct fsnotify_mark *mark,
 	 * mark->lock
 	 * inode->i_lock
 	 */
-	mutex_lock(&group->mark_mutex);
-
 	spin_lock(&mark->lock);
 	mark->flags |= FSNOTIFY_MARK_FLAG_ALIVE;
 
@@ -248,8 +259,6 @@ int fsnotify_add_mark(struct fsnotify_mark *mark,
 	fsnotify_set_mark_mask_locked(mark, mark->mask);
 	spin_unlock(&mark->lock);
 
-	mutex_unlock(&group->mark_mutex);
-
 	if (inode)
 		__fsnotify_update_child_dentry_flags(inode);
 
@@ -262,7 +271,6 @@ err:
 	atomic_dec(&group->num_marks);
 
 	spin_unlock(&mark->lock);
-	mutex_unlock(&group->mark_mutex);
 
 	spin_lock(&destroy_lock);
 	list_add(&mark->destroy_list, &destroy_list);
@@ -272,6 +280,16 @@ err:
 	return ret;
 }
 
+int fsnotify_add_mark(struct fsnotify_mark *mark, struct fsnotify_group *group,
+		      struct inode *inode, struct vfsmount *mnt, int allow_dups)
+{
+	int ret;
+	mutex_lock(&group->mark_mutex);
+	ret = fsnotify_add_mark_locked(mark, group, inode, mnt, allow_dups);
+	mutex_unlock(&group->mark_mutex);
+	return ret;
+}
+
 /*
  * clear any marks in a group in which mark->flags & flags is true
  */
diff --git a/include/linux/fsnotify_backend.h b/include/linux/fsnotify_backend.h
index acf5f6d..6ea57c9 100644
--- a/include/linux/fsnotify_backend.h
+++ b/include/linux/fsnotify_backend.h
@@ -408,9 +408,13 @@ extern void fsnotify_set_mark_mask_locked(struct fsnotify_mark *mark, __u32 mask
 /* attach the mark to both the group and the inode */
 extern int fsnotify_add_mark(struct fsnotify_mark *mark, struct fsnotify_group *group,
 			     struct inode *inode, struct vfsmount *mnt, int allow_dups);
+extern int fsnotify_add_mark_locked(struct fsnotify_mark *mark, struct fsnotify_group *group,
+				    struct inode *inode, struct vfsmount *mnt, int allow_dups);
 /* given a group and a mark, flag mark to be freed when all references are dropped */
 extern void fsnotify_destroy_mark(struct fsnotify_mark *mark,
 				  struct fsnotify_group *group);
+extern void fsnotify_destroy_mark_locked(struct fsnotify_mark *mark,
+					 struct fsnotify_group *group);
 /* run all the marks in a group, and clear all of the vfsmount marks */
 extern void fsnotify_clear_vfsmount_marks_by_group(struct fsnotify_group *group);
 /* run all the marks in a group, and clear all of the inode marks */
-- 
1.5.6.5


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH 9/9] fsnotify: dont put marks on temporary list when clearing marks by group
  2011-06-14 15:29 [PATCH 0/9] fsnotify: change locking order Lino Sanfilippo
                   ` (7 preceding siblings ...)
  2011-06-14 15:29 ` [PATCH 8/9] fsnotify: introduce locked versions of fsnotify_add_mark() and fsnotify_remove_mark() Lino Sanfilippo
@ 2011-06-14 15:29 ` Lino Sanfilippo
  2011-08-01 20:38 ` [PATCH 0/9] fsnotify: change locking order Eric Paris
  9 siblings, 0 replies; 20+ messages in thread
From: Lino Sanfilippo @ 2011-06-14 15:29 UTC (permalink / raw)
  To: eparis; +Cc: linux-fsdevel, viro, Lino Sanfilippo

In clear_marks_by_group_flags() the mark list of a group is iterated and the
marks are put on a temporary list.
Since we introduced fsnotify_destroy_mark_locked() we dont need the temp list
any more and are able to remove the marks while the mark list is iterated and
the mark list mutex is held.

Signed-off-by: Lino Sanfilippo <LinoSanfilippo@gmx.de>
---
 fs/notify/mark.c                 |   10 ++--------
 include/linux/fsnotify_backend.h |    1 -
 2 files changed, 2 insertions(+), 9 deletions(-)

diff --git a/fs/notify/mark.c b/fs/notify/mark.c
index 43a5154..1d1cc60 100644
--- a/fs/notify/mark.c
+++ b/fs/notify/mark.c
@@ -297,22 +297,16 @@ void fsnotify_clear_marks_by_group_flags(struct fsnotify_group *group,
 					 unsigned int flags)
 {
 	struct fsnotify_mark *lmark, *mark;
-	LIST_HEAD(free_list);
 
 	mutex_lock(&group->mark_mutex);
 	list_for_each_entry_safe(mark, lmark, &group->marks_list, g_list) {
 		if (mark->flags & flags) {
-			list_add(&mark->free_g_list, &free_list);
-			list_del_init(&mark->g_list);
 			fsnotify_get_mark(mark);
+			fsnotify_destroy_mark_locked(mark, group);
+			fsnotify_put_mark(mark);
 		}
 	}
 	mutex_unlock(&group->mark_mutex);
-
-	list_for_each_entry_safe(mark, lmark, &free_list, free_g_list) {
-		fsnotify_destroy_mark(mark, group);
-		fsnotify_put_mark(mark);
-	}
 }
 
 /*
diff --git a/include/linux/fsnotify_backend.h b/include/linux/fsnotify_backend.h
index 6ea57c9..6e93f9b 100644
--- a/include/linux/fsnotify_backend.h
+++ b/include/linux/fsnotify_backend.h
@@ -287,7 +287,6 @@ struct fsnotify_mark {
 		struct fsnotify_inode_mark i;
 		struct fsnotify_vfsmount_mark m;
 	};
-	struct list_head free_g_list;	/* tmp list used when freeing this mark */
 	__u32 ignored_mask;		/* events types to ignore */
 #define FSNOTIFY_MARK_FLAG_INODE		0x01
 #define FSNOTIFY_MARK_FLAG_VFSMOUNT		0x02
-- 
1.5.6.5


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* Re: [PATCH 0/9] fsnotify: change locking order
  2011-06-14 15:29 [PATCH 0/9] fsnotify: change locking order Lino Sanfilippo
                   ` (8 preceding siblings ...)
  2011-06-14 15:29 ` [PATCH 9/9] fsnotify: dont put marks on temporary list when clearing marks by group Lino Sanfilippo
@ 2011-08-01 20:38 ` Eric Paris
  2011-08-11 23:13   ` Lino Sanfilippo
  9 siblings, 1 reply; 20+ messages in thread
From: Eric Paris @ 2011-08-01 20:38 UTC (permalink / raw)
  To: Lino Sanfilippo; +Cc: linux-fsdevel, viro

[-- Attachment #1: Type: text/plain, Size: 6034 bytes --]

On 06/14/2011 11:29 AM, Lino Sanfilippo wrote:
> Hi Eric,
> 
> After the patch series "decouple mark lock and marks fsobject lock" 
> I sent on Feb. 21 this is another attempt to change the locking order
> used in fsnotify from 
> 
> 	mark->lock
> 	group->mark_lock
> 	inode/mnt->lock
> to 
> 	group->mark_lock
> 	mark->lock
> 	inode/mnt->lock
> 
> The former series still contained some flaws:
> 1. inodes/mounts that have marks and are not pinned could dissappear while 
> another thread still wants to access a marks inode/mount (see https://lkml.org/lkml/2011/3/1/373)
> 2. in the new introduced function remove_mark_locked() a group could be freed 
> while the groups mark mutex is held
> 
> Both issues should be fixed with this series.
> 
> The reason for changing the locking order is, as you may remember, that there are 
> still some races when adding/removing marks to a group 
> (see also https://lkml.org/lkml/2010/11/30/189).
> 
> So what this series does is change the locking order (PATCH 4) to
> 
> 	group->mark_mutex
> 	mark->lock
> 	inode/mnt->lock
> 
> Beside this the group mark_lock is turned into a mutex (PATCH 6), which allows us to 
> call functions that may sleep while this lock is held. 
> 
> At some places there is only a mark and no group available 
> (i.e. clear_marks_by_[inode|mount]()), so we first have to get the group from the mark
> to use the group->mark_mutex (PATCH 7).  
> 
> Since we cant get a group from a mark and use it without the danger that the group is 
> unregistered and freed, reference counting for groups is introduced and used 
> (PATCHES 1 to 3) for each mark that is added to the group. 
> With this freeing a group does not longer depend on the number of marks, but is 
> simply destroyed when the reference counter hits 0.
> 
> Since a group ref is now taken for each mark that is added to the group we first
> have to explicitly call fsnotify_destroy_group() to remove all marks from the group
> and release all group references held by those marks. This will also release the
> - possibly final - ref of the group held by the caller (PATCH 1).
> 
> Furthermore we now can take the mark lock with the group mutex held so we dont need an
> extra "clear list" any more if we clear all marks by group (PATCH 9).
> 
> For [add|remove]_mark() locked versions are introduced (PATCH 8) that can be 
> used if a caller already has the mark lock held. Thus we now have the possibility
> to use the groups mark mutex to also synchronize addition/removal of a mark or to
> replace the dnotify mutex. 
> This is not part of these patches. It would be the next step if these patches are 
> accepted to be merged.
> 
> This is against 2.6.39

I finally built and tested a v3.0 kernel with these patches (I know I'm
SOOOOOO far behind).  Not what I hoped for:

> [  150.937798] VFS: Busy inodes after unmount of tmpfs. Self-destruct in 5 seconds.  Have a nice day...
> [  150.945290] BUG: unable to handle kernel NULL pointer dereference at 0000000000000070
> [  150.946012] IP: [<ffffffff810ffd58>] shmem_free_inode+0x18/0x50
> [  150.946012] PGD 2bf9e067 PUD 2bf9f067 PMD 0 
> [  150.946012] Oops: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
> [  150.946012] CPU 0 
> [  150.946012] Modules linked in: nfs lockd fscache auth_rpcgss nfs_acl sunrpc ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_filter ip6_tables ext4 jbd2 crc16 joydev ata_piix i2c_piix4 pcspkr uinput ipv6 autofs4 usbhid [last unloaded: scsi_wait_scan]
> [  150.946012] 
> [  150.946012] Pid: 2764, comm: syscall_thrash Not tainted 3.0.0+ #1 Red Hat KVM
> [  150.946012] RIP: 0010:[<ffffffff810ffd58>]  [<ffffffff810ffd58>] shmem_free_inode+0x18/0x50
> [  150.946012] RSP: 0018:ffff88002c2e5df8  EFLAGS: 00010282
> [  150.946012] RAX: 000000004e370d9f RBX: 0000000000000000 RCX: ffff88003a029438
> [  150.946012] RDX: 0000000033630a5f RSI: 0000000000000000 RDI: ffff88003491c240
> [  150.946012] RBP: ffff88002c2e5e08 R08: 0000000000000000 R09: 0000000000000000
> [  150.946012] R10: 0000000000000000 R11: 0000000000000000 R12: ffff88003a029428
> [  150.946012] R13: ffff88003a029428 R14: ffff88003a029428 R15: ffff88003499a610
> [  150.946012] FS:  00007f5a05420700(0000) GS:ffff88003f600000(0000) knlGS:0000000000000000
> [  150.946012] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> [  150.946012] CR2: 0000000000000070 CR3: 000000002a662000 CR4: 00000000000006f0
> [  150.946012] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [  150.946012] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> [  150.946012] Process syscall_thrash (pid: 2764, threadinfo ffff88002c2e4000, task ffff88002bfbc760)
> [  150.946012] Stack:
> [  150.946012]  ffff88003a029438 ffff88003a029428 ffff88002c2e5e38 ffffffff81102f76
> [  150.946012]  ffff88003a029438 ffff88003a029598 ffffffff8160f9c0 ffff88002c221250
> [  150.946012]  ffff88002c2e5e68 ffffffff8115e9be ffff88002c2e5e68 ffff88003a029438
> [  150.946012] Call Trace:
> [  150.946012]  [<ffffffff81102f76>] shmem_evict_inode+0x76/0x130
> [  150.946012]  [<ffffffff8115e9be>] evict+0x7e/0x170
> [  150.946012]  [<ffffffff8115ee40>] iput_final+0xd0/0x190
> [  150.946012]  [<ffffffff8115ef33>] iput+0x33/0x40
> [  150.946012]  [<ffffffff81180205>] fsnotify_destroy_mark_locked+0x145/0x160
> [  150.946012]  [<ffffffff81180316>] fsnotify_destroy_mark+0x36/0x50
> [  150.946012]  [<ffffffff81181937>] sys_inotify_rm_watch+0x77/0xd0
> [  150.946012]  [<ffffffff815aca52>] system_call_fastpath+0x16/0x1b
> [  150.946012] Code: 67 4a 00 b8 e4 ff ff ff eb aa 66 0f 1f 84 00 00 00 00 00 55 48 89 e5 48 83 ec 10 48 89 1c 24 4c 89 64 24 08 48 8b 9f 40 05 00 00 
> [  150.946012]  83 7b 70 00 74 1c 4c 8d a3 80 00 00 00 4c 89 e7 e8 d2 5d 4a 
> [  150.946012] RIP  [<ffffffff810ffd58>] shmem_free_inode+0x18/0x50
> [  150.946012]  RSP <ffff88002c2e5df8>
> [  150.946012] CR2: 0000000000000070

Looks at aweful lot like the problem from:
http://www.spinics.net/lists/linux-fsdevel/msg46101.html

Latest stress test program attached.....

-Eric

[-- Attachment #2: syscall_thrash.c --]
[-- Type: text/plain, Size: 9878 bytes --]

#define _GNU_SOURCE

#include <errno.h>
#include <fcntl.h>
#include <limits.h>
#include <pthread.h>
#include <sched.h>
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/inotify.h>
#include <sys/mount.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

#define NUM_CORES		2
#define NUM_DATA_DUMPERS	4
#define WATCHER_MULTIPLIER	4
#define NUM_WATCHER_THREADS 	NUM_CORES
#define NUM_CLOSER_THREADS	NUM_WATCHER_THREADS * 2
#define NUM_ZERO_CLOSERS	1
#define NUM_FILE_CREATERS	2
#define NUM_INOTIFY_INSTANCES	2

#define TMP_DIR_NAME "/tmp/inotify_syscall_thrash"

struct watcher_struct {
	int inotify_fd;
	int file_num;
};

struct operator_struct {
	int inotify_fd;
};

static int stopped = 0;

static int high_wd = 0;
static int low_wd = INT_MAX;

void *add_watches(void *ptr);
void *close_watches(void *ptr);
void *zero_close_watches(void *ptr);
void *dump_data(void *ptr);
void *reset_low_wd(void *ptr);
void *create_files(void *ptr);
void *mount_tmpdir(void *ptr);
void sigfunc(int sig_num);

int handle_error(const char *arg)
{
	perror(arg);
	exit(1);
}

int main(void)
{
	int inotify_fd[NUM_INOTIFY_INSTANCES];
	struct watcher_struct *watcher_arg;
	struct operator_struct *operator_arg;
	pthread_t *watchers;
	pthread_t *closers;
	pthread_t *zero_closers;
	pthread_t *data_dumpers;
	pthread_t *file_creaters;
	pthread_t low_wd_reseter;
	pthread_t mounter;
	pthread_attr_t attr;
	int iret;
	void *ret;
	int i, j, k;
	struct sigaction setmask;

	/* close cleanly on cntl+c */
	sigemptyset( &setmask.sa_mask );
	setmask.sa_handler = sigfunc;
	setmask.sa_flags   = 0;
	sigaction( SIGINT,  &setmask, (struct sigaction *) NULL );

	/* create and inotify instance an make it O_NONBLOCK */
	for (i = 0; i < NUM_INOTIFY_INSTANCES; i++) {
		int fd  = inotify_init();
		if (fd < 0)
			handle_error("opening inotify_fd");
		iret = fcntl(fd, F_SETFL, O_NONBLOCK);
		if (iret)
			handle_error("setting NONBLOCK");
		inotify_fd[i] = fd;
	}

	/* make sure the directory exists */
	mkdir(TMP_DIR_NAME, S_IRWXU);

	/* set up a pthread attr with a tiny stack */
	iret = pthread_attr_init(&attr);
	if (iret)
		handle_error("pthread_attr_init");
	iret = pthread_attr_setstacksize(&attr, PTHREAD_STACK_MIN*2);
	if (iret)
		handle_error("pthread_attr_setstacksize");

	/* watchers need to know what file to pay with, so we need and argument */
	watcher_arg = calloc(NUM_INOTIFY_INSTANCES * NUM_WATCHER_THREADS, sizeof(struct watcher_struct));
	if (!watcher_arg)
		handle_error("allocating watcher_arg");

	operator_arg = calloc(NUM_INOTIFY_INSTANCES, sizeof(struct operator_struct));
	if (!operator_arg)
		handle_error("allocating operator_arg");

	/* allocate the pthread_t's for all of the threads */
	watchers = calloc(NUM_INOTIFY_INSTANCES * NUM_WATCHER_THREADS * WATCHER_MULTIPLIER, sizeof(pthread_t));
	if (!watchers)
		handle_error("allocating watchers");

	closers = calloc(NUM_INOTIFY_INSTANCES * NUM_CLOSER_THREADS, sizeof(pthread_t));
	if (!closers)
		handle_error("allocating closers");

	zero_closers = calloc(NUM_INOTIFY_INSTANCES * NUM_ZERO_CLOSERS, sizeof(pthread_t));
	if (!zero_closers)
		handle_error("allocating zero_closers");

	data_dumpers = calloc(NUM_INOTIFY_INSTANCES * NUM_DATA_DUMPERS, sizeof(pthread_t));
	if (!data_dumpers)
		handle_error("allocating data_dumpers");

	file_creaters = calloc(NUM_FILE_CREATERS, sizeof(*file_creaters));
	if (!file_creaters)
		handle_error("allocating file_creaters");

	/* create a thread that does nothing but reset the low_wd */
	iret = pthread_create(&low_wd_reseter, &attr, reset_low_wd, NULL);
	if (iret)
		handle_error("low_wd_reseter");

	iret = pthread_create(&mounter, &attr, mount_tmpdir, NULL);
	if (iret)
		handle_error("low_wd_reseter");

	/* create WATCHER_MULTIPLIER threads per file which do nothing
	 * but try to add a watch for each INOTIFY_INSTANCE */
	for (i = 0; i < NUM_INOTIFY_INSTANCES; i++) {
		for (j = 0; j < NUM_WATCHER_THREADS; j++) {
			watcher_arg[i * NUM_WATCHER_THREADS + j].file_num = j;
			watcher_arg[i * NUM_WATCHER_THREADS + j].inotify_fd = inotify_fd[i];
			for (k = 0; k < WATCHER_MULTIPLIER; k++) {
				iret = pthread_create(&watchers[i * (NUM_WATCHER_THREADS * WATCHER_MULTIPLIER) +
								(j * WATCHER_MULTIPLIER) + k],
						      &attr, add_watches, &watcher_arg[i * NUM_WATCHER_THREADS + j]);
				if (iret)
					handle_error("creating water threads");
			}
		}
	}

	for (i = 0; i < NUM_INOTIFY_INSTANCES; i++)
		operator_arg[i].inotify_fd = inotify_fd[i];

	/* create threads which unlink and then recreate all of the files in question */
	for (i = 0; i < NUM_FILE_CREATERS; i++) {
		iret = pthread_create( &file_creaters[i], &attr, create_files, NULL);
		if (iret)
			handle_error("creating the file creators");
	}

	/* create threads which walk from low_wd to high_wd closing all of the wd's in between */
	for (i = 0; i < NUM_INOTIFY_INSTANCES; i++)
		for (j = 0; j < NUM_CLOSER_THREADS; j++) {
			iret = pthread_create ( &closers[i * NUM_CLOSER_THREADS + j], &attr, close_watches, &operator_arg[i]);
			if (iret)
				handle_error("creating the close threads");
		}

	/* create threads which just walk from low_wd to low_wd +3 closing wd's for extra races */
	for (i = 0; i < NUM_INOTIFY_INSTANCES; i++)
		for (j = 0; j < NUM_ZERO_CLOSERS; j++) {
			iret = pthread_create ( &zero_closers[i * NUM_ZERO_CLOSERS + j], &attr, zero_close_watches, &operator_arg[i]);
			if (iret)
				handle_error("creating the low closer threads");
		}

	/* create threads which just pull data off of the inotify fd.
	 * use default ATTR for larger stack */
	for (i = 0; i < NUM_INOTIFY_INSTANCES; i++)
		for (j = 0; j < NUM_DATA_DUMPERS; j++) {
			iret = pthread_create( &data_dumpers[i * NUM_DATA_DUMPERS + j], NULL, dump_data, &operator_arg[i]);
			if (iret)
				handle_error("creating threads to dump inotify data");
		}


	/* Wait till threads are complete before main continues. */
	pthread_join(low_wd_reseter, &ret);

	for (i = 0; i < NUM_INOTIFY_INSTANCES; i++)
		for (j = 0; j < NUM_WATCHER_THREADS; j++)
			for (k = 0; k < WATCHER_MULTIPLIER; k++)
				pthread_join(watchers[i * (NUM_WATCHER_THREADS * WATCHER_MULTIPLIER) +
						      (j * WATCHER_MULTIPLIER) + k], &ret);

	for (i = 0; i < NUM_FILE_CREATERS; i++)
		pthread_join(file_creaters[i], &ret);

	for (i = 0; i < NUM_INOTIFY_INSTANCES; i++)
		for (j = 0; j < NUM_CLOSER_THREADS; j++)
			pthread_join(closers[i * NUM_CLOSER_THREADS + j], &ret);

	for (i = 0; i < NUM_INOTIFY_INSTANCES; i++)
		for (j = 0; j < NUM_ZERO_CLOSERS; j++)
			pthread_join(zero_closers[i * NUM_ZERO_CLOSERS + j], &ret);

	for (i = 0; i < NUM_INOTIFY_INSTANCES; i++)
		for (j = 0; j < NUM_DATA_DUMPERS; j++)
			pthread_join(data_dumpers[i * NUM_DATA_DUMPERS + j], &ret);

	/* clean up the tmp dir which should be empty */
	rmdir(TMP_DIR_NAME);

	exit(0);
}

void sigfunc(int sig_num)
{
	if (sig_num == SIGINT)
		stopped = 1;
	else
		printf("Got an unknown signal!\n");
}

/* constantly create and delete all of the files that are bieng watched */
void *create_files(__attribute__ ((unused)) void *ptr)
{
	char filename[50];
	int i;

	fprintf(stderr, "Starting creator thread\n");

	while (!stopped) {
		for (i = 0; i < NUM_WATCHER_THREADS; i++) {
			int fd;

			snprintf(filename, 50, "%s/%d", TMP_DIR_NAME, i);
			unlink(filename);
			fd = open(filename, O_RDWR|O_CREAT, S_IRUSR|S_IWUSR);
			if (fd >= 0)
				close(fd);
		}
		sleep(10);
	}

	/* cleanup all files on exit */
	for (i = 0; i < NUM_WATCHER_THREADS; i++) {
		snprintf(filename, 50, "%s/%d", TMP_DIR_NAME, i);
		unlink(filename);
	}

	return NULL;
}

/* Reset the low_wd so closers can be smart */
void *reset_low_wd(__attribute__ ((unused)) void *ptr)
{
	fprintf(stderr, "Starting low_wd reset thread\n");

	while (!stopped) {
		low_wd = INT_MAX;
		sleep(1);
	}

	return NULL;
}

/* Pull events off the buffer and ignore them */
void *dump_data(void *ptr)
{
	char buf[8096];
	struct operator_struct *operator_arg = ptr;
	int inotify_fd = operator_arg->inotify_fd;
	int ret;

	fprintf(stderr, "Starting inotify data dumper thread\n");

	while (!stopped) {
		ret = read(inotify_fd, buf, 8096);
		if (ret <= 0)
			pthread_yield();
	}

	return NULL;
}

/* add a watch to a specific file as fast as we can */
void *add_watches(void *ptr)
{
	struct watcher_struct *watcher_arg = ptr;
	int file_num = watcher_arg->file_num;
	int notify_fd = watcher_arg->inotify_fd;
	int ret;
	char filename[50];

	fprintf(stderr, "Creating an watch adder thread, notify_fd=%d filenum=%d\n",
		notify_fd, file_num);

	snprintf(filename, 50, "%s/%d", TMP_DIR_NAME, file_num);

	while (!stopped) {
		ret = inotify_add_watch(notify_fd, filename, IN_ALL_EVENTS);
		if (ret < 0 && errno != ENOENT)
			perror("inotify_add_watch");
		if (ret > high_wd)
			high_wd = ret;
		if (ret < low_wd)
			low_wd = ret;
		pthread_yield();
	}

	return NULL;
}

/* run from low_wd to high_wd closing all watches in between */
void *close_watches(void *ptr)
{
	struct operator_struct *operator_arg = ptr;
	int inotify_fd = operator_arg->inotify_fd;
	int i;

	fprintf(stderr, "Starting a thread to close watchers\n");

	while (!stopped) {
		for (i = low_wd; i < high_wd; i++)
			inotify_rm_watch(inotify_fd, i);
		pthread_yield();
	}
	return NULL;
}

/* run from low_wd to low_wd+3 closing all watch in between just for extra races */
void *zero_close_watches(void *ptr)
{
	struct operator_struct *operator_arg = ptr;
	int inotify_fd = operator_arg->inotify_fd;
	int i;
	while (!stopped) {
		for (i = low_wd; i <= low_wd+3; i++)
			inotify_rm_watch(inotify_fd, i);
		pthread_yield();
	}
	return NULL;
}

void *mount_tmpdir(__attribute__ ((unused)) void *ptr)
{
	int rc;

	while (!stopped) {
		rc = mount(TMP_DIR_NAME, TMP_DIR_NAME, "tmpfs", MS_MGC_VAL, "rootcontext=\"unconfined_u:object_r:tmp_t:s0\"");
		usleep(100000);
		if (!rc)
			umount2(TMP_DIR_NAME, MNT_DETACH);
		usleep(100000);
	}
	return NULL;
}

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 0/9] fsnotify: change locking order
  2011-08-01 20:38 ` [PATCH 0/9] fsnotify: change locking order Eric Paris
@ 2011-08-11 23:13   ` Lino Sanfilippo
  2012-03-20 18:49     ` Luis Henriques
  0 siblings, 1 reply; 20+ messages in thread
From: Lino Sanfilippo @ 2011-08-11 23:13 UTC (permalink / raw)
  To: Eric Paris; +Cc: viro, linux-fsdevel

On Mon, Aug 01, 2011 at 04:38:22PM -0400, Eric Paris wrote:
> 
> I finally built and tested a v3.0 kernel with these patches (I know I'm
> SOOOOOO far behind).  Not what I hoped for:
> 
> > [  150.937798] VFS: Busy inodes after unmount of tmpfs. Self-destruct in 5 seconds.  Have a nice day...
> > [  150.945290] BUG: unable to handle kernel NULL pointer dereference at 0000000000000070
> > [  150.946012] IP: [<ffffffff810ffd58>] shmem_free_inode+0x18/0x50
> > [  150.946012] PGD 2bf9e067 PUD 2bf9f067 PMD 0 
> > [  150.946012] Oops: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
> > [  150.946012] CPU 0 
> > [  150.946012] Modules linked in: nfs lockd fscache auth_rpcgss nfs_acl sunrpc ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_filter ip6_tables ext4 jbd2 crc16 joydev ata_piix i2c_piix4 pcspkr uinput ipv6 autofs4 usbhid [last unloaded: scsi_wait_scan]
> > [  150.946012] 
> > [  150.946012] Pid: 2764, comm: syscall_thrash Not tainted 3.0.0+ #1 Red Hat KVM
> > [  150.946012] RIP: 0010:[<ffffffff810ffd58>]  [<ffffffff810ffd58>] shmem_free_inode+0x18/0x50
> > [  150.946012] RSP: 0018:ffff88002c2e5df8  EFLAGS: 00010282
> > [  150.946012] RAX: 000000004e370d9f RBX: 0000000000000000 RCX: ffff88003a029438
> > [  150.946012] RDX: 0000000033630a5f RSI: 0000000000000000 RDI: ffff88003491c240
> > [  150.946012] RBP: ffff88002c2e5e08 R08: 0000000000000000 R09: 0000000000000000
> > [  150.946012] R10: 0000000000000000 R11: 0000000000000000 R12: ffff88003a029428
> > [  150.946012] R13: ffff88003a029428 R14: ffff88003a029428 R15: ffff88003499a610
> > [  150.946012] FS:  00007f5a05420700(0000) GS:ffff88003f600000(0000) knlGS:0000000000000000
> > [  150.946012] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> > [  150.946012] CR2: 0000000000000070 CR3: 000000002a662000 CR4: 00000000000006f0
> > [  150.946012] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> > [  150.946012] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> > [  150.946012] Process syscall_thrash (pid: 2764, threadinfo ffff88002c2e4000, task ffff88002bfbc760)
> > [  150.946012] Stack:
> > [  150.946012]  ffff88003a029438 ffff88003a029428 ffff88002c2e5e38 ffffffff81102f76
> > [  150.946012]  ffff88003a029438 ffff88003a029598 ffffffff8160f9c0 ffff88002c221250
> > [  150.946012]  ffff88002c2e5e68 ffffffff8115e9be ffff88002c2e5e68 ffff88003a029438
> > [  150.946012] Call Trace:
> > [  150.946012]  [<ffffffff81102f76>] shmem_evict_inode+0x76/0x130
> > [  150.946012]  [<ffffffff8115e9be>] evict+0x7e/0x170
> > [  150.946012]  [<ffffffff8115ee40>] iput_final+0xd0/0x190
> > [  150.946012]  [<ffffffff8115ef33>] iput+0x33/0x40
> > [  150.946012]  [<ffffffff81180205>] fsnotify_destroy_mark_locked+0x145/0x160
> > [  150.946012]  [<ffffffff81180316>] fsnotify_destroy_mark+0x36/0x50
> > [  150.946012]  [<ffffffff81181937>] sys_inotify_rm_watch+0x77/0xd0
> > [  150.946012]  [<ffffffff815aca52>] system_call_fastpath+0x16/0x1b
> > [  150.946012] Code: 67 4a 00 b8 e4 ff ff ff eb aa 66 0f 1f 84 00 00 00 00 00 55 48 89 e5 48 83 ec 10 48 89 1c 24 4c 89 64 24 08 48 8b 9f 40 05 00 00 
> > [  150.946012]  83 7b 70 00 74 1c 4c 8d a3 80 00 00 00 4c 89 e7 e8 d2 5d 4a 
> > [  150.946012] RIP  [<ffffffff810ffd58>] shmem_free_inode+0x18/0x50
> > [  150.946012]  RSP <ffff88002c2e5df8>
> > [  150.946012] CR2: 0000000000000070
> 
> Looks at aweful lot like the problem from:
> http://www.spinics.net/lists/linux-fsdevel/msg46101.html
> 

I tried to reproduce this bug with your test program, but without success.
However, if I understand correctly, this occurs since we dont hold any locks when
we call iput() in mark_destroy(), right?
With the patches you tested, iput() is also not called within any lock, since the
groups mark_mutex is released temporarily before iput() is called.  This is, since
the original codes behaviour is similar.
However since we now have a mutex as the biggest lock, we can do what you 
suggested (http://www.spinics.net/lists/linux-fsdevel/msg46107.html) and 
call iput() with the mutex held to avoid the race.
The patch below implements this. It uses nested locking to avoid deadlock in case
we do the final iput() on an inode which still holds marks and thus would take
the mutex again when calling fsnotify_inode_delete() in destroy_inode().


Signed-off-by: Lino Sanfilippo <LinoSanfilippo@gmx.de>
---
 fs/notify/mark.c |   10 +++++-----
 1 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/fs/notify/mark.c b/fs/notify/mark.c
index 1d1cc60..acd2072 100644
--- a/fs/notify/mark.c
+++ b/fs/notify/mark.c
@@ -153,6 +153,8 @@ void fsnotify_destroy_mark_locked(struct fsnotify_mark *mark,
 
 	spin_unlock(&mark->lock);
 
+	if (inode && (mark->flags & FSNOTIFY_MARK_FLAG_OBJECT_PINNED))
+		iput(inode);
 	/* release lock temporarily */
 	mutex_unlock(&group->mark_mutex);
 
@@ -181,17 +183,15 @@ void fsnotify_destroy_mark_locked(struct fsnotify_mark *mark,
 	 * is just a lazy update (and could be a perf win...)
 	 */
 
-	if (inode && (mark->flags & FSNOTIFY_MARK_FLAG_OBJECT_PINNED))
-		iput(inode);
 	atomic_dec(&group->num_marks);
 
-	mutex_lock(&group->mark_mutex);
+	mutex_lock_nested(&group->mark_mutex, SINGLE_DEPTH_NESTING);
 }
 
 void fsnotify_destroy_mark(struct fsnotify_mark *mark,
 			   struct fsnotify_group *group)
 {
-	mutex_lock(&group->mark_mutex);
+	mutex_lock_nested(&group->mark_mutex, SINGLE_DEPTH_NESTING);
 	fsnotify_destroy_mark_locked(mark, group);
 	mutex_unlock(&group->mark_mutex);
 }
@@ -298,7 +298,7 @@ void fsnotify_clear_marks_by_group_flags(struct fsnotify_group *group,
 {
 	struct fsnotify_mark *lmark, *mark;
 
-	mutex_lock(&group->mark_mutex);
+	mutex_lock_nested(&group->mark_mutex, SINGLE_DEPTH_NESTING);
 	list_for_each_entry_safe(mark, lmark, &group->marks_list, g_list) {
 		if (mark->flags & flags) {
 			fsnotify_get_mark(mark);
-- 
1.5.6.5



^ permalink raw reply related	[flat|nested] 20+ messages in thread

* Re: [PATCH 0/9] fsnotify: change locking order
  2011-08-11 23:13   ` Lino Sanfilippo
@ 2012-03-20 18:49     ` Luis Henriques
  2012-03-20 18:58       ` Eric Paris
  0 siblings, 1 reply; 20+ messages in thread
From: Luis Henriques @ 2012-03-20 18:49 UTC (permalink / raw)
  To: Lino Sanfilippo; +Cc: Eric Paris, viro, linux-fsdevel


Hi Lino,

Lino Sanfilippo <LinoSanfilippo@gmx.de> writes:

> On Mon, Aug 01, 2011 at 04:38:22PM -0400, Eric Paris wrote:
...
>> 
>> Looks at aweful lot like the problem from:
>> http://www.spinics.net/lists/linux-fsdevel/msg46101.html
>> 
>
> I tried to reproduce this bug with your test program, but without success.
> However, if I understand correctly, this occurs since we dont hold any locks when
> we call iput() in mark_destroy(), right?
> With the patches you tested, iput() is also not called within any lock, since the
> groups mark_mutex is released temporarily before iput() is called.  This is, since
> the original codes behaviour is similar.
> However since we now have a mutex as the biggest lock, we can do what you 
> suggested (http://www.spinics.net/lists/linux-fsdevel/msg46107.html) and 
> call iput() with the mutex held to avoid the race.
> The patch below implements this. It uses nested locking to avoid deadlock in case
> we do the final iput() on an inode which still holds marks and thus would take
> the mutex again when calling fsnotify_inode_delete() in destroy_inode().

I know it's been a while since you posted this series, but I was
wondering if there has been any progress.  Is there anyone working on
this, or is it stalled?

Cheers,
--
Luis

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 0/9] fsnotify: change locking order
  2012-03-20 18:49     ` Luis Henriques
@ 2012-03-20 18:58       ` Eric Paris
  2012-03-20 19:05         ` Luis Henriques
                           ` (2 more replies)
  0 siblings, 3 replies; 20+ messages in thread
From: Eric Paris @ 2012-03-20 18:58 UTC (permalink / raw)
  To: Luis Henriques; +Cc: Lino Sanfilippo, viro, linux-fsdevel

I'm pretty sure it stalled on me.  I pushed his work to my tree months
and months ago and never sent it along to Linus.  I'll try to make sure
to do that this window.

-Eric

On Tue, 2012-03-20 at 18:49 +0000, Luis Henriques wrote:
> Hi Lino,
> 
> Lino Sanfilippo <LinoSanfilippo@gmx.de> writes:
> 
> > On Mon, Aug 01, 2011 at 04:38:22PM -0400, Eric Paris wrote:
> ...
> >> 
> >> Looks at aweful lot like the problem from:
> >> http://www.spinics.net/lists/linux-fsdevel/msg46101.html
> >> 
> >
> > I tried to reproduce this bug with your test program, but without success.
> > However, if I understand correctly, this occurs since we dont hold any locks when
> > we call iput() in mark_destroy(), right?
> > With the patches you tested, iput() is also not called within any lock, since the
> > groups mark_mutex is released temporarily before iput() is called.  This is, since
> > the original codes behaviour is similar.
> > However since we now have a mutex as the biggest lock, we can do what you 
> > suggested (http://www.spinics.net/lists/linux-fsdevel/msg46107.html) and 
> > call iput() with the mutex held to avoid the race.
> > The patch below implements this. It uses nested locking to avoid deadlock in case
> > we do the final iput() on an inode which still holds marks and thus would take
> > the mutex again when calling fsnotify_inode_delete() in destroy_inode().
> 
> I know it's been a while since you posted this series, but I was
> wondering if there has been any progress.  Is there anyone working on
> this, or is it stalled?
> 
> Cheers,
> --
> Luis



^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 0/9] fsnotify: change locking order
  2012-03-20 18:58       ` Eric Paris
@ 2012-03-20 19:05         ` Luis Henriques
  2012-03-22 22:14         ` Lino Sanfilippo
  2012-03-26 11:27         ` Luis Henriques
  2 siblings, 0 replies; 20+ messages in thread
From: Luis Henriques @ 2012-03-20 19:05 UTC (permalink / raw)
  To: Eric Paris; +Cc: Lino Sanfilippo, viro, linux-fsdevel

Eric Paris <eparis@redhat.com> writes:

> I'm pretty sure it stalled on me.  I pushed his work to my tree months
> and months ago and never sent it along to Linus.  I'll try to make sure
> to do that this window.

Great, thanks a lot.
--
Luis

> -Eric
>
> On Tue, 2012-03-20 at 18:49 +0000, Luis Henriques wrote:
>> Hi Lino,
>> 
>> Lino Sanfilippo <LinoSanfilippo@gmx.de> writes:
>> 
>> > On Mon, Aug 01, 2011 at 04:38:22PM -0400, Eric Paris wrote:
>> ...
>> >> 
>> >> Looks at aweful lot like the problem from:
>> >> http://www.spinics.net/lists/linux-fsdevel/msg46101.html
>> >> 
>> >
>> > I tried to reproduce this bug with your test program, but without success.
>> > However, if I understand correctly, this occurs since we dont hold any locks when
>> > we call iput() in mark_destroy(), right?
>> > With the patches you tested, iput() is also not called within any lock, since the
>> > groups mark_mutex is released temporarily before iput() is called.  This is, since
>> > the original codes behaviour is similar.
>> > However since we now have a mutex as the biggest lock, we can do what you 
>> > suggested (http://www.spinics.net/lists/linux-fsdevel/msg46107.html) and 
>> > call iput() with the mutex held to avoid the race.
>> > The patch below implements this. It uses nested locking to avoid deadlock in case
>> > we do the final iput() on an inode which still holds marks and thus would take
>> > the mutex again when calling fsnotify_inode_delete() in destroy_inode().
>> 
>> I know it's been a while since you posted this series, but I was
>> wondering if there has been any progress.  Is there anyone working on
>> this, or is it stalled?
>> 
>> Cheers,
>> --
>> Luis
>
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 0/9] fsnotify: change locking order
  2012-03-20 18:58       ` Eric Paris
  2012-03-20 19:05         ` Luis Henriques
@ 2012-03-22 22:14         ` Lino Sanfilippo
  2012-03-26 11:27         ` Luis Henriques
  2 siblings, 0 replies; 20+ messages in thread
From: Lino Sanfilippo @ 2012-03-22 22:14 UTC (permalink / raw)
  To: Eric Paris; +Cc: henrix, viro, linux-fsdevel

On Tue, Mar 20, 2012 at 02:58:19PM -0400, Eric Paris wrote:
> I'm pretty sure it stalled on me.  I pushed his work to my tree months
> and months ago and never sent it along to Linus.  I'll try to make sure
> to do that this window.
> 

Hi,

thats good to hear. So the patches passed all your test cases now?

Lino

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 0/9] fsnotify: change locking order
  2012-03-20 18:58       ` Eric Paris
  2012-03-20 19:05         ` Luis Henriques
  2012-03-22 22:14         ` Lino Sanfilippo
@ 2012-03-26 11:27         ` Luis Henriques
  2012-03-26 15:12           ` Eric Paris
  2 siblings, 1 reply; 20+ messages in thread
From: Luis Henriques @ 2012-03-26 11:27 UTC (permalink / raw)
  To: Eric Paris; +Cc: Lino Sanfilippo, viro, linux-fsdevel, luis.henriques


Hi Eric,

Eric Paris <eparis@redhat.com> writes:

> I'm pretty sure it stalled on me.  I pushed his work to my tree months
> and months ago and never sent it along to Linus.  I'll try to make sure
> to do that this window.

Are there any news on this?  Do you believe this patchset will make it
into this windows?  Also, I believe this is something that should be
considered to stable as well.

Cheers,
--
Luis

>
> -Eric
>
> On Tue, 2012-03-20 at 18:49 +0000, Luis Henriques wrote:
>> Hi Lino,
>> 
>> Lino Sanfilippo <LinoSanfilippo@gmx.de> writes:
>> 
>> > On Mon, Aug 01, 2011 at 04:38:22PM -0400, Eric Paris wrote:
>> ...
>> >> 
>> >> Looks at aweful lot like the problem from:
>> >> http://www.spinics.net/lists/linux-fsdevel/msg46101.html
>> >> 
>> >
>> > I tried to reproduce this bug with your test program, but without success.
>> > However, if I understand correctly, this occurs since we dont hold any locks when
>> > we call iput() in mark_destroy(), right?
>> > With the patches you tested, iput() is also not called within any lock, since the
>> > groups mark_mutex is released temporarily before iput() is called.  This is, since
>> > the original codes behaviour is similar.
>> > However since we now have a mutex as the biggest lock, we can do what you 
>> > suggested (http://www.spinics.net/lists/linux-fsdevel/msg46107.html) and 
>> > call iput() with the mutex held to avoid the race.
>> > The patch below implements this. It uses nested locking to avoid deadlock in case
>> > we do the final iput() on an inode which still holds marks and thus would take
>> > the mutex again when calling fsnotify_inode_delete() in destroy_inode().
>> 
>> I know it's been a while since you posted this series, but I was
>> wondering if there has been any progress.  Is there anyone working on
>> this, or is it stalled?
>> 
>> Cheers,
>> --
>> Luis
>
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 0/9] fsnotify: change locking order
  2012-03-26 11:27         ` Luis Henriques
@ 2012-03-26 15:12           ` Eric Paris
  2012-03-26 15:27             ` Luis Henriques
  0 siblings, 1 reply; 20+ messages in thread
From: Eric Paris @ 2012-03-26 15:12 UTC (permalink / raw)
  To: Luis Henriques; +Cc: Lino Sanfilippo, viro, linux-fsdevel

On Mon, 2012-03-26 at 12:27 +0100, Luis Henriques wrote:
> Hi Eric,
> 
> Eric Paris <eparis@redhat.com> writes:
> 
> > I'm pretty sure it stalled on me.  I pushed his work to my tree months
> > and months ago and never sent it along to Linus.  I'll try to make sure
> > to do that this window.
> 
> Are there any news on this?  Do you believe this patchset will make it
> into this windows?  Also, I believe this is something that should be
> considered to stable as well.

Got back to things and learned I lied.  I committed to my tree, but
never sent to next.  I will wait until next window to push to Linus.  I
am not going to try to force large locking changes at the last second.
Sorry   :-(

This is all my fault.  I am sorry and need to be a better maintainer.

-Eric


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 0/9] fsnotify: change locking order
  2012-03-26 15:12           ` Eric Paris
@ 2012-03-26 15:27             ` Luis Henriques
  2012-03-26 22:51               ` Lino Sanfilippo
  0 siblings, 1 reply; 20+ messages in thread
From: Luis Henriques @ 2012-03-26 15:27 UTC (permalink / raw)
  To: Eric Paris; +Cc: Lino Sanfilippo, viro, linux-fsdevel

On Mon, Mar 26, 2012 at 11:12:56AM -0400, Eric Paris wrote:
> On Mon, 2012-03-26 at 12:27 +0100, Luis Henriques wrote:
> > Hi Eric,
> > 
> > Eric Paris <eparis@redhat.com> writes:
> > 
> > > I'm pretty sure it stalled on me.  I pushed his work to my tree months
> > > and months ago and never sent it along to Linus.  I'll try to make sure
> > > to do that this window.
> > 
> > Are there any news on this?  Do you believe this patchset will make it
> > into this windows?  Also, I believe this is something that should be
> > considered to stable as well.
> 
> Got back to things and learned I lied.  I committed to my tree, but
> never sent to next.  I will wait until next window to push to Linus.  I
> am not going to try to force large locking changes at the last second.
> Sorry   :-(
> 
> This is all my fault.  I am sorry and need to be a better maintainer.

No problem, and thank you for the update.  I guess this is the sort of
patches that actually requires a good amount of testing on -next, so
it makes sense to wait for next window.

Cheers,
-- 
Luis

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 0/9] fsnotify: change locking order
  2012-03-26 15:27             ` Luis Henriques
@ 2012-03-26 22:51               ` Lino Sanfilippo
  0 siblings, 0 replies; 20+ messages in thread
From: Lino Sanfilippo @ 2012-03-26 22:51 UTC (permalink / raw)
  To: Luis Henriques; +Cc: eparis, viro, linux-fsdevel

On Mon, Mar 26, 2012 at 04:27:42PM +0100, Luis Henriques wrote:
> 
> No problem, and thank you for the update.  I guess this is the sort of
> patches that actually requires a good amount of testing on -next, so
> it makes sense to wait for next window.
> 

I totally agree. Better to have this in linux-next for a while and make
sure that no nasty side effects have been introduced.

^ permalink raw reply	[flat|nested] 20+ messages in thread

end of thread, other threads:[~2012-03-26 22:51 UTC | newest]

Thread overview: 20+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2011-06-14 15:29 [PATCH 0/9] fsnotify: change locking order Lino Sanfilippo
2011-06-14 15:29 ` [PATCH 1/9] inotify, fanotify: replace fsnotify_put_group() with fsnotify_destroy_group() Lino Sanfilippo
2011-06-14 15:29 ` [PATCH 2/9] fsnotify: introduce fsnotify_get_group() Lino Sanfilippo
2011-06-14 15:29 ` [PATCH 3/9] fsnotify: use reference counting for groups Lino Sanfilippo
2011-06-14 15:29 ` [PATCH 4/9] fsnotify: take groups mark_lock before mark lock Lino Sanfilippo
2011-06-14 15:29 ` [PATCH 5/9] fanotify: add an extra flag to mark_remove_from_mask that indicates wheather a mark should be destroyed Lino Sanfilippo
2011-06-14 15:29 ` [PATCH 6/9] fsnotify: use a mutex instead of a spinlock to protect a groups mark list Lino Sanfilippo
2011-06-14 15:29 ` [PATCH 7/9] fsnotify: pass group to fsnotify_destroy_mark() Lino Sanfilippo
2011-06-14 15:29 ` [PATCH 8/9] fsnotify: introduce locked versions of fsnotify_add_mark() and fsnotify_remove_mark() Lino Sanfilippo
2011-06-14 15:29 ` [PATCH 9/9] fsnotify: dont put marks on temporary list when clearing marks by group Lino Sanfilippo
2011-08-01 20:38 ` [PATCH 0/9] fsnotify: change locking order Eric Paris
2011-08-11 23:13   ` Lino Sanfilippo
2012-03-20 18:49     ` Luis Henriques
2012-03-20 18:58       ` Eric Paris
2012-03-20 19:05         ` Luis Henriques
2012-03-22 22:14         ` Lino Sanfilippo
2012-03-26 11:27         ` Luis Henriques
2012-03-26 15:12           ` Eric Paris
2012-03-26 15:27             ` Luis Henriques
2012-03-26 22:51               ` Lino Sanfilippo

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.