[PATCH 0/22] fsnotify: Avoid SRCU stalls with fanotify permission events


* [PATCH 0/22] fsnotify: Avoid SRCU stalls with fanotify permission events
@ 2016-12-22  9:15 Jan Kara
  2016-12-22  9:15 ` [PATCH 01/22] fsnotify: Remove unnecessary tests when showing fdinfo Jan Kara
                   ` (22 more replies)
  0 siblings, 23 replies; 80+ messages in thread
From: Jan Kara @ 2016-12-22  9:15 UTC (permalink / raw)
  To: linux-fsdevel
  Cc: Amir Goldstein, Lino Sanfilippo, Miklos Szeredi, Paul Moore, Jan Kara

Hello,

currently, fanotify waits for a response to a permission event from a userspace
process while holding the fsnotify_mark_srcu lock. As a consequence, when the
userspace process takes long to respond or does not respond at all, the
fsnotify_mark_srcu grace period can never complete, which blocks reclaim of any
notification marks and also blocks any process that called synchronize_srcu()
on fsnotify_mark_srcu. Effectively, this eventually blocks anybody interacting
with the notification subsystem. Miklos has some real-world reports of this
happening. Although this is in principle a problem of a broken userspace
application (which furthermore has to have CAP_SYS_ADMIN in init_user_ns, so
it is not a security problem), it is still nasty that a simple error can
block the kernel like this.

This patch set solves this problem. The basic idea of the solution is that
when fanotify needs to wait for response from userspace process, it grabs
reference to the mark which generated the event and drops fsnotify_mark_srcu
lock. When userspace responds, we grab fsnotify_mark_srcu again, drop
the mark reference, and continue iterating the list of marks attached to the
inode / vfsmount delivering the event to other notification groups. What
complicates this simple approach is that the mark for which we wait for
response has to stay pinned in the list of marks attached to the inode /
vfsmount so that we can resume iteration of the list when userspace responds
but on the other hand when the inode gets unlinked while we wait for userspace
response, we need to destroy the mark (or at least detach it from the inode).
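
As a rough illustration of the basic idea only, the following userspace model
pins a mark with a reference, drops the read-side lock while waiting, and
re-takes it before resuming the list walk. All names here (struct mark,
read_lock(), wait_for_response(), deliver_event()) are hypothetical stand-ins
and not the kernel's API; the lock is just a counter, and the sketch ignores
the unlink case described above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins; these are not the kernel structures. */
struct mark {
	struct mark *next;	/* next mark attached to the same object */
	int refcnt;		/* pins the mark in memory while > 0 */
	int needs_response;	/* models a fanotify permission event */
	int delivered;		/* set once the event reached this mark */
};

static int read_lock_depth;		/* models the fsnotify_mark_srcu read side */
static int max_depth_while_waiting;	/* deepest lock nesting seen while waiting */

static void read_lock(void)   { read_lock_depth++; }
static void read_unlock(void) { read_lock_depth--; }

/* Models blocking on userspace; records whether the lock was still held. */
static void wait_for_response(void)
{
	if (read_lock_depth > max_depth_while_waiting)
		max_depth_while_waiting = read_lock_depth;
}

/* Returns the number of marks the event was delivered to. */
static int deliver_event(struct mark *head)
{
	struct mark *m;
	int count = 0;

	read_lock();
	for (m = head; m; m = m->next) {
		if (m->needs_response) {
			m->refcnt++;		/* pin the mark ... */
			read_unlock();		/* ... and drop the lock */
			wait_for_response();
			read_lock();		/* re-take before using m->next */
			m->refcnt--;
		}
		m->delivered = 1;
		count++;
	}
	read_unlock();
	return count;
}
```

In this model the wait never happens with the read lock held, which is the
property the series is after; keeping the pinned mark on the list while the
object goes away is the hard part and is not modelled here.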

The first 5 patches contain some initial fixes and cleanups. Patches 6-8
implement attaching of marks to inode / vfsmount via a dedicated structure
which allows us to detach list of marks from the object without having to
destroy the list itself. Patches 9-10 implement removal of mark from the
list of marks attached to an object when last mark reference is dropped.
Patches 11-15 then implement dropping of SRCU lock when waiting on response
from userspace. Patches 16-22 are mostly trivial cleanups that get rid of
trivial wrappers and one pointer in the mark structure.

Patches have survived testing with inotify/fanotify tests in LTP. I didn't test
audit - Paul can you give these patches some testing?  Since some of the
changes are really non-trivial, I'd welcome if someone reviewed the patch set.
Thanks!

Finally, to ease experimenting with the patches I've pushed them out to
git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs.git for_testing

								Honza


* [PATCH 01/22] fsnotify: Remove unnecessary tests when showing fdinfo
  2016-12-22  9:15 [PATCH 0/22] fsnotify: Avoid SRCU stalls with fanotify permission events Jan Kara
@ 2016-12-22  9:15 ` Jan Kara
  2016-12-22 12:59   ` Amir Goldstein
  2016-12-22  9:15 ` [PATCH 02/22] inotify: Remove inode pointers from debug messages Jan Kara
                   ` (21 subsequent siblings)
  22 siblings, 1 reply; 80+ messages in thread
From: Jan Kara @ 2016-12-22  9:15 UTC (permalink / raw)
  To: linux-fsdevel
  Cc: Amir Goldstein, Lino Sanfilippo, Miklos Szeredi, Paul Moore, Jan Kara

show_fdinfo() iterates group's list of marks. All marks found there are
guaranteed to be alive and they stay so until we release
group->mark_mutex. So remove unnecessary tests whether mark is alive.

Signed-off-by: Jan Kara <jack@suse.cz>
---
 fs/notify/fdinfo.c | 6 +-----
 1 file changed, 1 insertion(+), 5 deletions(-)

diff --git a/fs/notify/fdinfo.c b/fs/notify/fdinfo.c
index fd98e5100cab..601a59c8d87e 100644
--- a/fs/notify/fdinfo.c
+++ b/fs/notify/fdinfo.c
@@ -76,8 +76,7 @@ static void inotify_fdinfo(struct seq_file *m, struct fsnotify_mark *mark)
 	struct inotify_inode_mark *inode_mark;
 	struct inode *inode;
 
-	if (!(mark->flags & FSNOTIFY_MARK_FLAG_ALIVE) ||
-	    !(mark->flags & FSNOTIFY_MARK_FLAG_INODE))
+	if (!(mark->flags & FSNOTIFY_MARK_FLAG_INODE))
 		return;
 
 	inode_mark = container_of(mark, struct inotify_inode_mark, fsn_mark);
@@ -113,9 +112,6 @@ static void fanotify_fdinfo(struct seq_file *m, struct fsnotify_mark *mark)
 	unsigned int mflags = 0;
 	struct inode *inode;
 
-	if (!(mark->flags & FSNOTIFY_MARK_FLAG_ALIVE))
-		return;
-
 	if (mark->flags & FSNOTIFY_MARK_FLAG_IGNORED_SURV_MODIFY)
 		mflags |= FAN_MARK_IGNORED_SURV_MODIFY;
 
-- 
2.10.2



* [PATCH 02/22] inotify: Remove inode pointers from debug messages
  2016-12-22  9:15 [PATCH 0/22] fsnotify: Avoid SRCU stalls with fanotify permission events Jan Kara
  2016-12-22  9:15 ` [PATCH 01/22] fsnotify: Remove unnecessary tests when showing fdinfo Jan Kara
@ 2016-12-22  9:15 ` Jan Kara
  2016-12-22 15:31   ` Amir Goldstein
  2016-12-22  9:15 ` [PATCH 03/22] fanotify: Move recalculation of inode / vfsmount mask under mark_mutex Jan Kara
                   ` (20 subsequent siblings)
  22 siblings, 1 reply; 80+ messages in thread
From: Jan Kara @ 2016-12-22  9:15 UTC (permalink / raw)
  To: linux-fsdevel
  Cc: Amir Goldstein, Lino Sanfilippo, Miklos Szeredi, Paul Moore, Jan Kara

Printing inode pointers in warnings has dubious value and with future
changes we won't be able to easily get them without either taking locks
or risking an oops along the way. So just remove inode pointers from the
warning messages.

Signed-off-by: Jan Kara <jack@suse.cz>
---
 fs/notify/inotify/inotify_fsnotify.c |  4 ++--
 fs/notify/inotify/inotify_user.c     | 25 ++++++++++---------------
 2 files changed, 12 insertions(+), 17 deletions(-)

diff --git a/fs/notify/inotify/inotify_fsnotify.c b/fs/notify/inotify/inotify_fsnotify.c
index 19e7ec109a75..8421f44b3cb3 100644
--- a/fs/notify/inotify/inotify_fsnotify.c
+++ b/fs/notify/inotify/inotify_fsnotify.c
@@ -155,8 +155,8 @@ static int idr_callback(int id, void *p, void *data)
 	 * BUG() that was here.
 	 */
 	if (fsn_mark)
-		printk(KERN_WARNING "fsn_mark->group=%p inode=%p wd=%d\n",
-			fsn_mark->group, fsn_mark->inode, i_mark->wd);
+		printk(KERN_WARNING "fsn_mark->group=%p wd=%d\n",
+			fsn_mark->group, i_mark->wd);
 	return 0;
 }
 
diff --git a/fs/notify/inotify/inotify_user.c b/fs/notify/inotify/inotify_user.c
index 69d1ea3d292a..3697567c7897 100644
--- a/fs/notify/inotify/inotify_user.c
+++ b/fs/notify/inotify/inotify_user.c
@@ -431,18 +431,16 @@ static void inotify_remove_from_idr(struct fsnotify_group *group,
 	 * if it wasn't....
 	 */
 	if (wd == -1) {
-		WARN_ONCE(1, "%s: i_mark=%p i_mark->wd=%d i_mark->group=%p"
-			" i_mark->inode=%p\n", __func__, i_mark, i_mark->wd,
-			i_mark->fsn_mark.group, i_mark->fsn_mark.inode);
+		WARN_ONCE(1, "%s: i_mark=%p i_mark->wd=%d i_mark->group=%p\n",
+			__func__, i_mark, i_mark->wd, i_mark->fsn_mark.group);
 		goto out;
 	}
 
 	/* Lets look in the idr to see if we find it */
 	found_i_mark = inotify_idr_find_locked(group, wd);
 	if (unlikely(!found_i_mark)) {
-		WARN_ONCE(1, "%s: i_mark=%p i_mark->wd=%d i_mark->group=%p"
-			" i_mark->inode=%p\n", __func__, i_mark, i_mark->wd,
-			i_mark->fsn_mark.group, i_mark->fsn_mark.inode);
+		WARN_ONCE(1, "%s: i_mark=%p i_mark->wd=%d i_mark->group=%p\n",
+			__func__, i_mark, i_mark->wd, i_mark->fsn_mark.group);
 		goto out;
 	}
 
@@ -453,12 +451,10 @@ static void inotify_remove_from_idr(struct fsnotify_group *group,
 	 */
 	if (unlikely(found_i_mark != i_mark)) {
 		WARN_ONCE(1, "%s: i_mark=%p i_mark->wd=%d i_mark->group=%p "
-			"mark->inode=%p found_i_mark=%p found_i_mark->wd=%d "
-			"found_i_mark->group=%p found_i_mark->inode=%p\n",
-			__func__, i_mark, i_mark->wd, i_mark->fsn_mark.group,
-			i_mark->fsn_mark.inode, found_i_mark, found_i_mark->wd,
-			found_i_mark->fsn_mark.group,
-			found_i_mark->fsn_mark.inode);
+			"found_i_mark=%p found_i_mark->wd=%d "
+			"found_i_mark->group=%p\n", __func__, i_mark,
+			i_mark->wd, i_mark->fsn_mark.group, found_i_mark,
+			found_i_mark->wd, found_i_mark->fsn_mark.group);
 		goto out;
 	}
 
@@ -468,9 +464,8 @@ static void inotify_remove_from_idr(struct fsnotify_group *group,
 	 * one ref grabbed by inotify_idr_find
 	 */
 	if (unlikely(atomic_read(&i_mark->fsn_mark.refcnt) < 3)) {
-		printk(KERN_ERR "%s: i_mark=%p i_mark->wd=%d i_mark->group=%p"
-			" i_mark->inode=%p\n", __func__, i_mark, i_mark->wd,
-			i_mark->fsn_mark.group, i_mark->fsn_mark.inode);
+		printk(KERN_ERR "%s: i_mark=%p i_mark->wd=%d i_mark->group=%p\n",
+			 __func__, i_mark, i_mark->wd, i_mark->fsn_mark.group);
 		/* we can't really recover with bad ref cnting.. */
 		BUG();
 	}
-- 
2.10.2



* [PATCH 03/22] fanotify: Move recalculation of inode / vfsmount mask under mark_mutex
  2016-12-22  9:15 [PATCH 0/22] fsnotify: Avoid SRCU stalls with fanotify permission events Jan Kara
  2016-12-22  9:15 ` [PATCH 01/22] fsnotify: Remove unnecessary tests when showing fdinfo Jan Kara
  2016-12-22  9:15 ` [PATCH 02/22] inotify: Remove inode pointers from debug messages Jan Kara
@ 2016-12-22  9:15 ` Jan Kara
  2016-12-22 16:27   ` Amir Goldstein
  2016-12-22  9:15 ` [PATCH 04/22] fsnotify: Remove fsnotify_duplicate_mark() Jan Kara
                   ` (19 subsequent siblings)
  22 siblings, 1 reply; 80+ messages in thread
From: Jan Kara @ 2016-12-22  9:15 UTC (permalink / raw)
  To: linux-fsdevel
  Cc: Amir Goldstein, Lino Sanfilippo, Miklos Szeredi, Paul Moore, Jan Kara

Move recalculation of inode / vfsmount notification mask under
group->mark_mutex of the mark which was modified. These are the only
places where mask recalculation happens without mark being protected
from detaching from inode / vfsmount which will cause issues with the
following patches.

Signed-off-by: Jan Kara <jack@suse.cz>
---
 fs/notify/fanotify/fanotify_user.c | 15 ++++++---------
 1 file changed, 6 insertions(+), 9 deletions(-)

diff --git a/fs/notify/fanotify/fanotify_user.c b/fs/notify/fanotify/fanotify_user.c
index 7ebfca6a1427..8dcec9eecafd 100644
--- a/fs/notify/fanotify/fanotify_user.c
+++ b/fs/notify/fanotify/fanotify_user.c
@@ -541,6 +541,8 @@ static int fanotify_remove_vfsmount_mark(struct fsnotify_group *group,
 
 	removed = fanotify_mark_remove_from_mask(fsn_mark, mask, flags,
 						 &destroy_mark);
+	if (removed & real_mount(mnt)->mnt_fsnotify_mask)
+		fsnotify_recalc_vfsmount_mask(mnt);
 	if (destroy_mark)
 		fsnotify_detach_mark(fsn_mark);
 	mutex_unlock(&group->mark_mutex);
@@ -548,9 +550,6 @@ static int fanotify_remove_vfsmount_mark(struct fsnotify_group *group,
 		fsnotify_free_mark(fsn_mark);
 
 	fsnotify_put_mark(fsn_mark);
-	if (removed & real_mount(mnt)->mnt_fsnotify_mask)
-		fsnotify_recalc_vfsmount_mask(mnt);
-
 	return 0;
 }
 
@@ -571,6 +570,8 @@ static int fanotify_remove_inode_mark(struct fsnotify_group *group,
 
 	removed = fanotify_mark_remove_from_mask(fsn_mark, mask, flags,
 						 &destroy_mark);
+	if (removed & inode->i_fsnotify_mask)
+		fsnotify_recalc_inode_mask(inode);
 	if (destroy_mark)
 		fsnotify_detach_mark(fsn_mark);
 	mutex_unlock(&group->mark_mutex);
@@ -579,8 +580,6 @@ static int fanotify_remove_inode_mark(struct fsnotify_group *group,
 
 	/* matches the fsnotify_find_inode_mark() */
 	fsnotify_put_mark(fsn_mark);
-	if (removed & inode->i_fsnotify_mask)
-		fsnotify_recalc_inode_mask(inode);
 
 	return 0;
 }
@@ -656,10 +655,9 @@ static int fanotify_add_vfsmount_mark(struct fsnotify_group *group,
 		}
 	}
 	added = fanotify_mark_add_to_mask(fsn_mark, mask, flags);
-	mutex_unlock(&group->mark_mutex);
-
 	if (added & ~real_mount(mnt)->mnt_fsnotify_mask)
 		fsnotify_recalc_vfsmount_mask(mnt);
+	mutex_unlock(&group->mark_mutex);
 
 	fsnotify_put_mark(fsn_mark);
 	return 0;
@@ -694,10 +692,9 @@ static int fanotify_add_inode_mark(struct fsnotify_group *group,
 		}
 	}
 	added = fanotify_mark_add_to_mask(fsn_mark, mask, flags);
-	mutex_unlock(&group->mark_mutex);
-
 	if (added & ~inode->i_fsnotify_mask)
 		fsnotify_recalc_inode_mask(inode);
+	mutex_unlock(&group->mark_mutex);
 
 	fsnotify_put_mark(fsn_mark);
 	return 0;
-- 
2.10.2



* [PATCH 04/22] fsnotify: Remove fsnotify_duplicate_mark()
  2016-12-22  9:15 [PATCH 0/22] fsnotify: Avoid SRCU stalls with fanotify permission events Jan Kara
                   ` (2 preceding siblings ...)
  2016-12-22  9:15 ` [PATCH 03/22] fanotify: Move recalculation of inode / vfsmount mask under mark_mutex Jan Kara
@ 2016-12-22  9:15 ` Jan Kara
  2016-12-22 23:13   ` Paul Moore
  2016-12-22  9:15 ` [PATCH 05/22] audit: Fix sleep in atomic Jan Kara
                   ` (18 subsequent siblings)
  22 siblings, 1 reply; 80+ messages in thread
From: Jan Kara @ 2016-12-22  9:15 UTC (permalink / raw)
  To: linux-fsdevel
  Cc: Amir Goldstein, Lino Sanfilippo, Miklos Szeredi, Paul Moore, Jan Kara

There are only two call sites of fsnotify_duplicate_mark(). Those are
in kernel/audit_tree.c and both are bogus. Vfsmount pointer is unused
for audit tree, inode pointer and group get set in
fsnotify_add_mark_locked() later anyway, mask and free_mark are already
set in alloc_chunk(). In fact, calling fsnotify_duplicate_mark() is
actively harmful because following fsnotify_add_mark_locked() will leak
group reference by overwriting the group pointer. So just remove the two
calls to fsnotify_duplicate_mark() and the function.

Signed-off-by: Jan Kara <jack@suse.cz>
---
 fs/notify/mark.c                 | 12 ------------
 include/linux/fsnotify_backend.h |  2 --
 kernel/audit_tree.c              |  6 ++----
 3 files changed, 2 insertions(+), 18 deletions(-)

diff --git a/fs/notify/mark.c b/fs/notify/mark.c
index d3fea0bd89e2..6043306e8e21 100644
--- a/fs/notify/mark.c
+++ b/fs/notify/mark.c
@@ -510,18 +510,6 @@ void fsnotify_detach_group_marks(struct fsnotify_group *group)
 	}
 }
 
-void fsnotify_duplicate_mark(struct fsnotify_mark *new, struct fsnotify_mark *old)
-{
-	assert_spin_locked(&old->lock);
-	new->inode = old->inode;
-	new->mnt = old->mnt;
-	if (old->group)
-		fsnotify_get_group(old->group);
-	new->group = old->group;
-	new->mask = old->mask;
-	new->free_mark = old->free_mark;
-}
-
 /*
  * Nothing fancy, just initialize lists and locks and counters.
  */
diff --git a/include/linux/fsnotify_backend.h b/include/linux/fsnotify_backend.h
index 0cf34d6cc253..487246546ebe 100644
--- a/include/linux/fsnotify_backend.h
+++ b/include/linux/fsnotify_backend.h
@@ -323,8 +323,6 @@ extern void fsnotify_init_mark(struct fsnotify_mark *mark, void (*free_mark)(str
 extern struct fsnotify_mark *fsnotify_find_inode_mark(struct fsnotify_group *group, struct inode *inode);
 /* find (and take a reference) to a mark associated with group and vfsmount */
 extern struct fsnotify_mark *fsnotify_find_vfsmount_mark(struct fsnotify_group *group, struct vfsmount *mnt);
-/* copy the values from old into new */
-extern void fsnotify_duplicate_mark(struct fsnotify_mark *new, struct fsnotify_mark *old);
 /* set the ignored_mask of a mark */
 extern void fsnotify_set_mark_ignored_mask_locked(struct fsnotify_mark *mark, __u32 mask);
 /* set the mask of a mark (might pin the object into memory */
diff --git a/kernel/audit_tree.c b/kernel/audit_tree.c
index 8b1dde96a0fa..f3130eb0a4bd 100644
--- a/kernel/audit_tree.c
+++ b/kernel/audit_tree.c
@@ -258,8 +258,7 @@ static void untag_chunk(struct node *p)
 	if (!new)
 		goto Fallback;
 
-	fsnotify_duplicate_mark(&new->mark, entry);
-	if (fsnotify_add_mark(&new->mark, new->mark.group, new->mark.inode, NULL, 1)) {
+	if (fsnotify_add_mark(&new->mark, entry->group, entry->inode, NULL, 1)) {
 		fsnotify_put_mark(&new->mark);
 		goto Fallback;
 	}
@@ -395,8 +394,7 @@ static int tag_chunk(struct inode *inode, struct audit_tree *tree)
 		return -ENOENT;
 	}
 
-	fsnotify_duplicate_mark(chunk_entry, old_entry);
-	if (fsnotify_add_mark(chunk_entry, chunk_entry->group, chunk_entry->inode, NULL, 1)) {
+	if (fsnotify_add_mark(chunk_entry, old_entry->group, old_entry->inode, NULL, 1)) {
 		spin_unlock(&old_entry->lock);
 		fsnotify_put_mark(chunk_entry);
 		fsnotify_put_mark(old_entry);
-- 
2.10.2



* [PATCH 05/22] audit: Fix sleep in atomic
  2016-12-22  9:15 [PATCH 0/22] fsnotify: Avoid SRCU stalls with fanotify permission events Jan Kara
                   ` (3 preceding siblings ...)
  2016-12-22  9:15 ` [PATCH 04/22] fsnotify: Remove fsnotify_duplicate_mark() Jan Kara
@ 2016-12-22  9:15 ` Jan Kara
  2016-12-22 23:18   ` Paul Moore
  2016-12-22  9:15 ` [PATCH 06/22] audit: Abstract hash key handling Jan Kara
                   ` (17 subsequent siblings)
  22 siblings, 1 reply; 80+ messages in thread
From: Jan Kara @ 2016-12-22  9:15 UTC (permalink / raw)
  To: linux-fsdevel
  Cc: Amir Goldstein, Lino Sanfilippo, Miklos Szeredi, Paul Moore, Jan Kara

Audit tree code was happily adding new notification marks while holding
spinlocks. Since fsnotify_add_mark() acquires group->mark_mutex this can
lead to sleeping while holding a spinlock, deadlocks due to lock
inversion, and probably other fun. Fix the problem by acquiring
group->mark_mutex earlier.
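
To illustrate the ordering problem, here is a small userspace model in which
"locks" are plain counters. Every name in it (spin_lock_model(),
mutex_lock_model(), add_mark_model(), tag_chunk_old(), tag_chunk_new()) is
made up for this sketch; it only mimics the shape of the change, it is not
the audit tree code.

```c
#include <assert.h>

static int spinlocks_held;	/* > 0 means we are in atomic context */
static int mutex_held;
static int sleep_in_atomic;	/* counts violations of the locking rule */

static void spin_lock_model(void)   { spinlocks_held++; }
static void spin_unlock_model(void) { spinlocks_held--; }

/* Taking a mutex may sleep, so doing it under a spinlock is a bug. */
static void mutex_lock_model(void)
{
	if (spinlocks_held)
		sleep_in_atomic++;
	mutex_held = 1;
}

static void mutex_unlock_model(void) { mutex_held = 0; }

/* Models a helper that takes the mutex itself. */
static void add_mark_model(void)
{
	mutex_lock_model();
	mutex_unlock_model();
}

/* Models the _locked variant: the caller must already hold the mutex. */
static void add_mark_locked_model(void)
{
	assert(mutex_held);
}

/* Old shape: spinlock first, then a helper that takes the mutex. */
static void tag_chunk_old(void)
{
	spin_lock_model();
	add_mark_model();
	spin_unlock_model();
}

/* New shape: mutex first, spinlock nested inside, _locked helper. */
static void tag_chunk_new(void)
{
	mutex_lock_model();
	spin_lock_model();
	add_mark_locked_model();
	spin_unlock_model();
	mutex_unlock_model();
}
```

The new shape never attempts to take the mutex while a spinlock is held,
at the cost of holding the mutex for longer.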

CC: Paul Moore <paul@paul-moore.com>
Signed-off-by: Jan Kara <jack@suse.cz>
---
 kernel/audit_tree.c | 13 +++++++++++--
 1 file changed, 11 insertions(+), 2 deletions(-)

diff --git a/kernel/audit_tree.c b/kernel/audit_tree.c
index f3130eb0a4bd..156b6a93f4fc 100644
--- a/kernel/audit_tree.c
+++ b/kernel/audit_tree.c
@@ -231,6 +231,7 @@ static void untag_chunk(struct node *p)
 	if (size)
 		new = alloc_chunk(size);
 
+	mutex_lock(&entry->group->mark_mutex);
 	spin_lock(&entry->lock);
 	if (chunk->dead || !entry->inode) {
 		spin_unlock(&entry->lock);
@@ -258,7 +259,8 @@ static void untag_chunk(struct node *p)
 	if (!new)
 		goto Fallback;
 
-	if (fsnotify_add_mark(&new->mark, entry->group, entry->inode, NULL, 1)) {
+	if (fsnotify_add_mark_locked(&new->mark, entry->group, entry->inode,
+				     NULL, 1)) {
 		fsnotify_put_mark(&new->mark);
 		goto Fallback;
 	}
@@ -309,6 +311,7 @@ static void untag_chunk(struct node *p)
 	spin_unlock(&hash_lock);
 	spin_unlock(&entry->lock);
 out:
+	mutex_unlock(&entry->group->mark_mutex);
 	fsnotify_put_mark(entry);
 	spin_lock(&hash_lock);
 }
@@ -385,17 +388,21 @@ static int tag_chunk(struct inode *inode, struct audit_tree *tree)
 
 	chunk_entry = &chunk->mark;
 
+	mutex_lock(&old_entry->group->mark_mutex);
 	spin_lock(&old_entry->lock);
 	if (!old_entry->inode) {
 		/* old_entry is being shot, lets just lie */
 		spin_unlock(&old_entry->lock);
+		mutex_unlock(&old_entry->group->mark_mutex);
 		fsnotify_put_mark(old_entry);
 		free_chunk(chunk);
 		return -ENOENT;
 	}
 
-	if (fsnotify_add_mark(chunk_entry, old_entry->group, old_entry->inode, NULL, 1)) {
+	if (fsnotify_add_mark_locked(chunk_entry, old_entry->group,
+				     old_entry->inode, NULL, 1)) {
 		spin_unlock(&old_entry->lock);
+		mutex_unlock(&old_entry->group->mark_mutex);
 		fsnotify_put_mark(chunk_entry);
 		fsnotify_put_mark(old_entry);
 		return -ENOSPC;
@@ -411,6 +418,7 @@ static int tag_chunk(struct inode *inode, struct audit_tree *tree)
 		chunk->dead = 1;
 		spin_unlock(&chunk_entry->lock);
 		spin_unlock(&old_entry->lock);
+		mutex_unlock(&old_entry->group->mark_mutex);
 
 		fsnotify_destroy_mark(chunk_entry, audit_tree_group);
 
@@ -443,6 +451,7 @@ static int tag_chunk(struct inode *inode, struct audit_tree *tree)
 	spin_unlock(&hash_lock);
 	spin_unlock(&chunk_entry->lock);
 	spin_unlock(&old_entry->lock);
+	mutex_unlock(&old_entry->group->mark_mutex);
 	fsnotify_destroy_mark(old_entry, audit_tree_group);
 	fsnotify_put_mark(chunk_entry);	/* drop initial reference */
 	fsnotify_put_mark(old_entry); /* pair to fsnotify_find mark_entry */
-- 
2.10.2



* [PATCH 06/22] audit: Abstract hash key handling
  2016-12-22  9:15 [PATCH 0/22] fsnotify: Avoid SRCU stalls with fanotify permission events Jan Kara
                   ` (4 preceding siblings ...)
  2016-12-22  9:15 ` [PATCH 05/22] audit: Fix sleep in atomic Jan Kara
@ 2016-12-22  9:15 ` Jan Kara
  2016-12-22 23:27   ` Paul Moore
  2016-12-22  9:15 ` [PATCH 07/22] fsnotify: Update comments Jan Kara
                   ` (16 subsequent siblings)
  22 siblings, 1 reply; 80+ messages in thread
From: Jan Kara @ 2016-12-22  9:15 UTC (permalink / raw)
  To: linux-fsdevel
  Cc: Amir Goldstein, Lino Sanfilippo, Miklos Szeredi, Paul Moore, Jan Kara

Audit tree currently uses inode pointer as a key into the hash table.
Getting that from notification mark will be somewhat more difficult with
coming fsnotify changes and there's no reason we really have to use the
inode pointer. So abstract getting of hash key from the audit chunk and
inode so that we can switch to a different key easily later.
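
The abstraction can be tried out in userspace with a model like the one
below. The struct names and CACHE_LINE_BYTES (an assumed stand-in for
L1_CACHE_BYTES) are hypothetical; only inode_to_key() and chunk_to_key()
mirror names from the patch, and bucket_of() returns an index instead of a
list head.

```c
#include <assert.h>
#include <stddef.h>

#define HASH_SIZE 128
#define CACHE_LINE_BYTES 64	/* assumed stand-in for L1_CACHE_BYTES */

struct inode_model { int dummy; };	/* hypothetical stand-in for struct inode */

struct chunk_model {
	const struct inode_model *inode;	/* stand-in for chunk->mark.inode */
};

/* Key derived from an inode; callers no longer care that it is a pointer. */
static unsigned long inode_to_key(const struct inode_model *inode)
{
	return (unsigned long)inode;
}

/* Key derived from a chunk; key 0 means "not hashable". */
static unsigned long chunk_to_key(const struct chunk_model *chunk)
{
	return (unsigned long)chunk->inode;
}

/* Bucket index computed from the key only. */
static size_t bucket_of(unsigned long key)
{
	return (key / CACHE_LINE_BYTES) % HASH_SIZE;
}
```

Because lookups and insertions go through the two key helpers, changing
what the key is later means changing only those helpers.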

CC: Paul Moore <paul@paul-moore.com>
Signed-off-by: Jan Kara <jack@suse.cz>
---
 kernel/audit_tree.c | 39 ++++++++++++++++++++++++++++-----------
 1 file changed, 28 insertions(+), 11 deletions(-)

diff --git a/kernel/audit_tree.c b/kernel/audit_tree.c
index 156b6a93f4fc..f0859828de09 100644
--- a/kernel/audit_tree.c
+++ b/kernel/audit_tree.c
@@ -163,33 +163,48 @@ enum {HASH_SIZE = 128};
 static struct list_head chunk_hash_heads[HASH_SIZE];
 static __cacheline_aligned_in_smp DEFINE_SPINLOCK(hash_lock);
 
-static inline struct list_head *chunk_hash(const struct inode *inode)
+/* Function to return search key in our hash from inode. */
+static unsigned long inode_to_key(const struct inode *inode)
 {
-	unsigned long n = (unsigned long)inode / L1_CACHE_BYTES;
+	return (unsigned long)inode;
+}
+
+/*
+ * Function to return search key in our hash from chunk. Key 0 is special and
+ * should never be present in the hash.
+ */
+static unsigned long chunk_to_key(struct audit_chunk *chunk)
+{
+	return (unsigned long)chunk->mark.inode;
+}
+
+static inline struct list_head *chunk_hash(unsigned long key)
+{
+	unsigned long n = key / L1_CACHE_BYTES;
 	return chunk_hash_heads + n % HASH_SIZE;
 }
 
 /* hash_lock & entry->lock is held by caller */
 static void insert_hash(struct audit_chunk *chunk)
 {
-	struct fsnotify_mark *entry = &chunk->mark;
+	unsigned long key = chunk_to_key(chunk);
 	struct list_head *list;
 
-	if (!entry->inode)
+	if (!key)
 		return;
-	list = chunk_hash(entry->inode);
+	list = chunk_hash(key);
 	list_add_rcu(&chunk->hash, list);
 }
 
 /* called under rcu_read_lock */
 struct audit_chunk *audit_tree_lookup(const struct inode *inode)
 {
-	struct list_head *list = chunk_hash(inode);
+	unsigned long key = inode_to_key(inode);
+	struct list_head *list = chunk_hash(key);
 	struct audit_chunk *p;
 
 	list_for_each_entry_rcu(p, list, hash) {
-		/* mark.inode may have gone NULL, but who cares? */
-		if (p->mark.inode == inode) {
+		if (chunk_to_key(p) == key) {
 			atomic_long_inc(&p->refs);
 			return p;
 		}
@@ -585,7 +600,8 @@ int audit_remove_tree_rule(struct audit_krule *rule)
 
 static int compare_root(struct vfsmount *mnt, void *arg)
 {
-	return d_backing_inode(mnt->mnt_root) == arg;
+	return inode_to_key(d_backing_inode(mnt->mnt_root)) ==
+	       (unsigned long)arg;
 }
 
 void audit_trim_trees(void)
@@ -620,9 +636,10 @@ void audit_trim_trees(void)
 		list_for_each_entry(node, &tree->chunks, list) {
 			struct audit_chunk *chunk = find_chunk(node);
 			/* this could be NULL if the watch is dying else where... */
-			struct inode *inode = chunk->mark.inode;
 			node->index |= 1U<<31;
-			if (iterate_mounts(compare_root, inode, root_mnt))
+			if (iterate_mounts(compare_root,
+					   (void *)chunk_to_key(chunk),
+					   root_mnt))
 				node->index &= ~(1U<<31);
 		}
 		spin_unlock(&hash_lock);
-- 
2.10.2



* [PATCH 07/22] fsnotify: Update comments
  2016-12-22  9:15 [PATCH 0/22] fsnotify: Avoid SRCU stalls with fanotify permission events Jan Kara
                   ` (5 preceding siblings ...)
  2016-12-22  9:15 ` [PATCH 06/22] audit: Abstract hash key handling Jan Kara
@ 2016-12-22  9:15 ` Jan Kara
  2016-12-23  4:45   ` Amir Goldstein
  2016-12-22  9:15 ` [PATCH 08/22] fsnotify: Attach marks to object via dedicated head structure Jan Kara
                   ` (15 subsequent siblings)
  22 siblings, 1 reply; 80+ messages in thread
From: Jan Kara @ 2016-12-22  9:15 UTC (permalink / raw)
  To: linux-fsdevel
  Cc: Amir Goldstein, Lino Sanfilippo, Miklos Szeredi, Paul Moore, Jan Kara

Add a comment that lifetime of a notification mark is protected by SRCU
and remove a comment about clearing of marks attached to the inode. It
is stale and a more up-to-date version is at fsnotify_destroy_marks() which
is the function handling this case.

Signed-off-by: Jan Kara <jack@suse.cz>
---
 fs/notify/mark.c | 13 +------------
 1 file changed, 1 insertion(+), 12 deletions(-)

diff --git a/fs/notify/mark.c b/fs/notify/mark.c
index 6043306e8e21..44836e539169 100644
--- a/fs/notify/mark.c
+++ b/fs/notify/mark.c
@@ -51,7 +51,7 @@
  *
  * LIFETIME:
  * Inode marks survive between when they are added to an inode and when their
- * refcnt==0.
+ * refcnt==0. Marks are also protected by fsnotify_mark_srcu.
  *
  * The inode mark can be cleared for a number of different reasons including:
  * - The inode is unlinked for the last time.  (fsnotify_inode_remove)
@@ -61,17 +61,6 @@
  * - The fsnotify_group associated with the mark is going away and all such marks
  *   need to be cleaned up. (fsnotify_clear_marks_by_group)
  *
- * Worst case we are given an inode and need to clean up all the marks on that
- * inode.  We take i_lock and walk the i_fsnotify_marks safely.  For each
- * mark on the list we take a reference (so the mark can't disappear under us).
- * We remove that mark form the inode's list of marks and we add this mark to a
- * private list anchored on the stack using i_free_list; we walk i_free_list
- * and before we destroy the mark we make sure that we dont race with a
- * concurrent destroy_group by getting a ref to the marks group and taking the
- * groups mutex.
-
- * Very similarly for freeing by group, except we use free_g_list.
- *
  * This has the very interesting property of being able to run concurrently with
  * any (or all) other directions.
  */
-- 
2.10.2



* [PATCH 08/22] fsnotify: Attach marks to object via dedicated head structure
  2016-12-22  9:15 [PATCH 0/22] fsnotify: Avoid SRCU stalls with fanotify permission events Jan Kara
                   ` (6 preceding siblings ...)
  2016-12-22  9:15 ` [PATCH 07/22] fsnotify: Update comments Jan Kara
@ 2016-12-22  9:15 ` Jan Kara
  2016-12-23  5:48   ` Amir Goldstein
  2016-12-22  9:15 ` [PATCH 09/22] inotify: Do not drop mark reference under idr_lock Jan Kara
                   ` (14 subsequent siblings)
  22 siblings, 1 reply; 80+ messages in thread
From: Jan Kara @ 2016-12-22  9:15 UTC (permalink / raw)
  To: linux-fsdevel
  Cc: Amir Goldstein, Lino Sanfilippo, Miklos Szeredi, Paul Moore, Jan Kara

Currently notification marks are attached to object (inode or vfsmnt) by
a hlist_head in the object. The list is also protected by a spinlock in
the object. So while there is any mark attached to the list of marks,
the object must be pinned in memory (and thus e.g. last iput() deleting
inode cannot happen). Also for list iteration in fsnotify() to work, we
must hold fsnotify_mark_srcu lock so that mark itself and
mark->obj_list.next cannot get freed. Thus we are required to wait for
response to fanotify events from userspace process with
fsnotify_mark_srcu lock held. That causes issues when userspace process
is buggy and does not reply to some event - basically the whole
notification subsystem gets eventually stuck.

So to be able to drop fsnotify_mark_srcu lock while waiting for
response, we have to pin the mark in memory and make sure it stays in
the object list (as removing the mark waiting for response could lead to
lost notification events for groups later in the list). However, we don't
want inode reclaim to block on such a mark as that would lead to the
system just locking up elsewhere.

This commit tries to pave a way towards solving these conflicting
lifetime needs. Instead of anchoring the list of marks directly in the
object, we anchor it in a dedicated structure (fsnotify_mark_list) and
just point to that structure from the object. Also the list is protected
by a spinlock contained in that structure. With this, we can detach
notification marks from object without having to modify the list itself.
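
A minimal userspace model of this indirection might look as follows. The
struct layouts and helper names are illustrative assumptions and do not
reproduce the fields of fsnotify_mark_list from the patch; locking is left
out entirely.

```c
#include <assert.h>
#include <stddef.h>

struct object_model;
struct mark_model;

/* Hypothetical dedicated list head; field names are illustrative. */
struct mark_list_model {
	struct object_model *obj;	/* back-pointer, NULL once detached */
	struct mark_model *first;	/* marks stay linked here */
};

struct mark_model {
	struct mark_model *next;
	struct mark_list_model *list;	/* the list this mark belongs to */
};

struct object_model {
	struct mark_list_model *marks;	/* a pointer, not an embedded list head */
};

static void attach_mark(struct mark_list_model *list, struct mark_model *m)
{
	m->list = list;
	m->next = list->first;
	list->first = m;
}

/* Detach all marks from the object without modifying the list itself. */
static struct mark_list_model *detach_list(struct object_model *obj)
{
	struct mark_list_model *list = obj->marks;

	if (list) {
		list->obj = NULL;
		obj->marks = NULL;
	}
	return list;
}

static int count_marks(const struct mark_list_model *list)
{
	const struct mark_model *m;
	int n = 0;

	for (m = list->first; m; m = m->next)
		n++;
	return n;
}
```

After detach_list() the object no longer references any marks and can go
away, while a walker that still holds the list can keep following the next
pointers.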

Signed-off-by: Jan Kara <jack@suse.cz>
---
 fs/inode.c                       |   3 -
 fs/mount.h                       |   2 +-
 fs/namespace.c                   |   3 -
 fs/notify/dnotify/dnotify.c      |   3 +-
 fs/notify/fdinfo.c               |  12 +-
 fs/notify/fsnotify.c             |  30 +++-
 fs/notify/fsnotify.h             |  36 +----
 fs/notify/inode_mark.c           |  94 +-----------
 fs/notify/mark.c                 | 315 ++++++++++++++++++++++++++++-----------
 fs/notify/vfsmount_mark.c        |  64 +-------
 include/linux/fs.h               |   4 +-
 include/linux/fsnotify_backend.h |  43 ++++--
 kernel/audit_tree.c              |  15 +-
 kernel/auditsc.c                 |   4 +-
 14 files changed, 321 insertions(+), 307 deletions(-)

diff --git a/fs/inode.c b/fs/inode.c
index 88110fd0b282..131b2bcebc48 100644
--- a/fs/inode.c
+++ b/fs/inode.c
@@ -371,9 +371,6 @@ void inode_init_once(struct inode *inode)
 	INIT_LIST_HEAD(&inode->i_lru);
 	address_space_init_once(&inode->i_data);
 	i_size_ordered_init(inode);
-#ifdef CONFIG_FSNOTIFY
-	INIT_HLIST_HEAD(&inode->i_fsnotify_marks);
-#endif
 }
 EXPORT_SYMBOL(inode_init_once);
 
diff --git a/fs/mount.h b/fs/mount.h
index 2c856fc47ae3..4c1975184ec4 100644
--- a/fs/mount.h
+++ b/fs/mount.h
@@ -59,7 +59,7 @@ struct mount {
 	struct mountpoint *mnt_mp;	/* where is it mounted */
 	struct hlist_node mnt_mp_list;	/* list mounts with the same mountpoint */
 #ifdef CONFIG_FSNOTIFY
-	struct hlist_head mnt_fsnotify_marks;
+	struct fsnotify_mark_list *mnt_fsnotify_marks;
 	__u32 mnt_fsnotify_mask;
 #endif
 	int mnt_id;			/* mount identifier */
diff --git a/fs/namespace.c b/fs/namespace.c
index f7e28f8ea04d..482f590268ec 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -237,9 +237,6 @@ static struct mount *alloc_vfsmnt(const char *name)
 		INIT_LIST_HEAD(&mnt->mnt_slave_list);
 		INIT_LIST_HEAD(&mnt->mnt_slave);
 		INIT_HLIST_NODE(&mnt->mnt_mp_list);
-#ifdef CONFIG_FSNOTIFY
-		INIT_HLIST_HEAD(&mnt->mnt_fsnotify_marks);
-#endif
 		init_fs_pin(&mnt->mnt_umount, drop_mountpoint);
 	}
 	return mnt;
diff --git a/fs/notify/dnotify/dnotify.c b/fs/notify/dnotify/dnotify.c
index 5a4ec309e283..8633d9f93fbd 100644
--- a/fs/notify/dnotify/dnotify.c
+++ b/fs/notify/dnotify/dnotify.c
@@ -69,8 +69,7 @@ static void dnotify_recalc_inode_mask(struct fsnotify_mark *fsn_mark)
 	if (old_mask == new_mask)
 		return;
 
-	if (fsn_mark->inode)
-		fsnotify_recalc_inode_mask(fsn_mark->inode);
+	fsnotify_recalc_mask(fsn_mark->obj_list_head);
 }
 
 /*
diff --git a/fs/notify/fdinfo.c b/fs/notify/fdinfo.c
index 601a59c8d87e..7afadf76a3cb 100644
--- a/fs/notify/fdinfo.c
+++ b/fs/notify/fdinfo.c
@@ -76,11 +76,11 @@ static void inotify_fdinfo(struct seq_file *m, struct fsnotify_mark *mark)
 	struct inotify_inode_mark *inode_mark;
 	struct inode *inode;
 
-	if (!(mark->flags & FSNOTIFY_MARK_FLAG_INODE))
+	if (!(mark->obj_list_head->flags & FSNOTIFY_LIST_TYPE_INODE))
 		return;
 
 	inode_mark = container_of(mark, struct inotify_inode_mark, fsn_mark);
-	inode = igrab(mark->inode);
+	inode = igrab(mark->obj_list_head->inode);
 	if (inode) {
 		/*
 		 * IN_ALL_EVENTS represents all of the mask bits
@@ -115,8 +115,8 @@ static void fanotify_fdinfo(struct seq_file *m, struct fsnotify_mark *mark)
 	if (mark->flags & FSNOTIFY_MARK_FLAG_IGNORED_SURV_MODIFY)
 		mflags |= FAN_MARK_IGNORED_SURV_MODIFY;
 
-	if (mark->flags & FSNOTIFY_MARK_FLAG_INODE) {
-		inode = igrab(mark->inode);
+	if (mark->obj_list_head->flags & FSNOTIFY_LIST_TYPE_INODE) {
+		inode = igrab(mark->obj_list_head->inode);
 		if (!inode)
 			return;
 		seq_printf(m, "fanotify ino:%lx sdev:%x mflags:%x mask:%x ignored_mask:%x ",
@@ -125,8 +125,8 @@ static void fanotify_fdinfo(struct seq_file *m, struct fsnotify_mark *mark)
 		show_mark_fhandle(m, inode);
 		seq_putc(m, '\n');
 		iput(inode);
-	} else if (mark->flags & FSNOTIFY_MARK_FLAG_VFSMOUNT) {
-		struct mount *mnt = real_mount(mark->mnt);
+	} else if (mark->obj_list_head->flags & FSNOTIFY_LIST_TYPE_VFSMOUNT) {
+		struct mount *mnt = real_mount(mark->obj_list_head->mnt);
 
 		seq_printf(m, "fanotify mnt_id:%x mflags:%x mask:%x ignored_mask:%x\n",
 			   mnt->mnt_id, mflags, mark->mask, mark->ignored_mask);
diff --git a/fs/notify/fsnotify.c b/fs/notify/fsnotify.c
index b41515d3f081..755502ec1eb6 100644
--- a/fs/notify/fsnotify.c
+++ b/fs/notify/fsnotify.c
@@ -193,6 +193,7 @@ int fsnotify(struct inode *to_tell, __u32 mask, const void *data, int data_is,
 	struct hlist_node *inode_node = NULL, *vfsmount_node = NULL;
 	struct fsnotify_mark *inode_mark = NULL, *vfsmount_mark = NULL;
 	struct fsnotify_group *inode_group, *vfsmount_group;
+	struct fsnotify_mark_list *inode_list, *vfsmount_list;
 	struct mount *mnt;
 	int idx, ret = 0;
 	/* global tests shouldn't care about events on child only the specific event */
@@ -210,8 +211,8 @@ int fsnotify(struct inode *to_tell, __u32 mask, const void *data, int data_is,
 	 * SRCU because we have no references to any objects and do not
 	 * need SRCU to keep them "alive".
 	 */
-	if (hlist_empty(&to_tell->i_fsnotify_marks) &&
-	    (!mnt || hlist_empty(&mnt->mnt_fsnotify_marks)))
+	if (!to_tell->i_fsnotify_marks &&
+	    (!mnt || !mnt->mnt_fsnotify_marks))
 		return 0;
 	/*
 	 * if this is a modify event we may need to clear the ignored masks
@@ -226,16 +227,27 @@ int fsnotify(struct inode *to_tell, __u32 mask, const void *data, int data_is,
 	idx = srcu_read_lock(&fsnotify_mark_srcu);
 
 	if ((mask & FS_MODIFY) ||
-	    (test_mask & to_tell->i_fsnotify_mask))
-		inode_node = srcu_dereference(to_tell->i_fsnotify_marks.first,
+	    (test_mask & to_tell->i_fsnotify_mask)) {
+		inode_list = srcu_dereference(to_tell->i_fsnotify_marks,
 					      &fsnotify_mark_srcu);
+		if (inode_list)
+			inode_node = srcu_dereference(inode_list->list.first,
+						      &fsnotify_mark_srcu);
+	}
 
 	if (mnt && ((mask & FS_MODIFY) ||
 		    (test_mask & mnt->mnt_fsnotify_mask))) {
-		vfsmount_node = srcu_dereference(mnt->mnt_fsnotify_marks.first,
-						 &fsnotify_mark_srcu);
-		inode_node = srcu_dereference(to_tell->i_fsnotify_marks.first,
+		inode_list = srcu_dereference(to_tell->i_fsnotify_marks,
 					      &fsnotify_mark_srcu);
+		if (inode_list)
+			inode_node = srcu_dereference(inode_list->list.first,
+						      &fsnotify_mark_srcu);
+		vfsmount_list = srcu_dereference(mnt->mnt_fsnotify_marks,
+					         &fsnotify_mark_srcu);
+		if (vfsmount_list)
+			vfsmount_node = srcu_dereference(
+						vfsmount_list->list.first,
+						&fsnotify_mark_srcu);
 	}
 
 	/*
@@ -293,6 +305,8 @@ int fsnotify(struct inode *to_tell, __u32 mask, const void *data, int data_is,
 }
 EXPORT_SYMBOL_GPL(fsnotify);
 
+extern struct kmem_cache *fsnotify_mark_list_cachep;
+
 static __init int fsnotify_init(void)
 {
 	int ret;
@@ -303,6 +317,8 @@ static __init int fsnotify_init(void)
 	if (ret)
 		panic("initializing fsnotify_mark_srcu");
 
+	fsnotify_mark_list_cachep = KMEM_CACHE(fsnotify_mark_list, SLAB_PANIC);
+
 	return 0;
 }
 core_initcall(fsnotify_init);
diff --git a/fs/notify/fsnotify.h b/fs/notify/fsnotify.h
index 0a3bc2cf192c..bdc489af0e5b 100644
--- a/fs/notify/fsnotify.h
+++ b/fs/notify/fsnotify.h
@@ -14,47 +14,25 @@ extern void fsnotify_flush_notify(struct fsnotify_group *group);
 /* protects reads of inode and vfsmount marks list */
 extern struct srcu_struct fsnotify_mark_srcu;
 
-/* Calculate mask of events for a list of marks */
-extern u32 fsnotify_recalc_mask(struct hlist_head *head);
-
 /* compare two groups for sorting of marks lists */
 extern int fsnotify_compare_groups(struct fsnotify_group *a,
 				   struct fsnotify_group *b);
 
-extern void fsnotify_set_inode_mark_mask_locked(struct fsnotify_mark *fsn_mark,
-						__u32 mask);
-/* Add mark to a proper place in mark list */
-extern int fsnotify_add_mark_list(struct hlist_head *head,
-				  struct fsnotify_mark *mark,
-				  int allow_dups);
-/* add a mark to an inode */
-extern int fsnotify_add_inode_mark(struct fsnotify_mark *mark,
-				   struct fsnotify_group *group, struct inode *inode,
-				   int allow_dups);
-/* add a mark to a vfsmount */
-extern int fsnotify_add_vfsmount_mark(struct fsnotify_mark *mark,
-				      struct fsnotify_group *group, struct vfsmount *mnt,
-				      int allow_dups);
-
-/* vfsmount specific destruction of a mark */
-extern void fsnotify_destroy_vfsmount_mark(struct fsnotify_mark *mark);
-/* inode specific destruction of a mark */
-extern void fsnotify_destroy_inode_mark(struct fsnotify_mark *mark);
 /* Find mark belonging to given group in the list of marks */
-extern struct fsnotify_mark *fsnotify_find_mark(struct hlist_head *head,
-						struct fsnotify_group *group);
-/* Destroy all marks in the given list protected by 'lock' */
-extern void fsnotify_destroy_marks(struct hlist_head *head, spinlock_t *lock);
+extern struct fsnotify_mark *fsnotify_find_mark(
+					struct fsnotify_mark_list **listp,
+					struct fsnotify_group *group);
+/* Destroy all marks in the given list */
+extern void fsnotify_destroy_marks(struct fsnotify_mark_list **list);
 /* run the list of all marks associated with inode and destroy them */
 static inline void fsnotify_clear_marks_by_inode(struct inode *inode)
 {
-	fsnotify_destroy_marks(&inode->i_fsnotify_marks, &inode->i_lock);
+	fsnotify_destroy_marks(&inode->i_fsnotify_marks);
 }
 /* run the list of all marks associated with vfsmount and destroy them */
 static inline void fsnotify_clear_marks_by_mount(struct vfsmount *mnt)
 {
-	fsnotify_destroy_marks(&real_mount(mnt)->mnt_fsnotify_marks,
-			       &mnt->mnt_root->d_lock);
+	fsnotify_destroy_marks(&real_mount(mnt)->mnt_fsnotify_marks);
 }
 /* prepare for freeing all marks associated with given group */
 extern void fsnotify_detach_group_marks(struct fsnotify_group *group);
diff --git a/fs/notify/inode_mark.c b/fs/notify/inode_mark.c
index a3645249f7ec..3d309c890670 100644
--- a/fs/notify/inode_mark.c
+++ b/fs/notify/inode_mark.c
@@ -30,38 +30,9 @@
 
 #include "../internal.h"
 
-/*
- * Recalculate the inode->i_fsnotify_mask, or the mask of all FS_* event types
- * any notifier is interested in hearing for this inode.
- */
 void fsnotify_recalc_inode_mask(struct inode *inode)
 {
-	spin_lock(&inode->i_lock);
-	inode->i_fsnotify_mask = fsnotify_recalc_mask(&inode->i_fsnotify_marks);
-	spin_unlock(&inode->i_lock);
-
-	__fsnotify_update_child_dentry_flags(inode);
-}
-
-void fsnotify_destroy_inode_mark(struct fsnotify_mark *mark)
-{
-	struct inode *inode = mark->inode;
-
-	BUG_ON(!mutex_is_locked(&mark->group->mark_mutex));
-	assert_spin_locked(&mark->lock);
-
-	spin_lock(&inode->i_lock);
-
-	hlist_del_init_rcu(&mark->obj_list);
-	mark->inode = NULL;
-
-	/*
-	 * this mark is now off the inode->i_fsnotify_marks list and we
-	 * hold the inode->i_lock, so this is the perfect time to update the
-	 * inode->i_fsnotify_mask
-	 */
-	inode->i_fsnotify_mask = fsnotify_recalc_mask(&inode->i_fsnotify_marks);
-	spin_unlock(&inode->i_lock);
+	fsnotify_recalc_mask(inode->i_fsnotify_marks);
 }
 
 /*
@@ -69,7 +40,7 @@ void fsnotify_destroy_inode_mark(struct fsnotify_mark *mark)
  */
 void fsnotify_clear_inode_marks_by_group(struct fsnotify_group *group)
 {
-	fsnotify_clear_marks_by_group_flags(group, FSNOTIFY_MARK_FLAG_INODE);
+	fsnotify_clear_marks_by_group_flags(group, FSNOTIFY_LIST_TYPE_INODE);
 }
 
 /*
@@ -79,66 +50,7 @@ void fsnotify_clear_inode_marks_by_group(struct fsnotify_group *group)
 struct fsnotify_mark *fsnotify_find_inode_mark(struct fsnotify_group *group,
 					       struct inode *inode)
 {
-	struct fsnotify_mark *mark;
-
-	spin_lock(&inode->i_lock);
-	mark = fsnotify_find_mark(&inode->i_fsnotify_marks, group);
-	spin_unlock(&inode->i_lock);
-
-	return mark;
-}
-
-/*
- * If we are setting a mark mask on an inode mark we should pin the inode
- * in memory.
- */
-void fsnotify_set_inode_mark_mask_locked(struct fsnotify_mark *mark,
-					 __u32 mask)
-{
-	struct inode *inode;
-
-	assert_spin_locked(&mark->lock);
-
-	if (mask &&
-	    mark->inode &&
-	    !(mark->flags & FSNOTIFY_MARK_FLAG_OBJECT_PINNED)) {
-		mark->flags |= FSNOTIFY_MARK_FLAG_OBJECT_PINNED;
-		inode = igrab(mark->inode);
-		/*
-		 * we shouldn't be able to get here if the inode wasn't
-		 * already safely held in memory.  But bug in case it
-		 * ever is wrong.
-		 */
-		BUG_ON(!inode);
-	}
-}
-
-/*
- * Attach an initialized mark to a given inode.
- * These marks may be used for the fsnotify backend to determine which
- * event types should be delivered to which group and for which inodes.  These
- * marks are ordered according to priority, highest number first, and then by
- * the group's location in memory.
- */
-int fsnotify_add_inode_mark(struct fsnotify_mark *mark,
-			    struct fsnotify_group *group, struct inode *inode,
-			    int allow_dups)
-{
-	int ret;
-
-	mark->flags |= FSNOTIFY_MARK_FLAG_INODE;
-
-	BUG_ON(!mutex_is_locked(&group->mark_mutex));
-	assert_spin_locked(&mark->lock);
-
-	spin_lock(&inode->i_lock);
-	mark->inode = inode;
-	ret = fsnotify_add_mark_list(&inode->i_fsnotify_marks, mark,
-				     allow_dups);
-	inode->i_fsnotify_mask = fsnotify_recalc_mask(&inode->i_fsnotify_marks);
-	spin_unlock(&inode->i_lock);
-
-	return ret;
+	return fsnotify_find_mark(&inode->i_fsnotify_marks, group);
 }
 
 /**
diff --git a/fs/notify/mark.c b/fs/notify/mark.c
index 44836e539169..a9870beabe67 100644
--- a/fs/notify/mark.c
+++ b/fs/notify/mark.c
@@ -33,7 +33,7 @@
  *
  * group->mark_mutex
  * mark->lock
- * inode->i_lock
+ * mark->obj_list_head->lock
  *
  * group->mark_mutex protects the marks_list anchored inside a given group and
  * each mark is hooked via the g_list.  It also protects the groups private
@@ -44,10 +44,12 @@
  * is assigned to as well as the access to a reference of the inode/vfsmount
  * that is being watched by the mark.
  *
- * inode->i_lock protects the i_fsnotify_marks list anchored inside a
- * given inode and each mark is hooked via the i_list. (and sorta the
- * free_i_list)
+ * mark->obj_list_head->lock protects the list of marks anchored inside an
+ * inode / vfsmount and each mark is hooked via the obj_list.
  *
+ * A list of notification marks relating to inode / mnt is contained in
+ * fsnotify_mark_list. That structure is alive as long as there are any marks
+ * in the list and is also protected by fsnotify_mark_srcu.
  *
  * LIFETIME:
  * Inode marks survive between when they are added to an inode and when their
@@ -83,12 +85,18 @@
 #define FSNOTIFY_REAPER_DELAY	(1)	/* 1 jiffy */
 
 struct srcu_struct fsnotify_mark_srcu;
+struct kmem_cache *fsnotify_mark_list_cachep;
+
 static DEFINE_SPINLOCK(destroy_lock);
 static LIST_HEAD(destroy_list);
+static struct fsnotify_mark_list *list_destroy_list;
 
 static void fsnotify_mark_destroy_workfn(struct work_struct *work);
 static DECLARE_DELAYED_WORK(reaper_work, fsnotify_mark_destroy_workfn);
 
+static void fsnotify_list_destroy_workfn(struct work_struct *work);
+static DECLARE_WORK(list_reaper_work, fsnotify_list_destroy_workfn);
+
 void fsnotify_get_mark(struct fsnotify_mark *mark)
 {
 	atomic_inc(&mark->refcnt);
@@ -103,15 +111,97 @@ void fsnotify_put_mark(struct fsnotify_mark *mark)
 	}
 }
 
-/* Calculate mask of events for a list of marks */
-u32 fsnotify_recalc_mask(struct hlist_head *head)
+static void __fsnotify_recalc_mask(struct fsnotify_mark_list *list)
 {
 	u32 new_mask = 0;
 	struct fsnotify_mark *mark;
 
-	hlist_for_each_entry(mark, head, obj_list)
-		new_mask |= mark->mask;
-	return new_mask;
+	assert_spin_locked(&list->lock);
+	hlist_for_each_entry(mark, &list->list, obj_list) {
+		if (mark->flags & FSNOTIFY_MARK_FLAG_ATTACHED)
+			new_mask |= mark->mask;
+	}
+	if (list->flags & FSNOTIFY_LIST_TYPE_INODE)
+		list->inode->i_fsnotify_mask = new_mask;
+	else if (list->flags & FSNOTIFY_LIST_TYPE_VFSMOUNT)
+		real_mount(list->mnt)->mnt_fsnotify_mask = new_mask;
+}
+
+/*
+ * Calculate mask of events for a list of marks. The caller must make sure list
+ * cannot disappear under us (usually by holding a mark->lock or
+ * mark->group->mark_mutex for a mark on this list).
+ */
+void fsnotify_recalc_mask(struct fsnotify_mark_list *list)
+{
+	struct inode *inode = NULL;
+
+	spin_lock(&list->lock);
+	__fsnotify_recalc_mask(list);
+	if (list->flags & FSNOTIFY_LIST_TYPE_INODE)
+		inode = igrab(list->inode);
+	spin_unlock(&list->lock);
+	if (inode) {
+		__fsnotify_update_child_dentry_flags(inode);
+		iput(inode);
+	}
+}
+
+/* Free all list heads queued for freeing once SRCU period ends */
+static void fsnotify_list_destroy_workfn(struct work_struct *work)
+{
+	struct fsnotify_mark_list *list, *free;
+
+	spin_lock(&destroy_lock);
+	list = list_destroy_list;
+	list_destroy_list = NULL;
+	spin_unlock(&destroy_lock);
+
+	synchronize_srcu(&fsnotify_mark_srcu);
+	while (list) {
+		free = list;
+		list = list->destroy_next;
+		kmem_cache_free(fsnotify_mark_list_cachep, free);
+	}
+}
+
+static struct inode *fsnotify_detach_from_object(struct fsnotify_mark *mark)
+{
+	struct fsnotify_mark_list *list;
+	struct inode *inode = NULL;
+	bool free_list = false;
+
+	list = mark->obj_list_head;
+	spin_lock(&list->lock);
+	hlist_del_init_rcu(&mark->obj_list);
+	if (hlist_empty(&list->list)) {
+		if (list->flags & FSNOTIFY_LIST_TYPE_INODE) {
+			inode = list->inode;
+			inode->i_fsnotify_marks = NULL;
+			inode->i_fsnotify_mask = 0;
+			list->inode = NULL;
+			list->flags &= ~FSNOTIFY_LIST_TYPE_INODE;
+		} else if (list->flags & FSNOTIFY_LIST_TYPE_VFSMOUNT) {
+			real_mount(list->mnt)->mnt_fsnotify_marks = NULL;
+			real_mount(list->mnt)->mnt_fsnotify_mask = 0;
+			list->mnt = NULL;
+			list->flags &= ~FSNOTIFY_LIST_TYPE_VFSMOUNT;
+		}
+		free_list = true;
+	} else
+		__fsnotify_recalc_mask(list);
+	mark->obj_list_head = NULL;
+	spin_unlock(&list->lock);
+
+	if (free_list) {
+		spin_lock(&destroy_lock);
+		list->destroy_next = list_destroy_list;
+		list_destroy_list = list;
+		spin_unlock(&destroy_lock);
+		queue_work(system_unbound_wq, &list_reaper_work);
+	}
+
+	return inode;
 }
 
 /*
@@ -137,13 +227,8 @@ void fsnotify_detach_mark(struct fsnotify_mark *mark)
 
 	mark->flags &= ~FSNOTIFY_MARK_FLAG_ATTACHED;
 
-	if (mark->flags & FSNOTIFY_MARK_FLAG_INODE) {
-		inode = mark->inode;
-		fsnotify_destroy_inode_mark(mark);
-	} else if (mark->flags & FSNOTIFY_MARK_FLAG_VFSMOUNT)
-		fsnotify_destroy_vfsmount_mark(mark);
-	else
-		BUG();
+	inode = fsnotify_detach_from_object(mark);
+
 	/*
 	 * Note that we didn't update flags telling whether inode cares about
 	 * what's happening with children. We update these flags from
@@ -155,7 +240,7 @@ void fsnotify_detach_mark(struct fsnotify_mark *mark)
 
 	spin_unlock(&mark->lock);
 
-	if (inode && (mark->flags & FSNOTIFY_MARK_FLAG_OBJECT_PINNED))
+	if (inode)
 		iput(inode);
 
 	atomic_dec(&group->num_marks);
@@ -220,45 +305,11 @@ void fsnotify_destroy_mark(struct fsnotify_mark *mark,
 	fsnotify_free_mark(mark);
 }
 
-void fsnotify_destroy_marks(struct hlist_head *head, spinlock_t *lock)
-{
-	struct fsnotify_mark *mark;
-
-	while (1) {
-		/*
-		 * We have to be careful since we can race with e.g.
-		 * fsnotify_clear_marks_by_group() and once we drop 'lock',
-		 * mark can get removed from the obj_list and destroyed. But
-		 * we are holding mark reference so mark cannot be freed and
-		 * calling fsnotify_destroy_mark() more than once is fine.
-		 */
-		spin_lock(lock);
-		if (hlist_empty(head)) {
-			spin_unlock(lock);
-			break;
-		}
-		mark = hlist_entry(head->first, struct fsnotify_mark, obj_list);
-		/*
-		 * We don't update i_fsnotify_mask / mnt_fsnotify_mask here
-		 * since inode / mount is going away anyway. So just remove
-		 * mark from the list.
-		 */
-		hlist_del_init_rcu(&mark->obj_list);
-		fsnotify_get_mark(mark);
-		spin_unlock(lock);
-		fsnotify_destroy_mark(mark, mark->group);
-		fsnotify_put_mark(mark);
-	}
-}
-
 void fsnotify_set_mark_mask_locked(struct fsnotify_mark *mark, __u32 mask)
 {
 	assert_spin_locked(&mark->lock);
 
 	mark->mask = mask;
-
-	if (mark->flags & FSNOTIFY_MARK_FLAG_INODE)
-		fsnotify_set_inode_mark_mask_locked(mark, mask);
 }
 
 void fsnotify_set_mark_ignored_mask_locked(struct fsnotify_mark *mark, __u32 mask)
@@ -304,37 +355,117 @@ int fsnotify_compare_groups(struct fsnotify_group *a, struct fsnotify_group *b)
 	return -1;
 }
 
-/* Add mark into proper place in given list of marks */
-int fsnotify_add_mark_list(struct hlist_head *head, struct fsnotify_mark *mark,
-			   int allow_dups)
+/*
+ * Get mark list structure, make sure it is alive and return with its lock held.
+ * This is for users that get list pointer from inode or mount. Users that hold
+ * reference to a mark on the list may directly lock list->lock as they are sure
+ * list cannot go away under them.
+ */
+static struct fsnotify_mark_list *fsnotify_grab_list(
+					struct fsnotify_mark_list **listp)
+{
+	struct fsnotify_mark_list *list;
+	int idx;
+
+	idx = srcu_read_lock(&fsnotify_mark_srcu);
+	list = srcu_dereference(*listp, &fsnotify_mark_srcu);
+	if (!list)
+		goto out;
+	spin_lock(&list->lock);
+	if (!(list->flags & (FSNOTIFY_LIST_TYPE_INODE |
+			     FSNOTIFY_LIST_TYPE_VFSMOUNT))) {
+		spin_unlock(&list->lock);
+		srcu_read_unlock(&fsnotify_mark_srcu, idx);
+		return NULL;
+	}
+out:
+	srcu_read_unlock(&fsnotify_mark_srcu, idx);
+	return list;
+}
+
+static void fsnotify_put_list(struct fsnotify_mark_list *list)
+{
+	spin_unlock(&list->lock);
+}
+
+/*
+ * Add mark into proper place in given list of marks. These marks may be used
+ * for the fsnotify backend to determine which event types should be delivered
+ * to which group and for which inodes. These marks are ordered according to
+ * priority, highest number first, and then by the group's location in memory.
+ */
+static int fsnotify_add_mark_list(struct fsnotify_mark *mark,
+				  struct inode *inode, struct vfsmount *mnt,
+				  int allow_dups)
 {
 	struct fsnotify_mark *lmark, *last = NULL;
+	struct fsnotify_mark_list *list;
+	struct fsnotify_mark_list **listp;
 	int cmp;
+	int err = 0;
+
+	BUG_ON(!inode && !mnt);
+	if (inode)
+		listp = &inode->i_fsnotify_marks;
+	else
+		listp = &real_mount(mnt)->mnt_fsnotify_marks;
+restart:
+	spin_lock(&mark->lock);
+	list = fsnotify_grab_list(listp);
+	if (!list) {
+		spin_unlock(&mark->lock);
+		list = kmem_cache_alloc(fsnotify_mark_list_cachep, GFP_KERNEL);
+		if (!list)
+			return -ENOMEM;
+		spin_lock_init(&list->lock);
+		INIT_HLIST_HEAD(&list->list);
+		if (inode) {
+			list->flags = FSNOTIFY_LIST_TYPE_INODE;
+			list->inode = igrab(inode);
+		} else {
+			list->flags = FSNOTIFY_LIST_TYPE_VFSMOUNT;
+			list->mnt = mnt;
+		}
+		if (cmpxchg(listp, NULL, list)) {
+			/* Someone else created list structure for us */
+			if (inode)
+				iput(inode);
+			kmem_cache_free(fsnotify_mark_list_cachep, list);
+		}
+		goto restart;
+	}
 
 	/* is mark the first mark? */
-	if (hlist_empty(head)) {
-		hlist_add_head_rcu(&mark->obj_list, head);
-		return 0;
+	if (hlist_empty(&list->list)) {
+		hlist_add_head_rcu(&mark->obj_list, &list->list);
+		goto added;
 	}
 
 	/* should mark be in the middle of the current list? */
-	hlist_for_each_entry(lmark, head, obj_list) {
+	hlist_for_each_entry(lmark, &list->list, obj_list) {
 		last = lmark;
 
-		if ((lmark->group == mark->group) && !allow_dups)
-			return -EEXIST;
+		if ((lmark->group == mark->group) && !allow_dups) {
+			err = -EEXIST;
+			goto out_err;
+		}
 
 		cmp = fsnotify_compare_groups(lmark->group, mark->group);
 		if (cmp >= 0) {
 			hlist_add_before_rcu(&mark->obj_list, &lmark->obj_list);
-			return 0;
+			goto added;
 		}
 	}
 
 	BUG_ON(last == NULL);
 	/* mark should be the last entry.  last is the current last entry */
 	hlist_add_behind_rcu(&mark->obj_list, &last->obj_list);
-	return 0;
+added:
+	mark->obj_list_head = list;
+out_err:
+	fsnotify_put_list(list);
+	spin_unlock(&mark->lock);
+	return err;
 }
 
 /*
@@ -356,7 +487,7 @@ int fsnotify_add_mark_locked(struct fsnotify_mark *mark,
 	 * LOCKING ORDER!!!!
 	 * group->mark_mutex
 	 * mark->lock
-	 * inode->i_lock
+	 * mark->obj_list_head->lock
 	 */
 	spin_lock(&mark->lock);
 	mark->flags |= FSNOTIFY_MARK_FLAG_ALIVE | FSNOTIFY_MARK_FLAG_ATTACHED;
@@ -366,25 +497,14 @@ int fsnotify_add_mark_locked(struct fsnotify_mark *mark,
 	list_add(&mark->g_list, &group->marks_list);
 	atomic_inc(&group->num_marks);
 	fsnotify_get_mark(mark); /* for i_list and g_list */
-
-	if (inode) {
-		ret = fsnotify_add_inode_mark(mark, group, inode, allow_dups);
-		if (ret)
-			goto err;
-	} else if (mnt) {
-		ret = fsnotify_add_vfsmount_mark(mark, group, mnt, allow_dups);
-		if (ret)
-			goto err;
-	} else {
-		BUG();
-	}
-
-	/* this will pin the object if appropriate */
-	fsnotify_set_mark_mask_locked(mark, mark->mask);
 	spin_unlock(&mark->lock);
 
-	if (inode)
-		__fsnotify_update_child_dentry_flags(inode);
+	ret = fsnotify_add_mark_list(mark, inode, mnt, allow_dups);
+	if (ret)
+		goto err;
+
+	if (mark->mask)
+		fsnotify_recalc_mask(mark->obj_list_head);
 
 	return ret;
 err:
@@ -419,17 +539,23 @@ int fsnotify_add_mark(struct fsnotify_mark *mark, struct fsnotify_group *group,
  * Given a list of marks, find the mark associated with given group. If found
  * take a reference to that mark and return it, else return NULL.
  */
-struct fsnotify_mark *fsnotify_find_mark(struct hlist_head *head,
+struct fsnotify_mark *fsnotify_find_mark(struct fsnotify_mark_list **listp,
 					 struct fsnotify_group *group)
 {
 	struct fsnotify_mark *mark;
+	struct fsnotify_mark_list *list;
 
-	hlist_for_each_entry(mark, head, obj_list) {
+	list = fsnotify_grab_list(listp);
+	if (!list)
+		return NULL;
+	hlist_for_each_entry(mark, &list->list, obj_list) {
 		if (mark->group == group) {
 			fsnotify_get_mark(mark);
+			fsnotify_put_list(list);
 			return mark;
 		}
 	}
+	fsnotify_put_list(list);
 	return NULL;
 }
 
@@ -453,7 +579,7 @@ void fsnotify_clear_marks_by_group_flags(struct fsnotify_group *group,
 	 */
 	mutex_lock_nested(&group->mark_mutex, SINGLE_DEPTH_NESTING);
 	list_for_each_entry_safe(mark, lmark, &group->marks_list, g_list) {
-		if (mark->flags & flags)
+		if (mark->obj_list_head->flags & flags)
 			list_move(&mark->g_list, &to_free);
 	}
 	mutex_unlock(&group->mark_mutex);
@@ -499,6 +625,29 @@ void fsnotify_detach_group_marks(struct fsnotify_group *group)
 	}
 }
 
+/* Destroy all marks attached to inode / vfsmount */
+void fsnotify_destroy_marks(struct fsnotify_mark_list **listp)
+{
+	struct fsnotify_mark_list *list;
+	struct fsnotify_mark *mark;
+
+	while ((list = fsnotify_grab_list(listp))) {
+		/*
+		 * We have to be careful since we can race with e.g.
+		 * fsnotify_clear_marks_by_group() and once we drop the list
+		 * lock, mark can get removed from the obj_list and destroyed.
+		 * But we are holding mark reference so mark cannot be freed
+		 * and calling fsnotify_destroy_mark() more than once is fine.
+		 */
+		mark = hlist_entry(list->list.first, struct fsnotify_mark,
+				   obj_list);
+		fsnotify_get_mark(mark);
+		fsnotify_put_list(list);
+		fsnotify_destroy_mark(mark, mark->group);
+		fsnotify_put_mark(mark);
+	}
+}
+
 /*
  * Nothing fancy, just initialize lists and locks and counters.
  */
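Note on the list allocation in fsnotify_add_mark_list() above: the list head is allocated lazily and published with cmpxchg(), so a racing caller either installs its own allocation or discards it and uses the winner's. The following is a minimal userspace sketch of that pattern only, assuming C11 atomics in place of the kernel's cmpxchg(). The names struct mark_list, struct object and get_or_create_list() are hypothetical and do not appear in the patch. The sketch deliberately omits the list->lock handling, the liveness check done by fsnotify_grab_list(), and the inode reference.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical userspace stand-in for struct fsnotify_mark_list. */
struct mark_list {
	int nmarks;
};

/* Hypothetical stand-in for the object holding inode->i_fsnotify_marks. */
struct object {
	_Atomic(struct mark_list *) marks;
};

/*
 * Return the list head attached to obj, allocating one if none exists yet.
 * Returns NULL only when the allocation fails.
 */
static struct mark_list *get_or_create_list(struct object *obj)
{
	struct mark_list *list, *expected;

	list = atomic_load(&obj->marks);
	if (list)
		return list;

	list = calloc(1, sizeof(*list));
	if (!list)
		return NULL;

	expected = NULL;
	if (atomic_compare_exchange_strong(&obj->marks, &expected, list))
		return list;	/* our allocation got installed */

	/* Lost the race: drop our copy and use the one already installed. */
	free(list);
	return expected;
}
```

Unlike this sketch, the patch jumps back to the restart label after the cmpxchg() and grabs the list again under its lock, which also covers the case where the installed list is torn down concurrently.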
diff --git a/fs/notify/vfsmount_mark.c b/fs/notify/vfsmount_mark.c
index a8fcab68faef..c4166d38c0cc 100644
--- a/fs/notify/vfsmount_mark.c
+++ b/fs/notify/vfsmount_mark.c
@@ -29,39 +29,14 @@
 #include <linux/fsnotify_backend.h>
 #include "fsnotify.h"
 
-void fsnotify_clear_vfsmount_marks_by_group(struct fsnotify_group *group)
-{
-	fsnotify_clear_marks_by_group_flags(group, FSNOTIFY_MARK_FLAG_VFSMOUNT);
-}
-
-/*
- * Recalculate the mnt->mnt_fsnotify_mask, or the mask of all FS_* event types
- * any notifier is interested in hearing for this mount point
- */
 void fsnotify_recalc_vfsmount_mask(struct vfsmount *mnt)
 {
-	struct mount *m = real_mount(mnt);
-
-	spin_lock(&mnt->mnt_root->d_lock);
-	m->mnt_fsnotify_mask = fsnotify_recalc_mask(&m->mnt_fsnotify_marks);
-	spin_unlock(&mnt->mnt_root->d_lock);
+	fsnotify_recalc_mask(real_mount(mnt)->mnt_fsnotify_marks);
 }
 
-void fsnotify_destroy_vfsmount_mark(struct fsnotify_mark *mark)
+void fsnotify_clear_vfsmount_marks_by_group(struct fsnotify_group *group)
 {
-	struct vfsmount *mnt = mark->mnt;
-	struct mount *m = real_mount(mnt);
-
-	BUG_ON(!mutex_is_locked(&mark->group->mark_mutex));
-	assert_spin_locked(&mark->lock);
-
-	spin_lock(&mnt->mnt_root->d_lock);
-
-	hlist_del_init_rcu(&mark->obj_list);
-	mark->mnt = NULL;
-
-	m->mnt_fsnotify_mask = fsnotify_recalc_mask(&m->mnt_fsnotify_marks);
-	spin_unlock(&mnt->mnt_root->d_lock);
+	fsnotify_clear_marks_by_group_flags(group, FSNOTIFY_LIST_TYPE_VFSMOUNT);
 }
 
 /*
@@ -72,37 +47,6 @@ struct fsnotify_mark *fsnotify_find_vfsmount_mark(struct fsnotify_group *group,
 						  struct vfsmount *mnt)
 {
 	struct mount *m = real_mount(mnt);
-	struct fsnotify_mark *mark;
-
-	spin_lock(&mnt->mnt_root->d_lock);
-	mark = fsnotify_find_mark(&m->mnt_fsnotify_marks, group);
-	spin_unlock(&mnt->mnt_root->d_lock);
-
-	return mark;
-}
-
-/*
- * Attach an initialized mark to a given group and vfsmount.
- * These marks may be used for the fsnotify backend to determine which
- * event types should be delivered to which groups.
- */
-int fsnotify_add_vfsmount_mark(struct fsnotify_mark *mark,
-			       struct fsnotify_group *group, struct vfsmount *mnt,
-			       int allow_dups)
-{
-	struct mount *m = real_mount(mnt);
-	int ret;
-
-	mark->flags |= FSNOTIFY_MARK_FLAG_VFSMOUNT;
-
-	BUG_ON(!mutex_is_locked(&group->mark_mutex));
-	assert_spin_locked(&mark->lock);
-
-	spin_lock(&mnt->mnt_root->d_lock);
-	mark->mnt = mnt;
-	ret = fsnotify_add_mark_list(&m->mnt_fsnotify_marks, mark, allow_dups);
-	m->mnt_fsnotify_mask = fsnotify_recalc_mask(&m->mnt_fsnotify_marks);
-	spin_unlock(&mnt->mnt_root->d_lock);
 
-	return ret;
+	return fsnotify_find_mark(&m->mnt_fsnotify_marks, group);
 }
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 2ba074328894..1a672d35403e 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -545,6 +545,8 @@ is_uncached_acl(struct posix_acl *acl)
 #define IOP_XATTR	0x0008
 #define IOP_DEFAULT_READLINK	0x0010
 
+struct fsnotify_mark_list;
+
 /*
  * Keep mostly read-only and often accessed (especially for
  * the RCU path lookup and 'stat' data) fields at the beginning
@@ -644,7 +646,7 @@ struct inode {
 
 #ifdef CONFIG_FSNOTIFY
 	__u32			i_fsnotify_mask; /* all events this inode cares about */
-	struct hlist_head	i_fsnotify_marks;
+	struct fsnotify_mark_list	*i_fsnotify_marks;
 #endif
 
 #if IS_ENABLED(CONFIG_FS_ENCRYPTION)
diff --git a/include/linux/fsnotify_backend.h b/include/linux/fsnotify_backend.h
index 487246546ebe..6086fc7ff6df 100644
--- a/include/linux/fsnotify_backend.h
+++ b/include/linux/fsnotify_backend.h
@@ -194,6 +194,28 @@ struct fsnotify_group {
 #define FSNOTIFY_EVENT_INODE	2
 
 /*
+ * Inode / vfsmount point to this structure which tracks all marks attached to
+ * the inode / vfsmount. The reference to inode / vfsmount is held by this
+ * structure. We destroy this structure when there are no more marks attached
+ * to it. The structure is protected by fsnotify_mark_srcu.
+ */
+struct fsnotify_mark_list {
+	spinlock_t lock;
+#define FSNOTIFY_LIST_TYPE_INODE	0x01
+#define FSNOTIFY_LIST_TYPE_VFSMOUNT	0x02
+	unsigned int flags;	/* Type of object [lock] */
+	union {	/* Object pointer [lock] */
+		struct inode *inode;
+		struct vfsmount *mnt;
+	};
+	union {
+		struct hlist_head list;
+		/* Links heads queued for freeing once SRCU period expires */
+		struct fsnotify_mark_list *destroy_next;
+	};
+};
+
+/*
  * A mark is simply an object attached to an in core inode which allows an
  * fsnotify listener to indicate they are either no longer interested in events
  * of a type matching mask or only interested in those events.
@@ -222,20 +244,15 @@ struct fsnotify_mark {
 	struct list_head g_list;
 	/* Protects inode / mnt pointers, flags, masks */
 	spinlock_t lock;
-	/* List of marks for inode / vfsmount [obj_lock] */
+	/* List of marks for inode / vfsmount [obj_list_head->lock] */
 	struct hlist_node obj_list;
-	union {	/* Object pointer [mark->lock, group->mark_mutex] */
-		struct inode *inode;	/* inode this mark is associated with */
-		struct vfsmount *mnt;	/* vfsmount this mark is associated with */
-	};
+	/* Head of list of marks for an object [mark->lock, group->mark_mutex] */
+	struct fsnotify_mark_list *obj_list_head;
 	/* Events types to ignore [mark->lock, group->mark_mutex] */
 	__u32 ignored_mask;
-#define FSNOTIFY_MARK_FLAG_INODE		0x01
-#define FSNOTIFY_MARK_FLAG_VFSMOUNT		0x02
-#define FSNOTIFY_MARK_FLAG_OBJECT_PINNED	0x04
-#define FSNOTIFY_MARK_FLAG_IGNORED_SURV_MODIFY	0x08
-#define FSNOTIFY_MARK_FLAG_ALIVE		0x10
-#define FSNOTIFY_MARK_FLAG_ATTACHED		0x20
+#define FSNOTIFY_MARK_FLAG_IGNORED_SURV_MODIFY	0x01
+#define FSNOTIFY_MARK_FLAG_ALIVE		0x02
+#define FSNOTIFY_MARK_FLAG_ATTACHED		0x04
 	unsigned int flags;		/* flags [mark->lock] */
 	void (*free_mark)(struct fsnotify_mark *mark); /* called on final put+free */
 };
@@ -314,6 +331,8 @@ extern struct fsnotify_event *fsnotify_remove_first_event(struct fsnotify_group
 
 /* functions used to manipulate the marks attached to inodes */
 
+/* Calculate mask of events for a list of marks */
+extern void fsnotify_recalc_mask(struct fsnotify_mark_list *list);
 /* run all marks associated with a vfsmount and update mnt->mnt_fsnotify_mask */
 extern void fsnotify_recalc_vfsmount_mask(struct vfsmount *mnt);
 /* run all marks associated with an inode and update inode->i_fsnotify_mask */
@@ -343,7 +362,7 @@ extern void fsnotify_free_mark(struct fsnotify_mark *mark);
 extern void fsnotify_clear_vfsmount_marks_by_group(struct fsnotify_group *group);
 /* run all the marks in a group, and clear all of the inode marks */
 extern void fsnotify_clear_inode_marks_by_group(struct fsnotify_group *group);
-/* run all the marks in a group, and clear all of the marks where mark->flags & flags is true*/
+/* run all the marks in a group, and clear all of the marks attached to given object type */
 extern void fsnotify_clear_marks_by_group_flags(struct fsnotify_group *group, unsigned int flags);
 extern void fsnotify_get_mark(struct fsnotify_mark *mark);
 extern void fsnotify_put_mark(struct fsnotify_mark *mark);
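Note on destroy_next in struct fsnotify_mark_list above: once a list head is empty, the storage of its hlist_head is reused as a link in a singly linked queue of heads waiting to be freed. Below is a minimal userspace sketch of that queue, assuming a C11 atomic_flag spinlock in place of the kernel's destroy_lock. The names struct mark_list, queue_for_free() and reap_queued() are hypothetical. The SRCU grace period wait is only marked by a comment, since it has no userspace equivalent here.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct fsnotify_mark_list. */
struct mark_list {
	union {
		void *first;			/* list of marks while alive */
		struct mark_list *destroy_next;	/* free-queue link once dead */
	};
};

/* Minimal spinlock standing in for the kernel's destroy_lock. */
static atomic_flag destroy_lock = ATOMIC_FLAG_INIT;
static struct mark_list *destroy_list;

static void destroy_lock_acquire(void)
{
	while (atomic_flag_test_and_set(&destroy_lock))
		;	/* spin until the flag was previously clear */
}

static void destroy_lock_release(void)
{
	atomic_flag_clear(&destroy_lock);
}

/* Queue an empty, already unhooked list head for deferred freeing. */
static void queue_for_free(struct mark_list *list)
{
	destroy_lock_acquire();
	list->destroy_next = destroy_list;
	destroy_list = list;
	destroy_lock_release();
}

/* Detach the whole queue under the lock, then free it; returns the count. */
static int reap_queued(void)
{
	struct mark_list *list, *victim;
	int freed = 0;

	destroy_lock_acquire();
	list = destroy_list;
	destroy_list = NULL;
	destroy_lock_release();

	/* The kernel code waits here: synchronize_srcu(&fsnotify_mark_srcu). */
	while (list) {
		victim = list;
		list = list->destroy_next;
		free(victim);
		freed++;
	}
	return freed;
}
```

Detaching the whole queue before waiting means one grace period covers every head queued so far, and heads queued during the wait are left for the next run of the work function.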
diff --git a/kernel/audit_tree.c b/kernel/audit_tree.c
index f0859828de09..b95a97bf2fd3 100644
--- a/kernel/audit_tree.c
+++ b/kernel/audit_tree.c
@@ -166,7 +166,7 @@ static __cacheline_aligned_in_smp DEFINE_SPINLOCK(hash_lock);
 /* Function to return search key in our hash from inode. */
 static unsigned long inode_to_key(const struct inode *inode)
 {
-	return (unsigned long)inode;
+	return (unsigned long)inode->i_fsnotify_marks;
 }
 
 /*
@@ -175,7 +175,7 @@ static unsigned long inode_to_key(const struct inode *inode)
  */
 static unsigned long chunk_to_key(struct audit_chunk *chunk)
 {
-	return (unsigned long)chunk->mark.inode;
+	return (unsigned long)chunk->mark.obj_list_head;
 }
 
 static inline struct list_head *chunk_hash(unsigned long key)
@@ -248,7 +248,8 @@ static void untag_chunk(struct node *p)
 
 	mutex_lock(&entry->group->mark_mutex);
 	spin_lock(&entry->lock);
-	if (chunk->dead || !entry->inode) {
+	if (chunk->dead || !entry->obj_list_head ||
+	    !entry->obj_list_head->inode) {
 		spin_unlock(&entry->lock);
 		if (new)
 			free_chunk(new);
@@ -274,8 +275,8 @@ static void untag_chunk(struct node *p)
 	if (!new)
 		goto Fallback;
 
-	if (fsnotify_add_mark_locked(&new->mark, entry->group, entry->inode,
-				     NULL, 1)) {
+	if (fsnotify_add_mark_locked(&new->mark, entry->group,
+				     entry->obj_list_head->inode, NULL, 1)) {
 		fsnotify_put_mark(&new->mark);
 		goto Fallback;
 	}
@@ -405,7 +406,7 @@ static int tag_chunk(struct inode *inode, struct audit_tree *tree)
 
 	mutex_lock(&old_entry->group->mark_mutex);
 	spin_lock(&old_entry->lock);
-	if (!old_entry->inode) {
+	if (!old_entry->obj_list_head || !old_entry->obj_list_head->inode) {
 		/* old_entry is being shot, lets just lie */
 		spin_unlock(&old_entry->lock);
 		mutex_unlock(&old_entry->group->mark_mutex);
@@ -415,7 +416,7 @@ static int tag_chunk(struct inode *inode, struct audit_tree *tree)
 	}
 
 	if (fsnotify_add_mark_locked(chunk_entry, old_entry->group,
-				     old_entry->inode, NULL, 1)) {
+			     old_entry->obj_list_head->inode, NULL, 1)) {
 		spin_unlock(&old_entry->lock);
 		mutex_unlock(&old_entry->group->mark_mutex);
 		fsnotify_put_mark(chunk_entry);
diff --git a/kernel/auditsc.c b/kernel/auditsc.c
index cf1fa43512c1..e990de915341 100644
--- a/kernel/auditsc.c
+++ b/kernel/auditsc.c
@@ -1591,7 +1591,7 @@ static inline void handle_one(const struct inode *inode)
 	struct audit_tree_refs *p;
 	struct audit_chunk *chunk;
 	int count;
-	if (likely(hlist_empty(&inode->i_fsnotify_marks)))
+	if (likely(!inode->i_fsnotify_marks))
 		return;
 	context = current->audit_context;
 	p = context->trees;
@@ -1634,7 +1634,7 @@ static void handle_path(const struct dentry *dentry)
 	seq = read_seqbegin(&rename_lock);
 	for(;;) {
 		struct inode *inode = d_backing_inode(d);
-		if (inode && unlikely(!hlist_empty(&inode->i_fsnotify_marks))) {
+		if (inode && unlikely(inode->i_fsnotify_marks)) {
 			struct audit_chunk *chunk;
 			chunk = audit_tree_lookup(inode);
 			if (chunk) {
-- 
2.10.2



* [PATCH 09/22] inotify: Do not drop mark reference under idr_lock
  2016-12-22  9:15 [PATCH 0/22] fsnotify: Avoid SRCU stalls with fanotify permission events Jan Kara
                   ` (7 preceding siblings ...)
  2016-12-22  9:15 ` [PATCH 08/22] fsnotify: Attach marks to object via dedicated head structure Jan Kara
@ 2016-12-22  9:15 ` Jan Kara
  2016-12-23  8:04   ` Amir Goldstein
  2016-12-22  9:15 ` [PATCH 10/22] fsnotify: Detach mark from object list when last reference is dropped Jan Kara
                   ` (13 subsequent siblings)
  22 siblings, 1 reply; 80+ messages in thread
From: Jan Kara @ 2016-12-22  9:15 UTC (permalink / raw)
  To: linux-fsdevel
  Cc: Amir Goldstein, Lino Sanfilippo, Miklos Szeredi, Paul Moore, Jan Kara

Dropping a mark reference can result in the mark being freed. Although
this should not happen in inotify_remove_from_idr(), since the caller
should hold another reference, there is no reason to risk a lockup right
after the WARN_ON. Also fold do_inotify_remove_from_idr() into its
single call site, as that function really is just two lines of real
code.
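
To illustrate the ordering this aims for, here is a minimal userspace
sketch, not the kernel code. It assumes C11 atomics, and every demo_*
name is hypothetical. It models only the lookup reference, which is
dropped after the lock protecting the table has been released:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a refcounted mark; not the kernel structure. */
struct demo_mark {
	atomic_int refcnt;
	int wd;		/* -1 once removed from the lookup table */
	bool freed;	/* set by the final put; a real put would free memory */
};

/* Hypothetical stand-in for idr_lock: a trivial test-and-set spinlock. */
static atomic_bool demo_table_lock;

static void demo_lock(void)
{
	while (atomic_exchange(&demo_table_lock, true))
		;	/* spin until the previous holder releases the lock */
}

static void demo_unlock(void)
{
	atomic_store(&demo_table_lock, false);
}

/* Drop one reference; the caller that drops the last one frees the mark. */
static void demo_put_mark(struct demo_mark *mark)
{
	if (atomic_fetch_sub(&mark->refcnt, 1) == 1) {
		/* Single-threaded demo check: freeing must not run under the lock. */
		assert(!atomic_load(&demo_table_lock));
		mark->freed = true;
	}
}

/*
 * Invalidate the descriptor under the lock, then drop the lookup reference
 * (found may be NULL) only after the lock has been released.
 */
static void demo_remove_from_table(struct demo_mark *mark,
				   struct demo_mark *found)
{
	demo_lock();
	mark->wd = -1;
	demo_unlock();

	if (found)
		demo_put_mark(found);
}
```

If the put were issued before demo_unlock() and happened to be the last
one, the free path would run with the lock held, which is the situation
the reordering avoids.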

Signed-off-by: Jan Kara <jack@suse.cz>
---
 fs/notify/inotify/inotify_user.c | 24 ++++++------------------
 1 file changed, 6 insertions(+), 18 deletions(-)

diff --git a/fs/notify/inotify/inotify_user.c b/fs/notify/inotify/inotify_user.c
index 3697567c7897..06dae605158d 100644
--- a/fs/notify/inotify/inotify_user.c
+++ b/fs/notify/inotify/inotify_user.c
@@ -397,21 +397,6 @@ static struct inotify_inode_mark *inotify_idr_find(struct fsnotify_group *group,
 	return i_mark;
 }
 
-static void do_inotify_remove_from_idr(struct fsnotify_group *group,
-				       struct inotify_inode_mark *i_mark)
-{
-	struct idr *idr = &group->inotify_data.idr;
-	spinlock_t *idr_lock = &group->inotify_data.idr_lock;
-	int wd = i_mark->wd;
-
-	assert_spin_locked(idr_lock);
-
-	idr_remove(idr, wd);
-
-	/* removed from the idr, drop that ref */
-	fsnotify_put_mark(&i_mark->fsn_mark);
-}
-
 /*
  * Remove the mark from the idr (if present) and drop the reference
  * on the mark because it was in the idr.
@@ -419,6 +404,7 @@ static void do_inotify_remove_from_idr(struct fsnotify_group *group,
 static void inotify_remove_from_idr(struct fsnotify_group *group,
 				    struct inotify_inode_mark *i_mark)
 {
+	struct idr *idr = &group->inotify_data.idr;
 	spinlock_t *idr_lock = &group->inotify_data.idr_lock;
 	struct inotify_inode_mark *found_i_mark = NULL;
 	int wd;
@@ -470,13 +456,15 @@ static void inotify_remove_from_idr(struct fsnotify_group *group,
 		BUG();
 	}
 
-	do_inotify_remove_from_idr(group, i_mark);
+	idr_remove(idr, wd);
+	/* Removed from the idr, drop that ref. */
+	fsnotify_put_mark(&i_mark->fsn_mark);
 out:
+	i_mark->wd = -1;
+	spin_unlock(idr_lock);
 	/* match the ref taken by inotify_idr_find_locked() */
 	if (found_i_mark)
 		fsnotify_put_mark(&found_i_mark->fsn_mark);
-	i_mark->wd = -1;
-	spin_unlock(idr_lock);
 }
 
 /*
-- 
2.10.2



* [PATCH 10/22] fsnotify: Detach mark from object list when last reference is dropped
  2016-12-22  9:15 [PATCH 0/22] fsnotify: Avoid SRCU stalls with fanotify permission events Jan Kara
                   ` (8 preceding siblings ...)
  2016-12-22  9:15 ` [PATCH 09/22] inotify: Do not drop mark reference under idr_lock Jan Kara
@ 2016-12-22  9:15 ` Jan Kara
  2016-12-23 10:51   ` Amir Goldstein
  2016-12-22  9:15 ` [PATCH 11/22] fsnotify: Remove special handling of mark destruction on group shutdown Jan Kara
                   ` (12 subsequent siblings)
  22 siblings, 1 reply; 80+ messages in thread
From: Jan Kara @ 2016-12-22  9:15 UTC (permalink / raw)
  To: linux-fsdevel
  Cc: Amir Goldstein, Lino Sanfilippo, Miklos Szeredi, Paul Moore, Jan Kara

Instead of removing mark from object list from fsnotify_detach_mark(),
remove the mark when last reference to the mark is dropped. This will
allow fanotify to wait for userspace response to event without having to
hold onto fsnotify_mark_srcu.
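
The "detach on last put" idea can be sketched in userspace as follows.
This is only an illustration under stated assumptions: C11 atomics, a
toy spinlock, a singly linked list instead of an hlist, and hypothetical
demo_* names. demo_dec_and_lock() mimics the contract of the kernel's
atomic_dec_and_lock(): it returns true, with the lock held, only when
the count dropped to zero.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

struct demo_mark;

/* Hypothetical per-object list head with its own lock. */
struct demo_list {
	atomic_bool lock;
	struct demo_mark *first;
};

struct demo_mark {
	atomic_int refcnt;
	struct demo_list *list;	/* valid while a reference is held */
	struct demo_mark *next;
	bool destroyed;
};

static void demo_lock(atomic_bool *lock)
{
	while (atomic_exchange(lock, true))
		;	/* spin until the lock is free */
}

static void demo_unlock(atomic_bool *lock)
{
	atomic_store(lock, false);
}

/* Returns true with the lock held iff the count dropped to zero. */
static bool demo_dec_and_lock(atomic_int *cnt, atomic_bool *lock)
{
	int old = atomic_load(cnt);

	/* Fast path: drop a reference without the lock while others remain. */
	while (old > 1) {
		if (atomic_compare_exchange_weak(cnt, &old, old - 1))
			return false;
	}
	/* Slow path: we may be the last holder, so decide under the lock. */
	demo_lock(lock);
	if (atomic_fetch_sub(cnt, 1) == 1)
		return true;
	demo_unlock(lock);
	return false;
}

/* Drop a reference; the last put unlinks the mark from its object list. */
static void demo_put_mark(struct demo_mark *mark)
{
	struct demo_list *list = mark->list;
	struct demo_mark **pp;

	assert(list != NULL);	/* this sketch only handles attached marks */
	if (!demo_dec_and_lock(&mark->refcnt, &list->lock))
		return;
	for (pp = &list->first; *pp; pp = &(*pp)->next) {
		if (*pp == mark) {
			*pp = mark->next;
			break;
		}
	}
	mark->next = NULL;
	mark->list = NULL;
	demo_unlock(&list->lock);
	mark->destroyed = true;
}
```

Because the zero transition happens under the list lock, a walker that
holds the same lock and finds the mark on the list can still take a
reference safely.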

Signed-off-by: Jan Kara <jack@suse.cz>
---
 fs/notify/mark.c                 | 166 +++++++++++++++++++++++++--------------
 include/linux/fsnotify_backend.h |   4 +-
 2 files changed, 108 insertions(+), 62 deletions(-)

diff --git a/fs/notify/mark.c b/fs/notify/mark.c
index a9870beabe67..55550dad6617 100644
--- a/fs/notify/mark.c
+++ b/fs/notify/mark.c
@@ -49,7 +49,13 @@
  *
  * A list of notification marks relating to inode / mnt is contained in
  * fsnotify_mark_list. That structure is alive as long as there are any marks
- * in the list and is also protected by fsnotify_mark_srcu.
+ * in the list and is also protected by fsnotify_mark_srcu. A mark gets
+ * detached from fsnotify_mark_list when last reference to the mark is dropped.
+ * Thus having mark reference is enough to protect mark->obj_list_head pointer
+ * and to make sure fsnotify_mark_list cannot disappear. Also because we remove
+ * mark from g_list before dropping mark reference associated with that, any
+ * mark found through g_list is guaranteed to have mark->obj_list_head set
+ * until we drop group->mark_mutex.
  *
  * LIFETIME:
  * Inode marks survive between when they are added to an inode and when their
@@ -102,15 +108,6 @@ void fsnotify_get_mark(struct fsnotify_mark *mark)
 	atomic_inc(&mark->refcnt);
 }
 
-void fsnotify_put_mark(struct fsnotify_mark *mark)
-{
-	if (atomic_dec_and_test(&mark->refcnt)) {
-		if (mark->group)
-			fsnotify_put_group(mark->group);
-		mark->free_mark(mark);
-	}
-}
-
 static void __fsnotify_recalc_mask(struct fsnotify_mark_list *list)
 {
 	u32 new_mask = 0;
@@ -165,34 +162,47 @@ static void fsnotify_list_destroy_workfn(struct work_struct *work)
 	}
 }
 
-static struct inode *fsnotify_detach_from_object(struct fsnotify_mark *mark)
+static struct inode *fsnotify_detach_list_from_object(
+					struct fsnotify_mark_list *list)
+{
+	struct inode *inode = NULL;
+
+	if (list->flags & FSNOTIFY_LIST_TYPE_INODE) {
+		inode = list->inode;
+		inode->i_fsnotify_marks = NULL;
+		inode->i_fsnotify_mask = 0;
+		list->inode = NULL;
+		list->flags &= ~FSNOTIFY_LIST_TYPE_INODE;
+	} else if (list->flags & FSNOTIFY_LIST_TYPE_VFSMOUNT) {
+		real_mount(list->mnt)->mnt_fsnotify_marks = NULL;
+		real_mount(list->mnt)->mnt_fsnotify_mask = 0;
+		list->mnt = NULL;
+		list->flags &= ~FSNOTIFY_LIST_TYPE_VFSMOUNT;
+	}
+
+	return inode;
+}
+
+/* Called with mark->obj_list_head->lock held, releases it */
+static void fsnotify_detach_from_object(struct fsnotify_mark *mark)
 {
 	struct fsnotify_mark_list *list;
 	struct inode *inode = NULL;
 	bool free_list = false;
 
 	list = mark->obj_list_head;
-	spin_lock(&list->lock);
 	hlist_del_init_rcu(&mark->obj_list);
 	if (hlist_empty(&list->list)) {
-		if (list->flags & FSNOTIFY_LIST_TYPE_INODE) {
-			inode = list->inode;
-			inode->i_fsnotify_marks = NULL;
-			inode->i_fsnotify_mask = 0;
-			list->inode = NULL;
-			list->flags &= ~FSNOTIFY_LIST_TYPE_INODE;
-		} else if (list->flags & FSNOTIFY_LIST_TYPE_VFSMOUNT) {
-			real_mount(list->mnt)->mnt_fsnotify_marks = NULL;
-			real_mount(list->mnt)->mnt_fsnotify_mask = 0;
-			list->mnt = NULL;
-			list->flags &= ~FSNOTIFY_LIST_TYPE_VFSMOUNT;
-		}
+		inode = fsnotify_detach_list_from_object(list);
 		free_list = true;
 	} else
 		__fsnotify_recalc_mask(list);
 	mark->obj_list_head = NULL;
 	spin_unlock(&list->lock);
 
+	if (inode)
+		iput(inode);
+
 	if (free_list) {
 		spin_lock(&destroy_lock);
 		list->destroy_next = list_destroy_list;
@@ -200,49 +210,66 @@ static struct inode *fsnotify_detach_from_object(struct fsnotify_mark *mark)
 		spin_unlock(&destroy_lock);
 		queue_work(system_unbound_wq, &list_reaper_work);
 	}
+}
 
-	return inode;
+static void fsnotify_final_mark_destroy(struct fsnotify_mark *mark)
+{
+	if (mark->group)
+		fsnotify_put_group(mark->group);
+	mark->free_mark(mark);
+}
+
+void fsnotify_put_mark(struct fsnotify_mark *mark)
+{
+	/* Catch marks that were actually never attached to object */
+	if (!mark->obj_list_head && atomic_dec_and_test(&mark->refcnt)) {
+		fsnotify_final_mark_destroy(mark);
+		return;
+	}
+
+	/*
+	 * We have to be careful so that traversals of obj_list under lock can
+	 * safely grab mark reference.
+	 */
+	if (atomic_dec_and_lock(&mark->refcnt, &mark->obj_list_head->lock)) {
+		fsnotify_detach_from_object(mark);
+		/*
+		 * Note that we didn't update flags telling whether inode cares
+		 * about what's happening with children. We update these flags
+		 * from __fsnotify_parent() lazily when next event happens on
+		 * one of our children.
+		 */
+		fsnotify_final_mark_destroy(mark);
+	}
 }
 
 /*
- * Remove mark from inode / vfsmount list, group list, drop inode reference
- * if we got one.
+ * Mark mark as dead, remove it from group list. Mark still stays in object
+ * list until its last reference is dropped. The reference corresponding to
+ * group list gets dropped after SRCU period ends from
+ * fsnotify_mark_destroy_list(). Note that we rely on mark being removed from
+ * group list before corresponding reference to it is dropped. In particular we
+ * rely on mark->obj_list_head being valid while we hold group->mark_mutex if
+ * we found the mark through g_list.
  *
  * Must be called with group->mark_mutex held.
  */
 void fsnotify_detach_mark(struct fsnotify_mark *mark)
 {
-	struct inode *inode = NULL;
 	struct fsnotify_group *group = mark->group;
 
 	BUG_ON(!mutex_is_locked(&group->mark_mutex));
 
 	spin_lock(&mark->lock);
-
 	/* something else already called this function on this mark */
 	if (!(mark->flags & FSNOTIFY_MARK_FLAG_ATTACHED)) {
 		spin_unlock(&mark->lock);
 		return;
 	}
-
 	mark->flags &= ~FSNOTIFY_MARK_FLAG_ATTACHED;
-
-	inode = fsnotify_detach_from_object(mark);
-
-	/*
-	 * Note that we didn't update flags telling whether inode cares about
-	 * what's happening with children. We update these flags from
-	 * __fsnotify_parent() lazily when next event happens on one of our
-	 * children.
-	 */
-
 	list_del_init(&mark->g_list);
-
 	spin_unlock(&mark->lock);
 
-	if (inode)
-		iput(inode);
-
 	atomic_dec(&group->num_marks);
 }
 
@@ -445,7 +472,9 @@ static int fsnotify_add_mark_list(struct fsnotify_mark *mark,
 	hlist_for_each_entry(lmark, &list->list, obj_list) {
 		last = lmark;
 
-		if ((lmark->group == mark->group) && !allow_dups) {
+		if ((lmark->group == mark->group) &&
+		    (lmark->flags & FSNOTIFY_MARK_FLAG_ATTACHED) &&
+		    !allow_dups) {
 			err = -EEXIST;
 			goto out_err;
 		}
@@ -496,7 +525,7 @@ int fsnotify_add_mark_locked(struct fsnotify_mark *mark,
 	mark->group = group;
 	list_add(&mark->g_list, &group->marks_list);
 	atomic_inc(&group->num_marks);
-	fsnotify_get_mark(mark); /* for i_list and g_list */
+	fsnotify_get_mark(mark); /* for g_list */
 	spin_unlock(&mark->lock);
 
 	ret = fsnotify_add_mark_list(mark, inode, mnt, allow_dups);
@@ -549,7 +578,8 @@ struct fsnotify_mark *fsnotify_find_mark(struct fsnotify_mark_list **listp,
 	if (!list)
 		return NULL;
 	hlist_for_each_entry(mark, &list->list, obj_list) {
-		if (mark->group == group) {
+		if (mark->group == group &&
+		    (mark->flags & FSNOTIFY_MARK_FLAG_ATTACHED)) {
 			fsnotify_get_mark(mark);
 			fsnotify_put_list(list);
 			return mark;
@@ -629,23 +659,39 @@ void fsnotify_detach_group_marks(struct fsnotify_group *group)
 void fsnotify_destroy_marks(struct fsnotify_mark_list **listp)
 {
 	struct fsnotify_mark_list *list;
-	struct fsnotify_mark *mark;
+	struct fsnotify_mark *mark, *old_mark = NULL;
+	struct inode *inode;
 
-	while ((list = fsnotify_grab_list(listp))) {
-		/*
-		 * We have to be careful since we can race with e.g.
-		 * fsnotify_clear_marks_by_group() and once we drop the list
-		 * lock, mark can get removed from the obj_list and destroyed.
-		 * But we are holding mark reference so mark cannot be freed
-		 * and calling fsnotify_destroy_mark() more than once is fine.
-		 */
-		mark = hlist_entry(list->list.first, struct fsnotify_mark,
-				   obj_list);
+	list = fsnotify_grab_list(listp);
+	if (!list)
+		return;
+	/*
+	 * We have to be careful since we can race with e.g.
+	 * fsnotify_clear_marks_by_group() and once we drop the list->lock, the
+	 * list can get modified. However we are holding mark reference and
+	 * thus our mark cannot be removed from obj_list so we can continue
+	 * iteration after regaining list->lock.
+	 */
+	hlist_for_each_entry(mark, &list->list, obj_list) {
 		fsnotify_get_mark(mark);
-		fsnotify_put_list(list);
+		spin_unlock(&list->lock);
+		if (old_mark)
+			fsnotify_put_mark(old_mark);
+		old_mark = mark;
 		fsnotify_destroy_mark(mark, mark->group);
-		fsnotify_put_mark(mark);
+		spin_lock(&list->lock);
 	}
+	/*
+	 * Detach list from object now so that we don't pin inode until all
+	 * mark references get dropped. It would lead to strange results such
+	 * as delaying inode deletion or blocking unmount.
+	 */
+	inode = fsnotify_detach_list_from_object(list);
+	fsnotify_put_list(list);
+	if (inode)
+		iput(inode);
+	if (old_mark)
+		fsnotify_put_mark(old_mark);
 }
 
 /*
diff --git a/include/linux/fsnotify_backend.h b/include/linux/fsnotify_backend.h
index 6086fc7ff6df..76b3c34172c7 100644
--- a/include/linux/fsnotify_backend.h
+++ b/include/linux/fsnotify_backend.h
@@ -244,9 +244,9 @@ struct fsnotify_mark {
 	struct list_head g_list;
 	/* Protects inode / mnt pointers, flags, masks */
 	spinlock_t lock;
-	/* List of marks for inode / vfsmount [obj_list_head->lock] */
+	/* List of marks for inode / vfsmount [obj_list_head->lock, mark ref] */
 	struct hlist_node obj_list;
-	/* Head of list of marks for an object [mark->lock, group->mark_mutex] */
+	/* Head of list of marks for an object [mark ref] */
 	struct fsnotify_mark_list *obj_list_head;
 	/* Events types to ignore [mark->lock, group->mark_mutex] */
 	__u32 ignored_mask;
-- 
2.10.2



* [PATCH 11/22] fsnotify: Remove special handling of mark destruction on group shutdown
  2016-12-22  9:15 [PATCH 0/22] fsnotify: Avoid SRCU stalls with fanotify permission events Jan Kara
                   ` (9 preceding siblings ...)
  2016-12-22  9:15 ` [PATCH 10/22] fsnotify: Detach mark from object list when last reference is dropped Jan Kara
@ 2016-12-22  9:15 ` Jan Kara
  2016-12-23 12:12   ` Amir Goldstein
  2016-12-22  9:15 ` [PATCH 12/22] fsnotify: Move queueing of mark for destruction into fsnotify_put_mark() Jan Kara
                   ` (11 subsequent siblings)
  22 siblings, 1 reply; 80+ messages in thread
From: Jan Kara @ 2016-12-22  9:15 UTC (permalink / raw)
  To: linux-fsdevel
  Cc: Amir Goldstein, Lino Sanfilippo, Miklos Szeredi, Paul Moore, Jan Kara

Currently we queue all marks for destruction on group shutdown and then
destroy them from fsnotify_destroy_group() instead of from a worker
thread, which is the usual path. However, the worker can already be
processing some list of marks to destroy, so this does not make 100%
sure that all marks are really destroyed by the time the group is shut
down. This isn't a big problem, as each mark holds a group reference and
thus the group stays partially alive until all marks are really freed,
but there's no point in complicating our lives: just wait for the
delayed work to finish instead.

Signed-off-by: Jan Kara <jack@suse.cz>
---
 fs/notify/fsnotify.h |  6 ++----
 fs/notify/group.c    | 10 ++++++----
 fs/notify/mark.c     |  9 +++++----
 3 files changed, 13 insertions(+), 12 deletions(-)

diff --git a/fs/notify/fsnotify.h b/fs/notify/fsnotify.h
index bdc489af0e5b..670c2bac1342 100644
--- a/fs/notify/fsnotify.h
+++ b/fs/notify/fsnotify.h
@@ -36,10 +36,8 @@ static inline void fsnotify_clear_marks_by_mount(struct vfsmount *mnt)
 }
 /* prepare for freeing all marks associated with given group */
 extern void fsnotify_detach_group_marks(struct fsnotify_group *group);
-/*
- * wait for fsnotify_mark_srcu period to end and free all marks in destroy_list
- */
-extern void fsnotify_mark_destroy_list(void);
+/* Wait until all marks queued for destruction are destroyed */
+extern void fsnotify_wait_marks_destroyed(void);
 
 /*
  * update the dentry->d_flags of all of inode's children to indicate if inode cares
diff --git a/fs/notify/group.c b/fs/notify/group.c
index fbe3cbebec16..0fb4aadcc19f 100644
--- a/fs/notify/group.c
+++ b/fs/notify/group.c
@@ -66,14 +66,16 @@ void fsnotify_destroy_group(struct fsnotify_group *group)
 	 */
 	fsnotify_group_stop_queueing(group);
 
-	/* clear all inode marks for this group, attach them to destroy_list */
+	/* Clear all marks for this group and queue them for destruction */
 	fsnotify_detach_group_marks(group);
 
 	/*
-	 * Wait for fsnotify_mark_srcu period to end and free all marks in
-	 * destroy_list
+	 * Wait until all marks get really destroyed. We could actually destroy
+	 * them ourselves instead of waiting for worker to do it, however that
+	 * would be racy as worker can already be processing some marks before
+	 * we even entered fsnotify_destroy_group().
 	 */
-	fsnotify_mark_destroy_list();
+	fsnotify_wait_marks_destroyed();
 
 	/*
 	 * Since we have waited for fsnotify_mark_srcu in
diff --git a/fs/notify/mark.c b/fs/notify/mark.c
index 55550dad6617..60f5754ce5ed 100644
--- a/fs/notify/mark.c
+++ b/fs/notify/mark.c
@@ -650,7 +650,7 @@ void fsnotify_detach_group_marks(struct fsnotify_group *group)
 		fsnotify_get_mark(mark);
 		fsnotify_detach_mark(mark);
 		mutex_unlock(&group->mark_mutex);
-		__fsnotify_free_mark(mark);
+		fsnotify_free_mark(mark);
 		fsnotify_put_mark(mark);
 	}
 }
@@ -710,7 +710,7 @@ void fsnotify_init_mark(struct fsnotify_mark *mark,
  * Destroy all marks in destroy_list, waits for SRCU period to finish before
  * actually freeing marks.
  */
-void fsnotify_mark_destroy_list(void)
+static void fsnotify_mark_destroy_workfn(struct work_struct *work)
 {
 	struct fsnotify_mark *mark, *next;
 	struct list_head private_destroy_list;
@@ -728,7 +728,8 @@ void fsnotify_mark_destroy_list(void)
 	}
 }
 
-static void fsnotify_mark_destroy_workfn(struct work_struct *work)
+/* Wait for all marks queued for destruction to be actually destroyed */
+void fsnotify_wait_marks_destroyed(void)
 {
-	fsnotify_mark_destroy_list();
+	flush_delayed_work(&reaper_work);
 }
-- 
2.10.2



* [PATCH 12/22] fsnotify: Move queueing of mark for destruction into fsnotify_put_mark()
  2016-12-22  9:15 [PATCH 0/22] fsnotify: Avoid SRCU stalls with fanotify permission events Jan Kara
                   ` (10 preceding siblings ...)
  2016-12-22  9:15 ` [PATCH 11/22] fsnotify: Remove special handling of mark destruction on group shutdown Jan Kara
@ 2016-12-22  9:15 ` Jan Kara
  2016-12-26 14:15   ` Amir Goldstein
  2016-12-22  9:15 ` [PATCH 13/22] fsnotify: Provide framework for dropping SRCU lock in ->handle_event Jan Kara
                   ` (10 subsequent siblings)
  22 siblings, 1 reply; 80+ messages in thread
From: Jan Kara @ 2016-12-22  9:15 UTC (permalink / raw)
  To: linux-fsdevel
  Cc: Amir Goldstein, Lino Sanfilippo, Miklos Szeredi, Paul Moore, Jan Kara

Currently we queue mark into a list of marks for destruction in
__fsnotify_free_mark() and keep last mark reference dangling. After the
worker waits for SRCU period, it drops the last reference to the mark
which frees it. This scheme has the disadvantage that if we hold
reference to mark and drop and reacquire SRCU lock, the mark can get
freed immediately which is slightly inconvenient and we will need to
avoid this in the future.

Move to a scheme where queueing of mark into a list of marks for
destruction happens when the last reference to the mark is dropped. Also
drop reference to the mark held by group list already when mark is
removed from that list instead of dropping it only from the destruction
worker.
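
The new scheme can be pictured with the following userspace sketch. It
is an illustration only: the demo_* names are hypothetical, a toy
spinlock replaces destroy_lock, and the SRCU grace period and the
delayed work are reduced to comments. The point it shows is that a mark
reaches the destroy list only from the final put, and that the reaper
performs the actual destruction.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical refcounted mark that can be chained on a destroy list. */
struct demo_mark {
	atomic_int refcnt;
	struct demo_mark *destroy_next;
	bool destroyed;
};

static struct demo_mark *demo_destroy_list;	/* marks awaiting the reaper */
static atomic_bool demo_destroy_lock;		/* protects demo_destroy_list */

static void demo_lock(void)
{
	while (atomic_exchange(&demo_destroy_lock, true))
		;	/* spin until the lock is free */
}

static void demo_unlock(void)
{
	atomic_store(&demo_destroy_lock, false);
}

/* Drop a reference; only the last put queues the mark for destruction. */
static void demo_put_mark(struct demo_mark *mark)
{
	assert(!mark->destroyed);
	if (atomic_fetch_sub(&mark->refcnt, 1) != 1)
		return;
	demo_lock();
	mark->destroy_next = demo_destroy_list;
	demo_destroy_list = mark;
	demo_unlock();
	/* The kernel would queue the delayed reaper work at this point. */
}

/* Reaper: take over the whole list, then destroy each queued mark. */
static int demo_reap(void)
{
	struct demo_mark *mark, *next;
	int reaped = 0;

	demo_lock();
	mark = demo_destroy_list;
	demo_destroy_list = NULL;
	demo_unlock();

	/* The kernel worker would wait for an SRCU grace period here. */
	for (; mark; mark = next) {
		next = mark->destroy_next;
		mark->destroy_next = NULL;
		mark->destroyed = true;
		reaped++;
	}
	return reaped;
}
```

A holder of a reference can therefore drop and retake the read-side
lock without the mark being destroyed underneath it, since destruction
cannot even be queued until that reference is put.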

Signed-off-by: Jan Kara <jack@suse.cz>
---
 fs/notify/mark.c | 77 ++++++++++++++++++++++----------------------------------
 1 file changed, 30 insertions(+), 47 deletions(-)

diff --git a/fs/notify/mark.c b/fs/notify/mark.c
index 60f5754ce5ed..fee4255e9227 100644
--- a/fs/notify/mark.c
+++ b/fs/notify/mark.c
@@ -105,6 +105,7 @@ static DECLARE_WORK(list_reaper_work, fsnotify_list_destroy_workfn);
 
 void fsnotify_get_mark(struct fsnotify_mark *mark)
 {
+	WARN_ON_ONCE(!atomic_read(&mark->refcnt));
 	atomic_inc(&mark->refcnt);
 }
 
@@ -239,26 +240,32 @@ void fsnotify_put_mark(struct fsnotify_mark *mark)
 		 * from __fsnotify_parent() lazily when next event happens on
 		 * one of our children.
 		 */
-		fsnotify_final_mark_destroy(mark);
+		spin_lock(&destroy_lock);
+		list_add(&mark->g_list, &destroy_list);
+		spin_unlock(&destroy_lock);
+		queue_delayed_work(system_unbound_wq, &reaper_work,
+				   FSNOTIFY_REAPER_DELAY);
 	}
 }
 
 /*
  * Mark mark as dead, remove it from group list. Mark still stays in object
- * list until its last reference is dropped. The reference corresponding to
- * group list gets dropped after SRCU period ends from
- * fsnotify_mark_destroy_list(). Note that we rely on mark being removed from
- * group list before corresponding reference to it is dropped. In particular we
- * rely on mark->obj_list_head being valid while we hold group->mark_mutex if
- * we found the mark through g_list.
+ * list until its last reference is dropped.  Note that we rely on mark being
+ * removed from group list before corresponding reference to it is dropped. In
+ * particular we rely on mark->obj_list_head being valid while we hold
+ * group->mark_mutex if we found the mark through g_list.
  *
- * Must be called with group->mark_mutex held.
+ * Must be called with group->mark_mutex held. The caller must either hold
+ * reference to the mark or be protected by fsnotify_mark_srcu.
  */
 void fsnotify_detach_mark(struct fsnotify_mark *mark)
 {
 	struct fsnotify_group *group = mark->group;
 
-	BUG_ON(!mutex_is_locked(&group->mark_mutex));
+	WARN_ON_ONCE(!mutex_is_locked(&group->mark_mutex));
+	WARN_ON_ONCE(!srcu_read_lock_held(&fsnotify_mark_srcu) &&
+		     atomic_read(&mark->refcnt) < 1 +
+			!!(mark->flags & FSNOTIFY_MARK_FLAG_ATTACHED));
 
 	spin_lock(&mark->lock);
 	/* something else already called this function on this mark */
@@ -271,18 +278,20 @@ void fsnotify_detach_mark(struct fsnotify_mark *mark)
 	spin_unlock(&mark->lock);
 
 	atomic_dec(&group->num_marks);
+
+	/* Drop mark reference acquired in fsnotify_add_mark_locked() */
+	fsnotify_put_mark(mark);
 }
 
 /*
- * Prepare mark for freeing and add it to the list of marks prepared for
- * freeing. The actual freeing must happen after SRCU period ends and the
- * caller is responsible for this.
+ * Free fsnotify mark. The mark is actually only marked as being freed.  The
+ * freeing is actually happening only once last reference to the mark is
+ * dropped from a workqueue which first waits for srcu period end.
  *
- * The function returns true if the mark was added to the list of marks for
- * freeing. The function returns false if someone else has already called
- * __fsnotify_free_mark() for the mark.
+ * Caller must have a reference to the mark or be protected by
+ * fsnotify_mark_srcu.
  */
-static bool __fsnotify_free_mark(struct fsnotify_mark *mark)
+void fsnotify_free_mark(struct fsnotify_mark *mark)
 {
 	struct fsnotify_group *group = mark->group;
 
@@ -290,7 +299,7 @@ static bool __fsnotify_free_mark(struct fsnotify_mark *mark)
 	/* something else already called this function on this mark */
 	if (!(mark->flags & FSNOTIFY_MARK_FLAG_ALIVE)) {
 		spin_unlock(&mark->lock);
-		return false;
+		return;
 	}
 	mark->flags &= ~FSNOTIFY_MARK_FLAG_ALIVE;
 	spin_unlock(&mark->lock);
@@ -302,25 +311,6 @@ static bool __fsnotify_free_mark(struct fsnotify_mark *mark)
 	 */
 	if (group->ops->freeing_mark)
 		group->ops->freeing_mark(mark, group);
-
-	spin_lock(&destroy_lock);
-	list_add(&mark->g_list, &destroy_list);
-	spin_unlock(&destroy_lock);
-
-	return true;
-}
-
-/*
- * Free fsnotify mark. The freeing is actually happening from a workqueue which
- * first waits for srcu period end. Caller must have a reference to the mark
- * or be protected by fsnotify_mark_srcu.
- */
-void fsnotify_free_mark(struct fsnotify_mark *mark)
-{
-	if (__fsnotify_free_mark(mark)) {
-		queue_delayed_work(system_unbound_wq, &reaper_work,
-				   FSNOTIFY_REAPER_DELAY);
-	}
 }
 
 void fsnotify_destroy_mark(struct fsnotify_mark *mark,
@@ -537,20 +527,13 @@ int fsnotify_add_mark_locked(struct fsnotify_mark *mark,
 
 	return ret;
 err:
-	mark->flags &= ~FSNOTIFY_MARK_FLAG_ALIVE;
+	mark->flags &= ~(FSNOTIFY_MARK_FLAG_ALIVE |
+			 FSNOTIFY_MARK_FLAG_ATTACHED);
 	list_del_init(&mark->g_list);
-	fsnotify_put_group(group);
-	mark->group = NULL;
 	atomic_dec(&group->num_marks);
-
 	spin_unlock(&mark->lock);
 
-	spin_lock(&destroy_lock);
-	list_add(&mark->g_list, &destroy_list);
-	spin_unlock(&destroy_lock);
-	queue_delayed_work(system_unbound_wq, &reaper_work,
-				FSNOTIFY_REAPER_DELAY);
-
+	fsnotify_put_mark(mark);
 	return ret;
 }
 
@@ -724,7 +707,7 @@ static void fsnotify_mark_destroy_workfn(struct work_struct *work)
 
 	list_for_each_entry_safe(mark, next, &private_destroy_list, g_list) {
 		list_del_init(&mark->g_list);
-		fsnotify_put_mark(mark);
+		fsnotify_final_mark_destroy(mark);
 	}
 }
 
-- 
2.10.2



* [PATCH 13/22] fsnotify: Provide framework for dropping SRCU lock in ->handle_event
  2016-12-22  9:15 [PATCH 0/22] fsnotify: Avoid SRCU stalls with fanotify permission events Jan Kara
                   ` (11 preceding siblings ...)
  2016-12-22  9:15 ` [PATCH 12/22] fsnotify: Move queueing of mark for destruction into fsnotify_put_mark() Jan Kara
@ 2016-12-22  9:15 ` Jan Kara
  2016-12-26 15:01   ` Amir Goldstein
                     ` (2 more replies)
  2016-12-22  9:15 ` [PATCH 14/22] fsnotify: Pass SRCU index into handle_event handler Jan Kara
                   ` (9 subsequent siblings)
  22 siblings, 3 replies; 80+ messages in thread
From: Jan Kara @ 2016-12-22  9:15 UTC (permalink / raw)
  To: linux-fsdevel
  Cc: Amir Goldstein, Lino Sanfilippo, Miklos Szeredi, Paul Moore, Jan Kara

fanotify wants to drop fsnotify_mark_srcu lock when waiting for response
from userspace so that the whole notification subsystem is not blocked
during that time. This patch provides a framework for safely getting
mark reference for a mark found in the object list which pins the mark
in that list. We can then drop fsnotify_mark_srcu, wait for userspace
response and then safely continue iteration of the object list once we
reacquire fsnotify_mark_srcu.
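
The pinning step can be sketched in userspace roughly as below. This is
a simplified model, not the kernel code: the demo_* names are
hypothetical, SRCU itself and the wake-up on group shutdown are left
out, and C11 atomics stand in for the kernel's atomic_t helpers. It
shows the "increment unless zero" acquisition and the rule that either
both marks get pinned or neither does.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

struct demo_group {
	atomic_int user_waits;	/* tasks currently waiting for userspace */
};

struct demo_mark {
	atomic_int refcnt;	/* 0 means the mark is already being torn down */
};

/* Take a reference only if the mark is not already on its way out. */
static bool demo_get_mark_safe(struct demo_mark *mark)
{
	int old = atomic_load(&mark->refcnt);

	while (old != 0) {
		if (atomic_compare_exchange_weak(&mark->refcnt, &old, old + 1))
			return true;
	}
	return false;
}

/* Pin both marks (either may be NULL) or none; false means do not wait. */
static bool demo_prepare_wait(struct demo_group *group,
			      struct demo_mark *inode_mark,
			      struct demo_mark *mnt_mark)
{
	atomic_fetch_add(&group->user_waits, 1);
	if (inode_mark && !demo_get_mark_safe(inode_mark))
		goto out_wait;
	if (mnt_mark && !demo_get_mark_safe(mnt_mark))
		goto out_inode;
	/* The kernel would drop the SRCU read lock here. */
	return true;
out_inode:
	if (inode_mark)
		atomic_fetch_sub(&inode_mark->refcnt, 1);
out_wait:
	atomic_fetch_sub(&group->user_waits, 1);
	return false;
}

/* Undo a successful demo_prepare_wait() once userspace has answered. */
static void demo_finish_wait(struct demo_group *group,
			     struct demo_mark *inode_mark,
			     struct demo_mark *mnt_mark)
{
	assert(inode_mark || mnt_mark);
	/* The kernel would retake the SRCU read lock here. */
	if (inode_mark)
		atomic_fetch_sub(&inode_mark->refcnt, 1);
	if (mnt_mark)
		atomic_fetch_sub(&mnt_mark->refcnt, 1);
	atomic_fetch_sub(&group->user_waits, 1);
}
```

A caller would wait for the userspace response only when
demo_prepare_wait() returned true, and call demo_finish_wait() before
resuming iteration of the list.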

Signed-off-by: Jan Kara <jack@suse.cz>
---
 fs/notify/group.c                |  1 +
 fs/notify/mark.c                 | 87 ++++++++++++++++++++++++++++++++++++++++
 include/linux/fsnotify_backend.h |  8 ++++
 3 files changed, 96 insertions(+)

diff --git a/fs/notify/group.c b/fs/notify/group.c
index 0fb4aadcc19f..79439cdf16e0 100644
--- a/fs/notify/group.c
+++ b/fs/notify/group.c
@@ -126,6 +126,7 @@ struct fsnotify_group *fsnotify_alloc_group(const struct fsnotify_ops *ops)
 	/* set to 0 when there a no external references to this group */
 	atomic_set(&group->refcnt, 1);
 	atomic_set(&group->num_marks, 0);
+	atomic_set(&group->user_waits, 0);
 
 	spin_lock_init(&group->notification_lock);
 	INIT_LIST_HEAD(&group->notification_list);
diff --git a/fs/notify/mark.c b/fs/notify/mark.c
index fee4255e9227..c5c1dcc8fa00 100644
--- a/fs/notify/mark.c
+++ b/fs/notify/mark.c
@@ -109,6 +109,16 @@ void fsnotify_get_mark(struct fsnotify_mark *mark)
 	atomic_inc(&mark->refcnt);
 }
 
+/*
+ * Get mark reference when we found the mark via lockless traversal of object
+ * list. Mark can be already removed from the list by now and on its way to be
+ * destroyed once SRCU period ends.
+ */
+static bool fsnotify_get_mark_safe(struct fsnotify_mark *mark)
+{
+	return atomic_inc_not_zero(&mark->refcnt);
+}
+
 static void __fsnotify_recalc_mask(struct fsnotify_mark_list *list)
 {
 	u32 new_mask = 0;
@@ -248,6 +258,77 @@ void fsnotify_put_mark(struct fsnotify_mark *mark)
 	}
 }
 
+bool fsnotify_prepare_user_wait(struct fsnotify_mark *inode_mark,
+				struct fsnotify_mark *vfsmount_mark,
+				int *srcu_idx)
+{
+	struct fsnotify_group *group;
+
+	if (WARN_ON_ONCE(!inode_mark && !vfsmount_mark))
+		return false;
+
+	if (inode_mark)
+		group = inode_mark->group;
+	else
+		group = vfsmount_mark->group;
+
+	/*
+	 * Since acquisition of mark reference is an atomic op as well, we can
+	 * be sure this inc is seen before any effect of refcount increment.
+	 */
+	atomic_inc(&group->user_waits);
+
+	if (inode_mark) {
+		/* This can fail if mark is being removed */
+		if (!fsnotify_get_mark_safe(inode_mark))
+			goto out_wait;
+	}
+	if (vfsmount_mark) {
+		if (!fsnotify_get_mark_safe(vfsmount_mark))
+			goto out_inode;
+	}
+
+	/*
+	 * Now that both marks are pinned by refcount we can drop SRCU lock.
+	 * Marks can still be removed from the list but because of refcount
+	 * they cannot be destroyed and we can safely resume the list iteration
+	 * once userspace returns.
+	 */
+	srcu_read_unlock(&fsnotify_mark_srcu, *srcu_idx);
+
+	return true;
+out_inode:
+	if (inode_mark)
+		fsnotify_put_mark(inode_mark);
+out_wait:
+	if (atomic_dec_and_test(&group->user_waits) && group->shutdown)
+		wake_up(&group->notification_waitq);
+	return false;
+}
+
+void fsnotify_finish_user_wait(struct fsnotify_mark *inode_mark,
+			       struct fsnotify_mark *vfsmount_mark,
+			       int *srcu_idx)
+{
+	struct fsnotify_group *group = NULL;
+
+	*srcu_idx = srcu_read_lock(&fsnotify_mark_srcu);
+	if (inode_mark) {
+		group = inode_mark->group;
+		fsnotify_put_mark(inode_mark);
+	}
+	if (vfsmount_mark) {
+		group = vfsmount_mark->group;
+		fsnotify_put_mark(vfsmount_mark);
+	}
+	/*
+	 * We abuse notification_waitq on group shutdown for waiting for all
+	 * marks pinned when waiting for userspace.
+	 */
+	if (atomic_dec_and_test(&group->user_waits) && group->shutdown)
+		wake_up(&group->notification_waitq);
+}
+
 /*
  * Mark mark as dead, remove it from group list. Mark still stays in object
  * list until its last reference is dropped.  Note that we rely on mark being
@@ -636,6 +717,12 @@ void fsnotify_detach_group_marks(struct fsnotify_group *group)
 		fsnotify_free_mark(mark);
 		fsnotify_put_mark(mark);
 	}
+	/*
+	 * Some marks can still be pinned when waiting for response from
+	 * userspace. Wait for those now. fsnotify_prepare_user_wait() will
+	 * not succeed now so this wait is race-free.
+	 */
+	wait_event(group->notification_waitq, !atomic_read(&group->user_waits));
 }
 
 /* Destroy all marks attached to inode / vfsmount */
diff --git a/include/linux/fsnotify_backend.h b/include/linux/fsnotify_backend.h
index 76b3c34172c7..27223e254e00 100644
--- a/include/linux/fsnotify_backend.h
+++ b/include/linux/fsnotify_backend.h
@@ -162,6 +162,8 @@ struct fsnotify_group {
 	struct fsnotify_event *overflow_event;	/* Event we queue when the
 						 * notification list is too
 						 * full */
+	atomic_t user_waits;		/* Number of tasks waiting for user
+					 * response */
 
 	/* groups can define private fields here or use the void *private */
 	union {
@@ -367,6 +369,12 @@ extern void fsnotify_clear_marks_by_group_flags(struct fsnotify_group *group, un
 extern void fsnotify_get_mark(struct fsnotify_mark *mark);
 extern void fsnotify_put_mark(struct fsnotify_mark *mark);
 extern void fsnotify_unmount_inodes(struct super_block *sb);
+extern void fsnotify_finish_user_wait(struct fsnotify_mark *inode_mark,
+				      struct fsnotify_mark *vfsmount_mark,
+				      int *srcu_idx);
+extern bool fsnotify_prepare_user_wait(struct fsnotify_mark *inode_mark,
+				       struct fsnotify_mark *vfsmount_mark,
+				       int *srcu_idx);
 
 /* put here because inotify does some weird stuff when destroying watches */
 extern void fsnotify_init_event(struct fsnotify_event *event,
-- 
2.10.2



* [PATCH 14/22] fsnotify: Pass SRCU index into handle_event handler
  2016-12-22  9:15 [PATCH 0/22] fsnotify: Avoid SRCU stalls with fanotify permission events Jan Kara
                   ` (12 preceding siblings ...)
  2016-12-22  9:15 ` [PATCH 13/22] fsnotify: Provide framework for dropping SRCU lock in ->handle_event Jan Kara
@ 2016-12-22  9:15 ` Jan Kara
  2016-12-26 15:13   ` Amir Goldstein
  2016-12-22  9:15 ` [PATCH 15/22] fanotify: Release SRCU lock when waiting for userspace response Jan Kara
                   ` (8 subsequent siblings)
  22 siblings, 1 reply; 80+ messages in thread
From: Jan Kara @ 2016-12-22  9:15 UTC (permalink / raw)
  To: linux-fsdevel
  Cc: Amir Goldstein, Lino Sanfilippo, Miklos Szeredi, Paul Moore, Jan Kara

Pass the index acquired from srcu_read_lock() into the ->handle_event()
handler so that it can release and reacquire the SRCU lock via the
fsnotify_prepare_user_wait() and fsnotify_finish_user_wait() functions.
These functions also make sure the current marks are appropriately pinned
so that the iteration protected by SRCU in fsnotify() stays safe.

Signed-off-by: Jan Kara <jack@suse.cz>
---
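A minimal userspace sketch of why the index is passed by pointer rather
than by value, assuming a toy two-counter stand-in for SRCU. All toy_*
names are hypothetical and are not kernel API; real SRCU is considerably
more involved. The point shown is only that a handler which drops and
retakes the read lock may get a different index back, and the caller must
unlock with that one. Handlers that never wait can ignore the argument
(inotify passes NULL for its internally generated IN_IGNORED event).

```c
/* Toy stand-in for an SRCU domain: one reader counter per epoch parity. */
struct toy_srcu {
	int readers[2];
	int epoch;	/* parity selects the counter new readers use */
};

static int toy_read_lock(struct toy_srcu *s)
{
	int idx = s->epoch & 1;

	s->readers[idx]++;
	return idx;
}

static void toy_read_unlock(struct toy_srcu *s, int idx)
{
	s->readers[idx]--;
}

/* Modelled on ->handle_event(): the last argument is the lock index. */
typedef int (*toy_handler_t)(struct toy_srcu *s, int *idx);

/* A handler that never sleeps simply ignores the index. */
static int toy_simple_handler(struct toy_srcu *s, int *idx)
{
	(void)s;
	(void)idx;
	return 0;
}

/* A handler that drops the read lock while it "waits", then retakes it. */
static int toy_blocking_handler(struct toy_srcu *s, int *idx)
{
	toy_read_unlock(s, *idx);
	s->epoch++;			/* pretend the epoch flipped meanwhile */
	*idx = toy_read_lock(s);	/* may differ from the old index */
	return 0;
}

/* Modelled on fsnotify(): owns the read-side critical section. */
static int toy_notify(struct toy_srcu *s, toy_handler_t handler)
{
	int idx = toy_read_lock(s);
	int ret = handler(s, &idx);

	toy_read_unlock(s, idx);	/* uses the possibly updated index */
	return ret;
}
```

Had toy_notify() passed idx by value, the unlock after the blocking
handler would decrement the wrong counter and leave the other one
permanently elevated.
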
 fs/notify/dnotify/dnotify.c          | 3 ++-
 fs/notify/fanotify/fanotify.c        | 3 ++-
 fs/notify/fsnotify.c                 | 7 ++++---
 fs/notify/inotify/inotify.h          | 3 ++-
 fs/notify/inotify/inotify_fsnotify.c | 3 ++-
 fs/notify/inotify/inotify_user.c     | 2 +-
 include/linux/fsnotify_backend.h     | 3 ++-
 kernel/audit_fsnotify.c              | 3 ++-
 kernel/audit_tree.c                  | 3 ++-
 kernel/audit_watch.c                 | 3 ++-
 10 files changed, 21 insertions(+), 12 deletions(-)

diff --git a/fs/notify/dnotify/dnotify.c b/fs/notify/dnotify/dnotify.c
index 8633d9f93fbd..b73bb082a2dc 100644
--- a/fs/notify/dnotify/dnotify.c
+++ b/fs/notify/dnotify/dnotify.c
@@ -85,7 +85,8 @@ static int dnotify_handle_event(struct fsnotify_group *group,
 				struct fsnotify_mark *inode_mark,
 				struct fsnotify_mark *vfsmount_mark,
 				u32 mask, const void *data, int data_type,
-				const unsigned char *file_name, u32 cookie)
+				const unsigned char *file_name, u32 cookie,
+				int *srcu_idx)
 {
 	struct dnotify_mark *dn_mark;
 	struct dnotify_struct *dn;
diff --git a/fs/notify/fanotify/fanotify.c b/fs/notify/fanotify/fanotify.c
index bbc175d4213d..2e8ca885fb3e 100644
--- a/fs/notify/fanotify/fanotify.c
+++ b/fs/notify/fanotify/fanotify.c
@@ -178,7 +178,8 @@ static int fanotify_handle_event(struct fsnotify_group *group,
 				 struct fsnotify_mark *inode_mark,
 				 struct fsnotify_mark *fanotify_mark,
 				 u32 mask, const void *data, int data_type,
-				 const unsigned char *file_name, u32 cookie)
+				 const unsigned char *file_name, u32 cookie,
+				 int *srcu_idx)
 {
 	int ret = 0;
 	struct fanotify_event_info *event;
diff --git a/fs/notify/fsnotify.c b/fs/notify/fsnotify.c
index 755502ec1eb6..7c1ebc934a08 100644
--- a/fs/notify/fsnotify.c
+++ b/fs/notify/fsnotify.c
@@ -127,7 +127,8 @@ static int send_to_group(struct inode *to_tell,
 			 struct fsnotify_mark *vfsmount_mark,
 			 __u32 mask, const void *data,
 			 int data_is, u32 cookie,
-			 const unsigned char *file_name)
+			 const unsigned char *file_name,
+			 int *srcu_idx)
 {
 	struct fsnotify_group *group = NULL;
 	__u32 inode_test_mask = 0;
@@ -178,7 +179,7 @@ static int send_to_group(struct inode *to_tell,
 
 	return group->ops->handle_event(group, to_tell, inode_mark,
 					vfsmount_mark, mask, data, data_is,
-					file_name, cookie);
+					file_name, cookie, srcu_idx);
 }
 
 /*
@@ -285,7 +286,7 @@ int fsnotify(struct inode *to_tell, __u32 mask, const void *data, int data_is,
 			}
 		}
 		ret = send_to_group(to_tell, inode_mark, vfsmount_mark, mask,
-				    data, data_is, cookie, file_name);
+				    data, data_is, cookie, file_name, &idx);
 
 		if (ret && (mask & ALL_FSNOTIFY_PERM_EVENTS))
 			goto out;
diff --git a/fs/notify/inotify/inotify.h b/fs/notify/inotify/inotify.h
index a6f5907a3fee..507840969edc 100644
--- a/fs/notify/inotify/inotify.h
+++ b/fs/notify/inotify/inotify.h
@@ -27,6 +27,7 @@ extern int inotify_handle_event(struct fsnotify_group *group,
 				struct fsnotify_mark *inode_mark,
 				struct fsnotify_mark *vfsmount_mark,
 				u32 mask, const void *data, int data_type,
-				const unsigned char *file_name, u32 cookie);
+				const unsigned char *file_name, u32 cookie,
+				int *srcu_idx);
 
 extern const struct fsnotify_ops inotify_fsnotify_ops;
diff --git a/fs/notify/inotify/inotify_fsnotify.c b/fs/notify/inotify/inotify_fsnotify.c
index 8421f44b3cb3..35f2f39514b8 100644
--- a/fs/notify/inotify/inotify_fsnotify.c
+++ b/fs/notify/inotify/inotify_fsnotify.c
@@ -67,7 +67,8 @@ int inotify_handle_event(struct fsnotify_group *group,
 			 struct fsnotify_mark *inode_mark,
 			 struct fsnotify_mark *vfsmount_mark,
 			 u32 mask, const void *data, int data_type,
-			 const unsigned char *file_name, u32 cookie)
+			 const unsigned char *file_name, u32 cookie,
+			 int *srcu_idx)
 {
 	struct inotify_inode_mark *i_mark;
 	struct inotify_event_info *event;
diff --git a/fs/notify/inotify/inotify_user.c b/fs/notify/inotify/inotify_user.c
index 06dae605158d..9fe125f58113 100644
--- a/fs/notify/inotify/inotify_user.c
+++ b/fs/notify/inotify/inotify_user.c
@@ -477,7 +477,7 @@ void inotify_ignored_and_remove_idr(struct fsnotify_mark *fsn_mark,
 
 	/* Queue ignore event for the watch */
 	inotify_handle_event(group, NULL, fsn_mark, NULL, FS_IN_IGNORED,
-			     NULL, FSNOTIFY_EVENT_NONE, NULL, 0);
+			     NULL, FSNOTIFY_EVENT_NONE, NULL, 0, NULL);
 
 	i_mark = container_of(fsn_mark, struct inotify_inode_mark, fsn_mark);
 	/* remove this mark from the idr */
diff --git a/include/linux/fsnotify_backend.h b/include/linux/fsnotify_backend.h
index 27223e254e00..79280a65b28a 100644
--- a/include/linux/fsnotify_backend.h
+++ b/include/linux/fsnotify_backend.h
@@ -97,7 +97,8 @@ struct fsnotify_ops {
 			    struct fsnotify_mark *inode_mark,
 			    struct fsnotify_mark *vfsmount_mark,
 			    u32 mask, const void *data, int data_type,
-			    const unsigned char *file_name, u32 cookie);
+			    const unsigned char *file_name, u32 cookie,
+			    int *srcu_idx);
 	void (*free_group_priv)(struct fsnotify_group *group);
 	void (*freeing_mark)(struct fsnotify_mark *mark, struct fsnotify_group *group);
 	void (*free_event)(struct fsnotify_event *event);
diff --git a/kernel/audit_fsnotify.c b/kernel/audit_fsnotify.c
index 7ea57e516029..ae8599c7aa81 100644
--- a/kernel/audit_fsnotify.c
+++ b/kernel/audit_fsnotify.c
@@ -168,7 +168,8 @@ static int audit_mark_handle_event(struct fsnotify_group *group,
 				    struct fsnotify_mark *inode_mark,
 				    struct fsnotify_mark *vfsmount_mark,
 				    u32 mask, const void *data, int data_type,
-				    const unsigned char *dname, u32 cookie)
+				    const unsigned char *dname, u32 cookie,
+				    int *srcu_idx)
 {
 	struct audit_fsnotify_mark *audit_mark;
 	const struct inode *inode = NULL;
diff --git a/kernel/audit_tree.c b/kernel/audit_tree.c
index b95a97bf2fd3..ff6c490004c1 100644
--- a/kernel/audit_tree.c
+++ b/kernel/audit_tree.c
@@ -973,7 +973,8 @@ static int audit_tree_handle_event(struct fsnotify_group *group,
 				   struct fsnotify_mark *inode_mark,
 				   struct fsnotify_mark *vfsmount_mark,
 				   u32 mask, const void *data, int data_type,
-				   const unsigned char *file_name, u32 cookie)
+				   const unsigned char *file_name, u32 cookie,
+				   int *srcu_idx)
 {
 	return 0;
 }
diff --git a/kernel/audit_watch.c b/kernel/audit_watch.c
index f79e4658433d..4c06e961e4ba 100644
--- a/kernel/audit_watch.c
+++ b/kernel/audit_watch.c
@@ -472,7 +472,8 @@ static int audit_watch_handle_event(struct fsnotify_group *group,
 				    struct fsnotify_mark *inode_mark,
 				    struct fsnotify_mark *vfsmount_mark,
 				    u32 mask, const void *data, int data_type,
-				    const unsigned char *dname, u32 cookie)
+				    const unsigned char *dname, u32 cookie,
+				    int *srcu_idx)
 {
 	const struct inode *inode;
 	struct audit_parent *parent;
-- 
2.10.2



* [PATCH 15/22] fanotify: Release SRCU lock when waiting for userspace response
  2016-12-22  9:15 [PATCH 0/22] fsnotify: Avoid SRCU stalls with fanotify permission events Jan Kara
                   ` (13 preceding siblings ...)
  2016-12-22  9:15 ` [PATCH 14/22] fsnotify: Pass SRCU index into handle_event handler Jan Kara
@ 2016-12-22  9:15 ` Jan Kara
  2016-12-26 15:22   ` Amir Goldstein
  2016-12-22  9:15 ` [PATCH 16/22] fsnotify: Remove fsnotify_set_mark_{,ignored_}mask_locked() Jan Kara
                   ` (7 subsequent siblings)
  22 siblings, 1 reply; 80+ messages in thread
From: Jan Kara @ 2016-12-22  9:15 UTC (permalink / raw)
  To: linux-fsdevel
  Cc: Amir Goldstein, Lino Sanfilippo, Miklos Szeredi, Paul Moore, Jan Kara

When a userspace task processing fanotify permission events screws up and
does not respond, the fsnotify_mark_srcu read lock is held indefinitely,
which causes further hangs in the whole notification subsystem. Although
we cannot easily solve the problem of operations blocked waiting for a
response from userspace, we can at least somewhat localize the damage by
dropping the SRCU lock before waiting for the userspace response and
reacquiring it when userspace responds.

Signed-off-by: Jan Kara <jack@suse.cz>
---
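A minimal userspace sketch of the pin-then-wait pattern described above,
written with C11 atomics. All toy_* names are hypothetical; the reference
counting only mirrors the idea of atomic_inc_not_zero(), and the actual
lock drop and sleep are elided and marked by a comment. It illustrates
that a failed pin (the mark is already on its way out) is treated as
"allow" rather than as an error.

```c
#include <stdatomic.h>
#include <stdbool.h>

struct toy_mark {
	atomic_int refcnt;	/* 0 means the mark is being destroyed */
};

struct toy_group {
	atomic_int user_waits;	/* tasks currently waiting for userspace */
};

/* Userspace analogue of atomic_inc_not_zero(): take a ref unless it hit 0. */
static bool toy_get_mark_safe(struct toy_mark *mark)
{
	int old = atomic_load(&mark->refcnt);

	while (old != 0) {
		/* On failure, old is reloaded with the current value. */
		if (atomic_compare_exchange_weak(&mark->refcnt, &old, old + 1))
			return true;
	}
	return false;
}

static void toy_put_mark(struct toy_mark *mark)
{
	atomic_fetch_sub(&mark->refcnt, 1);
}

/* Account for the waiter and pin the mark; the read lock may be dropped after. */
static bool toy_prepare_user_wait(struct toy_group *group, struct toy_mark *mark)
{
	atomic_fetch_add(&group->user_waits, 1);
	if (!toy_get_mark_safe(mark)) {
		atomic_fetch_sub(&group->user_waits, 1);
		return false;
	}
	return true;
}

static void toy_finish_user_wait(struct toy_group *group, struct toy_mark *mark)
{
	toy_put_mark(mark);
	atomic_fetch_sub(&group->user_waits, 1);
}

enum toy_response { TOY_ALLOW = 1, TOY_DENY = 2 };

/* user_verdict stands in for the answer that would arrive from userspace. */
static enum toy_response toy_get_response(struct toy_group *group,
					  struct toy_mark *mark,
					  enum toy_response user_verdict)
{
	if (!toy_prepare_user_wait(group, mark))
		return TOY_ALLOW;	/* raced with mark removal: let it pass */
	/* ... read lock dropped, wait for userspace, read lock retaken ... */
	toy_finish_user_wait(group, mark);
	return user_verdict;
}
```

In this sketch the waiter count returns to zero on both paths, which is
the condition group teardown would wait for before freeing anything.
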
 fs/notify/fanotify/fanotify.c | 17 +++++++++++++++--
 1 file changed, 15 insertions(+), 2 deletions(-)

diff --git a/fs/notify/fanotify/fanotify.c b/fs/notify/fanotify/fanotify.c
index 2e8ca885fb3e..284d2d112ad2 100644
--- a/fs/notify/fanotify/fanotify.c
+++ b/fs/notify/fanotify/fanotify.c
@@ -61,7 +61,10 @@ static int fanotify_merge(struct list_head *list, struct fsnotify_event *event)
 
 #ifdef CONFIG_FANOTIFY_ACCESS_PERMISSIONS
 static int fanotify_get_response(struct fsnotify_group *group,
-				 struct fanotify_perm_event_info *event)
+				 struct fsnotify_mark *inode_mark,
+				 struct fsnotify_mark *vfsmount_mark,
+				 struct fanotify_perm_event_info *event,
+				 int *srcu_idx)
 {
 	int ret;
 
@@ -69,6 +72,15 @@ static int fanotify_get_response(struct fsnotify_group *group,
 
+	/* This can fail if a mark is being removed; just allow the access. */
+	if (!fsnotify_prepare_user_wait(inode_mark, vfsmount_mark, srcu_idx)) {
+		event->response = FAN_ALLOW;
+		goto out;
+	}
+
 	wait_event(group->fanotify_data.access_waitq, event->response);
 
+	fsnotify_finish_user_wait(inode_mark, vfsmount_mark, srcu_idx);
+
+out:
 	/* userspace responded, convert to something usable */
 	switch (event->response) {
 	case FAN_ALLOW:
@@ -220,7 +232,8 @@ static int fanotify_handle_event(struct fsnotify_group *group,
 
 #ifdef CONFIG_FANOTIFY_ACCESS_PERMISSIONS
 	if (mask & FAN_ALL_PERM_EVENTS) {
-		ret = fanotify_get_response(group, FANOTIFY_PE(fsn_event));
+		ret = fanotify_get_response(group, inode_mark, fanotify_mark,
+					    FANOTIFY_PE(fsn_event), srcu_idx);
 		fsnotify_destroy_event(group, fsn_event);
 	}
 #endif
-- 
2.10.2



* [PATCH 16/22] fsnotify: Remove fsnotify_set_mark_{,ignored_}mask_locked()
  2016-12-22  9:15 [PATCH 0/22] fsnotify: Avoid SRCU stalls with fanotify permission events Jan Kara
                   ` (14 preceding siblings ...)
  2016-12-22  9:15 ` [PATCH 15/22] fanotify: Release SRCU lock when waiting for userspace response Jan Kara
@ 2016-12-22  9:15 ` Jan Kara
  2016-12-26 16:42   ` Amir Goldstein
  2016-12-22  9:15 ` [PATCH 17/22] fsnotify: Remove fsnotify_recalc_{inode|vfsmount}_mask() Jan Kara
                   ` (6 subsequent siblings)
  22 siblings, 1 reply; 80+ messages in thread
From: Jan Kara @ 2016-12-22  9:15 UTC (permalink / raw)
  To: linux-fsdevel
  Cc: Amir Goldstein, Lino Sanfilippo, Miklos Szeredi, Paul Moore, Jan Kara

These helpers now perform only a simple assignment and just obfuscate
what is going on. Remove them.

Signed-off-by: Jan Kara <jack@suse.cz>
---
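A minimal userspace sketch of the shape dnotify_recalc_inode_mask() takes
once the helper is gone: recompute the mask from the watch list, bail out
if nothing changed, otherwise assign directly. The toy_* types and the
TOY_MULTISHOT bit are hypothetical, and the locking that the real code
asserts is omitted here.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_MULTISHOT 0x80000000u	/* hypothetical per-watch flag bit */

struct toy_watch {
	uint32_t mask;
	struct toy_watch *next;
};

struct toy_mark {
	uint32_t mask;
	struct toy_watch *watches;
};

/* Returns true when the mask changed and the object mask needs a recalc. */
static bool toy_recalc_mark_mask(struct toy_mark *mark)
{
	uint32_t new_mask = 0;
	struct toy_watch *w;

	/* Union of all watch masks, with the per-watch flag stripped. */
	for (w = mark->watches; w != NULL; w = w->next)
		new_mask |= w->mask & ~TOY_MULTISHOT;
	if (mark->mask == new_mask)
		return false;
	mark->mask = new_mask;	/* plain assignment, no setter helper */
	return true;
}
```
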
 fs/notify/dnotify/dnotify.c        |  9 +++------
 fs/notify/fanotify/fanotify_user.c |  9 ++++-----
 fs/notify/inotify/inotify_user.c   |  6 ++----
 fs/notify/mark.c                   | 14 --------------
 include/linux/fsnotify_backend.h   |  4 ----
 5 files changed, 9 insertions(+), 33 deletions(-)

diff --git a/fs/notify/dnotify/dnotify.c b/fs/notify/dnotify/dnotify.c
index b73bb082a2dc..98463f40f43c 100644
--- a/fs/notify/dnotify/dnotify.c
+++ b/fs/notify/dnotify/dnotify.c
@@ -52,7 +52,7 @@ struct dnotify_mark {
  */
 static void dnotify_recalc_inode_mask(struct fsnotify_mark *fsn_mark)
 {
-	__u32 new_mask, old_mask;
+	__u32 new_mask = 0;
 	struct dnotify_struct *dn;
 	struct dnotify_mark *dn_mark  = container_of(fsn_mark,
 						     struct dnotify_mark,
@@ -60,14 +60,11 @@ static void dnotify_recalc_inode_mask(struct fsnotify_mark *fsn_mark)
 
 	assert_spin_locked(&fsn_mark->lock);
 
-	old_mask = fsn_mark->mask;
-	new_mask = 0;
 	for (dn = dn_mark->dn; dn != NULL; dn = dn->dn_next)
 		new_mask |= (dn->dn_mask & ~FS_DN_MULTISHOT);
-	fsnotify_set_mark_mask_locked(fsn_mark, new_mask);
-
-	if (old_mask == new_mask)
+	if (fsn_mark->mask == new_mask)
 		return;
+	fsn_mark->mask = new_mask;
 
 	fsnotify_recalc_mask(fsn_mark->obj_list_head);
 }
diff --git a/fs/notify/fanotify/fanotify_user.c b/fs/notify/fanotify/fanotify_user.c
index 8dcec9eecafd..14a1036eee06 100644
--- a/fs/notify/fanotify/fanotify_user.c
+++ b/fs/notify/fanotify/fanotify_user.c
@@ -510,13 +510,12 @@ static __u32 fanotify_mark_remove_from_mask(struct fsnotify_mark *fsn_mark,
 			tmask &= ~FAN_ONDIR;
 
 		oldmask = fsn_mark->mask;
-		fsnotify_set_mark_mask_locked(fsn_mark, tmask);
+		fsn_mark->mask = tmask;
 	} else {
 		__u32 tmask = fsn_mark->ignored_mask & ~mask;
 		if (flags & FAN_MARK_ONDIR)
 			tmask &= ~FAN_ONDIR;
-
-		fsnotify_set_mark_ignored_mask_locked(fsn_mark, tmask);
+		fsn_mark->ignored_mask = tmask;
 	}
 	*destroy = !(fsn_mark->mask | fsn_mark->ignored_mask);
 	spin_unlock(&fsn_mark->lock);
@@ -598,13 +597,13 @@ static __u32 fanotify_mark_add_to_mask(struct fsnotify_mark *fsn_mark,
 			tmask |= FAN_ONDIR;
 
 		oldmask = fsn_mark->mask;
-		fsnotify_set_mark_mask_locked(fsn_mark, tmask);
+		fsn_mark->mask = tmask;
 	} else {
 		__u32 tmask = fsn_mark->ignored_mask | mask;
 		if (flags & FAN_MARK_ONDIR)
 			tmask |= FAN_ONDIR;
 
-		fsnotify_set_mark_ignored_mask_locked(fsn_mark, tmask);
+		fsn_mark->ignored_mask = tmask;
 		if (flags & FAN_MARK_IGNORED_SURV_MODIFY)
 			fsn_mark->flags |= FSNOTIFY_MARK_FLAG_IGNORED_SURV_MODIFY;
 	}
diff --git a/fs/notify/inotify/inotify_user.c b/fs/notify/inotify/inotify_user.c
index 9fe125f58113..ee12b8f01c01 100644
--- a/fs/notify/inotify/inotify_user.c
+++ b/fs/notify/inotify/inotify_user.c
@@ -516,14 +516,12 @@ static int inotify_update_existing_watch(struct fsnotify_group *group,
 	i_mark = container_of(fsn_mark, struct inotify_inode_mark, fsn_mark);
 
 	spin_lock(&fsn_mark->lock);
-
 	old_mask = fsn_mark->mask;
 	if (add)
-		fsnotify_set_mark_mask_locked(fsn_mark, (fsn_mark->mask | mask));
+		fsn_mark->mask |= mask;
 	else
-		fsnotify_set_mark_mask_locked(fsn_mark, mask);
+		fsn_mark->mask = mask;
 	new_mask = fsn_mark->mask;
-
 	spin_unlock(&fsn_mark->lock);
 
 	if (old_mask != new_mask) {
diff --git a/fs/notify/mark.c b/fs/notify/mark.c
index c5c1dcc8fa00..b4a2f6b237cd 100644
--- a/fs/notify/mark.c
+++ b/fs/notify/mark.c
@@ -403,20 +403,6 @@ void fsnotify_destroy_mark(struct fsnotify_mark *mark,
 	fsnotify_free_mark(mark);
 }
 
-void fsnotify_set_mark_mask_locked(struct fsnotify_mark *mark, __u32 mask)
-{
-	assert_spin_locked(&mark->lock);
-
-	mark->mask = mask;
-}
-
-void fsnotify_set_mark_ignored_mask_locked(struct fsnotify_mark *mark, __u32 mask)
-{
-	assert_spin_locked(&mark->lock);
-
-	mark->ignored_mask = mask;
-}
-
 /*
  * Sorting function for lists of fsnotify marks.
  *
diff --git a/include/linux/fsnotify_backend.h b/include/linux/fsnotify_backend.h
index 79280a65b28a..b32bed260afc 100644
--- a/include/linux/fsnotify_backend.h
+++ b/include/linux/fsnotify_backend.h
@@ -345,10 +345,6 @@ extern void fsnotify_init_mark(struct fsnotify_mark *mark, void (*free_mark)(str
 extern struct fsnotify_mark *fsnotify_find_inode_mark(struct fsnotify_group *group, struct inode *inode);
 /* find (and take a reference) to a mark associated with group and vfsmount */
 extern struct fsnotify_mark *fsnotify_find_vfsmount_mark(struct fsnotify_group *group, struct vfsmount *mnt);
-/* set the ignored_mask of a mark */
-extern void fsnotify_set_mark_ignored_mask_locked(struct fsnotify_mark *mark, __u32 mask);
-/* set the mask of a mark (might pin the object into memory */
-extern void fsnotify_set_mark_mask_locked(struct fsnotify_mark *mark, __u32 mask);
 /* attach the mark to both the group and the inode */
 extern int fsnotify_add_mark(struct fsnotify_mark *mark, struct fsnotify_group *group,
 			     struct inode *inode, struct vfsmount *mnt, int allow_dups);
-- 
2.10.2



* [PATCH 17/22] fsnotify: Remove fsnotify_recalc_{inode|vfsmount}_mask()
  2016-12-22  9:15 [PATCH 0/22] fsnotify: Avoid SRCU stalls with fanotify permission events Jan Kara
                   ` (15 preceding siblings ...)
  2016-12-22  9:15 ` [PATCH 16/22] fsnotify: Remove fsnotify_set_mark_{,ignored_}mask_locked() Jan Kara
@ 2016-12-22  9:15 ` Jan Kara
  2016-12-26 16:44   ` Amir Goldstein
  2016-12-22  9:15 ` [PATCH 18/22] fsnotify: Inline fsnotify_clear_{inode|vfsmount}_marks_by_group() Jan Kara
                   ` (5 subsequent siblings)
  22 siblings, 1 reply; 80+ messages in thread
From: Jan Kara @ 2016-12-22  9:15 UTC (permalink / raw)
  To: linux-fsdevel
  Cc: Amir Goldstein, Lino Sanfilippo, Miklos Szeredi, Paul Moore, Jan Kara

These helpers are just very thin wrappers now. Remove them.

Signed-off-by: Jan Kara <jack@suse.cz>
---
 fs/notify/fanotify/fanotify_user.c | 8 ++++----
 fs/notify/inode_mark.c             | 5 -----
 fs/notify/inotify/inotify_user.c   | 2 +-
 fs/notify/vfsmount_mark.c          | 5 -----
 include/linux/fsnotify_backend.h   | 4 ----
 5 files changed, 5 insertions(+), 19 deletions(-)

diff --git a/fs/notify/fanotify/fanotify_user.c b/fs/notify/fanotify/fanotify_user.c
index 14a1036eee06..b68cec253a21 100644
--- a/fs/notify/fanotify/fanotify_user.c
+++ b/fs/notify/fanotify/fanotify_user.c
@@ -541,7 +541,7 @@ static int fanotify_remove_vfsmount_mark(struct fsnotify_group *group,
 	removed = fanotify_mark_remove_from_mask(fsn_mark, mask, flags,
 						 &destroy_mark);
 	if (removed & real_mount(mnt)->mnt_fsnotify_mask)
-		fsnotify_recalc_vfsmount_mask(mnt);
+		fsnotify_recalc_mask(real_mount(mnt)->mnt_fsnotify_marks);
 	if (destroy_mark)
 		fsnotify_detach_mark(fsn_mark);
 	mutex_unlock(&group->mark_mutex);
@@ -570,7 +570,7 @@ static int fanotify_remove_inode_mark(struct fsnotify_group *group,
 	removed = fanotify_mark_remove_from_mask(fsn_mark, mask, flags,
 						 &destroy_mark);
 	if (removed & inode->i_fsnotify_mask)
-		fsnotify_recalc_inode_mask(inode);
+		fsnotify_recalc_mask(inode->i_fsnotify_marks);
 	if (destroy_mark)
 		fsnotify_detach_mark(fsn_mark);
 	mutex_unlock(&group->mark_mutex);
@@ -655,7 +655,7 @@ static int fanotify_add_vfsmount_mark(struct fsnotify_group *group,
 	}
 	added = fanotify_mark_add_to_mask(fsn_mark, mask, flags);
 	if (added & ~real_mount(mnt)->mnt_fsnotify_mask)
-		fsnotify_recalc_vfsmount_mask(mnt);
+		fsnotify_recalc_mask(real_mount(mnt)->mnt_fsnotify_marks);
 	mutex_unlock(&group->mark_mutex);
 
 	fsnotify_put_mark(fsn_mark);
@@ -692,7 +692,7 @@ static int fanotify_add_inode_mark(struct fsnotify_group *group,
 	}
 	added = fanotify_mark_add_to_mask(fsn_mark, mask, flags);
 	if (added & ~inode->i_fsnotify_mask)
-		fsnotify_recalc_inode_mask(inode);
+		fsnotify_recalc_mask(inode->i_fsnotify_marks);
 	mutex_unlock(&group->mark_mutex);
 
 	fsnotify_put_mark(fsn_mark);
diff --git a/fs/notify/inode_mark.c b/fs/notify/inode_mark.c
index 3d309c890670..c6888dab682b 100644
--- a/fs/notify/inode_mark.c
+++ b/fs/notify/inode_mark.c
@@ -30,11 +30,6 @@
 
 #include "../internal.h"
 
-void fsnotify_recalc_inode_mask(struct inode *inode)
-{
-	fsnotify_recalc_mask(inode->i_fsnotify_marks);
-}
-
 /*
  * Given a group clear all of the inode marks associated with that group.
  */
diff --git a/fs/notify/inotify/inotify_user.c b/fs/notify/inotify/inotify_user.c
index ee12b8f01c01..bbcebf6beeb9 100644
--- a/fs/notify/inotify/inotify_user.c
+++ b/fs/notify/inotify/inotify_user.c
@@ -532,7 +532,7 @@ static int inotify_update_existing_watch(struct fsnotify_group *group,
 
 		/* update the inode with this new fsn_mark */
 		if (dropped || do_inode)
-			fsnotify_recalc_inode_mask(inode);
+			fsnotify_recalc_mask(inode->i_fsnotify_marks);
 
 	}
 
diff --git a/fs/notify/vfsmount_mark.c b/fs/notify/vfsmount_mark.c
index c4166d38c0cc..548c33e8f2fa 100644
--- a/fs/notify/vfsmount_mark.c
+++ b/fs/notify/vfsmount_mark.c
@@ -29,11 +29,6 @@
 #include <linux/fsnotify_backend.h>
 #include "fsnotify.h"
 
-void fsnotify_recalc_vfsmount_mask(struct vfsmount *mnt)
-{
-	fsnotify_recalc_mask(real_mount(mnt)->mnt_fsnotify_marks);
-}
-
 void fsnotify_clear_vfsmount_marks_by_group(struct fsnotify_group *group)
 {
 	fsnotify_clear_marks_by_group_flags(group, FSNOTIFY_LIST_TYPE_VFSMOUNT);
diff --git a/include/linux/fsnotify_backend.h b/include/linux/fsnotify_backend.h
index b32bed260afc..5c33ba9bf1ec 100644
--- a/include/linux/fsnotify_backend.h
+++ b/include/linux/fsnotify_backend.h
@@ -336,10 +336,6 @@ extern struct fsnotify_event *fsnotify_remove_first_event(struct fsnotify_group
 
 /* Calculate mask of events for a list of marks */
 extern void fsnotify_recalc_mask(struct fsnotify_mark_list *list);
-/* run all marks associated with a vfsmount and update mnt->mnt_fsnotify_mask */
-extern void fsnotify_recalc_vfsmount_mask(struct vfsmount *mnt);
-/* run all marks associated with an inode and update inode->i_fsnotify_mask */
-extern void fsnotify_recalc_inode_mask(struct inode *inode);
 extern void fsnotify_init_mark(struct fsnotify_mark *mark, void (*free_mark)(struct fsnotify_mark *mark));
 /* find (and take a reference) to a mark associated with group and inode */
 extern struct fsnotify_mark *fsnotify_find_inode_mark(struct fsnotify_group *group, struct inode *inode);
-- 
2.10.2



* [PATCH 18/22] fsnotify: Inline fsnotify_clear_{inode|vfsmount}_marks_by_group()
  2016-12-22  9:15 [PATCH 0/22] fsnotify: Avoid SRCU stalls with fanotify permission events Jan Kara
                   ` (16 preceding siblings ...)
  2016-12-22  9:15 ` [PATCH 17/22] fsnotify: Remove fsnotify_recalc_{inode|vfsmount}_mask() Jan Kara
@ 2016-12-22  9:15 ` Jan Kara
  2016-12-26 16:57   ` Amir Goldstein
  2016-12-22  9:15 ` [PATCH 19/22] fsnotify: Remove fsnotify_find_{inode|vfsmount}_mark() Jan Kara
                   ` (4 subsequent siblings)
  22 siblings, 1 reply; 80+ messages in thread
From: Jan Kara @ 2016-12-22  9:15 UTC (permalink / raw)
  To: linux-fsdevel
  Cc: Amir Goldstein, Lino Sanfilippo, Miklos Szeredi, Paul Moore, Jan Kara

Inline these helpers as they are very thin. We still keep them because we
don't want to expose details of how the list type is determined.

Signed-off-by: Jan Kara <jack@suse.cz>
---
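A minimal userspace sketch of the arrangement described above: one
out-of-line function takes a type flag, and two static inline wrappers
keep that flag out of the callers' sight. The toy_* names and the flag
values are hypothetical and only mirror the structure of the helpers.

```c
#include <stddef.h>

#define TOY_LIST_TYPE_INODE	0x1u
#define TOY_LIST_TYPE_VFSMOUNT	0x2u

struct toy_mark {
	unsigned int list_type;	/* which kind of object the mark hangs on */
	int attached;
};

struct toy_group {
	struct toy_mark *marks;
	size_t nr_marks;
};

/* The one real implementation: detach every mark whose type matches. */
static void toy_clear_marks_by_group_flags(struct toy_group *group,
					   unsigned int flags)
{
	size_t i;

	for (i = 0; i < group->nr_marks; i++)
		if (group->marks[i].list_type & flags)
			group->marks[i].attached = 0;
}

/* Thin wrappers: callers never see how the list type is encoded. */
static inline void toy_clear_inode_marks_by_group(struct toy_group *group)
{
	toy_clear_marks_by_group_flags(group, TOY_LIST_TYPE_INODE);
}

static inline void toy_clear_vfsmount_marks_by_group(struct toy_group *group)
{
	toy_clear_marks_by_group_flags(group, TOY_LIST_TYPE_VFSMOUNT);
}
```
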
 fs/notify/inode_mark.c           |  8 --------
 fs/notify/vfsmount_mark.c        |  5 -----
 include/linux/fsnotify_backend.h | 14 ++++++++++----
 3 files changed, 10 insertions(+), 17 deletions(-)

diff --git a/fs/notify/inode_mark.c b/fs/notify/inode_mark.c
index c6888dab682b..bdc15f736082 100644
--- a/fs/notify/inode_mark.c
+++ b/fs/notify/inode_mark.c
@@ -31,14 +31,6 @@
 #include "../internal.h"
 
 /*
- * Given a group clear all of the inode marks associated with that group.
- */
-void fsnotify_clear_inode_marks_by_group(struct fsnotify_group *group)
-{
-	fsnotify_clear_marks_by_group_flags(group, FSNOTIFY_LIST_TYPE_INODE);
-}
-
-/*
  * given a group and inode, find the mark associated with that combination.
  * if found take a reference to that mark and return it, else return NULL
  */
diff --git a/fs/notify/vfsmount_mark.c b/fs/notify/vfsmount_mark.c
index 548c33e8f2fa..1e692c56deec 100644
--- a/fs/notify/vfsmount_mark.c
+++ b/fs/notify/vfsmount_mark.c
@@ -29,11 +29,6 @@
 #include <linux/fsnotify_backend.h>
 #include "fsnotify.h"
 
-void fsnotify_clear_vfsmount_marks_by_group(struct fsnotify_group *group)
-{
-	fsnotify_clear_marks_by_group_flags(group, FSNOTIFY_LIST_TYPE_VFSMOUNT);
-}
-
 /*
  * given a group and vfsmount, find the mark associated with that combination.
  * if found take a reference to that mark and return it, else return NULL
diff --git a/include/linux/fsnotify_backend.h b/include/linux/fsnotify_backend.h
index 5c33ba9bf1ec..c485b17edba9 100644
--- a/include/linux/fsnotify_backend.h
+++ b/include/linux/fsnotify_backend.h
@@ -353,12 +353,18 @@ extern void fsnotify_destroy_mark(struct fsnotify_mark *mark,
 extern void fsnotify_detach_mark(struct fsnotify_mark *mark);
 /* free mark */
 extern void fsnotify_free_mark(struct fsnotify_mark *mark);
-/* run all the marks in a group, and clear all of the vfsmount marks */
-extern void fsnotify_clear_vfsmount_marks_by_group(struct fsnotify_group *group);
-/* run all the marks in a group, and clear all of the inode marks */
-extern void fsnotify_clear_inode_marks_by_group(struct fsnotify_group *group);
 /* run all the marks in a group, and clear all of the marks attached to given object type */
 extern void fsnotify_clear_marks_by_group_flags(struct fsnotify_group *group, unsigned int flags);
+/* run all the marks in a group, and clear all of the vfsmount marks */
+static inline void fsnotify_clear_vfsmount_marks_by_group(struct fsnotify_group *group)
+{
+	fsnotify_clear_marks_by_group_flags(group, FSNOTIFY_LIST_TYPE_VFSMOUNT);
+}
+/* run all the marks in a group, and clear all of the inode marks */
+static inline void fsnotify_clear_inode_marks_by_group(struct fsnotify_group *group)
+{
+	fsnotify_clear_marks_by_group_flags(group, FSNOTIFY_LIST_TYPE_INODE);
+}
 extern void fsnotify_get_mark(struct fsnotify_mark *mark);
 extern void fsnotify_put_mark(struct fsnotify_mark *mark);
 extern void fsnotify_unmount_inodes(struct super_block *sb);
-- 
2.10.2



* [PATCH 19/22] fsnotify: Remove fsnotify_find_{inode|vfsmount}_mark()
  2016-12-22  9:15 [PATCH 0/22] fsnotify: Avoid SRCU stalls with fanotify permission events Jan Kara
                   ` (17 preceding siblings ...)
  2016-12-22  9:15 ` [PATCH 18/22] fsnotify: Inline fsnotify_clear_{inode|vfsmount}_marks_by_group() Jan Kara
@ 2016-12-22  9:15 ` Jan Kara
  2016-12-26 17:14   ` Amir Goldstein
  2016-12-22  9:15 ` [PATCH 20/22] fsnotify: Drop inode_mark.c Jan Kara
                   ` (3 subsequent siblings)
  22 siblings, 1 reply; 80+ messages in thread
From: Jan Kara @ 2016-12-22  9:15 UTC (permalink / raw)
  To: linux-fsdevel
  Cc: Amir Goldstein, Lino Sanfilippo, Miklos Szeredi, Paul Moore, Jan Kara

These are very thin wrappers, so just remove them. Drop
fs/notify/vfsmount_mark.c as it is now empty.

Signed-off-by: Jan Kara <jack@suse.cz>
---
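A minimal userspace sketch of the calling convention after this change:
one lookup function takes a pointer to the object's list pointer plus the
group, so inodes and mounts share it without per-object wrappers. The
toy_* types are hypothetical, and the plain int reference count stands in
for the reference the real lookup takes on the mark it returns.

```c
#include <stddef.h>

struct toy_group {
	int id;
};

struct toy_mark {
	struct toy_group *group;
	int refcnt;
	struct toy_mark *next;
};

struct toy_mark_list {
	struct toy_mark *first;
};

/* Both object kinds just carry a pointer to their (possibly absent) list. */
struct toy_inode {
	struct toy_mark_list *marks;
};

struct toy_mount {
	struct toy_mark_list *marks;
};

/* Find the mark belonging to group; takes a reference on success. */
static struct toy_mark *toy_find_mark(struct toy_mark_list **listp,
				      struct toy_group *group)
{
	struct toy_mark *mark;

	if (*listp == NULL)
		return NULL;	/* object has no marks at all */
	for (mark = (*listp)->first; mark != NULL; mark = mark->next) {
		if (mark->group == group) {
			mark->refcnt++;
			return mark;
		}
	}
	return NULL;
}
```

Callers then write toy_find_mark(&inode.marks, group) or
toy_find_mark(&mnt.marks, group), and drop the reference when done.
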
 fs/notify/Makefile                 |  2 +-
 fs/notify/dnotify/dnotify.c        |  4 ++--
 fs/notify/fanotify/fanotify_user.c | 12 ++++++-----
 fs/notify/fsnotify.h               |  4 ----
 fs/notify/inode_mark.c             | 10 ---------
 fs/notify/inotify/inotify_user.c   |  2 +-
 fs/notify/vfsmount_mark.c          | 42 --------------------------------------
 include/linux/fsnotify_backend.h   |  8 ++++----
 kernel/audit_tree.c                |  3 ++-
 kernel/audit_watch.c               |  2 +-
 10 files changed, 18 insertions(+), 71 deletions(-)
 delete mode 100644 fs/notify/vfsmount_mark.c

diff --git a/fs/notify/Makefile b/fs/notify/Makefile
index 96d3420d0242..ebb64a0282d1 100644
--- a/fs/notify/Makefile
+++ b/fs/notify/Makefile
@@ -1,5 +1,5 @@
 obj-$(CONFIG_FSNOTIFY)		+= fsnotify.o notification.o group.o inode_mark.o \
-				   mark.o vfsmount_mark.o fdinfo.o
+				   mark.o fdinfo.o
 
 obj-y			+= dnotify/
 obj-y			+= inotify/
diff --git a/fs/notify/dnotify/dnotify.c b/fs/notify/dnotify/dnotify.c
index 98463f40f43c..840cc56478ad 100644
--- a/fs/notify/dnotify/dnotify.c
+++ b/fs/notify/dnotify/dnotify.c
@@ -157,7 +157,7 @@ void dnotify_flush(struct file *filp, fl_owner_t id)
 	if (!S_ISDIR(inode->i_mode))
 		return;
 
-	fsn_mark = fsnotify_find_inode_mark(dnotify_group, inode);
+	fsn_mark = fsnotify_find_mark(&inode->i_fsnotify_marks, dnotify_group);
 	if (!fsn_mark)
 		return;
 	dn_mark = container_of(fsn_mark, struct dnotify_mark, fsn_mark);
@@ -313,7 +313,7 @@ int fcntl_dirnotify(int fd, struct file *filp, unsigned long arg)
 	mutex_lock(&dnotify_group->mark_mutex);
 
 	/* add the new_fsn_mark or find an old one. */
-	fsn_mark = fsnotify_find_inode_mark(dnotify_group, inode);
+	fsn_mark = fsnotify_find_mark(&inode->i_fsnotify_marks, dnotify_group);
 	if (fsn_mark) {
 		dn_mark = container_of(fsn_mark, struct dnotify_mark, fsn_mark);
 		spin_lock(&fsn_mark->lock);
diff --git a/fs/notify/fanotify/fanotify_user.c b/fs/notify/fanotify/fanotify_user.c
index b68cec253a21..7b4d9d67f4f9 100644
--- a/fs/notify/fanotify/fanotify_user.c
+++ b/fs/notify/fanotify/fanotify_user.c
@@ -532,7 +532,8 @@ static int fanotify_remove_vfsmount_mark(struct fsnotify_group *group,
 	int destroy_mark;
 
 	mutex_lock(&group->mark_mutex);
-	fsn_mark = fsnotify_find_vfsmount_mark(group, mnt);
+	fsn_mark = fsnotify_find_mark(&real_mount(mnt)->mnt_fsnotify_marks,
+				      group);
 	if (!fsn_mark) {
 		mutex_unlock(&group->mark_mutex);
 		return -ENOENT;
@@ -561,7 +562,7 @@ static int fanotify_remove_inode_mark(struct fsnotify_group *group,
 	int destroy_mark;
 
 	mutex_lock(&group->mark_mutex);
-	fsn_mark = fsnotify_find_inode_mark(group, inode);
+	fsn_mark = fsnotify_find_mark(&inode->i_fsnotify_marks, group);
 	if (!fsn_mark) {
 		mutex_unlock(&group->mark_mutex);
 		return -ENOENT;
@@ -577,7 +578,7 @@ static int fanotify_remove_inode_mark(struct fsnotify_group *group,
 	if (destroy_mark)
 		fsnotify_free_mark(fsn_mark);
 
-	/* matches the fsnotify_find_inode_mark() */
+	/* matches the fsnotify_find_mark() */
 	fsnotify_put_mark(fsn_mark);
 
 	return 0;
@@ -645,7 +646,8 @@ static int fanotify_add_vfsmount_mark(struct fsnotify_group *group,
 	__u32 added;
 
 	mutex_lock(&group->mark_mutex);
-	fsn_mark = fsnotify_find_vfsmount_mark(group, mnt);
+	fsn_mark = fsnotify_find_mark(&real_mount(mnt)->mnt_fsnotify_marks,
+				      group);
 	if (!fsn_mark) {
 		fsn_mark = fanotify_add_new_mark(group, NULL, mnt);
 		if (IS_ERR(fsn_mark)) {
@@ -682,7 +684,7 @@ static int fanotify_add_inode_mark(struct fsnotify_group *group,
 		return 0;
 
 	mutex_lock(&group->mark_mutex);
-	fsn_mark = fsnotify_find_inode_mark(group, inode);
+	fsn_mark = fsnotify_find_mark(&inode->i_fsnotify_marks, group);
 	if (!fsn_mark) {
 		fsn_mark = fanotify_add_new_mark(group, inode, NULL);
 		if (IS_ERR(fsn_mark)) {
diff --git a/fs/notify/fsnotify.h b/fs/notify/fsnotify.h
index 670c2bac1342..ba722b8d36b7 100644
--- a/fs/notify/fsnotify.h
+++ b/fs/notify/fsnotify.h
@@ -18,10 +18,6 @@ extern struct srcu_struct fsnotify_mark_srcu;
 extern int fsnotify_compare_groups(struct fsnotify_group *a,
 				   struct fsnotify_group *b);
 
-/* Find mark belonging to given group in the list of marks */
-extern struct fsnotify_mark *fsnotify_find_mark(
-					struct fsnotify_mark_list **listp,
-					struct fsnotify_group *group);
 /* Destroy all marks in the given list */
 extern void fsnotify_destroy_marks(struct fsnotify_mark_list **list);
 /* run the list of all marks associated with inode and destroy them */
diff --git a/fs/notify/inode_mark.c b/fs/notify/inode_mark.c
index bdc15f736082..5cc317bad082 100644
--- a/fs/notify/inode_mark.c
+++ b/fs/notify/inode_mark.c
@@ -30,16 +30,6 @@
 
 #include "../internal.h"
 
-/*
- * given a group and inode, find the mark associated with that combination.
- * if found take a reference to that mark and return it, else return NULL
- */
-struct fsnotify_mark *fsnotify_find_inode_mark(struct fsnotify_group *group,
-					       struct inode *inode)
-{
-	return fsnotify_find_mark(&inode->i_fsnotify_marks, group);
-}
-
 /**
  * fsnotify_unmount_inodes - an sb is unmounting.  handle any watched inodes.
  * @sb: superblock being unmounted.
diff --git a/fs/notify/inotify/inotify_user.c b/fs/notify/inotify/inotify_user.c
index bbcebf6beeb9..546a4c7d0f03 100644
--- a/fs/notify/inotify/inotify_user.c
+++ b/fs/notify/inotify/inotify_user.c
@@ -509,7 +509,7 @@ static int inotify_update_existing_watch(struct fsnotify_group *group,
 
 	mask = inotify_arg_to_mask(arg);
 
-	fsn_mark = fsnotify_find_inode_mark(group, inode);
+	fsn_mark = fsnotify_find_mark(&inode->i_fsnotify_marks, group);
 	if (!fsn_mark)
 		return -ENOENT;
 
diff --git a/fs/notify/vfsmount_mark.c b/fs/notify/vfsmount_mark.c
deleted file mode 100644
index 1e692c56deec..000000000000
--- a/fs/notify/vfsmount_mark.c
+++ /dev/null
@@ -1,42 +0,0 @@
-/*
- *  Copyright (C) 2008 Red Hat, Inc., Eric Paris <eparis@redhat.com>
- *
- *  This program is free software; you can redistribute it and/or modify
- *  it under the terms of the GNU General Public License as published by
- *  the Free Software Foundation; either version 2, or (at your option)
- *  any later version.
- *
- *  This program is distributed in the hope that it will be useful,
- *  but WITHOUT ANY WARRANTY; without even the implied warranty of
- *  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
- *  GNU General Public License for more details.
- *
- *  You should have received a copy of the GNU General Public License
- *  along with this program; see the file COPYING.  If not, write to
- *  the Free Software Foundation, 675 Mass Ave, Cambridge, MA 02139, USA.
- */
-
-#include <linux/fs.h>
-#include <linux/init.h>
-#include <linux/kernel.h>
-#include <linux/module.h>
-#include <linux/mount.h>
-#include <linux/mutex.h>
-#include <linux/spinlock.h>
-
-#include <linux/atomic.h>
-
-#include <linux/fsnotify_backend.h>
-#include "fsnotify.h"
-
-/*
- * given a group and vfsmount, find the mark associated with that combination.
- * if found take a reference to that mark and return it, else return NULL
- */
-struct fsnotify_mark *fsnotify_find_vfsmount_mark(struct fsnotify_group *group,
-						  struct vfsmount *mnt)
-{
-	struct mount *m = real_mount(mnt);
-
-	return fsnotify_find_mark(&m->mnt_fsnotify_marks, group);
-}
diff --git a/include/linux/fsnotify_backend.h b/include/linux/fsnotify_backend.h
index c485b17edba9..b0f543c03928 100644
--- a/include/linux/fsnotify_backend.h
+++ b/include/linux/fsnotify_backend.h
@@ -337,10 +337,10 @@ extern struct fsnotify_event *fsnotify_remove_first_event(struct fsnotify_group
 /* Calculate mask of events for a list of marks */
 extern void fsnotify_recalc_mask(struct fsnotify_mark_list *list);
 extern void fsnotify_init_mark(struct fsnotify_mark *mark, void (*free_mark)(struct fsnotify_mark *mark));
-/* find (and take a reference) to a mark associated with group and inode */
-extern struct fsnotify_mark *fsnotify_find_inode_mark(struct fsnotify_group *group, struct inode *inode);
-/* find (and take a reference) to a mark associated with group and vfsmount */
-extern struct fsnotify_mark *fsnotify_find_vfsmount_mark(struct fsnotify_group *group, struct vfsmount *mnt);
+/* Find mark belonging to given group in the list of marks */
+extern struct fsnotify_mark *fsnotify_find_mark(
+					struct fsnotify_mark_list **listp,
+					struct fsnotify_group *group);
 /* attach the mark to both the group and the inode */
 extern int fsnotify_add_mark(struct fsnotify_mark *mark, struct fsnotify_group *group,
 			     struct inode *inode, struct vfsmount *mnt, int allow_dups);
diff --git a/kernel/audit_tree.c b/kernel/audit_tree.c
index ff6c490004c1..429111dcfecb 100644
--- a/kernel/audit_tree.c
+++ b/kernel/audit_tree.c
@@ -379,7 +379,8 @@ static int tag_chunk(struct inode *inode, struct audit_tree *tree)
 	struct node *p;
 	int n;
 
-	old_entry = fsnotify_find_inode_mark(audit_tree_group, inode);
+	old_entry = fsnotify_find_mark(&inode->i_fsnotify_marks,
+				       audit_tree_group);
 	if (!old_entry)
 		return create_chunk(inode, tree);
 
diff --git a/kernel/audit_watch.c b/kernel/audit_watch.c
index 4c06e961e4ba..800084cf0539 100644
--- a/kernel/audit_watch.c
+++ b/kernel/audit_watch.c
@@ -102,7 +102,7 @@ static inline struct audit_parent *audit_find_parent(struct inode *inode)
 	struct audit_parent *parent = NULL;
 	struct fsnotify_mark *entry;
 
-	entry = fsnotify_find_inode_mark(audit_watch_group, inode);
+	entry = fsnotify_find_mark(&inode->i_fsnotify_marks, audit_watch_group);
 	if (entry)
 		parent = container_of(entry, struct audit_parent, mark);
 
-- 
2.10.2



* [PATCH 20/22] fsnotify: Drop inode_mark.c
  2016-12-22  9:15 [PATCH 0/22] fsnotify: Avoid SRCU stalls with fanotify permission events Jan Kara
                   ` (18 preceding siblings ...)
  2016-12-22  9:15 ` [PATCH 19/22] fsnotify: Remove fsnotify_find_{inode|vfsmount}_mark() Jan Kara
@ 2016-12-22  9:15 ` Jan Kara
  2016-12-26 17:15   ` Amir Goldstein
  2016-12-22  9:15 ` [PATCH 21/22] fsnotify: Add group pointer in fsnotify_init_mark() Jan Kara
                   ` (2 subsequent siblings)
  22 siblings, 1 reply; 80+ messages in thread
From: Jan Kara @ 2016-12-22  9:15 UTC (permalink / raw)
  To: linux-fsdevel
  Cc: Amir Goldstein, Lino Sanfilippo, Miklos Szeredi, Paul Moore, Jan Kara

inode_mark.c now contains only a single function. Move it to
fs/notify/fsnotify.c and remove inode_mark.c.

Signed-off-by: Jan Kara <jack@suse.cz>
---
 fs/notify/Makefile     |  4 +--
 fs/notify/fsnotify.c   | 57 ++++++++++++++++++++++++++++++++
 fs/notify/inode_mark.c | 88 --------------------------------------------------
 3 files changed, 59 insertions(+), 90 deletions(-)
 delete mode 100644 fs/notify/inode_mark.c

diff --git a/fs/notify/Makefile b/fs/notify/Makefile
index ebb64a0282d1..3e969ae91b60 100644
--- a/fs/notify/Makefile
+++ b/fs/notify/Makefile
@@ -1,5 +1,5 @@
-obj-$(CONFIG_FSNOTIFY)		+= fsnotify.o notification.o group.o inode_mark.o \
-				   mark.o fdinfo.o
+obj-$(CONFIG_FSNOTIFY)		+= fsnotify.o notification.o group.o mark.o \
+				   fdinfo.o
 
 obj-y			+= dnotify/
 obj-y			+= inotify/
diff --git a/fs/notify/fsnotify.c b/fs/notify/fsnotify.c
index 7c1ebc934a08..b52a48694f14 100644
--- a/fs/notify/fsnotify.c
+++ b/fs/notify/fsnotify.c
@@ -41,6 +41,63 @@ void __fsnotify_vfsmount_delete(struct vfsmount *mnt)
 	fsnotify_clear_marks_by_mount(mnt);
 }
 
+/**
+ * fsnotify_unmount_inodes - an sb is unmounting.  handle any watched inodes.
+ * @sb: superblock being unmounted.
+ *
+ * Called during unmount with no locks held, so needs to be safe against
+ * concurrent modifiers. We temporarily drop sb->s_inode_list_lock and CAN block.
+ */
+void fsnotify_unmount_inodes(struct super_block *sb)
+{
+	struct inode *inode, *iput_inode = NULL;
+
+	spin_lock(&sb->s_inode_list_lock);
+	list_for_each_entry(inode, &sb->s_inodes, i_sb_list) {
+		/*
+		 * We cannot __iget() an inode in state I_FREEING,
+		 * I_WILL_FREE, or I_NEW which is fine because by that point
+		 * the inode cannot have any associated watches.
+		 */
+		spin_lock(&inode->i_lock);
+		if (inode->i_state & (I_FREEING|I_WILL_FREE|I_NEW)) {
+			spin_unlock(&inode->i_lock);
+			continue;
+		}
+
+		/*
+		 * If i_count is zero, the inode cannot have any watches and
+		 * doing an __iget/iput with MS_ACTIVE clear would actually
+		 * evict all inodes with zero i_count from icache which is
+		 * unnecessarily violent and may in fact be illegal to do.
+		 */
+		if (!atomic_read(&inode->i_count)) {
+			spin_unlock(&inode->i_lock);
+			continue;
+		}
+
+		__iget(inode);
+		spin_unlock(&inode->i_lock);
+		spin_unlock(&sb->s_inode_list_lock);
+
+		if (iput_inode)
+			iput(iput_inode);
+
+		/* for each watch, send FS_UNMOUNT and then remove it */
+		fsnotify(inode, FS_UNMOUNT, inode, FSNOTIFY_EVENT_INODE, NULL, 0);
+
+		fsnotify_inode_delete(inode);
+
+		iput_inode = inode;
+
+		spin_lock(&sb->s_inode_list_lock);
+	}
+	spin_unlock(&sb->s_inode_list_lock);
+
+	if (iput_inode)
+		iput(iput_inode);
+}
+
 /*
  * Given an inode, first check if we care what happens to our children.  Inotify
  * and dnotify both tell their parents about events.  If we care about any event
diff --git a/fs/notify/inode_mark.c b/fs/notify/inode_mark.c
deleted file mode 100644
index 5cc317bad082..000000000000
--- a/fs/notify/inode_mark.c
+++ /dev/null
@@ -1,88 +0,0 @@
-/*
- *  Copyright (C) 2008 Red Hat, Inc., Eric Paris <eparis@redhat.com>
- *
- *  This program is free software; you can redistribute it and/or modify
- *  it under the terms of the GNU General Public License as published by
- *  the Free Software Foundation; either version 2, or (at your option)
- *  any later version.
- *
- *  This program is distributed in the hope that it will be useful,
- *  but WITHOUT ANY WARRANTY; without even the implied warranty of
- *  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
- *  GNU General Public License for more details.
- *
- *  You should have received a copy of the GNU General Public License
- *  along with this program; see the file COPYING.  If not, write to
- *  the Free Software Foundation, 675 Mass Ave, Cambridge, MA 02139, USA.
- */
-
-#include <linux/fs.h>
-#include <linux/init.h>
-#include <linux/kernel.h>
-#include <linux/module.h>
-#include <linux/mutex.h>
-#include <linux/spinlock.h>
-
-#include <linux/atomic.h>
-
-#include <linux/fsnotify_backend.h>
-#include "fsnotify.h"
-
-#include "../internal.h"
-
-/**
- * fsnotify_unmount_inodes - an sb is unmounting.  handle any watched inodes.
- * @sb: superblock being unmounted.
- *
- * Called during unmount with no locks held, so needs to be safe against
- * concurrent modifiers. We temporarily drop sb->s_inode_list_lock and CAN block.
- */
-void fsnotify_unmount_inodes(struct super_block *sb)
-{
-	struct inode *inode, *iput_inode = NULL;
-
-	spin_lock(&sb->s_inode_list_lock);
-	list_for_each_entry(inode, &sb->s_inodes, i_sb_list) {
-		/*
-		 * We cannot __iget() an inode in state I_FREEING,
-		 * I_WILL_FREE, or I_NEW which is fine because by that point
-		 * the inode cannot have any associated watches.
-		 */
-		spin_lock(&inode->i_lock);
-		if (inode->i_state & (I_FREEING|I_WILL_FREE|I_NEW)) {
-			spin_unlock(&inode->i_lock);
-			continue;
-		}
-
-		/*
-		 * If i_count is zero, the inode cannot have any watches and
-		 * doing an __iget/iput with MS_ACTIVE clear would actually
-		 * evict all inodes with zero i_count from icache which is
-		 * unnecessarily violent and may in fact be illegal to do.
-		 */
-		if (!atomic_read(&inode->i_count)) {
-			spin_unlock(&inode->i_lock);
-			continue;
-		}
-
-		__iget(inode);
-		spin_unlock(&inode->i_lock);
-		spin_unlock(&sb->s_inode_list_lock);
-
-		if (iput_inode)
-			iput(iput_inode);
-
-		/* for each watch, send FS_UNMOUNT and then remove it */
-		fsnotify(inode, FS_UNMOUNT, inode, FSNOTIFY_EVENT_INODE, NULL, 0);
-
-		fsnotify_inode_delete(inode);
-
-		iput_inode = inode;
-
-		spin_lock(&sb->s_inode_list_lock);
-	}
-	spin_unlock(&sb->s_inode_list_lock);
-
-	if (iput_inode)
-		iput(iput_inode);
-}
-- 
2.10.2



* [PATCH 21/22] fsnotify: Add group pointer in fsnotify_init_mark()
  2016-12-22  9:15 [PATCH 0/22] fsnotify: Avoid SRCU stalls with fanotify permission events Jan Kara
                   ` (19 preceding siblings ...)
  2016-12-22  9:15 ` [PATCH 20/22] fsnotify: Drop inode_mark.c Jan Kara
@ 2016-12-22  9:15 ` Jan Kara
  2016-12-26 17:34   ` Amir Goldstein
  2016-12-22  9:15 ` [PATCH 22/22] fsnotify: Move ->free_mark callback to fsnotify_ops Jan Kara
  2016-12-22 20:58 ` [PATCH 0/22] fsnotify: Avoid SRCU stalls with fanotify permission events Paul Moore
  22 siblings, 1 reply; 80+ messages in thread
From: Jan Kara @ 2016-12-22  9:15 UTC (permalink / raw)
  To: linux-fsdevel
  Cc: Amir Goldstein, Lino Sanfilippo, Miklos Szeredi, Paul Moore, Jan Kara

Currently we initialize mark->group only in fsnotify_add_mark_locked().
However, we will need to access the fsnotify_ops of the corresponding
group from fsnotify_put_mark(), so mark->group must be initialized
earlier. Do that in fsnotify_init_mark(). As a consequence, once
fsnotify_init_mark() has been called on a mark, the mark has to be
destroyed by fsnotify_put_mark().
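
The ownership rule described above can be illustrated with a small
userspace sketch. It is not kernel code: struct group, struct mark,
mark_init(), mark_put() and mark_create_failing() are hypothetical
stand-ins, and plain ints replace the kernel's atomic reference counts.
It only shows why an error path after init should go through the put
function rather than freeing the memory directly.

```c
#include <assert.h>
#include <stdlib.h>

/* Simplified stand-ins, NOT the kernel's fsnotify_group / fsnotify_mark. */
struct group {
	int refcnt;
};

struct mark {
	int refcnt;
	struct group *group;
};

static void group_get(struct group *g) { g->refcnt++; }
static void group_put(struct group *g) { g->refcnt--; }

/* Models the idea of fsnotify_init_mark(): the mark pins its group at init. */
static void mark_init(struct mark *m, struct group *g)
{
	m->refcnt = 1;
	group_get(g);
	m->group = g;
}

/*
 * Models the idea of fsnotify_put_mark(): the last put drops the group
 * reference taken in mark_init() and frees the mark.
 * Returns 1 if the mark was freed, 0 otherwise.
 */
static int mark_put(struct mark *m)
{
	assert(m->refcnt > 0);
	if (--m->refcnt)
		return 0;
	group_put(m->group);
	free(m);
	return 1;
}

/*
 * Hypothetical error path: attaching the mark "fails" after init.
 * Releasing via mark_put() keeps the group refcount balanced; a bare
 * free(m) here would leak the reference taken in mark_init().
 * Returns the group refcount afterwards, or -1 on allocation failure.
 */
static int mark_create_failing(struct group *g)
{
	struct mark *m = malloc(sizeof(*m));

	if (!m)
		return -1;
	mark_init(m, g);
	/* pretend that adding the mark to an inode failed here */
	mark_put(m);
	return g->refcnt;
}
```

This mirrors the conversions in the diff below, where error paths that
used to call a type-specific free helper now call fsnotify_put_mark().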

Signed-off-by: Jan Kara <jack@suse.cz>
---
 fs/notify/dnotify/dnotify.c        |  5 ++---
 fs/notify/fanotify/fanotify_user.c |  4 ++--
 fs/notify/inotify/inotify_user.c   |  5 ++---
 fs/notify/mark.c                   | 17 ++++++++++-------
 include/linux/fsnotify_backend.h   | 12 +++++++-----
 kernel/audit_fsnotify.c            |  7 ++++---
 kernel/audit_tree.c                | 15 ++++++++-------
 kernel/audit_watch.c               |  5 +++--
 8 files changed, 38 insertions(+), 32 deletions(-)

diff --git a/fs/notify/dnotify/dnotify.c b/fs/notify/dnotify/dnotify.c
index 840cc56478ad..b1a5c7aa7c80 100644
--- a/fs/notify/dnotify/dnotify.c
+++ b/fs/notify/dnotify/dnotify.c
@@ -305,7 +305,7 @@ int fcntl_dirnotify(int fd, struct file *filp, unsigned long arg)
 
 	/* set up the new_fsn_mark and new_dn_mark */
 	new_fsn_mark = &new_dn_mark->fsn_mark;
-	fsnotify_init_mark(new_fsn_mark, dnotify_free_mark);
+	fsnotify_init_mark(new_fsn_mark, dnotify_group, dnotify_free_mark);
 	new_fsn_mark->mask = mask;
 	new_dn_mark->dn = NULL;
 
@@ -318,8 +318,7 @@ int fcntl_dirnotify(int fd, struct file *filp, unsigned long arg)
 		dn_mark = container_of(fsn_mark, struct dnotify_mark, fsn_mark);
 		spin_lock(&fsn_mark->lock);
 	} else {
-		fsnotify_add_mark_locked(new_fsn_mark, dnotify_group, inode,
-					 NULL, 0);
+		fsnotify_add_mark_locked(new_fsn_mark, inode, NULL, 0);
 		spin_lock(&new_fsn_mark->lock);
 		fsn_mark = new_fsn_mark;
 		dn_mark = new_dn_mark;
diff --git a/fs/notify/fanotify/fanotify_user.c b/fs/notify/fanotify/fanotify_user.c
index 7b4d9d67f4f9..4ba51b836a44 100644
--- a/fs/notify/fanotify/fanotify_user.c
+++ b/fs/notify/fanotify/fanotify_user.c
@@ -627,8 +627,8 @@ static struct fsnotify_mark *fanotify_add_new_mark(struct fsnotify_group *group,
 	if (!mark)
 		return ERR_PTR(-ENOMEM);
 
-	fsnotify_init_mark(mark, fanotify_free_mark);
-	ret = fsnotify_add_mark_locked(mark, group, inode, mnt, 0);
+	fsnotify_init_mark(mark, group, fanotify_free_mark);
+	ret = fsnotify_add_mark_locked(mark, inode, mnt, 0);
 	if (ret) {
 		fsnotify_put_mark(mark);
 		return ERR_PTR(ret);
diff --git a/fs/notify/inotify/inotify_user.c b/fs/notify/inotify/inotify_user.c
index 546a4c7d0f03..38e66d497b3b 100644
--- a/fs/notify/inotify/inotify_user.c
+++ b/fs/notify/inotify/inotify_user.c
@@ -561,7 +561,7 @@ static int inotify_new_watch(struct fsnotify_group *group,
 	if (unlikely(!tmp_i_mark))
 		return -ENOMEM;
 
-	fsnotify_init_mark(&tmp_i_mark->fsn_mark, inotify_free_mark);
+	fsnotify_init_mark(&tmp_i_mark->fsn_mark, group, inotify_free_mark);
 	tmp_i_mark->fsn_mark.mask = mask;
 	tmp_i_mark->wd = -1;
 
@@ -574,8 +574,7 @@ static int inotify_new_watch(struct fsnotify_group *group,
 		goto out_err;
 
 	/* we are on the idr, now get on the inode */
-	ret = fsnotify_add_mark_locked(&tmp_i_mark->fsn_mark, group, inode,
-				       NULL, 0);
+	ret = fsnotify_add_mark_locked(&tmp_i_mark->fsn_mark, inode, NULL, 0);
 	if (ret) {
 		/* we failed to get on the inode, get off the idr */
 		inotify_remove_from_idr(group, tmp_i_mark);
diff --git a/fs/notify/mark.c b/fs/notify/mark.c
index b4a2f6b237cd..f32f4ff4ecc4 100644
--- a/fs/notify/mark.c
+++ b/fs/notify/mark.c
@@ -559,10 +559,10 @@ static int fsnotify_add_mark_list(struct fsnotify_mark *mark,
  * These marks may be used for the fsnotify backend to determine which
  * event types should be delivered to which group.
  */
-int fsnotify_add_mark_locked(struct fsnotify_mark *mark,
-			     struct fsnotify_group *group, struct inode *inode,
+int fsnotify_add_mark_locked(struct fsnotify_mark *mark, struct inode *inode,
 			     struct vfsmount *mnt, int allow_dups)
 {
+	struct fsnotify_group *group = mark->group;
 	int ret = 0;
 
 	BUG_ON(inode && mnt);
@@ -578,8 +578,6 @@ int fsnotify_add_mark_locked(struct fsnotify_mark *mark,
 	spin_lock(&mark->lock);
 	mark->flags |= FSNOTIFY_MARK_FLAG_ALIVE | FSNOTIFY_MARK_FLAG_ATTACHED;
 
-	fsnotify_get_group(group);
-	mark->group = group;
 	list_add(&mark->g_list, &group->marks_list);
 	atomic_inc(&group->num_marks);
 	fsnotify_get_mark(mark); /* for g_list */
@@ -604,12 +602,14 @@ int fsnotify_add_mark_locked(struct fsnotify_mark *mark,
 	return ret;
 }
 
-int fsnotify_add_mark(struct fsnotify_mark *mark, struct fsnotify_group *group,
-		      struct inode *inode, struct vfsmount *mnt, int allow_dups)
+int fsnotify_add_mark(struct fsnotify_mark *mark, struct inode *inode,
+		      struct vfsmount *mnt, int allow_dups)
 {
 	int ret;
+	struct fsnotify_group *group = mark->group;
+
 	mutex_lock(&group->mark_mutex);
-	ret = fsnotify_add_mark_locked(mark, group, inode, mnt, allow_dups);
+	ret = fsnotify_add_mark_locked(mark, inode, mnt, allow_dups);
 	mutex_unlock(&group->mark_mutex);
 	return ret;
 }
@@ -754,12 +754,15 @@ void fsnotify_destroy_marks(struct fsnotify_mark_list **listp)
  * Nothing fancy, just initialize lists and locks and counters.
  */
 void fsnotify_init_mark(struct fsnotify_mark *mark,
+			struct fsnotify_group *group,
 			void (*free_mark)(struct fsnotify_mark *mark))
 {
 	memset(mark, 0, sizeof(*mark));
 	spin_lock_init(&mark->lock);
 	atomic_set(&mark->refcnt, 1);
 	mark->free_mark = free_mark;
+	fsnotify_get_group(group);
+	mark->group = group;
 }
 
 /*
diff --git a/include/linux/fsnotify_backend.h b/include/linux/fsnotify_backend.h
index b0f543c03928..292884ecc204 100644
--- a/include/linux/fsnotify_backend.h
+++ b/include/linux/fsnotify_backend.h
@@ -336,15 +336,17 @@ extern struct fsnotify_event *fsnotify_remove_first_event(struct fsnotify_group
 
 /* Calculate mask of events for a list of marks */
 extern void fsnotify_recalc_mask(struct fsnotify_mark_list *list);
-extern void fsnotify_init_mark(struct fsnotify_mark *mark, void (*free_mark)(struct fsnotify_mark *mark));
+extern void fsnotify_init_mark(struct fsnotify_mark *mark,
+			       struct fsnotify_group *group,
+			       void (*free_mark)(struct fsnotify_mark *mark));
 /* Find mark belonging to given group in the list of marks */
 extern struct fsnotify_mark *fsnotify_find_mark(
 					struct fsnotify_mark_list **listp,
 					struct fsnotify_group *group);
-/* attach the mark to both the group and the inode */
-extern int fsnotify_add_mark(struct fsnotify_mark *mark, struct fsnotify_group *group,
-			     struct inode *inode, struct vfsmount *mnt, int allow_dups);
-extern int fsnotify_add_mark_locked(struct fsnotify_mark *mark, struct fsnotify_group *group,
+/* attach the mark to the inode or vfsmount */
+extern int fsnotify_add_mark(struct fsnotify_mark *mark, struct inode *inode,
+			     struct vfsmount *mnt, int allow_dups);
+extern int fsnotify_add_mark_locked(struct fsnotify_mark *mark,
 				    struct inode *inode, struct vfsmount *mnt, int allow_dups);
 /* given a group and a mark, flag mark to be freed when all references are dropped */
 extern void fsnotify_destroy_mark(struct fsnotify_mark *mark,
diff --git a/kernel/audit_fsnotify.c b/kernel/audit_fsnotify.c
index ae8599c7aa81..5bb7e920f126 100644
--- a/kernel/audit_fsnotify.c
+++ b/kernel/audit_fsnotify.c
@@ -103,15 +103,16 @@ struct audit_fsnotify_mark *audit_alloc_mark(struct audit_krule *krule, char *pa
 		goto out;
 	}
 
-	fsnotify_init_mark(&audit_mark->mark, audit_fsnotify_free_mark);
+	fsnotify_init_mark(&audit_mark->mark, audit_fsnotify_group,
+			   audit_fsnotify_free_mark);
 	audit_mark->mark.mask = AUDIT_FS_EVENTS;
 	audit_mark->path = pathname;
 	audit_update_mark(audit_mark, dentry->d_inode);
 	audit_mark->rule = krule;
 
-	ret = fsnotify_add_mark(&audit_mark->mark, audit_fsnotify_group, inode, NULL, true);
+	ret = fsnotify_add_mark(&audit_mark->mark, inode, NULL, true);
 	if (ret < 0) {
-		audit_fsnotify_mark_free(audit_mark);
+		fsnotify_put_mark(&audit_mark->mark);
 		audit_mark = ERR_PTR(ret);
 	}
 out:
diff --git a/kernel/audit_tree.c b/kernel/audit_tree.c
index 429111dcfecb..bce7ee6c5fda 100644
--- a/kernel/audit_tree.c
+++ b/kernel/audit_tree.c
@@ -154,7 +154,8 @@ static struct audit_chunk *alloc_chunk(int count)
 		INIT_LIST_HEAD(&chunk->owners[i].list);
 		chunk->owners[i].index = i;
 	}
-	fsnotify_init_mark(&chunk->mark, audit_tree_destroy_watch);
+	fsnotify_init_mark(&chunk->mark, audit_tree_group,
+			   audit_tree_destroy_watch);
 	chunk->mark.mask = FS_IN_IGNORED;
 	return chunk;
 }
@@ -252,7 +253,7 @@ static void untag_chunk(struct node *p)
 	    !entry->obj_list_head->inode) {
 		spin_unlock(&entry->lock);
 		if (new)
-			free_chunk(new);
+			fsnotify_put_mark(&new->mark);
 		goto out;
 	}
 
@@ -275,8 +276,8 @@ static void untag_chunk(struct node *p)
 	if (!new)
 		goto Fallback;
 
-	if (fsnotify_add_mark_locked(&new->mark, entry->group,
-				     entry->obj_list_head->inode, NULL, 1)) {
+	if (fsnotify_add_mark_locked(&new->mark, entry->obj_list_head->inode,
+				     NULL, 1)) {
 		fsnotify_put_mark(&new->mark);
 		goto Fallback;
 	}
@@ -340,7 +341,7 @@ static int create_chunk(struct inode *inode, struct audit_tree *tree)
 		return -ENOMEM;
 
 	entry = &chunk->mark;
-	if (fsnotify_add_mark(entry, audit_tree_group, inode, NULL, 0)) {
+	if (fsnotify_add_mark(entry, inode, NULL, 0)) {
 		fsnotify_put_mark(entry);
 		return -ENOSPC;
 	}
@@ -412,11 +413,11 @@ static int tag_chunk(struct inode *inode, struct audit_tree *tree)
 		spin_unlock(&old_entry->lock);
 		mutex_unlock(&old_entry->group->mark_mutex);
 		fsnotify_put_mark(old_entry);
-		free_chunk(chunk);
+		fsnotify_put_mark(&chunk->mark);
 		return -ENOENT;
 	}
 
-	if (fsnotify_add_mark_locked(chunk_entry, old_entry->group,
+	if (fsnotify_add_mark_locked(chunk_entry,
 			     old_entry->obj_list_head->inode, NULL, 1)) {
 		spin_unlock(&old_entry->lock);
 		mutex_unlock(&old_entry->group->mark_mutex);
diff --git a/kernel/audit_watch.c b/kernel/audit_watch.c
index 800084cf0539..0f355086215a 100644
--- a/kernel/audit_watch.c
+++ b/kernel/audit_watch.c
@@ -157,9 +157,10 @@ static struct audit_parent *audit_init_parent(struct path *path)
 
 	INIT_LIST_HEAD(&parent->watches);
 
-	fsnotify_init_mark(&parent->mark, audit_watch_free_mark);
+	fsnotify_init_mark(&parent->mark, audit_watch_group,
+			   audit_watch_free_mark);
 	parent->mark.mask = AUDIT_FS_WATCH;
-	ret = fsnotify_add_mark(&parent->mark, audit_watch_group, inode, NULL, 0);
+	ret = fsnotify_add_mark(&parent->mark, inode, NULL, 0);
 	if (ret < 0) {
 		audit_free_parent(parent);
 		return ERR_PTR(ret);
-- 
2.10.2



* [PATCH 22/22] fsnotify: Move ->free_mark callback to fsnotify_ops
  2016-12-22  9:15 [PATCH 0/22] fsnotify: Avoid SRCU stalls with fanotify permission events Jan Kara
                   ` (20 preceding siblings ...)
  2016-12-22  9:15 ` [PATCH 21/22] fsnotify: Add group pointer in fsnotify_init_mark() Jan Kara
@ 2016-12-22  9:15 ` Jan Kara
  2016-12-26 17:39   ` Amir Goldstein
  2016-12-22 20:58 ` [PATCH 0/22] fsnotify: Avoid SRCU stalls with fanotify permission events Paul Moore
  22 siblings, 1 reply; 80+ messages in thread
From: Jan Kara @ 2016-12-22  9:15 UTC (permalink / raw)
  To: linux-fsdevel
  Cc: Amir Goldstein, Lino Sanfilippo, Miklos Szeredi, Paul Moore, Jan Kara

The pointer to the ->free_mark callback unnecessarily occupies one long
in each fsnotify_mark, although it is the same for all marks from one
notification group. Move the callback pointer to fsnotify_ops.
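
As a rough userspace illustration of the layout change (a sketch under
assumptions, not the kernel structures): the names group_ops, group,
mark, demo_free_mark() and mark_final_destroy() are hypothetical, and
the freed flag exists only so the effect can be observed.

```c
#include <assert.h>
#include <stddef.h>

struct mark;

/* One ops table per group, shared by every mark of that group. */
struct group_ops {
	void (*free_mark)(struct mark *m);
};

struct group {
	const struct group_ops *ops;
};

struct mark {
	struct group *group;	/* no per-mark callback pointer is stored */
	int freed;		/* bookkeeping for this sketch only */
};

/* Hypothetical backend callback; a real one would release the memory. */
static void demo_free_mark(struct mark *m)
{
	m->freed = 1;
}

static const struct group_ops demo_ops = {
	.free_mark = demo_free_mark,
};

/*
 * Final destruction finds the callback through the group instead of
 * through a field in the mark. A mark without a group cannot be freed
 * this way, so it is skipped.
 */
static void mark_final_destroy(struct mark *m)
{
	struct group *g = m->group;

	if (!g)
		return;
	assert(g->ops && g->ops->free_mark);
	g->ops->free_mark(m);
}
```

The trade-off is one extra pointer dereference on the final put in
exchange for one pointer less in every mark.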

Signed-off-by: Jan Kara <jack@suse.cz>
---
 fs/notify/dnotify/dnotify.c          |  3 ++-
 fs/notify/fanotify/fanotify.c        |  6 ++++++
 fs/notify/fanotify/fanotify.h        |  1 +
 fs/notify/fanotify/fanotify_user.c   |  9 ++-------
 fs/notify/inotify/inotify.h          |  2 ++
 fs/notify/inotify/inotify_fsnotify.c | 11 +++++++++++
 fs/notify/inotify/inotify_user.c     | 14 ++------------
 fs/notify/mark.c                     | 13 +++++++------
 include/linux/fsnotify_backend.h     |  6 +++---
 kernel/audit_fsnotify.c              |  4 ++--
 kernel/audit_tree.c                  |  4 ++--
 kernel/audit_watch.c                 |  4 ++--
 12 files changed, 42 insertions(+), 35 deletions(-)

diff --git a/fs/notify/dnotify/dnotify.c b/fs/notify/dnotify/dnotify.c
index b1a5c7aa7c80..a5ba7a16c484 100644
--- a/fs/notify/dnotify/dnotify.c
+++ b/fs/notify/dnotify/dnotify.c
@@ -135,6 +135,7 @@ static void dnotify_free_mark(struct fsnotify_mark *fsn_mark)
 
 static struct fsnotify_ops dnotify_fsnotify_ops = {
 	.handle_event = dnotify_handle_event,
+	.free_mark = dnotify_free_mark,
 };
 
 /*
@@ -305,7 +306,7 @@ int fcntl_dirnotify(int fd, struct file *filp, unsigned long arg)
 
 	/* set up the new_fsn_mark and new_dn_mark */
 	new_fsn_mark = &new_dn_mark->fsn_mark;
-	fsnotify_init_mark(new_fsn_mark, dnotify_group, dnotify_free_mark);
+	fsnotify_init_mark(new_fsn_mark, dnotify_group);
 	new_fsn_mark->mask = mask;
 	new_dn_mark->dn = NULL;
 
diff --git a/fs/notify/fanotify/fanotify.c b/fs/notify/fanotify/fanotify.c
index 284d2d112ad2..1fe740fc9e7e 100644
--- a/fs/notify/fanotify/fanotify.c
+++ b/fs/notify/fanotify/fanotify.c
@@ -266,8 +266,14 @@ static void fanotify_free_event(struct fsnotify_event *fsn_event)
 	kmem_cache_free(fanotify_event_cachep, event);
 }
 
+static void fanotify_free_mark(struct fsnotify_mark *fsn_mark)
+{
+	kmem_cache_free(fanotify_mark_cache, fsn_mark);
+}
+
 const struct fsnotify_ops fanotify_fsnotify_ops = {
 	.handle_event = fanotify_handle_event,
 	.free_group_priv = fanotify_free_group_priv,
 	.free_event = fanotify_free_event,
+	.free_mark = fanotify_free_mark,
 };
diff --git a/fs/notify/fanotify/fanotify.h b/fs/notify/fanotify/fanotify.h
index 4500a74f8d38..4eb6f5efa282 100644
--- a/fs/notify/fanotify/fanotify.h
+++ b/fs/notify/fanotify/fanotify.h
@@ -2,6 +2,7 @@
 #include <linux/path.h>
 #include <linux/slab.h>
 
+extern struct kmem_cache *fanotify_mark_cache;
 extern struct kmem_cache *fanotify_event_cachep;
 extern struct kmem_cache *fanotify_perm_event_cachep;
 
diff --git a/fs/notify/fanotify/fanotify_user.c b/fs/notify/fanotify/fanotify_user.c
index 4ba51b836a44..a1d5b121fdf8 100644
--- a/fs/notify/fanotify/fanotify_user.c
+++ b/fs/notify/fanotify/fanotify_user.c
@@ -40,7 +40,7 @@
 
 extern const struct fsnotify_ops fanotify_fsnotify_ops;
 
-static struct kmem_cache *fanotify_mark_cache __read_mostly;
+struct kmem_cache *fanotify_mark_cache __read_mostly;
 struct kmem_cache *fanotify_event_cachep __read_mostly;
 struct kmem_cache *fanotify_perm_event_cachep __read_mostly;
 
@@ -444,11 +444,6 @@ static const struct file_operations fanotify_fops = {
 	.llseek		= noop_llseek,
 };
 
-static void fanotify_free_mark(struct fsnotify_mark *fsn_mark)
-{
-	kmem_cache_free(fanotify_mark_cache, fsn_mark);
-}
-
 static int fanotify_find_path(int dfd, const char __user *filename,
 			      struct path *path, unsigned int flags)
 {
@@ -627,7 +622,7 @@ static struct fsnotify_mark *fanotify_add_new_mark(struct fsnotify_group *group,
 	if (!mark)
 		return ERR_PTR(-ENOMEM);
 
-	fsnotify_init_mark(mark, group, fanotify_free_mark);
+	fsnotify_init_mark(mark, group);
 	ret = fsnotify_add_mark_locked(mark, inode, mnt, 0);
 	if (ret) {
 		fsnotify_put_mark(mark);
diff --git a/fs/notify/inotify/inotify.h b/fs/notify/inotify/inotify.h
index 507840969edc..049e15da5564 100644
--- a/fs/notify/inotify/inotify.h
+++ b/fs/notify/inotify/inotify.h
@@ -31,3 +31,5 @@ extern int inotify_handle_event(struct fsnotify_group *group,
 				int *srcu_idx);
 
 extern const struct fsnotify_ops inotify_fsnotify_ops;
+
+extern struct kmem_cache *inotify_inode_mark_cachep;
diff --git a/fs/notify/inotify/inotify_fsnotify.c b/fs/notify/inotify/inotify_fsnotify.c
index 35f2f39514b8..a33aa3173a2d 100644
--- a/fs/notify/inotify/inotify_fsnotify.c
+++ b/fs/notify/inotify/inotify_fsnotify.c
@@ -177,9 +177,20 @@ static void inotify_free_event(struct fsnotify_event *fsn_event)
 	kfree(INOTIFY_E(fsn_event));
 }
 
+/* ding dong the mark is dead */
+static void inotify_free_mark(struct fsnotify_mark *fsn_mark)
+{
+	struct inotify_inode_mark *i_mark;
+
+	i_mark = container_of(fsn_mark, struct inotify_inode_mark, fsn_mark);
+
+	kmem_cache_free(inotify_inode_mark_cachep, i_mark);
+}
+
 const struct fsnotify_ops inotify_fsnotify_ops = {
 	.handle_event = inotify_handle_event,
 	.free_group_priv = inotify_free_group_priv,
 	.free_event = inotify_free_event,
 	.freeing_mark = inotify_freeing_mark,
+	.free_mark = inotify_free_mark,
 };
diff --git a/fs/notify/inotify/inotify_user.c b/fs/notify/inotify/inotify_user.c
index 38e66d497b3b..f16d591f0b41 100644
--- a/fs/notify/inotify/inotify_user.c
+++ b/fs/notify/inotify/inotify_user.c
@@ -49,7 +49,7 @@ static int inotify_max_user_instances __read_mostly;
 static int inotify_max_queued_events __read_mostly;
 static int inotify_max_user_watches __read_mostly;
 
-static struct kmem_cache *inotify_inode_mark_cachep __read_mostly;
+struct kmem_cache *inotify_inode_mark_cachep __read_mostly;
 
 #ifdef CONFIG_SYSCTL
 
@@ -486,16 +486,6 @@ void inotify_ignored_and_remove_idr(struct fsnotify_mark *fsn_mark,
 	atomic_dec(&group->inotify_data.user->inotify_watches);
 }
 
-/* ding dong the mark is dead */
-static void inotify_free_mark(struct fsnotify_mark *fsn_mark)
-{
-	struct inotify_inode_mark *i_mark;
-
-	i_mark = container_of(fsn_mark, struct inotify_inode_mark, fsn_mark);
-
-	kmem_cache_free(inotify_inode_mark_cachep, i_mark);
-}
-
 static int inotify_update_existing_watch(struct fsnotify_group *group,
 					 struct inode *inode,
 					 u32 arg)
@@ -561,7 +551,7 @@ static int inotify_new_watch(struct fsnotify_group *group,
 	if (unlikely(!tmp_i_mark))
 		return -ENOMEM;
 
-	fsnotify_init_mark(&tmp_i_mark->fsn_mark, group, inotify_free_mark);
+	fsnotify_init_mark(&tmp_i_mark->fsn_mark, group);
 	tmp_i_mark->fsn_mark.mask = mask;
 	tmp_i_mark->wd = -1;
 
diff --git a/fs/notify/mark.c b/fs/notify/mark.c
index f32f4ff4ecc4..c8aca8916de1 100644
--- a/fs/notify/mark.c
+++ b/fs/notify/mark.c
@@ -225,9 +225,12 @@ static void fsnotify_detach_from_object(struct fsnotify_mark *mark)
 
 static void fsnotify_final_mark_destroy(struct fsnotify_mark *mark)
 {
-	if (mark->group)
-		fsnotify_put_group(mark->group);
-	mark->free_mark(mark);
+	struct fsnotify_group *group = mark->group;
+
+	if (WARN_ON_ONCE(!group))
+		return;
+	group->ops->free_mark(mark);
+	fsnotify_put_group(group);
 }
 
 void fsnotify_put_mark(struct fsnotify_mark *mark)
@@ -754,13 +757,11 @@ void fsnotify_destroy_marks(struct fsnotify_mark_list **listp)
  * Nothing fancy, just initialize lists and locks and counters.
  */
 void fsnotify_init_mark(struct fsnotify_mark *mark,
-			struct fsnotify_group *group,
-			void (*free_mark)(struct fsnotify_mark *mark))
+			struct fsnotify_group *group)
 {
 	memset(mark, 0, sizeof(*mark));
 	spin_lock_init(&mark->lock);
 	atomic_set(&mark->refcnt, 1);
-	mark->free_mark = free_mark;
 	fsnotify_get_group(group);
 	mark->group = group;
 }
diff --git a/include/linux/fsnotify_backend.h b/include/linux/fsnotify_backend.h
index 292884ecc204..dfd66552c34f 100644
--- a/include/linux/fsnotify_backend.h
+++ b/include/linux/fsnotify_backend.h
@@ -102,6 +102,8 @@ struct fsnotify_ops {
 	void (*free_group_priv)(struct fsnotify_group *group);
 	void (*freeing_mark)(struct fsnotify_mark *mark, struct fsnotify_group *group);
 	void (*free_event)(struct fsnotify_event *event);
+	/* called on final put+free to free memory */
+	void (*free_mark)(struct fsnotify_mark *mark);
 };
 
 /*
@@ -257,7 +259,6 @@ struct fsnotify_mark {
 #define FSNOTIFY_MARK_FLAG_ALIVE		0x02
 #define FSNOTIFY_MARK_FLAG_ATTACHED		0x04
 	unsigned int flags;		/* flags [mark->lock] */
-	void (*free_mark)(struct fsnotify_mark *mark); /* called on final put+free */
 };
 
 #ifdef CONFIG_FSNOTIFY
@@ -337,8 +338,7 @@ extern struct fsnotify_event *fsnotify_remove_first_event(struct fsnotify_group
 /* Calculate mask of events for a list of marks */
 extern void fsnotify_recalc_mask(struct fsnotify_mark_list *list);
 extern void fsnotify_init_mark(struct fsnotify_mark *mark,
-			       struct fsnotify_group *group,
-			       void (*free_mark)(struct fsnotify_mark *mark));
+			       struct fsnotify_group *group);
 /* Find mark belonging to given group in the list of marks */
 extern struct fsnotify_mark *fsnotify_find_mark(
 					struct fsnotify_mark_list **listp,
diff --git a/kernel/audit_fsnotify.c b/kernel/audit_fsnotify.c
index 5bb7e920f126..7c724b59fe7d 100644
--- a/kernel/audit_fsnotify.c
+++ b/kernel/audit_fsnotify.c
@@ -103,8 +103,7 @@ struct audit_fsnotify_mark *audit_alloc_mark(struct audit_krule *krule, char *pa
 		goto out;
 	}
 
-	fsnotify_init_mark(&audit_mark->mark, audit_fsnotify_group,
-			   audit_fsnotify_free_mark);
+	fsnotify_init_mark(&audit_mark->mark, audit_fsnotify_group);
 	audit_mark->mark.mask = AUDIT_FS_EVENTS;
 	audit_mark->path = pathname;
 	audit_update_mark(audit_mark, dentry->d_inode);
@@ -203,6 +202,7 @@ static int audit_mark_handle_event(struct fsnotify_group *group,
 
 static const struct fsnotify_ops audit_mark_fsnotify_ops = {
 	.handle_event =	audit_mark_handle_event,
+	.free_mark = audit_fsnotify_free_mark,
 };
 
 static int __init audit_fsnotify_init(void)
diff --git a/kernel/audit_tree.c b/kernel/audit_tree.c
index bce7ee6c5fda..7b66474c960f 100644
--- a/kernel/audit_tree.c
+++ b/kernel/audit_tree.c
@@ -154,8 +154,7 @@ static struct audit_chunk *alloc_chunk(int count)
 		INIT_LIST_HEAD(&chunk->owners[i].list);
 		chunk->owners[i].index = i;
 	}
-	fsnotify_init_mark(&chunk->mark, audit_tree_group,
-			   audit_tree_destroy_watch);
+	fsnotify_init_mark(&chunk->mark, audit_tree_group);
 	chunk->mark.mask = FS_IN_IGNORED;
 	return chunk;
 }
@@ -997,6 +996,7 @@ static void audit_tree_freeing_mark(struct fsnotify_mark *entry, struct fsnotify
 static const struct fsnotify_ops audit_tree_ops = {
 	.handle_event = audit_tree_handle_event,
 	.freeing_mark = audit_tree_freeing_mark,
+	.free_mark = audit_tree_destroy_watch,
 };
 
 static int __init audit_tree_init(void)
diff --git a/kernel/audit_watch.c b/kernel/audit_watch.c
index 0f355086215a..d32b68b10cf7 100644
--- a/kernel/audit_watch.c
+++ b/kernel/audit_watch.c
@@ -157,8 +157,7 @@ static struct audit_parent *audit_init_parent(struct path *path)
 
 	INIT_LIST_HEAD(&parent->watches);
 
-	fsnotify_init_mark(&parent->mark, audit_watch_group,
-			   audit_watch_free_mark);
+	fsnotify_init_mark(&parent->mark, audit_watch_group);
 	parent->mark.mask = AUDIT_FS_WATCH;
 	ret = fsnotify_add_mark(&parent->mark, inode, NULL, 0);
 	if (ret < 0) {
@@ -508,6 +507,7 @@ static int audit_watch_handle_event(struct fsnotify_group *group,
 
 static const struct fsnotify_ops audit_watch_fsnotify_ops = {
 	.handle_event = 	audit_watch_handle_event,
+	.free_mark =		audit_watch_free_mark,
 };
 
 static int __init audit_watch_init(void)
-- 
2.10.2
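
For readers following the thread, the shape of this last patch can be sketched in userspace C. This is only an illustration under assumptions, not kernel code: struct group_ops, struct demo_mark, demo_free_mark() and demo_run() are hypothetical names, and the plain integer refcount stands in for the group reference counting. It shows the free callback living in one shared ops table per backend rather than in every mark, and the group being dropped only after the callback has run.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Userspace stand-in for the kernel's container_of(). */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

struct mark;

/* One shared, read-only table per backend instead of a pointer per mark. */
struct group_ops {
	void (*free_mark)(struct mark *mark);
};

struct group {
	const struct group_ops *ops;
	int refcnt;
};

struct mark {
	struct group *group;
};

/* Hypothetical backend object that embeds the generic mark. */
struct demo_mark {
	int wd;
	struct mark mark;
};

static int demo_marks_freed;

static void demo_free_mark(struct mark *mark)
{
	struct demo_mark *dm = container_of(mark, struct demo_mark, mark);

	free(dm);
	demo_marks_freed++;
}

static const struct group_ops demo_ops = {
	.free_mark = demo_free_mark,
};

/* Same ordering as the patch: free through the group, then drop the group. */
static void final_mark_destroy(struct mark *mark)
{
	struct group *group = mark->group;

	if (!group)
		return;
	group->ops->free_mark(mark);
	group->refcnt--;
}

/* Returns the group refcount after one mark was created and destroyed. */
static int demo_run(void)
{
	struct group grp = { .ops = &demo_ops, .refcnt = 1 };
	struct demo_mark *dm = malloc(sizeof(*dm));

	if (!dm)
		return -1;
	dm->wd = -1;
	dm->mark.group = &grp;
	grp.refcnt++;			/* the mark pins its group */
	final_mark_destroy(&dm->mark);
	return grp.refcnt;
}
```

The group pointer is read into a local before the callback runs because the mark's memory is gone once free_mark() returns.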


* Re: [PATCH 01/22] fsnotify: Remove unnecessary tests when showing fdinfo
  2016-12-22  9:15 ` [PATCH 01/22] fsnotify: Remove unnecessary tests when showing fdinfo Jan Kara
@ 2016-12-22 12:59   ` Amir Goldstein
  2016-12-22 15:16     ` Jan Kara
  0 siblings, 1 reply; 80+ messages in thread
From: Amir Goldstein @ 2016-12-22 12:59 UTC (permalink / raw)
  To: Jan Kara; +Cc: linux-fsdevel, Lino Sanfilippo, Miklos Szeredi, Paul Moore

On Thu, Dec 22, 2016 at 11:15 AM, Jan Kara <jack@suse.cz> wrote:
> show_fdinfo() iterates group's list of marks. All marks found there are
> guaranteed to be alive and they stay so until we release
> group->mark_mutex. So remove unnecessary tests whether mark is alive.
>

The statement above is true for fanotify. I don't think it holds for inotify.

SYS_inotify_rm_watch()
  fsnotify_destroy_mark()
    fsnotify_destroy_mark(mark, group)
        mutex_lock_nested(&group->mark_mutex) /* not really nested for inotify */
        fsnotify_detach_mark(mark)
        mutex_unlock(&group->mark_mutex);
        fsnotify_free_mark(mark)
           mark->flags &= ~FSNOTIFY_MARK_FLAG_ALIVE;
/* !!! mark is not alive and on the group's list. group->mark_mutex is not held !!! */
           list_add(&mark->g_list, &destroy_list);

> Signed-off-by: Jan Kara <jack@suse.cz>
> ---
>  fs/notify/fdinfo.c | 6 +-----
>  1 file changed, 1 insertion(+), 5 deletions(-)
>
> diff --git a/fs/notify/fdinfo.c b/fs/notify/fdinfo.c
> index fd98e5100cab..601a59c8d87e 100644
> --- a/fs/notify/fdinfo.c
> +++ b/fs/notify/fdinfo.c
> @@ -76,8 +76,7 @@ static void inotify_fdinfo(struct seq_file *m, struct fsnotify_mark *mark)
>         struct inotify_inode_mark *inode_mark;
>         struct inode *inode;
>
> -       if (!(mark->flags & FSNOTIFY_MARK_FLAG_ALIVE) ||
> -           !(mark->flags & FSNOTIFY_MARK_FLAG_INODE))
> +       if (!(mark->flags & FSNOTIFY_MARK_FLAG_INODE))
>                 return;
>
>         inode_mark = container_of(mark, struct inotify_inode_mark, fsn_mark);
> @@ -113,9 +112,6 @@ static void fanotify_fdinfo(struct seq_file *m, struct fsnotify_mark *mark)
>         unsigned int mflags = 0;
>         struct inode *inode;
>
> -       if (!(mark->flags & FSNOTIFY_MARK_FLAG_ALIVE))
> -               return;
> -
>         if (mark->flags & FSNOTIFY_MARK_FLAG_IGNORED_SURV_MODIFY)
>                 mflags |= FAN_MARK_IGNORED_SURV_MODIFY;
>
> --
> 2.10.2
>

* Re: [PATCH 01/22] fsnotify: Remove unnecessary tests when showing fdinfo
  2016-12-22 12:59   ` Amir Goldstein
@ 2016-12-22 15:16     ` Jan Kara
  2016-12-22 15:54       ` Amir Goldstein
  0 siblings, 1 reply; 80+ messages in thread
From: Jan Kara @ 2016-12-22 15:16 UTC (permalink / raw)
  To: Amir Goldstein
  Cc: Jan Kara, linux-fsdevel, Lino Sanfilippo, Miklos Szeredi, Paul Moore

On Thu 22-12-16 14:59:35, Amir Goldstein wrote:
> On Thu, Dec 22, 2016 at 11:15 AM, Jan Kara <jack@suse.cz> wrote:
> > show_fdinfo() iterates group's list of marks. All marks found there are
> > guaranteed to be alive and they stay so until we release
> > group->mark_mutex. So remove unnecessary tests whether mark is alive.
> >
> 
> The statement above is true for fanotify. I don't think it holds for inotify.
> 
> SYS_inotify_rm_watch()
>   fsnotify_destroy_mark()
>     fsnotify_destroy_mark(mark, group)
>         mutex_lock_nested(&group->mark_mutex) /* not really nested for inotify */
>         fsnotify_detach_mark(mark)
>         mutex_unlock(&group->mark_mutex);
>         fsnotify_free_mark(mark)
>            mark->flags &= ~FSNOTIFY_MARK_FLAG_ALIVE;
> /* !!! mark is not alive and on the group's list. group->mark_mutex is not held !!! */

How come it is on the group's list? fsnotify_detach_mark() will remove it
from that list... The destroy_list is just a private list used for mark
destruction, not a list of any group.
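
A minimal userspace sketch of that ordering, under assumptions: the singly linked list, MARK_ALIVE, and the helper names below are hypothetical stand-ins, not the kernel's data structures. The point it illustrates is only that the mark is unlinked from the group's list under mark_mutex before the alive flag is cleared, so a reader that walks the group list under the same mutex never sees a dead mark.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define MARK_ALIVE 0x1

struct mark {
	unsigned int flags;
	struct mark *next;	/* link in whichever list holds the mark */
};

struct group {
	pthread_mutex_t mark_mutex;
	struct mark *marks;	/* list walked by an fdinfo-style reader */
};

static struct mark *destroy_list;	/* private list, belongs to no group */

/* Unlink @mark from @group's list; caller holds group->mark_mutex. */
static void detach_mark(struct group *group, struct mark *mark)
{
	struct mark **pp;

	for (pp = &group->marks; *pp; pp = &(*pp)->next) {
		if (*pp == mark) {
			*pp = mark->next;
			mark->next = NULL;
			return;
		}
	}
}

static void destroy_mark(struct group *group, struct mark *mark)
{
	pthread_mutex_lock(&group->mark_mutex);
	detach_mark(group, mark);
	pthread_mutex_unlock(&group->mark_mutex);

	/* Only now is ALIVE cleared; the mark is already off the group list. */
	mark->flags &= ~MARK_ALIVE;
	mark->next = destroy_list;
	destroy_list = mark;
}

/* Count marks on the group list that are not alive. */
static int count_dead_on_group_list(struct group *group)
{
	struct mark *m;
	int dead = 0;

	pthread_mutex_lock(&group->mark_mutex);
	for (m = group->marks; m; m = m->next)
		if (!(m->flags & MARK_ALIVE))
			dead++;
	pthread_mutex_unlock(&group->mark_mutex);
	return dead;
}

static int demo_detach_then_free(void)
{
	struct group grp = { .marks = NULL };
	struct mark a = { .flags = MARK_ALIVE };
	struct mark b = { .flags = MARK_ALIVE };
	int dead;

	pthread_mutex_init(&grp.mark_mutex, NULL);
	a.next = &b;
	grp.marks = &a;

	destroy_mark(&grp, &a);
	assert(grp.marks == &b);	/* only the live mark remains */
	assert(destroy_list == &a);	/* the dead one sits on the private list */
	dead = count_dead_on_group_list(&grp);

	destroy_list = NULL;		/* do not leave a pointer to the stack */
	pthread_mutex_destroy(&grp.mark_mutex);
	return dead;
}
```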

>            list_add(&mark->g_list, &destroy_list);

								Honza

> 
> > Signed-off-by: Jan Kara <jack@suse.cz>
> > ---
> >  fs/notify/fdinfo.c | 6 +-----
> >  1 file changed, 1 insertion(+), 5 deletions(-)
> >
> > diff --git a/fs/notify/fdinfo.c b/fs/notify/fdinfo.c
> > index fd98e5100cab..601a59c8d87e 100644
> > --- a/fs/notify/fdinfo.c
> > +++ b/fs/notify/fdinfo.c
> > @@ -76,8 +76,7 @@ static void inotify_fdinfo(struct seq_file *m, struct fsnotify_mark *mark)
> >         struct inotify_inode_mark *inode_mark;
> >         struct inode *inode;
> >
> > -       if (!(mark->flags & FSNOTIFY_MARK_FLAG_ALIVE) ||
> > -           !(mark->flags & FSNOTIFY_MARK_FLAG_INODE))
> > +       if (!(mark->flags & FSNOTIFY_MARK_FLAG_INODE))
> >                 return;
> >
> >         inode_mark = container_of(mark, struct inotify_inode_mark, fsn_mark);
> > @@ -113,9 +112,6 @@ static void fanotify_fdinfo(struct seq_file *m, struct fsnotify_mark *mark)
> >         unsigned int mflags = 0;
> >         struct inode *inode;
> >
> > -       if (!(mark->flags & FSNOTIFY_MARK_FLAG_ALIVE))
> > -               return;
> > -
> >         if (mark->flags & FSNOTIFY_MARK_FLAG_IGNORED_SURV_MODIFY)
> >                 mflags |= FAN_MARK_IGNORED_SURV_MODIFY;
> >
> > --
> > 2.10.2
> >
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

* Re: [PATCH 02/22] inotify: Remove inode pointers from debug messages
  2016-12-22  9:15 ` [PATCH 02/22] inotify: Remove inode pointers from debug messages Jan Kara
@ 2016-12-22 15:31   ` Amir Goldstein
  0 siblings, 0 replies; 80+ messages in thread
From: Amir Goldstein @ 2016-12-22 15:31 UTC (permalink / raw)
  To: Jan Kara; +Cc: linux-fsdevel, Lino Sanfilippo, Miklos Szeredi, Paul Moore

On Thu, Dec 22, 2016 at 11:15 AM, Jan Kara <jack@suse.cz> wrote:
> Printing inode pointers in warnings has dubious value and with future
> changes we won't be able to easily get them without either locking or
> risking an oops along the way. So just remove inode pointers from the
> warning messages.
>
> Signed-off-by: Jan Kara <jack@suse.cz>

Reviewed-by: Amir Goldstein <amir73il@gmail.com>

* Re: [PATCH 01/22] fsnotify: Remove unnecessary tests when showing fdinfo
  2016-12-22 15:16     ` Jan Kara
@ 2016-12-22 15:54       ` Amir Goldstein
  0 siblings, 0 replies; 80+ messages in thread
From: Amir Goldstein @ 2016-12-22 15:54 UTC (permalink / raw)
  To: Jan Kara; +Cc: linux-fsdevel, Lino Sanfilippo, Miklos Szeredi, Paul Moore

On Thu, Dec 22, 2016 at 5:16 PM, Jan Kara <jack@suse.cz> wrote:
> On Thu 22-12-16 14:59:35, Amir Goldstein wrote:
>> On Thu, Dec 22, 2016 at 11:15 AM, Jan Kara <jack@suse.cz> wrote:
>> > show_fdinfo() iterates group's list of marks. All marks found there are
>> > guaranteed to be alive and they stay so until we release
>> > group->mark_mutex. So remove unnecessary tests whether mark is alive.
>> >
>>
>> The statement above is true for fanotify. I don't think it holds for inotify.
>>
>> SYS_inotify_rm_watch()
>>   fsnotify_destroy_mark()
>>     fsnotify_destroy_mark(mark, group)
>>         mutex_lock_nested(&group->mark_mutex) /* not really nested for inotify */
>>         fsnotify_detach_mark(mark)
>>         mutex_unlock(&group->mark_mutex);
>>         fsnotify_free_mark(mark)
>>            mark->flags &= ~FSNOTIFY_MARK_FLAG_ALIVE;
>> /* !!! mark is not alive and on the group's list. group->mark_mutex is not held !!! */
>
> How come it is on the group's list? fsnotify_detach_mark() will remove it
> from that list... The destroy_list is just a private list used for mark
> destruction, not a list of any group.
>

I stand corrected. This is bound to happen a few more times ;-)

Reviewed-by: Amir Goldstein <amir73il@gmail.com>

* Re: [PATCH 03/22] fanotify: Move recalculation of inode / vfsmount mask under mark_mutex
  2016-12-22  9:15 ` [PATCH 03/22] fanotify: Move recalculation of inode / vfsmount mask under mark_mutex Jan Kara
@ 2016-12-22 16:27   ` Amir Goldstein
  2016-12-22 17:31     ` Jan Kara
  0 siblings, 1 reply; 80+ messages in thread
From: Amir Goldstein @ 2016-12-22 16:27 UTC (permalink / raw)
  To: Jan Kara; +Cc: linux-fsdevel, Lino Sanfilippo, Miklos Szeredi, Paul Moore

On Thu, Dec 22, 2016 at 11:15 AM, Jan Kara <jack@suse.cz> wrote:
> Move recalculation of inode / vfsmount notification mask under
> group->mark_mutex of the mark which was modified. These are the only
> places where mask recalculation happens without mark being protected
> from detaching from inode / vfsmount which will cause issues with the
> following patches.
>

What am I missing this time?

dnotify_handle_event()
   dnotify_recalc_inode_mask()
      fsnotify_recalc_inode_mask() /* not under dnotify_group->mark_mutex */

* Re: [PATCH 03/22] fanotify: Move recalculation of inode / vfsmount mask under mark_mutex
  2016-12-22 16:27   ` Amir Goldstein
@ 2016-12-22 17:31     ` Jan Kara
  2016-12-22 19:08       ` Amir Goldstein
  0 siblings, 1 reply; 80+ messages in thread
From: Jan Kara @ 2016-12-22 17:31 UTC (permalink / raw)
  To: Amir Goldstein
  Cc: Jan Kara, linux-fsdevel, Lino Sanfilippo, Miklos Szeredi, Paul Moore

On Thu 22-12-16 18:27:11, Amir Goldstein wrote:
> On Thu, Dec 22, 2016 at 11:15 AM, Jan Kara <jack@suse.cz> wrote:
> > Move recalculation of inode / vfsmount notification mask under
> > group->mark_mutex of the mark which was modified. These are the only
> > places where mask recalculation happens without mark being protected
> > from detaching from inode / vfsmount which will cause issues with the
> > following patches.
> >
> 
> What am I missing this time?
> 
> dnotify_handle_event()
>    dnotify_recalc_inode_mask()
>       fsnotify_recalc_inode_mask() /* not under dnotify_group->mark_mutex */

Dnotify uses mark->lock to protect recalculation of mask which also
protects mark from being removed from the list.
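
As a rough userspace illustration of that point (a sketch under assumptions, not the dnotify code): the per-mark mutex below stands in for mark->lock, and struct watcher, the attached flag, and the function names are hypothetical. Because both the mask recalculation and the detach path take the same per-mark lock, the recalculation cannot race with the mark being detached.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

struct watcher {
	uint32_t mask;
	struct watcher *next;
};

struct mark {
	pthread_mutex_t lock;	/* stand-in for mark->lock */
	int attached;
	uint32_t mask;
	struct watcher *watchers;
};

/* Caller must hold mark->lock. */
static void recalc_mask_locked(struct mark *mark)
{
	uint32_t mask = 0;
	struct watcher *w;

	for (w = mark->watchers; w; w = w->next)
		mask |= w->mask;
	mark->mask = mask;
}

/*
 * Event-handler side: drop the first watcher and recompute the mask.
 * Returns 0 on success, -1 if the mark was already detached.
 */
static int drop_watcher_and_recalc(struct mark *mark)
{
	int ret = -1;

	pthread_mutex_lock(&mark->lock);
	if (mark->attached) {
		if (mark->watchers)
			mark->watchers = mark->watchers->next;
		recalc_mask_locked(mark);
		ret = 0;
	}
	pthread_mutex_unlock(&mark->lock);
	return ret;
}

/* Detach side: takes the same lock, so it cannot interleave with a recalc. */
static void detach_mark(struct mark *mark)
{
	pthread_mutex_lock(&mark->lock);
	mark->attached = 0;
	pthread_mutex_unlock(&mark->lock);
}

static uint32_t demo_recalc(void)
{
	struct watcher w2 = { .mask = 0x2, .next = NULL };
	struct watcher w1 = { .mask = 0x1, .next = &w2 };
	struct mark m = { .attached = 1, .mask = 0x3, .watchers = &w1 };
	uint32_t mask;

	pthread_mutex_init(&m.lock, NULL);
	assert(drop_watcher_and_recalc(&m) == 0);
	mask = m.mask;			/* only w2 contributes now */
	detach_mark(&m);
	assert(drop_watcher_and_recalc(&m) == -1);
	pthread_mutex_destroy(&m.lock);
	return mask;
}
```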

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

* Re: [PATCH 03/22] fanotify: Move recalculation of inode / vfsmount mask under mark_mutex
  2016-12-22 17:31     ` Jan Kara
@ 2016-12-22 19:08       ` Amir Goldstein
  0 siblings, 0 replies; 80+ messages in thread
From: Amir Goldstein @ 2016-12-22 19:08 UTC (permalink / raw)
  To: Jan Kara; +Cc: linux-fsdevel, Lino Sanfilippo, Miklos Szeredi, Paul Moore

On Thu, Dec 22, 2016 at 7:31 PM, Jan Kara <jack@suse.cz> wrote:
> On Thu 22-12-16 18:27:11, Amir Goldstein wrote:
>> On Thu, Dec 22, 2016 at 11:15 AM, Jan Kara <jack@suse.cz> wrote:
>> > Move recalculation of inode / vfsmount notification mask under
>> > group->mark_mutex of the mark which was modified. These are the only
>> > places where mask recalculation happens without mark being protected
>> > from detaching from inode / vfsmount which will cause issues with the
>> > following patches.
>> >
>>
>> What am I missing this time?
>>
>> dnotify_handle_event()
>>    dnotify_recalc_inode_mask()
>>       fsnotify_recalc_inode_mask() /* not under dnotify_group->mark_mutex */
>
> Dnotify uses mark->lock to protect recalculation of mask which also
> protects mark from being removed from the list.
>

Right. Thanks for clarifying.

Reviewed-by: Amir Goldstein <amir73il@gmail.com>

* Re: [PATCH 0/22] fsnotify: Avoid SRCU stalls with fanotify permission events
  2016-12-22  9:15 [PATCH 0/22] fsnotify: Avoid SRCU stalls with fanotify permission events Jan Kara
                   ` (21 preceding siblings ...)
  2016-12-22  9:15 ` [PATCH 22/22] fsnotify: Move ->free_mark callback to fsnotify_ops Jan Kara
@ 2016-12-22 20:58 ` Paul Moore
  2016-12-22 21:05   ` Amir Goldstein
  22 siblings, 1 reply; 80+ messages in thread
From: Paul Moore @ 2016-12-22 20:58 UTC (permalink / raw)
  To: Jan Kara
  Cc: linux-fsdevel, Amir Goldstein, Lino Sanfilippo, Miklos Szeredi,
	linux-audit

On Thu, Dec 22, 2016 at 4:15 AM, Jan Kara <jack@suse.cz> wrote:
> Hello,
>
> currently, fanotify waits for response to a permission event from userspace
> process while holding fsnotify_mark_srcu lock. That has a consequence that
> when userspace process takes long to respond or does not respond at all,
> fsnotify_mark_srcu period cannot ever complete blocking reclaim of any
> notification marks and also blocking any process that did synchronize_srcu()
> on fsnotify_mark_srcu. Effectively, this eventually blocks anybody interacting
> with the notification subsystem. Miklos has some real world reports of this
> happening. Although this is in principle a problem of a broken userspace
> application (which furthermore has to have CAP_SYS_ADMIN in init_user_ns, so
> it is not a security problem), it is still nasty that a simple error can
> block the kernel like this.
>
> This patch set solves this problem ...
>
> Patches have survived testing with inotify/fanotify tests in LTP. I didn't test
> audit - Paul can you give these patches some testing?  Since some of the
> changes are really non-trivial, I'd welcome if someone reviewed the patch set.

I'm going to take a look at the audit related patches right now,
expect some feedback shortly.

In the meantime, if you wanted to play a bit with some simple audit
regression tests, check out the testsuite below:

* https://github.com/linux-audit/audit-testsuite

... it is still rather simplistic, but the tests in tests/file_* and
tests/exec_name should do some basic exercises of the audit code that
leverages fsnotify.  If nothing else, it should give you some ideas
about how you might stress this a bit more with audit.

-- 
paul moore
www.paul-moore.com

* Re: [PATCH 0/22] fsnotify: Avoid SRCU stalls with fanotify permission events
  2016-12-22 20:58 ` [PATCH 0/22] fsnotify: Avoid SRCU stalls with fanotify permission events Paul Moore
@ 2016-12-22 21:05   ` Amir Goldstein
  2016-12-22 23:04     ` Paul Moore
  0 siblings, 1 reply; 80+ messages in thread
From: Amir Goldstein @ 2016-12-22 21:05 UTC (permalink / raw)
  To: Paul Moore
  Cc: Jan Kara, linux-fsdevel, Lino Sanfilippo, Miklos Szeredi, linux-audit

On Thu, Dec 22, 2016 at 10:58 PM, Paul Moore <paul@paul-moore.com> wrote:
> On Thu, Dec 22, 2016 at 4:15 AM, Jan Kara <jack@suse.cz> wrote:
>> Hello,
>>
>> currently, fanotify waits for response to a permission event from userspace
>> process while holding fsnotify_mark_srcu lock. That has a consequence that
>> when userspace process takes long to respond or does not respond at all,
>> fsnotify_mark_srcu period cannot ever complete blocking reclaim of any
>> notification marks and also blocking any process that did synchronize_srcu()
>> on fsnotify_mark_srcu. Effectively, this eventually blocks anybody interacting
>> with the notification subsystem. Miklos has some real world reports of this
>> happening. Although this is in principle a problem of a broken userspace
>> application (which furthermore has to have CAP_SYS_ADMIN in init_user_ns, so
>> it is not a security problem), it is still nasty that a simple error can
>> block the kernel like this.
>>
>> This patch set solves this problem ...
>>
>> Patches have survived testing with inotify/fanotify tests in LTP. I didn't test
>> audit - Paul can you give these patches some testing?  Since some of the
>> changes are really non-trivial, I'd welcome if someone reviewed the patch set.
>
> I'm going to take a look at the audit related patches right now,
> expect some feedback shortly.
>
> In the meantime, if you wanted to play a bit with some simple audit
> regression tests, check out the testsuite below:
>
> * https://github.com/linux-audit/audit-testsuite
>
> ... it is still rather simplistic, but the tests in tests/file_* and
> tests/exec_name should do some basic exercises of the audit code that
> leverages fsnotify.  If nothing else, it should give you some ideas
> about how you might stress this a bit more with audit.
>


Mmm that's interesting. I was looking for a good place to start with a proper
testsuite for fsnotify.
It seems like the 2 subsystems could use the same testsuite.

I will look into it.

Thanks!

* Re: [PATCH 0/22] fsnotify: Avoid SRCU stalls with fanotify permission events
  2016-12-22 21:05   ` Amir Goldstein
@ 2016-12-22 23:04     ` Paul Moore
  0 siblings, 0 replies; 80+ messages in thread
From: Paul Moore @ 2016-12-22 23:04 UTC (permalink / raw)
  To: Amir Goldstein
  Cc: Jan Kara, linux-fsdevel, Lino Sanfilippo, Miklos Szeredi, linux-audit

On Thu, Dec 22, 2016 at 4:05 PM, Amir Goldstein <amir73il@gmail.com> wrote:
> On Thu, Dec 22, 2016 at 10:58 PM, Paul Moore <paul@paul-moore.com> wrote:
>> On Thu, Dec 22, 2016 at 4:15 AM, Jan Kara <jack@suse.cz> wrote:
>>> Hello,
>>>
>>> currently, fanotify waits for response to a permission event from userspace
>>> process while holding fsnotify_mark_srcu lock. That has a consequence that
>>> when userspace process takes long to respond or does not respond at all,
>>> fsnotify_mark_srcu period cannot ever complete blocking reclaim of any
>>> notification marks and also blocking any process that did synchronize_srcu()
>>> on fsnotify_mark_srcu. Effectively, this eventually blocks anybody interacting
>>> with the notification subsystem. Miklos has some real world reports of this
>>> happening. Although this is in principle a problem of a broken userspace
>>> application (which furthermore has to have CAP_SYS_ADMIN in init_user_ns, so
>>> it is not a security problem), it is still nasty that a simple error can
>>> block the kernel like this.
>>>
>>> This patch set solves this problem ...
>>>
>>> Patches have survived testing with inotify/fanotify tests in LTP. I didn't test
>>> audit - Paul can you give these patches some testing?  Since some of the
>>> changes are really non-trivial, I'd welcome if someone reviewed the patch set.
>>
>> I'm going to take a look at the audit related patches right now,
>> expect some feedback shortly.
>>
>> In the meantime, if you wanted to play a bit with some simple audit
>> regression tests, check out the testsuite below:
>>
>> * https://github.com/linux-audit/audit-testsuite
>>
>> ... it is still rather simplistic, but the tests in tests/file_* and
>> tests/exec_name should do some basic exercises of the audit code that
>> leverages fsnotify.  If nothing else, it should give you some ideas
>> about how you might stress this a bit more with audit.
>
> Mmm that's interesting. I was looking for a good place to start with a proper
> testsuite for fsnotify.
> It seems like the 2 subsystems could use the same testsuite.
>
> I will look into it.
>
> Thanks!

No problem, I'm glad it's helpful.

FWIW, it's based off ideas from the selinux-testsuite (link below);
the general motivation being a quick and easy regression test that can
be used to verify patches and general upstream development.

* https://github.com/SELinuxProject/selinux-testsuite

In addition to individual commit testing, I've combined both the audit
and SELinux testsuites with a semi-automated weekly kernel build to
test both the -rcX releases as well as the selinux/next and audit/next
branches; it's proved quite beneficial.  In case you're curious, I did
a short presentation on it this summer (slides and video at the link
below).  If you are interested, I'm happy to talk about it further,
but perhaps in another thread - I don't want to hijack Jan's patchset
with marginally relevant testing discussion :)

* http://www.paul-moore.com/blog/d/2016/08/flock-kernel-testing.html

-- 
paul moore
www.paul-moore.com

* Re: [PATCH 04/22] fsnotify: Remove fsnotify_duplicate_mark()
  2016-12-22  9:15 ` [PATCH 04/22] fsnotify: Remove fsnotify_duplicate_mark() Jan Kara
@ 2016-12-22 23:13   ` Paul Moore
  2016-12-23 13:22     ` Jan Kara
  0 siblings, 1 reply; 80+ messages in thread
From: Paul Moore @ 2016-12-22 23:13 UTC (permalink / raw)
  To: Jan Kara
  Cc: linux-fsdevel, Amir Goldstein, Lino Sanfilippo, Miklos Szeredi,
	linux-audit

On Thu, Dec 22, 2016 at 4:15 AM, Jan Kara <jack@suse.cz> wrote:
> There are only two call sites of fsnotify_duplicate_mark(). Those are
> in kernel/audit_tree.c and both are bogus. Vfsmount pointer is unused
> for audit tree, inode pointer and group gets set in
> fsnotify_add_mark_locked() later anyway, mask and free_mark are already
> set in alloc_chunk(). In fact, calling fsnotify_duplicate_mark() is
> actively harmful because following fsnotify_add_mark_locked() will leak
> group reference by overwriting the group pointer. So just remove the two
> calls to fsnotify_duplicate_mark() and the function.
>
> Signed-off-by: Jan Kara <jack@suse.cz>
> ---
>  fs/notify/mark.c                 | 12 ------------
>  include/linux/fsnotify_backend.h |  2 --
>  kernel/audit_tree.c              |  6 ++----
>  3 files changed, 2 insertions(+), 18 deletions(-)

At first glance this looks reasonable, although you keep mentioning
"fsnotify_add_mark_locked" above when untag_chunk() is calling
"fsnotify_add_mark"; I just wanted to make sure you hadn't intended to
take the mutex in the audit code instead of relying on the locking in
fsnotify_add_mark().
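
For context, the reference leak described in the commit message can be illustrated with a small userspace sketch. It assumes, as the commit message states, that the add path takes its own group reference and overwrites the group pointer; the struct layouts and function names here are hypothetical stand-ins rather than the kernel's.

```c
#include <assert.h>
#include <stddef.h>

struct group {
	int refcnt;
};

struct mark {
	struct group *group;
};

static void get_group(struct group *g) { g->refcnt++; }
static void put_group(struct group *g) { g->refcnt--; }

/* Stand-in for the removed helper: copies the pointer and takes a reference. */
static void duplicate_mark(struct mark *fresh, const struct mark *old)
{
	if (old->group)
		get_group(old->group);
	fresh->group = old->group;
}

/* Stand-in for the add path: takes its own reference and overwrites ->group. */
static void add_mark(struct mark *mark, struct group *group)
{
	get_group(group);
	mark->group = group;	/* any reference taken earlier is now forgotten */
}

/* Destruction drops exactly one reference, the one ->group accounts for. */
static void destroy_mark(struct mark *mark)
{
	if (mark->group)
		put_group(mark->group);
	mark->group = NULL;
}

/* Returns how many group references are left over after the mark is gone. */
static int leaked_refs(int call_duplicate)
{
	struct group grp = { .refcnt = 1 };	/* the owner's own reference */
	struct mark old = { .group = &grp };
	struct mark fresh = { .group = NULL };

	if (call_duplicate)
		duplicate_mark(&fresh, &old);
	add_mark(&fresh, &grp);
	destroy_mark(&fresh);
	return grp.refcnt - 1;
}
```

With the duplicate call in place one reference is never dropped; without it the count balances, which matches the reasoning for removing the two calls.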

> diff --git a/fs/notify/mark.c b/fs/notify/mark.c
> index d3fea0bd89e2..6043306e8e21 100644
> --- a/fs/notify/mark.c
> +++ b/fs/notify/mark.c
> @@ -510,18 +510,6 @@ void fsnotify_detach_group_marks(struct fsnotify_group *group)
>         }
>  }
>
> -void fsnotify_duplicate_mark(struct fsnotify_mark *new, struct fsnotify_mark *old)
> -{
> -       assert_spin_locked(&old->lock);
> -       new->inode = old->inode;
> -       new->mnt = old->mnt;
> -       if (old->group)
> -               fsnotify_get_group(old->group);
> -       new->group = old->group;
> -       new->mask = old->mask;
> -       new->free_mark = old->free_mark;
> -}
> -
>  /*
>   * Nothing fancy, just initialize lists and locks and counters.
>   */
> diff --git a/include/linux/fsnotify_backend.h b/include/linux/fsnotify_backend.h
> index 0cf34d6cc253..487246546ebe 100644
> --- a/include/linux/fsnotify_backend.h
> +++ b/include/linux/fsnotify_backend.h
> @@ -323,8 +323,6 @@ extern void fsnotify_init_mark(struct fsnotify_mark *mark, void (*free_mark)(str
>  extern struct fsnotify_mark *fsnotify_find_inode_mark(struct fsnotify_group *group, struct inode *inode);
>  /* find (and take a reference) to a mark associated with group and vfsmount */
>  extern struct fsnotify_mark *fsnotify_find_vfsmount_mark(struct fsnotify_group *group, struct vfsmount *mnt);
> -/* copy the values from old into new */
> -extern void fsnotify_duplicate_mark(struct fsnotify_mark *new, struct fsnotify_mark *old);
>  /* set the ignored_mask of a mark */
>  extern void fsnotify_set_mark_ignored_mask_locked(struct fsnotify_mark *mark, __u32 mask);
>  /* set the mask of a mark (might pin the object into memory */
> diff --git a/kernel/audit_tree.c b/kernel/audit_tree.c
> index 8b1dde96a0fa..f3130eb0a4bd 100644
> --- a/kernel/audit_tree.c
> +++ b/kernel/audit_tree.c
> @@ -258,8 +258,7 @@ static void untag_chunk(struct node *p)
>         if (!new)
>                 goto Fallback;
>
> -       fsnotify_duplicate_mark(&new->mark, entry);
> -       if (fsnotify_add_mark(&new->mark, new->mark.group, new->mark.inode, NULL, 1)) {
> +       if (fsnotify_add_mark(&new->mark, entry->group, entry->inode, NULL, 1)) {
>                 fsnotify_put_mark(&new->mark);
>                 goto Fallback;
>         }
> @@ -395,8 +394,7 @@ static int tag_chunk(struct inode *inode, struct audit_tree *tree)
>                 return -ENOENT;
>         }
>
> -       fsnotify_duplicate_mark(chunk_entry, old_entry);
> -       if (fsnotify_add_mark(chunk_entry, chunk_entry->group, chunk_entry->inode, NULL, 1)) {
> +       if (fsnotify_add_mark(chunk_entry, old_entry->group, old_entry->inode, NULL, 1)) {
>                 spin_unlock(&old_entry->lock);
>                 fsnotify_put_mark(chunk_entry);
>                 fsnotify_put_mark(old_entry);
> --
> 2.10.2
>

-- 
paul moore
www.paul-moore.com

* Re: [PATCH 05/22] audit: Fix sleep in atomic
  2016-12-22  9:15 ` [PATCH 05/22] audit: Fix sleep in atomic Jan Kara
@ 2016-12-22 23:18   ` Paul Moore
  2016-12-23 13:24     ` Jan Kara
  0 siblings, 1 reply; 80+ messages in thread
From: Paul Moore @ 2016-12-22 23:18 UTC (permalink / raw)
  To: Jan Kara
  Cc: linux-fsdevel, Amir Goldstein, Lino Sanfilippo, Miklos Szeredi,
	linux-audit

On Thu, Dec 22, 2016 at 4:15 AM, Jan Kara <jack@suse.cz> wrote:
> Audit tree code was happily adding new notification marks while holding
> spinlocks. Since fsnotify_add_mark() acquires group->mark_mutex this can
> lead to sleeping while holding a spinlock, deadlocks due to lock
> inversion, and probably other fun. Fix the problem by acquiring
> group->mark_mutex earlier.
>
> CC: Paul Moore <paul@paul-moore.com>
> Signed-off-by: Jan Kara <jack@suse.cz>
> ---
>  kernel/audit_tree.c | 13 +++++++++++--
>  1 file changed, 11 insertions(+), 2 deletions(-)

[SIDE NOTE: this patch explains your comments and my earlier concern
about the locked/unlocked variants of fsnotify_add_mark() in
untag_chunk()]
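
The locked/unlocked split and the lock ordering the patch moves to can be sketched in userspace as follows. This is only an illustration under assumptions: both locks are pthread mutexes here, with entry->lock standing in for the mark spinlock, and the struct and function names are hypothetical. The sleeping mutex is taken first, and only the _locked variant is called once the inner lock is held.

```c
#include <assert.h>
#include <pthread.h>

struct group {
	pthread_mutex_t mark_mutex;	/* sleeping lock */
	int nr_marks;
};

struct entry {
	pthread_mutex_t lock;		/* stand-in for the mark spinlock */
	struct group *group;
};

/* Caller must already hold group->mark_mutex. */
static int add_mark_locked(struct group *group)
{
	group->nr_marks++;
	return 0;
}

/* Takes the mutex itself, so it must not be called from atomic context. */
static int add_mark(struct group *group)
{
	int ret;

	pthread_mutex_lock(&group->mark_mutex);
	ret = add_mark_locked(group);
	pthread_mutex_unlock(&group->mark_mutex);
	return ret;
}

/* Ordering after the fix: outer mutex first, inner lock second. */
static int tag_entry(struct entry *entry)
{
	int ret;

	pthread_mutex_lock(&entry->group->mark_mutex);	/* may sleep */
	pthread_mutex_lock(&entry->lock);		/* atomic section begins */
	ret = add_mark_locked(entry->group);		/* nothing sleeps in here */
	pthread_mutex_unlock(&entry->lock);
	pthread_mutex_unlock(&entry->group->mark_mutex);
	return ret;
}

/* Returns the number of marks added through both paths. */
static int demo_lock_order(void)
{
	struct group grp = { .nr_marks = 0 };
	struct entry ent = { .group = &grp };
	int ret;

	pthread_mutex_init(&grp.mark_mutex, NULL);
	pthread_mutex_init(&ent.lock, NULL);

	ret = add_mark(&grp);		/* plain caller, no locks held */
	assert(ret == 0);
	ret = tag_entry(&ent);		/* caller that needs the inner lock */
	assert(ret == 0);

	pthread_mutex_destroy(&ent.lock);
	pthread_mutex_destroy(&grp.mark_mutex);
	return grp.nr_marks;
}
```

Calling add_mark() from inside the inner section would be the userspace analogue of the bug: it would try to take a sleeping lock while the atomic section is open.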

Ouch.  Thanks for catching this ... what is your goal with these
patches, are you targeting this as a fix during the v4.10-rcX cycle?
If not, any objections if I pull this patch into the audit tree and
send this to Linus during the v4.10-rcX cycle (assuming it passes
testing, yadda yadda)?

> diff --git a/kernel/audit_tree.c b/kernel/audit_tree.c
> index f3130eb0a4bd..156b6a93f4fc 100644
> --- a/kernel/audit_tree.c
> +++ b/kernel/audit_tree.c
> @@ -231,6 +231,7 @@ static void untag_chunk(struct node *p)
>         if (size)
>                 new = alloc_chunk(size);
>
> +       mutex_lock(&entry->group->mark_mutex);
>         spin_lock(&entry->lock);
>         if (chunk->dead || !entry->inode) {
>                 spin_unlock(&entry->lock);
> @@ -258,7 +259,8 @@ static void untag_chunk(struct node *p)
>         if (!new)
>                 goto Fallback;
>
> -       if (fsnotify_add_mark(&new->mark, entry->group, entry->inode, NULL, 1)) {
> +       if (fsnotify_add_mark_locked(&new->mark, entry->group, entry->inode,
> +                                    NULL, 1)) {
>                 fsnotify_put_mark(&new->mark);
>                 goto Fallback;
>         }
> @@ -309,6 +311,7 @@ static void untag_chunk(struct node *p)
>         spin_unlock(&hash_lock);
>         spin_unlock(&entry->lock);
>  out:
> +       mutex_unlock(&entry->group->mark_mutex);
>         fsnotify_put_mark(entry);
>         spin_lock(&hash_lock);
>  }
> @@ -385,17 +388,21 @@ static int tag_chunk(struct inode *inode, struct audit_tree *tree)
>
>         chunk_entry = &chunk->mark;
>
> +       mutex_lock(&old_entry->group->mark_mutex);
>         spin_lock(&old_entry->lock);
>         if (!old_entry->inode) {
>                 /* old_entry is being shot, lets just lie */
>                 spin_unlock(&old_entry->lock);
> +               mutex_unlock(&old_entry->group->mark_mutex);
>                 fsnotify_put_mark(old_entry);
>                 free_chunk(chunk);
>                 return -ENOENT;
>         }
>
> -       if (fsnotify_add_mark(chunk_entry, old_entry->group, old_entry->inode, NULL, 1)) {
> +       if (fsnotify_add_mark_locked(chunk_entry, old_entry->group,
> +                                    old_entry->inode, NULL, 1)) {
>                 spin_unlock(&old_entry->lock);
> +               mutex_unlock(&old_entry->group->mark_mutex);
>                 fsnotify_put_mark(chunk_entry);
>                 fsnotify_put_mark(old_entry);
>                 return -ENOSPC;
> @@ -411,6 +418,7 @@ static int tag_chunk(struct inode *inode, struct audit_tree *tree)
>                 chunk->dead = 1;
>                 spin_unlock(&chunk_entry->lock);
>                 spin_unlock(&old_entry->lock);
> +               mutex_unlock(&old_entry->group->mark_mutex);
>
>                 fsnotify_destroy_mark(chunk_entry, audit_tree_group);
>
> @@ -443,6 +451,7 @@ static int tag_chunk(struct inode *inode, struct audit_tree *tree)
>         spin_unlock(&hash_lock);
>         spin_unlock(&chunk_entry->lock);
>         spin_unlock(&old_entry->lock);
> +       mutex_unlock(&old_entry->group->mark_mutex);
>         fsnotify_destroy_mark(old_entry, audit_tree_group);
>         fsnotify_put_mark(chunk_entry); /* drop initial reference */
>         fsnotify_put_mark(old_entry); /* pair to fsnotify_find mark_entry */
> --
> 2.10.2
>



-- 
paul moore
www.paul-moore.com


* Re: [PATCH 06/22] audit: Abstract hash key handling
  2016-12-22  9:15 ` [PATCH 06/22] audit: Abstract hash key handling Jan Kara
@ 2016-12-22 23:27   ` Paul Moore
  2016-12-23 13:27     ` Jan Kara
  0 siblings, 1 reply; 80+ messages in thread
From: Paul Moore @ 2016-12-22 23:27 UTC (permalink / raw)
  To: Jan Kara
  Cc: linux-fsdevel, Amir Goldstein, Lino Sanfilippo, Miklos Szeredi,
	linux-audit

On Thu, Dec 22, 2016 at 4:15 AM, Jan Kara <jack@suse.cz> wrote:
> Audit tree currently uses inode pointer as a key into the hash table.
> Getting that from notification mark will be somewhat more difficult with
> coming fsnotify changes and there's no reason we really have to use the
> inode pointer. So abstract getting of hash key from the audit chunk and
> inode so that we can switch to a different key easily later.
>
> CC: Paul Moore <paul@paul-moore.com>
> Signed-off-by: Jan Kara <jack@suse.cz>
> ---
>  kernel/audit_tree.c | 39 ++++++++++++++++++++++++++++-----------
>  1 file changed, 28 insertions(+), 11 deletions(-)

I have no objections with this patch in particular, but in patch 8,
are you certain that inode_to_key() and chunk_to_key() will continue
to return the same key value?

> diff --git a/kernel/audit_tree.c b/kernel/audit_tree.c
> index 156b6a93f4fc..f0859828de09 100644
> --- a/kernel/audit_tree.c
> +++ b/kernel/audit_tree.c
> @@ -163,33 +163,48 @@ enum {HASH_SIZE = 128};
>  static struct list_head chunk_hash_heads[HASH_SIZE];
>  static __cacheline_aligned_in_smp DEFINE_SPINLOCK(hash_lock);
>
> -static inline struct list_head *chunk_hash(const struct inode *inode)
> +/* Function to return search key in our hash from inode. */
> +static unsigned long inode_to_key(const struct inode *inode)
>  {
> -       unsigned long n = (unsigned long)inode / L1_CACHE_BYTES;
> +       return (unsigned long)inode;
> +}
> +
> +/*
> + * Function to return search key in our hash from chunk. Key 0 is special and
> + * should never be present in the hash.
> + */
> +static unsigned long chunk_to_key(struct audit_chunk *chunk)
> +{
> +       return (unsigned long)chunk->mark.inode;
> +}
> +
> +static inline struct list_head *chunk_hash(unsigned long key)
> +{
> +       unsigned long n = key / L1_CACHE_BYTES;
>         return chunk_hash_heads + n % HASH_SIZE;
>  }
>
>  /* hash_lock & entry->lock is held by caller */
>  static void insert_hash(struct audit_chunk *chunk)
>  {
> -       struct fsnotify_mark *entry = &chunk->mark;
> +       unsigned long key = chunk_to_key(chunk);
>         struct list_head *list;
>
> -       if (!entry->inode)
> +       if (!key)
>                 return;
> -       list = chunk_hash(entry->inode);
> +       list = chunk_hash(key);
>         list_add_rcu(&chunk->hash, list);
>  }
>
>  /* called under rcu_read_lock */
>  struct audit_chunk *audit_tree_lookup(const struct inode *inode)
>  {
> -       struct list_head *list = chunk_hash(inode);
> +       unsigned long key = inode_to_key(inode);
> +       struct list_head *list = chunk_hash(key);
>         struct audit_chunk *p;
>
>         list_for_each_entry_rcu(p, list, hash) {
> -               /* mark.inode may have gone NULL, but who cares? */
> -               if (p->mark.inode == inode) {
> +               if (chunk_to_key(p) == key) {
>                         atomic_long_inc(&p->refs);
>                         return p;
>                 }
> @@ -585,7 +600,8 @@ int audit_remove_tree_rule(struct audit_krule *rule)
>
>  static int compare_root(struct vfsmount *mnt, void *arg)
>  {
> -       return d_backing_inode(mnt->mnt_root) == arg;
> +       return inode_to_key(d_backing_inode(mnt->mnt_root)) ==
> +              (unsigned long)arg;
>  }
>
>  void audit_trim_trees(void)
> @@ -620,9 +636,10 @@ void audit_trim_trees(void)
>                 list_for_each_entry(node, &tree->chunks, list) {
>                         struct audit_chunk *chunk = find_chunk(node);
>                         /* this could be NULL if the watch is dying else where... */
> -                       struct inode *inode = chunk->mark.inode;
>                         node->index |= 1U<<31;
> -                       if (iterate_mounts(compare_root, inode, root_mnt))
> +                       if (iterate_mounts(compare_root,
> +                                          (void *)chunk_to_key(chunk),
> +                                          root_mnt))
>                                 node->index &= ~(1U<<31);
>                 }
>                 spin_unlock(&hash_lock);
> --
> 2.10.2
>



-- 
paul moore
www.paul-moore.com


* Re: [PATCH 07/22] fsnotify: Update comments
  2016-12-22  9:15 ` [PATCH 07/22] fsnotify: Update comments Jan Kara
@ 2016-12-23  4:45   ` Amir Goldstein
  0 siblings, 0 replies; 80+ messages in thread
From: Amir Goldstein @ 2016-12-23  4:45 UTC (permalink / raw)
  To: Jan Kara; +Cc: linux-fsdevel, Lino Sanfilippo, Miklos Szeredi, Paul Moore

On Thu, Dec 22, 2016 at 11:15 AM, Jan Kara <jack@suse.cz> wrote:
> Add a comment that lifetime of a notification mark is protected by SRCU
> and remove a comment about clearing of marks attached to the inode. It
> is stale and more uptodate version is at fsnotify_destroy_marks() which
> is the function handling this case.
>
> Signed-off-by: Jan Kara <jack@suse.cz>

Reviewed-by: Amir Goldstein <amir73il@gmail.com>


* Re: [PATCH 08/22] fsnotify: Attach marks to object via dedicated head structure
  2016-12-22  9:15 ` [PATCH 08/22] fsnotify: Attach marks to object via dedicated head structure Jan Kara
@ 2016-12-23  5:48   ` Amir Goldstein
  2016-12-23 13:34     ` Jan Kara
  0 siblings, 1 reply; 80+ messages in thread
From: Amir Goldstein @ 2016-12-23  5:48 UTC (permalink / raw)
  To: Jan Kara; +Cc: linux-fsdevel, Lino Sanfilippo, Miklos Szeredi, Paul Moore

On Thu, Dec 22, 2016 at 11:15 AM, Jan Kara <jack@suse.cz> wrote:
> Currently notification marks are attached to object (inode or vfsmnt) by
> a hlist_head in the object. The list is also protected by a spinlock in
> the object. So while there is any mark attached to the list of marks,
> the object must be pinned in memory (and thus e.g. last iput() deleting
> inode cannot happen). Also for list iteration in fsnotify() to work, we
> must hold fsnotify_mark_srcu lock so that mark itself and
> mark->obj_list.next cannot get freed. Thus we are required to wait for
> response to fanotify events from userspace process with
> fsnotify_mark_srcu lock held. That causes issues when userspace process
> is buggy and does not reply to some event - basically the whole
> notification subsystem gets eventually stuck.
>
> So to be able to drop fsnotify_mark_srcu lock while waiting for
> response, we have to pin the mark in memory and make sure it stays in
> the object list (as removing the mark waiting for response could lead to
> lost notification events for groups later in the list). However we don't
> want inode reclaim to block on such mark as that would lead to system
> just locking up elsewhere.
>
> This commit tries to pave a way towards solving these conflicting
> lifetime needs. Instead of anchoring the list of marks directly in the
> object, we anchor it in a dedicated structure (fsnotify_mark_list) and
> just point to that structure from the object. Also the list is protected
> by a spinlock contained in that structure. With this, we can detach
> notification marks from object without having to modify the list itself.
>

The structural change looks very good to me.
It makes the code much easier to manage IMO.
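
The indirection described in the commit message can be sketched in a few
lines of userspace C. This is only an illustration under assumptions:
the model_* types are hypothetical stand-ins for the inode / vfsmount,
the mark, and the new head structure, and locking is left out entirely
(in the patch the spinlock lives in the head structure). The point is
that detaching the head from the object only clears two pointers and
leaves the list of marks untouched.

```c
#include <assert.h>
#include <stddef.h>

struct model_object;
struct model_mark_list;

struct model_mark {
	struct model_mark *next;       /* next mark on the same object */
	struct model_mark_list *head;  /* list this mark is attached to */
};

struct model_mark_list {
	struct model_mark *first;      /* marks attached to the object */
	struct model_object *obj;      /* owning object, NULL once detached */
};

struct model_object {
	struct model_mark_list *marks; /* only notification state in the object */
};

/* Attach a mark; the first attach also links the list to the object. */
static void model_attach_mark(struct model_object *obj,
			      struct model_mark_list *list,
			      struct model_mark *mark)
{
	if (!obj->marks) {
		list->obj = obj;
		obj->marks = list;
	}
	assert(obj->marks == list);
	mark->head = list;
	mark->next = list->first;
	list->first = mark;
}

/* Cut the object loose without modifying the list of marks itself. */
static void model_detach_list_from_object(struct model_mark_list *list)
{
	if (list->obj) {
		list->obj->marks = NULL;
		list->obj = NULL;
	}
}

/* Count the marks still chained on the list. */
static int model_count_marks(const struct model_mark_list *list)
{
	const struct model_mark *m;
	int n = 0;

	for (m = list->first; m; m = m->next)
		n++;
	return n;
}
```

After model_detach_list_from_object() the object no longer references
the list, yet every mark is still chained on it, so an iteration that
was in progress could continue.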

I am only halfway through this big change, but I wanted to make one meta
comment.

I have a problem with the choice of naming for the new struct.
'list' is really an overloaded term and the use of 'list' as a name of
a class that contains a list head makes for some really confusing
constructs like list->list and mark->obj_list_head which is not a
list_head struct.

For future generations, I suggest that we invest the effort in choosing
a name that makes more sense. I do realize how annoying it would be to
fix the entire series now, so it's not a problem if the renaming is done
at the end of the series, as long as we agree on the end result.

May I suggest the name fsnotify_tap to describe the new struct?
I know it is arbitrary, but no more arbitrary than fsnotify_mark and
certainly no more arbitrary than fsnotify_group.

Here are some examples of constructs that will make more sense:

<+#define FSNOTIFY_LIST_TYPE_INODE       0x01
<+#define FSNOTIFY_LIST_TYPE_VFSMOUNT    0x02
>+#define FSNOTIFY_TAP_TYPE_INODE       0x01
>+#define FSNOTIFY_TAP_TYPE_VFSMOUNT    0x02

LIST_TYPE_INODE implies this is a list of inodes
TAP_TYPE_INODE implies this is a tap on an inode

<+       /* Head of list of marks for an object [mark->lock, group->mark_mutex] */
<+       struct fsnotify_mark_list *obj_list_head;
>+       /* Container for list of marks for an object [mark->lock, group->mark_mutex] */
>+       struct fsnotify_tap *tap;

...

+static struct inode *fsnotify_detach_from_object(struct fsnotify_mark *mark)
+{
+       struct fsnotify_mark_list *list;
+       struct inode *inode = NULL;
+       bool free_list = false;
+
+       list = mark->obj_list_head;
+       spin_lock(&list->lock);
+       hlist_del_init_rcu(&mark->obj_list);
+       if (hlist_empty(&list->list)) {
+               if (list->flags & FSNOTIFY_LIST_TYPE_INODE) {
+                       inode = list->inode;
+                       inode->i_fsnotify_marks = NULL;
+                       inode->i_fsnotify_mask = 0;
+                       list->inode = NULL;
+                       list->flags &= ~FSNOTIFY_LIST_TYPE_INODE;
+               } else if (list->flags & FSNOTIFY_LIST_TYPE_VFSMOUNT) {
+                       real_mount(list->mnt)->mnt_fsnotify_marks = NULL;
+                       real_mount(list->mnt)->mnt_fsnotify_mask = 0;
+                       list->mnt = NULL;
+                       list->flags &= ~FSNOTIFY_LIST_TYPE_VFSMOUNT;
+               }
+               free_list = true;

if (hlist_empty(&tap->list)) {
        fsnotify_detach_tap(tap); /* this helper is really called for IMO */
        free_tap = true;

+       } else
+               __fsnotify_recalc_mask(list);
+       mark->obj_list_head = NULL;
+       spin_unlock(&list->lock);
+
+       if (free_list) {
+               spin_lock(&destroy_lock);
+               list->destroy_next = list_destroy_list;
+               list_destroy_list = list;

And killing list_destroy_list is my personal favorite...

+               spin_unlock(&destroy_lock);
+               queue_work(system_unbound_wq, &list_reaper_work);
+       }
+
+       return inode;
 }


* Re: [PATCH 09/22] inotify: Do not drop mark reference under idr_lock
  2016-12-22  9:15 ` [PATCH 09/22] inotify: Do not drop mark reference under idr_lock Jan Kara
@ 2016-12-23  8:04   ` Amir Goldstein
  0 siblings, 0 replies; 80+ messages in thread
From: Amir Goldstein @ 2016-12-23  8:04 UTC (permalink / raw)
  To: Jan Kara; +Cc: linux-fsdevel, Lino Sanfilippo, Miklos Szeredi, Paul Moore

On Thu, Dec 22, 2016 at 11:15 AM, Jan Kara <jack@suse.cz> wrote:
> Dropping mark reference can result in mark being freed. Although it
> should not happen in inotify_remove_from_idr() since caller should hold
> another reference, just don't risk lock up just after WARN_ON
> unnecessarily. Also fold do_inotify_remove_from_idr() into the single
> callsite as that function really is just two lines of real code.
>
> Signed-off-by: Jan Kara <jack@suse.cz>
> ---

Reviewed-by: Amir Goldstein <amir73il@gmail.com>


* Re: [PATCH 10/22] fsnotify: Detach mark from object list when last reference is dropped
  2016-12-22  9:15 ` [PATCH 10/22] fsnotify: Detach mark from object list when last reference is dropped Jan Kara
@ 2016-12-23 10:51   ` Amir Goldstein
  2016-12-23 13:42     ` Jan Kara
  0 siblings, 1 reply; 80+ messages in thread
From: Amir Goldstein @ 2016-12-23 10:51 UTC (permalink / raw)
  To: Jan Kara; +Cc: linux-fsdevel, Lino Sanfilippo, Miklos Szeredi, Paul Moore

On Thu, Dec 22, 2016 at 11:15 AM, Jan Kara <jack@suse.cz> wrote:
> Instead of removing mark from object list from fsnotify_detach_mark(),
> remove the mark when last reference to the mark is dropped. This will
> allow fanotify to wait for userspace response to event without having to
> hold onto fsnotify_mark_srcu.
>
> Signed-off-by: Jan Kara <jack@suse.cz>
> ---
...

+/* Called with mark->obj_list_head->lock held, releases it */
+static void fsnotify_detach_from_object(struct fsnotify_mark *mark)
 {

IMO, the implicit release in this function makes the code using it hard
to read and maintain. Please consider splitting it into 2 functions
to be called from code that explicitly unlocks, e.g.:

     free_list = fsnotify_detach_from_object_locked(mark, &inode);
     spin_unlock(&list->lock);
     if (inode)
        iput(inode);
     if (free_list)
        fsnotify_free_list(list);
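
The split suggested above can be fleshed out as a userspace sketch. It
is not the patch's code: the model_* names are hypothetical, the lock is
a plain flag, decrementing a counter stands in for iput(), and setting a
flag stands in for the (also hypothetical) fsnotify_free_list(). What it
illustrates is the contract: the _locked helper never drops the lock,
and the caller unlocks explicitly before doing anything that might
sleep.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct model_inode { int refcount; };

struct model_list {
	bool locked;               /* stands in for list->lock */
	int nr_marks;              /* marks still attached */
	struct model_inode *inode; /* object pinned by the list */
	bool freed;                /* set once the list would be freed */
};

struct model_mark { struct model_list *list; };

/*
 * Called with list->locked set and returns with it still set.
 * Returns true when the list became empty and should be freed;
 * *inodep then holds the inode whose reference the caller must drop.
 */
static bool model_detach_from_object_locked(struct model_mark *mark,
					    struct model_inode **inodep)
{
	struct model_list *list = mark->list;
	bool free_list = false;

	assert(list->locked);
	*inodep = NULL;
	list->nr_marks--;
	if (list->nr_marks == 0) {
		*inodep = list->inode;
		list->inode = NULL;
		free_list = true;
	}
	mark->list = NULL;
	return free_list;
}

/* Caller side: lock, detach, unlock explicitly, then clean up unlocked. */
static void model_drop_last_mark_ref(struct model_mark *mark)
{
	struct model_list *list = mark->list;
	struct model_inode *inode;
	bool free_list;

	list->locked = true;           /* models spin_lock(&list->lock) */
	free_list = model_detach_from_object_locked(mark, &inode);
	list->locked = false;          /* models the explicit spin_unlock() */

	if (inode)
		inode->refcount--;     /* models iput(), outside the lock */
	if (free_list)
		list->freed = true;    /* models freeing the list */
}
```

Because the unlock sits in the caller, a reader can see at the call site
that the inode reference is dropped only after the lock is released.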

...
> +               inode = fsnotify_detach_list_from_object(list);
>                 free_list = true;
>         } else
>                 __fsnotify_recalc_mask(list);
>         mark->obj_list_head = NULL;
>         spin_unlock(&list->lock);
>
> +       if (inode)
> +               iput(inode);
> +

Question: if the list is holding the inode anyway, what's the use of
FSNOTIFY_MARK_FLAG_OBJECT_PINNED?
Or maybe you are removing it later on in the series?


...
> +       /*
> +        * We have to be careful since we can race with e.g.
> +        * fsnotify_clear_marks_by_group() and once we drop the list->lock, the
> +        * list can get modified. However we are holding mark reference and
> +        * thus our mark cannot be removed from obj_list so we can continue
> +        * iteration after regaining list->lock.
> +        */
> +       hlist_for_each_entry(mark, &list->list, obj_list) {
>                 fsnotify_get_mark(mark);
> -               fsnotify_put_list(list);
> +               spin_unlock(&list->lock);
> +               if (old_mark)
> +                       fsnotify_put_mark(old_mark);
> +               old_mark = mark;
>                 fsnotify_destroy_mark(mark, mark->group);
> -               fsnotify_put_mark(mark);
> +               spin_lock(&list->lock);
>         }
> +       /*
> +        * Detach list from object now so that we don't pin inode until all
> +        * mark references get dropped. It would lead to strange results such
> +        * as delaying inode deletion or blocking unmount.
> +        */
> +       inode = fsnotify_detach_list_from_object(list);
> +       fsnotify_put_list(list);
> +       if (inode)
> +               iput(inode);
> +       if (old_mark)
> +               fsnotify_put_mark(old_mark);

I must be missing something subtle here. I don't see where the list->lock
is unlocked. Also, I am not sure if you placed put old_mark after
fsnotify_put_list for a reason. If you did, I did not find that reason in
the comments. If you didn't, I think it would be more appropriate after
the list iteration ends, although it appears that put old_mark should be
called after list->lock unlock.

Please untangle this knot for me.
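
For reference, the general "pin the current element, drop the lock, work,
retake the lock" iteration that the quoted comment describes can be
modelled in userspace as below. This is a sketch under assumptions and
makes no claim about what the patch actually does with list->lock: the
model_* names are hypothetical, the lock is a flag, and each node starts
with one reference that model_node_destroy() drops. A node leaves the
list only when its last reference goes away, which is why the pinned
node stays a valid cursor, and why the previous node is put only while
the lock is not held.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct model_node {
	struct model_node *next;
	int refs;      /* node is unlinked when this reaches zero */
	bool on_list;
	bool alive;    /* false once model_node_destroy() has run */
};

struct model_list { struct model_node *first; bool locked; };

static void model_lock(struct model_list *l)
{
	assert(!l->locked);
	l->locked = true;
}

static void model_unlock(struct model_list *l)
{
	assert(l->locked);
	l->locked = false;
}

/* Remove n from the list; the lock must be held. */
static void model_unlink(struct model_list *l, struct model_node *n)
{
	struct model_node **pp;

	assert(l->locked);
	for (pp = &l->first; *pp; pp = &(*pp)->next) {
		if (*pp == n) {
			*pp = n->next;
			n->on_list = false;
			return;
		}
	}
}

/* Drop a reference; the last put unlinks the node and takes the lock. */
static void model_node_put(struct model_list *l, struct model_node *n)
{
	assert(n->refs > 0);
	if (--n->refs > 0)
		return;
	model_lock(l);
	model_unlink(l, n);
	model_unlock(l);
}

/* Drop the node's initial reference; must be called without the lock. */
static void model_node_destroy(struct model_list *l, struct model_node *n)
{
	if (!n->alive)
		return;
	n->alive = false;
	model_node_put(l, n);
}

/* Destroy every node while never sleeping or putting under the lock. */
static int model_destroy_all(struct model_list *l)
{
	struct model_node *n, *old = NULL;
	int visited = 0;

	model_lock(l);
	for (n = l->first; n; n = n->next) {
		n->refs++;                 /* pin n so it stays linked */
		model_unlock(l);
		if (old)
			model_node_put(l, old); /* may unlink old */
		old = n;
		model_node_destroy(l, n);  /* n stays linked: we hold a pin */
		visited++;
		model_lock(l);             /* n->next is read under the lock */
	}
	model_unlock(l);
	if (old)
		model_node_put(l, old);    /* final put, lock not held */
	return visited;
}
```

In this model the final put of the last pinned node happens after the
lock has been released, which matches the ordering asked about above.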


> diff --git a/include/linux/fsnotify_backend.h b/include/linux/fsnotify_backend.h
> index 6086fc7ff6df..76b3c34172c7 100644
> --- a/include/linux/fsnotify_backend.h
> +++ b/include/linux/fsnotify_backend.h
> @@ -244,9 +244,9 @@ struct fsnotify_mark {
>         struct list_head g_list;
>         /* Protects inode / mnt pointers, flags, masks */
>         spinlock_t lock;
> -       /* List of marks for inode / vfsmount [obj_list_head->lock] */
> +       /* List of marks for inode / vfsmount [obj_list_head->lock, mark ref] */
>         struct hlist_node obj_list;
> -       /* Head of list of marks for an object [mark->lock, group->mark_mutex] */
> +       /* Head of list of marks for an object [mark ref] */
>         struct fsnotify_mark_list *obj_list_head;

What is the meaning of [mark ref] here?
If the mark is on the obj_list its refcount is already elevated.
I thought it's the mark that is holding a ref on the list_head (or tap
if you accept my suggestion) and not the other way around.


* Re: [PATCH 11/22] fsnotify: Remove special handling of mark destruction on group shutdown
  2016-12-22  9:15 ` [PATCH 11/22] fsnotify: Remove special handling of mark destruction on group shutdown Jan Kara
@ 2016-12-23 12:12   ` Amir Goldstein
  2016-12-23 13:31     ` Jan Kara
  0 siblings, 1 reply; 80+ messages in thread
From: Amir Goldstein @ 2016-12-23 12:12 UTC (permalink / raw)
  To: Jan Kara; +Cc: linux-fsdevel, Lino Sanfilippo, Miklos Szeredi, Paul Moore

On Thu, Dec 22, 2016 at 11:15 AM, Jan Kara <jack@suse.cz> wrote:
> Currently we queue all marks for destruction on group shutdown and then
> destroy them from fsnotify_destroy_group() instead from a worker thread
> which is the usual path. However worker can already be processing some
> list of marks to destroy so this does not make 100% all marks are really
> destroyed by the time group is shut down. This isn't a big problem as
> each mark holds group reference and thus group stays partially alive
> until all marks are really freed but there's no point in complicating
> our lives - just wait for the delayed work to be finished instead.
>
> Signed-off-by: Jan Kara <jack@suse.cz>

Isn't it *required* to wait for all marks to really be freed, and not
only nice behavior, for the same reason as in
35e4817 fsnotify: avoid spurious EMFILE errors from inotify_init()?

Otherwise

Reviewed-by: Amir Goldstein <amir73il@gmail.com>


* Re: [PATCH 04/22] fsnotify: Remove fsnotify_duplicate_mark()
  2016-12-22 23:13   ` Paul Moore
@ 2016-12-23 13:22     ` Jan Kara
  2016-12-23 14:01       ` Paul Moore
  0 siblings, 1 reply; 80+ messages in thread
From: Jan Kara @ 2016-12-23 13:22 UTC (permalink / raw)
  To: Paul Moore
  Cc: Jan Kara, linux-fsdevel, Amir Goldstein, Lino Sanfilippo,
	Miklos Szeredi, linux-audit

On Thu 22-12-16 18:13:11, Paul Moore wrote:
> On Thu, Dec 22, 2016 at 4:15 AM, Jan Kara <jack@suse.cz> wrote:
> > There are only two calls sites of fsnotify_duplicate_mark(). Those are
> > in kernel/audit_tree.c and both are bogus. Vfsmount pointer is unused
> > for audit tree, inode pointer and group gets set in
> > fsnotify_add_mark_locked() later anyway, mask and free_mark are already
> > set in alloc_chunk(). In fact, calling fsnotify_duplicate_mark() is
> > actively harmful because following fsnotify_add_mark_locked() will leak
> > group reference by overwriting the group pointer. So just remove the two
> > calls to fsnotify_duplicate_mark() and the function.
> >
> > Signed-off-by: Jan Kara <jack@suse.cz>
> > ---
> >  fs/notify/mark.c                 | 12 ------------
> >  include/linux/fsnotify_backend.h |  2 --
> >  kernel/audit_tree.c              |  6 ++----
> >  3 files changed, 2 insertions(+), 18 deletions(-)
> 
> At first glance this looks reasonable, although you keep mentioning
> "fsnotify_add_mark_locked" above when untag_chunk() is calling
> "fsnotify_add_mark"; I just wanted to make sure you hadn't intended to
> take the mutex in the audit code instead of relying on the locking in
> fsnotify_add_mark().

No, I didn't want to take the mutex in the audit code. It is just that
fsnotify_add_mark() is a thin wrapper around fsnotify_add_mark_locked(), so
I was speaking about that function.

								Honza

> 
> > diff --git a/fs/notify/mark.c b/fs/notify/mark.c
> > index d3fea0bd89e2..6043306e8e21 100644
> > --- a/fs/notify/mark.c
> > +++ b/fs/notify/mark.c
> > @@ -510,18 +510,6 @@ void fsnotify_detach_group_marks(struct fsnotify_group *group)
> >         }
> >  }
> >
> > -void fsnotify_duplicate_mark(struct fsnotify_mark *new, struct fsnotify_mark *old)
> > -{
> > -       assert_spin_locked(&old->lock);
> > -       new->inode = old->inode;
> > -       new->mnt = old->mnt;
> > -       if (old->group)
> > -               fsnotify_get_group(old->group);
> > -       new->group = old->group;
> > -       new->mask = old->mask;
> > -       new->free_mark = old->free_mark;
> > -}
> > -
> >  /*
> >   * Nothing fancy, just initialize lists and locks and counters.
> >   */
> > diff --git a/include/linux/fsnotify_backend.h b/include/linux/fsnotify_backend.h
> > index 0cf34d6cc253..487246546ebe 100644
> > --- a/include/linux/fsnotify_backend.h
> > +++ b/include/linux/fsnotify_backend.h
> > @@ -323,8 +323,6 @@ extern void fsnotify_init_mark(struct fsnotify_mark *mark, void (*free_mark)(str
> >  extern struct fsnotify_mark *fsnotify_find_inode_mark(struct fsnotify_group *group, struct inode *inode);
> >  /* find (and take a reference) to a mark associated with group and vfsmount */
> >  extern struct fsnotify_mark *fsnotify_find_vfsmount_mark(struct fsnotify_group *group, struct vfsmount *mnt);
> > -/* copy the values from old into new */
> > -extern void fsnotify_duplicate_mark(struct fsnotify_mark *new, struct fsnotify_mark *old);
> >  /* set the ignored_mask of a mark */
> >  extern void fsnotify_set_mark_ignored_mask_locked(struct fsnotify_mark *mark, __u32 mask);
> >  /* set the mask of a mark (might pin the object into memory */
> > diff --git a/kernel/audit_tree.c b/kernel/audit_tree.c
> > index 8b1dde96a0fa..f3130eb0a4bd 100644
> > --- a/kernel/audit_tree.c
> > +++ b/kernel/audit_tree.c
> > @@ -258,8 +258,7 @@ static void untag_chunk(struct node *p)
> >         if (!new)
> >                 goto Fallback;
> >
> > -       fsnotify_duplicate_mark(&new->mark, entry);
> > -       if (fsnotify_add_mark(&new->mark, new->mark.group, new->mark.inode, NULL, 1)) {
> > +       if (fsnotify_add_mark(&new->mark, entry->group, entry->inode, NULL, 1)) {
> >                 fsnotify_put_mark(&new->mark);
> >                 goto Fallback;
> >         }
> > @@ -395,8 +394,7 @@ static int tag_chunk(struct inode *inode, struct audit_tree *tree)
> >                 return -ENOENT;
> >         }
> >
> > -       fsnotify_duplicate_mark(chunk_entry, old_entry);
> > -       if (fsnotify_add_mark(chunk_entry, chunk_entry->group, chunk_entry->inode, NULL, 1)) {
> > +       if (fsnotify_add_mark(chunk_entry, old_entry->group, old_entry->inode, NULL, 1)) {
> >                 spin_unlock(&old_entry->lock);
> >                 fsnotify_put_mark(chunk_entry);
> >                 fsnotify_put_mark(old_entry);
> > --
> > 2.10.2
> >
> 
> -- 
> paul moore
> www.paul-moore.com
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR


* Re: [PATCH 05/22] audit: Fix sleep in atomic
  2016-12-22 23:18   ` Paul Moore
@ 2016-12-23 13:24     ` Jan Kara
  2016-12-23 14:17       ` Paul Moore
  0 siblings, 1 reply; 80+ messages in thread
From: Jan Kara @ 2016-12-23 13:24 UTC (permalink / raw)
  To: Paul Moore
  Cc: Jan Kara, linux-fsdevel, Amir Goldstein, Lino Sanfilippo,
	Miklos Szeredi, linux-audit

On Thu 22-12-16 18:18:36, Paul Moore wrote:
> On Thu, Dec 22, 2016 at 4:15 AM, Jan Kara <jack@suse.cz> wrote:
> > Audit tree code was happily adding new notification marks while holding
> > spinlocks. Since fsnotify_add_mark() acquires group->mark_mutex this can
> > lead to sleeping while holding a spinlock, deadlocks due to lock
> > inversion, and probably other fun. Fix the problem by acquiring
> > group->mark_mutex earlier.
> >
> > CC: Paul Moore <paul@paul-moore.com>
> > Signed-off-by: Jan Kara <jack@suse.cz>
> > ---
> >  kernel/audit_tree.c | 13 +++++++++++--
> >  1 file changed, 11 insertions(+), 2 deletions(-)
> 
> [SIDE NOTE: this patch explains your comments and my earlier concern
> about the locked/unlocked variants of fsnotify_add_mark() in
> untag_chunk()]
> 
> Ouch.  Thanks for catching this ... what is your goal with these
> patches, are you targeting this as a fix during the v4.10-rcX cycle?
> If not, any objections if I pull this patch into the audit tree and
> send this to Linus during the v4.10-rcX cycle (assuming it passes
> testing, yadda yadda)?

Sure, go ahead. I plan these patches for the next merge window. So I can
rebase the series once you merge audit fixes...

								Honza
> 
> 
> > diff --git a/kernel/audit_tree.c b/kernel/audit_tree.c
> > index f3130eb0a4bd..156b6a93f4fc 100644
> > --- a/kernel/audit_tree.c
> > +++ b/kernel/audit_tree.c
> > @@ -231,6 +231,7 @@ static void untag_chunk(struct node *p)
> >         if (size)
> >                 new = alloc_chunk(size);
> >
> > +       mutex_lock(&entry->group->mark_mutex);
> >         spin_lock(&entry->lock);
> >         if (chunk->dead || !entry->inode) {
> >                 spin_unlock(&entry->lock);
> > @@ -258,7 +259,8 @@ static void untag_chunk(struct node *p)
> >         if (!new)
> >                 goto Fallback;
> >
> > -       if (fsnotify_add_mark(&new->mark, entry->group, entry->inode, NULL, 1)) {
> > +       if (fsnotify_add_mark_locked(&new->mark, entry->group, entry->inode,
> > +                                    NULL, 1)) {
> >                 fsnotify_put_mark(&new->mark);
> >                 goto Fallback;
> >         }
> > @@ -309,6 +311,7 @@ static void untag_chunk(struct node *p)
> >         spin_unlock(&hash_lock);
> >         spin_unlock(&entry->lock);
> >  out:
> > +       mutex_unlock(&entry->group->mark_mutex);
> >         fsnotify_put_mark(entry);
> >         spin_lock(&hash_lock);
> >  }
> > @@ -385,17 +388,21 @@ static int tag_chunk(struct inode *inode, struct audit_tree *tree)
> >
> >         chunk_entry = &chunk->mark;
> >
> > +       mutex_lock(&old_entry->group->mark_mutex);
> >         spin_lock(&old_entry->lock);
> >         if (!old_entry->inode) {
> >                 /* old_entry is being shot, lets just lie */
> >                 spin_unlock(&old_entry->lock);
> > +               mutex_unlock(&old_entry->group->mark_mutex);
> >                 fsnotify_put_mark(old_entry);
> >                 free_chunk(chunk);
> >                 return -ENOENT;
> >         }
> >
> > -       if (fsnotify_add_mark(chunk_entry, old_entry->group, old_entry->inode, NULL, 1)) {
> > +       if (fsnotify_add_mark_locked(chunk_entry, old_entry->group,
> > +                                    old_entry->inode, NULL, 1)) {
> >                 spin_unlock(&old_entry->lock);
> > +               mutex_unlock(&old_entry->group->mark_mutex);
> >                 fsnotify_put_mark(chunk_entry);
> >                 fsnotify_put_mark(old_entry);
> >                 return -ENOSPC;
> > @@ -411,6 +418,7 @@ static int tag_chunk(struct inode *inode, struct audit_tree *tree)
> >                 chunk->dead = 1;
> >                 spin_unlock(&chunk_entry->lock);
> >                 spin_unlock(&old_entry->lock);
> > +               mutex_unlock(&old_entry->group->mark_mutex);
> >
> >                 fsnotify_destroy_mark(chunk_entry, audit_tree_group);
> >
> > @@ -443,6 +451,7 @@ static int tag_chunk(struct inode *inode, struct audit_tree *tree)
> >         spin_unlock(&hash_lock);
> >         spin_unlock(&chunk_entry->lock);
> >         spin_unlock(&old_entry->lock);
> > +       mutex_unlock(&old_entry->group->mark_mutex);
> >         fsnotify_destroy_mark(old_entry, audit_tree_group);
> >         fsnotify_put_mark(chunk_entry); /* drop initial reference */
> >         fsnotify_put_mark(old_entry); /* pair to fsnotify_find mark_entry */
> > --
> > 2.10.2
> >
> 
> 
> 
> -- 
> paul moore
> www.paul-moore.com
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR


* Re: [PATCH 06/22] audit: Abstract hash key handling
  2016-12-22 23:27   ` Paul Moore
@ 2016-12-23 13:27     ` Jan Kara
  2016-12-23 14:13       ` Paul Moore
  0 siblings, 1 reply; 80+ messages in thread
From: Jan Kara @ 2016-12-23 13:27 UTC (permalink / raw)
  To: Paul Moore
  Cc: Jan Kara, linux-fsdevel, Amir Goldstein, Lino Sanfilippo,
	Miklos Szeredi, linux-audit

On Thu 22-12-16 18:27:40, Paul Moore wrote:
> On Thu, Dec 22, 2016 at 4:15 AM, Jan Kara <jack@suse.cz> wrote:
> > Audit tree currently uses inode pointer as a key into the hash table.
> > Getting that from notification mark will be somewhat more difficult with
> > coming fsnotify changes and there's no reason we really have to use the
> > inode pointer. So abstract getting of hash key from the audit chunk and
> > inode so that we can switch to a different key easily later.
> >
> > CC: Paul Moore <paul@paul-moore.com>
> > Signed-off-by: Jan Kara <jack@suse.cz>
> > ---
> >  kernel/audit_tree.c | 39 ++++++++++++++++++++++++++++-----------
> >  1 file changed, 28 insertions(+), 11 deletions(-)
> 
> I have no objections with this patch in particular, but in patch 8,
> are you certain that inode_to_key() and chunk_to_key() will continue
> to return the same key value?

Yes, that's the intention. Or rather, in that patch the key will no longer
be the inode pointer but the fsnotify_mark_list pointer. It will still
match between chunks attached to an inode and the inode itself, so
comparison results should stay the same.
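
Why the comparison results stay the same can be seen in a small userspace
model. It is a sketch under assumptions, not the code from patch 8: the
model_* types and helpers are hypothetical, MODEL_CACHE_BYTES stands in
for L1_CACHE_BYTES, and the model assumes that both the inode and the
chunk's mark point at the same per-object list structure, whose address
serves as the key.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { HASH_SIZE = 128, MODEL_CACHE_BYTES = 64 };

struct model_mark_list { int nr_marks; };  /* one per watched object */

struct model_inode { struct model_mark_list *marks; };

struct model_chunk {
	struct model_mark_list *obj_list_head; /* list the chunk's mark is on */
	struct model_chunk *next;              /* hash bucket chaining */
};

static struct model_chunk *buckets[HASH_SIZE];

/* Key derived from the inode side: address of its mark list. */
static uintptr_t model_inode_to_key(const struct model_inode *inode)
{
	return (uintptr_t)inode->marks;
}

/* Key derived from the chunk side: the same list address. Key 0 is special. */
static uintptr_t model_chunk_to_key(const struct model_chunk *chunk)
{
	return (uintptr_t)chunk->obj_list_head;
}

static unsigned int model_chunk_hash(uintptr_t key)
{
	return (unsigned int)((key / MODEL_CACHE_BYTES) % HASH_SIZE);
}

/* Insert a chunk; chunks with key 0 are never hashed. */
static void model_insert_hash(struct model_chunk *chunk)
{
	uintptr_t key = model_chunk_to_key(chunk);
	unsigned int b;

	if (!key)
		return;
	b = model_chunk_hash(key);
	chunk->next = buckets[b];
	buckets[b] = chunk;
}

/* Look up the chunk for an inode by comparing keys only. */
static struct model_chunk *model_tree_lookup(const struct model_inode *inode)
{
	uintptr_t key = model_inode_to_key(inode);
	struct model_chunk *p;

	for (p = buckets[model_chunk_hash(key)]; p; p = p->next)
		if (model_chunk_to_key(p) == key)
			return p;
	return NULL;
}
```

As long as both helpers derive the key from the same underlying pointer,
insertion and lookup land in the same bucket and compare equal, whatever
that pointer happens to be.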

								Honza 
> 
> > diff --git a/kernel/audit_tree.c b/kernel/audit_tree.c
> > index 156b6a93f4fc..f0859828de09 100644
> > --- a/kernel/audit_tree.c
> > +++ b/kernel/audit_tree.c
> > @@ -163,33 +163,48 @@ enum {HASH_SIZE = 128};
> >  static struct list_head chunk_hash_heads[HASH_SIZE];
> >  static __cacheline_aligned_in_smp DEFINE_SPINLOCK(hash_lock);
> >
> > -static inline struct list_head *chunk_hash(const struct inode *inode)
> > +/* Function to return search key in our hash from inode. */
> > +static unsigned long inode_to_key(const struct inode *inode)
> >  {
> > -       unsigned long n = (unsigned long)inode / L1_CACHE_BYTES;
> > +       return (unsigned long)inode;
> > +}
> > +
> > +/*
> > + * Function to return search key in our hash from chunk. Key 0 is special and
> > + * should never be present in the hash.
> > + */
> > +static unsigned long chunk_to_key(struct audit_chunk *chunk)
> > +{
> > +       return (unsigned long)chunk->mark.inode;
> > +}
> > +
> > +static inline struct list_head *chunk_hash(unsigned long key)
> > +{
> > +       unsigned long n = key / L1_CACHE_BYTES;
> >         return chunk_hash_heads + n % HASH_SIZE;
> >  }
> >
> >  /* hash_lock & entry->lock is held by caller */
> >  static void insert_hash(struct audit_chunk *chunk)
> >  {
> > -       struct fsnotify_mark *entry = &chunk->mark;
> > +       unsigned long key = chunk_to_key(chunk);
> >         struct list_head *list;
> >
> > -       if (!entry->inode)
> > +       if (!key)
> >                 return;
> > -       list = chunk_hash(entry->inode);
> > +       list = chunk_hash(key);
> >         list_add_rcu(&chunk->hash, list);
> >  }
> >
> >  /* called under rcu_read_lock */
> >  struct audit_chunk *audit_tree_lookup(const struct inode *inode)
> >  {
> > -       struct list_head *list = chunk_hash(inode);
> > +       unsigned long key = inode_to_key(inode);
> > +       struct list_head *list = chunk_hash(key);
> >         struct audit_chunk *p;
> >
> >         list_for_each_entry_rcu(p, list, hash) {
> > -               /* mark.inode may have gone NULL, but who cares? */
> > -               if (p->mark.inode == inode) {
> > +               if (chunk_to_key(p) == key) {
> >                         atomic_long_inc(&p->refs);
> >                         return p;
> >                 }
> > @@ -585,7 +600,8 @@ int audit_remove_tree_rule(struct audit_krule *rule)
> >
> >  static int compare_root(struct vfsmount *mnt, void *arg)
> >  {
> > -       return d_backing_inode(mnt->mnt_root) == arg;
> > +       return inode_to_key(d_backing_inode(mnt->mnt_root)) ==
> > +              (unsigned long)arg;
> >  }
> >
> >  void audit_trim_trees(void)
> > @@ -620,9 +636,10 @@ void audit_trim_trees(void)
> >                 list_for_each_entry(node, &tree->chunks, list) {
> >                         struct audit_chunk *chunk = find_chunk(node);
> >                         /* this could be NULL if the watch is dying else where... */
> > -                       struct inode *inode = chunk->mark.inode;
> >                         node->index |= 1U<<31;
> > -                       if (iterate_mounts(compare_root, inode, root_mnt))
> > +                       if (iterate_mounts(compare_root,
> > +                                          (void *)chunk_to_key(chunk),
> > +                                          root_mnt))
> >                                 node->index &= ~(1U<<31);
> >                 }
> >                 spin_unlock(&hash_lock);
> > --
> > 2.10.2
> >
> 
> 
> 
> -- 
> paul moore
> www.paul-moore.com
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR


* Re: [PATCH 11/22] fsnotify: Remove special handling of mark destruction on group shutdown
  2016-12-23 12:12   ` Amir Goldstein
@ 2016-12-23 13:31     ` Jan Kara
  0 siblings, 0 replies; 80+ messages in thread
From: Jan Kara @ 2016-12-23 13:31 UTC (permalink / raw)
  To: Amir Goldstein
  Cc: Jan Kara, linux-fsdevel, Lino Sanfilippo, Miklos Szeredi, Paul Moore

On Fri 23-12-16 14:12:19, Amir Goldstein wrote:
> On Thu, Dec 22, 2016 at 11:15 AM, Jan Kara <jack@suse.cz> wrote:
> > Currently we queue all marks for destruction on group shutdown and then
> > destroy them from fsnotify_destroy_group() instead of from a worker thread
> > which is the usual path. However, the worker can already be processing some
> > list of marks to destroy, so this does not make 100% sure all marks are really
> > destroyed by the time the group is shut down. This isn't a big problem as
> > each mark holds group reference and thus group stays partially alive
> > until all marks are really freed but there's no point in complicating
> > our lives - just wait for the delayed work to be finished instead.
> >
> > Signed-off-by: Jan Kara <jack@suse.cz>
> 
> Isn't it *required* to wait for all marks to really be freed, rather than
> merely nice behavior, for the same reason as
> 35e4817 fsnotify: avoid spurious EMFILE errors from inotify_init()?
> 
> Otherwise
> 
> Reviewed-by: Amir Goldstein <amir73il@gmail.com>

Well, it is not nice to leave marks dangling after the group destruction
exactly for the reason you reference above. But OTOH it is mostly an
annoyance and not a serious issue unless it happens a lot (which it did not
in this case).

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR


* Re: [PATCH 08/22] fsnotify: Attach marks to object via dedicated head structure
  2016-12-23  5:48   ` Amir Goldstein
@ 2016-12-23 13:34     ` Jan Kara
  2017-01-04 13:38       ` Jan Kara
  0 siblings, 1 reply; 80+ messages in thread
From: Jan Kara @ 2016-12-23 13:34 UTC (permalink / raw)
  To: Amir Goldstein
  Cc: Jan Kara, linux-fsdevel, Lino Sanfilippo, Miklos Szeredi, Paul Moore

On Fri 23-12-16 07:48:43, Amir Goldstein wrote:
> On Thu, Dec 22, 2016 at 11:15 AM, Jan Kara <jack@suse.cz> wrote:
> > Currently notification marks are attached to object (inode or vfsmnt) by
> > a hlist_head in the object. The list is also protected by a spinlock in
> > the object. So while there is any mark attached to the list of marks,
> > the object must be pinned in memory (and thus e.g. last iput() deleting
> > inode cannot happen). Also for list iteration in fsnotify() to work, we
> > must hold fsnotify_mark_srcu lock so that mark itself and
> > mark->obj_list.next cannot get freed. Thus we are required to wait for
> > response to fanotify events from userspace process with
> > fsnotify_mark_srcu lock held. That causes issues when userspace process
> > is buggy and does not reply to some event - basically the whole
> > notification subsystem gets eventually stuck.
> >
> > So to be able to drop fsnotify_mark_srcu lock while waiting for
> > response, we have to pin the mark in memory and make sure it stays in
> > the object list (as removing the mark waiting for response could lead to
> > lost notification events for groups later in the list). However we don't
> > want inode reclaim to block on such mark as that would lead to system
> > just locking up elsewhere.
> >
> > This commit tries to pave a way towards solving these conflicting
> > lifetime needs. Instead of anchoring the list of marks directly in the
> > object, we anchor it in a dedicated structure (fsnotify_mark_list) and
> > just point to that structure from the object. Also the list is protected
> > by a spinlock contained in that structure. With this, we can detach
> > notification marks from object without having to modify the list itself.
> >
> 
> The structural change looks very good to me.
> It makes the code much easier to manage IMO.
> 
> I am only halfway through this big change, but I wanted to make one meta
> comment.
> 
> I have a problem with the choice of naming for the new struct.
> 'list' is really an overloaded term and the use of 'list' as a name of
> a class that contains a list head makes for some really confusing
> constructs like list->list and mark->obj_list_head which is not a
> list_head struct.

OK, I'll think about better naming. I agree it may be slightly confusing.

> For future generations, I suggest that we invest the effort in choosing
> a name that makes more sense. I do realize how annoying it would be to
> fix the entire series now, so it's not a problem if renaming is done at the end
> of the series as long as we agree on the end result.
> 
> May I suggest the name fsnotify_tap to describe the new struct.
> I know it is arbitrary, but not more arbitrary than fsnotify_mark and certainly
> not any more arbitrary than fsnotify_group.
> 
> Here are some examples of constructs that will make more sense:
> 
> <+#define FSNOTIFY_LIST_TYPE_INODE       0x01
> <+#define FSNOTIFY_LIST_TYPE_VFSMOUNT    0x02
> >+#define FSNOTIFY_TAP_TYPE_INODE       0x01
> >+#define FSNOTIFY_TAP_TYPE_VFSMOUNT    0x02
> 
> LIST_TYPE_INODE implies this is a list of inodes
> TAP_TYPE_INODE implies this is a tap on an inode

Frankly, I don't like 'TAP' much because as you say it is rather arbitrary
(or maybe I'm just missing a point as a non-native speaker). I'd prefer
something more descriptive.

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR


* Re: [PATCH 10/22] fsnotify: Detach mark from object list when last reference is dropped
  2016-12-23 10:51   ` Amir Goldstein
@ 2016-12-23 13:42     ` Jan Kara
  0 siblings, 0 replies; 80+ messages in thread
From: Jan Kara @ 2016-12-23 13:42 UTC (permalink / raw)
  To: Amir Goldstein
  Cc: Jan Kara, linux-fsdevel, Lino Sanfilippo, Miklos Szeredi, Paul Moore

On Fri 23-12-16 12:51:28, Amir Goldstein wrote:
> On Thu, Dec 22, 2016 at 11:15 AM, Jan Kara <jack@suse.cz> wrote:
> > Instead of removing mark from object list from fsnotify_detach_mark(),
> > remove the mark when last reference to the mark is dropped. This will
> > allow fanotify to wait for userspace response to event without having to
> > hold onto fsnotify_mark_srcu.
> >
> > Signed-off-by: Jan Kara <jack@suse.cz>
> > ---
> ...
> 
> +/* Called with mark->obj_list_head->lock held, releases it */
> +static void fsnotify_detach_from_object(struct fsnotify_mark *mark)
>  {
> 
> IMO, the implicit release in this function makes the code using it hard
> to read and maintain. Please consider splitting it into 2 functions
> to be called from code that explicitly unlocks, e.g.:
> 
>      free_list = fsnotify_detach_from_object_locked(mark, &inode);
>      spin_unlock(&list->lock);
>      if (inode)
>         iput(inode);
>      if (free_list)
>         fsnotify_free_list(list);

Maybe I'll instead move atomic_dec_and_lock() into
fsnotify_detach_from_object(). I'll just have to find good name for that
function.
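
For readers unfamiliar with the pattern, here is a userspace imitation of
atomic_dec_and_lock() together with a caller that detaches on the last
reference. This is only a sketch under assumptions: a pthread mutex stands in
for the kernel spinlock, and struct mark, list_lock and put_mark_and_detach()
are hypothetical names, not anything from the series.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

/*
 * Imitation of the kernel's atomic_dec_and_lock(): decrement the counter and
 * return true, with the lock held, only if the counter dropped to zero.
 */
static bool dec_and_lock(atomic_int *cnt, pthread_mutex_t *lock)
{
	int old = atomic_load(cnt);

	/* Fast path: not the last reference, so decrement without the lock. */
	while (old > 1) {
		if (atomic_compare_exchange_weak(cnt, &old, old - 1))
			return false;
	}

	/* Slow path: we may hold the last reference, so decide under the lock. */
	pthread_mutex_lock(lock);
	if (atomic_fetch_sub(cnt, 1) == 1)
		return true;		/* caller is responsible for unlocking */
	pthread_mutex_unlock(lock);
	return false;
}

/* Hypothetical, simplified mark. */
struct mark {
	atomic_int refcnt;
	bool on_list;			/* protected by list_lock */
};

static pthread_mutex_t list_lock = PTHREAD_MUTEX_INITIALIZER;

/* Hypothetical combined helper: drop a reference, detach on the last one. */
static bool put_mark_and_detach(struct mark *m)
{
	if (!dec_and_lock(&m->refcnt, &list_lock))
		return false;
	m->on_list = false;		/* detach while the lock is held */
	pthread_mutex_unlock(&list_lock);
	return true;			/* the caller may now free the mark */
}
```

With the decrement folded into the helper, the lock is both taken and released
inside one function, which avoids the implicit release criticized above.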

> ...
> > +               inode = fsnotify_detach_list_from_object(list);
> >                 free_list = true;
> >         } else
> >                 __fsnotify_recalc_mask(list);
> >         mark->obj_list_head = NULL;
> >         spin_unlock(&list->lock);
> >
> > +       if (inode)
> > +               iput(inode);
> > +
> 
> Question: if list is holding inode anyway, what's the use of
> FSNOTIFY_MARK_FLAG_OBJECT_PINNED?
> or maybe you are removing it later on in the series?

It is removed (maybe later, but certainly I remember dropping it).

> ...
> > +       /*
> > +        * We have to be careful since we can race with e.g.
> > +        * fsnotify_clear_marks_by_group() and once we drop the list->lock, the
> > +        * list can get modified. However we are holding mark reference and
> > +        * thus our mark cannot be removed from obj_list so we can continue
> > +        * iteration after regaining list->lock.
> > +        */
> > +       hlist_for_each_entry(mark, &list->list, obj_list) {
> >                 fsnotify_get_mark(mark);
> > -               fsnotify_put_list(list);
> > +               spin_unlock(&list->lock);
> > +               if (old_mark)
> > +                       fsnotify_put_mark(old_mark);
> > +               old_mark = mark;
> >                 fsnotify_destroy_mark(mark, mark->group);
> > -               fsnotify_put_mark(mark);
> > +               spin_lock(&list->lock);
> >         }
> > +       /*
> > +        * Detach list from object now so that we don't pin inode until all
> > +        * mark references get dropped. It would lead to strange results such
> > +        * as delaying inode deletion or blocking unmount.
> > +        */
> > +       inode = fsnotify_detach_list_from_object(list);
> > +       fsnotify_put_list(list);
> > +       if (inode)
> > +               iput(inode);
> > +       if (old_mark)
> > +               fsnotify_put_mark(old_mark);
> 
> I must be missing something subtle here. I don't see where the list->lock
> is unlocked.

fsnotify_put_list() - fsnotify_grab_list() will get list->lock,
fsnotify_put_list() drops it.

> Also, I am not sure if you placed put old_mark after fsnotify_put_list
> for a reason. If you did, I did not find that reason in the comments. If you
> didn't, I think it would be more appropriate after the list iteration ends,
> although it appears that put old_mark should be called after list->lock
> unlock.

Exactly. You have to drop mark reference after unlocking list for obvious
reasons.
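
The ordering constraint can be shown with a small userspace sketch. It assumes
(my reading of the patch, not something stated in this mail) that dropping the
last mark reference has to take the list lock to unlink the mark, so a put
under that lock would self-deadlock. All names are hypothetical, a pthread
mutex replaces the spinlock, and the sketch is only exercised single-threaded.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

struct mark {
	atomic_int refcnt;
	struct mark *next;		/* protected by the list lock */
};

struct mark_list {
	pthread_mutex_t lock;
	struct mark *first;
};

static void get_mark(struct mark *m)
{
	atomic_fetch_add(&m->refcnt, 1);
}

/* Dropping the last reference unlinks the mark, which needs list->lock. */
static void put_mark(struct mark_list *list, struct mark *m)
{
	struct mark **pp;

	if (atomic_fetch_sub(&m->refcnt, 1) != 1)
		return;
	pthread_mutex_lock(&list->lock);
	for (pp = &list->first; *pp; pp = &(*pp)->next) {
		if (*pp == m) {
			*pp = m->next;
			break;
		}
	}
	pthread_mutex_unlock(&list->lock);
}

/* Walk the list, dropping each mark's initial reference; returns the count. */
static int destroy_all(struct mark_list *list)
{
	struct mark *m, *old = NULL;
	int visited = 0;

	pthread_mutex_lock(&list->lock);
	for (m = list->first; m; m = m->next) {
		get_mark(m);			/* pin m so m->next stays valid */
		pthread_mutex_unlock(&list->lock);
		if (old)
			put_mark(list, old);	/* safe: the lock is not held */
		old = m;
		put_mark(list, m);		/* models destroying the mark */
		visited++;
		pthread_mutex_lock(&list->lock);
	}
	pthread_mutex_unlock(&list->lock);
	if (old)
		put_mark(list, old);		/* only after the final unlock */
	return visited;
}
```

The pinned mark is released one iteration late, after the cursor has already
moved past it, so the walk never follows a pointer out of an unlinked mark.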

> > diff --git a/include/linux/fsnotify_backend.h b/include/linux/fsnotify_backend.h
> > index 6086fc7ff6df..76b3c34172c7 100644
> > --- a/include/linux/fsnotify_backend.h
> > +++ b/include/linux/fsnotify_backend.h
> > @@ -244,9 +244,9 @@ struct fsnotify_mark {
> >         struct list_head g_list;
> >         /* Protects inode / mnt pointers, flags, masks */
> >         spinlock_t lock;
> > -       /* List of marks for inode / vfsmount [obj_list_head->lock] */
> > +       /* List of marks for inode / vfsmount [obj_list_head->lock, mark ref] */
> >         struct hlist_node obj_list;
> > -       /* Head of list of marks for an object [mark->lock, group->mark_mutex] */
> > +       /* Head of list of marks for an object [mark ref] */
> >         struct fsnotify_mark_list *obj_list_head;
> 
> What is the meaning of [mark ref] here?
> If the mark is on the obj_list its refcount is already elevated.
> I thought it's the mark that is holding a ref on the list_head (or tap
> if you accept my suggestion)
> and not the other way around.

So [mark ref] here means that if you hold mark reference, obj_list_head
cannot change and mark is pinned in the object list. Probably I can put
more detailed explanation above the structure declaration.

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR


* Re: [PATCH 04/22] fsnotify: Remove fsnotify_duplicate_mark()
  2016-12-23 13:22     ` Jan Kara
@ 2016-12-23 14:01       ` Paul Moore
  0 siblings, 0 replies; 80+ messages in thread
From: Paul Moore @ 2016-12-23 14:01 UTC (permalink / raw)
  To: Jan Kara
  Cc: linux-fsdevel, Amir Goldstein, Lino Sanfilippo, Miklos Szeredi,
	linux-audit

On Fri, Dec 23, 2016 at 8:22 AM, Jan Kara <jack@suse.cz> wrote:
> On Thu 22-12-16 18:13:11, Paul Moore wrote:
>> On Thu, Dec 22, 2016 at 4:15 AM, Jan Kara <jack@suse.cz> wrote:
>> > There are only two call sites of fsnotify_duplicate_mark(). Those are
>> > in kernel/audit_tree.c and both are bogus. Vfsmount pointer is unused
>> > for audit tree, inode pointer and group gets set in
>> > fsnotify_add_mark_locked() later anyway, mask and free_mark are already
>> > set in alloc_chunk(). In fact, calling fsnotify_duplicate_mark() is
>> > actively harmful because following fsnotify_add_mark_locked() will leak
>> > group reference by overwriting the group pointer. So just remove the two
>> > calls to fsnotify_duplicate_mark() and the function.
>> >
>> > Signed-off-by: Jan Kara <jack@suse.cz>
>> > ---
>> >  fs/notify/mark.c                 | 12 ------------
>> >  include/linux/fsnotify_backend.h |  2 --
>> >  kernel/audit_tree.c              |  6 ++----
>> >  3 files changed, 2 insertions(+), 18 deletions(-)
>>
>> At first glance this looks reasonable, although you keep mentioning
>> "fsnotify_add_mark_locked" above when untag_chunk() is calling
>> "fsnotify_add_mark"; I just wanted to make sure you hadn't intended to
>> take the mutex in the audit code instead of relying on the locking in
>> fsnotify_add_mark().
>
> No, I didn't want to take mutex in the audit code. It is just that
> fsnotify_add_mark() is a thin wrapper around fsnotify_add_mark_locked() so
> I was speaking about that function.

Understood.  I just wanted to make sure this is what you intended
since the commit description and patch did not match.

If you end up respinning the patchset for any reason I might suggest
changing the function name in the description above.
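
For reference, the locked/unlocked naming convention discussed here usually
looks like the following userspace sketch. The struct and both function names
are simplified, hypothetical stand-ins (a counter instead of real mark
bookkeeping), not the fsnotify signatures.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical, simplified group. */
struct group {
	pthread_mutex_t mark_mutex;
	int num_marks;			/* protected by mark_mutex */
};

/* Does the real work. The caller must already hold g->mark_mutex. */
static int add_mark_locked(struct group *g)
{
	return ++g->num_marks;
}

/* Thin wrapper: take the mutex, delegate, release. */
static int add_mark(struct group *g)
{
	int ret;

	pthread_mutex_lock(&g->mark_mutex);
	ret = add_mark_locked(g);
	pthread_mutex_unlock(&g->mark_mutex);
	return ret;
}
```

A caller that already holds the mutex uses the _locked variant directly; every
other caller goes through the wrapper.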

>>
>> > diff --git a/fs/notify/mark.c b/fs/notify/mark.c
>> > index d3fea0bd89e2..6043306e8e21 100644
>> > --- a/fs/notify/mark.c
>> > +++ b/fs/notify/mark.c
>> > @@ -510,18 +510,6 @@ void fsnotify_detach_group_marks(struct fsnotify_group *group)
>> >         }
>> >  }
>> >
>> > -void fsnotify_duplicate_mark(struct fsnotify_mark *new, struct fsnotify_mark *old)
>> > -{
>> > -       assert_spin_locked(&old->lock);
>> > -       new->inode = old->inode;
>> > -       new->mnt = old->mnt;
>> > -       if (old->group)
>> > -               fsnotify_get_group(old->group);
>> > -       new->group = old->group;
>> > -       new->mask = old->mask;
>> > -       new->free_mark = old->free_mark;
>> > -}
>> > -
>> >  /*
>> >   * Nothing fancy, just initialize lists and locks and counters.
>> >   */
>> > diff --git a/include/linux/fsnotify_backend.h b/include/linux/fsnotify_backend.h
>> > index 0cf34d6cc253..487246546ebe 100644
>> > --- a/include/linux/fsnotify_backend.h
>> > +++ b/include/linux/fsnotify_backend.h
>> > @@ -323,8 +323,6 @@ extern void fsnotify_init_mark(struct fsnotify_mark *mark, void (*free_mark)(str
>> >  extern struct fsnotify_mark *fsnotify_find_inode_mark(struct fsnotify_group *group, struct inode *inode);
>> >  /* find (and take a reference) to a mark associated with group and vfsmount */
>> >  extern struct fsnotify_mark *fsnotify_find_vfsmount_mark(struct fsnotify_group *group, struct vfsmount *mnt);
>> > -/* copy the values from old into new */
>> > -extern void fsnotify_duplicate_mark(struct fsnotify_mark *new, struct fsnotify_mark *old);
>> >  /* set the ignored_mask of a mark */
>> >  extern void fsnotify_set_mark_ignored_mask_locked(struct fsnotify_mark *mark, __u32 mask);
>> >  /* set the mask of a mark (might pin the object into memory */
>> > diff --git a/kernel/audit_tree.c b/kernel/audit_tree.c
>> > index 8b1dde96a0fa..f3130eb0a4bd 100644
>> > --- a/kernel/audit_tree.c
>> > +++ b/kernel/audit_tree.c
>> > @@ -258,8 +258,7 @@ static void untag_chunk(struct node *p)
>> >         if (!new)
>> >                 goto Fallback;
>> >
>> > -       fsnotify_duplicate_mark(&new->mark, entry);
>> > -       if (fsnotify_add_mark(&new->mark, new->mark.group, new->mark.inode, NULL, 1)) {
>> > +       if (fsnotify_add_mark(&new->mark, entry->group, entry->inode, NULL, 1)) {
>> >                 fsnotify_put_mark(&new->mark);
>> >                 goto Fallback;
>> >         }
>> > @@ -395,8 +394,7 @@ static int tag_chunk(struct inode *inode, struct audit_tree *tree)
>> >                 return -ENOENT;
>> >         }
>> >
>> > -       fsnotify_duplicate_mark(chunk_entry, old_entry);
>> > -       if (fsnotify_add_mark(chunk_entry, chunk_entry->group, chunk_entry->inode, NULL, 1)) {
>> > +       if (fsnotify_add_mark(chunk_entry, old_entry->group, old_entry->inode, NULL, 1)) {
>> >                 spin_unlock(&old_entry->lock);
>> >                 fsnotify_put_mark(chunk_entry);
>> >                 fsnotify_put_mark(old_entry);
>> > --
>> > 2.10.2
>> >
>>
>> --
>> paul moore
>> www.paul-moore.com
> --
> Jan Kara <jack@suse.com>
> SUSE Labs, CR



-- 
paul moore
www.paul-moore.com


* Re: [PATCH 06/22] audit: Abstract hash key handling
  2016-12-23 13:27     ` Jan Kara
@ 2016-12-23 14:13       ` Paul Moore
  2017-01-03 17:34         ` Jan Kara
  0 siblings, 1 reply; 80+ messages in thread
From: Paul Moore @ 2016-12-23 14:13 UTC (permalink / raw)
  To: Jan Kara
  Cc: linux-fsdevel, Amir Goldstein, Lino Sanfilippo, Miklos Szeredi,
	linux-audit

On Fri, Dec 23, 2016 at 8:27 AM, Jan Kara <jack@suse.cz> wrote:
> On Thu 22-12-16 18:27:40, Paul Moore wrote:
>> On Thu, Dec 22, 2016 at 4:15 AM, Jan Kara <jack@suse.cz> wrote:
>> > Audit tree currently uses inode pointer as a key into the hash table.
>> > Getting that from notification mark will be somewhat more difficult with
>> > coming fsnotify changes and there's no reason we really have to use the
>> > inode pointer. So abstract getting of hash key from the audit chunk and
>> > inode so that we can switch to a different key easily later.
>> >
>> > CC: Paul Moore <paul@paul-moore.com>
>> > Signed-off-by: Jan Kara <jack@suse.cz>
>> > ---
>> >  kernel/audit_tree.c | 39 ++++++++++++++++++++++++++++-----------
>> >  1 file changed, 28 insertions(+), 11 deletions(-)
>>
>> I have no objections with this patch in particular, but in patch 8,
>> are you certain that inode_to_key() and chunk_to_key() will continue
>> to return the same key value?
>
> Yes, that's the intention. Or rather, in that patch the key will no longer
> be the inode pointer but the fsnotify_list pointer instead. It would still
> match between chunks attached to an inode and the inode itself, so comparison
> results should stay the same.

My apologies, I probably should have been more clear.

Yes, I think we are all in agreement that the *_to_key() functions
need to return a consistent value for the same object.  My concern is
that in patch 8 these functions are using different variables (granted
they may contain the same value, and therefore evaluate to the same
key) and I worry that there is a possibility of the two variables
taking on different values and breaking the hash.  What guarantees
exist that these values will be the same?  Are there any safeguards to
prevent future patches from accidentally sidestepping these
guarantees?
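
One possible shape for such a safeguard, purely as a sketch and not a proposal
taken from the series: funnel both helpers through a single function so the
key is computed in exactly one place, and add a cheap consistency check. The
struct layouts, field names, list_to_key() and keys_consistent() are all
assumptions for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins; the field names are assumptions. */
struct mark_list { int unused; };		/* shared list head */
struct inode { struct mark_list *marks; };	/* inode points at the head */
struct chunk { struct mark_list *head; };	/* chunk's mark points at it too */

/* The only place where a key is actually computed. */
static uintptr_t list_to_key(const struct mark_list *l)
{
	return (uintptr_t)l;
}

static uintptr_t inode_to_key(const struct inode *inode)
{
	return list_to_key(inode->marks);
}

static uintptr_t chunk_to_key(const struct chunk *chunk)
{
	return list_to_key(chunk->head);
}

/* Guard: a chunk may be hashed under an inode only if both keys agree. */
static bool keys_consistent(const struct chunk *chunk, const struct inode *inode)
{
	uintptr_t key = chunk_to_key(chunk);

	return key != 0 && key == inode_to_key(inode);
}
```

A check like keys_consistent() could back a debug-only warning at insertion
time, so that a later patch that lets the two variables diverge is noticed
rather than silently breaking lookups.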

-- 
paul moore
www.paul-moore.com


* Re: [PATCH 05/22] audit: Fix sleep in atomic
  2016-12-23 13:24     ` Jan Kara
@ 2016-12-23 14:17       ` Paul Moore
  2016-12-26 16:33         ` Paul Moore
  0 siblings, 1 reply; 80+ messages in thread
From: Paul Moore @ 2016-12-23 14:17 UTC (permalink / raw)
  To: Jan Kara
  Cc: linux-fsdevel, Amir Goldstein, Lino Sanfilippo, Miklos Szeredi,
	linux-audit

On Fri, Dec 23, 2016 at 8:24 AM, Jan Kara <jack@suse.cz> wrote:
> On Thu 22-12-16 18:18:36, Paul Moore wrote:
>> On Thu, Dec 22, 2016 at 4:15 AM, Jan Kara <jack@suse.cz> wrote:
>> > Audit tree code was happily adding new notification marks while holding
>> > spinlocks. Since fsnotify_add_mark() acquires group->mark_mutex this can
>> > lead to sleeping while holding a spinlock, deadlocks due to lock
>> > inversion, and probably other fun. Fix the problem by acquiring
>> > group->mark_mutex earlier.
>> >
>> > CC: Paul Moore <paul@paul-moore.com>
>> > Signed-off-by: Jan Kara <jack@suse.cz>
>> > ---
>> >  kernel/audit_tree.c | 13 +++++++++++--
>> >  1 file changed, 11 insertions(+), 2 deletions(-)
>>
>> [SIDE NOTE: this patch explains your comments and my earlier concern
>> about the locked/unlocked variants of fsnotify_add_mark() in
>> untag_chunk()]
>>
>> Ouch.  Thanks for catching this ... what is your goal with these
>> patches, are you targeting this as a fix during the v4.10-rcX cycle?
>> If not, any objections if I pull this patch into the audit tree and
>> send this to Linus during the v4.10-rcX cycle (assuming it passes
>> testing, yadda yadda)?
>
> Sure, go ahead. I plan these patches for the next merge window. So I can
> rebase the series once you merge audit fixes...

Okay, great.  I'll merge this patch in the audit/stable-4.10 branch
for Linus but there will likely be some delays due to
holidays/vacation on my end.

Thanks again for your help fixing this, I really appreciate it.
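
The shape of the fix, taking the sleeping lock before any spinlock and then
calling only the _locked variant, can be sketched in userspace as follows. The
names are hypothetical, an atomic_flag busy-wait lock stands in for the kernel
spinlock, and a pthread mutex stands in for group->mark_mutex.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

static pthread_mutex_t mark_mutex = PTHREAD_MUTEX_INITIALIZER;	/* may sleep */
static atomic_flag entry_lock = ATOMIC_FLAG_INIT;		/* toy spinlock */
static int marks;			/* protected by mark_mutex */

static void toy_spin_lock(atomic_flag *l)
{
	while (atomic_flag_test_and_set_explicit(l, memory_order_acquire))
		;			/* busy-wait, never sleeps */
}

static void toy_spin_unlock(atomic_flag *l)
{
	atomic_flag_clear_explicit(l, memory_order_release);
}

/* Requires mark_mutex; takes no sleeping lock itself. */
static int add_mark_locked(void)
{
	return ++marks;
}

/* Hypothetical caller following the ordering introduced by the fix. */
static int retag(void)
{
	int ret;

	pthread_mutex_lock(&mark_mutex);	/* sleeping lock first */
	toy_spin_lock(&entry_lock);		/* then the spinlock */
	ret = add_mark_locked();		/* nothing here can sleep */
	toy_spin_unlock(&entry_lock);
	pthread_mutex_unlock(&mark_mutex);
	return ret;
}
```

Calling the unlocked wrapper at the same point would try to take the mutex
while the spinlock is held, which is the sleep-in-atomic bug being fixed.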

>> > diff --git a/kernel/audit_tree.c b/kernel/audit_tree.c
>> > index f3130eb0a4bd..156b6a93f4fc 100644
>> > --- a/kernel/audit_tree.c
>> > +++ b/kernel/audit_tree.c
>> > @@ -231,6 +231,7 @@ static void untag_chunk(struct node *p)
>> >         if (size)
>> >                 new = alloc_chunk(size);
>> >
>> > +       mutex_lock(&entry->group->mark_mutex);
>> >         spin_lock(&entry->lock);
>> >         if (chunk->dead || !entry->inode) {
>> >                 spin_unlock(&entry->lock);
>> > @@ -258,7 +259,8 @@ static void untag_chunk(struct node *p)
>> >         if (!new)
>> >                 goto Fallback;
>> >
>> > -       if (fsnotify_add_mark(&new->mark, entry->group, entry->inode, NULL, 1)) {
>> > +       if (fsnotify_add_mark_locked(&new->mark, entry->group, entry->inode,
>> > +                                    NULL, 1)) {
>> >                 fsnotify_put_mark(&new->mark);
>> >                 goto Fallback;
>> >         }
>> > @@ -309,6 +311,7 @@ static void untag_chunk(struct node *p)
>> >         spin_unlock(&hash_lock);
>> >         spin_unlock(&entry->lock);
>> >  out:
>> > +       mutex_unlock(&entry->group->mark_mutex);
>> >         fsnotify_put_mark(entry);
>> >         spin_lock(&hash_lock);
>> >  }
>> > @@ -385,17 +388,21 @@ static int tag_chunk(struct inode *inode, struct audit_tree *tree)
>> >
>> >         chunk_entry = &chunk->mark;
>> >
>> > +       mutex_lock(&old_entry->group->mark_mutex);
>> >         spin_lock(&old_entry->lock);
>> >         if (!old_entry->inode) {
>> >                 /* old_entry is being shot, lets just lie */
>> >                 spin_unlock(&old_entry->lock);
>> > +               mutex_unlock(&old_entry->group->mark_mutex);
>> >                 fsnotify_put_mark(old_entry);
>> >                 free_chunk(chunk);
>> >                 return -ENOENT;
>> >         }
>> >
>> > -       if (fsnotify_add_mark(chunk_entry, old_entry->group, old_entry->inode, NULL, 1)) {
>> > +       if (fsnotify_add_mark_locked(chunk_entry, old_entry->group,
>> > +                                    old_entry->inode, NULL, 1)) {
>> >                 spin_unlock(&old_entry->lock);
>> > +               mutex_unlock(&old_entry->group->mark_mutex);
>> >                 fsnotify_put_mark(chunk_entry);
>> >                 fsnotify_put_mark(old_entry);
>> >                 return -ENOSPC;
>> > @@ -411,6 +418,7 @@ static int tag_chunk(struct inode *inode, struct audit_tree *tree)
>> >                 chunk->dead = 1;
>> >                 spin_unlock(&chunk_entry->lock);
>> >                 spin_unlock(&old_entry->lock);
>> > +               mutex_unlock(&old_entry->group->mark_mutex);
>> >
>> >                 fsnotify_destroy_mark(chunk_entry, audit_tree_group);
>> >
>> > @@ -443,6 +451,7 @@ static int tag_chunk(struct inode *inode, struct audit_tree *tree)
>> >         spin_unlock(&hash_lock);
>> >         spin_unlock(&chunk_entry->lock);
>> >         spin_unlock(&old_entry->lock);
>> > +       mutex_unlock(&old_entry->group->mark_mutex);
>> >         fsnotify_destroy_mark(old_entry, audit_tree_group);
>> >         fsnotify_put_mark(chunk_entry); /* drop initial reference */
>> >         fsnotify_put_mark(old_entry); /* pair to fsnotify_find mark_entry */
>> > --
>> > 2.10.2
>> >

-- 
paul moore
www.paul-moore.com


* Re: [PATCH 12/22] fsnotify: Move queueing of mark for destruction into fsnotify_put_mark()
  2016-12-22  9:15 ` [PATCH 12/22] fsnotify: Move queueing of mark for destruction into fsnotify_put_mark() Jan Kara
@ 2016-12-26 14:15   ` Amir Goldstein
  2017-01-04 10:28     ` Jan Kara
  0 siblings, 1 reply; 80+ messages in thread
From: Amir Goldstein @ 2016-12-26 14:15 UTC (permalink / raw)
  To: Jan Kara; +Cc: linux-fsdevel, Lino Sanfilippo, Miklos Szeredi, Paul Moore

On Thu, Dec 22, 2016 at 11:15 AM, Jan Kara <jack@suse.cz> wrote:
> Currently we queue mark into a list of marks for destruction in
> __fsnotify_free_mark() and keep last mark reference dangling. After the
> worker waits for SRCU period, it drops the last reference to the mark
> which frees it. This scheme has the disadvantage that if we hold
> reference to mark and drop and reacquire SRCU lock, the mark can get
> freed immediately which is slightly inconvenient and we will need to
> avoid this in the future.
>
> Move to a scheme where queueing of mark into a list of marks for
> destruction happens when the last reference to the mark is dropped. Also
> drop reference to the mark held by group list already when mark is
> removed from that list instead of dropping it only from the destruction
> worker.
>

The BEFORE section refers to what SRCU protects, which this patch
slightly changes. Can you please add to the AFTER section what is protected
by SRCU after this patch? IIUC, SRCU protects from freeing the mark,
but it does not protect from removing the mark from the group list, so after
dropping and reacquiring SRCU with an elevated mark refcount, the mark can
find itself not on any list, for example in inotify_handle_event() when
calling fsnotify_destroy_mark() for an IN_ONESHOT mark.

Please clarify.
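
As a reading aid, the scheme described in the quoted commit message (queue the
mark for destruction only when its last reference is dropped) can be sketched
in userspace like this. Everything here is a hypothetical simplification: a
pthread mutex replaces destroy_lock, a singly linked list replaces
destroy_list, and the SRCU grace period is only marked by a comment.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical, simplified mark. */
struct mark {
	atomic_int refcnt;
	struct mark *destroy_next;	/* protected by destroy_lock */
};

static pthread_mutex_t destroy_lock = PTHREAD_MUTEX_INITIALIZER;
static struct mark *destroy_list;

/* Returns 1 if this was the last reference and the mark was queued. */
static int put_mark(struct mark *m)
{
	if (atomic_fetch_sub(&m->refcnt, 1) != 1)
		return 0;
	pthread_mutex_lock(&destroy_lock);
	m->destroy_next = destroy_list;
	destroy_list = m;
	pthread_mutex_unlock(&destroy_lock);
	/* The kernel code would schedule the delayed reaper work here. */
	return 1;
}

/* Reaper: detach the whole queue, then release each mark; returns the count. */
static int reap_marks(void)
{
	struct mark *m, *next;
	int freed = 0;

	pthread_mutex_lock(&destroy_lock);
	m = destroy_list;
	destroy_list = NULL;
	pthread_mutex_unlock(&destroy_lock);

	/* The kernel worker would wait for an SRCU grace period here. */
	for (; m; m = next) {
		next = m->destroy_next;
		m->destroy_next = NULL;	/* a real reaper would free m */
		freed++;
	}
	return freed;
}
```

Under this scheme a holder of a reference keeps the mark out of the destroy
queue, although, as noted above, nothing here keeps it on the group list.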

> Signed-off-by: Jan Kara <jack@suse.cz>
> ---
>  fs/notify/mark.c | 77 ++++++++++++++++++++++----------------------------------
>  1 file changed, 30 insertions(+), 47 deletions(-)
>
> diff --git a/fs/notify/mark.c b/fs/notify/mark.c
> index 60f5754ce5ed..fee4255e9227 100644
> --- a/fs/notify/mark.c
> +++ b/fs/notify/mark.c
> @@ -105,6 +105,7 @@ static DECLARE_WORK(list_reaper_work, fsnotify_list_destroy_workfn);
>
>  void fsnotify_get_mark(struct fsnotify_mark *mark)
>  {
> +       WARN_ON_ONCE(!atomic_read(&mark->refcnt));
>         atomic_inc(&mark->refcnt);
>  }
>
> @@ -239,26 +240,32 @@ void fsnotify_put_mark(struct fsnotify_mark *mark)
>                  * from __fsnotify_parent() lazily when next event happens on
>                  * one of our children.
>                  */
> -               fsnotify_final_mark_destroy(mark);
> +               spin_lock(&destroy_lock);
> +               list_add(&mark->g_list, &destroy_list);
> +               spin_unlock(&destroy_lock);
> +               queue_delayed_work(system_unbound_wq, &reaper_work,
> +                                  FSNOTIFY_REAPER_DELAY);
>         }
>  }
>
>  /*
>   * Mark mark as dead, remove it from group list. Mark still stays in object
> - * list until its last reference is dropped. The reference corresponding to
> - * group list gets dropped after SRCU period ends from
> - * fsnotify_mark_destroy_list(). Note that we rely on mark being removed from
> - * group list before corresponding reference to it is dropped. In particular we
> - * rely on mark->obj_list_head being valid while we hold group->mark_mutex if
> - * we found the mark through g_list.
> + * list until its last reference is dropped.  Note that we rely on mark being
> + * removed from group list before corresponding reference to it is dropped. In
> + * particular we rely on mark->obj_list_head being valid while we hold
> + * group->mark_mutex if we found the mark through g_list.
>   *
> - * Must be called with group->mark_mutex held.
> + * Must be called with group->mark_mutex held. The caller must either hold
> + * reference to the mark or be protected by fsnotify_mark_srcu.
>   */
>  void fsnotify_detach_mark(struct fsnotify_mark *mark)
>  {
>         struct fsnotify_group *group = mark->group;
>
> -       BUG_ON(!mutex_is_locked(&group->mark_mutex));
> +       WARN_ON_ONCE(!mutex_is_locked(&group->mark_mutex));
> +       WARN_ON_ONCE(!srcu_read_lock_held(&fsnotify_mark_srcu) &&
> +                    atomic_read(&mark->refcnt) < 1 +
> +                       !!(mark->flags & FSNOTIFY_MARK_FLAG_ATTACHED));
>
>         spin_lock(&mark->lock);
>         /* something else already called this function on this mark */
> @@ -271,18 +278,20 @@ void fsnotify_detach_mark(struct fsnotify_mark *mark)
>         spin_unlock(&mark->lock);
>
>         atomic_dec(&group->num_marks);
> +
> +       /* Drop mark reference acquired in fsnotify_add_mark_locked() */
> +       fsnotify_put_mark(mark);
>  }
>
>  /*
> - * Prepare mark for freeing and add it to the list of marks prepared for
> - * freeing. The actual freeing must happen after SRCU period ends and the
> - * caller is responsible for this.
> + * Free fsnotify mark. The mark is actually only marked as being freed.  The
> + * freeing is actually happening only once last reference to the mark is
> + * dropped from a workqueue which first waits for srcu period end.
>   *
> - * The function returns true if the mark was added to the list of marks for
> - * freeing. The function returns false if someone else has already called
> - * __fsnotify_free_mark() for the mark.
> + * Caller must have a reference to the mark or be protected by
> + * fsnotify_mark_srcu.
>   */
> -static bool __fsnotify_free_mark(struct fsnotify_mark *mark)
> +void fsnotify_free_mark(struct fsnotify_mark *mark)
>  {
>         struct fsnotify_group *group = mark->group;
>
> @@ -290,7 +299,7 @@ static bool __fsnotify_free_mark(struct fsnotify_mark *mark)
>         /* something else already called this function on this mark */
>         if (!(mark->flags & FSNOTIFY_MARK_FLAG_ALIVE)) {
>                 spin_unlock(&mark->lock);
> -               return false;
> +               return;
>         }
>         mark->flags &= ~FSNOTIFY_MARK_FLAG_ALIVE;
>         spin_unlock(&mark->lock);
> @@ -302,25 +311,6 @@ static bool __fsnotify_free_mark(struct fsnotify_mark *mark)
>          */
>         if (group->ops->freeing_mark)
>                 group->ops->freeing_mark(mark, group);
> -
> -       spin_lock(&destroy_lock);
> -       list_add(&mark->g_list, &destroy_list);
> -       spin_unlock(&destroy_lock);
> -
> -       return true;
> -}
> -
> -/*
> - * Free fsnotify mark. The freeing is actually happening from a workqueue which
> - * first waits for srcu period end. Caller must have a reference to the mark
> - * or be protected by fsnotify_mark_srcu.
> - */
> -void fsnotify_free_mark(struct fsnotify_mark *mark)
> -{
> -       if (__fsnotify_free_mark(mark)) {
> -               queue_delayed_work(system_unbound_wq, &reaper_work,
> -                                  FSNOTIFY_REAPER_DELAY);
> -       }
>  }
>
>  void fsnotify_destroy_mark(struct fsnotify_mark *mark,
> @@ -537,20 +527,13 @@ int fsnotify_add_mark_locked(struct fsnotify_mark *mark,
>
>         return ret;
>  err:
> -       mark->flags &= ~FSNOTIFY_MARK_FLAG_ALIVE;
> +       mark->flags &= ~(FSNOTIFY_MARK_FLAG_ALIVE |
> +                        FSNOTIFY_MARK_FLAG_ATTACHED);
>         list_del_init(&mark->g_list);
> -       fsnotify_put_group(group);

fsnotify_put_group() is removed here by this patch and not added anywhere
else, nor is any fsnotify_get_group() call removed. Is this a leak, or is it
maybe fixed later in the series?

> -       mark->group = NULL;
>         atomic_dec(&group->num_marks);
> -
>         spin_unlock(&mark->lock);
>
> -       spin_lock(&destroy_lock);
> -       list_add(&mark->g_list, &destroy_list);
> -       spin_unlock(&destroy_lock);
> -       queue_delayed_work(system_unbound_wq, &reaper_work,
> -                               FSNOTIFY_REAPER_DELAY);
> -
> +       fsnotify_put_mark(mark);
>         return ret;
>  }
>
> @@ -724,7 +707,7 @@ static void fsnotify_mark_destroy_workfn(struct work_struct *work)
>
>         list_for_each_entry_safe(mark, next, &private_destroy_list, g_list) {
>                 list_del_init(&mark->g_list);
> -               fsnotify_put_mark(mark);
> +               fsnotify_final_mark_destroy(mark);
>         }
>  }
>
> --
> 2.10.2
>

* Re: [PATCH 13/22] fsnotify: Provide framework for dropping SRCU lock in ->handle_event
  2016-12-22  9:15 ` [PATCH 13/22] fsnotify: Provide framework for dropping SRCU lock in ->handle_event Jan Kara
@ 2016-12-26 15:01   ` Amir Goldstein
  2016-12-26 15:11   ` Amir Goldstein
  2016-12-26 18:37   ` Amir Goldstein
  2 siblings, 0 replies; 80+ messages in thread
From: Amir Goldstein @ 2016-12-26 15:01 UTC (permalink / raw)
  To: Jan Kara; +Cc: linux-fsdevel, Lino Sanfilippo, Miklos Szeredi, Paul Moore

On Thu, Dec 22, 2016 at 11:15 AM, Jan Kara <jack@suse.cz> wrote:
> fanotify wants to drop fsnotify_mark_srcu lock when waiting for response
> from userspace so that the whole notification subsystem is not blocked
> during that time. This patch provides a framework for safely getting
> mark reference for a mark found in the object list which pins the mark
> in that list. We can then drop fsnotify_mark_srcu, wait for userspace
> response and then safely continue iteration of the object list once we
> reacquire fsnotify_mark_srcu.
>
> Signed-off-by: Jan Kara <jack@suse.cz>

Looks good

Reviewed-by: Amir Goldstein <amir73il@gmail.com>

* Re: [PATCH 13/22] fsnotify: Provide framework for dropping SRCU lock in ->handle_event
  2016-12-22  9:15 ` [PATCH 13/22] fsnotify: Provide framework for dropping SRCU lock in ->handle_event Jan Kara
  2016-12-26 15:01   ` Amir Goldstein
@ 2016-12-26 15:11   ` Amir Goldstein
  2017-01-04  9:03     ` Jan Kara
  2016-12-26 18:37   ` Amir Goldstein
  2 siblings, 1 reply; 80+ messages in thread
From: Amir Goldstein @ 2016-12-26 15:11 UTC (permalink / raw)
  To: Jan Kara; +Cc: linux-fsdevel, Lino Sanfilippo, Miklos Szeredi, Paul Moore

On Thu, Dec 22, 2016 at 11:15 AM, Jan Kara <jack@suse.cz> wrote:
> fanotify wants to drop fsnotify_mark_srcu lock when waiting for response
> from userspace so that the whole notification subsystem is not blocked
> during that time. This patch provides a framework for safely getting
> mark reference for a mark found in the object list which pins the mark
> in that list. We can then drop fsnotify_mark_srcu, wait for userspace
> response and then safely continue iteration of the object list once we
> reacquire fsnotify_mark_srcu.
>
> Signed-off-by: Jan Kara <jack@suse.cz>
> ---
...
> +       /*
> +        * Now that both marks are pinned by refcount we can drop SRCU lock.
> +        * Marks can still be removed from the list but because of refcount
> +        * they cannot be destroyed and we can safely resume the list iteration
> +        * once userspace returns.
> +        */

Sorry, forgot to comment on this.
"Marks can still be removed from the list ...
... and we can safely resume the list iteration"

I suppose you are planning to get the mechanics right by replacing
hlist_del_init() with just __hlist_del()? Still, this sentence is confusing.
Usually, it wouldn't be safe to resume iteration if items may have been
removed, so perhaps rephrase or clarify.
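
To illustrate the concern, here is a minimal userspace sketch, assuming a toy
singly linked list. The toy_node type and the toy_* helpers are hypothetical
and are not kernel APIs; they only mimic the difference between an unlink that
resets the forward pointer and one that leaves it alone.

```c
#include <assert.h>
#include <stddef.h>

/* Toy singly linked node; hypothetical, not the kernel's struct hlist_node. */
struct toy_node {
	int id;
	struct toy_node *next;
};

/* Find the link (head or some ->next field) that points at 'victim'. */
static struct toy_node **toy_find_link(struct toy_node **head,
				       struct toy_node *victim)
{
	struct toy_node **pp = head;

	while (*pp && *pp != victim)
		pp = &(*pp)->next;
	return pp;
}

/* Unlink and reset the forward pointer, in the spirit of hlist_del_init(). */
static void toy_del_init(struct toy_node **head, struct toy_node *victim)
{
	struct toy_node **pp = toy_find_link(head, victim);

	if (*pp) {
		*pp = victim->next;
		victim->next = NULL;	/* position in the list is forgotten */
	}
}

/* Unlink but leave victim->next alone, in the spirit of __hlist_del(). */
static void toy_del_keep_next(struct toy_node **head, struct toy_node *victim)
{
	struct toy_node **pp = toy_find_link(head, victim);

	if (*pp)
		*pp = victim->next;	/* victim still points at its successor */
}

/* Number of nodes an iterator parked on 'pos' can still reach. */
static int toy_count_after(const struct toy_node *pos)
{
	int n = 0;

	for (pos = pos->next; pos; pos = pos->next)
		n++;
	return n;
}
```

With a list 1 -> 2 -> 3 and an iterator parked on node 2, toy_del_init() leaves
the iterator with nothing to resume from, while toy_del_keep_next() still lets
it reach node 3. Even then, resuming is only safe if the successor is itself
still alive, which is the part the comment would need to spell out.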

* Re: [PATCH 14/22] fsnotify: Pass SRCU index into handle_event handler
  2016-12-22  9:15 ` [PATCH 14/22] fsnotify: Pass SRCU index into handle_event handler Jan Kara
@ 2016-12-26 15:13   ` Amir Goldstein
  0 siblings, 0 replies; 80+ messages in thread
From: Amir Goldstein @ 2016-12-26 15:13 UTC (permalink / raw)
  To: Jan Kara; +Cc: linux-fsdevel, Lino Sanfilippo, Miklos Szeredi, Paul Moore

On Thu, Dec 22, 2016 at 11:15 AM, Jan Kara <jack@suse.cz> wrote:
> Pass index acquired from srcu_read_lock into ->handle_event() handler so
> that it can release and reacquire SRCU lock via
> fsnotify_prepare_user_wait() and fsnotify_finish_user_wait() functions.
> These functions also make sure current marks are appropriately pinned so
> that iteration protected by srcu in fsnotify() stays safe.
>
> Signed-off-by: Jan Kara <jack@suse.cz>

Reviewed-by: Amir Goldstein <amir73il@gmail.com>

* Re: [PATCH 15/22] fanotify: Release SRCU lock when waiting for userspace response
  2016-12-22  9:15 ` [PATCH 15/22] fanotify: Release SRCU lock when waiting for userspace response Jan Kara
@ 2016-12-26 15:22   ` Amir Goldstein
  2017-01-04  9:05     ` Jan Kara
  0 siblings, 1 reply; 80+ messages in thread
From: Amir Goldstein @ 2016-12-26 15:22 UTC (permalink / raw)
  To: Jan Kara; +Cc: linux-fsdevel, Lino Sanfilippo, Miklos Szeredi, Paul Moore

On Thu, Dec 22, 2016 at 11:15 AM, Jan Kara <jack@suse.cz> wrote:
> When userspace task processing fanotify permission events screws up and
> does not respond, fsnotify_mark_srcu SRCU is held indefinitely which
> causes further hangs in the whole notification subsystem. Although we
> cannot easily solve the problem of operations blocked waiting for
> response from userspace, we can at least somewhat localize the damage by
> dropping SRCU lock before waiting for userspace response and reacquiring
> it when userspace responds.
>
> Signed-off-by: Jan Kara <jack@suse.cz>
> ---

Looks good. One nit below.
Reviewed-by: Amir Goldstein <amir73il@gmail.com>


>  fs/notify/fanotify/fanotify.c | 17 +++++++++++++++--
>  1 file changed, 15 insertions(+), 2 deletions(-)
>
> diff --git a/fs/notify/fanotify/fanotify.c b/fs/notify/fanotify/fanotify.c
> index 2e8ca885fb3e..284d2d112ad2 100644
> --- a/fs/notify/fanotify/fanotify.c
> +++ b/fs/notify/fanotify/fanotify.c
> @@ -61,7 +61,10 @@ static int fanotify_merge(struct list_head *list, struct fsnotify_event *event)
>
>  #ifdef CONFIG_FANOTIFY_ACCESS_PERMISSIONS
>  static int fanotify_get_response(struct fsnotify_group *group,
> -                                struct fanotify_perm_event_info *event)
> +                                struct fsnotify_mark *inode_mark,
> +                                struct fsnotify_mark *vfsmount_mark,
> +                                struct fanotify_perm_event_info *event,
> +                                int *srcu_idx)
>  {
>         int ret;
>
> @@ -69,6 +72,15 @@ static int fanotify_get_response(struct fsnotify_group *group,
>
>         wait_event(group->fanotify_data.access_waitq, event->response);
>
> +       if (!fsnotify_prepare_user_wait(inode_mark, vfsmount_mark, srcu_idx)) {

Since the conditions under which fsnotify_prepare_user_wait() can fail are
not clear to a reader of this code, a comment here explaining the choice of
FAN_ALLOW would be nice.

> +               event->response = FAN_ALLOW;
> +               goto out;
> +       }
> +
> +       wait_event(group->fanotify_data.access_waitq, event->response);
> +
> +       fsnotify_finish_user_wait(inode_mark, vfsmount_mark, srcu_idx);
> +out:
>         /* userspace responded, convert to something usable */
>         switch (event->response) {
>         case FAN_ALLOW:
> @@ -220,7 +232,8 @@ static int fanotify_handle_event(struct fsnotify_group *group,
>
>  #ifdef CONFIG_FANOTIFY_ACCESS_PERMISSIONS
>         if (mask & FAN_ALL_PERM_EVENTS) {
> -               ret = fanotify_get_response(group, FANOTIFY_PE(fsn_event));
> +               ret = fanotify_get_response(group, inode_mark, fanotify_mark,
> +                                           FANOTIFY_PE(fsn_event), srcu_idx);
>                 fsnotify_destroy_event(group, fsn_event);
>         }
>  #endif
> --
> 2.10.2
>
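
As a rough illustration of the fallback discussed above, here is a toy model
of the control flow. All toy_* names are hypothetical. The 'pinned' argument
stands in for the result of fsnotify_prepare_user_wait(), which according to
the comment in patch 13 can fail if a mark is being removed. The mapping of a
denial to -EPERM is an assumption of this sketch, not something shown in the
hunk.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Toy response codes; hypothetical stand-ins for FAN_ALLOW / FAN_DENY. */
enum toy_response { TOY_NONE = 0, TOY_ALLOW = 1, TOY_DENY = 2 };

/*
 * 'pinned' models whether the marks could be pinned before dropping SRCU;
 * 'user_reply' models what userspace would answer if we did wait.
 */
static int toy_get_response(bool pinned, enum toy_response user_reply)
{
	enum toy_response response;

	if (!pinned) {
		/*
		 * The marks could not be pinned, so waiting for userspace
		 * is not possible here. The access is allowed instead of
		 * being blocked or denied.
		 */
		response = TOY_ALLOW;
	} else {
		response = user_reply;
	}

	/* Convert the response to an errno-style return (assumed mapping). */
	return response == TOY_ALLOW ? 0 : -EPERM;
}
```

In this model a failed pin never reaches userspace at all, so a listener that
would have denied the access is silently bypassed. That is one reason a
comment justifying the choice seems worthwhile.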

* Re: [PATCH 05/22] audit: Fix sleep in atomic
  2016-12-23 14:17       ` Paul Moore
@ 2016-12-26 16:33         ` Paul Moore
  2017-01-02 18:21           ` Jan Kara
  0 siblings, 1 reply; 80+ messages in thread
From: Paul Moore @ 2016-12-26 16:33 UTC (permalink / raw)
  To: Jan Kara
  Cc: linux-fsdevel, Amir Goldstein, Lino Sanfilippo, Miklos Szeredi,
	linux-audit

On Fri, Dec 23, 2016 at 9:17 AM, Paul Moore <paul@paul-moore.com> wrote:
> On Fri, Dec 23, 2016 at 8:24 AM, Jan Kara <jack@suse.cz> wrote:
>> On Thu 22-12-16 18:18:36, Paul Moore wrote:
>>> On Thu, Dec 22, 2016 at 4:15 AM, Jan Kara <jack@suse.cz> wrote:
>>> > Audit tree code was happily adding new notification marks while holding
>>> > spinlocks. Since fsnotify_add_mark() acquires group->mark_mutex this can
>>> > lead to sleeping while holding a spinlock, deadlocks due to lock
>>> > inversion, and probably other fun. Fix the problem by acquiring
>>> > group->mark_mutex earlier.
>>> >
>>> > CC: Paul Moore <paul@paul-moore.com>
>>> > Signed-off-by: Jan Kara <jack@suse.cz>
>>> > ---
>>> >  kernel/audit_tree.c | 13 +++++++++++--
>>> >  1 file changed, 11 insertions(+), 2 deletions(-)
>>>
>>> [SIDE NOTE: this patch explains your comments and my earlier concern
>>> about the locked/unlocked variants of fsnotify_add_mark() in
>>> untag_chunk()]
>>>
>>> Ouch.  Thanks for catching this ... what is your goal with these
>>> patches, are you targeting this as a fix during the v4.10-rcX cycle?
>>> If not, any objections if I pull this patch into the audit tree and
>>> send this to Linus during the v4.10-rcX cycle (assuming it passes
>>> testing, yadda yadda)?
>>
>> Sure, go ahead. I plan these patches for the next merge window. So I can
>> rebase the series once you merge audit fixes...
>
> Okay, great.  I'll merge this patch in the audit/stable-4.10 branch
> for Linus but there will likely be some delays due to
> holidays/vacation on my end.
>
> Thanks again for your help fixing this, I really appreciate it.

I merged this patch, as well as the "Remove fsnotify_duplicate_mark()"
patch (to make things cleaner when merging this patch) and did a quick
test using the audit-testsuite ... the test hung on the "file_create"
tests.  Unfortunately, I'm traveling right now for the holidays and
will not likely have a chance to debug this much further until after
the new year, but I thought I would mention it in case you had some
time to look into this failure.

For reference, here is the audit-testsuite again:

* https://github.com/linux-audit/audit-testsuite

... and if you have a Fedora test system, here is the Rawhide kernel I
used to test (it is basically my kernel-secnext test kernel with those
two patches mentioned above added on top):

* https://copr.fedorainfracloud.org/coprs/pcmoore/kernel-testing/build/492386

-- 
paul moore
www.paul-moore.com

* Re: [PATCH 16/22] fsnotify: Remove fsnotify_set_mark_{,ignored_}mask_locked()
  2016-12-22  9:15 ` [PATCH 16/22] fsnotify: Remove fsnotify_set_mark_{,ignored_}mask_locked() Jan Kara
@ 2016-12-26 16:42   ` Amir Goldstein
  0 siblings, 0 replies; 80+ messages in thread
From: Amir Goldstein @ 2016-12-26 16:42 UTC (permalink / raw)
  To: Jan Kara; +Cc: linux-fsdevel, Lino Sanfilippo, Miklos Szeredi, Paul Moore

On Thu, Dec 22, 2016 at 11:15 AM, Jan Kara <jack@suse.cz> wrote:
> These helpers are now only a simple assignment and just obfuscate
> what is going on. Remove them.
>
> Signed-off-by: Jan Kara <jack@suse.cz>
> ---
Reviewed-by: Amir Goldstein <amir73il@gmail.com>

* Re: [PATCH 17/22] fsnotify: Remove fsnotify_recalc_{inode|vfsmount}_mask()
  2016-12-22  9:15 ` [PATCH 17/22] fsnotify: Remove fsnotify_recalc_{inode|vfsmount}_mask() Jan Kara
@ 2016-12-26 16:44   ` Amir Goldstein
  0 siblings, 0 replies; 80+ messages in thread
From: Amir Goldstein @ 2016-12-26 16:44 UTC (permalink / raw)
  To: Jan Kara; +Cc: linux-fsdevel, Lino Sanfilippo, Miklos Szeredi, Paul Moore

On Thu, Dec 22, 2016 at 11:15 AM, Jan Kara <jack@suse.cz> wrote:
> These helpers are just very thin wrappers now. Remove them.
>
> Signed-off-by: Jan Kara <jack@suse.cz>

Reviewed-by: Amir Goldstein <amir73il@gmail.com>

* Re: [PATCH 18/22] fsnotify: Inline fsnotify_clear_{inode|vfsmount|_mark_group()
  2016-12-22  9:15 ` [PATCH 18/22] fsnotify: Inline fsnotify_clear_{inode|vfsmount|_mark_group() Jan Kara
@ 2016-12-26 16:57   ` Amir Goldstein
  2017-01-04  9:28     ` Jan Kara
  0 siblings, 1 reply; 80+ messages in thread
From: Amir Goldstein @ 2016-12-26 16:57 UTC (permalink / raw)
  To: Jan Kara; +Cc: linux-fsdevel, Lino Sanfilippo, Miklos Szeredi, Paul Moore

On Thu, Dec 22, 2016 at 11:15 AM, Jan Kara <jack@suse.cz> wrote:
> Inline these helpers as they are very thin. We still keep them as we
> don't want to expose details about how list type is determined.
>
> Signed-off-by: Jan Kara <jack@suse.cz>

This patch looks good, but see comment below about suggested extra cleanup.

Reviewed-by: Amir Goldstein <amir73il@gmail.com>


> +/* run all the marks in a group, and clear all of the vfsmount marks */
> +static inline void fsnotify_clear_vfsmount_marks_by_group(struct fsnotify_group *group)
> +{
> +       fsnotify_clear_marks_by_group_flags(group, FSNOTIFY_LIST_TYPE_VFSMOUNT);

Suggestion for extra cleanup while at it:
IMO, the name fsnotify_clear_marks_by_group_flags() was a bad choice, because
1. it sounds like "by group->flags" and it's not
2. it is presented as a generic helper, but it is never likely to be used
   by anything other than those 2 call sites for the FAN_MARK_FLUSH API

So given the above, I think it would make more sense to name the function
fsnotify_clear_marks_by_group_and_type() for what it really is.

* Re: [PATCH 19/22] fsnotify: Remove fsnotify_find_{inode|vfsmount}_mark()
  2016-12-22  9:15 ` [PATCH 19/22] fsnotify: Remove fsnotify_find_{inode|vfsmount}_mark() Jan Kara
@ 2016-12-26 17:14   ` Amir Goldstein
  0 siblings, 0 replies; 80+ messages in thread
From: Amir Goldstein @ 2016-12-26 17:14 UTC (permalink / raw)
  To: Jan Kara; +Cc: linux-fsdevel, Lino Sanfilippo, Miklos Szeredi, Paul Moore

On Thu, Dec 22, 2016 at 11:15 AM, Jan Kara <jack@suse.cz> wrote:
> These are very thin wrappers, just remove them. Drop
> fs/notify/vfsmount_mark.c as it is empty now.
>
> Signed-off-by: Jan Kara <jack@suse.cz>
> ---
Reviewed-by: Amir Goldstein <amir73il@gmail.com>

* Re: [PATCH 20/22] fsnotify: Drop inode_mark.c
  2016-12-22  9:15 ` [PATCH 20/22] fsnotify: Drop inode_mark.c Jan Kara
@ 2016-12-26 17:15   ` Amir Goldstein
  0 siblings, 0 replies; 80+ messages in thread
From: Amir Goldstein @ 2016-12-26 17:15 UTC (permalink / raw)
  To: Jan Kara; +Cc: linux-fsdevel, Lino Sanfilippo, Miklos Szeredi, Paul Moore

On Thu, Dec 22, 2016 at 11:15 AM, Jan Kara <jack@suse.cz> wrote:
> inode_mark.c now contains only a single function. Move it to
> fs/notify/fsnotify.c and remove inode_mark.c.
>
> Signed-off-by: Jan Kara <jack@suse.cz>

Reviewed-by: Amir Goldstein <amir73il@gmail.com>

* Re: [PATCH 21/22] fsnotify: Add group pointer in fsnotify_init_mark()
  2016-12-22  9:15 ` [PATCH 21/22] fsnotify: Add group pointer in fsnotify_init_mark() Jan Kara
@ 2016-12-26 17:34   ` Amir Goldstein
  0 siblings, 0 replies; 80+ messages in thread
From: Amir Goldstein @ 2016-12-26 17:34 UTC (permalink / raw)
  To: Jan Kara; +Cc: linux-fsdevel, Lino Sanfilippo, Miklos Szeredi, Paul Moore

On Thu, Dec 22, 2016 at 11:15 AM, Jan Kara <jack@suse.cz> wrote:
> Currently we initialize mark->group only in fsnotify_add_mark_locked().
> However we will need to access fsnotify_ops of corresponding group from
> fsnotify_put_mark() so we need mark->group initialized earlier. Do that
> in fsnotify_init_mark() which has a consequence that once
> fsnotify_init_mark() is called on a mark, the mark has to be destroyed
> by fsnotify_put_mark().
>
> Signed-off-by: Jan Kara <jack@suse.cz>
> ---

Paul,

I reviewed the fs/notify parts.
The audit bits were hard for me to follow.
Please back me up on those.

Thanks,
Amir.

* Re: [PATCH 22/22] fsnotify: Move ->free_mark callback to fsnotify_ops
  2016-12-22  9:15 ` [PATCH 22/22] fsnotify: Move ->free_mark callback to fsnotify_ops Jan Kara
@ 2016-12-26 17:39   ` Amir Goldstein
  0 siblings, 0 replies; 80+ messages in thread
From: Amir Goldstein @ 2016-12-26 17:39 UTC (permalink / raw)
  To: Jan Kara; +Cc: linux-fsdevel, Lino Sanfilippo, Miklos Szeredi, Paul Moore

On Thu, Dec 22, 2016 at 11:15 AM, Jan Kara <jack@suse.cz> wrote:
> Pointer to ->free_mark callback unnecessarily occupies one long in each
> fsnotify_mark although they are the same for all marks from one
> notification group. Move the callback pointer to fsnotify_ops.
>
> Signed-off-by: Jan Kara <jack@suse.cz>
> ---
Reviewed-by: Amir Goldstein <amir73il@gmail.com>

* Re: [PATCH 13/22] fsnotify: Provide framework for dropping SRCU lock in ->handle_event
  2016-12-22  9:15 ` [PATCH 13/22] fsnotify: Provide framework for dropping SRCU lock in ->handle_event Jan Kara
  2016-12-26 15:01   ` Amir Goldstein
  2016-12-26 15:11   ` Amir Goldstein
@ 2016-12-26 18:37   ` Amir Goldstein
  2 siblings, 0 replies; 80+ messages in thread
From: Amir Goldstein @ 2016-12-26 18:37 UTC (permalink / raw)
  To: Jan Kara; +Cc: linux-fsdevel, Lino Sanfilippo, Miklos Szeredi, Paul Moore

> +bool fsnotify_prepare_user_wait(struct fsnotify_mark *inode_mark,
> +                               struct fsnotify_mark *vfsmount_mark,
> +                               int *srcu_idx)
> +{
> +       struct fsnotify_group *group;
> +
> +       if (WARN_ON_ONCE(!inode_mark && !vfsmount_mark))
> +               return false;
> +
> +       if (inode_mark)
> +               group = inode_mark->group;
> +       else
> +               group = vfsmount_mark->group;
> +
> +       /*
> +        * Since acquisition of mark reference is an atomic op as well, we can
> +        * be sure this inc is seen before any effect of refcount increment.
> +        */
> +       atomic_inc(&group->user_waits);
> +
> +       if (inode_mark) {
> +               /* This can fail if mark is being removed */
> +               if (!fsnotify_get_mark_safe(inode_mark))
> +                       goto out_wait;
> +       }
> +       if (vfsmount_mark) {
> +               if (!fsnotify_get_mark_safe(vfsmount_mark))
> +                       goto out_inode;
> +       }
> +
> +       /*
> +        * Now that both marks are pinned by refcount we can drop SRCU lock.
> +        * Marks can still be removed from the list but because of refcount
> +        * they cannot be destroyed and we can safely resume the list iteration
> +        * once userspace returns.
> +        */

Jan,

Forgive me for hijacking this review for yet another cleanup proposal.
When I first looked at this function I thought:
"<sigh> again with those inode_mark, vfsmount_mark args.. oh well"
but then I took another look and it suddenly seems quite simple to get rid of
all the places that get passed these 2 args and simplify all of them
and mostly simplify send_to_group().

The plan is:
1. Return 1 from handle_event() => send_to_group() if the event was
   "dropped" by the group.
2. Backends may return "dropped" for several reasons (e.g. non-dir inode
   in dnotify), but the only interesting case to return "dropped" is in
   fanotify if (!fanotify_should_send_event()), because only fanotify
   supports vfsmount_mark.
3. In fsnotify(), if (inode_group == vfsmount_group), pass only
   vfsmount_mark to send_to_group() and check for a "dropped" event. If
   the event was dropped, set inode_group = NULL, so inode_mark is
   iterated again. If the event wasn't dropped, there is no reason to
   call send_to_group() again with inode_mark.

This logic change incurs a behavior change, because
fanotify_should_send_event() for some reason combines the inode and vfsmount
mark ignore masks into a single unified ignore mask, so today an ignore bit
in just one of them prevents the event from being sent to the group.
However, from reading the man page, this seems like a bug rather than desired
behavior: the ignore mask should be relative to the object in question, and
it should be cleared when that object is modified. In that sense a mount is a
completely different object than an inode, so their ignore masks should be
independent as well and not unified.
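
The difference between the two behaviors can be sketched with a small toy
model. The struct and function names below are hypothetical and greatly
simplified; this is not the actual fanotify_should_send_event() logic, only
the mask arithmetic described above.

```c
#include <assert.h>

#define TOY_EV_OPEN	0x1u	/* hypothetical event bit */

/* Toy mark: events it is interested in and events it ignores. */
struct toy_mark {
	unsigned int mask;
	unsigned int ignored_mask;
};

/* Unified: one ignore mask built from both marks gates the event. */
static int toy_send_unified(unsigned int event,
			    const struct toy_mark *inode_mark,
			    const struct toy_mark *vfsmount_mark)
{
	unsigned int marks_mask = inode_mark->mask | vfsmount_mark->mask;
	unsigned int ignored = inode_mark->ignored_mask |
			       vfsmount_mark->ignored_mask;

	return (event & marks_mask & ~ignored) != 0;
}

/* Independent: each mark is judged against its own ignore mask only. */
static int toy_send_independent(unsigned int event,
				const struct toy_mark *inode_mark,
				const struct toy_mark *vfsmount_mark)
{
	return (event & inode_mark->mask & ~inode_mark->ignored_mask) != 0 ||
	       (event & vfsmount_mark->mask &
		~vfsmount_mark->ignored_mask) != 0;
}
```

If both marks watch TOY_EV_OPEN and only the inode mark ignores it, the
unified variant suppresses the event while the independent variant still
delivers it on behalf of the vfsmount mark.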

I will send out a POC patch for you to consider along with the rest of your
very neat cleanups.

Amir.

* Re: [PATCH 05/22] audit: Fix sleep in atomic
  2016-12-26 16:33         ` Paul Moore
@ 2017-01-02 18:21           ` Jan Kara
  2017-01-03 21:11               ` Paul Moore
  0 siblings, 1 reply; 80+ messages in thread
From: Jan Kara @ 2017-01-02 18:21 UTC (permalink / raw)
  To: Paul Moore
  Cc: Jan Kara, linux-fsdevel, Amir Goldstein, Lino Sanfilippo,
	Miklos Szeredi, linux-audit

[-- Attachment #1: Type: text/plain, Size: 3016 bytes --]

On Mon 26-12-16 11:33:10, Paul Moore wrote:
> On Fri, Dec 23, 2016 at 9:17 AM, Paul Moore <paul@paul-moore.com> wrote:
> > On Fri, Dec 23, 2016 at 8:24 AM, Jan Kara <jack@suse.cz> wrote:
> >> On Thu 22-12-16 18:18:36, Paul Moore wrote:
> >>> On Thu, Dec 22, 2016 at 4:15 AM, Jan Kara <jack@suse.cz> wrote:
> >>> > Audit tree code was happily adding new notification marks while holding
> >>> > spinlocks. Since fsnotify_add_mark() acquires group->mark_mutex this can
> >>> > lead to sleeping while holding a spinlock, deadlocks due to lock
> >>> > inversion, and probably other fun. Fix the problem by acquiring
> >>> > group->mark_mutex earlier.
> >>> >
> >>> > CC: Paul Moore <paul@paul-moore.com>
> >>> > Signed-off-by: Jan Kara <jack@suse.cz>
> >>> > ---
> >>> >  kernel/audit_tree.c | 13 +++++++++++--
> >>> >  1 file changed, 11 insertions(+), 2 deletions(-)
> >>>
> >>> [SIDE NOTE: this patch explains your comments and my earlier concern
> >>> about the locked/unlocked variants of fsnotify_add_mark() in
> >>> untag_chunk()]
> >>>
> >>> Ouch.  Thanks for catching this ... what is your goal with these
> >>> patches, are you targeting this as a fix during the v4.10-rcX cycle?
> >>> If not, any objections if I pull this patch into the audit tree and
> >>> send this to Linus during the v4.10-rcX cycle (assuming it passes
> >>> testing, yadda yadda)?
> >>
> >> Sure, go ahead. I plan these patches for the next merge window. So I can
> >> rebase the series once you merge audit fixes...
> >
> > Okay, great.  I'll merge this patch in the audit/stable-4.10 branch
> > for Linus but there will likely be some delays due to
> > holidays/vacation on my end.
> >
> > Thanks again for your help fixing this, I really appreciate it.
> 
> I merged this patch, as well as the "Remove fsnotify_duplicate_mark()"
> patch (to make things cleaner when merging this patch) and did a quick
> test using the audit-testsuite ... the test hung on the "file_create"
> tests.  Unfortunately, I'm traveling right now for the holidays and
> will not likely have a chance to debug this much further until after
> the new year, but I thought I would mention it in case you had some
> time to look into this failure.
> 
> For reference, here is the audit-testsuite again:
> 
> * https://github.com/linux-audit/audit-testsuite
> 
> ... and if you have a Fedora test system, here is the Rawhide kernel I
> used to test (it is basically my kernel-secnext test kernel with those
> two patches mentioned above added on top):
> 
> * https://copr.fedorainfracloud.org/coprs/pcmoore/kernel-testing/build/492386

So I found where the problem was. Attached is a new version of the patch.
Tests from audit-testsuite fail for me but do not hang anymore. I guess they
fail because I don't have audit or SELinux configured in any way and I'm
using SUSE (if there's some easy way to do that, I'd be interested) -
runtests.pl complains that I have to be root although I am...

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

[-- Attachment #2: 0001-audit-Fix-sleep-in-atomic.patch --]
[-- Type: text/x-patch, Size: 3764 bytes --]

From 300e01735a0b5dbf9cd32c932b77d9dc77258489 Mon Sep 17 00:00:00 2001
From: Jan Kara <jack@suse.cz>
Date: Wed, 14 Dec 2016 14:40:05 +0100
Subject: [PATCH] audit: Fix sleep in atomic

Audit tree code was happily adding new notification marks while holding
spinlocks. Since fsnotify_add_mark() acquires group->mark_mutex this can
lead to sleeping while holding a spinlock, deadlocks due to lock
inversion, and probably other fun. Fix the problem by acquiring
group->mark_mutex earlier.

CC: Paul Moore <paul@paul-moore.com>
Signed-off-by: Jan Kara <jack@suse.cz>
---
 kernel/audit_tree.c | 16 ++++++++++++++--
 1 file changed, 14 insertions(+), 2 deletions(-)

diff --git a/kernel/audit_tree.c b/kernel/audit_tree.c
index f3130eb0a4bd..7b44195da81b 100644
--- a/kernel/audit_tree.c
+++ b/kernel/audit_tree.c
@@ -231,9 +231,11 @@ static void untag_chunk(struct node *p)
 	if (size)
 		new = alloc_chunk(size);
 
+	mutex_lock(&entry->group->mark_mutex);
 	spin_lock(&entry->lock);
 	if (chunk->dead || !entry->inode) {
 		spin_unlock(&entry->lock);
+		mutex_unlock(&entry->group->mark_mutex);
 		if (new)
 			free_chunk(new);
 		goto out;
@@ -251,6 +253,7 @@ static void untag_chunk(struct node *p)
 		list_del_rcu(&chunk->hash);
 		spin_unlock(&hash_lock);
 		spin_unlock(&entry->lock);
+		mutex_unlock(&entry->group->mark_mutex);
 		fsnotify_destroy_mark(entry, audit_tree_group);
 		goto out;
 	}
@@ -258,7 +261,8 @@ static void untag_chunk(struct node *p)
 	if (!new)
 		goto Fallback;
 
-	if (fsnotify_add_mark(&new->mark, entry->group, entry->inode, NULL, 1)) {
+	if (fsnotify_add_mark_locked(&new->mark, entry->group, entry->inode,
+				     NULL, 1)) {
 		fsnotify_put_mark(&new->mark);
 		goto Fallback;
 	}
@@ -292,6 +296,7 @@ static void untag_chunk(struct node *p)
 		owner->root = new;
 	spin_unlock(&hash_lock);
 	spin_unlock(&entry->lock);
+	mutex_unlock(&entry->group->mark_mutex);
 	fsnotify_destroy_mark(entry, audit_tree_group);
 	fsnotify_put_mark(&new->mark);	/* drop initial reference */
 	goto out;
@@ -308,6 +313,7 @@ static void untag_chunk(struct node *p)
 	put_tree(owner);
 	spin_unlock(&hash_lock);
 	spin_unlock(&entry->lock);
+	mutex_unlock(&entry->group->mark_mutex);
 out:
 	fsnotify_put_mark(entry);
 	spin_lock(&hash_lock);
@@ -385,17 +391,21 @@ static int tag_chunk(struct inode *inode, struct audit_tree *tree)
 
 	chunk_entry = &chunk->mark;
 
+	mutex_lock(&old_entry->group->mark_mutex);
 	spin_lock(&old_entry->lock);
 	if (!old_entry->inode) {
 		/* old_entry is being shot, lets just lie */
 		spin_unlock(&old_entry->lock);
+		mutex_unlock(&old_entry->group->mark_mutex);
 		fsnotify_put_mark(old_entry);
 		free_chunk(chunk);
 		return -ENOENT;
 	}
 
-	if (fsnotify_add_mark(chunk_entry, old_entry->group, old_entry->inode, NULL, 1)) {
+	if (fsnotify_add_mark_locked(chunk_entry, old_entry->group,
+				     old_entry->inode, NULL, 1)) {
 		spin_unlock(&old_entry->lock);
+		mutex_unlock(&old_entry->group->mark_mutex);
 		fsnotify_put_mark(chunk_entry);
 		fsnotify_put_mark(old_entry);
 		return -ENOSPC;
@@ -411,6 +421,7 @@ static int tag_chunk(struct inode *inode, struct audit_tree *tree)
 		chunk->dead = 1;
 		spin_unlock(&chunk_entry->lock);
 		spin_unlock(&old_entry->lock);
+		mutex_unlock(&old_entry->group->mark_mutex);
 
 		fsnotify_destroy_mark(chunk_entry, audit_tree_group);
 
@@ -443,6 +454,7 @@ static int tag_chunk(struct inode *inode, struct audit_tree *tree)
 	spin_unlock(&hash_lock);
 	spin_unlock(&chunk_entry->lock);
 	spin_unlock(&old_entry->lock);
+	mutex_unlock(&old_entry->group->mark_mutex);
 	fsnotify_destroy_mark(old_entry, audit_tree_group);
 	fsnotify_put_mark(chunk_entry);	/* drop initial reference */
 	fsnotify_put_mark(old_entry); /* pair to fsnotify_find mark_entry */
-- 
2.10.2


* Re: [PATCH 06/22] audit: Abstract hash key handling
  2016-12-23 14:13       ` Paul Moore
@ 2017-01-03 17:34         ` Jan Kara
  2017-01-05  2:06           ` Paul Moore
  0 siblings, 1 reply; 80+ messages in thread
From: Jan Kara @ 2017-01-03 17:34 UTC (permalink / raw)
  To: Paul Moore
  Cc: Jan Kara, linux-fsdevel, Amir Goldstein, Lino Sanfilippo,
	Miklos Szeredi, linux-audit

On Fri 23-12-16 09:13:55, Paul Moore wrote:
> On Fri, Dec 23, 2016 at 8:27 AM, Jan Kara <jack@suse.cz> wrote:
> > On Thu 22-12-16 18:27:40, Paul Moore wrote:
> >> On Thu, Dec 22, 2016 at 4:15 AM, Jan Kara <jack@suse.cz> wrote:
> >> > Audit tree currently uses inode pointer as a key into the hash table.
> >> > Getting that from notification mark will be somewhat more difficult with
> >> > coming fsnotify changes and there's no reason we really have to use the
> >> > inode pointer. So abstract getting of hash key from the audit chunk and
> >> > inode so that we can switch to a different key easily later.
> >> >
> >> > CC: Paul Moore <paul@paul-moore.com>
> >> > Signed-off-by: Jan Kara <jack@suse.cz>
> >> > ---
> >> >  kernel/audit_tree.c | 39 ++++++++++++++++++++++++++++-----------
> >> >  1 file changed, 28 insertions(+), 11 deletions(-)
> >>
> >> I have no objections with this patch in particular, but in patch 8,
> >> are you certain that inode_to_key() and chunk_to_key() will continue
> >> to return the same key value?
> >
> > Yes, that's the intention. Or better in that patch the key will no longer
> > be inode pointer but instead the fsnotify_list pointer. But still it would
> > match for chunks attached to an inode and inode itself so comparison
> > results should stay the same.
> 
> My apologies, I probably should have been more clear.
> 
> Yes, I think we are all in agreement that the *_to_key() functions
> need to return a consistent value for the same object.  My concern is
> that in patch 8 these functions are using different variables (granted
> they may contain the same value, and therefore evaluate to the same
> key) and I worry that there is a possibility of the two variables
> taking on different values and breaking the hash.  What guarantees
> exist that these values will be the same?  Are there any safeguards to
> prevent future patches from accidentally sidestepping these
> guarantees?

Ah, OK, so this is more about patch 8 than patch 6. So far audit uses inode
pointer as a key - now with patch 8, there is a fsnotify_mark_list attached
to an inode if and only if there is any fsnotify_mark for that inode and
both inode->i_fsnotify_marks (used as a key in inode_to_key()) and
mark->obj_list_head (used as a key in chunk_to_key()) point to it. So keys
for an inode and chunk match if and only if the fsnotify mark in the chunk
is attached to the inode - the same as before patch 8.

The only reason why I changed audit to use a different pointer for the key
is that you need some lock protection to do mark->obj_list_head->inode
dereference and this seemed the easiest. Actually now that all the lifetime
rules have worked out, I can see we can actually use inode pointer as a key
relatively easily since mark->obj_list_head is stable once you hold a mark
reference so locking would be only intermediate step until this gets
established in the series. Would you prefer me to do that?
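
To make the key argument above concrete, here is a minimal userspace sketch. It is only a model under stated assumptions: struct inode_model, struct mark_model and attach_mark() are hypothetical stand-ins invented for illustration, not the kernel's definitions, and only the pointer relationships described above are reproduced.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins; not the kernel structures. */
struct inode_model;

struct mark_list {                      /* models fsnotify_mark_list */
	struct inode_model *inode;          /* object the list belongs to */
};

struct inode_model {
	struct mark_list *i_fsnotify_marks; /* NULL while no mark is attached */
};

struct mark_model {
	struct mark_list *obj_list_head;    /* list the mark is attached to */
};

/* Key as seen from the inode side. */
unsigned long inode_to_key(const struct inode_model *inode)
{
	return (unsigned long)inode->i_fsnotify_marks;
}

/* Key as seen from the chunk's mark. */
unsigned long chunk_to_key(const struct mark_model *mark)
{
	return (unsigned long)mark->obj_list_head;
}

/*
 * Attach a mark to an inode. The list head is set up on first attach
 * (storage is supplied by the caller in this model), so the inode and
 * every mark attached to it end up pointing at the same list.
 */
void attach_mark(struct inode_model *inode, struct mark_list *storage,
		 struct mark_model *mark)
{
	assert(inode && storage && mark);
	if (!inode->i_fsnotify_marks) {
		storage->inode = inode;
		inode->i_fsnotify_marks = storage;
	}
	mark->obj_list_head = inode->i_fsnotify_marks;
}
```

In this model the two keys compare equal exactly when the mark was attached to that inode, which is the property the hash lookup relies on.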

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH 05/22] audit: Fix sleep in atomic
  2017-01-02 18:21           ` Jan Kara
@ 2017-01-03 21:11               ` Paul Moore
  0 siblings, 0 replies; 80+ messages in thread
From: Paul Moore @ 2017-01-03 21:11 UTC (permalink / raw)
  To: Jan Kara
  Cc: linux-fsdevel, Amir Goldstein, Lino Sanfilippo, Miklos Szeredi,
	linux-audit

On Mon, Jan 2, 2017 at 1:21 PM, Jan Kara <jack@suse.cz> wrote:
> So I found where the problem was. Attached is a new version of the patch.
> Tests from audit-testsuite fail for me but do not hang anymore. I guess the
> failing is because I don't have audit or selinux configured in any way and
> I'm using SUSE I guess (if there's some easy way to do that, I'd be
> interested) - runtests.pl complains that I have to be root although I am...

I've never tried running the tests on SUSE, but if audit and SELinux
are in some undetermined state, then I can only imagine what weird
test results you would get.  I'm building a test kernel as I type
this, I'll report back when I have some results.

Also, while I'm sure you've heard this before (and likely already know
better), please send patches inline, it makes review/commenting much
easier.

-- 
paul moore
www.paul-moore.com

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH 05/22] audit: Fix sleep in atomic
  2017-01-03 21:11               ` Paul Moore
  (?)
@ 2017-01-04  8:50               ` Jan Kara
  2017-01-05  2:14                 ` Paul Moore
  -1 siblings, 1 reply; 80+ messages in thread
From: Jan Kara @ 2017-01-04  8:50 UTC (permalink / raw)
  To: Paul Moore
  Cc: Jan Kara, linux-fsdevel, Amir Goldstein, Lino Sanfilippo,
	Miklos Szeredi, linux-audit

On Tue 03-01-17 16:11:16, Paul Moore wrote:
> On Mon, Jan 2, 2017 at 1:21 PM, Jan Kara <jack@suse.cz> wrote:
> > So I found where the problem was. Attached is a new version of the patch.
> > Tests from audit-testsuite fail for me but do not hang anymore. I guess the
> > failing is because I don't have audit or selinux configured in any way and
> > I'm using SUSE I guess (if there's some easy way to do that, I'd be
> > interested) - runtests.pl complains that I have to be root although I am...
> 
> I've never tried running the tests on SUSE, but if audit and SELinux
> are in some undetermined state, then I can only imagine what weird
> test results you would get.

Well, the state is well determined - nothing is installed ;) I was kind of
hoping kernel support would be enough but apparently the testsuite needs
some userspace installed & configured as well.

> I'm building a test kernel as I type this, I'll report back when I have
> some results.

Thanks!

> Also, while I'm sure you've heard this before (and likely already know
> better), please send patches inline, it makes review/commenting much
> easier.

Actually, I haven't heard this for quite a long time :) and I myself
prefer attached patches (with text/plain attachment type) when they are
in a reply to another email - they are easier to extract and at least my
mail client automatically inlines them when I hit reply... Arguably the
best of both worlds is to use git-send-email with properly set threading
but I tend to forget about that option.

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH 13/22] fsnotify: Provide framework for dropping SRCU lock in ->handle_event
  2016-12-26 15:11   ` Amir Goldstein
@ 2017-01-04  9:03     ` Jan Kara
  2017-01-04 10:50       ` Amir Goldstein
  0 siblings, 1 reply; 80+ messages in thread
From: Jan Kara @ 2017-01-04  9:03 UTC (permalink / raw)
  To: Amir Goldstein
  Cc: Jan Kara, linux-fsdevel, Lino Sanfilippo, Miklos Szeredi, Paul Moore

On Mon 26-12-16 17:11:29, Amir Goldstein wrote:
> On Thu, Dec 22, 2016 at 11:15 AM, Jan Kara <jack@suse.cz> wrote:
> > fanotify wants to drop fsnotify_mark_srcu lock when waiting for response
> > from userspace so that the whole notification subsystem is not blocked
> > during that time. This patch provides a framework for safely getting
> > mark reference for a mark found in the object list which pins the mark
> > in that list. We can then drop fsnotify_mark_srcu, wait for userspace
> > response and then safely continue iteration of the object list once we
> > reacquire fsnotify_mark_srcu.
> >
> > Signed-off-by: Jan Kara <jack@suse.cz>
> > ---
> ...
> > +       /*
> > +        * Now that both marks are pinned by refcount we can drop SRCU lock.
> > +        * Marks can still be removed from the list but because of refcount
> > +        * they cannot be destroyed and we can safely resume the list iteration
> > +        * once userspace returns.
> > +        */
> 
> Sorry, forgot to comment on this.
> "Marks can still be removed from the list ...
> ... and we can safely resume the list iteration"
> 
> I suppose you are planning to get the mechanics right, by replacing
> hlist_del_init() with just __hlist_del() ?? but this sentence is confusing.
> Usually, it wouldn't be safe to resume iteration if items may have been removed,
> so perhaps rephrase or clarify.

The point is that marks that have refcount elevated cannot be even removed
from the list we iterate (that happens only once the last reference is
dropped). That is the reason why we are safe to resume the iteration...
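
The rule stated above (a mark leaves the object list only when its last reference is dropped) can be illustrated with a small userspace model. This is a sketch under assumptions, not the kernel code: struct mark_model, mark_get() and mark_put() are hypothetical names, a plain singly linked list stands in for the object list, and there is no locking or SRCU here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a mark on an object list; not the kernel's struct. */
struct mark_model {
	int refcnt;               /* references currently held */
	bool on_list;             /* still linked into the object list? */
	struct mark_model *next;  /* successor in the object list */
};

/* Take an extra reference on a mark that is known to be live. */
void mark_get(struct mark_model *m)
{
	assert(m->refcnt > 0);
	m->refcnt++;
}

/*
 * Drop a reference. Only the final put unlinks the mark, so whoever
 * holds a reference can keep using m->next to resume list iteration.
 */
void mark_put(struct mark_model **head, struct mark_model *m)
{
	assert(m->refcnt > 0);
	if (--m->refcnt > 0)
		return;

	/* Last reference is gone: now the mark may leave the list. */
	for (struct mark_model **pp = head; *pp; pp = &(*pp)->next) {
		if (*pp == m) {
			*pp = m->next;
			break;
		}
	}
	m->on_list = false;
}
```

A waiter that pinned a mark with mark_get() before sleeping still finds it linked afterwards, even if every other holder called mark_put() in the meantime.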

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH 15/22] fanotify: Release SRCU lock when waiting for userspace response
  2016-12-26 15:22   ` Amir Goldstein
@ 2017-01-04  9:05     ` Jan Kara
  0 siblings, 0 replies; 80+ messages in thread
From: Jan Kara @ 2017-01-04  9:05 UTC (permalink / raw)
  To: Amir Goldstein
  Cc: Jan Kara, linux-fsdevel, Lino Sanfilippo, Miklos Szeredi, Paul Moore

On Mon 26-12-16 17:22:58, Amir Goldstein wrote:
> >  #ifdef CONFIG_FANOTIFY_ACCESS_PERMISSIONS
> >  static int fanotify_get_response(struct fsnotify_group *group,
> > -                                struct fanotify_perm_event_info *event)
> > +                                struct fsnotify_mark *inode_mark,
> > +                                struct fsnotify_mark *vfsmount_mark,
> > +                                struct fanotify_perm_event_info *event,
> > +                                int *srcu_idx)
> >  {
> >         int ret;
> >
> > @@ -69,6 +72,15 @@ static int fanotify_get_response(struct fsnotify_group *group,
> >
> >         wait_event(group->fanotify_data.access_waitq, event->response);
> >
> > +       if (!fsnotify_prepare_user_wait(inode_mark, vfsmount_mark, srcu_idx)) {
> 
> Since it is not clear for reader of this code the conditions where
> fsnotify_prepare_user_wait() can fail, a comment here would be nice
> to explain the choice of ALLOW

Good point. Will add.

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH 18/22] fsnotify: Inline fsnotify_clear_{inode|vfsmount|_mark_group()
  2016-12-26 16:57   ` Amir Goldstein
@ 2017-01-04  9:28     ` Jan Kara
  0 siblings, 0 replies; 80+ messages in thread
From: Jan Kara @ 2017-01-04  9:28 UTC (permalink / raw)
  To: Amir Goldstein
  Cc: Jan Kara, linux-fsdevel, Lino Sanfilippo, Miklos Szeredi, Paul Moore

On Mon 26-12-16 18:57:38, Amir Goldstein wrote:
> On Thu, Dec 22, 2016 at 11:15 AM, Jan Kara <jack@suse.cz> wrote:
> > Inline these helpers as they are very thin. We still keep them as we
> > don't want to expose details about how list type is determined.
> >
> > Signed-off-by: Jan Kara <jack@suse.cz>
> 
> This patch looks good, but see comment below about suggested extra cleanup.
> 
> Reviewed-by: Amir Goldstein <amir73il@gmail.com>

Thanks.

> > +/* run all the marks in a group, and clear all of the vfsmount marks */
> > +static inline void fsnotify_clear_vfsmount_marks_by_group(struct fsnotify_group *group)
> > +{
> > +       fsnotify_clear_marks_by_group_flags(group, FSNOTIFY_LIST_TYPE_VFSMOUNT);
> 
> Suggestion for extra cleanup while at it:
> IMO, the choice of name fsnotify_clear_marks_by_group_flags() was a
> bad choice, because
> 1. it sounds like "by group->flags" and it's not
> 2. it is presented as a generic helper, but it is never likely to be
> used by anything other than
>     those 2 call sites for FAN_MARK_FLUSH api
> 
> So given the above, I think it would make more sense to name the function
> fsnotify_clear_marks_by_group_and_type() for what it really is.

I agree on the bad choice of the name. I will probably add another cleanup
patch that will just name the function fsnotify_clear_marks_by_group() and
cleanup comments about that function in several places.

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH 12/22] fsnotify: Move queueing of mark for destruction into fsnotify_put_mark()
  2016-12-26 14:15   ` Amir Goldstein
@ 2017-01-04 10:28     ` Jan Kara
  2017-01-04 12:22       ` Jan Kara
  0 siblings, 1 reply; 80+ messages in thread
From: Jan Kara @ 2017-01-04 10:28 UTC (permalink / raw)
  To: Amir Goldstein
  Cc: Jan Kara, linux-fsdevel, Lino Sanfilippo, Miklos Szeredi, Paul Moore

On Mon 26-12-16 16:15:23, Amir Goldstein wrote:
> On Thu, Dec 22, 2016 at 11:15 AM, Jan Kara <jack@suse.cz> wrote:
> > Currently we queue mark into a list of marks for destruction in
> > __fsnotify_free_mark() and keep last mark reference dangling. After the
> > worker waits for SRCU period, it drops the last reference to the mark
> > which frees it. This scheme has the disadvantage that if we hold
> > reference to mark and drop and reacquire SRCU lock, the mark can get
> > freed immediately which is slightly inconvenient and we will need to
> > avoid this in the future.
> >
> > Move to a scheme where queueing of mark into a list of marks for
> > destruction happens when the last reference to the mark is dropped. Also
> > drop reference to the mark held by group list already when mark is
> > removed from that list instead of dropping it only from the destruction
> > worker.
> >
> 
> The BEFORE section refers to what SRCU protects, which this patch
> slightly changes. Can you please add to AFTER section, what is protected
> by SRCU after this patch.

Ah, the changelog is a victim of me reshuffling patches. It was written in
a situation when mark reference did not protect from any list removal yet.
I'll rewrite it. Thanks for noticing.

> IIUC, SRCU protects from freeing the mark,
> but it does not protect from removing mark from group list, so after
> drop and reacquire SRCU with elevated mark refcount, mark can find
> itself not on any list.

You are correct that SRCU only protects from freeing the mark and it also
makes sure the inode / vfsmount list traversal is fine (because of using
_rcu list primitives during list manipulation) although the mark can be
removed from that list in parallel. It does not give any guarantee for
group list as you correctly note. But mark reference pins mark in the inode
/ vfsmount list (after patch 10), so once we have that reference we are
sure mark stays in inode / vfsmount list.

> For example in inotify_handle_event() when calling fsnotify_destroy_mark()
> for IN_ONESHOT mark.

So in that place we don't hold any mark reference ourselves so mark can
indeed get removed from all the lists - but we still hold SRCU lock so we
are sure mark cannot get freed and inode list traversal can still continue.

> > @@ -537,20 +527,13 @@ int fsnotify_add_mark_locked(struct fsnotify_mark *mark,
> >
> >         return ret;
> >  err:
> > -       mark->flags &= ~FSNOTIFY_MARK_FLAG_ALIVE;
> > +       mark->flags &= ~(FSNOTIFY_MARK_FLAG_ALIVE |
> > +                        FSNOTIFY_MARK_FLAG_ATTACHED);
> >         list_del_init(&mark->g_list);
> > -       fsnotify_put_group(group);
> 
> fsnotify_put_group() is removed by this patch here and not added anywhere else
> nor is any fsnotify_get_group() call removed. Is this a leak or maybe
> fixed later on??

Note that I also removed mark->group = NULL below. So fsnotify_put_mark()
and followup mark destruction will take care of properly freeing group
reference. There's no reason to special-case this in
fsnotify_add_mark_locked()...
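
A rough userspace sketch of that argument follows. It is a simplified model with hypothetical names (struct group_model, struct mark_model, add_mark(), mark_put()), and it assumes the caller holds its own mark reference on entry. The error path only drops the reference taken for the group list, and the group reference is released by whichever put turns out to be the last one.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins; not the kernel structures. */
struct group_model {
	int refcnt;
};

struct mark_model {
	int refcnt;
	struct group_model *group;  /* pinned until the mark is destroyed */
};

/* Drop a mark reference; the final put also releases the group. */
void mark_put(struct mark_model *m)
{
	assert(m->refcnt > 0);
	if (--m->refcnt > 0)
		return;
	if (m->group) {
		m->group->refcnt--;
		m->group = NULL;
	}
}

/*
 * Attach a mark to a group. On failure the error path does not touch
 * the group reference directly; it only drops the list's reference.
 */
int add_mark(struct mark_model *m, struct group_model *g, int simulate_error)
{
	m->group = g;
	g->refcnt++;        /* the mark pins its group */
	m->refcnt++;        /* reference held on behalf of the group list */

	if (simulate_error) {
		mark_put(m);    /* no special-casing of the group here */
		return -1;
	}
	return 0;
}
```

After a failed add_mark(), the group count is still elevated until the caller drops its own mark reference, at which point the model releases the group exactly once.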

> 
> > -       mark->group = NULL;
> >         atomic_dec(&group->num_marks);
> > -
> >         spin_unlock(&mark->lock);
> >
> > -       spin_lock(&destroy_lock);
> > -       list_add(&mark->g_list, &destroy_list);
> > -       spin_unlock(&destroy_lock);
> > -       queue_delayed_work(system_unbound_wq, &reaper_work,
> > -                               FSNOTIFY_REAPER_DELAY);
> > -
> > +       fsnotify_put_mark(mark);
> >         return ret;
> >  }

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH 13/22] fsnotify: Provide framework for dropping SRCU lock in ->handle_event
  2017-01-04  9:03     ` Jan Kara
@ 2017-01-04 10:50       ` Amir Goldstein
  2017-01-04 11:45         ` Jan Kara
  0 siblings, 1 reply; 80+ messages in thread
From: Amir Goldstein @ 2017-01-04 10:50 UTC (permalink / raw)
  To: Jan Kara; +Cc: linux-fsdevel, Lino Sanfilippo, Miklos Szeredi, Paul Moore

On Wed, Jan 4, 2017 at 11:03 AM, Jan Kara <jack@suse.cz> wrote:
> On Mon 26-12-16 17:11:29, Amir Goldstein wrote:
>> On Thu, Dec 22, 2016 at 11:15 AM, Jan Kara <jack@suse.cz> wrote:
>> > fanotify wants to drop fsnotify_mark_srcu lock when waiting for response
>> > from userspace so that the whole notification subsystem is not blocked
>> > during that time. This patch provides a framework for safely getting
>> > mark reference for a mark found in the object list which pins the mark
>> > in that list. We can then drop fsnotify_mark_srcu, wait for userspace
>> > response and then safely continue iteration of the object list once we
>> > reacquire fsnotify_mark_srcu.
>> >
>> > Signed-off-by: Jan Kara <jack@suse.cz>
>> > ---
>> ...
>> > +       /*
>> > +        * Now that both marks are pinned by refcount we can drop SRCU lock.
>> > +        * Marks can still be removed from the list but because of refcount
>> > +        * they cannot be destroyed and we can safely resume the list iteration
>> > +        * once userspace returns.
>> > +        */
>>
>> Sorry, forgot to comment on this.
>> "Marks can still be removed from the list ...
>> ... and we can safely resume the list iteration"
>>
>> I suppose you are planning to get the mechanics right, by replacing
>> hlist_del_init() with just __hlist_del() ?? but this sentence is confusing.
>> Usually, it wouldn't be safe to resume iteration if items may have been removed,
>> so perhaps rephrase or clarify.
>
> The point is that marks that have refcount elevated cannot be even removed
> from the list we iterate (that happens only once the last reference is
> dropped). That is the reason why we are safe to resume the iteration...
>

Well, if they "cannot be even removed" then the comment above that says
"Marks can still be removed... but cannot be destroyed" is inaccurate or
at the very least confusing.

Amir.

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH 13/22] fsnotify: Provide framework for dropping SRCU lock in ->handle_event
  2017-01-04 10:50       ` Amir Goldstein
@ 2017-01-04 11:45         ` Jan Kara
  0 siblings, 0 replies; 80+ messages in thread
From: Jan Kara @ 2017-01-04 11:45 UTC (permalink / raw)
  To: Amir Goldstein
  Cc: Jan Kara, linux-fsdevel, Lino Sanfilippo, Miklos Szeredi, Paul Moore

On Wed 04-01-17 12:50:48, Amir Goldstein wrote:
> On Wed, Jan 4, 2017 at 11:03 AM, Jan Kara <jack@suse.cz> wrote:
> > On Mon 26-12-16 17:11:29, Amir Goldstein wrote:
> >> On Thu, Dec 22, 2016 at 11:15 AM, Jan Kara <jack@suse.cz> wrote:
> >> > fanotify wants to drop fsnotify_mark_srcu lock when waiting for response
> >> > from userspace so that the whole notification subsystem is not blocked
> >> > during that time. This patch provides a framework for safely getting
> >> > mark reference for a mark found in the object list which pins the mark
> >> > in that list. We can then drop fsnotify_mark_srcu, wait for userspace
> >> > response and then safely continue iteration of the object list once we
> >> > reacquire fsnotify_mark_srcu.
> >> >
> >> > Signed-off-by: Jan Kara <jack@suse.cz>
> >> > ---
> >> ...
> >> > +       /*
> >> > +        * Now that both marks are pinned by refcount we can drop SRCU lock.
> >> > +        * Marks can still be removed from the list but because of refcount
> >> > +        * they cannot be destroyed and we can safely resume the list iteration
> >> > +        * once userspace returns.
> >> > +        */
> >>
> >> Sorry, forgot to comment on this.
> >> "Marks can still be removed from the list ...
> >> ... and we can safely resume the list iteration"
> >>
> >> I suppose you are planning to get the mechanics right, by replacing
> >> hlist_del_init() with just __hlist_del() ?? but this sentence is confusing.
> >> Usually, it wouldn't be safe to resume iteration if items may have been removed,
> >> so perhaps rephrase or clarify.
> >
> > The point is that marks that have refcount elevated cannot be even removed
> > from the list we iterate (that happens only once the last reference is
> > dropped). That is the reason why we are safe to resume the iteration...
> >
> 
> Well, if they "cannot be even removed" then the comment above that says
> "Marks can still be removed... but cannot be destroyed" is inaccurate or
> at the very least confusing.

Good spotting. The comment is just wrong (it used to be like that in
previous version of the patch set but not anymore). I'll fix it.

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH 12/22] fsnotify: Move queueing of mark for destruction into fsnotify_put_mark()
  2017-01-04 10:28     ` Jan Kara
@ 2017-01-04 12:22       ` Jan Kara
  0 siblings, 0 replies; 80+ messages in thread
From: Jan Kara @ 2017-01-04 12:22 UTC (permalink / raw)
  To: Amir Goldstein
  Cc: Jan Kara, linux-fsdevel, Lino Sanfilippo, Miklos Szeredi, Paul Moore

On Wed 04-01-17 11:28:21, Jan Kara wrote:
> On Mon 26-12-16 16:15:23, Amir Goldstein wrote:
> > On Thu, Dec 22, 2016 at 11:15 AM, Jan Kara <jack@suse.cz> wrote:
> > > Currently we queue mark into a list of marks for destruction in
> > > __fsnotify_free_mark() and keep last mark reference dangling. After the
> > > worker waits for SRCU period, it drops the last reference to the mark
> > > which frees it. This scheme has the disadvantage that if we hold
> > > reference to mark and drop and reacquire SRCU lock, the mark can get
> > > freed immediately which is slightly inconvenient and we will need to
> > > avoid this in the future.
> > >
> > > Move to a scheme where queueing of mark into a list of marks for
> > > destruction happens when the last reference to the mark is dropped. Also
> > > drop reference to the mark held by group list already when mark is
> > > removed from that list instead of dropping it only from the destruction
> > > worker.
> > >
> > 
> > The BEFORE section refers to what SRCU protects, which this patch
> > slightly changes. Can you please add to AFTER section, what is protected
> > by SRCU after this patch.
> 
> Ah, the changelog is a victim of me reshuffling patches. It was written in
> a situation when mark reference did not protect from any list removal yet.
> I'll rewrite it. Thanks for noticing.

So in the end I've realized this patch needs to go before patch 10 for SRCU
guarantees to work - otherwise we could free mark under the hands of
process iterating inode list under SRCU after patch 10 and before this
patch. And then the changelog is correct. So I've just reshuffled these
patches.

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH 08/22] fsnotify: Attach marks to object via dedicated head structure
  2016-12-23 13:34     ` Jan Kara
@ 2017-01-04 13:38       ` Jan Kara
  2017-01-04 15:29         ` Amir Goldstein
  0 siblings, 1 reply; 80+ messages in thread
From: Jan Kara @ 2017-01-04 13:38 UTC (permalink / raw)
  To: Amir Goldstein
  Cc: Jan Kara, linux-fsdevel, Lino Sanfilippo, Miklos Szeredi, Paul Moore

On Fri 23-12-16 14:34:07, Jan Kara wrote:
> On Fri 23-12-16 07:48:43, Amir Goldstein wrote:
> > On Thu, Dec 22, 2016 at 11:15 AM, Jan Kara <jack@suse.cz> wrote:
> > > Currently notification marks are attached to object (inode or vfsmnt) by
> > > a hlist_head in the object. The list is also protected by a spinlock in
> > > the object. So while there is any mark attached to the list of marks,
> > > the object must be pinned in memory (and thus e.g. last iput() deleting
> > > inode cannot happen). Also for list iteration in fsnotify() to work, we
> > > must hold fsnotify_mark_srcu lock so that mark itself and
> > > mark->obj_list.next cannot get freed. Thus we are required to wait for
> > > response to fanotify events from userspace process with
> > > fsnotify_mark_srcu lock held. That causes issues when userspace process
> > > is buggy and does not reply to some event - basically the whole
> > > notification subsystem gets eventually stuck.
> > >
> > > So to be able to drop fsnotify_mark_srcu lock while waiting for
> > > response, we have to pin the mark in memory and make sure it stays in
> > > the object list (as removing the mark waiting for response could lead to
> > > lost notification events for groups later in the list). However we don't
> > > want inode reclaim to block on such mark as that would lead to system
> > > just locking up elsewhere.
> > >
> > > This commit tries to pave a way towards solving these conflicting
> > > lifetime needs. Instead of anchoring the list of marks directly in the
> > > object, we anchor it in a dedicated structure (fsnotify_mark_list) and
> > > just point to that structure from the object. Also the list is protected
> > > by a spinlock contained in that structure. With this, we can detach
> > > notification marks from object without having to modify the list itself.
> > >
> > 
> > The structural change looks very good to me.
> > It makes the code much easier to manage IMO.
> > 
> > I am only halfway through this big change, but I wanted to make one meta
> > comment.
> > 
> > I have a problem with the choice of naming for the new struct.
> > 'list' is really an overloaded term and the use of 'list' as a name of
> > a class that
> > contains a list head makes for some really confusing constructs like
> > list->list and mark->obj_list_head which is not a list_head struct.
> 
> OK, I'll think about better naming. I agree it may be slightly confusing.

So how about naming the type fsnotify_mark_connector? We can use 'conn' as
a name for local variables. I think that is not as overloaded as 'list'
and it describes that it is a structure used for connecting marks with
inode / vfsmount. Would that make things more comprehensible for you?

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH 08/22] fsnotify: Attach marks to object via dedicated head structure
  2017-01-04 13:38       ` Jan Kara
@ 2017-01-04 15:29         ` Amir Goldstein
  0 siblings, 0 replies; 80+ messages in thread
From: Amir Goldstein @ 2017-01-04 15:29 UTC (permalink / raw)
  To: Jan Kara; +Cc: linux-fsdevel, Lino Sanfilippo, Miklos Szeredi, Paul Moore

On Wed, Jan 4, 2017 at 3:38 PM, Jan Kara <jack@suse.cz> wrote:
> On Fri 23-12-16 14:34:07, Jan Kara wrote:
>> On Fri 23-12-16 07:48:43, Amir Goldstein wrote:
>> > On Thu, Dec 22, 2016 at 11:15 AM, Jan Kara <jack@suse.cz> wrote:
>> > > Currently notification marks are attached to object (inode or vfsmnt) by
>> > > a hlist_head in the object. The list is also protected by a spinlock in
>> > > the object. So while there is any mark attached to the list of marks,
>> > > the object must be pinned in memory (and thus e.g. last iput() deleting
>> > > inode cannot happen). Also for list iteration in fsnotify() to work, we
>> > > must hold fsnotify_mark_srcu lock so that mark itself and
>> > > mark->obj_list.next cannot get freed. Thus we are required to wait for
>> > > response to fanotify events from userspace process with
>> > > fsnotify_mark_srcu lock held. That causes issues when userspace process
>> > > is buggy and does not reply to some event - basically the whole
>> > > notification subsystem gets eventually stuck.
>> > >
>> > > So to be able to drop fsnotify_mark_srcu lock while waiting for
>> > > response, we have to pin the mark in memory and make sure it stays in
>> > > the object list (as removing the mark waiting for response could lead to
>> > > lost notification events for groups later in the list). However we don't
>> > > want inode reclaim to block on such mark as that would lead to system
>> > > just locking up elsewhere.
>> > >
>> > > This commit tries to pave a way towards solving these conflicting
>> > > lifetime needs. Instead of anchoring the list of marks directly in the
>> > > object, we anchor it in a dedicated structure (fsnotify_mark_list) and
>> > > just point to that structure from the object. Also the list is protected
>> > > by a spinlock contained in that structure. With this, we can detach
>> > > notification marks from object without having to modify the list itself.
>> > >
>> >
>> > The structural change looks very good to me.
>> > It makes the code much easier to manage IMO.
>> >
>> > I am only halfway through this big change, but I wanted to make one meta
>> > comment.
>> >
>> > I have a problem with the choice of naming for the new struct.
>> > 'list' is really an overloaded term and the use of 'list' as a name of
>> > a class that
>> > contains a list head makes for some really confusing constructs like
>> > list->list and mark->obj_list_head which is not a list_head struct.
>>
>> OK, I'll think about better naming. I agree it may be slightly confusing.
>
> So how about naming the type fsnotify_mark_connector? We can use 'conn' as
> a name for local variables. I think that is not as overloaded as 'list'
> and it describes that it is a structure used for connecting marks with
> inode / vfsmount. Would that make things more comprehensible for you?
>

connector sounds good.
As long as it is not list->list it works for me ;-)

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH 06/22] audit: Abstract hash key handling
  2017-01-03 17:34         ` Jan Kara
@ 2017-01-05  2:06           ` Paul Moore
  0 siblings, 0 replies; 80+ messages in thread
From: Paul Moore @ 2017-01-05  2:06 UTC (permalink / raw)
  To: Jan Kara
  Cc: linux-fsdevel, Amir Goldstein, Lino Sanfilippo, Miklos Szeredi,
	linux-audit

On Tue, Jan 3, 2017 at 12:34 PM, Jan Kara <jack@suse.cz> wrote:
> On Fri 23-12-16 09:13:55, Paul Moore wrote:
>> On Fri, Dec 23, 2016 at 8:27 AM, Jan Kara <jack@suse.cz> wrote:
>> > On Thu 22-12-16 18:27:40, Paul Moore wrote:
>> >> On Thu, Dec 22, 2016 at 4:15 AM, Jan Kara <jack@suse.cz> wrote:
>> >> > Audit tree currently uses inode pointer as a key into the hash table.
>> >> > Getting that from notification mark will be somewhat more difficult with
>> >> > coming fsnotify changes and there's no reason we really have to use the
>> >> > inode pointer. So abstract getting of hash key from the audit chunk and
>> >> > inode so that we can switch to a different key easily later.
>> >> >
>> >> > CC: Paul Moore <paul@paul-moore.com>
>> >> > Signed-off-by: Jan Kara <jack@suse.cz>
>> >> > ---
>> >> >  kernel/audit_tree.c | 39 ++++++++++++++++++++++++++++-----------
>> >> >  1 file changed, 28 insertions(+), 11 deletions(-)
>> >>
>> >> I have no objections with this patch in particular, but in patch 8,
>> >> are you certain that inode_to_key() and chunk_to_key() will continue
>> >> to return the same key value?
>> >
>> > Yes, that's the intention. Or better in that patch the key will no longer
>> > be inode pointer but instead the fsnotify_list pointer. But still it would
>> > match for chunks attached to an inode and inode itself so comparison
>> > results should stay the same.
>>
>> My apologies, I probably should have been more clear.
>>
>> Yes, I think we are all in agreement that the *_to_key() functions
>> need to return a consistent value for the same object.  My concern is
>> that in patch 8 these functions are using different variables (granted
>> they may contain the same value, and therefore evaluate to the same
>> key) and I worry that there is a possibility of the two variables
>> taking on different values and breaking the hash.  What guarantees
>> exist that these values will be the same?  Are there any safeguards to
>> prevent future patches from accidentally sidestepping these
>> guarantees?
>
> Ah, OK, so this is more about patch 8 than patch 6. So far audit uses inode
> pointer as a key - now with patch 8, there is a fsnotify_mark_list attached
> to an inode if and only if there is any fsnotify_mark for that inode and
> both inode->i_fsnotify_marks (used as a key in inode_to_key()) and
> mark->obj_list_head (used as a key in chunk_to_key()) point to it. So keys
> for an inode and chunk match if and only if the fsnotify mark in the chunk
> is attached to the inode - the same as before patch 8.
>
> The only reason why I changed audit to use a different pointer for the key
> is that you need some lock protection to do mark->obj_list_head->inode
> dereference and this seemed the easiest. Actually now that all the lifetime
> rules have worked out, I can see we can actually use inode pointer as a key
> relatively easily since mark->obj_list_head is stable once you hold a mark
> reference so locking would be only intermediate step until this gets
> established in the series. Would you prefer me to do that?

Unless you can think of any reason why that would be dangerous, I
think it would be more obvious and easier to maintain as a result.

-- 
paul moore
www.paul-moore.com
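
For readers following the patch 8 discussion above, here is a minimal userspace sketch of the invariant Jan describes: inode_to_key() and chunk_to_key() read different fields, but both fields point at the same fsnotify_mark_list while the mark is attached, so the keys agree exactly in that case. The struct layouts are simplified stand-ins, not the kernel definitions, and attach() and demo_keys_match() are hypothetical helpers added only for illustration.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Simplified stand-ins for the kernel types. Only the fields named in the
 * thread are modelled; these are NOT the real kernel definitions.
 */
struct fsnotify_mark_list {
	int placeholder;
};

struct inode {
	/* NULL unless at least one mark is attached to the inode */
	struct fsnotify_mark_list *i_fsnotify_marks;
};

struct fsnotify_mark {
	/* list head of the object this mark is attached to */
	struct fsnotify_mark_list *obj_list_head;
};

struct audit_chunk {
	struct fsnotify_mark mark;
};

/* Key as seen from the inode side. */
static unsigned long inode_to_key(const struct inode *inode)
{
	return (unsigned long)inode->i_fsnotify_marks;
}

/* Key as seen from the chunk side. */
static unsigned long chunk_to_key(const struct audit_chunk *chunk)
{
	return (unsigned long)chunk->mark.obj_list_head;
}

/* Hypothetical helper: attaching makes both sides share one list head. */
static void attach(struct audit_chunk *chunk, struct inode *inode,
		   struct fsnotify_mark_list *list)
{
	inode->i_fsnotify_marks = list;
	chunk->mark.obj_list_head = list;
}

/* Returns 1 if keys match for attached pairs and differ otherwise. */
int demo_keys_match(void)
{
	struct fsnotify_mark_list list_a = { 0 }, list_b = { 0 };
	struct inode inode_a = { NULL }, inode_b = { NULL };
	struct audit_chunk chunk_a = { { NULL } }, chunk_b = { { NULL } };

	attach(&chunk_a, &inode_a, &list_a);
	attach(&chunk_b, &inode_b, &list_b);

	return inode_to_key(&inode_a) == chunk_to_key(&chunk_a) &&
	       inode_to_key(&inode_b) == chunk_to_key(&chunk_b) &&
	       inode_to_key(&inode_a) != chunk_to_key(&chunk_b);
}
```

Paul's concern corresponds to any code path that updates one of the two pointers without the other. In this sketch only attach() writes them, which is the kind of single choke point that would keep the keys consistent.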

* Re: [PATCH 05/22] audit: Fix sleep in atomic
  2017-01-04  8:50               ` Jan Kara
@ 2017-01-05  2:14                 ` Paul Moore
  0 siblings, 0 replies; 80+ messages in thread
From: Paul Moore @ 2017-01-05  2:14 UTC (permalink / raw)
  To: Jan Kara
  Cc: linux-fsdevel, Amir Goldstein, Lino Sanfilippo, Miklos Szeredi,
	linux-audit

On Wed, Jan 4, 2017 at 3:50 AM, Jan Kara <jack@suse.cz> wrote:
> On Tue 03-01-17 16:11:16, Paul Moore wrote:
>> On Mon, Jan 2, 2017 at 1:21 PM, Jan Kara <jack@suse.cz> wrote:
>> > So I found where the problem was. Attached is a new version of the patch.
>> > Tests from audit-testsuite fail for me but do not hang anymore. I guess the
>> > failing is because I don't have audit or selinux configured in any way and
>> > I'm using SUSE I guess (if there's some easy way to do that, I'd be
>> > interested) - runtests.pl complains that I have to be root although I am...
>>
>> I've never tried running the tests on SUSE, but if audit and SELinux
>> are in some undetermined state, then I can only imagine what weird
>> test results you would get.
>
> Well, the state is well determined - nothing is installed ;) I was kind of
> hoping kernel support would be enough but apparently the testsuite needs
> some userspace installed & configured as well.

Heh, yes :)  If nothing is installed you are likely missing all the audit
userspace tools and auditd probably isn't running; either case would
cause massive (complete?) failures in the testsuite.

>> I'm building a test kernel as I type this, I'll report back when I have
>> some results.
>
> Thanks!

Good news - I looped the testsuite a couple thousand times this
afternoon and the VM was still standing afterwards (on a 4.10-rc2 base
too!).

Let me take a closer look at your revised patch tomorrow (it is
getting late for me) and if all is well I'll send it up to Linus.

>> Also, while I'm sure you've heard this before (and likely already know
>> better), please send patches inline, it makes review/commenting much
>> easier.
>
> Actually, I haven't heard this for quite a long time :) and I myself
> prefer attached patches (with text/plain attachment type) when they are
> in a reply to another email - they are easier to extract and at least my
> mail client automatically inlines them when I hit reply... Arguably the
> best of both worlds is to use git-send-email with properly set threading
> but I tend to forget about that option.

You probably haven't heard this in some time because everyone else
isn't as cranky and stubborn as me ;)  It isn't a big deal for one
small patch with only a few interested parties, but when you get
several people discussing the patch it can be very handy to have it
inline.

-- 
paul moore
www.paul-moore.com

Thread overview: 80+ messages
2016-12-22  9:15 [PATCH 0/22] fsnotify: Avoid SRCU stalls with fanotify permission events Jan Kara
2016-12-22  9:15 ` [PATCH 01/22] fsnotify: Remove unnecessary tests when showing fdinfo Jan Kara
2016-12-22 12:59   ` Amir Goldstein
2016-12-22 15:16     ` Jan Kara
2016-12-22 15:54       ` Amir Goldstein
2016-12-22  9:15 ` [PATCH 02/22] inotify: Remove inode pointers from debug messages Jan Kara
2016-12-22 15:31   ` Amir Goldstein
2016-12-22  9:15 ` [PATCH 03/22] fanotify: Move recalculation of inode / vfsmount mask under mark_mutex Jan Kara
2016-12-22 16:27   ` Amir Goldstein
2016-12-22 17:31     ` Jan Kara
2016-12-22 19:08       ` Amir Goldstein
2016-12-22  9:15 ` [PATCH 04/22] fsnotify: Remove fsnotify_duplicate_mark() Jan Kara
2016-12-22 23:13   ` Paul Moore
2016-12-23 13:22     ` Jan Kara
2016-12-23 14:01       ` Paul Moore
2016-12-22  9:15 ` [PATCH 05/22] audit: Fix sleep in atomic Jan Kara
2016-12-22 23:18   ` Paul Moore
2016-12-23 13:24     ` Jan Kara
2016-12-23 14:17       ` Paul Moore
2016-12-26 16:33         ` Paul Moore
2017-01-02 18:21           ` Jan Kara
2017-01-03 21:11             ` Paul Moore
2017-01-03 21:11               ` Paul Moore
2017-01-04  8:50               ` Jan Kara
2017-01-05  2:14                 ` Paul Moore
2016-12-22  9:15 ` [PATCH 06/22] audit: Abstract hash key handling Jan Kara
2016-12-22 23:27   ` Paul Moore
2016-12-23 13:27     ` Jan Kara
2016-12-23 14:13       ` Paul Moore
2017-01-03 17:34         ` Jan Kara
2017-01-05  2:06           ` Paul Moore
2016-12-22  9:15 ` [PATCH 07/22] fsnotify: Update comments Jan Kara
2016-12-23  4:45   ` Amir Goldstein
2016-12-22  9:15 ` [PATCH 08/22] fsnotify: Attach marks to object via dedicated head structure Jan Kara
2016-12-23  5:48   ` Amir Goldstein
2016-12-23 13:34     ` Jan Kara
2017-01-04 13:38       ` Jan Kara
2017-01-04 15:29         ` Amir Goldstein
2016-12-22  9:15 ` [PATCH 09/22] inotify: Do not drop mark reference under idr_lock Jan Kara
2016-12-23  8:04   ` Amir Goldstein
2016-12-22  9:15 ` [PATCH 10/22] fsnotify: Detach mark from object list when last reference is dropped Jan Kara
2016-12-23 10:51   ` Amir Goldstein
2016-12-23 13:42     ` Jan Kara
2016-12-22  9:15 ` [PATCH 11/22] fsnotify: Remove special handling of mark destruction on group shutdown Jan Kara
2016-12-23 12:12   ` Amir Goldstein
2016-12-23 13:31     ` Jan Kara
2016-12-22  9:15 ` [PATCH 12/22] fsnotify: Move queueing of mark for destruction into fsnotify_put_mark() Jan Kara
2016-12-26 14:15   ` Amir Goldstein
2017-01-04 10:28     ` Jan Kara
2017-01-04 12:22       ` Jan Kara
2016-12-22  9:15 ` [PATCH 13/22] fsnotify: Provide framework for dropping SRCU lock in ->handle_event Jan Kara
2016-12-26 15:01   ` Amir Goldstein
2016-12-26 15:11   ` Amir Goldstein
2017-01-04  9:03     ` Jan Kara
2017-01-04 10:50       ` Amir Goldstein
2017-01-04 11:45         ` Jan Kara
2016-12-26 18:37   ` Amir Goldstein
2016-12-22  9:15 ` [PATCH 14/22] fsnotify: Pass SRCU index into handle_event handler Jan Kara
2016-12-26 15:13   ` Amir Goldstein
2016-12-22  9:15 ` [PATCH 15/22] fanotify: Release SRCU lock when waiting for userspace response Jan Kara
2016-12-26 15:22   ` Amir Goldstein
2017-01-04  9:05     ` Jan Kara
2016-12-22  9:15 ` [PATCH 16/22] fsnotify: Remove fsnotify_set_mark_{,ignored_}mask_locked() Jan Kara
2016-12-26 16:42   ` Amir Goldstein
2016-12-22  9:15 ` [PATCH 17/22] fsnotify: Remove fsnotify_recalc_{inode|vfsmount}_mask() Jan Kara
2016-12-26 16:44   ` Amir Goldstein
2016-12-22  9:15 ` [PATCH 18/22] fsnotify: Inline fsnotify_clear_{inode|vfsmount|_mark_group() Jan Kara
2016-12-26 16:57   ` Amir Goldstein
2017-01-04  9:28     ` Jan Kara
2016-12-22  9:15 ` [PATCH 19/22] fsnotify: Remove fsnotify_find_{inode|vfsmount}_mark() Jan Kara
2016-12-26 17:14   ` Amir Goldstein
2016-12-22  9:15 ` [PATCH 20/22] fsnotify: Drop inode_mark.c Jan Kara
2016-12-26 17:15   ` Amir Goldstein
2016-12-22  9:15 ` [PATCH 21/22] fsnotify: Add group pointer in fsnotify_init_mark() Jan Kara
2016-12-26 17:34   ` Amir Goldstein
2016-12-22  9:15 ` [PATCH 22/22] fsnotify: Move ->free_mark callback to fsnotify_ops Jan Kara
2016-12-26 17:39   ` Amir Goldstein
2016-12-22 20:58 ` [PATCH 0/22] fsnotify: Avoid SRCU stalls with fanotify permission events Paul Moore
2016-12-22 21:05   ` Amir Goldstein
2016-12-22 23:04     ` Paul Moore
