All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 4.4 0/5] fanotify: Fix notification subsystem hang
@ 2019-04-10  4:54 Matthew Ruffell
  2019-04-10  4:54 ` [PATCH 4.4 1/5] fsnotify: turn fsnotify reaper thread into a workqueue job Matthew Ruffell
                   ` (5 more replies)
  0 siblings, 6 replies; 8+ messages in thread
From: Matthew Ruffell @ 2019-04-10  4:54 UTC (permalink / raw)
  To: stable

BugLink: https://bugs.launchpad.net/bugs/1775165

[Note to upstream]
I understand that this patch is a little long for -stable, but this patch series
fixes a real issue, seen by real users, is testable, and is made up from
upstream commits. Please consider it.

[Impact]

When userspace tasks which are processing fanotify permission events act 
incorrectly, the fsnotify_mark_srcu SRCU is held indefinitely which causes
the whole notification subsystem to hang. 

This has been seen in production, and it can also be seen when running the 
Linux Test Project testsuite, specifically fanotify07. 

[Fix]

Instead of holding the SRCU lock while waiting for userspace to respond, 
which may never happen, or not in the order we are expecting, we drop the 
fsnotify_mark_srcu SRCU lock before waiting for userspace response, and then 
reacquire the lock again when userspace responds.

The fixes are from a series of upstream commits:

05f0e38724e8449184acd8fbf0473ee5a07adc6c (cherry-pick)
9385a84d7e1f658bb2d96ab798393e4b16268aaa (backport)
abc77577a669f424c5d0c185b9994f2621c52aa4 (backport)

The following are upstream commits necessary for the fixes to function:

35e481761cdc688dbee0ef552a13f49af8eba6cc (backport)
0918f1c309b86301605650c836ddd2021d311ae2 (cherry-pick)

[Testcase]

You can reproduce the problem pretty quickly with the Linux Test Project:

Steps (with root):
  1. sudo apt-get install git xfsprogs -y
  2. git clone --depth=1 https://github.com/linux-test-project/ltp.git
  3. cd ltp
  4. make autotools
  5. ./configure
  6. make; make install
  7. cd /opt/ltp
  8. echo -e "fanotify07 fanotify07 \nfanotify08 fanotify08" > /tmp/jobs
  9. ./runltp -f /tmp/jobs
  
On a stock Xenial kernel, the system will hang, and the testcase will look like:

<<<test_start>>>
tag=fanotify07 stime=1554326200
cmdline="fanotify07 "
contacts=""
analysis=exit
<<<test_output>>>
tst_test.c:1096: INFO: Timeout per run is 0h 05m 00s
Test timeouted, sending SIGKILL!
Test timeouted, sending SIGKILL!
Test timeouted, sending SIGKILL!
Test timeouted, sending SIGKILL!
Test timeouted, sending SIGKILL!
Test timeouted, sending SIGKILL!
Test timeouted, sending SIGKILL!
Test timeouted, sending SIGKILL!
Test timeouted, sending SIGKILL!
Test timeouted, sending SIGKILL!
Test timeouted, sending SIGKILL!
Cannot kill test processes!
Congratulation, likely test hit a kernel bug.
Exitting uncleanly...
<<<execution_status>>>
initiation_status="ok"
duration=350 termination_type=exited termination_id=1 corefile=no
cutime=0 cstime=0
<<<test_end>>>

Looking at dmesg, we see the following call stack

[  790.772792] LTP: starting fanotify07 (fanotify07 )
[  960.140455] INFO: task fsnotify_mark:36 blocked for more than 120 seconds.
[  960.140867]       Not tainted 4.4.0-142-generic #168-Ubuntu
[  960.141185] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  960.141498] fsnotify_mark   D ffff8800b6703c98     0    36      2 0x00000000
[  960.141516]  ffff8800b6703c98 ffff88013a558a00 ffff8800b7797000 ffff8800b66f8000
[  960.141524]  ffff8800b6704000 7fffffffffffffff ffff8800b6703de0 ffff8800b66f8000
[  960.141528]  0000000000000000 ffff8800b6703cb0 ffffffff8185cb45 ffff8800b6703de8
[  960.141532] Call Trace:
[  960.141580]  [<ffffffff8185cb45>] schedule+0x35/0x80
[  960.141588]  [<ffffffff818600f4>] schedule_timeout+0x1b4/0x270
[  960.141617]  [<ffffffff810f57ac>] ? mod_timer+0x10c/0x240
[  960.141621]  [<ffffffff8185c60d>] ? __schedule+0x30d/0x810
[  960.141625]  [<ffffffff8185d652>] wait_for_completion+0xb2/0x190
[  960.141636]  [<ffffffff810b1f10>] ? wake_up_q+0x70/0x70
[  960.141641]  [<ffffffff810eb140>] __synchronize_srcu+0x100/0x1a0
[  960.141645]  [<ffffffff810ea400>] ? trace_raw_output_rcu_utilization+0x60/0x60
[  960.141664]  [<ffffffff81260870>] ? fsnotify_put_mark+0x40/0x40
[  960.141669]  [<ffffffff810eb204>] synchronize_srcu+0x24/0x30
[  960.141672]  [<ffffffff812608f4>] fsnotify_mark_destroy+0x84/0x130
[  960.141680]  [<ffffffff810ca000>] ? wake_atomic_t_function+0x60/0x60
[  960.141691]  [<ffffffff810a6227>] kthread+0xe7/0x100
[  960.141694]  [<ffffffff8185c601>] ? __schedule+0x301/0x810
[  960.141699]  [<ffffffff810a6140>] ? kthread_create_on_node+0x1e0/0x1e0
[  960.141703]  [<ffffffff818618e5>] ret_from_fork+0x55/0x80
[  960.141706]  [<ffffffff810a6140>] ? kthread_create_on_node+0x1e0/0x1e0

The vanilla 4.4 kernel also shows the same call stack.

On a patched kernel, the test will pass successfully, and there will be no
messages in dmesg. 

[Regression Potential]

This makes modifications to how locking is performed in fsnotify / fanotify and 
there may be some cause for regression. Running all fanotify Linux Test Project
tests shows that there are no extra failures caused by the patches, and instead
fewer failures are seen due to the bugfix. 

Running the entire Linux Test Project testsuite actually works and runs to 
completion, something which doesn't happen in a unpatched kernel since it will 
hang on the fanotify07 test.

The patches are taken from upstream, and all necessary commits have been taken
into account, so I am happy with the potential risks and that testing has been
completed.

Jan Kara (4):
  fsnotify: avoid spurious EMFILE errors from inotify_init()
  fsnotify: Provide framework for dropping SRCU lock in ->handle_event
  fsnotify: Pass fsnotify_iter_info into handle_event handler
  fanotify: Release SRCU lock when waiting for userspace response

Jeff Layton (1):
  fsnotify: turn fsnotify reaper thread into a workqueue job

 fs/notify/dnotify/dnotify.c          |   3 +-
 fs/notify/fanotify/fanotify.c        |  20 ++-
 fs/notify/fsnotify.c                 |  19 ++-
 fs/notify/fsnotify.h                 |  13 ++
 fs/notify/group.c                    |  18 ++-
 fs/notify/inotify/inotify.h          |   3 +-
 fs/notify/inotify/inotify_fsnotify.c |   3 +-
 fs/notify/inotify/inotify_user.c     |   2 +-
 fs/notify/mark.c                     | 194 +++++++++++++++++++++------
 include/linux/fsnotify_backend.h     |  10 +-
 kernel/audit_fsnotify.c              |   3 +-
 kernel/audit_tree.c                  |   3 +-
 kernel/audit_watch.c                 |   3 +-
 13 files changed, 230 insertions(+), 64 deletions(-)

-- 
2.19.1


^ permalink raw reply	[flat|nested] 8+ messages in thread

* [PATCH 4.4 1/5] fsnotify: turn fsnotify reaper thread into a workqueue job
  2019-04-10  4:54 [PATCH 4.4 0/5] fanotify: Fix notification subsystem hang Matthew Ruffell
@ 2019-04-10  4:54 ` Matthew Ruffell
  2019-04-10  4:54 ` [PATCH 4.4 2/5] fsnotify: avoid spurious EMFILE errors from inotify_init() Matthew Ruffell
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 8+ messages in thread
From: Matthew Ruffell @ 2019-04-10  4:54 UTC (permalink / raw)
  To: stable

From: Jeff Layton <jlayton@poochiereds.net>

commit 0918f1c309b86301605650c836ddd2021d311ae2 upstream.

We don't require a dedicated thread for fsnotify cleanup.  Switch it
over to a workqueue job instead that runs on the system_unbound_wq.

In the interest of not thrashing the queued job too often when there are
a lot of marks being removed, we delay the reaper job slightly when
queueing it, to allow several to gather on the list.

Signed-off-by: Jeff Layton <jeff.layton@primarydata.com>
Tested-by: Eryu Guan <guaneryu@gmail.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: Eric Paris <eparis@parisplace.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
[mruffell: cherry picked]
Signed-off-by: Matthew Ruffell <matthew.ruffell@canonical.com>
---
 fs/notify/mark.c | 49 ++++++++++++++++++------------------------------
 1 file changed, 18 insertions(+), 31 deletions(-)

diff --git a/fs/notify/mark.c b/fs/notify/mark.c
index fc0df4442f7b..7115c5d7d373 100644
--- a/fs/notify/mark.c
+++ b/fs/notify/mark.c
@@ -91,10 +91,14 @@
 #include <linux/fsnotify_backend.h>
 #include "fsnotify.h"
 
+#define FSNOTIFY_REAPER_DELAY	(1)	/* 1 jiffy */
+
 struct srcu_struct fsnotify_mark_srcu;
 static DEFINE_SPINLOCK(destroy_lock);
 static LIST_HEAD(destroy_list);
-static DECLARE_WAIT_QUEUE_HEAD(destroy_waitq);
+
+static void fsnotify_mark_destroy(struct work_struct *work);
+static DECLARE_DELAYED_WORK(reaper_work, fsnotify_mark_destroy);
 
 void fsnotify_get_mark(struct fsnotify_mark *mark)
 {
@@ -189,7 +193,8 @@ void fsnotify_free_mark(struct fsnotify_mark *mark)
 	spin_lock(&destroy_lock);
 	list_add(&mark->g_list, &destroy_list);
 	spin_unlock(&destroy_lock);
-	wake_up(&destroy_waitq);
+	queue_delayed_work(system_unbound_wq, &reaper_work,
+				FSNOTIFY_REAPER_DELAY);
 
 	/*
 	 * Some groups like to know that marks are being freed.  This is a
@@ -388,7 +393,8 @@ err:
 	spin_lock(&destroy_lock);
 	list_add(&mark->g_list, &destroy_list);
 	spin_unlock(&destroy_lock);
-	wake_up(&destroy_waitq);
+	queue_delayed_work(system_unbound_wq, &reaper_work,
+				FSNOTIFY_REAPER_DELAY);
 
 	return ret;
 }
@@ -493,39 +499,20 @@ void fsnotify_init_mark(struct fsnotify_mark *mark,
 	mark->free_mark = free_mark;
 }
 
-static int fsnotify_mark_destroy(void *ignored)
+static void fsnotify_mark_destroy(struct work_struct *work)
 {
 	struct fsnotify_mark *mark, *next;
 	struct list_head private_destroy_list;
 
-	for (;;) {
-		spin_lock(&destroy_lock);
-		/* exchange the list head */
-		list_replace_init(&destroy_list, &private_destroy_list);
-		spin_unlock(&destroy_lock);
-
-		synchronize_srcu(&fsnotify_mark_srcu);
+	spin_lock(&destroy_lock);
+	/* exchange the list head */
+	list_replace_init(&destroy_list, &private_destroy_list);
+	spin_unlock(&destroy_lock);
 
-		list_for_each_entry_safe(mark, next, &private_destroy_list, g_list) {
-			list_del_init(&mark->g_list);
-			fsnotify_put_mark(mark);
-		}
+	synchronize_srcu(&fsnotify_mark_srcu);
 
-		wait_event_interruptible(destroy_waitq, !list_empty(&destroy_list));
+	list_for_each_entry_safe(mark, next, &private_destroy_list, g_list) {
+		list_del_init(&mark->g_list);
+		fsnotify_put_mark(mark);
 	}
-
-	return 0;
-}
-
-static int __init fsnotify_mark_init(void)
-{
-	struct task_struct *thread;
-
-	thread = kthread_run(fsnotify_mark_destroy, NULL,
-			     "fsnotify_mark");
-	if (IS_ERR(thread))
-		panic("unable to start fsnotify mark destruction thread.");
-
-	return 0;
 }
-device_initcall(fsnotify_mark_init);
-- 
2.19.1


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* [PATCH 4.4 2/5] fsnotify: avoid spurious EMFILE errors from inotify_init()
  2019-04-10  4:54 [PATCH 4.4 0/5] fanotify: Fix notification subsystem hang Matthew Ruffell
  2019-04-10  4:54 ` [PATCH 4.4 1/5] fsnotify: turn fsnotify reaper thread into a workqueue job Matthew Ruffell
@ 2019-04-10  4:54 ` Matthew Ruffell
  2019-04-10  4:54 ` [PATCH 4.4 3/5] fsnotify: Provide framework for dropping SRCU lock in ->handle_event Matthew Ruffell
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 8+ messages in thread
From: Matthew Ruffell @ 2019-04-10  4:54 UTC (permalink / raw)
  To: stable

From: Jan Kara <jack@suse.cz>

commit 35e481761cdc688dbee0ef552a13f49af8eba6cc upstream.

Inotify instance is destroyed when all references to it are dropped.
That not only means that the corresponding file descriptor needs to be
closed but also that all corresponding instance marks are freed (as each
mark holds a reference to the inotify instance).  However marks are
freed only after SRCU period ends which can take some time and thus if
user rapidly creates and frees inotify instances, number of existing
inotify instances can exceed max_user_instances limit although from user
point of view there is always at most one existing instance.  Thus
inotify_init() returns EMFILE error which is hard to justify from user
point of view.  This problem is exposed by LTP inotify06 testcase on
some machines.

We fix the problem by making sure all group marks are properly freed
while destroying inotify instance.  We wait for SRCU period to end in
that path anyway since we have to make sure there is no event being
added to the instance while we are tearing down the instance.  So it
takes only some plumbing to allow for marks to be destroyed in that path
as well and not from a dedicated work item.

[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Jan Kara <jack@suse.cz>
Reported-by: Xiaoguang Wang <wangxg.fnst@cn.fujitsu.com>
Tested-by: Xiaoguang Wang <wangxg.fnst@cn.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
[mruffell: backport: adjust layout of fsnotify_destroy_group()]
Signed-off-by: Matthew Ruffell <matthew.ruffell@canonical.com>
---
 fs/notify/fsnotify.h             |  7 +++
 fs/notify/group.c                | 17 +++++--
 fs/notify/mark.c                 | 78 +++++++++++++++++++++++++-------
 include/linux/fsnotify_backend.h |  2 -
 4 files changed, 81 insertions(+), 23 deletions(-)

diff --git a/fs/notify/fsnotify.h b/fs/notify/fsnotify.h
index b44c68a857e7..0a3bc2cf192c 100644
--- a/fs/notify/fsnotify.h
+++ b/fs/notify/fsnotify.h
@@ -56,6 +56,13 @@ static inline void fsnotify_clear_marks_by_mount(struct vfsmount *mnt)
 	fsnotify_destroy_marks(&real_mount(mnt)->mnt_fsnotify_marks,
 			       &mnt->mnt_root->d_lock);
 }
+/* prepare for freeing all marks associated with given group */
+extern void fsnotify_detach_group_marks(struct fsnotify_group *group);
+/*
+ * wait for fsnotify_mark_srcu period to end and free all marks in destroy_list
+ */
+extern void fsnotify_mark_destroy_list(void);
+
 /*
  * update the dentry->d_flags of all of inode's children to indicate if inode cares
  * about events that happen to its children.
diff --git a/fs/notify/group.c b/fs/notify/group.c
index 18eb30c6bd8f..b47f7cfdcaa4 100644
--- a/fs/notify/group.c
+++ b/fs/notify/group.c
@@ -66,12 +66,21 @@ void fsnotify_destroy_group(struct fsnotify_group *group)
 	 */
 	fsnotify_group_stop_queueing(group);
 
-	/* clear all inode marks for this group */
-	fsnotify_clear_marks_by_group(group);
+	/* clear all inode marks for this group, attach them to destroy_list */
+	fsnotify_detach_group_marks(group);
 
-	synchronize_srcu(&fsnotify_mark_srcu);
+	/*
+	 * Wait for fsnotify_mark_srcu period to end and free all marks in
+	 * destroy_list
+	 */
+	fsnotify_mark_destroy_list();
 
-	/* clear the notification queue of all events */
+	/*
+	 * Since we have waited for fsnotify_mark_srcu in
+	 * fsnotify_mark_destroy_list() there can be no outstanding event
+	 * notification against this group. So clearing the notification queue
+	 * of all events is reliable now.
+	 */
 	fsnotify_flush_notify(group);
 
 	/*
diff --git a/fs/notify/mark.c b/fs/notify/mark.c
index 7115c5d7d373..d3fea0bd89e2 100644
--- a/fs/notify/mark.c
+++ b/fs/notify/mark.c
@@ -97,8 +97,8 @@ struct srcu_struct fsnotify_mark_srcu;
 static DEFINE_SPINLOCK(destroy_lock);
 static LIST_HEAD(destroy_list);
 
-static void fsnotify_mark_destroy(struct work_struct *work);
-static DECLARE_DELAYED_WORK(reaper_work, fsnotify_mark_destroy);
+static void fsnotify_mark_destroy_workfn(struct work_struct *work);
+static DECLARE_DELAYED_WORK(reaper_work, fsnotify_mark_destroy_workfn);
 
 void fsnotify_get_mark(struct fsnotify_mark *mark)
 {
@@ -173,11 +173,15 @@ void fsnotify_detach_mark(struct fsnotify_mark *mark)
 }
 
 /*
- * Free fsnotify mark. The freeing is actually happening from a kthread which
- * first waits for srcu period end. Caller must have a reference to the mark
- * or be protected by fsnotify_mark_srcu.
+ * Prepare mark for freeing and add it to the list of marks prepared for
+ * freeing. The actual freeing must happen after SRCU period ends and the
+ * caller is responsible for this.
+ *
+ * The function returns true if the mark was added to the list of marks for
+ * freeing. The function returns false if someone else has already called
+ * __fsnotify_free_mark() for the mark.
  */
-void fsnotify_free_mark(struct fsnotify_mark *mark)
+static bool __fsnotify_free_mark(struct fsnotify_mark *mark)
 {
 	struct fsnotify_group *group = mark->group;
 
@@ -185,17 +189,11 @@ void fsnotify_free_mark(struct fsnotify_mark *mark)
 	/* something else already called this function on this mark */
 	if (!(mark->flags & FSNOTIFY_MARK_FLAG_ALIVE)) {
 		spin_unlock(&mark->lock);
-		return;
+		return false;
 	}
 	mark->flags &= ~FSNOTIFY_MARK_FLAG_ALIVE;
 	spin_unlock(&mark->lock);
 
-	spin_lock(&destroy_lock);
-	list_add(&mark->g_list, &destroy_list);
-	spin_unlock(&destroy_lock);
-	queue_delayed_work(system_unbound_wq, &reaper_work,
-				FSNOTIFY_REAPER_DELAY);
-
 	/*
 	 * Some groups like to know that marks are being freed.  This is a
 	 * callback to the group function to let it know that this mark
@@ -203,6 +201,25 @@ void fsnotify_free_mark(struct fsnotify_mark *mark)
 	 */
 	if (group->ops->freeing_mark)
 		group->ops->freeing_mark(mark, group);
+
+	spin_lock(&destroy_lock);
+	list_add(&mark->g_list, &destroy_list);
+	spin_unlock(&destroy_lock);
+
+	return true;
+}
+
+/*
+ * Free fsnotify mark. The freeing is actually happening from a workqueue which
+ * first waits for srcu period end. Caller must have a reference to the mark
+ * or be protected by fsnotify_mark_srcu.
+ */
+void fsnotify_free_mark(struct fsnotify_mark *mark)
+{
+	if (__fsnotify_free_mark(mark)) {
+		queue_delayed_work(system_unbound_wq, &reaper_work,
+				   FSNOTIFY_REAPER_DELAY);
+	}
 }
 
 void fsnotify_destroy_mark(struct fsnotify_mark *mark,
@@ -468,11 +485,29 @@ void fsnotify_clear_marks_by_group_flags(struct fsnotify_group *group,
 }
 
 /*
- * Given a group, destroy all of the marks associated with that group.
+ * Given a group, prepare for freeing all the marks associated with that group.
+ * The marks are attached to the list of marks prepared for destruction, the
+ * caller is responsible for freeing marks in that list after SRCU period has
+ * ended.
  */
-void fsnotify_clear_marks_by_group(struct fsnotify_group *group)
+void fsnotify_detach_group_marks(struct fsnotify_group *group)
 {
-	fsnotify_clear_marks_by_group_flags(group, (unsigned int)-1);
+	struct fsnotify_mark *mark;
+
+	while (1) {
+		mutex_lock_nested(&group->mark_mutex, SINGLE_DEPTH_NESTING);
+		if (list_empty(&group->marks_list)) {
+			mutex_unlock(&group->mark_mutex);
+			break;
+		}
+		mark = list_first_entry(&group->marks_list,
+					struct fsnotify_mark, g_list);
+		fsnotify_get_mark(mark);
+		fsnotify_detach_mark(mark);
+		mutex_unlock(&group->mark_mutex);
+		__fsnotify_free_mark(mark);
+		fsnotify_put_mark(mark);
+	}
 }
 
 void fsnotify_duplicate_mark(struct fsnotify_mark *new, struct fsnotify_mark *old)
@@ -499,7 +534,11 @@ void fsnotify_init_mark(struct fsnotify_mark *mark,
 	mark->free_mark = free_mark;
 }
 
-static void fsnotify_mark_destroy(struct work_struct *work)
+/*
+ * Destroy all marks in destroy_list, waits for SRCU period to finish before
+ * actually freeing marks.
+ */
+void fsnotify_mark_destroy_list(void)
 {
 	struct fsnotify_mark *mark, *next;
 	struct list_head private_destroy_list;
@@ -516,3 +555,8 @@ static void fsnotify_mark_destroy(struct work_struct *work)
 		fsnotify_put_mark(mark);
 	}
 }
+
+static void fsnotify_mark_destroy_workfn(struct work_struct *work)
+{
+	fsnotify_mark_destroy_list();
+}
diff --git a/include/linux/fsnotify_backend.h b/include/linux/fsnotify_backend.h
index 850d8822e8ff..c611724ff16b 100644
--- a/include/linux/fsnotify_backend.h
+++ b/include/linux/fsnotify_backend.h
@@ -364,8 +364,6 @@ extern void fsnotify_clear_vfsmount_marks_by_group(struct fsnotify_group *group)
 extern void fsnotify_clear_inode_marks_by_group(struct fsnotify_group *group);
 /* run all the marks in a group, and clear all of the marks where mark->flags & flags is true*/
 extern void fsnotify_clear_marks_by_group_flags(struct fsnotify_group *group, unsigned int flags);
-/* run all the marks in a group, and flag them to be freed */
-extern void fsnotify_clear_marks_by_group(struct fsnotify_group *group);
 extern void fsnotify_get_mark(struct fsnotify_mark *mark);
 extern void fsnotify_put_mark(struct fsnotify_mark *mark);
 extern void fsnotify_unmount_inodes(struct super_block *sb);
-- 
2.19.1


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* [PATCH 4.4 3/5] fsnotify: Provide framework for dropping SRCU lock in ->handle_event
  2019-04-10  4:54 [PATCH 4.4 0/5] fanotify: Fix notification subsystem hang Matthew Ruffell
  2019-04-10  4:54 ` [PATCH 4.4 1/5] fsnotify: turn fsnotify reaper thread into a workqueue job Matthew Ruffell
  2019-04-10  4:54 ` [PATCH 4.4 2/5] fsnotify: avoid spurious EMFILE errors from inotify_init() Matthew Ruffell
@ 2019-04-10  4:54 ` Matthew Ruffell
  2019-04-10  4:54 ` [PATCH 4.4 4/5] fsnotify: Pass fsnotify_iter_info into handle_event handler Matthew Ruffell
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 8+ messages in thread
From: Matthew Ruffell @ 2019-04-10  4:54 UTC (permalink / raw)
  To: stable

From: Jan Kara <jack@suse.cz>

commit abc77577a669f424c5d0c185b9994f2621c52aa4 upstream.

fanotify wants to drop fsnotify_mark_srcu lock when waiting for response
from userspace so that the whole notification subsystem is not blocked
during that time. This patch provides a framework for safely getting
mark reference for a mark found in the object list which pins the mark
in that list. We can then drop fsnotify_mark_srcu, wait for userspace
response and then safely continue iteration of the object list once we
reaquire fsnotify_mark_srcu.

Reviewed-by: Miklos Szeredi <mszeredi@redhat.com>
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>
[mruffell: backport: realign file fs/notify/mark.c]
Signed-off-by: Matthew Ruffell <matthew.ruffell@canonical.com>
---
 fs/notify/fsnotify.h             |  6 +++
 fs/notify/group.c                |  1 +
 fs/notify/mark.c                 | 83 +++++++++++++++++++++++++++++++-
 include/linux/fsnotify_backend.h |  5 ++
 4 files changed, 94 insertions(+), 1 deletion(-)

diff --git a/fs/notify/fsnotify.h b/fs/notify/fsnotify.h
index 0a3bc2cf192c..0ad0eb9f2e14 100644
--- a/fs/notify/fsnotify.h
+++ b/fs/notify/fsnotify.h
@@ -8,6 +8,12 @@
 
 #include "../mount.h"
 
+struct fsnotify_iter_info {
+	struct fsnotify_mark *inode_mark;
+	struct fsnotify_mark *vfsmount_mark;
+	int srcu_idx;
+};
+
 /* destroy all events sitting in this groups notification queue */
 extern void fsnotify_flush_notify(struct fsnotify_group *group);
 
diff --git a/fs/notify/group.c b/fs/notify/group.c
index b47f7cfdcaa4..4c63b148835f 100644
--- a/fs/notify/group.c
+++ b/fs/notify/group.c
@@ -124,6 +124,7 @@ struct fsnotify_group *fsnotify_alloc_group(const struct fsnotify_ops *ops)
 	/* set to 0 when there a no external references to this group */
 	atomic_set(&group->refcnt, 1);
 	atomic_set(&group->num_marks, 0);
+	atomic_set(&group->user_waits, 0);
 
 	mutex_init(&group->notification_mutex);
 	INIT_LIST_HEAD(&group->notification_list);
diff --git a/fs/notify/mark.c b/fs/notify/mark.c
index d3fea0bd89e2..d3005d95d530 100644
--- a/fs/notify/mark.c
+++ b/fs/notify/mark.c
@@ -105,6 +105,16 @@ void fsnotify_get_mark(struct fsnotify_mark *mark)
 	atomic_inc(&mark->refcnt);
 }
 
+/*
+ * Get mark reference when we found the mark via lockless traversal of object
+ * list. Mark can be already removed from the list by now and on its way to be
+ * destroyed once SRCU period ends.
+ */
+static bool fsnotify_get_mark_safe(struct fsnotify_mark *mark)
+{
+	return atomic_inc_not_zero(&mark->refcnt);
+}
+
 void fsnotify_put_mark(struct fsnotify_mark *mark)
 {
 	if (atomic_dec_and_test(&mark->refcnt)) {
@@ -125,6 +135,72 @@ u32 fsnotify_recalc_mask(struct hlist_head *head)
 	return new_mask;
 }
 
+bool fsnotify_prepare_user_wait(struct fsnotify_iter_info *iter_info)
+{
+	struct fsnotify_group *group;
+
+	if (WARN_ON_ONCE(!iter_info->inode_mark && !iter_info->vfsmount_mark))
+		return false;
+
+	if (iter_info->inode_mark)
+		group = iter_info->inode_mark->group;
+	else
+		group = iter_info->vfsmount_mark->group;
+
+	/*
+	 * Since acquisition of mark reference is an atomic op as well, we can
+	 * be sure this inc is seen before any effect of refcount increment.
+	 */
+	atomic_inc(&group->user_waits);
+
+	if (iter_info->inode_mark) {
+		/* This can fail if mark is being removed */
+		if (!fsnotify_get_mark_safe(iter_info->inode_mark))
+			goto out_wait;
+	}
+	if (iter_info->vfsmount_mark) {
+		if (!fsnotify_get_mark_safe(iter_info->vfsmount_mark))
+			goto out_inode;
+	}
+
+	/*
+	 * Now that both marks are pinned by refcount in the inode / vfsmount
+	 * lists, we can drop SRCU lock, and safely resume the list iteration
+	 * once userspace returns.
+	 */
+	srcu_read_unlock(&fsnotify_mark_srcu, iter_info->srcu_idx);
+
+	return true;
+out_inode:
+	if (iter_info->inode_mark)
+		fsnotify_put_mark(iter_info->inode_mark);
+out_wait:
+	if (atomic_dec_and_test(&group->user_waits) && group->shutdown)
+		wake_up(&group->notification_waitq);
+	return false;
+}
+
+void fsnotify_finish_user_wait(struct fsnotify_iter_info *iter_info)
+{
+	struct fsnotify_group *group = NULL;
+
+	iter_info->srcu_idx = srcu_read_lock(&fsnotify_mark_srcu);
+	if (iter_info->inode_mark) {
+		group = iter_info->inode_mark->group;
+		fsnotify_put_mark(iter_info->inode_mark);
+	}
+	if (iter_info->vfsmount_mark) {
+		group = iter_info->vfsmount_mark->group;
+		fsnotify_put_mark(iter_info->vfsmount_mark);
+	}
+	/*
+	 * We abuse notification_waitq on group shutdown for waiting for all
+	 * marks pinned when waiting for userspace.
+	 */
+	if (atomic_dec_and_test(&group->user_waits) && group->shutdown)
+		wake_up(&group->notification_waitq);
+}
+
 /*
  * Remove mark from inode / vfsmount list, group list, drop inode reference
  * if we got one.
@@ -161,7 +237,6 @@ void fsnotify_detach_mark(struct fsnotify_mark *mark)
 	 * __fsnotify_parent() lazily when next event happens on one of our
 	 * children.
 	 */
-
 	list_del_init(&mark->g_list);
 
 	spin_unlock(&mark->lock);
@@ -508,6 +583,12 @@ void fsnotify_detach_group_marks(struct fsnotify_group *group)
 		__fsnotify_free_mark(mark);
 		fsnotify_put_mark(mark);
 	}
+	/*
+	 * Some marks can still be pinned when waiting for response from
+	 * userspace. Wait for those now. fsnotify_prepare_user_wait() will
+	 * not succeed now so this wait is race-free.
+	 */
+	wait_event(group->notification_waitq, !atomic_read(&group->user_waits));
 }
 
 void fsnotify_duplicate_mark(struct fsnotify_mark *new, struct fsnotify_mark *old)
diff --git a/include/linux/fsnotify_backend.h b/include/linux/fsnotify_backend.h
index c611724ff16b..c7c5ea590d54 100644
--- a/include/linux/fsnotify_backend.h
+++ b/include/linux/fsnotify_backend.h
@@ -79,6 +79,7 @@ struct fsnotify_event;
 struct fsnotify_mark;
 struct fsnotify_event_private_data;
 struct fsnotify_fname;
+struct fsnotify_iter_info;
 
 /*
  * Each group much define these ops.  The fsnotify infrastructure will call
@@ -162,6 +163,8 @@ struct fsnotify_group {
 	struct fsnotify_event *overflow_event;	/* Event we queue when the
 						 * notification list is too
 						 * full */
+	atomic_t user_waits;		/* Number of tasks waiting for user
+					 * response */
 
 	/* groups can define private fields here or use the void *private */
 	union {
@@ -367,6 +370,8 @@ extern void fsnotify_clear_marks_by_group_flags(struct fsnotify_group *group, un
 extern void fsnotify_get_mark(struct fsnotify_mark *mark);
 extern void fsnotify_put_mark(struct fsnotify_mark *mark);
 extern void fsnotify_unmount_inodes(struct super_block *sb);
+extern void fsnotify_finish_user_wait(struct fsnotify_iter_info *iter_info);
+extern bool fsnotify_prepare_user_wait(struct fsnotify_iter_info *iter_info);
 
 /* put here because inotify does some weird stuff when destroying watches */
 extern void fsnotify_init_event(struct fsnotify_event *event,
-- 
2.19.1


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* [PATCH 4.4 4/5] fsnotify: Pass fsnotify_iter_info into handle_event handler
  2019-04-10  4:54 [PATCH 4.4 0/5] fanotify: Fix notification subsystem hang Matthew Ruffell
                   ` (2 preceding siblings ...)
  2019-04-10  4:54 ` [PATCH 4.4 3/5] fsnotify: Provide framework for dropping SRCU lock in ->handle_event Matthew Ruffell
@ 2019-04-10  4:54 ` Matthew Ruffell
  2019-04-10  4:54 ` [PATCH 4.4 5/5] fanotify: Release SRCU lock when waiting for userspace response Matthew Ruffell
  2019-04-10 16:44 ` [PATCH 4.4 0/5] fanotify: Fix notification subsystem hang Sasha Levin
  5 siblings, 0 replies; 8+ messages in thread
From: Matthew Ruffell @ 2019-04-10  4:54 UTC (permalink / raw)
  To: stable

From: Jan Kara <jack@suse.cz>

commit 9385a84d7e1f658bb2d96ab798393e4b16268aaa upstream.

Pass fsnotify_iter_info into ->handle_event() handler so that it can
release and reacquire SRCU lock via fsnotify_prepare_user_wait() and
fsnotify_finish_user_wait() functions.  These functions also make sure
current marks are appropriately pinned so that iteration protected by
srcu in fsnotify() stays safe.

Reviewed-by: Miklos Szeredi <mszeredi@redhat.com>
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>
[mruffell: backport: removing const keyword and minor realignment]
Signed-off-by: Matthew Ruffell <matthew.ruffell@canonical.com>
---
 fs/notify/dnotify/dnotify.c          |  3 ++-
 fs/notify/fanotify/fanotify.c        |  3 ++-
 fs/notify/fsnotify.c                 | 19 +++++++++++++------
 fs/notify/inotify/inotify.h          |  3 ++-
 fs/notify/inotify/inotify_fsnotify.c |  3 ++-
 fs/notify/inotify/inotify_user.c     |  2 +-
 include/linux/fsnotify_backend.h     |  3 ++-
 kernel/audit_fsnotify.c              |  3 ++-
 kernel/audit_tree.c                  |  3 ++-
 kernel/audit_watch.c                 |  3 ++-
 10 files changed, 30 insertions(+), 15 deletions(-)

diff --git a/fs/notify/dnotify/dnotify.c b/fs/notify/dnotify/dnotify.c
index 6faaf710e563..264bfd99a694 100644
--- a/fs/notify/dnotify/dnotify.c
+++ b/fs/notify/dnotify/dnotify.c
@@ -86,7 +86,8 @@ static int dnotify_handle_event(struct fsnotify_group *group,
 				struct fsnotify_mark *inode_mark,
 				struct fsnotify_mark *vfsmount_mark,
 				u32 mask, void *data, int data_type,
-				const unsigned char *file_name, u32 cookie)
+				const unsigned char *file_name, u32 cookie,
+				struct fsnotify_iter_info *iter_info)
 {
 	struct dnotify_mark *dn_mark;
 	struct dnotify_struct *dn;
diff --git a/fs/notify/fanotify/fanotify.c b/fs/notify/fanotify/fanotify.c
index 8a459b179183..4944956cdbd9 100644
--- a/fs/notify/fanotify/fanotify.c
+++ b/fs/notify/fanotify/fanotify.c
@@ -174,7 +174,8 @@ static int fanotify_handle_event(struct fsnotify_group *group,
 				 struct fsnotify_mark *inode_mark,
 				 struct fsnotify_mark *fanotify_mark,
 				 u32 mask, void *data, int data_type,
-				 const unsigned char *file_name, u32 cookie)
+				 const unsigned char *file_name, u32 cookie,
+				 struct fsnotify_iter_info *iter_info)
 {
 	int ret = 0;
 	struct fanotify_event_info *event;
diff --git a/fs/notify/fsnotify.c b/fs/notify/fsnotify.c
index a64adc2fced9..19c75b446314 100644
--- a/fs/notify/fsnotify.c
+++ b/fs/notify/fsnotify.c
@@ -131,7 +131,8 @@ static int send_to_group(struct inode *to_tell,
 			 struct fsnotify_mark *vfsmount_mark,
 			 __u32 mask, void *data,
 			 int data_is, u32 cookie,
-			 const unsigned char *file_name)
+			 const unsigned char *file_name,
+			 struct fsnotify_iter_info *iter_info)
 {
 	struct fsnotify_group *group = NULL;
 	__u32 inode_test_mask = 0;
@@ -182,7 +183,7 @@ static int send_to_group(struct inode *to_tell,
 
 	return group->ops->handle_event(group, to_tell, inode_mark,
 					vfsmount_mark, mask, data, data_is,
-					file_name, cookie);
+					file_name, cookie, iter_info);
 }
 
 /*
@@ -197,8 +198,9 @@ int fsnotify(struct inode *to_tell, __u32 mask, void *data, int data_is,
 	struct hlist_node *inode_node = NULL, *vfsmount_node = NULL;
 	struct fsnotify_mark *inode_mark = NULL, *vfsmount_mark = NULL;
 	struct fsnotify_group *inode_group, *vfsmount_group;
+	struct fsnotify_iter_info iter_info;
 	struct mount *mnt;
-	int idx, ret = 0;
+	int ret = 0;
 	/* global tests shouldn't care about events on child only the specific event */
 	__u32 test_mask = (mask & ~FS_EVENT_ON_CHILD);
 
@@ -227,7 +229,7 @@ int fsnotify(struct inode *to_tell, __u32 mask, void *data, int data_is,
 	    !(mnt && test_mask & mnt->mnt_fsnotify_mask))
 		return 0;
 
-	idx = srcu_read_lock(&fsnotify_mark_srcu);
+	iter_info.srcu_idx = srcu_read_lock(&fsnotify_mark_srcu);
 
 	if ((mask & FS_MODIFY) ||
 	    (test_mask & to_tell->i_fsnotify_mask))
@@ -276,8 +278,13 @@ int fsnotify(struct inode *to_tell, __u32 mask, void *data, int data_is,
 				vfsmount_mark = NULL;
 			}
 		}
+
+		iter_info.inode_mark = inode_mark;
+		iter_info.vfsmount_mark = vfsmount_mark;
+
 		ret = send_to_group(to_tell, inode_mark, vfsmount_mark, mask,
-				    data, data_is, cookie, file_name);
+				    data, data_is, cookie, file_name,
+				    &iter_info);
 
 		if (ret && (mask & ALL_FSNOTIFY_PERM_EVENTS))
 			goto out;
@@ -291,7 +298,7 @@ int fsnotify(struct inode *to_tell, __u32 mask, void *data, int data_is,
 	}
 	ret = 0;
 out:
-	srcu_read_unlock(&fsnotify_mark_srcu, idx);
+	srcu_read_unlock(&fsnotify_mark_srcu, iter_info.srcu_idx);
 
 	return ret;
 }
diff --git a/fs/notify/inotify/inotify.h b/fs/notify/inotify/inotify.h
index ed855ef6f077..726b06b303b8 100644
--- a/fs/notify/inotify/inotify.h
+++ b/fs/notify/inotify/inotify.h
@@ -27,6 +27,7 @@ extern int inotify_handle_event(struct fsnotify_group *group,
 				struct fsnotify_mark *inode_mark,
 				struct fsnotify_mark *vfsmount_mark,
 				u32 mask, void *data, int data_type,
-				const unsigned char *file_name, u32 cookie);
+				const unsigned char *file_name, u32 cookie,
+				struct fsnotify_iter_info *iter_info);
 
 extern const struct fsnotify_ops inotify_fsnotify_ops;
diff --git a/fs/notify/inotify/inotify_fsnotify.c b/fs/notify/inotify/inotify_fsnotify.c
index 2cd900c2c737..79a5f06b9100 100644
--- a/fs/notify/inotify/inotify_fsnotify.c
+++ b/fs/notify/inotify/inotify_fsnotify.c
@@ -67,7 +67,8 @@ int inotify_handle_event(struct fsnotify_group *group,
 			 struct fsnotify_mark *inode_mark,
 			 struct fsnotify_mark *vfsmount_mark,
 			 u32 mask, void *data, int data_type,
-			 const unsigned char *file_name, u32 cookie)
+			 const unsigned char *file_name, u32 cookie,
+			 struct fsnotify_iter_info *iter_info)
 {
 	struct inotify_inode_mark *i_mark;
 	struct inotify_event_info *event;
diff --git a/fs/notify/inotify/inotify_user.c b/fs/notify/inotify/inotify_user.c
index b8d08d0d0a4d..6cea8b2131a3 100644
--- a/fs/notify/inotify/inotify_user.c
+++ b/fs/notify/inotify/inotify_user.c
@@ -494,7 +494,7 @@ void inotify_ignored_and_remove_idr(struct fsnotify_mark *fsn_mark,
 
 	/* Queue ignore event for the watch */
 	inotify_handle_event(group, NULL, fsn_mark, NULL, FS_IN_IGNORED,
-			     NULL, FSNOTIFY_EVENT_NONE, NULL, 0);
+			     NULL, FSNOTIFY_EVENT_NONE, NULL, 0, NULL);
 
 	i_mark = container_of(fsn_mark, struct inotify_inode_mark, fsn_mark);
 	/* remove this mark from the idr */
diff --git a/include/linux/fsnotify_backend.h b/include/linux/fsnotify_backend.h
index c7c5ea590d54..ddc13584cbe2 100644
--- a/include/linux/fsnotify_backend.h
+++ b/include/linux/fsnotify_backend.h
@@ -98,7 +98,8 @@ struct fsnotify_ops {
 			    struct fsnotify_mark *inode_mark,
 			    struct fsnotify_mark *vfsmount_mark,
 			    u32 mask, void *data, int data_type,
-			    const unsigned char *file_name, u32 cookie);
+			    const unsigned char *file_name, u32 cookie,
+			    struct fsnotify_iter_info *iter_info);
 	void (*free_group_priv)(struct fsnotify_group *group);
 	void (*freeing_mark)(struct fsnotify_mark *mark, struct fsnotify_group *group);
 	void (*free_event)(struct fsnotify_event *event);
diff --git a/kernel/audit_fsnotify.c b/kernel/audit_fsnotify.c
index 27c6046c2c3d..94aa9995f41a 100644
--- a/kernel/audit_fsnotify.c
+++ b/kernel/audit_fsnotify.c
@@ -169,7 +169,8 @@ static int audit_mark_handle_event(struct fsnotify_group *group,
 				    struct fsnotify_mark *inode_mark,
 				    struct fsnotify_mark *vfsmount_mark,
 				    u32 mask, void *data, int data_type,
-				    const unsigned char *dname, u32 cookie)
+				    const unsigned char *dname, u32 cookie,
+				    struct fsnotify_iter_info *iter_info)
 {
 	struct audit_fsnotify_mark *audit_mark;
 	struct inode *inode = NULL;
diff --git a/kernel/audit_tree.c b/kernel/audit_tree.c
index 5efe9b299a12..9443b7fd6d90 100644
--- a/kernel/audit_tree.c
+++ b/kernel/audit_tree.c
@@ -951,7 +951,8 @@ static int audit_tree_handle_event(struct fsnotify_group *group,
 				   struct fsnotify_mark *inode_mark,
 				   struct fsnotify_mark *vfsmount_mark,
 				   u32 mask, void *data, int data_type,
-				   const unsigned char *file_name, u32 cookie)
+				   const unsigned char *file_name, u32 cookie,
+				   struct fsnotify_iter_info *iter_info)
 {
 	return 0;
 }
diff --git a/kernel/audit_watch.c b/kernel/audit_watch.c
index f45a9a5d3e47..40fb562ca404 100644
--- a/kernel/audit_watch.c
+++ b/kernel/audit_watch.c
@@ -485,7 +485,8 @@ static int audit_watch_handle_event(struct fsnotify_group *group,
 				    struct fsnotify_mark *inode_mark,
 				    struct fsnotify_mark *vfsmount_mark,
 				    u32 mask, void *data, int data_type,
-				    const unsigned char *dname, u32 cookie)
+				    const unsigned char *dname, u32 cookie,
+				    struct fsnotify_iter_info *iter_info)
 {
 	struct inode *inode;
 	struct audit_parent *parent;
-- 
2.19.1


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* [PATCH 4.4 5/5] fanotify: Release SRCU lock when waiting for userspace response
  2019-04-10  4:54 [PATCH 4.4 0/5] fanotify: Fix notification subsystem hang Matthew Ruffell
                   ` (3 preceding siblings ...)
  2019-04-10  4:54 ` [PATCH 4.4 4/5] fsnotify: Pass fsnotify_iter_info into handle_event handler Matthew Ruffell
@ 2019-04-10  4:54 ` Matthew Ruffell
  2019-04-10 16:44 ` [PATCH 4.4 0/5] fanotify: Fix notification subsystem hang Sasha Levin
  5 siblings, 0 replies; 8+ messages in thread
From: Matthew Ruffell @ 2019-04-10  4:54 UTC (permalink / raw)
  To: stable

From: Jan Kara <jack@suse.cz>

commit 05f0e38724e8449184acd8fbf0473ee5a07adc6c upstream.

When userspace task processing fanotify permission events screws up and
does not respond, fsnotify_mark_srcu SRCU is held indefinitely which
causes further hangs in the whole notification subsystem. Although we
cannot easily solve the problem of operations blocked waiting for
response from userspace, we can at least somewhat localize the damage by
dropping SRCU lock before waiting for userspace response and reacquiring
it when userspace responds.

Reviewed-by: Miklos Szeredi <mszeredi@redhat.com>
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>
[mruffell: cherry picked]
Signed-off-by: Matthew Ruffell <matthew.ruffell@canonical.com>
---
 fs/notify/fanotify/fanotify.c | 17 +++++++++++++++--
 1 file changed, 15 insertions(+), 2 deletions(-)

diff --git a/fs/notify/fanotify/fanotify.c b/fs/notify/fanotify/fanotify.c
index 4944956cdbd9..eeb5cc1f6978 100644
--- a/fs/notify/fanotify/fanotify.c
+++ b/fs/notify/fanotify/fanotify.c
@@ -61,14 +61,26 @@ static int fanotify_merge(struct list_head *list, struct fsnotify_event *event)
 
 #ifdef CONFIG_FANOTIFY_ACCESS_PERMISSIONS
 static int fanotify_get_response(struct fsnotify_group *group,
-				 struct fanotify_perm_event_info *event)
+				 struct fanotify_perm_event_info *event,
+				 struct fsnotify_iter_info *iter_info)
 {
 	int ret;
 
 	pr_debug("%s: group=%p event=%p\n", __func__, group, event);
 
+	/*
+	 * fsnotify_prepare_user_wait() fails if we race with mark deletion.
+	 * Just let the operation pass in that case.
+	 */
+	if (!fsnotify_prepare_user_wait(iter_info)) {
+		event->response = FAN_ALLOW;
+		goto out;
+	}
+
 	wait_event(group->fanotify_data.access_waitq, event->response);
 
+	fsnotify_finish_user_wait(iter_info);
+out:
 	/* userspace responded, convert to something usable */
 	switch (event->response) {
 	case FAN_ALLOW:
@@ -216,7 +228,8 @@ static int fanotify_handle_event(struct fsnotify_group *group,
 
 #ifdef CONFIG_FANOTIFY_ACCESS_PERMISSIONS
 	if (mask & FAN_ALL_PERM_EVENTS) {
-		ret = fanotify_get_response(group, FANOTIFY_PE(fsn_event));
+		ret = fanotify_get_response(group, FANOTIFY_PE(fsn_event),
+					    iter_info);
 		fsnotify_destroy_event(group, fsn_event);
 	}
 #endif
-- 
2.19.1


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [PATCH 4.4 0/5] fanotify: Fix notification subsystem hang
  2019-04-10  4:54 [PATCH 4.4 0/5] fanotify: Fix notification subsystem hang Matthew Ruffell
                   ` (4 preceding siblings ...)
  2019-04-10  4:54 ` [PATCH 4.4 5/5] fanotify: Release SRCU lock when waiting for userspace response Matthew Ruffell
@ 2019-04-10 16:44 ` Sasha Levin
  2019-04-11  3:54   ` Matthew Ruffell
  5 siblings, 1 reply; 8+ messages in thread
From: Sasha Levin @ 2019-04-10 16:44 UTC (permalink / raw)
  To: Matthew Ruffell; +Cc: stable

On Wed, Apr 10, 2019 at 04:54:51PM +1200, Matthew Ruffell wrote:
>BugLink: https://bugs.launchpad.net/bugs/1775165
>
>[Note to upstream]
>I understand that this patch is a little long for -stable, but this patch series
>fixes a real issue, seen by real users, is testable, and is made up from
>upstream commits. Please consider it.
>
>[Impact]
>
>When userspace tasks which are processing fanotify permission events act
>incorrectly, the fsnotify_mark_srcu SRCU is held indefinitely which causes
>the whole notification subsystem to hang.
>
>This has been seen in production, and it can also be seen when running the
>Linux Test Project testsuite, specifically fanotify07.
>
>[Fix]
>
>Instead of holding the SRCU lock while waiting for userspace to respond,
>which may never happen, or not in the order we are expecting, we drop the
>fsnotify_mark_srcu SRCU lock before waiting for userspace response, and then
>reacquire the lock again when userspace responds.
>
>The fixes are from a series of upstream commits:
>
>05f0e38724e8449184acd8fbf0473ee5a07adc6c (cherry-pick)
>9385a84d7e1f658bb2d96ab798393e4b16268aaa (backport)
>abc77577a669f424c5d0c185b9994f2621c52aa4 (backport)
>
>The following are upstream commits necessary for the fixes to function:
>
>35e481761cdc688dbee0ef552a13f49af8eba6cc (backport)
>0918f1c309b86301605650c836ddd2021d311ae2 (cherry-pick)

This would also make sense for 4.9, right? I don't want to fix 4.4
without fixing 4.9 as well.

--
Thanks,
Sasha

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH 4.4 0/5] fanotify: Fix notification subsystem hang
  2019-04-10 16:44 ` [PATCH 4.4 0/5] fanotify: Fix notification subsystem hang Sasha Levin
@ 2019-04-11  3:54   ` Matthew Ruffell
  0 siblings, 0 replies; 8+ messages in thread
From: Matthew Ruffell @ 2019-04-11  3:54 UTC (permalink / raw)
  To: Sasha Levin; +Cc: stable

Apologies for the previous HTML email, my bad.

Yes, it does make sense for inclusion in 4.9.

I went and tested 4.9, saw the same hang, and backported the relevant patches. 4.9 requires only 3 of the 5 needed for 4.4.

I sent the patchset to the list, you should see them soon.

Let me know if there's anything else you need.

Thanks,
Matthew

On 11/04/19 4:44 AM, Sasha Levin wrote:
> On Wed, Apr 10, 2019 at 04:54:51PM +1200, Matthew Ruffell wrote:
>> BugLink: https://bugs.launchpad.net/bugs/1775165
>>
>> [Note to upstream]
>> I understand that this patch is a little long for -stable, but this patch series
>> fixes a real issue, seen by real users, is testable, and is made up from
>> upstream commits. Please consider it.
>>
>> [Impact]
>>
>> When userspace tasks which are processing fanotify permission events act
>> incorrectly, the fsnotify_mark_srcu SRCU is held indefinitely which causes
>> the whole notification subsystem to hang.
>>
>> This has been seen in production, and it can also be seen when running the
>> Linux Test Project testsuite, specifically fanotify07.
>>
>> [Fix]
>>
>> Instead of holding the SRCU lock while waiting for userspace to respond,
>> which may never happen, or not in the order we are expecting, we drop the
>> fsnotify_mark_srcu SRCU lock before waiting for userspace response, and then
>> reacquire the lock again when userspace responds.
>>
>> The fixes are from a series of upstream commits:
>>
>> 05f0e38724e8449184acd8fbf0473ee5a07adc6c (cherry-pick)
>> 9385a84d7e1f658bb2d96ab798393e4b16268aaa (backport)
>> abc77577a669f424c5d0c185b9994f2621c52aa4 (backport)
>>
>> The following are upstream commits necessary for the fixes to function:
>>
>> 35e481761cdc688dbee0ef552a13f49af8eba6cc (backport)
>> 0918f1c309b86301605650c836ddd2021d311ae2 (cherry-pick)
>
> This would also make sense for 4.9, right? I don't want to fix 4.4
> without fixing 4.9 as well.
>
> -- 
> Thanks,
> Sasha

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2019-04-11  3:54 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-04-10  4:54 [PATCH 4.4 0/5] fanotify: Fix notification subsystem hang Matthew Ruffell
2019-04-10  4:54 ` [PATCH 4.4 1/5] fsnotify: turn fsnotify reaper thread into a workqueue job Matthew Ruffell
2019-04-10  4:54 ` [PATCH 4.4 2/5] fsnotify: avoid spurious EMFILE errors from inotify_init() Matthew Ruffell
2019-04-10  4:54 ` [PATCH 4.4 3/5] fsnotify: Provide framework for dropping SRCU lock in ->handle_event Matthew Ruffell
2019-04-10  4:54 ` [PATCH 4.4 4/5] fsnotify: Pass fsnotify_iter_info into handle_event handler Matthew Ruffell
2019-04-10  4:54 ` [PATCH 4.4 5/5] fanotify: Release SRCU lock when waiting for userspace response Matthew Ruffell
2019-04-10 16:44 ` [PATCH 4.4 0/5] fanotify: Fix notification subsystem hang Sasha Levin
2019-04-11  3:54   ` Matthew Ruffell

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.