linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v2 0/6] fanotify: Make wait for permission event response interruptible
@ 2019-02-13 14:54 Jan Kara
  2019-02-13 14:54 ` [PATCH 1/6] fanotify: Fold dequeue_event() into process_access_response() Jan Kara
                   ` (6 more replies)
  0 siblings, 7 replies; 20+ messages in thread
From: Jan Kara @ 2019-02-13 14:54 UTC (permalink / raw)
  To: linux-fsdevel
  Cc: Amir Goldstein, Konstantin Khlebnikov, Vivek Trivedi,
	Orion Poplawski, Jan Kara

Hello,

When waiting for response to fanotify permission events, we currently use
uninterruptible waits. That makes code simple however it can cause lots of
processes to end up in uninterruptible sleep with hard reboot being the only
alternative in case fanotify listener process stops responding (e.g. due to a
bug in its implementation) - reported e.g. in [1]. Uninterruptible sleep also
makes system hibernation fail if the listener gets frozen before the process
generating fanotify permission event (as reported e.g. here [2]).

This patch set modifies fanotify so that it will use interruptible wait when
waiting for fanotify permission event response. Patches are based on current
Linus' tree for the ease of testing (I plan to rebase them on top of Amir's
pending changes later). I have also create LTP test which stresses handling of
permission events while sending processes signals to test the new code [3]
Review, comments, and testing are welcome.

[1] https://lore.kernel.org/lkml/153474898224.6806.12518115530793064797.stgit@buzz/
[2] https://lore.kernel.org/lkml/c1bb16b7-9eee-9cea-2c96-a512d8b3b9c7@nwra.com/
[3] https://lwn.net/ml/linux-fsdevel/20190108165307.GA11259@quack2.suse.cz/

Changes since v1:
* leave pr_debug() calls alone (Amir)
* simplify permission event state tracking (Amir)
* split some changes into separate patches (Amir)

								Honza

^ permalink raw reply	[flat|nested] 20+ messages in thread

* [PATCH 1/6] fanotify: Fold dequeue_event() into process_access_response()
  2019-02-13 14:54 [PATCH v2 0/6] fanotify: Make wait for permission event response interruptible Jan Kara
@ 2019-02-13 14:54 ` Jan Kara
  2019-02-13 19:42   ` Amir Goldstein
  2019-02-13 14:54 ` [PATCH 2/6] fanotify: Move locking inside get_one_event() Jan Kara
                   ` (5 subsequent siblings)
  6 siblings, 1 reply; 20+ messages in thread
From: Jan Kara @ 2019-02-13 14:54 UTC (permalink / raw)
  To: linux-fsdevel
  Cc: Amir Goldstein, Konstantin Khlebnikov, Vivek Trivedi,
	Orion Poplawski, Jan Kara

Fold dequeue_event() into process_access_response(). This will make
changes to use of ->response field easier.

Signed-off-by: Jan Kara <jack@suse.cz>
---
 fs/notify/fanotify/fanotify_user.c | 41 ++++++++++++--------------------------
 1 file changed, 13 insertions(+), 28 deletions(-)

diff --git a/fs/notify/fanotify/fanotify_user.c b/fs/notify/fanotify/fanotify_user.c
index 9c870b0d2b56..908ebc421d15 100644
--- a/fs/notify/fanotify/fanotify_user.c
+++ b/fs/notify/fanotify/fanotify_user.c
@@ -144,28 +144,6 @@ static int fill_event_metadata(struct fsnotify_group *group,
 	return ret;
 }
 
-static struct fanotify_perm_event_info *dequeue_event(
-				struct fsnotify_group *group, int fd)
-{
-	struct fanotify_perm_event_info *event, *return_e = NULL;
-
-	spin_lock(&group->notification_lock);
-	list_for_each_entry(event, &group->fanotify_data.access_list,
-			    fae.fse.list) {
-		if (event->fd != fd)
-			continue;
-
-		list_del_init(&event->fae.fse.list);
-		return_e = event;
-		break;
-	}
-	spin_unlock(&group->notification_lock);
-
-	pr_debug("%s: found return_re=%p\n", __func__, return_e);
-
-	return return_e;
-}
-
 static int process_access_response(struct fsnotify_group *group,
 				   struct fanotify_response *response_struct)
 {
@@ -194,14 +172,21 @@ static int process_access_response(struct fsnotify_group *group,
 	if ((response & FAN_AUDIT) && !FAN_GROUP_FLAG(group, FAN_ENABLE_AUDIT))
 		return -EINVAL;
 
-	event = dequeue_event(group, fd);
-	if (!event)
-		return -ENOENT;
+	spin_lock(&group->notification_lock);
+	list_for_each_entry(event, &group->fanotify_data.access_list,
+			    fae.fse.list) {
+		if (event->fd != fd)
+			continue;
 
-	event->response = response;
-	wake_up(&group->fanotify_data.access_waitq);
+		list_del_init(&event->fae.fse.list);
+		event->response = response;
+		spin_unlock(&group->notification_lock);
+		wake_up(&group->fanotify_data.access_waitq);
+		return 0;
+	}
+	spin_unlock(&group->notification_lock);
 
-	return 0;
+	return -ENOENT;
 }
 
 static ssize_t copy_event_to_user(struct fsnotify_group *group,
-- 
2.16.4


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH 2/6] fanotify: Move locking inside get_one_event()
  2019-02-13 14:54 [PATCH v2 0/6] fanotify: Make wait for permission event response interruptible Jan Kara
  2019-02-13 14:54 ` [PATCH 1/6] fanotify: Fold dequeue_event() into process_access_response() Jan Kara
@ 2019-02-13 14:54 ` Jan Kara
  2019-02-13 20:23   ` Amir Goldstein
  2019-02-13 14:54 ` [PATCH 3/6] fsnotify: Create function to remove event from notification list Jan Kara
                   ` (4 subsequent siblings)
  6 siblings, 1 reply; 20+ messages in thread
From: Jan Kara @ 2019-02-13 14:54 UTC (permalink / raw)
  To: linux-fsdevel
  Cc: Amir Goldstein, Konstantin Khlebnikov, Vivek Trivedi,
	Orion Poplawski, Jan Kara

get_one_event() has a single caller and that just locks
notification_lock around the call. Move locking inside get_one_event()
as that will make using ->response field for permission event state
easier.

Signed-off-by: Jan Kara <jack@suse.cz>
---
 fs/notify/fanotify/fanotify_user.c | 25 +++++++++++--------------
 1 file changed, 11 insertions(+), 14 deletions(-)

diff --git a/fs/notify/fanotify/fanotify_user.c b/fs/notify/fanotify/fanotify_user.c
index 908ebc421d15..4e36f5642797 100644
--- a/fs/notify/fanotify/fanotify_user.c
+++ b/fs/notify/fanotify/fanotify_user.c
@@ -51,25 +51,25 @@ struct kmem_cache *fanotify_perm_event_cachep __read_mostly;
  * Get an fsnotify notification event if one exists and is small
  * enough to fit in "count". Return an error pointer if the count
  * is not large enough.
- *
- * Called with the group->notification_lock held.
  */
 static struct fsnotify_event *get_one_event(struct fsnotify_group *group,
 					    size_t count)
 {
-	assert_spin_locked(&group->notification_lock);
+	struct fsnotify_event *event = NULL;
 
 	pr_debug("%s: group=%p count=%zd\n", __func__, group, count);
-
+	spin_lock(&group->notification_lock);
 	if (fsnotify_notify_queue_is_empty(group))
-		return NULL;
+		goto out;
 
-	if (FAN_EVENT_METADATA_LEN > count)
-		return ERR_PTR(-EINVAL);
-
-	/* held the notification_lock the whole time, so this is the
-	 * same event we peeked above */
-	return fsnotify_remove_first_event(group);
+	if (FAN_EVENT_METADATA_LEN > count) {
+		event = ERR_PTR(-EINVAL);
+		goto out;
+	}
+	event = fsnotify_remove_first_event(group);
+out:
+	spin_unlock(&group->notification_lock);
+	return event;
 }
 
 static int create_fd(struct fsnotify_group *group,
@@ -261,10 +261,7 @@ static ssize_t fanotify_read(struct file *file, char __user *buf,
 
 	add_wait_queue(&group->notification_waitq, &wait);
 	while (1) {
-		spin_lock(&group->notification_lock);
 		kevent = get_one_event(group, count);
-		spin_unlock(&group->notification_lock);
-
 		if (IS_ERR(kevent)) {
 			ret = PTR_ERR(kevent);
 			break;
-- 
2.16.4


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH 3/6] fsnotify: Create function to remove event from notification list
  2019-02-13 14:54 [PATCH v2 0/6] fanotify: Make wait for permission event response interruptible Jan Kara
  2019-02-13 14:54 ` [PATCH 1/6] fanotify: Fold dequeue_event() into process_access_response() Jan Kara
  2019-02-13 14:54 ` [PATCH 2/6] fanotify: Move locking inside get_one_event() Jan Kara
@ 2019-02-13 14:54 ` Jan Kara
  2019-02-13 20:23   ` Amir Goldstein
  2019-02-13 14:54 ` [PATCH 4/6] fanotify: Simplify cleaning of access_list Jan Kara
                   ` (3 subsequent siblings)
  6 siblings, 1 reply; 20+ messages in thread
From: Jan Kara @ 2019-02-13 14:54 UTC (permalink / raw)
  To: linux-fsdevel
  Cc: Amir Goldstein, Konstantin Khlebnikov, Vivek Trivedi,
	Orion Poplawski, Jan Kara

Create function to remove event from the notification list. Later it will
be used from more places.

Signed-off-by: Jan Kara <jack@suse.cz>
---
 fs/notify/notification.c         | 20 +++++++++++++-------
 include/linux/fsnotify_backend.h |  3 +++
 2 files changed, 16 insertions(+), 7 deletions(-)

diff --git a/fs/notify/notification.c b/fs/notify/notification.c
index 3c3e36745f59..2195a5cf745a 100644
--- a/fs/notify/notification.c
+++ b/fs/notify/notification.c
@@ -141,6 +141,18 @@ int fsnotify_add_event(struct fsnotify_group *group,
 	return ret;
 }
 
+void fsnotify_remove_queued_event(struct fsnotify_group *group,
+				  struct fsnotify_event *event)
+{
+	assert_spin_locked(&group->notification_lock);
+	/*
+	 * We need to init list head for the case of overflow event so that
+	 * check in fsnotify_add_event() works
+	 */
+	list_del_init(&event->list);
+	group->q_len--;
+}
+
 /*
  * Remove and return the first event from the notification list.  It is the
  * responsibility of the caller to destroy the obtained event
@@ -155,13 +167,7 @@ struct fsnotify_event *fsnotify_remove_first_event(struct fsnotify_group *group)
 
 	event = list_first_entry(&group->notification_list,
 				 struct fsnotify_event, list);
-	/*
-	 * We need to init list head for the case of overflow event so that
-	 * check in fsnotify_add_event() works
-	 */
-	list_del_init(&event->list);
-	group->q_len--;
-
+	fsnotify_remove_queued_event(group, event);
 	return event;
 }
 
diff --git a/include/linux/fsnotify_backend.h b/include/linux/fsnotify_backend.h
index 7639774e7475..cddf839bac96 100644
--- a/include/linux/fsnotify_backend.h
+++ b/include/linux/fsnotify_backend.h
@@ -416,6 +416,9 @@ extern bool fsnotify_notify_queue_is_empty(struct fsnotify_group *group);
 extern struct fsnotify_event *fsnotify_peek_first_event(struct fsnotify_group *group);
 /* return AND dequeue the first event on the notification queue */
 extern struct fsnotify_event *fsnotify_remove_first_event(struct fsnotify_group *group);
+/* Remove event queued in the notification list */
+extern void fsnotify_remove_queued_event(struct fsnotify_group *group,
+					 struct fsnotify_event *event);
 
 /* functions used to manipulate the marks attached to inodes */
 
-- 
2.16.4


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH 4/6] fanotify: Simplify cleaning of access_list
  2019-02-13 14:54 [PATCH v2 0/6] fanotify: Make wait for permission event response interruptible Jan Kara
                   ` (2 preceding siblings ...)
  2019-02-13 14:54 ` [PATCH 3/6] fsnotify: Create function to remove event from notification list Jan Kara
@ 2019-02-13 14:54 ` Jan Kara
  2019-02-13 20:25   ` Amir Goldstein
  2019-02-13 14:54 ` [PATCH 5/6] fanotify: Track permission event state Jan Kara
                   ` (2 subsequent siblings)
  6 siblings, 1 reply; 20+ messages in thread
From: Jan Kara @ 2019-02-13 14:54 UTC (permalink / raw)
  To: linux-fsdevel
  Cc: Amir Goldstein, Konstantin Khlebnikov, Vivek Trivedi,
	Orion Poplawski, Jan Kara

Simplify iteration cleaning access_list in fanotify_release(). That will
make following changes more obvious.

Signed-off-by: Jan Kara <jack@suse.cz>
---
 fs/notify/fanotify/fanotify_user.c | 10 ++++------
 1 file changed, 4 insertions(+), 6 deletions(-)

diff --git a/fs/notify/fanotify/fanotify_user.c b/fs/notify/fanotify/fanotify_user.c
index 4e36f5642797..96dc469a4086 100644
--- a/fs/notify/fanotify/fanotify_user.c
+++ b/fs/notify/fanotify/fanotify_user.c
@@ -352,7 +352,7 @@ static ssize_t fanotify_write(struct file *file, const char __user *buf, size_t
 static int fanotify_release(struct inode *ignored, struct file *file)
 {
 	struct fsnotify_group *group = file->private_data;
-	struct fanotify_perm_event_info *event, *next;
+	struct fanotify_perm_event_info *event;
 	struct fsnotify_event *fsn_event;
 
 	/*
@@ -367,11 +367,9 @@ static int fanotify_release(struct inode *ignored, struct file *file)
 	 * and simulate reply from userspace.
 	 */
 	spin_lock(&group->notification_lock);
-	list_for_each_entry_safe(event, next, &group->fanotify_data.access_list,
-				 fae.fse.list) {
-		pr_debug("%s: found group=%p event=%p\n", __func__, group,
-			 event);
-
+	while (!list_empty(&group->fanotify_data.access_list)) {
+		event = list_first_entry(&group->fanotify_data.access_list,
+				struct fanotify_perm_event_info, fae.fse.list);
 		list_del_init(&event->fae.fse.list);
 		event->response = FAN_ALLOW;
 	}
-- 
2.16.4


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH 5/6] fanotify: Track permission event state
  2019-02-13 14:54 [PATCH v2 0/6] fanotify: Make wait for permission event response interruptible Jan Kara
                   ` (3 preceding siblings ...)
  2019-02-13 14:54 ` [PATCH 4/6] fanotify: Simplify cleaning of access_list Jan Kara
@ 2019-02-13 14:54 ` Jan Kara
  2019-02-13 20:33   ` Amir Goldstein
  2019-02-13 14:54 ` [PATCH 6/6] fanotify: Use interruptible wait when waiting for permission events Jan Kara
  2019-02-20 17:27 ` [PATCH v2 0/6] fanotify: Make wait for permission event response interruptible Orion Poplawski
  6 siblings, 1 reply; 20+ messages in thread
From: Jan Kara @ 2019-02-13 14:54 UTC (permalink / raw)
  To: linux-fsdevel
  Cc: Amir Goldstein, Konstantin Khlebnikov, Vivek Trivedi,
	Orion Poplawski, Jan Kara

Track whether permission event got already reported to userspace and
whether userspace already answered to the permission event. Protect
stores to this field together with updates to ->response field by
group->notification_lock. This will allow aborting wait for reply to
permission event from userspace.

Signed-off-by: Jan Kara <jack@suse.cz>
---
 fs/notify/fanotify/fanotify.c      |  6 +++---
 fs/notify/fanotify/fanotify.h      | 10 +++++++++-
 fs/notify/fanotify/fanotify_user.c | 35 ++++++++++++++++++++++++++++-------
 3 files changed, 40 insertions(+), 11 deletions(-)

diff --git a/fs/notify/fanotify/fanotify.c b/fs/notify/fanotify/fanotify.c
index 3723f3d18d20..e725831be161 100644
--- a/fs/notify/fanotify/fanotify.c
+++ b/fs/notify/fanotify/fanotify.c
@@ -65,7 +65,8 @@ static int fanotify_get_response(struct fsnotify_group *group,
 
 	pr_debug("%s: group=%p event=%p\n", __func__, group, event);
 
-	wait_event(group->fanotify_data.access_waitq, event->response);
+	wait_event(group->fanotify_data.access_waitq,
+		   event->state == FAN_EVENT_ANSWERED);
 
 	/* userspace responded, convert to something usable */
 	switch (event->response & ~FAN_AUDIT) {
@@ -81,8 +82,6 @@ static int fanotify_get_response(struct fsnotify_group *group,
 	if (event->response & FAN_AUDIT)
 		audit_fanotify(event->response & ~FAN_AUDIT);
 
-	event->response = 0;
-
 	pr_debug("%s: group=%p event=%p about to return ret=%d\n", __func__,
 		 group, event, ret);
 	
@@ -167,6 +166,7 @@ struct fanotify_event_info *fanotify_alloc_event(struct fsnotify_group *group,
 			goto out;
 		event = &pevent->fae;
 		pevent->response = 0;
+		pevent->state = FAN_EVENT_INIT;
 		goto init;
 	}
 	event = kmem_cache_alloc(fanotify_event_cachep, gfp);
diff --git a/fs/notify/fanotify/fanotify.h b/fs/notify/fanotify/fanotify.h
index ea05b8a401e7..98d58939777c 100644
--- a/fs/notify/fanotify/fanotify.h
+++ b/fs/notify/fanotify/fanotify.h
@@ -22,6 +22,13 @@ struct fanotify_event_info {
 	struct pid *pid;
 };
 
+/* Possible states of the permission event */
+enum {
+	FAN_EVENT_INIT,
+	FAN_EVENT_REPORTED,
+	FAN_EVENT_ANSWERED
+};
+
 /*
  * Structure for permission fanotify events. It gets allocated and freed in
  * fanotify_handle_event() since we wait there for user response. When the
@@ -31,7 +38,8 @@ struct fanotify_event_info {
  */
 struct fanotify_perm_event_info {
 	struct fanotify_event_info fae;
-	int response;	/* userspace answer to question */
+	unsigned short response;	/* userspace answer to the event */
+	unsigned short state;		/* state of the event */
 	int fd;		/* fd we passed to userspace for this event */
 };
 
diff --git a/fs/notify/fanotify/fanotify_user.c b/fs/notify/fanotify/fanotify_user.c
index 96dc469a4086..ef6bea0ae751 100644
--- a/fs/notify/fanotify/fanotify_user.c
+++ b/fs/notify/fanotify/fanotify_user.c
@@ -50,7 +50,8 @@ struct kmem_cache *fanotify_perm_event_cachep __read_mostly;
 /*
  * Get an fsnotify notification event if one exists and is small
  * enough to fit in "count". Return an error pointer if the count
- * is not large enough.
+ * is not large enough. When permission event is dequeued, its state is
+ * updated accordingly.
  */
 static struct fsnotify_event *get_one_event(struct fsnotify_group *group,
 					    size_t count)
@@ -67,6 +68,8 @@ static struct fsnotify_event *get_one_event(struct fsnotify_group *group,
 		goto out;
 	}
 	event = fsnotify_remove_first_event(group);
+	if (fanotify_is_perm_event(event->mask))
+		FANOTIFY_PE(event)->state = FAN_EVENT_REPORTED;
 out:
 	spin_unlock(&group->notification_lock);
 	return event;
@@ -144,6 +147,21 @@ static int fill_event_metadata(struct fsnotify_group *group,
 	return ret;
 }
 
+/*
+ * Finish processing of permission event by setting it to ANSWERED state and
+ * drop group->notification_lock.
+ */
+static void finish_permission_event(struct fsnotify_group *group,
+				    struct fanotify_perm_event_info *event,
+				    unsigned int response)
+				    __releases(&group->notification_lock)
+{
+	assert_spin_locked(&group->notification_lock);
+	event->response = response;
+	event->state = FAN_EVENT_ANSWERED;
+	spin_unlock(&group->notification_lock);
+}
+
 static int process_access_response(struct fsnotify_group *group,
 				   struct fanotify_response *response_struct)
 {
@@ -179,8 +197,7 @@ static int process_access_response(struct fsnotify_group *group,
 			continue;
 
 		list_del_init(&event->fae.fse.list);
-		event->response = response;
-		spin_unlock(&group->notification_lock);
+		finish_permission_event(group, event, response);
 		wake_up(&group->fanotify_data.access_waitq);
 		return 0;
 	}
@@ -302,7 +319,9 @@ static ssize_t fanotify_read(struct file *file, char __user *buf,
 			fsnotify_destroy_event(group, kevent);
 		} else {
 			if (ret <= 0) {
-				FANOTIFY_PE(kevent)->response = FAN_DENY;
+				spin_lock(&group->notification_lock);
+				finish_permission_event(group,
+					FANOTIFY_PE(kevent), FAN_DENY);
 				wake_up(&group->fanotify_data.access_waitq);
 			} else {
 				spin_lock(&group->notification_lock);
@@ -371,7 +390,8 @@ static int fanotify_release(struct inode *ignored, struct file *file)
 		event = list_first_entry(&group->fanotify_data.access_list,
 				struct fanotify_perm_event_info, fae.fse.list);
 		list_del_init(&event->fae.fse.list);
-		event->response = FAN_ALLOW;
+		finish_permission_event(group, event, FAN_ALLOW);
+		spin_lock(&group->notification_lock);
 	}
 
 	/*
@@ -384,10 +404,11 @@ static int fanotify_release(struct inode *ignored, struct file *file)
 		if (!(fsn_event->mask & FANOTIFY_PERM_EVENTS)) {
 			spin_unlock(&group->notification_lock);
 			fsnotify_destroy_event(group, fsn_event);
-			spin_lock(&group->notification_lock);
 		} else {
-			FANOTIFY_PE(fsn_event)->response = FAN_ALLOW;
+			finish_permission_event(group, FANOTIFY_PE(fsn_event),
+						FAN_ALLOW);
 		}
+		spin_lock(&group->notification_lock);
 	}
 	spin_unlock(&group->notification_lock);
 
-- 
2.16.4


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH 6/6] fanotify: Use interruptible wait when waiting for permission events
  2019-02-13 14:54 [PATCH v2 0/6] fanotify: Make wait for permission event response interruptible Jan Kara
                   ` (4 preceding siblings ...)
  2019-02-13 14:54 ` [PATCH 5/6] fanotify: Track permission event state Jan Kara
@ 2019-02-13 14:54 ` Jan Kara
  2019-02-13 21:02   ` Amir Goldstein
  2019-02-20 17:27 ` [PATCH v2 0/6] fanotify: Make wait for permission event response interruptible Orion Poplawski
  6 siblings, 1 reply; 20+ messages in thread
From: Jan Kara @ 2019-02-13 14:54 UTC (permalink / raw)
  To: linux-fsdevel
  Cc: Amir Goldstein, Konstantin Khlebnikov, Vivek Trivedi,
	Orion Poplawski, Jan Kara

When waiting for response to fanotify permission events, we currently
use uninterruptible waits. That makes code simple however it can cause
lots of processes to end up in uninterruptible sleep with hard reboot
being the only alternative in case fanotify listener process stops
responding (e.g. due to a bug in its implementation). Uninterruptible
sleep also makes system hibernation fail if the listener gets frozen
before the process generating fanotify permission event.

Fix these problems by using interruptible sleep for waiting for response
to fanotify event. This is slightly tricky though - we have to
detect when the event got already reported to userspace as in that
case we must not free the event. Instead we push the responsibility for
freeing the event to the process that will write response to the
event.

Reported-by: Orion Poplawski <orion@nwra.com>
Reported-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Signed-off-by: Jan Kara <jack@suse.cz>
---
 fs/notify/fanotify/fanotify.c | 35 ++++++++++++++++++++++++++++++++---
 fs/notify/fanotify/fanotify.h |  3 ++-
 2 files changed, 34 insertions(+), 4 deletions(-)

diff --git a/fs/notify/fanotify/fanotify.c b/fs/notify/fanotify/fanotify.c
index e725831be161..2e40d5d8660b 100644
--- a/fs/notify/fanotify/fanotify.c
+++ b/fs/notify/fanotify/fanotify.c
@@ -57,6 +57,13 @@ static int fanotify_merge(struct list_head *list, struct fsnotify_event *event)
 	return 0;
 }
 
+/*
+ * Wait for response to permission event. The function also takes care of
+ * freeing the permission event (or offloads that in case the wait is canceled
+ * by a signal). The function returns 0 in case access got allowed by userspace,
+ * -EPERM in case userspace disallowed the access, and -ERESTARTSYS in case
+ * the wait got interrupted by a signal.
+ */
 static int fanotify_get_response(struct fsnotify_group *group,
 				 struct fanotify_perm_event_info *event,
 				 struct fsnotify_iter_info *iter_info)
@@ -65,8 +72,29 @@ static int fanotify_get_response(struct fsnotify_group *group,
 
 	pr_debug("%s: group=%p event=%p\n", __func__, group, event);
 
-	wait_event(group->fanotify_data.access_waitq,
-		   event->state == FAN_EVENT_ANSWERED);
+	ret = wait_event_interruptible(group->fanotify_data.access_waitq,
+				       event->state == FAN_EVENT_ANSWERED);
+	/* Signal pending? */
+	if (ret < 0) {
+		spin_lock(&group->notification_lock);
+		/* Event reported to userspace and no answer yet? */
+		if (event->state == FAN_EVENT_REPORTED) {
+			/* Event will get freed once userspace answers to it */
+			event->state = FAN_EVENT_CANCELED;
+			spin_unlock(&group->notification_lock);
+			return ret;
+		}
+		/* Event not yet reported? Just remove it. */
+		if (event->state == FAN_EVENT_INIT)
+			fsnotify_remove_queued_event(group, &event->fae.fse);
+		/*
+		 * Event may be also answered in case signal delivery raced
+		 * with wakeup. In that case we have nothing to do besides
+		 * freeing the event and reporting error.
+		 */
+		spin_unlock(&group->notification_lock);
+		goto out;
+	}
 
 	/* userspace responded, convert to something usable */
 	switch (event->response & ~FAN_AUDIT) {
@@ -84,6 +112,8 @@ static int fanotify_get_response(struct fsnotify_group *group,
 
 	pr_debug("%s: group=%p event=%p about to return ret=%d\n", __func__,
 		 group, event, ret);
+out:
+	fsnotify_destroy_event(group, &event->fae.fse);
 	
 	return ret;
 }
@@ -255,7 +285,6 @@ static int fanotify_handle_event(struct fsnotify_group *group,
 	} else if (fanotify_is_perm_event(mask)) {
 		ret = fanotify_get_response(group, FANOTIFY_PE(fsn_event),
 					    iter_info);
-		fsnotify_destroy_event(group, fsn_event);
 	}
 finish:
 	if (fanotify_is_perm_event(mask))
diff --git a/fs/notify/fanotify/fanotify.h b/fs/notify/fanotify/fanotify.h
index 98d58939777c..bbdd2adfbf0f 100644
--- a/fs/notify/fanotify/fanotify.h
+++ b/fs/notify/fanotify/fanotify.h
@@ -26,7 +26,8 @@ struct fanotify_event_info {
 enum {
 	FAN_EVENT_INIT,
 	FAN_EVENT_REPORTED,
-	FAN_EVENT_ANSWERED
+	FAN_EVENT_ANSWERED,
+	FAN_EVENT_CANCELED,
 };
 
 /*
-- 
2.16.4


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* Re: [PATCH 1/6] fanotify: Fold dequeue_event() into process_access_response()
  2019-02-13 14:54 ` [PATCH 1/6] fanotify: Fold dequeue_event() into process_access_response() Jan Kara
@ 2019-02-13 19:42   ` Amir Goldstein
  0 siblings, 0 replies; 20+ messages in thread
From: Amir Goldstein @ 2019-02-13 19:42 UTC (permalink / raw)
  To: Jan Kara
  Cc: linux-fsdevel, Konstantin Khlebnikov, Vivek Trivedi, Orion Poplawski

On Wed, Feb 13, 2019 at 4:54 PM Jan Kara <jack@suse.cz> wrote:
>
> Fold dequeue_event() into process_access_response(). This will make
> changes to use of ->response field easier.
>
> Signed-off-by: Jan Kara <jack@suse.cz>
Reviewed-by: Amir Goldstein <amir73il@gmail.com>

> ---
>  fs/notify/fanotify/fanotify_user.c | 41 ++++++++++++--------------------------
>  1 file changed, 13 insertions(+), 28 deletions(-)
>
> diff --git a/fs/notify/fanotify/fanotify_user.c b/fs/notify/fanotify/fanotify_user.c
> index 9c870b0d2b56..908ebc421d15 100644
> --- a/fs/notify/fanotify/fanotify_user.c
> +++ b/fs/notify/fanotify/fanotify_user.c
> @@ -144,28 +144,6 @@ static int fill_event_metadata(struct fsnotify_group *group,
>         return ret;
>  }
>
> -static struct fanotify_perm_event_info *dequeue_event(
> -                               struct fsnotify_group *group, int fd)
> -{
> -       struct fanotify_perm_event_info *event, *return_e = NULL;
> -
> -       spin_lock(&group->notification_lock);
> -       list_for_each_entry(event, &group->fanotify_data.access_list,
> -                           fae.fse.list) {
> -               if (event->fd != fd)
> -                       continue;
> -
> -               list_del_init(&event->fae.fse.list);
> -               return_e = event;
> -               break;
> -       }
> -       spin_unlock(&group->notification_lock);
> -
> -       pr_debug("%s: found return_re=%p\n", __func__, return_e);
> -
> -       return return_e;
> -}
> -
>  static int process_access_response(struct fsnotify_group *group,
>                                    struct fanotify_response *response_struct)
>  {
> @@ -194,14 +172,21 @@ static int process_access_response(struct fsnotify_group *group,
>         if ((response & FAN_AUDIT) && !FAN_GROUP_FLAG(group, FAN_ENABLE_AUDIT))
>                 return -EINVAL;
>
> -       event = dequeue_event(group, fd);
> -       if (!event)
> -               return -ENOENT;
> +       spin_lock(&group->notification_lock);
> +       list_for_each_entry(event, &group->fanotify_data.access_list,
> +                           fae.fse.list) {
> +               if (event->fd != fd)
> +                       continue;
>
> -       event->response = response;
> -       wake_up(&group->fanotify_data.access_waitq);
> +               list_del_init(&event->fae.fse.list);
> +               event->response = response;
> +               spin_unlock(&group->notification_lock);
> +               wake_up(&group->fanotify_data.access_waitq);
> +               return 0;
> +       }
> +       spin_unlock(&group->notification_lock);
>
> -       return 0;
> +       return -ENOENT;
>  }
>
>  static ssize_t copy_event_to_user(struct fsnotify_group *group,
> --
> 2.16.4
>

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 2/6] fanotify: Move locking inside get_one_event()
  2019-02-13 14:54 ` [PATCH 2/6] fanotify: Move locking inside get_one_event() Jan Kara
@ 2019-02-13 20:23   ` Amir Goldstein
  0 siblings, 0 replies; 20+ messages in thread
From: Amir Goldstein @ 2019-02-13 20:23 UTC (permalink / raw)
  To: Jan Kara
  Cc: linux-fsdevel, Konstantin Khlebnikov, Vivek Trivedi, Orion Poplawski

On Wed, Feb 13, 2019 at 4:54 PM Jan Kara <jack@suse.cz> wrote:
>
> get_one_event() has a single caller and that just locks
> notification_lock around the call. Move locking inside get_one_event()
> as that will make using ->response field for permission event state
> easier.
>
> Signed-off-by: Jan Kara <jack@suse.cz>
Reviewed-by: Amir Goldstein <amir73il@gmail.com>

> ---
>  fs/notify/fanotify/fanotify_user.c | 25 +++++++++++--------------
>  1 file changed, 11 insertions(+), 14 deletions(-)
>
> diff --git a/fs/notify/fanotify/fanotify_user.c b/fs/notify/fanotify/fanotify_user.c
> index 908ebc421d15..4e36f5642797 100644
> --- a/fs/notify/fanotify/fanotify_user.c
> +++ b/fs/notify/fanotify/fanotify_user.c
> @@ -51,25 +51,25 @@ struct kmem_cache *fanotify_perm_event_cachep __read_mostly;
>   * Get an fsnotify notification event if one exists and is small
>   * enough to fit in "count". Return an error pointer if the count
>   * is not large enough.
> - *
> - * Called with the group->notification_lock held.
>   */
>  static struct fsnotify_event *get_one_event(struct fsnotify_group *group,
>                                             size_t count)
>  {
> -       assert_spin_locked(&group->notification_lock);
> +       struct fsnotify_event *event = NULL;
>
>         pr_debug("%s: group=%p count=%zd\n", __func__, group, count);
> -
> +       spin_lock(&group->notification_lock);
>         if (fsnotify_notify_queue_is_empty(group))
> -               return NULL;
> +               goto out;
>
> -       if (FAN_EVENT_METADATA_LEN > count)
> -               return ERR_PTR(-EINVAL);
> -
> -       /* held the notification_lock the whole time, so this is the
> -        * same event we peeked above */
> -       return fsnotify_remove_first_event(group);
> +       if (FAN_EVENT_METADATA_LEN > count) {
> +               event = ERR_PTR(-EINVAL);
> +               goto out;
> +       }
> +       event = fsnotify_remove_first_event(group);
> +out:
> +       spin_unlock(&group->notification_lock);
> +       return event;
>  }
>
>  static int create_fd(struct fsnotify_group *group,
> @@ -261,10 +261,7 @@ static ssize_t fanotify_read(struct file *file, char __user *buf,
>
>         add_wait_queue(&group->notification_waitq, &wait);
>         while (1) {
> -               spin_lock(&group->notification_lock);
>                 kevent = get_one_event(group, count);
> -               spin_unlock(&group->notification_lock);
> -
>                 if (IS_ERR(kevent)) {
>                         ret = PTR_ERR(kevent);
>                         break;
> --
> 2.16.4
>

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 3/6] fsnotify: Create function to remove event from notification list
  2019-02-13 14:54 ` [PATCH 3/6] fsnotify: Create function to remove event from notification list Jan Kara
@ 2019-02-13 20:23   ` Amir Goldstein
  0 siblings, 0 replies; 20+ messages in thread
From: Amir Goldstein @ 2019-02-13 20:23 UTC (permalink / raw)
  To: Jan Kara
  Cc: linux-fsdevel, Konstantin Khlebnikov, Vivek Trivedi, Orion Poplawski

On Wed, Feb 13, 2019 at 4:54 PM Jan Kara <jack@suse.cz> wrote:
>
> Create function to remove event from the notification list. Later it will
> be used from more places.
>
> Signed-off-by: Jan Kara <jack@suse.cz>
Reviewed-by: Amir Goldstein <amir73il@gmail.com>

> ---
>  fs/notify/notification.c         | 20 +++++++++++++-------
>  include/linux/fsnotify_backend.h |  3 +++
>  2 files changed, 16 insertions(+), 7 deletions(-)
>
> diff --git a/fs/notify/notification.c b/fs/notify/notification.c
> index 3c3e36745f59..2195a5cf745a 100644
> --- a/fs/notify/notification.c
> +++ b/fs/notify/notification.c
> @@ -141,6 +141,18 @@ int fsnotify_add_event(struct fsnotify_group *group,
>         return ret;
>  }
>
> +void fsnotify_remove_queued_event(struct fsnotify_group *group,
> +                                 struct fsnotify_event *event)
> +{
> +       assert_spin_locked(&group->notification_lock);
> +       /*
> +        * We need to init list head for the case of overflow event so that
> +        * check in fsnotify_add_event() works
> +        */
> +       list_del_init(&event->list);
> +       group->q_len--;
> +}
> +
>  /*
>   * Remove and return the first event from the notification list.  It is the
>   * responsibility of the caller to destroy the obtained event
> @@ -155,13 +167,7 @@ struct fsnotify_event *fsnotify_remove_first_event(struct fsnotify_group *group)
>
>         event = list_first_entry(&group->notification_list,
>                                  struct fsnotify_event, list);
> -       /*
> -        * We need to init list head for the case of overflow event so that
> -        * check in fsnotify_add_event() works
> -        */
> -       list_del_init(&event->list);
> -       group->q_len--;
> -
> +       fsnotify_remove_queued_event(group, event);
>         return event;
>  }
>
> diff --git a/include/linux/fsnotify_backend.h b/include/linux/fsnotify_backend.h
> index 7639774e7475..cddf839bac96 100644
> --- a/include/linux/fsnotify_backend.h
> +++ b/include/linux/fsnotify_backend.h
> @@ -416,6 +416,9 @@ extern bool fsnotify_notify_queue_is_empty(struct fsnotify_group *group);
>  extern struct fsnotify_event *fsnotify_peek_first_event(struct fsnotify_group *group);
>  /* return AND dequeue the first event on the notification queue */
>  extern struct fsnotify_event *fsnotify_remove_first_event(struct fsnotify_group *group);
> +/* Remove event queued in the notification list */
> +extern void fsnotify_remove_queued_event(struct fsnotify_group *group,
> +                                        struct fsnotify_event *event);
>
>  /* functions used to manipulate the marks attached to inodes */
>
> --
> 2.16.4
>

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 4/6] fanotify: Simplify cleaning of access_list
  2019-02-13 14:54 ` [PATCH 4/6] fanotify: Simplify cleaning of access_list Jan Kara
@ 2019-02-13 20:25   ` Amir Goldstein
  0 siblings, 0 replies; 20+ messages in thread
From: Amir Goldstein @ 2019-02-13 20:25 UTC (permalink / raw)
  To: Jan Kara
  Cc: linux-fsdevel, Konstantin Khlebnikov, Vivek Trivedi, Orion Poplawski

On Wed, Feb 13, 2019 at 4:54 PM Jan Kara <jack@suse.cz> wrote:
>
> Simplify iteration cleaning access_list in fanotify_release(). That will
> make following changes more obvious.
>
> Signed-off-by: Jan Kara <jack@suse.cz>
Reviewed-by: Amir Goldstein <amir73il@gmail.com>

> ---
>  fs/notify/fanotify/fanotify_user.c | 10 ++++------
>  1 file changed, 4 insertions(+), 6 deletions(-)
>
> diff --git a/fs/notify/fanotify/fanotify_user.c b/fs/notify/fanotify/fanotify_user.c
> index 4e36f5642797..96dc469a4086 100644
> --- a/fs/notify/fanotify/fanotify_user.c
> +++ b/fs/notify/fanotify/fanotify_user.c
> @@ -352,7 +352,7 @@ static ssize_t fanotify_write(struct file *file, const char __user *buf, size_t
>  static int fanotify_release(struct inode *ignored, struct file *file)
>  {
>         struct fsnotify_group *group = file->private_data;
> -       struct fanotify_perm_event_info *event, *next;
> +       struct fanotify_perm_event_info *event;
>         struct fsnotify_event *fsn_event;
>
>         /*
> @@ -367,11 +367,9 @@ static int fanotify_release(struct inode *ignored, struct file *file)
>          * and simulate reply from userspace.
>          */
>         spin_lock(&group->notification_lock);
> -       list_for_each_entry_safe(event, next, &group->fanotify_data.access_list,
> -                                fae.fse.list) {
> -               pr_debug("%s: found group=%p event=%p\n", __func__, group,
> -                        event);
> -
> +       while (!list_empty(&group->fanotify_data.access_list)) {
> +               event = list_first_entry(&group->fanotify_data.access_list,
> +                               struct fanotify_perm_event_info, fae.fse.list);
>                 list_del_init(&event->fae.fse.list);
>                 event->response = FAN_ALLOW;
>         }
> --
> 2.16.4
>

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 5/6] fanotify: Track permission event state
  2019-02-13 14:54 ` [PATCH 5/6] fanotify: Track permission event state Jan Kara
@ 2019-02-13 20:33   ` Amir Goldstein
  0 siblings, 0 replies; 20+ messages in thread
From: Amir Goldstein @ 2019-02-13 20:33 UTC (permalink / raw)
  To: Jan Kara
  Cc: linux-fsdevel, Konstantin Khlebnikov, Vivek Trivedi, Orion Poplawski

On Wed, Feb 13, 2019 at 4:54 PM Jan Kara <jack@suse.cz> wrote:
>
> Track whether permission event got already reported to userspace and
> whether userspace already answered to the permission event. Protect
> stores to this field together with updates to ->response field by
> group->notification_lock. This will allow aborting wait for reply to
> permission event from userspace.
>
> Signed-off-by: Jan Kara <jack@suse.cz>
Reviewed-by: Amir Goldstein <amir73il@gmail.com>

> ---
>  fs/notify/fanotify/fanotify.c      |  6 +++---
>  fs/notify/fanotify/fanotify.h      | 10 +++++++++-
>  fs/notify/fanotify/fanotify_user.c | 35 ++++++++++++++++++++++++++++-------
>  3 files changed, 40 insertions(+), 11 deletions(-)
>
> diff --git a/fs/notify/fanotify/fanotify.c b/fs/notify/fanotify/fanotify.c
> index 3723f3d18d20..e725831be161 100644
> --- a/fs/notify/fanotify/fanotify.c
> +++ b/fs/notify/fanotify/fanotify.c
> @@ -65,7 +65,8 @@ static int fanotify_get_response(struct fsnotify_group *group,
>
>         pr_debug("%s: group=%p event=%p\n", __func__, group, event);
>
> -       wait_event(group->fanotify_data.access_waitq, event->response);
> +       wait_event(group->fanotify_data.access_waitq,
> +                  event->state == FAN_EVENT_ANSWERED);
>
>         /* userspace responded, convert to something usable */
>         switch (event->response & ~FAN_AUDIT) {
> @@ -81,8 +82,6 @@ static int fanotify_get_response(struct fsnotify_group *group,
>         if (event->response & FAN_AUDIT)
>                 audit_fanotify(event->response & ~FAN_AUDIT);
>
> -       event->response = 0;
> -
>         pr_debug("%s: group=%p event=%p about to return ret=%d\n", __func__,
>                  group, event, ret);
>
> @@ -167,6 +166,7 @@ struct fanotify_event_info *fanotify_alloc_event(struct fsnotify_group *group,
>                         goto out;
>                 event = &pevent->fae;
>                 pevent->response = 0;
> +               pevent->state = FAN_EVENT_INIT;
>                 goto init;
>         }
>         event = kmem_cache_alloc(fanotify_event_cachep, gfp);
> diff --git a/fs/notify/fanotify/fanotify.h b/fs/notify/fanotify/fanotify.h
> index ea05b8a401e7..98d58939777c 100644
> --- a/fs/notify/fanotify/fanotify.h
> +++ b/fs/notify/fanotify/fanotify.h
> @@ -22,6 +22,13 @@ struct fanotify_event_info {
>         struct pid *pid;
>  };
>
> +/* Possible states of the permission event */
> +enum {
> +       FAN_EVENT_INIT,
> +       FAN_EVENT_REPORTED,
> +       FAN_EVENT_ANSWERED
> +};
> +
>  /*
>   * Structure for permission fanotify events. It gets allocated and freed in
>   * fanotify_handle_event() since we wait there for user response. When the
> @@ -31,7 +38,8 @@ struct fanotify_event_info {
>   */
>  struct fanotify_perm_event_info {
>         struct fanotify_event_info fae;
> -       int response;   /* userspace answer to question */
> +       unsigned short response;        /* userspace answer to the event */
> +       unsigned short state;           /* state of the event */
>         int fd;         /* fd we passed to userspace for this event */
>  };
>
> diff --git a/fs/notify/fanotify/fanotify_user.c b/fs/notify/fanotify/fanotify_user.c
> index 96dc469a4086..ef6bea0ae751 100644
> --- a/fs/notify/fanotify/fanotify_user.c
> +++ b/fs/notify/fanotify/fanotify_user.c
> @@ -50,7 +50,8 @@ struct kmem_cache *fanotify_perm_event_cachep __read_mostly;
>  /*
>   * Get an fsnotify notification event if one exists and is small
>   * enough to fit in "count". Return an error pointer if the count
> - * is not large enough.
> + * is not large enough. When permission event is dequeued, its state is
> + * updated accordingly.
>   */
>  static struct fsnotify_event *get_one_event(struct fsnotify_group *group,
>                                             size_t count)
> @@ -67,6 +68,8 @@ static struct fsnotify_event *get_one_event(struct fsnotify_group *group,
>                 goto out;
>         }
>         event = fsnotify_remove_first_event(group);
> +       if (fanotify_is_perm_event(event->mask))
> +               FANOTIFY_PE(event)->state = FAN_EVENT_REPORTED;
>  out:
>         spin_unlock(&group->notification_lock);
>         return event;
> @@ -144,6 +147,21 @@ static int fill_event_metadata(struct fsnotify_group *group,
>         return ret;
>  }
>
> +/*
> + * Finish processing of permission event by setting it to ANSWERED state and
> + * drop group->notification_lock.
> + */
> +static void finish_permission_event(struct fsnotify_group *group,
> +                                   struct fanotify_perm_event_info *event,
> +                                   unsigned int response)
> +                                   __releases(&group->notification_lock)
> +{
> +       assert_spin_locked(&group->notification_lock);
> +       event->response = response;
> +       event->state = FAN_EVENT_ANSWERED;
> +       spin_unlock(&group->notification_lock);
> +}
> +
>  static int process_access_response(struct fsnotify_group *group,
>                                    struct fanotify_response *response_struct)
>  {
> @@ -179,8 +197,7 @@ static int process_access_response(struct fsnotify_group *group,
>                         continue;
>
>                 list_del_init(&event->fae.fse.list);
> -               event->response = response;
> -               spin_unlock(&group->notification_lock);
> +               finish_permission_event(group, event, response);
>                 wake_up(&group->fanotify_data.access_waitq);
>                 return 0;
>         }
> @@ -302,7 +319,9 @@ static ssize_t fanotify_read(struct file *file, char __user *buf,
>                         fsnotify_destroy_event(group, kevent);
>                 } else {
>                         if (ret <= 0) {
> -                               FANOTIFY_PE(kevent)->response = FAN_DENY;
> +                               spin_lock(&group->notification_lock);
> +                               finish_permission_event(group,
> +                                       FANOTIFY_PE(kevent), FAN_DENY);
>                                 wake_up(&group->fanotify_data.access_waitq);
>                         } else {
>                                 spin_lock(&group->notification_lock);
> @@ -371,7 +390,8 @@ static int fanotify_release(struct inode *ignored, struct file *file)
>                 event = list_first_entry(&group->fanotify_data.access_list,
>                                 struct fanotify_perm_event_info, fae.fse.list);
>                 list_del_init(&event->fae.fse.list);
> -               event->response = FAN_ALLOW;
> +               finish_permission_event(group, event, FAN_ALLOW);
> +               spin_lock(&group->notification_lock);
>         }
>
>         /*
> @@ -384,10 +404,11 @@ static int fanotify_release(struct inode *ignored, struct file *file)
>                 if (!(fsn_event->mask & FANOTIFY_PERM_EVENTS)) {
>                         spin_unlock(&group->notification_lock);
>                         fsnotify_destroy_event(group, fsn_event);
> -                       spin_lock(&group->notification_lock);
>                 } else {
> -                       FANOTIFY_PE(fsn_event)->response = FAN_ALLOW;
> +                       finish_permission_event(group, FANOTIFY_PE(fsn_event),
> +                                               FAN_ALLOW);
>                 }
> +               spin_lock(&group->notification_lock);
>         }
>         spin_unlock(&group->notification_lock);
>
> --
> 2.16.4
>

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 6/6] fanotify: Use interruptible wait when waiting for permission events
  2019-02-13 14:54 ` [PATCH 6/6] fanotify: Use interruptible wait when waiting for permission events Jan Kara
@ 2019-02-13 21:02   ` Amir Goldstein
  2019-02-14 18:01     ` Jan Kara
  0 siblings, 1 reply; 20+ messages in thread
From: Amir Goldstein @ 2019-02-13 21:02 UTC (permalink / raw)
  To: Jan Kara
  Cc: linux-fsdevel, Konstantin Khlebnikov, Vivek Trivedi, Orion Poplawski

On Wed, Feb 13, 2019 at 4:54 PM Jan Kara <jack@suse.cz> wrote:
>
> When waiting for response to fanotify permission events, we currently
> use uninterruptible waits. That makes code simple however it can cause
> lots of processes to end up in uninterruptible sleep with hard reboot
> being the only alternative in case fanotify listener process stops
> responding (e.g. due to a bug in its implementation). Uninterruptible
> sleep also makes system hibernation fail if the listener gets frozen
> before the process generating fanotify permission event.
>
> Fix these problems by using interruptible sleep for waiting for response
> to fanotify event. This is slightly tricky though - we have to
> detect when the event got already reported to userspace as in that
> case we must not free the event. Instead we push the responsibility for
> freeing the event to the process that will write response to the
> event.
>
> Reported-by: Orion Poplawski <orion@nwra.com>
> Reported-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
> Signed-off-by: Jan Kara <jack@suse.cz>
> ---
>  fs/notify/fanotify/fanotify.c | 35 ++++++++++++++++++++++++++++++++---
>  fs/notify/fanotify/fanotify.h |  3 ++-
>  2 files changed, 34 insertions(+), 4 deletions(-)
>
> diff --git a/fs/notify/fanotify/fanotify.c b/fs/notify/fanotify/fanotify.c
> index e725831be161..2e40d5d8660b 100644
> --- a/fs/notify/fanotify/fanotify.c
> +++ b/fs/notify/fanotify/fanotify.c
> @@ -57,6 +57,13 @@ static int fanotify_merge(struct list_head *list, struct fsnotify_event *event)
>         return 0;
>  }
>
> +/*
> + * Wait for response to permission event. The function also takes care of
> + * freeing the permission event (or offloads that in case the wait is canceled
> + * by a signal). The function returns 0 in case access got allowed by userspace,
> + * -EPERM in case userspace disallowed the access, and -ERESTARTSYS in case
> + * the wait got interrupted by a signal.
> + */
>  static int fanotify_get_response(struct fsnotify_group *group,
>                                  struct fanotify_perm_event_info *event,
>                                  struct fsnotify_iter_info *iter_info)
> @@ -65,8 +72,29 @@ static int fanotify_get_response(struct fsnotify_group *group,
>
>         pr_debug("%s: group=%p event=%p\n", __func__, group, event);
>
> -       wait_event(group->fanotify_data.access_waitq,
> -                  event->state == FAN_EVENT_ANSWERED);
> +       ret = wait_event_interruptible(group->fanotify_data.access_waitq,
> +                                      event->state == FAN_EVENT_ANSWERED);
> +       /* Signal pending? */
> +       if (ret < 0) {
> +               spin_lock(&group->notification_lock);
> +               /* Event reported to userspace and no answer yet? */
> +               if (event->state == FAN_EVENT_REPORTED) {
> +                       /* Event will get freed once userspace answers to it */

Did you forget to commit the
if (event->state == FAN_EVENT_CANCELED)
code?

> +                       event->state = FAN_EVENT_CANCELED;
> +                       spin_unlock(&group->notification_lock);
> +                       return ret;
> +               }
> +               /* Event not yet reported? Just remove it. */
> +               if (event->state == FAN_EVENT_INIT)
> +                       fsnotify_remove_queued_event(group, &event->fae.fse);
> +               /*
> +                * Event may be also answered in case signal delivery raced
> +                * with wakeup. In that case we have nothing to do besides
> +                * freeing the event and reporting error.
> +                */
> +               spin_unlock(&group->notification_lock);
> +               goto out;
> +       }
>
>         /* userspace responded, convert to something usable */
>         switch (event->response & ~FAN_AUDIT) {
> @@ -84,6 +112,8 @@ static int fanotify_get_response(struct fsnotify_group *group,
>
>         pr_debug("%s: group=%p event=%p about to return ret=%d\n", __func__,
>                  group, event, ret);
> +out:
> +       fsnotify_destroy_event(group, &event->fae.fse);
>
>         return ret;
>  }
> @@ -255,7 +285,6 @@ static int fanotify_handle_event(struct fsnotify_group *group,
>         } else if (fanotify_is_perm_event(mask)) {
>                 ret = fanotify_get_response(group, FANOTIFY_PE(fsn_event),
>                                             iter_info);
> -               fsnotify_destroy_event(group, fsn_event);
>         }
>  finish:
>         if (fanotify_is_perm_event(mask))
> diff --git a/fs/notify/fanotify/fanotify.h b/fs/notify/fanotify/fanotify.h
> index 98d58939777c..bbdd2adfbf0f 100644
> --- a/fs/notify/fanotify/fanotify.h
> +++ b/fs/notify/fanotify/fanotify.h
> @@ -26,7 +26,8 @@ struct fanotify_event_info {
>  enum {
>         FAN_EVENT_INIT,
>         FAN_EVENT_REPORTED,
> -       FAN_EVENT_ANSWERED
> +       FAN_EVENT_ANSWERED,
> +       FAN_EVENT_CANCELED,
>  };
>
>  /*
> --
> 2.16.4
>

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 6/6] fanotify: Use interruptible wait when waiting for permission events
  2019-02-13 21:02   ` Amir Goldstein
@ 2019-02-14 18:01     ` Jan Kara
  2019-02-14 18:53       ` Amir Goldstein
  0 siblings, 1 reply; 20+ messages in thread
From: Jan Kara @ 2019-02-14 18:01 UTC (permalink / raw)
  To: Amir Goldstein
  Cc: Jan Kara, linux-fsdevel, Konstantin Khlebnikov, Vivek Trivedi,
	Orion Poplawski

[-- Attachment #1: Type: text/plain, Size: 3317 bytes --]

On Wed 13-02-19 23:02:17, Amir Goldstein wrote:
> On Wed, Feb 13, 2019 at 4:54 PM Jan Kara <jack@suse.cz> wrote:
> >
> > When waiting for response to fanotify permission events, we currently
> > use uninterruptible waits. That makes code simple however it can cause
> > lots of processes to end up in uninterruptible sleep with hard reboot
> > being the only alternative in case fanotify listener process stops
> > responding (e.g. due to a bug in its implementation). Uninterruptible
> > sleep also makes system hibernation fail if the listener gets frozen
> > before the process generating fanotify permission event.
> >
> > Fix these problems by using interruptible sleep for waiting for response
> > to fanotify event. This is slightly tricky though - we have to
> > detect when the event got already reported to userspace as in that
> > case we must not free the event. Instead we push the responsibility for
> > freeing the event to the process that will write response to the
> > event.
> >
> > Reported-by: Orion Poplawski <orion@nwra.com>
> > Reported-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
> > Signed-off-by: Jan Kara <jack@suse.cz>
> > ---
> >  fs/notify/fanotify/fanotify.c | 35 ++++++++++++++++++++++++++++++++---
> >  fs/notify/fanotify/fanotify.h |  3 ++-
> >  2 files changed, 34 insertions(+), 4 deletions(-)
> >
> > diff --git a/fs/notify/fanotify/fanotify.c b/fs/notify/fanotify/fanotify.c
> > index e725831be161..2e40d5d8660b 100644
> > --- a/fs/notify/fanotify/fanotify.c
> > +++ b/fs/notify/fanotify/fanotify.c
> > @@ -57,6 +57,13 @@ static int fanotify_merge(struct list_head *list, struct fsnotify_event *event)
> >         return 0;
> >  }
> >
> > +/*
> > + * Wait for response to permission event. The function also takes care of
> > + * freeing the permission event (or offloads that in case the wait is canceled
> > + * by a signal). The function returns 0 in case access got allowed by userspace,
> > + * -EPERM in case userspace disallowed the access, and -ERESTARTSYS in case
> > + * the wait got interrupted by a signal.
> > + */
> >  static int fanotify_get_response(struct fsnotify_group *group,
> >                                  struct fanotify_perm_event_info *event,
> >                                  struct fsnotify_iter_info *iter_info)
> > @@ -65,8 +72,29 @@ static int fanotify_get_response(struct fsnotify_group *group,
> >
> >         pr_debug("%s: group=%p event=%p\n", __func__, group, event);
> >
> > -       wait_event(group->fanotify_data.access_waitq,
> > -                  event->state == FAN_EVENT_ANSWERED);
> > +       ret = wait_event_interruptible(group->fanotify_data.access_waitq,
> > +                                      event->state == FAN_EVENT_ANSWERED);
> > +       /* Signal pending? */
> > +       if (ret < 0) {
> > +               spin_lock(&group->notification_lock);
> > +               /* Event reported to userspace and no answer yet? */
> > +               if (event->state == FAN_EVENT_REPORTED) {
> > +                       /* Event will get freed once userspace answers to it */
> 
> Did you forget to commit the
> if (event->state == FAN_EVENT_CANCELED)
> code?

Yeah, somehow I forgot that bit. Thanks for noticing! Attached is a fixed
version of the patch.

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

[-- Attachment #2: 0001-fanotify-Use-interruptible-wait-when-waiting-for-per.patch --]
[-- Type: text/x-patch, Size: 5175 bytes --]

From 071ab4839c021dd759056a6728d9cd47bd189419 Mon Sep 17 00:00:00 2001
From: Jan Kara <jack@suse.cz>
Date: Tue, 8 Jan 2019 15:18:02 +0100
Subject: [PATCH] fanotify: Use interruptible wait when waiting for permission
 events

When waiting for response to fanotify permission events, we currently
use uninterruptible waits. That makes code simple however it can cause
lots of processes to end up in uninterruptible sleep with hard reboot
being the only alternative in case fanotify listener process stops
responding (e.g. due to a bug in its implementation). Uninterruptible
sleep also makes system hibernation fail if the listener gets frozen
before the process generating fanotify permission event.

Fix these problems by using interruptible sleep for waiting for response
to fanotify event. This is slightly tricky though - we have to
detect when the event got already reported to userspace as in that
case we must not free the event. Instead we push the responsibility for
freeing the event to the process that will write response to the
event.

Reported-by: Orion Poplawski <orion@nwra.com>
Reported-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Signed-off-by: Jan Kara <jack@suse.cz>
---
 fs/notify/fanotify/fanotify.c      | 35 ++++++++++++++++++++++++++++++++---
 fs/notify/fanotify/fanotify.h      |  3 ++-
 fs/notify/fanotify/fanotify_user.c |  9 ++++++++-
 3 files changed, 42 insertions(+), 5 deletions(-)

diff --git a/fs/notify/fanotify/fanotify.c b/fs/notify/fanotify/fanotify.c
index e725831be161..2e40d5d8660b 100644
--- a/fs/notify/fanotify/fanotify.c
+++ b/fs/notify/fanotify/fanotify.c
@@ -57,6 +57,13 @@ static int fanotify_merge(struct list_head *list, struct fsnotify_event *event)
 	return 0;
 }
 
+/*
+ * Wait for response to permission event. The function also takes care of
+ * freeing the permission event (or offloads that in case the wait is canceled
+ * by a signal). The function returns 0 in case access got allowed by userspace,
+ * -EPERM in case userspace disallowed the access, and -ERESTARTSYS in case
+ * the wait got interrupted by a signal.
+ */
 static int fanotify_get_response(struct fsnotify_group *group,
 				 struct fanotify_perm_event_info *event,
 				 struct fsnotify_iter_info *iter_info)
@@ -65,8 +72,29 @@ static int fanotify_get_response(struct fsnotify_group *group,
 
 	pr_debug("%s: group=%p event=%p\n", __func__, group, event);
 
-	wait_event(group->fanotify_data.access_waitq,
-		   event->state == FAN_EVENT_ANSWERED);
+	ret = wait_event_interruptible(group->fanotify_data.access_waitq,
+				       event->state == FAN_EVENT_ANSWERED);
+	/* Signal pending? */
+	if (ret < 0) {
+		spin_lock(&group->notification_lock);
+		/* Event reported to userspace and no answer yet? */
+		if (event->state == FAN_EVENT_REPORTED) {
+			/* Event will get freed once userspace answers to it */
+			event->state = FAN_EVENT_CANCELED;
+			spin_unlock(&group->notification_lock);
+			return ret;
+		}
+		/* Event not yet reported? Just remove it. */
+		if (event->state == FAN_EVENT_INIT)
+			fsnotify_remove_queued_event(group, &event->fae.fse);
+		/*
+		 * Event may be also answered in case signal delivery raced
+		 * with wakeup. In that case we have nothing to do besides
+		 * freeing the event and reporting error.
+		 */
+		spin_unlock(&group->notification_lock);
+		goto out;
+	}
 
 	/* userspace responded, convert to something usable */
 	switch (event->response & ~FAN_AUDIT) {
@@ -84,6 +112,8 @@ static int fanotify_get_response(struct fsnotify_group *group,
 
 	pr_debug("%s: group=%p event=%p about to return ret=%d\n", __func__,
 		 group, event, ret);
+out:
+	fsnotify_destroy_event(group, &event->fae.fse);
 	
 	return ret;
 }
@@ -255,7 +285,6 @@ static int fanotify_handle_event(struct fsnotify_group *group,
 	} else if (fanotify_is_perm_event(mask)) {
 		ret = fanotify_get_response(group, FANOTIFY_PE(fsn_event),
 					    iter_info);
-		fsnotify_destroy_event(group, fsn_event);
 	}
 finish:
 	if (fanotify_is_perm_event(mask))
diff --git a/fs/notify/fanotify/fanotify.h b/fs/notify/fanotify/fanotify.h
index 98d58939777c..bbdd2adfbf0f 100644
--- a/fs/notify/fanotify/fanotify.h
+++ b/fs/notify/fanotify/fanotify.h
@@ -26,7 +26,8 @@ struct fanotify_event_info {
 enum {
 	FAN_EVENT_INIT,
 	FAN_EVENT_REPORTED,
-	FAN_EVENT_ANSWERED
+	FAN_EVENT_ANSWERED,
+	FAN_EVENT_CANCELED,
 };
 
 /*
diff --git a/fs/notify/fanotify/fanotify_user.c b/fs/notify/fanotify/fanotify_user.c
index ef6bea0ae751..7db4b03c6743 100644
--- a/fs/notify/fanotify/fanotify_user.c
+++ b/fs/notify/fanotify/fanotify_user.c
@@ -156,10 +156,17 @@ static void finish_permission_event(struct fsnotify_group *group,
 				    unsigned int response)
 				    __releases(&group->notification_lock)
 {
+	bool destroy = false;
+
 	assert_spin_locked(&group->notification_lock);
 	event->response = response;
-	event->state = FAN_EVENT_ANSWERED;
+	if (event->state == FAN_EVENT_CANCELED)
+		destroy = true;
+	else
+		event->state = FAN_EVENT_ANSWERED;
 	spin_unlock(&group->notification_lock);
+	if (destroy)
+		fsnotify_destroy_event(group, &event->fae.fse);
 }
 
 static int process_access_response(struct fsnotify_group *group,
-- 
2.16.4


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* Re: [PATCH 6/6] fanotify: Use interruptible wait when waiting for permission events
  2019-02-14 18:01     ` Jan Kara
@ 2019-02-14 18:53       ` Amir Goldstein
  2019-02-18 11:04         ` Jan Kara
  0 siblings, 1 reply; 20+ messages in thread
From: Amir Goldstein @ 2019-02-14 18:53 UTC (permalink / raw)
  To: Jan Kara
  Cc: linux-fsdevel, Konstantin Khlebnikov, Vivek Trivedi, Orion Poplawski

On Thu, Feb 14, 2019 at 8:01 PM Jan Kara <jack@suse.cz> wrote:
>
> On Wed 13-02-19 23:02:17, Amir Goldstein wrote:
> > On Wed, Feb 13, 2019 at 4:54 PM Jan Kara <jack@suse.cz> wrote:
> > >
> > > When waiting for response to fanotify permission events, we currently
> > > use uninterruptible waits. That makes code simple however it can cause
> > > lots of processes to end up in uninterruptible sleep with hard reboot
> > > being the only alternative in case fanotify listener process stops
> > > responding (e.g. due to a bug in its implementation). Uninterruptible
> > > sleep also makes system hibernation fail if the listener gets frozen
> > > before the process generating fanotify permission event.
> > >
> > > Fix these problems by using interruptible sleep for waiting for response
> > > to fanotify event. This is slightly tricky though - we have to
> > > detect when the event got already reported to userspace as in that
> > > case we must not free the event. Instead we push the responsibility for
> > > freeing the event to the process that will write response to the
> > > event.
> > >
> > > Reported-by: Orion Poplawski <orion@nwra.com>
> > > Reported-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
> > > Signed-off-by: Jan Kara <jack@suse.cz>
> > > ---
> > >  fs/notify/fanotify/fanotify.c | 35 ++++++++++++++++++++++++++++++++---
> > >  fs/notify/fanotify/fanotify.h |  3 ++-
> > >  2 files changed, 34 insertions(+), 4 deletions(-)
> > >
> > > diff --git a/fs/notify/fanotify/fanotify.c b/fs/notify/fanotify/fanotify.c
> > > index e725831be161..2e40d5d8660b 100644
> > > --- a/fs/notify/fanotify/fanotify.c
> > > +++ b/fs/notify/fanotify/fanotify.c
> > > @@ -57,6 +57,13 @@ static int fanotify_merge(struct list_head *list, struct fsnotify_event *event)
> > >         return 0;
> > >  }
> > >
> > > +/*
> > > + * Wait for response to permission event. The function also takes care of
> > > + * freeing the permission event (or offloads that in case the wait is canceled
> > > + * by a signal). The function returns 0 in case access got allowed by userspace,
> > > + * -EPERM in case userspace disallowed the access, and -ERESTARTSYS in case
> > > + * the wait got interrupted by a signal.
> > > + */
> > >  static int fanotify_get_response(struct fsnotify_group *group,
> > >                                  struct fanotify_perm_event_info *event,
> > >                                  struct fsnotify_iter_info *iter_info)
> > > @@ -65,8 +72,29 @@ static int fanotify_get_response(struct fsnotify_group *group,
> > >
> > >         pr_debug("%s: group=%p event=%p\n", __func__, group, event);
> > >
> > > -       wait_event(group->fanotify_data.access_waitq,
> > > -                  event->state == FAN_EVENT_ANSWERED);
> > > +       ret = wait_event_interruptible(group->fanotify_data.access_waitq,
> > > +                                      event->state == FAN_EVENT_ANSWERED);
> > > +       /* Signal pending? */
> > > +       if (ret < 0) {
> > > +               spin_lock(&group->notification_lock);
> > > +               /* Event reported to userspace and no answer yet? */
> > > +               if (event->state == FAN_EVENT_REPORTED) {
> > > +                       /* Event will get freed once userspace answers to it */
> >
> > Did you forget to commit the
> > if (event->state == FAN_EVENT_CANCELED)
> > code?
>
> Yeah, somehow I forgot that bit. Thanks for noticing! Attached is a fixed
> version of the patch.
>

Looks ok.

Feel free to add:
Reviewed-by: Amir Goldstein <amir73il@gmail.com>

Thanks,
Amir.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 6/6] fanotify: Use interruptible wait when waiting for permission events
  2019-02-14 18:53       ` Amir Goldstein
@ 2019-02-18 11:04         ` Jan Kara
  0 siblings, 0 replies; 20+ messages in thread
From: Jan Kara @ 2019-02-18 11:04 UTC (permalink / raw)
  To: Amir Goldstein
  Cc: Jan Kara, linux-fsdevel, Konstantin Khlebnikov, Vivek Trivedi,
	Orion Poplawski

On Thu 14-02-19 20:53:38, Amir Goldstein wrote:
> On Thu, Feb 14, 2019 at 8:01 PM Jan Kara <jack@suse.cz> wrote:
> >
> > On Wed 13-02-19 23:02:17, Amir Goldstein wrote:
> > > On Wed, Feb 13, 2019 at 4:54 PM Jan Kara <jack@suse.cz> wrote:
> > > >
> > > > When waiting for response to fanotify permission events, we currently
> > > > use uninterruptible waits. That makes code simple however it can cause
> > > > lots of processes to end up in uninterruptible sleep with hard reboot
> > > > being the only alternative in case fanotify listener process stops
> > > > responding (e.g. due to a bug in its implementation). Uninterruptible
> > > > sleep also makes system hibernation fail if the listener gets frozen
> > > > before the process generating fanotify permission event.
> > > >
> > > > Fix these problems by using interruptible sleep for waiting for response
> > > > to fanotify event. This is slightly tricky though - we have to
> > > > detect when the event got already reported to userspace as in that
> > > > case we must not free the event. Instead we push the responsibility for
> > > > freeing the event to the process that will write response to the
> > > > event.
> > > >
> > > > Reported-by: Orion Poplawski <orion@nwra.com>
> > > > Reported-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
> > > > Signed-off-by: Jan Kara <jack@suse.cz>
> > > > ---
> > > >  fs/notify/fanotify/fanotify.c | 35 ++++++++++++++++++++++++++++++++---
> > > >  fs/notify/fanotify/fanotify.h |  3 ++-
> > > >  2 files changed, 34 insertions(+), 4 deletions(-)
> > > >
> > > > diff --git a/fs/notify/fanotify/fanotify.c b/fs/notify/fanotify/fanotify.c
> > > > index e725831be161..2e40d5d8660b 100644
> > > > --- a/fs/notify/fanotify/fanotify.c
> > > > +++ b/fs/notify/fanotify/fanotify.c
> > > > @@ -57,6 +57,13 @@ static int fanotify_merge(struct list_head *list, struct fsnotify_event *event)
> > > >         return 0;
> > > >  }
> > > >
> > > > +/*
> > > > + * Wait for response to permission event. The function also takes care of
> > > > + * freeing the permission event (or offloads that in case the wait is canceled
> > > > + * by a signal). The function returns 0 in case access got allowed by userspace,
> > > > + * -EPERM in case userspace disallowed the access, and -ERESTARTSYS in case
> > > > + * the wait got interrupted by a signal.
> > > > + */
> > > >  static int fanotify_get_response(struct fsnotify_group *group,
> > > >                                  struct fanotify_perm_event_info *event,
> > > >                                  struct fsnotify_iter_info *iter_info)
> > > > @@ -65,8 +72,29 @@ static int fanotify_get_response(struct fsnotify_group *group,
> > > >
> > > >         pr_debug("%s: group=%p event=%p\n", __func__, group, event);
> > > >
> > > > -       wait_event(group->fanotify_data.access_waitq,
> > > > -                  event->state == FAN_EVENT_ANSWERED);
> > > > +       ret = wait_event_interruptible(group->fanotify_data.access_waitq,
> > > > +                                      event->state == FAN_EVENT_ANSWERED);
> > > > +       /* Signal pending? */
> > > > +       if (ret < 0) {
> > > > +               spin_lock(&group->notification_lock);
> > > > +               /* Event reported to userspace and no answer yet? */
> > > > +               if (event->state == FAN_EVENT_REPORTED) {
> > > > +                       /* Event will get freed once userspace answers to it */
> > >
> > > Did you forget to commit the
> > > if (event->state == FAN_EVENT_CANCELED)
> > > code?
> >
> > Yeah, somehow I forgot that bit. Thanks for noticing! Attached is a fixed
> > version of the patch.
> >
> 
> Looks ok.
> 
> Feel free to add:
> Reviewed-by: Amir Goldstein <amir73il@gmail.com>

Thanks for review Amir! I'll queue patches to my tree then.

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v2 0/6] fanotify: Make wait for permission event response interruptible
  2019-02-13 14:54 [PATCH v2 0/6] fanotify: Make wait for permission event response interruptible Jan Kara
                   ` (5 preceding siblings ...)
  2019-02-13 14:54 ` [PATCH 6/6] fanotify: Use interruptible wait when waiting for permission events Jan Kara
@ 2019-02-20 17:27 ` Orion Poplawski
  2019-02-21  9:02   ` Marko Rauhamaa
  2019-02-21 10:53   ` Jan Kara
  6 siblings, 2 replies; 20+ messages in thread
From: Orion Poplawski @ 2019-02-20 17:27 UTC (permalink / raw)
  To: Jan Kara, linux-fsdevel
  Cc: Amir Goldstein, Konstantin Khlebnikov, Vivek Trivedi

[-- Attachment #1: Type: text/plain, Size: 2314 bytes --]

On 2/13/19 7:54 AM, Jan Kara wrote:
> Hello,
> 
> When waiting for response to fanotify permission events, we currently use
> uninterruptible waits. That makes code simple however it can cause lots of
> processes to end up in uninterruptible sleep with hard reboot being the only
> alternative in case fanotify listener process stops responding (e.g. due to a
> bug in its implementation) - reported e.g. in [1]. Uninterruptible sleep also
> makes system hibernation fail if the listener gets frozen before the process
> generating fanotify permission event (as reported e.g. here [2]).
> 
> This patch set modifies fanotify so that it will use interruptible wait when
> waiting for fanotify permission event response. Patches are based on current
> Linus' tree for the ease of testing (I plan to rebase them on top of Amir's
> pending changes later). I have also create LTP test which stresses handling of
> permission events while sending processes signals to test the new code [3]
> Review, comments, and testing are welcome.
> 
> [1] https://lore.kernel.org/lkml/153474898224.6806.12518115530793064797.stgit@buzz/
> [2] https://lore.kernel.org/lkml/c1bb16b7-9eee-9cea-2c96-a512d8b3b9c7@nwra.com/
> [3] https://lwn.net/ml/linux-fsdevel/20190108165307.GA11259@quack2.suse.cz/
> 
> Changes since v1:
> * leave pr_debug() calls alone (Amir)
> * simplify permission event state tracking (Amir)
> * split some changes into separate patches (Amir)
> 
> 								Honza
> 

I backported these patches to the RHEL7 kernel and have started running that.
One thing I've noticed are messages like the following at login time:

bash: /etc/bash_completion.d/itweb-settings.bash: Interrupted system call

I've commented on a bash bug report here https://savannah.gnu.org/support/?109159

But I'm wondering if these changes are leading to more EINTR returns from
open() than expected.  Of if this is the new "normal", or if this points to
bugs in the antivirus software holding the fanotify callbacks.

Thoughts?

Thanks again.


-- 
Orion Poplawski
Manager of NWRA Technical Systems          720-772-5637
NWRA, Boulder/CoRA Office             FAX: 303-415-9702
3380 Mitchell Lane                       orion@nwra.com
Boulder, CO 80301                 https://www.nwra.com/


[-- Attachment #2: S/MIME Cryptographic Signature --]
[-- Type: application/pkcs7-signature, Size: 3799 bytes --]

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v2 0/6] fanotify: Make wait for permission event response interruptible
  2019-02-20 17:27 ` [PATCH v2 0/6] fanotify: Make wait for permission event response interruptible Orion Poplawski
@ 2019-02-21  9:02   ` Marko Rauhamaa
  2019-02-21 10:53   ` Jan Kara
  1 sibling, 0 replies; 20+ messages in thread
From: Marko Rauhamaa @ 2019-02-21  9:02 UTC (permalink / raw)
  To: Orion Poplawski
  Cc: Jan Kara, linux-fsdevel, Amir Goldstein, Konstantin Khlebnikov,
	Vivek Trivedi

Orion Poplawski <orion@nwra.com>:

> I backported these patches to the RHEL7 kernel and have started
> running that. One thing I've noticed are messages like the following
> at login time:
>
> bash: /etc/bash_completion.d/itweb-settings.bash: Interrupted system call
>
> [...]
>
> But I'm wondering if these changes are leading to more EINTR returns
> from open() than expected. Of if this is the new "normal", or if this
> points to bugs in the antivirus software holding the fanotify
> callbacks.
>
> Thoughts?

A great observation!

I believe 99.9% of Linux software is unprepared for an EINTR from
regular file I/O. That might even be considered a violation of the
API.

(You problem report is reminiscent of what I'm experiencing with emacs'
"M-x compile" command. The compilation fails if I should change the
window geometry simultaneously because a SIGWINCH hits the compilation
job. Nothing to do with fanotify but somewhat analogous.)


Marko

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v2 0/6] fanotify: Make wait for permission event response interruptible
  2019-02-20 17:27 ` [PATCH v2 0/6] fanotify: Make wait for permission event response interruptible Orion Poplawski
  2019-02-21  9:02   ` Marko Rauhamaa
@ 2019-02-21 10:53   ` Jan Kara
  2019-02-21 10:55     ` Jan Kara
  1 sibling, 1 reply; 20+ messages in thread
From: Jan Kara @ 2019-02-21 10:53 UTC (permalink / raw)
  To: Orion Poplawski
  Cc: Jan Kara, linux-fsdevel, Amir Goldstein, Konstantin Khlebnikov,
	Vivek Trivedi

On Wed 20-02-19 10:27:04, Orion Poplawski wrote:
> On 2/13/19 7:54 AM, Jan Kara wrote:
> > Hello,
> > 
> > When waiting for response to fanotify permission events, we currently use
> > uninterruptible waits. That makes code simple however it can cause lots of
> > processes to end up in uninterruptible sleep with hard reboot being the only
> > alternative in case fanotify listener process stops responding (e.g. due to a
> > bug in its implementation) - reported e.g. in [1]. Uninterruptible sleep also
> > makes system hibernation fail if the listener gets frozen before the process
> > generating fanotify permission event (as reported e.g. here [2]).
> > 
> > This patch set modifies fanotify so that it will use interruptible wait when
> > waiting for fanotify permission event response. Patches are based on current
> > Linus' tree for the ease of testing (I plan to rebase them on top of Amir's
> > pending changes later). I have also create LTP test which stresses handling of
> > permission events while sending processes signals to test the new code [3]
> > Review, comments, and testing are welcome.
> > 
> > [1] https://lore.kernel.org/lkml/153474898224.6806.12518115530793064797.stgit@buzz/
> > [2] https://lore.kernel.org/lkml/c1bb16b7-9eee-9cea-2c96-a512d8b3b9c7@nwra.com/
> > [3] https://lwn.net/ml/linux-fsdevel/20190108165307.GA11259@quack2.suse.cz/
> > 
> > Changes since v1:
> > * leave pr_debug() calls alone (Amir)
> > * simplify permission event state tracking (Amir)
> > * split some changes into separate patches (Amir)
> > 
> > 								Honza
> > 
> 
> I backported these patches to the RHEL7 kernel and have started running that.
> One thing I've noticed are messages like the following at login time:
> 
> bash: /etc/bash_completion.d/itweb-settings.bash: Interrupted system call
> 
> I've commented on a bash bug report here
> https://savannah.gnu.org/support/?109159

Thanks for trying these out! Yes, so the patches can definitely lead to
EINTR returns from open(2) if there's fanotify permission event generated
by it and the opening process has a signal pending. Now EINTR is documented
as a possible return from open(2) but Marko is right that in practice
open(2) on local filesystem never returns EINTR so programs just don't
bother handling it. Since breaking userspace is no-go, we probably cannot
apply the change as is.

What we can do easily is to change the wait_event_interruptible() to
wait_event_killable(). This is what we commonly do when we want to allow
administrator to interrupt a syscall but userspace is not prepared for
EINTR. That will at least allow processes that are waiting for fanotify
response to be killed. So I'll do this for the coming merge window
(attached patch). However this will not solve your problems with
hibernation as TASK_KILLABLE tasks cannot be hibernated AFAICS. I will have
to talk with people more knowledgeable about hibernation if there's a
solution to this.

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v2 0/6] fanotify: Make wait for permission event response interruptible
  2019-02-21 10:53   ` Jan Kara
@ 2019-02-21 10:55     ` Jan Kara
  0 siblings, 0 replies; 20+ messages in thread
From: Jan Kara @ 2019-02-21 10:55 UTC (permalink / raw)
  To: Orion Poplawski
  Cc: Jan Kara, linux-fsdevel, Amir Goldstein, Konstantin Khlebnikov,
	Vivek Trivedi

[-- Attachment #1: Type: text/plain, Size: 3259 bytes --]

Sorry, forgot to attach the patch...

								Honza

On Thu 21-02-19 11:53:10, Jan Kara wrote:
> On Wed 20-02-19 10:27:04, Orion Poplawski wrote:
> > On 2/13/19 7:54 AM, Jan Kara wrote:
> > > Hello,
> > > 
> > > When waiting for response to fanotify permission events, we currently use
> > > uninterruptible waits. That makes code simple however it can cause lots of
> > > processes to end up in uninterruptible sleep with hard reboot being the only
> > > alternative in case fanotify listener process stops responding (e.g. due to a
> > > bug in its implementation) - reported e.g. in [1]. Uninterruptible sleep also
> > > makes system hibernation fail if the listener gets frozen before the process
> > > generating fanotify permission event (as reported e.g. here [2]).
> > > 
> > > This patch set modifies fanotify so that it will use interruptible wait when
> > > waiting for fanotify permission event response. Patches are based on current
> > > Linus' tree for the ease of testing (I plan to rebase them on top of Amir's
> > > pending changes later). I have also create LTP test which stresses handling of
> > > permission events while sending processes signals to test the new code [3]
> > > Review, comments, and testing are welcome.
> > > 
> > > [1] https://lore.kernel.org/lkml/153474898224.6806.12518115530793064797.stgit@buzz/
> > > [2] https://lore.kernel.org/lkml/c1bb16b7-9eee-9cea-2c96-a512d8b3b9c7@nwra.com/
> > > [3] https://lwn.net/ml/linux-fsdevel/20190108165307.GA11259@quack2.suse.cz/
> > > 
> > > Changes since v1:
> > > * leave pr_debug() calls alone (Amir)
> > > * simplify permission event state tracking (Amir)
> > > * split some changes into separate patches (Amir)
> > > 
> > > 								Honza
> > > 
> > 
> > I backported these patches to the RHEL7 kernel and have started running that.
> > One thing I've noticed are messages like the following at login time:
> > 
> > bash: /etc/bash_completion.d/itweb-settings.bash: Interrupted system call
> > 
> > I've commented on a bash bug report here
> > https://savannah.gnu.org/support/?109159
> 
> Thanks for trying these out! Yes, so the patches can definitely lead to
> EINTR returns from open(2) if there's fanotify permission event generated
> by it and the opening process has a signal pending. Now EINTR is documented
> as a possible return from open(2) but Marko is right that in practice
> open(2) on local filesystem never returns EINTR so programs just don't
> bother handling it. Since breaking userspace is no-go, we probably cannot
> apply the change as is.
> 
> What we can do easily is to change the wait_event_interruptible() to
> wait_event_killable(). This is what we commonly do when we want to allow
> administrator to interrupt a syscall but userspace is not prepared for
> EINTR. That will at least allow processes that are waiting for fanotify
> response to be killed. So I'll do this for the coming merge window
> (attached patch). However this will not solve your problems with
> hibernation as TASK_KILLABLE tasks cannot be hibernated AFAICS. I will have
> to talk with people more knowledgeable about hibernation if there's a
> solution to this.
> 
> 								Honza
> -- 
> Jan Kara <jack@suse.com>
> SUSE Labs, CR
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

[-- Attachment #2: 0001-fanotify-Make-waits-for-fanotify-events-only-killabl.patch --]
[-- Type: text/x-patch, Size: 1642 bytes --]

From b51905798195eeb427c873643b3ada0d7bd991a7 Mon Sep 17 00:00:00 2001
From: Jan Kara <jack@suse.cz>
Date: Thu, 21 Feb 2019 11:47:23 +0100
Subject: [PATCH] fanotify: Make waits for fanotify events only killable

Making waits for response to fanotify permission events interruptible
can result in EINTR returns from open(2) or other syscalls when there's
e.g. AV software that's monitoring the file. Orion reports that e.g.
bash is complaining like:

bash: /etc/bash_completion.d/itweb-settings.bash: Interrupted system call

So for now convert the wait from interruptible to only killable one.
That is mostly invisible to userspace. Sadly this breaks hibernation
with fanotify permission events pending again but we have to put more
thought into how to fix this without regressing userspace visible
behavior.

Reported-by: Orion Poplawski <orion@nwra.com>
Signed-off-by: Jan Kara <jack@suse.cz>
---
 fs/notify/fanotify/fanotify.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/fs/notify/fanotify/fanotify.c b/fs/notify/fanotify/fanotify.c
index ff7b8a1cdfe1..6b9c27548997 100644
--- a/fs/notify/fanotify/fanotify.c
+++ b/fs/notify/fanotify/fanotify.c
@@ -92,8 +92,8 @@ static int fanotify_get_response(struct fsnotify_group *group,
 
 	pr_debug("%s: group=%p event=%p\n", __func__, group, event);
 
-	ret = wait_event_interruptible(group->fanotify_data.access_waitq,
-				       event->state == FAN_EVENT_ANSWERED);
+	ret = wait_event_killable(group->fanotify_data.access_waitq,
+				  event->state == FAN_EVENT_ANSWERED);
 	/* Signal pending? */
 	if (ret < 0) {
 		spin_lock(&group->notification_lock);
-- 
2.16.4


^ permalink raw reply related	[flat|nested] 20+ messages in thread

end of thread, other threads:[~2019-02-21 10:56 UTC | newest]

Thread overview: 20+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-02-13 14:54 [PATCH v2 0/6] fanotify: Make wait for permission event response interruptible Jan Kara
2019-02-13 14:54 ` [PATCH 1/6] fanotify: Fold dequeue_event() into process_access_response() Jan Kara
2019-02-13 19:42   ` Amir Goldstein
2019-02-13 14:54 ` [PATCH 2/6] fanotify: Move locking inside get_one_event() Jan Kara
2019-02-13 20:23   ` Amir Goldstein
2019-02-13 14:54 ` [PATCH 3/6] fsnotify: Create function to remove event from notification list Jan Kara
2019-02-13 20:23   ` Amir Goldstein
2019-02-13 14:54 ` [PATCH 4/6] fanotify: Simplify cleaning of access_list Jan Kara
2019-02-13 20:25   ` Amir Goldstein
2019-02-13 14:54 ` [PATCH 5/6] fanotify: Track permission event state Jan Kara
2019-02-13 20:33   ` Amir Goldstein
2019-02-13 14:54 ` [PATCH 6/6] fanotify: Use interruptible wait when waiting for permission events Jan Kara
2019-02-13 21:02   ` Amir Goldstein
2019-02-14 18:01     ` Jan Kara
2019-02-14 18:53       ` Amir Goldstein
2019-02-18 11:04         ` Jan Kara
2019-02-20 17:27 ` [PATCH v2 0/6] fanotify: Make wait for permission event response interruptible Orion Poplawski
2019-02-21  9:02   ` Marko Rauhamaa
2019-02-21 10:53   ` Jan Kara
2019-02-21 10:55     ` Jan Kara

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).