linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/4] fanotify: Make wait for permission event response interruptible
@ 2019-01-08 16:46 Jan Kara
  2019-01-08 16:46 ` [PATCH 1/4] fanotify: Fold dequeue_event() into process_access_response() Jan Kara
                   ` (4 more replies)
  0 siblings, 5 replies; 15+ messages in thread
From: Jan Kara @ 2019-01-08 16:46 UTC (permalink / raw)
  To: linux-fsdevel
  Cc: Amir Goldstein, Vivek Trivedi, Orion Poplawski,
	Konstantin Khlebnikov, Jan Kara

Hello,

When waiting for response to fanotify permission events, we currently use
uninterruptible waits. That makes code simple however it can cause lots of
processes to end up in uninterruptible sleep with hard reboot being the only
alternative in case fanotify listener process stops responding (e.g. due to a
bug in its implementation) - reported e.g. in [1]. Uninterruptible sleep also
makes system hibernation fail if the listener gets frozen before the process
generating fanotify permission event (as reported e.g. here [2]).

This patch set modifies fanotify so that it will use interruptible wait when
waiting for fanotify permission event response. Patches are based on current
Linus' tree for the ease of testing (I plan to rebase them on top of Amir's
pending changes later). I have also create LTP test which stresses handling of
permission events while sending processes signals to test the new code - I'll
send that separately later. Review, comments, and testing are welcome.

[1] https://lore.kernel.org/lkml/153474898224.6806.12518115530793064797.stgit@buzz/
[2] https://lore.kernel.org/lkml/c1bb16b7-9eee-9cea-2c96-a512d8b3b9c7@nwra.com/

								Honza

^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH 1/4] fanotify: Fold dequeue_event() into process_access_response()
  2019-01-08 16:46 [PATCH 0/4] fanotify: Make wait for permission event response interruptible Jan Kara
@ 2019-01-08 16:46 ` Jan Kara
  2019-01-09  6:59   ` Amir Goldstein
  2019-01-08 16:46 ` [PATCH 2/4] fanotify: Move locking inside get_one_event() Jan Kara
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 15+ messages in thread
From: Jan Kara @ 2019-01-08 16:46 UTC (permalink / raw)
  To: linux-fsdevel
  Cc: Amir Goldstein, Vivek Trivedi, Orion Poplawski,
	Konstantin Khlebnikov, Jan Kara

Fold dequeue_event() into process_access_response(). This will make
changes to use of ->response field easier.

Signed-off-by: Jan Kara <jack@suse.cz>
---
 fs/notify/fanotify/fanotify_user.c | 41 ++++++++++++--------------------------
 1 file changed, 13 insertions(+), 28 deletions(-)

diff --git a/fs/notify/fanotify/fanotify_user.c b/fs/notify/fanotify/fanotify_user.c
index 9c870b0d2b56..908ebc421d15 100644
--- a/fs/notify/fanotify/fanotify_user.c
+++ b/fs/notify/fanotify/fanotify_user.c
@@ -144,28 +144,6 @@ static int fill_event_metadata(struct fsnotify_group *group,
 	return ret;
 }
 
-static struct fanotify_perm_event_info *dequeue_event(
-				struct fsnotify_group *group, int fd)
-{
-	struct fanotify_perm_event_info *event, *return_e = NULL;
-
-	spin_lock(&group->notification_lock);
-	list_for_each_entry(event, &group->fanotify_data.access_list,
-			    fae.fse.list) {
-		if (event->fd != fd)
-			continue;
-
-		list_del_init(&event->fae.fse.list);
-		return_e = event;
-		break;
-	}
-	spin_unlock(&group->notification_lock);
-
-	pr_debug("%s: found return_re=%p\n", __func__, return_e);
-
-	return return_e;
-}
-
 static int process_access_response(struct fsnotify_group *group,
 				   struct fanotify_response *response_struct)
 {
@@ -194,14 +172,21 @@ static int process_access_response(struct fsnotify_group *group,
 	if ((response & FAN_AUDIT) && !FAN_GROUP_FLAG(group, FAN_ENABLE_AUDIT))
 		return -EINVAL;
 
-	event = dequeue_event(group, fd);
-	if (!event)
-		return -ENOENT;
+	spin_lock(&group->notification_lock);
+	list_for_each_entry(event, &group->fanotify_data.access_list,
+			    fae.fse.list) {
+		if (event->fd != fd)
+			continue;
 
-	event->response = response;
-	wake_up(&group->fanotify_data.access_waitq);
+		list_del_init(&event->fae.fse.list);
+		event->response = response;
+		spin_unlock(&group->notification_lock);
+		wake_up(&group->fanotify_data.access_waitq);
+		return 0;
+	}
+	spin_unlock(&group->notification_lock);
 
-	return 0;
+	return -ENOENT;
 }
 
 static ssize_t copy_event_to_user(struct fsnotify_group *group,
-- 
2.16.4

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH 2/4] fanotify: Move locking inside get_one_event()
  2019-01-08 16:46 [PATCH 0/4] fanotify: Make wait for permission event response interruptible Jan Kara
  2019-01-08 16:46 ` [PATCH 1/4] fanotify: Fold dequeue_event() into process_access_response() Jan Kara
@ 2019-01-08 16:46 ` Jan Kara
  2019-01-09  7:09   ` Amir Goldstein
  2019-01-08 16:46 ` [PATCH 3/4] fanotify: Track permission event state Jan Kara
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 15+ messages in thread
From: Jan Kara @ 2019-01-08 16:46 UTC (permalink / raw)
  To: linux-fsdevel
  Cc: Amir Goldstein, Vivek Trivedi, Orion Poplawski,
	Konstantin Khlebnikov, Jan Kara

get_one_event() has a single caller and that just locks
notification_lock around the call. Move locking inside get_one_event()
as that will make using ->response field for permission event state
easier.

Signed-off-by: Jan Kara <jack@suse.cz>
---
 fs/notify/fanotify/fanotify_user.c | 26 +++++++++++---------------
 1 file changed, 11 insertions(+), 15 deletions(-)

diff --git a/fs/notify/fanotify/fanotify_user.c b/fs/notify/fanotify/fanotify_user.c
index 908ebc421d15..2b2c8b8a17bd 100644
--- a/fs/notify/fanotify/fanotify_user.c
+++ b/fs/notify/fanotify/fanotify_user.c
@@ -51,25 +51,24 @@ struct kmem_cache *fanotify_perm_event_cachep __read_mostly;
  * Get an fsnotify notification event if one exists and is small
  * enough to fit in "count". Return an error pointer if the count
  * is not large enough.
- *
- * Called with the group->notification_lock held.
  */
 static struct fsnotify_event *get_one_event(struct fsnotify_group *group,
 					    size_t count)
 {
-	assert_spin_locked(&group->notification_lock);
-
-	pr_debug("%s: group=%p count=%zd\n", __func__, group, count);
+	struct fsnotify_event *event = NULL;
 
+	spin_lock(&group->notification_lock);
 	if (fsnotify_notify_queue_is_empty(group))
-		return NULL;
+		goto out;
 
-	if (FAN_EVENT_METADATA_LEN > count)
-		return ERR_PTR(-EINVAL);
-
-	/* held the notification_lock the whole time, so this is the
-	 * same event we peeked above */
-	return fsnotify_remove_first_event(group);
+	if (FAN_EVENT_METADATA_LEN > count) {
+		event = ERR_PTR(-EINVAL);
+		goto out;
+	}
+	event = fsnotify_remove_first_event(group);
+out:
+	spin_unlock(&group->notification_lock);
+	return event;
 }
 
 static int create_fd(struct fsnotify_group *group,
@@ -261,10 +260,7 @@ static ssize_t fanotify_read(struct file *file, char __user *buf,
 
 	add_wait_queue(&group->notification_waitq, &wait);
 	while (1) {
-		spin_lock(&group->notification_lock);
 		kevent = get_one_event(group, count);
-		spin_unlock(&group->notification_lock);
-
 		if (IS_ERR(kevent)) {
 			ret = PTR_ERR(kevent);
 			break;
-- 
2.16.4

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH 3/4] fanotify: Track permission event state
  2019-01-08 16:46 [PATCH 0/4] fanotify: Make wait for permission event response interruptible Jan Kara
  2019-01-08 16:46 ` [PATCH 1/4] fanotify: Fold dequeue_event() into process_access_response() Jan Kara
  2019-01-08 16:46 ` [PATCH 2/4] fanotify: Move locking inside get_one_event() Jan Kara
@ 2019-01-08 16:46 ` Jan Kara
  2019-01-09  7:22   ` Amir Goldstein
  2019-01-08 16:46 ` [PATCH 4/4] fanotify: Use interruptible wait when waiting for permission events Jan Kara
  2019-01-08 16:53 ` [PATCH 0/4] fanotify: Make wait for permission event response interruptible Jan Kara
  4 siblings, 1 reply; 15+ messages in thread
From: Jan Kara @ 2019-01-08 16:46 UTC (permalink / raw)
  To: linux-fsdevel
  Cc: Amir Goldstein, Vivek Trivedi, Orion Poplawski,
	Konstantin Khlebnikov, Jan Kara

Track whether permission event got already reported to userspace and
whether userspace already answered to the permission event in high bits
of its ->response field. Also protect stores to ->response field by
group->notification_lock. This will allow aborting wait for reply to
permission event from userspace.

Signed-off-by: Jan Kara <jack@suse.cz>
---
 fs/notify/fanotify/fanotify.c      | 16 ++++++++++------
 fs/notify/fanotify/fanotify.h      | 10 +++++++++-
 fs/notify/fanotify/fanotify_user.c | 17 ++++++++++++-----
 3 files changed, 31 insertions(+), 12 deletions(-)

diff --git a/fs/notify/fanotify/fanotify.c b/fs/notify/fanotify/fanotify.c
index 3723f3d18d20..cca13adc3a4c 100644
--- a/fs/notify/fanotify/fanotify.c
+++ b/fs/notify/fanotify/fanotify.c
@@ -62,13 +62,19 @@ static int fanotify_get_response(struct fsnotify_group *group,
 				 struct fsnotify_iter_info *iter_info)
 {
 	int ret;
+	unsigned int response;
 
+	BUILD_BUG_ON(FAN_EVENT_STATE_MASK & (FAN_AUDIT | FAN_ALLOW | FAN_DENY));
 	pr_debug("%s: group=%p event=%p\n", __func__, group, event);
 
-	wait_event(group->fanotify_data.access_waitq, event->response);
+	wait_event(group->fanotify_data.access_waitq,
+		   (event->response & FAN_EVENT_STATE_MASK) ==
+							FAN_EVENT_ANSWERED);
+
+	response = event->response & ~FAN_EVENT_STATE_MASK;
 
 	/* userspace responded, convert to something usable */
-	switch (event->response & ~FAN_AUDIT) {
+	switch (response & ~FAN_AUDIT) {
 	case FAN_ALLOW:
 		ret = 0;
 		break;
@@ -78,10 +84,8 @@ static int fanotify_get_response(struct fsnotify_group *group,
 	}
 
 	/* Check if the response should be audited */
-	if (event->response & FAN_AUDIT)
-		audit_fanotify(event->response & ~FAN_AUDIT);
-
-	event->response = 0;
+	if (response & FAN_AUDIT)
+		audit_fanotify(response & ~FAN_AUDIT);
 
 	pr_debug("%s: group=%p event=%p about to return ret=%d\n", __func__,
 		 group, event, ret);
diff --git a/fs/notify/fanotify/fanotify.h b/fs/notify/fanotify/fanotify.h
index ea05b8a401e7..954d997745c3 100644
--- a/fs/notify/fanotify/fanotify.h
+++ b/fs/notify/fanotify/fanotify.h
@@ -22,6 +22,12 @@ struct fanotify_event_info {
 	struct pid *pid;
 };
 
+/* State of permission event we store inside response field */
+#define FAN_EVENT_STATE_MASK 0xc0000000
+
+#define FAN_EVENT_REPORTED 0x40000000	/* Event reported to userspace */
+#define FAN_EVENT_ANSWERED 0x80000000	/* Event answered by userspace */
+
 /*
  * Structure for permission fanotify events. It gets allocated and freed in
  * fanotify_handle_event() since we wait there for user response. When the
@@ -31,7 +37,9 @@ struct fanotify_event_info {
  */
 struct fanotify_perm_event_info {
 	struct fanotify_event_info fae;
-	int response;	/* userspace answer to question */
+	unsigned int response;	/* userspace answer to the event. We also use
+				 * high bits of this field for recording state
+				 * of the event. */
 	int fd;		/* fd we passed to userspace for this event */
 };
 
diff --git a/fs/notify/fanotify/fanotify_user.c b/fs/notify/fanotify/fanotify_user.c
index 2b2c8b8a17bd..611c2ff50d64 100644
--- a/fs/notify/fanotify/fanotify_user.c
+++ b/fs/notify/fanotify/fanotify_user.c
@@ -50,7 +50,8 @@ struct kmem_cache *fanotify_perm_event_cachep __read_mostly;
 /*
  * Get an fsnotify notification event if one exists and is small
  * enough to fit in "count". Return an error pointer if the count
- * is not large enough.
+ * is not large enough. When permission event is dequeued, its state is
+ * updated accordingly.
  */
 static struct fsnotify_event *get_one_event(struct fsnotify_group *group,
 					    size_t count)
@@ -66,6 +67,8 @@ static struct fsnotify_event *get_one_event(struct fsnotify_group *group,
 		goto out;
 	}
 	event = fsnotify_remove_first_event(group);
+	if (fanotify_is_perm_event(event->mask))
+		FANOTIFY_PE(event)->response = FAN_EVENT_REPORTED;
 out:
 	spin_unlock(&group->notification_lock);
 	return event;
@@ -178,7 +181,7 @@ static int process_access_response(struct fsnotify_group *group,
 			continue;
 
 		list_del_init(&event->fae.fse.list);
-		event->response = response;
+		event->response = response | FAN_EVENT_ANSWERED;
 		spin_unlock(&group->notification_lock);
 		wake_up(&group->fanotify_data.access_waitq);
 		return 0;
@@ -301,7 +304,10 @@ static ssize_t fanotify_read(struct file *file, char __user *buf,
 			fsnotify_destroy_event(group, kevent);
 		} else {
 			if (ret <= 0) {
-				FANOTIFY_PE(kevent)->response = FAN_DENY;
+				spin_lock(&group->notification_lock);
+				FANOTIFY_PE(kevent)->response =
+						FAN_DENY | FAN_EVENT_ANSWERED;
+				spin_unlock(&group->notification_lock);
 				wake_up(&group->fanotify_data.access_waitq);
 			} else {
 				spin_lock(&group->notification_lock);
@@ -372,7 +378,7 @@ static int fanotify_release(struct inode *ignored, struct file *file)
 			 event);
 
 		list_del_init(&event->fae.fse.list);
-		event->response = FAN_ALLOW;
+		event->response = FAN_ALLOW | FAN_EVENT_ANSWERED;
 	}
 
 	/*
@@ -387,7 +393,8 @@ static int fanotify_release(struct inode *ignored, struct file *file)
 			fsnotify_destroy_event(group, fsn_event);
 			spin_lock(&group->notification_lock);
 		} else {
-			FANOTIFY_PE(fsn_event)->response = FAN_ALLOW;
+			FANOTIFY_PE(fsn_event)->response =
+					FAN_ALLOW | FAN_EVENT_ANSWERED;
 		}
 	}
 	spin_unlock(&group->notification_lock);
-- 
2.16.4

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH 4/4] fanotify: Use interruptible wait when waiting for permission events
  2019-01-08 16:46 [PATCH 0/4] fanotify: Make wait for permission event response interruptible Jan Kara
                   ` (2 preceding siblings ...)
  2019-01-08 16:46 ` [PATCH 3/4] fanotify: Track permission event state Jan Kara
@ 2019-01-08 16:46 ` Jan Kara
  2019-01-09  7:51   ` Amir Goldstein
  2019-01-08 16:53 ` [PATCH 0/4] fanotify: Make wait for permission event response interruptible Jan Kara
  4 siblings, 1 reply; 15+ messages in thread
From: Jan Kara @ 2019-01-08 16:46 UTC (permalink / raw)
  To: linux-fsdevel
  Cc: Amir Goldstein, Vivek Trivedi, Orion Poplawski,
	Konstantin Khlebnikov, Jan Kara

When waiting for response to fanotify permission events, we currently
use uninterruptible waits. That makes code simple however it can cause
lots of processes to end up in uninterruptible sleep with hard reboot
being the only alternative in case fanotify listener process stops
responding (e.g. due to a bug in its implementation). Uninterruptible
sleep also makes system hibernation fail if the listener gets frozen
before the process generating fanotify permission event.

Fix these problems by using interruptible sleep for waiting for response
to fanotify event. This is slightly tricky though - we have to
detect when the event got already reported to userspace as in that
case we must not free the event. Instead we push the responsibility for
freeing the event to the process that will write response to the
event.

Signed-off-by: Jan Kara <jack@suse.cz>
---
 fs/notify/fanotify/fanotify.c      | 36 ++++++++++++++++++++++++++----
 fs/notify/fanotify/fanotify.h      |  1 +
 fs/notify/fanotify/fanotify_user.c | 45 ++++++++++++++++++++++++--------------
 fs/notify/notification.c           | 20 +++++++++++------
 include/linux/fsnotify_backend.h   |  3 +++
 5 files changed, 78 insertions(+), 27 deletions(-)

diff --git a/fs/notify/fanotify/fanotify.c b/fs/notify/fanotify/fanotify.c
index cca13adc3a4c..cb7f7c5484fc 100644
--- a/fs/notify/fanotify/fanotify.c
+++ b/fs/notify/fanotify/fanotify.c
@@ -57,6 +57,13 @@ static int fanotify_merge(struct list_head *list, struct fsnotify_event *event)
 	return 0;
 }
 
+/*
+ * Wait for response to permission event. The function also takes care of
+ * freeing the permission event (or offloads that in case the wait is canceled
+ * by a signal). The function returns 0 in case access got allowed by userspace,
+ * -EPERM in case userspace disallowed the access, and -ERESTARTSYS in case
+ * the wait got interrupted by a signal.
+ */
 static int fanotify_get_response(struct fsnotify_group *group,
 				 struct fanotify_perm_event_info *event,
 				 struct fsnotify_iter_info *iter_info)
@@ -67,9 +74,31 @@ static int fanotify_get_response(struct fsnotify_group *group,
 	BUILD_BUG_ON(FAN_EVENT_STATE_MASK & (FAN_AUDIT | FAN_ALLOW | FAN_DENY));
 	pr_debug("%s: group=%p event=%p\n", __func__, group, event);
 
-	wait_event(group->fanotify_data.access_waitq,
+	ret = wait_event_interruptible(group->fanotify_data.access_waitq,
 		   (event->response & FAN_EVENT_STATE_MASK) ==
 							FAN_EVENT_ANSWERED);
+	/* Signal pending? */
+	if (ret < 0) {
+		spin_lock(&group->notification_lock);
+		/* Event reported to userspace and no answer yet? */
+		if ((event->response & FAN_EVENT_STATE_MASK) ==
+		    FAN_EVENT_REPORTED) {
+			/* Event will get freed once userspace answers to it */
+			event->response = FAN_EVENT_CANCELED;
+			spin_unlock(&group->notification_lock);
+			return ret;
+		}
+		/* Event not yet reported? Just remove it. */
+		if ((event->response & FAN_EVENT_STATE_MASK) == 0)
+			fsnotify_remove_queued_event(group, &event->fae.fse);
+		/*
+		 * Event may be also answered in case signal delivery raced
+		 * with wakeup. In that case we have nothing to do besides
+		 * freeing the event and reporting error.
+		 */
+		spin_unlock(&group->notification_lock);
+		goto out;
+	}
 
 	response = event->response & ~FAN_EVENT_STATE_MASK;
 
@@ -87,8 +116,8 @@ static int fanotify_get_response(struct fsnotify_group *group,
 	if (response & FAN_AUDIT)
 		audit_fanotify(response & ~FAN_AUDIT);
 
-	pr_debug("%s: group=%p event=%p about to return ret=%d\n", __func__,
-		 group, event, ret);
+out:
+	fsnotify_destroy_event(group, &event->fae.fse);
 	
 	return ret;
 }
@@ -259,7 +288,6 @@ static int fanotify_handle_event(struct fsnotify_group *group,
 	} else if (fanotify_is_perm_event(mask)) {
 		ret = fanotify_get_response(group, FANOTIFY_PE(fsn_event),
 					    iter_info);
-		fsnotify_destroy_event(group, fsn_event);
 	}
 finish:
 	if (fanotify_is_perm_event(mask))
diff --git a/fs/notify/fanotify/fanotify.h b/fs/notify/fanotify/fanotify.h
index 954d997745c3..57286518f420 100644
--- a/fs/notify/fanotify/fanotify.h
+++ b/fs/notify/fanotify/fanotify.h
@@ -27,6 +27,7 @@ struct fanotify_event_info {
 
 #define FAN_EVENT_REPORTED 0x40000000	/* Event reported to userspace */
 #define FAN_EVENT_ANSWERED 0x80000000	/* Event answered by userspace */
+#define FAN_EVENT_CANCELED 0xc0000000	/* Event got canceled by a signal */
 
 /*
  * Structure for permission fanotify events. It gets allocated and freed in
diff --git a/fs/notify/fanotify/fanotify_user.c b/fs/notify/fanotify/fanotify_user.c
index 611c2ff50d64..d6ae9e9338e6 100644
--- a/fs/notify/fanotify/fanotify_user.c
+++ b/fs/notify/fanotify/fanotify_user.c
@@ -146,6 +146,21 @@ static int fill_event_metadata(struct fsnotify_group *group,
 	return ret;
 }
 
+static void set_event_response(struct fsnotify_group *group,
+			       struct fanotify_perm_event_info *event,
+			       unsigned int response)
+{
+	spin_lock(&group->notification_lock);
+	/* Waiter got aborted by a signal? Free the event. */
+	if (unlikely(event->response == FAN_EVENT_CANCELED)) {
+		spin_unlock(&group->notification_lock);
+		fsnotify_destroy_event(group, &event->fae.fse);
+		return;
+	}
+	event->response = response | FAN_EVENT_ANSWERED;
+	spin_unlock(&group->notification_lock);
+}
+
 static int process_access_response(struct fsnotify_group *group,
 				   struct fanotify_response *response_struct)
 {
@@ -181,8 +196,8 @@ static int process_access_response(struct fsnotify_group *group,
 			continue;
 
 		list_del_init(&event->fae.fse.list);
-		event->response = response | FAN_EVENT_ANSWERED;
 		spin_unlock(&group->notification_lock);
+		set_event_response(group, event, response);
 		wake_up(&group->fanotify_data.access_waitq);
 		return 0;
 	}
@@ -304,10 +319,8 @@ static ssize_t fanotify_read(struct file *file, char __user *buf,
 			fsnotify_destroy_event(group, kevent);
 		} else {
 			if (ret <= 0) {
-				spin_lock(&group->notification_lock);
-				FANOTIFY_PE(kevent)->response =
-						FAN_DENY | FAN_EVENT_ANSWERED;
-				spin_unlock(&group->notification_lock);
+				set_event_response(group, FANOTIFY_PE(kevent),
+						   FAN_DENY);
 				wake_up(&group->fanotify_data.access_waitq);
 			} else {
 				spin_lock(&group->notification_lock);
@@ -357,7 +370,7 @@ static ssize_t fanotify_write(struct file *file, const char __user *buf, size_t
 static int fanotify_release(struct inode *ignored, struct file *file)
 {
 	struct fsnotify_group *group = file->private_data;
-	struct fanotify_perm_event_info *event, *next;
+	struct fanotify_perm_event_info *event;
 	struct fsnotify_event *fsn_event;
 
 	/*
@@ -372,13 +385,13 @@ static int fanotify_release(struct inode *ignored, struct file *file)
 	 * and simulate reply from userspace.
 	 */
 	spin_lock(&group->notification_lock);
-	list_for_each_entry_safe(event, next, &group->fanotify_data.access_list,
-				 fae.fse.list) {
-		pr_debug("%s: found group=%p event=%p\n", __func__, group,
-			 event);
-
+	while (!list_empty(&group->fanotify_data.access_list)) {
+		event = list_first_entry(&group->fanotify_data.access_list,
+				struct fanotify_perm_event_info, fae.fse.list);
 		list_del_init(&event->fae.fse.list);
-		event->response = FAN_ALLOW | FAN_EVENT_ANSWERED;
+		spin_unlock(&group->notification_lock);
+		set_event_response(group, event, FAN_ALLOW);
+		spin_lock(&group->notification_lock);
 	}
 
 	/*
@@ -388,14 +401,14 @@ static int fanotify_release(struct inode *ignored, struct file *file)
 	 */
 	while (!fsnotify_notify_queue_is_empty(group)) {
 		fsn_event = fsnotify_remove_first_event(group);
+		spin_unlock(&group->notification_lock);
 		if (!(fsn_event->mask & FANOTIFY_PERM_EVENTS)) {
-			spin_unlock(&group->notification_lock);
 			fsnotify_destroy_event(group, fsn_event);
-			spin_lock(&group->notification_lock);
 		} else {
-			FANOTIFY_PE(fsn_event)->response =
-					FAN_ALLOW | FAN_EVENT_ANSWERED;
+			set_event_response(group, FANOTIFY_PE(fsn_event),
+					   FAN_ALLOW);
 		}
+		spin_lock(&group->notification_lock);
 	}
 	spin_unlock(&group->notification_lock);
 
diff --git a/fs/notify/notification.c b/fs/notify/notification.c
index 3c3e36745f59..2195a5cf745a 100644
--- a/fs/notify/notification.c
+++ b/fs/notify/notification.c
@@ -141,6 +141,18 @@ int fsnotify_add_event(struct fsnotify_group *group,
 	return ret;
 }
 
+void fsnotify_remove_queued_event(struct fsnotify_group *group,
+				  struct fsnotify_event *event)
+{
+	assert_spin_locked(&group->notification_lock);
+	/*
+	 * We need to init list head for the case of overflow event so that
+	 * check in fsnotify_add_event() works
+	 */
+	list_del_init(&event->list);
+	group->q_len--;
+}
+
 /*
  * Remove and return the first event from the notification list.  It is the
  * responsibility of the caller to destroy the obtained event
@@ -155,13 +167,7 @@ struct fsnotify_event *fsnotify_remove_first_event(struct fsnotify_group *group)
 
 	event = list_first_entry(&group->notification_list,
 				 struct fsnotify_event, list);
-	/*
-	 * We need to init list head for the case of overflow event so that
-	 * check in fsnotify_add_event() works
-	 */
-	list_del_init(&event->list);
-	group->q_len--;
-
+	fsnotify_remove_queued_event(group, event);
 	return event;
 }
 
diff --git a/include/linux/fsnotify_backend.h b/include/linux/fsnotify_backend.h
index 7639774e7475..cddf839bac96 100644
--- a/include/linux/fsnotify_backend.h
+++ b/include/linux/fsnotify_backend.h
@@ -416,6 +416,9 @@ extern bool fsnotify_notify_queue_is_empty(struct fsnotify_group *group);
 extern struct fsnotify_event *fsnotify_peek_first_event(struct fsnotify_group *group);
 /* return AND dequeue the first event on the notification queue */
 extern struct fsnotify_event *fsnotify_remove_first_event(struct fsnotify_group *group);
+/* Remove event queued in the notification list */
+extern void fsnotify_remove_queued_event(struct fsnotify_group *group,
+					 struct fsnotify_event *event);
 
 /* functions used to manipulate the marks attached to inodes */
 
-- 
2.16.4

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [PATCH 0/4] fanotify: Make wait for permission event response interruptible
  2019-01-08 16:46 [PATCH 0/4] fanotify: Make wait for permission event response interruptible Jan Kara
                   ` (3 preceding siblings ...)
  2019-01-08 16:46 ` [PATCH 4/4] fanotify: Use interruptible wait when waiting for permission events Jan Kara
@ 2019-01-08 16:53 ` Jan Kara
  4 siblings, 0 replies; 15+ messages in thread
From: Jan Kara @ 2019-01-08 16:53 UTC (permalink / raw)
  To: linux-fsdevel
  Cc: Amir Goldstein, Vivek Trivedi, Orion Poplawski,
	Konstantin Khlebnikov, Jan Kara

[-- Attachment #1: Type: text/plain, Size: 1375 bytes --]

On Tue 08-01-19 17:46:07, Jan Kara wrote:
> Hello,
> 
> When waiting for response to fanotify permission events, we currently use
> uninterruptible waits. That makes code simple however it can cause lots of
> processes to end up in uninterruptible sleep with hard reboot being the only
> alternative in case fanotify listener process stops responding (e.g. due to a
> bug in its implementation) - reported e.g. in [1]. Uninterruptible sleep also
> makes system hibernation fail if the listener gets frozen before the process
> generating fanotify permission event (as reported e.g. here [2]).
> 
> This patch set modifies fanotify so that it will use interruptible wait when
> waiting for fanotify permission event response. Patches are based on current
> Linus' tree for the ease of testing (I plan to rebase them on top of Amir's
> pending changes later). I have also create LTP test which stresses handling of
> permission events while sending processes signals to test the new code - I'll
> send that separately later. Review, comments, and testing are welcome.
> 
> [1] https://lore.kernel.org/lkml/153474898224.6806.12518115530793064797.stgit@buzz/
> [2] https://lore.kernel.org/lkml/c1bb16b7-9eee-9cea-2c96-a512d8b3b9c7@nwra.com/
> 

Patch adding LTP test I have created for excercising new code is attached.

								Honza

-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

[-- Attachment #2: 0001-syscalls-fanotify12-New-test-to-test-signal-handling.patch --]
[-- Type: text/x-patch, Size: 5705 bytes --]

>From 53bdff3951596f8084127998c085d4ff94667873 Mon Sep 17 00:00:00 2001
From: Jan Kara <jack@suse.cz>
Date: Tue, 8 Jan 2019 17:01:40 +0100
Subject: [PATCH] syscalls/fanotify12: New test to test signal handling for
 permission events

Test whether kernel survives when processes waiting for response to
fanotify permission event are interrupted by a signal.

Signed-off-by: Jan Kara <jack@suse.cz>
---
 testcases/kernel/syscalls/fanotify/.gitignore   |   1 +
 testcases/kernel/syscalls/fanotify/fanotify12.c | 217 ++++++++++++++++++++++++
 2 files changed, 218 insertions(+)
 create mode 100644 testcases/kernel/syscalls/fanotify/fanotify12.c

diff --git a/testcases/kernel/syscalls/fanotify/.gitignore b/testcases/kernel/syscalls/fanotify/.gitignore
index 3818e241ffcf..d583d459b4aa 100644
--- a/testcases/kernel/syscalls/fanotify/.gitignore
+++ b/testcases/kernel/syscalls/fanotify/.gitignore
@@ -9,3 +9,4 @@
 /fanotify09
 /fanotify10
 /fanotify11
+/fanotify12
diff --git a/testcases/kernel/syscalls/fanotify/fanotify12.c b/testcases/kernel/syscalls/fanotify/fanotify12.c
new file mode 100644
index 000000000000..462c85031c0b
--- /dev/null
+++ b/testcases/kernel/syscalls/fanotify/fanotify12.c
@@ -0,0 +1,217 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/*
+ * Copyright (c) 2019 SUSE.  All Rights Reserved.
+ *
+ * Started by Jan Kara <jack@suse.cz>
+ *
+ * DESCRIPTION
+ *     Check how fanotify permission events deal with signals.
+ */
+#define _GNU_SOURCE
+#include "config.h"
+
+#include <stdio.h>
+#include <unistd.h>
+#include <stdlib.h>
+#include <sys/stat.h>
+#include <sys/types.h>
+#include <sys/fcntl.h>
+#include <sys/wait.h>
+#include <errno.h>
+#include <string.h>
+#include <signal.h>
+#include <sys/syscall.h>
+#include "tst_test.h"
+#include "lapi/syscalls.h"
+#include "fanotify.h"
+
+#if defined(HAVE_SYS_FANOTIFY_H)
+#include <sys/fanotify.h>
+
+#define BUF_SIZE 256
+static char fname[BUF_SIZE];
+static char buf[BUF_SIZE];
+static volatile int fd_notify;
+
+/* Number of children we start */
+#define MAX_CHILDREN 16
+static pid_t child_pid[MAX_CHILDREN];
+
+/* Number of events we process before stopping */
+#define MAX_EVENTS 10000
+
+static void generate_events(void)
+{
+	int fd;
+
+	/*
+	 * generate sequence of events
+	 */
+	if ((fd = open(fname, O_RDWR | O_CREAT, 0700)) == -1)
+		exit(1);
+
+	/* Run until killed... */
+	while (1) {
+		lseek(fd, 0, SEEK_SET);
+		if (read(fd, buf, BUF_SIZE) == -1)
+			exit(3);
+	}
+}
+
+static void usrhandler(int sig)
+{
+	/* Do nothing */
+}
+
+static void send_signals(void)
+{
+	while (1) {
+		usleep(random() % 200);
+		SAFE_KILL(child_pid[random() % (MAX_CHILDREN-1)], SIGUSR1);
+	}
+}
+
+static void run_children(void)
+{
+	int i;
+
+	for (i = 0; i < MAX_CHILDREN; i++) {
+		child_pid[i] = SAFE_FORK();
+		if (!child_pid[i]) {
+			/* Child will generate events now */
+			close(fd_notify);
+			if (i == MAX_CHILDREN - 1) {
+				send_signals();
+			} else {
+				SAFE_SIGNAL(SIGUSR1, usrhandler);
+				generate_events();
+			}
+			exit(0);
+		}
+	}
+}
+
+static int stop_children(void)
+{
+	int child_ret;
+	int i, ret = 0;
+
+	for (i = 0; i < MAX_CHILDREN; i++)
+		SAFE_KILL(child_pid[i], SIGKILL);
+
+	for (i = 0; i < MAX_CHILDREN; i++) {
+		SAFE_WAITPID(child_pid[i], &child_ret, 0);
+		if (!WIFSIGNALED(child_ret))
+			ret = 1;
+	}
+
+	return ret;
+}
+
+static int setup_instance(void)
+{
+	int fd;
+
+	fd = SAFE_FANOTIFY_INIT(FAN_CLASS_CONTENT, O_RDONLY);
+
+	if (fanotify_mark(fd, FAN_MARK_ADD, FAN_ACCESS_PERM, AT_FDCWD,
+			  fname) < 0) {
+		close(fd);
+		if (errno == EINVAL) {
+			tst_brk(TCONF | TERRNO,
+				"CONFIG_FANOTIFY_ACCESS_PERMISSIONS not "
+				"configured in kernel?");
+		} else {
+			tst_brk(TBROK | TERRNO,
+				"fanotify_mark (%d, FAN_MARK_ADD, FAN_ACCESS_PERM, "
+				"AT_FDCWD, %s) failed.", fd, fname);
+		}
+	}
+
+	return fd;
+}
+
+static void handle_fanotify_events(void)
+{
+	int events = 0;
+
+	/*
+	 * check events
+	 */
+	while (events < MAX_EVENTS) {
+		struct fanotify_event_metadata event;
+		struct fanotify_response resp;
+
+		/* Sleep randomly to give time for signal to be delivered */
+		usleep(random() % 1000);
+
+		/* Get more events */
+		SAFE_READ(1, fd_notify, &event, sizeof(event));
+
+		if (event.mask != FAN_ACCESS_PERM) {
+			tst_res(TFAIL,
+				"got event: mask=%llx (expected %llx) "
+				"pid=%u fd=%d",
+				(unsigned long long)event.mask,
+				(unsigned long long)FAN_ACCESS_PERM,
+				(unsigned)event.pid, event.fd);
+			break;
+		}
+
+		/* Sleep randomly to give time for signal to be delivered */
+		usleep(random() % 1000);
+		/* Write response to permission event */
+		resp.fd = event.fd;
+		resp.response = FAN_ALLOW;
+		SAFE_WRITE(1, fd_notify, &resp, sizeof(resp));
+		SAFE_CLOSE(event.fd);
+		events++;
+	}
+}
+
+static void test_fanotify(void)
+{
+	int ret;
+
+	fd_notify = setup_instance();
+	run_children();
+	handle_fanotify_events();
+
+	/*
+	 * Now destroy the fanotify instance while there are permission
+	 * events at various stages of processing. This may provoke
+	 * kernel hangs or crashes.
+	 */
+	SAFE_CLOSE(fd_notify);
+
+	ret = stop_children();
+	if (ret)
+		tst_res(TFAIL, "child exited for unexpected reason");
+	else
+		tst_res(TPASS, "all children exited successfully");
+}
+
+static void setup(void)
+{
+	sprintf(fname, "fname_%d", getpid());
+	SAFE_FILE_PRINTF(fname, "%s", fname);
+}
+
+static void cleanup(void)
+{
+	if (fd_notify > 0)
+		SAFE_CLOSE(fd_notify);
+}
+
+static struct tst_test test = {
+	.test_all = test_fanotify,
+	.setup = setup,
+	.cleanup = cleanup,
+	.needs_tmpdir = 1,
+	.forks_child = 1,
+	.needs_root = 1,
+};
+
+#else
+	TST_TEST_TCONF("system doesn't have required fanotify support");
+#endif
-- 
2.16.4


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [PATCH 1/4] fanotify: Fold dequeue_event() into process_access_response()
  2019-01-08 16:46 ` [PATCH 1/4] fanotify: Fold dequeue_event() into process_access_response() Jan Kara
@ 2019-01-09  6:59   ` Amir Goldstein
  0 siblings, 0 replies; 15+ messages in thread
From: Amir Goldstein @ 2019-01-09  6:59 UTC (permalink / raw)
  To: Jan Kara
  Cc: linux-fsdevel, Vivek Trivedi, Orion Poplawski, Konstantin Khlebnikov

On Tue, Jan 8, 2019 at 6:46 PM Jan Kara <jack@suse.cz> wrote:
>
> Fold dequeue_event() into process_access_response(). This will make
> changes to use of ->response field easier.
>
> Signed-off-by: Jan Kara <jack@suse.cz>
> ---

Looks ok

Thanks,
Amir.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 2/4] fanotify: Move locking inside get_one_event()
  2019-01-08 16:46 ` [PATCH 2/4] fanotify: Move locking inside get_one_event() Jan Kara
@ 2019-01-09  7:09   ` Amir Goldstein
  2019-01-09  9:12     ` Jan Kara
  0 siblings, 1 reply; 15+ messages in thread
From: Amir Goldstein @ 2019-01-09  7:09 UTC (permalink / raw)
  To: Jan Kara
  Cc: linux-fsdevel, Vivek Trivedi, Orion Poplawski, Konstantin Khlebnikov

On Tue, Jan 8, 2019 at 6:46 PM Jan Kara <jack@suse.cz> wrote:
>
> get_one_event() has a single caller and that just locks
> notification_lock around the call. Move locking inside get_one_event()
> as that will make using ->response field for permission event state
> easier.
>
> Signed-off-by: Jan Kara <jack@suse.cz>
> ---
>  fs/notify/fanotify/fanotify_user.c | 26 +++++++++++---------------
>  1 file changed, 11 insertions(+), 15 deletions(-)
>
> diff --git a/fs/notify/fanotify/fanotify_user.c b/fs/notify/fanotify/fanotify_user.c
> index 908ebc421d15..2b2c8b8a17bd 100644
> --- a/fs/notify/fanotify/fanotify_user.c
> +++ b/fs/notify/fanotify/fanotify_user.c
> @@ -51,25 +51,24 @@ struct kmem_cache *fanotify_perm_event_cachep __read_mostly;
>   * Get an fsnotify notification event if one exists and is small
>   * enough to fit in "count". Return an error pointer if the count
>   * is not large enough.
> - *
> - * Called with the group->notification_lock held.
>   */
>  static struct fsnotify_event *get_one_event(struct fsnotify_group *group,
>                                             size_t count)
>  {
> -       assert_spin_locked(&group->notification_lock);
> -
> -       pr_debug("%s: group=%p count=%zd\n", __func__, group, count);

I see you are slowly cleaning up pr_debug calls. Any particular reason?
Out of all the spam pr_debug calls in the code, this one looks rather useful.

Otherwise, looks ok.

Thanks,
Amir.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 3/4] fanotify: Track permission event state
  2019-01-08 16:46 ` [PATCH 3/4] fanotify: Track permission event state Jan Kara
@ 2019-01-09  7:22   ` Amir Goldstein
  2019-01-09  9:11     ` Jan Kara
  0 siblings, 1 reply; 15+ messages in thread
From: Amir Goldstein @ 2019-01-09  7:22 UTC (permalink / raw)
  To: Jan Kara
  Cc: linux-fsdevel, Vivek Trivedi, Orion Poplawski, Konstantin Khlebnikov

On Tue, Jan 8, 2019 at 6:46 PM Jan Kara <jack@suse.cz> wrote:
>
> Track whether permission event got already reported to userspace and
> whether userspace already answered to the permission event in high bits
> of its ->response field. Also protect stores to ->response field by
> group->notification_lock. This will allow aborting wait for reply to
> permission event from userspace.
>
> Signed-off-by: Jan Kara <jack@suse.cz>
> ---
>  fs/notify/fanotify/fanotify.c      | 16 ++++++++++------
>  fs/notify/fanotify/fanotify.h      | 10 +++++++++-
>  fs/notify/fanotify/fanotify_user.c | 17 ++++++++++++-----
>  3 files changed, 31 insertions(+), 12 deletions(-)
>
> diff --git a/fs/notify/fanotify/fanotify.c b/fs/notify/fanotify/fanotify.c
> index 3723f3d18d20..cca13adc3a4c 100644
> --- a/fs/notify/fanotify/fanotify.c
> +++ b/fs/notify/fanotify/fanotify.c
> @@ -62,13 +62,19 @@ static int fanotify_get_response(struct fsnotify_group *group,
>                                  struct fsnotify_iter_info *iter_info)
>  {
>         int ret;
> +       unsigned int response;
>
> +       BUILD_BUG_ON(FAN_EVENT_STATE_MASK & (FAN_AUDIT | FAN_ALLOW | FAN_DENY));
>         pr_debug("%s: group=%p event=%p\n", __func__, group, event);
>
> -       wait_event(group->fanotify_data.access_waitq, event->response);
> +       wait_event(group->fanotify_data.access_waitq,
> +                  (event->response & FAN_EVENT_STATE_MASK) ==
> +                                                       FAN_EVENT_ANSWERED);
> +
> +       response = event->response & ~FAN_EVENT_STATE_MASK;
>
>         /* userspace responded, convert to something usable */
> -       switch (event->response & ~FAN_AUDIT) {
> +       switch (response & ~FAN_AUDIT) {
>         case FAN_ALLOW:
>                 ret = 0;
>                 break;
> @@ -78,10 +84,8 @@ static int fanotify_get_response(struct fsnotify_group *group,
>         }
>
>         /* Check if the response should be audited */
> -       if (event->response & FAN_AUDIT)
> -               audit_fanotify(event->response & ~FAN_AUDIT);
> -
> -       event->response = 0;
> +       if (response & FAN_AUDIT)
> +               audit_fanotify(response & ~FAN_AUDIT);
>
>         pr_debug("%s: group=%p event=%p about to return ret=%d\n", __func__,
>                  group, event, ret);
> diff --git a/fs/notify/fanotify/fanotify.h b/fs/notify/fanotify/fanotify.h
> index ea05b8a401e7..954d997745c3 100644
> --- a/fs/notify/fanotify/fanotify.h
> +++ b/fs/notify/fanotify/fanotify.h
> @@ -22,6 +22,12 @@ struct fanotify_event_info {
>         struct pid *pid;
>  };
>
> +/* State of permission event we store inside response field */
> +#define FAN_EVENT_STATE_MASK 0xc0000000
> +
> +#define FAN_EVENT_REPORTED 0x40000000  /* Event reported to userspace */
> +#define FAN_EVENT_ANSWERED 0x80000000  /* Event answered by userspace */
> +
>  /*
>   * Structure for permission fanotify events. It gets allocated and freed in
>   * fanotify_handle_event() since we wait there for user response. When the
> @@ -31,7 +37,9 @@ struct fanotify_event_info {
>   */
>  struct fanotify_perm_event_info {
>         struct fanotify_event_info fae;
> -       int response;   /* userspace answer to question */
> +       unsigned int response;  /* userspace answer to the event. We also use
> +                                * high bits of this field for recording state
> +                                * of the event. */

What's wrong with:

        short response;   /* userspace answer to question */
        short state;   /* the state of this event */

Will make for a much simpler code/patch IMO.

Thanks,
Amir.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 4/4] fanotify: Use interruptible wait when waiting for permission events
  2019-01-08 16:46 ` [PATCH 4/4] fanotify: Use interruptible wait when waiting for permission events Jan Kara
@ 2019-01-09  7:51   ` Amir Goldstein
  2019-01-09  9:23     ` Jan Kara
  0 siblings, 1 reply; 15+ messages in thread
From: Amir Goldstein @ 2019-01-09  7:51 UTC (permalink / raw)
  To: Jan Kara
  Cc: linux-fsdevel, Vivek Trivedi, Orion Poplawski, Konstantin Khlebnikov

On Tue, Jan 8, 2019 at 6:46 PM Jan Kara <jack@suse.cz> wrote:
>
> When waiting for response to fanotify permission events, we currently
> use uninterruptible waits. That makes code simple however it can cause
> lots of processes to end up in uninterruptible sleep with hard reboot
> being the only alternative in case fanotify listener process stops
> responding (e.g. due to a bug in its implementation). Uninterruptible
> sleep also makes system hibernation fail if the listener gets frozen
> before the process generating fanotify permission event.
>
> Fix these problems by using interruptible sleep for waiting for response
> to fanotify event. This is slightly tricky though - we have to
> detect when the event got already reported to userspace as in that
> case we must not free the event.

> Instead we push the responsibility for
> freeing the event to the process that will write response to the
> event.

It a bit hard to follow this patch. Can we make the patch that
shifts responsibility to free the event a separate patch?
fsnotify_remove_queued_event() helper can tag along with it
or separate patch as you wish.

>
> Signed-off-by: Jan Kara <jack@suse.cz>
> ---

I think it would be good to add reported-by tand the links you provided
in cover letter in this patch as well.

...

>
> @@ -87,8 +116,8 @@ static int fanotify_get_response(struct fsnotify_group *group,
>         if (response & FAN_AUDIT)
>                 audit_fanotify(response & ~FAN_AUDIT);
>
> -       pr_debug("%s: group=%p event=%p about to return ret=%d\n", __func__,
> -                group, event, ret);

Looks useful. Why remove?

...

>
> +static void set_event_response(struct fsnotify_group *group,
> +                              struct fanotify_perm_event_info *event,
> +                              unsigned int response)
> +{
> +       spin_lock(&group->notification_lock);
> +       /* Waiter got aborted by a signal? Free the event. */
> +       if (unlikely(event->response == FAN_EVENT_CANCELED)) {
> +               spin_unlock(&group->notification_lock);
> +               fsnotify_destroy_event(group, &event->fae.fse);
> +               return;
> +       }
> +       event->response = response | FAN_EVENT_ANSWERED;
> +       spin_unlock(&group->notification_lock);
> +}
> +

I don't understand. AFAICS, all callers of set_event_response()
call immediately after spin_unlock(&group->notification_lock),
without any user wait involved.
I think it makes more sense for set_event_response() to assert the
lock than it is to take it.

Am I missing anything?

Thanks,
Amir.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 3/4] fanotify: Track permission event state
  2019-01-09  7:22   ` Amir Goldstein
@ 2019-01-09  9:11     ` Jan Kara
  0 siblings, 0 replies; 15+ messages in thread
From: Jan Kara @ 2019-01-09  9:11 UTC (permalink / raw)
  To: Amir Goldstein
  Cc: Jan Kara, linux-fsdevel, Vivek Trivedi, Orion Poplawski,
	Konstantin Khlebnikov

On Wed 09-01-19 09:22:19, Amir Goldstein wrote:
> > @@ -31,7 +37,9 @@ struct fanotify_event_info {
> >   */
> >  struct fanotify_perm_event_info {
> >         struct fanotify_event_info fae;
> > -       int response;   /* userspace answer to question */
> > +       unsigned int response;  /* userspace answer to the event. We also use
> > +                                * high bits of this field for recording state
> > +                                * of the event. */
> 
> What's wrong with:
> 
>         short response;   /* userspace answer to question */
>         short state;   /* the state of this event */
> 
> Will make for a much simpler code/patch IMO.

I guess you're right. I'm somewhat cautious about types shorter than int as
I do read 'state' locklessly and for shorter-than-int types compiler can do
fancy things like read-modify-write cycle of a full word when storing some
value in either response or state. But I think this particular case should
be fine.

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 2/4] fanotify: Move locking inside get_one_event()
  2019-01-09  7:09   ` Amir Goldstein
@ 2019-01-09  9:12     ` Jan Kara
  0 siblings, 0 replies; 15+ messages in thread
From: Jan Kara @ 2019-01-09  9:12 UTC (permalink / raw)
  To: Amir Goldstein
  Cc: Jan Kara, linux-fsdevel, Vivek Trivedi, Orion Poplawski,
	Konstantin Khlebnikov

On Wed 09-01-19 09:09:20, Amir Goldstein wrote:
> On Tue, Jan 8, 2019 at 6:46 PM Jan Kara <jack@suse.cz> wrote:
> >
> > get_one_event() has a single caller and that just locks
> > notification_lock around the call. Move locking inside get_one_event()
> > as that will make using ->response field for permission event state
> > easier.
> >
> > Signed-off-by: Jan Kara <jack@suse.cz>
> > ---
> >  fs/notify/fanotify/fanotify_user.c | 26 +++++++++++---------------
> >  1 file changed, 11 insertions(+), 15 deletions(-)
> >
> > diff --git a/fs/notify/fanotify/fanotify_user.c b/fs/notify/fanotify/fanotify_user.c
> > index 908ebc421d15..2b2c8b8a17bd 100644
> > --- a/fs/notify/fanotify/fanotify_user.c
> > +++ b/fs/notify/fanotify/fanotify_user.c
> > @@ -51,25 +51,24 @@ struct kmem_cache *fanotify_perm_event_cachep __read_mostly;
> >   * Get an fsnotify notification event if one exists and is small
> >   * enough to fit in "count". Return an error pointer if the count
> >   * is not large enough.
> > - *
> > - * Called with the group->notification_lock held.
> >   */
> >  static struct fsnotify_event *get_one_event(struct fsnotify_group *group,
> >                                             size_t count)
> >  {
> > -       assert_spin_locked(&group->notification_lock);
> > -
> > -       pr_debug("%s: group=%p count=%zd\n", __func__, group, count);
> 
> I see you are slowly cleaning up pr_debug calls. Any particular reason?
> Out of all the spam pr_debug calls in the code, this one looks rather useful.

I didn't find it useful but if you think I it, I'll leave it in.

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 4/4] fanotify: Use interruptible wait when waiting for permission events
  2019-01-09  7:51   ` Amir Goldstein
@ 2019-01-09  9:23     ` Jan Kara
  2019-02-12 15:40       ` Orion Poplawski
  0 siblings, 1 reply; 15+ messages in thread
From: Jan Kara @ 2019-01-09  9:23 UTC (permalink / raw)
  To: Amir Goldstein
  Cc: Jan Kara, linux-fsdevel, Vivek Trivedi, Orion Poplawski,
	Konstantin Khlebnikov

On Wed 09-01-19 09:51:08, Amir Goldstein wrote:
> On Tue, Jan 8, 2019 at 6:46 PM Jan Kara <jack@suse.cz> wrote:
> >
> > When waiting for response to fanotify permission events, we currently
> > use uninterruptible waits. That makes code simple however it can cause
> > lots of processes to end up in uninterruptible sleep with hard reboot
> > being the only alternative in case fanotify listener process stops
> > responding (e.g. due to a bug in its implementation). Uninterruptible
> > sleep also makes system hibernation fail if the listener gets frozen
> > before the process generating fanotify permission event.
> >
> > Fix these problems by using interruptible sleep for waiting for response
> > to fanotify event. This is slightly tricky though - we have to
> > detect when the event got already reported to userspace as in that
> > case we must not free the event.
> 
> > Instead we push the responsibility for
> > freeing the event to the process that will write response to the
> > event.
> 
> It a bit hard to follow this patch. Can we make the patch that
> shifts responsibility to free the event a separate patch?
> fsnotify_remove_queued_event() helper can tag along with it
> or separate patch as you wish.

I'll have a look how to best split this. It should be possible.

> > Signed-off-by: Jan Kara <jack@suse.cz>
> > ---
> 
> I think it would be good to add reported-by tand the links you provided
> in cover letter in this patch as well.

Good point. Will do.

> >
> > @@ -87,8 +116,8 @@ static int fanotify_get_response(struct fsnotify_group *group,
> >         if (response & FAN_AUDIT)
> >                 audit_fanotify(response & ~FAN_AUDIT);
> >
> > -       pr_debug("%s: group=%p event=%p about to return ret=%d\n", __func__,
> > -                group, event, ret);
> 
> Looks useful. Why remove?

OK, I'll leave it alone.

> > +static void set_event_response(struct fsnotify_group *group,
> > +                              struct fanotify_perm_event_info *event,
> > +                              unsigned int response)
> > +{
> > +       spin_lock(&group->notification_lock);
> > +       /* Waiter got aborted by a signal? Free the event. */
> > +       if (unlikely(event->response == FAN_EVENT_CANCELED)) {
> > +               spin_unlock(&group->notification_lock);
> > +               fsnotify_destroy_event(group, &event->fae.fse);
> > +               return;
> > +       }
> > +       event->response = response | FAN_EVENT_ANSWERED;
> > +       spin_unlock(&group->notification_lock);
> > +}
> > +
> 
> I don't understand. AFAICS, all callers of set_event_response()
> call immediately after spin_unlock(&group->notification_lock),
> without any user wait involved.
> I think it makes more sense for set_event_response() to assert the
> lock than it is to take it.
> 
> Am I missing anything?

In case we need to destroy the event, we want to drop the
notification_lock. So to avoid a situation where set_event_response() drops
a lock it did not acquire (which is not very intuitive), I've decided for
the less efficient scheme of dropping and retaking the lock.

But maybe with better function name and some asserts, we could live with
dropping the lock inside the function without taking it.

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 4/4] fanotify: Use interruptible wait when waiting for permission events
  2019-01-09  9:23     ` Jan Kara
@ 2019-02-12 15:40       ` Orion Poplawski
  2019-02-13 14:56         ` Jan Kara
  0 siblings, 1 reply; 15+ messages in thread
From: Orion Poplawski @ 2019-02-12 15:40 UTC (permalink / raw)
  To: Jan Kara, Amir Goldstein
  Cc: linux-fsdevel, Vivek Trivedi, Konstantin Khlebnikov

[-- Attachment #1: Type: text/plain, Size: 3775 bytes --]

On 1/9/19 2:23 AM, Jan Kara wrote:
> On Wed 09-01-19 09:51:08, Amir Goldstein wrote:
>> On Tue, Jan 8, 2019 at 6:46 PM Jan Kara <jack@suse.cz> wrote:
>>>
>>> When waiting for response to fanotify permission events, we currently
>>> use uninterruptible waits. That makes code simple however it can cause
>>> lots of processes to end up in uninterruptible sleep with hard reboot
>>> being the only alternative in case fanotify listener process stops
>>> responding (e.g. due to a bug in its implementation). Uninterruptible
>>> sleep also makes system hibernation fail if the listener gets frozen
>>> before the process generating fanotify permission event.
>>>
>>> Fix these problems by using interruptible sleep for waiting for response
>>> to fanotify event. This is slightly tricky though - we have to
>>> detect when the event got already reported to userspace as in that
>>> case we must not free the event.
>>
>>> Instead we push the responsibility for
>>> freeing the event to the process that will write response to the
>>> event.
>>
>> It a bit hard to follow this patch. Can we make the patch that
>> shifts responsibility to free the event a separate patch?
>> fsnotify_remove_queued_event() helper can tag along with it
>> or separate patch as you wish.
> 
> I'll have a look how to best split this. It should be possible.
> 
>>> Signed-off-by: Jan Kara <jack@suse.cz>
>>> ---
>>
>> I think it would be good to add reported-by tand the links you provided
>> in cover letter in this patch as well.
> 
> Good point. Will do.
> 
>>>
>>> @@ -87,8 +116,8 @@ static int fanotify_get_response(struct fsnotify_group *group,
>>>         if (response & FAN_AUDIT)
>>>                 audit_fanotify(response & ~FAN_AUDIT);
>>>
>>> -       pr_debug("%s: group=%p event=%p about to return ret=%d\n", __func__,
>>> -                group, event, ret);
>>
>> Looks useful. Why remove?
> 
> OK, I'll leave it alone.
> 
>>> +static void set_event_response(struct fsnotify_group *group,
>>> +                              struct fanotify_perm_event_info *event,
>>> +                              unsigned int response)
>>> +{
>>> +       spin_lock(&group->notification_lock);
>>> +       /* Waiter got aborted by a signal? Free the event. */
>>> +       if (unlikely(event->response == FAN_EVENT_CANCELED)) {
>>> +               spin_unlock(&group->notification_lock);
>>> +               fsnotify_destroy_event(group, &event->fae.fse);
>>> +               return;
>>> +       }
>>> +       event->response = response | FAN_EVENT_ANSWERED;
>>> +       spin_unlock(&group->notification_lock);
>>> +}
>>> +
>>
>> I don't understand. AFAICS, all callers of set_event_response()
>> call immediately after spin_unlock(&group->notification_lock),
>> without any user wait involved.
>> I think it makes more sense for set_event_response() to assert the
>> lock than it is to take it.
>>
>> Am I missing anything?
> 
> In case we need to destroy the event, we want to drop the
> notification_lock. So to avoid a situation where set_event_response() drops
> a lock it did not acquire (which is not very intuitive), I've decided for
> the less efficient scheme of dropping and retaking the lock.
> 
> But maybe with better function name and some asserts, we could live with
> dropping the lock inside the function without taking it.
> 
> 								Honza
> 


Any more progress here?  Thanks for your work on this, it's a real thorn in
our side here.


-- 
Orion Poplawski
Manager of NWRA Technical Systems          720-772-5637
NWRA, Boulder/CoRA Office             FAX: 303-415-9702
3380 Mitchell Lane                       orion@nwra.com
Boulder, CO 80301                 https://www.nwra.com/


[-- Attachment #2: S/MIME Cryptographic Signature --]
[-- Type: application/pkcs7-signature, Size: 3799 bytes --]

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 4/4] fanotify: Use interruptible wait when waiting for permission events
  2019-02-12 15:40       ` Orion Poplawski
@ 2019-02-13 14:56         ` Jan Kara
  0 siblings, 0 replies; 15+ messages in thread
From: Jan Kara @ 2019-02-13 14:56 UTC (permalink / raw)
  To: Orion Poplawski
  Cc: Jan Kara, Amir Goldstein, linux-fsdevel, Vivek Trivedi,
	Konstantin Khlebnikov

On Tue 12-02-19 08:40:13, Orion Poplawski wrote:
> On 1/9/19 2:23 AM, Jan Kara wrote:
> > On Wed 09-01-19 09:51:08, Amir Goldstein wrote:
> >> On Tue, Jan 8, 2019 at 6:46 PM Jan Kara <jack@suse.cz> wrote:
> >>>
> >>> When waiting for response to fanotify permission events, we currently
> >>> use uninterruptible waits. That makes code simple however it can cause
> >>> lots of processes to end up in uninterruptible sleep with hard reboot
> >>> being the only alternative in case fanotify listener process stops
> >>> responding (e.g. due to a bug in its implementation). Uninterruptible
> >>> sleep also makes system hibernation fail if the listener gets frozen
> >>> before the process generating fanotify permission event.
> >>>
> >>> Fix these problems by using interruptible sleep for waiting for response
> >>> to fanotify event. This is slightly tricky though - we have to
> >>> detect when the event got already reported to userspace as in that
> >>> case we must not free the event.
> >>
> >>> Instead we push the responsibility for
> >>> freeing the event to the process that will write response to the
> >>> event.
> >>
> >> It a bit hard to follow this patch. Can we make the patch that
> >> shifts responsibility to free the event a separate patch?
> >> fsnotify_remove_queued_event() helper can tag along with it
> >> or separate patch as you wish.
> > 
> > I'll have a look how to best split this. It should be possible.
> > 
> >>> Signed-off-by: Jan Kara <jack@suse.cz>
> >>> ---
> >>
> >> I think it would be good to add reported-by tand the links you provided
> >> in cover letter in this patch as well.
> > 
> > Good point. Will do.
> > 
> >>>
> >>> @@ -87,8 +116,8 @@ static int fanotify_get_response(struct fsnotify_group *group,
> >>>         if (response & FAN_AUDIT)
> >>>                 audit_fanotify(response & ~FAN_AUDIT);
> >>>
> >>> -       pr_debug("%s: group=%p event=%p about to return ret=%d\n", __func__,
> >>> -                group, event, ret);
> >>
> >> Looks useful. Why remove?
> > 
> > OK, I'll leave it alone.
> > 
> >>> +static void set_event_response(struct fsnotify_group *group,
> >>> +                              struct fanotify_perm_event_info *event,
> >>> +                              unsigned int response)
> >>> +{
> >>> +       spin_lock(&group->notification_lock);
> >>> +       /* Waiter got aborted by a signal? Free the event. */
> >>> +       if (unlikely(event->response == FAN_EVENT_CANCELED)) {
> >>> +               spin_unlock(&group->notification_lock);
> >>> +               fsnotify_destroy_event(group, &event->fae.fse);
> >>> +               return;
> >>> +       }
> >>> +       event->response = response | FAN_EVENT_ANSWERED;
> >>> +       spin_unlock(&group->notification_lock);
> >>> +}
> >>> +
> >>
> >> I don't understand. AFAICS, all callers of set_event_response()
> >> call immediately after spin_unlock(&group->notification_lock),
> >> without any user wait involved.
> >> I think it makes more sense for set_event_response() to assert the
> >> lock than it is to take it.
> >>
> >> Am I missing anything?
> > 
> > In case we need to destroy the event, we want to drop the
> > notification_lock. So to avoid a situation where set_event_response() drops
> > a lock it did not acquire (which is not very intuitive), I've decided for
> > the less efficient scheme of dropping and retaking the lock.
> > 
> > But maybe with better function name and some asserts, we could live with
> > dropping the lock inside the function without taking it.
> > 
> > 								Honza
> > 
> 
> 
> Any more progress here?  Thanks for your work on this, it's a real thorn in
> our side here.

I've sent v2 of the patches today to push things further.

								Honza


-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2019-02-13 14:56 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-01-08 16:46 [PATCH 0/4] fanotify: Make wait for permission event response interruptible Jan Kara
2019-01-08 16:46 ` [PATCH 1/4] fanotify: Fold dequeue_event() into process_access_response() Jan Kara
2019-01-09  6:59   ` Amir Goldstein
2019-01-08 16:46 ` [PATCH 2/4] fanotify: Move locking inside get_one_event() Jan Kara
2019-01-09  7:09   ` Amir Goldstein
2019-01-09  9:12     ` Jan Kara
2019-01-08 16:46 ` [PATCH 3/4] fanotify: Track permission event state Jan Kara
2019-01-09  7:22   ` Amir Goldstein
2019-01-09  9:11     ` Jan Kara
2019-01-08 16:46 ` [PATCH 4/4] fanotify: Use interruptible wait when waiting for permission events Jan Kara
2019-01-09  7:51   ` Amir Goldstein
2019-01-09  9:23     ` Jan Kara
2019-02-12 15:40       ` Orion Poplawski
2019-02-13 14:56         ` Jan Kara
2019-01-08 16:53 ` [PATCH 0/4] fanotify: Make wait for permission event response interruptible Jan Kara

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).