* [PATCH v2 1/3] epoll: fix possible lost wakeup on epoll_ctl() path
@ 2020-02-10  9:41 Roman Penyaev
  2020-02-10  9:41 ` [PATCH v2 2/3] epoll: ep->wq can be woken up unlocked in certain cases Roman Penyaev
  2020-02-10  9:41 ` [PATCH v2 3/3] kselftest: introduce new epoll test case Roman Penyaev
  0 siblings, 2 replies; 5+ messages in thread
From: Roman Penyaev @ 2020-02-10  9:41 UTC (permalink / raw)
  Cc: Roman Penyaev, Max Neunhoeffer, Jakub Kicinski,
	Christopher Kohlhoff, Davidlohr Bueso, Jason Baron,
	Andrew Morton, linux-fsdevel, linux-kernel

This fixes a possible lost wakeup introduced by commit a218cc491420.
Originally modifications to ep->wq were serialized by ep->wq.lock, but
in commit a218cc491420 a new rwlock was introduced in order to relax
the fd event path, i.e. callers of the ep_poll_callback() function.

After that change ep_modify() and ep_insert() (both called on the
epoll_ctl() path) were switched to ep->lock, but ep_poll()
(epoll_wait) still used ep->wq.lock for wait queue list modification.

The bug does not lead to any wait queue list corruption, because the
wake up path and the list modifications were still serialized by
ep->wq.lock internally, but the actual waitqueue_active() check prior
to the wake_up() call can be reordered with modifications of the ep
ready list, thus the wake up can be lost.

And yes, it can be healed by an explicit smp_mb():

  list_add_tail(&epi->rdllink, &ep->rdllist);
  smp_mb();
  if (waitqueue_active(&ep->wq))
	wake_up(&ep->wq);

But let's keep it simple: this patch replaces ep->wq.lock with
ep->lock for wait queue modifications, thus the wake up path always
observes the activeness of the wait queue correctly.

Fixes: a218cc491420 ("epoll: use rwlock in order to reduce ep_poll_callback() contention")
References: https://bugzilla.kernel.org/show_bug.cgi?id=205933
Signed-off-by: Roman Penyaev <rpenyaev@suse.de>
Reported-by: Max Neunhoeffer <max@arangodb.com>
Bisected-by: Max Neunhoeffer <max@arangodb.com>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Christopher Kohlhoff <chris.kohlhoff@clearpool.io>
Cc: Davidlohr Bueso <dbueso@suse.de>
Cc: Jason Baron <jbaron@akamai.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: linux-fsdevel@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
---
 Nothing interesting in v2:
     changed the comment a bit and specified Reported-by and Bisected-by tags

 fs/eventpoll.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/fs/eventpoll.c b/fs/eventpoll.c
index b041b66002db..eee3c92a9ebf 100644
--- a/fs/eventpoll.c
+++ b/fs/eventpoll.c
@@ -1854,9 +1854,9 @@ static int ep_poll(struct eventpoll *ep, struct epoll_event __user *events,
 		waiter = true;
 		init_waitqueue_entry(&wait, current);
 
-		spin_lock_irq(&ep->wq.lock);
+		write_lock_irq(&ep->lock);
 		__add_wait_queue_exclusive(&ep->wq, &wait);
-		spin_unlock_irq(&ep->wq.lock);
+		write_unlock_irq(&ep->lock);
 	}
 
 	for (;;) {
@@ -1904,9 +1904,9 @@ static int ep_poll(struct eventpoll *ep, struct epoll_event __user *events,
 		goto fetch_events;
 
 	if (waiter) {
-		spin_lock_irq(&ep->wq.lock);
+		write_lock_irq(&ep->lock);
 		__remove_wait_queue(&ep->wq, &wait);
-		spin_unlock_irq(&ep->wq.lock);
+		write_unlock_irq(&ep->lock);
 	}
 
 	return res;
-- 
2.24.1



* [PATCH v2 2/3] epoll: ep->wq can be woken up unlocked in certain cases
  2020-02-10  9:41 [PATCH v2 1/3] epoll: fix possible lost wakeup on epoll_ctl() path Roman Penyaev
@ 2020-02-10  9:41 ` Roman Penyaev
  2020-02-10 18:16   ` Jason Baron
  2020-02-10  9:41 ` [PATCH v2 3/3] kselftest: introduce new epoll test case Roman Penyaev
  1 sibling, 1 reply; 5+ messages in thread
From: Roman Penyaev @ 2020-02-10  9:41 UTC (permalink / raw)
  Cc: Roman Penyaev, Max Neunhoeffer, Jakub Kicinski,
	Christopher Kohlhoff, Davidlohr Bueso, Jason Baron,
	Andrew Morton, linux-fsdevel, linux-kernel

Now ep->lock is responsible for wait queue serialization, thus if
ep->lock is taken on the write path, wake_up_locked() can be invoked.

The read path is different, though. Since concurrent CPUs can enter
the wake up function, it needs to be internally serialized, thus the
wake_up() variant is used, which takes the internal spinlock of the
wait queue.

Signed-off-by: Roman Penyaev <rpenyaev@suse.de>
Cc: Max Neunhoeffer <max@arangodb.com>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Christopher Kohlhoff <chris.kohlhoff@clearpool.io>
Cc: Davidlohr Bueso <dbueso@suse.de>
Cc: Jason Baron <jbaron@akamai.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: linux-fsdevel@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
---
 Nothing interesting in v2:
     changed the comment a bit

 fs/eventpoll.c | 12 +++++++++---
 1 file changed, 9 insertions(+), 3 deletions(-)

diff --git a/fs/eventpoll.c b/fs/eventpoll.c
index eee3c92a9ebf..6e218234bd4a 100644
--- a/fs/eventpoll.c
+++ b/fs/eventpoll.c
@@ -1173,7 +1173,7 @@ static inline bool chain_epi_lockless(struct epitem *epi)
  * Another thing worth to mention is that ep_poll_callback() can be called
  * concurrently for the same @epi from different CPUs if poll table was inited
  * with several wait queues entries.  Plural wakeup from different CPUs of a
- * single wait queue is serialized by wq.lock, but the case when multiple wait
+ * single wait queue is serialized by ep->lock, but the case when multiple wait
  * queues are used should be detected accordingly.  This is detected using
  * cmpxchg() operation.
  */
@@ -1248,6 +1248,12 @@ static int ep_poll_callback(wait_queue_entry_t *wait, unsigned mode, int sync, v
 				break;
 			}
 		}
+		/*
+		 * Since here we have the read lock (ep->lock) taken, plural
+		 * wakeup from different CPUs can occur, thus we call wake_up()
+		 * variant which implies its own lock on wqueue. All other paths
+		 * take write lock.
+		 */
 		wake_up(&ep->wq);
 	}
 	if (waitqueue_active(&ep->poll_wait))
@@ -1551,7 +1557,7 @@ static int ep_insert(struct eventpoll *ep, const struct epoll_event *event,
 
 		/* Notify waiting tasks that events are available */
 		if (waitqueue_active(&ep->wq))
-			wake_up(&ep->wq);
+			wake_up_locked(&ep->wq);
 		if (waitqueue_active(&ep->poll_wait))
 			pwake++;
 	}
@@ -1657,7 +1663,7 @@ static int ep_modify(struct eventpoll *ep, struct epitem *epi,
 
 			/* Notify waiting tasks that events are available */
 			if (waitqueue_active(&ep->wq))
-				wake_up(&ep->wq);
+				wake_up_locked(&ep->wq);
 			if (waitqueue_active(&ep->poll_wait))
 				pwake++;
 		}
-- 
2.24.1



* [PATCH v2 3/3] kselftest: introduce new epoll test case
  2020-02-10  9:41 [PATCH v2 1/3] epoll: fix possible lost wakeup on epoll_ctl() path Roman Penyaev
  2020-02-10  9:41 ` [PATCH v2 2/3] epoll: ep->wq can be woken up unlocked in certain cases Roman Penyaev
@ 2020-02-10  9:41 ` Roman Penyaev
  1 sibling, 0 replies; 5+ messages in thread
From: Roman Penyaev @ 2020-02-10  9:41 UTC (permalink / raw)
  Cc: Roman Penyaev, Max Neunhoeffer, Jakub Kicinski,
	Christopher Kohlhoff, Davidlohr Bueso, Jason Baron,
	Andrew Morton, linux-fsdevel, linux-kernel

This test case reproduces epollbug.c from the bug report:

  https://bugzilla.kernel.org/show_bug.cgi?id=205933

What does it test? It tests the race between epoll_ctl() and
epoll_wait(). A new event mask passed to epoll_ctl() triggers a wake
up, which can be missed because of the bug described in the link.
Reproduction is 100%, so the fix is easy to verify. Kudos to Max for
the wonderful test case.

Signed-off-by: Roman Penyaev <rpenyaev@suse.de>
Cc: Max Neunhoeffer <max@arangodb.com>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Christopher Kohlhoff <chris.kohlhoff@clearpool.io>
Cc: Davidlohr Bueso <dbueso@suse.de>
Cc: Jason Baron <jbaron@akamai.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: linux-fsdevel@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
---
 Nothing interesting in v2:
     changed the comment a bit

 .../filesystems/epoll/epoll_wakeup_test.c     | 67 ++++++++++++++++++-
 1 file changed, 66 insertions(+), 1 deletion(-)

diff --git a/tools/testing/selftests/filesystems/epoll/epoll_wakeup_test.c b/tools/testing/selftests/filesystems/epoll/epoll_wakeup_test.c
index 37a04dab56f0..11eee0b60040 100644
--- a/tools/testing/selftests/filesystems/epoll/epoll_wakeup_test.c
+++ b/tools/testing/selftests/filesystems/epoll/epoll_wakeup_test.c
@@ -7,13 +7,14 @@
 #include <pthread.h>
 #include <sys/epoll.h>
 #include <sys/socket.h>
+#include <sys/eventfd.h>
 #include "../../kselftest_harness.h"
 
 struct epoll_mtcontext
 {
 	int efd[3];
 	int sfd[4];
-	int count;
+	volatile int count;
 
 	pthread_t main;
 	pthread_t waiter;
@@ -3071,4 +3072,68 @@ TEST(epoll58)
 	close(ctx.sfd[3]);
 }
 
+static void *epoll59_thread(void *ctx_)
+{
+	struct epoll_mtcontext *ctx = ctx_;
+	struct epoll_event e;
+	int i;
+
+	for (i = 0; i < 100000; i++) {
+		while (ctx->count == 0)
+			;
+
+		e.events = EPOLLIN | EPOLLERR | EPOLLET;
+		epoll_ctl(ctx->efd[0], EPOLL_CTL_MOD, ctx->sfd[0], &e);
+		ctx->count = 0;
+	}
+
+	return NULL;
+}
+
+/*
+ *        t0
+ *      (p) \
+ *           e0
+ *     (et) /
+ *        e0
+ *
+ * Based on https://bugzilla.kernel.org/show_bug.cgi?id=205933
+ */
+TEST(epoll59)
+{
+	pthread_t emitter;
+	struct pollfd pfd;
+	struct epoll_event e;
+	struct epoll_mtcontext ctx = { 0 };
+	int i, ret;
+
+	signal(SIGUSR1, signal_handler);
+
+	ctx.efd[0] = epoll_create1(0);
+	ASSERT_GE(ctx.efd[0], 0);
+
+	ctx.sfd[0] = eventfd(1, 0);
+	ASSERT_GE(ctx.sfd[0], 0);
+
+	e.events = EPOLLIN | EPOLLERR | EPOLLET;
+	ASSERT_EQ(epoll_ctl(ctx.efd[0], EPOLL_CTL_ADD, ctx.sfd[0], &e), 0);
+
+	ASSERT_EQ(pthread_create(&emitter, NULL, epoll59_thread, &ctx), 0);
+
+	for (i = 0; i < 100000; i++) {
+		ret = epoll_wait(ctx.efd[0], &e, 1, 1000);
+		ASSERT_GT(ret, 0);
+
+		while (ctx.count != 0)
+			;
+		ctx.count = 1;
+	}
+	if (pthread_tryjoin_np(emitter, NULL) < 0) {
+		pthread_kill(emitter, SIGUSR1);
+		pthread_join(emitter, NULL);
+	}
+	close(ctx.efd[0]);
+	close(ctx.sfd[0]);
+}
+
 TEST_HARNESS_MAIN
-- 
2.24.1



* Re: [PATCH v2 2/3] epoll: ep->wq can be woken up unlocked in certain cases
  2020-02-10  9:41 ` [PATCH v2 2/3] epoll: ep->wq can be woken up unlocked in certain cases Roman Penyaev
@ 2020-02-10 18:16   ` Jason Baron
  2020-02-10 19:31     ` Roman Penyaev
  0 siblings, 1 reply; 5+ messages in thread
From: Jason Baron @ 2020-02-10 18:16 UTC (permalink / raw)
  To: Roman Penyaev
  Cc: Max Neunhoeffer, Jakub Kicinski, Christopher Kohlhoff,
	Davidlohr Bueso, Andrew Morton, linux-fsdevel, linux-kernel



On 2/10/20 4:41 AM, Roman Penyaev wrote:
> Now ep->lock is responsible for wqueue serialization, thus if ep->lock
> is taken on write path, wake_up_locked() can be invoked.
> 
> Though, read path is different.  Since concurrent cpus can enter the
> wake up function it needs to be internally serialized, thus wake_up()
> variant is used which implies internal spin lock.
> 
> Signed-off-by: Roman Penyaev <rpenyaev@suse.de>
> Cc: Max Neunhoeffer <max@arangodb.com>
> Cc: Jakub Kicinski <kuba@kernel.org>
> Cc: Christopher Kohlhoff <chris.kohlhoff@clearpool.io>
> Cc: Davidlohr Bueso <dbueso@suse.de>
> Cc: Jason Baron <jbaron@akamai.com>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: linux-fsdevel@vger.kernel.org
> Cc: linux-kernel@vger.kernel.org
> ---
>  Nothing interesting in v2:
>      changed the comment a bit
> 
>  fs/eventpoll.c | 12 +++++++++---
>  1 file changed, 9 insertions(+), 3 deletions(-)
> 
> diff --git a/fs/eventpoll.c b/fs/eventpoll.c
> index eee3c92a9ebf..6e218234bd4a 100644
> --- a/fs/eventpoll.c
> +++ b/fs/eventpoll.c
> @@ -1173,7 +1173,7 @@ static inline bool chain_epi_lockless(struct epitem *epi)
>   * Another thing worth to mention is that ep_poll_callback() can be called
>   * concurrently for the same @epi from different CPUs if poll table was inited
>   * with several wait queues entries.  Plural wakeup from different CPUs of a
> - * single wait queue is serialized by wq.lock, but the case when multiple wait
> + * single wait queue is serialized by ep->lock, but the case when multiple wait
>   * queues are used should be detected accordingly.  This is detected using
>   * cmpxchg() operation.
>   */
> @@ -1248,6 +1248,12 @@ static int ep_poll_callback(wait_queue_entry_t *wait, unsigned mode, int sync, v
>  				break;
>  			}
>  		}
> +		/*
> +		 * Since here we have the read lock (ep->lock) taken, plural
> +		 * wakeup from different CPUs can occur, thus we call wake_up()
> +		 * variant which implies its own lock on wqueue. All other paths
> +		 * take write lock.
> +		 */
>  		wake_up(&ep->wq);
>  	}
>  	if (waitqueue_active(&ep->poll_wait))
> @@ -1551,7 +1557,7 @@ static int ep_insert(struct eventpoll *ep, const struct epoll_event *event,
>  
>  		/* Notify waiting tasks that events are available */
>  		if (waitqueue_active(&ep->wq))
> -			wake_up(&ep->wq);
> +			wake_up_locked(&ep->wq);


So I think this will now hit the 'lockdep_assert_held()' in
__wake_up_common()? I agree that its correct, but I think it will
confuse lockdep here...

Thanks,

-Jason


* Re: [PATCH v2 2/3] epoll: ep->wq can be woken up unlocked in certain cases
  2020-02-10 18:16   ` Jason Baron
@ 2020-02-10 19:31     ` Roman Penyaev
  0 siblings, 0 replies; 5+ messages in thread
From: Roman Penyaev @ 2020-02-10 19:31 UTC (permalink / raw)
  To: Jason Baron
  Cc: Max Neunhoeffer, Jakub Kicinski, Christopher Kohlhoff,
	Davidlohr Bueso, Andrew Morton, linux-fsdevel, linux-kernel

On 2020-02-10 19:16, Jason Baron wrote:
> On 2/10/20 4:41 AM, Roman Penyaev wrote:
>> Now ep->lock is responsible for wqueue serialization, thus if ep->lock
>> is taken on write path, wake_up_locked() can be invoked.
>> 
>> Though, read path is different.  Since concurrent cpus can enter the
>> wake up function it needs to be internally serialized, thus wake_up()
>> variant is used which implies internal spin lock.
>> 
>> Signed-off-by: Roman Penyaev <rpenyaev@suse.de>
>> Cc: Max Neunhoeffer <max@arangodb.com>
>> Cc: Jakub Kicinski <kuba@kernel.org>
>> Cc: Christopher Kohlhoff <chris.kohlhoff@clearpool.io>
>> Cc: Davidlohr Bueso <dbueso@suse.de>
>> Cc: Jason Baron <jbaron@akamai.com>
>> Cc: Andrew Morton <akpm@linux-foundation.org>
>> Cc: linux-fsdevel@vger.kernel.org
>> Cc: linux-kernel@vger.kernel.org
>> ---
>>  Nothing interesting in v2:
>>      changed the comment a bit
>> 
>>  fs/eventpoll.c | 12 +++++++++---
>>  1 file changed, 9 insertions(+), 3 deletions(-)
>> 
>> diff --git a/fs/eventpoll.c b/fs/eventpoll.c
>> index eee3c92a9ebf..6e218234bd4a 100644
>> --- a/fs/eventpoll.c
>> +++ b/fs/eventpoll.c
>> @@ -1173,7 +1173,7 @@ static inline bool chain_epi_lockless(struct 
>> epitem *epi)
>>   * Another thing worth to mention is that ep_poll_callback() can be 
>> called
>>   * concurrently for the same @epi from different CPUs if poll table 
>> was inited
>>   * with several wait queues entries.  Plural wakeup from different 
>> CPUs of a
>> - * single wait queue is serialized by wq.lock, but the case when 
>> multiple wait
>> + * single wait queue is serialized by ep->lock, but the case when 
>> multiple wait
>>   * queues are used should be detected accordingly.  This is detected 
>> using
>>   * cmpxchg() operation.
>>   */
>> @@ -1248,6 +1248,12 @@ static int ep_poll_callback(wait_queue_entry_t 
>> *wait, unsigned mode, int sync, v
>>  				break;
>>  			}
>>  		}
>> +		/*
>> +		 * Since here we have the read lock (ep->lock) taken, plural
>> +		 * wakeup from different CPUs can occur, thus we call wake_up()
>> +		 * variant which implies its own lock on wqueue. All other paths
>> +		 * take write lock.
>> +		 */
>>  		wake_up(&ep->wq);
>>  	}
>>  	if (waitqueue_active(&ep->poll_wait))
>> @@ -1551,7 +1557,7 @@ static int ep_insert(struct eventpoll *ep, const 
>> struct epoll_event *event,
>> 
>>  		/* Notify waiting tasks that events are available */
>>  		if (waitqueue_active(&ep->wq))
>> -			wake_up(&ep->wq);
>> +			wake_up_locked(&ep->wq);
> 
> 
> So I think this will now hit the 'lockdep_assert_held()' in
> __wake_up_common()? I agree that its correct, but I think it will
> confuse lockdep here...

Argh! True. And I do not see any neat way to silence lockdep here
(calling lock_acquire() manually does not seem like an option for such
a minor thing).

Then this optimization is not needed; the patch is cancelled.

Thanks for noting that.

--
Roman


