linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v2 1/3] epoll: fix possible lost wakeup on epoll_ctl() path
@ 2020-02-10  9:41 Roman Penyaev
  2020-02-10  9:41 ` [PATCH v2 2/3] epoll: ep->wq can be woken up unlocked in certain cases Roman Penyaev
  2020-02-10  9:41 ` [PATCH v2 3/3] kselftest: introduce new epoll test case Roman Penyaev
  0 siblings, 2 replies; 5+ messages in thread
From: Roman Penyaev @ 2020-02-10  9:41 UTC (permalink / raw)
  Cc: Roman Penyaev, Max Neunhoeffer, Jakub Kicinski,
	Christopher Kohlhoff, Davidlohr Bueso, Jason Baron,
	Andrew Morton, linux-fsdevel, linux-kernel

This fixes possible lost wakeup introduced by the a218cc491420.
Originally modifications to ep->wq were serialized by ep->wq.lock,
but in the a218cc491420 new rw lock was introduced in order to
relax fd event path, i.e. callers of ep_poll_callback() function.

After the change ep_modify and ep_insert (both are called on
epoll_ctl() path) were switched to ep->lock, but ep_poll
(epoll_wait) was using ep->wq.lock on wqueue list modification.

The bug doesn't lead to any wqueue list corruptions, because wake up
path and list modifications were serialized by ep->wq.lock
internally, but actual waitqueue_active() check prior wake_up()
call can be reordered with modifications of ep ready list, thus
wake up can be lost.

And yes, can be healed by explicit smp_mb():

  list_add_tail(&epi->rdlink, &ep->rdllist);
  smp_mb();
  if (waitqueue_active(&ep->wq))
	wake_up(&ep->wp);

But let's make it simple, thus current patch replaces ep->wq.lock
with the ep->lock for wqueue modifications, thus wake up path
always observes activeness of the wqueue correcty.

Fixes: a218cc491420 ("epoll: use rwlock in order to reduce ep_poll_callback() contention")
References: https://bugzilla.kernel.org/show_bug.cgi?id=205933
Signed-off-by: Roman Penyaev <rpenyaev@suse.de>
Reported-by: Max Neunhoeffer <max@arangodb.com>
Bisected-by: Max Neunhoeffer <max@arangodb.com>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Christopher Kohlhoff <chris.kohlhoff@clearpool.io>
Cc: Davidlohr Bueso <dbueso@suse.de>
Cc: Jason Baron <jbaron@akamai.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: linux-fsdevel@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
---
 Nothing interesting in v2:
     changed the comment a bit and specified Reported-by and Bisected-by tags

 fs/eventpoll.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/fs/eventpoll.c b/fs/eventpoll.c
index b041b66002db..eee3c92a9ebf 100644
--- a/fs/eventpoll.c
+++ b/fs/eventpoll.c
@@ -1854,9 +1854,9 @@ static int ep_poll(struct eventpoll *ep, struct epoll_event __user *events,
 		waiter = true;
 		init_waitqueue_entry(&wait, current);
 
-		spin_lock_irq(&ep->wq.lock);
+		write_lock_irq(&ep->lock);
 		__add_wait_queue_exclusive(&ep->wq, &wait);
-		spin_unlock_irq(&ep->wq.lock);
+		write_unlock_irq(&ep->lock);
 	}
 
 	for (;;) {
@@ -1904,9 +1904,9 @@ static int ep_poll(struct eventpoll *ep, struct epoll_event __user *events,
 		goto fetch_events;
 
 	if (waiter) {
-		spin_lock_irq(&ep->wq.lock);
+		write_lock_irq(&ep->lock);
 		__remove_wait_queue(&ep->wq, &wait);
-		spin_unlock_irq(&ep->wq.lock);
+		write_unlock_irq(&ep->lock);
 	}
 
 	return res;
-- 
2.24.1


^ permalink raw reply related	[flat|nested] 5+ messages in thread

* [PATCH v2 2/3] epoll: ep->wq can be woken up unlocked in certain cases
  2020-02-10  9:41 [PATCH v2 1/3] epoll: fix possible lost wakeup on epoll_ctl() path Roman Penyaev
@ 2020-02-10  9:41 ` Roman Penyaev
  2020-02-10 18:16   ` Jason Baron
  2020-02-10  9:41 ` [PATCH v2 3/3] kselftest: introduce new epoll test case Roman Penyaev
  1 sibling, 1 reply; 5+ messages in thread
From: Roman Penyaev @ 2020-02-10  9:41 UTC (permalink / raw)
  Cc: Roman Penyaev, Max Neunhoeffer, Jakub Kicinski,
	Christopher Kohlhoff, Davidlohr Bueso, Jason Baron,
	Andrew Morton, linux-fsdevel, linux-kernel

Now ep->lock is responsible for wqueue serialization, thus if ep->lock
is taken on write path, wake_up_locked() can be invoked.

Though, read path is different.  Since concurrent cpus can enter the
wake up function it needs to be internally serialized, thus wake_up()
variant is used which implies internal spin lock.

Signed-off-by: Roman Penyaev <rpenyaev@suse.de>
Cc: Max Neunhoeffer <max@arangodb.com>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Christopher Kohlhoff <chris.kohlhoff@clearpool.io>
Cc: Davidlohr Bueso <dbueso@suse.de>
Cc: Jason Baron <jbaron@akamai.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: linux-fsdevel@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
---
 Nothing interesting in v2:
     changed the comment a bit

 fs/eventpoll.c | 12 +++++++++---
 1 file changed, 9 insertions(+), 3 deletions(-)

diff --git a/fs/eventpoll.c b/fs/eventpoll.c
index eee3c92a9ebf..6e218234bd4a 100644
--- a/fs/eventpoll.c
+++ b/fs/eventpoll.c
@@ -1173,7 +1173,7 @@ static inline bool chain_epi_lockless(struct epitem *epi)
  * Another thing worth to mention is that ep_poll_callback() can be called
  * concurrently for the same @epi from different CPUs if poll table was inited
  * with several wait queues entries.  Plural wakeup from different CPUs of a
- * single wait queue is serialized by wq.lock, but the case when multiple wait
+ * single wait queue is serialized by ep->lock, but the case when multiple wait
  * queues are used should be detected accordingly.  This is detected using
  * cmpxchg() operation.
  */
@@ -1248,6 +1248,12 @@ static int ep_poll_callback(wait_queue_entry_t *wait, unsigned mode, int sync, v
 				break;
 			}
 		}
+		/*
+		 * Since here we have the read lock (ep->lock) taken, plural
+		 * wakeup from different CPUs can occur, thus we call wake_up()
+		 * variant which implies its own lock on wqueue. All other paths
+		 * take write lock.
+		 */
 		wake_up(&ep->wq);
 	}
 	if (waitqueue_active(&ep->poll_wait))
@@ -1551,7 +1557,7 @@ static int ep_insert(struct eventpoll *ep, const struct epoll_event *event,
 
 		/* Notify waiting tasks that events are available */
 		if (waitqueue_active(&ep->wq))
-			wake_up(&ep->wq);
+			wake_up_locked(&ep->wq);
 		if (waitqueue_active(&ep->poll_wait))
 			pwake++;
 	}
@@ -1657,7 +1663,7 @@ static int ep_modify(struct eventpoll *ep, struct epitem *epi,
 
 			/* Notify waiting tasks that events are available */
 			if (waitqueue_active(&ep->wq))
-				wake_up(&ep->wq);
+				wake_up_locked(&ep->wq);
 			if (waitqueue_active(&ep->poll_wait))
 				pwake++;
 		}
-- 
2.24.1


^ permalink raw reply related	[flat|nested] 5+ messages in thread

* [PATCH v2 3/3] kselftest: introduce new epoll test case
  2020-02-10  9:41 [PATCH v2 1/3] epoll: fix possible lost wakeup on epoll_ctl() path Roman Penyaev
  2020-02-10  9:41 ` [PATCH v2 2/3] epoll: ep->wq can be woken up unlocked in certain cases Roman Penyaev
@ 2020-02-10  9:41 ` Roman Penyaev
  1 sibling, 0 replies; 5+ messages in thread
From: Roman Penyaev @ 2020-02-10  9:41 UTC (permalink / raw)
  Cc: Roman Penyaev, Max Neunhoeffer, Jakub Kicinski,
	Christopher Kohlhoff, Davidlohr Bueso, Jason Baron,
	Andrew Morton, linux-fsdevel, linux-kernel

This testcase repeats epollbug.c from the bug:

  https://bugzilla.kernel.org/show_bug.cgi?id=205933

What it tests? It tests the race between epoll_ctl() and epoll_wait().
New event mask passed to epoll_ctl() triggers wake up, which can be
missed because of the bug described in the link.  Reproduction is 100%,
so easy to fix. Kudos, Max, for wonderful testcase.

Signed-off-by: Roman Penyaev <rpenyaev@suse.de>
Cc: Max Neunhoeffer <max@arangodb.com>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Christopher Kohlhoff <chris.kohlhoff@clearpool.io>
Cc: Davidlohr Bueso <dbueso@suse.de>
Cc: Jason Baron <jbaron@akamai.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: linux-fsdevel@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
---
 Nothing interesting in v2:
     changed the comment a bit

 .../filesystems/epoll/epoll_wakeup_test.c     | 67 ++++++++++++++++++-
 1 file changed, 66 insertions(+), 1 deletion(-)

diff --git a/tools/testing/selftests/filesystems/epoll/epoll_wakeup_test.c b/tools/testing/selftests/filesystems/epoll/epoll_wakeup_test.c
index 37a04dab56f0..11eee0b60040 100644
--- a/tools/testing/selftests/filesystems/epoll/epoll_wakeup_test.c
+++ b/tools/testing/selftests/filesystems/epoll/epoll_wakeup_test.c
@@ -7,13 +7,14 @@
 #include <pthread.h>
 #include <sys/epoll.h>
 #include <sys/socket.h>
+#include <sys/eventfd.h>
 #include "../../kselftest_harness.h"
 
 struct epoll_mtcontext
 {
 	int efd[3];
 	int sfd[4];
-	int count;
+	volatile int count;
 
 	pthread_t main;
 	pthread_t waiter;
@@ -3071,4 +3072,68 @@ TEST(epoll58)
 	close(ctx.sfd[3]);
 }
 
+static void *epoll59_thread(void *ctx_)
+{
+	struct epoll_mtcontext *ctx = ctx_;
+	struct epoll_event e;
+	int i;
+
+	for (i = 0; i < 100000; i++) {
+		while (ctx->count == 0)
+			;
+
+		e.events = EPOLLIN | EPOLLERR | EPOLLET;
+		epoll_ctl(ctx->efd[0], EPOLL_CTL_MOD, ctx->sfd[0], &e);
+		ctx->count = 0;
+	}
+
+	return NULL;
+}
+
+/*
+ *        t0
+ *      (p) \
+ *           e0
+ *     (et) /
+ *        e0
+ *
+ * Based on https://bugzilla.kernel.org/show_bug.cgi?id=205933
+ */
+TEST(epoll59)
+{
+	pthread_t emitter;
+	struct pollfd pfd;
+	struct epoll_event e;
+	struct epoll_mtcontext ctx = { 0 };
+	int i, ret;
+
+	signal(SIGUSR1, signal_handler);
+
+	ctx.efd[0] = epoll_create1(0);
+	ASSERT_GE(ctx.efd[0], 0);
+
+	ctx.sfd[0] = eventfd(1, 0);
+	ASSERT_GE(ctx.sfd[0], 0);
+
+	e.events = EPOLLIN | EPOLLERR | EPOLLET;
+	ASSERT_EQ(epoll_ctl(ctx.efd[0], EPOLL_CTL_ADD, ctx.sfd[0], &e), 0);
+
+	ASSERT_EQ(pthread_create(&emitter, NULL, epoll59_thread, &ctx), 0);
+
+	for (i = 0; i < 100000; i++) {
+		ret = epoll_wait(ctx.efd[0], &e, 1, 1000);
+		ASSERT_GT(ret, 0);
+
+		while (ctx.count != 0)
+			;
+		ctx.count = 1;
+	}
+	if (pthread_tryjoin_np(emitter, NULL) < 0) {
+		pthread_kill(emitter, SIGUSR1);
+		pthread_join(emitter, NULL);
+	}
+	close(ctx.efd[0]);
+	close(ctx.sfd[0]);
+}
+
 TEST_HARNESS_MAIN
-- 
2.24.1


^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: [PATCH v2 2/3] epoll: ep->wq can be woken up unlocked in certain cases
  2020-02-10  9:41 ` [PATCH v2 2/3] epoll: ep->wq can be woken up unlocked in certain cases Roman Penyaev
@ 2020-02-10 18:16   ` Jason Baron
  2020-02-10 19:31     ` Roman Penyaev
  0 siblings, 1 reply; 5+ messages in thread
From: Jason Baron @ 2020-02-10 18:16 UTC (permalink / raw)
  To: Roman Penyaev
  Cc: Max Neunhoeffer, Jakub Kicinski, Christopher Kohlhoff,
	Davidlohr Bueso, Andrew Morton, linux-fsdevel, linux-kernel



On 2/10/20 4:41 AM, Roman Penyaev wrote:
> Now ep->lock is responsible for wqueue serialization, thus if ep->lock
> is taken on write path, wake_up_locked() can be invoked.
> 
> Though, read path is different.  Since concurrent cpus can enter the
> wake up function it needs to be internally serialized, thus wake_up()
> variant is used which implies internal spin lock.
> 
> Signed-off-by: Roman Penyaev <rpenyaev@suse.de>
> Cc: Max Neunhoeffer <max@arangodb.com>
> Cc: Jakub Kicinski <kuba@kernel.org>
> Cc: Christopher Kohlhoff <chris.kohlhoff@clearpool.io>
> Cc: Davidlohr Bueso <dbueso@suse.de>
> Cc: Jason Baron <jbaron@akamai.com>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: linux-fsdevel@vger.kernel.org
> Cc: linux-kernel@vger.kernel.org
> ---
>  Nothing interesting in v2:
>      changed the comment a bit
> 
>  fs/eventpoll.c | 12 +++++++++---
>  1 file changed, 9 insertions(+), 3 deletions(-)
> 
> diff --git a/fs/eventpoll.c b/fs/eventpoll.c
> index eee3c92a9ebf..6e218234bd4a 100644
> --- a/fs/eventpoll.c
> +++ b/fs/eventpoll.c
> @@ -1173,7 +1173,7 @@ static inline bool chain_epi_lockless(struct epitem *epi)
>   * Another thing worth to mention is that ep_poll_callback() can be called
>   * concurrently for the same @epi from different CPUs if poll table was inited
>   * with several wait queues entries.  Plural wakeup from different CPUs of a
> - * single wait queue is serialized by wq.lock, but the case when multiple wait
> + * single wait queue is serialized by ep->lock, but the case when multiple wait
>   * queues are used should be detected accordingly.  This is detected using
>   * cmpxchg() operation.
>   */
> @@ -1248,6 +1248,12 @@ static int ep_poll_callback(wait_queue_entry_t *wait, unsigned mode, int sync, v
>  				break;
>  			}
>  		}
> +		/*
> +		 * Since here we have the read lock (ep->lock) taken, plural
> +		 * wakeup from different CPUs can occur, thus we call wake_up()
> +		 * variant which implies its own lock on wqueue. All other paths
> +		 * take write lock.
> +		 */
>  		wake_up(&ep->wq);
>  	}
>  	if (waitqueue_active(&ep->poll_wait))
> @@ -1551,7 +1557,7 @@ static int ep_insert(struct eventpoll *ep, const struct epoll_event *event,
>  
>  		/* Notify waiting tasks that events are available */
>  		if (waitqueue_active(&ep->wq))
> -			wake_up(&ep->wq);
> +			wake_up_locked(&ep->wq);


So I think this will now hit the 'lockdep_assert_held()' in
__wake_up_common()? I agree that its correct, but I think it will
confuse lockdep here...

Thanks,

-Jason

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH v2 2/3] epoll: ep->wq can be woken up unlocked in certain cases
  2020-02-10 18:16   ` Jason Baron
@ 2020-02-10 19:31     ` Roman Penyaev
  0 siblings, 0 replies; 5+ messages in thread
From: Roman Penyaev @ 2020-02-10 19:31 UTC (permalink / raw)
  To: Jason Baron
  Cc: Max Neunhoeffer, Jakub Kicinski, Christopher Kohlhoff,
	Davidlohr Bueso, Andrew Morton, linux-fsdevel, linux-kernel

On 2020-02-10 19:16, Jason Baron wrote:
> On 2/10/20 4:41 AM, Roman Penyaev wrote:
>> Now ep->lock is responsible for wqueue serialization, thus if ep->lock
>> is taken on write path, wake_up_locked() can be invoked.
>> 
>> Though, read path is different.  Since concurrent cpus can enter the
>> wake up function it needs to be internally serialized, thus wake_up()
>> variant is used which implies internal spin lock.
>> 
>> Signed-off-by: Roman Penyaev <rpenyaev@suse.de>
>> Cc: Max Neunhoeffer <max@arangodb.com>
>> Cc: Jakub Kicinski <kuba@kernel.org>
>> Cc: Christopher Kohlhoff <chris.kohlhoff@clearpool.io>
>> Cc: Davidlohr Bueso <dbueso@suse.de>
>> Cc: Jason Baron <jbaron@akamai.com>
>> Cc: Andrew Morton <akpm@linux-foundation.org>
>> Cc: linux-fsdevel@vger.kernel.org
>> Cc: linux-kernel@vger.kernel.org
>> ---
>>  Nothing interesting in v2:
>>      changed the comment a bit
>> 
>>  fs/eventpoll.c | 12 +++++++++---
>>  1 file changed, 9 insertions(+), 3 deletions(-)
>> 
>> diff --git a/fs/eventpoll.c b/fs/eventpoll.c
>> index eee3c92a9ebf..6e218234bd4a 100644
>> --- a/fs/eventpoll.c
>> +++ b/fs/eventpoll.c
>> @@ -1173,7 +1173,7 @@ static inline bool chain_epi_lockless(struct 
>> epitem *epi)
>>   * Another thing worth to mention is that ep_poll_callback() can be 
>> called
>>   * concurrently for the same @epi from different CPUs if poll table 
>> was inited
>>   * with several wait queues entries.  Plural wakeup from different 
>> CPUs of a
>> - * single wait queue is serialized by wq.lock, but the case when 
>> multiple wait
>> + * single wait queue is serialized by ep->lock, but the case when 
>> multiple wait
>>   * queues are used should be detected accordingly.  This is detected 
>> using
>>   * cmpxchg() operation.
>>   */
>> @@ -1248,6 +1248,12 @@ static int ep_poll_callback(wait_queue_entry_t 
>> *wait, unsigned mode, int sync, v
>>  				break;
>>  			}
>>  		}
>> +		/*
>> +		 * Since here we have the read lock (ep->lock) taken, plural
>> +		 * wakeup from different CPUs can occur, thus we call wake_up()
>> +		 * variant which implies its own lock on wqueue. All other paths
>> +		 * take write lock.
>> +		 */
>>  		wake_up(&ep->wq);
>>  	}
>>  	if (waitqueue_active(&ep->poll_wait))
>> @@ -1551,7 +1557,7 @@ static int ep_insert(struct eventpoll *ep, const 
>> struct epoll_event *event,
>> 
>>  		/* Notify waiting tasks that events are available */
>>  		if (waitqueue_active(&ep->wq))
>> -			wake_up(&ep->wq);
>> +			wake_up_locked(&ep->wq);
> 
> 
> So I think this will now hit the 'lockdep_assert_held()' in
> __wake_up_common()? I agree that its correct, but I think it will
> confuse lockdep here...

Argh! True. And I do not see any neat way to shut up lockdep here
(Calling lock_acquire() manually seems not an option for such minor
thing).

Then this optimization is not needed, patch is cancelled.

Thanks for noting that.

--
Roman


^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2020-02-10 19:31 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-02-10  9:41 [PATCH v2 1/3] epoll: fix possible lost wakeup on epoll_ctl() path Roman Penyaev
2020-02-10  9:41 ` [PATCH v2 2/3] epoll: ep->wq can be woken up unlocked in certain cases Roman Penyaev
2020-02-10 18:16   ` Jason Baron
2020-02-10 19:31     ` Roman Penyaev
2020-02-10  9:41 ` [PATCH v2 3/3] kselftest: introduce new epoll test case Roman Penyaev

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).