From: Roman Penyaev <rpenyaev@suse.de>
To: Jason Baron <jbaron@akamai.com>
Cc: akpm@linux-foundation.org, linux-kernel@vger.kernel.org,
Alexander Viro <viro@zeniv.linux.org.uk>, Heiher <r@hev.cc>,
Khazhismel Kumykov <khazhy@google.com>,
Davidlohr Bueso <dbueso@suse.de>,
stable@vger.kernel.org
Subject: Re: [PATCH] epoll: ensure ep_poll() doesn't miss wakeup events
Date: Sun, 03 May 2020 12:24:30 +0200 [thread overview]
Message-ID: <f3c2e63ec34a611ec256785ebfd39270@suse.de> (raw)
In-Reply-To: <81612721-9448-83fa-4efe-603996d56b9a@akamai.com>
On 2020-05-02 00:09, Jason Baron wrote:
> On 5/1/20 5:02 PM, Roman Penyaev wrote:
>> Hi Jason,
>>
>> That is indeed a nice catch.
>> Seems we need smp_rmb() pair between list_empty_careful(&rp->rdllist)
>> and
>> READ_ONCE(ep->ovflist) for ep_events_available(), do we?
>>
>
> Hi Roman,
>
> Good point, even if we order those reads its still racy, since the
> read of the ready list could come after its been cleared and the
> read of the overflow could again come after its been cleared.
You mean the second chunk? True. Sigh.
> So I'm afraid we might need instead something like this to make
> sure they are read together:
No, impossible, I can't believe in that :) We can't give up.
All we need is to keep a mark, that ep->rdllist is not empty,
even we've just spliced it. ep_poll_callback() always takes
the ->ovflist path, if ->ovflist is not EP_UNACTIVE_PTR, but
ep_events_available() does not need to observe ->ovflist at
all, just a ->rdllist.
Take a look at that, do I miss something? :
diff --git a/fs/eventpoll.c b/fs/eventpoll.c
index aba03ee749f8..a8770f9a917e 100644
--- a/fs/eventpoll.c
+++ b/fs/eventpoll.c
@@ -376,8 +376,7 @@ static void ep_nested_calls_init(struct nested_calls
*ncalls)
*/
static inline int ep_events_available(struct eventpoll *ep)
{
- return !list_empty_careful(&ep->rdllist) ||
- READ_ONCE(ep->ovflist) != EP_UNACTIVE_PTR;
+ return !list_empty_careful(&ep->rdllist);
}
#ifdef CONFIG_NET_RX_BUSY_POLL
@@ -683,7 +682,8 @@ static __poll_t ep_scan_ready_list(struct eventpoll
*ep,
{
__poll_t res;
struct epitem *epi, *nepi;
- LIST_HEAD(txlist);
+ LIST_HEAD(rdllist);
+ LIST_HEAD(ovflist);
lockdep_assert_irqs_enabled();
@@ -704,14 +704,22 @@ static __poll_t ep_scan_ready_list(struct
eventpoll *ep,
* in a lockless way.
*/
write_lock_irq(&ep->lock);
- list_splice_init(&ep->rdllist, &txlist);
+ /*
+ * We do not call list_splice_init() because for lockless
+ * ep_events_available() ->rdllist is still "not empty".
+ * Otherwise the feature that there is something left in
+ * the list can be lost which causes missed wakeup.
+ */
+ list_splice(&ep->rdllist, &rdllist);
+ /*
+ * If ->rdllist was empty we should pretend it was not,
+ * because after the unlock ->ovflist comes into play,
+ * which is invisible for lockless ep_events_available().
+ */
+ ep->rdllist.next = LIST_POISON1;
WRITE_ONCE(ep->ovflist, NULL);
write_unlock_irq(&ep->lock);
/*
* Now call the callback function.
*/
- res = (*sproc)(ep, &txlist, priv);
+ res = (*sproc)(ep, &rdllist, priv);
write_lock_irq(&ep->lock);
/*
@@ -724,7 +732,7 @@ static __poll_t ep_scan_ready_list(struct eventpoll
*ep,
/*
* We need to check if the item is already in the list.
* During the "sproc" callback execution time, items are
- * queued into ->ovflist but the "txlist" might already
+ * queued into ->ovflist but the "rdllist" might already
* contain them, and the list_splice() below takes care
of them.
*/
if (!ep_is_linked(epi)) {
@@ -732,7 +740,7 @@ static __poll_t ep_scan_ready_list(struct eventpoll
*ep,
* ->ovflist is LIFO, so we have to reverse it
in order
* to keep in FIFO.
*/
- list_add(&epi->rdllink, &ep->rdllist);
+ list_add(&epi->rdllink, &ovflist);
ep_pm_stay_awake(epi);
}
}
@@ -743,10 +751,11 @@ static __poll_t ep_scan_ready_list(struct
eventpoll *ep,
*/
WRITE_ONCE(ep->ovflist, EP_UNACTIVE_PTR);
- /*
- * Quickly re-inject items left on "txlist".
- */
- list_splice(&txlist, &ep->rdllist);
+ /* Events from ->ovflist happened later, thus splice to the tail
*/
+ list_splice_tail(&ovflist, &rdllist);
+ /* Just replace list */
+ list_replace(&rdllist, &ep->rdllist);
+
__pm_relax(ep->ws);
write_unlock_irq(&ep->lock);
@@ -1763,13 +1772,13 @@ static __poll_t ep_send_events_proc(struct
eventpoll *ep, struct list_head *head
* Trigger mode, we need to insert back inside
* the ready list, so that the next call to
* epoll_wait() will check again the events
- * availability. At this point, no one can
insert
- * into ep->rdllist besides us. The epoll_ctl()
- * callers are locked out by
- * ep_scan_ready_list() holding "mtx" and the
- * poll callback will queue them in ep->ovflist.
+ * availability. What we do here is simply
+ * return the epi to the same position where
+ * it was, the ep_scan_ready_list() will
+ * re-inject the leftovers to the ->rdllist
+ * under the proper lock.
*/
- list_add_tail(&epi->rdllink, &ep->rdllist);
+ list_add_tail(&epi->rdllink, &tmp->rdllink);
ep_pm_stay_awake(epi);
}
}
--
Roman
next prev parent reply other threads:[~2020-05-03 10:24 UTC|newest]
Thread overview: 8+ messages / expand[flat|nested] mbox.gz Atom feed top
2020-05-01 19:15 [PATCH] epoll: ensure ep_poll() doesn't miss wakeup events Jason Baron
2020-05-01 21:02 ` Roman Penyaev
2020-05-01 22:09 ` Jason Baron
2020-05-03 10:24 ` Roman Penyaev [this message]
2020-05-04 4:29 ` Jason Baron
2020-05-04 4:59 ` Jason Baron
2020-05-04 9:40 ` Roman Penyaev
2020-05-03 13:05 ` David Laight
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=f3c2e63ec34a611ec256785ebfd39270@suse.de \
--to=rpenyaev@suse.de \
--cc=akpm@linux-foundation.org \
--cc=dbueso@suse.de \
--cc=jbaron@akamai.com \
--cc=khazhy@google.com \
--cc=linux-kernel@vger.kernel.org \
--cc=r@hev.cc \
--cc=stable@vger.kernel.org \
--cc=viro@zeniv.linux.org.uk \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).