linux-mm.kvack.org archive mirror
From: Soheil Hassas Yeganeh <soheil@google.com>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	Davidlohr Bueso <dave@stgolabs.net>,
	 Eric Dumazet <edumazet@google.com>,
	Guantao Liu <guantaol@google.com>,
	 Khazhismel Kumykov <khazhy@google.com>,
	Linux-MM <linux-mm@kvack.org>,
	mm-commits@vger.kernel.org,  Al Viro <viro@zeniv.linux.org.uk>,
	Willem de Bruijn <willemb@google.com>
Subject: Re: [patch 13/15] epoll: check ep_events_available() upon timeout
Date: Mon, 2 Nov 2020 12:48:53 -0500
Message-ID: <CACSApvZmNAOCUEEs051G7bc9-JbSv1auh+xUvuitP6LkWfn-5w@mail.gmail.com>
In-Reply-To: <CAHk-=wjweewYgnT2MX2iqbj2+7QamHH+pdMQGvr3duPL5a_dvA@mail.gmail.com>

On Mon, Nov 2, 2020 at 12:08 PM Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> On Sun, Nov 1, 2020 at 5:08 PM Andrew Morton <akpm@linux-foundation.org> wrote:
> >
> > After abc610e01c66 ("fs/epoll: avoid barrier after an epoll_wait(2)
> > timeout"), we break out of the ep_poll loop upon timeout, without checking
> > whether there is any new events available.  Prior to that patch-series we
> > always called ep_events_available() after exiting the loop.
>
> This patch looks overly complicated to me.
>
> It does the exact same thing as the "break" does, except:
>
>  - it does it the non-optimized way without the "avoid spinlock"
>
>  - it sets eavail if there was
>
> It would seem like the *much* simpler patch is to just do this oneliner instead:
>
>     diff --git a/fs/eventpoll.c b/fs/eventpoll.c
>     index 4df61129566d..29fa770ce1e3 100644
>     --- a/fs/eventpoll.c
>     +++ b/fs/eventpoll.c
>     @@ -1907,6 +1907,7 @@ static int ep_poll(struct eventpoll *ep, struct epoll_event __user *events,
>
>                 if (!schedule_hrtimeout_range(to, slack, HRTIMER_MODE_ABS)) {
>                         timed_out = 1;
>     +                   eavail = 1;
>                         break;
>                 }
>
> and *boom* you're done. That will mean that after a timeout we'll try
> one more time to just do that ep_events_available() thing.

Thank you for the suggestion. That was the first version I tried, and
I can confirm it fixes the race because we call ep_send_events() once
more before returning.  However, I believe that because timed_out is 1,
we won't goto fetch_events to call ep_events_available():

        if (!res && eavail &&
            !(res = ep_send_events(ep, events, maxevents)) && !timed_out)
                goto fetch_events;
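
For illustration only, here is a minimal userspace sketch of how that
condition short-circuits. It is not kernel code: toy_state,
toy_send_events() and toy_would_refetch() are hypothetical stand-ins,
and only the shape of the boolean expression is taken from the snippet
above.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the state ep_send_events() would touch. */
struct toy_state {
    int ready;       /* events that a send attempt would deliver */
    int send_calls;  /* how many times the send path ran */
};

/* Hypothetical stand-in for ep_send_events(): returns events delivered. */
static int toy_send_events(struct toy_state *st)
{
    st->send_calls++;
    return st->ready;
}

/*
 * Evaluates the quoted condition. Returns true where the kernel code
 * would "goto fetch_events"; *res_out receives the updated res.
 */
static bool toy_would_refetch(struct toy_state *st, int res, int eavail,
                              int timed_out, int *res_out)
{
    bool refetch = !res && eavail &&
                   !(res = toy_send_events(st)) && !timed_out;

    *res_out = res;
    return refetch;
}

/* Walks the cases discussed in the thread. */
void toy_refetch_demo(void)
{
    struct toy_state st = { .ready = 0, .send_calls = 0 };
    int res;

    /* Timeout with eavail forced to 1: one send attempt, no refetch. */
    assert(!toy_would_refetch(&st, 0, 1, 1, &res));
    assert(st.send_calls == 1 && res == 0);

    /* No timeout, nothing delivered: loop back to fetch_events. */
    assert(toy_would_refetch(&st, 0, 1, 0, &res));
    assert(st.send_calls == 2 && res == 0);

    /* eavail == 0: short-circuits before the send path runs at all. */
    assert(!toy_would_refetch(&st, 0, 0, 1, &res));
    assert(st.send_calls == 2);
}
```

In this model, setting eavail = 1 on timeout buys exactly one extra
pass through the send path, and the trailing !timed_out keeps control
from looping back to fetch_events.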

> I can see no downside to just setting eavail unconditionally and
> keeping the code much simpler. Hmm?

You're spot on that the patch is more complicated than your
suggestion.  However, the downside I observed was a performance
regression for the non-racy case: Suppose there are a few threads with
a similar non-zero timeout and no ready event. They will all
experience noticeable contention in ep_scan_ready_list() because they
unconditionally call ep_send_events(). The contention was large
because that path takes two write locks on ep->lock and one mutex lock
on ep->mtx, with a large critical section.

FWIW, I also tried the following idea to eliminate that contention, but
I couldn't prove that it's correct because of the gap between calling
ep_events_available() and removing this thread from the wait queue.
That is why I resorted to calling __remove_wait_queue() under the
lock.

     diff --git a/fs/eventpoll.c b/fs/eventpoll.c
     index 4df61129566d..29fa770ce1e3 100644
     --- a/fs/eventpoll.c
     +++ b/fs/eventpoll.c
     @@ -1907,6 +1907,7 @@ static int ep_poll(struct eventpoll *ep, struct epoll_event __user *events,

                 if (!schedule_hrtimeout_range(to, slack, HRTIMER_MODE_ABS)) {
                         timed_out = 1;
     +                   eavail = ep_events_available(ep);
                         break;
                 }
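
For illustration only, the gap mentioned above can be sketched with a
single-threaded toy model. It is not kernel code: struct toy_ep and all
toy_* helpers are hypothetical, and the model assumes that an arriving
event wakes (and dequeues) exactly one queued waiter and that the
timed-out waiter is first in the queue.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_MAX_WAITERS 4

/* Toy state, NOT the kernel's struct eventpoll. */
struct toy_ep {
    int ready;                        /* number of ready events */
    int waiters[TOY_MAX_WAITERS];     /* ids of queued waiters, FIFO */
    size_t nwaiters;
    int last_woken;                   /* id that got the last wakeup, -1 if none */
};

static void toy_init(struct toy_ep *ep)
{
    ep->ready = 0;
    ep->nwaiters = 0;
    ep->last_woken = -1;
}

static void toy_add_waiter(struct toy_ep *ep, int id)
{
    if (ep->nwaiters < TOY_MAX_WAITERS)
        ep->waiters[ep->nwaiters++] = id;
}

/* Dequeues id if present; a no-op otherwise. */
static void toy_remove_waiter(struct toy_ep *ep, int id)
{
    for (size_t i = 0; i < ep->nwaiters; i++) {
        if (ep->waiters[i] != id)
            continue;
        for (size_t j = i + 1; j < ep->nwaiters; j++)
            ep->waiters[j - 1] = ep->waiters[j];
        ep->nwaiters--;
        return;
    }
}

/* An event arrives: mark it ready, wake and dequeue one waiter (assumption). */
static void toy_post_event(struct toy_ep *ep)
{
    ep->ready++;
    if (ep->nwaiters == 0) {
        ep->last_woken = -1;
        return;
    }
    ep->last_woken = ep->waiters[0];
    toy_remove_waiter(ep, ep->last_woken);
}

/* Check, then dequeue, with an optional event landing in between. */
static int toy_timeout_unlocked(struct toy_ep *ep, int id, bool event_in_gap)
{
    int eavail = ep->ready > 0;       /* the check */
    if (event_in_gap)
        toy_post_event(ep);           /* wakeup goes to id, still queued */
    toy_remove_waiter(ep, id);        /* the dequeue */
    return eavail;
}

/* Check and dequeue as one step; an event can only land afterwards. */
static int toy_timeout_locked(struct toy_ep *ep, int id, bool event_after)
{
    int eavail = ep->ready > 0;
    toy_remove_waiter(ep, id);
    if (event_after)
        toy_post_event(ep);
    return eavail;
}

void toy_gap_demo(void)
{
    struct toy_ep ep;

    /* Unlocked: waiter 2 sees no event, yet absorbs the only wakeup. */
    toy_init(&ep);
    toy_add_waiter(&ep, 2);
    toy_add_waiter(&ep, 3);
    assert(toy_timeout_unlocked(&ep, 2, true) == 0);
    assert(ep.last_woken == 2);
    assert(ep.ready == 1 && ep.nwaiters == 1 && ep.waiters[0] == 3);

    /* Locked: waiter 2 is gone before the event, so waiter 3 is woken. */
    toy_init(&ep);
    toy_add_waiter(&ep, 2);
    toy_add_waiter(&ep, 3);
    assert(toy_timeout_locked(&ep, 2, true) == 0);
    assert(ep.last_woken == 3 && ep.nwaiters == 0);
}
```

In the unlocked run of this model, the timed-out waiter returns with
eavail == 0 while waiter 3 stays queued with an event pending. Doing
the check and the dequeue as one step rules that interleaving out.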

Also, please note that the non-optimized way (without the "avoid
spinlock" optimization) is not problematic in any of our benchmarks:
when a thread times out, it is most likely still on the wait queue,
except in this particular racy scenario.

Thanks!
Soheil


>              Linus


