linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Jason Baron <jbaron@akamai.com>
To: Davidlohr Bueso <dave@stgolabs.net>
Cc: akpm@linux-foundation.org, linux-kernel@vger.kernel.org,
	linux-fsdevel@vger.kernel.org,
	Alexander Viro <viro@zeniv.linux.org.uk>,
	Salman Qazi <sqazi@google.com>
Subject: Re: [PATCH] epoll: avoid calling ep_call_nested() from ep_poll_safewake()
Date: Wed, 18 Oct 2017 10:03:10 -0400	[thread overview]
Message-ID: <05416148-bf19-7718-b6b4-7658b5aca2ee@akamai.com> (raw)
In-Reply-To: <20171017153737.kvrzzhvy55ocgb7u@linux-n805>



On 10/17/2017 11:37 AM, Davidlohr Bueso wrote:
> On Fri, 13 Oct 2017, Jason Baron wrote:
> 
>> The ep_poll_safewake() function is used to wakeup potentially nested
>> epoll
>> file descriptors. The function uses ep_call_nested() to prevent entering
>> the same wake up queue more than once, and to prevent excessively deep
>> wakeup paths (deeper than EP_MAX_NESTS). However, this is not necessary
>> since we are already preventing these conditions during EPOLL_CTL_ADD.
>> This
>> saves extra function calls, and avoids taking a global lock during the
>> ep_call_nested() calls.
> 
> This makes sense.
> 
>>
>> I have, however, left ep_call_nested() for the CONFIG_DEBUG_LOCK_ALLOC
>> case, since ep_call_nested() keeps track of the nesting level, and
>> this is
>> required by the call to spin_lock_irqsave_nested(). It would be nice to
>> remove the ep_call_nested() calls for the CONFIG_DEBUG_LOCK_ALLOC case as
>> well, however its not clear how to simply pass the nesting level through
>> multiple wake_up() levels without more surgery. In any case, I don't
>> think
>> CONFIG_DEBUG_LOCK_ALLOC is generally used for production. This patch,
>> also
>> apparently fixes a workload at Google that Salman Qazi reported by
>> completely removing the poll_safewake_ncalls->lock from wakeup paths.
> 
> I'm a bit curious about the workload (which uses lots of EPOLL_CTL_ADDs) as
> I was tackling the nested epoll scaling issue with loop and readywalk lists
> in mind.
>>

I'm not sure the details of the workload - perhaps Salman can elaborate
further about it.

It would seem that the safewake would potentially be the most contended
in general in the nested case, because generally you have a few epoll
fds attached to lots of sources doing wakeups. So those sources are all
going to conflict on the safewake lock. The readywalk is used when
performing a 'nested' poll and in general this is likely going to be
called on a few epoll fds. That said, we should remove it too. I will
post a patch to remove it.

The loop lock is used during EPOLL_CTL_ADD to check for loops and deep
wakeup paths and so I would expect this to be less common, but I
wouldn't doubt there are workloads impacted by it. We can potentially, I
think remove this one too - and the global 'epmutex'. I posted some
ideas a while ago on it:

http://lkml.iu.edu/hypermail//linux/kernel/1501.1/05905.html

We can work through these ideas or others...

Thanks,

-Jason


>> Signed-off-by: Jason Baron <jbaron@akamai.com>
>> Cc: Alexander Viro <viro@zeniv.linux.org.uk>
>> Cc: Andrew Morton <akpm@linux-foundation.org>
>> Cc: Salman Qazi <sqazi@google.com>
> 
> Acked-by: Davidlohr Bueso <dbueso@suse.de>
> 
>> ---
>> fs/eventpoll.c | 47 ++++++++++++++++++-----------------------------
>> 1 file changed, 18 insertions(+), 29 deletions(-)
> 
> Yay for getting rid of some of the callback hell.
> 
> Thanks,
> Davidlohr

  reply	other threads:[~2017-10-18 14:03 UTC|newest]

Thread overview: 7+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-10-13 18:48 [PATCH] epoll: avoid calling ep_call_nested() from ep_poll_safewake() Jason Baron
2017-10-17 15:37 ` Davidlohr Bueso
2017-10-18 14:03   ` Jason Baron [this message]
2017-10-28 14:05     ` Davidlohr Bueso
2017-10-31 14:14       ` Jason Baron
2018-01-18 11:00     ` Hou Tao
2018-01-18 21:18       ` Jason Baron

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=05416148-bf19-7718-b6b4-7658b5aca2ee@akamai.com \
    --to=jbaron@akamai.com \
    --cc=akpm@linux-foundation.org \
    --cc=dave@stgolabs.net \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=sqazi@google.com \
    --cc=viro@zeniv.linux.org.uk \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).