All of lore.kernel.org
 help / color / mirror / Atom feed
From: Jason Baron <jbaron@akamai.com>
To: Davidlohr Bueso <dave@stgolabs.net>
Cc: akpm@linux-foundation.org, linux-kernel@vger.kernel.org,
	linux-fsdevel@vger.kernel.org,
	Alexander Viro <viro@zeniv.linux.org.uk>,
	Salman Qazi <sqazi@google.com>
Subject: Re: [PATCH] epoll: avoid calling ep_call_nested() from ep_poll_safewake()
Date: Wed, 18 Oct 2017 10:03:10 -0400	[thread overview]
Message-ID: <05416148-bf19-7718-b6b4-7658b5aca2ee@akamai.com> (raw)
In-Reply-To: <20171017153737.kvrzzhvy55ocgb7u@linux-n805>



On 10/17/2017 11:37 AM, Davidlohr Bueso wrote:
> On Fri, 13 Oct 2017, Jason Baron wrote:
> 
>> The ep_poll_safewake() function is used to wakeup potentially nested
>> epoll
>> file descriptors. The function uses ep_call_nested() to prevent entering
>> the same wake up queue more than once, and to prevent excessively deep
>> wakeup paths (deeper than EP_MAX_NESTS). However, this is not necessary
>> since we are already preventing these conditions during EPOLL_CTL_ADD.
>> This
>> saves extra function calls, and avoids taking a global lock during the
>> ep_call_nested() calls.
> 
> This makes sense.
> 
>>
>> I have, however, left ep_call_nested() for the CONFIG_DEBUG_LOCK_ALLOC
>> case, since ep_call_nested() keeps track of the nesting level, and
>> this is
>> required by the call to spin_lock_irqsave_nested(). It would be nice to
>> remove the ep_call_nested() calls for the CONFIG_DEBUG_LOCK_ALLOC case as
>> well, however its not clear how to simply pass the nesting level through
>> multiple wake_up() levels without more surgery. In any case, I don't
>> think
>> CONFIG_DEBUG_LOCK_ALLOC is generally used for production. This patch,
>> also
>> apparently fixes a workload at Google that Salman Qazi reported by
>> completely removing the poll_safewake_ncalls->lock from wakeup paths.
> 
> I'm a bit curious about the workload (which uses lots of EPOLL_CTL_ADDs) as
> I was tackling the nested epoll scaling issue with loop and readywalk lists
> in mind.
>>

I'm not sure the details of the workload - perhaps Salman can elaborate
further about it.

It would seem that the safewake would potentially be the most contended
in general in the nested case, because generally you have a few epoll
fds attached to lots of sources doing wakeups. So those sources are all
going to conflict on the safewake lock. The readywalk is used when
performing a 'nested' poll and in general this is likely going to be
called on a few epoll fds. That said, we should remove it too. I will
post a patch to remove it.

The loop lock is used during EPOLL_CTL_ADD to check for loops and deep
wakeup paths and so I would expect this to be less common, but I
wouldn't doubt there are workloads impacted by it. We can potentially, I
think remove this one too - and the global 'epmutex'. I posted some
ideas a while ago on it:

http://lkml.iu.edu/hypermail//linux/kernel/1501.1/05905.html

We can work through these ideas or others...

Thanks,

-Jason


>> Signed-off-by: Jason Baron <jbaron@akamai.com>
>> Cc: Alexander Viro <viro@zeniv.linux.org.uk>
>> Cc: Andrew Morton <akpm@linux-foundation.org>
>> Cc: Salman Qazi <sqazi@google.com>
> 
> Acked-by: Davidlohr Bueso <dbueso@suse.de>
> 
>> ---
>> fs/eventpoll.c | 47 ++++++++++++++++++-----------------------------
>> 1 file changed, 18 insertions(+), 29 deletions(-)
> 
> Yay for getting rid of some of the callback hell.
> 
> Thanks,
> Davidlohr

  reply	other threads:[~2017-10-18 14:03 UTC|newest]

Thread overview: 7+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-10-13 18:48 [PATCH] epoll: avoid calling ep_call_nested() from ep_poll_safewake() Jason Baron
2017-10-17 15:37 ` Davidlohr Bueso
2017-10-18 14:03   ` Jason Baron [this message]
2017-10-28 14:05     ` Davidlohr Bueso
2017-10-31 14:14       ` Jason Baron
2018-01-18 11:00     ` Hou Tao
2018-01-18 21:18       ` Jason Baron

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=05416148-bf19-7718-b6b4-7658b5aca2ee@akamai.com \
    --to=jbaron@akamai.com \
    --cc=akpm@linux-foundation.org \
    --cc=dave@stgolabs.net \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=sqazi@google.com \
    --cc=viro@zeniv.linux.org.uk \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.