From: Hou Tao <houtao1@huawei.com>
To: Jason Baron <jbaron@akamai.com>, Davidlohr Bueso <dave@stgolabs.net>
Cc: <akpm@linux-foundation.org>, <linux-kernel@vger.kernel.org>,
	<linux-fsdevel@vger.kernel.org>,
	Alexander Viro <viro@zeniv.linux.org.uk>,
	Salman Qazi <sqazi@google.com>
Subject: Re: [PATCH] epoll: avoid calling ep_call_nested() from ep_poll_safewake()
Date: Thu, 18 Jan 2018 19:00:38 +0800	[thread overview]
Message-ID: <e49485bd-a66e-5843-4b77-3801c3b70d54@huawei.com> (raw)
In-Reply-To: <05416148-bf19-7718-b6b4-7658b5aca2ee@akamai.com>

Hi Jason,

On 2017/10/18 22:03, Jason Baron wrote:
> 
> 
> On 10/17/2017 11:37 AM, Davidlohr Bueso wrote:
>> On Fri, 13 Oct 2017, Jason Baron wrote:
>>
>>> The ep_poll_safewake() function is used to wakeup potentially nested
>>> epoll
>>> file descriptors. The function uses ep_call_nested() to prevent entering
>>> the same wake up queue more than once, and to prevent excessively deep
>>> wakeup paths (deeper than EP_MAX_NESTS). However, this is not necessary
>>> since we are already preventing these conditions during EPOLL_CTL_ADD.
>>> This
>>> saves extra function calls, and avoids taking a global lock during the
>>> ep_call_nested() calls.
>>
>> This makes sense.
>>
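[A small userspace sketch (Python's select.epoll wrapper on Linux) of the EPOLL_CTL_ADD-time check referred to above: closing a cycle between two epoll fds is rejected with ELOOP at add time, which is why the wakeup path itself no longer needs to re-check. The fd names are illustrative only.]

```python
import errno
import select

ep_a = select.epoll()
ep_b = select.epoll()

# A watching B is allowed; the kernel records the nesting relationship.
ep_a.register(ep_b.fileno(), select.EPOLLIN)

# B watching A would close a cycle; the loop check performed at
# EPOLL_CTL_ADD time rejects it with ELOOP.
err = None
try:
    ep_b.register(ep_a.fileno(), select.EPOLLIN)
except OSError as e:
    err = e.errno

print(err == errno.ELOOP)  # True
```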
>>>
>>> I have, however, left ep_call_nested() for the CONFIG_DEBUG_LOCK_ALLOC
>>> case, since ep_call_nested() keeps track of the nesting level, and
>>> this is
>>> required by the call to spin_lock_irqsave_nested(). It would be nice to
>>> remove the ep_call_nested() calls for the CONFIG_DEBUG_LOCK_ALLOC case as
>>> well, however its not clear how to simply pass the nesting level through
>>> multiple wake_up() levels without more surgery. In any case, I don't
>>> think
>>> CONFIG_DEBUG_LOCK_ALLOC is generally used for production. This patch,
>>> also
>>> apparently fixes a workload at Google that Salman Qazi reported by
>>> completely removing the poll_safewake_ncalls->lock from wakeup paths.
>>
>> I'm a bit curious about the workload (which uses lots of EPOLL_CTL_ADDs) as
>> I was tackling the nested epoll scaling issue with loop and readywalk lists
>> in mind.
>>>
> 
> I'm not sure the details of the workload - perhaps Salman can elaborate
> further about it.
> 
> It would seem that the safewake would potentially be the most contended
> in general in the nested case, because generally you have a few epoll
> fds attached to lots of sources doing wakeups. So those sources are all
> going to conflict on the safewake lock. The readywalk is used when
> performing a 'nested' poll and in general this is likely going to be
> called on a few epoll fds. That said, we should remove it too. I will
> post a patch to remove it.
> 
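[To make the nested case concrete, here is a minimal userspace sketch (Python's select.epoll wrapper on Linux) of the wakeup path being discussed: an event source wakes an inner epoll fd, and that wakeup must propagate to an outer epoll fd watching it, which is the path ep_poll_safewake() serves in the kernel.]

```python
import os
import select

r, w = os.pipe()

inner = select.epoll()
inner.register(r, select.EPOLLIN)               # inner epoll watches a pipe

outer = select.epoll()
outer.register(inner.fileno(), select.EPOLLIN)  # nested: outer watches inner

os.write(w, b"x")  # a wakeup on the source must propagate inner -> outer

events = outer.poll(timeout=1)
print(len(events))  # 1: the inner epoll fd became readable
```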
> The loop lock is used during EPOLL_CTL_ADD to check for loops and deep
> wakeup paths and so I would expect this to be less common, but I
> wouldn't doubt there are workloads impacted by it. We can potentially, I
> think remove this one too - and the global 'epmutex'. I posted some
> ideas a while ago on it:
> 
> http://lkml.iu.edu/hypermail//linux/kernel/1501.1/05905.html
> 
> We can work through these ideas or others...

I have tested these patches on a simple non-nested epoll workload and found a performance
regression (-0.5%, which is better than the -1.0% from my patch set) under the normal
nginx conf, and a performance improvement (+3%, versus +4% for my patch set) under the
one-fd-per-request conf. The reason for the performance improvement is clear; however,
I haven't figured out the reason for the performance regression (maybe it is cache-related?).

It seems that your patch set works fine under both normal and nested epoll workloads,
so I don't plan to continue my patch set, which only works for the normal epoll workload.
I have one question about your patch set: it concerns the newly added fields in
struct file. I had tried to move these fields into struct eventpoll, so that only epoll fds
would be added into a disjoint set and the target fd would be linked to a disjoint set
dynamically, but that method didn't work out because there are multiple target fds, and
the order in which these target fds are added to a disjoint set can lead to deadlock.
So do you have other ideas to reduce the size of these fields?
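[For reference, the disjoint-set bookkeeping being discussed can be sketched generically; this is a minimal union-find with path halving, not the kernel implementation, and the fd names are hypothetical.]

```python
# Minimal union-find (disjoint set) sketch: epoll fds that are connected by
# watch relationships end up in the same component, so a loop/depth check
# only needs to consider that component.
parent = {}

def find(x):
    """Return the component root of x, halving the path as we walk up."""
    parent.setdefault(x, x)
    while parent[x] != x:
        parent[x] = parent[parent[x]]  # path halving keeps trees shallow
        x = parent[x]
    return x

def union(a, b):
    """Merge the components of a and b."""
    ra, rb = find(a), find(b)
    if ra != rb:
        parent[rb] = ra

union("ep1", "ep2")  # ep1 watches ep2
union("ep2", "ep3")  # ep2 watches ep3
print(find("ep3") == find("ep1"))  # True: one connected component
```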

Thanks,
Tao

The following lines are the performance results:

baseline: result for v4.14
ep_cc: result for v4.14 + ep_cc patch set
my: result for v4.14 + my patch set

The first 15 columns are the results from wrk, and their unit is Requests/sec;
the last two columns are their average value and standard deviation.
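[As a sanity check, the two summary columns can be recomputed from the 15 raw wrk numbers; the sketch below uses the baseline row of the normal scenario, copied from the table, and Python's statistics module (statistics.stdev() is the sample standard deviation, n - 1).]

```python
import statistics

# Baseline row of the normal (nginx + wrk) scenario, Requests/sec.
baseline = [
    376704.74, 377223.88, 378096.67, 379484.45, 379199.04,
    379526.21, 378008.82, 379959.98, 377634.47, 379127.27,
    377622.92, 379442.12, 378994.44, 376000.08, 377046.58,
]

print(f"{statistics.mean(baseline):.4f}")   # 378271.4447, as in the table
print(f"{statistics.stdev(baseline):.6f}")  # ≈ 1210.673592, as in the table
```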

* normal scenario: nginx + wrk

baseline
376704.74 377223.88 378096.67 379484.45 379199.04 379526.21 378008.82 \
379959.98 377634.47 379127.27 377622.92 379442.12 378994.44 376000.08 377046.58 \
378271.4447 1210.673592

ep_cc
373376.08 376897.65 373493.65 376154.44 377374.36 379080.88 375124.44 \
376404.16 376539.28 375141.84 379075.32 377214.39 374524.52 375605.87 372167.8 \
375878.312 1983.190539

my
373067.96 372369 375317.61 375564.66 373841.88 373699.88 376802.91 369988.97 \
376080.45 371580.64 374265.34 376370.34 377033.75 372786.91 375521.61 \
374286.1273 2059.92093


* one-fd-per-request scenario: nginx + wrk
baseline
124509.77 106542.46 118074.46 111434.02 117628.9 108260.56 119037.91 \
114413.75 108706.82 116277.99 111588.4 118216.56 121267.63 110950.4 116455.43 \
114891.004 5168.30288

ep_cc
124209.45 118297.8 120258.1 122785.95 118744.64 118471.76 117621.07 \
117525.14 114797.14 115348.47 117154.02 117208.37 118618.92 118578.8 118945.52 \
118571.01 2437.050182

my
125090.99 121408.92 116444.09 120055.4 119399.24 114938.04 122889.56 \
114363.49 120934.15 119052.03 118060.47 129976.45 121108.08 114849.93 114810.64 \
119558.7653 4338.18401

> Thanks,
> 
> -Jason
> 
> 
>>> Signed-off-by: Jason Baron <jbaron@akamai.com>
>>> Cc: Alexander Viro <viro@zeniv.linux.org.uk>
>>> Cc: Andrew Morton <akpm@linux-foundation.org>
>>> Cc: Salman Qazi <sqazi@google.com>
>>
>> Acked-by: Davidlohr Bueso <dbueso@suse.de>
>>
>>> ---
>>> fs/eventpoll.c | 47 ++++++++++++++++++-----------------------------
>>> 1 file changed, 18 insertions(+), 29 deletions(-)
>>
>> Yay for getting rid of some of the callback hell.
>>
>> Thanks,
>> Davidlohr
> 
> .
> 

Thread overview: 7+ messages
2017-10-13 18:48 [PATCH] epoll: avoid calling ep_call_nested() from ep_poll_safewake() Jason Baron
2017-10-17 15:37 ` Davidlohr Bueso
2017-10-18 14:03   ` Jason Baron
2017-10-28 14:05     ` Davidlohr Bueso
2017-10-31 14:14       ` Jason Baron
2018-01-18 11:00     ` Hou Tao [this message]
2018-01-18 21:18       ` Jason Baron
