linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Jason Baron <jbaron@akamai.com>
To: Roman Penyaev <rpenyaev@suse.de>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>,
	"Paul E. McKenney" <paulmck@linux.vnet.ibm.com>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [RFC PATCH 1/1] epoll: use rwlock in order to reduce ep_poll_callback() contention
Date: Wed, 5 Dec 2018 11:38:09 -0500	[thread overview]
Message-ID: <b8c6ef78-f0ea-2bc5-566a-9482061fffb2@akamai.com> (raw)
In-Reply-To: <38cc83144a2ec332dead4e21214ea068@suse.de>



On 12/5/18 6:16 AM, Roman Penyaev wrote:
> On 2018-12-04 18:23, Jason Baron wrote:
>> On 12/3/18 6:02 AM, Roman Penyaev wrote:
> 
> [...]
> 
>>>
>>>      ep_set_busy_poll_napi_id(epi);
>>>
>>> @@ -1156,8 +1187,8 @@ static int ep_poll_callback(wait_queue_entry_t *wait, unsigned mode, int sync, v
>>>       */
>>>      if (unlikely(ep->ovflist != EP_UNACTIVE_PTR)) {
>>>          if (epi->next == EP_UNACTIVE_PTR) {
>>> -            epi->next = ep->ovflist;
>>> -            ep->ovflist = epi;
>>> +            /* Atomically exchange tail */
>>> +            epi->next = xchg(&ep->ovflist, epi);
>>
>> This also relies on the fact that the same epi can't be added to the
>> list in parallel, because the wait queue doing the wakeup will have the
>> wait_queue_head lock. That was a little confusing for me b/c we only had
>> the read_lock() above.
> 
> Yes, that is indeed not obvious path, but wq is locked by wake_up_*_poll()
> call or caller of wake_up_locked_poll() has to be sure wq.lock is taken.
> 
> I'll add an explicit comment here, thanks for pointing out.
> 
>>
>>>              if (epi->ws) {
>>>                  /*
>>>                   * Activate ep->ws since epi->ws may get
>>> @@ -1172,7 +1203,7 @@ static int ep_poll_callback(wait_queue_entry_t *wait, unsigned mode, int sync, v
>>>
>>>      /* If this file is already in the ready list we exit soon */
>>>      if (!ep_is_linked(epi)) {
>>> -        list_add_tail(&epi->rdllink, &ep->rdllist);
>>> +        list_add_tail_lockless(&epi->rdllink, &ep->rdllist);
>>>          ep_pm_stay_awake_rcu(epi);
>>>      }
>>
>> same for this.
> 
> ... and an explicit comment here.
> 
>>
>>>
>>> @@ -1197,13 +1228,13 @@ static int ep_poll_callback(wait_queue_entry_t *wait, unsigned mode, int sync, v
>>>                  break;
>>>              }
>>>          }
>>> -        wake_up_locked(&ep->wq);
>>> +        wake_up(&ep->wq);
>>
>> why the switch here to the locked() variant? Shouldn't the 'reader'
>> side, in this case which takes the rwlock for write see all updates in a
>> coherent state at this point?
> 
> lockdep inside __wake_up_common expects wq_head->lock is taken, and
> seems this is not a good idea to leave wq naked on wake up path,
> when several CPUs can enter wake function.  Although default_wake_function
> is protected by spinlock inside try_to_wake_up(), but for example
> autoremove_wake_function() can't be called concurrently for the same wq
> (it removes wq entry from the list).  Also in case of bookmarks
> __wake_up_common adds an entry to the list, thus can't be called without
> any locks.
> 
> I understand you concern and you are right saying that read side sees
> wq entries as stable, but that will work only if __wake_up_common does
> not modify anything, that is seems true now, but of course it is
> too scary to rely on that in the future.

I think it might be interesting for, at least testing, to see if not grabbing
wq.lock improves your benchmarks any further? fwiw, epoll only recently started
grabbing wq.lock bc lockdep required it.

Thanks,

-Jason

  reply	other threads:[~2018-12-05 16:38 UTC|newest]

Thread overview: 19+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-12-03 11:02 [RFC PATCH 1/1] epoll: use rwlock in order to reduce ep_poll_callback() contention Roman Penyaev
2018-12-03 17:34 ` Linus Torvalds
2018-12-04 11:50   ` Roman Penyaev
2018-12-04 23:59     ` Andrea Parri
2018-12-05 11:25       ` Roman Penyaev
2018-12-04 17:23 ` Jason Baron
2018-12-04 19:02   ` Paul E. McKenney
2018-12-05 11:22     ` Roman Penyaev
2018-12-05 11:16   ` Roman Penyaev
2018-12-05 16:38     ` Jason Baron [this message]
2018-12-05 20:11       ` Roman Penyaev
2018-12-06  1:54   ` Davidlohr Bueso
2018-12-06  3:08   ` Davidlohr Bueso
2018-12-06 10:27     ` Roman Penyaev
2018-12-06  4:04   ` Davidlohr Bueso
2018-12-06 10:25     ` Roman Penyaev
2018-12-05 23:46 ` Eric Wong
2018-12-06 10:52   ` Roman Penyaev
2018-12-06 20:35     ` Eric Wong

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=b8c6ef78-f0ea-2bc5-566a-9482061fffb2@akamai.com \
    --to=jbaron@akamai.com \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=paulmck@linux.vnet.ibm.com \
    --cc=rpenyaev@suse.de \
    --cc=torvalds@linux-foundation.org \
    --cc=viro@zeniv.linux.org.uk \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).