linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Roman Penyaev <rpenyaev@suse.de>
To: Jason Baron <jbaron@akamai.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>,
	"Paul E. McKenney" <paulmck@linux.vnet.ibm.com>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [RFC PATCH 1/1] epoll: use rwlock in order to reduce ep_poll_callback() contention
Date: Wed, 05 Dec 2018 12:16:46 +0100	[thread overview]
Message-ID: <38cc83144a2ec332dead4e21214ea068@suse.de> (raw)
In-Reply-To: <45bce871-edfd-c402-acde-2e57e80cc522@akamai.com>

On 2018-12-04 18:23, Jason Baron wrote:
> On 12/3/18 6:02 AM, Roman Penyaev wrote:

[...]

>> 
>>  	ep_set_busy_poll_napi_id(epi);
>> 
>> @@ -1156,8 +1187,8 @@ static int ep_poll_callback(wait_queue_entry_t 
>> *wait, unsigned mode, int sync, v
>>  	 */
>>  	if (unlikely(ep->ovflist != EP_UNACTIVE_PTR)) {
>>  		if (epi->next == EP_UNACTIVE_PTR) {
>> -			epi->next = ep->ovflist;
>> -			ep->ovflist = epi;
>> +			/* Atomically exchange tail */
>> +			epi->next = xchg(&ep->ovflist, epi);
> 
> This also relies on the fact that the same epi can't be added to the
> list in parallel, because the wait queue doing the wakeup will have the
> wait_queue_head lock. That was a little confusing for me b/c we only 
> had
> the read_lock() above.

Yes, that is indeed not obvious path, but wq is locked by 
wake_up_*_poll()
call or caller of wake_up_locked_poll() has to be sure wq.lock is taken.

I'll add an explicit comment here, thanks for pointing out.

> 
>>  			if (epi->ws) {
>>  				/*
>>  				 * Activate ep->ws since epi->ws may get
>> @@ -1172,7 +1203,7 @@ static int ep_poll_callback(wait_queue_entry_t 
>> *wait, unsigned mode, int sync, v
>> 
>>  	/* If this file is already in the ready list we exit soon */
>>  	if (!ep_is_linked(epi)) {
>> -		list_add_tail(&epi->rdllink, &ep->rdllist);
>> +		list_add_tail_lockless(&epi->rdllink, &ep->rdllist);
>>  		ep_pm_stay_awake_rcu(epi);
>>  	}
> 
> same for this.

... and an explicit comment here.

> 
>> 
>> @@ -1197,13 +1228,13 @@ static int ep_poll_callback(wait_queue_entry_t 
>> *wait, unsigned mode, int sync, v
>>  				break;
>>  			}
>>  		}
>> -		wake_up_locked(&ep->wq);
>> +		wake_up(&ep->wq);
> 
> why the switch here to the locked() variant? Shouldn't the 'reader'
> side, in this case which takes the rwlock for write see all updates in 
> a
> coherent state at this point?

lockdep inside __wake_up_common expects wq_head->lock is taken, and
seems this is not a good idea to leave wq naked on wake up path,
when several CPUs can enter wake function.  Although 
default_wake_function
is protected by spinlock inside try_to_wake_up(), but for example
autoremove_wake_function() can't be called concurrently for the same wq
(it removes wq entry from the list).  Also in case of bookmarks
__wake_up_common adds an entry to the list, thus can't be called without
any locks.

I understand you concern and you are right saying that read side sees
wq entries as stable, but that will work only if __wake_up_common does
not modify anything, that is seems true now, but of course it is
too scary to rely on that in the future.

> 
>>  	}
>>  	if (waitqueue_active(&ep->poll_wait))
>>  		pwake++;
>> 
>>  out_unlock:
>> -	spin_unlock_irqrestore(&ep->wq.lock, flags);
>> +	read_unlock_irqrestore(&ep->lock, flags);
>> 
>>  	/* We have to call this outside the lock */
>>  	if (pwake)
>> @@ -1489,7 +1520,7 @@ static int ep_insert(struct eventpoll *ep, const 
>> struct epoll_event *event,
>>  		goto error_remove_epi;
>> 
>>  	/* We have to drop the new item inside our item list to keep track 
>> of it */
>> -	spin_lock_irq(&ep->wq.lock);
>> +	write_lock_irq(&ep->lock);
>> 
>>  	/* record NAPI ID of new item if present */
>>  	ep_set_busy_poll_napi_id(epi);
>> @@ -1501,12 +1532,12 @@ static int ep_insert(struct eventpoll *ep, 
>> const struct epoll_event *event,
>> 
>>  		/* Notify waiting tasks that events are available */
>>  		if (waitqueue_active(&ep->wq))
>> -			wake_up_locked(&ep->wq);
>> +			wake_up(&ep->wq);
> 
> is this necessary to switch as well? Is this to make lockdep happy?
> Looks like there are few more conversions too below...

Yes, necessary, wq.lock should be taken.

--
Roman


  parent reply	other threads:[~2018-12-05 11:16 UTC|newest]

Thread overview: 19+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-12-03 11:02 [RFC PATCH 1/1] epoll: use rwlock in order to reduce ep_poll_callback() contention Roman Penyaev
2018-12-03 17:34 ` Linus Torvalds
2018-12-04 11:50   ` Roman Penyaev
2018-12-04 23:59     ` Andrea Parri
2018-12-05 11:25       ` Roman Penyaev
2018-12-04 17:23 ` Jason Baron
2018-12-04 19:02   ` Paul E. McKenney
2018-12-05 11:22     ` Roman Penyaev
2018-12-05 11:16   ` Roman Penyaev [this message]
2018-12-05 16:38     ` Jason Baron
2018-12-05 20:11       ` Roman Penyaev
2018-12-06  1:54   ` Davidlohr Bueso
2018-12-06  3:08   ` Davidlohr Bueso
2018-12-06 10:27     ` Roman Penyaev
2018-12-06  4:04   ` Davidlohr Bueso
2018-12-06 10:25     ` Roman Penyaev
2018-12-05 23:46 ` Eric Wong
2018-12-06 10:52   ` Roman Penyaev
2018-12-06 20:35     ` Eric Wong

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=38cc83144a2ec332dead4e21214ea068@suse.de \
    --to=rpenyaev@suse.de \
    --cc=jbaron@akamai.com \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=paulmck@linux.vnet.ibm.com \
    --cc=torvalds@linux-foundation.org \
    --cc=viro@zeniv.linux.org.uk \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).