linux-kernel.vger.kernel.org archive mirror
From: Pan Xinhui <xinhui@linux.vnet.ibm.com>
To: Boqun Feng <boqun.feng@gmail.com>, Xinhui Pan <mnipxh@gmail.com>
Cc: Waiman Long <longman@redhat.com>,
	Peter Zijlstra <peterz@infradead.org>,
	Ingo Molnar <mingo@redhat.com>,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCH v2] locking/pvqspinlock: Relax cmpxchg's to improve performance on some archs
Date: Wed, 8 Feb 2017 15:09:33 +0800	[thread overview]
Message-ID: <9c22e27d-e80e-81f9-d7a0-992e80425f44@linux.vnet.ibm.com> (raw)
In-Reply-To: <20170208060939.GC9178@tardis.cn.ibm.com>



On 2017/2/8 14:09, Boqun Feng wrote:
> On Wed, Feb 08, 2017 at 12:05:40PM +0800, Boqun Feng wrote:
>> On Wed, Feb 08, 2017 at 11:39:10AM +0800, Xinhui Pan wrote:
>>> 2016-12-26 4:26 GMT+08:00 Waiman Long <longman@redhat.com>:
>>>
>>>> A number of cmpxchg calls in qspinlock_paravirt.h were replaced by more
>>>> relaxed versions to improve performance on architectures that use LL/SC.
>>>>
>>>> All the locking related cmpxchg's are replaced with the _acquire
>>>> variants:
>>>>  - pv_queued_spin_steal_lock()
>>>>  - trylock_clear_pending()
>>>>
>>>> The cmpxchg's related to hashing are replaced by either the _release
>>>> or the _relaxed variants. See the inline comment for details.
>>>>
>>>> Signed-off-by: Waiman Long <longman@redhat.com>
>>>>
>>>>  v1->v2:
>>>>   - Add comments in changelog and code for the rationale of the change.
>>>>
>>>> ---
>>>>  kernel/locking/qspinlock_paravirt.h | 50 ++++++++++++++++++++++++-------------
>>>>  1 file changed, 33 insertions(+), 17 deletions(-)
>>>>
>>>>
>>>> @@ -323,8 +329,14 @@ static void pv_wait_node(struct mcs_spinlock *node, struct mcs_spinlock *prev)
>>>>                  * If pv_kick_node() changed us to vcpu_hashed, retain that
>>>>                  * value so that pv_wait_head_or_lock() knows to not also try
>>>>                  * to hash this lock.
>>>> +                *
>>>> +                * The smp_store_mb() and control dependency above will ensure
>>>> +                * that state change won't happen before that. Synchronizing
>>>> +                * with pv_kick_node() wrt hashing by this waiter or by the
>>>> +                * lock holder is done solely by the state variable. There is
>>>> +                * no other ordering requirement.
>>>>                  */
>>>> -               cmpxchg(&pn->state, vcpu_halted, vcpu_running);
>>>> +               cmpxchg_relaxed(&pn->state, vcpu_halted, vcpu_running);
>>>>
>>>>                 /*
>>>>                  * If the locked flag is still not set after wakeup, it is a
>>>> @@ -360,9 +372,12 @@ static void pv_kick_node(struct qspinlock *lock, struct mcs_spinlock *node)
>>>>          * pv_wait_node(). If OTOH this fails, the vCPU was running and will
>>>>          * observe its next->locked value and advance itself.
>>>>          *
>>>> -        * Matches with smp_store_mb() and cmpxchg() in pv_wait_node()
>>>> +        * Matches with smp_store_mb() and cmpxchg_relaxed() in pv_wait_node().
>>>> +        * A release barrier is used here to ensure that node->locked is
>>>> +        * always set before changing the state. See comment in pv_wait_node().
>>>>          */
>>>> -       if (cmpxchg(&pn->state, vcpu_halted, vcpu_hashed) != vcpu_halted)
>>>> +       if (cmpxchg_release(&pn->state, vcpu_halted, vcpu_hashed)
>>>> +                       != vcpu_halted)
>>>>                 return;
>>>>
>>> Hi Waiman,
>>>
>>> We can't use _release here; a full barrier is needed.
>>>
>>> Consider pv_kick_node() vs. pv_wait_head_or_lock():
>>>
>>> [w] l->locked = _Q_SLOW_VAL  // reordered here
>>>                                         if (READ_ONCE(pn->state) == vcpu_hashed) // false
>>>                                                 lp = (struct qspinlock **)1;
>>> [STORE] pn->state = vcpu_hashed         lp = pv_hash(lock, pn);
>>> pv_hash()                               if (xchg(&l->locked, _Q_SLOW_VAL) == 0) // false, not unhashed
>>>
>>
>> This analysis is correct, but..
>>
>
> Hmm, looking at this again, I don't think this analysis is meaningful.
> Say the reordering didn't happen; we would still get (similar to your
> case):
>
But before that there is:
						cmpxchg_relaxed(&pn->state, vcpu_halted, vcpu_running);

> 						if (READ_ONCE(pn->state) == vcpu_hashed) // false.
> 						  lp = (struct qspinlock **)1;
>
> cmpxchg(pn->state, vcpu_halted, vcpu_hashed);
This cmpxchg() will observe the cmpxchg_relaxed() above, so it will fail
because pn->state is already vcpu_running. No bug here.

> 						  if(!lp) {
> 						    lp = pv_hash(lock, pn);
> WRITE_ONCE(l->locked, _Q_SLOW_VAL);
> pv_hash();
> 						    if (xchg(&l->locked, _Q_SLOW_VAL) == 0) // false, not unhashed.
>
> , right?
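
As a rough illustration of why only one of the two cmpxchg operations on
pn->state can succeed, here is a minimal userspace sketch using C11
atomics. It is not the kernel code: struct toy_node and the toy_* helpers
are hypothetical names, and the C11 memory orders are only stand-ins for
cmpxchg_relaxed() and the fully ordered cmpxchg(). It models the atomicity
of the state variable only, and says nothing about ordering against
l->locked.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* State names follow the thread; struct toy_node is a hypothetical stand-in. */
enum vcpu_state { vcpu_running = 0, vcpu_halted, vcpu_hashed };

struct toy_node {
	atomic_int state;
};

/* Waiter side: models cmpxchg_relaxed(&pn->state, vcpu_halted, vcpu_running). */
static bool toy_waiter_wakes(struct toy_node *pn)
{
	int expected = vcpu_halted;

	return atomic_compare_exchange_strong_explicit(&pn->state, &expected,
			vcpu_running, memory_order_relaxed, memory_order_relaxed);
}

/* Kicker side: models cmpxchg(&pn->state, vcpu_halted, vcpu_hashed). */
static bool toy_kicker_hashes(struct toy_node *pn)
{
	int expected = vcpu_halted;

	return atomic_compare_exchange_strong_explicit(&pn->state, &expected,
			vcpu_hashed, memory_order_seq_cst, memory_order_relaxed);
}

/* Run both possible orders on a single thread; exactly one side wins each time. */
static void toy_check_exclusive(void)
{
	struct toy_node a, b;

	atomic_init(&a.state, vcpu_halted);
	assert(toy_waiter_wakes(&a));    /* waiter goes first and wins */
	assert(!toy_kicker_hashes(&a));  /* kicker sees vcpu_running and fails */
	assert(atomic_load(&a.state) == vcpu_running);

	atomic_init(&b.state, vcpu_halted);
	assert(toy_kicker_hashes(&b));   /* kicker goes first and wins */
	assert(!toy_waiter_wakes(&b));   /* waiter sees vcpu_hashed and fails */
	assert(atomic_load(&b.state) == vcpu_hashed);
}
```

Calling toy_check_exclusive() exercises both orders. The memory-order
arguments do not change which side wins; they only matter for how other
variables are ordered around the state change.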

>
> Actually, I think this or your case could not happen because we have
>
> 	cmpxchg(pn->state, vcpu_halted, vcpu_running);
>
> in pv_wait_node(), which makes us either observe vcpu_hashed or set
> pn->state to vcpu_running before pv_kick_node() tries to do the hash.
>
> I may be missing something subtle, but does switching back to cmpxchg()
> fix the RCU stall you observed?
>
> Regards,
> Boqun
>
>>> Then the same lock is hashed twice but unhashed only once. So eventually,
>>> as the hash table grows big, we hit an RCU stall.
>>>
>>> I hit an RCU stall when running the netperf benchmark.
>>>
>>
>> How would a big hash table cause an RCU stall? Do you have the call trace
>> for your RCU stall?
>>
>> Regards,
>> Boqun
>>
>>> thanks
>>> xinhui
>>>
>>>
>>>> --
>>>> 1.8.3.1
>>>>
>>>>
>
>
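
To make the "hashed twice but unhashed once" scenario quoted above
concrete, here is a toy sketch of how such a mismatch leaks one entry per
occurrence. This is not pv_hash()/pv_unhash(): the table, its size, and all
toy_* names are assumptions for illustration, and it does not show how a
full table would lead to an RCU stall.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_HASH_SIZE 4	/* hypothetical; tiny so the leak shows quickly */

static const void *toy_table[TOY_HASH_SIZE];

/* Claim the first free slot for lock; return its index, or -1 if full. */
static int toy_hash(const void *lock)
{
	for (int i = 0; i < TOY_HASH_SIZE; i++) {
		if (!toy_table[i]) {
			toy_table[i] = lock;
			return i;
		}
	}
	return -1;
}

/* Release one slot holding lock; return its index, or -1 if not found. */
static int toy_unhash(const void *lock)
{
	for (int i = 0; i < TOY_HASH_SIZE; i++) {
		if (toy_table[i] == lock) {
			toy_table[i] = NULL;
			return i;
		}
	}
	return -1;
}

/*
 * Repeat "hash twice, unhash once" on an empty table and count how many
 * full rounds complete before a hash attempt finds no free slot.
 */
static int toy_rounds_until_full(void)
{
	int lock;	/* only its address is used, as a key */
	int rounds = 0;

	for (int i = 0; i < TOY_HASH_SIZE; i++)
		toy_table[i] = NULL;

	for (;;) {
		if (toy_hash(&lock) < 0 || toy_hash(&lock) < 0)
			return rounds;
		toy_unhash(&lock);	/* frees only one of the two entries */
		rounds++;
	}
}
```

Each round leaves one stale entry behind, so with TOY_HASH_SIZE slots the
table is exhausted after TOY_HASH_SIZE - 1 complete rounds.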
