All of lore.kernel.org
 help / color / mirror / Atom feed
From: xinhui <xinhui.pan@linux.vnet.ibm.com>
To: Waiman Long <waiman.long@hpe.com>
Cc: linux-kernel@vger.kernel.org, peterz@infradead.org, mingo@redhat.com
Subject: Re: [PATCH] pv-qspinlock: Try to re-hash the lock after spurious_wakeup
Date: Fri, 27 May 2016 18:32:54 +0800	[thread overview]
Message-ID: <201605271033.u4RATRXc042700@mx0a-001b2d01.pphosted.com> (raw)
In-Reply-To: <57474114.1030506@hpe.com>


On 2016年05月27日 02:31, Waiman Long wrote:
> On 05/25/2016 02:09 AM, Pan Xinhui wrote:
>> In pv_wait_head_or_lock, if there is a spurious_wakeup, and it fails to
>> get the lock as there is lock stealing, then after a short spin, we need
>> hash the lock again and enter pv_wait to yield.
>>
>> Currently after a spurious_wakeup, as l->locked is not _Q_SLOW_VAL,
>> pv_wait might do nothing and return directly, that is not
>> paravirt-friendly because pv_wait_head_or_lock will just spin on the
>> lock then.
>>
>> Signed-off-by: Pan Xinhui<xinhui.pan@linux.vnet.ibm.com>
>> ---
>>   kernel/locking/qspinlock_paravirt.h | 39 +++++++++++++++++++++++++++++--------
>>   1 file changed, 31 insertions(+), 8 deletions(-)
>
> Is this a problem you can easily reproduce on PPC? I have not observed this issue when testing on x86.
>
Hi, Waiman
	I notice the spurious_wakeup count is very high when I do benchmark tests and stress tests. So after a simple investigation,
I find pv_wait_head_or_lock() just keep loops.

	Here is my story, in my pv-qspinlcok patchset V1&&v2, pv_wait on ppc ignore the first two parameters of *ptr and val, that makes lock_stealing hit too much.
and when I change SPIN_THRESHOLD to a small value, system is very much unstable because waiter will enter pv_wait quickly and no one will kick waiter's cpu if
we enter pv_wait twice thanks to the lock_stealing.
	So what I do in my pv-qspinlcok patchset V3 is that add if (*ptr == val) in pv_wait. However as I mentioned above, then spurious_wakeup count is too high, that also means our cpu
slice is wasted.
	
thanks
xinhui
>> diff --git a/kernel/locking/qspinlock_paravirt.h b/kernel/locking/qspinlock_paravirt.h
>> index 2bbffe4..3482ce9 100644
>> --- a/kernel/locking/qspinlock_paravirt.h
>> +++ b/kernel/locking/qspinlock_paravirt.h
>> @@ -429,14 +429,15 @@ static void pv_kick_node(struct qspinlock *lock, struct mcs_spinlock *node)
>>           return;
>>
>>       /*
>> -     * Put the lock into the hash table and set the _Q_SLOW_VAL.
>> -     *
>> -     * As this is the same vCPU that will check the _Q_SLOW_VAL value and
>> -     * the hash table later on at unlock time, no atomic instruction is
>> -     * needed.
>> +     * Put the lock into the hash table and set the _Q_SLOW_VAL later
>>        */
>> -    WRITE_ONCE(l->locked, _Q_SLOW_VAL);
>>       (void)pv_hash(lock, pn);
>> +
>> +    /*
>> +    * Match the smp_load_acquire in pv_wait_head_or_lock()
>> +    * We mush set the _Q_SLOW_VAL after hash.
>> +    */
>> +    smp_store_release(&l->locked, _Q_SLOW_VAL);
>>   }
>>
>>   /*
>> @@ -454,6 +455,7 @@ pv_wait_head_or_lock(struct qspinlock *lock, struct mcs_spinlock *node)
>>       struct qspinlock **lp = NULL;
>>       int waitcnt = 0;
>>       int loop;
>> +    u8 lock_val;
>>
>>       /*
>>        * If pv_kick_node() already advanced our state, we don't need to
>> @@ -487,7 +489,7 @@ pv_wait_head_or_lock(struct qspinlock *lock, struct mcs_spinlock *node)
>>           clear_pending(lock);
>>
>>
>> -        if (!lp) { /* ONCE */
>> +        if (!lp) {
>>               lp = pv_hash(lock, pn);
>>
>>               /*
>> @@ -517,6 +519,13 @@ pv_wait_head_or_lock(struct qspinlock *lock, struct mcs_spinlock *node)
>>           qstat_inc(qstat_pv_wait_again, waitcnt);
>>
>>           /*
>> +        * make sure pv_kick_node has hashed the lock, so after pv_wait
>> +        * if ->locked is not _Q_SLOW_VAL, we can hash the lock again.
>> +        */
>> +        if (lp == (struct qspinlock **)1
>> +                &&  smp_load_acquire(&l->locked) == _Q_SLOW_VAL)
>> +            lp = (struct qspinlock **)2;
>> +        /*
>>            * Pass in the previous node vCPU nmber which is likely to be
>>            * the lock holder vCPU. This additional information may help
>>            * the hypervisor to give more resource to that vCPU so that
>> @@ -525,13 +534,27 @@ pv_wait_head_or_lock(struct qspinlock *lock, struct mcs_spinlock *node)
>>            */
>>           pv_wait(&l->locked, _Q_SLOW_VAL, pn->prev_cpu);
>>
>> +        lock_val = READ_ONCE(l->locked);
>> +
>> +        /* if ->locked is zero, then lock owner has unhashed the lock
>> +         * if ->locked is _Q_LOCKED_VAL,
>> +         * 1) pv_kick_node didnot hash the lock, and lp != 0x1
>> +         * 2) lock stealing, lock owner has unhashed the lock too
>> +         * 3) race with pv_kick_node, if lp == 2, we know it has hashed
>> +         * the lock and the lock is unhashed in unlock()
>> +         * if ->lock is _Q_SLOW_VAL, spurious_wakeup?
>> +        */
>> +        if (lock_val != _Q_SLOW_VAL) {
>> +            if (lock_val == 0 || lp != (struct qspinlock **)1)
>> +                lp = 0;
>> +        }
>
> It is a bit hard to verify the correctness of these checks as many variables are involved. Instead, I posted an alternative way to solve this problem by focusing just on the atomic setting of _Q_SLOW_VAL which should be easier to understand and verified. Would you mind testing that patch to see if it can fix the problem that you see?
>
> Cheers,
> Longman
>

  reply	other threads:[~2016-05-27 10:33 UTC|newest]

Thread overview: 7+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-05-25  6:09 [PATCH] pv-qspinlock: Try to re-hash the lock after spurious_wakeup Pan Xinhui
2016-05-26 18:31 ` Waiman Long
2016-05-27 10:32   ` xinhui [this message]
2016-05-28  3:41     ` Waiman Long
2016-05-30  8:53       ` xinhui
     [not found]       ` <201605300855.u4U8sLm5005684@mx0a-001b2d01.pphosted.com>
2016-05-31 18:13         ` Waiman Long
2016-06-01  5:54           ` xinhui

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=201605271033.u4RATRXc042700@mx0a-001b2d01.pphosted.com \
    --to=xinhui.pan@linux.vnet.ibm.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@redhat.com \
    --cc=peterz@infradead.org \
    --cc=waiman.long@hpe.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.