* [PATCH] pv-qspinlock: Try to re-hash the lock after spurious_wakeup
@ 2016-05-25  6:09 Pan Xinhui
  2016-05-26 18:31 ` Waiman Long
  0 siblings, 1 reply; 7+ messages in thread
From: Pan Xinhui @ 2016-05-25  6:09 UTC (permalink / raw)
  To: linux-kernel; +Cc: peterz, mingo, Waiman.Long, Pan Xinhui

In pv_wait_head_or_lock(), if there is a spurious wakeup and the vCPU then
fails to get the lock because of lock stealing, then after a short spin we
need to hash the lock again and enter pv_wait() to yield.

Currently, after a spurious wakeup, l->locked is not _Q_SLOW_VAL, so
pv_wait() might do nothing and return directly. That is not
paravirt-friendly, because pv_wait_head_or_lock() will then just spin on
the lock.

Signed-off-by: Pan Xinhui <xinhui.pan@linux.vnet.ibm.com>
---
 kernel/locking/qspinlock_paravirt.h | 39 +++++++++++++++++++++++++++++--------
 1 file changed, 31 insertions(+), 8 deletions(-)

diff --git a/kernel/locking/qspinlock_paravirt.h b/kernel/locking/qspinlock_paravirt.h
index 2bbffe4..3482ce9 100644
--- a/kernel/locking/qspinlock_paravirt.h
+++ b/kernel/locking/qspinlock_paravirt.h
@@ -429,14 +429,15 @@ static void pv_kick_node(struct qspinlock *lock, struct mcs_spinlock *node)
 		return;
 
 	/*
-	 * Put the lock into the hash table and set the _Q_SLOW_VAL.
-	 *
-	 * As this is the same vCPU that will check the _Q_SLOW_VAL value and
-	 * the hash table later on at unlock time, no atomic instruction is
-	 * needed.
+	 * Put the lock into the hash table and set _Q_SLOW_VAL later.
 	 */
-	WRITE_ONCE(l->locked, _Q_SLOW_VAL);
 	(void)pv_hash(lock, pn);
+
+	/*
+	 * Match the smp_load_acquire() in pv_wait_head_or_lock().
+	 * We must set _Q_SLOW_VAL only after the lock has been hashed.
+	 */
+	smp_store_release(&l->locked, _Q_SLOW_VAL);
 }
 
 /*
@@ -454,6 +455,7 @@ pv_wait_head_or_lock(struct qspinlock *lock, struct mcs_spinlock *node)
 	struct qspinlock **lp = NULL;
 	int waitcnt = 0;
 	int loop;
+	u8 lock_val;
 
 	/*
 	 * If pv_kick_node() already advanced our state, we don't need to
@@ -487,7 +489,7 @@ pv_wait_head_or_lock(struct qspinlock *lock, struct mcs_spinlock *node)
 		clear_pending(lock);
 
 
-		if (!lp) { /* ONCE */
+		if (!lp) {
 			lp = pv_hash(lock, pn);
 
 			/*
@@ -517,6 +519,13 @@ pv_wait_head_or_lock(struct qspinlock *lock, struct mcs_spinlock *node)
 		qstat_inc(qstat_pv_wait_again, waitcnt);
 
 		/*
+		 * Make sure pv_kick_node() has hashed the lock, so that after
+		 * pv_wait(), if ->locked is not _Q_SLOW_VAL, we can hash the lock again.
+		 */
+		if (lp == (struct qspinlock **)1
+				&& smp_load_acquire(&l->locked) == _Q_SLOW_VAL)
+			lp = (struct qspinlock **)2;
+		/*
 		 * Pass in the previous node vCPU number which is likely to be
 		 * the lock holder vCPU. This additional information may help
 		 * the hypervisor to give more resource to that vCPU so that
@@ -525,13 +534,27 @@ pv_wait_head_or_lock(struct qspinlock *lock, struct mcs_spinlock *node)
 		 */
 		pv_wait(&l->locked, _Q_SLOW_VAL, pn->prev_cpu);
 
+		lock_val = READ_ONCE(l->locked);
+
+		/* If ->locked is zero, the lock owner has unhashed the lock.
+		 * If ->locked is _Q_LOCKED_VAL, either:
+		 * 1) pv_kick_node() did not hash the lock, and lp != 0x1;
+		 * 2) the lock was stolen, so the lock owner has unhashed it too;
+		 * 3) we raced with pv_kick_node(); if lp == 2, we know it has
+		 *    hashed the lock and the lock was unhashed in unlock().
+		 * If ->locked is _Q_SLOW_VAL, this is a spurious wakeup.
+		 */
+		if (lock_val != _Q_SLOW_VAL) {
+			if (lock_val == 0 || lp != (struct qspinlock **)1)
+				lp = 0;
+		}
 		/*
 		 * The unlocker should have freed the lock before kicking the
 		 * CPU. So if the lock is still not free, it is a spurious
 		 * wakeup or another vCPU has stolen the lock. The current
 		 * vCPU should spin again.
 		 */
-		qstat_inc(qstat_pv_spurious_wakeup, READ_ONCE(l->locked));
+		qstat_inc(qstat_pv_spurious_wakeup, lock_val);
 	}
 
 	/*
-- 
1.9.1


* Re: [PATCH] pv-qspinlock: Try to re-hash the lock after spurious_wakeup
  2016-05-25  6:09 [PATCH] pv-qspinlock: Try to re-hash the lock after spurious_wakeup Pan Xinhui
@ 2016-05-26 18:31 ` Waiman Long
  2016-05-27 10:32   ` xinhui
  0 siblings, 1 reply; 7+ messages in thread
From: Waiman Long @ 2016-05-26 18:31 UTC (permalink / raw)
  To: Pan Xinhui; +Cc: linux-kernel, peterz, mingo

On 05/25/2016 02:09 AM, Pan Xinhui wrote:
> In pv_wait_head_or_lock, if there is a spurious_wakeup, and it fails to
> get the lock as there is lock stealing, then after a short spin, we need
> hash the lock again and enter pv_wait to yield.
>
> Currently after a spurious_wakeup, as l->locked is not _Q_SLOW_VAL,
> pv_wait might do nothing and return directly, that is not
> paravirt-friendly because pv_wait_head_or_lock will just spin on the
> lock then.
>
> Signed-off-by: Pan Xinhui<xinhui.pan@linux.vnet.ibm.com>
> ---
>   kernel/locking/qspinlock_paravirt.h | 39 +++++++++++++++++++++++++++++--------
>   1 file changed, 31 insertions(+), 8 deletions(-)

Is this a problem you can easily reproduce on PPC? I have not observed 
this issue when testing on x86.

>
> diff --git a/kernel/locking/qspinlock_paravirt.h b/kernel/locking/qspinlock_paravirt.h
> index 2bbffe4..3482ce9 100644
> --- a/kernel/locking/qspinlock_paravirt.h
> +++ b/kernel/locking/qspinlock_paravirt.h
> @@ -429,14 +429,15 @@ static void pv_kick_node(struct qspinlock *lock, struct mcs_spinlock *node)
>   		return;
>
>   	/*
> -	 * Put the lock into the hash table and set the _Q_SLOW_VAL.
> -	 *
> -	 * As this is the same vCPU that will check the _Q_SLOW_VAL value and
> -	 * the hash table later on at unlock time, no atomic instruction is
> -	 * needed.
> +	 * Put the lock into the hash table and set the _Q_SLOW_VAL later
>   	 */
> -	WRITE_ONCE(l->locked, _Q_SLOW_VAL);
>   	(void)pv_hash(lock, pn);
> +
> +	/*
> +	* Match the smp_load_acquire in pv_wait_head_or_lock()
> +	* We mush set the _Q_SLOW_VAL after hash.
> +	*/
> +	smp_store_release(&l->locked, _Q_SLOW_VAL);
>   }
>
>   /*
> @@ -454,6 +455,7 @@ pv_wait_head_or_lock(struct qspinlock *lock, struct mcs_spinlock *node)
>   	struct qspinlock **lp = NULL;
>   	int waitcnt = 0;
>   	int loop;
> +	u8 lock_val;
>
>   	/*
>   	 * If pv_kick_node() already advanced our state, we don't need to
> @@ -487,7 +489,7 @@ pv_wait_head_or_lock(struct qspinlock *lock, struct mcs_spinlock *node)
>   		clear_pending(lock);
>
>
> -		if (!lp) { /* ONCE */
> +		if (!lp) {
>   			lp = pv_hash(lock, pn);
>
>   			/*
> @@ -517,6 +519,13 @@ pv_wait_head_or_lock(struct qspinlock *lock, struct mcs_spinlock *node)
>   		qstat_inc(qstat_pv_wait_again, waitcnt);
>
>   		/*
> +		* make sure pv_kick_node has hashed the lock, so after pv_wait
> +		* if ->locked is not _Q_SLOW_VAL, we can hash the lock again.
> +		*/
> +		if (lp == (struct qspinlock **)1
> +				&&  smp_load_acquire(&l->locked) == _Q_SLOW_VAL)
> +			lp = (struct qspinlock **)2;
> +		/*
>   		 * Pass in the previous node vCPU nmber which is likely to be
>   		 * the lock holder vCPU. This additional information may help
>   		 * the hypervisor to give more resource to that vCPU so that
> @@ -525,13 +534,27 @@ pv_wait_head_or_lock(struct qspinlock *lock, struct mcs_spinlock *node)
>   		 */
>   		pv_wait(&l->locked, _Q_SLOW_VAL, pn->prev_cpu);
>
> +		lock_val = READ_ONCE(l->locked);
> +
> +		/* if ->locked is zero, then lock owner has unhashed the lock
> +		 * if ->locked is _Q_LOCKED_VAL,
> +		 * 1) pv_kick_node didnot hash the lock, and lp != 0x1
> +		 * 2) lock stealing, lock owner has unhashed the lock too
> +		 * 3) race with pv_kick_node, if lp == 2, we know it has hashed
> +		 * the lock and the lock is unhashed in unlock()
> +		 * if ->lock is _Q_SLOW_VAL, spurious_wakeup?
> +		*/
> +		if (lock_val != _Q_SLOW_VAL) {
> +			if (lock_val == 0 || lp != (struct qspinlock **)1)
> +				lp = 0;
> +		}

It is a bit hard to verify the correctness of these checks as many
variables are involved. Instead, I have posted an alternative way to solve
this problem by focusing just on the atomic setting of _Q_SLOW_VAL, which
should be easier to understand and verify. Would you mind testing that
patch to see if it fixes the problem that you are seeing?

Cheers,
Longman


* Re: [PATCH] pv-qspinlock: Try to re-hash the lock after spurious_wakeup
  2016-05-26 18:31 ` Waiman Long
@ 2016-05-27 10:32   ` xinhui
  2016-05-28  3:41     ` Waiman Long
  0 siblings, 1 reply; 7+ messages in thread
From: xinhui @ 2016-05-27 10:32 UTC (permalink / raw)
  To: Waiman Long; +Cc: linux-kernel, peterz, mingo


On 2016-05-27 02:31, Waiman Long wrote:
> On 05/25/2016 02:09 AM, Pan Xinhui wrote:
>> In pv_wait_head_or_lock, if there is a spurious_wakeup, and it fails to
>> get the lock as there is lock stealing, then after a short spin, we need
>> hash the lock again and enter pv_wait to yield.
>>
>> Currently after a spurious_wakeup, as l->locked is not _Q_SLOW_VAL,
>> pv_wait might do nothing and return directly, that is not
>> paravirt-friendly because pv_wait_head_or_lock will just spin on the
>> lock then.
>>
>> Signed-off-by: Pan Xinhui<xinhui.pan@linux.vnet.ibm.com>
>> ---
>>   kernel/locking/qspinlock_paravirt.h | 39 +++++++++++++++++++++++++++++--------
>>   1 file changed, 31 insertions(+), 8 deletions(-)
>
> Is this a problem you can easily reproduce on PPC? I have not observed this issue when testing on x86.
>
Hi, Waiman
	I noticed that the spurious_wakeup count is very high when I run benchmark and stress tests. So after a quick investigation,
I found that pv_wait_head_or_lock() just keeps looping.

	Here is my story. In my pv-qspinlock patchset V1 and V2, pv_wait() on ppc ignores its first two parameters, *ptr and val, and that makes lock stealing hit far too often.
And when I changed SPIN_THRESHOLD to a small value, the system became very unstable, because a waiter enters pv_wait() quickly and no one will kick the waiter's cpu if
we enter pv_wait() twice thanks to the lock stealing.
	So what I did in my pv-qspinlock patchset V3 is add an if (*ptr == val) check in pv_wait(). However, as I mentioned above, the spurious_wakeup count then becomes too high, which also means our cpu
slices are wasted.
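
	Roughly, the V3 check looks like this (just a simplified sketch of the
idea, not the actual patchset code; __yield_cpu() is an illustrative name
for the ppc yield hyper-call helper, and lockholder_cpu is the cpu hint my
series passes in):

	static void __pv_wait(u8 *ptr, u8 val, int lockholder_cpu)
	{
		if (READ_ONCE(*ptr) != val)
			return;	/* lock word already changed, do not sleep */
		__yield_cpu(lockholder_cpu);	/* give our cpu slice away */
	}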
	
thanks
xinhui
>> diff --git a/kernel/locking/qspinlock_paravirt.h b/kernel/locking/qspinlock_paravirt.h
>> index 2bbffe4..3482ce9 100644
>> --- a/kernel/locking/qspinlock_paravirt.h
>> +++ b/kernel/locking/qspinlock_paravirt.h
>> @@ -429,14 +429,15 @@ static void pv_kick_node(struct qspinlock *lock, struct mcs_spinlock *node)
>>           return;
>>
>>       /*
>> -     * Put the lock into the hash table and set the _Q_SLOW_VAL.
>> -     *
>> -     * As this is the same vCPU that will check the _Q_SLOW_VAL value and
>> -     * the hash table later on at unlock time, no atomic instruction is
>> -     * needed.
>> +     * Put the lock into the hash table and set the _Q_SLOW_VAL later
>>        */
>> -    WRITE_ONCE(l->locked, _Q_SLOW_VAL);
>>       (void)pv_hash(lock, pn);
>> +
>> +    /*
>> +    * Match the smp_load_acquire in pv_wait_head_or_lock()
>> +    * We mush set the _Q_SLOW_VAL after hash.
>> +    */
>> +    smp_store_release(&l->locked, _Q_SLOW_VAL);
>>   }
>>
>>   /*
>> @@ -454,6 +455,7 @@ pv_wait_head_or_lock(struct qspinlock *lock, struct mcs_spinlock *node)
>>       struct qspinlock **lp = NULL;
>>       int waitcnt = 0;
>>       int loop;
>> +    u8 lock_val;
>>
>>       /*
>>        * If pv_kick_node() already advanced our state, we don't need to
>> @@ -487,7 +489,7 @@ pv_wait_head_or_lock(struct qspinlock *lock, struct mcs_spinlock *node)
>>           clear_pending(lock);
>>
>>
>> -        if (!lp) { /* ONCE */
>> +        if (!lp) {
>>               lp = pv_hash(lock, pn);
>>
>>               /*
>> @@ -517,6 +519,13 @@ pv_wait_head_or_lock(struct qspinlock *lock, struct mcs_spinlock *node)
>>           qstat_inc(qstat_pv_wait_again, waitcnt);
>>
>>           /*
>> +        * make sure pv_kick_node has hashed the lock, so after pv_wait
>> +        * if ->locked is not _Q_SLOW_VAL, we can hash the lock again.
>> +        */
>> +        if (lp == (struct qspinlock **)1
>> +                &&  smp_load_acquire(&l->locked) == _Q_SLOW_VAL)
>> +            lp = (struct qspinlock **)2;
>> +        /*
>>            * Pass in the previous node vCPU nmber which is likely to be
>>            * the lock holder vCPU. This additional information may help
>>            * the hypervisor to give more resource to that vCPU so that
>> @@ -525,13 +534,27 @@ pv_wait_head_or_lock(struct qspinlock *lock, struct mcs_spinlock *node)
>>            */
>>           pv_wait(&l->locked, _Q_SLOW_VAL, pn->prev_cpu);
>>
>> +        lock_val = READ_ONCE(l->locked);
>> +
>> +        /* if ->locked is zero, then lock owner has unhashed the lock
>> +         * if ->locked is _Q_LOCKED_VAL,
>> +         * 1) pv_kick_node didnot hash the lock, and lp != 0x1
>> +         * 2) lock stealing, lock owner has unhashed the lock too
>> +         * 3) race with pv_kick_node, if lp == 2, we know it has hashed
>> +         * the lock and the lock is unhashed in unlock()
>> +         * if ->lock is _Q_SLOW_VAL, spurious_wakeup?
>> +        */
>> +        if (lock_val != _Q_SLOW_VAL) {
>> +            if (lock_val == 0 || lp != (struct qspinlock **)1)
>> +                lp = 0;
>> +        }
>
> It is a bit hard to verify the correctness of these checks as many variables are involved. Instead, I posted an alternative way to solve this problem by focusing just on the atomic setting of _Q_SLOW_VAL which should be easier to understand and verified. Would you mind testing that patch to see if it can fix the problem that you see?
>
> Cheers,
> Longman
>


* Re: [PATCH] pv-qspinlock: Try to re-hash the lock after spurious_wakeup
  2016-05-27 10:32   ` xinhui
@ 2016-05-28  3:41     ` Waiman Long
  2016-05-30  8:53       ` xinhui
       [not found]       ` <201605300855.u4U8sLm5005684@mx0a-001b2d01.pphosted.com>
  0 siblings, 2 replies; 7+ messages in thread
From: Waiman Long @ 2016-05-28  3:41 UTC (permalink / raw)
  To: xinhui; +Cc: linux-kernel, peterz, mingo

On 05/27/2016 06:32 AM, xinhui wrote:
>
> On 2016-05-27 02:31, Waiman Long wrote:
>> On 05/25/2016 02:09 AM, Pan Xinhui wrote:
>>> In pv_wait_head_or_lock, if there is a spurious_wakeup, and it fails to
>>> get the lock as there is lock stealing, then after a short spin, we 
>>> need
>>> hash the lock again and enter pv_wait to yield.
>>>
>>> Currently after a spurious_wakeup, as l->locked is not _Q_SLOW_VAL,
>>> pv_wait might do nothing and return directly, that is not
>>> paravirt-friendly because pv_wait_head_or_lock will just spin on the
>>> lock then.
>>>
>>> Signed-off-by: Pan Xinhui<xinhui.pan@linux.vnet.ibm.com>
>>> ---
>>>   kernel/locking/qspinlock_paravirt.h | 39 
>>> +++++++++++++++++++++++++++++--------
>>>   1 file changed, 31 insertions(+), 8 deletions(-)
>>
>> Is this a problem you can easily reproduce on PPC? I have not 
>> observed this issue when testing on x86.
>>
> Hi, Waiman
>     I notice the spurious_wakeup count is very high when I do 
> benchmark tests and stress tests. So after a simple investigation,
> I find pv_wait_head_or_lock() just keep loops.
>

That shouldn't happen in the normal case. When testing on x86, I typically
get the following stat data for an over-committed guest:

pv_lock_slowpath=9256211
pv_lock_stealing=36398363
pv_spurious_wakeup=311
pv_wait_again=294
pv_wait_early=3255605
pv_wait_head=173
pv_wait_node=3256280

The queue head doesn't call pv_wait() that often. There are a few
spurious wakeups, but they are mostly caused by lock stealing. How long
does a cpu_relax() take on PPC?

>     Here is my story, in my pv-qspinlcok patchset V1&&v2, pv_wait on 
> ppc ignore the first two parameters of *ptr and val, that makes 
> lock_stealing hit too much.

The pvqspinlock code does depend on pv_wait() doing a final check to see
if the lock value has changed. The code may not work reliably without that.
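
The expected semantics are roughly as follows (paraphrasing the pv_wait()/
pv_kick() description in the qspinlock code; the extra cpu hint argument
added by your series is left out here):

	pv_wait(u8 *ptr, u8 val);	/* suspend the vCPU only if *ptr == val */
	pv_kick(int cpu);		/* wake up a vCPU suspended in pv_wait() */

Without that final check inside pv_wait(), a kick that arrives just before
the vCPU actually goes to sleep can be lost, and nobody will kick it again.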

> and when I change SPIN_THRESHOLD to a small value, system is very much 
> unstable because waiter will enter pv_wait quickly and no one will 
> kick waiter's cpu if
> we enter pv_wait twice thanks to the lock_stealing.
>     So what I do in my pv-qspinlcok patchset V3 is that add if (*ptr 
> == val) in pv_wait. However as I mentioned above, then spurious_wakeup 
> count is too high, that also means our cpu
> slice is wasted.

The SPIN_THRESHOLD should be sufficiently big. A small value will cause
too many waits and wake-ups, which may not be good. Anyway, more testing
and tuning may be needed to make the pvqspinlock code work well on PPC.
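
For reference, the queue head spins roughly like this before it gives up
and calls pv_wait() (simplified from the loop in pv_wait_head_or_lock()):

	for (loop = SPIN_THRESHOLD; loop; loop--) {
		if (trylock_clear_pending(lock))
			goto gotlock;		/* got the lock, done */
		cpu_relax();
	}
	clear_pending(lock);
	/* only now do we hash the lock and call pv_wait() */

So the threshold directly bounds how long a vCPU burns cycles before yielding.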

Cheers,
Longman


* Re: [PATCH] pv-qspinlock: Try to re-hash the lock after spurious_wakeup
  2016-05-28  3:41     ` Waiman Long
@ 2016-05-30  8:53       ` xinhui
       [not found]       ` <201605300855.u4U8sLm5005684@mx0a-001b2d01.pphosted.com>
  1 sibling, 0 replies; 7+ messages in thread
From: xinhui @ 2016-05-30  8:53 UTC (permalink / raw)
  To: Waiman Long; +Cc: linux-kernel, peterz, mingo



On 2016-05-28 11:41, Waiman Long wrote:
> On 05/27/2016 06:32 AM, xinhui wrote:
>>
>> On 2016-05-27 02:31, Waiman Long wrote:
>>> On 05/25/2016 02:09 AM, Pan Xinhui wrote:
>>>> In pv_wait_head_or_lock, if there is a spurious_wakeup, and it fails to
>>>> get the lock as there is lock stealing, then after a short spin, we need
>>>> hash the lock again and enter pv_wait to yield.
>>>>
>>>> Currently after a spurious_wakeup, as l->locked is not _Q_SLOW_VAL,
>>>> pv_wait might do nothing and return directly, that is not
>>>> paravirt-friendly because pv_wait_head_or_lock will just spin on the
>>>> lock then.
>>>>
>>>> Signed-off-by: Pan Xinhui<xinhui.pan@linux.vnet.ibm.com>
>>>> ---
>>>>   kernel/locking/qspinlock_paravirt.h | 39 +++++++++++++++++++++++++++++--------
>>>>   1 file changed, 31 insertions(+), 8 deletions(-)
>>>
>>> Is this a problem you can easily reproduce on PPC? I have not observed this issue when testing on x86.
>>>
>> Hi, Waiman
>>     I notice the spurious_wakeup count is very high when I do benchmark tests and stress tests. So after a simple investigation,
>> I find pv_wait_head_or_lock() just keep loops.
>>
>
> That shouldn't happen in normal case. When testing on x86, I typically get the following stat data for an over-commited guest:
>
> pv_lock_slowpath=9256211
> pv_lock_stealing=36398363
> pv_spurious_wakeup=311
> pv_wait_again=294
> pv_wait_early=3255605
> pv_wait_head=173
> pv_wait_node=3256280
>
OK, here is the result after running the command: perf bench sched messaging -g 512

pv_lock_slowpath=2331407
pv_lock_stealing=192038
pv_spurious_wakeup=236319
pv_wait_again=215668
pv_wait_early=177299
pv_wait_head=9206
pv_wait_node=228781

> The queue head don't call pv_wait that often. There are a bit of spurious wakeup, but it is mostly caused by lock stealing. How long is a cpu_relax() in PPC takes?
>
946012160 cpu_relax() loops within 10 seconds, which is roughly 10.6 ns per loop. So if SPIN_THRESHOLD is 1<<15 (32768 iterations), it costs about 0.3 ms to spin on the lock. How about x86?

And there are only 10134976 pv_wait/pv_kick hyper-call loops within 10 seconds, so every hyper-call itself (the so-called latency) costs less than 1 us.

>>     Here is my story, in my pv-qspinlcok patchset V1&&v2, pv_wait on ppc ignore the first two parameters of *ptr and val, that makes lock_stealing hit too much.
>
> The pvqspinlock code does depend on pv_wait() doing a final check to see if the lock value change. The code may not work reliably without that.
>
Agreed. So pv_wait() now does the check of *ptr against val.

>> and when I change SPIN_THRESHOLD to a small value, system is very much unstable because waiter will enter pv_wait quickly and no one will kick waiter's cpu if
>> we enter pv_wait twice thanks to the lock_stealing.
>>     So what I do in my pv-qspinlcok patchset V3 is that add if (*ptr == val) in pv_wait. However as I mentioned above, then spurious_wakeup count is too high, that also means our cpu
>> slice is wasted.
>
> The SPIN_THRESHOLD should be sufficiently big. A small value will cause too many waits and wake-up's which may not be good. Anyway, more testing and tuning may be needed to make the pvqspinlock code work well with PPC.
>
Agreed, but I think the SPIN_THRESHOLD (1<<15) is a little too large for ppc.

I even came up with the idea of making SPIN_THRESHOLD an extern variable on ppc, but I am busy and I wonder whether it is worth doing.
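
Roughly what I have in mind is something like this (just a sketch of the
idea, nothing is implemented yet, and the variable name is made up):

	/* ppc only: make the spin threshold a tunable instead of a constant */
	extern u32 pv_spin_threshold;
	#define SPIN_THRESHOLD	pv_spin_threshold

Then ppc could pick a smaller value when the guest is over-committed.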

> Cheers,
> Longman
>


* Re: [PATCH] pv-qspinlock: Try to re-hash the lock after spurious_wakeup
       [not found]       ` <201605300855.u4U8sLm5005684@mx0a-001b2d01.pphosted.com>
@ 2016-05-31 18:13         ` Waiman Long
  2016-06-01  5:54           ` xinhui
  0 siblings, 1 reply; 7+ messages in thread
From: Waiman Long @ 2016-05-31 18:13 UTC (permalink / raw)
  To: xinhui; +Cc: linux-kernel, peterz, mingo

On 05/30/2016 04:53 AM, xinhui wrote:
>
>
>> On 2016-05-28 11:41, Waiman Long wrote:
>> On 05/27/2016 06:32 AM, xinhui wrote:
>>>
>>> On 2016-05-27 02:31, Waiman Long wrote:
>>>> On 05/25/2016 02:09 AM, Pan Xinhui wrote:
>>>>> In pv_wait_head_or_lock, if there is a spurious_wakeup, and it 
>>>>> fails to
>>>>> get the lock as there is lock stealing, then after a short spin, 
>>>>> we need
>>>>> hash the lock again and enter pv_wait to yield.
>>>>>
>>>>> Currently after a spurious_wakeup, as l->locked is not _Q_SLOW_VAL,
>>>>> pv_wait might do nothing and return directly, that is not
>>>>> paravirt-friendly because pv_wait_head_or_lock will just spin on the
>>>>> lock then.
>>>>>
>>>>> Signed-off-by: Pan Xinhui<xinhui.pan@linux.vnet.ibm.com>
>>>>> ---
>>>>>   kernel/locking/qspinlock_paravirt.h | 39 
>>>>> +++++++++++++++++++++++++++++--------
>>>>>   1 file changed, 31 insertions(+), 8 deletions(-)
>>>>
>>>> Is this a problem you can easily reproduce on PPC? I have not 
>>>> observed this issue when testing on x86.
>>>>
>>> Hi, Waiman
>>>     I notice the spurious_wakeup count is very high when I do 
>>> benchmark tests and stress tests. So after a simple investigation,
>>> I find pv_wait_head_or_lock() just keep loops.
>>>
>>
>> That shouldn't happen in normal case. When testing on x86, I 
>> typically get the following stat data for an over-commited guest:
>>
>> pv_lock_slowpath=9256211
>> pv_lock_stealing=36398363
>> pv_spurious_wakeup=311
>> pv_wait_again=294
>> pv_wait_early=3255605
>> pv_wait_head=173
>> pv_wait_node=3256280
>>
> OK, here is the result after run command  perf bench sched messaging 
> -g 512
>
> pv_lock_slowpath=2331407
> pv_lock_stealing=192038
> pv_spurious_wakeup=236319
> pv_wait_again=215668
> pv_wait_early=177299
> pv_wait_head=9206
> pv_wait_node=228781
>

Is the high spurious wakeup count caused by the way PPC schedules processor
resources to vCPUs? On x86, once a vCPU voluntarily sleeps, it won't get
woken up until there is an explicit vCPU kick request. That may not be the
case on PPC, which would explain the high spurious wakeup number.

>> The queue head don't call pv_wait that often. There are a bit of 
>> spurious wakeup, but it is mostly caused by lock stealing. How long 
>> is a cpu_relax() in PPC takes?
>>
> 946012160 cpu_relax loops with 10 seconds. So if SPIN_THRESHOLD is 
> 1<<15, it costs 0.3ms to spin on the lock. How about x86?
>

For x86, one measurement that I got in the past is that each cpu_relax() 
loop took about 3ns. So the full spin will take about 0.9ms.

> And only 10134976 pv_wait/pv_kick hyper-call loops within 10 seconds. 
> so every hyper-call itself(the so-called latency) costs less than 1us.
>

The hypercall is much slower on x86. It is about 10-20 us for pv_kick and
up to 100 us for pv_kick => pv_wait.

>>>     Here is my story, in my pv-qspinlcok patchset V1&&v2, pv_wait on 
>>> ppc ignore the first two parameters of *ptr and val, that makes 
>>> lock_stealing hit too much.
>>
>> The pvqspinlock code does depend on pv_wait() doing a final check to 
>> see if the lock value change. The code may not work reliably without 
>> that.
>>
> agree, So pv_wait now do the check of *ptr and val.
>
>>> and when I change SPIN_THRESHOLD to a small value, system is very 
>>> much unstable because waiter will enter pv_wait quickly and no one 
>>> will kick waiter's cpu if
>>> we enter pv_wait twice thanks to the lock_stealing.
>>>     So what I do in my pv-qspinlcok patchset V3 is that add if (*ptr 
>>> == val) in pv_wait. However as I mentioned above, then 
>>> spurious_wakeup count is too high, that also means our cpu
>>> slice is wasted.
>>
>> The SPIN_THRESHOLD should be sufficiently big. A small value will 
>> cause too many waits and wake-up's which may not be good. Anyway, 
>> more testing and tuning may be needed to make the pvqspinlock code 
>> work well with PPC.
>>
> agree , but I think the SPIN_THRESHOLD (1<<15) for ppc is a little large.
>
> I even come up with an idea that make SPIN_THRESHOLD an extern 
> variable on ppc. But I am busy and I wonder if it's worth doing that.

The purpose of the SPIN_THRESHOLD is to make sure that a vCPU won't
call pv_wait() if the vCPUs in the guest aren't over-committed. The
situation may be a bit different on PPC, so you need to make a decision
as to how large the SPIN_THRESHOLD should be.

Cheers,
Longman


* Re: [PATCH] pv-qspinlock: Try to re-hash the lock after spurious_wakeup
  2016-05-31 18:13         ` Waiman Long
@ 2016-06-01  5:54           ` xinhui
  0 siblings, 0 replies; 7+ messages in thread
From: xinhui @ 2016-06-01  5:54 UTC (permalink / raw)
  To: Waiman Long; +Cc: linux-kernel, peterz, mingo



On 2016-06-01 02:13, Waiman Long wrote:
> On 05/30/2016 04:53 AM, xinhui wrote:
>>
>>
>> On 2016-05-28 11:41, Waiman Long wrote:
>>> On 05/27/2016 06:32 AM, xinhui wrote:
>>>>
>>>> On 2016-05-27 02:31, Waiman Long wrote:
>>>>> On 05/25/2016 02:09 AM, Pan Xinhui wrote:
>>>>>> In pv_wait_head_or_lock, if there is a spurious_wakeup, and it fails to
>>>>>> get the lock as there is lock stealing, then after a short spin, we need
>>>>>> hash the lock again and enter pv_wait to yield.
>>>>>>
>>>>>> Currently after a spurious_wakeup, as l->locked is not _Q_SLOW_VAL,
>>>>>> pv_wait might do nothing and return directly, that is not
>>>>>> paravirt-friendly because pv_wait_head_or_lock will just spin on the
>>>>>> lock then.
>>>>>>
>>>>>> Signed-off-by: Pan Xinhui<xinhui.pan@linux.vnet.ibm.com>
>>>>>> ---
>>>>>>   kernel/locking/qspinlock_paravirt.h | 39 +++++++++++++++++++++++++++++--------
>>>>>>   1 file changed, 31 insertions(+), 8 deletions(-)
>>>>>
>>>>> Is this a problem you can easily reproduce on PPC? I have not observed this issue when testing on x86.
>>>>>
>>>> Hi, Waiman
>>>>     I notice the spurious_wakeup count is very high when I do benchmark tests and stress tests. So after a simple investigation,
>>>> I find pv_wait_head_or_lock() just keep loops.
>>>>
>>>
>>> That shouldn't happen in normal case. When testing on x86, I typically get the following stat data for an over-commited guest:
>>>
>>> pv_lock_slowpath=9256211
>>> pv_lock_stealing=36398363
>>> pv_spurious_wakeup=311
>>> pv_wait_again=294
>>> pv_wait_early=3255605
>>> pv_wait_head=173
>>> pv_wait_node=3256280
>>>
>> OK, here is the result after run command  perf bench sched messaging -g 512
>>
>> pv_lock_slowpath=2331407
>> pv_lock_stealing=192038
>> pv_spurious_wakeup=236319
>> pv_wait_again=215668
>> pv_wait_early=177299
>> pv_wait_head=9206
>> pv_wait_node=228781
>>
>
> Is the high spurious wakeup caused by the way PPC schedules processor resources to vCPUs? In x86, once the vCPU voluntarily sleep, it won't get woken up until there is an explicit vCPU kick request. It may not be the case for PPC, then. That may explain the high spurious wakeup number.
>
Yes, a vCPU can run again if 1) someone kicks it, 2) an interrupt arrives, for example an IPI or a system reset, or 3) the vCPU gets its cpu slice again.

So it seems I need to rework __pv_wait() on ppc, say:

	while (*ptr == val)
		__yield_cpu(cpu...);

>>> The queue head don't call pv_wait that often. There are a bit of spurious wakeup, but it is mostly caused by lock stealing. How long is a cpu_relax() in PPC takes?
>>>
>> 946012160 cpu_relax loops with 10 seconds. So if SPIN_THRESHOLD is 1<<15, it costs 0.3ms to spin on the lock. How about x86?
>>
>
> For x86, one measurement that I got in the past is that each cpu_relax() loop took about 3ns. So the full spin will take about 0.9ms.
>
>> And only 10134976 pv_wait/pv_kick hyper-call loops within 10 seconds. so every hyper-call itself(the so-called latency) costs less than 1us.
>>
>
> The hypercall is much slower in x86. it is about 10-20 us for pv_kick and up to 100us for pv_kick=>pv_wait.
>
Okay. So for ppc, the hyper-call is lightweight, so it's good to enter pv_wait() earlier.
Thanks for your tips. :)

>>>>     Here is my story, in my pv-qspinlcok patchset V1&&v2, pv_wait on ppc ignore the first two parameters of *ptr and val, that makes lock_stealing hit too much.
>>>
>>> The pvqspinlock code does depend on pv_wait() doing a final check to see if the lock value change. The code may not work reliably without that.
>>>
>> agree, So pv_wait now do the check of *ptr and val.
>>
>>>> and when I change SPIN_THRESHOLD to a small value, system is very much unstable because waiter will enter pv_wait quickly and no one will kick waiter's cpu if
>>>> we enter pv_wait twice thanks to the lock_stealing.
>>>>     So what I do in my pv-qspinlcok patchset V3 is that add if (*ptr == val) in pv_wait. However as I mentioned above, then spurious_wakeup count is too high, that also means our cpu
>>>> slice is wasted.
>>>
>>> The SPIN_THRESHOLD should be sufficiently big. A small value will cause too many waits and wake-up's which may not be good. Anyway, more testing and tuning may be needed to make the pvqspinlock code work well with PPC.
>>>
>> agree , but I think the SPIN_THRESHOLD (1<<15) for ppc is a little large.
>>
>> I even come up with an idea that make SPIN_THRESHOLD an extern variable on ppc. But I am busy and I wonder if it's worth doing that.
>
> The purpose of the SPIN_THRESHOLD is to make sure that the vCPU won't call pv_wait() if the vCPUs in the guest aren't over-commited. The situation may be a bit different in PPC. So you need to make a decision as to how large the SPIN_THRESHOLD should be.
>
Thanks for explaining that. I can keep SPIN_THRESHOLD as a const value for now.

thanks
xinhui

> Cheers,
> Longman
>



Thread overview: 7+ messages
2016-05-25  6:09 [PATCH] pv-qspinlock: Try to re-hash the lock after spurious_wakeup Pan Xinhui
2016-05-26 18:31 ` Waiman Long
2016-05-27 10:32   ` xinhui
2016-05-28  3:41     ` Waiman Long
2016-05-30  8:53       ` xinhui
     [not found]       ` <201605300855.u4U8sLm5005684@mx0a-001b2d01.pphosted.com>
2016-05-31 18:13         ` Waiman Long
2016-06-01  5:54           ` xinhui
