KVM Archive on lore.kernel.org
 help / color / Atom feed
* [PATCH] Revert "locking/pvqspinlock: Don't wait if vCPU is preempted"
@ 2019-09-09  1:40 Wanpeng Li
  2019-09-09 10:56 ` Waiman Long
  0 siblings, 1 reply; 7+ messages in thread
From: Wanpeng Li @ 2019-09-09  1:40 UTC (permalink / raw)
  To: linux-kernel, kvm
  Cc: Paolo Bonzini, Radim Krčmář,
	Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, Peter Zijlstra, Thomas Gleixner, Ingo Molnar,
	Waiman Long, loobinliu, stable

From: Wanpeng Li <wanpengli@tencent.com>

This patch reverts commit 75437bb304b20 (locking/pvqspinlock: Don't wait if 
vCPU is preempted), we found great regression caused by this commit.

Xeon Skylake box, 2 sockets, 40 cores, 80 threads, three VMs, each is 80 vCPUs.
The score of ebizzy -M can reduce from 13000-14000 records/s to 1700-1800 
records/s with this commit.

          Host                       Guest                score

vanilla + w/o kvm optimizes     vanilla               1700-1800 records/s
vanilla + w/o kvm optimizes     vanilla + revert      13000-14000 records/s
vanilla + w/ kvm optimizes      vanilla               4500-5000 records/s
vanilla + w/ kvm optimizes      vanilla + revert      14000-15500 records/s

Exit from aggressive wait-early mechanism can result in yield premature and 
incur extra scheduling latency in over-subscribe scenario.

kvm optimizes:
[1] commit d73eb57b80b (KVM: Boost vCPUs that are delivering interrupts)
[2] commit 266e85a5ec9 (KVM: X86: Boost queue head vCPU to mitigate lock waiter preemption)

Tested-by: loobinliu@tencent.com
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Waiman Long <longman@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: loobinliu@tencent.com
Cc: stable@vger.kernel.org 
Fixes: 75437bb304b20 (locking/pvqspinlock: Don't wait if vCPU is preempted)
Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
---
 kernel/locking/qspinlock_paravirt.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/locking/qspinlock_paravirt.h b/kernel/locking/qspinlock_paravirt.h
index 89bab07..e84d21a 100644
--- a/kernel/locking/qspinlock_paravirt.h
+++ b/kernel/locking/qspinlock_paravirt.h
@@ -269,7 +269,7 @@ pv_wait_early(struct pv_node *prev, int loop)
 	if ((loop & PV_PREV_CHECK_MASK) != 0)
 		return false;
 
-	return READ_ONCE(prev->state) != vcpu_running || vcpu_is_preempted(prev->cpu);
+	return READ_ONCE(prev->state) != vcpu_running;
 }
 
 /*
-- 
2.7.4


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] Revert "locking/pvqspinlock: Don't wait if vCPU is preempted"
  2019-09-09  1:40 [PATCH] Revert "locking/pvqspinlock: Don't wait if vCPU is preempted" Wanpeng Li
@ 2019-09-09 10:56 ` Waiman Long
  2019-09-09 11:06   ` Paolo Bonzini
  2019-09-10  5:56   ` Wanpeng Li
  0 siblings, 2 replies; 7+ messages in thread
From: Waiman Long @ 2019-09-09 10:56 UTC (permalink / raw)
  To: Wanpeng Li, linux-kernel, kvm
  Cc: Paolo Bonzini, Radim Krčmář,
	Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, Peter Zijlstra, Thomas Gleixner, Ingo Molnar,
	loobinliu, stable

On 9/9/19 2:40 AM, Wanpeng Li wrote:
> From: Wanpeng Li <wanpengli@tencent.com>
>
> This patch reverts commit 75437bb304b20 (locking/pvqspinlock: Don't wait if 
> vCPU is preempted), we found great regression caused by this commit.
>
> Xeon Skylake box, 2 sockets, 40 cores, 80 threads, three VMs, each is 80 vCPUs.
> The score of ebizzy -M can reduce from 13000-14000 records/s to 1700-1800 
> records/s with this commit.
>
>           Host                       Guest                score
>
> vanilla + w/o kvm optimizes     vanilla               1700-1800 records/s
> vanilla + w/o kvm optimizes     vanilla + revert      13000-14000 records/s
> vanilla + w/ kvm optimizes      vanilla               4500-5000 records/s
> vanilla + w/ kvm optimizes      vanilla + revert      14000-15500 records/s
>
> Exit from aggressive wait-early mechanism can result in yield premature and 
> incur extra scheduling latency in over-subscribe scenario.
>
> kvm optimizes:
> [1] commit d73eb57b80b (KVM: Boost vCPUs that are delivering interrupts)
> [2] commit 266e85a5ec9 (KVM: X86: Boost queue head vCPU to mitigate lock waiter preemption)
>
> Tested-by: loobinliu@tencent.com
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Ingo Molnar <mingo@kernel.org>
> Cc: Waiman Long <longman@redhat.com>
> Cc: Paolo Bonzini <pbonzini@redhat.com>
> Cc: Radim Krčmář <rkrcmar@redhat.com>
> Cc: loobinliu@tencent.com
> Cc: stable@vger.kernel.org 
> Fixes: 75437bb304b20 (locking/pvqspinlock: Don't wait if vCPU is preempted)
> Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
> ---
>  kernel/locking/qspinlock_paravirt.h | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/kernel/locking/qspinlock_paravirt.h b/kernel/locking/qspinlock_paravirt.h
> index 89bab07..e84d21a 100644
> --- a/kernel/locking/qspinlock_paravirt.h
> +++ b/kernel/locking/qspinlock_paravirt.h
> @@ -269,7 +269,7 @@ pv_wait_early(struct pv_node *prev, int loop)
>  	if ((loop & PV_PREV_CHECK_MASK) != 0)
>  		return false;
>  
> -	return READ_ONCE(prev->state) != vcpu_running || vcpu_is_preempted(prev->cpu);
> +	return READ_ONCE(prev->state) != vcpu_running;
>  }
>  
>  /*

There are several possibilities for this performance regression:

1) Multiple vcpus calling vcpu_is_preempted() repeatedly may cause some
cacheline contention issue depending on how that callback is implemented.

2) KVM may set the preempt flag for a short period whenver an vmexit
happens even if a vmenter is executed shortly after. In this case, we
may want to use a more durable vcpu suspend flag that indicates the vcpu
won't get a real vcpu back for a longer period of time.

Perhaps you can add a lock event counter to count the number of
wait_early events caused by vcpu_is_preempted() being true to see if it
really cause a lot more wait_early than without the vcpu_is_preempted()
call.

I have no objection to this, I just want to find out the root cause of it.

Cheers,
Longman


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] Revert "locking/pvqspinlock: Don't wait if vCPU is preempted"
  2019-09-09 10:56 ` Waiman Long
@ 2019-09-09 11:06   ` Paolo Bonzini
  2019-09-09 12:16     ` Wanpeng Li
  2019-09-10  5:56   ` Wanpeng Li
  1 sibling, 1 reply; 7+ messages in thread
From: Paolo Bonzini @ 2019-09-09 11:06 UTC (permalink / raw)
  To: Waiman Long, Wanpeng Li, linux-kernel, kvm
  Cc: Radim Krčmář,
	Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, Peter Zijlstra, Thomas Gleixner, Ingo Molnar,
	loobinliu, stable

On 09/09/19 12:56, Waiman Long wrote:
> On 9/9/19 2:40 AM, Wanpeng Li wrote:
>> From: Wanpeng Li <wanpengli@tencent.com>
>>
>> This patch reverts commit 75437bb304b20 (locking/pvqspinlock: Don't wait if 
>> vCPU is preempted), we found great regression caused by this commit.
>>
>> Xeon Skylake box, 2 sockets, 40 cores, 80 threads, three VMs, each is 80 vCPUs.
>> The score of ebizzy -M can reduce from 13000-14000 records/s to 1700-1800 
>> records/s with this commit.
>>
>>           Host                       Guest                score
>>
>> vanilla + w/o kvm optimizes     vanilla               1700-1800 records/s
>> vanilla + w/o kvm optimizes     vanilla + revert      13000-14000 records/s
>> vanilla + w/ kvm optimizes      vanilla               4500-5000 records/s
>> vanilla + w/ kvm optimizes      vanilla + revert      14000-15500 records/s
>>
>> Exit from aggressive wait-early mechanism can result in yield premature and 
>> incur extra scheduling latency in over-subscribe scenario.
>>
>> kvm optimizes:
>> [1] commit d73eb57b80b (KVM: Boost vCPUs that are delivering interrupts)
>> [2] commit 266e85a5ec9 (KVM: X86: Boost queue head vCPU to mitigate lock waiter preemption)
>>
>> Tested-by: loobinliu@tencent.com
>> Cc: Peter Zijlstra <peterz@infradead.org>
>> Cc: Thomas Gleixner <tglx@linutronix.de>
>> Cc: Ingo Molnar <mingo@kernel.org>
>> Cc: Waiman Long <longman@redhat.com>
>> Cc: Paolo Bonzini <pbonzini@redhat.com>
>> Cc: Radim Krčmář <rkrcmar@redhat.com>
>> Cc: loobinliu@tencent.com
>> Cc: stable@vger.kernel.org 
>> Fixes: 75437bb304b20 (locking/pvqspinlock: Don't wait if vCPU is preempted)
>> Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
>> ---
>>  kernel/locking/qspinlock_paravirt.h | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/kernel/locking/qspinlock_paravirt.h b/kernel/locking/qspinlock_paravirt.h
>> index 89bab07..e84d21a 100644
>> --- a/kernel/locking/qspinlock_paravirt.h
>> +++ b/kernel/locking/qspinlock_paravirt.h
>> @@ -269,7 +269,7 @@ pv_wait_early(struct pv_node *prev, int loop)
>>  	if ((loop & PV_PREV_CHECK_MASK) != 0)
>>  		return false;
>>  
>> -	return READ_ONCE(prev->state) != vcpu_running || vcpu_is_preempted(prev->cpu);
>> +	return READ_ONCE(prev->state) != vcpu_running;
>>  }
>>  
>>  /*
> 
> There are several possibilities for this performance regression:
> 
> 1) Multiple vcpus calling vcpu_is_preempted() repeatedly may cause some
> cacheline contention issue depending on how that callback is implemented.

Unlikely, it is a single percpu read.

> 2) KVM may set the preempt flag for a short period whenver an vmexit
> happens even if a vmenter is executed shortly after. In this case, we
> may want to use a more durable vcpu suspend flag that indicates the vcpu
> won't get a real vcpu back for a longer period of time.

It sets it for exits to userspace, but they shouldn't really happen on a 
properly-configured system.

However, it's easy to test this theory:

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 2e302e977dac..feb6c75a7a88 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -3368,26 +3368,28 @@ void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu)
 {
 	int idx;
 
-	if (vcpu->preempted)
+	if (vcpu->preempted) {
 		vcpu->arch.preempted_in_kernel = !kvm_x86_ops->get_cpl(vcpu);
 
-	/*
-	 * Disable page faults because we're in atomic context here.
-	 * kvm_write_guest_offset_cached() would call might_fault()
-	 * that relies on pagefault_disable() to tell if there's a
-	 * bug. NOTE: the write to guest memory may not go through if
-	 * during postcopy live migration or if there's heavy guest
-	 * paging.
-	 */
-	pagefault_disable();
-	/*
-	 * kvm_memslots() will be called by
-	 * kvm_write_guest_offset_cached() so take the srcu lock.
-	 */
-	idx = srcu_read_lock(&vcpu->kvm->srcu);
-	kvm_steal_time_set_preempted(vcpu);
-	srcu_read_unlock(&vcpu->kvm->srcu, idx);
-	pagefault_enable();
+		/*
+		 * Disable page faults because we're in atomic context here.
+		 * kvm_write_guest_offset_cached() would call might_fault()
+		 * that relies on pagefault_disable() to tell if there's a
+		 * bug. NOTE: the write to guest memory may not go through if
+		 * during postcopy live migration or if there's heavy guest
+		 * paging.
+		 */
+		pagefault_disable();
+		/*
+		 * kvm_memslots() will be called by
+		 * kvm_write_guest_offset_cached() so take the srcu lock.
+		 */
+		idx = srcu_read_lock(&vcpu->kvm->srcu);
+		kvm_steal_time_set_preempted(vcpu);
+		srcu_read_unlock(&vcpu->kvm->srcu, idx);
+		pagefault_enable();
+	}
+
 	kvm_x86_ops->vcpu_put(vcpu);
 	vcpu->arch.last_host_tsc = rdtsc();
 	/*

Wanpeng, can you try?

Paolo

> Perhaps you can add a lock event counter to count the number of
> wait_early events caused by vcpu_is_preempted() being true to see if it
> really cause a lot more wait_early than without the vcpu_is_preempted()
> call.
> 
> I have no objection to this, I just want to find out the root cause of it.
> 
> Cheers,
> Longman
> 


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] Revert "locking/pvqspinlock: Don't wait if vCPU is preempted"
  2019-09-09 11:06   ` Paolo Bonzini
@ 2019-09-09 12:16     ` Wanpeng Li
  0 siblings, 0 replies; 7+ messages in thread
From: Wanpeng Li @ 2019-09-09 12:16 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Waiman Long, LKML, kvm, Radim Krčmář,
	Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, Peter Zijlstra, Thomas Gleixner, Ingo Molnar,
	loobinliu, # v3 . 10+

On Mon, 9 Sep 2019 at 19:06, Paolo Bonzini <pbonzini@redhat.com> wrote:
>
> On 09/09/19 12:56, Waiman Long wrote:
> > On 9/9/19 2:40 AM, Wanpeng Li wrote:
> >> From: Wanpeng Li <wanpengli@tencent.com>
> >>
> >> This patch reverts commit 75437bb304b20 (locking/pvqspinlock: Don't wait if
> >> vCPU is preempted), we found great regression caused by this commit.
> >>
> >> Xeon Skylake box, 2 sockets, 40 cores, 80 threads, three VMs, each is 80 vCPUs.
> >> The score of ebizzy -M can reduce from 13000-14000 records/s to 1700-1800
> >> records/s with this commit.
> >>
> >>           Host                       Guest                score
> >>
> >> vanilla + w/o kvm optimizes     vanilla               1700-1800 records/s
> >> vanilla + w/o kvm optimizes     vanilla + revert      13000-14000 records/s
> >> vanilla + w/ kvm optimizes      vanilla               4500-5000 records/s
> >> vanilla + w/ kvm optimizes      vanilla + revert      14000-15500 records/s
> >>
> >> Exit from aggressive wait-early mechanism can result in yield premature and
> >> incur extra scheduling latency in over-subscribe scenario.
> >>
> >> kvm optimizes:
> >> [1] commit d73eb57b80b (KVM: Boost vCPUs that are delivering interrupts)
> >> [2] commit 266e85a5ec9 (KVM: X86: Boost queue head vCPU to mitigate lock waiter preemption)
> >>
> >> Tested-by: loobinliu@tencent.com
> >> Cc: Peter Zijlstra <peterz@infradead.org>
> >> Cc: Thomas Gleixner <tglx@linutronix.de>
> >> Cc: Ingo Molnar <mingo@kernel.org>
> >> Cc: Waiman Long <longman@redhat.com>
> >> Cc: Paolo Bonzini <pbonzini@redhat.com>
> >> Cc: Radim Krčmář <rkrcmar@redhat.com>
> >> Cc: loobinliu@tencent.com
> >> Cc: stable@vger.kernel.org
> >> Fixes: 75437bb304b20 (locking/pvqspinlock: Don't wait if vCPU is preempted)
> >> Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
> >> ---
> >>  kernel/locking/qspinlock_paravirt.h | 2 +-
> >>  1 file changed, 1 insertion(+), 1 deletion(-)
> >>
> >> diff --git a/kernel/locking/qspinlock_paravirt.h b/kernel/locking/qspinlock_paravirt.h
> >> index 89bab07..e84d21a 100644
> >> --- a/kernel/locking/qspinlock_paravirt.h
> >> +++ b/kernel/locking/qspinlock_paravirt.h
> >> @@ -269,7 +269,7 @@ pv_wait_early(struct pv_node *prev, int loop)
> >>      if ((loop & PV_PREV_CHECK_MASK) != 0)
> >>              return false;
> >>
> >> -    return READ_ONCE(prev->state) != vcpu_running || vcpu_is_preempted(prev->cpu);
> >> +    return READ_ONCE(prev->state) != vcpu_running;
> >>  }
> >>
> >>  /*
> >
> > There are several possibilities for this performance regression:
> >
> > 1) Multiple vcpus calling vcpu_is_preempted() repeatedly may cause some
> > cacheline contention issue depending on how that callback is implemented.
>
> Unlikely, it is a single percpu read.
>
> > 2) KVM may set the preempt flag for a short period whenver an vmexit
> > happens even if a vmenter is executed shortly after. In this case, we
> > may want to use a more durable vcpu suspend flag that indicates the vcpu
> > won't get a real vcpu back for a longer period of time.
>
> It sets it for exits to userspace, but they shouldn't really happen on a
> properly-configured system.
>
> However, it's easy to test this theory:
>
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 2e302e977dac..feb6c75a7a88 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -3368,26 +3368,28 @@ void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu)
>  {
>         int idx;
>
> -       if (vcpu->preempted)
> +       if (vcpu->preempted) {
>                 vcpu->arch.preempted_in_kernel = !kvm_x86_ops->get_cpl(vcpu);
>
> -       /*
> -        * Disable page faults because we're in atomic context here.
> -        * kvm_write_guest_offset_cached() would call might_fault()
> -        * that relies on pagefault_disable() to tell if there's a
> -        * bug. NOTE: the write to guest memory may not go through if
> -        * during postcopy live migration or if there's heavy guest
> -        * paging.
> -        */
> -       pagefault_disable();
> -       /*
> -        * kvm_memslots() will be called by
> -        * kvm_write_guest_offset_cached() so take the srcu lock.
> -        */
> -       idx = srcu_read_lock(&vcpu->kvm->srcu);
> -       kvm_steal_time_set_preempted(vcpu);
> -       srcu_read_unlock(&vcpu->kvm->srcu, idx);
> -       pagefault_enable();
> +               /*
> +                * Disable page faults because we're in atomic context here.
> +                * kvm_write_guest_offset_cached() would call might_fault()
> +                * that relies on pagefault_disable() to tell if there's a
> +                * bug. NOTE: the write to guest memory may not go through if
> +                * during postcopy live migration or if there's heavy guest
> +                * paging.
> +                */
> +               pagefault_disable();
> +               /*
> +                * kvm_memslots() will be called by
> +                * kvm_write_guest_offset_cached() so take the srcu lock.
> +                */
> +               idx = srcu_read_lock(&vcpu->kvm->srcu);
> +               kvm_steal_time_set_preempted(vcpu);
> +               srcu_read_unlock(&vcpu->kvm->srcu, idx);
> +               pagefault_enable();
> +       }
> +
>         kvm_x86_ops->vcpu_put(vcpu);
>         vcpu->arch.last_host_tsc = rdtsc();
>         /*
>
> Wanpeng, can you try?

Yes, there is no difference for the score.

Wanpeng

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] Revert "locking/pvqspinlock: Don't wait if vCPU is preempted"
  2019-09-09 10:56 ` Waiman Long
  2019-09-09 11:06   ` Paolo Bonzini
@ 2019-09-10  5:56   ` Wanpeng Li
  2019-09-11  4:25     ` Waiman Long
  1 sibling, 1 reply; 7+ messages in thread
From: Wanpeng Li @ 2019-09-10  5:56 UTC (permalink / raw)
  To: Waiman Long
  Cc: LKML, kvm, Paolo Bonzini, Radim Krčmář,
	Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, Peter Zijlstra, Thomas Gleixner, Ingo Molnar,
	loobinliu, # v3 . 10+

On Mon, 9 Sep 2019 at 18:56, Waiman Long <longman@redhat.com> wrote:
>
> On 9/9/19 2:40 AM, Wanpeng Li wrote:
> > From: Wanpeng Li <wanpengli@tencent.com>
> >
> > This patch reverts commit 75437bb304b20 (locking/pvqspinlock: Don't wait if
> > vCPU is preempted), we found great regression caused by this commit.
> >
> > Xeon Skylake box, 2 sockets, 40 cores, 80 threads, three VMs, each is 80 vCPUs.
> > The score of ebizzy -M can reduce from 13000-14000 records/s to 1700-1800
> > records/s with this commit.
> >
> >           Host                       Guest                score
> >
> > vanilla + w/o kvm optimizes     vanilla               1700-1800 records/s
> > vanilla + w/o kvm optimizes     vanilla + revert      13000-14000 records/s
> > vanilla + w/ kvm optimizes      vanilla               4500-5000 records/s
> > vanilla + w/ kvm optimizes      vanilla + revert      14000-15500 records/s
> >
> > Exit from aggressive wait-early mechanism can result in yield premature and
> > incur extra scheduling latency in over-subscribe scenario.
> >
> > kvm optimizes:
> > [1] commit d73eb57b80b (KVM: Boost vCPUs that are delivering interrupts)
> > [2] commit 266e85a5ec9 (KVM: X86: Boost queue head vCPU to mitigate lock waiter preemption)
> >
> > Tested-by: loobinliu@tencent.com
> > Cc: Peter Zijlstra <peterz@infradead.org>
> > Cc: Thomas Gleixner <tglx@linutronix.de>
> > Cc: Ingo Molnar <mingo@kernel.org>
> > Cc: Waiman Long <longman@redhat.com>
> > Cc: Paolo Bonzini <pbonzini@redhat.com>
> > Cc: Radim Krčmář <rkrcmar@redhat.com>
> > Cc: loobinliu@tencent.com
> > Cc: stable@vger.kernel.org
> > Fixes: 75437bb304b20 (locking/pvqspinlock: Don't wait if vCPU is preempted)
> > Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
> > ---
> >  kernel/locking/qspinlock_paravirt.h | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/kernel/locking/qspinlock_paravirt.h b/kernel/locking/qspinlock_paravirt.h
> > index 89bab07..e84d21a 100644
> > --- a/kernel/locking/qspinlock_paravirt.h
> > +++ b/kernel/locking/qspinlock_paravirt.h
> > @@ -269,7 +269,7 @@ pv_wait_early(struct pv_node *prev, int loop)
> >       if ((loop & PV_PREV_CHECK_MASK) != 0)
> >               return false;
> >
> > -     return READ_ONCE(prev->state) != vcpu_running || vcpu_is_preempted(prev->cpu);
> > +     return READ_ONCE(prev->state) != vcpu_running;
> >  }
> >
> >  /*
>
> There are several possibilities for this performance regression:
>
> 1) Multiple vcpus calling vcpu_is_preempted() repeatedly may cause some
> cacheline contention issue depending on how that callback is implemented.
>
> 2) KVM may set the preempt flag for a short period whenver an vmexit
> happens even if a vmenter is executed shortly after. In this case, we
> may want to use a more durable vcpu suspend flag that indicates the vcpu
> won't get a real vcpu back for a longer period of time.
>
> Perhaps you can add a lock event counter to count the number of
> wait_early events caused by vcpu_is_preempted() being true to see if it
> really cause a lot more wait_early than without the vcpu_is_preempted()
> call.

pv_wait_again:1:179
pv_wait_early:1:189429
pv_wait_head:1:263
pv_wait_node:1:189429
pv_vcpu_is_preempted:1:45588
=========sleep 5============
pv_wait_again:1:181
pv_wait_early:1:202574
pv_wait_head:1:267
pv_wait_node:1:202590
pv_vcpu_is_preempted:1:46336

The sampling period is 5s, 6% of wait_early events caused by
vcpu_is_preempted() being true.

                Wanpeng

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] Revert "locking/pvqspinlock: Don't wait if vCPU is preempted"
  2019-09-10  5:56   ` Wanpeng Li
@ 2019-09-11  4:25     ` Waiman Long
  2019-09-11 13:04       ` Paolo Bonzini
  0 siblings, 1 reply; 7+ messages in thread
From: Waiman Long @ 2019-09-11  4:25 UTC (permalink / raw)
  To: Wanpeng Li
  Cc: LKML, kvm, Paolo Bonzini, Radim Krčmář,
	Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, Peter Zijlstra, Thomas Gleixner, Ingo Molnar,
	loobinliu, # v3 . 10+

On 9/10/19 6:56 AM, Wanpeng Li wrote:
> On Mon, 9 Sep 2019 at 18:56, Waiman Long <longman@redhat.com> wrote:
>> On 9/9/19 2:40 AM, Wanpeng Li wrote:
>>> From: Wanpeng Li <wanpengli@tencent.com>
>>>
>>> This patch reverts commit 75437bb304b20 (locking/pvqspinlock: Don't wait if
>>> vCPU is preempted), we found great regression caused by this commit.
>>>
>>> Xeon Skylake box, 2 sockets, 40 cores, 80 threads, three VMs, each is 80 vCPUs.
>>> The score of ebizzy -M can reduce from 13000-14000 records/s to 1700-1800
>>> records/s with this commit.
>>>
>>>           Host                       Guest                score
>>>
>>> vanilla + w/o kvm optimizes     vanilla               1700-1800 records/s
>>> vanilla + w/o kvm optimizes     vanilla + revert      13000-14000 records/s
>>> vanilla + w/ kvm optimizes      vanilla               4500-5000 records/s
>>> vanilla + w/ kvm optimizes      vanilla + revert      14000-15500 records/s
>>>
>>> Exit from aggressive wait-early mechanism can result in yield premature and
>>> incur extra scheduling latency in over-subscribe scenario.
>>>
>>> kvm optimizes:
>>> [1] commit d73eb57b80b (KVM: Boost vCPUs that are delivering interrupts)
>>> [2] commit 266e85a5ec9 (KVM: X86: Boost queue head vCPU to mitigate lock waiter preemption)
>>>
>>> Tested-by: loobinliu@tencent.com
>>> Cc: Peter Zijlstra <peterz@infradead.org>
>>> Cc: Thomas Gleixner <tglx@linutronix.de>
>>> Cc: Ingo Molnar <mingo@kernel.org>
>>> Cc: Waiman Long <longman@redhat.com>
>>> Cc: Paolo Bonzini <pbonzini@redhat.com>
>>> Cc: Radim Krčmář <rkrcmar@redhat.com>
>>> Cc: loobinliu@tencent.com
>>> Cc: stable@vger.kernel.org
>>> Fixes: 75437bb304b20 (locking/pvqspinlock: Don't wait if vCPU is preempted)
>>> Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
>>> ---
>>>  kernel/locking/qspinlock_paravirt.h | 2 +-
>>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>>
>>> diff --git a/kernel/locking/qspinlock_paravirt.h b/kernel/locking/qspinlock_paravirt.h
>>> index 89bab07..e84d21a 100644
>>> --- a/kernel/locking/qspinlock_paravirt.h
>>> +++ b/kernel/locking/qspinlock_paravirt.h
>>> @@ -269,7 +269,7 @@ pv_wait_early(struct pv_node *prev, int loop)
>>>       if ((loop & PV_PREV_CHECK_MASK) != 0)
>>>               return false;
>>>
>>> -     return READ_ONCE(prev->state) != vcpu_running || vcpu_is_preempted(prev->cpu);
>>> +     return READ_ONCE(prev->state) != vcpu_running;
>>>  }
>>>
>>>  /*
>> There are several possibilities for this performance regression:
>>
>> 1) Multiple vcpus calling vcpu_is_preempted() repeatedly may cause some
>> cacheline contention issue depending on how that callback is implemented.
>>
>> 2) KVM may set the preempt flag for a short period whenver an vmexit
>> happens even if a vmenter is executed shortly after. In this case, we
>> may want to use a more durable vcpu suspend flag that indicates the vcpu
>> won't get a real vcpu back for a longer period of time.
>>
>> Perhaps you can add a lock event counter to count the number of
>> wait_early events caused by vcpu_is_preempted() being true to see if it
>> really cause a lot more wait_early than without the vcpu_is_preempted()
>> call.
> pv_wait_again:1:179
> pv_wait_early:1:189429
> pv_wait_head:1:263
> pv_wait_node:1:189429
> pv_vcpu_is_preempted:1:45588
> =========sleep 5============
> pv_wait_again:1:181
> pv_wait_early:1:202574
> pv_wait_head:1:267
> pv_wait_node:1:202590
> pv_vcpu_is_preempted:1:46336
>
> The sampling period is 5s, 6% of wait_early events caused by
> vcpu_is_preempted() being true.

6% isn't that high. However, when one vCPU voluntarily releases its
vCPU, all the subsequently waiters in the queue will do the same. It is
a cascading effect. Perhaps we wait early too aggressive with the
original patch.

I also look up the email chain of the original commit. The patch
submitter did not provide any performance data to support this change.
The patch just looked reasonable at that time. So there was no
objection. Given that we now have hard evidence that this was not a good
idea. I think we should revert it.

Reviewed-by: Waiman Long <longman@redhat.com>

Thanks,
Longman


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] Revert "locking/pvqspinlock: Don't wait if vCPU is preempted"
  2019-09-11  4:25     ` Waiman Long
@ 2019-09-11 13:04       ` Paolo Bonzini
  0 siblings, 0 replies; 7+ messages in thread
From: Paolo Bonzini @ 2019-09-11 13:04 UTC (permalink / raw)
  To: Waiman Long, Wanpeng Li
  Cc: LKML, kvm, Radim Krčmář,
	Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, Peter Zijlstra, Thomas Gleixner, Ingo Molnar,
	loobinliu, # v3 . 10+

On 11/09/19 06:25, Waiman Long wrote:
> On 9/10/19 6:56 AM, Wanpeng Li wrote:
>> On Mon, 9 Sep 2019 at 18:56, Waiman Long <longman@redhat.com> wrote:
>>> On 9/9/19 2:40 AM, Wanpeng Li wrote:
>>>> From: Wanpeng Li <wanpengli@tencent.com>
>>>>
>>>> This patch reverts commit 75437bb304b20 (locking/pvqspinlock: Don't wait if
>>>> vCPU is preempted), we found great regression caused by this commit.
>>>>
>>>> Xeon Skylake box, 2 sockets, 40 cores, 80 threads, three VMs, each is 80 vCPUs.
>>>> The score of ebizzy -M can reduce from 13000-14000 records/s to 1700-1800
>>>> records/s with this commit.
>>>>
>>>>           Host                       Guest                score
>>>>
>>>> vanilla + w/o kvm optimizes     vanilla               1700-1800 records/s
>>>> vanilla + w/o kvm optimizes     vanilla + revert      13000-14000 records/s
>>>> vanilla + w/ kvm optimizes      vanilla               4500-5000 records/s
>>>> vanilla + w/ kvm optimizes      vanilla + revert      14000-15500 records/s
>>>>
>>>> Exit from aggressive wait-early mechanism can result in yield premature and
>>>> incur extra scheduling latency in over-subscribe scenario.
>>>>
>>>> kvm optimizes:
>>>> [1] commit d73eb57b80b (KVM: Boost vCPUs that are delivering interrupts)
>>>> [2] commit 266e85a5ec9 (KVM: X86: Boost queue head vCPU to mitigate lock waiter preemption)
>>>>
>>>> Tested-by: loobinliu@tencent.com
>>>> Cc: Peter Zijlstra <peterz@infradead.org>
>>>> Cc: Thomas Gleixner <tglx@linutronix.de>
>>>> Cc: Ingo Molnar <mingo@kernel.org>
>>>> Cc: Waiman Long <longman@redhat.com>
>>>> Cc: Paolo Bonzini <pbonzini@redhat.com>
>>>> Cc: Radim Krčmář <rkrcmar@redhat.com>
>>>> Cc: loobinliu@tencent.com
>>>> Cc: stable@vger.kernel.org
>>>> Fixes: 75437bb304b20 (locking/pvqspinlock: Don't wait if vCPU is preempted)
>>>> Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
>>>> ---
>>>>  kernel/locking/qspinlock_paravirt.h | 2 +-
>>>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>>>
>>>> diff --git a/kernel/locking/qspinlock_paravirt.h b/kernel/locking/qspinlock_paravirt.h
>>>> index 89bab07..e84d21a 100644
>>>> --- a/kernel/locking/qspinlock_paravirt.h
>>>> +++ b/kernel/locking/qspinlock_paravirt.h
>>>> @@ -269,7 +269,7 @@ pv_wait_early(struct pv_node *prev, int loop)
>>>>       if ((loop & PV_PREV_CHECK_MASK) != 0)
>>>>               return false;
>>>>
>>>> -     return READ_ONCE(prev->state) != vcpu_running || vcpu_is_preempted(prev->cpu);
>>>> +     return READ_ONCE(prev->state) != vcpu_running;
>>>>  }
>>>>
>>>>  /*
>>> There are several possibilities for this performance regression:
>>>
>>> 1) Multiple vcpus calling vcpu_is_preempted() repeatedly may cause some
>>> cacheline contention issue depending on how that callback is implemented.
>>>
>>> 2) KVM may set the preempt flag for a short period whenver an vmexit
>>> happens even if a vmenter is executed shortly after. In this case, we
>>> may want to use a more durable vcpu suspend flag that indicates the vcpu
>>> won't get a real vcpu back for a longer period of time.
>>>
>>> Perhaps you can add a lock event counter to count the number of
>>> wait_early events caused by vcpu_is_preempted() being true to see if it
>>> really cause a lot more wait_early than without the vcpu_is_preempted()
>>> call.
>> pv_wait_again:1:179
>> pv_wait_early:1:189429
>> pv_wait_head:1:263
>> pv_wait_node:1:189429
>> pv_vcpu_is_preempted:1:45588
>> =========sleep 5============
>> pv_wait_again:1:181
>> pv_wait_early:1:202574
>> pv_wait_head:1:267
>> pv_wait_node:1:202590
>> pv_vcpu_is_preempted:1:46336
>>
>> The sampling period is 5s, 6% of wait_early events caused by
>> vcpu_is_preempted() being true.
> 
> 6% isn't that high. However, when one vCPU voluntarily releases its
> vCPU, all the subsequently waiters in the queue will do the same. It is
> a cascading effect. Perhaps we wait early too aggressive with the
> original patch.
> 
> I also look up the email chain of the original commit. The patch
> submitter did not provide any performance data to support this change.
> The patch just looked reasonable at that time. So there was no
> objection. Given that we now have hard evidence that this was not a good
> idea. I think we should revert it.
> 
> Reviewed-by: Waiman Long <longman@redhat.com>
> 
> Thanks,
> Longman
> 

Queued, thanks.

Paolo

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, back to index

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-09-09  1:40 [PATCH] Revert "locking/pvqspinlock: Don't wait if vCPU is preempted" Wanpeng Li
2019-09-09 10:56 ` Waiman Long
2019-09-09 11:06   ` Paolo Bonzini
2019-09-09 12:16     ` Wanpeng Li
2019-09-10  5:56   ` Wanpeng Li
2019-09-11  4:25     ` Waiman Long
2019-09-11 13:04       ` Paolo Bonzini

KVM Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/kvm/0 kvm/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 kvm kvm/ https://lore.kernel.org/kvm \
		kvm@vger.kernel.org kvm@archiver.kernel.org
	public-inbox-index kvm


Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kernel.vger.kvm


AGPL code for this site: git clone https://public-inbox.org/ public-inbox