* [PATCH 3/3] KVM: Fix leak vCPU's VMCS value into other pCPU
       [not found] <1564572438-15518-1-git-send-email-wanpengli@tencent.com>
@ 2019-07-31 11:27 ` Wanpeng Li
  2019-07-31 11:39   ` [PATCH v2 " Wanpeng Li
  0 siblings, 1 reply; 4+ messages in thread
From: Wanpeng Li @ 2019-07-31 11:27 UTC (permalink / raw)
  To: linux-kernel, kvm; +Cc: Paolo Bonzini, Radim Krčmář, stable

From: Wanpeng Li <wanpengli@tencent.com>

After commit d73eb57b80b (KVM: Boost vCPUs that are delivering interrupts), a 
five-year-old bug is exposed. Running the ebizzy benchmark in three 80-vCPU VMs 
on one 80-pCPU Skylake server triggers a lot of rcu_sched stall warning splats 
in the VMs after stress testing:

 INFO: rcu_sched detected stalls on CPUs/tasks: { 4 41 57 62 77} (detected by 15, t=60004 jiffies, g=899, c=898, q=15073)
 Call Trace:
   flush_tlb_mm_range+0x68/0x140
   tlb_flush_mmu.part.75+0x37/0xe0
   tlb_finish_mmu+0x55/0x60
   zap_page_range+0x142/0x190
   SyS_madvise+0x3cd/0x9c0
   system_call_fastpath+0x1c/0x21

swait_active() stays true until finish_swait() is called in 
kvm_vcpu_block(), so voluntarily preempted vCPUs are considered by the 
kvm_vcpu_on_spin() loop; this greatly increases the probability that the 
kvm_arch_vcpu_runnable(vcpu) condition is checked and evaluates true. When 
APICv is enabled, the yield-candidate vCPU's VMCS RVI field leaks (via 
vmx_sync_pir_to_irr()) into the spinning-on-a-taken-lock vCPU's current 
VMCS.
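
A condensed sketch of the call chain behind the leak (illustrative only;
the intermediate helpers are abbreviated and may not match the upstream
code exactly):

 /*
  * The spinning vCPU executes this on the pCPU, so *its* VMCS is the
  * one currently loaded.
  */
 kvm_vcpu_on_spin(me, yield_to_kernel_mode)
     if (swait_active(&vcpu->wq) && !kvm_arch_vcpu_runnable(vcpu))
         /*
          * kvm_arch_vcpu_runnable() -> kvm_vcpu_has_events()
          *   -> kvm_cpu_has_interrupt() -> ... -> vmx_sync_pir_to_irr()
          *
          * vmx_sync_pir_to_irr() updates RVI with vmcs_write*(), which
          * hits the VMCS loaded on this pCPU -- the spinning vCPU's,
          * not the yield candidate's.
          */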

This patch fixes it by reverting the kvm_arch_vcpu_runnable() condition 
in the kvm_vcpu_on_spin() loop.

Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Fixes: 98f4a1467 (KVM: add kvm_arch_vcpu_runnable() test to kvm_vcpu_on_spin() loop)
Cc: stable@vger.kernel.org
Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
---
 virt/kvm/kvm_main.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index ed061d8..12f2c91 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -2506,7 +2506,7 @@ void kvm_vcpu_on_spin(struct kvm_vcpu *me, bool yield_to_kernel_mode)
 				continue;
 			if (vcpu == me)
 				continue;
-			if (swait_active(&vcpu->wq) && !kvm_arch_vcpu_runnable(vcpu))
+			if (swait_active(&vcpu->wq))
 				continue;
 			if (READ_ONCE(vcpu->preempted) && yield_to_kernel_mode &&
 				!kvm_arch_vcpu_in_kernel(vcpu))
-- 
2.7.4



* [PATCH v2 3/3] KVM: Fix leak vCPU's VMCS value into other pCPU
  2019-07-31 11:27 ` [PATCH 3/3] KVM: Fix leak vCPU's VMCS value into other pCPU Wanpeng Li
@ 2019-07-31 11:39   ` Wanpeng Li
  2019-07-31 12:55     ` Paolo Bonzini
  0 siblings, 1 reply; 4+ messages in thread
From: Wanpeng Li @ 2019-07-31 11:39 UTC (permalink / raw)
  To: linux-kernel, kvm; +Cc: Paolo Bonzini, Radim Krčmář, stable

From: Wanpeng Li <wanpengli@tencent.com>

After commit d73eb57b80b (KVM: Boost vCPUs that are delivering interrupts), a 
five-year-old bug is exposed. Running the ebizzy benchmark in three 80-vCPU VMs 
on one 80-pCPU Skylake server triggers a lot of rcu_sched stall warning splats 
in the VMs after stress testing:

 INFO: rcu_sched detected stalls on CPUs/tasks: { 4 41 57 62 77} (detected by 15, t=60004 jiffies, g=899, c=898, q=15073)
 Call Trace:
   flush_tlb_mm_range+0x68/0x140
   tlb_flush_mmu.part.75+0x37/0xe0
   tlb_finish_mmu+0x55/0x60
   zap_page_range+0x142/0x190
   SyS_madvise+0x3cd/0x9c0
   system_call_fastpath+0x1c/0x21

swait_active() stays true until finish_swait() is called in 
kvm_vcpu_block(), so voluntarily preempted vCPUs are considered by the 
kvm_vcpu_on_spin() loop; this greatly increases the probability that the 
kvm_arch_vcpu_runnable(vcpu) condition is checked and evaluates true. When 
APICv is enabled, the yield-candidate vCPU's VMCS RVI field leaks (via 
vmx_sync_pir_to_irr()) into the spinning-on-a-taken-lock vCPU's current 
VMCS.

This patch fixes it by reverting the kvm_arch_vcpu_runnable() condition 
in the kvm_vcpu_on_spin() loop and additionally checking for involuntary 
preemption alongside swait_active(&vcpu->wq).

Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Fixes: 98f4a1467 (KVM: add kvm_arch_vcpu_runnable() test to kvm_vcpu_on_spin() loop)
Cc: stable@vger.kernel.org
Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
---
v1 -> v2:
 * check for involuntary preemption in addition to swait_active(&vcpu->wq)

 virt/kvm/kvm_main.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index ed061d8..12f2c91 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -2506,7 +2506,7 @@ void kvm_vcpu_on_spin(struct kvm_vcpu *me, bool yield_to_kernel_mode)
 				continue;
 			if (vcpu == me)
 				continue;
-			if (swait_active(&vcpu->wq) && !kvm_arch_vcpu_runnable(vcpu))
+			if (READ_ONCE(vcpu->preempted) && swait_active(&vcpu->wq))
 				continue;
 			if (READ_ONCE(vcpu->preempted) && yield_to_kernel_mode &&
 				!kvm_arch_vcpu_in_kernel(vcpu))
-- 
2.7.4



* Re: [PATCH v2 3/3] KVM: Fix leak vCPU's VMCS value into other pCPU
  2019-07-31 11:39   ` [PATCH v2 " Wanpeng Li
@ 2019-07-31 12:55     ` Paolo Bonzini
  2019-08-01  3:35       ` Wanpeng Li
  0 siblings, 1 reply; 4+ messages in thread
From: Paolo Bonzini @ 2019-07-31 12:55 UTC (permalink / raw)
  To: Wanpeng Li, linux-kernel, kvm
  Cc: Radim Krčmář, stable, Marc Zyngier, Christian Borntraeger

On 31/07/19 13:39, Wanpeng Li wrote:
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index ed061d8..12f2c91 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -2506,7 +2506,7 @@ void kvm_vcpu_on_spin(struct kvm_vcpu *me, bool yield_to_kernel_mode)
>  				continue;
>  			if (vcpu == me)
>  				continue;
> -			if (swait_active(&vcpu->wq) && !kvm_arch_vcpu_runnable(vcpu))
> +			if (READ_ONCE(vcpu->preempted) && swait_active(&vcpu->wq))
>  				continue;
>  			if (READ_ONCE(vcpu->preempted) && yield_to_kernel_mode &&
>  				!kvm_arch_vcpu_in_kernel(vcpu))
> 

This cannot work.  swait_active means you are waiting, so you cannot be
involuntarily preempted.

The problem here is simply that kvm_vcpu_has_events is being called
without holding the lock.  So kvm_arch_vcpu_runnable is okay, it's the
implementation that's wrong.

Just rename the existing function to vcpu_runnable and make a new
arch callback kvm_arch_dy_runnable.  kvm_arch_dy_runnable can be
conservative and only return true for a subset of events; in particular,
for x86 it can check:

- vcpu->arch.pv.pv_unhalted

- KVM_REQ_NMI or KVM_REQ_SMI or KVM_REQ_EVENT

- PIR.ON if APICv is set

Ultimately, all variables accessed in kvm_arch_dy_runnable should be
accessed with READ_ONCE or atomic_read.

And for all architectures, kvm_vcpu_on_spin should check
list_empty_careful(&vcpu->async_pf.done)

It's okay if your patch renames the function in non-x86 architectures,
leaving the fix to maintainers.  So, let's CC Marc and Christian since
ARM and s390 have pretty complex kvm_arch_vcpu_runnable as well.
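
A minimal sketch of what these two pieces could look like (illustrative
only, written against the description above rather than any final patch;
the vcpu_dy_runnable() wrapper name, the dy_apicv_has_pending_interrupt()
hook name and the exact request set are assumptions):

/*
 * Illustrative sketch: a conservative "directed yield" runnable check
 * along the lines described above, using lock-free accessors throughout.
 * dy_apicv_has_pending_interrupt() is a hypothetical kvm_x86_ops hook
 * standing in for "read PIR.ON when APICv is active".
 */
bool kvm_arch_dy_runnable(struct kvm_vcpu *vcpu)
{
	if (READ_ONCE(vcpu->arch.pv.pv_unhalted))
		return true;

	if (kvm_test_request(KVM_REQ_NMI, vcpu) ||
	    kvm_test_request(KVM_REQ_SMI, vcpu) ||
	    kvm_test_request(KVM_REQ_EVENT, vcpu))
		return true;

	if (vcpu->arch.apicv_active &&
	    kvm_x86_ops->dy_apicv_has_pending_interrupt(vcpu))
		return true;

	return false;
}

/* Generic side, used by kvm_vcpu_on_spin() instead of the full check */
static bool vcpu_dy_runnable(struct kvm_vcpu *vcpu)
{
	if (kvm_arch_dy_runnable(vcpu))
		return true;

#ifdef CONFIG_KVM_ASYNC_PF
	if (!list_empty_careful(&vcpu->async_pf.done))
		return true;
#endif

	return false;
}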

Paolo


* Re: [PATCH v2 3/3] KVM: Fix leak vCPU's VMCS value into other pCPU
  2019-07-31 12:55     ` Paolo Bonzini
@ 2019-08-01  3:35       ` Wanpeng Li
  0 siblings, 0 replies; 4+ messages in thread
From: Wanpeng Li @ 2019-08-01  3:35 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: LKML, kvm, Radim Krčmář, # v3.10+,
	Marc Zyngier, Christian Borntraeger

On Wed, 31 Jul 2019 at 20:55, Paolo Bonzini <pbonzini@redhat.com> wrote:
>
> On 31/07/19 13:39, Wanpeng Li wrote:
> > diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> > index ed061d8..12f2c91 100644
> > --- a/virt/kvm/kvm_main.c
> > +++ b/virt/kvm/kvm_main.c
> > @@ -2506,7 +2506,7 @@ void kvm_vcpu_on_spin(struct kvm_vcpu *me, bool yield_to_kernel_mode)
> >                               continue;
> >                       if (vcpu == me)
> >                               continue;
> > -                     if (swait_active(&vcpu->wq) && !kvm_arch_vcpu_runnable(vcpu))
> > +                     if (READ_ONCE(vcpu->preempted) && swait_active(&vcpu->wq))
> >                               continue;
> >                       if (READ_ONCE(vcpu->preempted) && yield_to_kernel_mode &&
> >                               !kvm_arch_vcpu_in_kernel(vcpu))
> >
>
> This cannot work.  swait_active means you are waiting, so you cannot be
> involuntarily preempted.
>
> The problem here is simply that kvm_vcpu_has_events is being called
> without holding the lock.  So kvm_arch_vcpu_runnable is okay, it's the
> implementation that's wrong.
>
> Just rename the existing function to vcpu_runnable and make a new
> arch callback kvm_arch_dy_runnable.  kvm_arch_dy_runnable can be
> conservative and only return true for a subset of events; in particular,
> for x86 it can check:
>
> - vcpu->arch.pv.pv_unhalted
>
> - KVM_REQ_NMI or KVM_REQ_SMI or KVM_REQ_EVENT
>
> - PIR.ON if APICv is set
>
> Ultimately, all variables accessed in kvm_arch_dy_runnable should be
> accessed with READ_ONCE or atomic_read.
>
> And for all architectures, kvm_vcpu_on_spin should check
> list_empty_careful(&vcpu->async_pf.done)
>
> It's okay if your patch renames the function in non-x86 architectures,
> leaving the fix to maintainers.  So, let's CC Marc and Christian since
> ARM and s390 have pretty complex kvm_arch_vcpu_runnable as well.

Ok, just sent a patch to do this.

Regards,
Wanpeng Li

