linux-kernel.vger.kernel.org archive mirror
* [PATCH v4 1/2] KVM: X86: Less kvmclock sync induced vmexits after VM boots
@ 2020-02-18  1:17 Wanpeng Li
  2020-02-18  1:17 ` [PATCH RESEND v4 2/2] KVM: nVMX: Hold KVM's srcu lock when syncing vmcs12->shadow Wanpeng Li
  2020-02-18 14:54 ` [PATCH v4 1/2] KVM: X86: Less kvmclock sync induced vmexits after VM boots Vitaly Kuznetsov
  0 siblings, 2 replies; 7+ messages in thread
From: Wanpeng Li @ 2020-02-18  1:17 UTC (permalink / raw)
  To: linux-kernel, kvm
  Cc: Paolo Bonzini, Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li,
	Jim Mattson, Joerg Roedel

From: Wanpeng Li <wanpengli@tencent.com>

During vCPU creation, a kvmclock sync worker is queued to the global
workqueue before each vCPU creation completes. Each worker is scheduled after
a 300 * HZ delay, requests a kvmclock update for all vCPUs, and kicks them out
of guest mode. This gets worse when scaling to large VMs because of the many
resulting vmexits. A single worker acting as a leader is enough to trigger the
kvmclock sync request for all vCPUs.

Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
---
v3 -> v4:
 * check vcpu->vcpu_idx

 arch/x86/kvm/x86.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index fb5d64e..d0ba2d4 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -9390,8 +9390,9 @@ void kvm_arch_vcpu_postcreate(struct kvm_vcpu *vcpu)
 	if (!kvmclock_periodic_sync)
 		return;
 
-	schedule_delayed_work(&kvm->arch.kvmclock_sync_work,
-					KVMCLOCK_SYNC_PERIOD);
+	if (vcpu->vcpu_idx == 0)
+		schedule_delayed_work(&kvm->arch.kvmclock_sync_work,
+						KVMCLOCK_SYNC_PERIOD);
 }
 
 void kvm_arch_vcpu_destroy(struct kvm_vcpu *vcpu)
-- 
2.7.4



* [PATCH RESEND v4 2/2] KVM: nVMX: Hold KVM's srcu lock when syncing vmcs12->shadow
  2020-02-18  1:17 [PATCH v4 1/2] KVM: X86: Less kvmclock sync induced vmexits after VM boots Wanpeng Li
@ 2020-02-18  1:17 ` Wanpeng Li
  2020-02-18 14:54 ` [PATCH v4 1/2] KVM: X86: Less kvmclock sync induced vmexits after VM boots Vitaly Kuznetsov
  1 sibling, 0 replies; 7+ messages in thread
From: Wanpeng Li @ 2020-02-18  1:17 UTC (permalink / raw)
  To: linux-kernel, kvm
  Cc: Paolo Bonzini, Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li,
	Jim Mattson, Joerg Roedel

From: wanpeng li <wanpengli@tencent.com>

While mapping the eVMCS, KVM dereferences ->memslots without holding ->srcu
or ->slots_lock when accessing the hv assist page. This patch fixes it by
moving nested_sync_vmcs12_to_shadow() to prepare_guest_switch, where the SRCU
read lock is already held.

It can be reproduced by running kvm's evmcs_test selftest.

  =============================
  warning: suspicious rcu usage
  5.6.0-rc1+ #53 tainted: g        w ioe
  -----------------------------
  ./include/linux/kvm_host.h:623 suspicious rcu_dereference_check() usage!
 
  other info that might help us debug this:
 
   rcu_scheduler_active = 2, debug_locks = 1
  1 lock held by evmcs_test/8507:
   #0: ffff9ddd156d00d0 (&vcpu->mutex){+.+.}, at: kvm_vcpu_ioctl+0x85/0x680 [kvm]
 
  stack backtrace:
  cpu: 6 pid: 8507 comm: evmcs_test tainted: g        w ioe     5.6.0-rc1+ #53
  hardware name: dell inc. optiplex 7040/0jctf8, bios 1.4.9 09/12/2016
  call trace:
   dump_stack+0x68/0x9b
   kvm_read_guest_cached+0x11d/0x150 [kvm]
   kvm_hv_get_assist_page+0x33/0x40 [kvm]
   nested_enlightened_vmentry+0x2c/0x60 [kvm_intel]
   nested_vmx_handle_enlightened_vmptrld.part.52+0x32/0x1c0 [kvm_intel]
   nested_sync_vmcs12_to_shadow+0x439/0x680 [kvm_intel]
   vmx_vcpu_run+0x67a/0xe60 [kvm_intel]
   vcpu_enter_guest+0x35e/0x1bc0 [kvm]
   kvm_arch_vcpu_ioctl_run+0x40b/0x670 [kvm]
   kvm_vcpu_ioctl+0x370/0x680 [kvm]
   ksys_ioctl+0x235/0x850
   __x64_sys_ioctl+0x16/0x20
   do_syscall_64+0x77/0x780
   entry_syscall_64_after_hwframe+0x49/0xbe

Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
---
v2 -> v3:
 * update Subject
 * move the check above
 * add the WARN_ON_ONCE

 arch/x86/kvm/vmx/vmx.c | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 3be25ec..9a6797f 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -1175,6 +1175,10 @@ void vmx_prepare_switch_to_guest(struct kvm_vcpu *vcpu)
 					   vmx->guest_msrs[i].mask);
 
 	}
+
+	if (vmx->nested.need_vmcs12_to_shadow_sync)
+		nested_sync_vmcs12_to_shadow(vcpu);
+
 	if (vmx->guest_state_loaded)
 		return;
 
@@ -6482,8 +6486,7 @@ static void vmx_vcpu_run(struct kvm_vcpu *vcpu)
 		vmcs_write32(PLE_WINDOW, vmx->ple_window);
 	}
 
-	if (vmx->nested.need_vmcs12_to_shadow_sync)
-		nested_sync_vmcs12_to_shadow(vcpu);
+	WARN_ON_ONCE(vmx->nested.need_vmcs12_to_shadow_sync);
 
 	if (kvm_register_is_dirty(vcpu, VCPU_REGS_RSP))
 		vmcs_writel(GUEST_RSP, vcpu->arch.regs[VCPU_REGS_RSP]);
-- 
2.7.4
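
The reordering in this patch can be illustrated with a small userspace
model. This is only a sketch under stated assumptions: it assumes, as the
changelog says, that the SRCU read lock is held during the
prepare_guest_switch step, and, as the lockdep splat suggests, that it is
not held inside vmx_vcpu_run. Every toy_* name is hypothetical and none of
this is kernel code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model; none of these names exist in the kernel. */
struct toy_vcpu {
	int srcu_readers;      /* > 0 while the toy "SRCU read lock" is held */
	bool need_sync;        /* models nested.need_vmcs12_to_shadow_sync */
	int suspicious_derefs; /* memslot reads made without the lock */
};

/* Models the lockdep check behind "suspicious rcu_dereference_check()". */
static void toy_read_memslots(struct toy_vcpu *v)
{
	if (v->srcu_readers == 0)
		v->suspicious_derefs++;
}

/* The sync reads guest memory (the assist page), so it touches memslots. */
static void toy_sync_vmcs12_to_shadow(struct toy_vcpu *v)
{
	toy_read_memslots(v);
	v->need_sync = false;
}

/*
 * One simulated guest entry. Returns how many memslot reads happened
 * outside the read-side critical section.
 */
static int toy_enter_guest(bool sync_in_prepare)
{
	struct toy_vcpu v = {
		.srcu_readers = 1,
		.need_sync = true,
		.suspicious_derefs = 0,
	};

	/* Prepare step: the read lock is still held here (assumption). */
	if (sync_in_prepare && v.need_sync)
		toy_sync_vmcs12_to_shadow(&v);

	v.srcu_readers--; /* lock dropped before the run step (assumption) */

	/* Run step: the pre-patch code performed the sync here. */
	if (v.need_sync)
		toy_sync_vmcs12_to_shadow(&v);

	v.srcu_readers++; /* lock reacquired after the guest exits */
	return v.suspicious_derefs;
}
```

In this model toy_enter_guest(false) reports one unprotected read and
toy_enter_guest(true) reports none, which mirrors why the sync moves into
the prepare step and only a WARN_ON_ONCE remains in the run step.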



* Re: [PATCH v4 1/2] KVM: X86: Less kvmclock sync induced vmexits after VM boots
  2020-02-18  1:17 [PATCH v4 1/2] KVM: X86: Less kvmclock sync induced vmexits after VM boots Wanpeng Li
  2020-02-18  1:17 ` [PATCH RESEND v4 2/2] KVM: nVMX: Hold KVM's srcu lock when syncing vmcs12->shadow Wanpeng Li
@ 2020-02-18 14:54 ` Vitaly Kuznetsov
  2020-02-18 15:33   ` Paolo Bonzini
  2020-02-19  0:32   ` Wanpeng Li
  1 sibling, 2 replies; 7+ messages in thread
From: Vitaly Kuznetsov @ 2020-02-18 14:54 UTC (permalink / raw)
  To: Wanpeng Li
  Cc: Paolo Bonzini, Sean Christopherson, Wanpeng Li, Jim Mattson,
	Joerg Roedel, linux-kernel, kvm

Wanpeng Li <kernellwp@gmail.com> writes:

> From: Wanpeng Li <wanpengli@tencent.com>
>
> During vCPU creation, a kvmclock sync worker is queued to the global
> workqueue before each vCPU creation completes. Each worker is scheduled after
> a 300 * HZ delay, requests a kvmclock update for all vCPUs, and kicks them out
> of guest mode. This gets worse when scaling to large VMs because of the many
> resulting vmexits. A single worker acting as a leader is enough to trigger the
> kvmclock sync request for all vCPUs.
>
> Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
> ---
> v3 -> v4:
>  * check vcpu->vcpu_idx
>
>  arch/x86/kvm/x86.c | 5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
>
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index fb5d64e..d0ba2d4 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -9390,8 +9390,9 @@ void kvm_arch_vcpu_postcreate(struct kvm_vcpu *vcpu)
>  	if (!kvmclock_periodic_sync)
>  		return;
>  
> -	schedule_delayed_work(&kvm->arch.kvmclock_sync_work,
> -					KVMCLOCK_SYNC_PERIOD);
> +	if (vcpu->vcpu_idx == 0)
> +		schedule_delayed_work(&kvm->arch.kvmclock_sync_work,
> +						KVMCLOCK_SYNC_PERIOD);
>  }
>  
>  void kvm_arch_vcpu_destroy(struct kvm_vcpu *vcpu)

Forgive me my ignorance, I was under the impression
schedule_delayed_work() doesn't do anything if the work is already
queued (see queue_delayed_work_on()) and we seem to be scheduling the
same work (&kvm->arch.kvmclock_sync_work) which is per-kvm (not
per-vcpu). Do we actually happen to finish executing it before the next
vCPU is created, or why does the storm you describe happen?

-- 
Vitaly
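
The point about already-queued work can be sketched in userspace C. This is
a minimal model, not the kernel implementation: it assumes only what the
thread states, namely that the work item is per-VM and that queueing is
guarded by an atomic test-and-set on a pending flag. All toy_* names are
hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical model of one per-VM delayed work item. */
struct toy_delayed_work {
	atomic_bool pending; /* models the work item's "pending" bit */
	int times_queued;    /* how often the work was really queued */
};

static void toy_init_work(struct toy_delayed_work *dw)
{
	atomic_init(&dw->pending, false);
	dw->times_queued = 0;
}

/* Queue the work unless it is already pending; returns true if queued. */
static bool toy_schedule_delayed_work(struct toy_delayed_work *dw)
{
	/* atomic_exchange returns the old value, like a test-and-set. */
	if (atomic_exchange(&dw->pending, true))
		return false;
	dw->times_queued++;
	return true;
}

/*
 * Simulate creating nr_vcpus vCPUs. With only_vcpu0 set, only index 0
 * tries to schedule the work, as in the patch. Returns the number of
 * scheduling attempts, each of which touches the shared pending flag.
 */
static int toy_create_vcpus(struct toy_delayed_work *dw, int nr_vcpus,
			    bool only_vcpu0)
{
	int attempts = 0;

	for (int idx = 0; idx < nr_vcpus; idx++) {
		if (only_vcpu0 && idx != 0)
			continue;
		attempts++;
		toy_schedule_delayed_work(dw);
	}
	return attempts;
}
```

In this model the work is queued exactly once with or without the
vcpu_idx check. Only the number of attempts on the shared flag differs,
64 versus 1 for a 64-vCPU VM.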



* Re: [PATCH v4 1/2] KVM: X86: Less kvmclock sync induced vmexits after VM boots
  2020-02-18 14:54 ` [PATCH v4 1/2] KVM: X86: Less kvmclock sync induced vmexits after VM boots Vitaly Kuznetsov
@ 2020-02-18 15:33   ` Paolo Bonzini
  2020-02-18 16:29     ` Vitaly Kuznetsov
  2020-02-19  0:32   ` Wanpeng Li
  1 sibling, 1 reply; 7+ messages in thread
From: Paolo Bonzini @ 2020-02-18 15:33 UTC (permalink / raw)
  To: Vitaly Kuznetsov, Wanpeng Li
  Cc: Sean Christopherson, Wanpeng Li, Jim Mattson, Joerg Roedel,
	linux-kernel, kvm

On 18/02/20 15:54, Vitaly Kuznetsov wrote:
>> -	schedule_delayed_work(&kvm->arch.kvmclock_sync_work,
>> -					KVMCLOCK_SYNC_PERIOD);
>> +	if (vcpu->vcpu_idx == 0)
>> +		schedule_delayed_work(&kvm->arch.kvmclock_sync_work,
>> +						KVMCLOCK_SYNC_PERIOD);
>>  }
>>  
>>  void kvm_arch_vcpu_destroy(struct kvm_vcpu *vcpu)
> Forgive me my ignorance, I was under the impression
> schedule_delayed_work() doesn't do anything if the work is already
> queued (see queue_delayed_work_on()) and we seem to be scheduling the
> same work (&kvm->arch.kvmclock_sync_work) which is per-kvm (not
> per-vcpu).

No, it executes after 5 minutes.  I agree that the patch shouldn't be
really necessary, though you do save on cacheline bouncing due to
test_and_set_bit.

Paolo

> Do we actually happen to finish executing it before the next vCPU
> is created, or why does the storm you describe happen?
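
The "5 minutes" figure follows from the 300 * HZ delay quoted in the
changelog. A small sketch of the arithmetic, with hypothetical toy_* helper
names and HZ passed as a parameter so several tick rates can be compared:

```c
#include <assert.h>

/* Delay in jiffies for a given tick rate, mirroring "300 * HZ". */
static unsigned long toy_kvmclock_sync_period(unsigned long hz)
{
	return 300 * hz;
}

/* Convert a delay in jiffies back to whole seconds. */
static unsigned long toy_jiffies_to_secs(unsigned long jiffies,
					 unsigned long hz)
{
	return jiffies / hz;
}
```

Because the delay scales with HZ, the result is 300 seconds (5 minutes) for
any tick rate, for example HZ = 100, 250, or 1000.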



* Re: [PATCH v4 1/2] KVM: X86: Less kvmclock sync induced vmexits after VM boots
  2020-02-18 15:33   ` Paolo Bonzini
@ 2020-02-18 16:29     ` Vitaly Kuznetsov
  2020-02-18 16:31       ` Paolo Bonzini
  0 siblings, 1 reply; 7+ messages in thread
From: Vitaly Kuznetsov @ 2020-02-18 16:29 UTC (permalink / raw)
  To: Paolo Bonzini, Wanpeng Li
  Cc: Sean Christopherson, Wanpeng Li, Jim Mattson, Joerg Roedel,
	linux-kernel, kvm

Paolo Bonzini <pbonzini@redhat.com> writes:

> On 18/02/20 15:54, Vitaly Kuznetsov wrote:
>>> -	schedule_delayed_work(&kvm->arch.kvmclock_sync_work,
>>> -					KVMCLOCK_SYNC_PERIOD);
>>> +	if (vcpu->vcpu_idx == 0)
>>> +		schedule_delayed_work(&kvm->arch.kvmclock_sync_work,
>>> +						KVMCLOCK_SYNC_PERIOD);
>>>  }
>>>  
>>>  void kvm_arch_vcpu_destroy(struct kvm_vcpu *vcpu)
>> Forgive me my ignorance, I was under the impression
>> schedule_delayed_work() doesn't do anything if the work is already
>> queued (see queue_delayed_work_on()) and we seem to be scheduling the
>> same work (&kvm->arch.kvmclock_sync_work) which is per-kvm (not
>> per-vcpu).
>
> No, it executes after 5 minutes.  I agree that the patch shouldn't be
> really necessary, though you do save on cacheline bouncing due to
> test_and_set_bit.
>

True, but the changelog should probably be updated then.

-- 
Vitaly



* Re: [PATCH v4 1/2] KVM: X86: Less kvmclock sync induced vmexits after VM boots
  2020-02-18 16:29     ` Vitaly Kuznetsov
@ 2020-02-18 16:31       ` Paolo Bonzini
  0 siblings, 0 replies; 7+ messages in thread
From: Paolo Bonzini @ 2020-02-18 16:31 UTC (permalink / raw)
  To: Vitaly Kuznetsov, Wanpeng Li
  Cc: Sean Christopherson, Wanpeng Li, Jim Mattson, Joerg Roedel,
	linux-kernel, kvm

On 18/02/20 17:29, Vitaly Kuznetsov wrote:
>> No, it executes after 5 minutes.  I agree that the patch shouldn't be
>> really necessary, though you do save on cacheline bouncing due to
>> test_and_set_bit.
> 
> True, but the changelog should probably be updated then.

Yes, I agree.

Paolo



* Re: [PATCH v4 1/2] KVM: X86: Less kvmclock sync induced vmexits after VM boots
  2020-02-18 14:54 ` [PATCH v4 1/2] KVM: X86: Less kvmclock sync induced vmexits after VM boots Vitaly Kuznetsov
  2020-02-18 15:33   ` Paolo Bonzini
@ 2020-02-19  0:32   ` Wanpeng Li
  1 sibling, 0 replies; 7+ messages in thread
From: Wanpeng Li @ 2020-02-19  0:32 UTC (permalink / raw)
  To: Vitaly Kuznetsov
  Cc: Paolo Bonzini, Sean Christopherson, Wanpeng Li, Jim Mattson,
	Joerg Roedel, LKML, kvm

On Tue, 18 Feb 2020 at 22:54, Vitaly Kuznetsov <vkuznets@redhat.com> wrote:
>
> Wanpeng Li <kernellwp@gmail.com> writes:
>
> > From: Wanpeng Li <wanpengli@tencent.com>
> >
> > During vCPU creation, a kvmclock sync worker is queued to the global
> > workqueue before each vCPU creation completes. Each worker is scheduled
> > after a 300 * HZ delay, requests a kvmclock update for all vCPUs, and kicks
> > them out of guest mode. This gets worse when scaling to large VMs because of
> > the many resulting vmexits. A single worker acting as a leader is enough to
> > trigger the kvmclock sync request for all vCPUs.
> >
> > Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
> > ---
> > v3 -> v4:
> >  * check vcpu->vcpu_idx
> >
> >  arch/x86/kvm/x86.c | 5 +++--
> >  1 file changed, 3 insertions(+), 2 deletions(-)
> >
> > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> > index fb5d64e..d0ba2d4 100644
> > --- a/arch/x86/kvm/x86.c
> > +++ b/arch/x86/kvm/x86.c
> > @@ -9390,8 +9390,9 @@ void kvm_arch_vcpu_postcreate(struct kvm_vcpu *vcpu)
> >       if (!kvmclock_periodic_sync)
> >               return;
> >
> > -     schedule_delayed_work(&kvm->arch.kvmclock_sync_work,
> > -                                     KVMCLOCK_SYNC_PERIOD);
> > +     if (vcpu->vcpu_idx == 0)
> > +             schedule_delayed_work(&kvm->arch.kvmclock_sync_work,
> > +                                             KVMCLOCK_SYNC_PERIOD);
> >  }
> >
> >  void kvm_arch_vcpu_destroy(struct kvm_vcpu *vcpu)
>
> Forgive me my ignorance, I was under the impression
> schedule_delayed_work() doesn't do anything if the work is already
> queued (see queue_delayed_work_on()) and we seem to be scheduling the
> same work (&kvm->arch.kvmclock_sync_work) which is per-kvm (not
> per-vcpu). Do we actually happen to finish executing it before the next
> vCPU is created, or why does the storm you describe happen?

I missed that. OK, let's just take patch 2/2 upstream.

    Wanpeng
