kvm.vger.kernel.org archive mirror
* [PATCH v2 2/4] KVM: X86: Bail out of direct yield in case of under-committed scenarios
@ 2021-05-13  1:59 Wanpeng Li
  2021-05-13  1:59 ` [PATCH v2 3/4] KVM: X86: Fix vCPU preempted state from guest's point of view Wanpeng Li
                   ` (2 more replies)
  0 siblings, 3 replies; 5+ messages in thread
From: Wanpeng Li @ 2021-05-13  1:59 UTC (permalink / raw)
  To: linux-kernel, kvm
  Cc: Paolo Bonzini, Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li,
	Jim Mattson, Joerg Roedel

From: Wanpeng Li <wanpengli@tencent.com>

In under-committed scenarios, vCPUs are scheduled easily and
kvm_vcpu_yield_to() only adds overhead: we observe many cases where
vcpu->ready is true yet the yield fails because p->state is
TASK_RUNNING. Bail out in such scenarios by checking the length of
the current CPU's runqueue; a short runqueue is a hint of an
under-committed host, not a guarantee of accuracy.

Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
---
v1 -> v2:
 * move the check after attempted counting
 * update patch description

 arch/x86/kvm/x86.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 9b6bca6..dfb7c32 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -8360,6 +8360,9 @@ static void kvm_sched_yield(struct kvm_vcpu *vcpu, unsigned long dest_id)
 
 	vcpu->stat.directed_yield_attempted++;
 
+	if (single_task_running())
+		goto no_yield;
+
 	rcu_read_lock();
 	map = rcu_dereference(vcpu->kvm->arch.apic_map);
 
-- 
2.7.4
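The control flow of the patch above can be modeled in userspace as a sketch. All names here are hypothetical stand-ins: `try_directed_yield()` and `struct yield_stats` are illustration-only, and the runqueue-length parameter plays the role of the kernel's `single_task_running()` query.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the vCPU yield statistics. */
struct yield_stats {
	unsigned long attempted;
	unsigned long failed;
};

/*
 * Model of the v2 ordering: the attempt is counted first, and only
 * then does the heuristic bail out when this CPU's runqueue holds a
 * single task (a hint of under-commitment, not a guarantee).
 */
static bool try_directed_yield(struct yield_stats *st, int runqueue_len)
{
	st->attempted++;

	if (runqueue_len <= 1) {	/* single_task_running() analogue */
		st->failed++;		/* the "no_yield" path */
		return false;
	}

	/* ... look up the destination vCPU and yield to it ... */
	return true;
}
```

Counting the attempt before the early-out keeps the directed_yield_attempted statistic meaningful, which is exactly the v1 -> v2 change noted in the changelog.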



* [PATCH v2 3/4] KVM: X86: Fix vCPU preempted state from guest's point of view
  2021-05-13  1:59 [PATCH v2 2/4] KVM: X86: Bail out of direct yield in case of under-committed scenarios Wanpeng Li
@ 2021-05-13  1:59 ` Wanpeng Li
  2021-05-13  1:59 ` [PATCH v2 4/4] KVM: x86: hyper-v: Take srcu lock when accessing kvm_memslots() Wanpeng Li
  2021-05-14 21:32 ` [PATCH v2 2/4] KVM: X86: Bail out of direct yield in case of under-comitted scenarios David Matlack
  2 siblings, 0 replies; 5+ messages in thread
From: Wanpeng Li @ 2021-05-13  1:59 UTC (permalink / raw)
  To: linux-kernel, kvm
  Cc: Paolo Bonzini, Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li,
	Jim Mattson, Joerg Roedel, stable

From: Wanpeng Li <wanpengli@tencent.com>

Commit 66570e966dd9 (kvm: x86: only provide PV features if enabled in guest's
CPUID) stops running the PV TLB-shootdown host-side logic when that PV feature
is not exposed to the guest. However, kvm_steal_time.preempted is not only
consumed by the PV TLB-shootdown logic; it also mitigates the lock-holder
preemption issue. If the PV TLB-shootdown feature is not exposed, we lose the
reset of kvm_steal_time.preempted before VM-entry, so from the guest's point
of view the vCPU always appears preempted. This patch fixes it by clearing
kvm_steal_time.preempted before VM-entry in that case as well.

Fixes: 66570e966dd9 (kvm: x86: only provide PV features if enabled in guest's CPUID)
Reviewed-by: Sean Christopherson <seanjc@google.com>
Cc: stable@vger.kernel.org
Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
---
 arch/x86/kvm/x86.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index dfb7c32..bed7b53 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -3105,6 +3105,8 @@ static void record_steal_time(struct kvm_vcpu *vcpu)
 				       st->preempted & KVM_VCPU_FLUSH_TLB);
 		if (xchg(&st->preempted, 0) & KVM_VCPU_FLUSH_TLB)
 			kvm_vcpu_flush_tlb_guest(vcpu);
+	} else {
+		st->preempted = 0;
 	}
 
 	vcpu->arch.st.preempted = 0;
-- 
2.7.4
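A minimal userspace model of the fix, with toy types standing in for the kernel's (the struct, flag values, and function below are illustrative only; in the kernel the exposed path consumes the flags with xchg()):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define KVM_VCPU_PREEMPTED	(1 << 0)
#define KVM_VCPU_FLUSH_TLB	(1 << 1)

/* Toy model of the guest-visible steal-time record. */
struct steal_time {
	uint8_t preempted;
};

/*
 * Model of record_steal_time() after the fix: when the PV TLB-flush
 * feature is exposed, the flags are read and cleared together (the
 * kernel's xchg); when it is not, preempted must still be cleared so
 * the guest does not see the vCPU as permanently preempted.
 */
static bool record_steal_time(struct steal_time *st, bool pv_tlb_exposed)
{
	bool flush = false;

	if (pv_tlb_exposed) {
		uint8_t old = st->preempted;	/* xchg(&st->preempted, 0) analogue */
		st->preempted = 0;
		flush = old & KVM_VCPU_FLUSH_TLB;
	} else {
		st->preempted = 0;		/* the added else branch */
	}
	return flush;
}
```

Either way the guest observes preempted == 0 after VM-entry, which is the property the commit message describes.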



* [PATCH v2 4/4] KVM: x86: hyper-v: Take srcu lock when accessing kvm_memslots()
  2021-05-13  1:59 [PATCH v2 2/4] KVM: X86: Bail out of direct yield in case of under-committed scenarios Wanpeng Li
  2021-05-13  1:59 ` [PATCH v2 3/4] KVM: X86: Fix vCPU preempted state from guest's point of view Wanpeng Li
@ 2021-05-13  1:59 ` Wanpeng Li
  2021-05-14 21:32 ` [PATCH v2 2/4] KVM: X86: Bail out of direct yield in case of under-comitted scenarios David Matlack
  2 siblings, 0 replies; 5+ messages in thread
From: Wanpeng Li @ 2021-05-13  1:59 UTC (permalink / raw)
  To: linux-kernel, kvm
  Cc: Paolo Bonzini, Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li,
	Jim Mattson, Joerg Roedel

From: Wanpeng Li <wanpengli@tencent.com>

   WARNING: suspicious RCU usage
   5.13.0-rc1 #4 Not tainted
   -----------------------------
   ./include/linux/kvm_host.h:710 suspicious rcu_dereference_check() usage!
  
  other info that might help us debug this:
  
  rcu_scheduler_active = 2, debug_locks = 1
   1 lock held by hyperv_clock/8318:
    #0: ffffb6b8cb05a7d8 (&hv->hv_lock){+.+.}-{3:3}, at: kvm_hv_invalidate_tsc_page+0x3e/0xa0 [kvm]
  
  stack backtrace:
  CPU: 3 PID: 8318 Comm: hyperv_clock Not tainted 5.13.0-rc1 #4
  Call Trace:
   dump_stack+0x87/0xb7
   lockdep_rcu_suspicious+0xce/0xf0
   kvm_write_guest_page+0x1c1/0x1d0 [kvm]
   kvm_write_guest+0x50/0x90 [kvm]
   kvm_hv_invalidate_tsc_page+0x79/0xa0 [kvm]
   kvm_gen_update_masterclock+0x1d/0x110 [kvm]
   kvm_arch_vm_ioctl+0x2a7/0xc50 [kvm]
   kvm_vm_ioctl+0x123/0x11d0 [kvm]
   __x64_sys_ioctl+0x3ed/0x9d0
   do_syscall_64+0x3d/0x80
   entry_SYSCALL_64_after_hwframe+0x44/0xae

kvm_memslots() is called from kvm_write_guest(), so the SRCU read lock must be held around the call.

Fixes: e880c6ea5 (KVM: x86: hyper-v: Prevent using not-yet-updated TSC page by secondary CPUs)
Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
---
 arch/x86/kvm/hyperv.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/arch/x86/kvm/hyperv.c b/arch/x86/kvm/hyperv.c
index f98370a3..f00830e 100644
--- a/arch/x86/kvm/hyperv.c
+++ b/arch/x86/kvm/hyperv.c
@@ -1172,6 +1172,7 @@ void kvm_hv_invalidate_tsc_page(struct kvm *kvm)
 {
 	struct kvm_hv *hv = to_kvm_hv(kvm);
 	u64 gfn;
+	int idx;
 
 	if (hv->hv_tsc_page_status == HV_TSC_PAGE_BROKEN ||
 	    hv->hv_tsc_page_status == HV_TSC_PAGE_UNSET ||
@@ -1190,9 +1191,16 @@ void kvm_hv_invalidate_tsc_page(struct kvm *kvm)
 	gfn = hv->hv_tsc_page >> HV_X64_MSR_TSC_REFERENCE_ADDRESS_SHIFT;
 
 	hv->tsc_ref.tsc_sequence = 0;
+
+	/*
+	 * Take the srcu lock as memslots will be accessed to check the gfn
+	 * cache generation against the memslots generation.
+	 */
+	idx = srcu_read_lock(&kvm->srcu);
 	if (kvm_write_guest(kvm, gfn_to_gpa(gfn),
 			    &hv->tsc_ref, sizeof(hv->tsc_ref.tsc_sequence)))
 		hv->hv_tsc_page_status = HV_TSC_PAGE_BROKEN;
+	srcu_read_unlock(&kvm->srcu, idx);
 
 out_unlock:
 	mutex_unlock(&hv->hv_lock);
-- 
2.7.4
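The lock/access/unlock bracket the patch adds can be sketched with a toy SRCU model (the struct and counter below are purely illustrative; the real srcu_read_lock() returns an index into per-CPU counter arrays rather than incrementing a single field):

```c
#include <assert.h>

/* Toy SRCU domain: a single reader count, for illustration only. */
struct srcu_struct {
	int readers;
};

static int srcu_read_lock(struct srcu_struct *sp)
{
	sp->readers++;
	return 0;		/* index token, handed back to the unlock */
}

static void srcu_read_unlock(struct srcu_struct *sp, int idx)
{
	(void)idx;
	sp->readers--;
}

/*
 * The fix brackets the memslot access exactly like the patch: take
 * the read lock, perform the guest write that ends up dereferencing
 * kvm_memslots(), then drop the lock with the returned index.
 */
static int write_guest_locked(struct srcu_struct *sp)
{
	int idx = srcu_read_lock(sp);
	int ret = 0;		/* kvm_write_guest(...) would run here */
	srcu_read_unlock(sp, idx);
	return ret;
}
```

Keeping the unlock on every path out of the bracket is what silences the lockdep "suspicious rcu_dereference_check() usage" splat quoted above.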



* Re: [PATCH v2 2/4] KVM: X86: Bail out of direct yield in case of under-committed scenarios
  2021-05-13  1:59 [PATCH v2 2/4] KVM: X86: Bail out of direct yield in case of under-committed scenarios Wanpeng Li
  2021-05-13  1:59 ` [PATCH v2 3/4] KVM: X86: Fix vCPU preempted state from guest's point of view Wanpeng Li
  2021-05-13  1:59 ` [PATCH v2 4/4] KVM: x86: hyper-v: Take srcu lock when accessing kvm_memslots() Wanpeng Li
@ 2021-05-14 21:32 ` David Matlack
  2021-05-17  1:03   ` Wanpeng Li
  2 siblings, 1 reply; 5+ messages in thread
From: David Matlack @ 2021-05-14 21:32 UTC (permalink / raw)
  To: Wanpeng Li
  Cc: linux-kernel, kvm list, Paolo Bonzini, Sean Christopherson,
	Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel

On Wed, May 12, 2021 at 7:01 PM Wanpeng Li <kernellwp@gmail.com> wrote:
>
> From: Wanpeng Li <wanpengli@tencent.com>
>
> In under-committed scenarios, vCPUs are scheduled easily and
> kvm_vcpu_yield_to() only adds overhead: we observe many cases where
> vcpu->ready is true yet the yield fails because p->state is
> TASK_RUNNING. Bail out in such scenarios by checking the length of
> the current CPU's runqueue; a short runqueue is a hint of an
> under-committed host, not a guarantee of accuracy.
>
> Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
> ---
> v1 -> v2:
>  * move the check after attempted counting
>  * update patch description
>
>  arch/x86/kvm/x86.c | 3 +++
>  1 file changed, 3 insertions(+)
>
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 9b6bca6..dfb7c32 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -8360,6 +8360,9 @@ static void kvm_sched_yield(struct kvm_vcpu *vcpu, unsigned long dest_id)
>
>         vcpu->stat.directed_yield_attempted++;
>
> +       if (single_task_running())
> +               goto no_yield;

Since this is a heuristic, do you have any experimental or real world
results that show the benefit?

> +
>         rcu_read_lock();
>         map = rcu_dereference(vcpu->kvm->arch.apic_map);
>
> --
> 2.7.4
>


* Re: [PATCH v2 2/4] KVM: X86: Bail out of direct yield in case of under-committed scenarios
  2021-05-14 21:32 ` [PATCH v2 2/4] KVM: X86: Bail out of direct yield in case of under-committed scenarios David Matlack
@ 2021-05-17  1:03   ` Wanpeng Li
  0 siblings, 0 replies; 5+ messages in thread
From: Wanpeng Li @ 2021-05-17  1:03 UTC (permalink / raw)
  To: David Matlack
  Cc: LKML, kvm list, Paolo Bonzini, Sean Christopherson,
	Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel

On Sat, 15 May 2021 at 05:33, David Matlack <dmatlack@google.com> wrote:
>
> On Wed, May 12, 2021 at 7:01 PM Wanpeng Li <kernellwp@gmail.com> wrote:
> >
> > From: Wanpeng Li <wanpengli@tencent.com>
> >
> > In under-committed scenarios, vCPUs are scheduled easily and
> > kvm_vcpu_yield_to() only adds overhead: we observe many cases where
> > vcpu->ready is true yet the yield fails because p->state is
> > TASK_RUNNING. Bail out in such scenarios by checking the length of
> > the current CPU's runqueue; a short runqueue is a hint of an
> > under-committed host, not a guarantee of accuracy.
> >
> > Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
> > ---
> > v1 -> v2:
> >  * move the check after attempted counting
> >  * update patch description
> >
> >  arch/x86/kvm/x86.c | 3 +++
> >  1 file changed, 3 insertions(+)
> >
> > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> > index 9b6bca6..dfb7c32 100644
> > --- a/arch/x86/kvm/x86.c
> > +++ b/arch/x86/kvm/x86.c
> > @@ -8360,6 +8360,9 @@ static void kvm_sched_yield(struct kvm_vcpu *vcpu, unsigned long dest_id)
> >
> >         vcpu->stat.directed_yield_attempted++;
> >
> > +       if (single_task_running())
> > +               goto no_yield;
>
> Since this is a heuristic, do you have any experimental or real world
> results that show the benefit?

I observed the directed_yield_successful/directed_yield_attempted
ratio: it improves from 50%+ to 80%+ in the under-committed
scenario.

     Wanpeng


end of thread, other threads:[~2021-05-17  1:04 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-05-13  1:59 [PATCH v2 2/4] KVM: X86: Bail out of direct yield in case of under-committed scenarios Wanpeng Li
2021-05-13  1:59 ` [PATCH v2 3/4] KVM: X86: Fix vCPU preempted state from guest's point of view Wanpeng Li
2021-05-13  1:59 ` [PATCH v2 4/4] KVM: x86: hyper-v: Take srcu lock when accessing kvm_memslots() Wanpeng Li
2021-05-14 21:32 ` [PATCH v2 2/4] KVM: X86: Bail out of direct yield in case of under-committed scenarios David Matlack
2021-05-17  1:03   ` Wanpeng Li
