All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v4 1/5] KVM: exit halt polling on need_resched() for both book3s and generic halt-polling
@ 2021-05-18 12:00 Wanpeng Li
  2021-05-18 12:00 ` [PATCH v4 2/5] KVM: X86: Bail out of direct yield in case of under-committed scenarios Wanpeng Li
                   ` (4 more replies)
  0 siblings, 5 replies; 11+ messages in thread
From: Wanpeng Li @ 2021-05-18 12:00 UTC (permalink / raw)
  To: linux-kernel, kvm
  Cc: Paolo Bonzini, Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li,
	Jim Mattson, Joerg Roedel, Ben Segall, Venkatesh Srinivas,
	David Matlack, Paul Mackerras, Suraj Jitindar Singh

From: Wanpeng Li <wanpengli@tencent.com>

Inspired by commit 262de4102c7bb8 (kvm: exit halt polling on need_resched()
as well), CFS_BANDWIDTH throttling will use resched_task() when there is just
one task to get the task to block. It was likely allowing VMs to overrun their
quota when halt polling. Due to PPC implements an arch specific halt polling
logic, we should add the need_resched() checking there as well. This
patch adds a helper function that to be shared between book3s and generic
halt-polling loop.

Reviewed-by: David Matlack <dmatlack@google.com>
Reviewed-by: Venkatesh Srinivas <venkateshs@chromium.org>
Cc: Ben Segall <bsegall@google.com>
Cc: Venkatesh Srinivas <venkateshs@chromium.org>
Cc: Jim Mattson <jmattson@google.com> 
Cc: David Matlack <dmatlack@google.com>
Cc: Paul Mackerras <paulus@ozlabs.org>
Cc: Suraj Jitindar Singh <sjitindarsingh@gmail.com>
Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
---
v3 -> v4:
 * rename to kvm_vcpu_can_poll
v2 -> v3:
 * add a helper function
v1 -> v2:
 * update patch description

 arch/powerpc/kvm/book3s_hv.c | 2 +-
 include/linux/kvm_host.h     | 2 ++
 virt/kvm/kvm_main.c          | 8 ++++++--
 3 files changed, 9 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 28a80d240b76..7360350e66ff 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -3936,7 +3936,7 @@ static void kvmppc_vcore_blocked(struct kvmppc_vcore *vc)
 				break;
 			}
 			cur = ktime_get();
-		} while (single_task_running() && ktime_before(cur, stop));
+		} while (kvm_vcpu_can_poll(cur, stop));
 
 		spin_lock(&vc->lock);
 		vc->vcore_state = VCORE_INACTIVE;
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 2f34487e21f2..ba682f738a25 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -1583,4 +1583,6 @@ static inline void kvm_handle_signal_exit(struct kvm_vcpu *vcpu)
 /* Max number of entries allowed for each kvm dirty ring */
 #define  KVM_DIRTY_RING_MAX_ENTRIES  65536
 
+bool kvm_vcpu_can_poll(ktime_t cur, ktime_t stop);
+
 #endif
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 6b4feb92dc79..62522c12beba 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -2945,6 +2945,11 @@ update_halt_poll_stats(struct kvm_vcpu *vcpu, u64 poll_ns, bool waited)
 		vcpu->stat.halt_poll_success_ns += poll_ns;
 }
 
+bool kvm_vcpu_can_poll(ktime_t cur, ktime_t stop)
+{
+	return single_task_running() && !need_resched() && ktime_before(cur, stop);
+}
+
 /*
  * The vCPU has executed a HLT instruction with in-kernel mode enabled.
  */
@@ -2973,8 +2978,7 @@ void kvm_vcpu_block(struct kvm_vcpu *vcpu)
 				goto out;
 			}
 			poll_end = cur = ktime_get();
-		} while (single_task_running() && !need_resched() &&
-			 ktime_before(cur, stop));
+		} while (kvm_vcpu_can_poll(cur, stop));
 	}
 
 	prepare_to_rcuwait(&vcpu->wait);
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH v4 2/5] KVM: X86: Bail out of direct yield in case of under-committed scenarios
  2021-05-18 12:00 [PATCH v4 1/5] KVM: exit halt polling on need_resched() for both book3s and generic halt-polling Wanpeng Li
@ 2021-05-18 12:00 ` Wanpeng Li
  2021-05-18 19:21   ` Sean Christopherson
  2021-05-18 12:00 ` [PATCH v4 3/5] KVM: X86: Fix vCPU preempted state from guest's point of view Wanpeng Li
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 11+ messages in thread
From: Wanpeng Li @ 2021-05-18 12:00 UTC (permalink / raw)
  To: linux-kernel, kvm
  Cc: Paolo Bonzini, Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li,
	Jim Mattson, Joerg Roedel

From: Wanpeng Li <wanpengli@tencent.com>

In case of under-committed scenarios, vCPU can get scheduling easily,
kvm_vcpu_yield_to add extra overhead, we can observe a lot of race
between vcpu->ready is true and yield fails due to p->state is
TASK_RUNNING. Let's bail out in such scenarios by checking the length
of current cpu runqueue, it can be treated as a hint of under-committed
instead of guarantee of accuracy. The directed_yield_successful/attempted
ratio can be improved from 50+% to 80+% in the under-committed scenario.

Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
---
v2 -> v3:
 * update patch description
v1 -> v2:
 * move the check after attempted counting
 * update patch description

 arch/x86/kvm/x86.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 9b6bca616929..dfb7c320581f 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -8360,6 +8360,9 @@ static void kvm_sched_yield(struct kvm_vcpu *vcpu, unsigned long dest_id)
 
 	vcpu->stat.directed_yield_attempted++;
 
+	if (single_task_running())
+		goto no_yield;
+
 	rcu_read_lock();
 	map = rcu_dereference(vcpu->kvm->arch.apic_map);
 
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH v4 3/5] KVM: X86: Fix vCPU preempted state from guest's point of view
  2021-05-18 12:00 [PATCH v4 1/5] KVM: exit halt polling on need_resched() for both book3s and generic halt-polling Wanpeng Li
  2021-05-18 12:00 ` [PATCH v4 2/5] KVM: X86: Bail out of direct yield in case of under-committed scenarios Wanpeng Li
@ 2021-05-18 12:00 ` Wanpeng Li
  2021-05-18 12:00 ` [PATCH v4 4/5] KVM: X86: hyper-v: Task srcu lock when accessing kvm_memslots() Wanpeng Li
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 11+ messages in thread
From: Wanpeng Li @ 2021-05-18 12:00 UTC (permalink / raw)
  To: linux-kernel, kvm
  Cc: Paolo Bonzini, Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li,
	Jim Mattson, Joerg Roedel, stable

From: Wanpeng Li <wanpengli@tencent.com>

Commit 66570e966dd9 (kvm: x86: only provide PV features if enabled in guest's
CPUID) avoids to access pv tlb shootdown host side logic when this pv feature
is not exposed to guest, however, kvm_steal_time.preempted not only leveraged
by pv tlb shootdown logic but also mitigate the lock holder preemption issue.
From guest's point of view, vCPU is always preempted since we lose the reset
of kvm_steal_time.preempted before vmentry if pv tlb shootdown feature is not
exposed. This patch fixes it by clearing kvm_steal_time.preempted before
vmentry.

Fixes: 66570e966dd9 (kvm: x86: only provide PV features if enabled in guest's CPUID)
Reviewed-by: Sean Christopherson <seanjc@google.com>
Cc: stable@vger.kernel.org
Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
---
v1 -> v2:
 * add curly braces

 arch/x86/kvm/x86.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index dfb7c320581f..bed7b5348c0e 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -3105,6 +3105,8 @@ static void record_steal_time(struct kvm_vcpu *vcpu)
 				       st->preempted & KVM_VCPU_FLUSH_TLB);
 		if (xchg(&st->preempted, 0) & KVM_VCPU_FLUSH_TLB)
 			kvm_vcpu_flush_tlb_guest(vcpu);
+	} else {
+		st->preempted = 0;
 	}
 
 	vcpu->arch.st.preempted = 0;
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH v4 4/5] KVM: X86: hyper-v: Task srcu lock when accessing kvm_memslots()
  2021-05-18 12:00 [PATCH v4 1/5] KVM: exit halt polling on need_resched() for both book3s and generic halt-polling Wanpeng Li
  2021-05-18 12:00 ` [PATCH v4 2/5] KVM: X86: Bail out of direct yield in case of under-committed scenarios Wanpeng Li
  2021-05-18 12:00 ` [PATCH v4 3/5] KVM: X86: Fix vCPU preempted state from guest's point of view Wanpeng Li
@ 2021-05-18 12:00 ` Wanpeng Li
  2021-05-18 12:00 ` [PATCH v4 5/5] KVM: LAPIC: Narrow the timer latency between wait_lapic_expire and world switch Wanpeng Li
  2021-05-24 13:46 ` [PATCH v4 1/5] KVM: exit halt polling on need_resched() for both book3s and generic halt-polling Paolo Bonzini
  4 siblings, 0 replies; 11+ messages in thread
From: Wanpeng Li @ 2021-05-18 12:00 UTC (permalink / raw)
  To: linux-kernel, kvm
  Cc: Paolo Bonzini, Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li,
	Jim Mattson, Joerg Roedel

From: Wanpeng Li <wanpengli@tencent.com>

   WARNING: suspicious RCU usage
   5.13.0-rc1 #4 Not tainted
   -----------------------------
   ./include/linux/kvm_host.h:710 suspicious rcu_dereference_check() usage!

  other info that might help us debug this:

  rcu_scheduler_active = 2, debug_locks = 1
   1 lock held by hyperv_clock/8318:
    #0: ffffb6b8cb05a7d8 (&hv->hv_lock){+.+.}-{3:3}, at: kvm_hv_invalidate_tsc_page+0x3e/0xa0 [kvm]

  stack backtrace:
  CPU: 3 PID: 8318 Comm: hyperv_clock Not tainted 5.13.0-rc1 #4
  Call Trace:
   dump_stack+0x87/0xb7
   lockdep_rcu_suspicious+0xce/0xf0
   kvm_write_guest_page+0x1c1/0x1d0 [kvm]
   kvm_write_guest+0x50/0x90 [kvm]
   kvm_hv_invalidate_tsc_page+0x79/0xa0 [kvm]
   kvm_gen_update_masterclock+0x1d/0x110 [kvm]
   kvm_arch_vm_ioctl+0x2a7/0xc50 [kvm]
   kvm_vm_ioctl+0x123/0x11d0 [kvm]
   __x64_sys_ioctl+0x3ed/0x9d0
   do_syscall_64+0x3d/0x80
   entry_SYSCALL_64_after_hwframe+0x44/0xae

kvm_memslots() will be called by kvm_write_guest(), so we should take the srcu lock.

Fixes: e880c6ea5 (KVM: x86: hyper-v: Prevent using not-yet-updated TSC page by secondary CPUs)
Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
---
 arch/x86/kvm/hyperv.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/arch/x86/kvm/hyperv.c b/arch/x86/kvm/hyperv.c
index f98370a39936..f00830e5202f 100644
--- a/arch/x86/kvm/hyperv.c
+++ b/arch/x86/kvm/hyperv.c
@@ -1172,6 +1172,7 @@ void kvm_hv_invalidate_tsc_page(struct kvm *kvm)
 {
 	struct kvm_hv *hv = to_kvm_hv(kvm);
 	u64 gfn;
+	int idx;
 
 	if (hv->hv_tsc_page_status == HV_TSC_PAGE_BROKEN ||
 	    hv->hv_tsc_page_status == HV_TSC_PAGE_UNSET ||
@@ -1190,9 +1191,16 @@ void kvm_hv_invalidate_tsc_page(struct kvm *kvm)
 	gfn = hv->hv_tsc_page >> HV_X64_MSR_TSC_REFERENCE_ADDRESS_SHIFT;
 
 	hv->tsc_ref.tsc_sequence = 0;
+
+	/*
+	 * Take the srcu lock as memslots will be accessed to check the gfn
+	 * cache generation against the memslots generation.
+	 */
+	idx = srcu_read_lock(&kvm->srcu);
 	if (kvm_write_guest(kvm, gfn_to_gpa(gfn),
 			    &hv->tsc_ref, sizeof(hv->tsc_ref.tsc_sequence)))
 		hv->hv_tsc_page_status = HV_TSC_PAGE_BROKEN;
+	srcu_read_unlock(&kvm->srcu, idx);
 
 out_unlock:
 	mutex_unlock(&hv->hv_lock);
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH v4 5/5] KVM: LAPIC: Narrow the timer latency between wait_lapic_expire and world switch
  2021-05-18 12:00 [PATCH v4 1/5] KVM: exit halt polling on need_resched() for both book3s and generic halt-polling Wanpeng Li
                   ` (2 preceding siblings ...)
  2021-05-18 12:00 ` [PATCH v4 4/5] KVM: X86: hyper-v: Task srcu lock when accessing kvm_memslots() Wanpeng Li
@ 2021-05-18 12:00 ` Wanpeng Li
  2021-05-18 19:13   ` Sean Christopherson
  2021-05-24 13:46 ` [PATCH v4 1/5] KVM: exit halt polling on need_resched() for both book3s and generic halt-polling Paolo Bonzini
  4 siblings, 1 reply; 11+ messages in thread
From: Wanpeng Li @ 2021-05-18 12:00 UTC (permalink / raw)
  To: linux-kernel, kvm
  Cc: Paolo Bonzini, Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li,
	Jim Mattson, Joerg Roedel

From: Wanpeng Li <wanpengli@tencent.com>

Let's treat lapic_timer_advance_ns automatic tuning logic as hypervisor
overhead, move it before wait_lapic_expire instead of between wait_lapic_expire 
and the world switch, the wait duration should be calculated by the 
up-to-date guest_tsc after the overhead of automatic tuning logic. This 
patch reduces ~30+ cycles for kvm-unit-tests/tscdeadline-latency when testing 
busy waits.

Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
---
v3 -> v4:
 * consider automatic tuning is disabled and timer did not arrive early
 * add a code comment

 arch/x86/kvm/lapic.c | 14 +++++++++++---
 1 file changed, 11 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index c0ebef560bd1..5d91f2367c31 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -1598,11 +1598,19 @@ static void __kvm_wait_lapic_expire(struct kvm_vcpu *vcpu)
 	guest_tsc = kvm_read_l1_tsc(vcpu, rdtsc());
 	apic->lapic_timer.advance_expire_delta = guest_tsc - tsc_deadline;
 
+	if (lapic_timer_advance_dynamic) {
+		adjust_lapic_timer_advance(vcpu, apic->lapic_timer.advance_expire_delta);
+		/*
+		 * If the timer fired early, reread the TSC to account for the
+		 * overhead of the above adjustment to avoid waiting longer
+		 * than is necessary.
+		 */
+		if (guest_tsc < tsc_deadline)
+			guest_tsc = kvm_read_l1_tsc(vcpu, rdtsc());
+	}
+
 	if (guest_tsc < tsc_deadline)
 		__wait_lapic_expire(vcpu, tsc_deadline - guest_tsc);
-
-	if (lapic_timer_advance_dynamic)
-		adjust_lapic_timer_advance(vcpu, apic->lapic_timer.advance_expire_delta);
 }
 
 void kvm_wait_lapic_expire(struct kvm_vcpu *vcpu)
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [PATCH v4 5/5] KVM: LAPIC: Narrow the timer latency between wait_lapic_expire and world switch
  2021-05-18 12:00 ` [PATCH v4 5/5] KVM: LAPIC: Narrow the timer latency between wait_lapic_expire and world switch Wanpeng Li
@ 2021-05-18 19:13   ` Sean Christopherson
  0 siblings, 0 replies; 11+ messages in thread
From: Sean Christopherson @ 2021-05-18 19:13 UTC (permalink / raw)
  To: Wanpeng Li
  Cc: linux-kernel, kvm, Paolo Bonzini, Vitaly Kuznetsov, Wanpeng Li,
	Jim Mattson, Joerg Roedel

On Tue, May 18, 2021, Wanpeng Li wrote:
> From: Wanpeng Li <wanpengli@tencent.com>
> 
> Let's treat lapic_timer_advance_ns automatic tuning logic as hypervisor
> overhead, move it before wait_lapic_expire instead of between wait_lapic_expire 
> and the world switch, the wait duration should be calculated by the 
> up-to-date guest_tsc after the overhead of automatic tuning logic. This 
> patch reduces ~30+ cycles for kvm-unit-tests/tscdeadline-latency when testing 
> busy waits.
> 
> Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
> ---

Reviewed-by: Sean Christopherson <seanjc@google.com>

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v4 2/5] KVM: X86: Bail out of direct yield in case of under-committed scenarios
  2021-05-18 12:00 ` [PATCH v4 2/5] KVM: X86: Bail out of direct yield in case of under-committed scenarios Wanpeng Li
@ 2021-05-18 19:21   ` Sean Christopherson
  2021-05-19  2:57     ` Wanpeng Li
  0 siblings, 1 reply; 11+ messages in thread
From: Sean Christopherson @ 2021-05-18 19:21 UTC (permalink / raw)
  To: Wanpeng Li
  Cc: linux-kernel, kvm, Paolo Bonzini, Vitaly Kuznetsov, Wanpeng Li,
	Jim Mattson, Joerg Roedel

On Tue, May 18, 2021, Wanpeng Li wrote:
> From: Wanpeng Li <wanpengli@tencent.com>
> 
> In case of under-committed scenarios, vCPU can get scheduling easily,
> kvm_vcpu_yield_to add extra overhead, we can observe a lot of race
> between vcpu->ready is true and yield fails due to p->state is
> TASK_RUNNING. Let's bail out in such scenarios by checking the length
> of current cpu runqueue, it can be treated as a hint of under-committed
> instead of guarantee of accuracy. The directed_yield_successful/attempted
> ratio can be improved from 50+% to 80+% in the under-committed scenario.

The "50+% to 80+%" comment will be a bit confusing for future readers now that
the single_task_running() case counts as an attempt.  I think the new comment
would be something like "30%+ of directed-yield attempts can avoid the expensive
lookups in kvm_sched_yield() in an under-committed scenario."  That would also
provide the real justification, as bumping the success ratio isn't the true goal
of this path.

> Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
> ---
> v2 -> v3:
>  * update patch description
> v1 -> v2:
>  * move the check after attempted counting
>  * update patch description
> 
>  arch/x86/kvm/x86.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 9b6bca616929..dfb7c320581f 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -8360,6 +8360,9 @@ static void kvm_sched_yield(struct kvm_vcpu *vcpu, unsigned long dest_id)
>  
>  	vcpu->stat.directed_yield_attempted++;
>  
> +	if (single_task_running())
> +		goto no_yield;
> +
>  	rcu_read_lock();
>  	map = rcu_dereference(vcpu->kvm->arch.apic_map);
>  
> -- 
> 2.25.1
> 

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v4 2/5] KVM: X86: Bail out of direct yield in case of under-committed scenarios
  2021-05-18 19:21   ` Sean Christopherson
@ 2021-05-19  2:57     ` Wanpeng Li
  2021-05-24 13:42       ` Paolo Bonzini
  0 siblings, 1 reply; 11+ messages in thread
From: Wanpeng Li @ 2021-05-19  2:57 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: LKML, kvm, Paolo Bonzini, Vitaly Kuznetsov, Wanpeng Li,
	Jim Mattson, Joerg Roedel

On Wed, 19 May 2021 at 03:21, Sean Christopherson <seanjc@google.com> wrote:
>
> On Tue, May 18, 2021, Wanpeng Li wrote:
> > From: Wanpeng Li <wanpengli@tencent.com>
> >
> > In case of under-committed scenarios, vCPU can get scheduling easily,
> > kvm_vcpu_yield_to add extra overhead, we can observe a lot of race
> > between vcpu->ready is true and yield fails due to p->state is
> > TASK_RUNNING. Let's bail out in such scenarios by checking the length
> > of current cpu runqueue, it can be treated as a hint of under-committed
> > instead of guarantee of accuracy. The directed_yield_successful/attempted
> > ratio can be improved from 50+% to 80+% in the under-committed scenario.
>
> The "50+% to 80+%" comment will be a bit confusing for future readers now that
> the single_task_running() case counts as an attempt.  I think the new comment
> would be something like "30%+ of directed-yield attempts can avoid the expensive
> lookups in kvm_sched_yield() in an under-committed scenario."  That would also
> provide the real justification, as bumping the success ratio isn't the true goal
> of this path.

Looks good. Hope Paolo can update the patch description when applying. :)

"In case of under-committed scenarios, vCPU can get scheduling easily,
kvm_vcpu_yield_to add extra overhead, we can observe a lot of races
between vcpu->ready is true and yield fails due to p->state is
TASK_RUNNING. Let's bail out in such scenarios by checking the length
of current cpu runqueue, it can be treated as a hint of under-committed
instead of guaranteeing accuracy. 30%+ of directed-yield attempts can
avoid the expensive lookups in kvm_sched_yield() in an under-committed
scenario. "

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v4 2/5] KVM: X86: Bail out of direct yield in case of under-committed scenarios
  2021-05-19  2:57     ` Wanpeng Li
@ 2021-05-24 13:42       ` Paolo Bonzini
  2021-05-24 13:47         ` Wanpeng Li
  0 siblings, 1 reply; 11+ messages in thread
From: Paolo Bonzini @ 2021-05-24 13:42 UTC (permalink / raw)
  To: Wanpeng Li, Sean Christopherson
  Cc: LKML, kvm, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel

On 19/05/21 04:57, Wanpeng Li wrote:
> Looks good. Hope Paolo can update the patch description when applying.:)
> 
> "In case of under-committed scenarios, vCPU can get scheduling easily,
> kvm_vcpu_yield_to add extra overhead, we can observe a lot of races
> between vcpu->ready is true and yield fails due to p->state is
> TASK_RUNNING. Let's bail out in such scenarios by checking the length
> of current cpu runqueue, it can be treated as a hint of under-committed
> instead of guaranteeing accuracy. 30%+ of directed-yield attempts can
> avoid the expensive lookups in kvm_sched_yield() in an under-committed
> scenario. "
> 

Here is what I used:

     In case of under-committed scenarios, vCPUs can be scheduled easily;
     kvm_vcpu_yield_to adds extra overhead, and it is also common to see
     when vcpu->ready is true but yield later failing due to p->state is
     TASK_RUNNING.
     
     Let's bail out in such scenarios by checking the length of current cpu
     runqueue, which can be treated as a hint of under-committed instead of
     guarantee of accuracy. 30%+ of directed-yield attempts can now avoid
     the expensive lookups in kvm_sched_yield() in an under-committed scenario.

Paolo


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v4 1/5] KVM: exit halt polling on need_resched() for both book3s and generic halt-polling
  2021-05-18 12:00 [PATCH v4 1/5] KVM: exit halt polling on need_resched() for both book3s and generic halt-polling Wanpeng Li
                   ` (3 preceding siblings ...)
  2021-05-18 12:00 ` [PATCH v4 5/5] KVM: LAPIC: Narrow the timer latency between wait_lapic_expire and world switch Wanpeng Li
@ 2021-05-24 13:46 ` Paolo Bonzini
  4 siblings, 0 replies; 11+ messages in thread
From: Paolo Bonzini @ 2021-05-24 13:46 UTC (permalink / raw)
  To: Wanpeng Li, linux-kernel, kvm
  Cc: Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, Ben Segall, Venkatesh Srinivas, David Matlack,
	Paul Mackerras, Suraj Jitindar Singh

On 18/05/21 14:00, Wanpeng Li wrote:
> From: Wanpeng Li <wanpengli@tencent.com>
> 
> Inspired by commit 262de4102c7bb8 (kvm: exit halt polling on need_resched()
> as well), CFS_BANDWIDTH throttling will use resched_task() when there is just
> one task to get the task to block. It was likely allowing VMs to overrun their
> quota when halt polling. Due to PPC implements an arch specific halt polling
> logic, we should add the need_resched() checking there as well. This
> patch adds a helper function that to be shared between book3s and generic
> halt-polling loop.
> 
> Reviewed-by: David Matlack <dmatlack@google.com>
> Reviewed-by: Venkatesh Srinivas <venkateshs@chromium.org>
> Cc: Ben Segall <bsegall@google.com>
> Cc: Venkatesh Srinivas <venkateshs@chromium.org>
> Cc: Jim Mattson <jmattson@google.com>
> Cc: David Matlack <dmatlack@google.com>
> Cc: Paul Mackerras <paulus@ozlabs.org>
> Cc: Suraj Jitindar Singh <sjitindarsingh@gmail.com>
> Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
> ---
> v3 -> v4:
>   * rename to kvm_vcpu_can_poll
> v2 -> v3:
>   * add a helper function
> v1 -> v2:
>   * update patch description
> 
>   arch/powerpc/kvm/book3s_hv.c | 2 +-
>   include/linux/kvm_host.h     | 2 ++
>   virt/kvm/kvm_main.c          | 8 ++++++--
>   3 files changed, 9 insertions(+), 3 deletions(-)
> 
> diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
> index 28a80d240b76..7360350e66ff 100644
> --- a/arch/powerpc/kvm/book3s_hv.c
> +++ b/arch/powerpc/kvm/book3s_hv.c
> @@ -3936,7 +3936,7 @@ static void kvmppc_vcore_blocked(struct kvmppc_vcore *vc)
>   				break;
>   			}
>   			cur = ktime_get();
> -		} while (single_task_running() && ktime_before(cur, stop));
> +		} while (kvm_vcpu_can_poll(cur, stop));
>   
>   		spin_lock(&vc->lock);
>   		vc->vcore_state = VCORE_INACTIVE;
> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> index 2f34487e21f2..ba682f738a25 100644
> --- a/include/linux/kvm_host.h
> +++ b/include/linux/kvm_host.h
> @@ -1583,4 +1583,6 @@ static inline void kvm_handle_signal_exit(struct kvm_vcpu *vcpu)
>   /* Max number of entries allowed for each kvm dirty ring */
>   #define  KVM_DIRTY_RING_MAX_ENTRIES  65536
>   
> +bool kvm_vcpu_can_poll(ktime_t cur, ktime_t stop);
> +
>   #endif
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index 6b4feb92dc79..62522c12beba 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -2945,6 +2945,11 @@ update_halt_poll_stats(struct kvm_vcpu *vcpu, u64 poll_ns, bool waited)
>   		vcpu->stat.halt_poll_success_ns += poll_ns;
>   }
>   
> +bool kvm_vcpu_can_poll(ktime_t cur, ktime_t stop)
> +{
> +	return single_task_running() && !need_resched() && ktime_before(cur, stop);
> +}
> +
>   /*
>    * The vCPU has executed a HLT instruction with in-kernel mode enabled.
>    */
> @@ -2973,8 +2978,7 @@ void kvm_vcpu_block(struct kvm_vcpu *vcpu)
>   				goto out;
>   			}
>   			poll_end = cur = ktime_get();
> -		} while (single_task_running() && !need_resched() &&
> -			 ktime_before(cur, stop));
> +		} while (kvm_vcpu_can_poll(cur, stop));
>   	}
>   
>   	prepare_to_rcuwait(&vcpu->wait);
> 

Queued all five, thanks.

Paolo


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v4 2/5] KVM: X86: Bail out of direct yield in case of under-committed scenarios
  2021-05-24 13:42       ` Paolo Bonzini
@ 2021-05-24 13:47         ` Wanpeng Li
  0 siblings, 0 replies; 11+ messages in thread
From: Wanpeng Li @ 2021-05-24 13:47 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Sean Christopherson, LKML, kvm, Vitaly Kuznetsov, Wanpeng Li,
	Jim Mattson, Joerg Roedel

On Mon, 24 May 2021 at 21:42, Paolo Bonzini <pbonzini@redhat.com> wrote:
>
> On 19/05/21 04:57, Wanpeng Li wrote:
> > Looks good. Hope Paolo can update the patch description when applying.:)
> >
> > "In case of under-committed scenarios, vCPU can get scheduling easily,
> > kvm_vcpu_yield_to add extra overhead, we can observe a lot of races
> > between vcpu->ready is true and yield fails due to p->state is
> > TASK_RUNNING. Let's bail out in such scenarios by checking the length
> > of current cpu runqueue, it can be treated as a hint of under-committed
> > instead of guaranteeing accuracy. 30%+ of directed-yield attempts can
> > avoid the expensive lookups in kvm_sched_yield() in an under-committed
> > scenario. "
> >
>
> Here is what I used:
>
>      In case of under-committed scenarios, vCPUs can be scheduled easily;
>      kvm_vcpu_yield_to adds extra overhead, and it is also common to see
>      when vcpu->ready is true but yield later failing due to p->state is
>      TASK_RUNNING.
>
>      Let's bail out in such scenarios by checking the length of current cpu
>      runqueue, which can be treated as a hint of under-committed instead of
>      guarantee of accuracy. 30%+ of directed-yield attempts can now avoid
>      the expensive lookups in kvm_sched_yield() in an under-committed scenario.

Thanks Paolo! :)
    Wanpeng

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2021-05-24 13:47 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-05-18 12:00 [PATCH v4 1/5] KVM: exit halt polling on need_resched() for both book3s and generic halt-polling Wanpeng Li
2021-05-18 12:00 ` [PATCH v4 2/5] KVM: X86: Bail out of direct yield in case of under-committed scenarios Wanpeng Li
2021-05-18 19:21   ` Sean Christopherson
2021-05-19  2:57     ` Wanpeng Li
2021-05-24 13:42       ` Paolo Bonzini
2021-05-24 13:47         ` Wanpeng Li
2021-05-18 12:00 ` [PATCH v4 3/5] KVM: X86: Fix vCPU preempted state from guest's point of view Wanpeng Li
2021-05-18 12:00 ` [PATCH v4 4/5] KVM: X86: hyper-v: Task srcu lock when accessing kvm_memslots() Wanpeng Li
2021-05-18 12:00 ` [PATCH v4 5/5] KVM: LAPIC: Narrow the timer latency between wait_lapic_expire and world switch Wanpeng Li
2021-05-18 19:13   ` Sean Christopherson
2021-05-24 13:46 ` [PATCH v4 1/5] KVM: exit halt polling on need_resched() for both book3s and generic halt-polling Paolo Bonzini

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.