kvm.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v3 1/5] KVM: exit halt polling on need_resched() for both book3s and generic halt-polling
@ 2021-05-17 14:00 Wanpeng Li
  2021-05-17 14:00 ` [PATCH v3 2/5] KVM: X86: Bail out of direct yield in case of under-committed scenarios Wanpeng Li
                   ` (5 more replies)
  0 siblings, 6 replies; 10+ messages in thread
From: Wanpeng Li @ 2021-05-17 14:00 UTC (permalink / raw)
  To: linux-kernel, kvm
  Cc: Paolo Bonzini, Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li,
	Jim Mattson, Joerg Roedel, Ben Segall, Venkatesh Srinivas,
	David Matlack, Paul Mackerras, Suraj Jitindar Singh

From: Wanpeng Li <wanpengli@tencent.com>

Inspired by commit 262de4102c7bb8 (kvm: exit halt polling on need_resched()
as well), CFS_BANDWIDTH throttling will use resched_task() when there is just
one task to get the task to block. It was likely allowing VMs to overrun their
quota when halt polling. Due to PPC implements an arch specific halt polling
logic, we should add the need_resched() checking there as well. This
patch adds a helper function that to be shared between book3s and generic
halt-polling loop.

Cc: Ben Segall <bsegall@google.com>
Cc: Venkatesh Srinivas <venkateshs@chromium.org>
Cc: Jim Mattson <jmattson@google.com> 
Cc: David Matlack <dmatlack@google.com>
Cc: Paul Mackerras <paulus@ozlabs.org>
Cc: Suraj Jitindar Singh <sjitindarsingh@gmail.com>
Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
---
v2 -> v3:
 * add a helper function
v1 -> v2:
 * update patch description

 arch/powerpc/kvm/book3s_hv.c | 2 +-
 include/linux/kvm_host.h     | 2 ++
 virt/kvm/kvm_main.c          | 9 +++++++--
 3 files changed, 10 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 28a80d240b76..360165df345b 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -3936,7 +3936,7 @@ static void kvmppc_vcore_blocked(struct kvmppc_vcore *vc)
 				break;
 			}
 			cur = ktime_get();
-		} while (single_task_running() && ktime_before(cur, stop));
+		} while (kvm_vcpu_can_block(cur, stop));
 
 		spin_lock(&vc->lock);
 		vc->vcore_state = VCORE_INACTIVE;
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 2f34487e21f2..bf4fd60c4699 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -1583,4 +1583,6 @@ static inline void kvm_handle_signal_exit(struct kvm_vcpu *vcpu)
 /* Max number of entries allowed for each kvm dirty ring */
 #define  KVM_DIRTY_RING_MAX_ENTRIES  65536
 
+bool kvm_vcpu_can_block(ktime_t cur, ktime_t stop);
+
 #endif
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 6b4feb92dc79..c81080667fd1 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -2945,6 +2945,12 @@ update_halt_poll_stats(struct kvm_vcpu *vcpu, u64 poll_ns, bool waited)
 		vcpu->stat.halt_poll_success_ns += poll_ns;
 }
 
+
+bool kvm_vcpu_can_block(ktime_t cur, ktime_t stop)
+{
+	return single_task_running() && !need_resched() && ktime_before(cur, stop);
+}
+
 /*
  * The vCPU has executed a HLT instruction with in-kernel mode enabled.
  */
@@ -2973,8 +2979,7 @@ void kvm_vcpu_block(struct kvm_vcpu *vcpu)
 				goto out;
 			}
 			poll_end = cur = ktime_get();
-		} while (single_task_running() && !need_resched() &&
-			 ktime_before(cur, stop));
+		} while (kvm_vcpu_can_block(cur, stop));
 	}
 
 	prepare_to_rcuwait(&vcpu->wait);
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH v3 2/5] KVM: X86: Bail out of direct yield in case of under-committed scenarios
  2021-05-17 14:00 [PATCH v3 1/5] KVM: exit halt polling on need_resched() for both book3s and generic halt-polling Wanpeng Li
@ 2021-05-17 14:00 ` Wanpeng Li
  2021-05-17 14:00 ` [PATCH v3 3/5] KVM: X86: Fix vCPU preempted state from guest's point of view Wanpeng Li
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 10+ messages in thread
From: Wanpeng Li @ 2021-05-17 14:00 UTC (permalink / raw)
  To: linux-kernel, kvm
  Cc: Paolo Bonzini, Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li,
	Jim Mattson, Joerg Roedel

From: Wanpeng Li <wanpengli@tencent.com>

In case of under-committed scenarios, vCPU can get scheduling easily,
kvm_vcpu_yield_to add extra overhead, we can observe a lot of race
between vcpu->ready is true and yield fails due to p->state is
TASK_RUNNING. Let's bail out in such scenarios by checking the length
of current cpu runqueue, it can be treated as a hint of under-committed
instead of guarantee of accuracy. The directed_yield_successful/attempted 
ratio can be improved from 50+% to 80+% in the under-committed scenario. 

Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
---
v2 -> v3:
 * update patch description
v1 -> v2:
 * move the check after attempted counting
 * update patch description

 arch/x86/kvm/x86.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 9b6bca616929..dfb7c320581f 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -8360,6 +8360,9 @@ static void kvm_sched_yield(struct kvm_vcpu *vcpu, unsigned long dest_id)
 
 	vcpu->stat.directed_yield_attempted++;
 
+	if (single_task_running())
+		goto no_yield;
+
 	rcu_read_lock();
 	map = rcu_dereference(vcpu->kvm->arch.apic_map);
 
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH v3 3/5] KVM: X86: Fix vCPU preempted state from guest's point of view
  2021-05-17 14:00 [PATCH v3 1/5] KVM: exit halt polling on need_resched() for both book3s and generic halt-polling Wanpeng Li
  2021-05-17 14:00 ` [PATCH v3 2/5] KVM: X86: Bail out of direct yield in case of under-committed scenarios Wanpeng Li
@ 2021-05-17 14:00 ` Wanpeng Li
  2021-05-17 14:00 ` [PATCH v3 4/5] KVM: x86: hyper-v: Task srcu lock when accessing kvm_memslots() Wanpeng Li
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 10+ messages in thread
From: Wanpeng Li @ 2021-05-17 14:00 UTC (permalink / raw)
  To: linux-kernel, kvm
  Cc: Paolo Bonzini, Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li,
	Jim Mattson, Joerg Roedel, stable

From: Wanpeng Li <wanpengli@tencent.com>

Commit 66570e966dd9 (kvm: x86: only provide PV features if enabled in guest's
CPUID) avoids to access pv tlb shootdown host side logic when this pv feature
is not exposed to guest, however, kvm_steal_time.preempted not only leveraged
by pv tlb shootdown logic but also mitigate the lock holder preemption issue.
From guest's point of view, vCPU is always preempted since we lose the reset
of kvm_steal_time.preempted before vmentry if pv tlb shootdown feature is not
exposed. This patch fixes it by clearing kvm_steal_time.preempted before
vmentry.

Fixes: 66570e966dd9 (kvm: x86: only provide PV features if enabled in guest's CPUID)
Reviewed-by: Sean Christopherson <seanjc@google.com>
Cc: stable@vger.kernel.org
Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
---
v1 -> v2:
 * add curly braces

 arch/x86/kvm/x86.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index dfb7c320581f..bed7b5348c0e 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -3105,6 +3105,8 @@ static void record_steal_time(struct kvm_vcpu *vcpu)
 				       st->preempted & KVM_VCPU_FLUSH_TLB);
 		if (xchg(&st->preempted, 0) & KVM_VCPU_FLUSH_TLB)
 			kvm_vcpu_flush_tlb_guest(vcpu);
+	} else {
+		st->preempted = 0;
 	}
 
 	vcpu->arch.st.preempted = 0;
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH v3 4/5] KVM: x86: hyper-v: Task srcu lock when accessing kvm_memslots()
  2021-05-17 14:00 [PATCH v3 1/5] KVM: exit halt polling on need_resched() for both book3s and generic halt-polling Wanpeng Li
  2021-05-17 14:00 ` [PATCH v3 2/5] KVM: X86: Bail out of direct yield in case of under-committed scenarios Wanpeng Li
  2021-05-17 14:00 ` [PATCH v3 3/5] KVM: X86: Fix vCPU preempted state from guest's point of view Wanpeng Li
@ 2021-05-17 14:00 ` Wanpeng Li
  2021-05-17 14:00 ` [PATCH v3 5/5] KVM: LAPIC: Narrow the timer latency between wait_lapic_expire and world switch Wanpeng Li
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 10+ messages in thread
From: Wanpeng Li @ 2021-05-17 14:00 UTC (permalink / raw)
  To: linux-kernel, kvm
  Cc: Paolo Bonzini, Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li,
	Jim Mattson, Joerg Roedel

From: Wanpeng Li <wanpengli@tencent.com>

   WARNING: suspicious RCU usage
   5.13.0-rc1 #4 Not tainted
   -----------------------------
   ./include/linux/kvm_host.h:710 suspicious rcu_dereference_check() usage!

  other info that might help us debug this:

  rcu_scheduler_active = 2, debug_locks = 1
   1 lock held by hyperv_clock/8318:
    #0: ffffb6b8cb05a7d8 (&hv->hv_lock){+.+.}-{3:3}, at: kvm_hv_invalidate_tsc_page+0x3e/0xa0 [kvm]

  stack backtrace:
  CPU: 3 PID: 8318 Comm: hyperv_clock Not tainted 5.13.0-rc1 #4
  Call Trace:
   dump_stack+0x87/0xb7
   lockdep_rcu_suspicious+0xce/0xf0
   kvm_write_guest_page+0x1c1/0x1d0 [kvm]
   kvm_write_guest+0x50/0x90 [kvm]
   kvm_hv_invalidate_tsc_page+0x79/0xa0 [kvm]
   kvm_gen_update_masterclock+0x1d/0x110 [kvm]
   kvm_arch_vm_ioctl+0x2a7/0xc50 [kvm]
   kvm_vm_ioctl+0x123/0x11d0 [kvm]
   __x64_sys_ioctl+0x3ed/0x9d0
   do_syscall_64+0x3d/0x80
   entry_SYSCALL_64_after_hwframe+0x44/0xae

kvm_memslots() will be called by kvm_write_guest(), so we should take the srcu lock.

Fixes: e880c6ea5 (KVM: x86: hyper-v: Prevent using not-yet-updated TSC page by secondary CPUs)
Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
---
 arch/x86/kvm/hyperv.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/arch/x86/kvm/hyperv.c b/arch/x86/kvm/hyperv.c
index f98370a39936..f00830e5202f 100644
--- a/arch/x86/kvm/hyperv.c
+++ b/arch/x86/kvm/hyperv.c
@@ -1172,6 +1172,7 @@ void kvm_hv_invalidate_tsc_page(struct kvm *kvm)
 {
 	struct kvm_hv *hv = to_kvm_hv(kvm);
 	u64 gfn;
+	int idx;
 
 	if (hv->hv_tsc_page_status == HV_TSC_PAGE_BROKEN ||
 	    hv->hv_tsc_page_status == HV_TSC_PAGE_UNSET ||
@@ -1190,9 +1191,16 @@ void kvm_hv_invalidate_tsc_page(struct kvm *kvm)
 	gfn = hv->hv_tsc_page >> HV_X64_MSR_TSC_REFERENCE_ADDRESS_SHIFT;
 
 	hv->tsc_ref.tsc_sequence = 0;
+
+	/*
+	 * Take the srcu lock as memslots will be accessed to check the gfn
+	 * cache generation against the memslots generation.
+	 */
+	idx = srcu_read_lock(&kvm->srcu);
 	if (kvm_write_guest(kvm, gfn_to_gpa(gfn),
 			    &hv->tsc_ref, sizeof(hv->tsc_ref.tsc_sequence)))
 		hv->hv_tsc_page_status = HV_TSC_PAGE_BROKEN;
+	srcu_read_unlock(&kvm->srcu, idx);
 
 out_unlock:
 	mutex_unlock(&hv->hv_lock);
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH v3 5/5] KVM: LAPIC: Narrow the timer latency between wait_lapic_expire and world switch
  2021-05-17 14:00 [PATCH v3 1/5] KVM: exit halt polling on need_resched() for both book3s and generic halt-polling Wanpeng Li
                   ` (2 preceding siblings ...)
  2021-05-17 14:00 ` [PATCH v3 4/5] KVM: x86: hyper-v: Task srcu lock when accessing kvm_memslots() Wanpeng Li
@ 2021-05-17 14:00 ` Wanpeng Li
  2021-05-17 17:51   ` Sean Christopherson
  2021-05-17 16:34 ` [PATCH v3 1/5] KVM: exit halt polling on need_resched() for both book3s and generic halt-polling David Matlack
  2021-05-17 18:13 ` Venkatesh Srinivas
  5 siblings, 1 reply; 10+ messages in thread
From: Wanpeng Li @ 2021-05-17 14:00 UTC (permalink / raw)
  To: linux-kernel, kvm
  Cc: Paolo Bonzini, Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li,
	Jim Mattson, Joerg Roedel

From: Wanpeng Li <wanpengli@tencent.com>

Let's treat lapic_timer_advance_ns automatically tune logic as hypervisor
overhead, move it before wait_lapic_expire instead of between wait_lapic_expire 
and the world switch, the wait duration should be calculated by the 
up-to-date guest_tsc after the overhead of automatically tune logic. This 
patch reduces ~30+ cycles for kvm-unit-tests/tscdeadline-latency when testing 
busy waits.

Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
---
 arch/x86/kvm/lapic.c | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index c0ebef560bd1..552d2acf89ab 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -1598,11 +1598,12 @@ static void __kvm_wait_lapic_expire(struct kvm_vcpu *vcpu)
 	guest_tsc = kvm_read_l1_tsc(vcpu, rdtsc());
 	apic->lapic_timer.advance_expire_delta = guest_tsc - tsc_deadline;
 
-	if (guest_tsc < tsc_deadline)
-		__wait_lapic_expire(vcpu, tsc_deadline - guest_tsc);
-
 	if (lapic_timer_advance_dynamic)
 		adjust_lapic_timer_advance(vcpu, apic->lapic_timer.advance_expire_delta);
+
+	guest_tsc = kvm_read_l1_tsc(vcpu, rdtsc());
+	if (guest_tsc < tsc_deadline)
+		__wait_lapic_expire(vcpu, tsc_deadline - guest_tsc);
 }
 
 void kvm_wait_lapic_expire(struct kvm_vcpu *vcpu)
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [PATCH v3 1/5] KVM: exit halt polling on need_resched() for both book3s and generic halt-polling
  2021-05-17 14:00 [PATCH v3 1/5] KVM: exit halt polling on need_resched() for both book3s and generic halt-polling Wanpeng Li
                   ` (3 preceding siblings ...)
  2021-05-17 14:00 ` [PATCH v3 5/5] KVM: LAPIC: Narrow the timer latency between wait_lapic_expire and world switch Wanpeng Li
@ 2021-05-17 16:34 ` David Matlack
  2021-05-18 12:03   ` Wanpeng Li
  2021-05-17 18:13 ` Venkatesh Srinivas
  5 siblings, 1 reply; 10+ messages in thread
From: David Matlack @ 2021-05-17 16:34 UTC (permalink / raw)
  To: Wanpeng Li
  Cc: linux-kernel, kvm list, Paolo Bonzini, Sean Christopherson,
	Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel,
	Ben Segall, Venkatesh Srinivas, Paul Mackerras,
	Suraj Jitindar Singh

On Mon, May 17, 2021 at 7:01 AM Wanpeng Li <kernellwp@gmail.com> wrote:
>
> From: Wanpeng Li <wanpengli@tencent.com>
>
> Inspired by commit 262de4102c7bb8 (kvm: exit halt polling on need_resched()
> as well), CFS_BANDWIDTH throttling will use resched_task() when there is just
> one task to get the task to block. It was likely allowing VMs to overrun their
> quota when halt polling. Due to PPC implements an arch specific halt polling
> logic, we should add the need_resched() checking there as well. This
> patch adds a helper function that to be shared between book3s and generic
> halt-polling loop.
>
> Cc: Ben Segall <bsegall@google.com>
> Cc: Venkatesh Srinivas <venkateshs@chromium.org>
> Cc: Jim Mattson <jmattson@google.com>
> Cc: David Matlack <dmatlack@google.com>
> Cc: Paul Mackerras <paulus@ozlabs.org>
> Cc: Suraj Jitindar Singh <sjitindarsingh@gmail.com>
> Signed-off-by: Wanpeng Li <wanpengli@tencent.com>

Reviewed-by: David Matlack <dmatlack@google.com>

> ---
> v2 -> v3:
>  * add a helper function
> v1 -> v2:
>  * update patch description
>
>  arch/powerpc/kvm/book3s_hv.c | 2 +-
>  include/linux/kvm_host.h     | 2 ++
>  virt/kvm/kvm_main.c          | 9 +++++++--
>  3 files changed, 10 insertions(+), 3 deletions(-)
>
> diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
> index 28a80d240b76..360165df345b 100644
> --- a/arch/powerpc/kvm/book3s_hv.c
> +++ b/arch/powerpc/kvm/book3s_hv.c
> @@ -3936,7 +3936,7 @@ static void kvmppc_vcore_blocked(struct kvmppc_vcore *vc)
>                                 break;
>                         }
>                         cur = ktime_get();
> -               } while (single_task_running() && ktime_before(cur, stop));
> +               } while (kvm_vcpu_can_block(cur, stop));
>
>                 spin_lock(&vc->lock);
>                 vc->vcore_state = VCORE_INACTIVE;
> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> index 2f34487e21f2..bf4fd60c4699 100644
> --- a/include/linux/kvm_host.h
> +++ b/include/linux/kvm_host.h
> @@ -1583,4 +1583,6 @@ static inline void kvm_handle_signal_exit(struct kvm_vcpu *vcpu)
>  /* Max number of entries allowed for each kvm dirty ring */
>  #define  KVM_DIRTY_RING_MAX_ENTRIES  65536
>
> +bool kvm_vcpu_can_block(ktime_t cur, ktime_t stop);
> +
>  #endif
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index 6b4feb92dc79..c81080667fd1 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -2945,6 +2945,12 @@ update_halt_poll_stats(struct kvm_vcpu *vcpu, u64 poll_ns, bool waited)
>                 vcpu->stat.halt_poll_success_ns += poll_ns;
>  }
>
> +
> +bool kvm_vcpu_can_block(ktime_t cur, ktime_t stop)

nit: kvm_vcpu_can_poll() would be a more accurate name for this function.

> +{
> +       return single_task_running() && !need_resched() && ktime_before(cur, stop);
> +}
> +
>  /*
>   * The vCPU has executed a HLT instruction with in-kernel mode enabled.
>   */
> @@ -2973,8 +2979,7 @@ void kvm_vcpu_block(struct kvm_vcpu *vcpu)
>                                 goto out;
>                         }
>                         poll_end = cur = ktime_get();
> -               } while (single_task_running() && !need_resched() &&
> -                        ktime_before(cur, stop));
> +               } while (kvm_vcpu_can_block(cur, stop));
>         }
>
>         prepare_to_rcuwait(&vcpu->wait);
> --
> 2.25.1
>

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH v3 5/5] KVM: LAPIC: Narrow the timer latency between wait_lapic_expire and world switch
  2021-05-17 14:00 ` [PATCH v3 5/5] KVM: LAPIC: Narrow the timer latency between wait_lapic_expire and world switch Wanpeng Li
@ 2021-05-17 17:51   ` Sean Christopherson
  2021-05-18 12:03     ` Wanpeng Li
  0 siblings, 1 reply; 10+ messages in thread
From: Sean Christopherson @ 2021-05-17 17:51 UTC (permalink / raw)
  To: Wanpeng Li
  Cc: linux-kernel, kvm, Paolo Bonzini, Vitaly Kuznetsov, Wanpeng Li,
	Jim Mattson, Joerg Roedel

On Mon, May 17, 2021, Wanpeng Li wrote:
> From: Wanpeng Li <wanpengli@tencent.com>
> 
> Let's treat lapic_timer_advance_ns automatically tune logic as hypervisor
> overhead, move it before wait_lapic_expire instead of between wait_lapic_expire 
> and the world switch, the wait duration should be calculated by the 
> up-to-date guest_tsc after the overhead of automatically tune logic. This 
> patch reduces ~30+ cycles for kvm-unit-tests/tscdeadline-latency when testing 
> busy waits.
> 
> Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
> ---
>  arch/x86/kvm/lapic.c | 7 ++++---
>  1 file changed, 4 insertions(+), 3 deletions(-)
> 
> diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
> index c0ebef560bd1..552d2acf89ab 100644
> --- a/arch/x86/kvm/lapic.c
> +++ b/arch/x86/kvm/lapic.c
> @@ -1598,11 +1598,12 @@ static void __kvm_wait_lapic_expire(struct kvm_vcpu *vcpu)
>  	guest_tsc = kvm_read_l1_tsc(vcpu, rdtsc());
>  	apic->lapic_timer.advance_expire_delta = guest_tsc - tsc_deadline;
>  
> -	if (guest_tsc < tsc_deadline)
> -		__wait_lapic_expire(vcpu, tsc_deadline - guest_tsc);
> -
>  	if (lapic_timer_advance_dynamic)
>  		adjust_lapic_timer_advance(vcpu, apic->lapic_timer.advance_expire_delta);
> +
> +	guest_tsc = kvm_read_l1_tsc(vcpu, rdtsc());

This is redundant and unnecessary if automatic tuning is disabled, or if the
timer did not arrive early.  A comment would also be helpful.  E.g. I think this
would micro-optimize all paths:

diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index c0ebef560bd1..5d91f2367c31 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -1598,11 +1598,19 @@ static void __kvm_wait_lapic_expire(struct kvm_vcpu *vcpu)
        guest_tsc = kvm_read_l1_tsc(vcpu, rdtsc());
        apic->lapic_timer.advance_expire_delta = guest_tsc - tsc_deadline;

+       if (lapic_timer_advance_dynamic) {
+               adjust_lapic_timer_advance(vcpu, apic->lapic_timer.advance_expire_delta);
+               /*
+                * If the timer fired early, reread the TSC to account for the
+                * overhead of the above adjustment to avoid waiting longer
+                * than is necessary.
+                */
+               if (guest_tsc < tsc_deadline)
+                       guest_tsc = kvm_read_l1_tsc(vcpu, rdtsc());
+       }
+
        if (guest_tsc < tsc_deadline)
                __wait_lapic_expire(vcpu, tsc_deadline - guest_tsc);
-
-       if (lapic_timer_advance_dynamic)
-               adjust_lapic_timer_advance(vcpu, apic->lapic_timer.advance_expire_delta);
 }

 void kvm_wait_lapic_expire(struct kvm_vcpu *vcpu)

> +	if (guest_tsc < tsc_deadline)
> +		__wait_lapic_expire(vcpu, tsc_deadline - guest_tsc);
>  }
>  
>  void kvm_wait_lapic_expire(struct kvm_vcpu *vcpu)
> -- 
> 2.25.1
> 

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [PATCH v3 1/5] KVM: exit halt polling on need_resched() for both book3s and generic halt-polling
  2021-05-17 14:00 [PATCH v3 1/5] KVM: exit halt polling on need_resched() for both book3s and generic halt-polling Wanpeng Li
                   ` (4 preceding siblings ...)
  2021-05-17 16:34 ` [PATCH v3 1/5] KVM: exit halt polling on need_resched() for both book3s and generic halt-polling David Matlack
@ 2021-05-17 18:13 ` Venkatesh Srinivas
  5 siblings, 0 replies; 10+ messages in thread
From: Venkatesh Srinivas @ 2021-05-17 18:13 UTC (permalink / raw)
  To: Wanpeng Li
  Cc: linux-kernel, kvm, Paolo Bonzini, Sean Christopherson,
	Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel,
	Ben Segall, David Matlack, Paul Mackerras, Suraj Jitindar Singh

On Mon, May 17, 2021 at 7:01 AM Wanpeng Li <kernellwp@gmail.com> wrote:
>
> From: Wanpeng Li <wanpengli@tencent.com>
>
> Inspired by commit 262de4102c7bb8 (kvm: exit halt polling on need_resched()
> as well), CFS_BANDWIDTH throttling will use resched_task() when there is just
> one task to get the task to block. It was likely allowing VMs to overrun their
> quota when halt polling. Due to PPC implements an arch specific halt polling
> logic, we should add the need_resched() checking there as well. This
> patch adds a helper function that to be shared between book3s and generic
> halt-polling loop.
>
> Cc: Ben Segall <bsegall@google.com>
> Cc: Venkatesh Srinivas <venkateshs@chromium.org>
> Cc: Jim Mattson <jmattson@google.com>
> Cc: David Matlack <dmatlack@google.com>
> Cc: Paul Mackerras <paulus@ozlabs.org>
> Cc: Suraj Jitindar Singh <sjitindarsingh@gmail.com>
> Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
> ---
> v2 -> v3:
>  * add a helper function
> v1 -> v2:
>  * update patch description
>
>  arch/powerpc/kvm/book3s_hv.c | 2 +-
>  include/linux/kvm_host.h     | 2 ++
>  virt/kvm/kvm_main.c          | 9 +++++++--
>  3 files changed, 10 insertions(+), 3 deletions(-)
>
> diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
> index 28a80d240b76..360165df345b 100644
> --- a/arch/powerpc/kvm/book3s_hv.c
> +++ b/arch/powerpc/kvm/book3s_hv.c
> @@ -3936,7 +3936,7 @@ static void kvmppc_vcore_blocked(struct kvmppc_vcore *vc)
>                                 break;
>                         }
>                         cur = ktime_get();
> -               } while (single_task_running() && ktime_before(cur, stop));
> +               } while (kvm_vcpu_can_block(cur, stop));
>
>                 spin_lock(&vc->lock);
>                 vc->vcore_state = VCORE_INACTIVE;
> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> index 2f34487e21f2..bf4fd60c4699 100644
> --- a/include/linux/kvm_host.h
> +++ b/include/linux/kvm_host.h
> @@ -1583,4 +1583,6 @@ static inline void kvm_handle_signal_exit(struct kvm_vcpu *vcpu)
>  /* Max number of entries allowed for each kvm dirty ring */
>  #define  KVM_DIRTY_RING_MAX_ENTRIES  65536
>
> +bool kvm_vcpu_can_block(ktime_t cur, ktime_t stop);
> +
>  #endif
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index 6b4feb92dc79..c81080667fd1 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -2945,6 +2945,12 @@ update_halt_poll_stats(struct kvm_vcpu *vcpu, u64 poll_ns, bool waited)
>                 vcpu->stat.halt_poll_success_ns += poll_ns;
>  }
>
> +
> +bool kvm_vcpu_can_block(ktime_t cur, ktime_t stop)
> +{
> +       return single_task_running() && !need_resched() && ktime_before(cur, stop);
> +}
> +
>  /*
>   * The vCPU has executed a HLT instruction with in-kernel mode enabled.
>   */
> @@ -2973,8 +2979,7 @@ void kvm_vcpu_block(struct kvm_vcpu *vcpu)
>                                 goto out;
>                         }
>                         poll_end = cur = ktime_get();
> -               } while (single_task_running() && !need_resched() &&
> -                        ktime_before(cur, stop));
> +               } while (kvm_vcpu_can_block(cur, stop));
>         }
>
>         prepare_to_rcuwait(&vcpu->wait);
> --
> 2.25.1
>

Reviewed-by: Venkatesh Srinivas <venkateshs@chromium.org>

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH v3 1/5] KVM: exit halt polling on need_resched() for both book3s and generic halt-polling
  2021-05-17 16:34 ` [PATCH v3 1/5] KVM: exit halt polling on need_resched() for both book3s and generic halt-polling David Matlack
@ 2021-05-18 12:03   ` Wanpeng Li
  0 siblings, 0 replies; 10+ messages in thread
From: Wanpeng Li @ 2021-05-18 12:03 UTC (permalink / raw)
  To: David Matlack
  Cc: LKML, kvm list, Paolo Bonzini, Sean Christopherson,
	Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel,
	Ben Segall, Venkatesh Srinivas, Paul Mackerras,
	Suraj Jitindar Singh

On Tue, 18 May 2021 at 00:35, David Matlack <dmatlack@google.com> wrote:
>
> On Mon, May 17, 2021 at 7:01 AM Wanpeng Li <kernellwp@gmail.com> wrote:
> >
> > From: Wanpeng Li <wanpengli@tencent.com>
> >
> > Inspired by commit 262de4102c7bb8 (kvm: exit halt polling on need_resched()
> > as well), CFS_BANDWIDTH throttling will use resched_task() when there is just
> > one task to get the task to block. It was likely allowing VMs to overrun their
> > quota when halt polling. Due to PPC implements an arch specific halt polling
> > logic, we should add the need_resched() checking there as well. This
> > patch adds a helper function that to be shared between book3s and generic
> > halt-polling loop.
> >
> > Cc: Ben Segall <bsegall@google.com>
> > Cc: Venkatesh Srinivas <venkateshs@chromium.org>
> > Cc: Jim Mattson <jmattson@google.com>
> > Cc: David Matlack <dmatlack@google.com>
> > Cc: Paul Mackerras <paulus@ozlabs.org>
> > Cc: Suraj Jitindar Singh <sjitindarsingh@gmail.com>
> > Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
>
> Reviewed-by: David Matlack <dmatlack@google.com>
>
> > ---
> > v2 -> v3:
> >  * add a helper function
> > v1 -> v2:
> >  * update patch description
> >
> >  arch/powerpc/kvm/book3s_hv.c | 2 +-
> >  include/linux/kvm_host.h     | 2 ++
> >  virt/kvm/kvm_main.c          | 9 +++++++--
> >  3 files changed, 10 insertions(+), 3 deletions(-)
> >
> > diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
> > index 28a80d240b76..360165df345b 100644
> > --- a/arch/powerpc/kvm/book3s_hv.c
> > +++ b/arch/powerpc/kvm/book3s_hv.c
> > @@ -3936,7 +3936,7 @@ static void kvmppc_vcore_blocked(struct kvmppc_vcore *vc)
> >                                 break;
> >                         }
> >                         cur = ktime_get();
> > -               } while (single_task_running() && ktime_before(cur, stop));
> > +               } while (kvm_vcpu_can_block(cur, stop));
> >
> >                 spin_lock(&vc->lock);
> >                 vc->vcore_state = VCORE_INACTIVE;
> > diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> > index 2f34487e21f2..bf4fd60c4699 100644
> > --- a/include/linux/kvm_host.h
> > +++ b/include/linux/kvm_host.h
> > @@ -1583,4 +1583,6 @@ static inline void kvm_handle_signal_exit(struct kvm_vcpu *vcpu)
> >  /* Max number of entries allowed for each kvm dirty ring */
> >  #define  KVM_DIRTY_RING_MAX_ENTRIES  65536
> >
> > +bool kvm_vcpu_can_block(ktime_t cur, ktime_t stop);
> > +
> >  #endif
> > diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> > index 6b4feb92dc79..c81080667fd1 100644
> > --- a/virt/kvm/kvm_main.c
> > +++ b/virt/kvm/kvm_main.c
> > @@ -2945,6 +2945,12 @@ update_halt_poll_stats(struct kvm_vcpu *vcpu, u64 poll_ns, bool waited)
> >                 vcpu->stat.halt_poll_success_ns += poll_ns;
> >  }
> >
> > +
> > +bool kvm_vcpu_can_block(ktime_t cur, ktime_t stop)
>
> nit: kvm_vcpu_can_poll() would be a more accurate name for this function.

Do it in v4. :)

    Wanpeng

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH v3 5/5] KVM: LAPIC: Narrow the timer latency between wait_lapic_expire and world switch
  2021-05-17 17:51   ` Sean Christopherson
@ 2021-05-18 12:03     ` Wanpeng Li
  0 siblings, 0 replies; 10+ messages in thread
From: Wanpeng Li @ 2021-05-18 12:03 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: LKML, kvm, Paolo Bonzini, Vitaly Kuznetsov, Wanpeng Li,
	Jim Mattson, Joerg Roedel

On Tue, 18 May 2021 at 01:51, Sean Christopherson <seanjc@google.com> wrote:
>
> On Mon, May 17, 2021, Wanpeng Li wrote:
> > From: Wanpeng Li <wanpengli@tencent.com>
> >
> > Let's treat lapic_timer_advance_ns automatically tune logic as hypervisor
> > overhead, move it before wait_lapic_expire instead of between wait_lapic_expire
> > and the world switch, the wait duration should be calculated by the
> > up-to-date guest_tsc after the overhead of automatically tune logic. This
> > patch reduces ~30+ cycles for kvm-unit-tests/tscdeadline-latency when testing
> > busy waits.
> >
> > Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
> > ---
> >  arch/x86/kvm/lapic.c | 7 ++++---
> >  1 file changed, 4 insertions(+), 3 deletions(-)
> >
> > diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
> > index c0ebef560bd1..552d2acf89ab 100644
> > --- a/arch/x86/kvm/lapic.c
> > +++ b/arch/x86/kvm/lapic.c
> > @@ -1598,11 +1598,12 @@ static void __kvm_wait_lapic_expire(struct kvm_vcpu *vcpu)
> >       guest_tsc = kvm_read_l1_tsc(vcpu, rdtsc());
> >       apic->lapic_timer.advance_expire_delta = guest_tsc - tsc_deadline;
> >
> > -     if (guest_tsc < tsc_deadline)
> > -             __wait_lapic_expire(vcpu, tsc_deadline - guest_tsc);
> > -
> >       if (lapic_timer_advance_dynamic)
> >               adjust_lapic_timer_advance(vcpu, apic->lapic_timer.advance_expire_delta);
> > +
> > +     guest_tsc = kvm_read_l1_tsc(vcpu, rdtsc());
>
> This is redundant and unnecessary if automatic tuning is disabled, or if the
> timer did not arrive early.  A comment would also be helpful.  E.g. I think this
> would micro-optimize all paths:

Do it in v4, thanks.

    Wanpeng

>
> diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
> index c0ebef560bd1..5d91f2367c31 100644
> --- a/arch/x86/kvm/lapic.c
> +++ b/arch/x86/kvm/lapic.c
> @@ -1598,11 +1598,19 @@ static void __kvm_wait_lapic_expire(struct kvm_vcpu *vcpu)
>         guest_tsc = kvm_read_l1_tsc(vcpu, rdtsc());
>         apic->lapic_timer.advance_expire_delta = guest_tsc - tsc_deadline;
>
> +       if (lapic_timer_advance_dynamic) {
> +               adjust_lapic_timer_advance(vcpu, apic->lapic_timer.advance_expire_delta);
> +               /*
> +                * If the timer fired early, reread the TSC to account for the
> +                * overhead of the above adjustment to avoid waiting longer
> +                * than is necessary.
> +                */
> +               if (guest_tsc < tsc_deadline)
> +                       guest_tsc = kvm_read_l1_tsc(vcpu, rdtsc());
> +       }
> +
>         if (guest_tsc < tsc_deadline)
>                 __wait_lapic_expire(vcpu, tsc_deadline - guest_tsc);
> -
> -       if (lapic_timer_advance_dynamic)
> -               adjust_lapic_timer_advance(vcpu, apic->lapic_timer.advance_expire_delta);
>  }
>
>  void kvm_wait_lapic_expire(struct kvm_vcpu *vcpu)
>
> > +     if (guest_tsc < tsc_deadline)
> > +             __wait_lapic_expire(vcpu, tsc_deadline - guest_tsc);
> >  }
> >
> >  void kvm_wait_lapic_expire(struct kvm_vcpu *vcpu)
> > --
> > 2.25.1
> >

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2021-05-18 12:04 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-05-17 14:00 [PATCH v3 1/5] KVM: exit halt polling on need_resched() for both book3s and generic halt-polling Wanpeng Li
2021-05-17 14:00 ` [PATCH v3 2/5] KVM: X86: Bail out of direct yield in case of under-committed scenarios Wanpeng Li
2021-05-17 14:00 ` [PATCH v3 3/5] KVM: X86: Fix vCPU preempted state from guest's point of view Wanpeng Li
2021-05-17 14:00 ` [PATCH v3 4/5] KVM: x86: hyper-v: Task srcu lock when accessing kvm_memslots() Wanpeng Li
2021-05-17 14:00 ` [PATCH v3 5/5] KVM: LAPIC: Narrow the timer latency between wait_lapic_expire and world switch Wanpeng Li
2021-05-17 17:51   ` Sean Christopherson
2021-05-18 12:03     ` Wanpeng Li
2021-05-17 16:34 ` [PATCH v3 1/5] KVM: exit halt polling on need_resched() for both book3s and generic halt-polling David Matlack
2021-05-18 12:03   ` Wanpeng Li
2021-05-17 18:13 ` Venkatesh Srinivas

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).