linux-kernel.vger.kernel.org archive mirror
* [PATCH RESEND 0/3] KVM: Paravirt remote TLB flush
@ 2017-11-09  2:02 Wanpeng Li
  2017-11-09  2:02 ` [PATCH RESEND 1/3] KVM: Add vCPU running/preempted state Wanpeng Li
                   ` (2 more replies)
  0 siblings, 3 replies; 11+ messages in thread
From: Wanpeng Li @ 2017-11-09  2:02 UTC (permalink / raw)
  To: linux-kernel, kvm; +Cc: Paolo Bonzini, Radim Krčmář, Wanpeng Li

The remote TLB flush APIs busy-wait, which is fine on bare metal. Within
the guest, however, the target vCPUs may have been preempted or blocked,
in which case the initiating vCPU ends up busy-waiting for a long time.

This patch set implements paravirtual TLB flushing that does not wait
for vCPUs that are not running; those vCPUs instead flush their TLB on
the next guest entry. The idea was discussed here:
https://lkml.org/lkml/2012/2/20/157
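
Roughly, the guest and host cooperate through two bits in the steal-time
'preempted' field; a simplified sketch of what the patches below implement
(the helper names here are placeholders, the real code is in the patches):

	/* Guest side, in flush_tlb_others(): */
	if (src->preempted & KVM_VCPU_PREEMPTED) {
		/* vCPU is not running: mark it flush-on-enter instead
		 * of sending an IPI and busy-waiting for it */
		set_flush_on_enter_bit(src);	/* atomic RMW, placeholder */
		cpumask_clear_cpu(cpu, &flushmask);
	}

	/* Host side, before the vCPU re-enters the guest: */
	if (steal->preempted & KVM_VCPU_SHOULD_FLUSH)
		flush_guest_tlb(vcpu);		/* placeholder */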

The best results are achieved when the host is overcommitted, i.e.
multiple vCPUs run on each pCPU. In that case the PV TLB flush avoids
touching vCPUs that are not scheduled and avoids the wait on the
initiating CPU.

In addition, commit 9e52fc2b50d ("x86/mm: Enable RCU based page table 
freeing (CONFIG_HAVE_RCU_TABLE_FREE=y)") makes this safe: page tables 
are freed only after an RCU grace period, so skipping the synchronous 
flush IPI for a preempted vCPU cannot leave it walking already-freed 
page tables.

Tested on a Haswell i7 desktop, 4 cores (2 HT), i.e. 8 pCPUs, running 
ebizzy in one Linux guest.

ebizzy -M 
              vanilla    optimized     boost
 8 vCPUs       10152       10083       -0.68% 
16 vCPUs        1224        4866       297.5% 
24 vCPUs        1109        3871       249%
32 vCPUs        1025        3375       229.3% 

Wanpeng Li (3):
  KVM: Add vCPU running/preempted state
  KVM: Add paravirt remote TLB flush
  KVM: Add flush_on_enter before guest enter

 arch/x86/include/uapi/asm/kvm_para.h |  4 ++++
 arch/x86/kernel/kvm.c                | 31 ++++++++++++++++++++++++++++++-
 arch/x86/kvm/x86.c                   | 12 ++++++++++--
 3 files changed, 44 insertions(+), 3 deletions(-)

-- 
2.7.4


* [PATCH RESEND 1/3] KVM: Add vCPU running/preempted state
  2017-11-09  2:02 [PATCH RESEND 0/3] KVM: Paravirt remote TLB flush Wanpeng Li
@ 2017-11-09  2:02 ` Wanpeng Li
  2017-11-09  2:02 ` [PATCH RESEND 2/3] KVM: Add paravirt remote TLB flush Wanpeng Li
  2017-11-09  2:02 ` [PATCH RESEND 3/3] KVM: Add flush_on_enter before guest enter Wanpeng Li
  2 siblings, 0 replies; 11+ messages in thread
From: Wanpeng Li @ 2017-11-09  2:02 UTC (permalink / raw)
  To: linux-kernel, kvm; +Cc: Paolo Bonzini, Radim Krčmář, Wanpeng Li

From: Wanpeng Li <wanpeng.li@hotmail.com>

This patch reuses the preempted field in kvm_steal_time to export vCPU 
running/preempted information from the host to the guest. This enables 
the guest to send IPIs only to running vCPUs and to set a flag for 
preempted vCPUs, so that it does not wait for vCPUs that are not running.

Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
---
 arch/x86/include/uapi/asm/kvm_para.h | 3 +++
 arch/x86/kernel/kvm.c                | 2 +-
 arch/x86/kvm/x86.c                   | 4 ++--
 3 files changed, 6 insertions(+), 3 deletions(-)

diff --git a/arch/x86/include/uapi/asm/kvm_para.h b/arch/x86/include/uapi/asm/kvm_para.h
index a965e5b0..ff23ce9 100644
--- a/arch/x86/include/uapi/asm/kvm_para.h
+++ b/arch/x86/include/uapi/asm/kvm_para.h
@@ -50,6 +50,9 @@ struct kvm_steal_time {
 	__u32 pad[11];
 };
 
+#define KVM_VCPU_NOT_PREEMPTED      (0 << 0)
+#define KVM_VCPU_PREEMPTED          (1 << 0)
+
 #define KVM_CLOCK_PAIRING_WALLCLOCK 0
 struct kvm_clock_pairing {
 	__s64 sec;
diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
index 8bb9594..1b1b641 100644
--- a/arch/x86/kernel/kvm.c
+++ b/arch/x86/kernel/kvm.c
@@ -608,7 +608,7 @@ __visible bool __kvm_vcpu_is_preempted(long cpu)
 {
 	struct kvm_steal_time *src = &per_cpu(steal_time, cpu);
 
-	return !!src->preempted;
+	return !!(src->preempted & KVM_VCPU_PREEMPTED);
 }
 PV_CALLEE_SAVE_REGS_THUNK(__kvm_vcpu_is_preempted);
 
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index d2507c6..1ea28a2 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -2116,7 +2116,7 @@ static void record_steal_time(struct kvm_vcpu *vcpu)
 		&vcpu->arch.st.steal, sizeof(struct kvm_steal_time))))
 		return;
 
-	vcpu->arch.st.steal.preempted = 0;
+	vcpu->arch.st.steal.preempted = KVM_VCPU_NOT_PREEMPTED;
 
 	if (vcpu->arch.st.steal.version & 1)
 		vcpu->arch.st.steal.version += 1;  /* first time write, random junk */
@@ -2887,7 +2887,7 @@ static void kvm_steal_time_set_preempted(struct kvm_vcpu *vcpu)
 	if (!(vcpu->arch.st.msr_val & KVM_MSR_ENABLED))
 		return;
 
-	vcpu->arch.st.steal.preempted = 1;
+	vcpu->arch.st.steal.preempted = KVM_VCPU_PREEMPTED;
 
 	kvm_write_guest_offset_cached(vcpu->kvm, &vcpu->arch.st.stime,
 			&vcpu->arch.st.steal.preempted,
-- 
2.7.4


* [PATCH RESEND 2/3] KVM: Add paravirt remote TLB flush
  2017-11-09  2:02 [PATCH RESEND 0/3] KVM: Paravirt remote TLB flush Wanpeng Li
  2017-11-09  2:02 ` [PATCH RESEND 1/3] KVM: Add vCPU running/preempted state Wanpeng Li
@ 2017-11-09  2:02 ` Wanpeng Li
  2017-11-09 10:48   ` Paolo Bonzini
  2017-11-09 15:11   ` Radim Krčmář
  2017-11-09  2:02 ` [PATCH RESEND 3/3] KVM: Add flush_on_enter before guest enter Wanpeng Li
  2 siblings, 2 replies; 11+ messages in thread
From: Wanpeng Li @ 2017-11-09  2:02 UTC (permalink / raw)
  To: linux-kernel, kvm; +Cc: Paolo Bonzini, Radim Krčmář, Wanpeng Li

From: Wanpeng Li <wanpeng.li@hotmail.com>

The remote TLB flush APIs busy-wait, which is fine on bare metal. Within
the guest, however, the target vCPUs may have been preempted or blocked,
in which case the initiating vCPU ends up busy-waiting for a long time.

This patch set implements paravirtual TLB flushing that does not wait
for vCPUs that are not running; those vCPUs instead flush their TLB on
the next guest entry.

The best results are achieved when the host is overcommitted, i.e.
multiple vCPUs run on each pCPU. In that case the PV TLB flush avoids
touching vCPUs that are not scheduled and avoids the wait on the
initiating CPU.

Tested on a Haswell i7 desktop, 4 cores (2 HT), i.e. 8 pCPUs, running 
ebizzy in one Linux guest.

ebizzy -M 
              vanilla    optimized     boost
 8 vCPUs       10152       10083       -0.68% 
16 vCPUs        1224        4866       297.5% 
24 vCPUs        1109        3871       249%
32 vCPUs        1025        3375       229.3% 

Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
---
 arch/x86/include/uapi/asm/kvm_para.h |  1 +
 arch/x86/kernel/kvm.c                | 29 +++++++++++++++++++++++++++++
 2 files changed, 30 insertions(+)

diff --git a/arch/x86/include/uapi/asm/kvm_para.h b/arch/x86/include/uapi/asm/kvm_para.h
index ff23ce9..189e354 100644
--- a/arch/x86/include/uapi/asm/kvm_para.h
+++ b/arch/x86/include/uapi/asm/kvm_para.h
@@ -52,6 +52,7 @@ struct kvm_steal_time {
 
 #define KVM_VCPU_NOT_PREEMPTED      (0 << 0)
 #define KVM_VCPU_PREEMPTED          (1 << 0)
+#define KVM_VCPU_SHOULD_FLUSH       (1 << 1)
 
 #define KVM_CLOCK_PAIRING_WALLCLOCK 0
 struct kvm_clock_pairing {
diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
index 1b1b641..2e2f3ae 100644
--- a/arch/x86/kernel/kvm.c
+++ b/arch/x86/kernel/kvm.c
@@ -465,6 +465,33 @@ static void __init kvm_apf_trap_init(void)
 	update_intr_gate(X86_TRAP_PF, async_page_fault);
 }
 
+static void kvm_flush_tlb_others(const struct cpumask *cpumask,
+			const struct flush_tlb_info *info)
+{
+	u8 state;
+	int cpu;
+	struct kvm_steal_time *src;
+	cpumask_t flushmask;
+
+
+	cpumask_copy(&flushmask, cpumask);
+	/*
+	 * We have to send the flush IPI only to running vCPUs,
+	 * and queue flush_on_enter for preempted vCPUs.
+	 */
+	for_each_cpu(cpu, cpumask) {
+		src = &per_cpu(steal_time, cpu);
+		state = src->preempted;
+		if ((state & KVM_VCPU_PREEMPTED)) {
+			if (cmpxchg(&src->preempted, state, state | 1 <<
+				KVM_VCPU_SHOULD_FLUSH))
+					cpumask_clear_cpu(cpu, &flushmask);
+		}
+	}
+
+	native_flush_tlb_others(&flushmask, info);
+}
+
 void __init kvm_guest_init(void)
 {
 	int i;
@@ -484,6 +511,8 @@ void __init kvm_guest_init(void)
 		pv_time_ops.steal_clock = kvm_steal_clock;
 	}
 
+	pv_mmu_ops.flush_tlb_others = kvm_flush_tlb_others;
+
 	if (kvm_para_has_feature(KVM_FEATURE_PV_EOI))
 		apic_set_eoi_write(kvm_guest_apic_eoi_write);
 
-- 
2.7.4


* [PATCH RESEND 3/3] KVM: Add flush_on_enter before guest enter
  2017-11-09  2:02 [PATCH RESEND 0/3] KVM: Paravirt remote TLB flush Wanpeng Li
  2017-11-09  2:02 ` [PATCH RESEND 1/3] KVM: Add vCPU running/preempted state Wanpeng Li
  2017-11-09  2:02 ` [PATCH RESEND 2/3] KVM: Add paravirt remote TLB flush Wanpeng Li
@ 2017-11-09  2:02 ` Wanpeng Li
  2017-11-09 10:54   ` Paolo Bonzini
  2 siblings, 1 reply; 11+ messages in thread
From: Wanpeng Li @ 2017-11-09  2:02 UTC (permalink / raw)
  To: linux-kernel, kvm; +Cc: Paolo Bonzini, Radim Krčmář, Wanpeng Li

From: Wanpeng Li <wanpeng.li@hotmail.com>

If the PV-flush guest has indicated flush-on-enter (KVM_VCPU_SHOULD_FLUSH
set), flush the TLB before the vCPU next enters the guest, and queue a
TLB flush request if the flag is still set when the vCPU is preempted.

Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
---
 arch/x86/kvm/x86.c | 12 ++++++++++--
 1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 1ea28a2..f295360 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -2116,7 +2116,13 @@ static void record_steal_time(struct kvm_vcpu *vcpu)
 		&vcpu->arch.st.steal, sizeof(struct kvm_steal_time))))
 		return;
 
-	vcpu->arch.st.steal.preempted = KVM_VCPU_NOT_PREEMPTED;
+	if (xchg(&vcpu->arch.st.steal.preempted, KVM_VCPU_NOT_PREEMPTED) ==
+			(KVM_VCPU_SHOULD_FLUSH | KVM_VCPU_PREEMPTED))
+		/*
+		 * Do the TLB flush before entering the guest; it has
+		 * already passed the request-checking stage.
+		 */
+		kvm_x86_ops->tlb_flush(vcpu);
 
 	if (vcpu->arch.st.steal.version & 1)
 		vcpu->arch.st.steal.version += 1;  /* first time write, random junk */
@@ -2887,7 +2893,9 @@ static void kvm_steal_time_set_preempted(struct kvm_vcpu *vcpu)
 	if (!(vcpu->arch.st.msr_val & KVM_MSR_ENABLED))
 		return;
 
-	vcpu->arch.st.steal.preempted = KVM_VCPU_PREEMPTED;
+	if (xchg(&vcpu->arch.st.steal.preempted, KVM_VCPU_PREEMPTED) ==
+				KVM_VCPU_SHOULD_FLUSH)
+		kvm_make_request(KVM_REQ_TLB_FLUSH, vcpu);
 
 	kvm_write_guest_offset_cached(vcpu->kvm, &vcpu->arch.st.stime,
 			&vcpu->arch.st.steal.preempted,
-- 
2.7.4


* Re: [PATCH RESEND 2/3] KVM: Add paravirt remote TLB flush
  2017-11-09  2:02 ` [PATCH RESEND 2/3] KVM: Add paravirt remote TLB flush Wanpeng Li
@ 2017-11-09 10:48   ` Paolo Bonzini
  2017-11-09 11:01     ` Wanpeng Li
  2017-11-09 15:11   ` Radim Krčmář
  1 sibling, 1 reply; 11+ messages in thread
From: Paolo Bonzini @ 2017-11-09 10:48 UTC (permalink / raw)
  To: Wanpeng Li, linux-kernel, kvm
  Cc: Radim Krčmář, Wanpeng Li, Eduardo Valentin

On 09/11/2017 03:02, Wanpeng Li wrote:
> From: Wanpeng Li <wanpeng.li@hotmail.com>
> 
> The remote TLB flush APIs busy-wait, which is fine on bare metal. Within
> the guest, however, the target vCPUs may have been preempted or blocked,
> in which case the initiating vCPU ends up busy-waiting for a long time.
> 
> This patch set implements paravirtual TLB flushing that does not wait
> for vCPUs that are not running; those vCPUs instead flush their TLB on
> the next guest entry.
> 
> The best results are achieved when the host is overcommitted, i.e.
> multiple vCPUs run on each pCPU. In that case the PV TLB flush avoids
> touching vCPUs that are not scheduled and avoids the wait on the
> initiating CPU.
> 
> Tested on a Haswell i7 desktop, 4 cores (2 HT), i.e. 8 pCPUs, running 
> ebizzy in one Linux guest.
> 
> ebizzy -M 
>               vanilla    optimized     boost
>  8 vCPUs       10152       10083       -0.68% 
> 16 vCPUs        1224        4866       297.5% 
> 24 vCPUs        1109        3871       249%
> 32 vCPUs        1025        3375       229.3% 
> 
> Cc: Paolo Bonzini <pbonzini@redhat.com>
> Cc: Radim Krčmář <rkrcmar@redhat.com>
> Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
> ---
>  arch/x86/include/uapi/asm/kvm_para.h |  1 +
>  arch/x86/kernel/kvm.c                | 29 +++++++++++++++++++++++++++++
>  2 files changed, 30 insertions(+)
> 
> diff --git a/arch/x86/include/uapi/asm/kvm_para.h b/arch/x86/include/uapi/asm/kvm_para.h
> index ff23ce9..189e354 100644
> --- a/arch/x86/include/uapi/asm/kvm_para.h
> +++ b/arch/x86/include/uapi/asm/kvm_para.h
> @@ -52,6 +52,7 @@ struct kvm_steal_time {
>  
>  #define KVM_VCPU_NOT_PREEMPTED      (0 << 0)
>  #define KVM_VCPU_PREEMPTED          (1 << 0)
> +#define KVM_VCPU_SHOULD_FLUSH       (1 << 1)
>  
>  #define KVM_CLOCK_PAIRING_WALLCLOCK 0
>  struct kvm_clock_pairing {
> diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
> index 1b1b641..2e2f3ae 100644
> --- a/arch/x86/kernel/kvm.c
> +++ b/arch/x86/kernel/kvm.c
> @@ -465,6 +465,33 @@ static void __init kvm_apf_trap_init(void)
>  	update_intr_gate(X86_TRAP_PF, async_page_fault);
>  }
>  
> +static void kvm_flush_tlb_others(const struct cpumask *cpumask,
> +			const struct flush_tlb_info *info)
> +{
> +	u8 state;
> +	int cpu;
> +	struct kvm_steal_time *src;
> +	cpumask_t flushmask;
> +
> +
> +	cpumask_copy(&flushmask, cpumask);
> +	/*
> +	 * We have to send the flush IPI only to running vCPUs,
> +	 * and queue flush_on_enter for preempted vCPUs.
> +	 */
> +	for_each_cpu(cpu, cpumask) {
> +		src = &per_cpu(steal_time, cpu);
> +		state = src->preempted;
> +		if ((state & KVM_VCPU_PREEMPTED)) {
> +			if (cmpxchg(&src->preempted, state, state | 1 <<
> +				KVM_VCPU_SHOULD_FLUSH))
> +					cpumask_clear_cpu(cpu, &flushmask);
> +		}
> +	}
> +
> +	native_flush_tlb_others(&flushmask, info);
> +}
> +
>  void __init kvm_guest_init(void)
>  {
>  	int i;
> @@ -484,6 +511,8 @@ void __init kvm_guest_init(void)
>  		pv_time_ops.steal_clock = kvm_steal_clock;
>  	}
>  
> +	pv_mmu_ops.flush_tlb_others = kvm_flush_tlb_others;

This needs to be keyed on a new CPUID feature bit.  Eduardo is also
adding a new "PV_DEDICATED" hint and you might disable PV TLB flush when
PV_DEDICATED is set.
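
For instance (a sketch only; both feature names below are hypothetical
until the bits are actually defined):

	/* hypothetical feature bits -- names illustrative only */
	if (kvm_para_has_feature(KVM_FEATURE_PV_TLB_FLUSH) &&
	    !kvm_para_has_feature(KVM_FEATURE_PV_DEDICATED))
		pv_mmu_ops.flush_tlb_others = kvm_flush_tlb_others;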

Paolo

>  	if (kvm_para_has_feature(KVM_FEATURE_PV_EOI))
>  		apic_set_eoi_write(kvm_guest_apic_eoi_write);
>  
> 


* Re: [PATCH RESEND 3/3] KVM: Add flush_on_enter before guest enter
  2017-11-09  2:02 ` [PATCH RESEND 3/3] KVM: Add flush_on_enter before guest enter Wanpeng Li
@ 2017-11-09 10:54   ` Paolo Bonzini
  2017-11-09 12:31     ` Wanpeng Li
  0 siblings, 1 reply; 11+ messages in thread
From: Paolo Bonzini @ 2017-11-09 10:54 UTC (permalink / raw)
  To: Wanpeng Li, linux-kernel, kvm; +Cc: Radim Krčmář, Wanpeng Li

On 09/11/2017 03:02, Wanpeng Li wrote:
> From: Wanpeng Li <wanpeng.li@hotmail.com>
> 
> If the PV-flush guest has indicated flush-on-enter (KVM_VCPU_SHOULD_FLUSH
> set), flush the TLB before the vCPU next enters the guest, and queue a
> TLB flush request if the flag is still set when the vCPU is preempted.
> 
> Cc: Paolo Bonzini <pbonzini@redhat.com>
> Cc: Radim Krčmář <rkrcmar@redhat.com>
> Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
> ---
>  arch/x86/kvm/x86.c | 12 ++++++++++--
>  1 file changed, 10 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 1ea28a2..f295360 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -2116,7 +2116,13 @@ static void record_steal_time(struct kvm_vcpu *vcpu)
>  		&vcpu->arch.st.steal, sizeof(struct kvm_steal_time))))
>  		return;
>  
> -	vcpu->arch.st.steal.preempted = KVM_VCPU_NOT_PREEMPTED;
> +	if (xchg(&vcpu->arch.st.steal.preempted, KVM_VCPU_NOT_PREEMPTED) ==
> +			(KVM_VCPU_SHOULD_FLUSH | KVM_VCPU_PREEMPTED))
> +		/*
> +		 * Do the TLB flush before entering the guest; it has
> +		 * already passed the request-checking stage.
> +		 */
> +		kvm_x86_ops->tlb_flush(vcpu);
>  
>  	if (vcpu->arch.st.steal.version & 1)
>  		vcpu->arch.st.steal.version += 1;  /* first time write, random junk */
> @@ -2887,7 +2893,9 @@ static void kvm_steal_time_set_preempted(struct kvm_vcpu *vcpu)
>  	if (!(vcpu->arch.st.msr_val & KVM_MSR_ENABLED))
>  		return;
>  
> -	vcpu->arch.st.steal.preempted = KVM_VCPU_PREEMPTED;
> +	if (xchg(&vcpu->arch.st.steal.preempted, KVM_VCPU_PREEMPTED) ==
> +				KVM_VCPU_SHOULD_FLUSH)
> +		kvm_make_request(KVM_REQ_TLB_FLUSH, vcpu);

This is not necessary.  Instead, you can just OR the KVM_VCPU_PREEMPTED
bit; record_steal_time will pick up the request and do the TLB flush later.
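
(A minimal sketch of that simplification -- OR-ing in the bit preserves
a pending KVM_VCPU_SHOULD_FLUSH for record_steal_time to act on:)

	vcpu->arch.st.steal.preempted |= KVM_VCPU_PREEMPTED;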

Also, I think this is a case where you should prefer INVVPID to INVEPT.
That's because "execution of the INVEPT instruction invalidates
guest-physical mappings and combined mappings" while "execution of the
INVVPID instruction invalidates linear mappings and combined mappings".
In this case, invalidating guest-physical mapping is unnecessary.

So you could add a new bool argument to kvm_x86_ops->tlb_flush.  In
vmx.c, __vmx_flush_tlb can do invept if "enable_ept && (invalidate_gpa
|| !enable_vpid)".

If !enable_ept && !enable_vpid, the feature cannot be made available.
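
A sketch of what that could look like, assuming the existing
construct_eptp()/ept_sync_context()/vpid_sync_context() helpers in vmx.c:

	static inline void __vmx_flush_tlb(struct kvm_vcpu *vcpu, int vpid,
					   bool invalidate_gpa)
	{
		if (enable_ept && (invalidate_gpa || !enable_vpid)) {
			if (!VALID_PAGE(vcpu->arch.mmu.root_hpa))
				return;
			/* invalidates guest-physical and combined mappings */
			ept_sync_context(construct_eptp(vcpu,
						vcpu->arch.mmu.root_hpa));
		} else {
			/* invalidates linear and combined mappings only */
			vpid_sync_context(vpid);
		}
	}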

Thanks,

Paolo

>  	kvm_write_guest_offset_cached(vcpu->kvm, &vcpu->arch.st.stime,
>  			&vcpu->arch.st.steal.preempted,
> 


* Re: [PATCH RESEND 2/3] KVM: Add paravirt remote TLB flush
  2017-11-09 10:48   ` Paolo Bonzini
@ 2017-11-09 11:01     ` Wanpeng Li
  2017-11-09 11:02       ` Paolo Bonzini
  0 siblings, 1 reply; 11+ messages in thread
From: Wanpeng Li @ 2017-11-09 11:01 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: linux-kernel, kvm, Radim Krčmář,
	Wanpeng Li, Eduardo Valentin

2017-11-09 18:48 GMT+08:00 Paolo Bonzini <pbonzini@redhat.com>:
> On 09/11/2017 03:02, Wanpeng Li wrote:
>> From: Wanpeng Li <wanpeng.li@hotmail.com>
>>
>> The remote TLB flush APIs busy-wait, which is fine on bare metal. Within
>> the guest, however, the target vCPUs may have been preempted or blocked,
>> in which case the initiating vCPU ends up busy-waiting for a long time.
>>
>> This patch set implements paravirtual TLB flushing that does not wait
>> for vCPUs that are not running; those vCPUs instead flush their TLB on
>> the next guest entry.
>>
>> The best results are achieved when the host is overcommitted, i.e.
>> multiple vCPUs run on each pCPU. In that case the PV TLB flush avoids
>> touching vCPUs that are not scheduled and avoids the wait on the
>> initiating CPU.
>>
>> Tested on a Haswell i7 desktop, 4 cores (2 HT), i.e. 8 pCPUs, running
>> ebizzy in one Linux guest.
>>
>> ebizzy -M
>>               vanilla    optimized     boost
>>  8 vCPUs       10152       10083       -0.68%
>> 16 vCPUs        1224        4866       297.5%
>> 24 vCPUs        1109        3871       249%
>> 32 vCPUs        1025        3375       229.3%
>>
>> Cc: Paolo Bonzini <pbonzini@redhat.com>
>> Cc: Radim Krčmář <rkrcmar@redhat.com>
>> Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
>> ---
>>  arch/x86/include/uapi/asm/kvm_para.h |  1 +
>>  arch/x86/kernel/kvm.c                | 29 +++++++++++++++++++++++++++++
>>  2 files changed, 30 insertions(+)
>>
>> diff --git a/arch/x86/include/uapi/asm/kvm_para.h b/arch/x86/include/uapi/asm/kvm_para.h
>> index ff23ce9..189e354 100644
>> --- a/arch/x86/include/uapi/asm/kvm_para.h
>> +++ b/arch/x86/include/uapi/asm/kvm_para.h
>> @@ -52,6 +52,7 @@ struct kvm_steal_time {
>>
>>  #define KVM_VCPU_NOT_PREEMPTED      (0 << 0)
>>  #define KVM_VCPU_PREEMPTED          (1 << 0)
>> +#define KVM_VCPU_SHOULD_FLUSH       (1 << 1)
>>
>>  #define KVM_CLOCK_PAIRING_WALLCLOCK 0
>>  struct kvm_clock_pairing {
>> diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
>> index 1b1b641..2e2f3ae 100644
>> --- a/arch/x86/kernel/kvm.c
>> +++ b/arch/x86/kernel/kvm.c
>> @@ -465,6 +465,33 @@ static void __init kvm_apf_trap_init(void)
>>       update_intr_gate(X86_TRAP_PF, async_page_fault);
>>  }
>>
>> +static void kvm_flush_tlb_others(const struct cpumask *cpumask,
>> +                     const struct flush_tlb_info *info)
>> +{
>> +     u8 state;
>> +     int cpu;
>> +     struct kvm_steal_time *src;
>> +     cpumask_t flushmask;
>> +
>> +
>> +     cpumask_copy(&flushmask, cpumask);
>> +     /*
>> +      * We have to send the flush IPI only to running vCPUs,
>> +      * and queue flush_on_enter for preempted vCPUs.
>> +      */
>> +     for_each_cpu(cpu, cpumask) {
>> +             src = &per_cpu(steal_time, cpu);
>> +             state = src->preempted;
>> +             if ((state & KVM_VCPU_PREEMPTED)) {
>> +                     if (cmpxchg(&src->preempted, state, state | 1 <<
>> +                             KVM_VCPU_SHOULD_FLUSH))
>> +                                     cpumask_clear_cpu(cpu, &flushmask);
>> +             }
>> +     }
>> +
>> +     native_flush_tlb_others(&flushmask, info);
>> +}
>> +
>>  void __init kvm_guest_init(void)
>>  {
>>       int i;
>> @@ -484,6 +511,8 @@ void __init kvm_guest_init(void)
>>               pv_time_ops.steal_clock = kvm_steal_clock;
>>       }
>>
>> +     pv_mmu_ops.flush_tlb_others = kvm_flush_tlb_others;
>
> This needs to be keyed on a new CPUID feature bit.  Eduardo is also

Will do.

> adding a new "PV_DEDICATED" hint and you might disable PV TLB flush when
> PV_DEDICATED is set.

Why disable PV TLB flush for PV_DEDICATED(qspinlock)?

Regards,
Wanpeng Li


* Re: [PATCH RESEND 2/3] KVM: Add paravirt remote TLB flush
  2017-11-09 11:01     ` Wanpeng Li
@ 2017-11-09 11:02       ` Paolo Bonzini
  2017-11-09 11:08         ` Wanpeng Li
  0 siblings, 1 reply; 11+ messages in thread
From: Paolo Bonzini @ 2017-11-09 11:02 UTC (permalink / raw)
  To: Wanpeng Li
  Cc: linux-kernel, kvm, Radim Krčmář,
	Wanpeng Li, Eduardo Valentin

On 09/11/2017 12:01, Wanpeng Li wrote:
> 2017-11-09 18:48 GMT+08:00 Paolo Bonzini <pbonzini@redhat.com>:
>> On 09/11/2017 03:02, Wanpeng Li wrote:
>>> @@ -484,6 +511,8 @@ void __init kvm_guest_init(void)
>>>               pv_time_ops.steal_clock = kvm_steal_clock;
>>>       }
>>>
>>> +     pv_mmu_ops.flush_tlb_others = kvm_flush_tlb_others;
>>
>> This needs to be keyed on a new CPUID feature bit.  Eduardo is also
> 
> Will do.
> 
>> adding a new "PV_DEDICATED" hint and you might disable PV TLB flush when
>> PV_DEDICATED is set.
> 
> Why disable PV TLB flush for PV_DEDICATED(qspinlock)?

PV_DEDICATED says pretty much that it is very unlikely to have a
preempted vCPU.  Therefore, the cpumask loop is unnecessary.

Paolo


* Re: [PATCH RESEND 2/3] KVM: Add paravirt remote TLB flush
  2017-11-09 11:02       ` Paolo Bonzini
@ 2017-11-09 11:08         ` Wanpeng Li
  0 siblings, 0 replies; 11+ messages in thread
From: Wanpeng Li @ 2017-11-09 11:08 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: linux-kernel, kvm, Radim Krčmář,
	Wanpeng Li, Eduardo Valentin

2017-11-09 19:02 GMT+08:00 Paolo Bonzini <pbonzini@redhat.com>:
> On 09/11/2017 12:01, Wanpeng Li wrote:
>> 2017-11-09 18:48 GMT+08:00 Paolo Bonzini <pbonzini@redhat.com>:
>>> On 09/11/2017 03:02, Wanpeng Li wrote:
>>>> @@ -484,6 +511,8 @@ void __init kvm_guest_init(void)
>>>>               pv_time_ops.steal_clock = kvm_steal_clock;
>>>>       }
>>>>
>>>> +     pv_mmu_ops.flush_tlb_others = kvm_flush_tlb_others;
>>>
>>> This needs to be keyed on a new CPUID feature bit.  Eduardo is also
>>
>> Will do.
>>
>>> adding a new "PV_DEDICATED" hint and you might disable PV TLB flush when
>>> PV_DEDICATED is set.
>>
>> Why disable PV TLB flush for PV_DEDICATED(qspinlock)?
>
> PV_DEDICATED says pretty much that it is very unlikely to have a
> preempted vCPU.  Therefore, the cpumask loop is unnecessary.

Thanks for pointing this out. :)

Regards,
Wanpeng Li


* Re: [PATCH RESEND 3/3] KVM: Add flush_on_enter before guest enter
  2017-11-09 10:54   ` Paolo Bonzini
@ 2017-11-09 12:31     ` Wanpeng Li
  0 siblings, 0 replies; 11+ messages in thread
From: Wanpeng Li @ 2017-11-09 12:31 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: linux-kernel, kvm, Radim Krčmář, Wanpeng Li

2017-11-09 18:54 GMT+08:00 Paolo Bonzini <pbonzini@redhat.com>:
> On 09/11/2017 03:02, Wanpeng Li wrote:
>> From: Wanpeng Li <wanpeng.li@hotmail.com>
>>
>> If the PV-flush guest has indicated flush-on-enter (KVM_VCPU_SHOULD_FLUSH
>> set), flush the TLB before the vCPU next enters the guest, and queue a
>> TLB flush request if the flag is still set when the vCPU is preempted.
>>
>> Cc: Paolo Bonzini <pbonzini@redhat.com>
>> Cc: Radim Krčmář <rkrcmar@redhat.com>
>> Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
>> ---
>>  arch/x86/kvm/x86.c | 12 ++++++++++--
>>  1 file changed, 10 insertions(+), 2 deletions(-)
>>
>> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
>> index 1ea28a2..f295360 100644
>> --- a/arch/x86/kvm/x86.c
>> +++ b/arch/x86/kvm/x86.c
>> @@ -2116,7 +2116,13 @@ static void record_steal_time(struct kvm_vcpu *vcpu)
>>               &vcpu->arch.st.steal, sizeof(struct kvm_steal_time))))
>>               return;
>>
>> -     vcpu->arch.st.steal.preempted = KVM_VCPU_NOT_PREEMPTED;
>> +     if (xchg(&vcpu->arch.st.steal.preempted, KVM_VCPU_NOT_PREEMPTED) ==
>> +                     (KVM_VCPU_SHOULD_FLUSH | KVM_VCPU_PREEMPTED))
>> +             /*
>> +              * Do the TLB flush before entering the guest; it has
>> +              * already passed the request-checking stage.
>> +              */
>> +             kvm_x86_ops->tlb_flush(vcpu);
>>
>>       if (vcpu->arch.st.steal.version & 1)
>>               vcpu->arch.st.steal.version += 1;  /* first time write, random junk */
>> @@ -2887,7 +2893,9 @@ static void kvm_steal_time_set_preempted(struct kvm_vcpu *vcpu)
>>       if (!(vcpu->arch.st.msr_val & KVM_MSR_ENABLED))
>>               return;
>>
>> -     vcpu->arch.st.steal.preempted = KVM_VCPU_PREEMPTED;
>> +     if (xchg(&vcpu->arch.st.steal.preempted, KVM_VCPU_PREEMPTED) ==
>> +                             KVM_VCPU_SHOULD_FLUSH)
>> +             kvm_make_request(KVM_REQ_TLB_FLUSH, vcpu);
>
> This is not necessary.  Instead, you can just OR the KVM_VCPU_PREEMPTED
> bit; record_steal_time will pick up the request and do the TLB flush later.
>
> Also, I think this is a case where you should prefer INVVPID to INVEP.
> That's because "execution of the INVEPT instruction invalidates
> guest-physical mappings and combined mappings" while "execution of the
> INVVPID instruction invalidates linear mappings and combined mappings".
> In this case, invalidating guest-physical mapping is unnecessary.
>
> So you could add a new bool argument to kvm_x86_ops->tlb_flush.  In
> vmx.c, __vmx_flush_tlb can do invept if "enable_ept && (invalidate_gpa
> || !enable_vpid)".

Agreed.

Regards,
Wanpeng Li


* Re: [PATCH RESEND 2/3] KVM: Add paravirt remote TLB flush
  2017-11-09  2:02 ` [PATCH RESEND 2/3] KVM: Add paravirt remote TLB flush Wanpeng Li
  2017-11-09 10:48   ` Paolo Bonzini
@ 2017-11-09 15:11   ` Radim Krčmář
  1 sibling, 0 replies; 11+ messages in thread
From: Radim Krčmář @ 2017-11-09 15:11 UTC (permalink / raw)
  To: Wanpeng Li; +Cc: linux-kernel, kvm, Paolo Bonzini, Wanpeng Li

2017-11-08 18:02-0800, Wanpeng Li:
> From: Wanpeng Li <wanpeng.li@hotmail.com>
> 
> The remote TLB flush APIs busy-wait, which is fine on bare metal. Within
> the guest, however, the target vCPUs may have been preempted or blocked,
> in which case the initiating vCPU ends up busy-waiting for a long time.
> 
> This patch set implements paravirtual TLB flushing that does not wait
> for vCPUs that are not running; those vCPUs instead flush their TLB on
> the next guest entry.
> 
> The best results are achieved when the host is overcommitted, i.e.
> multiple vCPUs run on each pCPU. In that case the PV TLB flush avoids
> touching vCPUs that are not scheduled and avoids the wait on the
> initiating CPU.
> 
> Tested on a Haswell i7 desktop, 4 cores (2 HT), i.e. 8 pCPUs, running 
> ebizzy in one Linux guest.
> 
> ebizzy -M 
>               vanilla    optimized     boost
>  8 vCPUs       10152       10083       -0.68% 
> 16 vCPUs        1224        4866       297.5% 
> 24 vCPUs        1109        3871       249%
> 32 vCPUs        1025        3375       229.3% 
> 
> Cc: Paolo Bonzini <pbonzini@redhat.com>
> Cc: Radim Krčmář <rkrcmar@redhat.com>
> Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
> ---
> diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
> @@ -465,6 +465,33 @@ static void __init kvm_apf_trap_init(void)
>  	update_intr_gate(X86_TRAP_PF, async_page_fault);
>  }
>  
> +static void kvm_flush_tlb_others(const struct cpumask *cpumask,
> +			const struct flush_tlb_info *info)
> +{
> +	u8 state;
> +	int cpu;
> +	struct kvm_steal_time *src;
> +	cpumask_t flushmask;
> +
> +
> +	cpumask_copy(&flushmask, cpumask);
> +	/*
> +	 * We have to send the flush IPI only to running vCPUs,
> +	 * and queue flush_on_enter for preempted vCPUs.
> +	 */
> +	for_each_cpu(cpu, cpumask) {
> +		src = &per_cpu(steal_time, cpu);
> +		state = src->preempted;
> +		if ((state & KVM_VCPU_PREEMPTED)) {
> +			if (cmpxchg(&src->preempted, state, state | 1 <<
> +				KVM_VCPU_SHOULD_FLUSH))

We won't be flushing at all unless the last argument reads 'state |
KVM_VCPU_SHOULD_FLUSH'; and cmpxchg() returns the original value, which
must be compared against state to avoid a race that would drop a
running vCPU:

  if (cmpxchg(&src->preempted, state, state | KVM_VCPU_SHOULD_FLUSH) == state)
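
In context, the corrected loop body would then be something like:

	for_each_cpu(cpu, cpumask) {
		src = &per_cpu(steal_time, cpu);
		state = READ_ONCE(src->preempted);
		if (state & KVM_VCPU_PREEMPTED) {
			/* skip the IPI only if we atomically set the flag
			 * on a still-preempted vCPU */
			if (cmpxchg(&src->preempted, state,
				    state | KVM_VCPU_SHOULD_FLUSH) == state)
				cpumask_clear_cpu(cpu, &flushmask);
		}
	}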

> +					cpumask_clear_cpu(cpu, &flushmask);
> +		}
> +	}
> +
> +	native_flush_tlb_others(&flushmask, info);
> +}
> +
>  void __init kvm_guest_init(void)
>  {
>  	int i;


end of thread

Thread overview: 11+ messages
2017-11-09  2:02 [PATCH RESEND 0/3] KVM: Paravirt remote TLB flush Wanpeng Li
2017-11-09  2:02 ` [PATCH RESEND 1/3] KVM: Add vCPU running/preempted state Wanpeng Li
2017-11-09  2:02 ` [PATCH RESEND 2/3] KVM: Add paravirt remote TLB flush Wanpeng Li
2017-11-09 10:48   ` Paolo Bonzini
2017-11-09 11:01     ` Wanpeng Li
2017-11-09 11:02       ` Paolo Bonzini
2017-11-09 11:08         ` Wanpeng Li
2017-11-09 15:11   ` Radim Krčmář
2017-11-09  2:02 ` [PATCH RESEND 3/3] KVM: Add flush_on_enter before guest enter Wanpeng Li
2017-11-09 10:54   ` Paolo Bonzini
2017-11-09 12:31     ` Wanpeng Li
