* [PATCH v8 0/4] KVM: X86: Add Paravirt TLB Shootdown
@ 2017-12-13  1:33 Wanpeng Li
  2017-12-13  1:33 ` [PATCH v8 1/4] KVM: X86: Add vCPU running/preempted state Wanpeng Li
                   ` (4 more replies)
  0 siblings, 5 replies; 18+ messages in thread
From: Wanpeng Li @ 2017-12-13  1:33 UTC (permalink / raw)
  To: linux-kernel, kvm
  Cc: Paolo Bonzini, Radim Krčmář, Peter Zijlstra, Wanpeng Li

The remote TLB flushing APIs do a busy wait, which is fine on bare
metal. Within a guest, however, the target vCPUs might have been
preempted or blocked, and the initiating vCPU can end up busy-waiting
for a long time.

This patch set implements paravirtual TLB flushing that avoids waiting
for vCPUs that are not running; those vCPUs instead flush the TLB on
the next guest entry. The idea was discussed here:
https://lkml.org/lkml/2012/2/20/157

The best result is achieved when we're overcommitting the host by running
multiple vCPUs on each pCPU. In this case PV TLB flush avoids touching
vCPUs which are not scheduled and avoids the wait on the initiating vCPU.
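
At a high level, the guest/host protocol looks like the following
condensed pseudo-C sketch of patches 2/4 and 4/4 (feature detection,
the percpu cpumask setup and error handling are omitted):

    /* Guest side (patch 2/4): skip vCPUs the host has marked preempted. */
    for_each_cpu(cpu, flushmask) {
            state = READ_ONCE(per_cpu(steal_time, cpu).preempted);
            if ((state & KVM_VCPU_PREEMPTED) &&
                try_cmpxchg(&per_cpu(steal_time, cpu).preempted, &state,
                            state | KVM_VCPU_SHOULD_FLUSH))
                    /* the host will flush on the next guest entry instead */
                    __cpumask_clear_cpu(cpu, flushmask);
    }
    native_flush_tlb_others(flushmask, info);  /* IPI only running vCPUs */

    /* Host side (patch 4/4), in record_steal_time() before guest entry: */
    if (xchg(&vcpu->arch.st.steal.preempted, KVM_VCPU_NOT_PREEMPTED) ==
                    (KVM_VCPU_SHOULD_FLUSH | KVM_VCPU_PREEMPTED))
            kvm_vcpu_flush_tlb(vcpu, false);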

In addition, thanks to commit 9e52fc2b50d ("x86/mm: Enable RCU based
page table freeing (CONFIG_HAVE_RCU_TABLE_FREE=y)"), page table pages
are freed only after an RCU grace period instead of relying on the
flush IPI for synchronization, which makes it safe to skip the IPI for
vCPUs that are not running.

Tested on a Xeon Gold 6142 2.6GHz, 2 sockets, 32 cores, 64 threads,
i.e. 64 pCPUs; each VM has 64 vCPUs.

ebizzy -M 
              vanilla    optimized     boost
1VM            46799       48670         4%
2VM            23962       42691        78%
3VM            16152       37539       132%

Note: The patchset is not rebased against "locking/qspinlock/x86: Avoid
   test-and-set when PV_DEDICATED is set" v3, since I can still observe a
   small improvement even for 64 vCPUs on 64 pCPUs. This is because the
   system is not completely isolated: housekeeping tasks run sporadically
   and vCPUs are sometimes preempted, which I also confirmed by adding
   some printouts to kvm_flush_tlb_others(). After PV_DEDICATED is merged,
   we can disable PV TLB flush when not overcommitting, if needed.
   
v7 -> v8:
 * rebase against latest kvm/queue

v6 -> v7:
 * don't check !flushmask 
 * use arch_initcall() to allocate the percpu mask late

v5 -> v6:
 * fix the percpu mask 
 * rebase against latest kvm/queue

v4 -> v5:
 * flushmask instead of cpumask

v3 -> v4:
 * use READ_ONCE()
 * use try_cmpxchg instead of cmpxchg
 * add {} to if
 * no FLUSH flags to preserve during set_preempted
 * "KVM: X86" prefix to patch subject

v2 -> v3: 
 * percpu cpumask

v1 -> v2:
 * a new CPUID feature bit
 * fix cmpxchg check
 * use kvm_vcpu_flush_tlb() to get the statistics right
 * just OR the KVM_VCPU_PREEMPTED in kvm_steal_time_set_preempted
 * add a new bool argument to kvm_x86_ops->tlb_flush
 * __cpumask_clear_cpu() instead of cpumask_clear_cpu()
 * not put cpumask_t on stack
 * rebase the patchset against "locking/qspinlock/x86: Avoid
   test-and-set when PV_DEDICATED is set" v3


Wanpeng Li (4):
  KVM: X86: Add vCPU running/preempted state
  KVM: X86: Add Paravirt TLB Shootdown
  KVM: X86: introduce invalidate_gpa argument to tlb flush
  KVM: X86: Add flush_on_enter before guest enter

 Documentation/virtual/kvm/cpuid.txt  |  4 +++
 arch/x86/include/asm/kvm_host.h      |  2 +-
 arch/x86/include/uapi/asm/kvm_para.h |  5 ++++
 arch/x86/kernel/kvm.c                | 49 +++++++++++++++++++++++++++++++++++-
 arch/x86/kvm/cpuid.c                 |  3 ++-
 arch/x86/kvm/svm.c                   | 14 +++++------
 arch/x86/kvm/vmx.c                   | 21 ++++++++--------
 arch/x86/kvm/x86.c                   | 25 +++++++++++-------
 8 files changed, 94 insertions(+), 29 deletions(-)

-- 
2.7.4


* [PATCH v8 1/4] KVM: X86: Add vCPU running/preempted state
  2017-12-13  1:33 [PATCH v8 0/4] KVM: X86: Add Paravirt TLB Shootdown Wanpeng Li
@ 2017-12-13  1:33 ` Wanpeng Li
  2017-12-13 10:20   ` David Hildenbrand
  2017-12-13  1:33 ` [PATCH v8 2/4] KVM: X86: Add Paravirt TLB Shootdown Wanpeng Li
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 18+ messages in thread
From: Wanpeng Li @ 2017-12-13  1:33 UTC (permalink / raw)
  To: linux-kernel, kvm
  Cc: Paolo Bonzini, Radim Krčmář, Peter Zijlstra, Wanpeng Li

From: Wanpeng Li <wanpeng.li@hotmail.com>

This patch reuses the preempted field in kvm_steal_time to export vCPU
running/preempted information from the host to the guest. This enables
the guest to intelligently send IPIs only to running vCPUs and to set a
flag for preempted vCPUs, which avoids waiting for vCPUs that are not
running.

Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
---
 arch/x86/include/uapi/asm/kvm_para.h | 3 +++
 arch/x86/kernel/kvm.c                | 2 +-
 arch/x86/kvm/x86.c                   | 4 ++--
 3 files changed, 6 insertions(+), 3 deletions(-)

diff --git a/arch/x86/include/uapi/asm/kvm_para.h b/arch/x86/include/uapi/asm/kvm_para.h
index 09cc064..763b692 100644
--- a/arch/x86/include/uapi/asm/kvm_para.h
+++ b/arch/x86/include/uapi/asm/kvm_para.h
@@ -51,6 +51,9 @@ struct kvm_steal_time {
 	__u32 pad[11];
 };
 
+#define KVM_VCPU_NOT_PREEMPTED      (0 << 0)
+#define KVM_VCPU_PREEMPTED          (1 << 0)
+
 #define KVM_CLOCK_PAIRING_WALLCLOCK 0
 struct kvm_clock_pairing {
 	__s64 sec;
diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
index b40ffbf..6610b92 100644
--- a/arch/x86/kernel/kvm.c
+++ b/arch/x86/kernel/kvm.c
@@ -643,7 +643,7 @@ __visible bool __kvm_vcpu_is_preempted(long cpu)
 {
 	struct kvm_steal_time *src = &per_cpu(steal_time, cpu);
 
-	return !!src->preempted;
+	return !!(src->preempted & KVM_VCPU_PREEMPTED);
 }
 PV_CALLEE_SAVE_REGS_THUNK(__kvm_vcpu_is_preempted);
 
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 9212dad..cdf716a 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -2131,7 +2131,7 @@ static void record_steal_time(struct kvm_vcpu *vcpu)
 		&vcpu->arch.st.steal, sizeof(struct kvm_steal_time))))
 		return;
 
-	vcpu->arch.st.steal.preempted = 0;
+	vcpu->arch.st.steal.preempted = KVM_VCPU_NOT_PREEMPTED;
 
 	if (vcpu->arch.st.steal.version & 1)
 		vcpu->arch.st.steal.version += 1;  /* first time write, random junk */
@@ -2917,7 +2917,7 @@ static void kvm_steal_time_set_preempted(struct kvm_vcpu *vcpu)
 	if (!(vcpu->arch.st.msr_val & KVM_MSR_ENABLED))
 		return;
 
-	vcpu->arch.st.steal.preempted = 1;
+	vcpu->arch.st.steal.preempted = KVM_VCPU_PREEMPTED;
 
 	kvm_write_guest_offset_cached(vcpu->kvm, &vcpu->arch.st.stime,
 			&vcpu->arch.st.steal.preempted,
-- 
2.7.4


* [PATCH v8 2/4] KVM: X86: Add Paravirt TLB Shootdown
  2017-12-13  1:33 [PATCH v8 0/4] KVM: X86: Add Paravirt TLB Shootdown Wanpeng Li
  2017-12-13  1:33 ` [PATCH v8 1/4] KVM: X86: Add vCPU running/preempted state Wanpeng Li
@ 2017-12-13  1:33 ` Wanpeng Li
  2017-12-13 12:33   ` Paolo Bonzini
  2017-12-13  1:33 ` [PATCH v8 3/4] KVM: X86: introduce invalidate_gpa argument to tlb flush Wanpeng Li
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 18+ messages in thread
From: Wanpeng Li @ 2017-12-13  1:33 UTC (permalink / raw)
  To: linux-kernel, kvm
  Cc: Paolo Bonzini, Radim Krčmář, Peter Zijlstra, Wanpeng Li

From: Wanpeng Li <wanpeng.li@hotmail.com>

The remote TLB flushing APIs do a busy wait, which is fine on bare
metal. Within a guest, however, the target vCPUs might have been
preempted or blocked, and the initiating vCPU can end up busy-waiting
for a long time.

This patch set implements paravirtual TLB flushing that avoids waiting
for vCPUs that are not running; those vCPUs instead flush the TLB on
the next guest entry.

The best result is achieved when we're overcommitting the host by running
multiple vCPUs on each pCPU. In this case PV TLB flush avoids touching
vCPUs which are not scheduled and avoids the wait on the initiating vCPU.

Tested on a Xeon Gold 6142 2.6GHz, 2 sockets, 32 cores, 64 threads,
i.e. 64 pCPUs; each VM has 64 vCPUs.

ebizzy -M
              vanilla    optimized     boost
1VM            46799       48670         4%
2VM            23962       42691        78%
3VM            16152       37539       132%

Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
---
 Documentation/virtual/kvm/cpuid.txt  |  4 +++
 arch/x86/include/uapi/asm/kvm_para.h |  2 ++
 arch/x86/kernel/kvm.c                | 47 ++++++++++++++++++++++++++++++++++++
 3 files changed, 53 insertions(+)

diff --git a/Documentation/virtual/kvm/cpuid.txt b/Documentation/virtual/kvm/cpuid.txt
index 3c65feb..dcab6dc 100644
--- a/Documentation/virtual/kvm/cpuid.txt
+++ b/Documentation/virtual/kvm/cpuid.txt
@@ -54,6 +54,10 @@ KVM_FEATURE_PV_UNHALT              ||     7 || guest checks this feature bit
                                    ||       || before enabling paravirtualized
                                    ||       || spinlock support.
 ------------------------------------------------------------------------------
+KVM_FEATURE_PV_TLB_FLUSH           ||     9 || guest checks this feature bit
+                                   ||       || before enabling paravirtualized
+                                   ||       || tlb flush.
+------------------------------------------------------------------------------
 KVM_FEATURE_CLOCKSOURCE_STABLE_BIT ||    24 || host will warn if no guest-side
                                    ||       || per-cpu warps are expected in
                                    ||       || kvmclock.
diff --git a/arch/x86/include/uapi/asm/kvm_para.h b/arch/x86/include/uapi/asm/kvm_para.h
index 763b692..8fbcc16 100644
--- a/arch/x86/include/uapi/asm/kvm_para.h
+++ b/arch/x86/include/uapi/asm/kvm_para.h
@@ -25,6 +25,7 @@
 #define KVM_FEATURE_STEAL_TIME		5
 #define KVM_FEATURE_PV_EOI		6
 #define KVM_FEATURE_PV_UNHALT		7
+#define KVM_FEATURE_PV_TLB_FLUSH	9
 
 /* The last 8 bits are used to indicate how to interpret the flags field
  * in pvclock structure. If no bits are set, all flags are ignored.
@@ -53,6 +54,7 @@ struct kvm_steal_time {
 
 #define KVM_VCPU_NOT_PREEMPTED      (0 << 0)
 #define KVM_VCPU_PREEMPTED          (1 << 0)
+#define KVM_VCPU_SHOULD_FLUSH       (1 << 1)
 
 #define KVM_CLOCK_PAIRING_WALLCLOCK 0
 struct kvm_clock_pairing {
diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
index 6610b92..73e00c4 100644
--- a/arch/x86/kernel/kvm.c
+++ b/arch/x86/kernel/kvm.c
@@ -498,6 +498,34 @@ static void __init kvm_apf_trap_init(void)
 	update_intr_gate(X86_TRAP_PF, async_page_fault);
 }
 
+static DEFINE_PER_CPU(cpumask_var_t, __pv_tlb_mask);
+
+static void kvm_flush_tlb_others(const struct cpumask *cpumask,
+			const struct flush_tlb_info *info)
+{
+	u8 state;
+	int cpu;
+	struct kvm_steal_time *src;
+	struct cpumask *flushmask = this_cpu_cpumask_var_ptr(__pv_tlb_mask);
+
+	cpumask_copy(flushmask, cpumask);
+	/*
+	 * We have to call flush only on running vCPUs, and
+	 * queue flush_on_enter for preempted vCPUs.
+	 */
+	for_each_cpu(cpu, flushmask) {
+		src = &per_cpu(steal_time, cpu);
+		state = READ_ONCE(src->preempted);
+		if ((state & KVM_VCPU_PREEMPTED)) {
+			if (try_cmpxchg(&src->preempted, &state,
+				state | KVM_VCPU_SHOULD_FLUSH))
+				__cpumask_clear_cpu(cpu, flushmask);
+		}
+	}
+
+	native_flush_tlb_others(flushmask, info);
+}
+
 static void __init kvm_guest_init(void)
 {
 	int i;
@@ -517,6 +545,9 @@ static void __init kvm_guest_init(void)
 		pv_time_ops.steal_clock = kvm_steal_clock;
 	}
 
+	if (kvm_para_has_feature(KVM_FEATURE_PV_TLB_FLUSH))
+		pv_mmu_ops.flush_tlb_others = kvm_flush_tlb_others;
+
 	if (kvm_para_has_feature(KVM_FEATURE_PV_EOI))
 		apic_set_eoi_write(kvm_guest_apic_eoi_write);
 
@@ -598,6 +629,22 @@ static __init int activate_jump_labels(void)
 }
 arch_initcall(activate_jump_labels);
 
+static __init int kvm_setup_pv_tlb_flush(void)
+{
+	int cpu;
+
+	if (kvm_para_has_feature(KVM_FEATURE_PV_TLB_FLUSH)) {
+		for_each_possible_cpu(cpu) {
+			zalloc_cpumask_var_node(per_cpu_ptr(&__pv_tlb_mask, cpu),
+				GFP_KERNEL, cpu_to_node(cpu));
+		}
+		pr_info("KVM setup pv remote TLB flush\n");
+	}
+
+	return 0;
+}
+arch_initcall(kvm_setup_pv_tlb_flush);
+
 #ifdef CONFIG_PARAVIRT_SPINLOCKS
 
 /* Kick a cpu by its apicid. Used to wake up a halted vcpu */
-- 
2.7.4


* [PATCH v8 3/4] KVM: X86: introduce invalidate_gpa argument to tlb flush
  2017-12-13  1:33 [PATCH v8 0/4] KVM: X86: Add Paravirt TLB Shootdown Wanpeng Li
  2017-12-13  1:33 ` [PATCH v8 1/4] KVM: X86: Add vCPU running/preempted state Wanpeng Li
  2017-12-13  1:33 ` [PATCH v8 2/4] KVM: X86: Add Paravirt TLB Shootdown Wanpeng Li
@ 2017-12-13  1:33 ` Wanpeng Li
  2017-12-13 13:35   ` Peter Zijlstra
  2017-12-13  1:33 ` [PATCH v8 4/4] KVM: X86: Add flush_on_enter before guest enter Wanpeng Li
  2017-12-13  9:42 ` [PATCH v8 0/4] KVM: X86: Add Paravirt TLB Shootdown Wanpeng Li
  4 siblings, 1 reply; 18+ messages in thread
From: Wanpeng Li @ 2017-12-13  1:33 UTC (permalink / raw)
  To: linux-kernel, kvm
  Cc: Paolo Bonzini, Radim Krčmář, Peter Zijlstra, Wanpeng Li

From: Wanpeng Li <wanpeng.li@hotmail.com>

Introduce a new bool invalidate_gpa argument to kvm_x86_ops->tlb_flush;
it will be used by a later patch to flush the guest TLB without also
invalidating guest-physical (EPT) mappings.
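
With both EPT and VPID enabled, the resulting behaviour of
__vmx_flush_tlb() is roughly the following (a sketch condensed from the
vmx.c hunk below):

	if (invalidate_gpa)
		/* also invalidates guest-physical and combined mappings */
		ept_sync_context(construct_eptp(vcpu, vcpu->arch.mmu.root_hpa));
	else
		/* invalidates only mappings tagged with this vpid */
		vpid_sync_context(vpid);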

Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
---
 arch/x86/include/asm/kvm_host.h |  2 +-
 arch/x86/kvm/svm.c              | 14 +++++++-------
 arch/x86/kvm/vmx.c              | 21 +++++++++++----------
 arch/x86/kvm/x86.c              |  6 +++---
 4 files changed, 22 insertions(+), 21 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index df640fc..46892ba 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -968,7 +968,7 @@ struct kvm_x86_ops {
 	unsigned long (*get_rflags)(struct kvm_vcpu *vcpu);
 	void (*set_rflags)(struct kvm_vcpu *vcpu, unsigned long rflags);
 
-	void (*tlb_flush)(struct kvm_vcpu *vcpu);
+	void (*tlb_flush)(struct kvm_vcpu *vcpu, bool invalidate_gpa);
 
 	void (*run)(struct kvm_vcpu *vcpu);
 	int (*handle_exit)(struct kvm_vcpu *vcpu);
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index 1f3e7f2..14cca8c 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -285,7 +285,7 @@ static int vgif = true;
 module_param(vgif, int, 0444);
 
 static void svm_set_cr0(struct kvm_vcpu *vcpu, unsigned long cr0);
-static void svm_flush_tlb(struct kvm_vcpu *vcpu);
+static void svm_flush_tlb(struct kvm_vcpu *vcpu, bool invalidate_gpa);
 static void svm_complete_interrupts(struct vcpu_svm *svm);
 
 static int nested_svm_exit_handled(struct vcpu_svm *svm);
@@ -2035,7 +2035,7 @@ static int svm_set_cr4(struct kvm_vcpu *vcpu, unsigned long cr4)
 		return 1;
 
 	if (npt_enabled && ((old_cr4 ^ cr4) & X86_CR4_PGE))
-		svm_flush_tlb(vcpu);
+		svm_flush_tlb(vcpu, true);
 
 	vcpu->arch.cr4 = cr4;
 	if (!npt_enabled)
@@ -2385,7 +2385,7 @@ static void nested_svm_set_tdp_cr3(struct kvm_vcpu *vcpu,
 
 	svm->vmcb->control.nested_cr3 = __sme_set(root);
 	mark_dirty(svm->vmcb, VMCB_NPT);
-	svm_flush_tlb(vcpu);
+	svm_flush_tlb(vcpu, true);
 }
 
 static void nested_svm_inject_npf_exit(struct kvm_vcpu *vcpu,
@@ -2989,7 +2989,7 @@ static void enter_svm_guest_mode(struct vcpu_svm *svm, u64 vmcb_gpa,
 	svm->nested.intercept_exceptions = nested_vmcb->control.intercept_exceptions;
 	svm->nested.intercept            = nested_vmcb->control.intercept;
 
-	svm_flush_tlb(&svm->vcpu);
+	svm_flush_tlb(&svm->vcpu, true);
 	svm->vmcb->control.int_ctl = nested_vmcb->control.int_ctl | V_INTR_MASKING_MASK;
 	if (nested_vmcb->control.int_ctl & V_INTR_MASKING_MASK)
 		svm->vcpu.arch.hflags |= HF_VINTR_MASK;
@@ -4785,7 +4785,7 @@ static int svm_set_tss_addr(struct kvm *kvm, unsigned int addr)
 	return 0;
 }
 
-static void svm_flush_tlb(struct kvm_vcpu *vcpu)
+static void svm_flush_tlb(struct kvm_vcpu *vcpu, bool invalidate_gpa)
 {
 	struct vcpu_svm *svm = to_svm(vcpu);
 
@@ -5076,7 +5076,7 @@ static void svm_set_cr3(struct kvm_vcpu *vcpu, unsigned long root)
 
 	svm->vmcb->save.cr3 = __sme_set(root);
 	mark_dirty(svm->vmcb, VMCB_CR);
-	svm_flush_tlb(vcpu);
+	svm_flush_tlb(vcpu, true);
 }
 
 static void set_tdp_cr3(struct kvm_vcpu *vcpu, unsigned long root)
@@ -5090,7 +5090,7 @@ static void set_tdp_cr3(struct kvm_vcpu *vcpu, unsigned long root)
 	svm->vmcb->save.cr3 = kvm_read_cr3(vcpu);
 	mark_dirty(svm->vmcb, VMCB_CR);
 
-	svm_flush_tlb(vcpu);
+	svm_flush_tlb(vcpu, true);
 }
 
 static int is_disabled(void)
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index ef7d13e..c179175 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -4140,9 +4140,10 @@ static void exit_lmode(struct kvm_vcpu *vcpu)
 
 #endif
 
-static inline void __vmx_flush_tlb(struct kvm_vcpu *vcpu, int vpid)
+static inline void __vmx_flush_tlb(struct kvm_vcpu *vcpu, int vpid,
+				bool invalidate_gpa)
 {
-	if (enable_ept) {
+	if (enable_ept && (invalidate_gpa || !enable_vpid)) {
 		if (!VALID_PAGE(vcpu->arch.mmu.root_hpa))
 			return;
 		ept_sync_context(construct_eptp(vcpu, vcpu->arch.mmu.root_hpa));
@@ -4151,15 +4152,15 @@ static inline void __vmx_flush_tlb(struct kvm_vcpu *vcpu, int vpid)
 	}
 }
 
-static void vmx_flush_tlb(struct kvm_vcpu *vcpu)
+static void vmx_flush_tlb(struct kvm_vcpu *vcpu, bool invalidate_gpa)
 {
-	__vmx_flush_tlb(vcpu, to_vmx(vcpu)->vpid);
+	__vmx_flush_tlb(vcpu, to_vmx(vcpu)->vpid, invalidate_gpa);
 }
 
 static void vmx_flush_tlb_ept_only(struct kvm_vcpu *vcpu)
 {
 	if (enable_ept)
-		vmx_flush_tlb(vcpu);
+		vmx_flush_tlb(vcpu, true);
 }
 
 static void vmx_decache_cr0_guest_bits(struct kvm_vcpu *vcpu)
@@ -4357,7 +4358,7 @@ static void vmx_set_cr3(struct kvm_vcpu *vcpu, unsigned long cr3)
 		ept_load_pdptrs(vcpu);
 	}
 
-	vmx_flush_tlb(vcpu);
+	vmx_flush_tlb(vcpu, true);
 	vmcs_writel(GUEST_CR3, guest_cr3);
 }
 
@@ -7932,7 +7933,7 @@ static int handle_invvpid(struct kvm_vcpu *vcpu)
 		return kvm_skip_emulated_instruction(vcpu);
 	}
 
-	__vmx_flush_tlb(vcpu, vmx->nested.vpid02);
+	__vmx_flush_tlb(vcpu, vmx->nested.vpid02, true);
 	nested_vmx_succeed(vcpu);
 
 	return kvm_skip_emulated_instruction(vcpu);
@@ -10614,11 +10615,11 @@ static int prepare_vmcs02(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12,
 			vmcs_write16(VIRTUAL_PROCESSOR_ID, vmx->nested.vpid02);
 			if (vmcs12->virtual_processor_id != vmx->nested.last_vpid) {
 				vmx->nested.last_vpid = vmcs12->virtual_processor_id;
-				__vmx_flush_tlb(vcpu, to_vmx(vcpu)->nested.vpid02);
+				__vmx_flush_tlb(vcpu, to_vmx(vcpu)->nested.vpid02, true);
 			}
 		} else {
 			vmcs_write16(VIRTUAL_PROCESSOR_ID, vmx->vpid);
-			vmx_flush_tlb(vcpu);
+			vmx_flush_tlb(vcpu, true);
 		}
 
 	}
@@ -11323,7 +11324,7 @@ static void load_vmcs12_host_state(struct kvm_vcpu *vcpu,
 		 * L1's vpid. TODO: move to a more elaborate solution, giving
 		 * each L2 its own vpid and exposing the vpid feature to L1.
 		 */
-		vmx_flush_tlb(vcpu);
+		vmx_flush_tlb(vcpu, true);
 	}
 	/* Restore posted intr vector. */
 	if (nested_cpu_has_posted_intr(vmcs12))
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index cdf716a..6c7b2bc 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -6791,10 +6791,10 @@ static void vcpu_scan_ioapic(struct kvm_vcpu *vcpu)
 	kvm_x86_ops->load_eoi_exitmap(vcpu, eoi_exit_bitmap);
 }
 
-static void kvm_vcpu_flush_tlb(struct kvm_vcpu *vcpu)
+static void kvm_vcpu_flush_tlb(struct kvm_vcpu *vcpu, bool invalidate_gpa)
 {
 	++vcpu->stat.tlb_flush;
-	kvm_x86_ops->tlb_flush(vcpu);
+	kvm_x86_ops->tlb_flush(vcpu, invalidate_gpa);
 }
 
 void kvm_arch_mmu_notifier_invalidate_range(struct kvm *kvm,
@@ -6865,7 +6865,7 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
 		if (kvm_check_request(KVM_REQ_MMU_SYNC, vcpu))
 			kvm_mmu_sync_roots(vcpu);
 		if (kvm_check_request(KVM_REQ_TLB_FLUSH, vcpu))
-			kvm_vcpu_flush_tlb(vcpu);
+			kvm_vcpu_flush_tlb(vcpu, true);
 		if (kvm_check_request(KVM_REQ_REPORT_TPR_ACCESS, vcpu)) {
 			vcpu->run->exit_reason = KVM_EXIT_TPR_ACCESS;
 			r = 0;
-- 
2.7.4


* [PATCH v8 4/4] KVM: X86: Add flush_on_enter before guest enter
  2017-12-13  1:33 [PATCH v8 0/4] KVM: X86: Add Paravirt TLB Shootdown Wanpeng Li
                   ` (2 preceding siblings ...)
  2017-12-13  1:33 ` [PATCH v8 3/4] KVM: X86: introduce invalidate_gpa argument to tlb flush Wanpeng Li
@ 2017-12-13  1:33 ` Wanpeng Li
  2017-12-13 12:54   ` Peter Zijlstra
  2017-12-13 13:35   ` Peter Zijlstra
  2017-12-13  9:42 ` [PATCH v8 0/4] KVM: X86: Add Paravirt TLB Shootdown Wanpeng Li
  4 siblings, 2 replies; 18+ messages in thread
From: Wanpeng Li @ 2017-12-13  1:33 UTC (permalink / raw)
  To: linux-kernel, kvm
  Cc: Paolo Bonzini, Radim Krčmář, Peter Zijlstra, Wanpeng Li

From: Wanpeng Li <wanpeng.li@hotmail.com>

If the guest has indicated that a vCPU must flush its TLB on entry
(KVM_VCPU_SHOULD_FLUSH), flush the TLB before entering the guest.

Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
---
 arch/x86/kvm/cpuid.c |  3 ++-
 arch/x86/kvm/x86.c   | 21 ++++++++++++++-------
 2 files changed, 16 insertions(+), 8 deletions(-)

diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index 3db9e1c..a0e6c97 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -602,7 +602,8 @@ static inline int __do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function,
 			     (1 << KVM_FEATURE_ASYNC_PF) |
 			     (1 << KVM_FEATURE_PV_EOI) |
 			     (1 << KVM_FEATURE_CLOCKSOURCE_STABLE_BIT) |
-			     (1 << KVM_FEATURE_PV_UNHALT);
+			     (1 << KVM_FEATURE_PV_UNHALT) |
+			     (1 << KVM_FEATURE_PV_TLB_FLUSH);
 
 		if (sched_info_on())
 			entry->eax |= (1 << KVM_FEATURE_STEAL_TIME);
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 6c7b2bc..1c5c7a3 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -2122,6 +2122,12 @@ static void kvmclock_reset(struct kvm_vcpu *vcpu)
 	vcpu->arch.pv_time_enabled = false;
 }
 
+static void kvm_vcpu_flush_tlb(struct kvm_vcpu *vcpu, bool invalidate_gpa)
+{
+	++vcpu->stat.tlb_flush;
+	kvm_x86_ops->tlb_flush(vcpu, invalidate_gpa);
+}
+
 static void record_steal_time(struct kvm_vcpu *vcpu)
 {
 	if (!(vcpu->arch.st.msr_val & KVM_MSR_ENABLED))
@@ -2131,7 +2137,14 @@ static void record_steal_time(struct kvm_vcpu *vcpu)
 		&vcpu->arch.st.steal, sizeof(struct kvm_steal_time))))
 		return;
 
-	vcpu->arch.st.steal.preempted = KVM_VCPU_NOT_PREEMPTED;
+	if (xchg(&vcpu->arch.st.steal.preempted, KVM_VCPU_NOT_PREEMPTED) ==
+			(KVM_VCPU_SHOULD_FLUSH | KVM_VCPU_PREEMPTED)) {
+		/*
+		 * Do the TLB flush before entering the guest; we are
+		 * already past the stage of request checking.
+		 */
+		kvm_vcpu_flush_tlb(vcpu, false);
+	}
 
 	if (vcpu->arch.st.steal.version & 1)
 		vcpu->arch.st.steal.version += 1;  /* first time write, random junk */
@@ -6791,12 +6804,6 @@ static void vcpu_scan_ioapic(struct kvm_vcpu *vcpu)
 	kvm_x86_ops->load_eoi_exitmap(vcpu, eoi_exit_bitmap);
 }
 
-static void kvm_vcpu_flush_tlb(struct kvm_vcpu *vcpu, bool invalidate_gpa)
-{
-	++vcpu->stat.tlb_flush;
-	kvm_x86_ops->tlb_flush(vcpu, invalidate_gpa);
-}
-
 void kvm_arch_mmu_notifier_invalidate_range(struct kvm *kvm,
 		unsigned long start, unsigned long end)
 {
-- 
2.7.4


* Re: [PATCH v8 0/4] KVM: X86: Add Paravirt TLB Shootdown
  2017-12-13  1:33 [PATCH v8 0/4] KVM: X86: Add Paravirt TLB Shootdown Wanpeng Li
                   ` (3 preceding siblings ...)
  2017-12-13  1:33 ` [PATCH v8 4/4] KVM: X86: Add flush_on_enter before guest enter Wanpeng Li
@ 2017-12-13  9:42 ` Wanpeng Li
  2017-12-13 12:47   ` Paolo Bonzini
  4 siblings, 1 reply; 18+ messages in thread
From: Wanpeng Li @ 2017-12-13  9:42 UTC (permalink / raw)
  To: linux-kernel, kvm
  Cc: Paolo Bonzini, Radim Krčmář, Peter Zijlstra, Wanpeng Li

Paolo, is it ok to apply this version? :)
2017-12-13 9:33 GMT+08:00 Wanpeng Li <kernellwp@gmail.com>:
> [...]


* Re: [PATCH v8 1/4] KVM: X86: Add vCPU running/preempted state
  2017-12-13  1:33 ` [PATCH v8 1/4] KVM: X86: Add vCPU running/preempted state Wanpeng Li
@ 2017-12-13 10:20   ` David Hildenbrand
  2017-12-13 11:38     ` Wanpeng Li
  0 siblings, 1 reply; 18+ messages in thread
From: David Hildenbrand @ 2017-12-13 10:20 UTC (permalink / raw)
  To: Wanpeng Li, linux-kernel, kvm
  Cc: Paolo Bonzini, Radim Krčmář, Peter Zijlstra, Wanpeng Li

On 13.12.2017 02:33, Wanpeng Li wrote:
> [...]
> @@ -51,6 +51,9 @@ struct kvm_steal_time {
>  	__u32 pad[11];
>  };
>  
> +#define KVM_VCPU_NOT_PREEMPTED      (0 << 0)
> +#define KVM_VCPU_PREEMPTED          (1 << 0)

Is it really helpful to have two flags?

Just use KVM_VCPU_PREEMPTED  and clear that one in record_steal_time()


-- 

Thanks,

David / dhildenb


* Re: [PATCH v8 1/4] KVM: X86: Add vCPU running/preempted state
  2017-12-13 10:20   ` David Hildenbrand
@ 2017-12-13 11:38     ` Wanpeng Li
  2017-12-13 11:45       ` David Hildenbrand
  0 siblings, 1 reply; 18+ messages in thread
From: Wanpeng Li @ 2017-12-13 11:38 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: linux-kernel, kvm, Paolo Bonzini, Radim Krčmář,
	Peter Zijlstra, Wanpeng Li

2017-12-13 18:20 GMT+08:00 David Hildenbrand <david@redhat.com>:
> On 13.12.2017 02:33, Wanpeng Li wrote:
>> [...]
>> @@ -51,6 +51,9 @@ struct kvm_steal_time {
>>       __u32 pad[11];
>>  };
>>
>> +#define KVM_VCPU_NOT_PREEMPTED      (0 << 0)
>> +#define KVM_VCPU_PREEMPTED          (1 << 0)
>
> Is it really helpful to have two flags?
>
> Just use KVM_VCPU_PREEMPTED  and clear that one in record_steal_time()

I think it is fine, since a third flag is introduced in patch 2/4;
with named values for all the states the code is clearer.

Regards,
Wanpeng Li


* Re: [PATCH v8 1/4] KVM: X86: Add vCPU running/preempted state
  2017-12-13 11:38     ` Wanpeng Li
@ 2017-12-13 11:45       ` David Hildenbrand
  2017-12-13 12:31         ` Paolo Bonzini
  0 siblings, 1 reply; 18+ messages in thread
From: David Hildenbrand @ 2017-12-13 11:45 UTC (permalink / raw)
  To: Wanpeng Li
  Cc: linux-kernel, kvm, Paolo Bonzini, Radim Krčmář,
	Peter Zijlstra, Wanpeng Li

On 13.12.2017 12:38, Wanpeng Li wrote:
> 2017-12-13 18:20 GMT+08:00 David Hildenbrand <david@redhat.com>:
>> On 13.12.2017 02:33, Wanpeng Li wrote:
>>> [...]
>>> @@ -51,6 +51,9 @@ struct kvm_steal_time {
>>>       __u32 pad[11];
>>>  };
>>>
>>> +#define KVM_VCPU_NOT_PREEMPTED      (0 << 0)
>>> +#define KVM_VCPU_PREEMPTED          (1 << 0)
>>
>> Is it really helpful to have two flags?
>>
>> Just use KVM_VCPU_PREEMPTED  and clear that one in record_steal_time()
> 
> I think it is fine since there is a third flag introduced in patch
> 2/4, it is more clear currently.
> 
> Regards,
> Wanpeng Li
> 

Having two flags representing the same thing is not clear to me.

-- 

Thanks,

David / dhildenb


* Re: [PATCH v8 1/4] KVM: X86: Add vCPU running/preempted state
  2017-12-13 11:45       ` David Hildenbrand
@ 2017-12-13 12:31         ` Paolo Bonzini
  0 siblings, 0 replies; 18+ messages in thread
From: Paolo Bonzini @ 2017-12-13 12:31 UTC (permalink / raw)
  To: David Hildenbrand, Wanpeng Li
  Cc: linux-kernel, kvm, Radim Krčmář,
	Peter Zijlstra, Wanpeng Li

On 13/12/2017 12:45, David Hildenbrand wrote:
> On 13.12.2017 12:38, Wanpeng Li wrote:
>> 2017-12-13 18:20 GMT+08:00 David Hildenbrand <david@redhat.com>:
>>> On 13.12.2017 02:33, Wanpeng Li wrote:
>>>> [...]
>>>> @@ -51,6 +51,9 @@ struct kvm_steal_time {
>>>>       __u32 pad[11];
>>>>  };
>>>>
>>>> +#define KVM_VCPU_NOT_PREEMPTED      (0 << 0)
>>>> +#define KVM_VCPU_PREEMPTED          (1 << 0)
>>>
>>> Is it really helpful to have two flags?
>>>
>>> Just use KVM_VCPU_PREEMPTED  and clear that one in record_steal_time()
>>
>> I think it is fine since there is a third flag introduced in patch
>> 2/4, it is more clear currently.
>>
>> Regards,
>> Wanpeng Li
>>
> 
> Having two flags representing the same thing is not clear to me.

I agree that KVM_VCPU_NOT_PREEMPTED is not particularly necessary, but
it is not correct to clear KVM_VCPU_PREEMPTED; instead, the entire field
must be cleared to zero.
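
That is, hypothetically (an illustrative sketch, not code from the patch):

	/* insufficient: would leave any future flag bits set */
	vcpu->arch.st.steal.preempted &= ~KVM_VCPU_PREEMPTED;

	/* correct: reset the whole field */
	vcpu->arch.st.steal.preempted = 0;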

Also, this patch is not justified very well by the commit message.  A
better wording would be:

The next patch will add another bit to the preempted field in
kvm_steal_time.  Define a constant for bit 0 (the only one that is
currently used).


* Re: [PATCH v8 2/4] KVM: X86: Add Paravirt TLB Shootdown
  2017-12-13  1:33 ` [PATCH v8 2/4] KVM: X86: Add Paravirt TLB Shootdown Wanpeng Li
@ 2017-12-13 12:33   ` Paolo Bonzini
  0 siblings, 0 replies; 18+ messages in thread
From: Paolo Bonzini @ 2017-12-13 12:33 UTC (permalink / raw)
  To: Wanpeng Li, linux-kernel, kvm
  Cc: Radim Krčmář, Peter Zijlstra, Wanpeng Li

On 13/12/2017 02:33, Wanpeng Li wrote:
> @@ -53,6 +54,7 @@ struct kvm_steal_time {
>  
>  #define KVM_VCPU_NOT_PREEMPTED      (0 << 0)
>  #define KVM_VCPU_PREEMPTED          (1 << 0)
> +#define KVM_VCPU_SHOULD_FLUSH       (1 << 1)
>  
>  #define KVM_CLOCK_PAIRING_WALLCLOCK 0

Two issues with the name:

1) it doesn't say *what* is flushed

2) in the usual meaning of "should", it would be just a recommendation
i.e. it's fine not to flush the TLB.  This is not true in this case, so
KVM_VCPU_FLUSH_TLB is better.

Paolo


* Re: [PATCH v8 0/4] KVM: X86: Add Paravirt TLB Shootdown
  2017-12-13  9:42 ` [PATCH v8 0/4] KVM: X86: Add Paravirt TLB Shootdown Wanpeng Li
@ 2017-12-13 12:47   ` Paolo Bonzini
  2017-12-13 13:13     ` Wanpeng Li
  0 siblings, 1 reply; 18+ messages in thread
From: Paolo Bonzini @ 2017-12-13 12:47 UTC (permalink / raw)
  To: Wanpeng Li, linux-kernel, kvm
  Cc: Radim Krčmář, Peter Zijlstra, Wanpeng Li

On 13/12/2017 10:42, Wanpeng Li wrote:
> Paolo, is it ok to apply this version? 

Take a look at the changes in kvm/queue, but yes. :)

Paolo


* Re: [PATCH v8 4/4] KVM: X86: Add flush_on_enter before guest enter
  2017-12-13  1:33 ` [PATCH v8 4/4] KVM: X86: Add flush_on_enter before guest enter Wanpeng Li
@ 2017-12-13 12:54   ` Peter Zijlstra
  2017-12-13 13:32     ` Paolo Bonzini
  2017-12-13 13:35   ` Peter Zijlstra
  1 sibling, 1 reply; 18+ messages in thread
From: Peter Zijlstra @ 2017-12-13 12:54 UTC (permalink / raw)
  To: Wanpeng Li
  Cc: linux-kernel, kvm, Paolo Bonzini, Radim Krčmář,
	Wanpeng Li

On Tue, Dec 12, 2017 at 05:33:04PM -0800, Wanpeng Li wrote:
> +static void kvm_vcpu_flush_tlb(struct kvm_vcpu *vcpu, bool invalidate_gpa)
> +{
> +	++vcpu->stat.tlb_flush;
> +	kvm_x86_ops->tlb_flush(vcpu, invalidate_gpa);

WRT PTI: how much does this actually invalidate? Does this invalidate
the _entire_ guest TLB, or only the current guest PCID?

> +}


* Re: [PATCH v8 0/4] KVM: X86: Add Paravirt TLB Shootdown
  2017-12-13 12:47   ` Paolo Bonzini
@ 2017-12-13 13:13     ` Wanpeng Li
  0 siblings, 0 replies; 18+ messages in thread
From: Wanpeng Li @ 2017-12-13 13:13 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: linux-kernel, kvm, Radim Krčmář,
	Peter Zijlstra, Wanpeng Li

2017-12-13 20:47 GMT+08:00 Paolo Bonzini <pbonzini@redhat.com>:
> On 13/12/2017 10:42, Wanpeng Li wrote:
>> Paolo, is it ok to apply this version?
>
> Take a look at the changes in kvm/queue, but yes. :)

Thanks for the nice patch descriptions and updates. :)

Regards,
Wanpeng Li


* Re: [PATCH v8 4/4] KVM: X86: Add flush_on_enter before guest enter
  2017-12-13 12:54   ` Peter Zijlstra
@ 2017-12-13 13:32     ` Paolo Bonzini
  0 siblings, 0 replies; 18+ messages in thread
From: Paolo Bonzini @ 2017-12-13 13:32 UTC (permalink / raw)
  To: Peter Zijlstra, Wanpeng Li
  Cc: linux-kernel, kvm, Radim Krčmář, Wanpeng Li

On 13/12/2017 13:54, Peter Zijlstra wrote:
>> +	++vcpu->stat.tlb_flush;
>> +	kvm_x86_ops->tlb_flush(vcpu, invalidate_gpa);
> 
> WRT to PTI; how much does this actually invalidate? Does this invalidate
> the _entire_ guest TLB, or only the current guest PCID?

All the PCIDs.

Paolo


* Re: [PATCH v8 3/4] KVM: X86: introduce invalidate_gpa argument to tlb flush
  2017-12-13  1:33 ` [PATCH v8 3/4] KVM: X86: introduce invalidate_gpa argument to tlb flush Wanpeng Li
@ 2017-12-13 13:35   ` Peter Zijlstra
  0 siblings, 0 replies; 18+ messages in thread
From: Peter Zijlstra @ 2017-12-13 13:35 UTC (permalink / raw)
  To: Wanpeng Li
  Cc: linux-kernel, kvm, Paolo Bonzini, Radim Krčmář,
	Wanpeng Li

On Tue, Dec 12, 2017 at 05:33:03PM -0800, Wanpeng Li wrote:
> From: Wanpeng Li <wanpeng.li@hotmail.com>
> 
> Introduce a new bool invalidate_gpa argument to kvm_x86_ops->tlb_flush,
> it will be used by later patches to just flush guest tlb.

As opposed to what? Will it now also flush host TLB? Why would it ever
want to flush host TLBs?

> @@ -4785,7 +4785,7 @@ static int svm_set_tss_addr(struct kvm *kvm, unsigned int addr)
>  	return 0;
>  }
>  
> -static void svm_flush_tlb(struct kvm_vcpu *vcpu)
> +static void svm_flush_tlb(struct kvm_vcpu *vcpu, bool invalidate_gpa)
>  {
>  	struct vcpu_svm *svm = to_svm(vcpu);
>  

So this is a no-op for SVM.

> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> index ef7d13e..c179175 100644
> --- a/arch/x86/kvm/vmx.c
> +++ b/arch/x86/kvm/vmx.c
> @@ -4140,9 +4140,10 @@ static void exit_lmode(struct kvm_vcpu *vcpu)
>  
>  #endif
>  
> -static inline void __vmx_flush_tlb(struct kvm_vcpu *vcpu, int vpid)
> +static inline void __vmx_flush_tlb(struct kvm_vcpu *vcpu, int vpid,
> +				bool invalidate_gpa)
>  {
> -	if (enable_ept) {
> +	if (enable_ept && (invalidate_gpa || !enable_vpid)) {
>  		if (!VALID_PAGE(vcpu->arch.mmu.root_hpa))
>  			return;
>  		ept_sync_context(construct_eptp(vcpu, vcpu->arch.mmu.root_hpa));

And for EPT you explicitly fall back to INVVPID when !invalidate_gpa.

Why?

This really needs a better changelog; this is incomprehensible.


* Re: [PATCH v8 4/4] KVM: X86: Add flush_on_enter before guest enter
  2017-12-13  1:33 ` [PATCH v8 4/4] KVM: X86: Add flush_on_enter before guest enter Wanpeng Li
  2017-12-13 12:54   ` Peter Zijlstra
@ 2017-12-13 13:35   ` Peter Zijlstra
  2017-12-13 13:37     ` Paolo Bonzini
  1 sibling, 1 reply; 18+ messages in thread
From: Peter Zijlstra @ 2017-12-13 13:35 UTC (permalink / raw)
  To: Wanpeng Li
  Cc: linux-kernel, kvm, Paolo Bonzini, Radim Krčmář,
	Wanpeng Li

On Tue, Dec 12, 2017 at 05:33:04PM -0800, Wanpeng Li wrote:
> +	if (xchg(&vcpu->arch.st.steal.preempted, KVM_VCPU_NOT_PREEMPTED) ==
> +			(KVM_VCPU_SHOULD_FLUSH | KVM_VCPU_PREEMPTED)) {
> +		/*
> +		 * Do the TLB flush before entering the guest; we are
> +		 * already past the stage of request checking.
> +		 */
> +		kvm_vcpu_flush_tlb(vcpu, false);
> +	}

And not a single word, _anywhere_ on why this doesn't need
invalidate_gpa or wtf that actually is.


* Re: [PATCH v8 4/4] KVM: X86: Add flush_on_enter before guest enter
  2017-12-13 13:35   ` Peter Zijlstra
@ 2017-12-13 13:37     ` Paolo Bonzini
  0 siblings, 0 replies; 18+ messages in thread
From: Paolo Bonzini @ 2017-12-13 13:37 UTC (permalink / raw)
  To: Peter Zijlstra, Wanpeng Li
  Cc: linux-kernel, kvm, Radim Krčmář, Wanpeng Li

On 13/12/2017 14:35, Peter Zijlstra wrote:
>> +	if (xchg(&vcpu->arch.st.steal.preempted, KVM_VCPU_NOT_PREEMPTED) ==
>> +			(KVM_VCPU_SHOULD_FLUSH | KVM_VCPU_PREEMPTED)) {
>> +		/*
>> +		 * Do the TLB flush before entering the guest; we are
>> +		 * already past the stage of request checking.
>> +		 */
>> +		kvm_vcpu_flush_tlb(vcpu, false);
>> +	}
> And not a single word, _anywhere_ on why this doesn't need
> invalidate_gpa or wtf that actually is.

FWIW I've fixed the changelogs in kvm/queue.

Paolo

