* [PATCH RESEND 0/5] KVM: X86: Scaling Guest OS Critical Sections with boosting
From: Wanpeng Li @ 2022-03-25 13:58 UTC
  To: linux-kernel, kvm
  Cc: Paolo Bonzini, Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li,
	Jim Mattson, Joerg Roedel

A semantic gap arises when a guest OS is preempted while executing its
own critical section; this degrades application scalability. This
series bridges the gap by passing the guest preempt_count to the host
and by checking the guest's IRQ-disable state: the hypervisor then
knows whether the guest OS is running in a critical section, and its
yield-on-spin heuristics can be smarter about boosting a vCPU
candidate that is in a critical section, mitigating the preemption
problem. Such a vCPU is also more likely to be a potential lock
holder.
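
In brief, the guest publishes the location of its per-CPU
preempt_count via a new MSR, and the host folds that together with a
snapshot of the guest's IRQ state when picking a yield target. A
condensed sketch of the two halves, taken from the patches below:

    /* Guest side (patch 4): register the per-CPU preempt_count. */
    wrmsrl(MSR_KVM_PREEMPT_COUNT,
           slow_virt_to_phys(this_cpu_ptr(&__preempt_count)) | KVM_MSR_ENABLED);

    /* Host side (patch 3): prefer vCPUs that are in a critical section. */
    bool kvm_arch_boost_candidate(struct kvm_vcpu *vcpu)
    {
            return vcpu->arch.irq_disabled || kvm_vcpu_non_preemptable(vcpu);
    }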

Testing was done on a 2-socket Xeon Cascade Lake (CLX) server with 96
hyperthreads; each VM has 96 vCPUs and 100GB of RAM. One VM runs the
benchmark while the remaining VMs (zero to two, depending on the
configuration) run CPU-bound workloads. There is no performance
regression for other benchmarks such as UnixBench.

1 VM:
                     vanilla       optimized      improvement

hackbench -l 50000   28 sec        21.45 sec      30.5%
ebizzy -M            12189         12354           1.4%
dbench               712 MB/sec    722 MB/sec      1.4%

2 VMs:
                     vanilla       optimized      improvement

hackbench -l 10000   29.4 sec      26 sec         13%
ebizzy -M            3834          4033            5%
dbench               42.3 MB/sec   44.1 MB/sec     4.3%

3 VMs:
                     vanilla       optimized      improvement

hackbench -l 10000   47 sec        35.46 sec      33%
ebizzy -M            3828          4031            5%
dbench               30.5 MB/sec   31.16 MB/sec    2.3%

(hackbench reports time, lower is better; ebizzy and dbench are
higher-is-better.)

Wanpeng Li (5):
  KVM: X86: Add MSR_KVM_PREEMPT_COUNT support
  KVM: X86: Add guest interrupt disable state support
  KVM: X86: Boost vCPU which is in critical section
  x86/kvm: Add MSR_KVM_PREEMPT_COUNT guest support
  KVM: X86: Expose PREEMPT_COUNT CPUID feature bit to guest

 Documentation/virt/kvm/cpuid.rst     |  3 ++
 arch/x86/include/asm/kvm_host.h      |  7 ++++
 arch/x86/include/uapi/asm/kvm_para.h |  2 +
 arch/x86/kernel/kvm.c                | 10 +++++
 arch/x86/kvm/cpuid.c                 |  3 +-
 arch/x86/kvm/x86.c                   | 60 ++++++++++++++++++++++++++++
 include/linux/kvm_host.h             |  1 +
 virt/kvm/kvm_main.c                  |  7 ++++
 8 files changed, 92 insertions(+), 1 deletion(-)

-- 
2.25.1



* [PATCH RESEND 1/5] KVM: X86: Add MSR_KVM_PREEMPT_COUNT support
From: Wanpeng Li @ 2022-03-25 13:58 UTC
  To: linux-kernel, kvm
  Cc: Paolo Bonzini, Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li,
	Jim Mattson, Joerg Roedel

From: Wanpeng Li <wanpengli@tencent.com>

The x86 preempt_count is per-CPU; any non-zero value indicates that
either preemption has been disabled explicitly or that the CPU is
currently servicing some sort of interrupt. The guest passes the
location of this value to the hypervisor, so the hypervisor knows
whether the guest is running in a critical section.
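
The MSR value encodes both the enable flag and the location of the
count; roughly, as implemented in kvm_pv_enable_preempt_count() below:

    /*
     * MSR_KVM_PREEMPT_COUNT:
     *   bit  0     KVM_MSR_ENABLED - reporting is active when set
     *   bits 63:1  guest physical address of the per-CPU preempt_count,
     *              of which KVM caches a sizeof(int) mapping
     */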

Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
---
 arch/x86/include/asm/kvm_host.h      |  6 +++++
 arch/x86/include/uapi/asm/kvm_para.h |  2 ++
 arch/x86/kvm/x86.c                   | 35 ++++++++++++++++++++++++++++
 3 files changed, 43 insertions(+)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index f72e80178ffc..50f011a7445a 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -856,6 +856,12 @@ struct kvm_vcpu_arch {
 
 	u64 msr_kvm_poll_control;
 
+	struct {
+		u64 msr_val;
+		bool preempt_count_enabled;
+		struct gfn_to_hva_cache preempt_count_cache;
+	} pv_pc;
+
 	/*
 	 * Indicates the guest is trying to write a gfn that contains one or
 	 * more of the PTEs used to translate the write itself, i.e. the access
diff --git a/arch/x86/include/uapi/asm/kvm_para.h b/arch/x86/include/uapi/asm/kvm_para.h
index 6e64b27b2c1e..f99fa4407604 100644
--- a/arch/x86/include/uapi/asm/kvm_para.h
+++ b/arch/x86/include/uapi/asm/kvm_para.h
@@ -36,6 +36,7 @@
 #define KVM_FEATURE_MSI_EXT_DEST_ID	15
 #define KVM_FEATURE_HC_MAP_GPA_RANGE	16
 #define KVM_FEATURE_MIGRATION_CONTROL	17
+#define KVM_FEATURE_PREEMPT_COUNT	18
 
 #define KVM_HINTS_REALTIME      0
 
@@ -58,6 +59,7 @@
 #define MSR_KVM_ASYNC_PF_INT	0x4b564d06
 #define MSR_KVM_ASYNC_PF_ACK	0x4b564d07
 #define MSR_KVM_MIGRATION_CONTROL	0x4b564d08
+#define MSR_KVM_PREEMPT_COUNT	0x4b564d09
 
 struct kvm_steal_time {
 	__u64 steal;
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 51106d32f04e..af75e273cb32 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -1456,6 +1456,7 @@ static const u32 emulated_msrs_all[] = {
 
 	MSR_KVM_ASYNC_PF_EN, MSR_KVM_STEAL_TIME,
 	MSR_KVM_PV_EOI_EN, MSR_KVM_ASYNC_PF_INT, MSR_KVM_ASYNC_PF_ACK,
+	MSR_KVM_PREEMPT_COUNT,
 
 	MSR_IA32_TSC_ADJUST,
 	MSR_IA32_TSC_DEADLINE,
@@ -3433,6 +3434,25 @@ static void record_steal_time(struct kvm_vcpu *vcpu)
 	mark_page_dirty_in_slot(vcpu->kvm, ghc->memslot, gpa_to_gfn(ghc->gpa));
 }
 
+static int kvm_pv_enable_preempt_count(struct kvm_vcpu *vcpu, u64 data)
+{
+	u64 addr = data & ~KVM_MSR_ENABLED;
+	struct gfn_to_hva_cache *ghc = &vcpu->arch.pv_pc.preempt_count_cache;
+
+	vcpu->arch.pv_pc.preempt_count_enabled = false;
+	vcpu->arch.pv_pc.msr_val = data;
+
+	if (!(data & KVM_MSR_ENABLED))
+		return 0;
+
+	if (kvm_gfn_to_hva_cache_init(vcpu->kvm, ghc, addr, sizeof(int)))
+		return 1;
+
+	vcpu->arch.pv_pc.preempt_count_enabled = true;
+
+	return 0;
+}
+
 int kvm_set_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 {
 	bool pr = false;
@@ -3652,6 +3672,14 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 		vcpu->arch.msr_kvm_poll_control = data;
 		break;
 
+	case MSR_KVM_PREEMPT_COUNT:
+		if (!guest_pv_has(vcpu, KVM_FEATURE_PREEMPT_COUNT))
+			return 1;
+
+		if (kvm_pv_enable_preempt_count(vcpu, data))
+			return 1;
+		break;
+
 	case MSR_IA32_MCG_CTL:
 	case MSR_IA32_MCG_STATUS:
 	case MSR_IA32_MC0_CTL ... MSR_IA32_MCx_CTL(KVM_MAX_MCE_BANKS) - 1:
@@ -3992,6 +4020,12 @@ int kvm_get_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 
 		msr_info->data = vcpu->arch.msr_kvm_poll_control;
 		break;
+	case MSR_KVM_PREEMPT_COUNT:
+		if (!guest_pv_has(vcpu, KVM_FEATURE_PREEMPT_COUNT))
+			return 1;
+
+		msr_info->data = vcpu->arch.pv_pc.msr_val;
+		break;
 	case MSR_IA32_P5_MC_ADDR:
 	case MSR_IA32_P5_MC_TYPE:
 	case MSR_IA32_MCG_CAP:
@@ -11190,6 +11224,7 @@ int kvm_arch_vcpu_create(struct kvm_vcpu *vcpu)
 
 	vcpu->arch.pending_external_vector = -1;
 	vcpu->arch.preempted_in_kernel = false;
+	vcpu->arch.pv_pc.preempt_count_enabled = false;
 
 #if IS_ENABLED(CONFIG_HYPERV)
 	vcpu->arch.hv_root_tdp = INVALID_PAGE;
-- 
2.25.1



* [PATCH RESEND 2/5] KVM: X86: Add guest interrupt disable state support
From: Wanpeng Li @ 2022-03-25 13:58 UTC
  To: linux-kernel, kvm
  Cc: Paolo Bonzini, Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li,
	Jim Mattson, Joerg Roedel

From: Wanpeng Li <wanpengli@tencent.com>

Let's record whether or not the guest has interrupts disabled.

Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
---
 arch/x86/include/asm/kvm_host.h | 1 +
 arch/x86/kvm/x86.c              | 3 +++
 2 files changed, 4 insertions(+)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 50f011a7445a..8e05cbfa9827 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -861,6 +861,7 @@ struct kvm_vcpu_arch {
 		bool preempt_count_enabled;
 		struct gfn_to_hva_cache preempt_count_cache;
 	} pv_pc;
+	bool irq_disabled;
 
 	/*
 	 * Indicates the guest is trying to write a gfn that contains one or
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index af75e273cb32..425fd7f38fa9 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -4576,6 +4576,7 @@ void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
 
 	static_call(kvm_x86_vcpu_load)(vcpu, cpu);
 
+	vcpu->arch.irq_disabled = false;
 	/* Save host pkru register if supported */
 	vcpu->arch.host_pkru = read_pkru();
 
@@ -4668,6 +4669,7 @@ void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu)
 
 	static_call(kvm_x86_vcpu_put)(vcpu);
 	vcpu->arch.last_host_tsc = rdtsc();
+	vcpu->arch.irq_disabled = !static_call(kvm_x86_get_if_flag)(vcpu);
 }
 
 static int kvm_vcpu_ioctl_get_lapic(struct kvm_vcpu *vcpu,
@@ -11225,6 +11227,7 @@ int kvm_arch_vcpu_create(struct kvm_vcpu *vcpu)
 	vcpu->arch.pending_external_vector = -1;
 	vcpu->arch.preempted_in_kernel = false;
 	vcpu->arch.pv_pc.preempt_count_enabled = false;
+	vcpu->arch.irq_disabled = false;
 
 #if IS_ENABLED(CONFIG_HYPERV)
 	vcpu->arch.hv_root_tdp = INVALID_PAGE;
-- 
2.25.1



* [PATCH RESEND 3/5] KVM: X86: Boost vCPU which is in critical section
From: Wanpeng Li @ 2022-03-25 13:58 UTC
  To: linux-kernel, kvm
  Cc: Paolo Bonzini, Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li,
	Jim Mattson, Joerg Roedel

From: Wanpeng Li <wanpengli@tencent.com>

A semantic gap arises when a guest OS is preempted while executing its
own critical section; this degrades application scalability. This
series bridges the gap by passing the guest preempt_count to the host
and by checking the guest's IRQ-disable state: the hypervisor then
knows whether the guest OS is running in a critical section, and its
yield-on-spin heuristics can be smarter about boosting a vCPU
candidate that is in a critical section, mitigating the preemption
problem. Such a vCPU is also more likely to be a potential lock
holder.

Testing was done on a 2-socket Xeon Cascade Lake (CLX) server with 96
hyperthreads; each VM has 96 vCPUs and 100GB of RAM. One VM runs the
benchmark while the remaining VMs (zero to two, depending on the
configuration) run CPU-bound workloads. There is no performance
regression for other benchmarks such as UnixBench.

1 VM:
                     vanilla       optimized      improvement

hackbench -l 50000   28 sec        21.45 sec      30.5%
ebizzy -M            12189         12354           1.4%
dbench               712 MB/sec    722 MB/sec      1.4%

2 VMs:
                     vanilla       optimized      improvement

hackbench -l 10000   29.4 sec      26 sec         13%
ebizzy -M            3834          4033            5%
dbench               42.3 MB/sec   44.1 MB/sec     4.3%

3 VMs:
                     vanilla       optimized      improvement

hackbench -l 10000   47 sec        35.46 sec      33%
ebizzy -M            3828          4031            5%
dbench               30.5 MB/sec   31.16 MB/sec    2.3%

(hackbench reports time, lower is better; ebizzy and dbench are
higher-is-better.)

Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
---
 arch/x86/kvm/x86.c       | 22 ++++++++++++++++++++++
 include/linux/kvm_host.h |  1 +
 virt/kvm/kvm_main.c      |  7 +++++++
 3 files changed, 30 insertions(+)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 425fd7f38fa9..6b300496bbd0 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -10375,6 +10375,28 @@ static int vcpu_run(struct kvm_vcpu *vcpu)
 	return r;
 }
 
+static int kvm_vcpu_non_preemptable(struct kvm_vcpu *vcpu)
+{
+	int count;
+
+	if (!vcpu->arch.pv_pc.preempt_count_enabled)
+		return 0;
+
+	if (!kvm_read_guest_cached(vcpu->kvm, &vcpu->arch.pv_pc.preempt_count_cache,
+	    &count, sizeof(int)))
+		return (count & ~PREEMPT_NEED_RESCHED);
+
+	return 0;
+}
+
+bool kvm_arch_boost_candidate(struct kvm_vcpu *vcpu)
+{
+	if (vcpu->arch.irq_disabled || kvm_vcpu_non_preemptable(vcpu))
+		return true;
+
+	return false;
+}
+
 static inline int complete_emulated_io(struct kvm_vcpu *vcpu)
 {
 	int r;
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 252ee4a61b58..9f1a7d9540de 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -1428,6 +1428,7 @@ bool kvm_arch_dy_has_pending_interrupt(struct kvm_vcpu *vcpu);
 int kvm_arch_post_init_vm(struct kvm *kvm);
 void kvm_arch_pre_destroy_vm(struct kvm *kvm);
 int kvm_arch_create_vm_debugfs(struct kvm *kvm);
+bool kvm_arch_boost_candidate(struct kvm_vcpu *vcpu);
 
 #ifndef __KVM_HAVE_ARCH_VM_ALLOC
 /*
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 9581a24c3d17..ee5a788892e0 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -3545,6 +3545,11 @@ bool __weak kvm_arch_dy_has_pending_interrupt(struct kvm_vcpu *vcpu)
 	return false;
 }
 
+bool __weak kvm_arch_boost_candidate(struct kvm_vcpu *vcpu)
+{
+	return true;
+}
+
 void kvm_vcpu_on_spin(struct kvm_vcpu *me, bool yield_to_kernel_mode)
 {
 	struct kvm *kvm = me->kvm;
@@ -3580,6 +3585,8 @@ void kvm_vcpu_on_spin(struct kvm_vcpu *me, bool yield_to_kernel_mode)
 			    !kvm_arch_dy_has_pending_interrupt(vcpu) &&
 			    !kvm_arch_vcpu_in_kernel(vcpu))
 				continue;
+			if (!kvm_arch_boost_candidate(vcpu))
+				continue;
 			if (!kvm_vcpu_eligible_for_directed_yield(vcpu))
 				continue;
 
-- 
2.25.1



* [PATCH RESEND 4/5] x86/kvm: Add MSR_KVM_PREEMPT_COUNT guest support
From: Wanpeng Li @ 2022-03-25 13:58 UTC
  To: linux-kernel, kvm
  Cc: Paolo Bonzini, Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li,
	Jim Mattson, Joerg Roedel

From: Wanpeng Li <wanpengli@tencent.com>

The x86 guest passes the location of its per-CPU preempt_count to the
hypervisor via MSR_KVM_PREEMPT_COUNT, so the hypervisor knows whether
the guest is running in a critical section.

Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
---
 arch/x86/kernel/kvm.c | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
index 21933095a10e..e389fa4393ae 100644
--- a/arch/x86/kernel/kvm.c
+++ b/arch/x86/kernel/kvm.c
@@ -366,6 +366,14 @@ static void kvm_guest_cpu_init(void)
 
 	if (has_steal_clock)
 		kvm_register_steal_time();
+
+	if (kvm_para_has_feature(KVM_FEATURE_PREEMPT_COUNT)) {
+		u64 pa = slow_virt_to_phys(this_cpu_ptr(&__preempt_count))
+			| KVM_MSR_ENABLED;
+		wrmsrl(MSR_KVM_PREEMPT_COUNT, pa);
+
+		pr_debug("setup pv preempt_count: cpu %d\n", smp_processor_id());
+	}
 }
 
 static void kvm_pv_disable_apf(void)
@@ -442,6 +450,8 @@ static void kvm_guest_cpu_offline(bool shutdown)
 	if (!shutdown)
 		apf_task_wake_all();
 	kvmclock_disable();
+	if (kvm_para_has_feature(KVM_FEATURE_PREEMPT_COUNT))
+		wrmsrl(MSR_KVM_PREEMPT_COUNT, 0);
 }
 
 static int kvm_cpu_online(unsigned int cpu)
-- 
2.25.1



* [PATCH RESEND 5/5] KVM: X86: Expose PREEMPT_COUNT CPUID feature bit to guest
From: Wanpeng Li @ 2022-03-25 13:58 UTC
  To: linux-kernel, kvm
  Cc: Paolo Bonzini, Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li,
	Jim Mattson, Joerg Roedel

From: Wanpeng Li <wanpengli@tencent.com>

Expose the PREEMPT_COUNT feature bit to the guest so that it can check
this feature bit before using MSR_KVM_PREEMPT_COUNT.
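
For reference, kvm_para_has_feature() boils down to a CPUID query on
the KVM features leaf, so a guest that does not use the Linux helpers
can do the check directly; a rough sketch, assuming the default KVM
CPUID base:

    u64 pa = slow_virt_to_phys(this_cpu_ptr(&__preempt_count));

    /* KVM_CPUID_FEATURES is leaf 0x40000001 at the default base. */
    if (cpuid_eax(KVM_CPUID_FEATURES) & (1 << KVM_FEATURE_PREEMPT_COUNT))
            wrmsrl(MSR_KVM_PREEMPT_COUNT, pa | KVM_MSR_ENABLED);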

Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
---
 Documentation/virt/kvm/cpuid.rst | 3 +++
 arch/x86/kvm/cpuid.c             | 3 ++-
 2 files changed, 5 insertions(+), 1 deletion(-)

diff --git a/Documentation/virt/kvm/cpuid.rst b/Documentation/virt/kvm/cpuid.rst
index bda3e3e737d7..c45158af98a7 100644
--- a/Documentation/virt/kvm/cpuid.rst
+++ b/Documentation/virt/kvm/cpuid.rst
@@ -103,6 +103,9 @@ KVM_FEATURE_HC_MAP_GPA_RANGE       16          guest checks this feature bit bef
 KVM_FEATURE_MIGRATION_CONTROL      17          guest checks this feature bit before
                                                using MSR_KVM_MIGRATION_CONTROL
 
+KVM_FEATURE_PREEMPT_COUNT          18          guest checks this feature bit before
+                                               using MSR_KVM_PREEMPT_COUNT
+
 KVM_FEATURE_CLOCKSOURCE_STABLE_BIT 24          host will warn if no guest-side
                                                per-cpu warps are expected in
                                                kvmclock
diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index 58b0b4e0263c..4785f5a63d8d 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -1071,7 +1071,8 @@ static inline int __do_cpuid_func(struct kvm_cpuid_array *array, u32 function)
 			     (1 << KVM_FEATURE_PV_SEND_IPI) |
 			     (1 << KVM_FEATURE_POLL_CONTROL) |
 			     (1 << KVM_FEATURE_PV_SCHED_YIELD) |
-			     (1 << KVM_FEATURE_ASYNC_PF_INT);
+			     (1 << KVM_FEATURE_ASYNC_PF_INT) |
+			     (1 << KVM_FEATURE_PREEMPT_COUNT);
 
 		if (sched_info_on())
 			entry->eax |= (1 << KVM_FEATURE_STEAL_TIME);
-- 
2.25.1



* Re: [PATCH RESEND 2/5] KVM: X86: Add guest interrupt disable state support
From: Sean Christopherson @ 2022-03-30  0:04 UTC
  To: Wanpeng Li
  Cc: linux-kernel, kvm, Paolo Bonzini, Vitaly Kuznetsov, Wanpeng Li,
	Jim Mattson, Joerg Roedel

On Fri, Mar 25, 2022, Wanpeng Li wrote:
> From: Wanpeng Li <wanpengli@tencent.com>
> 
> Let's record whether or not the guest has interrupts disabled.

This is missing critical information for _why_.  It took me some staring to
understand that this allows querying IRQs from a _different_ vCPU, which needs
caching on VMX due to the need to do a VMREAD.
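
(Roughly, on VMX the IF query bottoms out in a VMREAD of guest RFLAGS,
something like:

    static bool vmx_get_if_flag(struct kvm_vcpu *vcpu)
    {
            return !!(vmx_get_rflags(vcpu) & X86_EFLAGS_IF); /* VMREAD */
    }

which is only legal while this vCPU's VMCS is loaded.)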

> Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
> ---
>  arch/x86/include/asm/kvm_host.h | 1 +
>  arch/x86/kvm/x86.c              | 3 +++
>  2 files changed, 4 insertions(+)
> 
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index 50f011a7445a..8e05cbfa9827 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -861,6 +861,7 @@ struct kvm_vcpu_arch {
>  		bool preempt_count_enabled;
>  		struct gfn_to_hva_cache preempt_count_cache;
>  	} pv_pc;
> +	bool irq_disabled;

This is going to at best be confusing, and at worst lead to bugs.  The flag is
valid if and only if the vCPU is not loaded.  I don't have a clever answer, but
this needs to have some form of guard to (a) clarify when it's valid and (b) actively
prevent misuse.


* Re: [PATCH RESEND 3/5] KVM: X86: Boost vCPU which is in critical section
From: Sean Christopherson @ 2022-03-30  0:07 UTC
  To: Wanpeng Li
  Cc: linux-kernel, kvm, Paolo Bonzini, Vitaly Kuznetsov, Wanpeng Li,
	Jim Mattson, Joerg Roedel

On Fri, Mar 25, 2022, Wanpeng Li wrote:
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 425fd7f38fa9..6b300496bbd0 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -10375,6 +10375,28 @@ static int vcpu_run(struct kvm_vcpu *vcpu)
>  	return r;
>  }
>  
> +static int kvm_vcpu_non_preemptable(struct kvm_vcpu *vcpu)

s/preemptable/preemptible

And I'd recommend inverting the return, and also return a bool, i.e.

static bool kvm_vcpu_is_preemptible(struct kvm_vcpu *vcpu)

> +{
> +	int count;
> +
> +	if (!vcpu->arch.pv_pc.preempt_count_enabled)
> +		return 0;
> +
> +	if (!kvm_read_guest_cached(vcpu->kvm, &vcpu->arch.pv_pc.preempt_count_cache,
> +	    &count, sizeof(int)))
> +		return (count & ~PREEMPT_NEED_RESCHED);

This cements PREEMPT_NEED_RESCHED into KVM's guest/host ABI.  I doubt the sched
folks will be happy with that.
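
(For context: x86 folds the need-resched flag into the top bit of the
per-CPU preempt_count, with inverted sense, per
arch/x86/include/asm/preempt.h:

    #define PREEMPT_NEED_RESCHED	0x80000000

so masking it here hard-codes that scheduler-internal layout into the
guest/host interface.)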

> +
> +	return 0;
> +}
> +


* Re: [PATCH RESEND 2/5] KVM: X86: Add guest interrupt disable state support
From: Wanpeng Li @ 2022-03-30  1:17 UTC
  To: Sean Christopherson
  Cc: LKML, kvm, Paolo Bonzini, Vitaly Kuznetsov, Wanpeng Li,
	Jim Mattson, Joerg Roedel

On Wed, 30 Mar 2022 at 08:04, Sean Christopherson <seanjc@google.com> wrote:
>
> On Fri, Mar 25, 2022, Wanpeng Li wrote:
> > From: Wanpeng Li <wanpengli@tencent.com>
> >
> > Let's record whether or not the guest has interrupts disabled.
>
> This is missing critical information for _why_.  It took me some staring to
> understand that this allows querying IRQs from a _different_ vCPU, which needs
> caching on VMX due to the need to do a VMREAD.

Yes.

>
> > Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
> > ---
> >  arch/x86/include/asm/kvm_host.h | 1 +
> >  arch/x86/kvm/x86.c              | 3 +++
> >  2 files changed, 4 insertions(+)
> >
> > diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> > index 50f011a7445a..8e05cbfa9827 100644
> > --- a/arch/x86/include/asm/kvm_host.h
> > +++ b/arch/x86/include/asm/kvm_host.h
> > @@ -861,6 +861,7 @@ struct kvm_vcpu_arch {
> >               bool preempt_count_enabled;
> >               struct gfn_to_hva_cache preempt_count_cache;
> >       } pv_pc;
> > +     bool irq_disabled;
>
> This is going to at best be confusing, and at worst lead to bugs.  The flag is
> valid if and only if the vCPU is not loaded.  I don't have a clever answer, but
> this needs to have some form of guard to (a) clarify when it's valid and (b) actively
> prevent misuse.

How about renaming it to last_guest_irq_disabled and adding a comment
like /* Guest irq disabled state, valid iff the vCPU is not loaded */?
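
i.e., something like:

    /* Guest irq disabled state, valid iff the vCPU is not loaded */
    bool last_guest_irq_disabled;

and, to actively prevent misuse, perhaps a guarded accessor (sketch
only; vcpu_is_loaded() is a placeholder for whatever ends up tracking
the loaded state):

    static bool kvm_last_guest_irq_disabled(struct kvm_vcpu *vcpu)
    {
            WARN_ON_ONCE(vcpu_is_loaded(vcpu));
            return vcpu->arch.last_guest_irq_disabled;
    }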

    Wanpeng


* Re: [PATCH RESEND 3/5] KVM: X86: Boost vCPU which is in critical section
From: Wanpeng Li @ 2022-03-30  1:18 UTC
  To: Sean Christopherson
  Cc: LKML, kvm, Paolo Bonzini, Vitaly Kuznetsov, Wanpeng Li,
	Jim Mattson, Joerg Roedel

On Wed, 30 Mar 2022 at 08:07, Sean Christopherson <seanjc@google.com> wrote:
>
> On Fri, Mar 25, 2022, Wanpeng Li wrote:
> > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> > index 425fd7f38fa9..6b300496bbd0 100644
> > --- a/arch/x86/kvm/x86.c
> > +++ b/arch/x86/kvm/x86.c
> > @@ -10375,6 +10375,28 @@ static int vcpu_run(struct kvm_vcpu *vcpu)
> >       return r;
> >  }
> >
> > +static int kvm_vcpu_non_preemptable(struct kvm_vcpu *vcpu)
>
> s/preemtable/preemptible
>
> And I'd recommend inverting the return, and also return a bool, i.e.
>
> static bool kvm_vcpu_is_preemptible(struct kvm_vcpu *vcpu)

Good suggestion.

>
> > +{
> > +     int count;
> > +
> > +     if (!vcpu->arch.pv_pc.preempt_count_enabled)
> > +             return 0;
> > +
> > +     if (!kvm_read_guest_cached(vcpu->kvm, &vcpu->arch.pv_pc.preempt_count_cache,
> > +         &count, sizeof(int)))
> > +             return (count & ~PREEMPT_NEED_RESCHED);
>
> This cements PREEMPT_NEED_RESCHED into KVM's guest/host ABI.  I doubt the sched
> folks will be happy with that.
>
> > +
> > +     return 0;
> > +}
> > +


* Re: [PATCH RESEND 2/5] KVM: X86: Add guest interrupt disable state support
From: Sean Christopherson @ 2022-03-31 22:00 UTC
  To: Wanpeng Li
  Cc: LKML, kvm, Paolo Bonzini, Vitaly Kuznetsov, Wanpeng Li,
	Jim Mattson, Joerg Roedel

On Wed, Mar 30, 2022, Wanpeng Li wrote:
> On Wed, 30 Mar 2022 at 08:04, Sean Christopherson <seanjc@google.com> wrote:
> > > diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> > > index 50f011a7445a..8e05cbfa9827 100644
> > > --- a/arch/x86/include/asm/kvm_host.h
> > > +++ b/arch/x86/include/asm/kvm_host.h
> > > @@ -861,6 +861,7 @@ struct kvm_vcpu_arch {
> > >               bool preempt_count_enabled;
> > >               struct gfn_to_hva_cache preempt_count_cache;
> > >       } pv_pc;
> > > +     bool irq_disabled;
> >
> > This is going to at best be confusing, and at worst lead to bugs.  The flag is
> > valid if and only if the vCPU is not loaded.  I don't have a clever answer, but
> > this needs to have some form of guard to (a) clarify when it's valid and (b) actively
> > prevent misuse.
> 
> How about renaming it to last_guest_irq_disabled and adding a comment
> like /* Guest irq disabled state, valid iff the vCPU is not loaded */?

What about usurping vcpu->run->if_flag?  Userspace could manipulate the data, but
that should be fine since the data is already guest-controlled.
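
For reference, vcpu->run points at the kvm_run structure shared with
userspace; abridged from include/uapi/linux/kvm.h:

    struct kvm_run {
            ...
            __u8 ready_for_interrupt_injection;
            __u8 if_flag;
            ...
    };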


* Re: [PATCH RESEND 2/5] KVM: X86: Add guest interrupt disable state support
From: Wanpeng Li @ 2022-04-01  1:36 UTC
  To: Sean Christopherson
  Cc: LKML, kvm, Paolo Bonzini, Vitaly Kuznetsov, Wanpeng Li,
	Jim Mattson, Joerg Roedel

On Fri, 1 Apr 2022 at 06:00, Sean Christopherson <seanjc@google.com> wrote:
>
> On Wed, Mar 30, 2022, Wanpeng Li wrote:
> > On Wed, 30 Mar 2022 at 08:04, Sean Christopherson <seanjc@google.com> wrote:
> > > > diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> > > > index 50f011a7445a..8e05cbfa9827 100644
> > > > --- a/arch/x86/include/asm/kvm_host.h
> > > > +++ b/arch/x86/include/asm/kvm_host.h
> > > > @@ -861,6 +861,7 @@ struct kvm_vcpu_arch {
> > > >               bool preempt_count_enabled;
> > > >               struct gfn_to_hva_cache preempt_count_cache;
> > > >       } pv_pc;
> > > > +     bool irq_disabled;
> > >
> > > This is going to at best be confusing, and at worst lead to bugs.  The flag is
> > > valid if and only if the vCPU is not loaded.  I don't have a clever answer, but
> > > this needs to have some form of guard to (a) clarify when it's valid and (b) actively
> > > prevent misuse.
> >
> > How about renaming it to last_guest_irq_disabled and adding a comment
> > like /* Guest irq disabled state, valid iff the vCPU is not loaded */?
>
> What about usurping vcpu->run->if_flag?  Userspace could manipulate the data, but
> that should be fine since the data is already guest-controlled.

For the purposes of this patch we would at least need to update
vcpu->run->if_flag when the vCPU is scheduled out, and I think that
would look strange for vcpu->run->if_flag.

    Wanpeng

