linux-kernel.vger.kernel.org archive mirror
* [PATCH v3 0/6] KVM: X86: Implement PV IPIs support
@ 2018-07-03  6:21 Wanpeng Li
  2018-07-03  6:21 ` [PATCH v3 1/6] KVM: X86: Add kvm hypervisor init time platform setup callback Wanpeng Li
                   ` (6 more replies)
  0 siblings, 7 replies; 27+ messages in thread
From: Wanpeng Li @ 2018-07-03  6:21 UTC (permalink / raw)
  To: linux-kernel, kvm
  Cc: Paolo Bonzini, Radim Krčmář, Vitaly Kuznetsov

Use a hypercall to send multicast IPIs with a single vmexit, instead of
one vmexit per IPI in xAPIC/x2APIC physical mode and one vmexit per
cluster in x2APIC cluster mode. An Intel guest can enter x2apic cluster
mode when interrupt remapping is enabled in qemu; however, the latest
AMD EPYC still supports only xapic mode, which therefore gains the most
from PV IPIs. This patchset supports PV IPIs for VMs with up to 128
vCPUs, which is big enough for current cloud environments; supporting
more vCPUs needs more complex logic, so this might be extended in the
future if needed.
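
In guest terms the interface is a single hypercall with two 64-bit
destination bitmaps covering APIC ids 0-127 (patch 2/6 has the real
code):

	kvm_hypercall3(KVM_HC_SEND_IPI,
		       ipi_bitmap_low,   /* APIC ids 0..63 */
		       ipi_bitmap_high,  /* APIC ids 64..127 */
		       vector);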

Hardware: Xeon Skylake 2.5GHz, 2 sockets, 40 cores, 80 threads; the VM
has 80 vCPUs. IPI microbenchmark (https://lkml.org/lkml/2017/12/19/141):

x2apic cluster mode, vanilla

 Dry-run:                         0,            2392199 ns
 Self-IPI:                  6907514,           15027589 ns
 Normal IPI:              223910476,          251301666 ns
 Broadcast IPI:                   0,         9282161150 ns
 Broadcast lock:                  0,         8812934104 ns

x2apic cluster mode, pv-ipi 

 Dry-run:                         0,            2449341 ns
 Self-IPI:                  6720360,           15028732 ns
 Normal IPI:              228643307,          255708477 ns
 Broadcast IPI:                   0,         7572293590 ns  => 22% performance boost 
 Broadcast lock:                  0,         8316124651 ns

x2apic physical mode, vanilla

 Dry-run:                         0,            3135933 ns
 Self-IPI:                  8572670,           17901757 ns
 Normal IPI:              226444334,          255421709 ns
 Broadcast IPI:                   0,        19845070887 ns
 Broadcast lock:                  0,        19827383656 ns

x2apic physical mode, pv-ipi

 Dry-run:                         0,            2446381 ns
 Self-IPI:                  6788217,           15021056 ns
 Normal IPI:              219454441,          249583458 ns
 Broadcast IPI:                   0,         7806540019 ns  => 154% performance boost 
 Broadcast lock:                  0,         9143618799 ns

v2 -> v3:
 * rename ipi_mask_done to irq_restore_exit; __send_ipi_mask returns
   int instead of bool
 * fix build errors reported by 0day
 * split patches, no functional change

v1 -> v2:
 * fall back to the original apic hooks on sparse APIC id > 128 or any other error
 * have two bitmask arguments so that one hypercall handles 128 vCPUs 
 * fix KVM_FEATURE_PV_SEND_IPI doc
 * document hypercall
 * fix NMI selftest fails
 * fix build errors reported by 0day

Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Vitaly Kuznetsov <vkuznets@redhat.com>

Wanpeng Li (6):
  KVM: X86: Add kvm hypervisor init time platform setup callback
  KVM: X86: Implement PV IPIs in linux guest
  KVM: X86: Fallback to original apic hooks when bad happens
  KVM: X86: Implement PV IPIs send hypercall
  KVM: X86: Add NMI support to PV IPIs
  KVM: X86: Expose PV_SEND_IPI CPUID feature bit to guest

 Documentation/virtual/kvm/cpuid.txt      |   4 ++
 Documentation/virtual/kvm/hypercalls.txt |   6 ++
 arch/x86/include/uapi/asm/kvm_para.h     |   1 +
 arch/x86/kernel/kvm.c                    | 101 +++++++++++++++++++++++++++++++
 arch/x86/kvm/cpuid.c                     |   3 +-
 arch/x86/kvm/x86.c                       |  42 +++++++++++++
 include/uapi/linux/kvm_para.h            |   1 +
 7 files changed, 157 insertions(+), 1 deletion(-)

-- 
2.7.4


^ permalink raw reply	[flat|nested] 27+ messages in thread

* [PATCH v3 1/6] KVM: X86: Add kvm hypervisor init time platform setup callback
  2018-07-03  6:21 [PATCH v3 0/6] KVM: X86: Implement PV IPIs support Wanpeng Li
@ 2018-07-03  6:21 ` Wanpeng Li
  2018-07-03  6:21 ` [PATCH v3 2/6] KVM: X86: Implement PV IPIs in linux guest Wanpeng Li
                   ` (5 subsequent siblings)
  6 siblings, 0 replies; 27+ messages in thread
From: Wanpeng Li @ 2018-07-03  6:21 UTC (permalink / raw)
  To: linux-kernel, kvm
  Cc: Paolo Bonzini, Radim Krčmář, Vitaly Kuznetsov

From: Wanpeng Li <wanpengli@tencent.com>

Add a kvm hypervisor init-time platform setup callback which
will be used to replace the native apic hooks with paravirtual
hooks.

Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
---
 arch/x86/kernel/kvm.c | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
index 5b2300b..591bcf2 100644
--- a/arch/x86/kernel/kvm.c
+++ b/arch/x86/kernel/kvm.c
@@ -624,12 +624,22 @@ static uint32_t __init kvm_detect(void)
 	return kvm_cpuid_base();
 }
 
+static void __init kvm_apic_init(void)
+{
+}
+
+static void __init kvm_init_platform(void)
+{
+	x86_platform.apic_post_init = kvm_apic_init;
+}
+
 const __initconst struct hypervisor_x86 x86_hyper_kvm = {
 	.name			= "KVM",
 	.detect			= kvm_detect,
 	.type			= X86_HYPER_KVM,
 	.init.guest_late_init	= kvm_guest_init,
 	.init.x2apic_available	= kvm_para_available,
+	.init.init_platform	= kvm_init_platform,
 };
 
 static __init int activate_jump_labels(void)
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH v3 2/6] KVM: X86: Implement PV IPIs in linux guest
  2018-07-03  6:21 [PATCH v3 0/6] KVM: X86: Implement PV IPIs support Wanpeng Li
  2018-07-03  6:21 ` [PATCH v3 1/6] KVM: X86: Add kvm hypervisor init time platform setup callback Wanpeng Li
@ 2018-07-03  6:21 ` Wanpeng Li
  2018-07-19 16:28   ` Radim Krčmář
  2018-07-19 23:05   ` David Matlack
  2018-07-03  6:21 ` [PATCH v3 3/6] KVM: X86: Fallback to original apic hooks when bad happens Wanpeng Li
                   ` (4 subsequent siblings)
  6 siblings, 2 replies; 27+ messages in thread
From: Wanpeng Li @ 2018-07-03  6:21 UTC (permalink / raw)
  To: linux-kernel, kvm
  Cc: Paolo Bonzini, Radim Krčmář, Vitaly Kuznetsov

From: Wanpeng Li <wanpengli@tencent.com>

Implement paravirtual apic hooks to enable PV IPIs.

apic->send_IPI_mask
apic->send_IPI_mask_allbutself
apic->send_IPI_allbutself
apic->send_IPI_all

PV IPIs support VMs with up to 128 vCPUs, which is big enough for
current cloud environments; supporting more vCPUs needs more complex
logic, so this might be extended in the future if needed.

Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
---
 arch/x86/include/uapi/asm/kvm_para.h |  1 +
 arch/x86/kernel/kvm.c                | 70 ++++++++++++++++++++++++++++++++++++
 include/uapi/linux/kvm_para.h        |  1 +
 3 files changed, 72 insertions(+)

diff --git a/arch/x86/include/uapi/asm/kvm_para.h b/arch/x86/include/uapi/asm/kvm_para.h
index 0ede697..19980ec 100644
--- a/arch/x86/include/uapi/asm/kvm_para.h
+++ b/arch/x86/include/uapi/asm/kvm_para.h
@@ -28,6 +28,7 @@
 #define KVM_FEATURE_PV_UNHALT		7
 #define KVM_FEATURE_PV_TLB_FLUSH	9
 #define KVM_FEATURE_ASYNC_PF_VMEXIT	10
+#define KVM_FEATURE_PV_SEND_IPI	11
 
 #define KVM_HINTS_REALTIME      0
 
diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
index 591bcf2..2fe1420 100644
--- a/arch/x86/kernel/kvm.c
+++ b/arch/x86/kernel/kvm.c
@@ -454,6 +454,71 @@ static void __init sev_map_percpu_data(void)
 }
 
 #ifdef CONFIG_SMP
+
+#ifdef CONFIG_X86_64
+static void __send_ipi_mask(const struct cpumask *mask, int vector)
+{
+	unsigned long flags, ipi_bitmap_low = 0, ipi_bitmap_high = 0;
+	int cpu, apic_id;
+
+	if (cpumask_empty(mask))
+		return;
+
+	local_irq_save(flags);
+
+	for_each_cpu(cpu, mask) {
+		apic_id = per_cpu(x86_cpu_to_apicid, cpu);
+		if (apic_id < BITS_PER_LONG)
+			__set_bit(apic_id, &ipi_bitmap_low);
+		else if (apic_id < 2 * BITS_PER_LONG)
+			__set_bit(apic_id - BITS_PER_LONG, &ipi_bitmap_high);
+	}
+
+	kvm_hypercall3(KVM_HC_SEND_IPI, ipi_bitmap_low, ipi_bitmap_high, vector);
+
+	local_irq_restore(flags);
+}
+
+static void kvm_send_ipi_mask(const struct cpumask *mask, int vector)
+{
+	__send_ipi_mask(mask, vector);
+}
+
+static void kvm_send_ipi_mask_allbutself(const struct cpumask *mask, int vector)
+{
+	unsigned int this_cpu = smp_processor_id();
+	struct cpumask new_mask;
+	const struct cpumask *local_mask;
+
+	cpumask_copy(&new_mask, mask);
+	cpumask_clear_cpu(this_cpu, &new_mask);
+	local_mask = &new_mask;
+	__send_ipi_mask(local_mask, vector);
+}
+
+static void kvm_send_ipi_allbutself(int vector)
+{
+	kvm_send_ipi_mask_allbutself(cpu_online_mask, vector);
+}
+
+static void kvm_send_ipi_all(int vector)
+{
+	__send_ipi_mask(cpu_online_mask, vector);
+}
+
+/*
+ * Set the IPI entry points
+ */
+static void kvm_setup_pv_ipi(void)
+{
+	apic->send_IPI_mask = kvm_send_ipi_mask;
+	apic->send_IPI_mask_allbutself = kvm_send_ipi_mask_allbutself;
+	apic->send_IPI_allbutself = kvm_send_ipi_allbutself;
+	apic->send_IPI_all = kvm_send_ipi_all;
+	pr_info("KVM setup pv IPIs\n");
+}
+#endif
+
 static void __init kvm_smp_prepare_cpus(unsigned int max_cpus)
 {
 	native_smp_prepare_cpus(max_cpus);
@@ -626,6 +691,11 @@ static uint32_t __init kvm_detect(void)
 
 static void __init kvm_apic_init(void)
 {
+#if defined(CONFIG_SMP) && defined(CONFIG_X86_64)
+	if (kvm_para_has_feature(KVM_FEATURE_PV_SEND_IPI) &&
+		num_possible_cpus() <= 2 * BITS_PER_LONG)
+		kvm_setup_pv_ipi();
+#endif
 }
 
 static void __init kvm_init_platform(void)
diff --git a/include/uapi/linux/kvm_para.h b/include/uapi/linux/kvm_para.h
index dcf629d..84f8fe3 100644
--- a/include/uapi/linux/kvm_para.h
+++ b/include/uapi/linux/kvm_para.h
@@ -26,6 +26,7 @@
 #define KVM_HC_MIPS_EXIT_VM		7
 #define KVM_HC_MIPS_CONSOLE_OUTPUT	8
 #define KVM_HC_CLOCK_PAIRING		9
+#define KVM_HC_SEND_IPI 		10
 
 /*
  * hypercalls use architecture specific
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH v3 3/6] KVM: X86: Fallback to original apic hooks when bad happens
  2018-07-03  6:21 [PATCH v3 0/6] KVM: X86: Implement PV IPIs support Wanpeng Li
  2018-07-03  6:21 ` [PATCH v3 1/6] KVM: X86: Add kvm hypervisor init time platform setup callback Wanpeng Li
  2018-07-03  6:21 ` [PATCH v3 2/6] KVM: X86: Implement PV IPIs in linux guest Wanpeng Li
@ 2018-07-03  6:21 ` Wanpeng Li
  2018-07-03  6:21 ` [PATCH v3 4/6] KVM: X86: Implement PV IPIs send hypercall Wanpeng Li
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 27+ messages in thread
From: Wanpeng Li @ 2018-07-03  6:21 UTC (permalink / raw)
  To: linux-kernel, kvm
  Cc: Paolo Bonzini, Radim Krčmář, Vitaly Kuznetsov

From: Wanpeng Li <wanpengli@tencent.com>

Fall back to the original apic hooks when an APIC id is sparse and
larger than 128, or when kvm fails to add the pending IRQ to the lapic.

Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
---
 arch/x86/kernel/kvm.c | 26 +++++++++++++++++++-------
 1 file changed, 19 insertions(+), 7 deletions(-)

diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
index 2fe1420..81be75b 100644
--- a/arch/x86/kernel/kvm.c
+++ b/arch/x86/kernel/kvm.c
@@ -47,6 +47,7 @@
 #include <asm/hypervisor.h>
 #include <asm/kvm_guest.h>
 
+static struct apic orig_apic;
 static int kvmapf = 1;
 
 static int __init parse_no_kvmapf(char *arg)
@@ -456,13 +457,13 @@ static void __init sev_map_percpu_data(void)
 #ifdef CONFIG_SMP
 
 #ifdef CONFIG_X86_64
-static void __send_ipi_mask(const struct cpumask *mask, int vector)
+static int __send_ipi_mask(const struct cpumask *mask, int vector)
 {
 	unsigned long flags, ipi_bitmap_low = 0, ipi_bitmap_high = 0;
-	int cpu, apic_id;
+	int cpu, apic_id, ret = 0;
 
 	if (cpumask_empty(mask))
-		return;
+		return 0;
 
 	local_irq_save(flags);
 
@@ -472,16 +473,23 @@ static void __send_ipi_mask(const struct cpumask *mask, int vector)
 			__set_bit(apic_id, &ipi_bitmap_low);
 		else if (apic_id < 2 * BITS_PER_LONG)
 			__set_bit(apic_id - BITS_PER_LONG, &ipi_bitmap_high);
+		else {
+			ret = -EFAULT;
+			goto irq_restore_exit;
+		}
 	}
 
-	kvm_hypercall3(KVM_HC_SEND_IPI, ipi_bitmap_low, ipi_bitmap_high, vector);
+	ret = kvm_hypercall3(KVM_HC_SEND_IPI, ipi_bitmap_low, ipi_bitmap_high, vector);
 
+irq_restore_exit:
 	local_irq_restore(flags);
+	return ret;
 }
 
 static void kvm_send_ipi_mask(const struct cpumask *mask, int vector)
 {
-	__send_ipi_mask(mask, vector);
+	if (__send_ipi_mask(mask, vector))
+		orig_apic.send_IPI_mask(mask, vector);
 }
 
 static void kvm_send_ipi_mask_allbutself(const struct cpumask *mask, int vector)
@@ -493,7 +501,8 @@ static void kvm_send_ipi_mask_allbutself(const struct cpumask *mask, int vector)
 	cpumask_copy(&new_mask, mask);
 	cpumask_clear_cpu(this_cpu, &new_mask);
 	local_mask = &new_mask;
-	__send_ipi_mask(local_mask, vector);
+	if (__send_ipi_mask(local_mask, vector))
+		orig_apic.send_IPI_mask_allbutself(mask, vector);
 }
 
 static void kvm_send_ipi_allbutself(int vector)
@@ -503,7 +512,8 @@ static void kvm_send_ipi_allbutself(int vector)
 
 static void kvm_send_ipi_all(int vector)
 {
-	__send_ipi_mask(cpu_online_mask, vector);
+	if (__send_ipi_mask(cpu_online_mask, vector))
+		orig_apic.send_IPI_all(vector);
 }
 
 /*
@@ -511,6 +521,8 @@ static void kvm_send_ipi_all(int vector)
  */
 static void kvm_setup_pv_ipi(void)
 {
+	orig_apic = *apic;
+
 	apic->send_IPI_mask = kvm_send_ipi_mask;
 	apic->send_IPI_mask_allbutself = kvm_send_ipi_mask_allbutself;
 	apic->send_IPI_allbutself = kvm_send_ipi_allbutself;
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH v3 4/6] KVM: X86: Implement PV IPIs send hypercall
  2018-07-03  6:21 [PATCH v3 0/6] KVM: X86: Implement PV IPIs support Wanpeng Li
                   ` (2 preceding siblings ...)
  2018-07-03  6:21 ` [PATCH v3 3/6] KVM: X86: Fallback to original apic hooks when bad happens Wanpeng Li
@ 2018-07-03  6:21 ` Wanpeng Li
  2018-07-19 16:47   ` Paolo Bonzini
  2018-07-03  6:21 ` [PATCH v3 5/6] KVM: X86: Add NMI support to PV IPIs Wanpeng Li
                   ` (2 subsequent siblings)
  6 siblings, 1 reply; 27+ messages in thread
From: Wanpeng Li @ 2018-07-03  6:21 UTC (permalink / raw)
  To: linux-kernel, kvm
  Cc: Paolo Bonzini, Radim Krčmář, Vitaly Kuznetsov

From: Wanpeng Li <wanpengli@tencent.com>

Use a hypercall to send multicast IPIs with a single vmexit, instead of
one vmexit per IPI in xAPIC/x2APIC physical mode and one vmexit per
cluster in x2APIC cluster mode. An Intel guest can enter x2apic cluster
mode when interrupt remapping is enabled in qemu; however, the latest
AMD EPYC still supports only xapic mode, which therefore gains the most
from PV IPIs.

Hardware: Xeon Skylake 2.5GHz, 2 sockets, 40 cores, 80 threads; the VM
has 80 vCPUs. IPI microbenchmark (https://lkml.org/lkml/2017/12/19/141):

x2apic cluster mode, vanilla

 Dry-run:                         0,            2392199 ns
 Self-IPI:                  6907514,           15027589 ns
 Normal IPI:              223910476,          251301666 ns
 Broadcast IPI:                   0,         9282161150 ns
 Broadcast lock:                  0,         8812934104 ns

x2apic cluster mode, pv-ipi 

 Dry-run:                         0,            2449341 ns
 Self-IPI:                  6720360,           15028732 ns
 Normal IPI:              228643307,          255708477 ns
 Broadcast IPI:                   0,         7572293590 ns  => 22% performance boost 
 Broadcast lock:                  0,         8316124651 ns

x2apic physical mode, vanilla

 Dry-run:                         0,            3135933 ns
 Self-IPI:                  8572670,           17901757 ns
 Normal IPI:              226444334,          255421709 ns
 Broadcast IPI:                   0,        19845070887 ns
 Broadcast lock:                  0,        19827383656 ns

x2apic physical mode, pv-ipi

 Dry-run:                         0,            2446381 ns
 Self-IPI:                  6788217,           15021056 ns
 Normal IPI:              219454441,          249583458 ns
 Broadcast IPI:                   0,         7806540019 ns  => 154% performance boost 
 Broadcast lock:                  0,         9143618799 ns

Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
---
 Documentation/virtual/kvm/hypercalls.txt |  6 ++++++
 arch/x86/kvm/x86.c                       | 36 ++++++++++++++++++++++++++++++++
 2 files changed, 42 insertions(+)

diff --git a/Documentation/virtual/kvm/hypercalls.txt b/Documentation/virtual/kvm/hypercalls.txt
index a890529..a771ee8 100644
--- a/Documentation/virtual/kvm/hypercalls.txt
+++ b/Documentation/virtual/kvm/hypercalls.txt
@@ -121,3 +121,9 @@ compute the CLOCK_REALTIME for its clock, at the same instant.
 
 Returns KVM_EOPNOTSUPP if the host does not use TSC clocksource,
 or if clock type is different than KVM_CLOCK_PAIRING_WALLCLOCK.
+
+6. KVM_HC_SEND_IPI
+------------------------
+Architecture: x86
+Status: active
+Purpose: Hypercall used to send IPIs.
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 0046aa7..46657ef 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -6689,6 +6689,39 @@ static void kvm_pv_kick_cpu_op(struct kvm *kvm, unsigned long flags, int apicid)
 	kvm_irq_delivery_to_apic(kvm, NULL, &lapic_irq, NULL);
 }
 
+/*
+ * Return 0 if successfully added and 1 if discarded.
+ */
+static int kvm_pv_send_ipi(struct kvm *kvm, unsigned long ipi_bitmap_low,
+			unsigned long ipi_bitmap_high, int vector)
+{
+	int i;
+	struct kvm_apic_map *map;
+	struct kvm_vcpu *vcpu;
+	struct kvm_lapic_irq irq = {
+		.delivery_mode = APIC_DM_FIXED,
+		.vector = vector,
+	};
+
+	rcu_read_lock();
+	map = rcu_dereference(kvm->arch.apic_map);
+
+	for_each_set_bit(i, &ipi_bitmap_low, BITS_PER_LONG) {
+		vcpu = map->phys_map[i]->vcpu;
+		if (!kvm_apic_set_irq(vcpu, &irq, NULL))
+			return 1;
+	}
+
+	for_each_set_bit(i, &ipi_bitmap_high, BITS_PER_LONG) {
+		vcpu = map->phys_map[i + BITS_PER_LONG]->vcpu;
+		if (!kvm_apic_set_irq(vcpu, &irq, NULL))
+			return 1;
+	}
+
+	rcu_read_unlock();
+	return 0;
+}
+
 void kvm_vcpu_deactivate_apicv(struct kvm_vcpu *vcpu)
 {
 	vcpu->arch.apicv_active = false;
@@ -6737,6 +6770,9 @@ int kvm_emulate_hypercall(struct kvm_vcpu *vcpu)
 	case KVM_HC_CLOCK_PAIRING:
 		ret = kvm_pv_clock_pairing(vcpu, a0, a1);
 		break;
+	case KVM_HC_SEND_IPI:
+		ret = kvm_pv_send_ipi(vcpu->kvm, a0, a1, a2);
+		break;
 #endif
 	default:
 		ret = -KVM_ENOSYS;
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH v3 5/6] KVM: X86: Add NMI support to PV IPIs
  2018-07-03  6:21 [PATCH v3 0/6] KVM: X86: Implement PV IPIs support Wanpeng Li
                   ` (3 preceding siblings ...)
  2018-07-03  6:21 ` [PATCH v3 4/6] KVM: X86: Implement PV IPIs send hypercall Wanpeng Li
@ 2018-07-03  6:21 ` Wanpeng Li
  2018-07-19 16:31   ` Radim Krčmář
  2018-07-03  6:21 ` [PATCH v3 6/6] KVM: X86: Expose PV_SEND_IPI CPUID feature bit to guest Wanpeng Li
  2018-07-18  3:00 ` [PATCH v3 0/6] KVM: X86: Implement PV IPIs support Wanpeng Li
  6 siblings, 1 reply; 27+ messages in thread
From: Wanpeng Li @ 2018-07-03  6:21 UTC (permalink / raw)
  To: linux-kernel, kvm
  Cc: Paolo Bonzini, Radim Krčmář, Vitaly Kuznetsov

From: Wanpeng Li <wanpengli@tencent.com>

The NMI delivery mode of ICR is used to deliver an NMI to the processor, 
and the vector information is ignored.

Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
---
 arch/x86/kernel/kvm.c | 13 +++++++++++--
 arch/x86/kvm/x86.c    | 16 +++++++++++-----
 2 files changed, 22 insertions(+), 7 deletions(-)

diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
index 81be75b..b866518 100644
--- a/arch/x86/kernel/kvm.c
+++ b/arch/x86/kernel/kvm.c
@@ -459,7 +459,7 @@ static void __init sev_map_percpu_data(void)
 #ifdef CONFIG_X86_64
 static int __send_ipi_mask(const struct cpumask *mask, int vector)
 {
-	unsigned long flags, ipi_bitmap_low = 0, ipi_bitmap_high = 0;
+	unsigned long flags, ipi_bitmap_low = 0, ipi_bitmap_high = 0, icr = 0;
 	int cpu, apic_id, ret = 0;
 
 	if (cpumask_empty(mask))
@@ -479,7 +479,16 @@ static int __send_ipi_mask(const struct cpumask *mask, int vector)
 		}
 	}
 
-	ret = kvm_hypercall3(KVM_HC_SEND_IPI, ipi_bitmap_low, ipi_bitmap_high, vector);
+	switch (vector) {
+	default:
+		icr = APIC_DM_FIXED | vector;
+		break;
+	case NMI_VECTOR:
+		icr = APIC_DM_NMI;
+		break;
+	}
+
+	ret = kvm_hypercall3(KVM_HC_SEND_IPI, ipi_bitmap_low, ipi_bitmap_high, icr);
 
 irq_restore_exit:
 	local_irq_restore(flags);
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 46657ef..2e4c4d7 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -6693,15 +6693,21 @@ static void kvm_pv_kick_cpu_op(struct kvm *kvm, unsigned long flags, int apicid)
  * Return 0 if successfully added and 1 if discarded.
  */
 static int kvm_pv_send_ipi(struct kvm *kvm, unsigned long ipi_bitmap_low,
-			unsigned long ipi_bitmap_high, int vector)
+			unsigned long ipi_bitmap_high, unsigned long icr)
 {
 	int i;
 	struct kvm_apic_map *map;
 	struct kvm_vcpu *vcpu;
-	struct kvm_lapic_irq irq = {
-		.delivery_mode = APIC_DM_FIXED,
-		.vector = vector,
-	};
+	struct kvm_lapic_irq irq = {0};
+
+	switch (icr & APIC_VECTOR_MASK) {
+	default:
+		irq.vector = icr & APIC_VECTOR_MASK;
+		break;
+	case NMI_VECTOR:
+		break;
+	}
+	irq.delivery_mode = icr & APIC_MODE_MASK;
 
 	rcu_read_lock();
 	map = rcu_dereference(kvm->arch.apic_map);
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH v3 6/6] KVM: X86: Expose PV_SEND_IPI CPUID feature bit to guest
  2018-07-03  6:21 [PATCH v3 0/6] KVM: X86: Implement PV IPIs support Wanpeng Li
                   ` (4 preceding siblings ...)
  2018-07-03  6:21 ` [PATCH v3 5/6] KVM: X86: Add NMI support to PV IPIs Wanpeng Li
@ 2018-07-03  6:21 ` Wanpeng Li
  2018-07-18  3:00 ` [PATCH v3 0/6] KVM: X86: Implement PV IPIs support Wanpeng Li
  6 siblings, 0 replies; 27+ messages in thread
From: Wanpeng Li @ 2018-07-03  6:21 UTC (permalink / raw)
  To: linux-kernel, kvm
  Cc: Paolo Bonzini, Radim Krčmář, Vitaly Kuznetsov

From: Wanpeng Li <wanpengli@tencent.com>

Expose the PV_SEND_IPI feature bit to the guest, so that the guest can
check for this feature before using paravirtualized IPIs.

Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
---
 Documentation/virtual/kvm/cpuid.txt | 4 ++++
 arch/x86/kvm/cpuid.c                | 3 ++-
 2 files changed, 6 insertions(+), 1 deletion(-)

diff --git a/Documentation/virtual/kvm/cpuid.txt b/Documentation/virtual/kvm/cpuid.txt
index ab022dc..97ca194 100644
--- a/Documentation/virtual/kvm/cpuid.txt
+++ b/Documentation/virtual/kvm/cpuid.txt
@@ -62,6 +62,10 @@ KVM_FEATURE_ASYNC_PF_VMEXIT        ||    10 || paravirtualized async PF VM exit
                                    ||       || can be enabled by setting bit 2
                                    ||       || when writing to msr 0x4b564d02
 ------------------------------------------------------------------------------
+KVM_FEATURE_PV_SEND_IPI            ||    11 || guest checks this feature bit
+                                   ||       || before using paravirtualized
+                                   ||       || send IPIs.
+------------------------------------------------------------------------------
 KVM_FEATURE_CLOCKSOURCE_STABLE_BIT ||    24 || host will warn if no guest-side
                                    ||       || per-cpu warps are expected in
                                    ||       || kvmclock.
diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index 7e042e3..7bcfa61 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -621,7 +621,8 @@ static inline int __do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function,
 			     (1 << KVM_FEATURE_CLOCKSOURCE_STABLE_BIT) |
 			     (1 << KVM_FEATURE_PV_UNHALT) |
 			     (1 << KVM_FEATURE_PV_TLB_FLUSH) |
-			     (1 << KVM_FEATURE_ASYNC_PF_VMEXIT);
+			     (1 << KVM_FEATURE_ASYNC_PF_VMEXIT) |
+			     (1 << KVM_FEATURE_PV_SEND_IPI);
 
 		if (sched_info_on())
 			entry->eax |= (1 << KVM_FEATURE_STEAL_TIME);
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* Re: [PATCH v3 0/6] KVM: X86: Implement PV IPIs support
  2018-07-03  6:21 [PATCH v3 0/6] KVM: X86: Implement PV IPIs support Wanpeng Li
                   ` (5 preceding siblings ...)
  2018-07-03  6:21 ` [PATCH v3 6/6] KVM: X86: Expose PV_SEND_IPI CPUID feature bit to guest Wanpeng Li
@ 2018-07-18  3:00 ` Wanpeng Li
  6 siblings, 0 replies; 27+ messages in thread
From: Wanpeng Li @ 2018-07-18  3:00 UTC (permalink / raw)
  To: LKML, kvm; +Cc: Paolo Bonzini, Radim Krcmar, Vitaly Kuznetsov

Gentle ping; I hope this series can catch the next merge window. :)

On Tue, 3 Jul 2018 at 14:21, Wanpeng Li <kernellwp@gmail.com> wrote:
>
> Use a hypercall to send multicast IPIs with a single vmexit, instead of
> one vmexit per IPI in xAPIC/x2APIC physical mode and one vmexit per
> cluster in x2APIC cluster mode. An Intel guest can enter x2apic cluster
> mode when interrupt remapping is enabled in qemu; however, the latest
> AMD EPYC still supports only xapic mode, which therefore gains the most
> from PV IPIs. This patchset supports PV IPIs for VMs with up to 128
> vCPUs, which is big enough for current cloud environments; supporting
> more vCPUs needs more complex logic, so this might be extended in the
> future if needed.
>
> Hardware: Xeon Skylake 2.5GHz, 2 sockets, 40 cores, 80 threads; the VM
> has 80 vCPUs. IPI microbenchmark (https://lkml.org/lkml/2017/12/19/141):
>
> x2apic cluster mode, vanilla
>
>  Dry-run:                         0,            2392199 ns
>  Self-IPI:                  6907514,           15027589 ns
>  Normal IPI:              223910476,          251301666 ns
>  Broadcast IPI:                   0,         9282161150 ns
>  Broadcast lock:                  0,         8812934104 ns
>
> x2apic cluster mode, pv-ipi
>
>  Dry-run:                         0,            2449341 ns
>  Self-IPI:                  6720360,           15028732 ns
>  Normal IPI:              228643307,          255708477 ns
>  Broadcast IPI:                   0,         7572293590 ns  => 22% performance boost
>  Broadcast lock:                  0,         8316124651 ns
>
> x2apic physical mode, vanilla
>
>  Dry-run:                         0,            3135933 ns
>  Self-IPI:                  8572670,           17901757 ns
>  Normal IPI:              226444334,          255421709 ns
>  Broadcast IPI:                   0,        19845070887 ns
>  Broadcast lock:                  0,        19827383656 ns
>
> x2apic physical mode, pv-ipi
>
>  Dry-run:                         0,            2446381 ns
>  Self-IPI:                  6788217,           15021056 ns
>  Normal IPI:              219454441,          249583458 ns
>  Broadcast IPI:                   0,         7806540019 ns  => 154% performance boost
>  Broadcast lock:                  0,         9143618799 ns
>
> v2 -> v3:
>  * rename ipi_mask_done to irq_restore_exit; __send_ipi_mask returns
>    int instead of bool
>  * fix build errors reported by 0day
>  * split patches, no functional change
>
> v1 -> v2:
>  * fall back to the original apic hooks on sparse APIC id > 128 or any other error
>  * have two bitmask arguments so that one hypercall handles 128 vCPUs
>  * fix KVM_FEATURE_PV_SEND_IPI doc
>  * document hypercall
>  * fix NMI selftest fails
>  * fix build errors reported by 0day
>
> Cc: Paolo Bonzini <pbonzini@redhat.com>
> Cc: Radim Krčmář <rkrcmar@redhat.com>
> Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
>
> Wanpeng Li (6):
>   KVM: X86: Add kvm hypervisor init time platform setup callback
>   KVM: X86: Implement PV IPIs in linux guest
>   KVM: X86: Fallback to original apic hooks when bad happens
>   KVM: X86: Implement PV IPIs send hypercall
>   KVM: X86: Add NMI support to PV IPIs
>   KVM: X86: Expose PV_SEND_IPI CPUID feature bit to guest
>
>  Documentation/virtual/kvm/cpuid.txt      |   4 ++
>  Documentation/virtual/kvm/hypercalls.txt |   6 ++
>  arch/x86/include/uapi/asm/kvm_para.h     |   1 +
>  arch/x86/kernel/kvm.c                    | 101 +++++++++++++++++++++++++++++++
>  arch/x86/kvm/cpuid.c                     |   3 +-
>  arch/x86/kvm/x86.c                       |  42 +++++++++++++
>  include/uapi/linux/kvm_para.h            |   1 +
>  7 files changed, 157 insertions(+), 1 deletion(-)
>
> --
> 2.7.4
>

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v3 2/6] KVM: X86: Implement PV IPIs in linux guest
  2018-07-03  6:21 ` [PATCH v3 2/6] KVM: X86: Implement PV IPIs in linux guest Wanpeng Li
@ 2018-07-19 16:28   ` Radim Krčmář
  2018-07-19 16:47     ` Paolo Bonzini
  2018-07-20  3:33     ` Wanpeng Li
  2018-07-19 23:05   ` David Matlack
  1 sibling, 2 replies; 27+ messages in thread
From: Radim Krčmář @ 2018-07-19 16:28 UTC (permalink / raw)
  To: Wanpeng Li; +Cc: linux-kernel, kvm, Paolo Bonzini, Vitaly Kuznetsov

2018-07-03 14:21+0800, Wanpeng Li:
> From: Wanpeng Li <wanpengli@tencent.com>
> 
> Implement paravirtual apic hooks to enable PV IPIs.
> 
> apic->send_IPI_mask
> apic->send_IPI_mask_allbutself
> apic->send_IPI_allbutself
> apic->send_IPI_all
> 
> PV IPIs support VMs with up to 128 vCPUs, which is big enough for
> current cloud environments; supporting more vCPUs needs more complex
> logic, so this might be extended in the future if needed.
> 
> Cc: Paolo Bonzini <pbonzini@redhat.com>
> Cc: Radim Krčmář <rkrcmar@redhat.com>
> Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
> Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
> ---
> diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
> @@ -454,6 +454,71 @@ static void __init sev_map_percpu_data(void)
>  }
>  
>  #ifdef CONFIG_SMP
> +
> +#ifdef CONFIG_X86_64
> +static void __send_ipi_mask(const struct cpumask *mask, int vector)
> +{
> +	unsigned long flags, ipi_bitmap_low = 0, ipi_bitmap_high = 0;
> +	int cpu, apic_id;
> +
> +	if (cpumask_empty(mask))
> +		return;
> +
> +	local_irq_save(flags);
> +
> +	for_each_cpu(cpu, mask) {
> +		apic_id = per_cpu(x86_cpu_to_apicid, cpu);
> +		if (apic_id < BITS_PER_LONG)
> +			__set_bit(apic_id, &ipi_bitmap_low);
> +		else if (apic_id < 2 * BITS_PER_LONG)
> +			__set_bit(apic_id - BITS_PER_LONG, &ipi_bitmap_high);

It'd be nicer with 'unsigned long ipi_bitmap[2]' and a single

	__set_bit(apic_id, ipi_bitmap);

> +	}
> +
> +	kvm_hypercall3(KVM_HC_SEND_IPI, ipi_bitmap_low, ipi_bitmap_high, vector);

and

	kvm_hypercall3(KVM_HC_SEND_IPI, ipi_bitmap[0], ipi_bitmap[1], vector);

Still, the main problem is that we can only address 128 APICs.

A simple improvement would reuse the vector field (as we need only 8
bits) and put an 'offset' in the rest.  The offset would say which
cluster of 128 we are addressing.  24 bits of offset results in 2^31
total addressable CPUs (we probably should even use that many bits).
The downside of this is that we can only address 128 at a time.
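
Roughly (the "cluster"/"arg" encoding below is just illustrative):

	/* a0/a1 cover APIC ids 128 * cluster .. 128 * cluster + 127 */
	arg = (cluster << 8) | (vector & 0xff);
	kvm_hypercall3(KVM_HC_SEND_IPI, ipi_bitmap[0], ipi_bitmap[1], arg);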

It's basically the same as x2apic cluster mode, only with 128 cluster
size instead of 16, so the code should be a straightforward port.
And because x2apic code doesn't seem to use any division by the cluster
size, we could even try to use kvm_hypercall4, add ipi_bitmap[2], and
make the cluster size 192. :)

But because it is very similar to x2apic, I'd really need some real
performance data to see if this benefits a real workload.
Hardware could further optimize LAPIC (apicv, vapic) in the future,
which we'd lose by using paravirt.

e.g. AMD's acceleration should be superior to this when using < 8 VCPUs
as they can use logical xAPIC and send without VM exits (when all VCPUs
are running).

> +
> +	local_irq_restore(flags);
> +}
> +
> +static void kvm_send_ipi_mask(const struct cpumask *mask, int vector)
> +{
> +	__send_ipi_mask(mask, vector);
> +}
> +
> +static void kvm_send_ipi_mask_allbutself(const struct cpumask *mask, int vector)
> +{
> +	unsigned int this_cpu = smp_processor_id();
> +	struct cpumask new_mask;
> +	const struct cpumask *local_mask;
> +
> +	cpumask_copy(&new_mask, mask);
> +	cpumask_clear_cpu(this_cpu, &new_mask);
> +	local_mask = &new_mask;
> +	__send_ipi_mask(local_mask, vector);
> +}
> +
> +static void kvm_send_ipi_allbutself(int vector)
> +{
> +	kvm_send_ipi_mask_allbutself(cpu_online_mask, vector);
> +}
> +
> +static void kvm_send_ipi_all(int vector)
> +{
> +	__send_ipi_mask(cpu_online_mask, vector);

These should be faster when using the native APIC shorthand -- is this
the "Broadcast" in your tests?

> +}
> +
> +/*
> + * Set the IPI entry points
> + */
> +static void kvm_setup_pv_ipi(void)
> +{
> +	apic->send_IPI_mask = kvm_send_ipi_mask;
> +	apic->send_IPI_mask_allbutself = kvm_send_ipi_mask_allbutself;
> +	apic->send_IPI_allbutself = kvm_send_ipi_allbutself;
> +	apic->send_IPI_all = kvm_send_ipi_all;
> +	pr_info("KVM setup pv IPIs\n");
> +}
> +#endif
> +
>  static void __init kvm_smp_prepare_cpus(unsigned int max_cpus)
>  {
>  	native_smp_prepare_cpus(max_cpus);
> @@ -626,6 +691,11 @@ static uint32_t __init kvm_detect(void)
>  
>  static void __init kvm_apic_init(void)
>  {
> +#if defined(CONFIG_SMP) && defined(CONFIG_X86_64)
> +	if (kvm_para_has_feature(KVM_FEATURE_PV_SEND_IPI) &&
> +		num_possible_cpus() <= 2 * BITS_PER_LONG)

It looks like num_possible_cpus() is actually NR_CPUS, so the feature
would never be used on a standard Linux distro.
And we're using the APIC_ID, which can be higher even if the maximum
CPU number is lower.  Just remove the check.
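
i.e. just

	if (kvm_para_has_feature(KVM_FEATURE_PV_SEND_IPI))
		kvm_setup_pv_ipi();

and rely on the fallback path of patch 3/6 for out-of-range APIC ids.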

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v3 5/6] KVM: X86: Add NMI support to PV IPIs
  2018-07-03  6:21 ` [PATCH v3 5/6] KVM: X86: Add NMI support to PV IPIs Wanpeng Li
@ 2018-07-19 16:31   ` Radim Krčmář
  2018-07-20  3:53     ` Wanpeng Li
  0 siblings, 1 reply; 27+ messages in thread
From: Radim Krčmář @ 2018-07-19 16:31 UTC (permalink / raw)
  To: Wanpeng Li; +Cc: linux-kernel, kvm, Paolo Bonzini, Vitaly Kuznetsov

2018-07-03 14:21+0800, Wanpeng Li:
> From: Wanpeng Li <wanpengli@tencent.com>
> 
> The NMI delivery mode of ICR is used to deliver an NMI to the processor, 
> and the vector information is ignored.
> 
> Cc: Paolo Bonzini <pbonzini@redhat.com>
> Cc: Radim Krčmář <rkrcmar@redhat.com>
> Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
> Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
> ---
> diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
> @@ -479,7 +479,16 @@ static int __send_ipi_mask(const struct cpumask *mask, int vector)
>  		}
>  	}
>  
> -	ret = kvm_hypercall3(KVM_HC_SEND_IPI, ipi_bitmap_low, ipi_bitmap_high, vector);
> +	switch (vector) {
> +	default:
> +		icr = APIC_DM_FIXED | vector;
> +		break;
> +	case NMI_VECTOR:
> +		icr = APIC_DM_NMI;

I think it would be better to say that KVM interprets NMI_VECTOR and
sends the interrupt as APIC_DM_NMI.
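
i.e. keep passing the vector in the hypercall and let the host side do
something like (sketch):

	if (vector == NMI_VECTOR)
		irq.delivery_mode = APIC_DM_NMI;
	else
		irq.delivery_mode = APIC_DM_FIXED;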

> +		break;
> +	}
> +
> +	ret = kvm_hypercall3(KVM_HC_SEND_IPI, ipi_bitmap_low, ipi_bitmap_high, icr);
>  

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v3 4/6] KVM: X86: Implement PV IPIs send hypercall
  2018-07-03  6:21 ` [PATCH v3 4/6] KVM: X86: Implement PV IPIs send hypercall Wanpeng Li
@ 2018-07-19 16:47   ` Paolo Bonzini
  2018-07-20  3:49     ` Wanpeng Li
  0 siblings, 1 reply; 27+ messages in thread
From: Paolo Bonzini @ 2018-07-19 16:47 UTC (permalink / raw)
  To: Wanpeng Li, linux-kernel, kvm
  Cc: Radim Krčmář, Vitaly Kuznetsov

On 03/07/2018 08:21, Wanpeng Li wrote:
> +
> +	rcu_read_lock();
> +	map = rcu_dereference(kvm->arch.apic_map);
> +
> +	for_each_set_bit(i, &ipi_bitmap_low, BITS_PER_LONG) {
> +		vcpu = map->phys_map[i]->vcpu;
> +		if (!kvm_apic_set_irq(vcpu, &irq, NULL))
> +			return 1;
> +	}
> +
> +	for_each_set_bit(i, &ipi_bitmap_high, BITS_PER_LONG) {
> +		vcpu = map->phys_map[i + BITS_PER_LONG]->vcpu;
> +		if (!kvm_apic_set_irq(vcpu, &irq, NULL))
> +			return 1;
> +	}
> +

This should be the guest's BITS_PER_LONG, not the host's (i.e. you need
to pass op_64_bit from kvm_emulate_hypercall).
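
Something like (sketch only, "bits" just illustrating the idea):

	int bits = op_64_bit ? 64 : 32;

	for_each_set_bit(i, &ipi_bitmap_low, bits) {
		...
	}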

Thanks,

Paolo

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v3 2/6] KVM: X86: Implement PV IPIs in linux guest
  2018-07-19 16:28   ` Radim Krčmář
@ 2018-07-19 16:47     ` Paolo Bonzini
  2018-07-19 17:22       ` Radim Krčmář
                         ` (2 more replies)
  2018-07-20  3:33     ` Wanpeng Li
  1 sibling, 3 replies; 27+ messages in thread
From: Paolo Bonzini @ 2018-07-19 16:47 UTC (permalink / raw)
  To: Radim Krčmář, Wanpeng Li
  Cc: linux-kernel, kvm, Vitaly Kuznetsov

On 19/07/2018 18:28, Radim Krčmář wrote:
>> +
>> +	kvm_hypercall3(KVM_HC_SEND_IPI, ipi_bitmap_low, ipi_bitmap_high, vector);
> and
> 
> 	kvm_hypercall3(KVM_HC_SEND_IPI, ipi_bitmap[0], ipi_bitmap[1], vector);
> 
> Still, the main problem is that we can only address 128 APICs.
> 
> A simple improvement would reuse the vector field (as we need only 8
> bits) and put an 'offset' in the rest.  The offset would say which
> cluster of 128 we are addressing.  24 bits of offset results in 2^31
> total addressable CPUs (we probably should even use that many bits).
> The downside of this is that we can only address 128 at a time.
> 
> It's basically the same as x2apic cluster mode, only with 128 cluster
> size instead of 16, so the code should be a straightforward port.
> And because x2apic code doesn't seem to use any division by the cluster
> size, we could even try to use kvm_hypercall4, add ipi_bitmap[2], and
> make the cluster size 192. :)

I did suggest an offset earlier in the discussion.

The main problem is that consecutive CPU ids do not map to consecutive
APIC ids.  But still, we could do a hypercall whenever the total range
exceeds 64.  Something like

u64 ipi_bitmap = 0;
for_each_cpu(cpu, mask) {
	if (!ipi_bitmap) {
		min = max = cpu;
	} else if (cpu < min && max - cpu < 64) {
		ipi_bitmap <<= min - cpu;
		min = cpu;
	} else if (cpu < min + 64) {
		max = cpu < max ? max : cpu;
	} else {
		/* ... send hypercall... */
		min = max = cpu;
		ipi_bitmap = 0;
	}
	__set_bit(cpu - min, (unsigned long *)&ipi_bitmap);
}
if (ipi_bitmap) {
	/* ... send hypercall... */
}

We could keep the cluster size of 128, but it would be more complicated
to do the left shift in the first "else if".  If the limit is 64, you
can keep the two arguments in the hypercall, and just pass 0 as the
"high" bitmap on 64-bit kernels.

Paolo

> But because it is very similar to x2apic, I'd really need some real
> performance data to see if this benefits a real workload.
> Hardware could further optimize LAPIC (apicv, vapic) in the future,
> which we'd lose by using paravirt.
>
> e.g. AMD's acceleration should be superior to this when using < 8 VCPUs
> as they can use logical xAPIC and send without VM exits (when all VCPUs
> are running).
> 


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v3 2/6] KVM: X86: Implement PV IPIs in linux guest
  2018-07-19 16:47     ` Paolo Bonzini
@ 2018-07-19 17:22       ` Radim Krčmář
  2018-07-20  3:35       ` Wanpeng Li
  2018-07-20  5:58       ` Wanpeng Li
  2 siblings, 0 replies; 27+ messages in thread
From: Radim Krčmář @ 2018-07-19 17:22 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: Wanpeng Li, linux-kernel, kvm, Vitaly Kuznetsov

2018-07-19 18:47+0200, Paolo Bonzini:
> On 19/07/2018 18:28, Radim Krčmář wrote:
> >> +
> >> +	kvm_hypercall3(KVM_HC_SEND_IPI, ipi_bitmap_low, ipi_bitmap_high, vector);
> > and
> > 
> > 	kvm_hypercall3(KVM_HC_SEND_IPI, ipi_bitmap[0], ipi_bitmap[1], vector);
> > 
> > Still, the main problem is that we can only address 128 APICs.
> > 
> > A simple improvement would reuse the vector field (as we need only 8
> > bits) and put an 'offset' in the rest.  The offset would say which
> > cluster of 128 we are addressing.  24 bits of offset results in 2^31
> > total addressable CPUs (we probably should even use that many bits).
> > The downside of this is that we can only address 128 at a time.
> > 
> > It's basically the same as x2apic cluster mode, only with 128 cluster
> > size instead of 16, so the code should be a straightforward port.
> > And because x2apic code doesn't seem to use any division by the cluster
> > size, we could even try to use kvm_hypercall4, add ipi_bitmap[2], and
> > make the cluster size 192. :)
> 
> I did suggest an offset earlier in the discussion.
> 
> The main problem is that consecutive CPU ids do not map to consecutive
> APIC ids.  But still, we could do a hypercall whenever the total range
> exceeds 64.  Something like

Right, the cluster x2apic implementation came with a second mapping to
do this in linear time and send as few IPIs as possible:

	/* Collapse cpus in a cluster so a single IPI per cluster is sent */
	for_each_cpu(cpu, tmpmsk) {
		struct cluster_mask *cmsk = per_cpu(cluster_masks, cpu);

		dest = 0;
		for_each_cpu_and(clustercpu, tmpmsk, &cmsk->mask)
			dest |= per_cpu(x86_cpu_to_logical_apicid, clustercpu);

		if (!dest)
			continue;

		__x2apic_send_IPI_dest(dest, vector, apic->dest_logical);
		/* Remove cluster CPUs from tmpmask */
		cpumask_andnot(tmpmsk, tmpmsk, &cmsk->mask);
	}

I think that the extra memory consumption would be excusable.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v3 2/6] KVM: X86: Implement PV IPIs in linux guest
  2018-07-03  6:21 ` [PATCH v3 2/6] KVM: X86: Implement PV IPIs in linux guest Wanpeng Li
  2018-07-19 16:28   ` Radim Krčmář
@ 2018-07-19 23:05   ` David Matlack
  2018-07-20  3:45     ` Wanpeng Li
  1 sibling, 1 reply; 27+ messages in thread
From: David Matlack @ 2018-07-19 23:05 UTC (permalink / raw)
  To: Wanpeng Li
  Cc: linux-kernel, kvm list, Paolo Bonzini,
	Radim Krčmář,
	vkuznets

On Mon, Jul 2, 2018 at 11:23 PM Wanpeng Li <kernellwp@gmail.com> wrote:
>
> From: Wanpeng Li <wanpengli@tencent.com>
>
> Implement paravirtual apic hooks to enable PV IPIs.

Very cool. Thanks for working on this!

>
> apic->send_IPI_mask
> apic->send_IPI_mask_allbutself
> apic->send_IPI_allbutself
> apic->send_IPI_all
>
> > PV IPIs support VMs with up to 128 vCPUs, which is big enough for
> > current cloud environments;

From the Cloud perspective, 128 vCPUs is already obsolete. GCE's
n1-ultramem-160 VMs have 160 vCPUs where the maximum APIC ID is 231.
I'd definitely prefer an approach that scales to higher APIC IDs, like
Paolo's offset idea.

To Radim's point of real world performance testing, do you know what
is the primary source of multi-target IPIs? If it's TLB shootdowns we
might get a bigger bang for our buck with a PV TLB Shootdown.

> > supporting more vCPUs needs more complex
> > logic, so this might be extended in the future if needed.
>
> Cc: Paolo Bonzini <pbonzini@redhat.com>
> Cc: Radim Krčmář <rkrcmar@redhat.com>
> Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
> Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
> ---
>  arch/x86/include/uapi/asm/kvm_para.h |  1 +
>  arch/x86/kernel/kvm.c                | 70 ++++++++++++++++++++++++++++++++++++
>  include/uapi/linux/kvm_para.h        |  1 +
>  3 files changed, 72 insertions(+)
>
> diff --git a/arch/x86/include/uapi/asm/kvm_para.h b/arch/x86/include/uapi/asm/kvm_para.h
> index 0ede697..19980ec 100644
> --- a/arch/x86/include/uapi/asm/kvm_para.h
> +++ b/arch/x86/include/uapi/asm/kvm_para.h
> @@ -28,6 +28,7 @@
>  #define KVM_FEATURE_PV_UNHALT          7
>  #define KVM_FEATURE_PV_TLB_FLUSH       9
>  #define KVM_FEATURE_ASYNC_PF_VMEXIT    10
> +#define KVM_FEATURE_PV_SEND_IPI        11
>
>  #define KVM_HINTS_REALTIME      0
>
> diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
> index 591bcf2..2fe1420 100644
> --- a/arch/x86/kernel/kvm.c
> +++ b/arch/x86/kernel/kvm.c
> @@ -454,6 +454,71 @@ static void __init sev_map_percpu_data(void)
>  }
>
>  #ifdef CONFIG_SMP
> +
> +#ifdef CONFIG_X86_64
> +static void __send_ipi_mask(const struct cpumask *mask, int vector)
> +{
> +       unsigned long flags, ipi_bitmap_low = 0, ipi_bitmap_high = 0;
> +       int cpu, apic_id;
> +
> +       if (cpumask_empty(mask))
> +               return;
> +
> +       local_irq_save(flags);
> +
> +       for_each_cpu(cpu, mask) {
> +               apic_id = per_cpu(x86_cpu_to_apicid, cpu);
> +               if (apic_id < BITS_PER_LONG)
> +                       __set_bit(apic_id, &ipi_bitmap_low);
> +               else if (apic_id < 2 * BITS_PER_LONG)
> +                       __set_bit(apic_id - BITS_PER_LONG, &ipi_bitmap_high);
> +       }
> +
> +       kvm_hypercall3(KVM_HC_SEND_IPI, ipi_bitmap_low, ipi_bitmap_high, vector);
> +
> +       local_irq_restore(flags);
> +}
> +
> +static void kvm_send_ipi_mask(const struct cpumask *mask, int vector)
> +{
> +       __send_ipi_mask(mask, vector);
> +}
> +
> +static void kvm_send_ipi_mask_allbutself(const struct cpumask *mask, int vector)
> +{
> +       unsigned int this_cpu = smp_processor_id();
> +       struct cpumask new_mask;
> +       const struct cpumask *local_mask;
> +
> +       cpumask_copy(&new_mask, mask);
> +       cpumask_clear_cpu(this_cpu, &new_mask);
> +       local_mask = &new_mask;
> +       __send_ipi_mask(local_mask, vector);
> +}
> +
> +static void kvm_send_ipi_allbutself(int vector)
> +{
> +       kvm_send_ipi_mask_allbutself(cpu_online_mask, vector);
> +}
> +
> +static void kvm_send_ipi_all(int vector)
> +{
> +       __send_ipi_mask(cpu_online_mask, vector);
> +}
> +
> +/*
> + * Set the IPI entry points
> + */
> +static void kvm_setup_pv_ipi(void)
> +{
> +       apic->send_IPI_mask = kvm_send_ipi_mask;
> +       apic->send_IPI_mask_allbutself = kvm_send_ipi_mask_allbutself;
> +       apic->send_IPI_allbutself = kvm_send_ipi_allbutself;
> +       apic->send_IPI_all = kvm_send_ipi_all;
> +       pr_info("KVM setup pv IPIs\n");
> +}
> +#endif
> +
>  static void __init kvm_smp_prepare_cpus(unsigned int max_cpus)
>  {
>         native_smp_prepare_cpus(max_cpus);
> @@ -626,6 +691,11 @@ static uint32_t __init kvm_detect(void)
>
>  static void __init kvm_apic_init(void)
>  {
> +#if defined(CONFIG_SMP) && defined(CONFIG_X86_64)
> +       if (kvm_para_has_feature(KVM_FEATURE_PV_SEND_IPI) &&
> +               num_possible_cpus() <= 2 * BITS_PER_LONG)
> +               kvm_setup_pv_ipi();
> +#endif
>  }
>
>  static void __init kvm_init_platform(void)
> diff --git a/include/uapi/linux/kvm_para.h b/include/uapi/linux/kvm_para.h
> index dcf629d..84f8fe3 100644
> --- a/include/uapi/linux/kvm_para.h
> +++ b/include/uapi/linux/kvm_para.h
> @@ -26,6 +26,7 @@
>  #define KVM_HC_MIPS_EXIT_VM            7
>  #define KVM_HC_MIPS_CONSOLE_OUTPUT     8
>  #define KVM_HC_CLOCK_PAIRING           9
> +#define KVM_HC_SEND_IPI                10
>
>  /*
>   * hypercalls use architecture specific
> --
> 2.7.4
>

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v3 2/6] KVM: X86: Implement PV IPIs in linux guest
  2018-07-19 16:28   ` Radim Krčmář
  2018-07-19 16:47     ` Paolo Bonzini
@ 2018-07-20  3:33     ` Wanpeng Li
  2018-07-20  9:51       ` Radim Krcmar
  1 sibling, 1 reply; 27+ messages in thread
From: Wanpeng Li @ 2018-07-20  3:33 UTC (permalink / raw)
  To: Radim Krcmar; +Cc: LKML, kvm, Paolo Bonzini, Vitaly Kuznetsov

On Fri, 20 Jul 2018 at 00:28, Radim Krčmář <rkrcmar@redhat.com> wrote:
>
> 2018-07-03 14:21+0800, Wanpeng Li:
> > From: Wanpeng Li <wanpengli@tencent.com>
> >
> > Implement paravirtual apic hooks to enable PV IPIs.
> >
> > apic->send_IPI_mask
> > apic->send_IPI_mask_allbutself
> > apic->send_IPI_allbutself
> > apic->send_IPI_all
> >
> > PV IPIs support VMs with up to 128 vCPUs, which is big enough for
> > current cloud environments; supporting more vCPUs needs more complex
> > logic, so this might be extended in the future if needed.
> >
> > Cc: Paolo Bonzini <pbonzini@redhat.com>
> > Cc: Radim Krčmář <rkrcmar@redhat.com>
> > Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
> > Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
> > ---
> > diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
> > @@ -454,6 +454,71 @@ static void __init sev_map_percpu_data(void)
> >  }
> >
> >  #ifdef CONFIG_SMP
> > +
> > +#ifdef CONFIG_X86_64
> > +static void __send_ipi_mask(const struct cpumask *mask, int vector)
> > +{
> > +     unsigned long flags, ipi_bitmap_low = 0, ipi_bitmap_high = 0;
> > +     int cpu, apic_id;
> > +
> > +     if (cpumask_empty(mask))
> > +             return;
> > +
> > +     local_irq_save(flags);
> > +
> > +     for_each_cpu(cpu, mask) {
> > +             apic_id = per_cpu(x86_cpu_to_apicid, cpu);
> > +             if (apic_id < BITS_PER_LONG)
> > +                     __set_bit(apic_id, &ipi_bitmap_low);
> > +             else if (apic_id < 2 * BITS_PER_LONG)
> > +                     __set_bit(apic_id - BITS_PER_LONG, &ipi_bitmap_high);
>
> It'd be nicer with 'unsigned long ipi_bitmap[2]' and a single
>
>         __set_bit(apic_id, ipi_bitmap);
>
> > +     }
> > +
> > +     kvm_hypercall3(KVM_HC_SEND_IPI, ipi_bitmap_low, ipi_bitmap_high, vector);
>
> and
>
>         kvm_hypercall3(KVM_HC_SEND_IPI, ipi_bitmap[0], ipi_bitmap[1], vector);
>
> Still, the main problem is that we can only address 128 APICs.
>
> A simple improvement would reuse the vector field (as we need only 8
> bits) and put an 'offset' in the rest.  The offset would say which
> cluster of 128 we are addressing.  24 bits of offset results in 2^31
> total addressable CPUs (we probably should even use that many bits).
> The downside of this is that we can only address 128 at a time.
>
> It's basically the same as x2apic cluster mode, only with 128 cluster
> size instead of 16, so the code should be a straightforward port.
> And because x2apic code doesn't seem to use any division by the cluster
> size, we could even try to use kvm_hypercall4, add ipi_bitmap[2], and
> make the cluster size 192. :)
>
> But because it is very similar to x2apic, I'd really need some real
> performance data to see if this benefits a real workload.

Thanks for your review, Radim! :) I will find a real-workload benchmark
instead of the micro one to evaluate the performance.

> Hardware could further optimize LAPIC (apicv, vapic) in the future,
> which we'd lose by using paravirt.
>
> e.g. AMD's acceleration should be superior to this when using < 8 VCPUs
> as they can use logical xAPIC and send without VM exits (when all VCPUs
> are running).
>
> > +
> > +     local_irq_restore(flags);
> > +}
> > +
> > +static void kvm_send_ipi_mask(const struct cpumask *mask, int vector)
> > +{
> > +     __send_ipi_mask(mask, vector);
> > +}
> > +
> > +static void kvm_send_ipi_mask_allbutself(const struct cpumask *mask, int vector)
> > +{
> > +     unsigned int this_cpu = smp_processor_id();
> > +     struct cpumask new_mask;
> > +     const struct cpumask *local_mask;
> > +
> > +     cpumask_copy(&new_mask, mask);
> > +     cpumask_clear_cpu(this_cpu, &new_mask);
> > +     local_mask = &new_mask;
> > +     __send_ipi_mask(local_mask, vector);
> > +}
> > +
> > +static void kvm_send_ipi_allbutself(int vector)
> > +{
> > +     kvm_send_ipi_mask_allbutself(cpu_online_mask, vector);
> > +}
> > +
> > +static void kvm_send_ipi_all(int vector)
> > +{
> > +     __send_ipi_mask(cpu_online_mask, vector);
>
> These should be faster when using the native APIC shorthand -- is this
> the "Broadcast" in your tests?

Not true: .send_IPI_all has almost no callers even though the Linux
apic drivers implement this hook; in addition, the shorthand is not
used in x2apic mode (__x2apic_send_IPI_dest()), and it sees very
limited use in other scenarios according to the Linux apic drivers.

>
> > +}
> > +
> > +/*
> > + * Set the IPI entry points
> > + */
> > +static void kvm_setup_pv_ipi(void)
> > +{
> > +     apic->send_IPI_mask = kvm_send_ipi_mask;
> > +     apic->send_IPI_mask_allbutself = kvm_send_ipi_mask_allbutself;
> > +     apic->send_IPI_allbutself = kvm_send_ipi_allbutself;
> > +     apic->send_IPI_all = kvm_send_ipi_all;
> > +     pr_info("KVM setup pv IPIs\n");
> > +}
> > +#endif
> > +
> >  static void __init kvm_smp_prepare_cpus(unsigned int max_cpus)
> >  {
> >       native_smp_prepare_cpus(max_cpus);
> > @@ -626,6 +691,11 @@ static uint32_t __init kvm_detect(void)
> >
> >  static void __init kvm_apic_init(void)
> >  {
> > +#if defined(CONFIG_SMP) && defined(CONFIG_X86_64)
> > +     if (kvm_para_has_feature(KVM_FEATURE_PV_SEND_IPI) &&
> > +             num_possible_cpus() <= 2 * BITS_PER_LONG)
>
> It looks like num_possible_cpus() is actually NR_CPUS, so the feature
> would never be used on a standard Linux distro.
> And we're using the APIC_ID, which can be higher even if the maximum
> CPU number is lower.  Just remove the check.

Will do.

Regards,
Wanpeng Li

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v3 2/6] KVM: X86: Implement PV IPIs in linux guest
  2018-07-19 16:47     ` Paolo Bonzini
  2018-07-19 17:22       ` Radim Krčmář
@ 2018-07-20  3:35       ` Wanpeng Li
  2018-07-20  5:58       ` Wanpeng Li
  2 siblings, 0 replies; 27+ messages in thread
From: Wanpeng Li @ 2018-07-20  3:35 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: Radim Krcmar, LKML, kvm, Vitaly Kuznetsov

On Fri, 20 Jul 2018 at 00:47, Paolo Bonzini <pbonzini@redhat.com> wrote:
>
> On 19/07/2018 18:28, Radim Krčmář wrote:
> >> +
> >> +    kvm_hypercall3(KVM_HC_SEND_IPI, ipi_bitmap_low, ipi_bitmap_high, vector);
> > and
> >
> >       kvm_hypercall3(KVM_HC_SEND_IPI, ipi_bitmap[0], ipi_bitmap[1], vector);
> >
> > Still, the main problem is that we can only address 128 APICs.
> >
> > A simple improvement would reuse the vector field (as we need only 8
> > bits) and put an 'offset' in the rest.  The offset would say which
> > cluster of 128 we are addressing.  24 bits of offset results in 2^31
> > total addressable CPUs (we probably should even use that many bits).
> > The downside of this is that we can only address 128 at a time.
> >
> > It's basically the same as x2apic cluster mode, only with 128 cluster
> > size instead of 16, so the code should be a straightforward port.
> > And because x2apic code doesn't seem to use any division by the cluster
> > size, we could even try to use kvm_hypercall4, add ipi_bitmap[2], and
> > make the cluster size 192. :)
>
> I did suggest an offset earlier in the discussion.
>
> The main problem is that consecutive CPU ids do not map to consecutive
> APIC ids.  But still, we could do a hypercall whenever the total range
> exceeds 64.  Something like
>
> u64 ipi_bitmap = 0;
> for_each_cpu(cpu, mask) {
>         if (!ipi_bitmap) {
>                 min = max = cpu;
>         } else if (cpu < min && max - cpu < 64) {
>                 ipi_bitmap <<= min - cpu;
>                 min = cpu;
>         } else if (cpu < min + 64) {
>                 max = cpu < max ? max : cpu;
>         } else {
>                 /* ... send hypercall... */
>                 min = max = cpu;
>                 ipi_bitmap = 0;
>         }
>         __set_bit(cpu - min, (unsigned long *)&ipi_bitmap);
> }
> if (ipi_bitmap) {
>         /* ... send hypercall... */
> }
>
> We could keep the cluster size of 128, but it would be more complicated
> to do the left shift in the first "else if".  If the limit is 64, you
> can keep the two arguments in the hypercall, and just pass 0 as the
> "high" bitmap on 64-bit kernels.

Cool, I will try the offset method in the next version. Thanks for
your review, Paolo! :)

Regards,
Wanpeng Li

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v3 2/6] KVM: X86: Implement PV IPIs in linux guest
  2018-07-19 23:05   ` David Matlack
@ 2018-07-20  3:45     ` Wanpeng Li
  2018-07-20 13:12       ` Radim Krcmar
  0 siblings, 1 reply; 27+ messages in thread
From: Wanpeng Li @ 2018-07-20  3:45 UTC (permalink / raw)
  To: David Matlack; +Cc: LKML, kvm, Paolo Bonzini, Radim Krcmar, Vitaly Kuznetsov

On Fri, 20 Jul 2018 at 07:05, David Matlack <dmatlack@google.com> wrote:
>
> On Mon, Jul 2, 2018 at 11:23 PM Wanpeng Li <kernellwp@gmail.com> wrote:
> >
> > From: Wanpeng Li <wanpengli@tencent.com>
> >
> > Implement paravirtual apic hooks to enable PV IPIs.
>
> Very cool. Thanks for working on this!

Thanks David!

>
> >
> > apic->send_IPI_mask
> > apic->send_IPI_mask_allbutself
> > apic->send_IPI_allbutself
> > apic->send_IPI_all
> >
> > The PV IPIs support a maximum of 128 vCPUs per VM, which is big enough
> > for cloud environments currently,
>
> From the Cloud perspective, 128 vCPUs is already obsolete. GCE's
> n1-ultramem-160 VMs have 160 vCPUs where the maximum APIC ID is 231.
> I'd definitely prefer an approach that scales to higher APIC IDs, like
> Paolo's offset idea.

Ok, I will try the offset method in the next version.

>
> To Radim's point of real world performance testing, do you know what
> is the primary source of multi-target IPIs? If it's TLB shootdowns we
> might get a bigger bang for our buck with a PV TLB Shootdown.

The "Function Call interrupts", there is a lot of callers for
smp_call_function_many() except TLB Shootdowns in linux kernel which
try to run a function on a set of other CPUs. TLB Shootdown still can
get benefit from PV IPIs even if PV TLB Shootdown is enabled since
IPIs should be sent to the vCPUs which are active and will incur
vmexits. PV IPIs will benefit both vCPUs overcommit and
non-overcommit(which PV TLB Shootdown can't help) scenarios. Btw,
hyperv also implements PV IPIs even if PV TLB Shootdown is present.
https://lkml.org/lkml/2018/7/3/537
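
For context, a minimal sketch of such a caller;
smp_call_function_many() is the generic entry point that fans a
function out to a cpumask via IPIs, which is exactly the path the PV
hypercall batches:

    #include <linux/smp.h>

    static void remote_func(void *info)
    {
            /* runs on every CPU in 'mask', in interrupt context */
    }

    /* called with preemption disabled; wait=1 spins until every
     * target CPU has run remote_func() */
    smp_call_function_many(mask, remote_func, NULL, 1);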

Regards,
Wanpeng Li


* Re: [PATCH v3 4/6] KVM: X86: Implement PV IPIs send hypercall
  2018-07-19 16:47   ` Paolo Bonzini
@ 2018-07-20  3:49     ` Wanpeng Li
  0 siblings, 0 replies; 27+ messages in thread
From: Wanpeng Li @ 2018-07-20  3:49 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: LKML, kvm, Radim Krcmar, Vitaly Kuznetsov

On Fri, 20 Jul 2018 at 00:47, Paolo Bonzini <pbonzini@redhat.com> wrote:
>
> On 03/07/2018 08:21, Wanpeng Li wrote:
> > +
> > +     rcu_read_lock();
> > +     map = rcu_dereference(kvm->arch.apic_map);
> > +
> > +     for_each_set_bit(i, &ipi_bitmap_low, BITS_PER_LONG) {
> > +             vcpu = map->phys_map[i]->vcpu;
> > +             if (!kvm_apic_set_irq(vcpu, &irq, NULL))
> > +                     return 1;
> > +     }
> > +
> > +     for_each_set_bit(i, &ipi_bitmap_high, BITS_PER_LONG) {
> > +             vcpu = map->phys_map[i + BITS_PER_LONG]->vcpu;
> > +             if (!kvm_apic_set_irq(vcpu, &irq, NULL))
> > +                     return 1;
> > +     }
> > +
>
> This should be the guest's BITS_PER_LONG, not the host's (i.e. you need
> to pass op_64_bit from kvm_emulate_hypercall).

Will do in the next version.
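
For illustration, a sketch of what that could look like, assuming a
'longmode' flag threaded through from kvm_emulate_hypercall() (the
name is illustrative, and error handling is omitted):

    /* interpret the bitmaps with the guest's word size, not the
     * host's BITS_PER_LONG */
    int bits = longmode ? 64 : 32;

    for_each_set_bit(i, &ipi_bitmap_low, bits)
            kvm_apic_set_irq(map->phys_map[i]->vcpu, &irq, NULL);
    for_each_set_bit(i, &ipi_bitmap_high, bits)
            kvm_apic_set_irq(map->phys_map[i + bits]->vcpu, &irq, NULL);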

Regards,
Wanpeng Li


* Re: [PATCH v3 5/6] KVM: X86: Add NMI support to PV IPIs
  2018-07-19 16:31   ` Radim Krčmář
@ 2018-07-20  3:53     ` Wanpeng Li
  2018-07-20  8:04       ` Paolo Bonzini
  0 siblings, 1 reply; 27+ messages in thread
From: Wanpeng Li @ 2018-07-20  3:53 UTC (permalink / raw)
  To: Radim Krcmar; +Cc: LKML, kvm, Paolo Bonzini, Vitaly Kuznetsov

On Fri, 20 Jul 2018 at 00:31, Radim Krčmář <rkrcmar@redhat.com> wrote:
>
> 2018-07-03 14:21+0800, Wanpeng Li:
> > From: Wanpeng Li <wanpengli@tencent.com>
> >
> > The NMI delivery mode of ICR is used to deliver an NMI to the processor,
> > and the vector information is ignored.
> >
> > Cc: Paolo Bonzini <pbonzini@redhat.com>
> > Cc: Radim Krčmář <rkrcmar@redhat.com>
> > Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
> > Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
> > ---
> > diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
> > @@ -479,7 +479,16 @@ static int __send_ipi_mask(const struct cpumask *mask, int vector)
> >               }
> >       }
> >
> > -     ret = kvm_hypercall3(KVM_HC_SEND_IPI, ipi_bitmap_low, ipi_bitmap_high, vector);
> > +     switch (vector) {
> > +     default:
> > +             icr = APIC_DM_FIXED | vector;
> > +             break;
> > +     case NMI_VECTOR:
> > +             icr = APIC_DM_NMI;
>
> I think it would be better to say that KVM interprets NMI_VECTOR and
> sends the interrupt as APIC_DM_NMI.

Yeah; in addition, SDM 10.6.1 also mentions:
Delivery mode:
    100 (NMI) Delivers an NMI interrupt to the target processor or
processors. The vector information is ignored.

Regards,
Wanpeng Li


* Re: [PATCH v3 2/6] KVM: X86: Implement PV IPIs in linux guest
  2018-07-19 16:47     ` Paolo Bonzini
  2018-07-19 17:22       ` Radim Krčmář
  2018-07-20  3:35       ` Wanpeng Li
@ 2018-07-20  5:58       ` Wanpeng Li
  2018-07-20  8:06         ` Paolo Bonzini
  2 siblings, 1 reply; 27+ messages in thread
From: Wanpeng Li @ 2018-07-20  5:58 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: Radim Krcmar, LKML, kvm, Vitaly Kuznetsov

On Fri, 20 Jul 2018 at 00:47, Paolo Bonzini <pbonzini@redhat.com> wrote:
>
> On 19/07/2018 18:28, Radim Krčmář wrote:
> >> +
> >> +    kvm_hypercall3(KVM_HC_SEND_IPI, ipi_bitmap_low, ipi_bitmap_high, vector);
> > and
> >
> >       kvm_hypercall3(KVM_HC_SEND_IPI, ipi_bitmap[0], ipi_bitmap[1], vector);
> >
> > Still, the main problem is that we can only address 128 APICs.
> >
> > A simple improvement would reuse the vector field (as we need only 8
> > bits) and put an 'offset' in the rest.  The offset would say which
> > cluster of 128 we are addressing.  24 bits of offset results in 2^31
> > total addressable CPUs (we probably shouldn't even use that many bits).
> > The downside of this is that we can only address 128 at a time.
> >
> > It's basically the same as x2apic cluster mode, only with 128 cluster
> > size instead of 16, so the code should be a straightforward port.
> > And because x2apic code doesn't seem to use any division by the cluster
> > size, we could even try to use kvm_hypercall4, add ipi_bitmap[2], and
> > make the cluster size 192. :)
>
> I did suggest an offset earlier in the discussion.
>
> The main problem is that consecutive CPU ids do not map to consecutive
> APIC ids.  But still, we could do a hypercall whenever the total range
> exceeds 64.  Something like
>
> unsigned int cpu, min = 0, max = 0;
> u64 ipi_bitmap = 0;
>
> for_each_cpu(cpu, mask) {
>         if (!ipi_bitmap) {
>                 min = max = cpu;
>         } else if (cpu < min && max - cpu < 64) {
>                 /* new minimum, shift the accumulated bits up */
>                 ipi_bitmap <<= min - cpu;
>                 min = cpu;
>         } else if (cpu < min + 64) {
>                 max = cpu < max ? max : cpu;
>         } else {
>                 /* ... send hypercall... */
>                 min = max = cpu;
>                 ipi_bitmap = 0;
>         }
>         __set_bit(cpu - min, (unsigned long *)&ipi_bitmap);
> }
> if (ipi_bitmap) {
>         /* ... send hypercall... */
> }
>
> We could keep the cluster size of 128, but it would be more complicated
> to do the left shift in the first "else if".  If the limit is 64, you
> can keep the two arguments in the hypercall, and just pass 0 as the
> "high" bitmap on 64-bit kernels.

As David pointed out, we need to scale to higher APIC IDs. I will add
the cpu id to apic id conversion in the for loop. How about calling
kvm_hypercall2(KVM_HC_SEND_IPI, ipi_bitmap, vector); directly? In
addition, why do we need to pass 0 as the "high" bitmap even in the
128 vCPUs case?

Regards,
Wanpeng Li


* Re: [PATCH v3 5/6] KVM: X86: Add NMI support to PV IPIs
  2018-07-20  3:53     ` Wanpeng Li
@ 2018-07-20  8:04       ` Paolo Bonzini
  2018-07-20 13:26         ` Radim Krcmar
  0 siblings, 1 reply; 27+ messages in thread
From: Paolo Bonzini @ 2018-07-20  8:04 UTC (permalink / raw)
  To: Wanpeng Li, Radim Krcmar; +Cc: LKML, kvm, Vitaly Kuznetsov

On 20/07/2018 05:53, Wanpeng Li wrote:
>>> -     ret = kvm_hypercall3(KVM_HC_SEND_IPI, ipi_bitmap_low, ipi_bitmap_high, vector);
>>> +     switch (vector) {
>>> +     default:
>>> +             icr = APIC_DM_FIXED | vector;
>>> +             break;
>>> +     case NMI_VECTOR:
>>> +             icr = APIC_DM_NMI;
>> I think it would be better to say that KVM interprets NMI_VECTOR and
>> sends the interrupt as APIC_DM_NMI.

It's not KVM, this is arch/x86/kernel/kvm.c so the guest side.

Paolo

> Yeah; in addition, SDM 10.6.1 also mentions:
> Delivery mode:
>     100 (NMI) Delivers an NMI interrupt to the target processor or
> processors. The vector information is ignored.



* Re: [PATCH v3 2/6] KVM: X86: Implement PV IPIs in linux guest
  2018-07-20  5:58       ` Wanpeng Li
@ 2018-07-20  8:06         ` Paolo Bonzini
  0 siblings, 0 replies; 27+ messages in thread
From: Paolo Bonzini @ 2018-07-20  8:06 UTC (permalink / raw)
  To: Wanpeng Li; +Cc: Radim Krcmar, LKML, kvm, Vitaly Kuznetsov

On 20/07/2018 07:58, Wanpeng Li wrote:
>>
>> We could keep the cluster size of 128, but it would be more complicated
>> to do the left shift in the first "else if".  If the limit is 64, you
>> can keep the two arguments in the hypercall, and just pass 0 as the
>> "high" bitmap on 64-bit kernels.
> As David pointed out, we need to scale to higher APIC IDs.

The offset is enough to scale to higher APIC IDs.

It's just an optimization to allow 128 CPUs per hypercall instead of 64
CPUs.  But actually you can use __uint128_t on 64-bit machines, I forgot
about that.  With u64 on 32-bit and __uint128_t on 64-bit, you can do 64
CPUs per hypercall on 32-bit and 128 CPUs per hypercall on 64-bit.
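
A sketch of that split, only as an illustration of the idea rather
than the posted code:

    /* 128 CPUs per hypercall on 64-bit, 64 on 32-bit */
    #ifdef CONFIG_X86_64
            __uint128_t ipi_bitmap = 0;
            /* ... accumulate bits relative to 'min' as above ... */
            kvm_hypercall3(KVM_HC_SEND_IPI, (unsigned long)ipi_bitmap,
                           (unsigned long)(ipi_bitmap >> 64), vector);
    #else
            u64 ipi_bitmap = 0;
            /* ... accumulate bits relative to 'min' as above ... */
            kvm_hypercall3(KVM_HC_SEND_IPI, (unsigned long)ipi_bitmap,
                           (unsigned long)(ipi_bitmap >> 32), vector);
    #endif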

> I will add
> the cpu id to apic id conversion in the for loop. How about calling
> kvm_hypercall2(KVM_HC_SEND_IPI, ipi_bitmap, vector); directly? In
> addition, why do we need to pass 0 as the "high" bitmap even in the
> 128 vCPUs case?



* Re: [PATCH v3 2/6] KVM: X86: Implement PV IPIs in linux guest
  2018-07-20  3:33     ` Wanpeng Li
@ 2018-07-20  9:51       ` Radim Krcmar
  2018-07-20 10:17         ` Wanpeng Li
  0 siblings, 1 reply; 27+ messages in thread
From: Radim Krcmar @ 2018-07-20  9:51 UTC (permalink / raw)
  To: Wanpeng Li; +Cc: LKML, kvm, Paolo Bonzini, Vitaly Kuznetsov

2018-07-20 11:33+0800, Wanpeng Li:
> On Fri, 20 Jul 2018 at 00:28, Radim Krčmář <rkrcmar@redhat.com> wrote:
> > 2018-07-03 14:21+0800, Wanpeng Li:
> > But because it is very similar to x2apic, I'd really need some real
> > performance data to see if this benefits a real workload.
> 
> Thanks for your review, Radim! :) I will find another real benchmark
> instead of the micro one to evaluate the performance.

Analyzing the cpu bitmap for every IPI request on a non-small guest (at
least 32 VCPUs, ideally >256) during various workloads could also
provide some insight regardless of workload/benchmark result -- we want
to know how many VM exits we would save.

> > > +static void kvm_send_ipi_all(int vector)
> > > +{
> > > +     __send_ipi_mask(cpu_online_mask, vector);
> >
> > These should be faster when using the native APIC shorthand -- is this
> > the "Broadcast" in your tests?
> 
Not true: .send_IPI_all has almost no callers even though the Linux
apic drivers implement this hook; in addition, the shorthand is not
used in x2apic mode (__x2apic_send_IPI_dest()), and it sees very
limited use in other scenarios according to the Linux apic drivers.

Good point,
(xAPIC is using shorthands, so I didn't expect we'd stop doing so on
 x2APIC, but there was probably no need.)

thanks.
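
For reference, a native shorthand broadcast is a single ICR write with
a destination-shorthand field instead of a destination bitmap.  A
sketch of the xAPIC "all including self" case:

    /* one ICR write reaches every CPU, no per-CPU loop needed */
    apic_write(APIC_ICR, APIC_DEST_ALLINC | APIC_DM_FIXED | vector);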


* Re: [PATCH v3 2/6] KVM: X86: Implement PV IPIs in linux guest
  2018-07-20  9:51       ` Radim Krcmar
@ 2018-07-20 10:17         ` Wanpeng Li
  0 siblings, 0 replies; 27+ messages in thread
From: Wanpeng Li @ 2018-07-20 10:17 UTC (permalink / raw)
  To: Radim Krcmar; +Cc: LKML, kvm, Paolo Bonzini, Vitaly Kuznetsov

On Fri, 20 Jul 2018 at 17:51, Radim Krcmar <rkrcmar@redhat.com> wrote:
>
> 2018-07-20 11:33+0800, Wanpeng Li:
> > On Fri, 20 Jul 2018 at 00:28, Radim Krčmář <rkrcmar@redhat.com> wrote:
> > > 2018-07-03 14:21+0800, Wanpeng Li:
> > > But because it is very similar to x2apic, I'd really need some real
> > > performance data to see if this benefits a real workload.
> >
> > Thanks for your review, Radim! :) I will find another real benchmark
> > instead of the micro one to evaluate the performance.
>
> Analyzing the cpu bitmap for every IPI request on a non-small guest (at
> least 32 VCPUs, ideally >256) during various workloads could also
> provide some insight regardless of workload/benchmark result -- we want
> to know how many VM exits we would save.

I will try the ebizzy benchmark; I just completed the patchset w/ the
__uint128 approach which Paolo suggested. In addition, I remember
Aliyun posted the performance numbers for their online real workload
"message oriented middleware": from 800 k/s vmexits w/o PV IPIs down
to 150 k/s vmexits w/ PV IPIs.

Regards,
Wanpeng Li


* Re: [PATCH v3 2/6] KVM: X86: Implement PV IPIs in linux guest
  2018-07-20  3:45     ` Wanpeng Li
@ 2018-07-20 13:12       ` Radim Krcmar
  0 siblings, 0 replies; 27+ messages in thread
From: Radim Krcmar @ 2018-07-20 13:12 UTC (permalink / raw)
  To: Wanpeng Li; +Cc: David Matlack, LKML, kvm, Paolo Bonzini, Vitaly Kuznetsov

2018-07-20 11:45+0800, Wanpeng Li:
> On Fri, 20 Jul 2018 at 07:05, David Matlack <dmatlack@google.com> wrote:
> >
> > On Mon, Jul 2, 2018 at 11:23 PM Wanpeng Li <kernellwp@gmail.com> wrote:
> > >
> > > From: Wanpeng Li <wanpengli@tencent.com>
> > >
> > > Implement paravirtual apic hooks to enable PV IPIs.
> >
> > Very cool. Thanks for working on this!
> 
> Thanks David!
> 
> >
> > >
> > > apic->send_IPI_mask
> > > apic->send_IPI_mask_allbutself
> > > apic->send_IPI_allbutself
> > > apic->send_IPI_all
> > >
> > > The PV IPIs support a maximum of 128 vCPUs per VM, which is big enough
> > > for cloud environments currently,
> >
> > From the Cloud perspective, 128 vCPUs is already obsolete. GCE's
> > n1-ultramem-160 VMs have 160 vCPUs where the maximum APIC ID is 231.
> > I'd definitely prefer an approach that scales to higher APIC IDs, like
> > Paolo's offset idea.
> 
> Ok, I will try the offset method in next version.
> 
> >
> > To Radim's point of real world performance testing, do you know what
> > is the primary source of multi-target IPIs? If it's TLB shootdowns we
> > might get a bigger bang for our buck with a PV TLB Shootdown.

I assume it is TLB shootdowns by a large margin, but I have never
profiled it.

We could add a more complex PV TLB shootdown that does the whole TLB
invalidation inside KVM, but I don't think it would be significantly
faster, because we need to force a VM exit if the target VCPU is
running.  With PV IPI, we get exit-less IPI injection and a VM exit for
TLB invalidation.

> The "Function Call interrupts", there is a lot of callers for
> smp_call_function_many() except TLB Shootdowns in linux kernel which
> try to run a function on a set of other CPUs. TLB Shootdown still can
> get benefit from PV IPIs even if PV TLB Shootdown is enabled since
> IPIs should be sent to the vCPUs which are active and will incur
> vmexits. PV IPIs will benefit both vCPUs overcommit and
> non-overcommit(which PV TLB Shootdown can't help) scenarios. Btw,
> hyperv also implements PV IPIs even if PV TLB Shootdown is present.
> https://lkml.org/lkml/2018/7/3/537

Hyper-V is in a better spot, as guests can use the hypervisor's VPIDX
for cpu bitmaps and pass it to the IPI hypercall, so there is no APIC
ID translation on either side -- the IPI hypercall seems worth it just
for the simpler logic.

Hyper-V can also address up to 4096 VCPUs in one hypercall (optimized
as a sparse 2-level 64-ary tree), so we might want to do something
like that (with a higher ceiling), as the probability of all targets
being within 128 APIC IDs should rapidly diminish with a growing
number of VCPUs.
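
A simplified sketch of such a sparse two-level set; note that Hyper-V
packs only the valid banks back-to-back, so this fixed-index variant
is purely illustrative:

    /* bit i of valid_bank_mask marks 64-CPU bank i as present */
    struct sparse_vcpu_set {
            u64 valid_bank_mask;
            u64 bank[64];           /* 64 banks x 64 bits = 4096 VCPUs */
    };

    static void sparse_set_vcpu(struct sparse_vcpu_set *s, unsigned int vcpu)
    {
            s->valid_bank_mask |= 1ULL << (vcpu / 64);
            s->bank[vcpu / 64] |= 1ULL << (vcpu % 64);
    }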


* Re: [PATCH v3 5/6] KVM: X86: Add NMI support to PV IPIs
  2018-07-20  8:04       ` Paolo Bonzini
@ 2018-07-20 13:26         ` Radim Krcmar
  2018-07-23  0:52           ` Wanpeng Li
  0 siblings, 1 reply; 27+ messages in thread
From: Radim Krcmar @ 2018-07-20 13:26 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: Wanpeng Li, LKML, kvm, Vitaly Kuznetsov

2018-07-20 10:04+0200, Paolo Bonzini:
> On 20/07/2018 05:53, Wanpeng Li wrote:
> >>> -     ret = kvm_hypercall3(KVM_HC_SEND_IPI, ipi_bitmap_low, ipi_bitmap_high, vector);
> >>> +     switch (vector) {
> >>> +     default:
> >>> +             icr = APIC_DM_FIXED | vector;
> >>> +             break;
> >>> +     case NMI_VECTOR:
> >>> +             icr = APIC_DM_NMI;
> >> I think it would be better to say that KVM interprets NMI_VECTOR and
> >> sends the interrupt as APIC_DM_NMI.
> 
> It's not KVM, this is arch/x86/kernel/kvm.c so the guest side.

Yes, I was preparing to drop the delivery mode bits as they are not
needed for NMI_VECTOR: a real APIC can't send or receive vectors 0-15,
so we could just say that NMI_VECTOR is sendable through this
interface and is delivered as APIC_DM_NMI.
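
In other words, the host side could interpret the vector roughly like
this (a sketch, not the merged code):

    struct kvm_lapic_irq irq = {
            .vector = vector,
            /* vectors 0-15 are not valid fixed vectors, so NMI_VECTOR
             * (2) can be repurposed to request an NMI delivery */
            .delivery_mode = vector == NMI_VECTOR ? APIC_DM_NMI
                                                  : APIC_DM_FIXED,
    };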

I didn't realize that this allows the other delivery modes.  They are
not useful [1], but we don't really have a better use for the three
bits.  The only other option so far is extending the cluster index to
increase addressability ... I'm starting to reconsider, as addressing
another few million VCPUs would probably be even more useless.

Wanpeng, sorry for the confusion, feel free to use any version.
Document the hypercall layout, though,

thanks.

---
1: I think that OSes will use unicast INIT+SIPI as it's easier to call
   it after preparing structures for the new CPU;  SMI is pointless;
   LowestPriority shouldn't be supported; and Reserved (APIC_DM_REMRD)
   is reused for KVM unhalt, where unicast is most likely.


* Re: [PATCH v3 5/6] KVM: X86: Add NMI support to PV IPIs
  2018-07-20 13:26         ` Radim Krcmar
@ 2018-07-23  0:52           ` Wanpeng Li
  0 siblings, 0 replies; 27+ messages in thread
From: Wanpeng Li @ 2018-07-23  0:52 UTC (permalink / raw)
  To: Radim Krcmar; +Cc: Paolo Bonzini, LKML, kvm, Vitaly Kuznetsov

On Fri, 20 Jul 2018 at 21:27, Radim Krcmar <rkrcmar@redhat.com> wrote:
>
> 2018-07-20 10:04+0200, Paolo Bonzini:
> > On 20/07/2018 05:53, Wanpeng Li wrote:
> > >>> -     ret = kvm_hypercall3(KVM_HC_SEND_IPI, ipi_bitmap_low, ipi_bitmap_high, vector);
> > >>> +     switch (vector) {
> > >>> +     default:
> > >>> +             icr = APIC_DM_FIXED | vector;
> > >>> +             break;
> > >>> +     case NMI_VECTOR:
> > >>> +             icr = APIC_DM_NMI;
> > >> I think it would be better to say that KVM interprets NMI_VECTOR and
> > >> sends the interrupt as APIC_DM_NMI.
> >
> > It's not KVM, this is arch/x86/kernel/kvm.c so the guest side.
>
> Yes, I was preparing to drop the delivery mode bits as they are not
> needed for NMI_VECTOR: a real APIC can't send or receive vectors 0-15,
> so we could just say that NMI_VECTOR is sendable through this
> interface and is delivered as APIC_DM_NMI.
>
> I didn't realize that this allows the other delivery modes.  They are
> not useful [1], but we don't really have a better use for the three
> bits.  The only other option so far is extending the cluster index to
> increase addressability ... I'm starting to reconsider, as addressing
> another few million VCPUs would probably be even more useless.
>
> Wanpeng, sorry for the confusion, feel free to use any version.
> Document the hypercall layout, though,

No problem, you and Paolo always help me a lot, thanks for that! :)

Regards,
Wanpeng Li

>
> thanks.
>
> ---
> 1: I think that OSes will use unicast INIT+SIPI as it's easier to call
>    it after preparing structures for the new CPU;  SMI is pointless;
>    LowestPriority shouldn't be supported; and Reserved (APIC_DM_REMRD)
>    is reused for KVM unhalt, where unicast is most likely.


end of thread

Thread overview: 27+ messages
2018-07-03  6:21 [PATCH v3 0/6] KVM: X86: Implement PV IPIs support Wanpeng Li
2018-07-03  6:21 ` [PATCH v3 1/6] KVM: X86: Add kvm hypervisor init time platform setup callback Wanpeng Li
2018-07-03  6:21 ` [PATCH v3 2/6] KVM: X86: Implement PV IPIs in linux guest Wanpeng Li
2018-07-19 16:28   ` Radim Krčmář
2018-07-19 16:47     ` Paolo Bonzini
2018-07-19 17:22       ` Radim Krčmář
2018-07-20  3:35       ` Wanpeng Li
2018-07-20  5:58       ` Wanpeng Li
2018-07-20  8:06         ` Paolo Bonzini
2018-07-20  3:33     ` Wanpeng Li
2018-07-20  9:51       ` Radim Krcmar
2018-07-20 10:17         ` Wanpeng Li
2018-07-19 23:05   ` David Matlack
2018-07-20  3:45     ` Wanpeng Li
2018-07-20 13:12       ` Radim Krcmar
2018-07-03  6:21 ` [PATCH v3 3/6] KVM: X86: Fallback to original apic hooks when bad happens Wanpeng Li
2018-07-03  6:21 ` [PATCH v3 4/6] KVM: X86: Implement PV IPIs send hypercall Wanpeng Li
2018-07-19 16:47   ` Paolo Bonzini
2018-07-20  3:49     ` Wanpeng Li
2018-07-03  6:21 ` [PATCH v3 5/6] KVM: X86: Add NMI support to PV IPIs Wanpeng Li
2018-07-19 16:31   ` Radim Krčmář
2018-07-20  3:53     ` Wanpeng Li
2018-07-20  8:04       ` Paolo Bonzini
2018-07-20 13:26         ` Radim Krcmar
2018-07-23  0:52           ` Wanpeng Li
2018-07-03  6:21 ` [PATCH v3 6/6] KVM: X86: Expose PV_SEND_IPI CPUID feature bit to guest Wanpeng Li
2018-07-18  3:00 ` [PATCH v3 0/6] KVM: X86: Implement PV IPIs support Wanpeng Li
