linux-kernel.vger.kernel.org archive mirror
* [PATCH v2 0/6] x86/kvm: add boot parameters for max vcpu configs
@ 2021-09-03 13:08 Juergen Gross
  2021-09-03 13:08 ` [PATCH v2 1/6] x86/kvm: remove non-x86 stuff from arch/x86/kvm/ioapic.h Juergen Gross
                   ` (5 more replies)
  0 siblings, 6 replies; 23+ messages in thread
From: Juergen Gross @ 2021-09-03 13:08 UTC (permalink / raw)
  To: kvm, x86, linux-kernel, linux-doc, linux-arm-kernel
  Cc: maz, ehabkost, Juergen Gross, Paolo Bonzini, Sean Christopherson,
	Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, H. Peter Anvin,
	Jonathan Corbet, James Morse, Alexandru Elisei, Suzuki K Poulose,
	Catalin Marinas, Will Deacon, kvmarm

In order to be able to have a single kernel supporting even huge
numbers of vcpus per guest, some arrays should be sized dynamically.

The easiest way to do that is to add boot parameters for the maximum
number of vcpus and to calculate the maximum vcpu-id from that using
either the host topology or a topology hint via another boot parameter.
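
As an illustration (parameter names as introduced by patches 2 and 6
below; the values are made up): booting the host with

  kvm.max_vcpus=1024 kvm.vcpu_id_add_bits=2

would result in KVM_MAX_VCPUS = 1024 and a maximum vcpu-id of
(1 << 12) - 1 = 4095 (10 bits for 1024 vcpus plus 2 additional bits).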

This patch series does that for x86. The same scheme can easily be
adapted to other architectures, but I don't want to do that in the
first iteration.

In the long term I'd suggest having a per-guest setting of the two
parameters, allowing some memory to be saved for smaller guests. OTOH
this would require new ioctl()s and respective qemu modifications, so
I have left those out for now.

I've tested that the series doesn't break normal guest operation and
that the new parameters are effective on x86. For Arm64 I did a
compile test only.

Changes in V2:
- removed old patch 1, as already applied
- patch 1 (old patch 2) only for reference, as the patch is already in
  the kvm tree
- switch patch 2 (old patch 3) to calculate vcpu-id
- added patch 4

Juergen Gross (6):
  x86/kvm: remove non-x86 stuff from arch/x86/kvm/ioapic.h
  x86/kvm: add boot parameter for adding vcpu-id bits
  x86/kvm: introduce per cpu vcpu masks
  kvm: use kvfree() in kvm_arch_free_vm()
  kvm: allocate vcpu pointer array separately
  x86/kvm: add boot parameter for setting max number of vcpus per guest

 .../admin-guide/kernel-parameters.txt         | 25 ++++++
 arch/arm64/include/asm/kvm_host.h             |  1 -
 arch/arm64/kvm/arm.c                          | 23 ++++--
 arch/x86/include/asm/kvm_host.h               | 26 +++++--
 arch/x86/kvm/hyperv.c                         | 25 ++++--
 arch/x86/kvm/ioapic.c                         | 12 ++-
 arch/x86/kvm/ioapic.h                         |  8 +-
 arch/x86/kvm/irq_comm.c                       |  9 ++-
 arch/x86/kvm/x86.c                            | 78 ++++++++++++++++++-
 include/linux/kvm_host.h                      | 26 ++++++-
 10 files changed, 198 insertions(+), 35 deletions(-)

-- 
2.26.2



* [PATCH v2 1/6] x86/kvm: remove non-x86 stuff from arch/x86/kvm/ioapic.h
  2021-09-03 13:08 [PATCH v2 0/6] x86/kvm: add boot parameters for max vcpu configs Juergen Gross
@ 2021-09-03 13:08 ` Juergen Gross
  2021-09-03 13:08 ` [PATCH v2 2/6] x86/kvm: add boot parameter for adding vcpu-id bits Juergen Gross
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 23+ messages in thread
From: Juergen Gross @ 2021-09-03 13:08 UTC (permalink / raw)
  To: kvm, x86, linux-kernel
  Cc: maz, ehabkost, Juergen Gross, Paolo Bonzini, Sean Christopherson,
	Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, H. Peter Anvin

The file was moved to arch/x86 a long time ago. Time to get rid of
the non-x86 stuff.

Signed-off-by: Juergen Gross <jgross@suse.com>
---
V2:
- patch only kept for reference, has already been added to kvm.git
---
 arch/x86/kvm/ioapic.h | 4 ----
 1 file changed, 4 deletions(-)

diff --git a/arch/x86/kvm/ioapic.h b/arch/x86/kvm/ioapic.h
index 11e4065e1617..bbd4a5d18b5d 100644
--- a/arch/x86/kvm/ioapic.h
+++ b/arch/x86/kvm/ioapic.h
@@ -35,11 +35,7 @@ struct kvm_vcpu;
 #define	IOAPIC_INIT			0x5
 #define	IOAPIC_EXTINT			0x7
 
-#ifdef CONFIG_X86
 #define RTC_GSI 8
-#else
-#define RTC_GSI -1U
-#endif
 
 struct dest_map {
 	/* vcpu bitmap where IRQ has been sent */
-- 
2.26.2



* [PATCH v2 2/6] x86/kvm: add boot parameter for adding vcpu-id bits
  2021-09-03 13:08 [PATCH v2 0/6] x86/kvm: add boot parameters for max vcpu configs Juergen Gross
  2021-09-03 13:08 ` [PATCH v2 1/6] x86/kvm: remove non-x86 stuff from arch/x86/kvm/ioapic.h Juergen Gross
@ 2021-09-03 13:08 ` Juergen Gross
  2021-09-03 13:43   ` Vitaly Kuznetsov
                     ` (2 more replies)
  2021-09-03 13:08 ` [PATCH v2 3/6] x86/kvm: introduce per cpu vcpu masks Juergen Gross
                   ` (3 subsequent siblings)
  5 siblings, 3 replies; 23+ messages in thread
From: Juergen Gross @ 2021-09-03 13:08 UTC (permalink / raw)
  To: kvm, x86, linux-doc, linux-kernel
  Cc: maz, ehabkost, Juergen Gross, Jonathan Corbet, Paolo Bonzini,
	Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	H. Peter Anvin

Today the maximum vcpu-id of a kvm guest's vcpu on x86 systems is set
via a #define in a header file.

In order to support higher vcpu-ids without generally increasing the
memory consumption of guests on the host (some guest structures contain
arrays sized by KVM_MAX_VCPU_ID) add a boot parameter for adding some
bits to the vcpu-id. Additional bits are needed as the vcpu-id is
constructed via bit-wise concatenation of socket-id, core-id, etc.
As those ids' maximum values are not always a power of 2, the vcpu-ids
are sparse.

The additional number of bits needed is basically the number of
topology levels with a non-power-of-2 maximum value, excluding the top
most level.
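
A worked example (topology values invented for illustration):

  threads per core: 2   -> 1 bit, dense (power of 2)
  cores per die:    18  -> fls(18 - 1) = 5 bits, sparse
  dies per package: 3   -> fls(3 - 1)  = 2 bits, sparse

Two levels below the top most one have a non-power-of-2 maximum value,
so two additional vcpu-id bits are needed in this example.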

The default value of the new parameter will be to take the correct
setting from the host's topology.

Calculating the maximum vcpu-id dynamically requires the arrays sized
by KVM_MAX_VCPU_ID to be allocated dynamically.

Signed-off-by: Juergen Gross <jgross@suse.com>
---
V2:
- switch to specifying additional bits (based on comment by Vitaly
  Kuznetsov)

Signed-off-by: Juergen Gross <jgross@suse.com>
---
 .../admin-guide/kernel-parameters.txt         | 18 ++++++++++++
 arch/x86/include/asm/kvm_host.h               |  4 ++-
 arch/x86/kvm/ioapic.c                         | 12 +++++++-
 arch/x86/kvm/ioapic.h                         |  4 +--
 arch/x86/kvm/x86.c                            | 29 +++++++++++++++++++
 5 files changed, 63 insertions(+), 4 deletions(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 84dc5790741b..37e194299311 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -2435,6 +2435,24 @@
 			feature (tagged TLBs) on capable Intel chips.
 			Default is 1 (enabled)
 
+	kvm.vcpu_id_add_bits=
+			[KVM,X86] The vcpu-ids of guests are sparse, as they
+			are constructed by bit-wise concatenation of the ids of
+			the different topology levels (sockets, cores, threads).
+
+			This parameter specifies how many additional bits the
+			maximum vcpu-id needs compared to the maximum number of
+			vcpus.
+
+			Normally this value is the number of topology levels
+			without the threads level and without the highest
+			level.
+
+			The special value -1 can be used to support guests
+			with the same topology as the host.
+
+			Default: -1
+
 	l1d_flush=	[X86,INTEL]
 			Control mitigation for L1D based snooping vulnerability.
 
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index af6ce8d4c86a..3513edee8e22 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -39,7 +39,7 @@
 
 #define KVM_MAX_VCPUS 288
 #define KVM_SOFT_MAX_VCPUS 240
-#define KVM_MAX_VCPU_ID 1023
+#define KVM_MAX_VCPU_ID kvm_max_vcpu_id()
 /* memory slots that are not exposed to userspace */
 #define KVM_PRIVATE_MEM_SLOTS 3
 
@@ -1588,6 +1588,8 @@ extern u64  kvm_max_tsc_scaling_ratio;
 extern u64  kvm_default_tsc_scaling_ratio;
 /* bus lock detection supported? */
 extern bool kvm_has_bus_lock_exit;
+/* maximum vcpu-id */
+unsigned int kvm_max_vcpu_id(void);
 
 extern u64 kvm_mce_cap_supported;
 
diff --git a/arch/x86/kvm/ioapic.c b/arch/x86/kvm/ioapic.c
index ff005fe738a4..52e8ea90c914 100644
--- a/arch/x86/kvm/ioapic.c
+++ b/arch/x86/kvm/ioapic.c
@@ -685,11 +685,21 @@ static const struct kvm_io_device_ops ioapic_mmio_ops = {
 int kvm_ioapic_init(struct kvm *kvm)
 {
 	struct kvm_ioapic *ioapic;
+	size_t sz;
 	int ret;
 
-	ioapic = kzalloc(sizeof(struct kvm_ioapic), GFP_KERNEL_ACCOUNT);
+	sz = sizeof(struct kvm_ioapic) +
+	     sizeof(*ioapic->rtc_status.dest_map.map) *
+		    BITS_TO_LONGS(KVM_MAX_VCPU_ID + 1) +
+	     sizeof(*ioapic->rtc_status.dest_map.vectors) *
+		    (KVM_MAX_VCPU_ID + 1);
+	ioapic = kzalloc(sz, GFP_KERNEL_ACCOUNT);
 	if (!ioapic)
 		return -ENOMEM;
+	ioapic->rtc_status.dest_map.map = (void *)(ioapic + 1);
+	ioapic->rtc_status.dest_map.vectors =
+		(void *)(ioapic->rtc_status.dest_map.map +
+			 BITS_TO_LONGS(KVM_MAX_VCPU_ID + 1));
 	spin_lock_init(&ioapic->lock);
 	INIT_DELAYED_WORK(&ioapic->eoi_inject, kvm_ioapic_eoi_inject_work);
 	kvm->arch.vioapic = ioapic;
diff --git a/arch/x86/kvm/ioapic.h b/arch/x86/kvm/ioapic.h
index bbd4a5d18b5d..623a3c5afad7 100644
--- a/arch/x86/kvm/ioapic.h
+++ b/arch/x86/kvm/ioapic.h
@@ -39,13 +39,13 @@ struct kvm_vcpu;
 
 struct dest_map {
 	/* vcpu bitmap where IRQ has been sent */
-	DECLARE_BITMAP(map, KVM_MAX_VCPU_ID + 1);
+	unsigned long *map;
 
 	/*
 	 * Vector sent to a given vcpu, only valid when
 	 * the vcpu's bit in map is set
 	 */
-	u8 vectors[KVM_MAX_VCPU_ID + 1];
+	u8 *vectors;
 };
 
 
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index e5d5c5ed7dd4..6b6f38f0b617 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -78,6 +78,7 @@
 #include <asm/intel_pt.h>
 #include <asm/emulate_prefix.h>
 #include <asm/sgx.h>
+#include <asm/topology.h>
 #include <clocksource/hyperv_timer.h>
 
 #define CREATE_TRACE_POINTS
@@ -184,6 +185,34 @@ module_param(force_emulation_prefix, bool, S_IRUGO);
 int __read_mostly pi_inject_timer = -1;
 module_param(pi_inject_timer, bint, S_IRUGO | S_IWUSR);
 
+static int __read_mostly vcpu_id_add_bits = -1;
+module_param(vcpu_id_add_bits, int, S_IRUGO);
+
+unsigned int kvm_max_vcpu_id(void)
+{
+	int n_bits = fls(KVM_MAX_VCPUS - 1);
+
+	if (vcpu_id_add_bits < -1 || vcpu_id_add_bits > (32 - n_bits)) {
+		pr_err("Invalid value of vcpu_id_add_bits=%d parameter!\n",
+		       vcpu_id_add_bits);
+		vcpu_id_add_bits = -1;
+	}
+
+	if (vcpu_id_add_bits >= 0) {
+		n_bits += vcpu_id_add_bits;
+	} else {
+		n_bits++;		/* One additional bit for core level. */
+		if (topology_max_die_per_package() > 1)
+			n_bits++;	/* One additional bit for die level. */
+	}
+
+	if (!n_bits)
+		n_bits = 1;
+
+	return (1U << n_bits) - 1;
+}
+EXPORT_SYMBOL_GPL(kvm_max_vcpu_id);
+
 /*
  * Restoring the host value for MSRs that are only consumed when running in
  * usermode, e.g. SYSCALL MSRs and TSC_AUX, can be deferred until the CPU
-- 
2.26.2



* [PATCH v2 3/6] x86/kvm: introduce per cpu vcpu masks
  2021-09-03 13:08 [PATCH v2 0/6] x86/kvm: add boot parameters for max vcpu configs Juergen Gross
  2021-09-03 13:08 ` [PATCH v2 1/6] x86/kvm: remove non-x86 stuff from arch/x86/kvm/ioapic.h Juergen Gross
  2021-09-03 13:08 ` [PATCH v2 2/6] x86/kvm: add boot parameter for adding vcpu-id bits Juergen Gross
@ 2021-09-03 13:08 ` Juergen Gross
  2021-09-03 16:05   ` Eduardo Habkost
  2021-09-07 18:34   ` Eduardo Habkost
  2021-09-03 13:08 ` [PATCH v2 4/6] kvm: use kvfree() in kvm_arch_free_vm() Juergen Gross
                   ` (2 subsequent siblings)
  5 siblings, 2 replies; 23+ messages in thread
From: Juergen Gross @ 2021-09-03 13:08 UTC (permalink / raw)
  To: kvm, x86, linux-kernel
  Cc: maz, ehabkost, Juergen Gross, Paolo Bonzini, Sean Christopherson,
	Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, H. Peter Anvin

In order to support high vcpu numbers per guest, don't use on-stack
vcpu bitmasks. As none of the currently used bitmasks is used in
functions subject to recursion, it is fairly easy to replace them with
percpu bitmasks.

Take a local lock while such a bitmask is being used in order to
avoid double usage in case we'd switch cpus.
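
Condensed, the usage pattern added by this patch is (sketch only,
mirroring the real code below):

	local_lock(&kvm_pcpu_mask_lock);
	vcpu_bitmap = this_cpu_ptr(kvm_pcpu_vcpu_mask);
	/* ... fill and consume the per cpu bitmap ... */
	local_unlock(&kvm_pcpu_mask_lock);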

Signed-off-by: Juergen Gross <jgross@suse.com>
---
V2:
- use local_lock() instead of preempt_disable() (Paolo Bonzini)
---
 arch/x86/include/asm/kvm_host.h | 10 ++++++++++
 arch/x86/kvm/hyperv.c           | 25 ++++++++++++++++++-------
 arch/x86/kvm/irq_comm.c         |  9 +++++++--
 arch/x86/kvm/x86.c              | 22 +++++++++++++++++++++-
 4 files changed, 56 insertions(+), 10 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 3513edee8e22..a809a9e4fa5c 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -15,6 +15,7 @@
 #include <linux/cpumask.h>
 #include <linux/irq_work.h>
 #include <linux/irq.h>
+#include <linux/local_lock.h>
 
 #include <linux/kvm.h>
 #include <linux/kvm_para.h>
@@ -1591,6 +1592,15 @@ extern bool kvm_has_bus_lock_exit;
 /* maximum vcpu-id */
 unsigned int kvm_max_vcpu_id(void);
 
+/* per cpu vcpu bitmasks, protected by kvm_pcpu_mask_lock */
+DECLARE_PER_CPU(local_lock_t, kvm_pcpu_mask_lock);
+extern unsigned long __percpu *kvm_pcpu_vcpu_mask;
+#define KVM_VCPU_MASK_SZ	\
+	(sizeof(*kvm_pcpu_vcpu_mask) * BITS_TO_LONGS(KVM_MAX_VCPUS))
+extern u64 __percpu *kvm_hv_vp_bitmap;
+#define KVM_HV_MAX_SPARSE_VCPU_SET_BITS DIV_ROUND_UP(KVM_MAX_VCPUS, 64)
+#define KVM_HV_VPMAP_SZ		(sizeof(u64) * KVM_HV_MAX_SPARSE_VCPU_SET_BITS)
+
 extern u64 kvm_mce_cap_supported;
 
 /*
diff --git a/arch/x86/kvm/hyperv.c b/arch/x86/kvm/hyperv.c
index 41d2a53c5dea..680743e43c5b 100644
--- a/arch/x86/kvm/hyperv.c
+++ b/arch/x86/kvm/hyperv.c
@@ -41,7 +41,7 @@
 /* "Hv#1" signature */
 #define HYPERV_CPUID_SIGNATURE_EAX 0x31237648
 
-#define KVM_HV_MAX_SPARSE_VCPU_SET_BITS DIV_ROUND_UP(KVM_MAX_VCPUS, 64)
+u64 __percpu *kvm_hv_vp_bitmap;
 
 static void stimer_mark_pending(struct kvm_vcpu_hv_stimer *stimer,
 				bool vcpu_kick);
@@ -1701,8 +1701,7 @@ static __always_inline unsigned long *sparse_set_to_vcpu_mask(
 	struct kvm_vcpu *vcpu;
 	int i, bank, sbank = 0;
 
-	memset(vp_bitmap, 0,
-	       KVM_HV_MAX_SPARSE_VCPU_SET_BITS * sizeof(*vp_bitmap));
+	memset(vp_bitmap, 0, KVM_HV_VPMAP_SZ);
 	for_each_set_bit(bank, (unsigned long *)&valid_bank_mask,
 			 KVM_HV_MAX_SPARSE_VCPU_SET_BITS)
 		vp_bitmap[bank] = sparse_banks[sbank++];
@@ -1740,8 +1739,8 @@ static u64 kvm_hv_flush_tlb(struct kvm_vcpu *vcpu, struct kvm_hv_hcall *hc, bool
 	struct kvm_vcpu_hv *hv_vcpu = to_hv_vcpu(vcpu);
 	struct hv_tlb_flush_ex flush_ex;
 	struct hv_tlb_flush flush;
-	u64 vp_bitmap[KVM_HV_MAX_SPARSE_VCPU_SET_BITS];
-	DECLARE_BITMAP(vcpu_bitmap, KVM_MAX_VCPUS);
+	u64 *vp_bitmap;
+	unsigned long *vcpu_bitmap;
 	unsigned long *vcpu_mask;
 	u64 valid_bank_mask;
 	u64 sparse_banks[64];
@@ -1821,6 +1820,10 @@ static u64 kvm_hv_flush_tlb(struct kvm_vcpu *vcpu, struct kvm_hv_hcall *hc, bool
 
 	cpumask_clear(&hv_vcpu->tlb_flush);
 
+	local_lock(&kvm_pcpu_mask_lock);
+	vcpu_bitmap = this_cpu_ptr(kvm_pcpu_vcpu_mask);
+	vp_bitmap = this_cpu_ptr(kvm_hv_vp_bitmap);
+
 	vcpu_mask = all_cpus ? NULL :
 		sparse_set_to_vcpu_mask(kvm, sparse_banks, valid_bank_mask,
 					vp_bitmap, vcpu_bitmap);
@@ -1832,6 +1835,8 @@ static u64 kvm_hv_flush_tlb(struct kvm_vcpu *vcpu, struct kvm_hv_hcall *hc, bool
 	kvm_make_vcpus_request_mask(kvm, KVM_REQ_TLB_FLUSH_GUEST,
 				    NULL, vcpu_mask, &hv_vcpu->tlb_flush);
 
+	local_unlock(&kvm_pcpu_mask_lock);
+
 ret_success:
 	/* We always do full TLB flush, set 'Reps completed' = 'Rep Count' */
 	return (u64)HV_STATUS_SUCCESS |
@@ -1862,8 +1867,8 @@ static u64 kvm_hv_send_ipi(struct kvm_vcpu *vcpu, struct kvm_hv_hcall *hc, bool
 	struct kvm *kvm = vcpu->kvm;
 	struct hv_send_ipi_ex send_ipi_ex;
 	struct hv_send_ipi send_ipi;
-	u64 vp_bitmap[KVM_HV_MAX_SPARSE_VCPU_SET_BITS];
-	DECLARE_BITMAP(vcpu_bitmap, KVM_MAX_VCPUS);
+	u64 *vp_bitmap;
+	unsigned long *vcpu_bitmap;
 	unsigned long *vcpu_mask;
 	unsigned long valid_bank_mask;
 	u64 sparse_banks[64];
@@ -1920,12 +1925,18 @@ static u64 kvm_hv_send_ipi(struct kvm_vcpu *vcpu, struct kvm_hv_hcall *hc, bool
 	if ((vector < HV_IPI_LOW_VECTOR) || (vector > HV_IPI_HIGH_VECTOR))
 		return HV_STATUS_INVALID_HYPERCALL_INPUT;
 
+	local_lock(&kvm_pcpu_mask_lock);
+	vcpu_bitmap = this_cpu_ptr(kvm_pcpu_vcpu_mask);
+	vp_bitmap = this_cpu_ptr(kvm_hv_vp_bitmap);
+
 	vcpu_mask = all_cpus ? NULL :
 		sparse_set_to_vcpu_mask(kvm, sparse_banks, valid_bank_mask,
 					vp_bitmap, vcpu_bitmap);
 
 	kvm_send_ipi_to_many(kvm, vector, vcpu_mask);
 
+	local_unlock(&kvm_pcpu_mask_lock);
+
 ret_success:
 	return HV_STATUS_SUCCESS;
 }
diff --git a/arch/x86/kvm/irq_comm.c b/arch/x86/kvm/irq_comm.c
index d5b72a08e566..c331204de007 100644
--- a/arch/x86/kvm/irq_comm.c
+++ b/arch/x86/kvm/irq_comm.c
@@ -47,7 +47,7 @@ int kvm_irq_delivery_to_apic(struct kvm *kvm, struct kvm_lapic *src,
 {
 	int i, r = -1;
 	struct kvm_vcpu *vcpu, *lowest = NULL;
-	unsigned long dest_vcpu_bitmap[BITS_TO_LONGS(KVM_MAX_VCPUS)];
+	unsigned long *dest_vcpu_bitmap;
 	unsigned int dest_vcpus = 0;
 
 	if (kvm_irq_delivery_to_apic_fast(kvm, src, irq, &r, dest_map))
@@ -59,7 +59,10 @@ int kvm_irq_delivery_to_apic(struct kvm *kvm, struct kvm_lapic *src,
 		irq->delivery_mode = APIC_DM_FIXED;
 	}
 
-	memset(dest_vcpu_bitmap, 0, sizeof(dest_vcpu_bitmap));
+	local_lock(&kvm_pcpu_mask_lock);
+	dest_vcpu_bitmap = this_cpu_ptr(kvm_pcpu_vcpu_mask);
+
+	memset(dest_vcpu_bitmap, 0, KVM_VCPU_MASK_SZ);
 
 	kvm_for_each_vcpu(i, vcpu, kvm) {
 		if (!kvm_apic_present(vcpu))
@@ -93,6 +96,8 @@ int kvm_irq_delivery_to_apic(struct kvm *kvm, struct kvm_lapic *src,
 		lowest = kvm_get_vcpu(kvm, idx);
 	}
 
+	local_unlock(&kvm_pcpu_mask_lock);
+
 	if (lowest)
 		r = kvm_apic_set_irq(lowest, irq, dest_map);
 
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 6b6f38f0b617..fd19b72a5733 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -213,6 +213,10 @@ unsigned int kvm_max_vcpu_id(void)
 }
 EXPORT_SYMBOL_GPL(kvm_max_vcpu_id);
 
+DEFINE_PER_CPU(local_lock_t, kvm_pcpu_mask_lock) =
+	INIT_LOCAL_LOCK(kvm_pcpu_mask_lock);
+unsigned long __percpu *kvm_pcpu_vcpu_mask;
+
 /*
  * Restoring the host value for MSRs that are only consumed when running in
  * usermode, e.g. SYSCALL MSRs and TSC_AUX, can be deferred until the CPU
@@ -11029,9 +11033,18 @@ int kvm_arch_hardware_setup(void *opaque)
 	if (boot_cpu_has(X86_FEATURE_XSAVES))
 		rdmsrl(MSR_IA32_XSS, host_xss);
 
+	kvm_pcpu_vcpu_mask = __alloc_percpu(KVM_VCPU_MASK_SZ,
+					    sizeof(unsigned long));
+	kvm_hv_vp_bitmap = __alloc_percpu(KVM_HV_VPMAP_SZ, sizeof(u64));
+
+	if (!kvm_pcpu_vcpu_mask || !kvm_hv_vp_bitmap) {
+		r = -ENOMEM;
+		goto err;
+	}
+
 	r = ops->hardware_setup();
 	if (r != 0)
-		return r;
+		goto err;
 
 	memcpy(&kvm_x86_ops, ops->runtime_ops, sizeof(kvm_x86_ops));
 	kvm_ops_static_call_update();
@@ -11059,11 +11072,18 @@ int kvm_arch_hardware_setup(void *opaque)
 
 	kvm_init_msr_list();
 	return 0;
+
+ err:
+	free_percpu(kvm_pcpu_vcpu_mask);
+	free_percpu(kvm_hv_vp_bitmap);
+	return r;
 }
 
 void kvm_arch_hardware_unsetup(void)
 {
 	static_call(kvm_x86_hardware_unsetup)();
+	free_percpu(kvm_pcpu_vcpu_mask);
+	free_percpu(kvm_hv_vp_bitmap);
 }
 
 int kvm_arch_check_processor_compat(void *opaque)
-- 
2.26.2



* [PATCH v2 4/6] kvm: use kvfree() in kvm_arch_free_vm()
  2021-09-03 13:08 [PATCH v2 0/6] x86/kvm: add boot parameters for max vcpu configs Juergen Gross
                   ` (2 preceding siblings ...)
  2021-09-03 13:08 ` [PATCH v2 3/6] x86/kvm: introduce per cpu vcpu masks Juergen Gross
@ 2021-09-03 13:08 ` Juergen Gross
  2021-09-28 16:48   ` Paolo Bonzini
  2021-09-03 13:08 ` [PATCH v2 5/6] kvm: allocate vcpu pointer array separately Juergen Gross
  2021-09-03 13:08 ` [PATCH v2 6/6] x86/kvm: add boot parameter for setting max number of vcpus per guest Juergen Gross
  5 siblings, 1 reply; 23+ messages in thread
From: Juergen Gross @ 2021-09-03 13:08 UTC (permalink / raw)
  To: kvm, x86, linux-arm-kernel, linux-kernel
  Cc: maz, ehabkost, Juergen Gross, James Morse, Alexandru Elisei,
	Suzuki K Poulose, Catalin Marinas, Will Deacon, Paolo Bonzini,
	Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	H. Peter Anvin, kvmarm

By switching from kfree() to kvfree() in kvm_arch_free_vm() Arm64 can
use the common variant. This can be accomplished by adding another
macro __KVM_HAVE_ARCH_VM_FREE, which will be used only by x86 for now.

Further simplification can be achieved by adding __kvm_arch_free_vm()
doing the common part.
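
For reference, kvfree() (mm/util.c) dispatches on the address type,
which is why one common free path can cover both the kzalloc() and the
vzalloc() case (sketch of the existing helper, quoted from memory):

	void kvfree(const void *addr)
	{
		if (is_vmalloc_addr(addr))
			vfree(addr);
		else
			kfree(addr);
	}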

Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
---
V2:
- new patch
---
 arch/arm64/include/asm/kvm_host.h | 1 -
 arch/arm64/kvm/arm.c              | 8 --------
 arch/x86/include/asm/kvm_host.h   | 2 ++
 arch/x86/kvm/x86.c                | 2 +-
 include/linux/kvm_host.h          | 9 ++++++++-
 5 files changed, 11 insertions(+), 11 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index 41911585ae0c..39601fb87e69 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -767,7 +767,6 @@ int kvm_set_ipa_limit(void);
 
 #define __KVM_HAVE_ARCH_VM_ALLOC
 struct kvm *kvm_arch_alloc_vm(void);
-void kvm_arch_free_vm(struct kvm *kvm);
 
 int kvm_arm_setup_stage2(struct kvm *kvm, unsigned long type);
 
diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index 0ca72f5cda41..38fff5963d9f 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -299,14 +299,6 @@ struct kvm *kvm_arch_alloc_vm(void)
 	return vzalloc(sizeof(struct kvm));
 }
 
-void kvm_arch_free_vm(struct kvm *kvm)
-{
-	if (!has_vhe())
-		kfree(kvm);
-	else
-		vfree(kvm);
-}
-
 int kvm_arch_vcpu_precreate(struct kvm *kvm, unsigned int id)
 {
 	if (irqchip_in_kernel(kvm) && vgic_initialized(kvm))
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index a809a9e4fa5c..f16fadfc030a 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1521,6 +1521,8 @@ static inline struct kvm *kvm_arch_alloc_vm(void)
 {
 	return __vmalloc(kvm_x86_ops.vm_size, GFP_KERNEL_ACCOUNT | __GFP_ZERO);
 }
+
+#define __KVM_HAVE_ARCH_VM_FREE
 void kvm_arch_free_vm(struct kvm *kvm);
 
 #define __KVM_HAVE_ARCH_FLUSH_REMOTE_TLB
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index fd19b72a5733..cc552763f0e4 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -11129,7 +11129,7 @@ void kvm_arch_sched_in(struct kvm_vcpu *vcpu, int cpu)
 void kvm_arch_free_vm(struct kvm *kvm)
 {
 	kfree(to_kvm_hv(kvm)->hv_pa_pg);
-	vfree(kvm);
+	__kvm_arch_free_vm(kvm);
 }
 
 
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index ae7735b490b4..d75e9c2a00b1 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -1044,10 +1044,17 @@ static inline struct kvm *kvm_arch_alloc_vm(void)
 {
 	return kzalloc(sizeof(struct kvm), GFP_KERNEL);
 }
+#endif
+
+static inline void __kvm_arch_free_vm(struct kvm *kvm)
+{
+	kvfree(kvm);
+}
 
+#ifndef __KVM_HAVE_ARCH_VM_FREE
 static inline void kvm_arch_free_vm(struct kvm *kvm)
 {
-	kfree(kvm);
+	__kvm_arch_free_vm(kvm);
 }
 #endif
 
-- 
2.26.2



* [PATCH v2 5/6] kvm: allocate vcpu pointer array separately
  2021-09-03 13:08 [PATCH v2 0/6] x86/kvm: add boot parameters for max vcpu configs Juergen Gross
                   ` (3 preceding siblings ...)
  2021-09-03 13:08 ` [PATCH v2 4/6] kvm: use kvfree() in kvm_arch_free_vm() Juergen Gross
@ 2021-09-03 13:08 ` Juergen Gross
  2021-09-03 14:41   ` Marc Zyngier
  2021-09-03 13:08 ` [PATCH v2 6/6] x86/kvm: add boot parameter for setting max number of vcpus per guest Juergen Gross
  5 siblings, 1 reply; 23+ messages in thread
From: Juergen Gross @ 2021-09-03 13:08 UTC (permalink / raw)
  To: kvm, x86, linux-arm-kernel, linux-kernel
  Cc: maz, ehabkost, Juergen Gross, James Morse, Alexandru Elisei,
	Suzuki K Poulose, Catalin Marinas, Will Deacon, Paolo Bonzini,
	Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	H. Peter Anvin, kvmarm

Prepare support of very large vcpu numbers per guest by moving the
vcpu pointer array out of struct kvm. Once the maximum number of vcpus
becomes runtime controlled (see the next patch), an embedded array can
no longer be sized statically.

Signed-off-by: Juergen Gross <jgross@suse.com>
---
V2:
- rebase to new kvm_arch_free_vm() implementation
---
 arch/arm64/kvm/arm.c            | 21 +++++++++++++++++++--
 arch/x86/include/asm/kvm_host.h |  5 +----
 arch/x86/kvm/x86.c              | 18 ++++++++++++++++++
 include/linux/kvm_host.h        | 17 +++++++++++++++--
 4 files changed, 53 insertions(+), 8 deletions(-)

diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index 38fff5963d9f..8bb5caeba007 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -293,10 +293,27 @@ long kvm_arch_dev_ioctl(struct file *filp,
 
 struct kvm *kvm_arch_alloc_vm(void)
 {
+	struct kvm *kvm;
+
+	if (!has_vhe())
+		kvm = kzalloc(sizeof(struct kvm), GFP_KERNEL);
+	else
+		kvm = vzalloc(sizeof(struct kvm));
+
+	if (!kvm)
+		return NULL;
+
 	if (!has_vhe())
-		return kzalloc(sizeof(struct kvm), GFP_KERNEL);
+		kvm->vcpus = kcalloc(KVM_MAX_VCPUS, sizeof(void *), GFP_KERNEL);
+	else
+		kvm->vcpus = vzalloc(KVM_MAX_VCPUS * sizeof(void *));
+
+	if (!kvm->vcpus) {
+		kvm_arch_free_vm(kvm);
+		kvm = NULL;
+	}
 
-	return vzalloc(sizeof(struct kvm));
+	return kvm;
 }
 
 int kvm_arch_vcpu_precreate(struct kvm *kvm, unsigned int id)
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index f16fadfc030a..6c28d0800208 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1517,10 +1517,7 @@ static inline void kvm_ops_static_call_update(void)
 }
 
 #define __KVM_HAVE_ARCH_VM_ALLOC
-static inline struct kvm *kvm_arch_alloc_vm(void)
-{
-	return __vmalloc(kvm_x86_ops.vm_size, GFP_KERNEL_ACCOUNT | __GFP_ZERO);
-}
+struct kvm *kvm_arch_alloc_vm(void);
 
 #define __KVM_HAVE_ARCH_VM_FREE
 void kvm_arch_free_vm(struct kvm *kvm);
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index cc552763f0e4..ff142b6dd00c 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -11126,6 +11126,24 @@ void kvm_arch_sched_in(struct kvm_vcpu *vcpu, int cpu)
 	static_call(kvm_x86_sched_in)(vcpu, cpu);
 }
 
+struct kvm *kvm_arch_alloc_vm(void)
+{
+	struct kvm *kvm;
+
+	kvm = __vmalloc(kvm_x86_ops.vm_size, GFP_KERNEL_ACCOUNT | __GFP_ZERO);
+	if (!kvm)
+		return NULL;
+
+	kvm->vcpus = __vmalloc(KVM_MAX_VCPUS * sizeof(void *),
+			       GFP_KERNEL_ACCOUNT | __GFP_ZERO);
+	if (!kvm->vcpus) {
+		vfree(kvm);
+		kvm = NULL;
+	}
+
+	return kvm;
+}
+
 void kvm_arch_free_vm(struct kvm *kvm)
 {
 	kfree(to_kvm_hv(kvm)->hv_pa_pg);
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index d75e9c2a00b1..9e2a5f1c6f54 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -536,7 +536,7 @@ struct kvm {
 	struct mutex slots_arch_lock;
 	struct mm_struct *mm; /* userspace tied to this vm */
 	struct kvm_memslots __rcu *memslots[KVM_ADDRESS_SPACE_NUM];
-	struct kvm_vcpu *vcpus[KVM_MAX_VCPUS];
+	struct kvm_vcpu **vcpus;
 
 	/*
 	 * created_vcpus is protected by kvm->lock, and is incremented
@@ -1042,12 +1042,25 @@ void kvm_arch_pre_destroy_vm(struct kvm *kvm);
  */
 static inline struct kvm *kvm_arch_alloc_vm(void)
 {
-	return kzalloc(sizeof(struct kvm), GFP_KERNEL);
+	struct kvm *kvm = kzalloc(sizeof(struct kvm), GFP_KERNEL);
+
+	if (!kvm)
+		return NULL;
+
+	kvm->vcpus = kcalloc(KVM_MAX_VCPUS, sizeof(void *), GFP_KERNEL);
+	if (!kvm->vcpus) {
+		kfree(kvm);
+		kvm = NULL;
+	}
+
+	return kvm;
 }
 #endif
 
 static inline void __kvm_arch_free_vm(struct kvm *kvm)
 {
+	if (kvm)
+		kvfree(kvm->vcpus);
 	kvfree(kvm);
 }
 
-- 
2.26.2



* [PATCH v2 6/6] x86/kvm: add boot parameter for setting max number of vcpus per guest
  2021-09-03 13:08 [PATCH v2 0/6] x86/kvm: add boot parameters for max vcpu configs Juergen Gross
                   ` (4 preceding siblings ...)
  2021-09-03 13:08 ` [PATCH v2 5/6] kvm: allocate vcpu pointer array separately Juergen Gross
@ 2021-09-03 13:08 ` Juergen Gross
  2021-09-06  0:45   ` Yao Yuan
  5 siblings, 1 reply; 23+ messages in thread
From: Juergen Gross @ 2021-09-03 13:08 UTC (permalink / raw)
  To: kvm, x86, linux-doc, linux-kernel
  Cc: maz, ehabkost, Juergen Gross, Jonathan Corbet, Paolo Bonzini,
	Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	H. Peter Anvin

Today the maximum number of vcpus of a kvm guest is set via a #define
in a header file.

In order to support higher vcpu numbers for guests without generally
increasing the memory consumption of guests on the host, especially
on very large systems, add a boot parameter for specifying the number
of allowed vcpus for guests.

The default will still be the current setting of 288. The special
value 0 limits the number of possible vcpus to the number of possible
cpus of the host.
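
A usage example (the value is arbitrary): booting with

  kvm.max_vcpus=2048

would allow guests with up to 2048 vcpus, while kvm.max_vcpus=0 would
cap guests at the number of possible cpus of the host.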

Signed-off-by: Juergen Gross <jgross@suse.com>
---
 Documentation/admin-guide/kernel-parameters.txt | 7 +++++++
 arch/x86/include/asm/kvm_host.h                 | 5 ++++-
 arch/x86/kvm/x86.c                              | 9 ++++++++-
 3 files changed, 19 insertions(+), 2 deletions(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 37e194299311..b9641c9989ef 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -2435,6 +2435,13 @@
 			feature (tagged TLBs) on capable Intel chips.
 			Default is 1 (enabled)
 
+	kvm.max_vcpus=	[KVM,X86] Set the maximum allowed number of vcpus per
+			guest. The special value 0 sets the limit to the number
+			of physical cpus possible on the host (including not
+			yet hotplugged cpus). Higher values will result in
+			slightly higher memory consumption per guest.
+			Default: 288
+
 	kvm.vcpu_id_add_bits=
 			[KVM,X86] The vcpu-ids of guests are sparse, as they
 			are constructed by bit-wise concatenation of the ids of
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 6c28d0800208..a4ab387b0e1c 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -38,7 +38,8 @@
 
 #define __KVM_HAVE_ARCH_VCPU_DEBUGFS
 
-#define KVM_MAX_VCPUS 288
+#define KVM_DEFAULT_MAX_VCPUS 288
+#define KVM_MAX_VCPUS max_vcpus
 #define KVM_SOFT_MAX_VCPUS 240
 #define KVM_MAX_VCPU_ID kvm_max_vcpu_id()
 /* memory slots that are not exposed to userspace */
@@ -1588,6 +1589,8 @@ extern u64  kvm_max_tsc_scaling_ratio;
 extern u64  kvm_default_tsc_scaling_ratio;
 /* bus lock detection supported? */
 extern bool kvm_has_bus_lock_exit;
+/* maximum number of vcpus per guest */
+extern unsigned int max_vcpus;
 /* maximum vcpu-id */
 unsigned int kvm_max_vcpu_id(void);
 
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index ff142b6dd00c..49c3d91c559e 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -188,9 +188,13 @@ module_param(pi_inject_timer, bint, S_IRUGO | S_IWUSR);
 static int __read_mostly vcpu_id_add_bits = -1;
 module_param(vcpu_id_add_bits, int, S_IRUGO);
 
+unsigned int __read_mostly max_vcpus = KVM_DEFAULT_MAX_VCPUS;
+module_param(max_vcpus, uint, S_IRUGO);
+EXPORT_SYMBOL_GPL(max_vcpus);
+
 unsigned int kvm_max_vcpu_id(void)
 {
-	int n_bits = fls(KVM_MAX_VCPUS - 1);
+	int n_bits = fls(max_vcpus - 1);
 
 	if (vcpu_id_add_bits < -1 || vcpu_id_add_bits > (32 - n_bits)) {
 		pr_err("Invalid value of vcpu_id_add_bits=%d parameter!\n",
@@ -11033,6 +11037,9 @@ int kvm_arch_hardware_setup(void *opaque)
 	if (boot_cpu_has(X86_FEATURE_XSAVES))
 		rdmsrl(MSR_IA32_XSS, host_xss);
 
+	if (max_vcpus == 0)
+		max_vcpus = num_possible_cpus();
+
 	kvm_pcpu_vcpu_mask = __alloc_percpu(KVM_VCPU_MASK_SZ,
 					    sizeof(unsigned long));
 	kvm_hv_vp_bitmap = __alloc_percpu(KVM_HV_VPMAP_SZ, sizeof(u64));
-- 
2.26.2



* Re: [PATCH v2 2/6] x86/kvm: add boot parameter for adding vcpu-id bits
  2021-09-03 13:08 ` [PATCH v2 2/6] x86/kvm: add boot parameter for adding vcpu-id bits Juergen Gross
@ 2021-09-03 13:43   ` Vitaly Kuznetsov
  2021-09-03 13:53     ` Juergen Gross
  2021-09-03 19:48   ` Eduardo Habkost
  2021-09-28 16:41   ` Paolo Bonzini
  2 siblings, 1 reply; 23+ messages in thread
From: Vitaly Kuznetsov @ 2021-09-03 13:43 UTC (permalink / raw)
  To: Juergen Gross, kvm, x86, linux-doc, linux-kernel
  Cc: maz, ehabkost, Juergen Gross, Jonathan Corbet, Paolo Bonzini,
	Sean Christopherson, Wanpeng Li, Jim Mattson, Joerg Roedel,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, H. Peter Anvin

Juergen Gross <jgross@suse.com> writes:

> Today the maximum vcpu-id of a kvm guest's vcpu on x86 systems is set
> via a #define in a header file.
>
> In order to support higher vcpu-ids without generally increasing the
> memory consumption of guests on the host (some guest structures contain
> arrays sized by KVM_MAX_VCPU_ID) add a boot parameter for adding some
> bits to the vcpu-id. Additional bits are needed as the vcpu-id is
> constructed via bit-wise concatenation of socket-id, core-id, etc.
> As those ids' maximum values are not always a power of 2, the vcpu-ids
> are sparse.
>
> The additional number of bits needed is basically the number of
> topology levels with a non-power-of-2 maximum value, excluding the top
> most level.
>
> The default value of the new parameter will be to take the correct
> setting from the host's topology.
>
> Calculating the maximum vcpu-id dynamically requires the arrays sized
> by KVM_MAX_VCPU_ID to be allocated dynamically.
>
> Signed-off-by: Juergen Gross <jgross@suse.com>
> ---
> V2:
> - switch to specifying additional bits (based on comment by Vitaly
>   Kuznetsov)
>
> Signed-off-by: Juergen Gross <jgross@suse.com>
> ---
>  .../admin-guide/kernel-parameters.txt         | 18 ++++++++++++
>  arch/x86/include/asm/kvm_host.h               |  4 ++-
>  arch/x86/kvm/ioapic.c                         | 12 +++++++-
>  arch/x86/kvm/ioapic.h                         |  4 +--
>  arch/x86/kvm/x86.c                            | 29 +++++++++++++++++++
>  5 files changed, 63 insertions(+), 4 deletions(-)
>
> diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
> index 84dc5790741b..37e194299311 100644
> --- a/Documentation/admin-guide/kernel-parameters.txt
> +++ b/Documentation/admin-guide/kernel-parameters.txt
> @@ -2435,6 +2435,24 @@
>  			feature (tagged TLBs) on capable Intel chips.
>  			Default is 1 (enabled)
>  
> +	kvm.vcpu_id_add_bits=
> +			[KVM,X86] The vcpu-ids of guests are sparse, as they
> +			are constructed by bit-wise concatenation of the ids of
> +			the different topology levels (sockets, cores, threads).
> +
> +			This parameter specifies how many additional bits the
> +			maximum vcpu-id needs compared to the maximum number of
> +			vcpus.
> +
> +			Normally this value is the number of topology levels
> +			without the threads level and without the highest
> +			level.
> +
> +			The special value -1 can be used to support guests
> +			with the same topology as the host.
> +
> +			Default: -1
> +
>  	l1d_flush=	[X86,INTEL]
>  			Control mitigation for L1D based snooping vulnerability.
>  
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index af6ce8d4c86a..3513edee8e22 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -39,7 +39,7 @@
>  
>  #define KVM_MAX_VCPUS 288
>  #define KVM_SOFT_MAX_VCPUS 240
> -#define KVM_MAX_VCPU_ID 1023
> +#define KVM_MAX_VCPU_ID kvm_max_vcpu_id()
>  /* memory slots that are not exposed to userspace */
>  #define KVM_PRIVATE_MEM_SLOTS 3
>  
> @@ -1588,6 +1588,8 @@ extern u64  kvm_max_tsc_scaling_ratio;
>  extern u64  kvm_default_tsc_scaling_ratio;
>  /* bus lock detection supported? */
>  extern bool kvm_has_bus_lock_exit;
> +/* maximum vcpu-id */
> +unsigned int kvm_max_vcpu_id(void);
>  
>  extern u64 kvm_mce_cap_supported;
>  
> diff --git a/arch/x86/kvm/ioapic.c b/arch/x86/kvm/ioapic.c
> index ff005fe738a4..52e8ea90c914 100644
> --- a/arch/x86/kvm/ioapic.c
> +++ b/arch/x86/kvm/ioapic.c
> @@ -685,11 +685,21 @@ static const struct kvm_io_device_ops ioapic_mmio_ops = {
>  int kvm_ioapic_init(struct kvm *kvm)
>  {
>  	struct kvm_ioapic *ioapic;
> +	size_t sz;
>  	int ret;
>  
> -	ioapic = kzalloc(sizeof(struct kvm_ioapic), GFP_KERNEL_ACCOUNT);
> +	sz = sizeof(struct kvm_ioapic) +
> +	     sizeof(*ioapic->rtc_status.dest_map.map) *
> +		    BITS_TO_LONGS(KVM_MAX_VCPU_ID + 1) +
> +	     sizeof(*ioapic->rtc_status.dest_map.vectors) *
> +		    (KVM_MAX_VCPU_ID + 1);
> +	ioapic = kzalloc(sz, GFP_KERNEL_ACCOUNT);
>  	if (!ioapic)
>  		return -ENOMEM;
> +	ioapic->rtc_status.dest_map.map = (void *)(ioapic + 1);
> +	ioapic->rtc_status.dest_map.vectors =
> +		(void *)(ioapic->rtc_status.dest_map.map +
> +			 BITS_TO_LONGS(KVM_MAX_VCPU_ID + 1));
>  	spin_lock_init(&ioapic->lock);
>  	INIT_DELAYED_WORK(&ioapic->eoi_inject, kvm_ioapic_eoi_inject_work);
>  	kvm->arch.vioapic = ioapic;
> diff --git a/arch/x86/kvm/ioapic.h b/arch/x86/kvm/ioapic.h
> index bbd4a5d18b5d..623a3c5afad7 100644
> --- a/arch/x86/kvm/ioapic.h
> +++ b/arch/x86/kvm/ioapic.h
> @@ -39,13 +39,13 @@ struct kvm_vcpu;
>  
>  struct dest_map {
>  	/* vcpu bitmap where IRQ has been sent */
> -	DECLARE_BITMAP(map, KVM_MAX_VCPU_ID + 1);
> +	unsigned long *map;
>  
>  	/*
>  	 * Vector sent to a given vcpu, only valid when
>  	 * the vcpu's bit in map is set
>  	 */
> -	u8 vectors[KVM_MAX_VCPU_ID + 1];
> +	u8 *vectors;
>  };
>  
>  
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index e5d5c5ed7dd4..6b6f38f0b617 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -78,6 +78,7 @@
>  #include <asm/intel_pt.h>
>  #include <asm/emulate_prefix.h>
>  #include <asm/sgx.h>
> +#include <asm/topology.h>
>  #include <clocksource/hyperv_timer.h>
>  
>  #define CREATE_TRACE_POINTS
> @@ -184,6 +185,34 @@ module_param(force_emulation_prefix, bool, S_IRUGO);
>  int __read_mostly pi_inject_timer = -1;
>  module_param(pi_inject_timer, bint, S_IRUGO | S_IWUSR);
>  
> +static int __read_mostly vcpu_id_add_bits = -1;
> +module_param(vcpu_id_add_bits, int, S_IRUGO);
> +
> +unsigned int kvm_max_vcpu_id(void)
> +{
> +	int n_bits = fls(KVM_MAX_VCPUS - 1);
> +
> +	if (vcpu_id_add_bits < -1 || vcpu_id_add_bits > (32 - n_bits)) {
> +		pr_err("Invalid value of vcpu_id_add_bits=%d parameter!\n",
> +		       vcpu_id_add_bits);
> +		vcpu_id_add_bits = -1;
> +	}
> +
> +	if (vcpu_id_add_bits >= 0) {
> +		n_bits += vcpu_id_add_bits;
> +	} else {
> +		n_bits++;		/* One additional bit for core level. */
> +		if (topology_max_die_per_package() > 1)
> +			n_bits++;	/* One additional bit for die level. */

This assumes topology_max_die_per_package() can not be greater than 2,
or 1 additional bit may not suffice, right?

> +	}
> +
> +	if (!n_bits)
> +		n_bits = 1;

Nitpick: AFAIU n_bits can't be zero here as KVM_MAX_VCPUS is still
static. The last patch of the series, however, makes it possible when
max_vcpus = 1 and vcpu_id_add_bits = 0. With this, I'd suggest moving
the check to the last patch.

> +
> +	return (1U << n_bits) - 1;
> +}
> +EXPORT_SYMBOL_GPL(kvm_max_vcpu_id);
> +
>  /*
>   * Restoring the host value for MSRs that are only consumed when running in
>   * usermode, e.g. SYSCALL MSRs and TSC_AUX, can be deferred until the CPU

-- 
Vitaly



* Re: [PATCH v2 2/6] x86/kvm: add boot parameter for adding vcpu-id bits
  2021-09-03 13:43   ` Vitaly Kuznetsov
@ 2021-09-03 13:53     ` Juergen Gross
  0 siblings, 0 replies; 23+ messages in thread
From: Juergen Gross @ 2021-09-03 13:53 UTC (permalink / raw)
  To: Vitaly Kuznetsov, kvm, x86, linux-doc, linux-kernel
  Cc: maz, ehabkost, Jonathan Corbet, Paolo Bonzini,
	Sean Christopherson, Wanpeng Li, Jim Mattson, Joerg Roedel,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, H. Peter Anvin


On 03.09.21 15:43, Vitaly Kuznetsov wrote:
> Juergen Gross <jgross@suse.com> writes:
> 
>> Today the maximum vcpu-id of a kvm guest's vcpu on x86 systems is set
>> via a #define in a header file.
>>
>> In order to support higher vcpu-ids without generally increasing the
>> memory consumption of guests on the host (some guest structures contain
>> arrays sized by KVM_MAX_VCPU_ID) add a boot parameter for adding some
>> bits to the vcpu-id. Additional bits are needed as the vcpu-id is
>> constructed via bit-wise concatenation of socket-id, core-id, etc.
>> As those ids' maximum values are not always a power of 2, the vcpu-ids
>> are sparse.
>>
>> The additional number of bits needed is basically the number of
>> topology levels with a non-power-of-2 maximum value, excluding the top
>> most level.
>>
>> The default value of the new parameter will be to take the correct
>> setting from the host's topology.
>>
>> Calculating the maximum vcpu-id dynamically requires the arrays sized
>> by KVM_MAX_VCPU_ID to be allocated dynamically.
>>
>> Signed-off-by: Juergen Gross <jgross@suse.com>
>> ---
>> V2:
>> - switch to specifying additional bits (based on comment by Vitaly
>>    Kuznetsov)
>>
>> Signed-off-by: Juergen Gross <jgross@suse.com>
>> ---
>>   .../admin-guide/kernel-parameters.txt         | 18 ++++++++++++
>>   arch/x86/include/asm/kvm_host.h               |  4 ++-
>>   arch/x86/kvm/ioapic.c                         | 12 +++++++-
>>   arch/x86/kvm/ioapic.h                         |  4 +--
>>   arch/x86/kvm/x86.c                            | 29 +++++++++++++++++++
>>   5 files changed, 63 insertions(+), 4 deletions(-)
>>
>> diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
>> index 84dc5790741b..37e194299311 100644
>> --- a/Documentation/admin-guide/kernel-parameters.txt
>> +++ b/Documentation/admin-guide/kernel-parameters.txt
>> @@ -2435,6 +2435,24 @@
>>   			feature (tagged TLBs) on capable Intel chips.
>>   			Default is 1 (enabled)
>>   
>> +	kvm.vcpu_id_add_bits=
>> +			[KVM,X86] The vcpu-ids of guests are sparse, as they
>> +			are constructed by bit-wise concatenation of the ids of
>> +			the different topology levels (sockets, cores, threads).
>> +
>> +			This parameter specifies how many additional bits the
>> +			maximum vcpu-id needs compared to the maximum number of
>> +			vcpus.
>> +
>> +			Normally this value is the number of topology levels
>> +			without the threads level and without the highest
>> +			level.
>> +
>> +			The special value -1 can be used to support guests
>> +			with the same topology as the host.
>> +
>> +			Default: -1
>> +
>>   	l1d_flush=	[X86,INTEL]
>>   			Control mitigation for L1D based snooping vulnerability.
>>   
>> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
>> index af6ce8d4c86a..3513edee8e22 100644
>> --- a/arch/x86/include/asm/kvm_host.h
>> +++ b/arch/x86/include/asm/kvm_host.h
>> @@ -39,7 +39,7 @@
>>   
>>   #define KVM_MAX_VCPUS 288
>>   #define KVM_SOFT_MAX_VCPUS 240
>> -#define KVM_MAX_VCPU_ID 1023
>> +#define KVM_MAX_VCPU_ID kvm_max_vcpu_id()
>>   /* memory slots that are not exposed to userspace */
>>   #define KVM_PRIVATE_MEM_SLOTS 3
>>   
>> @@ -1588,6 +1588,8 @@ extern u64  kvm_max_tsc_scaling_ratio;
>>   extern u64  kvm_default_tsc_scaling_ratio;
>>   /* bus lock detection supported? */
>>   extern bool kvm_has_bus_lock_exit;
>> +/* maximum vcpu-id */
>> +unsigned int kvm_max_vcpu_id(void);
>>   
>>   extern u64 kvm_mce_cap_supported;
>>   
>> diff --git a/arch/x86/kvm/ioapic.c b/arch/x86/kvm/ioapic.c
>> index ff005fe738a4..52e8ea90c914 100644
>> --- a/arch/x86/kvm/ioapic.c
>> +++ b/arch/x86/kvm/ioapic.c
>> @@ -685,11 +685,21 @@ static const struct kvm_io_device_ops ioapic_mmio_ops = {
>>   int kvm_ioapic_init(struct kvm *kvm)
>>   {
>>   	struct kvm_ioapic *ioapic;
>> +	size_t sz;
>>   	int ret;
>>   
>> -	ioapic = kzalloc(sizeof(struct kvm_ioapic), GFP_KERNEL_ACCOUNT);
>> +	sz = sizeof(struct kvm_ioapic) +
>> +	     sizeof(*ioapic->rtc_status.dest_map.map) *
>> +		    BITS_TO_LONGS(KVM_MAX_VCPU_ID + 1) +
>> +	     sizeof(*ioapic->rtc_status.dest_map.vectors) *
>> +		    (KVM_MAX_VCPU_ID + 1);
>> +	ioapic = kzalloc(sz, GFP_KERNEL_ACCOUNT);
>>   	if (!ioapic)
>>   		return -ENOMEM;
>> +	ioapic->rtc_status.dest_map.map = (void *)(ioapic + 1);
>> +	ioapic->rtc_status.dest_map.vectors =
>> +		(void *)(ioapic->rtc_status.dest_map.map +
>> +			 BITS_TO_LONGS(KVM_MAX_VCPU_ID + 1));
>>   	spin_lock_init(&ioapic->lock);
>>   	INIT_DELAYED_WORK(&ioapic->eoi_inject, kvm_ioapic_eoi_inject_work);
>>   	kvm->arch.vioapic = ioapic;
>> diff --git a/arch/x86/kvm/ioapic.h b/arch/x86/kvm/ioapic.h
>> index bbd4a5d18b5d..623a3c5afad7 100644
>> --- a/arch/x86/kvm/ioapic.h
>> +++ b/arch/x86/kvm/ioapic.h
>> @@ -39,13 +39,13 @@ struct kvm_vcpu;
>>   
>>   struct dest_map {
>>   	/* vcpu bitmap where IRQ has been sent */
>> -	DECLARE_BITMAP(map, KVM_MAX_VCPU_ID + 1);
>> +	unsigned long *map;
>>   
>>   	/*
>>   	 * Vector sent to a given vcpu, only valid when
>>   	 * the vcpu's bit in map is set
>>   	 */
>> -	u8 vectors[KVM_MAX_VCPU_ID + 1];
>> +	u8 *vectors;
>>   };
>>   
>>   
>> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
>> index e5d5c5ed7dd4..6b6f38f0b617 100644
>> --- a/arch/x86/kvm/x86.c
>> +++ b/arch/x86/kvm/x86.c
>> @@ -78,6 +78,7 @@
>>   #include <asm/intel_pt.h>
>>   #include <asm/emulate_prefix.h>
>>   #include <asm/sgx.h>
>> +#include <asm/topology.h>
>>   #include <clocksource/hyperv_timer.h>
>>   
>>   #define CREATE_TRACE_POINTS
>> @@ -184,6 +185,34 @@ module_param(force_emulation_prefix, bool, S_IRUGO);
>>   int __read_mostly pi_inject_timer = -1;
>>   module_param(pi_inject_timer, bint, S_IRUGO | S_IWUSR);
>>   
>> +static int __read_mostly vcpu_id_add_bits = -1;
>> +module_param(vcpu_id_add_bits, int, S_IRUGO);
>> +
>> +unsigned int kvm_max_vcpu_id(void)
>> +{
>> +	int n_bits = fls(KVM_MAX_VCPUS - 1);
>> +
>> +	if (vcpu_id_add_bits < -1 || vcpu_id_add_bits > (32 - n_bits)) {
>> +		pr_err("Invalid value of vcpu_id_add_bits=%d parameter!\n",
>> +		       vcpu_id_add_bits);
>> +		vcpu_id_add_bits = -1;
>> +	}
>> +
>> +	if (vcpu_id_add_bits >= 0) {
>> +		n_bits += vcpu_id_add_bits;
>> +	} else {
>> +		n_bits++;		/* One additional bit for core level. */
>> +		if (topology_max_die_per_package() > 1)
>> +			n_bits++;	/* One additional bit for die level. */
> 
> This assumes topology_max_die_per_package() can not be greater than 2,
> or 1 additional bit may not suffice, right?

No. Each topology level can add at most one additional bit. This
mechanism assumes that each level consumes no more bits than
necessary, so with e.g. a core count of 18 per die 5 bits are used,
and not more.
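
(A short worked illustration with invented numbers: with 18 cores per
die the core-id field occupies fls(18 - 1) = 5 bits, while 18 values
only account for log2(18) ~= 4.17 bits, i.e. less than one full bit is
wasted. As the same holds for every level, each non-power-of-2 level
costs at most one additional vcpu-id bit.)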

> 
>> +	}
>> +
>> +	if (!n_bits)
>> +		n_bits = 1;
> 
> Nitpick: AFAIU n_bits can't be zero here as KVM_MAX_VCPUS is still
> static. The last patch of the series, however, makes it possible when
> max_vcpus = 1 and vcpu_id_add_bits = 0. With this, I'd suggest to move
> the check to the last patch.

This is true only if no downstream has a patch setting KVM_MAX_VCPUS to
1. I'd rather be safe than sorry here, especially as it would be very
easy to miss this dependency.


Juergen


* Re: [PATCH v2 5/6] kvm: allocate vcpu pointer array separately
  2021-09-03 13:08 ` [PATCH v2 5/6] kvm: allocate vcpu pointer array separately Juergen Gross
@ 2021-09-03 14:41   ` Marc Zyngier
  2021-09-06  4:33     ` Juergen Gross
  0 siblings, 1 reply; 23+ messages in thread
From: Marc Zyngier @ 2021-09-03 14:41 UTC (permalink / raw)
  To: Juergen Gross
  Cc: kvm, x86, linux-arm-kernel, linux-kernel, ehabkost, James Morse,
	Alexandru Elisei, Suzuki K Poulose, Catalin Marinas, Will Deacon,
	Paolo Bonzini, Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li,
	Jim Mattson, Joerg Roedel, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, H. Peter Anvin, kvmarm

On Fri, 03 Sep 2021 14:08:06 +0100,
Juergen Gross <jgross@suse.com> wrote:
> 
> Prepare support of very large vcpu numbers per guest by moving the
> vcpu pointer array out of struct kvm.
> 
> Signed-off-by: Juergen Gross <jgross@suse.com>
> ---
> V2:
> - rebase to new kvm_arch_free_vm() implementation
> ---
>  arch/arm64/kvm/arm.c            | 21 +++++++++++++++++++--
>  arch/x86/include/asm/kvm_host.h |  5 +----
>  arch/x86/kvm/x86.c              | 18 ++++++++++++++++++
>  include/linux/kvm_host.h        | 17 +++++++++++++++--
>  4 files changed, 53 insertions(+), 8 deletions(-)
> 
> diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
> index 38fff5963d9f..8bb5caeba007 100644
> --- a/arch/arm64/kvm/arm.c
> +++ b/arch/arm64/kvm/arm.c
> @@ -293,10 +293,27 @@ long kvm_arch_dev_ioctl(struct file *filp,
>  
>  struct kvm *kvm_arch_alloc_vm(void)
>  {
> +	struct kvm *kvm;
> +
> +	if (!has_vhe())
> +		kvm = kzalloc(sizeof(struct kvm), GFP_KERNEL);
> +	else
> +		kvm = vzalloc(sizeof(struct kvm));
> +
> +	if (!kvm)
> +		return NULL;
> +
>  	if (!has_vhe())
> -		return kzalloc(sizeof(struct kvm), GFP_KERNEL);
> +		kvm->vcpus = kcalloc(KVM_MAX_VCPUS, sizeof(void *), GFP_KERNEL);
> +	else
> +		kvm->vcpus = vzalloc(KVM_MAX_VCPUS * sizeof(void *));
> +
> +	if (!kvm->vcpus) {
> +		kvm_arch_free_vm(kvm);
> +		kvm = NULL;
> +	}
>  
> -	return vzalloc(sizeof(struct kvm));
> +	return kvm;
>  }
>  
>  int kvm_arch_vcpu_precreate(struct kvm *kvm, unsigned int id)
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index f16fadfc030a..6c28d0800208 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -1517,10 +1517,7 @@ static inline void kvm_ops_static_call_update(void)
>  }
>  
>  #define __KVM_HAVE_ARCH_VM_ALLOC
> -static inline struct kvm *kvm_arch_alloc_vm(void)
> -{
> -	return __vmalloc(kvm_x86_ops.vm_size, GFP_KERNEL_ACCOUNT | __GFP_ZERO);
> -}
> +struct kvm *kvm_arch_alloc_vm(void);
>  
>  #define __KVM_HAVE_ARCH_VM_FREE
>  void kvm_arch_free_vm(struct kvm *kvm);
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index cc552763f0e4..ff142b6dd00c 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -11126,6 +11126,24 @@ void kvm_arch_sched_in(struct kvm_vcpu *vcpu, int cpu)
>  	static_call(kvm_x86_sched_in)(vcpu, cpu);
>  }
>  
> +struct kvm *kvm_arch_alloc_vm(void)
> +{
> +	struct kvm *kvm;
> +
> +	kvm = __vmalloc(kvm_x86_ops.vm_size, GFP_KERNEL_ACCOUNT | __GFP_ZERO);
> +	if (!kvm)
> +		return NULL;
> +
> +	kvm->vcpus = __vmalloc(KVM_MAX_VCPUS * sizeof(void *),
> +			       GFP_KERNEL_ACCOUNT | __GFP_ZERO);
> +	if (!kvm->vcpus) {
> +		vfree(kvm);
> +		kvm = NULL;
> +	}
> +
> +	return kvm;
> +}
> +
>  void kvm_arch_free_vm(struct kvm *kvm)
>  {
>  	kfree(to_kvm_hv(kvm)->hv_pa_pg);
> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> index d75e9c2a00b1..9e2a5f1c6f54 100644
> --- a/include/linux/kvm_host.h
> +++ b/include/linux/kvm_host.h
> @@ -536,7 +536,7 @@ struct kvm {
>  	struct mutex slots_arch_lock;
>  	struct mm_struct *mm; /* userspace tied to this vm */
>  	struct kvm_memslots __rcu *memslots[KVM_ADDRESS_SPACE_NUM];
> -	struct kvm_vcpu *vcpus[KVM_MAX_VCPUS];
> +	struct kvm_vcpu **vcpus;

At this stage, I really wonder why we are not using an xarray instead.

I wrote this [1] a while ago, and nothing caught fire. It was also a
net deletion of code...
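
(Roughly, the xarray variant looks like this -- a sketch only, not
necessarily the exact code in [1], and the field name vcpu_array is
made up here:

	xa_init(&kvm->vcpu_array);
	/* on vcpu creation: */
	xa_store(&kvm->vcpu_array, id, vcpu, GFP_KERNEL_ACCOUNT);
	/* on lookup: */
	vcpu = xa_load(&kvm->vcpu_array, id);

removing the need to size a pointer array up front.)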

	M.

[1] https://git.kernel.org/pub/scm/linux/kernel/git/maz/arm-platforms.git/log/?h=kvm-arm64/vcpu-xarray

-- 
Without deviation from the norm, progress is not possible.


* Re: [PATCH v2 3/6] x86/kvm: introduce per cpu vcpu masks
  2021-09-03 13:08 ` [PATCH v2 3/6] x86/kvm: introduce per cpu vcpu masks Juergen Gross
@ 2021-09-03 16:05   ` Eduardo Habkost
  2021-09-06  4:34     ` Juergen Gross
  2021-09-07 18:34   ` Eduardo Habkost
  1 sibling, 1 reply; 23+ messages in thread
From: Eduardo Habkost @ 2021-09-03 16:05 UTC (permalink / raw)
  To: Juergen Gross
  Cc: kvm, x86, linux-kernel, maz, Paolo Bonzini, Sean Christopherson,
	Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, H. Peter Anvin

On Fri, Sep 03, 2021 at 03:08:04PM +0200, Juergen Gross wrote:
> In order to support high vcpu numbers per guest don't use on stack
> vcpu bitmasks. As all those currently used bitmasks are not used in
> functions subject to recursion it is fairly easy to replace them with
> percpu bitmasks.
> 
> Disable preemption while such a bitmask is being used in order to
> avoid double usage in case we'd switch cpus.
> 
> Signed-off-by: Juergen Gross <jgross@suse.com>

Note that there's another patch that will introduce another
KVM_MAX_VCPUS bitmap variable on the stack:
https://lore.kernel.org/lkml/20210827092516.1027264-7-vkuznets@redhat.com/

Considering that the patch is a bug fix, should this series be
rebased on top of that?

-- 
Eduardo



* Re: [PATCH v2 2/6] x86/kvm: add boot parameter for adding vcpu-id bits
  2021-09-03 13:08 ` [PATCH v2 2/6] x86/kvm: add boot parameter for adding vcpu-id bits Juergen Gross
  2021-09-03 13:43   ` Vitaly Kuznetsov
@ 2021-09-03 19:48   ` Eduardo Habkost
  2021-09-06  4:46     ` Juergen Gross
  2021-09-28 16:41   ` Paolo Bonzini
  2 siblings, 1 reply; 23+ messages in thread
From: Eduardo Habkost @ 2021-09-03 19:48 UTC (permalink / raw)
  To: Juergen Gross
  Cc: kvm, x86, linux-doc, linux-kernel, maz, Jonathan Corbet,
	Paolo Bonzini, Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li,
	Jim Mattson, Joerg Roedel, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, H. Peter Anvin

On Fri, Sep 03, 2021 at 03:08:03PM +0200, Juergen Gross wrote:
> Today the maximum vcpu-id of a kvm guest's vcpu on x86 systems is set
> via a #define in a header file.
> 
> In order to support higher vcpu-ids without generally increasing the
> memory consumption of guests on the host (some guest structures contain
> arrays sized by KVM_MAX_VCPU_ID) add a boot parameter for adding some
> bits to the vcpu-id. Additional bits are needed as the vcpu-id is
> constructed via bit-wise concatenation of socket-id, core-id, etc.
> As those ids' maximum values are not always a power of 2, the vcpu-ids
> are sparse.
> 
> The additional number of bits needed is basically the number of
> topology levels with a non-power-of-2 maximum value, excluding the top
> most level.
> 
> The default value of the new parameter will be to take the correct
> setting from the host's topology.

Having the default depend on the host topology makes the host
behaviour unpredictable (which might be a problem when migrating
VMs from another host with a different topology).  Can't we just
default to 2?

> 
> Calculating the maximum vcpu-id dynamically requires the arrays sized
> by KVM_MAX_VCPU_ID to be allocated dynamically.
> 
> Signed-off-by: Juergen Gross <jgross@suse.com>
> ---
> V2:
> - switch to specifying additional bits (based on comment by Vitaly
>   Kuznetsov)
> 
> Signed-off-by: Juergen Gross <jgross@suse.com>
> ---
[...]
>  #define KVM_MAX_VCPUS 288
>  #define KVM_SOFT_MAX_VCPUS 240
> -#define KVM_MAX_VCPU_ID 1023
> +#define KVM_MAX_VCPU_ID kvm_max_vcpu_id()
[...]
> +unsigned int kvm_max_vcpu_id(void)
> +{
> +	int n_bits = fls(KVM_MAX_VCPUS - 1);
> +
> +	if (vcpu_id_add_bits < -1 || vcpu_id_add_bits > (32 - n_bits)) {
> +		pr_err("Invalid value of vcpu_id_add_bits=%d parameter!\n",
> +		       vcpu_id_add_bits);
> +		vcpu_id_add_bits = -1;
> +	}
> +
> +	if (vcpu_id_add_bits >= 0) {
> +		n_bits += vcpu_id_add_bits;
> +	} else {
> +		n_bits++;		/* One additional bit for core level. */
> +		if (topology_max_die_per_package() > 1)
> +			n_bits++;	/* One additional bit for die level. */
> +	}
> +
> +	if (!n_bits)
> +		n_bits = 1;
> +
> +	return (1U << n_bits) - 1;

The largest possible VCPU ID is not KVM_MAX_VCPU_ID,
it's (KVM_MAX_VCPU_ID - 1).  This is enforced by
kvm_vm_ioctl_create_vcpu().

That would mean KVM_MAX_VCPU_ID should be (1 << n_bits) instead
of ((1 << n_bits) - 1), wouldn't it?
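
For reference, the enforcement is roughly (sketch of the current
virt/kvm/kvm_main.c code):

	static int kvm_vm_ioctl_create_vcpu(struct kvm *kvm, u32 id)
	{
		...
		if (id >= KVM_MAX_VCPU_ID)
			return -EINVAL;
		...
	}

i.e. despite its name, KVM_MAX_VCPU_ID is an exclusive upper bound.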


> +}
> +EXPORT_SYMBOL_GPL(kvm_max_vcpu_id);
> +
>  /*
>   * Restoring the host value for MSRs that are only consumed when running in
>   * usermode, e.g. SYSCALL MSRs and TSC_AUX, can be deferred until the CPU
> -- 
> 2.26.2
> 

-- 
Eduardo


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2 6/6] x86/kvm: add boot parameter for setting max number of vcpus per guest
  2021-09-03 13:08 ` [PATCH v2 6/6] x86/kvm: add boot parameter for setting max number of vcpus per guest Juergen Gross
@ 2021-09-06  0:45   ` Yao Yuan
  2021-09-06  4:47     ` Juergen Gross
  0 siblings, 1 reply; 23+ messages in thread
From: Yao Yuan @ 2021-09-06  0:45 UTC (permalink / raw)
  To: Juergen Gross
  Cc: kvm, x86, linux-doc, linux-kernel, maz, ehabkost,
	Jonathan Corbet, Paolo Bonzini, Sean Christopherson,
	Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, H. Peter Anvin

On Fri, Sep 03, 2021 at 03:08:07PM +0200, Juergen Gross wrote:
> Today the maximum number of vcpus of a kvm guest is set via a #define
> in a header file.
>
> In order to support higher vcpu numbers for guests without generally
> increasing the memory consumption of guests on the host, especially on
> very large systems, add a boot parameter for specifying the number of
> allowed vcpus for guests.
>
> The default will still be the current setting of 288. The value 0 has
> the special meaning of limiting the number of possible vcpus to the
> number of possible cpus of the host.
>
> Signed-off-by: Juergen Gross <jgross@suse.com>
> ---
>  Documentation/admin-guide/kernel-parameters.txt | 7 +++++++
>  arch/x86/include/asm/kvm_host.h                 | 5 ++++-
>  arch/x86/kvm/x86.c                              | 9 ++++++++-
>  3 files changed, 19 insertions(+), 2 deletions(-)
>
> diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
> index 37e194299311..b9641c9989ef 100644
> --- a/Documentation/admin-guide/kernel-parameters.txt
> +++ b/Documentation/admin-guide/kernel-parameters.txt
> @@ -2435,6 +2435,13 @@
>           feature (tagged TLBs) on capable Intel chips.
>           Default is 1 (enabled)
>
> +	kvm.max_vcpus=	[KVM,X86] Set the maximum allowed number of vcpus per
> +			guest. The special value 0 sets the limit to the number
> +			of physical cpus possible on the host (including not
> +			yet hotplugged cpus). Higher values will result in
> +			slightly higher memory consumption per guest.
> +			Default: 288
> +
>   kvm.vcpu_id_add_bits=
>           [KVM,X86] The vcpu-ids of guests are sparse, as they
>           are constructed by bit-wise concatenation of the ids of
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index 6c28d0800208..a4ab387b0e1c 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -38,7 +38,8 @@
>
>  #define __KVM_HAVE_ARCH_VCPU_DEBUGFS
>
> -#define KVM_MAX_VCPUS 288
> +#define KVM_DEFAULT_MAX_VCPUS 288
> +#define KVM_MAX_VCPUS max_vcpus
>  #define KVM_SOFT_MAX_VCPUS 240
>  #define KVM_MAX_VCPU_ID kvm_max_vcpu_id()
>  /* memory slots that are not exposed to userspace */
> @@ -1588,6 +1589,8 @@ extern u64  kvm_max_tsc_scaling_ratio;
>  extern u64  kvm_default_tsc_scaling_ratio;
>  /* bus lock detection supported? */
>  extern bool kvm_has_bus_lock_exit;
> +/* maximum number of vcpus per guest */
> +extern unsigned int max_vcpus;
>  /* maximum vcpu-id */
>  unsigned int kvm_max_vcpu_id(void);
>
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index ff142b6dd00c..49c3d91c559e 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -188,9 +188,13 @@ module_param(pi_inject_timer, bint, S_IRUGO | S_IWUSR);
>  static int __read_mostly vcpu_id_add_bits = -1;
>  module_param(vcpu_id_add_bits, int, S_IRUGO);
>
> +unsigned int __read_mostly max_vcpus = KVM_DEFAULT_MAX_VCPUS;
> +module_param(max_vcpus, uint, S_IRUGO);
> +EXPORT_SYMBOL_GPL(max_vcpus);
> +
>  unsigned int kvm_max_vcpu_id(void)
>  {
> -	int n_bits = fls(KVM_MAX_VCPUS - 1);
> +	int n_bits = fls(max_vcpus - 1);

A question here: the parameter "vcpu_id_add_bits" also depends
on "max_vcpus", but we can't calculate "vcpu_id_add_bits" from
"max_vcpus" because KVM has no topology knowledge to determine
the bits needed for each socket/core/thread level, right?

>
>   if (vcpu_id_add_bits < -1 || vcpu_id_add_bits > (32 - n_bits)) {
>       pr_err("Invalid value of vcpu_id_add_bits=%d parameter!\n",
> @@ -11033,6 +11037,9 @@ int kvm_arch_hardware_setup(void *opaque)
>   if (boot_cpu_has(X86_FEATURE_XSAVES))
>       rdmsrl(MSR_IA32_XSS, host_xss);
>
> +	if (max_vcpus == 0)
> +		max_vcpus = num_possible_cpus();
> +
>   kvm_pcpu_vcpu_mask = __alloc_percpu(KVM_VCPU_MASK_SZ,
>                       sizeof(unsigned long));
>   kvm_hv_vp_bitmap = __alloc_percpu(KVM_HV_VPMAP_SZ, sizeof(u64));
> --
> 2.26.2
>

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2 5/6] kvm: allocate vcpu pointer array separately
  2021-09-03 14:41   ` Marc Zyngier
@ 2021-09-06  4:33     ` Juergen Gross
  2021-09-06  9:46       ` Marc Zyngier
  0 siblings, 1 reply; 23+ messages in thread
From: Juergen Gross @ 2021-09-06  4:33 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: kvm, x86, linux-arm-kernel, linux-kernel, ehabkost, James Morse,
	Alexandru Elisei, Suzuki K Poulose, Catalin Marinas, Will Deacon,
	Paolo Bonzini, Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li,
	Jim Mattson, Joerg Roedel, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, H. Peter Anvin, kvmarm


[-- Attachment #1.1.1: Type: text/plain, Size: 3827 bytes --]

On 03.09.21 16:41, Marc Zyngier wrote:
> On Fri, 03 Sep 2021 14:08:06 +0100,
> Juergen Gross <jgross@suse.com> wrote:
>>
>> Prepare support of very large vcpu numbers per guest by moving the
>> vcpu pointer array out of struct kvm.
>>
>> Signed-off-by: Juergen Gross <jgross@suse.com>
>> ---
>> V2:
>> - rebase to new kvm_arch_free_vm() implementation
>> ---
>>   arch/arm64/kvm/arm.c            | 21 +++++++++++++++++++--
>>   arch/x86/include/asm/kvm_host.h |  5 +----
>>   arch/x86/kvm/x86.c              | 18 ++++++++++++++++++
>>   include/linux/kvm_host.h        | 17 +++++++++++++++--
>>   4 files changed, 53 insertions(+), 8 deletions(-)
>>
>> diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
>> index 38fff5963d9f..8bb5caeba007 100644
>> --- a/arch/arm64/kvm/arm.c
>> +++ b/arch/arm64/kvm/arm.c
>> @@ -293,10 +293,27 @@ long kvm_arch_dev_ioctl(struct file *filp,
>>   
>>   struct kvm *kvm_arch_alloc_vm(void)
>>   {
>> +	struct kvm *kvm;
>> +
>> +	if (!has_vhe())
>> +		kvm = kzalloc(sizeof(struct kvm), GFP_KERNEL);
>> +	else
>> +		kvm = vzalloc(sizeof(struct kvm));
>> +
>> +	if (!kvm)
>> +		return NULL;
>> +
>>   	if (!has_vhe())
>> -		return kzalloc(sizeof(struct kvm), GFP_KERNEL);
>> +		kvm->vcpus = kcalloc(KVM_MAX_VCPUS, sizeof(void *), GFP_KERNEL);
>> +	else
>> +		kvm->vcpus = vzalloc(KVM_MAX_VCPUS * sizeof(void *));
>> +
>> +	if (!kvm->vcpus) {
>> +		kvm_arch_free_vm(kvm);
>> +		kvm = NULL;
>> +	}
>>   
>> -	return vzalloc(sizeof(struct kvm));
>> +	return kvm;
>>   }
>>   
>>   int kvm_arch_vcpu_precreate(struct kvm *kvm, unsigned int id)
>> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
>> index f16fadfc030a..6c28d0800208 100644
>> --- a/arch/x86/include/asm/kvm_host.h
>> +++ b/arch/x86/include/asm/kvm_host.h
>> @@ -1517,10 +1517,7 @@ static inline void kvm_ops_static_call_update(void)
>>   }
>>   
>>   #define __KVM_HAVE_ARCH_VM_ALLOC
>> -static inline struct kvm *kvm_arch_alloc_vm(void)
>> -{
>> -	return __vmalloc(kvm_x86_ops.vm_size, GFP_KERNEL_ACCOUNT | __GFP_ZERO);
>> -}
>> +struct kvm *kvm_arch_alloc_vm(void);
>>   
>>   #define __KVM_HAVE_ARCH_VM_FREE
>>   void kvm_arch_free_vm(struct kvm *kvm);
>> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
>> index cc552763f0e4..ff142b6dd00c 100644
>> --- a/arch/x86/kvm/x86.c
>> +++ b/arch/x86/kvm/x86.c
>> @@ -11126,6 +11126,24 @@ void kvm_arch_sched_in(struct kvm_vcpu *vcpu, int cpu)
>>   	static_call(kvm_x86_sched_in)(vcpu, cpu);
>>   }
>>   
>> +struct kvm *kvm_arch_alloc_vm(void)
>> +{
>> +	struct kvm *kvm;
>> +
>> +	kvm = __vmalloc(kvm_x86_ops.vm_size, GFP_KERNEL_ACCOUNT | __GFP_ZERO);
>> +	if (!kvm)
>> +		return NULL;
>> +
>> +	kvm->vcpus = __vmalloc(KVM_MAX_VCPUS * sizeof(void *),
>> +			       GFP_KERNEL_ACCOUNT | __GFP_ZERO);
>> +	if (!kvm->vcpus) {
>> +		vfree(kvm);
>> +		kvm = NULL;
>> +	}
>> +
>> +	return kvm;
>> +}
>> +
>>   void kvm_arch_free_vm(struct kvm *kvm)
>>   {
>>   	kfree(to_kvm_hv(kvm)->hv_pa_pg);
>> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
>> index d75e9c2a00b1..9e2a5f1c6f54 100644
>> --- a/include/linux/kvm_host.h
>> +++ b/include/linux/kvm_host.h
>> @@ -536,7 +536,7 @@ struct kvm {
>>   	struct mutex slots_arch_lock;
>>   	struct mm_struct *mm; /* userspace tied to this vm */
>>   	struct kvm_memslots __rcu *memslots[KVM_ADDRESS_SPACE_NUM];
>> -	struct kvm_vcpu *vcpus[KVM_MAX_VCPUS];
>> +	struct kvm_vcpu **vcpus;
> 
> At this stage, I really wonder why we are not using an xarray instead.
> 
> I wrote this [1] a while ago, and nothing caught fire. It was also a
> net deletion of code...

Indeed, I'd prefer that solution!

Are you fine with me swapping my patch with yours in the series?


Juergen


[-- Attachment #1.1.2: OpenPGP public key --]
[-- Type: application/pgp-keys, Size: 3135 bytes --]

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 495 bytes --]

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2 3/6] x86/kvm: introduce per cpu vcpu masks
  2021-09-03 16:05   ` Eduardo Habkost
@ 2021-09-06  4:34     ` Juergen Gross
  0 siblings, 0 replies; 23+ messages in thread
From: Juergen Gross @ 2021-09-06  4:34 UTC (permalink / raw)
  To: Eduardo Habkost
  Cc: kvm, x86, linux-kernel, maz, Paolo Bonzini, Sean Christopherson,
	Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, H. Peter Anvin


[-- Attachment #1.1.1: Type: text/plain, Size: 854 bytes --]

On 03.09.21 18:05, Eduardo Habkost wrote:
> On Fri, Sep 03, 2021 at 03:08:04PM +0200, Juergen Gross wrote:
>> In order to support high vcpu numbers per guest, don't use on-stack
>> vcpu bitmasks. As none of the currently used bitmasks occurs in
>> functions subject to recursion, it is fairly easy to replace them with
>> percpu bitmasks.
>>
>> Disable preemption while such a bitmask is being used in order to
>> avoid double usage in case we'd switch cpus.
>>
>> Signed-off-by: Juergen Gross <jgross@suse.com>
> 
> Note that there's another patch that will introduce another
> KVM_MAX_VCPUS bitmap variable on the stack:
> https://lore.kernel.org/lkml/20210827092516.1027264-7-vkuznets@redhat.com/
> 
> Considering that the patch is a bug fix, should this series be
> rebased on top of that?
> 

Yes, I can do that.


Juergen

[-- Attachment #1.1.2: OpenPGP public key --]
[-- Type: application/pgp-keys, Size: 3135 bytes --]

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 495 bytes --]

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2 2/6] x86/kvm: add boot parameter for adding vcpu-id bits
  2021-09-03 19:48   ` Eduardo Habkost
@ 2021-09-06  4:46     ` Juergen Gross
  0 siblings, 0 replies; 23+ messages in thread
From: Juergen Gross @ 2021-09-06  4:46 UTC (permalink / raw)
  To: Eduardo Habkost
  Cc: kvm, x86, linux-doc, linux-kernel, maz, Jonathan Corbet,
	Paolo Bonzini, Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li,
	Jim Mattson, Joerg Roedel, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, H. Peter Anvin


[-- Attachment #1.1.1: Type: text/plain, Size: 2844 bytes --]

On 03.09.21 21:48, Eduardo Habkost wrote:
> On Fri, Sep 03, 2021 at 03:08:03PM +0200, Juergen Gross wrote:
>> Today the maximum vcpu-id of a kvm guest's vcpu on x86 systems is set
>> via a #define in a header file.
>>
>> In order to support higher vcpu-ids without generally increasing the
>> memory consumption of guests on the host (some guest structures contain
>> arrays sized by KVM_MAX_VCPU_ID), add a boot parameter for adding some
>> bits to the vcpu-id. Additional bits are needed as the vcpu-id is
>> constructed via bit-wise concatenation of socket-id, core-id, etc.
>> As those ids' maximum values are not always a power of 2, the vcpu-ids
>> are sparse.
>>
>> The additional number of bits needed is basically the number of
>> topology levels with a non-power-of-2 maximum value, excluding the top
>> most level.
>>
>> The default value of the new parameter will be to take the correct
>> setting from the host's topology.
> 
> Having the default depend on the host topology makes the host
> behaviour unpredictable (which might be a problem when migrating
> VMs from another host with a different topology).  Can't we just
> default to 2?

Okay, fine with me.

> 
>>
>> Calculating the maximum vcpu-id dynamically requires the arrays sized
>> by KVM_MAX_VCPU_ID to be allocated dynamically.
>>
>> Signed-off-by: Juergen Gross <jgross@suse.com>
>> ---
>> V2:
>> - switch to specifying additional bits (based on comment by Vitaly
>>    Kuznetsov)
>>
>> Signed-off-by: Juergen Gross <jgross@suse.com>
>> ---
> [...]
>>   #define KVM_MAX_VCPUS 288
>>   #define KVM_SOFT_MAX_VCPUS 240
>> -#define KVM_MAX_VCPU_ID 1023
>> +#define KVM_MAX_VCPU_ID kvm_max_vcpu_id()
> [...]
>> +unsigned int kvm_max_vcpu_id(void)
>> +{
>> +	int n_bits = fls(KVM_MAX_VCPUS - 1);
>> +
>> +	if (vcpu_id_add_bits < -1 || vcpu_id_add_bits > (32 - n_bits)) {
>> +		pr_err("Invalid value of vcpu_id_add_bits=%d parameter!\n",
>> +		       vcpu_id_add_bits);
>> +		vcpu_id_add_bits = -1;
>> +	}
>> +
>> +	if (vcpu_id_add_bits >= 0) {
>> +		n_bits += vcpu_id_add_bits;
>> +	} else {
>> +		n_bits++;		/* One additional bit for core level. */
>> +		if (topology_max_die_per_package() > 1)
>> +			n_bits++;	/* One additional bit for die level. */
>> +	}
>> +
>> +	if (!n_bits)
>> +		n_bits = 1;
>> +
>> +	return (1U << n_bits) - 1;
> 
> The largest possible VCPU ID is not KVM_MAX_VCPU_ID,
> it's (KVM_MAX_VCPU_ID - 1).  This is enforced by
> kvm_vm_ioctl_create_vcpu().
> 
> That would mean KVM_MAX_VCPU_ID should be (1 << n_bits) instead
> of ((1 << n_bits) - 1), wouldn't it?

Oh, indeed. I have been fooled by the IMO bad naming of this macro.

The current value of 1023 suggests I'm not the only one who has been fooled.

Shouldn't it be named "KVM_MAX_VCPU_IDS" instead?


Juergen

[-- Attachment #1.1.2: OpenPGP public key --]
[-- Type: application/pgp-keys, Size: 3135 bytes --]

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 495 bytes --]

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2 6/6] x86/kvm: add boot parameter for setting max number of vcpus per guest
  2021-09-06  0:45   ` Yao Yuan
@ 2021-09-06  4:47     ` Juergen Gross
  0 siblings, 0 replies; 23+ messages in thread
From: Juergen Gross @ 2021-09-06  4:47 UTC (permalink / raw)
  To: Yao Yuan
  Cc: kvm, x86, linux-doc, linux-kernel, maz, ehabkost,
	Jonathan Corbet, Paolo Bonzini, Sean Christopherson,
	Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, H. Peter Anvin


[-- Attachment #1.1.1: Type: text/plain, Size: 3691 bytes --]

On 06.09.21 02:45, Yao Yuan wrote:
> On Fri, Sep 03, 2021 at 03:08:07PM +0200, Juergen Gross wrote:
>> Today the maximum number of vcpus of a kvm guest is set via a #define
>> in a header file.
>>
>> In order to support higher vcpu numbers for guests without generally
>> increasing the memory consumption of guests on the host, especially on
>> very large systems, add a boot parameter for specifying the number of
>> allowed vcpus for guests.
>>
>> The default will still be the current setting of 288. The value 0 has
>> the special meaning of limiting the number of possible vcpus to the
>> number of possible cpus of the host.
>>
>> Signed-off-by: Juergen Gross <jgross@suse.com>
>> ---
>>   Documentation/admin-guide/kernel-parameters.txt | 7 +++++++
>>   arch/x86/include/asm/kvm_host.h                 | 5 ++++-
>>   arch/x86/kvm/x86.c                              | 9 ++++++++-
>>   3 files changed, 19 insertions(+), 2 deletions(-)
>>
>> diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
>> index 37e194299311..b9641c9989ef 100644
>> --- a/Documentation/admin-guide/kernel-parameters.txt
>> +++ b/Documentation/admin-guide/kernel-parameters.txt
>> @@ -2435,6 +2435,13 @@
>>            feature (tagged TLBs) on capable Intel chips.
>>            Default is 1 (enabled)
>>
>> +	kvm.max_vcpus=	[KVM,X86] Set the maximum allowed number of vcpus per
>> +			guest. The special value 0 sets the limit to the number
>> +			of physical cpus possible on the host (including not
>> +			yet hotplugged cpus). Higher values will result in
>> +			slightly higher memory consumption per guest.
>> +			Default: 288
>> +
>>    kvm.vcpu_id_add_bits=
>>            [KVM,X86] The vcpu-ids of guests are sparse, as they
>>            are constructed by bit-wise concatenation of the ids of
>> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
>> index 6c28d0800208..a4ab387b0e1c 100644
>> --- a/arch/x86/include/asm/kvm_host.h
>> +++ b/arch/x86/include/asm/kvm_host.h
>> @@ -38,7 +38,8 @@
>>
>>   #define __KVM_HAVE_ARCH_VCPU_DEBUGFS
>>
>> -#define KVM_MAX_VCPUS 288
>> +#define KVM_DEFAULT_MAX_VCPUS 288
>> +#define KVM_MAX_VCPUS max_vcpus
>>   #define KVM_SOFT_MAX_VCPUS 240
>>   #define KVM_MAX_VCPU_ID kvm_max_vcpu_id()
>>   /* memory slots that are not exposed to userspace */
>> @@ -1588,6 +1589,8 @@ extern u64  kvm_max_tsc_scaling_ratio;
>>   extern u64  kvm_default_tsc_scaling_ratio;
>>   /* bus lock detection supported? */
>>   extern bool kvm_has_bus_lock_exit;
>> +/* maximum number of vcpus per guest */
>> +extern unsigned int max_vcpus;
>>   /* maximum vcpu-id */
>>   unsigned int kvm_max_vcpu_id(void);
>>
>> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
>> index ff142b6dd00c..49c3d91c559e 100644
>> --- a/arch/x86/kvm/x86.c
>> +++ b/arch/x86/kvm/x86.c
>> @@ -188,9 +188,13 @@ module_param(pi_inject_timer, bint, S_IRUGO | S_IWUSR);
>>   static int __read_mostly vcpu_id_add_bits = -1;
>>   module_param(vcpu_id_add_bits, int, S_IRUGO);
>>
>> +unsigned int __read_mostly max_vcpus = KVM_DEFAULT_MAX_VCPUS;
>> +module_param(max_vcpus, uint, S_IRUGO);
>> +EXPORT_SYMBOL_GPL(max_vcpus);
>> +
>>   unsigned int kvm_max_vcpu_id(void)
>>   {
>> -	int n_bits = fls(KVM_MAX_VCPUS - 1);
>> +	int n_bits = fls(max_vcpus - 1);
> 
> A question here: the parameter "vcpu_id_add_bits" also depends
> on "max_vcpus", but we can't calculate "vcpu_id_add_bits" from
> "max_vcpus" because KVM has no topology knowledge to determine
> the bits needed for each socket/core/thread level, right?

Correct.
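
As an illustration (made-up numbers): on a host with 12 cores per
socket the core-id needs fls(11) = 4 bits, so the vcpu-ids of a
2-socket, 24-vcpu guest would be composed roughly as

	vcpu_id = (socket_id << 4) | core_id;

yielding ids 0..11 and 16..27 - sparse, and not derivable from
max_vcpus alone without knowing the bits per topology level.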


Juergen

[-- Attachment #1.1.2: OpenPGP public key --]
[-- Type: application/pgp-keys, Size: 3135 bytes --]

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 495 bytes --]

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2 5/6] kvm: allocate vcpu pointer array separately
  2021-09-06  4:33     ` Juergen Gross
@ 2021-09-06  9:46       ` Marc Zyngier
  2021-09-09 20:28         ` Sean Christopherson
  0 siblings, 1 reply; 23+ messages in thread
From: Marc Zyngier @ 2021-09-06  9:46 UTC (permalink / raw)
  To: Juergen Gross
  Cc: kvm, x86, linux-arm-kernel, linux-kernel, ehabkost, James Morse,
	Alexandru Elisei, Suzuki K Poulose, Catalin Marinas, Will Deacon,
	Paolo Bonzini, Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li,
	Jim Mattson, Joerg Roedel, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, H. Peter Anvin, kvmarm

On Mon, 06 Sep 2021 05:33:35 +0100,
Juergen Gross <jgross@suse.com> wrote:
> 
> On 03.09.21 16:41, Marc Zyngier wrote:
>
> > At this stage, I really wonder why we are not using an xarray instead.
> > 
> > I wrote this [1] a while ago, and nothing caught fire. It was also a
> > net deletion of code...
> 
> Indeed, I'd prefer that solution!
> 
> Are you fine with me swapping my patch with yours in the series?

Of course, feel free to grab the whole series. You'll probably need
the initial patches to set the scene for this. On their own, they are
a nice cleanup, and I trust you can write a decent commit message for
the three patches affecting mips/s390/x86.

I would normally do that myself, but my network connectivity is
reduced to almost nothing at the moment.

Thanks,

	M.

-- 
Without deviation from the norm, progress is not possible.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2 3/6] x86/kvm: introduce per cpu vcpu masks
  2021-09-03 13:08 ` [PATCH v2 3/6] x86/kvm: introduce per cpu vcpu masks Juergen Gross
  2021-09-03 16:05   ` Eduardo Habkost
@ 2021-09-07 18:34   ` Eduardo Habkost
  2021-09-08  8:41     ` Vitaly Kuznetsov
  1 sibling, 1 reply; 23+ messages in thread
From: Eduardo Habkost @ 2021-09-07 18:34 UTC (permalink / raw)
  To: Juergen Gross
  Cc: kvm, x86, linux-kernel, maz, Paolo Bonzini, Sean Christopherson,
	Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, H. Peter Anvin

On Fri, Sep 03, 2021 at 03:08:04PM +0200, Juergen Gross wrote:
> In order to support high vcpu numbers per guest, don't use on-stack
> vcpu bitmasks. As none of the currently used bitmasks occurs in
> functions subject to recursion, it is fairly easy to replace them with
> percpu bitmasks.
> 
> Disable preemption while such a bitmask is being used in order to
> avoid double usage in case we'd switch cpus.
> 
> Signed-off-by: Juergen Gross <jgross@suse.com>
> ---
> V2:
> - use local_lock() instead of preempt_disable() (Paolo Bonzini)
> ---
>  arch/x86/include/asm/kvm_host.h | 10 ++++++++++
>  arch/x86/kvm/hyperv.c           | 25 ++++++++++++++++++-------
>  arch/x86/kvm/irq_comm.c         |  9 +++++++--
>  arch/x86/kvm/x86.c              | 22 +++++++++++++++++++++-
>  4 files changed, 56 insertions(+), 10 deletions(-)
> 
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index 3513edee8e22..a809a9e4fa5c 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -15,6 +15,7 @@
>  #include <linux/cpumask.h>
>  #include <linux/irq_work.h>
>  #include <linux/irq.h>
> +#include <linux/local_lock.h>
>  
>  #include <linux/kvm.h>
>  #include <linux/kvm_para.h>
> @@ -1591,6 +1592,15 @@ extern bool kvm_has_bus_lock_exit;
>  /* maximum vcpu-id */
>  unsigned int kvm_max_vcpu_id(void);
>  
> +/* per cpu vcpu bitmasks, protected by kvm_pcpu_mask_lock */
> +DECLARE_PER_CPU(local_lock_t, kvm_pcpu_mask_lock);
> +extern unsigned long __percpu *kvm_pcpu_vcpu_mask;
> +#define KVM_VCPU_MASK_SZ	\
> +	(sizeof(*kvm_pcpu_vcpu_mask) * BITS_TO_LONGS(KVM_MAX_VCPUS))
> +extern u64 __percpu *kvm_hv_vp_bitmap;
> +#define KVM_HV_MAX_SPARSE_VCPU_SET_BITS DIV_ROUND_UP(KVM_MAX_VCPUS, 64)
> +#define KVM_HV_VPMAP_SZ		(sizeof(u64) * KVM_HV_MAX_SPARSE_VCPU_SET_BITS)

I have just realized that the Hyper-V sparse bitmap format can
support only up to 4096 CPUs, and the current implementation of
sparse_set_to_vcpu_mask() won't even work correctly if
KVM_MAX_VCPUS is larger than 4096.

This means vp_bitmap can't and will never be larger than 512
bytes.  Isn't a per-CPU variable for vp_bitmap overkill in this
case?
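
(For context: a Hyper-V sparse VP set is a u64 valid_bank_mask plus up
to 64 banks holding 64 VP indexes each, hence the 64 * 64 = 4096
ceiling and a full vp_bitmap of at most 4096 / 8 = 512 bytes.)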

> [...]

-- 
Eduardo


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2 3/6] x86/kvm: introduce per cpu vcpu masks
  2021-09-07 18:34   ` Eduardo Habkost
@ 2021-09-08  8:41     ` Vitaly Kuznetsov
  0 siblings, 0 replies; 23+ messages in thread
From: Vitaly Kuznetsov @ 2021-09-08  8:41 UTC (permalink / raw)
  To: Eduardo Habkost, Juergen Gross
  Cc: kvm, x86, linux-kernel, maz, Paolo Bonzini, Sean Christopherson,
	Wanpeng Li, Jim Mattson, Joerg Roedel, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, H. Peter Anvin

Eduardo Habkost <ehabkost@redhat.com> writes:

> On Fri, Sep 03, 2021 at 03:08:04PM +0200, Juergen Gross wrote:
>> In order to support high vcpu numbers per guest, don't use on-stack
>> vcpu bitmasks. As none of the currently used bitmasks occurs in
>> functions subject to recursion, it is fairly easy to replace them with
>> percpu bitmasks.
>> 
>> Disable preemption while such a bitmask is being used in order to
>> avoid double usage in case we'd switch cpus.
>> 
>> Signed-off-by: Juergen Gross <jgross@suse.com>
>> ---
>> V2:
>> - use local_lock() instead of preempt_disable() (Paolo Bonzini)
>> ---
>>  arch/x86/include/asm/kvm_host.h | 10 ++++++++++
>>  arch/x86/kvm/hyperv.c           | 25 ++++++++++++++++++-------
>>  arch/x86/kvm/irq_comm.c         |  9 +++++++--
>>  arch/x86/kvm/x86.c              | 22 +++++++++++++++++++++-
>>  4 files changed, 56 insertions(+), 10 deletions(-)
>> 
>> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
>> index 3513edee8e22..a809a9e4fa5c 100644
>> --- a/arch/x86/include/asm/kvm_host.h
>> +++ b/arch/x86/include/asm/kvm_host.h
>> @@ -15,6 +15,7 @@
>>  #include <linux/cpumask.h>
>>  #include <linux/irq_work.h>
>>  #include <linux/irq.h>
>> +#include <linux/local_lock.h>
>>  
>>  #include <linux/kvm.h>
>>  #include <linux/kvm_para.h>
>> @@ -1591,6 +1592,15 @@ extern bool kvm_has_bus_lock_exit;
>>  /* maximum vcpu-id */
>>  unsigned int kvm_max_vcpu_id(void);
>>  
>> +/* per cpu vcpu bitmasks, protected by kvm_pcpu_mask_lock */
>> +DECLARE_PER_CPU(local_lock_t, kvm_pcpu_mask_lock);
>> +extern unsigned long __percpu *kvm_pcpu_vcpu_mask;
>> +#define KVM_VCPU_MASK_SZ	\
>> +	(sizeof(*kvm_pcpu_vcpu_mask) * BITS_TO_LONGS(KVM_MAX_VCPUS))
>> +extern u64 __percpu *kvm_hv_vp_bitmap;
>> +#define KVM_HV_MAX_SPARSE_VCPU_SET_BITS DIV_ROUND_UP(KVM_MAX_VCPUS, 64)
>> +#define KVM_HV_VPMAP_SZ		(sizeof(u64) * KVM_HV_MAX_SPARSE_VCPU_SET_BITS)
>
> I have just realized that the Hyper-V sparse bitmap format can
> support only up to 4096 CPUs, and the current implementation of
> sparse_set_to_vcpu_mask() won't even work correctly if
> KVM_MAX_VCPUS is larger than 4096.

Nice catch! Indeed, we can only encode 64 'banks' with 64 vCPUs each. We
need to enforce the limit somehow. As a big hammer, I can suggest
failing kvm_hv_vcpu_init() and writes to HV_X64_MSR_VP_INDEX for vCPUs
above 4095 for now (I seriously doubt there's a real need to run such
big Windows guests anywhere, but who knows).
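
Roughly (sketch only, with the limit spelled out as a literal):

	/* The sparse VP set format can address 64 banks * 64 vCPUs. */
	if (vcpu->vcpu_id >= 64 * 64)
		return -EINVAL;

in kvm_hv_vcpu_init(), plus an analogous rejection in the VP_INDEX MSR
handler.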

>
> This means vp_bitmap can't and will never be larger than 512
> bytes.  Isn't a per-CPU variable for vp_bitmap overkill in this
> case?

Yes, it's OK to allocate 512 bytes on the stack.
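
(I.e. the pre-patch on-stack "u64 vp_bitmap[KVM_HV_MAX_SPARSE_VCPU_SET_BITS];"
stays fine as long as that constant is capped at 4096 / 64 = 64 words.)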

-- 
Vitaly


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2 5/6] kvm: allocate vcpu pointer array separately
  2021-09-06  9:46       ` Marc Zyngier
@ 2021-09-09 20:28         ` Sean Christopherson
  0 siblings, 0 replies; 23+ messages in thread
From: Sean Christopherson @ 2021-09-09 20:28 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: Juergen Gross, kvm, x86, linux-arm-kernel, linux-kernel,
	ehabkost, James Morse, Alexandru Elisei, Suzuki K Poulose,
	Catalin Marinas, Will Deacon, Paolo Bonzini, Vitaly Kuznetsov,
	Wanpeng Li, Jim Mattson, Joerg Roedel, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, H. Peter Anvin, kvmarm

On Mon, Sep 06, 2021, Marc Zyngier wrote:
> On Mon, 06 Sep 2021 05:33:35 +0100,
> Juergen Gross <jgross@suse.com> wrote:
> > 
> > On 03.09.21 16:41, Marc Zyngier wrote:
> >
> > > At this stage, I really wonder why we are not using an xarray instead.
> > > 
> > > I wrote this [1] a while ago, and nothing caught fire. It was also a
> > > net deletion of code...
> > 
> > Indeed, I'd prefer that solution!
> > 
> > Are you fine with me swapping my patch with yours in the series?
> 
> Of course, feel free to grab the whole series. You'll probably need
> the initial patches to set the scene for this. On their own, they are
> a nice cleanup, and I trust you can write a decent commit message for
> the three patches affecting mips/s390/x86.

It would also be a good idea to convert kvm_for_each_vcpu() to use
xa_for_each(); I assume that's more performant than 2x atomic_read() +
xa_load().  Unless I'm misreading the code, xa_for_each() is guaranteed
to iterate in ascending index order, i.e. it preserves the current
vcpu0..vcpuN iteration order.
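
A sketch of what that could look like (untested, and assuming the
array becomes an xarray named kvm->vcpu_array):

	#define kvm_for_each_vcpu(idx, vcpup, kvm) \
		xa_for_each_range(&(kvm)->vcpu_array, idx, vcpup, 0, \
				  atomic_read(&(kvm)->online_vcpus) - 1)

which would keep the ascending-index order today's callers rely on.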

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2 2/6] x86/kvm: add boot parameter for adding vcpu-id bits
  2021-09-03 13:08 ` [PATCH v2 2/6] x86/kvm: add boot parameter for adding vcpu-id bits Juergen Gross
  2021-09-03 13:43   ` Vitaly Kuznetsov
  2021-09-03 19:48   ` Eduardo Habkost
@ 2021-09-28 16:41   ` Paolo Bonzini
  2 siblings, 0 replies; 23+ messages in thread
From: Paolo Bonzini @ 2021-09-28 16:41 UTC (permalink / raw)
  To: Juergen Gross, kvm, x86, linux-doc, linux-kernel
  Cc: maz, ehabkost, Jonathan Corbet, Sean Christopherson,
	Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, H. Peter Anvin

On 03/09/21 15:08, Juergen Gross wrote:
> +	if (vcpu_id_add_bits >= 0) {
> +		n_bits += vcpu_id_add_bits;
> +	} else {
> +		n_bits++;		/* One additional bit for core level. */
> +		if (topology_max_die_per_package() > 1)
> +			n_bits++;	/* One additional bit for die level. */

This needs to be unconditional since it is always possible to emulate a 
multiple-die-per-package topology for a guest, even if the host has just 
one.
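
I.e. simply (sketch):

	} else {
		/* Core and die levels can each be non-power-of-2 in a guest. */
		n_bits += 2;
	}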

Paolo


> +	}
> +
> +	if (!n_bits)


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2 4/6] kvm: use kvfree() in kvm_arch_free_vm()
  2021-09-03 13:08 ` [PATCH v2 4/6] kvm: use kvfree() in kvm_arch_free_vm() Juergen Gross
@ 2021-09-28 16:48   ` Paolo Bonzini
  0 siblings, 0 replies; 23+ messages in thread
From: Paolo Bonzini @ 2021-09-28 16:48 UTC (permalink / raw)
  To: Juergen Gross, kvm, x86, linux-arm-kernel, linux-kernel
  Cc: maz, ehabkost, James Morse, Alexandru Elisei, Suzuki K Poulose,
	Catalin Marinas, Will Deacon, Sean Christopherson,
	Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, H. Peter Anvin,
	kvmarm

On 03/09/21 15:08, Juergen Gross wrote:
> By switching from kfree() to kvfree() in kvm_arch_free_vm(), Arm64 can
> use the common variant. This can be accomplished by adding another
> macro __KVM_HAVE_ARCH_VM_FREE, which will be used only by x86 for now.
> 
> Further simplification can be achieved by adding __kvm_arch_free_vm()
> doing the common part.
> 
> Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
> Signed-off-by: Juergen Gross <jgross@suse.com>

Queued this one already, thanks.

Paolo


^ permalink raw reply	[flat|nested] 23+ messages in thread

end of thread, other threads:[~2021-09-28 16:48 UTC | newest]

Thread overview: 23+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-09-03 13:08 [PATCH v2 0/6] x86/kvm: add boot parameters for max vcpu configs Juergen Gross
2021-09-03 13:08 ` [PATCH v2 1/6] x86/kvm: remove non-x86 stuff from arch/x86/kvm/ioapic.h Juergen Gross
2021-09-03 13:08 ` [PATCH v2 2/6] x86/kvm: add boot parameter for adding vcpu-id bits Juergen Gross
2021-09-03 13:43   ` Vitaly Kuznetsov
2021-09-03 13:53     ` Juergen Gross
2021-09-03 19:48   ` Eduardo Habkost
2021-09-06  4:46     ` Juergen Gross
2021-09-28 16:41   ` Paolo Bonzini
2021-09-03 13:08 ` [PATCH v2 3/6] x86/kvm: introduce per cpu vcpu masks Juergen Gross
2021-09-03 16:05   ` Eduardo Habkost
2021-09-06  4:34     ` Juergen Gross
2021-09-07 18:34   ` Eduardo Habkost
2021-09-08  8:41     ` Vitaly Kuznetsov
2021-09-03 13:08 ` [PATCH v2 4/6] kvm: use kvfree() in kvm_arch_free_vm() Juergen Gross
2021-09-28 16:48   ` Paolo Bonzini
2021-09-03 13:08 ` [PATCH v2 5/6] kvm: allocate vcpu pointer array separately Juergen Gross
2021-09-03 14:41   ` Marc Zyngier
2021-09-06  4:33     ` Juergen Gross
2021-09-06  9:46       ` Marc Zyngier
2021-09-09 20:28         ` Sean Christopherson
2021-09-03 13:08 ` [PATCH v2 6/6] x86/kvm: add boot parameter for setting max number of vcpus per guest Juergen Gross
2021-09-06  0:45   ` Yao Yuan
2021-09-06  4:47     ` Juergen Gross
