kvm.vger.kernel.org archive mirror
* [PATCH v3 0/4] x86/kvm: add boot parameters for max vcpu configs
@ 2021-11-16 14:10 Juergen Gross
  2021-11-16 14:10 ` [PATCH v3 1/4] x86/kvm: add boot parameter for adding vcpu-id bits Juergen Gross
                   ` (3 more replies)
  0 siblings, 4 replies; 24+ messages in thread
From: Juergen Gross @ 2021-11-16 14:10 UTC (permalink / raw)
  To: kvm, x86, linux-doc, linux-kernel
  Cc: Juergen Gross, Jonathan Corbet, Paolo Bonzini,
	Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	Dave Hansen, H. Peter Anvin

In order to be able to support even huge numbers of vcpus per guest
with a single kernel, some arrays should be sized dynamically.

The easiest way to do that is to add boot parameters for the maximum
number of vcpus and to calculate the maximum vcpu-id from that using
either the host topology or a topology hint via another boot parameter.

This patch series does that for x86. The same scheme can easily be
adapted to other architectures, but I don't want to do that in the
first iteration.

I've tested on x86 that the series doesn't break normal guest
operation and that the new parameters are effective.

This series is based on Marc Zyngier's xarray series:
https://lore.kernel.org/kvm/20211105192101.3862492-1-maz@kernel.org/

Changes in V2:
- removed old patch 1, as already applied
- patch 1 (old patch 2) only for reference, as the patch is already in
  the kvm tree
- switch patch 2 (old patch 3) to calculate vcpu-id
- added patch 4

Changes in V3:
- removed V2 patches 1 and 4, as already applied
- removed V2 patch 5, as replaced by Marc Zyngier's xarray series
- removed hyperv handling from patch 2
- new patch 3 handling hyperv specifics
- comments addressed

Juergen Gross (4):
  x86/kvm: add boot parameter for adding vcpu-id bits
  x86/kvm: introduce a per cpu vcpu mask
  x86/kvm: add max number of vcpus for hyperv emulation
  x86/kvm: add boot parameter for setting max number of vcpus per guest

 .../admin-guide/kernel-parameters.txt         | 25 +++++++++
 arch/x86/include/asm/kvm_host.h               | 29 +++++-----
 arch/x86/kvm/hyperv.c                         | 15 +++---
 arch/x86/kvm/ioapic.c                         | 20 ++++++-
 arch/x86/kvm/ioapic.h                         |  4 +-
 arch/x86/kvm/irq_comm.c                       |  9 +++-
 arch/x86/kvm/x86.c                            | 54 ++++++++++++++++++-
 7 files changed, 128 insertions(+), 28 deletions(-)

-- 
2.26.2


^ permalink raw reply	[flat|nested] 24+ messages in thread

* [PATCH v3 1/4] x86/kvm: add boot parameter for adding vcpu-id bits
  2021-11-16 14:10 [PATCH v3 0/4] x86/kvm: add boot parameters for max vcpu configs Juergen Gross
@ 2021-11-16 14:10 ` Juergen Gross
  2021-11-17  6:59   ` Juergen Gross
  2021-11-17 23:44   ` Sean Christopherson
  2021-11-16 14:10 ` [PATCH v3 2/4] x86/kvm: introduce a per cpu vcpu mask Juergen Gross
                   ` (2 subsequent siblings)
  3 siblings, 2 replies; 24+ messages in thread
From: Juergen Gross @ 2021-11-16 14:10 UTC (permalink / raw)
  To: kvm, x86, linux-doc, linux-kernel
  Cc: Juergen Gross, Jonathan Corbet, Paolo Bonzini,
	Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	Dave Hansen, H. Peter Anvin

Today the maximum vcpu-id of a kvm guest's vcpu on x86 systems is set
via a #define in a header file.

In order to support higher vcpu-ids without generally increasing the
memory consumption of guests on the host (some guest structures contain
arrays sized by KVM_MAX_VCPU_IDS), add a boot parameter for adding some
bits to the vcpu-id. Additional bits are needed as the vcpu-id is
constructed via bit-wise concatenation of socket-id, core-id, etc.
As those ids' maximum values are not always a power of 2, the vcpu-ids
are sparse.

The additional number of bits needed is basically the number of
topology levels with a non-power-of-2 maximum value, excluding the
topmost level.
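
As an illustration (this is not part of the patch), the following user
space sketch shows how the bit-wise concatenation makes the vcpu-id
space sparse; the guest topology used below (2 threads per core, 6
cores per die, 3 dies per socket, 2 sockets) is assumed purely for the
example:

#include <stdio.h>

/* Number of bits needed to hold the ids 0 .. n-1. */
static unsigned int bits_for(unsigned int n)
{
        unsigned int bits = 0;

        while ((1u << bits) < n)
                bits++;
        return bits;
}

int main(void)
{
        unsigned int threads = 2, cores = 6, dies = 3, sockets = 2;
        unsigned int t_bits = bits_for(threads);  /* 1 */
        unsigned int c_bits = bits_for(cores);    /* 3 (6 is not a power of 2) */
        unsigned int d_bits = bits_for(dies);     /* 2 (3 is not a power of 2) */
        unsigned int max_vcpus = threads * cores * dies * sockets;
        unsigned int max_id = ((sockets - 1) << (d_bits + c_bits + t_bits)) |
                              ((dies - 1) << (c_bits + t_bits)) |
                              ((cores - 1) << t_bits) |
                              (threads - 1);

        /* Prints: max_vcpus=72 highest vcpu-id=107, i.e. the ids are sparse. */
        printf("max_vcpus=%u highest vcpu-id=%u\n", max_vcpus, max_id);
        return 0;
}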

The default value of the new parameter will be 2 in order to support
today's possible topologies. The special value of -1 will use the
number of bits needed for a guest with the current host's topology.

Calculating the maximum vcpu-id dynamically requires the arrays which
use KVM_MAX_VCPU_IDS as their size to be allocated dynamically.

Signed-off-by: Juergen Gross <jgross@suse.com>
---
V2:
- switch to specifying additional bits (based on comment by Vitaly
  Kuznetsov)
V3:
- set default of new parameter to 2 (Eduardo Habkost)
- deliberately NOT add another bit for topology_max_die_per_package()
  == 1 AND parameter being -1, as this would make this parameter
  setting always equivalent to specifying "2"

Signed-off-by: Juergen Gross <jgross@suse.com>
---
 .../admin-guide/kernel-parameters.txt         | 18 ++++++++++++
 arch/x86/include/asm/kvm_host.h               | 16 ++--------
 arch/x86/kvm/ioapic.c                         | 12 +++++++-
 arch/x86/kvm/ioapic.h                         |  4 +--
 arch/x86/kvm/x86.c                            | 29 +++++++++++++++++++
 5 files changed, 63 insertions(+), 16 deletions(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 9725c546a0d4..e269c3f66ba4 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -2445,6 +2445,24 @@
 			feature (tagged TLBs) on capable Intel chips.
 			Default is 1 (enabled)
 
+	kvm.vcpu_id_add_bits=
+			[KVM,X86] The vcpu-ids of guests are sparse, as they
+			are constructed by bit-wise concatenation of the ids of
+			the different topology levels (sockets, cores, threads).
+
+			This parameter specifies how many additional bits the
+			maximum vcpu-id needs compared to the maximum number of
+			vcpus.
+
+			Normally this value is the number of topology levels
+			without the threads level and without the highest
+			level.
+
+			The special value -1 can be used to support guests
+			with the same topology as the host.
+
+			Default: 2
+
 	l1d_flush=	[X86,INTEL]
 			Control mitigation for L1D based snooping vulnerability.
 
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index e5d8700319cc..bcef56f1039a 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -38,19 +38,7 @@
 #define __KVM_HAVE_ARCH_VCPU_DEBUGFS
 
 #define KVM_MAX_VCPUS 1024
-
-/*
- * In x86, the VCPU ID corresponds to the APIC ID, and APIC IDs
- * might be larger than the actual number of VCPUs because the
- * APIC ID encodes CPU topology information.
- *
- * In the worst case, we'll need less than one extra bit for the
- * Core ID, and less than one extra bit for the Package (Die) ID,
- * so ratio of 4 should be enough.
- */
-#define KVM_VCPU_ID_RATIO 4
-#define KVM_MAX_VCPU_IDS (KVM_MAX_VCPUS * KVM_VCPU_ID_RATIO)
-
+#define KVM_MAX_VCPU_IDS kvm_max_vcpu_ids()
 /* memory slots that are not exposed to userspace */
 #define KVM_PRIVATE_MEM_SLOTS 3
 
@@ -1621,6 +1609,8 @@ extern u64  kvm_max_tsc_scaling_ratio;
 extern u64  kvm_default_tsc_scaling_ratio;
 /* bus lock detection supported? */
 extern bool kvm_has_bus_lock_exit;
+/* maximum vcpu-id */
+unsigned int kvm_max_vcpu_ids(void);
 
 extern u64 kvm_mce_cap_supported;
 
diff --git a/arch/x86/kvm/ioapic.c b/arch/x86/kvm/ioapic.c
index 816a82515dcd..64ba9b1c8b3d 100644
--- a/arch/x86/kvm/ioapic.c
+++ b/arch/x86/kvm/ioapic.c
@@ -685,11 +685,21 @@ static const struct kvm_io_device_ops ioapic_mmio_ops = {
 int kvm_ioapic_init(struct kvm *kvm)
 {
 	struct kvm_ioapic *ioapic;
+	size_t sz;
 	int ret;
 
-	ioapic = kzalloc(sizeof(struct kvm_ioapic), GFP_KERNEL_ACCOUNT);
+	sz = sizeof(struct kvm_ioapic) +
+	     sizeof(*ioapic->rtc_status.dest_map.map) *
+		    BITS_TO_LONGS(KVM_MAX_VCPU_IDS) +
+	     sizeof(*ioapic->rtc_status.dest_map.vectors) *
+		    (KVM_MAX_VCPU_IDS);
+	ioapic = kzalloc(sz, GFP_KERNEL_ACCOUNT);
 	if (!ioapic)
 		return -ENOMEM;
+	ioapic->rtc_status.dest_map.map = (void *)(ioapic + 1);
+	ioapic->rtc_status.dest_map.vectors =
+		(void *)(ioapic->rtc_status.dest_map.map +
+			 BITS_TO_LONGS(KVM_MAX_VCPU_IDS));
 	spin_lock_init(&ioapic->lock);
 	INIT_DELAYED_WORK(&ioapic->eoi_inject, kvm_ioapic_eoi_inject_work);
 	kvm->arch.vioapic = ioapic;
diff --git a/arch/x86/kvm/ioapic.h b/arch/x86/kvm/ioapic.h
index e66e620c3bed..623a3c5afad7 100644
--- a/arch/x86/kvm/ioapic.h
+++ b/arch/x86/kvm/ioapic.h
@@ -39,13 +39,13 @@ struct kvm_vcpu;
 
 struct dest_map {
 	/* vcpu bitmap where IRQ has been sent */
-	DECLARE_BITMAP(map, KVM_MAX_VCPU_IDS);
+	unsigned long *map;
 
 	/*
 	 * Vector sent to a given vcpu, only valid when
 	 * the vcpu's bit in map is set
 	 */
-	u8 vectors[KVM_MAX_VCPU_IDS];
+	u8 *vectors;
 };
 
 
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 259f719014c9..61bab2bdeefb 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -80,6 +80,7 @@
 #include <asm/intel_pt.h>
 #include <asm/emulate_prefix.h>
 #include <asm/sgx.h>
+#include <asm/topology.h>
 #include <clocksource/hyperv_timer.h>
 
 #define CREATE_TRACE_POINTS
@@ -186,6 +187,34 @@ module_param(force_emulation_prefix, bool, S_IRUGO);
 int __read_mostly pi_inject_timer = -1;
 module_param(pi_inject_timer, bint, S_IRUGO | S_IWUSR);
 
+static int __read_mostly vcpu_id_add_bits = 2;
+module_param(vcpu_id_add_bits, int, S_IRUGO);
+
+unsigned int kvm_max_vcpu_ids(void)
+{
+	int n_bits = fls(KVM_MAX_VCPUS - 1);
+
+	if (vcpu_id_add_bits < -1 || vcpu_id_add_bits > (32 - n_bits)) {
+		pr_err("Invalid value of vcpu_id_add_bits=%d parameter!\n",
+		       vcpu_id_add_bits);
+		vcpu_id_add_bits = 2;
+	}
+
+	if (vcpu_id_add_bits >= 0) {
+		n_bits += vcpu_id_add_bits;
+	} else {
+		n_bits++;		/* One additional bit for core level. */
+		if (topology_max_die_per_package() > 1)
+			n_bits++;	/* One additional bit for die level. */
+	}
+
+	if (!n_bits)
+		n_bits = 1;
+
+	return 1U << n_bits;
+}
+EXPORT_SYMBOL_GPL(kvm_max_vcpu_ids);
+
 /*
  * Restoring the host value for MSRs that are only consumed when running in
  * usermode, e.g. SYSCALL MSRs and TSC_AUX, can be deferred until the CPU
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH v3 2/4] x86/kvm: introduce a per cpu vcpu mask
  2021-11-16 14:10 [PATCH v3 0/4] x86/kvm: add boot parameters for max vcpu configs Juergen Gross
  2021-11-16 14:10 ` [PATCH v3 1/4] x86/kvm: add boot parameter for adding vcpu-id bits Juergen Gross
@ 2021-11-16 14:10 ` Juergen Gross
  2021-11-16 14:10 ` [PATCH v3 3/4] x86/kvm: add max number of vcpus for hyperv emulation Juergen Gross
  2021-11-16 14:10 ` [PATCH v3 4/4] x86/kvm: add boot parameter for setting max number of vcpus per guest Juergen Gross
  3 siblings, 0 replies; 24+ messages in thread
From: Juergen Gross @ 2021-11-16 14:10 UTC (permalink / raw)
  To: kvm, x86, linux-kernel
  Cc: Juergen Gross, Paolo Bonzini, Sean Christopherson,
	Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen,
	H. Peter Anvin

In order to support high vcpu numbers per guest, don't use an on-stack
vcpu bitmask. As the currently used bitmask is not used in functions
subject to recursion, it is fairly easy to replace it with a percpu
bitmask.

Allocate this bitmask dynamically in order to support a boot-time
specified max number of vcpus in the future.

Take a local lock while such a bitmask is being used in order to
avoid concurrent usage of the same per-cpu mask, e.g. after being
preempted or migrated to another cpu.

Note that this doesn't apply to vcpu bitmasks used in hyperv.c, as
there the max number of vcpus is architecturally limited to 4096 and
that bitmask can remain on the stack.
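
For illustration only, a minimal sketch of that pattern follows; all
names below (example_mask_lock, example_vcpu_mask, EXAMPLE_MAX_VCPUS)
are assumed for the example and are not taken from the patch:

#include <linux/local_lock.h>
#include <linux/percpu.h>
#include <linux/bitmap.h>
#include <linux/errno.h>

#define EXAMPLE_MAX_VCPUS 1024

static DEFINE_PER_CPU(local_lock_t, example_mask_lock) =
        INIT_LOCAL_LOCK(example_mask_lock);
static unsigned long __percpu *example_vcpu_mask;

static int example_alloc_masks(void)
{
        /* One bitmask per cpu, allocated once at setup time. */
        example_vcpu_mask = __alloc_percpu(BITS_TO_LONGS(EXAMPLE_MAX_VCPUS) *
                                           sizeof(unsigned long),
                                           sizeof(unsigned long));
        return example_vcpu_mask ? 0 : -ENOMEM;
}

static void example_use_mask(void)
{
        unsigned long *mask;

        /* Serializes users of this cpu's mask (and pins us to the cpu). */
        local_lock(&example_mask_lock);
        mask = this_cpu_ptr(example_vcpu_mask);
        bitmap_zero(mask, EXAMPLE_MAX_VCPUS);

        /* ... fill the mask and act on it ... */

        local_unlock(&example_mask_lock);
}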

Signed-off-by: Juergen Gross <jgross@suse.com>
---
V2:
- use local_lock() instead of preempt_disable() (Paolo Bonzini)
V3:
- drop hyperv.c related changes (Eduardo Habkost)
---
 arch/x86/include/asm/kvm_host.h |  7 +++++++
 arch/x86/kvm/ioapic.c           |  8 +++++++-
 arch/x86/kvm/irq_comm.c         |  9 +++++++--
 arch/x86/kvm/x86.c              | 18 +++++++++++++++++-
 4 files changed, 38 insertions(+), 4 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index bcef56f1039a..886930ec8264 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -15,6 +15,7 @@
 #include <linux/cpumask.h>
 #include <linux/irq_work.h>
 #include <linux/irq.h>
+#include <linux/local_lock.h>
 
 #include <linux/kvm.h>
 #include <linux/kvm_para.h>
@@ -1612,6 +1613,12 @@ extern bool kvm_has_bus_lock_exit;
 /* maximum vcpu-id */
 unsigned int kvm_max_vcpu_ids(void);
 
+/* per cpu vcpu bitmask, protected by kvm_pcpu_mask_lock */
+DECLARE_PER_CPU(local_lock_t, kvm_pcpu_mask_lock);
+extern unsigned long __percpu *kvm_pcpu_vcpu_mask;
+#define KVM_VCPU_MASK_SZ	\
+	(sizeof(*kvm_pcpu_vcpu_mask) * BITS_TO_LONGS(KVM_MAX_VCPUS))
+
 extern u64 kvm_mce_cap_supported;
 
 /*
diff --git a/arch/x86/kvm/ioapic.c b/arch/x86/kvm/ioapic.c
index 64ba9b1c8b3d..c81963a27594 100644
--- a/arch/x86/kvm/ioapic.c
+++ b/arch/x86/kvm/ioapic.c
@@ -320,7 +320,7 @@ static void ioapic_write_indirect(struct kvm_ioapic *ioapic, u32 val)
 	bool mask_before, mask_after;
 	union kvm_ioapic_redirect_entry *e;
 	int old_remote_irr, old_delivery_status, old_dest_id, old_dest_mode;
-	DECLARE_BITMAP(vcpu_bitmap, KVM_MAX_VCPUS);
+	unsigned long *vcpu_bitmap;
 
 	switch (ioapic->ioregsel) {
 	case IOAPIC_REG_VERSION:
@@ -384,6 +384,10 @@ static void ioapic_write_indirect(struct kvm_ioapic *ioapic, u32 val)
 			irq.shorthand = APIC_DEST_NOSHORT;
 			irq.dest_id = e->fields.dest_id;
 			irq.msi_redir_hint = false;
+
+			local_lock(&kvm_pcpu_mask_lock);
+
+			vcpu_bitmap = this_cpu_ptr(kvm_pcpu_vcpu_mask);
 			bitmap_zero(vcpu_bitmap, KVM_MAX_VCPUS);
 			kvm_bitmap_or_dest_vcpus(ioapic->kvm, &irq,
 						 vcpu_bitmap);
@@ -403,6 +407,8 @@ static void ioapic_write_indirect(struct kvm_ioapic *ioapic, u32 val)
 			}
 			kvm_make_scan_ioapic_request_mask(ioapic->kvm,
 							  vcpu_bitmap);
+
+			local_unlock(&kvm_pcpu_mask_lock);
 		} else {
 			kvm_make_scan_ioapic_request(ioapic->kvm);
 		}
diff --git a/arch/x86/kvm/irq_comm.c b/arch/x86/kvm/irq_comm.c
index d5b72a08e566..c331204de007 100644
--- a/arch/x86/kvm/irq_comm.c
+++ b/arch/x86/kvm/irq_comm.c
@@ -47,7 +47,7 @@ int kvm_irq_delivery_to_apic(struct kvm *kvm, struct kvm_lapic *src,
 {
 	int i, r = -1;
 	struct kvm_vcpu *vcpu, *lowest = NULL;
-	unsigned long dest_vcpu_bitmap[BITS_TO_LONGS(KVM_MAX_VCPUS)];
+	unsigned long *dest_vcpu_bitmap;
 	unsigned int dest_vcpus = 0;
 
 	if (kvm_irq_delivery_to_apic_fast(kvm, src, irq, &r, dest_map))
@@ -59,7 +59,10 @@ int kvm_irq_delivery_to_apic(struct kvm *kvm, struct kvm_lapic *src,
 		irq->delivery_mode = APIC_DM_FIXED;
 	}
 
-	memset(dest_vcpu_bitmap, 0, sizeof(dest_vcpu_bitmap));
+	local_lock(&kvm_pcpu_mask_lock);
+	dest_vcpu_bitmap = this_cpu_ptr(kvm_pcpu_vcpu_mask);
+
+	memset(dest_vcpu_bitmap, 0, KVM_VCPU_MASK_SZ);
 
 	kvm_for_each_vcpu(i, vcpu, kvm) {
 		if (!kvm_apic_present(vcpu))
@@ -93,6 +96,8 @@ int kvm_irq_delivery_to_apic(struct kvm *kvm, struct kvm_lapic *src,
 		lowest = kvm_get_vcpu(kvm, idx);
 	}
 
+	local_unlock(&kvm_pcpu_mask_lock);
+
 	if (lowest)
 		r = kvm_apic_set_irq(lowest, irq, dest_map);
 
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 61bab2bdeefb..a388acdc5eb0 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -215,6 +215,10 @@ unsigned int kvm_max_vcpu_ids(void)
 }
 EXPORT_SYMBOL_GPL(kvm_max_vcpu_ids);
 
+DEFINE_PER_CPU(local_lock_t, kvm_pcpu_mask_lock) =
+	INIT_LOCAL_LOCK(kvm_pcpu_mask_lock);
+unsigned long __percpu *kvm_pcpu_vcpu_mask;
+
 /*
  * Restoring the host value for MSRs that are only consumed when running in
  * usermode, e.g. SYSCALL MSRs and TSC_AUX, can be deferred until the CPU
@@ -11247,9 +11251,16 @@ int kvm_arch_hardware_setup(void *opaque)
 	if (boot_cpu_has(X86_FEATURE_XSAVES))
 		rdmsrl(MSR_IA32_XSS, host_xss);
 
+	kvm_pcpu_vcpu_mask = __alloc_percpu(KVM_VCPU_MASK_SZ,
+					    sizeof(unsigned long));
+	if (!kvm_pcpu_vcpu_mask) {
+		r = -ENOMEM;
+		goto err;
+	}
+
 	r = ops->hardware_setup();
 	if (r != 0)
-		return r;
+		goto err;
 
 	memcpy(&kvm_x86_ops, ops->runtime_ops, sizeof(kvm_x86_ops));
 	kvm_ops_static_call_update();
@@ -11277,11 +11288,16 @@ int kvm_arch_hardware_setup(void *opaque)
 
 	kvm_init_msr_list();
 	return 0;
+
+ err:
+	free_percpu(kvm_pcpu_vcpu_mask);
+	return r;
 }
 
 void kvm_arch_hardware_unsetup(void)
 {
 	static_call(kvm_x86_hardware_unsetup)();
+	free_percpu(kvm_pcpu_vcpu_mask);
 }
 
 int kvm_arch_check_processor_compat(void *opaque)
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH v3 3/4] x86/kvm: add max number of vcpus for hyperv emulation
  2021-11-16 14:10 [PATCH v3 0/4] x86/kvm: add boot parameters for max vcpu configs Juergen Gross
  2021-11-16 14:10 ` [PATCH v3 1/4] x86/kvm: add boot parameter for adding vcpu-id bits Juergen Gross
  2021-11-16 14:10 ` [PATCH v3 2/4] x86/kvm: introduce a per cpu vcpu mask Juergen Gross
@ 2021-11-16 14:10 ` Juergen Gross
  2021-11-17 20:50   ` Sean Christopherson
  2021-11-16 14:10 ` [PATCH v3 4/4] x86/kvm: add boot parameter for setting max number of vcpus per guest Juergen Gross
  3 siblings, 1 reply; 24+ messages in thread
From: Juergen Gross @ 2021-11-16 14:10 UTC (permalink / raw)
  To: kvm, x86, linux-kernel
  Cc: Juergen Gross, Paolo Bonzini, Sean Christopherson,
	Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen,
	H. Peter Anvin

When emulating Hyper-V the theoretical maximum of supported vcpus is
4096, as this is the architectural limit for sending IPIs via the PV
interface.

For restricting the actually supported number of vcpus for that case,
introduce another define, KVM_MAX_HYPERV_VCPUS, and set it to 1024,
like today's KVM_MAX_VCPUS. Make both values unsigned, as this will be
needed later.

The actual number of supported vcpus for Hyper-V emulation will be the
lower of the two defines.

This is a preparation for a future boot parameter support of the max
number of vcpus for a KVM guest.
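
For illustration only (nothing below is part of the patch): the 4096
limit stems from the sparse vcpu set format of the PV IPI and TLB flush
hypercalls, where a 64-bit valid_bank_mask selects up to 64 banks of 64
VPs each, so 64 * 64 = 4096 is the highest VP count the interface can
express. A vcpu with a given vp_index is addressed roughly as:

        bank = vp_index / 64;   /* bit to set in valid_bank_mask    */
        bit  = vp_index % 64;   /* bit to set in sparse_banks[bank] */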

Signed-off-by: Juergen Gross <jgross@suse.com>
---
V3:
- new patch
---
 arch/x86/include/asm/kvm_host.h |  3 ++-
 arch/x86/kvm/hyperv.c           | 15 ++++++++-------
 2 files changed, 10 insertions(+), 8 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 886930ec8264..8ea03ff01c45 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -38,7 +38,8 @@
 
 #define __KVM_HAVE_ARCH_VCPU_DEBUGFS
 
-#define KVM_MAX_VCPUS 1024
+#define KVM_MAX_VCPUS 1024U
+#define KVM_MAX_HYPERV_VCPUS 1024U
 #define KVM_MAX_VCPU_IDS kvm_max_vcpu_ids()
 /* memory slots that are not exposed to userspace */
 #define KVM_PRIVATE_MEM_SLOTS 3
diff --git a/arch/x86/kvm/hyperv.c b/arch/x86/kvm/hyperv.c
index 4a555f32885a..c0fa837121f1 100644
--- a/arch/x86/kvm/hyperv.c
+++ b/arch/x86/kvm/hyperv.c
@@ -41,7 +41,7 @@
 /* "Hv#1" signature */
 #define HYPERV_CPUID_SIGNATURE_EAX 0x31237648
 
-#define KVM_HV_MAX_SPARSE_VCPU_SET_BITS DIV_ROUND_UP(KVM_MAX_VCPUS, 64)
+#define KVM_HV_MAX_SPARSE_VCPU_SET_BITS DIV_ROUND_UP(KVM_MAX_HYPERV_VCPUS, 64)
 
 static void stimer_mark_pending(struct kvm_vcpu_hv_stimer *stimer,
 				bool vcpu_kick);
@@ -166,7 +166,7 @@ static struct kvm_vcpu *get_vcpu_by_vpidx(struct kvm *kvm, u32 vpidx)
 	struct kvm_vcpu *vcpu = NULL;
 	int i;
 
-	if (vpidx >= KVM_MAX_VCPUS)
+	if (vpidx >= min(KVM_MAX_VCPUS, KVM_MAX_HYPERV_VCPUS))
 		return NULL;
 
 	vcpu = kvm_get_vcpu(kvm, vpidx);
@@ -1446,7 +1446,8 @@ static int kvm_hv_set_msr(struct kvm_vcpu *vcpu, u32 msr, u64 data, bool host)
 		struct kvm_hv *hv = to_kvm_hv(vcpu->kvm);
 		u32 new_vp_index = (u32)data;
 
-		if (!host || new_vp_index >= KVM_MAX_VCPUS)
+		if (!host ||
+		    new_vp_index >= min(KVM_MAX_VCPUS, KVM_MAX_HYPERV_VCPUS))
 			return 1;
 
 		if (new_vp_index == hv_vcpu->vp_index)
@@ -1729,7 +1730,7 @@ static __always_inline unsigned long *sparse_set_to_vcpu_mask(
 		return (unsigned long *)vp_bitmap;
 	}
 
-	bitmap_zero(vcpu_bitmap, KVM_MAX_VCPUS);
+	bitmap_zero(vcpu_bitmap, min(KVM_MAX_VCPUS, KVM_MAX_HYPERV_VCPUS));
 	kvm_for_each_vcpu(i, vcpu, kvm) {
 		if (test_bit(kvm_hv_get_vpindex(vcpu), (unsigned long *)vp_bitmap))
 			__set_bit(i, vcpu_bitmap);
@@ -1757,7 +1758,7 @@ static u64 kvm_hv_flush_tlb(struct kvm_vcpu *vcpu, struct kvm_hv_hcall *hc, bool
 	struct hv_tlb_flush_ex flush_ex;
 	struct hv_tlb_flush flush;
 	u64 vp_bitmap[KVM_HV_MAX_SPARSE_VCPU_SET_BITS];
-	DECLARE_BITMAP(vcpu_bitmap, KVM_MAX_VCPUS);
+	DECLARE_BITMAP(vcpu_bitmap, KVM_MAX_HYPERV_VCPUS);
 	unsigned long *vcpu_mask;
 	u64 valid_bank_mask;
 	u64 sparse_banks[64];
@@ -1880,7 +1881,7 @@ static u64 kvm_hv_send_ipi(struct kvm_vcpu *vcpu, struct kvm_hv_hcall *hc, bool
 	struct hv_send_ipi_ex send_ipi_ex;
 	struct hv_send_ipi send_ipi;
 	u64 vp_bitmap[KVM_HV_MAX_SPARSE_VCPU_SET_BITS];
-	DECLARE_BITMAP(vcpu_bitmap, KVM_MAX_VCPUS);
+	DECLARE_BITMAP(vcpu_bitmap, KVM_MAX_HYPERV_VCPUS);
 	unsigned long *vcpu_mask;
 	unsigned long valid_bank_mask;
 	u64 sparse_banks[64];
@@ -2505,7 +2506,7 @@ int kvm_get_hv_cpuid(struct kvm_vcpu *vcpu, struct kvm_cpuid2 *cpuid,
 
 		case HYPERV_CPUID_IMPLEMENT_LIMITS:
 			/* Maximum number of virtual processors */
-			ent->eax = KVM_MAX_VCPUS;
+			ent->eax = min(KVM_MAX_VCPUS, KVM_MAX_HYPERV_VCPUS);
 			/*
 			 * Maximum number of logical processors, matches
 			 * HyperV 2016.
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH v3 4/4] x86/kvm: add boot parameter for setting max number of vcpus per guest
  2021-11-16 14:10 [PATCH v3 0/4] x86/kvm: add boot parameters for max vcpu configs Juergen Gross
                   ` (2 preceding siblings ...)
  2021-11-16 14:10 ` [PATCH v3 3/4] x86/kvm: add max number of vcpus for hyperv emulation Juergen Gross
@ 2021-11-16 14:10 ` Juergen Gross
  2021-11-17 20:57   ` Sean Christopherson
  3 siblings, 1 reply; 24+ messages in thread
From: Juergen Gross @ 2021-11-16 14:10 UTC (permalink / raw)
  To: kvm, x86, linux-doc, linux-kernel
  Cc: Juergen Gross, Jonathan Corbet, Paolo Bonzini,
	Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	Dave Hansen, H. Peter Anvin

Today the maximum number of vcpus of a kvm guest is set via a #define
in a header file.

In order to support higher vcpu numbers for guests without generally
increasing the memory consumption of guests on the host, especially on
very large systems, add a boot parameter for specifying the maximum
number of allowed vcpus per guest.

The default will still be the current setting of 1024. The value 0 has
the special meaning of limiting the number of possible vcpus to the
number of possible cpus of the host.
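
As a usage example (the values are assumed for illustration, not a
recommendation): a host which should be able to run guests with up to
2048 vcpus, with the vcpu-id space sized for the host's own topology,
could boot with:

        kvm.max_vcpus=2048 kvm.vcpu_id_add_bits=-1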

Signed-off-by: Juergen Gross <jgross@suse.com>
---
V3:
- rebase
---
 Documentation/admin-guide/kernel-parameters.txt | 7 +++++++
 arch/x86/include/asm/kvm_host.h                 | 5 ++++-
 arch/x86/kvm/x86.c                              | 9 ++++++++-
 3 files changed, 19 insertions(+), 2 deletions(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index e269c3f66ba4..409a72c2d91b 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -2445,6 +2445,13 @@
 			feature (tagged TLBs) on capable Intel chips.
 			Default is 1 (enabled)
 
+	kvm.max_vcpus=	[KVM,X86] Set the maximum allowed number of vcpus per
+			guest. The special value 0 sets the limit to the number
+			of physical cpus possible on the host (including not
+			yet hotplugged cpus). Higher values will result in
+			slightly higher memory consumption per guest.
+			Default: 1024
+
 	kvm.vcpu_id_add_bits=
 			[KVM,X86] The vcpu-ids of guests are sparse, as they
 			are constructed by bit-wise concatenation of the ids of
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 8ea03ff01c45..8566e278ca91 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -38,7 +38,8 @@
 
 #define __KVM_HAVE_ARCH_VCPU_DEBUGFS
 
-#define KVM_MAX_VCPUS 1024U
+#define KVM_DEFAULT_MAX_VCPUS 1024U
+#define KVM_MAX_VCPUS max_vcpus
 #define KVM_MAX_HYPERV_VCPUS 1024U
 #define KVM_MAX_VCPU_IDS kvm_max_vcpu_ids()
 /* memory slots that are not exposed to userspace */
@@ -1611,6 +1612,8 @@ extern u64  kvm_max_tsc_scaling_ratio;
 extern u64  kvm_default_tsc_scaling_ratio;
 /* bus lock detection supported? */
 extern bool kvm_has_bus_lock_exit;
+/* maximum number of vcpus per guest */
+extern unsigned int max_vcpus;
 /* maximum vcpu-id */
 unsigned int kvm_max_vcpu_ids(void);
 
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index a388acdc5eb0..3571ea34135b 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -190,9 +190,13 @@ module_param(pi_inject_timer, bint, S_IRUGO | S_IWUSR);
 static int __read_mostly vcpu_id_add_bits = 2;
 module_param(vcpu_id_add_bits, int, S_IRUGO);
 
+unsigned int __read_mostly max_vcpus = KVM_DEFAULT_MAX_VCPUS;
+module_param(max_vcpus, uint, S_IRUGO);
+EXPORT_SYMBOL_GPL(max_vcpus);
+
 unsigned int kvm_max_vcpu_ids(void)
 {
-	int n_bits = fls(KVM_MAX_VCPUS - 1);
+	int n_bits = fls(max_vcpus - 1);
 
 	if (vcpu_id_add_bits < -1 || vcpu_id_add_bits > (32 - n_bits)) {
 		pr_err("Invalid value of vcpu_id_add_bits=%d parameter!\n",
@@ -11251,6 +11255,9 @@ int kvm_arch_hardware_setup(void *opaque)
 	if (boot_cpu_has(X86_FEATURE_XSAVES))
 		rdmsrl(MSR_IA32_XSS, host_xss);
 
+	if (max_vcpus == 0)
+		max_vcpus = num_possible_cpus();
+
 	kvm_pcpu_vcpu_mask = __alloc_percpu(KVM_VCPU_MASK_SZ,
 					    sizeof(unsigned long));
 	if (!kvm_pcpu_vcpu_mask) {
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* Re: [PATCH v3 1/4] x86/kvm: add boot parameter for adding vcpu-id bits
  2021-11-16 14:10 ` [PATCH v3 1/4] x86/kvm: add boot parameter for adding vcpu-id bits Juergen Gross
@ 2021-11-17  6:59   ` Juergen Gross
  2021-11-17 23:46     ` Sean Christopherson
  2021-11-17 23:44   ` Sean Christopherson
  1 sibling, 1 reply; 24+ messages in thread
From: Juergen Gross @ 2021-11-17  6:59 UTC (permalink / raw)
  To: kvm, x86, linux-doc, linux-kernel
  Cc: Jonathan Corbet, Paolo Bonzini, Sean Christopherson,
	Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen,
	H. Peter Anvin



On 16.11.21 15:10, Juergen Gross wrote:
> Today the maximum vcpu-id of a kvm guest's vcpu on x86 systems is set
> via a #define in a header file.
> 
> In order to support higher vcpu-ids without generally increasing the
> memory consumption of guests on the host (some guest structures contain
> arrays sized by KVM_MAX_VCPU_IDS) add a boot parameter for adding some
> bits to the vcpu-id. Additional bits are needed as the vcpu-id is
> constructed via bit-wise concatenation of socket-id, core-id, etc.
> As those ids maximum values are not always a power of 2, the vcpu-ids
> are sparse.
> 
> The additional number of bits needed is basically the number of
> topology levels with a non-power-of-2 maximum value, excluding the top
> most level.
> 
> The default value of the new parameter will be 2 in order to support
> today's possible topologies. The special value of -1 will use the
> number of bits needed for a guest with the current host's topology.
> 
> Calculating the maximum vcpu-id dynamically requires to allocate the
> arrays using KVM_MAX_VCPU_IDS as the size dynamically.
> 
> Signed-of-by: Juergen Gross <jgross@suse.com>

Just thought about vcpu-ids a little bit more.

It would be possible to replace the topology games completely with an
arbitrary, rather high vcpu-id limit (65536?) and to allocate the
memory depending on the actual max vcpu-id just as needed.

Right now the only vcpu-id dependent memory is for the ioapic consisting
of a vcpu-id indexed bitmap and a vcpu-id indexed byte array (vectors).

We could start with a minimal size when setting up an ioapic and extend
the areas in case a newly created vcpu would introduce a vcpu-id
outside the currently allocated memory. Both arrays are protected by
the ioapic-specific lock (at least I couldn't spot any unprotected
usage when looking briefly into the code), so reallocating those arrays
shouldn't be hard. In case of ENOMEM the related vcpu creation would
just fail.
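
A rough sketch of that idea (function and parameter names are assumed,
error paths trimmed, and it relies on dest_map.map/vectors being
pointers as introduced in patch 1):

static int example_ioapic_grow(struct kvm_ioapic *ioapic,
                               u32 old_ids, u32 new_ids)
{
        unsigned long *map, *old_map;
        u8 *vectors, *old_vectors;

        /* Allocate the larger arrays without holding the lock. */
        map = bitmap_zalloc(new_ids, GFP_KERNEL_ACCOUNT);
        vectors = kzalloc(new_ids, GFP_KERNEL_ACCOUNT);
        if (!map || !vectors) {
                bitmap_free(map);
                kfree(vectors);
                return -ENOMEM;
        }

        /* Copy and swap under ioapic->lock so readers see consistent data. */
        spin_lock(&ioapic->lock);
        bitmap_copy(map, ioapic->rtc_status.dest_map.map, old_ids);
        memcpy(vectors, ioapic->rtc_status.dest_map.vectors, old_ids);
        old_map = ioapic->rtc_status.dest_map.map;
        old_vectors = ioapic->rtc_status.dest_map.vectors;
        ioapic->rtc_status.dest_map.map = map;
        ioapic->rtc_status.dest_map.vectors = vectors;
        spin_unlock(&ioapic->lock);

        bitmap_free(old_map);
        kfree(old_vectors);
        return 0;
}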

Thoughts?


Juergen


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v3 3/4] x86/kvm: add max number of vcpus for hyperv emulation
  2021-11-16 14:10 ` [PATCH v3 3/4] x86/kvm: add max number of vcpus for hyperv emulation Juergen Gross
@ 2021-11-17 20:50   ` Sean Christopherson
  2021-11-18  7:43     ` Juergen Gross
  0 siblings, 1 reply; 24+ messages in thread
From: Sean Christopherson @ 2021-11-17 20:50 UTC (permalink / raw)
  To: Juergen Gross
  Cc: kvm, x86, linux-kernel, Paolo Bonzini, Vitaly Kuznetsov,
	Wanpeng Li, Jim Mattson, Joerg Roedel, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, Dave Hansen, H. Peter Anvin

On Tue, Nov 16, 2021, Juergen Gross wrote:
> When emulating Hyperv the theoretical maximum of vcpus supported is
> 4096, as this is the architectural limit for sending IPIs via the PV
> interface.
> 
> For restricting the actual supported number of vcpus for that case
> introduce another define KVM_MAX_HYPERV_VCPUS and set it to 1024, like
> today's KVM_MAX_VCPUS. Make both values unsigned ones as this will be
> needed later.
> 
> The actual number of supported vcpus for Hyperv emulation will be the
> lower value of both defines.
> 
> This is a preparation for a future boot parameter support of the max
> number of vcpus for a KVM guest.
> 
> Signed-off-by: Juergen Gross <jgross@suse.com>
> ---
> V3:
> - new patch
> ---
>  arch/x86/include/asm/kvm_host.h |  3 ++-
>  arch/x86/kvm/hyperv.c           | 15 ++++++++-------
>  2 files changed, 10 insertions(+), 8 deletions(-)
> 
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index 886930ec8264..8ea03ff01c45 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -38,7 +38,8 @@
>  
>  #define __KVM_HAVE_ARCH_VCPU_DEBUGFS
>  
> -#define KVM_MAX_VCPUS 1024
> +#define KVM_MAX_VCPUS 1024U
> +#define KVM_MAX_HYPERV_VCPUS 1024U

I don't see any reason to put this in kvm_host.h, it should never be used outside
of hyperv.c.

>  #define KVM_MAX_VCPU_IDS kvm_max_vcpu_ids()
>  /* memory slots that are not exposed to userspace */
>  #define KVM_PRIVATE_MEM_SLOTS 3
> diff --git a/arch/x86/kvm/hyperv.c b/arch/x86/kvm/hyperv.c
> index 4a555f32885a..c0fa837121f1 100644
> --- a/arch/x86/kvm/hyperv.c
> +++ b/arch/x86/kvm/hyperv.c
> @@ -41,7 +41,7 @@
>  /* "Hv#1" signature */
>  #define HYPERV_CPUID_SIGNATURE_EAX 0x31237648
>  
> -#define KVM_HV_MAX_SPARSE_VCPU_SET_BITS DIV_ROUND_UP(KVM_MAX_VCPUS, 64)
> +#define KVM_HV_MAX_SPARSE_VCPU_SET_BITS DIV_ROUND_UP(KVM_MAX_HYPERV_VCPUS, 64)
>  
>  static void stimer_mark_pending(struct kvm_vcpu_hv_stimer *stimer,
>  				bool vcpu_kick);
> @@ -166,7 +166,7 @@ static struct kvm_vcpu *get_vcpu_by_vpidx(struct kvm *kvm, u32 vpidx)
>  	struct kvm_vcpu *vcpu = NULL;
>  	int i;
>  
> -	if (vpidx >= KVM_MAX_VCPUS)
> +	if (vpidx >= min(KVM_MAX_VCPUS, KVM_MAX_HYPERV_VCPUS))

IMO, this is conceptually wrong.  KVM should refuse to allow Hyper-V to be enabled
if the max number of vCPUs exceeds what can be supported, or should refuse to create
the vCPUs.  I agree it makes sense to add a Hyper-V specific limit, since there are
Hyper-V structures that have a hard limit, but detection of violations should be a
BUILD_BUG_ON, not a silent failure at runtime.
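
E.g. something along the lines of (illustration only, the exact limit
and placement would have to be decided):

	BUILD_BUG_ON(KVM_MAX_VCPUS > KVM_MAX_HYPERV_VCPUS);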

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v3 4/4] x86/kvm: add boot parameter for setting max number of vcpus per guest
  2021-11-16 14:10 ` [PATCH v3 4/4] x86/kvm: add boot parameter for setting max number of vcpus per guest Juergen Gross
@ 2021-11-17 20:57   ` Sean Christopherson
  2021-11-18  7:16     ` Juergen Gross
  0 siblings, 1 reply; 24+ messages in thread
From: Sean Christopherson @ 2021-11-17 20:57 UTC (permalink / raw)
  To: Juergen Gross
  Cc: kvm, x86, linux-doc, linux-kernel, Jonathan Corbet,
	Paolo Bonzini, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	Dave Hansen, H. Peter Anvin

On Tue, Nov 16, 2021, Juergen Gross wrote:
> Today the maximum number of vcpus of a kvm guest is set via a #define
> in a header file.
> 
> In order to support higher vcpu numbers for guests without generally
> increasing the memory consumption of guests on the host especially on
> very large systems add a boot parameter for specifying the number of
> allowed vcpus for guests.
> 
> The default will still be the current setting of 1024. The value 0 has
> the special meaning to limit the number of possible vcpus to the
> number of possible cpus of the host.
> 
> Signed-off-by: Juergen Gross <jgross@suse.com>
> ---
> V3:
> - rebase
> ---
>  Documentation/admin-guide/kernel-parameters.txt | 7 +++++++
>  arch/x86/include/asm/kvm_host.h                 | 5 ++++-
>  arch/x86/kvm/x86.c                              | 9 ++++++++-
>  3 files changed, 19 insertions(+), 2 deletions(-)
> 
> diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
> index e269c3f66ba4..409a72c2d91b 100644
> --- a/Documentation/admin-guide/kernel-parameters.txt
> +++ b/Documentation/admin-guide/kernel-parameters.txt
> @@ -2445,6 +2445,13 @@
>  			feature (tagged TLBs) on capable Intel chips.
>  			Default is 1 (enabled)
>  
> +	kvm.max_vcpus=	[KVM,X86] Set the maximum allowed numbers of vcpus per
> +			guest. The special value 0 sets the limit to the number
> +			of physical cpus possible on the host (including not
> +			yet hotplugged cpus). Higher values will result in
> +			slightly higher memory consumption per guest.
> +			Default: 1024

Rather than make this a module param, I would prefer to start with the below
patch (originally from TDX pre-enabling) and then wire up a way for userspace to
_lower_ the max on a per-VM basis, e.g. add a capability.

VMs largely fall into two categories: (1) the max number of vCPUs is known prior
to VM creation, or (2) the max number of vCPUs is unbounded (up to KVM's hard
limit), e.g. for container-style use cases where "vCPUs" are created on-demand in
response to the "guest" creating a new task.

For #1, a per-VM control lets userspace lower the limit to the bare minimum.  For
#2, neither the module param nor the per-VM control is likely to be useful, but
a per-VM control does let mixed environments (both #1 and #2 VMs) lower the limits
for compatible VMs, whereas a module param must be set to the max of any potential VM.
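
For illustration, a hypothetical userspace flow for such a per-VM
control could look like the sketch below. KVM_CHECK_EXTENSION on a VM
fd and KVM_CAP_MAX_VCPUS are existing ABI; using KVM_ENABLE_CAP to
lower the limit is the to-be-added interface and is purely an
assumption here (including reusing KVM_CAP_MAX_VCPUS as the capability
name):

#include <string.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

/* Returns 0 on success, negative on error (sketch only). */
static int limit_vm_vcpus(int vm_fd, unsigned long long wanted)
{
        struct kvm_enable_cap cap;
        int max = ioctl(vm_fd, KVM_CHECK_EXTENSION, KVM_CAP_MAX_VCPUS);

        if (max <= 0 || wanted > (unsigned long long)max)
                return -1;

        memset(&cap, 0, sizeof(cap));
        cap.cap = KVM_CAP_MAX_VCPUS;    /* assumed per-VM capability */
        cap.args[0] = wanted;
        return ioctl(vm_fd, KVM_ENABLE_CAP, &cap);
}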

From 0593cb4f73a6c3f0862f9411f0e14f00671f59ae Mon Sep 17 00:00:00 2001
From: Sean Christopherson <sean.j.christopherson@intel.com>
Date: Fri, 2 Jul 2021 15:04:27 -0700
Subject: [PATCH] KVM: Add max_vcpus field in common 'struct kvm'

Move arm's per-VM max_vcpus field into the generic "struct kvm", and use
it to check vcpus_created in the generic code instead of checking only
the hardcoded absolute KVM-wide max.  x86 TDX guests will reuse the
generic check verbatim, as the max number of vCPUs for a TDX guest is
user defined at VM creation and immutable thereafter.

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/arm64/include/asm/kvm_host.h | 3 ---
 arch/arm64/kvm/arm.c              | 7 ++-----
 arch/arm64/kvm/vgic/vgic-init.c   | 6 +++---
 include/linux/kvm_host.h          | 1 +
 virt/kvm/kvm_main.c               | 3 ++-
 5 files changed, 8 insertions(+), 12 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index 4be8486042a7..b51e1aa6ae27 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -108,9 +108,6 @@ struct kvm_arch {
 	/* VTCR_EL2 value for this VM */
 	u64    vtcr;

-	/* The maximum number of vCPUs depends on the used GIC model */
-	int max_vcpus;
-
 	/* Interrupt controller */
 	struct vgic_dist	vgic;

diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index f5490afe1ebf..97c3b83235b4 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -153,7 +153,7 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
 	kvm_vgic_early_init(kvm);

 	/* The maximum number of VCPUs is limited by the host's GIC model */
-	kvm->arch.max_vcpus = kvm_arm_default_max_vcpus();
+	kvm->max_vcpus = kvm_arm_default_max_vcpus();

 	set_default_spectre(kvm);

@@ -228,7 +228,7 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
 	case KVM_CAP_MAX_VCPUS:
 	case KVM_CAP_MAX_VCPU_ID:
 		if (kvm)
-			r = kvm->arch.max_vcpus;
+			r = kvm->max_vcpus;
 		else
 			r = kvm_arm_default_max_vcpus();
 		break;
@@ -304,9 +304,6 @@ int kvm_arch_vcpu_precreate(struct kvm *kvm, unsigned int id)
 	if (irqchip_in_kernel(kvm) && vgic_initialized(kvm))
 		return -EBUSY;

-	if (id >= kvm->arch.max_vcpus)
-		return -EINVAL;
-
 	return 0;
 }

diff --git a/arch/arm64/kvm/vgic/vgic-init.c b/arch/arm64/kvm/vgic/vgic-init.c
index 0a06d0648970..906aee52f2bc 100644
--- a/arch/arm64/kvm/vgic/vgic-init.c
+++ b/arch/arm64/kvm/vgic/vgic-init.c
@@ -97,11 +97,11 @@ int kvm_vgic_create(struct kvm *kvm, u32 type)
 	ret = 0;

 	if (type == KVM_DEV_TYPE_ARM_VGIC_V2)
-		kvm->arch.max_vcpus = VGIC_V2_MAX_CPUS;
+		kvm->max_vcpus = VGIC_V2_MAX_CPUS;
 	else
-		kvm->arch.max_vcpus = VGIC_V3_MAX_CPUS;
+		kvm->max_vcpus = VGIC_V3_MAX_CPUS;

-	if (atomic_read(&kvm->online_vcpus) > kvm->arch.max_vcpus) {
+	if (atomic_read(&kvm->online_vcpus) > kvm->max_vcpus) {
 		ret = -E2BIG;
 		goto out_unlock;
 	}
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 60a35d9fe259..5f56516e2f5a 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -566,6 +566,7 @@ struct kvm {
 	 * and is accessed atomically.
 	 */
 	atomic_t online_vcpus;
+	int max_vcpus;
 	int created_vcpus;
 	int last_boosted_vcpu;
 	struct list_head vm_list;
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 3f6d450355f0..e509b963651c 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -1052,6 +1052,7 @@ static struct kvm *kvm_create_vm(unsigned long type)
 	rcuwait_init(&kvm->mn_memslots_update_rcuwait);

 	INIT_LIST_HEAD(&kvm->devices);
+	kvm->max_vcpus = KVM_MAX_VCPUS;

 	BUILD_BUG_ON(KVM_MEM_SLOTS_NUM > SHRT_MAX);

@@ -3599,7 +3600,7 @@ static int kvm_vm_ioctl_create_vcpu(struct kvm *kvm, u32 id)
 		return -EINVAL;

 	mutex_lock(&kvm->lock);
-	if (kvm->created_vcpus == KVM_MAX_VCPUS) {
+	if (kvm->created_vcpus >= kvm->max_vcpus) {
 		mutex_unlock(&kvm->lock);
 		return -EINVAL;
 	}
--
2.34.0.rc1.387.gb447b232ab-goog

^ permalink raw reply related	[flat|nested] 24+ messages in thread

* Re: [PATCH v3 1/4] x86/kvm: add boot parameter for adding vcpu-id bits
  2021-11-16 14:10 ` [PATCH v3 1/4] x86/kvm: add boot parameter for adding vcpu-id bits Juergen Gross
  2021-11-17  6:59   ` Juergen Gross
@ 2021-11-17 23:44   ` Sean Christopherson
  2021-11-18  7:44     ` Juergen Gross
  1 sibling, 1 reply; 24+ messages in thread
From: Sean Christopherson @ 2021-11-17 23:44 UTC (permalink / raw)
  To: Juergen Gross
  Cc: kvm, x86, linux-doc, linux-kernel, Jonathan Corbet,
	Paolo Bonzini, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	Dave Hansen, H. Peter Anvin

On Tue, Nov 16, 2021, Juergen Gross wrote:
> diff --git a/arch/x86/kvm/ioapic.c b/arch/x86/kvm/ioapic.c
> index 816a82515dcd..64ba9b1c8b3d 100644
> --- a/arch/x86/kvm/ioapic.c
> +++ b/arch/x86/kvm/ioapic.c
> @@ -685,11 +685,21 @@ static const struct kvm_io_device_ops ioapic_mmio_ops = {
>  int kvm_ioapic_init(struct kvm *kvm)
>  {
>  	struct kvm_ioapic *ioapic;
> +	size_t sz;
>  	int ret;
>  
> -	ioapic = kzalloc(sizeof(struct kvm_ioapic), GFP_KERNEL_ACCOUNT);
> +	sz = sizeof(struct kvm_ioapic) +
> +	     sizeof(*ioapic->rtc_status.dest_map.map) *
> +		    BITS_TO_LONGS(KVM_MAX_VCPU_IDS) +
> +	     sizeof(*ioapic->rtc_status.dest_map.vectors) *
> +		    (KVM_MAX_VCPU_IDS);
> +	ioapic = kzalloc(sz, GFP_KERNEL_ACCOUNT);
>  	if (!ioapic)
>  		return -ENOMEM;
> +	ioapic->rtc_status.dest_map.map = (void *)(ioapic + 1);

Oof.  Just do separate allocations.  I highly doubt the performance of the
emulated RTC hinges on the spatial locality of the bitmap and array.  The array
is going to end up in a second page for most configurations anyway.
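
E.g. (sketch only, NULL checks and freeing on failure omitted):

        ioapic = kzalloc(sizeof(*ioapic), GFP_KERNEL_ACCOUNT);
        ioapic->rtc_status.dest_map.map =
                bitmap_zalloc(KVM_MAX_VCPU_IDS, GFP_KERNEL_ACCOUNT);
        ioapic->rtc_status.dest_map.vectors =
                kcalloc(KVM_MAX_VCPU_IDS,
                        sizeof(*ioapic->rtc_status.dest_map.vectors),
                        GFP_KERNEL_ACCOUNT);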

> +	ioapic->rtc_status.dest_map.vectors =
> +		(void *)(ioapic->rtc_status.dest_map.map +
> +			 BITS_TO_LONGS(KVM_MAX_VCPU_IDS));
>  	spin_lock_init(&ioapic->lock);
>  	INIT_DELAYED_WORK(&ioapic->eoi_inject, kvm_ioapic_eoi_inject_work);
>  	kvm->arch.vioapic = ioapic;

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v3 1/4] x86/kvm: add boot parameter for adding vcpu-id bits
  2021-11-17  6:59   ` Juergen Gross
@ 2021-11-17 23:46     ` Sean Christopherson
  2021-11-18  7:45       ` Juergen Gross
  0 siblings, 1 reply; 24+ messages in thread
From: Sean Christopherson @ 2021-11-17 23:46 UTC (permalink / raw)
  To: Juergen Gross
  Cc: kvm, x86, linux-doc, linux-kernel, Jonathan Corbet,
	Paolo Bonzini, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	Dave Hansen, H. Peter Anvin

On Wed, Nov 17, 2021, Juergen Gross wrote:
> On 16.11.21 15:10, Juergen Gross wrote:
> > Today the maximum vcpu-id of a kvm guest's vcpu on x86 systems is set
> > via a #define in a header file.
> > 
> > In order to support higher vcpu-ids without generally increasing the
> > memory consumption of guests on the host (some guest structures contain
> > arrays sized by KVM_MAX_VCPU_IDS) add a boot parameter for adding some
> > bits to the vcpu-id. Additional bits are needed as the vcpu-id is
> > constructed via bit-wise concatenation of socket-id, core-id, etc.
> > As those ids maximum values are not always a power of 2, the vcpu-ids
> > are sparse.
> > 
> > The additional number of bits needed is basically the number of
> > topology levels with a non-power-of-2 maximum value, excluding the top
> > most level.
> > 
> > The default value of the new parameter will be 2 in order to support
> > today's possible topologies. The special value of -1 will use the
> > number of bits needed for a guest with the current host's topology.
> > 
> > Calculating the maximum vcpu-id dynamically requires to allocate the
> > arrays using KVM_MAX_VCPU_IDS as the size dynamically.
> > 
> > Signed-of-by: Juergen Gross <jgross@suse.com>
> 
> Just thought about vcpu-ids a little bit more.
> 
> It would be possible to replace the topology games completely by an
> arbitrary rather high vcpu-id limit (65536?) and to allocate the memory
> depending on the max vcpu-id just as needed.
> 
> Right now the only vcpu-id dependent memory is for the ioapic consisting
> of a vcpu-id indexed bitmap and a vcpu-id indexed byte array (vectors).
> 
> We could start with a minimal size when setting up an ioapic and extend
> the areas in case a new vcpu created would introduce a vcpu-id outside
> the currently allocated memory. Both arrays are protected by the ioapic
> specific lock (at least I couldn't spot any unprotected usage when
> looking briefly into the code), so reallocating those arrays shouldn't
> be hard. In case of ENOMEM the related vcpu creation would just fail.
> 
> Thoughts?

Why not have userspace state the max vcpu_id it intends to create on a per-VM
basis?  Same end result, but doesn't require the complexity of reallocating the
I/O APIC stuff.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v3 4/4] x86/kvm: add boot parameter for setting max number of vcpus per guest
  2021-11-17 20:57   ` Sean Christopherson
@ 2021-11-18  7:16     ` Juergen Gross
  2021-11-18 15:05       ` Sean Christopherson
  0 siblings, 1 reply; 24+ messages in thread
From: Juergen Gross @ 2021-11-18  7:16 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: kvm, x86, linux-doc, linux-kernel, Jonathan Corbet,
	Paolo Bonzini, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	Dave Hansen, H. Peter Anvin



On 17.11.21 21:57, Sean Christopherson wrote:
> On Tue, Nov 16, 2021, Juergen Gross wrote:
>> Today the maximum number of vcpus of a kvm guest is set via a #define
>> in a header file.
>>
>> In order to support higher vcpu numbers for guests without generally
>> increasing the memory consumption of guests on the host especially on
>> very large systems add a boot parameter for specifying the number of
>> allowed vcpus for guests.
>>
>> The default will still be the current setting of 1024. The value 0 has
>> the special meaning to limit the number of possible vcpus to the
>> number of possible cpus of the host.
>>
>> Signed-off-by: Juergen Gross <jgross@suse.com>
>> ---
>> V3:
>> - rebase
>> ---
>>   Documentation/admin-guide/kernel-parameters.txt | 7 +++++++
>>   arch/x86/include/asm/kvm_host.h                 | 5 ++++-
>>   arch/x86/kvm/x86.c                              | 9 ++++++++-
>>   3 files changed, 19 insertions(+), 2 deletions(-)
>>
>> diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
>> index e269c3f66ba4..409a72c2d91b 100644
>> --- a/Documentation/admin-guide/kernel-parameters.txt
>> +++ b/Documentation/admin-guide/kernel-parameters.txt
>> @@ -2445,6 +2445,13 @@
>>   			feature (tagged TLBs) on capable Intel chips.
>>   			Default is 1 (enabled)
>>   
>> +	kvm.max_vcpus=	[KVM,X86] Set the maximum allowed numbers of vcpus per
>> +			guest. The special value 0 sets the limit to the number
>> +			of physical cpus possible on the host (including not
>> +			yet hotplugged cpus). Higher values will result in
>> +			slightly higher memory consumption per guest.
>> +			Default: 1024
> 
> Rather than makes this a module param, I would prefer to start with the below
> patch (originally from TDX pre-enabling) and then wire up a way for userspace to
> _lower_ the max on a per-VM basis, e.g. add a capability.
> 
> VMs largely fall into two categories: (1) the max number of vCPUs is known prior
> to VM creation, or (2) the max number of vCPUs is unbounded (up to KVM's hard
> limit), e.g. for container-style use cases where "vCPUs" are created on-demand in
> response to the "guest" creating a new task.
> 
> For #1, a per-VM control lets userspace lower the limit to the bare minimum.  For
> #2, neither the module param nor the per-VM control is likely to be useful, but
> a per-VM control does let mixed environments (both #1 and #2 VMs) lower the limits
> for compatible VMs, whereas a module param must be set to the max of any potential VM.

The main reason for this whole series is a request by a partner
to enable huge VMs on huge machines (huge meaning thousands of
vcpus on thousands of physical cpus).

Making this large number a compile time setting would hurt all
the users who have more standard requirements by allocating the
needed resources even on small systems, so I've switched to a boot
parameter in order to enable those huge numbers only when required.

With Marc's series using an xarray for the vcpu pointers, only the
bitmaps for sending IRQs to vcpus are left which need to be sized
according to the max vcpu limit. Your patch below seems to be fine, but
doesn't help for that case.


Juergen

> 
>  From 0593cb4f73a6c3f0862f9411f0e14f00671f59ae Mon Sep 17 00:00:00 2001
> From: Sean Christopherson <sean.j.christopherson@intel.com>
> Date: Fri, 2 Jul 2021 15:04:27 -0700
> Subject: [PATCH] KVM: Add max_vcpus field in common 'struct kvm'
> 
> Move arm's per-VM max_vcpus field into the generic "struct kvm", and use
> it to check vcpus_created in the generic code instead of checking only
> the hardcoded absolute KVM-wide max.  x86 TDX guests will reuse the
> generic check verbatim, as the max number of vCPUs for a TDX guest is
> user defined at VM creation and immutable thereafter.
> 
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> ---
>   arch/arm64/include/asm/kvm_host.h | 3 ---
>   arch/arm64/kvm/arm.c              | 7 ++-----
>   arch/arm64/kvm/vgic/vgic-init.c   | 6 +++---
>   include/linux/kvm_host.h          | 1 +
>   virt/kvm/kvm_main.c               | 3 ++-
>   5 files changed, 8 insertions(+), 12 deletions(-)
> 
> diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
> index 4be8486042a7..b51e1aa6ae27 100644
> --- a/arch/arm64/include/asm/kvm_host.h
> +++ b/arch/arm64/include/asm/kvm_host.h
> @@ -108,9 +108,6 @@ struct kvm_arch {
>   	/* VTCR_EL2 value for this VM */
>   	u64    vtcr;
> 
> -	/* The maximum number of vCPUs depends on the used GIC model */
> -	int max_vcpus;
> -
>   	/* Interrupt controller */
>   	struct vgic_dist	vgic;
> 
> diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
> index f5490afe1ebf..97c3b83235b4 100644
> --- a/arch/arm64/kvm/arm.c
> +++ b/arch/arm64/kvm/arm.c
> @@ -153,7 +153,7 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
>   	kvm_vgic_early_init(kvm);
> 
>   	/* The maximum number of VCPUs is limited by the host's GIC model */
> -	kvm->arch.max_vcpus = kvm_arm_default_max_vcpus();
> +	kvm->max_vcpus = kvm_arm_default_max_vcpus();
> 
>   	set_default_spectre(kvm);
> 
> @@ -228,7 +228,7 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
>   	case KVM_CAP_MAX_VCPUS:
>   	case KVM_CAP_MAX_VCPU_ID:
>   		if (kvm)
> -			r = kvm->arch.max_vcpus;
> +			r = kvm->max_vcpus;
>   		else
>   			r = kvm_arm_default_max_vcpus();
>   		break;
> @@ -304,9 +304,6 @@ int kvm_arch_vcpu_precreate(struct kvm *kvm, unsigned int id)
>   	if (irqchip_in_kernel(kvm) && vgic_initialized(kvm))
>   		return -EBUSY;
> 
> -	if (id >= kvm->arch.max_vcpus)
> -		return -EINVAL;
> -
>   	return 0;
>   }
> 
> diff --git a/arch/arm64/kvm/vgic/vgic-init.c b/arch/arm64/kvm/vgic/vgic-init.c
> index 0a06d0648970..906aee52f2bc 100644
> --- a/arch/arm64/kvm/vgic/vgic-init.c
> +++ b/arch/arm64/kvm/vgic/vgic-init.c
> @@ -97,11 +97,11 @@ int kvm_vgic_create(struct kvm *kvm, u32 type)
>   	ret = 0;
> 
>   	if (type == KVM_DEV_TYPE_ARM_VGIC_V2)
> -		kvm->arch.max_vcpus = VGIC_V2_MAX_CPUS;
> +		kvm->max_vcpus = VGIC_V2_MAX_CPUS;
>   	else
> -		kvm->arch.max_vcpus = VGIC_V3_MAX_CPUS;
> +		kvm->max_vcpus = VGIC_V3_MAX_CPUS;
> 
> -	if (atomic_read(&kvm->online_vcpus) > kvm->arch.max_vcpus) {
> +	if (atomic_read(&kvm->online_vcpus) > kvm->max_vcpus) {
>   		ret = -E2BIG;
>   		goto out_unlock;
>   	}
> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> index 60a35d9fe259..5f56516e2f5a 100644
> --- a/include/linux/kvm_host.h
> +++ b/include/linux/kvm_host.h
> @@ -566,6 +566,7 @@ struct kvm {
>   	 * and is accessed atomically.
>   	 */
>   	atomic_t online_vcpus;
> +	int max_vcpus;
>   	int created_vcpus;
>   	int last_boosted_vcpu;
>   	struct list_head vm_list;
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index 3f6d450355f0..e509b963651c 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -1052,6 +1052,7 @@ static struct kvm *kvm_create_vm(unsigned long type)
>   	rcuwait_init(&kvm->mn_memslots_update_rcuwait);
> 
>   	INIT_LIST_HEAD(&kvm->devices);
> +	kvm->max_vcpus = KVM_MAX_VCPUS;
> 
>   	BUILD_BUG_ON(KVM_MEM_SLOTS_NUM > SHRT_MAX);
> 
> @@ -3599,7 +3600,7 @@ static int kvm_vm_ioctl_create_vcpu(struct kvm *kvm, u32 id)
>   		return -EINVAL;
> 
>   	mutex_lock(&kvm->lock);
> -	if (kvm->created_vcpus == KVM_MAX_VCPUS) {
> +	if (kvm->created_vcpus >= kvm->max_vcpus) {
>   		mutex_unlock(&kvm->lock);
>   		return -EINVAL;
>   	}
> --
> 2.34.0.rc1.387.gb447b232ab-goog
> 



^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v3 3/4] x86/kvm: add max number of vcpus for hyperv emulation
  2021-11-17 20:50   ` Sean Christopherson
@ 2021-11-18  7:43     ` Juergen Gross
  2021-11-18 14:49       ` Sean Christopherson
  0 siblings, 1 reply; 24+ messages in thread
From: Juergen Gross @ 2021-11-18  7:43 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: kvm, x86, linux-kernel, Paolo Bonzini, Vitaly Kuznetsov,
	Wanpeng Li, Jim Mattson, Joerg Roedel, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, Dave Hansen, H. Peter Anvin



On 17.11.21 21:50, Sean Christopherson wrote:
> On Tue, Nov 16, 2021, Juergen Gross wrote:
>> When emulating Hyperv the theoretical maximum of vcpus supported is
>> 4096, as this is the architectural limit for sending IPIs via the PV
>> interface.
>>
>> For restricting the actual supported number of vcpus for that case
>> introduce another define KVM_MAX_HYPERV_VCPUS and set it to 1024, like
>> today's KVM_MAX_VCPUS. Make both values unsigned ones as this will be
>> needed later.
>>
>> The actual number of supported vcpus for Hyperv emulation will be the
>> lower value of both defines.
>>
>> This is a preparation for a future boot parameter support of the max
>> number of vcpus for a KVM guest.
>>
>> Signed-off-by: Juergen Gross <jgross@suse.com>
>> ---
>> V3:
>> - new patch
>> ---
>>   arch/x86/include/asm/kvm_host.h |  3 ++-
>>   arch/x86/kvm/hyperv.c           | 15 ++++++++-------
>>   2 files changed, 10 insertions(+), 8 deletions(-)
>>
>> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
>> index 886930ec8264..8ea03ff01c45 100644
>> --- a/arch/x86/include/asm/kvm_host.h
>> +++ b/arch/x86/include/asm/kvm_host.h
>> @@ -38,7 +38,8 @@
>>   
>>   #define __KVM_HAVE_ARCH_VCPU_DEBUGFS
>>   
>> -#define KVM_MAX_VCPUS 1024
>> +#define KVM_MAX_VCPUS 1024U
>> +#define KVM_MAX_HYPERV_VCPUS 1024U
> 
> I don't see any reason to put this in kvm_host.h, it should never be used outside
> of hyperv.c.

Okay, fine with me.

> 
>>   #define KVM_MAX_VCPU_IDS kvm_max_vcpu_ids()
>>   /* memory slots that are not exposed to userspace */
>>   #define KVM_PRIVATE_MEM_SLOTS 3
>> diff --git a/arch/x86/kvm/hyperv.c b/arch/x86/kvm/hyperv.c
>> index 4a555f32885a..c0fa837121f1 100644
>> --- a/arch/x86/kvm/hyperv.c
>> +++ b/arch/x86/kvm/hyperv.c
>> @@ -41,7 +41,7 @@
>>   /* "Hv#1" signature */
>>   #define HYPERV_CPUID_SIGNATURE_EAX 0x31237648
>>   
>> -#define KVM_HV_MAX_SPARSE_VCPU_SET_BITS DIV_ROUND_UP(KVM_MAX_VCPUS, 64)
>> +#define KVM_HV_MAX_SPARSE_VCPU_SET_BITS DIV_ROUND_UP(KVM_MAX_HYPERV_VCPUS, 64)
>>   
>>   static void stimer_mark_pending(struct kvm_vcpu_hv_stimer *stimer,
>>   				bool vcpu_kick);
>> @@ -166,7 +166,7 @@ static struct kvm_vcpu *get_vcpu_by_vpidx(struct kvm *kvm, u32 vpidx)
>>   	struct kvm_vcpu *vcpu = NULL;
>>   	int i;
>>   
>> -	if (vpidx >= KVM_MAX_VCPUS)
>> +	if (vpidx >= min(KVM_MAX_VCPUS, KVM_MAX_HYPERV_VCPUS))
> 
> IMO, this is conceptually wrong.  KVM should refuse to allow Hyper-V to be enabled
> if the max number of vCPUs exceeds what can be supported, or should refuse to create

TBH, I wasn't sure where to put this test. Is there a guaranteed
sequence of ioctl()s regarding vcpu creation (or setting the max
number of vcpus) and the Hyper-V enabling?

> the vCPUs.  I agree it makes sense to add a Hyper-V specific limit, since there are
> Hyper-V structures that have a hard limit, but detection of violations should be a
> BUILD_BUG_ON, not a silent failure at runtime.
> 

A BUILD_BUG_ON won't be possible with KVM_MAX_VCPUS being selectable
via a boot parameter.


Juergen


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v3 1/4] x86/kvm: add boot parameter for adding vcpu-id bits
  2021-11-17 23:44   ` Sean Christopherson
@ 2021-11-18  7:44     ` Juergen Gross
  0 siblings, 0 replies; 24+ messages in thread
From: Juergen Gross @ 2021-11-18  7:44 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: kvm, x86, linux-doc, linux-kernel, Jonathan Corbet,
	Paolo Bonzini, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	Dave Hansen, H. Peter Anvin


On 18.11.21 00:44, Sean Christopherson wrote:
> On Tue, Nov 16, 2021, Juergen Gross wrote:
>> diff --git a/arch/x86/kvm/ioapic.c b/arch/x86/kvm/ioapic.c
>> index 816a82515dcd..64ba9b1c8b3d 100644
>> --- a/arch/x86/kvm/ioapic.c
>> +++ b/arch/x86/kvm/ioapic.c
>> @@ -685,11 +685,21 @@ static const struct kvm_io_device_ops ioapic_mmio_ops = {
>>   int kvm_ioapic_init(struct kvm *kvm)
>>   {
>>   	struct kvm_ioapic *ioapic;
>> +	size_t sz;
>>   	int ret;
>>   
>> -	ioapic = kzalloc(sizeof(struct kvm_ioapic), GFP_KERNEL_ACCOUNT);
>> +	sz = sizeof(struct kvm_ioapic) +
>> +	     sizeof(*ioapic->rtc_status.dest_map.map) *
>> +		    BITS_TO_LONGS(KVM_MAX_VCPU_IDS) +
>> +	     sizeof(*ioapic->rtc_status.dest_map.vectors) *
>> +		    (KVM_MAX_VCPU_IDS);
>> +	ioapic = kzalloc(sz, GFP_KERNEL_ACCOUNT);
>>   	if (!ioapic)
>>   		return -ENOMEM;
>> +	ioapic->rtc_status.dest_map.map = (void *)(ioapic + 1);
> 
> Oof.  Just do separate allocations.  I highly doubt the performance of the
> emulated RTC hinges on the spatial locality of the bitmap and array.  The array
> is going to end up in a second page for most configurations anyway.

Okay, fine with me.
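
Roughly, the separate-allocation variant could look like the sketch
below (the field names follow the diff above, the error handling is
illustrative only):

	ioapic = kzalloc(sizeof(*ioapic), GFP_KERNEL_ACCOUNT);
	if (!ioapic)
		return -ENOMEM;

	/* bitmap of vcpu-ids the RTC interrupt was sent to */
	ioapic->rtc_status.dest_map.map =
		kcalloc(BITS_TO_LONGS(KVM_MAX_VCPU_IDS),
			sizeof(*ioapic->rtc_status.dest_map.map),
			GFP_KERNEL_ACCOUNT);
	/* vector delivered to each vcpu-id */
	ioapic->rtc_status.dest_map.vectors =
		kcalloc(KVM_MAX_VCPU_IDS,
			sizeof(*ioapic->rtc_status.dest_map.vectors),
			GFP_KERNEL_ACCOUNT);
	if (!ioapic->rtc_status.dest_map.map ||
	    !ioapic->rtc_status.dest_map.vectors) {
		kfree(ioapic->rtc_status.dest_map.map);
		kfree(ioapic->rtc_status.dest_map.vectors);
		kfree(ioapic);
		return -ENOMEM;
	}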


Juergen


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v3 1/4] x86/kvm: add boot parameter for adding vcpu-id bits
  2021-11-17 23:46     ` Sean Christopherson
@ 2021-11-18  7:45       ` Juergen Gross
  2021-11-18 15:09         ` Sean Christopherson
  0 siblings, 1 reply; 24+ messages in thread
From: Juergen Gross @ 2021-11-18  7:45 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: kvm, x86, linux-doc, linux-kernel, Jonathan Corbet,
	Paolo Bonzini, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	Dave Hansen, H. Peter Anvin


On 18.11.21 00:46, Sean Christopherson wrote:
> On Wed, Nov 17, 2021, Juergen Gross wrote:
>> On 16.11.21 15:10, Juergen Gross wrote:
>>> Today the maximum vcpu-id of a kvm guest's vcpu on x86 systems is set
>>> via a #define in a header file.
>>>
>>> In order to support higher vcpu-ids without generally increasing the
>>> memory consumption of guests on the host (some guest structures contain
>>> arrays sized by KVM_MAX_VCPU_IDS) add a boot parameter for adding some
>>> bits to the vcpu-id. Additional bits are needed as the vcpu-id is
>>> constructed via bit-wise concatenation of socket-id, core-id, etc.
>>> As those ids' maximum values are not always a power of 2, the vcpu-ids
>>> are sparse.
>>>
>>> The additional number of bits needed is basically the number of
>>> topology levels with a non-power-of-2 maximum value, excluding the top
>>> most level.
>>>
>>> The default value of the new parameter will be 2 in order to support
>>> today's possible topologies. The special value of -1 will use the
>>> number of bits needed for a guest with the current host's topology.
>>>
>>> Calculating the maximum vcpu-id dynamically requires allocating the
>>> arrays dynamically, using KVM_MAX_VCPU_IDS as the size.
>>>
>>> Signed-off-by: Juergen Gross <jgross@suse.com>
>>
>> Just thought about vcpu-ids a little bit more.
>>
>> It would be possible to replace the topology games completely by an
>> arbitrary rather high vcpu-id limit (65536?) and to allocate the memory
>> depending on the max vcpu-id just as needed.
>>
>> Right now the only vcpu-id dependent memory is for the ioapic consisting
>> of a vcpu-id indexed bitmap and a vcpu-id indexed byte array (vectors).
>>
>> We could start with a minimal size when setting up an ioapic and extend
>> the areas in case a newly created vcpu would introduce a vcpu-id outside
>> the currently allocated memory. Both arrays are protected by the ioapic
>> specific lock (at least I couldn't spot any unprotected usage when
>> looking briefly into the code), so reallocating those arrays shouldn't
>> be hard. In case of ENOMEM the related vcpu creation would just fail.
>>
>> Thoughts?
> 
> Why not have userspace state the max vcpu_id it intends to create on a per-VM
> basis?  Same end result, but doesn't require the complexity of reallocating the
> I/O APIC stuff.
> 

And if the userspace doesn't do it (like today)?


Juergen


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v3 3/4] x86/kvm: add max number of vcpus for hyperv emulation
  2021-11-18  7:43     ` Juergen Gross
@ 2021-11-18 14:49       ` Sean Christopherson
  2021-11-18 15:24         ` Juergen Gross
  0 siblings, 1 reply; 24+ messages in thread
From: Sean Christopherson @ 2021-11-18 14:49 UTC (permalink / raw)
  To: Juergen Gross
  Cc: kvm, x86, linux-kernel, Paolo Bonzini, Vitaly Kuznetsov,
	Wanpeng Li, Jim Mattson, Joerg Roedel, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, Dave Hansen, H. Peter Anvin

On Thu, Nov 18, 2021, Juergen Gross wrote:
> On 17.11.21 21:50, Sean Christopherson wrote:
> > > @@ -166,7 +166,7 @@ static struct kvm_vcpu *get_vcpu_by_vpidx(struct kvm *kvm, u32 vpidx)
> > >   	struct kvm_vcpu *vcpu = NULL;
> > >   	int i;
> > > -	if (vpidx >= KVM_MAX_VCPUS)
> > > +	if (vpidx >= min(KVM_MAX_VCPUS, KVM_MAX_HYPERV_VCPUS))
> > 
> > IMO, this is conceptually wrong.  KVM should refuse to allow Hyper-V to be enabled
> > if the max number of vCPUs exceeds what can be supported, or should refuse to create
> 
> TBH, I wasn't sure where to put this test. Is there a guaranteed
> sequence of ioctl()s regarding vcpu creation (or setting the max
> number of vcpus) and the Hyper-V enabling?

For better or worse (mostly worse), like all other things CPUID, Hyper-V is a per-vCPU
knob.  If KVM can't detect the impossible condition at compile time, kvm_check_cpuid()
is probably the right place to prevent enabling Hyper-V on an unreachable vCPU.

> > the vCPUs.  I agree it makes sense to add a Hyper-V specific limit, since there are
> > Hyper-V structures that have a hard limit, but detection of violations should be a
> > BUILD_BUG_ON, not a silent failure at runtime.
> > 
> 
> > A BUILD_BUG_ON won't be possible with KVM_MAX_VCPUS being selectable via
> boot parameter.

I was thinking that there would still be a KVM-defined max that would cap whatever
comes in from userspace.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v3 4/4] x86/kvm: add boot parameter for setting max number of vcpus per guest
  2021-11-18  7:16     ` Juergen Gross
@ 2021-11-18 15:05       ` Sean Christopherson
  2021-11-18 15:15         ` Juergen Gross
  0 siblings, 1 reply; 24+ messages in thread
From: Sean Christopherson @ 2021-11-18 15:05 UTC (permalink / raw)
  To: Juergen Gross
  Cc: kvm, x86, linux-doc, linux-kernel, Jonathan Corbet,
	Paolo Bonzini, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	Dave Hansen, H. Peter Anvin

On Thu, Nov 18, 2021, Juergen Gross wrote:
> On 17.11.21 21:57, Sean Christopherson wrote:
> > Rather than makes this a module param, I would prefer to start with the below
> > patch (originally from TDX pre-enabling) and then wire up a way for userspace to
> > _lower_ the max on a per-VM basis, e.g. add a capability.
>
> The main reason for this whole series is a request by a partner
> to enable huge VMs on huge machines (huge meaning thousands of
> vcpus on thousands of physical cpus).
> 
> Making this large number a compile time setting would hurt all
> the users who have more standard requirements by allocating the
> needed resources even on small systems, so I've switched to a boot
> parameter in order to enable those huge numbers only when required.
> 
> With Marc's series to use an xarray for the vcpu pointers, only the
> bitmaps for sending IRQs to vcpus are left, which need to be sized
> according to the max vcpu limit. Your patch below seems to be fine, but
> doesn't help for that case.

Ah, you want to let userspace define a MAX_VCPUS that goes well beyond the current
limit without negatively impacting existing setups.  My idea of a per-VM capability
still works; it would simply require separating the default max from the absolute
max, which this patch mostly does already; it just neglects to set an absolute max.

Which is a good segue into pointing out that if a module param is added, it needs
to be sanity checked against a KVM-defined max.  The admin may be trusted to some
extent, but there is zero reason to let userspace set max_vcpus to 4 billion.
At that point, it really is just a param vs. capability question.

I like the idea of a capability because there are already two known use cases,
arm64's GIC and x86's TDX, and it could also be used to reduce the kernel's footprint
for use cases that run large numbers of smaller VMs.

The other alternative would be to turn KVM_MAX_VCPUS into a Kconfig knob.  I assume
the partner isn't running a vanilla distro build and could set it as they see fit.
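
A rough sketch of such a per-VM capability, as a fragment of
kvm_vm_ioctl_enable_cap() (KVM_CAP_MAX_VCPUS_PER_VM, kvm->max_vcpus and
KVM_ABSOLUTE_MAX_VCPUS are made-up names for illustration, not existing
API):

	case KVM_CAP_MAX_VCPUS_PER_VM: {
		u32 new_max = cap->args[0];

		r = -EINVAL;
		if (!new_max || new_max > KVM_ABSOLUTE_MAX_VCPUS)
			break;

		mutex_lock(&kvm->lock);
		if (kvm->created_vcpus) {
			/* too late once vCPUs have been created */
			r = -EBUSY;
		} else {
			kvm->max_vcpus = new_max;
			r = 0;
		}
		mutex_unlock(&kvm->lock);
		break;
	}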

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v3 1/4] x86/kvm: add boot parameter for adding vcpu-id bits
  2021-11-18  7:45       ` Juergen Gross
@ 2021-11-18 15:09         ` Sean Christopherson
  2021-11-18 15:19           ` Juergen Gross
  0 siblings, 1 reply; 24+ messages in thread
From: Sean Christopherson @ 2021-11-18 15:09 UTC (permalink / raw)
  To: Juergen Gross
  Cc: kvm, x86, linux-doc, linux-kernel, Jonathan Corbet,
	Paolo Bonzini, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	Dave Hansen, H. Peter Anvin

On Thu, Nov 18, 2021, Juergen Gross wrote:
> On 18.11.21 00:46, Sean Christopherson wrote:
> > On Wed, Nov 17, 2021, Juergen Gross wrote:
> > > On 16.11.21 15:10, Juergen Gross wrote:
> > > > Today the maximum vcpu-id of a kvm guest's vcpu on x86 systems is set
> > > > via a #define in a header file.
> > > > 
> > > > In order to support higher vcpu-ids without generally increasing the
> > > > memory consumption of guests on the host (some guest structures contain
> > > > arrays sized by KVM_MAX_VCPU_IDS) add a boot parameter for adding some
> > > > bits to the vcpu-id. Additional bits are needed as the vcpu-id is
> > > > constructed via bit-wise concatenation of socket-id, core-id, etc.
> > > > As those ids' maximum values are not always a power of 2, the vcpu-ids
> > > > are sparse.
> > > > 
> > > > The additional number of bits needed is basically the number of
> > > > topology levels with a non-power-of-2 maximum value, excluding the top
> > > > most level.
> > > > 
> > > > The default value of the new parameter will be 2 in order to support
> > > > today's possible topologies. The special value of -1 will use the
> > > > number of bits needed for a guest with the current host's topology.
> > > > 
> > > > Calculating the maximum vcpu-id dynamically requires allocating the
> > > > arrays dynamically, using KVM_MAX_VCPU_IDS as the size.
> > > > 
> > > > Signed-off-by: Juergen Gross <jgross@suse.com>
> > > 
> > > Just thought about vcpu-ids a little bit more.
> > > 
> > > It would be possible to replace the topology games completely by an
> > > arbitrary rather high vcpu-id limit (65536?) and to allocate the memory
> > > depending on the max vcpu-id just as needed.
> > > 
> > > Right now the only vcpu-id dependent memory is for the ioapic consisting
> > > of a vcpu-id indexed bitmap and a vcpu-id indexed byte array (vectors).
> > > 
> > > We could start with a minimal size when setting up an ioapic and extend
> > > the areas in case a newly created vcpu would introduce a vcpu-id outside
> > > the currently allocated memory. Both arrays are protected by the ioapic
> > > specific lock (at least I couldn't spot any unprotected usage when
> > > looking briefly into the code), so reallocating those arrays shouldn't
> > > be hard. In case of ENOMEM the related vcpu creation would just fail.
> > > 
> > > Thoughts?
> > 
> > Why not have userspace state the max vcpu_id it intends to create on a per-VM
> > basis?  Same end result, but doesn't require the complexity of reallocating the
> > I/O APIC stuff.
> > 
> 
> And if the userspace doesn't do it (like today)?

Similar to my comments in patch 4, KVM's current limits could be used as the
defaults, and any use case wanting to go beyond that would need an updated
userspace.  Exceeding those limits today doesn't work, so there's no ABI breakage
by requiring a userspace change.

Or again, this could be a Kconfig knob, though that feels a bit weird in this case.
But it might make sense if it can be tied to something in the kernel's config?

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v3 4/4] x86/kvm: add boot parameter for setting max number of vcpus per guest
  2021-11-18 15:05       ` Sean Christopherson
@ 2021-11-18 15:15         ` Juergen Gross
  2021-11-18 15:32           ` Sean Christopherson
  2021-11-18 15:46           ` Sean Christopherson
  0 siblings, 2 replies; 24+ messages in thread
From: Juergen Gross @ 2021-11-18 15:15 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: kvm, x86, linux-doc, linux-kernel, Jonathan Corbet,
	Paolo Bonzini, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	Dave Hansen, H. Peter Anvin


On 18.11.21 16:05, Sean Christopherson wrote:
> On Thu, Nov 18, 2021, Juergen Gross wrote:
>> On 17.11.21 21:57, Sean Christopherson wrote:
>>> Rather than makes this a module param, I would prefer to start with the below
>>> patch (originally from TDX pre-enabling) and then wire up a way for userspace to
>>> _lower_ the max on a per-VM basis, e.g. add a capability.
>>
>> The main reason for this whole series is a request by a partner
>> to enable huge VMs on huge machines (huge meaning thousands of
>> vcpus on thousands of physical cpus).
>>
>> Making this large number a compile time setting would hurt all
>> the users who have more standard requirements by allocating the
>> needed resources even on small systems, so I've switched to a boot
>> parameter in order to enable those huge numbers only when required.
>>
>> With Marc's series to use an xarray for the vcpu pointers, only the
>> bitmaps for sending IRQs to vcpus are left, which need to be sized
>> according to the max vcpu limit. Your patch below seems to be fine, but
>> doesn't help for that case.
> 
> Ah, you want to let userspace define a MAX_VCPUS that goes well beyond the current
> limit without negatively impacting existing setups.  My idea of a per-VM capability

Correct.

> still works; it would simply require separating the default max from the absolute
> max, which this patch mostly does already; it just neglects to set an absolute max.
> 
> Which is a good segue into pointing out that if a module param is added, it needs
> to be sanity checked against a KVM-defined max.  The admin may be trusted to some
> extent, but there is zero reason to let userspace set max_vcpus to 4 billion.
> At that point, it really is just a param vs. capability question.

I agree. Capping it at e.g. 65536 would probably be a good idea.

> I like the idea of a capability because there are already two known use cases,
> arm64's GIC and x86's TDX, and it could also be used to reduce the kernel's footprint
> for use cases that run large numbers of smaller VMs.
> 
> The other alternative would be to turn KVM_MAX_VCPUS into a Kconfig knob.  I assume

I like combining the capping and a Kconfig knob. So let the distro (or
whoever is building the kernel) decide what the maximum allowed value is
(e.g. the 65536 mentioned above as the default).
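
A sketch of how the two could combine, i.e. a boot/module parameter
whose upper bound comes from Kconfig (CONFIG_KVM_MAX_NR_VCPUS and the
variable names are assumptions for illustration only):

	/* default stays at today's limit, absolute cap comes from Kconfig */
	static unsigned int max_vcpus = KVM_MAX_VCPUS;

	static int set_max_vcpus(const char *val, const struct kernel_param *kp)
	{
		unsigned int n;
		int ret;

		ret = kstrtouint(val, 0, &n);
		if (ret)
			return ret;
		if (n < 1 || n > CONFIG_KVM_MAX_NR_VCPUS)
			return -EINVAL;
		*(unsigned int *)kp->arg = n;
		return 0;
	}

	static const struct kernel_param_ops max_vcpus_ops = {
		.set = set_max_vcpus,
		.get = param_get_uint,
	};
	module_param_cb(max_vcpus, &max_vcpus_ops, &max_vcpus, 0444);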

> the partner isn't running a vanilla distro build and could set it as they see fit.

And here you are wrong. They'd like to use standard SUSE Linux (SLE).


Juergen


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v3 1/4] x86/kvm: add boot parameter for adding vcpu-id bits
  2021-11-18 15:09         ` Sean Christopherson
@ 2021-11-18 15:19           ` Juergen Gross
  0 siblings, 0 replies; 24+ messages in thread
From: Juergen Gross @ 2021-11-18 15:19 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: kvm, x86, linux-doc, linux-kernel, Jonathan Corbet,
	Paolo Bonzini, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	Dave Hansen, H. Peter Anvin


On 18.11.21 16:09, Sean Christopherson wrote:
> On Thu, Nov 18, 2021, Juergen Gross wrote:
>> On 18.11.21 00:46, Sean Christopherson wrote:
>>> On Wed, Nov 17, 2021, Juergen Gross wrote:
>>>> On 16.11.21 15:10, Juergen Gross wrote:
>>>>> Today the maximum vcpu-id of a kvm guest's vcpu on x86 systems is set
>>>>> via a #define in a header file.
>>>>>
>>>>> In order to support higher vcpu-ids without generally increasing the
>>>>> memory consumption of guests on the host (some guest structures contain
>>>>> arrays sized by KVM_MAX_VCPU_IDS) add a boot parameter for adding some
>>>>> bits to the vcpu-id. Additional bits are needed as the vcpu-id is
>>>>> constructed via bit-wise concatenation of socket-id, core-id, etc.
>>>>> As those ids' maximum values are not always a power of 2, the vcpu-ids
>>>>> are sparse.
>>>>>
>>>>> The additional number of bits needed is basically the number of
>>>>> topology levels with a non-power-of-2 maximum value, excluding the top
>>>>> most level.
>>>>>
>>>>> The default value of the new parameter will be 2 in order to support
>>>>> today's possible topologies. The special value of -1 will use the
>>>>> number of bits needed for a guest with the current host's topology.
>>>>>
>>>>> Calculating the maximum vcpu-id dynamically requires allocating the
>>>>> arrays dynamically, using KVM_MAX_VCPU_IDS as the size.
>>>>>
>>>>> Signed-off-by: Juergen Gross <jgross@suse.com>
>>>>
>>>> Just thought about vcpu-ids a little bit more.
>>>>
>>>> It would be possible to replace the topology games completely by an
>>>> arbitrary rather high vcpu-id limit (65536?) and to allocate the memory
>>>> depending on the max vcpu-id just as needed.
>>>>
>>>> Right now the only vcpu-id dependent memory is for the ioapic consisting
>>>> of a vcpu-id indexed bitmap and a vcpu-id indexed byte array (vectors).
>>>>
>>>> We could start with a minimal size when setting up an ioapic and extend
>>>> the areas in case a newly created vcpu would introduce a vcpu-id outside
>>>> the currently allocated memory. Both arrays are protected by the ioapic
>>>> specific lock (at least I couldn't spot any unprotected usage when
>>>> looking briefly into the code), so reallocating those arrays shouldn't
>>>> be hard. In case of ENOMEM the related vcpu creation would just fail.
>>>>
>>>> Thoughts?
>>>
>>> Why not have userspace state the max vcpu_id it intends to create on a per-VM
>>> basis?  Same end result, but doesn't require the complexity of reallocating the
>>> I/O APIC stuff.
>>>
>>
>> And if the userspace doesn't do it (like today)?
> 
> Similar to my comments in patch 4, KVM's current limits could be used as the
> defaults, and any use case wanting to go beyond that would need an updated
> userspace.  Exceeding those limits today doesn't work, so there's no ABI breakage
> by requiring a userspace change.

Hmm, nice idea. Will look into it.

> Or again, this could be a Kconfig knob, though that feels a bit weird in this case.
> But it might make sense if it can be tied to something in the kernel's config?

Having a Kconfig knob for an absolute upper bound of vcpus should
be fine. If someone doesn't like the capability to explicitly let
qemu create very large VMs, he/she can still set that upper bound
to the normal KVM_MAX_VCPUS value.

Juergen


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v3 3/4] x86/kvm: add max number of vcpus for hyperv emulation
  2021-11-18 14:49       ` Sean Christopherson
@ 2021-11-18 15:24         ` Juergen Gross
  2021-11-18 16:12           ` Sean Christopherson
  0 siblings, 1 reply; 24+ messages in thread
From: Juergen Gross @ 2021-11-18 15:24 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: kvm, x86, linux-kernel, Paolo Bonzini, Vitaly Kuznetsov,
	Wanpeng Li, Jim Mattson, Joerg Roedel, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, Dave Hansen, H. Peter Anvin


On 18.11.21 15:49, Sean Christopherson wrote:
> On Thu, Nov 18, 2021, Juergen Gross wrote:
>> On 17.11.21 21:50, Sean Christopherson wrote:
>>>> @@ -166,7 +166,7 @@ static struct kvm_vcpu *get_vcpu_by_vpidx(struct kvm *kvm, u32 vpidx)
>>>>    	struct kvm_vcpu *vcpu = NULL;
>>>>    	int i;
>>>> -	if (vpidx >= KVM_MAX_VCPUS)
>>>> +	if (vpidx >= min(KVM_MAX_VCPUS, KVM_MAX_HYPERV_VCPUS))
>>>
>>> IMO, this is conceptually wrong.  KVM should refuse to allow Hyper-V to be enabled
>>> if the max number of vCPUs exceeds what can be supported, or should refuse to create
>>
>> TBH, I wasn't sure where to put this test. Is there a guaranteed
>> sequence of ioctl()s regarding vcpu creation (or setting the max
>> number of vcpus) and the Hyper-V enabling?
> 
> For better or worse (mostly worse), like all other things CPUID, Hyper-V is a per-vCPU
> knob.  If KVM can't detect the impossible condition at compile time, kvm_check_cpuid()
> is probably the right place to prevent enabling Hyper-V on an unreachable vCPU.

With HYPERV_CPUID_IMPLEMENT_LIMITS already returning the
supported number of vcpus for the Hyper-V case, I'm not sure
anything more is really needed.
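
E.g., roughly (a simplified sketch of the relevant kvm_get_hv_cpuid()
case only, not the upstream code verbatim):

	case HYPERV_CPUID_IMPLEMENT_LIMITS:
		/* maximum number of virtual processors reported to the guest */
		ent->eax = min(KVM_MAX_VCPUS, KVM_MAX_HYPERV_VCPUS);
		break;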

The problem I'm seeing is that the only thing I can do is to
have kvm_get_hv_cpuid() not add the Hyper-V cpuid leaves for
vcpus > 64. I can't return a failure, because that would
probably make vcpu creation fail. And this is something we don't
want, as kvm_get_hv_cpuid() is called even when the guest
doesn't plan to use Hyper-V extensions.

> 
>>> the vCPUs.  I agree it makes sense to add a Hyper-V specific limit, since there are
>>> Hyper-V structures that have a hard limit, but detection of violations should be a
>>> BUILD_BUG_ON, not a silent failure at runtime.
>>>
>>
>> A BUILD_BUG_ON won't be possible with KVM_MAX_VCPUS being selectable via
>> boot parameter.
> 
> I was thinking that there would still be a KVM-defined max that would cap whatever
> comes in from userspace.
> 

See my answers to your other responses.


Juergen


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v3 4/4] x86/kvm: add boot parameter for setting max number of vcpus per guest
  2021-11-18 15:15         ` Juergen Gross
@ 2021-11-18 15:32           ` Sean Christopherson
  2021-11-18 16:19             ` Juergen Gross
  2021-11-18 15:46           ` Sean Christopherson
  1 sibling, 1 reply; 24+ messages in thread
From: Sean Christopherson @ 2021-11-18 15:32 UTC (permalink / raw)
  To: Juergen Gross
  Cc: kvm, x86, linux-doc, linux-kernel, Jonathan Corbet,
	Paolo Bonzini, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	Dave Hansen, H. Peter Anvin

On Thu, Nov 18, 2021, Juergen Gross wrote:
> On 18.11.21 16:05, Sean Christopherson wrote:
> > the partner isn't running a vanilla distro build and could set it as they see fit.
> 
> And here you are wrong. They'd like to use standard SUSE Linux (SLE).

Huh.  As in, completely off-the-shelf kernel binaries without any tweaks to the
config?

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v3 4/4] x86/kvm: add boot parameter for setting max number of vcpus per guest
  2021-11-18 15:15         ` Juergen Gross
  2021-11-18 15:32           ` Sean Christopherson
@ 2021-11-18 15:46           ` Sean Christopherson
  1 sibling, 0 replies; 24+ messages in thread
From: Sean Christopherson @ 2021-11-18 15:46 UTC (permalink / raw)
  To: Juergen Gross
  Cc: kvm, x86, linux-doc, linux-kernel, Jonathan Corbet,
	Paolo Bonzini, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	Dave Hansen, H. Peter Anvin

On Thu, Nov 18, 2021, Juergen Gross wrote:
> On 18.11.21 16:05, Sean Christopherson wrote:
> > Which is a good segue into pointing out that if a module param is added, it needs
> > to be sanity checked against a KVM-defined max.  The admin may be trusted to some
> > extent, but there is zero reason to let userspace set max_vcpus to 4 billion.
> > At that point, it really is just a param vs. capability question.
> 
> I agree. Capping it at e.g. 65536 would probably be a good idea.

Any reason to choose 65536 in particular?  Why not cap it at the upper limit of
NR_CPUS_RANGE_END / MAXSMP, which is currently 8192?

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v3 3/4] x86/kvm: add max number of vcpus for hyperv emulation
  2021-11-18 15:24         ` Juergen Gross
@ 2021-11-18 16:12           ` Sean Christopherson
  0 siblings, 0 replies; 24+ messages in thread
From: Sean Christopherson @ 2021-11-18 16:12 UTC (permalink / raw)
  To: Juergen Gross
  Cc: kvm, x86, linux-kernel, Paolo Bonzini, Vitaly Kuznetsov,
	Wanpeng Li, Jim Mattson, Joerg Roedel, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, Dave Hansen, H. Peter Anvin

On Thu, Nov 18, 2021, Juergen Gross wrote:
> On 18.11.21 15:49, Sean Christopherson wrote:
> > On Thu, Nov 18, 2021, Juergen Gross wrote:
> > > On 17.11.21 21:50, Sean Christopherson wrote:
> > > > > @@ -166,7 +166,7 @@ static struct kvm_vcpu *get_vcpu_by_vpidx(struct kvm *kvm, u32 vpidx)
> > > > >    	struct kvm_vcpu *vcpu = NULL;
> > > > >    	int i;
> > > > > -	if (vpidx >= KVM_MAX_VCPUS)
> > > > > +	if (vpidx >= min(KVM_MAX_VCPUS, KVM_MAX_HYPERV_VCPUS))
> > > > 
> > > > IMO, this is conceptually wrong.  KVM should refuse to allow Hyper-V to be enabled
> > > > if the max number of vCPUs exceeds what can be supported, or should refuse to create
> > > 
> > > TBH, I wasn't sure where to put this test. Is there a guaranteed
> > > sequence of ioctl()s regarding vcpu creation (or setting the max
> > > number of vcpus) and the Hyper-V enabling?
> > 
> > For better or worse (mostly worse), like all other things CPUID, Hyper-V is a per-vCPU
> > knob.  If KVM can't detect the impossible condition at compile time, kvm_check_cpuid()
> > is probably the right place to prevent enabling Hyper-V on an unreachable vCPU.
> 
> With HYPERV_CPUID_IMPLEMENT_LIMITS already returning the
> supported number of vcpus for the Hyper-V case, I'm not sure
> anything more is really needed.

Yep, that'll do nicely.

> The problem I'm seeing is that the only thing I can do is to
> have kvm_get_hv_cpuid() not add the Hyper-V cpuid leaves for
> vcpus > 64. I can't return a failure, because that would
> probably make vcpu creation fail. And this is something we don't
> want, as kvm_get_hv_cpuid() is called even when the guest
> doesn't plan to use Hyper-V extensions.

Argh, that thing is annoying.

My vote is still to reject KVM_SET_CPUID{2} if userspace attempts to enable Hyper-V
for a vCPU when the max number of vCPUs exceeds HYPERV_CPUID_IMPLEMENT_LIMITS.  If
userspace parrots back KVM_GET_SUPPORTED_CPUID, it will specify KVM as the hypervisor,
i.e. enabling Hyper-V requires deliberate action from userspace.

The non-vCPU version of KVM_GET_SUPPORTED_HV_CPUID is not an issue, e.g. the generic
KVM_GET_SUPPORTED_CPUID also reports features that become unsupported if dependent
CPUID features are not enabled by userspace.

The discrepancy with the per-vCPU variant of kvm_get_hv_cpuid() would be unfortunate,
but IMO that ship sailed when the per-vCPU variant was added by commit 2bc39970e932
("x86/kvm/hyper-v: Introduce KVM_GET_SUPPORTED_HV_CPUID").  We can't retroactively
yank that code out, but I don't think we should be overly concerned with keeping it
100% accurate.  IMO it's perfectly fine for KVM to define the output of
KVM_GET_SUPPORTED_HV_CPUID as being garbage if the vCPU cannot possibly support
Hyper-V enlightenments.  That situation isn't possible today, so there's no backwards
compatibility to worry about.
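
A sketch of the KVM_SET_CPUID{2} rejection described above (the helper
name, the exact hook in kvm_check_cpuid() and kvm_max_vcpus as a stand-in
for the configured limit are assumptions; the signature check mirrors how
KVM detects Hyper-V enablement via leaf HYPERV_CPUID_INTERFACE):

	static bool cpuid_enables_hyperv(struct kvm_cpuid_entry2 *entries, int nent)
	{
		struct kvm_cpuid_entry2 *e;

		e = cpuid_entry2_find(entries, nent, HYPERV_CPUID_INTERFACE, 0);

		/* "Hv#1" in EAX of the interface leaf enables Hyper-V */
		return e && e->eax == HYPERV_CPUID_SIGNATURE_EAX;
	}

	/* illustrative check, e.g. from kvm_check_cpuid(): */
	if (cpuid_enables_hyperv(entries, nent) &&
	    kvm_max_vcpus > KVM_MAX_HYPERV_VCPUS)
		return -E2BIG;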

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v3 4/4] x86/kvm: add boot parameter for setting max number of vcpus per guest
  2021-11-18 15:32           ` Sean Christopherson
@ 2021-11-18 16:19             ` Juergen Gross
  0 siblings, 0 replies; 24+ messages in thread
From: Juergen Gross @ 2021-11-18 16:19 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: kvm, x86, linux-doc, linux-kernel, Jonathan Corbet,
	Paolo Bonzini, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	Dave Hansen, H. Peter Anvin


On 18.11.21 16:32, Sean Christopherson wrote:
> On Thu, Nov 18, 2021, Juergen Gross wrote:
>> On 18.11.21 16:05, Sean Christopherson wrote:
>>> the partner isn't running a vanilla distro build and could set it as they see fit.
>>
>> And here you are wrong. They'd like to use standard SUSE Linux (SLE).
> 
> Huh.  As in, completely off-the-shelf kernel binaries without any tweaks to the
> config?

This is the idea, yes.


Juergen


^ permalink raw reply	[flat|nested] 24+ messages in thread

end of thread, other threads:[~2021-11-18 16:19 UTC | newest]

Thread overview: 24+ messages
2021-11-16 14:10 [PATCH v3 0/4] x86/kvm: add boot parameters for max vcpu configs Juergen Gross
2021-11-16 14:10 ` [PATCH v3 1/4] x86/kvm: add boot parameter for adding vcpu-id bits Juergen Gross
2021-11-17  6:59   ` Juergen Gross
2021-11-17 23:46     ` Sean Christopherson
2021-11-18  7:45       ` Juergen Gross
2021-11-18 15:09         ` Sean Christopherson
2021-11-18 15:19           ` Juergen Gross
2021-11-17 23:44   ` Sean Christopherson
2021-11-18  7:44     ` Juergen Gross
2021-11-16 14:10 ` [PATCH v3 2/4] x86/kvm: introduce a per cpu vcpu mask Juergen Gross
2021-11-16 14:10 ` [PATCH v3 3/4] x86/kvm: add max number of vcpus for hyperv emulation Juergen Gross
2021-11-17 20:50   ` Sean Christopherson
2021-11-18  7:43     ` Juergen Gross
2021-11-18 14:49       ` Sean Christopherson
2021-11-18 15:24         ` Juergen Gross
2021-11-18 16:12           ` Sean Christopherson
2021-11-16 14:10 ` [PATCH v3 4/4] x86/kvm: add boot parameter for setting max number of vcpus per guest Juergen Gross
2021-11-17 20:57   ` Sean Christopherson
2021-11-18  7:16     ` Juergen Gross
2021-11-18 15:05       ` Sean Christopherson
2021-11-18 15:15         ` Juergen Gross
2021-11-18 15:32           ` Sean Christopherson
2021-11-18 16:19             ` Juergen Gross
2021-11-18 15:46           ` Sean Christopherson
