linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Juergen Gross <jgross@suse.com>
To: Vitaly Kuznetsov <vkuznets@redhat.com>,
	kvm@vger.kernel.org, x86@kernel.org, linux-doc@vger.kernel.org,
	linux-kernel@vger.kernel.org
Cc: maz@kernel.org, ehabkost@redhat.com,
	Jonathan Corbet <corbet@lwn.net>,
	Paolo Bonzini <pbonzini@redhat.com>,
	Sean Christopherson <seanjc@google.com>,
	Wanpeng Li <wanpengli@tencent.com>,
	Jim Mattson <jmattson@google.com>, Joerg Roedel <joro@8bytes.org>,
	Thomas Gleixner <tglx@linutronix.de>,
	Ingo Molnar <mingo@redhat.com>, Borislav Petkov <bp@alien8.de>,
	"H. Peter Anvin" <hpa@zytor.com>
Subject: Re: [PATCH v2 2/6] x86/kvm: add boot parameter for adding vcpu-id bits
Date: Fri, 3 Sep 2021 15:53:00 +0200	[thread overview]
Message-ID: <a6b00e7f-5b8f-315d-1d3c-a8641f44f0c3@suse.com> (raw)
In-Reply-To: <874kb1n59j.fsf@vitty.brq.redhat.com>


[-- Attachment #1.1.1: Type: text/plain, Size: 7437 bytes --]

On 03.09.21 15:43, Vitaly Kuznetsov wrote:
> Juergen Gross <jgross@suse.com> writes:
> 
>> Today the maximum vcpu-id of a kvm guest's vcpu on x86 systems is set
>> via a #define in a header file.
>>
>> In order to support higher vcpu-ids without generally increasing the
>> memory consumption of guests on the host (some guest structures contain
>> arrays sized by KVM_MAX_VCPU_ID) add a boot parameter for adding some
>> bits to the vcpu-id. Additional bits are needed as the vcpu-id is
>> constructed via bit-wise concatenation of socket-id, core-id, etc.
>> As those ids maximum values are not always a power of 2, the vcpu-ids
>> are sparse.
>>
>> The additional number of bits needed is basically the number of
>> topology levels with a non-power-of-2 maximum value, excluding the top
>> most level.
>>
>> The default value of the new parameter will be to take the correct
>> setting from the host's topology.
>>
>> Calculating the maximum vcpu-id dynamically requires to allocate the
>> arrays using KVM_MAX_VCPU_ID as the size dynamically.
>>
>> Signed-of-by: Juergen Gross <jgross@suse.com>
>> ---
>> V2:
>> - switch to specifying additional bits (based on comment by Vitaly
>>    Kuznetsov)
>>
>> Signed-off-by: Juergen Gross <jgross@suse.com>
>> ---
>>   .../admin-guide/kernel-parameters.txt         | 18 ++++++++++++
>>   arch/x86/include/asm/kvm_host.h               |  4 ++-
>>   arch/x86/kvm/ioapic.c                         | 12 +++++++-
>>   arch/x86/kvm/ioapic.h                         |  4 +--
>>   arch/x86/kvm/x86.c                            | 29 +++++++++++++++++++
>>   5 files changed, 63 insertions(+), 4 deletions(-)
>>
>> diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
>> index 84dc5790741b..37e194299311 100644
>> --- a/Documentation/admin-guide/kernel-parameters.txt
>> +++ b/Documentation/admin-guide/kernel-parameters.txt
>> @@ -2435,6 +2435,24 @@
>>   			feature (tagged TLBs) on capable Intel chips.
>>   			Default is 1 (enabled)
>>   
>> +	kvm.vcpu_id_add_bits=
>> +			[KVM,X86] The vcpu-ids of guests are sparse, as they
>> +			are constructed by bit-wise concatenation of the ids of
>> +			the different topology levels (sockets, cores, threads).
>> +
>> +			This parameter specifies how many additional bits the
>> +			maximum vcpu-id needs compared to the maximum number of
>> +			vcpus.
>> +
>> +			Normally this value is the number of topology levels
>> +			without the threads level and without the highest
>> +			level.
>> +
>> +			The special value -1 can be used to support guests
>> +			with the same topology is the host.
>> +
>> +			Default: -1
>> +
>>   	l1d_flush=	[X86,INTEL]
>>   			Control mitigation for L1D based snooping vulnerability.
>>   
>> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
>> index af6ce8d4c86a..3513edee8e22 100644
>> --- a/arch/x86/include/asm/kvm_host.h
>> +++ b/arch/x86/include/asm/kvm_host.h
>> @@ -39,7 +39,7 @@
>>   
>>   #define KVM_MAX_VCPUS 288
>>   #define KVM_SOFT_MAX_VCPUS 240
>> -#define KVM_MAX_VCPU_ID 1023
>> +#define KVM_MAX_VCPU_ID kvm_max_vcpu_id()
>>   /* memory slots that are not exposed to userspace */
>>   #define KVM_PRIVATE_MEM_SLOTS 3
>>   
>> @@ -1588,6 +1588,8 @@ extern u64  kvm_max_tsc_scaling_ratio;
>>   extern u64  kvm_default_tsc_scaling_ratio;
>>   /* bus lock detection supported? */
>>   extern bool kvm_has_bus_lock_exit;
>> +/* maximum vcpu-id */
>> +unsigned int kvm_max_vcpu_id(void);
>>   
>>   extern u64 kvm_mce_cap_supported;
>>   
>> diff --git a/arch/x86/kvm/ioapic.c b/arch/x86/kvm/ioapic.c
>> index ff005fe738a4..52e8ea90c914 100644
>> --- a/arch/x86/kvm/ioapic.c
>> +++ b/arch/x86/kvm/ioapic.c
>> @@ -685,11 +685,21 @@ static const struct kvm_io_device_ops ioapic_mmio_ops = {
>>   int kvm_ioapic_init(struct kvm *kvm)
>>   {
>>   	struct kvm_ioapic *ioapic;
>> +	size_t sz;
>>   	int ret;
>>   
>> -	ioapic = kzalloc(sizeof(struct kvm_ioapic), GFP_KERNEL_ACCOUNT);
>> +	sz = sizeof(struct kvm_ioapic) +
>> +	     sizeof(*ioapic->rtc_status.dest_map.map) *
>> +		    BITS_TO_LONGS(KVM_MAX_VCPU_ID + 1) +
>> +	     sizeof(*ioapic->rtc_status.dest_map.vectors) *
>> +		    (KVM_MAX_VCPU_ID + 1);
>> +	ioapic = kzalloc(sz, GFP_KERNEL_ACCOUNT);
>>   	if (!ioapic)
>>   		return -ENOMEM;
>> +	ioapic->rtc_status.dest_map.map = (void *)(ioapic + 1);
>> +	ioapic->rtc_status.dest_map.vectors =
>> +		(void *)(ioapic->rtc_status.dest_map.map +
>> +			 BITS_TO_LONGS(KVM_MAX_VCPU_ID + 1));
>>   	spin_lock_init(&ioapic->lock);
>>   	INIT_DELAYED_WORK(&ioapic->eoi_inject, kvm_ioapic_eoi_inject_work);
>>   	kvm->arch.vioapic = ioapic;
>> diff --git a/arch/x86/kvm/ioapic.h b/arch/x86/kvm/ioapic.h
>> index bbd4a5d18b5d..623a3c5afad7 100644
>> --- a/arch/x86/kvm/ioapic.h
>> +++ b/arch/x86/kvm/ioapic.h
>> @@ -39,13 +39,13 @@ struct kvm_vcpu;
>>   
>>   struct dest_map {
>>   	/* vcpu bitmap where IRQ has been sent */
>> -	DECLARE_BITMAP(map, KVM_MAX_VCPU_ID + 1);
>> +	unsigned long *map;
>>   
>>   	/*
>>   	 * Vector sent to a given vcpu, only valid when
>>   	 * the vcpu's bit in map is set
>>   	 */
>> -	u8 vectors[KVM_MAX_VCPU_ID + 1];
>> +	u8 *vectors;
>>   };
>>   
>>   
>> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
>> index e5d5c5ed7dd4..6b6f38f0b617 100644
>> --- a/arch/x86/kvm/x86.c
>> +++ b/arch/x86/kvm/x86.c
>> @@ -78,6 +78,7 @@
>>   #include <asm/intel_pt.h>
>>   #include <asm/emulate_prefix.h>
>>   #include <asm/sgx.h>
>> +#include <asm/topology.h>
>>   #include <clocksource/hyperv_timer.h>
>>   
>>   #define CREATE_TRACE_POINTS
>> @@ -184,6 +185,34 @@ module_param(force_emulation_prefix, bool, S_IRUGO);
>>   int __read_mostly pi_inject_timer = -1;
>>   module_param(pi_inject_timer, bint, S_IRUGO | S_IWUSR);
>>   
>> +static int __read_mostly vcpu_id_add_bits = -1;
>> +module_param(vcpu_id_add_bits, int, S_IRUGO);
>> +
>> +unsigned int kvm_max_vcpu_id(void)
>> +{
>> +	int n_bits = fls(KVM_MAX_VCPUS - 1);
>> +
>> +	if (vcpu_id_add_bits < -1 || vcpu_id_add_bits > (32 - n_bits)) {
>> +		pr_err("Invalid value of vcpu_id_add_bits=%d parameter!\n",
>> +		       vcpu_id_add_bits);
>> +		vcpu_id_add_bits = -1;
>> +	}
>> +
>> +	if (vcpu_id_add_bits >= 0) {
>> +		n_bits += vcpu_id_add_bits;
>> +	} else {
>> +		n_bits++;		/* One additional bit for core level. */
>> +		if (topology_max_die_per_package() > 1)
>> +			n_bits++;	/* One additional bit for die level. */
> 
> This assumes topology_max_die_per_package() can not be greater than 2,
> or 1 additional bit may not suffice, right?

No. Each topology level can at least add one additional bit. This
mechanism assumes that each level consumes not more bits as
necessary, so with e.g. a core count of 18 per die 5 bits are used,
and not more.

> 
>> +	}
>> +
>> +	if (!n_bits)
>> +		n_bits = 1;
> 
> Nitpick: AFAIU n_bits can't be zero here as KVM_MAX_VCPUS is still
> static. The last patch of the series, however, makes it possible when
> max_vcpus = 1 and vcpu_id_add_bits = 0. With this, I'd suggest to move
> the check to the last patch.

This is true only if no downstream has a patch setting KVM_MAX_VCPUS to
1. I'd rather be safe than sorry here, especially as it would be very
easy to miss this dependency.


Juergen

[-- Attachment #1.1.2: OpenPGP public key --]
[-- Type: application/pgp-keys, Size: 3135 bytes --]

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 495 bytes --]

  reply	other threads:[~2021-09-03 13:53 UTC|newest]

Thread overview: 23+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-09-03 13:08 [PATCH v2 0/6] x86/kvm: add boot parameters for max vcpu configs Juergen Gross
2021-09-03 13:08 ` [PATCH v2 1/6] x86/kvm: remove non-x86 stuff from arch/x86/kvm/ioapic.h Juergen Gross
2021-09-03 13:08 ` [PATCH v2 2/6] x86/kvm: add boot parameter for adding vcpu-id bits Juergen Gross
2021-09-03 13:43   ` Vitaly Kuznetsov
2021-09-03 13:53     ` Juergen Gross [this message]
2021-09-03 19:48   ` Eduardo Habkost
2021-09-06  4:46     ` Juergen Gross
2021-09-28 16:41   ` Paolo Bonzini
2021-09-03 13:08 ` [PATCH v2 3/6] x86/kvm: introduce per cpu vcpu masks Juergen Gross
2021-09-03 16:05   ` Eduardo Habkost
2021-09-06  4:34     ` Juergen Gross
2021-09-07 18:34   ` Eduardo Habkost
2021-09-08  8:41     ` Vitaly Kuznetsov
2021-09-03 13:08 ` [PATCH v2 4/6] kvm: use kvfree() in kvm_arch_free_vm() Juergen Gross
2021-09-28 16:48   ` Paolo Bonzini
2021-09-03 13:08 ` [PATCH v2 5/6] kvm: allocate vcpu pointer array separately Juergen Gross
2021-09-03 14:41   ` Marc Zyngier
2021-09-06  4:33     ` Juergen Gross
2021-09-06  9:46       ` Marc Zyngier
2021-09-09 20:28         ` Sean Christopherson
2021-09-03 13:08 ` [PATCH v2 6/6] x86/kvm: add boot parameter for setting max number of vcpus per guest Juergen Gross
2021-09-06  0:45   ` Yao Yuan
2021-09-06  4:47     ` Juergen Gross

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=a6b00e7f-5b8f-315d-1d3c-a8641f44f0c3@suse.com \
    --to=jgross@suse.com \
    --cc=bp@alien8.de \
    --cc=corbet@lwn.net \
    --cc=ehabkost@redhat.com \
    --cc=hpa@zytor.com \
    --cc=jmattson@google.com \
    --cc=joro@8bytes.org \
    --cc=kvm@vger.kernel.org \
    --cc=linux-doc@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=maz@kernel.org \
    --cc=mingo@redhat.com \
    --cc=pbonzini@redhat.com \
    --cc=seanjc@google.com \
    --cc=tglx@linutronix.de \
    --cc=vkuznets@redhat.com \
    --cc=wanpengli@tencent.com \
    --cc=x86@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).