Re: [PATCH] kvm: x86: Increase MAX_VCPUS to 710

From: Juergen Gross <jgross@suse.com>
To: Eduardo Habkost <ehabkost@redhat.com>,
	Vitaly Kuznetsov <vkuznets@redhat.com>
Cc: Igor Mammedov <imammedo@redhat.com>,
	kvm@vger.kernel.org, Paolo Bonzini <pbonzini@redhat.com>,
	Sean Christopherson <seanjc@google.com>
Subject: Re: [PATCH] kvm: x86: Increase MAX_VCPUS to 710
Date: Fri, 3 Sep 2021 10:13:43 +0200	[thread overview]
Message-ID: <442852d2-a5f7-8f6f-deb7-31f66a4579d6@suse.com> (raw)
In-Reply-To: <20210901152525.g5fnf5ketta3fjhl@habkost.net>

[-- Attachment #1.1.1: Type: text/plain, Size: 6361 bytes --]

On 01.09.21 17:25, Eduardo Habkost wrote:
> On Wed, Sep 01, 2021 at 04:42:08PM +0200, Vitaly Kuznetsov wrote:
>> Igor Mammedov <imammedo@redhat.com> writes:
>>
>>> On Wed, 01 Sep 2021 12:13:55 +0200
>>> Vitaly Kuznetsov <vkuznets@redhat.com> wrote:
>>>
>>>> Igor Mammedov <imammedo@redhat.com> writes:
>>>>
>>>>> On Wed, 01 Sep 2021 10:02:18 +0200
>>>>> Vitaly Kuznetsov <vkuznets@redhat.com> wrote:
>>>>>   
>>>>>> Eduardo Habkost <ehabkost@redhat.com> writes:
>>>>>>    
>>>>>>> Support for 710 VCPUs has been tested by Red Hat since RHEL-8.4.
>>>>>>> Increase KVM_MAX_VCPUS and KVM_SOFT_MAX_VCPUS to 710.
>>>>>>>
>>>>>>> For reference, visible effects of changing KVM_MAX_VCPUS are:
>>>>>>> - KVM_CAP_MAX_VCPUS and KVM_CAP_NR_VCPUS will now return 710 (of course)
>>>>>>> - Default value for CPUID[HYPERV_CPUID_IMPLEMENT_LIMITS (00x40000005)].EAX
>>>>>>>    will now be 710
>>>>>>> - Bitmap stack variables that will grow:
>>>>>>>    - At kvm_hv_flush_tlb()  kvm_hv_send_ipi():
>>>>>>>      - Sparse VCPU bitmap (vp_bitmap) will be 96 bytes long
>>>>>>>      - vcpu_bitmap will be 92 bytes long
>>>>>>>    - vcpu_bitmap at bioapic_write_indirect() will be 92 bytes long
>>>>>>>      once patch "KVM: x86: Fix stack-out-of-bounds memory access
>>>>>>>      from ioapic_write_indirect()" is applied
>>>>>>>
>>>>>>> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
>>>>>>> ---
>>>>>>>   arch/x86/include/asm/kvm_host.h | 4 ++--
>>>>>>>   1 file changed, 2 insertions(+), 2 deletions(-)
>>>>>>>
>>>>>>> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
>>>>>>> index af6ce8d4c86a..f76fae42bf45 100644
>>>>>>> --- a/arch/x86/include/asm/kvm_host.h
>>>>>>> +++ b/arch/x86/include/asm/kvm_host.h
>>>>>>> @@ -37,8 +37,8 @@
>>>>>>>   
>>>>>>>   #define __KVM_HAVE_ARCH_VCPU_DEBUGFS
>>>>>>>   
>>>>>>> -#define KVM_MAX_VCPUS 288
>>>>>>> -#define KVM_SOFT_MAX_VCPUS 240
>>>>>>> +#define KVM_MAX_VCPUS 710
>>>>>>
>>>>>> Out of pure curiosity, where did 710 came from? Is this some particular
>>>>>> hardware which was used for testing (weird number btw). Should we maybe
>>>>>> go to e.g. 1024 for the sake of the beauty of powers of two? :-)
> 
> 710 wasn't tested with real VMs yet due to userspace limitations
> that still need to be addressed (specifically, due to SMBIOS 2.1
> table size limits).
> 
> I would be more than happy to set it to 1024 or 2048 if the KVM
> maintainers agree.  :)
> 
> For reference, RHEL-8.4 is compiled with KVM_MAX_VCPUS=2048, but
> userspace enforces a 710 VCPU limit.
> 
>>>>>>    
>>>>>>> +#define KVM_SOFT_MAX_VCPUS 710
>>>>>>
>>>>>> Do we really need KVM_SOFT_MAX_VCPUS which is equal to KVM_MAX_VCPUS?
>>>>>>
>>>>>> Reading
>>>>>>
>>>>>> commit 8c3ba334f8588e1d5099f8602cf01897720e0eca
>>>>>> Author: Sasha Levin <levinsasha928@gmail.com>
>>>>>> Date:   Mon Jul 18 17:17:15 2011 +0300
>>>>>>
>>>>>>      KVM: x86: Raise the hard VCPU count limit
>>>>>>
>>>>>> the idea behind KVM_SOFT_MAX_VCPUS was to allow developers to test high
>>>>>> vCPU numbers without claiming such configurations as supported.
>>>>>>
>>>>>> I have two alternative suggestions:
>>>>>> 1) Drop KVM_SOFT_MAX_VCPUS completely.
>>>>>> 2) Raise it to a higher number (e.g. 2048)
> 
> I will send a RFC later proposing we make KVM_MAX_VCPUS
> configurable by Kconfig, and dropping KVM_SOFT_MAX_VCPUS.
> 
>>>>>>    
>>>>>>>   #define KVM_MAX_VCPU_ID 1023
>>>>>>
>>>>>> 1023 may not be enough now. I rememeber there was a suggestion to make
>>>>>> max_vcpus configurable via module parameter and this question was
>>>>>> raised:
>>>>>>
>>>>>> https://lore.kernel.org/lkml/878s292k75.fsf@vitty.brq.redhat.com/
>>>>>>
>>>>>> TL;DR: to support EPYC-like topologies we need to keep
>>>>>>   KVM_MAX_VCPU_ID = 4 * KVM_MAX_VCPUS
> 
> 1024 seems to be enough on all the CPU topologies I have seen,
> but I can happily implement the 4x rule below, just to be sure.
> 
>>>>>
>>>>> VCPU_ID (sequential 0-n range) is not APIC ID (sparse distribution),
>>>>> so topology encoded in the later should be orthogonal to VCPU_ID.
>>>>
>>>> Why do we even have KVM_MAX_VCPU_ID which is != KVM[_SOFT]_MAX_VCPUS
>>>> then?
>>> I'd say for compat reasons (8c3ba334f85 KVM: x86: Raise the hard VCPU count limit)
>>>
>>> qemu warns users that they are out of recommended (tested) limit when
>>> it sees requested maxcpus over soft limit.
>>> See soft_vcpus_limit in qemu.
>>>
>>
>> That's the reason why we have KVM_SOFT_MAX_VCPUS in addition to
>> KVM_MAX_VCPUS, not why we have KVM_MAX_VCPU_ID :-)
>>
>>>
>>>> KVM_MAX_VCPU_ID is only checked in kvm_vm_ioctl_create_vcpu() which
>>>> passes 'id' down to kvm_vcpu_init() which, in its turn, sets
>>>> 'vcpu->vcpu_id'. This is, for example, returned by kvm_x2apic_id():
>>>>
>>>> static inline u32 kvm_x2apic_id(struct kvm_lapic *apic)
>>>> {
>>>>          return apic->vcpu->vcpu_id;
>>>> }
>>>>
>>>> So I'm pretty certain this is actually APIC id and it has topology in
>>>> it.
>>> Yep, I mixed it up with cpu_index on QEMU side,
>>> for x86 it fetches actual apic id and feeds that to kvm when creating vCPU.
>>>
>>> It looks like KVM_MAX_VCPU_ID (KVM_SOFT_MAX_VCPUS) is essentially
>>> MAX_[SOFT_]APIC_ID which in some places is treated as max number of vCPUs,
>>> so actual max count of vCPUs could be less than that (in case of sparse APIC
>>> IDs /non power of 2 thread|core|whatever count/).
>>
>> Yes. To avoid the confusion, I'd suggest we re-define KVM_MAX_VCPU_ID as
>> something like:
>>
>> #define KVM_MAX_VCPU_ID_TO_MAX_VCPUS_RATIO 4
>> #define KVM_MAX_VCPU_ID (KVM_MAX_VCPUS * KVM_MAX_VCPU_ID_TO_MAX_VCPUS_RATIO)
>>
>> and add a comment about sparse APIC IDs/topology.
> 
> I will submit a new version of this patch with a rule like the
> above.
> 
> A 4x ratio is very generous, but the only impact of a large
> KVM_MAX_VCPU_ID is a larger struct kvm_ioapic.  Changing
> KVM_MAX_VCPU_ID from 1024 to 4096 will make struct kvm_ioapic
> grow from 1628 bytes to 5084 bytes, which I assume is OK.
> 

I'm just about to send V2 of my series [1] to support specifying the
max vcpu-id and max number of vcpus of a guest via command line
parameters.

Juergen

[1]: https://lore.kernel.org/kvm/20210701154105.23215-1-jgross@suse.com/

[-- Attachment #1.1.2: OpenPGP public key --]
[-- Type: application/pgp-keys, Size: 3135 bytes --]

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 495 bytes --]