On 18.11.21 16:09, Sean Christopherson wrote:
> On Thu, Nov 18, 2021, Juergen Gross wrote:
>> On 18.11.21 00:46, Sean Christopherson wrote:
>>> On Wed, Nov 17, 2021, Juergen Gross wrote:
>>>> On 16.11.21 15:10, Juergen Gross wrote:
>>>>> Today the maximum vcpu-id of a kvm guest's vcpu on x86 systems is set
>>>>> via a #define in a header file.
>>>>>
>>>>> In order to support higher vcpu-ids without generally increasing the
>>>>> memory consumption of guests on the host (some guest structures contain
>>>>> arrays sized by KVM_MAX_VCPU_IDS) add a boot parameter for adding some
>>>>> bits to the vcpu-id. Additional bits are needed as the vcpu-id is
>>>>> constructed via bit-wise concatenation of socket-id, core-id, etc.
>>>>> As those ids' maximum values are not always a power of 2, the vcpu-ids
>>>>> are sparse.
>>>>>
>>>>> The additional number of bits needed is basically the number of
>>>>> topology levels with a non-power-of-2 maximum value, excluding the
>>>>> topmost level.
>>>>>
>>>>> The default value of the new parameter will be 2 in order to support
>>>>> today's possible topologies. The special value of -1 will use the
>>>>> number of bits needed for a guest with the current host's topology.
>>>>>
>>>>> Calculating the maximum vcpu-id dynamically requires dynamically
>>>>> allocating the arrays sized by KVM_MAX_VCPU_IDS.
>>>>>
>>>>> Signed-off-by: Juergen Gross <jgross@suse.com>
>>>>
>>>> Just thought about vcpu-ids a little bit more.
>>>>
>>>> It would be possible to replace the topology games completely by an
>>>> arbitrary rather high vcpu-id limit (65536?) and to allocate the memory
>>>> depending on the max vcpu-id just as needed.
>>>>
>>>> Right now the only vcpu-id dependent memory is for the ioapic consisting
>>>> of a vcpu-id indexed bitmap and a vcpu-id indexed byte array (vectors).
>>>>
>>>> We could start with a minimal size when setting up an ioapic and extend
>>>> the areas in case a new vcpu created would introduce a vcpu-id outside
>>>> the currently allocated memory. Both arrays are protected by the ioapic
>>>> specific lock (at least I couldn't spot any unprotected usage when
>>>> looking briefly into the code), so reallocating those arrays shouldn't
>>>> be hard. In case of ENOMEM the related vcpu creation would just fail.
>>>>
>>>> Thoughts?
>>>
>>> Why not have userspace state the max vcpu_id it intends to create on a per-VM
>>> basis?  Same end result, but doesn't require the complexity of reallocating the
>>> I/O APIC stuff.
>>>
>>
>> And if the userspace doesn't do it (like today)?
> 
> Similar to my comments in patch 4, KVM's current limits could be used as the
> defaults, and any use case wanting to go beyond that would need an updated
> userspace.  Exceeding those limits today doesn't work, so there's no ABI breakage
> by requiring a userspace change.

Hmm, nice idea. Will look into it.
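
A minimal userspace model of that idea, just to make the semantics
concrete. This is a sketch, not KVM code: struct vm_model, the
vm_*() helpers, DEFAULT_MAX_VCPU_IDS and the hard_limit parameter are
all hypothetical names, and the value 4096 is only illustrative. It
assumes the per-VM limit has to be set before the first vcpu is
created, so anything sized by it can be allocated once and never
reallocated.

```c
#include <errno.h>

/* Assumption: stands in for today's compile-time limit; value is illustrative. */
#define DEFAULT_MAX_VCPU_IDS 4096u

/* Hypothetical per-VM state; real KVM structures look different. */
struct vm_model {
	unsigned int max_vcpu_ids;	/* valid vcpu-ids are 0 .. max_vcpu_ids - 1 */
	unsigned int created_vcpus;	/* number of vcpus created so far */
};

/* A VM whose userspace never states a limit keeps the current default. */
static void vm_init(struct vm_model *vm)
{
	vm->max_vcpu_ids = DEFAULT_MAX_VCPU_IDS;
	vm->created_vcpus = 0;
}

/*
 * Hypothetical per-VM knob: userspace states the number of vcpu-ids it
 * intends to use. Rejected once a vcpu exists or when the request is
 * zero or above the host-wide hard limit.
 */
static int vm_set_max_vcpu_ids(struct vm_model *vm, unsigned int max_ids,
			       unsigned int hard_limit)
{
	if (vm->created_vcpus)
		return -EINVAL;
	if (max_ids == 0 || max_ids > hard_limit)
		return -EINVAL;
	vm->max_vcpu_ids = max_ids;
	return 0;
}

/* vcpu creation only checks the id against the per-VM limit in this model. */
static int vm_create_vcpu(struct vm_model *vm, unsigned int id)
{
	if (id >= vm->max_vcpu_ids)
		return -EINVAL;
	vm->created_vcpus++;
	return 0;
}
```

An unmodified userspace would only ever call vm_init() and
vm_create_vcpu() and see the same limit as today. Only a userspace
wanting more would call vm_set_max_vcpu_ids() first.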

> Or again, this could be a Kconfig knob, though that feels a bit weird in this case.
> But it might make sense if it can be tied to something in the kernel's config?

Having a Kconfig knob for an absolute upper bound on the number of
vcpus should be fine. Anyone who doesn't want qemu to be able to
explicitly create very large VMs can still set that upper bound to
the normal KVM_MAX_VCPUS value.
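
Roughly what I have in mind, as a standalone sketch. The Kconfig
symbol CONFIG_KVM_VCPUS_UPPER_BOUND and the helper
check_requested_max_vcpus() are made up for illustration, and the
value 1024 used for KVM_MAX_VCPUS_MODEL is an assumption standing in
for the normal KVM_MAX_VCPUS value:

```c
#include <errno.h>

/* Assumption: illustrative stand-in for the normal KVM_MAX_VCPUS value. */
#define KVM_MAX_VCPUS_MODEL 1024u

/*
 * Hypothetical Kconfig symbol for the absolute upper bound. If the
 * config doesn't set it, fall back to the normal limit, so very large
 * VMs are opt-in at build time.
 */
#ifndef CONFIG_KVM_VCPUS_UPPER_BOUND
#define CONFIG_KVM_VCPUS_UPPER_BOUND KVM_MAX_VCPUS_MODEL
#endif

/* Validate a limit requested by userspace against the build-time bound. */
static int check_requested_max_vcpus(unsigned int requested)
{
	if (requested == 0 || requested > CONFIG_KVM_VCPUS_UPPER_BOUND)
		return -EINVAL;
	return 0;
}
```

With the fallback in effect a request for 1024 vcpus passes and a
request for 1025 fails with -EINVAL.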

Juergen