From: Marc Zyngier <maz@kernel.org>
To: Auger Eric <eric.auger@redhat.com>,
	Zenghui Yu <yuzenghui@huawei.com>,
	kvmarm@lists.cs.columbia.edu, qemu-arm@nongnu.org
Cc: zhang.zhanghailiang@huawei.com, kvm@vger.kernel.org
Subject: Re: Can we boot a 512U kvm guest?
Date: Thu, 22 Aug 2019 10:29:18 +0100
Message-ID: <fbeb47df-7ea2-04ce-5fe3-a6a6a4751b8b@kernel.org>
In-Reply-To: <da5c87d6-8b66-75f9-e720-9f1d80a76d7d@redhat.com>

Hi Eric,

On 22/08/2019 10:08, Auger Eric wrote:
> Hi Zenghui,
> 
> On 8/13/19 10:50 AM, Zenghui Yu wrote:
>> Hi folks,
>>
>> Since commit e25028c8ded0 ("KVM: arm/arm64: Bump VGIC_V3_MAX_CPUS to
>> 512"), we seem to be allowed to boot a 512U (512-vCPU) guest, but I
>> failed to start one with the latest QEMU.  I guess there are at least
>> *two* reasons (limitations).
>>
>> First I got a QEMU abort:
>>     "kvm_set_irq: Invalid argument"
>>
>> After enabling trace_kvm_irq_line() via debugfs, I got the following
>> when it came to vcpu-256:
>>     "Inject UNKNOWN interrupt (3), vcpu->idx: 0, num: 23, level: 0"
>> and kvm_vm_ioctl_irq_line() returns -EINVAL to user-space...
>>
>> The problem is that the KVM_IRQ_LINE ioctl only has 8 bits for the
>> vcpu_index field ([23:16]).  The irq_type field gets corrupted if we
>> inject a PPI into vcpu-256, whose vcpu_index needs 9 bits.
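To make the overflow concrete, here is a minimal standalone sketch,
assuming the pre-patch layout (type in bits [31:24], vcpu_index in
[23:16], irq number in [15:0]) and that KVM_ARM_IRQ_TYPE_PPI is 2.
encode_old() and decode_old() are hypothetical helpers for illustration,
not kernel or QEMU functions: one packs the fields with a plain
shift-and-OR, the other unpacks them with masks.

```c
#include <assert.h>
#include <stdint.h>

/* Pre-patch KVM_IRQ_LINE layout: type [31:24], vcpu_index [23:16], num [15:0]. */
#define OLD_TYPE_SHIFT 24
#define OLD_TYPE_MASK  0xffu
#define OLD_VCPU_SHIFT 16
#define OLD_VCPU_MASK  0xffu
#define IRQ_NUM_SHIFT  0
#define IRQ_NUM_MASK   0xffffu

/* Assumed to match KVM_ARM_IRQ_TYPE_PPI in the arm64 uapi header. */
#define IRQ_TYPE_PPI   2u

/* Naive packing: shifts and ORs with no range check on the vcpu index,
 * so bit 8 of a 9-bit index leaks into the lowest bit of "type". */
static uint32_t encode_old(uint32_t type, uint32_t vcpu, uint32_t num)
{
    assert(num <= IRQ_NUM_MASK);
    return (type << OLD_TYPE_SHIFT) | (vcpu << OLD_VCPU_SHIFT) |
           (num << IRQ_NUM_SHIFT);
}

/* Unpacking with per-field masks, as a receiver of the irq field would. */
static void decode_old(uint32_t irq, uint32_t *type, uint32_t *vcpu,
                       uint32_t *num)
{
    *type = (irq >> OLD_TYPE_SHIFT) & OLD_TYPE_MASK;
    *vcpu = (irq >> OLD_VCPU_SHIFT) & OLD_VCPU_MASK;
    *num  = (irq >> IRQ_NUM_SHIFT) & IRQ_NUM_MASK;
}
```

For PPI 23 on vcpu 256, encode_old() yields 0x03000017, which
decode_old() reads back as type 3, vcpu 0, num 23: bit 8 of the vcpu
index lands in bit 24 and turns type 2 into 3, matching the "UNKNOWN
interrupt (3), vcpu->idx: 0, num: 23" trace above.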
>>
>> I temporarily patched the KVM and QEMU with the following diff:
>>
>> ---8<---
>> diff --git a/arch/arm64/include/uapi/asm/kvm.h b/arch/arm64/include/uapi/asm/kvm.h
>> index 95516a4..39a0fb1 100644
>> --- a/arch/arm64/include/uapi/asm/kvm.h
>> +++ b/arch/arm64/include/uapi/asm/kvm.h
>> @@ -325,10 +325,10 @@ struct kvm_vcpu_events {
>>  #define   KVM_ARM_VCPU_TIMER_IRQ_PTIMER        1
>>
>>  /* KVM_IRQ_LINE irq field index values */
>> -#define KVM_ARM_IRQ_TYPE_SHIFT        24
>> -#define KVM_ARM_IRQ_TYPE_MASK        0xff
>> +#define KVM_ARM_IRQ_TYPE_SHIFT        28
>> +#define KVM_ARM_IRQ_TYPE_MASK        0xf
>>  #define KVM_ARM_IRQ_VCPU_SHIFT        16
>> -#define KVM_ARM_IRQ_VCPU_MASK        0xff
>> +#define KVM_ARM_IRQ_VCPU_MASK        0xfff
>>  #define KVM_ARM_IRQ_NUM_SHIFT        0
>>  #define KVM_ARM_IRQ_NUM_MASK        0xffff
>>
>> ---8<---
>>
>> It makes things a bit better, but it also immediately BREAKS the API
>> for old versions.
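A sketch of both effects of the patched layout, assuming the field
positions from the diff (type in [31:28], vcpu_index in [27:16], num in
[15:0]). encode_old(), encode_new(), decode_new() and struct irq_fields
are hypothetical helpers, not existing APIs.

```c
#include <assert.h>
#include <stdint.h>

/* Pre-patch layout: type [31:24], vcpu_index [23:16], num [15:0]. */
#define OLD_TYPE_SHIFT 24
#define OLD_VCPU_SHIFT 16

/* Patched layout from the diff: type [31:28], vcpu_index [27:16], num [15:0]. */
#define NEW_TYPE_SHIFT 28
#define NEW_TYPE_MASK  0xfu
#define NEW_VCPU_SHIFT 16
#define NEW_VCPU_MASK  0xfffu
#define IRQ_NUM_MASK   0xffffu

struct irq_fields {
    uint32_t type;
    uint32_t vcpu;
    uint32_t num;
};

/* How a binary built against the old header packs the irq field. */
static uint32_t encode_old(uint32_t type, uint32_t vcpu, uint32_t num)
{
    assert(type <= 0xffu && vcpu <= 0xffu && num <= IRQ_NUM_MASK);
    return (type << OLD_TYPE_SHIFT) | (vcpu << OLD_VCPU_SHIFT) | num;
}

/* How a binary rebuilt against the patched header packs it (vcpu 0..4095). */
static uint32_t encode_new(uint32_t type, uint32_t vcpu, uint32_t num)
{
    assert(type <= NEW_TYPE_MASK && vcpu <= NEW_VCPU_MASK &&
           num <= IRQ_NUM_MASK);
    return (type << NEW_TYPE_SHIFT) | (vcpu << NEW_VCPU_SHIFT) | num;
}

/* How a kernel built with the patched header would unpack it. */
static struct irq_fields decode_new(uint32_t irq)
{
    struct irq_fields f = {
        .type = (irq >> NEW_TYPE_SHIFT) & NEW_TYPE_MASK,
        .vcpu = (irq >> NEW_VCPU_SHIFT) & NEW_VCPU_MASK,
        .num  = irq & IRQ_NUM_MASK,
    };
    return f;
}
```

With the patched layout, PPI 23 for vcpu 511 round-trips as 0x21FF0017.
A value packed by an old binary for PPI 27 on vcpu 1 (0x0201001B),
however, is decoded as type 0 on vcpu 513, which is the ABI break
mentioned above.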
>>
>>
>> Next comes one more QEMU abort (with the "fix" above):
>>     "Failed to set device address: No space left on device"
>>
>> We register two io devices (rd_dev and sgi_dev) on KVM_MMIO_BUS for
>> each redistributor, so 512 vcpus need 1024 io devices, which exceeds
>> the current kernel limit, NR_IOBUS_DEVS (1000).  Hence the ENOSPC
>> error.
> 
> Do you plan to send a patch for increasing the NR_IOBUS_DEVS? Otherwise
> I can do it.

I really wonder whether that's a sensible thing to do on its own.

Looking at the implementation of kvm_io_bus_register_dev (which copies
the whole array each time we insert a device), we have an obvious issue
with systems that create a large number of device structures, leading to
large transient memory usage and slow guest start.
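A simplified userspace model of the copy-on-insert pattern, to put
numbers on this. It assumes each registration allocates an array one
entry larger, copies every existing entry, adds the new one and frees
the old array, and that registration fails with -ENOSPC once 1000
devices are present (the NR_IOBUS_DEVS value quoted above). It appends
instead of keeping the ranges sorted and leaves out any locking or RCU
handling the real code needs, so it is not the kernel implementation;
struct io_bus, struct io_range, model_register_dev() and
MODEL_NR_IOBUS_DEVS are hypothetical names.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

#define MODEL_NR_IOBUS_DEVS 1000  /* limit quoted in this thread */

/* Hypothetical stand-in for one registered MMIO range. */
struct io_range {
    unsigned long addr;
    unsigned long len;
};

struct io_bus {
    size_t dev_count;
    struct io_range ranges[];  /* flexible array member */
};

/* Total entries copied across all registrations, for cost accounting. */
static unsigned long long copied_entries;

/* Copy-on-insert: allocate a bus one slot larger, copy all existing
 * entries, append the new one, then free the old bus. */
static int model_register_dev(struct io_bus **busp, struct io_range r)
{
    struct io_bus *old, *new_bus;
    size_t n;

    assert(busp != NULL);
    old = *busp;
    n = old ? old->dev_count : 0;

    if (n >= MODEL_NR_IOBUS_DEVS)
        return -ENOSPC;

    new_bus = malloc(sizeof(*new_bus) + (n + 1) * sizeof(new_bus->ranges[0]));
    if (!new_bus)
        return -ENOMEM;
    if (n)
        memcpy(new_bus->ranges, old->ranges, n * sizeof(old->ranges[0]));
    copied_entries += n;
    new_bus->ranges[n] = r;
    new_bus->dev_count = n + 1;

    free(old);  /* old and new arrays were both live up to this point */
    *busp = new_bus;
    return 0;
}
```

Feeding it the 2 * 512 = 1024 ranges of a 512-vCPU guest, the first 1000
registrations succeed and the next one returns -ENOSPC; by then the
model has copied 0 + 1 + ... + 999 = 499,500 entries. The copy count
grows as n(n-1)/2 and two arrays coexist on every insert, so raising
the limit alone makes both costs grow with the device count.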

We could also try and reduce the number of devices we insert by making
the redistributor a single device (which it is in reality). It probably
means we need to make the MMIO decoding more flexible.
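One possible shape for that, sketched under assumptions: each
redistributor is registered as a single 128KB MMIO device covering its
two 64KB frames (RD_base followed by SGI_base, as in GICv3 without the
GICv4 VLPI frames), and the handler works out which frame an access
targets from its offset. decode_rdist(), struct rdist_access and
iobus_devs_needed() are hypothetical names, not existing vgic code.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

#define RDIST_FRAME_SIZE  0x10000u                 /* 64KB per frame */
#define RDIST_REGION_SIZE (2u * RDIST_FRAME_SIZE)  /* RD_base + SGI_base */

enum rdist_frame { RDIST_FRAME_RD, RDIST_FRAME_SGI, RDIST_FRAME_INVALID };

struct rdist_access {
    enum rdist_frame frame;  /* which 64KB frame the access hits */
    uint32_t offset;         /* offset within that frame */
};

/* rdist_base is the start of one redistributor's 128KB region. Returns
 * the target frame plus the in-frame offset, which a caller would use to
 * look up the register in that frame's table. */
static struct rdist_access decode_rdist(uint64_t rdist_base, uint64_t addr)
{
    struct rdist_access a = { RDIST_FRAME_INVALID, 0 };
    uint64_t off;

    if (addr < rdist_base || addr - rdist_base >= RDIST_REGION_SIZE)
        return a;

    off = addr - rdist_base;
    a.frame = off < RDIST_FRAME_SIZE ? RDIST_FRAME_RD : RDIST_FRAME_SGI;
    a.offset = (uint32_t)(off % RDIST_FRAME_SIZE);
    return a;
}

/* MMIO bus entries needed for the redistributors alone. */
static unsigned int iobus_devs_needed(unsigned int nr_vcpus,
                                      unsigned int devs_per_rdist)
{
    assert(devs_per_rdist == 0 || nr_vcpus <= UINT_MAX / devs_per_rdist);
    return nr_vcpus * devs_per_rdist;
}
```

With one device per redistributor, a 512-vCPU guest needs 512 bus
entries instead of 1024, which fits under the current NR_IOBUS_DEVS.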

Thanks,

	M.
-- 
Jazz is not dead, it just smells funny...
_______________________________________________
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm