From: Marc Zyngier <marc.zyngier@arm.com>
To: Zenghui Yu <yuzenghui@huawei.com>
Cc: zhang.zhanghailiang@huawei.com, kvm@vger.kernel.org,
qemu-arm@nongnu.org, kvmarm@lists.cs.columbia.edu
Subject: Re: Can we boot a 512U kvm guest?
Date: Tue, 13 Aug 2019 15:17:12 +0100 [thread overview]
Message-ID: <86d0h9no9j.wl-marc.zyngier@arm.com> (raw)
In-Reply-To: <86aa9609-7dc9-1461-ae47-f50897cd0875@huawei.com>
On Tue, 13 Aug 2019 09:50:27 +0100,
Zenghui Yu <yuzenghui@huawei.com> wrote:
Hi Zenghui,
>
> Hi folks,
>
> Since commit e25028c8ded0 ("KVM: arm/arm64: Bump VGIC_V3_MAX_CPUS to
> 512"), we seemed to be allowed to boot a 512U guest. But I failed to
> start it up with the latest QEMU. I guess there are at least *two*
> reasons (limitations).
>
> First I got a QEMU abort:
> "kvm_set_irq: Invalid argument"
>
> Enable the trace_kvm_irq_line() under debugfs, when it comed with
> vcpu-256, I got:
> "Inject UNKNOWN interrupt (3), vcpu->idx: 0, num: 23, level: 0"
> and kvm_vm_ioctl_irq_line() returns -EINVAL to user-space...
>
> So the thing is that we only have 8 bits for vcpu_index field ([23:16])
> in KVM_IRQ_LINE ioctl. irq_type field will be corrupted if we inject a
> PPI to vcpu-256, whose vcpu_index will take 9 bits.
Irk. Not great indeed. Clearly, we have a couple of holes in the way
we test these ABI changes (/me eyes Eric...).
>
> I temporarily patched the KVM and QEMU with the following diff:
>
> ---8<---
> diff --git a/arch/arm64/include/uapi/asm/kvm.h
> b/arch/arm64/include/uapi/asm/kvm.h
> index 95516a4..39a0fb1 100644
> --- a/arch/arm64/include/uapi/asm/kvm.h
> +++ b/arch/arm64/include/uapi/asm/kvm.h
> @@ -325,10 +325,10 @@ struct kvm_vcpu_events {
> #define KVM_ARM_VCPU_TIMER_IRQ_PTIMER 1
>
> /* KVM_IRQ_LINE irq field index values */
> -#define KVM_ARM_IRQ_TYPE_SHIFT 24
> -#define KVM_ARM_IRQ_TYPE_MASK 0xff
> +#define KVM_ARM_IRQ_TYPE_SHIFT 28
> +#define KVM_ARM_IRQ_TYPE_MASK 0xf
> #define KVM_ARM_IRQ_VCPU_SHIFT 16
> -#define KVM_ARM_IRQ_VCPU_MASK 0xff
> +#define KVM_ARM_IRQ_VCPU_MASK 0xfff
> #define KVM_ARM_IRQ_NUM_SHIFT 0
> #define KVM_ARM_IRQ_NUM_MASK 0xffff
>
> ---8<---
>
> It makes things a bit better, it also immediately BREAKs the api with
> old versions.
Yes, and we can't have that (specially if you consider that this API
is shared between 32 and 64bit). One "get out of jail card" is to
steal a few bits from the top of the word, and encode things there:
diff --git a/arch/arm/include/uapi/asm/kvm.h b/arch/arm/include/uapi/asm/kvm.h
index 4602464ebdfb..86db092e4c2f 100644
--- a/arch/arm/include/uapi/asm/kvm.h
+++ b/arch/arm/include/uapi/asm/kvm.h
@@ -254,8 +254,10 @@ struct kvm_vcpu_events {
#define KVM_DEV_ARM_ITS_CTRL_RESET 4
/* KVM_IRQ_LINE irq field index values */
+#define KVM_ARM_IRQ_VCPU2_SHIFT 28
+#define KVM_ARM_IRQ_VCPU2_MASK 0xf
#define KVM_ARM_IRQ_TYPE_SHIFT 24
-#define KVM_ARM_IRQ_TYPE_MASK 0xff
+#define KVM_ARM_IRQ_TYPE_MASK 0xf
#define KVM_ARM_IRQ_VCPU_SHIFT 16
#define KVM_ARM_IRQ_VCPU_MASK 0xff
#define KVM_ARM_IRQ_NUM_SHIFT 0
diff --git a/arch/arm64/include/uapi/asm/kvm.h b/arch/arm64/include/uapi/asm/kvm.h
index 7b7ac0f6cec9..44cb25bfc95e 100644
--- a/arch/arm64/include/uapi/asm/kvm.h
+++ b/arch/arm64/include/uapi/asm/kvm.h
@@ -308,8 +308,10 @@ struct kvm_vcpu_events {
#define KVM_ARM_VCPU_TIMER_IRQ_PTIMER 1
/* KVM_IRQ_LINE irq field index values */
+#define KVM_ARM_IRQ_VCPU2_SHIFT 28
+#define KVM_ARM_IRQ_VCPU2_MASK 0xf
#define KVM_ARM_IRQ_TYPE_SHIFT 24
-#define KVM_ARM_IRQ_TYPE_MASK 0xff
+#define KVM_ARM_IRQ_TYPE_MASK 0xf
#define KVM_ARM_IRQ_VCPU_SHIFT 16
#define KVM_ARM_IRQ_VCPU_MASK 0xff
#define KVM_ARM_IRQ_NUM_SHIFT 0
diff --git a/virt/kvm/arm/arm.c b/virt/kvm/arm/arm.c
index 90cedebaeb94..fb685c1c0514 100644
--- a/virt/kvm/arm/arm.c
+++ b/virt/kvm/arm/arm.c
@@ -889,6 +889,7 @@ int kvm_vm_ioctl_irq_line(struct kvm *kvm, struct kvm_irq_level *irq_level,
irq_type = (irq >> KVM_ARM_IRQ_TYPE_SHIFT) & KVM_ARM_IRQ_TYPE_MASK;
vcpu_idx = (irq >> KVM_ARM_IRQ_VCPU_SHIFT) & KVM_ARM_IRQ_VCPU_MASK;
+ vcpu_idx += ((irq >> KVM_ARM_IRQ_VCPU2_SHIFT) & KVM_ARM_IRQ_VCPU2_MASK) * (KVM_ARM_IRQ_VCPU_MASK + 1);
irq_num = (irq >> KVM_ARM_IRQ_NUM_SHIFT) & KVM_ARM_IRQ_NUM_MASK;
trace_kvm_irq_line(irq_type, vcpu_idx, irq_num, irq_level->level);
It should work because we've been careful not to allow value outside
of {0, 1, 2} for irq_type. I don't like it, but I really don't feel
like adding another IRQ related ioctl. We still have 16 irq types
(which is already a waste of space), and we can go up to 4096 vcpu.
Peter, what do you think?
> Next comes one more QEMU abort (with the "fix" above):
> "Failed to set device address: No space left on device"
>
> We register two io devices (rd_dev and sgi_dev) on KVM_MMIO_BUS for
> each redistributor. 512 vcpus take 1024 io devices, which is beyond the
> maximum limitation of the current kernel - NR_IOBUS_DEVS (1000).
> So we get a ENOSPC error here.
I can reproduce that issue here ("499 vcpus on my Chromebook,
baby"). Not an ABI problem though, and we can bump it up if that's
needed.
> I don't know if the similar problems have been discussed before in ML.
> Is it time to really support the 512U guest?
The real question is "why the hell would you want to do that?" ;-)
Seriously, I'm very interested in finding out what is the use case for
these gigantic VMs, other than debugging the kernel for big machines.
Thanks,
M.
--
Jazz is not dead, it just smells funny.
_______________________________________________
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm
next prev parent reply other threads:[~2019-08-13 14:17 UTC|newest]
Thread overview: 10+ messages / expand[flat|nested] mbox.gz Atom feed top
2019-08-13 8:50 Can we boot a 512U kvm guest? Zenghui Yu
2019-08-13 14:17 ` Marc Zyngier [this message]
2019-08-13 21:44 ` Auger Eric
2019-08-14 6:51 ` Zenghui Yu
2019-08-22 9:08 ` Auger Eric
2019-08-22 9:29 ` Marc Zyngier
2019-08-22 9:50 ` Auger Eric
2019-08-22 10:23 ` Marc Zyngier
2019-08-23 2:21 ` Zenghui Yu
2019-08-23 7:37 ` Auger Eric
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=86d0h9no9j.wl-marc.zyngier@arm.com \
--to=marc.zyngier@arm.com \
--cc=kvm@vger.kernel.org \
--cc=kvmarm@lists.cs.columbia.edu \
--cc=qemu-arm@nongnu.org \
--cc=yuzenghui@huawei.com \
--cc=zhang.zhanghailiang@huawei.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).