linuxppc-dev.lists.ozlabs.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v2 0/6] KVM: PPC: Book3S: HV: XIVE: Allocate less VPs in OPAL
@ 2019-09-27 11:53 Greg Kurz
  2019-09-27 11:53 ` [PATCH v2 1/6] KVM: PPC: Book3S HV: XIVE: Set kvm->arch.xive when VPs are allocated Greg Kurz
                   ` (6 more replies)
  0 siblings, 7 replies; 13+ messages in thread
From: Greg Kurz @ 2019-09-27 11:53 UTC (permalink / raw)
  To: Paul Mackerras
  Cc: kvm, Radim Krčmář,
	kvm-ppc, Cédric Le Goater, Paolo Bonzini, stable,
	linuxppc-dev, David Gibson

This brings some fixes and allows to start more VMs with an in-kernel
XIVE or XICS-on-XIVE device.

Changes since v1 (https://patchwork.ozlabs.org/cover/1166099/):
- drop a useless patch
- add a patch to show VP ids in debugfs
- update some changelogs
- fix buggy check in patch 5
- Cc: stable 

--
Greg

---

Greg Kurz (6):
      KVM: PPC: Book3S HV: XIVE: Set kvm->arch.xive when VPs are allocated
      KVM: PPC: Book3S HV: XIVE: Ensure VP isn't already in use
      KVM: PPC: Book3S HV: XIVE: Show VP id in debugfs
      KVM: PPC: Book3S HV: XIVE: Compute the VP id in a common helper
      KVM: PPC: Book3S HV: XIVE: Make VP block size configurable
      KVM: PPC: Book3S HV: XIVE: Allow userspace to set the # of VPs


 Documentation/virt/kvm/devices/xics.txt |   14 +++
 Documentation/virt/kvm/devices/xive.txt |    8 ++
 arch/powerpc/include/uapi/asm/kvm.h     |    3 +
 arch/powerpc/kvm/book3s_xive.c          |  142 ++++++++++++++++++++++++-------
 arch/powerpc/kvm/book3s_xive.h          |   17 ++++
 arch/powerpc/kvm/book3s_xive_native.c   |   40 +++------
 6 files changed, 167 insertions(+), 57 deletions(-)


^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH v2 1/6] KVM: PPC: Book3S HV: XIVE: Set kvm->arch.xive when VPs are allocated
  2019-09-27 11:53 [PATCH v2 0/6] KVM: PPC: Book3S: HV: XIVE: Allocate less VPs in OPAL Greg Kurz
@ 2019-09-27 11:53 ` Greg Kurz
  2019-09-27 11:53 ` [PATCH v2 2/6] KVM: PPC: Book3S HV: XIVE: Ensure VP isn't already in use Greg Kurz
                   ` (5 subsequent siblings)
  6 siblings, 0 replies; 13+ messages in thread
From: Greg Kurz @ 2019-09-27 11:53 UTC (permalink / raw)
  To: Paul Mackerras
  Cc: kvm, Radim Krčmář,
	kvm-ppc, Cédric Le Goater, Paolo Bonzini, stable,
	linuxppc-dev, David Gibson

If we cannot allocate the XIVE VPs in OPAL, the creation of a XIVE or
XICS-on-XIVE device is aborted as expected, but we leave kvm->arch.xive
set forever since the release method isn't called in this case. Any
subsequent tentative to create a XIVE or XICS-on-XIVE for this VM will
thus always fail (DoS). This is a problem for QEMU since it destroys
and re-creates these devices when the VM is reset: the VM would be
restricted to using the much slower emulated XIVE or XICS forever.

As an alternative to adding rollback, do not assign kvm->arch.xive before
making sure the XIVE VPs are allocated in OPAL.

Cc: stable@vger.kernel.org # v5.2
Fixes: 5422e95103cf ("KVM: PPC: Book3S HV: XIVE: Replace the 'destroy' method by a 'release' method")
Signed-off-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
---
 arch/powerpc/kvm/book3s_xive.c        |   11 +++++------
 arch/powerpc/kvm/book3s_xive_native.c |    2 +-
 2 files changed, 6 insertions(+), 7 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_xive.c b/arch/powerpc/kvm/book3s_xive.c
index 591bfb4bfd0f..99884662facb 100644
--- a/arch/powerpc/kvm/book3s_xive.c
+++ b/arch/powerpc/kvm/book3s_xive.c
@@ -1997,6 +1997,10 @@ static int kvmppc_xive_create(struct kvm_device *dev, u32 type)
 
 	pr_devel("Creating xive for partition\n");
 
+	/* Already there ? */
+	if (kvm->arch.xive)
+		return -EEXIST;
+
 	xive = kvmppc_xive_get_device(kvm, type);
 	if (!xive)
 		return -ENOMEM;
@@ -2006,12 +2010,6 @@ static int kvmppc_xive_create(struct kvm_device *dev, u32 type)
 	xive->kvm = kvm;
 	mutex_init(&xive->lock);
 
-	/* Already there ? */
-	if (kvm->arch.xive)
-		ret = -EEXIST;
-	else
-		kvm->arch.xive = xive;
-
 	/* We use the default queue size set by the host */
 	xive->q_order = xive_native_default_eq_shift();
 	if (xive->q_order < PAGE_SHIFT)
@@ -2031,6 +2029,7 @@ static int kvmppc_xive_create(struct kvm_device *dev, u32 type)
 	if (ret)
 		return ret;
 
+	kvm->arch.xive = xive;
 	return 0;
 }
 
diff --git a/arch/powerpc/kvm/book3s_xive_native.c b/arch/powerpc/kvm/book3s_xive_native.c
index 248c1ea9e788..6b91c74839d5 100644
--- a/arch/powerpc/kvm/book3s_xive_native.c
+++ b/arch/powerpc/kvm/book3s_xive_native.c
@@ -1079,7 +1079,6 @@ static int kvmppc_xive_native_create(struct kvm_device *dev, u32 type)
 	dev->private = xive;
 	xive->dev = dev;
 	xive->kvm = kvm;
-	kvm->arch.xive = xive;
 	mutex_init(&xive->mapping_lock);
 	mutex_init(&xive->lock);
 
@@ -1100,6 +1099,7 @@ static int kvmppc_xive_native_create(struct kvm_device *dev, u32 type)
 	if (ret)
 		return ret;
 
+	kvm->arch.xive = xive;
 	return 0;
 }
 


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH v2 2/6] KVM: PPC: Book3S HV: XIVE: Ensure VP isn't already in use
  2019-09-27 11:53 [PATCH v2 0/6] KVM: PPC: Book3S: HV: XIVE: Allocate less VPs in OPAL Greg Kurz
  2019-09-27 11:53 ` [PATCH v2 1/6] KVM: PPC: Book3S HV: XIVE: Set kvm->arch.xive when VPs are allocated Greg Kurz
@ 2019-09-27 11:53 ` Greg Kurz
  2019-09-27 11:53 ` [PATCH v2 3/6] KVM: PPC: Book3S HV: XIVE: Show VP id in debugfs Greg Kurz
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 13+ messages in thread
From: Greg Kurz @ 2019-09-27 11:53 UTC (permalink / raw)
  To: Paul Mackerras
  Cc: kvm, Radim Krčmář,
	kvm-ppc, Cédric Le Goater, Paolo Bonzini, stable,
	linuxppc-dev, David Gibson

Connecting a vCPU to a XIVE KVM device means establishing a 1:1
association between a vCPU id and the offset (VP id) of a VP
structure within a fixed size block of VPs. We currently try to
enforce the 1:1 relationship by checking that a vCPU with the
same id isn't already connected. This is good but unfortunately
not enough because we don't map VP ids to raw vCPU ids but to
packed vCPU ids, and the packing function kvmppc_pack_vcpu_id()
isn't bijective by design. We got away with it because QEMU passes
vCPU ids that fit well in the packing pattern. But nothing prevents
userspace to come up with a forged vCPU id resulting in a packed id
collision which causes the KVM device to associate two vCPUs to the
same VP. This greatly confuses the irq layer and ultimately crashes
the kernel, as shown below.

Example: a guest with 1 guest thread per core, a core stride of
8 and 300 vCPUs has vCPU ids 0,8,16...2392. If QEMU is patched to
inject at some point an invalid vCPU id 348, which is the packed
version of itself and 2392, we get:

genirq: Flags mismatch irq 199. 00010000 (kvm-2-2392) vs. 00010000 (kvm-2-348)
CPU: 24 PID: 88176 Comm: qemu-system-ppc Not tainted 5.3.0-xive-nr-servers-5.3-gku+ #38
Call Trace:
[c000003f7f9937e0] [c000000000c0110c] dump_stack+0xb0/0xf4 (unreliable)
[c000003f7f993820] [c0000000001cb480] __setup_irq+0xa70/0xad0
[c000003f7f9938d0] [c0000000001cb75c] request_threaded_irq+0x13c/0x260
[c000003f7f993940] [c00800000d44e7ac] kvmppc_xive_attach_escalation+0x104/0x270 [kvm]
[c000003f7f9939d0] [c00800000d45013c] kvmppc_xive_connect_vcpu+0x424/0x620 [kvm]
[c000003f7f993ac0] [c00800000d444428] kvm_arch_vcpu_ioctl+0x260/0x448 [kvm]
[c000003f7f993b90] [c00800000d43593c] kvm_vcpu_ioctl+0x154/0x7c8 [kvm]
[c000003f7f993d00] [c0000000004840f0] do_vfs_ioctl+0xe0/0xc30
[c000003f7f993db0] [c000000000484d44] ksys_ioctl+0x104/0x120
[c000003f7f993e00] [c000000000484d88] sys_ioctl+0x28/0x80
[c000003f7f993e20] [c00000000000b278] system_call+0x5c/0x68
xive-kvm: Failed to request escalation interrupt for queue 0 of VCPU 2392
------------[ cut here ]------------
remove_proc_entry: removing non-empty directory 'irq/199', leaking at least 'kvm-2-348'
WARNING: CPU: 24 PID: 88176 at /home/greg/Work/linux/kernel-kvm-ppc/fs/proc/generic.c:684 remove_proc_entry+0x1ec/0x200
Modules linked in: kvm_hv kvm dm_mod vhost_net vhost tap xt_CHECKSUM iptable_mangle xt_MASQUERADE iptable_nat nf_nat xt_conntrack nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ipt_REJECT nf_reject_ipv4 tun bridge stp llc ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter squashfs loop fuse i2c_dev sg ofpart ocxl powernv_flash at24 xts mtd uio_pdrv_genirq vmx_crypto opal_prd ipmi_powernv uio ipmi_devintf ipmi_msghandler ibmpowernv ib_iser rdma_cm iw_cm ib_cm ib_core iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi ip_tables ext4 mbcache jbd2 raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor xor async_tx raid6_pq libcrc32c raid1 raid0 linear sd_mod ast i2c_algo_bit drm_vram_helper ttm drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm ahci libahci libata tg3 drm_panel_orientation_quirks [last unloaded: kvm]
CPU: 24 PID: 88176 Comm: qemu-system-ppc Not tainted 5.3.0-xive-nr-servers-5.3-gku+ #38
NIP:  c00000000053b0cc LR: c00000000053b0c8 CTR: c0000000000ba3b0
REGS: c000003f7f9934b0 TRAP: 0700   Not tainted  (5.3.0-xive-nr-servers-5.3-gku+)
MSR:  9000000000029033 <SF,HV,EE,ME,IR,DR,RI,LE>  CR: 48228222  XER: 20040000
CFAR: c000000000131a50 IRQMASK: 0
GPR00: c00000000053b0c8 c000003f7f993740 c0000000015ec500 0000000000000057
GPR04: 0000000000000001 0000000000000000 000049fb98484262 0000000000001bcf
GPR08: 0000000000000007 0000000000000007 0000000000000001 9000000000001033
GPR12: 0000000000008000 c000003ffffeb800 0000000000000000 000000012f4ce5a1
GPR16: 000000012ef5a0c8 0000000000000000 000000012f113bb0 0000000000000000
GPR20: 000000012f45d918 c000003f863758b0 c000003f86375870 0000000000000006
GPR24: c000003f86375a30 0000000000000007 c0002039373d9020 c0000000014c4a48
GPR28: 0000000000000001 c000003fe62a4f6b c00020394b2e9fab c000003fe62a4ec0
NIP [c00000000053b0cc] remove_proc_entry+0x1ec/0x200
LR [c00000000053b0c8] remove_proc_entry+0x1e8/0x200
Call Trace:
[c000003f7f993740] [c00000000053b0c8] remove_proc_entry+0x1e8/0x200 (unreliable)
[c000003f7f9937e0] [c0000000001d3654] unregister_irq_proc+0x114/0x150
[c000003f7f993880] [c0000000001c6284] free_desc+0x54/0xb0
[c000003f7f9938c0] [c0000000001c65ec] irq_free_descs+0xac/0x100
[c000003f7f993910] [c0000000001d1ff8] irq_dispose_mapping+0x68/0x80
[c000003f7f993940] [c00800000d44e8a4] kvmppc_xive_attach_escalation+0x1fc/0x270 [kvm]
[c000003f7f9939d0] [c00800000d45013c] kvmppc_xive_connect_vcpu+0x424/0x620 [kvm]
[c000003f7f993ac0] [c00800000d444428] kvm_arch_vcpu_ioctl+0x260/0x448 [kvm]
[c000003f7f993b90] [c00800000d43593c] kvm_vcpu_ioctl+0x154/0x7c8 [kvm]
[c000003f7f993d00] [c0000000004840f0] do_vfs_ioctl+0xe0/0xc30
[c000003f7f993db0] [c000000000484d44] ksys_ioctl+0x104/0x120
[c000003f7f993e00] [c000000000484d88] sys_ioctl+0x28/0x80
[c000003f7f993e20] [c00000000000b278] system_call+0x5c/0x68
Instruction dump:
2c230000 41820008 3923ff78 e8e900a0 3c82ff69 3c62ff8d 7fa6eb78 7fc5f378
3884f080 3863b948 4bbf6925 60000000 <0fe00000> 4bffff7c fba10088 4bbf6e41
---[ end trace b925b67a74a1d8d1 ]---
BUG: Kernel NULL pointer dereference at 0x00000010
Faulting instruction address: 0xc00800000d44fc04
Oops: Kernel access of bad area, sig: 11 [#1]
LE PAGE_SIZE=64K MMU=Radix MMU=Hash SMP NR_CPUS=2048 NUMA PowerNV
Modules linked in: kvm_hv kvm dm_mod vhost_net vhost tap xt_CHECKSUM iptable_mangle xt_MASQUERADE iptable_nat nf_nat xt_conntrack nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ipt_REJECT nf_reject_ipv4 tun bridge stp llc ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter squashfs loop fuse i2c_dev sg ofpart ocxl powernv_flash at24 xts mtd uio_pdrv_genirq vmx_crypto opal_prd ipmi_powernv uio ipmi_devintf ipmi_msghandler ibmpowernv ib_iser rdma_cm iw_cm ib_cm ib_core iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi ip_tables ext4 mbcache jbd2 raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor xor async_tx raid6_pq libcrc32c raid1 raid0 linear sd_mod ast i2c_algo_bit drm_vram_helper ttm drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm ahci libahci libata tg3 drm_panel_orientation_quirks [last unloaded: kvm]
CPU: 24 PID: 88176 Comm: qemu-system-ppc Tainted: G        W         5.3.0-xive-nr-servers-5.3-gku+ #38
NIP:  c00800000d44fc04 LR: c00800000d44fc00 CTR: c0000000001cd970
REGS: c000003f7f9938e0 TRAP: 0300   Tainted: G        W          (5.3.0-xive-nr-servers-5.3-gku+)
MSR:  9000000000009033 <SF,HV,EE,ME,IR,DR,RI,LE>  CR: 24228882  XER: 20040000
CFAR: c0000000001cd9ac DAR: 0000000000000010 DSISR: 40000000 IRQMASK: 0
GPR00: c00800000d44fc00 c000003f7f993b70 c00800000d468300 0000000000000000
GPR04: 00000000000000c7 0000000000000000 0000000000000000 c000003ffacd06d8
GPR08: 0000000000000000 c000003ffacd0738 0000000000000000 fffffffffffffffd
GPR12: 0000000000000040 c000003ffffeb800 0000000000000000 000000012f4ce5a1
GPR16: 000000012ef5a0c8 0000000000000000 000000012f113bb0 0000000000000000
GPR20: 000000012f45d918 00007ffffe0d9a80 000000012f4f5df0 000000012ef8c9f8
GPR24: 0000000000000001 0000000000000000 c000003fe4501ed0 c000003f8b1d0000
GPR28: c0000033314689c0 c000003fe4501c00 c000003fe4501e70 c000003fe4501e90
NIP [c00800000d44fc04] kvmppc_xive_cleanup_vcpu+0xfc/0x210 [kvm]
LR [c00800000d44fc00] kvmppc_xive_cleanup_vcpu+0xf8/0x210 [kvm]
Call Trace:
[c000003f7f993b70] [c00800000d44fc00] kvmppc_xive_cleanup_vcpu+0xf8/0x210 [kvm] (unreliable)
[c000003f7f993bd0] [c00800000d450bd4] kvmppc_xive_release+0xdc/0x1b0 [kvm]
[c000003f7f993c30] [c00800000d436a98] kvm_device_release+0xb0/0x110 [kvm]
[c000003f7f993c70] [c00000000046730c] __fput+0xec/0x320
[c000003f7f993cd0] [c000000000164ae0] task_work_run+0x150/0x1c0
[c000003f7f993d30] [c000000000025034] do_notify_resume+0x304/0x440
[c000003f7f993e20] [c00000000000dcc4] ret_from_except_lite+0x70/0x74
Instruction dump:
3bff0008 7fbfd040 419e0054 847e0004 2fa30000 419effec e93d0000 8929203c
2f890000 419effb8 4800821d e8410018 <e9230010> e9490008 9b2a0039 7c0004ac
---[ end trace b925b67a74a1d8d2 ]---

Kernel panic - not syncing: Fatal exception

This affects both XIVE and XICS-on-XIVE devices since the beginning.

Check the VP id instead of the vCPU id when a new vCPU is connected.
The allocation of the XIVE CPU structure in kvmppc_xive_connect_vcpu()
is moved after the check to avoid the need for rollback.

Cc: stable@vger.kernel.org # v4.12+
Signed-off-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
---
v2: - more detailed changelog
---
 arch/powerpc/kvm/book3s_xive.c        |   24 ++++++++++++++++--------
 arch/powerpc/kvm/book3s_xive.h        |   12 ++++++++++++
 arch/powerpc/kvm/book3s_xive_native.c |    6 ++++--
 3 files changed, 32 insertions(+), 10 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_xive.c b/arch/powerpc/kvm/book3s_xive.c
index 99884662facb..baa740815b3c 100644
--- a/arch/powerpc/kvm/book3s_xive.c
+++ b/arch/powerpc/kvm/book3s_xive.c
@@ -1217,6 +1217,7 @@ int kvmppc_xive_connect_vcpu(struct kvm_device *dev,
 	struct kvmppc_xive *xive = dev->private;
 	struct kvmppc_xive_vcpu *xc;
 	int i, r = -EBUSY;
+	u32 vp_id;
 
 	pr_devel("connect_vcpu(cpu=%d)\n", cpu);
 
@@ -1228,25 +1229,32 @@ int kvmppc_xive_connect_vcpu(struct kvm_device *dev,
 		return -EPERM;
 	if (vcpu->arch.irq_type != KVMPPC_IRQ_DEFAULT)
 		return -EBUSY;
-	if (kvmppc_xive_find_server(vcpu->kvm, cpu)) {
-		pr_devel("Duplicate !\n");
-		return -EEXIST;
-	}
 	if (cpu >= (KVM_MAX_VCPUS * vcpu->kvm->arch.emul_smt_mode)) {
 		pr_devel("Out of bounds !\n");
 		return -EINVAL;
 	}
-	xc = kzalloc(sizeof(*xc), GFP_KERNEL);
-	if (!xc)
-		return -ENOMEM;
 
 	/* We need to synchronize with queue provisioning */
 	mutex_lock(&xive->lock);
+
+	vp_id = kvmppc_xive_vp(xive, cpu);
+	if (kvmppc_xive_vp_in_use(xive->kvm, vp_id)) {
+		pr_devel("Duplicate !\n");
+		r = -EEXIST;
+		goto bail;
+	}
+
+	xc = kzalloc(sizeof(*xc), GFP_KERNEL);
+	if (!xc) {
+		r = -ENOMEM;
+		goto bail;
+	}
+
 	vcpu->arch.xive_vcpu = xc;
 	xc->xive = xive;
 	xc->vcpu = vcpu;
 	xc->server_num = cpu;
-	xc->vp_id = kvmppc_xive_vp(xive, cpu);
+	xc->vp_id = vp_id;
 	xc->mfrr = 0xff;
 	xc->valid = true;
 
diff --git a/arch/powerpc/kvm/book3s_xive.h b/arch/powerpc/kvm/book3s_xive.h
index 955b820ffd6d..fe3ed50e0818 100644
--- a/arch/powerpc/kvm/book3s_xive.h
+++ b/arch/powerpc/kvm/book3s_xive.h
@@ -220,6 +220,18 @@ static inline u32 kvmppc_xive_vp(struct kvmppc_xive *xive, u32 server)
 	return xive->vp_base + kvmppc_pack_vcpu_id(xive->kvm, server);
 }
 
+static inline bool kvmppc_xive_vp_in_use(struct kvm *kvm, u32 vp_id)
+{
+	struct kvm_vcpu *vcpu = NULL;
+	int i;
+
+	kvm_for_each_vcpu(i, vcpu, kvm) {
+		if (vcpu->arch.xive_vcpu && vp_id == vcpu->arch.xive_vcpu->vp_id)
+			return true;
+	}
+	return false;
+}
+
 /*
  * Mapping between guest priorities and host priorities
  * is as follow.
diff --git a/arch/powerpc/kvm/book3s_xive_native.c b/arch/powerpc/kvm/book3s_xive_native.c
index 6b91c74839d5..ebb4215baf45 100644
--- a/arch/powerpc/kvm/book3s_xive_native.c
+++ b/arch/powerpc/kvm/book3s_xive_native.c
@@ -106,6 +106,7 @@ int kvmppc_xive_native_connect_vcpu(struct kvm_device *dev,
 	struct kvmppc_xive *xive = dev->private;
 	struct kvmppc_xive_vcpu *xc = NULL;
 	int rc;
+	u32 vp_id;
 
 	pr_devel("native_connect_vcpu(server=%d)\n", server_num);
 
@@ -124,7 +125,8 @@ int kvmppc_xive_native_connect_vcpu(struct kvm_device *dev,
 
 	mutex_lock(&xive->lock);
 
-	if (kvmppc_xive_find_server(vcpu->kvm, server_num)) {
+	vp_id = kvmppc_xive_vp(xive, server_num);
+	if (kvmppc_xive_vp_in_use(xive->kvm, vp_id)) {
 		pr_devel("Duplicate !\n");
 		rc = -EEXIST;
 		goto bail;
@@ -141,7 +143,7 @@ int kvmppc_xive_native_connect_vcpu(struct kvm_device *dev,
 	xc->vcpu = vcpu;
 	xc->server_num = server_num;
 
-	xc->vp_id = kvmppc_xive_vp(xive, server_num);
+	xc->vp_id = vp_id;
 	xc->valid = true;
 	vcpu->arch.irq_type = KVMPPC_IRQ_XIVE;
 


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH v2 3/6] KVM: PPC: Book3S HV: XIVE: Show VP id in debugfs
  2019-09-27 11:53 [PATCH v2 0/6] KVM: PPC: Book3S: HV: XIVE: Allocate less VPs in OPAL Greg Kurz
  2019-09-27 11:53 ` [PATCH v2 1/6] KVM: PPC: Book3S HV: XIVE: Set kvm->arch.xive when VPs are allocated Greg Kurz
  2019-09-27 11:53 ` [PATCH v2 2/6] KVM: PPC: Book3S HV: XIVE: Ensure VP isn't already in use Greg Kurz
@ 2019-09-27 11:53 ` Greg Kurz
  2019-09-30 10:07   ` Cédric Le Goater
  2019-09-27 11:53 ` [PATCH v2 4/6] KVM: PPC: Book3S HV: XIVE: Compute the VP id in a common helper Greg Kurz
                   ` (3 subsequent siblings)
  6 siblings, 1 reply; 13+ messages in thread
From: Greg Kurz @ 2019-09-27 11:53 UTC (permalink / raw)
  To: Paul Mackerras
  Cc: kvm, Radim Krčmář,
	kvm-ppc, Cédric Le Goater, Paolo Bonzini, stable,
	linuxppc-dev, David Gibson

Print out the VP id of each connected vCPU, this allow to see:
- the VP block base in which OPAL encodes information that may be
  useful when debugging
- the packed vCPU id which may differ from the raw vCPU id if the
  latter is >= KVM_MAX_VCPUS (2048)

Signed-off-by: Greg Kurz <groug@kaod.org>
---
 arch/powerpc/kvm/book3s_xive.c        |    4 ++--
 arch/powerpc/kvm/book3s_xive_native.c |    4 ++--
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_xive.c b/arch/powerpc/kvm/book3s_xive.c
index baa740815b3c..0b7859e40f66 100644
--- a/arch/powerpc/kvm/book3s_xive.c
+++ b/arch/powerpc/kvm/book3s_xive.c
@@ -2107,9 +2107,9 @@ static int xive_debug_show(struct seq_file *m, void *private)
 		if (!xc)
 			continue;
 
-		seq_printf(m, "cpu server %#x CPPR:%#x HWCPPR:%#x"
+		seq_printf(m, "cpu server %#x VP:%#x CPPR:%#x HWCPPR:%#x"
 			   " MFRR:%#x PEND:%#x h_xirr: R=%lld V=%lld\n",
-			   xc->server_num, xc->cppr, xc->hw_cppr,
+			   xc->server_num, xc->vp_id, xc->cppr, xc->hw_cppr,
 			   xc->mfrr, xc->pending,
 			   xc->stat_rm_h_xirr, xc->stat_vm_h_xirr);
 
diff --git a/arch/powerpc/kvm/book3s_xive_native.c b/arch/powerpc/kvm/book3s_xive_native.c
index ebb4215baf45..43a86858390a 100644
--- a/arch/powerpc/kvm/book3s_xive_native.c
+++ b/arch/powerpc/kvm/book3s_xive_native.c
@@ -1204,8 +1204,8 @@ static int xive_native_debug_show(struct seq_file *m, void *private)
 		if (!xc)
 			continue;
 
-		seq_printf(m, "cpu server %#x NSR=%02x CPPR=%02x IBP=%02x PIPR=%02x w01=%016llx w2=%08x\n",
-			   xc->server_num,
+		seq_printf(m, "cpu server %#x VP=%#x NSR=%02x CPPR=%02x IBP=%02x PIPR=%02x w01=%016llx w2=%08x\n",
+			   xc->server_num, xc->vp_id,
 			   vcpu->arch.xive_saved_state.nsr,
 			   vcpu->arch.xive_saved_state.cppr,
 			   vcpu->arch.xive_saved_state.ipb,


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH v2 4/6] KVM: PPC: Book3S HV: XIVE: Compute the VP id in a common helper
  2019-09-27 11:53 [PATCH v2 0/6] KVM: PPC: Book3S: HV: XIVE: Allocate less VPs in OPAL Greg Kurz
                   ` (2 preceding siblings ...)
  2019-09-27 11:53 ` [PATCH v2 3/6] KVM: PPC: Book3S HV: XIVE: Show VP id in debugfs Greg Kurz
@ 2019-09-27 11:53 ` Greg Kurz
  2019-09-30 12:01   ` Cédric Le Goater
  2019-09-27 11:54 ` [PATCH v2 5/6] KVM: PPC: Book3S HV: XIVE: Make VP block size configurable Greg Kurz
                   ` (2 subsequent siblings)
  6 siblings, 1 reply; 13+ messages in thread
From: Greg Kurz @ 2019-09-27 11:53 UTC (permalink / raw)
  To: Paul Mackerras
  Cc: kvm, Radim Krčmář,
	kvm-ppc, Cédric Le Goater, Paolo Bonzini, stable,
	linuxppc-dev, David Gibson

Reduce code duplication by consolidating the checking of vCPU ids and VP
ids to a common helper used by both legacy and native XIVE KVM devices.
And explain the magic with a comment.

Signed-off-by: Greg Kurz <groug@kaod.org>
---
 arch/powerpc/kvm/book3s_xive.c        |   42 ++++++++++++++++++++++++++-------
 arch/powerpc/kvm/book3s_xive.h        |    1 +
 arch/powerpc/kvm/book3s_xive_native.c |   11 ++-------
 3 files changed, 36 insertions(+), 18 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_xive.c b/arch/powerpc/kvm/book3s_xive.c
index 0b7859e40f66..d84da9f6ee88 100644
--- a/arch/powerpc/kvm/book3s_xive.c
+++ b/arch/powerpc/kvm/book3s_xive.c
@@ -1211,6 +1211,37 @@ void kvmppc_xive_cleanup_vcpu(struct kvm_vcpu *vcpu)
 	vcpu->arch.xive_vcpu = NULL;
 }
 
+static bool kvmppc_xive_vcpu_id_valid(struct kvmppc_xive *xive, u32 cpu)
+{
+	/* We have a block of KVM_MAX_VCPUS VPs. We just need to check
+	 * raw vCPU ids are below the expected limit for this guest's
+	 * core stride ; kvmppc_pack_vcpu_id() will pack them down to an
+	 * index that can be safely used to compute a VP id that belongs
+	 * to the VP block.
+	 */
+	return cpu < KVM_MAX_VCPUS * xive->kvm->arch.emul_smt_mode;
+}
+
+int kvmppc_xive_compute_vp_id(struct kvmppc_xive *xive, u32 cpu, u32 *vp)
+{
+	u32 vp_id;
+
+	if (!kvmppc_xive_vcpu_id_valid(xive, cpu)) {
+		pr_devel("Out of bounds !\n");
+		return -EINVAL;
+	}
+
+	vp_id = kvmppc_xive_vp(xive, cpu);
+	if (kvmppc_xive_vp_in_use(xive->kvm, vp_id)) {
+		pr_devel("Duplicate !\n");
+		return -EEXIST;
+	}
+
+	*vp = vp_id;
+
+	return 0;
+}
+
 int kvmppc_xive_connect_vcpu(struct kvm_device *dev,
 			     struct kvm_vcpu *vcpu, u32 cpu)
 {
@@ -1229,20 +1260,13 @@ int kvmppc_xive_connect_vcpu(struct kvm_device *dev,
 		return -EPERM;
 	if (vcpu->arch.irq_type != KVMPPC_IRQ_DEFAULT)
 		return -EBUSY;
-	if (cpu >= (KVM_MAX_VCPUS * vcpu->kvm->arch.emul_smt_mode)) {
-		pr_devel("Out of bounds !\n");
-		return -EINVAL;
-	}
 
 	/* We need to synchronize with queue provisioning */
 	mutex_lock(&xive->lock);
 
-	vp_id = kvmppc_xive_vp(xive, cpu);
-	if (kvmppc_xive_vp_in_use(xive->kvm, vp_id)) {
-		pr_devel("Duplicate !\n");
-		r = -EEXIST;
+	r = kvmppc_xive_compute_vp_id(xive, cpu, &vp_id);
+	if (r)
 		goto bail;
-	}
 
 	xc = kzalloc(sizeof(*xc), GFP_KERNEL);
 	if (!xc) {
diff --git a/arch/powerpc/kvm/book3s_xive.h b/arch/powerpc/kvm/book3s_xive.h
index fe3ed50e0818..90cf6ec35a68 100644
--- a/arch/powerpc/kvm/book3s_xive.h
+++ b/arch/powerpc/kvm/book3s_xive.h
@@ -296,6 +296,7 @@ int kvmppc_xive_attach_escalation(struct kvm_vcpu *vcpu, u8 prio,
 struct kvmppc_xive *kvmppc_xive_get_device(struct kvm *kvm, u32 type);
 void xive_cleanup_single_escalation(struct kvm_vcpu *vcpu,
 				    struct kvmppc_xive_vcpu *xc, int irq);
+int kvmppc_xive_compute_vp_id(struct kvmppc_xive *xive, u32 cpu, u32 *vp);
 
 #endif /* CONFIG_KVM_XICS */
 #endif /* _KVM_PPC_BOOK3S_XICS_H */
diff --git a/arch/powerpc/kvm/book3s_xive_native.c b/arch/powerpc/kvm/book3s_xive_native.c
index 43a86858390a..5bb480b2aafd 100644
--- a/arch/powerpc/kvm/book3s_xive_native.c
+++ b/arch/powerpc/kvm/book3s_xive_native.c
@@ -118,19 +118,12 @@ int kvmppc_xive_native_connect_vcpu(struct kvm_device *dev,
 		return -EPERM;
 	if (vcpu->arch.irq_type != KVMPPC_IRQ_DEFAULT)
 		return -EBUSY;
-	if (server_num >= (KVM_MAX_VCPUS * vcpu->kvm->arch.emul_smt_mode)) {
-		pr_devel("Out of bounds !\n");
-		return -EINVAL;
-	}
 
 	mutex_lock(&xive->lock);
 
-	vp_id = kvmppc_xive_vp(xive, server_num);
-	if (kvmppc_xive_vp_in_use(xive->kvm, vp_id)) {
-		pr_devel("Duplicate !\n");
-		rc = -EEXIST;
+	rc = kvmppc_xive_compute_vp_id(xive, server_num, &vp_id);
+	if (rc)
 		goto bail;
-	}
 
 	xc = kzalloc(sizeof(*xc), GFP_KERNEL);
 	if (!xc) {


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH v2 5/6] KVM: PPC: Book3S HV: XIVE: Make VP block size configurable
  2019-09-27 11:53 [PATCH v2 0/6] KVM: PPC: Book3S: HV: XIVE: Allocate less VPs in OPAL Greg Kurz
                   ` (3 preceding siblings ...)
  2019-09-27 11:53 ` [PATCH v2 4/6] KVM: PPC: Book3S HV: XIVE: Compute the VP id in a common helper Greg Kurz
@ 2019-09-27 11:54 ` Greg Kurz
  2019-09-30 12:11   ` Cédric Le Goater
  2019-09-27 11:54 ` [PATCH v2 6/6] KVM: PPC: Book3S HV: XIVE: Allow userspace to set the # of VPs Greg Kurz
  2019-10-16 21:44 ` [PATCH v2 0/6] KVM: PPC: Book3S: HV: XIVE: Allocate less VPs in OPAL Greg Kurz
  6 siblings, 1 reply; 13+ messages in thread
From: Greg Kurz @ 2019-09-27 11:54 UTC (permalink / raw)
  To: Paul Mackerras
  Cc: kvm, Radim Krčmář,
	kvm-ppc, Cédric Le Goater, Paolo Bonzini, stable,
	linuxppc-dev, David Gibson

The XIVE VP is an internal structure which allow the XIVE interrupt
controller to maintain the interrupt context state of vCPUs non
dispatched on HW threads.

When a guest is started, the XIVE KVM device allocates a block of
XIVE VPs in OPAL, enough to accommodate the highest possible vCPU
id KVM_MAX_VCPU_ID (16384) packed down to KVM_MAX_VCPUS (2048).
With a guest's core stride of 8 and a threading mode of 1 (QEMU's
default), a VM must run at least 256 vCPUs to actually need such a
range of VPs.

A POWER9 system has a limited XIVE VP space : 512k and KVM is
currently wasting this HW resource with large VP allocations,
especially since a typical VM likely runs with a lot less vCPUs.

Make the size of the VP block configurable. Add an nr_servers
field to the XIVE structure and a function to set it for this
purpose.

Split VP allocation out of the device create function. Since the
VP block isn't used before the first vCPU connects to the XIVE KVM
device, allocation is now performed by kvmppc_xive_connect_vcpu().
This gives the opportunity to set nr_servers in between:

          kvmppc_xive_create() / kvmppc_xive_native_create()
                               .
                               .
                     kvmppc_xive_set_nr_servers()
                               .
                               .
    kvmppc_xive_connect_vcpu() / kvmppc_xive_native_connect_vcpu()

The connect_vcpu() functions check that the vCPU id is below nr_servers
and if it is the first vCPU they allocate the VP block. This is protected
against a concurrent update of nr_servers by kvmppc_xive_set_nr_servers()
with the xive->lock mutex.

Also, the block is allocated once for the device lifetime: nr_servers
should stay constant otherwise connect_vcpu() could generate a boggus
VP id and likely crash OPAL. It is thus forbidden to update nr_servers
once the block is allocated.

If the VP allocation fail, return ENOSPC which seems more appropriate to
report the depletion of system wide HW resource than ENOMEM or ENXIO.

A VM using a stride of 8 and 1 thread per core with 32 vCPUs would hence
only need 256 VPs instead of 2048. If the stride is set to match the number
of threads per core, this goes further down to 32.

This will be exposed to userspace by a subsequent patch.

Signed-off-by: Greg Kurz <groug@kaod.org>
---
v2: - update nr_server check and clip down to KVM_MAX_VCPUS
      in kvmppc_xive_set_nr_servers()
---
 arch/powerpc/kvm/book3s_xive.c        |   65 +++++++++++++++++++++++++++------
 arch/powerpc/kvm/book3s_xive.h        |    4 ++
 arch/powerpc/kvm/book3s_xive_native.c |   18 +++------
 3 files changed, 62 insertions(+), 25 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_xive.c b/arch/powerpc/kvm/book3s_xive.c
index d84da9f6ee88..6c35b3d95986 100644
--- a/arch/powerpc/kvm/book3s_xive.c
+++ b/arch/powerpc/kvm/book3s_xive.c
@@ -1213,13 +1213,13 @@ void kvmppc_xive_cleanup_vcpu(struct kvm_vcpu *vcpu)
 
 static bool kvmppc_xive_vcpu_id_valid(struct kvmppc_xive *xive, u32 cpu)
 {
-	/* We have a block of KVM_MAX_VCPUS VPs. We just need to check
+	/* We have a block of xive->nr_servers VPs. We just need to check
 	 * raw vCPU ids are below the expected limit for this guest's
 	 * core stride ; kvmppc_pack_vcpu_id() will pack them down to an
 	 * index that can be safely used to compute a VP id that belongs
 	 * to the VP block.
 	 */
-	return cpu < KVM_MAX_VCPUS * xive->kvm->arch.emul_smt_mode;
+	return cpu < xive->nr_servers * xive->kvm->arch.emul_smt_mode;
 }
 
 int kvmppc_xive_compute_vp_id(struct kvmppc_xive *xive, u32 cpu, u32 *vp)
@@ -1231,6 +1231,14 @@ int kvmppc_xive_compute_vp_id(struct kvmppc_xive *xive, u32 cpu, u32 *vp)
 		return -EINVAL;
 	}
 
+	if (xive->vp_base == XIVE_INVALID_VP) {
+		xive->vp_base = xive_native_alloc_vp_block(xive->nr_servers);
+		pr_devel("VP_Base=%x nr_servers=%d\n", xive->vp_base, xive->nr_servers);
+
+		if (xive->vp_base == XIVE_INVALID_VP)
+			return -ENOSPC;
+	}
+
 	vp_id = kvmppc_xive_vp(xive, cpu);
 	if (kvmppc_xive_vp_in_use(xive->kvm, vp_id)) {
 		pr_devel("Duplicate !\n");
@@ -1858,6 +1866,43 @@ int kvmppc_xive_set_irq(struct kvm *kvm, int irq_source_id, u32 irq, int level,
 	return 0;
 }
 
+int kvmppc_xive_set_nr_servers(struct kvmppc_xive *xive, u64 addr)
+{
+	u32 __user *ubufp = (u32 __user *) addr;
+	u32 nr_servers;
+	int rc = 0;
+
+	if (get_user(nr_servers, ubufp))
+		return -EFAULT;
+
+	pr_devel("%s nr_servers=%u\n", __func__, nr_servers);
+
+	if (!nr_servers || nr_servers > KVM_MAX_VCPU_ID)
+		return -EINVAL;
+
+	mutex_lock(&xive->lock);
+	if (xive->vp_base != XIVE_INVALID_VP)
+		/* The VP block is allocated once and freed when the device
+		 * is released. Better not allow to change its size since its
+		 * used by connect_vcpu to validate vCPU ids are valid (eg,
+		 * setting it back to a higher value could allow connect_vcpu
+		 * to come up with a VP id that goes beyond the VP block, which
+		 * is likely to cause a crash in OPAL).
+		 */
+		rc = -EBUSY;
+	else if (nr_servers > KVM_MAX_VCPUS)
+		/* We don't need more servers. Higher vCPU ids get packed
+		 * down below KVM_MAX_VCPUS by kvmppc_pack_vcpu_id().
+		 */
+		xive->nr_servers = KVM_MAX_VCPUS;
+	else
+		xive->nr_servers = nr_servers;
+
+	mutex_unlock(&xive->lock);
+
+	return rc;
+}
+
 static int xive_set_attr(struct kvm_device *dev, struct kvm_device_attr *attr)
 {
 	struct kvmppc_xive *xive = dev->private;
@@ -2025,7 +2070,6 @@ static int kvmppc_xive_create(struct kvm_device *dev, u32 type)
 {
 	struct kvmppc_xive *xive;
 	struct kvm *kvm = dev->kvm;
-	int ret = 0;
 
 	pr_devel("Creating xive for partition\n");
 
@@ -2049,18 +2093,15 @@ static int kvmppc_xive_create(struct kvm_device *dev, u32 type)
 	else
 		xive->q_page_order = xive->q_order - PAGE_SHIFT;
 
-	/* Allocate a bunch of VPs */
-	xive->vp_base = xive_native_alloc_vp_block(KVM_MAX_VCPUS);
-	pr_devel("VP_Base=%x\n", xive->vp_base);
-
-	if (xive->vp_base == XIVE_INVALID_VP)
-		ret = -ENOMEM;
+	/* VP allocation is delayed to the first call to connect_vcpu */
+	xive->vp_base = XIVE_INVALID_VP;
+	/* KVM_MAX_VCPUS limits the number of VMs to roughly 64 per sockets
+	 * on a POWER9 system.
+	 */
+	xive->nr_servers = KVM_MAX_VCPUS;
 
 	xive->single_escalation = xive_native_has_single_escalation();
 
-	if (ret)
-		return ret;
-
 	kvm->arch.xive = xive;
 	return 0;
 }
diff --git a/arch/powerpc/kvm/book3s_xive.h b/arch/powerpc/kvm/book3s_xive.h
index 90cf6ec35a68..382e3a56e789 100644
--- a/arch/powerpc/kvm/book3s_xive.h
+++ b/arch/powerpc/kvm/book3s_xive.h
@@ -135,6 +135,9 @@ struct kvmppc_xive {
 	/* Flags */
 	u8	single_escalation;
 
+	/* Number of entries in the VP block */
+	u32	nr_servers;
+
 	struct kvmppc_xive_ops *ops;
 	struct address_space   *mapping;
 	struct mutex mapping_lock;
@@ -297,6 +300,7 @@ struct kvmppc_xive *kvmppc_xive_get_device(struct kvm *kvm, u32 type);
 void xive_cleanup_single_escalation(struct kvm_vcpu *vcpu,
 				    struct kvmppc_xive_vcpu *xc, int irq);
 int kvmppc_xive_compute_vp_id(struct kvmppc_xive *xive, u32 cpu, u32 *vp);
+int kvmppc_xive_set_nr_servers(struct kvmppc_xive *xive, u64 addr);
 
 #endif /* CONFIG_KVM_XICS */
 #endif /* _KVM_PPC_BOOK3S_XICS_H */
diff --git a/arch/powerpc/kvm/book3s_xive_native.c b/arch/powerpc/kvm/book3s_xive_native.c
index 5bb480b2aafd..8ab333eabeef 100644
--- a/arch/powerpc/kvm/book3s_xive_native.c
+++ b/arch/powerpc/kvm/book3s_xive_native.c
@@ -1060,7 +1060,6 @@ static int kvmppc_xive_native_create(struct kvm_device *dev, u32 type)
 {
 	struct kvmppc_xive *xive;
 	struct kvm *kvm = dev->kvm;
-	int ret = 0;
 
 	pr_devel("Creating xive native device\n");
 
@@ -1077,23 +1076,16 @@ static int kvmppc_xive_native_create(struct kvm_device *dev, u32 type)
 	mutex_init(&xive->mapping_lock);
 	mutex_init(&xive->lock);
 
-	/*
-	 * Allocate a bunch of VPs. KVM_MAX_VCPUS is a large value for
-	 * a default. Getting the max number of CPUs the VM was
-	 * configured with would improve our usage of the XIVE VP space.
+	/* VP allocation is delayed to the first call to connect_vcpu */
+	xive->vp_base = XIVE_INVALID_VP;
+	/* KVM_MAX_VCPUS limits the number of VMs to roughly 64 per sockets
+	 * on a POWER9 system.
 	 */
-	xive->vp_base = xive_native_alloc_vp_block(KVM_MAX_VCPUS);
-	pr_devel("VP_Base=%x\n", xive->vp_base);
-
-	if (xive->vp_base == XIVE_INVALID_VP)
-		ret = -ENXIO;
+	xive->nr_servers = KVM_MAX_VCPUS;
 
 	xive->single_escalation = xive_native_has_single_escalation();
 	xive->ops = &kvmppc_xive_native_ops;
 
-	if (ret)
-		return ret;
-
 	kvm->arch.xive = xive;
 	return 0;
 }


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH v2 6/6] KVM: PPC: Book3S HV: XIVE: Allow userspace to set the # of VPs
  2019-09-27 11:53 [PATCH v2 0/6] KVM: PPC: Book3S: HV: XIVE: Allocate less VPs in OPAL Greg Kurz
                   ` (4 preceding siblings ...)
  2019-09-27 11:54 ` [PATCH v2 5/6] KVM: PPC: Book3S HV: XIVE: Make VP block size configurable Greg Kurz
@ 2019-09-27 11:54 ` Greg Kurz
  2019-10-16 21:44 ` [PATCH v2 0/6] KVM: PPC: Book3S: HV: XIVE: Allocate less VPs in OPAL Greg Kurz
  6 siblings, 0 replies; 13+ messages in thread
From: Greg Kurz @ 2019-09-27 11:54 UTC (permalink / raw)
  To: Paul Mackerras
  Cc: kvm, Radim Krčmář,
	kvm-ppc, Cédric Le Goater, Paolo Bonzini, stable,
	linuxppc-dev, David Gibson

Add a new attribute to both XIVE and XICS-on-XIVE KVM devices so that
userspace can tell how many interrupt servers it needs. If a VM needs
less than the current default of KVM_MAX_VCPUS (2048), we can allocate
less VPs in OPAL. Combined with a core stride (VSMT) that matches the
number of guest threads per core, this may substantially increases the
number of VMs that can run concurrently with an in-kernel XIVE device.

Since the legacy XIVE KVM device is exposed to userspace through the
XICS KVM API, a new attribute group is added to it for this purpose.
While here, fix the syntax of the existing KVM_DEV_XICS_GRP_SOURCES
in the XICS documentation.

Signed-off-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
---
v2: - changelog update
---
 Documentation/virt/kvm/devices/xics.txt |   14 ++++++++++++--
 Documentation/virt/kvm/devices/xive.txt |    8 ++++++++
 arch/powerpc/include/uapi/asm/kvm.h     |    3 +++
 arch/powerpc/kvm/book3s_xive.c          |   10 ++++++++++
 arch/powerpc/kvm/book3s_xive_native.c   |    3 +++
 5 files changed, 36 insertions(+), 2 deletions(-)

diff --git a/Documentation/virt/kvm/devices/xics.txt b/Documentation/virt/kvm/devices/xics.txt
index 42864935ac5d..423332dda7bc 100644
--- a/Documentation/virt/kvm/devices/xics.txt
+++ b/Documentation/virt/kvm/devices/xics.txt
@@ -3,9 +3,19 @@ XICS interrupt controller
 Device type supported: KVM_DEV_TYPE_XICS
 
 Groups:
-  KVM_DEV_XICS_SOURCES
+  1. KVM_DEV_XICS_GRP_SOURCES
   Attributes: One per interrupt source, indexed by the source number.
 
+  2. KVM_DEV_XICS_GRP_CTRL
+  Attributes:
+    2.1 KVM_DEV_XICS_NR_SERVERS (write only)
+  The kvm_device_attr.addr points to a __u32 value which is the number of
+  interrupt server numbers (ie, highest possible vcpu id plus one).
+  Errors:
+    -EINVAL: Value greater than KVM_MAX_VCPU_ID.
+    -EFAULT: Invalid user pointer for attr->addr.
+    -EBUSY:  A vcpu is already connected to the device.
+
 This device emulates the XICS (eXternal Interrupt Controller
 Specification) defined in PAPR.  The XICS has a set of interrupt
 sources, each identified by a 20-bit source number, and a set of
@@ -38,7 +48,7 @@ least-significant end of the word:
 
 Each source has 64 bits of state that can be read and written using
 the KVM_GET_DEVICE_ATTR and KVM_SET_DEVICE_ATTR ioctls, specifying the
-KVM_DEV_XICS_SOURCES attribute group, with the attribute number being
+KVM_DEV_XICS_GRP_SOURCES attribute group, with the attribute number being
 the interrupt source number.  The 64 bit state word has the following
 bitfields, starting from the least-significant end of the word:
 
diff --git a/Documentation/virt/kvm/devices/xive.txt b/Documentation/virt/kvm/devices/xive.txt
index 9a24a4525253..f5d1d6b5af61 100644
--- a/Documentation/virt/kvm/devices/xive.txt
+++ b/Documentation/virt/kvm/devices/xive.txt
@@ -78,6 +78,14 @@ the legacy interrupt mode, referred as XICS (POWER7/8).
     migrating the VM.
     Errors: none
 
+    1.3 KVM_DEV_XIVE_NR_SERVERS (write only)
+    The kvm_device_attr.addr points to a __u32 value which is the number of
+    interrupt server numbers (ie, highest possible vcpu id plus one).
+    Errors:
+      -EINVAL: Value greater than KVM_MAX_VCPU_ID.
+      -EFAULT: Invalid user pointer for attr->addr.
+      -EBUSY:  A vCPU is already connected to the device.
+
   2. KVM_DEV_XIVE_GRP_SOURCE (write only)
   Initializes a new source in the XIVE device and mask it.
   Attributes:
diff --git a/arch/powerpc/include/uapi/asm/kvm.h b/arch/powerpc/include/uapi/asm/kvm.h
index b0f72dea8b11..264e266a85bf 100644
--- a/arch/powerpc/include/uapi/asm/kvm.h
+++ b/arch/powerpc/include/uapi/asm/kvm.h
@@ -667,6 +667,8 @@ struct kvm_ppc_cpu_char {
 
 /* PPC64 eXternal Interrupt Controller Specification */
 #define KVM_DEV_XICS_GRP_SOURCES	1	/* 64-bit source attributes */
+#define KVM_DEV_XICS_GRP_CTRL		2
+#define   KVM_DEV_XICS_NR_SERVERS	1
 
 /* Layout of 64-bit source attribute values */
 #define  KVM_XICS_DESTINATION_SHIFT	0
@@ -683,6 +685,7 @@ struct kvm_ppc_cpu_char {
 #define KVM_DEV_XIVE_GRP_CTRL		1
 #define   KVM_DEV_XIVE_RESET		1
 #define   KVM_DEV_XIVE_EQ_SYNC		2
+#define   KVM_DEV_XIVE_NR_SERVERS	3
 #define KVM_DEV_XIVE_GRP_SOURCE		2	/* 64-bit source identifier */
 #define KVM_DEV_XIVE_GRP_SOURCE_CONFIG	3	/* 64-bit source identifier */
 #define KVM_DEV_XIVE_GRP_EQ_CONFIG	4	/* 64-bit EQ identifier */
diff --git a/arch/powerpc/kvm/book3s_xive.c b/arch/powerpc/kvm/book3s_xive.c
index 6c35b3d95986..66858b7d3c6b 100644
--- a/arch/powerpc/kvm/book3s_xive.c
+++ b/arch/powerpc/kvm/book3s_xive.c
@@ -1911,6 +1911,11 @@ static int xive_set_attr(struct kvm_device *dev, struct kvm_device_attr *attr)
 	switch (attr->group) {
 	case KVM_DEV_XICS_GRP_SOURCES:
 		return xive_set_source(xive, attr->attr, attr->addr);
+	case KVM_DEV_XICS_GRP_CTRL:
+		switch (attr->attr) {
+		case KVM_DEV_XICS_NR_SERVERS:
+			return kvmppc_xive_set_nr_servers(xive, attr->addr);
+		}
 	}
 	return -ENXIO;
 }
@@ -1936,6 +1941,11 @@ static int xive_has_attr(struct kvm_device *dev, struct kvm_device_attr *attr)
 		    attr->attr < KVMPPC_XICS_NR_IRQS)
 			return 0;
 		break;
+	case KVM_DEV_XICS_GRP_CTRL:
+		switch (attr->attr) {
+		case KVM_DEV_XICS_NR_SERVERS:
+			return 0;
+		}
 	}
 	return -ENXIO;
 }
diff --git a/arch/powerpc/kvm/book3s_xive_native.c b/arch/powerpc/kvm/book3s_xive_native.c
index 8ab333eabeef..34bd123fa024 100644
--- a/arch/powerpc/kvm/book3s_xive_native.c
+++ b/arch/powerpc/kvm/book3s_xive_native.c
@@ -921,6 +921,8 @@ static int kvmppc_xive_native_set_attr(struct kvm_device *dev,
 			return kvmppc_xive_reset(xive);
 		case KVM_DEV_XIVE_EQ_SYNC:
 			return kvmppc_xive_native_eq_sync(xive);
+		case KVM_DEV_XIVE_NR_SERVERS:
+			return kvmppc_xive_set_nr_servers(xive, attr->addr);
 		}
 		break;
 	case KVM_DEV_XIVE_GRP_SOURCE:
@@ -960,6 +962,7 @@ static int kvmppc_xive_native_has_attr(struct kvm_device *dev,
 		switch (attr->attr) {
 		case KVM_DEV_XIVE_RESET:
 		case KVM_DEV_XIVE_EQ_SYNC:
+		case KVM_DEV_XIVE_NR_SERVERS:
 			return 0;
 		}
 		break;


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [PATCH v2 3/6] KVM: PPC: Book3S HV: XIVE: Show VP id in debugfs
  2019-09-27 11:53 ` [PATCH v2 3/6] KVM: PPC: Book3S HV: XIVE: Show VP id in debugfs Greg Kurz
@ 2019-09-30 10:07   ` Cédric Le Goater
  0 siblings, 0 replies; 13+ messages in thread
From: Cédric Le Goater @ 2019-09-30 10:07 UTC (permalink / raw)
  To: Greg Kurz, Paul Mackerras
  Cc: kvm, Radim Krčmář,
	kvm-ppc, stable, Paolo Bonzini, linuxppc-dev, David Gibson

On 27/09/2019 13:53, Greg Kurz wrote:
> Print out the VP id of each connected vCPU, this allow to see:
> - the VP block base in which OPAL encodes information that may be
>   useful when debugging
> - the packed vCPU id which may differ from the raw vCPU id if the
>   latter is >= KVM_MAX_VCPUS (2048)
> 
> Signed-off-by: Greg Kurz <groug@kaod.org>


Reviewed-by: Cédric Le Goater <clg@kaod.org>
> ---
>  arch/powerpc/kvm/book3s_xive.c        |    4 ++--
>  arch/powerpc/kvm/book3s_xive_native.c |    4 ++--
>  2 files changed, 4 insertions(+), 4 deletions(-)
> 
> diff --git a/arch/powerpc/kvm/book3s_xive.c b/arch/powerpc/kvm/book3s_xive.c
> index baa740815b3c..0b7859e40f66 100644
> --- a/arch/powerpc/kvm/book3s_xive.c
> +++ b/arch/powerpc/kvm/book3s_xive.c
> @@ -2107,9 +2107,9 @@ static int xive_debug_show(struct seq_file *m, void *private)
>  		if (!xc)
>  			continue;
>  
> -		seq_printf(m, "cpu server %#x CPPR:%#x HWCPPR:%#x"
> +		seq_printf(m, "cpu server %#x VP:%#x CPPR:%#x HWCPPR:%#x"
>  			   " MFRR:%#x PEND:%#x h_xirr: R=%lld V=%lld\n",
> -			   xc->server_num, xc->cppr, xc->hw_cppr,
> +			   xc->server_num, xc->vp_id, xc->cppr, xc->hw_cppr,
>  			   xc->mfrr, xc->pending,
>  			   xc->stat_rm_h_xirr, xc->stat_vm_h_xirr);
>  
> diff --git a/arch/powerpc/kvm/book3s_xive_native.c b/arch/powerpc/kvm/book3s_xive_native.c
> index ebb4215baf45..43a86858390a 100644
> --- a/arch/powerpc/kvm/book3s_xive_native.c
> +++ b/arch/powerpc/kvm/book3s_xive_native.c
> @@ -1204,8 +1204,8 @@ static int xive_native_debug_show(struct seq_file *m, void *private)
>  		if (!xc)
>  			continue;
>  
> -		seq_printf(m, "cpu server %#x NSR=%02x CPPR=%02x IBP=%02x PIPR=%02x w01=%016llx w2=%08x\n",
> -			   xc->server_num,
> +		seq_printf(m, "cpu server %#x VP=%#x NSR=%02x CPPR=%02x IBP=%02x PIPR=%02x w01=%016llx w2=%08x\n",
> +			   xc->server_num, xc->vp_id,
>  			   vcpu->arch.xive_saved_state.nsr,
>  			   vcpu->arch.xive_saved_state.cppr,
>  			   vcpu->arch.xive_saved_state.ipb,
> 


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v2 4/6] KVM: PPC: Book3S HV: XIVE: Compute the VP id in a common helper
  2019-09-27 11:53 ` [PATCH v2 4/6] KVM: PPC: Book3S HV: XIVE: Compute the VP id in a common helper Greg Kurz
@ 2019-09-30 12:01   ` Cédric Le Goater
  2019-09-30 12:29     ` Greg Kurz
  0 siblings, 1 reply; 13+ messages in thread
From: Cédric Le Goater @ 2019-09-30 12:01 UTC (permalink / raw)
  To: Greg Kurz, Paul Mackerras
  Cc: kvm, Radim Krčmář,
	kvm-ppc, stable, Paolo Bonzini, linuxppc-dev, David Gibson

On 27/09/2019 13:53, Greg Kurz wrote:
> Reduce code duplication by consolidating the checking of vCPU ids and VP
> ids to a common helper used by both legacy and native XIVE KVM devices.
> And explain the magic with a comment.
> 
> Signed-off-by: Greg Kurz <groug@kaod.org>

Looks fine. One question below,

> ---
>  arch/powerpc/kvm/book3s_xive.c        |   42 ++++++++++++++++++++++++++-------
>  arch/powerpc/kvm/book3s_xive.h        |    1 +
>  arch/powerpc/kvm/book3s_xive_native.c |   11 ++-------
>  3 files changed, 36 insertions(+), 18 deletions(-)
> 
> diff --git a/arch/powerpc/kvm/book3s_xive.c b/arch/powerpc/kvm/book3s_xive.c
> index 0b7859e40f66..d84da9f6ee88 100644
> --- a/arch/powerpc/kvm/book3s_xive.c
> +++ b/arch/powerpc/kvm/book3s_xive.c
> @@ -1211,6 +1211,37 @@ void kvmppc_xive_cleanup_vcpu(struct kvm_vcpu *vcpu)
>  	vcpu->arch.xive_vcpu = NULL;
>  }
>  
> +static bool kvmppc_xive_vcpu_id_valid(struct kvmppc_xive *xive, u32 cpu)
> +{
> +	/* We have a block of KVM_MAX_VCPUS VPs. We just need to check
> +	 * raw vCPU ids are below the expected limit for this guest's
> +	 * core stride ; kvmppc_pack_vcpu_id() will pack them down to an
> +	 * index that can be safely used to compute a VP id that belongs
> +	 * to the VP block.
> +	 */
> +	return cpu < KVM_MAX_VCPUS * xive->kvm->arch.emul_smt_mode;
> +}
> +
> +int kvmppc_xive_compute_vp_id(struct kvmppc_xive *xive, u32 cpu, u32 *vp)
> +{
> +	u32 vp_id;
> +
> +	if (!kvmppc_xive_vcpu_id_valid(xive, cpu)) {
> +		pr_devel("Out of bounds !\n");
> +		return -EINVAL;
> +	}
> +
> +	vp_id = kvmppc_xive_vp(xive, cpu);
> +	if (kvmppc_xive_vp_in_use(xive->kvm, vp_id)) {
> +		pr_devel("Duplicate !\n");
> +		return -EEXIST;
> +	}
> +
> +	*vp = vp_id;
> +
> +	return 0;

why not return vp_id ? and test for a negative value in callers.


C.

> +}
> +
>  int kvmppc_xive_connect_vcpu(struct kvm_device *dev,
>  			     struct kvm_vcpu *vcpu, u32 cpu)
>  {
> @@ -1229,20 +1260,13 @@ int kvmppc_xive_connect_vcpu(struct kvm_device *dev,
>  		return -EPERM;
>  	if (vcpu->arch.irq_type != KVMPPC_IRQ_DEFAULT)
>  		return -EBUSY;
> -	if (cpu >= (KVM_MAX_VCPUS * vcpu->kvm->arch.emul_smt_mode)) {
> -		pr_devel("Out of bounds !\n");
> -		return -EINVAL;
> -	}
>  
>  	/* We need to synchronize with queue provisioning */
>  	mutex_lock(&xive->lock);
>  
> -	vp_id = kvmppc_xive_vp(xive, cpu);
> -	if (kvmppc_xive_vp_in_use(xive->kvm, vp_id)) {
> -		pr_devel("Duplicate !\n");
> -		r = -EEXIST;
> +	r = kvmppc_xive_compute_vp_id(xive, cpu, &vp_id);
> +	if (r)
>  		goto bail;
> -	}
>  
>  	xc = kzalloc(sizeof(*xc), GFP_KERNEL);
>  	if (!xc) {
> diff --git a/arch/powerpc/kvm/book3s_xive.h b/arch/powerpc/kvm/book3s_xive.h
> index fe3ed50e0818..90cf6ec35a68 100644
> --- a/arch/powerpc/kvm/book3s_xive.h
> +++ b/arch/powerpc/kvm/book3s_xive.h
> @@ -296,6 +296,7 @@ int kvmppc_xive_attach_escalation(struct kvm_vcpu *vcpu, u8 prio,
>  struct kvmppc_xive *kvmppc_xive_get_device(struct kvm *kvm, u32 type);
>  void xive_cleanup_single_escalation(struct kvm_vcpu *vcpu,
>  				    struct kvmppc_xive_vcpu *xc, int irq);
> +int kvmppc_xive_compute_vp_id(struct kvmppc_xive *xive, u32 cpu, u32 *vp);
>  
>  #endif /* CONFIG_KVM_XICS */
>  #endif /* _KVM_PPC_BOOK3S_XICS_H */
> diff --git a/arch/powerpc/kvm/book3s_xive_native.c b/arch/powerpc/kvm/book3s_xive_native.c
> index 43a86858390a..5bb480b2aafd 100644
> --- a/arch/powerpc/kvm/book3s_xive_native.c
> +++ b/arch/powerpc/kvm/book3s_xive_native.c
> @@ -118,19 +118,12 @@ int kvmppc_xive_native_connect_vcpu(struct kvm_device *dev,
>  		return -EPERM;
>  	if (vcpu->arch.irq_type != KVMPPC_IRQ_DEFAULT)
>  		return -EBUSY;
> -	if (server_num >= (KVM_MAX_VCPUS * vcpu->kvm->arch.emul_smt_mode)) {
> -		pr_devel("Out of bounds !\n");
> -		return -EINVAL;
> -	}
>  
>  	mutex_lock(&xive->lock);
>  
> -	vp_id = kvmppc_xive_vp(xive, server_num);
> -	if (kvmppc_xive_vp_in_use(xive->kvm, vp_id)) {
> -		pr_devel("Duplicate !\n");
> -		rc = -EEXIST;
> +	rc = kvmppc_xive_compute_vp_id(xive, server_num, &vp_id);
> +	if (rc)
>  		goto bail;
> -	}
>  
>  	xc = kzalloc(sizeof(*xc), GFP_KERNEL);
>  	if (!xc) {
> 


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v2 5/6] KVM: PPC: Book3S HV: XIVE: Make VP block size configurable
  2019-09-27 11:54 ` [PATCH v2 5/6] KVM: PPC: Book3S HV: XIVE: Make VP block size configurable Greg Kurz
@ 2019-09-30 12:11   ` Cédric Le Goater
  0 siblings, 0 replies; 13+ messages in thread
From: Cédric Le Goater @ 2019-09-30 12:11 UTC (permalink / raw)
  To: Greg Kurz, Paul Mackerras
  Cc: kvm, Radim Krčmář,
	kvm-ppc, stable, Paolo Bonzini, linuxppc-dev, David Gibson

On 27/09/2019 13:54, Greg Kurz wrote:
> The XIVE VP is an internal structure which allow the XIVE interrupt
> controller to maintain the interrupt context state of vCPUs non
> dispatched on HW threads.
> 
> When a guest is started, the XIVE KVM device allocates a block of
> XIVE VPs in OPAL, enough to accommodate the highest possible vCPU
> id KVM_MAX_VCPU_ID (16384) packed down to KVM_MAX_VCPUS (2048).
> With a guest's core stride of 8 and a threading mode of 1 (QEMU's
> default), a VM must run at least 256 vCPUs to actually need such a
> range of VPs.
> 
> A POWER9 system has a limited XIVE VP space : 512k and KVM is
> currently wasting this HW resource with large VP allocations,
> especially since a typical VM likely runs with a lot less vCPUs.
> 
> Make the size of the VP block configurable. Add an nr_servers
> field to the XIVE structure and a function to set it for this
> purpose.
> 
> Split VP allocation out of the device create function. Since the
> VP block isn't used before the first vCPU connects to the XIVE KVM
> device, allocation is now performed by kvmppc_xive_connect_vcpu().
> This gives the opportunity to set nr_servers in between:
> 
>           kvmppc_xive_create() / kvmppc_xive_native_create()
>                                .
>                                .
>                      kvmppc_xive_set_nr_servers()
>                                .
>                                .
>     kvmppc_xive_connect_vcpu() / kvmppc_xive_native_connect_vcpu()
> 
> The connect_vcpu() functions check that the vCPU id is below nr_servers
> and if it is the first vCPU they allocate the VP block. This is protected
> against a concurrent update of nr_servers by kvmppc_xive_set_nr_servers()
> with the xive->lock mutex.
> 
> Also, the block is allocated once for the device lifetime: nr_servers
> should stay constant otherwise connect_vcpu() could generate a boggus
> VP id and likely crash OPAL. It is thus forbidden to update nr_servers
> once the block is allocated.
> 
> If the VP allocation fail, return ENOSPC which seems more appropriate to
> report the depletion of system wide HW resource than ENOMEM or ENXIO.
> 
> A VM using a stride of 8 and 1 thread per core with 32 vCPUs would hence
> only need 256 VPs instead of 2048. If the stride is set to match the number
> of threads per core, this goes further down to 32.
> 
> This will be exposed to userspace by a subsequent patch.
> 
> Signed-off-by: Greg Kurz <groug@kaod.org>

Reviewed-by: Cédric Le Goater <clg@kaod.org>

> ---
> v2: - update nr_server check and clip down to KVM_MAX_VCPUS
>       in kvmppc_xive_set_nr_servers()
> ---
>  arch/powerpc/kvm/book3s_xive.c        |   65 +++++++++++++++++++++++++++------
>  arch/powerpc/kvm/book3s_xive.h        |    4 ++
>  arch/powerpc/kvm/book3s_xive_native.c |   18 +++------
>  3 files changed, 62 insertions(+), 25 deletions(-)
> 
> diff --git a/arch/powerpc/kvm/book3s_xive.c b/arch/powerpc/kvm/book3s_xive.c
> index d84da9f6ee88..6c35b3d95986 100644
> --- a/arch/powerpc/kvm/book3s_xive.c
> +++ b/arch/powerpc/kvm/book3s_xive.c
> @@ -1213,13 +1213,13 @@ void kvmppc_xive_cleanup_vcpu(struct kvm_vcpu *vcpu)
>  
>  static bool kvmppc_xive_vcpu_id_valid(struct kvmppc_xive *xive, u32 cpu)
>  {
> -	/* We have a block of KVM_MAX_VCPUS VPs. We just need to check
> +	/* We have a block of xive->nr_servers VPs. We just need to check
>  	 * raw vCPU ids are below the expected limit for this guest's
>  	 * core stride ; kvmppc_pack_vcpu_id() will pack them down to an
>  	 * index that can be safely used to compute a VP id that belongs
>  	 * to the VP block.
>  	 */
> -	return cpu < KVM_MAX_VCPUS * xive->kvm->arch.emul_smt_mode;
> +	return cpu < xive->nr_servers * xive->kvm->arch.emul_smt_mode;
>  }
>  
>  int kvmppc_xive_compute_vp_id(struct kvmppc_xive *xive, u32 cpu, u32 *vp)
> @@ -1231,6 +1231,14 @@ int kvmppc_xive_compute_vp_id(struct kvmppc_xive *xive, u32 cpu, u32 *vp)
>  		return -EINVAL;
>  	}
>  
> +	if (xive->vp_base == XIVE_INVALID_VP) {
> +		xive->vp_base = xive_native_alloc_vp_block(xive->nr_servers);
> +		pr_devel("VP_Base=%x nr_servers=%d\n", xive->vp_base, xive->nr_servers);
> +
> +		if (xive->vp_base == XIVE_INVALID_VP)
> +			return -ENOSPC;
> +	}>  	vp_id = kvmppc_xive_vp(xive, cpu);
>  	if (kvmppc_xive_vp_in_use(xive->kvm, vp_id)) {
>  		pr_devel("Duplicate !\n");
> @@ -1858,6 +1866,43 @@ int kvmppc_xive_set_irq(struct kvm *kvm, int irq_source_id, u32 irq, int level,
>  	return 0;
>  }
>  
> +int kvmppc_xive_set_nr_servers(struct kvmppc_xive *xive, u64 addr)
> +{
> +	u32 __user *ubufp = (u32 __user *) addr;
> +	u32 nr_servers;
> +	int rc = 0;
> +
> +	if (get_user(nr_servers, ubufp))
> +		return -EFAULT;
> +
> +	pr_devel("%s nr_servers=%u\n", __func__, nr_servers);
> +
> +	if (!nr_servers || nr_servers > KVM_MAX_VCPU_ID)
> +		return -EINVAL;
> +
> +	mutex_lock(&xive->lock);
> +	if (xive->vp_base != XIVE_INVALID_VP)
> +		/* The VP block is allocated once and freed when the device
> +		 * is released. Better not allow to change its size since its
> +		 * used by connect_vcpu to validate vCPU ids are valid (eg,
> +		 * setting it back to a higher value could allow connect_vcpu
> +		 * to come up with a VP id that goes beyond the VP block, which
> +		 * is likely to cause a crash in OPAL).
> +		 */
> +		rc = -EBUSY;
> +	else if (nr_servers > KVM_MAX_VCPUS)
> +		/* We don't need more servers. Higher vCPU ids get packed
> +		 * down below KVM_MAX_VCPUS by kvmppc_pack_vcpu_id().
> +		 */
> +		xive->nr_servers = KVM_MAX_VCPUS;
> +	else
> +		xive->nr_servers = nr_servers;
> +
> +	mutex_unlock(&xive->lock);
> +
> +	return rc;
> +}
> +
>  static int xive_set_attr(struct kvm_device *dev, struct kvm_device_attr *attr)
>  {
>  	struct kvmppc_xive *xive = dev->private;
> @@ -2025,7 +2070,6 @@ static int kvmppc_xive_create(struct kvm_device *dev, u32 type)
>  {
>  	struct kvmppc_xive *xive;
>  	struct kvm *kvm = dev->kvm;
> -	int ret = 0;
>  
>  	pr_devel("Creating xive for partition\n");
>  
> @@ -2049,18 +2093,15 @@ static int kvmppc_xive_create(struct kvm_device *dev, u32 type)
>  	else
>  		xive->q_page_order = xive->q_order - PAGE_SHIFT;
>  
> -	/* Allocate a bunch of VPs */
> -	xive->vp_base = xive_native_alloc_vp_block(KVM_MAX_VCPUS);
> -	pr_devel("VP_Base=%x\n", xive->vp_base);
> -
> -	if (xive->vp_base == XIVE_INVALID_VP)
> -		ret = -ENOMEM;
> +	/* VP allocation is delayed to the first call to connect_vcpu */
> +	xive->vp_base = XIVE_INVALID_VP;
> +	/* KVM_MAX_VCPUS limits the number of VMs to roughly 64 per sockets
> +	 * on a POWER9 system.
> +	 */
> +	xive->nr_servers = KVM_MAX_VCPUS;
>  
>  	xive->single_escalation = xive_native_has_single_escalation();
>  
> -	if (ret)
> -		return ret;
> -
>  	kvm->arch.xive = xive;
>  	return 0;
>  }
> diff --git a/arch/powerpc/kvm/book3s_xive.h b/arch/powerpc/kvm/book3s_xive.h
> index 90cf6ec35a68..382e3a56e789 100644
> --- a/arch/powerpc/kvm/book3s_xive.h
> +++ b/arch/powerpc/kvm/book3s_xive.h
> @@ -135,6 +135,9 @@ struct kvmppc_xive {
>  	/* Flags */
>  	u8	single_escalation;
>  
> +	/* Number of entries in the VP block */
> +	u32	nr_servers;
> +
>  	struct kvmppc_xive_ops *ops;
>  	struct address_space   *mapping;
>  	struct mutex mapping_lock;
> @@ -297,6 +300,7 @@ struct kvmppc_xive *kvmppc_xive_get_device(struct kvm *kvm, u32 type);
>  void xive_cleanup_single_escalation(struct kvm_vcpu *vcpu,
>  				    struct kvmppc_xive_vcpu *xc, int irq);
>  int kvmppc_xive_compute_vp_id(struct kvmppc_xive *xive, u32 cpu, u32 *vp);
> +int kvmppc_xive_set_nr_servers(struct kvmppc_xive *xive, u64 addr);
>  
>  #endif /* CONFIG_KVM_XICS */
>  #endif /* _KVM_PPC_BOOK3S_XICS_H */
> diff --git a/arch/powerpc/kvm/book3s_xive_native.c b/arch/powerpc/kvm/book3s_xive_native.c
> index 5bb480b2aafd..8ab333eabeef 100644
> --- a/arch/powerpc/kvm/book3s_xive_native.c
> +++ b/arch/powerpc/kvm/book3s_xive_native.c
> @@ -1060,7 +1060,6 @@ static int kvmppc_xive_native_create(struct kvm_device *dev, u32 type)
>  {
>  	struct kvmppc_xive *xive;
>  	struct kvm *kvm = dev->kvm;
> -	int ret = 0;
>  
>  	pr_devel("Creating xive native device\n");
>  
> @@ -1077,23 +1076,16 @@ static int kvmppc_xive_native_create(struct kvm_device *dev, u32 type)
>  	mutex_init(&xive->mapping_lock);
>  	mutex_init(&xive->lock);
>  
> -	/*
> -	 * Allocate a bunch of VPs. KVM_MAX_VCPUS is a large value for
> -	 * a default. Getting the max number of CPUs the VM was
> -	 * configured with would improve our usage of the XIVE VP space.
> +	/* VP allocation is delayed to the first call to connect_vcpu */
> +	xive->vp_base = XIVE_INVALID_VP;
> +	/* KVM_MAX_VCPUS limits the number of VMs to roughly 64 per sockets
> +	 * on a POWER9 system.
>  	 */
> -	xive->vp_base = xive_native_alloc_vp_block(KVM_MAX_VCPUS);
> -	pr_devel("VP_Base=%x\n", xive->vp_base);
> -
> -	if (xive->vp_base == XIVE_INVALID_VP)
> -		ret = -ENXIO;
> +	xive->nr_servers = KVM_MAX_VCPUS;
>  
>  	xive->single_escalation = xive_native_has_single_escalation();
>  	xive->ops = &kvmppc_xive_native_ops;
>  
> -	if (ret)
> -		return ret;
> -
>  	kvm->arch.xive = xive;
>  	return 0;
>  }
> 


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v2 4/6] KVM: PPC: Book3S HV: XIVE: Compute the VP id in a common helper
  2019-09-30 12:01   ` Cédric Le Goater
@ 2019-09-30 12:29     ` Greg Kurz
  0 siblings, 0 replies; 13+ messages in thread
From: Greg Kurz @ 2019-09-30 12:29 UTC (permalink / raw)
  To: Cédric Le Goater
  Cc: kvm, Radim Krčmář,
	kvm-ppc, stable, Paolo Bonzini, linuxppc-dev, David Gibson

On Mon, 30 Sep 2019 14:01:56 +0200
Cédric Le Goater <clg@kaod.org> wrote:

> On 27/09/2019 13:53, Greg Kurz wrote:
> > Reduce code duplication by consolidating the checking of vCPU ids and VP
> > ids to a common helper used by both legacy and native XIVE KVM devices.
> > And explain the magic with a comment.
> > 
> > Signed-off-by: Greg Kurz <groug@kaod.org>
> 
> Looks fine. One question below,
> 
> > ---
> >  arch/powerpc/kvm/book3s_xive.c        |   42 ++++++++++++++++++++++++++-------
> >  arch/powerpc/kvm/book3s_xive.h        |    1 +
> >  arch/powerpc/kvm/book3s_xive_native.c |   11 ++-------
> >  3 files changed, 36 insertions(+), 18 deletions(-)
> > 
> > diff --git a/arch/powerpc/kvm/book3s_xive.c b/arch/powerpc/kvm/book3s_xive.c
> > index 0b7859e40f66..d84da9f6ee88 100644
> > --- a/arch/powerpc/kvm/book3s_xive.c
> > +++ b/arch/powerpc/kvm/book3s_xive.c
> > @@ -1211,6 +1211,37 @@ void kvmppc_xive_cleanup_vcpu(struct kvm_vcpu *vcpu)
> >  	vcpu->arch.xive_vcpu = NULL;
> >  }
> >  
> > +static bool kvmppc_xive_vcpu_id_valid(struct kvmppc_xive *xive, u32 cpu)
> > +{
> > +	/* We have a block of KVM_MAX_VCPUS VPs. We just need to check
> > +	 * raw vCPU ids are below the expected limit for this guest's
> > +	 * core stride ; kvmppc_pack_vcpu_id() will pack them down to an
> > +	 * index that can be safely used to compute a VP id that belongs
> > +	 * to the VP block.
> > +	 */
> > +	return cpu < KVM_MAX_VCPUS * xive->kvm->arch.emul_smt_mode;
> > +}
> > +
> > +int kvmppc_xive_compute_vp_id(struct kvmppc_xive *xive, u32 cpu, u32 *vp)
> > +{
> > +	u32 vp_id;
> > +
> > +	if (!kvmppc_xive_vcpu_id_valid(xive, cpu)) {
> > +		pr_devel("Out of bounds !\n");
> > +		return -EINVAL;
> > +	}
> > +
> > +	vp_id = kvmppc_xive_vp(xive, cpu);
> > +	if (kvmppc_xive_vp_in_use(xive->kvm, vp_id)) {
> > +		pr_devel("Duplicate !\n");
> > +		return -EEXIST;
> > +	}
> > +
> > +	*vp = vp_id;
> > +
> > +	return 0;
> 
> why not return vp_id ? and test for a negative value in callers.
> 

Simpler code in the callers IMHO.

> 
> C.
> 
> > +}
> > +
> >  int kvmppc_xive_connect_vcpu(struct kvm_device *dev,
> >  			     struct kvm_vcpu *vcpu, u32 cpu)
> >  {
> > @@ -1229,20 +1260,13 @@ int kvmppc_xive_connect_vcpu(struct kvm_device *dev,
> >  		return -EPERM;
> >  	if (vcpu->arch.irq_type != KVMPPC_IRQ_DEFAULT)
> >  		return -EBUSY;
> > -	if (cpu >= (KVM_MAX_VCPUS * vcpu->kvm->arch.emul_smt_mode)) {
> > -		pr_devel("Out of bounds !\n");
> > -		return -EINVAL;
> > -	}
> >  
> >  	/* We need to synchronize with queue provisioning */
> >  	mutex_lock(&xive->lock);
> >  
> > -	vp_id = kvmppc_xive_vp(xive, cpu);
> > -	if (kvmppc_xive_vp_in_use(xive->kvm, vp_id)) {
> > -		pr_devel("Duplicate !\n");
> > -		r = -EEXIST;
> > +	r = kvmppc_xive_compute_vp_id(xive, cpu, &vp_id);
> > +	if (r)
> >  		goto bail;
> > -	}
> >  
> >  	xc = kzalloc(sizeof(*xc), GFP_KERNEL);
> >  	if (!xc) {
> > diff --git a/arch/powerpc/kvm/book3s_xive.h b/arch/powerpc/kvm/book3s_xive.h
> > index fe3ed50e0818..90cf6ec35a68 100644
> > --- a/arch/powerpc/kvm/book3s_xive.h
> > +++ b/arch/powerpc/kvm/book3s_xive.h
> > @@ -296,6 +296,7 @@ int kvmppc_xive_attach_escalation(struct kvm_vcpu *vcpu, u8 prio,
> >  struct kvmppc_xive *kvmppc_xive_get_device(struct kvm *kvm, u32 type);
> >  void xive_cleanup_single_escalation(struct kvm_vcpu *vcpu,
> >  				    struct kvmppc_xive_vcpu *xc, int irq);
> > +int kvmppc_xive_compute_vp_id(struct kvmppc_xive *xive, u32 cpu, u32 *vp);
> >  
> >  #endif /* CONFIG_KVM_XICS */
> >  #endif /* _KVM_PPC_BOOK3S_XICS_H */
> > diff --git a/arch/powerpc/kvm/book3s_xive_native.c b/arch/powerpc/kvm/book3s_xive_native.c
> > index 43a86858390a..5bb480b2aafd 100644
> > --- a/arch/powerpc/kvm/book3s_xive_native.c
> > +++ b/arch/powerpc/kvm/book3s_xive_native.c
> > @@ -118,19 +118,12 @@ int kvmppc_xive_native_connect_vcpu(struct kvm_device *dev,
> >  		return -EPERM;
> >  	if (vcpu->arch.irq_type != KVMPPC_IRQ_DEFAULT)
> >  		return -EBUSY;
> > -	if (server_num >= (KVM_MAX_VCPUS * vcpu->kvm->arch.emul_smt_mode)) {
> > -		pr_devel("Out of bounds !\n");
> > -		return -EINVAL;
> > -	}
> >  
> >  	mutex_lock(&xive->lock);
> >  
> > -	vp_id = kvmppc_xive_vp(xive, server_num);
> > -	if (kvmppc_xive_vp_in_use(xive->kvm, vp_id)) {
> > -		pr_devel("Duplicate !\n");
> > -		rc = -EEXIST;
> > +	rc = kvmppc_xive_compute_vp_id(xive, server_num, &vp_id);
> > +	if (rc)
> >  		goto bail;
> > -	}
> >  
> >  	xc = kzalloc(sizeof(*xc), GFP_KERNEL);
> >  	if (!xc) {
> > 
> 


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v2 0/6] KVM: PPC: Book3S: HV: XIVE: Allocate less VPs in OPAL
  2019-09-27 11:53 [PATCH v2 0/6] KVM: PPC: Book3S: HV: XIVE: Allocate less VPs in OPAL Greg Kurz
                   ` (5 preceding siblings ...)
  2019-09-27 11:54 ` [PATCH v2 6/6] KVM: PPC: Book3S HV: XIVE: Allow userspace to set the # of VPs Greg Kurz
@ 2019-10-16 21:44 ` Greg Kurz
  2019-10-21  4:06   ` Paul Mackerras
  6 siblings, 1 reply; 13+ messages in thread
From: Greg Kurz @ 2019-10-16 21:44 UTC (permalink / raw)
  To: Paul Mackerras
  Cc: kvm, Radim Krčmář,
	kvm-ppc, Cédric Le Goater, Paolo Bonzini, stable,
	linuxppc-dev, David Gibson

On Fri, 27 Sep 2019 13:53:32 +0200
Greg Kurz <groug@kaod.org> wrote:

> This brings some fixes and allows to start more VMs with an in-kernel
> XIVE or XICS-on-XIVE device.
> 
> Changes since v1 (https://patchwork.ozlabs.org/cover/1166099/):
> - drop a useless patch
> - add a patch to show VP ids in debugfs
> - update some changelogs
> - fix buggy check in patch 5
> - Cc: stable 
> 
> --
> Greg
> 
> ---
> 
> Greg Kurz (6):
>       KVM: PPC: Book3S HV: XIVE: Set kvm->arch.xive when VPs are allocated
>       KVM: PPC: Book3S HV: XIVE: Ensure VP isn't already in use
>       KVM: PPC: Book3S HV: XIVE: Show VP id in debugfs
>       KVM: PPC: Book3S HV: XIVE: Compute the VP id in a common helper
>       KVM: PPC: Book3S HV: XIVE: Make VP block size configurable
>       KVM: PPC: Book3S HV: XIVE: Allow userspace to set the # of VPs
> 
> 
>  Documentation/virt/kvm/devices/xics.txt |   14 +++
>  Documentation/virt/kvm/devices/xive.txt |    8 ++
>  arch/powerpc/include/uapi/asm/kvm.h     |    3 +
>  arch/powerpc/kvm/book3s_xive.c          |  142 ++++++++++++++++++++++++-------
>  arch/powerpc/kvm/book3s_xive.h          |   17 ++++
>  arch/powerpc/kvm/book3s_xive_native.c   |   40 +++------
>  6 files changed, 167 insertions(+), 57 deletions(-)
> 

Ping ?

Cheers,

--
Greg

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v2 0/6] KVM: PPC: Book3S: HV: XIVE: Allocate less VPs in OPAL
  2019-10-16 21:44 ` [PATCH v2 0/6] KVM: PPC: Book3S: HV: XIVE: Allocate less VPs in OPAL Greg Kurz
@ 2019-10-21  4:06   ` Paul Mackerras
  0 siblings, 0 replies; 13+ messages in thread
From: Paul Mackerras @ 2019-10-21  4:06 UTC (permalink / raw)
  To: Greg Kurz
  Cc: kvm, Radim Krčmář,
	kvm-ppc, Cédric Le Goater, Paolo Bonzini, stable,
	linuxppc-dev, David Gibson

On Wed, Oct 16, 2019 at 11:44:03PM +0200, Greg Kurz wrote:
> On Fri, 27 Sep 2019 13:53:32 +0200
> Greg Kurz <groug@kaod.org> wrote:
> 
> > This brings some fixes and allows to start more VMs with an in-kernel
> > XIVE or XICS-on-XIVE device.
> > 
> > Changes since v1 (https://patchwork.ozlabs.org/cover/1166099/):
> > - drop a useless patch
> > - add a patch to show VP ids in debugfs
> > - update some changelogs
> > - fix buggy check in patch 5
> > - Cc: stable 
> > 
> > --
> > Greg
> > 
> > ---
> > 
> > Greg Kurz (6):
> >       KVM: PPC: Book3S HV: XIVE: Set kvm->arch.xive when VPs are allocated
> >       KVM: PPC: Book3S HV: XIVE: Ensure VP isn't already in use
> >       KVM: PPC: Book3S HV: XIVE: Show VP id in debugfs
> >       KVM: PPC: Book3S HV: XIVE: Compute the VP id in a common helper
> >       KVM: PPC: Book3S HV: XIVE: Make VP block size configurable
> >       KVM: PPC: Book3S HV: XIVE: Allow userspace to set the # of VPs
> > 
> > 
> >  Documentation/virt/kvm/devices/xics.txt |   14 +++
> >  Documentation/virt/kvm/devices/xive.txt |    8 ++
> >  arch/powerpc/include/uapi/asm/kvm.h     |    3 +
> >  arch/powerpc/kvm/book3s_xive.c          |  142 ++++++++++++++++++++++++-------
> >  arch/powerpc/kvm/book3s_xive.h          |   17 ++++
> >  arch/powerpc/kvm/book3s_xive_native.c   |   40 +++------
> >  6 files changed, 167 insertions(+), 57 deletions(-)
> > 
> 
> Ping ?

I'm about to send a pull request to Paolo for 2/6 (to go into 5.4) and
I'm preparing a tree of stuff for 5.5 that will include the rest of
the patches.  However, I have been delayed by the fact that multipath
SCSI is currently broken upstream on the P8 test box that I use, so I
haven't been able to test things.

Paul.

^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2019-10-21  4:15 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-09-27 11:53 [PATCH v2 0/6] KVM: PPC: Book3S: HV: XIVE: Allocate less VPs in OPAL Greg Kurz
2019-09-27 11:53 ` [PATCH v2 1/6] KVM: PPC: Book3S HV: XIVE: Set kvm->arch.xive when VPs are allocated Greg Kurz
2019-09-27 11:53 ` [PATCH v2 2/6] KVM: PPC: Book3S HV: XIVE: Ensure VP isn't already in use Greg Kurz
2019-09-27 11:53 ` [PATCH v2 3/6] KVM: PPC: Book3S HV: XIVE: Show VP id in debugfs Greg Kurz
2019-09-30 10:07   ` Cédric Le Goater
2019-09-27 11:53 ` [PATCH v2 4/6] KVM: PPC: Book3S HV: XIVE: Compute the VP id in a common helper Greg Kurz
2019-09-30 12:01   ` Cédric Le Goater
2019-09-30 12:29     ` Greg Kurz
2019-09-27 11:54 ` [PATCH v2 5/6] KVM: PPC: Book3S HV: XIVE: Make VP block size configurable Greg Kurz
2019-09-30 12:11   ` Cédric Le Goater
2019-09-27 11:54 ` [PATCH v2 6/6] KVM: PPC: Book3S HV: XIVE: Allow userspace to set the # of VPs Greg Kurz
2019-10-16 21:44 ` [PATCH v2 0/6] KVM: PPC: Book3S: HV: XIVE: Allocate less VPs in OPAL Greg Kurz
2019-10-21  4:06   ` Paul Mackerras

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).