* [PATCH 0/2] KVM: PPC: Book3S HV: add XIVE native exploitation mode (v5 errata)
@ 2019-04-11 13:53 Cédric Le Goater
  2019-04-11 13:53 ` [PATCH 1/2] KVM: introduce a 'release' method for KVM devices Cédric Le Goater
  2019-04-11 13:53 ` [PATCH 2/2] KVM: PPC: Book3S HV: XIVE: replace the 'destroy' method by 'release' Cédric Le Goater
  0 siblings, 2 replies; 5+ messages in thread
From: Cédric Le Goater @ 2019-04-11 13:53 UTC (permalink / raw)
  To: kvm-ppc; +Cc: Paul Mackerras, David Gibson, kvm, Cédric Le Goater

Hello,

Here is a small series fixing the larger series "KVM: PPC: Book3S HV:
add XIVE native exploitation mode", which can be found here:

   http://patchwork.ozlabs.org/cover/1083513/

It adds a new 'release' method to the KVM device operations which, when
defined, is called when the file descriptor of the device is closed.
It acts as a replacement for the 'destroy' method.

These two patches replace patch 16:

   http://patchwork.ozlabs.org/patch/1083517/

Thanks,

C.

Cédric Le Goater (2):
  KVM: introduce a 'release' method for KVM devices
  KVM: PPC: Book3S HV: XIVE: replace the 'destroy' method by 'release'

 arch/powerpc/include/asm/kvm_host.h   |  1 +
 arch/powerpc/kvm/book3s_xive.h        |  1 +
 include/linux/kvm_host.h              |  9 +++
 arch/powerpc/kvm/book3s_xive.c        | 82 +++++++++++++++++++++++++--
 arch/powerpc/kvm/book3s_xive_native.c | 30 ++++++++--
 arch/powerpc/kvm/powerpc.c            |  9 +++
 virt/kvm/kvm_main.c                   | 13 +++++
 7 files changed, 134 insertions(+), 11 deletions(-)

-- 
2.20.1


* [PATCH 1/2] KVM: introduce a 'release' method for KVM devices
  2019-04-11 13:53 [PATCH 0/2] KVM: PPC: Book3S HV: add XIVE native exploitation mode (v5 errata) Cédric Le Goater
@ 2019-04-11 13:53 ` Cédric Le Goater
  2019-04-12  6:32   ` David Gibson
  2019-04-11 13:53 ` [PATCH 2/2] KVM: PPC: Book3S HV: XIVE: replace the 'destroy' method by 'release' Cédric Le Goater
  1 sibling, 1 reply; 5+ messages in thread
From: Cédric Le Goater @ 2019-04-11 13:53 UTC (permalink / raw)
  To: kvm-ppc
  Cc: Paul Mackerras, David Gibson, kvm, Cédric Le Goater, Paolo Bonzini

When a P9 sPAPR VM boots, the CAS negotiation process determines which
interrupt mode to use (XICS legacy or XIVE native) and invokes a
machine reset to activate the chosen mode.

To be able to switch from one interrupt mode to another, we introduce
the capability to release a KVM device without destroying the VM. The
KVM device interface is extended with a new 'release' method which is
called when the file descriptor of the device is closed.

Once 'release' is called, the 'destroy' method will not be called
anymore as the device is removed from the device list of the VM.
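
The interplay between 'release' and 'destroy' described above can be
illustrated with a small userspace model. This is only a sketch: every
name in it (model_vm, model_dev, model_fd_close, ...) is hypothetical
and none of it is kernel API. It assumes a singly linked device list
and counts callback invocations so the behaviour can be checked.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical userspace model, not kernel code. */
struct model_dev;
struct model_vm;

struct model_ops {
	void (*destroy)(struct model_dev *dev); /* run at VM teardown */
	void (*release)(struct model_dev *dev); /* optional, run on fd close */
};

struct model_dev {
	const struct model_ops *ops;
	struct model_vm *vm;
	struct model_dev *next;
};

struct model_vm {
	struct model_dev *devices; /* devices still owned by the VM */
	int destroy_calls;
	int release_calls;
};

static void count_destroy(struct model_dev *dev)
{
	dev->vm->destroy_calls++;
	free(dev);
}

static void count_release(struct model_dev *dev)
{
	dev->vm->release_calls++;
	free(dev);
}

/* A device type without 'release' and one with it. */
const struct model_ops legacy_ops = { .destroy = count_destroy };
const struct model_ops releasable_ops = {
	.destroy = count_destroy,
	.release = count_release,
};

struct model_dev *model_dev_create(struct model_vm *vm,
				   const struct model_ops *ops)
{
	struct model_dev *dev = calloc(1, sizeof(*dev));

	if (!dev)
		return NULL;
	dev->ops = ops;
	dev->vm = vm;
	dev->next = vm->devices;
	vm->devices = dev;
	return dev;
}

/* Models close() on the device fd. */
void model_fd_close(struct model_dev *dev)
{
	struct model_vm *vm = dev->vm;
	struct model_dev **p = &vm->devices;

	if (!dev->ops->release)
		return; /* device stays listed until VM teardown */

	while (*p && *p != dev)
		p = &(*p)->next;
	assert(*p == dev);
	*p = dev->next;          /* unlink first ... */
	dev->ops->release(dev);  /* ... so 'destroy' can never see it */
}

/* Models VM teardown: only devices still on the list are destroyed. */
void model_vm_destroy(struct model_vm *vm)
{
	while (vm->devices) {
		struct model_dev *dev = vm->devices;

		vm->devices = dev->next;
		dev->ops->destroy(dev);
	}
}
```

In this model, closing the fd of a device whose ops define 'release'
frees it immediately, while a device without 'release' is only freed
by model_vm_destroy().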

Cc: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Cédric Le Goater <clg@kaod.org>
---
 include/linux/kvm_host.h |  9 +++++++++
 virt/kvm/kvm_main.c      | 13 +++++++++++++
 2 files changed, 22 insertions(+)

diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 831d963451d8..722692e2f745 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -1240,6 +1240,15 @@ struct kvm_device_ops {
 	 */
 	void (*destroy)(struct kvm_device *dev);
 
+	/*
+	 * Release is an alternative method to free the device. It is
+	 * called when the device file descriptor is closed. Once
+	 * release is called, the destroy method will not be called
+	 * anymore as the device is removed from the device list of
+	 * the VM. kvm->lock is held.
+	 */
+	void (*release)(struct kvm_device *dev);
+
 	int (*set_attr)(struct kvm_device *dev, struct kvm_device_attr *attr);
 	int (*get_attr)(struct kvm_device *dev, struct kvm_device_attr *attr);
 	int (*has_attr)(struct kvm_device *dev, struct kvm_device_attr *attr);
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index ea2018ae1cd7..ea2619d5ca98 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -2938,6 +2938,19 @@ static int kvm_device_release(struct inode *inode, struct file *filp)
 	struct kvm_device *dev = filp->private_data;
 	struct kvm *kvm = dev->kvm;
 
+	if (!dev)
+		return -ENODEV;
+
+	if (dev->kvm != kvm)
+		return -EPERM;
+
+	if (dev->ops->release) {
+		mutex_lock(&kvm->lock);
+		list_del(&dev->vm_node);
+		dev->ops->release(dev);
+		mutex_unlock(&kvm->lock);
+	}
+
 	kvm_put_kvm(kvm);
 	return 0;
 }
-- 
2.20.1


* [PATCH 2/2] KVM: PPC: Book3S HV: XIVE: replace the 'destroy' method by 'release'
  2019-04-11 13:53 [PATCH 0/2] KVM: PPC: Book3S HV: add XIVE native exploitation mode (v5 errata) Cédric Le Goater
  2019-04-11 13:53 ` [PATCH 1/2] KVM: introduce a 'release' method for KVM devices Cédric Le Goater
@ 2019-04-11 13:53 ` Cédric Le Goater
  2019-04-12  6:34   ` David Gibson
  1 sibling, 1 reply; 5+ messages in thread
From: Cédric Le Goater @ 2019-04-11 13:53 UTC (permalink / raw)
  To: kvm-ppc; +Cc: Paul Mackerras, David Gibson, kvm, Cédric Le Goater

When a P9 sPAPR VM boots, the CAS negotiation process determines which
interrupt mode to use (XICS legacy or XIVE native) and invokes a
machine reset to activate the chosen mode.

We introduce 'release' methods for the XICS-on-XIVE and the XIVE
native KVM devices which are called when the file descriptor of the
device is closed after the TIMA and ESB pages have been unmapped.
They perform the necessary cleanups: clear the vCPU interrupt
presenters that could be attached and then destroy the device. The
'release' methods replace the 'destroy' methods as 'destroy' is not
called anymore once 'release' is. Compatibility with older QEMU is
nevertheless maintained.

This is not considered a safe operation as the vCPUs are still
running and could be referencing the KVM device through their
presenters. To protect the system from any breakage, the kvmppc_xive
objects representing both KVM devices are now stored in an array under
the VM. Allocation is performed on first usage and memory is freed
only when the VM exits.
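
The lifetime rule above (allocate on first use, zero on reuse, free
only at VM exit) can be sketched in userspace as follows. This is an
illustration under assumptions, not the patch itself: recycle_vm,
recycled_xive and the slot constants are hypothetical stand-ins, and
calloc()/free() stand in for kzalloc()/kfree().

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical slot numbering for the two device flavours. */
enum { SLOT_XICS_ON_XIVE = 0, SLOT_XIVE_NATIVE = 1, SLOT_COUNT = 2 };

/* Stand-in for the per-device state object. */
struct recycled_xive {
	int vp_base;
	int nr_sources;
};

/* Stand-in for the per-VM storage holding one object per flavour. */
struct recycle_vm {
	struct recycled_xive *slots[SLOT_COUNT];
};

/* Return a zeroed object, allocating it only on first use. */
struct recycled_xive *recycle_get(struct recycle_vm *vm, int slot)
{
	struct recycled_xive *x;

	assert(slot >= 0 && slot < SLOT_COUNT);
	x = vm->slots[slot];
	if (!x) {
		x = calloc(1, sizeof(*x));
		vm->slots[slot] = x; /* stays NULL if allocation failed */
	} else {
		/* Reuse: same address, but no state from the last user. */
		memset(x, 0, sizeof(*x));
	}
	return x;
}

/* Only VM teardown frees the objects. */
void recycle_teardown(struct recycle_vm *vm)
{
	for (int i = 0; i < SLOT_COUNT; i++) {
		free(vm->slots[i]);
		vm->slots[i] = NULL;
	}
}
```

Because the address stays valid for the lifetime of the VM, a stale
pointer held elsewhere would reference zeroed or reused state rather
than freed memory, which is the safety net this transitional scheme
is after.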

Signed-off-by: Cédric Le Goater <clg@kaod.org>
---
 arch/powerpc/include/asm/kvm_host.h   |  1 +
 arch/powerpc/kvm/book3s_xive.h        |  1 +
 arch/powerpc/kvm/book3s_xive.c        | 82 +++++++++++++++++++++++++--
 arch/powerpc/kvm/book3s_xive_native.c | 30 ++++++++--
 arch/powerpc/kvm/powerpc.c            |  9 +++
 5 files changed, 112 insertions(+), 11 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h
index 9cc6abdce1b9..ed059c95e56a 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -314,6 +314,7 @@ struct kvm_arch {
 #ifdef CONFIG_KVM_XICS
 	struct kvmppc_xics *xics;
 	struct kvmppc_xive *xive;
+	struct kvmppc_xive *xive_devices[2];
 	struct kvmppc_passthru_irqmap *pimap;
 #endif
 	struct kvmppc_ops *kvm_ops;
diff --git a/arch/powerpc/kvm/book3s_xive.h b/arch/powerpc/kvm/book3s_xive.h
index e011622dc038..426146332984 100644
--- a/arch/powerpc/kvm/book3s_xive.h
+++ b/arch/powerpc/kvm/book3s_xive.h
@@ -283,6 +283,7 @@ void kvmppc_xive_free_sources(struct kvmppc_xive_src_block *sb);
 int kvmppc_xive_select_target(struct kvm *kvm, u32 *server, u8 prio);
 int kvmppc_xive_attach_escalation(struct kvm_vcpu *vcpu, u8 prio,
 				  bool single_escalation);
+struct kvmppc_xive *kvmppc_xive_get_device(struct kvm *kvm, u32 type);
 
 #endif /* CONFIG_KVM_XICS */
 #endif /* _KVM_PPC_BOOK3S_XICS_H */
diff --git a/arch/powerpc/kvm/book3s_xive.c b/arch/powerpc/kvm/book3s_xive.c
index 480a3fc6b9fd..7dec0c350a14 100644
--- a/arch/powerpc/kvm/book3s_xive.c
+++ b/arch/powerpc/kvm/book3s_xive.c
@@ -1100,11 +1100,19 @@ void kvmppc_xive_disable_vcpu_interrupts(struct kvm_vcpu *vcpu)
 void kvmppc_xive_cleanup_vcpu(struct kvm_vcpu *vcpu)
 {
 	struct kvmppc_xive_vcpu *xc = vcpu->arch.xive_vcpu;
-	struct kvmppc_xive *xive = xc->xive;
+	struct kvmppc_xive *xive;
 	int i;
 
+	if (!kvmppc_xics_enabled(vcpu))
+		return;
+
+	if (!xc)
+		return;
+
 	pr_devel("cleanup_vcpu(cpu=%d)\n", xc->server_num);
 
+	xive = xc->xive;
+
 	/* Ensure no interrupt is still routed to that VP */
 	xc->valid = false;
 	kvmppc_xive_disable_vcpu_interrupts(vcpu);
@@ -1141,6 +1149,10 @@ void kvmppc_xive_cleanup_vcpu(struct kvm_vcpu *vcpu)
 	}
 	/* Free the VP */
 	kfree(xc);
+
+	/* Cleanup the vcpu */
+	vcpu->arch.irq_type = KVMPPC_IRQ_DEFAULT;
+	vcpu->arch.xive_vcpu = NULL;
 }
 
 int kvmppc_xive_connect_vcpu(struct kvm_device *dev,
@@ -1158,7 +1170,7 @@ int kvmppc_xive_connect_vcpu(struct kvm_device *dev,
 	}
 	if (xive->kvm != vcpu->kvm)
 		return -EPERM;
-	if (vcpu->arch.irq_type)
+	if (vcpu->arch.irq_type != KVMPPC_IRQ_DEFAULT)
 		return -EBUSY;
 	if (kvmppc_xive_find_server(vcpu->kvm, cpu)) {
 		pr_devel("Duplicate !\n");
@@ -1824,12 +1836,39 @@ void kvmppc_xive_free_sources(struct kvmppc_xive_src_block *sb)
 	}
 }
 
-static void kvmppc_xive_free(struct kvm_device *dev)
+/*
+ * Called when device fd is closed
+ */
+static void kvmppc_xive_release(struct kvm_device *dev)
 {
 	struct kvmppc_xive *xive = dev->private;
 	struct kvm *kvm = xive->kvm;
+	struct kvm_vcpu *vcpu;
 	int i;
 
+	pr_devel("Releasing xive device\n");
+
+	/*
+	 * When releasing the KVM device fd, the vCPUs can still be
+	 * running and we should clean up the vCPU interrupt
+	 * presenters first.
+	 */
+	if (atomic_read(&kvm->online_vcpus) != 0) {
+		/*
+		 * call kick_all_cpus_sync() to ensure that all CPUs
+		 * have executed any pending interrupts
+		 */
+		if (is_kvmppc_hv_enabled(kvm))
+			kick_all_cpus_sync();
+
+		/*
+		 * TODO: There is still a race window with the early
+		 * checks in kvmppc_native_connect_vcpu()
+		 */
+		kvm_for_each_vcpu(i, vcpu, kvm)
+			kvmppc_xive_cleanup_vcpu(vcpu);
+	}
+
 	debugfs_remove(xive->dentry);
 
 	if (kvm)
@@ -1846,11 +1885,42 @@ static void kvmppc_xive_free(struct kvm_device *dev)
 	if (xive->vp_base != XIVE_INVALID_VP)
 		xive_native_free_vp_block(xive->vp_base);
 
+	/*
+	 * A reference of the kvmppc_xive pointer is now kept under
+	 * the xive_devices[] array of the machine for reuse. It is
+	 * freed when the VM is destroyed for now until we fix all the
+	 * execution paths.
+	 */
 
-	kfree(xive);
 	kfree(dev);
 }
 
+/*
+ * When the guest chooses the interrupt mode (XICS legacy or XIVE
+ * native), the VM will switch KVM devices. The previous device will
+ * be "released" before the new one is created.
+ *
+ * Until we are sure all execution paths are well protected, provide a
+ * fail safe (transitional) method for device destruction, in which
+ * the XIVE device pointer is recycled and not directly freed.
+ */
+struct kvmppc_xive *kvmppc_xive_get_device(struct kvm *kvm, u32 type)
+{
+	struct kvmppc_xive *xive;
+	bool xive_native_index = type == KVM_DEV_TYPE_XIVE;
+
+	xive = kvm->arch.xive_devices[xive_native_index];
+
+	if (!xive) {
+		xive = kzalloc(sizeof(*xive), GFP_KERNEL);
+		kvm->arch.xive_devices[xive_native_index] = xive;
+	} else {
+		memset(xive, 0, sizeof(*xive));
+	}
+
+	return xive;
+}
+
 static int kvmppc_xive_create(struct kvm_device *dev, u32 type)
 {
 	struct kvmppc_xive *xive;
@@ -1859,7 +1929,7 @@ static int kvmppc_xive_create(struct kvm_device *dev, u32 type)
 
 	pr_devel("Creating xive for partition\n");
 
-	xive = kzalloc(sizeof(*xive), GFP_KERNEL);
+	xive = kvmppc_xive_get_device(kvm, type);
 	if (!xive)
 		return -ENOMEM;
 
@@ -2024,7 +2094,7 @@ struct kvm_device_ops kvm_xive_ops = {
 	.name = "kvm-xive",
 	.create = kvmppc_xive_create,
 	.init = kvmppc_xive_init,
-	.destroy = kvmppc_xive_free,
+	.release = kvmppc_xive_release,
 	.set_attr = xive_set_attr,
 	.get_attr = xive_get_attr,
 	.has_attr = xive_has_attr,
diff --git a/arch/powerpc/kvm/book3s_xive_native.c b/arch/powerpc/kvm/book3s_xive_native.c
index 62648f833adf..a99af2766ce3 100644
--- a/arch/powerpc/kvm/book3s_xive_native.c
+++ b/arch/powerpc/kvm/book3s_xive_native.c
@@ -964,15 +964,29 @@ static int kvmppc_xive_native_has_attr(struct kvm_device *dev,
 	return -ENXIO;
 }
 
-static void kvmppc_xive_native_free(struct kvm_device *dev)
+/*
+ * Called when device fd is closed
+ */
+static void kvmppc_xive_native_release(struct kvm_device *dev)
 {
 	struct kvmppc_xive *xive = dev->private;
 	struct kvm *kvm = xive->kvm;
+	struct kvm_vcpu *vcpu;
 	int i;
 
 	debugfs_remove(xive->dentry);
 
-	pr_devel("Destroying xive native device\n");
+	pr_devel("Releasing xive native device\n");
+
+	/*
+	 * When releasing the KVM device fd, the vCPUs can still be
+	 * running and we should clean up the vCPU interrupt
+	 * presenters first.
+	 */
+	if (atomic_read(&kvm->online_vcpus) != 0) {
+		kvm_for_each_vcpu(i, vcpu, kvm)
+			kvmppc_xive_native_cleanup_vcpu(vcpu);
+	}
 
 	if (kvm)
 		kvm->arch.xive = NULL;
@@ -987,7 +1001,13 @@ static void kvmppc_xive_native_free(struct kvm_device *dev)
 	if (xive->vp_base != XIVE_INVALID_VP)
 		xive_native_free_vp_block(xive->vp_base);
 
-	kfree(xive);
+	/*
+	 * A reference of the kvmppc_xive pointer is now kept under
+	 * the xive_devices[] array of the machine for reuse. It is
+	 * freed when the VM is destroyed for now until we fix all the
+	 * execution paths.
+	 */
+
 	kfree(dev);
 }
 
@@ -1002,7 +1022,7 @@ static int kvmppc_xive_native_create(struct kvm_device *dev, u32 type)
 	if (kvm->arch.xive)
 		return -EEXIST;
 
-	xive = kzalloc(sizeof(*xive), GFP_KERNEL);
+	xive = kvmppc_xive_get_device(kvm, type);
 	if (!xive)
 		return -ENOMEM;
 
@@ -1182,7 +1202,7 @@ struct kvm_device_ops kvm_xive_native_ops = {
 	.name = "kvm-xive-native",
 	.create = kvmppc_xive_native_create,
 	.init = kvmppc_xive_native_init,
-	.destroy = kvmppc_xive_native_free,
+	.release = kvmppc_xive_native_release,
 	.set_attr = kvmppc_xive_native_set_attr,
 	.get_attr = kvmppc_xive_native_get_attr,
 	.has_attr = kvmppc_xive_native_has_attr,
diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
index f54926c78320..9b9c8a9f28b5 100644
--- a/arch/powerpc/kvm/powerpc.c
+++ b/arch/powerpc/kvm/powerpc.c
@@ -501,6 +501,15 @@ void kvm_arch_destroy_vm(struct kvm *kvm)
 
 	mutex_unlock(&kvm->lock);
 
+	/*
+	 * Free the XIVE devices which are not directly freed by the
+	 * device 'release' method
+	 */
+	for (i = 0; i < ARRAY_SIZE(kvm->arch.xive_devices); i++) {
+		kfree(kvm->arch.xive_devices[i]);
+		kvm->arch.xive_devices[i] = NULL;
+	}
+
 	/* drop the module reference */
 	module_put(kvm->arch.kvm_ops->owner);
 }
-- 
2.20.1


* Re: [PATCH 1/2] KVM: introduce a 'release' method for KVM devices
  2019-04-11 13:53 ` [PATCH 1/2] KVM: introduce a 'release' method for KVM devices Cédric Le Goater
@ 2019-04-12  6:32   ` David Gibson
  0 siblings, 0 replies; 5+ messages in thread
From: David Gibson @ 2019-04-12  6:32 UTC (permalink / raw)
  To: Cédric Le Goater; +Cc: kvm-ppc, Paul Mackerras, kvm, Paolo Bonzini

On Thu, Apr 11, 2019 at 03:53:01PM +0200, Cédric Le Goater wrote:
> When a P9 sPAPR VM boots, the CAS negotiation process determines which
> interrupt mode to use (XICS legacy or XIVE native) and invokes a
> machine reset to activate the chosen mode.
> 
> To be able to switch from one interrupt mode to another, we introduce
> the capability to release a KVM device without destroying the VM. The
> KVM device interface is extended with a new 'release' method which is
> called when the file descriptor of the device is closed.
> 
> Once 'release' is called, the 'destroy' method will not be called
> anymore as the device is removed from the device list of the VM.
> 
> Cc: Paolo Bonzini <pbonzini@redhat.com>
> Signed-off-by: Cédric Le Goater <clg@kaod.org>

Reviewed-by: David Gibson <david@gibson.dropbear.id.au>

Kind of a hack, but a better way to solve this that doesn't involve
inordinate amounts of work doesn't occur to me.

> ---
>  include/linux/kvm_host.h |  9 +++++++++
>  virt/kvm/kvm_main.c      | 13 +++++++++++++
>  2 files changed, 22 insertions(+)
> 
> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> index 831d963451d8..722692e2f745 100644
> --- a/include/linux/kvm_host.h
> +++ b/include/linux/kvm_host.h
> @@ -1240,6 +1240,15 @@ struct kvm_device_ops {
>  	 */
>  	void (*destroy)(struct kvm_device *dev);
>  
> +	/*
> +	 * Release is an alternative method to free the device. It is
> +	 * called when the device file descriptor is closed. Once
> +	 * release is called, the destroy method will not be called
> +	 * anymore as the device is removed from the device list of
> +	 * the VM. kvm->lock is held.
> +	 */
> +	void (*release)(struct kvm_device *dev);
> +
>  	int (*set_attr)(struct kvm_device *dev, struct kvm_device_attr *attr);
>  	int (*get_attr)(struct kvm_device *dev, struct kvm_device_attr *attr);
>  	int (*has_attr)(struct kvm_device *dev, struct kvm_device_attr *attr);
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index ea2018ae1cd7..ea2619d5ca98 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -2938,6 +2938,19 @@ static int kvm_device_release(struct inode *inode, struct file *filp)
>  	struct kvm_device *dev = filp->private_data;
>  	struct kvm *kvm = dev->kvm;
>  
> +	if (!dev)
> +		return -ENODEV;
> +
> +	if (dev->kvm != kvm)
> +		return -EPERM;
> +
> +	if (dev->ops->release) {
> +		mutex_lock(&kvm->lock);
> +		list_del(&dev->vm_node);
> +		dev->ops->release(dev);
> +		mutex_unlock(&kvm->lock);
> +	}
> +
>  	kvm_put_kvm(kvm);
>  	return 0;
>  }

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson


* Re: [PATCH 2/2] KVM: PPC: Book3S HV: XIVE: replace the 'destroy' method by 'release'
  2019-04-11 13:53 ` [PATCH 2/2] KVM: PPC: Book3S HV: XIVE: replace the 'destroy' method by 'release' Cédric Le Goater
@ 2019-04-12  6:34   ` David Gibson
  0 siblings, 0 replies; 5+ messages in thread
From: David Gibson @ 2019-04-12  6:34 UTC (permalink / raw)
  To: Cédric Le Goater; +Cc: kvm-ppc, Paul Mackerras, kvm

On Thu, Apr 11, 2019 at 03:53:02PM +0200, Cédric Le Goater wrote:
> When a P9 sPAPR VM boots, the CAS negotiation process determines which
> interrupt mode to use (XICS legacy or XIVE native) and invokes a
> machine reset to activate the chosen mode.
> 
> We introduce 'release' methods for the XICS-on-XIVE and the XIVE
> native KVM devices which are called when the file descriptor of the
> device is closed after the TIMA and ESB pages have been unmapped.
> They perform the necessary cleanups: clear the vCPU interrupt
> presenters that could be attached and then destroy the device. The
> 'release' methods replace the 'destroy' methods as 'destroy' is not
> called anymore once 'release' is. Compatibility with older QEMU is
> nevertheless maintained.
> 
> This is not considered a safe operation as the vCPUs are still
> running and could be referencing the KVM device through their
> presenters. To protect the system from any breakage, the kvmppc_xive
> objects representing both KVM devices are now stored in an array under
> the VM. Allocation is performed on first usage and memory is freed
> only when the VM exits.
> 
> Signed-off-by: Cédric Le Goater <clg@kaod.org>

Reviewed-by: David Gibson <david@gibson.dropbear.id.au>

Although, one nit below.

> ---
>  arch/powerpc/include/asm/kvm_host.h   |  1 +
>  arch/powerpc/kvm/book3s_xive.h        |  1 +
>  arch/powerpc/kvm/book3s_xive.c        | 82 +++++++++++++++++++++++++--
>  arch/powerpc/kvm/book3s_xive_native.c | 30 ++++++++--
>  arch/powerpc/kvm/powerpc.c            |  9 +++
>  5 files changed, 112 insertions(+), 11 deletions(-)
> 
> diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h
> index 9cc6abdce1b9..ed059c95e56a 100644
> --- a/arch/powerpc/include/asm/kvm_host.h
> +++ b/arch/powerpc/include/asm/kvm_host.h
> @@ -314,6 +314,7 @@ struct kvm_arch {
>  #ifdef CONFIG_KVM_XICS
>  	struct kvmppc_xics *xics;
>  	struct kvmppc_xive *xive;
> +	struct kvmppc_xive *xive_devices[2];

This is essentially a bool-indexed array, which is a fairly confusing
thing.  I think using separate fields with meaningful names would be
better.  It'll mean slightly more code in get_device() but not really
that much, and I think the intent will be clearer.
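
For illustration, the named-field variant could look something like
the userspace sketch below. All identifiers here (sketch_arch,
sketch_dev_type, sketch_get_device, ...) are hypothetical and chosen
only for this example; calloc()/free() stand in for the kernel
allocators.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical device types for this sketch only. */
enum sketch_dev_type {
	SKETCH_DEV_XICS_ON_XIVE,
	SKETCH_DEV_XIVE_NATIVE,
};

/* Stand-in for the per-device state object. */
struct sketch_xive {
	int nr_sources;
};

/* One named pointer per device flavour instead of a two-entry array. */
struct sketch_arch {
	struct sketch_xive *xics_on_xive;
	struct sketch_xive *xive_native;
};

struct sketch_xive *sketch_get_device(struct sketch_arch *arch,
				      enum sketch_dev_type type)
{
	struct sketch_xive **slot;

	/* The switch spells out which field backs which device type. */
	switch (type) {
	case SKETCH_DEV_XICS_ON_XIVE:
		slot = &arch->xics_on_xive;
		break;
	case SKETCH_DEV_XIVE_NATIVE:
		slot = &arch->xive_native;
		break;
	default:
		return NULL;
	}

	if (!*slot)
		*slot = calloc(1, sizeof(**slot)); /* first use */
	else
		memset(*slot, 0, sizeof(**slot));  /* recycled */

	return *slot;
}

/* Teardown frees each field explicitly. */
void sketch_free_devices(struct sketch_arch *arch)
{
	free(arch->xics_on_xive);
	arch->xics_on_xive = NULL;
	free(arch->xive_native);
	arch->xive_native = NULL;
}
```

The cost is a switch statement and two explicit frees at teardown, in
exchange for no longer indexing an array with a boolean.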

>  	struct kvmppc_passthru_irqmap *pimap;
>  #endif
>  	struct kvmppc_ops *kvm_ops;
> diff --git a/arch/powerpc/kvm/book3s_xive.h b/arch/powerpc/kvm/book3s_xive.h
> index e011622dc038..426146332984 100644
> --- a/arch/powerpc/kvm/book3s_xive.h
> +++ b/arch/powerpc/kvm/book3s_xive.h
> @@ -283,6 +283,7 @@ void kvmppc_xive_free_sources(struct kvmppc_xive_src_block *sb);
>  int kvmppc_xive_select_target(struct kvm *kvm, u32 *server, u8 prio);
>  int kvmppc_xive_attach_escalation(struct kvm_vcpu *vcpu, u8 prio,
>  				  bool single_escalation);
> +struct kvmppc_xive *kvmppc_xive_get_device(struct kvm *kvm, u32 type);
>  
>  #endif /* CONFIG_KVM_XICS */
>  #endif /* _KVM_PPC_BOOK3S_XICS_H */
> diff --git a/arch/powerpc/kvm/book3s_xive.c b/arch/powerpc/kvm/book3s_xive.c
> index 480a3fc6b9fd..7dec0c350a14 100644
> --- a/arch/powerpc/kvm/book3s_xive.c
> +++ b/arch/powerpc/kvm/book3s_xive.c
> @@ -1100,11 +1100,19 @@ void kvmppc_xive_disable_vcpu_interrupts(struct kvm_vcpu *vcpu)
>  void kvmppc_xive_cleanup_vcpu(struct kvm_vcpu *vcpu)
>  {
>  	struct kvmppc_xive_vcpu *xc = vcpu->arch.xive_vcpu;
> -	struct kvmppc_xive *xive = xc->xive;
> +	struct kvmppc_xive *xive;
>  	int i;
>  
> +	if (!kvmppc_xics_enabled(vcpu))
> +		return;
> +
> +	if (!xc)
> +		return;
> +
>  	pr_devel("cleanup_vcpu(cpu=%d)\n", xc->server_num);
>  
> +	xive = xc->xive;
> +
>  	/* Ensure no interrupt is still routed to that VP */
>  	xc->valid = false;
>  	kvmppc_xive_disable_vcpu_interrupts(vcpu);
> @@ -1141,6 +1149,10 @@ void kvmppc_xive_cleanup_vcpu(struct kvm_vcpu *vcpu)
>  	}
>  	/* Free the VP */
>  	kfree(xc);
> +
> +	/* Cleanup the vcpu */
> +	vcpu->arch.irq_type = KVMPPC_IRQ_DEFAULT;
> +	vcpu->arch.xive_vcpu = NULL;
>  }
>  
>  int kvmppc_xive_connect_vcpu(struct kvm_device *dev,
> @@ -1158,7 +1170,7 @@ int kvmppc_xive_connect_vcpu(struct kvm_device *dev,
>  	}
>  	if (xive->kvm != vcpu->kvm)
>  		return -EPERM;
> -	if (vcpu->arch.irq_type)
> +	if (vcpu->arch.irq_type != KVMPPC_IRQ_DEFAULT)
>  		return -EBUSY;
>  	if (kvmppc_xive_find_server(vcpu->kvm, cpu)) {
>  		pr_devel("Duplicate !\n");
> @@ -1824,12 +1836,39 @@ void kvmppc_xive_free_sources(struct kvmppc_xive_src_block *sb)
>  	}
>  }
>  
> -static void kvmppc_xive_free(struct kvm_device *dev)
> +/*
> + * Called when device fd is closed
> + */
> +static void kvmppc_xive_release(struct kvm_device *dev)
>  {
>  	struct kvmppc_xive *xive = dev->private;
>  	struct kvm *kvm = xive->kvm;
> +	struct kvm_vcpu *vcpu;
>  	int i;
>  
> +	pr_devel("Releasing xive device\n");
> +
> +	/*
> +	 * When releasing the KVM device fd, the vCPUs can still be
> +	 * running and we should clean up the vCPU interrupt
> +	 * presenters first.
> +	 */
> +	if (atomic_read(&kvm->online_vcpus) != 0) {
> +		/*
> +		 * call kick_all_cpus_sync() to ensure that all CPUs
> +		 * have executed any pending interrupts
> +		 */
> +		if (is_kvmppc_hv_enabled(kvm))
> +			kick_all_cpus_sync();
> +
> +		/*
> +		 * TODO: There is still a race window with the early
> +		 * checks in kvmppc_native_connect_vcpu()
> +		 */
> +		kvm_for_each_vcpu(i, vcpu, kvm)
> +			kvmppc_xive_cleanup_vcpu(vcpu);
> +	}
> +
>  	debugfs_remove(xive->dentry);
>  
>  	if (kvm)
> @@ -1846,11 +1885,42 @@ static void kvmppc_xive_free(struct kvm_device *dev)
>  	if (xive->vp_base != XIVE_INVALID_VP)
>  		xive_native_free_vp_block(xive->vp_base);
>  
> +	/*
> +	 * A reference of the kvmppc_xive pointer is now kept under
> +	 * the xive_devices[] array of the machine for reuse. It is
> +	 * freed when the VM is destroyed for now until we fix all the
> +	 * execution paths.
> +	 */
>  
> -	kfree(xive);
>  	kfree(dev);
>  }
>  
> +/*
> + * When the guest chooses the interrupt mode (XICS legacy or XIVE
> + * native), the VM will switch of KVM device. The previous device will
> + * be "released" before the new one is created.
> + *
> + * Until we are sure all execution paths are well protected, provide a
> + * fail safe (transitional) method for device destruction, in which
> + * the XIVE device pointer is recycled and not directly freed.
> + */
> +struct kvmppc_xive *kvmppc_xive_get_device(struct kvm *kvm, u32 type)
> +{
> +	struct kvmppc_xive *xive;
> +	bool xive_native_index = type == KVM_DEV_TYPE_XIVE;
> +
> +	xive = kvm->arch.xive_devices[xive_native_index];
> +
> +	if (!xive) {
> +		xive = kzalloc(sizeof(*xive), GFP_KERNEL);
> +		kvm->arch.xive_devices[xive_native_index] = xive;
> +	} else {
> +		memset(xive, 0, sizeof(*xive));
> +	}
> +
> +	return xive;
> +}
> +
>  static int kvmppc_xive_create(struct kvm_device *dev, u32 type)
>  {
>  	struct kvmppc_xive *xive;
> @@ -1859,7 +1929,7 @@ static int kvmppc_xive_create(struct kvm_device *dev, u32 type)
>  
>  	pr_devel("Creating xive for partition\n");
>  
> -	xive = kzalloc(sizeof(*xive), GFP_KERNEL);
> +	xive = kvmppc_xive_get_device(kvm, type);
>  	if (!xive)
>  		return -ENOMEM;
>  
> @@ -2024,7 +2094,7 @@ struct kvm_device_ops kvm_xive_ops = {
>  	.name = "kvm-xive",
>  	.create = kvmppc_xive_create,
>  	.init = kvmppc_xive_init,
> -	.destroy = kvmppc_xive_free,
> +	.release = kvmppc_xive_release,
>  	.set_attr = xive_set_attr,
>  	.get_attr = xive_get_attr,
>  	.has_attr = xive_has_attr,
> diff --git a/arch/powerpc/kvm/book3s_xive_native.c b/arch/powerpc/kvm/book3s_xive_native.c
> index 62648f833adf..a99af2766ce3 100644
> --- a/arch/powerpc/kvm/book3s_xive_native.c
> +++ b/arch/powerpc/kvm/book3s_xive_native.c
> @@ -964,15 +964,29 @@ static int kvmppc_xive_native_has_attr(struct kvm_device *dev,
>  	return -ENXIO;
>  }
>  
> -static void kvmppc_xive_native_free(struct kvm_device *dev)
> +/*
> + * Called when device fd is closed
> + */
> +static void kvmppc_xive_native_release(struct kvm_device *dev)
>  {
>  	struct kvmppc_xive *xive = dev->private;
>  	struct kvm *kvm = xive->kvm;
> +	struct kvm_vcpu *vcpu;
>  	int i;
>  
>  	debugfs_remove(xive->dentry);
>  
> -	pr_devel("Destroying xive native device\n");
> +	pr_devel("Releasing xive native device\n");
> +
> +	/*
> +	 * When releasing the KVM device fd, the vCPUs can still be
> +	 * running and we should clean up the vCPU interrupt
> +	 * presenters first.
> +	 */
> +	if (atomic_read(&kvm->online_vcpus) != 0) {
> +		kvm_for_each_vcpu(i, vcpu, kvm)
> +			kvmppc_xive_native_cleanup_vcpu(vcpu);
> +	}
>  
>  	if (kvm)
>  		kvm->arch.xive = NULL;
> @@ -987,7 +1001,13 @@ static void kvmppc_xive_native_free(struct kvm_device *dev)
>  	if (xive->vp_base != XIVE_INVALID_VP)
>  		xive_native_free_vp_block(xive->vp_base);
>  
> -	kfree(xive);
> +	/*
> +	 * A reference of the kvmppc_xive pointer is now kept under
> +	 * the xive_devices[] array of the machine for reuse. It is
> +	 * freed when the VM is destroyed for now until we fix all the
> +	 * execution paths.
> +	 */
> +
>  	kfree(dev);
>  }
>  
> @@ -1002,7 +1022,7 @@ static int kvmppc_xive_native_create(struct kvm_device *dev, u32 type)
>  	if (kvm->arch.xive)
>  		return -EEXIST;
>  
> -	xive = kzalloc(sizeof(*xive), GFP_KERNEL);
> +	xive = kvmppc_xive_get_device(kvm, type);
>  	if (!xive)
>  		return -ENOMEM;
>  
> @@ -1182,7 +1202,7 @@ struct kvm_device_ops kvm_xive_native_ops = {
>  	.name = "kvm-xive-native",
>  	.create = kvmppc_xive_native_create,
>  	.init = kvmppc_xive_native_init,
> -	.destroy = kvmppc_xive_native_free,
> +	.release = kvmppc_xive_native_release,
>  	.set_attr = kvmppc_xive_native_set_attr,
>  	.get_attr = kvmppc_xive_native_get_attr,
>  	.has_attr = kvmppc_xive_native_has_attr,
> diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
> index f54926c78320..9b9c8a9f28b5 100644
> --- a/arch/powerpc/kvm/powerpc.c
> +++ b/arch/powerpc/kvm/powerpc.c
> @@ -501,6 +501,15 @@ void kvm_arch_destroy_vm(struct kvm *kvm)
>  
>  	mutex_unlock(&kvm->lock);
>  
> +	/*
> +	 * Free the XIVE devices which are not directly freed by the
> +	 * device 'release' method
> +	 */
> +	for (i = 0; i < ARRAY_SIZE(kvm->arch.xive_devices); i++) {
> +		kfree(kvm->arch.xive_devices[i]);
> +		kvm->arch.xive_devices[i] = NULL;
> +	}
> +
>  	/* drop the module reference */
>  	module_put(kvm->arch.kvm_ops->owner);
>  }

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson
