linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v3 0/2] s390/vfio-ap: fix memory leak in mdev remove callback
@ 2021-05-19 15:39 Tony Krowiak
  2021-05-19 15:39 ` [PATCH v3 1/2] " Tony Krowiak
  2021-05-19 15:39 ` [PATCH v3 2/2] s390/vfio-ap: control access to PQAP(AQIC) interception handler Tony Krowiak
  0 siblings, 2 replies; 15+ messages in thread
From: Tony Krowiak @ 2021-05-19 15:39 UTC (permalink / raw)
  To: linux-s390, linux-kernel
  Cc: borntraeger, cohuck, pasic, jjherne, jgg, alex.williamson,
	kwankhede, Tony Krowiak

Fixes a memory leak in the mdev remove callback when invoked while the
mdev is in use by a KVM guest. Instead of returning -EBUSY from the
callback, a full cleanup of the resources allocated to the mdev is
performed because regardless of the value returned from the function, the
mdev is removed from sysfs.

The cleanup of resources allocated to the mdev may coincide with the 
interception of the PQAP(AQIC) instruction in which case data needed to
handle the interception may get removed. A patch is included in this series
to synchronize access to resources needed by the interception handler to
protect against invalid memory accesses.

Change log:
v2 -> v3:
--------
* Added a patch to control access to the PQAP(AQIC) hook using RCU

v1 -> v2:
--------
* Call vfio_ap_mdev_unset_kvm() function from the remove callback instead
  of merely clearing the guest's APCB.

Tony Krowiak (2):
  s390/vfio-ap: fix memory leak in mdev remove callback
  s390/vfio-ap: control access to PQAP(AQIC) interception handler

 arch/s390/include/asm/kvm_host.h  |  1 +
 arch/s390/kvm/priv.c              | 47 +++++++++++++---------
 drivers/s390/crypto/vfio_ap_ops.c | 67 +++++++++++++++++++++----------
 3 files changed, 75 insertions(+), 40 deletions(-)

-- 
2.30.2


^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH v3 1/2] s390/vfio-ap: fix memory leak in mdev remove callback
  2021-05-19 15:39 [PATCH v3 0/2] s390/vfio-ap: fix memory leak in mdev remove callback Tony Krowiak
@ 2021-05-19 15:39 ` Tony Krowiak
  2021-05-19 15:39 ` [PATCH v3 2/2] s390/vfio-ap: control access to PQAP(AQIC) interception handler Tony Krowiak
  1 sibling, 0 replies; 15+ messages in thread
From: Tony Krowiak @ 2021-05-19 15:39 UTC (permalink / raw)
  To: linux-s390, linux-kernel
  Cc: borntraeger, cohuck, pasic, jjherne, jgg, alex.williamson,
	kwankhede, Tony Krowiak, stable

The mdev remove callback for the vfio_ap device driver bails out with
-EBUSY if the mdev is in use by a KVM guest. The intended purpose was
to prevent the mdev from being removed while in use; however, returning a
non-zero rc does not prevent removal. This could result in a memory leak
of the resources allocated when the mdev was created. In addition, the
KVM guest will still have access to the AP devices assigned to the mdev
even though the mdev no longer exists.

To prevent this scenario, cleanup will be done - including unplugging the
AP adapters, domains and control domains - regardless of whether the mdev
is in use by a KVM guest or not.

Fixes: 258287c994de ("s390: vfio-ap: implement mediated device open callback")
Cc: stable@vger.kernel.org
Signed-off-by: Tony Krowiak <akrowiak@linux.ibm.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
---
 drivers/s390/crypto/vfio_ap_ops.c | 13 ++-----------
 1 file changed, 2 insertions(+), 11 deletions(-)

diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c
index b2c7e10dfdcd..f90c9103dac2 100644
--- a/drivers/s390/crypto/vfio_ap_ops.c
+++ b/drivers/s390/crypto/vfio_ap_ops.c
@@ -26,6 +26,7 @@
 
 static int vfio_ap_mdev_reset_queues(struct mdev_device *mdev);
 static struct vfio_ap_queue *vfio_ap_find_queue(int apqn);
+static void vfio_ap_mdev_unset_kvm(struct ap_matrix_mdev *matrix_mdev);
 
 static int match_apqn(struct device *dev, const void *data)
 {
@@ -366,17 +367,7 @@ static int vfio_ap_mdev_remove(struct mdev_device *mdev)
 	struct ap_matrix_mdev *matrix_mdev = mdev_get_drvdata(mdev);
 
 	mutex_lock(&matrix_dev->lock);
-
-	/*
-	 * If the KVM pointer is in flux or the guest is running, disallow
-	 * un-assignment of control domain.
-	 */
-	if (matrix_mdev->kvm_busy || matrix_mdev->kvm) {
-		mutex_unlock(&matrix_dev->lock);
-		return -EBUSY;
-	}
-
-	vfio_ap_mdev_reset_queues(mdev);
+	vfio_ap_mdev_unset_kvm(matrix_mdev);
 	list_del(&matrix_mdev->node);
 	kfree(matrix_mdev);
 	mdev_set_drvdata(mdev, NULL);
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH v3 2/2] s390/vfio-ap: control access to PQAP(AQIC) interception handler
  2021-05-19 15:39 [PATCH v3 0/2] s390/vfio-ap: fix memory leak in mdev remove callback Tony Krowiak
  2021-05-19 15:39 ` [PATCH v3 1/2] " Tony Krowiak
@ 2021-05-19 15:39 ` Tony Krowiak
  2021-05-19 16:16   ` Jason Gunthorpe
  2021-05-19 17:21   ` Halil Pasic
  1 sibling, 2 replies; 15+ messages in thread
From: Tony Krowiak @ 2021-05-19 15:39 UTC (permalink / raw)
  To: linux-s390, linux-kernel
  Cc: borntraeger, cohuck, pasic, jjherne, jgg, alex.williamson,
	kwankhede, Tony Krowiak

There is currently nothing that controls access to the structure that
contains the function pointer to the handler that processes interception of
the PQAP(AQIC) instruction. If the mdev is removed while the PQAP(AQIC)
instruction is being intercepted, there is a possibility that the function
pointer to the handler can get wiped out prior to the attempt to call it.

This patch utilizes RCU to synchronize access to the kvm_s390_module_hook
structure used to process interception of the PQAP(AQIC) instruction.

Signed-off-by: Tony Krowiak <akrowiak@linux.ibm.com>
---
 arch/s390/include/asm/kvm_host.h  |  1 +
 arch/s390/kvm/priv.c              | 47 ++++++++++++++++-----------
 drivers/s390/crypto/vfio_ap_ops.c | 54 ++++++++++++++++++++++++-------
 3 files changed, 73 insertions(+), 29 deletions(-)

diff --git a/arch/s390/include/asm/kvm_host.h b/arch/s390/include/asm/kvm_host.h
index 8925f3969478..4987e82d6116 100644
--- a/arch/s390/include/asm/kvm_host.h
+++ b/arch/s390/include/asm/kvm_host.h
@@ -806,6 +806,7 @@ struct kvm_s390_cpu_model {
 struct kvm_s390_module_hook {
 	int (*hook)(struct kvm_vcpu *vcpu);
 	struct module *owner;
+	void *data;
 };
 
 struct kvm_s390_crypto {
diff --git a/arch/s390/kvm/priv.c b/arch/s390/kvm/priv.c
index 9928f785c677..2d330dfbdb61 100644
--- a/arch/s390/kvm/priv.c
+++ b/arch/s390/kvm/priv.c
@@ -610,8 +610,11 @@ static int handle_io_inst(struct kvm_vcpu *vcpu)
 static int handle_pqap(struct kvm_vcpu *vcpu)
 {
 	struct ap_queue_status status = {};
+	struct kvm_s390_module_hook *pqap_module_hook;
+	int (*pqap_hook)(struct kvm_vcpu *vcpu);
+	struct module *owner;
 	unsigned long reg0;
-	int ret;
+	int ret = 0;
 	uint8_t fc;
 
 	/* Verify that the AP instruction are available */
@@ -657,24 +660,32 @@ static int handle_pqap(struct kvm_vcpu *vcpu)
 	 * Verify that the hook callback is registered, lock the owner
 	 * and call the hook.
 	 */
-	if (vcpu->kvm->arch.crypto.pqap_hook) {
-		if (!try_module_get(vcpu->kvm->arch.crypto.pqap_hook->owner))
-			return -EOPNOTSUPP;
-		ret = vcpu->kvm->arch.crypto.pqap_hook->hook(vcpu);
-		module_put(vcpu->kvm->arch.crypto.pqap_hook->owner);
-		if (!ret && vcpu->run->s.regs.gprs[1] & 0x00ff0000)
-			kvm_s390_set_psw_cc(vcpu, 3);
-		return ret;
+	rcu_read_lock();
+	pqap_module_hook = rcu_dereference(vcpu->kvm->arch.crypto.pqap_hook);
+	if (pqap_module_hook) {
+		pqap_hook = pqap_module_hook->hook;
+		owner = pqap_module_hook->owner;
+		rcu_read_unlock();
+		if (!try_module_get(owner)) {
+			ret = -EOPNOTSUPP;
+		} else {
+			ret = pqap_hook(vcpu);
+			module_put(owner);
+			if (!ret && vcpu->run->s.regs.gprs[1] & 0x00ff0000)
+				kvm_s390_set_psw_cc(vcpu, 3);
+		}
+	} else {
+		rcu_read_unlock();
+		/*
+		 * A vfio_driver must register a hook.
+		 * No hook means no driver to enable the SIE CRYCB and no
+		 * queues. We send this response to the guest.
+		 */
+		status.response_code = 0x01;
+		memcpy(&vcpu->run->s.regs.gprs[1], &status, sizeof(status));
+		kvm_s390_set_psw_cc(vcpu, 3);
 	}
-	/*
-	 * A vfio_driver must register a hook.
-	 * No hook means no driver to enable the SIE CRYCB and no queues.
-	 * We send this response to the guest.
-	 */
-	status.response_code = 0x01;
-	memcpy(&vcpu->run->s.regs.gprs[1], &status, sizeof(status));
-	kvm_s390_set_psw_cc(vcpu, 3);
-	return 0;
+	return ret;
 }
 
 static int handle_stfl(struct kvm_vcpu *vcpu)
diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c
index f90c9103dac2..a6aa3f753ac4 100644
--- a/drivers/s390/crypto/vfio_ap_ops.c
+++ b/drivers/s390/crypto/vfio_ap_ops.c
@@ -16,6 +16,7 @@
 #include <linux/bitops.h>
 #include <linux/kvm_host.h>
 #include <linux/module.h>
+#include <linux/rcupdate.h>
 #include <asm/kvm.h>
 #include <asm/zcrypt.h>
 
@@ -279,6 +280,7 @@ static int handle_pqap(struct kvm_vcpu *vcpu)
 	uint64_t status;
 	uint16_t apqn;
 	struct vfio_ap_queue *q;
+	struct kvm_s390_module_hook *pqap_module_hook;
 	struct ap_queue_status qstatus = {
 			       .response_code = AP_RESPONSE_Q_NOT_AVAIL, };
 	struct ap_matrix_mdev *matrix_mdev;
@@ -287,13 +289,17 @@ static int handle_pqap(struct kvm_vcpu *vcpu)
 	if (!(vcpu->arch.sie_block->eca & ECA_AIV))
 		return -EOPNOTSUPP;
 
-	apqn = vcpu->run->s.regs.gprs[0] & 0xffff;
-	mutex_lock(&matrix_dev->lock);
+	rcu_read_lock();
+	pqap_module_hook = rcu_dereference(vcpu->kvm->arch.crypto.pqap_hook);
+	if (!pqap_module_hook) {
+		rcu_read_unlock();
+		goto set_status;
+	}
 
-	if (!vcpu->kvm->arch.crypto.pqap_hook)
-		goto out_unlock;
-	matrix_mdev = container_of(vcpu->kvm->arch.crypto.pqap_hook,
-				   struct ap_matrix_mdev, pqap_hook);
+	matrix_mdev = pqap_module_hook->data;
+	rcu_read_unlock();
+	mutex_lock(&matrix_dev->lock);
+	apqn = vcpu->run->s.regs.gprs[0] & 0xffff;
 
 	/*
 	 * If the KVM pointer is in the process of being set, wait until the
@@ -322,9 +328,10 @@ static int handle_pqap(struct kvm_vcpu *vcpu)
 		qstatus = vfio_ap_irq_disable(q);
 
 out_unlock:
+	mutex_unlock(&matrix_dev->lock);
+set_status:
 	memcpy(&vcpu->run->s.regs.gprs[1], &qstatus, sizeof(qstatus));
 	vcpu->run->s.regs.gprs[1] >>= 32;
-	mutex_unlock(&matrix_dev->lock);
 	return 0;
 }
 
@@ -353,8 +360,6 @@ static int vfio_ap_mdev_create(struct mdev_device *mdev)
 	vfio_ap_matrix_init(&matrix_dev->info, &matrix_mdev->matrix);
 	init_waitqueue_head(&matrix_mdev->wait_for_kvm);
 	mdev_set_drvdata(mdev, matrix_mdev);
-	matrix_mdev->pqap_hook.hook = handle_pqap;
-	matrix_mdev->pqap_hook.owner = THIS_MODULE;
 	mutex_lock(&matrix_dev->lock);
 	list_add(&matrix_mdev->node, &matrix_dev->mdev_list);
 	mutex_unlock(&matrix_dev->lock);
@@ -1085,6 +1090,22 @@ static const struct attribute_group *vfio_ap_mdev_attr_groups[] = {
 	NULL
 };
 
+static int vfio_ap_mdev_set_pqap_hook(struct ap_matrix_mdev *matrix_mdev,
+				       struct kvm *kvm)
+{
+	struct kvm_s390_module_hook *pqap_hook;
+
+	pqap_hook = kmalloc(sizeof(*kvm->arch.crypto.pqap_hook), GFP_KERNEL);
+	if (!pqap_hook)
+		return -ENOMEM;
+	pqap_hook->data = matrix_mdev;
+	pqap_hook->hook = handle_pqap;
+	pqap_hook->owner = THIS_MODULE;
+	rcu_assign_pointer(kvm->arch.crypto.pqap_hook, pqap_hook);
+
+	return 0;
+}
+
 /**
  * vfio_ap_mdev_set_kvm
  *
@@ -1107,6 +1128,7 @@ static const struct attribute_group *vfio_ap_mdev_attr_groups[] = {
 static int vfio_ap_mdev_set_kvm(struct ap_matrix_mdev *matrix_mdev,
 				struct kvm *kvm)
 {
+	int ret;
 	struct ap_matrix_mdev *m;
 
 	if (kvm->arch.crypto.crycbd) {
@@ -1115,6 +1137,10 @@ static int vfio_ap_mdev_set_kvm(struct ap_matrix_mdev *matrix_mdev,
 				return -EPERM;
 		}
 
+		ret = vfio_ap_mdev_set_pqap_hook(matrix_mdev, kvm);
+		if (ret)
+			return ret;
+
 		kvm_get_kvm(kvm);
 		matrix_mdev->kvm_busy = true;
 		mutex_unlock(&matrix_dev->lock);
@@ -1123,7 +1149,6 @@ static int vfio_ap_mdev_set_kvm(struct ap_matrix_mdev *matrix_mdev,
 					  matrix_mdev->matrix.aqm,
 					  matrix_mdev->matrix.adm);
 		mutex_lock(&matrix_dev->lock);
-		kvm->arch.crypto.pqap_hook = &matrix_mdev->pqap_hook;
 		matrix_mdev->kvm = kvm;
 		matrix_mdev->kvm_busy = false;
 		wake_up_all(&matrix_mdev->wait_for_kvm);
@@ -1161,6 +1186,13 @@ static int vfio_ap_mdev_iommu_notifier(struct notifier_block *nb,
 	return NOTIFY_DONE;
 }
 
+static void vfio_ap_mdev_unset_pqap_hook(struct kvm *kvm)
+{
+	rcu_assign_pointer(kvm->arch.crypto.pqap_hook, NULL);
+	synchronize_rcu();
+	kfree(kvm->arch.crypto.pqap_hook);
+}
+
 /**
  * vfio_ap_mdev_unset_kvm
  *
@@ -1189,11 +1221,11 @@ static void vfio_ap_mdev_unset_kvm(struct ap_matrix_mdev *matrix_mdev)
 
 	if (matrix_mdev->kvm) {
 		matrix_mdev->kvm_busy = true;
+		vfio_ap_mdev_unset_pqap_hook(matrix_mdev->kvm);
 		mutex_unlock(&matrix_dev->lock);
 		kvm_arch_crypto_clear_masks(matrix_mdev->kvm);
 		mutex_lock(&matrix_dev->lock);
 		vfio_ap_mdev_reset_queues(matrix_mdev->mdev);
-		matrix_mdev->kvm->arch.crypto.pqap_hook = NULL;
 		kvm_put_kvm(matrix_mdev->kvm);
 		matrix_mdev->kvm = NULL;
 		matrix_mdev->kvm_busy = false;
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [PATCH v3 2/2] s390/vfio-ap: control access to PQAP(AQIC) interception handler
  2021-05-19 15:39 ` [PATCH v3 2/2] s390/vfio-ap: control access to PQAP(AQIC) interception handler Tony Krowiak
@ 2021-05-19 16:16   ` Jason Gunthorpe
  2021-05-19 23:04     ` Tony Krowiak
  2021-05-19 17:21   ` Halil Pasic
  1 sibling, 1 reply; 15+ messages in thread
From: Jason Gunthorpe @ 2021-05-19 16:16 UTC (permalink / raw)
  To: Tony Krowiak
  Cc: linux-s390, linux-kernel, borntraeger, cohuck, pasic, jjherne,
	alex.williamson, kwankhede

On Wed, May 19, 2021 at 11:39:21AM -0400, Tony Krowiak wrote:

> @@ -287,13 +289,17 @@ static int handle_pqap(struct kvm_vcpu *vcpu)
>  	if (!(vcpu->arch.sie_block->eca & ECA_AIV))
>  		return -EOPNOTSUPP;
>  
> -	apqn = vcpu->run->s.regs.gprs[0] & 0xffff;
> -	mutex_lock(&matrix_dev->lock);
> +	rcu_read_lock();
> +	pqap_module_hook = rcu_dereference(vcpu->kvm->arch.crypto.pqap_hook);
> +	if (!pqap_module_hook) {
> +		rcu_read_unlock();
> +		goto set_status;
> +	}
>  
> -	if (!vcpu->kvm->arch.crypto.pqap_hook)
> -		goto out_unlock;
> -	matrix_mdev = container_of(vcpu->kvm->arch.crypto.pqap_hook,
> -				   struct ap_matrix_mdev, pqap_hook);
> +	matrix_mdev = pqap_module_hook->data;
> +	rcu_read_unlock();
> +	mutex_lock(&matrix_dev->lock);

The matrix_mdev pointer was extracted from the pqap_module_hook,
but now there is nothing protecting it since the rcu was dropped and
it gets freed in vfio_ap_mdev_remove.

And, again, module locking doesn't prevent vfio_ap_mdev_remove() from
being called. None of these patches should be combining module locking
with RCU.

Jason

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v3 2/2] s390/vfio-ap: control access to PQAP(AQIC) interception handler
  2021-05-19 15:39 ` [PATCH v3 2/2] s390/vfio-ap: control access to PQAP(AQIC) interception handler Tony Krowiak
  2021-05-19 16:16   ` Jason Gunthorpe
@ 2021-05-19 17:21   ` Halil Pasic
  2021-05-19 23:14     ` Tony Krowiak
  1 sibling, 1 reply; 15+ messages in thread
From: Halil Pasic @ 2021-05-19 17:21 UTC (permalink / raw)
  To: Tony Krowiak
  Cc: linux-s390, linux-kernel, borntraeger, cohuck, pasic, jjherne,
	jgg, alex.williamson, kwankhede

On Wed, 19 May 2021 11:39:21 -0400
Tony Krowiak <akrowiak@linux.ibm.com> wrote:

> There is currently nothing that controls access to the structure that
> contains the function pointer to the handler that processes interception of
> the PQAP(AQIC) instruction. If the mdev is removed while the PQAP(AQIC)
> instruction is being intercepted, there is a possibility that the function
> pointer to the handler can get wiped out prior to the attempt to call it.
> 
> This patch utilizes RCU to synchronize access to the kvm_s390_module_hook
> structure used to process interception of the PQAP(AQIC) instruction.
> 
> Signed-off-by: Tony Krowiak <akrowiak@linux.ibm.com>
> ---
>  arch/s390/include/asm/kvm_host.h  |  1 +
>  arch/s390/kvm/priv.c              | 47 ++++++++++++++++-----------
>  drivers/s390/crypto/vfio_ap_ops.c | 54 ++++++++++++++++++++++++-------
>  3 files changed, 73 insertions(+), 29 deletions(-)
> 
> diff --git a/arch/s390/include/asm/kvm_host.h b/arch/s390/include/asm/kvm_host.h
> index 8925f3969478..4987e82d6116 100644
> --- a/arch/s390/include/asm/kvm_host.h
> +++ b/arch/s390/include/asm/kvm_host.h
> @@ -806,6 +806,7 @@ struct kvm_s390_cpu_model {
>  struct kvm_s390_module_hook {
>  	int (*hook)(struct kvm_vcpu *vcpu);
>  	struct module *owner;
> +	void *data;

I guess you need this, because you stopped using the member of struct
ap_mdev_matrix and instead you kzalloc() a new object. Yet I don't
understand why do you do so?

>  };
>  
>  struct kvm_s390_crypto {
> diff --git a/arch/s390/kvm/priv.c b/arch/s390/kvm/priv.c
> index 9928f785c677..2d330dfbdb61 100644
> --- a/arch/s390/kvm/priv.c
> +++ b/arch/s390/kvm/priv.c
> @@ -610,8 +610,11 @@ static int handle_io_inst(struct kvm_vcpu *vcpu)
>  static int handle_pqap(struct kvm_vcpu *vcpu)
>  {
>  	struct ap_queue_status status = {};
> +	struct kvm_s390_module_hook *pqap_module_hook;
> +	int (*pqap_hook)(struct kvm_vcpu *vcpu);
> +	struct module *owner;
>  	unsigned long reg0;
> -	int ret;
> +	int ret = 0;
>  	uint8_t fc;
>  
>  	/* Verify that the AP instruction are available */
> @@ -657,24 +660,32 @@ static int handle_pqap(struct kvm_vcpu *vcpu)
>  	 * Verify that the hook callback is registered, lock the owner
>  	 * and call the hook.
>  	 */
> -	if (vcpu->kvm->arch.crypto.pqap_hook) {
> -		if (!try_module_get(vcpu->kvm->arch.crypto.pqap_hook->owner))
> -			return -EOPNOTSUPP;
> -		ret = vcpu->kvm->arch.crypto.pqap_hook->hook(vcpu);
> -		module_put(vcpu->kvm->arch.crypto.pqap_hook->owner);
> -		if (!ret && vcpu->run->s.regs.gprs[1] & 0x00ff0000)
> -			kvm_s390_set_psw_cc(vcpu, 3);
> -		return ret;
> +	rcu_read_lock();
> +	pqap_module_hook = rcu_dereference(vcpu->kvm->arch.crypto.pqap_hook);
> +	if (pqap_module_hook) {
> +		pqap_hook = pqap_module_hook->hook;
> +		owner = pqap_module_hook->owner;
> +		rcu_read_unlock();
> +		if (!try_module_get(owner)) {

Why do this outside the rcu_read lock?

What guarantees that the module ain't gone by this time? I don't think
try_module_get() is guaranteed to give you false if passed in a pointer
that points to some memory that ain't a struct module any more
(use-after-free).

> +			ret = -EOPNOTSUPP;
> +		} else {
> +			ret = pqap_hook(vcpu);
> +			module_put(owner);
> +			if (!ret && vcpu->run->s.regs.gprs[1] & 0x00ff0000)
> +				kvm_s390_set_psw_cc(vcpu, 3);
> +		}
> +	} else {
> +		rcu_read_unlock();
> +		/*
> +		 * A vfio_driver must register a hook.
> +		 * No hook means no driver to enable the SIE CRYCB and no
> +		 * queues. We send this response to the guest.
> +		 */
> +		status.response_code = 0x01;
> +		memcpy(&vcpu->run->s.regs.gprs[1], &status, sizeof(status));
> +		kvm_s390_set_psw_cc(vcpu, 3);
>  	}
> -	/*
> -	 * A vfio_driver must register a hook.
> -	 * No hook means no driver to enable the SIE CRYCB and no queues.
> -	 * We send this response to the guest.
> -	 */
> -	status.response_code = 0x01;
> -	memcpy(&vcpu->run->s.regs.gprs[1], &status, sizeof(status));
> -	kvm_s390_set_psw_cc(vcpu, 3);
> -	return 0;
> +	return ret;
>  }
>  
>  static int handle_stfl(struct kvm_vcpu *vcpu)
> diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c
> index f90c9103dac2..a6aa3f753ac4 100644
> --- a/drivers/s390/crypto/vfio_ap_ops.c
> +++ b/drivers/s390/crypto/vfio_ap_ops.c
> @@ -16,6 +16,7 @@
>  #include <linux/bitops.h>
>  #include <linux/kvm_host.h>
>  #include <linux/module.h>
> +#include <linux/rcupdate.h>
>  #include <asm/kvm.h>
>  #include <asm/zcrypt.h>
>  
> @@ -279,6 +280,7 @@ static int handle_pqap(struct kvm_vcpu *vcpu)
>  	uint64_t status;
>  	uint16_t apqn;
>  	struct vfio_ap_queue *q;
> +	struct kvm_s390_module_hook *pqap_module_hook;
>  	struct ap_queue_status qstatus = {
>  			       .response_code = AP_RESPONSE_Q_NOT_AVAIL, };
>  	struct ap_matrix_mdev *matrix_mdev;
> @@ -287,13 +289,17 @@ static int handle_pqap(struct kvm_vcpu *vcpu)
>  	if (!(vcpu->arch.sie_block->eca & ECA_AIV))
>  		return -EOPNOTSUPP;
>  
> -	apqn = vcpu->run->s.regs.gprs[0] & 0xffff;
> -	mutex_lock(&matrix_dev->lock);
> +	rcu_read_lock();
> +	pqap_module_hook = rcu_dereference(vcpu->kvm->arch.crypto.pqap_hook);
> +	if (!pqap_module_hook) {
> +		rcu_read_unlock();
> +		goto set_status;
> +	}
>  
> -	if (!vcpu->kvm->arch.crypto.pqap_hook)
> -		goto out_unlock;
> -	matrix_mdev = container_of(vcpu->kvm->arch.crypto.pqap_hook,
> -				   struct ap_matrix_mdev, pqap_hook);
> +	matrix_mdev = pqap_module_hook->data;
> +	rcu_read_unlock();
> +	mutex_lock(&matrix_dev->lock);

I agree with Jason's assessment. At this point the matrix_dev pointer
may point to garbage.

Above, I think we can use the pqap_hook function pointer local to
handle_pqap, because we know that as long as the module is there
the callback will sit at the same address and won't go away. And
we do the try_module_get() to ensure that the module stays loaded.


> +	apqn = vcpu->run->s.regs.gprs[0] & 0xffff;
>  
>  	/*
>  	 * If the KVM pointer is in the process of being set, wait until the
> @@ -322,9 +328,10 @@ static int handle_pqap(struct kvm_vcpu *vcpu)
>  		qstatus = vfio_ap_irq_disable(q);
>  
>  out_unlock:
> +	mutex_unlock(&matrix_dev->lock);
> +set_status:
>  	memcpy(&vcpu->run->s.regs.gprs[1], &qstatus, sizeof(qstatus));
>  	vcpu->run->s.regs.gprs[1] >>= 32;
> -	mutex_unlock(&matrix_dev->lock);
>  	return 0;
>  }
>  
> @@ -353,8 +360,6 @@ static int vfio_ap_mdev_create(struct mdev_device *mdev)
>  	vfio_ap_matrix_init(&matrix_dev->info, &matrix_mdev->matrix);
>  	init_waitqueue_head(&matrix_mdev->wait_for_kvm);
>  	mdev_set_drvdata(mdev, matrix_mdev);
> -	matrix_mdev->pqap_hook.hook = handle_pqap;
> -	matrix_mdev->pqap_hook.owner = THIS_MODULE;

I guess the member of struct ap_matrix_mdev is still around, it will
remain all zero. Is this somehow intentional?


>  	mutex_lock(&matrix_dev->lock);
>  	list_add(&matrix_mdev->node, &matrix_dev->mdev_list);
>  	mutex_unlock(&matrix_dev->lock);
> @@ -1085,6 +1090,22 @@ static const struct attribute_group *vfio_ap_mdev_attr_groups[] = {
>  	NULL
>  };
>  
> +static int vfio_ap_mdev_set_pqap_hook(struct ap_matrix_mdev *matrix_mdev,
> +				       struct kvm *kvm)
> +{
> +	struct kvm_s390_module_hook *pqap_hook;
> +
> +	pqap_hook = kmalloc(sizeof(*kvm->arch.crypto.pqap_hook), GFP_KERNEL);

What is the extra allocation supposed to buy us?

> +	if (!pqap_hook)
> +		return -ENOMEM;
> +	pqap_hook->data = matrix_mdev;
> +	pqap_hook->hook = handle_pqap;
> +	pqap_hook->owner = THIS_MODULE;
> +	rcu_assign_pointer(kvm->arch.crypto.pqap_hook, pqap_hook);
> +
> +	return 0;
> +}
> +
>  /**
>   * vfio_ap_mdev_set_kvm
>   *
> @@ -1107,6 +1128,7 @@ static const struct attribute_group *vfio_ap_mdev_attr_groups[] = {
>  static int vfio_ap_mdev_set_kvm(struct ap_matrix_mdev *matrix_mdev,
>  				struct kvm *kvm)
>  {
> +	int ret;
>  	struct ap_matrix_mdev *m;
>  
>  	if (kvm->arch.crypto.crycbd) {
> @@ -1115,6 +1137,10 @@ static int vfio_ap_mdev_set_kvm(struct ap_matrix_mdev *matrix_mdev,
>  				return -EPERM;
>  		}
>  
> +		ret = vfio_ap_mdev_set_pqap_hook(matrix_mdev, kvm);
> +		if (ret)
> +			return ret;
> +
>  		kvm_get_kvm(kvm);
>  		matrix_mdev->kvm_busy = true;
>  		mutex_unlock(&matrix_dev->lock);
> @@ -1123,7 +1149,6 @@ static int vfio_ap_mdev_set_kvm(struct ap_matrix_mdev *matrix_mdev,
>  					  matrix_mdev->matrix.aqm,
>  					  matrix_mdev->matrix.adm);
>  		mutex_lock(&matrix_dev->lock);
> -		kvm->arch.crypto.pqap_hook = &matrix_mdev->pqap_hook;
>  		matrix_mdev->kvm = kvm;
>  		matrix_mdev->kvm_busy = false;
>  		wake_up_all(&matrix_mdev->wait_for_kvm);
> @@ -1161,6 +1186,13 @@ static int vfio_ap_mdev_iommu_notifier(struct notifier_block *nb,
>  	return NOTIFY_DONE;
>  }
>  
> +static void vfio_ap_mdev_unset_pqap_hook(struct kvm *kvm)
> +{
> +	rcu_assign_pointer(kvm->arch.crypto.pqap_hook, NULL);
> +	synchronize_rcu();
> +	kfree(kvm->arch.crypto.pqap_hook);
> +}
> +
>  /**
>   * vfio_ap_mdev_unset_kvm
>   *
> @@ -1189,11 +1221,11 @@ static void vfio_ap_mdev_unset_kvm(struct ap_matrix_mdev *matrix_mdev)
>  
>  	if (matrix_mdev->kvm) {
>  		matrix_mdev->kvm_busy = true;
> +		vfio_ap_mdev_unset_pqap_hook(matrix_mdev->kvm);
>  		mutex_unlock(&matrix_dev->lock);
>  		kvm_arch_crypto_clear_masks(matrix_mdev->kvm);
>  		mutex_lock(&matrix_dev->lock);
>  		vfio_ap_mdev_reset_queues(matrix_mdev->mdev);
> -		matrix_mdev->kvm->arch.crypto.pqap_hook = NULL;
>  		kvm_put_kvm(matrix_mdev->kvm);
>  		matrix_mdev->kvm = NULL;
>  		matrix_mdev->kvm_busy = false;


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v3 2/2] s390/vfio-ap: control access to PQAP(AQIC) interception handler
  2021-05-19 16:16   ` Jason Gunthorpe
@ 2021-05-19 23:04     ` Tony Krowiak
  2021-05-19 23:22       ` Jason Gunthorpe
  0 siblings, 1 reply; 15+ messages in thread
From: Tony Krowiak @ 2021-05-19 23:04 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: linux-s390, linux-kernel, borntraeger, cohuck, pasic, jjherne,
	alex.williamson, kwankhede



On 5/19/21 12:16 PM, Jason Gunthorpe wrote:
> On Wed, May 19, 2021 at 11:39:21AM -0400, Tony Krowiak wrote:
>
>> @@ -287,13 +289,17 @@ static int handle_pqap(struct kvm_vcpu *vcpu)
>>   	if (!(vcpu->arch.sie_block->eca & ECA_AIV))
>>   		return -EOPNOTSUPP;
>>   
>> -	apqn = vcpu->run->s.regs.gprs[0] & 0xffff;
>> -	mutex_lock(&matrix_dev->lock);
>> +	rcu_read_lock();
>> +	pqap_module_hook = rcu_dereference(vcpu->kvm->arch.crypto.pqap_hook);
>> +	if (!pqap_module_hook) {
>> +		rcu_read_unlock();
>> +		goto set_status;
>> +	}
>>   
>> -	if (!vcpu->kvm->arch.crypto.pqap_hook)
>> -		goto out_unlock;
>> -	matrix_mdev = container_of(vcpu->kvm->arch.crypto.pqap_hook,
>> -				   struct ap_matrix_mdev, pqap_hook);
>> +	matrix_mdev = pqap_module_hook->data;
>> +	rcu_read_unlock();
>> +	mutex_lock(&matrix_dev->lock);
> The matrix_mdev pointer was extracted from the pqap_module_hook,
> but now there is nothing protecting it since the rcu was dropped and
> it gets freed in vfio_ap_mdev_remove.

Therein lies the rub. We can't hold the rcu_read_lock across the
entire time that the interception is being processed because of
wait conditions in the interception handler. Regardless of whether
the pointer to the matrix_mdev is retrieved as the container of
or extracted from the pqap_hook, there is nothing protecting it
and there appears to be no way to do so using RCU.
>
> And, again, module locking doesn't prevent vfio_ap_mdev_remove() from
> being called. None of these patches should be combining module locking
> with RCU.

Is there any other way besides user interaction with the mdev's
sysfs remove interface for the remove callback to get invoked?
If I try to remove the mdev using the sysfs interface while the
mdev fd is still open by the guest, the remove hangs until the
fd is closed. That being the case, the mdev release callback
will get invoked prior to the remove callback being invoked which
renders this whole debate moot. What am I missing here?

>
> Jason


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v3 2/2] s390/vfio-ap: control access to PQAP(AQIC) interception handler
  2021-05-19 17:21   ` Halil Pasic
@ 2021-05-19 23:14     ` Tony Krowiak
  0 siblings, 0 replies; 15+ messages in thread
From: Tony Krowiak @ 2021-05-19 23:14 UTC (permalink / raw)
  To: Halil Pasic
  Cc: linux-s390, linux-kernel, borntraeger, cohuck, pasic, jjherne,
	jgg, alex.williamson, kwankhede



On 5/19/21 1:21 PM, Halil Pasic wrote:
> On Wed, 19 May 2021 11:39:21 -0400
> Tony Krowiak <akrowiak@linux.ibm.com> wrote:
>
>> There is currently nothing that controls access to the structure that
>> contains the function pointer to the handler that processes interception of
>> the PQAP(AQIC) instruction. If the mdev is removed while the PQAP(AQIC)
>> instruction is being intercepted, there is a possibility that the function
>> pointer to the handler can get wiped out prior to the attempt to call it.
>>
>> This patch utilizes RCU to synchronize access to the kvm_s390_module_hook
>> structure used to process interception of the PQAP(AQIC) instruction.
>>
>> Signed-off-by: Tony Krowiak <akrowiak@linux.ibm.com>
>> ---
>>   arch/s390/include/asm/kvm_host.h  |  1 +
>>   arch/s390/kvm/priv.c              | 47 ++++++++++++++++-----------
>>   drivers/s390/crypto/vfio_ap_ops.c | 54 ++++++++++++++++++++++++-------
>>   3 files changed, 73 insertions(+), 29 deletions(-)
>>
>> diff --git a/arch/s390/include/asm/kvm_host.h b/arch/s390/include/asm/kvm_host.h
>> index 8925f3969478..4987e82d6116 100644
>> --- a/arch/s390/include/asm/kvm_host.h
>> +++ b/arch/s390/include/asm/kvm_host.h
>> @@ -806,6 +806,7 @@ struct kvm_s390_cpu_model {
>>   struct kvm_s390_module_hook {
>>   	int (*hook)(struct kvm_vcpu *vcpu);
>>   	struct module *owner;
>> +	void *data;
> I guess you need this, because you stopped using the member of struct
> ap_mdev_matrix and instead you kzalloc() a new object. Yet I don't
> understand why do you do so?

I did so because the mdev remove callback frees the matrix_mdev
and I thought I could protect against accessing freed storage by
storing the pointer in the pqap_hook structure; however, since we
can't hold the rcu_read_lock while the handle_pqap is executing - due
to wait conditions - this turns out to be a bad idea. I'm not sure at
this point that we can use RCU for this because the freeing of the
matrix_mdev is independent of pqap processing.

>
>>   };
>>   
>>   struct kvm_s390_crypto {
>> diff --git a/arch/s390/kvm/priv.c b/arch/s390/kvm/priv.c
>> index 9928f785c677..2d330dfbdb61 100644
>> --- a/arch/s390/kvm/priv.c
>> +++ b/arch/s390/kvm/priv.c
>> @@ -610,8 +610,11 @@ static int handle_io_inst(struct kvm_vcpu *vcpu)
>>   static int handle_pqap(struct kvm_vcpu *vcpu)
>>   {
>>   	struct ap_queue_status status = {};
>> +	struct kvm_s390_module_hook *pqap_module_hook;
>> +	int (*pqap_hook)(struct kvm_vcpu *vcpu);
>> +	struct module *owner;
>>   	unsigned long reg0;
>> -	int ret;
>> +	int ret = 0;
>>   	uint8_t fc;
>>   
>>   	/* Verify that the AP instruction are available */
>> @@ -657,24 +660,32 @@ static int handle_pqap(struct kvm_vcpu *vcpu)
>>   	 * Verify that the hook callback is registered, lock the owner
>>   	 * and call the hook.
>>   	 */
>> -	if (vcpu->kvm->arch.crypto.pqap_hook) {
>> -		if (!try_module_get(vcpu->kvm->arch.crypto.pqap_hook->owner))
>> -			return -EOPNOTSUPP;
>> -		ret = vcpu->kvm->arch.crypto.pqap_hook->hook(vcpu);
>> -		module_put(vcpu->kvm->arch.crypto.pqap_hook->owner);
>> -		if (!ret && vcpu->run->s.regs.gprs[1] & 0x00ff0000)
>> -			kvm_s390_set_psw_cc(vcpu, 3);
>> -		return ret;
>> +	rcu_read_lock();
>> +	pqap_module_hook = rcu_dereference(vcpu->kvm->arch.crypto.pqap_hook);
>> +	if (pqap_module_hook) {
>> +		pqap_hook = pqap_module_hook->hook;
>> +		owner = pqap_module_hook->owner;
>> +		rcu_read_unlock();
>> +		if (!try_module_get(owner)) {
> Why do this outside the rcu_read lock?

It should be done inside the lock.

>
> What guarantees that the module ain't gone by this time? I don't think
> try_module_get() is guaranteed to give you false if passed in a pointer
> that points to some memory that ain't a struct module any more
> (use-after-free).

That needs to be inside the lock.

>
>> +			ret = -EOPNOTSUPP;
>> +		} else {
>> +			ret = pqap_hook(vcpu);
>> +			module_put(owner);
>> +			if (!ret && vcpu->run->s.regs.gprs[1] & 0x00ff0000)
>> +				kvm_s390_set_psw_cc(vcpu, 3);
>> +		}
>> +	} else {
>> +		rcu_read_unlock();
>> +		/*
>> +		 * A vfio_driver must register a hook.
>> +		 * No hook means no driver to enable the SIE CRYCB and no
>> +		 * queues. We send this response to the guest.
>> +		 */
>> +		status.response_code = 0x01;
>> +		memcpy(&vcpu->run->s.regs.gprs[1], &status, sizeof(status));
>> +		kvm_s390_set_psw_cc(vcpu, 3);
>>   	}
>> -	/*
>> -	 * A vfio_driver must register a hook.
>> -	 * No hook means no driver to enable the SIE CRYCB and no queues.
>> -	 * We send this response to the guest.
>> -	 */
>> -	status.response_code = 0x01;
>> -	memcpy(&vcpu->run->s.regs.gprs[1], &status, sizeof(status));
>> -	kvm_s390_set_psw_cc(vcpu, 3);
>> -	return 0;
>> +	return ret;
>>   }
>>   
>>   static int handle_stfl(struct kvm_vcpu *vcpu)
>> diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c
>> index f90c9103dac2..a6aa3f753ac4 100644
>> --- a/drivers/s390/crypto/vfio_ap_ops.c
>> +++ b/drivers/s390/crypto/vfio_ap_ops.c
>> @@ -16,6 +16,7 @@
>>   #include <linux/bitops.h>
>>   #include <linux/kvm_host.h>
>>   #include <linux/module.h>
>> +#include <linux/rcupdate.h>
>>   #include <asm/kvm.h>
>>   #include <asm/zcrypt.h>
>>   
>> @@ -279,6 +280,7 @@ static int handle_pqap(struct kvm_vcpu *vcpu)
>>   	uint64_t status;
>>   	uint16_t apqn;
>>   	struct vfio_ap_queue *q;
>> +	struct kvm_s390_module_hook *pqap_module_hook;
>>   	struct ap_queue_status qstatus = {
>>   			       .response_code = AP_RESPONSE_Q_NOT_AVAIL, };
>>   	struct ap_matrix_mdev *matrix_mdev;
>> @@ -287,13 +289,17 @@ static int handle_pqap(struct kvm_vcpu *vcpu)
>>   	if (!(vcpu->arch.sie_block->eca & ECA_AIV))
>>   		return -EOPNOTSUPP;
>>   
>> -	apqn = vcpu->run->s.regs.gprs[0] & 0xffff;
>> -	mutex_lock(&matrix_dev->lock);
>> +	rcu_read_lock();
>> +	pqap_module_hook = rcu_dereference(vcpu->kvm->arch.crypto.pqap_hook);
>> +	if (!pqap_module_hook) {
>> +		rcu_read_unlock();
>> +		goto set_status;
>> +	}
>>   
>> -	if (!vcpu->kvm->arch.crypto.pqap_hook)
>> -		goto out_unlock;
>> -	matrix_mdev = container_of(vcpu->kvm->arch.crypto.pqap_hook,
>> -				   struct ap_matrix_mdev, pqap_hook);
>> +	matrix_mdev = pqap_module_hook->data;
>> +	rcu_read_unlock();
>> +	mutex_lock(&matrix_dev->lock);
> I agree with Jason's assessment. At this point the matrix_dev pointer
> may point to garbage.
>
> Above, I think we can use the pqap_hook function pointer local to
> handle_pqap, because we know that as long as the module is there
> the callback will sit at the same address and won't go away. And
> we do the try_module_get() to ensure that the module stays loaded.

I'll take your word that is true. If so, then I think I can replace
the way we get the matrix_mdev in the pqap handler which
can be done under the matrix_dev->lock. I can write a function
to get the matrix from the list of mdevs in matrix_dev. If the
matrix_mdev has been removed, then we can bail out of the
handle_pqap function. Maybe the RCU can be used after all.

>
>
>> +	apqn = vcpu->run->s.regs.gprs[0] & 0xffff;
>>   
>>   	/*
>>   	 * If the KVM pointer is in the process of being set, wait until the
>> @@ -322,9 +328,10 @@ static int handle_pqap(struct kvm_vcpu *vcpu)
>>   		qstatus = vfio_ap_irq_disable(q);
>>   
>>   out_unlock:
>> +	mutex_unlock(&matrix_dev->lock);
>> +set_status:
>>   	memcpy(&vcpu->run->s.regs.gprs[1], &qstatus, sizeof(qstatus));
>>   	vcpu->run->s.regs.gprs[1] >>= 32;
>> -	mutex_unlock(&matrix_dev->lock);
>>   	return 0;
>>   }
>>   
>> @@ -353,8 +360,6 @@ static int vfio_ap_mdev_create(struct mdev_device *mdev)
>>   	vfio_ap_matrix_init(&matrix_dev->info, &matrix_mdev->matrix);
>>   	init_waitqueue_head(&matrix_mdev->wait_for_kvm);
>>   	mdev_set_drvdata(mdev, matrix_mdev);
>> -	matrix_mdev->pqap_hook.hook = handle_pqap;
>> -	matrix_mdev->pqap_hook.owner = THIS_MODULE;
> I guess the member of struct ap_matrix_mdev is still around, it will
> remain all zero. Is this somehow intentional?
>
>
>>   	mutex_lock(&matrix_dev->lock);
>>   	list_add(&matrix_mdev->node, &matrix_dev->mdev_list);
>>   	mutex_unlock(&matrix_dev->lock);
>> @@ -1085,6 +1090,22 @@ static const struct attribute_group *vfio_ap_mdev_attr_groups[] = {
>>   	NULL
>>   };
>>   
>> +static int vfio_ap_mdev_set_pqap_hook(struct ap_matrix_mdev *matrix_mdev,
>> +				       struct kvm *kvm)
>> +{
>> +	struct kvm_s390_module_hook *pqap_hook;
>> +
>> +	pqap_hook = kmalloc(sizeof(*kvm->arch.crypto.pqap_hook), GFP_KERNEL);
> What is the extra allocation supposed to buy us?
>
>> +	if (!pqap_hook)
>> +		return -ENOMEM;
>> +	pqap_hook->data = matrix_mdev;
>> +	pqap_hook->hook = handle_pqap;
>> +	pqap_hook->owner = THIS_MODULE;
>> +	rcu_assign_pointer(kvm->arch.crypto.pqap_hook, pqap_hook);
>> +
>> +	return 0;
>> +}
>> +
>>   /**
>>    * vfio_ap_mdev_set_kvm
>>    *
>> @@ -1107,6 +1128,7 @@ static const struct attribute_group *vfio_ap_mdev_attr_groups[] = {
>>   static int vfio_ap_mdev_set_kvm(struct ap_matrix_mdev *matrix_mdev,
>>   				struct kvm *kvm)
>>   {
>> +	int ret;
>>   	struct ap_matrix_mdev *m;
>>   
>>   	if (kvm->arch.crypto.crycbd) {
>> @@ -1115,6 +1137,10 @@ static int vfio_ap_mdev_set_kvm(struct ap_matrix_mdev *matrix_mdev,
>>   				return -EPERM;
>>   		}
>>   
>> +		ret = vfio_ap_mdev_set_pqap_hook(matrix_mdev, kvm);
>> +		if (ret)
>> +			return ret;
>> +
>>   		kvm_get_kvm(kvm);
>>   		matrix_mdev->kvm_busy = true;
>>   		mutex_unlock(&matrix_dev->lock);
>> @@ -1123,7 +1149,6 @@ static int vfio_ap_mdev_set_kvm(struct ap_matrix_mdev *matrix_mdev,
>>   					  matrix_mdev->matrix.aqm,
>>   					  matrix_mdev->matrix.adm);
>>   		mutex_lock(&matrix_dev->lock);
>> -		kvm->arch.crypto.pqap_hook = &matrix_mdev->pqap_hook;
>>   		matrix_mdev->kvm = kvm;
>>   		matrix_mdev->kvm_busy = false;
>>   		wake_up_all(&matrix_mdev->wait_for_kvm);
>> @@ -1161,6 +1186,13 @@ static int vfio_ap_mdev_iommu_notifier(struct notifier_block *nb,
>>   	return NOTIFY_DONE;
>>   }
>>   
>> +static void vfio_ap_mdev_unset_pqap_hook(struct kvm *kvm)
>> +{
>> +	rcu_assign_pointer(kvm->arch.crypto.pqap_hook, NULL);
>> +	synchronize_rcu();
>> +	kfree(kvm->arch.crypto.pqap_hook);
>> +}
>> +
>>   /**
>>    * vfio_ap_mdev_unset_kvm
>>    *
>> @@ -1189,11 +1221,11 @@ static void vfio_ap_mdev_unset_kvm(struct ap_matrix_mdev *matrix_mdev)
>>   
>>   	if (matrix_mdev->kvm) {
>>   		matrix_mdev->kvm_busy = true;
>> +		vfio_ap_mdev_unset_pqap_hook(matrix_mdev->kvm);
>>   		mutex_unlock(&matrix_dev->lock);
>>   		kvm_arch_crypto_clear_masks(matrix_mdev->kvm);
>>   		mutex_lock(&matrix_dev->lock);
>>   		vfio_ap_mdev_reset_queues(matrix_mdev->mdev);
>> -		matrix_mdev->kvm->arch.crypto.pqap_hook = NULL;
>>   		kvm_put_kvm(matrix_mdev->kvm);
>>   		matrix_mdev->kvm = NULL;
>>   		matrix_mdev->kvm_busy = false;


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v3 2/2] s390/vfio-ap: control access to PQAP(AQIC) interception handler
  2021-05-19 23:04     ` Tony Krowiak
@ 2021-05-19 23:22       ` Jason Gunthorpe
  2021-05-20  1:08         ` Tony Krowiak
                           ` (2 more replies)
  0 siblings, 3 replies; 15+ messages in thread
From: Jason Gunthorpe @ 2021-05-19 23:22 UTC (permalink / raw)
  To: Tony Krowiak
  Cc: linux-s390, linux-kernel, borntraeger, cohuck, pasic, jjherne,
	alex.williamson, kwankhede

On Wed, May 19, 2021 at 07:04:46PM -0400, Tony Krowiak wrote:
> 
> 
> On 5/19/21 12:16 PM, Jason Gunthorpe wrote:
> > On Wed, May 19, 2021 at 11:39:21AM -0400, Tony Krowiak wrote:
> > 
> > > @@ -287,13 +289,17 @@ static int handle_pqap(struct kvm_vcpu *vcpu)
> > >   	if (!(vcpu->arch.sie_block->eca & ECA_AIV))
> > >   		return -EOPNOTSUPP;
> > > -	apqn = vcpu->run->s.regs.gprs[0] & 0xffff;
> > > -	mutex_lock(&matrix_dev->lock);
> > > +	rcu_read_lock();
> > > +	pqap_module_hook = rcu_dereference(vcpu->kvm->arch.crypto.pqap_hook);
> > > +	if (!pqap_module_hook) {
> > > +		rcu_read_unlock();
> > > +		goto set_status;
> > > +	}
> > > -	if (!vcpu->kvm->arch.crypto.pqap_hook)
> > > -		goto out_unlock;
> > > -	matrix_mdev = container_of(vcpu->kvm->arch.crypto.pqap_hook,
> > > -				   struct ap_matrix_mdev, pqap_hook);
> > > +	matrix_mdev = pqap_module_hook->data;
> > > +	rcu_read_unlock();
> > > +	mutex_lock(&matrix_dev->lock);
> > The matrix_mdev pointer was extracted from the pqap_module_hook,
> > but now there is nothing protecting it since the rcu was dropped and
> > it gets freed in vfio_ap_mdev_remove.
> 
> Therein lies the rub. We can't hold the rcu_read_lock across the
> entire time that the interception is being processed because of
> wait conditions in the interception handler. Regardless of whether
> the pointer to the matrix_mdev is retrieved as the container of
> or extracted from the pqap_hook, there is nothing protecting it
> and there appears to be no way to do so using RCU.

RCU is a lock that should only be used for highly performance
sensitive read work loads. It eliminates one atomic from a lock, but
if you go on to immediately do something like try_module_get, which
has an atomic inside, the whole thing is pointless (assuming
try_module_get was even the right thing to do)

Use a simple sleepable rwsem around the whole thing and forget about
the module_get. Hold the write side when NULL'ing the pointer.

> > And, again, module locking doesn't prevent vfio_ap_mdev_remove() from
> > being called. None of these patches should be combining module locking
> > with RCU.
> 
> Is there any other way besides user interaction with the mdev's
> sysfs remove interface for the remove callback to get invoked?

There are more options after my re-organizing series.

But I'm not sure why you ask in response to my point about module
locking? Module locking is not really related to sysfs.

> If I try to remove the mdev using the sysfs interface while the
> mdev fd is still open by the guest, the remove hangs until the
> fd is closed.

Yes, it will wait when the vfio_device is being destroyed.

> That being the case, the mdev release callback will get invoked
> prior to the remove callback being invoked which renders this whole
> debate moot. What am I missing here?

AFAICT the control flow is such that release can immediately move on
to remove on the same CPU without an additional synchronization. So
the kfree can still race with the above.

But the more I look at this the wonkier it is.. The real issue is not
the matrix_mdev, it is the whole vcpu->kvm->arch.crypto.pqap_hook

This is nonesense too:

	if (vcpu->kvm->arch.crypto.pqap_hook) {
		if (!try_module_get(vcpu->kvm->arch.crypto.pqap_hook->owner))
			return -EOPNOTSUPP;
		ret = vcpu->kvm->arch.crypto.pqap_hook->hook(vcpu);

It should have a lock around it of some kind, not a
try_module_get. module_get is not la lock.

And that lock should interact with loading the hook in the first
place:
		kvm->arch.crypto.pqap_hook = &matrix_mdev->pqap_hook;

And of course NULLin'g the hooke should be synchronous with the lock.

There should be no owner for something like this. unregistering the
function pointer should fully fence all access.

The simple solution in sketch is just this:

diff --git a/arch/s390/kvm/priv.c b/arch/s390/kvm/priv.c
index 9928f785c6773a..f70386452367dd 100644
--- a/arch/s390/kvm/priv.c
+++ b/arch/s390/kvm/priv.c
@@ -657,13 +657,12 @@ static int handle_pqap(struct kvm_vcpu *vcpu)
 	 * Verify that the hook callback is registered, lock the owner
 	 * and call the hook.
 	 */
-	if (vcpu->kvm->arch.crypto.pqap_hook) {
-		if (!try_module_get(vcpu->kvm->arch.crypto.pqap_hook->owner))
-			return -EOPNOTSUPP;
+	if (down_read_trylock(&vcpu->kvm->arch.crypto.rwsem) &&
+	    vcpu->kvm->arch.crypto.pqap_hook) {
 		ret = vcpu->kvm->arch.crypto.pqap_hook->hook(vcpu);
-		module_put(vcpu->kvm->arch.crypto.pqap_hook->owner);
 		if (!ret && vcpu->run->s.regs.gprs[1] & 0x00ff0000)
 			kvm_s390_set_psw_cc(vcpu, 3);
+		up_read(&vcpu->kv->arch.crypto.rwsem);
 		return ret;
 	}
 	/*
diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c
index b2c7e10dfdcdcf..64c89f6a711e94 100644
--- a/drivers/s390/crypto/vfio_ap_ops.c
+++ b/drivers/s390/crypto/vfio_ap_ops.c
@@ -352,8 +352,7 @@ static int vfio_ap_mdev_create(struct mdev_device *mdev)
 	vfio_ap_matrix_init(&matrix_dev->info, &matrix_mdev->matrix);
 	init_waitqueue_head(&matrix_mdev->wait_for_kvm);
 	mdev_set_drvdata(mdev, matrix_mdev);
-	matrix_mdev->pqap_hook.hook = handle_pqap;
-	matrix_mdev->pqap_hook.owner = THIS_MODULE;
+	down_write(&&vcpu->kvm->arch.crypto.rwsem);
 	mutex_lock(&matrix_dev->lock);
 	list_add(&matrix_mdev->node, &matrix_dev->mdev_list);
 	mutex_unlock(&matrix_dev->lock);
@@ -1132,7 +1131,9 @@ static int vfio_ap_mdev_set_kvm(struct ap_matrix_mdev *matrix_mdev,
 					  matrix_mdev->matrix.aqm,
 					  matrix_mdev->matrix.adm);
 		mutex_lock(&matrix_dev->lock);
+		down_write(&kvm->arch.crypto.rwsem);
 		kvm->arch.crypto.pqap_hook = &matrix_mdev->pqap_hook;
+		up_write(&kvm->arch.crypto.rwsem);
 		matrix_mdev->kvm = kvm;
 		matrix_mdev->kvm_busy = false;
 		wake_up_all(&matrix_mdev->wait_for_kvm);
@@ -1202,7 +1203,9 @@ static void vfio_ap_mdev_unset_kvm(struct ap_matrix_mdev *matrix_mdev)
 		kvm_arch_crypto_clear_masks(matrix_mdev->kvm);
 		mutex_lock(&matrix_dev->lock);
 		vfio_ap_mdev_reset_queues(matrix_mdev->mdev);
+		down_write(&matrix_mdev->kvm->arch.crypto.rwsem);
 		matrix_mdev->kvm->arch.crypto.pqap_hook = NULL;
+		up_write(&matrix_mdev->kvm->arch.crypto.rwsem);
 		kvm_put_kvm(matrix_mdev->kvm);
 		matrix_mdev->kvm = NULL;
 		matrix_mdev->kvm_busy = false;


Jason

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [PATCH v3 2/2] s390/vfio-ap: control access to PQAP(AQIC) interception handler
  2021-05-19 23:22       ` Jason Gunthorpe
@ 2021-05-20  1:08         ` Tony Krowiak
  2021-05-20  8:48           ` Halil Pasic
  2021-05-20  8:38         ` Halil Pasic
  2021-05-21 18:24         ` Tony Krowiak
  2 siblings, 1 reply; 15+ messages in thread
From: Tony Krowiak @ 2021-05-20  1:08 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: linux-s390, linux-kernel, borntraeger, cohuck, pasic, jjherne,
	alex.williamson, kwankhede



On 5/19/21 7:22 PM, Jason Gunthorpe wrote:
> On Wed, May 19, 2021 at 07:04:46PM -0400, Tony Krowiak wrote:
>>
>> On 5/19/21 12:16 PM, Jason Gunthorpe wrote:
>>> On Wed, May 19, 2021 at 11:39:21AM -0400, Tony Krowiak wrote:
>>>
>>>> @@ -287,13 +289,17 @@ static int handle_pqap(struct kvm_vcpu *vcpu)
>>>>    	if (!(vcpu->arch.sie_block->eca & ECA_AIV))
>>>>    		return -EOPNOTSUPP;
>>>> -	apqn = vcpu->run->s.regs.gprs[0] & 0xffff;
>>>> -	mutex_lock(&matrix_dev->lock);
>>>> +	rcu_read_lock();
>>>> +	pqap_module_hook = rcu_dereference(vcpu->kvm->arch.crypto.pqap_hook);
>>>> +	if (!pqap_module_hook) {
>>>> +		rcu_read_unlock();
>>>> +		goto set_status;
>>>> +	}
>>>> -	if (!vcpu->kvm->arch.crypto.pqap_hook)
>>>> -		goto out_unlock;
>>>> -	matrix_mdev = container_of(vcpu->kvm->arch.crypto.pqap_hook,
>>>> -				   struct ap_matrix_mdev, pqap_hook);
>>>> +	matrix_mdev = pqap_module_hook->data;
>>>> +	rcu_read_unlock();
>>>> +	mutex_lock(&matrix_dev->lock);
>>> The matrix_mdev pointer was extracted from the pqap_module_hook,
>>> but now there is nothing protecting it since the rcu was dropped and
>>> it gets freed in vfio_ap_mdev_remove.
>> Therein lies the rub. We can't hold the rcu_read_lock across the
>> entire time that the interception is being processed because of
>> wait conditions in the interception handler. Regardless of whether
>> the pointer to the matrix_mdev is retrieved as the container of
>> or extracted from the pqap_hook, there is nothing protecting it
>> and there appears to be no way to do so using RCU.
> RCU is a lock that should only be used for highly performance
> sensitive read work loads. It eliminates one atomic from a lock, but
> if you go on to immediately do something like try_module_get, which
> has an atomic inside, the whole thing is pointless (assuming
> try_module_get was even the right thing to do)
>
> Use a simple sleepable rwsem around the whole thing and forget about
> the module_get. Hold the write side when NULL'ing the pointer.
>
>>> And, again, module locking doesn't prevent vfio_ap_mdev_remove() from
>>> being called. None of these patches should be combining module locking
>>> with RCU.
>> Is there any other way besides user interaction with the mdev's
>> sysfs remove interface for the remove callback to get invoked?
> There are more options after my re-organizing series.
>
> But I'm not sure why you ask in response to my point about module
> locking? Module locking is not really related to sysfs.

I don't know why you keep harping on module locking. I didn't
write the original code, so I really have no idea why that was
incorporated. The question wasn't in response to module
locking, it's just something that popped into my head at the
time because based on my observations, the remove callback
did not get invoked in response to a user removing the mdev
via the sysfs interface until the mdev fd was closed by the
guest.


>
>> If I try to remove the mdev using the sysfs interface while the
>> mdev fd is still open by the guest, the remove hangs until the
>> fd is closed.
> Yes, it will wait when the vfio_device is being destroyed.
>
>> That being the case, the mdev release callback will get invoked
>> prior to the remove callback being invoked which renders this whole
>> debate moot. What am I missing here?
> AFAICT the control flow is such that release can immediately move on
> to remove on the same CPU without an additional synchronization. So
> the kfree can still race with the above.

I'd have to look into it more, but my initial thought is that this is
not true. If what I observed is true - i.e., the mdev remove callback
won't get invoked until the mdev fd is closed by the guest - then the
matrix_mdev will never get freed while the pqap_hook is not NULL
because the mdev release callback - invoked when the mdev fd is
closed - nullifies it. That is why I asked whether there is any other
way that the mdev remove callback gets invoked other than
when a user removes it via the sysfs remove attribute. Keep in mind,
the matrix_mdev is removed under the matrix_dev->lock. If retrieve
the matix_mdev in the interception handler from the matrix_dev->mdev_list
under the matrix_dev->lock - instead of from the pqap_hook or the
container_of macro - and exit the handler if it has been removed,
then I think the race issue can be avoided.

>
> But the more I look at this the wonkier it is.. The real issue is not
> the matrix_mdev, it is the whole vcpu->kvm->arch.crypto.pqap_hook

Actually, the matrix_mdev is the issue. If the handle_pqap
callback is invoked after the matrix_mdev is freed, then
because we retrieve it from the pqap_hook structure or
even if we retrieve it from the enclosing container, the
interception handler is screwed.

>
> This is nonesense too:
>
> 	if (vcpu->kvm->arch.crypto.pqap_hook) {
> 		if (!try_module_get(vcpu->kvm->arch.crypto.pqap_hook->owner))
> 			return -EOPNOTSUPP;
> 		ret = vcpu->kvm->arch.crypto.pqap_hook->hook(vcpu);
>
> It should have a lock around it of some kind, not a
> try_module_get. module_get is not la lock.

As I said earlier, I don't know why the author did this. My best guess
is that he wanted to ensure that the module was still loaded; otherwise,
the data structures contained therein - for example, the pqap_hook
and matrix_mdev that contains it - would be gonzo.

>
> And that lock should interact with loading the hook in the first
> place:
> 		kvm->arch.crypto.pqap_hook = &matrix_mdev->pqap_hook;
>
> And of course NULLin'g the hooke should be synchronous with the lock.
>
> There should be no owner for something like this. unregistering the
> function pointer should fully fence all access.
>
> The simple solution in sketch is just this:
>
> diff --git a/arch/s390/kvm/priv.c b/arch/s390/kvm/priv.c
> index 9928f785c6773a..f70386452367dd 100644
> --- a/arch/s390/kvm/priv.c
> +++ b/arch/s390/kvm/priv.c
> @@ -657,13 +657,12 @@ static int handle_pqap(struct kvm_vcpu *vcpu)
>   	 * Verify that the hook callback is registered, lock the owner
>   	 * and call the hook.
>   	 */
> -	if (vcpu->kvm->arch.crypto.pqap_hook) {
> -		if (!try_module_get(vcpu->kvm->arch.crypto.pqap_hook->owner))
> -			return -EOPNOTSUPP;
> +	if (down_read_trylock(&vcpu->kvm->arch.crypto.rwsem) &&
> +	    vcpu->kvm->arch.crypto.pqap_hook) {
>   		ret = vcpu->kvm->arch.crypto.pqap_hook->hook(vcpu);
> -		module_put(vcpu->kvm->arch.crypto.pqap_hook->owner);
>   		if (!ret && vcpu->run->s.regs.gprs[1] & 0x00ff0000)
>   			kvm_s390_set_psw_cc(vcpu, 3);
> +		up_read(&vcpu->kv->arch.crypto.rwsem);
>   		return ret;
>   	}

Since reading the various review comments related to this patch,
I started working on something along the lines of that which
you propose. I will post a new set of patches, probably tomorrow.

>   	/*
> diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c
> index b2c7e10dfdcdcf..64c89f6a711e94 100644
> --- a/drivers/s390/crypto/vfio_ap_ops.c
> +++ b/drivers/s390/crypto/vfio_ap_ops.c
> @@ -352,8 +352,7 @@ static int vfio_ap_mdev_create(struct mdev_device *mdev)
>   	vfio_ap_matrix_init(&matrix_dev->info, &matrix_mdev->matrix);
>   	init_waitqueue_head(&matrix_mdev->wait_for_kvm);
>   	mdev_set_drvdata(mdev, matrix_mdev);
> -	matrix_mdev->pqap_hook.hook = handle_pqap;
> -	matrix_mdev->pqap_hook.owner = THIS_MODULE;
> +	down_write(&&vcpu->kvm->arch.crypto.rwsem);
>   	mutex_lock(&matrix_dev->lock);
>   	list_add(&matrix_mdev->node, &matrix_dev->mdev_list);
>   	mutex_unlock(&matrix_dev->lock);
> @@ -1132,7 +1131,9 @@ static int vfio_ap_mdev_set_kvm(struct ap_matrix_mdev *matrix_mdev,
>   					  matrix_mdev->matrix.aqm,
>   					  matrix_mdev->matrix.adm);
>   		mutex_lock(&matrix_dev->lock);
> +		down_write(&kvm->arch.crypto.rwsem);
>   		kvm->arch.crypto.pqap_hook = &matrix_mdev->pqap_hook;
> +		up_write(&kvm->arch.crypto.rwsem);
>   		matrix_mdev->kvm = kvm;
>   		matrix_mdev->kvm_busy = false;
>   		wake_up_all(&matrix_mdev->wait_for_kvm);
> @@ -1202,7 +1203,9 @@ static void vfio_ap_mdev_unset_kvm(struct ap_matrix_mdev *matrix_mdev)
>   		kvm_arch_crypto_clear_masks(matrix_mdev->kvm);
>   		mutex_lock(&matrix_dev->lock);
>   		vfio_ap_mdev_reset_queues(matrix_mdev->mdev);
> +		down_write(&matrix_mdev->kvm->arch.crypto.rwsem);
>   		matrix_mdev->kvm->arch.crypto.pqap_hook = NULL;
> +		up_write(&matrix_mdev->kvm->arch.crypto.rwsem);
>   		kvm_put_kvm(matrix_mdev->kvm);
>   		matrix_mdev->kvm = NULL;
>   		matrix_mdev->kvm_busy = false;
>
>
> Jason


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v3 2/2] s390/vfio-ap: control access to PQAP(AQIC) interception handler
  2021-05-19 23:22       ` Jason Gunthorpe
  2021-05-20  1:08         ` Tony Krowiak
@ 2021-05-20  8:38         ` Halil Pasic
  2021-05-20 12:51           ` Jason Gunthorpe
  2021-05-21 18:24         ` Tony Krowiak
  2 siblings, 1 reply; 15+ messages in thread
From: Halil Pasic @ 2021-05-20  8:38 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Tony Krowiak, linux-s390, linux-kernel, borntraeger, cohuck,
	pasic, jjherne, alex.williamson, kwankhede

On Wed, 19 May 2021 20:22:02 -0300
Jason Gunthorpe <jgg@nvidia.com> wrote:

> On Wed, May 19, 2021 at 07:04:46PM -0400, Tony Krowiak wrote:
> > 
> > 
> > On 5/19/21 12:16 PM, Jason Gunthorpe wrote:  
> > > On Wed, May 19, 2021 at 11:39:21AM -0400, Tony Krowiak wrote:
> > >   
> > > > @@ -287,13 +289,17 @@ static int handle_pqap(struct kvm_vcpu *vcpu)
> > > >   	if (!(vcpu->arch.sie_block->eca & ECA_AIV))
> > > >   		return -EOPNOTSUPP;
> > > > -	apqn = vcpu->run->s.regs.gprs[0] & 0xffff;
> > > > -	mutex_lock(&matrix_dev->lock);
> > > > +	rcu_read_lock();
> > > > +	pqap_module_hook = rcu_dereference(vcpu->kvm->arch.crypto.pqap_hook);
> > > > +	if (!pqap_module_hook) {
> > > > +		rcu_read_unlock();
> > > > +		goto set_status;
> > > > +	}
> > > > -	if (!vcpu->kvm->arch.crypto.pqap_hook)
> > > > -		goto out_unlock;
> > > > -	matrix_mdev = container_of(vcpu->kvm->arch.crypto.pqap_hook,
> > > > -				   struct ap_matrix_mdev, pqap_hook);
> > > > +	matrix_mdev = pqap_module_hook->data;
> > > > +	rcu_read_unlock();
> > > > +	mutex_lock(&matrix_dev->lock);  
> > > The matrix_mdev pointer was extracted from the pqap_module_hook,
> > > but now there is nothing protecting it since the rcu was dropped and
> > > it gets freed in vfio_ap_mdev_remove.  
> > 
> > Therein lies the rub. We can't hold the rcu_read_lock across the
> > entire time that the interception is being processed because of
> > wait conditions in the interception handler. Regardless of whether
> > the pointer to the matrix_mdev is retrieved as the container of
> > or extracted from the pqap_hook, there is nothing protecting it
> > and there appears to be no way to do so using RCU.  
> 
> RCU is a lock that should only be used for highly performance
> sensitive read work loads. 

This is not a highly performance sensitive read workload.

> It eliminates one atomic from a lock, but
> if you go on to immediately do something like try_module_get, which
> has an atomic inside, the whole thing is pointless (assuming
> try_module_get was even the right thing to do)

I'm not sure about this argument, or that RCU should be used only for
highly performance sensitive read workloads. Can you please elaborate on
the argument above and also put your statement in perspective with
https://lwn.net/Articles/263130/?

@Christian: Since you proposed RCU for this, I guess your opinion
does not align with Jason's.

> 
> Use a simple sleepable rwsem around the whole thing and forget about
> the module_get. Hold the write side when NULL'ing the pointer.
> 

Yes a simple sleepable lock would work, and we wouldn't need
get_module(). Because before the vfio_ap module unloads, it
sets all vcpu->kvm->arch.crypto.pqap_hook instances to NULL. So if
we know that vcpu->kvm->arch.crypto.pqap_hook then we know
that the still have valid references to the module.

> > > And, again, module locking doesn't prevent vfio_ap_mdev_remove() from
> > > being called. None of these patches should be combining module locking
> > > with RCU.  
> > 
> > Is there any other way besides user interaction with the mdev's
> > sysfs remove interface for the remove callback to get invoked?  
> 
> There are more options after my re-organizing series.
> 
> But I'm not sure why you ask in response to my point about module
> locking? Module locking is not really related to sysfs.
> 
> > If I try to remove the mdev using the sysfs interface while the
> > mdev fd is still open by the guest, the remove hangs until the
> > fd is closed.  
> 
> Yes, it will wait when the vfio_device is being destroyed.
> 
> > That being the case, the mdev release callback will get invoked
> > prior to the remove callback being invoked which renders this whole
> > debate moot. What am I missing here?  
> 
> AFAICT the control flow is such that release can immediately move on
> to remove on the same CPU without an additional synchronization. So
> the kfree can still race with the above.
> 
> But the more I look at this the wonkier it is.. The real issue is not
> the matrix_mdev, it is the whole vcpu->kvm->arch.crypto.pqap_hook
> 
> This is nonesense too:
> 
> 	if (vcpu->kvm->arch.crypto.pqap_hook) {
> 		if (!try_module_get(vcpu->kvm->arch.crypto.pqap_hook->owner))
> 			return -EOPNOTSUPP;
> 		ret = vcpu->kvm->arch.crypto.pqap_hook->hook(vcpu);


> 
> It should have a lock around it of some kind, not a
> try_module_get. module_get is not la lock.

I tend to agree. In fact I asked for a lock around it several times:
https://www.lkml.org/lkml/2019/3/1/260
https://lkml.org/lkml/2020/12/3/987
https://lkml.org/lkml/2020/12/4/994

But in my opinion RCU is also viable (not this patch). I think, I prefer
a lock for simplicity, unless it is not (deadlocks) ;).

Many thanks for bringing your perspective to this. I'm optimistic about
getting this finally addressed properly.

Regards,
Halil





^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v3 2/2] s390/vfio-ap: control access to PQAP(AQIC) interception handler
  2021-05-20  1:08         ` Tony Krowiak
@ 2021-05-20  8:48           ` Halil Pasic
  2021-05-20 12:26             ` Jason Gunthorpe
  0 siblings, 1 reply; 15+ messages in thread
From: Halil Pasic @ 2021-05-20  8:48 UTC (permalink / raw)
  To: Tony Krowiak
  Cc: Jason Gunthorpe, linux-s390, linux-kernel, borntraeger, cohuck,
	pasic, jjherne, alex.williamson, kwankhede

On Wed, 19 May 2021 21:08:15 -0400
Tony Krowiak <akrowiak@linux.ibm.com> wrote:

> >
> > This is nonesense too:
> >
> > 	if (vcpu->kvm->arch.crypto.pqap_hook) {
> > 		if (!try_module_get(vcpu->kvm->arch.crypto.pqap_hook->owner))
> > 			return -EOPNOTSUPP;
> > 		ret = vcpu->kvm->arch.crypto.pqap_hook->hook(vcpu);
> >
> > It should have a lock around it of some kind, not a
> > try_module_get. module_get is not la lock.  
> 
> As I said earlier, I don't know why the author did this. 

Please have a look at these links from the archive to get some
perspective:
https://lkml.org/lkml/2020/12/4/994
https://lkml.org/lkml/2020/12/3/987
https://www.lkml.org/lkml/2019/3/1/260

We can ask the original author, but I don't think we have to. BTW the
patch that introduced it has your r-b.

> My best guess
> is that he wanted to ensure that the module was still loaded; otherwise,
> the data structures contained therein - for example, the pqap_hook
> and matrix_mdev that contains it - would be gonzo.

More precisely prevent the module from unloading while we execute code
from it. As I've pointed out in a previous email the module may be gone
by the time we call try_module_get().

Regards,
Halil

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v3 2/2] s390/vfio-ap: control access to PQAP(AQIC) interception handler
  2021-05-20  8:48           ` Halil Pasic
@ 2021-05-20 12:26             ` Jason Gunthorpe
  0 siblings, 0 replies; 15+ messages in thread
From: Jason Gunthorpe @ 2021-05-20 12:26 UTC (permalink / raw)
  To: Halil Pasic
  Cc: Tony Krowiak, linux-s390, linux-kernel, borntraeger, cohuck,
	pasic, jjherne, alex.williamson, kwankhede

On Thu, May 20, 2021 at 10:48:57AM +0200, Halil Pasic wrote:
> On Wed, 19 May 2021 21:08:15 -0400
> Tony Krowiak <akrowiak@linux.ibm.com> wrote:
> 
> > >
> > > This is nonesense too:
> > >
> > > 	if (vcpu->kvm->arch.crypto.pqap_hook) {
> > > 		if (!try_module_get(vcpu->kvm->arch.crypto.pqap_hook->owner))
> > > 			return -EOPNOTSUPP;
> > > 		ret = vcpu->kvm->arch.crypto.pqap_hook->hook(vcpu);
> > >
> > > It should have a lock around it of some kind, not a
> > > try_module_get. module_get is not la lock.  
> > 
> > As I said earlier, I don't know why the author did this. 
> 
> Please have a look at these links from the archive to get some
> perspective:
> https://lkml.org/lkml/2020/12/4/994
> https://lkml.org/lkml/2020/12/3/987
> https://www.lkml.org/lkml/2019/3/1/260
> 
> We can ask the original author, but I don't think we have to. BTW the
> patch that introduced it has your r-b.
> 
> > My best guess
> > is that he wanted to ensure that the module was still loaded; otherwise,
> > the data structures contained therein - for example, the pqap_hook
> > and matrix_mdev that contains it - would be gonzo.
> 
> More precisely prevent the module from unloading while we execute code
> from it. As I've pointed out in a previous email the module may be gone
> by the time we call try_module_get().

No, this is a common misconception.

The module_get prevents the module from even being attempted to be
unloaded. Code should acquire this if it has done something that would
cause a module remove function hang indefinitely, such as a design
that waits for a user FD to close.

This provides a good user experience but should generally not be
required for correctness.

All code passing function pointers across subsystems should always
fully fence those function pointers during removal. This means it
interacts with some kind of locking that guarentees nothing is
currently calling, or ever will call again, those function pointers.

This is not just to protect the function pointer code itself, but the
lock should also protect the data access that function pointer almost
always invokes. This is the bug here, ap is accessing the matrix_dev
data from a function pointer without any locking or serialization
against kfree(matrix_dev). Fencing to guarentee the hook isn't and
won't run also serves as a strong enough serialization to allow the
kfree().

The basic logic is that a module removal cannot complete until all
its function pointers have been removed from everywhere and all the
locking that protect those removals are satisified.

Jason

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v3 2/2] s390/vfio-ap: control access to PQAP(AQIC) interception handler
  2021-05-20  8:38         ` Halil Pasic
@ 2021-05-20 12:51           ` Jason Gunthorpe
  0 siblings, 0 replies; 15+ messages in thread
From: Jason Gunthorpe @ 2021-05-20 12:51 UTC (permalink / raw)
  To: Halil Pasic
  Cc: Tony Krowiak, linux-s390, linux-kernel, borntraeger, cohuck,
	pasic, jjherne, alex.williamson, kwankhede

On Thu, May 20, 2021 at 10:38:29AM +0200, Halil Pasic wrote:

> > It eliminates one atomic from a lock, but
> > if you go on to immediately do something like try_module_get, which
> > has an atomic inside, the whole thing is pointless (assuming
> > try_module_get was even the right thing to do)
> 
> I'm not sure about this argument, or that RCU should be used only for
> highly performance sensitive read workloads. 

Fundamentally RCU is a technique for building a read/write lock that
avoids an atomic incr on the read side. This comes at the cost of
significantly slowing down the write side.

Everything about RCU is very complicated, people typically use it
wrong.

People use it wrong so often than Paul wrote a very long list of
guidelines to help understand if it is being used properly:

  Documentation/RCU/checklist.rst

If you haven't memorized this document then you probably shouldn't be
using RCU at all :)

> Can you please elaborate on the argument above and also put your
> statement in perspective with https://lwn.net/Articles/263130/?

This article doesn't talk much about the downsides - and focuses on
performance win. If you need the performance then sure use RCU, but
RCU is not a general purpose replacement for a rwsem. People should
*always* reach for a rwsem first. Design that properly and then ask
themselves if the rwsem can or should be optimized to a RCU.

> Yes a simple sleepable lock would work, and we wouldn't need
> get_module(). Because before the vfio_ap module unloads, it
> sets all vcpu->kvm->arch.crypto.pqap_hook instances to NULL. So if
> we know that vcpu->kvm->arch.crypto.pqap_hook then we know
> that the still have valid references to the module.

Yes
 
> But in my opinion RCU is also viable (not this patch). I think, I prefer
> a lock for simplicity, unless it is not (deadlocks) ;).

As soon as you said you needed to sleep RCU stopped being viable. To
make a sleepable region you need to do an atomic write and at that
instant all the value of RCU was destroyed - use a normal rwsem.

There is SRCU that solves the sleepable problem, but it has an
incredible cost in both write side performance and memory usage that
should only be reached for if high read side performance is really
required.

Jason

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v3 2/2] s390/vfio-ap: control access to PQAP(AQIC) interception handler
  2021-05-19 23:22       ` Jason Gunthorpe
  2021-05-20  1:08         ` Tony Krowiak
  2021-05-20  8:38         ` Halil Pasic
@ 2021-05-21 18:24         ` Tony Krowiak
  2021-05-21 18:30           ` Jason Gunthorpe
  2 siblings, 1 reply; 15+ messages in thread
From: Tony Krowiak @ 2021-05-21 18:24 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: linux-s390, linux-kernel, borntraeger, cohuck, pasic, jjherne,
	alex.williamson, kwankhede


> The simple solution in sketch is just this:

The code below results in a lockdep WARN_ON for the
reasons documented in my comments interspersed in
the code.

After trying several different permutations using RCU and
an rw_semaphore, I came to the conclusion that setting and
clearing the hook pointer while the mdev fd is being open
and closed or when the mdev is being removed unnecessarily
complicates things. There is no good reason to set/clear the
function pointer at this time, nor is there any compelling
reason to store the function pointer in a satellite structure
of the kvm struct. Since the hook function's lifespan coincides
with the lifespan of the vfio_ap module, why not store it
when the module is loaded and clear it when the module is
unloaded? Access to the function pointer can be controlled by a lock
that is independent of the matrix_dev->lock, thus avoiding
potential lock dependencies. Access to the mdev is controlled by
the matrix_dev lock, so if the mdev is retrieved from the
matrix_dev->mdev_list in the hook function, then we are assured
that the mdev will never be accessed after it is freed; problem solved.

>
> diff --git a/arch/s390/kvm/priv.c b/arch/s390/kvm/priv.c
> index 9928f785c6773a..f70386452367dd 100644
> --- a/arch/s390/kvm/priv.c
> +++ b/arch/s390/kvm/priv.c
> @@ -657,13 +657,12 @@ static int handle_pqap(struct kvm_vcpu *vcpu)
>   	 * Verify that the hook callback is registered, lock the owner
>   	 * and call the hook.
>   	 */
> -	if (vcpu->kvm->arch.crypto.pqap_hook) {
> -		if (!try_module_get(vcpu->kvm->arch.crypto.pqap_hook->owner))
> -			return -EOPNOTSUPP;
> +	if (down_read_trylock(&vcpu->kvm->arch.crypto.rwsem) &&
> +	    vcpu->kvm->arch.crypto.pqap_hook) {
>   		ret = vcpu->kvm->arch.crypto.pqap_hook->hook(vcpu);

So, the hook function (handle_pqap in vfio_ap_ops.c) is executed while
holding the rwsem lock. The hook function tries to lock the matrix_dev->lock
mutex.

> -		module_put(vcpu->kvm->arch.crypto.pqap_hook->owner);
>   		if (!ret && vcpu->run->s.regs.gprs[1] & 0x00ff0000)
>   			kvm_s390_set_psw_cc(vcpu, 3);
> +		up_read(&vcpu->kv->arch.crypto.rwsem);
>   		return ret;
>   	}
>   	/*
> diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c
> index b2c7e10dfdcdcf..64c89f6a711e94 100644
> --- a/drivers/s390/crypto/vfio_ap_ops.c
> +++ b/drivers/s390/crypto/vfio_ap_ops.c
> @@ -352,8 +352,7 @@ static int vfio_ap_mdev_create(struct mdev_device *mdev)
>   	vfio_ap_matrix_init(&matrix_dev->info, &matrix_mdev->matrix);
>   	init_waitqueue_head(&matrix_mdev->wait_for_kvm);
>   	mdev_set_drvdata(mdev, matrix_mdev);
> -	matrix_mdev->pqap_hook.hook = handle_pqap;
> -	matrix_mdev->pqap_hook.owner = THIS_MODULE;
> +	down_write(&&vcpu->kvm->arch.crypto.rwsem);
>   	mutex_lock(&matrix_dev->lock);
>   	list_add(&matrix_mdev->node, &matrix_dev->mdev_list);
>   	mutex_unlock(&matrix_dev->lock);
> @@ -1132,7 +1131,9 @@ static int vfio_ap_mdev_set_kvm(struct ap_matrix_mdev *matrix_mdev,
>   					  matrix_mdev->matrix.aqm,
>   					  matrix_mdev->matrix.adm);
>   		mutex_lock(&matrix_dev->lock);

Locks the matrix_dev->lock mutex, then tries to lock rwsem
to store the pqap_hook under the rwsem lock. During testing,
this occurred while the interception of the PQAP instruction
was taking place, so the read lock was already taken and the
hook function was waiting on the matrix_dev->lock taken above.
All of the test cases ran successfully to completion, so there
didn't appear to be a deadlock of any sort, but lockdep apparently
detected a problem.

I was able to get around this by doing the down_write and setting
the hook vfio_ap_mdev_group_notifier function before calling
this function (above) and before taking the matrix_dev->lock,
but that circumvents the protection against accessing a matrix_dev
that's already been freed.


> +		down_write(&kvm->arch.crypto.rwsem);
>   		kvm->arch.crypto.pqap_hook = &matrix_mdev->pqap_hook;
> +		up_write(&kvm->arch.crypto.rwsem);
>   		matrix_mdev->kvm = kvm;
>   		matrix_mdev->kvm_busy = false;
>   		wake_up_all(&matrix_mdev->wait_for_kvm);
> @@ -1202,7 +1203,9 @@ static void vfio_ap_mdev_unset_kvm(struct ap_matrix_mdev *matrix_mdev)
>   		kvm_arch_crypto_clear_masks(matrix_mdev->kvm);
>   		mutex_lock(&matrix_dev->lock);
>   		vfio_ap_mdev_reset_queues(matrix_mdev->mdev);
> +		down_write(&matrix_mdev->kvm->arch.crypto.rwsem);

Same scenario here.

>   		matrix_mdev->kvm->arch.crypto.pqap_hook = NULL;
> +		up_write(&matrix_mdev->kvm->arch.crypto.rwsem);
>   		kvm_put_kvm(matrix_mdev->kvm);
>   		matrix_mdev->kvm = NULL;
>   		matrix_mdev->kvm_busy = false;
>
>
> Jason


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v3 2/2] s390/vfio-ap: control access to PQAP(AQIC) interception handler
  2021-05-21 18:24         ` Tony Krowiak
@ 2021-05-21 18:30           ` Jason Gunthorpe
  0 siblings, 0 replies; 15+ messages in thread
From: Jason Gunthorpe @ 2021-05-21 18:30 UTC (permalink / raw)
  To: Tony Krowiak
  Cc: linux-s390, linux-kernel, borntraeger, cohuck, pasic, jjherne,
	alex.williamson, kwankhede

On Fri, May 21, 2021 at 02:24:33PM -0400, Tony Krowiak wrote:
> 
> > The simple solution in sketch is just this:
> 
> The code below results in a lockdep WARN_ON for the
> reasons documented in my comments interspersed in
> the code.

Sure, to be expected for a 2 min effort, but you seem to have solved
it by trivially putting things in the right locking order?

> After trying several different permutations using RCU and
> an rw_semaphore, I came to the conclusion that setting and
> clearing the hook pointer while the mdev fd is being open
> and closed or when the mdev is being removed unnecessarily
> complicates things. There is no good reason to set/clear the
> function pointer at this time, nor is there any compelling
> reason to store the function pointer in a satellite structure
> of the kvm struct. Since the hook function's lifespan coincides
> with the lifespan of the vfio_ap module, why not store it
> when the module is loaded and clear it when the module is
> unloaded? 

Well, the hook function isn't really the problem..

> Access to the function pointer can be controlled by a lock
> that is independent of the matrix_dev->lock, thus avoiding
> potential lock dependencies. Access to the mdev is controlled by
> the matrix_dev lock, so if the mdev is retrieved from the
> matrix_dev->mdev_list in the hook function, then we are assured
> that the mdev will never be accessed after it is freed; problem solved.

This just transforms the problem into needing to hold a lock around
mdev_list while accessing a member of the mdev_list

Is it simpler?

Jason

^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2021-05-21 18:30 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-05-19 15:39 [PATCH v3 0/2] s390/vfio-ap: fix memory leak in mdev remove callback Tony Krowiak
2021-05-19 15:39 ` [PATCH v3 1/2] " Tony Krowiak
2021-05-19 15:39 ` [PATCH v3 2/2] s390/vfio-ap: control access to PQAP(AQIC) interception handler Tony Krowiak
2021-05-19 16:16   ` Jason Gunthorpe
2021-05-19 23:04     ` Tony Krowiak
2021-05-19 23:22       ` Jason Gunthorpe
2021-05-20  1:08         ` Tony Krowiak
2021-05-20  8:48           ` Halil Pasic
2021-05-20 12:26             ` Jason Gunthorpe
2021-05-20  8:38         ` Halil Pasic
2021-05-20 12:51           ` Jason Gunthorpe
2021-05-21 18:24         ` Tony Krowiak
2021-05-21 18:30           ` Jason Gunthorpe
2021-05-19 17:21   ` Halil Pasic
2021-05-19 23:14     ` Tony Krowiak

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).