linux-kernel.vger.kernel.org archive mirror
* [PATCH v3 0/1] s390/vfio-ap: fix circular lockdep when starting SE guest
@ 2021-03-02 20:43 Tony Krowiak
  2021-03-02 20:43 ` [PATCH v3 1/1] s390/vfio-ap: fix circular lockdep when setting/clearing crypto masks Tony Krowiak
  0 siblings, 1 reply; 11+ messages in thread
From: Tony Krowiak @ 2021-03-02 20:43 UTC (permalink / raw)
  To: linux-s390, linux-kernel, kvm
  Cc: stable, borntraeger, cohuck, kwankhede, pbonzini,
	alex.williamson, pasic, Tony Krowiak

Commit f21916ec4826 ("s390/vfio-ap: clean up vfio_ap resources when KVM
pointer invalidated") introduced a change that results in a circular
locking dependency (reported by lockdep) when a Secure Execution guest
that is configured with crypto devices is started. The problem arose
because that commit moved the setting of the guest's AP masks under the
protection of the matrix_dev->lock when the vfio_ap driver is notified
that the KVM pointer has been set. Since it is not critical that the
guest's AP masks be set or cleared under the matrix_dev->lock when the
driver is notified, the masks will no longer be updated under that lock.
The lock is still necessary for setting/unsetting the KVM pointer,
however, so that will remain in place.

The dependency chain for the circular lockdep resolved by this patch 
is (in reverse order):

2:	vfio_ap_mdev_group_notifier:	kvm->lock
					matrix_dev->lock

1:	handle_pqap:			matrix_dev->lock
	kvm_vcpu_ioctl:			vcpu->mutex

0:	kvm_s390_cpus_to_pv:		vcpu->mutex
	kvm_vm_ioctl:  			kvm->lock
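
Read as "held -> acquired" pairs, the three entries above close a loop:
kvm->lock is held when vcpu->mutex is taken (#0), vcpu->mutex is held when
matrix_dev->lock is taken (#1), and matrix_dev->lock is held when kvm->lock
is taken (#2). As a rough illustration only, the userspace C sketch below
records those edges and looks for a cycle with a depth-first search. It is
not how lockdep itself is implemented; the enum labels and the helper names
(record_order(), reaches(), has_cycle(), chain_has_cycle()) are made up for
this example.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical labels for the three locks named in the chain above. */
enum lock_id { KVM_LOCK, VCPU_MUTEX, MATRIX_DEV_LOCK, NR_LOCKS };

/* edge[a][b] is true when lock b is acquired while lock a is held. */
static bool edge[NR_LOCKS][NR_LOCKS];

static void record_order(enum lock_id held, enum lock_id acquired)
{
	assert(held < NR_LOCKS && acquired < NR_LOCKS);
	edge[held][acquired] = true;
}

/* Depth-first search: can 'target' be reached from 'from'? */
static bool reaches(enum lock_id from, enum lock_id target, bool *seen)
{
	for (int next = 0; next < NR_LOCKS; next++) {
		if (!edge[from][next])
			continue;
		if (next == (int)target)
			return true;
		if (!seen[next]) {
			seen[next] = true;
			if (reaches((enum lock_id)next, target, seen))
				return true;
		}
	}
	return false;
}

/* A cycle exists if some lock can reach itself through recorded edges. */
static bool has_cycle(void)
{
	for (int l = 0; l < NR_LOCKS; l++) {
		bool seen[NR_LOCKS] = { false };

		if (reaches((enum lock_id)l, (enum lock_id)l, seen))
			return true;
	}
	return false;
}

/* Record the orderings from the chain; entry #2 is optional. */
bool chain_has_cycle(bool masks_set_under_matrix_lock)
{
	for (int a = 0; a < NR_LOCKS; a++)
		for (int b = 0; b < NR_LOCKS; b++)
			edge[a][b] = false;

	record_order(KVM_LOCK, VCPU_MUTEX);              /* #0 */
	record_order(VCPU_MUTEX, MATRIX_DEV_LOCK);       /* #1 */
	if (masks_set_under_matrix_lock)
		record_order(MATRIX_DEV_LOCK, KVM_LOCK); /* #2 */

	return has_cycle();
}
```

With all three edges recorded, chain_has_cycle(true) reports the loop.
Dropping edge #2, which is what setting the masks outside of
matrix_dev->lock amounts to, leaves the graph acyclic.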

Please note:
-----------
* If checkpatch is run against this patch series, you may
  get a "WARNING: Unknown commit id 'f21916ec4826', maybe rebased or not
  pulled?" message. The commit 'f21916ec4826', however, is definitely
  in the master branch on top of which this patch series was built, so I'm
  not sure why this message is being output by checkpatch.
* All acks granted from previous review of this patch have been removed due
  to the fact that this patch introduces non-trivial changes (see change
  log below).

Change log v2=> v3:
------------------ 
* Added two fields - 'bool kvm_busy' and 'wait_queue_head_t wait_for_kvm' -
  to struct ap_matrix_mdev. The former indicates that the KVM
  pointer is in the process of being updated and the latter allows a
  function that needs access to the KVM pointer to wait until it is
  no longer being updated. This resolves the problem of synchronization
  between the functions that change the KVM pointer value and the
  functions that require access to it.

Change log v1=> v2:
------------------
* No longer holding the matrix_dev->lock prior to setting/clearing the
  masks supplying the AP configuration to a KVM guest.
* Make all updates to the data in the matrix mdev that is used to manage
  AP resources used by the KVM guest in the vfio_ap_mdev_set_kvm() function
  instead of the group notifier callback.
* Check for the matrix mdev's KVM pointer in the vfio_ap_mdev_unset_kvm()
  function instead of the vfio_ap_mdev_release() function.

Tony Krowiak (1):
  s390/vfio-ap: fix circular lockdep when setting/clearing crypto masks

 drivers/s390/crypto/vfio_ap_ops.c     | 312 ++++++++++++++++++--------
 drivers/s390/crypto/vfio_ap_private.h |   2 +
 2 files changed, 218 insertions(+), 96 deletions(-)

-- 
2.21.3



* [PATCH v3 1/1] s390/vfio-ap: fix circular lockdep when setting/clearing crypto masks
  2021-03-02 20:43 [PATCH v3 0/1] s390/vfio-ap: fix circular lockdep when starting SE guest Tony Krowiak
@ 2021-03-02 20:43 ` Tony Krowiak
  2021-03-03 15:23   ` Halil Pasic
  0 siblings, 1 reply; 11+ messages in thread
From: Tony Krowiak @ 2021-03-02 20:43 UTC (permalink / raw)
  To: linux-s390, linux-kernel, kvm
  Cc: stable, borntraeger, cohuck, kwankhede, pbonzini,
	alex.williamson, pasic, Tony Krowiak

This patch fixes a lockdep splat introduced by commit f21916ec4826
("s390/vfio-ap: clean up vfio_ap resources when KVM pointer invalidated").
The lockdep splat only occurs when starting a Secure Execution guest.
Crypto virtualization (vfio_ap) is not yet supported for SE guests;
however, in order to avoid this problem when support becomes available,
this fix is being provided.

The circular locking dependency was introduced when the setting of the
masks in the guest's APCB was executed while holding the matrix_dev->lock.
While the lock is definitely needed to protect the setting/unsetting of the
matrix_mdev->kvm pointer, it is not necessarily critical for setting the
masks; so, the matrix_dev->lock will be released while the masks are being
set or cleared.

Keep in mind, however, that another process that takes the matrix_dev->lock
can get control while the masks in the guest's APCB are being set or
cleared as a result of the driver being notified that the KVM pointer
has been set or unset. This could result in invalid access to the
matrix_mdev->kvm pointer by the intervening process. To avoid this
scenario, two new fields are being added to the ap_matrix_mdev struct:

struct ap_matrix_mdev {
	...
	bool kvm_busy;
	wait_queue_head_t wait_for_kvm;
	...
};

The functions that handle notification that the KVM pointer value has
been set or cleared will set the kvm_busy flag to true until they are done
processing, at which time they will set it to false and wake up the tasks on
the matrix_mdev->wait_for_kvm wait queue. Functions that require
access to matrix_mdev->kvm will sleep on the wait queue until they are
awakened, at which time they can safely access the matrix_mdev->kvm
field.
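
As a rough userspace analogy of the handshake described above (not the
driver code itself), the sketch below uses a pthread mutex and condition
variable in place of matrix_dev->lock and the wait_for_kvm wait queue.
The struct model_mdev type and the model_*() helpers are invented for
illustration. pthread_cond_wait() plays a role similar to wait_event_cmd()
here, in that it drops the lock while sleeping and retakes it before the
condition is rechecked. Unlike the patch, the model's updater also waits
for a previous update to finish, which is a simplification.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct ap_matrix_mdev. */
struct model_mdev {
	pthread_mutex_t lock;        /* models matrix_dev->lock */
	pthread_cond_t wait_for_kvm; /* models the wait queue */
	bool kvm_busy;               /* pointer update in flight */
	void *kvm;                   /* models matrix_mdev->kvm */
};

void model_init(struct model_mdev *m)
{
	assert(m != NULL);
	pthread_mutex_init(&m->lock, NULL);
	pthread_cond_init(&m->wait_for_kvm, NULL);
	m->kvm_busy = false;
	m->kvm = NULL;
}

/* Caller holds m->lock. Sleep until no update is in flight. */
static void model_wait_not_busy(struct model_mdev *m)
{
	while (m->kvm_busy)
		pthread_cond_wait(&m->wait_for_kvm, &m->lock);
}

/* Models the notifier path. Caller holds m->lock on entry and exit. */
void model_update_kvm(struct model_mdev *m, void *kvm)
{
	model_wait_not_busy(m);
	m->kvm_busy = true;
	pthread_mutex_unlock(&m->lock);
	/* ... work that must not run under m->lock, e.g. mask updates ... */
	pthread_mutex_lock(&m->lock);
	m->kvm = kvm;
	m->kvm_busy = false;
	pthread_cond_broadcast(&m->wait_for_kvm);
}

/* Models a reader such as handle_pqap(): returns a stable pointer. */
void *model_read_kvm(struct model_mdev *m)
{
	void *kvm;

	pthread_mutex_lock(&m->lock);
	model_wait_not_busy(m);
	kvm = m->kvm;
	pthread_mutex_unlock(&m->lock);
	return kvm;
}
```

A reader that arrives while model_update_kvm() has the lock dropped sees
kvm_busy set and sleeps, so it never observes a half-updated pointer.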

Fixes: f21916ec4826 ("s390/vfio-ap: clean up vfio_ap resources when KVM pointer invalidated")
Cc: stable@vger.kernel.org
Signed-off-by: Tony Krowiak <akrowiak@linux.ibm.com>
---
 drivers/s390/crypto/vfio_ap_ops.c     | 312 ++++++++++++++++++--------
 drivers/s390/crypto/vfio_ap_private.h |   2 +
 2 files changed, 218 insertions(+), 96 deletions(-)

diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c
index 41fc2e4135fe..aaf642a21a9d 100644
--- a/drivers/s390/crypto/vfio_ap_ops.c
+++ b/drivers/s390/crypto/vfio_ap_ops.c
@@ -294,6 +294,19 @@ static int handle_pqap(struct kvm_vcpu *vcpu)
 	matrix_mdev = container_of(vcpu->kvm->arch.crypto.pqap_hook,
 				   struct ap_matrix_mdev, pqap_hook);
 
+	/*
+	 * If the KVM pointer is in the process of being set, wait until the
+	 * process has completed.
+	 */
+	wait_event_cmd(matrix_mdev->wait_for_kvm,
+		       matrix_mdev->kvm_busy == false,
+		       mutex_unlock(&matrix_dev->lock),
+		       mutex_lock(&matrix_dev->lock));
+
+	/* If there is no guest using the mdev, there is nothing to do */
+	if (!matrix_mdev->kvm)
+		goto out_unlock;
+
 	q = vfio_ap_get_queue(matrix_mdev, apqn);
 	if (!q)
 		goto out_unlock;
@@ -337,6 +350,7 @@ static int vfio_ap_mdev_create(struct kobject *kobj, struct mdev_device *mdev)
 
 	matrix_mdev->mdev = mdev;
 	vfio_ap_matrix_init(&matrix_dev->info, &matrix_mdev->matrix);
+	init_waitqueue_head(&matrix_mdev->wait_for_kvm);
 	mdev_set_drvdata(mdev, matrix_mdev);
 	matrix_mdev->pqap_hook.hook = handle_pqap;
 	matrix_mdev->pqap_hook.owner = THIS_MODULE;
@@ -351,17 +365,23 @@ static int vfio_ap_mdev_remove(struct mdev_device *mdev)
 {
 	struct ap_matrix_mdev *matrix_mdev = mdev_get_drvdata(mdev);
 
-	if (matrix_mdev->kvm)
+	mutex_lock(&matrix_dev->lock);
+
+	/*
+	 * If the KVM pointer is in flux or the guest is running, disallow
+	 * removal of the mdev.
+	 */
+	if (matrix_mdev->kvm_busy || matrix_mdev->kvm) {
+		mutex_unlock(&matrix_dev->lock);
 		return -EBUSY;
+	}
 
-	mutex_lock(&matrix_dev->lock);
 	vfio_ap_mdev_reset_queues(mdev);
 	list_del(&matrix_mdev->node);
-	mutex_unlock(&matrix_dev->lock);
-
 	kfree(matrix_mdev);
 	mdev_set_drvdata(mdev, NULL);
 	atomic_inc(&matrix_dev->available_instances);
+	mutex_unlock(&matrix_dev->lock);
 
 	return 0;
 }
@@ -606,24 +626,31 @@ static ssize_t assign_adapter_store(struct device *dev,
 	struct mdev_device *mdev = mdev_from_dev(dev);
 	struct ap_matrix_mdev *matrix_mdev = mdev_get_drvdata(mdev);
 
-	/* If the guest is running, disallow assignment of adapter */
-	if (matrix_mdev->kvm)
-		return -EBUSY;
+	mutex_lock(&matrix_dev->lock);
+
+	/*
+	 * If the KVM pointer is in flux or the guest is running, disallow
+	 * assignment of adapter
+	 */
+	if (matrix_mdev->kvm_busy || matrix_mdev->kvm) {
+		ret = -EBUSY;
+		goto done;
+	}
 
 	ret = kstrtoul(buf, 0, &apid);
 	if (ret)
-		return ret;
+		goto done;
 
-	if (apid > matrix_mdev->matrix.apm_max)
-		return -ENODEV;
+	if (apid > matrix_mdev->matrix.apm_max) {
+		ret = -ENODEV;
+		goto done;
+	}
 
 	/*
 	 * Set the bit in the AP mask (APM) corresponding to the AP adapter
 	 * number (APID). The bits in the mask, from most significant to least
 	 * significant bit, correspond to APIDs 0-255.
 	 */
-	mutex_lock(&matrix_dev->lock);
-
 	ret = vfio_ap_mdev_verify_queues_reserved_for_apid(matrix_mdev, apid);
 	if (ret)
 		goto done;
@@ -672,22 +699,31 @@ static ssize_t unassign_adapter_store(struct device *dev,
 	struct mdev_device *mdev = mdev_from_dev(dev);
 	struct ap_matrix_mdev *matrix_mdev = mdev_get_drvdata(mdev);
 
-	/* If the guest is running, disallow un-assignment of adapter */
-	if (matrix_mdev->kvm)
-		return -EBUSY;
+	mutex_lock(&matrix_dev->lock);
+
+	/*
+	 * If the KVM pointer is in flux or the guest is running, disallow
+	 * un-assignment of adapter
+	 */
+	if (matrix_mdev->kvm_busy || matrix_mdev->kvm) {
+		ret = -EBUSY;
+		goto done;
+	}
 
 	ret = kstrtoul(buf, 0, &apid);
 	if (ret)
-		return ret;
+		goto done;
 
-	if (apid > matrix_mdev->matrix.apm_max)
-		return -ENODEV;
+	if (apid > matrix_mdev->matrix.apm_max) {
+		ret = -ENODEV;
+		goto done;
+	}
 
-	mutex_lock(&matrix_dev->lock);
 	clear_bit_inv((unsigned long)apid, matrix_mdev->matrix.apm);
+	ret = count;
+done:
 	mutex_unlock(&matrix_dev->lock);
-
-	return count;
+	return ret;
 }
 static DEVICE_ATTR_WO(unassign_adapter);
 
@@ -753,17 +789,24 @@ static ssize_t assign_domain_store(struct device *dev,
 	struct ap_matrix_mdev *matrix_mdev = mdev_get_drvdata(mdev);
 	unsigned long max_apqi = matrix_mdev->matrix.aqm_max;
 
-	/* If the guest is running, disallow assignment of domain */
-	if (matrix_mdev->kvm)
-		return -EBUSY;
+	mutex_lock(&matrix_dev->lock);
+
+	/*
+	 * If the KVM pointer is in flux or the guest is running, disallow
+	 * assignment of domain
+	 */
+	if (matrix_mdev->kvm_busy || matrix_mdev->kvm) {
+		ret = -EBUSY;
+		goto done;
+	}
 
 	ret = kstrtoul(buf, 0, &apqi);
 	if (ret)
-		return ret;
-	if (apqi > max_apqi)
-		return -ENODEV;
-
-	mutex_lock(&matrix_dev->lock);
+		goto done;
+	if (apqi > max_apqi) {
+		ret = -ENODEV;
+		goto done;
+	}
 
 	ret = vfio_ap_mdev_verify_queues_reserved_for_apqi(matrix_mdev, apqi);
 	if (ret)
@@ -814,22 +857,32 @@ static ssize_t unassign_domain_store(struct device *dev,
 	struct mdev_device *mdev = mdev_from_dev(dev);
 	struct ap_matrix_mdev *matrix_mdev = mdev_get_drvdata(mdev);
 
-	/* If the guest is running, disallow un-assignment of domain */
-	if (matrix_mdev->kvm)
-		return -EBUSY;
+	mutex_lock(&matrix_dev->lock);
+
+	/*
+	 * If the KVM pointer is in flux or the guest is running, disallow
+	 * un-assignment of domain
+	 */
+	if (matrix_mdev->kvm_busy || matrix_mdev->kvm) {
+		ret = -EBUSY;
+		goto done;
+	}
 
 	ret = kstrtoul(buf, 0, &apqi);
 	if (ret)
-		return ret;
+		goto done;
 
-	if (apqi > matrix_mdev->matrix.aqm_max)
-		return -ENODEV;
+	if (apqi > matrix_mdev->matrix.aqm_max) {
+		ret = -ENODEV;
+		goto done;
+	}
 
-	mutex_lock(&matrix_dev->lock);
 	clear_bit_inv((unsigned long)apqi, matrix_mdev->matrix.aqm);
-	mutex_unlock(&matrix_dev->lock);
+	ret = count;
 
-	return count;
+done:
+	mutex_unlock(&matrix_dev->lock);
+	return ret;
 }
 static DEVICE_ATTR_WO(unassign_domain);
 
@@ -858,27 +911,36 @@ static ssize_t assign_control_domain_store(struct device *dev,
 	struct mdev_device *mdev = mdev_from_dev(dev);
 	struct ap_matrix_mdev *matrix_mdev = mdev_get_drvdata(mdev);
 
-	/* If the guest is running, disallow assignment of control domain */
-	if (matrix_mdev->kvm)
-		return -EBUSY;
+	mutex_lock(&matrix_dev->lock);
+
+	/*
+	 * If the KVM pointer is in flux or the guest is running, disallow
+	 * assignment of control domain.
+	 */
+	if (matrix_mdev->kvm_busy || matrix_mdev->kvm) {
+		ret = -EBUSY;
+		goto done;
+	}
 
 	ret = kstrtoul(buf, 0, &id);
 	if (ret)
-		return ret;
+		goto done;
 
-	if (id > matrix_mdev->matrix.adm_max)
-		return -ENODEV;
+	if (id > matrix_mdev->matrix.adm_max) {
+		ret = -ENODEV;
+		goto done;
+	}
 
 	/* Set the bit in the ADM (bitmask) corresponding to the AP control
 	 * domain number (id). The bits in the mask, from most significant to
 	 * least significant, correspond to IDs 0 up to the one less than the
 	 * number of control domains that can be assigned.
 	 */
-	mutex_lock(&matrix_dev->lock);
 	set_bit_inv(id, matrix_mdev->matrix.adm);
+	ret = count;
+done:
 	mutex_unlock(&matrix_dev->lock);
-
-	return count;
+	return ret;
 }
 static DEVICE_ATTR_WO(assign_control_domain);
 
@@ -908,21 +970,30 @@ static ssize_t unassign_control_domain_store(struct device *dev,
 	struct ap_matrix_mdev *matrix_mdev = mdev_get_drvdata(mdev);
 	unsigned long max_domid =  matrix_mdev->matrix.adm_max;
 
-	/* If the guest is running, disallow un-assignment of control domain */
-	if (matrix_mdev->kvm)
-		return -EBUSY;
+	mutex_lock(&matrix_dev->lock);
+
+	/*
+	 * If the KVM pointer is in flux or the guest is running, disallow
+	 * un-assignment of control domain.
+	 */
+	if (matrix_mdev->kvm_busy || matrix_mdev->kvm) {
+		ret = -EBUSY;
+		goto done;
+	}
 
 	ret = kstrtoul(buf, 0, &domid);
 	if (ret)
-		return ret;
-	if (domid > max_domid)
-		return -ENODEV;
+		goto done;
+	if (domid > max_domid) {
+		ret = -ENODEV;
+		goto done;
+	}
 
-	mutex_lock(&matrix_dev->lock);
 	clear_bit_inv(domid, matrix_mdev->matrix.adm);
+	ret = count;
+done:
 	mutex_unlock(&matrix_dev->lock);
-
-	return count;
+	return ret;
 }
 static DEVICE_ATTR_WO(unassign_control_domain);
 
@@ -1027,8 +1098,15 @@ static const struct attribute_group *vfio_ap_mdev_attr_groups[] = {
  * @matrix_mdev: a mediated matrix device
  * @kvm: reference to KVM instance
  *
- * Verifies no other mediated matrix device has @kvm and sets a reference to
- * it in @matrix_mdev->kvm.
+ * Sets all data for @matrix_mdev that are needed to manage AP resources
+ * for the guest whose state is represented by @kvm.
+ *
+ * Note: The matrix_dev->lock must be taken prior to calling
+ * this function; however, the lock will be temporarily released while the
+ * guest's AP configuration is set to avoid a potential lockdep splat.
+ * The kvm->lock is taken to set the guest's AP configuration which, under
+ * certain circumstances, will result in a circular lock dependency if this is
+ * done under the @matrix_mdev->lock.
  *
  * Return 0 if no other mediated matrix device has a reference to @kvm;
  * otherwise, returns an -EPERM.
@@ -1038,14 +1116,28 @@ static int vfio_ap_mdev_set_kvm(struct ap_matrix_mdev *matrix_mdev,
 {
 	struct ap_matrix_mdev *m;
 
-	list_for_each_entry(m, &matrix_dev->mdev_list, node) {
-		if ((m != matrix_mdev) && (m->kvm == kvm))
-			return -EPERM;
-	}
+	if (kvm->arch.crypto.crycbd) {
+		matrix_mdev->kvm_busy = true;
 
-	matrix_mdev->kvm = kvm;
-	kvm_get_kvm(kvm);
-	kvm->arch.crypto.pqap_hook = &matrix_mdev->pqap_hook;
+		list_for_each_entry(m, &matrix_dev->mdev_list, node) {
+			if ((m != matrix_mdev) && (m->kvm == kvm)) {
+				wake_up_all(&matrix_mdev->wait_for_kvm);
+				return -EPERM;
+			}
+		}
+
+		kvm_get_kvm(kvm);
+		mutex_unlock(&matrix_dev->lock);
+		kvm_arch_crypto_set_masks(kvm,
+					  matrix_mdev->matrix.apm,
+					  matrix_mdev->matrix.aqm,
+					  matrix_mdev->matrix.adm);
+		mutex_lock(&matrix_dev->lock);
+		kvm->arch.crypto.pqap_hook = &matrix_mdev->pqap_hook;
+		matrix_mdev->kvm = kvm;
+		matrix_mdev->kvm_busy = false;
+		wake_up_all(&matrix_mdev->wait_for_kvm);
+	}
 
 	return 0;
 }
@@ -1079,51 +1171,65 @@ static int vfio_ap_mdev_iommu_notifier(struct notifier_block *nb,
 	return NOTIFY_DONE;
 }
 
+/**
+ * vfio_ap_mdev_unset_kvm
+ *
+ * @matrix_mdev: a matrix mediated device
+ *
+ * Performs clean-up of resources no longer needed by @matrix_mdev.
+ *
+ * Note: The matrix_dev->lock must be taken prior to calling
+ * this function; however, the lock will be temporarily released while the
+ * guest's AP configuration is cleared to avoid a potential lockdep splat.
+ * The kvm->lock is taken to clear the guest's AP configuration which, under
+ * certain circumstances, will result in a circular lock dependency if this is
+ * done under the @matrix_mdev->lock.
+ *
+ */
 static void vfio_ap_mdev_unset_kvm(struct ap_matrix_mdev *matrix_mdev)
 {
-	kvm_arch_crypto_clear_masks(matrix_mdev->kvm);
-	matrix_mdev->kvm->arch.crypto.pqap_hook = NULL;
-	vfio_ap_mdev_reset_queues(matrix_mdev->mdev);
-	kvm_put_kvm(matrix_mdev->kvm);
-	matrix_mdev->kvm = NULL;
+	/*
+	 * If the KVM pointer is in the process of being set, wait until the
+	 * process has completed.
+	 */
+	wait_event_cmd(matrix_mdev->wait_for_kvm,
+		       matrix_mdev->kvm_busy == false,
+		       mutex_unlock(&matrix_dev->lock),
+		       mutex_lock(&matrix_dev->lock));
+
+	if (matrix_mdev->kvm) {
+		matrix_mdev->kvm_busy = true;
+		mutex_unlock(&matrix_dev->lock);
+		kvm_arch_crypto_clear_masks(matrix_mdev->kvm);
+		mutex_lock(&matrix_dev->lock);
+		vfio_ap_mdev_reset_queues(matrix_mdev->mdev);
+		matrix_mdev->kvm->arch.crypto.pqap_hook = NULL;
+		kvm_put_kvm(matrix_mdev->kvm);
+		matrix_mdev->kvm = NULL;
+		matrix_mdev->kvm_busy = false;
+		wake_up_all(&matrix_mdev->wait_for_kvm);
+	}
 }
 
 static int vfio_ap_mdev_group_notifier(struct notifier_block *nb,
 				       unsigned long action, void *data)
 {
-	int ret, notify_rc = NOTIFY_OK;
+	int notify_rc = NOTIFY_OK;
 	struct ap_matrix_mdev *matrix_mdev;
 
 	if (action != VFIO_GROUP_NOTIFY_SET_KVM)
 		return NOTIFY_OK;
 
-	matrix_mdev = container_of(nb, struct ap_matrix_mdev, group_notifier);
 	mutex_lock(&matrix_dev->lock);
+	matrix_mdev = container_of(nb, struct ap_matrix_mdev, group_notifier);
 
-	if (!data) {
-		if (matrix_mdev->kvm)
-			vfio_ap_mdev_unset_kvm(matrix_mdev);
-		goto notify_done;
-	}
-
-	ret = vfio_ap_mdev_set_kvm(matrix_mdev, data);
-	if (ret) {
-		notify_rc = NOTIFY_DONE;
-		goto notify_done;
-	}
-
-	/* If there is no CRYCB pointer, then we can't copy the masks */
-	if (!matrix_mdev->kvm->arch.crypto.crycbd) {
+	if (!data)
+		vfio_ap_mdev_unset_kvm(matrix_mdev);
+	else if (vfio_ap_mdev_set_kvm(matrix_mdev, data))
 		notify_rc = NOTIFY_DONE;
-		goto notify_done;
-	}
 
-	kvm_arch_crypto_set_masks(matrix_mdev->kvm, matrix_mdev->matrix.apm,
-				  matrix_mdev->matrix.aqm,
-				  matrix_mdev->matrix.adm);
-
-notify_done:
 	mutex_unlock(&matrix_dev->lock);
+
 	return notify_rc;
 }
 
@@ -1258,8 +1364,7 @@ static void vfio_ap_mdev_release(struct mdev_device *mdev)
 	struct ap_matrix_mdev *matrix_mdev = mdev_get_drvdata(mdev);
 
 	mutex_lock(&matrix_dev->lock);
-	if (matrix_mdev->kvm)
-		vfio_ap_mdev_unset_kvm(matrix_mdev);
+	vfio_ap_mdev_unset_kvm(matrix_mdev);
 	mutex_unlock(&matrix_dev->lock);
 
 	vfio_unregister_notifier(mdev_dev(mdev), VFIO_IOMMU_NOTIFY,
@@ -1293,6 +1398,7 @@ static ssize_t vfio_ap_mdev_ioctl(struct mdev_device *mdev,
 				    unsigned int cmd, unsigned long arg)
 {
 	int ret;
+	struct ap_matrix_mdev *matrix_mdev;
 
 	mutex_lock(&matrix_dev->lock);
 	switch (cmd) {
@@ -1300,7 +1406,21 @@ static ssize_t vfio_ap_mdev_ioctl(struct mdev_device *mdev,
 		ret = vfio_ap_mdev_get_device_info(arg);
 		break;
 	case VFIO_DEVICE_RESET:
-		ret = vfio_ap_mdev_reset_queues(mdev);
+		matrix_mdev = mdev_get_drvdata(mdev);
+
+		/*
+		 * If the KVM pointer is in the process of being set, wait until
+		 * the process has completed.
+		 */
+		wait_event_cmd(matrix_mdev->wait_for_kvm,
+			       matrix_mdev->kvm_busy == false,
+			       mutex_unlock(&matrix_dev->lock),
+			       mutex_lock(&matrix_dev->lock));
+
+		if (matrix_mdev->kvm)
+			ret = vfio_ap_mdev_reset_queues(mdev);
+		else
+			ret = -ENODEV;
 		break;
 	default:
 		ret = -EOPNOTSUPP;
diff --git a/drivers/s390/crypto/vfio_ap_private.h b/drivers/s390/crypto/vfio_ap_private.h
index 28e9d9989768..f82a6396acae 100644
--- a/drivers/s390/crypto/vfio_ap_private.h
+++ b/drivers/s390/crypto/vfio_ap_private.h
@@ -83,6 +83,8 @@ struct ap_matrix_mdev {
 	struct ap_matrix matrix;
 	struct notifier_block group_notifier;
 	struct notifier_block iommu_notifier;
+	bool kvm_busy;
+	wait_queue_head_t wait_for_kvm;
 	struct kvm *kvm;
 	struct kvm_s390_module_hook pqap_hook;
 	struct mdev_device *mdev;
-- 
2.21.3



* Re: [PATCH v3 1/1] s390/vfio-ap: fix circular lockdep when setting/clearing crypto masks
  2021-03-02 20:43 ` [PATCH v3 1/1] s390/vfio-ap: fix circular lockdep when setting/clearing crypto masks Tony Krowiak
@ 2021-03-03 15:23   ` Halil Pasic
  2021-03-03 16:41     ` Tony Krowiak
  2021-03-03 17:10     ` Tony Krowiak
  0 siblings, 2 replies; 11+ messages in thread
From: Halil Pasic @ 2021-03-03 15:23 UTC (permalink / raw)
  To: Tony Krowiak
  Cc: linux-s390, linux-kernel, kvm, stable, borntraeger, cohuck,
	kwankhede, pbonzini, alex.williamson, pasic

On Tue,  2 Mar 2021 15:43:22 -0500
Tony Krowiak <akrowiak@linux.ibm.com> wrote:

> This patch fixes a lockdep splat introduced by commit f21916ec4826
> ("s390/vfio-ap: clean up vfio_ap resources when KVM pointer invalidated").
> The lockdep splat only occurs when starting a Secure Execution guest.
> Crypto virtualization (vfio_ap) is not yet supported for SE guests;
> however, in order to avoid this problem when support becomes available,
> this fix is being provided.

[..]

> @@ -1038,14 +1116,28 @@ static int vfio_ap_mdev_set_kvm(struct ap_matrix_mdev *matrix_mdev,
>  {
>  	struct ap_matrix_mdev *m;
> 
> -	list_for_each_entry(m, &matrix_dev->mdev_list, node) {
> -		if ((m != matrix_mdev) && (m->kvm == kvm))
> -			return -EPERM;
> -	}
> +	if (kvm->arch.crypto.crycbd) {
> +		matrix_mdev->kvm_busy = true;
> 
> -	matrix_mdev->kvm = kvm;
> -	kvm_get_kvm(kvm);
> -	kvm->arch.crypto.pqap_hook = &matrix_mdev->pqap_hook;
> +		list_for_each_entry(m, &matrix_dev->mdev_list, node) {
> +			if ((m != matrix_mdev) && (m->kvm == kvm)) {
> +				wake_up_all(&matrix_mdev->wait_for_kvm);

This ain't no good. kvm_busy will remain true if we take this exit. The
wake_up_all() is not needed, because we hold the lock, so nobody can
observe it if we don't forget kvm_busy set.

I suggest moving matrix_mdev->kvm_busy = true; after this loop, maybe right
before the unlock, and removing the wake_up_all().
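
One possible shape of that reordering, written as a self-contained
userspace model rather than as driver code: the duplicate check runs first
and returns without touching kvm_busy, and the flag is set only once the
function is committed to dropping the lock. The struct model_mdev type,
the array standing in for mdev_list, and the name model_set_kvm() are all
invented for this illustration; locking and the mask update are elided.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, stripped-down stand-in for struct ap_matrix_mdev. */
struct model_mdev {
	bool kvm_busy;
	const void *kvm;
};

/*
 * The caller is assumed to hold the lock that protects 'list'
 * (matrix_dev->lock in the driver).
 */
int model_set_kvm(struct model_mdev *self, struct model_mdev **list,
		  size_t n, const void *kvm)
{
	assert(self != NULL && kvm != NULL);

	/* Duplicate check first: an early return leaves kvm_busy untouched. */
	for (size_t i = 0; i < n; i++) {
		if (list[i] != self && list[i]->kvm == kvm)
			return -EPERM;
	}

	/* Only now mark the update as in flight, right before the unlock. */
	self->kvm_busy = true;
	/* unlock; set the guest's masks; relock (elided in this model) */
	self->kvm = kvm;
	self->kvm_busy = false;
	/* waiters would be woken up here */
	return 0;
}
```

Because the lock is held across the loop, no other task can observe the
flag before the early return, which is why no wake-up is needed on that
path.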

> +				return -EPERM;
> +			}
> +		}
> +
> +		kvm_get_kvm(kvm);
> +		mutex_unlock(&matrix_dev->lock);
> +		kvm_arch_crypto_set_masks(kvm,
> +					  matrix_mdev->matrix.apm,
> +					  matrix_mdev->matrix.aqm,
> +					  matrix_mdev->matrix.adm);
> +		mutex_lock(&matrix_dev->lock);
> +		kvm->arch.crypto.pqap_hook = &matrix_mdev->pqap_hook;
> +		matrix_mdev->kvm = kvm;
> +		matrix_mdev->kvm_busy = false;
> +		wake_up_all(&matrix_mdev->wait_for_kvm);
> +	}
> 
>  	return 0;
>  }

[..]

> @@ -1300,7 +1406,21 @@ static ssize_t vfio_ap_mdev_ioctl(struct mdev_device *mdev,
>  		ret = vfio_ap_mdev_get_device_info(arg);
>  		break;
>  	case VFIO_DEVICE_RESET:
> -		ret = vfio_ap_mdev_reset_queues(mdev);
> +		matrix_mdev = mdev_get_drvdata(mdev);
> +
> +		/*
> +		 * If the KVM pointer is in the process of being set, wait until
> +		 * the process has completed.
> +		 */
> +		wait_event_cmd(matrix_mdev->wait_for_kvm,
> +			       matrix_mdev->kvm_busy == false,
> +			       mutex_unlock(&matrix_dev->lock),
> +			       mutex_lock(&matrix_dev->lock));
> +
> +		if (matrix_mdev->kvm)
> +			ret = vfio_ap_mdev_reset_queues(mdev);
> +		else
> +			ret = -ENODEV;

I don't think rejecting the reset is a good idea. I gave you a more detailed
explanation off the list, where we initially discussed this question.

How do you expect userspace to react to this -ENODEV?

Otherwise looks good to me!

I've tested your branch from yesterday (which looks to me like this patch
without the above check on ->kvm and reset) for the lockdep splat, but I
didn't do any comprehensive testing -- which would ensure that we didn't
break something else in the process. With the two issues fixed, and your
word that the patch was properly tested (except for the lockdep splat
which I tested myself), I feel comfortable with moving forward with this.

Regards,



* Re: [PATCH v3 1/1] s390/vfio-ap: fix circular lockdep when setting/clearing crypto masks
  2021-03-03 15:23   ` Halil Pasic
@ 2021-03-03 16:41     ` Tony Krowiak
  2021-03-03 19:42       ` Halil Pasic
  2021-03-03 17:10     ` Tony Krowiak
  1 sibling, 1 reply; 11+ messages in thread
From: Tony Krowiak @ 2021-03-03 16:41 UTC (permalink / raw)
  To: Halil Pasic
  Cc: linux-s390, linux-kernel, kvm, stable, borntraeger, cohuck,
	kwankhede, pbonzini, alex.williamson, pasic



On 3/3/21 10:23 AM, Halil Pasic wrote:
> On Tue,  2 Mar 2021 15:43:22 -0500
> Tony Krowiak <akrowiak@linux.ibm.com> wrote:
>
>> This patch fixes a lockdep splat introduced by commit f21916ec4826
>> ("s390/vfio-ap: clean up vfio_ap resources when KVM pointer invalidated").
>> The lockdep splat only occurs when starting a Secure Execution guest.
>> Crypto virtualization (vfio_ap) is not yet supported for SE guests;
>> however, in order to avoid this problem when support becomes available,
>> this fix is being provided.
> [..]
>
>> @@ -1038,14 +1116,28 @@ static int vfio_ap_mdev_set_kvm(struct ap_matrix_mdev *matrix_mdev,
>>   {
>>   	struct ap_matrix_mdev *m;
>>
>> -	list_for_each_entry(m, &matrix_dev->mdev_list, node) {
>> -		if ((m != matrix_mdev) && (m->kvm == kvm))
>> -			return -EPERM;
>> -	}
>> +	if (kvm->arch.crypto.crycbd) {
>> +		matrix_mdev->kvm_busy = true;
>>
>> -	matrix_mdev->kvm = kvm;
>> -	kvm_get_kvm(kvm);
>> -	kvm->arch.crypto.pqap_hook = &matrix_mdev->pqap_hook;
>> +		list_for_each_entry(m, &matrix_dev->mdev_list, node) {
>> +			if ((m != matrix_mdev) && (m->kvm == kvm)) {
>> +				wake_up_all(&matrix_mdev->wait_for_kvm);
> This ain't no good. kvm_busy will remain true if we take this exit. The
> wake_up_all() is not needed, because we hold the lock, so nobody can
> observe it if we don't forget kvm_busy set.
>
> I suggest moving matrix_mdev->kvm_busy = true; after this loop, maybe right
> before the unlock, and removing the wake_up_all().

Okay

>
>> +				return -EPERM;
>> +			}
>> +		}
>> +
>> +		kvm_get_kvm(kvm);
>> +		mutex_unlock(&matrix_dev->lock);
>> +		kvm_arch_crypto_set_masks(kvm,
>> +					  matrix_mdev->matrix.apm,
>> +					  matrix_mdev->matrix.aqm,
>> +					  matrix_mdev->matrix.adm);
>> +		mutex_lock(&matrix_dev->lock);
>> +		kvm->arch.crypto.pqap_hook = &matrix_mdev->pqap_hook;
>> +		matrix_mdev->kvm = kvm;
>> +		matrix_mdev->kvm_busy = false;
>> +		wake_up_all(&matrix_mdev->wait_for_kvm);
>> +	}
>>
>>   	return 0;
>>   }
> [..]
>
>> @@ -1300,7 +1406,21 @@ static ssize_t vfio_ap_mdev_ioctl(struct mdev_device *mdev,
>>   		ret = vfio_ap_mdev_get_device_info(arg);
>>   		break;
>>   	case VFIO_DEVICE_RESET:
>> -		ret = vfio_ap_mdev_reset_queues(mdev);
>> +		matrix_mdev = mdev_get_drvdata(mdev);
>> +
>> +		/*
>> +		 * If the KVM pointer is in the process of being set, wait until
>> +		 * the process has completed.
>> +		 */
>> +		wait_event_cmd(matrix_mdev->wait_for_kvm,
>> +			       matrix_mdev->kvm_busy == false,
>> +			       mutex_unlock(&matrix_dev->lock),
>> +			       mutex_lock(&matrix_dev->lock));
>> +
>> +		if (matrix_mdev->kvm)
>> +			ret = vfio_ap_mdev_reset_queues(mdev);
>> +		else
>> +			ret = -ENODEV;
> I don't think rejecting the reset is a good idea. I gave you a more detailed
> explanation off the list, where we initially discussed this question.
>
> How do you expect userspace to react to this -ENODEV?

The VFIO_DEVICE_RESET ioctl expects a return code.
The vfio_ap_mdev_reset_queues() function can return -EIO or
-EBUSY, so I would expect userspace to handle -ENODEV
similarly to -EIO or any other non-zero return code. I also
looked at all of the VFIO_DEVICE_RESET calls from QEMU to see
how the return from the ioctl call is handled:

* ap: reports the reset failed along with the rc
* ccw: doesn't check the rc
* pci: kind of hard to follow without digging deep, but definitely
  handles non-zero rc.

I think the caller should be notified whether the queues were
successfully reset or not, and why; in this case, the answer is
there are no devices to reset.

>
> Otherwise looks good to me!
>
> I've tested your branch from yesterday (which looks to me like this patch
> without the above check on ->kvm and reset) for the lockdep splat, but I
> didn't do any comprehensive testing -- which would ensure that we didn't
> break something else in the process. With the two issues fixed, and your
> word that the patch was properly tested (except for the lockdep splat
> which I tested myself), I feel comfortable with moving forward with this.
>
> Regards,
>



* Re: [PATCH v3 1/1] s390/vfio-ap: fix circular lockdep when setting/clearing crypto masks
  2021-03-03 15:23   ` Halil Pasic
  2021-03-03 16:41     ` Tony Krowiak
@ 2021-03-03 17:10     ` Tony Krowiak
  2021-03-03 19:47       ` Halil Pasic
  1 sibling, 1 reply; 11+ messages in thread
From: Tony Krowiak @ 2021-03-03 17:10 UTC (permalink / raw)
  To: Halil Pasic
  Cc: linux-s390, linux-kernel, kvm, stable, borntraeger, cohuck,
	kwankhede, pbonzini, alex.williamson, pasic



On 3/3/21 10:23 AM, Halil Pasic wrote:
> On Tue,  2 Mar 2021 15:43:22 -0500
> Tony Krowiak <akrowiak@linux.ibm.com> wrote:
>
>> This patch fixes a lockdep splat introduced by commit f21916ec4826
>> ("s390/vfio-ap: clean up vfio_ap resources when KVM pointer invalidated").
>> The lockdep splat only occurs when starting a Secure Execution guest.
>> Crypto virtualization (vfio_ap) is not yet supported for SE guests;
>> however, in order to avoid this problem when support becomes available,
>> this fix is being provided.
> [..]
>
>> @@ -1038,14 +1116,28 @@ static int vfio_ap_mdev_set_kvm(struct ap_matrix_mdev *matrix_mdev,
>>   {
>>   	struct ap_matrix_mdev *m;
>>
>> -	list_for_each_entry(m, &matrix_dev->mdev_list, node) {
>> -		if ((m != matrix_mdev) && (m->kvm == kvm))
>> -			return -EPERM;
>> -	}
>> +	if (kvm->arch.crypto.crycbd) {
>> +		matrix_mdev->kvm_busy = true;
>>
>> -	matrix_mdev->kvm = kvm;
>> -	kvm_get_kvm(kvm);
>> -	kvm->arch.crypto.pqap_hook = &matrix_mdev->pqap_hook;
>> +		list_for_each_entry(m, &matrix_dev->mdev_list, node) {
>> +			if ((m != matrix_mdev) && (m->kvm == kvm)) {
>> +				wake_up_all(&matrix_mdev->wait_for_kvm);
> This ain't no good. kvm_busy will remain true if we take this exit. The
> wake_up_all() is not needed, because we hold the lock, so nobody can
> observe it if we don't forget kvm_busy set.
>
> I suggest moving matrix_mdev->kvm_busy = true; after this loop, maybe right
> before the unlock, and removing the wake_up_all().
>
>> +				return -EPERM;
>> +			}
>> +		}
>> +
>> +		kvm_get_kvm(kvm);
>> +		mutex_unlock(&matrix_dev->lock);
>> +		kvm_arch_crypto_set_masks(kvm,
>> +					  matrix_mdev->matrix.apm,
>> +					  matrix_mdev->matrix.aqm,
>> +					  matrix_mdev->matrix.adm);
>> +		mutex_lock(&matrix_dev->lock);
>> +		kvm->arch.crypto.pqap_hook = &matrix_mdev->pqap_hook;
>> +		matrix_mdev->kvm = kvm;
>> +		matrix_mdev->kvm_busy = false;
>> +		wake_up_all(&matrix_mdev->wait_for_kvm);
>> +	}
>>
>>   	return 0;
>>   }
> [..]
>
>> @@ -1300,7 +1406,21 @@ static ssize_t vfio_ap_mdev_ioctl(struct mdev_device *mdev,
>>   		ret = vfio_ap_mdev_get_device_info(arg);
>>   		break;
>>   	case VFIO_DEVICE_RESET:
>> -		ret = vfio_ap_mdev_reset_queues(mdev);
>> +		matrix_mdev = mdev_get_drvdata(mdev);
>> +
>> +		/*
>> +		 * If the KVM pointer is in the process of being set, wait until
>> +		 * the process has completed.
>> +		 */
>> +		wait_event_cmd(matrix_mdev->wait_for_kvm,
>> +			       matrix_mdev->kvm_busy == false,
>> +			       mutex_unlock(&matrix_dev->lock),
>> +			       mutex_lock(&matrix_dev->lock));
>> +
>> +		if (matrix_mdev->kvm)
>> +			ret = vfio_ap_mdev_reset_queues(mdev);
>> +		else
>> +			ret = -ENODEV;
> I don't think rejecting the reset is a good idea. I gave you a more detailed
> explanation on the list, where we initially discussed this question.
>
> How do you expect userspace to react to this -ENODEV?

After reading your more detailed explanation, I have come to the
conclusion that the test for matrix_mdev->kvm should not be
performed here and that the vfio_ap_mdev_reset_queues() function
should be called regardless. Each queue assigned to the mdev
that is also bound to the vfio_ap driver will get reset and its
IRQ resources cleaned up if they haven't already been and the
other required conditions are met (i.e., see 
vfio_ap_mdev_free_irq_resources()).

>
> Otherwise looks good to me!
>
> I've tested your branch from yesterday (which looks to me like this patch
> without the above check on ->kvm and reset) for the lockdep splat, but I
> didn't do any comprehensive testing -- which would ensure that we didn't
> break something else in the process. With the two issues fixed, and your
> word that the patch was properly tested (except for the lockdep splat
> which I tested myself), I feel comfortable with moving forward with this.
>
> Regards,
>



* Re: [PATCH v3 1/1] s390/vfio-ap: fix circular lockdep when setting/clearing crypto masks
  2021-03-03 16:41     ` Tony Krowiak
@ 2021-03-03 19:42       ` Halil Pasic
  2021-03-04 16:22         ` Tony Krowiak
  0 siblings, 1 reply; 11+ messages in thread
From: Halil Pasic @ 2021-03-03 19:42 UTC (permalink / raw)
  To: Tony Krowiak
  Cc: linux-s390, linux-kernel, kvm, stable, borntraeger, cohuck,
	kwankhede, pbonzini, alex.williamson, pasic

On Wed, 3 Mar 2021 11:41:22 -0500
Tony Krowiak <akrowiak@linux.ibm.com> wrote:

> > How do you expect userspace to react to this -ENODEV?
> 
> The VFIO_DEVICE_RESET ioctl expects a return code.
> The vfio_ap_mdev_reset_queues() function can return -EIO or
> -EBUSY, so I would expect userspace to handle -ENODEV
> similarly to -EIO or any other non-zero return code. I also
> looked at all of the VFIO_DEVICE_RESET calls from QEMU to see
> how the return from the ioctl call is handled:
> 
> * ap: reports the reset failed along with the rc

And carries on as if nothing happened. There is not much smart
userspace can do in such a situation. Therefore the reset really
should not fail.

Please note that in this particular case, if the userspace would
opt for a retry, we would most likely end up in a retry loop.

> * ccw: doesn't check the rc
> * pci: kind of hard to follow without digging deep, but definitely
>           handles non-zero rc.
> 
> I think the caller should be notified whether the queues were
> successfully reset or not, and why; in this case, the answer is
> there are no devices to reset.

That is the wrong answer. The ioctl is supposed to reset the
ap_matrix_mdev device. The ap_matrix_mdev device still exists. Thus
returning -ENODEV is bogus.

Regards,
Halil


* Re: [PATCH v3 1/1] s390/vfio-ap: fix circular lockdep when setting/clearing crypto masks
  2021-03-03 17:10     ` Tony Krowiak
@ 2021-03-03 19:47       ` Halil Pasic
  2021-03-04 17:43         ` Tony Krowiak
  0 siblings, 1 reply; 11+ messages in thread
From: Halil Pasic @ 2021-03-03 19:47 UTC (permalink / raw)
  To: Tony Krowiak
  Cc: linux-s390, linux-kernel, kvm, stable, borntraeger, cohuck,
	kwankhede, pbonzini, alex.williamson, pasic

On Wed, 3 Mar 2021 12:10:11 -0500
Tony Krowiak <akrowiak@linux.ibm.com> wrote:

> On 3/3/21 10:23 AM, Halil Pasic wrote:
> > On Tue,  2 Mar 2021 15:43:22 -0500
> > Tony Krowiak <akrowiak@linux.ibm.com> wrote:
> >  
> >> This patch fixes a lockdep splat introduced by commit f21916ec4826
> >> ("s390/vfio-ap: clean up vfio_ap resources when KVM pointer invalidated").
> >> The lockdep splat only occurs when starting a Secure Execution guest.
> >> Crypto virtualization (vfio_ap) is not yet supported for SE guests;
> >> however, in order to avoid this problem when support becomes available,
> >> this fix is being provided.  
> > [..]
> >  
> >> @@ -1038,14 +1116,28 @@ static int vfio_ap_mdev_set_kvm(struct ap_matrix_mdev *matrix_mdev,
> >>   {
> >>   	struct ap_matrix_mdev *m;
> >>
> >> -	list_for_each_entry(m, &matrix_dev->mdev_list, node) {
> >> -		if ((m != matrix_mdev) && (m->kvm == kvm))
> >> -			return -EPERM;
> >> -	}
> >> +	if (kvm->arch.crypto.crycbd) {
> >> +		matrix_mdev->kvm_busy = true;
> >>
> >> -	matrix_mdev->kvm = kvm;
> >> -	kvm_get_kvm(kvm);
> >> -	kvm->arch.crypto.pqap_hook = &matrix_mdev->pqap_hook;
> >> +		list_for_each_entry(m, &matrix_dev->mdev_list, node) {
> >> +			if ((m != matrix_mdev) && (m->kvm == kvm)) {
> >> +				wake_up_all(&matrix_mdev->wait_for_kvm);  
> > This ain't no good. kvm_busy will remain true if we take this exit. The
> > wake_up_all() is not needed, because we hold the lock, so nobody can
> > observe kvm_busy as long as we don't leave it set.
> >
> > I suggest moving matrix_mdev->kvm_busy = true; after this loop, maybe right
> > before the unlock, and removing the wake_up_all().
> >  
> >> +				return -EPERM;
> >> +			}
> >> +		}
> >> +
> >> +		kvm_get_kvm(kvm);
> >> +		mutex_unlock(&matrix_dev->lock);
> >> +		kvm_arch_crypto_set_masks(kvm,
> >> +					  matrix_mdev->matrix.apm,
> >> +					  matrix_mdev->matrix.aqm,
> >> +					  matrix_mdev->matrix.adm);
> >> +		mutex_lock(&matrix_dev->lock);
> >> +		kvm->arch.crypto.pqap_hook = &matrix_mdev->pqap_hook;
> >> +		matrix_mdev->kvm = kvm;
> >> +		matrix_mdev->kvm_busy = false;
> >> +		wake_up_all(&matrix_mdev->wait_for_kvm);
> >> +	}
> >>
> >>   	return 0;
> >>   }  
> > [..]
> >  
> >> @@ -1300,7 +1406,21 @@ static ssize_t vfio_ap_mdev_ioctl(struct mdev_device *mdev,
> >>   		ret = vfio_ap_mdev_get_device_info(arg);
> >>   		break;
> >>   	case VFIO_DEVICE_RESET:
> >> -		ret = vfio_ap_mdev_reset_queues(mdev);
> >> +		matrix_mdev = mdev_get_drvdata(mdev);
> >> +
> >> +		/*
> >> +		 * If the KVM pointer is in the process of being set, wait until
> >> +		 * the process has completed.
> >> +		 */
> >> +		wait_event_cmd(matrix_mdev->wait_for_kvm,
> >> +			       matrix_mdev->kvm_busy == false,
> >> +			       mutex_unlock(&matrix_dev->lock),
> >> +			       mutex_lock(&matrix_dev->lock));
> >> +
> >> +		if (matrix_mdev->kvm)
> >> +			ret = vfio_ap_mdev_reset_queues(mdev);
> >> +		else
> >> +			ret = -ENODEV;  
> > > I don't think rejecting the reset is a good idea. I gave you a more detailed
> > > explanation on the list, where we initially discussed this question.
> > >
> > > How do you expect userspace to react to this -ENODEV?
> 
> After reading your more detailed explanation, I have come to the
> conclusion that the test for matrix_mdev->kvm should not be
> > performed here and that the vfio_ap_mdev_reset_queues() function
> should be called regardless. Each queue assigned to the mdev
> that is also bound to the vfio_ap driver will get reset and its
> IRQ resources cleaned up if they haven't already been and the
> other required conditions are met (i.e., see 
> vfio_ap_mdev_free_irq_resources()).

My point is if !->kvm the other required conditions are not met. But
yes we can go back to unconditional vfio_ap_mdev_reset_queues(mdev),
and think about the necessity of performing a
vfio_ap_mdev_reset_queues() if !->kvm later as I proposed in the other
mail.
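
For readers following along, here is a minimal user-space sketch of the
reset path as agreed at this point: wait until kvm_busy clears, then reset
the queues whether or not ->kvm is set. It is not the driver code. A pthread
mutex stands in for matrix_dev->lock, a condition variable stands in for the
wait queue used by wait_event_cmd(), and every model_* name is hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space stand-in for struct ap_matrix_mdev. */
struct model_mdev {
	pthread_mutex_t lock;		/* plays the role of matrix_dev->lock */
	pthread_cond_t wait_for_kvm;	/* plays the role of the wait queue */
	bool kvm_busy;			/* true while the KVM pointer is being set */
	void *kvm;			/* NULL when no guest is attached */
	int resets;			/* counts reset calls, for illustration only */
};

#define MODEL_MDEV_INIT \
	{ PTHREAD_MUTEX_INITIALIZER, PTHREAD_COND_INITIALIZER, false, NULL, 0 }

/* Stand-in for vfio_ap_mdev_reset_queues(); in this model it always succeeds. */
static int model_reset_queues(struct model_mdev *m)
{
	m->resets++;
	return 0;
}

/* Model of the VFIO_DEVICE_RESET case without the ->kvm test. */
int model_device_reset(struct model_mdev *m)
{
	int ret;

	pthread_mutex_lock(&m->lock);
	/*
	 * pthread_cond_wait() drops the lock while sleeping and retakes it
	 * on wakeup, which is what the unlock/lock commands passed to
	 * wait_event_cmd() do in the patch.
	 */
	while (m->kvm_busy)
		pthread_cond_wait(&m->wait_for_kvm, &m->lock);
	assert(!m->kvm_busy);

	/* Unconditional: no -ENODEV when m->kvm is NULL. */
	ret = model_reset_queues(m);
	pthread_mutex_unlock(&m->lock);
	return ret;
}
```

With ->kvm left NULL and kvm_busy false, model_device_reset() returns
whatever the reset itself returns rather than -ENODEV.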

Regards,
Halil


* Re: [PATCH v3 1/1] s390/vfio-ap: fix circular lockdep when setting/clearing crypto masks
  2021-03-03 19:42       ` Halil Pasic
@ 2021-03-04 16:22         ` Tony Krowiak
  0 siblings, 0 replies; 11+ messages in thread
From: Tony Krowiak @ 2021-03-04 16:22 UTC (permalink / raw)
  To: Halil Pasic
  Cc: linux-s390, linux-kernel, kvm, stable, borntraeger, cohuck,
	kwankhede, pbonzini, alex.williamson, pasic



On 3/3/21 2:42 PM, Halil Pasic wrote:
> On Wed, 3 Mar 2021 11:41:22 -0500
> Tony Krowiak <akrowiak@linux.ibm.com> wrote:
>
>>> How do you expect userspace to react to this -ENODEV?
>> The VFIO_DEVICE_RESET ioctl expects a return code.
>> The vfio_ap_mdev_reset_queues() function can return -EIO or
>> -EBUSY, so I would expect userspace to handle -ENODEV
>> similarly to -EIO or any other non-zero return code. I also
>> looked at all of the VFIO_DEVICE_RESET calls from QEMU to see
>> how the return from the ioctl call is handled:
>>
>> * ap: reports the reset failed along with the rc
> And carries on as if nothing happened. There is not much smart
> userspace can do in such a situation. Therefore the reset really
> should not fail.

Regardless of what we decide to do here, there is the
possibility that the vfio_ap_mdev_reset_queues()
function will return an error, so your point is moot
and maybe should be brought up as a QEMU
implementation issue. I don't think it is incumbent
upon the KVM code to anticipate how userspace
will respond to a non-zero return code. I think the
pertinent question is what return code makes sense.
Having said that, I have other concerns, which I
discuss below.

>
> Please note that in this particular case, if the userspace would
> opt for a retry, we would most likely end up in a retry loop.
>
>> * ccw: doesn't check the rc
>> * pci: kind of hard to follow without digging deep, but definitely
>>            handles non-zero rc.
>>
>> I think the caller should be notified whether the queues were
>> successfully reset or not, and why; in this case, the answer is
>> there are no devices to reset.
> That is the wrong answer. The ioctl is supposed to reset the
> ap_matrix_mdev device. The ap_matrix_mdev device still exists. Thus
> returning -ENODEV is bogus.

That makes sense, and it raises the question: what does it mean to
reset the mdev? Is resetting the queues an appropriate response
to the VFIO_DEVICE_RESET ioctl call?

The purpose of the mdev is to supply the AP configuration to a KVM
guest. The queues themselves belong to the guest. If the guest enables
interrupts for a queue and vfio_ap does a reset in response to the ioctl
call, then the guest will be sitting there waiting for interrupts which
have been disabled by the reset. It seems that as long as a guest is
using the mdev, then management of its queues (i.e., reset) should be
left to the guest. Unless there is something to reset as far as the
mdev is concerned, maybe the response to the VFIO_DEVICE_RESET
ioctl ought to be a NOP regardless of the value of ->kvm.

>
> Regards,
> Halil



* Re: [PATCH v3 1/1] s390/vfio-ap: fix circular lockdep when setting/clearing crypto masks
  2021-03-03 19:47       ` Halil Pasic
@ 2021-03-04 17:43         ` Tony Krowiak
  2021-03-09 10:23           ` Halil Pasic
  0 siblings, 1 reply; 11+ messages in thread
From: Tony Krowiak @ 2021-03-04 17:43 UTC (permalink / raw)
  To: Halil Pasic
  Cc: linux-s390, linux-kernel, kvm, stable, borntraeger, cohuck,
	kwankhede, pbonzini, alex.williamson, pasic



On 3/3/21 2:47 PM, Halil Pasic wrote:
> On Wed, 3 Mar 2021 12:10:11 -0500
> Tony Krowiak <akrowiak@linux.ibm.com> wrote:
>
>> On 3/3/21 10:23 AM, Halil Pasic wrote:
>>> On Tue,  2 Mar 2021 15:43:22 -0500
>>> Tony Krowiak <akrowiak@linux.ibm.com> wrote:
>>>   
>>>> This patch fixes a lockdep splat introduced by commit f21916ec4826
>>>> ("s390/vfio-ap: clean up vfio_ap resources when KVM pointer invalidated").
>>>> The lockdep splat only occurs when starting a Secure Execution guest.
>>>> Crypto virtualization (vfio_ap) is not yet supported for SE guests;
>>>> however, in order to avoid this problem when support becomes available,
>>>> this fix is being provided.
>>> [..]
>>>   
>>>> @@ -1038,14 +1116,28 @@ static int vfio_ap_mdev_set_kvm(struct ap_matrix_mdev *matrix_mdev,
>>>>    {
>>>>    	struct ap_matrix_mdev *m;
>>>>
>>>> -	list_for_each_entry(m, &matrix_dev->mdev_list, node) {
>>>> -		if ((m != matrix_mdev) && (m->kvm == kvm))
>>>> -			return -EPERM;
>>>> -	}
>>>> +	if (kvm->arch.crypto.crycbd) {
>>>> +		matrix_mdev->kvm_busy = true;
>>>>
>>>> -	matrix_mdev->kvm = kvm;
>>>> -	kvm_get_kvm(kvm);
>>>> -	kvm->arch.crypto.pqap_hook = &matrix_mdev->pqap_hook;
>>>> +		list_for_each_entry(m, &matrix_dev->mdev_list, node) {
>>>> +			if ((m != matrix_mdev) && (m->kvm == kvm)) {
>>>> +				wake_up_all(&matrix_mdev->wait_for_kvm);
>>> This ain't no good. kvm_busy will remain true if we take this exit. The
>>> wake_up_all() is not needed, because we hold the lock, so nobody can
>>> observe kvm_busy as long as we don't leave it set.
>>>
>>> I suggest moving matrix_mdev->kvm_busy = true; after this loop, maybe right
>>> before the unlock, and removing the wake_up_all().
>>>   
>>>> +				return -EPERM;
>>>> +			}
>>>> +		}
>>>> +
>>>> +		kvm_get_kvm(kvm);
>>>> +		mutex_unlock(&matrix_dev->lock);
>>>> +		kvm_arch_crypto_set_masks(kvm,
>>>> +					  matrix_mdev->matrix.apm,
>>>> +					  matrix_mdev->matrix.aqm,
>>>> +					  matrix_mdev->matrix.adm);
>>>> +		mutex_lock(&matrix_dev->lock);
>>>> +		kvm->arch.crypto.pqap_hook = &matrix_mdev->pqap_hook;
>>>> +		matrix_mdev->kvm = kvm;
>>>> +		matrix_mdev->kvm_busy = false;
>>>> +		wake_up_all(&matrix_mdev->wait_for_kvm);
>>>> +	}
>>>>
>>>>    	return 0;
>>>>    }
>>> [..]
>>>   
>>>> @@ -1300,7 +1406,21 @@ static ssize_t vfio_ap_mdev_ioctl(struct mdev_device *mdev,
>>>>    		ret = vfio_ap_mdev_get_device_info(arg);
>>>>    		break;
>>>>    	case VFIO_DEVICE_RESET:
>>>> -		ret = vfio_ap_mdev_reset_queues(mdev);
>>>> +		matrix_mdev = mdev_get_drvdata(mdev);
>>>> +
>>>> +		/*
>>>> +		 * If the KVM pointer is in the process of being set, wait until
>>>> +		 * the process has completed.
>>>> +		 */
>>>> +		wait_event_cmd(matrix_mdev->wait_for_kvm,
>>>> +			       matrix_mdev->kvm_busy == false,
>>>> +			       mutex_unlock(&matrix_dev->lock),
>>>> +			       mutex_lock(&matrix_dev->lock));
>>>> +
>>>> +		if (matrix_mdev->kvm)
>>>> +			ret = vfio_ap_mdev_reset_queues(mdev);
>>>> +		else
>>>> +			ret = -ENODEV;
>>> I don't think rejecting the reset is a good idea. I gave you a more detailed
>>> explanation on the list, where we initially discussed this question.
>>>
>>> How do you expect userspace to react to this -ENODEV?
>> After reading your more detailed explanation, I have come to the
>> conclusion that the test for matrix_mdev->kvm should not be
>> performed here and that the vfio_ap_mdev_reset_queues() function
>> should be called regardless. Each queue assigned to the mdev
>> that is also bound to the vfio_ap driver will get reset and its
>> IRQ resources cleaned up if they haven't already been and the
>> other required conditions are met (i.e., see
>> vfio_ap_mdev_free_irq_resources()).
> My point is if !->kvm the other required conditions are not met. But
> yes we can go back to unconditional vfio_ap_mdev_reset_queues(mdev),
> and think about the necessity of performing a
> vfio_ap_mdev_reset_queues() if !->kvm later as I proposed in the other
> mail.

Whether the other conditions have been met depends on why ->kvm is
NULL. If it is NULL because the VFIO_DEVICE_RESET ioctl was invoked
while the matrix_dev->lock was released in the
vfio_ap_mdev_unset_kvm() function, then there is no need to clean up
the IRQ resources because it will already have been done.

On the other hand, if we don't have ->kvm because something broke,
then we may be out of luck anyway. There will certainly be no
way to unregister the GISC; however, it may still be possible
to unpin the pages if we still have q->saved_pfn.

The point is, if the queue is bound to vfio_ap, it can be reset. If we can't
clean up the IRQ resources because something is broken, then there
is nothing we can do about that.
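
To make the cases above concrete, here is a small user-space sketch, not
the driver code. It assumes, as described in this mail, that a bound queue
can always be reset, that unregistering the GISC needs ->kvm, and that
unpinning only needs a remaining saved_pfn. struct model_queue and all of
its fields are hypothetical simplifications.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for a queue owned by vfio_ap. */
struct model_queue {
	bool bound;			/* bound to the vfio_ap driver */
	void *kvm;			/* NULL if the KVM pointer is gone */
	unsigned long saved_pfn;	/* nonzero while a page is still pinned */
	bool gisc_registered;		/* interrupt source still registered */
	bool was_reset;			/* set once the queue has been reset */
};

void model_reset_queue(struct model_queue *q)
{
	if (!q->bound)
		return;			/* not ours, nothing to do */

	q->was_reset = true;		/* the reset does not depend on ->kvm */

	/* Assumption from the mail: without ->kvm the GISC stays registered. */
	if (q->kvm && q->gisc_registered)
		q->gisc_registered = false;

	/* Assumption from the mail: unpinning only needs saved_pfn. */
	if (q->saved_pfn)
		q->saved_pfn = 0;
}
```

In the "something broke" case (->kvm NULL, page still pinned, GISC still
registered) the model resets the queue and unpins the page, and the GISC
registration is the part that cannot be cleaned up.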


>
> Regards,
> Halil



* Re: [PATCH v3 1/1] s390/vfio-ap: fix circular lockdep when setting/clearing crypto masks
  2021-03-04 17:43         ` Tony Krowiak
@ 2021-03-09 10:23           ` Halil Pasic
  2021-03-09 14:27             ` Tony Krowiak
  0 siblings, 1 reply; 11+ messages in thread
From: Halil Pasic @ 2021-03-09 10:23 UTC (permalink / raw)
  To: Tony Krowiak
  Cc: linux-s390, linux-kernel, kvm, stable, borntraeger, cohuck,
	kwankhede, pbonzini, alex.williamson, pasic

On Thu, 4 Mar 2021 12:43:44 -0500
Tony Krowiak <akrowiak@linux.ibm.com> wrote:

> On the other hand, if we don't have ->kvm because something broke,
> then we may be out of luck anyway. There will certainly be no
> way to unregister the GISC; however, it may still be possible
> to unpin the pages if we still have q->saved_pfn.
> 
> The point is, if the queue is bound to vfio_ap, it can be reset. If we can't
> clean up the IRQ resources because something is broken, then there
> is nothing we can do about that.

Especially with the recently added WARN_ONCE macros, calling reset_queues
unconditionally ain't that bad: we would at least see if there is a
problem with cleaning up the IRQ resources.

Let's make it unconditional again and observe. Can you send out a v4 with
this and the other issue fixed?
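
For reference, the "other issue" is the kvm_busy handling in
vfio_ap_mdev_set_kvm(). Below is a minimal user-space sketch of the
reordering suggested earlier in the thread, not the driver code: the
duplicate check runs first and the flag is set only afterwards, so the
-EPERM exit cannot leave kvm_busy set. struct model_mdev and
model_set_kvm() are hypothetical, and locking is reduced to comments.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for struct ap_matrix_mdev. */
struct model_mdev {
	void *kvm;
	bool kvm_busy;
};

/*
 * Model of the reordered set_kvm path. 'all' plays the role of
 * matrix_dev->mdev_list; the caller is assumed to hold matrix_dev->lock.
 */
int model_set_kvm(struct model_mdev *self, struct model_mdev **all,
		  size_t n, void *kvm)
{
	size_t i;

	for (i = 0; i < n; i++) {
		if (all[i] != self && all[i]->kvm == kvm)
			return -EPERM;	/* kvm_busy was never set on this path */
	}

	self->kvm_busy = true;
	/* The driver would drop the lock, set the crypto masks, retake it. */
	self->kvm = kvm;
	self->kvm_busy = false;
	/* The driver would wake_up_all() the waiters here. */
	return 0;
}
```

Because the flag is only ever set with the lock held and after the early
exit, no wake_up_all() is needed on the -EPERM path.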

Regards,
Halil


* Re: [PATCH v3 1/1] s390/vfio-ap: fix circular lockdep when setting/clearing crypto masks
  2021-03-09 10:23           ` Halil Pasic
@ 2021-03-09 14:27             ` Tony Krowiak
  0 siblings, 0 replies; 11+ messages in thread
From: Tony Krowiak @ 2021-03-09 14:27 UTC (permalink / raw)
  To: Halil Pasic
  Cc: linux-s390, linux-kernel, kvm, stable, borntraeger, cohuck,
	kwankhede, pbonzini, alex.williamson, pasic



On 3/9/21 5:23 AM, Halil Pasic wrote:
> On Thu, 4 Mar 2021 12:43:44 -0500
> Tony Krowiak <akrowiak@linux.ibm.com> wrote:
>
>> On the other hand, if we don't have ->kvm because something broke,
>> then we may be out of luck anyway. There will certainly be no
>> way to unregister the GISC; however, it may still be possible
>> to unpin the pages if we still have q->saved_pfn.
>>
>> The point is, if the queue is bound to vfio_ap, it can be reset. If we can't
>> clean up the IRQ resources because something is broken, then there
>> is nothing we can do about that.
> Especially with the recently added WARN_ONCE macros, calling reset_queues
> unconditionally ain't that bad: we would at least see if there is a
> problem with cleaning up the IRQ resources.
>
> Let's make it unconditional again and observe. Can you send out a v4 with
> this and the other issue fixed?

I agree and I can do that.

>   
>
> Regards,
> Halil



Thread overview (11 messages):
2021-03-02 20:43 [PATCH v3 0/1] s390/vfio-ap: fix circular lockdep when starting SE guest Tony Krowiak
2021-03-02 20:43 ` [PATCH v3 1/1] s390/vfio-ap: fix circular lockdep when setting/clearing crypto masks Tony Krowiak
2021-03-03 15:23   ` Halil Pasic
2021-03-03 16:41     ` Tony Krowiak
2021-03-03 19:42       ` Halil Pasic
2021-03-04 16:22         ` Tony Krowiak
2021-03-03 17:10     ` Tony Krowiak
2021-03-03 19:47       ` Halil Pasic
2021-03-04 17:43         ` Tony Krowiak
2021-03-09 10:23           ` Halil Pasic
2021-03-09 14:27             ` Tony Krowiak
