kvm.vger.kernel.org archive mirror
* [PATCH v13 00/15] s390/vfio-ap: dynamic configuration support
@ 2020-12-23  1:15 Tony Krowiak
  2020-12-23  1:15 ` [PATCH v13 01/15] s390/vfio-ap: clean up vfio_ap resources when KVM pointer invalidated Tony Krowiak
                   ` (15 more replies)
  0 siblings, 16 replies; 48+ messages in thread
From: Tony Krowiak @ 2020-12-23  1:15 UTC (permalink / raw)
  To: linux-s390, linux-kernel, kvm
  Cc: freude, borntraeger, cohuck, mjrosato, pasic, alex.williamson,
	kwankhede, fiuczy, frankja, david, hca, gor, Tony Krowiak

Note: Patch 1, "s390/vfio-ap: clean up vfio_ap resources when KVM
      pointer invalidated", does not belong to this series. It has been
      posted as a separate patch to fix a known problem. It is included
      here because it will likely be a prerequisite for this series.

The current design for AP pass-through does not support making dynamic
changes to the AP matrix of a running guest, resulting in a few
deficiencies that this patch series is intended to mitigate:

1. Adapters, domains and control domains cannot be added to or removed
   from a running guest. In order to modify a guest's AP configuration,
   the guest must be terminated; only then can AP resources be assigned
   to or unassigned from the guest's matrix mdev. The new AP 
   configuration becomes available to the guest when it is subsequently
   restarted.

2. The AP bus's /sys/bus/ap/apmask and /sys/bus/ap/aqmask interfaces can
   be modified by a root user without any restrictions. A change to
   either mask can result in AP queue devices being unbound from the
   vfio_ap device driver and bound to a zcrypt device driver even if a
   guest is using the queues, thus giving the host access to the guest's
   private crypto data and vice versa.

3. The APQNs derived from the Cartesian product of the APIDs of the
   adapters and APQIs of the domains assigned to a matrix mdev must
   reference an AP queue device bound to the vfio_ap device driver. The
   AP architecture allows assignment of AP resources that are not
   available to the system, so this artificial restriction is not 
   compliant with the architecture.

4. The AP configuration profile can be dynamically changed for the Linux
   host after a KVM guest is started. For example, a new domain can be
   dynamically added to the configuration profile via the SE or an HMC
   connected to a DPM-enabled LPAR. Likewise, AP adapters can be
   dynamically configured (online state) and deconfigured (standby state)
   using the SE, an SCLP command or an HMC connected to a DPM-enabled
   LPAR. This can result in inadvertent sharing of AP queues between the
   guest and host.

5. A root user can manually unbind an AP queue device representing a 
   queue in use by a KVM guest via the vfio_ap device driver's sysfs 
   unbind attribute. In this case, the guest will be using a queue that
   is not bound to the driver which violates the device model.
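
To make item 3 concrete, here is a minimal user-space sketch of how the
APQNs of a matrix mdev are derived from its adapters and domains. The
helper names are hypothetical. The bit layout (APID in the high byte,
APQI in the low byte) is an assumption modelled on the AP_MKQID(),
AP_QID_CARD() and AP_QID_QUEUE() macros used in the diffs of this series.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper: build an APQN from an adapter ID (APID) and a
 * domain index (APQI). Assumed layout: APID in bits 8-15, APQI in
 * bits 0-7.
 */
static unsigned int make_apqn(unsigned int apid, unsigned int apqi)
{
	assert(apid <= 0xff && apqi <= 0xff);
	return (apid << 8) | apqi;
}

/*
 * Write the Cartesian product of apids[] x apqis[] into out[], stopping
 * when out[] is full. Returns the number of APQNs written.
 */
static size_t apqn_cross_product(const unsigned int *apids, size_t n_apids,
				 const unsigned int *apqis, size_t n_apqis,
				 unsigned int *out, size_t out_len)
{
	size_t n = 0;

	for (size_t i = 0; i < n_apids; i++) {
		for (size_t j = 0; j < n_apqis; j++) {
			if (n == out_len)
				return n;
			out[n++] = make_apqn(apids[i], apqis[j]);
		}
	}
	return n;
}
```

Assigning adapters 0x01 and 0x02 together with domains 0x04 and 0x47
therefore yields four APQNs. Under the current design, each of them must
reference a queue device bound to the vfio_ap device driver.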

This patch series introduces the following changes to the current design
to alleviate the shortcomings described above as well as to implement
more of the AP architecture:

1. A root user will be prevented from making edits to the AP bus's
   /sys/bus/ap/apmask or /sys/bus/ap/aqmask if the change would transfer
   ownership of an APQN from the vfio_ap device driver to a zcrypt driver
   while the APQN is assigned to a matrix mdev.

2. Allow a root user to hot plug/unplug AP adapters, domains and control
   domains for a KVM guest using the matrix mdev via its sysfs
   assign/unassign attributes.

3. Allow assignment of an AP adapter or domain to a matrix mdev even if
   it results in assignment of an APQN that does not reference an AP
   queue device bound to the vfio_ap device driver, as long as the APQN
   is not reserved for use by the default zcrypt drivers (also known as
   over-provisioning of AP resources). Allowing over-provisioning of AP
   resources better models the architecture which does not preclude
   assigning AP resources that are not yet available in the system. Such
   APQNs, however, will not be assigned to the guest using the matrix
   mdev; only APQNs referencing AP queue devices bound to the vfio_ap
   device driver will actually get assigned to the guest.

4. Handle dynamic changes to the AP device model.

1. Rationale for changes to AP bus's apmask/aqmask interfaces:
----------------------------------------------------------
Due to the extremely sensitive nature of cryptographic data, it is
imperative that great care be taken to ensure that such data is secured.
Allowing a root user, either inadvertently or maliciously, to configure
these masks such that a queue is shared between the host and a guest is
avoidable, and preventing it is advisable. It was suggested that this scenario
is better handled in user space with management software, but that does
not preclude a malicious administrator from using the sysfs interfaces
to gain access to a guest's crypto data. It was also suggested that this
scenario could be avoided by taking access to the adapter away from the
guest and zeroing out the queues prior to the vfio_ap driver releasing the
device; however, stealing an adapter in use from a guest as a by-product
of an operation is bad and will likely cause problems for the guest
unnecessarily. It was decided that the most effective solution with the
least number of negative side effects is to prevent the situation at the
source.
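
The check described above can be sketched in user space roughly as
follows. This is an illustration only, not the code in this series: the
struct, the function names and the -EBUSY return value are hypothetical,
the masks are narrowed to 64 bits, and plain LSB-first bit numbering is
used where the kernel uses its own bitmap helpers.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical 64-entry stand-in for an adapter mask / domain mask pair. */
struct ap_masks {
	uint64_t apm;	/* bit n set: adapter n selected */
	uint64_t aqm;	/* bit n set: domain n selected */
};

/* True when APQN (apid, apqi) lies in the cross product of the masks. */
static bool masks_contain(const struct ap_masks *m, unsigned int apid,
			  unsigned int apqi)
{
	return ((m->apm >> apid) & 1) && ((m->aqm >> apqi) & 1);
}

/*
 * Reject a change of the masks reserving queues for the default zcrypt
 * drivers if it would newly reserve an APQN assigned to a matrix mdev.
 */
static int check_mask_change(const struct ap_masks *old_zcrypt,
			     const struct ap_masks *new_zcrypt,
			     const struct ap_masks *mdev)
{
	assert(old_zcrypt && new_zcrypt && mdev);

	for (unsigned int apid = 0; apid < 64; apid++) {
		for (unsigned int apqi = 0; apqi < 64; apqi++) {
			if (masks_contain(new_zcrypt, apid, apqi) &&
			    !masks_contain(old_zcrypt, apid, apqi) &&
			    masks_contain(mdev, apid, apqi))
				return -EBUSY;
		}
	}
	return 0;
}
```

The point of the sketch is that the write to the mask is refused before
any queue changes hands, so nothing has to be taken away from a running
guest afterwards.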

2. Rationale for hot plug/unplug using matrix mdev sysfs interfaces:
----------------------------------------------------------------
Allowing a user to hot plug/unplug AP resources using the matrix mdev
sysfs interfaces circumvents the need to terminate the guest in order to
modify its AP configuration. Allowing dynamic configuration makes 
reconfiguring a guest's AP matrix much less disruptive.

3. Rationale for allowing over-provisioning of AP resources:
----------------------------------------------------------- 
Allowing assignment of unavailable AP resources to a matrix mdev, and
ultimately to a guest once they become available, better models the AP
architecture. The architecture does not preclude assignment of
unavailable AP resources. If a queue subsequently becomes available while
a guest is using the matrix mdev to which its APQN is assigned, the guest
will be given access to it. If an APQN is dynamically unassigned from the
underlying host system, it will automatically become unavailable to the
guest.
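
A rough user-space sketch of this behaviour, assuming the per-adapter
(APID) filtering mentioned in the change logs below: an adapter is passed
through only while every APQN it forms with the assigned domains
references a bound queue. All names are hypothetical and the masks are
narrowed to 8 entries to keep the example short.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_IDS 8	/* the real AP masks are far wider */

/* Hypothetical table: bound[apid][apqi] is true when a queue device for
 * that APQN is bound to the vfio_ap driver. */
struct bound_table {
	bool bound[SKETCH_IDS][SKETCH_IDS];
};

/*
 * Return the adapter mask that would be given to the guest: an adapter
 * assigned to the mdev is kept only if all of its queues are bound.
 */
static uint8_t filter_adapters(uint8_t mdev_apm, uint8_t mdev_aqm,
			       const struct bound_table *t)
{
	uint8_t guest_apm = 0;

	assert(t != NULL);

	for (unsigned int apid = 0; apid < SKETCH_IDS; apid++) {
		bool all_bound = true;

		if (!(mdev_apm & (1u << apid)))
			continue;

		for (unsigned int apqi = 0; apqi < SKETCH_IDS; apqi++) {
			if ((mdev_aqm & (1u << apqi)) &&
			    !t->bound[apid][apqi]) {
				all_bound = false;
				break;
			}
		}
		if (all_bound)
			guest_apm |= (uint8_t)(1u << apid);
	}
	return guest_apm;
}
```

Re-running the filter after a queue is bound or unbound gives the
behaviour described above: the over-provisioned adapter appears in the
guest once its last missing queue becomes available, and disappears
again when one of its queues goes away.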

Change log v12-v13:
------------------
* Combined patches 12/13 from previous series into one patch

* Moved all changes for linking queues and mdevs into a single patch

* Re-ordered some patches to aid in review

* Using mutex_trylock() function in adapter/domain assignment functions
  to avoid potential deadlock condition with in_use callback

* Using filtering function for refreshing the guest's APCB for all events
  that change the APCB: assign/unassign adapters, domains, control domains;
  bind/unbind of queue devices; and, changes to the host AP configuration.
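
The mutex_trylock() item can be illustrated with a small user-space
analogue built on pthread_mutex_trylock(). The lock, the state it guards
and the -EBUSY return value are stand-ins chosen for this sketch; it is
not the driver code.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>

/* Stand-in for the matrix device lock. */
static pthread_mutex_t matrix_lock = PTHREAD_MUTEX_INITIALIZER;

/* Hypothetical state guarded by matrix_lock: one bit per adapter. */
static unsigned long assigned_adapters;

/*
 * Assign an adapter without ever blocking on matrix_lock. If the lock
 * is held elsewhere (for instance by a path that may in turn be waiting
 * for a lock the caller already holds), back off with -EBUSY instead of
 * sleeping, so the two paths cannot deadlock on each other.
 */
static int assign_adapter(unsigned int apid)
{
	assert(apid < 8 * sizeof(assigned_adapters));

	if (pthread_mutex_trylock(&matrix_lock) != 0)
		return -EBUSY;

	assigned_adapters |= 1UL << apid;
	pthread_mutex_unlock(&matrix_lock);
	return 0;
}
```

The cost of this approach is that the caller may have to retry the
operation. Build with -pthread where the toolchain requires it.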

Change log v11-v12:
------------------
* Moved matrix device lock to protect group notifier callback

* Split the 'No need to disable IRQ after queue reset' patch into
  multiple patches for easier review (move probe/remove callback
  functions and remove disable IRQ after queue reset)

* Added code to decrement reference count for KVM in group notifier
  callback

* Using mutex_trylock() in functions implementing the sysfs assign_adapter
  and assign_domain as well as the in_use callback to avoid deadlock 
  between the AP bus's ap_perms mutex and the matrix device lock used by
  vfio_ap driver.

* The sysfs guest_matrix attribute of the vfio_ap mdev will now display
  the shadow APCB regardless of whether a guest is using the mdev or not

* Replaced vfio_ap mdev filtering function with a function that initializes
  the guest's APCB by filtering the vfio_ap mdev by APID.

* No longer using filtering function during adapter/domain assignment
  to/from the vfio_ap mdev; replaced with new hot plug/unplug 
  adapter/domain functions.

* No longer using filtering function during bind/unbind; replaced with
  hot plug/unplug queue functions.

* No longer using filtering function for bulk assignment of new adapters
  and domains in on_scan_complete callback; replaced with new hot plug
  functions.    
  

Change log v10-v11:
------------------
* The matrix mdev's configuration is now filtered by APID so that if any
  APQN assigned to the mdev is not bound to the vfio_ap device driver,
  the adapter will not get plugged into the KVM guest on startup, or when
  a new adapter is assigned to the mdev.

* Replaced patch 8 by squashing patches 8 (filtering patch) and 15 (handle 
  probe/remove).

* Added a patch 1 to remove disable IRQ after a reset because the reset
  already disables a queue.

* Now using filtering code to update the KVM guest's matrix when
  notified that AP bus scan has completed.

* Fixed issue with probe/remove not initiated by a configuration change
  occurring within a config change.


Change log v9-v10:
-----------------
* Updated the documentation in vfio-ap.rst to include information about the
  AP dynamic configuration support

Change log v8-v9:
----------------
* Fixed errors flagged by the kernel test robot

* Fixed issue with guest losing queues when a new queue is probed due to
  manual bind operation.

Change log v7-v8:
----------------
* Now logging a message when an attempt to reserve APQNs for the zcrypt
  drivers will result in taking a queue away from a KVM guest to provide
  the sysadmin a way to ascertain why the sysfs operation failed.

* Created locked and unlocked versions of the ap_parse_mask_str() function.

* Now using new interface provided by an AP bus patch -
  s390/ap: introduce new ap function ap_get_qdev() - to retrieve
  struct ap_queue representing an AP queue device. This patch is not a
  part of this series but is a prerequisite for this series. 

Change log v6-v7:
----------------
* Added callbacks to AP bus:
  - on_config_changed: Notifies implementing drivers that
    the AP configuration has changed since last AP device scan.
  - on_scan_complete: Notifies implementing drivers that the device scan
    has completed.
  - implemented on_config_changed and on_scan_complete callbacks for
    vfio_ap device driver.
  - updated vfio_ap device driver's probe and remove callbacks to handle
    dynamic changes to the AP device model. 
* Added code to filter APQNs when assigning AP resources to a KVM guest's
  CRYCB

Change log v5-v6:
----------------
* Fixed a bug in ap_bus.c introduced with patch 2/7 of the v5
  series. Harald Freudenberger pointed out that the mutex lock
  for ap_perms_mutex in the apmask_store and aqmask_store functions
  was not being released.

* Removed patch 6/7 which added logging to the vfio_ap driver
  to expedite acceptance of this series. The logging will be introduced
  with a separate patch series to allow more time to explore options
  such as DBF logging vs. tracepoints.

* Added 3 patches related to ensuring that APQNs that do not reference
  AP queue devices bound to the vfio_ap device driver are not assigned
  to the guest CRYCB:

  Patch 4: Filter CRYCB bits for unavailable queue devices
  Patch 5: sysfs attribute to display the guest CRYCB
  Patch 6: update guest CRYCB in vfio_ap probe and remove callbacks

* Added a patch (Patch 9) to version the vfio_ap module.

* Reshuffled patches to allow the in_use callback implementation to
  invoke the vfio_ap_mdev_verify_no_sharing() function introduced in
  patch 2. 

Change log v4-v5:
----------------
* Added a patch to provide kernel s390dbf debug logs for VFIO AP

Change log v3->v4:
-----------------
* Restored patches preventing root user from changing ownership of
  APQNs from zcrypt drivers to the vfio_ap driver if the APQN is
  assigned to an mdev.

* No longer enforcing requirement restricting guest access to
  queues represented by a queue device bound to the vfio_ap
  device driver.

* Removed shadow CRYCB and now directly updating the guest CRYCB
  from the matrix mdev's matrix.

* Rebased the patch series on top of 'vfio: ap: AP Queue Interrupt
  Control' patches.

* Disabled bind/unbind sysfs interfaces for vfio_ap driver

Change log v2->v3:
-----------------
* Allow guest access to an AP queue only if the queue is bound to
  the vfio_ap device driver.

* Removed the patch to test CRYCB masks before taking the vCPUs
  out of SIE. Now checking the shadow CRYCB in the vfio_ap driver.

Change log v1->v2:
-----------------
* Removed patches preventing root user from unbinding AP queues from 
  the vfio_ap device driver
* Introduced a shadow CRYCB in the vfio_ap driver to manage dynamic 
  changes to the AP guest configuration due to root user interventions
  or hardware anomalies.

Tony Krowiak (15):
  s390/vfio-ap: clean up vfio_ap resources when KVM pointer invalidated
  s390/vfio-ap: No need to disable IRQ after queue reset
  s390/vfio-ap: move probe and remove callbacks to vfio_ap_ops.c
  s390/vfio-ap: use new AP bus interface to search for queue devices
  s390/vfio-ap: manage link between queue struct and matrix mdev
  s390/vfio-ap: allow assignment of unavailable AP queues to mdev device
  s390/vfio-ap: introduce shadow APCB
  s390/vfio-ap: sysfs attribute to display the guest's matrix
  s390/vfio-ap: allow hot plug/unplug of AP resources using mdev device
  s390/zcrypt: driver callback to indicate resource in use
  s390/vfio-ap: implement in-use callback for vfio_ap driver
  s390/zcrypt: Notify driver on config changed and scan complete
    callbacks
  s390/vfio-ap: handle host AP config change notification
  s390/vfio-ap: handle AP bus scan completed notification
  s390/vfio-ap: update docs to include dynamic config support

 Documentation/s390/vfio-ap.rst        | 383 ++++++++---
 drivers/s390/crypto/ap_bus.c          | 251 +++++++-
 drivers/s390/crypto/ap_bus.h          |  16 +
 drivers/s390/crypto/vfio_ap_drv.c     |  50 +-
 drivers/s390/crypto/vfio_ap_ops.c     | 891 +++++++++++++++++---------
 drivers/s390/crypto/vfio_ap_private.h |  29 +-
 6 files changed, 1170 insertions(+), 450 deletions(-)

-- 
2.21.1



* [PATCH v13 01/15] s390/vfio-ap: clean up vfio_ap resources when KVM pointer invalidated
  2020-12-23  1:15 [PATCH v13 00/15] s390/vfio-ap: dynamic configuration support Tony Krowiak
@ 2020-12-23  1:15 ` Tony Krowiak
  2020-12-23  1:15 ` [PATCH v13 02/15] s390/vfio-ap: No need to disable IRQ after queue reset Tony Krowiak
                   ` (14 subsequent siblings)
  15 siblings, 0 replies; 48+ messages in thread
From: Tony Krowiak @ 2020-12-23  1:15 UTC (permalink / raw)
  To: linux-s390, linux-kernel, kvm
  Cc: freude, borntraeger, cohuck, mjrosato, pasic, alex.williamson,
	kwankhede, fiuczy, frankja, david, hca, gor, Tony Krowiak,
	stable

The vfio_ap device driver registers a group notifier with VFIO when the
file descriptor for a VFIO mediated device for a KVM guest is opened to
receive notification that the KVM pointer is set (VFIO_GROUP_NOTIFY_SET_KVM
event). When the KVM pointer is set, the vfio_ap driver takes the
following actions:
1. Stashes the KVM pointer in the vfio_ap_mdev struct that holds the state
   of the mediated device.
2. Calls the kvm_get_kvm() function to increment its reference counter.
3. Sets the function pointer to the function that handles interception of
   the instruction that enables/disables interrupt processing.
4. Sets the masks in the KVM guest's CRYCB to pass AP resources through to
   the guest.

In order to avoid memory leaks, when the notifier is called to receive
notification that the KVM pointer has been set to NULL, the vfio_ap device
driver should reverse the actions taken when the KVM pointer was set.

Fixes: 258287c994de ("s390: vfio-ap: implement mediated device open callback")
Cc: stable@vger.kernel.org
Signed-off-by: Tony Krowiak <akrowiak@linux.ibm.com>
Reviewed-by: Halil Pasic <pasic@linux.ibm.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
---
 drivers/s390/crypto/vfio_ap_ops.c | 49 ++++++++++++++++++-------------
 1 file changed, 28 insertions(+), 21 deletions(-)

diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c
index e0bde8518745..7339043906cf 100644
--- a/drivers/s390/crypto/vfio_ap_ops.c
+++ b/drivers/s390/crypto/vfio_ap_ops.c
@@ -1037,19 +1037,14 @@ static int vfio_ap_mdev_set_kvm(struct ap_matrix_mdev *matrix_mdev,
 {
 	struct ap_matrix_mdev *m;
 
-	mutex_lock(&matrix_dev->lock);
-
 	list_for_each_entry(m, &matrix_dev->mdev_list, node) {
-		if ((m != matrix_mdev) && (m->kvm == kvm)) {
-			mutex_unlock(&matrix_dev->lock);
+		if ((m != matrix_mdev) && (m->kvm == kvm))
 			return -EPERM;
-		}
 	}
 
 	matrix_mdev->kvm = kvm;
 	kvm_get_kvm(kvm);
 	kvm->arch.crypto.pqap_hook = &matrix_mdev->pqap_hook;
-	mutex_unlock(&matrix_dev->lock);
 
 	return 0;
 }
@@ -1083,35 +1078,52 @@ static int vfio_ap_mdev_iommu_notifier(struct notifier_block *nb,
 	return NOTIFY_DONE;
 }
 
+static void vfio_ap_mdev_unset_kvm(struct ap_matrix_mdev *matrix_mdev)
+{
+	kvm_arch_crypto_clear_masks(matrix_mdev->kvm);
+	matrix_mdev->kvm->arch.crypto.pqap_hook = NULL;
+	vfio_ap_mdev_reset_queues(matrix_mdev->mdev);
+	kvm_put_kvm(matrix_mdev->kvm);
+	matrix_mdev->kvm = NULL;
+}
+
 static int vfio_ap_mdev_group_notifier(struct notifier_block *nb,
 				       unsigned long action, void *data)
 {
-	int ret;
+	int ret, notify_rc = NOTIFY_OK;
 	struct ap_matrix_mdev *matrix_mdev;
 
 	if (action != VFIO_GROUP_NOTIFY_SET_KVM)
 		return NOTIFY_OK;
 
 	matrix_mdev = container_of(nb, struct ap_matrix_mdev, group_notifier);
+	mutex_lock(&matrix_dev->lock);
 
 	if (!data) {
-		matrix_mdev->kvm = NULL;
-		return NOTIFY_OK;
+		if (matrix_mdev->kvm)
+			vfio_ap_mdev_unset_kvm(matrix_mdev);
+		goto notify_done;
 	}
 
 	ret = vfio_ap_mdev_set_kvm(matrix_mdev, data);
-	if (ret)
-		return NOTIFY_DONE;
+	if (ret) {
+		notify_rc = NOTIFY_DONE;
+		goto notify_done;
+	}
 
 	/* If there is no CRYCB pointer, then we can't copy the masks */
-	if (!matrix_mdev->kvm->arch.crypto.crycbd)
-		return NOTIFY_DONE;
+	if (!matrix_mdev->kvm->arch.crypto.crycbd) {
+		notify_rc = NOTIFY_DONE;
+		goto notify_done;
+	}
 
 	kvm_arch_crypto_set_masks(matrix_mdev->kvm, matrix_mdev->matrix.apm,
 				  matrix_mdev->matrix.aqm,
 				  matrix_mdev->matrix.adm);
 
-	return NOTIFY_OK;
+notify_done:
+	mutex_unlock(&matrix_dev->lock);
+	return notify_rc;
 }
 
 static void vfio_ap_irq_disable_apqn(int apqn)
@@ -1222,13 +1234,8 @@ static void vfio_ap_mdev_release(struct mdev_device *mdev)
 	struct ap_matrix_mdev *matrix_mdev = mdev_get_drvdata(mdev);
 
 	mutex_lock(&matrix_dev->lock);
-	if (matrix_mdev->kvm) {
-		kvm_arch_crypto_clear_masks(matrix_mdev->kvm);
-		matrix_mdev->kvm->arch.crypto.pqap_hook = NULL;
-		vfio_ap_mdev_reset_queues(mdev);
-		kvm_put_kvm(matrix_mdev->kvm);
-		matrix_mdev->kvm = NULL;
-	}
+	if (matrix_mdev->kvm)
+		vfio_ap_mdev_unset_kvm(matrix_mdev);
 	mutex_unlock(&matrix_dev->lock);
 
 	vfio_unregister_notifier(mdev_dev(mdev), VFIO_IOMMU_NOTIFY,
-- 
2.21.1



* [PATCH v13 02/15] s390/vfio-ap: No need to disable IRQ after queue reset
  2020-12-23  1:15 [PATCH v13 00/15] s390/vfio-ap: dynamic configuration support Tony Krowiak
  2020-12-23  1:15 ` [PATCH v13 01/15] s390/vfio-ap: clean up vfio_ap resources when KVM pointer invalidated Tony Krowiak
@ 2020-12-23  1:15 ` Tony Krowiak
  2021-01-11 16:32   ` Halil Pasic
  2020-12-23  1:15 ` [PATCH v13 03/15] s390/vfio-ap: move probe and remove callbacks to vfio_ap_ops.c Tony Krowiak
                   ` (13 subsequent siblings)
  15 siblings, 1 reply; 48+ messages in thread
From: Tony Krowiak @ 2020-12-23  1:15 UTC (permalink / raw)
  To: linux-s390, linux-kernel, kvm
  Cc: freude, borntraeger, cohuck, mjrosato, pasic, alex.williamson,
	kwankhede, fiuczy, frankja, david, hca, gor, Tony Krowiak

The queues assigned to a matrix mediated device are currently reset when:

* The VFIO_DEVICE_RESET ioctl is invoked
* The mdev fd is closed by userspace (QEMU)
* The mdev is removed from sysfs.

Immediately after the reset of a queue, a call is made to disable
interrupts for the queue. This is entirely unnecessary because the reset of
a queue disables interrupts, so the call is removed.

Signed-off-by: Tony Krowiak <akrowiak@linux.ibm.com>
---
 drivers/s390/crypto/vfio_ap_drv.c     |  1 -
 drivers/s390/crypto/vfio_ap_ops.c     | 40 +++++++++++++++++----------
 drivers/s390/crypto/vfio_ap_private.h |  1 -
 3 files changed, 26 insertions(+), 16 deletions(-)

diff --git a/drivers/s390/crypto/vfio_ap_drv.c b/drivers/s390/crypto/vfio_ap_drv.c
index be2520cc010b..ca18c91afec9 100644
--- a/drivers/s390/crypto/vfio_ap_drv.c
+++ b/drivers/s390/crypto/vfio_ap_drv.c
@@ -79,7 +79,6 @@ static void vfio_ap_queue_dev_remove(struct ap_device *apdev)
 	apid = AP_QID_CARD(q->apqn);
 	apqi = AP_QID_QUEUE(q->apqn);
 	vfio_ap_mdev_reset_queue(apid, apqi, 1);
-	vfio_ap_irq_disable(q);
 	kfree(q);
 	mutex_unlock(&matrix_dev->lock);
 }
diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c
index 7339043906cf..052f61391ec7 100644
--- a/drivers/s390/crypto/vfio_ap_ops.c
+++ b/drivers/s390/crypto/vfio_ap_ops.c
@@ -25,6 +25,7 @@
 #define VFIO_AP_MDEV_NAME_HWVIRT "VFIO AP Passthrough Device"
 
 static int vfio_ap_mdev_reset_queues(struct mdev_device *mdev);
+static struct vfio_ap_queue *vfio_ap_find_queue(int apqn);
 
 static int match_apqn(struct device *dev, const void *data)
 {
@@ -49,20 +50,15 @@ static struct vfio_ap_queue *vfio_ap_get_queue(
 					int apqn)
 {
 	struct vfio_ap_queue *q;
-	struct device *dev;
 
 	if (!test_bit_inv(AP_QID_CARD(apqn), matrix_mdev->matrix.apm))
 		return NULL;
 	if (!test_bit_inv(AP_QID_QUEUE(apqn), matrix_mdev->matrix.aqm))
 		return NULL;
 
-	dev = driver_find_device(&matrix_dev->vfio_ap_drv->driver, NULL,
-				 &apqn, match_apqn);
-	if (!dev)
-		return NULL;
-	q = dev_get_drvdata(dev);
-	q->matrix_mdev = matrix_mdev;
-	put_device(dev);
+	q = vfio_ap_find_queue(apqn);
+	if (q)
+		q->matrix_mdev = matrix_mdev;
 
 	return q;
 }
@@ -1126,24 +1122,27 @@ static int vfio_ap_mdev_group_notifier(struct notifier_block *nb,
 	return notify_rc;
 }
 
-static void vfio_ap_irq_disable_apqn(int apqn)
+static struct vfio_ap_queue *vfio_ap_find_queue(int apqn)
 {
 	struct device *dev;
-	struct vfio_ap_queue *q;
+	struct vfio_ap_queue *q = NULL;
 
 	dev = driver_find_device(&matrix_dev->vfio_ap_drv->driver, NULL,
 				 &apqn, match_apqn);
 	if (dev) {
 		q = dev_get_drvdata(dev);
-		vfio_ap_irq_disable(q);
 		put_device(dev);
 	}
+
+	return q;
 }
 
 int vfio_ap_mdev_reset_queue(unsigned int apid, unsigned int apqi,
 			     unsigned int retry)
 {
 	struct ap_queue_status status;
+	struct vfio_ap_queue *q;
+	int ret;
 	int retry2 = 2;
 	int apqn = AP_MKQID(apid, apqi);
 
@@ -1156,18 +1155,32 @@ int vfio_ap_mdev_reset_queue(unsigned int apid, unsigned int apqi,
 				status = ap_tapq(apqn, NULL);
 			}
 			WARN_ON_ONCE(retry2 <= 0);
-			return 0;
+			ret = 0;
+			goto free_aqic_resources;
 		case AP_RESPONSE_RESET_IN_PROGRESS:
 		case AP_RESPONSE_BUSY:
 			msleep(20);
 			break;
 		default:
 			/* things are really broken, give up */
-			return -EIO;
+			ret = -EIO;
+			goto free_aqic_resources;
 		}
 	} while (retry--);
 
 	return -EBUSY;
+
+free_aqic_resources:
+	/*
+	 * In order to free the aqic resources, the queue must be linked to
+	 * the matrix_mdev to which its APQN is assigned and the KVM pointer
+	 * must be available.
+	 */
+	q = vfio_ap_find_queue(apqn);
+	if (q && q->matrix_mdev && q->matrix_mdev->kvm)
+		vfio_ap_free_aqic_resources(q);
+
+	return ret;
 }
 
 static int vfio_ap_mdev_reset_queues(struct mdev_device *mdev)
@@ -1189,7 +1202,6 @@ static int vfio_ap_mdev_reset_queues(struct mdev_device *mdev)
 			 */
 			if (ret)
 				rc = ret;
-			vfio_ap_irq_disable_apqn(AP_MKQID(apid, apqi));
 		}
 	}
 
diff --git a/drivers/s390/crypto/vfio_ap_private.h b/drivers/s390/crypto/vfio_ap_private.h
index f46dde56b464..0db6fb3d56d5 100644
--- a/drivers/s390/crypto/vfio_ap_private.h
+++ b/drivers/s390/crypto/vfio_ap_private.h
@@ -100,5 +100,4 @@ struct vfio_ap_queue {
 #define VFIO_AP_ISC_INVALID 0xff
 	unsigned char saved_isc;
 };
-struct ap_queue_status vfio_ap_irq_disable(struct vfio_ap_queue *q);
 #endif /* _VFIO_AP_PRIVATE_H_ */
-- 
2.21.1



* [PATCH v13 03/15] s390/vfio-ap: move probe and remove callbacks to vfio_ap_ops.c
  2020-12-23  1:15 [PATCH v13 00/15] s390/vfio-ap: dynamic configuration support Tony Krowiak
  2020-12-23  1:15 ` [PATCH v13 01/15] s390/vfio-ap: clean up vfio_ap resources when KVM pointer invalidated Tony Krowiak
  2020-12-23  1:15 ` [PATCH v13 02/15] s390/vfio-ap: No need to disable IRQ after queue reset Tony Krowiak
@ 2020-12-23  1:15 ` Tony Krowiak
  2020-12-23  1:15 ` [PATCH v13 04/15] s390/vfio-ap: use new AP bus interface to search for queue devices Tony Krowiak
                   ` (12 subsequent siblings)
  15 siblings, 0 replies; 48+ messages in thread
From: Tony Krowiak @ 2020-12-23  1:15 UTC (permalink / raw)
  To: linux-s390, linux-kernel, kvm
  Cc: freude, borntraeger, cohuck, mjrosato, pasic, alex.williamson,
	kwankhede, fiuczy, frankja, david, hca, gor, Tony Krowiak

Let's move the probe and remove callbacks into the vfio_ap_ops.c
file to keep all code related to managing queues in a single file. This
way, all functions related to queue management can be removed from the
vfio_ap_private.h header file defining the public interfaces for the
vfio_ap device driver.

Signed-off-by: Tony Krowiak <akrowiak@linux.ibm.com>
Reviewed-by: Halil Pasic <pasic@linux.ibm.com>
---
 drivers/s390/crypto/vfio_ap_drv.c     | 44 ++-------------------------
 drivers/s390/crypto/vfio_ap_ops.c     | 34 +++++++++++++++++++--
 drivers/s390/crypto/vfio_ap_private.h |  6 ++--
 3 files changed, 37 insertions(+), 47 deletions(-)

diff --git a/drivers/s390/crypto/vfio_ap_drv.c b/drivers/s390/crypto/vfio_ap_drv.c
index ca18c91afec9..73bd073fd5d3 100644
--- a/drivers/s390/crypto/vfio_ap_drv.c
+++ b/drivers/s390/crypto/vfio_ap_drv.c
@@ -43,46 +43,6 @@ static struct ap_device_id ap_queue_ids[] = {
 
 MODULE_DEVICE_TABLE(vfio_ap, ap_queue_ids);
 
-/**
- * vfio_ap_queue_dev_probe:
- *
- * Allocate a vfio_ap_queue structure and associate it
- * with the device as driver_data.
- */
-static int vfio_ap_queue_dev_probe(struct ap_device *apdev)
-{
-	struct vfio_ap_queue *q;
-
-	q = kzalloc(sizeof(*q), GFP_KERNEL);
-	if (!q)
-		return -ENOMEM;
-	dev_set_drvdata(&apdev->device, q);
-	q->apqn = to_ap_queue(&apdev->device)->qid;
-	q->saved_isc = VFIO_AP_ISC_INVALID;
-	return 0;
-}
-
-/**
- * vfio_ap_queue_dev_remove:
- *
- * Takes the matrix lock to avoid actions on this device while removing
- * Free the associated vfio_ap_queue structure
- */
-static void vfio_ap_queue_dev_remove(struct ap_device *apdev)
-{
-	struct vfio_ap_queue *q;
-	int apid, apqi;
-
-	mutex_lock(&matrix_dev->lock);
-	q = dev_get_drvdata(&apdev->device);
-	dev_set_drvdata(&apdev->device, NULL);
-	apid = AP_QID_CARD(q->apqn);
-	apqi = AP_QID_QUEUE(q->apqn);
-	vfio_ap_mdev_reset_queue(apid, apqi, 1);
-	kfree(q);
-	mutex_unlock(&matrix_dev->lock);
-}
-
 static void vfio_ap_matrix_dev_release(struct device *dev)
 {
 	struct ap_matrix_dev *matrix_dev = dev_get_drvdata(dev);
@@ -185,8 +145,8 @@ static int __init vfio_ap_init(void)
 		return ret;
 
 	memset(&vfio_ap_drv, 0, sizeof(vfio_ap_drv));
-	vfio_ap_drv.probe = vfio_ap_queue_dev_probe;
-	vfio_ap_drv.remove = vfio_ap_queue_dev_remove;
+	vfio_ap_drv.probe = vfio_ap_mdev_probe_queue;
+	vfio_ap_drv.remove = vfio_ap_mdev_remove_queue;
 	vfio_ap_drv.ids = ap_queue_ids;
 
 	ret = ap_driver_register(&vfio_ap_drv, THIS_MODULE, VFIO_AP_DRV_NAME);
diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c
index 052f61391ec7..a83d6e75361b 100644
--- a/drivers/s390/crypto/vfio_ap_ops.c
+++ b/drivers/s390/crypto/vfio_ap_ops.c
@@ -140,7 +140,7 @@ static void vfio_ap_free_aqic_resources(struct vfio_ap_queue *q)
  * Returns if ap_aqic function failed with invalid, deconfigured or
  * checkstopped AP.
  */
-struct ap_queue_status vfio_ap_irq_disable(struct vfio_ap_queue *q)
+static struct ap_queue_status vfio_ap_irq_disable(struct vfio_ap_queue *q)
 {
 	struct ap_qirq_ctrl aqic_gisa = {};
 	struct ap_queue_status status;
@@ -1137,8 +1137,8 @@ static struct vfio_ap_queue *vfio_ap_find_queue(int apqn)
 	return q;
 }
 
-int vfio_ap_mdev_reset_queue(unsigned int apid, unsigned int apqi,
-			     unsigned int retry)
+static int vfio_ap_mdev_reset_queue(unsigned int apid, unsigned int apqi,
+				    unsigned int retry)
 {
 	struct ap_queue_status status;
 	struct vfio_ap_queue *q;
@@ -1321,3 +1321,31 @@ void vfio_ap_mdev_unregister(void)
 {
 	mdev_unregister_device(&matrix_dev->device);
 }
+
+int vfio_ap_mdev_probe_queue(struct ap_device *apdev)
+{
+	struct vfio_ap_queue *q;
+
+	q = kzalloc(sizeof(*q), GFP_KERNEL);
+	if (!q)
+		return -ENOMEM;
+	dev_set_drvdata(&apdev->device, q);
+	q->apqn = to_ap_queue(&apdev->device)->qid;
+	q->saved_isc = VFIO_AP_ISC_INVALID;
+	return 0;
+}
+
+void vfio_ap_mdev_remove_queue(struct ap_device *apdev)
+{
+	struct vfio_ap_queue *q;
+	int apid, apqi;
+
+	mutex_lock(&matrix_dev->lock);
+	q = dev_get_drvdata(&apdev->device);
+	dev_set_drvdata(&apdev->device, NULL);
+	apid = AP_QID_CARD(q->apqn);
+	apqi = AP_QID_QUEUE(q->apqn);
+	vfio_ap_mdev_reset_queue(apid, apqi, 1);
+	kfree(q);
+	mutex_unlock(&matrix_dev->lock);
+}
diff --git a/drivers/s390/crypto/vfio_ap_private.h b/drivers/s390/crypto/vfio_ap_private.h
index 0db6fb3d56d5..d9003de4fbad 100644
--- a/drivers/s390/crypto/vfio_ap_private.h
+++ b/drivers/s390/crypto/vfio_ap_private.h
@@ -90,8 +90,6 @@ struct ap_matrix_mdev {
 
 extern int vfio_ap_mdev_register(void);
 extern void vfio_ap_mdev_unregister(void);
-int vfio_ap_mdev_reset_queue(unsigned int apid, unsigned int apqi,
-			     unsigned int retry);
 
 struct vfio_ap_queue {
 	struct ap_matrix_mdev *matrix_mdev;
@@ -100,4 +98,8 @@ struct vfio_ap_queue {
 #define VFIO_AP_ISC_INVALID 0xff
 	unsigned char saved_isc;
 };
+
+int vfio_ap_mdev_probe_queue(struct ap_device *queue);
+void vfio_ap_mdev_remove_queue(struct ap_device *queue);
+
 #endif /* _VFIO_AP_PRIVATE_H_ */
-- 
2.21.1



* [PATCH v13 04/15] s390/vfio-ap: use new AP bus interface to search for queue devices
  2020-12-23  1:15 [PATCH v13 00/15] s390/vfio-ap: dynamic configuration support Tony Krowiak
                   ` (2 preceding siblings ...)
  2020-12-23  1:15 ` [PATCH v13 03/15] s390/vfio-ap: move probe and remove callbacks to vfio_ap_ops.c Tony Krowiak
@ 2020-12-23  1:15 ` Tony Krowiak
  2020-12-23  1:15 ` [PATCH v13 05/15] s390/vfio-ap: manage link between queue struct and matrix mdev Tony Krowiak
                   ` (11 subsequent siblings)
  15 siblings, 0 replies; 48+ messages in thread
From: Tony Krowiak @ 2020-12-23  1:15 UTC (permalink / raw)
  To: linux-s390, linux-kernel, kvm
  Cc: freude, borntraeger, cohuck, mjrosato, pasic, alex.williamson,
	kwankhede, fiuczy, frankja, david, hca, gor, Tony Krowiak

This patch refactors the vfio_ap device driver to use the AP bus's
ap_get_qdev() function to retrieve the vfio_ap_queue struct containing
information about a queue that is bound to the vfio_ap device driver.
The bus's ap_get_qdev() function retrieves the queue device from a
hashtable keyed by APQN. This is several orders of magnitude more
efficient than looping over the list of devices attached to the AP bus.

Signed-off-by: Tony Krowiak <akrowiak@linux.ibm.com>
Reviewed-by: Halil Pasic <pasic@linux.ibm.com>
---
 drivers/s390/crypto/vfio_ap_ops.c | 25 ++++++++++---------------
 1 file changed, 10 insertions(+), 15 deletions(-)

diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c
index a83d6e75361b..835c963ae16d 100644
--- a/drivers/s390/crypto/vfio_ap_ops.c
+++ b/drivers/s390/crypto/vfio_ap_ops.c
@@ -27,13 +27,6 @@
 static int vfio_ap_mdev_reset_queues(struct mdev_device *mdev);
 static struct vfio_ap_queue *vfio_ap_find_queue(int apqn);
 
-static int match_apqn(struct device *dev, const void *data)
-{
-	struct vfio_ap_queue *q = dev_get_drvdata(dev);
-
-	return (q->apqn == *(int *)(data)) ? 1 : 0;
-}
-
 /**
  * vfio_ap_get_queue: Retrieve a queue with a specific APQN from a list
  * @matrix_mdev: the associated mediated matrix
@@ -49,7 +42,7 @@ static struct vfio_ap_queue *vfio_ap_get_queue(
 					struct ap_matrix_mdev *matrix_mdev,
 					int apqn)
 {
-	struct vfio_ap_queue *q;
+	struct vfio_ap_queue *q = NULL;
 
 	if (!test_bit_inv(AP_QID_CARD(apqn), matrix_mdev->matrix.apm))
 		return NULL;
@@ -1124,15 +1117,17 @@ static int vfio_ap_mdev_group_notifier(struct notifier_block *nb,
 
 static struct vfio_ap_queue *vfio_ap_find_queue(int apqn)
 {
-	struct device *dev;
+	struct ap_queue *queue;
 	struct vfio_ap_queue *q = NULL;
 
-	dev = driver_find_device(&matrix_dev->vfio_ap_drv->driver, NULL,
-				 &apqn, match_apqn);
-	if (dev) {
-		q = dev_get_drvdata(dev);
-		put_device(dev);
-	}
+	queue = ap_get_qdev(apqn);
+	if (!queue)
+		return NULL;
+
+	if (queue->ap_dev.device.driver == &matrix_dev->vfio_ap_drv->driver)
+		q = dev_get_drvdata(&queue->ap_dev.device);
+
+	put_device(&queue->ap_dev.device);
 
 	return q;
 }
-- 
2.21.1



* [PATCH v13 05/15] s390/vfio-ap: manage link between queue struct and matrix mdev
  2020-12-23  1:15 [PATCH v13 00/15] s390/vfio-ap: dynamic configuration support Tony Krowiak
                   ` (3 preceding siblings ...)
  2020-12-23  1:15 ` [PATCH v13 04/15] s390/vfio-ap: use new AP bus interface to search for queue devices Tony Krowiak
@ 2020-12-23  1:15 ` Tony Krowiak
  2021-01-11 19:17   ` Halil Pasic
  2020-12-23  1:15 ` [PATCH v13 06/15] s390/vfio-ap: allow assignment of unavailable AP queues to mdev device Tony Krowiak
                   ` (10 subsequent siblings)
  15 siblings, 1 reply; 48+ messages in thread
From: Tony Krowiak @ 2020-12-23  1:15 UTC (permalink / raw)
  To: linux-s390, linux-kernel, kvm
  Cc: freude, borntraeger, cohuck, mjrosato, pasic, alex.williamson,
	kwankhede, fiuczy, frankja, david, hca, gor, Tony Krowiak

Let's create links between each queue device bound to the vfio_ap device
driver and the matrix mdev to which the queue's APQN is assigned. The idea
is to facilitate efficient retrieval of the objects representing the queue
devices and matrix mdevs as well as to verify that a queue assigned to
a matrix mdev is bound to the driver.

The links will be created as follows:

   * When the queue device is probed, if its APQN is assigned to a matrix
     mdev, the structures representing the queue device and the matrix mdev
     will be linked.

   * When an adapter or domain is assigned to a matrix mdev, for each new
     APQN assigned that references a queue device bound to the vfio_ap
     device driver, the structures representing the queue device and the
     matrix mdev will be linked.

The links will be removed as follows:

   * When the queue device is removed, if its APQN is assigned to a matrix
     mdev, the structures representing the queue device and the matrix mdev
     will be unlinked.

   * When an adapter or domain is unassigned from a matrix mdev, for each
     APQN unassigned that references a queue device bound to the vfio_ap
     device driver, the structures representing the queue device and the
     matrix mdev will be unlinked.
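
The probe-time rule above can be illustrated with a small userspace
sketch. It is a simplified model only: struct demo_mdev, struct
demo_queue and the demo_* helpers are hypothetical, plain bool arrays
stand in for the kernel's inverted bitmaps, and a counter stands in for
the per-mdev hashtable.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define DEMO_MAX_ID 256

/* Hypothetical model of a matrix mdev: assigned adapters and domains. */
struct demo_mdev {
	bool apm[DEMO_MAX_ID];	/* APIDs assigned to this mdev */
	bool aqm[DEMO_MAX_ID];	/* APQIs assigned to this mdev */
	int nlinked;		/* number of queues currently linked */
};

/* Hypothetical model of a queue bound to the driver. */
struct demo_queue {
	int apid;
	int apqi;
	struct demo_mdev *mdev;	/* NULL while the queue is unlinked */
};

static void demo_link(struct demo_mdev *m, struct demo_queue *q)
{
	assert(q->mdev == NULL);
	q->mdev = m;
	m->nlinked++;
}

static void demo_unlink(struct demo_queue *q)
{
	if (!q->mdev)
		return;
	q->mdev->nlinked--;
	q->mdev = NULL;
}

/* On probe: link the queue to the mdev its APQN is assigned to, if any. */
static void demo_probe(struct demo_queue *q, struct demo_mdev *mdevs, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		if (mdevs[i].apm[q->apid] && mdevs[i].aqm[q->apqi]) {
			demo_link(&mdevs[i], q);
			break;
		}
	}
}
```

Because queue sharing between mdevs is not allowed, the search can stop
at the first mdev whose masks contain both the APID and the APQI.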

Signed-off-by: Tony Krowiak <akrowiak@linux.ibm.com>
---
 drivers/s390/crypto/vfio_ap_ops.c     | 140 +++++++++++++++++++++-----
 drivers/s390/crypto/vfio_ap_private.h |   3 +
 2 files changed, 117 insertions(+), 26 deletions(-)

diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c
index 835c963ae16d..cdcc6378b4a5 100644
--- a/drivers/s390/crypto/vfio_ap_ops.c
+++ b/drivers/s390/crypto/vfio_ap_ops.c
@@ -27,33 +27,17 @@
 static int vfio_ap_mdev_reset_queues(struct mdev_device *mdev);
 static struct vfio_ap_queue *vfio_ap_find_queue(int apqn);
 
-/**
- * vfio_ap_get_queue: Retrieve a queue with a specific APQN from a list
- * @matrix_mdev: the associated mediated matrix
- * @apqn: The queue APQN
- *
- * Retrieve a queue with a specific APQN from the list of the
- * devices of the vfio_ap_drv.
- * Verify that the APID and the APQI are set in the matrix.
- *
- * Returns the pointer to the associated vfio_ap_queue
- */
-static struct vfio_ap_queue *vfio_ap_get_queue(
-					struct ap_matrix_mdev *matrix_mdev,
-					int apqn)
+static struct vfio_ap_queue *
+vfio_ap_mdev_get_queue(struct ap_matrix_mdev *matrix_mdev, unsigned long apqn)
 {
-	struct vfio_ap_queue *q = NULL;
-
-	if (!test_bit_inv(AP_QID_CARD(apqn), matrix_mdev->matrix.apm))
-		return NULL;
-	if (!test_bit_inv(AP_QID_QUEUE(apqn), matrix_mdev->matrix.aqm))
-		return NULL;
+	struct vfio_ap_queue *q;
 
-	q = vfio_ap_find_queue(apqn);
-	if (q)
-		q->matrix_mdev = matrix_mdev;
+	hash_for_each_possible(matrix_mdev->qtable, q, mdev_qnode, apqn) {
+		if (q && (q->apqn == apqn))
+			return q;
+	}
 
-	return q;
+	return NULL;
 }
 
 /**
@@ -166,7 +150,6 @@ static struct ap_queue_status vfio_ap_irq_disable(struct vfio_ap_queue *q)
 		  status.response_code);
 end_free:
 	vfio_ap_free_aqic_resources(q);
-	q->matrix_mdev = NULL;
 	return status;
 }
 
@@ -282,7 +265,7 @@ static int handle_pqap(struct kvm_vcpu *vcpu)
 	matrix_mdev = container_of(vcpu->kvm->arch.crypto.pqap_hook,
 				   struct ap_matrix_mdev, pqap_hook);
 
-	q = vfio_ap_get_queue(matrix_mdev, apqn);
+	q = vfio_ap_mdev_get_queue(matrix_mdev, apqn);
 	if (!q)
 		goto out_unlock;
 
@@ -325,6 +308,7 @@ static int vfio_ap_mdev_create(struct kobject *kobj, struct mdev_device *mdev)
 
 	matrix_mdev->mdev = mdev;
 	vfio_ap_matrix_init(&matrix_dev->info, &matrix_mdev->matrix);
+	hash_init(matrix_mdev->qtable);
 	mdev_set_drvdata(mdev, matrix_mdev);
 	matrix_mdev->pqap_hook.hook = handle_pqap;
 	matrix_mdev->pqap_hook.owner = THIS_MODULE;
@@ -553,6 +537,50 @@ static int vfio_ap_mdev_verify_no_sharing(struct ap_matrix_mdev *matrix_mdev)
 	return 0;
 }
 
+static void vfio_ap_mdev_link_queue(struct ap_matrix_mdev *matrix_mdev,
+				    struct vfio_ap_queue *q)
+{
+	if (q) {
+		q->matrix_mdev = matrix_mdev;
+		hash_add(matrix_mdev->qtable,
+			 &q->mdev_qnode, q->apqn);
+	}
+}
+
+static void vfio_ap_mdev_link_apqn(struct ap_matrix_mdev *matrix_mdev, int apqn)
+{
+	struct vfio_ap_queue *q;
+
+	q = vfio_ap_find_queue(apqn);
+	vfio_ap_mdev_link_queue(matrix_mdev, q);
+}
+
+static void vfio_ap_mdev_unlink_queue(struct vfio_ap_queue *q)
+{
+	if (q) {
+		q->matrix_mdev = NULL;
+		hash_del(&q->mdev_qnode);
+	}
+}
+
+static void vfio_ap_mdev_unlink_apqn(int apqn)
+{
+	struct vfio_ap_queue *q;
+
+	q = vfio_ap_find_queue(apqn);
+	vfio_ap_mdev_unlink_queue(q);
+}
+
+static void vfio_ap_mdev_link_adapter(struct ap_matrix_mdev *matrix_mdev,
+				      unsigned long apid)
+{
+	unsigned long apqi;
+
+	for_each_set_bit_inv(apqi, matrix_mdev->matrix.aqm, AP_DOMAINS)
+		vfio_ap_mdev_link_apqn(matrix_mdev,
+				       AP_MKQID(apid, apqi));
+}
+
 /**
  * assign_adapter_store
  *
@@ -622,6 +650,7 @@ static ssize_t assign_adapter_store(struct device *dev,
 	if (ret)
 		goto share_err;
 
+	vfio_ap_mdev_link_adapter(matrix_mdev, apid);
 	ret = count;
 	goto done;
 
@@ -634,6 +663,15 @@ static ssize_t assign_adapter_store(struct device *dev,
 }
 static DEVICE_ATTR_WO(assign_adapter);
 
+static void vfio_ap_mdev_unlink_adapter(struct ap_matrix_mdev *matrix_mdev,
+					unsigned long apid)
+{
+	unsigned long apqi;
+
+	for_each_set_bit_inv(apqi, matrix_mdev->matrix.aqm, AP_DOMAINS)
+		vfio_ap_mdev_unlink_apqn(AP_MKQID(apid, apqi));
+}
+
 /**
  * unassign_adapter_store
  *
@@ -673,6 +711,7 @@ static ssize_t unassign_adapter_store(struct device *dev,
 
 	mutex_lock(&matrix_dev->lock);
 	clear_bit_inv((unsigned long)apid, matrix_mdev->matrix.apm);
+	vfio_ap_mdev_unlink_adapter(matrix_mdev, apid);
 	mutex_unlock(&matrix_dev->lock);
 
 	return count;
@@ -699,6 +738,15 @@ vfio_ap_mdev_verify_queues_reserved_for_apqi(struct ap_matrix_mdev *matrix_mdev,
 	return 0;
 }
 
+static void vfio_ap_mdev_link_domain(struct ap_matrix_mdev *matrix_mdev,
+				     unsigned long apqi)
+{
+	unsigned long apid;
+
+	for_each_set_bit_inv(apid, matrix_mdev->matrix.apm, AP_DEVICES)
+		vfio_ap_mdev_link_apqn(matrix_mdev, AP_MKQID(apid, apqi));
+}
+
 /**
  * assign_domain_store
  *
@@ -763,6 +811,7 @@ static ssize_t assign_domain_store(struct device *dev,
 	if (ret)
 		goto share_err;
 
+	vfio_ap_mdev_link_domain(matrix_mdev, apqi);
 	ret = count;
 	goto done;
 
@@ -775,6 +824,14 @@ static ssize_t assign_domain_store(struct device *dev,
 }
 static DEVICE_ATTR_WO(assign_domain);
 
+static void vfio_ap_mdev_unlink_domain(struct ap_matrix_mdev *matrix_mdev,
+				       unsigned long apqi)
+{
+	unsigned long apid;
+
+	for_each_set_bit_inv(apid, matrix_mdev->matrix.apm, AP_DEVICES)
+		vfio_ap_mdev_unlink_apqn(AP_MKQID(apid, apqi));
+}
 
 /**
  * unassign_domain_store
@@ -815,6 +872,7 @@ static ssize_t unassign_domain_store(struct device *dev,
 
 	mutex_lock(&matrix_dev->lock);
 	clear_bit_inv((unsigned long)apqi, matrix_mdev->matrix.aqm);
+	vfio_ap_mdev_unlink_domain(matrix_mdev, apqi);
 	mutex_unlock(&matrix_dev->lock);
 
 	return count;
@@ -1317,6 +1375,28 @@ void vfio_ap_mdev_unregister(void)
 	mdev_unregister_device(&matrix_dev->device);
 }
 
+/*
+ * vfio_ap_queue_link_mdev
+ *
+ * @q: The queue to link with the matrix mdev.
+ *
+ * Links @q with the matrix mdev to which the queue's APQN is assigned.
+ */
+static void vfio_ap_queue_link_mdev(struct vfio_ap_queue *q)
+{
+	unsigned long apid = AP_QID_CARD(q->apqn);
+	unsigned long apqi = AP_QID_QUEUE(q->apqn);
+	struct ap_matrix_mdev *matrix_mdev;
+
+	list_for_each_entry(matrix_mdev, &matrix_dev->mdev_list, node) {
+		if (test_bit_inv(apid, matrix_mdev->matrix.apm) &&
+		    test_bit_inv(apqi, matrix_mdev->matrix.aqm)) {
+			vfio_ap_mdev_link_queue(matrix_mdev, q);
+			break;
+		}
+	}
+}
+
 int vfio_ap_mdev_probe_queue(struct ap_device *apdev)
 {
 	struct vfio_ap_queue *q;
@@ -1324,9 +1404,13 @@ int vfio_ap_mdev_probe_queue(struct ap_device *apdev)
 	q = kzalloc(sizeof(*q), GFP_KERNEL);
 	if (!q)
 		return -ENOMEM;
+	mutex_lock(&matrix_dev->lock);
 	dev_set_drvdata(&apdev->device, q);
 	q->apqn = to_ap_queue(&apdev->device)->qid;
 	q->saved_isc = VFIO_AP_ISC_INVALID;
+	vfio_ap_queue_link_mdev(q);
+	mutex_unlock(&matrix_dev->lock);
+
 	return 0;
 }
 
@@ -1341,6 +1425,10 @@ void vfio_ap_mdev_remove_queue(struct ap_device *apdev)
 	apid = AP_QID_CARD(q->apqn);
 	apqi = AP_QID_QUEUE(q->apqn);
 	vfio_ap_mdev_reset_queue(apid, apqi, 1);
+
+	if (q->matrix_mdev)
+		vfio_ap_mdev_unlink_queue(q);
+
 	kfree(q);
 	mutex_unlock(&matrix_dev->lock);
 }
diff --git a/drivers/s390/crypto/vfio_ap_private.h b/drivers/s390/crypto/vfio_ap_private.h
index d9003de4fbad..4e5cc72fc0db 100644
--- a/drivers/s390/crypto/vfio_ap_private.h
+++ b/drivers/s390/crypto/vfio_ap_private.h
@@ -18,6 +18,7 @@
 #include <linux/delay.h>
 #include <linux/mutex.h>
 #include <linux/kvm_host.h>
+#include <linux/hashtable.h>
 
 #include "ap_bus.h"
 
@@ -86,6 +87,7 @@ struct ap_matrix_mdev {
 	struct kvm *kvm;
 	struct kvm_s390_module_hook pqap_hook;
 	struct mdev_device *mdev;
+	DECLARE_HASHTABLE(qtable, 8);
 };
 
 extern int vfio_ap_mdev_register(void);
@@ -97,6 +99,7 @@ struct vfio_ap_queue {
 	int	apqn;
 #define VFIO_AP_ISC_INVALID 0xff
 	unsigned char saved_isc;
+	struct hlist_node mdev_qnode;
 };
 
 int vfio_ap_mdev_probe_queue(struct ap_device *queue);
-- 
2.21.1



* [PATCH v13 06/15] s390/vfio-ap: allow assignment of unavailable AP queues to mdev device
  2020-12-23  1:15 [PATCH v13 00/15] s390/vfio-ap: dynamic configuration support Tony Krowiak
                   ` (4 preceding siblings ...)
  2020-12-23  1:15 ` [PATCH v13 05/15] s390/vfio-ap: manage link between queue struct and matrix mdev Tony Krowiak
@ 2020-12-23  1:15 ` Tony Krowiak
  2021-01-11 20:40   ` Halil Pasic
  2020-12-23  1:15 ` [PATCH v13 07/15] s390/vfio-ap: introduce shadow APCB Tony Krowiak
                   ` (9 subsequent siblings)
  15 siblings, 1 reply; 48+ messages in thread
From: Tony Krowiak @ 2020-12-23  1:15 UTC (permalink / raw)
  To: linux-s390, linux-kernel, kvm
  Cc: freude, borntraeger, cohuck, mjrosato, pasic, alex.williamson,
	kwankhede, fiuczy, frankja, david, hca, gor, Tony Krowiak

The current implementation does not allow assignment of an AP adapter or
domain to an mdev device unless every APQN resulting from the assignment
references an AP queue device that is bound to the vfio_ap device
driver. This patch allows assignment of AP resources to the matrix mdev as
long as the APQNs resulting from the assignment:
   1. Are not reserved by the AP bus for use by the zcrypt device drivers.
   2. Are not assigned to another matrix mdev.

The rationale behind this is twofold:
   1. The AP architecture does not preclude assignment of APQNs to an AP
      configuration that are not available to the system.
   2. APQNs that do not reference a queue device bound to the vfio_ap
      device driver will not be assigned to the guest's CRYCB, so the
      guest will not get access to queues not bound to the vfio_ap driver.
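
As a rough illustration of the two checks, the following userspace
sketch models each matrix as a pair of 64-bit masks. It is not the
driver code: struct demo_matrix and the demo_* functions are
hypothetical, and treating "reserved for the default drivers" as an
overlap in both the adapter and the domain mask is an assumption about
how the AP bus check behaves.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical matrix: one bit per adapter (apm) and per domain (aqm). */
struct demo_matrix {
	uint64_t apm;
	uint64_t aqm;
};

/*
 * Two matrices have an APQN in common only if they overlap in both
 * the adapter mask and the domain mask.
 */
static int demo_overlap(const struct demo_matrix *a,
			const struct demo_matrix *b)
{
	return (a->apm & b->apm) && (a->aqm & b->aqm);
}

/*
 * Validate a requested matrix against the masks reserved for the default
 * (zcrypt) drivers and against the matrices of all other mdevs.
 */
static int demo_validate(const struct demo_matrix *req,
			 const struct demo_matrix *def_drv,
			 const struct demo_matrix *others, size_t n)
{
	assert(req != NULL && def_drv != NULL);

	if (demo_overlap(req, def_drv))
		return -EADDRNOTAVAIL;

	for (size_t i = 0; i < n; i++)
		if (demo_overlap(req, &others[i]))
			return -EBUSY;

	return 0;
}
```

Note that neither check asks whether a queue device exists for the APQN,
which is the point of this patch.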

Signed-off-by: Tony Krowiak <akrowiak@linux.ibm.com>
---
 drivers/s390/crypto/vfio_ap_ops.c | 241 ++++++++----------------------
 1 file changed, 62 insertions(+), 179 deletions(-)

diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c
index cdcc6378b4a5..2d58b39977be 100644
--- a/drivers/s390/crypto/vfio_ap_ops.c
+++ b/drivers/s390/crypto/vfio_ap_ops.c
@@ -379,134 +379,37 @@ static struct attribute_group *vfio_ap_mdev_type_groups[] = {
 	NULL,
 };
 
-struct vfio_ap_queue_reserved {
-	unsigned long *apid;
-	unsigned long *apqi;
-	bool reserved;
-};
+#define MDEV_SHARING_ERR "Userspace may not re-assign queue %02lx.%04lx " \
+			 "already assigned to %s"
 
-/**
- * vfio_ap_has_queue
- *
- * @dev: an AP queue device
- * @data: a struct vfio_ap_queue_reserved reference
- *
- * Flags whether the AP queue device (@dev) has a queue ID containing the APQN,
- * apid or apqi specified in @data:
- *
- * - If @data contains both an apid and apqi value, then @data will be flagged
- *   as reserved if the APID and APQI fields for the AP queue device matches
- *
- * - If @data contains only an apid value, @data will be flagged as
- *   reserved if the APID field in the AP queue device matches
- *
- * - If @data contains only an apqi value, @data will be flagged as
- *   reserved if the APQI field in the AP queue device matches
- *
- * Returns 0 to indicate the input to function succeeded. Returns -EINVAL if
- * @data does not contain either an apid or apqi.
- */
-static int vfio_ap_has_queue(struct device *dev, void *data)
+static void vfio_ap_mdev_log_sharing_err(const char *mdev_name,
+					 unsigned long *apm,
+					 unsigned long *aqm)
 {
-	struct vfio_ap_queue_reserved *qres = data;
-	struct ap_queue *ap_queue = to_ap_queue(dev);
-	ap_qid_t qid;
-	unsigned long id;
-
-	if (qres->apid && qres->apqi) {
-		qid = AP_MKQID(*qres->apid, *qres->apqi);
-		if (qid == ap_queue->qid)
-			qres->reserved = true;
-	} else if (qres->apid && !qres->apqi) {
-		id = AP_QID_CARD(ap_queue->qid);
-		if (id == *qres->apid)
-			qres->reserved = true;
-	} else if (!qres->apid && qres->apqi) {
-		id = AP_QID_QUEUE(ap_queue->qid);
-		if (id == *qres->apqi)
-			qres->reserved = true;
-	} else {
-		return -EINVAL;
-	}
+	unsigned long apid, apqi;
 
-	return 0;
-}
-
-/**
- * vfio_ap_verify_queue_reserved
- *
- * @matrix_dev: a mediated matrix device
- * @apid: an AP adapter ID
- * @apqi: an AP queue index
- *
- * Verifies that the AP queue with @apid/@apqi is reserved by the VFIO AP device
- * driver according to the following rules:
- *
- * - If both @apid and @apqi are not NULL, then there must be an AP queue
- *   device bound to the vfio_ap driver with the APQN identified by @apid and
- *   @apqi
- *
- * - If only @apid is not NULL, then there must be an AP queue device bound
- *   to the vfio_ap driver with an APQN containing @apid
- *
- * - If only @apqi is not NULL, then there must be an AP queue device bound
- *   to the vfio_ap driver with an APQN containing @apqi
- *
- * Returns 0 if the AP queue is reserved; otherwise, returns -EADDRNOTAVAIL.
- */
-static int vfio_ap_verify_queue_reserved(unsigned long *apid,
-					 unsigned long *apqi)
-{
-	int ret;
-	struct vfio_ap_queue_reserved qres;
-
-	qres.apid = apid;
-	qres.apqi = apqi;
-	qres.reserved = false;
-
-	ret = driver_for_each_device(&matrix_dev->vfio_ap_drv->driver, NULL,
-				     &qres, vfio_ap_has_queue);
-	if (ret)
-		return ret;
-
-	if (qres.reserved)
-		return 0;
-
-	return -EADDRNOTAVAIL;
-}
-
-static int
-vfio_ap_mdev_verify_queues_reserved_for_apid(struct ap_matrix_mdev *matrix_mdev,
-					     unsigned long apid)
-{
-	int ret;
-	unsigned long apqi;
-	unsigned long nbits = matrix_mdev->matrix.aqm_max + 1;
-
-	if (find_first_bit_inv(matrix_mdev->matrix.aqm, nbits) >= nbits)
-		return vfio_ap_verify_queue_reserved(&apid, NULL);
-
-	for_each_set_bit_inv(apqi, matrix_mdev->matrix.aqm, nbits) {
-		ret = vfio_ap_verify_queue_reserved(&apid, &apqi);
-		if (ret)
-			return ret;
-	}
-
-	return 0;
+	for_each_set_bit_inv(apid, apm, AP_DEVICES)
+		for_each_set_bit_inv(apqi, aqm, AP_DOMAINS)
+			pr_warn(MDEV_SHARING_ERR, apid, apqi, mdev_name);
 }
 
 /**
  * vfio_ap_mdev_verify_no_sharing
  *
- * Verifies that the APQNs derived from the cross product of the AP adapter IDs
- * and AP queue indexes comprising the AP matrix are not configured for another
- * mediated device. AP queue sharing is not allowed.
+ * Verifies that each APQN derived from the Cartesian product of the AP adapter
+ * IDs and AP queue indexes comprising the AP matrix is not configured for
+ * another mediated device. AP queue sharing is not allowed.
  *
- * @matrix_mdev: the mediated matrix device
+ * @matrix_mdev: the mediated matrix device to which the APQNs being verified
+ *		 are assigned.
+ * @mdev_apm: mask indicating the APIDs of the APQNs to be verified
+ * @mdev_aqm: mask indicating the APQIs of the APQNs to be verified
  *
- * Returns 0 if the APQNs are not shared, otherwise; returns -EADDRINUSE.
+ * Returns 0 if the APQNs are not shared; otherwise, returns -EBUSY.
  */
-static int vfio_ap_mdev_verify_no_sharing(struct ap_matrix_mdev *matrix_mdev)
+static int vfio_ap_mdev_verify_no_sharing(struct ap_matrix_mdev *matrix_mdev,
+					  unsigned long *mdev_apm,
+					  unsigned long *mdev_aqm)
 {
 	struct ap_matrix_mdev *lstdev;
 	DECLARE_BITMAP(apm, AP_DEVICES);
@@ -523,20 +426,31 @@ static int vfio_ap_mdev_verify_no_sharing(struct ap_matrix_mdev *matrix_mdev)
 		 * We work on full longs, as we can only exclude the leftover
 		 * bits in non-inverse order. The leftover is all zeros.
 		 */
-		if (!bitmap_and(apm, matrix_mdev->matrix.apm,
-				lstdev->matrix.apm, AP_DEVICES))
+		if (!bitmap_and(apm, mdev_apm, lstdev->matrix.apm, AP_DEVICES))
 			continue;
 
-		if (!bitmap_and(aqm, matrix_mdev->matrix.aqm,
-				lstdev->matrix.aqm, AP_DOMAINS))
+		if (!bitmap_and(aqm, mdev_aqm, lstdev->matrix.aqm, AP_DOMAINS))
 			continue;
 
-		return -EADDRINUSE;
+		vfio_ap_mdev_log_sharing_err(dev_name(mdev_dev(lstdev->mdev)),
+					     apm, aqm);
+
+		return -EBUSY;
 	}
 
 	return 0;
 }
 
+static int vfio_ap_mdev_validate_masks(struct ap_matrix_mdev *matrix_mdev,
+				       unsigned long *mdev_apm,
+				       unsigned long *mdev_aqm)
+{
+	if (ap_apqn_in_matrix_owned_by_def_drv(mdev_apm, mdev_aqm))
+		return -EADDRNOTAVAIL;
+
+	return vfio_ap_mdev_verify_no_sharing(matrix_mdev, mdev_apm, mdev_aqm);
+}
+
 static void vfio_ap_mdev_link_queue(struct ap_matrix_mdev *matrix_mdev,
 				    struct vfio_ap_queue *q)
 {
@@ -608,10 +522,10 @@ static void vfio_ap_mdev_link_adapter(struct ap_matrix_mdev *matrix_mdev,
  *	   driver; or, if no APQIs have yet been assigned, the APID is not
  *	   contained in an APQN bound to the vfio_ap device driver.
  *
- *	4. -EADDRINUSE
+ *	4. -EBUSY
  *	   An APQN derived from the cross product of the APID being assigned
  *	   and the APQIs previously assigned is being used by another mediated
- *	   matrix device
+ *	   matrix device or the mdev lock could not be acquired.
  */
 static ssize_t assign_adapter_store(struct device *dev,
 				    struct device_attribute *attr,
@@ -619,6 +533,7 @@ static ssize_t assign_adapter_store(struct device *dev,
 {
 	int ret;
 	unsigned long apid;
+	DECLARE_BITMAP(apm, AP_DEVICES);
 	struct mdev_device *mdev = mdev_from_dev(dev);
 	struct ap_matrix_mdev *matrix_mdev = mdev_get_drvdata(mdev);
 
@@ -633,33 +548,24 @@ static ssize_t assign_adapter_store(struct device *dev,
 	if (apid > matrix_mdev->matrix.apm_max)
 		return -ENODEV;
 
-	/*
-	 * Set the bit in the AP mask (APM) corresponding to the AP adapter
-	 * number (APID). The bits in the mask, from most significant to least
-	 * significant bit, correspond to APIDs 0-255.
-	 */
+	memset(apm, 0, sizeof(apm));
+	set_bit_inv(apid, apm);
+
 	mutex_lock(&matrix_dev->lock);
 
-	ret = vfio_ap_mdev_verify_queues_reserved_for_apid(matrix_mdev, apid);
-	if (ret)
-		goto done;
+	ret = vfio_ap_mdev_validate_masks(matrix_mdev, apm,
+					  matrix_mdev->matrix.aqm);
+	if (ret) {
+		mutex_unlock(&matrix_dev->lock);
+		return ret;
+	}
 
 	set_bit_inv(apid, matrix_mdev->matrix.apm);
-
-	ret = vfio_ap_mdev_verify_no_sharing(matrix_mdev);
-	if (ret)
-		goto share_err;
-
 	vfio_ap_mdev_link_adapter(matrix_mdev, apid);
-	ret = count;
-	goto done;
 
-share_err:
-	clear_bit_inv(apid, matrix_mdev->matrix.apm);
-done:
 	mutex_unlock(&matrix_dev->lock);
 
-	return ret;
+	return count;
 }
 static DEVICE_ATTR_WO(assign_adapter);
 
@@ -718,26 +624,6 @@ static ssize_t unassign_adapter_store(struct device *dev,
 }
 static DEVICE_ATTR_WO(unassign_adapter);
 
-static int
-vfio_ap_mdev_verify_queues_reserved_for_apqi(struct ap_matrix_mdev *matrix_mdev,
-					     unsigned long apqi)
-{
-	int ret;
-	unsigned long apid;
-	unsigned long nbits = matrix_mdev->matrix.apm_max + 1;
-
-	if (find_first_bit_inv(matrix_mdev->matrix.apm, nbits) >= nbits)
-		return vfio_ap_verify_queue_reserved(NULL, &apqi);
-
-	for_each_set_bit_inv(apid, matrix_mdev->matrix.apm, nbits) {
-		ret = vfio_ap_verify_queue_reserved(&apid, &apqi);
-		if (ret)
-			return ret;
-	}
-
-	return 0;
-}
-
 static void vfio_ap_mdev_link_domain(struct ap_matrix_mdev *matrix_mdev,
 				     unsigned long apqi)
 {
@@ -774,10 +660,10 @@ static void vfio_ap_mdev_link_domain(struct ap_matrix_mdev *matrix_mdev,
  *	   driver; or, if no APIDs have yet been assigned, the APQI is not
  *	   contained in an APQN bound to the vfio_ap device driver.
  *
- *	4. -EADDRINUSE
+ *	4. -EBUSY
  *	   An APQN derived from the cross product of the APQI being assigned
  *	   and the APIDs previously assigned is being used by another mediated
- *	   matrix device
+ *	   matrix device or the mdev lock could not be acquired.
  */
 static ssize_t assign_domain_store(struct device *dev,
 				   struct device_attribute *attr,
@@ -785,6 +671,7 @@ static ssize_t assign_domain_store(struct device *dev,
 {
 	int ret;
 	unsigned long apqi;
+	DECLARE_BITMAP(aqm, AP_DOMAINS);
 	struct mdev_device *mdev = mdev_from_dev(dev);
 	struct ap_matrix_mdev *matrix_mdev = mdev_get_drvdata(mdev);
 	unsigned long max_apqi = matrix_mdev->matrix.aqm_max;
@@ -799,28 +686,24 @@ static ssize_t assign_domain_store(struct device *dev,
 	if (apqi > max_apqi)
 		return -ENODEV;
 
+	memset(aqm, 0, sizeof(aqm));
+	set_bit_inv(apqi, aqm);
+
 	mutex_lock(&matrix_dev->lock);
 
-	ret = vfio_ap_mdev_verify_queues_reserved_for_apqi(matrix_mdev, apqi);
-	if (ret)
-		goto done;
+	ret = vfio_ap_mdev_validate_masks(matrix_mdev, matrix_mdev->matrix.apm,
+					  aqm);
+	if (ret) {
+		mutex_unlock(&matrix_dev->lock);
+		return ret;
+	}
 
 	set_bit_inv(apqi, matrix_mdev->matrix.aqm);
-
-	ret = vfio_ap_mdev_verify_no_sharing(matrix_mdev);
-	if (ret)
-		goto share_err;
-
 	vfio_ap_mdev_link_domain(matrix_mdev, apqi);
-	ret = count;
-	goto done;
 
-share_err:
-	clear_bit_inv(apqi, matrix_mdev->matrix.aqm);
-done:
 	mutex_unlock(&matrix_dev->lock);
 
-	return ret;
+	return count;
 }
 static DEVICE_ATTR_WO(assign_domain);
 
-- 
2.21.1



* [PATCH v13 07/15] s390/vfio-ap: introduce shadow APCB
  2020-12-23  1:15 [PATCH v13 00/15] s390/vfio-ap: dynamic configuration support Tony Krowiak
                   ` (5 preceding siblings ...)
  2020-12-23  1:15 ` [PATCH v13 06/15] s390/vfio-ap: allow assignment of unavailable AP queues to mdev device Tony Krowiak
@ 2020-12-23  1:15 ` Tony Krowiak
  2021-01-11 22:50   ` Halil Pasic
  2020-12-23  1:15 ` [PATCH v13 08/15] s390/vfio-ap: sysfs attribute to display the guest's matrix Tony Krowiak
                   ` (8 subsequent siblings)
  15 siblings, 1 reply; 48+ messages in thread
From: Tony Krowiak @ 2020-12-23  1:15 UTC (permalink / raw)
  To: linux-s390, linux-kernel, kvm
  Cc: freude, borntraeger, cohuck, mjrosato, pasic, alex.williamson,
	kwankhede, fiuczy, frankja, david, hca, gor, Tony Krowiak

The APCB is a field within the CRYCB that provides the AP configuration
to a KVM guest. Let's introduce a shadow copy of the KVM guest's APCB and
maintain it for the lifespan of the guest.
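
The shadow-copy idea can be sketched in userspace as follows. This is
an illustration under stated assumptions, not the driver code: struct
demo_apcb, struct demo_guest, struct demo_mdev and demo_commit_shadow()
are hypothetical, and a NULL guest pointer stands in for "no KVM guest
with a CRYCB is attached".

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical APCB: adapter, usage domain and control domain masks. */
struct demo_apcb {
	uint64_t apm;
	uint64_t aqm;
	uint64_t adm;
};

/* Hypothetical guest holding the live AP configuration. */
struct demo_guest {
	struct demo_apcb apcb;
};

struct demo_mdev {
	struct demo_apcb shadow;	/* staged copy, always maintained */
	struct demo_guest *guest;	/* NULL while no guest is attached */
};

/*
 * Copy the shadow masks into the guest's APCB. Returns 1 if the guest
 * was updated, 0 if there is no guest to update.
 */
static int demo_commit_shadow(struct demo_mdev *m)
{
	assert(m != NULL);

	if (!m->guest)
		return 0;

	m->guest->apcb = m->shadow;
	return 1;
}
```

Changes are staged in the shadow copy and only reach the guest at commit
time, so the shadow can be updated whether or not a guest is running.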

Signed-off-by: Tony Krowiak <akrowiak@linux.ibm.com>
Reviewed-by: Halil Pasic <pasic@linux.ibm.com>
---
 drivers/s390/crypto/vfio_ap_ops.c     | 15 +++++++++++++++
 drivers/s390/crypto/vfio_ap_private.h |  2 ++
 2 files changed, 17 insertions(+)

diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c
index 2d58b39977be..44b3a81cadfb 100644
--- a/drivers/s390/crypto/vfio_ap_ops.c
+++ b/drivers/s390/crypto/vfio_ap_ops.c
@@ -293,6 +293,20 @@ static void vfio_ap_matrix_init(struct ap_config_info *info,
 	matrix->adm_max = info->apxa ? info->Nd : 15;
 }
 
+static bool vfio_ap_mdev_has_crycb(struct ap_matrix_mdev *matrix_mdev)
+{
+	return (matrix_mdev->kvm && matrix_mdev->kvm->arch.crypto.crycbd);
+}
+
+static void vfio_ap_mdev_commit_shadow_apcb(struct ap_matrix_mdev *matrix_mdev)
+{
+	if (vfio_ap_mdev_has_crycb(matrix_mdev))
+		kvm_arch_crypto_set_masks(matrix_mdev->kvm,
+					  matrix_mdev->shadow_apcb.apm,
+					  matrix_mdev->shadow_apcb.aqm,
+					  matrix_mdev->shadow_apcb.adm);
+}
+
 static int vfio_ap_mdev_create(struct kobject *kobj, struct mdev_device *mdev)
 {
 	struct ap_matrix_mdev *matrix_mdev;
@@ -308,6 +322,7 @@ static int vfio_ap_mdev_create(struct kobject *kobj, struct mdev_device *mdev)
 
 	matrix_mdev->mdev = mdev;
 	vfio_ap_matrix_init(&matrix_dev->info, &matrix_mdev->matrix);
+	vfio_ap_matrix_init(&matrix_dev->info, &matrix_mdev->shadow_apcb);
 	hash_init(matrix_mdev->qtable);
 	mdev_set_drvdata(mdev, matrix_mdev);
 	matrix_mdev->pqap_hook.hook = handle_pqap;
diff --git a/drivers/s390/crypto/vfio_ap_private.h b/drivers/s390/crypto/vfio_ap_private.h
index 4e5cc72fc0db..d2d26ba18602 100644
--- a/drivers/s390/crypto/vfio_ap_private.h
+++ b/drivers/s390/crypto/vfio_ap_private.h
@@ -75,6 +75,7 @@ struct ap_matrix {
  * @list:	allows the ap_matrix_mdev struct to be added to a list
  * @matrix:	the adapters, usage domains and control domains assigned to the
  *		mediated matrix device.
+ * @shadow_apcb:    the shadow copy of the APCB field of the KVM guest's CRYCB
  * @group_notifier: notifier block used for specifying callback function for
  *		    handling the VFIO_GROUP_NOTIFY_SET_KVM event
  * @kvm:	the struct holding guest's state
@@ -82,6 +83,7 @@ struct ap_matrix {
 struct ap_matrix_mdev {
 	struct list_head node;
 	struct ap_matrix matrix;
+	struct ap_matrix shadow_apcb;
 	struct notifier_block group_notifier;
 	struct notifier_block iommu_notifier;
 	struct kvm *kvm;
-- 
2.21.1



* [PATCH v13 08/15] s390/vfio-ap: sysfs attribute to display the guest's matrix
  2020-12-23  1:15 [PATCH v13 00/15] s390/vfio-ap: dynamic configuration support Tony Krowiak
                   ` (6 preceding siblings ...)
  2020-12-23  1:15 ` [PATCH v13 07/15] s390/vfio-ap: introduce shadow APCB Tony Krowiak
@ 2020-12-23  1:15 ` Tony Krowiak
  2021-01-11 22:58   ` Halil Pasic
  2020-12-23  1:16 ` [PATCH v13 09/15] s390/vfio-ap: allow hot plug/unplug of AP resources using mdev device Tony Krowiak
                   ` (7 subsequent siblings)
  15 siblings, 1 reply; 48+ messages in thread
From: Tony Krowiak @ 2020-12-23  1:15 UTC (permalink / raw)
  To: linux-s390, linux-kernel, kvm
  Cc: freude, borntraeger, cohuck, mjrosato, pasic, alex.williamson,
	kwankhede, fiuczy, frankja, david, hca, gor, Tony Krowiak

The matrix of adapters and domains configured in a guest's APCB may
differ from the matrix of adapters and domains assigned to the matrix mdev,
so this patch introduces a sysfs attribute to display the matrix of
adapters and domains that are or will be assigned to the APCB of a guest
that is or will be using the matrix mdev. For a matrix mdev denoted by
$uuid, the guest matrix can be displayed as follows:

   cat /sys/devices/vfio_ap/matrix/$uuid/guest_matrix
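
To show the output format, here is a small userspace sketch that prints
one "APID.APQI" line per adapter/domain pair, using the same
"%02lx.%04lx" format as the patch below. demo_matrix_show() and
DEMO_IDS are hypothetical, bool arrays replace the kernel bitmaps, and
the cases where only adapters or only domains are assigned are left out
for brevity.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

#define DEMO_IDS 256

/*
 * Write one "apid.apqi" line for every assigned adapter/domain pair.
 * Returns the number of characters written, or -1 if buf is too small.
 */
static int demo_matrix_show(const bool *apm, const bool *aqm,
			    char *buf, size_t len)
{
	size_t pos = 0;

	assert(len > 0);
	buf[0] = '\0';

	for (unsigned long apid = 0; apid < DEMO_IDS; apid++) {
		if (!apm[apid])
			continue;
		for (unsigned long apqi = 0; apqi < DEMO_IDS; apqi++) {
			int n;

			if (!aqm[apqi])
				continue;
			n = snprintf(buf + pos, len - pos, "%02lx.%04lx\n",
				     apid, apqi);
			if (n < 0 || (size_t)n >= len - pos)
				return -1;
			pos += (size_t)n;
		}
	}

	return (int)pos;
}
```

With adapter 0x14 and domains 0x04 and 0x47 assigned, this sketch
produces the two lines 14.0004 and 14.0047.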

Signed-off-by: Tony Krowiak <akrowiak@linux.ibm.com>
---
 drivers/s390/crypto/vfio_ap_ops.c | 51 ++++++++++++++++++++++---------
 1 file changed, 37 insertions(+), 14 deletions(-)

diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c
index 44b3a81cadfb..1b1d5975ee0e 100644
--- a/drivers/s390/crypto/vfio_ap_ops.c
+++ b/drivers/s390/crypto/vfio_ap_ops.c
@@ -894,29 +894,24 @@ static ssize_t control_domains_show(struct device *dev,
 }
 static DEVICE_ATTR_RO(control_domains);
 
-static ssize_t matrix_show(struct device *dev, struct device_attribute *attr,
-			   char *buf)
+static ssize_t vfio_ap_mdev_matrix_show(struct ap_matrix *matrix, char *buf)
 {
-	struct mdev_device *mdev = mdev_from_dev(dev);
-	struct ap_matrix_mdev *matrix_mdev = mdev_get_drvdata(mdev);
 	char *bufpos = buf;
 	unsigned long apid;
 	unsigned long apqi;
 	unsigned long apid1;
 	unsigned long apqi1;
-	unsigned long napm_bits = matrix_mdev->matrix.apm_max + 1;
-	unsigned long naqm_bits = matrix_mdev->matrix.aqm_max + 1;
+	unsigned long napm_bits = matrix->apm_max + 1;
+	unsigned long naqm_bits = matrix->aqm_max + 1;
 	int nchars = 0;
 	int n;
 
-	apid1 = find_first_bit_inv(matrix_mdev->matrix.apm, napm_bits);
-	apqi1 = find_first_bit_inv(matrix_mdev->matrix.aqm, naqm_bits);
-
-	mutex_lock(&matrix_dev->lock);
+	apid1 = find_first_bit_inv(matrix->apm, napm_bits);
+	apqi1 = find_first_bit_inv(matrix->aqm, naqm_bits);
 
 	if ((apid1 < napm_bits) && (apqi1 < naqm_bits)) {
-		for_each_set_bit_inv(apid, matrix_mdev->matrix.apm, napm_bits) {
-			for_each_set_bit_inv(apqi, matrix_mdev->matrix.aqm,
+		for_each_set_bit_inv(apid, matrix->apm, napm_bits) {
+			for_each_set_bit_inv(apqi, matrix->aqm,
 					     naqm_bits) {
 				n = sprintf(bufpos, "%02lx.%04lx\n", apid,
 					    apqi);
@@ -925,25 +920,52 @@ static ssize_t matrix_show(struct device *dev, struct device_attribute *attr,
 			}
 		}
 	} else if (apid1 < napm_bits) {
-		for_each_set_bit_inv(apid, matrix_mdev->matrix.apm, napm_bits) {
+		for_each_set_bit_inv(apid, matrix->apm, napm_bits) {
 			n = sprintf(bufpos, "%02lx.\n", apid);
 			bufpos += n;
 			nchars += n;
 		}
 	} else if (apqi1 < naqm_bits) {
-		for_each_set_bit_inv(apqi, matrix_mdev->matrix.aqm, naqm_bits) {
+		for_each_set_bit_inv(apqi, matrix->aqm, naqm_bits) {
 			n = sprintf(bufpos, ".%04lx\n", apqi);
 			bufpos += n;
 			nchars += n;
 		}
 	}
 
+	return nchars;
+}
+
+static ssize_t matrix_show(struct device *dev, struct device_attribute *attr,
+			   char *buf)
+{
+	ssize_t nchars;
+	struct mdev_device *mdev = mdev_from_dev(dev);
+	struct ap_matrix_mdev *matrix_mdev = mdev_get_drvdata(mdev);
+
+	mutex_lock(&matrix_dev->lock);
+	nchars = vfio_ap_mdev_matrix_show(&matrix_mdev->matrix, buf);
 	mutex_unlock(&matrix_dev->lock);
 
 	return nchars;
 }
 static DEVICE_ATTR_RO(matrix);
 
+static ssize_t guest_matrix_show(struct device *dev,
+				 struct device_attribute *attr, char *buf)
+{
+	ssize_t nchars;
+	struct mdev_device *mdev = mdev_from_dev(dev);
+	struct ap_matrix_mdev *matrix_mdev = mdev_get_drvdata(mdev);
+
+	mutex_lock(&matrix_dev->lock);
+	nchars = vfio_ap_mdev_matrix_show(&matrix_mdev->shadow_apcb, buf);
+	mutex_unlock(&matrix_dev->lock);
+
+	return nchars;
+}
+static DEVICE_ATTR_RO(guest_matrix);
+
 static struct attribute *vfio_ap_mdev_attrs[] = {
 	&dev_attr_assign_adapter.attr,
 	&dev_attr_unassign_adapter.attr,
@@ -953,6 +975,7 @@ static struct attribute *vfio_ap_mdev_attrs[] = {
 	&dev_attr_unassign_control_domain.attr,
 	&dev_attr_control_domains.attr,
 	&dev_attr_matrix.attr,
+	&dev_attr_guest_matrix.attr,
 	NULL,
 };
 
-- 
2.21.1


* [PATCH v13 09/15] s390/vfio-ap: allow hot plug/unplug of AP resources using mdev device
  2020-12-23  1:15 [PATCH v13 00/15] s390/vfio-ap: dynamic configuration support Tony Krowiak
                   ` (7 preceding siblings ...)
  2020-12-23  1:15 ` [PATCH v13 08/15] s390/vfio-ap: sysfs attribute to display the guest's matrix Tony Krowiak
@ 2020-12-23  1:16 ` Tony Krowiak
  2021-01-12  1:12   ` Halil Pasic
  2020-12-23  1:16 ` [PATCH v13 10/15] s390/zcrypt: driver callback to indicate resource in use Tony Krowiak
                   ` (6 subsequent siblings)
  15 siblings, 1 reply; 48+ messages in thread
From: Tony Krowiak @ 2020-12-23  1:16 UTC (permalink / raw)
  To: linux-s390, linux-kernel, kvm
  Cc: freude, borntraeger, cohuck, mjrosato, pasic, alex.williamson,
	kwankhede, fiuczy, frankja, david, hca, gor, Tony Krowiak

Let's allow adapters, domains and control domains to be hot plugged into
and hot unplugged from a KVM guest using a matrix mdev when:

* The adapter, domain or control domain is assigned to or unassigned from
  the matrix mdev

* A queue device with an APQN assigned to the matrix mdev is bound to or
  unbound from the vfio_ap device driver.

Whenever an assignment or unassignment of an adapter, domain or control
domain is performed as well as when a bind or unbind of a queue device
is executed, the AP control block (APCB) that supplies the AP configuration
to a guest is first refreshed. The APCB is refreshed by copying the AP
configuration from the mdev's matrix to the APCB, then filtering the
APCB according to the following rules:

* The APID of each adapter and the APQI of each domain that is not in the
  host's AP configuration is filtered out.

* The APID of each adapter comprising an APQN that does not reference a
  queue device bound to the vfio_ap device driver is filtered out. The
  APQNs are derived from the Cartesian product of the APID of each adapter
  and APQI of each domain assigned to the mdev's matrix.

After refreshing the APCB, if the mdev is in use by a KVM guest, the APCB
is hot plugged into the guest to dynamically provide access to the
adapters, domains and control domains provided via the newly refreshed
APCB.
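
For illustration only (this is not part of the patch), the two filtering
rules above can be sketched in user-space C. The sketch assumes toy 8-bit
masks in plain LSB-first bit order instead of the kernel's 256-bit inverted
bitmaps, and every toy_* name is hypothetical; toy_queue_bound() merely
stands in for the lookup the patch performs with vfio_ap_mdev_get_queue().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NR_ADAPTERS 8	/* toy size; the real masks hold 256 bits */
#define NR_DOMAINS  8

/* Toy AP configuration: bit n set means adapter/domain n is assigned. */
struct toy_matrix {
	uint8_t apm;	/* adapters (APIDs) */
	uint8_t aqm;	/* usage domains (APQIs) */
};

/*
 * bound[apid] is a mask of the APQIs whose queue device (apid, apqi) is
 * bound to the pass-through driver (hypothetical stand-in for the
 * driver's queue lookup).
 */
static bool toy_queue_bound(const uint8_t bound[NR_ADAPTERS],
			    unsigned int apid, unsigned int apqi)
{
	assert(apid < NR_ADAPTERS && apqi < NR_DOMAINS);
	return (bound[apid] & (1u << apqi)) != 0;
}

static struct toy_matrix toy_filter(struct toy_matrix mdev,
				    struct toy_matrix host,
				    const uint8_t bound[NR_ADAPTERS])
{
	struct toy_matrix shadow;
	unsigned int apid, apqi;

	/* Rule 1: keep only what is also in the host's configuration. */
	shadow.apm = mdev.apm & host.apm;
	shadow.aqm = mdev.aqm & host.aqm;

	/* Without both an adapter and a domain there are no APQNs. */
	if (!shadow.apm || !shadow.aqm) {
		shadow.apm = 0;
		shadow.aqm = 0;
		return shadow;
	}

	/* Rule 2: drop the whole adapter if any of its APQNs is unbound. */
	for (apid = 0; apid < NR_ADAPTERS; apid++) {
		if (!(shadow.apm & (1u << apid)))
			continue;
		for (apqi = 0; apqi < NR_DOMAINS; apqi++) {
			if (!(shadow.aqm & (1u << apqi)))
				continue;
			if (!toy_queue_bound(bound, apid, apqi)) {
				shadow.apm &= (uint8_t)~(1u << apid);
				break;
			}
		}
	}
	return shadow;
}
```

Filtering by adapter rather than by individual queue reflects the
constraint noted in the patch: a single APQN cannot be masked out of a
guest's configuration on its own.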

Signed-off-by: Tony Krowiak <akrowiak@linux.ibm.com>
---
 drivers/s390/crypto/vfio_ap_ops.c | 143 ++++++++++++++++++++++++------
 1 file changed, 118 insertions(+), 25 deletions(-)

diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c
index 1b1d5975ee0e..843862c88379 100644
--- a/drivers/s390/crypto/vfio_ap_ops.c
+++ b/drivers/s390/crypto/vfio_ap_ops.c
@@ -307,6 +307,88 @@ static void vfio_ap_mdev_commit_shadow_apcb(struct ap_matrix_mdev *matrix_mdev)
 					  matrix_mdev->shadow_apcb.adm);
 }
 
+static void vfio_ap_mdev_filter_apcb(struct ap_matrix_mdev *matrix_mdev,
+				     struct ap_matrix *shadow_apcb)
+{
+	int ret;
+	unsigned long apid, apqi, apqn;
+
+	ret = ap_qci(&matrix_dev->info);
+	if (ret)
+		return;
+
+	memcpy(shadow_apcb, &matrix_mdev->matrix, sizeof(struct ap_matrix));
+
+	/*
+	 * Copy the adapters, domains and control domains to the shadow_apcb
+	 * from the matrix mdev, but only those that are assigned to the host's
+	 * AP configuration.
+	 */
+	bitmap_and(shadow_apcb->apm, matrix_mdev->matrix.apm,
+		   (unsigned long *)matrix_dev->info.apm, AP_DEVICES);
+	bitmap_and(shadow_apcb->aqm, matrix_mdev->matrix.aqm,
+		   (unsigned long *)matrix_dev->info.aqm, AP_DOMAINS);
+	bitmap_and(shadow_apcb->adm, matrix_mdev->matrix.adm,
+		   (unsigned long *)matrix_dev->info.adm, AP_DOMAINS);
+
+	/* If there are no APQNs assigned, then filtering them is unnecessary */
+	if (bitmap_empty(shadow_apcb->apm, AP_DEVICES)) {
+		if (!bitmap_empty(shadow_apcb->aqm, AP_DOMAINS))
+			bitmap_clear(shadow_apcb->aqm, 0, AP_DOMAINS);
+		return;
+	} else if (bitmap_empty(shadow_apcb->aqm, AP_DOMAINS)) {
+		if (!bitmap_empty(shadow_apcb->apm, AP_DEVICES))
+			bitmap_clear(shadow_apcb->apm, 0, AP_DEVICES);
+		return;
+	}
+
+	for_each_set_bit_inv(apid, shadow_apcb->apm, AP_DEVICES) {
+		for_each_set_bit_inv(apqi, shadow_apcb->aqm, AP_DOMAINS) {
+			/*
+			 * If the APQN is not bound to the vfio_ap device
+			 * driver, then we can't assign it to the guest's
+			 * AP configuration. The AP architecture won't
+			 * allow filtering of a single APQN, so filter the
+			 * APID of the adapter, which removes every APQN
+			 * on that adapter from the guest.
+			 */
+			apqn = AP_MKQID(apid, apqi);
+			if (!vfio_ap_mdev_get_queue(matrix_mdev, apqn)) {
+				clear_bit_inv(apid, shadow_apcb->apm);
+				break;
+			}
+		}
+	}
+}
+
+/**
+ * vfio_ap_mdev_refresh_apcb
+ *
+ * Refresh the shadow of the KVM guest's APCB from the matrix mdev's AP
+ * configuration, filtering out the APIDs of adapters for which an APQN
+ * assigned to the matrix mdev does not reference an AP queue device bound
+ * to the vfio_ap device driver.
+ *
+ * @matrix_mdev:  the matrix mdev whose AP configuration is to be filtered
+ *
+ * If the filtered configuration differs from the current shadow APCB, the
+ * shadow APCB is updated and committed; otherwise, nothing is changed.
+ * This function does not return a value.
+ */
+static void vfio_ap_mdev_refresh_apcb(struct ap_matrix_mdev *matrix_mdev)
+{
+	struct ap_matrix shadow_apcb;
+
+	vfio_ap_mdev_filter_apcb(matrix_mdev, &shadow_apcb);
+
+	if (memcmp(&shadow_apcb, &matrix_mdev->shadow_apcb,
+		   sizeof(struct ap_matrix)) != 0) {
+		memcpy(&matrix_mdev->shadow_apcb, &shadow_apcb,
+		       sizeof(struct ap_matrix));
+		vfio_ap_mdev_commit_shadow_apcb(matrix_mdev);
+	}
+}
+
 static int vfio_ap_mdev_create(struct kobject *kobj, struct mdev_device *mdev)
 {
 	struct ap_matrix_mdev *matrix_mdev;
@@ -552,10 +634,6 @@ static ssize_t assign_adapter_store(struct device *dev,
 	struct mdev_device *mdev = mdev_from_dev(dev);
 	struct ap_matrix_mdev *matrix_mdev = mdev_get_drvdata(mdev);
 
-	/* If the guest is running, disallow assignment of adapter */
-	if (matrix_mdev->kvm)
-		return -EBUSY;
-
 	ret = kstrtoul(buf, 0, &apid);
 	if (ret)
 		return ret;
@@ -577,6 +655,7 @@ static ssize_t assign_adapter_store(struct device *dev,
 
 	set_bit_inv(apid, matrix_mdev->matrix.apm);
 	vfio_ap_mdev_link_adapter(matrix_mdev, apid);
+	vfio_ap_mdev_refresh_apcb(matrix_mdev);
 
 	mutex_unlock(&matrix_dev->lock);
 
@@ -619,10 +698,6 @@ static ssize_t unassign_adapter_store(struct device *dev,
 	struct mdev_device *mdev = mdev_from_dev(dev);
 	struct ap_matrix_mdev *matrix_mdev = mdev_get_drvdata(mdev);
 
-	/* If the guest is running, disallow un-assignment of adapter */
-	if (matrix_mdev->kvm)
-		return -EBUSY;
-
 	ret = kstrtoul(buf, 0, &apid);
 	if (ret)
 		return ret;
@@ -633,6 +708,8 @@ static ssize_t unassign_adapter_store(struct device *dev,
 	mutex_lock(&matrix_dev->lock);
 	clear_bit_inv((unsigned long)apid, matrix_mdev->matrix.apm);
 	vfio_ap_mdev_unlink_adapter(matrix_mdev, apid);
+	vfio_ap_mdev_refresh_apcb(matrix_mdev);
+
 	mutex_unlock(&matrix_dev->lock);
 
 	return count;
@@ -691,10 +768,6 @@ static ssize_t assign_domain_store(struct device *dev,
 	struct ap_matrix_mdev *matrix_mdev = mdev_get_drvdata(mdev);
 	unsigned long max_apqi = matrix_mdev->matrix.aqm_max;
 
-	/* If the guest is running, disallow assignment of domain */
-	if (matrix_mdev->kvm)
-		return -EBUSY;
-
 	ret = kstrtoul(buf, 0, &apqi);
 	if (ret)
 		return ret;
@@ -715,6 +788,7 @@ static ssize_t assign_domain_store(struct device *dev,
 
 	set_bit_inv(apqi, matrix_mdev->matrix.aqm);
 	vfio_ap_mdev_link_domain(matrix_mdev, apqi);
+	vfio_ap_mdev_refresh_apcb(matrix_mdev);
 
 	mutex_unlock(&matrix_dev->lock);
 
@@ -757,10 +831,6 @@ static ssize_t unassign_domain_store(struct device *dev,
 	struct mdev_device *mdev = mdev_from_dev(dev);
 	struct ap_matrix_mdev *matrix_mdev = mdev_get_drvdata(mdev);
 
-	/* If the guest is running, disallow un-assignment of domain */
-	if (matrix_mdev->kvm)
-		return -EBUSY;
-
 	ret = kstrtoul(buf, 0, &apqi);
 	if (ret)
 		return ret;
@@ -771,12 +841,24 @@ static ssize_t unassign_domain_store(struct device *dev,
 	mutex_lock(&matrix_dev->lock);
 	clear_bit_inv((unsigned long)apqi, matrix_mdev->matrix.aqm);
 	vfio_ap_mdev_unlink_domain(matrix_mdev, apqi);
+	vfio_ap_mdev_refresh_apcb(matrix_mdev);
+
 	mutex_unlock(&matrix_dev->lock);
 
 	return count;
 }
 static DEVICE_ATTR_WO(unassign_domain);
 
+static void vfio_ap_mdev_hot_plug_cdom(struct ap_matrix_mdev *matrix_mdev,
+				       unsigned long domid)
+{
+	if (!test_bit_inv(domid, matrix_mdev->shadow_apcb.adm) &&
+	    test_bit_inv(domid, (unsigned long *) matrix_dev->info.adm)) {
+		set_bit_inv(domid, matrix_mdev->shadow_apcb.adm);
+		vfio_ap_mdev_commit_shadow_apcb(matrix_mdev);
+	}
+}
+
 /**
  * assign_control_domain_store
  *
@@ -802,10 +884,6 @@ static ssize_t assign_control_domain_store(struct device *dev,
 	struct mdev_device *mdev = mdev_from_dev(dev);
 	struct ap_matrix_mdev *matrix_mdev = mdev_get_drvdata(mdev);
 
-	/* If the guest is running, disallow assignment of control domain */
-	if (matrix_mdev->kvm)
-		return -EBUSY;
-
 	ret = kstrtoul(buf, 0, &id);
 	if (ret)
 		return ret;
@@ -820,12 +898,22 @@ static ssize_t assign_control_domain_store(struct device *dev,
 	 */
 	mutex_lock(&matrix_dev->lock);
 	set_bit_inv(id, matrix_mdev->matrix.adm);
+	vfio_ap_mdev_hot_plug_cdom(matrix_mdev, id);
 	mutex_unlock(&matrix_dev->lock);
 
 	return count;
 }
 static DEVICE_ATTR_WO(assign_control_domain);
 
+static void vfio_ap_mdev_hot_unplug_cdom(struct ap_matrix_mdev *matrix_mdev,
+					unsigned long domid)
+{
+	if (test_bit_inv(domid, matrix_mdev->shadow_apcb.adm)) {
+		clear_bit_inv(domid, matrix_mdev->shadow_apcb.adm);
+		vfio_ap_mdev_commit_shadow_apcb(matrix_mdev);
+	}
+}
+
 /**
  * unassign_control_domain_store
  *
@@ -852,10 +940,6 @@ static ssize_t unassign_control_domain_store(struct device *dev,
 	struct ap_matrix_mdev *matrix_mdev = mdev_get_drvdata(mdev);
 	unsigned long max_domid =  matrix_mdev->matrix.adm_max;
 
-	/* If the guest is running, disallow un-assignment of control domain */
-	if (matrix_mdev->kvm)
-		return -EBUSY;
-
 	ret = kstrtoul(buf, 0, &domid);
 	if (ret)
 		return ret;
@@ -864,6 +948,7 @@ static ssize_t unassign_control_domain_store(struct device *dev,
 
 	mutex_lock(&matrix_dev->lock);
 	clear_bit_inv(domid, matrix_mdev->matrix.adm);
+	vfio_ap_mdev_hot_unplug_cdom(matrix_mdev, domid);
 	mutex_unlock(&matrix_dev->lock);
 
 	return count;
@@ -1089,6 +1174,8 @@ static int vfio_ap_mdev_group_notifier(struct notifier_block *nb,
 				  matrix_mdev->matrix.aqm,
 				  matrix_mdev->matrix.adm);
 
+	vfio_ap_mdev_commit_shadow_apcb(matrix_mdev);
+
 notify_done:
 	mutex_unlock(&matrix_dev->lock);
 	return notify_rc;
@@ -1330,6 +1417,8 @@ int vfio_ap_mdev_probe_queue(struct ap_device *apdev)
 	q->apqn = to_ap_queue(&apdev->device)->qid;
 	q->saved_isc = VFIO_AP_ISC_INVALID;
 	vfio_ap_queue_link_mdev(q);
+	if (q->matrix_mdev)
+		vfio_ap_mdev_refresh_apcb(q->matrix_mdev);
 	mutex_unlock(&matrix_dev->lock);
 
 	return 0;
@@ -1337,6 +1426,7 @@ int vfio_ap_mdev_probe_queue(struct ap_device *apdev)
 
 void vfio_ap_mdev_remove_queue(struct ap_device *apdev)
 {
+	struct ap_matrix_mdev *matrix_mdev;
 	struct vfio_ap_queue *q;
 	int apid, apqi;
 
@@ -1347,8 +1437,11 @@ void vfio_ap_mdev_remove_queue(struct ap_device *apdev)
 	apqi = AP_QID_QUEUE(q->apqn);
 	vfio_ap_mdev_reset_queue(apid, apqi, 1);
 
-	if (q->matrix_mdev)
+	if (q->matrix_mdev) {
+		matrix_mdev = q->matrix_mdev;
 		vfio_ap_mdev_unlink_queue(q);
+		vfio_ap_mdev_refresh_apcb(matrix_mdev);
+	}
 
 	kfree(q);
 	mutex_unlock(&matrix_dev->lock);
-- 
2.21.1


* [PATCH v13 10/15] s390/zcrypt: driver callback to indicate resource in use
  2020-12-23  1:15 [PATCH v13 00/15] s390/vfio-ap: dynamic configuration support Tony Krowiak
                   ` (8 preceding siblings ...)
  2020-12-23  1:16 ` [PATCH v13 09/15] s390/vfio-ap: allow hot plug/unplug of AP resources using mdev device Tony Krowiak
@ 2020-12-23  1:16 ` Tony Krowiak
  2021-01-12 16:50   ` Halil Pasic
  2020-12-23  1:16 ` [PATCH v13 11/15] s390/vfio-ap: implement in-use callback for vfio_ap driver Tony Krowiak
                   ` (5 subsequent siblings)
  15 siblings, 1 reply; 48+ messages in thread
From: Tony Krowiak @ 2020-12-23  1:16 UTC (permalink / raw)
  To: linux-s390, linux-kernel, kvm
  Cc: freude, borntraeger, cohuck, mjrosato, pasic, alex.williamson,
	kwankhede, fiuczy, frankja, david, hca, gor, Tony Krowiak

Introduces a new driver callback to prevent a root user from unbinding
an AP queue from its device driver if the queue is in use. The callback
will be invoked whenever a change to the AP bus's sysfs apmask or aqmask
attributes would result in one or more AP queues being removed from their
driver. If the callback responds in the affirmative for any driver
queried, the change to the apmask or aqmask will be rejected with a device
busy error.

For this patch, only non-default drivers will be queried. Currently,
there is only one non-default driver, the vfio_ap device driver. The
vfio_ap device driver facilitates pass-through of an AP queue to a
guest. The idea here is that a guest may be administered by a different
sysadmin than the host and we don't want AP resources to unexpectedly
disappear from a guest's AP configuration (i.e., adapters and domains
assigned to the matrix mdev). This will enforce the proper procedure for
removing AP resources intended for guest usage which is to
first unassign them from the matrix mdev, then unbind them from the
vfio_ap device driver.
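
As a rough illustration (not part of the patch), the check described above
can be sketched in user-space C. The sketch assumes 8-bit masks, and
struct toy_driver, toy_apmask_commit() and toy_adapter2_in_use() are
hypothetical names; only the idea of querying non-default drivers about
newly set mask bits is taken from the description.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical driver descriptor; only the fields the sketch needs. */
struct toy_driver {
	int is_default;
	/* Returns nonzero if any queue selected by the masks is in use. */
	int (*in_use)(uint8_t apm, uint8_t aqm);
};

/* Example callback: pretend adapter 2 is passed through to a guest. */
static int toy_adapter2_in_use(uint8_t apm, uint8_t aqm)
{
	(void)aqm;
	return (apm & (1u << 2)) != 0;
}

/*
 * Try to replace *apmask with newapm. Bits that are newly set would take
 * queues away from non-default drivers, so those drivers are asked first.
 * The mask is left untouched if any of them reports a queue in use.
 */
static int toy_apmask_commit(uint8_t *apmask, uint8_t newapm, uint8_t aqmask,
			     const struct toy_driver *drvs, size_t ndrvs)
{
	uint8_t reserved;
	size_t i;

	assert(apmask != NULL);
	reserved = newapm & (uint8_t)~*apmask;

	for (i = 0; reserved && i < ndrvs; i++) {
		if (drvs[i].is_default || !drvs[i].in_use)
			continue;
		if (drvs[i].in_use(reserved, aqmask))
			return -EBUSY;
	}

	*apmask = newapm;
	return 0;
}
```

Because the new mask is only copied after every driver has been queried, a
rejected write leaves the previous mask fully intact.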

Signed-off-by: Tony Krowiak <akrowiak@linux.ibm.com>
Reviewed-by: Harald Freudenberger <freude@linux.ibm.com>
---
 drivers/s390/crypto/ap_bus.c | 160 ++++++++++++++++++++++++++++++++---
 drivers/s390/crypto/ap_bus.h |   4 +
 2 files changed, 154 insertions(+), 10 deletions(-)

diff --git a/drivers/s390/crypto/ap_bus.c b/drivers/s390/crypto/ap_bus.c
index 2758d05a802d..7d8add952dd6 100644
--- a/drivers/s390/crypto/ap_bus.c
+++ b/drivers/s390/crypto/ap_bus.c
@@ -36,6 +36,7 @@
 #include <linux/mod_devicetable.h>
 #include <linux/debugfs.h>
 #include <linux/ctype.h>
+#include <linux/module.h>
 
 #include "ap_bus.h"
 #include "ap_debug.h"
@@ -1006,6 +1007,23 @@ static int modify_bitmap(const char *str, unsigned long *bitmap, int bits)
 	return 0;
 }
 
+static int ap_parse_bitmap_str(const char *str, unsigned long *bitmap, int bits,
+			       unsigned long *newmap)
+{
+	unsigned long size;
+	int rc;
+
+	size = BITS_TO_LONGS(bits)*sizeof(unsigned long);
+	if (*str == '+' || *str == '-') {
+		memcpy(newmap, bitmap, size);
+		rc = modify_bitmap(str, newmap, bits);
+	} else {
+		memset(newmap, 0, size);
+		rc = hex2bitmap(str, newmap, bits);
+	}
+	return rc;
+}
+
 int ap_parse_mask_str(const char *str,
 		      unsigned long *bitmap, int bits,
 		      struct mutex *lock)
@@ -1025,14 +1043,7 @@ int ap_parse_mask_str(const char *str,
 		kfree(newmap);
 		return -ERESTARTSYS;
 	}
-
-	if (*str == '+' || *str == '-') {
-		memcpy(newmap, bitmap, size);
-		rc = modify_bitmap(str, newmap, bits);
-	} else {
-		memset(newmap, 0, size);
-		rc = hex2bitmap(str, newmap, bits);
-	}
+	rc = ap_parse_bitmap_str(str, bitmap, bits, newmap);
 	if (rc == 0)
 		memcpy(bitmap, newmap, size);
 	mutex_unlock(lock);
@@ -1224,12 +1235,76 @@ static ssize_t apmask_show(struct bus_type *bus, char *buf)
 	return rc;
 }
 
+static int __verify_card_reservations(struct device_driver *drv, void *data)
+{
+	int rc = 0;
+	struct ap_driver *ap_drv = to_ap_drv(drv);
+	unsigned long *newapm = (unsigned long *)data;
+
+	/*
+	 * No need to verify whether the driver is using the queues if it is the
+	 * default driver.
+	 */
+	if (ap_drv->flags & AP_DRIVER_FLAG_DEFAULT)
+		return 0;
+
+	/*
+	 * increase the driver's module refcounter to be sure it is not
+	 * going away when we invoke the callback function.
+	 */
+	if (!try_module_get(drv->owner))
+		return 0;
+
+	if (ap_drv->in_use) {
+		rc = ap_drv->in_use(newapm, ap_perms.aqm);
+		if (rc)
+			rc = -EBUSY;
+	}
+
+	/* release the driver's module */
+	module_put(drv->owner);
+
+	return rc;
+}
+
+static int apmask_commit(unsigned long *newapm)
+{
+	int rc;
+	unsigned long reserved[BITS_TO_LONGS(AP_DEVICES)];
+
+	/*
+	 * Check if any bits in the apmask have been set which will
+	 * result in queues being removed from non-default drivers
+	 */
+	if (bitmap_andnot(reserved, newapm, ap_perms.apm, AP_DEVICES)) {
+		rc = bus_for_each_drv(&ap_bus_type, NULL, reserved,
+				      __verify_card_reservations);
+		if (rc)
+			return rc;
+	}
+
+	memcpy(ap_perms.apm, newapm, APMASKSIZE);
+
+	return 0;
+}
+
 static ssize_t apmask_store(struct bus_type *bus, const char *buf,
 			    size_t count)
 {
 	int rc;
+	DECLARE_BITMAP(newapm, AP_DEVICES);
+
+	if (mutex_lock_interruptible(&ap_perms_mutex))
+		return -ERESTARTSYS;
 
-	rc = ap_parse_mask_str(buf, ap_perms.apm, AP_DEVICES, &ap_perms_mutex);
+	rc = ap_parse_bitmap_str(buf, ap_perms.apm, AP_DEVICES, newapm);
+	if (rc)
+		goto done;
+
+	rc = apmask_commit(newapm);
+
+done:
+	mutex_unlock(&ap_perms_mutex);
 	if (rc)
 		return rc;
 
@@ -1255,12 +1330,77 @@ static ssize_t aqmask_show(struct bus_type *bus, char *buf)
 	return rc;
 }
 
+static int __verify_queue_reservations(struct device_driver *drv, void *data)
+{
+	int rc = 0;
+	struct ap_driver *ap_drv = to_ap_drv(drv);
+	unsigned long *newaqm = (unsigned long *)data;
+
+	/*
+	 * There is no need to verify whether the driver is using the queues
+	 * if it is the default driver; only non-default drivers are queried
+	 * by this callback.
+	 */
+	if (ap_drv->flags & AP_DRIVER_FLAG_DEFAULT)
+		return 0;
+
+	/*
+	 * increase the driver's module refcounter to be sure it is not
+	 * going away when we invoke the callback function.
+	 */
+	if (!try_module_get(drv->owner))
+		return 0;
+
+	if (ap_drv->in_use) {
+		rc = ap_drv->in_use(ap_perms.apm, newaqm);
+		if (rc)
+			rc = -EBUSY;
+	}
+
+	/* release the driver's module */
+	module_put(drv->owner);
+
+	return rc;
+}
+
+static int aqmask_commit(unsigned long *newaqm)
+{
+	int rc;
+	unsigned long reserved[BITS_TO_LONGS(AP_DOMAINS)];
+
+	/*
+	 * Check if any bits in the aqmask have been set which will
+	 * result in queues being removed from non-default drivers
+	 */
+	if (bitmap_andnot(reserved, newaqm, ap_perms.aqm, AP_DOMAINS)) {
+		rc = bus_for_each_drv(&ap_bus_type, NULL, reserved,
+				      __verify_queue_reservations);
+		if (rc)
+			return rc;
+	}
+
+	memcpy(ap_perms.aqm, newaqm, AQMASKSIZE);
+
+	return 0;
+}
+
 static ssize_t aqmask_store(struct bus_type *bus, const char *buf,
 			    size_t count)
 {
 	int rc;
+	DECLARE_BITMAP(newaqm, AP_DOMAINS);
+
+	if (mutex_lock_interruptible(&ap_perms_mutex))
+		return -ERESTARTSYS;
+
+	rc = ap_parse_bitmap_str(buf, ap_perms.aqm, AP_DOMAINS, newaqm);
+	if (rc)
+		goto done;
+
+	rc = aqmask_commit(newaqm);
 
-	rc = ap_parse_mask_str(buf, ap_perms.aqm, AP_DOMAINS, &ap_perms_mutex);
+done:
+	mutex_unlock(&ap_perms_mutex);
 	if (rc)
 		return rc;
 
diff --git a/drivers/s390/crypto/ap_bus.h b/drivers/s390/crypto/ap_bus.h
index 472efd3a755c..95c9da072f81 100644
--- a/drivers/s390/crypto/ap_bus.h
+++ b/drivers/s390/crypto/ap_bus.h
@@ -145,6 +145,7 @@ struct ap_driver {
 
 	int (*probe)(struct ap_device *);
 	void (*remove)(struct ap_device *);
+	int (*in_use)(unsigned long *apm, unsigned long *aqm);
 };
 
 #define to_ap_drv(x) container_of((x), struct ap_driver, driver)
@@ -293,6 +294,9 @@ void ap_queue_init_state(struct ap_queue *aq);
 struct ap_card *ap_card_create(int id, int queue_depth, int raw_device_type,
 			       int comp_device_type, unsigned int functions);
 
+#define APMASKSIZE (BITS_TO_LONGS(AP_DEVICES) * sizeof(unsigned long))
+#define AQMASKSIZE (BITS_TO_LONGS(AP_DOMAINS) * sizeof(unsigned long))
+
 struct ap_perms {
 	unsigned long ioctlm[BITS_TO_LONGS(AP_IOCTLS)];
 	unsigned long apm[BITS_TO_LONGS(AP_DEVICES)];
-- 
2.21.1


* [PATCH v13 11/15] s390/vfio-ap: implement in-use callback for vfio_ap driver
  2020-12-23  1:15 [PATCH v13 00/15] s390/vfio-ap: dynamic configuration support Tony Krowiak
                   ` (9 preceding siblings ...)
  2020-12-23  1:16 ` [PATCH v13 10/15] s390/zcrypt: driver callback to indicate resource in use Tony Krowiak
@ 2020-12-23  1:16 ` Tony Krowiak
  2021-01-12  1:20   ` Halil Pasic
  2020-12-23  1:16 ` [PATCH v13 12/15] s390/zcrypt: Notify driver on config changed and scan complete callbacks Tony Krowiak
                   ` (4 subsequent siblings)
  15 siblings, 1 reply; 48+ messages in thread
From: Tony Krowiak @ 2020-12-23  1:16 UTC (permalink / raw)
  To: linux-s390, linux-kernel, kvm
  Cc: freude, borntraeger, cohuck, mjrosato, pasic, alex.williamson,
	kwankhede, fiuczy, frankja, david, hca, gor, Tony Krowiak

Let's implement the callback to indicate when an APQN
is in use by the vfio_ap device driver. The callback is
invoked whenever a change to the apmask or aqmask would
result in one or more queue devices being removed from the driver. The
vfio_ap device driver will indicate a resource is in use
if the APQN of any of the queue devices to be removed is assigned to
any of the matrix mdevs under the driver's control.

There is potential for a deadlock condition between the matrix_dev->lock
used to lock the matrix device during assignment of adapters and domains
and the ap_perms_mutex locked by the AP bus when changes are made to the
sysfs apmask/aqmask attributes.

Consider the following scenario (courtesy of Halil Pasic):
1) apmask_store() takes ap_perms_mutex
2) assign_adapter_store() takes matrix_dev->lock
3) apmask_store() calls vfio_ap_mdev_resource_in_use() which tries
   to take matrix_dev->lock
4) assign_adapter_store() calls ap_apqn_in_matrix_owned_by_def_drv
   which tries to take ap_perms_mutex

BANG!

To resolve this issue, instead of using the mutex_lock(&matrix_dev->lock)
function to lock the matrix device during assignment of an adapter or
domain to a matrix_mdev as well as during the in_use callback, the
mutex_trylock(&matrix_dev->lock) function will be used. If the lock is not
obtained, then the assignment and in_use functions will terminate with
-EBUSY.
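
The back-off can be sketched in user-space C as follows. This is only an
illustration under stated assumptions: a C11 atomic_flag spin lock stands
in for the kernel mutexes, and all toy_* names are hypothetical. The point
it shows is that a path which fails the trylock returns -EBUSY and
releases what it holds, so neither path waits while holding the other's
lock.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy locks standing in for ap_perms_mutex and matrix_dev->lock. */
static atomic_flag toy_perms_mutex = ATOMIC_FLAG_INIT;
static atomic_flag toy_matrix_lock = ATOMIC_FLAG_INIT;

static bool toy_trylock(atomic_flag *lock)
{
	assert(lock != NULL);
	return !atomic_flag_test_and_set(lock);
}

static void toy_lock(atomic_flag *lock)
{
	while (!toy_trylock(lock))
		;	/* spin until the holder releases the lock */
}

static void toy_unlock(atomic_flag *lock)
{
	atomic_flag_clear(lock);
}

/* Mask update path: holds the perms lock, then needs the matrix lock. */
static int toy_mask_store(void)
{
	int ret = 0;

	toy_lock(&toy_perms_mutex);
	if (!toy_trylock(&toy_matrix_lock)) {
		ret = -EBUSY;	/* back off instead of waiting */
	} else {
		/* ... check whether the queues are in use ... */
		toy_unlock(&toy_matrix_lock);
	}
	toy_unlock(&toy_perms_mutex);
	return ret;
}

/* Assignment path: takes the matrix lock with trylock as well. */
static int toy_assign_store(void)
{
	if (!toy_trylock(&toy_matrix_lock))
		return -EBUSY;
	/* The perms lock may now be taken with a blocking lock. */
	toy_lock(&toy_perms_mutex);
	toy_unlock(&toy_perms_mutex);
	toy_unlock(&toy_matrix_lock);
	return 0;
}
```

The cost of this design is that user space may see a spurious -EBUSY under
contention and has to retry the write.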

Signed-off-by: Tony Krowiak <akrowiak@linux.ibm.com>
---
 drivers/s390/crypto/vfio_ap_drv.c     |  1 +
 drivers/s390/crypto/vfio_ap_ops.c     | 21 ++++++++++++++++++---
 drivers/s390/crypto/vfio_ap_private.h |  2 ++
 3 files changed, 21 insertions(+), 3 deletions(-)

diff --git a/drivers/s390/crypto/vfio_ap_drv.c b/drivers/s390/crypto/vfio_ap_drv.c
index 73bd073fd5d3..8934471b7944 100644
--- a/drivers/s390/crypto/vfio_ap_drv.c
+++ b/drivers/s390/crypto/vfio_ap_drv.c
@@ -147,6 +147,7 @@ static int __init vfio_ap_init(void)
 	memset(&vfio_ap_drv, 0, sizeof(vfio_ap_drv));
 	vfio_ap_drv.probe = vfio_ap_mdev_probe_queue;
 	vfio_ap_drv.remove = vfio_ap_mdev_remove_queue;
+	vfio_ap_drv.in_use = vfio_ap_mdev_resource_in_use;
 	vfio_ap_drv.ids = ap_queue_ids;
 
 	ret = ap_driver_register(&vfio_ap_drv, THIS_MODULE, VFIO_AP_DRV_NAME);
diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c
index 843862c88379..6bc2e80cc565 100644
--- a/drivers/s390/crypto/vfio_ap_ops.c
+++ b/drivers/s390/crypto/vfio_ap_ops.c
@@ -644,7 +644,8 @@ static ssize_t assign_adapter_store(struct device *dev,
 	memset(apm, 0, sizeof(apm));
 	set_bit_inv(apid, apm);
 
-	mutex_lock(&matrix_dev->lock);
+	if (!mutex_trylock(&matrix_dev->lock))
+		return -EBUSY;
 
 	ret = vfio_ap_mdev_validate_masks(matrix_mdev, apm,
 					  matrix_mdev->matrix.aqm);
@@ -777,7 +778,8 @@ static ssize_t assign_domain_store(struct device *dev,
 	memset(aqm, 0, sizeof(aqm));
 	set_bit_inv(apqi, aqm);
 
-	mutex_lock(&matrix_dev->lock);
+	if (!mutex_trylock(&matrix_dev->lock))
+		return -EBUSY;
 
 	ret = vfio_ap_mdev_validate_masks(matrix_mdev, matrix_mdev->matrix.apm,
 					  aqm);
@@ -896,7 +898,8 @@ static ssize_t assign_control_domain_store(struct device *dev,
 	 * least significant, correspond to IDs 0 up to the one less than the
 	 * number of control domains that can be assigned.
 	 */
-	mutex_lock(&matrix_dev->lock);
+	if (!mutex_trylock(&matrix_dev->lock))
+		return -EBUSY;
 	set_bit_inv(id, matrix_mdev->matrix.adm);
 	vfio_ap_mdev_hot_plug_cdom(matrix_mdev, id);
 	mutex_unlock(&matrix_dev->lock);
@@ -1446,3 +1449,15 @@ void vfio_ap_mdev_remove_queue(struct ap_device *apdev)
 	kfree(q);
 	mutex_unlock(&matrix_dev->lock);
 }
+
+int vfio_ap_mdev_resource_in_use(unsigned long *apm, unsigned long *aqm)
+{
+	int ret;
+
+	if (!mutex_trylock(&matrix_dev->lock))
+		return -EBUSY;
+	ret = vfio_ap_mdev_verify_no_sharing(NULL, apm, aqm);
+	mutex_unlock(&matrix_dev->lock);
+
+	return ret;
+}
diff --git a/drivers/s390/crypto/vfio_ap_private.h b/drivers/s390/crypto/vfio_ap_private.h
index d2d26ba18602..15b7cd74843b 100644
--- a/drivers/s390/crypto/vfio_ap_private.h
+++ b/drivers/s390/crypto/vfio_ap_private.h
@@ -107,4 +107,6 @@ struct vfio_ap_queue {
 int vfio_ap_mdev_probe_queue(struct ap_device *queue);
 void vfio_ap_mdev_remove_queue(struct ap_device *queue);
 
+int vfio_ap_mdev_resource_in_use(unsigned long *apm, unsigned long *aqm);
+
 #endif /* _VFIO_AP_PRIVATE_H_ */
-- 
2.21.1


* [PATCH v13 12/15] s390/zcrypt: Notify driver on config changed and scan complete callbacks
  2020-12-23  1:15 [PATCH v13 00/15] s390/vfio-ap: dynamic configuration support Tony Krowiak
                   ` (10 preceding siblings ...)
  2020-12-23  1:16 ` [PATCH v13 11/15] s390/vfio-ap: implement in-use callback for vfio_ap driver Tony Krowiak
@ 2020-12-23  1:16 ` Tony Krowiak
  2021-01-12 16:58   ` Halil Pasic
  2020-12-23  1:16 ` [PATCH v13 13/15] s390/vfio-ap: handle host AP config change notification Tony Krowiak
                   ` (3 subsequent siblings)
  15 siblings, 1 reply; 48+ messages in thread
From: Tony Krowiak @ 2020-12-23  1:16 UTC (permalink / raw)
  To: linux-s390, linux-kernel, kvm
  Cc: freude, borntraeger, cohuck, mjrosato, pasic, alex.williamson,
	kwankhede, fiuczy, frankja, david, hca, gor, Tony Krowiak

This patch introduces an extension to the AP bus to notify device drivers
when the host AP configuration changes - i.e., adapters, domains or
control domains are added or removed. To that end, two new callbacks are
introduced for AP device drivers:

  void (*on_config_changed)(struct ap_config_info *new_config_info,
                            struct ap_config_info *old_config_info);

     This callback is invoked at the start of the AP bus scan
     function when it determines that the host AP configuration information
     has changed since the previous scan. This is done by storing
     an old and current QCI info struct and comparing them. If there is any
     difference, the callback is invoked.

     Note that when the AP bus scan detects that AP adapters, domains or
     control domains have been removed from the host's AP configuration, it
     will remove the associated devices from the AP bus subsystem's device
     model. This callback gives the device driver a chance to respond to
     the removal of the AP devices from the host configuration prior to
     calling the device driver's remove callback. The primary purpose of
     this callback is to allow the vfio_ap driver to do a bulk unplug of
     all affected adapters, domains and control domains from affected
     guests rather than unplugging them one at a time when the remove
     callback is invoked.

  void (*on_scan_complete)(struct ap_config_info *new_config_info,
                           struct ap_config_info *old_config_info);

     The on_scan_complete callback is invoked after the AP bus scan is
     complete if the host AP configuration data has changed.

     Note that when the AP bus scan detects that adapters, domains or
     control domains have been added to the host's configuration, it will
     create new devices in the AP bus subsystem's device model. The primary
     purpose of this callback is to allow the vfio_ap driver to do a bulk
     plug of all affected adapters, domains and control domains into
     affected guests rather than plugging them one at a time when the
     probe callback is invoked.

Please note that changes to the apmask and aqmask do not trigger
these two callbacks since the bus scan function is not invoked by changes
to those masks.
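
The ordering of the two callbacks around a scan can be sketched in
user-space C. This is an illustration only: struct toy_config is a
three-byte stand-in for struct ap_config_info, and all toy_* names,
including the two counting callbacks, are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Toy stand-in for struct ap_config_info: one byte per mask. */
struct toy_config {
	unsigned char apm, aqm, adm;
};

struct toy_driver {
	void (*on_config_changed)(const struct toy_config *new_cfg,
				  const struct toy_config *old_cfg);
	void (*on_scan_complete)(const struct toy_config *new_cfg,
				 const struct toy_config *old_cfg);
};

static struct toy_config toy_cur, toy_old;
static int toy_changed_calls, toy_complete_calls;

/* Example callbacks that only count how often they are invoked. */
static void toy_count_changed(const struct toy_config *new_cfg,
			      const struct toy_config *old_cfg)
{
	(void)new_cfg;
	(void)old_cfg;
	toy_changed_calls++;
}

static void toy_count_complete(const struct toy_config *new_cfg,
			       const struct toy_config *old_cfg)
{
	(void)new_cfg;
	(void)old_cfg;
	toy_complete_calls++;
}

/* Save the previous config, store the fresh one, report any difference. */
static bool toy_get_configuration(const struct toy_config *fresh)
{
	toy_old = toy_cur;
	toy_cur = *fresh;
	return memcmp(&toy_cur, &toy_old, sizeof(toy_cur)) != 0;
}

static void toy_scan_bus(const struct toy_config *fresh,
			 const struct toy_driver *drv)
{
	bool changed = toy_get_configuration(fresh);

	/* Before devices are touched: lets a driver unplug in bulk. */
	if (changed && drv->on_config_changed)
		drv->on_config_changed(&toy_cur, &toy_old);

	/* ... devices would be created and removed here ... */

	/* After the scan: lets a driver plug in bulk. */
	if (changed && drv->on_scan_complete)
		drv->on_scan_complete(&toy_cur, &toy_old);
}
```

A scan that finds an unchanged configuration invokes neither callback,
which keeps the periodic scan cheap in the common case.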

Signed-off-by: Harald Freudenberger <freude@linux.ibm.com>
Signed-off-by: Tony Krowiak <akrowiak@linux.ibm.com>
---
 drivers/s390/crypto/ap_bus.c | 91 +++++++++++++++++++++++++++++++++++-
 drivers/s390/crypto/ap_bus.h | 12 +++++
 2 files changed, 101 insertions(+), 2 deletions(-)

diff --git a/drivers/s390/crypto/ap_bus.c b/drivers/s390/crypto/ap_bus.c
index 7d8add952dd6..788bfdaadafd 100644
--- a/drivers/s390/crypto/ap_bus.c
+++ b/drivers/s390/crypto/ap_bus.c
@@ -82,6 +82,7 @@ static atomic64_t ap_scan_bus_count;
 static DECLARE_COMPLETION(ap_init_apqn_bindings_complete);
 
 static struct ap_config_info *ap_qci_info;
+static struct ap_config_info *ap_qci_info_old;
 
 /*
  * AP bus related debug feature things.
@@ -1579,6 +1580,52 @@ static int __match_queue_device_with_queue_id(struct device *dev, const void *da
 		&& AP_QID_QUEUE(to_ap_queue(dev)->qid) == (int)(long) data;
 }
 
+/* Helper function for notify_config_changed */
+static int __drv_notify_config_changed(struct device_driver *drv, void *data)
+{
+	struct ap_driver *ap_drv = to_ap_drv(drv);
+
+	if (try_module_get(drv->owner)) {
+		if (ap_drv->on_config_changed)
+			ap_drv->on_config_changed(ap_qci_info,
+						  ap_qci_info_old);
+		module_put(drv->owner);
+	}
+
+	return 0;
+}
+
+/* Notify all drivers about a QCI config change */
+static inline void notify_config_changed(void)
+{
+	bus_for_each_drv(&ap_bus_type, NULL, NULL,
+			 __drv_notify_config_changed);
+}
+
+/* Helper function for notify_scan_complete */
+static int __drv_notify_scan_complete(struct device_driver *drv, void *data)
+{
+	struct ap_driver *ap_drv = to_ap_drv(drv);
+
+	if (try_module_get(drv->owner)) {
+		if (ap_drv->on_scan_complete)
+			ap_drv->on_scan_complete(ap_qci_info,
+						 ap_qci_info_old);
+		module_put(drv->owner);
+	}
+
+	return 0;
+}
+
+/* Notify all drivers about bus scan complete */
+static inline void notify_scan_complete(void)
+{
+	bus_for_each_drv(&ap_bus_type, NULL, NULL,
+			 __drv_notify_scan_complete);
+}
+
+
+
 /*
  * Helper function for ap_scan_bus().
  * Remove card device and associated queue devices.
@@ -1857,15 +1904,51 @@ static inline void ap_scan_adapter(int ap)
 	put_device(&ac->ap_dev.device);
 }
 
+/*
+ * ap_get_configuration
+ *
+ * Stores the host AP configuration information returned from the previous call
+ * to Query Configuration Information (QCI), then retrieves and stores the
+ * current AP configuration returned from QCI.
+ *
+ * Returns true if the host AP configuration changed between calls to QCI;
+ * otherwise, returns false.
+ */
+static bool ap_get_configuration(void)
+{
+	bool cfg_chg = false;
+
+	if (ap_qci_info) {
+		if (!ap_qci_info_old) {
+			ap_qci_info_old = kzalloc(sizeof(*ap_qci_info_old),
+						  GFP_KERNEL);
+			if (!ap_qci_info_old)
+				return false;
+		} else {
+			memcpy(ap_qci_info_old, ap_qci_info,
+			       sizeof(struct ap_config_info));
+		}
+		ap_fetch_qci_info(ap_qci_info);
+		cfg_chg = memcmp(ap_qci_info,
+				 ap_qci_info_old,
+				 sizeof(struct ap_config_info)) != 0;
+	}
+
+	return cfg_chg;
+}
+
 /**
  * ap_scan_bus(): Scan the AP bus for new devices
  * Runs periodically, workqueue timer (ap_config_time)
  */
 static void ap_scan_bus(struct work_struct *unused)
 {
-	int ap;
+	int ap, config_changed = 0;
 
-	ap_fetch_qci_info(ap_qci_info);
+	/* config change notify */
+	config_changed = ap_get_configuration();
+	if (config_changed)
+		notify_config_changed();
 	ap_select_domain();
 
 	AP_DBF_DBG("%s running\n", __func__);
@@ -1874,6 +1957,10 @@ static void ap_scan_bus(struct work_struct *unused)
 	for (ap = 0; ap <= ap_max_adapter_id; ap++)
 		ap_scan_adapter(ap);
 
+	/* scan complete notify */
+	if (config_changed)
+		notify_scan_complete();
+
 	/* check if there is at least one queue available with default domain */
 	if (ap_domain_index >= 0) {
 		struct device *dev =
diff --git a/drivers/s390/crypto/ap_bus.h b/drivers/s390/crypto/ap_bus.h
index 95c9da072f81..e91082bd159c 100644
--- a/drivers/s390/crypto/ap_bus.h
+++ b/drivers/s390/crypto/ap_bus.h
@@ -146,6 +146,18 @@ struct ap_driver {
 	int (*probe)(struct ap_device *);
 	void (*remove)(struct ap_device *);
 	int (*in_use)(unsigned long *apm, unsigned long *aqm);
+	/*
+	 * Called at the start of the ap bus scan function when
+	 * the crypto config information (qci) has changed.
+	 */
+	void (*on_config_changed)(struct ap_config_info *new_config_info,
+				  struct ap_config_info *old_config_info);
+	/*
+	 * Called at the end of the ap bus scan function when
+	 * the crypto config information (qci) has changed.
+	 */
+	void (*on_scan_complete)(struct ap_config_info *new_config_info,
+				 struct ap_config_info *old_config_info);
 };
 
 #define to_ap_drv(x) container_of((x), struct ap_driver, driver)
-- 
2.21.1


^ permalink raw reply	[flat|nested] 48+ messages in thread

* [PATCH v13 13/15] s390/vfio-ap: handle host AP config change notification
  2020-12-23  1:15 [PATCH v13 00/15] s390/vfio-ap: dynamic configuration support Tony Krowiak
                   ` (11 preceding siblings ...)
  2020-12-23  1:16 ` [PATCH v13 12/15] s390/zcrypt: Notify driver on config changed and scan complete callbacks Tony Krowiak
@ 2020-12-23  1:16 ` Tony Krowiak
  2021-01-12 18:39   ` Halil Pasic
  2020-12-23  1:16 ` [PATCH v13 14/15] s390/vfio-ap: handle AP bus scan completed notification Tony Krowiak
                   ` (2 subsequent siblings)
  15 siblings, 1 reply; 48+ messages in thread
From: Tony Krowiak @ 2020-12-23  1:16 UTC (permalink / raw)
  To: linux-s390, linux-kernel, kvm
  Cc: freude, borntraeger, cohuck, mjrosato, pasic, alex.williamson,
	kwankhede, fiuczy, frankja, david, hca, gor, Tony Krowiak

The motivation for config change notification is to enable the vfio_ap
device driver to handle hot plug/unplug of AP queues for a KVM guest as a
bulk operation. For example, if a new APID is dynamically assigned to the
host configuration, then a queue device will be created for each APQN that
can be formulated from the new APID and all APQIs already assigned to the
host configuration. Each of these new queue devices will get bound to their
respective driver one at a time, as they are created. In the case of the
vfio_ap driver, if the APQN of the queue device being bound to the driver
is assigned to a matrix mdev in use by a KVM guest, it will be hot plugged
into the guest if possible. Given that the AP architecture allows for 256
adapters and 256 domains, one can see the possibility of the vfio_ap
driver's probe/remove callbacks getting invoked an inordinate number of
times when the host configuration changes. Keep in mind that in order to
plug/unplug an AP queue for a guest, the guest's VCPUs must be suspended,
then the guest's AP configuration must be updated followed by the VCPUs
being resumed. If this is done each time the probe or remove callback is
invoked and there are hundreds or thousands of queues to be probed or
removed, this would be incredibly inefficient and could have a large impact
on guest performance. What the config notification does is allow us to
make the changes to the guest in a single operation.

This patch implements the vfio_ap driver's on_config_changed callback
(vfio_ap_on_cfg_changed), which the AP bus invokes to notify AP device
drivers that the host AP configuration has changed (i.e., adapters, domains
and/or control domains were added to or removed from the host AP
configuration).

Adapters added to host configuration:
* The APIDs of the adapters added will be stored in a bitmap contained
  within the struct representing the matrix device which is the parent
  device of all matrix mediated devices.
* When a queue is probed, if the APQN of the queue being probed is
  assigned to an mdev in use by a guest, the queue may get hot plugged
  into the guest; however, if the APID of the adapter is contained in the
  bitmap of adapters added, the queue hot plug operation will be skipped
  until the AP bus notifies the driver that its scan operation has
  completed (another patch).
* When the vfio_ap driver is notified that the AP bus scan has completed,
  the guest's APCB will be refreshed by filtering the mdev's matrix by
  APID.

Domains added to host configuration:
* The APQIs of the domains added will be stored in a bitmap contained
  within the struct representing the matrix device which is the parent
  device of all matrix mediated devices.
* When a queue is probed, if the APQN of the queue being probed is
  assigned to an mdev in use by a guest, the queue may get hot plugged
  into the guest; however, if the APQI of the domain is contained in the
  bitmap of domains added, the queue hot plug operation will be skipped
  until the AP bus notifies the driver that its scan operation has
  completed (another patch).

Control domains added to the host configuration:
* The domain numbers of the domains added will be stored in a bitmap
  contained within the struct representing the matrix device which is the
  parent device of all matrix mediated devices.

When the vfio_ap device driver is notified that the AP bus scan has
completed, the APCB for each matrix mdev to which the adapters, domains
and control domains added are assigned will be refreshed. If a KVM guest is
using the matrix mdev, the APCB will be hot plugged into the guest to
refresh its AP configuration.
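
To illustrate the "added" handling described above, here is a minimal
user-space sketch. It is not the driver code: it assumes 64-entry masks
held in a uint64_t with plain LSB-first bit numbering (the real masks cover
256 IDs and are accessed with the _inv bit helpers), and the names
ap_mask_t, mask_bit, ids_added, defer_hot_plug and needs_refresh are
hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical 64-entry mask; the real AP masks cover 256 adapters/domains. */
typedef uint64_t ap_mask_t;

/* Bit for a given adapter or domain ID (LSB-first, unlike the kernel masks). */
static ap_mask_t mask_bit(unsigned int id)
{
	assert(id < 64);
	return (ap_mask_t)1 << id;
}

/* IDs present in the current host configuration but not in the previous one. */
static ap_mask_t ids_added(ap_mask_t cur, ap_mask_t prev)
{
	return cur & ~prev;
}

/*
 * Probe-time decision: skip the per-queue hot plug while the bus scan that
 * follows a configuration change is still creating queues for a newly added
 * adapter or domain.
 */
static bool defer_hot_plug(unsigned int apid, unsigned int apqi,
			   ap_mask_t ap_add, ap_mask_t aq_add)
{
	return (ap_add & mask_bit(apid)) || (aq_add & mask_bit(apqi));
}

/*
 * Scan-complete decision: refresh an mdev's APCB only if something that was
 * added to the host configuration is assigned to that mdev.
 */
static bool needs_refresh(ap_mask_t mdev_apm, ap_mask_t mdev_aqm,
			  ap_mask_t mdev_adm, ap_mask_t ap_add,
			  ap_mask_t aq_add, ap_mask_t ad_add)
{
	return (mdev_apm & ap_add) || (mdev_aqm & aq_add) ||
	       (mdev_adm & ad_add);
}
```

The point of the split is that the probe path only records that work is
pending, and the single refresh per mdev happens once the scan is done.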

Adapters removed from configuration:
* Each queue device with the APID identifying an adapter removed from
  the host AP configuration will be unlinked from the matrix mdev to which
  the queue's APQN is assigned.
* When the vfio_ap driver's remove callback is invoked, if the queue
  device is not linked to the matrix mdev, the refresh of the guest's
  APCB will be skipped.

Domains removed from configuration:
* Each queue device with the APQI identifying a domain removed from
  the host AP configuration will be unlinked from the matrix mdev to which
  the queue's APQN is assigned.
* When the vfio_ap driver's remove callback is invoked, if the queue
  device is not linked to the matrix mdev, the refresh of the guest's
  APCB will be skipped.

If any queues with an APQN assigned to a given matrix mdev have been
unlinked or any control domains assigned to a given matrix mdev have been
removed from the host AP configuration, the APCB of the matrix mdev will
be refreshed. If a KVM guest is using the matrix mdev, the APCB will be hot
plugged into the guest to refresh its AP configuration.
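
The "removed" handling can be sketched the same way. This is a simplified
user-space model, not the driver code: it assumes 64-entry masks in a
uint64_t with LSB-first numbering and a flat array in place of the mdev's
queue hash table; struct queue, ids_removed and unlink_removed are
hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 64-entry mask; the real AP masks cover 256 adapters/domains. */
typedef uint64_t ap_mask_t;

/* Simplified stand-in for a queue linked to a matrix mdev. */
struct queue {
	unsigned int apid;
	unsigned int apqi;
	bool linked;
};

static ap_mask_t mask_bit(unsigned int id)
{
	assert(id < 64);
	return (ap_mask_t)1 << id;
}

/* IDs present in the previous host configuration but not in the current one. */
static ap_mask_t ids_removed(ap_mask_t prev, ap_mask_t cur)
{
	return prev & ~cur;
}

/*
 * Unlink every linked queue whose APID or APQI was removed from the host
 * configuration. Returns true if at least one queue was unlinked, which is
 * the condition for refreshing the mdev's APCB once.
 */
static bool unlink_removed(struct queue *queues, size_t nqueues,
			   ap_mask_t ap_rem, ap_mask_t aq_rem)
{
	bool unlinked = false;
	size_t i;

	for (i = 0; i < nqueues; i++) {
		struct queue *q = &queues[i];

		if (!q->linked)
			continue;
		if ((ap_rem & mask_bit(q->apid)) ||
		    (aq_rem & mask_bit(q->apqi))) {
			q->linked = false;
			unlinked = true;
		}
	}

	return unlinked;
}
```

Because the queues are already unlinked when the remove callback runs later,
that callback finds nothing linked and skips its own APCB refresh.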

Signed-off-by: Tony Krowiak <akrowiak@linux.ibm.com>
---
 drivers/s390/crypto/vfio_ap_drv.c     |   3 +-
 drivers/s390/crypto/vfio_ap_ops.c     | 159 +++++++++++++++++++++++---
 drivers/s390/crypto/vfio_ap_private.h |  13 ++-
 3 files changed, 158 insertions(+), 17 deletions(-)

diff --git a/drivers/s390/crypto/vfio_ap_drv.c b/drivers/s390/crypto/vfio_ap_drv.c
index 8934471b7944..2029d8392416 100644
--- a/drivers/s390/crypto/vfio_ap_drv.c
+++ b/drivers/s390/crypto/vfio_ap_drv.c
@@ -87,7 +87,7 @@ static int vfio_ap_matrix_dev_create(void)
 
 	/* Fill in config info via PQAP(QCI), if available */
 	if (test_facility(12)) {
-		ret = ap_qci(&matrix_dev->info);
+		ret = ap_qci(&matrix_dev->config_info);
 		if (ret)
 			goto matrix_alloc_err;
 	}
@@ -148,6 +148,7 @@ static int __init vfio_ap_init(void)
 	vfio_ap_drv.probe = vfio_ap_mdev_probe_queue;
 	vfio_ap_drv.remove = vfio_ap_mdev_remove_queue;
 	vfio_ap_drv.in_use = vfio_ap_mdev_resource_in_use;
+	vfio_ap_drv.on_config_changed = vfio_ap_on_cfg_changed;
 	vfio_ap_drv.ids = ap_queue_ids;
 
 	ret = ap_driver_register(&vfio_ap_drv, THIS_MODULE, VFIO_AP_DRV_NAME);
diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c
index 6bc2e80cc565..8bbbd1dc7546 100644
--- a/drivers/s390/crypto/vfio_ap_ops.c
+++ b/drivers/s390/crypto/vfio_ap_ops.c
@@ -310,13 +310,8 @@ static void vfio_ap_mdev_commit_shadow_apcb(struct ap_matrix_mdev *matrix_mdev)
 static void vfio_ap_mdev_filter_apcb(struct ap_matrix_mdev *matrix_mdev,
 				     struct ap_matrix *shadow_apcb)
 {
-	int ret;
 	unsigned long apid, apqi, apqn;
 
-	ret = ap_qci(&matrix_dev->info);
-	if (ret)
-		return;
-
 	memcpy(shadow_apcb, &matrix_mdev->matrix, sizeof(struct ap_matrix));
 
 	/*
@@ -325,11 +320,11 @@ static void vfio_ap_mdev_filter_apcb(struct ap_matrix_mdev *matrix_mdev,
 	 * AP configuration.
 	 */
 	bitmap_and(shadow_apcb->apm, matrix_mdev->matrix.apm,
-		   (unsigned long *)matrix_dev->info.apm, AP_DEVICES);
+		   (unsigned long *)matrix_dev->config_info.apm, AP_DEVICES);
 	bitmap_and(shadow_apcb->aqm, matrix_mdev->matrix.aqm,
-		   (unsigned long *)matrix_dev->info.aqm, AP_DOMAINS);
+		   (unsigned long *)matrix_dev->config_info.aqm, AP_DOMAINS);
 	bitmap_and(shadow_apcb->adm, matrix_mdev->matrix.adm,
-		   (unsigned long *)matrix_dev->info.adm, AP_DOMAINS);
+		   (unsigned long *)matrix_dev->config_info.adm, AP_DOMAINS);
 
 	/* If there are no APQNs assigned, then filtering them be unnecessary */
 	if (bitmap_empty(shadow_apcb->apm, AP_DEVICES)) {
@@ -403,8 +398,9 @@ static int vfio_ap_mdev_create(struct kobject *kobj, struct mdev_device *mdev)
 	}
 
 	matrix_mdev->mdev = mdev;
-	vfio_ap_matrix_init(&matrix_dev->info, &matrix_mdev->matrix);
-	vfio_ap_matrix_init(&matrix_dev->info, &matrix_mdev->shadow_apcb);
+	vfio_ap_matrix_init(&matrix_dev->config_info, &matrix_mdev->matrix);
+	vfio_ap_matrix_init(&matrix_dev->config_info,
+			    &matrix_mdev->shadow_apcb);
 	hash_init(matrix_mdev->qtable);
 	mdev_set_drvdata(mdev, matrix_mdev);
 	matrix_mdev->pqap_hook.hook = handle_pqap;
@@ -855,7 +851,8 @@ static void vfio_ap_mdev_hot_plug_cdom(struct ap_matrix_mdev *matrix_mdev,
 				       unsigned long domid)
 {
 	if (!test_bit_inv(domid, matrix_mdev->shadow_apcb.adm) &&
-	    test_bit_inv(domid, (unsigned long *) matrix_dev->info.adm)) {
+	    test_bit_inv(domid,
+			 (unsigned long *) matrix_dev->config_info.adm)) {
 		set_bit_inv(domid, matrix_mdev->shadow_apcb.adm);
 		vfio_ap_mdev_commit_shadow_apcb(matrix_mdev);
 	}
@@ -1436,11 +1433,11 @@ void vfio_ap_mdev_remove_queue(struct ap_device *apdev)
 	mutex_lock(&matrix_dev->lock);
 	q = dev_get_drvdata(&apdev->device);
 	dev_set_drvdata(&apdev->device, NULL);
-	apid = AP_QID_CARD(q->apqn);
-	apqi = AP_QID_QUEUE(q->apqn);
-	vfio_ap_mdev_reset_queue(apid, apqi, 1);
 
 	if (q->matrix_mdev) {
+		apid = AP_QID_CARD(q->apqn);
+		apqi = AP_QID_QUEUE(q->apqn);
+		vfio_ap_mdev_reset_queue(apid, apqi, 1);
 		matrix_mdev = q->matrix_mdev;
 		vfio_ap_mdev_unlink_queue(q);
 		vfio_ap_mdev_refresh_apcb(matrix_mdev);
@@ -1461,3 +1458,137 @@ int vfio_ap_mdev_resource_in_use(unsigned long *apm, unsigned long *aqm)
 
 	return ret;
 }
+
+/*
+ * vfio_ap_mdev_unlink_apids
+ *
+ * @matrix_mdev: The matrix mediated device
+ *
+ * @apid_rem: The bitmap specifying the APIDs of the adapters removed from
+ *	      the host's AP configuration
+ *
+ * Unlinks @matrix_mdev from each queue assigned to @matrix_mdev whose APQN
+ * contains an APID specified in @apid_rem.
+ *
+ * Returns true if one or more AP queue devices were unlinked; otherwise,
+ * returns false.
+ */
+static bool vfio_ap_mdev_unlink_apids(struct ap_matrix_mdev *matrix_mdev,
+				      unsigned long *apid_rem)
+{
+	int bkt, apid, apqi;
+	bool q_unlinked = false;
+	struct vfio_ap_queue *q;
+
+	hash_for_each(matrix_mdev->qtable, bkt, q, mdev_qnode) {
+		apid = AP_QID_CARD(q->apqn);
+		if (test_bit_inv(apid, apid_rem)) {
+			apqi = AP_QID_QUEUE(q->apqn);
+			vfio_ap_mdev_reset_queue(apid, apqi, 1);
+			vfio_ap_mdev_unlink_queue(q);
+			q_unlinked = true;
+		}
+	}
+
+	return q_unlinked;
+}
+
+/*
+ * vfio_ap_mdev_unlink_apqis
+ *
+ * @matrix_mdev: The matrix mediated device
+ *
+ * @apqi_rem: The bitmap specifying the APQIs of the domains removed from
+ *	      the host's AP configuration
+ *
+ * Unlinks @matrix_mdev from each queue assigned to @matrix_mdev whose APQN
+ * contains an APQI specified in @apqi_rem.
+ *
+ * Returns true if one or more AP queue devices were unlinked; otherwise,
+ * returns false.
+ */
+static bool vfio_ap_mdev_unlink_apqis(struct ap_matrix_mdev *matrix_mdev,
+				      unsigned long *apqi_rem)
+{
+	int bkt, apid, apqi;
+	bool q_unlinked = false;
+	struct vfio_ap_queue *q;
+
+	hash_for_each(matrix_mdev->qtable, bkt, q, mdev_qnode) {
+		apqi = AP_QID_QUEUE(q->apqn);
+		if (test_bit_inv(apqi, apqi_rem)) {
+			apid = AP_QID_CARD(q->apqn);
+			vfio_ap_mdev_reset_queue(apid, apqi, 1);
+			vfio_ap_mdev_unlink_queue(q);
+			q_unlinked = true;
+		}
+	}
+
+	return q_unlinked;
+}
+
+static void vfio_ap_mdev_on_cfg_remove(void)
+{
+	bool refresh_apcb = false;
+	int ap_remove, aq_remove;
+	struct ap_matrix_mdev *matrix_mdev;
+	DECLARE_BITMAP(aprem, AP_DEVICES);
+	DECLARE_BITMAP(aqrem, AP_DOMAINS);
+	unsigned long *cur_apm, *cur_aqm, *prev_apm, *prev_aqm;
+
+	cur_apm = (unsigned long *)matrix_dev->config_info.apm;
+	cur_aqm = (unsigned long *)matrix_dev->config_info.aqm;
+	prev_apm = (unsigned long *)matrix_dev->config_info_prev.apm;
+	prev_aqm = (unsigned long *)matrix_dev->config_info_prev.aqm;
+
+	ap_remove = bitmap_andnot(aprem, prev_apm, cur_apm, AP_DEVICES);
+	aq_remove = bitmap_andnot(aqrem, prev_aqm, cur_aqm, AP_DOMAINS);
+
+	if (!ap_remove && !aq_remove)
+		return;
+
+	list_for_each_entry(matrix_mdev, &matrix_dev->mdev_list, node) {
+		/* Start each mdev clean; either unlink pass can set this */
+		refresh_apcb = false;
+		if (ap_remove)
+			refresh_apcb |= vfio_ap_mdev_unlink_apids(matrix_mdev,
+								  aprem);
+		if (aq_remove)
+			refresh_apcb |= vfio_ap_mdev_unlink_apqis(matrix_mdev,
+								  aqrem);
+		if (refresh_apcb)
+			vfio_ap_mdev_refresh_apcb(matrix_mdev);
+	}
+}
+
+static void vfio_ap_mdev_on_cfg_add(void)
+{
+	unsigned long *cur_apm, *cur_aqm, *cur_adm;
+	unsigned long *prev_apm, *prev_aqm, *prev_adm;
+
+	cur_apm = (unsigned long *)matrix_dev->config_info.apm;
+	cur_aqm = (unsigned long *)matrix_dev->config_info.aqm;
+	cur_adm = (unsigned long *)matrix_dev->config_info.adm;
+
+	prev_apm = (unsigned long *)matrix_dev->config_info_prev.apm;
+	prev_aqm = (unsigned long *)matrix_dev->config_info_prev.aqm;
+	prev_adm = (unsigned long *)matrix_dev->config_info_prev.adm;
+
+	bitmap_andnot(matrix_dev->ap_add, cur_apm, prev_apm, AP_DEVICES);
+	bitmap_andnot(matrix_dev->aq_add, cur_aqm, prev_aqm, AP_DOMAINS);
+	bitmap_andnot(matrix_dev->ad_add, cur_adm, prev_adm, AP_DOMAINS);
+}
+
+void vfio_ap_on_cfg_changed(struct ap_config_info *new_config_info,
+			    struct ap_config_info *old_config_info)
+{
+	mutex_lock(&matrix_dev->lock);
+	memcpy(&matrix_dev->config_info, new_config_info,
+	       sizeof(struct ap_config_info));
+	memcpy(&matrix_dev->config_info_prev, old_config_info,
+	       sizeof(struct ap_config_info));
+
+	vfio_ap_mdev_on_cfg_remove();
+	vfio_ap_mdev_on_cfg_add();
+	mutex_unlock(&matrix_dev->lock);
+}
diff --git a/drivers/s390/crypto/vfio_ap_private.h b/drivers/s390/crypto/vfio_ap_private.h
index 15b7cd74843b..b99b68968447 100644
--- a/drivers/s390/crypto/vfio_ap_private.h
+++ b/drivers/s390/crypto/vfio_ap_private.h
@@ -29,7 +29,9 @@
  * ap_matrix_dev - the AP matrix device structure
  * @device:	generic device structure associated with the AP matrix device
  * @available_instances: number of mediated matrix devices that can be created
- * @info:	the struct containing the output from the PQAP(QCI) instruction
+ * @config_info: the current host AP configuration information
+ * @config_info_prev: the host AP configuration information from the previous
+ *		      configuration changed notification
  * mdev_list:	the list of mediated matrix devices created
  * lock:	mutex for locking the AP matrix device. This lock will be
  *		taken every time we fiddle with state managed by the vfio_ap
@@ -40,10 +42,14 @@
 struct ap_matrix_dev {
 	struct device device;
 	atomic_t available_instances;
-	struct ap_config_info info;
+	struct ap_config_info config_info;
+	struct ap_config_info config_info_prev;
 	struct list_head mdev_list;
 	struct mutex lock;
 	struct ap_driver  *vfio_ap_drv;
+	DECLARE_BITMAP(ap_add, AP_DEVICES);
+	DECLARE_BITMAP(aq_add, AP_DOMAINS);
+	DECLARE_BITMAP(ad_add, AP_DOMAINS);
 };
 
 extern struct ap_matrix_dev *matrix_dev;
@@ -109,4 +115,7 @@ void vfio_ap_mdev_remove_queue(struct ap_device *queue);
 
 int vfio_ap_mdev_resource_in_use(unsigned long *apm, unsigned long *aqm);
 
+void vfio_ap_on_cfg_changed(struct ap_config_info *new_config_info,
+			    struct ap_config_info *old_config_info);
+
 #endif /* _VFIO_AP_PRIVATE_H_ */
-- 
2.21.1


^ permalink raw reply	[flat|nested] 48+ messages in thread

* [PATCH v13 14/15] s390/vfio-ap: handle AP bus scan completed notification
  2020-12-23  1:15 [PATCH v13 00/15] s390/vfio-ap: dynamic configuration support Tony Krowiak
                   ` (12 preceding siblings ...)
  2020-12-23  1:16 ` [PATCH v13 13/15] s390/vfio-ap: handle host AP config change notification Tony Krowiak
@ 2020-12-23  1:16 ` Tony Krowiak
  2021-01-12 18:44   ` Halil Pasic
  2020-12-23  1:16 ` [PATCH v13 15/15] s390/vfio-ap: update docs to include dynamic config support Tony Krowiak
  2021-01-06 15:16 ` [PATCH v13 00/15] s390/vfio-ap: dynamic configuration support Tony Krowiak
  15 siblings, 1 reply; 48+ messages in thread
From: Tony Krowiak @ 2020-12-23  1:16 UTC (permalink / raw)
  To: linux-s390, linux-kernel, kvm
  Cc: freude, borntraeger, cohuck, mjrosato, pasic, alex.williamson,
	kwankhede, fiuczy, frankja, david, hca, gor, Tony Krowiak

Implements the driver callback invoked by the AP bus when the AP bus
scan has completed. Since this callback is invoked after binding the newly
added devices to their respective device drivers, the vfio_ap driver will
attempt to hot plug the adapters, domains and control domains into each
guest using the matrix mdev to which they are assigned. Keep in mind that
an adapter or domain can be plugged in only if:
* Each APQN derived from the newly added APID of the adapter and the APQIs
  already assigned to the guest's APCB references an AP queue device bound
  to the vfio_ap driver
* Each APQN derived from the newly added APQI of the domain and the APIDs
  already assigned to the guest's APCB references an AP queue device bound
  to the vfio_ap driver
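
The two conditions above amount to checking one row or one column of the
APQN matrix. A minimal sketch, assuming a small hypothetical ID space
(NR_IDS) and a plain boolean table recording which queues are bound to
vfio_ap; struct host_state, adapter_pluggable and domain_pluggable are
illustrative names, not functions from this series:

```c
#include <assert.h>
#include <stdbool.h>

#define NR_IDS 8	/* hypothetical; the AP architecture allows 256 */

/* bound[apid][apqi] is true if that queue device is bound to vfio_ap. */
struct host_state {
	bool bound[NR_IDS][NR_IDS];
};

/*
 * An adapter can be plugged in only if every APQN formed from its APID and
 * the APQIs already in the guest's APCB references a bound queue device.
 */
static bool adapter_pluggable(const struct host_state *hs,
			      const bool *guest_aqm, unsigned int apid)
{
	unsigned int apqi;

	assert(apid < NR_IDS);
	for (apqi = 0; apqi < NR_IDS; apqi++) {
		if (guest_aqm[apqi] && !hs->bound[apid][apqi])
			return false;
	}

	return true;
}

/*
 * A domain can be plugged in only if every APQN formed from its APQI and
 * the APIDs already in the guest's APCB references a bound queue device.
 */
static bool domain_pluggable(const struct host_state *hs,
			     const bool *guest_apm, unsigned int apqi)
{
	unsigned int apid;

	assert(apqi < NR_IDS);
	for (apid = 0; apid < NR_IDS; apid++) {
		if (guest_apm[apid] && !hs->bound[apid][apqi])
			return false;
	}

	return true;
}
```

Note that in this sketch an empty guest mask makes the check trivially true;
whether the driver treats that case specially is not shown here.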

Signed-off-by: Tony Krowiak <akrowiak@linux.ibm.com>
---
 drivers/s390/crypto/vfio_ap_drv.c     |  1 +
 drivers/s390/crypto/vfio_ap_ops.c     | 21 +++++++++++++++++++++
 drivers/s390/crypto/vfio_ap_private.h |  2 ++
 3 files changed, 24 insertions(+)

diff --git a/drivers/s390/crypto/vfio_ap_drv.c b/drivers/s390/crypto/vfio_ap_drv.c
index 2029d8392416..075495fc44c0 100644
--- a/drivers/s390/crypto/vfio_ap_drv.c
+++ b/drivers/s390/crypto/vfio_ap_drv.c
@@ -149,6 +149,7 @@ static int __init vfio_ap_init(void)
 	vfio_ap_drv.remove = vfio_ap_mdev_remove_queue;
 	vfio_ap_drv.in_use = vfio_ap_mdev_resource_in_use;
 	vfio_ap_drv.on_config_changed = vfio_ap_on_cfg_changed;
+	vfio_ap_drv.on_scan_complete = vfio_ap_on_scan_complete;
 	vfio_ap_drv.ids = ap_queue_ids;
 
 	ret = ap_driver_register(&vfio_ap_drv, THIS_MODULE, VFIO_AP_DRV_NAME);
diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c
index 8bbbd1dc7546..b8ed01297812 100644
--- a/drivers/s390/crypto/vfio_ap_ops.c
+++ b/drivers/s390/crypto/vfio_ap_ops.c
@@ -1592,3 +1592,24 @@ void vfio_ap_on_cfg_changed(struct ap_config_info *new_config_info,
 	vfio_ap_mdev_on_cfg_add();
 	mutex_unlock(&matrix_dev->lock);
 }
+
+void vfio_ap_on_scan_complete(struct ap_config_info *new_config_info,
+			      struct ap_config_info *old_config_info)
+{
+	struct ap_matrix_mdev *matrix_mdev;
+
+	mutex_lock(&matrix_dev->lock);
+	list_for_each_entry(matrix_mdev, &matrix_dev->mdev_list, node) {
+		if (bitmap_intersects(matrix_mdev->matrix.apm,
+				      matrix_dev->ap_add, AP_DEVICES) ||
+		    bitmap_intersects(matrix_mdev->matrix.aqm,
+				      matrix_dev->aq_add, AP_DOMAINS) ||
+		    bitmap_intersects(matrix_mdev->matrix.adm,
+				      matrix_dev->ad_add, AP_DOMAINS))
+			vfio_ap_mdev_refresh_apcb(matrix_mdev);
+	}
+
+	bitmap_clear(matrix_dev->ap_add, 0, AP_DEVICES);
+	bitmap_clear(matrix_dev->aq_add, 0, AP_DOMAINS);
+	mutex_unlock(&matrix_dev->lock);
+}
diff --git a/drivers/s390/crypto/vfio_ap_private.h b/drivers/s390/crypto/vfio_ap_private.h
index b99b68968447..7f0f7c92e686 100644
--- a/drivers/s390/crypto/vfio_ap_private.h
+++ b/drivers/s390/crypto/vfio_ap_private.h
@@ -117,5 +117,7 @@ int vfio_ap_mdev_resource_in_use(unsigned long *apm, unsigned long *aqm);
 
 void vfio_ap_on_cfg_changed(struct ap_config_info *new_config_info,
 			    struct ap_config_info *old_config_info);
+void vfio_ap_on_scan_complete(struct ap_config_info *new_config_info,
+			      struct ap_config_info *old_config_info);
 
 #endif /* _VFIO_AP_PRIVATE_H_ */
-- 
2.21.1


^ permalink raw reply	[flat|nested] 48+ messages in thread

* [PATCH v13 15/15] s390/vfio-ap: update docs to include dynamic config support
  2020-12-23  1:15 [PATCH v13 00/15] s390/vfio-ap: dynamic configuration support Tony Krowiak
                   ` (13 preceding siblings ...)
  2020-12-23  1:16 ` [PATCH v13 14/15] s390/vfio-ap: handle AP bus scan completed notification Tony Krowiak
@ 2020-12-23  1:16 ` Tony Krowiak
  2021-01-06 15:16 ` [PATCH v13 00/15] s390/vfio-ap: dynamic configuration support Tony Krowiak
  15 siblings, 0 replies; 48+ messages in thread
From: Tony Krowiak @ 2020-12-23  1:16 UTC (permalink / raw)
  To: linux-s390, linux-kernel, kvm
  Cc: freude, borntraeger, cohuck, mjrosato, pasic, alex.williamson,
	kwankhede, fiuczy, frankja, david, hca, gor, Tony Krowiak

Update the documentation in vfio-ap.rst to include information about the
AP dynamic configuration support (i.e., hot plug of adapters, domains
and control domains via the matrix mediated device's sysfs assignment
attributes).

Signed-off-by: Tony Krowiak <akrowiak@linux.ibm.com>
---
 Documentation/s390/vfio-ap.rst | 383 ++++++++++++++++++++++++---------
 1 file changed, 284 insertions(+), 99 deletions(-)

diff --git a/Documentation/s390/vfio-ap.rst b/Documentation/s390/vfio-ap.rst
index e15436599086..031c2e5ee138 100644
--- a/Documentation/s390/vfio-ap.rst
+++ b/Documentation/s390/vfio-ap.rst
@@ -123,9 +123,9 @@ Let's now take a look at how AP instructions executed on a guest are interpreted
 by the hardware.
 
 A satellite control block called the Crypto Control Block (CRYCB) is attached to
-our main hardware virtualization control block. The CRYCB contains three fields
-to identify the adapters, usage domains and control domains assigned to the KVM
-guest:
+our main hardware virtualization control block. The CRYCB contains an AP Control
+Block (APCB) that has three fields to identify the adapters, usage domains and
+control domains assigned to the KVM guest:
 
 * The AP Mask (APM) field is a bit mask that identifies the AP adapters assigned
   to the KVM guest. Each bit in the mask, from left to right (i.e. from most
@@ -192,7 +192,7 @@ The design introduces three new objects:
 
 1. AP matrix device
 2. VFIO AP device driver (vfio_ap.ko)
-3. VFIO AP mediated matrix pass-through device
+3. VFIO AP mediated pass-through device
 
 The VFIO AP device driver
 -------------------------
@@ -200,12 +200,13 @@ The VFIO AP (vfio_ap) device driver serves the following purposes:
 
 1. Provides the interfaces to secure APQNs for exclusive use of KVM guests.
 
-2. Sets up the VFIO mediated device interfaces to manage a mediated matrix
+2. Sets up the VFIO mediated device interfaces to manage a vfio_ap mediated
    device and creates the sysfs interfaces for assigning adapters, usage
    domains, and control domains comprising the matrix for a KVM guest.
 
-3. Configures the APM, AQM and ADM in the CRYCB referenced by a KVM guest's
-   SIE state description to grant the guest access to a matrix of AP devices
+3. Configures the APM, AQM and ADM in the APCB contained in the CRYCB referenced
+   by a KVM guest's SIE state description to grant the guest access to a matrix
+   of AP devices
 
 Reserve APQNs for exclusive use of KVM guests
 ---------------------------------------------
@@ -253,7 +254,7 @@ The process for reserving an AP queue for use by a KVM guest is:
 1. The administrator loads the vfio_ap device driver
 2. The vfio-ap driver during its initialization will register a single 'matrix'
    device with the device core. This will serve as the parent device for
-   all mediated matrix devices used to configure an AP matrix for a guest.
+   all vfio_ap mediated devices used to configure an AP matrix for a guest.
 3. The /sys/devices/vfio_ap/matrix device is created by the device core
 4. The vfio_ap device driver will register with the AP bus for AP queue devices
    of type 10 and higher (CEX4 and newer). The driver will provide the vfio_ap
@@ -269,7 +270,7 @@ The process for reserving an AP queue for use by a KVM guest is:
    default zcrypt cex4queue driver.
 8. The AP bus probes the vfio_ap device driver to bind the queues reserved for
    it.
-9. The administrator creates a passthrough type mediated matrix device to be
+9. The administrator creates a passthrough type vfio_ap mediated device to be
    used by a guest
 10. The administrator assigns the adapters, usage domains and control domains
     to be exclusively used by a guest.
@@ -279,14 +280,14 @@ Set up the VFIO mediated device interfaces
 The VFIO AP device driver utilizes the common interface of the VFIO mediated
 device core driver to:
 
-* Register an AP mediated bus driver to add a mediated matrix device to and
+* Register an AP mediated bus driver to add a vfio_ap mediated device to and
   remove it from a VFIO group.
-* Create and destroy a mediated matrix device
-* Add a mediated matrix device to and remove it from the AP mediated bus driver
-* Add a mediated matrix device to and remove it from an IOMMU group
+* Create and destroy a vfio_ap mediated device
+* Add a vfio_ap mediated device to and remove it from the AP mediated bus driver
+* Add a vfio_ap mediated device to and remove it from an IOMMU group
 
 The following high-level block diagram shows the main components and interfaces
-of the VFIO AP mediated matrix device driver::
+of the VFIO AP mediated device driver::
 
    +-------------+
    |             |
@@ -343,7 +344,7 @@ matrix device.
 	* device_api:
 	    the mediated device type's API
 	* available_instances:
-	    the number of mediated matrix passthrough devices
+	    the number of vfio_ap mediated passthrough devices
 	    that can be created
 	* device_api:
 	    specifies the VFIO API
@@ -351,29 +352,37 @@ matrix device.
     This attribute group identifies the user-defined sysfs attributes of the
     mediated device. When a device is registered with the VFIO mediated device
     framework, the sysfs attribute files identified in the 'mdev_attr_groups'
-    structure will be created in the mediated matrix device's directory. The
-    sysfs attributes for a mediated matrix device are:
+    structure will be created in the vfio_ap mediated device's directory. The
+    sysfs attributes for a vfio_ap mediated device are:
 
     assign_adapter / unassign_adapter:
       Write-only attributes for assigning/unassigning an AP adapter to/from the
-      mediated matrix device. To assign/unassign an adapter, the APID of the
+      vfio_ap mediated device. To assign/unassign an adapter, the APID of the
       adapter is echoed to the respective attribute file.
     assign_domain / unassign_domain:
       Write-only attributes for assigning/unassigning an AP usage domain to/from
-      the mediated matrix device. To assign/unassign a domain, the domain
+      the vfio_ap mediated device. To assign/unassign a domain, the domain
       number of the usage domain is echoed to the respective attribute
       file.
     matrix:
-      A read-only file for displaying the APQNs derived from the cross product
-      of the adapter and domain numbers assigned to the mediated matrix device.
+      A read-only file for displaying the APQNs derived from the Cartesian
+      product of the adapter and domain numbers assigned to the vfio_ap mediated
+      device.
+    guest_matrix:
+      A read-only file for displaying the APQNs derived from the Cartesian
+      product of the adapter and domain numbers assigned to the APM and AQM
+      fields respectively of the KVM guest's CRYCB. This may differ from the
+      APQNs assigned to the vfio_ap mediated device if any APQN does not
+      reference a queue device bound to the vfio_ap device driver (i.e., the
+      queue is not in the host's AP configuration).
     assign_control_domain / unassign_control_domain:
       Write-only attributes for assigning/unassigning an AP control domain
-      to/from the mediated matrix device. To assign/unassign a control domain,
+      to/from the vfio_ap mediated device. To assign/unassign a control domain,
       the ID of the domain to be assigned/unassigned is echoed to the respective
       attribute file.
     control_domains:
       A read-only file for displaying the control domain numbers assigned to the
-      mediated matrix device.
+      vfio_ap mediated device.
 
 * functions:
 
@@ -385,7 +394,7 @@ matrix device.
       domains assigned via the corresponding sysfs attributes files
 
   remove:
-    deallocates the mediated matrix device's ap_matrix_mdev structure. This will
+    deallocates the vfio_ap mediated device's ap_matrix_mdev structure. This will
     be allowed only if a running guest is not using the mdev.
 
 * callback interfaces
@@ -397,24 +406,44 @@ matrix device.
     for the mdev matrix device to the MDEV bus. Access to the KVM structure used
     to configure the KVM guest is provided via this callback. The KVM structure,
     is used to configure the guest's access to the AP matrix defined via the
-    mediated matrix device's sysfs attribute files.
+    vfio_ap mediated device's sysfs attribute files.
   release:
     unregisters the VFIO_GROUP_NOTIFY_SET_KVM notifier callback function for the
     mdev matrix device and deconfigures the guest's AP matrix.
 
-Configure the APM, AQM and ADM in the CRYCB
--------------------------------------------
-Configuring the AP matrix for a KVM guest will be performed when the
+Configure the guest's AP resources
+----------------------------------
+Configuring the AP resources for a KVM guest will be performed when the
 VFIO_GROUP_NOTIFY_SET_KVM notifier callback is invoked. The notifier
-function is called when QEMU connects to KVM. The guest's AP matrix is
-configured via it's CRYCB by:
+function is called when QEMU connects to KVM. The guest's AP resources are
+configured via its APCB by:
 
 * Setting the bits in the APM corresponding to the APIDs assigned to the
-  mediated matrix device via its 'assign_adapter' interface.
+  vfio_ap mediated device via its 'assign_adapter' interface.
 * Setting the bits in the AQM corresponding to the domains assigned to the
-  mediated matrix device via its 'assign_domain' interface.
+  vfio_ap mediated device via its 'assign_domain' interface.
 * Setting the bits in the ADM corresponding to the domain dIDs assigned to the
-  mediated matrix device via its 'assign_control_domains' interface.
+  vfio_ap mediated device via its 'assign_control_domain' interface.
+
+The Linux device model precludes passing a device through to a KVM guest unless
+it is bound to the device driver facilitating its pass-through. Consequently,
+an APQN that does not reference a queue device bound to the vfio_ap device
+driver will not be assigned to a KVM guest's matrix. The AP architecture,
+however, does not provide a means to filter individual APQNs from the guest's
+matrix, so the adapters, domains and control domains assigned to a vfio_ap
+mediated device via its sysfs 'assign_adapter', 'assign_domain' and
+'assign_control_domain' interfaces will be filtered before providing the AP
+configuration to a guest:
+
+* The APIDs of the adapters, the APQIs of the domains and the domain numbers of
+  the control domains assigned to the vfio_ap mdev that are not also assigned to
+  the host's AP configuration will be filtered out.
+
+* Each APQN derived from the Cartesian product of the APIDs and APQIs assigned
+  to the vfio_ap mdev is examined; if any APQN does not reference a queue
+  device bound to the vfio_ap device driver, the adapter it references will
+  not be plugged into the guest (i.e., the bit corresponding to its APID will
+  not be set in the APM of the guest's APCB).
 
 The CPU model features for AP
 -----------------------------
@@ -435,16 +464,20 @@ available to a KVM guest via the following CPU model features:
    can be made available to the guest only if it is available on the host (i.e.,
    facility bit 12 is set).
 
+4. apqi: Indicates AP queue interrupts are available on the guest. This facility
+   can be made available to the guest only if it is available on the host (i.e.,
+   facility bit 65 is set).
+
 Note: If the user chooses to specify a CPU model different than the 'host'
 model to QEMU, the CPU model features and facilities need to be turned on
 explicitly; for example::
 
-     /usr/bin/qemu-system-s390x ... -cpu z13,ap=on,apqci=on,apft=on
+     /usr/bin/qemu-system-s390x ... -cpu z13,ap=on,apqci=on,apft=on,apqi=on
 
 A guest can be precluded from using AP features/facilities by turning them off
 explicitly; for example::
 
-     /usr/bin/qemu-system-s390x ... -cpu host,ap=off,apqci=off,apft=off
+     /usr/bin/qemu-system-s390x ... -cpu host,ap=off,apqci=off,apft=off,apqi=off
 
 Note: If the APFT facility is turned off (apft=off) for the guest, the guest
 will not see any AP devices. The zcrypt device drivers that register for type 10
@@ -530,40 +563,56 @@ These are the steps:
 
 2. Secure the AP queues to be used by the three guests so that the host can not
    access them. To secure them, there are two sysfs files that specify
-   bitmasks marking a subset of the APQN range as 'usable by the default AP
-   queue device drivers' or 'not usable by the default device drivers' and thus
-   available for use by the vfio_ap device driver'. The location of the sysfs
-   files containing the masks are::
+   bitmasks marking a subset of the APQN range as usable only by the default AP
+   queue device drivers. All remaining APQNs are available for use by
+   any other device driver. The vfio_ap device driver is currently the only
+   non-default device driver. The locations of the sysfs files containing the
+   masks are::
 
      /sys/bus/ap/apmask
      /sys/bus/ap/aqmask
 
    The 'apmask' is a 256-bit mask that identifies a set of AP adapter IDs
-   (APID). Each bit in the mask, from left to right (i.e., from most significant
-   to least significant bit in big endian order), corresponds to an APID from
-   0-255. If a bit is set, the APID is marked as usable only by the default AP
-   queue device drivers; otherwise, the APID is usable by the vfio_ap
-   device driver.
+   (APID). Each bit in the mask, from left to right, corresponds to an APID from
+   0-255. If a bit is set, the APID belongs to the subset of APQNs marked as
+   available only to the default AP queue device drivers.
 
    The 'aqmask' is a 256-bit mask that identifies a set of AP queue indexes
-   (APQI). Each bit in the mask, from left to right (i.e., from most significant
-   to least significant bit in big endian order), corresponds to an APQI from
-   0-255. If a bit is set, the APQI is marked as usable only by the default AP
-   queue device drivers; otherwise, the APQI is usable by the vfio_ap device
-   driver.
+   (APQI). Each bit in the mask, from left to right, corresponds to an APQI from
+   0-255. If a bit is set, the APQI belongs to the subset of APQNs marked as
+   available only to the default AP queue device drivers.
+
+   The Cartesian product of the APIDs corresponding to the bits set in the
+   apmask and the APQIs corresponding to the bits set in the aqmask comprises
+   the subset of APQNs that can be used only by the host default device drivers.
+   All other APQNs are available to the non-default device drivers such as the
+   vfio_ap driver.
+
+   Take, for example, the following masks::
+
+      apmask:
+      0x7d00000000000000000000000000000000000000000000000000000000000000
 
-   Take, for example, the following mask::
+      aqmask:
+      0x8000000000000000000000000000000000000000000000000000000000000000
 
-      0x7dffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffff
+   The masks indicate:
 
-    It indicates:
+   * Adapters 1, 2, 3, 4, 5, and 7 are available for use by the host default
+     device drivers.
 
-      1, 2, 3, 4, 5, and 7-255 belong to the default drivers' pool, and 0 and 6
-      belong to the vfio_ap device driver's pool.
+   * Domain 0 is available for use by the host default device drivers.
+
+   * The subset of APQNs available for use only by the default host device
+     drivers is:
+
+     (1,0), (2,0), (3,0), (4,0), (5,0) and (7,0)
+
+   * All other APQNs are available for use by the non-default device drivers.
 
    The APQN of each AP queue device assigned to the linux host is checked by the
-   AP bus against the set of APQNs derived from the cross product of APIDs
-   and APQIs marked as usable only by the default AP queue device drivers. If a
+   AP bus against the set of APQNs derived from the Cartesian product of APIDs
+   and APQIs marked as available to the default AP queue device drivers. If a
    match is detected,  only the default AP queue device drivers will be probed;
    otherwise, the vfio_ap device driver will be probed.
 
@@ -627,11 +676,22 @@ These are the steps:
 	    default drivers pool:    adapter 0-15, domain 1
 	    alternate drivers pool:  adapter 16-255, domains 0, 2-255
 
+   Note ***:
+   Changing a mask such that one or more APQNs will be taken from a vfio_ap
+   mediated device (see below) will fail with an error (EBUSY). A message
+   is logged to the kernel ring buffer which can be viewed with the 'dmesg'
+   command. The output identifies each APQN flagged as 'in use' and identifies
+   the vfio_ap mediated device to which it is assigned; for example::
+
+      Userspace may not re-assign queue 05.0054 already assigned to 62177883-f1bb-47f0-914d-32a22e3a8804
+      Userspace may not re-assign queue 04.0054 already assigned to cef03c3c-903d-4ecc-9a83-40694cb8aee4
+
 Securing the APQNs for our example
 ----------------------------------
    To secure the AP queues 05.0004, 05.0047, 05.00ab, 05.00ff, 06.0004, 06.0047,
    06.00ab, and 06.00ff for use by the vfio_ap device driver, the corresponding
-   APQNs can either be removed from the default masks::
+   APQNs can be removed from the default masks using either of the following
+   commands::
 
       echo -5,-6 > /sys/bus/ap/apmask
 
@@ -684,7 +744,7 @@ Securing the APQNs for our example
 
      /sys/devices/vfio_ap/matrix/
      --- [mdev_supported_types]
-     ------ [vfio_ap-passthrough] (passthrough mediated matrix device type)
+     ------ [vfio_ap-passthrough] (passthrough vfio_ap mediated device type)
      --------- create
      --------- [devices]
 
@@ -735,6 +795,9 @@ Securing the APQNs for our example
      ----------------unassign_control_domain
      ----------------unassign_domain
 
+   Note *****: The vfio_ap mdevs do not persist across reboots unless the
+               mdevctl tool is used to create and persist them.
+
 4. The administrator now needs to configure the matrixes for the mediated
    devices $uuid1 (for Guest1), $uuid2 (for Guest2) and $uuid3 (for Guest3).
 
@@ -775,17 +838,23 @@ Securing the APQNs for our example
      higher than the maximum is specified, the operation will terminate with
      an error (ENODEV).
 
-   * All APQNs that can be derived from the adapter ID and the IDs of
-     the previously assigned domains must be bound to the vfio_ap device
-     driver. If no domains have yet been assigned, then there must be at least
-     one APQN with the specified APID bound to the vfio_ap driver. If no such
-     APQNs are bound to the driver, the operation will terminate with an
-     error (EADDRNOTAVAIL).
+   * Each APQN derived from the Cartesian product of the APID of the adapter
+     being assigned and the APQIs of the domains previously assigned:
 
-     No APQN that can be derived from the adapter ID and the IDs of the
-     previously assigned domains can be assigned to another mediated matrix
-     device. If an APQN is assigned to another mediated matrix device, the
-     operation will terminate with an error (EADDRINUSE).
+     - Must only be available to the vfio_ap device driver as specified in the
+       sysfs /sys/bus/ap/apmask and /sys/bus/ap/aqmask attribute files. If even
+       one APQN is reserved for use by the host device driver, the operation
+       will terminate with an error (EADDRNOTAVAIL).
+
+     - Must NOT be assigned to another vfio_ap mediated device. If even one APQN
+       is assigned to another vfio_ap mediated device, the operation will
+       terminate with an error (EBUSY).
+
+     - Must NOT be assigned while the sysfs /sys/bus/ap/apmask and
+       /sys/bus/ap/aqmask attribute files are being edited or the operation may
+       terminate with an error (EBUSY).
+
+       Must reference an AP queue device bound to the vfio_ap device driver.
 
    In order to successfully assign a domain:
 
@@ -794,41 +863,51 @@ Securing the APQNs for our example
      higher than the maximum is specified, the operation will terminate with
      an error (ENODEV).
 
-   * All APQNs that can be derived from the domain ID and the IDs of
-     the previously assigned adapters must be bound to the vfio_ap device
-     driver. If no domains have yet been assigned, then there must be at least
-     one APQN with the specified APQI bound to the vfio_ap driver. If no such
-     APQNs are bound to the driver, the operation will terminate with an
-     error (EADDRNOTAVAIL).
+   * Each APQN derived from the Cartesian product of the APQI of the domain
+     being assigned and the APIDs of the adapters previously assigned:
+
+     - Must only be available to the vfio_ap device driver as specified in the
+       sysfs /sys/bus/ap/apmask and /sys/bus/ap/aqmask attribute files. If even
+       one APQN is reserved for use by the host device driver, the operation
+       will terminate with an error (EADDRNOTAVAIL).
+
+     - Must NOT be assigned to another vfio_ap mediated device. If even one APQN
+       is assigned to another vfio_ap mediated device, the operation will
+       terminate with an error (EBUSY).
+
+     - Must NOT be assigned while the sysfs /sys/bus/ap/apmask and
+       /sys/bus/ap/aqmask attribute files are being edited or the operation may
+       terminate with an error (EBUSY).
 
-     No APQN that can be derived from the domain ID and the IDs of the
-     previously assigned adapters can be assigned to another mediated matrix
-     device. If an APQN is assigned to another mediated matrix device, the
-     operation will terminate with an error (EADDRINUSE).
+       Must reference an AP queue device bound to the vfio_ap device driver.
 
-   In order to successfully assign a control domain, the domain number
-   specified must represent a value from 0 up to the maximum domain number
-   configured for the system. If a control domain number higher than the maximum
-   is specified, the operation will terminate with an error (ENODEV).
+   In order to successfully assign a control domain:
+
+   * The domain number specified must represent a value from 0 up to the maximum
+     domain number configured for the system. If a control domain number higher
+     than the maximum is specified, the operation will terminate with an
+     error (ENODEV).
+
+   * The control domain must be assigned to the host's AP configuration.
 
 5. Start Guest1::
 
-     /usr/bin/qemu-system-s390x ... -cpu host,ap=on,apqci=on,apft=on \
+     /usr/bin/qemu-system-s390x ... -cpu host,ap=on,apqci=on,apft=on,apqi=on \
 	-device vfio-ap,sysfsdev=/sys/devices/vfio_ap/matrix/$uuid1 ...
 
 7. Start Guest2::
 
-     /usr/bin/qemu-system-s390x ... -cpu host,ap=on,apqci=on,apft=on \
+     /usr/bin/qemu-system-s390x ... -cpu host,ap=on,apqci=on,apft=on,apqi=on \
 	-device vfio-ap,sysfsdev=/sys/devices/vfio_ap/matrix/$uuid2 ...
 
 7. Start Guest3::
 
-     /usr/bin/qemu-system-s390x ... -cpu host,ap=on,apqci=on,apft=on \
+     /usr/bin/qemu-system-s390x ... -cpu host,ap=on,apqci=on,apft=on,apqi=on \
 	-device vfio-ap,sysfsdev=/sys/devices/vfio_ap/matrix/$uuid3 ...
 
-When the guest is shut down, the mediated matrix devices may be removed.
+When the guest is shut down, the vfio_ap mediated devices may be removed.
 
-Using our example again, to remove the mediated matrix device $uuid1::
+Using our example again, to remove the vfio_ap mediated device $uuid1::
 
    /sys/devices/vfio_ap/matrix/
       --- [mdev_supported_types]
@@ -844,23 +923,129 @@ Using our example again, to remove the mediated matrix device $uuid1::
 This will remove all of the mdev matrix device's sysfs structures including
 the mdev device itself. To recreate and reconfigure the mdev matrix device,
 all of the steps starting with step 3 will have to be performed again. Note
-that the remove will fail if a guest using the mdev is still running.
+that the remove will fail if a guest using the vfio_ap mdev is still running.
 
-It is not necessary to remove an mdev matrix device, but one may want to
+It is not necessary to remove a vfio_ap mdev, but one may want to
 remove it if no guest will use it during the remaining lifetime of the linux
-host. If the mdev matrix device is removed, one may want to also reconfigure
+host. If the vfio_ap mdev is removed, one may want to also reconfigure
 the pool of adapters and queues reserved for use by the default drivers.
 
+Hot plug support
+================
+An adapter, domain or control domain may be hot plugged into a running KVM
+guest by assigning it to the vfio_ap mediated device being used by the guest if
+the following conditions are met:
+
+* The adapter, domain or control domain must also be assigned to the host's
+  AP configuration.
+
+* To hot plug an adapter, each APQN derived from the Cartesian product of the
+  APID of the adapter being assigned and the APQIs of the domains already
+  assigned must reference a queue device bound to the vfio_ap device driver.
+
+* To hot plug a domain, each APQN derived from the Cartesian product of the
+  APQI of the domain being assigned and the APIDs of the adapters already
+  assigned must reference a queue device bound to the vfio_ap device driver.
+
+Over-provisioning of AP queues for a KVM guest
+==============================================
+Over-provisioning is defined herein as the assignment of adapters or domains to
+a vfio_ap mediated device that do not reference AP devices in the host's AP
+configuration. The idea here is that when the adapter or domain becomes
+available, it will be automatically hot-plugged into the KVM guest using
+the vfio_ap mediated device to which it is assigned as long as each new APQN
+resulting from plugging it in references a queue device bound to the vfio_ap
+device driver.
+
 Limitations
 ===========
-* The KVM/kernel interfaces do not provide a way to prevent restoring an APQN
-  to the default drivers pool of a queue that is still assigned to a mediated
-  device in use by a guest. It is incumbent upon the administrator to
-  ensure there is no mediated device in use by a guest to which the APQN is
-  assigned lest the host be given access to the private data of the AP queue
-  device such as a private key configured specifically for the guest.
+Live guest migration is not supported for guests using AP devices without
+intervention by a system administrator. Before a KVM guest can be migrated,
+the vfio_ap mediated device must be removed. Unfortunately, it can not be
+removed manually (i.e., echo 1 > /sys/devices/vfio_ap/matrix/$UUID/remove) while
+the mdev is in use by a KVM guest. If the guest is being emulated by QEMU,
+its mdev can be hot unplugged from the guest in one of two ways:
+
+1. If the KVM guest was started with libvirt, you can hot unplug the mdev via
+   either of the following commands:
+
+      virsh detach-device <guestname> <path-to-device-xml>
+
+      For example, to hot unplug mdev 62177883-f1bb-47f0-914d-32a22e3a8804 from
+      the guest named 'my-guest':
+
+         virsh detach-device my-guest ~/config/my-guest-hostdev.xml
+
+            The contents of my-guest-hostdev.xml:
+
+            <hostdev mode='subsystem' type='mdev' managed='no' model='vfio-ap'>
+              <source>
+                <address uuid='62177883-f1bb-47f0-914d-32a22e3a8804'/>
+              </source>
+            </hostdev>
+
+
+      virsh qemu-monitor-command <guest-name> --hmp "device_del <device-id>"
+
+      For example, to hot unplug the vfio_ap mediated device identified on the
+      qemu command line with 'id=hostdev0' from the guest named 'my-guest':
+
+         virsh qemu-monitor-command my-guest --hmp "device_del hostdev0"
+
+2. A vfio_ap mediated device can be hot unplugged by attaching the qemu monitor
+   to the guest and using the following qemu monitor command:
+
+      (QEMU) device_del id=<device-id>
+
+      For example, to hot unplug the vfio_ap mediated device that was specified
+      on the qemu command line with 'id=hostdev0' when the guest was started:
+
+         (QEMU) device_del id=hostdev0
+
+After live migration of the KVM guest completes, an AP configuration can be
+restored to the KVM guest by hot plugging a vfio_ap mediated device on the target
+system into the guest in one of two ways:
+
+1. If the KVM guest was started with libvirt, you can hot plug a vfio_ap mediated
+   device into the guest via either of the following virsh commands:
+
+   virsh attach-device <guestname> <path-to-device-xml>
+
+      For example, to hot plug mdev 62177883-f1bb-47f0-914d-32a22e3a8804 into
+      the guest named 'my-guest':
+
+         virsh attach-device my-guest ~/config/my-guest-hostdev.xml
+
+            The contents of my-guest-hostdev.xml:
+
+            <hostdev mode='subsystem' type='mdev' managed='no' model='vfio-ap'>
+              <source>
+                <address uuid='62177883-f1bb-47f0-914d-32a22e3a8804'/>
+              </source>
+            </hostdev>
+
+
+   virsh qemu-monitor-command <guest-name> --hmp \
+   "device_add vfio-ap,sysfsdev=<path-to-mdev>,id=<device-id>"
+
+      For example, to hot plug the vfio_ap mediated device
+      62177883-f1bb-47f0-914d-32a22e3a8804 into the guest named 'my-guest' with
+      device-id hostdev0:
+
+      virsh qemu-monitor-command my-guest --hmp \
+      "device_add vfio-ap,\
+      sysfsdev=/sys/devices/vfio_ap/matrix/62177883-f1bb-47f0-914d-32a22e3a8804,\
+      id=hostdev0"
+
+2. A vfio_ap mediated device can be hot plugged by attaching the qemu monitor
+   to the guest and using the following qemu monitor command:
+
+      (qemu) device_add "vfio-ap,sysfsdev=<path-to-mdev>,id=<device-id>"
 
-* Dynamically modifying the AP matrix for a running guest (which would amount to
-  hot(un)plug of AP devices for the guest) is currently not supported
+      For example, to plug the vfio_ap mediated device
+      62177883-f1bb-47f0-914d-32a22e3a8804 into the guest with the device-id
+      hostdev0:
 
-* Live guest migration is not supported for guests using AP devices.
+         (qemu) device_add "vfio-ap,\
+         sysfsdev=/sys/devices/vfio_ap/matrix/62177883-f1bb-47f0-914d-32a22e3a8804,\
+         id=hostdev0"
-- 
2.21.1
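
As a rough illustration of the apmask/aqmask example in the documentation
patch above, the following Python sketch decodes two 256-bit masks (leftmost
bit is ID 0) and derives the APQNs reserved for the default drivers as the
Cartesian product of the set APIDs and APQIs. The helper names mask_to_ids()
and reserved_apqns() are hypothetical and are not part of the kernel or any
tool; the mask strings are the ones used in the example, not values read from
a live system.

```python
from itertools import product


def mask_to_ids(mask_hex):
    """Return the IDs (0-255) whose bits are set in a 256-bit mask.

    Bit 0 is the leftmost (most significant) bit, matching the
    left-to-right numbering described for apmask and aqmask.
    """
    value = int(mask_hex, 16)
    return [i for i in range(256) if (value >> (255 - i)) & 1]


def reserved_apqns(apmask_hex, aqmask_hex):
    """Cartesian product of the APIDs and APQIs set in the two masks."""
    return list(product(mask_to_ids(apmask_hex), mask_to_ids(aqmask_hex)))


# Masks from the documentation example (hypothetical input, not read from sysfs).
apmask = "0x7d" + "00" * 31
aqmask = "0x80" + "00" * 31

print(reserved_apqns(apmask, aqmask))
# [(1, 0), (2, 0), (3, 0), (4, 0), (5, 0), (7, 0)]
```

Every APQN not in the printed list would, per the documentation, be available
to a non-default driver such as vfio_ap.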


^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v13 00/15] s390/vfio-ap: dynamic configuration support
  2020-12-23  1:15 [PATCH v13 00/15] s390/vfio-ap: dynamic configuration support Tony Krowiak
                   ` (14 preceding siblings ...)
  2020-12-23  1:16 ` [PATCH v13 15/15] s390/vfio-ap: update docs to include dynamic config support Tony Krowiak
@ 2021-01-06 15:16 ` Tony Krowiak
  2021-01-07 14:41   ` Halil Pasic
  15 siblings, 1 reply; 48+ messages in thread
From: Tony Krowiak @ 2021-01-06 15:16 UTC (permalink / raw)
  To: linux-s390, linux-kernel, kvm
  Cc: freude, borntraeger, cohuck, mjrosato, pasic, alex.williamson,
	kwankhede, fiuczy, frankja, david, hca, gor

Ping

On 12/22/20 8:15 PM, Tony Krowiak wrote:
> Note: Patch 1, s390/vfio-ap: clean up vfio_ap resources when KVM
>        pointer invalidated does not belong to this series. It has been
>        posted as a separate patch to fix a known problem. It is included
>        here because it will likely be a pre-req for this series.
>
> The current design for AP pass-through does not support making dynamic
> changes to the AP matrix of a running guest resulting in a few
> deficiencies this patch series is intended to mitigate:
>
> 1. Adapters, domains and control domains can not be added to or removed
>     from a running guest. In order to modify a guest's AP configuration,
>     the guest must be terminated; only then can AP resources be assigned
>     to or unassigned from the guest's matrix mdev. The new AP
>     configuration becomes available to the guest when it is subsequently
>     restarted.
>
> 2. The AP bus's /sys/bus/ap/apmask and /sys/bus/ap/aqmask interfaces can
>     be modified by a root user without any restrictions. A change to
>     either mask can result in AP queue devices being unbound from the
>     vfio_ap device driver and bound to a zcrypt device driver even if a
>     guest is using the queues, thus giving the host access to the guest's
>     private crypto data and vice versa.
>
> 3. The APQNs derived from the Cartesian product of the APIDs of the
>     adapters and APQIs of the domains assigned to a matrix mdev must
>     reference an AP queue device bound to the vfio_ap device driver. The
>     AP architecture allows assignment of AP resources that are not
>     available to the system, so this artificial restriction is not
>     compliant with the architecture.
>
> 4. The AP configuration profile can be dynamically changed for the linux
>     host after a KVM guest is started. For example, a new domain can be
>     dynamically added to the configuration profile via the SE or an HMC
>     connected to a DPM enabled lpar. Likewise, AP adapters can be
>     dynamically configured (online state) and deconfigured (standby state)
>     using the SE, an SCLP command or an HMC connected to a DPM enabled
>     lpar. This can result in inadvertent sharing of AP queues between the
>     guest and host.
>
> 5. A root user can manually unbind an AP queue device representing a
>     queue in use by a KVM guest via the vfio_ap device driver's sysfs
>     unbind attribute. In this case, the guest will be using a queue that
>     is not bound to the driver which violates the device model.
>
> This patch series introduces the following changes to the current design
> to alleviate the shortcomings described above as well as to implement
> more of the AP architecture:
>
> 1. A root user will be prevented from making edits to the AP bus's
>     /sys/bus/ap/apmask or /sys/bus/ap/aqmask if the change would transfer
>     ownership of an APQN from the vfio_ap device driver to a zcrypt driver
>     while the APQN is assigned to a matrix mdev.
>
> 2. Allow a root user to hot plug/unplug AP adapters, domains and control
>     domains for a KVM guest using the matrix mdev via its sysfs
>     assign/unassign attributes.
>
> 3. Allow assignment of an AP adapter or domain to a matrix mdev even if
>     it results in assignment of an APQN that does not reference an AP
>     queue device bound to the vfio_ap device driver, as long as the APQN
>     is not reserved for use by the default zcrypt drivers (also known as
>     over-provisioning of AP resources). Allowing over-provisioning of AP
>     resources better models the architecture which does not preclude
>     assigning AP resources that are not yet available in the system. Such
>     APQNs, however, will not be assigned to the guest using the matrix
>     mdev; only APQNs referencing AP queue devices bound to the vfio_ap
>     device driver will actually get assigned to the guest.
>
> 4. Handle dynamic changes to the AP device model.
>
> 1. Rationale for changes to AP bus's apmask/aqmask interfaces:
> ----------------------------------------------------------
> Due to the extremely sensitive nature of cryptographic data, it is
> imperative that great care be taken to ensure that such data is secured.
> Preventing a root user from configuring these masks, either inadvertently
> or maliciously, such that a queue is shared between the host and a guest
> is not only feasible, it is advisable. It was suggested that this scenario
> is better handled in user space with management software, but that does
> not preclude a malicious administrator from using the sysfs interfaces
> to gain access to a guest's crypto data. It was also suggested that this
> scenario could be avoided by taking access to the adapter away from the
> guest and zeroing out the queues prior to the vfio_ap driver releasing the
> device; however, stealing an adapter in use from a guest as a by-product
> of an operation is bad and will likely cause problems for the guest
> unnecessarily. It was decided that the most effective solution with the
> least number of negative side effects is to prevent the situation at the
> source.
>
> 2. Rationale for hot plug/unplug using matrix mdev sysfs interfaces:
> ----------------------------------------------------------------
> Allowing a user to hot plug/unplug AP resources using the matrix mdev
> sysfs interfaces circumvents the need to terminate the guest in order to
> modify its AP configuration. Allowing dynamic configuration makes
> reconfiguring a guest's AP matrix much less disruptive.
>
> 3. Rationale for allowing over-provisioning of AP resources:
> -----------------------------------------------------------
> Allowing assignment of AP resources to a matrix mdev and ultimately to a
> guest better models the AP architecture. The architecture does not
> preclude assignment of unavailable AP resources. If a queue subsequently
> becomes available while a guest is using the matrix mdev to which its
> APQN is assigned, the guest will be given access to it. If an APQN
> is dynamically unassigned from the underlying host system, it will
> automatically become unavailable to the guest.
>
> Change log v12-v13:
> ------------------
> * Combined patches 12/13 from previous series into one patch
>
> * Moved all changes for linking queues and mdevs into a single patch
>
> * Re-ordered some patches to aid in review
>
> * Using mutex_trylock() function in adapter/domain assignment functions
>    to avoid potential deadlock condition with in_use callback
>
> * Using filtering function for refreshing the guest's APCB for all events
>    that change the APCB: assign/unassign adapters, domains, control domains;
>    bind/unbind of queue devices; and, changes to the host AP configuration.
>
> Change log v11-v12:
> ------------------
> * Moved matrix device lock to protect group notifier callback
>
> * Split the 'No need to disable IRQ after queue reset' patch into
>    multiple patches for easier review (move probe/remove callback
>    functions and remove disable IRQ after queue reset)
>
> * Added code to decrement reference count for KVM in group notifier
>    callback
>
> * Using mutex_trylock() in functions implementing the sysfs assign_adapter
>    and assign_domain as well as the in_use callback to avoid deadlock
>    between the AP bus's ap_perms mutex and the matrix device lock used by
>    vfio_ap driver.
>
> * The sysfs guest_matrix attribute of the vfio_ap mdev will now display
>    the shadow APCB regardless of whether a guest is using the mdev or not
>
> * Replaced vfio_ap mdev filtering function with a function that initializes
>    the guest's APCB by filtering the vfio_ap mdev by APID.
>
> * No longer using filtering function during adapter/domain assignment
>    to/from the vfio_ap mdev; replaced with new hot plug/unplug
>    adapter/domain functions.
>
> * No longer using filtering function during bind/unbind; replaced with
>    hot plug/unplug queue functions.
>
> * No longer using filtering function for bulk assignment of new adapters
>    and domains in on_scan_complete callback; replaced with new hot plug
>    functions.
>    
>
> Change log v10-v11:
> ------------------
> * The matrix mdev's configuration is now filtered by APID so that if any
>    APQN assigned to the mdev is not bound to the vfio_ap device driver,
>    the adapter will not get plugged into the KVM guest on startup, or when
>    a new adapter is assigned to the mdev.
>
> * Replaced patch 8 by squashing patches 8 (filtering patch) and 15 (handle
>    probe/remove).
>
> * Added a patch 1 to remove disable IRQ after a reset because the reset
>    already disables a queue.
>
> * Now using filtering code to update the KVM guest's matrix when
>    notified that AP bus scan has completed.
>
> * Fixed issue with probe/remove not initiated by a configuration change
>    occurring within a config change.
>
>
> Change log v9-v10:
> -----------------
> * Updated the documentation in vfio-ap.rst to include information about the
>    AP dynamic configuration support
>
> Change log v8-v9:
> ----------------
> * Fixed errors flagged by the kernel test robot
>
> * Fixed issue with guest losing queues when a new queue is probed due to
>    manual bind operation.
>
> Change log v7-v8:
> ----------------
> * Now logging a message when an attempt to reserve APQNs for the zcrypt
>    drivers will result in taking a queue away from a KVM guest to provide
>    the sysadmin a way to ascertain why the sysfs operation failed.
>
> * Created locked and unlocked versions of the ap_parse_mask_str() function.
>
> * Now using new interface provided by an AP bus patch -
>    s390/ap: introduce new ap function ap_get_qdev() - to retrieve
>    struct ap_queue representing an AP queue device. This patch is not a
>    part of this series but is a prerequisite for this series.
>
> Change log v6-v7:
> ----------------
> * Added callbacks to AP bus:
>    - on_config_changed: Notifies implementing drivers that
>      the AP configuration has changed since last AP device scan.
>    - on_scan_complete: Notifies implementing drivers that the device scan
>      has completed.
>    - implemented on_config_changed and on_scan_complete callbacks for
>      vfio_ap device driver.
>    - updated vfio_ap device driver's probe and remove callbacks to handle
>      dynamic changes to the AP device model.
> * Added code to filter APQNs when assigning AP resources to a KVM guest's
>    CRYCB
>
> Change log v5-v6:
> ----------------
> * Fixed a bug in ap_bus.c introduced with patch 2/7 of the v5
>    series. Harald Freudenberger pointed out that the mutex lock
>    for ap_perms_mutex in the apmask_store and aqmask_store functions
>    was not being released.
>
> * Removed patch 6/7 which added logging to the vfio_ap driver
>    to expedite acceptance of this series. The logging will be introduced
>    with a separate patch series to allow more time to explore options
>    such as DBF logging vs. tracepoints.
>
> * Added 3 patches related to ensuring that APQNs that do not reference
>    AP queue devices bound to the vfio_ap device driver are not assigned
>    to the guest CRYCB:
>
>    Patch 4: Filter CRYCB bits for unavailable queue devices
>    Patch 5: sysfs attribute to display the guest CRYCB
>    Patch 6: update guest CRYCB in vfio_ap probe and remove callbacks
>
> * Added a patch (Patch 9) to version the vfio_ap module.
>
> * Reshuffled patches to allow the in_use callback implementation to
>    invoke the vfio_ap_mdev_verify_no_sharing() function introduced in
>    patch 2.
>
> Change log v4-v5:
> ----------------
> * Added a patch to provide kernel s390dbf debug logs for VFIO AP
>
> Change log v3->v4:
> -----------------
> * Restored patches preventing root user from changing ownership of
>    APQNs from zcrypt drivers to the vfio_ap driver if the APQN is
>    assigned to an mdev.
>
> * No longer enforcing requirement restricting guest access to
>    queues represented by a queue device bound to the vfio_ap
>    device driver.
>
> * Removed shadow CRYCB and now directly updating the guest CRYCB
>    from the matrix mdev's matrix.
>
> * Rebased the patch series on top of 'vfio: ap: AP Queue Interrupt
>    Control' patches.
>
> * Disabled bind/unbind sysfs interfaces for vfio_ap driver
>
> Change log v2->v3:
> -----------------
> * Allow guest access to an AP queue only if the queue is bound to
>    the vfio_ap device driver.
>
> * Removed the patch to test CRYCB masks before taking the vCPUs
>    out of SIE. Now checking the shadow CRYCB in the vfio_ap driver.
>
> Change log v1->v2:
> -----------------
> * Removed patches preventing root user from unbinding AP queues from
>    the vfio_ap device driver
> * Introduced a shadow CRYCB in the vfio_ap driver to manage dynamic
>    changes to the AP guest configuration due to root user interventions
>    or hardware anomalies.
>
> Tony Krowiak (15):
>    s390/vfio-ap: clean up vfio_ap resources when KVM pointer invalidated
>    s390/vfio-ap: No need to disable IRQ after queue reset
>    s390/vfio-ap: move probe and remove callbacks to vfio_ap_ops.c
>    s390/vfio-ap: use new AP bus interface to search for queue devices
>    s390/vfio-ap: manage link between queue struct and matrix mdev
>    s390/vfio-ap: allow assignment of unavailable AP queues to mdev device
>    s390/vfio-ap: introduce shadow APCB
>    s390/vfio-ap: sysfs attribute to display the guest's matrix
>    s390/vfio-ap: allow hot plug/unplug of AP resources using mdev device
>    s390/zcrypt: driver callback to indicate resource in use
>    s390/vfio-ap: implement in-use callback for vfio_ap driver
>    s390/zcrypt: Notify driver on config changed and scan complete
>      callbacks
>    s390/vfio-ap: handle host AP config change notification
>    s390/vfio-ap: handle AP bus scan completed notification
>    s390/vfio-ap: update docs to include dynamic config support
>
>   Documentation/s390/vfio-ap.rst        | 383 ++++++++---
>   drivers/s390/crypto/ap_bus.c          | 251 +++++++-
>   drivers/s390/crypto/ap_bus.h          |  16 +
>   drivers/s390/crypto/vfio_ap_drv.c     |  50 +-
>   drivers/s390/crypto/vfio_ap_ops.c     | 891 +++++++++++++++++---------
>   drivers/s390/crypto/vfio_ap_private.h |  29 +-
>   6 files changed, 1170 insertions(+), 450 deletions(-)
>


^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v13 00/15] s390/vfio-ap: dynamic configuration support
  2021-01-06 15:16 ` [PATCH v13 00/15] s390/vfio-ap: dynamic configuration support Tony Krowiak
@ 2021-01-07 14:41   ` Halil Pasic
  0 siblings, 0 replies; 48+ messages in thread
From: Halil Pasic @ 2021-01-07 14:41 UTC (permalink / raw)
  To: Tony Krowiak
  Cc: linux-s390, linux-kernel, kvm, freude, borntraeger, cohuck,
	mjrosato, alex.williamson, kwankhede, fiuczy, frankja, david,
	hca, gor

On Wed, 6 Jan 2021 10:16:24 -0500
Tony Krowiak <akrowiak@linux.ibm.com> wrote:

> Ping
> 

pong

Will try to have a look in the next few days...

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v13 02/15] s390/vfio-ap: No need to disable IRQ after queue reset
  2020-12-23  1:15 ` [PATCH v13 02/15] s390/vfio-ap: No need to disable IRQ after queue reset Tony Krowiak
@ 2021-01-11 16:32   ` Halil Pasic
  2021-01-13 17:06     ` Tony Krowiak
  0 siblings, 1 reply; 48+ messages in thread
From: Halil Pasic @ 2021-01-11 16:32 UTC (permalink / raw)
  To: Tony Krowiak
  Cc: linux-s390, linux-kernel, kvm, freude, borntraeger, cohuck,
	mjrosato, alex.williamson, kwankhede, fiuczy, frankja, david,
	hca, gor

On Tue, 22 Dec 2020 20:15:53 -0500
Tony Krowiak <akrowiak@linux.ibm.com> wrote:

> The queues assigned to a matrix mediated device are currently reset when:
> 
> * The VFIO_DEVICE_RESET ioctl is invoked
> * The mdev fd is closed by userspace (QEMU)
> * The mdev is removed from sysfs.
> 
> Immediately after the reset of a queue, a call is made to disable
> interrupts for the queue. This is entirely unnecessary because the reset of
> a queue disables interrupts, so this will be removed.
> 
> Signed-off-by: Tony Krowiak <akrowiak@linux.ibm.com>
> ---
>  drivers/s390/crypto/vfio_ap_drv.c     |  1 -
>  drivers/s390/crypto/vfio_ap_ops.c     | 40 +++++++++++++++++----------
>  drivers/s390/crypto/vfio_ap_private.h |  1 -
>  3 files changed, 26 insertions(+), 16 deletions(-)
> 
> diff --git a/drivers/s390/crypto/vfio_ap_drv.c b/drivers/s390/crypto/vfio_ap_drv.c
> index be2520cc010b..ca18c91afec9 100644
> --- a/drivers/s390/crypto/vfio_ap_drv.c
> +++ b/drivers/s390/crypto/vfio_ap_drv.c
> @@ -79,7 +79,6 @@ static void vfio_ap_queue_dev_remove(struct ap_device *apdev)
>  	apid = AP_QID_CARD(q->apqn);
>  	apqi = AP_QID_QUEUE(q->apqn);
>  	vfio_ap_mdev_reset_queue(apid, apqi, 1);
> -	vfio_ap_irq_disable(q);
>  	kfree(q);
>  	mutex_unlock(&matrix_dev->lock);
>  }
> diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c
> index 7339043906cf..052f61391ec7 100644
> --- a/drivers/s390/crypto/vfio_ap_ops.c
> +++ b/drivers/s390/crypto/vfio_ap_ops.c
> @@ -25,6 +25,7 @@
>  #define VFIO_AP_MDEV_NAME_HWVIRT "VFIO AP Passthrough Device"
>  
>  static int vfio_ap_mdev_reset_queues(struct mdev_device *mdev);
> +static struct vfio_ap_queue *vfio_ap_find_queue(int apqn);
>  
>  static int match_apqn(struct device *dev, const void *data)
>  {
> @@ -49,20 +50,15 @@ static struct vfio_ap_queue *vfio_ap_get_queue(
>  					int apqn)
>  {
>  	struct vfio_ap_queue *q;
> -	struct device *dev;
>  
>  	if (!test_bit_inv(AP_QID_CARD(apqn), matrix_mdev->matrix.apm))
>  		return NULL;
>  	if (!test_bit_inv(AP_QID_QUEUE(apqn), matrix_mdev->matrix.aqm))
>  		return NULL;
>  
> -	dev = driver_find_device(&matrix_dev->vfio_ap_drv->driver, NULL,
> -				 &apqn, match_apqn);
> -	if (!dev)
> -		return NULL;
> -	q = dev_get_drvdata(dev);
> -	q->matrix_mdev = matrix_mdev;
> -	put_device(dev);
> +	q = vfio_ap_find_queue(apqn);
> +	if (q)
> +		q->matrix_mdev = matrix_mdev;
>  
>  	return q;
>  }
> @@ -1126,24 +1122,27 @@ static int vfio_ap_mdev_group_notifier(struct notifier_block *nb,
>  	return notify_rc;
>  }
>  
> -static void vfio_ap_irq_disable_apqn(int apqn)
> +static struct vfio_ap_queue *vfio_ap_find_queue(int apqn)
>  {
>  	struct device *dev;
> -	struct vfio_ap_queue *q;
> +	struct vfio_ap_queue *q = NULL;
>  
>  	dev = driver_find_device(&matrix_dev->vfio_ap_drv->driver, NULL,
>  				 &apqn, match_apqn);
>  	if (dev) {
>  		q = dev_get_drvdata(dev);
> -		vfio_ap_irq_disable(q);
>  		put_device(dev);
>  	}
> +
> +	return q;
>  }

This hunk and the previous one are a rewrite of vfio_ap_get_queue() and
have next to nothing to do with the patch's objective. If we were at an
earlier stage, I would ask to split it up.

>  
>  int vfio_ap_mdev_reset_queue(unsigned int apid, unsigned int apqi,
>  			     unsigned int retry)
>  {
>  	struct ap_queue_status status;
> +	struct vfio_ap_queue *q;
> +	int ret;
>  	int retry2 = 2;
>  	int apqn = AP_MKQID(apid, apqi);
>  
> @@ -1156,18 +1155,32 @@ int vfio_ap_mdev_reset_queue(unsigned int apid, unsigned int apqi,
>  				status = ap_tapq(apqn, NULL);
>  			}
>  			WARN_ON_ONCE(retry2 <= 0);
> -			return 0;
> +			ret = 0;
> +			goto free_aqic_resources;
>  		case AP_RESPONSE_RESET_IN_PROGRESS:
>  		case AP_RESPONSE_BUSY:
>  			msleep(20);
>  			break;
>  		default:
>  			/* things are really broken, give up */
> -			return -EIO;
> +			ret = -EIO;
> +			goto free_aqic_resources;

Do we really want the unpin here? I mean the reset did not work and
we are giving up. So the irqs are potentially still enabled.

Without this patch we try to disable the interrupts using AQIC, and
do the cleanup after that.

I'm aware, the comment says we should not take the default branch,
but if that's really the case we should IMHO log an error and leak the
page.

It's up to you if you want to change this. I don't want to delay the
series any further than absolutely necessary.

Acked-by: Halil Pasic <pasic@linux.ibm.com>
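
To make the alternative raised above concrete, here is a minimal userspace
sketch of the reset loop in which the give-up branch logs an error and
deliberately keeps the interrupt resources instead of freeing them. It is
not the driver code: the response codes, struct reset_sim and its helpers
are hypothetical stand-ins, and the 20 ms sleep is only mentioned in a
comment.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical response codes, standing in for the AP_RESPONSE_* values. */
enum resp_sim { RESP_NORMAL, RESP_BUSY, RESP_RESET_IN_PROGRESS, RESP_BROKEN };

/* Hypothetical queue state; irq_resources_held models the pinned page. */
struct reset_sim {
	const enum resp_sim *script;	/* responses of successive reset calls */
	int pos;
	bool irq_resources_held;
};

static enum resp_sim sim_reset(struct reset_sim *q)
{
	return q->script[q->pos++];
}

static int reset_queue_sketch(struct reset_sim *q, unsigned int retry)
{
	do {
		switch (sim_reset(q)) {
		case RESP_NORMAL:
			/* Reset worked, so interrupts are off: safe to free. */
			q->irq_resources_held = false;
			return 0;
		case RESP_BUSY:
		case RESP_RESET_IN_PROGRESS:
			break;	/* the driver sleeps 20 ms here, then retries */
		default:
			/*
			 * Reset failed, interrupts may still be enabled:
			 * log it and keep (leak) the resources on purpose.
			 */
			fprintf(stderr, "reset failed, leaking irq resources\n");
			return -EIO;
		}
	} while (retry--);

	return -EBUSY;
}
```

With a script of { RESP_BUSY, RESP_NORMAL } and retry = 1 the sketch
returns 0 and drops the resources; with { RESP_BROKEN } it returns -EIO
and leaves irq_resources_held set.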

>  		}
>  	} while (retry--);
>  
>  	return -EBUSY;
> +
> +free_aqic_resources:
> +	/*
> +	 * In order to free the aqic resources, the queue must be linked to
> +	 * the matrix_mdev to which its APQN is assigned and the KVM pointer
> +	 * must be available.
> +	 */
> +	q = vfio_ap_find_queue(apqn);
> +	if (q && q->matrix_mdev && q->matrix_mdev->kvm)

Is this a case of "we know there are no aqic resources to be freed" if
the precondition is false?

vfio_ap_free_aqic_resources() checks the matrix_mdev pointer but not the
kvm pointer. Could we just check the kvm pointer in
vfio_ap_free_aqic_resources()?

At the end of the series, does seeing !q indicate a bug, or is it
something we expect to see under certain circumstances?


> +		vfio_ap_free_aqic_resources(q);
> +
> +	return ret;
>  }
>  
>  static int vfio_ap_mdev_reset_queues(struct mdev_device *mdev)
> @@ -1189,7 +1202,6 @@ static int vfio_ap_mdev_reset_queues(struct mdev_device *mdev)
>  			 */
>  			if (ret)
>  				rc = ret;
> -			vfio_ap_irq_disable_apqn(AP_MKQID(apid, apqi));
>  		}
>  	}
>  
> diff --git a/drivers/s390/crypto/vfio_ap_private.h b/drivers/s390/crypto/vfio_ap_private.h
> index f46dde56b464..0db6fb3d56d5 100644
> --- a/drivers/s390/crypto/vfio_ap_private.h
> +++ b/drivers/s390/crypto/vfio_ap_private.h
> @@ -100,5 +100,4 @@ struct vfio_ap_queue {
>  #define VFIO_AP_ISC_INVALID 0xff
>  	unsigned char saved_isc;
>  };
> -struct ap_queue_status vfio_ap_irq_disable(struct vfio_ap_queue *q);
>  #endif /* _VFIO_AP_PRIVATE_H_ */


^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v13 05/15] s390/vfio-ap: manage link between queue struct and matrix mdev
  2020-12-23  1:15 ` [PATCH v13 05/15] s390/vfio-ap: manage link between queue struct and matrix mdev Tony Krowiak
@ 2021-01-11 19:17   ` Halil Pasic
  2021-01-13 21:41     ` Tony Krowiak
  0 siblings, 1 reply; 48+ messages in thread
From: Halil Pasic @ 2021-01-11 19:17 UTC (permalink / raw)
  To: Tony Krowiak
  Cc: linux-s390, linux-kernel, kvm, freude, borntraeger, cohuck,
	mjrosato, alex.williamson, kwankhede, fiuczy, frankja, david,
	hca, gor

On Tue, 22 Dec 2020 20:15:56 -0500
Tony Krowiak <akrowiak@linux.ibm.com> wrote:

> Let's create links between each queue device bound to the vfio_ap device
> driver and the matrix mdev to which the queue's APQN is assigned. The idea
> is to facilitate efficient retrieval of the objects representing the queue
> devices and matrix mdevs as well as to verify that a queue assigned to
> a matrix mdev is bound to the driver.
> 
> The links will be created as follows:
> 
>    * When the queue device is probed, if its APQN is assigned to a matrix
>      mdev, the structures representing the queue device and the matrix mdev
>      will be linked.
> 
>    * When an adapter or domain is assigned to a matrix mdev, for each new
>      APQN assigned that references a queue device bound to the vfio_ap
>      device driver, the structures representing the queue device and the
>      matrix mdev will be linked.
> 
> The links will be removed as follows:
> 
>    * When the queue device is removed, if its APQN is assigned to a matrix
>      mdev, the structures representing the queue device and the matrix mdev
>      will be unlinked.
> 
>    * When an adapter or domain is unassigned from a matrix mdev, for each
>      APQN unassigned that references a queue device bound to the vfio_ap
>      device driver, the structures representing the queue device and the
>      matrix mdev will be unlinked.
> 
> Signed-off-by: Tony Krowiak <akrowiak@linux.ibm.com>

Reviewed-by: Halil Pasic <pasic@linux.ibm.com>
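
For readers following along, the link/unlink/lookup scheme described in
the commit message can be sketched in plain userspace C as below. This is
only an illustration under stated assumptions: struct queue_link and
struct mdev_link are hypothetical stand-ins for vfio_ap_queue and
ap_matrix_mdev, the bucket function is a toy hash, and the "already
linked" / "not linked" guards are additions of the sketch, not something
the patch does. The kernel code uses the <linux/hashtable.h> helpers and
runs under matrix_dev->lock.

```c
#include <stddef.h>

#define QTABLE_BITS 8
#define QTABLE_SIZE (1u << QTABLE_BITS)

struct mdev_link;

/* Hypothetical stand-in for struct vfio_ap_queue. */
struct queue_link {
	int apqn;
	struct mdev_link *mdev;		/* back-link, NULL while unlinked */
	struct queue_link *next;	/* bucket chain */
};

/* Hypothetical stand-in for struct ap_matrix_mdev. */
struct mdev_link {
	struct queue_link *qtable[QTABLE_SIZE];
};

static unsigned int bucket(int apqn)
{
	return (unsigned int)apqn & (QTABLE_SIZE - 1);	/* toy hash */
}

/* Link: set the back-pointer and add the queue to the mdev's table. */
static void link_queue(struct mdev_link *m, struct queue_link *q)
{
	if (!q || q->mdev)
		return;
	q->mdev = m;
	q->next = m->qtable[bucket(q->apqn)];
	m->qtable[bucket(q->apqn)] = q;
}

/* Unlink: remove the queue from the table and clear the back-pointer. */
static void unlink_queue(struct queue_link *q)
{
	struct queue_link **pp;

	if (!q || !q->mdev)
		return;
	for (pp = &q->mdev->qtable[bucket(q->apqn)]; *pp; pp = &(*pp)->next) {
		if (*pp == q) {
			*pp = q->next;
			break;
		}
	}
	q->mdev = NULL;
	q->next = NULL;
}

/* Lookup by APQN: walk only the one bucket the APQN hashes to. */
static struct queue_link *get_queue(struct mdev_link *m, int apqn)
{
	struct queue_link *q;

	for (q = m->qtable[bucket(apqn)]; q; q = q->next)
		if (q->apqn == apqn)
			return q;
	return NULL;
}
```

The point of the table is that a lookup such as the one in handle_pqap()
no longer needs to search the driver's device list; it only walks the
bucket for the given APQN, and a miss means the queue is not linked to
that mdev.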

> ---
>  drivers/s390/crypto/vfio_ap_ops.c     | 140 +++++++++++++++++++++-----
>  drivers/s390/crypto/vfio_ap_private.h |   3 +
>  2 files changed, 117 insertions(+), 26 deletions(-)
> 
> diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c
> index 835c963ae16d..cdcc6378b4a5 100644
> --- a/drivers/s390/crypto/vfio_ap_ops.c
> +++ b/drivers/s390/crypto/vfio_ap_ops.c
> @@ -27,33 +27,17 @@
>  static int vfio_ap_mdev_reset_queues(struct mdev_device *mdev);
>  static struct vfio_ap_queue *vfio_ap_find_queue(int apqn);
>  
> -/**
> - * vfio_ap_get_queue: Retrieve a queue with a specific APQN from a list
> - * @matrix_mdev: the associated mediated matrix
> - * @apqn: The queue APQN
> - *
> - * Retrieve a queue with a specific APQN from the list of the
> - * devices of the vfio_ap_drv.
> - * Verify that the APID and the APQI are set in the matrix.
> - *
> - * Returns the pointer to the associated vfio_ap_queue
> - */
> -static struct vfio_ap_queue *vfio_ap_get_queue(
> -					struct ap_matrix_mdev *matrix_mdev,
> -					int apqn)
> +static struct vfio_ap_queue *
> +vfio_ap_mdev_get_queue(struct ap_matrix_mdev *matrix_mdev, unsigned long apqn)
>  {
> -	struct vfio_ap_queue *q = NULL;
> -
> -	if (!test_bit_inv(AP_QID_CARD(apqn), matrix_mdev->matrix.apm))
> -		return NULL;
> -	if (!test_bit_inv(AP_QID_QUEUE(apqn), matrix_mdev->matrix.aqm))
> -		return NULL;
> +	struct vfio_ap_queue *q;
>  
> -	q = vfio_ap_find_queue(apqn);
> -	if (q)
> -		q->matrix_mdev = matrix_mdev;
> +	hash_for_each_possible(matrix_mdev->qtable, q, mdev_qnode, apqn) {
> +		if (q && (q->apqn == apqn))
> +			return q;
> +	}
>  
> -	return q;
> +	return NULL;
>  }
>  
>  /**
> @@ -166,7 +150,6 @@ static struct ap_queue_status vfio_ap_irq_disable(struct vfio_ap_queue *q)
>  		  status.response_code);
>  end_free:
>  	vfio_ap_free_aqic_resources(q);
> -	q->matrix_mdev = NULL;
>  	return status;
>  }
>  
> @@ -282,7 +265,7 @@ static int handle_pqap(struct kvm_vcpu *vcpu)
>  	matrix_mdev = container_of(vcpu->kvm->arch.crypto.pqap_hook,
>  				   struct ap_matrix_mdev, pqap_hook);
>  
> -	q = vfio_ap_get_queue(matrix_mdev, apqn);
> +	q = vfio_ap_mdev_get_queue(matrix_mdev, apqn);
>  	if (!q)
>  		goto out_unlock;
>  
> @@ -325,6 +308,7 @@ static int vfio_ap_mdev_create(struct kobject *kobj, struct mdev_device *mdev)
>  
>  	matrix_mdev->mdev = mdev;
>  	vfio_ap_matrix_init(&matrix_dev->info, &matrix_mdev->matrix);
> +	hash_init(matrix_mdev->qtable);
>  	mdev_set_drvdata(mdev, matrix_mdev);
>  	matrix_mdev->pqap_hook.hook = handle_pqap;
>  	matrix_mdev->pqap_hook.owner = THIS_MODULE;
> @@ -553,6 +537,50 @@ static int vfio_ap_mdev_verify_no_sharing(struct ap_matrix_mdev *matrix_mdev)
>  	return 0;
>  }
>  
> +static void vfio_ap_mdev_link_queue(struct ap_matrix_mdev *matrix_mdev,
> +				    struct vfio_ap_queue *q)
> +{
> +	if (q) {
> +		q->matrix_mdev = matrix_mdev;
> +		hash_add(matrix_mdev->qtable,
> +			 &q->mdev_qnode, q->apqn);
> +	}
> +}
> +
> +static void vfio_ap_mdev_link_apqn(struct ap_matrix_mdev *matrix_mdev, int apqn)
> +{
> +	struct vfio_ap_queue *q;
> +
> +	q = vfio_ap_find_queue(apqn);
> +	vfio_ap_mdev_link_queue(matrix_mdev, q);
> +}
> +
> +static void vfio_ap_mdev_unlink_queue(struct vfio_ap_queue *q)
> +{
> +	if (q) {
> +		q->matrix_mdev = NULL;
> +		hash_del(&q->mdev_qnode);
> +	}
> +}
> +
> +static void vfio_ap_mdev_unlink_apqn(int apqn)
> +{
> +	struct vfio_ap_queue *q;
> +
> +	q = vfio_ap_find_queue(apqn);
> +	vfio_ap_mdev_unlink_queue(q);
> +}
> +
> +static void vfio_ap_mdev_link_adapter(struct ap_matrix_mdev *matrix_mdev,
> +				      unsigned long apid)
> +{
> +	unsigned long apqi;
> +
> +	for_each_set_bit_inv(apqi, matrix_mdev->matrix.aqm, AP_DOMAINS)
> +		vfio_ap_mdev_link_apqn(matrix_mdev,
> +				       AP_MKQID(apid, apqi));
> +}
> +
>  /**
>   * assign_adapter_store
>   *
> @@ -622,6 +650,7 @@ static ssize_t assign_adapter_store(struct device *dev,
>  	if (ret)
>  		goto share_err;
>  
> +	vfio_ap_mdev_link_adapter(matrix_mdev, apid);
>  	ret = count;
>  	goto done;
>  
> @@ -634,6 +663,15 @@ static ssize_t assign_adapter_store(struct device *dev,
>  }
>  static DEVICE_ATTR_WO(assign_adapter);
>  
> +static void vfio_ap_mdev_unlink_adapter(struct ap_matrix_mdev *matrix_mdev,
> +					unsigned long apid)
> +{
> +	unsigned long apqi;
> +
> +	for_each_set_bit_inv(apqi, matrix_mdev->matrix.aqm, AP_DOMAINS)
> +		vfio_ap_mdev_unlink_apqn(AP_MKQID(apid, apqi));
> +}
> +
>  /**
>   * unassign_adapter_store
>   *
> @@ -673,6 +711,7 @@ static ssize_t unassign_adapter_store(struct device *dev,
>  
>  	mutex_lock(&matrix_dev->lock);
>  	clear_bit_inv((unsigned long)apid, matrix_mdev->matrix.apm);
> +	vfio_ap_mdev_unlink_adapter(matrix_mdev, apid);
>  	mutex_unlock(&matrix_dev->lock);
>  
>  	return count;
> @@ -699,6 +738,15 @@ vfio_ap_mdev_verify_queues_reserved_for_apqi(struct ap_matrix_mdev *matrix_mdev,
>  	return 0;
>  }
>  
> +static void vfio_ap_mdev_link_domain(struct ap_matrix_mdev *matrix_mdev,
> +				     unsigned long apqi)
> +{
> +	unsigned long apid;
> +
> +	for_each_set_bit_inv(apid, matrix_mdev->matrix.apm, AP_DEVICES)
> +		vfio_ap_mdev_link_apqn(matrix_mdev, AP_MKQID(apid, apqi));
> +}
> +
>  /**
>   * assign_domain_store
>   *
> @@ -763,6 +811,7 @@ static ssize_t assign_domain_store(struct device *dev,
>  	if (ret)
>  		goto share_err;
>  
> +	vfio_ap_mdev_link_domain(matrix_mdev, apqi);
>  	ret = count;
>  	goto done;
>  
> @@ -775,6 +824,14 @@ static ssize_t assign_domain_store(struct device *dev,
>  }
>  static DEVICE_ATTR_WO(assign_domain);
>  
> +static void vfio_ap_mdev_unlink_domain(struct ap_matrix_mdev *matrix_mdev,
> +				       unsigned long apqi)
> +{
> +	unsigned long apid;
> +
> +	for_each_set_bit_inv(apid, matrix_mdev->matrix.apm, AP_DEVICES)
> +		vfio_ap_mdev_unlink_apqn(AP_MKQID(apid, apqi));
> +}
>  
>  /**
>   * unassign_domain_store
> @@ -815,6 +872,7 @@ static ssize_t unassign_domain_store(struct device *dev,
>  
>  	mutex_lock(&matrix_dev->lock);
>  	clear_bit_inv((unsigned long)apqi, matrix_mdev->matrix.aqm);
> +	vfio_ap_mdev_unlink_domain(matrix_mdev, apqi);
>  	mutex_unlock(&matrix_dev->lock);
>  
>  	return count;
> @@ -1317,6 +1375,28 @@ void vfio_ap_mdev_unregister(void)
>  	mdev_unregister_device(&matrix_dev->device);
>  }
>  
> +/*
> + * vfio_ap_queue_link_mdev
> + *
> + * @q: The queue to link with the matrix mdev.
> + *
> + * Links @q with the matrix mdev to which the queue's APQN is assigned.
> + */
> +static void vfio_ap_queue_link_mdev(struct vfio_ap_queue *q)
> +{
> +	unsigned long apid = AP_QID_CARD(q->apqn);
> +	unsigned long apqi = AP_QID_QUEUE(q->apqn);
> +	struct ap_matrix_mdev *matrix_mdev;
> +
> +	list_for_each_entry(matrix_mdev, &matrix_dev->mdev_list, node) {
> +		if (test_bit_inv(apid, matrix_mdev->matrix.apm) &&
> +		    test_bit_inv(apqi, matrix_mdev->matrix.aqm)) {
> +			vfio_ap_mdev_link_queue(matrix_mdev, q);
> +			break;
> +		}
> +	}
> +}
> +
>  int vfio_ap_mdev_probe_queue(struct ap_device *apdev)
>  {
>  	struct vfio_ap_queue *q;
> @@ -1324,9 +1404,13 @@ int vfio_ap_mdev_probe_queue(struct ap_device *apdev)
>  	q = kzalloc(sizeof(*q), GFP_KERNEL);
>  	if (!q)
>  		return -ENOMEM;
> +	mutex_lock(&matrix_dev->lock);
>  	dev_set_drvdata(&apdev->device, q);
>  	q->apqn = to_ap_queue(&apdev->device)->qid;
>  	q->saved_isc = VFIO_AP_ISC_INVALID;
> +	vfio_ap_queue_link_mdev(q);
> +	mutex_unlock(&matrix_dev->lock);
> +

Does the critical section have to include more than just
vfio_ap_queue_link_mdev()? Did we need the critical section
before this patch?

>  	return 0;
>  }
>  
> @@ -1341,6 +1425,10 @@ void vfio_ap_mdev_remove_queue(struct ap_device *apdev)
>  	apid = AP_QID_CARD(q->apqn);
>  	apqi = AP_QID_QUEUE(q->apqn);
>  	vfio_ap_mdev_reset_queue(apid, apqi, 1);
> +
> +	if (q->matrix_mdev)
> +		vfio_ap_mdev_unlink_queue(q);
> +
>  	kfree(q);
>  	mutex_unlock(&matrix_dev->lock);
>  }
> diff --git a/drivers/s390/crypto/vfio_ap_private.h b/drivers/s390/crypto/vfio_ap_private.h
> index d9003de4fbad..4e5cc72fc0db 100644
> --- a/drivers/s390/crypto/vfio_ap_private.h
> +++ b/drivers/s390/crypto/vfio_ap_private.h
> @@ -18,6 +18,7 @@
>  #include <linux/delay.h>
>  #include <linux/mutex.h>
>  #include <linux/kvm_host.h>
> +#include <linux/hashtable.h>
>  
>  #include "ap_bus.h"
>  
> @@ -86,6 +87,7 @@ struct ap_matrix_mdev {
>  	struct kvm *kvm;
>  	struct kvm_s390_module_hook pqap_hook;
>  	struct mdev_device *mdev;
> +	DECLARE_HASHTABLE(qtable, 8);
>  };
>  
>  extern int vfio_ap_mdev_register(void);
> @@ -97,6 +99,7 @@ struct vfio_ap_queue {
>  	int	apqn;
>  #define VFIO_AP_ISC_INVALID 0xff
>  	unsigned char saved_isc;
> +	struct hlist_node mdev_qnode;
>  };
>  
>  int vfio_ap_mdev_probe_queue(struct ap_device *queue);


^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v13 06/15] s390/vfio-ap: allow assignment of unavailable AP queues to mdev device
  2020-12-23  1:15 ` [PATCH v13 06/15] s390/vfio-ap: allow assignment of unavailable AP queues to mdev device Tony Krowiak
@ 2021-01-11 20:40   ` Halil Pasic
  2021-01-14 17:54     ` Tony Krowiak
  0 siblings, 1 reply; 48+ messages in thread
From: Halil Pasic @ 2021-01-11 20:40 UTC (permalink / raw)
  To: Tony Krowiak
  Cc: linux-s390, linux-kernel, kvm, freude, borntraeger, cohuck,
	mjrosato, alex.williamson, kwankhede, fiuczy, frankja, david,
	hca, gor

On Tue, 22 Dec 2020 20:15:57 -0500
Tony Krowiak <akrowiak@linux.ibm.com> wrote:

> The current implementation does not allow assignment of an AP adapter or
> domain to an mdev device if each APQN resulting from the assignment
> does not reference an AP queue device that is bound to the vfio_ap device
> driver. This patch allows assignment of AP resources to the matrix mdev as
> long as the APQNs resulting from the assignment:
>    1. Are not reserved by the AP BUS for use by the zcrypt device drivers.
>    2. Are not assigned to another matrix mdev.
> 
> The rationale behind this is twofold:
>    1. The AP architecture does not preclude assignment of APQNs to an AP
>       configuration that are not available to the system.
>    2. APQNs that do not reference a queue device bound to the vfio_ap
>       device driver will not be assigned to the guest's CRYCB, so the
>       guest will not get access to queues not bound to the vfio_ap driver.

You didn't tell us about the changed error code.

Also notice that at this point we have neither filtering nor in-use.
This used to be patch 11, and most of that stuff used to be in place. But
I'm going to trust you if you say it's fine to enable it this early.
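
As a reminder of what condition 2 ("not assigned to another matrix mdev")
boils down to, here is a small userspace sketch of the overlap test. It
makes some assumptions: struct matrix_sim and the helper names are
hypothetical, the masks are taken to be 256 bits wide, and bits are
numbered LSB-first for simplicity, whereas the driver uses the inverted
bit numbering of test_bit_inv() and friends.

```c
#include <stdbool.h>
#include <stdint.h>

#define MASK_WORDS 4	/* 4 x 64 bits, assuming 256 adapters/domains */

/* Hypothetical simplified matrix: one bit per APID, one bit per APQI. */
struct matrix_sim {
	uint64_t apm[MASK_WORDS];
	uint64_t aqm[MASK_WORDS];
};

static void set_bit_sim(uint64_t *mask, unsigned int nr)
{
	mask[nr / 64] |= UINT64_C(1) << (nr % 64);
}

static bool masks_intersect(const uint64_t *a, const uint64_t *b)
{
	for (int i = 0; i < MASK_WORDS; i++)
		if (a[i] & b[i])
			return true;
	return false;
}

/*
 * The APQNs of a matrix are the Cartesian product of its APIDs and APQIs,
 * so two matrices share an APQN only if they have at least one APID in
 * common and at least one APQI in common.
 */
static bool matrices_share_apqn(const struct matrix_sim *x,
				const struct matrix_sim *y)
{
	return masks_intersect(x->apm, y->apm) &&
	       masks_intersect(x->aqm, y->aqm);
}
```

For example, two mdevs that both have adapter 5 but disjoint domains do
not conflict; assigning a common domain to the second one makes
matrices_share_apqn() return true, which is the case the sysfs store is
meant to reject.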

> 
> Signed-off-by: Tony Krowiak <akrowiak@linux.ibm.com>
> ---
>  drivers/s390/crypto/vfio_ap_ops.c | 241 ++++++++----------------------
>  1 file changed, 62 insertions(+), 179 deletions(-)
> 
> diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c
> index cdcc6378b4a5..2d58b39977be 100644
> --- a/drivers/s390/crypto/vfio_ap_ops.c
> +++ b/drivers/s390/crypto/vfio_ap_ops.c
> @@ -379,134 +379,37 @@ static struct attribute_group *vfio_ap_mdev_type_groups[] = {
>  	NULL,
>  };
>  
> -struct vfio_ap_queue_reserved {
> -	unsigned long *apid;
> -	unsigned long *apqi;
> -	bool reserved;
> -};
> +#define MDEV_SHARING_ERR "Userspace may not re-assign queue %02lx.%04lx " \
> +			 "already assigned to %s"
>  
> -/**
> - * vfio_ap_has_queue
> - *
> - * @dev: an AP queue device
> - * @data: a struct vfio_ap_queue_reserved reference
> - *
> - * Flags whether the AP queue device (@dev) has a queue ID containing the APQN,
> - * apid or apqi specified in @data:
> - *
> - * - If @data contains both an apid and apqi value, then @data will be flagged
> - *   as reserved if the APID and APQI fields for the AP queue device matches
> - *
> - * - If @data contains only an apid value, @data will be flagged as
> - *   reserved if the APID field in the AP queue device matches
> - *
> - * - If @data contains only an apqi value, @data will be flagged as
> - *   reserved if the APQI field in the AP queue device matches
> - *
> - * Returns 0 to indicate the input to function succeeded. Returns -EINVAL if
> - * @data does not contain either an apid or apqi.
> - */
> -static int vfio_ap_has_queue(struct device *dev, void *data)
> +static void vfio_ap_mdev_log_sharing_err(const char *mdev_name,
> +					 unsigned long *apm,
> +					 unsigned long *aqm)
[..]
> -	return 0;
> +	for_each_set_bit_inv(apid, apm, AP_DEVICES)
> +		for_each_set_bit_inv(apqi, aqm, AP_DOMAINS)
> +			pr_warn(MDEV_SHARING_ERR, apid, apqi, mdev_name);

I would prefer dev_warn() here. We know which device is about to get
more queues, and this device can provide a clue regarding the initiator.

Also I believe a warning is too heavy-handed here. Warnings should not
be ignored. This is a condition that can emerge during normal operation,
AFAIU. Or am I wrong?

>  }
>  
>  /**
>   * vfio_ap_mdev_verify_no_sharing
>   *
> - * Verifies that the APQNs derived from the cross product of the AP adapter IDs
> - * and AP queue indexes comprising the AP matrix are not configured for another
> - * mediated device. AP queue sharing is not allowed.
> + * Verifies that each APQN derived from the Cartesian product of the AP adapter
> + * IDs and AP queue indexes comprising the AP matrix are not configured for
> + * another mediated device. AP queue sharing is not allowed.
>   *
> - * @matrix_mdev: the mediated matrix device
> + * @matrix_mdev: the mediated matrix device to which the APQNs being verified
> + *		 are assigned.
> + * @mdev_apm: mask indicating the APIDs of the APQNs to be verified
> + * @mdev_aqm: mask indicating the APQIs of the APQNs to be verified
>   *
> - * Returns 0 if the APQNs are not shared, otherwise; returns -EADDRINUSE.
> + * Returns 0 if the APQNs are not shared, otherwise; returns -EBUSY.
>   */
> -static int vfio_ap_mdev_verify_no_sharing(struct ap_matrix_mdev *matrix_mdev)
> +static int vfio_ap_mdev_verify_no_sharing(struct ap_matrix_mdev *matrix_mdev,
> +					  unsigned long *mdev_apm,
> +					  unsigned long *mdev_aqm)
>  {
>  	struct ap_matrix_mdev *lstdev;
>  	DECLARE_BITMAP(apm, AP_DEVICES);
> @@ -523,20 +426,31 @@ static int vfio_ap_mdev_verify_no_sharing(struct ap_matrix_mdev *matrix_mdev)
>  		 * We work on full longs, as we can only exclude the leftover
>  		 * bits in non-inverse order. The leftover is all zeros.
>  		 */
> -		if (!bitmap_and(apm, matrix_mdev->matrix.apm,
> -				lstdev->matrix.apm, AP_DEVICES))
> +		if (!bitmap_and(apm, mdev_apm, lstdev->matrix.apm, AP_DEVICES))
>  			continue;
>  
> -		if (!bitmap_and(aqm, matrix_mdev->matrix.aqm,
> -				lstdev->matrix.aqm, AP_DOMAINS))
> +		if (!bitmap_and(aqm, mdev_aqm, lstdev->matrix.aqm, AP_DOMAINS))
>  			continue;
>  
> -		return -EADDRINUSE;
> +		vfio_ap_mdev_log_sharing_err(dev_name(mdev_dev(lstdev->mdev)),
> +					     apm, aqm);
> +
> +		return -EBUSY;

Why do we change -EADDRINUSE to -EBUSY? This gets bubbled up to
userspace, doesn't it? So a tool that detects the "another mdev has it"
condition by checking for -EADDRINUSE would be confused...

>  	}
>  
>  	return 0;
>  }
>  
> +static int vfio_ap_mdev_validate_masks(struct ap_matrix_mdev *matrix_mdev,
> +				       unsigned long *mdev_apm,
> +				       unsigned long *mdev_aqm)
> +{
> +	if (ap_apqn_in_matrix_owned_by_def_drv(mdev_apm, mdev_aqm))
> +		return -EADDRNOTAVAIL;
> +
> +	return vfio_ap_mdev_verify_no_sharing(matrix_mdev, mdev_apm, mdev_aqm);
> +}
> +
>  static void vfio_ap_mdev_link_queue(struct ap_matrix_mdev *matrix_mdev,
>  				    struct vfio_ap_queue *q)
>  {
> @@ -608,10 +522,10 @@ static void vfio_ap_mdev_link_adapter(struct ap_matrix_mdev *matrix_mdev,
>   *	   driver; or, if no APQIs have yet been assigned, the APID is not
>   *	   contained in an APQN bound to the vfio_ap device driver.
>   *
> - *	4. -EADDRINUSE
> + *	4. -EBUSY
>   *	   An APQN derived from the cross product of the APID being assigned
>   *	   and the APQIs previously assigned is being used by another mediated
> - *	   matrix device
> + *	   matrix device or the mdev lock could not be acquired.

This is premature. We don't use try_lock yet.

[..]

>  static void vfio_ap_mdev_link_domain(struct ap_matrix_mdev *matrix_mdev,
>  				     unsigned long apqi)
>  {
> @@ -774,10 +660,10 @@ static void vfio_ap_mdev_link_domain(struct ap_matrix_mdev *matrix_mdev,
>   *	   driver; or, if no APIDs have yet been assigned, the APQI is not
>   *	   contained in an APQN bound to the vfio_ap device driver.
>   *
> - *	4. -EADDRINUSE
> + *	4. -BUSY
>   *	   An APQN derived from the cross product of the APQI being assigned
>   *	   and the APIDs previously assigned is being used by another mediated
> - *	   matrix device
> + *	   matrix device or the mdev lock could not be acquired.

Same here as above.

Otherwise looks good.

[..]

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v13 07/15] s390/vfio-ap: introduce shadow APCB
  2020-12-23  1:15 ` [PATCH v13 07/15] s390/vfio-ap: introduce shadow APCB Tony Krowiak
@ 2021-01-11 22:50   ` Halil Pasic
  2021-01-14 21:35     ` Tony Krowiak
  0 siblings, 1 reply; 48+ messages in thread
From: Halil Pasic @ 2021-01-11 22:50 UTC (permalink / raw)
  To: Tony Krowiak
  Cc: linux-s390, linux-kernel, kvm, freude, borntraeger, cohuck,
	mjrosato, alex.williamson, kwankhede, fiuczy, frankja, david,
	hca, gor

On Tue, 22 Dec 2020 20:15:58 -0500
Tony Krowiak <akrowiak@linux.ibm.com> wrote:

> The APCB is a field within the CRYCB that provides the AP configuration
> to a KVM guest. Let's introduce a shadow copy of the KVM guest's APCB and
> maintain it for the lifespan of the guest.
> 
> Signed-off-by: Tony Krowiak <akrowiak@linux.ibm.com>
> Reviewed-by: Halil Pasic <pasic@linux.ibm.com>
> ---
>  drivers/s390/crypto/vfio_ap_ops.c     | 15 +++++++++++++++
>  drivers/s390/crypto/vfio_ap_private.h |  2 ++
>  2 files changed, 17 insertions(+)
> 
> diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c
> index 2d58b39977be..44b3a81cadfb 100644
> --- a/drivers/s390/crypto/vfio_ap_ops.c
> +++ b/drivers/s390/crypto/vfio_ap_ops.c
> @@ -293,6 +293,20 @@ static void vfio_ap_matrix_init(struct ap_config_info *info,
>  	matrix->adm_max = info->apxa ? info->Nd : 15;
>  }
>  
> +static bool vfio_ap_mdev_has_crycb(struct ap_matrix_mdev *matrix_mdev)
> +{
> +	return (matrix_mdev->kvm && matrix_mdev->kvm->arch.crypto.crycbd);
> +}
> +
> +static void vfio_ap_mdev_commit_shadow_apcb(struct ap_matrix_mdev *matrix_mdev)
> +{
> +	if (vfio_ap_mdev_has_crycb(matrix_mdev))
> +		kvm_arch_crypto_set_masks(matrix_mdev->kvm,
> +					  matrix_mdev->shadow_apcb.apm,
> +					  matrix_mdev->shadow_apcb.aqm,
> +					  matrix_mdev->shadow_apcb.adm);
> +}
> +
>  static int vfio_ap_mdev_create(struct kobject *kobj, struct mdev_device *mdev)
>  {
>  	struct ap_matrix_mdev *matrix_mdev;
> @@ -308,6 +322,7 @@ static int vfio_ap_mdev_create(struct kobject *kobj, struct mdev_device *mdev)
>  
>  	matrix_mdev->mdev = mdev;
>  	vfio_ap_matrix_init(&matrix_dev->info, &matrix_mdev->matrix);
> +	vfio_ap_matrix_init(&matrix_dev->info, &matrix_mdev->shadow_apcb);
>  	hash_init(matrix_mdev->qtable);
>  	mdev_set_drvdata(mdev, matrix_mdev);
>  	matrix_mdev->pqap_hook.hook = handle_pqap;
> diff --git a/drivers/s390/crypto/vfio_ap_private.h b/drivers/s390/crypto/vfio_ap_private.h
> index 4e5cc72fc0db..d2d26ba18602 100644
> --- a/drivers/s390/crypto/vfio_ap_private.h
> +++ b/drivers/s390/crypto/vfio_ap_private.h
> @@ -75,6 +75,7 @@ struct ap_matrix {
>   * @list:	allows the ap_matrix_mdev struct to be added to a list
>   * @matrix:	the adapters, usage domains and control domains assigned to the
>   *		mediated matrix device.
> + * @shadow_apcb:    the shadow copy of the APCB field of the KVM guest's CRYCB
>   * @group_notifier: notifier block used for specifying callback function for
>   *		    handling the VFIO_GROUP_NOTIFY_SET_KVM event
>   * @kvm:	the struct holding guest's state
> @@ -82,6 +83,7 @@ struct ap_matrix {
>  struct ap_matrix_mdev {
>  	struct list_head node;
>  	struct ap_matrix matrix;
> +	struct ap_matrix shadow_apcb;
>  	struct notifier_block group_notifier;
>  	struct notifier_block iommu_notifier;
>  	struct kvm *kvm;

What happened to the following hunk from v12?

@@ -1218,13 +1233,9 @@ static int vfio_ap_mdev_group_notifier(struct notifier_block *nb,
 	if (ret)
 		return NOTIFY_DONE;
 
-	/* If there is no CRYCB pointer, then we can't copy the masks */
-	if (!matrix_mdev->kvm->arch.crypto.crycbd)
-		return NOTIFY_DONE;
-
-	kvm_arch_crypto_set_masks(matrix_mdev->kvm, matrix_mdev->matrix.apm,
-				  matrix_mdev->matrix.aqm,
-				  matrix_mdev->matrix.adm);
+	memcpy(&matrix_mdev->shadow_apcb, &matrix_mdev->matrix,
+	       sizeof(matrix_mdev->shadow_apcb));
+	vfio_ap_mdev_commit_shadow_apcb(matrix_mdev);
 
 	return NOTIFY_OK;
 }

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v13 08/15] s390/vfio-ap: sysfs attribute to display the guest's matrix
  2020-12-23  1:15 ` [PATCH v13 08/15] s390/vfio-ap: sysfs attribute to display the guest's matrix Tony Krowiak
@ 2021-01-11 22:58   ` Halil Pasic
  2021-01-28 21:29     ` Tony Krowiak
  0 siblings, 1 reply; 48+ messages in thread
From: Halil Pasic @ 2021-01-11 22:58 UTC (permalink / raw)
  To: Tony Krowiak
  Cc: linux-s390, linux-kernel, kvm, freude, borntraeger, cohuck,
	mjrosato, alex.williamson, kwankhede, fiuczy, frankja, david,
	hca, gor

On Tue, 22 Dec 2020 20:15:59 -0500
Tony Krowiak <akrowiak@linux.ibm.com> wrote:

> The matrix of adapters and domains configured in a guest's APCB may
> differ from the matrix of adapters and domains assigned to the matrix mdev,
> so this patch introduces a sysfs attribute to display the matrix of
> adapters and domains that are or will be assigned to the APCB of a guest
> that is or will be using the matrix mdev. For a matrix mdev denoted by
> $uuid, the guest matrix can be displayed as follows:
> 
>    cat /sys/devices/vfio_ap/matrix/$uuid/guest_matrix
> 
> Signed-off-by: Tony Krowiak <akrowiak@linux.ibm.com>

Reviewed-by: Halil Pasic <pasic@linux.ibm.com>

However, because vfio_ap_mdev_commit_shadow_apcb() is not called anywhere
yet (see my comment on the previous patch), the attribute won't show the
guest matrix at this point. :(
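
To make the remark above concrete: the show routine only formats whatever
struct ap_matrix it is handed, so a shadow_apcb that has not been populated
yields an empty guest_matrix. Below is a minimal user-space sketch of the
formatting, assuming plain bool arrays in place of the kernel's
inverted-bit-order bitmaps. The names struct sketch_matrix, SKETCH_NBITS and
sketch_matrix_show() are hypothetical and not part of the patch.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

#define SKETCH_NBITS 16	/* hypothetical mask size, far smaller than the real one */

/* Hypothetical stand-in for struct ap_matrix, using bool arrays. */
struct sketch_matrix {
	bool apm[SKETCH_NBITS];	/* assigned adapters (APIDs) */
	bool aqm[SKETCH_NBITS];	/* assigned usage domains (APQIs) */
};

static bool sketch_any(const bool *mask)
{
	for (size_t i = 0; i < SKETCH_NBITS; i++)
		if (mask[i])
			return true;
	return false;
}

/*
 * Mirror the three output shapes of vfio_ap_mdev_matrix_show():
 * "aa.qqqq" when adapters and domains are assigned, "aa." for adapters
 * only, ".qqqq" for domains only. The caller must supply a buffer large
 * enough for all lines (sysfs hands the real function a full page).
 * Returns the number of characters written.
 */
static int sketch_matrix_show(const struct sketch_matrix *m, char *buf)
{
	char *pos = buf;
	bool have_ap = sketch_any(m->apm);
	bool have_aq = sketch_any(m->aqm);
	unsigned long apid, apqi;

	*pos = '\0';

	for (apid = 0; apid < SKETCH_NBITS; apid++) {
		if (!m->apm[apid])
			continue;
		if (!have_aq) {
			/* adapters only */
			pos += sprintf(pos, "%02lx.\n", apid);
			continue;
		}
		/* adapters and domains: one line per APQN */
		for (apqi = 0; apqi < SKETCH_NBITS; apqi++)
			if (m->aqm[apqi])
				pos += sprintf(pos, "%02lx.%04lx\n", apid, apqi);
	}

	if (!have_ap) {
		/* domains only */
		for (apqi = 0; apqi < SKETCH_NBITS; apqi++)
			if (m->aqm[apqi])
				pos += sprintf(pos, ".%04lx\n", apqi);
	}

	return (int)(pos - buf);
}
```

With an all-zero matrix the sketch writes nothing and returns 0, which is
what guest_matrix would report for a shadow_apcb that is never filled in.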

> ---
>  drivers/s390/crypto/vfio_ap_ops.c | 51 ++++++++++++++++++++++---------
>  1 file changed, 37 insertions(+), 14 deletions(-)
> 
> diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c
> index 44b3a81cadfb..1b1d5975ee0e 100644
> --- a/drivers/s390/crypto/vfio_ap_ops.c
> +++ b/drivers/s390/crypto/vfio_ap_ops.c
> @@ -894,29 +894,24 @@ static ssize_t control_domains_show(struct device *dev,
>  }
>  static DEVICE_ATTR_RO(control_domains);
>  
> -static ssize_t matrix_show(struct device *dev, struct device_attribute *attr,
> -			   char *buf)
> +static ssize_t vfio_ap_mdev_matrix_show(struct ap_matrix *matrix, char *buf)
>  {
> -	struct mdev_device *mdev = mdev_from_dev(dev);
> -	struct ap_matrix_mdev *matrix_mdev = mdev_get_drvdata(mdev);
>  	char *bufpos = buf;
>  	unsigned long apid;
>  	unsigned long apqi;
>  	unsigned long apid1;
>  	unsigned long apqi1;
> -	unsigned long napm_bits = matrix_mdev->matrix.apm_max + 1;
> -	unsigned long naqm_bits = matrix_mdev->matrix.aqm_max + 1;
> +	unsigned long napm_bits = matrix->apm_max + 1;
> +	unsigned long naqm_bits = matrix->aqm_max + 1;
>  	int nchars = 0;
>  	int n;
>  
> -	apid1 = find_first_bit_inv(matrix_mdev->matrix.apm, napm_bits);
> -	apqi1 = find_first_bit_inv(matrix_mdev->matrix.aqm, naqm_bits);
> -
> -	mutex_lock(&matrix_dev->lock);
> +	apid1 = find_first_bit_inv(matrix->apm, napm_bits);
> +	apqi1 = find_first_bit_inv(matrix->aqm, naqm_bits);
>  
>  	if ((apid1 < napm_bits) && (apqi1 < naqm_bits)) {
> -		for_each_set_bit_inv(apid, matrix_mdev->matrix.apm, napm_bits) {
> -			for_each_set_bit_inv(apqi, matrix_mdev->matrix.aqm,
> +		for_each_set_bit_inv(apid, matrix->apm, napm_bits) {
> +			for_each_set_bit_inv(apqi, matrix->aqm,
>  					     naqm_bits) {
>  				n = sprintf(bufpos, "%02lx.%04lx\n", apid,
>  					    apqi);
> @@ -925,25 +920,52 @@ static ssize_t matrix_show(struct device *dev, struct device_attribute *attr,
>  			}
>  		}
>  	} else if (apid1 < napm_bits) {
> -		for_each_set_bit_inv(apid, matrix_mdev->matrix.apm, napm_bits) {
> +		for_each_set_bit_inv(apid, matrix->apm, napm_bits) {
>  			n = sprintf(bufpos, "%02lx.\n", apid);
>  			bufpos += n;
>  			nchars += n;
>  		}
>  	} else if (apqi1 < naqm_bits) {
> -		for_each_set_bit_inv(apqi, matrix_mdev->matrix.aqm, naqm_bits) {
> +		for_each_set_bit_inv(apqi, matrix->aqm, naqm_bits) {
>  			n = sprintf(bufpos, ".%04lx\n", apqi);
>  			bufpos += n;
>  			nchars += n;
>  		}
>  	}
>  
> +	return nchars;
> +}
> +
> +static ssize_t matrix_show(struct device *dev, struct device_attribute *attr,
> +			   char *buf)
> +{
> +	ssize_t nchars;
> +	struct mdev_device *mdev = mdev_from_dev(dev);
> +	struct ap_matrix_mdev *matrix_mdev = mdev_get_drvdata(mdev);
> +
> +	mutex_lock(&matrix_dev->lock);
> +	nchars = vfio_ap_mdev_matrix_show(&matrix_mdev->matrix, buf);
>  	mutex_unlock(&matrix_dev->lock);
>  
>  	return nchars;
>  }
>  static DEVICE_ATTR_RO(matrix);
>  
> +static ssize_t guest_matrix_show(struct device *dev,
> +				 struct device_attribute *attr, char *buf)
> +{
> +	ssize_t nchars;
> +	struct mdev_device *mdev = mdev_from_dev(dev);
> +	struct ap_matrix_mdev *matrix_mdev = mdev_get_drvdata(mdev);
> +
> +	mutex_lock(&matrix_dev->lock);
> +	nchars = vfio_ap_mdev_matrix_show(&matrix_mdev->shadow_apcb, buf);
> +	mutex_unlock(&matrix_dev->lock);
> +
> +	return nchars;
> +}
> +static DEVICE_ATTR_RO(guest_matrix);
> +
>  static struct attribute *vfio_ap_mdev_attrs[] = {
>  	&dev_attr_assign_adapter.attr,
>  	&dev_attr_unassign_adapter.attr,
> @@ -953,6 +975,7 @@ static struct attribute *vfio_ap_mdev_attrs[] = {
>  	&dev_attr_unassign_control_domain.attr,
>  	&dev_attr_control_domains.attr,
>  	&dev_attr_matrix.attr,
> +	&dev_attr_guest_matrix.attr,
>  	NULL,
>  };
>  


^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v13 09/15] s390/vfio-ap: allow hot plug/unplug of AP resources using mdev device
  2020-12-23  1:16 ` [PATCH v13 09/15] s390/vfio-ap: allow hot plug/unplug of AP resources using mdev device Tony Krowiak
@ 2021-01-12  1:12   ` Halil Pasic
  2021-01-12 17:55     ` Halil Pasic
  0 siblings, 1 reply; 48+ messages in thread
From: Halil Pasic @ 2021-01-12  1:12 UTC (permalink / raw)
  To: Tony Krowiak
  Cc: linux-s390, linux-kernel, kvm, freude, borntraeger, cohuck,
	mjrosato, alex.williamson, kwankhede, fiuczy, frankja, david,
	hca, gor

On Tue, 22 Dec 2020 20:16:00 -0500
Tony Krowiak <akrowiak@linux.ibm.com> wrote:

> Let's allow adapters, domains and control domains to be hot plugged into
> and hot unplugged from a KVM guest using a matrix mdev when:
> 
> * The adapter, domain or control domain is assigned to or unassigned from
>   the matrix mdev
> 
> * A queue device with an APQN assigned to the matrix mdev is bound to or
>   unbound from the vfio_ap device driver.
> 
> Whenever an assignment or unassignment of an adapter, domain or control
> domain is performed as well as when a bind or unbind of a queue device
> is executed, the AP control block (APCB) that supplies the AP configuration
> to a guest is first refreshed. The APCB is refreshed by copying the AP
> configuration from the mdev's matrix to the APCB, then filtering the
> APCB according to the following rules:
> 
> * The APID of each adapter and the APQI of each domain that is not in the
>   host's AP configuration is filtered out.
> 
> * The APID of each adapter comprising an APQN that does not reference a
>   queue device bound to the vfio_ap device driver is filtered. The APQNs
>   are derived from the Cartesian product of the APID of each adapter and
>   APQI of each domain assigned to the mdev's matrix.
> 
> After refreshing the APCB, if the mdev is in use by a KVM guest, the
> APCB is hot plugged into the guest to dynamically provide access to the
> adapters, domains and control domains provided via the newly refreshed
> APCB.
> 
> Signed-off-by: Tony Krowiak <akrowiak@linux.ibm.com>
> ---
>  drivers/s390/crypto/vfio_ap_ops.c | 143 ++++++++++++++++++++++++------
>  1 file changed, 118 insertions(+), 25 deletions(-)
> 
> diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c
> index 1b1d5975ee0e..843862c88379 100644
> --- a/drivers/s390/crypto/vfio_ap_ops.c
> +++ b/drivers/s390/crypto/vfio_ap_ops.c
> @@ -307,6 +307,88 @@ static void vfio_ap_mdev_commit_shadow_apcb(struct ap_matrix_mdev *matrix_mdev)
>  					  matrix_mdev->shadow_apcb.adm);
>  }
>  
> +static void vfio_ap_mdev_filter_apcb(struct ap_matrix_mdev *matrix_mdev,
> +				     struct ap_matrix *shadow_apcb)
> +{
> +	int ret;
> +	unsigned long apid, apqi, apqn;
> +
> +	ret = ap_qci(&matrix_dev->info);

Here we do the QCI ourselves, so vfio_ap's view of the host AP
configuration and the AP bus's view of it may differ.

> +	if (ret)
> +		return;
> +
> +	memcpy(shadow_apcb, &matrix_mdev->matrix, sizeof(struct ap_matrix));
> +

Why is this memcpy necessary...

> +	/*
> +	 * Copy the adapters, domains and control domains to the shadow_apcb
> +	 * from the matrix mdev, but only those that are assigned to the host's
> +	 * AP configuration.
> +	 */
> +	bitmap_and(shadow_apcb->apm, matrix_mdev->matrix.apm,
> +		   (unsigned long *)matrix_dev->info.apm, AP_DEVICES);
> +	bitmap_and(shadow_apcb->aqm, matrix_mdev->matrix.aqm,
> +		   (unsigned long *)matrix_dev->info.aqm, AP_DOMAINS);
> +	bitmap_and(shadow_apcb->adm, matrix_mdev->matrix.adm,
> +		   (unsigned long *)matrix_dev->info.adm, AP_DOMAINS);

... aren't you overwriting shadow_apcb here anyway?

> +
> +	/* If there are no APQNs assigned, then filtering them is unnecessary */
> +	if (bitmap_empty(shadow_apcb->apm, AP_DEVICES)) {
> +		if (!bitmap_empty(shadow_apcb->aqm, AP_DOMAINS))
> +			bitmap_clear(shadow_apcb->aqm, 0, AP_DOMAINS);
> +		return;
> +	} else if (bitmap_empty(shadow_apcb->aqm, AP_DOMAINS)) {
> +		if (!bitmap_empty(shadow_apcb->apm, AP_DEVICES))
> +			bitmap_clear(shadow_apcb->apm, 0, AP_DEVICES);
> +		return;
> +	}
> +

I complained about this before. I still don't understand why we need
this, but I'm willing to accept it, unless it breaks something later.

BTW I don't think you have to re-examine shadow_apcb->a[pq]m to tell
whether it is empty; the return value of bitmap_and() already told you that.
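
To illustrate the bitmap_and() remark: in the kernel, bitmap_and() returns
non-zero when the resulting bitmap has at least one bit set, so the
emptiness checks can reuse that result. Below is a user-space sketch under
simplifying assumptions: masks are a whole number of words, and
sketch_bitmap_and(), struct sketch_apcb and sketch_and_with_host() are
hypothetical stand-ins, not the driver's code.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_WORDS 4	/* hypothetical: each mask is 4 words long */

/* Hypothetical stand-in holding only the adapter and domain masks. */
struct sketch_apcb {
	unsigned long apm[SKETCH_WORDS];
	unsigned long aqm[SKETCH_WORDS];
};

/*
 * Simplified model of bitmap_and(): dst = a & b, and report whether any
 * bit survived. The kernel version also masks off unused bits in the
 * last word, which this sketch does not need.
 */
static bool sketch_bitmap_and(unsigned long *dst, const unsigned long *a,
			      const unsigned long *b)
{
	unsigned long any = 0;

	for (size_t i = 0; i < SKETCH_WORDS; i++) {
		dst[i] = a[i] & b[i];
		any |= dst[i];
	}

	return any != 0;
}

/*
 * AND the mdev's masks with the host's masks. If either result is empty
 * there can be no APQNs, so clear both masks. Returns true when at least
 * one APQN remains to be checked.
 */
static bool sketch_and_with_host(struct sketch_apcb *shadow,
				 const struct sketch_apcb *mdev,
				 const struct sketch_apcb *host)
{
	bool have_ap = sketch_bitmap_and(shadow->apm, mdev->apm, host->apm);
	bool have_aq = sketch_bitmap_and(shadow->aqm, mdev->aqm, host->aqm);

	if (!have_ap || !have_aq) {
		memset(shadow, 0, sizeof(*shadow));
		return false;
	}

	return true;
}
```

Clearing both masks when either is empty has the same effect as the two
branches in the hunk above, since the empty mask is already all zeroes.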

> +	for_each_set_bit_inv(apid, shadow_apcb->apm, AP_DEVICES) {
> +		for_each_set_bit_inv(apqi, shadow_apcb->aqm, AP_DOMAINS) {
> +			/*
> +			 * If the APQN is not bound to the vfio_ap device
> +			 * driver, then we can't assign it to the guest's
> +			 * AP configuration. The AP architecture won't
> +			 * allow filtering of a single APQN, so if we're
> +			 * filtering APIDs, then filter the APID; otherwise,
> +			 * filter the APQI.
> +			 */
> +			apqn = AP_MKQID(apid, apqi);
> +			if (!vfio_ap_mdev_get_queue(matrix_mdev, apqn)) {
> +				clear_bit_inv(apid, shadow_apcb->apm);
> +				break;
> +			}
> +		}
> +	}
> +}
> +
> +/**
> + * vfio_ap_mdev_refresh_apcb
> + *
> + * Filter APQNs assigned to the matrix mdev that do not reference an AP queue
> + * device bound to the vfio_ap device driver.
> + *
> + * @matrix_mdev:  the matrix mdev whose AP configuration is to be filtered
> + * @shadow_apcb:  the shadow of the KVM guest's APCB (contains AP configuration
> + *		  for guest)
> + * @filter_apids: boolean value indicating whether the APQNs shall be filtered
> + *		  by APID (true) or by APQI (false).
> + *

The signature in the doc comment and that of the function do not match.

Since none of the complaints affects correctness, except maybe for the
qci stuff:

Acked-by: Halil Pasic <pasic@linux.ibm.com>

If it's good enough for you, it's good enough for me.
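
For readers following along, here is a user-space sketch of the two steps
this patch combines: drop every adapter that has at least one APQN without
a bound queue, then commit the shadow copy only if it actually changed. It
assumes bool arrays instead of kernel bitmaps and omits the host
configuration check. All sketch_* names, including the example predicate
sketch_bound_example(), are hypothetical.

```c
#include <stdbool.h>
#include <string.h>

#define SKETCH_N 8	/* hypothetical number of adapters and of domains */

struct sketch_matrix {
	bool apm[SKETCH_N];	/* adapters (APIDs) */
	bool aqm[SKETCH_N];	/* usage domains (APQIs) */
};

/* Hypothetical predicate: is the queue for (apid, apqi) bound to the driver? */
typedef bool (*sketch_bound_fn)(unsigned int apid, unsigned int apqi);

/* Example predicate: every queue is bound except adapter 2, domain 1. */
static bool sketch_bound_example(unsigned int apid, unsigned int apqi)
{
	return !(apid == 2 && apqi == 1);
}

/* Filter by APID: one unbound APQN removes the whole adapter. */
static void sketch_filter(const struct sketch_matrix *assigned,
			  sketch_bound_fn bound, struct sketch_matrix *out)
{
	unsigned int apid, apqi;

	*out = *assigned;

	for (apid = 0; apid < SKETCH_N; apid++) {
		if (!out->apm[apid])
			continue;
		for (apqi = 0; apqi < SKETCH_N; apqi++) {
			if (out->aqm[apqi] && !bound(apid, apqi)) {
				out->apm[apid] = false;
				break;
			}
		}
	}
}

static int sketch_commits;	/* counts how often the "guest" was updated */

/* Stand-in for pushing the shadow masks to the guest. */
static void sketch_commit(const struct sketch_matrix *shadow)
{
	(void)shadow;
	sketch_commits++;
}

/* Recompute the filtered matrix and commit only when it differs. */
static void sketch_refresh(const struct sketch_matrix *assigned,
			   sketch_bound_fn bound,
			   struct sketch_matrix *shadow)
{
	struct sketch_matrix fresh;

	sketch_filter(assigned, bound, &fresh);

	if (memcmp(&fresh, shadow, sizeof(fresh)) != 0) {
		*shadow = fresh;
		sketch_commit(shadow);
	}
}
```

With adapters 1 and 2 and domains 0 and 1 assigned, the example predicate
removes adapter 2 entirely, and a second refresh with unchanged input does
not commit again.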

> + * Returns the number of APQNs remaining after filtering is complete.
> + */
> +static void vfio_ap_mdev_refresh_apcb(struct ap_matrix_mdev *matrix_mdev)
> +{
> +	struct ap_matrix shadow_apcb;
> +
> +	vfio_ap_mdev_filter_apcb(matrix_mdev, &shadow_apcb);
> +
> +	if (memcmp(&shadow_apcb, &matrix_mdev->shadow_apcb,
> +		   sizeof(struct ap_matrix)) != 0) {
> +		memcpy(&matrix_mdev->shadow_apcb, &shadow_apcb,
> +		       sizeof(struct ap_matrix));
> +		vfio_ap_mdev_commit_shadow_apcb(matrix_mdev);
> +	}
> +}
> +
>  static int vfio_ap_mdev_create(struct kobject *kobj, struct mdev_device *mdev)
>  {
>  	struct ap_matrix_mdev *matrix_mdev;
> @@ -552,10 +634,6 @@ static ssize_t assign_adapter_store(struct device *dev,
>  	struct mdev_device *mdev = mdev_from_dev(dev);
>  	struct ap_matrix_mdev *matrix_mdev = mdev_get_drvdata(mdev);
>  
> -	/* If the guest is running, disallow assignment of adapter */
> -	if (matrix_mdev->kvm)
> -		return -EBUSY;
> -
>  	ret = kstrtoul(buf, 0, &apid);
>  	if (ret)
>  		return ret;
> @@ -577,6 +655,7 @@ static ssize_t assign_adapter_store(struct device *dev,
>  
>  	set_bit_inv(apid, matrix_mdev->matrix.apm);
>  	vfio_ap_mdev_link_adapter(matrix_mdev, apid);
> +	vfio_ap_mdev_refresh_apcb(matrix_mdev);
>  
>  	mutex_unlock(&matrix_dev->lock);
>  
> @@ -619,10 +698,6 @@ static ssize_t unassign_adapter_store(struct device *dev,
>  	struct mdev_device *mdev = mdev_from_dev(dev);
>  	struct ap_matrix_mdev *matrix_mdev = mdev_get_drvdata(mdev);
>  
> -	/* If the guest is running, disallow un-assignment of adapter */
> -	if (matrix_mdev->kvm)
> -		return -EBUSY;
> -
>  	ret = kstrtoul(buf, 0, &apid);
>  	if (ret)
>  		return ret;
> @@ -633,6 +708,8 @@ static ssize_t unassign_adapter_store(struct device *dev,
>  	mutex_lock(&matrix_dev->lock);
>  	clear_bit_inv((unsigned long)apid, matrix_mdev->matrix.apm);
>  	vfio_ap_mdev_unlink_adapter(matrix_mdev, apid);
> +	vfio_ap_mdev_refresh_apcb(matrix_mdev);
> +
>  	mutex_unlock(&matrix_dev->lock);
>  
>  	return count;
> @@ -691,10 +768,6 @@ static ssize_t assign_domain_store(struct device *dev,
>  	struct ap_matrix_mdev *matrix_mdev = mdev_get_drvdata(mdev);
>  	unsigned long max_apqi = matrix_mdev->matrix.aqm_max;
>  
> -	/* If the guest is running, disallow assignment of domain */
> -	if (matrix_mdev->kvm)
> -		return -EBUSY;
> -
>  	ret = kstrtoul(buf, 0, &apqi);
>  	if (ret)
>  		return ret;
> @@ -715,6 +788,7 @@ static ssize_t assign_domain_store(struct device *dev,
>  
>  	set_bit_inv(apqi, matrix_mdev->matrix.aqm);
>  	vfio_ap_mdev_link_domain(matrix_mdev, apqi);
> +	vfio_ap_mdev_refresh_apcb(matrix_mdev);
>  
>  	mutex_unlock(&matrix_dev->lock);
>  
> @@ -757,10 +831,6 @@ static ssize_t unassign_domain_store(struct device *dev,
>  	struct mdev_device *mdev = mdev_from_dev(dev);
>  	struct ap_matrix_mdev *matrix_mdev = mdev_get_drvdata(mdev);
>  
> -	/* If the guest is running, disallow un-assignment of domain */
> -	if (matrix_mdev->kvm)
> -		return -EBUSY;
> -
>  	ret = kstrtoul(buf, 0, &apqi);
>  	if (ret)
>  		return ret;
> @@ -771,12 +841,24 @@ static ssize_t unassign_domain_store(struct device *dev,
>  	mutex_lock(&matrix_dev->lock);
>  	clear_bit_inv((unsigned long)apqi, matrix_mdev->matrix.aqm);
>  	vfio_ap_mdev_unlink_domain(matrix_mdev, apqi);
> +	vfio_ap_mdev_refresh_apcb(matrix_mdev);
> +
>  	mutex_unlock(&matrix_dev->lock);
>  
>  	return count;
>  }
>  static DEVICE_ATTR_WO(unassign_domain);
>  
> +static void vfio_ap_mdev_hot_plug_cdom(struct ap_matrix_mdev *matrix_mdev,
> +				       unsigned long domid)
> +{
> +	if (!test_bit_inv(domid, matrix_mdev->shadow_apcb.adm) &&
> +	    test_bit_inv(domid, (unsigned long *) matrix_dev->info.adm)) {
> +		set_bit_inv(domid, matrix_mdev->shadow_apcb.adm);
> +		vfio_ap_mdev_commit_shadow_apcb(matrix_mdev);
> +	}
> +}
> +
>  /**
>   * assign_control_domain_store
>   *
> @@ -802,10 +884,6 @@ static ssize_t assign_control_domain_store(struct device *dev,
>  	struct mdev_device *mdev = mdev_from_dev(dev);
>  	struct ap_matrix_mdev *matrix_mdev = mdev_get_drvdata(mdev);
>  
> -	/* If the guest is running, disallow assignment of control domain */
> -	if (matrix_mdev->kvm)
> -		return -EBUSY;
> -
>  	ret = kstrtoul(buf, 0, &id);
>  	if (ret)
>  		return ret;
> @@ -820,12 +898,22 @@ static ssize_t assign_control_domain_store(struct device *dev,
>  	 */
>  	mutex_lock(&matrix_dev->lock);
>  	set_bit_inv(id, matrix_mdev->matrix.adm);
> +	vfio_ap_mdev_hot_plug_cdom(matrix_mdev, id);
>  	mutex_unlock(&matrix_dev->lock);
>  
>  	return count;
>  }
>  static DEVICE_ATTR_WO(assign_control_domain);
>  
> +static void vfio_ap_mdev_hot_unplug_cdom(struct ap_matrix_mdev *matrix_mdev,
> +					unsigned long domid)
> +{
> +	if (test_bit_inv(domid, matrix_mdev->shadow_apcb.adm)) {
> +		clear_bit_inv(domid, matrix_mdev->shadow_apcb.adm);
> +		vfio_ap_mdev_commit_shadow_apcb(matrix_mdev);
> +	}
> +}
> +
>  /**
>   * unassign_control_domain_store
>   *
> @@ -852,10 +940,6 @@ static ssize_t unassign_control_domain_store(struct device *dev,
>  	struct ap_matrix_mdev *matrix_mdev = mdev_get_drvdata(mdev);
>  	unsigned long max_domid =  matrix_mdev->matrix.adm_max;
>  
> -	/* If the guest is running, disallow un-assignment of control domain */
> -	if (matrix_mdev->kvm)
> -		return -EBUSY;
> -
>  	ret = kstrtoul(buf, 0, &domid);
>  	if (ret)
>  		return ret;
> @@ -864,6 +948,7 @@ static ssize_t unassign_control_domain_store(struct device *dev,
>  
>  	mutex_lock(&matrix_dev->lock);
>  	clear_bit_inv(domid, matrix_mdev->matrix.adm);
> +	vfio_ap_mdev_hot_unplug_cdom(matrix_mdev, domid);
>  	mutex_unlock(&matrix_dev->lock);
>  
>  	return count;
> @@ -1089,6 +1174,8 @@ static int vfio_ap_mdev_group_notifier(struct notifier_block *nb,
>  				  matrix_mdev->matrix.aqm,
>  				  matrix_mdev->matrix.adm);
>  
> +	vfio_ap_mdev_commit_shadow_apcb(matrix_mdev);
> +
>  notify_done:
>  	mutex_unlock(&matrix_dev->lock);
>  	return notify_rc;
> @@ -1330,6 +1417,8 @@ int vfio_ap_mdev_probe_queue(struct ap_device *apdev)
>  	q->apqn = to_ap_queue(&apdev->device)->qid;
>  	q->saved_isc = VFIO_AP_ISC_INVALID;
>  	vfio_ap_queue_link_mdev(q);
> +	if (q->matrix_mdev)
> +		vfio_ap_mdev_refresh_apcb(q->matrix_mdev);
>  	mutex_unlock(&matrix_dev->lock);
>  
>  	return 0;
> @@ -1337,6 +1426,7 @@ int vfio_ap_mdev_probe_queue(struct ap_device *apdev)
>  
>  void vfio_ap_mdev_remove_queue(struct ap_device *apdev)
>  {
> +	struct ap_matrix_mdev *matrix_mdev;
>  	struct vfio_ap_queue *q;
>  	int apid, apqi;
>  
> @@ -1347,8 +1437,11 @@ void vfio_ap_mdev_remove_queue(struct ap_device *apdev)
>  	apqi = AP_QID_QUEUE(q->apqn);
>  	vfio_ap_mdev_reset_queue(apid, apqi, 1);
>  
> -	if (q->matrix_mdev)
> +	if (q->matrix_mdev) {
> +		matrix_mdev = q->matrix_mdev;
>  		vfio_ap_mdev_unlink_queue(q);
> +		vfio_ap_mdev_refresh_apcb(matrix_mdev);
> +	}
>  
>  	kfree(q);
>  	mutex_unlock(&matrix_dev->lock);


^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v13 11/15] s390/vfio-ap: implement in-use callback for vfio_ap driver
  2020-12-23  1:16 ` [PATCH v13 11/15] s390/vfio-ap: implement in-use callback for vfio_ap driver Tony Krowiak
@ 2021-01-12  1:20   ` Halil Pasic
  2021-01-12 14:14     ` Matthew Rosato
  0 siblings, 1 reply; 48+ messages in thread
From: Halil Pasic @ 2021-01-12  1:20 UTC (permalink / raw)
  To: Tony Krowiak, mjrosato
  Cc: linux-s390, linux-kernel, kvm, freude, borntraeger, cohuck,
	alex.williamson, kwankhede, fiuczy, frankja, david, hca, gor

On Tue, 22 Dec 2020 20:16:02 -0500
Tony Krowiak <akrowiak@linux.ibm.com> wrote:

> Let's implement the callback to indicate when an APQN
> is in use by the vfio_ap device driver. The callback is
> invoked whenever a change to the apmask or aqmask would
> result in one or more queue devices being removed from the driver. The
> vfio_ap device driver will indicate a resource is in use
> if the APQN of any of the queue devices to be removed are assigned to
> any of the matrix mdevs under the driver's control.
> 
> There is potential for a deadlock condition between the matrix_dev->lock
> used to lock the matrix device during assignment of adapters and domains
> and the ap_perms_mutex locked by the AP bus when changes are made to the
> sysfs apmask/aqmask attributes.
> 
> Consider following scenario (courtesy of Halil Pasic):
> 1) apmask_store() takes ap_perms_mutex
> 2) assign_adapter_store() takes matrix_dev->lock
> 3) apmask_store() calls vfio_ap_mdev_resource_in_use() which tries
>    to take matrix_dev->lock
> 4) assign_adapter_store() calls ap_apqn_in_matrix_owned_by_def_drv
>    which tries to take ap_perms_mutex
> 
> BANG!
> 
> To resolve this issue, instead of using the mutex_lock(&matrix_dev->lock)
> function to lock the matrix device during assignment of an adapter or
> domain to a matrix_mdev as well as during the in_use callback, the
> mutex_trylock(&matrix_dev->lock) function will be used. If the lock is not
> obtained, then the assignment and in_use functions will terminate with
> -EBUSY.
> 
> Signed-off-by: Tony Krowiak <akrowiak@linux.ibm.com>
> ---
>  drivers/s390/crypto/vfio_ap_drv.c     |  1 +
>  drivers/s390/crypto/vfio_ap_ops.c     | 21 ++++++++++++++++++---
>  drivers/s390/crypto/vfio_ap_private.h |  2 ++
>  3 files changed, 21 insertions(+), 3 deletions(-)
> 
[..]
>  }
> +
> +int vfio_ap_mdev_resource_in_use(unsigned long *apm, unsigned long *aqm)
> +{
> +	int ret;
> +
> +	if (!mutex_trylock(&matrix_dev->lock))
> +		return -EBUSY;
> +	ret = vfio_ap_mdev_verify_no_sharing(NULL, apm, aqm);

If we detect that resources are in use, then we spit warnings to the
message log, right?

@Matt: Is your userspace tooling going to guarantee that this will never
happen?
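
As background for the locking discussed in the commit message, here is a
user-space sketch of the trylock pattern, using a pthread mutex as a
stand-in for matrix_dev->lock. The callback backs off with -EBUSY instead
of blocking, so it cannot wait on a lock whose holder may in turn be
waiting for ap_perms_mutex. The names sketch_matrix_lock,
sketch_verify_no_sharing() and sketch_resource_in_use() are hypothetical.

```c
#include <errno.h>
#include <pthread.h>

/* Stand-in for matrix_dev->lock. */
static pthread_mutex_t sketch_matrix_lock = PTHREAD_MUTEX_INITIALIZER;

/* Hypothetical stand-in for the sharing check; always reports "not in use". */
static int sketch_verify_no_sharing(void)
{
	return 0;
}

/*
 * Never block on the matrix lock: if it is already held, give up and
 * report -EBUSY so the caller can retry later.
 */
static int sketch_resource_in_use(void)
{
	int ret;

	if (pthread_mutex_trylock(&sketch_matrix_lock) != 0)
		return -EBUSY;

	ret = sketch_verify_no_sharing();
	pthread_mutex_unlock(&sketch_matrix_lock);

	return ret;
}
```

The trade-off is that a caller can see -EBUSY even though no resource is
shared, simply because the lock was contended at that moment.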

> +	mutex_unlock(&matrix_dev->lock);
> +
> +	return ret;
> +}
> diff --git a/drivers/s390/crypto/vfio_ap_private.h b/drivers/s390/crypto/vfio_ap_private.h
> index d2d26ba18602..15b7cd74843b 100644
> --- a/drivers/s390/crypto/vfio_ap_private.h
> +++ b/drivers/s390/crypto/vfio_ap_private.h
> @@ -107,4 +107,6 @@ struct vfio_ap_queue {
>  int vfio_ap_mdev_probe_queue(struct ap_device *queue);
>  void vfio_ap_mdev_remove_queue(struct ap_device *queue);
>  
> +int vfio_ap_mdev_resource_in_use(unsigned long *apm, unsigned long *aqm);
> +
>  #endif /* _VFIO_AP_PRIVATE_H_ */


^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v13 11/15] s390/vfio-ap: implement in-use callback for vfio_ap driver
  2021-01-12  1:20   ` Halil Pasic
@ 2021-01-12 14:14     ` Matthew Rosato
  2021-01-12 16:49       ` Halil Pasic
  0 siblings, 1 reply; 48+ messages in thread
From: Matthew Rosato @ 2021-01-12 14:14 UTC (permalink / raw)
  To: Halil Pasic, Tony Krowiak
  Cc: linux-s390, linux-kernel, kvm, freude, borntraeger, cohuck,
	alex.williamson, kwankhede, fiuczy, frankja, david, hca, gor

On 1/11/21 8:20 PM, Halil Pasic wrote:
> On Tue, 22 Dec 2020 20:16:02 -0500
> Tony Krowiak <akrowiak@linux.ibm.com> wrote:
> 
>> Let's implement the callback to indicate when an APQN
>> is in use by the vfio_ap device driver. The callback is
>> invoked whenever a change to the apmask or aqmask would
>> result in one or more queue devices being removed from the driver. The
>> vfio_ap device driver will indicate a resource is in use
>> if the APQN of any of the queue devices to be removed are assigned to
>> any of the matrix mdevs under the driver's control.
>>
>> There is potential for a deadlock condition between the matrix_dev->lock
>> used to lock the matrix device during assignment of adapters and domains
>> and the ap_perms_mutex locked by the AP bus when changes are made to the
>> sysfs apmask/aqmask attributes.
>>
>> Consider following scenario (courtesy of Halil Pasic):
>> 1) apmask_store() takes ap_perms_mutex
>> 2) assign_adapter_store() takes matrix_dev->lock
>> 3) apmask_store() calls vfio_ap_mdev_resource_in_use() which tries
>>     to take matrix_dev->lock
>> 4) assign_adapter_store() calls ap_apqn_in_matrix_owned_by_def_drv
>>     which tries to take ap_perms_mutex
>>
>> BANG!
>>
>> To resolve this issue, instead of using the mutex_lock(&matrix_dev->lock)
>> function to lock the matrix device during assignment of an adapter or
>> domain to a matrix_mdev as well as during the in_use callback, the
>> mutex_trylock(&matrix_dev->lock) function will be used. If the lock is not
>> obtained, then the assignment and in_use functions will terminate with
>> -EBUSY.
>>
>> Signed-off-by: Tony Krowiak <akrowiak@linux.ibm.com>
>> ---
>>   drivers/s390/crypto/vfio_ap_drv.c     |  1 +
>>   drivers/s390/crypto/vfio_ap_ops.c     | 21 ++++++++++++++++++---
>>   drivers/s390/crypto/vfio_ap_private.h |  2 ++
>>   3 files changed, 21 insertions(+), 3 deletions(-)
>>
> [..]
>>   }
>> +
>> +int vfio_ap_mdev_resource_in_use(unsigned long *apm, unsigned long *aqm)
>> +{
>> +	int ret;
>> +
>> +	if (!mutex_trylock(&matrix_dev->lock))
>> +		return -EBUSY;
>> +	ret = vfio_ap_mdev_verify_no_sharing(NULL, apm, aqm);
> 
> If we detect that resources are in use, then we spit warnings to the
> message log, right?
> 
> @Matt: Is your userspace tooling going to guarantee that this will never
> happen?

Yes, but only when using the tooling to modify apmask/aqmask.  You would 
still be able to create such a scenario by bypassing the tooling and 
invoking the sysfs interfaces directly.



^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v13 11/15] s390/vfio-ap: implement in-use callback for vfio_ap driver
  2021-01-12 14:14     ` Matthew Rosato
@ 2021-01-12 16:49       ` Halil Pasic
  0 siblings, 0 replies; 48+ messages in thread
From: Halil Pasic @ 2021-01-12 16:49 UTC (permalink / raw)
  To: Matthew Rosato
  Cc: Tony Krowiak, linux-s390, linux-kernel, kvm, freude, borntraeger,
	cohuck, alex.williamson, kwankhede, fiuczy, frankja, david, hca,
	gor

On Tue, 12 Jan 2021 09:14:07 -0500
Matthew Rosato <mjrosato@linux.ibm.com> wrote:

> On 1/11/21 8:20 PM, Halil Pasic wrote:
> > On Tue, 22 Dec 2020 20:16:02 -0500
> > Tony Krowiak <akrowiak@linux.ibm.com> wrote:
> >   
> >> Let's implement the callback to indicate when an APQN
> >> is in use by the vfio_ap device driver. The callback is
> >> invoked whenever a change to the apmask or aqmask would
> >> result in one or more queue devices being removed from the driver. The
> >> vfio_ap device driver will indicate a resource is in use
> >> if the APQN of any of the queue devices to be removed are assigned to
> >> any of the matrix mdevs under the driver's control.
> >>
> >> There is potential for a deadlock condition between the matrix_dev->lock
> >> used to lock the matrix device during assignment of adapters and domains
> >> and the ap_perms_mutex locked by the AP bus when changes are made to the
> >> sysfs apmask/aqmask attributes.
> >>
> >> Consider following scenario (courtesy of Halil Pasic):
> >> 1) apmask_store() takes ap_perms_mutex
> >> 2) assign_adapter_store() takes matrix_dev->lock
> >> 3) apmask_store() calls vfio_ap_mdev_resource_in_use() which tries
> >>     to take matrix_dev->lock
> >> 4) assign_adapter_store() calls ap_apqn_in_matrix_owned_by_def_drv
> >>     which tries to take ap_perms_mutex
> >>
> >> BANG!
> >>
> >> To resolve this issue, instead of using the mutex_lock(&matrix_dev->lock)
> >> function to lock the matrix device during assignment of an adapter or
> >> domain to a matrix_mdev as well as during the in_use callback, the
> >> mutex_trylock(&matrix_dev->lock) function will be used. If the lock is not
> >> obtained, then the assignment and in_use functions will terminate with
> >> -EBUSY.
> >>
> >> Signed-off-by: Tony Krowiak <akrowiak@linux.ibm.com>
> >> ---
> >>   drivers/s390/crypto/vfio_ap_drv.c     |  1 +
> >>   drivers/s390/crypto/vfio_ap_ops.c     | 21 ++++++++++++++++++---
> >>   drivers/s390/crypto/vfio_ap_private.h |  2 ++
> >>   3 files changed, 21 insertions(+), 3 deletions(-)
> >>  
> > [..]  
> >>   }
> >> +
> >> +int vfio_ap_mdev_resource_in_use(unsigned long *apm, unsigned long *aqm)
> >> +{
> >> +	int ret;
> >> +
> >> +	if (!mutex_trylock(&matrix_dev->lock))
> >> +		return -EBUSY;
> >> +	ret = vfio_ap_mdev_verify_no_sharing(NULL, apm, aqm);  
> > 
> > If we detect that resources are in use, then we spit warnings to the
> > message log, right?
> > 
> > @Matt: Is your userspace tooling going to guarantee that this will never
> > happen?  
> 
> Yes, but only when using the tooling to modify apmask/aqmask.  You would 
> still be able to create such a scenario by bypassing the tooling and 
> invoking the sysfs interfaces directly.
> 
> 

Since, I suppose, the tooling is going to catch this anyway and produce
much better feedback to the user, I believe we should be fine degrading
the severity to info or debug.

I would prefer not producing a warning here, because I believe it is
likely to do more harm than good (it implies a kernel problem, and I
don't think anyone will conclude from the message that it is a userspace
problem). But if everybody else agrees that we want a warning here, then
I can live with that as well.

Regards,
Halil

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v13 10/15] s390/zcrypt: driver callback to indicate resource in use
  2020-12-23  1:16 ` [PATCH v13 10/15] s390/zcrypt: driver callback to indicate resource in use Tony Krowiak
@ 2021-01-12 16:50   ` Halil Pasic
  0 siblings, 0 replies; 48+ messages in thread
From: Halil Pasic @ 2021-01-12 16:50 UTC (permalink / raw)
  To: Tony Krowiak
  Cc: linux-s390, linux-kernel, kvm, freude, borntraeger, cohuck,
	mjrosato, alex.williamson, kwankhede, fiuczy, frankja, david,
	hca, gor

On Tue, 22 Dec 2020 20:16:01 -0500
Tony Krowiak <akrowiak@linux.ibm.com> wrote:

> Introduces a new driver callback to prevent a root user from unbinding
> an AP queue from its device driver if the queue is in use. The callback
> will be invoked whenever a change to the AP bus's sysfs apmask or aqmask
> attributes would result in one or more AP queues being removed from its
> driver. If the callback responds in the affirmative for any driver
> queried, the change to the apmask or aqmask will be rejected with a device
> busy error.
> 
> For this patch, only non-default drivers will be queried. Currently,
> there is only one non-default driver, the vfio_ap device driver. The
> vfio_ap device driver facilitates pass-through of an AP queue to a
> guest. The idea here is that a guest may be administered by a different
> sysadmin than the host and we don't want AP resources to unexpectedly
> disappear from a guest's AP configuration (i.e., adapters and domains
> assigned to the matrix mdev). This will enforce the proper procedure for
> removing AP resources intended for guest usage which is to
> first unassign them from the matrix mdev, then unbind them from the
> vfio_ap device driver.
> 
> Signed-off-by: Tony Krowiak <akrowiak@linux.ibm.com>
> Reviewed-by: Harald Freudenberger <freude@linux.ibm.com>

Reviewed-by: Halil Pasic <pasic@linux.ibm.com>

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v13 12/15] s390/zcrypt: Notify driver on config changed and scan complete callbacks
  2020-12-23  1:16 ` [PATCH v13 12/15] s390/zcrypt: Notify driver on config changed and scan complete callbacks Tony Krowiak
@ 2021-01-12 16:58   ` Halil Pasic
  0 siblings, 0 replies; 48+ messages in thread
From: Halil Pasic @ 2021-01-12 16:58 UTC (permalink / raw)
  To: Tony Krowiak
  Cc: linux-s390, linux-kernel, kvm, freude, borntraeger, cohuck,
	mjrosato, alex.williamson, kwankhede, fiuczy, frankja, david,
	hca, gor

On Tue, 22 Dec 2020 20:16:03 -0500
Tony Krowiak <akrowiak@linux.ibm.com> wrote:

> This patch introduces an extension to the ap bus to notify device drivers
> when the host AP configuration changes - i.e., adapters, domains or
> control domains are added or removed. To that end, two new callbacks are
> introduced for AP device drivers:
> 
>   void (*on_config_changed)(struct ap_config_info *new_config_info,
>                             struct ap_config_info *old_config_info);
> 
>      This callback is invoked at the start of the AP bus scan
>      function when it determines that the host AP configuration information
>      has changed since the previous scan. This is done by storing
>      an old and current QCI info struct and comparing them. If there is any
>      difference, the callback is invoked.
> 
>      Note that when the AP bus scan detects that AP adapters, domains or
>      control domains have been removed from the host's AP configuration, it
>      will remove the associated devices from the AP bus subsystem's device
>      model. This callback gives the device driver a chance to respond to
>      the removal of the AP devices from the host configuration prior to
>      calling the device driver's remove callback. The primary purpose of
>      this callback is to allow the vfio_ap driver to do a bulk unplug of
>      all affected adapters, domains and control domains from affected
>      guests rather than unplugging them one at a time when the remove
>      callback is invoked.
> 
>   void (*on_scan_complete)(struct ap_config_info *new_config_info,
>                            struct ap_config_info *old_config_info);
> 
>      The on_scan_complete callback is invoked after the ap bus scan is
>      complete if the host AP configuration data has changed.
> 
>      Note that when the AP bus scan detects that adapters, domains or
>      control domains have been added to the host's configuration, it will
>      create new devices in the AP bus subsystem's device model. The primary
>      purpose of this callback is to allow the vfio_ap driver to do a bulk
>      plug of all affected adapters, domains and control domains into
>      affected guests rather than plugging them one at a time when the
>      probe callback is invoked.
> 
> Please note that changes to the apmask and aqmask do not trigger
> these two callbacks since the bus scan function is not invoked by changes
> to those masks.
> 
> Signed-off-by: Harald Freudenberger <freude@linux.ibm.com>
> Signed-off-by: Tony Krowiak <akrowiak@linux.ibm.com>

Reviewed-by: Halil Pasic <pasic@linux.ibm.com>
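
The callback sequence described above can be illustrated with a small
user-space sketch. Everything in it is an assumption made for the
illustration: struct cfg_info is a three-byte stand-in for
struct ap_config_info, and bus_scan(), count_changed() and
count_complete() are hypothetical names. It only shows the ordering the
commit message describes: compare the old and new configuration, call
on_config_changed before devices are touched, and call on_scan_complete
afterwards, both only when something changed.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct ap_config_info: just three masks. */
struct cfg_info {
	unsigned char apm, aqm, adm;
};

struct drv_ops {
	void (*on_config_changed)(struct cfg_info *new_cfg,
				  struct cfg_info *old_cfg);
	void (*on_scan_complete)(struct cfg_info *new_cfg,
				 struct cfg_info *old_cfg);
};

/* Counters that let a caller observe which callbacks fired. */
static int changed_calls, complete_calls;

static void count_changed(struct cfg_info *n, struct cfg_info *o)
{
	(void)n;
	(void)o;
	changed_calls++;
}

static void count_complete(struct cfg_info *n, struct cfg_info *o)
{
	(void)n;
	(void)o;
	complete_calls++;
}

/* One simulated bus scan: 'fetched' is the configuration just read. */
static void bus_scan(const struct drv_ops *ops, struct cfg_info *cur,
		     const struct cfg_info *fetched)
{
	struct cfg_info old;
	int changed;

	assert(ops && cur && fetched);
	old = *cur;
	changed = old.apm != fetched->apm || old.aqm != fetched->aqm ||
		  old.adm != fetched->adm;
	*cur = *fetched;

	if (changed && ops->on_config_changed)
		ops->on_config_changed(cur, &old);

	/* Devices would be created, removed, probed and unbound here. */

	if (changed && ops->on_scan_complete)
		ops->on_scan_complete(cur, &old);
}
```

A scan that reads an unchanged configuration fires neither callback; a
scan that sees a new adapter fires each of them exactly once.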

[..]


* Re: [PATCH v13 09/15] s390/vfio-ap: allow hot plug/unplug of AP resources using mdev device
  2021-01-12  1:12   ` Halil Pasic
@ 2021-01-12 17:55     ` Halil Pasic
  2021-02-01 14:41       ` Tony Krowiak
  2021-02-03 23:13       ` Tony Krowiak
  0 siblings, 2 replies; 48+ messages in thread
From: Halil Pasic @ 2021-01-12 17:55 UTC (permalink / raw)
  To: Tony Krowiak
  Cc: linux-s390, linux-kernel, kvm, freude, borntraeger, cohuck,
	mjrosato, alex.williamson, kwankhede, fiuczy, frankja, david,
	hca, gor

On Tue, 12 Jan 2021 02:12:51 +0100
Halil Pasic <pasic@linux.ibm.com> wrote:

> > @@ -1347,8 +1437,11 @@ void vfio_ap_mdev_remove_queue(struct ap_device *apdev)
> >  	apqi = AP_QID_QUEUE(q->apqn);
> >  	vfio_ap_mdev_reset_queue(apid, apqi, 1);
> >  
> > -	if (q->matrix_mdev)
> > +	if (q->matrix_mdev) {
> > +		matrix_mdev = q->matrix_mdev;
> >  		vfio_ap_mdev_unlink_queue(q);
> > +		vfio_ap_mdev_refresh_apcb(matrix_mdev);
> > +	}
> >  
> >  	kfree(q);
> >  	mutex_unlock(&matrix_dev->lock);  

Shouldn't we first remove the queue from the APCB and then
reset? Sorry, I missed this one yesterday.

Regards,
Halil


* Re: [PATCH v13 13/15] s390/vfio-ap: handle host AP config change notification
  2020-12-23  1:16 ` [PATCH v13 13/15] s390/vfio-ap: handle host AP config change notification Tony Krowiak
@ 2021-01-12 18:39   ` Halil Pasic
  0 siblings, 0 replies; 48+ messages in thread
From: Halil Pasic @ 2021-01-12 18:39 UTC (permalink / raw)
  To: Tony Krowiak
  Cc: linux-s390, linux-kernel, kvm, freude, borntraeger, cohuck,
	mjrosato, alex.williamson, kwankhede, fiuczy, frankja, david,
	hca, gor

On Tue, 22 Dec 2020 20:16:04 -0500
Tony Krowiak <akrowiak@linux.ibm.com> wrote:

> The motivation for config change notification is to enable the vfio_ap
> device driver to handle hot plug/unplug of AP queues for a KVM guest as a
> bulk operation. For example, if a new APID is dynamically assigned to the
> host configuration, then a queue device will be created for each APQN that
> can be formulated from the new APID and all APQIs already assigned to the
> host configuration. Each of these new queue devices will get bound to their
> respective driver one at a time, as they are created. In the case of the
> vfio_ap driver, if the APQN of the queue device being bound to the driver
> is assigned to a matrix mdev in use by a KVM guest, it will be hot plugged
> into the guest if possible. Given that the AP architecture allows for 256
> adapters and 256 domains, one can see the possibility of the vfio_ap
> driver's probe/remove callbacks getting invoked an inordinate number of
> times when the host configuration changes. Keep in mind that in order to
> plug/unplug an AP queue for a guest, the guest's VCPUs must be suspended,
> then the guest's AP configuration must be updated followed by the VCPUs
> being resumed. If this is done each time the probe or remove callback is
> invoked and there are hundreds or thousands of queues to be probed or
> removed, this would be incredibly inefficient and could have a large impact
> on guest performance. What the config notification does is allow us to
> make the changes to the guest in a single operation.
> 
> This patch implements the on_config_changed callback which notifies the
> AP device drivers that the host AP configuration has changed (i.e.,
> adapters, domains and/or control domains are added to or removed from the
> host AP configuration).
> 
> Adapters added to host configuration:
> * The APIDs of the adapters added will be stored in a bitmap contained
>   within the struct representing the matrix device which is the parent
>   device of all matrix mediated devices.
> * When a queue is probed, if the APQN of the queue being probed is
>   assigned to an mdev in use by a guest, the queue may get hot plugged
>   into the guest; however, if the APID of the adapter is contained in the
>   bitmap of adapters added, the queue hot plug operation will be skipped
>   until the AP bus notifies the driver that its scan operation has
>   completed (another patch).

I guess, I should be able to find this in patch 14. But I can't.

> * When the vfio_ap driver is notified that the AP bus scan has completed,
>   the guest's APCB will be refreshed by filtering the mdev's matrix by
>   APID.
> 
> Domains added to host configuration:
> * The APQIs of the domains added will be stored in a bitmap contained
>   within the struct representing the matrix device which is the parent
>   device of all matrix mediated devices.
> * When a queue is probed, if the APQN of the queue being probed is
>   assigned to an mdev in use by a guest, the queue may get hot plugged
>   into the guest; however, if the APQI of the domain is contained in the
>   bitmap of domains added, the queue hot plug operation will be skipped
>   until the AP bus notifies the driver that its scan operation has
>   completed (another patch).
> 
> Control domains added to the host configuration:
> * The domain numbers of the domains added will be stored in a bitmap
>   contained within the struct representing the matrix device which is the
>   parent device of all matrix mediated devices.
> 
> When the vfio_ap device driver is notified that the AP bus scan has
> completed, the APCB for each matrix mdev to which the adapters, domains
> and control domains added are assigned will be refreshed. If a KVM guest is
> using the matrix mdev, the APCB will be hot plugged into the guest to
> refresh its AP configuration.
> 
> Adapters removed from configuration:
> * Each queue device with the APID identifying an adapter removed from
>   the host AP configuration will be unlinked from the matrix mdev to which
>   the queue's APQN is assigned.
> * When the vfio_ap driver's remove callback is invoked, if the queue
>   device is not linked to the matrix mdev, the refresh of the guest's
>   APCB will be skipped.
> 
> Domains removed from configuration:
> * Each queue device with the APQI identifying a domain removed from
>   the host AP configuration will be unlinked from the matrix mdev to which
>   the queue's APQN is assigned.
> * When the vfio_ap driver's remove callback is invoked, if the queue
>   device is not linked to the matrix mdev, the refresh of the guest's
>   APCB will be skipped.
> 
> If any queues with an APQN assigned to a given matrix mdev have been
> unlinked or any control domains assigned to a given matrix mdev have been
> removed from the host AP configuration, the APCB of the matrix mdev will
> be refreshed. If a KVM guest is using the matrix mdev, the APCB will be hot
> plugged into the guest to refresh its AP configuration.
> 
> Signed-off-by: Tony Krowiak <akrowiak@linux.ibm.com>
> ---
[..]
> +static void vfio_ap_mdev_on_cfg_add(void)
> +{
> +	unsigned long *cur_apm, *cur_aqm, *cur_adm;
> +	unsigned long *prev_apm, *prev_aqm, *prev_adm;
> +
> +	cur_apm = (unsigned long *)matrix_dev->config_info.apm;
> +	cur_aqm = (unsigned long *)matrix_dev->config_info.aqm;
> +	cur_adm = (unsigned long *)matrix_dev->config_info.adm;
> +
> +	prev_apm = (unsigned long *)matrix_dev->config_info_prev.apm;
> +	prev_aqm = (unsigned long *)matrix_dev->config_info_prev.aqm;
> +	prev_adm = (unsigned long *)matrix_dev->config_info_prev.adm;
> +
> +	bitmap_andnot(matrix_dev->ap_add, cur_apm, prev_apm, AP_DEVICES);
> +	bitmap_andnot(matrix_dev->aq_add, cur_aqm, prev_aqm, AP_DOMAINS);
> +	bitmap_andnot(matrix_dev->ad_add, cur_adm, prev_adm, AP_DOMAINS);
> +}

This function seems useless without the next patch, but I don't mind, it
can stay here.
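
For reference, what the three bitmap_andnot() calls compute is "set now,
but not set before". A minimal user-space sketch of that operation follows.
It uses plain uint64_t words instead of the kernel bitmap API and ignores
the inverted bit numbering of the AP masks; NWORDS, mask_added() and
mask_intersects() are names made up for the sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NWORDS 4 /* 4 x 64 bits = 256 adapters or domains */

/* dst = cur & ~prev: bits that are set now but were not set before. */
static void mask_added(uint64_t *dst, const uint64_t *cur,
		       const uint64_t *prev)
{
	assert(dst && cur && prev);

	for (size_t i = 0; i < NWORDS; i++)
		dst[i] = cur[i] & ~prev[i];
}

/* Returns 1 if the two masks share at least one set bit, else 0. */
static int mask_intersects(const uint64_t *a, const uint64_t *b)
{
	assert(a && b);

	for (size_t i = 0; i < NWORDS; i++) {
		if (a[i] & b[i])
			return 1;
	}
	return 0;
}
```

The result of mask_added() is only useful once something consumes it, for
example by intersecting it with the mask of each mdev, which is what the
scan-complete handler in the next patch does.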

Regards,
Halil
[..]


* Re: [PATCH v13 14/15] s390/vfio-ap: handle AP bus scan completed notification
  2020-12-23  1:16 ` [PATCH v13 14/15] s390/vfio-ap: handle AP bus scan completed notification Tony Krowiak
@ 2021-01-12 18:44   ` Halil Pasic
  0 siblings, 0 replies; 48+ messages in thread
From: Halil Pasic @ 2021-01-12 18:44 UTC (permalink / raw)
  To: Tony Krowiak
  Cc: linux-s390, linux-kernel, kvm, freude, borntraeger, cohuck,
	mjrosato, alex.williamson, kwankhede, fiuczy, frankja, david,
	hca, gor

On Tue, 22 Dec 2020 20:16:05 -0500
Tony Krowiak <akrowiak@linux.ibm.com> wrote:

> Implements the driver callback invoked by the AP bus when the AP bus
> scan has completed. Since this callback is invoked after binding the newly
> added devices to their respective device drivers, the vfio_ap driver will
> attempt to hot plug the adapters, domains and control domains into each
> guest using the matrix mdev to which they are assigned. Keep in mind that
> an adapter or domain can be plugged in only if:
> * Each APQN derived from the newly added APID of the adapter and the APQIs
>   already assigned to the guest's APCB references an AP queue device bound
>   to the vfio_ap driver
> * Each APQN derived from the newly added APQI of the domain and the APIDs
>   already assigned to the guest's APCB references an AP queue device bound
>   to the vfio_ap driver

As stated in my comment to your previous patch, I don't see the promised
mechanism for delaying hotplug (from probe). Without that we can't
consolidate, and the handling of on_scan_complete() is useless, because
the hotplugs are already done.
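
To illustrate the kind of mechanism meant here, a rough user-space sketch
follows. It is not taken from the series: struct matrix_dev_sim,
probe_queue() and scan_complete() are hypothetical, and a 32-bit mask
stands in for the ap_add bitmap. The point is only that probe has to skip
the per-queue guest update while the adapter is marked as newly added, so
that the scan-complete handler can do a single update.

```c
#include <assert.h>
#include <stdint.h>

struct matrix_dev_sim {
	uint32_t ap_add;   /* adapters added during the current scan */
	int guest_updates; /* how often the guest's config was refreshed */
};

/* Probe of one queue: defer the guest update while its adapter is pending. */
static void probe_queue(struct matrix_dev_sim *m, unsigned int apid)
{
	assert(m && apid < 32);

	if (m->ap_add & (UINT32_C(1) << apid))
		return; /* left for scan_complete() to handle in bulk */

	m->guest_updates++; /* immediate, per-queue hot plug */
}

/* End of scan: one consolidated update if anything was deferred. */
static void scan_complete(struct matrix_dev_sim *m)
{
	assert(m);

	if (m->ap_add)
		m->guest_updates++;
	m->ap_add = 0;
}
```

With adapter 3 marked as added, sixteen probes followed by scan_complete()
result in one guest update instead of sixteen. Without the check in
probe_queue(), the final update would have nothing left to consolidate.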

Regards,
Halil

> 
> Signed-off-by: Tony Krowiak <akrowiak@linux.ibm.com>
> ---
>  drivers/s390/crypto/vfio_ap_drv.c     |  1 +
>  drivers/s390/crypto/vfio_ap_ops.c     | 21 +++++++++++++++++++++
>  drivers/s390/crypto/vfio_ap_private.h |  2 ++
>  3 files changed, 24 insertions(+)
> 
> diff --git a/drivers/s390/crypto/vfio_ap_drv.c b/drivers/s390/crypto/vfio_ap_drv.c
> index 2029d8392416..075495fc44c0 100644
> --- a/drivers/s390/crypto/vfio_ap_drv.c
> +++ b/drivers/s390/crypto/vfio_ap_drv.c
> @@ -149,6 +149,7 @@ static int __init vfio_ap_init(void)
>  	vfio_ap_drv.remove = vfio_ap_mdev_remove_queue;
>  	vfio_ap_drv.in_use = vfio_ap_mdev_resource_in_use;
>  	vfio_ap_drv.on_config_changed = vfio_ap_on_cfg_changed;
> +	vfio_ap_drv.on_scan_complete = vfio_ap_on_scan_complete;
>  	vfio_ap_drv.ids = ap_queue_ids;
>  
>  	ret = ap_driver_register(&vfio_ap_drv, THIS_MODULE, VFIO_AP_DRV_NAME);
> diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c
> index 8bbbd1dc7546..b8ed01297812 100644
> --- a/drivers/s390/crypto/vfio_ap_ops.c
> +++ b/drivers/s390/crypto/vfio_ap_ops.c
> @@ -1592,3 +1592,24 @@ void vfio_ap_on_cfg_changed(struct ap_config_info *new_config_info,
>  	vfio_ap_mdev_on_cfg_add();
>  	mutex_unlock(&matrix_dev->lock);
>  }
> +
> +void vfio_ap_on_scan_complete(struct ap_config_info *new_config_info,
> +			      struct ap_config_info *old_config_info)
> +{
> +	struct ap_matrix_mdev *matrix_mdev;
> +
> +	mutex_lock(&matrix_dev->lock);
> +	list_for_each_entry(matrix_mdev, &matrix_dev->mdev_list, node) {
> +		if (bitmap_intersects(matrix_mdev->matrix.apm,
> +				      matrix_dev->ap_add, AP_DEVICES) ||
> +		    bitmap_intersects(matrix_mdev->matrix.aqm,
> +				      matrix_dev->aq_add, AP_DOMAINS) ||
> +		    bitmap_intersects(matrix_mdev->matrix.adm,
> +				      matrix_dev->ad_add, AP_DOMAINS))
> +			vfio_ap_mdev_refresh_apcb(matrix_mdev);
> +	}
> +
> +	bitmap_clear(matrix_dev->ap_add, 0, AP_DEVICES);
> +	bitmap_clear(matrix_dev->aq_add, 0, AP_DOMAINS);
> +	mutex_unlock(&matrix_dev->lock);
> +}
> diff --git a/drivers/s390/crypto/vfio_ap_private.h b/drivers/s390/crypto/vfio_ap_private.h
> index b99b68968447..7f0f7c92e686 100644
> --- a/drivers/s390/crypto/vfio_ap_private.h
> +++ b/drivers/s390/crypto/vfio_ap_private.h
> @@ -117,5 +117,7 @@ int vfio_ap_mdev_resource_in_use(unsigned long *apm, unsigned long *aqm);
>  
>  void vfio_ap_on_cfg_changed(struct ap_config_info *new_config_info,
>  			    struct ap_config_info *old_config_info);
> +void vfio_ap_on_scan_complete(struct ap_config_info *new_config_info,
> +			      struct ap_config_info *old_config_info);
>  
>  #endif /* _VFIO_AP_PRIVATE_H_ */



* Re: [PATCH v13 02/15] s390/vfio-ap: No need to disable IRQ after queue reset
  2021-01-11 16:32   ` Halil Pasic
@ 2021-01-13 17:06     ` Tony Krowiak
  2021-01-13 21:21       ` Halil Pasic
  0 siblings, 1 reply; 48+ messages in thread
From: Tony Krowiak @ 2021-01-13 17:06 UTC (permalink / raw)
  To: Halil Pasic
  Cc: linux-s390, linux-kernel, kvm, freude, borntraeger, cohuck,
	mjrosato, alex.williamson, kwankhede, fiuczy, frankja, david,
	hca, gor



On 1/11/21 11:32 AM, Halil Pasic wrote:
> On Tue, 22 Dec 2020 20:15:53 -0500
> Tony Krowiak <akrowiak@linux.ibm.com> wrote:
>
>> The queues assigned to a matrix mediated device are currently reset when:
>>
>> * The VFIO_DEVICE_RESET ioctl is invoked
>> * The mdev fd is closed by userspace (QEMU)
>> * The mdev is removed from sysfs.
>>
>> Immediately after the reset of a queue, a call is made to disable
>> interrupts for the queue. This is entirely unnecessary because the reset of
>> a queue disables interrupts, so this will be removed.
>>
>> Signed-off-by: Tony Krowiak <akrowiak@linux.ibm.com>
>> ---
>>   drivers/s390/crypto/vfio_ap_drv.c     |  1 -
>>   drivers/s390/crypto/vfio_ap_ops.c     | 40 +++++++++++++++++----------
>>   drivers/s390/crypto/vfio_ap_private.h |  1 -
>>   3 files changed, 26 insertions(+), 16 deletions(-)
>>
>> diff --git a/drivers/s390/crypto/vfio_ap_drv.c b/drivers/s390/crypto/vfio_ap_drv.c
>> index be2520cc010b..ca18c91afec9 100644
>> --- a/drivers/s390/crypto/vfio_ap_drv.c
>> +++ b/drivers/s390/crypto/vfio_ap_drv.c
>> @@ -79,7 +79,6 @@ static void vfio_ap_queue_dev_remove(struct ap_device *apdev)
>>   	apid = AP_QID_CARD(q->apqn);
>>   	apqi = AP_QID_QUEUE(q->apqn);
>>   	vfio_ap_mdev_reset_queue(apid, apqi, 1);
>> -	vfio_ap_irq_disable(q);
>>   	kfree(q);
>>   	mutex_unlock(&matrix_dev->lock);
>>   }
>> diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c
>> index 7339043906cf..052f61391ec7 100644
>> --- a/drivers/s390/crypto/vfio_ap_ops.c
>> +++ b/drivers/s390/crypto/vfio_ap_ops.c
>> @@ -25,6 +25,7 @@
>>   #define VFIO_AP_MDEV_NAME_HWVIRT "VFIO AP Passthrough Device"
>>   
>>   static int vfio_ap_mdev_reset_queues(struct mdev_device *mdev);
>> +static struct vfio_ap_queue *vfio_ap_find_queue(int apqn);
>>   
>>   static int match_apqn(struct device *dev, const void *data)
>>   {
>> @@ -49,20 +50,15 @@ static struct vfio_ap_queue *vfio_ap_get_queue(
>>   					int apqn)
>>   {
>>   	struct vfio_ap_queue *q;
>> -	struct device *dev;
>>   
>>   	if (!test_bit_inv(AP_QID_CARD(apqn), matrix_mdev->matrix.apm))
>>   		return NULL;
>>   	if (!test_bit_inv(AP_QID_QUEUE(apqn), matrix_mdev->matrix.aqm))
>>   		return NULL;
>>   
>> -	dev = driver_find_device(&matrix_dev->vfio_ap_drv->driver, NULL,
>> -				 &apqn, match_apqn);
>> -	if (!dev)
>> -		return NULL;
>> -	q = dev_get_drvdata(dev);
>> -	q->matrix_mdev = matrix_mdev;
>> -	put_device(dev);
>> +	q = vfio_ap_find_queue(apqn);
>> +	if (q)
>> +		q->matrix_mdev = matrix_mdev;
>>   
>>   	return q;
>>   }
>> @@ -1126,24 +1122,27 @@ static int vfio_ap_mdev_group_notifier(struct notifier_block *nb,
>>   	return notify_rc;
>>   }
>>   
>> -static void vfio_ap_irq_disable_apqn(int apqn)
>> +static struct vfio_ap_queue *vfio_ap_find_queue(int apqn)
>>   {
>>   	struct device *dev;
>> -	struct vfio_ap_queue *q;
>> +	struct vfio_ap_queue *q = NULL;
>>   
>>   	dev = driver_find_device(&matrix_dev->vfio_ap_drv->driver, NULL,
>>   				 &apqn, match_apqn);
>>   	if (dev) {
>>   		q = dev_get_drvdata(dev);
>> -		vfio_ap_irq_disable(q);
>>   		put_device(dev);
>>   	}
>> +
>> +	return q;
>>   }
> This hunk and the previous one are a rewrite of vfio_ap_get_queue() and
> have next to nothing to do with the patch's objective. If we were at an
> earlier stage, I would ask to split it up.

The rewrite of vfio_ap_get_queue() definitely is related to this
patch's objective. Below, in the vfio_ap_mdev_reset_queue()
function, there is the label 'free_aqic_resources', which is where
the vfio_ap_free_aqic_resources() function is called.
That function takes a struct vfio_ap_queue as an argument,
so the object needs to be retrieved prior to calling the function.
We can't use the vfio_ap_get_queue() function for two reasons:
1. The vfio_ap_get_queue() function takes a struct ap_matrix_mdev
     as a parameter and we do not have a pointer to such at the time.
2. The vfio_ap_get_queue() function is used to link the mdev to the
     vfio_ap_queue object with the specified APQN.
So, we needed a way to retrieve the vfio_ap_queue object by its
APQN only. Rather than creating a function that retrieves the
vfio_ap_queue object which duplicates the retrieval code in
vfio_ap_get_queue(), I created the vfio_ap_find_queue()
function to do just that and modified the vfio_ap_get_queue()
function to call it (i.e., code reuse).
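
The split between the two functions can be pictured with the following
user-space sketch. It is a simplified illustration, not the driver code:
the static bound_queues[] table stands in for the driver's device list,
struct queue_sim and struct mdev_sim are hypothetical types, and the check
of the mdev's matrix done by the real vfio_ap_get_queue() is left out.

```c
#include <assert.h>
#include <stddef.h>

/* Same encoding idea as AP_MKQID(): adapter in bits 8-15, domain in bits 0-7. */
#define MKQID(card, queue) ((((card) & 0xff) << 8) | ((queue) & 0xff))

struct mdev_sim {
	int id;
};

struct queue_sim {
	int apqn;
	struct mdev_sim *matrix_mdev;
};

/* Stand-in for the queue devices currently bound to the driver. */
static struct queue_sim bound_queues[] = {
	{ MKQID(1, 4), NULL },
	{ MKQID(2, 4), NULL },
};

/* Pure lookup by APQN, no side effects. */
static struct queue_sim *find_queue(int apqn)
{
	size_t n = sizeof(bound_queues) / sizeof(bound_queues[0]);

	for (size_t i = 0; i < n; i++) {
		if (bound_queues[i].apqn == apqn)
			return &bound_queues[i];
	}
	return NULL;
}

/* Lookup plus linking the queue to the given mdev. */
static struct queue_sim *get_queue(struct mdev_sim *mdev, int apqn)
{
	struct queue_sim *q;

	assert(mdev);
	q = find_queue(apqn);
	if (q)
		q->matrix_mdev = mdev;
	return q;
}
```

A caller that only needs the queue object, such as the reset path, uses
find_queue() and leaves the link untouched; get_queue() stays the only
place that sets matrix_mdev.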


>
>>   
>>   int vfio_ap_mdev_reset_queue(unsigned int apid, unsigned int apqi,
>>   			     unsigned int retry)
>>   {
>>   	struct ap_queue_status status;
>> +	struct vfio_ap_queue *q;
>> +	int ret;
>>   	int retry2 = 2;
>>   	int apqn = AP_MKQID(apid, apqi);
>>   
>> @@ -1156,18 +1155,32 @@ int vfio_ap_mdev_reset_queue(unsigned int apid, unsigned int apqi,
>>   				status = ap_tapq(apqn, NULL);
>>   			}
>>   			WARN_ON_ONCE(retry2 <= 0);
>> -			return 0;
>> +			ret = 0;
>> +			goto free_aqic_resources;
>>   		case AP_RESPONSE_RESET_IN_PROGRESS:
>>   		case AP_RESPONSE_BUSY:
>>   			msleep(20);
>>   			break;
>>   		default:
>>   			/* things are really broken, give up */
>> -			return -EIO;
>> +			ret = -EIO;
>> +			goto free_aqic_resources;
> Do we really want the unpin here? I mean the reset did not work and
> we are giving up. So the irqs are potentially still enabled.
>
> Without this patch we try to disable the interrupts using AQIC, and
> do the cleanup after that.

If the reset failure lands here, then a subsequent AQIC will
also fail, so I see no reason to expend processing time for
something that will ultimately fail anyways.

>
> I'm aware, the comment says we should not take the default branch,
> but if that's really the case we should IMHO log an error and leak the
> page.

I do not see a good reason to leak the page, what purpose would
it serve? I don't have a problem with logging an error, do you think
it should just be a log message or a WARN_ON type of thing?

>
> It's up to you if you want to change this. I don't want to delay the
> series any further than absolutely necessary.
>
> Acked-by: Halil Pasic <pasic@linux.ibm.com>
>
>>   		}
>>   	} while (retry--);
>>   
>>   	return -EBUSY;
>> +
>> +free_aqic_resources:
>> +	/*
>> +	 * In order to free the aqic resources, the queue must be linked to
>> +	 * the matrix_mdev to which its APQN is assigned and the KVM pointer
>> +	 * must be available.
>> +	 */
>> +	q = vfio_ap_find_queue(apqn);
>> +	if (q && q->matrix_mdev && q->matrix_mdev->kvm)
> Is this of the type "we know there are no aqic resources to be freed" if
> precondition is false?

Yes

>
> vfio_ap_free_aqic_resources() checks the matrix_mdev pointer but not the
> kvm pointer. Could we just check the kvm pointer in
> vfio_ap_free_aqic_resources()?

A while back I posted a patch that did just that and someone pushed back
because they could not see how the vfio_ap_free_aqic_resources()
function would ever be called with a NULL kvm pointer which is
why I implemented the above check. The reset is called
when the mdev is removed which can happen only when there
is no kvm pointer, so I agree it would be better to check the kvm
pointer in the vfio_ap_free_aqic_resources() function.

>
> At the end of the series, is seeing !q indicating a bug, or is it
> something we expect to see under certain circumstances?

I'm not quite sure to what you are referring regarding "the
end of the series", but we can expect to see a NULL pointer
for q if a queue is manually unbound from the driver.

>
>
>> +		vfio_ap_free_aqic_resources(q);
>> +
>> +	return ret;
>>   }
>>   
>>   static int vfio_ap_mdev_reset_queues(struct mdev_device *mdev)
>> @@ -1189,7 +1202,6 @@ static int vfio_ap_mdev_reset_queues(struct mdev_device *mdev)
>>   			 */
>>   			if (ret)
>>   				rc = ret;
>> -			vfio_ap_irq_disable_apqn(AP_MKQID(apid, apqi));
>>   		}
>>   	}
>>   
>> diff --git a/drivers/s390/crypto/vfio_ap_private.h b/drivers/s390/crypto/vfio_ap_private.h
>> index f46dde56b464..0db6fb3d56d5 100644
>> --- a/drivers/s390/crypto/vfio_ap_private.h
>> +++ b/drivers/s390/crypto/vfio_ap_private.h
>> @@ -100,5 +100,4 @@ struct vfio_ap_queue {
>>   #define VFIO_AP_ISC_INVALID 0xff
>>   	unsigned char saved_isc;
>>   };
>> -struct ap_queue_status vfio_ap_irq_disable(struct vfio_ap_queue *q);
>>   #endif /* _VFIO_AP_PRIVATE_H_ */



* Re: [PATCH v13 02/15] s390/vfio-ap: No need to disable IRQ after queue reset
  2021-01-13 17:06     ` Tony Krowiak
@ 2021-01-13 21:21       ` Halil Pasic
  2021-01-14  0:46         ` Tony Krowiak
  0 siblings, 1 reply; 48+ messages in thread
From: Halil Pasic @ 2021-01-13 21:21 UTC (permalink / raw)
  To: Tony Krowiak
  Cc: linux-s390, linux-kernel, kvm, freude, borntraeger, cohuck,
	mjrosato, alex.williamson, kwankhede, fiuczy, frankja, david,
	hca, gor

On Wed, 13 Jan 2021 12:06:28 -0500
Tony Krowiak <akrowiak@linux.ibm.com> wrote:

> On 1/11/21 11:32 AM, Halil Pasic wrote:
> > On Tue, 22 Dec 2020 20:15:53 -0500
> > Tony Krowiak <akrowiak@linux.ibm.com> wrote:
> >  
> >> The queues assigned to a matrix mediated device are currently reset when:
> >>
> >> * The VFIO_DEVICE_RESET ioctl is invoked
> >> * The mdev fd is closed by userspace (QEMU)
> >> * The mdev is removed from sysfs.
> >>
> >> Immediately after the reset of a queue, a call is made to disable
> >> interrupts for the queue. This is entirely unnecessary because the reset of
> >> a queue disables interrupts, so this will be removed.
> >>
> >> Signed-off-by: Tony Krowiak <akrowiak@linux.ibm.com>
> >> ---
> >>   drivers/s390/crypto/vfio_ap_drv.c     |  1 -
> >>   drivers/s390/crypto/vfio_ap_ops.c     | 40 +++++++++++++++++----------
> >>   drivers/s390/crypto/vfio_ap_private.h |  1 -
> >>   3 files changed, 26 insertions(+), 16 deletions(-)
> >>
> >> diff --git a/drivers/s390/crypto/vfio_ap_drv.c b/drivers/s390/crypto/vfio_ap_drv.c
> >> index be2520cc010b..ca18c91afec9 100644
> >> --- a/drivers/s390/crypto/vfio_ap_drv.c
> >> +++ b/drivers/s390/crypto/vfio_ap_drv.c
> >> @@ -79,7 +79,6 @@ static void vfio_ap_queue_dev_remove(struct ap_device *apdev)
> >>   	apid = AP_QID_CARD(q->apqn);
> >>   	apqi = AP_QID_QUEUE(q->apqn);
> >>   	vfio_ap_mdev_reset_queue(apid, apqi, 1);
> >> -	vfio_ap_irq_disable(q);
> >>   	kfree(q);
> >>   	mutex_unlock(&matrix_dev->lock);
> >>   }
> >> diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c
> >> index 7339043906cf..052f61391ec7 100644
> >> --- a/drivers/s390/crypto/vfio_ap_ops.c
> >> +++ b/drivers/s390/crypto/vfio_ap_ops.c
> >> @@ -25,6 +25,7 @@
> >>   #define VFIO_AP_MDEV_NAME_HWVIRT "VFIO AP Passthrough Device"
> >>   
> >>   static int vfio_ap_mdev_reset_queues(struct mdev_device *mdev);
> >> +static struct vfio_ap_queue *vfio_ap_find_queue(int apqn);
> >>   
> >>   static int match_apqn(struct device *dev, const void *data)
> >>   {
> >> @@ -49,20 +50,15 @@ static struct vfio_ap_queue *vfio_ap_get_queue(
> >>   					int apqn)
> >>   {
> >>   	struct vfio_ap_queue *q;
> >> -	struct device *dev;
> >>   
> >>   	if (!test_bit_inv(AP_QID_CARD(apqn), matrix_mdev->matrix.apm))
> >>   		return NULL;
> >>   	if (!test_bit_inv(AP_QID_QUEUE(apqn), matrix_mdev->matrix.aqm))
> >>   		return NULL;
> >>   
> >> -	dev = driver_find_device(&matrix_dev->vfio_ap_drv->driver, NULL,
> >> -				 &apqn, match_apqn);
> >> -	if (!dev)
> >> -		return NULL;
> >> -	q = dev_get_drvdata(dev);
> >> -	q->matrix_mdev = matrix_mdev;
> >> -	put_device(dev);
> >> +	q = vfio_ap_find_queue(apqn);
> >> +	if (q)
> >> +		q->matrix_mdev = matrix_mdev;
> >>   
> >>   	return q;
> >>   }
> >> @@ -1126,24 +1122,27 @@ static int vfio_ap_mdev_group_notifier(struct notifier_block *nb,
> >>   	return notify_rc;
> >>   }
> >>   
> >> -static void vfio_ap_irq_disable_apqn(int apqn)
> >> +static struct vfio_ap_queue *vfio_ap_find_queue(int apqn)
> >>   {
> >>   	struct device *dev;
> >> -	struct vfio_ap_queue *q;
> >> +	struct vfio_ap_queue *q = NULL;
> >>   
> >>   	dev = driver_find_device(&matrix_dev->vfio_ap_drv->driver, NULL,
> >>   				 &apqn, match_apqn);
> >>   	if (dev) {
> >>   		q = dev_get_drvdata(dev);
> >> -		vfio_ap_irq_disable(q);
> >>   		put_device(dev);
> >>   	}
> >> +
> >> +	return q;
> >>   }  
> > This hunk and the previous one are a rewrite of vfio_ap_get_queue() and
> > have next to nothing to do with the patch's objective. If we were at an
> > earlier stage, I would ask to split it up.  
> 
> The rewrite of vfio_ap_get_queue() definitely is related to this
> patch's objective. 

Definitively loosely related.

> Below, in the vfio_ap_mdev_reset_queue()
> function, there is the label 'free_aqic_resources', which is where
> the vfio_ap_free_aqic_resources() function is called.
> That function takes a struct vfio_ap_queue as an argument,
> so the object needs to be retrieved prior to calling the function.
> We can't use the vfio_ap_get_queue() function for two reasons:
> 1. The vfio_ap_get_queue() function takes a struct ap_matrix_mdev
>      as a parameter and we do not have a pointer to such at the time.
> 2. The vfio_ap_get_queue() function is used to link the mdev to the
>      vfio_ap_queue object with the specified APQN.
> So, we needed a way to retrieve the vfio_ap_queue object by its
> APQN only. Rather than creating a function that retrieves the
> vfio_ap_queue object which duplicates the retrieval code in
> vfio_ap_get_queue(), I created the vfio_ap_find_queue()
> function to do just that and modified the vfio_ap_get_queue()
> function to call it (i.e., code reuse).

Please tell me what prevented you from splitting out
vfio_ap_find_queue() from vfio_ap_get_queue() in a separate patch that
precedes this patch? It would have resulted in simpler diffs, because
the split out wouldn't be intermingled with other stuff, i.e. getting
rid of vfio_ap_irq_disable_apqn(). Don't you see that the two are
intermingled in this diff?

> 
> 
> >  
> >>   
> >>   int vfio_ap_mdev_reset_queue(unsigned int apid, unsigned int apqi,
> >>   			     unsigned int retry)
> >>   {
> >>   	struct ap_queue_status status;
> >> +	struct vfio_ap_queue *q;
> >> +	int ret;
> >>   	int retry2 = 2;
> >>   	int apqn = AP_MKQID(apid, apqi);
> >>   
> >> @@ -1156,18 +1155,32 @@ int vfio_ap_mdev_reset_queue(unsigned int apid, unsigned int apqi,
> >>   				status = ap_tapq(apqn, NULL);
> >>   			}
> >>   			WARN_ON_ONCE(retry2 <= 0);
> >> -			return 0;
> >> +			ret = 0;
> >> +			goto free_aqic_resources;
> >>   		case AP_RESPONSE_RESET_IN_PROGRESS:
> >>   		case AP_RESPONSE_BUSY:
> >>   			msleep(20);
> >>   			break;
> >>   		default:
> >>   			/* things are really broken, give up */
> >> -			return -EIO;
> >> +			ret = -EIO;
> >> +			goto free_aqic_resources;  
> > Do we really want the unpin here? I mean the reset did not work and
> > we are giving up. So the irqs are potentially still enabled.
> >
> > Without this patch we try to disable the interrupts using AQIC, and
> > do the cleanup after that.  
> 
> If the reset failure lands here, then a subsequent AQIC will
> also fail, so I see no reason to expend processing time for
> something that will ultimately fail anyways.
> 
> >
> > I'm aware, the comment says we should not take the default branch,
> > but if that's really the case we should IMHO log an error and leak the
> > page.  
> 
> I do not see a good reason to leak the page, what purpose would
> it serve? 

Well, the thing is we don't have a case for AP_RESPONSE_CHECKSTOPPED,
which is, AFAIK, a valid outcome. I don't remember what the exact
deal is with checkstopped regarding interrupts.

If we take the default with something different
than AP_RESPONSE_CHECKSTOPPED, that is AFAICT a bug of the underlying
machine.

> I don't have a problem with logging an error, do you think
> it should just be a log message or a WARN_ON type of thing?
> 

Seeing an outcome we don't expect to see, due to a bug in the underlying
machine is in my book worth an error message. Furthermore we may not
assume that the interrupts were shut down for the queue. So the only
way we can protect the host is by leaking the page.
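
In code, the policy I have in mind would look roughly like the sketch
below. It is a user-space illustration only: enum resp_sim and
may_unpin_after_reset() are hypothetical and do not use the real
AP_RESPONSE_* values. Treating checkstopped like any other unexpected
response is an assumption made here for the sake of being conservative.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical response codes for the sketch, not the AP_RESPONSE_* values. */
enum resp_sim {
	RESP_NORMAL,
	RESP_RESET_IN_PROGRESS,
	RESP_BUSY,
	RESP_CHECKSTOPPED,
	RESP_OTHER,
};

/*
 * Returns true only if the response tells us the reset completed, which is
 * the one case where we know interrupts are disabled and the notification
 * page may be unpinned.
 */
static bool may_unpin_after_reset(enum resp_sim rc)
{
	switch (rc) {
	case RESP_NORMAL:
		return true;  /* reset done, interrupts are off */
	case RESP_RESET_IN_PROGRESS:
	case RESP_BUSY:
		return false; /* not finished yet, the caller retries */
	default:
		/* Unexpected outcome: complain and keep the page pinned. */
		fprintf(stderr, "unexpected reset response %d, leaking page\n",
			(int)rc);
		return false;
	}
}
```

The cost of the default branch is one leaked page per affected queue, in
exchange for never unpinning a page the machine might still write to.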

> >
> > It's up to you if you want to change this. I don't want to delay the
> > series any further than absolutely necessary.
> >
> > Acked-by: Halil Pasic <pasic@linux.ibm.com>
> >  
> >>   		}
> >>   	} while (retry--);
> >>   
> >>   	return -EBUSY;
> >> +
> >> +free_aqic_resources:
> >> +	/*
> >> +	 * In order to free the aqic resources, the queue must be linked to
> >> +	 * the matrix_mdev to which its APQN is assigned and the KVM pointer
> >> +	 * must be available.
> >> +	 */
> >> +	q = vfio_ap_find_queue(apqn);
> >> +	if (q && q->matrix_mdev && q->matrix_mdev->kvm)  
> > Is this of the type "we know there are no aqic resources to be freed" if
> > precondition is false?  
> 
> Yes
> 
> >
> > vfio_ap_free_aqic_resources() checks the matrix_mdev pointer but not the
> > kvm pointer. Could we just check the kvm pointer in
> > vfio_ap_free_aqic_resources()?  
> 
> A while back I posted a patch that did just that and someone pushed back
> because they could not see how the vfio_ap_free_aqic_resources()
> function would ever be called with a NULL kvm pointer which is
> why I implemented the above check. The reset is called
> when the mdev is removed which can happen only when there
> is no kvm pointer, so I agree it would be better to check the kvm
> pointer in the vfio_ap_free_aqic_resources() function.
> 

I don't remember. Sorry if it was me.

> >
> > At the end of the series, does seeing !q indicate a bug, or is it
> > something we expect to see under certain circumstances?  
> 
> I'm not quite sure to what you are referring regarding "the
> end of the series", but we can expect to see a NULL pointer
> for q if a queue is manually unbound from the driver.

By "at the end of the series", I mean with all 15 patches applied.

Regarding the case where the queue is manually unbound from the
driver, this is exactly one of the scenarios I was latently concerned
about. Let me explain. The manually unbound queue was already reset
in vfio_ap_mdev_remove_queue() if necessary, so we don't need to reset
it again. And more importantly, it is not bound to the vfio_ap driver,
so vfio_ap is not allowed to reset it. (It could in theory belong to
and be in use by another non-default driver.)

I've just checked vfio_ap_mdev_reset_queues() and it resets all
queues in the matrix. The in-use mechanism does ensure that zcrypt
can't use these queues (together with a[pq]mask), but resetting a
queue that does not belong to us is going beyond our authority.


Regards,
Halil

> 
> >
> >  
> >> +		vfio_ap_free_aqic_resources(q);
> >> +
> >> +	return ret;
> >>   }
> >>   
> >>   static int vfio_ap_mdev_reset_queues(struct mdev_device *mdev)
> >> @@ -1189,7 +1202,6 @@ static int vfio_ap_mdev_reset_queues(struct mdev_device *mdev)
> >>   			 */
> >>   			if (ret)
> >>   				rc = ret;
> >> -			vfio_ap_irq_disable_apqn(AP_MKQID(apid, apqi));
> >>   		}
> >>   	}
> >>   
> >> diff --git a/drivers/s390/crypto/vfio_ap_private.h b/drivers/s390/crypto/vfio_ap_private.h
> >> index f46dde56b464..0db6fb3d56d5 100644
> >> --- a/drivers/s390/crypto/vfio_ap_private.h
> >> +++ b/drivers/s390/crypto/vfio_ap_private.h
> >> @@ -100,5 +100,4 @@ struct vfio_ap_queue {
> >>   #define VFIO_AP_ISC_INVALID 0xff
> >>   	unsigned char saved_isc;
> >>   };
> >> -struct ap_queue_status vfio_ap_irq_disable(struct vfio_ap_queue *q);
> >>   #endif /* _VFIO_AP_PRIVATE_H_ */  
> 



* Re: [PATCH v13 05/15] s390/vfio-ap: manage link between queue struct and matrix mdev
  2021-01-11 19:17   ` Halil Pasic
@ 2021-01-13 21:41     ` Tony Krowiak
  2021-01-14  2:50       ` Halil Pasic
  0 siblings, 1 reply; 48+ messages in thread
From: Tony Krowiak @ 2021-01-13 21:41 UTC (permalink / raw)
  To: Halil Pasic
  Cc: linux-s390, linux-kernel, kvm, freude, borntraeger, cohuck,
	mjrosato, alex.williamson, kwankhede, fiuczy, frankja, david,
	hca, gor



On 1/11/21 2:17 PM, Halil Pasic wrote:
> On Tue, 22 Dec 2020 20:15:56 -0500
> Tony Krowiak <akrowiak@linux.ibm.com> wrote:
>
>> Let's create links between each queue device bound to the vfio_ap device
>> driver and the matrix mdev to which the queue's APQN is assigned. The idea
>> is to facilitate efficient retrieval of the objects representing the queue
>> devices and matrix mdevs as well as to verify that a queue assigned to
>> a matrix mdev is bound to the driver.
>>
>> The links will be created as follows:
>>
>>     * When the queue device is probed, if its APQN is assigned to a matrix
>>       mdev, the structures representing the queue device and the matrix mdev
>>       will be linked.
>>
>>     * When an adapter or domain is assigned to a matrix mdev, for each new
>>       APQN assigned that references a queue device bound to the vfio_ap
>>       device driver, the structures representing the queue device and the
>>       matrix mdev will be linked.
>>
>> The links will be removed as follows:
>>
>>     * When the queue device is removed, if its APQN is assigned to a matrix
>>       mdev, the structures representing the queue device and the matrix mdev
>>       will be unlinked.
>>
>>     * When an adapter or domain is unassigned from a matrix mdev, for each
>>       APQN unassigned that references a queue device bound to the vfio_ap
>>       device driver, the structures representing the queue device and the
>>       matrix mdev will be unlinked.
>>
>> Signed-off-by: Tony Krowiak <akrowiak@linux.ibm.com>
> Reviewed-by: Halil Pasic <pasic@linux.ibm.com>
>
>> ---
>>   drivers/s390/crypto/vfio_ap_ops.c     | 140 +++++++++++++++++++++-----
>>   drivers/s390/crypto/vfio_ap_private.h |   3 +
>>   2 files changed, 117 insertions(+), 26 deletions(-)
>>
>> diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c
>> index 835c963ae16d..cdcc6378b4a5 100644
>> --- a/drivers/s390/crypto/vfio_ap_ops.c
>> +++ b/drivers/s390/crypto/vfio_ap_ops.c
>> @@ -27,33 +27,17 @@
>>   static int vfio_ap_mdev_reset_queues(struct mdev_device *mdev);
>>   static struct vfio_ap_queue *vfio_ap_find_queue(int apqn);
>>   
>> -/**
>> - * vfio_ap_get_queue: Retrieve a queue with a specific APQN from a list
>> - * @matrix_mdev: the associated mediated matrix
>> - * @apqn: The queue APQN
>> - *
>> - * Retrieve a queue with a specific APQN from the list of the
>> - * devices of the vfio_ap_drv.
>> - * Verify that the APID and the APQI are set in the matrix.
>> - *
>> - * Returns the pointer to the associated vfio_ap_queue
>> - */
>> -static struct vfio_ap_queue *vfio_ap_get_queue(
>> -					struct ap_matrix_mdev *matrix_mdev,
>> -					int apqn)
>> +static struct vfio_ap_queue *
>> +vfio_ap_mdev_get_queue(struct ap_matrix_mdev *matrix_mdev, unsigned long apqn)
>>   {
>> -	struct vfio_ap_queue *q = NULL;
>> -
>> -	if (!test_bit_inv(AP_QID_CARD(apqn), matrix_mdev->matrix.apm))
>> -		return NULL;
>> -	if (!test_bit_inv(AP_QID_QUEUE(apqn), matrix_mdev->matrix.aqm))
>> -		return NULL;
>> +	struct vfio_ap_queue *q;
>>   
>> -	q = vfio_ap_find_queue(apqn);
>> -	if (q)
>> -		q->matrix_mdev = matrix_mdev;
>> +	hash_for_each_possible(matrix_mdev->qtable, q, mdev_qnode, apqn) {
>> +		if (q && (q->apqn == apqn))
>> +			return q;
>> +	}
>>   
>> -	return q;
>> +	return NULL;
>>   }
>>   
>>   /**
>> @@ -166,7 +150,6 @@ static struct ap_queue_status vfio_ap_irq_disable(struct vfio_ap_queue *q)
>>   		  status.response_code);
>>   end_free:
>>   	vfio_ap_free_aqic_resources(q);
>> -	q->matrix_mdev = NULL;
>>   	return status;
>>   }
>>   
>> @@ -282,7 +265,7 @@ static int handle_pqap(struct kvm_vcpu *vcpu)
>>   	matrix_mdev = container_of(vcpu->kvm->arch.crypto.pqap_hook,
>>   				   struct ap_matrix_mdev, pqap_hook);
>>   
>> -	q = vfio_ap_get_queue(matrix_mdev, apqn);
>> +	q = vfio_ap_mdev_get_queue(matrix_mdev, apqn);
>>   	if (!q)
>>   		goto out_unlock;
>>   
>> @@ -325,6 +308,7 @@ static int vfio_ap_mdev_create(struct kobject *kobj, struct mdev_device *mdev)
>>   
>>   	matrix_mdev->mdev = mdev;
>>   	vfio_ap_matrix_init(&matrix_dev->info, &matrix_mdev->matrix);
>> +	hash_init(matrix_mdev->qtable);
>>   	mdev_set_drvdata(mdev, matrix_mdev);
>>   	matrix_mdev->pqap_hook.hook = handle_pqap;
>>   	matrix_mdev->pqap_hook.owner = THIS_MODULE;
>> @@ -553,6 +537,50 @@ static int vfio_ap_mdev_verify_no_sharing(struct ap_matrix_mdev *matrix_mdev)
>>   	return 0;
>>   }
>>   
>> +static void vfio_ap_mdev_link_queue(struct ap_matrix_mdev *matrix_mdev,
>> +				    struct vfio_ap_queue *q)
>> +{
>> +	if (q) {
>> +		q->matrix_mdev = matrix_mdev;
>> +		hash_add(matrix_mdev->qtable,
>> +			 &q->mdev_qnode, q->apqn);
>> +	}
>> +}
>> +
>> +static void vfio_ap_mdev_link_apqn(struct ap_matrix_mdev *matrix_mdev, int apqn)
>> +{
>> +	struct vfio_ap_queue *q;
>> +
>> +	q = vfio_ap_find_queue(apqn);
>> +	vfio_ap_mdev_link_queue(matrix_mdev, q);
>> +}
>> +
>> +static void vfio_ap_mdev_unlink_queue(struct vfio_ap_queue *q)
>> +{
>> +	if (q) {
>> +		q->matrix_mdev = NULL;
>> +		hash_del(&q->mdev_qnode);
>> +	}
>> +}
>> +
>> +static void vfio_ap_mdev_unlink_apqn(int apqn)
>> +{
>> +	struct vfio_ap_queue *q;
>> +
>> +	q = vfio_ap_find_queue(apqn);
>> +	vfio_ap_mdev_unlink_queue(q);
>> +}
>> +
>> +static void vfio_ap_mdev_link_adapter(struct ap_matrix_mdev *matrix_mdev,
>> +				      unsigned long apid)
>> +{
>> +	unsigned long apqi;
>> +
>> +	for_each_set_bit_inv(apqi, matrix_mdev->matrix.aqm, AP_DOMAINS)
>> +		vfio_ap_mdev_link_apqn(matrix_mdev,
>> +				       AP_MKQID(apid, apqi));
>> +}
>> +
>>   /**
>>    * assign_adapter_store
>>    *
>> @@ -622,6 +650,7 @@ static ssize_t assign_adapter_store(struct device *dev,
>>   	if (ret)
>>   		goto share_err;
>>   
>> +	vfio_ap_mdev_link_adapter(matrix_mdev, apid);
>>   	ret = count;
>>   	goto done;
>>   
>> @@ -634,6 +663,15 @@ static ssize_t assign_adapter_store(struct device *dev,
>>   }
>>   static DEVICE_ATTR_WO(assign_adapter);
>>   
>> +static void vfio_ap_mdev_unlink_adapter(struct ap_matrix_mdev *matrix_mdev,
>> +					unsigned long apid)
>> +{
>> +	unsigned long apqi;
>> +
>> +	for_each_set_bit_inv(apqi, matrix_mdev->matrix.aqm, AP_DOMAINS)
>> +		vfio_ap_mdev_unlink_apqn(AP_MKQID(apid, apqi));
>> +}
>> +
>>   /**
>>    * unassign_adapter_store
>>    *
>> @@ -673,6 +711,7 @@ static ssize_t unassign_adapter_store(struct device *dev,
>>   
>>   	mutex_lock(&matrix_dev->lock);
>>   	clear_bit_inv((unsigned long)apid, matrix_mdev->matrix.apm);
>> +	vfio_ap_mdev_unlink_adapter(matrix_mdev, apid);
>>   	mutex_unlock(&matrix_dev->lock);
>>   
>>   	return count;
>> @@ -699,6 +738,15 @@ vfio_ap_mdev_verify_queues_reserved_for_apqi(struct ap_matrix_mdev *matrix_mdev,
>>   	return 0;
>>   }
>>   
>> +static void vfio_ap_mdev_link_domain(struct ap_matrix_mdev *matrix_mdev,
>> +				     unsigned long apqi)
>> +{
>> +	unsigned long apid;
>> +
>> +	for_each_set_bit_inv(apid, matrix_mdev->matrix.apm, AP_DEVICES)
>> +		vfio_ap_mdev_link_apqn(matrix_mdev, AP_MKQID(apid, apqi));
>> +}
>> +
>>   /**
>>    * assign_domain_store
>>    *
>> @@ -763,6 +811,7 @@ static ssize_t assign_domain_store(struct device *dev,
>>   	if (ret)
>>   		goto share_err;
>>   
>> +	vfio_ap_mdev_link_domain(matrix_mdev, apqi);
>>   	ret = count;
>>   	goto done;
>>   
>> @@ -775,6 +824,14 @@ static ssize_t assign_domain_store(struct device *dev,
>>   }
>>   static DEVICE_ATTR_WO(assign_domain);
>>   
>> +static void vfio_ap_mdev_unlink_domain(struct ap_matrix_mdev *matrix_mdev,
>> +				       unsigned long apqi)
>> +{
>> +	unsigned long apid;
>> +
>> +	for_each_set_bit_inv(apid, matrix_mdev->matrix.apm, AP_DEVICES)
>> +		vfio_ap_mdev_unlink_apqn(AP_MKQID(apid, apqi));
>> +}
>>   
>>   /**
>>    * unassign_domain_store
>> @@ -815,6 +872,7 @@ static ssize_t unassign_domain_store(struct device *dev,
>>   
>>   	mutex_lock(&matrix_dev->lock);
>>   	clear_bit_inv((unsigned long)apqi, matrix_mdev->matrix.aqm);
>> +	vfio_ap_mdev_unlink_domain(matrix_mdev, apqi);
>>   	mutex_unlock(&matrix_dev->lock);
>>   
>>   	return count;
>> @@ -1317,6 +1375,28 @@ void vfio_ap_mdev_unregister(void)
>>   	mdev_unregister_device(&matrix_dev->device);
>>   }
>>   
>> +/*
>> + * vfio_ap_queue_link_mdev
>> + *
>> + * @q: The queue to link with the matrix mdev.
>> + *
>> + * Links @q with the matrix mdev to which the queue's APQN is assigned.
>> + */
>> +static void vfio_ap_queue_link_mdev(struct vfio_ap_queue *q)
>> +{
>> +	unsigned long apid = AP_QID_CARD(q->apqn);
>> +	unsigned long apqi = AP_QID_QUEUE(q->apqn);
>> +	struct ap_matrix_mdev *matrix_mdev;
>> +
>> +	list_for_each_entry(matrix_mdev, &matrix_dev->mdev_list, node) {
>> +		if (test_bit_inv(apid, matrix_mdev->matrix.apm) &&
>> +		    test_bit_inv(apqi, matrix_mdev->matrix.aqm)) {
>> +			vfio_ap_mdev_link_queue(matrix_mdev, q);
>> +			break;
>> +		}
>> +	}
>> +}
>> +
>>   int vfio_ap_mdev_probe_queue(struct ap_device *apdev)
>>   {
>>   	struct vfio_ap_queue *q;
>> @@ -1324,9 +1404,13 @@ int vfio_ap_mdev_probe_queue(struct ap_device *apdev)
>>   	q = kzalloc(sizeof(*q), GFP_KERNEL);
>>   	if (!q)
>>   		return -ENOMEM;
>> +	mutex_lock(&matrix_dev->lock);
>>   	dev_set_drvdata(&apdev->device, q);
>>   	q->apqn = to_ap_queue(&apdev->device)->qid;
>>   	q->saved_isc = VFIO_AP_ISC_INVALID;
>> +	vfio_ap_queue_link_mdev(q);
>> +	mutex_unlock(&matrix_dev->lock);
>> +
> Does the critical section have to include more than just
> vfio_ap_queue_link_mdev()? Did we need the critical section
> before this patch?

We did not need the critical section before this patch because
the only function that retrieved the vfio_ap_queue via the queue
device's drvdata was the remove callback. I included the initialization
of the vfio_ap_queue object under lock because the
vfio_ap_find_queue() function retrieves the vfio_ap_queue object from
the queue device's drvdata so it might be advantageous to initialize
it under the mdev lock. On the other hand, I can't come up with a good
argument to change this.


>
>>   	return 0;
>>   }
>>   
>> @@ -1341,6 +1425,10 @@ void vfio_ap_mdev_remove_queue(struct ap_device *apdev)
>>   	apid = AP_QID_CARD(q->apqn);
>>   	apqi = AP_QID_QUEUE(q->apqn);
>>   	vfio_ap_mdev_reset_queue(apid, apqi, 1);
>> +
>> +	if (q->matrix_mdev)
>> +		vfio_ap_mdev_unlink_queue(q);
>> +
>>   	kfree(q);
>>   	mutex_unlock(&matrix_dev->lock);
>>   }
>> diff --git a/drivers/s390/crypto/vfio_ap_private.h b/drivers/s390/crypto/vfio_ap_private.h
>> index d9003de4fbad..4e5cc72fc0db 100644
>> --- a/drivers/s390/crypto/vfio_ap_private.h
>> +++ b/drivers/s390/crypto/vfio_ap_private.h
>> @@ -18,6 +18,7 @@
>>   #include <linux/delay.h>
>>   #include <linux/mutex.h>
>>   #include <linux/kvm_host.h>
>> +#include <linux/hashtable.h>
>>   
>>   #include "ap_bus.h"
>>   
>> @@ -86,6 +87,7 @@ struct ap_matrix_mdev {
>>   	struct kvm *kvm;
>>   	struct kvm_s390_module_hook pqap_hook;
>>   	struct mdev_device *mdev;
>> +	DECLARE_HASHTABLE(qtable, 8);
>>   };
>>   
>>   extern int vfio_ap_mdev_register(void);
>> @@ -97,6 +99,7 @@ struct vfio_ap_queue {
>>   	int	apqn;
>>   #define VFIO_AP_ISC_INVALID 0xff
>>   	unsigned char saved_isc;
>> +	struct hlist_node mdev_qnode;
>>   };
>>   
>>   int vfio_ap_mdev_probe_queue(struct ap_device *queue);



* Re: [PATCH v13 02/15] s390/vfio-ap: No need to disable IRQ after queue reset
  2021-01-13 21:21       ` Halil Pasic
@ 2021-01-14  0:46         ` Tony Krowiak
  2021-01-14  3:13           ` Halil Pasic
  0 siblings, 1 reply; 48+ messages in thread
From: Tony Krowiak @ 2021-01-14  0:46 UTC (permalink / raw)
  To: Halil Pasic
  Cc: linux-s390, linux-kernel, kvm, freude, borntraeger, cohuck,
	mjrosato, alex.williamson, kwankhede, fiuczy, frankja, david,
	hca, gor



On 1/13/21 4:21 PM, Halil Pasic wrote:
> On Wed, 13 Jan 2021 12:06:28 -0500
> Tony Krowiak <akrowiak@linux.ibm.com> wrote:
>
>> On 1/11/21 11:32 AM, Halil Pasic wrote:
>>> On Tue, 22 Dec 2020 20:15:53 -0500
>>> Tony Krowiak <akrowiak@linux.ibm.com> wrote:
>>>   
>>>> The queues assigned to a matrix mediated device are currently reset when:
>>>>
>>>> * The VFIO_DEVICE_RESET ioctl is invoked
>>>> * The mdev fd is closed by userspace (QEMU)
>>>> * The mdev is removed from sysfs.
>>>>
>>>> Immediately after the reset of a queue, a call is made to disable
>>>> interrupts for the queue. This is entirely unnecessary because the reset of
>>>> a queue disables interrupts, so this will be removed.
>>>>
>>>> Signed-off-by: Tony Krowiak <akrowiak@linux.ibm.com>
>>>> ---
>>>>    drivers/s390/crypto/vfio_ap_drv.c     |  1 -
>>>>    drivers/s390/crypto/vfio_ap_ops.c     | 40 +++++++++++++++++----------
>>>>    drivers/s390/crypto/vfio_ap_private.h |  1 -
>>>>    3 files changed, 26 insertions(+), 16 deletions(-)
>>>>
>>>> diff --git a/drivers/s390/crypto/vfio_ap_drv.c b/drivers/s390/crypto/vfio_ap_drv.c
>>>> index be2520cc010b..ca18c91afec9 100644
>>>> --- a/drivers/s390/crypto/vfio_ap_drv.c
>>>> +++ b/drivers/s390/crypto/vfio_ap_drv.c
>>>> @@ -79,7 +79,6 @@ static void vfio_ap_queue_dev_remove(struct ap_device *apdev)
>>>>    	apid = AP_QID_CARD(q->apqn);
>>>>    	apqi = AP_QID_QUEUE(q->apqn);
>>>>    	vfio_ap_mdev_reset_queue(apid, apqi, 1);
>>>> -	vfio_ap_irq_disable(q);
>>>>    	kfree(q);
>>>>    	mutex_unlock(&matrix_dev->lock);
>>>>    }
>>>> diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c
>>>> index 7339043906cf..052f61391ec7 100644
>>>> --- a/drivers/s390/crypto/vfio_ap_ops.c
>>>> +++ b/drivers/s390/crypto/vfio_ap_ops.c
>>>> @@ -25,6 +25,7 @@
>>>>    #define VFIO_AP_MDEV_NAME_HWVIRT "VFIO AP Passthrough Device"
>>>>    
>>>>    static int vfio_ap_mdev_reset_queues(struct mdev_device *mdev);
>>>> +static struct vfio_ap_queue *vfio_ap_find_queue(int apqn);
>>>>    
>>>>    static int match_apqn(struct device *dev, const void *data)
>>>>    {
>>>> @@ -49,20 +50,15 @@ static struct vfio_ap_queue *vfio_ap_get_queue(
>>>>    					int apqn)
>>>>    {
>>>>    	struct vfio_ap_queue *q;
>>>> -	struct device *dev;
>>>>    
>>>>    	if (!test_bit_inv(AP_QID_CARD(apqn), matrix_mdev->matrix.apm))
>>>>    		return NULL;
>>>>    	if (!test_bit_inv(AP_QID_QUEUE(apqn), matrix_mdev->matrix.aqm))
>>>>    		return NULL;
>>>>    
>>>> -	dev = driver_find_device(&matrix_dev->vfio_ap_drv->driver, NULL,
>>>> -				 &apqn, match_apqn);
>>>> -	if (!dev)
>>>> -		return NULL;
>>>> -	q = dev_get_drvdata(dev);
>>>> -	q->matrix_mdev = matrix_mdev;
>>>> -	put_device(dev);
>>>> +	q = vfio_ap_find_queue(apqn);
>>>> +	if (q)
>>>> +		q->matrix_mdev = matrix_mdev;
>>>>    
>>>>    	return q;
>>>>    }
>>>> @@ -1126,24 +1122,27 @@ static int vfio_ap_mdev_group_notifier(struct notifier_block *nb,
>>>>    	return notify_rc;
>>>>    }
>>>>    
>>>> -static void vfio_ap_irq_disable_apqn(int apqn)
>>>> +static struct vfio_ap_queue *vfio_ap_find_queue(int apqn)
>>>>    {
>>>>    	struct device *dev;
>>>> -	struct vfio_ap_queue *q;
>>>> +	struct vfio_ap_queue *q = NULL;
>>>>    
>>>>    	dev = driver_find_device(&matrix_dev->vfio_ap_drv->driver, NULL,
>>>>    				 &apqn, match_apqn);
>>>>    	if (dev) {
>>>>    		q = dev_get_drvdata(dev);
>>>> -		vfio_ap_irq_disable(q);
>>>>    		put_device(dev);
>>>>    	}
>>>> +
>>>> +	return q;
>>>>    }
>>> This hunk and the previous one are a rewrite of vfio_ap_get_queue() and
>>> have next to nothing to do with the patch's objective. If we were at an
>>> earlier stage, I would ask to split it up.
>> The rewrite of vfio_ap_get_queue() definitely is related to this
>> patch's objective.
> Definitively loosely related.

A matter of opinion, I suppose, and I respect yours.

>
>> Below, in the vfio_ap_mdev_reset_queue()
>> function, there is the label 'free_aqic_resources' which is where
>> the call to vfio_ap_free_aqic_resources() function is called.
>> That function takes a struct vfio_ap_queue as an argument,
>> so the object needs to be retrieved prior to calling the function.
>> We can't use the vfio_ap_get_queue() function for two reasons:
>> 1. The vfio_ap_get_queue() function takes a struct ap_matrix_mdev
>>       as a parameter and we do not have a pointer to such at the time.
>> 2. The vfio_ap_get_queue() function is used to link the mdev to the
>>       vfio_ap_queue object with the specified APQN.
>> So, we needed a way to retrieve the vfio_ap_queue object by its
>> APQN only, Rather than creating a function that retrieves the
>> vfio_ap_queue object which duplicates the retrieval code in
>> vfio_ap_get_queue(), I created the vfio_ap_find_queue()
>> function to do just that and modified the vfio_ap_get_queue()
>> function to call it (i.e., code reuse).
> Please tell me what prevented you from splitting out
> vfio_ap_find_queue() from vfio_ap_get_queue() in a separate patch that
> precedes this patch? It would have resulted in simpler diffs, because
> the split out wouldn't be intermingled with other stuff, i.e. getting
> rid of vfio_ap_irq_disable_apqn(). Don't you see that the two are
> intermingled in this diff?

I included this here for the reasons I stated above.
If I were reviewing these patches and saw this in a separate
patch, I would wonder why it was being done, since it would
be an isolated change requiring examination of subsequent
patches to figure out its purpose. Since you have taken the
time to bring this up again, I'll go ahead and split it out; I
have no major objections and it is a fairly simple change.

>
>>
>>>   
>>>>    
>>>>    int vfio_ap_mdev_reset_queue(unsigned int apid, unsigned int apqi,
>>>>    			     unsigned int retry)
>>>>    {
>>>>    	struct ap_queue_status status;
>>>> +	struct vfio_ap_queue *q;
>>>> +	int ret;
>>>>    	int retry2 = 2;
>>>>    	int apqn = AP_MKQID(apid, apqi);
>>>>    
>>>> @@ -1156,18 +1155,32 @@ int vfio_ap_mdev_reset_queue(unsigned int apid, unsigned int apqi,
>>>>    				status = ap_tapq(apqn, NULL);
>>>>    			}
>>>>    			WARN_ON_ONCE(retry2 <= 0);
>>>> -			return 0;
>>>> +			ret = 0;
>>>> +			goto free_aqic_resources;
>>>>    		case AP_RESPONSE_RESET_IN_PROGRESS:
>>>>    		case AP_RESPONSE_BUSY:
>>>>    			msleep(20);
>>>>    			break;
>>>>    		default:
>>>>    			/* things are really broken, give up */
>>>> -			return -EIO;
>>>> +			ret = -EIO;
>>>> +			goto free_aqic_resources;
>>> Do we really want the unpin here? I mean the reset did not work and
>>> we are giving up. So the irqs are potentially still enabled.
>>>
>>> Without this patch we try to disable the interrupts using AQIC, and
>>> do the cleanup after that.
>> If the reset failure lands here, then a subsequent AQIC will
>> also fail, so I see no reason to expend processing time for
>> something that will ultimately fail anyways.
>>
>>> I'm aware, the comment says we should not take the default branch,
>>> but if that's really the case we should IMHO log an error and leak the
>>> page.
>> I do not see a good reason to leak the page, what purpose would
>> it serve?
> Well, the thing is we don't have a case for AP_RESPONSE_CHECKSTOPPED,
> which is, AFAIK a valid outcome. I don't remember what is the exact
> deal with checkstopped regarding interrupts.

The AP_RESPONSE_CHECKSTOPPED response code is set
when the AP function cannot be performed due to a
machine failure resulting in loss of connectivity to the
queue. I find it hard to believe that interrupts would
continue to be signaled in that case. I will check with
the architecture folks for verification.

>
> If we take the default with something different
> than AP_RESPONSE_CHECKSTOPPED, that is AFAICT a bug of the underlying
> machine.

I think AP_RESPONSE_CHECKSTOPPED also indicates a problem with
the machine.

>
>> I don't have a problem with logging an error, do you think
>> it should just be a log message or a WARN_ON type of thing?
>>
> Seeing an outcome we don't expect to see, due to a bug in the underlying
> machine is in my book worth an error message. Furthermore we may not
> assume that the interrupts where shut down for the queue. So the only
> way we can protect the host is by leaking the page.

I won't assume anything - although I seriously doubt interrupts
will continue with a broken device - so I will get input from the
architecture folks regarding interrupts after a non-zero response
code.

>
>>> It's up to you if you want to change this. I don't want to delay the
>>> series any further than absolutely necessary.
>>>
>>> Acked-by: Halil Pasic <pasic@linux.ibm.com>
>>>   
>>>>    		}
>>>>    	} while (retry--);
>>>>    
>>>>    	return -EBUSY;
>>>> +
>>>> +free_aqic_resources:
>>>> +	/*
>>>> +	 * In order to free the aqic resources, the queue must be linked to
>>>> +	 * the matrix_mdev to which its APQN is assigned and the KVM pointer
>>>> +	 * must be available.
>>>> +	 */
>>>> +	q = vfio_ap_find_queue(apqn);
>>>> +	if (q && q->matrix_mdev && q->matrix_mdev->kvm)
>>> Is this of the type "we know there are no aqic resources to be freed" if
>>> precondition is false?
>> Yes
>>
>>> vfio_ap_free_aqic_resources() checks the matrix_mdev pointer but not the
>>> kvm pointer. Could we just check the kvm pointer in
>>> vfio_ap_free_aqic_resources()?
>> A while back I posted a patch that did just that and someone pushed back
>> because they could not see how the vfio_ap_free_aqic_resources()
>> function would ever be called with a NULL kvm pointer which is
>> why I implemented the above check. The reset is called
>> when the mdev is removed which can happen only when there
>> is no kvm pointer, so I agree it would be better to check the kvm
>> pointer in the vfio_ap_free_aqic_resources() function.
>>
> I don't remember. Sorry if it was me.
>
>>> At the end of the series, does seeing !q indicate a bug, or is it
>>> something we expect to see under certain circumstances?
>> I'm not quite sure to what you are referring regarding "the
>> end of the series", but we can expect to see a NULL pointer
>> for q if a queue is manually unbound from the driver.
> By at the end of the series, I mean with all 15 patches applied.
>
> Regarding the case where the queue is manually unbound form the
> driver, this is exactly one of the scenarios I was latently concerned
> about. Let me explain. The manually unbound queue was already reset
> in vfio_ap_mdev_remove_queue() if necessary, so we don't need to reset
> it again. And more importantly it is not bound to the vfio_ap driver,
> so vfio_ap is not allowed to reset it. (It could in theory belong to
> and be in use by another non-default driver).
>
> I've just checked out vfio_ap_mdev_reset_queues() and it resets all
> queues in the matrix. The in use mechanism does ensure that zcrypt
> can't use these queues (together with a[pq]mask), but resetting a
> queue that does not belong to us is going beyond our authority.

I agree, which is why in the next version I will only reset a queue if
it is bound at the time of the reset.
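
A rough standalone sketch of that idea, not the code that will be posted:
struct sketch_queue, sketch_find_bound() and sketch_reset_queues() are
hypothetical names used only for illustration, and the "reset" is just a
counter. The assumption is that the lookup returns a queue only while it
is bound to the driver, so anything it cannot find is skipped.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct sketch_queue {
	int apqn;
	bool bound;	/* currently bound to the (hypothetical) driver? */
	int resets;	/* number of resets issued by the sketch */
};

/* Return the queue with the given APQN only if it is bound right now. */
static struct sketch_queue *sketch_find_bound(struct sketch_queue *qs,
					      size_t nqs, int apqn)
{
	for (size_t i = 0; i < nqs; i++) {
		if (qs[i].apqn == apqn && qs[i].bound)
			return &qs[i];
	}
	return NULL;
}

/*
 * Walk the APQNs assigned to the matrix and reset only those whose queue
 * is bound. Returns the number of queues that were reset.
 */
static int sketch_reset_queues(struct sketch_queue *qs, size_t nqs,
			       const int *apqns, size_t napqns)
{
	int count = 0;

	for (size_t i = 0; i < napqns; i++) {
		struct sketch_queue *q = sketch_find_bound(qs, nqs, apqns[i]);

		if (!q)
			continue;	/* not ours, leave it alone */
		q->resets++;
		count++;
	}
	return count;
}
```

With one bound and one unbound queue assigned to the matrix, only the
bound one is touched; an APQN with no queue device at all is skipped the
same way.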

>
>
> Regards,
> Halil
>
>>>   
>>>> +		vfio_ap_free_aqic_resources(q);
>>>> +
>>>> +	return ret;
>>>>    }
>>>>    
>>>>    static int vfio_ap_mdev_reset_queues(struct mdev_device *mdev)
>>>> @@ -1189,7 +1202,6 @@ static int vfio_ap_mdev_reset_queues(struct mdev_device *mdev)
>>>>    			 */
>>>>    			if (ret)
>>>>    				rc = ret;
>>>> -			vfio_ap_irq_disable_apqn(AP_MKQID(apid, apqi));
>>>>    		}
>>>>    	}
>>>>    
>>>> diff --git a/drivers/s390/crypto/vfio_ap_private.h b/drivers/s390/crypto/vfio_ap_private.h
>>>> index f46dde56b464..0db6fb3d56d5 100644
>>>> --- a/drivers/s390/crypto/vfio_ap_private.h
>>>> +++ b/drivers/s390/crypto/vfio_ap_private.h
>>>> @@ -100,5 +100,4 @@ struct vfio_ap_queue {
>>>>    #define VFIO_AP_ISC_INVALID 0xff
>>>>    	unsigned char saved_isc;
>>>>    };
>>>> -struct ap_queue_status vfio_ap_irq_disable(struct vfio_ap_queue *q);
>>>>    #endif /* _VFIO_AP_PRIVATE_H_ */



* Re: [PATCH v13 05/15] s390/vfio-ap: manage link between queue struct and matrix mdev
  2021-01-13 21:41     ` Tony Krowiak
@ 2021-01-14  2:50       ` Halil Pasic
  2021-01-14 21:10         ` Tony Krowiak
  0 siblings, 1 reply; 48+ messages in thread
From: Halil Pasic @ 2021-01-14  2:50 UTC (permalink / raw)
  To: Tony Krowiak
  Cc: linux-s390, linux-kernel, kvm, freude, borntraeger, cohuck,
	mjrosato, alex.williamson, kwankhede, fiuczy, frankja, david,
	hca, gor

On Wed, 13 Jan 2021 16:41:27 -0500
Tony Krowiak <akrowiak@linux.ibm.com> wrote:

> On 1/11/21 2:17 PM, Halil Pasic wrote:
> > On Tue, 22 Dec 2020 20:15:56 -0500
> > Tony Krowiak <akrowiak@linux.ibm.com> wrote:
> >  
> >> Let's create links between each queue device bound to the vfio_ap device
> >> driver and the matrix mdev to which the queue's APQN is assigned. The idea
> >> is to facilitate efficient retrieval of the objects representing the queue
> >> devices and matrix mdevs as well as to verify that a queue assigned to
> >> a matrix mdev is bound to the driver.
> >>
> >> The links will be created as follows:
> >>
> >>     * When the queue device is probed, if its APQN is assigned to a matrix
> >>       mdev, the structures representing the queue device and the matrix mdev
> >>       will be linked.
> >>
> >>     * When an adapter or domain is assigned to a matrix mdev, for each new
> >>       APQN assigned that references a queue device bound to the vfio_ap
> >>       device driver, the structures representing the queue device and the
> >>       matrix mdev will be linked.
> >>
> >> The links will be removed as follows:
> >>
> >>     * When the queue device is removed, if its APQN is assigned to a matrix
> >>       mdev, the structures representing the queue device and the matrix mdev
> >>       will be unlinked.
> >>
> >>     * When an adapter or domain is unassigned from a matrix mdev, for each
> >>       APQN unassigned that references a queue device bound to the vfio_ap
> >>       device driver, the structures representing the queue device and the
> >>       matrix mdev will be unlinked.
> >>
> >> Signed-off-by: Tony Krowiak <akrowiak@linux.ibm.com>  
> > Reviewed-by: Halil Pasic <pasic@linux.ibm.com>
> >  

[..]

> >> +
> >>   int vfio_ap_mdev_probe_queue(struct ap_device *apdev)
> >>   {
> >>   	struct vfio_ap_queue *q;
> >> @@ -1324,9 +1404,13 @@ int vfio_ap_mdev_probe_queue(struct ap_device *apdev)
> >>   	q = kzalloc(sizeof(*q), GFP_KERNEL);
> >>   	if (!q)
> >>   		return -ENOMEM;
> >> +	mutex_lock(&matrix_dev->lock);
> >>   	dev_set_drvdata(&apdev->device, q);
> >>   	q->apqn = to_ap_queue(&apdev->device)->qid;
> >>   	q->saved_isc = VFIO_AP_ISC_INVALID;
> >> +	vfio_ap_queue_link_mdev(q);
> >> +	mutex_unlock(&matrix_dev->lock);
> >> +  
> > Does the critical section have to include more than just
> > vfio_ap_queue_link_mdev()? Did we need the critical section
> > before this patch?  
> 
> We did not need the critical section before this patch because
> the only function that retrieved the vfio_ap_queue via the queue
> device's drvdata was the remove callback. I included the initialization
> of the vfio_ap_queue object under lock because the
> vfio_ap_find_queue() function retrieves the vfio_ap_queue object from
> the queue device's drvdata so it might be advantageous to initialize
> it under the mdev lock. On the other hand, I can't come up with a good
> argument to change this.
> 
> 

I was asking out of curiosity, not because I want it changed. I was
also wondering whether somebody could see a partially initialized
device: we call dev_set_drvdata() first and only then finish the
initialization. Before 's390/vfio-ap: use new AP bus interface to search
for queue devices', which is the previous patch, we had the klist code
in between, which uses spinlocks that, I think, ensure that all
effects of probe are seen when we get the queue from
vfio_ap_find_queue(). But with patch 4 in place that is no longer the
case. Or am I wrong?
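
To illustrate the ordering in question, here is a minimal userspace
sketch, not driver code. The names sketch_probe(), sketch_find() and
sketch_drvdata are hypothetical, a pthread mutex stands in for the matrix
device lock, and it assumes that every reader takes the same lock before
looking at the published pointer.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdlib.h>

#define SKETCH_ISC_INVALID 0xff

struct sketch_queue {
	int apqn;
	unsigned char saved_isc;
};

static pthread_mutex_t sketch_lock = PTHREAD_MUTEX_INITIALIZER;
/* Stands in for the queue device's drvdata pointer. */
static struct sketch_queue *sketch_drvdata;

static int sketch_probe(int apqn)
{
	struct sketch_queue *q = calloc(1, sizeof(*q));

	if (!q)
		return -1;

	/* Finish the initialization before anyone can find the object. */
	q->apqn = apqn;
	q->saved_isc = SKETCH_ISC_INVALID;

	/* Publish last, under the lock that readers also take. */
	pthread_mutex_lock(&sketch_lock);
	sketch_drvdata = q;
	pthread_mutex_unlock(&sketch_lock);
	return 0;
}

static struct sketch_queue *sketch_find(int apqn)
{
	struct sketch_queue *q;

	pthread_mutex_lock(&sketch_lock);
	q = sketch_drvdata;
	pthread_mutex_unlock(&sketch_lock);

	return (q && q->apqn == apqn) ? q : NULL;
}
```

In this arrangement a reader either sees no queue at all or a fully
initialized one; whether the callers of vfio_ap_find_queue() give the
same guarantee is exactly the open question.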

Regards,
Halil


* Re: [PATCH v13 02/15] s390/vfio-ap: No need to disable IRQ after queue reset
  2021-01-14  0:46         ` Tony Krowiak
@ 2021-01-14  3:13           ` Halil Pasic
  0 siblings, 0 replies; 48+ messages in thread
From: Halil Pasic @ 2021-01-14  3:13 UTC (permalink / raw)
  To: Tony Krowiak
  Cc: linux-s390, linux-kernel, kvm, freude, borntraeger, cohuck,
	mjrosato, alex.williamson, kwankhede, fiuczy, frankja, david,
	hca, gor

On Wed, 13 Jan 2021 19:46:03 -0500
Tony Krowiak <akrowiak@linux.ibm.com> wrote:

> On 1/13/21 4:21 PM, Halil Pasic wrote:
> > On Wed, 13 Jan 2021 12:06:28 -0500
> > Tony Krowiak <akrowiak@linux.ibm.com> wrote:
> >  
> >> On 1/11/21 11:32 AM, Halil Pasic wrote:  
> >>> On Tue, 22 Dec 2020 20:15:53 -0500
> >>> Tony Krowiak <akrowiak@linux.ibm.com> wrote:
> >>>     
> >>>> The queues assigned to a matrix mediated device are currently reset when:
> >>>>
> >>>> * The VFIO_DEVICE_RESET ioctl is invoked
> >>>> * The mdev fd is closed by userspace (QEMU)
> >>>> * The mdev is removed from sysfs.
> >>>>
> >>>> Immediately after the reset of a queue, a call is made to disable
> >>>> interrupts for the queue. This is entirely unnecessary because the reset of
> >>>> a queue disables interrupts, so this will be removed.
> >>>>
> >>>> Signed-off-by: Tony Krowiak <akrowiak@linux.ibm.com>
> >>>> ---
> >>>>    drivers/s390/crypto/vfio_ap_drv.c     |  1 -
> >>>>    drivers/s390/crypto/vfio_ap_ops.c     | 40 +++++++++++++++++----------
> >>>>    drivers/s390/crypto/vfio_ap_private.h |  1 -
> >>>>    3 files changed, 26 insertions(+), 16 deletions(-)
> >>>>
> >>>> diff --git a/drivers/s390/crypto/vfio_ap_drv.c b/drivers/s390/crypto/vfio_ap_drv.c
> >>>> index be2520cc010b..ca18c91afec9 100644
> >>>> --- a/drivers/s390/crypto/vfio_ap_drv.c
> >>>> +++ b/drivers/s390/crypto/vfio_ap_drv.c
> >>>> @@ -79,7 +79,6 @@ static void vfio_ap_queue_dev_remove(struct ap_device *apdev)
> >>>>    	apid = AP_QID_CARD(q->apqn);
> >>>>    	apqi = AP_QID_QUEUE(q->apqn);
> >>>>    	vfio_ap_mdev_reset_queue(apid, apqi, 1);
> >>>> -	vfio_ap_irq_disable(q);
> >>>>    	kfree(q);
> >>>>    	mutex_unlock(&matrix_dev->lock);
> >>>>    }
> >>>> diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c
> >>>> index 7339043906cf..052f61391ec7 100644
> >>>> --- a/drivers/s390/crypto/vfio_ap_ops.c
> >>>> +++ b/drivers/s390/crypto/vfio_ap_ops.c
> >>>> @@ -25,6 +25,7 @@
> >>>>    #define VFIO_AP_MDEV_NAME_HWVIRT "VFIO AP Passthrough Device"
> >>>>    
> >>>>    static int vfio_ap_mdev_reset_queues(struct mdev_device *mdev);
> >>>> +static struct vfio_ap_queue *vfio_ap_find_queue(int apqn);
> >>>>    
> >>>>    static int match_apqn(struct device *dev, const void *data)
> >>>>    {
> >>>> @@ -49,20 +50,15 @@ static struct vfio_ap_queue *vfio_ap_get_queue(
> >>>>    					int apqn)
> >>>>    {
> >>>>    	struct vfio_ap_queue *q;
> >>>> -	struct device *dev;
> >>>>    
> >>>>    	if (!test_bit_inv(AP_QID_CARD(apqn), matrix_mdev->matrix.apm))
> >>>>    		return NULL;
> >>>>    	if (!test_bit_inv(AP_QID_QUEUE(apqn), matrix_mdev->matrix.aqm))
> >>>>    		return NULL;
> >>>>    
> >>>> -	dev = driver_find_device(&matrix_dev->vfio_ap_drv->driver, NULL,
> >>>> -				 &apqn, match_apqn);
> >>>> -	if (!dev)
> >>>> -		return NULL;
> >>>> -	q = dev_get_drvdata(dev);
> >>>> -	q->matrix_mdev = matrix_mdev;
> >>>> -	put_device(dev);
> >>>> +	q = vfio_ap_find_queue(apqn);
> >>>> +	if (q)
> >>>> +		q->matrix_mdev = matrix_mdev;
> >>>>    
> >>>>    	return q;
> >>>>    }
> >>>> @@ -1126,24 +1122,27 @@ static int vfio_ap_mdev_group_notifier(struct notifier_block *nb,
> >>>>    	return notify_rc;
> >>>>    }
> >>>>    
> >>>> -static void vfio_ap_irq_disable_apqn(int apqn)
> >>>> +static struct vfio_ap_queue *vfio_ap_find_queue(int apqn)
> >>>>    {
> >>>>    	struct device *dev;
> >>>> -	struct vfio_ap_queue *q;
> >>>> +	struct vfio_ap_queue *q = NULL;
> >>>>    
> >>>>    	dev = driver_find_device(&matrix_dev->vfio_ap_drv->driver, NULL,
> >>>>    				 &apqn, match_apqn);
> >>>>    	if (dev) {
> >>>>    		q = dev_get_drvdata(dev);
> >>>> -		vfio_ap_irq_disable(q);
> >>>>    		put_device(dev);
> >>>>    	}
> >>>> +
> >>>> +	return q;
> >>>>    }  
> >>> This hunk and the previous one are a rewrite of vfio_ap_get_queue() and
> >>> have next to nothing to do with the patch's objective. If we were at an
> >>> earlier stage, I would ask to split it up.  
> >> The rewrite of vfio_ap_get_queue() definitely is related to this
> >> patch's objective.  
> > Definitively loosely related.  
> 
> A matter of opinion I suppose and I respect yours.
> 
> >  
> >> Below, in the vfio_ap_mdev_reset_queue()
> >> function, there is the label 'free_aqic_resources' which is where
> >> the call to vfio_ap_free_aqic_resources() function is called.
> >> That function takes a struct vfio_ap_queue as an argument,
> >> so the object needs to be retrieved prior to calling the function.
> >> We can't use the vfio_ap_get_queue() function for two reasons:
> >> 1. The vfio_ap_get_queue() function takes a struct ap_matrix_mdev
> >>       as a parameter and we do not have a pointer to such at the time.
> >> 2. The vfio_ap_get_queue() function is used to link the mdev to the
> >>       vfio_ap_queue object with the specified APQN.
> >> So, we needed a way to retrieve the vfio_ap_queue object by its
> >> APQN only, Rather than creating a function that retrieves the
> >> vfio_ap_queue object which duplicates the retrieval code in
> >> vfio_ap_get_queue(), I created the vfio_ap_find_queue()
> >> function to do just that and modified the vfio_ap_get_queue()
> >> function to call it (i.e., code reuse).  
> > Please tell me what prevented you from splitting out
> > vfio_ap_find_queue() from vfio_ap_get_queue() in a separate patch, that
> > precedes this patch? It would have resulted in simpler diffs, because
> > the split out wouldn't be intermingled with other stuff, i.e. getting
> > rid of vfio_ap_irq_disable_apqn(). Don't you see that the two are
> > intermingled in this diff?  
> 
> I included this here for the reasons I stated above.
> If I was reviewing these patches and saw this in a separate
> patch I would wonder why it was being done since it would
> be an isolated change requiring examination of subsequent
> patches to figure out why it was done.  Since you have
> taken the time to bring this up again I'll go ahead and do it
> since I have no major objections and it is a fairly simple change.
> 

As stated in my first comment, I don't insist. I made the comment
with future patches in mind. Splitting out prep work isn't unusual
at all, but you are right, the motivation should be stated in the
commit message. 

> >  
> >>  
> >>>     
> >>>>    
> >>>>    int vfio_ap_mdev_reset_queue(unsigned int apid, unsigned int apqi,
> >>>>    			     unsigned int retry)
> >>>>    {
> >>>>    	struct ap_queue_status status;
> >>>> +	struct vfio_ap_queue *q;
> >>>> +	int ret;
> >>>>    	int retry2 = 2;
> >>>>    	int apqn = AP_MKQID(apid, apqi);
> >>>>    
> >>>> @@ -1156,18 +1155,32 @@ int vfio_ap_mdev_reset_queue(unsigned int apid, unsigned int apqi,
> >>>>    				status = ap_tapq(apqn, NULL);
> >>>>    			}
> >>>>    			WARN_ON_ONCE(retry2 <= 0);
> >>>> -			return 0;
> >>>> +			ret = 0;
> >>>> +			goto free_aqic_resources;
> >>>>    		case AP_RESPONSE_RESET_IN_PROGRESS:
> >>>>    		case AP_RESPONSE_BUSY:
> >>>>    			msleep(20);
> >>>>    			break;
> >>>>    		default:
> >>>>    			/* things are really broken, give up */
> >>>> -			return -EIO;
> >>>> +			ret = -EIO;
> >>>> +			goto free_aqic_resources;  
> >>> Do we really want the unpin here? I mean the reset did not work and
> >>> we are giving up. So the irqs are potentially still enabled.
> >>>
> >>> Without this patch we try to disable the interrupts using AQIC, and
> >>> do the cleanup after that.  
> >> If the reset failure lands here, then a subsequent AQIC will
> >> also fail, so I see no reason to expend processing time for
> >> something that will ultimately fail anyways.
> >>  
> >>> I'm aware, the comment says we should not take the default branch,
> >>> but if that's really the case we should IMHO log an error and leak the
> >>> page.  
> >> I do not see a good reason to leak the page, what purpose would
> >> it serve?  
> > Well, the thing is we don't have a case for AP_RESPONSE_CHECKSTOPPED,
> > which is, AFAIK a valid outcome. I don't remember what is the exact
> > deal with checkstopped regarding interrupts.  
> 
> The AP_RESPONSE_CHECKSTOPPED response code is set
> when the AP function can not be performed due to a
> machine failure resulting in loss of connectivity to the
> queue. I find it hard to believe that interrupts would
> continue to be signaled in that case. I will check with
> the architecture folks for verification.
> 
> >
> > If we take the default with something different
> > than AP_RESPONSE_CHECKSTOPPED, that is AFAICT a bug of the underlying
> > machine.  
> 
> I think AP_RESPONSE_CHECKSTOPPED indicates a problem with
> the machine also.
> 

I think it indicates a problem with a queue, i.e. an IO device. It
is not the same as the machine doing stuff it should never do.

> >  
> >> I don't have a problem with logging an error, do you think
> >> it should just be a log message or a WARN_ON type of thing?
> >>  
> > Seeing an outcome we don't expect to see, due to a bug in the underlying
> > machine is in my book worth an error message. Furthermore we may not
> > assume that the interrupts were shut down for the queue. So the only
> > way we can protect the host is by leaking the page.  
> 
> I won't assume anything - although I seriously doubt interrupts
> will continue with a broken device - so I will get input from the
> architecture folks regarding interrupts after a non-zero response
> code.
> 

I tend to agree, I just didn't re-read the stuff, and I don't remember
all the details. But what I do seem to remember is that, if the
checkstopped queue ever becomes operational again, it will come back
clean (i.e. as if reset). So if we are sure, there won't be any surprise
interrupts after some point (e.g. we observe the reason code for
checkstopped), we could just say the reset was OK because there is
no need to reset.

So maybe on AP_RESPONSE_CHECKSTOPPED we should do the same as on
AP_RESPONSE_NORMAL, or?
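
To make the proposal concrete, here is a minimal user-space sketch of the
classification I have in mind. It is not driver code: the enum values are
illustrative stand-ins for the AP response codes named above,
AP_RESPONSE_OTHER and both function names are hypothetical, and treating
checkstopped like normal is only the idea under discussion, pending what
the architecture folks say.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Illustrative stand-ins for the AP response codes named in this thread. */
enum ap_response {
	AP_RESPONSE_NORMAL,
	AP_RESPONSE_RESET_IN_PROGRESS,
	AP_RESPONSE_CHECKSTOPPED,
	AP_RESPONSE_BUSY,
	AP_RESPONSE_OTHER,	/* hypothetical: any code we do not expect */
};

/*
 * Classify a single reset response: 0 means the queue can be treated as
 * reset, -EAGAIN means sleep and retry, -EIO means an unexpected outcome.
 */
static int classify_reset_response(enum ap_response rc)
{
	switch (rc) {
	case AP_RESPONSE_NORMAL:
	case AP_RESPONSE_CHECKSTOPPED:	/* proposal: nothing left to reset */
		return 0;
	case AP_RESPONSE_RESET_IN_PROGRESS:
	case AP_RESPONSE_BUSY:
		return -EAGAIN;
	default:
		return -EIO;
	}
}

/*
 * Walk a canned sequence of responses the way the retry loop would,
 * returning -EBUSY when the sequence is exhausted without a verdict.
 */
static int simulate_reset(const enum ap_response *responses, int n)
{
	assert(responses != NULL && n > 0);

	for (int i = 0; i < n; i++) {
		int ret = classify_reset_response(responses[i]);

		if (ret != -EAGAIN)
			return ret;
	}
	return -EBUSY;
}
```

With this shape, only the default branch is left as the "machine is doing
stuff it should never do" case, which is where an error message would go.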


> >  
> >>> It's up to you if you want to change this. I don't want to delay the
> >>> series any further than absolutely necessary.
> >>>
> >>> Acked-by: Halil Pasic <pasic@linux.ibm.com>
> >>>     
> >>>>    		}
> >>>>    	} while (retry--);
> >>>>    
> >>>>    	return -EBUSY;
> >>>> +
> >>>> +free_aqic_resources:
> >>>> +	/*
> >>>> +	 * In order to free the aqic resources, the queue must be linked to
> >>>> +	 * the matrix_mdev to which its APQN is assigned and the KVM pointer
> >>>> +	 * must be available.
> >>>> +	 */
> >>>> +	q = vfio_ap_find_queue(apqn);
> >>>> +	if (q && q->matrix_mdev && q->matrix_mdev->kvm)  
> >>> Is this of the type "we know there are no aqic resources to be freed" if
> >>> precondition is false?  
> >> Yes
> >>  
> >>> vfio_ap_free_aqic_resources() checks the matrix_mdev pointer but not the
> >>> kvm pointer. Could we just check the kvm pointer in
> >>> vfio_ap_free_aqic_resources()?  
> >> A while back I posted a patch that did just that and someone pushed back
> >> because they could not see how the vfio_ap_free_aqic_resources()
> >> function would ever be called with a NULL kvm pointer which is
> >> why I implemented the above check. The reset is called
> >> when the mdev is removed which can happen only when there
> >> is no kvm pointer, so I agree it would be better to check the kvm
> >> pointer in the vfio_ap_free_aqic_resources() function.
> >>  
> > I don't remember. Sorry if it was me.
> >  
> >>> At the end of the series, is seeing !q indicating a bug, or is it
> >>> something we expect to see under certain circumstances?  
> >> I'm not quite sure to what you are referring regarding "the
> >> end of the series", but we can expect to see a NULL pointer
> >> for q if a queue is manually unbound from the driver.  
> > By at the end of the series, I mean with all 15 patches applied.
> >
> > Regarding the case where the queue is manually unbound from the
> > driver, this is exactly one of the scenarios I was latently concerned
> > about. Let me explain. The manually unbound queue was already reset
> > in vfio_ap_mdev_remove_queue() if necessary, so we don't need to reset
> > it again. And more importantly it is not bound to the vfio_ap driver,
> > so vfio_ap is not allowed to reset it. (It could in theory belong to
> > and be in use by another non-default driver).
> >
> > I've just checked out vfio_ap_mdev_reset_queues() and it resets all
> > queues in the matrix. The in use mechanism does ensure that zcrypt
> > can't use these queues (together with a[pq]mask), but resetting a
> > queue that does not belong to us is going beyond our authority.  
> 
> I agree which is why in the next version I am only resetting a queue if
> it is bound at the time of the reset.
> 


Sounds good. Please keep my ack unless the changes turn out
extensive. I will have a look at the new stuff again, and hopefully
upgrade to r-b. But I'm also fine with merging this patch as is
and addressing the stuff discussed later (hence the ack). It's up
to you.

Regards,
Halil

[..]

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v13 06/15] s390/vfio-ap: allow assignment of unavailable AP queues to mdev device
  2021-01-11 20:40   ` Halil Pasic
@ 2021-01-14 17:54     ` Tony Krowiak
  2021-01-15  1:08       ` Halil Pasic
  2021-01-15  1:44       ` Halil Pasic
  0 siblings, 2 replies; 48+ messages in thread
From: Tony Krowiak @ 2021-01-14 17:54 UTC (permalink / raw)
  To: Halil Pasic
  Cc: linux-s390, linux-kernel, kvm, freude, borntraeger, cohuck,
	mjrosato, alex.williamson, kwankhede, fiuczy, frankja, david,
	hca, gor



On 1/11/21 3:40 PM, Halil Pasic wrote:
> On Tue, 22 Dec 2020 20:15:57 -0500
> Tony Krowiak <akrowiak@linux.ibm.com> wrote:
>
>> The current implementation does not allow assignment of an AP adapter or
>> domain to an mdev device if each APQN resulting from the assignment
>> does not reference an AP queue device that is bound to the vfio_ap device
>> driver. This patch allows assignment of AP resources to the matrix mdev as
>> long as the APQNs resulting from the assignment:
>>     1. Are not reserved by the AP BUS for use by the zcrypt device drivers.
>>     2. Are not assigned to another matrix mdev.
>>
>> The rationale behind this is twofold:
>>     1. The AP architecture does not preclude assignment of APQNs to an AP
>>        configuration that are not available to the system.
>>     2. APQNs that do not reference a queue device bound to the vfio_ap
>>        device driver will not be assigned to the guest's CRYCB, so the
>>        guest will not get access to queues not bound to the vfio_ap driver.
> You didn't tell us about the changed error code.

I am assuming you are talking about returning -EBUSY from
the vfio_ap_mdev_verify_no_sharing() function instead of
-EADDRINUSE. I'm going to change this back per your comments
below.

>
> Also notice that at this point we have neither filtering nor in-use.
> This used to be patch 11, and most of that stuff used to be in place. But
> I'm going to trust you, if you say it's fine to enable it this early.

The patch order was changed due to your review comments in
Message ID <20201126165431.6ef1457a.pasic@linux.ibm.com>,
patch 07/17 in the v12 series. In order to ensure that only queues
bound to the vfio_ap driver are given to the guest, I'm going to
create a patch that will precede this one and introduce the
filtering code currently introduced in patch 12/17, the hot
plug patch.

>
>> Signed-off-by: Tony Krowiak <akrowiak@linux.ibm.com>
>> ---
>>   drivers/s390/crypto/vfio_ap_ops.c | 241 ++++++++----------------------
>>   1 file changed, 62 insertions(+), 179 deletions(-)
>>
>> diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c
>> index cdcc6378b4a5..2d58b39977be 100644
>> --- a/drivers/s390/crypto/vfio_ap_ops.c
>> +++ b/drivers/s390/crypto/vfio_ap_ops.c
>> @@ -379,134 +379,37 @@ static struct attribute_group *vfio_ap_mdev_type_groups[] = {
>>   	NULL,
>>   };
>>   
>> -struct vfio_ap_queue_reserved {
>> -	unsigned long *apid;
>> -	unsigned long *apqi;
>> -	bool reserved;
>> -};
>> +#define MDEV_SHARING_ERR "Userspace may not re-assign queue %02lx.%04lx " \
>> +			 "already assigned to %s"
>>   
>> -/**
>> - * vfio_ap_has_queue
>> - *
>> - * @dev: an AP queue device
>> - * @data: a struct vfio_ap_queue_reserved reference
>> - *
>> - * Flags whether the AP queue device (@dev) has a queue ID containing the APQN,
>> - * apid or apqi specified in @data:
>> - *
>> - * - If @data contains both an apid and apqi value, then @data will be flagged
>> - *   as reserved if the APID and APQI fields for the AP queue device matches
>> - *
>> - * - If @data contains only an apid value, @data will be flagged as
>> - *   reserved if the APID field in the AP queue device matches
>> - *
>> - * - If @data contains only an apqi value, @data will be flagged as
>> - *   reserved if the APQI field in the AP queue device matches
>> - *
>> - * Returns 0 to indicate the input to function succeeded. Returns -EINVAL if
>> - * @data does not contain either an apid or apqi.
>> - */
>> -static int vfio_ap_has_queue(struct device *dev, void *data)
>> +static void vfio_ap_mdev_log_sharing_err(const char *mdev_name,
>> +					 unsigned long *apm,
>> +					 unsigned long *aqm)
> [..]
>> -	return 0;
>> +	for_each_set_bit_inv(apid, apm, AP_DEVICES)
>> +		for_each_set_bit_inv(apqi, aqm, AP_DOMAINS)
>> +			pr_warn(MDEV_SHARING_ERR, apid, apqi, mdev_name);
> I would prefer dev_warn() here. We know which device is about to get
> more queues, and this device can provide a clue regarding the initiator.

Will do.

>
> Also I believe a warning is too heavy handed here. Warnings should not
> be ignored. This is a condition that can emerge during normal operation,
> AFAIU. Or am I wrong?

It can happen during normal operation, but we had this discussion
in the previous review. Both Connie and I felt it should be a warning
since this message is the only way for a user to identify the queues
in use. A message of lower severity may not get logged depriving the
user from easily determining why an adapter or domain could not
be assigned.

>
>>   }
>>   
>>   /**
>>    * vfio_ap_mdev_verify_no_sharing
>>    *
>> - * Verifies that the APQNs derived from the cross product of the AP adapter IDs
>> - * and AP queue indexes comprising the AP matrix are not configured for another
>> - * mediated device. AP queue sharing is not allowed.
>> + * Verifies that each APQN derived from the Cartesian product of the AP adapter
>> + * IDs and AP queue indexes comprising the AP matrix are not configured for
>> + * another mediated device. AP queue sharing is not allowed.
>>    *
>> - * @matrix_mdev: the mediated matrix device
>> + * @matrix_mdev: the mediated matrix device to which the APQNs being verified
>> + *		 are assigned.
>> + * @mdev_apm: mask indicating the APIDs of the APQNs to be verified
>> + * @mdev_aqm: mask indicating the APQIs of the APQNs to be verified
>>    *
>> - * Returns 0 if the APQNs are not shared, otherwise; returns -EADDRINUSE.
>> + * Returns 0 if the APQNs are not shared, otherwise; returns -EBUSY.
>>    */
>> -static int vfio_ap_mdev_verify_no_sharing(struct ap_matrix_mdev *matrix_mdev)
>> +static int vfio_ap_mdev_verify_no_sharing(struct ap_matrix_mdev *matrix_mdev,
>> +					  unsigned long *mdev_apm,
>> +					  unsigned long *mdev_aqm)
>>   {
>>   	struct ap_matrix_mdev *lstdev;
>>   	DECLARE_BITMAP(apm, AP_DEVICES);
>> @@ -523,20 +426,31 @@ static int vfio_ap_mdev_verify_no_sharing(struct ap_matrix_mdev *matrix_mdev)
>>   		 * We work on full longs, as we can only exclude the leftover
>>   		 * bits in non-inverse order. The leftover is all zeros.
>>   		 */
>> -		if (!bitmap_and(apm, matrix_mdev->matrix.apm,
>> -				lstdev->matrix.apm, AP_DEVICES))
>> +		if (!bitmap_and(apm, mdev_apm, lstdev->matrix.apm, AP_DEVICES))
>>   			continue;
>>   
>> -		if (!bitmap_and(aqm, matrix_mdev->matrix.aqm,
>> -				lstdev->matrix.aqm, AP_DOMAINS))
>> +		if (!bitmap_and(aqm, mdev_aqm, lstdev->matrix.aqm, AP_DOMAINS))
>>   			continue;
>>   
>> -		return -EADDRINUSE;
>> +		vfio_ap_mdev_log_sharing_err(dev_name(mdev_dev(lstdev->mdev)),
>> +					     apm, aqm);
>> +
>> +		return -EBUSY;
> Why do we change -EADDRINUSE to -EBUSY? This gets bubbled up to
> userspace, or? So a tool that checks for the other mdev has it
> condition by checking for -EADDRINUSE, would be confused...

Back in v8 of the series, Christian suggested the occurrences
of -EADDRINUSE should be replaced by the more appropriate
-EBUSY (Message ID <d7954c15-b14f-d6e5-0193-aadca61883a8@de.ibm.com>),
so I changed it here. It does get bubbled up to userspace, so you make a
valid point. I will change it back. I will, however, set the value
returned from the
__verify_card_reservations() function in ap_bus.c to -EBUSY as
suggested by Christian.

>
>>   	}
>>   
>>   	return 0;
>>   }
>>   
>> +static int vfio_ap_mdev_validate_masks(struct ap_matrix_mdev *matrix_mdev,
>> +				       unsigned long *mdev_apm,
>> +				       unsigned long *mdev_aqm)
>> +{
>> +	if (ap_apqn_in_matrix_owned_by_def_drv(mdev_apm, mdev_aqm))
>> +		return -EADDRNOTAVAIL;
>> +
>> +	return vfio_ap_mdev_verify_no_sharing(matrix_mdev, mdev_apm, mdev_aqm);
>> +}
>> +
>>   static void vfio_ap_mdev_link_queue(struct ap_matrix_mdev *matrix_mdev,
>>   				    struct vfio_ap_queue *q)
>>   {
>> @@ -608,10 +522,10 @@ static void vfio_ap_mdev_link_adapter(struct ap_matrix_mdev *matrix_mdev,
>>    *	   driver; or, if no APQIs have yet been assigned, the APID is not
>>    *	   contained in an APQN bound to the vfio_ap device driver.
>>    *
>> - *	4. -EADDRINUSE
>> + *	4. -EBUSY
>>    *	   An APQN derived from the cross product of the APID being assigned
>>    *	   and the APQIs previously assigned is being used by another mediated
>> - *	   matrix device
>> + *	   matrix device or the mdev lock could not be acquired.
> This is premature. We don't use try_lock yet.

Yes, it is. The comment describing the -EBUSY return code will
be moved to patch 11/15, where the try_lock is introduced.

>
> [..]
>
>>   static void vfio_ap_mdev_link_domain(struct ap_matrix_mdev *matrix_mdev,
>>   				     unsigned long apqi)
>>   {
>> @@ -774,10 +660,10 @@ static void vfio_ap_mdev_link_domain(struct ap_matrix_mdev *matrix_mdev,
>>    *	   driver; or, if no APIDs have yet been assigned, the APQI is not
>>    *	   contained in an APQN bound to the vfio_ap device driver.
>>    *
>> - *	4. -EADDRINUSE
>> + *	4. -BUSY
>>    *	   An APQN derived from the cross product of the APQI being assigned
>>    *	   and the APIDs previously assigned is being used by another mediated
>> - *	   matrix device
>> + *	   matrix device or the mdev lock could not be acquired.
> Same here as above.
>
> Otherwise looks good.
>
> [..]


^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v13 05/15] s390/vfio-ap: manage link between queue struct and matrix mdev
  2021-01-14  2:50       ` Halil Pasic
@ 2021-01-14 21:10         ` Tony Krowiak
  0 siblings, 0 replies; 48+ messages in thread
From: Tony Krowiak @ 2021-01-14 21:10 UTC (permalink / raw)
  To: Halil Pasic
  Cc: linux-s390, linux-kernel, kvm, freude, borntraeger, cohuck,
	mjrosato, alex.williamson, kwankhede, fiuczy, frankja, david,
	hca, gor



On 1/13/21 9:50 PM, Halil Pasic wrote:
> On Wed, 13 Jan 2021 16:41:27 -0500
> Tony Krowiak <akrowiak@linux.ibm.com> wrote:
>
>> On 1/11/21 2:17 PM, Halil Pasic wrote:
>>> On Tue, 22 Dec 2020 20:15:56 -0500
>>> Tony Krowiak <akrowiak@linux.ibm.com> wrote:
>>>   
>>>> Let's create links between each queue device bound to the vfio_ap device
>>>> driver and the matrix mdev to which the queue's APQN is assigned. The idea
>>>> is to facilitate efficient retrieval of the objects representing the queue
>>>> devices and matrix mdevs as well as to verify that a queue assigned to
>>>> a matrix mdev is bound to the driver.
>>>>
>>>> The links will be created as follows:
>>>>
>>>>      * When the queue device is probed, if its APQN is assigned to a matrix
>>>>        mdev, the structures representing the queue device and the matrix mdev
>>>>        will be linked.
>>>>
>>>>      * When an adapter or domain is assigned to a matrix mdev, for each new
>>>>        APQN assigned that references a queue device bound to the vfio_ap
>>>>        device driver, the structures representing the queue device and the
>>>>        matrix mdev will be linked.
>>>>
>>>> The links will be removed as follows:
>>>>
>>>>      * When the queue device is removed, if its APQN is assigned to a matrix
>>>>        mdev, the structures representing the queue device and the matrix mdev
>>>>        will be unlinked.
>>>>
>>>>      * When an adapter or domain is unassigned from a matrix mdev, for each
>>>>        APQN unassigned that references a queue device bound to the vfio_ap
>>>>        device driver, the structures representing the queue device and the
>>>>        matrix mdev will be unlinked.
>>>>
>>>> Signed-off-by: Tony Krowiak <akrowiak@linux.ibm.com>
>>> Reviewed-by: Halil Pasic <pasic@linux.ibm.com>
>>>   
> [..]
>
>>>> +
>>>>    int vfio_ap_mdev_probe_queue(struct ap_device *apdev)
>>>>    {
>>>>    	struct vfio_ap_queue *q;
>>>> @@ -1324,9 +1404,13 @@ int vfio_ap_mdev_probe_queue(struct ap_device *apdev)
>>>>    	q = kzalloc(sizeof(*q), GFP_KERNEL);
>>>>    	if (!q)
>>>>    		return -ENOMEM;
>>>> +	mutex_lock(&matrix_dev->lock);
>>>>    	dev_set_drvdata(&apdev->device, q);
>>>>    	q->apqn = to_ap_queue(&apdev->device)->qid;
>>>>    	q->saved_isc = VFIO_AP_ISC_INVALID;
>>>> +	vfio_ap_queue_link_mdev(q);
>>>> +	mutex_unlock(&matrix_dev->lock);
>>>> +
>>> Does the critical section have to include more than just
>>> vfio_ap_queue_link_mdev()? Did we need the critical section
>>> before this patch?
>> We did not need the critical section before this patch because
>> the only function that retrieved the vfio_ap_queue via the queue
>> device's drvdata was the remove callback. I included the initialization
>> of the vfio_ap_queue object under lock because the
>> vfio_ap_find_queue() function retrieves the vfio_ap_queue object from
>> the queue device's drvdata so it might be advantageous to initialize
>> it under the mdev lock. On the other hand, I can't come up with a good
>> argument to change this.
>>
>>
> I was asking out of curiosity, not because I want it changed. I was
> also wondering if somebody could see a partially initialized device:
> we even first call dev_set_drvdata() and only then finish the
> initialization. Before 's390/vfio-ap: use new AP bus interface to search
> for queue devices', which is the previous patch, we had the klist code
> in between, which uses spinlocks, which I think ensure, that all
> effects of probe are seen when we get the queue from
> vfio_ap_find_queue(). But with patch 4 in place that is not the case any
> more. Or am I wrong?

You are correct insofar as patch 4 replaces the driver_find_device()
function call with a call to AP bus's ap_get_qdev() function which
does not use spinlocks. Without digging deeply into the probe call
chain I do not know whether or not the use of spinlocks by the klist
code ensures all effects of the probe are seen when we get the
queue from vfio_ap_find_queue(). What I'm sure about is that since
both vfio_ap_find_queue() and the setting of the drvdata in the
probe function are always done under the mdev lock, consistency
should be maintained. What I did decide when thinking about your
previous review comment is that we should probably initialize the
vfio_ap_queue object before setting the drvdata, so I made that change.
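
Roughly, the ordering I mean looks like the sketch below. It is a
user-space analogy only, assuming a pthread mutex in place of
matrix_dev->lock and a plain pointer in place of the device's drvdata;
ISC_INVALID, struct queue, probe_queue() and find_queue() are hypothetical
stand-ins, not the driver's names.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

#define ISC_INVALID 0xff	/* hypothetical stand-in for VFIO_AP_ISC_INVALID */

struct queue {
	int apqn;
	unsigned char saved_isc;
};

static pthread_mutex_t matrix_lock = PTHREAD_MUTEX_INITIALIZER;
static struct queue *drvdata;	/* stand-in for the device's drvdata slot */

/* Fully initialize the object first, then publish it under the lock. */
static int probe_queue(int apqn)
{
	struct queue *q;

	assert(apqn >= 0);
	q = calloc(1, sizeof(*q));
	if (!q)
		return -1;

	q->apqn = apqn;
	q->saved_isc = ISC_INVALID;

	pthread_mutex_lock(&matrix_lock);
	drvdata = q;		/* readers can only ever see a complete object */
	pthread_mutex_unlock(&matrix_lock);
	return 0;
}

/* Readers take the same lock, so they observe everything probe wrote. */
static struct queue *find_queue(void)
{
	struct queue *q;

	pthread_mutex_lock(&matrix_lock);
	q = drvdata;
	pthread_mutex_unlock(&matrix_lock);
	return q;
}
```

The point is only the order of operations: a reader that takes the lock
either sees no queue at all or a fully initialized one.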

>
> Regards,
> Halil


^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v13 07/15] s390/vfio-ap: introduce shadow APCB
  2021-01-11 22:50   ` Halil Pasic
@ 2021-01-14 21:35     ` Tony Krowiak
  0 siblings, 0 replies; 48+ messages in thread
From: Tony Krowiak @ 2021-01-14 21:35 UTC (permalink / raw)
  To: Halil Pasic
  Cc: linux-s390, linux-kernel, kvm, freude, borntraeger, cohuck,
	mjrosato, alex.williamson, kwankhede, fiuczy, frankja, david,
	hca, gor



On 1/11/21 5:50 PM, Halil Pasic wrote:
> On Tue, 22 Dec 2020 20:15:58 -0500
> Tony Krowiak <akrowiak@linux.ibm.com> wrote:
>
>> The APCB is a field within the CRYCB that provides the AP configuration
>> to a KVM guest. Let's introduce a shadow copy of the KVM guest's APCB and
>> maintain it for the lifespan of the guest.
>>
>> Signed-off-by: Tony Krowiak <akrowiak@linux.ibm.com>
>> Reviewed-by: Halil Pasic <pasic@linux.ibm.com>
>> ---
>>   drivers/s390/crypto/vfio_ap_ops.c     | 15 +++++++++++++++
>>   drivers/s390/crypto/vfio_ap_private.h |  2 ++
>>   2 files changed, 17 insertions(+)
>>
>> diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c
>> index 2d58b39977be..44b3a81cadfb 100644
>> --- a/drivers/s390/crypto/vfio_ap_ops.c
>> +++ b/drivers/s390/crypto/vfio_ap_ops.c
>> @@ -293,6 +293,20 @@ static void vfio_ap_matrix_init(struct ap_config_info *info,
>>   	matrix->adm_max = info->apxa ? info->Nd : 15;
>>   }
>>   
>> +static bool vfio_ap_mdev_has_crycb(struct ap_matrix_mdev *matrix_mdev)
>> +{
>> +	return (matrix_mdev->kvm && matrix_mdev->kvm->arch.crypto.crycbd);
>> +}
>> +
>> +static void vfio_ap_mdev_commit_shadow_apcb(struct ap_matrix_mdev *matrix_mdev)
>> +{
>> +	if (vfio_ap_mdev_has_crycb(matrix_mdev))
>> +		kvm_arch_crypto_set_masks(matrix_mdev->kvm,
>> +					  matrix_mdev->shadow_apcb.apm,
>> +					  matrix_mdev->shadow_apcb.aqm,
>> +					  matrix_mdev->shadow_apcb.adm);
>> +}
>> +
>>   static int vfio_ap_mdev_create(struct kobject *kobj, struct mdev_device *mdev)
>>   {
>>   	struct ap_matrix_mdev *matrix_mdev;
>> @@ -308,6 +322,7 @@ static int vfio_ap_mdev_create(struct kobject *kobj, struct mdev_device *mdev)
>>   
>>   	matrix_mdev->mdev = mdev;
>>   	vfio_ap_matrix_init(&matrix_dev->info, &matrix_mdev->matrix);
>> +	vfio_ap_matrix_init(&matrix_dev->info, &matrix_mdev->shadow_apcb);
>>   	hash_init(matrix_mdev->qtable);
>>   	mdev_set_drvdata(mdev, matrix_mdev);
>>   	matrix_mdev->pqap_hook.hook = handle_pqap;
>> diff --git a/drivers/s390/crypto/vfio_ap_private.h b/drivers/s390/crypto/vfio_ap_private.h
>> index 4e5cc72fc0db..d2d26ba18602 100644
>> --- a/drivers/s390/crypto/vfio_ap_private.h
>> +++ b/drivers/s390/crypto/vfio_ap_private.h
>> @@ -75,6 +75,7 @@ struct ap_matrix {
>>    * @list:	allows the ap_matrix_mdev struct to be added to a list
>>    * @matrix:	the adapters, usage domains and control domains assigned to the
>>    *		mediated matrix device.
>> + * @shadow_apcb:    the shadow copy of the APCB field of the KVM guest's CRYCB
>>    * @group_notifier: notifier block used for specifying callback function for
>>    *		    handling the VFIO_GROUP_NOTIFY_SET_KVM event
>>    * @kvm:	the struct holding guest's state
>> @@ -82,6 +83,7 @@ struct ap_matrix {
>>   struct ap_matrix_mdev {
>>   	struct list_head node;
>>   	struct ap_matrix matrix;
>> +	struct ap_matrix shadow_apcb;
>>   	struct notifier_block group_notifier;
>>   	struct notifier_block iommu_notifier;
>>   	struct kvm *kvm;
> What happened to the following hunk from v12?

That's a very good question, I'll reinstate it.

>
> @@ -1218,13 +1233,9 @@ static int vfio_ap_mdev_group_notifier(struct notifier_block *nb,
>   	if (ret)
>   		return NOTIFY_DONE;
>   
> -	/* If there is no CRYCB pointer, then we can't copy the masks */
> -	if (!matrix_mdev->kvm->arch.crypto.crycbd)
> -		return NOTIFY_DONE;
> -
> -	kvm_arch_crypto_set_masks(matrix_mdev->kvm, matrix_mdev->matrix.apm,
> -				  matrix_mdev->matrix.aqm,
> -				  matrix_mdev->matrix.adm);
> +	memcpy(&matrix_mdev->shadow_apcb, &matrix_mdev->matrix,
> +	       sizeof(matrix_mdev->shadow_apcb));
> +	vfio_ap_mdev_commit_shadow_apcb(matrix_mdev);
>   
>   	return NOTIFY_OK;
>   }
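
For clarity, what the reinstated hunk is meant to do can be sketched in
user space as follows. This is an assumption-laden illustration, not the
driver code: struct guest stands in for struct kvm and its CRYCB, the masks
are single 64-bit words, and on_set_kvm() and commit_shadow_apcb() are
hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

struct matrix {
	uint64_t apm, aqm, adm;
};

/* Hypothetical stand-in for struct kvm and the guest's CRYCB. */
struct guest {
	bool has_crycb;
	struct matrix apcb;
};

struct mdev {
	struct matrix matrix;		/* what the admin assigned */
	struct matrix shadow_apcb;	/* what the guest is given */
	struct guest *guest;
};

/* Push the shadow copy to the guest only if there is a CRYCB to set. */
static bool commit_shadow_apcb(struct mdev *m)
{
	if (!m->guest || !m->guest->has_crycb)
		return false;
	m->guest->apcb = m->shadow_apcb;
	return true;
}

/* On the SET_KVM notification: refresh the shadow, then commit it. */
static bool on_set_kvm(struct mdev *m, struct guest *g)
{
	assert(m != NULL);

	m->guest = g;
	memcpy(&m->shadow_apcb, &m->matrix, sizeof(m->shadow_apcb));
	return commit_shadow_apcb(m);
}
```

The shadow is refreshed unconditionally; only the commit depends on the
guest having a CRYCB.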


^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v13 06/15] s390/vfio-ap: allow assignment of unavailable AP queues to mdev device
  2021-01-14 17:54     ` Tony Krowiak
@ 2021-01-15  1:08       ` Halil Pasic
  2021-01-15  1:44       ` Halil Pasic
  1 sibling, 0 replies; 48+ messages in thread
From: Halil Pasic @ 2021-01-15  1:08 UTC (permalink / raw)
  To: Tony Krowiak
  Cc: linux-s390, linux-kernel, kvm, freude, borntraeger, cohuck,
	mjrosato, alex.williamson, kwankhede, fiuczy, frankja, david,
	hca, gor

On Thu, 14 Jan 2021 12:54:39 -0500
Tony Krowiak <akrowiak@linux.ibm.com> wrote:

> On 1/11/21 3:40 PM, Halil Pasic wrote:
> > On Tue, 22 Dec 2020 20:15:57 -0500
> > Tony Krowiak <akrowiak@linux.ibm.com> wrote:
> >  
> >> The current implementation does not allow assignment of an AP adapter or
> >> domain to an mdev device if each APQN resulting from the assignment
> >> does not reference an AP queue device that is bound to the vfio_ap device
> >> driver. This patch allows assignment of AP resources to the matrix mdev as
> >> long as the APQNs resulting from the assignment:
> >>     1. Are not reserved by the AP BUS for use by the zcrypt device drivers.
> >>     2. Are not assigned to another matrix mdev.
> >>
> >> The rationale behind this is twofold:
> >>     1. The AP architecture does not preclude assignment of APQNs to an AP
> >>        configuration that are not available to the system.
> >>     2. APQNs that do not reference a queue device bound to the vfio_ap
> >>        device driver will not be assigned to the guest's CRYCB, so the
> >>        guest will not get access to queues not bound to the vfio_ap driver.  
> > You didn't tell us about the changed error code.  
> 
> I am assuming you are talking about returning -EBUSY from
> the vfio_ap_mdev_verify_no_sharing() function instead of
> -EADDRINUSE. I'm going to change this back per your comments
> below.
> 
> >
> > Also notice that at this point we have neither filtering nor in-use.
> > This used to be patch 11, and most of that stuff used to be in place. But
> > I'm going to trust you, if you say it's fine to enable it this early.
> 
> The patch order was changed due to your review comments in
> Message ID <20201126165431.6ef1457a.pasic@linux.ibm.com>,
> patch 07/17 in the v12 series. In order to ensure that only queues
> bound to the vfio_ap driver are given to the guest, I'm going to
> create a patch that will precede this one and introduce the
> filtering code currently introduced in patch 12/17, the hot
> plug patch.
> 

I don't want to delay this any further, so it's up to you. I don't think
we will get the in-between steps perfect anyway.

I've re-read Message ID
 <20201126165431.6ef1457a.pasic@linux.ibm.com> and I didn't
ask for this change. I pointed out a problem and said that maybe it could
be solved by reordering; I didn't think it through.

[..]


* Re: [PATCH v13 06/15] s390/vfio-ap: allow assignment of unavailable AP queues to mdev device
  2021-01-14 17:54     ` Tony Krowiak
  2021-01-15  1:08       ` Halil Pasic
@ 2021-01-15  1:44       ` Halil Pasic
  2021-03-31 14:36         ` Tony Krowiak
  1 sibling, 1 reply; 48+ messages in thread
From: Halil Pasic @ 2021-01-15  1:44 UTC (permalink / raw)
  To: Tony Krowiak
  Cc: linux-s390, linux-kernel, kvm, freude, borntraeger, cohuck,
	mjrosato, alex.williamson, kwankhede, fiuczy, frankja, david,
	hca, gor

On Thu, 14 Jan 2021 12:54:39 -0500
Tony Krowiak <akrowiak@linux.ibm.com> wrote:

> >>   /**
> >>    * vfio_ap_mdev_verify_no_sharing
> >>    *
> >> - * Verifies that the APQNs derived from the cross product of the AP adapter IDs
> >> - * and AP queue indexes comprising the AP matrix are not configured for another
> >> - * mediated device. AP queue sharing is not allowed.
> >> + * Verifies that each APQN derived from the Cartesian product of the AP adapter
> >> + * IDs and AP queue indexes comprising the AP matrix are not configured for
> >> + * another mediated device. AP queue sharing is not allowed.
> >>    *
> >> - * @matrix_mdev: the mediated matrix device
> >> + * @matrix_mdev: the mediated matrix device to which the APQNs being verified
> >> + *		 are assigned.
> >> + * @mdev_apm: mask indicating the APIDs of the APQNs to be verified
> >> + * @mdev_aqm: mask indicating the APQIs of the APQNs to be verified
> >>    *
> >> - * Returns 0 if the APQNs are not shared, otherwise; returns -EADDRINUSE.
> >> + * Returns 0 if the APQNs are not shared, otherwise; returns -EBUSY.
> >>    */
> >> -static int vfio_ap_mdev_verify_no_sharing(struct ap_matrix_mdev *matrix_mdev)
> >> +static int vfio_ap_mdev_verify_no_sharing(struct ap_matrix_mdev *matrix_mdev,
> >> +					  unsigned long *mdev_apm,
> >> +					  unsigned long *mdev_aqm)
> >>   {
> >>   	struct ap_matrix_mdev *lstdev;
> >>   	DECLARE_BITMAP(apm, AP_DEVICES);
> >> @@ -523,20 +426,31 @@ static int vfio_ap_mdev_verify_no_sharing(struct ap_matrix_mdev *matrix_mdev)
> >>   		 * We work on full longs, as we can only exclude the leftover
> >>   		 * bits in non-inverse order. The leftover is all zeros.
> >>   		 */
> >> -		if (!bitmap_and(apm, matrix_mdev->matrix.apm,
> >> -				lstdev->matrix.apm, AP_DEVICES))
> >> +		if (!bitmap_and(apm, mdev_apm, lstdev->matrix.apm, AP_DEVICES))
> >>   			continue;
> >>   
> >> -		if (!bitmap_and(aqm, matrix_mdev->matrix.aqm,
> >> -				lstdev->matrix.aqm, AP_DOMAINS))
> >> +		if (!bitmap_and(aqm, mdev_aqm, lstdev->matrix.aqm, AP_DOMAINS))
> >>   			continue;
> >>   
> >> -		return -EADDRINUSE;
> >> +		vfio_ap_mdev_log_sharing_err(dev_name(mdev_dev(lstdev->mdev)),
> >> +					     apm, aqm);
> >> +
> >> +		return -EBUSY;  
> > Why do we change -EADDRINUSE to -EBUSY? This gets bubbled up to
> > userspace, doesn't it? So a tool that detects the "another mdev has it"
> > condition by checking for -EADDRINUSE would be confused...
> 
> Back in v8 of the series, Christian suggested the occurrences
> of -EADDRINUSE should be replaced by the more appropriate
> -EBUSY (Message ID <d7954c15-b14f-d6e5-0193-aadca61883a8@de.ibm.com>),
> so I changed it here. It does get bubbled up to userspace, so you make a
> valid point. I will change it back. I will, however, set the value
> returned from the __verify_card_reservations() function in ap_bus.c to
> -EBUSY as suggested by Christian.

As long as the error code for an ephemeral failure (can't take a lock
right now) and the error code for a failure due to a sharing conflict
(which most likely requires admin action to be resolved) are different,
I'm fine.

Choosing EBUSY for a sharing conflict and something else for can't take
the lock in the case of the bus attributes, while choosing EADDRINUSE for
a sharing conflict and EBUSY for can't take the lock in the case of the
mdev attributes (assign_*; unassign_*), sounds confusing to me, but is
still better than conflating the two conditions. Maybe we can choose
EAGAIN or EWOULDBLOCK for can't take the lock right now. I don't know.

I'm open to suggestions. And if Christian wants to change this for
the already released interfaces, I will have to live with that. But it
has to be a conscious decision at least.

What I consider tricky about EBUSY is that, according to my intuition,
in pseudocode, object.operation(argument) returning -EBUSY probably tells
me that object is busy (i.e. is in the middle of something incompatible
with performing operation). In our case, it is not the object that is
busy, but the resource denoted by the argument.
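
To make the distinction concrete, here is a minimal userspace sketch (not
code from this series). It assumes 64-bit masks instead of the driver's
bitmaps, and every sketch_* name is hypothetical. It only illustrates
keeping the transient "can't take the lock" failure (-EAGAIN here, one of
the options floated above) apart from the sharing conflict (-EADDRINUSE),
where a conflict exists only if both an APID and an APQI overlap:

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for an mdev's matrix: one bit per APID / APQI. */
struct sketch_matrix {
	uint64_t apm;	/* adapters (APIDs) */
	uint64_t aqm;	/* domains (APQIs) */
};

/*
 * Returns 0 if no APQN of @mine is also configured in @others,
 * otherwise -EADDRINUSE. An APQN is shared only when both the
 * adapter mask and the domain mask intersect.
 */
int sketch_verify_no_sharing(const struct sketch_matrix *mine,
			     const struct sketch_matrix *others,
			     size_t nr_others)
{
	assert(mine && (others || nr_others == 0));

	for (size_t i = 0; i < nr_others; i++) {
		if (!(mine->apm & others[i].apm))
			continue;
		if (!(mine->aqm & others[i].aqm))
			continue;
		return -EADDRINUSE;
	}
	return 0;
}

/*
 * @lock_is_contended models a failed trylock. The transient condition
 * gets its own error code so callers can tell "retry" from "conflict".
 */
int sketch_assign(bool lock_is_contended,
		  const struct sketch_matrix *mine,
		  const struct sketch_matrix *others, size_t nr_others)
{
	if (lock_is_contended)
		return -EAGAIN;
	return sketch_verify_no_sharing(mine, others, nr_others);
}

/* What a userspace tool might conclude from each code. */
const char *sketch_advice(int rc)
{
	switch (rc) {
	case 0:
		return "ok";
	case -EAGAIN:
		return "retry later";
	case -EADDRINUSE:
		return "admin action needed: queue belongs to another mdev";
	default:
		return "other failure";
	}
}
```

With distinct codes, a tool can simply retry on the first and report the
second to the admin; with a single code for both it could do neither
reliably.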

Regards,
Halil


* Re: [PATCH v13 08/15] s390/vfio-ap: sysfs attribute to display the guest's matrix
  2021-01-11 22:58   ` Halil Pasic
@ 2021-01-28 21:29     ` Tony Krowiak
  0 siblings, 0 replies; 48+ messages in thread
From: Tony Krowiak @ 2021-01-28 21:29 UTC (permalink / raw)
  To: Halil Pasic
  Cc: linux-s390, linux-kernel, kvm, freude, borntraeger, cohuck,
	mjrosato, alex.williamson, kwankhede, fiuczy, frankja, david,
	hca, gor



On 1/11/21 5:58 PM, Halil Pasic wrote:
> On Tue, 22 Dec 2020 20:15:59 -0500
> Tony Krowiak <akrowiak@linux.ibm.com> wrote:
>
>> The matrix of adapters and domains configured in a guest's APCB may
>> differ from the matrix of adapters and domains assigned to the matrix mdev,
>> so this patch introduces a sysfs attribute to display the matrix of
>> adapters and domains that are or will be assigned to the APCB of a guest
>> that is or will be using the matrix mdev. For a matrix mdev denoted by
>> $uuid, the guest matrix can be displayed as follows:
>>
>>     cat /sys/devices/vfio_ap/matrix/$uuid/guest_matrix
>>
>> Signed-off-by: Tony Krowiak <akrowiak@linux.ibm.com>
> Reviewed-by: Halil Pasic <pasic@linux.ibm.com>
>
> But because vfio_ap_mdev_commit_shadow_apcb() is not used (see prev
> patch) the attribute won't show the guest matrix at this point. :(

I'll move this patch following all of the filtering and hot plug
patches.

>
>> ---
>>   drivers/s390/crypto/vfio_ap_ops.c | 51 ++++++++++++++++++++++---------
>>   1 file changed, 37 insertions(+), 14 deletions(-)
>>
>> diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c
>> index 44b3a81cadfb..1b1d5975ee0e 100644
>> --- a/drivers/s390/crypto/vfio_ap_ops.c
>> +++ b/drivers/s390/crypto/vfio_ap_ops.c
>> @@ -894,29 +894,24 @@ static ssize_t control_domains_show(struct device *dev,
>>   }
>>   static DEVICE_ATTR_RO(control_domains);
>>   
>> -static ssize_t matrix_show(struct device *dev, struct device_attribute *attr,
>> -			   char *buf)
>> +static ssize_t vfio_ap_mdev_matrix_show(struct ap_matrix *matrix, char *buf)
>>   {
>> -	struct mdev_device *mdev = mdev_from_dev(dev);
>> -	struct ap_matrix_mdev *matrix_mdev = mdev_get_drvdata(mdev);
>>   	char *bufpos = buf;
>>   	unsigned long apid;
>>   	unsigned long apqi;
>>   	unsigned long apid1;
>>   	unsigned long apqi1;
>> -	unsigned long napm_bits = matrix_mdev->matrix.apm_max + 1;
>> -	unsigned long naqm_bits = matrix_mdev->matrix.aqm_max + 1;
>> +	unsigned long napm_bits = matrix->apm_max + 1;
>> +	unsigned long naqm_bits = matrix->aqm_max + 1;
>>   	int nchars = 0;
>>   	int n;
>>   
>> -	apid1 = find_first_bit_inv(matrix_mdev->matrix.apm, napm_bits);
>> -	apqi1 = find_first_bit_inv(matrix_mdev->matrix.aqm, naqm_bits);
>> -
>> -	mutex_lock(&matrix_dev->lock);
>> +	apid1 = find_first_bit_inv(matrix->apm, napm_bits);
>> +	apqi1 = find_first_bit_inv(matrix->aqm, naqm_bits);
>>   
>>   	if ((apid1 < napm_bits) && (apqi1 < naqm_bits)) {
>> -		for_each_set_bit_inv(apid, matrix_mdev->matrix.apm, napm_bits) {
>> -			for_each_set_bit_inv(apqi, matrix_mdev->matrix.aqm,
>> +		for_each_set_bit_inv(apid, matrix->apm, napm_bits) {
>> +			for_each_set_bit_inv(apqi, matrix->aqm,
>>   					     naqm_bits) {
>>   				n = sprintf(bufpos, "%02lx.%04lx\n", apid,
>>   					    apqi);
>> @@ -925,25 +920,52 @@ static ssize_t matrix_show(struct device *dev, struct device_attribute *attr,
>>   			}
>>   		}
>>   	} else if (apid1 < napm_bits) {
>> -		for_each_set_bit_inv(apid, matrix_mdev->matrix.apm, napm_bits) {
>> +		for_each_set_bit_inv(apid, matrix->apm, napm_bits) {
>>   			n = sprintf(bufpos, "%02lx.\n", apid);
>>   			bufpos += n;
>>   			nchars += n;
>>   		}
>>   	} else if (apqi1 < naqm_bits) {
>> -		for_each_set_bit_inv(apqi, matrix_mdev->matrix.aqm, naqm_bits) {
>> +		for_each_set_bit_inv(apqi, matrix->aqm, naqm_bits) {
>>   			n = sprintf(bufpos, ".%04lx\n", apqi);
>>   			bufpos += n;
>>   			nchars += n;
>>   		}
>>   	}
>>   
>> +	return nchars;
>> +}
>> +
>> +static ssize_t matrix_show(struct device *dev, struct device_attribute *attr,
>> +			   char *buf)
>> +{
>> +	ssize_t nchars;
>> +	struct mdev_device *mdev = mdev_from_dev(dev);
>> +	struct ap_matrix_mdev *matrix_mdev = mdev_get_drvdata(mdev);
>> +
>> +	mutex_lock(&matrix_dev->lock);
>> +	nchars = vfio_ap_mdev_matrix_show(&matrix_mdev->matrix, buf);
>>   	mutex_unlock(&matrix_dev->lock);
>>   
>>   	return nchars;
>>   }
>>   static DEVICE_ATTR_RO(matrix);
>>   
>> +static ssize_t guest_matrix_show(struct device *dev,
>> +				 struct device_attribute *attr, char *buf)
>> +{
>> +	ssize_t nchars;
>> +	struct mdev_device *mdev = mdev_from_dev(dev);
>> +	struct ap_matrix_mdev *matrix_mdev = mdev_get_drvdata(mdev);
>> +
>> +	mutex_lock(&matrix_dev->lock);
>> +	nchars = vfio_ap_mdev_matrix_show(&matrix_mdev->shadow_apcb, buf);
>> +	mutex_unlock(&matrix_dev->lock);
>> +
>> +	return nchars;
>> +}
>> +static DEVICE_ATTR_RO(guest_matrix);
>> +
>>   static struct attribute *vfio_ap_mdev_attrs[] = {
>>   	&dev_attr_assign_adapter.attr,
>>   	&dev_attr_unassign_adapter.attr,
>> @@ -953,6 +975,7 @@ static struct attribute *vfio_ap_mdev_attrs[] = {
>>   	&dev_attr_unassign_control_domain.attr,
>>   	&dev_attr_control_domains.attr,
>>   	&dev_attr_matrix.attr,
>> +	&dev_attr_guest_matrix.attr,
>>   	NULL,
>>   };
>>   



* Re: [PATCH v13 09/15] s390/vfio-ap: allow hot plug/unplug of AP resources using mdev device
  2021-01-12 17:55     ` Halil Pasic
@ 2021-02-01 14:41       ` Tony Krowiak
  2021-02-03 23:13       ` Tony Krowiak
  1 sibling, 0 replies; 48+ messages in thread
From: Tony Krowiak @ 2021-02-01 14:41 UTC (permalink / raw)
  To: Halil Pasic
  Cc: linux-s390, linux-kernel, kvm, freude, borntraeger, cohuck,
	mjrosato, alex.williamson, kwankhede, fiuczy, frankja, david,
	hca, gor



On 1/12/21 12:55 PM, Halil Pasic wrote:
> On Tue, 12 Jan 2021 02:12:51 +0100
> Halil Pasic <pasic@linux.ibm.com> wrote:
>
>>> @@ -1347,8 +1437,11 @@ void vfio_ap_mdev_remove_queue(struct ap_device *apdev)
>>>   	apqi = AP_QID_QUEUE(q->apqn);
>>>   	vfio_ap_mdev_reset_queue(apid, apqi, 1);
>>>   
>>> -	if (q->matrix_mdev)
>>> +	if (q->matrix_mdev) {
>>> +		matrix_mdev = q->matrix_mdev;
>>>   		vfio_ap_mdev_unlink_queue(q);
>>> +		vfio_ap_mdev_refresh_apcb(matrix_mdev);
>>> +	}
>>>   
>>>   	kfree(q);
>>>   	mutex_unlock(&matrix_dev->lock);
> Shouldn't we first remove the queue from the APCB and then
> reset? Sorry, I missed this one yesterday.

Yes, that's probably the order in which it should be done.
I'll change it.

>
> Regards,
> Halil



* Re: [PATCH v13 09/15] s390/vfio-ap: allow hot plug/unplug of AP resources using mdev device
  2021-01-12 17:55     ` Halil Pasic
  2021-02-01 14:41       ` Tony Krowiak
@ 2021-02-03 23:13       ` Tony Krowiak
  2021-02-04  0:21         ` Halil Pasic
  1 sibling, 1 reply; 48+ messages in thread
From: Tony Krowiak @ 2021-02-03 23:13 UTC (permalink / raw)
  To: Halil Pasic
  Cc: linux-s390, linux-kernel, kvm, freude, borntraeger, cohuck,
	mjrosato, alex.williamson, kwankhede, fiuczy, frankja, david,
	hca, gor



On 1/12/21 12:55 PM, Halil Pasic wrote:
> On Tue, 12 Jan 2021 02:12:51 +0100
> Halil Pasic <pasic@linux.ibm.com> wrote:
>
>>> @@ -1347,8 +1437,11 @@ void vfio_ap_mdev_remove_queue(struct ap_device *apdev)
>>>   	apqi = AP_QID_QUEUE(q->apqn);
>>>   	vfio_ap_mdev_reset_queue(apid, apqi, 1);
>>>   
>>> -	if (q->matrix_mdev)
>>> +	if (q->matrix_mdev) {
>>> +		matrix_mdev = q->matrix_mdev;
>>>   		vfio_ap_mdev_unlink_queue(q);
>>> +		vfio_ap_mdev_refresh_apcb(matrix_mdev);
>>> +	}
>>>   
>>>   	kfree(q);
>>>   	mutex_unlock(&matrix_dev->lock);
> Shouldn't we first remove the queue from the APCB and then
> reset? Sorry, I missed this one yesterday.

I agreed to move the reset; however, if the remove callback is
invoked due to a manual unbind of the queue and the queue is
in use by a guest, the cleanup of the IRQ resources after the
reset of the queue will not happen because the link from the
queue to the matrix mdev was removed. Consequently, I'm going
to have to change patch 05/15 to split the vfio_ap_mdev_unlink_queue()
function into two functions: one to remove the link from the matrix mdev
to the queue, and one to remove the link from the queue to the matrix
mdev. Only the first will be used for the remove callback, which should
be fine since the queue object is freed at the end of the remove
function anyway.
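
A rough userspace sketch of the split described above, to show why the
order matters. This is not the planned patch: the sketch_* names are
hypothetical, a fixed array stands in for however the mdev actually tracks
its queues, and sketch_reset_queue() only reports whether IRQ cleanup could
still reach the mdev through the queue.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_MAX_QUEUES 8

struct sketch_mdev;

struct sketch_queue {
	int apqn;
	struct sketch_mdev *mdev;	/* link: queue -> matrix mdev */
};

struct sketch_mdev {
	/* link: matrix mdev -> queue */
	struct sketch_queue *queues[SKETCH_MAX_QUEUES];
};

/* First half: drop only the mdev -> queue link; q->mdev stays valid. */
void sketch_unlink_queue_from_mdev(struct sketch_queue *q)
{
	assert(q);
	if (!q->mdev)
		return;
	for (size_t i = 0; i < SKETCH_MAX_QUEUES; i++) {
		if (q->mdev->queues[i] == q)
			q->mdev->queues[i] = NULL;
	}
}

/* Second half: drop only the queue -> mdev link. */
void sketch_unlink_mdev_from_queue(struct sketch_queue *q)
{
	assert(q);
	q->mdev = NULL;
}

/*
 * Stand-in for reset plus IRQ cleanup: returns 1 if the cleanup can
 * still find the matrix mdev via the queue, 0 if that link is gone.
 */
int sketch_reset_queue(const struct sketch_queue *q)
{
	assert(q);
	return q->mdev != NULL;
}
```

In a remove path modelled this way, only sketch_unlink_queue_from_mdev()
would run before the reset, so the reset can still find the mdev; the
queue -> mdev link simply disappears when the queue object is freed.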

>
> Regards,
> Halil



* Re: [PATCH v13 09/15] s390/vfio-ap: allow hot plug/unplug of AP resources using mdev device
  2021-02-03 23:13       ` Tony Krowiak
@ 2021-02-04  0:21         ` Halil Pasic
  0 siblings, 0 replies; 48+ messages in thread
From: Halil Pasic @ 2021-02-04  0:21 UTC (permalink / raw)
  To: Tony Krowiak
  Cc: linux-s390, linux-kernel, kvm, freude, borntraeger, cohuck,
	mjrosato, alex.williamson, kwankhede, fiuczy, frankja, david,
	hca, gor

On Wed, 3 Feb 2021 18:13:09 -0500
Tony Krowiak <akrowiak@linux.ibm.com> wrote:

> On 1/12/21 12:55 PM, Halil Pasic wrote:
> > On Tue, 12 Jan 2021 02:12:51 +0100
> > Halil Pasic <pasic@linux.ibm.com> wrote:
> >  
> >>> @@ -1347,8 +1437,11 @@ void vfio_ap_mdev_remove_queue(struct ap_device *apdev)
> >>>   	apqi = AP_QID_QUEUE(q->apqn);
> >>>   	vfio_ap_mdev_reset_queue(apid, apqi, 1);
> >>>   
> >>> -	if (q->matrix_mdev)
> >>> +	if (q->matrix_mdev) {
> >>> +		matrix_mdev = q->matrix_mdev;
> >>>   		vfio_ap_mdev_unlink_queue(q);
> >>> +		vfio_ap_mdev_refresh_apcb(matrix_mdev);
> >>> +	}
> >>>   
> >>>   	kfree(q);
> >>>   	mutex_unlock(&matrix_dev->lock);  
> > Shouldn't we first remove the queue from the APCB and then
> > reset? Sorry, I missed this one yesterday.  
> 
> I agreed to move the reset, however if the remove callback is
> invoked due to a manual unbind of the queue and the queue is
> in use by a guest, the cleanup of the IRQ resources after the
> reset of the queue will not happen because the link from the
> queue to the matrix mdev was removed. Consequently, I'm going
> to have to change the patch 05/15 to split the vfio_ap_mdev_unlink_queue()
> function into two functions: one to remove the link from the matrix mdev to
> the queue; and, one to remove the link from the queue to the matrix
> mdev. 

Does that mean we should reset before the unlink (or before the second
part of it after the split up)?

I mean have a look at unassign_adapter_store() with all patches
of this series applied. It does an unlink but doesn't do any reset,
or clean up IRQ resources. And after the unlink we can't clean up
the IRQ resources properly.

But before all this we should resolve this circular lock dependency
problem in a satisfactory way. I'm quite worried about how it is going
to mesh with this series and dynamic ap pass-through.

Regards,
Halil

>Only the first will be used for the remove callback which should
> be fine since the queue object is freed at the end of the remove
> function anyway.
> 
> >
> > Regards,
> > Halil  
> 



* Re: [PATCH v13 06/15] s390/vfio-ap: allow assignment of unavailable AP queues to mdev device
  2021-01-15  1:44       ` Halil Pasic
@ 2021-03-31 14:36         ` Tony Krowiak
  0 siblings, 0 replies; 48+ messages in thread
From: Tony Krowiak @ 2021-03-31 14:36 UTC (permalink / raw)
  To: Halil Pasic
  Cc: linux-s390, linux-kernel, kvm, freude, borntraeger, cohuck,
	mjrosato, alex.williamson, kwankhede, fiuczy, frankja, david,
	hca, gor



On 1/14/21 8:44 PM, Halil Pasic wrote:
> On Thu, 14 Jan 2021 12:54:39 -0500
> Tony Krowiak <akrowiak@linux.ibm.com> wrote:
>
>>>>    /**
>>>>     * vfio_ap_mdev_verify_no_sharing
>>>>     *
>>>> - * Verifies that the APQNs derived from the cross product of the AP adapter IDs
>>>> - * and AP queue indexes comprising the AP matrix are not configured for another
>>>> - * mediated device. AP queue sharing is not allowed.
>>>> + * Verifies that each APQN derived from the Cartesian product of the AP adapter
>>>> + * IDs and AP queue indexes comprising the AP matrix are not configured for
>>>> + * another mediated device. AP queue sharing is not allowed.
>>>>     *
>>>> - * @matrix_mdev: the mediated matrix device
>>>> + * @matrix_mdev: the mediated matrix device to which the APQNs being verified
>>>> + *		 are assigned.
>>>> + * @mdev_apm: mask indicating the APIDs of the APQNs to be verified
>>>> + * @mdev_aqm: mask indicating the APQIs of the APQNs to be verified
>>>>     *
>>>> - * Returns 0 if the APQNs are not shared, otherwise; returns -EADDRINUSE.
>>>> + * Returns 0 if the APQNs are not shared, otherwise; returns -EBUSY.
>>>>     */
>>>> -static int vfio_ap_mdev_verify_no_sharing(struct ap_matrix_mdev *matrix_mdev)
>>>> +static int vfio_ap_mdev_verify_no_sharing(struct ap_matrix_mdev *matrix_mdev,
>>>> +					  unsigned long *mdev_apm,
>>>> +					  unsigned long *mdev_aqm)
>>>>    {
>>>>    	struct ap_matrix_mdev *lstdev;
>>>>    	DECLARE_BITMAP(apm, AP_DEVICES);
>>>> @@ -523,20 +426,31 @@ static int vfio_ap_mdev_verify_no_sharing(struct ap_matrix_mdev *matrix_mdev)
>>>>    		 * We work on full longs, as we can only exclude the leftover
>>>>    		 * bits in non-inverse order. The leftover is all zeros.
>>>>    		 */
>>>> -		if (!bitmap_and(apm, matrix_mdev->matrix.apm,
>>>> -				lstdev->matrix.apm, AP_DEVICES))
>>>> +		if (!bitmap_and(apm, mdev_apm, lstdev->matrix.apm, AP_DEVICES))
>>>>    			continue;
>>>>    
>>>> -		if (!bitmap_and(aqm, matrix_mdev->matrix.aqm,
>>>> -				lstdev->matrix.aqm, AP_DOMAINS))
>>>> +		if (!bitmap_and(aqm, mdev_aqm, lstdev->matrix.aqm, AP_DOMAINS))
>>>>    			continue;
>>>>    
>>>> -		return -EADDRINUSE;
>>>> +		vfio_ap_mdev_log_sharing_err(dev_name(mdev_dev(lstdev->mdev)),
>>>> +					     apm, aqm);
>>>> +
>>>> +		return -EBUSY;
>>> Why do we change -EADDRINUSE to -EBUSY? This gets bubbled up to
>>> userspace, doesn't it? So a tool that detects the "another mdev has it"
>>> condition by checking for -EADDRINUSE would be confused...
>> Back in v8 of the series, Christian suggested the occurrences
>> of -EADDRINUSE should be replaced by the more appropriate
>> -EBUSY (Message ID <d7954c15-b14f-d6e5-0193-aadca61883a8@de.ibm.com>),
>> so I changed it here. It does get bubbled up to userspace, so you make a
>> valid point. I will change it back. I will, however, set the value
>> returned from the __verify_card_reservations() function in ap_bus.c to
>> -EBUSY as suggested by Christian.
> As long as the error code for an ephemeral failure due to can't take a
> lock right now, and the error code for a failure due to a sharing
> conflict are (which most likely requires admin action to be resolved)
> I'm fine.
>
> Choosing EBUSY for sharing conflict, and something else for can't take
> lock for the bus attributes, while choosing EADDRINUSE for sharing
> conflict, and EBUSY for can't take lock in the case of the mdev
> attributes (assign_*; unassign_*) sounds confusing to me, but is still
> better than collating the two conditions. Maybe we can choose EAGAIN
> or EWOULDBLOCK for the can't take the lock right now. I don't know.

I was in the process of creating the change log for v14 of
this patch series and realized I never addressed this.
I think EAGAIN would be a better return code for the
mutex_trylock failures in the mdev assign/unassign
operations.
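
Roughly what I have in mind, as a userspace sketch only. A C11 atomic_flag
stands in for the mutex, all sketch_* names are made up for illustration,
and the -EINVAL range check is an assumption of the sketch rather than what
the driver returns:

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Stand-in for the driver lock; set means "held". */
static atomic_flag sketch_lock = ATOMIC_FLAG_INIT;

/* Like a trylock: returns true if the lock was acquired. */
bool sketch_trylock(void)
{
	return !atomic_flag_test_and_set(&sketch_lock);
}

void sketch_unlock(void)
{
	atomic_flag_clear(&sketch_lock);
}

/*
 * Sketch of an assign operation: set bit @apid in @apm. If the lock
 * cannot be taken right now, fail with -EAGAIN and leave @apm untouched
 * so the caller can simply retry.
 */
int sketch_assign_store(unsigned long *apm, unsigned int apid)
{
	assert(apm);

	/* Sketch-only bound: one unsigned long of adapter bits. */
	if (apid >= sizeof(*apm) * CHAR_BIT)
		return -EINVAL;

	if (!sketch_trylock())
		return -EAGAIN;

	*apm |= 1UL << apid;
	sketch_unlock();

	return 0;
}
```

The point is only that the contended case is reported with a code that
says "try again" and that nothing is modified before the lock is held.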

>
> I'm open to suggestions. And if Christian wants to change this for
> the already released interfaces, I will have to live with that. But it
> has to be a conscious decision at least.
>
> What I consider tricky about EBUSY, is that according to my intuition,
> in pseudocode, object.operation(argument) returns -EBUSY probably tells
> me that object is busy (i.e. is in the middle of something incompatible
> with performing operation). In our case, it is not the object that is
> busy, but the resource denoted by the argument.
>
> Regards,
> Halil




Thread overview: 48+ messages
2020-12-23  1:15 [PATCH v13 00/15] s390/vfio-ap: dynamic configuration support Tony Krowiak
2020-12-23  1:15 ` [PATCH v13 01/15] s390/vfio-ap: clean up vfio_ap resources when KVM pointer invalidated Tony Krowiak
2020-12-23  1:15 ` [PATCH v13 02/15] s390/vfio-ap: No need to disable IRQ after queue reset Tony Krowiak
2021-01-11 16:32   ` Halil Pasic
2021-01-13 17:06     ` Tony Krowiak
2021-01-13 21:21       ` Halil Pasic
2021-01-14  0:46         ` Tony Krowiak
2021-01-14  3:13           ` Halil Pasic
2020-12-23  1:15 ` [PATCH v13 03/15] s390/vfio-ap: move probe and remove callbacks to vfio_ap_ops.c Tony Krowiak
2020-12-23  1:15 ` [PATCH v13 04/15] s390/vfio-ap: use new AP bus interface to search for queue devices Tony Krowiak
2020-12-23  1:15 ` [PATCH v13 05/15] s390/vfio-ap: manage link between queue struct and matrix mdev Tony Krowiak
2021-01-11 19:17   ` Halil Pasic
2021-01-13 21:41     ` Tony Krowiak
2021-01-14  2:50       ` Halil Pasic
2021-01-14 21:10         ` Tony Krowiak
2020-12-23  1:15 ` [PATCH v13 06/15] s390/vfio-ap: allow assignment of unavailable AP queues to mdev device Tony Krowiak
2021-01-11 20:40   ` Halil Pasic
2021-01-14 17:54     ` Tony Krowiak
2021-01-15  1:08       ` Halil Pasic
2021-01-15  1:44       ` Halil Pasic
2021-03-31 14:36         ` Tony Krowiak
2020-12-23  1:15 ` [PATCH v13 07/15] s390/vfio-ap: introduce shadow APCB Tony Krowiak
2021-01-11 22:50   ` Halil Pasic
2021-01-14 21:35     ` Tony Krowiak
2020-12-23  1:15 ` [PATCH v13 08/15] s390/vfio-ap: sysfs attribute to display the guest's matrix Tony Krowiak
2021-01-11 22:58   ` Halil Pasic
2021-01-28 21:29     ` Tony Krowiak
2020-12-23  1:16 ` [PATCH v13 09/15] s390/vfio-ap: allow hot plug/unplug of AP resources using mdev device Tony Krowiak
2021-01-12  1:12   ` Halil Pasic
2021-01-12 17:55     ` Halil Pasic
2021-02-01 14:41       ` Tony Krowiak
2021-02-03 23:13       ` Tony Krowiak
2021-02-04  0:21         ` Halil Pasic
2020-12-23  1:16 ` [PATCH v13 10/15] s390/zcrypt: driver callback to indicate resource in use Tony Krowiak
2021-01-12 16:50   ` Halil Pasic
2020-12-23  1:16 ` [PATCH v13 11/15] s390/vfio-ap: implement in-use callback for vfio_ap driver Tony Krowiak
2021-01-12  1:20   ` Halil Pasic
2021-01-12 14:14     ` Matthew Rosato
2021-01-12 16:49       ` Halil Pasic
2020-12-23  1:16 ` [PATCH v13 12/15] s390/zcrypt: Notify driver on config changed and scan complete callbacks Tony Krowiak
2021-01-12 16:58   ` Halil Pasic
2020-12-23  1:16 ` [PATCH v13 13/15] s390/vfio-ap: handle host AP config change notification Tony Krowiak
2021-01-12 18:39   ` Halil Pasic
2020-12-23  1:16 ` [PATCH v13 14/15] s390/vfio-ap: handle AP bus scan completed notification Tony Krowiak
2021-01-12 18:44   ` Halil Pasic
2020-12-23  1:16 ` [PATCH v13 15/15] s390/vfio-ap: update docs to include dynamic config support Tony Krowiak
2021-01-06 15:16 ` [PATCH v13 00/15] s390/vfio-ap: dynamic configuration support Tony Krowiak
2021-01-07 14:41   ` Halil Pasic
