KVM Archive on lore.kernel.org
* [PATCH v10 00/16]  s390/vfio-ap: dynamic configuration support
@ 2020-08-21 19:56 Tony Krowiak
  2020-08-21 19:56 ` [PATCH v10 01/16] s390/vfio-ap: add version vfio_ap module Tony Krowiak
                   ` (15 more replies)
  0 siblings, 16 replies; 79+ messages in thread
From: Tony Krowiak @ 2020-08-21 19:56 UTC (permalink / raw)
  To: linux-s390, linux-kernel, kvm
  Cc: freude, borntraeger, cohuck, mjrosato, pasic, alex.williamson,
	kwankhede, fiuczy, frankja, david, imbrenda, hca, gor,
	Tony Krowiak

The current design for AP pass-through does not support making dynamic
changes to the AP matrix of a running guest, resulting in a few
deficiencies that this patch series is intended to mitigate:

1. Adapters, domains and control domains cannot be added to or removed
   from a running guest. In order to modify a guest's AP configuration,
   the guest must be terminated; only then can AP resources be assigned
   to or unassigned from the guest's matrix mdev. The new AP 
   configuration becomes available to the guest when it is subsequently
   restarted.

2. The AP bus's /sys/bus/ap/apmask and /sys/bus/ap/aqmask interfaces can
   be modified by a root user without any restrictions. A change to
   either mask can result in AP queue devices being unbound from the
   vfio_ap device driver and bound to a zcrypt device driver even if a
   guest is using the queues, thus giving the host access to the guest's
   private crypto data and vice versa.

3. The APQNs derived from the Cartesian product of the APIDs of the
   adapters and APQIs of the domains assigned to a matrix mdev must
   reference an AP queue device bound to the vfio_ap device driver. The
   AP architecture allows assignment of AP resources that are not
   available to the system, so this artificial restriction is not 
   compliant with the architecture.

4. The AP configuration profile can be dynamically changed for the Linux
   host after a KVM guest is started. For example, a new domain can be
   dynamically added to the configuration profile via the SE or an HMC
   connected to a DPM-enabled LPAR. Likewise, AP adapters can be
   dynamically configured (online state) and deconfigured (standby state)
   using the SE, an SCLP command or an HMC connected to a DPM-enabled
   LPAR. This can result in inadvertent sharing of AP queues between the
   guest and host.

5. A root user can manually unbind an AP queue device representing a 
   queue in use by a KVM guest via the vfio_ap device driver's sysfs 
   unbind attribute. In this case, the guest will be using a queue that
   is not bound to the driver which violates the device model.
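
To make the "Cartesian product" in item 3 concrete, here is a minimal
user-space sketch (not code from this series). It assumes an APQN packs
the APID into the high byte and the APQI into the low byte, which is how
I understand the AP bus's AP_MKQID() macro; the names mk_apqn() and
matrix_apqns() are hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed APQN layout: APID in the high byte, APQI in the low byte. */
static uint16_t mk_apqn(uint8_t apid, uint8_t apqi)
{
	return (uint16_t)(((unsigned int)apid << 8) | apqi);
}

/*
 * Write every APQN in the Cartesian product apids x apqis into out and
 * return how many were written. out must have room for
 * napids * napqis entries.
 */
static size_t matrix_apqns(const uint8_t *apids, size_t napids,
			   const uint8_t *apqis, size_t napqis,
			   uint16_t *out)
{
	size_t i, j, n = 0;

	assert(apids != NULL && apqis != NULL && out != NULL);

	for (i = 0; i < napids; i++)
		for (j = 0; j < napqis; j++)
			out[n++] = mk_apqn(apids[i], apqis[j]);

	return n;
}
```

With adapters 0x05 and 0x06 and domains 0x04 and 0x47 assigned, the
matrix covers four APQNs (05.0004, 05.0047, 06.0004 and 06.0047), each
of which the current design requires to be bound to vfio_ap.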

This patch series introduces the following changes to the current design
to alleviate the shortcomings described above as well as to implement
more of the AP architecture:

1. A root user will be prevented from making changes to the AP bus's
   /sys/bus/ap/apmask or /sys/bus/ap/aqmask if the change would transfer
   ownership of an APQN from the vfio_ap device driver to a zcrypt driver
   while the APQN is assigned to a matrix mdev.

2. Allow a root user to hot plug/unplug AP adapters, domains and control
   domains using the matrix mdev's assign/unassign attributes.

3. Allow assignment of an AP adapter or domain to a matrix mdev even if
   it results in assignment of an APQN that does not reference an AP
   queue device bound to the vfio_ap device driver, as long as the APQN
   is not reserved for use by the default zcrypt drivers (also known as
   over-provisioning of AP resources). Allowing over-provisioning of AP
   resources better models the architecture which does not preclude
   assigning AP resources that are not yet available in the system. Such
   APQNs, however, will not be assigned to the guest using the matrix
   mdev; only APQNs referencing AP queue devices bound to the vfio_ap
   device driver will actually get assigned to the guest.

4. Handle dynamic changes to the AP device model.

1. Rationale for changes to AP bus's apmask/aqmask interfaces:
----------------------------------------------------------
Due to the extremely sensitive nature of cryptographic data, it is
imperative that great care be taken to ensure that such data is secured.
Preventing a root user, whether acting inadvertently or maliciously, from
configuring these masks such that a queue is shared between the host and
a guest is not only possible, it is advisable. It was suggested that this
scenario is better handled in user space with management software, but
that does not preclude a malicious administrator from using the sysfs
interfaces to gain access to a guest's crypto data. It was also suggested
that this
scenario could be avoided by taking access to the adapter away from the
guest and zeroing out the queues prior to the vfio_ap driver releasing the
device; however, stealing an adapter in use from a guest as a by-product
of an operation is bad and will likely cause problems for the guest
unnecessarily. It was decided that the most effective solution with the
least number of negative side effects is to prevent the situation at the
source.

2. Rationale for hot plug/unplug using matrix mdev sysfs interfaces:
----------------------------------------------------------------
Allowing a user to hot plug/unplug AP resources using the matrix mdev
sysfs interfaces circumvents the need to terminate the guest in order to
modify its AP configuration. Allowing dynamic configuration makes 
reconfiguring a guest's AP matrix much less disruptive.

3. Rationale for allowing over-provisioning of AP resources:
----------------------------------------------------------- 
Allowing assignment of AP resources to a matrix mdev and ultimately to a
guest better models the AP architecture. The architecture does not
preclude assignment of unavailable AP resources. If a queue subsequently
becomes available while a guest is using the matrix mdev to which its
APQN is assigned, the guest will be given access to it. If an APQN
is dynamically unassigned from the underlying host system, it will
automatically become unavailable to the guest.

Change log v9-v10:
-----------------
* Updated the documentation in vfio-ap.rst to include information about the
  AP dynamic configuration support

Change log v8-v9:
----------------
* Fixed errors flagged by the kernel test robot

* Fixed issue with guest losing queues when a new queue is probed due to
  manual bind operation.

Change log v7-v8:
----------------
* Now logging a message when an attempt to reserve APQNs for the zcrypt
  drivers will result in taking a queue away from a KVM guest to provide
  the sysadmin a way to ascertain why the sysfs operation failed.

* Created locked and unlocked versions of the ap_parse_mask_str() function.

* Now using new interface provided by an AP bus patch -
  s390/ap: introduce new ap function ap_get_qdev() - to retrieve
  struct ap_queue representing an AP queue device. This patch is not a
  part of this series but is a prerequisite for this series. 

Change log v6-v7:
----------------
* Added callbacks to AP bus:
  - on_config_changed: Notifies implementing drivers that
    the AP configuration has changed since last AP device scan.
  - on_scan_complete: Notifies implementing drivers that the device scan
    has completed.
  - implemented on_config_changed and on_scan_complete callbacks for
    vfio_ap device driver.
  - updated vfio_ap device driver's probe and remove callbacks to handle
    dynamic changes to the AP device model. 
* Added code to filter APQNs when assigning AP resources to a KVM guest's
  CRYCB

Change log v5-v6:
----------------
* Fixed a bug in ap_bus.c introduced with patch 2/7 of the v5
  series. Harald Freudenberger pointed out that the mutex lock
  for ap_perms_mutex in the apmask_store and aqmask_store functions
  was not being released.

* Removed patch 6/7 which added logging to the vfio_ap driver
  to expedite acceptance of this series. The logging will be introduced
  with a separate patch series to allow more time to explore options
  such as DBF logging vs. tracepoints.

* Added 3 patches related to ensuring that APQNs that do not reference
  AP queue devices bound to the vfio_ap device driver are not assigned
  to the guest CRYCB:

  Patch 4: Filter CRYCB bits for unavailable queue devices
  Patch 5: sysfs attribute to display the guest CRYCB
  Patch 6: update guest CRYCB in vfio_ap probe and remove callbacks

* Added a patch (Patch 9) to version the vfio_ap module.

* Reshuffled patches to allow the in_use callback implementation to
  invoke the vfio_ap_mdev_verify_no_sharing() function introduced in
  patch 2. 

Change log v4-v5:
----------------
* Added a patch to provide kernel s390dbf debug logs for VFIO AP

Change log v3->v4:
-----------------
* Restored patches preventing root user from changing ownership of
  APQNs from zcrypt drivers to the vfio_ap driver if the APQN is
  assigned to an mdev.

* No longer enforcing requirement restricting guest access to
  queues represented by a queue device bound to the vfio_ap
  device driver.

* Removed shadow CRYCB and now directly updating the guest CRYCB
  from the matrix mdev's matrix.

* Rebased the patch series on top of 'vfio: ap: AP Queue Interrupt
  Control' patches.

* Disabled bind/unbind sysfs interfaces for vfio_ap driver

Change log v2->v3:
-----------------
* Allow guest access to an AP queue only if the queue is bound to
  the vfio_ap device driver.

* Removed the patch to test CRYCB masks before taking the vCPUs
  out of SIE. Now checking the shadow CRYCB in the vfio_ap driver.

Change log v1->v2:
-----------------
* Removed patches preventing root user from unbinding AP queues from 
  the vfio_ap device driver
* Introduced a shadow CRYCB in the vfio_ap driver to manage dynamic 
  changes to the AP guest configuration due to root user interventions
  or hardware anomalies.

Tony Krowiak (16):
  s390/vfio-ap: add version vfio_ap module
  s390/vfio-ap: use new AP bus interface to search for queue devices
  s390/vfio-ap: manage link between queue struct and matrix mdev
  s390/zcrypt: driver callback to indicate resource in use
  s390/vfio-ap: implement in-use callback for vfio_ap driver
  s390/vfio-ap: introduce shadow APCB
  s390/vfio-ap: sysfs attribute to display the guest's matrix
  s390/vfio-ap: filter matrix for unavailable queue devices
  s390/vfio-ap: allow assignment of unavailable AP queues to mdev device
  s390/vfio-ap: allow configuration of matrix mdev in use by a KVM guest
  s390/vfio-ap: allow hot plug/unplug of AP resources using mdev device
  s390/zcrypt: Notify driver on config changed and scan complete
    callbacks
  s390/vfio-ap: handle host AP config change notification
  s390/vfio-ap: handle AP bus scan completed notification
  s390/vfio-ap: handle probe/remove not due to host AP config changes
  s390/vfio-ap: update docs to include dynamic config support

 Documentation/s390/vfio-ap.rst        |  362 ++++++--
 drivers/s390/crypto/ap_bus.c          |  233 ++++-
 drivers/s390/crypto/ap_bus.h          |   16 +
 drivers/s390/crypto/vfio_ap_drv.c     |   36 +-
 drivers/s390/crypto/vfio_ap_ops.c     | 1216 ++++++++++++++++++++-----
 drivers/s390/crypto/vfio_ap_private.h |   23 +-
 6 files changed, 1533 insertions(+), 353 deletions(-)

-- 
2.21.1



* [PATCH v10 01/16] s390/vfio-ap: add version vfio_ap module
  2020-08-21 19:56 [PATCH v10 00/16] s390/vfio-ap: dynamic configuration support Tony Krowiak
@ 2020-08-21 19:56 ` Tony Krowiak
  2020-08-25 10:04   ` Cornelia Huck
  2020-08-21 19:56 ` [PATCH v10 02/16] s390/vfio-ap: use new AP bus interface to search for queue devices Tony Krowiak
                   ` (14 subsequent siblings)
  15 siblings, 1 reply; 79+ messages in thread
From: Tony Krowiak @ 2020-08-21 19:56 UTC (permalink / raw)
  To: linux-s390, linux-kernel, kvm
  Cc: freude, borntraeger, cohuck, mjrosato, pasic, alex.williamson,
	kwankhede, fiuczy, frankja, david, imbrenda, hca, gor,
	Tony Krowiak

Let's set a version for the vfio_ap module so that automated regression
tests can determine whether dynamic configuration tests can be run or
not.
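
As an illustration of how a test harness might use the version, here is
a small user-space sketch. It assumes that any version at or above
1.2.0 indicates dynamic configuration support and that the harness has
already read the version string (for example from the module's sysfs
version attribute); the function name is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/*
 * Return true if a "major.minor.patch" version string is at least
 * 1.2.0, the version this patch assigns to the vfio_ap module.
 * Strings that do not parse are treated as "not supported".
 */
static bool vfio_ap_supports_dynamic_config(const char *version)
{
	unsigned int major, minor, patch;

	assert(version != NULL);

	if (sscanf(version, "%u.%u.%u", &major, &minor, &patch) != 3)
		return false;

	if (major != 1)
		return major > 1;

	/* major == 1: any patch level of 1.2 or a later minor qualifies */
	return minor >= 2;
}
```

A trailing newline, as sysfs attributes usually carry, does not affect
the parse.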

Signed-off-by: Tony Krowiak <akrowiak@linux.ibm.com>
---
 drivers/s390/crypto/vfio_ap_drv.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/s390/crypto/vfio_ap_drv.c b/drivers/s390/crypto/vfio_ap_drv.c
index be2520cc010b..f4ceb380dd61 100644
--- a/drivers/s390/crypto/vfio_ap_drv.c
+++ b/drivers/s390/crypto/vfio_ap_drv.c
@@ -17,10 +17,12 @@
 
 #define VFIO_AP_ROOT_NAME "vfio_ap"
 #define VFIO_AP_DEV_NAME "matrix"
+#define VFIO_AP_MODULE_VERSION "1.2.0"
 
 MODULE_AUTHOR("IBM Corporation");
 MODULE_DESCRIPTION("VFIO AP device driver, Copyright IBM Corp. 2018");
 MODULE_LICENSE("GPL v2");
+MODULE_VERSION(VFIO_AP_MODULE_VERSION);
 
 static struct ap_driver vfio_ap_drv;
 
-- 
2.21.1



* [PATCH v10 02/16] s390/vfio-ap: use new AP bus interface to search for queue devices
  2020-08-21 19:56 [PATCH v10 00/16] s390/vfio-ap: dynamic configuration support Tony Krowiak
  2020-08-21 19:56 ` [PATCH v10 01/16] s390/vfio-ap: add version vfio_ap module Tony Krowiak
@ 2020-08-21 19:56 ` Tony Krowiak
  2020-08-25 10:13   ` Cornelia Huck
                     ` (2 more replies)
  2020-08-21 19:56 ` [PATCH v10 03/16] s390/vfio-ap: manage link between queue struct and matrix mdev Tony Krowiak
                   ` (13 subsequent siblings)
  15 siblings, 3 replies; 79+ messages in thread
From: Tony Krowiak @ 2020-08-21 19:56 UTC (permalink / raw)
  To: linux-s390, linux-kernel, kvm
  Cc: freude, borntraeger, cohuck, mjrosato, pasic, alex.williamson,
	kwankhede, fiuczy, frankja, david, imbrenda, hca, gor,
	Tony Krowiak, kernel test robot

This patch refactors the vfio_ap device driver to use the AP bus's
ap_get_qdev() function to retrieve the vfio_ap_queue struct containing
information about a queue that is bound to the vfio_ap device driver.
The bus's ap_get_qdev() function retrieves the queue device from a
hashtable keyed by APQN. This is much more efficient than looping over
the list of devices attached to the AP bus by several orders of
magnitude.
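
For readers unfamiliar with the idea, the following user-space sketch
shows a lookup in a chained hash table keyed by APQN. It is not the AP
bus implementation; the bucket count, the hash function and all names
(struct qtable, qtable_add(), qtable_find()) are assumptions made for
illustration only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define QTABLE_BUCKETS 256

struct queue {
	uint16_t apqn;
	struct queue *next;	/* next queue in the same bucket */
};

struct qtable {
	struct queue *bucket[QTABLE_BUCKETS];
};

/* Fold the APID byte into the APQI byte to pick a bucket. */
static size_t qtable_hash(uint16_t apqn)
{
	return (size_t)((apqn ^ (apqn >> 8)) % QTABLE_BUCKETS);
}

static void qtable_add(struct qtable *t, struct queue *q)
{
	size_t h;

	assert(t != NULL && q != NULL);
	h = qtable_hash(q->apqn);
	q->next = t->bucket[h];
	t->bucket[h] = q;
}

/* Only the queues sharing one bucket are visited, not every device. */
static struct queue *qtable_find(const struct qtable *t, uint16_t apqn)
{
	struct queue *q;

	assert(t != NULL);
	for (q = t->bucket[qtable_hash(apqn)]; q; q = q->next)
		if (q->apqn == apqn)
			return q;

	return NULL;
}
```

The lookup cost depends on the length of one bucket chain rather than
on the total number of devices on the bus.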

Signed-off-by: Tony Krowiak <akrowiak@linux.ibm.com>
Reported-by: kernel test robot <lkp@intel.com>
---
 drivers/s390/crypto/vfio_ap_drv.c     | 27 ++-------
 drivers/s390/crypto/vfio_ap_ops.c     | 86 +++++++++++++++------------
 drivers/s390/crypto/vfio_ap_private.h |  8 ++-
 3 files changed, 59 insertions(+), 62 deletions(-)

diff --git a/drivers/s390/crypto/vfio_ap_drv.c b/drivers/s390/crypto/vfio_ap_drv.c
index f4ceb380dd61..24cdef60039a 100644
--- a/drivers/s390/crypto/vfio_ap_drv.c
+++ b/drivers/s390/crypto/vfio_ap_drv.c
@@ -53,15 +53,9 @@ MODULE_DEVICE_TABLE(vfio_ap, ap_queue_ids);
  */
 static int vfio_ap_queue_dev_probe(struct ap_device *apdev)
 {
-	struct vfio_ap_queue *q;
-
-	q = kzalloc(sizeof(*q), GFP_KERNEL);
-	if (!q)
-		return -ENOMEM;
-	dev_set_drvdata(&apdev->device, q);
-	q->apqn = to_ap_queue(&apdev->device)->qid;
-	q->saved_isc = VFIO_AP_ISC_INVALID;
-	return 0;
+	struct ap_queue *queue = to_ap_queue(&apdev->device);
+
+	return vfio_ap_mdev_probe_queue(queue);
 }
 
 /**
@@ -72,18 +66,9 @@ static int vfio_ap_queue_dev_probe(struct ap_device *apdev)
  */
 static void vfio_ap_queue_dev_remove(struct ap_device *apdev)
 {
-	struct vfio_ap_queue *q;
-	int apid, apqi;
-
-	mutex_lock(&matrix_dev->lock);
-	q = dev_get_drvdata(&apdev->device);
-	dev_set_drvdata(&apdev->device, NULL);
-	apid = AP_QID_CARD(q->apqn);
-	apqi = AP_QID_QUEUE(q->apqn);
-	vfio_ap_mdev_reset_queue(apid, apqi, 1);
-	vfio_ap_irq_disable(q);
-	kfree(q);
-	mutex_unlock(&matrix_dev->lock);
+	struct ap_queue *queue = to_ap_queue(&apdev->device);
+
+	vfio_ap_mdev_remove_queue(queue);
 }
 
 static void vfio_ap_matrix_dev_release(struct device *dev)
diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c
index e0bde8518745..ad3925f04f61 100644
--- a/drivers/s390/crypto/vfio_ap_ops.c
+++ b/drivers/s390/crypto/vfio_ap_ops.c
@@ -26,43 +26,26 @@
 
 static int vfio_ap_mdev_reset_queues(struct mdev_device *mdev);
 
-static int match_apqn(struct device *dev, const void *data)
-{
-	struct vfio_ap_queue *q = dev_get_drvdata(dev);
-
-	return (q->apqn == *(int *)(data)) ? 1 : 0;
-}
-
 /**
- * vfio_ap_get_queue: Retrieve a queue with a specific APQN from a list
- * @matrix_mdev: the associated mediated matrix
+ * vfio_ap_get_queue: Retrieve a queue with a specific APQN.
  * @apqn: The queue APQN
  *
- * Retrieve a queue with a specific APQN from the list of the
- * devices of the vfio_ap_drv.
- * Verify that the APID and the APQI are set in the matrix.
+ * Retrieve a queue with a specific APQN from the AP queue devices attached to
+ * the AP bus.
  *
- * Returns the pointer to the associated vfio_ap_queue
+ * Returns the pointer to the vfio_ap_queue with the specified APQN, or NULL.
  */
-static struct vfio_ap_queue *vfio_ap_get_queue(
-					struct ap_matrix_mdev *matrix_mdev,
-					int apqn)
+static struct vfio_ap_queue *vfio_ap_get_queue(unsigned long apqn)
 {
+	struct ap_queue *queue;
 	struct vfio_ap_queue *q;
-	struct device *dev;
 
-	if (!test_bit_inv(AP_QID_CARD(apqn), matrix_mdev->matrix.apm))
-		return NULL;
-	if (!test_bit_inv(AP_QID_QUEUE(apqn), matrix_mdev->matrix.aqm))
+	queue = ap_get_qdev(apqn);
+	if (!queue)
 		return NULL;
 
-	dev = driver_find_device(&matrix_dev->vfio_ap_drv->driver, NULL,
-				 &apqn, match_apqn);
-	if (!dev)
-		return NULL;
-	q = dev_get_drvdata(dev);
-	q->matrix_mdev = matrix_mdev;
-	put_device(dev);
+	q = dev_get_drvdata(&queue->ap_dev.device);
+	put_device(&queue->ap_dev.device);
 
 	return q;
 }
@@ -144,7 +127,7 @@ static void vfio_ap_free_aqic_resources(struct vfio_ap_queue *q)
  * Returns if ap_aqic function failed with invalid, deconfigured or
  * checkstopped AP.
  */
-struct ap_queue_status vfio_ap_irq_disable(struct vfio_ap_queue *q)
+static struct ap_queue_status vfio_ap_irq_disable(struct vfio_ap_queue *q)
 {
 	struct ap_qirq_ctrl aqic_gisa = {};
 	struct ap_queue_status status;
@@ -293,10 +276,11 @@ static int handle_pqap(struct kvm_vcpu *vcpu)
 	matrix_mdev = container_of(vcpu->kvm->arch.crypto.pqap_hook,
 				   struct ap_matrix_mdev, pqap_hook);
 
-	q = vfio_ap_get_queue(matrix_mdev, apqn);
+	q = vfio_ap_get_queue(apqn);
 	if (!q)
 		goto out_unlock;
 
+	q->matrix_mdev = matrix_mdev;
 	status = vcpu->run->s.regs.gprs[1];
 
 	/* If IR bit(16) is set we enable the interrupt */
@@ -1116,20 +1100,15 @@ static int vfio_ap_mdev_group_notifier(struct notifier_block *nb,
 
 static void vfio_ap_irq_disable_apqn(int apqn)
 {
-	struct device *dev;
 	struct vfio_ap_queue *q;
 
-	dev = driver_find_device(&matrix_dev->vfio_ap_drv->driver, NULL,
-				 &apqn, match_apqn);
-	if (dev) {
-		q = dev_get_drvdata(dev);
+	q = vfio_ap_get_queue(apqn);
+	if (q)
 		vfio_ap_irq_disable(q);
-		put_device(dev);
-	}
 }
 
-int vfio_ap_mdev_reset_queue(unsigned int apid, unsigned int apqi,
-			     unsigned int retry)
+static int vfio_ap_mdev_reset_queue(unsigned int apid, unsigned int apqi,
+				    unsigned int retry)
 {
 	struct ap_queue_status status;
 	int retry2 = 2;
@@ -1302,3 +1281,34 @@ void vfio_ap_mdev_unregister(void)
 {
 	mdev_unregister_device(&matrix_dev->device);
 }
+
+int vfio_ap_mdev_probe_queue(struct ap_queue *queue)
+{
+	struct vfio_ap_queue *q;
+
+	q = kzalloc(sizeof(*q), GFP_KERNEL);
+	if (!q)
+		return -ENOMEM;
+
+	dev_set_drvdata(&queue->ap_dev.device, q);
+	q->apqn = queue->qid;
+	q->saved_isc = VFIO_AP_ISC_INVALID;
+
+	return 0;
+}
+
+void vfio_ap_mdev_remove_queue(struct ap_queue *queue)
+{
+	struct vfio_ap_queue *q;
+	int apid, apqi;
+
+	mutex_lock(&matrix_dev->lock);
+	q = dev_get_drvdata(&queue->ap_dev.device);
+	dev_set_drvdata(&queue->ap_dev.device, NULL);
+	apid = AP_QID_CARD(q->apqn);
+	apqi = AP_QID_QUEUE(q->apqn);
+	vfio_ap_mdev_reset_queue(apid, apqi, 1);
+	vfio_ap_irq_disable(q);
+	kfree(q);
+	mutex_unlock(&matrix_dev->lock);
+}
diff --git a/drivers/s390/crypto/vfio_ap_private.h b/drivers/s390/crypto/vfio_ap_private.h
index f46dde56b464..a2aa05bec718 100644
--- a/drivers/s390/crypto/vfio_ap_private.h
+++ b/drivers/s390/crypto/vfio_ap_private.h
@@ -18,6 +18,7 @@
 #include <linux/delay.h>
 #include <linux/mutex.h>
 #include <linux/kvm_host.h>
+#include <linux/hashtable.h>
 
 #include "ap_bus.h"
 
@@ -90,8 +91,6 @@ struct ap_matrix_mdev {
 
 extern int vfio_ap_mdev_register(void);
 extern void vfio_ap_mdev_unregister(void);
-int vfio_ap_mdev_reset_queue(unsigned int apid, unsigned int apqi,
-			     unsigned int retry);
 
 struct vfio_ap_queue {
 	struct ap_matrix_mdev *matrix_mdev;
@@ -100,5 +99,8 @@ struct vfio_ap_queue {
 #define VFIO_AP_ISC_INVALID 0xff
 	unsigned char saved_isc;
 };
-struct ap_queue_status vfio_ap_irq_disable(struct vfio_ap_queue *q);
+
+int vfio_ap_mdev_probe_queue(struct ap_queue *queue);
+void vfio_ap_mdev_remove_queue(struct ap_queue *queue);
+
 #endif /* _VFIO_AP_PRIVATE_H_ */
-- 
2.21.1



* [PATCH v10 03/16] s390/vfio-ap: manage link between queue struct and matrix mdev
  2020-08-21 19:56 [PATCH v10 00/16] s390/vfio-ap: dynamic configuration support Tony Krowiak
  2020-08-21 19:56 ` [PATCH v10 01/16] s390/vfio-ap: add version vfio_ap module Tony Krowiak
  2020-08-21 19:56 ` [PATCH v10 02/16] s390/vfio-ap: use new AP bus interface to search for queue devices Tony Krowiak
@ 2020-08-21 19:56 ` Tony Krowiak
  2020-08-25 10:25   ` Cornelia Huck
                     ` (2 more replies)
  2020-08-21 19:56 ` [PATCH v10 04/16] s390/zcrypt: driver callback to indicate resource in use Tony Krowiak
                   ` (12 subsequent siblings)
  15 siblings, 3 replies; 79+ messages in thread
From: Tony Krowiak @ 2020-08-21 19:56 UTC (permalink / raw)
  To: linux-s390, linux-kernel, kvm
  Cc: freude, borntraeger, cohuck, mjrosato, pasic, alex.williamson,
	kwankhede, fiuczy, frankja, david, imbrenda, hca, gor,
	Tony Krowiak

Let's create links between each queue device bound to the vfio_ap device
driver and the matrix mdev to which the queue is assigned. The idea is to
facilitate efficient retrieval of the objects representing the queue
devices and matrix mdevs as well as to verify that a queue assigned to
a matrix mdev is bound to the driver.

The links will be created as follows:

   * When the queue device is probed, if its APQN is assigned to a matrix
     mdev, the structures representing the queue device and the matrix mdev
     will be linked.

   * When an adapter or domain is assigned to a matrix mdev, for each new
     APQN assigned that references a queue device bound to the vfio_ap
     device driver, the structures representing the queue device and the
     matrix mdev will be linked.

The links will be removed as follows:

   * When the queue device is removed, if its APQN is assigned to a matrix
     mdev, the structures representing the queue device and the matrix mdev
     will be unlinked.

   * When an adapter or domain is unassigned from a matrix mdev, for each
     APQN unassigned that references a queue device bound to the vfio_ap
     device driver, the structures representing the queue device and the
     matrix mdev will be unlinked.
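
The linking rules above can be modelled in user space as follows. This
is a simplified sketch, not the code in this patch: it uses plain bool
arrays instead of the kernel's inverted bitmaps, keeps only the
back-pointer from queue to mdev (no hashtable), and every name in it is
hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct mdev {
	bool apm[256];		/* apm[apid]: adapter is assigned */
	bool aqm[256];		/* aqm[apqi]: domain is assigned */
};

struct queue {
	uint16_t apqn;		/* assumed layout: APID << 8 | APQI */
	struct mdev *mdev;	/* NULL while the queue is unlinked */
};

static bool mdev_has_apqn(const struct mdev *m, uint16_t apqn)
{
	return m->apm[apqn >> 8] && m->aqm[apqn & 0xff];
}

/* Probe rule: link the queue to the mdev its APQN is assigned to. */
static void queue_link_on_probe(struct queue *q, struct mdev *mdevs,
				size_t nmdevs)
{
	size_t i;

	assert(q != NULL && mdevs != NULL);
	q->mdev = NULL;
	for (i = 0; i < nmdevs; i++) {
		if (mdev_has_apqn(&mdevs[i], q->apqn)) {
			q->mdev = &mdevs[i];
			break;
		}
	}
}

/* Unassign rule: unlink every bound queue of adapter apid from m. */
static void mdev_unassign_adapter(struct mdev *m, uint8_t apid,
				  struct queue *queues, size_t nqueues)
{
	size_t i;

	assert(m != NULL && queues != NULL);
	m->apm[apid] = false;
	for (i = 0; i < nqueues; i++)
		if (queues[i].mdev == m && (queues[i].apqn >> 8) == apid)
			queues[i].mdev = NULL;
}
```

A queue whose APQN is not assigned to any mdev simply stays unlinked
after probe, which is what allows the driver to tell bound-but-unused
queues apart from queues a guest may be using.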

Signed-off-by: Tony Krowiak <akrowiak@linux.ibm.com>
---
 drivers/s390/crypto/vfio_ap_ops.c     | 132 +++++++++++++++++++++++++-
 drivers/s390/crypto/vfio_ap_private.h |   2 +
 2 files changed, 129 insertions(+), 5 deletions(-)

diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c
index ad3925f04f61..2e37ee82e422 100644
--- a/drivers/s390/crypto/vfio_ap_ops.c
+++ b/drivers/s390/crypto/vfio_ap_ops.c
@@ -50,6 +50,19 @@ static struct vfio_ap_queue *vfio_ap_get_queue(unsigned long apqn)
 	return q;
 }
 
+static struct vfio_ap_queue *vfio_ap_get_mdev_queue(struct ap_matrix_mdev *matrix_mdev,
+						    unsigned long apqn)
+{
+	struct vfio_ap_queue *q;
+
+	hash_for_each_possible(matrix_mdev->qtable, q, mdev_qnode, apqn) {
+		if (q && (q->apqn == apqn))
+			return q;
+	}
+
+	return NULL;
+}
+
 /**
  * vfio_ap_wait_for_irqclear
  * @apqn: The AP Queue number
@@ -160,7 +173,6 @@ static struct ap_queue_status vfio_ap_irq_disable(struct vfio_ap_queue *q)
 		  status.response_code);
 end_free:
 	vfio_ap_free_aqic_resources(q);
-	q->matrix_mdev = NULL;
 	return status;
 }
 
@@ -262,7 +274,6 @@ static int handle_pqap(struct kvm_vcpu *vcpu)
 	struct vfio_ap_queue *q;
 	struct ap_queue_status qstatus = {
 			       .response_code = AP_RESPONSE_Q_NOT_AVAIL, };
-	struct ap_matrix_mdev *matrix_mdev;
 
 	/* If we do not use the AIV facility just go to userland */
 	if (!(vcpu->arch.sie_block->eca & ECA_AIV))
@@ -273,14 +284,11 @@ static int handle_pqap(struct kvm_vcpu *vcpu)
 
 	if (!vcpu->kvm->arch.crypto.pqap_hook)
 		goto out_unlock;
-	matrix_mdev = container_of(vcpu->kvm->arch.crypto.pqap_hook,
-				   struct ap_matrix_mdev, pqap_hook);
 
 	q = vfio_ap_get_queue(apqn);
 	if (!q)
 		goto out_unlock;
 
-	q->matrix_mdev = matrix_mdev;
 	status = vcpu->run->s.regs.gprs[1];
 
 	/* If IR bit(16) is set we enable the interrupt */
@@ -320,6 +328,7 @@ static int vfio_ap_mdev_create(struct kobject *kobj, struct mdev_device *mdev)
 
 	matrix_mdev->mdev = mdev;
 	vfio_ap_matrix_init(&matrix_dev->info, &matrix_mdev->matrix);
+	hash_init(matrix_mdev->qtable);
 	mdev_set_drvdata(mdev, matrix_mdev);
 	matrix_mdev->pqap_hook.hook = handle_pqap;
 	matrix_mdev->pqap_hook.owner = THIS_MODULE;
@@ -548,6 +557,87 @@ static int vfio_ap_mdev_verify_no_sharing(struct ap_matrix_mdev *matrix_mdev)
 	return 0;
 }
 
+enum qlink_type {
+	LINK_APID,
+	LINK_APQI,
+	UNLINK_APID,
+	UNLINK_APQI,
+};
+
+static void vfio_ap_mdev_link_queue(struct ap_matrix_mdev *matrix_mdev,
+				    unsigned long apid, unsigned long apqi)
+{
+	struct vfio_ap_queue *q;
+
+	q = vfio_ap_get_queue(AP_MKQID(apid, apqi));
+	if (q) {
+		q->matrix_mdev = matrix_mdev;
+		hash_add(matrix_mdev->qtable,
+			 &q->mdev_qnode, q->apqn);
+	}
+}
+
+static void vfio_ap_mdev_unlink_queue(unsigned long apid, unsigned long apqi)
+{
+	struct vfio_ap_queue *q;
+
+	q = vfio_ap_get_queue(AP_MKQID(apid, apqi));
+	if (q) {
+		q->matrix_mdev = NULL;
+		hash_del(&q->mdev_qnode);
+	}
+}
+
+/**
+ * vfio_ap_mdev_link_queues
+ *
+ * @matrix_mdev: The matrix mdev to link.
+ * @type:	 The type of @qlink_id.
+ * @qlink_id:	 The APID or APQI of the queues to link.
+ *
+ * Sets or clears the links between the queues with the specified @qlink_id
+ * and the @matrix_mdev:
+ *     @type == LINK_APID: Set the links between the @matrix_mdev and the
+ *                         queues with the specified @qlink_id (APID)
+ *     @type == LINK_APQI: Set the links between the @matrix_mdev and the
+ *                         queues with the specified @qlink_id (APQI)
+ *     @type == UNLINK_APID: Clear the links between the @matrix_mdev and the
+ *                           queues with the specified @qlink_id (APID)
+ *     @type == UNLINK_APQI: Clear the links between the @matrix_mdev and the
+ *                           queues with the specified @qlink_id (APQI)
+ */
+static void vfio_ap_mdev_link_queues(struct ap_matrix_mdev *matrix_mdev,
+				     enum qlink_type type,
+				     unsigned long qlink_id)
+{
+	unsigned long id;
+
+	switch (type) {
+	case LINK_APID:
+		for_each_set_bit_inv(id, matrix_mdev->matrix.aqm,
+				     matrix_mdev->matrix.aqm_max + 1)
+			vfio_ap_mdev_link_queue(matrix_mdev, qlink_id, id);
+		break;
+	case UNLINK_APID:
+		for_each_set_bit_inv(id, matrix_mdev->matrix.aqm,
+				     matrix_mdev->matrix.aqm_max + 1)
+			vfio_ap_mdev_unlink_queue(qlink_id, id);
+		break;
+	case LINK_APQI:
+		for_each_set_bit_inv(id, matrix_mdev->matrix.apm,
+				     matrix_mdev->matrix.apm_max + 1)
+			vfio_ap_mdev_link_queue(matrix_mdev, id, qlink_id);
+		break;
+	case UNLINK_APQI:
+		for_each_set_bit_inv(id, matrix_mdev->matrix.apm,
+				     matrix_mdev->matrix.apm_max + 1)
+			vfio_ap_mdev_unlink_queue(id, qlink_id);
+		break;
+	default:
+		WARN_ON_ONCE(1);
+	}
+}
+
 /**
  * assign_adapter_store
  *
@@ -617,6 +707,7 @@ static ssize_t assign_adapter_store(struct device *dev,
 	if (ret)
 		goto share_err;
 
+	vfio_ap_mdev_link_queues(matrix_mdev, LINK_APID, apid);
 	ret = count;
 	goto done;
 
@@ -668,6 +759,7 @@ static ssize_t unassign_adapter_store(struct device *dev,
 
 	mutex_lock(&matrix_dev->lock);
 	clear_bit_inv((unsigned long)apid, matrix_mdev->matrix.apm);
+	vfio_ap_mdev_link_queues(matrix_mdev, UNLINK_APID, apid);
 	mutex_unlock(&matrix_dev->lock);
 
 	return count;
@@ -758,6 +850,7 @@ static ssize_t assign_domain_store(struct device *dev,
 	if (ret)
 		goto share_err;
 
+	vfio_ap_mdev_link_queues(matrix_mdev, LINK_APQI, apqi);
 	ret = count;
 	goto done;
 
@@ -810,6 +903,7 @@ static ssize_t unassign_domain_store(struct device *dev,
 
 	mutex_lock(&matrix_dev->lock);
 	clear_bit_inv((unsigned long)apqi, matrix_mdev->matrix.aqm);
+	vfio_ap_mdev_link_queues(matrix_mdev, UNLINK_APQI, apqi);
 	mutex_unlock(&matrix_dev->lock);
 
 	return count;
@@ -1282,6 +1376,29 @@ void vfio_ap_mdev_unregister(void)
 	mdev_unregister_device(&matrix_dev->device);
 }
 
+/**
+ * vfio_ap_queue_link_mdev
+ *
+ * @q: The queue to link with the matrix mdev.
+ *
+ * Links @q with the matrix mdev to which the queue's APQN is assigned.
+ */
+static void vfio_ap_queue_link_mdev(struct vfio_ap_queue *q)
+{
+	unsigned long apid = AP_QID_CARD(q->apqn);
+	unsigned long apqi = AP_QID_QUEUE(q->apqn);
+	struct ap_matrix_mdev *matrix_mdev;
+
+	list_for_each_entry(matrix_mdev, &matrix_dev->mdev_list, node) {
+		if (test_bit_inv(apid, matrix_mdev->matrix.apm) &&
+		    test_bit_inv(apqi, matrix_mdev->matrix.aqm)) {
+			q->matrix_mdev = matrix_mdev;
+			hash_add(matrix_mdev->qtable, &q->mdev_qnode, q->apqn);
+			break;
+		}
+	}
+}
+
 int vfio_ap_mdev_probe_queue(struct ap_queue *queue)
 {
 	struct vfio_ap_queue *q;
@@ -1290,9 +1407,12 @@ int vfio_ap_mdev_probe_queue(struct ap_queue *queue)
 	if (!q)
 		return -ENOMEM;
 
+	mutex_lock(&matrix_dev->lock);
 	dev_set_drvdata(&queue->ap_dev.device, q);
 	q->apqn = queue->qid;
 	q->saved_isc = VFIO_AP_ISC_INVALID;
+	vfio_ap_queue_link_mdev(q);
+	mutex_unlock(&matrix_dev->lock);
 
 	return 0;
 }
@@ -1309,6 +1429,8 @@ void vfio_ap_mdev_remove_queue(struct ap_queue *queue)
 	apqi = AP_QID_QUEUE(q->apqn);
 	vfio_ap_mdev_reset_queue(apid, apqi, 1);
 	vfio_ap_irq_disable(q);
+	if (q->matrix_mdev)
+		hash_del(&q->mdev_qnode);
 	kfree(q);
 	mutex_unlock(&matrix_dev->lock);
 }
diff --git a/drivers/s390/crypto/vfio_ap_private.h b/drivers/s390/crypto/vfio_ap_private.h
index a2aa05bec718..57da703b549a 100644
--- a/drivers/s390/crypto/vfio_ap_private.h
+++ b/drivers/s390/crypto/vfio_ap_private.h
@@ -87,6 +87,7 @@ struct ap_matrix_mdev {
 	struct kvm *kvm;
 	struct kvm_s390_module_hook pqap_hook;
 	struct mdev_device *mdev;
+	DECLARE_HASHTABLE(qtable, 8);
 };
 
 extern int vfio_ap_mdev_register(void);
@@ -98,6 +99,7 @@ struct vfio_ap_queue {
 	int	apqn;
 #define VFIO_AP_ISC_INVALID 0xff
 	unsigned char saved_isc;
+	struct hlist_node mdev_qnode;
 };
 
 int vfio_ap_mdev_probe_queue(struct ap_queue *queue);
-- 
2.21.1



* [PATCH v10 04/16] s390/zcrypt: driver callback to indicate resource in use
  2020-08-21 19:56 [PATCH v10 00/16] s390/vfio-ap: dynamic configuration support Tony Krowiak
                   ` (2 preceding siblings ...)
  2020-08-21 19:56 ` [PATCH v10 03/16] s390/vfio-ap: manage link between queue struct and matrix mdev Tony Krowiak
@ 2020-08-21 19:56 ` Tony Krowiak
  2020-09-14 15:29   ` Cornelia Huck
  2020-09-25  9:24   ` Halil Pasic
  2020-08-21 19:56 ` [PATCH v10 05/16] s390/vfio-ap: implement in-use callback for vfio_ap driver Tony Krowiak
                   ` (11 subsequent siblings)
  15 siblings, 2 replies; 79+ messages in thread
From: Tony Krowiak @ 2020-08-21 19:56 UTC (permalink / raw)
  To: linux-s390, linux-kernel, kvm
  Cc: freude, borntraeger, cohuck, mjrosato, pasic, alex.williamson,
	kwankhede, fiuczy, frankja, david, imbrenda, hca, gor,
	Tony Krowiak, kernel test robot

Introduce a new driver callback to prevent a root user from unbinding
an AP queue from its device driver if the queue is in use. The intent of
this callback is to provide a driver with the means to prevent a root user
from inadvertently taking a queue away from a matrix mdev and giving it to
the host while it is assigned to the matrix mdev. The callback will
be invoked whenever a change to the AP bus's sysfs apmask or aqmask
attributes would result in one or more AP queues being removed from their
driver. If the callback responds in the affirmative for any driver
queried, the change to the apmask or aqmask will be rejected with a device
in use error (-EADDRINUSE).

For this patch, only non-default drivers will be queried. Currently,
there is only one non-default driver, the vfio_ap device driver. The
vfio_ap device driver facilitates pass-through of an AP queue to a
guest. The idea here is that a guest may be administered by a different
sysadmin than the host and we don't want AP resources to unexpectedly
disappear from a guest's AP configuration (i.e., adapters, domains and
control domains assigned to the matrix mdev). This will enforce the proper
procedure for removing AP resources intended for guest usage, which is to
first unassign them from the matrix mdev and then unbind them from the
vfio_ap device driver.
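
The check described above can be sketched in userspace C. This is only an
illustration under simplifying assumptions, not the kernel code: the masks
are 64-bit words with plain (non-inverted) bit numbering instead of the
kernel's 256-bit bitmaps, and struct sketch_driver, sketch_in_use() and
sketch_apmask_commit() are hypothetical names invented for this sketch.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct ap_driver; only what the sketch needs. */
struct sketch_driver {
	bool is_default;                            /* models AP_DRIVER_FLAG_DEFAULT */
	bool (*in_use)(uint64_t apm, uint64_t aqm); /* may be NULL */
};

/* Pretend the non-default driver has queue 04.0007 assigned to an mdev. */
bool sketch_in_use(uint64_t apm, uint64_t aqm)
{
	return (apm & (1ULL << 4)) && (aqm & (1ULL << 7));
}

/*
 * Apply a new apmask unless a non-default driver reports that one of the
 * queues about to be taken away from it is in use.
 * Returns 0 on success, -EADDRINUSE if the change is rejected.
 */
int sketch_apmask_commit(uint64_t *cur_apm, uint64_t cur_aqm,
			 uint64_t new_apm, const struct sketch_driver *drv)
{
	uint64_t reserved;

	assert(cur_apm != NULL && drv != NULL);

	/* Adapters that would newly be handed to the default drivers. */
	reserved = new_apm & ~*cur_apm;

	if (reserved && !drv->is_default && drv->in_use &&
	    drv->in_use(reserved, cur_aqm))
		return -EADDRINUSE;

	/* Nobody objected, so the new mask takes effect. */
	*cur_apm = new_apm;
	return 0;
}
```

Note that the mask is left untouched when the callback objects, so a
rejected write has no side effects.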

Signed-off-by: Tony Krowiak <akrowiak@linux.ibm.com>
Reported-by: kernel test robot <lkp@intel.com>
---
 drivers/s390/crypto/ap_bus.c | 148 ++++++++++++++++++++++++++++++++---
 drivers/s390/crypto/ap_bus.h |   4 +
 2 files changed, 142 insertions(+), 10 deletions(-)

diff --git a/drivers/s390/crypto/ap_bus.c b/drivers/s390/crypto/ap_bus.c
index 24a1940b829e..db27bd931308 100644
--- a/drivers/s390/crypto/ap_bus.c
+++ b/drivers/s390/crypto/ap_bus.c
@@ -35,6 +35,7 @@
 #include <linux/mod_devicetable.h>
 #include <linux/debugfs.h>
 #include <linux/ctype.h>
+#include <linux/module.h>
 
 #include "ap_bus.h"
 #include "ap_debug.h"
@@ -889,6 +890,23 @@ static int modify_bitmap(const char *str, unsigned long *bitmap, int bits)
 	return 0;
 }
 
+static int ap_parse_bitmap_str(const char *str, unsigned long *bitmap, int bits,
+			       unsigned long *newmap)
+{
+	unsigned long size;
+	int rc;
+
+	size = BITS_TO_LONGS(bits)*sizeof(unsigned long);
+	if (*str == '+' || *str == '-') {
+		memcpy(newmap, bitmap, size);
+		rc = modify_bitmap(str, newmap, bits);
+	} else {
+		memset(newmap, 0, size);
+		rc = hex2bitmap(str, newmap, bits);
+	}
+	return rc;
+}
+
 int ap_parse_mask_str(const char *str,
 		      unsigned long *bitmap, int bits,
 		      struct mutex *lock)
@@ -908,14 +926,7 @@ int ap_parse_mask_str(const char *str,
 		kfree(newmap);
 		return -ERESTARTSYS;
 	}
-
-	if (*str == '+' || *str == '-') {
-		memcpy(newmap, bitmap, size);
-		rc = modify_bitmap(str, newmap, bits);
-	} else {
-		memset(newmap, 0, size);
-		rc = hex2bitmap(str, newmap, bits);
-	}
+	rc = ap_parse_bitmap_str(str, bitmap, bits, newmap);
 	if (rc == 0)
 		memcpy(bitmap, newmap, size);
 	mutex_unlock(lock);
@@ -1107,12 +1118,70 @@ static ssize_t apmask_show(struct bus_type *bus, char *buf)
 	return rc;
 }
 
+static int __verify_card_reservations(struct device_driver *drv, void *data)
+{
+	int rc = 0;
+	struct ap_driver *ap_drv = to_ap_drv(drv);
+	unsigned long *newapm = (unsigned long *)data;
+
+	/*
+	 * No need to verify whether the driver is using the queues if it is the
+	 * default driver.
+	 */
+	if (ap_drv->flags & AP_DRIVER_FLAG_DEFAULT)
+		return 0;
+
+	/* The non-default driver's module must be loaded */
+	if (!try_module_get(drv->owner))
+		return 0;
+
+	if (ap_drv->in_use)
+		if (ap_drv->in_use(newapm, ap_perms.aqm))
+			rc = -EADDRINUSE;
+
+	module_put(drv->owner);
+
+	return rc;
+}
+
+static int apmask_commit(unsigned long *newapm)
+{
+	int rc;
+	unsigned long reserved[BITS_TO_LONGS(AP_DEVICES)];
+
+	/*
+	 * Check if any bits in the apmask have been set which will
+	 * result in queues being removed from non-default drivers
+	 */
+	if (bitmap_andnot(reserved, newapm, ap_perms.apm, AP_DEVICES)) {
+		rc = bus_for_each_drv(&ap_bus_type, NULL, reserved,
+				      __verify_card_reservations);
+		if (rc)
+			return rc;
+	}
+
+	memcpy(ap_perms.apm, newapm, APMASKSIZE);
+
+	return 0;
+}
+
 static ssize_t apmask_store(struct bus_type *bus, const char *buf,
 			    size_t count)
 {
 	int rc;
+	DECLARE_BITMAP(newapm, AP_DEVICES);
+
+	if (mutex_lock_interruptible(&ap_perms_mutex))
+		return -ERESTARTSYS;
+
+	rc = ap_parse_bitmap_str(buf, ap_perms.apm, AP_DEVICES, newapm);
+	if (rc)
+		goto done;
 
-	rc = ap_parse_mask_str(buf, ap_perms.apm, AP_DEVICES, &ap_perms_mutex);
+	rc = apmask_commit(newapm);
+
+done:
+	mutex_unlock(&ap_perms_mutex);
 	if (rc)
 		return rc;
 
@@ -1138,12 +1207,71 @@ static ssize_t aqmask_show(struct bus_type *bus, char *buf)
 	return rc;
 }
 
+static int __verify_queue_reservations(struct device_driver *drv, void *data)
+{
+	int rc = 0;
+	struct ap_driver *ap_drv = to_ap_drv(drv);
+	unsigned long *newaqm = (unsigned long *)data;
+
+	/*
+	 * If the reserved bits do not identify queues reserved for use by the
+	 * non-default driver, there is no need to verify the driver is using
+	 * the queues.
+	 */
+	if (ap_drv->flags & AP_DRIVER_FLAG_DEFAULT)
+		return 0;
+
+	/* The non-default driver's module must be loaded */
+	if (!try_module_get(drv->owner))
+		return 0;
+
+	if (ap_drv->in_use)
+		if (ap_drv->in_use(ap_perms.apm, newaqm))
+			rc = -EADDRINUSE;
+
+	module_put(drv->owner);
+
+	return rc;
+}
+
+static int aqmask_commit(unsigned long *newaqm)
+{
+	int rc;
+	unsigned long reserved[BITS_TO_LONGS(AP_DOMAINS)];
+
+	/*
+	 * Check if any bits in the aqmask have been set which will
+	 * result in queues being removed from non-default drivers
+	 */
+	if (bitmap_andnot(reserved, newaqm, ap_perms.aqm, AP_DOMAINS)) {
+		rc = bus_for_each_drv(&ap_bus_type, NULL, reserved,
+				      __verify_queue_reservations);
+		if (rc)
+			return rc;
+	}
+
+	memcpy(ap_perms.aqm, newaqm, AQMASKSIZE);
+
+	return 0;
+}
+
 static ssize_t aqmask_store(struct bus_type *bus, const char *buf,
 			    size_t count)
 {
 	int rc;
+	DECLARE_BITMAP(newaqm, AP_DOMAINS);
 
-	rc = ap_parse_mask_str(buf, ap_perms.aqm, AP_DOMAINS, &ap_perms_mutex);
+	if (mutex_lock_interruptible(&ap_perms_mutex))
+		return -ERESTARTSYS;
+
+	rc = ap_parse_bitmap_str(buf, ap_perms.aqm, AP_DOMAINS, newaqm);
+	if (rc)
+		goto done;
+
+	rc = aqmask_commit(newaqm);
+
+done:
+	mutex_unlock(&ap_perms_mutex);
 	if (rc)
 		return rc;
 
diff --git a/drivers/s390/crypto/ap_bus.h b/drivers/s390/crypto/ap_bus.h
index 1ea046324e8f..48c57b3d53a0 100644
--- a/drivers/s390/crypto/ap_bus.h
+++ b/drivers/s390/crypto/ap_bus.h
@@ -136,6 +136,7 @@ struct ap_driver {
 
 	int (*probe)(struct ap_device *);
 	void (*remove)(struct ap_device *);
+	bool (*in_use)(unsigned long *apm, unsigned long *aqm);
 };
 
 #define to_ap_drv(x) container_of((x), struct ap_driver, driver)
@@ -255,6 +256,9 @@ void ap_queue_init_state(struct ap_queue *aq);
 struct ap_card *ap_card_create(int id, int queue_depth, int raw_device_type,
 			       int comp_device_type, unsigned int functions);
 
+#define APMASKSIZE (BITS_TO_LONGS(AP_DEVICES) * sizeof(unsigned long))
+#define AQMASKSIZE (BITS_TO_LONGS(AP_DOMAINS) * sizeof(unsigned long))
+
 struct ap_perms {
 	unsigned long ioctlm[BITS_TO_LONGS(AP_IOCTLS)];
 	unsigned long apm[BITS_TO_LONGS(AP_DEVICES)];
-- 
2.21.1


^ permalink raw reply	[flat|nested] 79+ messages in thread

* [PATCH v10 05/16] s390/vfio-ap: implement in-use callback for vfio_ap driver
  2020-08-21 19:56 [PATCH v10 00/16] s390/vfio-ap: dynamic configuration support Tony Krowiak
                   ` (3 preceding siblings ...)
  2020-08-21 19:56 ` [PATCH v10 04/16] s390/zcrypt: driver callback to indicate resource in use Tony Krowiak
@ 2020-08-21 19:56 ` Tony Krowiak
  2020-09-14 15:31   ` Cornelia Huck
  2020-09-25  9:29   ` Halil Pasic
  2020-08-21 19:56 ` [PATCH v10 06/16] s390/vfio-ap: introduce shadow APCB Tony Krowiak
                   ` (10 subsequent siblings)
  15 siblings, 2 replies; 79+ messages in thread
From: Tony Krowiak @ 2020-08-21 19:56 UTC (permalink / raw)
  To: linux-s390, linux-kernel, kvm
  Cc: freude, borntraeger, cohuck, mjrosato, pasic, alex.williamson,
	kwankhede, fiuczy, frankja, david, imbrenda, hca, gor,
	Tony Krowiak

Let's implement the callback to indicate when an APQN
is in use by the vfio_ap device driver. The callback is
invoked whenever a change to the apmask or aqmask would
result in one or more queue devices being removed from the driver. The
vfio_ap device driver will indicate a resource is in use
if the APQN of any of the queue devices to be removed is assigned to
any of the matrix mdevs under the driver's control.
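
The in-use test amounts to intersecting the masks passed to the callback
with the masks of each matrix mdev. A rough userspace sketch follows,
assuming 64-bit masks with plain bit numbering rather than the kernel
bitmaps; struct sketch_mdev and sketch_resource_in_use() are hypothetical
names used only for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of the APM/AQM assigned to one matrix mdev. */
struct sketch_mdev {
	uint64_t apm;	/* adapters assigned to the mdev */
	uint64_t aqm;	/* domains assigned to the mdev */
};

/*
 * An APQN is in use only if a single mdev has both its adapter and its
 * domain assigned, so both intersections must be non-empty for the
 * same mdev.
 */
bool sketch_resource_in_use(const struct sketch_mdev *mdevs, size_t n,
			    uint64_t apm, uint64_t aqm)
{
	size_t i;

	assert(mdevs != NULL || n == 0);

	for (i = 0; i < n; i++) {
		if (!(apm & mdevs[i].apm))
			continue;
		if (!(aqm & mdevs[i].aqm))
			continue;
		return true;
	}

	return false;
}
```

An adapter that matches one mdev and a domain that matches a different
mdev do not add up to a shared APQN, which is why the two tests are made
per mdev.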

Signed-off-by: Tony Krowiak <akrowiak@linux.ibm.com>
---
 drivers/s390/crypto/vfio_ap_drv.c     |  1 +
 drivers/s390/crypto/vfio_ap_ops.c     | 68 ++++++++++++++++++++-------
 drivers/s390/crypto/vfio_ap_private.h |  2 +
 3 files changed, 53 insertions(+), 18 deletions(-)

diff --git a/drivers/s390/crypto/vfio_ap_drv.c b/drivers/s390/crypto/vfio_ap_drv.c
index 24cdef60039a..aae5b3d8e3fa 100644
--- a/drivers/s390/crypto/vfio_ap_drv.c
+++ b/drivers/s390/crypto/vfio_ap_drv.c
@@ -175,6 +175,7 @@ static int __init vfio_ap_init(void)
 	memset(&vfio_ap_drv, 0, sizeof(vfio_ap_drv));
 	vfio_ap_drv.probe = vfio_ap_queue_dev_probe;
 	vfio_ap_drv.remove = vfio_ap_queue_dev_remove;
+	vfio_ap_drv.in_use = vfio_ap_mdev_resource_in_use;
 	vfio_ap_drv.ids = ap_queue_ids;
 
 	ret = ap_driver_register(&vfio_ap_drv, THIS_MODULE, VFIO_AP_DRV_NAME);
diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c
index 2e37ee82e422..fc1aa6f947eb 100644
--- a/drivers/s390/crypto/vfio_ap_ops.c
+++ b/drivers/s390/crypto/vfio_ap_ops.c
@@ -515,18 +515,36 @@ vfio_ap_mdev_verify_queues_reserved_for_apid(struct ap_matrix_mdev *matrix_mdev,
 	return 0;
 }
 
+#define MDEV_SHARING_ERR "Userspace may not re-assign queue %02lx.%04lx " \
+			 "already assigned to %s"
+
+static void vfio_ap_mdev_log_sharing_err(const char *mdev_name,
+					 unsigned long *apm,
+					 unsigned long *aqm)
+{
+	unsigned long apid, apqi;
+
+	for_each_set_bit_inv(apid, apm, AP_DEVICES)
+		for_each_set_bit_inv(apqi, aqm, AP_DOMAINS)
+			pr_err(MDEV_SHARING_ERR, apid, apqi, mdev_name);
+}
+
 /**
  * vfio_ap_mdev_verify_no_sharing
  *
  * Verifies that the APQNs derived from the cross product of the AP adapter IDs
- * and AP queue indexes comprising the AP matrix are not configured for another
+ * and AP queue indexes comprising an AP matrix are not assigned to another
  * mediated device. AP queue sharing is not allowed.
  *
  * @matrix_mdev: the mediated matrix device
+ * @mdev_apm: mask indicating the APIDs of the APQNs to be verified
+ * @mdev_aqm: mask indicating the APQIs of the APQNs to be verified
  *
  * Returns 0 if the APQNs are not shared, otherwise; returns -EADDRINUSE.
  */
-static int vfio_ap_mdev_verify_no_sharing(struct ap_matrix_mdev *matrix_mdev)
+static int vfio_ap_mdev_verify_no_sharing(struct ap_matrix_mdev *matrix_mdev,
+					  unsigned long *mdev_apm,
+					  unsigned long *mdev_aqm)
 {
 	struct ap_matrix_mdev *lstdev;
 	DECLARE_BITMAP(apm, AP_DEVICES);
@@ -543,14 +561,15 @@ static int vfio_ap_mdev_verify_no_sharing(struct ap_matrix_mdev *matrix_mdev)
 		 * We work on full longs, as we can only exclude the leftover
 		 * bits in non-inverse order. The leftover is all zeros.
 		 */
-		if (!bitmap_and(apm, matrix_mdev->matrix.apm,
-				lstdev->matrix.apm, AP_DEVICES))
+		if (!bitmap_and(apm, mdev_apm, lstdev->matrix.apm, AP_DEVICES))
 			continue;
 
-		if (!bitmap_and(aqm, matrix_mdev->matrix.aqm,
-				lstdev->matrix.aqm, AP_DOMAINS))
+		if (!bitmap_and(aqm, mdev_aqm, lstdev->matrix.aqm, AP_DOMAINS))
 			continue;
 
+		vfio_ap_mdev_log_sharing_err(dev_name(mdev_dev(lstdev->mdev)),
+					     apm, aqm);
+
 		return -EADDRINUSE;
 	}
 
@@ -676,6 +695,7 @@ static ssize_t assign_adapter_store(struct device *dev,
 {
 	int ret;
 	unsigned long apid;
+	DECLARE_BITMAP(apm, AP_DEVICES);
 	struct mdev_device *mdev = mdev_from_dev(dev);
 	struct ap_matrix_mdev *matrix_mdev = mdev_get_drvdata(mdev);
 
@@ -701,18 +721,18 @@ static ssize_t assign_adapter_store(struct device *dev,
 	if (ret)
 		goto done;
 
-	set_bit_inv(apid, matrix_mdev->matrix.apm);
+	memset(apm, 0, sizeof(apm));
+	set_bit_inv(apid, apm);
 
-	ret = vfio_ap_mdev_verify_no_sharing(matrix_mdev);
+	ret = vfio_ap_mdev_verify_no_sharing(matrix_mdev, apm,
+					     matrix_mdev->matrix.aqm);
 	if (ret)
-		goto share_err;
+		goto done;
 
+	set_bit_inv(apid, matrix_mdev->matrix.apm);
 	vfio_ap_mdev_link_queues(matrix_mdev, LINK_APID, apid);
 	ret = count;
-	goto done;
 
-share_err:
-	clear_bit_inv(apid, matrix_mdev->matrix.apm);
 done:
 	mutex_unlock(&matrix_dev->lock);
 
@@ -824,6 +844,7 @@ static ssize_t assign_domain_store(struct device *dev,
 {
 	int ret;
 	unsigned long apqi;
+	DECLARE_BITMAP(aqm, AP_DOMAINS);
 	struct mdev_device *mdev = mdev_from_dev(dev);
 	struct ap_matrix_mdev *matrix_mdev = mdev_get_drvdata(mdev);
 	unsigned long max_apqi = matrix_mdev->matrix.aqm_max;
@@ -844,18 +865,18 @@ static ssize_t assign_domain_store(struct device *dev,
 	if (ret)
 		goto done;
 
-	set_bit_inv(apqi, matrix_mdev->matrix.aqm);
+	memset(aqm, 0, sizeof(aqm));
+	set_bit_inv(apqi, aqm);
 
-	ret = vfio_ap_mdev_verify_no_sharing(matrix_mdev);
+	ret = vfio_ap_mdev_verify_no_sharing(matrix_mdev,
+					     matrix_mdev->matrix.apm, aqm);
 	if (ret)
-		goto share_err;
+		goto done;
 
+	set_bit_inv(apqi, matrix_mdev->matrix.aqm);
 	vfio_ap_mdev_link_queues(matrix_mdev, LINK_APQI, apqi);
 	ret = count;
-	goto done;
 
-share_err:
-	clear_bit_inv(apqi, matrix_mdev->matrix.aqm);
 done:
 	mutex_unlock(&matrix_dev->lock);
 
@@ -1434,3 +1455,14 @@ void vfio_ap_mdev_remove_queue(struct ap_queue *queue)
 	kfree(q);
 	mutex_unlock(&matrix_dev->lock);
 }
+
+bool vfio_ap_mdev_resource_in_use(unsigned long *apm, unsigned long *aqm)
+{
+	bool in_use;
+
+	mutex_lock(&matrix_dev->lock);
+	in_use = !!vfio_ap_mdev_verify_no_sharing(NULL, apm, aqm);
+	mutex_unlock(&matrix_dev->lock);
+
+	return in_use;
+}
diff --git a/drivers/s390/crypto/vfio_ap_private.h b/drivers/s390/crypto/vfio_ap_private.h
index 57da703b549a..0c796ef11426 100644
--- a/drivers/s390/crypto/vfio_ap_private.h
+++ b/drivers/s390/crypto/vfio_ap_private.h
@@ -105,4 +105,6 @@ struct vfio_ap_queue {
 int vfio_ap_mdev_probe_queue(struct ap_queue *queue);
 void vfio_ap_mdev_remove_queue(struct ap_queue *queue);
 
+bool vfio_ap_mdev_resource_in_use(unsigned long *apm, unsigned long *aqm);
+
 #endif /* _VFIO_AP_PRIVATE_H_ */
-- 
2.21.1


^ permalink raw reply	[flat|nested] 79+ messages in thread

* [PATCH v10 06/16] s390/vfio-ap: introduce shadow APCB
  2020-08-21 19:56 [PATCH v10 00/16] s390/vfio-ap: dynamic configuration support Tony Krowiak
                   ` (4 preceding siblings ...)
  2020-08-21 19:56 ` [PATCH v10 05/16] s390/vfio-ap: implement in-use callback for vfio_ap driver Tony Krowiak
@ 2020-08-21 19:56 ` Tony Krowiak
  2020-09-17 14:22   ` Cornelia Huck
  2020-09-26  1:38   ` Halil Pasic
  2020-08-21 19:56 ` [PATCH v10 07/16] s390/vfio-ap: sysfs attribute to display the guest's matrix Tony Krowiak
                   ` (9 subsequent siblings)
  15 siblings, 2 replies; 79+ messages in thread
From: Tony Krowiak @ 2020-08-21 19:56 UTC (permalink / raw)
  To: linux-s390, linux-kernel, kvm
  Cc: freude, borntraeger, cohuck, mjrosato, pasic, alex.williamson,
	kwankhede, fiuczy, frankja, david, imbrenda, hca, gor,
	Tony Krowiak

The APCB is a field within the CRYCB that provides the AP configuration
to a KVM guest. Let's introduce a shadow copy of the KVM guest's APCB and
maintain it for the lifespan of the guest.

Signed-off-by: Tony Krowiak <akrowiak@linux.ibm.com>
---
 drivers/s390/crypto/vfio_ap_ops.c     | 32 ++++++++++++++++++++++-----
 drivers/s390/crypto/vfio_ap_private.h |  2 ++
 2 files changed, 29 insertions(+), 5 deletions(-)

diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c
index fc1aa6f947eb..efb229033f9e 100644
--- a/drivers/s390/crypto/vfio_ap_ops.c
+++ b/drivers/s390/crypto/vfio_ap_ops.c
@@ -305,14 +305,35 @@ static int handle_pqap(struct kvm_vcpu *vcpu)
 	return 0;
 }
 
+static void vfio_ap_matrix_clear_masks(struct ap_matrix *matrix)
+{
+	bitmap_clear(matrix->apm, 0, AP_DEVICES);
+	bitmap_clear(matrix->aqm, 0, AP_DOMAINS);
+	bitmap_clear(matrix->adm, 0, AP_DOMAINS);
+}
+
 static void vfio_ap_matrix_init(struct ap_config_info *info,
 				struct ap_matrix *matrix)
 {
+	vfio_ap_matrix_clear_masks(matrix);
 	matrix->apm_max = info->apxa ? info->Na : 63;
 	matrix->aqm_max = info->apxa ? info->Nd : 15;
 	matrix->adm_max = info->apxa ? info->Nd : 15;
 }
 
+static bool vfio_ap_mdev_has_crycb(struct ap_matrix_mdev *matrix_mdev)
+{
+	return (matrix_mdev->kvm && matrix_mdev->kvm->arch.crypto.crycbd);
+}
+
+static void vfio_ap_mdev_commit_crycb(struct ap_matrix_mdev *matrix_mdev)
+{
+	kvm_arch_crypto_set_masks(matrix_mdev->kvm,
+				  matrix_mdev->shadow_apcb.apm,
+				  matrix_mdev->shadow_apcb.aqm,
+				  matrix_mdev->shadow_apcb.adm);
+}
+
 static int vfio_ap_mdev_create(struct kobject *kobj, struct mdev_device *mdev)
 {
 	struct ap_matrix_mdev *matrix_mdev;
@@ -1202,13 +1223,12 @@ static int vfio_ap_mdev_group_notifier(struct notifier_block *nb,
 	if (ret)
 		return NOTIFY_DONE;
 
-	/* If there is no CRYCB pointer, then we can't copy the masks */
-	if (!matrix_mdev->kvm->arch.crypto.crycbd)
+	if (!vfio_ap_mdev_has_crycb(matrix_mdev))
 		return NOTIFY_DONE;
 
-	kvm_arch_crypto_set_masks(matrix_mdev->kvm, matrix_mdev->matrix.apm,
-				  matrix_mdev->matrix.aqm,
-				  matrix_mdev->matrix.adm);
+	memcpy(&matrix_mdev->shadow_apcb, &matrix_mdev->matrix,
+	       sizeof(matrix_mdev->shadow_apcb));
+	vfio_ap_mdev_commit_crycb(matrix_mdev);
 
 	return NOTIFY_OK;
 }
@@ -1323,6 +1343,8 @@ static void vfio_ap_mdev_release(struct mdev_device *mdev)
 		kvm_put_kvm(matrix_mdev->kvm);
 		matrix_mdev->kvm = NULL;
 	}
+
+	vfio_ap_matrix_clear_masks(&matrix_mdev->shadow_apcb);
 	mutex_unlock(&matrix_dev->lock);
 
 	vfio_unregister_notifier(mdev_dev(mdev), VFIO_IOMMU_NOTIFY,
diff --git a/drivers/s390/crypto/vfio_ap_private.h b/drivers/s390/crypto/vfio_ap_private.h
index 0c796ef11426..055bce6d45db 100644
--- a/drivers/s390/crypto/vfio_ap_private.h
+++ b/drivers/s390/crypto/vfio_ap_private.h
@@ -75,6 +75,7 @@ struct ap_matrix {
  * @list:	allows the ap_matrix_mdev struct to be added to a list
  * @matrix:	the adapters, usage domains and control domains assigned to the
  *		mediated matrix device.
+ * @shadow_apcb:    the shadow copy of the APCB field of the KVM guest's CRYCB
  * @group_notifier: notifier block used for specifying callback function for
  *		    handling the VFIO_GROUP_NOTIFY_SET_KVM event
  * @kvm:	the struct holding guest's state
@@ -82,6 +83,7 @@ struct ap_matrix {
 struct ap_matrix_mdev {
 	struct list_head node;
 	struct ap_matrix matrix;
+	struct ap_matrix shadow_apcb;
 	struct notifier_block group_notifier;
 	struct notifier_block iommu_notifier;
 	struct kvm *kvm;
-- 
2.21.1


^ permalink raw reply	[flat|nested] 79+ messages in thread

* [PATCH v10 07/16] s390/vfio-ap: sysfs attribute to display the guest's matrix
  2020-08-21 19:56 [PATCH v10 00/16] s390/vfio-ap: dynamic configuration support Tony Krowiak
                   ` (5 preceding siblings ...)
  2020-08-21 19:56 ` [PATCH v10 06/16] s390/vfio-ap: introduce shadow APCB Tony Krowiak
@ 2020-08-21 19:56 ` Tony Krowiak
  2020-09-17 14:34   ` Cornelia Huck
  2020-08-21 19:56 ` [PATCH v10 08/16] s390/vfio-ap: filter matrix for unavailable queue devices Tony Krowiak
                   ` (8 subsequent siblings)
  15 siblings, 1 reply; 79+ messages in thread
From: Tony Krowiak @ 2020-08-21 19:56 UTC (permalink / raw)
  To: linux-s390, linux-kernel, kvm
  Cc: freude, borntraeger, cohuck, mjrosato, pasic, alex.williamson,
	kwankhede, fiuczy, frankja, david, imbrenda, hca, gor,
	Tony Krowiak

The matrix of adapters and domains configured in a guest's CRYCB may
differ from the matrix of adapters and domains assigned to the matrix mdev,
so this patch introduces a sysfs attribute to display the matrix of a guest
using the matrix mdev. For a matrix mdev denoted by $uuid, the CRYCB for a
guest using the matrix mdev can be displayed as follows:

   cat /sys/devices/vfio_ap/matrix/$uuid/guest_matrix

If a guest is not using the matrix mdev at the time the CRYCB is displayed,
an error (ENODEV) will be returned.
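
The output format of the attribute can be illustrated with a small
userspace sketch. It assumes 64-bit masks with plain bit numbering and a
bounded snprintf() instead of the kernel helpers;
sketch_guest_matrix_show() is a hypothetical name and not part of the
patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Write one line per APQN ("apid.apqi"). If only adapters or only
 * domains are set, print the half that is known ("apid." or ".apqi").
 * Returns the number of characters written, or -1 if buf is too small.
 */
int sketch_guest_matrix_show(char *buf, size_t len, uint64_t apm, uint64_t aqm)
{
	size_t pos = 0;
	unsigned long apid, apqi;
	int n;

	assert(buf != NULL && len > 0);
	buf[0] = '\0';

	for (apid = 0; apid < 64; apid++) {
		if (!(apm & (1ULL << apid)))
			continue;
		if (aqm == 0) {
			/* Adapters only. */
			n = snprintf(buf + pos, len - pos, "%02lx.\n", apid);
			if (n < 0 || (size_t)n >= len - pos)
				return -1;
			pos += (size_t)n;
			continue;
		}
		for (apqi = 0; apqi < 64; apqi++) {
			if (!(aqm & (1ULL << apqi)))
				continue;
			n = snprintf(buf + pos, len - pos, "%02lx.%04lx\n",
				     apid, apqi);
			if (n < 0 || (size_t)n >= len - pos)
				return -1;
			pos += (size_t)n;
		}
	}

	if (apm == 0) {
		/* Domains only. */
		for (apqi = 0; apqi < 64; apqi++) {
			if (!(aqm & (1ULL << apqi)))
				continue;
			n = snprintf(buf + pos, len - pos, ".%04lx\n", apqi);
			if (n < 0 || (size_t)n >= len - pos)
				return -1;
			pos += (size_t)n;
		}
	}

	return (int)pos;
}
```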

Signed-off-by: Tony Krowiak <akrowiak@linux.ibm.com>
---
 drivers/s390/crypto/vfio_ap_ops.c | 58 +++++++++++++++++++++++++++++++
 1 file changed, 58 insertions(+)

diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c
index efb229033f9e..30bf23734af6 100644
--- a/drivers/s390/crypto/vfio_ap_ops.c
+++ b/drivers/s390/crypto/vfio_ap_ops.c
@@ -1119,6 +1119,63 @@ static ssize_t matrix_show(struct device *dev, struct device_attribute *attr,
 }
 static DEVICE_ATTR_RO(matrix);
 
+static ssize_t guest_matrix_show(struct device *dev,
+				 struct device_attribute *attr, char *buf)
+{
+	struct mdev_device *mdev = mdev_from_dev(dev);
+	struct ap_matrix_mdev *matrix_mdev = mdev_get_drvdata(mdev);
+	char *bufpos = buf;
+	unsigned long apid;
+	unsigned long apqi;
+	unsigned long apid1;
+	unsigned long apqi1;
+	unsigned long napm_bits = matrix_mdev->shadow_apcb.apm_max + 1;
+	unsigned long naqm_bits = matrix_mdev->shadow_apcb.aqm_max + 1;
+	int nchars = 0;
+	int n;
+
+	if (!vfio_ap_mdev_has_crycb(matrix_mdev))
+		return -ENODEV;
+
+	apid1 = find_first_bit_inv(matrix_mdev->shadow_apcb.apm, napm_bits);
+	apqi1 = find_first_bit_inv(matrix_mdev->shadow_apcb.aqm, naqm_bits);
+
+	mutex_lock(&matrix_dev->lock);
+
+	if ((apid1 < napm_bits) && (apqi1 < naqm_bits)) {
+		for_each_set_bit_inv(apid, matrix_mdev->shadow_apcb.apm,
+				     napm_bits) {
+			for_each_set_bit_inv(apqi,
+					     matrix_mdev->shadow_apcb.aqm,
+					     naqm_bits) {
+				n = sprintf(bufpos, "%02lx.%04lx\n", apid,
+					    apqi);
+				bufpos += n;
+				nchars += n;
+			}
+		}
+	} else if (apid1 < napm_bits) {
+		for_each_set_bit_inv(apid, matrix_mdev->shadow_apcb.apm,
+				     napm_bits) {
+			n = sprintf(bufpos, "%02lx.\n", apid);
+			bufpos += n;
+			nchars += n;
+		}
+	} else if (apqi1 < naqm_bits) {
+		for_each_set_bit_inv(apqi, matrix_mdev->shadow_apcb.aqm,
+				     naqm_bits) {
+			n = sprintf(bufpos, ".%04lx\n", apqi);
+			bufpos += n;
+			nchars += n;
+		}
+	}
+
+	mutex_unlock(&matrix_dev->lock);
+
+	return nchars;
+}
+static DEVICE_ATTR_RO(guest_matrix);
+
 static struct attribute *vfio_ap_mdev_attrs[] = {
 	&dev_attr_assign_adapter.attr,
 	&dev_attr_unassign_adapter.attr,
@@ -1128,6 +1185,7 @@ static struct attribute *vfio_ap_mdev_attrs[] = {
 	&dev_attr_unassign_control_domain.attr,
 	&dev_attr_control_domains.attr,
 	&dev_attr_matrix.attr,
+	&dev_attr_guest_matrix.attr,
 	NULL,
 };
 
-- 
2.21.1


^ permalink raw reply	[flat|nested] 79+ messages in thread

* [PATCH v10 08/16] s390/vfio-ap: filter matrix for unavailable queue devices
  2020-08-21 19:56 [PATCH v10 00/16] s390/vfio-ap: dynamic configuration support Tony Krowiak
                   ` (6 preceding siblings ...)
  2020-08-21 19:56 ` [PATCH v10 07/16] s390/vfio-ap: sysfs attribute to display the guest's matrix Tony Krowiak
@ 2020-08-21 19:56 ` Tony Krowiak
  2020-09-26  8:24   ` Halil Pasic
  2020-08-21 19:56 ` [PATCH v10 09/16] s390/vfio-ap: allow assignment of unavailable AP queues to mdev device Tony Krowiak
                   ` (7 subsequent siblings)
  15 siblings, 1 reply; 79+ messages in thread
From: Tony Krowiak @ 2020-08-21 19:56 UTC (permalink / raw)
  To: linux-s390, linux-kernel, kvm
  Cc: freude, borntraeger, cohuck, mjrosato, pasic, alex.williamson,
	kwankhede, fiuczy, frankja, david, imbrenda, hca, gor,
	Tony Krowiak

Even though APQNs for queues that are not in the host's AP configuration
may be assigned to a matrix mdev, we do not want to set bits in the guest's
APCB for APQNs that do not reference AP queue devices bound to the vfio_ap
device driver. Ideally, it would be great if such APQNs could be filtered
out before setting the bits in the guest's APCB; however, the architecture
precludes filtering individual APQNs. Consequently, either the APID or APQI
must be filtered.

This patch introduces code to filter the APIDs or APQIs assigned to the
matrix mdev's AP configuration before assigning them to the guest's AP
configuration (i.e., APCB). We'll start by filtering the APIDs:

   If an APQN assigned to the matrix mdev's AP configuration does not
   reference a queue device bound to the vfio_ap device driver, the APID
   will be filtered out (i.e., not assigned to the guest's APCB).

If every APID assigned to the matrix mdev is filtered out, then we'll try
filtering the APQIs:

   If an APQN assigned to the matrix mdev's AP configuration does not
   reference a queue device bound to the vfio_ap device driver, the APQI
   will be filtered out (i.e., not assigned to the guest's APCB).

In any case, if after filtering either the APIDs or APQIs there are any
APQNs that can be assigned to the guest's APCB, they will be assigned and
the CRYCB will be hot plugged into the guest.

Example
=======

APQNs bound to vfio_ap device driver:
   04.0004
   04.0047
   04.0054

   05.0005
   05.0047
   05.0054

Assignments to matrix mdev:
   APIDs  APQIs  -> APQNs
   04     0004      04.0004
   05     0005      04.0005
          0047      04.0047
          0054      04.0054
                    05.0004
                    05.0005
                    05.0047
                    05.0054

Filter APIDs:
   APID 04 will be filtered because APQN 04.0005 is not bound.
   APID 05 will be filtered because APQN 05.0004 is not bound.
   APQNs remaining: None

Filter APQIs:
   APQI 0004 will be filtered because APQN 05.0004 is not bound.
   APQI 0005 will be filtered because APQN 04.0005 is not bound.
   APQNs remaining: 04.0047, 04.0054, 05.0047, 05.0054

APQNs 04.0047, 04.0054, 05.0047, 05.0054 will be assigned to the CRYCB and
hot plugged into the KVM guest.
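
The two-pass filtering in the example can be reproduced with a simplified
userspace sketch. It is not the code in this patch: it uses bool arrays
instead of bitmaps, skips the check against the host AP configuration, and
keeps scanning after clearing an APQI, which is a choice of the sketch.
All sketch_* names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_MAX 256

struct sketch_matrix {
	bool apm[SKETCH_MAX];	/* adapters (APIDs) */
	bool aqm[SKETCH_MAX];	/* domains (APQIs) */
};

/* sketch_bound[apid][apqi] is true if that queue is bound to the driver. */
bool sketch_bound[SKETCH_MAX][SKETCH_MAX];

static int sketch_count(const bool *bits)
{
	int i, n = 0;

	for (i = 0; i < SKETCH_MAX; i++)
		n += bits[i] ? 1 : 0;
	return n;
}

/* Filter by APID (true) or by APQI (false); returns the APQNs left. */
int sketch_filter(const struct sketch_matrix *mdev,
		  struct sketch_matrix *shadow, bool filter_apids)
{
	int apid, apqi;

	assert(mdev != NULL && shadow != NULL);
	*shadow = *mdev;

	for (apid = 0; apid < SKETCH_MAX; apid++) {
		if (!mdev->apm[apid])
			continue;
		for (apqi = 0; apqi < SKETCH_MAX; apqi++) {
			if (!mdev->aqm[apqi] || sketch_bound[apid][apqi])
				continue;
			/* A single APQN can't be filtered; drop a row or column. */
			if (filter_apids) {
				shadow->apm[apid] = false;
				break;
			}
			shadow->aqm[apqi] = false;
		}
	}

	return sketch_count(shadow->apm) * sketch_count(shadow->aqm);
}

/* Try APIDs first, then APQIs; give the guest nothing if both fail. */
int sketch_config_shadow(const struct sketch_matrix *mdev,
			 struct sketch_matrix *shadow)
{
	int n = sketch_filter(mdev, shadow, true);

	if (n == 0)
		n = sketch_filter(mdev, shadow, false);
	if (n == 0)
		memset(shadow, 0, sizeof(*shadow));
	return n;
}
```

With the bound queues and assignments from the example, filtering by APID
leaves nothing, and the fallback to APQI filtering leaves the four APQNs
04.0047, 04.0054, 05.0047 and 05.0054.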

Signed-off-by: Tony Krowiak <akrowiak@linux.ibm.com>
---
 drivers/s390/crypto/vfio_ap_ops.c | 159 +++++++++++++++++++++++++++++-
 1 file changed, 155 insertions(+), 4 deletions(-)

diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c
index 30bf23734af6..eaf4e9eab6cb 100644
--- a/drivers/s390/crypto/vfio_ap_ops.c
+++ b/drivers/s390/crypto/vfio_ap_ops.c
@@ -326,7 +326,7 @@ static bool vfio_ap_mdev_has_crycb(struct ap_matrix_mdev *matrix_mdev)
 	return (matrix_mdev->kvm && matrix_mdev->kvm->arch.crypto.crycbd);
 }
 
-static void vfio_ap_mdev_commit_crycb(struct ap_matrix_mdev *matrix_mdev)
+static void vfio_ap_mdev_commit_shadow_apcb(struct ap_matrix_mdev *matrix_mdev)
 {
 	kvm_arch_crypto_set_masks(matrix_mdev->kvm,
 				  matrix_mdev->shadow_apcb.apm,
@@ -597,6 +597,157 @@ static int vfio_ap_mdev_verify_no_sharing(struct ap_matrix_mdev *matrix_mdev,
 	return 0;
 }
 
+/**
+ * vfio_ap_mdev_filter_matrix
+ *
+ * Filter APQNs assigned to the matrix mdev that do not reference an AP queue
+ * device bound to the vfio_ap device driver.
+ *
+ * @matrix_mdev:  the matrix mdev whose AP configuration is to be filtered
+ * @shadow_apcb:  the shadow of the KVM guest's APCB (contains AP configuration
+ *		  for guest)
+ * @filter_apids: boolean value indicating whether the APQNs shall be filtered
+ *		  by APID (true) or by APQI (false).
+ *
+ * Returns the number of APQNs remaining after filtering is complete.
+ */
+static int vfio_ap_mdev_filter_matrix(struct ap_matrix_mdev *matrix_mdev,
+				      struct ap_matrix *shadow_apcb,
+				      bool filter_apids)
+{
+	unsigned long apid, apqi, apqn;
+
+	memcpy(shadow_apcb, &matrix_mdev->matrix, sizeof(*shadow_apcb));
+
+	for_each_set_bit_inv(apid, matrix_mdev->matrix.apm, AP_DEVICES) {
+		/*
+		 * If the APID is not assigned to the host AP configuration,
+		 * we can not assign it to the guest's AP configuration
+		 */
+		if (!test_bit_inv(apid,
+				  (unsigned long *)matrix_dev->info.apm)) {
+			clear_bit_inv(apid, shadow_apcb->apm);
+			continue;
+		}
+
+		for_each_set_bit_inv(apqi, matrix_mdev->matrix.aqm,
+				     AP_DOMAINS) {
+			/*
+			 * If the APQI is not assigned to the host AP
+			 * configuration, then it can not be assigned to the
+			 * guest's AP configuration
+			 */
+			if (!test_bit_inv(apqi, (unsigned long *)
+					  matrix_dev->info.aqm)) {
+				clear_bit_inv(apqi, shadow_apcb->aqm);
+				continue;
+			}
+
+			/*
+			 * If the APQN is not bound to the vfio_ap device
+			 * driver, then we can't assign it to the guest's
+			 * AP configuration. The AP architecture won't
+			 * allow filtering of a single APQN, so if we're
+			 * filtering APIDs, then filter the APID; otherwise,
+			 * filter the APQI.
+			 */
+			apqn = AP_MKQID(apid, apqi);
+			if (!vfio_ap_get_queue(apqn)) {
+				if (filter_apids)
+					clear_bit_inv(apid, shadow_apcb->apm);
+				else
+					clear_bit_inv(apqi, shadow_apcb->aqm);
+				break;
+			}
+		}
+
+		/*
+		 * If we're filtering APQIs and all of them have been filtered,
+		 * there's no need to continue filtering.
+		 */
+		if (!filter_apids)
+			if (bitmap_empty(shadow_apcb->aqm, AP_DOMAINS))
+				break;
+	}
+
+	return bitmap_weight(shadow_apcb->apm, AP_DEVICES) *
+	       bitmap_weight(shadow_apcb->aqm, AP_DOMAINS);
+}
+
+/**
+ * vfio_ap_mdev_config_shadow_apcb
+ *
+ * Configure the shadow of a KVM guest's APCB specifying the adapters, domains
+ * and control domains to be assigned to the guest. The shadow APCB will be
+ * configured after filtering the APQNs assigned to the matrix mdev that do not
+ * reference a queue device bound to the vfio_ap device driver.
+ *
+ * @matrix_mdev: the matrix mdev whose shadow APCB is to be configured.
+ *
+ * Returns true if the shadow APCB contents have been changed; otherwise,
+ * returns false.
+ */
+static bool vfio_ap_mdev_config_shadow_apcb(struct ap_matrix_mdev *matrix_mdev)
+{
+	int napm, naqm;
+	struct ap_matrix shadow_apcb;
+
+	vfio_ap_matrix_init(&matrix_dev->info, &shadow_apcb);
+	napm = bitmap_weight(matrix_mdev->matrix.apm, AP_DEVICES);
+	naqm = bitmap_weight(matrix_mdev->matrix.aqm, AP_DOMAINS);
+
+	/*
+	 * If there are no APIDs or no APQIs assigned to the matrix mdev,
+	 * then no APQNs shall be assigned to the guest CRYCB.
+	 */
+	if ((napm != 0) || (naqm != 0)) {
+		/*
+		 * Filter the APIDs assigned to the matrix mdev for APQNs that
+		 * do not reference an AP queue device bound to the driver.
+		 */
+		napm = vfio_ap_mdev_filter_matrix(matrix_mdev, &shadow_apcb,
+						  true);
+		/*
+		 * If there are no APQNs that can be assigned to the guest's
+		 * CRYCB after filtering, then try filtering the APQIs.
+		 */
+		if (napm == 0) {
+			naqm = vfio_ap_mdev_filter_matrix(matrix_mdev,
+							  &shadow_apcb, false);
+
+			/*
+			 * If there are no APQNs that can be assigned to the
+			 * matrix mdev after filtering the APQIs, then no APQNs
+			 * shall be assigned to the guest's CRYCB.
+			 */
+			if (naqm == 0) {
+				bitmap_clear(shadow_apcb.apm, 0, AP_DEVICES);
+				bitmap_clear(shadow_apcb.aqm, 0, AP_DOMAINS);
+			}
+		}
+	}
+
+	/*
+	 * If the guest's AP configuration has not changed, then return
+	 * indicating such.
+	 */
+	if (bitmap_equal(matrix_mdev->shadow_apcb.apm, shadow_apcb.apm,
+			 AP_DEVICES) &&
+	    bitmap_equal(matrix_mdev->shadow_apcb.aqm, shadow_apcb.aqm,
+			 AP_DOMAINS) &&
+	    bitmap_equal(matrix_mdev->shadow_apcb.adm, shadow_apcb.adm,
+			 AP_DOMAINS))
+		return false;
+
+	/*
+	 * Copy the changes to the guest's CRYCB, then return indicating that
+	 * the guest's AP configuration has changed.
+	 */
+	memcpy(&matrix_mdev->shadow_apcb, &shadow_apcb, sizeof(shadow_apcb));
+
+	return true;
+}
+
 enum qlink_type {
 	LINK_APID,
 	LINK_APQI,
@@ -1284,9 +1435,8 @@ static int vfio_ap_mdev_group_notifier(struct notifier_block *nb,
 	if (!vfio_ap_mdev_has_crycb(matrix_mdev))
 		return NOTIFY_DONE;
 
-	memcpy(&matrix_mdev->shadow_apcb, &matrix_mdev->matrix,
-	       sizeof(matrix_mdev->shadow_apcb));
-	vfio_ap_mdev_commit_crycb(matrix_mdev);
+	if (vfio_ap_mdev_config_shadow_apcb(matrix_mdev))
+		vfio_ap_mdev_commit_shadow_apcb(matrix_mdev);
 
 	return NOTIFY_OK;
 }
@@ -1396,6 +1546,7 @@ static void vfio_ap_mdev_release(struct mdev_device *mdev)
 	mutex_lock(&matrix_dev->lock);
 	if (matrix_mdev->kvm) {
 		kvm_arch_crypto_clear_masks(matrix_mdev->kvm);
+		vfio_ap_matrix_clear_masks(&matrix_mdev->shadow_apcb);
 		matrix_mdev->kvm->arch.crypto.pqap_hook = NULL;
 		vfio_ap_mdev_reset_queues(mdev);
 		kvm_put_kvm(matrix_mdev->kvm);
-- 
2.21.1



* [PATCH v10 09/16] s390/vfio-ap: allow assignment of unavailable AP queues to mdev device
  2020-08-21 19:56 [PATCH v10 00/16] s390/vfio-ap: dynamic configuration support Tony Krowiak
                   ` (7 preceding siblings ...)
  2020-08-21 19:56 ` [PATCH v10 08/16] s390/vfio-ap: filter matrix for unavailable queue devices Tony Krowiak
@ 2020-08-21 19:56 ` Tony Krowiak
  2020-09-26 23:49   ` Halil Pasic
  2020-08-21 19:56 ` [PATCH v10 10/16] s390/vfio-ap: allow configuration of matrix mdev in use by a KVM guest Tony Krowiak
                   ` (6 subsequent siblings)
  15 siblings, 1 reply; 79+ messages in thread
From: Tony Krowiak @ 2020-08-21 19:56 UTC (permalink / raw)
  To: linux-s390, linux-kernel, kvm
  Cc: freude, borntraeger, cohuck, mjrosato, pasic, alex.williamson,
	kwankhede, fiuczy, frankja, david, imbrenda, hca, gor,
	Tony Krowiak

The current implementation does not allow assignment of an AP adapter or
domain to an mdev device if the APQNs resulting from the assignment
do not reference AP queue devices that are bound to the vfio_ap device
driver. This patch allows assignment of AP resources to the matrix mdev as
long as the APQNs resulting from the assignment:
   1. Are not reserved by the AP bus for use by the zcrypt device drivers.
   2. Are not assigned to another matrix mdev.

The rationale behind this is twofold:
   1. The AP architecture does not preclude assigning APQNs that are not
      available to the system to an AP configuration.
   2. APQNs that do not reference a queue device bound to the vfio_ap
      device driver will not be assigned to the guest's CRYCB, so the
      guest will not get access to queues not bound to the vfio_ap driver.
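
For illustration only, the two assignment rules above can be modeled in
user space as follows. This is a simplified sketch, not the driver code:
it uses 64-bit masks instead of the kernel's bitmaps, all toy_* names are
hypothetical, and the -EADDRINUSE value for the sharing case is an
assumption (the diff below only shows -EADDRNOTAVAIL for the reserved
case).

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Toy model: up to 64 adapters and 64 domains, bit n stands for ID n. */
struct toy_matrix {
	uint64_t apm;	/* adapter (APID) mask */
	uint64_t aqm;	/* domain (APQI) mask */
};

/*
 * Two mask pairs have an APQN in common only if they share at least one
 * APID and at least one APQI.
 */
static int toy_share_apqn(uint64_t apm_a, uint64_t aqm_a,
			  uint64_t apm_b, uint64_t aqm_b)
{
	return (apm_a & apm_b) != 0 && (aqm_a & aqm_b) != 0;
}

/*
 * Returns 0 if the proposed masks may be assigned, -EADDRNOTAVAIL if an
 * APQN is reserved for the host drivers, or -EADDRINUSE (assumed value)
 * if an APQN already belongs to another mdev.
 */
static int toy_validate_masks(uint64_t apm, uint64_t aqm,
			      const struct toy_matrix *host,
			      const struct toy_matrix *others, int nothers)
{
	int i;

	assert(host && nothers >= 0);

	if (toy_share_apqn(apm, aqm, host->apm, host->aqm))
		return -EADDRNOTAVAIL;

	for (i = 0; i < nothers; i++)
		if (toy_share_apqn(apm, aqm, others[i].apm, others[i].aqm))
			return -EADDRINUSE;

	return 0;
}
```

Note that the sketch never asks whether a queue device exists or is bound
to the driver; that is exactly the check this patch drops at assignment
time.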

Signed-off-by: Tony Krowiak <akrowiak@linux.ibm.com>
---
 drivers/s390/crypto/vfio_ap_ops.c | 212 +++++-------------------------
 1 file changed, 35 insertions(+), 177 deletions(-)

diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c
index eaf4e9eab6cb..24fd47e43b80 100644
--- a/drivers/s390/crypto/vfio_ap_ops.c
+++ b/drivers/s390/crypto/vfio_ap_ops.c
@@ -1,4 +1,3 @@
-// SPDX-License-Identifier: GPL-2.0+
 /*
  * Adjunct processor matrix VFIO device driver callbacks.
  *
@@ -420,122 +419,6 @@ static struct attribute_group *vfio_ap_mdev_type_groups[] = {
 	NULL,
 };
 
-struct vfio_ap_queue_reserved {
-	unsigned long *apid;
-	unsigned long *apqi;
-	bool reserved;
-};
-
-/**
- * vfio_ap_has_queue
- *
- * @dev: an AP queue device
- * @data: a struct vfio_ap_queue_reserved reference
- *
- * Flags whether the AP queue device (@dev) has a queue ID containing the APQN,
- * apid or apqi specified in @data:
- *
- * - If @data contains both an apid and apqi value, then @data will be flagged
- *   as reserved if the APID and APQI fields for the AP queue device matches
- *
- * - If @data contains only an apid value, @data will be flagged as
- *   reserved if the APID field in the AP queue device matches
- *
- * - If @data contains only an apqi value, @data will be flagged as
- *   reserved if the APQI field in the AP queue device matches
- *
- * Returns 0 to indicate the input to function succeeded. Returns -EINVAL if
- * @data does not contain either an apid or apqi.
- */
-static int vfio_ap_has_queue(struct device *dev, void *data)
-{
-	struct vfio_ap_queue_reserved *qres = data;
-	struct ap_queue *ap_queue = to_ap_queue(dev);
-	ap_qid_t qid;
-	unsigned long id;
-
-	if (qres->apid && qres->apqi) {
-		qid = AP_MKQID(*qres->apid, *qres->apqi);
-		if (qid == ap_queue->qid)
-			qres->reserved = true;
-	} else if (qres->apid && !qres->apqi) {
-		id = AP_QID_CARD(ap_queue->qid);
-		if (id == *qres->apid)
-			qres->reserved = true;
-	} else if (!qres->apid && qres->apqi) {
-		id = AP_QID_QUEUE(ap_queue->qid);
-		if (id == *qres->apqi)
-			qres->reserved = true;
-	} else {
-		return -EINVAL;
-	}
-
-	return 0;
-}
-
-/**
- * vfio_ap_verify_queue_reserved
- *
- * @matrix_dev: a mediated matrix device
- * @apid: an AP adapter ID
- * @apqi: an AP queue index
- *
- * Verifies that the AP queue with @apid/@apqi is reserved by the VFIO AP device
- * driver according to the following rules:
- *
- * - If both @apid and @apqi are not NULL, then there must be an AP queue
- *   device bound to the vfio_ap driver with the APQN identified by @apid and
- *   @apqi
- *
- * - If only @apid is not NULL, then there must be an AP queue device bound
- *   to the vfio_ap driver with an APQN containing @apid
- *
- * - If only @apqi is not NULL, then there must be an AP queue device bound
- *   to the vfio_ap driver with an APQN containing @apqi
- *
- * Returns 0 if the AP queue is reserved; otherwise, returns -EADDRNOTAVAIL.
- */
-static int vfio_ap_verify_queue_reserved(unsigned long *apid,
-					 unsigned long *apqi)
-{
-	int ret;
-	struct vfio_ap_queue_reserved qres;
-
-	qres.apid = apid;
-	qres.apqi = apqi;
-	qres.reserved = false;
-
-	ret = driver_for_each_device(&matrix_dev->vfio_ap_drv->driver, NULL,
-				     &qres, vfio_ap_has_queue);
-	if (ret)
-		return ret;
-
-	if (qres.reserved)
-		return 0;
-
-	return -EADDRNOTAVAIL;
-}
-
-static int
-vfio_ap_mdev_verify_queues_reserved_for_apid(struct ap_matrix_mdev *matrix_mdev,
-					     unsigned long apid)
-{
-	int ret;
-	unsigned long apqi;
-	unsigned long nbits = matrix_mdev->matrix.aqm_max + 1;
-
-	if (find_first_bit_inv(matrix_mdev->matrix.aqm, nbits) >= nbits)
-		return vfio_ap_verify_queue_reserved(&apid, NULL);
-
-	for_each_set_bit_inv(apqi, matrix_mdev->matrix.aqm, nbits) {
-		ret = vfio_ap_verify_queue_reserved(&apid, &apqi);
-		if (ret)
-			return ret;
-	}
-
-	return 0;
-}
-
 #define MDEV_SHARING_ERR "Userspace may not re-assign queue %02lx.%04lx " \
 			 "already assigned to %s"
 
@@ -572,6 +455,11 @@ static int vfio_ap_mdev_verify_no_sharing(struct ap_matrix_mdev *matrix_mdev,
 	DECLARE_BITMAP(aqm, AP_DOMAINS);
 
 	list_for_each_entry(lstdev, &matrix_dev->mdev_list, node) {
+		/*
+		 * If either of the input masks belongs to the mdev to which an
+		 * AP resource is being assigned, then we don't need to verify
+		 * that mdev's masks.
+		 */
 		if (matrix_mdev == lstdev)
 			continue;
 
@@ -597,6 +485,20 @@ static int vfio_ap_mdev_verify_no_sharing(struct ap_matrix_mdev *matrix_mdev,
 	return 0;
 }
 
+static int vfio_ap_mdev_validate_masks(struct ap_matrix_mdev *matrix_mdev,
+				       unsigned long *mdev_apm,
+				       unsigned long *mdev_aqm)
+{
+	DECLARE_BITMAP(apm, AP_DEVICES);
+	DECLARE_BITMAP(aqm, AP_DOMAINS);
+
+	if (bitmap_and(apm, mdev_apm, ap_perms.apm, AP_DEVICES) &&
+	    bitmap_and(aqm, mdev_aqm, ap_perms.aqm, AP_DOMAINS))
+		return -EADDRNOTAVAIL;
+
+	return vfio_ap_mdev_verify_no_sharing(matrix_mdev, mdev_apm, mdev_aqm);
+}
+
 /**
  * vfio_ap_mdev_filter_matrix
  *
@@ -882,33 +784,21 @@ static ssize_t assign_adapter_store(struct device *dev,
 	if (apid > matrix_mdev->matrix.apm_max)
 		return -ENODEV;
 
-	/*
-	 * Set the bit in the AP mask (APM) corresponding to the AP adapter
-	 * number (APID). The bits in the mask, from most significant to least
-	 * significant bit, correspond to APIDs 0-255.
-	 */
-	mutex_lock(&matrix_dev->lock);
-
-	ret = vfio_ap_mdev_verify_queues_reserved_for_apid(matrix_mdev, apid);
-	if (ret)
-		goto done;
-
 	memset(apm, 0, sizeof(apm));
 	set_bit_inv(apid, apm);
 
-	ret = vfio_ap_mdev_verify_no_sharing(matrix_mdev, apm,
-					     matrix_mdev->matrix.aqm);
-	if (ret)
-		goto done;
-
+	mutex_lock(&matrix_dev->lock);
+	ret = vfio_ap_mdev_validate_masks(matrix_mdev, apm,
+					  matrix_mdev->matrix.aqm);
+	if (ret) {
+		mutex_unlock(&matrix_dev->lock);
+		return ret;
+	}
 	set_bit_inv(apid, matrix_mdev->matrix.apm);
 	vfio_ap_mdev_link_queues(matrix_mdev, LINK_APID, apid);
-	ret = count;
-
-done:
 	mutex_unlock(&matrix_dev->lock);
 
-	return ret;
+	return count;
 }
 static DEVICE_ATTR_WO(assign_adapter);
 
@@ -958,26 +848,6 @@ static ssize_t unassign_adapter_store(struct device *dev,
 }
 static DEVICE_ATTR_WO(unassign_adapter);
 
-static int
-vfio_ap_mdev_verify_queues_reserved_for_apqi(struct ap_matrix_mdev *matrix_mdev,
-					     unsigned long apqi)
-{
-	int ret;
-	unsigned long apid;
-	unsigned long nbits = matrix_mdev->matrix.apm_max + 1;
-
-	if (find_first_bit_inv(matrix_mdev->matrix.apm, nbits) >= nbits)
-		return vfio_ap_verify_queue_reserved(NULL, &apqi);
-
-	for_each_set_bit_inv(apid, matrix_mdev->matrix.apm, nbits) {
-		ret = vfio_ap_verify_queue_reserved(&apid, &apqi);
-		if (ret)
-			return ret;
-	}
-
-	return 0;
-}
-
 /**
  * assign_domain_store
  *
@@ -1031,28 +901,21 @@ static ssize_t assign_domain_store(struct device *dev,
 	if (apqi > max_apqi)
 		return -ENODEV;
 
-	mutex_lock(&matrix_dev->lock);
-
-	ret = vfio_ap_mdev_verify_queues_reserved_for_apqi(matrix_mdev, apqi);
-	if (ret)
-		goto done;
-
 	memset(aqm, 0, sizeof(aqm));
 	set_bit_inv(apqi, aqm);
 
-	ret = vfio_ap_mdev_verify_no_sharing(matrix_mdev,
-					     matrix_mdev->matrix.apm, aqm);
-	if (ret)
-		goto done;
-
+	mutex_lock(&matrix_dev->lock);
+	ret = vfio_ap_mdev_validate_masks(matrix_mdev, matrix_mdev->matrix.apm,
+					  aqm);
+	if (ret) {
+		mutex_unlock(&matrix_dev->lock);
+		return ret;
+	}
 	set_bit_inv(apqi, matrix_mdev->matrix.aqm);
 	vfio_ap_mdev_link_queues(matrix_mdev, LINK_APQI, apqi);
-	ret = count;
-
-done:
 	mutex_unlock(&matrix_dev->lock);
 
-	return ret;
+	return count;
 }
 static DEVICE_ATTR_WO(assign_domain);
 
@@ -1139,11 +1002,6 @@ static ssize_t assign_control_domain_store(struct device *dev,
 	if (id > matrix_mdev->matrix.adm_max)
 		return -ENODEV;
 
-	/* Set the bit in the ADM (bitmask) corresponding to the AP control
-	 * domain number (id). The bits in the mask, from most significant to
-	 * least significant, correspond to IDs 0 up to the one less than the
-	 * number of control domains that can be assigned.
-	 */
 	mutex_lock(&matrix_dev->lock);
 	set_bit_inv(id, matrix_mdev->matrix.adm);
 	mutex_unlock(&matrix_dev->lock);
-- 
2.21.1



* [PATCH v10 10/16] s390/vfio-ap: allow configuration of matrix mdev in use by a KVM guest
  2020-08-21 19:56 [PATCH v10 00/16] s390/vfio-ap: dynamic configuration support Tony Krowiak
                   ` (8 preceding siblings ...)
  2020-08-21 19:56 ` [PATCH v10 09/16] s390/vfio-ap: allow assignment of unavailable AP queues to mdev device Tony Krowiak
@ 2020-08-21 19:56 ` Tony Krowiak
  2020-09-27  0:03   ` Halil Pasic
  2020-08-21 19:56 ` [PATCH v10 11/16] s390/vfio-ap: allow hot plug/unplug of AP resources using mdev device Tony Krowiak
                   ` (5 subsequent siblings)
  15 siblings, 1 reply; 79+ messages in thread
From: Tony Krowiak @ 2020-08-21 19:56 UTC (permalink / raw)
  To: linux-s390, linux-kernel, kvm
  Cc: freude, borntraeger, cohuck, mjrosato, pasic, alex.williamson,
	kwankhede, fiuczy, frankja, david, imbrenda, hca, gor,
	Tony Krowiak

The current support for pass-through crypto adapters does not allow
configuration of a matrix mdev when it is in use by a KVM guest. Let's
allow AP resources - i.e., adapters, domains and control domains - to be
assigned to or unassigned from a matrix mdev while it is in use by a guest.
This is in preparation for the introduction of support for dynamic
configuration of the AP matrix for a running KVM guest.

Signed-off-by: Tony Krowiak <akrowiak@linux.ibm.com>
---
 drivers/s390/crypto/vfio_ap_ops.c | 24 ------------------------
 1 file changed, 24 deletions(-)

diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c
index 24fd47e43b80..cf3321eb239b 100644
--- a/drivers/s390/crypto/vfio_ap_ops.c
+++ b/drivers/s390/crypto/vfio_ap_ops.c
@@ -773,10 +773,6 @@ static ssize_t assign_adapter_store(struct device *dev,
 	struct mdev_device *mdev = mdev_from_dev(dev);
 	struct ap_matrix_mdev *matrix_mdev = mdev_get_drvdata(mdev);
 
-	/* If the guest is running, disallow assignment of adapter */
-	if (matrix_mdev->kvm)
-		return -EBUSY;
-
 	ret = kstrtoul(buf, 0, &apid);
 	if (ret)
 		return ret;
@@ -828,10 +824,6 @@ static ssize_t unassign_adapter_store(struct device *dev,
 	struct mdev_device *mdev = mdev_from_dev(dev);
 	struct ap_matrix_mdev *matrix_mdev = mdev_get_drvdata(mdev);
 
-	/* If the guest is running, disallow un-assignment of adapter */
-	if (matrix_mdev->kvm)
-		return -EBUSY;
-
 	ret = kstrtoul(buf, 0, &apid);
 	if (ret)
 		return ret;
@@ -891,10 +883,6 @@ static ssize_t assign_domain_store(struct device *dev,
 	struct ap_matrix_mdev *matrix_mdev = mdev_get_drvdata(mdev);
 	unsigned long max_apqi = matrix_mdev->matrix.aqm_max;
 
-	/* If the guest is running, disallow assignment of domain */
-	if (matrix_mdev->kvm)
-		return -EBUSY;
-
 	ret = kstrtoul(buf, 0, &apqi);
 	if (ret)
 		return ret;
@@ -946,10 +934,6 @@ static ssize_t unassign_domain_store(struct device *dev,
 	struct mdev_device *mdev = mdev_from_dev(dev);
 	struct ap_matrix_mdev *matrix_mdev = mdev_get_drvdata(mdev);
 
-	/* If the guest is running, disallow un-assignment of domain */
-	if (matrix_mdev->kvm)
-		return -EBUSY;
-
 	ret = kstrtoul(buf, 0, &apqi);
 	if (ret)
 		return ret;
@@ -991,10 +975,6 @@ static ssize_t assign_control_domain_store(struct device *dev,
 	struct mdev_device *mdev = mdev_from_dev(dev);
 	struct ap_matrix_mdev *matrix_mdev = mdev_get_drvdata(mdev);
 
-	/* If the guest is running, disallow assignment of control domain */
-	if (matrix_mdev->kvm)
-		return -EBUSY;
-
 	ret = kstrtoul(buf, 0, &id);
 	if (ret)
 		return ret;
@@ -1036,10 +1016,6 @@ static ssize_t unassign_control_domain_store(struct device *dev,
 	struct ap_matrix_mdev *matrix_mdev = mdev_get_drvdata(mdev);
 	unsigned long max_domid =  matrix_mdev->matrix.adm_max;
 
-	/* If the guest is running, disallow un-assignment of control domain */
-	if (matrix_mdev->kvm)
-		return -EBUSY;
-
 	ret = kstrtoul(buf, 0, &domid);
 	if (ret)
 		return ret;
-- 
2.21.1



* [PATCH v10 11/16] s390/vfio-ap: allow hot plug/unplug of AP resources using mdev device
  2020-08-21 19:56 [PATCH v10 00/16] s390/vfio-ap: dynamic configuration support Tony Krowiak
                   ` (9 preceding siblings ...)
  2020-08-21 19:56 ` [PATCH v10 10/16] s390/vfio-ap: allow configuration of matrix mdev in use by a KVM guest Tony Krowiak
@ 2020-08-21 19:56 ` Tony Krowiak
  2020-09-28  1:01   ` Halil Pasic
  2020-08-21 19:56 ` [PATCH v10 12/16] s390/zcrypt: Notify driver on config changed and scan complete callbacks Tony Krowiak
                   ` (4 subsequent siblings)
  15 siblings, 1 reply; 79+ messages in thread
From: Tony Krowiak @ 2020-08-21 19:56 UTC (permalink / raw)
  To: linux-s390, linux-kernel, kvm
  Cc: freude, borntraeger, cohuck, mjrosato, pasic, alex.williamson,
	kwankhede, fiuczy, frankja, david, imbrenda, hca, gor,
	Tony Krowiak

Let's hot plug/unplug adapters, domains and control domains assigned to or
unassigned from an AP matrix mdev device while it is in use by a guest per
the following:

* When the APID of an adapter is assigned to a matrix mdev in use by a KVM
  guest, the adapter will be hot plugged into the KVM guest as long as each
  APQN derived from the Cartesian product of the APID being assigned and
  the APQIs already assigned to the guest's CRYCB references a queue device
  bound to the vfio_ap device driver.

* When the APID of an adapter is unassigned from a matrix mdev in use by a
  KVM guest, the adapter will be hot unplugged from the KVM guest.

* When the APQI of a domain is assigned to a matrix mdev in use by a KVM
  guest, the domain will be hot plugged into the KVM guest as long as each
  APQN derived from the Cartesian product of the APQI being assigned and
  the APIDs already assigned to the guest's CRYCB references a queue device
  bound to the vfio_ap device driver.

* When the APQI of a domain is unassigned from a matrix mdev in use by a
  KVM guest, the domain will be hot unplugged from the KVM guest.

* When the domain number of a control domain is assigned to a matrix mdev
  in use by a KVM guest, the control domain will be hot plugged into the
  KVM guest.

* When the domain number of a control domain is unassigned from a matrix
  mdev in use by a KVM guest, the control domain will be hot unplugged
  from the KVM guest.
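
As a rough user-space illustration of the adapter rules above (the domain
rules are symmetric), consider the sketch below. It is a toy model under
stated assumptions: 64-bit masks, a hypothetical per-adapter table of
bound queues, and hypothetical toy_* names. It does not model the
fallback in the diff for a guest that has no APQIs yet.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_MAX 64

struct toy_guest {
	uint64_t apm;	/* APIDs currently in the guest's CRYCB */
	uint64_t aqm;	/* APQIs currently in the guest's CRYCB */
	/* bound[apid] has bit apqi set if queue apid.apqi is bound to vfio_ap */
	uint64_t bound[TOY_MAX];
};

/* Hot plug @apid only if every APQN it forms with the guest's APQIs is bound. */
static bool toy_plug_apid(struct toy_guest *g, unsigned int apid)
{
	assert(apid < TOY_MAX);

	if ((g->aqm & g->bound[apid]) != g->aqm)
		return false;

	g->apm |= UINT64_C(1) << apid;
	return true;
}

/* Hot unplug @apid; returns true if the guest's configuration changed. */
static bool toy_unplug_apid(struct toy_guest *g, unsigned int apid)
{
	uint64_t bit;

	assert(apid < TOY_MAX);
	bit = UINT64_C(1) << apid;

	if (!(g->apm & bit))
		return false;

	g->apm &= ~bit;
	/* With no adapters left the guest has no queues, so drop the APQIs too. */
	if (g->apm == 0)
		g->aqm = 0;

	return true;
}
```

The boolean return value mirrors the pattern used in the diff: the caller
commits the shadow APCB to the guest only when something actually changed.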

Signed-off-by: Tony Krowiak <akrowiak@linux.ibm.com>
---
 drivers/s390/crypto/vfio_ap_ops.c | 196 ++++++++++++++++++++++++++++++
 1 file changed, 196 insertions(+)

diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c
index cf3321eb239b..2b01a8eb6ee7 100644
--- a/drivers/s390/crypto/vfio_ap_ops.c
+++ b/drivers/s390/crypto/vfio_ap_ops.c
@@ -731,6 +731,56 @@ static void vfio_ap_mdev_link_queues(struct ap_matrix_mdev *matrix_mdev,
 	}
 }
 
+static bool vfio_ap_mdev_assign_apqis_4_apid(struct ap_matrix_mdev *matrix_mdev,
+					     unsigned long apid)
+{
+	DECLARE_BITMAP(aqm, AP_DOMAINS);
+	unsigned long apqi, apqn;
+
+	bitmap_copy(aqm, matrix_mdev->matrix.aqm, AP_DOMAINS);
+
+	for_each_set_bit_inv(apqi, matrix_mdev->matrix.aqm, AP_DOMAINS) {
+		if (!test_bit_inv(apqi,
+				  (unsigned long *) matrix_dev->info.aqm))
+			clear_bit_inv(apqi, aqm);
+
+		apqn = AP_MKQID(apid, apqi);
+		if (!vfio_ap_get_mdev_queue(matrix_mdev, apqn))
+			clear_bit_inv(apqi, aqm);
+	}
+
+	if (bitmap_empty(aqm, AP_DOMAINS))
+		return false;
+
+	set_bit_inv(apid, matrix_mdev->shadow_apcb.apm);
+	bitmap_copy(matrix_mdev->shadow_apcb.aqm, aqm, AP_DOMAINS);
+
+	return true;
+}
+
+static bool vfio_ap_mdev_assign_guest_apid(struct ap_matrix_mdev *matrix_mdev,
+					   unsigned long apid)
+{
+	unsigned long apqi, apqn;
+
+	if (!vfio_ap_mdev_has_crycb(matrix_mdev) ||
+	    !test_bit_inv(apid, (unsigned long *)matrix_dev->info.apm))
+		return false;
+
+	if (bitmap_empty(matrix_mdev->shadow_apcb.aqm, AP_DOMAINS))
+		return vfio_ap_mdev_assign_apqis_4_apid(matrix_mdev, apid);
+
+	for_each_set_bit_inv(apqi, matrix_mdev->shadow_apcb.aqm, AP_DOMAINS) {
+		apqn = AP_MKQID(apid, apqi);
+		if (!vfio_ap_get_mdev_queue(matrix_mdev, apqn))
+			return false;
+	}
+
+	set_bit_inv(apid, matrix_mdev->shadow_apcb.apm);
+
+	return true;
+}
+
 /**
  * assign_adapter_store
  *
@@ -792,12 +842,42 @@ static ssize_t assign_adapter_store(struct device *dev,
 	}
 	set_bit_inv(apid, matrix_mdev->matrix.apm);
 	vfio_ap_mdev_link_queues(matrix_mdev, LINK_APID, apid);
+	if (vfio_ap_mdev_assign_guest_apid(matrix_mdev, apid))
+		vfio_ap_mdev_commit_shadow_apcb(matrix_mdev);
 	mutex_unlock(&matrix_dev->lock);
 
 	return count;
 }
 static DEVICE_ATTR_WO(assign_adapter);
 
+static bool vfio_ap_mdev_unassign_guest_apid(struct ap_matrix_mdev *matrix_mdev,
+					     unsigned long apid)
+{
+	if (vfio_ap_mdev_has_crycb(matrix_mdev)) {
+		if (test_bit_inv(apid, matrix_mdev->shadow_apcb.apm)) {
+			clear_bit_inv(apid, matrix_mdev->shadow_apcb.apm);
+
+			/*
+			 * If there are no APIDs assigned to the guest, then
+			 * the guest will not have access to any queues, so
+			 * let's also go ahead and unassign the APQIs. Keeping
+			 * them around may yield unpredictable results during
+			 * a probe that is not related to a host AP
+			 * configuration change (i.e., an AP adapter is
+			 * configured online).
+			 */
+			if (bitmap_empty(matrix_mdev->shadow_apcb.apm,
+					 AP_DEVICES))
+				bitmap_clear(matrix_mdev->shadow_apcb.aqm, 0,
+					     AP_DOMAINS);
+
+			return true;
+		}
+	}
+
+	return false;
+}
+
 /**
  * unassign_adapter_store
  *
@@ -834,12 +914,64 @@ static ssize_t unassign_adapter_store(struct device *dev,
 	mutex_lock(&matrix_dev->lock);
 	clear_bit_inv((unsigned long)apid, matrix_mdev->matrix.apm);
 	vfio_ap_mdev_link_queues(matrix_mdev, UNLINK_APID, apid);
+	if (vfio_ap_mdev_unassign_guest_apid(matrix_mdev, apid))
+		vfio_ap_mdev_commit_shadow_apcb(matrix_mdev);
 	mutex_unlock(&matrix_dev->lock);
 
 	return count;
 }
 static DEVICE_ATTR_WO(unassign_adapter);
 
+static bool vfio_ap_mdev_assign_apids_4_apqi(struct ap_matrix_mdev *matrix_mdev,
+					     unsigned long apqi)
+{
+	DECLARE_BITMAP(apm, AP_DEVICES);
+	unsigned long apid, apqn;
+
+	bitmap_copy(apm, matrix_mdev->matrix.apm, AP_DEVICES);
+
+	for_each_set_bit_inv(apid, matrix_mdev->matrix.apm, AP_DEVICES) {
+		if (!test_bit_inv(apid,
+				  (unsigned long *) matrix_dev->info.apm))
+			clear_bit_inv(apid, apm);
+
+		apqn = AP_MKQID(apid, apqi);
+		if (!vfio_ap_get_mdev_queue(matrix_mdev, apqn))
+			clear_bit_inv(apid, apm);
+	}
+
+	if (bitmap_empty(apm, AP_DEVICES))
+		return false;
+
+	set_bit_inv(apqi, matrix_mdev->shadow_apcb.aqm);
+	bitmap_copy(matrix_mdev->shadow_apcb.apm, apm, AP_DEVICES);
+
+	return true;
+}
+
+static bool vfio_ap_mdev_assign_guest_apqi(struct ap_matrix_mdev *matrix_mdev,
+					   unsigned long apqi)
+{
+	unsigned long apid, apqn;
+
+	if (!vfio_ap_mdev_has_crycb(matrix_mdev) ||
+	    !test_bit_inv(apqi, (unsigned long *)matrix_dev->info.aqm))
+		return false;
+
+	if (bitmap_empty(matrix_mdev->shadow_apcb.apm, AP_DEVICES))
+		return vfio_ap_mdev_assign_apids_4_apqi(matrix_mdev, apqi);
+
+	for_each_set_bit_inv(apid, matrix_mdev->shadow_apcb.apm, AP_DEVICES) {
+		apqn = AP_MKQID(apid, apqi);
+		if (!vfio_ap_get_mdev_queue(matrix_mdev, apqn))
+			return false;
+	}
+
+	set_bit_inv(apqi, matrix_mdev->shadow_apcb.aqm);
+
+	return true;
+}
+
 /**
  * assign_domain_store
  *
@@ -901,12 +1033,41 @@ static ssize_t assign_domain_store(struct device *dev,
 	}
 	set_bit_inv(apqi, matrix_mdev->matrix.aqm);
 	vfio_ap_mdev_link_queues(matrix_mdev, LINK_APQI, apqi);
+	if (vfio_ap_mdev_assign_guest_apqi(matrix_mdev, apqi))
+		vfio_ap_mdev_commit_shadow_apcb(matrix_mdev);
 	mutex_unlock(&matrix_dev->lock);
 
 	return count;
 }
 static DEVICE_ATTR_WO(assign_domain);
 
+static bool vfio_ap_mdev_unassign_guest_apqi(struct ap_matrix_mdev *matrix_mdev,
+					     unsigned long apqi)
+{
+	if (vfio_ap_mdev_has_crycb(matrix_mdev)) {
+		if (test_bit_inv(apqi, matrix_mdev->shadow_apcb.aqm)) {
+			clear_bit_inv(apqi, matrix_mdev->shadow_apcb.aqm);
+
+			/*
+			 * If there are no APQIs assigned to the guest, then
+			 * the guest will not have access to any queues, so
+			 * let's also go ahead and unassign the APIDs. Keeping
+			 * them around may yield unpredictable results during
+			 * a probe that is not related to a host AP
+			 * configuration change (i.e., an AP adapter is
+			 * configured online).
+			 */
+			if (bitmap_empty(matrix_mdev->shadow_apcb.aqm,
+					 AP_DOMAINS))
+				bitmap_clear(matrix_mdev->shadow_apcb.apm, 0,
+					     AP_DEVICES);
+
+			return true;
+		}
+	}
+
+	return false;
+}
 
 /**
  * unassign_domain_store
@@ -944,12 +1105,28 @@ static ssize_t unassign_domain_store(struct device *dev,
 	mutex_lock(&matrix_dev->lock);
 	clear_bit_inv((unsigned long)apqi, matrix_mdev->matrix.aqm);
 	vfio_ap_mdev_link_queues(matrix_mdev, UNLINK_APQI, apqi);
+	if (vfio_ap_mdev_unassign_guest_apqi(matrix_mdev, apqi))
+		vfio_ap_mdev_commit_shadow_apcb(matrix_mdev);
 	mutex_unlock(&matrix_dev->lock);
 
 	return count;
 }
 static DEVICE_ATTR_WO(unassign_domain);
 
+static bool vfio_ap_mdev_assign_guest_cdom(struct ap_matrix_mdev *matrix_mdev,
+					   unsigned long domid)
+{
+	if (vfio_ap_mdev_has_crycb(matrix_mdev)) {
+		if (!test_bit_inv(domid, matrix_mdev->shadow_apcb.adm)) {
+			set_bit_inv(domid, matrix_mdev->shadow_apcb.adm);
+
+			return true;
+		}
+	}
+
+	return false;
+}
+
 /**
  * assign_control_domain_store
  *
@@ -984,12 +1161,29 @@ static ssize_t assign_control_domain_store(struct device *dev,
 
 	mutex_lock(&matrix_dev->lock);
 	set_bit_inv(id, matrix_mdev->matrix.adm);
+	if (vfio_ap_mdev_assign_guest_cdom(matrix_mdev, id))
+		vfio_ap_mdev_commit_shadow_apcb(matrix_mdev);
 	mutex_unlock(&matrix_dev->lock);
 
 	return count;
 }
 static DEVICE_ATTR_WO(assign_control_domain);
 
+static bool
+vfio_ap_mdev_unassign_guest_cdom(struct ap_matrix_mdev *matrix_mdev,
+				 unsigned long domid)
+{
+	if (vfio_ap_mdev_has_crycb(matrix_mdev)) {
+		if (test_bit_inv(domid, matrix_mdev->shadow_apcb.adm)) {
+			clear_bit_inv(domid, matrix_mdev->shadow_apcb.adm);
+
+			return true;
+		}
+	}
+
+	return false;
+}
+
 /**
  * unassign_control_domain_store
  *
@@ -1024,6 +1218,8 @@ static ssize_t unassign_control_domain_store(struct device *dev,
 
 	mutex_lock(&matrix_dev->lock);
 	clear_bit_inv(domid, matrix_mdev->matrix.adm);
+	if (vfio_ap_mdev_unassign_guest_cdom(matrix_mdev, domid))
+		vfio_ap_mdev_commit_shadow_apcb(matrix_mdev);
 	mutex_unlock(&matrix_dev->lock);
 
 	return count;
-- 
2.21.1



* [PATCH v10 12/16] s390/zcrypt: Notify driver on config changed and scan complete callbacks
  2020-08-21 19:56 [PATCH v10 00/16] s390/vfio-ap: dynamic configuration support Tony Krowiak
                   ` (10 preceding siblings ...)
  2020-08-21 19:56 ` [PATCH v10 11/16] s390/vfio-ap: allow hot plug/unplug of AP resources using mdev device Tony Krowiak
@ 2020-08-21 19:56 ` Tony Krowiak
  2020-09-27  1:39   ` Halil Pasic
  2020-08-21 19:56 ` [PATCH v10 13/16] s390/vfio-ap: handle host AP config change notification Tony Krowiak
                   ` (3 subsequent siblings)
  15 siblings, 1 reply; 79+ messages in thread
From: Tony Krowiak @ 2020-08-21 19:56 UTC (permalink / raw)
  To: linux-s390, linux-kernel, kvm
  Cc: freude, borntraeger, cohuck, mjrosato, pasic, alex.williamson,
	kwankhede, fiuczy, frankja, david, imbrenda, hca, gor,
	Tony Krowiak

This patch introduces an extension to the AP bus to notify drivers
on crypto config changed and bus scan complete events.
Two new callbacks are introduced for ap_drivers:

  void (*on_config_changed)(struct ap_config_info *new_config_info,
                             struct ap_config_info *old_config_info);
  void (*on_scan_complete)(struct ap_config_info *new_config_info,
                             struct ap_config_info *old_config_info);

Both callbacks are optional. Both callbacks are only triggered
when QCI information is available (facility bit 12):

* The on_config_changed callback is invoked at the start of the AP bus scan
  function when it determines that the host AP configuration information
  has changed since the previous scan. This is done by storing
  an old and current QCI info struct and comparing them. If there is any
  difference, the callback is invoked.

  Note that when the AP bus scan detects that AP adapters or domains have
  been removed from the host's AP configuration, it will remove the
  associated devices from the AP bus subsystem's device model. This
  callback gives the device driver a chance to respond to the removal
  of the AP devices in bulk rather than one at a time as its remove
  callback is invoked. It will also allow the device driver to do any
  cleanup prior to giving control back to the bus piecemeal. This is
  particularly important for the vfio_ap driver because there may be
  guests using the queues at the time.

* The on_scan_complete callback is invoked after the AP bus scan is
  complete if the host AP configuration data has changed.

  Note that when the AP bus scan detects that adapters or domains have
  been added to the host's configuration, it will create new devices in
  the AP bus subsystem's device model. This callback also allows the driver
  to process all of the new devices in bulk.

Please note that changes to the apmask and aqmask do not trigger
these two callbacks since the bus scan function is not invoked by changes
to those masks.
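
The notification flow described above can be sketched in user space as
follows. This is an illustration only: struct toy_config stands in for
struct ap_config_info, the freshly fetched configuration is passed in as
a parameter instead of being read from the hardware, and every toy_* name
is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Stand-in for the QCI data; byte arrays only, so memcmp() sees no padding. */
struct toy_config {
	unsigned char apm[32];
	unsigned char aqm[32];
};

struct toy_driver {
	/* Both callbacks are optional and may be NULL. */
	void (*on_config_changed)(const struct toy_config *new_cfg,
				  const struct toy_config *old_cfg);
	void (*on_scan_complete)(const struct toy_config *new_cfg,
				 const struct toy_config *old_cfg);
};

/* Example callbacks that only count how often they were invoked. */
static int toy_changed_calls, toy_complete_calls;

static void toy_count_changed(const struct toy_config *new_cfg,
			      const struct toy_config *old_cfg)
{
	(void)new_cfg; (void)old_cfg;
	toy_changed_calls++;
}

static void toy_count_complete(const struct toy_config *new_cfg,
			       const struct toy_config *old_cfg)
{
	(void)new_cfg; (void)old_cfg;
	toy_complete_calls++;
}

/* One scan pass; returns 1 if the configuration changed, 0 otherwise. */
static int toy_scan(struct toy_driver *drvs, size_t ndrvs,
		    struct toy_config *cur, struct toy_config *old,
		    const struct toy_config *fetched)
{
	size_t i;
	int changed;

	assert(cur && old && fetched);

	*old = *cur;		/* remember the previous configuration */
	*cur = *fetched;	/* take over the current configuration */
	changed = memcmp(cur, old, sizeof(*cur)) != 0;

	if (changed)
		for (i = 0; i < ndrvs; i++)
			if (drvs[i].on_config_changed)
				drvs[i].on_config_changed(cur, old);

	/* ... devices would be added to or removed from the bus here ... */

	if (changed)
		for (i = 0; i < ndrvs; i++)
			if (drvs[i].on_scan_complete)
				drvs[i].on_scan_complete(cur, old);

	return changed;
}
```

A driver that leaves both pointers NULL is simply skipped, which matches
the statement that both callbacks are optional.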

Signed-off-by: Harald Freudenberger <freude@linux.ibm.com>
Signed-off-by: Tony Krowiak <akrowiak@linux.ibm.com>
---
 drivers/s390/crypto/ap_bus.c | 85 +++++++++++++++++++++++++++++++++++-
 drivers/s390/crypto/ap_bus.h | 12 +++++
 2 files changed, 96 insertions(+), 1 deletion(-)

diff --git a/drivers/s390/crypto/ap_bus.c b/drivers/s390/crypto/ap_bus.c
index db27bd931308..cbf4c4d2e573 100644
--- a/drivers/s390/crypto/ap_bus.c
+++ b/drivers/s390/crypto/ap_bus.c
@@ -73,8 +73,10 @@ struct ap_perms ap_perms;
 EXPORT_SYMBOL(ap_perms);
 DEFINE_MUTEX(ap_perms_mutex);
 EXPORT_SYMBOL(ap_perms_mutex);
+DEFINE_MUTEX(ap_config_lock);
 
 static struct ap_config_info *ap_qci_info;
+static struct ap_config_info *ap_qci_info_old;
 
 /*
  * AP bus related debug feature things.
@@ -1412,6 +1414,52 @@ static int __match_queue_device_with_queue_id(struct device *dev, const void *da
 		&& AP_QID_QUEUE(to_ap_queue(dev)->qid) == (int)(long) data;
 }
 
+/* Helper function for notify_config_changed */
+static int __drv_notify_config_changed(struct device_driver *drv, void *data)
+{
+	struct ap_driver *ap_drv = to_ap_drv(drv);
+
+	if (try_module_get(drv->owner)) {
+		if (ap_drv->on_config_changed)
+			ap_drv->on_config_changed(ap_qci_info,
+						  ap_qci_info_old);
+		module_put(drv->owner);
+	}
+
+	return 0;
+}
+
+/* Notify all drivers about a QCI config change */
+static inline void notify_config_changed(void)
+{
+	bus_for_each_drv(&ap_bus_type, NULL, NULL,
+			 __drv_notify_config_changed);
+}
+
+/* Helper function for notify_scan_complete */
+static int __drv_notify_scan_complete(struct device_driver *drv, void *data)
+{
+	struct ap_driver *ap_drv = to_ap_drv(drv);
+
+	if (try_module_get(drv->owner)) {
+		if (ap_drv->on_scan_complete)
+			ap_drv->on_scan_complete(ap_qci_info,
+						 ap_qci_info_old);
+		module_put(drv->owner);
+	}
+
+	return 0;
+}
+
+/* Notify all drivers about bus scan complete */
+static inline void notify_scan_complete(void)
+{
+	bus_for_each_drv(&ap_bus_type, NULL, NULL,
+			 __drv_notify_scan_complete);
+}
+
+
+
 /*
  * Helper function for ap_scan_bus().
  * Does the scan bus job for the given adapter id.
@@ -1565,15 +1613,44 @@ static void _ap_scan_bus_adapter(int id)
 		put_device(&ac->ap_dev.device);
 }
 
+static int ap_config_changed(void)
+{
+	int cfg_chg = 0;
+
+	if (ap_qci_info) {
+		if (!ap_qci_info_old) {
+			ap_qci_info_old = kzalloc(sizeof(*ap_qci_info_old),
+						  GFP_KERNEL);
+			if (!ap_qci_info_old)
+				return 0;
+		} else {
+			memcpy(ap_qci_info_old, ap_qci_info,
+			       sizeof(struct ap_config_info));
+		}
+		ap_fetch_qci_info(ap_qci_info);
+		cfg_chg = memcmp(ap_qci_info,
+				 ap_qci_info_old,
+				 sizeof(struct ap_config_info)) != 0;
+	}
+
+	return cfg_chg;
+}
+
 /**
  * ap_scan_bus(): Scan the AP bus for new devices
  * Runs periodically, workqueue timer (ap_config_time)
  */
 static void ap_scan_bus(struct work_struct *unused)
 {
-	int id;
+	int id, config_changed = 0;
 
 	ap_fetch_qci_info(ap_qci_info);
+	mutex_lock(&ap_config_lock);
+
+	/* config change notify */
+	config_changed = ap_config_changed();
+	if (config_changed)
+		notify_config_changed();
 	ap_select_domain();
 
 	AP_DBF(DBF_DEBUG, "%s running\n", __func__);
@@ -1582,6 +1659,12 @@ static void ap_scan_bus(struct work_struct *unused)
 	for (id = 0; id < AP_DEVICES; id++)
 		_ap_scan_bus_adapter(id);
 
+	/* scan complete notify */
+	if (config_changed)
+		notify_scan_complete();
+
+	mutex_unlock(&ap_config_lock);
+
 	/* check if there is at least one queue available with default domain */
 	if (ap_domain_index >= 0) {
 		struct device *dev =
diff --git a/drivers/s390/crypto/ap_bus.h b/drivers/s390/crypto/ap_bus.h
index 48c57b3d53a0..3fc743ac549c 100644
--- a/drivers/s390/crypto/ap_bus.h
+++ b/drivers/s390/crypto/ap_bus.h
@@ -137,6 +137,18 @@ struct ap_driver {
 	int (*probe)(struct ap_device *);
 	void (*remove)(struct ap_device *);
 	bool (*in_use)(unsigned long *apm, unsigned long *aqm);
+	/*
+	 * Called at the start of the ap bus scan function when
+	 * the crypto config information (qci) has changed.
+	 */
+	void (*on_config_changed)(struct ap_config_info *new_config_info,
+				  struct ap_config_info *old_config_info);
+	/*
+	 * Called at the end of the ap bus scan function when
+	 * the crypto config information (qci) has changed.
+	 */
+	void (*on_scan_complete)(struct ap_config_info *new_config_info,
+				 struct ap_config_info *old_config_info);
 };
 
 #define to_ap_drv(x) container_of((x), struct ap_driver, driver)
-- 
2.21.1



* [PATCH v10 13/16] s390/vfio-ap: handle host AP config change notification
  2020-08-21 19:56 [PATCH v10 00/16] s390/vfio-ap: dynamic configuration support Tony Krowiak
                   ` (11 preceding siblings ...)
  2020-08-21 19:56 ` [PATCH v10 12/16] s390/zcrypt: Notify driver on config changed and scan complete callbacks Tony Krowiak
@ 2020-08-21 19:56 ` Tony Krowiak
  2020-09-28  1:38   ` Halil Pasic
  2020-08-21 19:56 ` [PATCH v10 14/16] s390/vfio-ap: handle AP bus scan completed notification Tony Krowiak
                   ` (2 subsequent siblings)
  15 siblings, 1 reply; 79+ messages in thread
From: Tony Krowiak @ 2020-08-21 19:56 UTC (permalink / raw)
  To: linux-s390, linux-kernel, kvm
  Cc: freude, borntraeger, cohuck, mjrosato, pasic, alex.williamson,
	kwankhede, fiuczy, frankja, david, imbrenda, hca, gor,
	Tony Krowiak, kernel test robot

Implements the driver callback invoked by the AP bus when the host
AP configuration has changed. Since this callback is invoked prior to
unbinding a device from its device driver, the vfio_ap driver will
respond by unplugging the AP adapters, domains and control domains
removed from the host's AP configuration from the guests using them.
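
As a rough userspace illustration of the bookkeeping described above (this
is not the driver code), the resources to unplug are the bits set in the
previous host mask but clear in the current one, restricted to what the
guest currently has. The sketch models a 256-bit APM/AQM as four 64-bit
words with bit 0 leftmost. The names mask256, mask_set, mask_test,
mask_andnot and unplug_removed are hypothetical stand-ins for the kernel
bitmap helpers and the unassign path:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NBITS 256
#define NWORDS (NBITS / 64)

/* Hypothetical userspace stand-in for a 256-bit APM/AQM mask. */
struct mask256 {
	uint64_t w[NWORDS];
};

/* Bit 0 is the leftmost (most significant) bit of word 0. */
static void mask_set(struct mask256 *m, unsigned int bit)
{
	assert(bit < NBITS);
	m->w[bit / 64] |= UINT64_C(1) << (63 - bit % 64);
}

static bool mask_test(const struct mask256 *m, unsigned int bit)
{
	assert(bit < NBITS);
	return (m->w[bit / 64] >> (63 - bit % 64)) & 1;
}

/* dst = a & ~b; returns true if any bit is set in dst. */
static bool mask_andnot(struct mask256 *dst, const struct mask256 *a,
			const struct mask256 *b)
{
	uint64_t any = 0;

	for (int i = 0; i < NWORDS; i++) {
		dst->w[i] = a->w[i] & ~b->w[i];
		any |= dst->w[i];
	}
	return any != 0;
}

/*
 * Clear from the guest mask every bit that disappeared from the host
 * configuration (set in prev, clear in cur). Returns true if the guest
 * mask changed, i.e. the guest's configuration would need a commit.
 */
static bool unplug_removed(struct mask256 *guest, const struct mask256 *prev,
			   const struct mask256 *cur)
{
	struct mask256 removed;
	bool changed = false;

	if (!mask_andnot(&removed, prev, cur))
		return false;

	for (int i = 0; i < NWORDS; i++) {
		if (guest->w[i] & removed.w[i])
			changed = true;
		guest->w[i] &= ~removed.w[i];
	}
	return changed;
}
```

The boolean result mirrors the idea of committing to the guest only when
something was actually unassigned.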

Signed-off-by: Tony Krowiak <akrowiak@linux.ibm.com>
Reported-by: kernel test robot <lkp@intel.com>
---
 drivers/s390/crypto/vfio_ap_drv.c     |   5 +-
 drivers/s390/crypto/vfio_ap_ops.c     | 147 ++++++++++++++++++++++++--
 drivers/s390/crypto/vfio_ap_private.h |   7 +-
 3 files changed, 146 insertions(+), 13 deletions(-)

diff --git a/drivers/s390/crypto/vfio_ap_drv.c b/drivers/s390/crypto/vfio_ap_drv.c
index aae5b3d8e3fa..ea0a7603e886 100644
--- a/drivers/s390/crypto/vfio_ap_drv.c
+++ b/drivers/s390/crypto/vfio_ap_drv.c
@@ -115,9 +115,11 @@ static int vfio_ap_matrix_dev_create(void)
 
 	/* Fill in config info via PQAP(QCI), if available */
 	if (test_facility(12)) {
-		ret = ap_qci(&matrix_dev->info);
+		ret = ap_qci(&matrix_dev->config_info);
 		if (ret)
 			goto matrix_alloc_err;
+		memcpy(&matrix_dev->config_info_prev, &matrix_dev->config_info,
+		       sizeof(struct ap_config_info));
 	}
 
 	mutex_init(&matrix_dev->lock);
@@ -177,6 +179,7 @@ static int __init vfio_ap_init(void)
 	vfio_ap_drv.remove = vfio_ap_queue_dev_remove;
 	vfio_ap_drv.in_use = vfio_ap_mdev_resource_in_use;
 	vfio_ap_drv.ids = ap_queue_ids;
+	vfio_ap_drv.on_config_changed = vfio_ap_on_cfg_changed;
 
 	ret = ap_driver_register(&vfio_ap_drv, THIS_MODULE, VFIO_AP_DRV_NAME);
 	if (ret) {
diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c
index 2b01a8eb6ee7..e002d556abab 100644
--- a/drivers/s390/crypto/vfio_ap_ops.c
+++ b/drivers/s390/crypto/vfio_ap_ops.c
@@ -347,7 +347,9 @@ static int vfio_ap_mdev_create(struct kobject *kobj, struct mdev_device *mdev)
 	}
 
 	matrix_mdev->mdev = mdev;
-	vfio_ap_matrix_init(&matrix_dev->info, &matrix_mdev->matrix);
+	vfio_ap_matrix_init(&matrix_dev->config_info, &matrix_mdev->matrix);
+	vfio_ap_matrix_init(&matrix_dev->config_info,
+			    &matrix_mdev->shadow_apcb);
 	hash_init(matrix_mdev->qtable);
 	mdev_set_drvdata(mdev, matrix_mdev);
 	matrix_mdev->pqap_hook.hook = handle_pqap;
@@ -526,8 +528,8 @@ static int vfio_ap_mdev_filter_matrix(struct ap_matrix_mdev *matrix_mdev,
 		 * If the APID is not assigned to the host AP configuration,
 		 * we can not assign it to the guest's AP configuration
 		 */
-		if (!test_bit_inv(apid,
-				  (unsigned long *)matrix_dev->info.apm)) {
+		if (!test_bit_inv(apid, (unsigned long *)
+				  matrix_dev->config_info.apm)) {
 			clear_bit_inv(apid, shadow_apcb->apm);
 			continue;
 		}
@@ -540,7 +542,7 @@ static int vfio_ap_mdev_filter_matrix(struct ap_matrix_mdev *matrix_mdev,
 			 * guest's AP configuration
 			 */
 			if (!test_bit_inv(apqi, (unsigned long *)
-					  matrix_dev->info.aqm)) {
+					  matrix_dev->config_info.aqm)) {
 				clear_bit_inv(apqi, shadow_apcb->aqm);
 				continue;
 			}
@@ -594,7 +596,7 @@ static bool vfio_ap_mdev_config_shadow_apcb(struct ap_matrix_mdev *matrix_mdev)
 	int napm, naqm;
 	struct ap_matrix shadow_apcb;
 
-	vfio_ap_matrix_init(&matrix_dev->info, &shadow_apcb);
+	vfio_ap_matrix_init(&matrix_dev->config_info, &shadow_apcb);
 	napm = bitmap_weight(matrix_mdev->matrix.apm, AP_DEVICES);
 	naqm = bitmap_weight(matrix_mdev->matrix.aqm, AP_DOMAINS);
 
@@ -741,7 +743,7 @@ static bool vfio_ap_mdev_assign_apqis_4_apid(struct ap_matrix_mdev *matrix_mdev,
 
 	for_each_set_bit_inv(apqi, matrix_mdev->matrix.aqm, AP_DOMAINS) {
 		if (!test_bit_inv(apqi,
-				  (unsigned long *) matrix_dev->info.aqm))
+				  (unsigned long *)matrix_dev->config_info.aqm))
 			clear_bit_inv(apqi, aqm);
 
 		apqn = AP_MKQID(apid, apqi);
@@ -764,7 +766,7 @@ static bool vfio_ap_mdev_assign_guest_apid(struct ap_matrix_mdev *matrix_mdev,
 	unsigned long apqi, apqn;
 
 	if (!vfio_ap_mdev_has_crycb(matrix_mdev) ||
-	    !test_bit_inv(apid, (unsigned long *)matrix_dev->info.apm))
+	    !test_bit_inv(apid, (unsigned long *)matrix_dev->config_info.apm))
 		return false;
 
 	if (bitmap_empty(matrix_mdev->shadow_apcb.aqm, AP_DOMAINS))
@@ -931,8 +933,8 @@ static bool vfio_ap_mdev_assign_apids_4_apqi(struct ap_matrix_mdev *matrix_mdev,
 	bitmap_copy(apm, matrix_mdev->matrix.apm, AP_DEVICES);
 
 	for_each_set_bit_inv(apid, matrix_mdev->matrix.apm, AP_DEVICES) {
-		if (!test_bit_inv(apid,
-				  (unsigned long *) matrix_dev->info.apm))
+		if (!test_bit_inv(apid, (unsigned long *)
+				  matrix_dev->config_info.apm))
 			clear_bit_inv(apqi, apm);
 
 		apqn = AP_MKQID(apid, apqi);
@@ -955,7 +957,7 @@ static bool vfio_ap_mdev_assign_guest_apqi(struct ap_matrix_mdev *matrix_mdev,
 	unsigned long apid, apqn;
 
 	if (!vfio_ap_mdev_has_crycb(matrix_mdev) ||
-	    !test_bit_inv(apqi, (unsigned long *)matrix_dev->info.aqm))
+	    !test_bit_inv(apqi, (unsigned long *)matrix_dev->config_info.aqm))
 		return false;
 
 	if (bitmap_empty(matrix_mdev->shadow_apcb.apm, AP_DEVICES))
@@ -1702,7 +1704,7 @@ int vfio_ap_mdev_probe_queue(struct ap_queue *queue)
 void vfio_ap_mdev_remove_queue(struct ap_queue *queue)
 {
 	struct vfio_ap_queue *q;
-	int apid, apqi;
+	unsigned long apid, apqi;
 
 	mutex_lock(&matrix_dev->lock);
 	q = dev_get_drvdata(&queue->ap_dev.device);
@@ -1727,3 +1729,126 @@ bool vfio_ap_mdev_resource_in_use(unsigned long *apm, unsigned long *aqm)
 
 	return in_use;
 }
+
+/**
+ * vfio_ap_mdev_unassign_apids
+ *
+ * @matrix_mdev: The matrix mediated device
+ *
+ * @apm_unassign: A bitmap with 256 bits. Each bit in the map represents an
+ *		  APID from 0 to 255 (leftmost bit corresponds to APID 0).
+ *
+ * Unassigns each APID specified in @apm_unassign that is assigned to the
+ * shadow CRYCB of @matrix_mdev. Returns true if at least one APID is
+ * unassigned; otherwise, returns false.
+ */
+static bool vfio_ap_mdev_unassign_apids(struct ap_matrix_mdev *matrix_mdev,
+					unsigned long *apm_unassign)
+{
+	unsigned long apid;
+	bool unassigned = false;
+
+	/*
+	 * If the matrix mdev is not in use by a KVM guest, return indicating
+	 * that no APIDs have been unassigned.
+	 */
+	if (!vfio_ap_mdev_has_crycb(matrix_mdev))
+		return false;
+
+	for_each_set_bit_inv(apid, apm_unassign, AP_DEVICES) {
+		unassigned |= vfio_ap_mdev_unassign_guest_apid(matrix_mdev,
+							       apid);
+	}
+
+	return unassigned;
+}
+
+/**
+ * vfio_ap_mdev_unassign_apqis
+ *
+ * @matrix_mdev: The matrix mediated device
+ *
+ * @aqm_unassign: A bitmap with 256 bits. Each bit in the map represents an
+ *		  APQI from 0 to 255 (leftmost bit corresponds to APQI 0).
+ *
+ * Unassigns each APQI specified in @aqm_unassign that is assigned to the
+ * shadow CRYCB of @matrix_mdev. Returns true if at least one APQI is
+ * unassigned; otherwise, returns false.
+ */
+static bool vfio_ap_mdev_unassign_apqis(struct ap_matrix_mdev *matrix_mdev,
+					unsigned long *aqm_unassign)
+{
+	unsigned long apqi;
+	bool unassigned = false;
+
+	/*
+	 * If the matrix mdev is not in use by a KVM guest, return indicating
+	 * that no APQIs have been unassigned.
+	 */
+	if (!vfio_ap_mdev_has_crycb(matrix_mdev))
+		return false;
+
+	for_each_set_bit_inv(apqi, aqm_unassign, AP_DOMAINS) {
+		unassigned |= vfio_ap_mdev_unassign_guest_apqi(matrix_mdev,
+							       apqi);
+	}
+
+	return unassigned;
+}
+
+void vfio_ap_on_cfg_changed(struct ap_config_info *new_config_info,
+			    struct ap_config_info *old_config_info)
+{
+	bool unassigned;
+	int ap_remove, aq_remove;
+	struct ap_matrix_mdev *matrix_mdev;
+	DECLARE_BITMAP(apm_unassign, AP_DEVICES);
+	DECLARE_BITMAP(aqm_unassign, AP_DOMAINS);
+
+	unsigned long *cur_apm, *cur_aqm, *prev_apm, *prev_aqm;
+
+	if (matrix_dev->flags & AP_MATRIX_CFG_CHG) {
+		WARN_ONCE(1, "AP host configuration change already reported");
+		return;
+	}
+
+	memcpy(&matrix_dev->config_info, new_config_info,
+	       sizeof(struct ap_config_info));
+	memcpy(&matrix_dev->config_info_prev, old_config_info,
+	       sizeof(struct ap_config_info));
+
+	cur_apm = (unsigned long *)matrix_dev->config_info.apm;
+	cur_aqm = (unsigned long *)matrix_dev->config_info.aqm;
+	prev_apm = (unsigned long *)matrix_dev->config_info_prev.apm;
+	prev_aqm = (unsigned long *)matrix_dev->config_info_prev.aqm;
+
+	ap_remove = bitmap_andnot(apm_unassign, prev_apm, cur_apm, AP_DEVICES);
+	aq_remove = bitmap_andnot(aqm_unassign, prev_aqm, cur_aqm, AP_DOMAINS);
+
+	mutex_lock(&matrix_dev->lock);
+	matrix_dev->flags |= AP_MATRIX_CFG_CHG;
+
+	list_for_each_entry(matrix_mdev, &matrix_dev->mdev_list, node) {
+		if (!vfio_ap_mdev_has_crycb(matrix_mdev))
+			continue;
+
+		unassigned = false;
+
+		if (ap_remove)
+			if (bitmap_intersects(matrix_mdev->shadow_apcb.apm,
+					      apm_unassign, AP_DEVICES))
+				if (vfio_ap_mdev_unassign_apids(matrix_mdev,
+								apm_unassign))
+					unassigned = true;
+		if (aq_remove)
+			if (bitmap_intersects(matrix_mdev->shadow_apcb.aqm,
+					      aqm_unassign, AP_DOMAINS))
+				if (vfio_ap_mdev_unassign_apqis(matrix_mdev,
+								aqm_unassign))
+					unassigned = true;
+
+		if (unassigned)
+			vfio_ap_mdev_commit_shadow_apcb(matrix_mdev);
+	}
+	mutex_unlock(&matrix_dev->lock);
+}
diff --git a/drivers/s390/crypto/vfio_ap_private.h b/drivers/s390/crypto/vfio_ap_private.h
index 055bce6d45db..fc8629e28ad3 100644
--- a/drivers/s390/crypto/vfio_ap_private.h
+++ b/drivers/s390/crypto/vfio_ap_private.h
@@ -40,10 +40,13 @@
 struct ap_matrix_dev {
 	struct device device;
 	atomic_t available_instances;
-	struct ap_config_info info;
+	struct ap_config_info config_info;
+	struct ap_config_info config_info_prev;
 	struct list_head mdev_list;
 	struct mutex lock;
 	struct ap_driver  *vfio_ap_drv;
+	#define AP_MATRIX_CFG_CHG (1UL << 0)
+	unsigned long flags;
 };
 
 extern struct ap_matrix_dev *matrix_dev;
@@ -108,5 +111,7 @@ int vfio_ap_mdev_probe_queue(struct ap_queue *queue);
 void vfio_ap_mdev_remove_queue(struct ap_queue *queue);
 
 bool vfio_ap_mdev_resource_in_use(unsigned long *apm, unsigned long *aqm);
+void vfio_ap_on_cfg_changed(struct ap_config_info *new_config_info,
+			    struct ap_config_info *old_config_info);
 
 #endif /* _VFIO_AP_PRIVATE_H_ */
-- 
2.21.1



* [PATCH v10 14/16] s390/vfio-ap: handle AP bus scan completed notification
  2020-08-21 19:56 [PATCH v10 00/16] s390/vfio-ap: dynamic configuration support Tony Krowiak
                   ` (12 preceding siblings ...)
  2020-08-21 19:56 ` [PATCH v10 13/16] s390/vfio-ap: handle host AP config change notification Tony Krowiak
@ 2020-08-21 19:56 ` Tony Krowiak
  2020-09-28  2:11   ` Halil Pasic
  2020-08-21 19:56 ` [PATCH v10 15/16] s390/vfio-ap: handle probe/remove not due to host AP config changes Tony Krowiak
  2020-08-21 19:56 ` [PATCH v10 16/16] s390/vfio-ap: update docs to include dynamic config support Tony Krowiak
  15 siblings, 1 reply; 79+ messages in thread
From: Tony Krowiak @ 2020-08-21 19:56 UTC (permalink / raw)
  To: linux-s390, linux-kernel, kvm
  Cc: freude, borntraeger, cohuck, mjrosato, pasic, alex.williamson,
	kwankhede, fiuczy, frankja, david, imbrenda, hca, gor,
	Tony Krowiak

Implements the driver callback invoked by the AP bus when the AP bus
scan has completed. Since this callback is invoked after binding the newly
added devices to their respective device drivers, the vfio_ap driver will
attempt to plug the adapters, domains and control domains into each guest
using a matrix mdev to which they are assigned. Keep in mind that an
adapter or domain can be plugged in only if each APQN with the APID of the
adapter or the APQI of the domain references a queue device bound to the
vfio_ap device driver. Consequently, not all newly added adapters and
domains will necessarily get hot plugged.
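
To make the selection step concrete, here is a small userspace sketch (not
the driver code) of how the hot plug candidates could be computed: the bits
newly set in the host mask, restricted to what is assigned to the matrix
mdev. The bound-queue check mentioned above is deliberately not modeled
here. The names mask256, mask_set, mask_test and plug_candidates are
hypothetical:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NBITS 256
#define NWORDS (NBITS / 64)

/* Hypothetical userspace stand-in for a 256-bit APM/AQM mask. */
struct mask256 {
	uint64_t w[NWORDS];
};

/* Bit 0 is the leftmost (most significant) bit of word 0. */
static void mask_set(struct mask256 *m, unsigned int bit)
{
	assert(bit < NBITS);
	m->w[bit / 64] |= UINT64_C(1) << (63 - bit % 64);
}

static bool mask_test(const struct mask256 *m, unsigned int bit)
{
	assert(bit < NBITS);
	return (m->w[bit / 64] >> (63 - bit % 64)) & 1;
}

/*
 * out = (cur & ~prev) & assigned
 *
 * cur/prev are the current and previous host masks; assigned is the mask
 * of the matrix mdev. Returns the number of candidate bits in out.
 */
static unsigned int plug_candidates(struct mask256 *out,
				    const struct mask256 *prev,
				    const struct mask256 *cur,
				    const struct mask256 *assigned)
{
	unsigned int n = 0;

	for (int i = 0; i < NWORDS; i++) {
		out->w[i] = cur->w[i] & ~prev->w[i] & assigned->w[i];
		/* Count set bits by clearing the lowest one each round. */
		for (uint64_t v = out->w[i]; v; v &= v - 1)
			n++;
	}
	return n;
}
```

A result of zero corresponds to the case where nothing needs to be plugged
into the guest and no commit is required.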

Signed-off-by: Tony Krowiak <akrowiak@linux.ibm.com>
---
 drivers/s390/crypto/vfio_ap_drv.c     |   1 +
 drivers/s390/crypto/vfio_ap_ops.c     | 110 +++++++++++++++++++++++++-
 drivers/s390/crypto/vfio_ap_private.h |   2 +
 3 files changed, 110 insertions(+), 3 deletions(-)

diff --git a/drivers/s390/crypto/vfio_ap_drv.c b/drivers/s390/crypto/vfio_ap_drv.c
index ea0a7603e886..21bfae928be5 100644
--- a/drivers/s390/crypto/vfio_ap_drv.c
+++ b/drivers/s390/crypto/vfio_ap_drv.c
@@ -180,6 +180,7 @@ static int __init vfio_ap_init(void)
 	vfio_ap_drv.in_use = vfio_ap_mdev_resource_in_use;
 	vfio_ap_drv.ids = ap_queue_ids;
 	vfio_ap_drv.on_config_changed = vfio_ap_on_cfg_changed;
+	vfio_ap_drv.on_scan_complete = vfio_ap_on_scan_complete;
 
 	ret = ap_driver_register(&vfio_ap_drv, THIS_MODULE, VFIO_AP_DRV_NAME);
 	if (ret) {
diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c
index e002d556abab..e6480f31a42b 100644
--- a/drivers/s390/crypto/vfio_ap_ops.c
+++ b/drivers/s390/crypto/vfio_ap_ops.c
@@ -616,14 +616,13 @@ static bool vfio_ap_mdev_config_shadow_apcb(struct ap_matrix_mdev *matrix_mdev)
 		 * CRYCB after filtering, then try filtering the APQIs.
 		 */
 		if (napm == 0) {
-			naqm = vfio_ap_mdev_filter_matrix(matrix_mdev,
-							  &shadow_apcb, false);
-
 			/*
 			 * If there are no APQNs that can be assigned to the
 			 * matrix mdev after filtering the APQIs, then no APQNs
 			 * shall be assigned to the guest's CRYCB.
 			 */
+			naqm = vfio_ap_mdev_filter_matrix(matrix_mdev,
+							  &shadow_apcb, false);
 			if (naqm == 0) {
 				bitmap_clear(shadow_apcb.apm, 0, AP_DEVICES);
 				bitmap_clear(shadow_apcb.aqm, 0, AP_DOMAINS);
@@ -1758,6 +1757,16 @@ static bool vfio_ap_mdev_unassign_apids(struct ap_matrix_mdev *matrix_mdev,
 	for_each_set_bit_inv(apid, apm_unassign, AP_DEVICES) {
 		unassigned |= vfio_ap_mdev_unassign_guest_apid(matrix_mdev,
 							       apid);
+		/*
+		 * If the APID is not assigned to the matrix mdev's shadow
+		 * CRYCB, continue with the next APID.
+		 */
+		if (!test_bit_inv(apid, matrix_mdev->shadow_apcb.apm))
+			continue;
+
+		/* Unassign the APID from the matrix mdev's shadow CRYCB */
+		clear_bit_inv(apid, matrix_mdev->shadow_apcb.apm);
+		unassigned = true;
 	}
 
 	return unassigned;
@@ -1791,6 +1800,17 @@ static bool vfio_ap_mdev_unassign_apqis(struct ap_matrix_mdev *matrix_mdev,
 	for_each_set_bit_inv(apqi, aqm_unassign, AP_DOMAINS) {
 		unassigned |= vfio_ap_mdev_unassign_guest_apqi(matrix_mdev,
 							       apqi);
+
+		/*
+		 * If the APQI is not assigned to the matrix mdev's shadow
+		 * CRYCB, continue with the next APQI
+		 */
+		if (!test_bit_inv(apqi, matrix_mdev->shadow_apcb.aqm))
+			continue;
+
+		/* Unassign the APQI from the matrix mdev's shadow CRYCB */
+		clear_bit_inv(apqi, matrix_mdev->shadow_apcb.aqm);
+		unassigned = true;
 	}
 
 	return unassigned;
@@ -1852,3 +1872,87 @@ void vfio_ap_on_cfg_changed(struct ap_config_info *new_config_info,
 	}
 	mutex_unlock(&matrix_dev->lock);
 }
+
+bool vfio_ap_mdev_assign_apids(struct ap_matrix_mdev *matrix_mdev,
+			       unsigned long *apm_assign)
+{
+	unsigned long apid;
+	bool assigned = false;
+
+	for_each_set_bit_inv(apid, apm_assign, AP_DEVICES)
+		if (test_bit_inv(apid, matrix_mdev->matrix.apm))
+			if (vfio_ap_mdev_assign_guest_apid(matrix_mdev, apid))
+				assigned = true;
+
+	return assigned;
+}
+
+bool vfio_ap_mdev_assign_apqis(struct ap_matrix_mdev *matrix_mdev,
+			       unsigned long *aqm_assign)
+{
+	unsigned long apqi;
+	bool assigned = false;
+
+	for_each_set_bit_inv(apqi, aqm_assign, AP_DOMAINS)
+		if (test_bit_inv(apqi, matrix_mdev->matrix.aqm))
+			if (vfio_ap_mdev_assign_guest_apqi(matrix_mdev, apqi))
+				assigned = true;
+
+	return assigned;
+}
+
+void vfio_ap_on_scan_complete(struct ap_config_info *new_config_info,
+			      struct ap_config_info *old_config_info)
+{
+	struct ap_matrix_mdev *matrix_mdev;
+	DECLARE_BITMAP(apm_assign, AP_DEVICES);
+	DECLARE_BITMAP(aqm_assign, AP_DOMAINS);
+	int ap_add, aq_add;
+	bool assign;
+	unsigned long *cur_apm, *cur_aqm, *prev_apm, *prev_aqm;
+
+	/*
+	 * If we are not in the middle of a host configuration change scan, it
+	 * is likely that the vfio_ap driver was loaded mid-scan, so let's
+	 * handle this scenario by calling the vfio_ap_on_cfg_changed function,
+	 * which gets called at the start of an AP bus scan when the host AP
+	 * configuration has changed.
+	 */
+	if (!(matrix_dev->flags & AP_MATRIX_CFG_CHG))
+		vfio_ap_on_cfg_changed(new_config_info, old_config_info);
+
+	cur_apm = (unsigned long *)matrix_dev->config_info.apm;
+	cur_aqm = (unsigned long *)matrix_dev->config_info.aqm;
+
+	prev_apm = (unsigned long *)matrix_dev->config_info_prev.apm;
+	prev_aqm = (unsigned long *)matrix_dev->config_info_prev.aqm;
+
+	ap_add = bitmap_andnot(apm_assign, cur_apm, prev_apm, AP_DEVICES);
+	aq_add = bitmap_andnot(aqm_assign, cur_aqm, prev_aqm, AP_DOMAINS);
+
+	mutex_lock(&matrix_dev->lock);
+	list_for_each_entry(matrix_mdev, &matrix_dev->mdev_list, node) {
+		if (!vfio_ap_mdev_has_crycb(matrix_mdev))
+			continue;
+
+		assign = false;
+
+		if (ap_add)
+			if (bitmap_intersects(matrix_mdev->matrix.apm,
+					      apm_assign, AP_DEVICES))
+				assign |= vfio_ap_mdev_assign_apids(matrix_mdev,
+								    apm_assign);
+
+		if (aq_add)
+			if (bitmap_intersects(matrix_mdev->matrix.aqm,
+					      aqm_assign, AP_DOMAINS))
+				assign |= vfio_ap_mdev_assign_apqis(matrix_mdev,
+								    aqm_assign);
+
+		if (assign)
+			vfio_ap_mdev_commit_shadow_apcb(matrix_mdev);
+	}
+
+	matrix_dev->flags &= ~AP_MATRIX_CFG_CHG;
+	mutex_unlock(&matrix_dev->lock);
+}
diff --git a/drivers/s390/crypto/vfio_ap_private.h b/drivers/s390/crypto/vfio_ap_private.h
index fc8629e28ad3..da1754fd4f66 100644
--- a/drivers/s390/crypto/vfio_ap_private.h
+++ b/drivers/s390/crypto/vfio_ap_private.h
@@ -113,5 +113,7 @@ void vfio_ap_mdev_remove_queue(struct ap_queue *queue);
 bool vfio_ap_mdev_resource_in_use(unsigned long *apm, unsigned long *aqm);
 void vfio_ap_on_cfg_changed(struct ap_config_info *new_config_info,
 			    struct ap_config_info *old_config_info);
+void vfio_ap_on_scan_complete(struct ap_config_info *new_config_info,
+			      struct ap_config_info *old_config_info);
 
 #endif /* _VFIO_AP_PRIVATE_H_ */
-- 
2.21.1



* [PATCH v10 15/16] s390/vfio-ap: handle probe/remove not due to host AP config changes
  2020-08-21 19:56 [PATCH v10 00/16] s390/vfio-ap: dynamic configuration support Tony Krowiak
                   ` (13 preceding siblings ...)
  2020-08-21 19:56 ` [PATCH v10 14/16] s390/vfio-ap: handle AP bus scan completed notification Tony Krowiak
@ 2020-08-21 19:56 ` Tony Krowiak
  2020-09-28  2:45   ` Halil Pasic
  2020-08-21 19:56 ` [PATCH v10 16/16] s390/vfio-ap: update docs to include dynamic config support Tony Krowiak
  15 siblings, 1 reply; 79+ messages in thread
From: Tony Krowiak @ 2020-08-21 19:56 UTC (permalink / raw)
  To: linux-s390, linux-kernel, kvm
  Cc: freude, borntraeger, cohuck, mjrosato, pasic, alex.williamson,
	kwankhede, fiuczy, frankja, david, imbrenda, hca, gor,
	Tony Krowiak

AP queue devices are probed or removed for reasons other than changes
to the host AP configuration. For example:

* The state of an AP adapter can be dynamically changed from standby to
  online via the SE or by execution of the SCLP Configure AP command. When
  the state changes, each queue device associated with the card device
  representing the adapter will get created and probed.

* The state of an AP adapter can be dynamically changed from online to
  standby via the SE or by execution of the SCLP Deconfigure AP command.
  When the state changes, each queue device associated with the card device
  representing the adapter will get removed.

* Each queue device associated with a card device will get removed
  when the type of the AP adapter represented by the card device
  dynamically changes.

* Each queue device associated with a card device will get removed
  when the status of the queue represented by the queue device changes
  from operating to check stop.

* AP queue devices can be manually bound to or unbound from the vfio_ap
  device driver by a root user via the sysfs bind/unbind attributes of the
  driver.

In response to a queue device probe or remove that is not the result of a
change to the host's AP configuration, if a KVM guest is using the matrix
mdev to which the APQN of the queue device is assigned, the vfio_ap device
driver must respond accordingly. In an ideal world, the queue device being
probed would be hot plugged into the guest. Likewise, the queue
corresponding to the queue device being removed would
be hot unplugged from the guest. Unfortunately, the AP architecture
precludes plugging or unplugging individual queues, so let's handle
the probe or remove of an AP queue device as follows:

Handling Probe
--------------
There are two requirements that must be met in order to give a
guest access to the queue corresponding to the queue device being probed:

* Each APQN derived from the APID of the queue device and the APQIs of the
  domains already assigned to the guest's AP configuration must reference
  a queue device bound to the vfio_ap device driver.

* Each APQN derived from the APQI of the queue device and the APIDs of the
  adapters assigned to the guest's AP configuration must reference a queue
  device bound to the vfio_ap device driver.

If the above conditions are met, the APQN will be assigned to the guest's
AP configuration and the guest will be given access to the queue.
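
The two requirements can be sketched in userspace C as follows. This is a
simplified model, not the driver code: the guest's AP configuration is a
pair of boolean arrays, the set of queues bound to vfio_ap is a flat table
indexed by APQN, and the APQN is assumed to be the APID in the high byte
and the APQI in the low byte. The names guest_cfg, mkqid and can_plug_queue
are hypothetical:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NADAPTERS 256
#define NDOMAINS 256
#define NQUEUES (NADAPTERS * NDOMAINS)

/* Hypothetical model of the guest's AP configuration. */
struct guest_cfg {
	bool apm[NADAPTERS];	/* adapters the guest has access to */
	bool aqm[NDOMAINS];	/* domains the guest has access to */
};

/* Assumed APQN layout: APID in the high byte, APQI in the low byte. */
static uint16_t mkqid(unsigned int apid, unsigned int apqi)
{
	assert(apid < NADAPTERS && apqi < NDOMAINS);
	return (uint16_t)((apid << 8) | apqi);
}

/*
 * bound[apqn] is true if the queue is bound to the vfio_ap driver.
 * Returns true if the probed queue may be given to the guest.
 */
static bool can_plug_queue(const bool *bound, const struct guest_cfg *g,
			   uint16_t apqn)
{
	unsigned int apid = apqn >> 8;
	unsigned int apqi = apqn & 0xff;

	if (!bound[apqn])
		return false;

	/* Requirement 1: probed APID x every domain the guest has. */
	for (unsigned int i = 0; i < NDOMAINS; i++)
		if (g->aqm[i] && !bound[mkqid(apid, i)])
			return false;

	/* Requirement 2: every adapter the guest has x probed APQI. */
	for (unsigned int i = 0; i < NADAPTERS; i++)
		if (g->apm[i] && !bound[mkqid(i, apqi)])
			return false;

	return true;
}
```

A single unbound queue in either direction is enough to keep the probed
queue out of the guest, which is why a probe does not always result in a
hot plug.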

Handling Remove
---------------
Since the AP architecture precludes us from taking away a guest's access to
an individual queue, we are left with the choice of taking away access to
either the adapter or the domain to which the queue is connected. Access to
the adapter will be taken away because, most of the time, the remove
callback will be invoked because the adapter state has transitioned from
online to standby. In such a case, no queue connected to the adapter will
be accessible.
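
The consequence of this choice can be shown with a small userspace model
(not the driver code): removing a single queue takes the whole adapter away
from the guest, so the guest loses one APQN per domain it has. The names
guest_cfg, mkqid and unplug_adapter_of are hypothetical, and the APQN is
assumed to be the APID in the high byte and the APQI in the low byte:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NADAPTERS 256
#define NDOMAINS 256

/* Hypothetical model of the guest's AP configuration. */
struct guest_cfg {
	bool apm[NADAPTERS];	/* adapters the guest has access to */
	bool aqm[NDOMAINS];	/* domains the guest has access to */
};

/* Assumed APQN layout: APID in the high byte, APQI in the low byte. */
static uint16_t mkqid(unsigned int apid, unsigned int apqi)
{
	assert(apid < NADAPTERS && apqi < NDOMAINS);
	return (uint16_t)((apid << 8) | apqi);
}

/*
 * Handle removal of the queue identified by apqn. If the guest has access
 * to that queue, take away the whole adapter and return the number of
 * APQNs the guest loses; otherwise return 0 and leave the guest alone.
 */
static unsigned int unplug_adapter_of(struct guest_cfg *g, uint16_t apqn)
{
	unsigned int apid = apqn >> 8;
	unsigned int apqi = apqn & 0xff;
	unsigned int lost = 0;

	if (!g->apm[apid] || !g->aqm[apqi])
		return 0;

	/* Every domain of the guest forms one APQN with this adapter. */
	for (unsigned int i = 0; i < NDOMAINS; i++)
		if (g->aqm[i])
			lost++;

	g->apm[apid] = false;
	return lost;
}
```

With adapters 1 and 2 and domains 4 and 5 in the guest, removing queue
01.04 in this model also takes away 01.05, while 02.04 and 02.05 stay.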

Signed-off-by: Tony Krowiak <akrowiak@linux.ibm.com>
---
 drivers/s390/crypto/vfio_ap_ops.c | 84 +++++++++++++++++++++++++++++++
 1 file changed, 84 insertions(+)

diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c
index e6480f31a42b..b6a1e280991d 100644
--- a/drivers/s390/crypto/vfio_ap_ops.c
+++ b/drivers/s390/crypto/vfio_ap_ops.c
@@ -1682,6 +1682,61 @@ static void vfio_ap_queue_link_mdev(struct vfio_ap_queue *q)
 	}
 }
 
+static bool vfio_ap_mdev_assign_shadow_apid(struct ap_matrix_mdev *matrix_mdev,
+					    unsigned long apid)
+{
+	unsigned long apqi;
+
+	for_each_set_bit_inv(apqi, matrix_mdev->shadow_apcb.aqm,
+			     matrix_mdev->shadow_apcb.aqm_max + 1) {
+		if (!vfio_ap_get_queue(AP_MKQID(apid, apqi)))
+			return false;
+	}
+
+	set_bit_inv(apid, matrix_mdev->shadow_apcb.apm);
+
+	return true;
+}
+
+static bool vfio_ap_mdev_assign_shadow_apqi(struct ap_matrix_mdev *matrix_mdev,
+					    unsigned long apqi)
+{
+	unsigned long apid;
+
+	for_each_set_bit_inv(apid, matrix_mdev->shadow_apcb.apm,
+			     matrix_mdev->shadow_apcb.apm_max + 1) {
+		if (!vfio_ap_get_queue(AP_MKQID(apid, apqi)))
+			return false;
+	}
+
+	set_bit_inv(apqi, matrix_mdev->shadow_apcb.aqm);
+
+	return true;
+}
+
+static void vfio_ap_mdev_hot_plug_queue(struct vfio_ap_queue *q)
+{
+	bool commit = false;
+	unsigned long apid = AP_QID_CARD(q->apqn);
+	unsigned long apqi = AP_QID_QUEUE(q->apqn);
+
+	if ((q->matrix_mdev == NULL) || !vfio_ap_mdev_has_crycb(q->matrix_mdev))
+		return;
+
+	if (!test_bit_inv(apid, q->matrix_mdev->matrix.apm) ||
+	    !test_bit_inv(apqi, q->matrix_mdev->matrix.aqm))
+		return;
+
+	if (!test_bit_inv(apid, q->matrix_mdev->shadow_apcb.apm))
+		commit |= vfio_ap_mdev_assign_shadow_apid(q->matrix_mdev, apid);
+
+	if (!test_bit_inv(apqi, q->matrix_mdev->shadow_apcb.aqm))
+		commit |= vfio_ap_mdev_assign_shadow_apqi(q->matrix_mdev, apqi);
+
+	if (commit)
+		vfio_ap_mdev_commit_shadow_apcb(q->matrix_mdev);
+}
+
 int vfio_ap_mdev_probe_queue(struct ap_queue *queue)
 {
 	struct vfio_ap_queue *q;
@@ -1695,11 +1750,35 @@ int vfio_ap_mdev_probe_queue(struct ap_queue *queue)
 	q->apqn = queue->qid;
 	q->saved_isc = VFIO_AP_ISC_INVALID;
 	vfio_ap_queue_link_mdev(q);
+	/* Make sure we're not in the middle of an AP configuration change. */
+	if (!(matrix_dev->flags & AP_MATRIX_CFG_CHG))
+		vfio_ap_mdev_hot_plug_queue(q);
 	mutex_unlock(&matrix_dev->lock);
 
 	return 0;
 }
 
+void vfio_ap_mdev_hot_unplug_queue(struct vfio_ap_queue *q)
+{
+	unsigned long apid = AP_QID_CARD(q->apqn);
+	unsigned long apqi = AP_QID_QUEUE(q->apqn);
+
+	if ((q->matrix_mdev == NULL) || !vfio_ap_mdev_has_crycb(q->matrix_mdev))
+		return;
+
+	/*
+	 * If the APQN is assigned to the guest, then let's
+	 * go ahead and unplug the adapter since the
+	 * architecture does not provide a means to unplug
+	 * an individual queue.
+	 */
+	if (test_bit_inv(apid, q->matrix_mdev->shadow_apcb.apm) &&
+	    test_bit_inv(apqi, q->matrix_mdev->shadow_apcb.aqm)) {
+		if (vfio_ap_mdev_unassign_guest_apid(q->matrix_mdev, apid))
+			vfio_ap_mdev_commit_shadow_apcb(q->matrix_mdev);
+	}
+}
+
 void vfio_ap_mdev_remove_queue(struct ap_queue *queue)
 {
 	struct vfio_ap_queue *q;
@@ -1707,6 +1786,11 @@ void vfio_ap_mdev_remove_queue(struct ap_queue *queue)
 
 	mutex_lock(&matrix_dev->lock);
 	q = dev_get_drvdata(&queue->ap_dev.device);
+
+	/* Make sure we're not in the middle of an AP configuration change. */
+	if (!(matrix_dev->flags & AP_MATRIX_CFG_CHG))
+		vfio_ap_mdev_hot_unplug_queue(q);
+
 	dev_set_drvdata(&queue->ap_dev.device, NULL);
 	apid = AP_QID_CARD(q->apqn);
 	apqi = AP_QID_QUEUE(q->apqn);
-- 
2.21.1



* [PATCH v10 16/16] s390/vfio-ap: update docs to include dynamic config support
  2020-08-21 19:56 [PATCH v10 00/16] s390/vfio-ap: dynamic configuration support Tony Krowiak
                   ` (14 preceding siblings ...)
  2020-08-21 19:56 ` [PATCH v10 15/16] s390/vfio-ap: handle probe/remove not due to host AP config changes Tony Krowiak
@ 2020-08-21 19:56 ` Tony Krowiak
  2020-08-25 10:45   ` Cornelia Huck
  2020-09-28  2:48   ` Halil Pasic
  15 siblings, 2 replies; 79+ messages in thread
From: Tony Krowiak @ 2020-08-21 19:56 UTC (permalink / raw)
  To: linux-s390, linux-kernel, kvm
  Cc: freude, borntraeger, cohuck, mjrosato, pasic, alex.williamson,
	kwankhede, fiuczy, frankja, david, imbrenda, hca, gor,
	Tony Krowiak

Update the documentation in vfio-ap.rst to include information about the
AP dynamic configuration support (i.e., hot plug of adapters, domains
and control domains via the matrix mediated device's sysfs assignment
attributes).

Signed-off-by: Tony Krowiak <akrowiak@linux.ibm.com>
---
 Documentation/s390/vfio-ap.rst | 362 ++++++++++++++++++++++++++-------
 1 file changed, 285 insertions(+), 77 deletions(-)

diff --git a/Documentation/s390/vfio-ap.rst b/Documentation/s390/vfio-ap.rst
index e15436599086..8907aeca8fb7 100644
--- a/Documentation/s390/vfio-ap.rst
+++ b/Documentation/s390/vfio-ap.rst
@@ -253,7 +253,7 @@ The process for reserving an AP queue for use by a KVM guest is:
 1. The administrator loads the vfio_ap device driver
 2. The vfio-ap driver during its initialization will register a single 'matrix'
    device with the device core. This will serve as the parent device for
-   all mediated matrix devices used to configure an AP matrix for a guest.
+   all matrix mediated devices used to configure an AP matrix for a guest.
 3. The /sys/devices/vfio_ap/matrix device is created by the device core
 4. The vfio_ap device driver will register with the AP bus for AP queue devices
    of type 10 and higher (CEX4 and newer). The driver will provide the vfio_ap
@@ -269,7 +269,7 @@ The process for reserving an AP queue for use by a KVM guest is:
    default zcrypt cex4queue driver.
 8. The AP bus probes the vfio_ap device driver to bind the queues reserved for
    it.
-9. The administrator creates a passthrough type mediated matrix device to be
+9. The administrator creates a passthrough type matrix mediated device to be
    used by a guest
 10. The administrator assigns the adapters, usage domains and control domains
     to be exclusively used by a guest.
@@ -279,14 +279,14 @@ Set up the VFIO mediated device interfaces
 The VFIO AP device driver utilizes the common interface of the VFIO mediated
 device core driver to:
 
-* Register an AP mediated bus driver to add a mediated matrix device to and
+* Register an AP mediated bus driver to add a matrix mediated device to and
   remove it from a VFIO group.
-* Create and destroy a mediated matrix device
-* Add a mediated matrix device to and remove it from the AP mediated bus driver
-* Add a mediated matrix device to and remove it from an IOMMU group
+* Create and destroy a matrix mediated device
+* Add a matrix mediated device to and remove it from the AP mediated bus driver
+* Add a matrix mediated device to and remove it from an IOMMU group
 
 The following high-level block diagram shows the main components and interfaces
-of the VFIO AP mediated matrix device driver::
+of the VFIO AP matrix mediated device driver::
 
    +-------------+
    |             |
@@ -351,29 +351,37 @@ matrix device.
     This attribute group identifies the user-defined sysfs attributes of the
     mediated device. When a device is registered with the VFIO mediated device
     framework, the sysfs attribute files identified in the 'mdev_attr_groups'
-    structure will be created in the mediated matrix device's directory. The
-    sysfs attributes for a mediated matrix device are:
+    structure will be created in the matrix mediated device's directory. The
+    sysfs attributes for a matrix mediated device are:
 
     assign_adapter / unassign_adapter:
       Write-only attributes for assigning/unassigning an AP adapter to/from the
-      mediated matrix device. To assign/unassign an adapter, the APID of the
+      matrix mediated device. To assign/unassign an adapter, the APID of the
       adapter is echoed to the respective attribute file.
     assign_domain / unassign_domain:
       Write-only attributes for assigning/unassigning an AP usage domain to/from
-      the mediated matrix device. To assign/unassign a domain, the domain
+      the matrix mediated device. To assign/unassign a domain, the domain
       number of the usage domain is echoed to the respective attribute
       file.
     matrix:
-      A read-only file for displaying the APQNs derived from the cross product
-      of the adapter and domain numbers assigned to the mediated matrix device.
+      A read-only file for displaying the APQNs derived from the Cartesian
+      product of the adapter and domain numbers assigned to the matrix mediated
+      device.
+    guest_matrix:
+      A read-only file for displaying the APQNs derived from the Cartesian
+      product of the adapter and domain numbers assigned to the APM and AQM
+      fields respectively of the KVM guest's CRYCB. This will differ from the
+      matrix if any APQNs assigned to the matrix mediated device do not
+      reference a queue device bound to the vfio_ap device driver (i.e., the
+      queue is not in the AP configuration).
     assign_control_domain / unassign_control_domain:
       Write-only attributes for assigning/unassigning an AP control domain
-      to/from the mediated matrix device. To assign/unassign a control domain,
+      to/from the matrix mediated device. To assign/unassign a control domain,
       the ID of the domain to be assigned/unassigned is echoed to the respective
       attribute file.
     control_domains:
       A read-only file for displaying the control domain numbers assigned to the
-      mediated matrix device.
+      matrix mediated device.
 
 * functions:
 
@@ -385,7 +393,7 @@ matrix device.
       domains assigned via the corresponding sysfs attributes files
 
   remove:
-    deallocates the mediated matrix device's ap_matrix_mdev structure. This will
+    deallocates the matrix mediated device's ap_matrix_mdev structure. This will
     be allowed only if a running guest is not using the mdev.
 
 * callback interfaces
@@ -397,7 +405,7 @@ matrix device.
     for the mdev matrix device to the MDEV bus. Access to the KVM structure used
     to configure the KVM guest is provided via this callback. The KVM structure,
     is used to configure the guest's access to the AP matrix defined via the
-    mediated matrix device's sysfs attribute files.
+    matrix mediated device's sysfs attribute files.
   release:
     unregisters the VFIO_GROUP_NOTIFY_SET_KVM notifier callback function for the
     mdev matrix device and deconfigures the guest's AP matrix.
@@ -410,11 +418,49 @@ function is called when QEMU connects to KVM. The guest's AP matrix is
 configured via its CRYCB by:
 
 * Setting the bits in the APM corresponding to the APIDs assigned to the
-  mediated matrix device via its 'assign_adapter' interface.
+  matrix mediated device via its 'assign_adapter' interface.
 * Setting the bits in the AQM corresponding to the domains assigned to the
-  mediated matrix device via its 'assign_domain' interface.
+  matrix mediated device via its 'assign_domain' interface.
 * Setting the bits in the ADM corresponding to the domain IDs assigned to the
-  mediated matrix device via its 'assign_control_domains' interface.
+  matrix mediated device via its 'assign_control_domain' interface.
+
+The linux device model precludes passing a device through to a KVM guest that
+is not bound to the device driver facilitating its pass-through. Consequently,
+an APQN that does not reference a queue device bound to the vfio_ap device
+driver will not be assigned to a KVM guest's CRYCB. The AP architecture,
+however, does not provide a means to filter individual APQNs from the guest's
+CRYCB, so the following logic is employed to filter them:
+
+* Filter the APQNs assigned to the matrix mediated device by APID.
+
+  To filter APQNs by APID, each APQN derived from the Cartesian product of the
+  adapter numbers (APID) and domain numbers (APQI) assigned to the mdev is
+  examined and if any one of them does not reference a queue device bound to the
+  vfio_ap device driver, the adapter will not be plugged into the guest (i.e.,
+  the bit corresponding to its APID will not be set in the APM of the guest's
+  CRYCB).
+
+  If at least one adapter is plugged into the guest, then all domains assigned
+  to the mdev will also be plugged into the guest (i.e., the bits corresponding
+  to the APQIs of the domains assigned to the mdev will be set in the AQM field
+  of the guest's CRYCB).
+
+* Filter the APQNs assigned to the matrix mediated device by APQI.
+
+  The APQNs will be filtered by APQI if filtering by APID does not result in any
+  adapters or domains getting plugged into the guest.
+
+  To filter APQNs by APQI, each APQN derived from the Cartesian product of the
+  adapter numbers (APID) and domain numbers (APQI) assigned to the mdev is
+  examined and if any one of them does not reference a queue device bound to the
+  vfio_ap device driver, the domain will not be plugged into the guest (i.e.,
+  the bit corresponding to its APQI will not be set in the AQM of the guest's
+  CRYCB).
+
+  If at least one domain is plugged into the guest, then all adapters assigned
+  to the mdev will also be plugged into the guest (i.e., the bits corresponding
+  to the APIDs of the adapters assigned to the mdev will be set in the APM field
+  of the guest's CRYCB).
 
 The CPU model features for AP
 -----------------------------
@@ -435,6 +481,10 @@ available to a KVM guest via the following CPU model features:
    can be made available to the guest only if it is available on the host (i.e.,
    facility bit 12 is set).
 
+4. apqi: Indicates AP queue interrupts are available on the guest. This facility
+   can be made available to the guest only if it is available on the host (i.e.,
+   facility bit 65 is set).
+
 Note: If the user chooses to specify a CPU model different than the 'host'
 model to QEMU, the CPU model features and facilities need to be turned on
 explicitly; for example::
@@ -444,7 +494,7 @@ explicitly; for example::
 A guest can be precluded from using AP features/facilities by turning them off
 explicitly; for example::
 
-     /usr/bin/qemu-system-s390x ... -cpu host,ap=off,apqci=off,apft=off
+     /usr/bin/qemu-system-s390x ... -cpu host,ap=off,apqci=off,apft=off,apqi=off
 
 Note: If the APFT facility is turned off (apft=off) for the guest, the guest
 will not see any AP devices. The zcrypt device drivers that register for type 10
@@ -530,40 +580,56 @@ These are the steps:
 
 2. Secure the AP queues to be used by the three guests so that the host can not
    access them. To secure them, there are two sysfs files that specify
-   bitmasks marking a subset of the APQN range as 'usable by the default AP
-   queue device drivers' or 'not usable by the default device drivers' and thus
-   available for use by the vfio_ap device driver'. The location of the sysfs
-   files containing the masks are::
+   bitmasks marking a subset of the APQN range as usable only by the default AP
+   queue device drivers. All remaining APQNs are available for use by any other
+   device driver. The vfio_ap device driver is currently the only non-default
+   device driver. The locations of the sysfs files containing the masks
+   are::
 
      /sys/bus/ap/apmask
      /sys/bus/ap/aqmask
 
    The 'apmask' is a 256-bit mask that identifies a set of AP adapter IDs
-   (APID). Each bit in the mask, from left to right (i.e., from most significant
-   to least significant bit in big endian order), corresponds to an APID from
-   0-255. If a bit is set, the APID is marked as usable only by the default AP
-   queue device drivers; otherwise, the APID is usable by the vfio_ap
-   device driver.
+   (APID). Each bit in the mask, from left to right, corresponds to an APID from
+   0-255. If a bit is set, the APID is marked as available to the default AP
+   queue device drivers.
 
    The 'aqmask' is a 256-bit mask that identifies a set of AP queue indexes
-   (APQI). Each bit in the mask, from left to right (i.e., from most significant
-   to least significant bit in big endian order), corresponds to an APQI from
-   0-255. If a bit is set, the APQI is marked as usable only by the default AP
-   queue device drivers; otherwise, the APQI is usable by the vfio_ap device
-   driver.
+   (APQI). Each bit in the mask, from left to right, corresponds to an APQI from
+   0-255. If a bit is set, the APQI is marked as available to the default AP
+   queue device drivers.
+
+   The Cartesian product of the APIDs corresponding to the bits set in the
+   apmask and the APQIs corresponding to the bits set in the aqmask comprises
+   the subset of APQNs that can be used only by the host default device drivers.
+   All other APQNs are available to the non-default device drivers such as the
+   vfio_ap driver.
+
+   Take, for example, the following masks::
+
+      apmask:
+      0x7d00000000000000000000000000000000000000000000000000000000000000
+
+      aqmask:
+      0x8000000000000000000000000000000000000000000000000000000000000000
+
+   The masks indicate:
 
-   Take, for example, the following mask::
+   * Adapters 1, 2, 3, 4, 5, and 7 are available for use by the host default
+     device drivers.
 
-      0x7dffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffff
+   * Domain 0 is available for use by the host default device drivers.
 
-    It indicates:
+   * The subset of APQNs available for use only by the default host device
+     drivers is:
 
-      1, 2, 3, 4, 5, and 7-255 belong to the default drivers' pool, and 0 and 6
-      belong to the vfio_ap device driver's pool.
+     (1,0), (2,0), (3,0), (4,0), (5,0) and (7,0)
+
+   * All other APQNs are available for use by the non-default device drivers.
 
    The APQN of each AP queue device assigned to the linux host is checked by the
-   AP bus against the set of APQNs derived from the cross product of APIDs
-   and APQIs marked as usable only by the default AP queue device drivers. If a
+   AP bus against the set of APQNs derived from the Cartesian product of APIDs
+   and APQIs marked as available to the default AP queue device drivers. If a
    match is detected,  only the default AP queue device drivers will be probed;
    otherwise, the vfio_ap device driver will be probed.
 
@@ -627,6 +693,16 @@ These are the steps:
 	    default drivers pool:    adapter 0-15, domain 1
 	    alternate drivers pool:  adapter 16-255, domains 0, 2-255
 
+   Note ***:
+   Changing a mask such that one or more APQNs will be taken from a matrix
+   mediated device (see below) will fail with an error (EADDRINUSE). The error
+   is logged to the kernel ring buffer, which can be viewed with the 'dmesg'
+   command. The output identifies each APQN flagged as 'in use' and the matrix
+   mediated device to which it is assigned; for example::
+
+      Userspace may not re-assign queue 05.0054 already assigned to 62177883-f1bb-47f0-914d-32a22e3a8804
+      Userspace may not re-assign queue 04.0054 already assigned to cef03c3c-903d-4ecc-9a83-40694cb8aee4
+
 Securing the APQNs for our example
 ----------------------------------
    To secure the AP queues 05.0004, 05.0047, 05.00ab, 05.00ff, 06.0004, 06.0047,
@@ -684,7 +760,7 @@ Securing the APQNs for our example
 
      /sys/devices/vfio_ap/matrix/
      --- [mdev_supported_types]
-     ------ [vfio_ap-passthrough] (passthrough mediated matrix device type)
+     ------ [vfio_ap-passthrough] (passthrough matrix mediated device type)
      --------- create
      --------- [devices]
 
@@ -775,17 +851,18 @@ Securing the APQNs for our example
      higher than the maximum is specified, the operation will terminate with
      an error (ENODEV).
 
-   * All APQNs that can be derived from the adapter ID and the IDs of
-     the previously assigned domains must be bound to the vfio_ap device
-     driver. If no domains have yet been assigned, then there must be at least
-     one APQN with the specified APID bound to the vfio_ap driver. If no such
-     APQNs are bound to the driver, the operation will terminate with an
-     error (EADDRNOTAVAIL).
+   * All APQNs that can be derived from the Cartesian product of the APID of the
+     adapter being assigned and the APQIs of the previously assigned domains
+     must be available to the vfio_ap device driver as specified in the sysfs
+     /sys/bus/ap/apmask and /sys/bus/ap/aqmask attribute files. If even one APQN
+     is reserved for use by the host device driver, the operation will terminate
+     with an error (EADDRNOTAVAIL).
 
-     No APQN that can be derived from the adapter ID and the IDs of the
-     previously assigned domains can be assigned to another mediated matrix
-     device. If an APQN is assigned to another mediated matrix device, the
-     operation will terminate with an error (EADDRINUSE).
+   * No APQN that can be derived from the Cartesian product of the APID of the
+     adapter being assigned and the APQIs of the previously assigned domains can
+     be assigned to another matrix mediated device. If even one APQN is assigned
+     to another matrix mediated device, the operation will terminate with an
+     error (EADDRINUSE).
 
    In order to successfully assign a domain:
 
@@ -794,17 +871,18 @@ Securing the APQNs for our example
      higher than the maximum is specified, the operation will terminate with
      an error (ENODEV).
 
-   * All APQNs that can be derived from the domain ID and the IDs of
-     the previously assigned adapters must be bound to the vfio_ap device
-     driver. If no domains have yet been assigned, then there must be at least
-     one APQN with the specified APQI bound to the vfio_ap driver. If no such
-     APQNs are bound to the driver, the operation will terminate with an
-     error (EADDRNOTAVAIL).
+   * All APQNs that can be derived from the Cartesian product of the APQI of the
+     domain being assigned and the APIDs of the previously assigned adapters
+     must be available to the vfio_ap device driver as specified in the sysfs
+     /sys/bus/ap/apmask and /sys/bus/ap/aqmask attribute files. If even one APQN
+     is reserved for use by the host device driver, the operation will terminate
+     with an error (EADDRNOTAVAIL).
 
-     No APQN that can be derived from the domain ID and the IDs of the
-     previously assigned adapters can be assigned to another mediated matrix
-     device. If an APQN is assigned to another mediated matrix device, the
-     operation will terminate with an error (EADDRINUSE).
+   * No APQN that can be derived from the Cartesian product of the APQI of the
+     domain being assigned and the APIDs of the previously assigned adapters can
+     be assigned to another matrix mediated device. If even one APQN is assigned
+     to another matrix mediated device, the operation will terminate with an
+     error (EADDRINUSE).
 
    In order to successfully assign a control domain, the domain number
    specified must represent a value from 0 up to the maximum domain number
@@ -813,22 +891,22 @@ Securing the APQNs for our example
 
 5. Start Guest1::
 
-     /usr/bin/qemu-system-s390x ... -cpu host,ap=on,apqci=on,apft=on \
+     /usr/bin/qemu-system-s390x ... -cpu host,ap=on,apqci=on,apft=on,apqi=on \
 	-device vfio-ap,sysfsdev=/sys/devices/vfio_ap/matrix/$uuid1 ...
 
 6. Start Guest2::
 
-     /usr/bin/qemu-system-s390x ... -cpu host,ap=on,apqci=on,apft=on \
+     /usr/bin/qemu-system-s390x ... -cpu host,ap=on,apqci=on,apft=on,apqi=on \
 	-device vfio-ap,sysfsdev=/sys/devices/vfio_ap/matrix/$uuid2 ...
 
 7. Start Guest3::
 
-     /usr/bin/qemu-system-s390x ... -cpu host,ap=on,apqci=on,apft=on \
+     /usr/bin/qemu-system-s390x ... -cpu host,ap=on,apqci=on,apft=on,apqi=on \
 	-device vfio-ap,sysfsdev=/sys/devices/vfio_ap/matrix/$uuid3 ...
 
-When the guest is shut down, the mediated matrix devices may be removed.
+When the guest is shut down, the matrix mediated devices may be removed.
 
-Using our example again, to remove the mediated matrix device $uuid1::
+Using our example again, to remove the matrix mediated device $uuid1::
 
    /sys/devices/vfio_ap/matrix/
       --- [mdev_supported_types]
@@ -851,16 +929,146 @@ remove it if no guest will use it during the remaining lifetime of the linux
 host. If the mdev matrix device is removed, one may want to also reconfigure
 the pool of adapters and queues reserved for use by the default drivers.
 
+Hot plug support:
+=================
+An adapter, domain or control domain may be hot plugged into a running KVM
+guest by assigning it to the matrix mediated device being used by the guest.
+Control domains will always be hot plugged; however, an adapter or domain will
+be hot plugged only if each new APQN resulting from its assignment
+references a queue device bound to the vfio_ap device driver as described
+below.
+
+When an adapter is assigned to a matrix mediated device in use by a KVM guest:
+
+* If no domains have yet been plugged into the KVM guest:
+
+  Hot plug the adapter and every domain previously assigned to the mdev if each
+  APQN derived from the Cartesian product of the APID of the adapter being
+  assigned and the APQIs of the domains previously assigned references a queue
+  device bound to the vfio_ap device driver.
+
+* If one or more domains have previously been plugged into the guest:
+
+  Hot plug the adapter if each APQN derived from the Cartesian product of the
+  APID of the adapter being assigned and the APQIs of the domains already
+  plugged into the guest references a queue device bound to the vfio_ap device
+  driver.
+
+When a domain is assigned to a matrix mediated device in use by a KVM guest:
+
+* If no adapters have yet been plugged into the KVM guest:
+
+  Hot plug the domain and every adapter previously assigned to the mdev if each
+  APQN derived from the Cartesian product of the APIDs of the adapters
+  previously assigned and the APQI of the domain being assigned references a
+  queue device bound to the vfio_ap device driver.
+
+* If one or more adapters have previously been plugged into the guest:
+
+  Hot plug the domain if each APQN derived from the Cartesian product of the
+  APIDs of the adapters already plugged into the guest and the APQI of the
+  domain being assigned references a queue device bound to the vfio_ap device
+  driver.
+
+Over-provisioning of AP queues for a KVM guest:
+===============================================
+Over-provisioning is defined herein as the assignment of adapters or domains to
+a matrix mediated device that do not reference AP devices in the host's AP
+configuration. The idea here is that when the adapter or domain becomes
+available, it will be automatically hot-plugged into the KVM guest using
+the matrix mediated device to which it is assigned as long as each new APQN
+resulting from plugging it in references a queue device bound to the vfio_ap
+device driver.
+
 Limitations
 ===========
-* The KVM/kernel interfaces do not provide a way to prevent restoring an APQN
-  to the default drivers pool of a queue that is still assigned to a mediated
-  device in use by a guest. It is incumbent upon the administrator to
-  ensure there is no mediated device in use by a guest to which the APQN is
-  assigned lest the host be given access to the private data of the AP queue
-  device such as a private key configured specifically for the guest.
+Live guest migration is not supported for guests using AP devices without
+intervention by a system administrator. Before a KVM guest can be migrated,
+the matrix mediated device must be removed. Unfortunately, it can not be
+removed manually (i.e., echo 1 > /sys/devices/vfio_ap/matrix/$UUID/remove) while
+the mdev is in use by a KVM guest. If the guest is being emulated by QEMU,
+its mdev can be hot unplugged from the guest in one of two ways:
+
+1. If the KVM guest was started with libvirt, you can hot unplug the mdev via
+   the following commands:
+
+      virsh detach-device <guestname> <path-to-device-xml>
+
+      For example, to hot unplug mdev 62177883-f1bb-47f0-914d-32a22e3a8804 from
+      the guest named 'my-guest':
+
+         virsh detach-device my-guest ~/config/my-guest-hostdev.xml
+
+            The contents of my-guest-hostdev.xml:
+
+            <hostdev mode='subsystem' type='mdev' managed='no' model='vfio-ap'>
+              <source>
+                <address uuid='62177883-f1bb-47f0-914d-32a22e3a8804'/>
+              </source>
+            </hostdev>
+
+
+      virsh qemu-monitor-command <guest-name> --hmp "device_del <device-id>"
+
+      For example, to hot unplug the matrix mediated device identified on the
+      qemu command line with 'id=hostdev0' from the guest named 'my-guest':
+
+         virsh qemu-monitor-command my-guest --hmp "device_del hostdev0"
+
+2. A matrix mediated device can be hot unplugged by attaching the qemu monitor
+   to the guest and using the following qemu monitor command:
+
+      (QEMU) device_del id=<device-id>
+
+      For example, to hot unplug the matrix mediated device that was specified
+      on the qemu command line with 'id=hostdev0' when the guest was started:
+
+         (QEMU) device_del id=hostdev0
+
+After live migration of the KVM guest completes, an AP configuration can be
+restored to the KVM guest by hot plugging a matrix mediated device on the target
+system into the guest in one of two ways:
+
+1. If the KVM guest was started with libvirt, you can hot plug a matrix mediated
+   device into the guest via the following virsh commands:
+
+   virsh attach-device <guestname> <path-to-device-xml>
+
+      For example, to hot plug mdev 62177883-f1bb-47f0-914d-32a22e3a8804 into
+      the guest named 'my-guest':
+
+         virsh attach-device my-guest ~/config/my-guest-hostdev.xml
+
+            The contents of my-guest-hostdev.xml:
+
+            <hostdev mode='subsystem' type='mdev' managed='no' model='vfio-ap'>
+              <source>
+                <address uuid='62177883-f1bb-47f0-914d-32a22e3a8804'/>
+              </source>
+            </hostdev>
+
+
+   virsh qemu-monitor-command <guest-name> --hmp \
+   "device_add vfio-ap,sysfsdev=<path-to-mdev>,id=<device-id>"
+
+      For example, to hot plug the matrix mediated device
+      62177883-f1bb-47f0-914d-32a22e3a8804 into the guest named 'my-guest' with
+      device-id hostdev0:
+
+      virsh qemu-monitor-command my-guest --hmp \
+      "device_add vfio-ap,\
+      sysfsdev=/sys/devices/vfio_ap/matrix/62177883-f1bb-47f0-914d-32a22e3a8804,\
+      id=hostdev0"
+
+2. A matrix mediated device can be hot plugged by attaching the qemu monitor
+   to the guest and using the following qemu monitor command:
+
+      (qemu) device_add "vfio-ap,sysfsdev=<path-to-mdev>,id=<device-id>"
 
-* Dynamically modifying the AP matrix for a running guest (which would amount to
-  hot(un)plug of AP devices for the guest) is currently not supported
+      For example, to plug the matrix mediated device 
+      62177883-f1bb-47f0-914d-32a22e3a8804 into the guest with the device-id
+      hostdev0:
 
-* Live guest migration is not supported for guests using AP devices.
+         (qemu) device_add "vfio-ap,\
+         sysfsdev=/sys/devices/vfio_ap/matrix/62177883-f1bb-47f0-914d-32a22e3a8804,\
+         id=hostdev0"
\ No newline at end of file
-- 
2.21.1

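The APID-then-APQI filtering described in the documentation hunk above can be modelled in user space. The following is only a rough sketch of that description, not the driver's implementation: the function name `filter_guest_matrix` and the `bound` set (the APQNs whose queue devices are bound to vfio_ap) are hypothetical, and the early return for an empty assignment is an assumption the text does not spell out.

```python
def filter_guest_matrix(apids, apqis, bound):
    """Return (guest_apids, guest_apqis) for the guest's APM and AQM.

    apids, apqis: adapter and domain numbers assigned to the mdev.
    bound: set of (apid, apqi) tuples bound to the vfio_ap driver.
    """
    apids, apqis = sorted(apids), sorted(apqis)

    # Assumption: with no adapters or no domains assigned, nothing is plugged.
    if not apids or not apqis:
        return [], []

    # Pass 1, filter by APID: keep an adapter only if every APQN it forms
    # with the assigned domains references a bound queue device.
    guest_apids = [a for a in apids if all((a, q) in bound for q in apqis)]
    if guest_apids:
        # At least one adapter survived, so all assigned domains are plugged.
        return guest_apids, apqis

    # Pass 2, filter by APQI: used only when pass 1 plugged nothing.
    guest_apqis = [q for q in apqis if all((a, q) in bound for a in apids)]
    if guest_apqis:
        # At least one domain survived, so all assigned adapters are plugged.
        return apids, guest_apqis

    return [], []


if __name__ == "__main__":
    # Queue 06.0047 is not bound, so adapter 6 is filtered out in pass 1.
    print(filter_guest_matrix([5, 6], [4, 0x47], {(5, 4), (5, 0x47), (6, 4)}))
```

Under this reading, the APQI pass is a fallback that runs only when no adapter passes the APID check.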

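The apmask/aqmask example in the documentation hunk above can be checked with a short sketch. It is an illustration only and not part of the patch: the helper names `mask_to_ids` and `default_driver_apqns` are hypothetical, and it assumes each mask is the 256-bit hex string shown in the example, with ID 0 mapped to the leftmost (most significant) bit.

```python
from itertools import product


def mask_to_ids(mask_hex, nbits=256):
    """Return the IDs whose bits are set, counting bits from the left."""
    value = int(mask_hex, 16)
    return [i for i in range(nbits) if value & (1 << (nbits - 1 - i))]


def default_driver_apqns(apmask_hex, aqmask_hex):
    """Cartesian product of the APIDs and APQIs marked in the two masks."""
    return list(product(mask_to_ids(apmask_hex), mask_to_ids(aqmask_hex)))


if __name__ == "__main__":
    apmask = "0x7d" + "00" * 31   # the apmask from the example
    aqmask = "0x80" + "00" * 31   # the aqmask from the example
    print("adapters:", mask_to_ids(apmask))
    print("domains: ", mask_to_ids(aqmask))
    print("APQNs:   ", default_driver_apqns(apmask, aqmask))
```

Every (APID, APQI) pair outside the returned list would be available to non-default drivers such as vfio_ap.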

* Re: [PATCH v10 01/16] s390/vfio-ap: add version vfio_ap module
  2020-08-21 19:56 ` [PATCH v10 01/16] s390/vfio-ap: add version vfio_ap module Tony Krowiak
@ 2020-08-25 10:04   ` Cornelia Huck
  2020-08-26 14:49     ` Tony Krowiak
  0 siblings, 1 reply; 79+ messages in thread
From: Cornelia Huck @ 2020-08-25 10:04 UTC (permalink / raw)
  To: Tony Krowiak
  Cc: linux-s390, linux-kernel, kvm, freude, borntraeger, mjrosato,
	pasic, alex.williamson, kwankhede, fiuczy, frankja, david,
	imbrenda, hca, gor

On Fri, 21 Aug 2020 15:56:01 -0400
Tony Krowiak <akrowiak@linux.ibm.com> wrote:

> Let's set a version for the vfio_ap module so that automated regression
> tests can determine whether dynamic configuration tests can be run or
> not.
> 
> Signed-off-by: Tony Krowiak <akrowiak@linux.ibm.com>
> ---
>  drivers/s390/crypto/vfio_ap_drv.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/drivers/s390/crypto/vfio_ap_drv.c b/drivers/s390/crypto/vfio_ap_drv.c
> index be2520cc010b..f4ceb380dd61 100644
> --- a/drivers/s390/crypto/vfio_ap_drv.c
> +++ b/drivers/s390/crypto/vfio_ap_drv.c
> @@ -17,10 +17,12 @@
>  
>  #define VFIO_AP_ROOT_NAME "vfio_ap"
>  #define VFIO_AP_DEV_NAME "matrix"
> +#define VFIO_AP_MODULE_VERSION "1.2.0"
>  
>  MODULE_AUTHOR("IBM Corporation");
>  MODULE_DESCRIPTION("VFIO AP device driver, Copyright IBM Corp. 2018");
>  MODULE_LICENSE("GPL v2");
> +MODULE_VERSION(VFIO_AP_MODULE_VERSION);
>  
>  static struct ap_driver vfio_ap_drv;
>  

Setting a version manually has some drawbacks:
- tools wanting to check for capabilities need to keep track of which
  versions support which features
- you need to remember to actually bump the version when adding a new,
  visible feature
(- selective downstream backports may get into a pickle, but that's
arguably not your problem)

Is there no way for a tool to figure out whether this is supported?
E.g., via existence of a sysfs file, or via a known error that will
occur. If not, it's maybe better to expose known capabilities via a
generic interface.
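
As a rough illustration of the sysfs-file approach, a test tool could probe for an attribute that exists only when dynamic configuration is supported. The sketch below assumes the read-only 'guest_matrix' attribute described in the documentation patch of this series is such a marker; the function name `supports_dynamic_config` is hypothetical, and the mdev directory is passed in by the caller.

```python
import os
import tempfile


def supports_dynamic_config(mdev_dir):
    """Report whether the mdev directory exposes a 'guest_matrix' attribute."""
    return os.path.isfile(os.path.join(mdev_dir, "guest_matrix"))


if __name__ == "__main__":
    # Demonstrate against a scratch directory rather than real sysfs.
    with tempfile.TemporaryDirectory() as scratch:
        print(supports_dynamic_config(scratch))          # no attribute yet
        open(os.path.join(scratch, "guest_matrix"), "w").close()
        print(supports_dynamic_config(scratch))          # attribute present
```

A probe like this needs no version bookkeeping, but it works only once a matrix mediated device has been created.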



* Re: [PATCH v10 02/16] s390/vfio-ap: use new AP bus interface to search for queue devices
  2020-08-21 19:56 ` [PATCH v10 02/16] s390/vfio-ap: use new AP bus interface to search for queue devices Tony Krowiak
@ 2020-08-25 10:13   ` Cornelia Huck
  2020-08-27 14:24     ` Tony Krowiak
  2020-09-04  8:11   ` Christian Borntraeger
  2020-09-25  2:27   ` Halil Pasic
  2 siblings, 1 reply; 79+ messages in thread
From: Cornelia Huck @ 2020-08-25 10:13 UTC (permalink / raw)
  To: Tony Krowiak
  Cc: linux-s390, linux-kernel, kvm, freude, borntraeger, mjrosato,
	pasic, alex.williamson, kwankhede, fiuczy, frankja, david,
	imbrenda, hca, gor, kernel test robot

On Fri, 21 Aug 2020 15:56:02 -0400
Tony Krowiak <akrowiak@linux.ibm.com> wrote:

> This patch refactor's the vfio_ap device driver to use the AP bus's

s/refactor's/refactors/

> ap_get_qdev() function to retrieve the vfio_ap_queue struct containing
> information about a queue that is bound to the vfio_ap device driver.
> The bus's ap_get_qdev() function retrieves the queue device from a
> hashtable keyed by APQN. This is much more efficient than looping over
> the list of devices attached to the AP bus by several orders of
> magnitude.
> 
> Signed-off-by: Tony Krowiak <akrowiak@linux.ibm.com>
> Reported-by: kernel test robot <lkp@intel.com>
> ---
>  drivers/s390/crypto/vfio_ap_drv.c     | 27 ++-------
>  drivers/s390/crypto/vfio_ap_ops.c     | 86 +++++++++++++++------------
>  drivers/s390/crypto/vfio_ap_private.h |  8 ++-
>  3 files changed, 59 insertions(+), 62 deletions(-)
> 

(...)

> diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c
> index e0bde8518745..ad3925f04f61 100644
> --- a/drivers/s390/crypto/vfio_ap_ops.c
> +++ b/drivers/s390/crypto/vfio_ap_ops.c
> @@ -26,43 +26,26 @@
>  
>  static int vfio_ap_mdev_reset_queues(struct mdev_device *mdev);
>  
> -static int match_apqn(struct device *dev, const void *data)
> -{
> -	struct vfio_ap_queue *q = dev_get_drvdata(dev);
> -
> -	return (q->apqn == *(int *)(data)) ? 1 : 0;
> -}
> -
>  /**
> - * vfio_ap_get_queue: Retrieve a queue with a specific APQN from a list
> - * @matrix_mdev: the associated mediated matrix
> + * vfio_ap_get_queue: Retrieve a queue with a specific APQN.
>   * @apqn: The queue APQN
>   *
> - * Retrieve a queue with a specific APQN from the list of the
> - * devices of the vfio_ap_drv.
> - * Verify that the APID and the APQI are set in the matrix.
> + * Retrieve a queue with a specific APQN from the AP queue devices attached to
> + * the AP bus.
>   *
> - * Returns the pointer to the associated vfio_ap_queue
> + * Returns the pointer to the vfio_ap_queue with the specified APQN, or NULL.
>   */
> -static struct vfio_ap_queue *vfio_ap_get_queue(
> -					struct ap_matrix_mdev *matrix_mdev,
> -					int apqn)
> +static struct vfio_ap_queue *vfio_ap_get_queue(unsigned long apqn)
>  {
> +	struct ap_queue *queue;
>  	struct vfio_ap_queue *q;
> -	struct device *dev;
>  
> -	if (!test_bit_inv(AP_QID_CARD(apqn), matrix_mdev->matrix.apm))
> -		return NULL;
> -	if (!test_bit_inv(AP_QID_QUEUE(apqn), matrix_mdev->matrix.aqm))

I think you should add some explanation to the patch description why
testing the matrix bitmasks is not needed anymore.

> +	queue = ap_get_qdev(apqn);
> +	if (!queue)
>  		return NULL;
>  
> -	dev = driver_find_device(&matrix_dev->vfio_ap_drv->driver, NULL,
> -				 &apqn, match_apqn);
> -	if (!dev)
> -		return NULL;
> -	q = dev_get_drvdata(dev);
> -	q->matrix_mdev = matrix_mdev;
> -	put_device(dev);
> +	q = dev_get_drvdata(&queue->ap_dev.device);
> +	put_device(&queue->ap_dev.device);
>  
>  	return q;
>  }

(...)



* Re: [PATCH v10 03/16] s390/vfio-ap: manage link between queue struct and matrix mdev
  2020-08-21 19:56 ` [PATCH v10 03/16] s390/vfio-ap: manage link between queue struct and matrix mdev Tony Krowiak
@ 2020-08-25 10:25   ` Cornelia Huck
  2020-08-28 23:05     ` Tony Krowiak
  2020-09-04  8:15   ` Christian Borntraeger
  2020-09-25  7:58   ` Halil Pasic
  2 siblings, 1 reply; 79+ messages in thread
From: Cornelia Huck @ 2020-08-25 10:25 UTC (permalink / raw)
  To: Tony Krowiak
  Cc: linux-s390, linux-kernel, kvm, freude, borntraeger, mjrosato,
	pasic, alex.williamson, kwankhede, fiuczy, frankja, david,
	imbrenda, hca, gor

On Fri, 21 Aug 2020 15:56:03 -0400
Tony Krowiak <akrowiak@linux.ibm.com> wrote:

> Let's create links between each queue device bound to the vfio_ap device
> driver and the matrix mdev to which the queue is assigned. The idea is to
> facilitate efficient retrieval of the objects representing the queue
> devices and matrix mdevs as well as to verify that a queue assigned to
> a matrix mdev is bound to the driver.
> 
> The links will be created as follows:
> 
>    * When the queue device is probed, if its APQN is assigned to a matrix
>      mdev, the structures representing the queue device and the matrix mdev
>      will be linked.
> 
>    * When an adapter or domain is assigned to a matrix mdev, for each new
>      APQN assigned that references a queue device bound to the vfio_ap
>      device driver, the structures representing the queue device and the
>      matrix mdev will be linked.
> 
> The links will be removed as follows:
> 
>    * When the queue device is removed, if its APQN is assigned to a matrix
>      mdev, the structures representing the queue device and the matrix mdev
>      will be unlinked.
> 
>    * When an adapter or domain is unassigned from a matrix mdev, for each
>      APQN unassigned that references a queue device bound to the vfio_ap
>      device driver, the structures representing the queue device and the
>      matrix mdev will be unlinked.
> 
> Signed-off-by: Tony Krowiak <akrowiak@linux.ibm.com>
> ---
>  drivers/s390/crypto/vfio_ap_ops.c     | 132 +++++++++++++++++++++++++-
>  drivers/s390/crypto/vfio_ap_private.h |   2 +
>  2 files changed, 129 insertions(+), 5 deletions(-)
> 

(...)

> @@ -548,6 +557,87 @@ static int vfio_ap_mdev_verify_no_sharing(struct ap_matrix_mdev *matrix_mdev)
>  	return 0;
>  }
>  
> +enum qlink_type {

<bikeshed>I think this is less of a type, and more of an action, so
maybe call this 'qlink_action' (and the function parameter below
'action'?)</bikeshed>

> +	LINK_APID,
> +	LINK_APQI,
> +	UNLINK_APID,
> +	UNLINK_APQI,
> +};
> +
> +static void vfio_ap_mdev_link_queue(struct ap_matrix_mdev *matrix_mdev,
> +				    unsigned long apid, unsigned long apqi)
> +{
> +	struct vfio_ap_queue *q;
> +
> +	q = vfio_ap_get_queue(AP_MKQID(apid, apqi));
> +	if (q) {
> +		q->matrix_mdev = matrix_mdev;
> +		hash_add(matrix_mdev->qtable,
> +			 &q->mdev_qnode, q->apqn);
> +	}
> +}
> +
> +static void vfio_ap_mdev_unlink_queue(unsigned long apid, unsigned long apqi)
> +{
> +	struct vfio_ap_queue *q;
> +
> +	q = vfio_ap_get_queue(AP_MKQID(apid, apqi));
> +	if (q) {
> +		q->matrix_mdev = NULL;
> +		hash_del(&q->mdev_qnode);
> +	}
> +}
> +
> +/**
> + * vfio_ap_mdev_link_queues
> + *
> + * @matrix_mdev: The matrix mdev to link.
> + * @type:	 The type of @qlink_id.
> + * @qlink_id:	 The APID or APQI of the queues to link.
> + *
> + * Sets or clears the links between the queues with the specified @qlink_id
> + * and the @matrix_mdev:
> + *     @type == LINK_APID: Set the links between the @matrix_mdev and the
> + *                         queues with the specified @qlink_id (APID)
> + *     @type == LINK_APQI: Set the links between the @matrix_mdev and the
> + *                         queues with the specified @qlink_id (APQI)
> + *     @type == UNLINK_APID: Clear the links between the @matrix_mdev and the
> + *                           queues with the specified @qlink_id (APID)
> + *     @type == UNLINK_APQI: Clear the links between the @matrix_mdev and the
> + *                           queues with the specified @qlink_id (APQI)
> + */
> +static void vfio_ap_mdev_link_queues(struct ap_matrix_mdev *matrix_mdev,
> +				     enum qlink_type type,
> +				     unsigned long qlink_id)
> +{
> +	unsigned long id;
> +
> +	switch (type) {
> +	case LINK_APID:
> +		for_each_set_bit_inv(id, matrix_mdev->matrix.aqm,
> +				     matrix_mdev->matrix.aqm_max + 1)
> +			vfio_ap_mdev_link_queue(matrix_mdev, qlink_id, id);
> +		break;
> +	case UNLINK_APID:
> +		for_each_set_bit_inv(id, matrix_mdev->matrix.aqm,
> +				     matrix_mdev->matrix.aqm_max + 1)
> +			vfio_ap_mdev_unlink_queue(qlink_id, id);
> +		break;
> +	case LINK_APQI:
> +		for_each_set_bit_inv(id, matrix_mdev->matrix.apm,
> +				     matrix_mdev->matrix.apm_max + 1)
> +			vfio_ap_mdev_link_queue(matrix_mdev, id, qlink_id);
> +		break;
> +	case UNLINK_APQI:
> +		for_each_set_bit_inv(id, matrix_mdev->matrix.apm,
> +				     matrix_mdev->matrix.apm_max + 1)
> +			vfio_ap_mdev_unlink_queue(id, qlink_id);
> +		break;
> +	default:
> +		WARN_ON_ONCE(1);
> +	}
> +}
> +

(...)

I have not reviewed this deeply, but at a glance, it seems fine.
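
For readers following the link/unlink logic quoted above, here is a small
userspace model of it. This is a sketch, not the driver code: it assumes a toy
struct matrix with one flag per APID/APQI in place of the kernel's bitmaps, and
a linked[][] table in place of the qtable hashtable. The set_links() name and
the struct are illustrative only; the qlink_action name follows the rename
suggested above.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define NIDS 256 /* toy limit: APIDs and APQIs 0..255 */

enum qlink_action { LINK_APID, LINK_APQI, UNLINK_APID, UNLINK_APQI };

/*
 * Toy stand-in for a matrix mdev: which adapters and domains are assigned,
 * and which (APID, APQI) pairs are currently linked to it.
 */
struct matrix {
	bool apm[NIDS];          /* assigned adapters (APIDs) */
	bool aqm[NIDS];          /* assigned domains (APQIs) */
	bool linked[NIDS][NIDS]; /* linked[apid][apqi] */
};

/*
 * Set or clear the links for every queue that shares the given id.
 * For the *_APID actions, id is an APID and we walk the assigned domains;
 * for the *_APQI actions, id is an APQI and we walk the assigned adapters.
 */
static void set_links(struct matrix *m, enum qlink_action action,
		      unsigned int id)
{
	bool by_apid = (action == LINK_APID || action == UNLINK_APID);
	bool link = (action == LINK_APID || action == LINK_APQI);
	unsigned int other;

	assert(m != NULL && id < NIDS);

	for (other = 0; other < NIDS; other++) {
		if (by_apid) {
			if (m->aqm[other])
				m->linked[id][other] = link;
		} else {
			if (m->apm[other])
				m->linked[other][id] = link;
		}
	}
}
```

In this model both UNLINK_* actions clear links, which matches what the commit
message describes for unassigning an adapter or domain.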



* Re: [PATCH v10 16/16] s390/vfio-ap: update docs to include dynamic config support
  2020-08-21 19:56 ` [PATCH v10 16/16] s390/vfio-ap: update docs to include dynamic config support Tony Krowiak
@ 2020-08-25 10:45   ` Cornelia Huck
  2020-08-31 18:34     ` Tony Krowiak
  2020-09-28  2:48   ` Halil Pasic
  1 sibling, 1 reply; 79+ messages in thread
From: Cornelia Huck @ 2020-08-25 10:45 UTC (permalink / raw)
  To: Tony Krowiak
  Cc: linux-s390, linux-kernel, kvm, freude, borntraeger, mjrosato,
	pasic, alex.williamson, kwankhede, fiuczy, frankja, david,
	imbrenda, hca, gor

On Fri, 21 Aug 2020 15:56:16 -0400
Tony Krowiak <akrowiak@linux.ibm.com> wrote:

> Update the documentation in vfio-ap.rst to include information about the
> AP dynamic configuration support (i.e., hot plug of adapters, domains
> and control domains via the matrix mediated device's sysfs assignment
> attributes).
> 
> Signed-off-by: Tony Krowiak <akrowiak@linux.ibm.com>
> ---
>  Documentation/s390/vfio-ap.rst | 362 ++++++++++++++++++++++++++-------
>  1 file changed, 285 insertions(+), 77 deletions(-)
> 
> diff --git a/Documentation/s390/vfio-ap.rst b/Documentation/s390/vfio-ap.rst
> index e15436599086..8907aeca8fb7 100644
> --- a/Documentation/s390/vfio-ap.rst
> +++ b/Documentation/s390/vfio-ap.rst
> @@ -253,7 +253,7 @@ The process for reserving an AP queue for use by a KVM guest is:
>  1. The administrator loads the vfio_ap device driver
>  2. The vfio-ap driver during its initialization will register a single 'matrix'
>     device with the device core. This will serve as the parent device for
> -   all mediated matrix devices used to configure an AP matrix for a guest.
> +   all matrix mediated devices used to configure an AP matrix for a guest.

This (and many other changes here) seems to be unrelated to the new
feature. Split that out into a separate patch that can be applied right
away? That would make this patch smaller and easier to review; it's
hard to figure out which parts deal with the new feature, and which parts
simply got an update.

Also, do you want to do similar wording changes in the QEMU
documentation for vfio-ap?

>  3. The /sys/devices/vfio_ap/matrix device is created by the device core
>  4. The vfio_ap device driver will register with the AP bus for AP queue devices
>     of type 10 and higher (CEX4 and newer). The driver will provide the vfio_ap

(...)

> @@ -435,6 +481,10 @@ available to a KVM guest via the following CPU model features:
>     can be made available to the guest only if it is available on the host (i.e.,
>     facility bit 12 is set).
>  
> +4. apqi: Indicates AP queue interrupts are available on the guest. This facility
> +   can be made available to the guest only if it is available on the host (i.e.,
> +   facility bit 65 is set).
> +
>  Note: If the user chooses to specify a CPU model different than the 'host'
>  model to QEMU, the CPU model features and facilities need to be turned on
>  explicitly; for example::
> @@ -444,7 +494,7 @@ explicitly; for example::
>  A guest can be precluded from using AP features/facilities by turning them off
>  explicitly; for example::
>  
> -     /usr/bin/qemu-system-s390x ... -cpu host,ap=off,apqci=off,apft=off
> +     /usr/bin/qemu-system-s390x ... -cpu host,ap=off,apqci=off,apft=off,apqi=off

Isn't that an already existing facility that was simply lacking
documentation? If yes, split it off?

>  
>  Note: If the APFT facility is turned off (apft=off) for the guest, the guest
>  will not see any AP devices. The zcrypt device drivers that register for type 10

(...)



* Re: [PATCH v10 01/16] s390/vfio-ap: add version vfio_ap module
  2020-08-25 10:04   ` Cornelia Huck
@ 2020-08-26 14:49     ` Tony Krowiak
  2020-08-27 10:32       ` Cornelia Huck
  0 siblings, 1 reply; 79+ messages in thread
From: Tony Krowiak @ 2020-08-26 14:49 UTC (permalink / raw)
  To: Cornelia Huck
  Cc: linux-s390, linux-kernel, kvm, freude, borntraeger, mjrosato,
	pasic, alex.williamson, kwankhede, fiuczy, frankja, david,
	imbrenda, hca, gor



On 8/25/20 6:04 AM, Cornelia Huck wrote:
> On Fri, 21 Aug 2020 15:56:01 -0400
> Tony Krowiak <akrowiak@linux.ibm.com> wrote:
>
>> Let's set a version for the vfio_ap module so that automated regression
>> tests can determine whether dynamic configuration tests can be run or
>> not.
>>
>> Signed-off-by: Tony Krowiak <akrowiak@linux.ibm.com>
>> ---
>>   drivers/s390/crypto/vfio_ap_drv.c | 2 ++
>>   1 file changed, 2 insertions(+)
>>
>> diff --git a/drivers/s390/crypto/vfio_ap_drv.c b/drivers/s390/crypto/vfio_ap_drv.c
>> index be2520cc010b..f4ceb380dd61 100644
>> --- a/drivers/s390/crypto/vfio_ap_drv.c
>> +++ b/drivers/s390/crypto/vfio_ap_drv.c
>> @@ -17,10 +17,12 @@
>>   
>>   #define VFIO_AP_ROOT_NAME "vfio_ap"
>>   #define VFIO_AP_DEV_NAME "matrix"
>> +#define VFIO_AP_MODULE_VERSION "1.2.0"
>>   
>>   MODULE_AUTHOR("IBM Corporation");
>>   MODULE_DESCRIPTION("VFIO AP device driver, Copyright IBM Corp. 2018");
>>   MODULE_LICENSE("GPL v2");
>> +MODULE_VERSION(VFIO_AP_MODULE_VERSION);
>>   
>>   static struct ap_driver vfio_ap_drv;
>>   
> Setting a version manually has some drawbacks:
> - tools wanting to check for capabilities need to keep track which
>    versions support which features
> - you need to remember to actually bump the version when adding a new,
>    visible feature
> (- selective downstream backports may get into a pickle, but that's
> arguably not your problem)
>
> Is there no way for a tool to figure out whether this is supported?
> E.g., via existence of a sysfs file, or via a known error that will
> occur. If not, it's maybe better to expose known capabilities via a
> generic interface.

This patch series introduces a new mediated device sysfs attribute,
guest_matrix, so the automated tests could check for the existence
of that interface. The problem I have with that is it will work for
this version of the vfio_ap device driver - which may be all that is
ever needed - but does not account for future enhancements
which may need to be detected by tooling or automated tests.
It seems to me that regardless of how a tool detects whether
a feature is supported or not, it will have to keep track of that
somehow.

Can you provide more details about this generic interface of
which you speak?

>



* Re: [PATCH v10 01/16] s390/vfio-ap: add version vfio_ap module
  2020-08-26 14:49     ` Tony Krowiak
@ 2020-08-27 10:32       ` Cornelia Huck
  2020-08-27 14:39         ` Tony Krowiak
  0 siblings, 1 reply; 79+ messages in thread
From: Cornelia Huck @ 2020-08-27 10:32 UTC (permalink / raw)
  To: Tony Krowiak
  Cc: linux-s390, linux-kernel, kvm, freude, borntraeger, mjrosato,
	pasic, alex.williamson, kwankhede, fiuczy, frankja, david,
	imbrenda, hca, gor

On Wed, 26 Aug 2020 10:49:47 -0400
Tony Krowiak <akrowiak@linux.ibm.com> wrote:

> On 8/25/20 6:04 AM, Cornelia Huck wrote:
> > On Fri, 21 Aug 2020 15:56:01 -0400
> > Tony Krowiak <akrowiak@linux.ibm.com> wrote:
> >  
> >> Let's set a version for the vfio_ap module so that automated regression
> >> tests can determine whether dynamic configuration tests can be run or
> >> not.
> >>
> >> Signed-off-by: Tony Krowiak <akrowiak@linux.ibm.com>
> >> ---
> >>   drivers/s390/crypto/vfio_ap_drv.c | 2 ++
> >>   1 file changed, 2 insertions(+)
> >>
> >> diff --git a/drivers/s390/crypto/vfio_ap_drv.c b/drivers/s390/crypto/vfio_ap_drv.c
> >> index be2520cc010b..f4ceb380dd61 100644
> >> --- a/drivers/s390/crypto/vfio_ap_drv.c
> >> +++ b/drivers/s390/crypto/vfio_ap_drv.c
> >> @@ -17,10 +17,12 @@
> >>   
> >>   #define VFIO_AP_ROOT_NAME "vfio_ap"
> >>   #define VFIO_AP_DEV_NAME "matrix"
> >> +#define VFIO_AP_MODULE_VERSION "1.2.0"
> >>   
> >>   MODULE_AUTHOR("IBM Corporation");
> >>   MODULE_DESCRIPTION("VFIO AP device driver, Copyright IBM Corp. 2018");
> >>   MODULE_LICENSE("GPL v2");
> >> +MODULE_VERSION(VFIO_AP_MODULE_VERSION);
> >>   
> >>   static struct ap_driver vfio_ap_drv;
> >>     
> > Setting a version manually has some drawbacks:
> > - tools wanting to check for capabilities need to keep track which
> >    versions support which features
> > - you need to remember to actually bump the version when adding a new,
> >    visible feature
> > (- selective downstream backports may get into a pickle, but that's
> > arguably not your problem)
> >
> > Is there no way for a tool to figure out whether this is supported?
> > E.g., via existence of a sysfs file, or via a known error that will
> > occur. If not, it's maybe better to expose known capabilities via a
> > generic interface.  
> 
> This patch series introduces a new mediated device sysfs attribute,
> guest_matrix, so the automated tests could check for the existence
> of that interface. The problem I have with that is it will work for
> this version of the vfio_ap device driver - which may be all that is
> ever needed - but does not account for future enhancements
> which may need to be detected by tooling or automated tests.
> It seems to me that regardless of how a tool detects whether
> a feature is supported or not, it will have to keep track of that
> somehow.

Which enhancements? If you change the interface in an incompatible way,
you have a different problem anyway. If someone trying to use the
enhanced version of the interface gets an error on a kernel providing
an older version of the interface, that's a reasonable way to discover
support.

I think "discover device driver capabilities by probing" is less
burdensome and error prone than trying to match up capabilities with a
version number. If you expose a version number, a tool would still have
to probe that version number, and then consult with a list of features
per version, which can easily go out of sync.

> Can you provide more details about this generic interface of
> which you speak?

If that is really needed, I'd probably do a driver sysfs attribute that
exposes a list of documented capabilities (as integer values, or as a
bit.) But since tools can simply check for guest_matrix to find out
about support for this feature here, it seems like overkill to me --
unless you have a multitude of features waiting in queue that need to
be made discoverable.
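
To make the alternative concrete, here is a sketch of how a tool might consume
such a capabilities attribute if it were exposed as a bitmask. Everything here
is hypothetical: the driver has no such attribute, and the enum vfio_ap_cap
values and the cap_is_set() helper are invented for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical capability bits; no such list exists in the driver. */
enum vfio_ap_cap {
	VFIO_AP_CAP_DYNAMIC_CONFIG = 0,
};

/*
 * Parse the text of a (hypothetical) capabilities attribute, e.g. "0x1\n",
 * and report whether the given capability bit is set.
 * Returns false on malformed input.
 */
static bool cap_is_set(const char *text, enum vfio_ap_cap cap)
{
	unsigned long long mask;
	char *end;

	assert(text != NULL);
	assert((unsigned int)cap < 64);

	errno = 0;
	mask = strtoull(text, &end, 0);
	if (errno != 0 || end == text)
		return false; /* overflow or no digits at all */
	if (*end != '\0' && *end != '\n')
		return false; /* trailing garbage */

	return ((mask >> cap) & 1ULL) != 0;
}
```

Even in this form the tool has to know what each bit means, so the list of bits
would need to be documented and kept stable.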



* Re: [PATCH v10 02/16] s390/vfio-ap: use new AP bus interface to search for queue devices
  2020-08-25 10:13   ` Cornelia Huck
@ 2020-08-27 14:24     ` Tony Krowiak
  2020-08-28  8:13       ` Cornelia Huck
  2020-09-25  2:11       ` Halil Pasic
  0 siblings, 2 replies; 79+ messages in thread
From: Tony Krowiak @ 2020-08-27 14:24 UTC (permalink / raw)
  To: Cornelia Huck
  Cc: linux-s390, linux-kernel, kvm, freude, borntraeger, mjrosato,
	pasic, alex.williamson, kwankhede, fiuczy, frankja, david,
	imbrenda, hca, gor, kernel test robot



On 8/25/20 6:13 AM, Cornelia Huck wrote:
> On Fri, 21 Aug 2020 15:56:02 -0400
> Tony Krowiak<akrowiak@linux.ibm.com>  wrote:
>
>> This patch refactor's the vfio_ap device driver to use the AP bus's
> s/refactor's/refactors/

Of course, what was I thinking?:)

>> ap_get_qdev() function to retrieve the vfio_ap_queue struct containing
>> information about a queue that is bound to the vfio_ap device driver.
>> The bus's ap_get_qdev() function retrieves the queue device from a
>> hashtable keyed by APQN. This is much more efficient than looping over
>> the list of devices attached to the AP bus by several orders of
>> magnitude.
>>
>> Signed-off-by: Tony Krowiak<akrowiak@linux.ibm.com>
>> Reported-by: kernel test robot<lkp@intel.com>
>> ---
>>   drivers/s390/crypto/vfio_ap_drv.c     | 27 ++-------
>>   drivers/s390/crypto/vfio_ap_ops.c     | 86 +++++++++++++++------------
>>   drivers/s390/crypto/vfio_ap_private.h |  8 ++-
>>   3 files changed, 59 insertions(+), 62 deletions(-)
>>
> (...)
>
>> diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c
>> index e0bde8518745..ad3925f04f61 100644
>> --- a/drivers/s390/crypto/vfio_ap_ops.c
>> +++ b/drivers/s390/crypto/vfio_ap_ops.c
>> @@ -26,43 +26,26 @@
>>   
>>   static int vfio_ap_mdev_reset_queues(struct mdev_device *mdev);
>>   
>> -static int match_apqn(struct device *dev, const void *data)
>> -{
>> -	struct vfio_ap_queue *q = dev_get_drvdata(dev);
>> -
>> -	return (q->apqn == *(int *)(data)) ? 1 : 0;
>> -}
>> -
>>   /**
>> - * vfio_ap_get_queue: Retrieve a queue with a specific APQN from a list
>> - * @matrix_mdev: the associated mediated matrix
>> + * vfio_ap_get_queue: Retrieve a queue with a specific APQN.
>>    * @apqn: The queue APQN
>>    *
>> - * Retrieve a queue with a specific APQN from the list of the
>> - * devices of the vfio_ap_drv.
>> - * Verify that the APID and the APQI are set in the matrix.
>> + * Retrieve a queue with a specific APQN from the AP queue devices attached to
>> + * the AP bus.
>>    *
>> - * Returns the pointer to the associated vfio_ap_queue
>> + * Returns the pointer to the vfio_ap_queue with the specified APQN, or NULL.
>>    */
>> -static struct vfio_ap_queue *vfio_ap_get_queue(
>> -					struct ap_matrix_mdev *matrix_mdev,
>> -					int apqn)
>> +static struct vfio_ap_queue *vfio_ap_get_queue(unsigned long apqn)
>>   {
>> +	struct ap_queue *queue;
>>   	struct vfio_ap_queue *q;
>> -	struct device *dev;
>>   
>> -	if (!test_bit_inv(AP_QID_CARD(apqn), matrix_mdev->matrix.apm))
>> -		return NULL;
>> -	if (!test_bit_inv(AP_QID_QUEUE(apqn), matrix_mdev->matrix.aqm))
> I think you should add some explanation to the patch description why
> testing the matrix bitmasks is not needed anymore.

As a result of this comment, I took a closer look at the code to
determine the reason for eliminating the matrix_mdev
parameter. The reason is that the code below (i.e., find the device
and get the driver data) was also repeated in the vfio_ap_irq_disable_apqn()
function, so I replaced it with a call to the function above; however, the
vfio_ap_irq_disable_apqn() function does not have a reference to the
matrix_mdev, so I eliminated the matrix_mdev parameter. Note that the
vfio_ap_irq_disable_apqn() is called for each APQN assigned to a matrix
mdev, so there is no need to test the bitmasks there.

The other place from which the function above is called is
the handle_pqap() function which does have a reference to the
matrix_mdev. In order to ensure the integrity of the instruction
being intercepted - i.e., PQAP(AQIC) enable/disable IRQ for an
AP queue - the testing of the matrix bitmasks probably ought to
be performed, so it will be done there instead of in the
vfio_ap_get_queue() function above.
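
As an illustration of the check being moved into the caller, here is a
userspace model of testing an APQN against the two matrix bitmasks. It is only
a sketch: the macros mirror what I understand to be the AP bus APQN layout
(APID in the upper byte, APQI in the lower byte), the byte-array bitmap models
the left-to-right bit numbering used by the _inv bit operations, and the names
MKQID, test_bit_msb0() and apqn_assigned() are made up for this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed APQN layout: APID in bits 8..15, APQI in bits 0..7. */
#define MKQID(card, queue)  ((((card) & 0xffu) << 8) | ((queue) & 0xffu))
#define QID_CARD(qid)       (((qid) >> 8) & 0xffu)
#define QID_QUEUE(qid)      ((qid) & 0xffu)

#define MASK_BYTES 32 /* 256 bits */

/* Model of an "inverted" bitmap: bit 0 is the leftmost bit of byte 0. */
static void set_bit_msb0(unsigned int nr, uint8_t *map)
{
	assert(nr < MASK_BYTES * 8);
	map[nr / 8] |= (uint8_t)(0x80u >> (nr % 8));
}

static bool test_bit_msb0(unsigned int nr, const uint8_t *map)
{
	assert(nr < MASK_BYTES * 8);
	return (map[nr / 8] & (0x80u >> (nr % 8))) != 0;
}

/* True only if both the adapter and the domain of the APQN are assigned. */
static bool apqn_assigned(unsigned int apqn, const uint8_t *apm,
			  const uint8_t *aqm)
{
	return test_bit_msb0(QID_CARD(apqn), apm) &&
	       test_bit_msb0(QID_QUEUE(apqn), aqm);
}
```

A caller that has the matrix at hand would run a check like apqn_assigned()
first and only look up the queue when it passes.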


>> +	queue = ap_get_qdev(apqn);
>> +	if (!queue)
>>   		return NULL;
>>   
>> -	dev = driver_find_device(&matrix_dev->vfio_ap_drv->driver, NULL,
>> -				 &apqn, match_apqn);
>> -	if (!dev)
>> -		return NULL;
>> -	q = dev_get_drvdata(dev);
>> -	q->matrix_mdev = matrix_mdev;
>> -	put_device(dev);
>> +	q = dev_get_drvdata(&queue->ap_dev.device);
>> +	put_device(&queue->ap_dev.device);
>>   
>>   	return q;
>>   }
> (...)
>



* Re: [PATCH v10 01/16] s390/vfio-ap: add version vfio_ap module
  2020-08-27 10:32       ` Cornelia Huck
@ 2020-08-27 14:39         ` Tony Krowiak
  2020-08-28  8:10           ` Cornelia Huck
  0 siblings, 1 reply; 79+ messages in thread
From: Tony Krowiak @ 2020-08-27 14:39 UTC (permalink / raw)
  To: Cornelia Huck
  Cc: linux-s390, linux-kernel, kvm, freude, borntraeger, mjrosato,
	pasic, alex.williamson, kwankhede, fiuczy, frankja, david,
	imbrenda, hca, gor



On 8/27/20 6:32 AM, Cornelia Huck wrote:
> On Wed, 26 Aug 2020 10:49:47 -0400
> Tony Krowiak <akrowiak@linux.ibm.com> wrote:
>
>> On 8/25/20 6:04 AM, Cornelia Huck wrote:
>>> On Fri, 21 Aug 2020 15:56:01 -0400
>>> Tony Krowiak <akrowiak@linux.ibm.com> wrote:
>>>   
>>>> Let's set a version for the vfio_ap module so that automated regression
>>>> tests can determine whether dynamic configuration tests can be run or
>>>> not.
>>>>
>>>> Signed-off-by: Tony Krowiak <akrowiak@linux.ibm.com>
>>>> ---
>>>>    drivers/s390/crypto/vfio_ap_drv.c | 2 ++
>>>>    1 file changed, 2 insertions(+)
>>>>
>>>> diff --git a/drivers/s390/crypto/vfio_ap_drv.c b/drivers/s390/crypto/vfio_ap_drv.c
>>>> index be2520cc010b..f4ceb380dd61 100644
>>>> --- a/drivers/s390/crypto/vfio_ap_drv.c
>>>> +++ b/drivers/s390/crypto/vfio_ap_drv.c
>>>> @@ -17,10 +17,12 @@
>>>>    
>>>>    #define VFIO_AP_ROOT_NAME "vfio_ap"
>>>>    #define VFIO_AP_DEV_NAME "matrix"
>>>> +#define VFIO_AP_MODULE_VERSION "1.2.0"
>>>>    
>>>>    MODULE_AUTHOR("IBM Corporation");
>>>>    MODULE_DESCRIPTION("VFIO AP device driver, Copyright IBM Corp. 2018");
>>>>    MODULE_LICENSE("GPL v2");
>>>> +MODULE_VERSION(VFIO_AP_MODULE_VERSION);
>>>>    
>>>>    static struct ap_driver vfio_ap_drv;
>>>>      
>>> Setting a version manually has some drawbacks:
>>> - tools wanting to check for capabilities need to keep track which
>>>     versions support which features
>>> - you need to remember to actually bump the version when adding a new,
>>>     visible feature
>>> (- selective downstream backports may get into a pickle, but that's
>>> arguably not your problem)
>>>
>>> Is there no way for a tool to figure out whether this is supported?
>>> E.g., via existence of a sysfs file, or via a known error that will
>>> occur. If not, it's maybe better to expose known capabilities via a
>>> generic interface.
>> This patch series introduces a new mediated device sysfs attribute,
>> guest_matrix, so the automated tests could check for the existence
>> of that interface. The problem I have with that is it will work for
>> this version of the vfio_ap device driver - which may be all that is
>> ever needed - but does not account for future enhancements
>> which may need to be detected by tooling or automated tests.
>> It seems to me that regardless of how a tool detects whether
>> a feature is supported or not, it will have to keep track of that
>> somehow.
> Which enhancements? If you change the interface in an incompatible way,
> you have a different problem anyway. If someone trying to use the
> enhanced version of the interface gets an error on a kernel providing
> an older version of the interface, that's a reasonable way to discover
> support.
>
> I think "discover device driver capabilities by probing" is less
> burdensome and error prone than trying to match up capabilities with a
> version number. If you expose a version number, a tool would still have
> to probe that version number, and then consult with a list of features
> per version, which can easily go out of sync.
>
>> Can you provide more details about this generic interface of
>> which you speak?
> If that is really needed, I'd probably do a driver sysfs attribute that
> exposes a list of documented capabilities (as integer values, or as a
> bit.) But since tools can simply check for guest_matrix to find out
> about support for this feature here, it seems like overkill to me --
> unless you have a multitude of features waiting in queue that need to
> be made discoverable.

Currently there are two tools that probably need to be aware of
the changes to these assignment interfaces:
* The hades test framework has tests that will fail if run against
    these patches; those tests should be skipped if over-provisioning
    is allowed. There are also tests under development to test the
    function introduced by these patches that will fail if run against
    an older version of the driver. These tests should be skipped in
    that case.
* There is a tool under development for configuring AP matrix
    mediated devices that probably needs to be aware of the change
    introduced by this series.

Since a tool would have to first determine whether a new sysfs
interface documenting facilities is available and it would only
expose one facility at this point, it seems reasonable for these tools
to check for the sysfs guest_matrix attribute to discern whether
over-provisioning is available or not. I'll go ahead and remove this
patch from the series.

>



* Re: [PATCH v10 01/16] s390/vfio-ap: add version vfio_ap module
  2020-08-27 14:39         ` Tony Krowiak
@ 2020-08-28  8:10           ` Cornelia Huck
  0 siblings, 0 replies; 79+ messages in thread
From: Cornelia Huck @ 2020-08-28  8:10 UTC (permalink / raw)
  To: Tony Krowiak
  Cc: linux-s390, linux-kernel, kvm, freude, borntraeger, mjrosato,
	pasic, alex.williamson, kwankhede, fiuczy, frankja, david,
	imbrenda, hca, gor

On Thu, 27 Aug 2020 10:39:07 -0400
Tony Krowiak <akrowiak@linux.ibm.com> wrote:

> Currently there are two tools that probably need to be aware of
> the changes to these assignment interfaces:
> * The hades test framework has tests that will fail if run against
>     these patches that should be skipped if over-provisioning is
>     allowed. There are also tests under development to test the
>     function introduced by these patches that will fail if run against
>     an older version of the driver. These tests should be skipped in
>     that case.
> * There is a tool under development for configuring AP matrix
>     mediated devices that probably needs to be aware of the change
>     introduced by this series.
> 
> Since a tool would have to first determine whether a new sysfs
> interface documenting facilities is available and it would only
> expose one facility at this point, it seems reasonable for these tools
> to check for the sysfs guest_matrix attribute to discern whether
> over-provisioning is available or not. I'll go ahead and remove this
> patch from the series.

Thanks for the explanation, that seems reasonable to me.



* Re: [PATCH v10 02/16] s390/vfio-ap: use new AP bus interface to search for queue devices
  2020-08-27 14:24     ` Tony Krowiak
@ 2020-08-28  8:13       ` Cornelia Huck
  2020-08-28 15:10         ` Tony Krowiak
  2020-09-25  2:11       ` Halil Pasic
  1 sibling, 1 reply; 79+ messages in thread
From: Cornelia Huck @ 2020-08-28  8:13 UTC (permalink / raw)
  To: Tony Krowiak
  Cc: linux-s390, linux-kernel, kvm, freude, borntraeger, mjrosato,
	pasic, alex.williamson, kwankhede, fiuczy, frankja, david,
	imbrenda, hca, gor, kernel test robot

On Thu, 27 Aug 2020 10:24:07 -0400
Tony Krowiak <akrowiak@linux.ibm.com> wrote:

> On 8/25/20 6:13 AM, Cornelia Huck wrote:
> > On Fri, 21 Aug 2020 15:56:02 -0400
> > Tony Krowiak<akrowiak@linux.ibm.com>  wrote:

> >>   /**
> >> - * vfio_ap_get_queue: Retrieve a queue with a specific APQN from a list
> >> - * @matrix_mdev: the associated mediated matrix
> >> + * vfio_ap_get_queue: Retrieve a queue with a specific APQN.
> >>    * @apqn: The queue APQN
> >>    *
> >> - * Retrieve a queue with a specific APQN from the list of the
> >> - * devices of the vfio_ap_drv.
> >> - * Verify that the APID and the APQI are set in the matrix.
> >> + * Retrieve a queue with a specific APQN from the AP queue devices attached to
> >> + * the AP bus.
> >>    *
> >> - * Returns the pointer to the associated vfio_ap_queue
> >> + * Returns the pointer to the vfio_ap_queue with the specified APQN, or NULL.
> >>    */
> >> -static struct vfio_ap_queue *vfio_ap_get_queue(
> >> -					struct ap_matrix_mdev *matrix_mdev,
> >> -					int apqn)
> >> +static struct vfio_ap_queue *vfio_ap_get_queue(unsigned long apqn)
> >>   {
> >> +	struct ap_queue *queue;
> >>   	struct vfio_ap_queue *q;
> >> -	struct device *dev;
> >>   
> >> -	if (!test_bit_inv(AP_QID_CARD(apqn), matrix_mdev->matrix.apm))
> >> -		return NULL;
> >> -	if (!test_bit_inv(AP_QID_QUEUE(apqn), matrix_mdev->matrix.aqm))  
> > I think you should add some explanation to the patch description why
> > testing the matrix bitmasks is not needed anymore.  
> 
> As a result of this comment, I took a closer look at the code to
> determine the reason for eliminating the matrix_mdev
> parameter. The reason is because the code below (i.e., find the device
> and get the driver data) was also repeated in the vfio_ap_irq_disable_apqn()
> function, so I replaced it with a call to the function above; however, the
> vfio_ap_irq_disable_apqn() function  does not have a reference to the
> matrix_mdev, so I eliminated the matrix_mdev parameter. Note that the
> vfio_ap_irq_disable_apqn() is called for each APQN assigned to a matrix
> mdev, so there is no need to test the bitmasks there.
> 
> The other place from which the function above is called is
> the handle_pqap() function which does have a reference to the
> matrix_mdev. In order to ensure the integrity of the instruction
> being intercepted - i.e., PQAP(AQIC) enable/disable IRQ for an
> AP queue - the testing of the matrix bitmasks probably ought to
> be performed, so it will be done there instead of in the
> vfio_ap_get_queue() function above.

Should you add a comment that vfio_ap_get_queue() assumes that the
caller makes sure that this is only called for APQNs that are assigned
to a matrix?

> 
> 
> >> +	queue = ap_get_qdev(apqn);
> >> +	if (!queue)
> >>   		return NULL;
> >>   
> >> -	dev = driver_find_device(&matrix_dev->vfio_ap_drv->driver, NULL,
> >> -				 &apqn, match_apqn);
> >> -	if (!dev)
> >> -		return NULL;
> >> -	q = dev_get_drvdata(dev);
> >> -	q->matrix_mdev = matrix_mdev;
> >> -	put_device(dev);
> >> +	q = dev_get_drvdata(&queue->ap_dev.device);
> >> +	put_device(&queue->ap_dev.device);
> >>   
> >>   	return q;
> >>   }
> > (...)
> >  
> 



* Re: [PATCH v10 02/16] s390/vfio-ap: use new AP bus interface to search for queue devices
  2020-08-28  8:13       ` Cornelia Huck
@ 2020-08-28 15:10         ` Tony Krowiak
  0 siblings, 0 replies; 79+ messages in thread
From: Tony Krowiak @ 2020-08-28 15:10 UTC (permalink / raw)
  To: Cornelia Huck
  Cc: linux-s390, linux-kernel, kvm, freude, borntraeger, mjrosato,
	pasic, alex.williamson, kwankhede, fiuczy, frankja, david,
	imbrenda, hca, gor, kernel test robot



On 8/28/20 4:13 AM, Cornelia Huck wrote:
> On Thu, 27 Aug 2020 10:24:07 -0400
> Tony Krowiak <akrowiak@linux.ibm.com> wrote:
>
>> On 8/25/20 6:13 AM, Cornelia Huck wrote:
>>> On Fri, 21 Aug 2020 15:56:02 -0400
>>> Tony Krowiak<akrowiak@linux.ibm.com>  wrote:
>>>>    /**
>>>> - * vfio_ap_get_queue: Retrieve a queue with a specific APQN from a list
>>>> - * @matrix_mdev: the associated mediated matrix
>>>> + * vfio_ap_get_queue: Retrieve a queue with a specific APQN.
>>>>     * @apqn: The queue APQN
>>>>     *
>>>> - * Retrieve a queue with a specific APQN from the list of the
>>>> - * devices of the vfio_ap_drv.
>>>> - * Verify that the APID and the APQI are set in the matrix.
>>>> + * Retrieve a queue with a specific APQN from the AP queue devices attached to
>>>> + * the AP bus.
>>>>     *
>>>> - * Returns the pointer to the associated vfio_ap_queue
>>>> + * Returns the pointer to the vfio_ap_queue with the specified APQN, or NULL.
>>>>     */
>>>> -static struct vfio_ap_queue *vfio_ap_get_queue(
>>>> -					struct ap_matrix_mdev *matrix_mdev,
>>>> -					int apqn)
>>>> +static struct vfio_ap_queue *vfio_ap_get_queue(unsigned long apqn)
>>>>    {
>>>> +	struct ap_queue *queue;
>>>>    	struct vfio_ap_queue *q;
>>>> -	struct device *dev;
>>>>    
>>>> -	if (!test_bit_inv(AP_QID_CARD(apqn), matrix_mdev->matrix.apm))
>>>> -		return NULL;
>>>> -	if (!test_bit_inv(AP_QID_QUEUE(apqn), matrix_mdev->matrix.aqm))
>>> I think you should add some explanation to the patch description why
>>> testing the matrix bitmasks is not needed anymore.
>> As a result of this comment, I took a closer look at the code to
>> determine the reason for eliminating the matrix_mdev
>> parameter. The reason is because the code below (i.e., find the device
>> and get the driver data) was also repeated in the vfio_ap_irq_disable_apqn()
>> function, so I replaced it with a call to the function above; however, the
>> vfio_ap_irq_disable_apqn() function  does not have a reference to the
>> matrix_mdev, so I eliminated the matrix_mdev parameter. Note that the
>> vfio_ap_irq_disable_apqn() is called for each APQN assigned to a matrix
>> mdev, so there is no need to test the bitmasks there.
>>
>> The other place from which the function above is called is
>> the handle_pqap() function which does have a reference to the
>> matrix_mdev. In order to ensure the integrity of the instruction
>> being intercepted - i.e., PQAP(AQIC) enable/disable IRQ for an
>> AP queue - the testing of the matrix bitmasks probably ought to
>> be performed, so it will be done there instead of in the
>> vfio_ap_get_queue() function above.
> Should you add a comment that vfio_ap_get_queue() assumes that the
> caller makes sure that this is only called for APQNs that are assigned
> to a matrix?

I suppose it wouldn't hurt.

>
>>
>>>> +	queue = ap_get_qdev(apqn);
>>>> +	if (!queue)
>>>>    		return NULL;
>>>>    
>>>> -	dev = driver_find_device(&matrix_dev->vfio_ap_drv->driver, NULL,
>>>> -				 &apqn, match_apqn);
>>>> -	if (!dev)
>>>> -		return NULL;
>>>> -	q = dev_get_drvdata(dev);
>>>> -	q->matrix_mdev = matrix_mdev;
>>>> -	put_device(dev);
>>>> +	q = dev_get_drvdata(&queue->ap_dev.device);
>>>> +	put_device(&queue->ap_dev.device);
>>>>    
>>>>    	return q;
>>>>    }
>>> (...)
>>>   



* Re: [PATCH v10 03/16] s390/vfio-ap: manage link between queue struct and matrix mdev
  2020-08-25 10:25   ` Cornelia Huck
@ 2020-08-28 23:05     ` Tony Krowiak
  0 siblings, 0 replies; 79+ messages in thread
From: Tony Krowiak @ 2020-08-28 23:05 UTC (permalink / raw)
  To: Cornelia Huck
  Cc: linux-s390, linux-kernel, kvm, freude, borntraeger, mjrosato,
	pasic, alex.williamson, kwankhede, fiuczy, frankja, david,
	imbrenda, hca, gor



On 8/25/20 6:25 AM, Cornelia Huck wrote:
> On Fri, 21 Aug 2020 15:56:03 -0400
> Tony Krowiak <akrowiak@linux.ibm.com> wrote:
>
>> Let's create links between each queue device bound to the vfio_ap device
>> driver and the matrix mdev to which the queue is assigned. The idea is to
>> facilitate efficient retrieval of the objects representing the queue
>> devices and matrix mdevs as well as to verify that a queue assigned to
>> a matrix mdev is bound to the driver.
>>
>> The links will be created as follows:
>>
>>     * When the queue device is probed, if its APQN is assigned to a matrix
>>       mdev, the structures representing the queue device and the matrix mdev
>>       will be linked.
>>
>>     * When an adapter or domain is assigned to a matrix mdev, for each new
>>       APQN assigned that references a queue device bound to the vfio_ap
>>       device driver, the structures representing the queue device and the
>>       matrix mdev will be linked.
>>
>> The links will be removed as follows:
>>
>>     * When the queue device is removed, if its APQN is assigned to a matrix
>>       mdev, the structures representing the queue device and the matrix mdev
>>       will be unlinked.
>>
>>     * When an adapter or domain is unassigned from a matrix mdev, for each
>>       APQN unassigned that references a queue device bound to the vfio_ap
>>       device driver, the structures representing the queue device and the
>>       matrix mdev will be unlinked.
>>
>> Signed-off-by: Tony Krowiak <akrowiak@linux.ibm.com>
>> ---
>>   drivers/s390/crypto/vfio_ap_ops.c     | 132 +++++++++++++++++++++++++-
>>   drivers/s390/crypto/vfio_ap_private.h |   2 +
>>   2 files changed, 129 insertions(+), 5 deletions(-)
>>
> (...)
>
>> @@ -548,6 +557,87 @@ static int vfio_ap_mdev_verify_no_sharing(struct ap_matrix_mdev *matrix_mdev)
>>   	return 0;
>>   }
>>   
>> +enum qlink_type {
> <bikeshed>I think this is less of a type, and more of an action, so
> maybe call this 'qlink_action' (and the function parameter below
> 'action'?)</bikeshed>

Sure, but what is this <bikeshed> tag?

>
>> +	LINK_APID,
>> +	LINK_APQI,
>> +	UNLINK_APID,
>> +	UNLINK_APQI,
>> +};
>> +
>> +static void vfio_ap_mdev_link_queue(struct ap_matrix_mdev *matrix_mdev,
>> +				    unsigned long apid, unsigned long apqi)
>> +{
>> +	struct vfio_ap_queue *q;
>> +
>> +	q = vfio_ap_get_queue(AP_MKQID(apid, apqi));
>> +	if (q) {
>> +		q->matrix_mdev = matrix_mdev;
>> +		hash_add(matrix_mdev->qtable,
>> +			 &q->mdev_qnode, q->apqn);
>> +	}
>> +}
>> +
>> +static void vfio_ap_mdev_unlink_queue(unsigned long apid, unsigned long apqi)
>> +{
>> +	struct vfio_ap_queue *q;
>> +
>> +	q = vfio_ap_get_queue(AP_MKQID(apid, apqi));
>> +	if (q) {
>> +		q->matrix_mdev = NULL;
>> +		hash_del(&q->mdev_qnode);
>> +	}
>> +}
>> +
>> +/**
>> + * vfio_ap_mdev_link_queues
>> + *
>> + * @matrix_mdev: The matrix mdev to link.
>> + * @type:	 The type of @qlink_id.
>> + * @qlink_id:	 The APID or APQI of the queues to link.
>> + *
>> + * Sets or clears the links between the queues with the specified @qlink_id
>> + * and the @matrix_mdev:
>> + *     @type == LINK_APID: Set the links between the @matrix_mdev and the
>> + *                         queues with the specified @qlink_id (APID)
>> + *     @type == LINK_APQI: Set the links between the @matrix_mdev and the
>> + *                         queues with the specified @qlink_id (APQI)
>> + *     @type == UNLINK_APID: Clear the links between the @matrix_mdev and the
>> + *                           queues with the specified @qlink_id (APID)
>> + *     @type == UNLINK_APQI: Clear the links between the @matrix_mdev and the
>> + *                           queues with the specified @qlink_id (APQI)
>> + */
>> +static void vfio_ap_mdev_link_queues(struct ap_matrix_mdev *matrix_mdev,
>> +				     enum qlink_type type,
>> +				     unsigned long qlink_id)
>> +{
>> +	unsigned long id;
>> +
>> +	switch (type) {
>> +	case LINK_APID:
>> +		for_each_set_bit_inv(id, matrix_mdev->matrix.aqm,
>> +				     matrix_mdev->matrix.aqm_max + 1)
>> +			vfio_ap_mdev_link_queue(matrix_mdev, qlink_id, id);
>> +		break;
>> +	case UNLINK_APID:
>> +		for_each_set_bit_inv(id, matrix_mdev->matrix.aqm,
>> +				     matrix_mdev->matrix.aqm_max + 1)
>> +			vfio_ap_mdev_unlink_queue(qlink_id, id);
>> +		break;
>> +	case LINK_APQI:
>> +		for_each_set_bit_inv(id, matrix_mdev->matrix.apm,
>> +				     matrix_mdev->matrix.apm_max + 1)
>> +			vfio_ap_mdev_link_queue(matrix_mdev, id, qlink_id);
>> +		break;
>> +	case UNLINK_APQI:
>> +		for_each_set_bit_inv(id, matrix_mdev->matrix.apm,
>> +				     matrix_mdev->matrix.apm_max + 1)
>> +			vfio_ap_mdev_unlink_queue(id, qlink_id);
>> +		break;
>> +	default:
>> +		WARN_ON_ONCE(1);
>> +	}
>> +}
>> +
> (...)
>
> I have not reviewed this deeply, but at a glance, it seems fine.
>
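
For illustration only (this is not part of the patch): a minimal userspace
sketch of the link/unlink dispatch described above, using the suggested
'qlink_action' name. Everything prefixed with sketch_ is hypothetical, the
masks are plain 16-bit values in LSB-first order rather than the kernel's
inverted bitmaps, and a small 2D array stands in for the per-queue
matrix_mdev pointer and the qtable.

```c
#include <stdint.h>

#define SKETCH_MAX_ID 16	/* hypothetical limit, far below the real AP limits */

enum qlink_action { LINK_APID, LINK_APQI, UNLINK_APID, UNLINK_APQI };

struct sketch_mdev {
	uint16_t apm;	/* bit n set: adapter n is assigned */
	uint16_t aqm;	/* bit n set: domain n is assigned */
	/* linked[apid][apqi] != 0: that queue is linked to this mdev */
	unsigned char linked[SKETCH_MAX_ID][SKETCH_MAX_ID];
};

/*
 * Set or clear the links for every queue that shares the given APID or
 * APQI. For an APID the other half of each APQN comes from the domain
 * mask; for an APQI it comes from the adapter mask.
 */
static void sketch_link_queues(struct sketch_mdev *m,
			       enum qlink_action action, unsigned int id)
{
	int link = (action == LINK_APID || action == LINK_APQI);
	int by_apid = (action == LINK_APID || action == UNLINK_APID);
	uint16_t mask = by_apid ? m->aqm : m->apm;
	unsigned int other;

	if (id >= SKETCH_MAX_ID)
		return;

	for (other = 0; other < SKETCH_MAX_ID; other++) {
		if (!(mask & (1u << other)))
			continue;
		if (by_apid)
			m->linked[id][other] = (unsigned char)link;
		else
			m->linked[other][id] = (unsigned char)link;
	}
}
```

Deriving "link or unlink" and "APID or APQI" from the action once keeps the
two unlink cases from accidentally calling the link path.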



* Re: [PATCH v10 16/16] s390/vfio-ap: update docs to include dynamic config support
  2020-08-25 10:45   ` Cornelia Huck
@ 2020-08-31 18:34     ` Tony Krowiak
  0 siblings, 0 replies; 79+ messages in thread
From: Tony Krowiak @ 2020-08-31 18:34 UTC (permalink / raw)
  To: Cornelia Huck
  Cc: linux-s390, linux-kernel, kvm, freude, borntraeger, mjrosato,
	pasic, alex.williamson, kwankhede, fiuczy, frankja, david,
	imbrenda, hca, gor



On 8/25/20 6:45 AM, Cornelia Huck wrote:
> On Fri, 21 Aug 2020 15:56:16 -0400
> Tony Krowiak <akrowiak@linux.ibm.com> wrote:
>
>> Update the documentation in vfio-ap.rst to include information about the
>> AP dynamic configuration support (i.e., hot plug of adapters, domains
>> and control domains via the matrix mediated device's sysfs assignment
>> attributes).
>>
>> Signed-off-by: Tony Krowiak <akrowiak@linux.ibm.com>
>> ---
>>   Documentation/s390/vfio-ap.rst | 362 ++++++++++++++++++++++++++-------
>>   1 file changed, 285 insertions(+), 77 deletions(-)
>>
>> diff --git a/Documentation/s390/vfio-ap.rst b/Documentation/s390/vfio-ap.rst
>> index e15436599086..8907aeca8fb7 100644
>> --- a/Documentation/s390/vfio-ap.rst
>> +++ b/Documentation/s390/vfio-ap.rst
>> @@ -253,7 +253,7 @@ The process for reserving an AP queue for use by a KVM guest is:
>>   1. The administrator loads the vfio_ap device driver
>>   2. The vfio-ap driver during its initialization will register a single 'matrix'
>>      device with the device core. This will serve as the parent device for
>> -   all mediated matrix devices used to configure an AP matrix for a guest.
>> +   all matrix mediated devices used to configure an AP matrix for a guest.
> This (and many other changes here) seems to be unrelated to the new
> feature. Split that out into a separate patch that can be applied right
> away? That would make this patch smaller and easier to review; it's
> hard to figure out which parts deal with the new feature, and which parts
> simply got an update.
>
> Also, do you want to do similar wording changes in the QEMU
> documentation for vfio-ap?

Will do.

>
>>   3. The /sys/devices/vfio_ap/matrix device is created by the device core
>>   4. The vfio_ap device driver will register with the AP bus for AP queue devices
>>      of type 10 and higher (CEX4 and newer). The driver will provide the vfio_ap
> (...)
>
>> @@ -435,6 +481,10 @@ available to a KVM guest via the following CPU model features:
>>      can be made available to the guest only if it is available on the host (i.e.,
>>      facility bit 12 is set).
>>   
>> +4. apqi: Indicates AP queue interrupts are available on the guest. This facility
>> +   can be made available to the guest only if it is available on the host (i.e.,
>> +   facility bit 65 is set).
>> +
>>   Note: If the user chooses to specify a CPU model different than the 'host'
>>   model to QEMU, the CPU model features and facilities need to be turned on
>>   explicitly; for example::
>> @@ -444,7 +494,7 @@ explicitly; for example::
>>   A guest can be precluded from using AP features/facilities by turning them off
>>   explicitly; for example::
>>   
>> -     /usr/bin/qemu-system-s390x ... -cpu host,ap=off,apqci=off,apft=off
>> +     /usr/bin/qemu-system-s390x ... -cpu host,ap=off,apqci=off,apft=off,apqi=off
> Isn't that an already existing facility that was simply lacking
> documentation? If yes, split it off?

Yes and will do.

>
>>   
>>   Note: If the APFT facility is turned off (apft=off) for the guest, the guest
>>   will not see any AP devices. The zcrypt device drivers that register for type 10
> (...)
>



* Re: [PATCH v10 02/16] s390/vfio-ap: use new AP bus interface to search for queue devices
  2020-08-21 19:56 ` [PATCH v10 02/16] s390/vfio-ap: use new AP bus interface to search for queue devices Tony Krowiak
  2020-08-25 10:13   ` Cornelia Huck
@ 2020-09-04  8:11   ` Christian Borntraeger
  2020-09-08 18:54     ` Tony Krowiak
  2020-09-25  2:27   ` Halil Pasic
  2 siblings, 1 reply; 79+ messages in thread
From: Christian Borntraeger @ 2020-09-04  8:11 UTC (permalink / raw)
  To: Tony Krowiak, linux-s390, linux-kernel, kvm
  Cc: freude, cohuck, mjrosato, pasic, alex.williamson, kwankhede,
	fiuczy, frankja, david, imbrenda, hca, gor, kernel test robot



On 21.08.20 21:56, Tony Krowiak wrote:
> This patch refactors the vfio_ap device driver to use the AP bus's
> ap_get_qdev() function to retrieve the vfio_ap_queue struct containing
> information about a queue that is bound to the vfio_ap device driver.
> The bus's ap_get_qdev() function retrieves the queue device from a
> hashtable keyed by APQN. This is much more efficient than looping over
> the list of devices attached to the AP bus by several orders of
> magnitude.
> 
> Signed-off-by: Tony Krowiak <akrowiak@linux.ibm.com>

> Reported-by: kernel test robot <lkp@intel.com>

I think this can go. No need to mark that an earlier version of this patch had an issue.


[...]

> diff --git a/drivers/s390/crypto/vfio_ap_private.h b/drivers/s390/crypto/vfio_ap_private.h
> index f46dde56b464..a2aa05bec718 100644
> --- a/drivers/s390/crypto/vfio_ap_private.h
> +++ b/drivers/s390/crypto/vfio_ap_private.h
> @@ -18,6 +18,7 @@
>  #include <linux/delay.h>
>  #include <linux/mutex.h>
>  #include <linux/kvm_host.h>
> +#include <linux/hashtable.h>

I don't think that this header file needs it. Any user of it will now include this.
Can you move this include into the respective C file where the hash stuff is
used?


Other than that this looks good. 
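
For readers following along, here is a rough userspace sketch of why a
lookup keyed by APQN beats walking the bus's device list. It is not the AP
bus code: all sketch_ names are hypothetical, the APQN layout (APID in the
high byte, APQI in the low byte) is assumed to mirror AP_MKQID(), and the
hash function is just a simple fold chosen for the example.

```c
#include <stddef.h>
#include <stdint.h>

#define SKETCH_BUCKETS 256	/* 2^8 buckets */

struct sketch_queue {
	uint16_t apqn;
	struct sketch_queue *next;	/* chain of entries in the same bucket */
};

struct sketch_table {
	struct sketch_queue *bucket[SKETCH_BUCKETS];
};

/* Combine adapter ID and queue index into one 16-bit key. */
static uint16_t sketch_mkqid(unsigned int apid, unsigned int apqi)
{
	return (uint16_t)(((apid & 0xff) << 8) | (apqi & 0xff));
}

/* Fold the two key bytes together to pick a bucket. */
static unsigned int sketch_hash(uint16_t apqn)
{
	return (unsigned int)(apqn ^ (apqn >> 8)) & (SKETCH_BUCKETS - 1);
}

static void sketch_add(struct sketch_table *t, struct sketch_queue *q)
{
	unsigned int b = sketch_hash(q->apqn);

	q->next = t->bucket[b];
	t->bucket[b] = q;
}

/* Only the entries in one bucket are compared, not every queue device. */
static struct sketch_queue *sketch_find(const struct sketch_table *t,
					uint16_t apqn)
{
	struct sketch_queue *q;

	for (q = t->bucket[sketch_hash(apqn)]; q; q = q->next)
		if (q->apqn == apqn)
			return q;
	return NULL;
}
```

With this fold, APQNs 04.0047 and 47.0004 land in the same bucket, so the
chain walk in sketch_find() still has to compare the full key.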


* Re: [PATCH v10 03/16] s390/vfio-ap: manage link between queue struct and matrix mdev
  2020-08-21 19:56 ` [PATCH v10 03/16] s390/vfio-ap: manage link between queue struct and matrix mdev Tony Krowiak
  2020-08-25 10:25   ` Cornelia Huck
@ 2020-09-04  8:15   ` Christian Borntraeger
  2020-09-08 19:03     ` Tony Krowiak
  2020-09-25  7:58   ` Halil Pasic
  2 siblings, 1 reply; 79+ messages in thread
From: Christian Borntraeger @ 2020-09-04  8:15 UTC (permalink / raw)
  To: Tony Krowiak, linux-s390, linux-kernel, kvm
  Cc: freude, cohuck, mjrosato, pasic, alex.williamson, kwankhede,
	fiuczy, frankja, david, imbrenda, hca, gor


On 21.08.20 21:56, Tony Krowiak wrote:
> diff --git a/drivers/s390/crypto/vfio_ap_private.h b/drivers/s390/crypto/vfio_ap_private.h
> index a2aa05bec718..57da703b549a 100644
> --- a/drivers/s390/crypto/vfio_ap_private.h
> +++ b/drivers/s390/crypto/vfio_ap_private.h
> @@ -87,6 +87,7 @@ struct ap_matrix_mdev {
>  	struct kvm *kvm;
>  	struct kvm_s390_module_hook pqap_hook;
>  	struct mdev_device *mdev;
> +	DECLARE_HASHTABLE(qtable, 8);
>  };

Ah, I think the include should go into this patch. But then you should revisit the description
of patch 2, as it talks about hashtables (but doesn't do anything about it).


* Re: [PATCH v10 02/16] s390/vfio-ap: use new AP bus interface to search for queue devices
  2020-09-04  8:11   ` Christian Borntraeger
@ 2020-09-08 18:54     ` Tony Krowiak
  0 siblings, 0 replies; 79+ messages in thread
From: Tony Krowiak @ 2020-09-08 18:54 UTC (permalink / raw)
  To: Christian Borntraeger, linux-s390, linux-kernel, kvm
  Cc: freude, cohuck, mjrosato, pasic, alex.williamson, kwankhede,
	fiuczy, frankja, david, imbrenda, hca, gor, kernel test robot



On 9/4/20 4:11 AM, Christian Borntraeger wrote:
>
> On 21.08.20 21:56, Tony Krowiak wrote:
>> This patch refactors the vfio_ap device driver to use the AP bus's
>> ap_get_qdev() function to retrieve the vfio_ap_queue struct containing
>> information about a queue that is bound to the vfio_ap device driver.
>> The bus's ap_get_qdev() function retrieves the queue device from a
>> hashtable keyed by APQN. This is much more efficient than looping over
>> the list of devices attached to the AP bus by several orders of
>> magnitude.
>>
>> Signed-off-by: Tony Krowiak <akrowiak@linux.ibm.com>
>> Reported-by: kernel test robot <lkp@intel.com>
> I think this can go. No need to mark that an earlier version of this patch had an issue.

I was just following the instructions in the robot comments. I'll get 
rid of it.

>
>
> [...]
>
>> diff --git a/drivers/s390/crypto/vfio_ap_private.h b/drivers/s390/crypto/vfio_ap_private.h
>> index f46dde56b464..a2aa05bec718 100644
>> --- a/drivers/s390/crypto/vfio_ap_private.h
>> +++ b/drivers/s390/crypto/vfio_ap_private.h
>> @@ -18,6 +18,7 @@
>>   #include <linux/delay.h>
>>   #include <linux/mutex.h>
>>   #include <linux/kvm_host.h>
>> +#include <linux/hashtable.h>
> I don't think that this header file needs it. Any user of it will now include this.
> Can you move this include into the respective C file where the hash stuff is
> used?

I can.

>
>
> Other than that this looks good.



* Re: [PATCH v10 03/16] s390/vfio-ap: manage link between queue struct and matrix mdev
  2020-09-04  8:15   ` Christian Borntraeger
@ 2020-09-08 19:03     ` Tony Krowiak
  0 siblings, 0 replies; 79+ messages in thread
From: Tony Krowiak @ 2020-09-08 19:03 UTC (permalink / raw)
  To: Christian Borntraeger, linux-s390, linux-kernel, kvm
  Cc: freude, cohuck, mjrosato, pasic, alex.williamson, kwankhede,
	fiuczy, frankja, david, imbrenda, hca, gor



On 9/4/20 4:15 AM, Christian Borntraeger wrote:
> On 21.08.20 21:56, Tony Krowiak wrote:
>> diff --git a/drivers/s390/crypto/vfio_ap_private.h b/drivers/s390/crypto/vfio_ap_private.h
>> index a2aa05bec718..57da703b549a 100644
>> --- a/drivers/s390/crypto/vfio_ap_private.h
>> +++ b/drivers/s390/crypto/vfio_ap_private.h
>> @@ -87,6 +87,7 @@ struct ap_matrix_mdev {
>>   	struct kvm *kvm;
>>   	struct kvm_s390_module_hook pqap_hook;
>>   	struct mdev_device *mdev;
>> +	DECLARE_HASHTABLE(qtable, 8);
>>   };
> Ah, I think the include should go into this patch. But then you should revisit the description
> of patch 2, as it talks about hashtables (but doesn't do anything about it).

Got it.

>   



* Re: [PATCH v10 04/16] s390/zcrypt: driver callback to indicate resource in use
  2020-08-21 19:56 ` [PATCH v10 04/16] s390/zcrypt: driver callback to indicate resource in use Tony Krowiak
@ 2020-09-14 15:29   ` Cornelia Huck
  2020-09-15 19:32     ` Tony Krowiak
  2020-09-25  9:24   ` Halil Pasic
  1 sibling, 1 reply; 79+ messages in thread
From: Cornelia Huck @ 2020-09-14 15:29 UTC (permalink / raw)
  To: Tony Krowiak
  Cc: linux-s390, linux-kernel, kvm, freude, borntraeger, mjrosato,
	pasic, alex.williamson, kwankhede, fiuczy, frankja, david,
	imbrenda, hca, gor, kernel test robot

On Fri, 21 Aug 2020 15:56:04 -0400
Tony Krowiak <akrowiak@linux.ibm.com> wrote:

> Introduces a new driver callback to prevent a root user from unbinding
> an AP queue from its device driver if the queue is in use. The intent of
> this callback is to provide a driver with the means to prevent a root user
> from inadvertently taking a queue away from a matrix mdev and giving it to
> the host while it is assigned to the matrix mdev. The callback will
> be invoked whenever a change to the AP bus's sysfs apmask or aqmask
> attributes would result in one or more AP queues being removed from its
> driver. If the callback responds in the affirmative for any driver
> queried, the change to the apmask or aqmask will be rejected with a device
> in use error.
> 
> For this patch, only non-default drivers will be queried. Currently,
> there is only one non-default driver, the vfio_ap device driver. The
> vfio_ap device driver facilitates pass-through of an AP queue to a
> guest. The idea here is that a guest may be administered by a different
> sysadmin than the host and we don't want AP resources to unexpectedly
> disappear from a guest's AP configuration (i.e., adapters, domains and
> control domains assigned to the matrix mdev). This will enforce the proper
> procedure for removing AP resources intended for guest usage which is to
> first unassign them from the matrix mdev, then unbind them from the
> vfio_ap device driver.
> 
> Signed-off-by: Tony Krowiak <akrowiak@linux.ibm.com>
> Reported-by: kernel test robot <lkp@intel.com>

This looks a bit odd...

> ---
>  drivers/s390/crypto/ap_bus.c | 148 ++++++++++++++++++++++++++++++++---
>  drivers/s390/crypto/ap_bus.h |   4 +
>  2 files changed, 142 insertions(+), 10 deletions(-)
> 

(...)

> @@ -1107,12 +1118,70 @@ static ssize_t apmask_show(struct bus_type *bus, char *buf)
>  	return rc;
>  }
>  
> +static int __verify_card_reservations(struct device_driver *drv, void *data)
> +{
> +	int rc = 0;
> +	struct ap_driver *ap_drv = to_ap_drv(drv);
> +	unsigned long *newapm = (unsigned long *)data;
> +
> +	/*
> +	 * No need to verify whether the driver is using the queues if it is the
> +	 * default driver.
> +	 */
> +	if (ap_drv->flags & AP_DRIVER_FLAG_DEFAULT)
> +		return 0;
> +
> +	/* The non-default driver's module must be loaded */
> +	if (!try_module_get(drv->owner))
> +		return 0;
> +
> +	if (ap_drv->in_use)
> +		if (ap_drv->in_use(newapm, ap_perms.aqm))
> +			rc = -EADDRINUSE;

ISTR that Christian suggested -EBUSY in a past revision of this series?
I think that would be more appropriate.

Also, I know we have discussed this before, but it is very hard to
figure out the offending device(s) if the sysfs manipulation failed. Can
we at least drop something into the syslog? That would be far from
perfect, but it gives an admin at least a chance to figure out why they
got an error. Some more structured way that would be usable from tools
can still be added later.

> +
> +	module_put(drv->owner);
> +
> +	return rc;
> +}



* Re: [PATCH v10 05/16] s390/vfio-ap: implement in-use callback for vfio_ap driver
  2020-08-21 19:56 ` [PATCH v10 05/16] s390/vfio-ap: implement in-use callback for vfio_ap driver Tony Krowiak
@ 2020-09-14 15:31   ` Cornelia Huck
  2020-09-25  9:29   ` Halil Pasic
  1 sibling, 0 replies; 79+ messages in thread
From: Cornelia Huck @ 2020-09-14 15:31 UTC (permalink / raw)
  To: Tony Krowiak
  Cc: linux-s390, linux-kernel, kvm, freude, borntraeger, mjrosato,
	pasic, alex.williamson, kwankhede, fiuczy, frankja, david,
	imbrenda, hca, gor

On Fri, 21 Aug 2020 15:56:05 -0400
Tony Krowiak <akrowiak@linux.ibm.com> wrote:

> Let's implement the callback to indicate when an APQN
> is in use by the vfio_ap device driver. The callback is
> invoked whenever a change to the apmask or aqmask would
> result in one or more queue devices being removed from the driver. The
> vfio_ap device driver will indicate a resource is in use
> if the APQN of any of the queue devices to be removed is assigned to
> any of the matrix mdevs under the driver's control.
> 
> Signed-off-by: Tony Krowiak <akrowiak@linux.ibm.com>
> ---
>  drivers/s390/crypto/vfio_ap_drv.c     |  1 +
>  drivers/s390/crypto/vfio_ap_ops.c     | 68 ++++++++++++++++++++-------
>  drivers/s390/crypto/vfio_ap_private.h |  2 +
>  3 files changed, 53 insertions(+), 18 deletions(-)
> 
> diff --git a/drivers/s390/crypto/vfio_ap_drv.c b/drivers/s390/crypto/vfio_ap_drv.c
> index 24cdef60039a..aae5b3d8e3fa 100644
> --- a/drivers/s390/crypto/vfio_ap_drv.c
> +++ b/drivers/s390/crypto/vfio_ap_drv.c
> @@ -175,6 +175,7 @@ static int __init vfio_ap_init(void)
>  	memset(&vfio_ap_drv, 0, sizeof(vfio_ap_drv));
>  	vfio_ap_drv.probe = vfio_ap_queue_dev_probe;
>  	vfio_ap_drv.remove = vfio_ap_queue_dev_remove;
> +	vfio_ap_drv.in_use = vfio_ap_mdev_resource_in_use;
>  	vfio_ap_drv.ids = ap_queue_ids;
>  
>  	ret = ap_driver_register(&vfio_ap_drv, THIS_MODULE, VFIO_AP_DRV_NAME);
> diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c
> index 2e37ee82e422..fc1aa6f947eb 100644
> --- a/drivers/s390/crypto/vfio_ap_ops.c
> +++ b/drivers/s390/crypto/vfio_ap_ops.c
> @@ -515,18 +515,36 @@ vfio_ap_mdev_verify_queues_reserved_for_apid(struct ap_matrix_mdev *matrix_mdev,
>  	return 0;
>  }
>  
> +#define MDEV_SHARING_ERR "Userspace may not re-assign queue %02lx.%04lx " \
> +			 "already assigned to %s"

Ah, I spoke too soon; this is what I had been looking for :)



* Re: [PATCH v10 04/16] s390/zcrypt: driver callback to indicate resource in use
  2020-09-14 15:29   ` Cornelia Huck
@ 2020-09-15 19:32     ` Tony Krowiak
  2020-09-17 12:14       ` Cornelia Huck
  0 siblings, 1 reply; 79+ messages in thread
From: Tony Krowiak @ 2020-09-15 19:32 UTC (permalink / raw)
  To: Cornelia Huck
  Cc: linux-s390, linux-kernel, kvm, freude, borntraeger, mjrosato,
	pasic, alex.williamson, kwankhede, fiuczy, frankja, david,
	imbrenda, hca, gor, kernel test robot



On 9/14/20 11:29 AM, Cornelia Huck wrote:
> On Fri, 21 Aug 2020 15:56:04 -0400
> Tony Krowiak <akrowiak@linux.ibm.com> wrote:
>
>> Introduces a new driver callback to prevent a root user from unbinding
>> an AP queue from its device driver if the queue is in use. The intent of
>> this callback is to provide a driver with the means to prevent a root user
>> from inadvertently taking a queue away from a matrix mdev and giving it to
>> the host while it is assigned to the matrix mdev. The callback will
>> be invoked whenever a change to the AP bus's sysfs apmask or aqmask
>> attributes would result in one or more AP queues being removed from its
>> driver. If the callback responds in the affirmative for any driver
>> queried, the change to the apmask or aqmask will be rejected with a device
>> in use error.
>>
>> For this patch, only non-default drivers will be queried. Currently,
>> there is only one non-default driver, the vfio_ap device driver. The
>> vfio_ap device driver facilitates pass-through of an AP queue to a
>> guest. The idea here is that a guest may be administered by a different
>> sysadmin than the host and we don't want AP resources to unexpectedly
>> disappear from a guest's AP configuration (i.e., adapters, domains and
>> control domains assigned to the matrix mdev). This will enforce the proper
>> procedure for removing AP resources intended for guest usage which is to
>> first unassign them from the matrix mdev, then unbind them from the
>> vfio_ap device driver.
>>
>> Signed-off-by: Tony Krowiak <akrowiak@linux.ibm.com>
>> Reported-by: kernel test robot <lkp@intel.com>
> This looks a bit odd...

I've removed all of those. These kernel test robot errors were flagged
in the last series. The review comments from the robot suggested
the reported-by, but I assume that was for patches intended to
fix those errors, so I am removing these as per Christian's comments.

>
>> ---
>>   drivers/s390/crypto/ap_bus.c | 148 ++++++++++++++++++++++++++++++++---
>>   drivers/s390/crypto/ap_bus.h |   4 +
>>   2 files changed, 142 insertions(+), 10 deletions(-)
>>
> (...)
>
>> @@ -1107,12 +1118,70 @@ static ssize_t apmask_show(struct bus_type *bus, char *buf)
>>   	return rc;
>>   }
>>   
>> +static int __verify_card_reservations(struct device_driver *drv, void *data)
>> +{
>> +	int rc = 0;
>> +	struct ap_driver *ap_drv = to_ap_drv(drv);
>> +	unsigned long *newapm = (unsigned long *)data;
>> +
>> +	/*
>> +	 * No need to verify whether the driver is using the queues if it is the
>> +	 * default driver.
>> +	 */
>> +	if (ap_drv->flags & AP_DRIVER_FLAG_DEFAULT)
>> +		return 0;
>> +
>> +	/* The non-default driver's module must be loaded */
>> +	if (!try_module_get(drv->owner))
>> +		return 0;
>> +
>> +	if (ap_drv->in_use)
>> +		if (ap_drv->in_use(newapm, ap_perms.aqm))
>> +			rc = -EADDRINUSE;
> ISTR that Christian suggested -EBUSY in a past revision of this series?
> I think that would be more appropriate.

I went back and looked and sure enough, he did recommend that.
You have a great memory! I didn't respond to that comment, so I
must have missed it at the time.

I personally prefer EADDRINUSE because I think it better indicates why
an AP resource cannot be assigned back to the host drivers: it is in use
by a guest or, at the very least, reserved for use by a guest (i.e.,
assigned to an mdev). To say it is busy implies that the device is busy
performing encryption services, which may or may not be true at a given
moment. Even if so, that is not the reason for refusing to allow
reassignment of the device.

>
> Also, I know we have discussed this before, but it is very hard to
> figure out the offending device(s) if the sysfs manipulation failed. Can
> we at least drop something into the syslog? That would be far from
> perfect, but it gives an admin at least a chance to figure out why they
> got an error. Some more structured way that would be usable from tools
> can still be added later.

I see you found the patch that logged this:)

>
>> +
>> +	module_put(drv->owner);
>> +
>> +	return rc;
>> +}



* Re: [PATCH v10 04/16] s390/zcrypt: driver callback to indicate resource in use
  2020-09-15 19:32     ` Tony Krowiak
@ 2020-09-17 12:14       ` Cornelia Huck
  2020-09-17 13:54         ` Tony Krowiak
  0 siblings, 1 reply; 79+ messages in thread
From: Cornelia Huck @ 2020-09-17 12:14 UTC (permalink / raw)
  To: Tony Krowiak
  Cc: linux-s390, linux-kernel, kvm, freude, borntraeger, mjrosato,
	pasic, alex.williamson, kwankhede, fiuczy, frankja, david,
	imbrenda, hca, gor, kernel test robot

On Tue, 15 Sep 2020 15:32:35 -0400
Tony Krowiak <akrowiak@linux.ibm.com> wrote:

> On 9/14/20 11:29 AM, Cornelia Huck wrote:
> > On Fri, 21 Aug 2020 15:56:04 -0400
> > Tony Krowiak <akrowiak@linux.ibm.com> wrote:
> >  
> >> Introduces a new driver callback to prevent a root user from unbinding
> >> an AP queue from its device driver if the queue is in use. The intent of
> >> this callback is to provide a driver with the means to prevent a root user
> >> from inadvertently taking a queue away from a matrix mdev and giving it to
> >> the host while it is assigned to the matrix mdev. The callback will
> >> be invoked whenever a change to the AP bus's sysfs apmask or aqmask
> >> attributes would result in one or more AP queues being removed from its
> >> driver. If the callback responds in the affirmative for any driver
> >> queried, the change to the apmask or aqmask will be rejected with a device
> >> in use error.
> >>
> >> For this patch, only non-default drivers will be queried. Currently,
> >> there is only one non-default driver, the vfio_ap device driver. The
> >> vfio_ap device driver facilitates pass-through of an AP queue to a
> >> guest. The idea here is that a guest may be administered by a different
> >> sysadmin than the host and we don't want AP resources to unexpectedly
> >> disappear from a guest's AP configuration (i.e., adapters, domains and
> >> control domains assigned to the matrix mdev). This will enforce the proper
> >> procedure for removing AP resources intended for guest usage which is to
> >> first unassign them from the matrix mdev, then unbind them from the
> >> vfio_ap device driver.
> >>
> >> Signed-off-by: Tony Krowiak <akrowiak@linux.ibm.com>
> >> Reported-by: kernel test robot <lkp@intel.com>  
> > This looks a bit odd...  
> 
> I've removed all of those. These kernel test robot errors were flagged
> in the last series. The review comments from the robot suggested
> the reported-by, but I assume that was for patches intended to
> fix those errors, so I am removing these as per Christian's comments.

Yes, I think the Reported-by: mostly makes sense if you include a patch
to fix something on top.

> 
> >  
> >> ---
> >>   drivers/s390/crypto/ap_bus.c | 148 ++++++++++++++++++++++++++++++++---
> >>   drivers/s390/crypto/ap_bus.h |   4 +
> >>   2 files changed, 142 insertions(+), 10 deletions(-)
> >>  
> > (...)
> >  
> >> @@ -1107,12 +1118,70 @@ static ssize_t apmask_show(struct bus_type *bus, char *buf)
> >>   	return rc;
> >>   }
> >>   
> >> +static int __verify_card_reservations(struct device_driver *drv, void *data)
> >> +{
> >> +	int rc = 0;
> >> +	struct ap_driver *ap_drv = to_ap_drv(drv);
> >> +	unsigned long *newapm = (unsigned long *)data;
> >> +
> >> +	/*
> >> +	 * No need to verify whether the driver is using the queues if it is the
> >> +	 * default driver.
> >> +	 */
> >> +	if (ap_drv->flags & AP_DRIVER_FLAG_DEFAULT)
> >> +		return 0;
> >> +
> >> +	/* The non-default driver's module must be loaded */
> >> +	if (!try_module_get(drv->owner))
> >> +		return 0;
> >> +
> >> +	if (ap_drv->in_use)
> >> +		if (ap_drv->in_use(newapm, ap_perms.aqm))
> >> +			rc = -EADDRINUSE;  
> > ISTR that Christian suggested -EBUSY in a past revision of this series?
> > I think that would be more appropriate.  
> 
> I went back and looked and sure enough, he did recommend that.
> You have a great memory! I didn't respond to that comment, so I
> must have missed it at the time.
> 
> > I personally prefer EADDRINUSE because I think it better indicates why
> > an AP resource cannot be assigned back to the host drivers: it is in use
> > by a guest or, at the very least, reserved for use by a guest (i.e.,
> > assigned to an mdev). To say it is busy implies that the device is busy
> > performing encryption services, which may or may not be true at a given
> > moment. Even if so, that is not the reason for refusing to allow
> > reassignment of the device.

I have a different understanding of these error codes: EADDRINUSE is
something used in the networking context when an actual address is
already used elsewhere. EBUSY is more of a generic error that indicates
that a certain resource is not free to perform the requested operation;
it does not necessarily mean that the resource is currently actively
doing something. Kind of like when you get EBUSY when trying to eject
something another program holds a reference on: that other program
might not actually be doing anything, but it potentially could.
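
A tiny userspace sketch of that eject analogy, purely as an illustration
(the sketch_ names are invented and this is not how any particular
subsystem implements it): the operation fails with -EBUSY for as long as
somebody holds a reference, whether or not the holder is doing anything.

```c
#include <errno.h>

struct sketch_resource {
	unsigned int holders;	/* number of outstanding references */
	int present;		/* 1 while the resource is still attached */
};

static void sketch_hold(struct sketch_resource *r)
{
	r->holders++;
}

static void sketch_release(struct sketch_resource *r)
{
	if (r->holders)
		r->holders--;
}

/* Refuse to eject while anyone holds a reference, idle or not. */
static int sketch_eject(struct sketch_resource *r)
{
	if (r->holders > 0)
		return -EBUSY;
	r->present = 0;
	return 0;
}
```

The holder never has to touch the resource for the eject to be refused; the
reference alone is enough, which is the sense of "busy" meant here.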



* Re: [PATCH v10 04/16] s390/zcrypt: driver callback to indicate resource in use
  2020-09-17 12:14       ` Cornelia Huck
@ 2020-09-17 13:54         ` Tony Krowiak
  0 siblings, 0 replies; 79+ messages in thread
From: Tony Krowiak @ 2020-09-17 13:54 UTC (permalink / raw)
  To: Cornelia Huck
  Cc: linux-s390, linux-kernel, kvm, freude, borntraeger, mjrosato,
	pasic, alex.williamson, kwankhede, fiuczy, frankja, david,
	imbrenda, hca, gor, kernel test robot



On 9/17/20 8:14 AM, Cornelia Huck wrote:
> On Tue, 15 Sep 2020 15:32:35 -0400
> Tony Krowiak <akrowiak@linux.ibm.com> wrote:
>
>> On 9/14/20 11:29 AM, Cornelia Huck wrote:
>>> On Fri, 21 Aug 2020 15:56:04 -0400
>>> Tony Krowiak <akrowiak@linux.ibm.com> wrote:
>>>   
>>>> Introduces a new driver callback to prevent a root user from unbinding
>>>> an AP queue from its device driver if the queue is in use. The intent of
>>>> this callback is to provide a driver with the means to prevent a root user
>>>> from inadvertently taking a queue away from a matrix mdev and giving it to
>>>> the host while it is assigned to the matrix mdev. The callback will
>>>> be invoked whenever a change to the AP bus's sysfs apmask or aqmask
>>>> attributes would result in one or more AP queues being removed from its
>>>> driver. If the callback responds in the affirmative for any driver
>>>> queried, the change to the apmask or aqmask will be rejected with a device
>>>> in use error.
>>>>
>>>> For this patch, only non-default drivers will be queried. Currently,
>>>> there is only one non-default driver, the vfio_ap device driver. The
>>>> vfio_ap device driver facilitates pass-through of an AP queue to a
>>>> guest. The idea here is that a guest may be administered by a different
>>>> sysadmin than the host and we don't want AP resources to unexpectedly
>>>> disappear from a guest's AP configuration (i.e., adapters, domains and
>>>> control domains assigned to the matrix mdev). This will enforce the proper
>>>> procedure for removing AP resources intended for guest usage which is to
>>>> first unassign them from the matrix mdev, then unbind them from the
>>>> vfio_ap device driver.
>>>>
>>>> Signed-off-by: Tony Krowiak <akrowiak@linux.ibm.com>
>>>> Reported-by: kernel test robot <lkp@intel.com>
>>> This looks a bit odd...
>> I've removed all of those. These kernel test robot errors were flagged
>> in the last series. The review comments from the robot suggested
>> the reported-by, but I assume that was for patches intended to
>> fix those errors, so I am removing these as per Christian's comments.
> Yes, I think the Reported-by: mostly makes sense if you include a patch
> to fix something on top.
>
>>>   
>>>> ---
>>>>    drivers/s390/crypto/ap_bus.c | 148 ++++++++++++++++++++++++++++++++---
>>>>    drivers/s390/crypto/ap_bus.h |   4 +
>>>>    2 files changed, 142 insertions(+), 10 deletions(-)
>>>>   
>>> (...)
>>>   
>>>> @@ -1107,12 +1118,70 @@ static ssize_t apmask_show(struct bus_type *bus, char *buf)
>>>>    	return rc;
>>>>    }
>>>>    
>>>> +static int __verify_card_reservations(struct device_driver *drv, void *data)
>>>> +{
>>>> +	int rc = 0;
>>>> +	struct ap_driver *ap_drv = to_ap_drv(drv);
>>>> +	unsigned long *newapm = (unsigned long *)data;
>>>> +
>>>> +	/*
>>>> +	 * No need to verify whether the driver is using the queues if it is the
>>>> +	 * default driver.
>>>> +	 */
>>>> +	if (ap_drv->flags & AP_DRIVER_FLAG_DEFAULT)
>>>> +		return 0;
>>>> +
>>>> +	/* The non-default driver's module must be loaded */
>>>> +	if (!try_module_get(drv->owner))
>>>> +		return 0;
>>>> +
>>>> +	if (ap_drv->in_use)
>>>> +		if (ap_drv->in_use(newapm, ap_perms.aqm))
>>>> +			rc = -EADDRINUSE;
>>> ISTR that Christian suggested -EBUSY in a past revision of this series?
>>> I think that would be more appropriate.
>> I went back and looked and sure enough, he did recommend that.
>> You have a great memory! I didn't respond to that comment, so I
>> must have missed it at the time.
>>
>> I personally prefer EADDRINUSE because I think it better indicates
>> that the reason an AP resource can not be assigned back to the host
>> drivers is that it is in use by a guest or, at the very least, reserved
>> for use by a guest (i.e., assigned to an mdev). To say it is busy implies
>> that the device is busy performing encryption services which may or
>> may not be true at a given moment. Even if so, that is not the reason
>> for refusing to allow reassignment of the device.
> I have a different understanding of these error codes: EADDRINUSE is
> something used in the networking context when an actual address is
> already used elsewhere. EBUSY is more of a generic error that indicates
> that a certain resource is not free to perform the requested operation;
> it does not necessarily mean that the resource is currently actively
> doing something. Kind of like when you get EBUSY when trying to eject
> something another program holds a reference on: that other program
> might not actually be doing anything, but it potentially could.

I'll go ahead and change it to -EBUSY.
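
For illustration, the veto logic discussed above can be sketched in user
space. This is a minimal sketch, not the kernel code: struct sketch_driver,
sketch_verify_mask_change() and sketch_guest_holds_bit3() are hypothetical
names, and the single unsigned long mask is a simplification of the real
apmask/aqmask bitmaps. It only shows the shape of the check: default drivers
are skipped, and the first non-default driver that reports a resource as
held causes the change to be refused with -EBUSY.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a driver on the bus (not struct ap_driver). */
struct sketch_driver {
	bool is_default;			/* default drivers are never queried */
	bool (*in_use)(unsigned long new_mask);	/* optional veto callback */
};

/*
 * Return 0 if the mask change may proceed, or -EBUSY if any non-default
 * driver reports that the change would take away a resource it holds.
 */
static int sketch_verify_mask_change(const struct sketch_driver *drvs,
				     size_t ndrvs, unsigned long new_mask)
{
	assert(drvs != NULL || ndrvs == 0);

	for (size_t i = 0; i < ndrvs; i++) {
		if (drvs[i].is_default || !drvs[i].in_use)
			continue;
		if (drvs[i].in_use(new_mask))
			return -EBUSY;
	}
	return 0;
}

/*
 * Example callback: pretend bit 3 is reserved for a guest. Setting it in
 * the new mask would hand the resource back to the default drivers.
 */
static bool sketch_guest_holds_bit3(unsigned long new_mask)
{
	return (new_mask & (1UL << 3)) != 0;
}
```

Note that the callback does not ask whether the resource is actively doing
work, only whether it is reserved, which matches the reading of EBUSY given
in the quoted reply.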

>



* Re: [PATCH v10 06/16] s390/vfio-ap: introduce shadow APCB
  2020-08-21 19:56 ` [PATCH v10 06/16] s390/vfio-ap: introduce shadow APCB Tony Krowiak
@ 2020-09-17 14:22   ` Cornelia Huck
  2020-09-18 17:03     ` Tony Krowiak
  2020-09-26  1:38   ` Halil Pasic
  1 sibling, 1 reply; 79+ messages in thread
From: Cornelia Huck @ 2020-09-17 14:22 UTC (permalink / raw)
  To: Tony Krowiak
  Cc: linux-s390, linux-kernel, kvm, freude, borntraeger, mjrosato,
	pasic, alex.williamson, kwankhede, fiuczy, frankja, david,
	imbrenda, hca, gor

On Fri, 21 Aug 2020 15:56:06 -0400
Tony Krowiak <akrowiak@linux.ibm.com> wrote:

> The APCB is a field within the CRYCB that provides the AP configuration
> to a KVM guest. Let's introduce a shadow copy of the KVM guest's APCB and
> maintain it for the lifespan of the guest.
> 
> Signed-off-by: Tony Krowiak <akrowiak@linux.ibm.com>
> ---
>  drivers/s390/crypto/vfio_ap_ops.c     | 32 ++++++++++++++++++++++-----
>  drivers/s390/crypto/vfio_ap_private.h |  2 ++
>  2 files changed, 29 insertions(+), 5 deletions(-)

(...)

> @@ -1202,13 +1223,12 @@ static int vfio_ap_mdev_group_notifier(struct notifier_block *nb,
>  	if (ret)
>  		return NOTIFY_DONE;
>  
> -	/* If there is no CRYCB pointer, then we can't copy the masks */
> -	if (!matrix_mdev->kvm->arch.crypto.crycbd)
> +	if (!vfio_ap_mdev_has_crycb(matrix_mdev))
>  		return NOTIFY_DONE;
>  
> -	kvm_arch_crypto_set_masks(matrix_mdev->kvm, matrix_mdev->matrix.apm,
> -				  matrix_mdev->matrix.aqm,
> -				  matrix_mdev->matrix.adm);
> +	memcpy(&matrix_mdev->shadow_apcb, &matrix_mdev->matrix,
> +	       sizeof(matrix_mdev->shadow_apcb));
> +	vfio_ap_mdev_commit_crycb(matrix_mdev);

We are sure that the shadow APCB always matches up as we are the only
ones manipulating the APCB in the CRYCB, right?

>  
>  	return NOTIFY_OK;
>  }
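
To make the question concrete, here is a minimal user-space sketch of the
shadow-then-commit pattern from the hunk above. All names (struct
sketch_matrix, struct sketch_mdev, sketch_commit(), and so on) are
hypothetical, and each mask is reduced to a single 64-bit word. The point it
illustrates: the shadow only stays identical to the guest's copy as long as
sketch_commit() is the sole writer of that copy.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical, simplified matrix: one 64-bit word per mask. */
struct sketch_matrix {
	unsigned long long apm;	/* adapters */
	unsigned long long aqm;	/* usage domains */
	unsigned long long adm;	/* control domains */
};

struct sketch_mdev {
	struct sketch_matrix matrix;		/* what the admin assigned */
	struct sketch_matrix shadow_apcb;	/* what the guest should see */
	struct sketch_matrix guest_apcb;	/* stand-in for the APCB in the CRYCB */
};

/* The only place in this sketch that writes the guest's copy. */
static void sketch_commit(struct sketch_mdev *m)
{
	assert(m != NULL);
	m->guest_apcb = m->shadow_apcb;
}

/* On guest start: snapshot the assigned matrix, then commit it. */
static void sketch_guest_start(struct sketch_mdev *m)
{
	assert(m != NULL);
	memcpy(&m->shadow_apcb, &m->matrix, sizeof(m->shadow_apcb));
	sketch_commit(m);
}

/* Nonzero while the shadow and the guest's copy are identical. */
static int sketch_shadow_in_sync(const struct sketch_mdev *m)
{
	return memcmp(&m->shadow_apcb, &m->guest_apcb,
		      sizeof(m->shadow_apcb)) == 0;
}
```

Changing m->matrix after sketch_guest_start() leaves both the shadow and the
guest's copy untouched until the next snapshot and commit.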



* Re: [PATCH v10 07/16] s390/vfio-ap: sysfs attribute to display the guest's matrix
  2020-08-21 19:56 ` [PATCH v10 07/16] s390/vfio-ap: sysfs attribute to display the guest's matrix Tony Krowiak
@ 2020-09-17 14:34   ` Cornelia Huck
  2020-09-18 17:09     ` Tony Krowiak
  0 siblings, 1 reply; 79+ messages in thread
From: Cornelia Huck @ 2020-09-17 14:34 UTC (permalink / raw)
  To: Tony Krowiak
  Cc: linux-s390, linux-kernel, kvm, freude, borntraeger, mjrosato,
	pasic, alex.williamson, kwankhede, fiuczy, frankja, david,
	imbrenda, hca, gor

On Fri, 21 Aug 2020 15:56:07 -0400
Tony Krowiak <akrowiak@linux.ibm.com> wrote:

> The matrix of adapters and domains configured in a guest's CRYCB may
> differ from the matrix of adapters and domains assigned to the matrix mdev,
> so this patch introduces a sysfs attribute to display the matrix of a guest
> using the matrix mdev. For a matrix mdev denoted by $uuid, the crycb for a
> guest using the matrix mdev can be displayed as follows:
> 
>    cat /sys/devices/vfio_ap/matrix/$uuid/guest_matrix
> 
> If a guest is not using the matrix mdev at the time the crycb is displayed,
> an error (ENODEV) will be returned.
> 
> Signed-off-by: Tony Krowiak <akrowiak@linux.ibm.com>
> ---
>  drivers/s390/crypto/vfio_ap_ops.c | 58 +++++++++++++++++++++++++++++++
>  1 file changed, 58 insertions(+)
> 
> diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c
> index efb229033f9e..30bf23734af6 100644
> --- a/drivers/s390/crypto/vfio_ap_ops.c
> +++ b/drivers/s390/crypto/vfio_ap_ops.c
> @@ -1119,6 +1119,63 @@ static ssize_t matrix_show(struct device *dev, struct device_attribute *attr,
>  }
>  static DEVICE_ATTR_RO(matrix);
>  
> +static ssize_t guest_matrix_show(struct device *dev,
> +				 struct device_attribute *attr, char *buf)
> +{
> +	struct mdev_device *mdev = mdev_from_dev(dev);
> +	struct ap_matrix_mdev *matrix_mdev = mdev_get_drvdata(mdev);
> +	char *bufpos = buf;
> +	unsigned long apid;
> +	unsigned long apqi;
> +	unsigned long apid1;
> +	unsigned long apqi1;
> +	unsigned long napm_bits = matrix_mdev->shadow_apcb.apm_max + 1;
> +	unsigned long naqm_bits = matrix_mdev->shadow_apcb.aqm_max + 1;
> +	int nchars = 0;
> +	int n;
> +
> +	if (!vfio_ap_mdev_has_crycb(matrix_mdev))
> +		return -ENODEV;
> +
> +	apid1 = find_first_bit_inv(matrix_mdev->shadow_apcb.apm, napm_bits);
> +	apqi1 = find_first_bit_inv(matrix_mdev->shadow_apcb.aqm, naqm_bits);
> +
> +	mutex_lock(&matrix_dev->lock);
> +
> +	if ((apid1 < napm_bits) && (apqi1 < naqm_bits)) {
> +		for_each_set_bit_inv(apid, matrix_mdev->shadow_apcb.apm,
> +				     napm_bits) {
> +			for_each_set_bit_inv(apqi,
> +					     matrix_mdev->shadow_apcb.aqm,
> +					     naqm_bits) {
> +				n = sprintf(bufpos, "%02lx.%04lx\n", apid,
> +					    apqi);
> +				bufpos += n;
> +				nchars += n;
> +			}
> +		}
> +	} else if (apid1 < napm_bits) {
> +		for_each_set_bit_inv(apid, matrix_mdev->shadow_apcb.apm,
> +				     napm_bits) {
> +			n = sprintf(bufpos, "%02lx.\n", apid);
> +			bufpos += n;
> +			nchars += n;
> +		}
> +	} else if (apqi1 < naqm_bits) {
> +		for_each_set_bit_inv(apqi, matrix_mdev->shadow_apcb.aqm,
> +				     naqm_bits) {
> +			n = sprintf(bufpos, ".%04lx\n", apqi);
> +			bufpos += n;
> +			nchars += n;
> +		}
> +	}
> +
> +	mutex_unlock(&matrix_dev->lock);
> +
> +	return nchars;
> +}

This basically looks like a version of matrix_show() operating on the
shadow apcb. I'm wondering if we could consolidate these two functions
by passing in the structure to operate on as a parameter? Might not be
worth the effort, though.



* Re: [PATCH v10 06/16] s390/vfio-ap: introduce shadow APCB
  2020-09-17 14:22   ` Cornelia Huck
@ 2020-09-18 17:03     ` Tony Krowiak
  0 siblings, 0 replies; 79+ messages in thread
From: Tony Krowiak @ 2020-09-18 17:03 UTC (permalink / raw)
  To: Cornelia Huck
  Cc: linux-s390, linux-kernel, kvm, freude, borntraeger, mjrosato,
	pasic, alex.williamson, kwankhede, fiuczy, frankja, david,
	imbrenda, hca, gor



On 9/17/20 10:22 AM, Cornelia Huck wrote:
> On Fri, 21 Aug 2020 15:56:06 -0400
> Tony Krowiak <akrowiak@linux.ibm.com> wrote:
>
>> The APCB is a field within the CRYCB that provides the AP configuration
>> to a KVM guest. Let's introduce a shadow copy of the KVM guest's APCB and
>> maintain it for the lifespan of the guest.
>>
>> Signed-off-by: Tony Krowiak <akrowiak@linux.ibm.com>
>> ---
>>   drivers/s390/crypto/vfio_ap_ops.c     | 32 ++++++++++++++++++++++-----
>>   drivers/s390/crypto/vfio_ap_private.h |  2 ++
>>   2 files changed, 29 insertions(+), 5 deletions(-)
> (...)
>
>> @@ -1202,13 +1223,12 @@ static int vfio_ap_mdev_group_notifier(struct notifier_block *nb,
>>   	if (ret)
>>   		return NOTIFY_DONE;
>>   
>> -	/* If there is no CRYCB pointer, then we can't copy the masks */
>> -	if (!matrix_mdev->kvm->arch.crypto.crycbd)
>> +	if (!vfio_ap_mdev_has_crycb(matrix_mdev))
>>   		return NOTIFY_DONE;
>>   
>> -	kvm_arch_crypto_set_masks(matrix_mdev->kvm, matrix_mdev->matrix.apm,
>> -				  matrix_mdev->matrix.aqm,
>> -				  matrix_mdev->matrix.adm);
>> +	memcpy(&matrix_mdev->shadow_apcb, &matrix_mdev->matrix,
>> +	       sizeof(matrix_mdev->shadow_apcb));
>> +	vfio_ap_mdev_commit_crycb(matrix_mdev);
> We are sure that the shadow APCB always matches up as we are the only
> ones manipulating the APCB in the CRYCB, right?

Yes

>
>>   
>>   	return NOTIFY_OK;
>>   }



* Re: [PATCH v10 07/16] s390/vfio-ap: sysfs attribute to display the guest's matrix
  2020-09-17 14:34   ` Cornelia Huck
@ 2020-09-18 17:09     ` Tony Krowiak
  2020-09-26  7:16       ` Halil Pasic
  0 siblings, 1 reply; 79+ messages in thread
From: Tony Krowiak @ 2020-09-18 17:09 UTC (permalink / raw)
  To: Cornelia Huck
  Cc: linux-s390, linux-kernel, kvm, freude, borntraeger, mjrosato,
	pasic, alex.williamson, kwankhede, fiuczy, frankja, david,
	imbrenda, hca, gor



On 9/17/20 10:34 AM, Cornelia Huck wrote:
> On Fri, 21 Aug 2020 15:56:07 -0400
> Tony Krowiak <akrowiak@linux.ibm.com> wrote:
>
>> The matrix of adapters and domains configured in a guest's CRYCB may
>> differ from the matrix of adapters and domains assigned to the matrix mdev,
>> so this patch introduces a sysfs attribute to display the matrix of a guest
>> using the matrix mdev. For a matrix mdev denoted by $uuid, the crycb for a
>> guest using the matrix mdev can be displayed as follows:
>>
>>     cat /sys/devices/vfio_ap/matrix/$uuid/guest_matrix
>>
>> If a guest is not using the matrix mdev at the time the crycb is displayed,
>> an error (ENODEV) will be returned.
>>
>> Signed-off-by: Tony Krowiak <akrowiak@linux.ibm.com>
>> ---
>>   drivers/s390/crypto/vfio_ap_ops.c | 58 +++++++++++++++++++++++++++++++
>>   1 file changed, 58 insertions(+)
>>
>> diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c
>> index efb229033f9e..30bf23734af6 100644
>> --- a/drivers/s390/crypto/vfio_ap_ops.c
>> +++ b/drivers/s390/crypto/vfio_ap_ops.c
>> @@ -1119,6 +1119,63 @@ static ssize_t matrix_show(struct device *dev, struct device_attribute *attr,
>>   }
>>   static DEVICE_ATTR_RO(matrix);
>>   
>> +static ssize_t guest_matrix_show(struct device *dev,
>> +				 struct device_attribute *attr, char *buf)
>> +{
>> +	struct mdev_device *mdev = mdev_from_dev(dev);
>> +	struct ap_matrix_mdev *matrix_mdev = mdev_get_drvdata(mdev);
>> +	char *bufpos = buf;
>> +	unsigned long apid;
>> +	unsigned long apqi;
>> +	unsigned long apid1;
>> +	unsigned long apqi1;
>> +	unsigned long napm_bits = matrix_mdev->shadow_apcb.apm_max + 1;
>> +	unsigned long naqm_bits = matrix_mdev->shadow_apcb.aqm_max + 1;
>> +	int nchars = 0;
>> +	int n;
>> +
>> +	if (!vfio_ap_mdev_has_crycb(matrix_mdev))
>> +		return -ENODEV;
>> +
>> +	apid1 = find_first_bit_inv(matrix_mdev->shadow_apcb.apm, napm_bits);
>> +	apqi1 = find_first_bit_inv(matrix_mdev->shadow_apcb.aqm, naqm_bits);
>> +
>> +	mutex_lock(&matrix_dev->lock);
>> +
>> +	if ((apid1 < napm_bits) && (apqi1 < naqm_bits)) {
>> +		for_each_set_bit_inv(apid, matrix_mdev->shadow_apcb.apm,
>> +				     napm_bits) {
>> +			for_each_set_bit_inv(apqi,
>> +					     matrix_mdev->shadow_apcb.aqm,
>> +					     naqm_bits) {
>> +				n = sprintf(bufpos, "%02lx.%04lx\n", apid,
>> +					    apqi);
>> +				bufpos += n;
>> +				nchars += n;
>> +			}
>> +		}
>> +	} else if (apid1 < napm_bits) {
>> +		for_each_set_bit_inv(apid, matrix_mdev->shadow_apcb.apm,
>> +				     napm_bits) {
>> +			n = sprintf(bufpos, "%02lx.\n", apid);
>> +			bufpos += n;
>> +			nchars += n;
>> +		}
>> +	} else if (apqi1 < naqm_bits) {
>> +		for_each_set_bit_inv(apqi, matrix_mdev->shadow_apcb.aqm,
>> +				     naqm_bits) {
>> +			n = sprintf(bufpos, ".%04lx\n", apqi);
>> +			bufpos += n;
>> +			nchars += n;
>> +		}
>> +	}
>> +
>> +	mutex_unlock(&matrix_dev->lock);
>> +
>> +	return nchars;
>> +}
> This basically looks like a version of matrix_show() operating on the
> shadow apcb. I'm wondering if we could consolidate these two functions
> by passing in the structure to operate on as a parameter? Might not be
> worth the effort, though.

We still need the two functions because they back the mdev's
sysfs matrix and guest_matrix attributes, but both could call a
common helper function. I'm not sure it buys us much though.
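
As a rough idea of what such a helper might look like, here is a user-space
sketch. Everything in it is an assumption made for illustration: struct
sketch_matrix, the 256-bit mask width, the byte-array representation with
bit 0 as the leftmost bit of byte 0 (standing in for the *_inv bit helpers),
and the bounded snprintf() in place of sprintf(). The two show functions
would then differ only in which structure they pass in.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

#define SKETCH_BITS 256	/* assumed mask width for this sketch */

struct sketch_matrix {
	unsigned char apm[SKETCH_BITS / 8];	/* adapters (APIDs) */
	unsigned char aqm[SKETCH_BITS / 8];	/* domains (APQIs) */
};

/* Bit 0 is the leftmost bit of byte 0 (MSB-first numbering). */
static int sketch_test_bit_inv(unsigned int nr, const unsigned char *mask)
{
	assert(nr < SKETCH_BITS);
	return (mask[nr / 8] >> (7 - nr % 8)) & 1;
}

static void sketch_set_bit_inv(unsigned int nr, unsigned char *mask)
{
	assert(nr < SKETCH_BITS);
	mask[nr / 8] |= (unsigned char)(0x80u >> (nr % 8));
}

static int sketch_any_bit(const unsigned char *mask)
{
	for (size_t i = 0; i < SKETCH_BITS / 8; i++)
		if (mask[i])
			return 1;
	return 0;
}

/* Append one line at buf + *len; returns 0 if it did not fit. */
static int sketch_emit(char *buf, size_t size, size_t *len, const char *line)
{
	int n = snprintf(buf + *len, size - *len, "%s", line);

	if (n < 0 || (size_t)n >= size - *len) {
		buf[*len] = '\0';	/* drop the partial line */
		return 0;
	}
	*len += (size_t)n;
	return 1;
}

/* Shared formatter; returns the number of characters written. */
static size_t sketch_format_matrix(const struct sketch_matrix *m,
				   char *buf, size_t size)
{
	int have_apid = sketch_any_bit(m->apm);
	int have_apqi = sketch_any_bit(m->aqm);
	char line[32];
	size_t len = 0;

	assert(size > 0);
	buf[0] = '\0';

	for (unsigned int apid = 0; have_apid && apid < SKETCH_BITS; apid++) {
		if (!sketch_test_bit_inv(apid, m->apm))
			continue;
		if (!have_apqi) {
			/* Adapters only: print "apid." */
			snprintf(line, sizeof(line), "%02x.\n", apid);
			if (!sketch_emit(buf, size, &len, line))
				return len;
			continue;
		}
		for (unsigned int apqi = 0; apqi < SKETCH_BITS; apqi++) {
			if (!sketch_test_bit_inv(apqi, m->aqm))
				continue;
			snprintf(line, sizeof(line), "%02x.%04x\n", apid, apqi);
			if (!sketch_emit(buf, size, &len, line))
				return len;
		}
	}
	for (unsigned int apqi = 0; !have_apid && apqi < SKETCH_BITS; apqi++) {
		if (!sketch_test_bit_inv(apqi, m->aqm))
			continue;
		/* Domains only: print ".apqi" */
		snprintf(line, sizeof(line), ".%04x\n", apqi);
		if (!sketch_emit(buf, size, &len, line))
			return len;
	}
	return len;
}
```

With one adapter bit set the output is a line such as "01."; once a domain
bit is set as well, each APID/APQI pair is printed as "01.002f".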

>



* Re: [PATCH v10 02/16] s390/vfio-ap: use new AP bus interface to search for queue devices
  2020-08-27 14:24     ` Tony Krowiak
  2020-08-28  8:13       ` Cornelia Huck
@ 2020-09-25  2:11       ` Halil Pasic
  2020-10-16 20:59         ` Tony Krowiak
  1 sibling, 1 reply; 79+ messages in thread
From: Halil Pasic @ 2020-09-25  2:11 UTC (permalink / raw)
  To: Tony Krowiak
  Cc: Cornelia Huck, linux-s390, linux-kernel, kvm, freude,
	borntraeger, mjrosato, alex.williamson, kwankhede, fiuczy,
	frankja, david, imbrenda, hca, gor

On Thu, 27 Aug 2020 10:24:07 -0400
Tony Krowiak <akrowiak@linux.ibm.com> wrote:

> 
> 
> On 8/25/20 6:13 AM, Cornelia Huck wrote:
> > On Fri, 21 Aug 2020 15:56:02 -0400
> > Tony Krowiak<akrowiak@linux.ibm.com>  wrote:
> >
> >> This patch refactor's the vfio_ap device driver to use the AP bus's
> > s/refactor's/refactors/
> 
> Of course, what was I thinking?:)
> 
> >> ap_get_qdev() function to retrieve the vfio_ap_queue struct containing
> >> information about a queue that is bound to the vfio_ap device driver.
> >> The bus's ap_get_qdev() function retrieves the queue device from a
> >> hashtable keyed by APQN. This is much more efficient than looping over
> >> the list of devices attached to the AP bus by several orders of
> >> magnitude.
> >>
> >> Signed-off-by: Tony Krowiak<akrowiak@linux.ibm.com>
> >> Reported-by: kernel test robot<lkp@intel.com>
> >> ---
> >>   drivers/s390/crypto/vfio_ap_drv.c     | 27 ++-------
> >>   drivers/s390/crypto/vfio_ap_ops.c     | 86 +++++++++++++++------------
> >>   drivers/s390/crypto/vfio_ap_private.h |  8 ++-
> >>   3 files changed, 59 insertions(+), 62 deletions(-)
> >>
> > (...)
> >
> >> diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c
> >> index e0bde8518745..ad3925f04f61 100644
> >> --- a/drivers/s390/crypto/vfio_ap_ops.c
> >> +++ b/drivers/s390/crypto/vfio_ap_ops.c
> >> @@ -26,43 +26,26 @@
> >>   
> >>   static int vfio_ap_mdev_reset_queues(struct mdev_device *mdev);
> >>   
> >> -static int match_apqn(struct device *dev, const void *data)
> >> -{
> >> -	struct vfio_ap_queue *q = dev_get_drvdata(dev);
> >> -
> >> -	return (q->apqn == *(int *)(data)) ? 1 : 0;
> >> -}
> >> -
> >>   /**
> >> - * vfio_ap_get_queue: Retrieve a queue with a specific APQN from a list
> >> - * @matrix_mdev: the associated mediated matrix
> >> + * vfio_ap_get_queue: Retrieve a queue with a specific APQN.
> >>    * @apqn: The queue APQN
> >>    *
> >> - * Retrieve a queue with a specific APQN from the list of the
> >> - * devices of the vfio_ap_drv.
> >> - * Verify that the APID and the APQI are set in the matrix.
> >> + * Retrieve a queue with a specific APQN from the AP queue devices attached to
> >> + * the AP bus.
> >>    *
> >> - * Returns the pointer to the associated vfio_ap_queue
> >> + * Returns the pointer to the vfio_ap_queue with the specified APQN, or NULL.
> >>    */
> >> -static struct vfio_ap_queue *vfio_ap_get_queue(
> >> -					struct ap_matrix_mdev *matrix_mdev,
> >> -					int apqn)
> >> +static struct vfio_ap_queue *vfio_ap_get_queue(unsigned long apqn)
> >>   {
> >> +	struct ap_queue *queue;
> >>   	struct vfio_ap_queue *q;
> >> -	struct device *dev;
> >>   
> >> -	if (!test_bit_inv(AP_QID_CARD(apqn), matrix_mdev->matrix.apm))
> >> -		return NULL;
> >> -	if (!test_bit_inv(AP_QID_QUEUE(apqn), matrix_mdev->matrix.aqm))
> > I think you should add some explanation to the patch description why
> > testing the matrix bitmasks is not needed anymore.
> 
> As a result of this comment, I took a closer look at the code to
> determine the reason for eliminating the matrix_mdev
> parameter. The reason is that the code below (i.e., find the device
> and get the driver data) was also repeated in the vfio_ap_irq_disable_apqn()
> function, so I replaced it with a call to the function above; however, the
> vfio_ap_irq_disable_apqn() function does not have a reference to the
> matrix_mdev, so I eliminated the matrix_mdev parameter. Note that the
> vfio_ap_irq_disable_apqn() is called for each APQN assigned to a matrix
> mdev, so there is no need to test the bitmasks there.
> 
> The other place from which the function above is called is
> the handle_pqap() function which does have a reference to the
> matrix_mdev. In order to ensure the integrity of the instruction
> being intercepted - i.e., PQAP(AQIC) enable/disable IRQ for an
> AP queue - the testing of the matrix bitmasks probably ought to
> be performed, so it will be done there instead of in the
> vfio_ap_get_queue() function above.

I'm a little confused. I do agree that in handle_pqap() we do want to
make sure that we only operate on queues that belong to the given guest
that issued the PQAP instruction.

AFAICT with this patch set applied, this is not the case any more. Does
that 'will be done there instead' refer to v11?

Another question is, can we use vfio_ap_get_mdev_queue() in
handle_pqap() (instead of vfio_ap_get_queue())?
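
To spell out the check I have in mind, here is a minimal user-space sketch.
The SKETCH_* macros follow the naming of AP_MKQID/AP_QID_CARD/AP_QID_QUEUE,
but the 8-bit card / 8-bit queue layout, struct sketch_matrix and the
byte-array bit helpers are assumptions made for the example, not the kernel
definitions.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed APQN layout for this sketch: card in bits 8-15, queue in bits 0-7. */
#define SKETCH_MKQID(card, queue) ((((card) & 0xffu) << 8) | ((queue) & 0xffu))
#define SKETCH_QID_CARD(qid)	(((qid) >> 8) & 0xffu)
#define SKETCH_QID_QUEUE(qid)	((qid) & 0xffu)

#define SKETCH_BITS 256

struct sketch_matrix {
	unsigned char apm[SKETCH_BITS / 8];	/* adapters assigned to the guest */
	unsigned char aqm[SKETCH_BITS / 8];	/* domains assigned to the guest */
};

/* Bit 0 is the leftmost bit of byte 0 (MSB-first numbering). */
static int sketch_test_bit_inv(unsigned int nr, const unsigned char *mask)
{
	assert(nr < SKETCH_BITS);
	return (mask[nr / 8] >> (7 - nr % 8)) & 1;
}

static void sketch_set_bit_inv(unsigned int nr, unsigned char *mask)
{
	assert(nr < SKETCH_BITS);
	mask[nr / 8] |= (unsigned char)(0x80u >> (nr % 8));
}

/*
 * Nonzero only if both the adapter and the domain of @apqn are assigned
 * to the guest's matrix; an intercept handler would bail out otherwise.
 */
static int sketch_apqn_in_matrix(const struct sketch_matrix *m,
				 unsigned int apqn)
{
	assert(m != NULL);
	return sketch_test_bit_inv(SKETCH_QID_CARD(apqn), m->apm) &&
	       sketch_test_bit_inv(SKETCH_QID_QUEUE(apqn), m->aqm);
}
```

A lookup in a per-mdev table keyed by APQN would give the same guarantee
implicitly, since only queues linked to that mdev can be found there.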
 
> 
> 
> > +	queue = ap_get_qdev(apqn);
> > +	if (!queue)
> >   		return NULL;
> >   
> > -	dev = driver_find_device(&matrix_dev->vfio_ap_drv->driver, NULL,
> > -				 &apqn, match_apqn);
> > -	if (!dev)
> > -		return NULL;
> > -	q = dev_get_drvdata(dev);
> > -	q->matrix_mdev = matrix_mdev;
> > -	put_device(dev);
> > +	q = dev_get_drvdata(&queue->ap_dev.device);
> > +	put_device(&queue->ap_dev.device);
> >   
> >   	return q;
> >   }
> > (...)
> >
> 



* Re: [PATCH v10 02/16] s390/vfio-ap: use new AP bus interface to search for queue devices
  2020-08-21 19:56 ` [PATCH v10 02/16] s390/vfio-ap: use new AP bus interface to search for queue devices Tony Krowiak
  2020-08-25 10:13   ` Cornelia Huck
  2020-09-04  8:11   ` Christian Borntraeger
@ 2020-09-25  2:27   ` Halil Pasic
  2020-09-29 13:07     ` Tony Krowiak
  2 siblings, 1 reply; 79+ messages in thread
From: Halil Pasic @ 2020-09-25  2:27 UTC (permalink / raw)
  To: Tony Krowiak
  Cc: linux-s390, linux-kernel, kvm, freude, borntraeger, cohuck,
	mjrosato, alex.williamson, kwankhede, fiuczy, frankja, david,
	imbrenda, hca, gor, kernel test robot

On Fri, 21 Aug 2020 15:56:02 -0400
Tony Krowiak <akrowiak@linux.ibm.com> wrote:

> --- a/drivers/s390/crypto/vfio_ap_ops.c
> +++ b/drivers/s390/crypto/vfio_ap_ops.c
> @@ -26,43 +26,26 @@
>  
>  static int vfio_ap_mdev_reset_queues(struct mdev_device *mdev);
>  
> -static int match_apqn(struct device *dev, const void *data)
> -{
> -	struct vfio_ap_queue *q = dev_get_drvdata(dev);
> -
> -	return (q->apqn == *(int *)(data)) ? 1 : 0;
> -}
> -
>  /**
> - * vfio_ap_get_queue: Retrieve a queue with a specific APQN from a list
> - * @matrix_mdev: the associated mediated matrix
> + * vfio_ap_get_queue: Retrieve a queue with a specific APQN.
>   * @apqn: The queue APQN
>   *
> - * Retrieve a queue with a specific APQN from the list of the
> - * devices of the vfio_ap_drv.
> - * Verify that the APID and the APQI are set in the matrix.
> + * Retrieve a queue with a specific APQN from the AP queue devices attached to
> + * the AP bus.
>   *
> - * Returns the pointer to the associated vfio_ap_queue
> + * Returns the pointer to the vfio_ap_queue with the specified APQN, or NULL.
>   */
> -static struct vfio_ap_queue *vfio_ap_get_queue(
> -					struct ap_matrix_mdev *matrix_mdev,
> -					int apqn)
> +static struct vfio_ap_queue *vfio_ap_get_queue(unsigned long apqn)
>  {
> +	struct ap_queue *queue;
>  	struct vfio_ap_queue *q;
> -	struct device *dev;
>  
> -	if (!test_bit_inv(AP_QID_CARD(apqn), matrix_mdev->matrix.apm))
> -		return NULL;
> -	if (!test_bit_inv(AP_QID_QUEUE(apqn), matrix_mdev->matrix.aqm))
> +	queue = ap_get_qdev(apqn);
> +	if (!queue)
>  		return NULL;
>  
> -	dev = driver_find_device(&matrix_dev->vfio_ap_drv->driver, NULL,
> -				 &apqn, match_apqn);
> -	if (!dev)
> -		return NULL;
> -	q = dev_get_drvdata(dev);
> -	q->matrix_mdev = matrix_mdev;
> -	put_device(dev);
> +	q = dev_get_drvdata(&queue->ap_dev.device);

Is this cast here safe? (I don't think it is.)

> +	put_device(&queue->ap_dev.device);
>  
>  	return q;
>  }
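
One possible reading of the concern, sketched in user space: the lookup
covers every queue device on the bus, so the device found may be bound to a
different driver, and its driver data is then not a vfio_ap_queue. The
sketch below is only an illustration of that reading; struct sketch_device,
struct sketch_driver, sketch_vfio_drv and sketch_get_vfio_queue() are
hypothetical and do not correspond to kernel structures.

```c
#include <assert.h>
#include <stddef.h>

struct sketch_driver {
	const char *name;
};

struct sketch_device {
	const struct sketch_driver *driver;	/* driver currently bound, or NULL */
	void *drvdata;				/* only meaningful to that driver */
};

struct sketch_vfio_queue {
	int apqn;
};

static const struct sketch_driver sketch_vfio_drv = { "vfio_ap" };

/*
 * Interpret drvdata as a sketch_vfio_queue only if the device is bound
 * to our driver; otherwise the pointer belongs to someone else.
 */
static struct sketch_vfio_queue *
sketch_get_vfio_queue(const struct sketch_device *dev)
{
	assert(dev != NULL);
	if (dev->driver != &sketch_vfio_drv)
		return NULL;
	return dev->drvdata;
}
```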


* Re: [PATCH v10 03/16] s390/vfio-ap: manage link between queue struct and matrix mdev
  2020-08-21 19:56 ` [PATCH v10 03/16] s390/vfio-ap: manage link between queue struct and matrix mdev Tony Krowiak
  2020-08-25 10:25   ` Cornelia Huck
  2020-09-04  8:15   ` Christian Borntraeger
@ 2020-09-25  7:58   ` Halil Pasic
  2 siblings, 0 replies; 79+ messages in thread
From: Halil Pasic @ 2020-09-25  7:58 UTC (permalink / raw)
  To: Tony Krowiak
  Cc: linux-s390, linux-kernel, kvm, freude, borntraeger, cohuck,
	mjrosato, alex.williamson, kwankhede, fiuczy, frankja, david,
	imbrenda, hca, gor

On Fri, 21 Aug 2020 15:56:03 -0400
Tony Krowiak <akrowiak@linux.ibm.com> wrote:

> Let's create links between each queue device bound to the vfio_ap device

How about: Let us establish a bidirectional link...

we kind of had a shaky queue --> matrix_mdev link prior to this patch,
you are making this one solid and you are adding the matrix_mdev -->
queue link.

> driver and the matrix mdev to which the queue is assigned. The idea is to
> facilitate efficient retrieval of the objects representing the queue
> devices and matrix mdevs as well as to verify that a queue assigned to
> a matrix mdev is bound to the driver.
> 

I will have to look at the rest of the series to figure the usage out.
One thought in the back of my head is that if vfio_ap_get_queue(apqn) is
already O(1) then there is not much efficiency to gain by adding another
hashtable whose keys are apqns and values queues.

But I don't want to get hung up on this.

> The links will be created as follows:
> 
>    * When the queue device is probed, if its APQN is assigned to a matrix
>      mdev, the structures representing the queue device and the matrix mdev
>      will be linked.
> 
>    * When an adapter or domain is assigned to a matrix mdev, for each new
>      APQN assigned that references a queue device bound to the vfio_ap
>      device driver, the structures representing the queue device and the
>      matrix mdev will be linked.
> 
> The links will be removed as follows:
> 
>    * When the queue device is removed, if its APQN is assigned to a matrix
>      mdev, the structures representing the queue device and the matrix mdev
>      will be unlinked.
> 
>    * When an adapter or domain is unassigned from a matrix mdev, for each
>      APQN unassigned that references a queue device bound to the vfio_ap
>      device driver, the structures representing the queue device and the
>      matrix mdev will be unlinked.
> 
> Signed-off-by: Tony Krowiak <akrowiak@linux.ibm.com>

Except for the things pointed out by Connie and Christian, LGTM.
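
For readers following along, the link/unlink rules in the quoted description
amount to keeping two pointers consistent. A minimal user-space sketch,
assuming a simple chained hash table keyed by APQN; struct sketch_queue,
struct sketch_mdev and the sketch_* functions are hypothetical and stand in
for the kernel hashtable helpers used in the patch:

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_BUCKETS 256

struct sketch_mdev;

struct sketch_queue {
	unsigned int apqn;
	struct sketch_mdev *mdev;	/* queue -> mdev link */
	struct sketch_queue *next;	/* bucket chain */
};

struct sketch_mdev {
	struct sketch_queue *qtable[SKETCH_BUCKETS];	/* mdev -> queue links */
};

static unsigned int sketch_hash(unsigned int apqn)
{
	return apqn % SKETCH_BUCKETS;
}

/* Establish both directions of the link in one step. */
static void sketch_link(struct sketch_mdev *m, struct sketch_queue *q)
{
	unsigned int b = sketch_hash(q->apqn);

	assert(q->mdev == NULL);	/* a queue is linked to at most one mdev */
	q->mdev = m;
	q->next = m->qtable[b];
	m->qtable[b] = q;
}

/* Tear down both directions; a no-op for an unlinked queue. */
static void sketch_unlink(struct sketch_queue *q)
{
	struct sketch_queue **pp;

	if (!q->mdev)
		return;
	for (pp = &q->mdev->qtable[sketch_hash(q->apqn)]; *pp;
	     pp = &(*pp)->next) {
		if (*pp == q) {
			*pp = q->next;
			break;
		}
	}
	q->next = NULL;
	q->mdev = NULL;
}

/* Find a queue by APQN among those linked to @m, or NULL. */
static struct sketch_queue *sketch_lookup(const struct sketch_mdev *m,
					  unsigned int apqn)
{
	struct sketch_queue *q;

	for (q = m->qtable[sketch_hash(apqn)]; q; q = q->next)
		if (q->apqn == apqn)
			return q;
	return NULL;
}
```

Unlike a bus-wide lookup, sketch_lookup() can only return queues that are
linked to the given mdev, which is the property the per-mdev table adds.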

> ---
>  drivers/s390/crypto/vfio_ap_ops.c     | 132 +++++++++++++++++++++++++-
>  drivers/s390/crypto/vfio_ap_private.h |   2 +
>  2 files changed, 129 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c
> index ad3925f04f61..2e37ee82e422 100644
> --- a/drivers/s390/crypto/vfio_ap_ops.c
> +++ b/drivers/s390/crypto/vfio_ap_ops.c
> @@ -50,6 +50,19 @@ static struct vfio_ap_queue *vfio_ap_get_queue(unsigned long apqn)
>  	return q;
>  }
>  
> +static struct vfio_ap_queue *vfio_ap_get_mdev_queue(struct ap_matrix_mdev *matrix_mdev,
> +						    unsigned long apqn)
> +{
> +	struct vfio_ap_queue *q;
> +
> +	hash_for_each_possible(matrix_mdev->qtable, q, mdev_qnode, apqn) {
> +		if (q && (q->apqn == apqn))
> +			return q;
> +	}
> +
> +	return NULL;
> +}
> +
>  /**
>   * vfio_ap_wait_for_irqclear
>   * @apqn: The AP Queue number
> @@ -160,7 +173,6 @@ static struct ap_queue_status vfio_ap_irq_disable(struct vfio_ap_queue *q)
>  		  status.response_code);
>  end_free:
>  	vfio_ap_free_aqic_resources(q);
> -	q->matrix_mdev = NULL;
>  	return status;
>  }
>  
> @@ -262,7 +274,6 @@ static int handle_pqap(struct kvm_vcpu *vcpu)
>  	struct vfio_ap_queue *q;
>  	struct ap_queue_status qstatus = {
>  			       .response_code = AP_RESPONSE_Q_NOT_AVAIL, };
> -	struct ap_matrix_mdev *matrix_mdev;
>  
>  	/* If we do not use the AIV facility just go to userland */
>  	if (!(vcpu->arch.sie_block->eca & ECA_AIV))
> @@ -273,14 +284,11 @@ static int handle_pqap(struct kvm_vcpu *vcpu)
>  
>  	if (!vcpu->kvm->arch.crypto.pqap_hook)
>  		goto out_unlock;
> -	matrix_mdev = container_of(vcpu->kvm->arch.crypto.pqap_hook,
> -				   struct ap_matrix_mdev, pqap_hook);
>  
>  	q = vfio_ap_get_queue(apqn);
>  	if (!q)
>  		goto out_unlock;
>  
> -	q->matrix_mdev = matrix_mdev;
>  	status = vcpu->run->s.regs.gprs[1];
>  
>  	/* If IR bit(16) is set we enable the interrupt */
> @@ -320,6 +328,7 @@ static int vfio_ap_mdev_create(struct kobject *kobj, struct mdev_device *mdev)
>  
>  	matrix_mdev->mdev = mdev;
>  	vfio_ap_matrix_init(&matrix_dev->info, &matrix_mdev->matrix);
> +	hash_init(matrix_mdev->qtable);
>  	mdev_set_drvdata(mdev, matrix_mdev);
>  	matrix_mdev->pqap_hook.hook = handle_pqap;
>  	matrix_mdev->pqap_hook.owner = THIS_MODULE;
> @@ -548,6 +557,87 @@ static int vfio_ap_mdev_verify_no_sharing(struct ap_matrix_mdev *matrix_mdev)
>  	return 0;
>  }
>  
> +enum qlink_type {
> +	LINK_APID,
> +	LINK_APQI,
> +	UNLINK_APID,
> +	UNLINK_APQI,
> +};
> +
> +static void vfio_ap_mdev_link_queue(struct ap_matrix_mdev *matrix_mdev,
> +				    unsigned long apid, unsigned long apqi)
> +{
> +	struct vfio_ap_queue *q;
> +
> +	q = vfio_ap_get_queue(AP_MKQID(apid, apqi));
> +	if (q) {
> +		q->matrix_mdev = matrix_mdev;
> +		hash_add(matrix_mdev->qtable,
> +			 &q->mdev_qnode, q->apqn);
> +	}
> +}
> +
> +static void vfio_ap_mdev_unlink_queue(unsigned long apid, unsigned long apqi)
> +{
> +	struct vfio_ap_queue *q;
> +
> +	q = vfio_ap_get_queue(AP_MKQID(apid, apqi));
> +	if (q) {
> +		q->matrix_mdev = NULL;
> +		hash_del(&q->mdev_qnode);
> +	}
> +}
> +
> +/**
> + * vfio_ap_mdev_link_queues
> + *
> + * @matrix_mdev: The matrix mdev to link.
> + * @type:	 The type of @qlink_id.
> + * @qlink_id:	 The APID or APQI of the queues to link.
> + *
> + * Sets or clears the links between the queues with the specified @qlink_id
> + * and the @matrix_mdev:
> + *     @type == LINK_APID: Set the links between the @matrix_mdev and the
> + *                         queues with the specified @qlink_id (APID)
> + *     @type == LINK_APQI: Set the links between the @matrix_mdev and the
> + *                         queues with the specified @qlink_id (APQI)
> + *     @type == UNLINK_APID: Clear the links between the @matrix_mdev and the
> + *                           queues with the specified @qlink_id (APID)
> + *     @type == UNLINK_APQI: Clear the links between the @matrix_mdev and the
> + *                           queues with the specified @qlink_id (APQI)
> + */
> +static void vfio_ap_mdev_link_queues(struct ap_matrix_mdev *matrix_mdev,
> +				     enum qlink_type type,
> +				     unsigned long qlink_id)
> +{
> +	unsigned long id;
> +
> +	switch (type) {
> +	case LINK_APID:
> +		for_each_set_bit_inv(id, matrix_mdev->matrix.aqm,
> +				     matrix_mdev->matrix.aqm_max + 1)
> +			vfio_ap_mdev_link_queue(matrix_mdev, qlink_id, id);
> +		break;
> +	case UNLINK_APID:
> +		for_each_set_bit_inv(id, matrix_mdev->matrix.aqm,
> +				     matrix_mdev->matrix.aqm_max + 1)
> +			vfio_ap_mdev_unlink_queue(qlink_id, id);
> +		break;
> +	case LINK_APQI:
> +		for_each_set_bit_inv(id, matrix_mdev->matrix.apm,
> +				     matrix_mdev->matrix.apm_max + 1)
> +			vfio_ap_mdev_link_queue(matrix_mdev, id, qlink_id);
> +		break;
> +	case UNLINK_APQI:
> +		for_each_set_bit_inv(id, matrix_mdev->matrix.apm,
> +				     matrix_mdev->matrix.apm_max + 1)
> +			vfio_ap_mdev_unlink_queue(id, qlink_id);
> +		break;
> +	default:
> +		WARN_ON_ONCE(1);
> +	}
> +}
> +
>  /**
>   * assign_adapter_store
>   *
> @@ -617,6 +707,7 @@ static ssize_t assign_adapter_store(struct device *dev,
>  	if (ret)
>  		goto share_err;
>  
> +	vfio_ap_mdev_link_queues(matrix_mdev, LINK_APID, apid);
>  	ret = count;
>  	goto done;
>  
> @@ -668,6 +759,7 @@ static ssize_t unassign_adapter_store(struct device *dev,
>  
>  	mutex_lock(&matrix_dev->lock);
>  	clear_bit_inv((unsigned long)apid, matrix_mdev->matrix.apm);
> +	vfio_ap_mdev_link_queues(matrix_mdev, UNLINK_APID, apid);
>  	mutex_unlock(&matrix_dev->lock);
>  
>  	return count;
> @@ -758,6 +850,7 @@ static ssize_t assign_domain_store(struct device *dev,
>  	if (ret)
>  		goto share_err;
>  
> +	vfio_ap_mdev_link_queues(matrix_mdev, LINK_APQI, apqi);
>  	ret = count;
>  	goto done;
>  
> @@ -810,6 +903,7 @@ static ssize_t unassign_domain_store(struct device *dev,
>  
>  	mutex_lock(&matrix_dev->lock);
>  	clear_bit_inv((unsigned long)apqi, matrix_mdev->matrix.aqm);
> +	vfio_ap_mdev_link_queues(matrix_mdev, UNLINK_APQI, apqi);
>  	mutex_unlock(&matrix_dev->lock);
>  
>  	return count;
> @@ -1282,6 +1376,29 @@ void vfio_ap_mdev_unregister(void)
>  	mdev_unregister_device(&matrix_dev->device);
>  }
>  
> +/**
> + * vfio_ap_queue_link_mdev
> + *
> + * @q: The queue to link with the matrix mdev.
> + *
> + * Links @q with the matrix mdev to which the queue's APQN is assigned.
> + */
> +static void vfio_ap_queue_link_mdev(struct vfio_ap_queue *q)
> +{
> +	unsigned long apid = AP_QID_CARD(q->apqn);
> +	unsigned long apqi = AP_QID_QUEUE(q->apqn);
> +	struct ap_matrix_mdev *matrix_mdev;
> +
> +	list_for_each_entry(matrix_mdev, &matrix_dev->mdev_list, node) {
> +		if (test_bit_inv(apid, matrix_mdev->matrix.apm) &&
> +		    test_bit_inv(apqi, matrix_mdev->matrix.aqm)) {
> +			q->matrix_mdev = matrix_mdev;
> +			hash_add(matrix_mdev->qtable, &q->mdev_qnode, q->apqn);
> +			break;
> +		}
> +	}
> +}
> +
>  int vfio_ap_mdev_probe_queue(struct ap_queue *queue)
>  {
>  	struct vfio_ap_queue *q;
> @@ -1290,9 +1407,12 @@ int vfio_ap_mdev_probe_queue(struct ap_queue *queue)
>  	if (!q)
>  		return -ENOMEM;
>  
> +	mutex_lock(&matrix_dev->lock);
>  	dev_set_drvdata(&queue->ap_dev.device, q);
>  	q->apqn = queue->qid;
>  	q->saved_isc = VFIO_AP_ISC_INVALID;
> +	vfio_ap_queue_link_mdev(q);
> +	mutex_unlock(&matrix_dev->lock);
>  
>  	return 0;
>  }
> @@ -1309,6 +1429,8 @@ void vfio_ap_mdev_remove_queue(struct ap_queue *queue)
>  	apqi = AP_QID_QUEUE(q->apqn);
>  	vfio_ap_mdev_reset_queue(apid, apqi, 1);
>  	vfio_ap_irq_disable(q);
> +	if (q->matrix_mdev)
> +		hash_del(&q->mdev_qnode);
>  	kfree(q);
>  	mutex_unlock(&matrix_dev->lock);
>  }
> diff --git a/drivers/s390/crypto/vfio_ap_private.h b/drivers/s390/crypto/vfio_ap_private.h
> index a2aa05bec718..57da703b549a 100644
> --- a/drivers/s390/crypto/vfio_ap_private.h
> +++ b/drivers/s390/crypto/vfio_ap_private.h
> @@ -87,6 +87,7 @@ struct ap_matrix_mdev {
>  	struct kvm *kvm;
>  	struct kvm_s390_module_hook pqap_hook;
>  	struct mdev_device *mdev;
> +	DECLARE_HASHTABLE(qtable, 8);
>  };
>  
>  extern int vfio_ap_mdev_register(void);
> @@ -98,6 +99,7 @@ struct vfio_ap_queue {
>  	int	apqn;
>  #define VFIO_AP_ISC_INVALID 0xff
>  	unsigned char saved_isc;
> +	struct hlist_node mdev_qnode;
>  };
>  
>  int vfio_ap_mdev_probe_queue(struct ap_queue *queue);


^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [PATCH v10 04/16] s390/zcrypt: driver callback to indicate resource in use
  2020-08-21 19:56 ` [PATCH v10 04/16] s390/zcrypt: driver callback to indicate resource in use Tony Krowiak
  2020-09-14 15:29   ` Cornelia Huck
@ 2020-09-25  9:24   ` Halil Pasic
  2020-09-29 13:59     ` Tony Krowiak
  1 sibling, 1 reply; 79+ messages in thread
From: Halil Pasic @ 2020-09-25  9:24 UTC (permalink / raw)
  To: Tony Krowiak
  Cc: linux-s390, linux-kernel, kvm, freude, borntraeger, cohuck,
	mjrosato, alex.williamson, kwankhede, fiuczy, frankja, david,
	imbrenda, hca, gor

On Fri, 21 Aug 2020 15:56:04 -0400
Tony Krowiak <akrowiak@linux.ibm.com> wrote:

> Introduces a new driver callback to prevent a root user from unbinding
> an AP queue from its device driver if the queue is in use. The intent of
> this callback is to provide a driver with the means to prevent a root user
> from inadvertently taking a queue away from a matrix mdev and giving it to
> the host while it is assigned to the matrix mdev. The callback will
> be invoked whenever a change to the AP bus's sysfs apmask or aqmask
> attributes would result in one or more AP queues being removed from its
> driver. If the callback responds in the affirmative for any driver
> queried, the change to the apmask or aqmask will be rejected with a device
> in use error.
> 
> For this patch, only non-default drivers will be queried. Currently,
> there is only one non-default driver, the vfio_ap device driver. The
> vfio_ap device driver facilitates pass-through of an AP queue to a
> guest. The idea here is that a guest may be administered by a different
> sysadmin than the host and we don't want AP resources to unexpectedly
> disappear from a guest's AP configuration (i.e., adapters, domains and
> control domains assigned to the matrix mdev). This will enforce the proper
> procedure for removing AP resources intended for guest usage which is to
> first unassign them from the matrix mdev, then unbind them from the
> vfio_ap device driver.
> 
> Signed-off-by: Tony Krowiak <akrowiak@linux.ibm.com>
> Reported-by: kernel test robot <lkp@intel.com>
> ---
>  drivers/s390/crypto/ap_bus.c | 148 ++++++++++++++++++++++++++++++++---
>  drivers/s390/crypto/ap_bus.h |   4 +
>  2 files changed, 142 insertions(+), 10 deletions(-)
> 
> diff --git a/drivers/s390/crypto/ap_bus.c b/drivers/s390/crypto/ap_bus.c
> index 24a1940b829e..db27bd931308 100644
> --- a/drivers/s390/crypto/ap_bus.c
> +++ b/drivers/s390/crypto/ap_bus.c
> @@ -35,6 +35,7 @@
>  #include <linux/mod_devicetable.h>
>  #include <linux/debugfs.h>
>  #include <linux/ctype.h>
> +#include <linux/module.h>
>  
>  #include "ap_bus.h"
>  #include "ap_debug.h"
> @@ -889,6 +890,23 @@ static int modify_bitmap(const char *str, unsigned long *bitmap, int bits)
>  	return 0;
>  }
>  
> +static int ap_parse_bitmap_str(const char *str, unsigned long *bitmap, int bits,
> +			       unsigned long *newmap)
> +{
> +	unsigned long size;
> +	int rc;
> +
> +	size = BITS_TO_LONGS(bits)*sizeof(unsigned long);
> +	if (*str == '+' || *str == '-') {
> +		memcpy(newmap, bitmap, size);
> +		rc = modify_bitmap(str, newmap, bits);
> +	} else {
> +		memset(newmap, 0, size);
> +		rc = hex2bitmap(str, newmap, bits);
> +	}
> +	return rc;
> +}
> +
>  int ap_parse_mask_str(const char *str,
>  		      unsigned long *bitmap, int bits,
>  		      struct mutex *lock)
> @@ -908,14 +926,7 @@ int ap_parse_mask_str(const char *str,
>  		kfree(newmap);
>  		return -ERESTARTSYS;
>  	}
> -
> -	if (*str == '+' || *str == '-') {
> -		memcpy(newmap, bitmap, size);
> -		rc = modify_bitmap(str, newmap, bits);
> -	} else {
> -		memset(newmap, 0, size);
> -		rc = hex2bitmap(str, newmap, bits);
> -	}
> +	rc = ap_parse_bitmap_str(str, bitmap, bits, newmap);
>  	if (rc == 0)
>  		memcpy(bitmap, newmap, size);
>  	mutex_unlock(lock);
> @@ -1107,12 +1118,70 @@ static ssize_t apmask_show(struct bus_type *bus, char *buf)
>  	return rc;
>  }
>  
> +static int __verify_card_reservations(struct device_driver *drv, void *data)
> +{
> +	int rc = 0;
> +	struct ap_driver *ap_drv = to_ap_drv(drv);
> +	unsigned long *newapm = (unsigned long *)data;
> +
> +	/*
> +	 * No need to verify whether the driver is using the queues if it is the
> +	 * default driver.
> +	 */
> +	if (ap_drv->flags & AP_DRIVER_FLAG_DEFAULT)
> +		return 0;
> +
> +	/* The non-default driver's module must be loaded */
> +	if (!try_module_get(drv->owner))
> +		return 0;
> +
> +	if (ap_drv->in_use)
> +		if (ap_drv->in_use(newapm, ap_perms.aqm))
> +			rc = -EADDRINUSE;
> +
> +	module_put(drv->owner);
> +
> +	return rc;
> +}
> +
> +static int apmask_commit(unsigned long *newapm)
> +{
> +	int rc;
> +	unsigned long reserved[BITS_TO_LONGS(AP_DEVICES)];
> +
> +	/*
> +	 * Check if any bits in the apmask have been set which will
> +	 * result in queues being removed from non-default drivers
> +	 */
> +	if (bitmap_andnot(reserved, newapm, ap_perms.apm, AP_DEVICES)) {
> +		rc = bus_for_each_drv(&ap_bus_type, NULL, reserved,
> +				      __verify_card_reservations);
> +		if (rc)
> +			return rc;
> +	}

I understand the above asks all the non-default drivers whether some of
the queues are 'used'. But AFAIU what ap_drv->in_use() reports is only
true for a given moment...

> +
> +	memcpy(ap_perms.apm, newapm, APMASKSIZE);

... So I fail to understand what will prevent us from performing a
successful commit if some of the resources become 'used' between
the call to the in_use() callback and the memcpy.

Of course I might be wrong.

BTW I was never a fan of this mechanism, so I don't mind if it
does not work perfectly, and this should catch most of the cases. Just
want to make sure we don't introduce more confusion than necessary.
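
To make the window concrete: in the quoted patch the check runs under
ap_perms_mutex, while vfio_ap_mdev_resource_in_use() takes and drops
matrix_dev->lock on its own. A minimal user-space sketch of the
alternative, assuming a single lock that covers both the 'used' state and
the mask, could look like the following. All names here (perms_model,
model_claim, model_commit_mask) are hypothetical and not part of the
series; this only models the idea and is not a proposed implementation.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdint.h>

/*
 * Hypothetical user-space model, not kernel code. One bit per adapter:
 * host_mask bit set  -> adapter belongs to the default (host) drivers
 * guest_used bit set -> adapter is currently used by a guest
 */
struct perms_model {
	pthread_mutex_t lock;	/* covers both masks */
	uint64_t host_mask;
	uint64_t guest_used;
};

/* Guest side: start using an adapter unless the host owns it. */
int model_claim(struct perms_model *p, unsigned int adapter)
{
	uint64_t bit;
	int rc = 0;

	assert(adapter < 64);
	bit = UINT64_C(1) << adapter;

	pthread_mutex_lock(&p->lock);
	if (p->host_mask & bit)
		rc = -EBUSY;
	else
		p->guest_used |= bit;
	pthread_mutex_unlock(&p->lock);

	return rc;
}

/* Host side: check and commit the new mask in one critical section. */
int model_commit_mask(struct perms_model *p, uint64_t new_mask)
{
	uint64_t moved_to_host;
	int rc = 0;

	pthread_mutex_lock(&p->lock);
	/* Bits newly set in the mask are adapters taken over by the host. */
	moved_to_host = new_mask & ~p->host_mask;
	if (moved_to_host & p->guest_used)
		rc = -EADDRINUSE;
	else
		p->host_mask = new_mask;
	pthread_mutex_unlock(&p->lock);

	return rc;
}
```

Because the claim and the commit serialize on the same lock in this
model, a claim can not slip in between the in-use check and the mask
update.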

> +
> +	return 0;
> +}
> +
>  static ssize_t apmask_store(struct bus_type *bus, const char *buf,
>  			    size_t count)
>  {
>  	int rc;
> +	DECLARE_BITMAP(newapm, AP_DEVICES);
> +
> +	if (mutex_lock_interruptible(&ap_perms_mutex))
> +		return -ERESTARTSYS;
> +
> +	rc = ap_parse_bitmap_str(buf, ap_perms.apm, AP_DEVICES, newapm);
> +	if (rc)
> +		goto done;
>  
> -	rc = ap_parse_mask_str(buf, ap_perms.apm, AP_DEVICES, &ap_perms_mutex);
> +	rc = apmask_commit(newapm);
> +
> +done:
> +	mutex_unlock(&ap_perms_mutex);
>  	if (rc)
>  		return rc;
>  
> @@ -1138,12 +1207,71 @@ static ssize_t aqmask_show(struct bus_type *bus, char *buf)
>  	return rc;
>  }
>  
> +static int __verify_queue_reservations(struct device_driver *drv, void *data)
> +{
> +	int rc = 0;
> +	struct ap_driver *ap_drv = to_ap_drv(drv);
> +	unsigned long *newaqm = (unsigned long *)data;
> +
> +	/*
> +	 * If the reserved bits do not identify queues reserved for use by the
> +	 * non-default driver, there is no need to verify the driver is using
> +	 * the queues.
> +	 */
> +	if (ap_drv->flags & AP_DRIVER_FLAG_DEFAULT)
> +		return 0;
> +
> +	/* The non-default driver's module must be loaded */
> +	if (!try_module_get(drv->owner))
> +		return 0;
> +
> +	if (ap_drv->in_use)
> +		if (ap_drv->in_use(ap_perms.apm, newaqm))
> +			rc = -EADDRINUSE;
> +
> +	module_put(drv->owner);
> +
> +	return rc;
> +}
> +
> +static int aqmask_commit(unsigned long *newaqm)
> +{
> +	int rc;
> +	unsigned long reserved[BITS_TO_LONGS(AP_DOMAINS)];
> +
> +	/*
> +	 * Check if any bits in the aqmask have been set which will
> +	 * result in queues being removed from non-default drivers
> +	 */
> +	if (bitmap_andnot(reserved, newaqm, ap_perms.aqm, AP_DOMAINS)) {
> +		rc = bus_for_each_drv(&ap_bus_type, NULL, reserved,
> +				      __verify_queue_reservations);
> +		if (rc)
> +			return rc;
> +	}
> +
> +	memcpy(ap_perms.aqm, newaqm, AQMASKSIZE);
> +

Same here.

Regards,
Halil

> +	return 0;
> +}
> +
>  static ssize_t aqmask_store(struct bus_type *bus, const char *buf,
>  			    size_t count)
>  {
>  	int rc;
> +	DECLARE_BITMAP(newaqm, AP_DOMAINS);
>  
> -	rc = ap_parse_mask_str(buf, ap_perms.aqm, AP_DOMAINS, &ap_perms_mutex);
> +	if (mutex_lock_interruptible(&ap_perms_mutex))
> +		return -ERESTARTSYS;
> +
> +	rc = ap_parse_bitmap_str(buf, ap_perms.aqm, AP_DOMAINS, newaqm);
> +	if (rc)
> +		goto done;
> +
> +	rc = aqmask_commit(newaqm);
> +
> +done:
> +	mutex_unlock(&ap_perms_mutex);
>  	if (rc)
>  		return rc;
>  
> diff --git a/drivers/s390/crypto/ap_bus.h b/drivers/s390/crypto/ap_bus.h
> index 1ea046324e8f..48c57b3d53a0 100644
> --- a/drivers/s390/crypto/ap_bus.h
> +++ b/drivers/s390/crypto/ap_bus.h
> @@ -136,6 +136,7 @@ struct ap_driver {
>  
>  	int (*probe)(struct ap_device *);
>  	void (*remove)(struct ap_device *);
> +	bool (*in_use)(unsigned long *apm, unsigned long *aqm);
>  };
>  
>  #define to_ap_drv(x) container_of((x), struct ap_driver, driver)
> @@ -255,6 +256,9 @@ void ap_queue_init_state(struct ap_queue *aq);
>  struct ap_card *ap_card_create(int id, int queue_depth, int raw_device_type,
>  			       int comp_device_type, unsigned int functions);
>  
> +#define APMASKSIZE (BITS_TO_LONGS(AP_DEVICES) * sizeof(unsigned long))
> +#define AQMASKSIZE (BITS_TO_LONGS(AP_DOMAINS) * sizeof(unsigned long))
> +
>  struct ap_perms {
>  	unsigned long ioctlm[BITS_TO_LONGS(AP_IOCTLS)];
>  	unsigned long apm[BITS_TO_LONGS(AP_DEVICES)];


^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [PATCH v10 05/16] s390/vfio-ap: implement in-use callback for vfio_ap driver
  2020-08-21 19:56 ` [PATCH v10 05/16] s390/vfio-ap: implement in-use callback for vfio_ap driver Tony Krowiak
  2020-09-14 15:31   ` Cornelia Huck
@ 2020-09-25  9:29   ` Halil Pasic
  2020-09-29 14:00     ` Tony Krowiak
  1 sibling, 1 reply; 79+ messages in thread
From: Halil Pasic @ 2020-09-25  9:29 UTC (permalink / raw)
  To: Tony Krowiak
  Cc: linux-s390, linux-kernel, kvm, freude, borntraeger, cohuck,
	mjrosato, alex.williamson, kwankhede, fiuczy, frankja, david,
	imbrenda, hca, gor

On Fri, 21 Aug 2020 15:56:05 -0400
Tony Krowiak <akrowiak@linux.ibm.com> wrote:

> +
> +bool vfio_ap_mdev_resource_in_use(unsigned long *apm, unsigned long *aqm)
> +{
> +	bool in_use;
> +
> +	mutex_lock(&matrix_dev->lock);
> +	in_use = !!vfio_ap_mdev_verify_no_sharing(NULL, apm, aqm);
> +	mutex_unlock(&matrix_dev->lock);

See also my comment for patch 4. AFAIU as soon as you release the lock,
in_use may become outdated at any moment.

> +
> +	return in_use;
> +}

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [PATCH v10 06/16] s390/vfio-ap: introduce shadow APCB
  2020-08-21 19:56 ` [PATCH v10 06/16] s390/vfio-ap: introduce shadow APCB Tony Krowiak
  2020-09-17 14:22   ` Cornelia Huck
@ 2020-09-26  1:38   ` Halil Pasic
  2020-09-29 16:04     ` Tony Krowiak
  1 sibling, 1 reply; 79+ messages in thread
From: Halil Pasic @ 2020-09-26  1:38 UTC (permalink / raw)
  To: Tony Krowiak
  Cc: linux-s390, linux-kernel, kvm, freude, borntraeger, cohuck,
	mjrosato, alex.williamson, kwankhede, fiuczy, frankja, david,
	imbrenda, hca, gor

On Fri, 21 Aug 2020 15:56:06 -0400
Tony Krowiak <akrowiak@linux.ibm.com> wrote:

> The APCB is a field within the CRYCB that provides the AP configuration
> to a KVM guest. Let's introduce a shadow copy of the KVM guest's APCB and
> maintain it for the lifespan of the guest.
> 

AFAIU this is supposed to be a no-change-in-behavior patch that lays the
groundwork.

> Signed-off-by: Tony Krowiak <akrowiak@linux.ibm.com>
> ---
>  drivers/s390/crypto/vfio_ap_ops.c     | 32 ++++++++++++++++++++++-----
>  drivers/s390/crypto/vfio_ap_private.h |  2 ++
>  2 files changed, 29 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c
> index fc1aa6f947eb..efb229033f9e 100644
> --- a/drivers/s390/crypto/vfio_ap_ops.c
> +++ b/drivers/s390/crypto/vfio_ap_ops.c
> @@ -305,14 +305,35 @@ static int handle_pqap(struct kvm_vcpu *vcpu)
>  	return 0;
>  }
>  
> +static void vfio_ap_matrix_clear_masks(struct ap_matrix *matrix)
> +{
> +	bitmap_clear(matrix->apm, 0, AP_DEVICES);
> +	bitmap_clear(matrix->aqm, 0, AP_DOMAINS);
> +	bitmap_clear(matrix->adm, 0, AP_DOMAINS);
> +}
> +
>  static void vfio_ap_matrix_init(struct ap_config_info *info,
>  				struct ap_matrix *matrix)
>  {
> +	vfio_ap_matrix_clear_masks(matrix);

I don't quite understand the idea behind this. The only place
vfio_ap_matrix_init() is used, is in create right after the whole
matrix_mdev got allocated with kzalloc.

>  	matrix->apm_max = info->apxa ? info->Na : 63;
>  	matrix->aqm_max = info->apxa ? info->Nd : 15;
>  	matrix->adm_max = info->apxa ? info->Nd : 15;
>  }
>  
> +static bool vfio_ap_mdev_has_crycb(struct ap_matrix_mdev *matrix_mdev)
> +{
> +	return (matrix_mdev->kvm && matrix_mdev->kvm->arch.crypto.crycbd);
> +}
> +
> +static void vfio_ap_mdev_commit_crycb(struct ap_matrix_mdev *matrix_mdev)
> +{
> +	kvm_arch_crypto_set_masks(matrix_mdev->kvm,
> +				  matrix_mdev->shadow_apcb.apm,
> +				  matrix_mdev->shadow_apcb.aqm,
> +				  matrix_mdev->shadow_apcb.adm);
> +}
> +
>  static int vfio_ap_mdev_create(struct kobject *kobj, struct mdev_device *mdev)
>  {
>  	struct ap_matrix_mdev *matrix_mdev;
> @@ -1202,13 +1223,12 @@ static int vfio_ap_mdev_group_notifier(struct notifier_block *nb,
>  	if (ret)
>  		return NOTIFY_DONE;
>  
> -	/* If there is no CRYCB pointer, then we can't copy the masks */
> -	if (!matrix_mdev->kvm->arch.crypto.crycbd)
> +	if (!vfio_ap_mdev_has_crycb(matrix_mdev))
>  		return NOTIFY_DONE;
>  
> -	kvm_arch_crypto_set_masks(matrix_mdev->kvm, matrix_mdev->matrix.apm,
> -				  matrix_mdev->matrix.aqm,
> -				  matrix_mdev->matrix.adm);
> +	memcpy(&matrix_mdev->shadow_apcb, &matrix_mdev->matrix,
> +	       sizeof(matrix_mdev->shadow_apcb));

A note on the thread safety of the access to matrix_mdev->matrix. I
guess the idea is that this is still safe because we did
vfio_ap_mdev_set_kvm(), and that is supposed to inhibit changes to the
matrix.

There are two things that bother me with this:
1) the assign operations don't check matrix_mdev->kvm under the lock
2) with dynamic configuration, this is supposed to change (so I have to
be careful about it when reviewing the following patches; a sneak peek
at the end result makes me worried).

> +	vfio_ap_mdev_commit_crycb(matrix_mdev);
>  
>  	return NOTIFY_OK;
>  }
> @@ -1323,6 +1343,8 @@ static void vfio_ap_mdev_release(struct mdev_device *mdev)
>  		kvm_put_kvm(matrix_mdev->kvm);
>  		matrix_mdev->kvm = NULL;
>  	}
> +
> +	vfio_ap_matrix_clear_masks(&matrix_mdev->shadow_apcb);

What is the idea behind this? From the above, it looks like we are going
to overwrite matrix_mdev->shadow_apcb with matrix_mdev->matrix before
the next commit anyway.

I suppose this is probably about 'no guest implies no resources passed
through at the moment'. If that is the case, maybe we can document it
below.

>  	mutex_unlock(&matrix_dev->lock);
>  
>  	vfio_unregister_notifier(mdev_dev(mdev), VFIO_IOMMU_NOTIFY,
> diff --git a/drivers/s390/crypto/vfio_ap_private.h b/drivers/s390/crypto/vfio_ap_private.h
> index 0c796ef11426..055bce6d45db 100644
> --- a/drivers/s390/crypto/vfio_ap_private.h
> +++ b/drivers/s390/crypto/vfio_ap_private.h
> @@ -75,6 +75,7 @@ struct ap_matrix {
>   * @list:	allows the ap_matrix_mdev struct to be added to a list
>   * @matrix:	the adapters, usage domains and control domains assigned to the
>   *		mediated matrix device.
> + * @shadow_apcb:    the shadow copy of the APCB field of the KVM guest's CRYCB
>   * @group_notifier: notifier block used for specifying callback function for
>   *		    handling the VFIO_GROUP_NOTIFY_SET_KVM event
>   * @kvm:	the struct holding guest's state
> @@ -82,6 +83,7 @@ struct ap_matrix {
>  struct ap_matrix_mdev {
>  	struct list_head node;
>  	struct ap_matrix matrix;
> +	struct ap_matrix shadow_apcb;
>  	struct notifier_block group_notifier;
>  	struct notifier_block iommu_notifier;
>  	struct kvm *kvm;


^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [PATCH v10 07/16] s390/vfio-ap: sysfs attribute to display the guest's matrix
  2020-09-18 17:09     ` Tony Krowiak
@ 2020-09-26  7:16       ` Halil Pasic
  2020-09-29 21:00         ` Tony Krowiak
  0 siblings, 1 reply; 79+ messages in thread
From: Halil Pasic @ 2020-09-26  7:16 UTC (permalink / raw)
  To: Tony Krowiak
  Cc: Cornelia Huck, linux-s390, linux-kernel, kvm, freude,
	borntraeger, mjrosato, alex.williamson, kwankhede, fiuczy,
	frankja, david, imbrenda, hca, gor

On Fri, 18 Sep 2020 13:09:25 -0400
Tony Krowiak <akrowiak@linux.ibm.com> wrote:

> 
> 
> On 9/17/20 10:34 AM, Cornelia Huck wrote:
> > On Fri, 21 Aug 2020 15:56:07 -0400
> > Tony Krowiak <akrowiak@linux.ibm.com> wrote:
> >
> >> The matrix of adapters and domains configured in a guest's CRYCB may
> >> differ from the matrix of adapters and domains assigned to the matrix mdev,
> >> so this patch introduces a sysfs attribute to display the matrix of a guest
> >> using the matrix mdev. For a matrix mdev denoted by $uuid, the crycb for a
> >> guest using the matrix mdev can be displayed as follows:
> >>
> >>     cat /sys/devices/vfio_ap/matrix/$uuid/guest_matrix
> >>
> >> If a guest is not using the matrix mdev at the time the crycb is displayed,
> >> an error (ENODEV) will be returned.
> >>
> >> Signed-off-by: Tony Krowiak <akrowiak@linux.ibm.com>
> >> ---
> >>   drivers/s390/crypto/vfio_ap_ops.c | 58 +++++++++++++++++++++++++++++++
> >>   1 file changed, 58 insertions(+)
> >>
> >> diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c
> >> index efb229033f9e..30bf23734af6 100644
> >> --- a/drivers/s390/crypto/vfio_ap_ops.c
> >> +++ b/drivers/s390/crypto/vfio_ap_ops.c
> >> @@ -1119,6 +1119,63 @@ static ssize_t matrix_show(struct device *dev, struct device_attribute *attr,
> >>   }
> >>   static DEVICE_ATTR_RO(matrix);
> >>   
> >> +static ssize_t guest_matrix_show(struct device *dev,
> >> +				 struct device_attribute *attr, char *buf)
> >> +{
> >> +	struct mdev_device *mdev = mdev_from_dev(dev);
> >> +	struct ap_matrix_mdev *matrix_mdev = mdev_get_drvdata(mdev);
> >> +	char *bufpos = buf;
> >> +	unsigned long apid;
> >> +	unsigned long apqi;
> >> +	unsigned long apid1;
> >> +	unsigned long apqi1;
> >> +	unsigned long napm_bits = matrix_mdev->shadow_apcb.apm_max + 1;
> >> +	unsigned long naqm_bits = matrix_mdev->shadow_apcb.aqm_max + 1;
> >> +	int nchars = 0;
> >> +	int n;
> >> +
> >> +	if (!vfio_ap_mdev_has_crycb(matrix_mdev))
> >> +		return -ENODEV;
> >> +
> >> +	apid1 = find_first_bit_inv(matrix_mdev->shadow_apcb.apm, napm_bits);
> >> +	apqi1 = find_first_bit_inv(matrix_mdev->shadow_apcb.aqm, naqm_bits);
> >> +
> >> +	mutex_lock(&matrix_dev->lock);
> >> +
> >> +	if ((apid1 < napm_bits) && (apqi1 < naqm_bits)) {
> >> +		for_each_set_bit_inv(apid, matrix_mdev->shadow_apcb.apm,
> >> +				     napm_bits) {
> >> +			for_each_set_bit_inv(apqi,
> >> +					     matrix_mdev->shadow_apcb.aqm,
> >> +					     naqm_bits) {
> >> +				n = sprintf(bufpos, "%02lx.%04lx\n", apid,
> >> +					    apqi);
> >> +				bufpos += n;
> >> +				nchars += n;
> >> +			}
> >> +		}
> >> +	} else if (apid1 < napm_bits) {
> >> +		for_each_set_bit_inv(apid, matrix_mdev->shadow_apcb.apm,
> >> +				     napm_bits) {
> >> +			n = sprintf(bufpos, "%02lx.\n", apid);
> >> +			bufpos += n;
> >> +			nchars += n;
> >> +		}
> >> +	} else if (apqi1 < naqm_bits) {
> >> +		for_each_set_bit_inv(apqi, matrix_mdev->shadow_apcb.aqm,
> >> +				     naqm_bits) {
> >> +			n = sprintf(bufpos, ".%04lx\n", apqi);
> >> +			bufpos += n;
> >> +			nchars += n;
> >> +		}
> >> +	}
> >> +
> >> +	mutex_unlock(&matrix_dev->lock);
> >> +
> >> +	return nchars;
> >> +}
> > This basically looks like a version of matrix_show() operating on the
> > shadow apcb. I'm wondering if we could consolidate these two functions
> > by passing in the structure to operate on as a parameter? Might not be
> > worth the effort, though.
> 
> We still need the two functions because they back the mdev's
> sysfs matrix and guest_matrix attributes, but we could call a function.
> I'm not sure it buys us much though.

The logic seems identical with the exception that the guest variant
checks if vfio_ap_mdev_has_crycb(matrix_mdev). I'm not a big fan of
duplicated code, and especially not in such close proximity. I'm voting
for factoring out the common logic.
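
A rough user-space sketch of what factoring out could look like, with the
matrix to print passed in as a parameter. The types and names
(matrix_model, mdev_model, matrix_model_format and the two wrappers) are
hypothetical stand-ins, plain bool arrays replace the kernel bitmaps, and
locking is left out; the output formats are the ones from the quoted
guest_matrix_show().

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

#define MODEL_MAX_BITS 256

/* Hypothetical stand-in for struct ap_matrix. */
struct matrix_model {
	bool apm[MODEL_MAX_BITS];
	bool aqm[MODEL_MAX_BITS];
	size_t napm_bits;
	size_t naqm_bits;
};

/* Hypothetical stand-in for struct ap_matrix_mdev. */
struct mdev_model {
	struct matrix_model matrix;
	struct matrix_model shadow_apcb;
	bool has_crycb;
};

static bool any_set(const bool *bits, size_t nbits)
{
	size_t i;

	for (i = 0; i < nbits; i++)
		if (bits[i])
			return true;
	return false;
}

/* Common logic: returns the number of characters written or -ENOSPC. */
int matrix_model_format(const struct matrix_model *m, char *buf, size_t len)
{
	bool have_apid, have_apqi;
	size_t apid, apqi, pos = 0;
	int n;

	assert(m->napm_bits <= MODEL_MAX_BITS);
	assert(m->naqm_bits <= MODEL_MAX_BITS);
	if (len == 0)
		return -ENOSPC;
	buf[0] = '\0';
	have_apid = any_set(m->apm, m->napm_bits);
	have_apqi = any_set(m->aqm, m->naqm_bits);

	/* With no APIDs (or no APQIs) run the missing dimension once. */
	for (apid = 0; apid < (have_apid ? m->napm_bits : 1); apid++) {
		if (have_apid && !m->apm[apid])
			continue;
		for (apqi = 0; apqi < (have_apqi ? m->naqm_bits : 1); apqi++) {
			if (have_apqi && !m->aqm[apqi])
				continue;
			if (have_apid && have_apqi)
				n = snprintf(buf + pos, len - pos,
					     "%02zx.%04zx\n", apid, apqi);
			else if (have_apid)
				n = snprintf(buf + pos, len - pos,
					     "%02zx.\n", apid);
			else if (have_apqi)
				n = snprintf(buf + pos, len - pos,
					     ".%04zx\n", apqi);
			else
				return 0;	/* nothing assigned at all */
			if (n < 0 || (size_t)n >= len - pos)
				return -ENOSPC;
			pos += (size_t)n;
		}
	}
	return (int)pos;
}

/* The two attribute handlers shrink to thin wrappers. */
int model_matrix_show(const struct mdev_model *mdev, char *buf, size_t len)
{
	return matrix_model_format(&mdev->matrix, buf, len);
}

int model_guest_matrix_show(const struct mdev_model *mdev, char *buf,
			    size_t len)
{
	if (!mdev->has_crycb)
		return -ENODEV;
	return matrix_model_format(&mdev->shadow_apcb, buf, len);
}
```

The only difference left between the two wrappers is the crycb check,
which matches the observation above.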

Otherwise looks OK.

Regards,
Halil


^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [PATCH v10 08/16] s390/vfio-ap: filter matrix for unavailable queue devices
  2020-08-21 19:56 ` [PATCH v10 08/16] s390/vfio-ap: filter matrix for unavailable queue devices Tony Krowiak
@ 2020-09-26  8:24   ` Halil Pasic
  2020-09-29 21:59     ` Tony Krowiak
  0 siblings, 1 reply; 79+ messages in thread
From: Halil Pasic @ 2020-09-26  8:24 UTC (permalink / raw)
  To: Tony Krowiak
  Cc: linux-s390, linux-kernel, kvm, freude, borntraeger, cohuck,
	mjrosato, alex.williamson, kwankhede, fiuczy, frankja, david,
	imbrenda, hca, gor

On Fri, 21 Aug 2020 15:56:08 -0400
Tony Krowiak <akrowiak@linux.ibm.com> wrote:

> Even though APQNs for queues that are not in the host's AP configuration
> may be assigned to a matrix mdev, we do not want to set bits in the guest's
> APCB for APQNs that do not reference AP queue devices bound to the vfio_ap
> device driver. Ideally, it would be great if such APQNs could be filtered
> out before setting the bits in the guest's APCB; however, the architecture
> precludes filtering individual APQNs. Consequently, either the APID or APQI
> must be filtered.
> 
> This patch introduces code to filter the APIDs or APQIs assigned to the
> matrix mdev's AP configuration before assigning them to the guest's AP
> configuration (i.e., APCB). We'll start by filtering the APIDs:
> 
>    If an APQN assigned to the matrix mdev's AP configuration does not
>    reference a queue device bound to the vfio_ap device driver, the APID
>    will be filtered out (i.e., not assigned to the guest's APCB).
> 
> If every APID assigned to the matrix mdev is filtered out, then we'll try
> filtering the APQI's:
> 
>    If an APQN assigned to the matrix mdev's AP configuration does not
>    reference a queue device bound to the vfio_ap device driver, the APQI
>    will be filtered out (i.e., not assigned to the guest's APCB).
> 
> In any case, if after filtering either the APIDs or APQIs there are any
> APQNs that can be assigned to the guest's APCB, they will be assigned and
> the CRYCB will be hot plugged into the guest.
> 
> Example
> =======
> 
> APQNs bound to vfio_ap device driver:
>    04.0004
>    04.0047
>    04.0054
> 
>    05.0005
>    05.0047
>    05.0054
> 
> Assignments to matrix mdev:
>    APIDs  APQIs  -> APQNs
>    04     0004      04.0004
>    05     0005      04.0005
>           0047      04.0047
>           0054      04.0054
>                     05.0004
>                     05.0005
>                     05.0047
>                     05.0054
> 
> Filter APIDs:
>    APID 04 will be filtered because APQN 04.0005 is not bound.
>    APID 05 will be filtered because APQN 05.0004 is not bound.
>    APQNs remaining: None
> 
> Filter APQIs:
>    APQI 04 will be filtered because APQN 05.0004 is not bound.
>    APQI 05 will be filtered because APQN 04.0005 is not bound.
>    APQNs remaining: 04.0047, 04.0054, 05.0047, 05.0054
> 
> APQNs 04.0047, 04.0054, 05.0047, 05.0054 will be assigned to the CRYCB and
> hot plugged into the KVM guest.
> 

I find this logic where we first do one strategy, and if nothing remains
do the other strategy a little confusing. I will ramble on about it some
more in the code.
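
For reference, the two-pass strategy from the description can be modelled
in user space roughly as below. This is only a sketch: ap_model,
model_filter, model_config and example_is_bound are hypothetical names,
bool arrays stand in for the kernel bitmaps, and example_is_bound simply
hard-codes the bound APQNs from the example in the commit message. Unlike
the quoted code, the model keeps scanning when filtering APQIs so that
every unbound APQI of an adapter is dropped.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_BITS 256

/* Hypothetical stand-in for the apm/aqm part of struct ap_matrix. */
struct ap_model {
	bool apm[MODEL_BITS];
	bool aqm[MODEL_BITS];
};

/* Hypothetical predicate: is the queue bound to the vfio_ap driver? */
typedef bool (*bound_fn)(size_t apid, size_t apqi);

static size_t weight(const bool *bits)
{
	size_t i, n = 0;

	for (i = 0; i < MODEL_BITS; i++)
		if (bits[i])
			n++;
	return n;
}

/* One filtering pass; returns the number of APQNs left in @out. */
size_t model_filter(const struct ap_model *assigned, bound_fn is_bound,
		    bool filter_apids, struct ap_model *out)
{
	size_t apid, apqi;

	assert(assigned != out);
	*out = *assigned;

	for (apid = 0; apid < MODEL_BITS; apid++) {
		if (!assigned->apm[apid])
			continue;
		for (apqi = 0; apqi < MODEL_BITS; apqi++) {
			if (!assigned->aqm[apqi] || is_bound(apid, apqi))
				continue;
			if (filter_apids) {
				/* One unbound APQN drops the whole adapter. */
				out->apm[apid] = false;
				break;
			}
			/* One unbound APQN drops the whole domain. */
			out->aqm[apqi] = false;
		}
	}
	return weight(out->apm) * weight(out->aqm);
}

/* Filter by APID first; if nothing is left, retry by APQI. */
size_t model_config(const struct ap_model *assigned, bound_fn is_bound,
		    struct ap_model *out)
{
	static const struct ap_model empty = { { false }, { false } };
	size_t n = model_filter(assigned, is_bound, true, out);

	if (n == 0)
		n = model_filter(assigned, is_bound, false, out);
	if (n == 0)
		*out = empty;
	return n;
}

/* Bound APQNs from the example: 04.{0004,0047,0054}, 05.{0005,0047,0054} */
bool example_is_bound(size_t apid, size_t apqi)
{
	if (apid == 0x04)
		return apqi == 0x04 || apqi == 0x47 || apqi == 0x54;
	if (apid == 0x05)
		return apqi == 0x05 || apqi == 0x47 || apqi == 0x54;
	return false;
}
```

Feeding the example assignment (APIDs 04 and 05, APQIs 0004, 0005, 0047
and 0054) through model_config() goes through both passes: the APID pass
leaves nothing, the APQI pass leaves the domains 0047 and 0054 on both
adapters.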

> Signed-off-by: Tony Krowiak <akrowiak@linux.ibm.com>
> ---
>  drivers/s390/crypto/vfio_ap_ops.c | 159 +++++++++++++++++++++++++++++-
>  1 file changed, 155 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c
> index 30bf23734af6..eaf4e9eab6cb 100644
> --- a/drivers/s390/crypto/vfio_ap_ops.c
> +++ b/drivers/s390/crypto/vfio_ap_ops.c
> @@ -326,7 +326,7 @@ static bool vfio_ap_mdev_has_crycb(struct ap_matrix_mdev *matrix_mdev)
>  	return (matrix_mdev->kvm && matrix_mdev->kvm->arch.crypto.crycbd);
>  }
>  
> -static void vfio_ap_mdev_commit_crycb(struct ap_matrix_mdev *matrix_mdev)
> +static void vfio_ap_mdev_commit_shadow_apcb(struct ap_matrix_mdev *matrix_mdev)
>  {
>  	kvm_arch_crypto_set_masks(matrix_mdev->kvm,
>  				  matrix_mdev->shadow_apcb.apm,
> @@ -597,6 +597,157 @@ static int vfio_ap_mdev_verify_no_sharing(struct ap_matrix_mdev *matrix_mdev,
>  	return 0;
>  }
>  
> +/**
> + * vfio_ap_mdev_filter_matrix
> + *
> + * Filter APQNs assigned to the matrix mdev that do not reference an AP queue
> + * device bound to the vfio_ap device driver.
> + *
> + * @matrix_mdev:  the matrix mdev whose AP configuration is to be filtered
> + * @shadow_apcb:  the shadow of the KVM guest's APCB (contains AP configuration
> + *		  for guest)
> + * @filter_apids: boolean value indicating whether the APQNs shall be filtered
> + *		  by APID (true) or by APQI (false).
> + *
> + * Returns the number of APQNs remaining after filtering is complete.
> + */
> +static int vfio_ap_mdev_filter_matrix(struct ap_matrix_mdev *matrix_mdev,
> +				      struct ap_matrix *shadow_apcb,
> +				      bool filter_apids)
> +{
> +	unsigned long apid, apqi, apqn;
> +
> +	memcpy(shadow_apcb, &matrix_mdev->matrix, sizeof(*shadow_apcb));
> +
> +	for_each_set_bit_inv(apid, matrix_mdev->matrix.apm, AP_DEVICES) {
> +		/*
> +		 * If the APID is not assigned to the host AP configuration,
> +		 * we can not assign it to the guest's AP configuration
> +		 */
> +		if (!test_bit_inv(apid,
> +				  (unsigned long *)matrix_dev->info.apm)) {

The patch description and the code seem to be out of sync. Here you do
some filtering based on the host's AP config info read at module
initialization time.

> +			clear_bit_inv(apid, shadow_apcb->apm);
> +			continue;
> +		}
> +
> +		for_each_set_bit_inv(apqi, matrix_mdev->matrix.aqm,
> +				     AP_DOMAINS) {
> +			/*
> +			 * If the APQI is not assigned to the host AP
> +			 * configuration, then it can not be assigned to the
> +			 * guest's AP configuration
> +			 */
> +			if (!test_bit_inv(apqi, (unsigned long *)
> +					  matrix_dev->info.aqm)) {
> +				clear_bit_inv(apqi, shadow_apcb->aqm);
> +				continue;
> +			}
> +
> +			/*
> +			 * If the APQN is not bound to the vfio_ap device
> +			 * driver, then we can't assign it to the guest's
> +			 * AP configuration. The AP architecture won't
> +			 * allow filtering of a single APQN, so if we're
> +			 * filtering APIDs, then filter the APID; otherwise,
> +			 * filter the APQI.
> +			 */
> +			apqn = AP_MKQID(apid, apqi);
> +			if (!vfio_ap_get_queue(apqn)) {

Is this really gonna give NULL if the queue is not bound to vfio-ap? I
don't think so. This will give NULL if the queue is not known to the AP
bus, or has no driver data assigned. In the current state it should give
you non-NULL if another driver has the queue and maintains its own
driver-specific data in drvdata.

> +				if (filter_apids)
> +					clear_bit_inv(apid, shadow_apcb->apm);
> +				else
> +					clear_bit_inv(apqi, shadow_apcb->aqm);
> +				break;
> +			}
> +		}
> +
> +		/*
> +		 * If we're filtering APQIs and all of them have been filtered,
> +		 * there's no need to continue filtering.
> +		 */
> +		if (!filter_apids)
> +			if (bitmap_empty(shadow_apcb->aqm, AP_DOMAINS))
> +				break;
> +	}
> +
> +	return bitmap_weight(shadow_apcb->apm, AP_DEVICES) *
> +	       bitmap_weight(shadow_apcb->aqm, AP_DOMAINS);
> +}
> +
> +/**
> + * vfio_ap_mdev_config_shadow_apcb
> + *
> + * Configure the shadow of a KVM guest's APCB specifying the adapters, domains
> + * and control domains to be assigned to the guest. The shadow APCB will be
> + * configured after filtering the APQNs assigned to the matrix mdev that do not
> + * reference a queue device bound to the vfio_ap device driver.
> + *
> + * @matrix_mdev: the matrix mdev whose shadow APCB is to be configured.
> + *
> + * Returns true if the shadow APCB contents have been changed; otherwise,
> + * returns false.
> + */
> +static bool vfio_ap_mdev_config_shadow_apcb(struct ap_matrix_mdev *matrix_mdev)
> +{
> +	int napm, naqm;
> +	struct ap_matrix shadow_apcb;
> +
> +	vfio_ap_matrix_init(&matrix_dev->info, &shadow_apcb);
> +	napm = bitmap_weight(matrix_mdev->matrix.apm, AP_DEVICES);
> +	naqm = bitmap_weight(matrix_mdev->matrix.aqm, AP_DOMAINS);
> +
> +	/*
> +	 * If there are no APIDs or no APQIs assigned to the matrix mdev,
> +	 * then no APQNs shall be assigned to the guest CRYCB.
> +	 */
> +	if ((napm != 0) || (naqm != 0)) {
> +		/*
> +		 * Filter the APIDs assigned to the matrix mdev for APQNs that
> +		 * do not reference an AP queue device bound to the driver.
> +		 */
> +		napm = vfio_ap_mdev_filter_matrix(matrix_mdev, &shadow_apcb,
> +						  true);
> +		/*
> +		 * If there are no APQNs that can be assigned to the guest's
> +		 * CRYCB after filtering, then try filtering the APQIs.
> +		 */
> +		if (napm == 0) {

When do we expect this to happen? Currently we don't assign queues that
are not bound to us, and we have ->in_use() that inhibits disappearance
of queues due to re-partitioning.

So what we are left with is a queue becoming unavailable to the host
because of a config change, and maybe manual unbind -- not sure about
that.

Now if matrix_dev->info was to reflect the config the bus acts by, which
seems to be the idea behind patch 12, we could react accordingly (if the
domain is gone, filter aqm).

I mean, the purpose of this callback seems to be getting us out of
trouble when domains are missing across all cards (i.e. some domains
were assigned away from us on the lower level).

Or am I missing something?

> +			naqm = vfio_ap_mdev_filter_matrix(matrix_mdev,
> +							  &shadow_apcb, false);
> +
> +			/*
> +			 * If there are no APQNs that can be assigned to the
> +			 * matrix mdev after filtering the APQIs, then no APQNs
> +			 * shall be assigned to the guest's CRYCB.
> +			 */
> +			if (naqm == 0) {
> +				bitmap_clear(shadow_apcb.apm, 0, AP_DEVICES);
> +				bitmap_clear(shadow_apcb.aqm, 0, AP_DOMAINS);
> +			}
> +		}
> +	}
> +
> +	/*
> +	 * If the guest's AP configuration has not changed, then return
> +	 * indicating such.
> +	 */
> +	if (bitmap_equal(matrix_mdev->shadow_apcb.apm, shadow_apcb.apm,
> +			 AP_DEVICES) &&
> +	    bitmap_equal(matrix_mdev->shadow_apcb.aqm, shadow_apcb.aqm,
> +			 AP_DOMAINS) &&
> +	    bitmap_equal(matrix_mdev->shadow_apcb.adm, shadow_apcb.adm,
> +			 AP_DOMAINS))
> +		return false;
> +
> +	/*
> +	 * Copy the changes to the guest's CRYCB, then return indicating that
> +	 * the guest's AP configuration has changed.
> +	 */
> +	memcpy(&matrix_mdev->shadow_apcb, &shadow_apcb, sizeof(shadow_apcb));
> +
> +	return true;
> +}
> +
>  enum qlink_type {
>  	LINK_APID,
>  	LINK_APQI,
> @@ -1284,9 +1435,8 @@ static int vfio_ap_mdev_group_notifier(struct notifier_block *nb,
>  	if (!vfio_ap_mdev_has_crycb(matrix_mdev))
>  		return NOTIFY_DONE;
>  
> -	memcpy(&matrix_mdev->shadow_apcb, &matrix_mdev->matrix,
> -	       sizeof(matrix_mdev->shadow_apcb));
> -	vfio_ap_mdev_commit_crycb(matrix_mdev);
> +	if (vfio_ap_mdev_config_shadow_apcb(matrix_mdev))
> +		vfio_ap_mdev_commit_shadow_apcb(matrix_mdev);
>  
>  	return NOTIFY_OK;
>  }
> @@ -1396,6 +1546,7 @@ static void vfio_ap_mdev_release(struct mdev_device *mdev)
>  	mutex_lock(&matrix_dev->lock);
>  	if (matrix_mdev->kvm) {
>  		kvm_arch_crypto_clear_masks(matrix_mdev->kvm);
> +		vfio_ap_matrix_clear_masks(&matrix_mdev->shadow_apcb);
>  		matrix_mdev->kvm->arch.crypto.pqap_hook = NULL;
>  		vfio_ap_mdev_reset_queues(mdev);
>  		kvm_put_kvm(matrix_mdev->kvm);



* Re: [PATCH v10 09/16] s390/vfio-ap: allow assignment of unavailable AP queues to mdev device
  2020-08-21 19:56 ` [PATCH v10 09/16] s390/vfio-ap: allow assignment of unavailable AP queues to mdev device Tony Krowiak
@ 2020-09-26 23:49   ` Halil Pasic
  2020-09-30 12:59     ` Tony Krowiak
  0 siblings, 1 reply; 79+ messages in thread
From: Halil Pasic @ 2020-09-26 23:49 UTC (permalink / raw)
  To: Tony Krowiak
  Cc: linux-s390, linux-kernel, kvm, freude, borntraeger, cohuck,
	mjrosato, alex.williamson, kwankhede, fiuczy, frankja, david,
	imbrenda, hca, gor

On Fri, 21 Aug 2020 15:56:09 -0400
Tony Krowiak <akrowiak@linux.ibm.com> wrote:

> The current implementation does not allow assignment of an AP adapter or
> domain to an mdev device if the APQNs resulting from the assignment
> do not reference AP queue devices that are bound to the vfio_ap device
> driver. This patch allows assignment of AP resources to the matrix mdev as
> long as the APQNs resulting from the assignment:
>    1. Are not reserved by the AP BUS for use by the zcrypt device drivers.
>    2. Are not assigned to another matrix mdev.
> 
> The rationale behind this is twofold:
>    1. The AP architecture does not preclude assignment of APQNs to an AP
>       configuration that are not available to the system.
>    2. APQNs that do not reference a queue device bound to the vfio_ap
>       device driver will not be assigned to the guest's CRYCB, so the
>       guest will not get access to queues not bound to the vfio_ap driver.
> 
> Signed-off-by: Tony Krowiak <akrowiak@linux.ibm.com>
> ---
>  drivers/s390/crypto/vfio_ap_ops.c | 212 +++++-------------------------
>  1 file changed, 35 insertions(+), 177 deletions(-)
> 
> diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c
> index eaf4e9eab6cb..24fd47e43b80 100644
> --- a/drivers/s390/crypto/vfio_ap_ops.c
> +++ b/drivers/s390/crypto/vfio_ap_ops.c
> @@ -1,4 +1,3 @@
> -// SPDX-License-Identifier: GPL-2.0+

Probably not intentional, is it?

>  /*
>   * Adjunct processor matrix VFIO device driver callbacks.
>   *
> @@ -420,122 +419,6 @@ static struct attribute_group *vfio_ap_mdev_type_groups[] = {
>  	NULL,
>  };
>  
> -struct vfio_ap_queue_reserved {
> -	unsigned long *apid;
> -	unsigned long *apqi;
> -	bool reserved;
> -};
> -
> -/**
> - * vfio_ap_has_queue
> - *
> - * @dev: an AP queue device
> - * @data: a struct vfio_ap_queue_reserved reference
> - *
> - * Flags whether the AP queue device (@dev) has a queue ID containing the APQN,
> - * apid or apqi specified in @data:
> - *
> - * - If @data contains both an apid and apqi value, then @data will be flagged
> - *   as reserved if the APID and APQI fields for the AP queue device matches
> - *
> - * - If @data contains only an apid value, @data will be flagged as
> - *   reserved if the APID field in the AP queue device matches
> - *
> - * - If @data contains only an apqi value, @data will be flagged as
> - *   reserved if the APQI field in the AP queue device matches
> - *
> - * Returns 0 to indicate the input to function succeeded. Returns -EINVAL if
> - * @data does not contain either an apid or apqi.
> - */
> -static int vfio_ap_has_queue(struct device *dev, void *data)
> -{
> -	struct vfio_ap_queue_reserved *qres = data;
> -	struct ap_queue *ap_queue = to_ap_queue(dev);
> -	ap_qid_t qid;
> -	unsigned long id;
> -
> -	if (qres->apid && qres->apqi) {
> -		qid = AP_MKQID(*qres->apid, *qres->apqi);
> -		if (qid == ap_queue->qid)
> -			qres->reserved = true;
> -	} else if (qres->apid && !qres->apqi) {
> -		id = AP_QID_CARD(ap_queue->qid);
> -		if (id == *qres->apid)
> -			qres->reserved = true;
> -	} else if (!qres->apid && qres->apqi) {
> -		id = AP_QID_QUEUE(ap_queue->qid);
> -		if (id == *qres->apqi)
> -			qres->reserved = true;
> -	} else {
> -		return -EINVAL;
> -	}
> -
> -	return 0;
> -}
> -
> -/**
> - * vfio_ap_verify_queue_reserved
> - *
> - * @matrix_dev: a mediated matrix device
> - * @apid: an AP adapter ID
> - * @apqi: an AP queue index
> - *
> - * Verifies that the AP queue with @apid/@apqi is reserved by the VFIO AP device
> - * driver according to the following rules:
> - *
> - * - If both @apid and @apqi are not NULL, then there must be an AP queue
> - *   device bound to the vfio_ap driver with the APQN identified by @apid and
> - *   @apqi
> - *
> - * - If only @apid is not NULL, then there must be an AP queue device bound
> - *   to the vfio_ap driver with an APQN containing @apid
> - *
> - * - If only @apqi is not NULL, then there must be an AP queue device bound
> - *   to the vfio_ap driver with an APQN containing @apqi
> - *
> - * Returns 0 if the AP queue is reserved; otherwise, returns -EADDRNOTAVAIL.
> - */
> -static int vfio_ap_verify_queue_reserved(unsigned long *apid,
> -					 unsigned long *apqi)
> -{
> -	int ret;
> -	struct vfio_ap_queue_reserved qres;
> -
> -	qres.apid = apid;
> -	qres.apqi = apqi;
> -	qres.reserved = false;
> -
> -	ret = driver_for_each_device(&matrix_dev->vfio_ap_drv->driver, NULL,
> -				     &qres, vfio_ap_has_queue);
> -	if (ret)
> -		return ret;
> -
> -	if (qres.reserved)
> -		return 0;
> -
> -	return -EADDRNOTAVAIL;
> -}
> -
> -static int
> -vfio_ap_mdev_verify_queues_reserved_for_apid(struct ap_matrix_mdev *matrix_mdev,
> -					     unsigned long apid)
> -{
> -	int ret;
> -	unsigned long apqi;
> -	unsigned long nbits = matrix_mdev->matrix.aqm_max + 1;
> -
> -	if (find_first_bit_inv(matrix_mdev->matrix.aqm, nbits) >= nbits)
> -		return vfio_ap_verify_queue_reserved(&apid, NULL);
> -
> -	for_each_set_bit_inv(apqi, matrix_mdev->matrix.aqm, nbits) {
> -		ret = vfio_ap_verify_queue_reserved(&apid, &apqi);
> -		if (ret)
> -			return ret;
> -	}
> -
> -	return 0;
> -}
> -
>  #define MDEV_SHARING_ERR "Userspace may not re-assign queue %02lx.%04lx " \
>  			 "already assigned to %s"
>  
> @@ -572,6 +455,11 @@ static int vfio_ap_mdev_verify_no_sharing(struct ap_matrix_mdev *matrix_mdev,
>  	DECLARE_BITMAP(aqm, AP_DOMAINS);
>  
>  	list_for_each_entry(lstdev, &matrix_dev->mdev_list, node) {
> +		/*
> +		 * If either of the input masks belongs to the mdev to which an
> +		 * AP resource is being assigned, then we don't need to verify
> +		 * that mdev's masks.
> +		 */
>  		if (matrix_mdev == lstdev)
>  			continue;
>  

Seems unrelated.

> @@ -597,6 +485,20 @@ static int vfio_ap_mdev_verify_no_sharing(struct ap_matrix_mdev *matrix_mdev,
>  	return 0;
>  }
>  
> +static int vfio_ap_mdev_validate_masks(struct ap_matrix_mdev *matrix_mdev,
> +				       unsigned long *mdev_apm,
> +				       unsigned long *mdev_aqm)
> +{
> +	DECLARE_BITMAP(apm, AP_DEVICES);
> +	DECLARE_BITMAP(aqm, AP_DOMAINS);
> +
> +	if (bitmap_and(apm, mdev_apm, ap_perms.apm, AP_DEVICES) &&
> +	    bitmap_and(aqm, mdev_aqm, ap_perms.aqm, AP_DOMAINS))

Isn't ap_perms supposed to be protected by ap_perms_mutex? In theory
you could end up with a torn write: catch a[pq]mask_commit() with
its pants down, in the sense that only part of the memcpy has been done
and has become observable on the other CPU doing this validate.
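
One way to rule out the torn read would be to copy the masks while
holding the lock and validate against the copy. Below is a user-space
sketch of that idea, not a proposed patch. Assumptions: a pthread mutex
stands in for ap_perms_mutex, single 64-bit words stand in for the 256-bit
bitmaps, and struct toy_perms plus all toy_* helpers are hypothetical.

```c
#include <errno.h>
#include <pthread.h>
#include <stdint.h>

struct toy_perms {
	uint64_t apm;	/* adapters reserved for the default drivers */
	uint64_t aqm;	/* domains reserved for the default drivers */
};

static struct toy_perms toy_ap_perms;
static pthread_mutex_t toy_perms_mutex = PTHREAD_MUTEX_INITIALIZER;

/* Writer side: update both masks as one unit. */
static void toy_perms_commit(uint64_t apm, uint64_t aqm)
{
	pthread_mutex_lock(&toy_perms_mutex);
	toy_ap_perms.apm = apm;
	toy_ap_perms.aqm = aqm;
	pthread_mutex_unlock(&toy_perms_mutex);
}

/* Reader side: take a consistent copy under the same lock. */
static struct toy_perms toy_perms_snapshot(void)
{
	struct toy_perms snap;

	pthread_mutex_lock(&toy_perms_mutex);
	snap = toy_ap_perms;
	pthread_mutex_unlock(&toy_perms_mutex);
	return snap;
}

/* Same test as in the quoted hunk: refuse if both masks overlap. */
static int toy_validate_masks(uint64_t mdev_apm, uint64_t mdev_aqm)
{
	struct toy_perms snap = toy_perms_snapshot();

	if ((mdev_apm & snap.apm) && (mdev_aqm & snap.aqm))
		return -EADDRNOTAVAIL;
	return 0;
}
```

The snapshot only guarantees that both masks come from the same commit.
The masks can still change right after the copy is taken, so the result
may be stale by the time the caller acts on it.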

> +		return -EADDRNOTAVAIL;
> +
> +	return vfio_ap_mdev_verify_no_sharing(matrix_mdev, mdev_apm, mdev_aqm);
> +}
> +
>  /**
>   * vfio_ap_mdev_filter_matrix
>   *
> @@ -882,33 +784,21 @@ static ssize_t assign_adapter_store(struct device *dev,
>  	if (apid > matrix_mdev->matrix.apm_max)
>  		return -ENODEV;
>  
> -	/*
> -	 * Set the bit in the AP mask (APM) corresponding to the AP adapter
> -	 * number (APID). The bits in the mask, from most significant to least
> -	 * significant bit, correspond to APIDs 0-255.
> -	 */
> -	mutex_lock(&matrix_dev->lock);
> -
> -	ret = vfio_ap_mdev_verify_queues_reserved_for_apid(matrix_mdev, apid);
> -	if (ret)
> -		goto done;
> -
>  	memset(apm, 0, sizeof(apm));
>  	set_bit_inv(apid, apm);
>  
> -	ret = vfio_ap_mdev_verify_no_sharing(matrix_mdev, apm,
> -					     matrix_mdev->matrix.aqm);
> -	if (ret)
> -		goto done;
> -
> +	mutex_lock(&matrix_dev->lock);
> +	ret = vfio_ap_mdev_validate_masks(matrix_mdev, apm,
> +					  matrix_mdev->matrix.aqm);
> +	if (ret) {
> +		mutex_unlock(&matrix_dev->lock);
> +		return ret;
> +	}

At this point ap_perms may have already changed, right?

>  	set_bit_inv(apid, matrix_mdev->matrix.apm);
>  	vfio_ap_mdev_link_queues(matrix_mdev, LINK_APID, apid);
> -	ret = count;
> -
> -done:
>  	mutex_unlock(&matrix_dev->lock);
>  
> -	return ret;
> +	return count;
>  }
>  static DEVICE_ATTR_WO(assign_adapter);
>  
> @@ -958,26 +848,6 @@ static ssize_t unassign_adapter_store(struct device *dev,
>  }
>  static DEVICE_ATTR_WO(unassign_adapter);
>  
> -static int
> -vfio_ap_mdev_verify_queues_reserved_for_apqi(struct ap_matrix_mdev *matrix_mdev,
> -					     unsigned long apqi)
> -{
> -	int ret;
> -	unsigned long apid;
> -	unsigned long nbits = matrix_mdev->matrix.apm_max + 1;
> -
> -	if (find_first_bit_inv(matrix_mdev->matrix.apm, nbits) >= nbits)
> -		return vfio_ap_verify_queue_reserved(NULL, &apqi);
> -
> -	for_each_set_bit_inv(apid, matrix_mdev->matrix.apm, nbits) {
> -		ret = vfio_ap_verify_queue_reserved(&apid, &apqi);
> -		if (ret)
> -			return ret;
> -	}
> -
> -	return 0;
> -}
> -
>  /**
>   * assign_domain_store
>   *
> @@ -1031,28 +901,21 @@ static ssize_t assign_domain_store(struct device *dev,
>  	if (apqi > max_apqi)
>  		return -ENODEV;
>  
> -	mutex_lock(&matrix_dev->lock);
> -
> -	ret = vfio_ap_mdev_verify_queues_reserved_for_apqi(matrix_mdev, apqi);
> -	if (ret)
> -		goto done;
> -
>  	memset(aqm, 0, sizeof(aqm));
>  	set_bit_inv(apqi, aqm);
>  
> -	ret = vfio_ap_mdev_verify_no_sharing(matrix_mdev,
> -					     matrix_mdev->matrix.apm, aqm);
> -	if (ret)
> -		goto done;
> -
> +	mutex_lock(&matrix_dev->lock);
> +	ret = vfio_ap_mdev_validate_masks(matrix_mdev, matrix_mdev->matrix.apm,
> +					  aqm);
> +	if (ret) {
> +		mutex_unlock(&matrix_dev->lock);
> +		return ret;
> +	}
>  	set_bit_inv(apqi, matrix_mdev->matrix.aqm);
>  	vfio_ap_mdev_link_queues(matrix_mdev, LINK_APQI, apqi);
> -	ret = count;
> -
> -done:
>  	mutex_unlock(&matrix_dev->lock);
>  
> -	return ret;
> +	return count;
>  }
>  static DEVICE_ATTR_WO(assign_domain);
>  
> @@ -1139,11 +1002,6 @@ static ssize_t assign_control_domain_store(struct device *dev,
>  	if (id > matrix_mdev->matrix.adm_max)
>  		return -ENODEV;
>  
> -	/* Set the bit in the ADM (bitmask) corresponding to the AP control
> -	 * domain number (id). The bits in the mask, from most significant to
> -	 * least significant, correspond to IDs 0 up to the one less than the
> -	 * number of control domains that can be assigned.
> -	 */
>  	mutex_lock(&matrix_dev->lock);
>  	set_bit_inv(id, matrix_mdev->matrix.adm);
>  	mutex_unlock(&matrix_dev->lock);



* Re: [PATCH v10 10/16] s390/vfio-ap: allow configuration of matrix mdev in use by a KVM guest
  2020-08-21 19:56 ` [PATCH v10 10/16] s390/vfio-ap: allow configuration of matrix mdev in use by a KVM guest Tony Krowiak
@ 2020-09-27  0:03   ` Halil Pasic
  2020-09-30 13:19     ` Tony Krowiak
  0 siblings, 1 reply; 79+ messages in thread
From: Halil Pasic @ 2020-09-27  0:03 UTC (permalink / raw)
  To: Tony Krowiak
  Cc: linux-s390, linux-kernel, kvm, freude, borntraeger, cohuck,
	mjrosato, alex.williamson, kwankhede, fiuczy, frankja, david,
	imbrenda, hca, gor

On Fri, 21 Aug 2020 15:56:10 -0400
Tony Krowiak <akrowiak@linux.ibm.com> wrote:

> The current support for pass-through crypto adapters does not allow
> configuration of a matrix mdev when it is in use by a KVM guest. Let's
> allow AP resources - i.e., adapters, domains and control domains - to be
> assigned to or unassigned from a matrix mdev while it is in use by a guest.
> This is in preparation for the introduction of support for dynamic
> configuration of the AP matrix for a running KVM guest.

AFAIU this will let the user do the assign, which will however only take
effect if the same mdev is re-used with a freshly constructed VM, right?

This is however supposed to change real soon (in patch 11). From the
perspective of bisectability we would end up with a single commit that
acts funny.

How about swapping patches 10 and 11? This way the changes you have
in the current 11 would remain dormant until the changes in the current
10 enable the complete new feature (hotplug).


> 
> Signed-off-by: Tony Krowiak <akrowiak@linux.ibm.com>
> ---
>  drivers/s390/crypto/vfio_ap_ops.c | 24 ------------------------
>  1 file changed, 24 deletions(-)
> 
> diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c
> index 24fd47e43b80..cf3321eb239b 100644
> --- a/drivers/s390/crypto/vfio_ap_ops.c
> +++ b/drivers/s390/crypto/vfio_ap_ops.c
> @@ -773,10 +773,6 @@ static ssize_t assign_adapter_store(struct device *dev,
>  	struct mdev_device *mdev = mdev_from_dev(dev);
>  	struct ap_matrix_mdev *matrix_mdev = mdev_get_drvdata(mdev);
>  
> -	/* If the guest is running, disallow assignment of adapter */
> -	if (matrix_mdev->kvm)
> -		return -EBUSY;
> -
>  	ret = kstrtoul(buf, 0, &apid);
>  	if (ret)
>  		return ret;
> @@ -828,10 +824,6 @@ static ssize_t unassign_adapter_store(struct device *dev,
>  	struct mdev_device *mdev = mdev_from_dev(dev);
>  	struct ap_matrix_mdev *matrix_mdev = mdev_get_drvdata(mdev);
>  
> -	/* If the guest is running, disallow un-assignment of adapter */
> -	if (matrix_mdev->kvm)
> -		return -EBUSY;
> -
>  	ret = kstrtoul(buf, 0, &apid);
>  	if (ret)
>  		return ret;
> @@ -891,10 +883,6 @@ static ssize_t assign_domain_store(struct device *dev,
>  	struct ap_matrix_mdev *matrix_mdev = mdev_get_drvdata(mdev);
>  	unsigned long max_apqi = matrix_mdev->matrix.aqm_max;
>  
> -	/* If the guest is running, disallow assignment of domain */
> -	if (matrix_mdev->kvm)
> -		return -EBUSY;
> -
>  	ret = kstrtoul(buf, 0, &apqi);
>  	if (ret)
>  		return ret;
> @@ -946,10 +934,6 @@ static ssize_t unassign_domain_store(struct device *dev,
>  	struct mdev_device *mdev = mdev_from_dev(dev);
>  	struct ap_matrix_mdev *matrix_mdev = mdev_get_drvdata(mdev);
>  
> -	/* If the guest is running, disallow un-assignment of domain */
> -	if (matrix_mdev->kvm)
> -		return -EBUSY;
> -
>  	ret = kstrtoul(buf, 0, &apqi);
>  	if (ret)
>  		return ret;
> @@ -991,10 +975,6 @@ static ssize_t assign_control_domain_store(struct device *dev,
>  	struct mdev_device *mdev = mdev_from_dev(dev);
>  	struct ap_matrix_mdev *matrix_mdev = mdev_get_drvdata(mdev);
>  
> -	/* If the guest is running, disallow assignment of control domain */
> -	if (matrix_mdev->kvm)
> -		return -EBUSY;
> -
>  	ret = kstrtoul(buf, 0, &id);
>  	if (ret)
>  		return ret;
> @@ -1036,10 +1016,6 @@ static ssize_t unassign_control_domain_store(struct device *dev,
>  	struct ap_matrix_mdev *matrix_mdev = mdev_get_drvdata(mdev);
>  	unsigned long max_domid =  matrix_mdev->matrix.adm_max;
>  
> -	/* If the guest is running, disallow un-assignment of control domain */
> -	if (matrix_mdev->kvm)
> -		return -EBUSY;
> -
>  	ret = kstrtoul(buf, 0, &domid);
>  	if (ret)
>  		return ret;



* Re: [PATCH v10 12/16] s390/zcrypt: Notify driver on config changed and scan complete callbacks
  2020-08-21 19:56 ` [PATCH v10 12/16] s390/zcrypt: Notify driver on config changed and scan complete callbacks Tony Krowiak
@ 2020-09-27  1:39   ` Halil Pasic
  0 siblings, 0 replies; 79+ messages in thread
From: Halil Pasic @ 2020-09-27  1:39 UTC (permalink / raw)
  To: Tony Krowiak
  Cc: linux-s390, linux-kernel, kvm, freude, borntraeger, cohuck,
	mjrosato, alex.williamson, kwankhede, fiuczy, frankja, david,
	imbrenda, hca, gor

On Fri, 21 Aug 2020 15:56:12 -0400
Tony Krowiak <akrowiak@linux.ibm.com> wrote:

> This patch introduces an extension to the ap bus to notify drivers
> on crypto config changed and bus scan complete events.
> Two new callbacks are introduced for ap_drivers:
> 
>   void (*on_config_changed)(struct ap_config_info *new_config_info,
>                              struct ap_config_info *old_config_info);
>   void (*on_scan_complete)(struct ap_config_info *new_config_info,
>                              struct ap_config_info *old_config_info);
> 
> Both callbacks are optional. Both callbacks are only triggered
> when QCI information is available (facility bit 12):
> 
> * The on_config_changed callback is invoked at the start of the AP bus scan
>   function when it determines that the host AP configuration information
>   has changed since the previous scan. This is done by storing
>   an old and current QCI info struct and comparing them. If there is any
>   difference, the callback is invoked.
> 
>   Note that when the AP bus scan detects that AP adapters or domains have
>   been removed from the host's AP configuration, it will remove the
>   associated devices from the AP bus subsystem's device model. This
>   callback gives the device driver a chance to respond to the removal
>   of the AP devices in bulk rather than one at a time as its remove
>   callback is invoked. It will also allow the device driver to do any
>   cleanup prior to giving control back to the bus piecemeal. This is
>   particularly important for the vfio_ap driver because there may be
>   guests using the queues at the time.
> 
> * The on_scan_complete callback is invoked after the ap bus scan is
>   complete if the host AP configuration data has changed.
> 
>   Note that when the AP bus scan detects that adapters or domains have
>   been added to the host's configuration, it will create new devices in
>   the AP bus subsystem's device model. This callback also allows the driver
>   to process all of the new devices in bulk.
> 
> Please note that changes to the apmask and aqmask do not trigger
> these two callbacks since the bus scan function is not invoked by changes
> to those masks.
> 
> Signed-off-by: Harald Freudenberger <freude@linux.ibm.com>
> Signed-off-by: Tony Krowiak <akrowiak@linux.ibm.com>
> ---
>  drivers/s390/crypto/ap_bus.c | 85 +++++++++++++++++++++++++++++++++++-
>  drivers/s390/crypto/ap_bus.h | 12 +++++
>  2 files changed, 96 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/s390/crypto/ap_bus.c b/drivers/s390/crypto/ap_bus.c
> index db27bd931308..cbf4c4d2e573 100644
> --- a/drivers/s390/crypto/ap_bus.c
> +++ b/drivers/s390/crypto/ap_bus.c
> @@ -73,8 +73,10 @@ struct ap_perms ap_perms;
>  EXPORT_SYMBOL(ap_perms);
>  DEFINE_MUTEX(ap_perms_mutex);
>  EXPORT_SYMBOL(ap_perms_mutex);
> +DEFINE_MUTEX(ap_config_lock);
>  
>  static struct ap_config_info *ap_qci_info;
> +static struct ap_config_info *ap_qci_info_old;
>  
>  /*
>   * AP bus related debug feature things.
> @@ -1412,6 +1414,52 @@ static int __match_queue_device_with_queue_id(struct device *dev, const void *da
>  		&& AP_QID_QUEUE(to_ap_queue(dev)->qid) == (int)(long) data;
>  }
>  
> +/* Helper function for notify_config_changed */
> +static int __drv_notify_config_changed(struct device_driver *drv, void *data)
> +{
> +	struct ap_driver *ap_drv = to_ap_drv(drv);
> +
> +	if (try_module_get(drv->owner)) {
> +		if (ap_drv->on_config_changed)
> +			ap_drv->on_config_changed(ap_qci_info,
> +						  ap_qci_info_old);
> +		module_put(drv->owner);
> +	}
> +
> +	return 0;
> +}
> +
> +/* Notify all drivers about an qci config change */
> +static inline void notify_config_changed(void)
> +{
> +	bus_for_each_drv(&ap_bus_type, NULL, NULL,
> +			 __drv_notify_config_changed);
> +}
> +
> +/* Helper function for notify_scan_complete */
> +static int __drv_notify_scan_complete(struct device_driver *drv, void *data)
> +{
> +	struct ap_driver *ap_drv = to_ap_drv(drv);
> +
> +	if (try_module_get(drv->owner)) {
> +		if (ap_drv->on_scan_complete)
> +			ap_drv->on_scan_complete(ap_qci_info,
> +						 ap_qci_info_old);
> +		module_put(drv->owner);
> +	}
> +
> +	return 0;
> +}
> +
> +/* Notify all drivers about bus scan complete */
> +static inline void notify_scan_complete(void)
> +{
> +	bus_for_each_drv(&ap_bus_type, NULL, NULL,
> +			 __drv_notify_scan_complete);
> +}
> +
> +
> +

Too many blank lines?

>  /*
>   * Helper function for ap_scan_bus().
>   * Does the scan bus job for the given adapter id.
> @@ -1565,15 +1613,44 @@ static void _ap_scan_bus_adapter(int id)
>  		put_device(&ac->ap_dev.device);
>  }
>  
> +static int ap_config_changed(void)
> +{
> +	int cfg_chg = 0;
> +
> +	if (ap_qci_info) {
> +		if (!ap_qci_info_old) {
> +			ap_qci_info_old = kzalloc(sizeof(*ap_qci_info_old),
> +						  GFP_KERNEL);
> +			if (!ap_qci_info_old)
> +				return 0;
> +		} else {
> +			memcpy(ap_qci_info_old, ap_qci_info,
> +			       sizeof(struct ap_config_info));
> +		}
> +		ap_fetch_qci_info(ap_qci_info);
> +		cfg_chg = memcmp(ap_qci_info,
> +				 ap_qci_info_old,
> +				 sizeof(struct ap_config_info)) != 0;
> +	}
> +
> +	return cfg_chg;
> +}
> +
>  /**
>   * ap_scan_bus(): Scan the AP bus for new devices
>   * Runs periodically, workqueue timer (ap_config_time)
>   */
>  static void ap_scan_bus(struct work_struct *unused)
>  {
> -	int id;
> +	int id, config_changed = 0;
>  
>  	ap_fetch_qci_info(ap_qci_info);

Do we still need this ap_fetch_qci_info()? ...

> +	mutex_lock(&ap_config_lock);

The usage of ap_qci_info does not seem to change substantially, and
ap_qci_info_old is not used unlike. I believe if we need ap_config_lock
now, then we used to need it before. And then adding this lock should
really be a separate patch that clearly advertises its nature as a fix.


> +
> +	/* config change notify */
> +	config_changed = ap_config_changed();

... I mean ap_config_changed() does an ap_fetch_qci_info()
of its own.
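
For illustration, the compare step can be modelled in user space so that
one fetch per scan is enough. This is a sketch only: struct toy_config,
toy_fetch_qci_info() and the call counter are invented stand-ins for
struct ap_config_info and ap_fetch_qci_info(), not the bus code.

```c
#include <stdbool.h>
#include <string.h>

struct toy_config {
	unsigned int apm;
	unsigned int aqm;
};

static struct toy_config toy_hw;	/* what the "hardware" reports */
static struct toy_config toy_cur;	/* plays the role of ap_qci_info */
static struct toy_config toy_old;	/* plays the role of ap_qci_info_old */
static int toy_fetch_calls;

static void toy_fetch_qci_info(struct toy_config *info)
{
	toy_fetch_calls++;
	*info = toy_hw;
}

/* Save the previous config, fetch once, report whether it changed. */
static bool toy_config_changed(void)
{
	toy_old = toy_cur;
	toy_fetch_qci_info(&toy_cur);
	return memcmp(&toy_cur, &toy_old, sizeof(toy_cur)) != 0;
}

/* Scan entry point: relies on toy_config_changed() for the only fetch. */
static bool toy_scan_bus(void)
{
	return toy_config_changed();
}
```

In this model, an extra toy_fetch_qci_info(&toy_cur) placed before
toy_config_changed() would make the saved copy already hold the new data,
so the comparison would report no change.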

Otherwise looks OK!

Regards,
Halil

> +	if (config_changed)
> +		notify_config_changed();
>  	ap_select_domain();
>  
>  	AP_DBF(DBF_DEBUG, "%s running\n", __func__);
> @@ -1582,6 +1659,12 @@ static void ap_scan_bus(struct work_struct *unused)
>  	for (id = 0; id < AP_DEVICES; id++)
>  		_ap_scan_bus_adapter(id);
>  
> +	/* scan complete notify */
> +	if (config_changed)
> +		notify_scan_complete();
> +
> +	mutex_unlock(&ap_config_lock);
> +
>  	/* check if there is at least one queue available with default domain */
>  	if (ap_domain_index >= 0) {
>  		struct device *dev =
> diff --git a/drivers/s390/crypto/ap_bus.h b/drivers/s390/crypto/ap_bus.h
> index 48c57b3d53a0..3fc743ac549c 100644
> --- a/drivers/s390/crypto/ap_bus.h
> +++ b/drivers/s390/crypto/ap_bus.h
> @@ -137,6 +137,18 @@ struct ap_driver {
>  	int (*probe)(struct ap_device *);
>  	void (*remove)(struct ap_device *);
>  	bool (*in_use)(unsigned long *apm, unsigned long *aqm);
> +	/*
> +	 * Called at the start of the ap bus scan function when
> +	 * the crypto config information (qci) has changed.
> +	 */
> +	void (*on_config_changed)(struct ap_config_info *new_config_info,
> +				  struct ap_config_info *old_config_info);
> +	/*
> +	 * Called at the end of the ap bus scan function when
> +	 * the crypto config information (qci) has changed.
> +	 */
> +	void (*on_scan_complete)(struct ap_config_info *new_config_info,
> +				 struct ap_config_info *old_config_info);
>  };
>  
>  #define to_ap_drv(x) container_of((x), struct ap_driver, driver)



* Re: [PATCH v10 11/16] s390/vfio-ap: allow hot plug/unplug of AP resources using mdev device
  2020-08-21 19:56 ` [PATCH v10 11/16] s390/vfio-ap: allow hot plug/unplug of AP resources using mdev device Tony Krowiak
@ 2020-09-28  1:01   ` Halil Pasic
  2020-10-05 16:24     ` Tony Krowiak
  0 siblings, 1 reply; 79+ messages in thread
From: Halil Pasic @ 2020-09-28  1:01 UTC (permalink / raw)
  To: Tony Krowiak
  Cc: linux-s390, linux-kernel, kvm, freude, borntraeger, cohuck,
	mjrosato, alex.williamson, kwankhede, fiuczy, frankja, david,
	imbrenda, hca, gor

On Fri, 21 Aug 2020 15:56:11 -0400
Tony Krowiak <akrowiak@linux.ibm.com> wrote:

> Let's hot plug/unplug adapters, domains and control domains assigned to or
> unassigned from an AP matrix mdev device while it is in use by a guest per
> the following:
> 
> * When the APID of an adapter is assigned to a matrix mdev in use by a KVM
>   guest, the adapter will be hot plugged into the KVM guest as long as each
>   APQN derived from the Cartesian product of the APID being assigned and
>   the APQIs already assigned to the guest's CRYCB references a queue device
>   bound to the vfio_ap device driver.
> 
> * When the APID of an adapter is unassigned from a matrix mdev in use by a
>   KVM guest, the adapter will be hot unplugged from the KVM guest.
> 
> * When the APQI of a domain is assigned to a matrix mdev in use by a KVM
>   guest, the domain will be hot plugged into the KVM guest as long as each
>   APQN derived from the Cartesian product of the APQI being assigned and
>   the APIDs already assigned to the guest's CRYCB references a queue device
>   bound to the vfio_ap device driver.
> 
> * When the APQI of a domain is unassigned from a matrix mdev in use by a
>   KVM guest, the domain will be hot unplugged from the KVM guest



Hm, I suppose this means that what your guest effectively gets may depend
on whether assign_domain or assign_adapter is done first.

Suppose we have the queues
0.0 0.1
1.0
bound to vfio_ap, i.e. 1.1 is missing for a reason other than
belonging to the default drivers (for what exact reason, no idea).

Let's suppose we started with the matrix containing only adapter
0 (0.) and domain 0 (.0).

After echo 1 > assign_adapter && echo 1 > assign_domain we end up with
matrix:
0.0 0.1
1.0 1.1
guest_matrix:
0.0
1.0
while after echo 1 > assign_domain && echo 1 > assign_adapter we end up
with:
matrix:
0.0 0.1
1.0 1.1
guest_matrix:
0.0 0.1

That means, the set of bound queues and the set of assigned resources do
not fully determine the set of resources passed through to the guest.

Is that a deliberate design choice?
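
The two orders can be replayed with a toy user-space model of the
all-or-nothing hot plug rule from the commit message. Assumptions: 4
adapters by 4 domains, LSB-first bit masks instead of the kernel's
inverted bitmaps, a guest that already has a non-empty shadow, and
invented toy_* names. It is a sketch of the rule, not the driver code.

```c
#include <stdbool.h>
#include <stdint.h>

#define N 4

struct toy_mdev {
	uint8_t apm, aqm;	/* assigned to the mdev */
	uint8_t g_apm, g_aqm;	/* passed through to the guest */
	bool bound[N][N];	/* queue apid.apqi bound to vfio_ap */
};

/* Plug the adapter only if every guest domain has a bound queue on it. */
static void toy_assign_adapter(struct toy_mdev *m, int apid)
{
	m->apm |= (uint8_t)(1u << apid);
	for (int apqi = 0; apqi < N; apqi++)
		if (((m->g_aqm >> apqi) & 1) && !m->bound[apid][apqi])
			return;
	m->g_apm |= (uint8_t)(1u << apid);
}

/* Plug the domain only if every guest adapter has a bound queue for it. */
static void toy_assign_domain(struct toy_mdev *m, int apqi)
{
	m->aqm |= (uint8_t)(1u << apqi);
	for (int apid = 0; apid < N; apid++)
		if (((m->g_apm >> apid) & 1) && !m->bound[apid][apqi])
			return;
	m->g_aqm |= (uint8_t)(1u << apqi);
}

/* Starting point of the example: adapter 0, domain 0; 1.1 is not bound. */
static struct toy_mdev toy_start(void)
{
	struct toy_mdev m = { .apm = 1, .aqm = 1, .g_apm = 1, .g_aqm = 1 };

	m.bound[0][0] = m.bound[0][1] = m.bound[1][0] = true;
	return m;
}
```

Calling toy_assign_adapter(&m, 1) then toy_assign_domain(&m, 1) on one
copy of toy_start(), and the reverse order on another, leaves apm/aqm
identical in both copies while g_apm/g_aqm differ.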



> 
> * When the domain number of a control domain is assigned to a matrix mdev
>   in use by a KVM guest, the control domain will be hot plugged into the
>   KVM guest.
> 
> * When the domain number of a control domain is unassigned from a matrix
>   mdev in use by a KVM guest, the control domain will be hot unplugged
>   from the KVM guest.
> 
> Signed-off-by: Tony Krowiak <akrowiak@linux.ibm.com>
> ---
>  drivers/s390/crypto/vfio_ap_ops.c | 196 ++++++++++++++++++++++++++++++
>  1 file changed, 196 insertions(+)
> 
> diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c
> index cf3321eb239b..2b01a8eb6ee7 100644
> --- a/drivers/s390/crypto/vfio_ap_ops.c
> +++ b/drivers/s390/crypto/vfio_ap_ops.c
> @@ -731,6 +731,56 @@ static void vfio_ap_mdev_link_queues(struct ap_matrix_mdev *matrix_mdev,
>  	}
>  }
>  
> +static bool vfio_ap_mdev_assign_apqis_4_apid(struct ap_matrix_mdev *matrix_mdev,
> +					     unsigned long apid)
> +{
> +	DECLARE_BITMAP(aqm, AP_DOMAINS);
> +	unsigned long apqi, apqn;
> +
> +	bitmap_copy(aqm, matrix_mdev->matrix.aqm, AP_DOMAINS);
> +
> +	for_each_set_bit_inv(apqi, matrix_mdev->matrix.aqm, AP_DOMAINS) {
> +		if (!test_bit_inv(apqi,
> +				  (unsigned long *) matrix_dev->info.aqm))
> +			clear_bit_inv(apqi, aqm);
> +
> +		apqn = AP_MKQID(apid, apqi);
> +		if (!vfio_ap_get_mdev_queue(matrix_mdev, apqn))
> +			clear_bit_inv(apqi, aqm);
> +	}
> +
> +	if (bitmap_empty(aqm, AP_DOMAINS))
> +		return false;
> +
> +	set_bit_inv(apid, matrix_mdev->shadow_apcb.apm);
> +	bitmap_copy(matrix_mdev->shadow_apcb.aqm, aqm, AP_DOMAINS);
> +
> +	return true;
> +}
> +
> +static bool vfio_ap_mdev_assign_guest_apid(struct ap_matrix_mdev *matrix_mdev,
> +					   unsigned long apid)
> +{
> +	unsigned long apqi, apqn;
> +
> +	if (!vfio_ap_mdev_has_crycb(matrix_mdev) ||
> +	    !test_bit_inv(apid, (unsigned long *)matrix_dev->info.apm))
> +		return false;
> +
> +	if (bitmap_empty(matrix_mdev->shadow_apcb.aqm, AP_DOMAINS))
> +		return vfio_ap_mdev_assign_apqis_4_apid(matrix_mdev, apid);


Hm. Let's say we have the same situation regarding the bound queues as
above but we start with the empty matrix, and do all the assignments
while the guest is running.

Consider the following sequence of actions.

1) echo 0 > assign_domain
2) echo 1 > assign_domain
3) echo 1 > assign_adapter
4) echo 0 > assign_adapter
5) echo 1 > unassign_adapter

I understand that at 3), because
bitmap_empty(matrix_mdev->shadow_apcb.aqm) holds, we would end up with a
shadow aqm containing just domain 0, as queue 1.1 ain't bound to us.

Thus at the end we would have
matrix:
0.0 0.1
guest_matrix:
0.0

And if we add an adapter 2 into the mix, with the queues 2.0 and 2.1,
then after
6) echo 2 > assign_adapter
we would have
matrix:
0.0 0.1
2.0 2.1
guest_matrix:
0.0
2.0

This looks very quirky to me. Did I read the code wrong? Opinions?
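
The sequence can be replayed with a toy user-space model that also covers
the empty-shadow path and the unassign. Assumptions: 4 adapters by 4
domains, LSB-first bit masks instead of the kernel's inverted bitmaps, the
APQI side mirroring the APID side shown in the hunk, and invented toy_*
names. It is a sketch of the quoted logic, not the driver code.

```c
#include <stdbool.h>
#include <stdint.h>

#define N 4
#define BIT(n) ((uint8_t)(1u << (n)))

struct toy_mdev {
	uint8_t apm, aqm;	/* assigned to the mdev */
	uint8_t g_apm, g_aqm;	/* shadow seen by the guest */
	bool bound[N][N];	/* queue apid.apqi bound to vfio_ap */
};

static void toy_assign_adapter(struct toy_mdev *m, int apid)
{
	uint8_t aqm = 0;

	m->apm |= BIT(apid);
	if (m->g_aqm == 0) {
		/* Empty shadow: take every assigned domain with a queue. */
		for (int apqi = 0; apqi < N; apqi++)
			if ((m->aqm & BIT(apqi)) && m->bound[apid][apqi])
				aqm |= BIT(apqi);
		if (aqm) {
			m->g_aqm = aqm;
			m->g_apm |= BIT(apid);
		}
		return;
	}
	/* Non-empty shadow: all-or-nothing against the guest's domains. */
	for (int apqi = 0; apqi < N; apqi++)
		if ((m->g_aqm & BIT(apqi)) && !m->bound[apid][apqi])
			return;
	m->g_apm |= BIT(apid);
}

static void toy_assign_domain(struct toy_mdev *m, int apqi)
{
	uint8_t apm = 0;

	m->aqm |= BIT(apqi);
	if (m->g_apm == 0) {
		/* Empty shadow: take every assigned adapter with a queue. */
		for (int apid = 0; apid < N; apid++)
			if ((m->apm & BIT(apid)) && m->bound[apid][apqi])
				apm |= BIT(apid);
		if (apm) {
			m->g_apm = apm;
			m->g_aqm |= BIT(apqi);
		}
		return;
	}
	/* Non-empty shadow: all-or-nothing against the guest's adapters. */
	for (int apid = 0; apid < N; apid++)
		if ((m->g_apm & BIT(apid)) && !m->bound[apid][apqi])
			return;
	m->g_aqm |= BIT(apqi);
}

static void toy_unassign_adapter(struct toy_mdev *m, int apid)
{
	m->apm &= (uint8_t)~BIT(apid);
	m->g_apm &= (uint8_t)~BIT(apid);
	if (m->g_apm == 0)	/* last adapter gone: drop the domains too */
		m->g_aqm = 0;
}
```

With queues 0.0, 0.1, 1.0, 2.0 and 2.1 marked as bound, replaying steps 1)
to 6) on a zeroed struct toy_mdev lets the shadow be compared with the
assigned matrix after each step.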

> +
> +	for_each_set_bit_inv(apqi, matrix_mdev->shadow_apcb.aqm, AP_DOMAINS) {
> +		apqn = AP_MKQID(apid, apqi);
> +		if (!vfio_ap_get_mdev_queue(matrix_mdev, apqn))
> +			return false;
> +	}
> +
> +	set_bit_inv(apid, matrix_mdev->shadow_apcb.apm);
> +
> +	return true;
> +}
> +
>  /**
>   * assign_adapter_store
>   *
> @@ -792,12 +842,42 @@ static ssize_t assign_adapter_store(struct device *dev,
>  	}
>  	set_bit_inv(apid, matrix_mdev->matrix.apm);
>  	vfio_ap_mdev_link_queues(matrix_mdev, LINK_APID, apid);
> +	if (vfio_ap_mdev_assign_guest_apid(matrix_mdev, apid))
> +		vfio_ap_mdev_commit_shadow_apcb(matrix_mdev);
>  	mutex_unlock(&matrix_dev->lock);
>  
>  	return count;
>  }
>  static DEVICE_ATTR_WO(assign_adapter);
>  
> +static bool vfio_ap_mdev_unassign_guest_apid(struct ap_matrix_mdev *matrix_mdev,
> +					     unsigned long apid)
> +{
> +	if (vfio_ap_mdev_has_crycb(matrix_mdev)) {
> +		if (test_bit_inv(apid, matrix_mdev->shadow_apcb.apm)) {
> +			clear_bit_inv(apid, matrix_mdev->shadow_apcb.apm);
> +
> +			/*
> +			 * If there are no APIDs assigned to the guest, then
> +			 * the guest will not have access to any queues, so
> +			 * let's also go ahead and unassign the APQIs. Keeping
> +			 * them around may yield unpredictable results during
> +			 * a probe that is not related to a host AP
> +			 * configuration change (i.e., an AP adapter is
> +			 * configured online).
> +			 */

I don't quite understand this comment. Clearing out the other mask when
one becomes empty does allow us to recover the full possible guest
matrix in the scenario described above. I don't see any shadow
manipulation in the probe handler at this stage. Are we maybe
talking about the same effect as I described for assign?

Regards,
Halil

> +			if (bitmap_empty(matrix_mdev->shadow_apcb.apm,
> +					 AP_DEVICES))
> +				bitmap_clear(matrix_mdev->shadow_apcb.aqm, 0,
> +					     AP_DOMAINS);
> +
> +			return true;
> +		}
> +	}
> +
> +	return false;
> +}
> +
>  /**
>   * unassign_adapter_store
>   *
> @@ -834,12 +914,64 @@ static ssize_t unassign_adapter_store(struct device *dev,
>  	mutex_lock(&matrix_dev->lock);
>  	clear_bit_inv((unsigned long)apid, matrix_mdev->matrix.apm);
>  	vfio_ap_mdev_link_queues(matrix_mdev, UNLINK_APID, apid);
> +	if (vfio_ap_mdev_unassign_guest_apid(matrix_mdev, apid))
> +		vfio_ap_mdev_commit_shadow_apcb(matrix_mdev);
>  	mutex_unlock(&matrix_dev->lock);
>  
>  	return count;
>  }
>  static DEVICE_ATTR_WO(unassign_adapter);
>  
> +static bool vfio_ap_mdev_assign_apids_4_apqi(struct ap_matrix_mdev *matrix_mdev,
> +					     unsigned long apqi)
> +{
> +	DECLARE_BITMAP(apm, AP_DEVICES);
> +	unsigned long apid, apqn;
> +
> +	bitmap_copy(apm, matrix_mdev->matrix.apm, AP_DEVICES);
> +
> +	for_each_set_bit_inv(apid, matrix_mdev->matrix.apm, AP_DEVICES) {
> +		if (!test_bit_inv(apid,
> +				  (unsigned long *) matrix_dev->info.apm))
> +			clear_bit_inv(apqi, apm);
> +
> +		apqn = AP_MKQID(apid, apqi);
> +		if (!vfio_ap_get_mdev_queue(matrix_mdev, apqn))
> +			clear_bit_inv(apid, apm);
> +	}
> +
> +	if (bitmap_empty(apm, AP_DEVICES))
> +		return false;
> +
> +	set_bit_inv(apqi, matrix_mdev->shadow_apcb.aqm);
> +	bitmap_copy(matrix_mdev->shadow_apcb.apm, apm, AP_DEVICES);
> +
> +	return true;
> +}
> +
> +static bool vfio_ap_mdev_assign_guest_apqi(struct ap_matrix_mdev *matrix_mdev,
> +					   unsigned long apqi)
> +{
> +	unsigned long apid, apqn;
> +
> +	if (!vfio_ap_mdev_has_crycb(matrix_mdev) ||
> +	    !test_bit_inv(apqi, (unsigned long *)matrix_dev->info.aqm))
> +		return false;
> +
> +	if (bitmap_empty(matrix_mdev->shadow_apcb.apm, AP_DEVICES))
> +		return vfio_ap_mdev_assign_apids_4_apqi(matrix_mdev, apqi);
> +
> +	for_each_set_bit_inv(apid, matrix_mdev->shadow_apcb.apm, AP_DEVICES) {
> +		apqn = AP_MKQID(apid, apqi);
> +		if (!vfio_ap_get_mdev_queue(matrix_mdev, apqn))
> +			return false;
> +	}
> +
> +	set_bit_inv(apqi, matrix_mdev->shadow_apcb.aqm);
> +
> +	return true;
> +}
> +
>  /**
>   * assign_domain_store
>   *
> @@ -901,12 +1033,41 @@ static ssize_t assign_domain_store(struct device *dev,
>  	}
>  	set_bit_inv(apqi, matrix_mdev->matrix.aqm);
>  	vfio_ap_mdev_link_queues(matrix_mdev, LINK_APQI, apqi);
> +	if (vfio_ap_mdev_assign_guest_apqi(matrix_mdev, apqi))
> +		vfio_ap_mdev_commit_shadow_apcb(matrix_mdev);
>  	mutex_unlock(&matrix_dev->lock);
>  
>  	return count;
>  }
>  static DEVICE_ATTR_WO(assign_domain);
>  
> +static bool vfio_ap_mdev_unassign_guest_apqi(struct ap_matrix_mdev *matrix_mdev,
> +					     unsigned long apqi)
> +{
> +	if (vfio_ap_mdev_has_crycb(matrix_mdev)) {
> +		if (test_bit_inv(apqi, matrix_mdev->shadow_apcb.aqm)) {
> +			clear_bit_inv(apqi, matrix_mdev->shadow_apcb.aqm);
> +
> +			/*
> +			 * If there are no APQIs assigned to the guest, then
> +			 * the guest will not have access to any queues, so
> +			 * let's also go ahead and unassign the APIDs. Keeping
> +			 * them around may yield unpredictable results during
> +			 * a probe that is not related to a host AP
> +			 * configuration change (i.e., an AP adapter is
> +			 * configured online).
> +			 */
> +			if (bitmap_empty(matrix_mdev->shadow_apcb.aqm,
> +					 AP_DOMAINS))
> +				bitmap_clear(matrix_mdev->shadow_apcb.apm, 0,
> +					     AP_DEVICES);
> +
> +			return true;
> +		}
> +	}
> +
> +	return false;
> +}
>  
>  /**
>   * unassign_domain_store
> @@ -944,12 +1105,28 @@ static ssize_t unassign_domain_store(struct device *dev,
>  	mutex_lock(&matrix_dev->lock);
>  	clear_bit_inv((unsigned long)apqi, matrix_mdev->matrix.aqm);
>  	vfio_ap_mdev_link_queues(matrix_mdev, UNLINK_APQI, apqi);
> +	if (vfio_ap_mdev_unassign_guest_apqi(matrix_mdev, apqi))
> +		vfio_ap_mdev_commit_shadow_apcb(matrix_mdev);
>  	mutex_unlock(&matrix_dev->lock);
>  
>  	return count;
>  }
>  static DEVICE_ATTR_WO(unassign_domain);
>  
> +static bool vfio_ap_mdev_assign_guest_cdom(struct ap_matrix_mdev *matrix_mdev,
> +					   unsigned long domid)
> +{
> +	if (vfio_ap_mdev_has_crycb(matrix_mdev)) {
> +		if (!test_bit_inv(domid, matrix_mdev->shadow_apcb.adm)) {
> +			set_bit_inv(domid, matrix_mdev->shadow_apcb.adm);
> +
> +			return true;
> +		}
> +	}
> +
> +	return false;
> +}
> +
>  /**
>   * assign_control_domain_store
>   *
> @@ -984,12 +1161,29 @@ static ssize_t assign_control_domain_store(struct device *dev,
>  
>  	mutex_lock(&matrix_dev->lock);
>  	set_bit_inv(id, matrix_mdev->matrix.adm);
> +	if (vfio_ap_mdev_assign_guest_cdom(matrix_mdev, id))
> +		vfio_ap_mdev_commit_shadow_apcb(matrix_mdev);
>  	mutex_unlock(&matrix_dev->lock);
>  
>  	return count;
>  }
>  static DEVICE_ATTR_WO(assign_control_domain);
>  
> +static bool
> +vfio_ap_mdev_unassign_guest_cdom(struct ap_matrix_mdev *matrix_mdev,
> +				 unsigned long domid)
> +{
> +	if (vfio_ap_mdev_has_crycb(matrix_mdev)) {
> +		if (test_bit_inv(domid, matrix_mdev->shadow_apcb.adm)) {
> +			clear_bit_inv(domid, matrix_mdev->shadow_apcb.adm);
> +
> +			return true;
> +		}
> +	}
> +
> +	return false;
> +}
> +
>  /**
>   * unassign_control_domain_store
>   *
> @@ -1024,6 +1218,8 @@ static ssize_t unassign_control_domain_store(struct device *dev,
>  
>  	mutex_lock(&matrix_dev->lock);
>  	clear_bit_inv(domid, matrix_mdev->matrix.adm);
> +	if (vfio_ap_mdev_unassign_guest_cdom(matrix_mdev, domid))
> +		vfio_ap_mdev_commit_shadow_apcb(matrix_mdev);
>  	mutex_unlock(&matrix_dev->lock);
>  
>  	return count;



* Re: [PATCH v10 13/16] s390/vfio-ap: handle host AP config change notification
  2020-08-21 19:56 ` [PATCH v10 13/16] s390/vfio-ap: handle host AP config change notification Tony Krowiak
@ 2020-09-28  1:38   ` Halil Pasic
  2020-10-12 20:53     ` Tony Krowiak
  2020-10-12 21:27     ` Tony Krowiak
  0 siblings, 2 replies; 79+ messages in thread
From: Halil Pasic @ 2020-09-28  1:38 UTC (permalink / raw)
  To: Tony Krowiak
  Cc: linux-s390, linux-kernel, kvm, freude, borntraeger, cohuck,
	mjrosato, alex.williamson, kwankhede, fiuczy, frankja, david,
	imbrenda, hca, gor, kernel test robot

On Fri, 21 Aug 2020 15:56:13 -0400
Tony Krowiak <akrowiak@linux.ibm.com> wrote:

> Implements the driver callback invoked by the AP bus when the host
> AP configuration has changed. Since this callback is invoked prior to
> unbinding a device from its device driver, the vfio_ap driver will
> respond by unplugging the AP adapters, domains and control domains
> removed from the host's AP configuration from the guests using them.
> 
> Signed-off-by: Tony Krowiak <akrowiak@linux.ibm.com>
> Reported-by: kernel test robot <lkp@intel.com>

Looks reasonable, but shouldn't vfio_ap_mdev_remove_queue() already
have code that kicks the queue from the shadow at this stage?

I mean, if the removal happens for a reason other than a host config
change, we won't update the guest_matrix, or?

> ---
>  drivers/s390/crypto/vfio_ap_drv.c     |   5 +-
>  drivers/s390/crypto/vfio_ap_ops.c     | 147 ++++++++++++++++++++++++--
>  drivers/s390/crypto/vfio_ap_private.h |   7 +-
>  3 files changed, 146 insertions(+), 13 deletions(-)
> 
> diff --git a/drivers/s390/crypto/vfio_ap_drv.c b/drivers/s390/crypto/vfio_ap_drv.c
> index aae5b3d8e3fa..ea0a7603e886 100644
> --- a/drivers/s390/crypto/vfio_ap_drv.c
> +++ b/drivers/s390/crypto/vfio_ap_drv.c
> @@ -115,9 +115,11 @@ static int vfio_ap_matrix_dev_create(void)
>  
>  	/* Fill in config info via PQAP(QCI), if available */
>  	if (test_facility(12)) {
> -		ret = ap_qci(&matrix_dev->info);
> +		ret = ap_qci(&matrix_dev->config_info);
>  		if (ret)
>  			goto matrix_alloc_err;
> +		memcpy(&matrix_dev->config_info_prev, &matrix_dev->config_info,
> +		       sizeof(struct ap_config_info));
>  	}
>  
>  	mutex_init(&matrix_dev->lock);
> @@ -177,6 +179,7 @@ static int __init vfio_ap_init(void)
>  	vfio_ap_drv.remove = vfio_ap_queue_dev_remove;
>  	vfio_ap_drv.in_use = vfio_ap_mdev_resource_in_use;
>  	vfio_ap_drv.ids = ap_queue_ids;
> +	vfio_ap_drv.on_config_changed = vfio_ap_on_cfg_changed;
>  
>  	ret = ap_driver_register(&vfio_ap_drv, THIS_MODULE, VFIO_AP_DRV_NAME);
>  	if (ret) {
> diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c
> index 2b01a8eb6ee7..e002d556abab 100644
> --- a/drivers/s390/crypto/vfio_ap_ops.c
> +++ b/drivers/s390/crypto/vfio_ap_ops.c
> @@ -347,7 +347,9 @@ static int vfio_ap_mdev_create(struct kobject *kobj, struct mdev_device *mdev)
>  	}
>  
>  	matrix_mdev->mdev = mdev;
> -	vfio_ap_matrix_init(&matrix_dev->info, &matrix_mdev->matrix);
> +	vfio_ap_matrix_init(&matrix_dev->config_info, &matrix_mdev->matrix);
> +	vfio_ap_matrix_init(&matrix_dev->config_info,
> +			    &matrix_mdev->shadow_apcb);
>  	hash_init(matrix_mdev->qtable);
>  	mdev_set_drvdata(mdev, matrix_mdev);
>  	matrix_mdev->pqap_hook.hook = handle_pqap;
> @@ -526,8 +528,8 @@ static int vfio_ap_mdev_filter_matrix(struct ap_matrix_mdev *matrix_mdev,
>  		 * If the APID is not assigned to the host AP configuration,
>  		 * we can not assign it to the guest's AP configuration
>  		 */
> -		if (!test_bit_inv(apid,
> -				  (unsigned long *)matrix_dev->info.apm)) {
> +		if (!test_bit_inv(apid, (unsigned long *)
> +				  matrix_dev->config_info.apm)) {
>  			clear_bit_inv(apid, shadow_apcb->apm);
>  			continue;
>  		}
> @@ -540,7 +542,7 @@ static int vfio_ap_mdev_filter_matrix(struct ap_matrix_mdev *matrix_mdev,
>  			 * guest's AP configuration
>  			 */
>  			if (!test_bit_inv(apqi, (unsigned long *)
> -					  matrix_dev->info.aqm)) {
> +					  matrix_dev->config_info.aqm)) {
>  				clear_bit_inv(apqi, shadow_apcb->aqm);
>  				continue;
>  			}
> @@ -594,7 +596,7 @@ static bool vfio_ap_mdev_config_shadow_apcb(struct ap_matrix_mdev *matrix_mdev)
>  	int napm, naqm;
>  	struct ap_matrix shadow_apcb;
>  
> -	vfio_ap_matrix_init(&matrix_dev->info, &shadow_apcb);
> +	vfio_ap_matrix_init(&matrix_dev->config_info, &shadow_apcb);
>  	napm = bitmap_weight(matrix_mdev->matrix.apm, AP_DEVICES);
>  	naqm = bitmap_weight(matrix_mdev->matrix.aqm, AP_DOMAINS);
>  
> @@ -741,7 +743,7 @@ static bool vfio_ap_mdev_assign_apqis_4_apid(struct ap_matrix_mdev *matrix_mdev,
>  
>  	for_each_set_bit_inv(apqi, matrix_mdev->matrix.aqm, AP_DOMAINS) {
>  		if (!test_bit_inv(apqi,
> -				  (unsigned long *) matrix_dev->info.aqm))
> +				  (unsigned long *)matrix_dev->config_info.aqm))
>  			clear_bit_inv(apqi, aqm);
>  
>  		apqn = AP_MKQID(apid, apqi);
> @@ -764,7 +766,7 @@ static bool vfio_ap_mdev_assign_guest_apid(struct ap_matrix_mdev *matrix_mdev,
>  	unsigned long apqi, apqn;
>  
>  	if (!vfio_ap_mdev_has_crycb(matrix_mdev) ||
> -	    !test_bit_inv(apid, (unsigned long *)matrix_dev->info.apm))
> +	    !test_bit_inv(apid, (unsigned long *)matrix_dev->config_info.apm))
>  		return false;
>  
>  	if (bitmap_empty(matrix_mdev->shadow_apcb.aqm, AP_DOMAINS))
> @@ -931,8 +933,8 @@ static bool vfio_ap_mdev_assign_apids_4_apqi(struct ap_matrix_mdev *matrix_mdev,
>  	bitmap_copy(apm, matrix_mdev->matrix.apm, AP_DEVICES);
>  
>  	for_each_set_bit_inv(apid, matrix_mdev->matrix.apm, AP_DEVICES) {
> -		if (!test_bit_inv(apid,
> -				  (unsigned long *) matrix_dev->info.apm))
> +		if (!test_bit_inv(apid, (unsigned long *)
> +				  matrix_dev->config_info.apm))
>  			clear_bit_inv(apqi, apm);
>  
>  		apqn = AP_MKQID(apid, apqi);
> @@ -955,7 +957,7 @@ static bool vfio_ap_mdev_assign_guest_apqi(struct ap_matrix_mdev *matrix_mdev,
>  	unsigned long apid, apqn;
>  
>  	if (!vfio_ap_mdev_has_crycb(matrix_mdev) ||
> -	    !test_bit_inv(apqi, (unsigned long *)matrix_dev->info.aqm))
> +	    !test_bit_inv(apqi, (unsigned long *)matrix_dev->config_info.aqm))
>  		return false;
>  
>  	if (bitmap_empty(matrix_mdev->shadow_apcb.apm, AP_DEVICES))
> @@ -1702,7 +1704,7 @@ int vfio_ap_mdev_probe_queue(struct ap_queue *queue)
>  void vfio_ap_mdev_remove_queue(struct ap_queue *queue)
>  {
>  	struct vfio_ap_queue *q;
> -	int apid, apqi;
> +	unsigned long apid, apqi;
>  

Unrelated?

>  	mutex_lock(&matrix_dev->lock);
>  	q = dev_get_drvdata(&queue->ap_dev.device);
> @@ -1727,3 +1729,126 @@ bool vfio_ap_mdev_resource_in_use(unsigned long *apm, unsigned long *aqm)
>  
>  	return in_use;
>  }
> +
> +/**
> + * vfio_ap_mdev_unassign_apids
> + *
> + * @matrix_mdev: The matrix mediated device
> + *
> + * @aqm: A bitmap with 256 bits. Each bit in the map represents an APID from 0
> + *	 to 255 (with the leftmost bit corresponding to APID 0).
> + *
> + * Unassigns each APID specified in @aqm that is assigned to the shadow CRYCB
> + * of @matrix_mdev. Returns true if at least one APID is unassigned; otherwise,
> + * returns false.
> + */
> +static bool vfio_ap_mdev_unassign_apids(struct ap_matrix_mdev *matrix_mdev,
> +					unsigned long *apm_unassign)
> +{
> +	unsigned long apid;
> +	bool unassigned = false;
> +
> +	/*
> +	 * If the matrix mdev is not in use by a KVM guest, return indicating
> +	 * that no APIDs have been unassigned.
> +	 */
> +	if (!vfio_ap_mdev_has_crycb(matrix_mdev))
> +		return false;
> +
> +	for_each_set_bit_inv(apid, apm_unassign, AP_DEVICES) {
> +		unassigned |= vfio_ap_mdev_unassign_guest_apid(matrix_mdev,
> +							       apid);
> +	}

I guess, we could accomplish the unassign with operations operating on
full bitmaps (without looping over bits), but I have no strong opinion
here.
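
Roughly what I have in mind, as a userspace sketch rather than a
proposal for the actual code: toy_unassign_mask() and TOY_WORDS are
made-up names, and plain uint64_t words stand in for the kernel bitmaps.
In the driver this would presumably be built from bitmap_intersects(),
bitmap_andnot(), bitmap_empty() and bitmap_clear(). The sketch keeps the
behavior of the per-bit helper, which wipes the other mask once this one
becomes empty.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_WORDS 4 /* 4 x 64 bits = 256 bits per mask */

/*
 * Clears every bit of 'unassign' from 'shadow' in one pass and, mirroring
 * the per-bit helper, wipes 'other' when 'shadow' becomes empty.
 * Returns true if at least one bit was actually cleared.
 */
static bool toy_unassign_mask(uint64_t *shadow, uint64_t *other,
			      const uint64_t *unassign)
{
	bool cleared = false;
	bool empty = true;

	for (size_t i = 0; i < TOY_WORDS; i++) {
		if (shadow[i] & unassign[i])
			cleared = true;
		shadow[i] &= ~unassign[i];
		if (shadow[i])
			empty = false;
	}

	/* Nothing left in this mask: the guest has no queues, drop the rest. */
	if (cleared && empty)
		for (size_t i = 0; i < TOY_WORDS; i++)
			other[i] = 0;

	return cleared;
}
```

The return value plays the same role as 'unassigned' in the loop
version, so the caller can still decide whether a commit of the shadow
is needed.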

> +
> +	return unassigned;
> +}
> +
> +/**
> + * vfio_ap_mdev_unassign_apqis
> + *
> + * @matrix_mdev: The matrix mediated device
> + *
> + * @aqm: A bitmap with 256 bits. Each bit in the map represents an APQI from 0
> + *	 to 255 (with the leftmost bit corresponding to APQI 0).
> + *
> + * Unassigns each APQI specified in @aqm that is assigned to the shadow CRYCB
> + * of @matrix_mdev. Returns true if at least one APQI is unassigned; otherwise,
> + * returns false.
> + */
> +static bool vfio_ap_mdev_unassign_apqis(struct ap_matrix_mdev *matrix_mdev,
> +					unsigned long *aqm_unassign)
> +{
> +	unsigned long apqi;
> +	bool unassigned = false;
> +
> +	/*
> +	 * If the matrix mdev is not in use by a KVM guest, return indicating
> +	 * that no APQIs have been unassigned.
> +	 */
> +	if (!vfio_ap_mdev_has_crycb(matrix_mdev))
> +		return false;
> +
> +	for_each_set_bit_inv(apqi, aqm_unassign, AP_DOMAINS) {
> +		unassigned |= vfio_ap_mdev_unassign_guest_apqi(matrix_mdev,
> +							       apqi);
> +	}
> +
> +	return unassigned;
> +}
> +
> +void vfio_ap_on_cfg_changed(struct ap_config_info *new_config_info,
> +			    struct ap_config_info *old_config_info)
> +{
> +	bool unassigned;
> +	int ap_remove, aq_remove;
> +	struct ap_matrix_mdev *matrix_mdev;
> +	DECLARE_BITMAP(apm_unassign, AP_DEVICES);
> +	DECLARE_BITMAP(aqm_unassign, AP_DOMAINS);
> +
> +	unsigned long *cur_apm, *cur_aqm, *prev_apm, *prev_aqm;
> +
> +	if (matrix_dev->flags & AP_MATRIX_CFG_CHG) {
> +		WARN_ONCE(1, "AP host configuration change already reported");
> +		return;
> +	}
> +
> +	memcpy(&matrix_dev->config_info, new_config_info,
> +	       sizeof(struct ap_config_info));
> +	memcpy(&matrix_dev->config_info_prev, old_config_info,
> +	       sizeof(struct ap_config_info));
> +
> +	cur_apm = (unsigned long *)matrix_dev->config_info.apm;
> +	cur_aqm = (unsigned long *)matrix_dev->config_info.aqm;
> +	prev_apm = (unsigned long *)matrix_dev->config_info_prev.apm;
> +	prev_aqm = (unsigned long *)matrix_dev->config_info_prev.aqm;
> +
> +	ap_remove = bitmap_andnot(apm_unassign, prev_apm, cur_apm, AP_DEVICES);
> +	aq_remove = bitmap_andnot(aqm_unassign, prev_aqm, cur_aqm, AP_DOMAINS);
> +
> +	mutex_lock(&matrix_dev->lock);
> +	matrix_dev->flags |= AP_MATRIX_CFG_CHG;
> +
> +	list_for_each_entry(matrix_mdev, &matrix_dev->mdev_list, node) {
> +		if (!vfio_ap_mdev_has_crycb(matrix_mdev))
> +			continue;
> +
> +		unassigned = false;
> +
> +		if (ap_remove)
> +			if (bitmap_intersects(matrix_mdev->shadow_apcb.apm,
> +					      apm_unassign, AP_DEVICES))
> +				if (vfio_ap_mdev_unassign_apids(matrix_mdev,
> +								apm_unassign))

This can be done with a single "if".

if (A)
	if (B)
		if (C)
			D;

should be equivalent to
if (A && B && C)
	D;
and you wouldn't end up with such deep indentation. It is a style thing,
so unless regulated by the official coding style, it is up to you :)


> +					unassigned = true;
> +		if (aq_remove)
> +			if (bitmap_intersects(matrix_mdev->shadow_apcb.aqm,
> +					      aqm_unassign, AP_DOMAINS))
> +				if (vfio_ap_mdev_unassign_apqis(matrix_mdev,
> +								aqm_unassign))
> +					unassigned = true;
> +
> +		if (unassigned)
> +			vfio_ap_mdev_commit_shadow_apcb(matrix_mdev);
> +	}
> +	mutex_unlock(&matrix_dev->lock);
> +}
> diff --git a/drivers/s390/crypto/vfio_ap_private.h b/drivers/s390/crypto/vfio_ap_private.h
> index 055bce6d45db..fc8629e28ad3 100644
> --- a/drivers/s390/crypto/vfio_ap_private.h
> +++ b/drivers/s390/crypto/vfio_ap_private.h
> @@ -40,10 +40,13 @@
>  struct ap_matrix_dev {
>  	struct device device;
>  	atomic_t available_instances;
> -	struct ap_config_info info;
> +	struct ap_config_info config_info;
> +	struct ap_config_info config_info_prev;
>  	struct list_head mdev_list;
>  	struct mutex lock;
>  	struct ap_driver  *vfio_ap_drv;
> +	#define AP_MATRIX_CFG_CHG (1UL << 0)
> +	unsigned long flags;
>  };
>  
>  extern struct ap_matrix_dev *matrix_dev;
> @@ -108,5 +111,7 @@ int vfio_ap_mdev_probe_queue(struct ap_queue *queue);
>  void vfio_ap_mdev_remove_queue(struct ap_queue *queue);
>  
>  bool vfio_ap_mdev_resource_in_use(unsigned long *apm, unsigned long *aqm);
> +void vfio_ap_on_cfg_changed(struct ap_config_info *new_config_info,
> +			    struct ap_config_info *old_config_info);
>  
>  #endif /* _VFIO_AP_PRIVATE_H_ */



* Re: [PATCH v10 14/16] s390/vfio-ap: handle AP bus scan completed notification
  2020-08-21 19:56 ` [PATCH v10 14/16] s390/vfio-ap: handle AP bus scan completed notification Tony Krowiak
@ 2020-09-28  2:11   ` Halil Pasic
  0 siblings, 0 replies; 79+ messages in thread
From: Halil Pasic @ 2020-09-28  2:11 UTC (permalink / raw)
  To: Tony Krowiak
  Cc: linux-s390, linux-kernel, kvm, freude, borntraeger, cohuck,
	mjrosato, alex.williamson, kwankhede, fiuczy, frankja, david,
	imbrenda, hca, gor

On Fri, 21 Aug 2020 15:56:14 -0400
Tony Krowiak <akrowiak@linux.ibm.com> wrote:

> Implements the driver callback invoked by the AP bus when the AP bus
> scan has completed. Since this callback is invoked after binding the newly
> added devices to their respective device drivers, the vfio_ap driver will
> attempt to plug the adapters, domains and control domains into each guest
> using a matrix mdev to which they are assigned. Keep in mind that an
> adapter or domain can be plugged in only if each APQN with the APID of the
> adapter or the APQI of the domain references a queue device bound to the
> vfio_ap device driver. Consequently, not all newly added adapters and
> domains will necessarily get hot plugged.
> 
> Signed-off-by: Tony Krowiak <akrowiak@linux.ibm.com>
> ---
>  drivers/s390/crypto/vfio_ap_drv.c     |   1 +
>  drivers/s390/crypto/vfio_ap_ops.c     | 110 +++++++++++++++++++++++++-
>  drivers/s390/crypto/vfio_ap_private.h |   2 +
>  3 files changed, 110 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/s390/crypto/vfio_ap_drv.c b/drivers/s390/crypto/vfio_ap_drv.c
> index ea0a7603e886..21bfae928be5 100644
> --- a/drivers/s390/crypto/vfio_ap_drv.c
> +++ b/drivers/s390/crypto/vfio_ap_drv.c
> @@ -180,6 +180,7 @@ static int __init vfio_ap_init(void)
>  	vfio_ap_drv.in_use = vfio_ap_mdev_resource_in_use;
>  	vfio_ap_drv.ids = ap_queue_ids;
>  	vfio_ap_drv.on_config_changed = vfio_ap_on_cfg_changed;
> +	vfio_ap_drv.on_scan_complete = vfio_ap_on_scan_complete;
>  
>  	ret = ap_driver_register(&vfio_ap_drv, THIS_MODULE, VFIO_AP_DRV_NAME);
>  	if (ret) {
> diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c
> index e002d556abab..e6480f31a42b 100644
> --- a/drivers/s390/crypto/vfio_ap_ops.c
> +++ b/drivers/s390/crypto/vfio_ap_ops.c
> @@ -616,14 +616,13 @@ static bool vfio_ap_mdev_config_shadow_apcb(struct ap_matrix_mdev *matrix_mdev)
>  		 * CRYCB after filtering, then try filtering the APQIs.
>  		 */
>  		if (napm == 0) {
> -			naqm = vfio_ap_mdev_filter_matrix(matrix_mdev,
> -							  &shadow_apcb, false);
> -
>  			/*
>  			 * If there are no APQNs that can be assigned to the
>  			 * matrix mdev after filtering the APQIs, then no APQNs
>  			 * shall be assigned to the guest's CRYCB.
>  			 */
> +			naqm = vfio_ap_mdev_filter_matrix(matrix_mdev,
> +							  &shadow_apcb, false);

Here you just moved the thing around the comment, or?

>  			if (naqm == 0) {
>  				bitmap_clear(shadow_apcb.apm, 0, AP_DEVICES);
>  				bitmap_clear(shadow_apcb.aqm, 0, AP_DOMAINS);
> @@ -1758,6 +1757,16 @@ static bool vfio_ap_mdev_unassign_apids(struct ap_matrix_mdev *matrix_mdev,
>  	for_each_set_bit_inv(apid, apm_unassign, AP_DEVICES) {
>  		unassigned |= vfio_ap_mdev_unassign_guest_apid(matrix_mdev,
>  							       apid);
> +		/*
> +		 * If the APID is not assigned to the matrix mdev's shadow
> +		 * CRYCB, continue with the next APID.
> +		 */
> +		if (!test_bit_inv(apid, matrix_mdev->shadow_apcb.apm))
> +			continue;
> +
> +		/* Unassign the APID from the matrix mdev's shadow CRYCB */
> +		clear_bit_inv(apid, matrix_mdev->shadow_apcb.apm);
> +		unassigned = true;

I don't understand this at all. This patch is supposed to be about
assign and not unassign, or?

>  	}
>  
>  	return unassigned;
> @@ -1791,6 +1800,17 @@ static bool vfio_ap_mdev_unassign_apqis(struct ap_matrix_mdev *matrix_mdev,
>  	for_each_set_bit_inv(apqi, aqm_unassign, AP_DOMAINS) {
>  		unassigned |= vfio_ap_mdev_unassign_guest_apqi(matrix_mdev,
>  							       apqi);
> +
> +		/*
> +		 * If the APQI is not assigned to the matrix mdev's shadow
> +		 * CRYCB, continue with the next APQI
> +		 */
> +		if (!test_bit_inv(apqi, matrix_mdev->shadow_apcb.aqm))
> +			continue;
> +
> +		/* Unassign the APQI from the matrix mdev's shadow CRYCB */
> +		clear_bit_inv(apqi, matrix_mdev->shadow_apcb.aqm);
> +		unassigned = true;
>  	}
>  
>  	return unassigned;
> @@ -1852,3 +1872,87 @@ void vfio_ap_on_cfg_changed(struct ap_config_info *new_config_info,
>  	}
>  	mutex_unlock(&matrix_dev->lock);
>  }
> +
> +bool vfio_ap_mdev_assign_apids(struct ap_matrix_mdev *matrix_mdev,
> +			       unsigned long *apm_assign)
> +{
> +	unsigned long apid;
> +	bool assigned = false;
> +
> +	for_each_set_bit_inv(apid, apm_assign, AP_DEVICES)
> +		if (test_bit_inv(apid, matrix_mdev->matrix.apm))
> +			if (vfio_ap_mdev_assign_guest_apid(matrix_mdev, apid))
> +				assigned = true;
> +
> +	return assigned;
> +}
> +
> +bool vfio_ap_mdev_assign_apqis(struct ap_matrix_mdev *matrix_mdev,
> +			       unsigned long *aqm_assign)
> +{
> +	unsigned long apqi;
> +	bool assigned = false;
> +
> +	for_each_set_bit_inv(apqi, aqm_assign, AP_DOMAINS)
> +		if (test_bit_inv(apqi, matrix_mdev->matrix.aqm))
> +			if (vfio_ap_mdev_assign_guest_apqi(matrix_mdev, apqi))
> +				assigned = true;
> +
> +	return assigned;
> +}
> +
> +void vfio_ap_on_scan_complete(struct ap_config_info *new_config_info,
> +			      struct ap_config_info *old_config_info)
> +{
> +	struct ap_matrix_mdev *matrix_mdev;
> +	DECLARE_BITMAP(apm_assign, AP_DEVICES);
> +	DECLARE_BITMAP(aqm_assign, AP_DOMAINS);
> +	int ap_add, aq_add;
> +	bool assign;
> +	unsigned long *cur_apm, *cur_aqm, *prev_apm, *prev_aqm;
> +
> +	/*
> +	 * If we are not in the middle of a host configuration change scan it is
> +	 * likely that the vfio_ap driver was loaded mid-scan, so let's handle
> +	 * this scenario by calling the vfio_ap_on_cfg_changed function which
> +	 * gets called at the start of an AP bus scan when the host AP
> +	 * configuration has changed.
> +	 */
> +	if (!(matrix_dev->flags & AP_MATRIX_CFG_CHG))
> +		vfio_ap_on_cfg_changed(new_config_info, old_config_info);

Or we could just let the non-optimized variant handle it. Patch 15 has
to take care of single queues anyway, and 13 and 14 are about avoiding
a bunch of updates to the CRYCB in short succession. But if we just
loaded the module in the middle of a config-changing scan, then I guess
having a bunch of populated mdevs attached to guests is not very
likely.

> +
> +	cur_apm = (unsigned long *)matrix_dev->config_info.apm;
> +	cur_aqm = (unsigned long *)matrix_dev->config_info.aqm;
> +
> +	prev_apm = (unsigned long *)matrix_dev->config_info_prev.apm;
> +	prev_aqm = (unsigned long *)matrix_dev->config_info_prev.aqm;
> +
> +	ap_add = bitmap_andnot(apm_assign, cur_apm, prev_apm, AP_DEVICES);
> +	aq_add = bitmap_andnot(aqm_assign, cur_aqm, prev_aqm, AP_DOMAINS);
> +
> +	mutex_lock(&matrix_dev->lock);
> +	list_for_each_entry(matrix_mdev, &matrix_dev->mdev_list, node) {
> +		if (!vfio_ap_mdev_has_crycb(matrix_mdev))
> +			continue;
> +
> +		assign = false;
> +
> +		if (ap_add)
> +			if (bitmap_intersects(matrix_mdev->matrix.apm,
> +					      apm_assign, AP_DEVICES))
> +				assign |= vfio_ap_mdev_assign_apids(matrix_mdev,
> +								    apm_assign);
> +
> +		if (aq_add)
> +			if (bitmap_intersects(matrix_mdev->matrix.aqm,
> +					      aqm_assign, AP_DOMAINS))
> +				assign |= vfio_ap_mdev_assign_apqis(matrix_mdev,
> +								    aqm_assign);
> +
> +		if (assign)
> +			vfio_ap_mdev_commit_shadow_apcb(matrix_mdev);
> +	}
> +
> +	matrix_dev->flags &= ~AP_MATRIX_CFG_CHG;
> +	mutex_unlock(&matrix_dev->lock);
> +}


There may be a simpler and more concise way to accomplish this logic,
but at this point I don't want to think about that. We can do
refactoring any time we want. I'm more worried about the points I
addressed in reference to the previous patches.
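
For reference, the core of what patches 13 and 14 compute is a pair of
set differences between the previous and the current host
configuration. A minimal userspace sketch, with made-up names
(toy_andnot(), struct toy_delta, toy_config_delta()) and uint64_t words
standing in for the kernel bitmaps; toy_andnot() is meant to behave
like bitmap_andnot() as used in the patch, including the "result is
non-empty" return value:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_WORDS 4 /* 4 x 64 bits = 256 bits per mask */

/* dst = a & ~b; returns true if dst has any bit set. */
static bool toy_andnot(uint64_t *dst, const uint64_t *a, const uint64_t *b)
{
	bool any = false;

	for (size_t i = 0; i < TOY_WORDS; i++) {
		dst[i] = a[i] & ~b[i];
		if (dst[i])
			any = true;
	}
	return any;
}

struct toy_delta {
	uint64_t removed[TOY_WORDS]; /* in prev, no longer in cur: unplug */
	uint64_t added[TOY_WORDS];   /* in cur, not in prev: candidates to plug */
	bool any_removed, any_added;
};

/* Splits a host config change into what went away and what showed up. */
static struct toy_delta toy_config_delta(const uint64_t *prev,
					 const uint64_t *cur)
{
	struct toy_delta d;

	d.any_removed = toy_andnot(d.removed, prev, cur);
	d.any_added = toy_andnot(d.added, cur, prev);
	return d;
}
```

The 'removed' half corresponds to what the config-changed callback
unassigns before the unbind, the 'added' half to what the
scan-complete callback tries to assign after the bind.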

Regards,
Halil

> diff --git a/drivers/s390/crypto/vfio_ap_private.h b/drivers/s390/crypto/vfio_ap_private.h
> index fc8629e28ad3..da1754fd4f66 100644
> --- a/drivers/s390/crypto/vfio_ap_private.h
> +++ b/drivers/s390/crypto/vfio_ap_private.h
> @@ -113,5 +113,7 @@ void vfio_ap_mdev_remove_queue(struct ap_queue *queue);
>  bool vfio_ap_mdev_resource_in_use(unsigned long *apm, unsigned long *aqm);
>  void vfio_ap_on_cfg_changed(struct ap_config_info *new_config_info,
>  			    struct ap_config_info *old_config_info);
> +void vfio_ap_on_scan_complete(struct ap_config_info *new_config_info,
> +			      struct ap_config_info *old_config_info);
>  
>  #endif /* _VFIO_AP_PRIVATE_H_ */



* Re: [PATCH v10 15/16] s390/vfio-ap: handle probe/remove not due to host AP config changes
  2020-08-21 19:56 ` [PATCH v10 15/16] s390/vfio-ap: handle probe/remove not due to host AP config changes Tony Krowiak
@ 2020-09-28  2:45   ` Halil Pasic
  0 siblings, 0 replies; 79+ messages in thread
From: Halil Pasic @ 2020-09-28  2:45 UTC (permalink / raw)
  To: Tony Krowiak
  Cc: linux-s390, linux-kernel, kvm, freude, borntraeger, cohuck,
	mjrosato, alex.williamson, kwankhede, fiuczy, frankja, david,
	imbrenda, hca, gor

On Fri, 21 Aug 2020 15:56:15 -0400
Tony Krowiak <akrowiak@linux.ibm.com> wrote:

> AP queue devices are probed or removed for reasons other than changes
> to the host AP configuration. For example:
> 
> * The state of an AP adapter can be dynamically changed from standby to
>   online via the SE or by execution of the SCLP Configure AP command. When
>   the state changes, each queue device associated with the card device
>   representing the adapter will get created and probed.
> 
> * The state of an AP adapter can be dynamically changed from online to
>   standby via the SE or by execution of the SCLP Deconfigure AP command.
>   When the state changes, each queue device associated with the card device
>   representing the adapter will get removed.
> 
> * Each queue device associated with a card device will get removed
>   when the type of the AP adapter represented by the card device
>   dynamically changes.
> 
> * Each queue device associated with a card device will get removed
>   when the status of the queue represented by the queue device changes
>   from operating to check stop.
> 
> * AP queue devices can be manually bound to or unbound from the vfio_ap
>   device driver by a root user via the sysfs bind/unbind attributes of the
>   driver.
> 
> In response to a queue device probe or remove that is not the result of a
> change to the host's AP configuration, if a KVM guest is using the matrix
> mdev to which the APQN of the queue device is assigned, the vfio_ap device
> driver must respond accordingly. In an ideal world, the queue device being
> probed would be hot plugged into the guest. Likewise, the queue
> corresponding to the queue device being removed would
> be hot unplugged from the guest. Unfortunately, the AP architecture
> precludes plugging or unplugging individual queues, so let's handle
> the probe or remove of an AP queue device as follows:
> 
> Handling Probe
> --------------
> There are two requirements that must be met in order to give a
> guest access to the queue corresponding to the queue device being probed:
> 
> * Each APQN derived from the APID of the queue device and the APQIs of the
>   domains already assigned to the guest's AP configuration must reference
>   a queue device bound to the vfio_ap device driver.
> 
> * Each APQN derived from the APQI of the queue device and the APIDs of the
>   adapters assigned to the guest's AP configuration must reference a queue
>   device bound to the vfio_ap device driver.
> 
> If the above conditions are met, the APQN will be assigned to the guest's
> AP configuration and the guest will be given access to the queue.
> 
> Handling Remove
> ---------------
> Since the AP architecture precludes us from taking access to an individual
> queue from a guest, we are left with the choice of taking access away from
> either the adapter or the domain to which the queue is connected. Access to
> the adapter will be taken away because it is likely that most of the time,
> the remove callback will be invoked because the adapter state has
> transitioned from online to standby. In such a case, no queue connected
> to the adapter will be available to access.
> 

I think I would like to have the 'react to binds and unbinds'
functionality added as a single patch, to avoid introducing commits
that don't act as designed. You could, for example, implement the
config change callbacks in separate patches (like you did) to ease
review, but delay their registration with the AP bus.

I would also prefer 'react to binds and unbinds' to be implemented
before 'allow changes to a running guest's config'. Actually, 'react to
binds and unbinds' should be introduced together with filtering,
because if we filtered because of the bind situation, we want to
revisit the filtering when the bind situation changes. At least in my
opinion.
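
To illustrate the two probe requirements from the commit message, here
is a userspace sketch of the check. It models the description, not the
patch's helpers: struct toy_guest, the bound[][] table and the toy_*
functions are hypothetical, the masks are shrunk to 8 bits, and bits
are numbered from the right.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define N 8 /* toy size; the real masks have 256 bits */

struct toy_guest {
	uint8_t shadow_apm, shadow_aqm; /* guest's AP configuration */
	bool bound[N][N];               /* bound[apid][apqi]: bound to vfio_ap */
};

/* True if queue apid.apqi may be given to the guest. */
static bool toy_can_plug_queue(const struct toy_guest *g,
			       unsigned int apid, unsigned int apqi)
{
	if (!g->bound[apid][apqi])
		return false;

	/* Requirement 1: apid combined with every APQI the guest already has. */
	for (unsigned int q = 0; q < N; q++)
		if ((g->shadow_aqm & (1u << q)) && !g->bound[apid][q])
			return false;

	/* Requirement 2: apqi combined with every APID the guest already has. */
	for (unsigned int a = 0; a < N; a++)
		if ((g->shadow_apm & (1u << a)) && !g->bound[a][apqi])
			return false;

	return true;
}

/* Plugs the queue's adapter and domain into the guest if both checks pass. */
static bool toy_hot_plug_queue(struct toy_guest *g,
			       unsigned int apid, unsigned int apqi)
{
	if (!toy_can_plug_queue(g, apid, apqi))
		return false;

	g->shadow_apm |= (uint8_t)(1u << apid);
	g->shadow_aqm |= (uint8_t)(1u << apqi);
	return true;
}
```

With a guest that has 0.0, probing 0.1 plugs domain 1; a following
probe of 2.0 is refused while 2.1 is still unbound, and succeeds only
once the whole column for adapter 2 is bound. This is exactly where a
revisit of the filtering on bind changes matters.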


> Signed-off-by: Tony Krowiak <akrowiak@linux.ibm.com>
> ---
>  drivers/s390/crypto/vfio_ap_ops.c | 84 +++++++++++++++++++++++++++++++
>  1 file changed, 84 insertions(+)
> 
> diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c
> index e6480f31a42b..b6a1e280991d 100644
> --- a/drivers/s390/crypto/vfio_ap_ops.c
> +++ b/drivers/s390/crypto/vfio_ap_ops.c
> @@ -1682,6 +1682,61 @@ static void vfio_ap_queue_link_mdev(struct vfio_ap_queue *q)
>  	}
>  }
>  
> +static bool vfio_ap_mdev_assign_shadow_apid(struct ap_matrix_mdev *matrix_mdev,
> +					    unsigned long apid)
> +{
> +	unsigned long apqi;
> +
> +	for_each_set_bit_inv(apqi, matrix_mdev->shadow_apcb.aqm,
> +			     matrix_mdev->shadow_apcb.aqm_max + 1) {
> +		if (!vfio_ap_get_queue(AP_MKQID(apid, apqi)))
> +			return false;
> +	}
> +
> +	set_bit_inv(apid, matrix_mdev->shadow_apcb.apm);
> +
> +	return true;
> +}
> +
> +static bool vfio_ap_mdev_assign_shadow_apqi(struct ap_matrix_mdev *matrix_mdev,
> +					    unsigned long apqi)
> +{
> +	unsigned long apid;
> +
> +	for_each_set_bit_inv(apid, matrix_mdev->shadow_apcb.apm,
> +			     matrix_mdev->shadow_apcb.apm_max + 1) {
> +		if (!vfio_ap_get_queue(AP_MKQID(apid, apqi)))
> +			return false;
> +	}
> +
> +	set_bit_inv(apqi, matrix_mdev->shadow_apcb.aqm);
> +
> +	return true;
> +}
> +
> +static void vfio_ap_mdev_hot_plug_queue(struct vfio_ap_queue *q)
> +{
> +	bool commit = false;
> +	unsigned long apid = AP_QID_CARD(q->apqn);
> +	unsigned long apqi = AP_QID_QUEUE(q->apqn);
> +
> +	if ((q->matrix_mdev == NULL) || !vfio_ap_mdev_has_crycb(q->matrix_mdev))
> +		return;
> +
> +	if (!test_bit_inv(apid, q->matrix_mdev->matrix.apm) ||
> +	    !test_bit_inv(apqi, q->matrix_mdev->matrix.aqm))
> +		return;
> +
> +	if (!test_bit_inv(apid, q->matrix_mdev->shadow_apcb.apm))
> +		commit |= vfio_ap_mdev_assign_shadow_apid(q->matrix_mdev, apid);
> +
> +	if (!test_bit_inv(apqi, q->matrix_mdev->shadow_apcb.aqm))
> +		commit |= vfio_ap_mdev_assign_shadow_apqi(q->matrix_mdev, apqi);
> +
> +	if (commit)
> +		vfio_ap_mdev_commit_shadow_apcb(q->matrix_mdev);
> +}
> +
>  int vfio_ap_mdev_probe_queue(struct ap_queue *queue)
>  {
>  	struct vfio_ap_queue *q;
> @@ -1695,11 +1750,35 @@ int vfio_ap_mdev_probe_queue(struct ap_queue *queue)
>  	q->apqn = queue->qid;
>  	q->saved_isc = VFIO_AP_ISC_INVALID;
>  	vfio_ap_queue_link_mdev(q);
> +	/* Make sure we're not in the middle of an AP configuration change. */
> +	if (!(matrix_dev->flags & AP_MATRIX_CFG_CHG))
> +		vfio_ap_mdev_hot_plug_queue(q);
>  	mutex_unlock(&matrix_dev->lock);
>  
>  	return 0;
>  }
>  
> +void vfio_ap_mdev_hot_unplug_queue(struct vfio_ap_queue *q)
> +{
> +	unsigned long apid = AP_QID_CARD(q->apqn);
> +	unsigned long apqi = AP_QID_QUEUE(q->apqn);
> +
> +	if ((q->matrix_mdev == NULL) || !vfio_ap_mdev_has_crycb(q->matrix_mdev))
> +		return;
> +
> +	/*
> +	 * If the APQN is assigned to the guest, then let's
> +	 * go ahead and unplug the adapter since the
> +	 * architecture does not provide a means to unplug
> +	 * an individual queue.
> +	 */
> +	if (test_bit_inv(apid, q->matrix_mdev->shadow_apcb.apm) &&
> +	    test_bit_inv(apqi, q->matrix_mdev->shadow_apcb.aqm)) {
> +		if (vfio_ap_mdev_unassign_guest_apid(q->matrix_mdev, apid))
> +			vfio_ap_mdev_commit_shadow_apcb(q->matrix_mdev);
> +	}
> +}
> +
>  void vfio_ap_mdev_remove_queue(struct ap_queue *queue)
>  {
>  	struct vfio_ap_queue *q;
> @@ -1707,6 +1786,11 @@ void vfio_ap_mdev_remove_queue(struct ap_queue *queue)
>  
>  	mutex_lock(&matrix_dev->lock);
>  	q = dev_get_drvdata(&queue->ap_dev.device);
> +
> +	/* Make sure we're not in the middle of an AP configuration change. */
> +	if (!(matrix_dev->flags & AP_MATRIX_CFG_CHG))
> +		vfio_ap_mdev_hot_unplug_queue(q);
> +

Can a queue get unplugged for a reason other than a configuration
change while we are in the middle of a configuration change?

If it can, then I don't think we would react accordingly -- it would
slip through the cracks.

Actually I would use the link between the mdev and the queue to shortcut
remove_queue(). That is, on_cfg_changed should sever the link by setting the
matrix_mdev pointer to NULL after the queue has been cleaned up. If the
matrix_mdev pointer is still valid, remove_queue should do the full
program.

Please also consider a similar scenario in probe (e.g. a queue comes back
from a manual unbind while AP_MATRIX_CFG_CHG is set). It is less critical
than remove, though.
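
A rough user-space model of the remove-side idea, to make it concrete. This
is only a sketch under assumptions: model_mdev, model_queue and both
functions are hypothetical stand-ins, and the counter merely represents
"unplug from the guest and reset the queue". Locking is left out entirely.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the driver structures, not the kernel ones. */
struct model_mdev {
	int unplug_count;	/* times a queue was taken from the guest */
};

struct model_queue {
	int apqn;
	struct model_mdev *matrix_mdev;	/* NULL once the link is severed */
};

/* Config-change path: clean up the queue, then sever the link. */
static void model_on_cfg_changed(struct model_queue *q)
{
	assert(q != NULL);
	if (q->matrix_mdev == NULL)
		return;
	q->matrix_mdev->unplug_count++;	/* stands in for unplug + reset */
	q->matrix_mdev = NULL;
}

/*
 * Remove path: no global "config change in progress" flag is consulted.
 * A still-valid link means nobody has handled this queue yet, so the
 * full program runs. Returns true if it did.
 */
static bool model_remove_queue(struct model_queue *q)
{
	assert(q != NULL);
	if (q->matrix_mdev == NULL)
		return false;	/* already handled by the config change */
	q->matrix_mdev->unplug_count++;
	q->matrix_mdev = NULL;
	return true;
}
```

The point of keying off the per-queue pointer rather than the flag is that a
queue removed for an unrelated reason during a config change still has a
valid link and is therefore not skipped.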

Regards,
Halil

>  	dev_set_drvdata(&queue->ap_dev.device, NULL);
>  	apid = AP_QID_CARD(q->apqn);
>  	apqi = AP_QID_QUEUE(q->apqn);


^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [PATCH v10 16/16] s390/vfio-ap: update docs to include dynamic config support
  2020-08-21 19:56 ` [PATCH v10 16/16] s390/vfio-ap: update docs to include dynamic config support Tony Krowiak
  2020-08-25 10:45   ` Cornelia Huck
@ 2020-09-28  2:48   ` Halil Pasic
  2020-10-16 16:36     ` Tony Krowiak
  1 sibling, 1 reply; 79+ messages in thread
From: Halil Pasic @ 2020-09-28  2:48 UTC (permalink / raw)
  To: Tony Krowiak
  Cc: linux-s390, linux-kernel, kvm, freude, borntraeger, cohuck,
	mjrosato, alex.williamson, kwankhede, fiuczy, frankja, david,
	imbrenda, hca, gor

On Fri, 21 Aug 2020 15:56:16 -0400
Tony Krowiak <akrowiak@linux.ibm.com> wrote:

> Update the documentation in vfio-ap.rst to include information about the
> AP dynamic configuration support (i.e., hot plug of adapters, domains
> and control domains via the matrix mediated device's sysfs assignment
> attributes).

If you don't mind I would like to skip out on commenting on the
documentation update, because of the design issues I've raised. I think
we should first clear that up. Is that OK with you?

> 
> Signed-off-by: Tony Krowiak <akrowiak@linux.ibm.com>
> ---
>  Documentation/s390/vfio-ap.rst | 362 ++++++++++++++++++++++++++-------
>  1 file changed, 285 insertions(+), 77 deletions(-)
> 
> diff --git a/Documentation/s390/vfio-ap.rst b/Documentation/s390/vfio-ap.rst
> index e15436599086..8907aeca8fb7 100644
> --- a/Documentation/s390/vfio-ap.rst
> +++ b/Documentation/s390/vfio-ap.rst
> @@ -253,7 +253,7 @@ The process for reserving an AP queue for use by a KVM guest is:
>  1. The administrator loads the vfio_ap device driver
>  2. The vfio-ap driver during its initialization will register a single 'matrix'
>     device with the device core. This will serve as the parent device for
> -   all mediated matrix devices used to configure an AP matrix for a guest.
> +   all matrix mediated devices used to configure an AP matrix for a guest.
>  3. The /sys/devices/vfio_ap/matrix device is created by the device core
>  4. The vfio_ap device driver will register with the AP bus for AP queue devices
>     of type 10 and higher (CEX4 and newer). The driver will provide the vfio_ap
> @@ -269,7 +269,7 @@ The process for reserving an AP queue for use by a KVM guest is:
>     default zcrypt cex4queue driver.
>  8. The AP bus probes the vfio_ap device driver to bind the queues reserved for
>     it.
> -9. The administrator creates a passthrough type mediated matrix device to be
> +9. The administrator creates a passthrough type matrix mediated device to be
>     used by a guest
>  10. The administrator assigns the adapters, usage domains and control domains
>      to be exclusively used by a guest.
> @@ -279,14 +279,14 @@ Set up the VFIO mediated device interfaces
>  The VFIO AP device driver utilizes the common interface of the VFIO mediated
>  device core driver to:
>  
> -* Register an AP mediated bus driver to add a mediated matrix device to and
> +* Register an AP mediated bus driver to add a matrix mediated device to and
>    remove it from a VFIO group.
> -* Create and destroy a mediated matrix device
> -* Add a mediated matrix device to and remove it from the AP mediated bus driver
> -* Add a mediated matrix device to and remove it from an IOMMU group
> +* Create and destroy a matrix mediated device
> +* Add a matrix mediated device to and remove it from the AP mediated bus driver
> +* Add a matrix mediated device to and remove it from an IOMMU group
>  
>  The following high-level block diagram shows the main components and interfaces
> -of the VFIO AP mediated matrix device driver::
> +of the VFIO AP matrix mediated device driver::
>  
>     +-------------+
>     |             |
> @@ -351,29 +351,37 @@ matrix device.
>      This attribute group identifies the user-defined sysfs attributes of the
>      mediated device. When a device is registered with the VFIO mediated device
>      framework, the sysfs attribute files identified in the 'mdev_attr_groups'
> -    structure will be created in the mediated matrix device's directory. The
> -    sysfs attributes for a mediated matrix device are:
> +    structure will be created in the matrix mediated device's directory. The
> +    sysfs attributes for a matrix mediated device are:
>  
>      assign_adapter / unassign_adapter:
>        Write-only attributes for assigning/unassigning an AP adapter to/from the
> -      mediated matrix device. To assign/unassign an adapter, the APID of the
> +      matrix mediated device. To assign/unassign an adapter, the APID of the
>        adapter is echoed to the respective attribute file.
>      assign_domain / unassign_domain:
>        Write-only attributes for assigning/unassigning an AP usage domain to/from
> -      the mediated matrix device. To assign/unassign a domain, the domain
> +      the matrix mediated device. To assign/unassign a domain, the domain
>        number of the usage domain is echoed to the respective attribute
>        file.
>      matrix:
> -      A read-only file for displaying the APQNs derived from the cross product
> -      of the adapter and domain numbers assigned to the mediated matrix device.
> +      A read-only file for displaying the APQNs derived from the Cartesian
> +      product of the adapter and domain numbers assigned to the mediated matrix
> +      device.
> +    guest_matrix:
> +      A read-only file for displaying the APQNs derived from the Cartesian
> +      product of the adapter and domain numbers assigned to the APM and AQM
> +      fields respectively of the KVM guest's CRYCB. This will differ from the
> +      matrix if any APQNs assigned to the matrix mediated device do not
> +      reference a queue device bound to the vfio_ap device driver (i.e., the
> +      queue is not in the AP configuration).
>      assign_control_domain / unassign_control_domain:
>        Write-only attributes for assigning/unassigning an AP control domain
> -      to/from the mediated matrix device. To assign/unassign a control domain,
> +      to/from the matrix mediated device. To assign/unassign a control domain,
>        the ID of the domain to be assigned/unassigned is echoed to the respective
>        attribute file.
>      control_domains:
>        A read-only file for displaying the control domain numbers assigned to the
> -      mediated matrix device.
> +      matrix mediated device.
>  
>  * functions:
>  
> @@ -385,7 +393,7 @@ matrix device.
>        domains assigned via the corresponding sysfs attributes files
>  
>    remove:
> -    deallocates the mediated matrix device's ap_matrix_mdev structure. This will
> +    deallocates the matrix mediated device's ap_matrix_mdev structure. This will
>      be allowed only if a running guest is not using the mdev.
>  
>  * callback interfaces
> @@ -397,7 +405,7 @@ matrix device.
>      for the mdev matrix device to the MDEV bus. Access to the KVM structure used
>      to configure the KVM guest is provided via this callback. The KVM structure,
>      is used to configure the guest's access to the AP matrix defined via the
> -    mediated matrix device's sysfs attribute files.
> +    matrix mediated device's sysfs attribute files.
>    release:
>      unregisters the VFIO_GROUP_NOTIFY_SET_KVM notifier callback function for the
>      mdev matrix device and deconfigures the guest's AP matrix.
> @@ -410,11 +418,49 @@ function is called when QEMU connects to KVM. The guest's AP matrix is
>  configured via it's CRYCB by:
>  
>  * Setting the bits in the APM corresponding to the APIDs assigned to the
> -  mediated matrix device via its 'assign_adapter' interface.
> +  matrix mediated device via its 'assign_adapter' interface.
>  * Setting the bits in the AQM corresponding to the domains assigned to the
> -  mediated matrix device via its 'assign_domain' interface.
> +  matrix mediated device via its 'assign_domain' interface.
>  * Setting the bits in the ADM corresponding to the domain dIDs assigned to the
> -  mediated matrix device via its 'assign_control_domains' interface.
> +  matrix mediated device via its 'assign_control_domains' interface.
> +
> +The linux device model precludes passing a device through to a KVM guest that
> +is not bound to the device driver facilitating its pass-through. Consequently,
> +an APQN that does not reference a queue device bound to the vfio_ap device
> +driver will not be assigned to a KVM guest's CRYCB. The AP architecture,
> +however, does not provide a means to filter individual APQNs from the guest's
> +CRYCB, so the following logic is employed to filter them:
> +
> +* Filter the APQNs assigned to the matrix mediated device by APID.
> +
> +  To filter APQNs by APID, each APQN derived from the Cartesian product of the
> +  adapter numbers (APID) and domain numbers (APQI) assigned to the mdev is
> +  examined and if any one of them does not reference a queue device bound to the
> +  vfio_ap device driver, the adapter will not be plugged into the guest (i.e.,
> +  the bit corresponding to its APID will not be set in the APM of the guest's
> +  CRYCB).
> +
> +  If at least one adapter is plugged into the guest, then all domains assigned
> +  to the mdev will also be plugged into the guest (i.e., the bits corresponding
> +  to the APQIs of the domains assigned to the mdev will be set in the AQM field
> +  of the guest's CRYCB).
> +
> +* Filter the APQNs assigned to the matrix mediated device by APQI.
> +
> +  The APQNs will be filtered by APQI if filtering by APID does not result in any
> +  adapters or domains getting plugged into the guest.
> +
> +  To filter APQNs by APQI, each APQN derived from the Cartesian product of the
> +  adapter numbers (APID) and domain numbers (APQI) assigned to the mdev is
> +  examined and if any one of them does not reference a queue device bound to the
> +  vfio_ap device driver, the domain will not be plugged into the guest (i.e.,
> +  the bit corresponding to its APQI will not be set in the AQM of the guest's
> +  CRYCB).
> +
> +  If at least one domain is plugged into the guest, then all adapters assigned
> +  to the mdev will also be plugged into the guest (i.e., the bits corresponding
> +  to the APIDs of the adapters assigned to the mdev will be set in the APM field
> +  of the guest's CRYCB).
>  
>  The CPU model features for AP
>  -----------------------------
> @@ -435,6 +481,10 @@ available to a KVM guest via the following CPU model features:
>     can be made available to the guest only if it is available on the host (i.e.,
>     facility bit 12 is set).
>  
> +4. apqi: Indicates AP queue interrupts are available on the guest. This facility
> +   can be made available to the guest only if it is available on the host (i.e.,
> +   facility bit 65 is set).
> +
>  Note: If the user chooses to specify a CPU model different than the 'host'
>  model to QEMU, the CPU model features and facilities need to be turned on
>  explicitly; for example::
> @@ -444,7 +494,7 @@ explicitly; for example::
>  A guest can be precluded from using AP features/facilities by turning them off
>  explicitly; for example::
>  
> -     /usr/bin/qemu-system-s390x ... -cpu host,ap=off,apqci=off,apft=off
> +     /usr/bin/qemu-system-s390x ... -cpu host,ap=off,apqci=off,apft=off,apqi=off
>  
>  Note: If the APFT facility is turned off (apft=off) for the guest, the guest
>  will not see any AP devices. The zcrypt device drivers that register for type 10
> @@ -530,40 +580,56 @@ These are the steps:
>  
>  2. Secure the AP queues to be used by the three guests so that the host can not
>     access them. To secure them, there are two sysfs files that specify
> -   bitmasks marking a subset of the APQN range as 'usable by the default AP
> -   queue device drivers' or 'not usable by the default device drivers' and thus
> -   available for use by the vfio_ap device driver'. The location of the sysfs
> -   files containing the masks are::
> +   bitmasks marking a subset of the APQN range as usable only by the default AP
> +   queue device drivers. All remaining APQNs are available for use by
> +   any other device driver. The vfio_ap device driver is currently the only
> +   non-default device driver. The location of the sysfs files containing the
> +   masks are::
>  
>       /sys/bus/ap/apmask
>       /sys/bus/ap/aqmask
>  
>     The 'apmask' is a 256-bit mask that identifies a set of AP adapter IDs
> -   (APID). Each bit in the mask, from left to right (i.e., from most significant
> -   to least significant bit in big endian order), corresponds to an APID from
> -   0-255. If a bit is set, the APID is marked as usable only by the default AP
> -   queue device drivers; otherwise, the APID is usable by the vfio_ap
> -   device driver.
> +   (APID). Each bit in the mask, from left to right corresponds to an APID from
> +   0-255. If a bit is set, the APID is marked as available to the default AP
> +   queue device drivers.
>  
>     The 'aqmask' is a 256-bit mask that identifies a set of AP queue indexes
> -   (APQI). Each bit in the mask, from left to right (i.e., from most significant
> -   to least significant bit in big endian order), corresponds to an APQI from
> -   0-255. If a bit is set, the APQI is marked as usable only by the default AP
> -   queue device drivers; otherwise, the APQI is usable by the vfio_ap device
> -   driver.
> +   (APQI). Each bit in the mask, from left to right corresponds to an APQI from
> +   0-255. If a bit is set, the APQI is marked as available to the default AP
> +   queue device drivers.
> +
> +   The Cartesian product of the APIDs corresponding to the bits set in the
> +   apmask and the APQIs corresponding to the bits set in the aqmask comprise
> +   the subset of APQNs that can be used only by the host default device drivers.
> +   All other APQNs are available to the non-default device drivers such as the
> +   vfio_ap driver.
> +
> +   Take, for example, the following masks::
> +
> +      apmask:
> +      0x7d00000000000000000000000000000000000000000000000000000000000000
> +
> +      aqmask:
> +      0x8000000000000000000000000000000000000000000000000000000000000000
> +
> +   The masks indicate:
>  
> -   Take, for example, the following mask::
> +   * Adapters 1, 2, 3, 4, 5, and 7 are available for use by the host default
> +     device drivers.
>  
> -      0x7dffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffff
> +   * Domain 0 is available for use by the host default device drivers
>  
> -    It indicates:
> +   * The subset of APQNs available for use only by the default host device
> +     drivers are:
>  
> -      1, 2, 3, 4, 5, and 7-255 belong to the default drivers' pool, and 0 and 6
> -      belong to the vfio_ap device driver's pool.
> +     (1,0), (2,0), (3,0), (4,0), (5,0) and (7,0)
> +
> +   * All other APQNs are available for use by the non-default device drivers.
>  
>     The APQN of each AP queue device assigned to the linux host is checked by the
> -   AP bus against the set of APQNs derived from the cross product of APIDs
> -   and APQIs marked as usable only by the default AP queue device drivers. If a
> +   AP bus against the set of APQNs derived from the Cartesian product of APIDs
> +   and APQIs marked as available to the default AP queue device drivers. If a
>     match is detected,  only the default AP queue device drivers will be probed;
>     otherwise, the vfio_ap device driver will be probed.
>  
> @@ -627,6 +693,16 @@ These are the steps:
>  	    default drivers pool:    adapter 0-15, domain 1
>  	    alternate drivers pool:  adapter 16-255, domains 0, 2-255
>  
> +   Note ***:
> +   Changing a mask such that one or more APQNs will be taken from a matrix
> +   mediated device (see below) will fail with an error (EADDRINUSE). The error
> +   is logged to the kernel ring buffer which can be viewed with the 'dmesg'
> +   command. The output identifies each APQN flagged as 'in use' and the matrix
> +   mediated device to which it is assigned; for example:
> +
> +   Userspace may not re-assign queue 05.0054 already assigned to 62177883-f1bb-47f0-914d-32a22e3a8804
> +   Userspace may not re-assign queue 04.0054 already assigned to cef03c3c-903d-4ecc-9a83-40694cb8aee4
> +
>  Securing the APQNs for our example
>  ----------------------------------
>     To secure the AP queues 05.0004, 05.0047, 05.00ab, 05.00ff, 06.0004, 06.0047,
> @@ -684,7 +760,7 @@ Securing the APQNs for our example
>  
>       /sys/devices/vfio_ap/matrix/
>       --- [mdev_supported_types]
> -     ------ [vfio_ap-passthrough] (passthrough mediated matrix device type)
> +     ------ [vfio_ap-passthrough] (passthrough matrix mediated device type)
>       --------- create
>       --------- [devices]
>  
> @@ -775,17 +851,18 @@ Securing the APQNs for our example
>       higher than the maximum is specified, the operation will terminate with
>       an error (ENODEV).
>  
> -   * All APQNs that can be derived from the adapter ID and the IDs of
> -     the previously assigned domains must be bound to the vfio_ap device
> -     driver. If no domains have yet been assigned, then there must be at least
> -     one APQN with the specified APID bound to the vfio_ap driver. If no such
> -     APQNs are bound to the driver, the operation will terminate with an
> -     error (EADDRNOTAVAIL).
> +   * All APQNs that can be derived from the Cartesian product of the APID of the
> +     adapter being assigned and the APQIs of the previously assigned domains
> +     must be available to the vfio_ap device driver as specified in the sysfs
> +     /sys/bus/ap/apmask and /sys/bus/ap/aqmask attribute files. If even one APQN
> +     is reserved for use by the host device driver, the operation will terminate
> +     with an error (EADDRNOTAVAIL).
>  
> -     No APQN that can be derived from the adapter ID and the IDs of the
> -     previously assigned domains can be assigned to another mediated matrix
> -     device. If an APQN is assigned to another mediated matrix device, the
> -     operation will terminate with an error (EADDRINUSE).
> +   * No APQN that can be derived from the Cartesian product of the APID of the
> +     adapter being assigned and the APQIs of the previously assigned domains can
> +     be assigned to another matrix mediated device. If even one APQN is assigned
> +     to another matrix mediated device, the operation will terminate with an
> +     error (EADDRINUSE).
>  
>     In order to successfully assign a domain:
>  
> @@ -794,17 +871,18 @@ Securing the APQNs for our example
>       higher than the maximum is specified, the operation will terminate with
>       an error (ENODEV).
>  
> -   * All APQNs that can be derived from the domain ID and the IDs of
> -     the previously assigned adapters must be bound to the vfio_ap device
> -     driver. If no domains have yet been assigned, then there must be at least
> -     one APQN with the specified APQI bound to the vfio_ap driver. If no such
> -     APQNs are bound to the driver, the operation will terminate with an
> -     error (EADDRNOTAVAIL).
> +   * All APQNs that can be derived from the Cartesian product of the APQI of the
> +     domain being assigned and the APIDs of the previously assigned adapters
> +     must be available to the vfio_ap device driver as specified in the sysfs
> +     /sys/bus/ap/apmask and /sys/bus/ap/aqmask attribute files. If even one APQN
> +     is reserved for use by the host device driver, the operation will terminate
> +     with an error (EADDRNOTAVAIL).
>  
> -     No APQN that can be derived from the domain ID and the IDs of the
> -     previously assigned adapters can be assigned to another mediated matrix
> -     device. If an APQN is assigned to another mediated matrix device, the
> -     operation will terminate with an error (EADDRINUSE).
> +   * No APQN that can be derived from the Cartesian product of the APQI of the
> +     domain being assigned and the APIDs of the previously assigned adapters can
> +     be assigned to another matrix mediated device. If even one APQN is assigned
> +     to another matrix mediated device, the operation will terminate with an
> +     error (EADDRINUSE).
>  
>     In order to successfully assign a control domain, the domain number
>     specified must represent a value from 0 up to the maximum domain number
> @@ -813,22 +891,22 @@ Securing the APQNs for our example
>  
>  5. Start Guest1::
>  
> -     /usr/bin/qemu-system-s390x ... -cpu host,ap=on,apqci=on,apft=on \
> +     /usr/bin/qemu-system-s390x ... -cpu host,ap=on,apqci=on,apft=on,apqi=on \
>  	-device vfio-ap,sysfsdev=/sys/devices/vfio_ap/matrix/$uuid1 ...
>  
>  7. Start Guest2::
>  
> -     /usr/bin/qemu-system-s390x ... -cpu host,ap=on,apqci=on,apft=on \
> +     /usr/bin/qemu-system-s390x ... -cpu host,ap=on,apqci=on,apft=on,apqi=on \
>  	-device vfio-ap,sysfsdev=/sys/devices/vfio_ap/matrix/$uuid2 ...
>  
>  7. Start Guest3::
>  
> -     /usr/bin/qemu-system-s390x ... -cpu host,ap=on,apqci=on,apft=on \
> +     /usr/bin/qemu-system-s390x ... -cpu host,ap=on,apqci=on,apft=on,apqi=on \
>  	-device vfio-ap,sysfsdev=/sys/devices/vfio_ap/matrix/$uuid3 ...
>  
> -When the guest is shut down, the mediated matrix devices may be removed.
> +When the guest is shut down, the matrix mediated devices may be removed.
>  
> -Using our example again, to remove the mediated matrix device $uuid1::
> +Using our example again, to remove the matrix mediated device $uuid1::
>  
>     /sys/devices/vfio_ap/matrix/
>        --- [mdev_supported_types]
> @@ -851,16 +929,146 @@ remove it if no guest will use it during the remaining lifetime of the linux
>  host. If the mdev matrix device is removed, one may want to also reconfigure
>  the pool of adapters and queues reserved for use by the default drivers.
>  
> +Hot plug support:
> +================
> +An adapter, domain or control domain may be hot plugged into a running KVM
> +guest by assigning it to the matrix mediated device being used by the guest.
> +Control domains will always be hot plugged; however, an adapter or domain will
> +be hot plugged only if each new APQN resulting from its assignment
> +references a queue device bound to the vfio_ap device driver as described
> +below.
> +
> +When an adapter is assigned to a matrix mediated device in use by a KVM guest:
> +
> +* If no domains have yet been plugged into the KVM guest:
> +
> +  Hot plug the adapter and every domain previously assigned to the mdev if each
> +  APQN derived from the Cartesian product of the APID of the adapter being
> +  assigned and the APQIs of the domains previously assigned references a queue
> +  device bound to the vfio_ap device driver.
> +
> +* If one or more domains have previously been plugged into the guest:
> +
> +  Hot plug the adapter if each APQN derived from the Cartesian product of the
> +  APID of the adapter being assigned and the APQIs of the domains already
> +  plugged into the guest references a queue device bound to the vfio_ap device
> +  driver.
> +
> +When a domain is assigned to a matrix mediated device in use by a KVM guest:
> +
> +* If no adapters have yet been plugged into the KVM guest:
> +
> +  Hot plug the domain and every adapter previously assigned to the mdev if each
> +  APQN derived from the Cartesian product of the APIDs of the adapters
> +  previously assigned and the APQI of the domain being assigned references a
> +  queue device bound to the vfio_ap device driver.
> +
> +* If one or more adapters have previously been plugged into the guest:
> +
> +  Hot plug the domain if each APQN derived from the Cartesian product of the
> +  APIDs of the adapters already plugged into the guest and the APQI of the
> +  domain being assigned references a queue device bound to the vfio_ap device
> +  driver.
> +
> +Over-provisioning of AP queues for a KVM guest:
> +==============================================
> +Over-provisioning is defined herein as the assignment of adapters or domains to
> +a matrix mediated device that do not reference AP devices in the host's AP
> +configuration. The idea here is that when the adapter or domain becomes
> +available, it will be automatically hot-plugged into the KVM guest using
> +the matrix mediated device to which it is assigned as long as each new APQN
> +resulting from plugging it in references a queue device bound to the vfio_ap
> +device driver.
> +
>  Limitations
>  ===========
> -* The KVM/kernel interfaces do not provide a way to prevent restoring an APQN
> -  to the default drivers pool of a queue that is still assigned to a mediated
> -  device in use by a guest. It is incumbent upon the administrator to
> -  ensure there is no mediated device in use by a guest to which the APQN is
> -  assigned lest the host be given access to the private data of the AP queue
> -  device such as a private key configured specifically for the guest.
> +Live guest migration is not supported for guests using AP devices without
> +intervention by a system administrator. Before a KVM guest can be migrated,
> +the matrix mediated device must be removed. Unfortunately, it can not be
> +removed manually (i.e., echo 1 > /sys/devices/vfio_ap/matrix/$UUID/remove) while
> +the mdev is in use by a KVM guest. If the guest is being emulated by QEMU,
> +its mdev can be hot unplugged from the guest in one of two ways:
> +
> +1. If the KVM guest was started with libvirt, you can hot unplug the mdev via
> +   the following commands:
> +
> +      virsh detach-device <guestname> <path-to-device-xml>
> +
> +      For example, to hot unplug mdev 62177883-f1bb-47f0-914d-32a22e3a8804 from
> +      the guest named 'my-guest':
> +
> +         virsh detach-device my-guest ~/config/my-guest-hostdev.xml
> +
> +            The contents of my-guest-hostdev.xml:
> +
> +            <hostdev mode='subsystem' type='mdev' managed='no' model='vfio-ap'>
> +              <source>
> +                <address uuid='62177883-f1bb-47f0-914d-32a22e3a8804'/>
> +              </source>
> +            </hostdev>
> +
> +
> +      virsh qemu-monitor-command <guest-name> --hmp "device_del <device-id>"
> +
> +      For example, to hot unplug the matrix mediated device identified on the
> +      qemu command line with 'id=hostdev0' from the guest named 'my-guest':
> +
> +         virsh qemu-monitor-command my-guest --hmp "device_del hostdev0"
> +
> +2. A matrix mediated device can be hot unplugged by attaching the qemu monitor
> +   to the guest and using the following qemu monitor command:
> +
> +      (QEMU) device_del id=<device-id>
> +
> +      For example, to hot unplug the matrix mediated device that was specified
> +      on the qemu command line with 'id=hostdev0' when the guest was started:
> +
> +         (QEMU) device_del id=hostdev0
> +
> +After live migration of the KVM guest completes, an AP configuration can be
> +restored to the KVM guest by hot plugging a matrix mediated device on the target
> +system into the guest in one of two ways:
> +
> +1. If the KVM guest was started with libvirt, you can hot plug a matrix mediated
> +   device into the guest via the following virsh commands:
> +
> +   virsh attach-device <guestname> <path-to-device-xml>
> +
> +      For example, to hot plug mdev 62177883-f1bb-47f0-914d-32a22e3a8804 into
> +      the guest named 'my-guest':
> +
> +         virsh attach-device my-guest ~/config/my-guest-hostdev.xml
> +
> +            The contents of my-guest-hostdev.xml:
> +
> +            <hostdev mode='subsystem' type='mdev' managed='no' model='vfio-ap'>
> +              <source>
> +                <address uuid='62177883-f1bb-47f0-914d-32a22e3a8804'/>
> +              </source>
> +            </hostdev>
> +
> +
> +   virsh qemu-monitor-command <guest-name> --hmp \
> +   "device_add vfio-ap,sysfsdev=<path-to-mdev>,id=<device-id>"
> +
> +      For example, to hot plug the matrix mediated device
> +      62177883-f1bb-47f0-914d-32a22e3a8804 into the guest named 'my-guest' with
> +      device-id hostdev0:
> +
> +      virsh qemu-monitor-command my-guest --hmp \
> +      "device_add vfio-ap,\
> +      sysfsdev=/sys/devices/vfio_ap/matrix/62177883-f1bb-47f0-914d-32a22e3a8804,\
> +      id=hostdev0"
> +
> +2. A matrix mediated device can be hot plugged by attaching the qemu monitor
> +   to the guest and using the following qemu monitor command:
> +
> +      (qemu) device_add "vfio-ap,sysfsdev=<path-to-mdev>,id=<device-id>"
>  
> -* Dynamically modifying the AP matrix for a running guest (which would amount to
> -  hot(un)plug of AP devices for the guest) is currently not supported
> +      For example, to plug the matrix mediated device 
> +      62177883-f1bb-47f0-914d-32a22e3a8804 into the guest with the device-id
> +      hostdev0:
>  
> -* Live guest migration is not supported for guests using AP devices.
> +         (QEMU) device-add "vfio-ap,\
> +         sysfsdev=/sys/devices/vfio_ap/matrix/62177883-f1bb-47f0-914d-32a22e3a8804,\
> +         id=hostdev0"
> \ No newline at end of file



* Re: [PATCH v10 02/16] s390/vfio-ap: use new AP bus interface to search for queue devices
  2020-09-25  2:27   ` Halil Pasic
@ 2020-09-29 13:07     ` Tony Krowiak
  2020-09-29 13:37       ` Halil Pasic
  0 siblings, 1 reply; 79+ messages in thread
From: Tony Krowiak @ 2020-09-29 13:07 UTC (permalink / raw)
  To: Halil Pasic
  Cc: linux-s390, linux-kernel, kvm, freude, borntraeger, cohuck,
	mjrosato, alex.williamson, kwankhede, fiuczy, frankja, david,
	imbrenda, hca, gor, kernel test robot



On 9/24/20 10:27 PM, Halil Pasic wrote:
> On Fri, 21 Aug 2020 15:56:02 -0400
> Tony Krowiak <akrowiak@linux.ibm.com> wrote:
>
>> --- a/drivers/s390/crypto/vfio_ap_ops.c
>> +++ b/drivers/s390/crypto/vfio_ap_ops.c
>> @@ -26,43 +26,26 @@
>>   
>>   static int vfio_ap_mdev_reset_queues(struct mdev_device *mdev);
>>   
>> -static int match_apqn(struct device *dev, const void *data)
>> -{
>> -	struct vfio_ap_queue *q = dev_get_drvdata(dev);
>> -
>> -	return (q->apqn == *(int *)(data)) ? 1 : 0;
>> -}
>> -
>>   /**
>> - * vfio_ap_get_queue: Retrieve a queue with a specific APQN from a list
>> - * @matrix_mdev: the associated mediated matrix
>> + * vfio_ap_get_queue: Retrieve a queue with a specific APQN.
>>    * @apqn: The queue APQN
>>    *
>> - * Retrieve a queue with a specific APQN from the list of the
>> - * devices of the vfio_ap_drv.
>> - * Verify that the APID and the APQI are set in the matrix.
>> + * Retrieve a queue with a specific APQN from the AP queue devices attached to
>> + * the AP bus.
>>    *
>> - * Returns the pointer to the associated vfio_ap_queue
>> + * Returns the pointer to the vfio_ap_queue with the specified APQN, or NULL.
>>    */
>> -static struct vfio_ap_queue *vfio_ap_get_queue(
>> -					struct ap_matrix_mdev *matrix_mdev,
>> -					int apqn)
>> +static struct vfio_ap_queue *vfio_ap_get_queue(unsigned long apqn)
>>   {
>> +	struct ap_queue *queue;
>>   	struct vfio_ap_queue *q;
>> -	struct device *dev;
>>   
>> -	if (!test_bit_inv(AP_QID_CARD(apqn), matrix_mdev->matrix.apm))
>> -		return NULL;
>> -	if (!test_bit_inv(AP_QID_QUEUE(apqn), matrix_mdev->matrix.aqm))
>> +	queue = ap_get_qdev(apqn);
>> +	if (!queue)
>>   		return NULL;
>>   
>> -	dev = driver_find_device(&matrix_dev->vfio_ap_drv->driver, NULL,
>> -				 &apqn, match_apqn);
>> -	if (!dev)
>> -		return NULL;
>> -	q = dev_get_drvdata(dev);
>> -	q->matrix_mdev = matrix_mdev;
>> -	put_device(dev);
>> +	q = dev_get_drvdata(&queue->ap_dev.device);
> Is this cast here safe? (I don't think it is.)

In the probe, we execute:
dev_set_drvdata(&queue->ap_dev.device, q);

I don't get any compile or execution errors. Why wouldn't it be safe?

>
>> +	put_device(&queue->ap_dev.device);
>>   
>>   	return q;
>>   }



* Re: [PATCH v10 02/16] s390/vfio-ap: use new AP bus interface to search for queue devices
  2020-09-29 13:07     ` Tony Krowiak
@ 2020-09-29 13:37       ` Halil Pasic
  2020-09-29 20:57         ` Tony Krowiak
  0 siblings, 1 reply; 79+ messages in thread
From: Halil Pasic @ 2020-09-29 13:37 UTC (permalink / raw)
  To: Tony Krowiak
  Cc: linux-s390, linux-kernel, kvm, freude, borntraeger, cohuck,
	mjrosato, alex.williamson, kwankhede, fiuczy, frankja, david,
	imbrenda, hca, gor, kernel test robot

On Tue, 29 Sep 2020 09:07:40 -0400
Tony Krowiak <akrowiak@linux.ibm.com> wrote:

> 
> 
> On 9/24/20 10:27 PM, Halil Pasic wrote:
> > On Fri, 21 Aug 2020 15:56:02 -0400
> > Tony Krowiak <akrowiak@linux.ibm.com> wrote:
> >
> >> --- a/drivers/s390/crypto/vfio_ap_ops.c
> >> +++ b/drivers/s390/crypto/vfio_ap_ops.c
> >> @@ -26,43 +26,26 @@
> >>   
> >>   static int vfio_ap_mdev_reset_queues(struct mdev_device *mdev);
> >>   
> >> -static int match_apqn(struct device *dev, const void *data)
> >> -{
> >> -	struct vfio_ap_queue *q = dev_get_drvdata(dev);
> >> -
> >> -	return (q->apqn == *(int *)(data)) ? 1 : 0;
> >> -}
> >> -
> >>   /**
> >> - * vfio_ap_get_queue: Retrieve a queue with a specific APQN from a list
> >> - * @matrix_mdev: the associated mediated matrix
> >> + * vfio_ap_get_queue: Retrieve a queue with a specific APQN.
> >>    * @apqn: The queue APQN
> >>    *
> >> - * Retrieve a queue with a specific APQN from the list of the
> >> - * devices of the vfio_ap_drv.
> >> - * Verify that the APID and the APQI are set in the matrix.
> >> + * Retrieve a queue with a specific APQN from the AP queue devices attached to
> >> + * the AP bus.
> >>    *
> >> - * Returns the pointer to the associated vfio_ap_queue
> >> + * Returns the pointer to the vfio_ap_queue with the specified APQN, or NULL.
> >>    */
> >> -static struct vfio_ap_queue *vfio_ap_get_queue(
> >> -					struct ap_matrix_mdev *matrix_mdev,
> >> -					int apqn)
> >> +static struct vfio_ap_queue *vfio_ap_get_queue(unsigned long apqn)
> >>   {
> >> +	struct ap_queue *queue;
> >>   	struct vfio_ap_queue *q;
> >> -	struct device *dev;
> >>   
> >> -	if (!test_bit_inv(AP_QID_CARD(apqn), matrix_mdev->matrix.apm))
> >> -		return NULL;
> >> -	if (!test_bit_inv(AP_QID_QUEUE(apqn), matrix_mdev->matrix.aqm))
> >> +	queue = ap_get_qdev(apqn);
> >> +	if (!queue)
> >>   		return NULL;
> >>   
> >> -	dev = driver_find_device(&matrix_dev->vfio_ap_drv->driver, NULL,
> >> -				 &apqn, match_apqn);
> >> -	if (!dev)
> >> -		return NULL;
> >> -	q = dev_get_drvdata(dev);
> >> -	q->matrix_mdev = matrix_mdev;
> >> -	put_device(dev);
> >> +	q = dev_get_drvdata(&queue->ap_dev.device);
> > Is this cast here safe? (I don't think it is.)
> 
> In the probe, we execute:
> dev_set_drvdata(&queue->ap_dev.device, q);
> 
> I don't get any compile or execution errors. Why wouldn't it be safe?
> 

Because the queue may or may not be bound to the vfio_ap driver. AFAICT
this function can be called with an arbitrary APQN.

If it is bound to another driver then drvdata is not likely to hold a
struct vfio_ap_queue.
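
A minimal userspace sketch of this concern, using hypothetical stand-in
structures rather than the real kernel definitions: drvdata is interpreted
as a struct vfio_ap_queue only when the device is bound to the vfio_ap
driver, and NULL is returned otherwise. The names other_drv and
get_vfio_ap_queue are assumptions made for illustration.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the kernel structures; not the real layouts. */
struct driver { const char *name; };
struct device { const struct driver *driver; void *drvdata; };
struct vfio_ap_queue { int apqn; };

static const struct driver vfio_ap_drv = { "vfio_ap" };
static const struct driver other_drv = { "cex4queue" };

/*
 * Return the vfio_ap_queue stored in drvdata only if the device is bound
 * to the vfio_ap driver. For any other driver drvdata has an unrelated
 * type, so it must not be reinterpreted.
 */
static struct vfio_ap_queue *get_vfio_ap_queue(const struct device *dev)
{
	if (!dev || dev->driver != &vfio_ap_drv)
		return NULL;
	return dev->drvdata;
}
```

In the kernel the analogous test would presumably compare the device's
bound driver with &matrix_dev->vfio_ap_drv->driver; that is an assumption,
not something this patch does.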


> >
> >> +	put_device(&queue->ap_dev.device);
> >>   
> >>   	return q;
> >>   }
> 



* Re: [PATCH v10 04/16] s390/zcrypt: driver callback to indicate resource in use
  2020-09-25  9:24   ` Halil Pasic
@ 2020-09-29 13:59     ` Tony Krowiak
  0 siblings, 0 replies; 79+ messages in thread
From: Tony Krowiak @ 2020-09-29 13:59 UTC (permalink / raw)
  To: Halil Pasic
  Cc: linux-s390, linux-kernel, kvm, freude, borntraeger, cohuck,
	mjrosato, alex.williamson, kwankhede, fiuczy, frankja, david,
	imbrenda, hca, gor



On 9/25/20 5:24 AM, Halil Pasic wrote:
> On Fri, 21 Aug 2020 15:56:04 -0400
> Tony Krowiak <akrowiak@linux.ibm.com> wrote:
>
>> Introduces a new driver callback to prevent a root user from unbinding
>> an AP queue from its device driver if the queue is in use. The intent of
>> this callback is to provide a driver with the means to prevent a root user
>> from inadvertently taking a queue away from a matrix mdev and giving it to
>> the host while it is assigned to the matrix mdev. The callback will
>> be invoked whenever a change to the AP bus's sysfs apmask or aqmask
>> attributes would result in one or more AP queues being removed from its
>> driver. If the callback responds in the affirmative for any driver
>> queried, the change to the apmask or aqmask will be rejected with a device
>> in use error.
>>
>> For this patch, only non-default drivers will be queried. Currently,
>> there is only one non-default driver, the vfio_ap device driver. The
>> vfio_ap device driver facilitates pass-through of an AP queue to a
>> guest. The idea here is that a guest may be administered by a different
>> sysadmin than the host and we don't want AP resources to unexpectedly
>> disappear from a guest's AP configuration (i.e., adapters, domains and
>> control domains assigned to the matrix mdev). This will enforce the proper
>> procedure for removing AP resources intended for guest usage which is to
>> first unassign them from the matrix mdev, then unbind them from the
>> vfio_ap device driver.
>>
>> Signed-off-by: Tony Krowiak <akrowiak@linux.ibm.com>
>> Reported-by: kernel test robot <lkp@intel.com>
>> ---
>>   drivers/s390/crypto/ap_bus.c | 148 ++++++++++++++++++++++++++++++++---
>>   drivers/s390/crypto/ap_bus.h |   4 +
>>   2 files changed, 142 insertions(+), 10 deletions(-)
>>
>> diff --git a/drivers/s390/crypto/ap_bus.c b/drivers/s390/crypto/ap_bus.c
>> index 24a1940b829e..db27bd931308 100644
>> --- a/drivers/s390/crypto/ap_bus.c
>> +++ b/drivers/s390/crypto/ap_bus.c
>> @@ -35,6 +35,7 @@
>>   #include <linux/mod_devicetable.h>
>>   #include <linux/debugfs.h>
>>   #include <linux/ctype.h>
>> +#include <linux/module.h>
>>   
>>   #include "ap_bus.h"
>>   #include "ap_debug.h"
>> @@ -889,6 +890,23 @@ static int modify_bitmap(const char *str, unsigned long *bitmap, int bits)
>>   	return 0;
>>   }
>>   
>> +static int ap_parse_bitmap_str(const char *str, unsigned long *bitmap, int bits,
>> +			       unsigned long *newmap)
>> +{
>> +	unsigned long size;
>> +	int rc;
>> +
>> +	size = BITS_TO_LONGS(bits)*sizeof(unsigned long);
>> +	if (*str == '+' || *str == '-') {
>> +		memcpy(newmap, bitmap, size);
>> +		rc = modify_bitmap(str, newmap, bits);
>> +	} else {
>> +		memset(newmap, 0, size);
>> +		rc = hex2bitmap(str, newmap, bits);
>> +	}
>> +	return rc;
>> +}
>> +
>>   int ap_parse_mask_str(const char *str,
>>   		      unsigned long *bitmap, int bits,
>>   		      struct mutex *lock)
>> @@ -908,14 +926,7 @@ int ap_parse_mask_str(const char *str,
>>   		kfree(newmap);
>>   		return -ERESTARTSYS;
>>   	}
>> -
>> -	if (*str == '+' || *str == '-') {
>> -		memcpy(newmap, bitmap, size);
>> -		rc = modify_bitmap(str, newmap, bits);
>> -	} else {
>> -		memset(newmap, 0, size);
>> -		rc = hex2bitmap(str, newmap, bits);
>> -	}
>> +	rc = ap_parse_bitmap_str(str, bitmap, bits, newmap);
>>   	if (rc == 0)
>>   		memcpy(bitmap, newmap, size);
>>   	mutex_unlock(lock);
>> @@ -1107,12 +1118,70 @@ static ssize_t apmask_show(struct bus_type *bus, char *buf)
>>   	return rc;
>>   }
>>   
>> +static int __verify_card_reservations(struct device_driver *drv, void *data)
>> +{
>> +	int rc = 0;
>> +	struct ap_driver *ap_drv = to_ap_drv(drv);
>> +	unsigned long *newapm = (unsigned long *)data;
>> +
>> +	/*
>> +	 * No need to verify whether the driver is using the queues if it is the
>> +	 * default driver.
>> +	 */
>> +	if (ap_drv->flags & AP_DRIVER_FLAG_DEFAULT)
>> +		return 0;
>> +
>> +	/* The non-default driver's module must be loaded */
>> +	if (!try_module_get(drv->owner))
>> +		return 0;
>> +
>> +	if (ap_drv->in_use)
>> +		if (ap_drv->in_use(newapm, ap_perms.aqm))
>> +			rc = -EADDRINUSE;
>> +
>> +	module_put(drv->owner);
>> +
>> +	return rc;
>> +}
>> +
>> +static int apmask_commit(unsigned long *newapm)
>> +{
>> +	int rc;
>> +	unsigned long reserved[BITS_TO_LONGS(AP_DEVICES)];
>> +
>> +	/*
>> +	 * Check if any bits in the apmask have been set which will
>> +	 * result in queues being removed from non-default drivers
>> +	 */
>> +	if (bitmap_andnot(reserved, newapm, ap_perms.apm, AP_DEVICES)) {
>> +		rc = bus_for_each_drv(&ap_bus_type, NULL, reserved,
>> +				      __verify_card_reservations);
>> +		if (rc)
>> +			return rc;
>> +	}
> I understand the above asks all the non-default drivers if some of the
> queues are 'used'. But AFAIU this reflects the truth ap_drv->in_use()
> is only telling us something about a given moment...
>
>> +
>> +	memcpy(ap_perms.apm, newapm, APMASKSIZE);
> ... So I fail to understand what will prevent us from performing a
> successful commit if some of the resources become 'used' between
> the call to the in_use() callback and the memcpy.

So, the scenario you describe would go something like this:
1. User changes apmask or aqmask attempting to take
     queue xx.yyyy away from the vfio_ap driver.
2. The in_use callback does not detect the affected
     queue to be in use (i.e., it is not assigned to an mdev).
3. Another user assigns queue xx.yyyy to an mdev.
4. The memcpy is performed and ownership of xx.yyyy is
     transferred to the host.
5. Afterward, the queues are reprobed which results in the
     remove callback on the vfio_ap driver for xx.yyyy and the
     probe callback on the cex4queue driver for xx.yyyy.

You are correct, there is nothing preventing a resource from becoming
'used' between the in_use callback and the memcpy. While the
above scenario could be considered a circumvention of the
intent of this design, the result would be no different than if
the in_use callback was not implemented at all. When the
remove callback is invoked for xx.yyyy on the vfio_ap driver
due to the reprobe, the queue will be released.

The chances of this scenario occurring are probably quite tiny:
two root users would have to take the required actions almost
simultaneously, within the time it takes the verification loop
to complete and the mask to be copied. I suppose this could
happen if there are a great number of mdevs or a very large
number of queues bound to the vfio_ap driver, but it still
seems very unlikely.
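
For illustration only: the window exists because the in_use check and the
memcpy are not serialized against assignment to an mdev. Below is a minimal
userspace sketch of one way such a window could be closed, assuming (unlike
this series) that the mask commit and the mdev assignment take the same
lock. All names are hypothetical, and a single 64-bit word stands in for
the bitmaps.

```c
#include <errno.h>
#include <pthread.h>
#include <stdint.h>

/* Hypothetical model: one lock guards both the host mask and assignments. */
static pthread_mutex_t cfg_lock = PTHREAD_MUTEX_INITIALIZER;
static uint64_t host_mask;	/* adapters owned by the host's default drivers */
static uint64_t assigned_mask;	/* adapters assigned to some mdev */

/* Commit a new host mask unless it would claim an adapter in use by an mdev. */
static int mask_commit(uint64_t newmask)
{
	int rc = 0;

	pthread_mutex_lock(&cfg_lock);
	/* Bits newly set in newmask that are also assigned to an mdev. */
	if ((newmask & ~host_mask) & assigned_mask)
		rc = -EADDRINUSE;
	else
		host_mask = newmask;	/* check and commit under one lock */
	pthread_mutex_unlock(&cfg_lock);

	return rc;
}

/* Assign adapter apid to an mdev unless the host already owns it. */
static int mdev_assign(unsigned int apid)
{
	uint64_t bit;
	int rc = 0;

	if (apid >= 64)
		return -EINVAL;
	bit = UINT64_C(1) << apid;

	pthread_mutex_lock(&cfg_lock);
	if (host_mask & bit)
		rc = -EADDRINUSE;
	else
		assigned_mask |= bit;
	pthread_mutex_unlock(&cfg_lock);

	return rc;
}
```

Because both paths serialize on cfg_lock in this model, an assignment
cannot slip in between the check and the commit. Whether that extra
coupling is worth it for such an unlikely race is a separate question.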

>
> Of course I might be wrong.
>
> BTW I was never a fan of this mechanism, so I don't mind if it
> does not work perfectly, and this should catch most of the cases. Just
> want to make sure we don't introduce more confusion than necessary.
>
>> +
>> +	return 0;
>> +}
>> +
>>   static ssize_t apmask_store(struct bus_type *bus, const char *buf,
>>   			    size_t count)
>>   {
>>   	int rc;
>> +	DECLARE_BITMAP(newapm, AP_DEVICES);
>> +
>> +	if (mutex_lock_interruptible(&ap_perms_mutex))
>> +		return -ERESTARTSYS;
>> +
>> +	rc = ap_parse_bitmap_str(buf, ap_perms.apm, AP_DEVICES, newapm);
>> +	if (rc)
>> +		goto done;
>>   
>> -	rc = ap_parse_mask_str(buf, ap_perms.apm, AP_DEVICES, &ap_perms_mutex);
>> +	rc = apmask_commit(newapm);
>> +
>> +done:
>> +	mutex_unlock(&ap_perms_mutex);
>>   	if (rc)
>>   		return rc;
>>   
>> @@ -1138,12 +1207,71 @@ static ssize_t aqmask_show(struct bus_type *bus, char *buf)
>>   	return rc;
>>   }
>>   
>> +static int __verify_queue_reservations(struct device_driver *drv, void *data)
>> +{
>> +	int rc = 0;
>> +	struct ap_driver *ap_drv = to_ap_drv(drv);
>> +	unsigned long *newaqm = (unsigned long *)data;
>> +
>> +	/*
>> +	 * If the reserved bits do not identify queues reserved for use by the
>> +	 * non-default driver, there is no need to verify the driver is using
>> +	 * the queues.
>> +	 */
>> +	if (ap_drv->flags & AP_DRIVER_FLAG_DEFAULT)
>> +		return 0;
>> +
>> +	/* The non-default driver's module must be loaded */
>> +	if (!try_module_get(drv->owner))
>> +		return 0;
>> +
>> +	if (ap_drv->in_use)
>> +		if (ap_drv->in_use(ap_perms.apm, newaqm))
>> +			rc = -EADDRINUSE;
>> +
>> +	module_put(drv->owner);
>> +
>> +	return rc;
>> +}
>> +
>> +static int aqmask_commit(unsigned long *newaqm)
>> +{
>> +	int rc;
>> +	unsigned long reserved[BITS_TO_LONGS(AP_DOMAINS)];
>> +
>> +	/*
>> +	 * Check if any bits in the aqmask have been set which will
>> +	 * result in queues being removed from non-default drivers
>> +	 */
>> +	if (bitmap_andnot(reserved, newaqm, ap_perms.aqm, AP_DOMAINS)) {
>> +		rc = bus_for_each_drv(&ap_bus_type, NULL, reserved,
>> +				      __verify_queue_reservations);
>> +		if (rc)
>> +			return rc;
>> +	}
>> +
>> +	memcpy(ap_perms.aqm, newaqm, AQMASKSIZE);
>> +
> Same here.
>
> Regards,
> Halil
>
>> +	return 0;
>> +}
>> +
>>   static ssize_t aqmask_store(struct bus_type *bus, const char *buf,
>>   			    size_t count)
>>   {
>>   	int rc;
>> +	DECLARE_BITMAP(newaqm, AP_DOMAINS);
>>   
>> -	rc = ap_parse_mask_str(buf, ap_perms.aqm, AP_DOMAINS, &ap_perms_mutex);
>> +	if (mutex_lock_interruptible(&ap_perms_mutex))
>> +		return -ERESTARTSYS;
>> +
>> +	rc = ap_parse_bitmap_str(buf, ap_perms.aqm, AP_DOMAINS, newaqm);
>> +	if (rc)
>> +		goto done;
>> +
>> +	rc = aqmask_commit(newaqm);
>> +
>> +done:
>> +	mutex_unlock(&ap_perms_mutex);
>>   	if (rc)
>>   		return rc;
>>   
>> diff --git a/drivers/s390/crypto/ap_bus.h b/drivers/s390/crypto/ap_bus.h
>> index 1ea046324e8f..48c57b3d53a0 100644
>> --- a/drivers/s390/crypto/ap_bus.h
>> +++ b/drivers/s390/crypto/ap_bus.h
>> @@ -136,6 +136,7 @@ struct ap_driver {
>>   
>>   	int (*probe)(struct ap_device *);
>>   	void (*remove)(struct ap_device *);
>> +	bool (*in_use)(unsigned long *apm, unsigned long *aqm);
>>   };
>>   
>>   #define to_ap_drv(x) container_of((x), struct ap_driver, driver)
>> @@ -255,6 +256,9 @@ void ap_queue_init_state(struct ap_queue *aq);
>>   struct ap_card *ap_card_create(int id, int queue_depth, int raw_device_type,
>>   			       int comp_device_type, unsigned int functions);
>>   
>> +#define APMASKSIZE (BITS_TO_LONGS(AP_DEVICES) * sizeof(unsigned long))
>> +#define AQMASKSIZE (BITS_TO_LONGS(AP_DOMAINS) * sizeof(unsigned long))
>> +
>>   struct ap_perms {
>>   	unsigned long ioctlm[BITS_TO_LONGS(AP_IOCTLS)];
>>   	unsigned long apm[BITS_TO_LONGS(AP_DEVICES)];



* Re: [PATCH v10 05/16] s390/vfio-ap: implement in-use callback for vfio_ap driver
  2020-09-25  9:29   ` Halil Pasic
@ 2020-09-29 14:00     ` Tony Krowiak
  0 siblings, 0 replies; 79+ messages in thread
From: Tony Krowiak @ 2020-09-29 14:00 UTC (permalink / raw)
  To: Halil Pasic
  Cc: linux-s390, linux-kernel, kvm, freude, borntraeger, cohuck,
	mjrosato, alex.williamson, kwankhede, fiuczy, frankja, david,
	imbrenda, hca, gor



On 9/25/20 5:29 AM, Halil Pasic wrote:
> On Fri, 21 Aug 2020 15:56:05 -0400
> Tony Krowiak <akrowiak@linux.ibm.com> wrote:
>
>> +
>> +bool vfio_ap_mdev_resource_in_use(unsigned long *apm, unsigned long *aqm)
>> +{
>> +	bool in_use;
>> +
>> +	mutex_lock(&matrix_dev->lock);
>> +	in_use = !!vfio_ap_mdev_verify_no_sharing(NULL, apm, aqm);
>> +	mutex_unlock(&matrix_dev->lock);
> See also my comment for patch 4. AFAIU as soon as you release the lock
> the in_use may become outdated in any moment.

See my response to your comment for patch 4.

>
>> +
>> +	return in_use;
>> +}



* Re: [PATCH v10 06/16] s390/vfio-ap: introduce shadow APCB
  2020-09-26  1:38   ` Halil Pasic
@ 2020-09-29 16:04     ` Tony Krowiak
  2020-09-29 16:19       ` Halil Pasic
  0 siblings, 1 reply; 79+ messages in thread
From: Tony Krowiak @ 2020-09-29 16:04 UTC (permalink / raw)
  To: Halil Pasic
  Cc: linux-s390, linux-kernel, kvm, freude, borntraeger, cohuck,
	mjrosato, alex.williamson, kwankhede, fiuczy, frankja, david,
	imbrenda, hca, gor



On 9/25/20 9:38 PM, Halil Pasic wrote:
> On Fri, 21 Aug 2020 15:56:06 -0400
> Tony Krowiak <akrowiak@linux.ibm.com> wrote:
>
>> The APCB is a field within the CRYCB that provides the AP configuration
>> to a KVM guest. Let's introduce a shadow copy of the KVM guest's APCB and
>> maintain it for the lifespan of the guest.
>>
> AFAIU this is supposed to be a no change in behavior patch that lays the
> groundwork.

I suppose this is in the eyes of the beholder because this patch does
lay the groundwork for the APQN filtering and hot plug/unplug support
introduced in subsequent patches. Maybe it will be more in line with your
expectations after I make the changes I agreed to below.

>
>> Signed-off-by: Tony Krowiak <akrowiak@linux.ibm.com>
>> ---
>>   drivers/s390/crypto/vfio_ap_ops.c     | 32 ++++++++++++++++++++++-----
>>   drivers/s390/crypto/vfio_ap_private.h |  2 ++
>>   2 files changed, 29 insertions(+), 5 deletions(-)
>>
>> diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c
>> index fc1aa6f947eb..efb229033f9e 100644
>> --- a/drivers/s390/crypto/vfio_ap_ops.c
>> +++ b/drivers/s390/crypto/vfio_ap_ops.c
>> @@ -305,14 +305,35 @@ static int handle_pqap(struct kvm_vcpu *vcpu)
>>   	return 0;
>>   }
>>   
>> +static void vfio_ap_matrix_clear_masks(struct ap_matrix *matrix)
>> +{
>> +	bitmap_clear(matrix->apm, 0, AP_DEVICES);
>> +	bitmap_clear(matrix->aqm, 0, AP_DOMAINS);
>> +	bitmap_clear(matrix->adm, 0, AP_DOMAINS);
>> +}
>> +
>>   static void vfio_ap_matrix_init(struct ap_config_info *info,
>>   				struct ap_matrix *matrix)
>>   {
>> +	vfio_ap_matrix_clear_masks(matrix);
> I don't quite understand the idea behind this. The only place
> vfio_ap_matrix_init() is used, is in create right after the whole
> matrix_mdev got allocated with kzalloc.

You are correct, this does not belong here. I am going to remove
the vfio_ap_matrix_clear_masks function because that is not needed
until the filtering patch.

>
>>   	matrix->apm_max = info->apxa ? info->Na : 63;
>>   	matrix->aqm_max = info->apxa ? info->Nd : 15;
>>   	matrix->adm_max = info->apxa ? info->Nd : 15;
>>   }
>>   
>> +static bool vfio_ap_mdev_has_crycb(struct ap_matrix_mdev *matrix_mdev)
>> +{
>> +	return (matrix_mdev->kvm && matrix_mdev->kvm->arch.crypto.crycbd);
>> +}
>> +
>> +static void vfio_ap_mdev_commit_crycb(struct ap_matrix_mdev *matrix_mdev)
>> +{
>> +	kvm_arch_crypto_set_masks(matrix_mdev->kvm,
>> +				  matrix_mdev->shadow_apcb.apm,
>> +				  matrix_mdev->shadow_apcb.aqm,
>> +				  matrix_mdev->shadow_apcb.adm);
>> +}
>> +
>>   static int vfio_ap_mdev_create(struct kobject *kobj, struct mdev_device *mdev)
>>   {
>>   	struct ap_matrix_mdev *matrix_mdev;
>> @@ -1202,13 +1223,12 @@ static int vfio_ap_mdev_group_notifier(struct notifier_block *nb,
>>   	if (ret)
>>   		return NOTIFY_DONE;
>>   
>> -	/* If there is no CRYCB pointer, then we can't copy the masks */
>> -	if (!matrix_mdev->kvm->arch.crypto.crycbd)
>> +	if (!vfio_ap_mdev_has_crycb(matrix_mdev))
>>   		return NOTIFY_DONE;
>>   
>> -	kvm_arch_crypto_set_masks(matrix_mdev->kvm, matrix_mdev->matrix.apm,
>> -				  matrix_mdev->matrix.aqm,
>> -				  matrix_mdev->matrix.adm);
>> +	memcpy(&matrix_mdev->shadow_apcb, &matrix_mdev->matrix,
>> +	       sizeof(matrix_mdev->shadow_apcb));
> A note on the thread safety of the access to matrix_mdev->matrix. I
> guess the idea is, that this is still safe because we did
> vfio_ap_mdev_set_kvm() and that is supposed to inhibit changes the
> matrix.
>
> There are two things that bother me with this:
> 1) the assign operations don't check matrix_mdev->kvm under the lock
> 2) with dynamic, this is supposed to change (So I have to be careful
> about it when reviewing the following patches. A sneak-peek at the end
> result makes me worried).

As you will see in the subsequent patches,
all operations performed within the context of the
assign/unassign interfaces are executed under the
matrix_dev->lock. This locks access to every
matrix_mdev. When an adapter, domain or control
domain is assigned, matrix_mdev->kvm is
checked prior to assigning anything to the guest's APCB.
This occurs in between the lock/unlock of
matrix_dev->lock.
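
A minimal userspace sketch of that pattern, under stated assumptions: the
struct below is a hypothetical stand-in for struct ap_matrix_mdev, a single
word models each bitmap, and the has_guest flag models a non-NULL
matrix_mdev->kvm. It is not taken from the later patches.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for struct ap_matrix_mdev; one word per bitmap. */
struct mdev_model {
	bool has_guest;		/* models matrix_mdev->kvm != NULL */
	uint64_t apm;		/* adapters assigned to the mdev */
	uint64_t shadow_apm;	/* adapters in the guest's shadow APCB */
};

static pthread_mutex_t matrix_lock = PTHREAD_MUTEX_INITIALIZER;

/*
 * Assign adapter apid to the mdev. Returns true if the guest's shadow
 * APCB was updated as well, false if only the mdev's matrix changed.
 */
static bool model_assign_adapter(struct mdev_model *m, unsigned int apid)
{
	bool hot_plugged = false;

	if (apid >= 64)
		return false;

	pthread_mutex_lock(&matrix_lock);
	m->apm |= UINT64_C(1) << apid;
	if (m->has_guest) {	/* guest state is checked under the lock */
		m->shadow_apm |= UINT64_C(1) << apid;
		hot_plugged = true;
	}
	pthread_mutex_unlock(&matrix_lock);

	return hot_plugged;
}
```

The point of the sketch is only that the guest check and the shadow
update happen inside the same critical section as the assignment.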

>
>> +	vfio_ap_mdev_commit_crycb(matrix_mdev);
>>   
>>   	return NOTIFY_OK;
>>   }
>> @@ -1323,6 +1343,8 @@ static void vfio_ap_mdev_release(struct mdev_device *mdev)
>>   		kvm_put_kvm(matrix_mdev->kvm);
>>   		matrix_mdev->kvm = NULL;
>>   	}
>> +
>> +	vfio_ap_matrix_clear_masks(&matrix_mdev->shadow_apcb);
> What is the idea behind this? From the above, it looks like we are going
> to overwrite matrix_mdev->shadow_apcb with matrix_mdev->matrix before
> the next commit anyway.

The clearing of the masks in the shadow_apcb is premature
and doesn't belong in this patch. There is no reason to clear
these masks at this point, so I will remove this and the
vfio_ap_matrix_clear_masks function too.

>
> I suppose this is probably about no guest unolies no resources passed
> through at the moment. If that is the case maybe we can document it
> below.

I'm not quite sure what you are saying here or what I should be
documenting below.

>   
>
>>   	mutex_unlock(&matrix_dev->lock);
>>   
>>   	vfio_unregister_notifier(mdev_dev(mdev), VFIO_IOMMU_NOTIFY,
>> diff --git a/drivers/s390/crypto/vfio_ap_private.h b/drivers/s390/crypto/vfio_ap_private.h
>> index 0c796ef11426..055bce6d45db 100644
>> --- a/drivers/s390/crypto/vfio_ap_private.h
>> +++ b/drivers/s390/crypto/vfio_ap_private.h
>> @@ -75,6 +75,7 @@ struct ap_matrix {
>>    * @list:	allows the ap_matrix_mdev struct to be added to a list
>>    * @matrix:	the adapters, usage domains and control domains assigned to the
>>    *		mediated matrix device.
>> + * @shadow_apcb:    the shadow copy of the APCB field of the KVM guest's CRYCB
>>    * @group_notifier: notifier block used for specifying callback function for
>>    *		    handling the VFIO_GROUP_NOTIFY_SET_KVM event
>>    * @kvm:	the struct holding guest's state
>> @@ -82,6 +83,7 @@ struct ap_matrix {
>>   struct ap_matrix_mdev {
>>   	struct list_head node;
>>   	struct ap_matrix matrix;
>> +	struct ap_matrix shadow_apcb;
>>   	struct notifier_block group_notifier;
>>   	struct notifier_block iommu_notifier;
>>   	struct kvm *kvm;



* Re: [PATCH v10 06/16] s390/vfio-ap: introduce shadow APCB
  2020-09-29 16:04     ` Tony Krowiak
@ 2020-09-29 16:19       ` Halil Pasic
  0 siblings, 0 replies; 79+ messages in thread
From: Halil Pasic @ 2020-09-29 16:19 UTC (permalink / raw)
  To: Tony Krowiak
  Cc: linux-s390, linux-kernel, kvm, freude, borntraeger, cohuck,
	mjrosato, alex.williamson, kwankhede, fiuczy, frankja, david,
	imbrenda, hca, gor

On Tue, 29 Sep 2020 12:04:25 -0400
Tony Krowiak <akrowiak@linux.ibm.com> wrote:

> >
> > I suppose this is probably about no guest unolies no resources passed
> > through at the moment. If that is the case maybe we can document it
> > below.  
> 
> I'm not quite sure what you are saying here or what I should be
> documenting below.

No wonder, took me like 10 seconds to figure it out myself. The solution
is s/unolies/implies. I was one off to the left when typing 'imp'. 


* Re: [PATCH v10 02/16] s390/vfio-ap: use new AP bus interface to search for queue devices
  2020-09-29 13:37       ` Halil Pasic
@ 2020-09-29 20:57         ` Tony Krowiak
  0 siblings, 0 replies; 79+ messages in thread
From: Tony Krowiak @ 2020-09-29 20:57 UTC (permalink / raw)
  To: Halil Pasic
  Cc: linux-s390, linux-kernel, kvm, freude, borntraeger, cohuck,
	mjrosato, alex.williamson, kwankhede, fiuczy, frankja, david,
	imbrenda, hca, gor, kernel test robot



On 9/29/20 9:37 AM, Halil Pasic wrote:
> On Tue, 29 Sep 2020 09:07:40 -0400
> Tony Krowiak <akrowiak@linux.ibm.com> wrote:
>
>>
>> On 9/24/20 10:27 PM, Halil Pasic wrote:
>>> On Fri, 21 Aug 2020 15:56:02 -0400
>>> Tony Krowiak <akrowiak@linux.ibm.com> wrote:
>>>
>>>> --- a/drivers/s390/crypto/vfio_ap_ops.c
>>>> +++ b/drivers/s390/crypto/vfio_ap_ops.c
>>>> @@ -26,43 +26,26 @@
>>>>    
>>>>    static int vfio_ap_mdev_reset_queues(struct mdev_device *mdev);
>>>>    
>>>> -static int match_apqn(struct device *dev, const void *data)
>>>> -{
>>>> -	struct vfio_ap_queue *q = dev_get_drvdata(dev);
>>>> -
>>>> -	return (q->apqn == *(int *)(data)) ? 1 : 0;
>>>> -}
>>>> -
>>>>    /**
>>>> - * vfio_ap_get_queue: Retrieve a queue with a specific APQN from a list
>>>> - * @matrix_mdev: the associated mediated matrix
>>>> + * vfio_ap_get_queue: Retrieve a queue with a specific APQN.
>>>>     * @apqn: The queue APQN
>>>>     *
>>>> - * Retrieve a queue with a specific APQN from the list of the
>>>> - * devices of the vfio_ap_drv.
>>>> - * Verify that the APID and the APQI are set in the matrix.
>>>> + * Retrieve a queue with a specific APQN from the AP queue devices attached to
>>>> + * the AP bus.
>>>>     *
>>>> - * Returns the pointer to the associated vfio_ap_queue
>>>> + * Returns the pointer to the vfio_ap_queue with the specified APQN, or NULL.
>>>>     */
>>>> -static struct vfio_ap_queue *vfio_ap_get_queue(
>>>> -					struct ap_matrix_mdev *matrix_mdev,
>>>> -					int apqn)
>>>> +static struct vfio_ap_queue *vfio_ap_get_queue(unsigned long apqn)
>>>>    {
>>>> +	struct ap_queue *queue;
>>>>    	struct vfio_ap_queue *q;
>>>> -	struct device *dev;
>>>>    
>>>> -	if (!test_bit_inv(AP_QID_CARD(apqn), matrix_mdev->matrix.apm))
>>>> -		return NULL;
>>>> -	if (!test_bit_inv(AP_QID_QUEUE(apqn), matrix_mdev->matrix.aqm))
>>>> +	queue = ap_get_qdev(apqn);
>>>> +	if (!queue)
>>>>    		return NULL;
>>>>    
>>>> -	dev = driver_find_device(&matrix_dev->vfio_ap_drv->driver, NULL,
>>>> -				 &apqn, match_apqn);
>>>> -	if (!dev)
>>>> -		return NULL;
>>>> -	q = dev_get_drvdata(dev);
>>>> -	q->matrix_mdev = matrix_mdev;
>>>> -	put_device(dev);
>>>> +	q = dev_get_drvdata(&queue->ap_dev.device);
>>> Is this cast here safe? (I don't think it is.)
>> In the probe, we execute:
>> dev_set_drvdata(&queue->ap_dev.device, q);
>>
>> I don't get any compile or execution errors. Why wouldn't it be safe?
>>
> Because the queue may or may not be bound to the vfio_ap driver. AFAICT
> this function can be called with an arbitrary APQN.
>
> If it is bound to another driver then drvdata is not likely to hold a
> struct vfio_ap_queue.

Then the function will return NULL. All callers must check for
NULL before using the result, which is the case in all places
where this function is called.

>
>
>>>> +	put_device(&queue->ap_dev.device);
>>>>    
>>>>    	return q;
>>>>    }



* Re: [PATCH v10 07/16] s390/vfio-ap: sysfs attribute to display the guest's matrix
  2020-09-26  7:16       ` Halil Pasic
@ 2020-09-29 21:00         ` Tony Krowiak
  0 siblings, 0 replies; 79+ messages in thread
From: Tony Krowiak @ 2020-09-29 21:00 UTC (permalink / raw)
  To: Halil Pasic
  Cc: Cornelia Huck, linux-s390, linux-kernel, kvm, freude,
	borntraeger, mjrosato, alex.williamson, kwankhede, fiuczy,
	frankja, david, imbrenda, hca, gor



On 9/26/20 3:16 AM, Halil Pasic wrote:
> On Fri, 18 Sep 2020 13:09:25 -0400
> Tony Krowiak <akrowiak@linux.ibm.com> wrote:
>
>>
>> On 9/17/20 10:34 AM, Cornelia Huck wrote:
>>> On Fri, 21 Aug 2020 15:56:07 -0400
>>> Tony Krowiak <akrowiak@linux.ibm.com> wrote:
>>>
>>>> The matrix of adapters and domains configured in a guest's CRYCB may
>>>> differ from the matrix of adapters and domains assigned to the matrix mdev,
>>>> so this patch introduces a sysfs attribute to display the matrix of a guest
>>>> using the matrix mdev. For a matrix mdev denoted by $uuid, the crycb for a
>>>> guest using the matrix mdev can be displayed as follows:
>>>>
>>>>      cat /sys/devices/vfio_ap/matrix/$uuid/guest_matrix
>>>>
>>>> If a guest is not using the matrix mdev at the time the crycb is displayed,
>>>> an error (ENODEV) will be returned.
>>>>
>>>> Signed-off-by: Tony Krowiak <akrowiak@linux.ibm.com>
>>>> ---
>>>>    drivers/s390/crypto/vfio_ap_ops.c | 58 +++++++++++++++++++++++++++++++
>>>>    1 file changed, 58 insertions(+)
>>>>
>>>> diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c
>>>> index efb229033f9e..30bf23734af6 100644
>>>> --- a/drivers/s390/crypto/vfio_ap_ops.c
>>>> +++ b/drivers/s390/crypto/vfio_ap_ops.c
>>>> @@ -1119,6 +1119,63 @@ static ssize_t matrix_show(struct device *dev, struct device_attribute *attr,
>>>>    }
>>>>    static DEVICE_ATTR_RO(matrix);
>>>>    
>>>> +static ssize_t guest_matrix_show(struct device *dev,
>>>> +				 struct device_attribute *attr, char *buf)
>>>> +{
>>>> +	struct mdev_device *mdev = mdev_from_dev(dev);
>>>> +	struct ap_matrix_mdev *matrix_mdev = mdev_get_drvdata(mdev);
>>>> +	char *bufpos = buf;
>>>> +	unsigned long apid;
>>>> +	unsigned long apqi;
>>>> +	unsigned long apid1;
>>>> +	unsigned long apqi1;
>>>> +	unsigned long napm_bits = matrix_mdev->shadow_apcb.apm_max + 1;
>>>> +	unsigned long naqm_bits = matrix_mdev->shadow_apcb.aqm_max + 1;
>>>> +	int nchars = 0;
>>>> +	int n;
>>>> +
>>>> +	if (!vfio_ap_mdev_has_crycb(matrix_mdev))
>>>> +		return -ENODEV;
>>>> +
>>>> +	apid1 = find_first_bit_inv(matrix_mdev->shadow_apcb.apm, napm_bits);
>>>> +	apqi1 = find_first_bit_inv(matrix_mdev->shadow_apcb.aqm, naqm_bits);
>>>> +
>>>> +	mutex_lock(&matrix_dev->lock);
>>>> +
>>>> +	if ((apid1 < napm_bits) && (apqi1 < naqm_bits)) {
>>>> +		for_each_set_bit_inv(apid, matrix_mdev->shadow_apcb.apm,
>>>> +				     napm_bits) {
>>>> +			for_each_set_bit_inv(apqi,
>>>> +					     matrix_mdev->shadow_apcb.aqm,
>>>> +					     naqm_bits) {
>>>> +				n = sprintf(bufpos, "%02lx.%04lx\n", apid,
>>>> +					    apqi);
>>>> +				bufpos += n;
>>>> +				nchars += n;
>>>> +			}
>>>> +		}
>>>> +	} else if (apid1 < napm_bits) {
>>>> +		for_each_set_bit_inv(apid, matrix_mdev->shadow_apcb.apm,
>>>> +				     napm_bits) {
>>>> +			n = sprintf(bufpos, "%02lx.\n", apid);
>>>> +			bufpos += n;
>>>> +			nchars += n;
>>>> +		}
>>>> +	} else if (apqi1 < naqm_bits) {
>>>> +		for_each_set_bit_inv(apqi, matrix_mdev->shadow_apcb.aqm,
>>>> +				     naqm_bits) {
>>>> +			n = sprintf(bufpos, ".%04lx\n", apqi);
>>>> +			bufpos += n;
>>>> +			nchars += n;
>>>> +		}
>>>> +	}
>>>> +
>>>> +	mutex_unlock(&matrix_dev->lock);
>>>> +
>>>> +	return nchars;
>>>> +}
>>> This basically looks like a version of matrix_show() operating on the
>>> shadow apcb. I'm wondering if we could consolidate these two functions
>>> by passing in the structure to operate on as a parameter? Might not be
>>> worth the effort, though.
>> We still need the two functions because they back the mdev's
>> sysfs matrix and guest_matrix attributes, but we could call a function.
>> I'm not sure it buys us much though.
> The logic seems identical with the exception that the guest variant
> checks if vfio_ap_mdev_has_crycb(matrix_mdev). I'm not a big fan of
> duplicated code, and especially not in such close proximity. I'm voting
> for factoring out the common logic.

Not a problem, will do.
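
To make the idea concrete, here is a rough user-space sketch of the kind
of factoring being discussed. It is not the driver code: struct
toy_matrix, struct toy_mdev, toy_matrix_format() and the has_guest flag
are made-up stand-ins for struct ap_matrix, struct ap_matrix_mdev and
the vfio_ap_mdev_has_crycb() check, and the masks are plain bool arrays
rather than inverted bitmaps. Both show functions share one formatter;
only the guest variant bails out with -ENODEV when no guest is attached.

```c
#include <assert.h>
#include <errno.h>
#include <stdarg.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

#define TOY_NBITS 256	/* hypothetical: one slot per APID / APQI */

/* Hypothetical stand-in for struct ap_matrix. */
struct toy_matrix {
	bool apm[TOY_NBITS];
	bool aqm[TOY_NBITS];
};

/* Hypothetical stand-in for struct ap_matrix_mdev. */
struct toy_mdev {
	struct toy_matrix matrix;	/* assigned to the mdev */
	struct toy_matrix shadow_apcb;	/* what the guest was given */
	bool has_guest;			/* models vfio_ap_mdev_has_crycb() */
};

/* Append formatted text at @pos without ever writing past @size. */
static size_t toy_append(char *buf, size_t size, size_t pos,
			 const char *fmt, ...)
{
	va_list ap;
	int n;

	if (pos >= size)
		return pos;
	va_start(ap, fmt);
	n = vsnprintf(buf + pos, size - pos, fmt, ap);
	va_end(ap);
	if (n < 0)
		return pos;
	/* On truncation, park pos on the terminating NUL. */
	return (pos + (size_t)n < size) ? pos + (size_t)n : size - 1;
}

static bool toy_any_set(const bool *mask)
{
	for (unsigned int i = 0; i < TOY_NBITS; i++)
		if (mask[i])
			return true;
	return false;
}

/* The common logic: format whichever matrix the caller passes in. */
static long toy_matrix_format(const struct toy_matrix *m, char *buf,
			      size_t size)
{
	bool have_apid = toy_any_set(m->apm);
	bool have_apqi = toy_any_set(m->aqm);
	unsigned int apid, apqi;
	size_t pos = 0;

	assert(buf != NULL && size > 0);
	buf[0] = '\0';

	for (apid = 0; apid < TOY_NBITS; apid++) {
		if (!m->apm[apid])
			continue;
		if (!have_apqi) {	/* adapters only: "xx." */
			pos = toy_append(buf, size, pos, "%02x.\n", apid);
			continue;
		}
		for (apqi = 0; apqi < TOY_NBITS; apqi++)
			if (m->aqm[apqi])
				pos = toy_append(buf, size, pos,
						 "%02x.%04x\n", apid, apqi);
	}

	if (!have_apid)			/* domains only: ".xxxx" */
		for (apqi = 0; apqi < TOY_NBITS; apqi++)
			if (m->aqm[apqi])
				pos = toy_append(buf, size, pos, ".%04x\n",
						 apqi);
	return (long)pos;
}

static long toy_matrix_show(const struct toy_mdev *mdev, char *buf,
			    size_t size)
{
	return toy_matrix_format(&mdev->matrix, buf, size);
}

static long toy_guest_matrix_show(const struct toy_mdev *mdev, char *buf,
				  size_t size)
{
	if (!mdev->has_guest)
		return -ENODEV;
	return toy_matrix_format(&mdev->shadow_apcb, buf, size);
}
```

The two attribute callbacks stay separate, as noted above, but each one
shrinks to picking the matrix to print.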

>
> Otherwise looks OK.
>
> Regards,
> Halil
>



* Re: [PATCH v10 08/16] s390/vfio-ap: filter matrix for unavailable queue devices
  2020-09-26  8:24   ` Halil Pasic
@ 2020-09-29 21:59     ` Tony Krowiak
  0 siblings, 0 replies; 79+ messages in thread
From: Tony Krowiak @ 2020-09-29 21:59 UTC (permalink / raw)
  To: Halil Pasic
  Cc: linux-s390, linux-kernel, kvm, freude, borntraeger, cohuck,
	mjrosato, alex.williamson, kwankhede, fiuczy, frankja, david,
	imbrenda, hca, gor



On 9/26/20 4:24 AM, Halil Pasic wrote:
> On Fri, 21 Aug 2020 15:56:08 -0400
> Tony Krowiak <akrowiak@linux.ibm.com> wrote:
>
>> Even though APQNs for queues that are not in the host's AP configuration
>> may be assigned to a matrix mdev, we do not want to set bits in the guest's
>> APCB for APQNs that do not reference AP queue devices bound to the vfio_ap
>> device driver. Ideally, it would be great if such APQNs could be filtered
>> out before setting the bits in the guest's APCB; however, the architecture
>> precludes filtering individual APQNs. Consequently, either the APID or APQI
>> must be filtered.
>>
>> This patch introduces code to filter the APIDs or APQIs assigned to the
>> matrix mdev's AP configuration before assigning them to the guest's AP
>> configuration (i.e., APCB). We'll start by filtering the APIDs:
>>
>>     If an APQN assigned to the matrix mdev's AP configuration does not
>>     reference a queue device bound to the vfio_ap device driver, the APID
>>     will be filtered out (i.e., not assigned to the guest's APCB).
>>
>> If every APID assigned to the matrix mdev is filtered out, then we'll try
>> filtering the APQI's:
>>
>>     If an APQN assigned to the matrix mdev's AP configuration does not
>>     reference a queue device bound to the vfio_ap device driver, the APQI
>>     will be filtered out (i.e., not assigned to the guest's APCB).
>>
>> In any case, if after filtering either the APIDs or APQIs there are any
>> APQNs that can be assigned to the guest's APCB, they will be assigned and
>> the CRYCB will be hot plugged into the guest.
>>
>> Example
>> =======
>>
>> APQNs bound to vfio_ap device driver:
>>     04.0004
>>     04.0047
>>     04.0054
>>
>>     05.0005
>>     05.0047
>>     05.0054
>>
>> Assignments to matrix mdev:
>>     APIDs  APQIs  -> APQNs
>>     04     0004      04.0004
>>     05     0005      04.0005
>>            0047      04.0047
>>            0054      04.0054
>>                      05.0004
>>                      05.0005
>>                      05.0047
>>                      05.0054
>>
>> Filter APIDs:
>>     APID 04 will be filtered because APQN 04.0005 is not bound.
>>     APID 05 will be filtered because APQN 05.0004 is not bound.
>>     APQNs remaining: None
>>
>> Filter APQIs:
>>     APQI 04 will be filtered because APQN 05.0004 is not bound.
>>     APQI 05 will be filtered because APQN 04.0005 is not bound.
>>     APQNs remaining: 04.0047, 04.0054, 05.0047, 05.0054
>>
>> APQNs 04.0047, 04.0054, 05.0047, 05.0054 will be assigned to the CRYCB and
>> hot plugged into the KVM guest.
>>
> I find this logic where we first do one strategy, and if nothing remains
> do the other strategy a little confusing. I will ramble on about it some
> more in the code.
>
>> Signed-off-by: Tony Krowiak <akrowiak@linux.ibm.com>
>> ---
>>   drivers/s390/crypto/vfio_ap_ops.c | 159 +++++++++++++++++++++++++++++-
>>   1 file changed, 155 insertions(+), 4 deletions(-)
>>
>> diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c
>> index 30bf23734af6..eaf4e9eab6cb 100644
>> --- a/drivers/s390/crypto/vfio_ap_ops.c
>> +++ b/drivers/s390/crypto/vfio_ap_ops.c
>> @@ -326,7 +326,7 @@ static bool vfio_ap_mdev_has_crycb(struct ap_matrix_mdev *matrix_mdev)
>>   	return (matrix_mdev->kvm && matrix_mdev->kvm->arch.crypto.crycbd);
>>   }
>>   
>> -static void vfio_ap_mdev_commit_crycb(struct ap_matrix_mdev *matrix_mdev)
>> +static void vfio_ap_mdev_commit_shadow_apcb(struct ap_matrix_mdev *matrix_mdev)
>>   {
>>   	kvm_arch_crypto_set_masks(matrix_mdev->kvm,
>>   				  matrix_mdev->shadow_apcb.apm,
>> @@ -597,6 +597,157 @@ static int vfio_ap_mdev_verify_no_sharing(struct ap_matrix_mdev *matrix_mdev,
>>   	return 0;
>>   }
>>   
>> +/**
>> + * vfio_ap_mdev_filter_matrix
>> + *
>> + * Filter APQNs assigned to the matrix mdev that do not reference an AP queue
>> + * device bound to the vfio_ap device driver.
>> + *
>> + * @matrix_mdev:  the matrix mdev whose AP configuration is to be filtered
>> + * @shadow_apcb:  the shadow of the KVM guest's APCB (contains AP configuration
>> + *		  for guest)
>> + * @filter_apids: boolean value indicating whether the APQNs shall be filtered
>> + *		  by APID (true) or by APQI (false).
>> + *
>> + * Returns the number of APQNs remaining after filtering is complete.
>> + */
>> +static int vfio_ap_mdev_filter_matrix(struct ap_matrix_mdev *matrix_mdev,
>> +				      struct ap_matrix *shadow_apcb,
>> +				      bool filter_apids)
>> +{
>> +	unsigned long apid, apqi, apqn;
>> +
>> +	memcpy(shadow_apcb, &matrix_mdev->matrix, sizeof(*shadow_apcb));
>> +
>> +	for_each_set_bit_inv(apid, matrix_mdev->matrix.apm, AP_DEVICES) {
>> +		/*
>> +		 * If the APID is not assigned to the host AP configuration,
>> +		 * we can not assign it to the guest's AP configuration
>> +		 */
>> +		if (!test_bit_inv(apid,
>> +				  (unsigned long *)matrix_dev->info.apm)) {
> The patch description and the code seem to be out of sync. Here you do
> some filtering based on the host's AP config info read at module
> initialization time.
>
>> +			clear_bit_inv(apid, shadow_apcb->apm);
>> +			continue;
>> +		}
>> +
>> +		for_each_set_bit_inv(apqi, matrix_mdev->matrix.aqm,
>> +				     AP_DOMAINS) {
>> +			/*
>> +			 * If the APQI is not assigned to the host AP
>> +			 * configuration, then it can not be assigned to the
>> +			 * guest's AP configuration
>> +			 */
>> +			if (!test_bit_inv(apqi, (unsigned long *)
>> +					  matrix_dev->info.aqm)) {
>> +				clear_bit_inv(apqi, shadow_apcb->aqm);
>> +				continue;
>> +			}
>> +
>> +			/*
>> +			 * If the APQN is not bound to the vfio_ap device
>> +			 * driver, then we can't assign it to the guest's
>> +			 * AP configuration. The AP architecture won't
>> +			 * allow filtering of a single APQN, so if we're
>> +			 * filtering APIDs, then filter the APID; otherwise,
>> +			 * filter the APQI.
>> +			 */
>> +			apqn = AP_MKQID(apid, apqi);
>> +			if (!vfio_ap_get_queue(apqn)) {
> Is this really gonna give NULL if the queue is not bound to vfio-ap? I
> don't think so. This will get NULL if the queue is not known to the AP
> bus, or has no driver-data assigned. In the current state it should give
> you non-NULL if another driver has the queue, and maintains its own
> driver specific data in drvdata.

It will give you a NULL if the zcrypt driver has the queue because
no zcrypt driver sets any drvdata. You do bring up a good point though
because there is no guarantee that another driver will never set
the driver data for a queue device. Consequently, I will be changing the
vfio_ap_get_queue(apqn) function to check the driver associated with
the device and return NULL if it is not the vfio_ap driver.
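
Roughly what I have in mind, as a user-space sketch rather than the
actual change: struct toy_driver, struct toy_queue_dev, toy_vfio_ap_drv
and toy_get_queue() are hypothetical names, and the array walk stands in
for however the real function looks up the queue device. The point is
only that drvdata is handed back when, and only when, the bound driver
is vfio_ap.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a device driver. */
struct toy_driver {
	const char *name;
};

/* Hypothetical stand-in for an AP queue device. */
struct toy_queue_dev {
	int apqn;
	const struct toy_driver *driver;	/* NULL when unbound */
	void *drvdata;				/* owned by whichever driver is bound */
};

static const struct toy_driver toy_vfio_ap_drv = { "vfio_ap" };

/*
 * Return the driver data for @apqn only if the queue device is bound to
 * the (toy) vfio_ap driver. A queue that is unknown, unbound, or bound
 * to some other driver yields NULL, even if that driver set drvdata.
 */
static void *toy_get_queue(const struct toy_queue_dev *devs, size_t ndevs,
			   int apqn)
{
	size_t i;

	assert(devs != NULL || ndevs == 0);

	for (i = 0; i < ndevs; i++) {
		if (devs[i].apqn != apqn)
			continue;
		if (devs[i].driver != &toy_vfio_ap_drv)
			return NULL;
		return devs[i].drvdata;
	}

	return NULL;
}
```

With that check in place, the filtering code no longer depends on what
other drivers do or do not store in drvdata.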

>
>> +				if (filter_apids)
>> +					clear_bit_inv(apid, shadow_apcb->apm);
>> +				else
>> +					clear_bit_inv(apqi, shadow_apcb->aqm);
>> +				break;
>> +			}
>> +		}
>> +
>> +		/*
>> +		 * If we're filtering APQIs and all of them have been filtered,
>> +		 * there's no need to continue filtering.
>> +		 */
>> +		if (!filter_apids)
>> +			if (bitmap_empty(shadow_apcb->aqm, AP_DOMAINS))
>> +				break;
>> +	}
>> +
>> +	return bitmap_weight(shadow_apcb->apm, AP_DEVICES) *
>> +	       bitmap_weight(shadow_apcb->aqm, AP_DOMAINS);
>> +}
>> +
>> +/**
>> + * vfio_ap_mdev_config_shadow_apcb
>> + *
>> + * Configure the shadow of a KVM guest's APCB specifying the adapters, domains
>> + * and control domains to be assigned to the guest. The shadow APCB will be
>> + * configured after filtering the APQNs assigned to the matrix mdev that do not
>> + * reference a queue device bound to the vfio_ap device driver.
>> + *
>> + * @matrix_mdev: the matrix mdev whose shadow APCB is to be configured.
>> + *
>> + * Returns true if the shadow APCB contents have been changed; otherwise,
>> + * returns false.
>> + */
>> +static bool vfio_ap_mdev_config_shadow_apcb(struct ap_matrix_mdev *matrix_mdev)
>> +{
>> +	int napm, naqm;
>> +	struct ap_matrix shadow_apcb;
>> +
>> +	vfio_ap_matrix_init(&matrix_dev->info, &shadow_apcb);
>> +	napm = bitmap_weight(matrix_mdev->matrix.apm, AP_DEVICES);
>> +	naqm = bitmap_weight(matrix_mdev->matrix.aqm, AP_DOMAINS);
>> +
>> +	/*
>> +	 * If there are no APIDs or no APQIs assigned to the matrix mdev,
>> +	 * then no APQNs shall be assigned to the guest CRYCB.
>> +	 */
>> +	if ((napm != 0) || (naqm != 0)) {
>> +		/*
>> +		 * Filter the APIDs assigned to the matrix mdev for APQNs that
>> +		 * do not reference an AP queue device bound to the driver.
>> +		 */
>> +		napm = vfio_ap_mdev_filter_matrix(matrix_mdev, &shadow_apcb,
>> +						  true);
>> +		/*
>> +		 * If there are no APQNs that can be assigned to the guest's
>> +		 * CRYCB after filtering, then try filtering the APQIs.
>> +		 */
>> +		if (napm == 0) {
> When do we expect this to happen? Currently we don't assign queues that
> are not bound to us, and we have ->in_use() that inhibits disappearance
> of queues due to re-partitioning.

This will happen when domains are over-provisioned for a matrix
mdev. Suppose the following APQNs are assigned to the matrix
mdev:

00.0000
00.0004
00.0047

If queue 00.0047 is not bound to the vfio_ap device driver, then
the filtering code will filter APID 00.
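
The example above can be modelled with a small user-space sketch. This
is not the patch code: struct toy_matrix, TOY_MKQID(), toy_filter() and
toy_config_guest() are made-up names, the masks are bool arrays instead
of inverted bitmaps, and the set of bound queues is passed in as a plain
list of APQNs. It only illustrates the two-pass idea: filter by APID
first, and fall back to filtering by APQI if nothing is left.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define TOY_NBITS 256
/* Hypothetical APQN encoding: APID in the high byte, APQI in the low. */
#define TOY_MKQID(apid, apqi) ((((apid) & 0xff) << 8) | ((apqi) & 0xff))

struct toy_matrix {
	bool apm[TOY_NBITS];	/* adapters (APIDs) */
	bool aqm[TOY_NBITS];	/* domains (APQIs) */
};

static bool toy_is_bound(const int *bound, size_t nbound, int apqn)
{
	for (size_t i = 0; i < nbound; i++)
		if (bound[i] == apqn)
			return true;
	return false;
}

static int toy_weight(const bool *mask)
{
	int n = 0;

	for (int i = 0; i < TOY_NBITS; i++)
		n += mask[i];
	return n;
}

/*
 * Copy @assigned to @out, then drop the APID (or the APQI) of every
 * APQN that is not in @bound. Returns the number of APQNs left.
 */
static int toy_filter(const struct toy_matrix *assigned,
		      struct toy_matrix *out, const int *bound,
		      size_t nbound, bool filter_apids)
{
	assert(assigned != NULL && out != NULL && assigned != out);
	*out = *assigned;

	for (int apid = 0; apid < TOY_NBITS; apid++) {
		if (!assigned->apm[apid])
			continue;
		for (int apqi = 0; apqi < TOY_NBITS; apqi++) {
			if (!assigned->aqm[apqi])
				continue;
			if (toy_is_bound(bound, nbound,
					 TOY_MKQID(apid, apqi)))
				continue;
			if (filter_apids) {
				out->apm[apid] = false;
				break;	/* whole adapter is gone */
			}
			out->aqm[apqi] = false;
		}
	}

	return toy_weight(out->apm) * toy_weight(out->aqm);
}

/* Try APID filtering first, then APQI filtering, else give nothing. */
static int toy_config_guest(const struct toy_matrix *assigned,
			    struct toy_matrix *guest, const int *bound,
			    size_t nbound)
{
	int n = toy_filter(assigned, guest, bound, nbound, true);

	if (n == 0)
		n = toy_filter(assigned, guest, bound, nbound, false);
	if (n == 0)
		memset(guest, 0, sizeof(*guest));
	return n;
}
```

Feeding it the three APQNs above with only 00.0000 and 00.0004 bound,
the APID pass removes adapter 00 and leaves nothing, so the APQI pass
runs and removes domain 0047 instead, leaving 00.0000 and 00.0004 for
the guest.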

Is your objection that this patch occurs prior to the
patch that implements over-provisioning? I guess your confusion
makes sense given over-provisioning is not introduced until after
this patch. Maybe I should re-order these two patches.

>
> So what we are left with is queue becomes unavailable to the host
> because of a config change, and maybe manual unbind -- not sure about
> that.

A queue can be removed from the vfio_ap device driver for the
following reasons:

1. The apmask or aqmask change can result in a queue device
     being unbound from vfio_ap.

2. A queue device can be manually unbound from vfio_ap.

3. A queue device can be unbound due to dynamic deconfiguration of
     the adapter via the SE or SCLP Deconfigure Adjunct Processor
     command (i.e., a configuration change)

>
> Now if matrix_dev->info was to reflect the config the bus acts by, which
> seems to be the idea behind patch 12, we could react accordingly (if the
> domain is gone filter aqm).

That is handled in patch 13 which is the callback that handles the
notification introduced in patch 12. That patch does not use this
filtering code, however.

>
> I mean, the purpose of this callback seems to be getting us out of
> trouble when domains are missing across all cards (i.e. some domains
> were assigned away from us on the lower level).
>
> Or am I missing something?

I think you are missing the fact that there are other reasons
why a queue device may not be bound to vfio_ap (see reasons
above).

>
>> +			naqm = vfio_ap_mdev_filter_matrix(matrix_mdev,
>> +							  &shadow_apcb, false);
>> +
>> +			/*
>> +			 * If there are no APQNs that can be assigned to the
>> +			 * matrix mdev after filtering the APQIs, then no APQNs
>> +			 * shall be assigned to the guest's CRYCB.
>> +			 */
>> +			if (naqm == 0) {
>> +				bitmap_clear(shadow_apcb.apm, 0, AP_DEVICES);
>> +				bitmap_clear(shadow_apcb.aqm, 0, AP_DOMAINS);
>> +			}
>> +		}
>> +	}
>> +
>> +	/*
>> +	 * If the guest's AP configuration has not changed, then return
>> +	 * indicating such.
>> +	 */
>> +	if (bitmap_equal(matrix_mdev->shadow_apcb.apm, shadow_apcb.apm,
>> +			 AP_DEVICES) &&
>> +	    bitmap_equal(matrix_mdev->shadow_apcb.aqm, shadow_apcb.aqm,
>> +			 AP_DOMAINS) &&
>> +	    bitmap_equal(matrix_mdev->shadow_apcb.adm, shadow_apcb.adm,
>> +			 AP_DOMAINS))
>> +		return false;
>> +
>> +	/*
>> +	 * Copy the changes to the guest's CRYCB, then return indicating that
>> +	 * the guest's AP configuration has changed.
>> +	 */
>> +	memcpy(&matrix_mdev->shadow_apcb, &shadow_apcb, sizeof(shadow_apcb));
>> +
>> +	return true;
>> +}
>> +
>>   enum qlink_type {
>>   	LINK_APID,
>>   	LINK_APQI,
>> @@ -1284,9 +1435,8 @@ static int vfio_ap_mdev_group_notifier(struct notifier_block *nb,
>>   	if (!vfio_ap_mdev_has_crycb(matrix_mdev))
>>   		return NOTIFY_DONE;
>>   
>> -	memcpy(&matrix_mdev->shadow_apcb, &matrix_mdev->matrix,
>> -	       sizeof(matrix_mdev->shadow_apcb));
>> -	vfio_ap_mdev_commit_crycb(matrix_mdev);
>> +	if (vfio_ap_mdev_config_shadow_apcb(matrix_mdev))
>> +		vfio_ap_mdev_commit_shadow_apcb(matrix_mdev);
>>   
>>   	return NOTIFY_OK;
>>   }
>> @@ -1396,6 +1546,7 @@ static void vfio_ap_mdev_release(struct mdev_device *mdev)
>>   	mutex_lock(&matrix_dev->lock);
>>   	if (matrix_mdev->kvm) {
>>   		kvm_arch_crypto_clear_masks(matrix_mdev->kvm);
>> +		vfio_ap_matrix_clear_masks(&matrix_mdev->shadow_apcb);
>>   		matrix_mdev->kvm->arch.crypto.pqap_hook = NULL;
>>   		vfio_ap_mdev_reset_queues(mdev);
>>   		kvm_put_kvm(matrix_mdev->kvm);



* Re: [PATCH v10 09/16] s390/vfio-ap: allow assignment of unavailable AP queues to mdev device
  2020-09-26 23:49   ` Halil Pasic
@ 2020-09-30 12:59     ` Tony Krowiak
  2020-09-30 22:29       ` Halil Pasic
  0 siblings, 1 reply; 79+ messages in thread
From: Tony Krowiak @ 2020-09-30 12:59 UTC (permalink / raw)
  To: Halil Pasic
  Cc: linux-s390, linux-kernel, kvm, freude, borntraeger, cohuck,
	mjrosato, alex.williamson, kwankhede, fiuczy, frankja, david,
	imbrenda, hca, gor



On 9/26/20 7:49 PM, Halil Pasic wrote:
> On Fri, 21 Aug 2020 15:56:09 -0400
> Tony Krowiak <akrowiak@linux.ibm.com> wrote:
>
>> The current implementation does not allow assignment of an AP adapter or
>> domain to an mdev device if the APQNs resulting from the assignment
>> do not reference AP queue devices that are bound to the vfio_ap device
>> driver. This patch allows assignment of AP resources to the matrix mdev as
>> long as the APQNs resulting from the assignment:
>>     1. Are not reserved by the AP BUS for use by the zcrypt device drivers.
>>     2. Are not assigned to another matrix mdev.
>>
>> The rationale behind this is twofold:
>>     1. The AP architecture does not preclude assignment of APQNs to an AP
>>        configuration that are not available to the system.
>>     2. APQNs that do not reference a queue device bound to the vfio_ap
>>        device driver will not be assigned to the guest's CRYCB, so the
>>        guest will not get access to queues not bound to the vfio_ap driver.
>>
>> Signed-off-by: Tony Krowiak <akrowiak@linux.ibm.com>
>> ---
>>   drivers/s390/crypto/vfio_ap_ops.c | 212 +++++-------------------------
>>   1 file changed, 35 insertions(+), 177 deletions(-)
>>
>> diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c
>> index eaf4e9eab6cb..24fd47e43b80 100644
>> --- a/drivers/s390/crypto/vfio_ap_ops.c
>> +++ b/drivers/s390/crypto/vfio_ap_ops.c
>> @@ -1,4 +1,3 @@
>> -// SPDX-License-Identifier: GPL-2.0+
> Probably not intentional, or?

Definitely not intentional. I'll restore it.

>
>>   /*
>>    * Adjunct processor matrix VFIO device driver callbacks.
>>    *
>> @@ -420,122 +419,6 @@ static struct attribute_group *vfio_ap_mdev_type_groups[] = {
>>   	NULL,
>>   };
>>   
>> -struct vfio_ap_queue_reserved {
>> -	unsigned long *apid;
>> -	unsigned long *apqi;
>> -	bool reserved;
>> -};
>> -
>> -/**
>> - * vfio_ap_has_queue
>> - *
>> - * @dev: an AP queue device
>> - * @data: a struct vfio_ap_queue_reserved reference
>> - *
>> - * Flags whether the AP queue device (@dev) has a queue ID containing the APQN,
>> - * apid or apqi specified in @data:
>> - *
>> - * - If @data contains both an apid and apqi value, then @data will be flagged
>> - *   as reserved if the APID and APQI fields for the AP queue device matches
>> - *
>> - * - If @data contains only an apid value, @data will be flagged as
>> - *   reserved if the APID field in the AP queue device matches
>> - *
>> - * - If @data contains only an apqi value, @data will be flagged as
>> - *   reserved if the APQI field in the AP queue device matches
>> - *
>> - * Returns 0 to indicate the input to function succeeded. Returns -EINVAL if
>> - * @data does not contain either an apid or apqi.
>> - */
>> -static int vfio_ap_has_queue(struct device *dev, void *data)
>> -{
>> -	struct vfio_ap_queue_reserved *qres = data;
>> -	struct ap_queue *ap_queue = to_ap_queue(dev);
>> -	ap_qid_t qid;
>> -	unsigned long id;
>> -
>> -	if (qres->apid && qres->apqi) {
>> -		qid = AP_MKQID(*qres->apid, *qres->apqi);
>> -		if (qid == ap_queue->qid)
>> -			qres->reserved = true;
>> -	} else if (qres->apid && !qres->apqi) {
>> -		id = AP_QID_CARD(ap_queue->qid);
>> -		if (id == *qres->apid)
>> -			qres->reserved = true;
>> -	} else if (!qres->apid && qres->apqi) {
>> -		id = AP_QID_QUEUE(ap_queue->qid);
>> -		if (id == *qres->apqi)
>> -			qres->reserved = true;
>> -	} else {
>> -		return -EINVAL;
>> -	}
>> -
>> -	return 0;
>> -}
>> -
>> -/**
>> - * vfio_ap_verify_queue_reserved
>> - *
>> - * @matrix_dev: a mediated matrix device
>> - * @apid: an AP adapter ID
>> - * @apqi: an AP queue index
>> - *
>> - * Verifies that the AP queue with @apid/@apqi is reserved by the VFIO AP device
>> - * driver according to the following rules:
>> - *
>> - * - If both @apid and @apqi are not NULL, then there must be an AP queue
>> - *   device bound to the vfio_ap driver with the APQN identified by @apid and
>> - *   @apqi
>> - *
>> - * - If only @apid is not NULL, then there must be an AP queue device bound
>> - *   to the vfio_ap driver with an APQN containing @apid
>> - *
>> - * - If only @apqi is not NULL, then there must be an AP queue device bound
>> - *   to the vfio_ap driver with an APQN containing @apqi
>> - *
>> - * Returns 0 if the AP queue is reserved; otherwise, returns -EADDRNOTAVAIL.
>> - */
>> -static int vfio_ap_verify_queue_reserved(unsigned long *apid,
>> -					 unsigned long *apqi)
>> -{
>> -	int ret;
>> -	struct vfio_ap_queue_reserved qres;
>> -
>> -	qres.apid = apid;
>> -	qres.apqi = apqi;
>> -	qres.reserved = false;
>> -
>> -	ret = driver_for_each_device(&matrix_dev->vfio_ap_drv->driver, NULL,
>> -				     &qres, vfio_ap_has_queue);
>> -	if (ret)
>> -		return ret;
>> -
>> -	if (qres.reserved)
>> -		return 0;
>> -
>> -	return -EADDRNOTAVAIL;
>> -}
>> -
>> -static int
>> -vfio_ap_mdev_verify_queues_reserved_for_apid(struct ap_matrix_mdev *matrix_mdev,
>> -					     unsigned long apid)
>> -{
>> -	int ret;
>> -	unsigned long apqi;
>> -	unsigned long nbits = matrix_mdev->matrix.aqm_max + 1;
>> -
>> -	if (find_first_bit_inv(matrix_mdev->matrix.aqm, nbits) >= nbits)
>> -		return vfio_ap_verify_queue_reserved(&apid, NULL);
>> -
>> -	for_each_set_bit_inv(apqi, matrix_mdev->matrix.aqm, nbits) {
>> -		ret = vfio_ap_verify_queue_reserved(&apid, &apqi);
>> -		if (ret)
>> -			return ret;
>> -	}
>> -
>> -	return 0;
>> -}
>> -
>>   #define MDEV_SHARING_ERR "Userspace may not re-assign queue %02lx.%04lx " \
>>   			 "already assigned to %s"
>>   
>> @@ -572,6 +455,11 @@ static int vfio_ap_mdev_verify_no_sharing(struct ap_matrix_mdev *matrix_mdev,
>>   	DECLARE_BITMAP(aqm, AP_DOMAINS);
>>   
>>   	list_for_each_entry(lstdev, &matrix_dev->mdev_list, node) {
>> +		/*
>> +		 * If either of the input masks belongs to the mdev to which an
>> +		 * AP resource is being assigned, then we don't need to verify
>> +		 * that mdev's masks.
>> +		 */
>>   		if (matrix_mdev == lstdev)
>>   			continue;
>>   
> Seems unrelated.

What seems unrelated? The matrix_mdev passed in is the mdev to which
assignment is being made. This function is verifying that no APQN
assigned to the matrix_mdev is assigned to any other mdev. Since we are
looping through all mdevs here, we are skipping the verification if the
current mdev being examined is the same as the matrix_mdev to which the
assignment is being made. Maybe I'm not understanding your point here.

>
>> @@ -597,6 +485,20 @@ static int vfio_ap_mdev_verify_no_sharing(struct ap_matrix_mdev *matrix_mdev,
>>   	return 0;
>>   }
>>   
>> +static int vfio_ap_mdev_validate_masks(struct ap_matrix_mdev *matrix_mdev,
>> +				       unsigned long *mdev_apm,
>> +				       unsigned long *mdev_aqm)
>> +{
>> +	DECLARE_BITMAP(apm, AP_DEVICES);
>> +	DECLARE_BITMAP(aqm, AP_DOMAINS);
>> +
>> +	if (bitmap_and(apm, mdev_apm, ap_perms.apm, AP_DEVICES) &&
>> +	    bitmap_and(aqm, mdev_aqm, ap_perms.aqm, AP_DOMAINS))
> Isn't ap_perms supposed to be protected by ap_perms_mutex? In theory
> you could end up with a torn write (catch the a[pq]mask_commit() with
> its pants down, in a sense that only a part of the memcpy was done (and
> became observable on the other CPU doing this validate).

Good catch. I should probably use the
ap_apqn_in_matrix_owned_by_def_drv(apm, aqm) function in ap_bus.c.

>
>> +		return -EADDRNOTAVAIL;
>> +
>> +	return vfio_ap_mdev_verify_no_sharing(matrix_mdev, mdev_apm, mdev_aqm);
>> +}
>> +
>>   /**
>>    * vfio_ap_mdev_filter_matrix
>>    *
>> @@ -882,33 +784,21 @@ static ssize_t assign_adapter_store(struct device *dev,
>>   	if (apid > matrix_mdev->matrix.apm_max)
>>   		return -ENODEV;
>>   
>> -	/*
>> -	 * Set the bit in the AP mask (APM) corresponding to the AP adapter
>> -	 * number (APID). The bits in the mask, from most significant to least
>> -	 * significant bit, correspond to APIDs 0-255.
>> -	 */
>> -	mutex_lock(&matrix_dev->lock);
>> -
>> -	ret = vfio_ap_mdev_verify_queues_reserved_for_apid(matrix_mdev, apid);
>> -	if (ret)
>> -		goto done;
>> -
>>   	memset(apm, 0, sizeof(apm));
>>   	set_bit_inv(apid, apm);
>>   
>> -	ret = vfio_ap_mdev_verify_no_sharing(matrix_mdev, apm,
>> -					     matrix_mdev->matrix.aqm);
>> -	if (ret)
>> -		goto done;
>> -
>> +	mutex_lock(&matrix_dev->lock);
>> +	ret = vfio_ap_mdev_validate_masks(matrix_mdev, apm,
>> +					  matrix_mdev->matrix.aqm);
>> +	if (ret) {
>> +		mutex_unlock(&matrix_dev->lock);
>> +		return ret;
>> +	}
> At this point the ap_perms may have already changed, or?

Both this function and the in_use callback take
the matrix_dev->lock; therefore, the ap_perms will not be changed until
getting an answer from the in_use callback which will be blocked until
this assignment function releases the lock. Does that sound about
right?

>
>>   	set_bit_inv(apid, matrix_mdev->matrix.apm);
>>   	vfio_ap_mdev_link_queues(matrix_mdev, LINK_APID, apid);
>> -	ret = count;
>> -
>> -done:
>>   	mutex_unlock(&matrix_dev->lock);
>>   
>> -	return ret;
>> +	return count;
>>   }
>>   static DEVICE_ATTR_WO(assign_adapter);
>>   
>> @@ -958,26 +848,6 @@ static ssize_t unassign_adapter_store(struct device *dev,
>>   }
>>   static DEVICE_ATTR_WO(unassign_adapter);
>>   
>> -static int
>> -vfio_ap_mdev_verify_queues_reserved_for_apqi(struct ap_matrix_mdev *matrix_mdev,
>> -					     unsigned long apqi)
>> -{
>> -	int ret;
>> -	unsigned long apid;
>> -	unsigned long nbits = matrix_mdev->matrix.apm_max + 1;
>> -
>> -	if (find_first_bit_inv(matrix_mdev->matrix.apm, nbits) >= nbits)
>> -		return vfio_ap_verify_queue_reserved(NULL, &apqi);
>> -
>> -	for_each_set_bit_inv(apid, matrix_mdev->matrix.apm, nbits) {
>> -		ret = vfio_ap_verify_queue_reserved(&apid, &apqi);
>> -		if (ret)
>> -			return ret;
>> -	}
>> -
>> -	return 0;
>> -}
>> -
>>   /**
>>    * assign_domain_store
>>    *
>> @@ -1031,28 +901,21 @@ static ssize_t assign_domain_store(struct device *dev,
>>   	if (apqi > max_apqi)
>>   		return -ENODEV;
>>   
>> -	mutex_lock(&matrix_dev->lock);
>> -
>> -	ret = vfio_ap_mdev_verify_queues_reserved_for_apqi(matrix_mdev, apqi);
>> -	if (ret)
>> -		goto done;
>> -
>>   	memset(aqm, 0, sizeof(aqm));
>>   	set_bit_inv(apqi, aqm);
>>   
>> -	ret = vfio_ap_mdev_verify_no_sharing(matrix_mdev,
>> -					     matrix_mdev->matrix.apm, aqm);
>> -	if (ret)
>> -		goto done;
>> -
>> +	mutex_lock(&matrix_dev->lock);
>> +	ret = vfio_ap_mdev_validate_masks(matrix_mdev, matrix_mdev->matrix.apm,
>> +					  aqm);
>> +	if (ret) {
>> +		mutex_unlock(&matrix_dev->lock);
>> +		return ret;
>> +	}
>>   	set_bit_inv(apqi, matrix_mdev->matrix.aqm);
>>   	vfio_ap_mdev_link_queues(matrix_mdev, LINK_APQI, apqi);
>> -	ret = count;
>> -
>> -done:
>>   	mutex_unlock(&matrix_dev->lock);
>>   
>> -	return ret;
>> +	return count;
>>   }
>>   static DEVICE_ATTR_WO(assign_domain);
>>   
>> @@ -1139,11 +1002,6 @@ static ssize_t assign_control_domain_store(struct device *dev,
>>   	if (id > matrix_mdev->matrix.adm_max)
>>   		return -ENODEV;
>>   
>> -	/* Set the bit in the ADM (bitmask) corresponding to the AP control
>> -	 * domain number (id). The bits in the mask, from most significant to
>> -	 * least significant, correspond to IDs 0 up to the one less than the
>> -	 * number of control domains that can be assigned.
>> -	 */
>>   	mutex_lock(&matrix_dev->lock);
>>   	set_bit_inv(id, matrix_mdev->matrix.adm);
>>   	mutex_unlock(&matrix_dev->lock);



* Re: [PATCH v10 10/16] s390/vfio-ap: allow configuration of matrix mdev in use by a KVM guest
  2020-09-27  0:03   ` Halil Pasic
@ 2020-09-30 13:19     ` Tony Krowiak
  0 siblings, 0 replies; 79+ messages in thread
From: Tony Krowiak @ 2020-09-30 13:19 UTC (permalink / raw)
  To: Halil Pasic
  Cc: linux-s390, linux-kernel, kvm, freude, borntraeger, cohuck,
	mjrosato, alex.williamson, kwankhede, fiuczy, frankja, david,
	imbrenda, hca, gor



On 9/26/20 8:03 PM, Halil Pasic wrote:
> On Fri, 21 Aug 2020 15:56:10 -0400
> Tony Krowiak <akrowiak@linux.ibm.com> wrote:
>
>> The current support for pass-through crypto adapters does not allow
>> configuration of a matrix mdev when it is in use by a KVM guest. Let's
>> allow AP resources - i.e., adapters, domains and control domains - to be
>> assigned to or unassigned from a matrix mdev while it is in use by a guest.
>> This is in preparation for the introduction of support for dynamic
>> configuration of the AP matrix for a running KVM guest.
> AFAIU this will let the user do the assign, which will however only take
> effect if the same mdev is re-used with a freshly constructed VM, or?
>
> This is however supposed to change real soon (in patch 11). From the
> perspective of bisectability we would end up with a single commit that
> acts funny.
>
> How about switching up patches 10 and 11. This way the changes you have
> in the current 11 would remain dormant until the changes in the current
> 10 enable the complete new feature (hotplug)?

I can do that, but maybe it makes more sense to squash patches 10
and 11 since they are completely dependent on each other. What say
you?

>
>
>> Signed-off-by: Tony Krowiak <akrowiak@linux.ibm.com>
>> ---
>>   drivers/s390/crypto/vfio_ap_ops.c | 24 ------------------------
>>   1 file changed, 24 deletions(-)
>>
>> diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c
>> index 24fd47e43b80..cf3321eb239b 100644
>> --- a/drivers/s390/crypto/vfio_ap_ops.c
>> +++ b/drivers/s390/crypto/vfio_ap_ops.c
>> @@ -773,10 +773,6 @@ static ssize_t assign_adapter_store(struct device *dev,
>>   	struct mdev_device *mdev = mdev_from_dev(dev);
>>   	struct ap_matrix_mdev *matrix_mdev = mdev_get_drvdata(mdev);
>>   
>> -	/* If the guest is running, disallow assignment of adapter */
>> -	if (matrix_mdev->kvm)
>> -		return -EBUSY;
>> -
>>   	ret = kstrtoul(buf, 0, &apid);
>>   	if (ret)
>>   		return ret;
>> @@ -828,10 +824,6 @@ static ssize_t unassign_adapter_store(struct device *dev,
>>   	struct mdev_device *mdev = mdev_from_dev(dev);
>>   	struct ap_matrix_mdev *matrix_mdev = mdev_get_drvdata(mdev);
>>   
>> -	/* If the guest is running, disallow un-assignment of adapter */
>> -	if (matrix_mdev->kvm)
>> -		return -EBUSY;
>> -
>>   	ret = kstrtoul(buf, 0, &apid);
>>   	if (ret)
>>   		return ret;
>> @@ -891,10 +883,6 @@ static ssize_t assign_domain_store(struct device *dev,
>>   	struct ap_matrix_mdev *matrix_mdev = mdev_get_drvdata(mdev);
>>   	unsigned long max_apqi = matrix_mdev->matrix.aqm_max;
>>   
>> -	/* If the guest is running, disallow assignment of domain */
>> -	if (matrix_mdev->kvm)
>> -		return -EBUSY;
>> -
>>   	ret = kstrtoul(buf, 0, &apqi);
>>   	if (ret)
>>   		return ret;
>> @@ -946,10 +934,6 @@ static ssize_t unassign_domain_store(struct device *dev,
>>   	struct mdev_device *mdev = mdev_from_dev(dev);
>>   	struct ap_matrix_mdev *matrix_mdev = mdev_get_drvdata(mdev);
>>   
>> -	/* If the guest is running, disallow un-assignment of domain */
>> -	if (matrix_mdev->kvm)
>> -		return -EBUSY;
>> -
>>   	ret = kstrtoul(buf, 0, &apqi);
>>   	if (ret)
>>   		return ret;
>> @@ -991,10 +975,6 @@ static ssize_t assign_control_domain_store(struct device *dev,
>>   	struct mdev_device *mdev = mdev_from_dev(dev);
>>   	struct ap_matrix_mdev *matrix_mdev = mdev_get_drvdata(mdev);
>>   
>> -	/* If the guest is running, disallow assignment of control domain */
>> -	if (matrix_mdev->kvm)
>> -		return -EBUSY;
>> -
>>   	ret = kstrtoul(buf, 0, &id);
>>   	if (ret)
>>   		return ret;
>> @@ -1036,10 +1016,6 @@ static ssize_t unassign_control_domain_store(struct device *dev,
>>   	struct ap_matrix_mdev *matrix_mdev = mdev_get_drvdata(mdev);
>>   	unsigned long max_domid =  matrix_mdev->matrix.adm_max;
>>   
>> -	/* If the guest is running, disallow un-assignment of control domain */
>> -	if (matrix_mdev->kvm)
>> -		return -EBUSY;
>> -
>>   	ret = kstrtoul(buf, 0, &domid);
>>   	if (ret)
>>   		return ret;



* Re: [PATCH v10 09/16] s390/vfio-ap: allow assignment of unavailable AP queues to mdev device
  2020-09-30 12:59     ` Tony Krowiak
@ 2020-09-30 22:29       ` Halil Pasic
  0 siblings, 0 replies; 79+ messages in thread
From: Halil Pasic @ 2020-09-30 22:29 UTC (permalink / raw)
  To: Tony Krowiak
  Cc: linux-s390, linux-kernel, kvm, freude, borntraeger, cohuck,
	mjrosato, alex.williamson, kwankhede, fiuczy, frankja, david,
	imbrenda, hca, gor

On Wed, 30 Sep 2020 08:59:36 -0400
Tony Krowiak <akrowiak@linux.ibm.com> wrote:

> >> @@ -572,6 +455,11 @@ static int vfio_ap_mdev_verify_no_sharing(struct ap_matrix_mdev *matrix_mdev,
> >>   	DECLARE_BITMAP(aqm, AP_DOMAINS);
> >>   
> >>   	list_for_each_entry(lstdev, &matrix_dev->mdev_list, node) {
> >> +		/*
> >> +		 * If either of the input masks belongs to the mdev to which an
> >> +		 * AP resource is being assigned, then we don't need to verify
> >> +		 * that mdev's masks.
> >> +		 */
> >>   		if (matrix_mdev == lstdev)
> >>   			continue;
> >>     
> > Seems unrelated.  
> 
> What seems unrelated? The matrix_mdev passed in is the mdev to which
> assignment is being made. This function is verifying that no APQN assigned
> to the matrix_mdev is assigned to any other mdev. Since we are looping
> through all mdevs here, we are skipping the verification if the current
> mdev being examined is the same as the matrix_mdev to which the assignment
> is being made. Maybe I'm not understanding your point here.

Sorry, I did not explain myself clearly enough. By "seems unrelated" I mean
that, while possibly valid, it does not contribute towards achieving the
objective of the patch (as stated by the commit message and the short
description).

AFAICT this is about documenting some pre-existing logic that is not
changed, nor does it need to be changed to achieve the objective.

This patch does not change the function at all (except for the
comment). If the comment is about the new arguments, then it
belongs in "implement in-use callback for vfio_ap driver", where those
were added.

BTW I find the comment hard to understand because I don't see "If either
of the input masks belongs to the mdev to which an AP resource is being
assigned" expressed in the code.

I would rather have the docstring of the function updated so the
relationship of the three arguments is clear.

Regards,
Halil

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [PATCH v10 11/16] s390/vfio-ap: allow hot plug/unplug of AP resources using mdev device
  2020-09-28  1:01   ` Halil Pasic
@ 2020-10-05 16:24     ` Tony Krowiak
  2020-10-05 18:30       ` Halil Pasic
  0 siblings, 1 reply; 79+ messages in thread
From: Tony Krowiak @ 2020-10-05 16:24 UTC (permalink / raw)
  To: Halil Pasic
  Cc: linux-s390, linux-kernel, kvm, freude, borntraeger, cohuck,
	mjrosato, alex.williamson, kwankhede, fiuczy, frankja, david,
	imbrenda, hca, gor



On 9/27/20 9:01 PM, Halil Pasic wrote:
> On Fri, 21 Aug 2020 15:56:11 -0400
> Tony Krowiak<akrowiak@linux.ibm.com>  wrote:
>
>> Let's hot plug/unplug adapters, domains and control domains assigned to or
>> unassigned from an AP matrix mdev device while it is in use by a guest per
>> the following:
>>
>> * When the APID of an adapter is assigned to a matrix mdev in use by a KVM
>>    guest, the adapter will be hot plugged into the KVM guest as long as each
>>    APQN derived from the Cartesian product of the APID being assigned and
>>    the APQIs already assigned to the guest's CRYCB references a queue device
>>    bound to the vfio_ap device driver.
>>
>> * When the APID of an adapter is unassigned from a matrix mdev in use by a
>>    KVM guest, the adapter will be hot unplugged from the KVM guest.
>>
>> * When the APQI of a domain is assigned to a matrix mdev in use by a KVM
>>    guest, the domain will be hot plugged into the KVM guest as long as each
>>    APQN derived from the Cartesian product of the APQI being assigned and
>>    the APIDs already assigned to the guest's CRYCB references a queue device
>>    bound to the vfio_ap device driver.
>>
>> * When the APQI of a domain is unassigned from a matrix mdev in use by a
>>    KVM guest, the domain will be hot unplugged from the KVM guest
> Hm, I suppose this means that what your guest effectively gets may depend
> on whether assign_domain or assign_adapter is done first.
>
> Suppose we have the queues
> 0.0 0.1
> 1.0
> bound to vfio_ap, i.e. 1.1 is missing for a reason different than
> belonging to the default drivers (for what exact reason no idea).

I'm not quite sure what you mean by "we have queue". I will
assume you mean those queues are bound to the vfio_ap
device driver. The only way this could happen is if somebody
manually unbinds queue 1.1.

> Let's suppose we started with the matix containing only adapter
> 0 (0.) and domain 0 (.0).
>
> After echo 1 > assign_adapter && echo 1 > assign_domain we end up with
> matrix:
> 0.0 0.1
> 1.0 1.1
> guest_matrix:
> 0.0 0.1
> while after echo 1 > assign_domain && echo 1 > assign_adapter we end up
> with:
> matrix:
> 0.0 0.1
> 1.0 1.1
> guest_matrix:
> 0.0
> 0.1
>
> That means, the set of bound queues and the set of assigned resources do
> not fully determine the set of resources passed through to the guest.
>
> Is that a deliberate design choice?

Yes, it is a deliberate choice to only allow guest access to queues
represented by queue devices bound to the vfio_ap device driver.
The idea here is to adhere to the linux device model.

>
>> * When the domain number of a control domain is assigned to a matrix mdev
>>    in use by a KVM guest, the control domain will be hot plugged into the
>>    KVM guest.
>>
>> * When the domain number of a control domain is unassigned from a matrix
>>    mdev in use by a KVM guest, the control domain will be hot unplugged
>>    from the KVM guest.
>>
>> Signed-off-by: Tony Krowiak<akrowiak@linux.ibm.com>
>> ---
>>   drivers/s390/crypto/vfio_ap_ops.c | 196 ++++++++++++++++++++++++++++++
>>   1 file changed, 196 insertions(+)
>>
>> diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c
>> index cf3321eb239b..2b01a8eb6ee7 100644
>> --- a/drivers/s390/crypto/vfio_ap_ops.c
>> +++ b/drivers/s390/crypto/vfio_ap_ops.c
>> @@ -731,6 +731,56 @@ static void vfio_ap_mdev_link_queues(struct ap_matrix_mdev *matrix_mdev,
>>   	}
>>   }
>>   
>> +static bool vfio_ap_mdev_assign_apqis_4_apid(struct ap_matrix_mdev *matrix_mdev,
>> +					     unsigned long apid)
>> +{
>> +	DECLARE_BITMAP(aqm, AP_DOMAINS);
>> +	unsigned long apqi, apqn;
>> +
>> +	bitmap_copy(aqm, matrix_mdev->matrix.aqm, AP_DOMAINS);
>> +
>> +	for_each_set_bit_inv(apqi, matrix_mdev->matrix.aqm, AP_DOMAINS) {
>> +		if (!test_bit_inv(apqi,
>> +				  (unsigned long *) matrix_dev->info.aqm))
>> +			clear_bit_inv(apqi, aqm);
>> +
>> +		apqn = AP_MKQID(apid, apqi);
>> +		if (!vfio_ap_get_mdev_queue(matrix_mdev, apqn))
>> +			clear_bit_inv(apqi, aqm);
>> +	}
>> +
>> +	if (bitmap_empty(aqm, AP_DOMAINS))
>> +		return false;
>> +
>> +	set_bit_inv(apid, matrix_mdev->shadow_apcb.apm);
>> +	bitmap_copy(matrix_mdev->shadow_apcb.aqm, aqm, AP_DOMAINS);
>> +
>> +	return true;
>> +}
>> +
>> +static bool vfio_ap_mdev_assign_guest_apid(struct ap_matrix_mdev *matrix_mdev,
>> +					   unsigned long apid)
>> +{
>> +	unsigned long apqi, apqn;
>> +
>> +	if (!vfio_ap_mdev_has_crycb(matrix_mdev) ||
>> +	    !test_bit_inv(apid, (unsigned long *)matrix_dev->info.apm))
>> +		return false;
>> +
>> +	if (bitmap_empty(matrix_mdev->shadow_apcb.aqm, AP_DOMAINS))
>> +		return vfio_ap_mdev_assign_apqis_4_apid(matrix_mdev, apid);
> Hm. Let's say we have the same situation regarding the bound queues as
> above but we start with the empty matrix, and do all the assignments
> while the guest is running.
>
> Consider the following sequence of actions.
>
> 1) echo 0 > assign_domain

matrix:            .0
guest_matrix: no APQNs

> 2) echo 1 > assign_domain

matrix:            .0, .1
guest_matrix: no APQNs

> 3) echo 1 > assign_adapter

matrix:           1.0, 1.1
guest_matrix: 1.0

> 4) echo 0 > assign_adapter

matrix:           0.0, 0.1, 1.0, 1.1
guest_matrix: 0.0, 1.0

> 5) echo 1 > unassign_adapter

matrix:           0.0, 0.1
guest_matrix: 0.0

> I understand that at 3), because
> bitmap_empty(matrix_mdev->shadow_apcb.aqm)we would end up with a shadow
> aqm containing just domain 0, as queue 1.1 ain't bound to us.

True

> Thus at the end we would have
> matrix:
> 0.0 0.1
> guest_matrix:
> 0.0

At the end I had:
matrix:            0.0, 0.1
guest_matrix: 0.0

> And if we add in an adapter 2. into the mix with the queues 2.0 and 2.1
> then after
> 6) echo 2 > assign_adapter
> we get
> Thus at the end we would have
> matrix:
> 0.0 0.1
> 2.0 2.1
> guest_matrix:
> 0.0
> 2.0
>
> This looks very quirky to me. Did I read the code wrong? Opinions?

You read the code correctly and I agree, this is a bit quirky. I would say
that after adding adapter 2, we should end up with guest matrix:
0.0, 0.1
2.0, 2.1

If you agree, I'll make the adjustment.
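
The six steps above can be replayed with a small user-space model of the
v10 assignment logic. This is only a sketch under stated assumptions: the
toy_* names and MAX_IDS are hypothetical, plain LSB-first masks stand in for
the kernel's inverted bitmaps, the host configuration masks
(matrix_dev->info) are assumed to contain every adapter and domain, and
vfio_ap_get_mdev_queue() is reduced to a lookup in a table of bound queues.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_IDS 8	/* toy limit; the real masks are much wider */

struct toy_mdev {
	unsigned int matrix_apm;	/* adapters assigned to the mdev */
	unsigned int matrix_aqm;	/* domains assigned to the mdev */
	unsigned int guest_apm;		/* adapters in the shadow APCB */
	unsigned int guest_aqm;		/* domains in the shadow APCB */
	unsigned int bound[MAX_IDS];	/* bound[apid]: APQIs bound to vfio_ap */
};

static bool toy_queue_bound(const struct toy_mdev *m, unsigned int apid,
			    unsigned int apqi)
{
	assert(apid < MAX_IDS && apqi < MAX_IDS);
	return (m->bound[apid] >> apqi) & 1u;
}

static void toy_assign_adapter(struct toy_mdev *m, unsigned int apid)
{
	unsigned int apqi, aqm = 0;

	assert(apid < MAX_IDS);
	m->matrix_apm |= 1u << apid;
	if (!m->guest_aqm) {
		/* Guest has no domains yet: keep each assigned, bound domain. */
		for (apqi = 0; apqi < MAX_IDS; apqi++)
			if (((m->matrix_aqm >> apqi) & 1u) &&
			    toy_queue_bound(m, apid, apqi))
				aqm |= 1u << apqi;
		if (aqm) {
			m->guest_apm |= 1u << apid;
			m->guest_aqm = aqm;
		}
		return;
	}
	/* Otherwise every domain the guest already has must be bound. */
	for (apqi = 0; apqi < MAX_IDS; apqi++)
		if (((m->guest_aqm >> apqi) & 1u) &&
		    !toy_queue_bound(m, apid, apqi))
			return;
	m->guest_apm |= 1u << apid;
}

static void toy_assign_domain(struct toy_mdev *m, unsigned int apqi)
{
	unsigned int apid, apm = 0;

	assert(apqi < MAX_IDS);
	m->matrix_aqm |= 1u << apqi;
	if (!m->guest_apm) {
		/* Guest has no adapters yet: keep each assigned, bound adapter. */
		for (apid = 0; apid < MAX_IDS; apid++)
			if (((m->matrix_apm >> apid) & 1u) &&
			    toy_queue_bound(m, apid, apqi))
				apm |= 1u << apid;
		if (apm) {
			m->guest_aqm |= 1u << apqi;
			m->guest_apm = apm;
		}
		return;
	}
	/* Otherwise every adapter the guest already has must be bound. */
	for (apid = 0; apid < MAX_IDS; apid++)
		if (((m->guest_apm >> apid) & 1u) &&
		    !toy_queue_bound(m, apid, apqi))
			return;
	m->guest_aqm |= 1u << apqi;
}

static void toy_unassign_adapter(struct toy_mdev *m, unsigned int apid)
{
	assert(apid < MAX_IDS);
	m->matrix_apm &= ~(1u << apid);
	if (!((m->guest_apm >> apid) & 1u))
		return;
	m->guest_apm &= ~(1u << apid);
	if (!m->guest_apm)	/* no adapters left: drop the domains too */
		m->guest_aqm = 0;
}

/* Replays steps 1) to 6); queues 0.0, 0.1, 1.0, 2.0 and 2.1 are bound. */
static struct toy_mdev toy_replay_steps(void)
{
	struct toy_mdev m = { .bound = { [0] = 0x3, [1] = 0x1, [2] = 0x3 } };

	toy_assign_domain(&m, 0);
	toy_assign_domain(&m, 1);
	toy_assign_adapter(&m, 1);
	toy_assign_adapter(&m, 0);
	toy_unassign_adapter(&m, 1);
	toy_assign_adapter(&m, 2);
	return m;
}
```

In this model toy_replay_steps() ends with adapters 0 and 2 and only
domain 0 in the guest masks, i.e. the guest sees 0.0 and 2.0 even though
0.1 and 2.1 are assigned and bound. Domain 1 was filtered out once at
step 3) and is never reconsidered afterwards.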

>
>> +
>> +	for_each_set_bit_inv(apqi, matrix_mdev->shadow_apcb.aqm, AP_DOMAINS) {
>> +		apqn = AP_MKQID(apid, apqi);
>> +		if (!vfio_ap_get_mdev_queue(matrix_mdev, apqn))
>> +			return false;
>> +	}
>> +
>> +	set_bit_inv(apid, matrix_mdev->shadow_apcb.apm);
>> +
>> +	return true;
>> +}
>> +
>>   /**
>>    * assign_adapter_store
>>    *
>> @@ -792,12 +842,42 @@ static ssize_t assign_adapter_store(struct device *dev,
>>   	}
>>   	set_bit_inv(apid, matrix_mdev->matrix.apm);
>>   	vfio_ap_mdev_link_queues(matrix_mdev, LINK_APID, apid);
>> +	if (vfio_ap_mdev_assign_guest_apid(matrix_mdev, apid))
>> +		vfio_ap_mdev_commit_shadow_apcb(matrix_mdev);
>>   	mutex_unlock(&matrix_dev->lock);
>>   
>>   	return count;
>>   }
>>   static DEVICE_ATTR_WO(assign_adapter);
>>   
>> +static bool vfio_ap_mdev_unassign_guest_apid(struct ap_matrix_mdev *matrix_mdev,
>> +					     unsigned long apid)
>> +{
>> +	if (vfio_ap_mdev_has_crycb(matrix_mdev)) {
>> +		if (test_bit_inv(apid, matrix_mdev->shadow_apcb.apm)) {
>> +			clear_bit_inv(apid, matrix_mdev->shadow_apcb.apm);
>> +
>> +			/*
>> +			 * If there are no APIDs assigned to the guest, then
>> +			 * the guest will not have access to any queues, so
>> +			 * let's also go ahead and unassign the APQIs. Keeping
>> +			 * them around may yield unpredictable results during
>> +			 * a probe that is not related to a host AP
>> +			 * configuration change (i.e., an AP adapter is
>> +			 * configured online).
>> +			 */
> I don't quite understand this comment. Clearing out the other mask when
> the one becomes empty, does allow us to recover the full possible guest
> matrix in the scenario described above. I don't see any shadow
> manipulation in the probe handler at this stage. Are we maybe
> talking about the same effect as I described for assign?

Patch 15/16 is for the probe.

>
> Regards,
> Halil
>
>> +			if (bitmap_empty(matrix_mdev->shadow_apcb.apm,
>> +					 AP_DEVICES))
>> +				bitmap_clear(matrix_mdev->shadow_apcb.aqm, 0,
>> +					     AP_DOMAINS);
>> +
>> +			return true;
>> +		}
>> +	}
>> +
>> +	return false;
>> +}
>> +
>>   /**
>>    * unassign_adapter_store
>>    *
>> @@ -834,12 +914,64 @@ static ssize_t unassign_adapter_store(struct device *dev,
>>   	mutex_lock(&matrix_dev->lock);
>>   	clear_bit_inv((unsigned long)apid, matrix_mdev->matrix.apm);
>>   	vfio_ap_mdev_link_queues(matrix_mdev, UNLINK_APID, apid);
>> +	if (vfio_ap_mdev_unassign_guest_apid(matrix_mdev, apid))
>> +		vfio_ap_mdev_commit_shadow_apcb(matrix_mdev);
>>   	mutex_unlock(&matrix_dev->lock);
>>   
>>   	return count;
>>   }
>>   static DEVICE_ATTR_WO(unassign_adapter);
>>   
>> +static bool vfio_ap_mdev_assign_apids_4_apqi(struct ap_matrix_mdev *matrix_mdev,
>> +					     unsigned long apqi)
>> +{
>> +	DECLARE_BITMAP(apm, AP_DEVICES);
>> +	unsigned long apid, apqn;
>> +
>> +	bitmap_copy(apm, matrix_mdev->matrix.apm, AP_DEVICES);
>> +
>> +	for_each_set_bit_inv(apid, matrix_mdev->matrix.apm, AP_DEVICES) {
>> +		if (!test_bit_inv(apid,
>> +				  (unsigned long *) matrix_dev->info.apm))
>> +			clear_bit_inv(apqi, apm);
>> +
>> +		apqn = AP_MKQID(apid, apqi);
>> +		if (!vfio_ap_get_mdev_queue(matrix_mdev, apqn))
>> +			clear_bit_inv(apid, apm);
>> +	}
>> +
>> +	if (bitmap_empty(apm, AP_DEVICES))
>> +		return false;
>> +
>> +	set_bit_inv(apqi, matrix_mdev->shadow_apcb.aqm);
>> +	bitmap_copy(matrix_mdev->shadow_apcb.apm, apm, AP_DEVICES);
>> +
>> +	return true;
>> +}
>> +
>> +static bool vfio_ap_mdev_assign_guest_apqi(struct ap_matrix_mdev *matrix_mdev,
>> +					   unsigned long apqi)
>> +{
>> +	unsigned long apid, apqn;
>> +
>> +	if (!vfio_ap_mdev_has_crycb(matrix_mdev) ||
>> +	    !test_bit_inv(apqi, (unsigned long *)matrix_dev->info.aqm))
>> +		return false;
>> +
>> +	if (bitmap_empty(matrix_mdev->shadow_apcb.apm, AP_DEVICES))
>> +		return vfio_ap_mdev_assign_apids_4_apqi(matrix_mdev, apqi);
>> +
>> +	for_each_set_bit_inv(apid, matrix_mdev->shadow_apcb.apm, AP_DEVICES) {
>> +		apqn = AP_MKQID(apid, apqi);
>> +		if (!vfio_ap_get_mdev_queue(matrix_mdev, apqn))
>> +			return false;
>> +	}
>> +
>> +	set_bit_inv(apqi, matrix_mdev->shadow_apcb.aqm);
>> +
>> +	return true;
>> +}
>> +
>>   /**
>>    * assign_domain_store
>>    *
>> @@ -901,12 +1033,41 @@ static ssize_t assign_domain_store(struct device *dev,
>>   	}
>>   	set_bit_inv(apqi, matrix_mdev->matrix.aqm);
>>   	vfio_ap_mdev_link_queues(matrix_mdev, LINK_APQI, apqi);
>> +	if (vfio_ap_mdev_assign_guest_apqi(matrix_mdev, apqi))
>> +		vfio_ap_mdev_commit_shadow_apcb(matrix_mdev);
>>   	mutex_unlock(&matrix_dev->lock);
>>   
>>   	return count;
>>   }
>>   static DEVICE_ATTR_WO(assign_domain);
>>   
>> +static bool vfio_ap_mdev_unassign_guest_apqi(struct ap_matrix_mdev *matrix_mdev,
>> +					     unsigned long apqi)
>> +{
>> +	if (vfio_ap_mdev_has_crycb(matrix_mdev)) {
>> +		if (test_bit_inv(apqi, matrix_mdev->shadow_apcb.aqm)) {
>> +			clear_bit_inv(apqi, matrix_mdev->shadow_apcb.aqm);
>> +
>> +			/*
>> +			 * If there are no APQIs assigned to the guest, then
>> +			 * the guest will not have access to any queues, so
>> +			 * let's also go ahead and unassign the APIDs. Keeping
>> +			 * them around may yield unpredictable results during
>> +			 * a probe that is not related to a host AP
>> +			 * configuration change (i.e., an AP adapter is
>> +			 * configured online).
>> +			 */
>> +			if (bitmap_empty(matrix_mdev->shadow_apcb.aqm,
>> +					 AP_DOMAINS))
>> +				bitmap_clear(matrix_mdev->shadow_apcb.apm, 0,
>> +					     AP_DEVICES);
>> +
>> +			return true;
>> +		}
>> +	}
>> +
>> +	return false;
>> +}
>>   
>>   /**
>>    * unassign_domain_store
>> @@ -944,12 +1105,28 @@ static ssize_t unassign_domain_store(struct device *dev,
>>   	mutex_lock(&matrix_dev->lock);
>>   	clear_bit_inv((unsigned long)apqi, matrix_mdev->matrix.aqm);
>>   	vfio_ap_mdev_link_queues(matrix_mdev, UNLINK_APQI, apqi);
>> +	if (vfio_ap_mdev_unassign_guest_apqi(matrix_mdev, apqi))
>> +		vfio_ap_mdev_commit_shadow_apcb(matrix_mdev);
>>   	mutex_unlock(&matrix_dev->lock);
>>   
>>   	return count;
>>   }
>>   static DEVICE_ATTR_WO(unassign_domain);
>>   
>> +static bool vfio_ap_mdev_assign_guest_cdom(struct ap_matrix_mdev *matrix_mdev,
>> +					   unsigned long domid)
>> +{
>> +	if (vfio_ap_mdev_has_crycb(matrix_mdev)) {
>> +		if (!test_bit_inv(domid, matrix_mdev->shadow_apcb.adm)) {
>> +			set_bit_inv(domid, matrix_mdev->shadow_apcb.adm);
>> +
>> +			return true;
>> +		}
>> +	}
>> +
>> +	return false;
>> +}
>> +
>>   /**
>>    * assign_control_domain_store
>>    *
>> @@ -984,12 +1161,29 @@ static ssize_t assign_control_domain_store(struct device *dev,
>>   
>>   	mutex_lock(&matrix_dev->lock);
>>   	set_bit_inv(id, matrix_mdev->matrix.adm);
>> +	if (vfio_ap_mdev_assign_guest_cdom(matrix_mdev, id))
>> +		vfio_ap_mdev_commit_shadow_apcb(matrix_mdev);
>>   	mutex_unlock(&matrix_dev->lock);
>>   
>>   	return count;
>>   }
>>   static DEVICE_ATTR_WO(assign_control_domain);
>>   
>> +static bool
>> +vfio_ap_mdev_unassign_guest_cdom(struct ap_matrix_mdev *matrix_mdev,
>> +				 unsigned long domid)
>> +{
>> +	if (vfio_ap_mdev_has_crycb(matrix_mdev)) {
>> +		if (test_bit_inv(domid, matrix_mdev->shadow_apcb.adm)) {
>> +			clear_bit_inv(domid, matrix_mdev->shadow_apcb.adm);
>> +
>> +			return true;
>> +		}
>> +	}
>> +
>> +	return false;
>> +}
>> +
>>   /**
>>    * unassign_control_domain_store
>>    *
>> @@ -1024,6 +1218,8 @@ static ssize_t unassign_control_domain_store(struct device *dev,
>>   
>>   	mutex_lock(&matrix_dev->lock);
>>   	clear_bit_inv(domid, matrix_mdev->matrix.adm);
>> +	if (vfio_ap_mdev_unassign_guest_cdom(matrix_mdev, domid))
>> +		vfio_ap_mdev_commit_shadow_apcb(matrix_mdev);
>>   	mutex_unlock(&matrix_dev->lock);
>>   
>>   	return count;


^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [PATCH v10 11/16] s390/vfio-ap: allow hot plug/unplug of AP resources using mdev device
  2020-10-05 16:24     ` Tony Krowiak
@ 2020-10-05 18:30       ` Halil Pasic
  2020-10-05 21:48         ` Tony Krowiak
  2020-10-05 23:05         ` Tony Krowiak
  0 siblings, 2 replies; 79+ messages in thread
From: Halil Pasic @ 2020-10-05 18:30 UTC (permalink / raw)
  To: Tony Krowiak
  Cc: linux-s390, linux-kernel, kvm, freude, borntraeger, cohuck,
	mjrosato, alex.williamson, kwankhede, fiuczy, frankja, david,
	imbrenda, hca, gor

On Mon, 5 Oct 2020 12:24:39 -0400
Tony Krowiak <akrowiak@linux.ibm.com> wrote:

> 
> 
> On 9/27/20 9:01 PM, Halil Pasic wrote:
> > On Fri, 21 Aug 2020 15:56:11 -0400
> > Tony Krowiak<akrowiak@linux.ibm.com>  wrote:
> >
> >> Let's hot plug/unplug adapters, domains and control domains assigned to or
> >> unassigned from an AP matrix mdev device while it is in use by a guest per
> >> the following:
> >>
> >> * When the APID of an adapter is assigned to a matrix mdev in use by a KVM
> >>    guest, the adapter will be hot plugged into the KVM guest as long as each
> >>    APQN derived from the Cartesian product of the APID being assigned and
> >>    the APQIs already assigned to the guest's CRYCB references a queue device
> >>    bound to the vfio_ap device driver.
> >>
> >> * When the APID of an adapter is unassigned from a matrix mdev in use by a
> >>    KVM guest, the adapter will be hot unplugged from the KVM guest.
> >>
> >> * When the APQI of a domain is assigned to a matrix mdev in use by a KVM
> >>    guest, the domain will be hot plugged into the KVM guest as long as each
> >>    APQN derived from the Cartesian product of the APQI being assigned and
> >>    the APIDs already assigned to the guest's CRYCB references a queue device
> >>    bound to the vfio_ap device driver.
> >>
> >> * When the APQI of a domain is unassigned from a matrix mdev in use by a
> >>    KVM guest, the domain will be hot unplugged from the KVM guest
> > Hm, I suppose this means that what your guest effectively gets may depend
> > on whether assign_domain or assign_adapter is done first.
> >
> > Suppose we have the queues
> > 0.0 0.1
> > 1.0
> > bound to vfio_ap, i.e. 1.1 is missing for a reason different than
> > belonging to the default drivers (for what exact reason no idea).
> 
> I'm not quite sure what you mean be "we have queue". I will
> assume you mean those queues are bound to the vfio_ap
> device driver. 

Yes, this is exactly what I've meant.


> The only way this could happen is if somebody
> manually unbinds queue 1.1.
> 

Assuming that:
1) every time we observe ap_perm the ap subsystem is in a settled state
(i.e. not in the middle of pushing things left and right
because of an ap_perm change),
2) the only non-default driver is vfio_ap, and that
3) queues handle non-operational states by other means than disappearing
(should be the case with the latest reworks)
I agree that what is left is manual unbind, which I lean towards considering
an edge case.

If this is indeed just about that edge case, maybe we can live with a
simpler algorithm than this one.


> > Let's suppose we started with the matix containing only adapter
> > 0 (0.) and domain 0 (.0).
> >
> > After echo 1 > assign_adapter && echo 1 > assign_domain we end up with
> > matrix:
> > 0.0 0.1
> > 1.0 1.1
> > guest_matrix:
> > 0.0 0.1
> > while after echo 1 > assign_domain && echo 1 > assign_adapter we end up
> > with:
> > matrix:
> > 0.0 0.1
> > 1.0 1.1
> > guest_matrix:
> > 0.0
> > 0.1
> >
> > That means, the set of bound queues and the set of assigned resources do
> > not fully determine the set of resources passed through to the guest.
> >
> > Is that a deliberate design choice?
> 
> Yes, it is a deliberate choice to only allow guest access to queues
> represented by queue devices bound to the vfio_ap device driver.
> The idea here is to adhere to the linux device model.
> 

This is not what I asked. My question was about the fact that
reordering assignments gives different results. Well, this was kind
of the case before as well, with the notable difference that in the
past we always had an error. So if a full sequence of assignments could
be performed without an error, then any permutation would be performed
with the exact same result.

I'm all for only allowing guest access to queues represented by queue
devices bound to the vfio_ap device driver. I'm concerned with the
permutation (and calculus).

> >
> >> * When the domain number of a control domain is assigned to a matrix mdev
> >>    in use by a KVM guest, the control domain will be hot plugged into the
> >>    KVM guest.
> >>
> >> * When the domain number of a control domain is unassigned from a matrix
> >>    mdev in use by a KVM guest, the control domain will be hot unplugged
> >>    from the KVM guest.
> >>
> >> Signed-off-by: Tony Krowiak<akrowiak@linux.ibm.com>
> >> ---

[..]

> >> +static bool vfio_ap_mdev_assign_guest_apid(struct ap_matrix_mdev *matrix_mdev,
> >> +					   unsigned long apid)
> >> +{
> >> +	unsigned long apqi, apqn;
> >> +
> >> +	if (!vfio_ap_mdev_has_crycb(matrix_mdev) ||
> >> +	    !test_bit_inv(apid, (unsigned long *)matrix_dev->info.apm))
> >> +		return false;
> >> +
> >> +	if (bitmap_empty(matrix_mdev->shadow_apcb.aqm, AP_DOMAINS))
> >> +		return vfio_ap_mdev_assign_apqis_4_apid(matrix_mdev, apid);
> > Hm. Let's say we have the same situation regarding the bound queues as
> > above but we start with the empty matrix, and do all the assignments
> > while the guest is running.
> >
> > Consider the following sequence of actions.
> >
> > 1) echo 0 > assign_domain
> 
> matrix:            .0
> guest_matrix: no APQNs
> 
> > 2) echo 1 > assign_domain
> 
> matrix:            .0, .1
> guest_matrix: no APQNs
> 
> > 3) echo 1 > assign_adapter
> 
> matrix:           1.0, 1.1
> guest_matrix: 1.0
> 
> > 4) echo 0 > assign_adapter
> 
> matrix:           0.0, 0.1, 1.0, 1.1
> guest_matrix: 0.0, 1.0
> > 5) echo 1 > unassign_adapter
> 
> matrix:           0.0, 0.1
> guest_matrix: 0.0
> 
> > I understand that at 3), because
> > bitmap_empty(matrix_mdev->shadow_apcb.aqm)we would end up with a shadow
> > aqm containing just domain 0, as queue 1.1 ain't bound to us.
> 
> True
> 
> > Thus at the end we would have
> > matrix:
> > 0.0 0.1
> > guest_matrix:
> > 0.0
> 
> At the end I had:
> matrix:            0.0, 0.1
> guest_matrix: 0.0
> 
> > And if we add in an adapter 2. into the mix with the queues 2.0 and 2.1
> > then after
> > 6) echo 2 > assign_adapter
> > we get
> > Thus at the end we would have
> > matrix:
> > 0.0 0.1
> > 2.0 2.1
> > guest_matrix:
> > 0.0
> > 2.0
> >
> > This looks very quirky to me. Did I read the code wrong? Opinions?
> 
> You read the code correctly and I agree, this is a bit quirky. I would say
> that after adding adapter 2, we should end up with guest matrix:
> 0.0, 0.1
> 2.0, 2.1
> 
> If you agree, I'll make the adjustment.
> 

I do agree, but maybe we should discuss what adjustments you have in
mind.

[..]

> >> +static bool vfio_ap_mdev_unassign_guest_apid(struct ap_matrix_mdev *matrix_mdev,
> >> +					     unsigned long apid)
> >> +{
> >> +	if (vfio_ap_mdev_has_crycb(matrix_mdev)) {
> >> +		if (test_bit_inv(apid, matrix_mdev->shadow_apcb.apm)) {
> >> +			clear_bit_inv(apid, matrix_mdev->shadow_apcb.apm);
> >> +
> >> +			/*
> >> +			 * If there are no APIDs assigned to the guest, then
> >> +			 * the guest will not have access to any queues, so
> >> +			 * let's also go ahead and unassign the APQIs. Keeping
> >> +			 * them around may yield unpredictable results during
> >> +			 * a probe that is not related to a host AP
> >> +			 * configuration change (i.e., an AP adapter is
> >> +			 * configured online).
> >> +			 */
> > I don't quite understand this comment. Clearing out the other mask when
> > the other one becomes empty, does allow us to recover the full possible guest
> > matrix in the scenario described above. I don't see any shadow
> > manipulation in the probe handler at this stage. Are we maybe
> > talking about the same effect as I described for assign?
> 
> Patch 15/16 is for the probe.
> 

I still don't understand the logic, but I guess we want to make
adjustments anyways, so maybe I don't have to.

Regards,
Halil

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: [PATCH v10 11/16] s390/vfio-ap: allow hot plug/unplug of AP resources using mdev device
  2020-10-05 18:30       ` Halil Pasic
@ 2020-10-05 21:48         ` Tony Krowiak
  2020-10-05 23:05         ` Tony Krowiak
  1 sibling, 0 replies; 79+ messages in thread
From: Tony Krowiak @ 2020-10-05 21:48 UTC (permalink / raw)
  To: Halil Pasic
  Cc: linux-s390, linux-kernel, kvm, freude, borntraeger, cohuck,
	mjrosato, alex.williamson, kwankhede, fiuczy, frankja, david,
	imbrenda, hca, gor



On 10/5/20 2:30 PM, Halil Pasic wrote:
> On Mon, 5 Oct 2020 12:24:39 -0400
> Tony Krowiak <akrowiak@linux.ibm.com> wrote:
>
>>
>> On 9/27/20 9:01 PM, Halil Pasic wrote:
>>> On Fri, 21 Aug 2020 15:56:11 -0400
>>> Tony Krowiak<akrowiak@linux.ibm.com>  wrote:
>>>
>>>> Let's hot plug/unplug adapters, domains and control domains assigned to or
>>>> unassigned from an AP matrix mdev device while it is in use by a guest per
>>>> the following:
>>>>
>>>> * When the APID of an adapter is assigned to a matrix mdev in use by a KVM
>>>>     guest, the adapter will be hot plugged into the KVM guest as long as each
>>>>     APQN derived from the Cartesian product of the APID being assigned and
>>>>     the APQIs already assigned to the guest's CRYCB references a queue device
>>>>     bound to the vfio_ap device driver.
>>>>
>>>> * When the APID of an adapter is unassigned from a matrix mdev in use by a
>>>>     KVM guest, the adapter will be hot unplugged from the KVM guest.
>>>>
>>>> * When the APQI of a domain is assigned to a matrix mdev in use by a KVM
>>>>     guest, the domain will be hot plugged into the KVM guest as long as each
>>>>     APQN derived from the Cartesian product of the APQI being assigned and
>>>>     the APIDs already assigned to the guest's CRYCB references a queue device
>>>>     bound to the vfio_ap device driver.
>>>>
>>>> * When the APQI of a domain is unassigned from a matrix mdev in use by a
>>>>     KVM guest, the domain will be hot unplugged from the KVM guest
>>> Hm, I suppose this means that what your guest effectively gets may depend
>>> on whether assign_domain or assign_adapter is done first.
>>>
>>> Suppose we have the queues
>>> 0.0 0.1
>>> 1.0
>>> bound to vfio_ap, i.e. 1.1 is missing for a reason different than
>>> belonging to the default drivers (for what exact reason no idea).
>> I'm not quite sure what you mean be "we have queue". I will
>> assume you mean those queues are bound to the vfio_ap
>> device driver.
> Yes, this is exactly what I've meant.
>
>
>> The only way this could happen is if somebody
>> manually unbinds queue 1.1.
>>
> Assuming that:
> 1) every time we observe ap_perm the ap subsystem in in a settled state
> (i.e. not in a middle of pushing things left and right
> because of an ap_perm change,
> 2) the only non-default driver is vfio_ap, and that
> 3) queues handle non-operational states by other means than dissapearing
> (should be the case with the latest reworks)
> I agree what is left is manual unbind, which I lean towards considering
> an edge case.
>
> If this is indeed just about that edge case, maybe we can live with a
> simpler algorithm than this one.

The simplest algorithm is:
* Forego hot plug of an adapter if any APQN derived from the adapter's
    APID and the APQIs of the domains assigned to the matrix mdev references
    a queue device not bound to the vfio_ap device driver. In your scenario,
    this would result in no queues for the guest at 3) despite the fact that
    1.0 is bound to the vfio_ap dd.

As you stated, this is an edge case, so maybe this would be sufficient.
Do you have any alternative algorithm that makes sense?

Also, keep in mind that one of my goals with this design was to avoid
leaving the guest without any queues if possible. Maybe that is not a
critical requirement.
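
As a rough illustration of the all-or-nothing rule described above, the
check could look something like the following. This is a user-space sketch,
not driver code: toy_adapter_can_hotplug(), MAX_IDS and the LSB-first masks
are hypothetical stand-ins for the kernel bitmaps and queue lookups.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_IDS 8	/* toy limit; the real masks are much wider */

/*
 * bound[apid] is a mask of the APQIs whose queue device on that adapter
 * is bound to vfio_ap; assigned_aqm is the set of domains assigned to
 * the matrix mdev. The adapter is hot plugged only if none of the
 * assigned domains lacks a bound queue on it.
 */
static bool toy_adapter_can_hotplug(const unsigned int bound[MAX_IDS],
				    unsigned int assigned_aqm,
				    unsigned int apid)
{
	assert(apid < MAX_IDS);
	return (assigned_aqm & ~bound[apid]) == 0;
}
```

With queues 0.0, 0.1 and 1.0 bound and domains 0 and 1 assigned, this check
rejects adapter 1 (1.1 is missing) and accepts adapter 0, which matches the
"no queues for the guest at 3)" outcome. Note that the check is trivially
true when no domains are assigned; a caller would need to handle that case
separately.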

>
>
>>> Let's suppose we started with the matix containing only adapter
>>> 0 (0.) and domain 0 (.0).
>>>
>>> After echo 1 > assign_adapter && echo 1 > assign_domain we end up with
>>> matrix:
>>> 0.0 0.1
>>> 1.0 1.1
>>> guest_matrix:
>>> 0.0 0.1
>>> while after echo 1 > assign_domain && echo 1 > assign_adapter we end up
>>> with:
>>> matrix:
>>> 0.0 0.1
>>> 1.0 1.1
>>> guest_matrix:
>>> 0.0
>>> 0.1
>>>
>>> That means, the set of bound queues and the set of assigned resources do
>>> not fully determine the set of resources passed through to the guest.
>>>
>>> Is that a deliberate design choice?
>> Yes, it is a deliberate choice to only allow guest access to queues
>> represented by queue devices bound to the vfio_ap device driver.
>> The idea here is to adhere to the linux device model.
>>
> This is not what I've asked. My question was about he fact that
> reordering assignments gives different results. Well this was kind
> of the case before as well, with the notable difference, that in a
> past we always had an error. So if a full sequence of assignments could
> be performed without an error, than any permutation would be performed
> with the exact same result.

I'm not sure that the exact same result can be achieved regardless
of the order of assignments - other than possibly the simple algorithm
I suggested above - it is something I would have to think about some
more.

Another option is to call the filtering mechanism introduced
in patch 8/16 in which case the results will be consistent with the
configuration of the shadow_apcb. This would be the same
configuration as if the guest were started after the assignment.
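
To illustrate why recomputing the guest configuration from scratch removes
the ordering dependence, here is a sketch of a stateless filter. The rule it
applies (drop an adapter if any assigned domain has no bound queue on it) is
an assumption made for the example and is not necessarily what patch 8/16
does; toy_filter(), struct toy_guest_masks and MAX_IDS are hypothetical
names.

```c
#include <assert.h>

#define MAX_IDS 8	/* toy limit; the real masks are much wider */

struct toy_guest_masks {
	unsigned int apm;	/* adapters passed through to the guest */
	unsigned int aqm;	/* domains passed through to the guest */
};

/*
 * The result is a pure function of the bound queues and the assigned
 * masks, so the order in which the assignments were made cannot matter.
 */
static struct toy_guest_masks toy_filter(const unsigned int bound[MAX_IDS],
					 unsigned int assigned_apm,
					 unsigned int assigned_aqm)
{
	struct toy_guest_masks g = { 0, 0 };
	unsigned int apid;

	assert(assigned_apm < (1u << MAX_IDS));
	assert(assigned_aqm < (1u << MAX_IDS));

	if (!assigned_aqm)
		return g;	/* no domains assigned: nothing to pass through */

	for (apid = 0; apid < MAX_IDS; apid++) {
		if (!((assigned_apm >> apid) & 1u))
			continue;
		/* Keep the adapter only if every assigned domain is bound. */
		if ((assigned_aqm & ~bound[apid]) == 0)
			g.apm |= 1u << apid;
	}
	/* Domains only make sense if at least one adapter survived. */
	if (g.apm)
		g.aqm = assigned_aqm;
	return g;
}
```

For the scenario with queues 0.0, 0.1 and 1.0 bound and adapters 0, 1 plus
domains 0, 1 assigned, this model yields adapter 0 with domains 0 and 1,
whichever of assign_adapter or assign_domain was written first.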

>
> I'm all for only allowing guest access to queues represented by queue
> devices bound to the vfio_ap device driver. I'm concerned with the
> permutation (and calculus).
>
>>>> * When the domain number of a control domain is assigned to a matrix mdev
>>>>     in use by a KVM guest, the control domain will be hot plugged into the
>>>>     KVM guest.
>>>>
>>>> * When the domain number of a control domain is unassigned from a matrix
>>>>     mdev in use by a KVM guest, the control domain will be hot unplugged
>>>>     from the KVM guest.
>>>>
>>>> Signed-off-by: Tony Krowiak<akrowiak@linux.ibm.com>
>>>> ---
> [..]
>
>>>> +static bool vfio_ap_mdev_assign_guest_apid(struct ap_matrix_mdev *matrix_mdev,
>>>> +					   unsigned long apid)
>>>> +{
>>>> +	unsigned long apqi, apqn;
>>>> +
>>>> +	if (!vfio_ap_mdev_has_crycb(matrix_mdev) ||
>>>> +	    !test_bit_inv(apid, (unsigned long *)matrix_dev->info.apm))
>>>> +		return false;
>>>> +
>>>> +	if (bitmap_empty(matrix_mdev->shadow_apcb.aqm, AP_DOMAINS))
>>>> +		return vfio_ap_mdev_assign_apqis_4_apid(matrix_mdev, apid);
>>> Hm. Let's say we have the same situation regarding the bound queues as
>>> above but we start with the empty matrix, and do all the assignments
>>> while the guest is running.
>>>
>>> Consider the following sequence of actions.
>>>
>>> 1) echo 0 > assign_domain
>> matrix:            .0
>> guest_matrix: no APQNs
>>
>>> 2) echo 1 > assign_domain
>> matrix:            .0, .1
>> guest_matrix: no APQNs
>>
>>> 3) echo 1 > assign_adapter
>> matrix:           1.0, 1.1
>> guest_matrix: 1.0
>>
>>> 4) echo 0 > assign_adapter
>> matrix:           0.0, 0.1, 1.0, 1.1
>> guest_matrix: 0.0, 1.0
>>> 5) echo 1 > unassign_adapter
>> matrix:           0.0, 0.1
>> guest_matrix: 0.0
>>
>>> I understand that at 3), because
>>> bitmap_empty(matrix_mdev->shadow_apcb.aqm) we would end up with a shadow
>>> aqm containing just domain 0, as queue 1.1 ain't bound to us.
>> True
>>
>>> Thus at the end we would have
>>> matrix:
>>> 0.0 0.1
>>> guest_matrix:
>>> 0.0
>> At the end I had:
>> matrix:            0.0, 0.1
>> guest_matrix: 0.0
>>
>>> And if we add in an adapter 2. into the mix with the queues 2.0 and 2.1
>>> then after
>>> 6) echo 2 > assign_adapter
>>> we get
>>> Thus at the end we would have
>>> matrix:
>>> 0.0 0.1
>>> 2.0 2.1
>>> guest_matrix:
>>> 0.0
>>> 2.0
>>>
>>> This looks very quirky to me. Did I read the code wrong? Opinions?
>> You read the code correctly and I agree, this is a bit quirky. I would say
>> that after adding adapter 2, we should end up with guest matrix:
>> 0.0, 0.1
>> 2.0, 2.1
>>
>> If you agree, I'll make the adjustment.
>>
> I do agree, but maybe we should discuss what adjustments you have in
> mind.
>
> [..]
>
>>>> +static bool vfio_ap_mdev_unassign_guest_apid(struct ap_matrix_mdev *matrix_mdev,
>>>> +					     unsigned long apid)
>>>> +{
>>>> +	if (vfio_ap_mdev_has_crycb(matrix_mdev)) {
>>>> +		if (test_bit_inv(apid, matrix_mdev->shadow_apcb.apm)) {
>>>> +			clear_bit_inv(apid, matrix_mdev->shadow_apcb.apm);
>>>> +
>>>> +			/*
>>>> +			 * If there are no APIDs assigned to the guest, then
>>>> +			 * the guest will not have access to any queues, so
>>>> +			 * let's also go ahead and unassign the APQIs. Keeping
>>>> +			 * them around may yield unpredictable results during
>>>> +			 * a probe that is not related to a host AP
>>>> +			 * configuration change (i.e., an AP adapter is
>>>> +			 * configured online).
>>>> +			 */
>>> I don't quite understand this comment. Clearing out one mask when
>>> the other one becomes empty does allow us to recover the full possible guest
>>> matrix in the scenario described above. I don't see any shadow
>>> manipulation in the probe handler at this stage. Are we maybe
>>> talking about the same effect as I described for assign?
>> Patch 15/16 is for the probe.
>>
> I still don't understand the logic, but I guess we want to make
> adjustments anyways, so maybe I don't have to.
>
> Regards,
> Halil



* Re: [PATCH v10 11/16] s390/vfio-ap: allow hot plug/unplug of AP resources using mdev device
  2020-10-05 18:30       ` Halil Pasic
  2020-10-05 21:48         ` Tony Krowiak
@ 2020-10-05 23:05         ` Tony Krowiak
  1 sibling, 0 replies; 79+ messages in thread
From: Tony Krowiak @ 2020-10-05 23:05 UTC (permalink / raw)
  To: Halil Pasic
  Cc: linux-s390, linux-kernel, kvm, freude, borntraeger, cohuck,
	mjrosato, alex.williamson, kwankhede, fiuczy, frankja, david,
	imbrenda, hca, gor

I proposed two algorithms in my last response. The following summarizes the
results from executing your scenario:

bound queues:
0.0
0.1

1.0

2.0
2.1

algorithm: use filtering on assign/unassign
scenario:
echo 0 > assign_domain
echo 1 > assign_domain
echo 1 > assign_adapter

matrix:
1.0
1.1
guest_matrix:
1.0

echo 0 > assign_adapter

matrix:
0.0
0.1
1.0
1.1
guest_matrix:
0.0
0.1

echo 1 > unassign_adapter

matrix:
0.0
0.1
guest_matrix:
0.0
0.1

echo 2 > assign_adapter

matrix:
0.0
0.1
2.0
2.1
guest_matrix:
0.0
0.1
2.0
2.1

echo 1 > assign_adapter

matrix:
0.0
0.1
1.0
1.1
2.0
2.1
guest_matrix:
0.0
0.1
2.0
2.1

algorithm: do not plug adapter unless all assigned APQNs are bound
scenario:
echo 0 > assign_domain
echo 1 > assign_domain
echo 1 > assign_adapter

matrix:
1.0
1.1
guest_matrix:
no bits set

echo 0 > assign_adapter

matrix:
0.0
0.1
1.0
1.1
guest_matrix:
0.0
0.1

echo 1 > unassign_adapter

matrix:
0.0
0.1
guest_matrix:
0.0
0.1

echo 2 > assign_adapter

matrix:
0.0
0.1
2.0
2.1
guest_matrix:
0.0
0.1
2.0
2.1

echo 1 > assign_adapter

matrix:
0.0
0.1
1.0
1.1
2.0
2.1
guest_matrix:
0.0
0.1
2.0
2.1
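
The second algorithm can be sketched in a few lines of user-space C.
This is only an illustration under stated assumptions: toy 8-bit masks
with plain bit numbering instead of the 256-bit APM/AQM, a hypothetical
scenario_bound[] table for the bound queues listed above, and both
domains already assigned before the first adapter is added. The names
toy_mdev, toy_assign_adapter and friends are made up for the sketch.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_BITS 8	/* toy width; the real APM/AQM hold 256 bits */

/* Bound queues from the scenario: 0.0 0.1 / 1.0 / 2.0 2.1 */
static const unsigned int scenario_bound[TOY_BITS] = { 0x3, 0x1, 0x3 };

struct toy_mdev {
	unsigned int apm;	/* adapters assigned to the mdev */
	unsigned int aqm;	/* domains assigned to the mdev */
	unsigned int guest_apm;	/* adapters plugged into the guest */
};

/* True if every assigned domain has a bound queue on this adapter. */
static bool adapter_fully_bound(const struct toy_mdev *m, unsigned int apid)
{
	return !(m->aqm & ~scenario_bound[apid]);
}

/* Assign the adapter; plug it only if all of its APQNs are bound. */
static void toy_assign_adapter(struct toy_mdev *m, unsigned int apid)
{
	assert(apid < TOY_BITS);
	m->apm |= 1u << apid;
	if (m->aqm && adapter_fully_bound(m, apid))
		m->guest_apm |= 1u << apid;
}

/* Unassign the adapter and unplug it from the guest. */
static void toy_unassign_adapter(struct toy_mdev *m, unsigned int apid)
{
	assert(apid < TOY_BITS);
	m->apm &= ~(1u << apid);
	m->guest_apm &= ~(1u << apid);
}

/* Replay the adapter steps of the scenario; returns the final guest apm. */
static unsigned int scenario_guest_apm(void)
{
	struct toy_mdev m = { 0, 0x3, 0 };	/* domains 0 and 1 assigned */

	toy_assign_adapter(&m, 1);	/* 1.1 unbound: not plugged */
	toy_assign_adapter(&m, 0);	/* plugged */
	toy_unassign_adapter(&m, 1);
	toy_assign_adapter(&m, 2);	/* plugged */
	toy_assign_adapter(&m, 1);	/* still not plugged */
	return m.guest_apm;
}
```

The replay ends with adapters 0 and 2 plugged, which corresponds to the
final guest_matrix listed above for this algorithm.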

On 10/5/20 2:30 PM, Halil Pasic wrote:
> On Mon, 5 Oct 2020 12:24:39 -0400
> Tony Krowiak <akrowiak@linux.ibm.com> wrote:
>
>>
>> On 9/27/20 9:01 PM, Halil Pasic wrote:
>>> On Fri, 21 Aug 2020 15:56:11 -0400
>>> Tony Krowiak<akrowiak@linux.ibm.com>  wrote:
>>>
>>>> Let's hot plug/unplug adapters, domains and control domains assigned to or
>>>> unassigned from an AP matrix mdev device while it is in use by a guest per
>>>> the following:
>>>>
>>>> * When the APID of an adapter is assigned to a matrix mdev in use by a KVM
>>>>     guest, the adapter will be hot plugged into the KVM guest as long as each
>>>>     APQN derived from the Cartesian product of the APID being assigned and
>>>>     the APQIs already assigned to the guest's CRYCB references a queue device
>>>>     bound to the vfio_ap device driver.
>>>>
>>>> * When the APID of an adapter is unassigned from a matrix mdev in use by a
>>>>     KVM guest, the adapter will be hot unplugged from the KVM guest.
>>>>
>>>> * When the APQI of a domain is assigned to a matrix mdev in use by a KVM
>>>>     guest, the domain will be hot plugged into the KVM guest as long as each
>>>>     APQN derived from the Cartesian product of the APQI being assigned and
>>>>     the APIDs already assigned to the guest's CRYCB references a queue device
>>>>     bound to the vfio_ap device driver.
>>>>
>>>> * When the APQI of a domain is unassigned from a matrix mdev in use by a
>>>>     KVM guest, the domain will be hot unplugged from the KVM guest
>>> Hm, I suppose this means that what your guest effectively gets may depend
>>> on whether assign_domain or assign_adapter is done first.
>>>
>>> Suppose we have the queues
>>> 0.0 0.1
>>> 1.0
>>> bound to vfio_ap, i.e. 1.1 is missing for a reason different than
>>> belonging to the default drivers (for what exact reason no idea).
>> I'm not quite sure what you mean by "we have the queues". I will
>> assume you mean those queues are bound to the vfio_ap
>> device driver.
> Yes, this is exactly what I've meant.
>
>
>> The only way this could happen is if somebody
>> manually unbinds queue 1.1.
>>
> Assuming that:
> 1) every time we observe ap_perm the ap subsystem is in a settled state
> (i.e. not in the middle of pushing things left and right
> because of an ap_perm change),
> 2) the only non-default driver is vfio_ap, and that
> 3) queues handle non-operational states by other means than disappearing
> (should be the case with the latest reworks)
> I agree what is left is manual unbind, which I lean towards considering
> an edge case.
>
> If this is indeed just about that edge case, maybe we can live with a
> simpler algorithm than this one.
>
>
>>> Let's suppose we started with the matrix containing only adapter
>>> 0 (0.) and domain 0 (.0).
>>>
>>> After echo 1 > assign_adapter && echo 1 > assign_domain we end up with
>>> matrix:
>>> 0.0 0.1
>>> 1.0 1.1
>>> guest_matrix:
>>> 0.0 0.1
>>> while after echo 1 > assign_domain && echo 1 > assign_adapter we end up
>>> with:
>>> matrix:
>>> 0.0 0.1
>>> 1.0 1.1
>>> guest_matrix:
>>> 0.0
>>> 0.1
>>>
>>> That means, the set of bound queues and the set of assigned resources do
>>> not fully determine the set of resources passed through to the guest.
>>>
>>> Is that a deliberate design choice?
>> Yes, it is a deliberate choice to only allow guest access to queues
>> represented by queue devices bound to the vfio_ap device driver.
>> The idea here is to adhere to the linux device model.
>>
> This is not what I've asked. My question was about the fact that
> reordering assignments gives different results. Well this was kind
> of the case before as well, with the notable difference, that in the
> past we always had an error. So if a full sequence of assignments could
> be performed without an error, then any permutation would be performed
> with the exact same result.
>
> I'm all for only allowing guest access to queues represented by queue
> devices bound to the vfio_ap device driver. I'm concerned with the
> permutation (and calculus).
>
>>>> * When the domain number of a control domain is assigned to a matrix mdev
>>>>     in use by a KVM guest, the control domain will be hot plugged into the
>>>>     KVM guest.
>>>>
>>>> * When the domain number of a control domain is unassigned from a matrix
>>>>     mdev in use by a KVM guest, the control domain will be hot unplugged
>>>>     from the KVM guest.
>>>>
>>>> Signed-off-by: Tony Krowiak<akrowiak@linux.ibm.com>
>>>> ---
> [..]
>
>>>> +static bool vfio_ap_mdev_assign_guest_apid(struct ap_matrix_mdev *matrix_mdev,
>>>> +					   unsigned long apid)
>>>> +{
>>>> +	unsigned long apqi, apqn;
>>>> +
>>>> +	if (!vfio_ap_mdev_has_crycb(matrix_mdev) ||
>>>> +	    !test_bit_inv(apid, (unsigned long *)matrix_dev->info.apm))
>>>> +		return false;
>>>> +
>>>> +	if (bitmap_empty(matrix_mdev->shadow_apcb.aqm, AP_DOMAINS))
>>>> +		return vfio_ap_mdev_assign_apqis_4_apid(matrix_mdev, apid);
>>> Hm. Let's say we have the same situation regarding the bound queues as
>>> above but we start with the empty matrix, and do all the assignments
>>> while the guest is running.
>>>
>>> Consider the following sequence of actions.
>>>
>>> 1) echo 0 > assign_domain
>> matrix:            .0
>> guest_matrix: no APQNs
>>
>>> 2) echo 1 > assign_domain
>> matrix:            .0, .1
>> guest_matrix: no APQNs
>>
>>> 3) echo 1 > assign_adapter
>> matrix:           1.0, 1.1
>> guest_matrix: 1.0
>>
>>> 4) echo 0 > assign_adapter
>> matrix:           0.0, 0.1, 1.0, 1.1
>> guest_matrix: 0.0, 1.0
>>> 5) echo 1 > unassign_adapter
>> matrix:           0.0, 0.1
>> guest_matrix: 0.0
>>
>>> I understand that at 3), because
>>> bitmap_empty(matrix_mdev->shadow_apcb.aqm) we would end up with a shadow
>>> aqm containing just domain 0, as queue 1.1 ain't bound to us.
>> True
>>
>>> Thus at the end we would have
>>> matrix:
>>> 0.0 0.1
>>> guest_matrix:
>>> 0.0
>> At the end I had:
>> matrix:            0.0, 0.1
>> guest_matrix: 0.0
>>
>>> And if we add in an adapter 2. into the mix with the queues 2.0 and 2.1
>>> then after
>>> 6) echo 2 > assign_adapter
>>> we get
>>> Thus at the end we would have
>>> matrix:
>>> 0.0 0.1
>>> 2.0 2.1
>>> guest_matrix:
>>> 0.0
>>> 2.0
>>>
>>> This looks very quirky to me. Did I read the code wrong? Opinions?
>> You read the code correctly and I agree, this is a bit quirky. I would say
>> that after adding adapter 2, we should end up with guest matrix:
>> 0.0, 0.1
>> 2.0, 2.1
>>
>> If you agree, I'll make the adjustment.
>>
> I do agree, but maybe we should discuss what adjustments you have in
> mind.
>
> [..]
>
>>>> +static bool vfio_ap_mdev_unassign_guest_apid(struct ap_matrix_mdev *matrix_mdev,
>>>> +					     unsigned long apid)
>>>> +{
>>>> +	if (vfio_ap_mdev_has_crycb(matrix_mdev)) {
>>>> +		if (test_bit_inv(apid, matrix_mdev->shadow_apcb.apm)) {
>>>> +			clear_bit_inv(apid, matrix_mdev->shadow_apcb.apm);
>>>> +
>>>> +			/*
>>>> +			 * If there are no APIDs assigned to the guest, then
>>>> +			 * the guest will not have access to any queues, so
>>>> +			 * let's also go ahead and unassign the APQIs. Keeping
>>>> +			 * them around may yield unpredictable results during
>>>> +			 * a probe that is not related to a host AP
>>>> +			 * configuration change (i.e., an AP adapter is
>>>> +			 * configured online).
>>>> +			 */
>>> I don't quite understand this comment. Clearing out one mask when
>>> the other one becomes empty does allow us to recover the full possible guest
>>> matrix in the scenario described above. I don't see any shadow
>>> manipulation in the probe handler at this stage. Are we maybe
>>> talking about the same effect as I described for assign?
>> Patch 15/16 is for the probe.
>>
> I still don't understand the logic, but I guess we want to make
> adjustments anyways, so maybe I don't have to.
>
> Regards,
> Halil



* Re: [PATCH v10 13/16] s390/vfio-ap: handle host AP config change notification
  2020-09-28  1:38   ` Halil Pasic
@ 2020-10-12 20:53     ` Tony Krowiak
  2020-10-12 21:27     ` Tony Krowiak
  1 sibling, 0 replies; 79+ messages in thread
From: Tony Krowiak @ 2020-10-12 20:53 UTC (permalink / raw)
  To: Halil Pasic
  Cc: linux-s390, linux-kernel, kvm, freude, borntraeger, cohuck,
	mjrosato, alex.williamson, kwankhede, fiuczy, frankja, david,
	imbrenda, hca, gor, kernel test robot



On 9/27/20 9:38 PM, Halil Pasic wrote:
> On Fri, 21 Aug 2020 15:56:13 -0400
> Tony Krowiak <akrowiak@linux.ibm.com> wrote:
>
>> Implements the driver callback invoked by the AP bus when the host
>> AP configuration has changed. Since this callback is invoked prior to
>> unbinding a device from its device driver, the vfio_ap driver will
>> respond by unplugging the AP adapters, domains and control domains
>> removed from the host's AP configuration from the guests using them.
>>
>> Signed-off-by: Tony Krowiak <akrowiak@linux.ibm.com>
>> Reported-by: kernel test robot <lkp@intel.com>
> Looks reasonable, but shouldn't vfio_ap_mdev_remove_queue() already
> have code that kicks the queue from the shadow at this stage?
>
> I mean if the removal is for a reason different than a host config change,
> we won't update the guest_matrix, or?
>
>> ---
>>   drivers/s390/crypto/vfio_ap_drv.c     |   5 +-
>>   drivers/s390/crypto/vfio_ap_ops.c     | 147 ++++++++++++++++++++++++--
>>   drivers/s390/crypto/vfio_ap_private.h |   7 +-
>>   3 files changed, 146 insertions(+), 13 deletions(-)
>>
>> diff --git a/drivers/s390/crypto/vfio_ap_drv.c b/drivers/s390/crypto/vfio_ap_drv.c
>> index aae5b3d8e3fa..ea0a7603e886 100644
>> --- a/drivers/s390/crypto/vfio_ap_drv.c
>> +++ b/drivers/s390/crypto/vfio_ap_drv.c
>> @@ -115,9 +115,11 @@ static int vfio_ap_matrix_dev_create(void)
>>   
>>   	/* Fill in config info via PQAP(QCI), if available */
>>   	if (test_facility(12)) {
>> -		ret = ap_qci(&matrix_dev->info);
>> +		ret = ap_qci(&matrix_dev->config_info);
>>   		if (ret)
>>   			goto matrix_alloc_err;
>> +		memcpy(&matrix_dev->config_info_prev, &matrix_dev->config_info,
>> +		       sizeof(struct ap_config_info));
>>   	}
>>   
>>   	mutex_init(&matrix_dev->lock);
>> @@ -177,6 +179,7 @@ static int __init vfio_ap_init(void)
>>   	vfio_ap_drv.remove = vfio_ap_queue_dev_remove;
>>   	vfio_ap_drv.in_use = vfio_ap_mdev_resource_in_use;
>>   	vfio_ap_drv.ids = ap_queue_ids;
>> +	vfio_ap_drv.on_config_changed = vfio_ap_on_cfg_changed;
>>   
>>   	ret = ap_driver_register(&vfio_ap_drv, THIS_MODULE, VFIO_AP_DRV_NAME);
>>   	if (ret) {
>> diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c
>> index 2b01a8eb6ee7..e002d556abab 100644
>> --- a/drivers/s390/crypto/vfio_ap_ops.c
>> +++ b/drivers/s390/crypto/vfio_ap_ops.c
>> @@ -347,7 +347,9 @@ static int vfio_ap_mdev_create(struct kobject *kobj, struct mdev_device *mdev)
>>   	}
>>   
>>   	matrix_mdev->mdev = mdev;
>> -	vfio_ap_matrix_init(&matrix_dev->info, &matrix_mdev->matrix);
>> +	vfio_ap_matrix_init(&matrix_dev->config_info, &matrix_mdev->matrix);
>> +	vfio_ap_matrix_init(&matrix_dev->config_info,
>> +			    &matrix_mdev->shadow_apcb);
>>   	hash_init(matrix_mdev->qtable);
>>   	mdev_set_drvdata(mdev, matrix_mdev);
>>   	matrix_mdev->pqap_hook.hook = handle_pqap;
>> @@ -526,8 +528,8 @@ static int vfio_ap_mdev_filter_matrix(struct ap_matrix_mdev *matrix_mdev,
>>   		 * If the APID is not assigned to the host AP configuration,
>>   		 * we can not assign it to the guest's AP configuration
>>   		 */
>> -		if (!test_bit_inv(apid,
>> -				  (unsigned long *)matrix_dev->info.apm)) {
>> +		if (!test_bit_inv(apid, (unsigned long *)
>> +				  matrix_dev->config_info.apm)) {
>>   			clear_bit_inv(apid, shadow_apcb->apm);
>>   			continue;
>>   		}
>> @@ -540,7 +542,7 @@ static int vfio_ap_mdev_filter_matrix(struct ap_matrix_mdev *matrix_mdev,
>>   			 * guest's AP configuration
>>   			 */
>>   			if (!test_bit_inv(apqi, (unsigned long *)
>> -					  matrix_dev->info.aqm)) {
>> +					  matrix_dev->config_info.aqm)) {
>>   				clear_bit_inv(apqi, shadow_apcb->aqm);
>>   				continue;
>>   			}
>> @@ -594,7 +596,7 @@ static bool vfio_ap_mdev_config_shadow_apcb(struct ap_matrix_mdev *matrix_mdev)
>>   	int napm, naqm;
>>   	struct ap_matrix shadow_apcb;
>>   
>> -	vfio_ap_matrix_init(&matrix_dev->info, &shadow_apcb);
>> +	vfio_ap_matrix_init(&matrix_dev->config_info, &shadow_apcb);
>>   	napm = bitmap_weight(matrix_mdev->matrix.apm, AP_DEVICES);
>>   	naqm = bitmap_weight(matrix_mdev->matrix.aqm, AP_DOMAINS);
>>   
>> @@ -741,7 +743,7 @@ static bool vfio_ap_mdev_assign_apqis_4_apid(struct ap_matrix_mdev *matrix_mdev,
>>   
>>   	for_each_set_bit_inv(apqi, matrix_mdev->matrix.aqm, AP_DOMAINS) {
>>   		if (!test_bit_inv(apqi,
>> -				  (unsigned long *) matrix_dev->info.aqm))
>> +				  (unsigned long *)matrix_dev->config_info.aqm))
>>   			clear_bit_inv(apqi, aqm);
>>   
>>   		apqn = AP_MKQID(apid, apqi);
>> @@ -764,7 +766,7 @@ static bool vfio_ap_mdev_assign_guest_apid(struct ap_matrix_mdev *matrix_mdev,
>>   	unsigned long apqi, apqn;
>>   
>>   	if (!vfio_ap_mdev_has_crycb(matrix_mdev) ||
>> -	    !test_bit_inv(apid, (unsigned long *)matrix_dev->info.apm))
>> +	    !test_bit_inv(apid, (unsigned long *)matrix_dev->config_info.apm))
>>   		return false;
>>   
>>   	if (bitmap_empty(matrix_mdev->shadow_apcb.aqm, AP_DOMAINS))
>> @@ -931,8 +933,8 @@ static bool vfio_ap_mdev_assign_apids_4_apqi(struct ap_matrix_mdev *matrix_mdev,
>>   	bitmap_copy(apm, matrix_mdev->matrix.apm, AP_DEVICES);
>>   
>>   	for_each_set_bit_inv(apid, matrix_mdev->matrix.apm, AP_DEVICES) {
>> -		if (!test_bit_inv(apid,
>> -				  (unsigned long *) matrix_dev->info.apm))
>> +		if (!test_bit_inv(apid, (unsigned long *)
>> +				  matrix_dev->config_info.apm))
>>   			clear_bit_inv(apqi, apm);
>>   
>>   		apqn = AP_MKQID(apid, apqi);
>> @@ -955,7 +957,7 @@ static bool vfio_ap_mdev_assign_guest_apqi(struct ap_matrix_mdev *matrix_mdev,
>>   	unsigned long apid, apqn;
>>   
>>   	if (!vfio_ap_mdev_has_crycb(matrix_mdev) ||
>> -	    !test_bit_inv(apqi, (unsigned long *)matrix_dev->info.aqm))
>> +	    !test_bit_inv(apqi, (unsigned long *)matrix_dev->config_info.aqm))
>>   		return false;
>>   
>>   	if (bitmap_empty(matrix_mdev->shadow_apcb.apm, AP_DEVICES))
>> @@ -1702,7 +1704,7 @@ int vfio_ap_mdev_probe_queue(struct ap_queue *queue)
>>   void vfio_ap_mdev_remove_queue(struct ap_queue *queue)
>>   {
>>   	struct vfio_ap_queue *q;
>> -	int apid, apqi;
>> +	unsigned long apid, apqi;
>>   
> Unrelated?

Yes, I'll remove it.

>
>>   	mutex_lock(&matrix_dev->lock);
>>   	q = dev_get_drvdata(&queue->ap_dev.device);
>> @@ -1727,3 +1729,126 @@ bool vfio_ap_mdev_resource_in_use(unsigned long *apm, unsigned long *aqm)
>>   
>>   	return in_use;
>>   }
>> +
>> +/**
>> + * vfio_ap_mdev_unassign_apids
>> + *
>> + * @matrix_mdev: The matrix mediated device
>> + *
>> + * @aqm: A bitmap with 256 bits. Each bit in the map represents an APID from 0
>> + *	 to 255 (with the leftmost bit corresponding to APID 0).
>> + *
>> + * Unassigns each APID specified in @aqm that is assigned to the shadow CRYCB
>> + * of @matrix_mdev. Returns true if at least one APID is unassigned; otherwise,
>> + * returns false.
>> + */
>> +static bool vfio_ap_mdev_unassign_apids(struct ap_matrix_mdev *matrix_mdev,
>> +					unsigned long *apm_unassign)
>> +{
>> +	unsigned long apid;
>> +	bool unassigned = false;
>> +
>> +	/*
>> +	 * If the matrix mdev is not in use by a KVM guest, return indicating
>> +	 * that no APIDs have been unassigned.
>> +	 */
>> +	if (!vfio_ap_mdev_has_crycb(matrix_mdev))
>> +		return false;
>> +
>> +	for_each_set_bit_inv(apid, apm_unassign, AP_DEVICES) {
>> +		unassigned |= vfio_ap_mdev_unassign_guest_apid(matrix_mdev,
>> +							       apid);
>> +	}
> I guess, we could accomplish the unassign with operations operating on
> full bitmaps (without looping over bits), but I have no strong opinion
> here.
>
>> +
>> +	return unassigned;
>> +}
>> +
>> +/**
>> + * vfio_ap_mdev_unassign_apqis
>> + *
>> + * @matrix_mdev: The matrix mediated device
>> + *
>> + * @aqm: A bitmap with 256 bits. Each bit in the map represents an APQI from 0
>> + *	 to 255 (with the leftmost bit corresponding to APQI 0).
>> + *
>> + * Unassigns each APQI specified in @aqm that is assigned to the shadow CRYCB
>> + * of @matrix_mdev. Returns true if at least one APQI is unassigned; otherwise,
>> + * returns false.
>> + */
>> +static bool vfio_ap_mdev_unassign_apqis(struct ap_matrix_mdev *matrix_mdev,
>> +					unsigned long *aqm_unassign)
>> +{
>> +	unsigned long apqi;
>> +	bool unassigned = false;
>> +
>> +	/*
>> +	 * If the matrix mdev is not in use by a KVM guest, return indicating
>> +	 * that no APQIs have been unassigned.
>> +	 */
>> +	if (!vfio_ap_mdev_has_crycb(matrix_mdev))
>> +		return false;
>> +
>> +	for_each_set_bit_inv(apqi, aqm_unassign, AP_DOMAINS) {
>> +		unassigned |= vfio_ap_mdev_unassign_guest_apqi(matrix_mdev,
>> +							       apqi);
>> +	}
>> +
>> +	return unassigned;
>> +}
>> +
>> +void vfio_ap_on_cfg_changed(struct ap_config_info *new_config_info,
>> +			    struct ap_config_info *old_config_info)
>> +{
>> +	bool unassigned;
>> +	int ap_remove, aq_remove;
>> +	struct ap_matrix_mdev *matrix_mdev;
>> +	DECLARE_BITMAP(apm_unassign, AP_DEVICES);
>> +	DECLARE_BITMAP(aqm_unassign, AP_DOMAINS);
>> +
>> +	unsigned long *cur_apm, *cur_aqm, *prev_apm, *prev_aqm;
>> +
>> +	if (matrix_dev->flags & AP_MATRIX_CFG_CHG) {
>> +		WARN_ONCE(1, "AP host configuration change already reported");
>> +		return;
>> +	}
>> +
>> +	memcpy(&matrix_dev->config_info, new_config_info,
>> +	       sizeof(struct ap_config_info));
>> +	memcpy(&matrix_dev->config_info_prev, old_config_info,
>> +	       sizeof(struct ap_config_info));
>> +
>> +	cur_apm = (unsigned long *)matrix_dev->config_info.apm;
>> +	cur_aqm = (unsigned long *)matrix_dev->config_info.aqm;
>> +	prev_apm = (unsigned long *)matrix_dev->config_info_prev.apm;
>> +	prev_aqm = (unsigned long *)matrix_dev->config_info_prev.aqm;
>> +
>> +	ap_remove = bitmap_andnot(apm_unassign, prev_apm, cur_apm, AP_DEVICES);
>> +	aq_remove = bitmap_andnot(aqm_unassign, prev_aqm, cur_aqm, AP_DOMAINS);
>> +
>> +	mutex_lock(&matrix_dev->lock);
>> +	matrix_dev->flags |= AP_MATRIX_CFG_CHG;
>> +
>> +	list_for_each_entry(matrix_mdev, &matrix_dev->mdev_list, node) {
>> +		if (!vfio_ap_mdev_has_crycb(matrix_mdev))
>> +			continue;
>> +
>> +		unassigned = false;
>> +
>> +		if (ap_remove)
>> +			if (bitmap_intersects(matrix_mdev->shadow_apcb.apm,
>> +					      apm_unassign, AP_DEVICES))
>> +				if (vfio_ap_mdev_unassign_apids(matrix_mdev,
>> +								apm_unassign))
> This can be done with a single "if".
>
> if (A)
> 	if (B)
> 		if (C)
> 			D;
>
> should be equivalent with
> if (A && B && C)
> 	D;
> and you wouldn't end up with that deep indentation. It is a style thing,
> so unless regulated by the official coding style, it is up to you :)
>
>
>> +					unassigned = true;
>> +		if (aq_remove)
>> +			if (bitmap_intersects(matrix_mdev->shadow_apcb.aqm,
>> +					      aqm_unassign, AP_DOMAINS))
>> +				if (vfio_ap_mdev_unassign_apqis(matrix_mdev,
>> +								aqm_unassign))
>> +					unassigned = true;
>> +
>> +		if (unassigned)
>> +			vfio_ap_mdev_commit_shadow_apcb(matrix_mdev);
>> +	}
>> +	mutex_unlock(&matrix_dev->lock);
>> +}
>> diff --git a/drivers/s390/crypto/vfio_ap_private.h b/drivers/s390/crypto/vfio_ap_private.h
>> index 055bce6d45db..fc8629e28ad3 100644
>> --- a/drivers/s390/crypto/vfio_ap_private.h
>> +++ b/drivers/s390/crypto/vfio_ap_private.h
>> @@ -40,10 +40,13 @@
>>   struct ap_matrix_dev {
>>   	struct device device;
>>   	atomic_t available_instances;
>> -	struct ap_config_info info;
>> +	struct ap_config_info config_info;
>> +	struct ap_config_info config_info_prev;
>>   	struct list_head mdev_list;
>>   	struct mutex lock;
>>   	struct ap_driver  *vfio_ap_drv;
>> +	#define AP_MATRIX_CFG_CHG (1UL << 0)
>> +	unsigned long flags;
>>   };
>>   
>>   extern struct ap_matrix_dev *matrix_dev;
>> @@ -108,5 +111,7 @@ int vfio_ap_mdev_probe_queue(struct ap_queue *queue);
>>   void vfio_ap_mdev_remove_queue(struct ap_queue *queue);
>>   
>>   bool vfio_ap_mdev_resource_in_use(unsigned long *apm, unsigned long *aqm);
>> +void vfio_ap_on_cfg_changed(struct ap_config_info *new_config_info,
>> +			    struct ap_config_info *old_config_info);
>>   
>>   #endif /* _VFIO_AP_PRIVATE_H_ */



* Re: [PATCH v10 13/16] s390/vfio-ap: handle host AP config change notification
  2020-09-28  1:38   ` Halil Pasic
  2020-10-12 20:53     ` Tony Krowiak
@ 2020-10-12 21:27     ` Tony Krowiak
  1 sibling, 0 replies; 79+ messages in thread
From: Tony Krowiak @ 2020-10-12 21:27 UTC (permalink / raw)
  To: Halil Pasic
  Cc: linux-s390, linux-kernel, kvm, freude, borntraeger, cohuck,
	mjrosato, alex.williamson, kwankhede, fiuczy, frankja, david,
	imbrenda, hca, gor, kernel test robot



On 9/27/20 9:38 PM, Halil Pasic wrote:
> On Fri, 21 Aug 2020 15:56:13 -0400
> Tony Krowiak <akrowiak@linux.ibm.com> wrote:
>
>> Implements the driver callback invoked by the AP bus when the host
>> AP configuration has changed. Since this callback is invoked prior to
>> unbinding a device from its device driver, the vfio_ap driver will
>> respond by unplugging the AP adapters, domains and control domains
>> removed from the host's AP configuration from the guests using them.
>>
>> Signed-off-by: Tony Krowiak <akrowiak@linux.ibm.com>
>> Reported-by: kernel test robot <lkp@intel.com>
> Looks reasonable, but shouldn't vfio_ap_mdev_remove_queue() already
> have code that kicks the queue from the shadow at this stage?
>
> I mean if the removal is for a reason different than a host config change,
> we won't update the guest_matrix, or?

This patch specifically handles the AP configuration change notification.
The idea behind this notification is that if a configuration change results
in one or more queues getting removed from the guest, the removal can be
done in bulk before each of the queues is unbound by the AP bus. That way
any cleanup (e.g., resets) can be performed before the bus gets
control.

Also, keep in mind that an unbind can take place for reasons other than
an AP configuration change:
1. Manual unbind of a queue
2. Deconfiguration of an adapter
3. Adapter is broken (e.g., CHECKSTOP)

If the queue is being unbound as a result of an AP configuration
change, the vfio_ap_mdev_remove_queue() function will ignore the
unbind because it has already been handled by this on_config_changed
callback prior to the unbind operation (see patch 15/16).
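
As a rough user-space illustration of that ordering (not the code from
this patch or from patch 15/16), the sketch below assumes a single toy
32-bit guest adapter mask with plain bit numbering, and hypothetical
helpers toy_on_cfg_changed() and toy_remove_queue(). Unplugging the whole
adapter when one of its queues is removed is a simplification made for
the example.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Bulk step: unplug every adapter that was in the previous host
 * configuration but is missing from the current one. Returns true if
 * the guest mask changed, i.e. the shadow would need to be committed.
 */
static bool toy_on_cfg_changed(unsigned int *guest_apm,
			       unsigned int prev_apm, unsigned int cur_apm)
{
	unsigned int gone = prev_apm & ~cur_apm;

	if (!(*guest_apm & gone))
		return false;
	*guest_apm &= ~gone;
	return true;
}

/*
 * Per-queue step, called at unbind time. If the adapter is no longer
 * plugged into the guest, the bulk step already handled it and there
 * is nothing left to do. Returns true if the guest mask changed.
 */
static bool toy_remove_queue(unsigned int *guest_apm, unsigned int apid)
{
	assert(apid < 8 * sizeof(*guest_apm));
	if (!(*guest_apm & (1u << apid)))
		return false;
	*guest_apm &= ~(1u << apid);
	return true;
}
```

With a guest mask of 0x7 and a host change from 0x7 to 0x5, the bulk step
unplugs adapter 1, and the later unbind of a queue on adapter 1 is a
no-op. An unbind for one of the other reasons listed above still takes
the per-queue path.
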
>
>> ---
>>   drivers/s390/crypto/vfio_ap_drv.c     |   5 +-
>>   drivers/s390/crypto/vfio_ap_ops.c     | 147 ++++++++++++++++++++++++--
>>   drivers/s390/crypto/vfio_ap_private.h |   7 +-
>>   3 files changed, 146 insertions(+), 13 deletions(-)
>>
>> diff --git a/drivers/s390/crypto/vfio_ap_drv.c b/drivers/s390/crypto/vfio_ap_drv.c
>> index aae5b3d8e3fa..ea0a7603e886 100644
>> --- a/drivers/s390/crypto/vfio_ap_drv.c
>> +++ b/drivers/s390/crypto/vfio_ap_drv.c
>> @@ -115,9 +115,11 @@ static int vfio_ap_matrix_dev_create(void)
>>   
>>   	/* Fill in config info via PQAP(QCI), if available */
>>   	if (test_facility(12)) {
>> -		ret = ap_qci(&matrix_dev->info);
>> +		ret = ap_qci(&matrix_dev->config_info);
>>   		if (ret)
>>   			goto matrix_alloc_err;
>> +		memcpy(&matrix_dev->config_info_prev, &matrix_dev->config_info,
>> +		       sizeof(struct ap_config_info));
>>   	}
>>   
>>   	mutex_init(&matrix_dev->lock);
>> @@ -177,6 +179,7 @@ static int __init vfio_ap_init(void)
>>   	vfio_ap_drv.remove = vfio_ap_queue_dev_remove;
>>   	vfio_ap_drv.in_use = vfio_ap_mdev_resource_in_use;
>>   	vfio_ap_drv.ids = ap_queue_ids;
>> +	vfio_ap_drv.on_config_changed = vfio_ap_on_cfg_changed;
>>   
>>   	ret = ap_driver_register(&vfio_ap_drv, THIS_MODULE, VFIO_AP_DRV_NAME);
>>   	if (ret) {
>> diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c
>> index 2b01a8eb6ee7..e002d556abab 100644
>> --- a/drivers/s390/crypto/vfio_ap_ops.c
>> +++ b/drivers/s390/crypto/vfio_ap_ops.c
>> @@ -347,7 +347,9 @@ static int vfio_ap_mdev_create(struct kobject *kobj, struct mdev_device *mdev)
>>   	}
>>   
>>   	matrix_mdev->mdev = mdev;
>> -	vfio_ap_matrix_init(&matrix_dev->info, &matrix_mdev->matrix);
>> +	vfio_ap_matrix_init(&matrix_dev->config_info, &matrix_mdev->matrix);
>> +	vfio_ap_matrix_init(&matrix_dev->config_info,
>> +			    &matrix_mdev->shadow_apcb);
>>   	hash_init(matrix_mdev->qtable);
>>   	mdev_set_drvdata(mdev, matrix_mdev);
>>   	matrix_mdev->pqap_hook.hook = handle_pqap;
>> @@ -526,8 +528,8 @@ static int vfio_ap_mdev_filter_matrix(struct ap_matrix_mdev *matrix_mdev,
>>   		 * If the APID is not assigned to the host AP configuration,
>>   		 * we can not assign it to the guest's AP configuration
>>   		 */
>> -		if (!test_bit_inv(apid,
>> -				  (unsigned long *)matrix_dev->info.apm)) {
>> +		if (!test_bit_inv(apid, (unsigned long *)
>> +				  matrix_dev->config_info.apm)) {
>>   			clear_bit_inv(apid, shadow_apcb->apm);
>>   			continue;
>>   		}
>> @@ -540,7 +542,7 @@ static int vfio_ap_mdev_filter_matrix(struct ap_matrix_mdev *matrix_mdev,
>>   			 * guest's AP configuration
>>   			 */
>>   			if (!test_bit_inv(apqi, (unsigned long *)
>> -					  matrix_dev->info.aqm)) {
>> +					  matrix_dev->config_info.aqm)) {
>>   				clear_bit_inv(apqi, shadow_apcb->aqm);
>>   				continue;
>>   			}
>> @@ -594,7 +596,7 @@ static bool vfio_ap_mdev_config_shadow_apcb(struct ap_matrix_mdev *matrix_mdev)
>>   	int napm, naqm;
>>   	struct ap_matrix shadow_apcb;
>>   
>> -	vfio_ap_matrix_init(&matrix_dev->info, &shadow_apcb);
>> +	vfio_ap_matrix_init(&matrix_dev->config_info, &shadow_apcb);
>>   	napm = bitmap_weight(matrix_mdev->matrix.apm, AP_DEVICES);
>>   	naqm = bitmap_weight(matrix_mdev->matrix.aqm, AP_DOMAINS);
>>   
>> @@ -741,7 +743,7 @@ static bool vfio_ap_mdev_assign_apqis_4_apid(struct ap_matrix_mdev *matrix_mdev,
>>   
>>   	for_each_set_bit_inv(apqi, matrix_mdev->matrix.aqm, AP_DOMAINS) {
>>   		if (!test_bit_inv(apqi,
>> -				  (unsigned long *) matrix_dev->info.aqm))
>> +				  (unsigned long *)matrix_dev->config_info.aqm))
>>   			clear_bit_inv(apqi, aqm);
>>   
>>   		apqn = AP_MKQID(apid, apqi);
>> @@ -764,7 +766,7 @@ static bool vfio_ap_mdev_assign_guest_apid(struct ap_matrix_mdev *matrix_mdev,
>>   	unsigned long apqi, apqn;
>>   
>>   	if (!vfio_ap_mdev_has_crycb(matrix_mdev) ||
>> -	    !test_bit_inv(apid, (unsigned long *)matrix_dev->info.apm))
>> +	    !test_bit_inv(apid, (unsigned long *)matrix_dev->config_info.apm))
>>   		return false;
>>   
>>   	if (bitmap_empty(matrix_mdev->shadow_apcb.aqm, AP_DOMAINS))
>> @@ -931,8 +933,8 @@ static bool vfio_ap_mdev_assign_apids_4_apqi(struct ap_matrix_mdev *matrix_mdev,
>>   	bitmap_copy(apm, matrix_mdev->matrix.apm, AP_DEVICES);
>>   
>>   	for_each_set_bit_inv(apid, matrix_mdev->matrix.apm, AP_DEVICES) {
>> -		if (!test_bit_inv(apid,
>> -				  (unsigned long *) matrix_dev->info.apm))
>> +		if (!test_bit_inv(apid, (unsigned long *)
>> +				  matrix_dev->config_info.apm))
>>   			clear_bit_inv(apqi, apm);
>>   
>>   		apqn = AP_MKQID(apid, apqi);
>> @@ -955,7 +957,7 @@ static bool vfio_ap_mdev_assign_guest_apqi(struct ap_matrix_mdev *matrix_mdev,
>>   	unsigned long apid, apqn;
>>   
>>   	if (!vfio_ap_mdev_has_crycb(matrix_mdev) ||
>> -	    !test_bit_inv(apqi, (unsigned long *)matrix_dev->info.aqm))
>> +	    !test_bit_inv(apqi, (unsigned long *)matrix_dev->config_info.aqm))
>>   		return false;
>>   
>>   	if (bitmap_empty(matrix_mdev->shadow_apcb.apm, AP_DEVICES))
>> @@ -1702,7 +1704,7 @@ int vfio_ap_mdev_probe_queue(struct ap_queue *queue)
>>   void vfio_ap_mdev_remove_queue(struct ap_queue *queue)
>>   {
>>   	struct vfio_ap_queue *q;
>> -	int apid, apqi;
>> +	unsigned long apid, apqi;
>>   
> Unrelated?

Yes, I'll remove it.

>
>>   	mutex_lock(&matrix_dev->lock);
>>   	q = dev_get_drvdata(&queue->ap_dev.device);
>> @@ -1727,3 +1729,126 @@ bool vfio_ap_mdev_resource_in_use(unsigned long *apm, unsigned long *aqm)
>>   
>>   	return in_use;
>>   }
>> +
>> +/**
>> + * vfio_ap_mdev_unassign_apids
>> + *
>> + * @matrix_mdev: The matrix mediated device
>> + *
>> + * @apm_unassign: A bitmap with 256 bits. Each bit in the map represents an
>> + *	 APID from 0 to 255 (with the leftmost bit corresponding to APID 0).
>> + *
>> + * Unassigns each APID specified in @apm_unassign that is assigned to the shadow CRYCB
>> + * of @matrix_mdev. Returns true if at least one APID is unassigned; otherwise,
>> + * returns false.
>> + */
>> +static bool vfio_ap_mdev_unassign_apids(struct ap_matrix_mdev *matrix_mdev,
>> +					unsigned long *apm_unassign)
>> +{
>> +	unsigned long apid;
>> +	bool unassigned = false;
>> +
>> +	/*
>> +	 * If the matrix mdev is not in use by a KVM guest, return indicating
>> +	 * that no APIDs have been unassigned.
>> +	 */
>> +	if (!vfio_ap_mdev_has_crycb(matrix_mdev))
>> +		return false;
>> +
>> +	for_each_set_bit_inv(apid, apm_unassign, AP_DEVICES) {
>> +		unassigned |= vfio_ap_mdev_unassign_guest_apid(matrix_mdev,
>> +							       apid);
>> +	}
> I guess, we could accomplish the unassign with operations operating on
> full bitmaps (without looping over bits), but I have no strong opinion
> here.

Yes we can and will.
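
For illustration only, here is a minimal userspace sketch of the whole-bitmap idea, assuming that unassigning reduces to clearing the bits of the removal mask in the shadow APCB mask. The sketch_* names are hypothetical stand-ins, not the driver's helpers; in the kernel the same effect would presumably come from the existing bitmap API (for example bitmap_intersects() followed by bitmap_andnot()).

```c
#include <stdbool.h>
#include <stddef.h>
#include <limits.h>

/* Hypothetical sizes: a 256-bit mask, like the APM/AQM bitmaps in the patch. */
#define SKETCH_NBITS 256
#define SKETCH_BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)
#define SKETCH_NLONGS (SKETCH_NBITS / SKETCH_BITS_PER_LONG)

/*
 * Clear every bit set in @unassign from @shadow, one word at a time,
 * instead of looping over individual bits. Returns true if at least one
 * bit that was set in @shadow got cleared. Bit numbering within a word
 * does not matter here because the operation is purely word-wise.
 */
static bool sketch_unassign_mask(unsigned long *shadow,
				 const unsigned long *unassign)
{
	bool unassigned = false;
	size_t i;

	for (i = 0; i < SKETCH_NLONGS; i++) {
		if (shadow[i] & unassign[i])
			unassigned = true;
		shadow[i] &= ~unassign[i];
	}

	return unassigned;
}
```

If the per-APID helper does more than clear a bit in the shadow mask, that extra work would still have to be handled separately.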

>
>> +
>> +	return unassigned;
>> +}
>> +
>> +/**
>> + * vfio_ap_mdev_unassign_apqis
>> + *
>> + * @matrix_mdev: The matrix mediated device
>> + *
>> + * @aqm_unassign: A bitmap with 256 bits. Each bit in the map represents an
>> + *	 APQI from 0 to 255 (with the leftmost bit corresponding to APQI 0).
>> + *
>> + * Unassigns each APQI specified in @aqm_unassign that is assigned to the shadow CRYCB
>> + * of @matrix_mdev. Returns true if at least one APQI is unassigned; otherwise,
>> + * returns false.
>> + */
>> +static bool vfio_ap_mdev_unassign_apqis(struct ap_matrix_mdev *matrix_mdev,
>> +					unsigned long *aqm_unassign)
>> +{
>> +	unsigned long apqi;
>> +	bool unassigned = false;
>> +
>> +	/*
>> +	 * If the matrix mdev is not in use by a KVM guest, return indicating
>> +	 * that no APQIs have been unassigned.
>> +	 */
>> +	if (!vfio_ap_mdev_has_crycb(matrix_mdev))
>> +		return false;
>> +
>> +	for_each_set_bit_inv(apqi, aqm_unassign, AP_DOMAINS) {
>> +		unassigned |= vfio_ap_mdev_unassign_guest_apqi(matrix_mdev,
>> +							       apqi);
>> +	}
>> +
>> +	return unassigned;
>> +}
>> +
>> +void vfio_ap_on_cfg_changed(struct ap_config_info *new_config_info,
>> +			    struct ap_config_info *old_config_info)
>> +{
>> +	bool unassigned;
>> +	int ap_remove, aq_remove;
>> +	struct ap_matrix_mdev *matrix_mdev;
>> +	DECLARE_BITMAP(apm_unassign, AP_DEVICES);
>> +	DECLARE_BITMAP(aqm_unassign, AP_DOMAINS);
>> +
>> +	unsigned long *cur_apm, *cur_aqm, *prev_apm, *prev_aqm;
>> +
>> +	if (matrix_dev->flags & AP_MATRIX_CFG_CHG) {
>> +		WARN_ONCE(1, "AP host configuration change already reported");
>> +		return;
>> +	}
>> +
>> +	memcpy(&matrix_dev->config_info, new_config_info,
>> +	       sizeof(struct ap_config_info));
>> +	memcpy(&matrix_dev->config_info_prev, old_config_info,
>> +	       sizeof(struct ap_config_info));
>> +
>> +	cur_apm = (unsigned long *)matrix_dev->config_info.apm;
>> +	cur_aqm = (unsigned long *)matrix_dev->config_info.aqm;
>> +	prev_apm = (unsigned long *)matrix_dev->config_info_prev.apm;
>> +	prev_aqm = (unsigned long *)matrix_dev->config_info_prev.aqm;
>> +
>> +	ap_remove = bitmap_andnot(apm_unassign, prev_apm, cur_apm, AP_DEVICES);
>> +	aq_remove = bitmap_andnot(aqm_unassign, prev_aqm, cur_aqm, AP_DOMAINS);
>> +
>> +	mutex_lock(&matrix_dev->lock);
>> +	matrix_dev->flags |= AP_MATRIX_CFG_CHG;
>> +
>> +	list_for_each_entry(matrix_mdev, &matrix_dev->mdev_list, node) {
>> +		if (!vfio_ap_mdev_has_crycb(matrix_mdev))
>> +			continue;
>> +
>> +		unassigned = false;
>> +
>> +		if (ap_remove)
>> +			if (bitmap_intersects(matrix_mdev->shadow_apcb.apm,
>> +					      apm_unassign, AP_DEVICES))
>> +				if (vfio_ap_mdev_unassign_apids(matrix_mdev,
>> +								apm_unassign))
> This can be done with a single "if".
>
> if (A)
> 	if (B)
> 		if (C)
> 			D;
>
> should be equivalent with
> if (A && B && C)
> 	D;
> and you wouldn't end up with that deep indentation. It is a style thing,
> so unless regulated by the official coding style, it is up to you :)

I will simplify it.
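
As a rough sketch of why the flattening is safe: && evaluates left to right and stops at the first false operand, so the later checks run under exactly the same conditions as in the nested form. The sketch_* helpers and counters below are hypothetical and only make the evaluation order observable; they are not part of the driver.

```c
#include <stdbool.h>

/* Hypothetical counters recording how often each check was evaluated. */
static int sketch_b_calls, sketch_c_calls;

static bool sketch_b(bool v) { sketch_b_calls++; return v; }
static bool sketch_c(bool v) { sketch_c_calls++; return v; }

/* Nested form, shaped like the code in the patch under review. */
static bool sketch_nested(bool a, bool b, bool c)
{
	bool done = false;

	if (a)
		if (sketch_b(b))
			if (sketch_c(c))
				done = true;

	return done;
}

/*
 * Flattened form: sketch_b() is only called when a is true, and
 * sketch_c() only when sketch_b() returned true, as in the nested form.
 */
static bool sketch_flat(bool a, bool b, bool c)
{
	return a && sketch_b(b) && sketch_c(c);
}
```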

>
>
>> +					unassigned = true;
>> +		if (aq_remove)
>> +			if (bitmap_intersects(matrix_mdev->shadow_apcb.aqm,
>> +					      aqm_unassign, AP_DOMAINS))
>> +				if (vfio_ap_mdev_unassign_apqis(matrix_mdev,
>> +								aqm_unassign))
>> +					unassigned = true;
>> +
>> +		if (unassigned)
>> +			vfio_ap_mdev_commit_shadow_apcb(matrix_mdev);
>> +	}
>> +	mutex_unlock(&matrix_dev->lock);
>> +}
>> diff --git a/drivers/s390/crypto/vfio_ap_private.h b/drivers/s390/crypto/vfio_ap_private.h
>> index 055bce6d45db..fc8629e28ad3 100644
>> --- a/drivers/s390/crypto/vfio_ap_private.h
>> +++ b/drivers/s390/crypto/vfio_ap_private.h
>> @@ -40,10 +40,13 @@
>>   struct ap_matrix_dev {
>>   	struct device device;
>>   	atomic_t available_instances;
>> -	struct ap_config_info info;
>> +	struct ap_config_info config_info;
>> +	struct ap_config_info config_info_prev;
>>   	struct list_head mdev_list;
>>   	struct mutex lock;
>>   	struct ap_driver  *vfio_ap_drv;
>> +	#define AP_MATRIX_CFG_CHG (1UL << 0)
>> +	unsigned long flags;
>>   };
>>   
>>   extern struct ap_matrix_dev *matrix_dev;
>> @@ -108,5 +111,7 @@ int vfio_ap_mdev_probe_queue(struct ap_queue *queue);
>>   void vfio_ap_mdev_remove_queue(struct ap_queue *queue);
>>   
>>   bool vfio_ap_mdev_resource_in_use(unsigned long *apm, unsigned long *aqm);
>> +void vfio_ap_on_cfg_changed(struct ap_config_info *new_config_info,
>> +			    struct ap_config_info *old_config_info);
>>   
>>   #endif /* _VFIO_AP_PRIVATE_H_ */



* Re: [PATCH v10 16/16] s390/vfio-ap: update docs to include dynamic config support
  2020-09-28  2:48   ` Halil Pasic
@ 2020-10-16 16:36     ` Tony Krowiak
  0 siblings, 0 replies; 79+ messages in thread
From: Tony Krowiak @ 2020-10-16 16:36 UTC (permalink / raw)
  To: Halil Pasic
  Cc: linux-s390, linux-kernel, kvm, freude, borntraeger, cohuck,
	mjrosato, alex.williamson, kwankhede, fiuczy, frankja, david,
	imbrenda, hca, gor



On 9/27/20 10:48 PM, Halil Pasic wrote:
> On Fri, 21 Aug 2020 15:56:16 -0400
> Tony Krowiak <akrowiak@linux.ibm.com> wrote:
>
>> Update the documentation in vfio-ap.rst to include information about the
>> AP dynamic configuration support (i.e., hot plug of adapters, domains
>> and control domains via the matrix mediated device's sysfs assignment
>> attributes).
> If you don't mind I would like to skip out on commenting on the
> documentation update, because of the design issues I've raised. I think
> we should first clear that up. Is that OK with you?

That's fine since the docs will likely have to change.

>
>> Signed-off-by: Tony Krowiak <akrowiak@linux.ibm.com>
>> ---
>>   Documentation/s390/vfio-ap.rst | 362 ++++++++++++++++++++++++++-------
>>   1 file changed, 285 insertions(+), 77 deletions(-)
>>
>> diff --git a/Documentation/s390/vfio-ap.rst b/Documentation/s390/vfio-ap.rst
>> index e15436599086..8907aeca8fb7 100644
>> --- a/Documentation/s390/vfio-ap.rst
>> +++ b/Documentation/s390/vfio-ap.rst
>> @@ -253,7 +253,7 @@ The process for reserving an AP queue for use by a KVM guest is:
>>   1. The administrator loads the vfio_ap device driver
>>   2. The vfio-ap driver during its initialization will register a single 'matrix'
>>      device with the device core. This will serve as the parent device for
>> -   all mediated matrix devices used to configure an AP matrix for a guest.
>> +   all matrix mediated devices used to configure an AP matrix for a guest.
>>   3. The /sys/devices/vfio_ap/matrix device is created by the device core
>>   4. The vfio_ap device driver will register with the AP bus for AP queue devices
>>      of type 10 and higher (CEX4 and newer). The driver will provide the vfio_ap
>> @@ -269,7 +269,7 @@ The process for reserving an AP queue for use by a KVM guest is:
>>      default zcrypt cex4queue driver.
>>   8. The AP bus probes the vfio_ap device driver to bind the queues reserved for
>>      it.
>> -9. The administrator creates a passthrough type mediated matrix device to be
>> +9. The administrator creates a passthrough type matrix mediated device to be
>>      used by a guest
>>   10. The administrator assigns the adapters, usage domains and control domains
>>       to be exclusively used by a guest.
>> @@ -279,14 +279,14 @@ Set up the VFIO mediated device interfaces
>>   The VFIO AP device driver utilizes the common interface of the VFIO mediated
>>   device core driver to:
>>   
>> -* Register an AP mediated bus driver to add a mediated matrix device to and
>> +* Register an AP mediated bus driver to add a matrix mediated device to and
>>     remove it from a VFIO group.
>> -* Create and destroy a mediated matrix device
>> -* Add a mediated matrix device to and remove it from the AP mediated bus driver
>> -* Add a mediated matrix device to and remove it from an IOMMU group
>> +* Create and destroy a matrix mediated device
>> +* Add a matrix mediated device to and remove it from the AP mediated bus driver
>> +* Add a matrix mediated device to and remove it from an IOMMU group
>>   
>>   The following high-level block diagram shows the main components and interfaces
>> -of the VFIO AP mediated matrix device driver::
>> +of the VFIO AP matrix mediated device driver::
>>   
>>      +-------------+
>>      |             |
>> @@ -351,29 +351,37 @@ matrix device.
>>       This attribute group identifies the user-defined sysfs attributes of the
>>       mediated device. When a device is registered with the VFIO mediated device
>>       framework, the sysfs attribute files identified in the 'mdev_attr_groups'
>> -    structure will be created in the mediated matrix device's directory. The
>> -    sysfs attributes for a mediated matrix device are:
>> +    structure will be created in the matrix mediated device's directory. The
>> +    sysfs attributes for a matrix mediated device are:
>>   
>>       assign_adapter / unassign_adapter:
>>         Write-only attributes for assigning/unassigning an AP adapter to/from the
>> -      mediated matrix device. To assign/unassign an adapter, the APID of the
>> +      matrix mediated device. To assign/unassign an adapter, the APID of the
>>         adapter is echoed to the respective attribute file.
>>       assign_domain / unassign_domain:
>>         Write-only attributes for assigning/unassigning an AP usage domain to/from
>> -      the mediated matrix device. To assign/unassign a domain, the domain
>> +      the matrix mediated device. To assign/unassign a domain, the domain
>>         number of the usage domain is echoed to the respective attribute
>>         file.
>>       matrix:
>> -      A read-only file for displaying the APQNs derived from the cross product
>> -      of the adapter and domain numbers assigned to the mediated matrix device.
>> +      A read-only file for displaying the APQNs derived from the Cartesian
>> +      product of the adapter and domain numbers assigned to the matrix mediated
>> +      device.
>> +    guest_matrix:
>> +      A read-only file for displaying the APQNs derived from the Cartesian
>> +      product of the adapter and domain numbers assigned to the APM and AQM
>> +      fields respectively of the KVM guest's CRYCB. This will differ from the
>> +      matrix if any APQNs assigned to the matrix mediated device do not
>> +      reference a queue device bound to the vfio_ap device driver (i.e., the
>> +      queue is not in the AP configuration).
>>       assign_control_domain / unassign_control_domain:
>>         Write-only attributes for assigning/unassigning an AP control domain
>> -      to/from the mediated matrix device. To assign/unassign a control domain,
>> +      to/from the matrix mediated device. To assign/unassign a control domain,
>>         the ID of the domain to be assigned/unassigned is echoed to the respective
>>         attribute file.
>>       control_domains:
>>         A read-only file for displaying the control domain numbers assigned to the
>> -      mediated matrix device.
>> +      matrix mediated device.
>>   
>>   * functions:
>>   
>> @@ -385,7 +393,7 @@ matrix device.
>>         domains assigned via the corresponding sysfs attributes files
>>   
>>     remove:
>> -    deallocates the mediated matrix device's ap_matrix_mdev structure. This will
>> +    deallocates the matrix mediated device's ap_matrix_mdev structure. This will
>>       be allowed only if a running guest is not using the mdev.
>>   
>>   * callback interfaces
>> @@ -397,7 +405,7 @@ matrix device.
>>       for the mdev matrix device to the MDEV bus. Access to the KVM structure used
>>       to configure the KVM guest is provided via this callback. The KVM structure,
>>       is used to configure the guest's access to the AP matrix defined via the
>> -    mediated matrix device's sysfs attribute files.
>> +    matrix mediated device's sysfs attribute files.
>>     release:
>>       unregisters the VFIO_GROUP_NOTIFY_SET_KVM notifier callback function for the
>>       mdev matrix device and deconfigures the guest's AP matrix.
>> @@ -410,11 +418,49 @@ function is called when QEMU connects to KVM. The guest's AP matrix is
>>   configured via it's CRYCB by:
>>   
>>   * Setting the bits in the APM corresponding to the APIDs assigned to the
>> -  mediated matrix device via its 'assign_adapter' interface.
>> +  matrix mediated device via its 'assign_adapter' interface.
>>   * Setting the bits in the AQM corresponding to the domains assigned to the
>> -  mediated matrix device via its 'assign_domain' interface.
>> +  matrix mediated device via its 'assign_domain' interface.
>>   * Setting the bits in the ADM corresponding to the domain dIDs assigned to the
>> -  mediated matrix device via its 'assign_control_domains' interface.
>> +  matrix mediated device via its 'assign_control_domains' interface.
>> +
>> +The linux device model precludes passing a device through to a KVM guest that
>> +is not bound to the device driver facilitating its pass-through. Consequently,
>> +an APQN that does not reference a queue device bound to the vfio_ap device
>> +driver will not be assigned to a KVM guest's CRYCB. The AP architecture,
>> +however, does not provide a means to filter individual APQNs from the guest's
>> +CRYCB, so the following logic is employed to filter them:
>> +
>> +* Filter the APQNs assigned to the matrix mediated device by APID.
>> +
>> +  To filter APQNs by APID, each APQN derived from the Cartesian product of the
>> +  adapter numbers (APID) and domain numbers (APQI) assigned to the mdev is
>> +  examined and if any one of them does not reference a queue device bound to the
>> +  vfio_ap device driver, the adapter will not be plugged into the guest (i.e.,
>> +  the bit corresponding to its APID will not be set in the APM of the guest's
>> +  CRYCB).
>> +
>> +  If at least one adapter is plugged into the guest, then all domains assigned
>> +  to the mdev will also be plugged into the guest (i.e., the bits corresponding
>> +  to the APQIs of the domains assigned to the mdev will be set in the AQM field
>> +  of the guest's CRYCB).
>> +
>> +* Filter the APQNs assigned to the matrix mediated device by APQI.
>> +
>> +  The APQNs will be filtered by APQI if filtering by APID does not result in any
>> +  adapters or domains getting plugged into the guest.
>> +
>> +  To filter APQNs by APQI, each APQN derived from the Cartesian product of the
>> +  adapter numbers (APID) and domain numbers (APQI) assigned to the mdev is
>> +  examined and if any one of them does not reference a queue device bound to the
>> +  vfio_ap device driver, the domain will not be plugged into the guest (i.e.,
>> +  the bit corresponding to its APQI will not be set in the AQM of the guest's
>> +  CRYCB).
>> +
>> +  If at least one domain is plugged into the guest, then all adapters assigned
>> +  to the mdev will also be plugged into the guest (i.e., the bits corresponding
>> +  to the APIDs of the adapters assigned to the mdev will be set in the APM field
>> +  of the guest's CRYCB).
>>   
>>   The CPU model features for AP
>>   -----------------------------
>> @@ -435,6 +481,10 @@ available to a KVM guest via the following CPU model features:
>>      can be made available to the guest only if it is available on the host (i.e.,
>>      facility bit 12 is set).
>>   
>> +4. apqi: Indicates AP queue interrupts are available on the guest. This facility
>> +   can be made available to the guest only if it is available on the host (i.e.,
>> +   facility bit 65 is set).
>> +
>>   Note: If the user chooses to specify a CPU model different than the 'host'
>>   model to QEMU, the CPU model features and facilities need to be turned on
>>   explicitly; for example::
>> @@ -444,7 +494,7 @@ explicitly; for example::
>>   A guest can be precluded from using AP features/facilities by turning them off
>>   explicitly; for example::
>>   
>> -     /usr/bin/qemu-system-s390x ... -cpu host,ap=off,apqci=off,apft=off
>> +     /usr/bin/qemu-system-s390x ... -cpu host,ap=off,apqci=off,apft=off,apqi=off
>>   
>>   Note: If the APFT facility is turned off (apft=off) for the guest, the guest
>>   will not see any AP devices. The zcrypt device drivers that register for type 10
>> @@ -530,40 +580,56 @@ These are the steps:
>>   
>>   2. Secure the AP queues to be used by the three guests so that the host can not
>>      access them. To secure them, there are two sysfs files that specify
>> -   bitmasks marking a subset of the APQN range as 'usable by the default AP
>> -   queue device drivers' or 'not usable by the default device drivers' and thus
>> -   available for use by the vfio_ap device driver'. The location of the sysfs
>> -   files containing the masks are::
>> +   bitmasks marking a subset of the APQN range as usable only by the default AP
>> +   queue device drivers. All remaining APQNs are available for use by
>> +   any other device driver. The vfio_ap device driver is currently the only
>> +   non-default device driver. The location of the sysfs files containing the
>> +   masks are::
>>   
>>        /sys/bus/ap/apmask
>>        /sys/bus/ap/aqmask
>>   
>>      The 'apmask' is a 256-bit mask that identifies a set of AP adapter IDs
>> -   (APID). Each bit in the mask, from left to right (i.e., from most significant
>> -   to least significant bit in big endian order), corresponds to an APID from
>> -   0-255. If a bit is set, the APID is marked as usable only by the default AP
>> -   queue device drivers; otherwise, the APID is usable by the vfio_ap
>> -   device driver.
>> +   (APID). Each bit in the mask, from left to right corresponds to an APID from
>> +   0-255. If a bit is set, the APID is marked as available to the default AP
>> +   queue device drivers.
>>   
>>      The 'aqmask' is a 256-bit mask that identifies a set of AP queue indexes
>> -   (APQI). Each bit in the mask, from left to right (i.e., from most significant
>> -   to least significant bit in big endian order), corresponds to an APQI from
>> -   0-255. If a bit is set, the APQI is marked as usable only by the default AP
>> -   queue device drivers; otherwise, the APQI is usable by the vfio_ap device
>> -   driver.
>> +   (APQI). Each bit in the mask, from left to right corresponds to an APQI from
>> +   0-255. If a bit is set, the APQI is marked as available to the default AP
>> +   queue device drivers.
>> +
>> +   The Cartesian product of the APIDs corresponding to the bits set in the
>> +   apmask and the APQIs corresponding to the bits set in the aqmask comprises
>> +   the subset of APQNs that can be used only by the host default device drivers.
>> +   All other APQNs are available to the non-default device drivers such as the
>> +   vfio_ap driver.
>> +
>> +   Take, for example, the following masks::
>> +
>> +      apmask:
>> +      0x7d00000000000000000000000000000000000000000000000000000000000000
>> +
>> +      aqmask:
>> +      0x8000000000000000000000000000000000000000000000000000000000000000
>> +
>> +   The masks indicate:
>>   
>> -   Take, for example, the following mask::
>> +   * Adapters 1, 2, 3, 4, 5, and 7 are available for use by the host default
>> +     device drivers.
>>   
>> -      0x7dffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffff
>> +   * Domain 0 is available for use by the host default device drivers
>>   
>> -    It indicates:
>> +   * The subset of APQNs available for use only by the default host device
>> +     drivers are:
>>   
>> -      1, 2, 3, 4, 5, and 7-255 belong to the default drivers' pool, and 0 and 6
>> -      belong to the vfio_ap device driver's pool.
>> +     (1,0), (2,0), (3,0), (4,0), (5,0) and (7,0)
>> +
>> +   * All other APQNs are available for use by the non-default device drivers.
>>   
>>      The APQN of each AP queue device assigned to the linux host is checked by the
>> -   AP bus against the set of APQNs derived from the cross product of APIDs
>> -   and APQIs marked as usable only by the default AP queue device drivers. If a
>> +   AP bus against the set of APQNs derived from the Cartesian product of APIDs
>> +   and APQIs marked as available to the default AP queue device drivers. If a
>>      match is detected,  only the default AP queue device drivers will be probed;
>>      otherwise, the vfio_ap device driver will be probed.
>>   
>> @@ -627,6 +693,16 @@ These are the steps:
>>   	    default drivers pool:    adapter 0-15, domain 1
>>   	    alternate drivers pool:  adapter 16-255, domains 0, 2-255
>>   
>> +   Note ***:
>> +   Changing a mask such that one or more APQNs will be taken from a matrix
>> +   mediated device (see below) will fail with an error (EADDRINUSE). The error
>> +   is logged to the kernel ring buffer which can be viewed with the 'dmesg'
>> +   command. The output identifies each APQN flagged as 'in use' and the matrix
>> +   mediated device to which it is assigned; for example:
>> +
>> +   Userspace may not re-assign queue 05.0054 already assigned to 62177883-f1bb-47f0-914d-32a22e3a8804
>> +   Userspace may not re-assign queue 04.0054 already assigned to cef03c3c-903d-4ecc-9a83-40694cb8aee4
>> +
>>   Securing the APQNs for our example
>>   ----------------------------------
>>      To secure the AP queues 05.0004, 05.0047, 05.00ab, 05.00ff, 06.0004, 06.0047,
>> @@ -684,7 +760,7 @@ Securing the APQNs for our example
>>   
>>        /sys/devices/vfio_ap/matrix/
>>        --- [mdev_supported_types]
>> -     ------ [vfio_ap-passthrough] (passthrough mediated matrix device type)
>> +     ------ [vfio_ap-passthrough] (passthrough matrix mediated device type)
>>        --------- create
>>        --------- [devices]
>>   
>> @@ -775,17 +851,18 @@ Securing the APQNs for our example
>>        higher than the maximum is specified, the operation will terminate with
>>        an error (ENODEV).
>>   
>> -   * All APQNs that can be derived from the adapter ID and the IDs of
>> -     the previously assigned domains must be bound to the vfio_ap device
>> -     driver. If no domains have yet been assigned, then there must be at least
>> -     one APQN with the specified APID bound to the vfio_ap driver. If no such
>> -     APQNs are bound to the driver, the operation will terminate with an
>> -     error (EADDRNOTAVAIL).
>> +   * All APQNs that can be derived from the Cartesian product of the APID of the
>> +     adapter being assigned and the APQIs of the previously assigned domains
>> +     must be available to the vfio_ap device driver as specified in the sysfs
>> +     /sys/bus/ap/apmask and /sys/bus/ap/aqmask attribute files. If even one APQN
>> +     is reserved for use by the host device driver, the operation will terminate
>> +     with an error (EADDRNOTAVAIL).
>>   
>> -     No APQN that can be derived from the adapter ID and the IDs of the
>> -     previously assigned domains can be assigned to another mediated matrix
>> -     device. If an APQN is assigned to another mediated matrix device, the
>> -     operation will terminate with an error (EADDRINUSE).
>> +   * No APQN that can be derived from the Cartesian product of the APID of the
>> +     adapter being assigned and the APQIs of the previously assigned domains can
>> +     be assigned to another matrix mediated device. If even one APQN is assigned
>> +     to another matrix mediated device, the operation will terminate with an
>> +     error (EADDRINUSE).
>>   
>>      In order to successfully assign a domain:
>>   
>> @@ -794,17 +871,18 @@ Securing the APQNs for our example
>>        higher than the maximum is specified, the operation will terminate with
>>        an error (ENODEV).
>>   
>> -   * All APQNs that can be derived from the domain ID and the IDs of
>> -     the previously assigned adapters must be bound to the vfio_ap device
>> -     driver. If no domains have yet been assigned, then there must be at least
>> -     one APQN with the specified APQI bound to the vfio_ap driver. If no such
>> -     APQNs are bound to the driver, the operation will terminate with an
>> -     error (EADDRNOTAVAIL).
>> +   * All APQNs that can be derived from the Cartesian product of the APQI of the
>> +     domain being assigned and the APIDs of the previously assigned adapters
>> +     must be available to the vfio_ap device driver as specified in the sysfs
>> +     /sys/bus/ap/apmask and /sys/bus/ap/aqmask attribute files. If even one APQN
>> +     is reserved for use by the host device driver, the operation will terminate
>> +     with an error (EADDRNOTAVAIL).
>>   
>> -     No APQN that can be derived from the domain ID and the IDs of the
>> -     previously assigned adapters can be assigned to another mediated matrix
>> -     device. If an APQN is assigned to another mediated matrix device, the
>> -     operation will terminate with an error (EADDRINUSE).
>> +   * No APQN that can be derived from the Cartesian product of the APQI of the
>> +     domain being assigned and the APIDs of the previously assigned adapters can
>> +     be assigned to another matrix mediated device. If even one APQN is assigned
>> +     to another matrix mediated device, the operation will terminate with an
>> +     error (EADDRINUSE).
>>   
>>      In order to successfully assign a control domain, the domain number
>>      specified must represent a value from 0 up to the maximum domain number
>> @@ -813,22 +891,22 @@ Securing the APQNs for our example
>>   
>>   5. Start Guest1::
>>   
>> -     /usr/bin/qemu-system-s390x ... -cpu host,ap=on,apqci=on,apft=on \
>> +     /usr/bin/qemu-system-s390x ... -cpu host,ap=on,apqci=on,apft=on,apqi=on \
>>   	-device vfio-ap,sysfsdev=/sys/devices/vfio_ap/matrix/$uuid1 ...
>>   
>>   7. Start Guest2::
>>   
>> -     /usr/bin/qemu-system-s390x ... -cpu host,ap=on,apqci=on,apft=on \
>> +     /usr/bin/qemu-system-s390x ... -cpu host,ap=on,apqci=on,apft=on,apqi=on \
>>   	-device vfio-ap,sysfsdev=/sys/devices/vfio_ap/matrix/$uuid2 ...
>>   
>>   7. Start Guest3::
>>   
>> -     /usr/bin/qemu-system-s390x ... -cpu host,ap=on,apqci=on,apft=on \
>> +     /usr/bin/qemu-system-s390x ... -cpu host,ap=on,apqci=on,apft=on,apqi=on \
>>   	-device vfio-ap,sysfsdev=/sys/devices/vfio_ap/matrix/$uuid3 ...
>>   
>> -When the guest is shut down, the mediated matrix devices may be removed.
>> +When the guest is shut down, the matrix mediated devices may be removed.
>>   
>> -Using our example again, to remove the mediated matrix device $uuid1::
>> +Using our example again, to remove the matrix mediated device $uuid1::
>>   
>>      /sys/devices/vfio_ap/matrix/
>>         --- [mdev_supported_types]
>> @@ -851,16 +929,146 @@ remove it if no guest will use it during the remaining lifetime of the linux
>>   host. If the mdev matrix device is removed, one may want to also reconfigure
>>   the pool of adapters and queues reserved for use by the default drivers.
>>   
>> +Hot plug support:
>> +=================
>> +An adapter, domain or control domain may be hot plugged into a running KVM
>> +guest by assigning it to the matrix mediated device being used by the guest.
>> +Control domains will always be hot plugged; however, an adapter or domain will
>> +be hot plugged only if each new APQN resulting from its assignment
>> +references a queue device bound to the vfio_ap device driver as described
>> +below.
>> +
>> +When an adapter is assigned to a matrix mediated device in use by a KVM guest:
>> +
>> +* If no domains have yet been plugged into the KVM guest:
>> +
>> +  Hot plug the adapter and every domain previously assigned to the mdev if each
>> +  APQN derived from the Cartesian product of the APID of the adapter being
>> +  assigned and the APQIs of the domains previously assigned references a queue
>> +  device bound to the vfio_ap device driver.
>> +
>> +* If one or more domains have previously been plugged into the guest:
>> +
>> +  Hot plug the adapter if each APQN derived from the Cartesian product of the
>> +  APID of the adapter being assigned and the APQIs of the domains already
>> +  plugged into the guest references a queue device bound to the vfio_ap device
>> +  driver.
>> +
>> +When a domain is assigned to a matrix mediated device in use by a KVM guest:
>> +
>> +* If no adapters have yet been plugged into the KVM guest:
>> +
>> +  Hot plug the domain and every adapter previously assigned to the mdev if each
>> +  APQN derived from the Cartesian product of the APIDs of the adapters
>> +  previously assigned and the APQI of the domain being assigned references a
>> +  queue device bound to the vfio_ap device driver.
>> +
>> +* If one or more adapters have previously been plugged into the guest:
>> +
>> +  Hot plug the domain if each APQN derived from the Cartesian product of the
>> +  APIDs of the adapters already plugged into the guest and the APQI of the
>> +  domain being assigned references a queue device bound to the vfio_ap device
>> +  driver.
>> +
>> +Over-provisioning of AP queues for a KVM guest:
>> +==============================================
>> +Over-provisioning is defined herein as the assignment of adapters or domains to
>> +a matrix mediated device that do not reference AP devices in the host's AP
>> +configuration. The idea here is that when the adapter or domain becomes
>> +available, it will be automatically hot-plugged into the KVM guest using
>> +the matrix mediated device to which it is assigned as long as each new APQN
>> +resulting from plugging it in references a queue device bound to the vfio_ap
>> +device driver.
>> +
>>   Limitations
>>   ===========
>> -* The KVM/kernel interfaces do not provide a way to prevent restoring an APQN
>> -  to the default drivers pool of a queue that is still assigned to a mediated
>> -  device in use by a guest. It is incumbent upon the administrator to
>> -  ensure there is no mediated device in use by a guest to which the APQN is
>> -  assigned lest the host be given access to the private data of the AP queue
>> -  device such as a private key configured specifically for the guest.
>> +Live guest migration is not supported for guests using AP devices without
>> +intervention by a system administrator. Before a KVM guest can be migrated,
>> +the matrix mediated device must be removed. Unfortunately, it can not be
>> +removed manually (i.e., echo 1 > /sys/devices/vfio_ap/matrix/$UUID/remove) while
>> +the mdev is in use by a KVM guest. If the guest is being emulated by QEMU,
>> +its mdev can be hot unplugged from the guest in one of two ways:
>> +
>> +1. If the KVM guest was started with libvirt, you can hot unplug the mdev via
>> +   the following commands:
>> +
>> +      virsh detach-device <guestname> <path-to-device-xml>
>> +
>> +      For example, to hot unplug mdev 62177883-f1bb-47f0-914d-32a22e3a8804 from
>> +      the guest named 'my-guest':
>> +
>> +         virsh detach-device my-guest ~/config/my-guest-hostdev.xml
>> +
>> +            The contents of my-guest-hostdev.xml:
>> +
>> +            <hostdev mode='subsystem' type='mdev' managed='no' model='vfio-ap'>
>> +              <source>
>> +                <address uuid='62177883-f1bb-47f0-914d-32a22e3a8804'/>
>> +              </source>
>> +            </hostdev>
>> +
>> +
>> +      virsh qemu-monitor-command <guest-name> --hmp "device_del <device-id>"
>> +
>> +      For example, to hot unplug the matrix mediated device identified on the
>> +      qemu command line with 'id=hostdev0' from the guest named 'my-guest':
>> +
>> +         virsh qemu-monitor-command my-guest --hmp "device_del hostdev0"
>> +
>> +2. A matrix mediated device can be hot unplugged by attaching the qemu monitor
>> +   to the guest and using the following qemu monitor command:
>> +
>> +      (QEMU) device_del id=<device-id>
>> +
>> +      For example, to hot unplug the matrix mediated device that was specified
>> +      on the qemu command line with 'id=hostdev0' when the guest was started:
>> +
>> +         (QEMU) device_del id=hostdev0
>> +
>> +After live migration of the KVM guest completes, an AP configuration can be
>> +restored to the KVM guest by hot plugging a matrix mediated device on the target
>> +system into the guest in one of two ways:
>> +
>> +1. If the KVM guest was started with libvirt, you can hot plug a matrix mediated
>> +   device into the guest via either of the following virsh commands:
>> +
>> +   virsh attach-device <guestname> <path-to-device-xml>
>> +
>> +      For example, to hot plug mdev 62177883-f1bb-47f0-914d-32a22e3a8804 into
>> +      the guest named 'my-guest':
>> +
>> +         virsh attach-device my-guest ~/config/my-guest-hostdev.xml
>> +
>> +            The contents of my-guest-hostdev.xml:
>> +
>> +            <hostdev mode='subsystem' type='mdev' managed='no' model='vfio-ap'>
>> +              <source>
>> +                <address uuid='62177883-f1bb-47f0-914d-32a22e3a8804'/>
>> +              </source>
>> +            </hostdev>
>> +
>> +
>> +   virsh qemu-monitor-command <guest-name> --hmp \
>> +   "device_add vfio-ap,sysfsdev=<path-to-mdev>,id=<device-id>"
>> +
>> +      For example, to hot plug the matrix mediated device
>> +      62177883-f1bb-47f0-914d-32a22e3a8804 into the guest named 'my-guest' with
>> +      device-id hostdev0:
>> +
>> +      virsh qemu-monitor-command my-guest --hmp \
>> +      "device_add vfio-ap,\
>> +      sysfsdev=/sys/devices/vfio_ap/matrix/62177883-f1bb-47f0-914d-32a22e3a8804,\
>> +      id=hostdev0"
>> +
>> +2. A matrix mediated device can be hot plugged by attaching the qemu monitor
>> +   to the guest and using the following qemu monitor command:
>> +
>> +      (qemu) device_add "vfio-ap,sysfsdev=<path-to-mdev>,id=<device-id>"
>>   
>> -* Dynamically modifying the AP matrix for a running guest (which would amount to
>> -  hot(un)plug of AP devices for the guest) is currently not supported
>> +      For example, to plug the matrix mediated device
>> +      62177883-f1bb-47f0-914d-32a22e3a8804 into the guest with the device-id
>> +      hostdev0:
>>   
>> -* Live guest migration is not supported for guests using AP devices.
>> +         (qemu) device_add "vfio-ap,\
>> +         sysfsdev=/sys/devices/vfio_ap/matrix/62177883-f1bb-47f0-914d-32a22e3a8804,\
>> +         id=hostdev0"
>> \ No newline at end of file



* Re: [PATCH v10 02/16] s390/vfio-ap: use new AP bus interface to search for queue devices
  2020-09-25  2:11       ` Halil Pasic
@ 2020-10-16 20:59         ` Tony Krowiak
  0 siblings, 0 replies; 79+ messages in thread
From: Tony Krowiak @ 2020-10-16 20:59 UTC (permalink / raw)
  To: Halil Pasic
  Cc: Cornelia Huck, linux-s390, linux-kernel, kvm, freude,
	borntraeger, mjrosato, alex.williamson, kwankhede, fiuczy,
	frankja, david, imbrenda, hca, gor



On 9/24/20 10:11 PM, Halil Pasic wrote:
> On Thu, 27 Aug 2020 10:24:07 -0400
> Tony Krowiak <akrowiak@linux.ibm.com> wrote:
>
>>
>> On 8/25/20 6:13 AM, Cornelia Huck wrote:
>>> On Fri, 21 Aug 2020 15:56:02 -0400
>>> Tony Krowiak<akrowiak@linux.ibm.com>  wrote:
>>>
>>>> This patch refactor's the vfio_ap device driver to use the AP bus's
>>> s/refactor's/refactors/
>> Of course, what was I thinking?:)
>>
>>>> ap_get_qdev() function to retrieve the vfio_ap_queue struct containing
>>>> information about a queue that is bound to the vfio_ap device driver.
>>>> The bus's ap_get_qdev() function retrieves the queue device from a
>>>> hashtable keyed by APQN. This is much more efficient than looping over
>>>> the list of devices attached to the AP bus by several orders of
>>>> magnitude.
>>>>
>>>> Signed-off-by: Tony Krowiak<akrowiak@linux.ibm.com>
>>>> Reported-by: kernel test robot<lkp@intel.com>
>>>> ---
>>>>    drivers/s390/crypto/vfio_ap_drv.c     | 27 ++-------
>>>>    drivers/s390/crypto/vfio_ap_ops.c     | 86 +++++++++++++++------------
>>>>    drivers/s390/crypto/vfio_ap_private.h |  8 ++-
>>>>    3 files changed, 59 insertions(+), 62 deletions(-)
>>>>
>>> (...)
>>>
>>>> diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c
>>>> index e0bde8518745..ad3925f04f61 100644
>>>> --- a/drivers/s390/crypto/vfio_ap_ops.c
>>>> +++ b/drivers/s390/crypto/vfio_ap_ops.c
>>>> @@ -26,43 +26,26 @@vfio_ap_get_queue()
>>>>    
>>>>    static int vfio_ap_mdev_reset_queues(struct mdev_device *mdev);
>>>>    
>>>> -static int match_apqn(struct device *dev, const void *data)
>>>> -{
>>>> -	struct vfio_ap_queue *q = dev_get_drvdata(dev);
>>>> -
>>>> -	return (q->apqn == *(int *)(data)) ? 1 : 0;
>>>> -}
>>>> -
>>>>    /**
>>>> - * vfio_ap_get_queue: Retrieve a queue with a specific APQN from a list
>>>> - * @matrix_mdev: the associated mediated matrix
>>>> + * vfio_ap_get_queue: Retrieve a queue with a specific APQN.
>>>>     * @apqn: The queue APQN
>>>>     *
>>>> - * Retrieve a queue with a specific APQN from the list of the
>>>> - * devices of the vfio_ap_drv.
>>>> - * Verify that the APID and the APQI are set in the matrix.
>>>> + * Retrieve a queue with a specific APQN from the AP queue devices attached to
>>>> + * the AP bus.
>>>>     *
>>>> - * Returns the pointer to the associated vfio_ap_queue
>>>> + * Returns the pointer to the vfio_ap_queue with the specified APQN, or NULL.
>>>>     */
>>>> -static struct vfio_ap_queue *vfio_ap_get_queue(
>>>> -					struct ap_matrix_mdev *matrix_mdev,
>>>> -					int apqn)
>>>> +static struct vfio_ap_queue *vfio_ap_get_queue(unsigned long apqn)
>>>>    {
>>>> +	struct ap_queue *queue;
>>>>    	struct vfio_ap_queue *q;
>>>> -	struct device *dev;
>>>>    
>>>> -	if (!test_bit_inv(AP_QID_CARD(apqn), matrix_mdev->matrix.apm))
>>>> -		return NULL;
>>>> -	if (!test_bit_inv(AP_QID_QUEUE(apqn), matrix_mdev->matrix.aqm))
>>> I think you should add some explanation to the patch description why
>>> testing the matrix bitmasks is not needed anymore.
>> As a result of this comment, I took a closer look at the code to
>> determine the reason for eliminating the matrix_mdev
>> parameter. The reason is that the code below (i.e., find the device
>> and get the driver data) was also repeated in the vfio_ap_irq_disable_apqn()
>> function, so I replaced it with a call to the function above; however, the
>> vfio_ap_irq_disable_apqn() function does not have a reference to the
>> matrix_mdev, so I eliminated the matrix_mdev parameter. Note that the
>> vfio_ap_irq_disable_apqn() is called for each APQN assigned to a matrix
>> mdev, so there is no need to test the bitmasks there.
>>
>> The other place from which the function above is called is
>> the handle_pqap() function which does have a reference to the
>> matrix_mdev. In order to ensure the integrity of the instruction
>> being intercepted - i.e., PQAP(AQIC) enable/disable IRQ for an
>> AP queue - the testing of the matrix bitmasks probably ought to
>> be performed, so it will be done there instead of in the
>> vfio_ap_get_queue() function above.
> I'm a little confused. I do agree that in handle_pqap() we do want to
> make sure that we only operate on queues that belong to the given guest
> that issued the PQAP instruction.
>
> AFAICT with this patch set applied, this is not the case any more. Does
> that 'will be done there instead' refer to v11?

I understand your confusion, so here is what I'm going to do
to clear things up. I will leave the signature of the vfio_ap_get_queue()
function the same and leave in the bitmap checking. As per your
comment below, in patch 3 I will replace the call to
vfio_ap_get_queue() with a call to vfio_ap_get_mdev_queue().
Since the vfio_ap_get_mdev_queue() function is mdev-specific,
I can then remove the mdev parameter from the
vfio_ap_get_queue() function since it will no longer be needed.

> Another question is, can we use vfio_ap_get_mdev_queue() in
> handle_pqap() (instead of vfio_ap_get_queue())?

Yes, we can and should do that as it will eliminate both the need to
test the matrix bitmasks and several lines of code; however, that
function is not available until patch 3/16, so that change will be
made there.
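
For reference, the matrix bitmask test under discussion amounts to checking that both the APID and the APQI of an APQN are set in the mdev's masks. Below is a minimal userspace sketch, assuming 256-bit masks stored as four 64-bit words with bit 0 being the most significant bit of the first word (the numbering test_bit_inv() uses on s390). The names mask_test_bit_inv(), mask_set_bit_inv(), struct ap_matrix_sketch and matrix_has_apqn() are made up for illustration and are not the driver's.

```c
#include <stdbool.h>
#include <stdint.h>

#define AP_MASK_WORDS 4	/* 256 bits as four 64-bit words */

/* Bit 0 is the most significant bit of word 0 (inverted numbering). */
bool mask_test_bit_inv(const uint64_t mask[AP_MASK_WORDS], unsigned int nr)
{
	return (mask[nr / 64] >> (63 - (nr % 64))) & 1;
}

void mask_set_bit_inv(uint64_t mask[AP_MASK_WORDS], unsigned int nr)
{
	mask[nr / 64] |= (uint64_t)1 << (63 - (nr % 64));
}

/* Illustrative stand-in for the adapter and domain masks of a matrix mdev. */
struct ap_matrix_sketch {
	uint64_t apm[AP_MASK_WORDS];	/* adapter (APID) mask */
	uint64_t aqm[AP_MASK_WORDS];	/* domain (APQI) mask */
};

/*
 * An APQN belongs to the matrix only if its APID is set in the adapter mask
 * and its APQI is set in the domain mask. Assumes APID in the high byte and
 * APQI in the low byte of the APQN.
 */
bool matrix_has_apqn(const struct ap_matrix_sketch *m, unsigned int apqn)
{
	unsigned int apid = (apqn >> 8) & 0xff;
	unsigned int apqi = apqn & 0xff;

	return mask_test_bit_inv(m->apm, apid) &&
	       mask_test_bit_inv(m->aqm, apqi);
}
```

A lookup that starts from the mdev's own queue list gets this guarantee for free, which is why moving handle_pqap() to an mdev-specific lookup would make the explicit mask test redundant.
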

>   
>>
>>> +	queue = ap_get_qdev(apqn);
>>> +	if (!queue)
>>>    		return NULL;
>>>    
>>> -	dev = driver_find_device(&matrix_dev->vfio_ap_drv->driver, NULL,
>>> -				 &apqn, match_apqn);
>>> -	if (!dev)
>>> -		return NULL;
>>> -	q = dev_get_drvdata(dev);
>>> -	q->matrix_mdev = matrix_mdev;
>>> -	put_device(dev);
>>> +	q = dev_get_drvdata(&queue->ap_dev.device);
>>> +	put_device(&queue->ap_dev.device);
>>>    
>>>    	return q;
>>>    }
>>> (...)
>>>
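
The commit message quoted above says that ap_get_qdev() retrieves the queue device from a hashtable keyed by APQN instead of walking the list of devices on the AP bus. The sketch below shows the general shape of such a lookup with a small chained hash table. It is not the AP bus implementation: the table size, the hash function and all names (struct queue_sketch, struct queue_table, qdev_hash(), queue_table_add(), queue_table_find()) are assumptions chosen for illustration.

```c
#include <stddef.h>

#define QDEV_HASH_BITS 8
#define QDEV_HASH_SIZE (1u << QDEV_HASH_BITS)

/* Illustrative queue record; next chains entries that share a bucket. */
struct queue_sketch {
	unsigned int apqn;
	struct queue_sketch *next;
};

struct queue_table {
	struct queue_sketch *buckets[QDEV_HASH_SIZE];
};

/* Fold the APID byte into the APQI byte to pick a bucket (assumed hash). */
unsigned int qdev_hash(unsigned int apqn)
{
	return (apqn ^ (apqn >> 8)) & (QDEV_HASH_SIZE - 1);
}

/* Insert at the head of the bucket's chain. */
void queue_table_add(struct queue_table *t, struct queue_sketch *q)
{
	unsigned int b = qdev_hash(q->apqn);

	q->next = t->buckets[b];
	t->buckets[b] = q;
}

/* Only the entries in one bucket are compared, not every device. */
struct queue_sketch *queue_table_find(const struct queue_table *t,
				      unsigned int apqn)
{
	struct queue_sketch *q;

	for (q = t->buckets[qdev_hash(apqn)]; q; q = q->next)
		if (q->apqn == apqn)
			return q;
	return NULL;
}
```

The cost of a lookup depends on the length of one bucket's chain rather than on the total number of queue devices, which is the source of the speedup over a linear search.
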



